DVC Skill
Expert system for Data Version Control (DVC) operations.
When to Use This Skill
This skill should be triggered when:
- Versioning large datasets or models (that Git cannot handle)
- Defining and running reproducible data pipelines (dvc.yaml)
- Tracking and comparing machine learning experiments
- Debugging DVC cache or remote storage issues
Quick Reference
Common Patterns
Pattern 1: Initialize DVC
dvc init
git commit -m "Initialize DVC"
Pattern 2: Version a Specific File
dvc add data/data.xml
git add data/data.xml.dvc data/.gitignore
git commit -m "Add data.xml to DVC"
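As a minimal sketch, the add-and-stage sequence can be wrapped in a shell function. The name dvc_track is hypothetical and not part of DVC. The sketch assumes, as in the pattern above, that dvc add writes a .dvc pointer file next to the tracked file and updates the .gitignore in the same directory.

```shell
# Hypothetical helper (not a DVC command): track one file with DVC and
# stage the generated pointer files in Git.
dvc_track() {
  file=$1
  dir=$(dirname "$file")
  # dvc add writes "$file.dvc" and lists the file in "$dir/.gitignore"
  dvc add "$file" || return 1
  # Git versions only the small pointer file and the ignore rule
  git add "$file.dvc" "$dir/.gitignore"
}
```

A call such as dvc_track data/data.xml followed by git commit -m "Add data.xml to DVC" would then reproduce the three commands above.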
Pattern 3: Define Pipeline Stage
dvc stage add -n prepare \
  -p prepare.seed,prepare.split \
  -d src/prepare.py -d data/data.xml \
  -o data/prepared \
  python src/prepare.py data/data.xml
Pattern 4: Reproduce Pipeline
dvc repro
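Before reproducing, it can help to see which stages DVC considers out of date. The following is a small sketch; the function name repro_with_preview is hypothetical, and any arguments are passed through to dvc repro as stage targets.

```shell
# Hypothetical wrapper: preview what changed, then rerun the pipeline.
repro_with_preview() {
  # Report stages whose dependencies, parameters, or outputs changed
  dvc status
  # Rerun stages that are out of date; optional targets limit the run
  dvc repro "$@"
}
```

For example, repro_with_preview prepare would limit the run to the prepare stage and whatever it depends on.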
Pattern 5: Push/Pull Data
dvc push
dvc pull
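A common use of dvc pull is to bring data in line with a different Git revision. The sketch below assumes a default remote is already configured; the function name sync_revision is hypothetical.

```shell
# Hypothetical helper: switch Git revision, then fetch the matching data.
sync_revision() {
  rev=$1
  # Check out the code and the .dvc / dvc.lock files for that revision
  git checkout "$rev" || return 1
  # Download any missing data from the remote and place it in the workspace
  dvc pull
}
```

If the data is already in the local cache, dvc checkout would likely be enough in place of dvc pull.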
Pattern 6: Metrics and Plots
dvc metrics show
dvc plots show
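To compare experiments rather than only display the current values, the diff subcommands can be used. This is a sketch under the assumption that metrics and plots are already declared in the project; the function name compare_with is hypothetical, and HEAD is used as the default revision.

```shell
# Hypothetical helper: compare workspace metrics and plots to a Git revision.
compare_with() {
  rev=${1:-HEAD}
  # Show how metric values differ between the revision and the workspace
  dvc metrics diff "$rev"
  # Render plots that overlay the revision and the workspace
  dvc plots diff "$rev"
}
```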
Example Code Patterns
Example 1: Configure Remote (S3)
dvc remote add -d myremote s3://mybucket/dvcstore
dvc remote modify myremote endpointurl https://s3.us-west-1.amazonaws.com
Example 2: DVC YAML Structure
stages:
  prepare:
    cmd: python src/prepare.py data/data.xml
    deps:
      - data/data.xml
      - src/prepare.py
    params:
      - prepare.seed
      - prepare.split
    outs:
      - data/prepared
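The params entries in the stage refer to keys that DVC looks up in params.yaml by default. The sketch below writes a matching file; the function name write_example_params is hypothetical, and the values 42 and 0.2 are illustrative assumptions rather than values from this document.

```shell
# Hypothetical helper: write a params.yaml matching the stage's params list.
# The seed and split values are placeholders chosen for illustration.
write_example_params() {
  cat > "${1:-params.yaml}" <<'EOF'
prepare:
  seed: 42
  split: 0.2
EOF
}
```

Changing either value and then running dvc repro should cause the prepare stage to be rerun, because both keys are listed as stage parameters.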
Reference Files
This skill includes documentation in references/:
- other.md - General DVC documentation notes (Contribution guide).
[!NOTE] The reference documentation is currently limited. Rely on dvc --help or the official website for complex inquiries not covered by the Quick Reference.
Usage
For Beginners
Start with dvc init and dvc add to understand the basic workflow of tracking files alongside Git.
For Pipelines
Use dvc stage add (or edit dvc.yaml directly) to define stages, then dvc repro to execute them. The older dvc run command was removed in DVC 3.0.
For Debugging
Use dvc doctor to diagnose environment issues.
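For cache or remote storage problems, a few read-only commands give a quick overview. This is a sketch, assuming a default remote is configured; the function name dvc_diagnose is hypothetical.

```shell
# Hypothetical helper: collect basic diagnostics without changing any data.
dvc_diagnose() {
  # DVC version, platform, and cache/remote details
  dvc doctor
  # Names and URLs of the configured remotes
  dvc remote list
  # Differences between the local cache and the default remote
  dvc status --cloud
}
```

Rerunning a failing command with the -v flag, for example dvc push -v, may also surface more detail.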