Objectives
- •Separate data processing from visualization.
- •Make the pipeline deterministic, re-runnable, and inspectable.
- •Make it obvious where data came from and whether outputs are stale.
Checklist
- •Directory structure
- •Ensure data/raw, data/processed, data/generated, src, tests, scripts, and reports exist.
- •Environment
- •Ensure dependencies are pinned/recorded (uv/conda/docker).
- •Orchestration
- •Prefer Snakemake or Make (or, at minimum, one scripts/run_pipeline.sh entrypoint).
- •Define inputs/outputs per step.
- •Validation
- •Add fast sanity checks and cheap diagnostics (saved PNGs).
- •Promote critical checks to tests.
- •Reproducibility
- •No hard-coded paths.
- •Fixed random seeds where randomness is used.
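
Several checklist items above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the helper names `ensure_dirs`, `is_stale`, and `make_rng` are hypothetical, and the staleness rule (an output is stale if it is missing or older than any input) is an assumption modeled on how Make-style tools compare timestamps.

```python
import random
from pathlib import Path

# Directory names taken from the checklist; everything else here is illustrative.
REQUIRED_DIRS = [
    "data/raw", "data/processed", "data/generated",
    "src", "tests", "scripts", "reports",
]


def ensure_dirs(root: Path) -> list[Path]:
    """Create any missing checklist directories under `root` and return them all."""
    paths = [root / name for name in REQUIRED_DIRS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def is_stale(output: Path, inputs: list[Path]) -> bool:
    """Return True if `output` is missing or older than any of its inputs."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime_ns
    return any(p.stat().st_mtime_ns > out_mtime for p in inputs)


def make_rng(seed: int = 0) -> random.Random:
    """Return a seeded generator so that random steps repeat exactly."""
    return random.Random(seed)
```

Passing `root` in as an argument (for example, derived from the location of the entrypoint script) keeps absolute paths out of the code, and handing each step its own seeded generator avoids relying on global random state.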
Deliverables
- •A single documented entrypoint to run the pipeline.
- •Tests for at least the core transformation.
- •A short note in docs/DECISIONS.md for any scientific choice.
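
As a rough sketch of what "tests for at least the core transformation" might look like, the example below uses a hypothetical `standardize` function as a stand-in for the project's real transformation. The test functions follow the pytest naming convention (`test_*`) but only rely on plain `assert`, so they run under any runner.

```python
import math
import statistics


def standardize(values: list[float]) -> list[float]:
    """Hypothetical core transformation: rescale to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize constant input")
    return [(v - mean) / std for v in values]


def test_standardize_is_centered_and_scaled() -> None:
    # The output should have mean 0 and population standard deviation 1.
    result = standardize([1.0, 2.0, 3.0, 4.0])
    assert math.isclose(statistics.fmean(result), 0.0, abs_tol=1e-9)
    assert math.isclose(statistics.pstdev(result), 1.0, rel_tol=1e-9)


def test_standardize_is_deterministic() -> None:
    # Running the same step twice on the same input must give identical output.
    data = [0.5, 1.5, 2.5]
    assert standardize(data) == standardize(data)
```

Checks like these start life as quick sanity checks and are worth promoting to tests once a failure would invalidate downstream results.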