Run Analysis Pipeline
Execute the project's data cleaning and analysis pipeline, report any failures or warnings, and flag stale output files.
Steps
- Read CLAUDE.md to find:
  - Data cleaning scripts and their execution order
  - Analysis scripts and their dependencies
  - Expected output directories (figures, tables)
  - Working directory and R project configuration
- Determine the scope from the arguments:
  - `$ARGUMENTS` = `all`: run the full pipeline (cleaning + analysis)
  - `$ARGUMENTS` = `clean`: run only the data cleaning scripts
  - `$ARGUMENTS` = `analysis`: run only the analysis scripts (assumes cleaned data already exists)
  - `$ARGUMENTS` = a specific `.R` filename: run that single script
  - No arguments (default): run `all`
- Run the cleaning scripts (if in scope):
  - Execute each cleaning script in dependency order
  - Capture stdout, stderr, and the exit code
  - Record the runtime of each script
  - Check that the expected output data files are created or updated
- Run the analysis scripts (if in scope):
  - Execute each analysis script
  - Capture stdout, stderr, and the exit code
  - Record the runtime of each script
  - Note any warnings (especially from statistical functions)
- Check output freshness:
  - Compare the modification times of output files (figures, tables) against the scripts that generate them
  - Flag outputs that are older than their generating script (stale)
  - Flag scripts that produce no detectable output files
- Produce a pipeline report with:
  - Script-by-script results (PASS / WARN / FAIL)
  - Runtime for each script
  - Full text of error messages and warnings
  - List of stale output files
  - List of missing expected outputs
- Save the report to `quality_reports/pipeline_run.md`
- Present a summary:
  - Total scripts run / passed / warned / failed
  - Critical failures highlighted
  - Stale outputs listed
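The run, capture, classification, and freshness-check steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not part of the command itself: the helper names (`ScriptResult`, `classify`, `run_script`, `check_freshness`) are hypothetical, and detecting warnings by looking for stderr lines that start with `Warning` is a heuristic based on how R prints warnings by default. Because `message()` output also goes to stderr, non-empty stderr alone is not treated as a warning.

```python
import os
import re
import subprocess
import time
from dataclasses import dataclass

# Heuristic (assumption): by default R writes warnings to stderr as lines starting with
# "Warning message:", "Warning messages:", "Warning in ..." or "Warning: ...".
# message() output also goes to stderr, so non-empty stderr is not a warning by itself.
WARNING_RE = re.compile(r"^Warning", re.MULTILINE)


@dataclass
class ScriptResult:
    script: str
    status: str        # "PASS", "WARN" or "FAIL"
    returncode: int
    runtime_s: float
    stdout: str
    stderr: str


def classify(returncode: int, stderr: str) -> str:
    """Map an exit code and captured stderr to PASS / WARN / FAIL."""
    if returncode != 0:
        return "FAIL"
    if WARNING_RE.search(stderr):
        return "WARN"
    return "PASS"


def run_script(script: str, project_root: str,
               interpreter: tuple[str, ...] = ("Rscript",)) -> ScriptResult:
    """Run one script from the project root, capturing output, exit code and runtime."""
    start = time.perf_counter()
    proc = subprocess.run(
        [*interpreter, script],
        cwd=project_root,      # scripts expect the project root as working directory
        capture_output=True,   # keep stdout and stderr separate for the report
        text=True,
    )
    runtime = time.perf_counter() - start
    return ScriptResult(script, classify(proc.returncode, proc.stderr),
                        proc.returncode, runtime, proc.stdout, proc.stderr)


def check_freshness(script: str, outputs: list[str]) -> tuple[list[str], list[str]]:
    """Split a script's expected outputs into (stale, missing).

    An output is stale if it exists but was last modified before the script.
    """
    script_mtime = os.path.getmtime(script)
    stale, missing = [], []
    for out in outputs:
        if not os.path.exists(out):
            missing.append(out)
        elif os.path.getmtime(out) < script_mtime:
            stale.append(out)
    return stale, missing
```

For example, `run_script("code/01_clean.R", ".")` (a hypothetical path) would invoke `Rscript code/01_clean.R` from the current directory. The `interpreter` parameter exists only so the helper can be exercised without R installed; paths passed to `check_freshness` are assumed to be absolute or relative to the directory the helper runs from.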
Important Notes
- Run scripts with the project root as the working directory
- Use `Rscript` to run R scripts (not `R CMD BATCH`)
- If a cleaning script fails, do NOT proceed to analysis scripts that depend on its output
- Capture warnings separately from errors; warnings may indicate data issues worth investigating
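The scope rules under Steps and the note above about not proceeding past a failed cleaning script can be combined as in the Python sketch below. It assumes the scripts, their stages, and their dependencies have already been read from CLAUDE.md into a hypothetical `Script` record, listed in execution order. `select_scripts` and `run_in_order` are illustrative names, and marking blocked scripts as `SKIP` is an assumption rather than a status the report format defines.

```python
import os
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Script:
    path: str
    stage: str                                            # "clean" or "analysis"
    depends_on: list[str] = field(default_factory=list)  # scripts whose outputs it reads


def select_scripts(pipeline: list[Script], argument: str = "") -> list[Script]:
    """Resolve $ARGUMENTS to the scripts in scope, preserving pipeline order."""
    arg = argument.strip() or "all"          # no arguments means "all"
    if arg == "all":
        return list(pipeline)
    if arg in ("clean", "analysis"):
        return [s for s in pipeline if s.stage == arg]
    # Otherwise treat the argument as a single script filename.
    matches = [s for s in pipeline
               if s.path == arg or os.path.basename(s.path) == arg]
    if not matches:
        raise ValueError(f"no script in the pipeline matches {arg!r}")
    return matches


def run_in_order(scripts: list[Script],
                 run: Callable[[str], str]) -> dict[str, str]:
    """Run scripts in the given (dependency) order.

    `run` executes one script and returns "PASS", "WARN" or "FAIL". A script is
    marked "SKIP" if any dependency in this run failed or was skipped; dependencies
    outside the selected scope are assumed to have valid output already.
    """
    results: dict[str, str] = {}
    for s in scripts:
        if any(results.get(dep) in ("FAIL", "SKIP") for dep in s.depends_on):
            results[s.path] = "SKIP"
            continue
        results[s.path] = run(s.path)
    return results
```

In this sketch a WARN result does not block downstream scripts; only failures do. That is a design choice in keeping with the note that warnings should be captured and investigated rather than treated as errors.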