Experiment Workflow

This skill covers the complete lifecycle of Python experiments in experiments/. Experiments are the primary way this thesis generates empirical evidence.

Source of truth: filesystem → experiments/CLAUDE.md → this skill

Project Context

File Layout

Artifact	Location
Tracking	Task file in `tasks/`
Code	`experiments/<label>/`
SPEC.md	`experiments/<label>/SPEC.md`
Tests	`experiments/<label>/test_*.py` (colocated)
Data	`data/<label>/`
Thesis assets	`thesis/assets/<label>/`

Standard Code Structure

code

experiments/<label>/
├── SPEC.md              # Research question, method, success criteria
├── stage_build.py       # Generate data
├── test_stage_build.py  # Colocated test
├── stage_analyze.py     # Compute results
└── stage_plot.py        # Create figures

Why stages? Separate expensive computation (build) from cheap iteration (analyze/plot). Re-run analysis without regenerating data.

Teaching Example

experiments/_example/ demonstrates all conventions. Study it first when creating a new experiment.

Experiment Lifecycle

Initial Development

Phase	Owner	Output
Ideation	Jörn + PM agent	Task file in `tasks/inbox/`
Planning	Task agent (plan mode)	SPEC.md with method and success criteria
Development	Task agent	stage_* code, FINDINGS.md, artifacts, PR
Code review	Review agent	Approved PR (or escalate back)
Merge	PM agent	Merged PR, task moved to `done/`

Ongoing Maintenance

After initial development, experiments may be rerun when the codebase changes.

Phase	Owner	Output
Rerun	Any agent	Updated data and asset artifacts
Maintenance	Any agent	Updated FINDINGS.md; escalate if needed

Invariants

SPEC.md Template

markdown

# <label> Experiment

**Status:** Ideation / Specified / In Progress / Complete

## Research Question
What are we trying to learn?

## Method
How will we answer the question?

## Success Criteria
What outcome means "we are satisfied"?

## Expected Outputs
- data/<label>/...
- thesis/assets/<label>/...

FINDINGS.md Template

markdown

# <label> Findings

## Results Summary
Key findings from the data.

## Interpretation
What the results mean for the research question.

## Escalation Procedures
When and how to flag issues.

## Changelog
- YYYY-MM-DD: Initial findings
- YYYY-MM-DD: Updated after <change>

FINDINGS.md Content Guidelines

FINDINGS.md is a living document updated on each rerun:

•Results summary: Key findings from the data
•Interpretation: What the results mean for the research question
•Escalation procedures: When and how to flag issues (e.g., "if validation fails, open issue before updating thesis")
•Changelog: Dated entries when findings change significantly

Stage Patterns

stage_build.py

python

"""Generate data for <label> experiment."""

from pathlib import Path

# All config hardcoded for reproducibility
CONFIG = {
    "experiment_name": "<label>",
    "variant": "full",  # Change to "smoke" for quick runs
}

DATA_DIR = Path(__file__).parent.parent.parent / "data"
OUTPUT_DIR = DATA_DIR / CONFIG["experiment_name"]

def main():
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    # Generate data
    # Save to OUTPUT_DIR / "results.json" or similar

if __name__ == "__main__":
    main()

stage_plot.py

python

"""Generate plots for <label> experiment."""

from pathlib import Path
import matplotlib.pyplot as plt

CONFIG = {"experiment_name": "<label>"}

DATA_DIR = Path(__file__).parent.parent.parent / "data"
INPUT_DIR = DATA_DIR / CONFIG["experiment_name"]
ASSETS_DIR = Path(__file__).parent.parent.parent / "thesis" / "assets" / CONFIG["experiment_name"]

def main():
    ASSETS_DIR.mkdir(parents=True, exist_ok=True)
    # Load data
    # Create figures
    plt.savefig(ASSETS_DIR / "figure.pdf")

if __name__ == "__main__":
    main()

Escalation Triggers

Escalate (update task file and notify Jörn) when:

•Validation tests that previously passed now fail
•Data artifacts contradict FINDINGS.md conclusions
•Upstream code changes invalidate assumptions in SPEC.md

Philosophy

•Task files track progress - coordination, blockers, status updates
•Files contain research content - SPEC.md, FINDINGS.md, stages, data
•Don't compress ideas - preserve intellectual labor in SPEC/FINDINGS
•Reproduction must be obvious - pattern self-evident from repo structure
•Escalate don't hide - if results change qualitatively, flag it rather than silently updating

Code Style

•KISS (Keep It Simple)
•Pure functions (closeness to math)
•Fast dev cycle (no external users)