Design Experiment
You help users plan experiments for fine-tuning and evaluating LLMs. Create a plan that specifies the complete workflow from training through evaluation, verifies resources, and documents all steps in a structured YAML configuration.
Your Task
Guide the user through designing their experiment by asking questions, verifying resources, and creating a comprehensive experiment_summary.yaml file that documents the complete plan.
Workflow
Follow the three-stage process:
1. Parameter Selection → param_selection.md
Guide the user through 9 interactive steps to gather all experiment parameters:
- Determine experiment purpose and location (sanity check vs research experiment)
- Understand the experiment (scientific question, variables)
- Confirm tool choices (torchtune for training, inspect-ai for evaluation)
- Design training runs (models, datasets, hyperparameters)
- Design evaluation runs (tasks, epochs, evaluation matrix)
- Establish naming (experiment name, run names)
- Verify resources (models, datasets, eval scripts exist)
- Get approval (validate first, then present)
- Create files (proceed to generation stage)
See param_selection.md for:
- Complete question flow for each step
- Auto-detection logic for experiment location
- Resource verification commands
- Conversation patterns
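The resource verification step can be sketched in Python as a rough analogue of the `ls`, `du`, and `df` checks. This is a minimal sketch, not the commands defined in param_selection.md: the function names `verify_resource` and `free_space_bytes` and the record layout are hypothetical, chosen only for illustration.

```python
import os
import shutil


def verify_resource(path):
    """Describe whether `path` exists and how many bytes it holds (hypothetical helper)."""
    record = {"path": path, "exists": os.path.exists(path), "size_bytes": None}
    if os.path.isfile(path):
        record["size_bytes"] = os.path.getsize(path)
    elif os.path.isdir(path):
        # Sum the sizes of all regular files under the directory (a rough analogue of `du`).
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                file_path = os.path.join(root, name)
                if os.path.isfile(file_path):
                    total += os.path.getsize(file_path)
        record["size_bytes"] = total
    return record


def free_space_bytes(path):
    """Free bytes on the filesystem that contains `path` (a rough analogue of `df`)."""
    return shutil.disk_usage(path).free
```

A missing path yields `exists: False` rather than an exception, which fits the guidance to record missing resources as prerequisites instead of blocking the plan.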
2. Validation → validation.md
Before presenting the plan to the user (step 8), validate completeness:
- ✓ All YAML sections present and properly structured
- ✓ All run names follow convention
- ✓ All parameters documented (variables and controls)
- ✓ Evaluation plan is consistent (0-indexed epochs, base vs fine-tuned)
- ✓ System prompt matches between training and evaluation (critical!)
- ✓ All resources verified (or noted as prerequisites)
See validation.md for:
- Complete validation checklist
- Common issues to check
- How to handle missing prerequisites
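Two of the checks above (system prompt consistency and resource verification) could be automated along these lines. This is a sketch under stated assumptions: the plan is taken to be a Python dict loaded from the YAML, and every key used here (`training`, `evaluation`, `system_prompt`, `resources`, `verified`, `prerequisites`) is hypothetical. The authoritative structure is in templates/experiment_summary.yaml.

```python
def find_plan_issues(plan):
    """Return human-readable issues found in a plan dict (all key names are hypothetical)."""
    issues = []

    # The system prompt must be identical in training and evaluation.
    train_prompt = plan.get("training", {}).get("system_prompt")
    eval_prompt = plan.get("evaluation", {}).get("system_prompt")
    if train_prompt != eval_prompt:
        issues.append("system prompt differs between training and evaluation")

    # Every resource must be verified or explicitly listed as a prerequisite.
    prerequisites = set(plan.get("prerequisites", []))
    for resource in plan.get("resources", []):
        if not resource.get("verified") and resource["name"] not in prerequisites:
            issues.append(
                f"resource {resource['name']!r} is neither verified nor a prerequisite"
            )
    return issues
```

An empty list means both checks passed. Any returned message can be resolved before the plan is presented for approval.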
3. Experiment Generation → experiment_generation.md
After user approves, create output files:
- experiment_summary.yaml - Structured experiment configuration (use templates/experiment_summary.yaml)
- design-experiment.jsonl - Machine-readable audit trail (see logging.md)
Then ask the user about next steps (for example, whether to run scaffold-experiment).
See experiment_generation.md for:
- File creation instructions
- YAML formatting guidance
- Next steps conversation pattern
- Prerequisites handling
Cross-Cutting Concerns
Logging → logging.md
IMPORTANT: Throughout param_selection and generation, create a detailed log at {experiment_dir}/design-experiment.jsonl.
What to log:
- ✓ Resource verification (ls, du, df commands and results)
- ✓ Prior run searches (if performed)
- ✓ Decisions (naming, recipe, configuration)
- ✓ File creation
Format: JSON Lines (.jsonl) - one JSON object per line
See logging.md for:
- Complete log format specification
- All action types with schemas
- Example entries for each action type
- When to log during workflow
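To illustrate the JSON Lines format, the sketch below appends one JSON object per line to the log file. The field names (`timestamp`, `action`, `details`) and the helper name `append_log_entry` are assumptions made for this example. The real action types and schemas are those specified in logging.md.

```python
import json
from datetime import datetime, timezone


def append_log_entry(log_path, action, details):
    """Append one JSON object as a single line to a .jsonl file (hypothetical schema)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "details": details,
    }
    # Append mode keeps earlier entries; json.dumps emits no raw newlines,
    # so each entry occupies exactly one line.
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")
    return entry
```

Because every line is an independent JSON document, the log can be read back line by line with `json.loads` without parsing the whole file at once.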
Templates → templates/
Reference materials for output generation:
- templates/experiment_summary.yaml - YAML schema and structure for experiment plan
Important Reminders
- Dataset format terminology: Describe JSON datasets as "JSON with input/output keys" - never invent format type names
- Use paths from claude.local.md for models, datasets, scratch directories
- Always verify resources exist before finalizing plan (log all verification)
- System prompt consistency is critical - must match between training and evaluation for inspect-ai
- Epochs are 0-indexed - Use [0, 1, 2] in evaluation matrix
- Base models use epochs: null, fine-tuned models use epochs: [0, 1]
- Document tool choices in YAML - torchtune for training, inspect-ai for evaluation
- Handle missing resources gracefully - note as prerequisites, don't block the plan
- If inspect-ai task doesn't exist - note that create-inspect-task skill should be run first
- Generate YAML, not Markdown - Use structured YAML format with proper indentation
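The epoch conventions above can be sketched as a small helper that builds evaluation matrix entries. The run and entry field names (`name`, `is_base`, `num_epochs`, `run`, `task`, `epochs`) and the example run and task names are hypothetical. Only the rules themselves (0-indexed epochs, `null` for base models) come from the reminders.

```python
def build_evaluation_matrix(runs, tasks):
    """Pair every run with every task, applying the epoch conventions (hypothetical fields)."""
    matrix = []
    for run in runs:
        if run["is_base"]:
            # Base models have no training checkpoints; None corresponds to YAML null.
            epochs = None
        else:
            # Fine-tuned models: 0-indexed epochs, e.g. 2 trained epochs -> [0, 1].
            epochs = list(range(run["num_epochs"]))
        for task in tasks:
            matrix.append({"run": run["name"], "task": task, "epochs": epochs})
    return matrix


runs = [
    {"name": "base_model", "is_base": True, "num_epochs": 0},
    {"name": "finetuned_model", "is_base": False, "num_epochs": 2},
]
matrix = build_evaluation_matrix(runs, ["example_task"])
# matrix[0]["epochs"] is None, matrix[1]["epochs"] is [0, 1]
```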
Module Organization
This skill uses the param_selection → validation → generation pattern:
| Module | Purpose | Lines |
|---|---|---|
| param_selection.md | 9-step interactive workflow | ~340 |
| validation.md | Completeness checklist | ~140 |
| experiment_generation.md | Create YAML and JSONL files | ~125 |
| logging.md | JSONL audit trail specification | ~400 |
| templates/experiment_summary.yaml | YAML schema and structure | ~150 |
Pattern: Three stage names (selection, validation, generation) matching the scaffold/run skills, plus cross-cutting logging and templates.
See README.md for complete pattern documentation and rationale.