Your Task
Orchestrate experiment execution by reading tool specifications from experiment_summary.yaml and calling the appropriate tool modules sequentially:
- •Read experiment_summary.yaml to identify tools being used
- •Execute model optimization (fine-tuning) for all runs
- •Wait for optimization to complete (REQUIRED)
- •Execute model evaluation for all runs
This ensures the entire experiment runs from training through evaluation with proper dependency management.
Prerequisites
- •experiment_summary.yaml exists (from design-experiment skill)
- •Scaffolding complete (from scaffold-experiment skill)
- •SLURM cluster access
Workflow
High-Level Steps
- •Locate experiment - Find experiment directory (current dir or ask user)
- •Verify scaffolding - Ensure configs exist for optimization and evaluation
- •Read tool specifications - Parse experiment_summary.yaml "tools" section
- •Execute optimization - Call optimizer module (torchtune)
- •Execute evaluation - Call evaluator module (inspect) - MUST wait for optimization
- •Create orchestration log - Document process in run-experiment.log
- •Report combined summary - Show complete status
Tool Modules
Optimizer modules: See optimizers/ for tool-specific execution logic
- •Currently supported: torchtune (fine-tuning)
- •Future: DSPy (prompt optimization), custom trainers
Evaluator modules: See evaluators/ for tool-specific execution logic
- •Currently supported: inspect-ai
- •Future: custom evaluation frameworks
Detailed Workflows
For step-by-step execution details:
- •Torchtune execution: workflows/torchtune.md
- •Inspect execution: workflows/inspect.md
Reading Tool Specifications
Parse experiment_summary.yaml "tools" section to identify frameworks:
Expected format:
tools:
  preparation: "torchtune"
  evaluation: "inspect-ai"
Tool to module mapping:
- •torchtune → optimizers/torchtune/
- •inspect-ai → evaluators/inspect/
If tools section missing: Assume torchtune + inspect-ai (backward compatibility)
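The lookup and fallback described above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the function name `resolve_tool_modules` is hypothetical, the input is assumed to be the already-parsed YAML mapping (for example from a YAML loader), and falling back per key when only one entry is missing is an assumption that goes beyond the stated rule.

```python
from typing import Mapping, Optional

# Taken from the "Tool to module mapping" list above.
TOOL_MODULES = {
    "torchtune": "optimizers/torchtune/",
    "inspect-ai": "evaluators/inspect/",
}

# Defaults applied when the "tools" section is missing (backward compatibility).
DEFAULT_TOOLS = {"preparation": "torchtune", "evaluation": "inspect-ai"}


def resolve_tool_modules(summary: Optional[Mapping]) -> dict:
    """Return {stage: module_path} for a parsed experiment_summary.yaml.

    Hypothetical helper: `summary` is the parsed YAML mapping, or None/empty
    if the file has no usable content.
    """
    tools = (summary or {}).get("tools") or DEFAULT_TOOLS
    modules = {}
    for stage, default_tool in DEFAULT_TOOLS.items():
        # Assumption: a single missing key also falls back to its default.
        tool = tools.get(stage, default_tool)
        if tool not in TOOL_MODULES:
            raise ValueError(f"Unsupported {stage} tool: {tool!r}")
        modules[stage] = TOOL_MODULES[tool]
    return modules
```

Raising on an unknown tool name, rather than guessing a module, keeps an unsupported framework from being silently routed to torchtune or inspect-ai.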
Sequential Execution
CRITICAL: Evaluation MUST wait for optimization to complete.
Why? Evaluation jobs need optimized model checkpoints.
Implementation:
- •Execute optimizer module (torchtune fine-tuning)
- •Monitor until ALL optimization jobs complete
- •Only then execute evaluator module (inspect evaluation)
- •Monitor until ALL evaluation jobs complete
- •Report combined results
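One way to picture the ordering constraint is the sketch below. It is illustrative only: `wait_for_jobs` and `run_sequentially` are hypothetical names, the submit and state-lookup callables are supplied by the caller, and passing only the completed optimization jobs on to evaluation is an assumption about how partial success would be handled. The state names are standard SLURM job states.

```python
import time
from typing import Callable, Dict, Iterable, List

# SLURM job states after which a job will not change state again.
TERMINAL_STATES = {
    "COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "NODE_FAIL", "OUT_OF_MEMORY",
}


def wait_for_jobs(
    job_ids: Iterable[str],
    get_state: Callable[[str], str],
    poll_seconds: float = 60.0,
) -> Dict[str, str]:
    """Block until every job reaches a terminal state; return {job_id: state}."""
    pending = set(job_ids)
    final: Dict[str, str] = {}
    while pending:
        for job_id in sorted(pending):
            state = get_state(job_id)
            if state in TERMINAL_STATES:
                final[job_id] = state
        pending.difference_update(final)
        if pending:
            time.sleep(poll_seconds)
    return final


def run_sequentially(
    submit_optimization: Callable[[], List[str]],
    submit_evaluation: Callable[[List[str]], List[str]],
    get_state: Callable[[str], str],
    poll_seconds: float = 60.0,
) -> Dict[str, Dict[str, str]]:
    """Run optimization to completion, then evaluation, never overlapping."""
    optimization = wait_for_jobs(submit_optimization(), get_state, poll_seconds)
    completed = sorted(j for j, s in optimization.items() if s == "COMPLETED")
    if not completed:
        # No checkpoints were produced, so evaluation is never submitted.
        return {"optimization": optimization, "evaluation": {}}
    # Assumption: evaluation is submitted only for runs that completed.
    evaluation = wait_for_jobs(submit_evaluation(completed), get_state, poll_seconds)
    return {"optimization": optimization, "evaluation": evaluation}
```

Injecting `get_state` keeps the sketch independent of how job status is queried; in practice it might wrap a SLURM accounting query, but that wiring is left to the tool modules.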
Logging
Create orchestration log at {experiment_dir}/run-experiment.log:
Log format:
[YYYY-MM-DD HH:MM:SS] ACTION: Description
Details: {specifics}
Result: {outcome}
What to log:
- •Experiment discovery and validation
- •Scaffolding verification
- •Tool module invocations (timestamps, results, durations)
- •Completion status (successes/failures)
- •Errors or warnings
- •Final combined summary
- •Paths to results and module logs
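As a rough illustration of the log format above, the following sketch formats one entry and appends it to run-experiment.log. The helper names `format_entry` and `append_entry` are hypothetical, and the use of local time for the timestamp is an assumption.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Union


def format_entry(
    action: str,
    description: str,
    details: str,
    result: str,
    now: Optional[datetime] = None,
) -> str:
    """Render one entry in the '[YYYY-MM-DD HH:MM:SS] ACTION: Description' format."""
    # Assumption: timestamps use local time unless the caller passes `now`.
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    return f"[{stamp}] {action}: {description}\nDetails: {details}\nResult: {result}\n"


def append_entry(experiment_dir: Union[str, Path], entry: str) -> Path:
    """Append an entry to {experiment_dir}/run-experiment.log and return its path."""
    log_path = Path(experiment_dir) / "run-experiment.log"
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return log_path
```

Opening the file in append mode means re-running the skill adds to the existing log instead of overwriting earlier entries.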
Expected Outputs
After successful execution:
Logs created:
- •run-experiment.log - Orchestration log
- •Optimizer module logs (e.g., detailed fine-tuning execution)
- •Evaluator module logs (e.g., detailed evaluation execution)
Status updated:
- •Run tracking logs updated with job IDs, timestamps, states
- •All execution details recorded in module logs
Artifacts created:
- •Model checkpoints from optimization
- •Evaluation logs from evaluation
Module logging: Each tool module logs its actions to {experiment_dir}/run-{torchtune|inspect}.log (see logging.md)
Error Handling
If experiment_summary.yaml not found:
- •Suggest running design-experiment skill first
- •Do not proceed
If scaffolding incomplete:
- •Report which parts missing
- •Suggest running scaffold-experiment skill
- •If only the evaluation configs are missing, optimization alone can still proceed
If optimization fails:
- •Log failure details
- •Do NOT proceed to evaluation (missing model checkpoints)
- •Report failure and stop
If evaluation fails:
- •Log failure details
- •Optimization results still valid
- •Report partial success
If user cancels:
- •SLURM jobs continue running independently
- •Can resume monitoring by re-running skill
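The pre-flight rules above (missing summary, incomplete scaffolding) could be condensed into a single decision function. This is a sketch under stated assumptions: `plan_stages` is a hypothetical name, the three boolean inputs are assumed to come from earlier checks, and treating missing optimization configs as a full stop is an inference from the rules rather than something they state outright.

```python
from typing import List, Tuple


def plan_stages(
    summary_found: bool,
    optimization_configs_ok: bool,
    evaluation_configs_ok: bool,
) -> Tuple[List[str], str]:
    """Return (stages that may run, advice for the user)."""
    if not summary_found:
        return [], "Run the design-experiment skill first."
    if not optimization_configs_ok:
        # Assumption: without optimization configs nothing can run.
        return [], "Run the scaffold-experiment skill."
    if not evaluation_configs_ok:
        return ["optimization"], "Run the scaffold-experiment skill before evaluation."
    return ["optimization", "evaluation"], ""
```

Runtime failures (a failed optimization or evaluation job) are not covered here; they are detected only after jobs have been submitted.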
Validation Checklist
Before reporting success, verify:
- •✓ experiment_summary.yaml found and read
- •✓ Scaffolding verified
- •✓ Optimizer module executed and completed
- •✓ Evaluator module executed and completed
- •✓ Model checkpoints exist
- •✓ Evaluation logs exist
- •✓ Orchestration log created
- •✓ All module logs exist
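Part of this checklist can be verified mechanically. The sketch below is illustrative: `failed_checks` is a hypothetical helper, the module log names follow the run-{torchtune|inspect}.log pattern mentioned earlier, and the checkpoint and evaluation log locations are passed in by the caller because this document does not specify where they live.

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def failed_checks(
    experiment_dir: PathLike,
    checkpoint_paths: Iterable[PathLike],
    evaluation_log_paths: Iterable[PathLike],
) -> List[str]:
    """Return the names of checklist items that do not hold (empty list = all pass)."""
    root = Path(experiment_dir)
    checkpoints = [Path(p) for p in checkpoint_paths]
    eval_logs = [Path(p) for p in evaluation_log_paths]
    module_logs = ("run-torchtune.log", "run-inspect.log")
    checks = {
        "experiment_summary.yaml found": (root / "experiment_summary.yaml").is_file(),
        "orchestration log created": (root / "run-experiment.log").is_file(),
        "module logs exist": all((root / name).is_file() for name in module_logs),
        # An empty list counts as a failure: at least one artifact is expected.
        "model checkpoints exist": bool(checkpoints) and all(p.exists() for p in checkpoints),
        "evaluation logs exist": bool(eval_logs) and all(p.exists() for p in eval_logs),
    }
    return [name for name, ok in checks.items() if not ok]
```

The items about scaffolding and module completion depend on job state rather than on files, so they are not covered by this file-based check.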
Output Summary
Provide comprehensive summary after completion:
## Run Experiment Complete
Experiment: `{experiment_dir}`
### Optimization Results
✓ {N}/{M} runs completed successfully
Duration: {duration}
**Completed runs:** [list with times]
**Failed runs:** [list with errors]
**Model checkpoints:** {paths}
### Evaluation Results
✓ {N}/{M} evaluations completed successfully
Duration: {duration}
**Completed evaluations:** [list with times]
**Failed evaluations:** [list with errors]
**Evaluation logs:** {paths}
### Total Time
Complete workflow: {total_duration}
- Optimization: {opt_duration}
- Evaluation: {eval_duration}
### Next Steps
1. View results: `inspect view --port=$(get_free_port)`
2. Export data: `inspect log export ...`
3. Analyze results (see experiment_summary.yaml for configuration)
Optional: Generate Summary
After completing the experiment, offer to generate a summary:
Experiment complete! Would you like me to generate a summary.md with key metrics? This will extract final loss values and evaluation accuracy for easy comparison. [Y/n]
If yes: Invoke the summarize-experiment skill to create summary.md
If no: Skip summarization. User can run summarize-experiment manually later.
Important Notes
Orchestration principles:
- •This skill orchestrates rather than implements
- •Each tool module maintains its own detailed log
- •Sequential execution is mandatory (evaluation requires optimization complete)
- •Partial success is acceptable (some runs succeed, others fail)
- •Tool modules can be executed independently if needed
Relationship to other skills:
- •Before: design-experiment, scaffold-experiment
- •After: summarize-experiment (optional), analyze-experiment (planned)
- •Standalone: Individual tool modules can run independently
Resumability:
- •Re-running run-experiment is safe
- •Tool modules check for completed jobs
- •Won't re-submit successful jobs
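The resumability behavior can be illustrated with a small planning function. This is a sketch, not the modules' actual logic: `plan_resume` is a hypothetical name, `recorded_states` stands in for whatever the run tracking logs contain, and monitoring in-flight jobs while re-submitting failed ones is an assumption beyond the stated rule that successful jobs are not re-submitted.

```python
from typing import Dict, Iterable, List, Mapping

# Assumption: jobs in these SLURM states are still in flight and only need monitoring.
ACTIVE_STATES = {"PENDING", "RUNNING"}


def plan_resume(
    run_names: Iterable[str],
    recorded_states: Mapping[str, str],
) -> Dict[str, List[str]]:
    """Split runs into those to skip, keep monitoring, or (re-)submit."""
    plan: Dict[str, List[str]] = {"skip": [], "monitor": [], "submit": []}
    for name in run_names:
        state = recorded_states.get(name)
        if state == "COMPLETED":
            plan["skip"].append(name)       # successful jobs are never re-submitted
        elif state in ACTIVE_STATES:
            plan["monitor"].append(name)    # already submitted, still in flight
        else:
            plan["submit"].append(name)     # never submitted, or ended unsuccessfully
    return plan
```

For example, `plan_resume(["a", "b"], {"a": "COMPLETED"})` would skip `a` and submit `b`.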