Code Evolution
Architecture
code
orchestrator (you)
├── spawn agents (Task tool, subagent_type='general-purpose')
├── evaluate solutions (run evaluate.py)
├── manage archive (best solutions per generation)
└── plan next generation
Critical Principle: Agent Autonomy
NEVER write solution code yourself. You (the orchestrator) ONLY:
- •Create the fixed evaluation harness (read-only for agents)
- •Spawn autonomous subagents via Task tool
- •Evaluate results using the harness
- •Plan next generation based on results
Agents have full autonomy to implement their assigned approach. You don't guide their code - you guide their problem-solving strategy.
Workflow
Phase 0: Setup (Orchestrator Only)
Create the immutable harness, which agents can use but never alter:
- •problems/<name>/problem.md - problem definition (READ-ONLY for agents)
- •problems/<name>/evaluation/evaluate.py - evaluation function (FROZEN, not modifiable by agents)
- •problems/<name>/config.json - benchmark, constraints, metadata
Agents receive paths to these files but cannot modify them.
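One way to check that the harness stayed frozen is to fingerprint it during setup and compare again after each generation. The sketch below is an assumption rather than part of the workflow described here: the helper names `fingerprint` and `harness_unchanged` are hypothetical, and the approach detects modification instead of preventing it.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Return a SHA-256 digest covering the path and bytes of every harness file."""
    digest = hashlib.sha256()
    # Sort so the digest does not depend on the order the paths were given in.
    for path in sorted(str(p) for p in paths):
        digest.update(path.encode("utf-8"))
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def harness_unchanged(paths, expected):
    """Return True if the harness still matches the digest recorded at setup."""
    return fingerprint(paths) == expected
```

Recording the digest once in Phase 0 and calling `harness_unchanged` before each evaluation would flag any agent that edited `evaluate.py` or `problem.md`.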
Phase 1: Generation Loop (3-7 generations)
- •Plan Strategies: Design 2-4 different approaches for agents to explore
- •Spawn Agents: Use Task tool with subagent_type='general-purpose' (15s timeout per agent)
  - •Each agent gets problem description, their specific approach, and path to evaluator
  - •Agents write solutions to generations/gen{N}/agent_{id}.py
  - •Agents run themselves: subprocess.run([sys.executable, agent_file])
  - •Output: JSON with "score" and "circles"
- •Evaluate: You run evaluator on agent outputs (agents cannot run this)
- •Cross-Inspiration: Share winning ideas with next generation agents for inspiration
- •Prune: Keep only the best 1-2 approaches from previous generation
- •Archive: Store best solution to generations/archive/
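The run step in the loop above can be sketched as follows. This is a minimal illustration, not the actual harness: it assumes each agent script prints its JSON to stdout, it applies the 15-second budget to the subprocess itself, and the function name `run_agent` is hypothetical.

```python
import json
import subprocess
import sys


def run_agent(agent_file, timeout_s=15):
    """Run one agent script and return its parsed JSON output, or None on failure."""
    try:
        proc = subprocess.run(
            [sys.executable, agent_file],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # agent exceeded its time budget
    if proc.returncode != 0:
        return None  # agent crashed
    try:
        result = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return None  # output was not valid JSON
    # Assumption: a usable result is an object carrying both keys named above.
    if not isinstance(result, dict):
        return None
    if not all(key in result for key in ("score", "circles")):
        return None
    return result
```

Returning `None` for every failure mode keeps the loop simple: a missing result can be treated as a score of 0 when the generation is ranked. The self-reported `"score"` is only a hint, since the evaluator remains the source of truth.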
Phase 2: Cross-Inspiration & Pruning
Between generations:
- •Reference winners: Show agents the best previous solution's strategy
- •Prune dead approaches: Stop testing approaches that underperform
- •Mix winning ideas: Combine best techniques from multiple agents
- •Diversify within winners: Vary parameters (seeds, iteration counts, thresholds)
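Pruning between generations amounts to ranking approaches by evaluator score and keeping the top one or two. A minimal sketch, assuming scores are collected in a dict keyed by approach name (the function name `prune` and the sample values are illustrative only):

```python
def prune(results, keep=2):
    """Return the names of the `keep` highest-scoring approaches.

    `results` maps an approach name to the evaluator's score for it.
    """
    ranked = sorted(results, key=results.get, reverse=True)
    return ranked[:keep]


# Made-up scores for three hypothetical approaches.
scores = {"greedy": 2.31, "annealing": 2.58, "grid": 1.97}
print(prune(scores))  # ['annealing', 'greedy']
```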
File Structure
code
problems/<name>/
├── problem.md
├── config.json
├── evaluation/evaluate.py
└── generations/
    ├── gen1/agent_*.py
    └── archive/best_solution.py
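Given this layout, archiving a generation's winner can be sketched as a single file copy. The function name `archive_best` and the shape of `scores` are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def archive_best(problem_dir, generation, scores):
    """Copy the top-scoring agent file of one generation into the archive.

    `scores` maps agent file names (e.g. "agent_1.py") to evaluator scores.
    """
    problem_dir = Path(problem_dir)
    best_name = max(scores, key=scores.get)
    source = problem_dir / "generations" / f"gen{generation}" / best_name
    archive_dir = problem_dir / "generations" / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / "best_solution.py"
    shutil.copyfile(source, target)  # overwrites any earlier archived file
    return target
```

Note that this sketch overwrites `best_solution.py` unconditionally, so the caller should only invoke it when the new score beats the archived one.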
Core Design Principles
Separation of Concerns
- •Orchestrator role: Strategy planning, harness building, result evaluation, pruning
- •Agent role: Implementation autonomy within their assigned strategy
- •Harness: Frozen, read-only, immutable contract between them
Evolution Mechanics
- •Diverse exploration (Gen 1-3): Different approaches find different optima
- •Cross-inspiration (Gen 2+): Winning ideas inspire next generation
- •Pruning (Gen 3+): Kill weak approaches, double down on winners
- •Multi-start within winners: Vary parameters of proven strategies (+2-5% improvement)
- •Validation first: Invalid solutions score 0 - harness is source of truth
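Multi-start within a winning strategy can be sketched as a small parameter grid, with each variant handed to one agent. The function name `parameter_variants` and the parameter names in the usage example are hypothetical.

```python
from itertools import product


def parameter_variants(base, grid):
    """Yield copies of `base` with every combination of values in `grid` applied."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        variant = dict(base)  # copy so the base settings stay untouched
        variant.update(zip(names, values))
        yield variant


base = {"strategy": "annealing"}
grid = {"seed": [0, 1], "iterations": [1000, 5000]}
for variant in parameter_variants(base, grid):
    print(variant)
```

With two seeds and two iteration counts this yields four variants, which fits the 2-4 agents per generation used in the loop.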
Evolution Strategy
| Phase | Generations | Orchestrator Action |
|---|---|---|
| Explore | 1-3 | Spawn 3-4 agents with diverse strategies. Find winners. |
| Prune | After Gen 2-3 | Kill underperforming approaches. Keep 1-2 best. |
| Cross-Inspire | Before Gen 4+ | Share winning solution code/strategy with next agents. |
| Exploit | 4-5 | Spawn agents that refine/combine winning approaches. Vary seeds/params. |
| Polish | 6-7 | Multi-start within best approach. Push toward benchmark. |
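The generation ranges in the table can be expressed as a small lookup. This sketch covers only the three phases tied to a generation number (Explore, Exploit, Polish); Prune and Cross-Inspire happen between generations and are left out. The function name `phase_for` is an assumption.

```python
def phase_for(generation):
    """Map a 1-based generation number to its phase from the table."""
    if generation < 1:
        raise ValueError("generations are numbered from 1")
    if generation <= 3:
        return "explore"
    if generation <= 5:
        return "exploit"
    # Generations 6-7 polish; later ones (if any) are treated the same way.
    return "polish"
```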
Orchestrator Responsibilities
What YOU Do (Never Delegate)
- •Create immutable evaluation harness (problem definition, evaluator, config)
- •Spawn agents with Task tool
- •Analyze results and plan next generation
- •Prune: Decide which approaches to continue, which to kill
- •Cross-inspire: Extract winning ideas and share with next agents
- •Archive best solutions
What Agents Do (Full Autonomy)
- •Implement their assigned strategy
- •Write solution code
- •Self-validate before output
- •Run themselves and produce JSON output
Cross-Inspiration Strategy
After each generation, extract and communicate:
markdown
## What Worked
- Agent X achieved Y% with [strategy description]
- Key insight: [what made it work]
- Code reference: [location or snippet]

## What Failed
- Agent Z's [strategy] only achieved W%
- Likely issue: [root cause analysis]
- Don't repeat: [specific thing to avoid]

## Recommended Evolution
- Agents should build on: [winning strategy]
- Vary these parameters: [list of what to try]
- Combine techniques: [which ideas from multiple winners]
Agents use this to:
- •Understand what works (cross-inspiration)
- •Avoid dead ends (prune knowledge)
- •Focus effort on proven directions
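The report can be assembled programmatically before it goes into the next generation's agent prompts. The sketch below covers a subset of the template fields and assumes the percentages are scores relative to the benchmark; the function name `render_report` and the tuple layout are hypothetical.

```python
def render_report(worked, failed, recommendations):
    """Render cross-inspiration notes as markdown text.

    `worked` and `failed` are lists of (agent, percent, strategy, note) tuples;
    `recommendations` is a list of plain strings.
    """
    lines = ["## What Worked"]
    for agent, percent, strategy, insight in worked:
        lines.append(f"- {agent} achieved {percent:.1f}% with {strategy}")
        lines.append(f"- Key insight: {insight}")
    lines.append("## What Failed")
    for agent, percent, strategy, issue in failed:
        lines.append(f"- {agent}'s {strategy} only achieved {percent:.1f}%")
        lines.append(f"- Likely issue: {issue}")
    lines.append("## Recommended Evolution")
    lines.extend(f"- {item}" for item in recommendations)
    return "\n".join(lines)
```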
References
- •Agent spawning: See references/agent-prompts.md
- •Evaluator template: See references/evaluator-template.md
Adding New Problems
- •Create problems/<name>/problem.md (objective, constraints, benchmark, format)
- •Create problems/<name>/config.json (benchmark value, metadata)
- •Create problems/<name>/evaluation/evaluate.py (validate, score, evaluate functions)
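As a rough illustration of the three evaluator functions, here is a sketch for a hypothetical circle-packing problem. Everything problem-specific is an assumption: circles are taken to be `[x, y, r]` triples inside the unit square, the objective is taken to be the sum of radii, and `benchmark` is a positive number that would come from `config.json`. The real template is the one in references/evaluator-template.md.

```python
import math


def validate(circles, tol=1e-9):
    """Return True if every circle lies in the unit square and no two overlap."""
    for x, y, r in circles:
        if r <= 0:
            return False
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return False
    for i, (x1, y1, r1) in enumerate(circles):
        for x2, y2, r2 in circles[i + 1:]:
            if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - tol:
                return False
    return True


def score(circles):
    """Sum of radii (the assumed objective)."""
    return sum(r for _, _, r in circles)


def evaluate(result, benchmark):
    """Score a parsed agent result; invalid solutions score 0."""
    try:
        circles = [tuple(map(float, c)) for c in result["circles"]]
        valid = validate(circles)
    except (KeyError, TypeError, ValueError):
        valid = False  # malformed output counts as invalid
    value = score(circles) if valid else 0.0
    return {"valid": valid, "score": value, "ratio": value / benchmark}
```

The evaluator recomputes the score from the reported circles instead of trusting the agent's own `"score"` field, which keeps the harness the source of truth.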