Code Evolution

Architecture

orchestrator (you)
├── spawn agents (Task tool, subagent_type='general-purpose')
├── evaluate solutions (run evaluate.py)
├── manage archive (best solutions per generation)
└── plan next generation

Critical Principle: Agent Autonomy

NEVER write solution code yourself. You (the orchestrator) ONLY:

•Create the fixed evaluation harness (read-only for agents)
•Spawn autonomous subagents via Task tool
•Evaluate results using the harness
•Plan next generation based on results

Agents have full autonomy to implement their assigned approach. You don't guide their code - you guide their problem-solving strategy.

Workflow

Phase 0: Setup (Orchestrator Only)

Create the immutable harness. Agents can ONLY use it, never alter it:

•problems/<name>/problem.md - problem definition (READ-ONLY for agents)
•problems/<name>/evaluation/evaluate.py - evaluation function (FROZEN, not modifiable by agents)
•problems/<name>/config.json - benchmark, constraints, metadata

Agents receive paths to these files but cannot modify them.
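The text does not say how the read-only guarantee is enforced. One possible approach, sketched here as an assumption rather than the documented mechanism, is to clear the write permission bits on the harness files after creating them. The helper names `freeze_harness` and `is_frozen` are hypothetical.

```python
import os
import stat
from pathlib import Path

# All three write bits (owner, group, other).
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def freeze_harness(paths):
    """Clear every write permission bit on each harness file (hypothetical helper)."""
    for path in map(Path, paths):
        mode = stat.S_IMODE(path.stat().st_mode)
        os.chmod(path, mode & ~WRITE_BITS)


def is_frozen(path):
    """Return True if no write bit is set on the file."""
    return (Path(path).stat().st_mode & WRITE_BITS) == 0
```

File permissions are only a best-effort guard: a process running as the file owner can restore them, so the orchestrator may still want to compare the harness against a known copy before trusting a score.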

Phase 1: Generation Loop (3-7 generations)

•Plan Strategies: Design 2-4 different approaches for agents to explore
•Spawn Agents: Use Task tool with subagent_type='general-purpose' (15s timeout per agent)
    •Each agent gets problem description, their specific approach, and path to evaluator
    •Agents write solutions to generations/gen{N}/agent_{id}.py
    •Agents run themselves: subprocess.run([sys.executable, agent_file])
    •Output: JSON with "score" and "circles"
•Evaluate: You run the evaluator on agent outputs (agents cannot run it themselves)
•Cross-Inspiration: Share winning ideas with the next generation's agents
•Prune: Keep only the best 1-2 approaches from previous generation
•Archive: Store best solution to generations/archive/
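The run step in the loop above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name `run_agent` is hypothetical, the default timeout reuses the 15 s figure on the assumption that it applies to running the script, and the JSON result is assumed to be the last line of stdout.

```python
import json
import subprocess
import sys


def run_agent(agent_file, timeout_s=15):
    """Run one agent script; return its parsed JSON result, or None on any failure."""
    try:
        proc = subprocess.run(
            [sys.executable, str(agent_file)],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # agent exceeded its time budget
    if proc.returncode != 0:
        return None  # agent crashed
    lines = proc.stdout.strip().splitlines()
    if not lines:
        return None
    try:
        # Assumption: the JSON result is the final stdout line.
        result = json.loads(lines[-1])
    except json.JSONDecodeError:
        return None
    # The output contract requires both "score" and "circles".
    if not isinstance(result, dict) or "score" not in result or "circles" not in result:
        return None
    return result
```

The "score" an agent reports is only a claim; the number that counts is the one the frozen evaluator computes from the returned "circles".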

Phase 2: Cross-Inspiration & Pruning

Between generations:

•Reference winners: Show agents the best previous solution's strategy
•Prune dead approaches: Stop testing approaches that underperform
•Mix winning ideas: Combine best techniques from multiple agents
•Diversify within winners: Vary parameters (seeds, iteration counts, thresholds)
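Pruning and diversification amount to a ranking step followed by a parameter sweep over the survivors. The sketch below assumes the orchestrator keeps a mapping from approach name to harness score; that shape and the names `prune` and `diversify` are illustrative, not part of the skill.

```python
from itertools import product


def prune(results, keep=2):
    """Return the `keep` highest-scoring approach names, best first.

    `results` maps approach name -> harness score (hypothetical shape).
    """
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:keep]]


def diversify(winners, seeds):
    """Pair every surviving approach with every seed to form next-generation assignments."""
    return [{"approach": name, "seed": seed} for name, seed in product(winners, seeds)]
```

Each resulting assignment becomes the brief for one spawned agent. The orchestrator still only plans, and the agents write the code.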

File Structure

problems/<name>/
├── problem.md
├── config.json
├── evaluation/evaluate.py
└── generations/
    ├── gen1/agent_*.py
    └── archive/best_solution.py
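Given this layout, the archive step could look like the sketch below. The function name `archive_best` and the shape of `scores` (agent file path mapped to harness score) are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def archive_best(problem_dir, scores):
    """Copy the top-scoring agent file to generations/archive/best_solution.py.

    `scores` maps agent file path -> harness score (hypothetical shape).
    """
    best_file = max(scores, key=scores.get)
    archive_dir = Path(problem_dir) / "generations" / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / "best_solution.py"
    shutil.copy2(best_file, target)  # overwrites any previous archive entry
    return target
```

Because the copy overwrites the previous entry, a caller would compare the new best score against the archived one before calling this, so that a weaker generation never replaces a stronger solution.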

Core Design Principles

Separation of Concerns

•Orchestrator role: Strategy planning, harness building, result evaluation, pruning
•Agent role: Implementation autonomy within their assigned strategy
•Harness: Frozen, read-only, immutable contract between them

Evolution Mechanics

•Diverse exploration (Gen 1-3): Different approaches find different optima
•Cross-inspiration (Gen 2+): Winning ideas inspire next generation
•Pruning (Gen 3+): Kill weak approaches, double down on winners
•Multi-start within winners: Vary parameters of proven strategies (+2-5% improvement)
•Validation first: Invalid solutions score 0 - harness is source of truth

Evolution Strategy

Phase	Generations	Orchestrator Action
Explore	1-3	Spawn 3-4 agents with diverse strategies. Find winners.
Prune	After Gen 2-3	Kill underperforming approaches. Keep 1-2 best.
Cross-Inspire	Before Gen 4+	Share winning solution code/strategy with next agents.
Exploit	4-5	Spawn agents that refine/combine winning approaches. Vary seeds/params.
Polish	6-7	Multi-start within best approach. Push toward benchmark.
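Read as a lookup, the table maps a generation number to the orchestrator's main mode. A minimal sketch, with the hypothetical name `phase_for_generation`; pruning and cross-inspiration happen between generations, so they are not returned here.

```python
def phase_for_generation(gen):
    """Map a 1-based generation number to its phase, following the table above."""
    if gen < 1:
        raise ValueError("generation numbers start at 1")
    if gen <= 3:
        return "explore"
    if gen <= 5:
        return "exploit"
    # Generations 6-7; anything later is treated as more polishing (assumption).
    return "polish"
```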

Orchestrator Responsibilities

What YOU Do (Never Delegate)

•Create immutable evaluation harness (problem definition, evaluator, config)
•Spawn agents with Task tool
•Analyze results and plan next generation
•Prune: Decide which approaches to continue, which to kill
•Cross-inspire: Extract winning ideas and share with next agents
•Archive best solutions

What Agents Do (Full Autonomy)

•Implement their assigned strategy
•Write solution code
•Self-validate before output
•Run themselves and produce JSON output

Cross-Inspiration Strategy

After each generation, extract and communicate:

## What Worked
- Agent X achieved Y% with [strategy description]
- Key insight: [what made it work]
- Code reference: [location or snippet]

## What Failed
- Agent Z's [strategy] only achieved W%
- Likely issue: [root cause analysis]
- Don't repeat: [specific thing to avoid]

## Recommended Evolution
- Agents should build on: [winning strategy]
- Vary these parameters: [list of what to try]
- Combine techniques: [which ideas from multiple winners]

Agents use this to:

•Understand what works (cross-inspiration)
•Avoid dead ends (pruning knowledge)
•Focus effort on proven directions
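The report can also be produced mechanically from the generation's results. The sketch below renders a reduced form of the three-section layout; the function name `render_report` and the dictionary keys are hypothetical.

```python
def render_report(worked, failed, recommendations):
    """Render cross-inspiration notes in the three-section layout.

    worked / failed: lists of dicts with hypothetical keys
    'agent', 'score_pct', 'strategy', and 'note'.
    recommendations: list of plain strings.
    """
    lines = ["## What Worked"]
    for item in worked:
        lines.append(
            f"- Agent {item['agent']} achieved {item['score_pct']}% with {item['strategy']}"
        )
        lines.append(f"- Key insight: {item['note']}")
    lines += ["", "## What Failed"]
    for item in failed:
        lines.append(
            f"- Agent {item['agent']}'s {item['strategy']} only achieved {item['score_pct']}%"
        )
        lines.append(f"- Likely issue: {item['note']}")
    lines += ["", "## Recommended Evolution"]
    lines += [f"- {text}" for text in recommendations]
    return "\n".join(lines)
```

The returned string can be pasted directly into the prompt for each agent in the next generation.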

References

•Agent spawning: See references/agent-prompts.md
•Evaluator template: See references/evaluator-template.md

Adding New Problems

•Create problems/<name>/problem.md (objective, constraints, benchmark, format)
•Create problems/<name>/config.json (benchmark value, metadata)
•Create problems/<name>/evaluation/evaluate.py (validate, score, evaluate functions)
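A skeleton for the three evaluator functions might look like the following. The problem itself is an assumption: the "circles" output key suggests a circle-packing task, so this sketch validates non-overlapping circles `(x, y, r)` inside the unit square and scores by the sum of radii. Neither the geometry nor the scoring rule comes from the text, and the actual template lives in references/evaluator-template.md.

```python
import math


def validate(circles, tol=1e-9):
    """Return True if every circle fits in the unit square and no two overlap.

    Assumed format: each circle is an (x, y, r) triple.
    """
    for x, y, r in circles:
        if r <= 0:
            return False
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return False
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            xi, yi, ri = circles[i]
            xj, yj, rj = circles[j]
            if math.hypot(xi - xj, yi - yj) < ri + rj - tol:
                return False
    return True


def score(circles):
    """Assumed objective: the sum of radii."""
    return sum(r for _, _, r in circles)


def evaluate(solution):
    """Validate first; invalid or malformed solutions score 0."""
    try:
        circles = solution["circles"]
        valid = validate(circles)
    except (KeyError, TypeError, ValueError):
        valid = False
    return {"valid": valid, "score": score(circles) if valid else 0.0}
```

Keeping `validate` separate from `score` means an invalid packing never earns partial credit, matching the rule that the harness is the source of truth.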