LLM Capability Matching for Multi-Agent Development
Assign tasks to the most suitable LLMs based on live research, user budget, and task requirements.
Do NOT rely on hardcoded model scores. Models and pricing change frequently. Always use WebSearch to verify current capabilities before making assignments. See references/llm-strengths.md for the full decision protocol.
Workflow
Step 1: Ask Which LLMs Are Available
code
Which LLMs/tools do you have available? What's your budget constraint? (none / moderate / tight)
Step 2: Research Current Capabilities (WebSearch-First)
For EACH LLM the user mentions:
code
WebSearch: "[Model Name] capabilities benchmarks pricing [current year]"
Verify from official sources:
- Context window (exact size)
- Pricing (input/output per 1M tokens)
- Strengths (from benchmarks, not assumptions)
- Known limitations
Present findings WITH source URLs. Never guess.
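The findings from this step can be kept in a small record so that nothing is presented without a source. The sketch below is one possible shape, not part of the skill itself: `ModelProfile`, its field names, and `build_query` are hypothetical, and the values must come from live research rather than from this example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ModelProfile:
    """Research notes for one model. All names here are hypothetical."""
    name: str
    context_window_tokens: int
    input_price_per_1m: float   # USD per 1M input tokens
    output_price_per_1m: float  # USD per 1M output tokens
    strengths: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)
    source_urls: list[str] = field(default_factory=list)
    verified_on: date = field(default_factory=date.today)

    def is_presentable(self) -> bool:
        """True only if at least one source URL backs the findings."""
        return len(self.source_urls) > 0


def build_query(model_name: str, year: int) -> str:
    """Fill in the WebSearch template from Step 2."""
    return f"{model_name} capabilities benchmarks pricing {year}"
```

A profile with an empty `source_urls` list fails `is_presentable()`, which mirrors the "Never guess" rule: unsourced numbers should not reach the user.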
Step 3: Categorize Tasks
| Category | Prioritize | Avoid |
|---|---|---|
| Architecture & system design | Strongest reasoning model | Fast/cheap models |
| Backend implementation | Good code + fast iteration | Overkill reasoning |
| Frontend / UI | Vision-capable, UI-aware | Code-only models |
| Testing | Thorough + cost-effective | Expensive flagship |
| Documentation | Large context + clear writing | Small context |
| DevOps / CI/CD | Broad knowledge | Narrow specialists |
| Refactoring | Code-focused, pattern-aware | Conversational models |
Step 4: Consider Constraints
| Constraint | Strategy |
|---|---|
| Budget limited | Use cheaper models for bulk, flagship for architecture only |
| Time critical | Use fastest-responding models |
| Quality critical | Use flagship for all phases |
| Large codebase | Prioritize largest context window |
| Single developer | Skip Phase 4; use one model for everything |
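The constraint table can be read as a small decision function. The following is a minimal sketch under assumed names: the constraint slugs, the category slug `"architecture"`, and the returned tier labels are all hypothetical, and a real assignment would still map each tier to a model verified in Step 2.

```python
def pick_tier(category: str, constraint: str) -> str:
    """Return a hypothetical model tier for one task category under one constraint."""
    if constraint == "budget_limited":
        # Flagship for architecture only; cheaper models for bulk work.
        return "flagship" if category == "architecture" else "budget"
    if constraint == "time_critical":
        return "fastest"
    if constraint == "quality_critical":
        # Flagship for all phases.
        return "flagship"
    if constraint == "large_codebase":
        return "largest_context"
    if constraint == "single_developer":
        # One model for everything.
        return "single_model"
    raise ValueError(f"Unknown constraint: {constraint!r}")
```

Raising on an unknown constraint, rather than falling back to a default tier, keeps a typo from silently producing an assignment.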
Step 5: Generate Assignment Matrix
markdown
| Agent ID | LLM | Tasks | Est. Cost | Rationale |
|----------|-----|-------|-----------|-----------|
| [ID] | [Model - verified] | [Tasks] | [Est - from live pricing] | [Why this model - with source] |
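If the assignments are held as structured data, the matrix can be generated rather than typed by hand. This is an illustrative sketch only: `Assignment` and `render_matrix` are hypothetical names, and the cost and rationale values are expected to come from the live research in Step 2.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """One row of the assignment matrix (hypothetical structure)."""
    agent_id: str
    llm: str
    tasks: list[str]
    est_cost_usd: float
    rationale: str


def render_matrix(rows: list[Assignment]) -> str:
    """Render assignments as a markdown table with the Step 5 columns."""
    lines = [
        "| Agent ID | LLM | Tasks | Est. Cost | Rationale |",
        "|----------|-----|-------|-----------|-----------|",
    ]
    for r in rows:
        tasks = ", ".join(r.tasks)
        lines.append(
            f"| {r.agent_id} | {r.llm} | {tasks} | ${r.est_cost_usd:.2f} | {r.rationale} |"
        )
    return "\n".join(lines)
```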
Cost Estimation
Token Estimates by Task Type
| Task Type | Est. Input | Est. Output |
|---|---|---|
| Architecture design | 5,000 | 3,000 |
| API endpoint (each) | 2,000 | 1,500 |
| React component | 3,000 | 2,000 |
| Unit test file | 1,500 | 2,000 |
| Integration test | 3,000 | 2,500 |
| Documentation page | 2,000 | 3,000 |
| Refactor module | 4,000 | 3,000 |
code
Total Cost = Sum(task_input_tokens * input_price + task_output_tokens * output_price) / 1,000,000
(input_price and output_price are the per-1M-token rates verified in Step 2)
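A worked version of the cost formula, assuming prices are quoted per 1M tokens as in Step 2. The token figures are copied from the table above; the dictionary keys and the function name `estimate_cost` are hypothetical, and the prices in the usage note are placeholders, not real rates.

```python
# (input_tokens, output_tokens) per task, from the token estimate table.
TOKEN_ESTIMATES = {
    "architecture_design": (5_000, 3_000),
    "api_endpoint": (2_000, 1_500),
    "react_component": (3_000, 2_000),
    "unit_test_file": (1_500, 2_000),
    "integration_test": (3_000, 2_500),
    "documentation_page": (2_000, 3_000),
    "refactor_module": (4_000, 3_000),
}


def estimate_cost(task_counts: dict[str, int],
                  input_price_per_1m: float,
                  output_price_per_1m: float) -> float:
    """Estimated cost in the same currency as the per-1M-token prices."""
    total_in = sum(TOKEN_ESTIMATES[t][0] * n for t, n in task_counts.items())
    total_out = sum(TOKEN_ESTIMATES[t][1] * n for t, n in task_counts.items())
    return (total_in * input_price_per_1m
            + total_out * output_price_per_1m) / 1_000_000
```

For example, one architecture design plus four API endpoints is 13,000 input and 9,000 output tokens. At placeholder prices of 2.00 and 10.00 per 1M tokens, `estimate_cost({"architecture_design": 1, "api_endpoint": 4}, 2.0, 10.0)` comes to about 0.116.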
Session Splitting Strategy
| Scenario | Recommendation |
|---|---|
| > 50K tokens expected | Split into phases |
| Context loss risk | Checkpoint every 20K tokens |
| Multiple modules | One session per module |
| Complex dependencies | Sequential sessions |
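The two numeric rules in this table can be sketched as a planning helper. This is a simplification under stated assumptions: `plan_session` is a hypothetical name, the thresholds are read as token counts, and checkpoints are computed unconditionally even though the table ties them to context loss risk.

```python
SPLIT_THRESHOLD = 50_000      # more than 50K tokens expected: split into phases
CHECKPOINT_INTERVAL = 20_000  # checkpoint every 20K tokens


def plan_session(expected_tokens: int) -> dict:
    """Suggest whether to split a session and where to checkpoint."""
    split = expected_tokens > SPLIT_THRESHOLD
    # Checkpoints fall at 20K, 40K, ... strictly before the session's end.
    checkpoints = list(
        range(CHECKPOINT_INTERVAL, expected_tokens, CHECKPOINT_INTERVAL)
    )
    return {"split_into_phases": split, "checkpoints": checkpoints}
```

The module and dependency rows of the table are structural decisions and are left to the planner; they do not reduce to a token threshold.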
Assignment Review Checklist
- [ ] All tasks have an assigned LLM
- [ ] Cost estimates from live pricing (not hardcoded)
- [ ] Token estimates reasonable
- [ ] Handoff points defined
- [ ] Session splitting planned
- [ ] User has approved assignments
Anti-Patterns
- Never hardcode model scores - they change with every release
- Never assume pricing - always verify current rates via WebSearch
- Never skip research - "I think Model X is good at Y" is not evidence
- Never ignore user experience - their hands-on experience outweighs benchmarks