
llm-capability-matching

Use when assigning development tasks across collaborating agents, or when estimating the cost of multi-agent collaboration.

SKILL.md
---
name: llm-capability-matching
description: Use when assigning development tasks to different LLMs or estimating costs for multi-agent work.
allowed-tools: Read, Write, WebSearch
---

LLM Capability Matching for Multi-Agent Development

Assign tasks to the most suitable LLMs based on live research, user budget, and task requirements.

Do NOT rely on hardcoded model scores. Models and pricing change frequently. Always use WebSearch to verify current capabilities before making assignments. See references/llm-strengths.md for the full decision protocol.


Workflow

Step 1: Ask Which LLMs Are Available

code
Which LLMs/tools do you have available?
What's your budget constraint? (none / moderate / tight)

Step 2: Research Current Capabilities (WebSearch-First)

For EACH LLM the user mentions:

code
WebSearch: "[Model Name] capabilities benchmarks pricing [current year]"

Verify from official sources:

  • Context window (exact size)
  • Pricing (input/output per 1M tokens)
  • Strengths (from benchmarks, not assumptions)
  • Known limitations

Present findings WITH source URLs. Never guess.
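
The research step can be sketched in Python as a query builder plus a record for the verified facts. This is a minimal sketch, not part of the skill itself: `build_search_query`, `ModelFacts`, and all field names are hypothetical, and the only rule it encodes is that findings need at least one source URL before they are presented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


def build_search_query(model_name: str, year: Optional[int] = None) -> str:
    """Fill in the Step 2 query template for one model."""
    if year is None:
        year = date.today().year
    return f"{model_name} capabilities benchmarks pricing {year}"


@dataclass
class ModelFacts:
    """Hypothetical record of what was verified for one model."""
    name: str
    context_window_tokens: int
    input_price_per_1m: float   # price per 1M input tokens
    output_price_per_1m: float  # price per 1M output tokens
    strengths: List[str] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)
    source_urls: List[str] = field(default_factory=list)

    def is_presentable(self) -> bool:
        """Findings may be shown only when at least one source URL backs them."""
        return len(self.source_urls) > 0
```

A record without sources fails `is_presentable()`, which is one way to enforce "Never guess" before anything reaches the user.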

Step 3: Categorize Tasks

| Category | Prioritize | Avoid |
|----------|------------|-------|
| Architecture & system design | Strongest reasoning model | Fast/cheap models |
| Backend implementation | Good code + fast iteration | Overkill reasoning |
| Frontend / UI | Vision-capable, UI-aware | Code-only models |
| Testing | Thorough + cost-effective | Expensive flagship |
| Documentation | Large context + clear writing | Small context |
| DevOps / CI/CD | Broad knowledge | Narrow specialists |
| Refactoring | Code-focused, pattern-aware | Conversational models |
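
If the categories need to be consulted programmatically, the table can be held as a plain lookup. The sketch below is illustrative only: `TASK_GUIDANCE` and `guidance_for` are hypothetical names, and the entries are copied from the table rather than derived from any model scores.

```python
from typing import Dict, Tuple

# category -> (what to prioritize, what to avoid), copied from the Step 3 table
TASK_GUIDANCE: Dict[str, Tuple[str, str]] = {
    "architecture & system design": ("Strongest reasoning model", "Fast/cheap models"),
    "backend implementation": ("Good code + fast iteration", "Overkill reasoning"),
    "frontend / ui": ("Vision-capable, UI-aware", "Code-only models"),
    "testing": ("Thorough + cost-effective", "Expensive flagship"),
    "documentation": ("Large context + clear writing", "Small context"),
    "devops / ci/cd": ("Broad knowledge", "Narrow specialists"),
    "refactoring": ("Code-focused, pattern-aware", "Conversational models"),
}


def guidance_for(category: str) -> Tuple[str, str]:
    """Return (prioritize, avoid) for a task category, ignoring case and padding."""
    key = category.strip().lower()
    if key not in TASK_GUIDANCE:
        raise KeyError(f"Unknown task category: {category!r}")
    return TASK_GUIDANCE[key]
```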

Step 4: Consider Constraints

| Constraint | Strategy |
|------------|----------|
| Budget limited | Use cheaper models for bulk, flagship for architecture only |
| Time critical | Use fastest-responding models |
| Quality critical | Use flagship for all phases |
| Large codebase | Prioritize largest context window |
| Single developer | Skip Phase 4; use one model for everything |
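
The budget and quality rows can be read as a tier choice per task. The following is a rough sketch under stated assumptions: the tier labels `"flagship"`, `"cheaper"`, and `"any"` are hypothetical, and letting the quality constraint win when both flags are set is an assumption the table does not settle.

```python
ARCHITECTURE = "architecture & system design"


def choose_tier(category: str,
                budget_limited: bool = False,
                quality_critical: bool = False) -> str:
    """Pick a model tier for one task category from two of the Step 4 constraints."""
    if quality_critical:
        # "Quality critical": flagship for all phases.
        # Assumption: this takes precedence over a limited budget.
        return "flagship"
    if budget_limited:
        # "Budget limited": flagship for architecture only, cheaper models for the bulk.
        is_architecture = category.strip().lower() == ARCHITECTURE
        return "flagship" if is_architecture else "cheaper"
    # No constraint applies: leave the choice to the Step 3 guidance.
    return "any"
```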

Step 5: Generate Assignment Matrix

markdown
| Agent ID | LLM | Tasks | Est. Cost | Rationale |
|----------|-----|-------|-----------|-----------|
| [ID] | [Model - verified] | [Tasks] | [Est - from live pricing] | [Why this model - with source] |
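
One way to produce the matrix from structured data is sketched below. The `Assignment` record and `render_matrix` are hypothetical helpers; only the column layout comes from the template above, and the dollar formatting is an assumption about the pricing currency.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assignment:
    """Hypothetical record for one row of the assignment matrix."""
    agent_id: str
    llm: str            # model name, verified in Step 2
    tasks: List[str]
    est_cost: float     # derived from live pricing, never hardcoded
    rationale: str      # should cite the source used


def render_matrix(assignments: List[Assignment]) -> str:
    """Render assignments as a markdown table with the Step 5 columns."""
    lines = [
        "| Agent ID | LLM | Tasks | Est. Cost | Rationale |",
        "|----------|-----|-------|-----------|-----------|",
    ]
    for a in assignments:
        tasks = ", ".join(a.tasks)
        lines.append(
            f"| {a.agent_id} | {a.llm} | {tasks} | ${a.est_cost:.2f} | {a.rationale} |"
        )
    return "\n".join(lines)
```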

Cost Estimation

Token Estimates by Task Type

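| Task Type | Est. Input (tokens) | Est. Output (tokens) |
|-----------|---------------------|----------------------|
| Architecture design | 5,000 | 3,000 |
| API endpoint (each) | 2,000 | 1,500 |
| React component | 3,000 | 2,000 |
| Unit test file | 1,500 | 2,000 |
| Integration test | 3,000 | 2,500 |
| Documentation page | 2,000 | 3,000 |
| Refactor module | 4,000 | 3,000 |

code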
Total Cost = Sum(task_input_tokens * input_price_per_1M + task_output_tokens * output_price_per_1M) / 1,000,000
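
The estimate can be worked through in a few lines of Python. This is a minimal sketch: `estimate_cost` is a hypothetical helper, the token figures are the ones from the table, and it assumes prices are quoted per 1M tokens as in Step 2, which is why the sum is divided by 1,000,000. Prices are passed in as arguments because they must come from live research.

```python
from typing import Dict, Tuple

# task type -> (estimated input tokens, estimated output tokens), from the table above
TOKEN_ESTIMATES: Dict[str, Tuple[int, int]] = {
    "Architecture design": (5_000, 3_000),
    "API endpoint": (2_000, 1_500),
    "React component": (3_000, 2_000),
    "Unit test file": (1_500, 2_000),
    "Integration test": (3_000, 2_500),
    "Documentation page": (2_000, 3_000),
    "Refactor module": (4_000, 3_000),
}


def estimate_cost(task_counts: Dict[str, int],
                  input_price_per_1m: float,
                  output_price_per_1m: float) -> float:
    """Sum per-task token costs for one model, with prices given per 1M tokens."""
    total = 0.0
    for task_type, count in task_counts.items():
        est_in, est_out = TOKEN_ESTIMATES[task_type]
        per_task = est_in * input_price_per_1m + est_out * output_price_per_1m
        total += count * per_task / 1_000_000
    return total


# Placeholder prices for illustration only; look up real rates before estimating.
example = estimate_cost({"Architecture design": 1, "API endpoint": 4}, 3.0, 15.0)
```

With those placeholder prices, one architecture design plus four API endpoints comes to 13,000 input and 9,000 output tokens, or about 0.174 in whatever currency the prices are quoted in.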

Session Splitting Strategy

| Scenario | Recommendation |
|----------|----------------|
| > 50K tokens expected | Split into phases |
| Context loss risk | Checkpoint every 20K |
| Multiple modules | One session per module |
| Complex dependencies | Sequential sessions |
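
The two numeric rules in the table can be sketched as a small planner. `plan_session` and its return keys are hypothetical, and reading "> 50K" as a strict comparison is an assumption.

```python
from typing import Dict, List, Union

SPLIT_THRESHOLD = 50_000      # "> 50K tokens expected"
CHECKPOINT_INTERVAL = 20_000  # "Checkpoint every 20K"


def plan_session(expected_tokens: int,
                 context_loss_risk: bool = False) -> Dict[str, Union[bool, List[int]]]:
    """Decide whether to split a session and where to place checkpoints."""
    split = expected_tokens > SPLIT_THRESHOLD
    checkpoints: List[int] = []
    if context_loss_risk:
        # One checkpoint at every full interval reached within the session.
        checkpoints = list(
            range(CHECKPOINT_INTERVAL, expected_tokens + 1, CHECKPOINT_INTERVAL)
        )
    return {"split_into_phases": split, "checkpoints_at": checkpoints}
```

For a 65,000-token session with context loss risk, this recommends splitting into phases and checkpointing at 20,000, 40,000, and 60,000 tokens.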

Assignment Review Checklist

  • All tasks have an assigned LLM
  • Cost estimates from live pricing (not hardcoded)
  • Token estimates reasonable
  • Handoff points defined
  • Session splitting planned
  • User has approved assignments

Anti-Patterns

  • Never hardcode model scores - they change with every release
  • Never assume pricing - always verify current rates via WebSearch
  • Never skip research - "I think Model X is good at Y" is not evidence
  • Never ignore the user's experience - hands-on experience outweighs benchmarks