AgentSkillsCN

model-selection

FinOps-aware model selection guidance with benchmark-backed recommendations. Use when choosing between Opus, Sonnet, and Haiku for agents and prompts, or when comparing Claude with GPT/Gemini.

SKILL.md
---
name: model-selection
description: FinOps-aware model selection guidance with benchmark-backed recommendations. Use when choosing between Opus, Sonnet, Haiku, or comparing Claude to GPT/Gemini for agents and prompts.
---

Model Selection Guide

Apply this knowledge when selecting models for agents, prompts, or API integrations. Recommendations are based on GitHub Copilot premium request multipliers and independent benchmarks (Artificial Analysis, Aider leaderboards).


Cost Multipliers (GitHub Copilot Premium Requests)

| Model | Multiplier | Relative Cost |
|---|---|---|
| Claude Haiku 4.5 | 0.33x | Budget-friendly |
| Claude Sonnet 4.5 | 1.0x | Baseline |
| Claude Opus 4.6 | 3.0x | Premium |
| GPT-5.x (heavy thinking) | 8.0x | Very expensive |

Key insight: Haiku costs 1/3 of Sonnet. Opus costs 3x Sonnet. Choose the cheapest model that succeeds reliably.
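
To make the multiplier arithmetic concrete, here is a minimal sketch that converts a mix of requests into Sonnet-equivalent premium requests. The multipliers come from the table above; the function name `premium_request_units`, the model keys, and the request counts are hypothetical and chosen only for illustration.

```python
# Multipliers copied from the table above (GitHub Copilot premium requests).
MULTIPLIERS = {
    "haiku-4.5": 0.33,
    "sonnet-4.5": 1.0,
    "opus-4.6": 3.0,
}


def premium_request_units(request_counts: dict[str, int]) -> float:
    """Return the total cost in Sonnet-equivalent premium requests."""
    return sum(MULTIPLIERS[model] * count for model, count in request_counts.items())


# Hypothetical workload of 100 requests: everything on Opus vs. routed by task type.
all_opus = premium_request_units({"opus-4.6": 100})
routed = premium_request_units({"haiku-4.5": 60, "sonnet-4.5": 30, "opus-4.6": 10})
print(f"all Opus: {all_opus:.1f}, routed: {routed:.1f}")
```

Under these assumed counts, routing most requests to Haiku and Sonnet costs roughly a quarter of sending everything to Opus.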


Independent Benchmark Summary

| Model | Intelligence Index | Code Editing (Aider) | Speed (t/s) |
|---|---|---|---|
| Claude Opus 4.6 | 50 | | 92 |
| Claude Sonnet 4.5 | 43 | 84.2% (tied #1 with o1) | 71 |
| Gemini 3 Flash | 46 | | 199 |
| GPT-5.2 | 51 | | |

Key finding: Sonnet 4.5 matches o1 on code editing benchmarks (84.2%) while costing 1/3 of Opus.


Decision Framework

Match Model to Task Complexity

| Task Type | Recommended Model | Reasoning |
|---|---|---|
| File reading, grep, simple search | Haiku 4.5 | Routine tool use; 0.33x cost justified |
| Code editing, focused implementation | Sonnet 4.5 | Matches top-tier benchmarks |
| Multi-step orchestration, planning | Opus 4.6 | Complex reasoning worth 3x premium |
| Deep research with synthesis | Sonnet 4.5 | Intelligence gap (14%) rarely matters |
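
The 14% figure appears to follow from the Intelligence Index scores in the benchmark summary (50 for Opus 4.6, 43 for Sonnet 4.5), measured relative to Opus. The guide does not state this derivation explicitly, so treat the following check as an inference:

```python
opus_index = 50    # Intelligence Index, Claude Opus 4.6 (benchmark summary)
sonnet_index = 43  # Intelligence Index, Claude Sonnet 4.5 (benchmark summary)

gap_points = opus_index - sonnet_index
gap_relative_to_opus = gap_points / opus_index  # assumption: gap is measured against Opus

print(f"{gap_points} points, {gap_relative_to_opus:.0%} below Opus")
```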

Agent Type → Model Mapping

| Agent Pattern | Model | Cost | Justification |
|---|---|---|---|
| Research/read-only | Haiku 4.5 | 0.33x | No creative reasoning needed |
| Implementation | Sonnet 4.5 | 1.0x | Code editing is core strength |
| Review/test | Sonnet 4.5 | 1.0x | Analysis doesn't need Opus overhead |
| Orchestrator (multi-agent) | Opus 4.6 | 3.0x | Coordination complexity justifies cost |
| Complex planning | Opus 4.6 | 3.0x | Multi-step reasoning is Opus strength |
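
The mapping above can be expressed as a small lookup table. This is only a sketch: the dictionary keys, the `model_for` helper, and the choice to fall back to Sonnet for unknown patterns (following the guide's "when unsure" advice) are illustrative assumptions, not part of any real API.

```python
# Agent pattern -> (model, cost multiplier), copied from the mapping table above.
# The short keys are hypothetical labels for the table's agent patterns.
AGENT_MODEL_MAP = {
    "research": ("Haiku 4.5", 0.33),
    "implementation": ("Sonnet 4.5", 1.0),
    "review": ("Sonnet 4.5", 1.0),
    "orchestrator": ("Opus 4.6", 3.0),
    "planning": ("Opus 4.6", 3.0),
}

# Assumed fallback: the guide recommends starting with Sonnet when unsure.
DEFAULT_MODEL = ("Sonnet 4.5", 1.0)


def model_for(agent_pattern: str) -> tuple[str, float]:
    """Return (model, multiplier) for an agent pattern, defaulting to Sonnet."""
    return AGENT_MODEL_MAP.get(agent_pattern.strip().lower(), DEFAULT_MODEL)


print(model_for("Orchestrator"))
print(model_for("data-migration"))  # unknown pattern falls back to the default
```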

When to Use Opus (3x Cost)

Reserve Opus for tasks where its 14% Intelligence Index lead over Sonnet (50 vs. 43) actually matters:

| ✅ Use Opus | ❌ Don't Need Opus |
|---|---|
| Orchestrating 3+ sub-agents | Single-agent code editing |
| Novel architectural decisions | Following established patterns |
| Ambiguous, underspecified problems | Clear, well-scoped tasks |
| Long-horizon multi-step planning | Short, focused operations |

Heuristic: If you can describe the task in <50 words with clear inputs/outputs, Sonnet suffices.
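
The heuristic can be sketched as a simple check. Whether a task has clear inputs and outputs is a human judgment, so this hypothetical `sonnet_suffices` helper takes it as an argument rather than trying to infer it; the function name and parameters are assumptions made for illustration.

```python
def sonnet_suffices(task_description: str, has_clear_io: bool, max_words: int = 50) -> bool:
    """Apply the guide's heuristic: under 50 words and clear inputs/outputs."""
    word_count = len(task_description.split())
    return word_count < max_words and has_clear_io


# Hypothetical task description, used only to exercise the check.
task = "Rename the user_id column to account_id in models.py and update both call sites."
print(sonnet_suffices(task, has_clear_io=True))
```

A `False` result does not mean Opus is required; it only means the task deserves a closer look against the criteria in the table above.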


When to Use Haiku (0.33x Cost)

Haiku excels at routine operations:

| ✅ Use Haiku | When It Fails |
|---|---|
| CLI tool invocation | Creative code generation |
| File reading and summarization | Complex refactoring |
| Search result synthesis | Architectural decisions |
| Repetitive data transformation | Novel problem solving |

Practitioner evidence: "Using the CLI lowers the bar so cheap, fast models can reliably succeed." — Jeremy Daer, 37signals


Cross-Vendor Comparison

When comparing across providers (useful for tool selection):

| Need | Best Option | Notes |
|---|---|---|
| Fastest response | Gemini 3 Flash (199 t/s) | Good for interactive use |
| Best code editing | Claude Sonnet 4.5 / o1 (tied) | Aider benchmark leaders |
| Cheapest per token | DeepSeek V3.2 ($0.30/MTok) | 20x cheaper than Sonnet |
| Highest intelligence | GPT-5.2 (51) / Opus 4.6 (50) | Near-equivalent at top |

Quick Reference

| Question | Answer |
|---|---|
| Default for most agents? | Sonnet 4.5 |
| Read-only research? | Haiku 4.5 (0.33x) |
| Orchestrators/planners? | Opus 4.6 (3x) |
| When unsure? | Start with Sonnet, upgrade if it struggles |
| FinOps rule | Use cheapest model that succeeds reliably |
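
The "start with Sonnet, upgrade if it struggles" advice can be sketched as an escalation ladder. Everything here is illustrative: `run_with_escalation` is a hypothetical helper, and `attempt` stands in for whatever code runs the task on a named model and reports success. It does not call any real model API.

```python
from typing import Callable, Optional

# Cheapest first, using the multipliers from this guide.
LADDER = [("Haiku 4.5", 0.33), ("Sonnet 4.5", 1.0), ("Opus 4.6", 3.0)]


def run_with_escalation(
    attempt: Callable[[str], bool],
    start: str = "Sonnet 4.5",
) -> tuple[Optional[str], float]:
    """Try models from `start` upward; return (winning model, total multiplier spent)."""
    names = [name for name, _ in LADDER]
    spent = 0.0
    for name, multiplier in LADDER[names.index(start):]:
        spent += multiplier  # every attempt is billed, including failed ones
        if attempt(name):
            return name, spent
    return None, spent


# Hypothetical task that only Opus can complete.
winner, cost = run_with_escalation(lambda model: model == "Opus 4.6")
print(winner, cost)
```

Failed attempts still consume requests, so escalation pays off only when the cheaper model succeeds often enough to offset the occasional retry.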