AgentSkillsCN

model-hierarchy

根据任务的复杂程度,将任务路由至合适的模型,从而实现AI代理运营的成本优化。在以下场景中使用此技能:(1) 决定为某项任务选用哪款模型;(2) 派生子代理;(3) 考虑成本效益;(4) 当前使用的模型对于该任务而言显得“大材小用”。触发指令:“模型路由”“成本优化”“选择哪款模型”“太贵了”“派生代理”。

SKILL.md
--- frontmatter
name: model-hierarchy
description: >
  Cost-optimize AI agent operations by routing tasks to appropriate models based on complexity.
  Use this skill when: (1) deciding which model to use for a task, (2) spawning sub-agents,
  (3) considering cost efficiency, (4) the current model feels like overkill for the task.
  Triggers: "model routing", "cost optimization", "which model", "too expensive", "spawn agent".

Model Hierarchy

Route tasks to the cheapest model that can handle them. Most agent work is routine.

Core Principle

80% of agent tasks are janitorial. File reads, status checks, formatting, simple Q&A. These don't need expensive models. Reserve premium models for problems that actually require deep reasoning.

Model Tiers

Tier 1: Cheap ($0.10-0.50/M tokens)

ModelInputOutputBest For
DeepSeek V3$0.14$0.28General routine work
GPT-4o-mini$0.15$0.60Quick responses
Claude Haiku$0.25$1.25Fast tool use
Gemini Flash$0.075$0.30High volume

Tier 2: Mid ($1-5/M tokens)

ModelInputOutputBest For
Claude Sonnet$3.00$15.00Balanced performance
GPT-4o$2.50$10.00Multimodal tasks
Gemini Pro$1.25$5.00Long context

Tier 3: Premium ($10-75/M tokens)

ModelInputOutputBest For
Claude Opus$15.00$75.00Complex reasoning
GPT-4.5$75.00$150.00Frontier tasks
o1$15.00$60.00Multi-step reasoning
o3-mini$1.10$4.40Reasoning on budget

Prices as of Feb 2026. Check provider docs for current rates.

Task Classification

Before executing any task, classify it:

ROUTINE → Use Tier 1

Characteristics:

  • Single-step operations
  • Clear, unambiguous instructions
  • No judgment required
  • Deterministic output expected

Examples:

  • File read/write operations
  • Status checks and health monitoring
  • Simple lookups (time, weather, definitions)
  • Formatting and restructuring text
  • List operations (filter, sort, transform)
  • API calls with known parameters
  • Heartbeat and cron tasks
  • URL fetching and basic parsing

MODERATE → Use Tier 2

Characteristics:

  • Multi-step but well-defined
  • Some synthesis required
  • Standard patterns apply
  • Quality matters but isn't critical

Examples:

  • Code generation (standard patterns)
  • Summarization and synthesis
  • Draft writing (emails, docs, messages)
  • Data analysis and transformation
  • Multi-file operations
  • Tool orchestration
  • Code review (non-security)
  • Search and research tasks

COMPLEX → Use Tier 3

Characteristics:

  • Novel problem solving required
  • Multiple valid approaches
  • Nuanced judgment calls
  • High stakes or irreversible
  • Previous attempts failed

Examples:

  • Multi-step debugging
  • Architecture and design decisions
  • Security-sensitive code review
  • Tasks where cheaper model already failed
  • Ambiguous requirements needing interpretation
  • Long-context reasoning (>50K tokens)
  • Creative work requiring originality
  • Adversarial or edge-case handling

Decision Algorithm

code
function selectModel(task):
    # Rule 1: Escalation override
    if task.previousAttemptFailed:
        return nextTierUp(task.previousModel)
    
    # Rule 2: Explicit complexity signals
    if task.hasSignal("debug", "architect", "design", "security"):
        return TIER_3
    
    if task.hasSignal("write", "code", "summarize", "analyze"):
        return TIER_2
    
    # Rule 3: Default classification
    complexity = classifyTask(task)
    
    if complexity == ROUTINE:
        return TIER_1
    elif complexity == MODERATE:
        return TIER_2
    else:
        return TIER_3

Behavioral Rules

For Main Session

  1. Default to Tier 2 for interactive work
  2. Suggest downgrade when doing routine work: "This is routine - I can handle this on a cheaper model or spawn a sub-agent."
  3. Request upgrade when stuck: "This needs more reasoning power. Switching to [premium model]."

For Sub-Agents

  1. Default to Tier 1 unless task is clearly moderate+
  2. Batch similar tasks to amortize overhead
  3. Report failures back to parent for escalation

For Automated Tasks

  1. Heartbeats/monitoring → Always Tier 1
  2. Scheduled reports → Tier 1 or 2 based on complexity
  3. Alert responses → Start Tier 2, escalate if needed

Communication Patterns

When suggesting model changes, use clear language:

Downgrade suggestion:

"This looks like routine file work. Want me to spawn a sub-agent on DeepSeek for this? Same result, fraction of the cost."

Upgrade request:

"I'm hitting the limits of what I can figure out here. This needs Opus-level reasoning. Switching up."

Explaining hierarchy:

"I'm running the heavy analysis on Sonnet while sub-agents fetch the data on DeepSeek. Keeps costs down without sacrificing quality where it matters."

Cost Impact

Assuming 100K tokens/day average usage:

StrategyMonthly CostNotes
Pure Opus~$225Maximum capability, maximum spend
Pure Sonnet~$45Good default for most work
Pure DeepSeek~$8Cheap but limited on hard problems
Hierarchy (80/15/5)~$19Best of all worlds

The 80/15/5 split:

  • 80% routine tasks on Tier 1 (~$6)
  • 15% moderate tasks on Tier 2 (~$7)
  • 5% complex tasks on Tier 3 (~$6)

Result: 10x cost reduction vs pure premium, with equivalent quality on complex tasks.

Integration Examples

OpenClaw

yaml
# config.yml - set default model
model: anthropic/claude-sonnet-4

# In session, switch models
/model opus  # upgrade for complex task
/model deepseek  # downgrade for routine

# Spawn sub-agent on cheap model
sessions_spawn:
  task: "Fetch and parse these 50 URLs"
  model: deepseek

Claude Code

code
# In CLAUDE.md or project instructions
When spawning background agents, use claude-3-haiku for:
- File operations
- Simple searches  
- Status checks

Reserve claude-sonnet-4 for:
- Code generation
- Analysis tasks

General Agent Systems

python
def get_model_for_task(task_description: str) -> str:
    routine_signals = ['read', 'fetch', 'check', 'list', 'format', 'status']
    complex_signals = ['debug', 'architect', 'design', 'security', 'why']
    
    desc_lower = task_description.lower()
    
    if any(signal in desc_lower for signal in complex_signals):
        return "claude-opus-4"
    elif any(signal in desc_lower for signal in routine_signals):
        return "deepseek-v3"
    else:
        return "claude-sonnet-4"

Anti-Patterns

DON'T:

  • Run heartbeats on Opus
  • Use premium models for file I/O
  • Keep expensive model when task is clearly routine
  • Spawn sub-agents on premium models by default

DO:

  • Start mid-tier, adjust based on task
  • Spawn helpers on cheapest viable model
  • Escalate explicitly when stuck
  • Track cost per task type to optimize further

Extending This Skill

To customize for your use case:

  1. Adjust tier definitions based on your provider/budget
  2. Add domain-specific signals to classification rules
  3. Track actual complexity vs predicted to improve heuristics
  4. Set budget alerts to catch runaway premium usage