# Model Router (`model-router`)

Dynamic model switching based on task classification.

## Installation

```bash
openclaw skill install model-router
```

## Usage

### Automatic Routing (via trigger)

Prefix your message with an intent:

| Prefix | Model | Use Case |
|--------|-------|----------|
| `/code` | codex | Programming, debugging, refactoring |
| `/research` | kimi | Long context, web search, analysis |
| `/deep` | opus | Complex reasoning, architecture, security |
| `/fast` | haiku | Quick tasks, summaries, simple Q&A |
| `/cheap` | kimi-free | Cost-sensitive, low priority |

Example:

```
/code Review this Python function for bugs
/research What are the latest developments in LLM agents?
/deep Design a distributed system for real-time event processing
```

### Direct Switching

```
/model codex    # Switch to Codex
/model kimi     # Switch to Kimi K2.5
/model opus     # Switch to Opus
/model haiku    # Switch to Haiku
```

### Sub-agent Spawning (Isolated Context)

For complex multi-model workflows:

```javascript
// From any session
sessions_spawn({
  task: "Analyze this codebase for security issues",
  model: "opus",
  thinking: "high"
})
```

## Configuration

Edit `~/.openclaw/skills/model-router/config.json`:

```json
{
  "default": "kimi",
  "autoRoute": true,
  "routes": {
    "code": {
      "model": "codex",
      "patterns": ["code", "function", "bug", "refactor", "implement"],
      "threshold": 0.7
    },
    "research": {
      "model": "kimi",
      "patterns": ["research", "find", "search", "latest", "what is"],
      "threshold": 0.6
    },
    "complex": {
      "model": "opus",
      "patterns": ["design", "architecture", "analyze", "security audit"],
      "threshold": 0.8
    }
  },
  "costLimits": {
    "dailyOpusTokens": 100000,
    "alertThreshold": 0.8
  }
}
```
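The `patterns`/`threshold` fields can be read as a match-ratio rule: a route fires when enough of its patterns appear in the message. A minimal sketch of that reading (illustrative only; `matchScore` and `pickRoute` are hypothetical helper names, not the skill's actual code):

```javascript
// Illustrative: score = fraction of a route's patterns found in the message.
// A route fires when its score meets its threshold. Helper names are
// hypothetical; the skill's real matcher may differ.
const routes = {
  code: {
    model: "codex",
    patterns: ["code", "function", "bug", "refactor", "implement"],
    threshold: 0.7,
  },
  research: {
    model: "kimi",
    patterns: ["research", "find", "search", "latest", "what is"],
    threshold: 0.6,
  },
};

function matchScore(message, patterns) {
  const text = message.toLowerCase();
  const hits = patterns.filter((p) => text.includes(p)).length;
  return hits / patterns.length;
}

function pickRoute(message, routes, fallback = "kimi") {
  for (const route of Object.values(routes)) {
    if (matchScore(message, route.patterns) >= route.threshold) {
      return route.model;
    }
  }
  return fallback; // no route cleared its threshold
}
```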

## How It Works

1. **Intent Detection**: simple keyword/pattern matching on user input
2. **Cost Awareness**: tracks daily spend per model and suggests cheaper alternatives when approaching limits
3. **Context Preservation**: the main session switches model; sub-agents get an isolated context with a specific model
4. **Fallback Chain**: if a model is unavailable, fall back to the next best option
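Steps 2 and 4 can be sketched together: a daily-token ledger gates the expensive model, and a fallback chain picks the next best option. This is a hypothetical sketch; the chain order below is an assumption, not documented behavior:

```javascript
// Hypothetical sketch of cost awareness (step 2) + fallback chain (step 4).
// The chain order below is an assumption, not the skill's documented order.
const FALLBACKS = { opus: "kimi", codex: "kimi", kimi: "haiku", haiku: null };

function selectModel(preferred, { usedOpusTokens, dailyOpusTokens, available }) {
  let model = preferred;
  while (model !== null) {
    // Cost check mirrors the config's costLimits.dailyOpusTokens
    const overBudget = model === "opus" && usedOpusTokens >= dailyOpusTokens;
    if (available.has(model) && !overBudget) {
      return model;
    }
    model = FALLBACKS[model]; // unavailable or over budget → next best option
  }
  throw new Error("no model available");
}
```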

## Architecture

```
User Input
    ↓
Intent Classifier (keyword-based, lightweight)
    ↓
    ├─→ Code detected → Spawn Codex agent OR switch to Codex
    ├─→ Research detected → Switch to Kimi (long context)
    ├─→ Complex detected → Switch to Opus (reasoning)
    └─→ Default → Stay on current model
    ↓
Cost Check (if high-cost model requested)
    ↓
Execute with selected model
```

## Advanced: Multi-Agent Workflows

```javascript
// Parallel execution with different models
const results = await Promise.all([
  sessions_spawn({ task: "Code review", model: "codex" }),
  sessions_spawn({ task: "Security check", model: "opus" }),
  sessions_spawn({ task: "Documentation", model: "kimi" })
]);
```

## Integration (Option A: Pre-processor)

The agent detects prefixes and switches models using `session_status()`.

Pattern:

```javascript
// 1. Check for an explicit prefix (at most one can match)
if (message.startsWith('/code')) {
  session_status({ model: 'codex' });
} else if (message.startsWith('/deep')) {
  session_status({ model: 'opus' });
} else if (message.startsWith('/research')) {
  session_status({ model: 'kimi' });
}

// 2. Process the task with the selected model
// ...
```

For coding tasks (spawn a sub-agent):

```javascript
sessions_spawn({
  task: "Review this code for bugs",
  model: "codex"
});
```

Key principle: agent-controlled, no magic. The user sees which model runs.


## Precedence Rules

When multiple keywords match, the router uses this priority:

```
1. Explicit prefix (/code, /deep, /research) → ALWAYS WINS
2. opus (architecture, complex reasoning)
3. codex (code, debugging)
4. kimi (research, general)
5. default → kimi
```

Example: "design the code architecture" → **opus** (not codex, because opus has higher priority).
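The priority above can be sketched as an ordered check. The keywords are illustrative samples from the config, and `resolveModel` is a hypothetical name:

```javascript
// Sketch of the precedence rules: an explicit prefix short-circuits everything,
// then keyword tiers are checked in priority order. Keywords are illustrative.
const PREFIXES = { "/code": "codex", "/deep": "opus", "/research": "kimi" };
const PRIORITY = [
  { model: "opus", keywords: ["design", "architecture", "analyze"] },
  { model: "codex", keywords: ["code", "bug", "refactor"] },
  { model: "kimi", keywords: ["research", "search", "latest"] },
];

function resolveModel(message) {
  for (const [prefix, model] of Object.entries(PREFIXES)) {
    if (message.startsWith(prefix)) return model; // rule 1: prefix always wins
  }
  const text = message.toLowerCase();
  for (const { model, keywords } of PRIORITY) {
    if (keywords.some((k) => text.includes(k))) return model;
  }
  return "kimi"; // rule 5: default
}
```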

## Why Not Auto-Detect Everything?

Explicit routing beats magic:

- **Predictable costs**: you know which model runs
- **Consistent behavior**: no surprise switches mid-conversation
- **User control**: override anytime with `/model`
- **Simple**: no LLM-based classification overhead

## Future Ideas

- Usage analytics dashboard (`/router stats`)
- Smart fallback when rate-limited
- "Best of 3" ensemble (run a task on 3 models, vote on the best)
- Automatic escalation (cheap model fails → retry with a better model)
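The automatic-escalation idea might look like this (a sketch only; `runTask` and `isAcceptable` are hypothetical hooks supplied by the caller, not existing APIs):

```javascript
// Hypothetical sketch of "automatic escalation": try models from cheapest to
// most capable until a result passes a caller-supplied acceptance check.
async function escalate(task, runTask, isAcceptable, ladder = ["haiku", "kimi", "opus"]) {
  for (const model of ladder) {
    const result = await runTask(task, model);
    if (isAcceptable(result)) {
      return { model, result };
    }
  }
  throw new Error(`all models in [${ladder.join(", ")}] failed`);
}
```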