Intelli-Router: Automatic Model Routing
Purpose
Automatically select the best model for each user message based on task complexity. Saves money by handling simple tasks locally and only escalating to expensive API models when the task requires it.
When to Activate
Run the classification step on EVERY incoming user message BEFORE generating a response. Do NOT classify messages that start with "/" (these are commands, not tasks).
Workflow
- •Take the user's full message text.
- •Run the classifier:
python3 {baseDir}/scripts/classify.py "USER_MESSAGE_HERE"
- •Parse the JSON output. It returns:
{"model": "provider/model-name", "tier": "simple|moderate|coding|complex", "reason": "..."}
- •Use the
modelvalue from the output to switch the session model:
session_status(model="<value from model field>")
- •Then proceed to answer the user's message using the newly selected model.
Tier Routing Table
| Tier | Model | Why |
|---|---|---|
| simple | ollama/dengcao/Qwen3-32B:Q5_K_M | Free local model, handles basic tasks |
| moderate | anthropic/claude-sonnet-4-5 | Balanced cost and capability |
| coding | openai-codex/gpt-5.2 | Optimized for code generation/debugging |
| complex | anthropic/claude-opus-4-5 | Maximum reasoning power |
Fallback Behavior
If Ollama is unreachable or the classifier fails, defaults to moderate (Sonnet). If the model returns an invalid tier, the normalizer picks the best match.
Important Notes
- •The triage model (qwen3:1.7b) runs locally via Ollama at localhost:11434.
- •Classification typically takes less than 500ms.
- •Uses think:false to disable reasoning mode for speed.
- •Do NOT re-classify follow-up messages in the same conversational turn.
- •If the user explicitly requests a specific model (e.g., "use opus for this"), honor that instead.