intelli-router

智能模型路由器，根据消息复杂度，利用本地Ollama模型（qwen3:1.7b）对每条传入用户消息进行分诊，随后通过session_status将消息路由至最优AI模型。对所有传入用户消息进行自动筛选：对于简单任务，选择本地模型（dengcao/Qwen3-32B:Q5_K_M via Ollama）；对于中等难度任务，选择Claude Sonnet；对于编码任务，选择Codex；对于复杂任务，选择Claude Opus。当用户发送的不是斜杠命令时，该路由器便会自动触发。关键词：模型路由、分诊、分类、复杂度、成本优化、智能路由。

name: intelli-router description: > Intelligent model router that triages every incoming user message by complexity using a local Ollama model (qwen3:1.7b), then routes to the optimal AI model via session_status. Use on ALL incoming user messages to automatically select between local models (dengcao/Qwen3-32B:Q5_K_M via Ollama) for simple tasks, Claude Sonnet for moderate tasks, Codex for coding tasks, and Claude Opus for complex tasks. Triggers on any user message that is not a slash command. Keywords: model routing, triage, classify, complexity, cost optimization, smart routing. requires: - python3 - requests (pip)

Intelli-Router: Automatic Model Routing

Purpose

Automatically select the best model for each user message based on task complexity. Saves money by handling simple tasks locally and only escalating to expensive API models when the task requires it.

When to Activate

Run the classification step on EVERY incoming user message BEFORE generating a response. Do NOT classify messages that start with "/" (these are commands, not tasks).

Workflow

•Take the user's full message text.
•Run the classifier:

code

python3 {baseDir}/scripts/classify.py "USER_MESSAGE_HERE"

•Parse the JSON output. It returns:

json

{"model": "provider/model-name", "tier": "simple|moderate|coding|complex", "reason": "..."}

•Use the model value from the output to switch the session model:

code

session_status(model="<value from model field>")

•Then proceed to answer the user's message using the newly selected model.

Tier Routing Table

Tier	Model	Why
simple	ollama/dengcao/Qwen3-32B:Q5_K_M	Free local model, handles basic tasks
moderate	anthropic/claude-sonnet-4-5	Balanced cost and capability
coding	openai-codex/gpt-5.2	Optimized for code generation/debugging
complex	anthropic/claude-opus-4-5	Maximum reasoning power

Fallback Behavior

If Ollama is unreachable or the classifier fails, defaults to moderate (Sonnet). If the model returns an invalid tier, the normalizer picks the best match.

Important Notes

•The triage model (qwen3:1.7b) runs locally via Ollama at localhost:11434.
•Classification typically takes less than 500ms.
•Uses think:false to disable reasoning mode for speed.
•Do NOT re-classify follow-up messages in the same conversational turn.
•If the user explicitly requests a specific model (e.g., "use opus for this"), honor that instead.