Multi-LLM - Intelligent Model Switching
Trigger Command: multi llm
Default Behavior: Always use Claude Opus 4.5 (strongest model). Local model selection is activated only when the message contains the multi llm command.
What's New in v1.1.0
- Renamed trigger from mlti llm to multi llm (clearer naming)
- Enhanced model existence checking with fallback chain
- Added detailed usage examples and troubleshooting
- Improved task detection patterns
Usage
Default Mode (without command)
code
Help me write a Python function -> Uses Claude Opus 4.5
Analyze this code               -> Uses Claude Opus 4.5
Multi-Model Mode (with command)
code
multi llm Help me write a Python function -> Selects qwen2.5-coder:32b
multi llm Analyze this math proof         -> Selects deepseek-r1:70b
multi llm Translate to Chinese            -> Selects glm4:9b
Command Format
| Command | Description |
|---|---|
| multi llm | Activate intelligent model selection |
| multi llm coding | Force coding model |
| multi llm reasoning | Force reasoning model |
| multi llm chinese | Force Chinese model |
| multi llm general | Force general model |
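The force commands in the table can be recognized by looking at the first word after the trigger. The following is a minimal sketch, not the code in select-model.sh: the function name `forced_category` is hypothetical, and the case-insensitive matching is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from select-model.sh).
# Prints the forced category, or nothing when the message leaves the
# choice to automatic task detection.
forced_category() {
  local msg rest first
  # Assumption: the trigger and force words are matched case-insensitively.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"multi llm"*) ;;
    *) return 0 ;;                       # no trigger, so nothing is forced
  esac
  rest=${msg#*"multi llm"}               # text after the first trigger
  rest=${rest#"${rest%%[![:space:]]*}"}  # trim leading whitespace
  first=${rest%%[[:space:]]*}            # first word after the trigger
  case "$first" in
    coding|reasoning|chinese|general) printf '%s\n' "$first" ;;
  esac
}

forced_category "multi llm reasoning Prove that sqrt(2) is irrational"
```

One consequence of this design is that a request whose first word happens to be a force word (for example "multi llm general relativity explained") is treated as a force command.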
Model Mapping
Primary Model (Default): github-copilot/claude-opus-4.5
Local Models (when multi llm triggered):
| Task Type | Model | Size | Best For |
|---|---|---|---|
| Coding | qwen2.5-coder:32b | 19GB | Code generation, debugging, refactoring |
| Reasoning | deepseek-r1:70b | 42GB | Math, logic, complex analysis |
| Chinese | glm4:9b | 5.5GB | Translation, summaries, quick tasks |
| General | qwen3:32b | 20GB | General purpose, fallback |
Fallback Chain
If the selected model is unavailable, the system tries alternatives:
code
Coding:    qwen2.5-coder:32b -> qwen2.5-coder:14b -> qwen3:32b
Reasoning: deepseek-r1:70b   -> deepseek-r1:32b   -> qwen3:32b
Chinese:   glm4:9b           -> qwen3:8b          -> qwen3:32b
General:   qwen3:32b         -> qwen3:14b         -> qwen3:8b
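Walking a chain like the ones above amounts to returning the first candidate that is installed. The sketch below illustrates the idea under stated assumptions: `chain_for` and `first_available` are hypothetical names, the installed models are passed in as a whitespace-separated list, and returning a non-zero status when nothing in the chain is installed is a choice made here, since the documentation does not say what happens in that case.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not taken from select-model.sh).

# Prints the fallback chain for a category, preferred model first.
chain_for() {
  case "$1" in
    coding)    echo "qwen2.5-coder:32b qwen2.5-coder:14b qwen3:32b" ;;
    reasoning) echo "deepseek-r1:70b deepseek-r1:32b qwen3:32b" ;;
    chinese)   echo "glm4:9b qwen3:8b qwen3:32b" ;;
    *)         echo "qwen3:32b qwen3:14b qwen3:8b" ;;   # general
  esac
}

# $1 = category, $2 = installed model names (whitespace-separated).
# Prints the first installed model in the chain; returns 1 if none is.
first_available() {
  local category="$1" installed candidate
  installed=$(printf '%s' "$2" | tr '\n\t' '  ')
  for candidate in $(chain_for "$category"); do
    case " $installed " in
      *" $candidate "*) printf '%s\n' "$candidate"; return 0 ;;
    esac
  done
  return 1   # assumption: the caller decides what to do next
}

first_available coding "qwen2.5-coder:14b qwen3:32b"
```

Matching against space-delimited names keeps `qwen3:8b` from being confused with `qwen3:32b`.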
Detection Logic
code
User Input
    |
    v
Contains "multi llm"?
    |
    +-- No  -> Use Claude Opus 4.5 (default)
    |
    +-- Yes -> Task Type Detection
                          |
    +----------+----------+----------+
    v          v          v          v
  Coding   Reasoning   Chinese    General
    |          |          |          |
    v          v          v          v
 qwen2.5   deepseek      glm4      qwen3
  coder     r1:70b       :9b       :32b
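The first branch of the diagram is a simple gate on the trigger phrase. A minimal sketch of that gate follows; `route_message` is a hypothetical name, and matching the trigger case-insensitively anywhere in the message is an assumption rather than documented behavior.

```shell
#!/usr/bin/env bash
# Primary model as listed under Model Mapping.
DEFAULT_MODEL="github-copilot/claude-opus-4.5"

# Hypothetical helper: prints "default <model>" when the trigger is
# absent, and "local" when local model selection should take over.
route_message() {
  local lowered
  lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lowered" in
    *"multi llm"*) printf 'local\n' ;;
    *)             printf 'default %s\n' "$DEFAULT_MODEL" ;;
  esac
}

route_message "Write a REST API with authentication"
route_message "multi llm Translate to Chinese"
```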
Task Detection Keywords
| Category | Keywords (EN) | Keywords (CN) |
|---|---|---|
| Coding | code, debug, function, script, api, bug, refactor, python, java, javascript | 代码, 编程, 函数, 调试, 重构 |
| Reasoning | analysis, proof, logic, math, solve, algorithm, evaluate | 推理, 分析, 证明, 逻辑, 数学, 计算, 算法 |
| Chinese | translate, summary | 翻译, 总结, 摘要, 简单, 快速 |
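Keyword detection of this kind can be approximated with substring matching in a `case` statement. The sketch below is an illustration only: `detect_task_type` is a hypothetical name, the priority order (coding, then reasoning, then Chinese, then general) is an assumption, and the actual select-model.sh may order or weigh keywords differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from select-model.sh).
# Prints coding, reasoning, chinese, or general for a message.
detect_task_type() {
  local text
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$text" in
    # "java" also covers "javascript"; "bug" also covers "debug".
    *code*|*function*|*script*|*api*|*bug*|*refactor*|*python*|*java*|\
    *代码*|*编程*|*函数*|*调试*|*重构*)
      echo coding ;;
    *analysis*|*proof*|*logic*|*math*|*solve*|*algorithm*|*evaluate*|\
    *推理*|*分析*|*证明*|*逻辑*|*数学*|*计算*|*算法*)
      echo reasoning ;;
    *translate*|*summary*|*翻译*|*总结*|*摘要*|*简单*|*快速*)
      echo chinese ;;
    *)
      echo general ;;   # no keyword matched
  esac
}

detect_task_type "multi llm Write a Python function to calculate fibonacci"
```

Plain substring matching is deliberately simple and has false positives (for example, "api" also matches inside "rapid"); the force commands remain the way to override a wrong guess.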
Examples
Example 1: Coding Task
bash
# Input
multi llm Write a Python function to calculate fibonacci

# Output
Selected: qwen2.5-coder:32b
Reason: Detected coding task (keywords: python, function)
Example 2: Math Analysis
bash
# Input
multi llm reasoning Prove that sqrt(2) is irrational

# Output
Selected: deepseek-r1:70b
Reason: Force command 'reasoning' used
Example 3: Quick Translation
bash
# Input (Chinese for "Translate this passage into English")
multi llm 把这段话翻译成英文

# Output
Selected: glm4:9b
Reason: Detected Chinese lightweight task (keywords: 翻译)
Example 4: Default (No trigger)
bash
# Input
Write a REST API with authentication

# Output
Selected: claude-opus-4.5
Reason: Default model (no 'multi llm' trigger)
Prerequisites
- Ollama must be installed and running:
bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama service
ollama serve

# Pull required models
ollama pull qwen2.5-coder:32b
ollama pull deepseek-r1:70b
ollama pull glm4:9b
ollama pull qwen3:32b
- Check available models:
bash
ollama list
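To see at a glance which of the four primary models still need to be pulled, the listing can be compared against the required set. This is a minimal sketch: `missing_models` is a hypothetical helper, and it assumes that `ollama list` prints one header row followed by rows whose first column is the model name.

```shell
#!/usr/bin/env bash
# Primary local models from the Model Mapping table.
REQUIRED_MODELS="qwen2.5-coder:32b deepseek-r1:70b glm4:9b qwen3:32b"

# Hypothetical helper: reads `ollama list` output on stdin and prints
# every required model that does not appear in it.
missing_models() {
  local installed model
  # Assumption: line 1 is a header, column 1 of later lines is the name.
  installed=$(awk 'NR > 1 { print $1 }')
  for model in $REQUIRED_MODELS; do
    printf '%s\n' "$installed" | grep -qxF -- "$model" || printf '%s\n' "$model"
  done
}

# Usage (requires Ollama to be installed):
#   ollama list | missing_models
```

Each name it prints can then be passed to `ollama pull`.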
Troubleshooting
Model not found
bash
# Check if model exists
ollama list | grep "qwen2.5-coder"

# Pull missing model
ollama pull qwen2.5-coder:32b
Ollama not running
bash
# Check service status
curl -s http://localhost:11434/api/tags

# Start Ollama
ollama serve &
Slow response
- Large models (70b) require significant RAM/VRAM
- Consider using smaller variants: deepseek-r1:32b instead of 70b
Wrong model selected
- Use force commands: multi llm coding, multi llm reasoning
- Check if keywords match your task type
Files in This Skill
code
multi-llm/
├── SKILL.md              # This documentation
└── scripts/
    ├── select-model.sh   # Model selection logic
    └── fallback-demo.sh  # Interactive demo script
Integration
With OpenCode/ClaudeCode
The trigger multi llm is detected in your message. Simply prefix your request:
code
multi llm [your request here]
Programmatic Usage
bash
# Get recommended model for a task
./scripts/select-model.sh "multi llm write a sorting algorithm"
# Output: qwen2.5-coder:32b

# Demo with actual model call
./scripts/fallback-demo.sh --force-local "explain recursion"
Author
- GitHub: @leohan123123
License
MIT