Xrouter
Xrouter is an open-source inference router that sits between OpenClaw and your LLM providers. It uses a fast, hardware-aware classifier to route each request to the most cost-effective model that can handle the task.
This project is MIT licensed. See the MIT License.
Core Features
- •OpenAI-compatible reverse proxy at POST /v1/chat/completions.
- •3-tier classifier (0 = cheap, 1 = medium, 2 = frontier) with early stream cutoff.
- •Hardware detection helper to recommend a local engine.
- •Provider selection wizard to choose local and cloud endpoints.
- •Cache layer with Redis or in-memory LRU fallback.
- •Full cloud mode when local inference is not viable.
- •Token tracking dashboard at /dashboard.
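To illustrate the in-memory LRU fallback mentioned in the feature list, here is a minimal sketch. The class name and its methods are hypothetical and are not taken from src/cache.js; the sketch only shows the general technique of using a Map's insertion order to track recency.

```javascript
// Minimal in-memory LRU cache built on Map's insertion-order iteration.
// Hypothetical sketch: the class name and API are illustrative only.
class LruCache {
  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so the key becomes the most recently used entry.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }
}
```

A router could key such a cache on a hash of the request body, although how Xrouter derives its cache keys is not stated here.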
Workflow
flowchart TD
A["Client / OpenClaw request"] --> B["Router (OpenAI-compatible)"]
B --> C{"Classifier enabled?"}
C -->|No| F["Route to Frontier provider"]
C -->|Yes| D["Classifier (0 / 1 / 2)"]
D --> E{"Decision"}
E -->|0| G["Route to Cheap provider"]
E -->|1| M["Route to Medium provider"]
E -->|2| F
G --> H["Provider adapter (auto or explicit)"]
M --> H
F --> H
H --> I["Upstream API call"]
I --> J["Stream/Response back to client"]
Repository Layout
- •src/server.js: router and streaming proxy.
- •src/classifier.js: classifier call and retry logic.
- •src/config.js: configuration and env parsing.
- •src/cache.js: Redis + LRU cache.
- •src/token_tracker.js: token tracking.
- •scripts/check_hw.js: hardware detection.
- •scripts/configure_providers.js: interactive provider setup.
Requirements
- •Node.js 20+.
- •Local classifier engine (optional).
- •A frontier provider endpoint (required).
Quickstart
- •Install dependencies.
- •(Optional) Start a local model server.
- •Run the configuration wizard.
- •Start the router.
npm install
npm run configure
npm run dev
How To Use
- •Start your local model server (optional but recommended).
- •Run the wizard to configure providers and models.
- •Start the router.
- •Send OpenAI-compatible requests to the router.
- •Inspect routing decisions in response headers or the dashboard.
Example local setup (Ollama):
ollama pull llama3.1
ollama run llama3.1
Run the wizard:
npm run configure
Start the router:
npm run dev
Test a request:
curl -i http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"any","messages":[{"role":"user","content":"Fix this sentence: I has a apple."}]}'
Look for these headers:
- •X-Xrouter-decision: 0, 1, or 2.
- •X-Xrouter-upstream: cheap, medium, or frontier.
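The same check can be done from Node.js. The sketch below reads the two routing headers from a fetch response. The header names come from this README; the function names are hypothetical, and the demo assumes the router is listening on the default port 3000 without ROUTER_API_KEY set.

```javascript
// Summarize the routing headers the router attaches to a response.
// describeRouting is a hypothetical helper, not part of Xrouter.
function describeRouting(headers) {
  const decision = headers.get("x-xrouter-decision");
  const upstream = headers.get("x-xrouter-upstream");
  return {
    decision: decision === null ? null : Number(decision),
    upstream,
  };
}

// Usage against a running router (Node.js 20+ ships a global fetch).
async function demo() {
  const res = await fetch("http://localhost:3000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "any",
      messages: [{ role: "user", content: "Fix this sentence: I has a apple." }],
    }),
  });
  console.log(describeRouting(res.headers));
}
```

Call demo() once the router is running. Header lookup through the Headers API is case-insensitive, so the lowercase names match the headers as sent.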
Open the dashboard:
- •http://localhost:3000/dashboard
Raw usage JSON:
- •http://localhost:3000/usage
Provider Selection (Terminal Wizard)
Run:
npm run configure
The wizard:
- •Scans hardware and recommends a local engine.
- •Suggests a local classifier model.
- •Lets you choose provider base URLs, API keys, and model overrides for cheap/medium/frontier routes.
- •Writes upstreams.json and optionally updates .env.
Quick Start Mode
- •If your machine can run a local model, you can choose Quick Start.
- •Quick Start auto-configures the local classifier.
- •The cheap route always uses the same local model as the classifier, which avoids Ollama model swapping.
- •You only need to choose medium and frontier providers/models.
- •On Apple Silicon (Ollama), the wizard lists installed Ollama models and can auto-download a recommended model.
Routing Behavior
- •The classifier is called for each uncached request.
- •The first 0, 1, or 2 token returned decides the route.
- •If classification fails, the router defaults to the frontier route.
- •When the classifier is enabled, cheap, medium, and frontier routes must be configured.
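The routing rules above can be sketched as a small function that scans classifier output chunk by chunk and stops at the first 0, 1, or 2. This is an illustrative sketch, not the code in src/classifier.js; the function name and the idea of passing chunks as an iterable of strings are assumptions.

```javascript
const ROUTES = ["cheap", "medium", "frontier"];

// Return the route for the first 0/1/2 character seen in the classifier
// output. Hypothetical helper: chunks is any iterable of strings.
function routeFromChunks(chunks) {
  for (const chunk of chunks) {
    const match = /[012]/.exec(chunk);
    // Stop reading as soon as a decision digit appears (early cutoff).
    if (match) return ROUTES[Number(match[0])];
  }
  // No usable digit counts as a classification failure: use frontier.
  return "frontier";
}
```

For example, routeFromChunks([" ", "1", "ignored"]) selects the medium route without looking at the third chunk, and an empty stream falls back to frontier.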
Compatibility
- •The router accepts OpenAI-style requests and translates when needed.
- •Provider type can be explicit (xrouter, openai_compatible, openai, anthropic, gemini, cohere, azure_openai, mistral, groq, together, perplexity) or auto.
- •auto infers the provider adapter from the base URL or API key.
- •Providers that expose OpenAI-compatible endpoints use the openai_compatible adapter.
- •Anthropic/Gemini/Cohere streaming is translated into OpenAI-style SSE chunks.
- •Non-OpenAI adapters currently support text-only messages and basic sampling params (temperature/top_p/stop).
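As a rough picture of what translating a provider stream into OpenAI-style SSE chunks involves, the sketch below converts one parsed Anthropic streaming event into a chat.completion.chunk line. It is a simplified illustration and not the router's adapter: the function name and the meta argument are hypothetical, only text deltas are handled, and the finish reason is always reported as "stop".

```javascript
// Convert one parsed Anthropic streaming event into an OpenAI-style SSE line.
// Returns null for events that carry no text (for example message_start, ping).
// Hypothetical helper; meta supplies the id, model, and created timestamp.
function anthropicEventToOpenAiSse(event, meta) {
  const base = {
    id: meta.id,
    object: "chat.completion.chunk",
    created: meta.created,
    model: meta.model,
  };
  if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
    const chunk = {
      ...base,
      choices: [{ index: 0, delta: { content: event.delta.text }, finish_reason: null }],
    };
    return `data: ${JSON.stringify(chunk)}\n\n`;
  }
  if (event.type === "message_stop") {
    // Simplification: always report "stop" and then close the stream.
    const chunk = { ...base, choices: [{ index: 0, delta: {}, finish_reason: "stop" }] };
    return `data: ${JSON.stringify(chunk)}\n\ndata: [DONE]\n\n`;
  }
  return null;
}
```

A fuller adapter would also map the upstream stop reason and usage counts, which this sketch leaves out.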
Token Tracking Dashboard
- •GET /usage: returns cumulative token usage for cheap, medium, and frontier.
- •GET /dashboard: UI that displays token split and totals.
- •Local usage is counted inside cheap when cheap uses the local model.
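The per-route accounting described above can be pictured with a small in-memory tracker. This is a sketch under assumptions: the function name and the snapshot field names (totals, total, share) are invented for illustration, and the actual /usage payload may be shaped differently.

```javascript
// Cumulative token counters per route, plus the split a dashboard could show.
// Hypothetical sketch; field names are not taken from src/token_tracker.js.
function createTracker() {
  const totals = { cheap: 0, medium: 0, frontier: 0 };
  return {
    record(route, tokens) {
      if (!Object.hasOwn(totals, route)) throw new Error(`unknown route: ${route}`);
      totals[route] += tokens;
    },
    snapshot() {
      const total = totals.cheap + totals.medium + totals.frontier;
      const share = {};
      for (const [route, count] of Object.entries(totals)) {
        // Avoid dividing by zero before any request has been tracked.
        share[route] = total === 0 ? 0 : count / total;
      }
      return { totals: { ...totals }, total, share };
    },
  };
}
```

Recording 300 cheap tokens and 100 frontier tokens yields a total of 400 with a cheap share of 0.75.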
Environment Summary
- •HOST: bind host, default 0.0.0.0.
- •PORT: bind port, default 3000.
- •ROUTER_API_KEY: require Authorization: Bearer <key>.
- •LOG_LEVEL: log level (debug/info/warn/error).
- •LOG_TO_FILE: set true to write logs to files.
- •LOG_DIR: directory for log files (default ./logs).
- •CLASSIFIER_ENABLED: set false to disable local classification.
- •CLASSIFIER_BASE_URL: OpenAI-compatible classifier endpoint.
- •CLASSIFIER_MODEL: classifier model name.
- •CLASSIFIER_SYSTEM_PROMPT: classifier prompt (single line).
- •CLASSIFIER_TIMEOUT_MS: classifier timeout.
- •CLASSIFIER_FORCE_STREAM: force streaming classifier request.
- •CLASSIFIER_WARMUP: warm the classifier on server start.
- •CLASSIFIER_WARMUP_DELAY_MS: delay before warmup request (ms).
- •CLASSIFIER_KEEP_ALIVE_MS: keep-alive interval for classifier warmup (ms).
- •CLASSIFIER_LOADING_RETRY_MS: delay between retries when the model is loading.
- •CLASSIFIER_LOADING_MAX_RETRIES: max retries when the model is loading.
- •CHEAP_BASE_URL: optional, defaults to classifier base URL.
- •CHEAP_API_KEY: cheap provider API key.
- •CHEAP_MODEL: optional model override for cheap route.
- •CHEAP_PROVIDER: provider type for cheap route (auto if empty).
- •CHEAP_HEADERS: optional JSON headers for cheap provider (stringified object).
- •CHEAP_DEPLOYMENT: Azure deployment override for cheap route.
- •CHEAP_API_VERSION: Azure API version override for cheap route.
- •MEDIUM_BASE_URL: required when classifier is enabled.
- •MEDIUM_API_KEY: medium provider API key.
- •MEDIUM_MODEL: optional model override for medium route.
- •MEDIUM_PROVIDER: provider type for medium route (auto if empty).
- •MEDIUM_HEADERS: optional JSON headers for medium provider (stringified object).
- •MEDIUM_DEPLOYMENT: Azure deployment override for medium route.
- •MEDIUM_API_VERSION: Azure API version override for medium route.
- •FRONTIER_BASE_URL: OpenAI-compatible frontier endpoint.
- •FRONTIER_API_KEY: frontier API key.
- •FRONTIER_MODEL: optional model override for frontier route.
- •FRONTIER_PROVIDER: provider type for frontier route (auto if empty).
- •FRONTIER_HEADERS: optional JSON headers for frontier provider (stringified object).
- •FRONTIER_DEPLOYMENT: Azure deployment override for frontier route.
- •FRONTIER_API_VERSION: Azure API version override for frontier route.
- •REDIS_URL: if set, enables Redis cache.
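To show how several of these variables interact, here is a sketch of an env parser that applies the documented defaults and requirements. It is not the code in src/config.js: the function names and the shape of the returned object are hypothetical, only a subset of variables is covered, and it assumes the classifier is enabled unless CLASSIFIER_ENABLED is exactly false.

```javascript
// Parse a *_HEADERS value: a stringified JSON object, or empty for none.
function parseJsonHeaders(raw) {
  if (!raw) return {};
  const parsed = JSON.parse(raw);
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("headers must be a JSON object");
  }
  return parsed;
}

// Hypothetical loader covering a subset of the variables listed above.
function loadConfig(env) {
  const classifierEnabled = env.CLASSIFIER_ENABLED !== "false";
  const route = (prefix, fallbackBaseUrl) => ({
    baseUrl: env[`${prefix}_BASE_URL`] || fallbackBaseUrl || "",
    apiKey: env[`${prefix}_API_KEY`] || "",
    model: env[`${prefix}_MODEL`] || "",
    provider: env[`${prefix}_PROVIDER`] || "auto", // auto if empty
    headers: parseJsonHeaders(env[`${prefix}_HEADERS`]),
  });
  const config = {
    host: env.HOST || "0.0.0.0",
    port: Number(env.PORT || 3000),
    classifierEnabled,
    classifierBaseUrl: env.CLASSIFIER_BASE_URL || "",
    // CHEAP_BASE_URL defaults to the classifier base URL.
    cheap: route("CHEAP", env.CLASSIFIER_BASE_URL),
    medium: route("MEDIUM"),
    frontier: route("FRONTIER"),
    redisUrl: env.REDIS_URL || null, // null means in-memory cache only
  };
  if (!config.frontier.baseUrl) {
    throw new Error("FRONTIER_BASE_URL is required");
  }
  if (classifierEnabled && !config.medium.baseUrl) {
    throw new Error("MEDIUM_BASE_URL is required when the classifier is enabled");
  }
  return config;
}
```

Calling loadConfig(process.env) at startup would surface a missing frontier or medium endpoint before the first request arrives.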
Local Model Installation & Run Guides
Ollama (best for Mac, easiest cross-platform)
- •Install: Ollama Quickstart
- •Pull a model: ollama pull llama3.1
- •Run: ollama run llama3.1
- •Base URL: http://localhost:11434
- •Router config:
CLASSIFIER_BASE_URL=http://localhost:11434
CLASSIFIER_MODEL=llama3.1
vLLM (NVIDIA GPU)
- •OpenAI server: vLLM OpenAI Server
- •Example: vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
- •Base URL: http://localhost:8000
- •Router config:
CLASSIFIER_BASE_URL=http://localhost:8000
CLASSIFIER_MODEL=NousResearch/Meta-Llama-3-8B-Instruct
TensorRT-LLM (NVIDIA, max speed)
- •Repo: TensorRT-LLM
- •Server: trtllm-serve
- •Base URL: http://<host>:<port>
- •Router config:
CLASSIFIER_BASE_URL=http://<host>:<port>
CLASSIFIER_MODEL=<your model>
llama.cpp (CPU/AMD fallback)
- •Repo: llama.cpp
- •Example: llama-server -m model.gguf --port 8080
- •Base URL: http://localhost:8080
- •Router config:
CLASSIFIER_BASE_URL=http://localhost:8080
CLASSIFIER_MODEL=<gguf model name>
Docker
Build and run the router with Redis:
docker compose -f deploy/docker-compose.yml up --build
Hardware Detection
Run:
npm run check-hw
This prints the recommended engine:
- •tensorrt-llm for large NVIDIA GPUs.
- •vllm for standard NVIDIA GPUs.
- •mlx for Apple Silicon.
- •llama.cpp for CPU/AMD fallback.
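The recommendation rules can be pictured as a simple decision function. This is only a sketch: the shape of the hardware profile, the function name, and especially the 40 GB cutoff for a "large" NVIDIA GPU are assumptions, since the thresholds used by scripts/check_hw.js are not stated here.

```javascript
// Map a detected hardware profile to one of the engine names listed above.
// Hypothetical sketch: the 40 GB "large GPU" cutoff is an assumed value.
function recommendEngine({ platform, arch, nvidiaVramGb = 0 }) {
  if (nvidiaVramGb >= 40) return "tensorrt-llm";
  if (nvidiaVramGb > 0) return "vllm";
  if (platform === "darwin" && arch === "arm64") return "mlx";
  return "llama.cpp";
}
```

In Node.js the platform and architecture can be read from process.platform and process.arch; detecting NVIDIA memory needs an external tool and is left out of the sketch.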
Model List Fetching
- •The wizard queries provider model list endpoints when possible.
- •OpenAI-compatible: /v1/models
- •Anthropic: /v1/models
- •Gemini: /v1beta/models
- •Cohere: /v1/models
- •If listing fails, the wizard falls back to scripts/cloud_model_catalog.json.
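The lookup-with-fallback behavior can be sketched as follows. The paths are the ones listed above; everything else is an assumption made for illustration: the function names, the injected fetchIds callback that returns model ids for a URL, the catalog being an object keyed by provider type, and base URLs that do not already end in a version path.

```javascript
const MODEL_LIST_PATHS = {
  openai_compatible: "/v1/models",
  anthropic: "/v1/models",
  gemini: "/v1beta/models",
  cohere: "/v1/models",
};

// Build the model list URL for a provider, or null if none is known.
function modelListUrl(provider, baseUrl) {
  if (!Object.hasOwn(MODEL_LIST_PATHS, provider)) return null;
  return baseUrl.replace(/\/+$/, "") + MODEL_LIST_PATHS[provider];
}

// Try the live endpoint first; fall back to a static catalog on any failure.
// fetchIds(url) is a hypothetical async callback resolving to an array of ids.
async function listModels(provider, baseUrl, fetchIds, catalog) {
  const url = modelListUrl(provider, baseUrl);
  if (url) {
    try {
      const ids = await fetchIds(url);
      if (Array.isArray(ids) && ids.length > 0) return ids;
    } catch {
      // Network or auth error: fall through to the catalog.
    }
  }
  return catalog[provider] ?? [];
}
```

Injecting fetchIds keeps the fallback logic independent of each provider's authentication headers and response format.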