Xrouter

Xrouter is an open-source inference router that sits between OpenClaw and your LLM providers. It uses a fast, hardware-aware classifier to route each request to the most cost-effective model that can handle the task.

This project is MIT licensed. See the MIT License.

Core Features

•OpenAI-compatible reverse proxy at POST /v1/chat/completions.
•3-tier classifier (0 = cheap, 1 = medium, 2 = frontier) with early stream cutoff.
•Hardware detection helper to recommend local engine.
•Provider selection wizard to choose local and cloud endpoints.
•Cache layer with Redis or in-memory LRU fallback.
•Full cloud mode when local inference is not viable.
•Token tracking dashboard at /dashboard.

Workflow

mermaid

flowchart TD
  A["Client / OpenClaw request"] --> B["Router (OpenAI-compatible)"]
  B --> C{"Classifier enabled?"}
  C -->|No| F["Route to Frontier provider"]
  C -->|Yes| D["Classifier (0 / 1 / 2)"]
  D --> E{"Decision"}
  E -->|0| G["Route to Cheap provider"]
  E -->|1| M["Route to Medium provider"]
  E -->|2| F
  G --> H["Provider adapter (auto or explicit)"]
  M --> H
  F --> H
  H --> I["Upstream API call"]
  I --> J["Stream/Response back to client"]

Repository Layout

•src/server.js: router and streaming proxy.
•src/classifier.js: classifier call and retry logic.
•src/config.js: configuration and env parsing.
•src/cache.js: Redis + LRU cache.
•src/token_tracker.js: token tracking.
•scripts/check_hw.js: hardware detection.
•scripts/configure_providers.js: interactive provider setup.

Requirements

•Node.js 20+.
•Local classifier engine (optional).
•A frontier provider endpoint (required).

Quickstart

•Install dependencies.
•(Optional) Start a local model server.
•Run the configuration wizard.
•Start the router.

bash

npm install
npm run configure
npm run dev

How To Use

•Start your local model server (optional but recommended).
•Run the wizard to configure providers and models.
•Start the router.
•Send OpenAI-compatible requests to the router.
•Inspect routing decisions in response headers or the dashboard.

Example local setup (Ollama):

bash

ollama pull llama3.1
ollama run llama3.1

Run the wizard:

bash

npm run configure

Start the router:

bash

npm run dev

Test a request:

bash

curl -i http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"any","messages":[{"role":"user","content":"Fix this sentence: I has a apple."}]}'

Look for these headers:

•X-Xrouter-decision: 0, 1, or 2.
•X-Xrouter-upstream: cheap, medium, or frontier.

Open the dashboard:

•http://localhost:3000/dashboard

Raw usage JSON:

•http://localhost:3000/usage

Provider Selection (Terminal Wizard) Run:

bash

npm run configure

The wizard:

•Scans hardware and recommends a local engine.
•Suggests a local classifier model.
•Lets you choose provider base URLs, API keys, and model overrides for cheap/medium/frontier routes.
•Writes upstreams.json and optionally updates .env.

Quick Start Mode

•If your machine can run a local model, you can choose Quick Start.
•Quick Start auto-configures the local classifier.
•Cheap route always uses the same local model as the classifier to avoid Ollama model swapping.
•You only need to choose medium and frontier providers/models.
•On Apple Silicon (Ollama), the wizard lists installed Ollama models and can auto-download a recommended model.

Routing Behavior

•The classifier is called for each uncached request.
•The first 0, 1, or 2 token returned decides the route.
•If classification fails, the router defaults to the frontier route.
•When the classifier is enabled, cheap, medium, and frontier routes must be configured.

Compatibility

•The router accepts OpenAI-style requests and translates when needed.
•Provider type can be explicit (xrouter, openai_compatible, openai, anthropic, gemini, cohere, azure_openai, mistral, groq, together, perplexity) or auto.
•auto infers the provider adapter from the base URL or API key.
•Providers that expose OpenAI-compatible endpoints use the openai_compatible adapter.
•Anthropic/Gemini/Cohere streaming is translated into OpenAI-style SSE chunks.
•Non-OpenAI adapters currently support text-only messages and basic sampling params (temperature/top_p/stop).

Token Tracking Dashboard

•GET /usage: returns cumulative token usage for cheap, medium, and frontier.
•GET /dashboard: UI that displays token split and totals.
•Local usage is counted inside cheap when cheap uses the local model.

Environment Summary

•HOST: bind host, default 0.0.0.0.
•PORT: bind port, default 3000.
•ROUTER_API_KEY: require Authorization: Bearer <key>.
•LOG_LEVEL: log level (debug/info/warn/error).
•LOG_TO_FILE: set true to write logs to files.
•LOG_DIR: directory for log files (default ./logs).
•CLASSIFIER_ENABLED: set false to disable local classification.
•CLASSIFIER_BASE_URL: OpenAI-compatible classifier endpoint.
•CLASSIFIER_MODEL: classifier model name.
•CLASSIFIER_SYSTEM_PROMPT: classifier prompt (single line).
•CLASSIFIER_TIMEOUT_MS: classifier timeout.
•CLASSIFIER_FORCE_STREAM: force streaming classifier request.
•CLASSIFIER_WARMUP: warm the classifier on server start.
•CLASSIFIER_WARMUP_DELAY_MS: delay before warmup request (ms).
•CLASSIFIER_KEEP_ALIVE_MS: keep-alive interval for classifier warmup (ms).
•CLASSIFIER_LOADING_RETRY_MS: delay between retries when the model is loading.
•CLASSIFIER_LOADING_MAX_RETRIES: max retries when the model is loading.
•CHEAP_BASE_URL: optional, defaults to classifier base URL.
•CHEAP_API_KEY: cheap provider API key.
•CHEAP_MODEL: optional model override for cheap route.
•CHEAP_PROVIDER: provider type for cheap route (auto if empty).
•CHEAP_HEADERS: optional JSON headers for cheap provider (stringified object).
•CHEAP_DEPLOYMENT: Azure deployment override for cheap route.
•CHEAP_API_VERSION: Azure API version override for cheap route.
•MEDIUM_BASE_URL: required when classifier is enabled.
•MEDIUM_API_KEY: medium provider API key.
•MEDIUM_MODEL: optional model override for medium route.
•MEDIUM_PROVIDER: provider type for medium route (auto if empty).
•MEDIUM_HEADERS: optional JSON headers for medium provider (stringified object).
•MEDIUM_DEPLOYMENT: Azure deployment override for medium route.
•MEDIUM_API_VERSION: Azure API version override for medium route.
•FRONTIER_BASE_URL: OpenAI-compatible frontier endpoint.
•FRONTIER_API_KEY: frontier API key.
•FRONTIER_MODEL: optional model override for frontier route.
•FRONTIER_PROVIDER: provider type for frontier route (auto if empty).
•FRONTIER_HEADERS: optional JSON headers for frontier provider (stringified object).
•FRONTIER_DEPLOYMENT: Azure deployment override for frontier route.
•FRONTIER_API_VERSION: Azure API version override for frontier route.
•REDIS_URL: if set, enables Redis cache.

Local Model Installation & Run Guides Ollama (best for Mac, easiest cross-platform)

•Install: Ollama Quickstart
•Pull a model: ollama pull llama3.1
•Run: ollama run llama3.1
•Base URL: http://localhost:11434
•Router config: CLASSIFIER_BASE_URL=http://localhost:11434 CLASSIFIER_MODEL=llama3.1

vLLM (NVIDIA GPU)

•OpenAI server: vLLM OpenAI Server
•Example: vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
•Base URL: http://localhost:8000
•Router config: CLASSIFIER_BASE_URL=http://localhost:8000 CLASSIFIER_MODEL=NousResearch/Meta-Llama-3-8B-Instruct

TensorRT-LLM (NVIDIA, max speed)

•Repo: TensorRT-LLM
•Server: trtllm-serve
•Base URL: http://<host>:<port>
•Router config: CLASSIFIER_BASE_URL=http://<host>:<port> CLASSIFIER_MODEL=<your model>

llama.cpp (CPU/AMD fallback)

•Repo: llama.cpp
•Example: llama-server -m model.gguf --port 8080
•Base URL: http://localhost:8080
•Router config: CLASSIFIER_BASE_URL=http://localhost:8080 CLASSIFIER_MODEL=<gguf model name>

Docker Build and run the router with Redis:

bash

docker compose -f deploy/docker-compose.yml up --build

Hardware Detection Run:

bash

npm run check-hw

This prints the recommended engine:

•tensorrt-llm for large NVIDIA GPUs.
•vllm for standard NVIDIA GPUs.
•mlx for Apple Silicon.
•llama.cpp for CPU/AMD fallback.

Model List Fetching

•The wizard queries provider model list endpoints when possible.
•OpenAI-compatible: /v1/models
•Anthropic: /v1/models
•Gemini: /v1beta/models
•Cohere: /v1/models
•If listing fails, the wizard falls back to scripts/cloud_model_catalog.json.

Star History