Multi-Provider Model Benchmark
This skill provides tools to benchmark coding plan models from multiple providers.
Supported Providers
- •OpenCode Zen (default) - Curated models from OpenCode (GPT-5.2, GLM-5, etc.)
- •Z.AI - GLM Coding Plan models (glm-5, glm-4.7, etc.)
Available Tools
- •benchmark-code - Run full benchmarks on coding plan models
- •benchmark-code-models - List available models from providers
Prerequisites
Before running benchmarks, set your API keys:
bash
# For OpenCode Zen (default) export OPENCODE_API_KEY=your_api_key # For Z.AI export ZAI_API_KEY=your_api_key
Usage Examples
List available models from OpenCode Zen:
code
Use benchmark-code-models with provider "opencode" to show me available models
List available models from Z.AI:
code
Use benchmark-code-models with provider "zai" to show me available models
Run benchmark on all OpenCode Zen models (default):
code
Run benchmark-code to benchmark all OpenCode Zen models
Run benchmark on all Z.AI models:
code
Run benchmark-code with provider "zai" to benchmark all models
Run benchmark on specific models:
code
Run benchmark-code with provider "zai" and models "glm-5,glm-4.7" for 5 runs
Run benchmark with JSON output:
code
Run benchmark-code with output "json"
Metrics Explained
| Metric | Description |
|---|---|
| TTFT | Time to First Token - latency until first token arrives |
| Speed | Generation speed in tokens per second |
| Latency | Total end-to-end request time |
| Success | Successful runs / total runs |
Provider-Specific Notes
OpenCode Zen
- •Models: GPT-5.2, GLM-5, Kimi K2.5, Qwen3 Coder, etc.
- •Free accounts automatically detected - only free models benchmarked
- •Paid models require payment method
Z.AI
- •Models: glm-5, glm-4.7, glm-4.6, glm-4.5, etc.
- •All models accessible with API key
- •Supports reasoning_content for thinking models (glm-5, glm-4.7)
When to Use
Use this skill when:
- •You need to compare model performance across providers
- •You want to find the fastest model for a task
- •You're optimizing for latency or throughput
- •You need to benchmark new models from any provider