benchmark-code

Run model benchmarks for coding plan providers (OpenCode Zen, Z.AI)

SKILL.md
---
name: benchmark-code
description: Run model benchmarks for coding plan providers (OpenCode Zen, Z.AI)
license: MIT
compatibility: opencode
---

Multi-Provider Model Benchmark

This skill provides tools to benchmark coding plan models from multiple providers.

Supported Providers

  • OpenCode Zen (default) - Curated models from OpenCode (GPT-5.2, GLM-5, etc.)
  • Z.AI - GLM Coding Plan models (glm-5, glm-4.7, etc.)

Available Tools

  1. benchmark-code - Run full benchmarks on coding plan models
  2. benchmark-code-models - List available models from providers

Prerequisites

Before running benchmarks, set the API key for each provider you plan to use:

```bash
# For OpenCode Zen (default)
export OPENCODE_API_KEY=your_api_key

# For Z.AI
export ZAI_API_KEY=your_api_key
```
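The sketch below shows how a benchmark run could pick the right key. It assumes that the provider IDs `opencode` and `zai` used in the Usage Examples map one-to-one onto the environment variables above. `resolve_api_key` is a hypothetical helper, not part of the skill. It fails early with a clear message instead of sending unauthenticated requests.

```python
import os
from typing import Mapping, Optional

# Provider IDs as written in the usage examples, mapped to the
# environment variables listed under Prerequisites.
API_KEY_ENV = {
    "opencode": "OPENCODE_API_KEY",  # OpenCode Zen
    "zai": "ZAI_API_KEY",            # Z.AI
}
DEFAULT_PROVIDER = "opencode"  # OpenCode Zen is the default provider


def resolve_api_key(provider: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the API key for `provider` (hypothetical helper).

    Raises ValueError for an unknown provider and RuntimeError when the
    matching environment variable is unset or empty.
    """
    provider = provider or DEFAULT_PROVIDER
    if provider not in API_KEY_ENV:
        raise ValueError(f"unknown provider: {provider!r}")
    var = API_KEY_ENV[provider]
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before benchmarking {provider!r}")
    return key
```

Passing `env` explicitly makes the lookup easy to test without touching the real environment.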

Usage Examples

List available models from OpenCode Zen:

```text
Use benchmark-code-models with provider "opencode" to show me available models
```

List available models from Z.AI:

```text
Use benchmark-code-models with provider "zai" to show me available models
```

Run a benchmark on all OpenCode Zen models (the default provider):

```text
Run benchmark-code to benchmark all OpenCode Zen models
```

Run a benchmark on all Z.AI models:

```text
Run benchmark-code with provider "zai" to benchmark all models
```

Run a benchmark on specific models:

```text
Run benchmark-code with provider "zai" and models "glm-5,glm-4.7" for 5 runs
```
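The models argument is a comma-separated string. This minimal sketch shows how such arguments could be normalized. It assumes a hypothetical `build_plan` helper and `BenchmarkPlan` type, not the skill's actual code:

- The provider defaults to `opencode`.
- An omitted model list means "all models from the provider".
- An omitted run count stays `None`, so the tool's own default (not stated in this document) applies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkPlan:
    provider: str
    models: Optional[List[str]]  # None means "all models from the provider"
    runs: Optional[int]          # None means "use the tool's default"


def build_plan(provider: Optional[str] = None,
               models: Optional[str] = None,
               runs: Optional[int] = None) -> BenchmarkPlan:
    """Normalize tool arguments into a plan (hypothetical helper)."""
    provider = provider or "opencode"  # OpenCode Zen is the default
    model_list: Optional[List[str]] = None
    if models:
        seen: List[str] = []
        for name in models.split(","):
            name = name.strip()
            # Drop empty entries and duplicates, keep the user's order.
            if name and name not in seen:
                seen.append(name)
        model_list = seen or None
    if runs is not None and runs < 1:
        raise ValueError("runs must be at least 1")
    return BenchmarkPlan(provider, model_list, runs)
```

For the example above, `build_plan("zai", "glm-5,glm-4.7", 5)` yields a plan for two Z.AI models with five runs each.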

Run a benchmark with JSON output:

```text
Run benchmark-code with output "json"
```
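The JSON schema the tool emits is not documented here. As an illustration only, this sketch renders per-model summaries either as JSON or as one line of text per model. The `render` function and the field names (`model`, `ttft_s`, `tokens_per_s`, `latency_s`, `success`) are hypothetical.

```python
import json
from typing import Any, Dict, List


def render(results: List[Dict[str, Any]], output: str = "text") -> str:
    """Format per-model summaries (hypothetical field names)."""
    if output == "json":
        # Machine-readable form, suitable for piping into other tools.
        return json.dumps(results, indent=2)
    # Human-readable fallback: one line per model.
    lines = []
    for r in results:
        lines.append(
            f"{r['model']}: TTFT {r['ttft_s']:.2f}s, "
            f"{r['tokens_per_s']:.1f} tok/s, "
            f"latency {r['latency_s']:.2f}s, success {r['success']}"
        )
    return "\n".join(lines)
```

JSON output is the better choice when results will be compared across runs or fed into a script.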

Metrics Explained

| Metric | Description |
|---|---|
| TTFT | Time to First Token: latency until the first token arrives |
| Speed | Generation speed in tokens per second |
| Latency | Total end-to-end request time |
| Success | Successful runs / total runs |
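The table does not spell out how each metric is measured. This minimal sketch assumes each run records a send time, a first-token time, an end time, and an output token count. It also assumes Speed is computed over the generation window (first token to end of stream) rather than the whole request. `RunTiming`, `run_metrics`, and `summarize` are hypothetical names, not the skill's code. Failed runs count only toward Success.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RunTiming:
    start: float                  # request sent (seconds, monotonic clock)
    first_token: Optional[float]  # first streamed token; None if the run failed
    end: Optional[float]          # stream finished; None if the run failed
    output_tokens: int = 0


def run_metrics(t: RunTiming) -> Dict[str, float]:
    """Per-run TTFT, end-to-end latency, and generation speed."""
    ttft = t.first_token - t.start
    latency = t.end - t.start
    gen_time = t.end - t.first_token
    # Assumption: speed = output tokens / time spent generating them.
    speed = t.output_tokens / gen_time if gen_time > 0 else float("nan")
    return {"ttft_s": ttft, "latency_s": latency, "tokens_per_s": speed}


def summarize(runs: List[RunTiming]) -> Dict[str, object]:
    """Average the metrics over successful runs; report success as ok/total."""
    ok = [r for r in runs if r.first_token is not None and r.end is not None]
    per_run = [run_metrics(r) for r in ok]

    def mean(key: str) -> Optional[float]:
        vals = [m[key] for m in per_run]
        return sum(vals) / len(vals) if vals else None

    return {
        "ttft_s": mean("ttft_s"),
        "tokens_per_s": mean("tokens_per_s"),
        "latency_s": mean("latency_s"),
        "success": f"{len(ok)}/{len(runs)}",
    }
```

Averaging is one choice among several. Medians or percentiles are less sensitive to a single slow run.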

Provider-Specific Notes

OpenCode Zen

  • Models: GPT-5.2, GLM-5, Kimi K2.5, Qwen3 Coder, etc.
  • Free accounts are detected automatically, and only free models are benchmarked on them
  • Paid models require a payment method on the account
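How free accounts and free models are detected is not documented here. The sketch below only illustrates the rule stated above: free accounts benchmark free models only. It assumes a hypothetical catalog in which each model entry has an `id` and a boolean `free` flag, plus a hypothetical `has_payment_method` flag for the account. It also reports which requested models were skipped, so they are not silently dropped.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def select_models(catalog: Sequence[Dict[str, object]],
                  requested: Optional[Sequence[str]] = None,
                  has_payment_method: bool = False) -> Tuple[List[str], List[str]]:
    """Return (models to benchmark, requested models skipped). Hypothetical."""
    # Without a payment method, only entries flagged as free are allowed.
    allowed = [str(m["id"]) for m in catalog
               if has_payment_method or m.get("free", False)]
    if requested is None:
        return allowed, []  # no explicit list: benchmark everything allowed
    chosen = [m for m in requested if m in allowed]
    skipped = [m for m in requested if m not in allowed]
    return chosen, skipped
```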

Z.AI

  • Models: glm-5, glm-4.7, glm-4.6, glm-4.5, etc.
  • All models are accessible with an API key
  • Supports reasoning_content output from thinking models (glm-5, glm-4.7)
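Thinking models stream their reasoning separately from the answer, which affects what counts as the "first token" for TTFT. This sketch assumes OpenAI-style streamed chunks that have already been parsed into dicts, where `choices[0]["delta"]` may carry `reasoning_content` and/or `content`. `split_stream` is a hypothetical helper. Whether reasoning tokens should count toward TTFT is a design choice, exposed here as the hypothetical `count_reasoning_for_ttft` parameter.

```python
from typing import Iterable, Optional, Tuple


def split_stream(chunks: Iterable[Tuple[float, dict]],
                 count_reasoning_for_ttft: bool = True
                 ) -> Tuple[str, str, Optional[float]]:
    """Separate reasoning from answer text and find the first-token time.

    `chunks` is a sequence of (arrival_time, parsed_chunk) pairs.
    Returns (reasoning_text, answer_text, first_token_time).
    """
    reasoning, answer = [], []
    first_token_at: Optional[float] = None
    for ts, chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing usage-only chunk
        delta = choices[0].get("delta") or {}
        r = delta.get("reasoning_content") or ""
        c = delta.get("content") or ""
        if r:
            reasoning.append(r)
        if c:
            answer.append(c)
        # Role-only deltas carry no text and never set TTFT.
        if first_token_at is None and (c or (r and count_reasoning_for_ttft)):
            first_token_at = ts
    return "".join(reasoning), "".join(answer), first_token_at
```

Reporting TTFT both ways makes thinking and non-thinking models easier to compare fairly.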

When to Use

Use this skill when:

  • You need to compare model performance across providers
  • You want to find the fastest model for a task
  • You're optimizing for latency or throughput
  • You need to benchmark new models from any provider
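To pick the fastest model for latency- or throughput-sensitive work, per-model results can be ranked by a single metric. This sketch assumes a hypothetical summary dict per model with `latency_s`, `ttft_s`, and `tokens_per_s` keys. Lower is better for the time metrics and higher is better for throughput. Models without a value for the chosen metric, such as those with no successful runs, are left out. `rank_models` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# metric name -> (summary key, higher_is_better)
METRICS = {
    "latency": ("latency_s", False),
    "ttft": ("ttft_s", False),
    "throughput": ("tokens_per_s", True),
}


def rank_models(summaries: Dict[str, Dict[str, Optional[float]]],
                optimize_for: str = "latency") -> List[str]:
    """Return model names ordered best-first for the chosen metric."""
    if optimize_for not in METRICS:
        raise ValueError(f"unknown metric: {optimize_for!r}")
    key, higher_is_better = METRICS[optimize_for]
    scored = [(model, s[key]) for model, s in summaries.items()
              if s.get(key) is not None]
    scored.sort(key=lambda item: item[1], reverse=higher_is_better)
    return [model for model, _ in scored]
```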