kebab-benchmark-and-run

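Run operator benchmarks and single-run executables in Kebab for GEMM and elementwise add. Use when asked to run performance baselines, compare implementations, or perform quick correctness/performance checks through Make targets.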

SKILL.md
--- frontmatter
name: kebab-benchmark-and-run
description: Run operator benchmarks and single-run executables in Kebab for GEMM and elementwise add. Use when asked to execute performance baselines, compare implementations, or run quick correctness/perf checks through Make targets.

Kebab Benchmark and Run

When to Use This Skill

  • User asks to benchmark GEMM or elementwise add
  • User asks to run one implementation quickly (cute/cuda/ref)
  • User asks to generate benchmark summary CSV/Markdown outputs

Prerequisites

  • The project builds successfully (make build)
  • A GPU and a working CUDA installation are available
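
A quick pre-flight check for these prerequisites could look like the sketch below. It only tests whether a few command-line tools are on PATH. The tool list (make, nvcc, nvidia-smi) is an assumption about a typical CUDA setup, not something the repository prescribes, and missing_tools is a hypothetical helper name.

```python
import shutil

# Assumed tool list for a typical CUDA development machine; adjust as needed.
REQUIRED_TOOLS = ("make", "nvcc", "nvidia-smi")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def describe(missing):
    """Turn the list of missing tools into a one-line status message."""
    if not missing:
        return "all required tools found"
    return "missing: " + ", ".join(missing)
```

Calling describe(missing_tools()) before make build gives an early hint when the toolchain is incomplete. It does not prove that the GPU itself is usable.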

Step-by-Step Workflows

Workflow A: Operator Benchmarks

  1. Build:
    • make build
  2. Run one benchmark:
    • make bench-gemm
    • make bench-elementwise-add
  3. Run all benchmarks:
    • make bench-all

Expected outputs:

  • CSV files under bench_results/
  • Optional summary: bench_results/summary.md
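
Workflow A can also be scripted. The following is a minimal sketch, assuming it is run from the repository root and that the CSV files may sit in subdirectories of bench_results/ (hence the recursive search). The function names bench_commands, run_all, and collect_csvs are hypothetical helpers, not part of Kebab.

```python
import subprocess
from pathlib import Path


def bench_commands(targets=("bench-gemm", "bench-elementwise-add")):
    """Return the make invocations: the build first, then one per target."""
    return [["make", "build"]] + [["make", target] for target in targets]


def run_all(commands, runner=subprocess.run):
    """Run the commands in order; check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)


def collect_csvs(results_dir="bench_results"):
    """List CSV files under the results directory (empty if it is absent)."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.csv"))
```

A typical use would be run_all(bench_commands()) followed by collect_csvs() to confirm that results were written. Passing ("bench-all",) as the target tuple runs every benchmark through the single aggregate target instead.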

Workflow B: Single-Run Binaries

For GEMM variants:

  • make run-gemm-cute
  • make run-gemm-cuda
  • make run-gemm-ref
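
To compare the three GEMM variants in one pass, the single-run targets can be invoked in a loop. This is a sketch under the assumption that each target follows the run-gemm-<variant> pattern listed above. run_target and run_variants are hypothetical names, and exit codes are collected rather than raised so that one failing variant does not hide the others.

```python
import subprocess

# Variants taken from the targets listed above.
GEMM_VARIANTS = ("cute", "cuda", "ref")


def run_target(variant, op="gemm"):
    """Build the make target name for one single-run binary."""
    return f"run-{op}-{variant}"


def run_variants(variants=GEMM_VARIANTS, runner=subprocess.run):
    """Run every variant and return a {target: exit code} mapping."""
    results = {}
    for variant in variants:
        target = run_target(variant)
        results[target] = runner(["make", target]).returncode
    return results


def failed(results):
    """Return the targets whose exit code was non-zero."""
    return [target for target, code in results.items() if code != 0]
```

After results = run_variants(), an empty failed(results) means every variant exited cleanly. Whether a clean exit also implies numerical correctness depends on what each binary checks.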

Notes

  • make test maps to bench-gemm in this repository.
  • The CMake name of the elementwise single-run binary does not match the pattern that the run-*-cute targets expect; prefer bench-elementwise-add for reliable checks.

Troubleshooting

| Issue | Mitigation |
| --- | --- |
| Benchmark target not found | Confirm target exists in Makefile help output |
| Binary missing under build/lib/benchmark | Re-run make build and inspect CMake benchmark targets |
| Empty summary report | Verify CSV files exist and scripts/generate_report.py is present |
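
Two of these checks can be automated. The sketch below is a rough heuristic, not the project's own tooling. It scans Makefile text for explicit rule names with a regular expression that ignores special targets such as .PHONY, variable assignments, pattern rules, and double-colon rules, and it checks that the report inputs named in the table exist. makefile_targets and report_problems are hypothetical helper names.

```python
import re
from pathlib import Path

# A line that starts with a plain name followed by ":" (but not ":=" or "::").
_TARGET_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9_.-]*)\s*:(?![:=])", re.MULTILINE)


def makefile_targets(makefile_text):
    """Return the set of explicit target names found in Makefile text."""
    return set(_TARGET_RE.findall(makefile_text))


def report_problems(repo_root="."):
    """List reasons why the summary report could come out empty."""
    root = Path(repo_root)
    problems = []
    results = root / "bench_results"
    if not results.is_dir() or not any(results.rglob("*.csv")):
        problems.append("no CSV files under bench_results/")
    if not (root / "scripts" / "generate_report.py").is_file():
        problems.append("scripts/generate_report.py is missing")
    return problems
```

For example, "bench-gemm" in makefile_targets(Path("Makefile").read_text()) answers the first row of the table, and an empty list from report_problems() rules out the two causes named in the last row.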

References

  • Makefile (bench-*, bench-all, run-*, test)
  • kebab/lib/benchmark/CMakeLists.txt
  • scripts/generate_report.py