kebab-microbench-and-dump

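Run microbenchmarks in Kebab and generate PTX/SASS dumps for kernel-level analysis. Use when asked to run mbench targets, inspect assembly output, compare copy or MMA kernels, or analyze low-level code generation.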

SKILL.md
--- frontmatter
name: kebab-microbench-and-dump
description: Execute Kebab microbenchmarks and generate PTX/SASS dumps for kernel-level analysis. Use when asked to run mbench targets, inspect assembly output, compare copy or MMA kernels, or analyze low-level code generation.

Kebab Microbenchmark and Dump

When to Use This Skill

  • User requests microbenchmark execution (mbench-*)
  • User needs SASS/PTX dumps for kernels
  • User wants to compare implementations like native/vectorized/PTX/CuTe copy paths

Prerequisites

  • A successful build (make build)
  • cuobjdump, which ships with the CUDA toolkit, available on PATH
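A quick way to confirm the tools are reachable before starting a workflow. This is a minimal sketch assuming a POSIX shell; `require_tool` is a hypothetical helper name, not something Kebab provides.

```shell
#!/bin/sh
# require_tool is a hypothetical helper, not part of Kebab.
# It returns 0 when the named program is found on PATH, 1 otherwise.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing: $1 (check PATH)" >&2
    return 1
}

# Typical use before running any workflow:
#   require_tool make && require_tool cuobjdump && make build
```

Chaining the checks with `&&` stops before the build if either tool is missing.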

Step-by-Step Workflows

Workflow A: Run Microbenchmark

  1. Build:
    • make build
  2. Run one microbenchmark, for example:
    • make mbench-copy-gmem-to-smem
    • make mbench-mma-wgmma
    • make mbench-hgemm
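When several microbenchmarks need to run in one go, a small wrapper can call the mbench-* targets in turn and keep each output in its own log. The sketch below assumes a POSIX shell; `run_mbench`, `MAKE`, and `LOG_DIR` are names introduced here for illustration and are not defined by Kebab.

```shell
#!/bin/sh
# run_mbench is a hypothetical wrapper, not part of Kebab.
# MAKE and LOG_DIR are assumptions introduced for this sketch.
MAKE="${MAKE:-make}"
LOG_DIR="${LOG_DIR:-mbench-logs}"

run_mbench() {
    mkdir -p "$LOG_DIR" || return 1
    for name in "$@"; do
        # Save each target's stdout and stderr to its own log file.
        if "$MAKE" "mbench-$name" >"$LOG_DIR/$name.log" 2>&1; then
            echo "ok   $name"
        else
            echo "FAIL $name (see $LOG_DIR/$name.log)"
        fi
    done
}

# Example, after `make build`:
#   run_mbench copy-gmem-to-smem mma-wgmma hgemm
```

A failing target does not abort the loop, so one broken benchmark does not hide the results of the others.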

Workflow B: Generate Kernel Dumps

  1. Build:
    • make build
  2. Dump one microbenchmark binary:
    • make mdump-copy-gmem-to-smem
  3. Inspect outputs under:
    • dump/microbench/<microbench_name>/
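To see what a dump produced, it can help to list every file in the output directory with its size. This is a sketch only: `list_dumps` and `DUMP_ROOT` are hypothetical names, and no particular file names or extensions inside the directory are assumed.

```shell
#!/bin/sh
# list_dumps is a hypothetical helper, not part of Kebab.
# DUMP_ROOT defaults to the dump location named in this workflow.
DUMP_ROOT="${DUMP_ROOT:-dump/microbench}"

list_dumps() {
    dir="$DUMP_ROOT/$1"
    if [ ! -d "$dir" ]; then
        echo "no dump directory: $dir" >&2
        return 1
    fi
    # Print "<size in bytes><TAB><path>" for every file, sorted by path.
    find "$dir" -type f | sort | while IFS= read -r f; do
        size=$(wc -c <"$f" | tr -d ' ')
        printf '%s\t%s\n' "$size" "$f"
    done
}

# Example:
#   list_dumps <microbench_name>
```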

Workflow C: Operator Dumps

  • make dump-gemm-cute
  • make dump-gemm-cuda
  • make dump-gemm-ref

Outputs are written under dump/operator/.
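One way to compare two dumped implementations is to count how often each instruction mnemonic appears in their SASS. The sketch below assumes the usual `cuobjdump -sass` line layout, where each instruction line starts with an address comment such as `/*0010*/`; `sass_histogram` is a hypothetical helper, and the file to pass in depends on what the dump targets actually write.

```shell
#!/bin/sh
# sass_histogram is a hypothetical helper, not part of Kebab.
# Assumed input: SASS text where instruction lines begin with /*ADDR*/.
sass_histogram() {
    awk '
        $1 ~ /^\/\*[0-9a-fA-F]+\*\/$/ {
            op = $2
            if (op ~ /^@/) op = $3    # skip a predicate guard such as @!P0
            sub(/[.;].*$/, "", op)    # drop modifiers (.E) and a trailing ;
            if (op != "") count[op]++
        }
        END { for (op in count) printf "%d\t%s\n", count[op], op }
    ' "$1" | sort -k1,1nr -k2,2
}

# Example (the path is a placeholder):
#   sass_histogram <path-to-sass-file>
```

Running it on two dumps and diffing the results gives a rough picture of which instruction classes differ between the implementations.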

Troubleshooting

| Issue | Mitigation |
| --- | --- |
| cuobjdump not found | Ensure CUDA toolkit bin directory is available and retry |
| No PTX generated | Some binaries may not embed PTX; SASS output is still usable |
| Dump files too large | Focus on split kernel files instead of all_kernels.* |
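For the large-file case, the split kernel files can be listed while skipping the combined all_kernels.* output. This sketch assumes the split files sit in the same directory tree as all_kernels.*; `split_kernel_files` is a hypothetical helper name.

```shell
#!/bin/sh
# split_kernel_files is a hypothetical helper, not part of Kebab.
# It lists every file under the given directory except all_kernels.*.
split_kernel_files() {
    find "$1" -type f ! -name 'all_kernels.*' | sort
}

# Example:
#   split_kernel_files dump/microbench/<microbench_name>
```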

References

  • Makefile (mbench-*, mdump-*, dump-*-cute/cuda/ref)
  • kebab/lib/microbench/CMakeLists.txt
  • dump/