Wafer CLI

GPU development primitives for optimizing CUDA and HIP kernels.

Installation

Before using Wafer CLI commands, install the tool:

bash

# Install wafer-cli using uv (recommended)
uv tool install wafer-cli

# Authenticate (one-time setup)
wafer login

When to Use This Skill

Activate this skill when:

•Writing or optimizing CUDA/HIP kernels
•Profiling GPU code with NCU, NSYS, or ROCprof
•Evaluating kernel correctness and performance
•Looking up GPU programming documentation (CUDA, CUTLASS, HIP)

Core Workflows

1. Documentation Lookup

Query indexed GPU documentation:

bash

# Download corpus (one-time)
wafer corpus download cuda
wafer corpus download cutlass
wafer corpus download hip

# Query documentation
wafer wevin -t ask-docs --corpus cuda "What is warp divergence?"
wafer wevin -t ask-docs --corpus cutlass "What is a TiledMma?"

2. Trace Analysis

Analyze NCU, NSYS, or PyTorch profiler traces:

bash

# AI-assisted analysis
wafer wevin -t trace-analyze --args trace=./profile.ncu-rep "Why is this kernel slow?"

# Direct trace queries (PyTorch/Perfetto JSON)
wafer nvidia perfetto query trace.json \
  "SELECT name, dur/1e6 as ms FROM slice WHERE cat='kernel' ORDER BY dur DESC LIMIT 10"

# NCU/NSYS analysis
wafer nvidia ncu analyze profile.ncu-rep
wafer nvidia nsys analyze profile.nsys-rep

3. Kernel Evaluation

Test correctness and measure speedup against a reference:

bash

# Generate template files
wafer evaluate make-template ./my-kernel
# Creates: kernel.py, reference.py, test_cases.json

# Run evaluation on a configured target
wafer evaluate \
  --impl ./my-kernel/kernel.py \
  --reference ./my-kernel/reference.py \
  --test-cases ./my-kernel/test_cases.json \
  --target <target-name>

# With profiling
wafer evaluate ... --profile

4. AI-Assisted Optimization

Iteratively optimize a kernel with evaluation feedback:

bash

wafer wevin -t optimize-kernel \
  --args kernel=./my_kernel.cu \
  --args target=H100 \
  "Optimize this GEMM for memory bandwidth"

5. Remote Execution

Run on cloud GPU workspaces:

bash

wafer workspaces list
wafer workspaces create my-workspace --gpu H100
wafer workspaces exec <id> "python train.py"
wafer workspaces ssh <id>

Command Reference

Command	Description
`wafer corpus list\|download\|path`	Manage documentation corpora
`wafer evaluate`	Test kernel correctness/performance
`wafer nvidia ncu\|nsys\|perfetto`	NVIDIA profiling tools
`wafer amd isa\|rocprof-compute`	AMD profiling tools
`wafer workspaces`	Cloud GPU environments
`wafer wevin -t <template>`	AI-assisted workflows
`wafer config targets`	Configure GPU targets

Target Configuration

Targets define GPU access methods. Initialize with:

bash

wafer config targets init ssh          # Your own GPU via SSH
wafer config targets init runpod       # RunPod cloud GPUs
wafer config targets init digitalocean # DigitalOcean AMD GPUs

Then use: wafer evaluate --target <name> ...