Create Benchmark
Skill for creating, modifying, and reviewing Go benchmarks in the gobench.dev project.
Quick Reference
- •Benchmarks live in benchmarks/{slug}/
- •Each benchmark needs: *_test.go files + _meta.yml
- •Generated files (_bench.out, _bench.json) are created by task bench
- •Shared constants go in a-consts.go (prefixed a- to sort first)
For full structural details and _meta.yml schema, see reference.md.
Naming Convention (Critical)
Function format: BenchmarkImplementationName_behavior
The parser splits this into:
- •Implementation name: CamelCase after Benchmark, split into words at uppercase boundaries
  - •BenchmarkStringBuilder → "String Builder"
  - •BenchmarkAtomicPointerCounter → "Atomic Pointer Counter"
- •Behavior suffix: lowercase after the underscore
  - •_run for single-behavior benchmarks
  - •_read, _write, etc. for multi-behavior benchmarks
The implementation field in _meta.yml must exactly match the parsed implementation name (the CamelCase-to-space-separated form).
All implementations must define the same set of behavior suffixes. The UI auto-detects multiple behaviors and renders synced tabs.
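The splitting rule above can be sketched in Go. This is only an illustration of the convention, not the project's parser: `parseBenchmarkName` is a hypothetical helper, and its naive treatment of consecutive capitals (an acronym such as `HTTP` would become "H T T P") is an assumption that the real parser may not share.

```go
package main

import (
	"strings"
	"unicode"
)

// parseBenchmarkName is a hypothetical helper that mirrors the naming
// convention: "Benchmark" + CamelCase implementation + "_" + behavior.
// It returns ok=false when the name does not fit that shape.
func parseBenchmarkName(fn string) (implementation, behavior string, ok bool) {
	if !strings.HasPrefix(fn, "Benchmark") {
		return "", "", false
	}
	rest := strings.TrimPrefix(fn, "Benchmark")

	// The behavior suffix is everything after the first underscore.
	i := strings.IndexByte(rest, '_')
	if i <= 0 || i == len(rest)-1 {
		return "", "", false
	}
	camel, behavior := rest[:i], rest[i+1:]

	// Insert a space before every uppercase letter except the first.
	var sb strings.Builder
	for j, r := range camel {
		if j > 0 && unicode.IsUpper(r) {
			sb.WriteByte(' ')
		}
		sb.WriteRune(r)
	}
	return sb.String(), behavior, true
}
```

Under these assumptions, `parseBenchmarkName("BenchmarkStringBuilder_run")` yields the implementation name that the implementation field in _meta.yml has to match, plus the behavior `run`.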
Writing Correct Benchmarks
Essential Rules
- •Always use b.N for the hot loop. Go's benchmark framework controls the iteration count:
    for i := 0; i < b.N; i++ {
        // code under test
    }
- •Use b.ResetTimer() after expensive setup that should not count toward the measurement:
    func BenchmarkFoo_run(b *testing.B) {
        data := expensiveSetup()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            process(data)
        }
    }
- •Prevent dead-code elimination: the compiler may optimize away results that are never used. Assign to a package-level sink or use b.N-scoped variables:
    var sink int

    func BenchmarkFoo_run(b *testing.B) {
        for i := 0; i < b.N; i++ {
            sink = compute()
        }
    }
- •Prevent loop hoisting: don't let the compiler lift invariant work out of the loop. Vary inputs per iteration when the goal is to measure the operation itself:
    func BenchmarkLookup_run(b *testing.B) {
        m := buildMap()
        keys := allKeys(m)
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = m[keys[i%len(keys)]]
        }
    }
- •Use b.RunParallel for concurrent benchmarks (not raw goroutines):
    func BenchmarkConcurrent_run(b *testing.B) {
        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                // concurrent work
            }
        })
    }
- •Keep setup fair: all implementations in a group should operate on equivalent data sizes and conditions.
- •Use b.StopTimer()/b.StartTimer() only for per-iteration cleanup that is unavoidable. Prefer structuring the benchmark to avoid these.
- •Report allocations: the CLI runs with -benchmem, so AllocedBytesPerOp and AllocsPerOp are captured automatically. No need to call b.ReportAllocs().
- •Explain benchmark code: add short comments (inline is fine) for complex steps only. Don't overdo it.
- •NEVER run task bench. It reruns all benchmarks. The user will run the benchmarks and generation logic. Only use the Go tooling to verify that the benchmark works as intended.
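Several of the rules above can be combined in one file. The following is a minimal sketch of two implementations in the same group, assuming a string-concatenation benchmark. The package name, `partCount`, `makeParts`, `joinBuilder`, and `joinPlus` are all hypothetical names chosen for illustration.

```go
package main

import (
	"strings"
	"testing"
)

// partCount is a hypothetical shared constant; in the project it would
// live in a-consts.go so every implementation uses the same input size.
const partCount = 64

// sink is package-level so the compiler cannot prove the results unused.
var sink string

// makeParts is a hypothetical setup helper that builds n short strings.
func makeParts(n int) []string {
	parts := make([]string, n)
	for i := range parts {
		parts[i] = "part"
	}
	return parts
}

// joinBuilder concatenates parts with strings.Builder.
func joinBuilder(parts []string) string {
	var sb strings.Builder
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

// joinPlus concatenates parts with the + operator.
func joinPlus(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

func BenchmarkStringBuilder_run(b *testing.B) {
	parts := makeParts(partCount) // setup, excluded from the measurement
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = joinBuilder(parts)
	}
}

func BenchmarkPlusOperator_run(b *testing.B) {
	parts := makeParts(partCount) // same input as the other implementation
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = joinPlus(parts)
	}
}
```

Both functions share the `_run` suffix and the same input size, so the comparison stays fair. Following the naming convention, they would parse to "String Builder" and "Plus Operator".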
Common Mistakes to Watch For
- •Value receiver on mutable state: func (c IntCounter) increment() modifies a copy, not the original. Use pointer receivers for mutable structs.
- •Benchmarking setup instead of work: forgetting b.ResetTimer() after heavy initialization.
- •Inconsistent behavior suffixes: if one implementation has _read and _write, all must.
- •Not using b.N: using a fixed loop count instead of b.N produces invalid results.
- •Data-dependent timing: creating new data inside the b.N loop skews results.
- •Shared mutable state leaking between iterations: previous iteration's side-effects affecting the next.
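The value-receiver mistake is easy to miss because the code compiles and runs without complaint. A small sketch makes the difference visible. `IntCounter` follows the name used above, while `incrementByValue`, `incrementByPointer`, and `countWith` are hypothetical names for illustration.

```go
package main

// IntCounter is a minimal counter used only for illustration.
type IntCounter struct{ n int }

// incrementByValue has a value receiver: it increments a copy, so the
// caller's counter never changes.
func (c IntCounter) incrementByValue() { c.n++ }

// incrementByPointer has a pointer receiver: it increments the caller's
// counter.
func (c *IntCounter) incrementByPointer() { c.n++ }

// countWith applies inc to a fresh counter the given number of times and
// returns the counter's final value.
func countWith(times int, inc func(*IntCounter)) int {
	var c IntCounter
	for i := 0; i < times; i++ {
		inc(&c)
	}
	return c.n
}
```

With these definitions, `countWith(1000, (*IntCounter).incrementByValue)` leaves the counter at zero, while the pointer variant counts every call. A benchmark built on the value receiver would be measuring a struct copy rather than the intended mutation.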
What Gets Displayed
The frontend visualizes three metrics:
| Metric | Field | Description |
|---|---|---|
| Time | NsPerOp | Nanoseconds per operation (primary metric) |
| Memory | AllocedBytesPerOp | Bytes allocated per operation |
| Allocs | AllocsPerOp | Allocations per operation |
Charts show:
- •Overview: all implementations side-by-side across iteration counts (1K–10K)
- •Detail: each implementation's scaling across CPU core counts (1, 2, 4, 8, …)
- •Comparisons: "X is 2.1× faster than Y" (based on mean NsPerOp at CPU=1)
- •Badges: "Fastest" / "Slowest" per behavior and CPU count
What matters most: NsPerOp is the primary comparison metric. Memory and allocations are secondary but very useful for understanding why one approach is faster.
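The comparison text can be thought of as a ratio of mean NsPerOp values. The sketch below shows that arithmetic only. How the frontend actually computes and formats it is not documented here, so `run`, `meanNsPerOp`, and `compare` are hypothetical names, and the one-decimal formatting is an assumption.

```go
package main

import "fmt"

// run is a hypothetical record for a single benchmark run at CPU=1.
// Only the NsPerOp field name is taken from the metrics table.
type run struct {
	NsPerOp float64
}

// meanNsPerOp averages NsPerOp across runs and returns 0 for no runs.
func meanNsPerOp(runs []run) float64 {
	if len(runs) == 0 {
		return 0
	}
	total := 0.0
	for _, r := range runs {
		total += r.NsPerOp
	}
	return total / float64(len(runs))
}

// compare describes how much faster the quicker implementation is,
// using the ratio of the two mean NsPerOp values.
func compare(nameA string, a []run, nameB string, b []run) string {
	meanA, meanB := meanNsPerOp(a), meanNsPerOp(b)
	if meanA == 0 || meanB == 0 {
		return "not enough data to compare"
	}
	if meanA > meanB { // swap so that A is always the faster side
		nameA, nameB = nameB, nameA
		meanA, meanB = meanB, meanA
	}
	return fmt.Sprintf("%s is %.1f× faster than %s", nameA, meanB/meanA, nameB)
}
```

Because lower NsPerOp is better, the slower mean goes in the numerator. Means of 210 ns and 100 ns would therefore read as "2.1× faster" for the 100 ns implementation.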
Workflow: Creating a New Benchmark
- •Create benchmarks/{slug}/ directory (lowercase, hyphens)
- •Write *_test.go files with BenchmarkName_behavior functions
- •Add shared constants in a-consts.go if needed
- •Create _meta.yml (see reference.md for schema)
- •Verify: implementation names in _meta.yml match parsed CamelCase names
- •Have the user run task bench to generate _bench.out and _bench.json (do not run it yourself; see Essential Rules)
Workflow: Improving an Existing Benchmark
When asked to improve a benchmark, carefully evaluate:
- •Correctness: Is b.N used correctly? Is setup excluded via b.ResetTimer()?
- •Fairness: Do all implementations benchmark equivalent workloads?
- •Dead-code elimination: Are results consumed (sunk) properly?
- •Receiver types: Pointer vs value. Does the method actually mutate state?
- •Receiver and concurrency checks aside, Concurrency: Is b.RunParallel used where concurrency matters?
- •Meta sync: Does _meta.yml match the actual benchmark functions?
- •Descriptions: Are implementation descriptions accurate and helpful?
Always update _meta.yml when renaming, adding, or removing benchmark functions.
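As a point of reference for the checklist, here is a sketch of a multi-behavior concurrent benchmark that would pass the correctness, dead-code, receiver, and concurrency checks. `MutexCounter` and its methods are hypothetical and exist only for illustration. The package name is a placeholder.

```go
package main

import (
	"sync"
	"sync/atomic"
	"testing"
)

// sink receives results so the reads below cannot be optimized away.
var sink int64

// MutexCounter is a hypothetical implementation used only for illustration.
type MutexCounter struct {
	mu sync.Mutex
	n  int64
}

// Inc uses a pointer receiver so the increment reaches the shared counter.
func (c *MutexCounter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Load uses a pointer receiver so the mutex is shared rather than copied.
func (c *MutexCounter) Load() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

func BenchmarkMutexCounter_write(b *testing.B) {
	var c MutexCounter
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			c.Inc()
		}
	})
}

func BenchmarkMutexCounter_read(b *testing.B) {
	var c MutexCounter
	c.Inc() // setup: give readers a non-zero value
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		var local int64
		for pb.Next() {
			local += c.Load()
		}
		// Publish once per goroutine; an atomic add avoids a data race.
		atomic.AddInt64(&sink, local)
	})
}
```

Any other implementation in the same group would need matching `_write` and `_read` functions, and under the naming convention the implementation field in _meta.yml for this one would be "Mutex Counter".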