Go Benchmarking - Modern Patterns and Best Practices
Write effective Go benchmarks using the new Loop pattern and analyze results with benchstat.
Core Pattern: The Loop Method
Use b.Loop() instead of explicit for i := 0; i < b.N; i++ loops:
func BenchmarkExample(b *testing.B) {
// Setup - not measured
data := generateTestData()
for b.Loop() {
// Code to measure
result := processData(data)
_ = result // Keep result alive
}
// Cleanup - not measured
}
Why Loop?
- •Automatically manages timer (resets on first call, stops after loop)
- •Prevents compiler optimizations that would invalidate results
- •Disables inlining of functions called within the loop body
- •Runs benchmark function once per measurement (vs multiple times with b.N)
- •Keeps arguments and results alive to prevent dead code elimination
Critical Rules
- •Loop condition must be exactly
b.Loop()- variations don't work - •Only use Loop OR b.N, never both - mixing creates incorrect measurements
- •Optimizations apply only within loop braces - called functions optimize normally
- •Setup before loop, cleanup after - neither counts toward measurement
Incremental Approach
Philosophy: Make small changes, measure each change, understand impact before proceeding.
Workflow
- •Baseline first: Write benchmark for current code
- •Run multiple times:
go test -bench=. -count=10 > old.txt - •Make one change: Modify one aspect of the implementation
- •Measure again:
go test -bench=. -count=10 > new.txt - •Compare:
benchstat old.txt new.txt - •Decide: Keep change if improvement is significant and stable
- •Repeat: Make next change from proven baseline
Never change multiple things at once - you won't know which change caused the impact.
benchstat - Statistical Analysis
Install: go install golang.org/x/perf/cmd/benchstat@latest
Basic Comparison
# Run old version go test -bench=BenchmarkProcess -count=10 > old.txt # Make changes, run new version go test -bench=BenchmarkProcess -count=10 > new.txt # Compare with statistical significance benchstat old.txt new.txt
Reading benchstat Output
name old time/op new time/op delta Process-8 1.23µs ± 2% 0.98µs ± 1% -20.33% (p=0.000 n=10+10)
- •
±indicates variance (lower is more stable) - •
deltashows percentage change - •
pvalue indicates statistical significance (< 0.05 is significant) - •
nshows sample size
Ignore changes under 5% - measurement noise is real.
Best Practices
Prevent Compiler Optimizations
// BAD: Result not used, entire loop may be optimized away
for b.Loop() {
processData(data)
}
// GOOD: Assign to package-level var
var result int
for b.Loop() {
result = processData(data)
}
// GOOD: Use within loop body (Loop keeps it alive)
for b.Loop() {
r := processData(data)
_ = r
}
Benchmark Allocations
func BenchmarkAllocations(b *testing.B) {
b.ReportAllocs() // Include allocation stats in output
for b.Loop() {
data := make([]byte, 1024) // Measure this allocation
_ = data
}
}
Look for allocs/op in output - reducing allocations often improves performance more than CPU optimization.
Sub-benchmarks for Variations
func BenchmarkEncode(b *testing.B) {
sizes := []int{1, 10, 100, 1000}
for _, size := range sizes {
b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
data := make([]byte, size)
for b.Loop() {
encode(data)
}
})
}
}
Run specific sub-benchmark: go test -bench=BenchmarkEncode/size=100
Avoid Timer Manipulation
// BAD: Manual timer management is error-prone
for b.Loop() {
b.StopTimer()
setup := prepareData()
b.StartTimer()
process(setup)
}
// GOOD: Move setup outside loop
for b.Loop() {
setup := prepareData()
process(setup)
}
// BEST: If setup must be per-iteration, benchmark it separately
Loop handles timer automatically - manual control usually indicates wrong benchmark structure.
Parallel Benchmarks
func BenchmarkParallel(b *testing.B) {
b.RunParallel(func(pb *testing.PB) {
for pb.Next() { // Use pb.Next() not b.Loop() in parallel
doWork()
}
})
}
Note: RunParallel uses pb.Next() not b.Loop() - different API for concurrent execution.
Common Patterns
Benchmarking HTTP Handlers
func BenchmarkHandler(b *testing.B) {
handler := NewHandler()
req := httptest.NewRequest("GET", "/api/data", nil)
for b.Loop() {
w := httptest.NewRecorder()
handler.ServeHTTP(w, req)
}
}
Benchmarking Database Operations
func BenchmarkQuery(b *testing.B) {
db := setupTestDB(b)
defer db.Close()
b.ResetTimer() // Don't measure setup
for b.Loop() {
rows, err := db.Query("SELECT * FROM users WHERE age > ?", 18)
if err != nil {
b.Fatal(err)
}
rows.Close()
}
}
Table-Driven Benchmarks
func BenchmarkHash(b *testing.B) {
cases := []struct {
name string
size int
}{
{"small", 16},
{"medium", 1024},
{"large", 65536},
}
for _, tc := range cases {
b.Run(tc.name, func(b *testing.B) {
data := make([]byte, tc.size)
for b.Loop() {
hash(data)
}
})
}
}
Running Benchmarks
# All benchmarks in package go test -bench=. # Specific benchmark go test -bench=BenchmarkEncode # With memory allocations go test -bench=. -benchmem # Multiple runs for stability go test -bench=. -count=10 # Longer runs for accuracy go test -bench=. -benchtime=10s # CPU profiling go test -bench=. -cpuprofile=cpu.prof # Memory profiling go test -bench=. -memprofile=mem.prof
Analysis Tools
CPU Profiling
go test -bench=BenchmarkProcess -cpuprofile=cpu.prof go tool pprof cpu.prof # Then: top, list FunctionName, web
Memory Profiling
go test -bench=BenchmarkProcess -memprofile=mem.prof go tool pprof -alloc_space mem.prof
Comparing Multiple Changes
# Run baseline go test -bench=. -count=10 > baseline.txt # After change 1 go test -bench=. -count=10 > change1.txt benchstat baseline.txt change1.txt # After change 2 go test -bench=. -count=10 > change2.txt benchstat baseline.txt change2.txt # Compare all three benchstat baseline.txt change1.txt change2.txt
Red Flags
Unstable results (high variance):
- •Increase
-countto get more samples - •Close other applications
- •Check for background tasks affecting CPU
- •Consider using
-cpu=1to reduce scheduling effects
Results too good to be true:
- •Compiler probably optimized away your code
- •Ensure results are used (assign to var or use in assertion)
- •Check with
-gcflags='-m'to see optimization decisions
Inconsistent improvements:
- •Results under 5% change are often noise
- •Need more samples (
-count=20) - •Consider using
-benchtime=5sfor longer, more stable runs
Guidelines
- •Always use b.Loop() - never explicit
for i := 0; i < b.Nloops - •One change at a time - incremental measurement reveals real impact
- •Use benchstat - statistical significance matters more than single runs
- •Report allocations -
b.ReportAllocs()often finds easy wins - •Keep results alive - assign to var or use
_to prevent dead code elimination - •Setup outside loop - only measure what matters
- •Run multiple times -
-count=10minimum for reliable comparison - •Ignore small changes - under 5% is probably noise
- •Profile when stuck - CPU and memory profiles show bottlenecks
- •Measure before optimizing - assumptions are usually wrong
Implementation Checklist
When /go-benchmark is invoked:
- •✓ Use
b.Loop()pattern - •✓ Add
b.ReportAllocs()if allocations matter - •✓ Keep results alive (assign to var or
_ =) - •✓ Setup before loop, cleanup after
- •✓ Run with
-count=10for baseline - •✓ Make one incremental change
- •✓ Measure and compare with
benchstat - •✓ Verify improvement is significant (> 5%, low p-value)
- •✓ Document findings in commit message
- •✓ Consider profiling if results unexpected