Benchmark Common Skill
This skill provides comprehensive benchmarking standards for performance comparison. Designed to compare multiple implementations in a single benchmark file.
When to Use
- •Compare different implementations (e.g., RingBuffer vs LinkedListBuffer)
- •Measure performance of Common packages
- •Validate optimization effectiveness
- •Find performance bottlenecks
Quality Standards
| Metric | Requirement |
|---|---|
| Warm-up | Reset state between iterations |
| Fair Comparison | Same data sizes, same conditions |
| Multiple Sizes | Test small, medium, large data |
| Memory Reporting | Use b.ReportAllocs() |
| Clear Naming | BenchmarkOperation_Implementation_Size |
Workflow (Strict 3-Step Process)
Step 1: Benchmark Design
Identify:
- •Implementations to compare: List all types/approaches
- •Operations to benchmark: Core operations (Read, Write, etc.)
- •Data sizes: Small (64B), Medium (1KB), Large (1MB)
- •Setup requirements: Pre-allocation, warm-up needs
Output Format:
markdown
## Benchmark Design: [comparison name] ### Implementations - [Implementation 1]: Description - [Implementation 2]: Description ### Operations | Operation | Description | Why benchmark? | |-----------|-------------|----------------| | Write | Append data | Core operation | | Read | Consume data | Core operation | ### Data Sizes - Small: 64 bytes (cache-friendly) - Medium: 1KB (typical payload) - Large: 1MB (stress test)
STOP and wait for user approval.
Step 2: Benchmark Case Design
Create benchmark matrix:
| Benchmark ID | Operation | Implementation | Size | Setup |
|---|---|---|---|---|
| B1.1 | Write | RingBuffer | 64B | New(1024) |
| B1.2 | Write | LinkedListBuffer | 64B | New() |
| B2.1 | Write | RingBuffer | 1KB | New(4096) |
| B2.2 | Write | LinkedListBuffer | 1KB | New() |
Considerations:
- •Same data for all implementations
- •Pre-allocate to avoid setup cost in measurement
- •Reset between iterations
- •Report memory allocations
STOP and wait for user approval.
Step 3: Benchmark Code Implementation
Follow these Go benchmarking standards:
File Structure
code
benchmark_comparison_test.go
├── // Data generators
│ var smallData = make([]byte, 64)
│ var mediumData = make([]byte, 1024)
│ var largeData = make([]byte, 1<<20)
│
├── // Comparison benchmarks (grouped by operation)
│ func BenchmarkWrite(b *testing.B)
│ ├── b.Run("RingBuffer/64B", ...)
│ ├── b.Run("RingBuffer/1KB", ...)
│ ├── b.Run("LinkedList/64B", ...)
│ └── b.Run("LinkedList/1KB", ...)
│
├── func BenchmarkRead(b *testing.B)
│ ├── b.Run("RingBuffer/64B", ...)
│ └── ...
Code Patterns
Comparison Benchmark (Grouped by Operation):
go
func BenchmarkWrite(b *testing.B) {
sizes := []struct {
name string
data []byte
}{
{"64B", make([]byte, 64)},
{"1KB", make([]byte, 1024)},
{"1MB", make([]byte, 1<<20)},
}
for _, size := range sizes {
// RingBuffer
b.Run("RingBuffer/"+size.name, func(b *testing.B) {
buf := NewRing(len(size.data) * 2)
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
buf.Write(size.data)
buf.Reset()
}
})
// LinkedListBuffer
b.Run("LinkedList/"+size.name, func(b *testing.B) {
buf := &LinkedListBuffer{}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
buf.PushBack(size.data)
buf.Reset()
}
})
}
}
Read After Write Pattern:
go
func BenchmarkRead(b *testing.B) {
data := make([]byte, 1024)
readBuf := make([]byte, 1024)
b.Run("RingBuffer", func(b *testing.B) {
buf := NewRing(2048)
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
buf.Write(data)
buf.Read(readBuf)
}
})
b.Run("LinkedList", func(b *testing.B) {
buf := &LinkedListBuffer{}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
buf.PushBack(data)
buf.Read(readBuf)
}
})
}
Memory-Heavy Benchmark:
go
func BenchmarkMemory(b *testing.B) {
b.Run("RingBuffer/Grow", func(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
buf := NewRing(64)
for j := 0; j < 1000; j++ {
buf.Write(make([]byte, 100))
}
}
})
}
Running Benchmarks
bash
# Run all benchmarks go test -bench=. -benchmem # Run specific comparison go test -bench=BenchmarkWrite -benchmem # Run with count for stability go test -bench=. -benchmem -count=5 # Compare with benchstat go test -bench=. -benchmem -count=10 > old.txt # (make changes) go test -bench=. -benchmem -count=10 > new.txt benchstat old.txt new.txt
Output Interpretation
code
BenchmarkWrite/RingBuffer/64B-8 10000000 120 ns/op 0 B/op 0 allocs/op BenchmarkWrite/LinkedList/64B-8 5000000 250 ns/op 64 B/op 1 allocs/op
| Column | Meaning |
|---|---|
-8 | GOMAXPROCS (8 cores) |
10000000 | Iterations run |
120 ns/op | Time per operation |
0 B/op | Bytes allocated per op |
0 allocs/op | Allocations per op |
Interpretation: RingBuffer ~2x faster, no allocations.
Rules (Non-negotiable)
- •Always use b.ResetTimer() after setup code
- •Always use b.ReportAllocs() for memory analysis
- •Same data sizes for fair comparison
- •Reset state between iterations
- •Clear naming convention:
Operation/Implementation/Size - •Run multiple times for stable results
Best Practices
| Practice | Why |
|---|---|
| Pre-allocate data outside loop | Avoid measurement pollution |
Use b.StopTimer() / b.StartTimer() | Exclude setup from measurement |
| Reset buffers after write | Simulate real usage pattern |
| Test multiple sizes | Find performance characteristics |
Run -count=10 | Statistical stability |
Approval Prompt
After each step, ask:
"Please review the above and confirm if I should proceed to the next step."