Go Performance Principles
Profiling First
- Never optimise without profiling; measure before changing
- See `PERF.md` for the full profiling workflow, platform-specific notes, and containerised tooling setup (Pyroscope, Grafana)
- CPU profile: `go tool pprof cpu.prof`
- Memory profile: `go tool pprof -alloc_space mem.prof`
- Goroutine profile: check for leaks and excessive goroutines
- Trace: `go tool trace trace.out` for latency analysis
- Use `net/http/pprof` endpoints in services for live profiling
- Use `fgprof` for wall-clock profiling (captures I/O wait + CPU)
Benchmarking
- Use `testing.B` with `b.ReportAllocs()`
- Run with `go test -bench=. -benchmem -count=10`
- Compare with `benchstat old.txt new.txt`
- Benchmark realistic data sizes, not just small inputs
- Use `b.ResetTimer()` after expensive setup
- Store benchmark results: `tee .perf/$(date +%Y-%m-%d)/bench.txt`
Allocation Reduction (Uber style)
- Pre-allocate slices: `make([]T, 0, expectedCap)`
- Pre-allocate maps: `make(map[K]V, expectedSize)`
- Use `sync.Pool` for frequently allocated/deallocated objects
- Prefer `strconv` over `fmt` for primitive conversions (significantly faster)
- Use `strings.Builder` for string concatenation in loops
- Avoid `fmt.Sprintf` in hot paths
- Avoid interface boxing in hot paths (causes heap allocation)
- Use pointer receivers to avoid copying large structs
- Check escape analysis: `go build -gcflags='-m'`
- Avoid repeated string-to-byte conversions; cache the result
Concurrency Performance
- Use `sync.Map` only for read-heavy workloads with stable keys
- Prefer sharded maps with `sync.RWMutex` for write-heavy workloads
- Bound goroutine creation with worker pools or semaphores
- Use `runtime.GOMAXPROCS()` awareness for CPU-bound work
- Channels should have size zero or one; profile before using larger buffers
- Use `go.uber.org/goleak` in tests to detect goroutine leaks
HTTP / Network
- Reuse `http.Client` and `http.Transport`; never create per-request
- Set `MaxIdleConns`, `MaxIdleConnsPerHost`, `IdleConnTimeout`
- Use `context.WithTimeout` for all outbound calls
- Enable HTTP/2 where supported
- Use connection pooling for database clients
Kubernetes Controller Performance
- Use informer caches; never list from the API server in reconcile
- Use predicates to filter events before they reach the reconciler
- Set appropriate `MaxConcurrentReconciles` for the workload
- Use `client.Reader` (cached) for reads, `client.Writer` for writes
- Avoid unnecessary status updates; compare before writing
Continuous Profiling (Production)
- Use Grafana Pyroscope for continuous profiling in production
- Push mode: `github.com/grafana/pyroscope-go` SDK
- Pull mode: Grafana Alloy scraping pprof endpoints
- See `PERF.md` for Docker Compose and Kubernetes deployment
- Record findings in beads: `bd create "Perf: finding" --type note --labels perf`
Recording Results
After profiling, record findings in beads for persistent memory:
```bash
bd create "Perf: description of finding" --type note --labels perf,context
bd create "Baseline: metric at N resources" --type note --labels perf,baseline
```