AgentSkillsCN

performance

Performance optimisation expertise for Go. Use this skill when profiling, benchmarking, reducing allocations, optimising hot paths, analysing latency, improving throughput, or working with tools such as pprof, benchstat, trace, Pyroscope, or fgprof.

SKILL.md
--- frontmatter
name: performance
description: >
  Performance optimisation expertise for Go. Use when profiling, benchmarking,
  reducing allocations, optimising hot paths, analysing latency, improving
  throughput, or working with pprof, benchstat, trace, Pyroscope, or fgprof.

Go Performance Principles

Profiling First

  • Never optimise without profiling — measure before changing
  • See PERF.md for the full profiling workflow, platform-specific notes, and containerised tooling setup (Pyroscope, Grafana)
  • CPU profile: go tool pprof cpu.prof
  • Memory profile: go tool pprof -alloc_space mem.prof
  • Goroutine profile: check for leaks and excessive goroutines
  • Trace: go tool trace trace.out for latency analysis
  • Use net/http/pprof endpoints in services for live profiling
  • Use fgprof for wall-clock profiling (captures I/O wait + CPU)

Benchmarking

  • Use testing.B with b.ReportAllocs()
  • Run with go test -bench=. -benchmem -count=10
  • Compare with benchstat old.txt new.txt
  • Benchmark realistic data sizes, not just small inputs
  • Use b.ResetTimer() after expensive setup
  • Store benchmark results: tee .perf/$(date +%Y-%m-%d)/bench.txt
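A sketch of the benchmark shape described above, with setup excluded via `b.ResetTimer()` and allocations reported; `joinComma` is a hypothetical function under test:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// joinComma is the hypothetical function under test.
func joinComma(parts []string) string {
	var b strings.Builder
	for i, p := range parts {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(p)
	}
	return b.String()
}

func BenchmarkJoinComma(b *testing.B) {
	// Expensive setup: build a realistic input outside the timed region.
	parts := make([]string, 1000)
	for i := range parts {
		parts[i] = "item"
	}
	b.ReportAllocs() // report allocs/op alongside ns/op
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		_ = joinComma(parts)
	}
}

func main() {
	// testing.Benchmark lets the sketch run outside `go test`.
	r := testing.Benchmark(BenchmarkJoinComma)
	fmt.Println(r.N > 0)
}
```

In a real project this lives in a `_test.go` file and is driven by `go test -bench=. -benchmem -count=10`, with the output fed to benchstat.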

Allocation Reduction (Uber style)

  • Pre-allocate slices: make([]T, 0, expectedCap)
  • Pre-allocate maps: make(map[K]V, expectedSize)
  • Use sync.Pool for frequently allocated/deallocated objects
  • Prefer strconv over fmt for primitive conversions (significantly faster)
  • Use strings.Builder for string concatenation in loops
  • Avoid fmt.Sprintf in hot paths
  • Avoid interface boxing in hot paths (causes heap allocation)
  • Use pointer receivers to avoid copying large structs
  • Check escape analysis: go build -gcflags='-m'
  • Avoid repeated string-to-byte conversions — cache the result
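Several of the bullets above combine in one small sketch: a pre-sized `strings.Builder` plus `strconv` instead of `fmt.Sprintf`. The function `formatIDs` is illustrative:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatIDs renders ints as "1,2,3" with minimal allocations:
// a pre-sized builder and strconv instead of fmt.
func formatIDs(ids []int) string {
	var b strings.Builder
	b.Grow(len(ids) * 4) // rough pre-size; avoids repeated buffer growth
	for i, id := range ids {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(strconv.Itoa(id)) // strconv.Itoa avoids fmt's interface boxing
	}
	return b.String()
}

func main() {
	fmt.Println(formatIDs([]int{1, 2, 3}))
}
```

Confirm the win with a benchmark and `go build -gcflags='-m'` rather than assuming it.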

Concurrency Performance

  • Use sync.Map only for read-heavy workloads with stable keys
  • Prefer sharded maps with sync.RWMutex for write-heavy workloads
  • Bound goroutine creation with worker pools or semaphores
  • Size CPU-bound worker counts from runtime.GOMAXPROCS(0)
  • Channels should have size zero or one — profile before using larger buffers
  • Use go.uber.org/goleak in tests to detect goroutine leaks

HTTP / Network

  • Reuse http.Client and http.Transport — never create per-request
  • Set MaxIdleConns, MaxIdleConnsPerHost, IdleConnTimeout
  • Use context.WithTimeout for all outbound calls
  • Enable HTTP/2 where supported
  • Use connection pooling for database clients

Kubernetes Controller Performance

  • Use informer caches — never list from the API server in reconcile
  • Use predicates to filter events before they reach the reconciler
  • Set appropriate MaxConcurrentReconciles for the workload
  • Use client.Reader (cached) for reads, client.Writer for writes
  • Avoid unnecessary status updates — compare before writing
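The "compare before writing" bullet can be sketched independently of controller-runtime; `Status` here is a hypothetical stand-in for a CRD status subresource:

```go
package main

import (
	"fmt"
	"reflect"
)

// Status is a stand-in for a CRD status subresource (hypothetical type).
type Status struct {
	Ready    bool
	Replicas int
}

// needsStatusUpdate reports whether a status write is actually required.
// Skipping no-op writes avoids API-server load and requeue churn.
func needsStatusUpdate(observed, desired Status) bool {
	return !reflect.DeepEqual(observed, desired)
}

func main() {
	observed := Status{Ready: true, Replicas: 3}
	desired := Status{Ready: true, Replicas: 3}
	if needsStatusUpdate(observed, desired) {
		fmt.Println("update status")
	} else {
		fmt.Println("skip write") // no diff, no API call
	}
}
```

In a real reconciler the same comparison runs between the cached object's status and the status you are about to write.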

Continuous Profiling (Production)

  • Use Grafana Pyroscope for continuous profiling in production
  • Push mode: github.com/grafana/pyroscope-go SDK
  • Pull mode: Grafana Alloy scraping pprof endpoints
  • See PERF.md for Docker Compose and Kubernetes deployment
  • Record findings in beads: bd create "Perf: finding" --type note --labels perf

Recording Results

After profiling, record findings in beads for persistent memory:

```bash
bd create "Perf: description of finding" --type note --labels perf,context
bd create "Baseline: metric at N resources" --type note --labels perf,baseline
```