What I do
- Provide a systematic methodology for performance analysis
- Document universal profiling and benchmarking tools
- Guide flame graph interpretation and bottleneck identification
- Define performance regression testing principles
When to use me
Use this skill as the entry point for any performance analysis task. Pair it with a language-specific skill (perf-go, perf-rust, perf-typescript) for tooling details.
Performance Methodology
Every performance task follows the same loop:
1. Measure -- establish a baseline (latency, throughput, memory, CPU)
2. Profile -- collect data on where time and resources are spent
3. Identify -- find the bottleneck (CPU? Memory? I/O? Contention?)
4. Optimize -- fix only the identified bottleneck
5. Verify -- re-measure to confirm improvement and check for regressions
Do not skip steps. Do not optimize without profiling first.
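For a concrete picture of the loop, here is a minimal sketch using the tools documented below; the program name and input file are placeholders.

```bash
# Hypothetical CLI program; ./my-program and data.json are placeholders.

# 1. Measure: record a statistically sound baseline
hyperfine --warmup 3 './my-program data.json'

# 2. Profile: sample CPU stacks while the workload runs
perf record -g -- ./my-program data.json

# 3. Identify: inspect where the samples landed
perf report

# 4. Optimize the identified bottleneck, rebuild, then
# 5. Verify: compare old and new binaries under identical conditions
hyperfine --warmup 3 './my-program-old data.json' './my-program-new data.json'
```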
Universal Tools
| Tool | Purpose | Install |
|---|---|---|
| hyperfine | CLI benchmark runner with statistical analysis | cargo install hyperfine |
| perf | Linux CPU profiler (sampling, counters) | apt install linux-tools-common linux-tools-generic |
| flamegraph | Generate flame graphs from perf data | cargo install flamegraph |
| valgrind | Memory error detection and heap profiling | apt install valgrind |
| strace | Syscall tracing for I/O analysis | apt install strace |
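The profilers above map directly onto the Identify step: a quick triage tells you whether the program is CPU-, I/O-, or memory-bound before you commit to an optimization. A rough sketch (./my-program and input.txt are placeholders):

```bash
# CPU-bound? Check cycles, instructions per cycle, and cache misses
perf stat -d ./my-program input.txt

# I/O-bound? Summarize time spent per syscall
strace -c -f ./my-program input.txt

# Memory-bound? Profile heap usage over the run
valgrind --tool=massif ./my-program input.txt
ms_print massif.out.*   # massif writes a massif.out.<pid> snapshot file
```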
Quick benchmarking with hyperfine
```bash
# Single command
hyperfine 'my-program --input data.json'

# Compare two implementations
hyperfine 'my-program-v1 input.txt' 'my-program-v2 input.txt'

# Warmup runs and minimum iterations
hyperfine --warmup 3 --min-runs 10 'my-program'
```
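Generating the flame graphs discussed in the next section usually starts from perf samples. A sketch, assuming either the flamegraph binary from the cargo crate or Brendan Gregg's FlameGraph scripts on PATH; program names and arguments are placeholders:

```bash
# One step: sample ./my-program under perf and render an interactive SVG
flamegraph -o flame.svg -- ./my-program --input data.json

# Manual route: record samples, fold stacks, render
perf record -F 99 -g -- ./my-program --input data.json
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
```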
Reading Flame Graphs
- Width = time spent in that function (wider = hotter)
- Height = call stack depth (taller = deeper call chain)
- Look for wide plateaus -- these are hot functions consuming the most time
- Ignore narrow towers -- deep but fast, not the bottleneck
- Use differential flame graphs to compare before/after optimization (see the sketch after this list)
- Colors are arbitrary in most tools -- width is the only metric that matters
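For the before/after comparison, a differential flame graph can be built from two folded-stack files. A sketch, assuming the FlameGraph scripts (stackcollapse-perf.pl, difffolded.pl, flamegraph.pl) are on PATH; the old/new binaries and input are placeholders:

```bash
# Record and fold stacks for the old and new builds
perf record -g -o perf-before.data -- ./my-program-old input.txt
perf script -i perf-before.data | stackcollapse-perf.pl > before.folded

perf record -g -o perf-after.data -- ./my-program-new input.txt
perf script -i perf-after.data | stackcollapse-perf.pl > after.folded

# Frames that grew show up red, frames that shrank show up blue
difffolded.pl before.folded after.folded | flamegraph.pl > diff.svg
```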
Performance Regression Testing
- Run benchmarks in CI on every pull request
- Compare results against a baseline from the default branch
- Alert on regressions exceeding a defined threshold (e.g., >5%; see the sketch after this list)
- Store benchmark results as CI artifacts for historical comparison
- Use statistical analysis (multiple runs, confidence intervals) to avoid false positives from noise
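A minimal way to wire the threshold check into CI, assuming hyperfine and jq are available; the 5% threshold, file names, and program name are examples, not fixed conventions:

```bash
# Baseline from the default branch (stored as a CI artifact in this sketch)
hyperfine --warmup 3 --min-runs 10 --export-json baseline.json './my-program input.txt'

# Current run on the pull request branch
hyperfine --warmup 3 --min-runs 10 --export-json current.json './my-program input.txt'

# Fail the job if the mean runtime regressed by more than 5%
baseline=$(jq '.results[0].mean' baseline.json)
current=$(jq '.results[0].mean' current.json)
awk -v b="$baseline" -v c="$current" 'BEGIN { if (c > b * 1.05) exit 1 }' ||
  { echo "Regression: mean ${current}s vs baseline ${baseline}s"; exit 1; }
```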
Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| Optimizing without measuring | You will optimize the wrong thing |
| Microbenchmarks in isolation | Miss system-level bottlenecks and real-world interactions |
| Optimizing cold paths | 1% of code often accounts for 99% of runtime |
| Premature optimization | Correctness and clarity first, then measure |
| Single-run benchmarks | Statistical noise masks the real signal |
| Guessing the bottleneck | Profile data beats intuition every time |
Language-Specific Skills
After establishing methodology with this skill, load the appropriate language-specific skill for concrete tooling:
| Language | Skill | Coverage |
|---|---|---|
| Go | perf-go | pprof, testing.B, benchstat, GC tuning |
| Rust | perf-rust | criterion, divan, cargo flamegraph, LLVM codegen |
| TypeScript / Node.js | perf-typescript | V8 profiling, clinic.js, vitest bench |