Performance Engineering
Evidence-based performance optimization: measure → profile → optimize → validate.
<when_to_use>
- •Profiling slow code paths or bottlenecks
- •Identifying memory leaks or excessive allocations
- •Optimizing latency-critical operations (P95, P99)
- •Benchmarking competing implementations
- •Database query optimization
- •Reducing CPU usage in hot paths
- •Improving throughput (RPS, ops/sec)
NOT for: premature optimization, optimization without measurement, guessing at bottlenecks
</when_to_use>
<iron_law>
NO OPTIMIZATION WITHOUT MEASUREMENT
Required workflow:
- •Measure baseline performance with realistic workload
- •Profile to identify actual bottleneck
- •Optimize the bottleneck (not what you think is slow)
- •Measure again to verify improvement
- •Document gains and tradeoffs
Optimizing unmeasured code wastes effort on the wrong target and risks introducing bugs.
</iron_law>
<stages>Load the maintain-tasks skill for stage tracking:
Stage 1: Establishing baseline
- •content: "Establish performance baseline with realistic workload"
- •activeForm: "Establishing performance baseline"
Stage 2: Profiling bottlenecks
- •content: "Profile code to identify actual bottlenecks"
- •activeForm: "Profiling code to identify bottlenecks"
Stage 3: Analyzing root cause
- •content: "Analyze profiling data to determine root cause"
- •activeForm: "Analyzing profiling data"
Stage 4: Implementing optimization
- •content: "Implement targeted optimization for identified bottleneck"
- •activeForm: "Implementing optimization"
Stage 5: Validating improvement
- •content: "Measure performance gains and verify no regressions"
- •activeForm: "Validating performance improvement"
</stages>
Key Performance Indicators
Latency (response time):
- •P50 (median) — typical case
- •P95 — 95% of requests complete within this latency
- •P99 — tail latency
- •P99.9 — outliers
- •TTFB — time to first byte
- •TTLB — time to last byte
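This is a minimal sketch of turning raw latency samples into the percentiles above. It assumes the nearest-rank definition; other tools interpolate between samples, so their results can differ slightly. The function name and the sample values are illustrative.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples")
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]")
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p * sorted.length) / 100) // 1-based rank
  return sorted[rank - 1]
}

// Illustrative latency samples in ms (not real measurements)
const latenciesMs = [12, 15, 11, 90, 14, 13, 16, 12, 250, 14]
for (const p of [50, 95, 99]) {
  console.log(`P${p}: ${percentile(latenciesMs, p)}ms`)
}
```

With only 10 samples, P95 and P99 both land on the single slowest request. Tail percentiles only become meaningful with large sample counts.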
Throughput:
- •RPS — requests per second
- •ops/sec — operations per second
- •bytes/sec — data transfer rate
- •queries/sec — database throughput
Memory:
- •Heap usage — allocated memory
- •GC frequency — how often garbage collection runs
- •GC duration — stop-the-world time
- •Allocation rate — memory churn
- •Resident set size (RSS) — physical memory held by the process
CPU:
- •CPU time — total compute
- •Wall time — elapsed time
- •Hot paths — frequently executed code
- •Time complexity — algorithmic efficiency
- •CPU utilization — percentage used
Always measure:
- •Before optimization (baseline)
- •After optimization (improvement)
- •Under realistic load (not toy data)
- •Multiple runs (account for variance)
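One way to cover "multiple runs" is a small harness. It first runs a warm-up so JIT compilation and caches settle, then reports the spread rather than a single number. This is a sketch: measureRuns, its default run counts, and the sort workload are illustrative choices, not a standard API.

```typescript
interface RunStats {
  runs: number
  meanMs: number
  stdDevMs: number
  minMs: number
  maxMs: number
}

// Call fn `warmup` times untimed, then time each of `runs` calls with performance.now().
function measureRuns(fn: () => void, runs = 20, warmup = 3): RunStats {
  if (runs < 2) throw new RangeError("need at least 2 runs to estimate variance")
  for (let i = 0; i < warmup; i++) fn()

  const times: number[] = []
  for (let i = 0; i < runs; i++) {
    const start = performance.now()
    fn()
    times.push(performance.now() - start)
  }

  const mean = times.reduce((sum, t) => sum + t, 0) / runs
  // Sample standard deviation (n - 1 in the denominator)
  const variance = times.reduce((sum, t) => sum + (t - mean) ** 2, 0) / (runs - 1)
  return {
    runs,
    meanMs: mean,
    stdDevMs: Math.sqrt(variance),
    minMs: Math.min(...times),
    maxMs: Math.max(...times),
  }
}

// Illustrative workload: generate and sort 50k random numbers on each run
const stats = measureRuns(() => {
  const data = Array.from({ length: 50_000 }, () => Math.random())
  data.sort((a, b) => a - b)
})
console.log(
  `mean ${stats.meanMs.toFixed(2)}ms ± ${stats.stdDevMs.toFixed(2)}ms ` +
    `(min ${stats.minMs.toFixed(2)}, max ${stats.maxMs.toFixed(2)}, n=${stats.runs})`,
)
```

If the standard deviation is large relative to the difference you want to show, collect more runs or use a quieter machine before drawing conclusions.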
<profiling_tools>
TypeScript/Bun
Built-in timing:
```typescript
console.time('operation')
// ... code to measure
console.timeEnd('operation')

// High precision (Bun-specific): Bun.nanoseconds() counts ns since process start
const start = Bun.nanoseconds()
// ... code to measure
const elapsed = Bun.nanoseconds() - start
console.log(`Took ${elapsed / 1_000_000}ms`)
```
Performance API:
```typescript
performance.mark('start')
// ... code to measure
performance.mark('end')
performance.measure('operation', 'start', 'end')
// Filter by entry type so only the measure (not a same-named mark) is returned
const [measure] = performance.getEntriesByName('operation', 'measure')
console.log(`Duration: ${measure.duration}ms`)
```
Memory profiling:
- •Chrome DevTools → Memory tab → heap snapshots
- •Node.js --inspect flag + Chrome DevTools
- •process.memoryUsage() for RSS/heap tracking
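This is a rough sketch of process.memoryUsage() wrapped around a single operation. memoryDelta is a hypothetical helper. The deltas are approximate because GC can run mid-operation, so use heap snapshots when hunting leaks.

```typescript
// Hypothetical helper: log heap and RSS change (in MiB) around one synchronous call.
function memoryDelta<T>(label: string, fn: () => T): T {
  const toMiB = (bytes: number) => (bytes / 1024 / 1024).toFixed(1)
  const before = process.memoryUsage()
  const result = fn()
  const after = process.memoryUsage()
  console.log(
    `${label}: heapUsed ${toMiB(after.heapUsed - before.heapUsed)} MiB, ` +
      `rss ${toMiB(after.rss - before.rss)} MiB`,
  )
  return result
}

// Illustrative: allocate 1M small objects and keep them alive by returning them
const objects = memoryDelta("allocate 1M objects", () =>
  Array.from({ length: 1_000_000 }, (_, i) => ({ id: i })),
)
console.log(objects.length) // 1000000
```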
CPU profiling:
- •Chrome DevTools → Performance tab → record session
- •Node.js --prof flag + node --prof-process
- •Flamegraphs for visualization
Rust
Benchmarking:
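```rust
// benches/my_benchmark.rs: Cargo.toml needs criterion under [dev-dependencies]
// plus a [[bench]] entry with name = "my_benchmark" and harness = false
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Stand-in for the code under test
fn my_function(n: u64) -> u64 {
    (0..n).sum()
}

fn benchmark_function(c: &mut Criterion) {
    c.bench_function("my_function", |b| b.iter(|| my_function(black_box(42))));
}

criterion_group!(benches, benchmark_function);
criterion_main!(benches);
```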
Profiling:
- •cargo bench — criterion benchmarks
- •perf record + perf report — Linux profiling
- •cargo flamegraph — visual flamegraphs
- •cargo bloat — binary size analysis
- •valgrind --tool=callgrind — detailed call-graph profiling
- •heaptrack — memory profiling
Instrumentation:
```rust
use std::time::Instant;

fn main() {
    let start = Instant::now();
    // ... code to measure
    let duration = start.elapsed();
    println!("Took: {:?}", duration);
}
```
</profiling_tools>
<optimization_patterns>
Algorithm Improvements
Time complexity:
- •O(n²) → O(n log n) — sorting, searching
- •O(n) → O(log n) — binary search, trees
- •O(n) → O(1) — hash maps, memoization
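Here is an illustration of the hash-map bullet. Replacing an inner linear scan with a Set lookup turns an O(n²) duplicate check into an O(n) average-case one. Both function names are hypothetical.

```typescript
// O(n²): for each element, scan the rest of the array
function hasDuplicateQuadratic(items: readonly string[]): boolean {
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (items[i] === items[j]) return true
    }
  }
  return false
}

// O(n) average: Set.has / Set.add are O(1) on average
function hasDuplicateLinear(items: readonly string[]): boolean {
  const seen = new Set<string>()
  for (const item of items) {
    if (seen.has(item)) return true
    seen.add(item)
  }
  return false
}

console.log(hasDuplicateQuadratic(["a", "b", "a"])) // true
console.log(hasDuplicateLinear(["a", "b", "c"])) // false
```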
Space-time tradeoffs:
- •Cache computed results (memoization)
- •Precompute expensive operations
- •Index data for faster lookup
- •Use hash maps for O(1) average-case access
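This is a minimal memoization sketch. It assumes a pure, single-argument function whose argument works as a Map key: a primitive, or the same object reference on each call. memoize is a hypothetical helper, and Fibonacci is only a demonstration workload.

```typescript
// Wrap a pure single-argument function so repeated inputs hit a Map instead of recomputing.
function memoize<A, R>(fn: (arg: A) => R): (arg: A) => R {
  const cache = new Map<A, R>()
  return (arg: A): R => {
    if (cache.has(arg)) return cache.get(arg) as R
    const result = fn(arg)
    cache.set(arg, result)
    return result
  }
}

// Naive recursive Fibonacci is exponential; routing the recursion through the
// memoized wrapper computes each n once, making it O(n).
const fib: (n: number) => number = memoize((n: number): number =>
  n < 2 ? n : fib(n - 1) + fib(n - 2),
)

console.log(fib(50)) // 12586269025
```

The cache grows without bound. In long-running processes, cap it (for example with an LRU) or memoize only where profiling shows repeated inputs.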
Memory Optimization
Reduce allocations:
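```typescript
// transform(item) stands in for the per-item work (hypothetical helper)
// Bad: allocates a new array on every iteration, then discards it
for (const item of items) {
  const results = []
  results.push(transform(item))
}
// Good: allocate once and reuse it across iterations
const results = []
for (const item of items) {
  results.push(transform(item))
}
```
```rust
// Bad: allocates a new String on every call
fn format_user(name: &str) -> String {
    format!("User: {}", name)
}

// Good: writes into a caller-owned buffer; clear() keeps its capacity across calls
fn format_user_into(name: &str, buf: &mut String) {
    buf.clear();
    buf.push_str("User: ");
    buf.push_str(name);
}
```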
Memory pooling:
- •Reuse expensive objects (connections, buffers)
- •Object pools for frequently allocated types
- •Arena allocators for batch allocations
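This is a minimal object-pool sketch. ObjectPool, its maxSize default, and the 64 KiB buffer size are illustrative assumptions. Pooling mainly pays off for objects that are large or expensive to create. For small, short-lived objects, a generational GC is often cheaper than the pool's bookkeeping, so measure before and after.

```typescript
// Minimal object pool: hands back released objects before creating new ones.
class ObjectPool<T> {
  private readonly free: T[] = []
  private readonly create: () => T
  private readonly reset: (obj: T) => void
  private readonly maxSize: number

  constructor(create: () => T, reset: (obj: T) => void, maxSize = 100) {
    this.create = create
    this.reset = reset
    this.maxSize = maxSize
  }

  acquire(): T {
    // Reuse an idle object if one exists, otherwise construct a new one
    return this.free.pop() ?? this.create()
  }

  release(obj: T): void {
    this.reset(obj)
    // Once the pool is full, drop the object and let the GC reclaim it
    if (this.free.length < this.maxSize) this.free.push(obj)
  }

  get available(): number {
    return this.free.length
  }
}

// Illustrative: reuse 64 KiB scratch buffers instead of allocating one per request
const bufferPool = new ObjectPool(
  () => new Uint8Array(64 * 1024),
  (buf) => buf.fill(0),
)

const buf = bufferPool.acquire()
// ... use buf ...
bufferPool.release(buf)
console.log(bufferPool.acquire() === buf) // true: the released buffer is reused
```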
Lazy evaluation:
- •Compute only when needed
- •Stream processing vs loading all data
- •Iterators over materialized collections
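This sketch shows lazy evaluation with generators. Each stage pulls one value at a time, so an unbounded source works and the pipeline stops as soon as take has enough. mapLazy, filterLazy, take, and naturals are hypothetical helpers written for this example.

```typescript
function* mapLazy<T, U>(source: Iterable<T>, fn: (x: T) => U): Generator<U> {
  for (const x of source) yield fn(x)
}

function* filterLazy<T>(source: Iterable<T>, pred: (x: T) => boolean): Generator<T> {
  for (const x of source) if (pred(x)) yield x
}

// Stops pulling from the source once n items have been yielded
function* take<T>(source: Iterable<T>, n: number): Generator<T> {
  if (n <= 0) return
  let count = 0
  for (const x of source) {
    yield x
    if (++count >= n) return
  }
}

// Unbounded source: only usable because every consumer is lazy
function* naturals(): Generator<number> {
  for (let n = 1; ; n++) yield n
}

let evaluated = 0
const firstThreeSquaresOver50 = [
  ...take(
    filterLazy(
      mapLazy(naturals(), (n) => {
        evaluated++
        return n * n
      }),
      (sq) => sq > 50,
    ),
    3,
  ),
]
console.log(firstThreeSquaresOver50) // [ 64, 81, 100 ]
console.log(evaluated) // 10: only n = 1..10 were ever squared
```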
I/O Optimization
Batching:
- •Batch API calls (1 request vs 100)
- •Batch database writes (bulk insert)
- •Batch file operations (single write vs many)
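This is a minimal batching helper. chunk is a hypothetical function, and bulkInsert in the trailing comment stands in for whatever bulk endpoint or multi-row INSERT your system provides.

```typescript
// Split items into batches of at most `size`, so N items cost ceil(N / size) calls instead of N.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer")
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

const records = Array.from({ length: 250 }, (_, id) => ({ id }))
const batches = chunk(records, 100)
console.log(batches.map((b) => b.length)) // [ 100, 100, 50 ]: 3 calls instead of 250

// With a real client, issue one request per batch, e.g.:
// for (const batch of batches) await bulkInsert(batch)  // bulkInsert is hypothetical
```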
Caching:
- •Cache expensive computations
- •Cache database queries (Redis, in-memory)
- •Cache API responses (HTTP caching)
- •Invalidate stale cache entries
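This sketch shows an in-memory TTL cache with explicit invalidation, used in the cache-aside pattern. TtlCache, getUser, and loadUser are hypothetical names. The injectable clock exists only to make expiry testable. This design cannot distinguish a cached undefined from a miss.

```typescript
// In-memory cache with per-entry time-to-live and explicit invalidation.
class TtlCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>()
  private readonly ttlMs: number
  private readonly now: () => number

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.now = now
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key) // stale: evict lazily on read
      return undefined
    }
    return entry.value
  }

  set(key: K, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }

  // Call when the underlying data changes so readers never see the old value
  invalidate(key: K): void {
    this.entries.delete(key)
  }
}

// Cache-aside: check the cache, fall back to the expensive call on a miss.
// loadUser is a hypothetical stand-in for a database query or API call.
async function getUser(
  cache: TtlCache<string, string>,
  id: string,
  loadUser: (id: string) => Promise<string>,
): Promise<string> {
  const cached = cache.get(id)
  if (cached !== undefined) return cached
  const user = await loadUser(id)
  cache.set(id, user)
  return user
}
```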
Async I/O:
- •Non-blocking operations (async/await)
- •Concurrent requests (Promise.all, tokio::spawn)
- •Connection pooling (reuse connections)
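Promise.all starts every task at once, which on large inputs can exhaust a connection pool or trip rate limits. A common alternative is a fixed number of workers pulling from a shared index. This is a sketch: mapWithConcurrency is a hypothetical helper, and the 50 ms sleep simulates an I/O call.

```typescript
// Run async tasks concurrently, but never more than `limit` at once; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be a positive integer")
  const results = new Array<R>(items.length)
  let next = 0

  // Each worker claims the next unprocessed index. No lock is needed because
  // the check-and-increment runs synchronously on JavaScript's single thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i], i)
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => worker()))
  return results
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// 6 tasks, 3 at a time: roughly 2 × 50 ms instead of 6 × 50 ms sequentially
mapWithConcurrency([1, 2, 3, 4, 5, 6], 3, async (n) => {
  await sleep(50)
  return n * 2
}).then((doubled) => console.log(doubled)) // [ 2, 4, 6, 8, 10, 12 ]
```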
Database Optimization
Query optimization:
- •Add indexes for common queries
- •Use EXPLAIN / EXPLAIN ANALYZE to inspect query plans (ANALYZE actually executes the query)
- •Avoid N+1 queries (use joins or batch loading)
- •Select only needed columns
- •Filter at database level (WHERE vs client filter)
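This sketch contrasts N+1 lookups with batch loading. FakeDb is a hypothetical in-memory stand-in in which each method call models one query round trip. With a real driver, the batched form would be a single WHERE id IN (...) query or a join.

```typescript
interface Author {
  id: number
  name: string
}
interface Post {
  id: number
  authorId: number
  title: string
}

// Hypothetical database stand-in: each method call counts as one query.
class FakeDb {
  queries = 0
  private readonly authors: Map<number, Author>

  constructor(authors: Author[]) {
    this.authors = new Map(authors.map((a): [number, Author] => [a.id, a]))
  }

  // Models: SELECT * FROM authors WHERE id = ?
  authorById(id: number): Author | undefined {
    this.queries++
    return this.authors.get(id)
  }

  // Models: SELECT * FROM authors WHERE id IN (...)
  authorsByIds(ids: readonly number[]): Author[] {
    this.queries++
    return ids.flatMap((id) => {
      const a = this.authors.get(id)
      return a ? [a] : []
    })
  }
}

// N+1: one query for the posts (not modeled here) plus one query per post
function authorNamesNPlusOne(db: FakeDb, posts: readonly Post[]): string[] {
  return posts.map((p) => db.authorById(p.authorId)?.name ?? "unknown")
}

// Batch loading: collect distinct IDs, fetch them in one query, join in memory
function authorNamesBatched(db: FakeDb, posts: readonly Post[]): string[] {
  const ids = [...new Set(posts.map((p) => p.authorId))]
  const byId = new Map(db.authorsByIds(ids).map((a): [number, string] => [a.id, a.name]))
  return posts.map((p) => byId.get(p.authorId) ?? "unknown")
}

const posts: Post[] = Array.from({ length: 100 }, (_, i) => ({ id: i, authorId: i % 5, title: `Post ${i}` }))
const authors: Author[] = Array.from({ length: 5 }, (_, id) => ({ id, name: `Author ${id}` }))

const db1 = new FakeDb(authors)
authorNamesNPlusOne(db1, posts)
const db2 = new FakeDb(authors)
authorNamesBatched(db2, posts)
console.log(db1.queries, db2.queries) // 100 1
```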
Schema design:
- •Normalize to reduce duplication
- •Denormalize for read-heavy workloads
- •Partition large tables
- •Use appropriate data types
Connection management:
- •Connection pooling (don't create per request)
- •Prepared statements (avoid re-parsing SQL on every execution)
- •Transaction batching (reduce round trips)
</optimization_patterns>
<workflow>Loop: Measure → Profile → Analyze → Optimize → Validate
- •Define performance goal — target metric (e.g., P95 < 100ms)
- •Establish baseline — measure current performance under realistic load
- •Profile systematically — identify actual bottleneck (not guesses)
- •Analyze root cause — understand why code is slow
- •Design optimization — plan targeted improvement
- •Implement optimization — make focused change
- •Measure improvement — verify gains, check for regressions
- •Document results — record baseline, optimization, gains, tradeoffs
At each step:
- •Document measurements with methodology
- •Note profiling tool output
- •Track optimization attempts (what worked/failed)
- •Update performance documentation
</workflow>
Before declaring optimization complete:
Check gains:
- •✓ Measured improvement meets target?
- •✓ Improvement statistically significant?
- •✓ Tested under realistic load?
- •✓ Multiple runs confirm consistency?
Check regressions:
- •✓ No degradation in other metrics?
- •✓ Memory usage still acceptable?
- •✓ Code complexity still manageable?
- •✓ Tests still pass?
Check documentation:
- •✓ Baseline measurements recorded?
- •✓ Optimization approach explained?
- •✓ Gains quantified with numbers?
- •✓ Tradeoffs documented?
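For the statistical-significance check, one common option is Welch's t-test on per-run timings from the baseline and the optimized version. The sketch below computes only the t-statistic. Treating |t| above about 2 as significant is a rough approximation for moderately large samples. Timing distributions are often skewed, so a nonparametric test (for example Mann-Whitney U) or a statistics library may fit better. welchT and the sample timings are illustrative.

```typescript
// Welch's t-statistic for two independent samples with possibly unequal variances.
// Returns NaN or ±Infinity if both samples have zero variance.
function welchT(a: readonly number[], b: readonly number[]): number {
  if (a.length < 2 || b.length < 2) throw new RangeError("need at least 2 samples per group")
  const mean = (xs: readonly number[]) => xs.reduce((s, x) => s + x, 0) / xs.length
  // Sample variance (n - 1 in the denominator)
  const sampleVar = (xs: readonly number[], m: number) =>
    xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1)
  const ma = mean(a)
  const mb = mean(b)
  const standardError = Math.sqrt(sampleVar(a, ma) / a.length + sampleVar(b, mb) / b.length)
  return (ma - mb) / standardError
}

// Illustrative per-run timings in ms (not real measurements)
const baselineMs = [102, 98, 105, 101, 99, 103, 100, 104]
const optimizedMs = [91, 94, 89, 92, 95, 90, 93, 92]
console.log(welchT(baselineMs, optimizedMs).toFixed(1)) // 8.5
```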
ALWAYS:
- •Measure before optimizing (baseline)
- •Profile to find actual bottleneck
- •Use realistic workload (not toy data)
- •Measure multiple runs (account for variance)
- •Document baseline and improvements
- •Check for regressions in other metrics
- •Consider readability vs performance tradeoff
- •Verify statistical significance
NEVER:
- •Optimize without measuring first
- •Guess at bottleneck without profiling
- •Benchmark with unrealistic data
- •Trust single-run measurements
- •Skip documentation of results
- •Sacrifice correctness for speed
- •Optimize without clear performance goal
- •Ignore algorithmic improvements
Methodology:
- •benchmarking.md — rigorous benchmarking methodology
Related skills:
- •codebase-recon — evidence-based investigation (foundation)
- •debugging — structured bug investigation
- •typescript-dev — correctness before performance