Performance Mode
Recommended model tier: smart (opus) - this skill requires complex reasoning
A systematic approach to identifying and fixing performance issues.
Prerequisites
Before starting:
- Identify the specific operation or endpoint that is slow
- Understand what "fast enough" means (target latency, throughput)
- Ensure you can measure performance reproducibly
Workflow
Step 1: Establish Baseline Measurement
Never optimize without data. Measure current performance:
bash
# Node.js - simple timing
time node script.js
# Node.js - CPU profiling
node --cpu-prof script.js
# Creates CPU.*.cpuprofile - analyze in Chrome DevTools
# Go - benchmarks
go test -bench=. -benchmem ./...
# API endpoint
curl -w "@curl-format.txt" -o /dev/null -s "http://localhost:3000/api/endpoint"
Record baseline metrics:
- Execution time (p50, p95, p99 if available)
- Memory usage
- Number of operations per second
- Number of I/O operations
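The percentile metrics above can be collected with a small harness when no benchmarking tool is at hand. The following is a minimal sketch, not a definitive implementation: `percentile`, `measure`, and `workload` are hypothetical names, the nearest-rank method is one of several percentile definitions, and `performance.now()` is assumed to be available as a global (Node.js 16+ or a browser).

```typescript
// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Runs `fn` repeatedly and collects wall-clock durations in milliseconds.
// Warm-up runs are discarded so JIT compilation does not skew the baseline.
function measure(fn: () => void, runs = 100, warmup = 10): number[] {
  for (let i = 0; i < warmup; i++) fn();
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    durations.push(performance.now() - start);
  }
  return durations;
}

// Hypothetical workload standing in for the operation under test.
function workload(): void {
  let sum = 0;
  for (let i = 0; i < 100_000; i++) sum += Math.sqrt(i);
  if (sum < 0) throw new Error("unreachable");
}

const baselineSamples = measure(workload);
console.log({
  p50: percentile(baselineSamples, 50),
  p95: percentile(baselineSamples, 95),
  p99: percentile(baselineSamples, 99),
});
```

Keeping the raw samples, rather than only an average, makes it possible to recompute the same percentiles after each change.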
Step 2: Identify Hotspots
Find where time is being spent:
bash
# Node.js profiling
node --cpu-prof app.js
# Then load .cpuprofile in Chrome DevTools > Performance
# Go profiling
go test -cpuprofile=cpu.prof -bench=.
go tool pprof -http=:8080 cpu.prof
code
# Search for common expensive patterns
mcp__plugin_aide_aide__code_search query="forEach" kind="function"
mcp__plugin_aide_aide__code_search query="map" kind="function"
# Find database queries
Grep for "SELECT", "find(", "query("
# Find network calls
Grep for "fetch", "axios", "http.Get"
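When a profiler is unavailable or its output is hard to map back to application-level steps, coarse manual instrumentation can give a rough picture of where time goes. The sketch below is an illustration under assumptions: `SectionTimer` and its methods are hypothetical names, and `performance.now()` is assumed to exist as a global. It complements a real profiler rather than replacing one.

```typescript
// Accumulates wall-clock time per labelled section so the largest share stands out.
class SectionTimer {
  private totals = new Map<string, number>();
  private now: () => number;

  // The clock is injectable so the timer can be exercised with a fake clock.
  constructor(now: () => number = () => performance.now()) {
    this.now = now;
  }

  // Runs `fn`, adds its duration to `label`, and returns its result.
  time<T>(label: string, fn: () => T): T {
    const start = this.now();
    try {
      return fn();
    } finally {
      const elapsed = this.now() - start;
      this.totals.set(label, (this.totals.get(label) ?? 0) + elapsed);
    }
  }

  // Sections sorted by total time, each with its share of the measured sum.
  report(): { label: string; ms: number; share: number }[] {
    const sum = Array.from(this.totals.values()).reduce((a, b) => a + b, 0);
    return Array.from(this.totals.entries())
      .map(([label, ms]) => ({ label, ms, share: sum > 0 ? ms / sum : 0 }))
      .sort((a, b) => b.ms - a.ms);
  }
}

const timer = new SectionTimer();
timer.time("parse", () => JSON.parse("[1, 2, 3]"));
timer.time("sum", () => {
  let total = 0;
  for (let i = 0; i < 1_000_000; i++) total += i;
  return total;
});
console.log(timer.report());
```

The shares only cover the sections that were wrapped, so untimed code is invisible here. A profiler remains the better source for a complete picture.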
Step 3: Analyze Performance Patterns
Look for these common issues:
| Issue | Pattern | Solution |
|---|---|---|
| N+1 queries | Loop containing DB call | Batch/eager load |
| Repeated computation | Same calculation in loop | Memoize/cache |
| Large allocations | Creating objects in loop | Reuse/pool objects |
| Blocking I/O | Sync file/network ops | Make async/concurrent |
| Missing indexes | Slow DB queries | Add database indexes |
| Unnecessary work | Processing unused data | Filter/skip early |
| Serial execution | Sequential independent ops | Parallelize |
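As an illustration of the "Repeated computation" row, here is a minimal memoization sketch. It assumes the wrapped function is pure and takes a single argument that works as a `Map` key; `memoize`, `slowSquare`, and `fastSquare` are hypothetical names. The cache is unbounded, which may not suit long-running processes.

```typescript
// Caches results of a pure single-argument function, keyed by the argument.
function memoize<A, R>(fn: (arg: A) => R): (arg: A) => R {
  const cache = new Map<A, R>();
  return (arg: A): R => {
    if (cache.has(arg)) return cache.get(arg) as R;
    const result = fn(arg);
    cache.set(arg, result);
    return result;
  };
}

let underlyingCalls = 0;

// Hypothetical expensive pure function; the counter shows how often it really runs.
function slowSquare(n: number): number {
  underlyingCalls++;
  return n * n;
}

const fastSquare = memoize(slowSquare);
const squares = [3, 3, 4, 3, 4].map((n) => fastSquare(n));
console.log(squares, `underlying calls: ${underlyingCalls}`);
```

Five lookups here trigger only two underlying calls, one per distinct input.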
Step 4: Apply Optimizations
Priority order (highest impact first):
1. Algorithmic improvements - O(n^2) -> O(n log n)
2. Reduce I/O - Batch requests, add caching
3. Parallelize - Concurrent operations
4. Reduce allocations - Reuse objects, pre-allocate
5. Micro-optimizations - Only as last resort
Make one optimization at a time and measure after each.
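To make the first item in the priority list concrete, the sketch below contrasts an O(n^2) duplicate check with an O(n log n) version that sorts a copy and compares neighbours. The function names are hypothetical and the problem is chosen only for illustration. The sorted version spends extra memory on the copy, which is the kind of trade-off to measure rather than assume.

```typescript
// O(n^2): compares every pair of elements.
function hasDuplicateQuadratic(values: number[]): boolean {
  for (let i = 0; i < values.length; i++) {
    for (let j = i + 1; j < values.length; j++) {
      if (values[i] === values[j]) return true;
    }
  }
  return false;
}

// O(n log n): sort a copy, then any duplicates must be adjacent.
function hasDuplicateSorted(values: number[]): boolean {
  const sorted = [...values].sort((a, b) => a - b);
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] === sorted[i - 1]) return true;
  }
  return false;
}

const sample = [7, 3, 9, 3, 1];
console.log(hasDuplicateQuadratic(sample), hasDuplicateSorted(sample));
```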
Step 5: Measure After Each Change
bash
# Same measurement as baseline
time node script.js
go test -bench=. -benchmem ./...
Compare:
- Did the metric improve?
- By how much (percentage)?
- Any negative side effects?
If there is no improvement, revert the change and try a different approach.
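The percentage comparison is simple arithmetic, sketched below. `percentImprovement` and `shouldKeep` are hypothetical helpers, and the 5% default threshold is an assumption standing in for whatever your measurement noise justifies.

```typescript
// Percentage reduction relative to the baseline; negative means a regression.
function percentImprovement(before: number, after: number): number {
  if (before <= 0) throw new Error("baseline must be positive");
  return ((before - after) / before) * 100;
}

// Keep a change only if it beats a noise threshold (assumed default: 5%).
function shouldKeep(before: number, after: number, minPercent = 5): boolean {
  return percentImprovement(before, after) >= minPercent;
}

// Example with latencies in milliseconds: 450 ms before, 105 ms after.
const improvement = percentImprovement(450, 105);
console.log(`${improvement.toFixed(1)}% faster, keep: ${shouldKeep(450, 105)}`);
```

For lower-is-better metrics such as latency, memory, or query count, the same formula applies. For higher-is-better metrics such as throughput, swap the roles of `before` and `after` in the numerator.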
Step 6: Verify No Regressions
bash
# Run all tests
npm test
go test ./...
# Check for correctness
# Ensure output is still correct after optimization
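Beyond the existing test suite, one way to check that output is still correct is to keep the original implementation as a reference and compare it against the optimized one on the same inputs. The sketch below assumes outputs are JSON-serializable with a stable key order; `findMismatch`, `groupReference`, and `groupOptimized` are hypothetical names.

```typescript
interface Item { type: string; value: number }
interface Grouped { a: number[]; b: number[] }

// Reference: the original two-pass version, kept for comparison.
function groupReference(items: Item[]): Grouped {
  return {
    a: items.filter((x) => x.type === "a").map((x) => x.value),
    b: items.filter((x) => x.type === "b").map((x) => x.value),
  };
}

// Optimized: a single pass over the same data.
function groupOptimized(items: Item[]): Grouped {
  const grouped: Grouped = { a: [], b: [] };
  for (const x of items) {
    if (x.type === "a") grouped.a.push(x.value);
    else if (x.type === "b") grouped.b.push(x.value);
  }
  return grouped;
}

// Returns the first input on which the two implementations disagree, or undefined.
// Outputs are compared via JSON.stringify, so key order must match.
function findMismatch<I, O>(
  reference: (input: I) => O,
  optimized: (input: I) => O,
  inputs: I[],
): I | undefined {
  for (const input of inputs) {
    if (JSON.stringify(reference(input)) !== JSON.stringify(optimized(input))) {
      return input;
    }
  }
  return undefined;
}

const regressionCases: Item[][] = [
  [],
  [{ type: "a", value: 1 }],
  [{ type: "b", value: 2 }, { type: "a", value: 3 }, { type: "c", value: 4 }],
];
const mismatch = findMismatch(groupReference, groupOptimized, regressionCases);
console.log(mismatch === undefined ? "outputs match" : "mismatch found");
```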
Failure Handling
| Situation | Action |
|---|---|
| Cannot measure reliably | Increase sample size, reduce variance sources |
| Optimization made it slower | Revert, analyze why, profile more carefully |
| Optimization broke tests | Revert if behavior changed; update tests only if they depended on implementation details |
| Bottleneck is external | Document, consider caching, async processing |
| Memory improved but CPU worse | Evaluate trade-off for use case |
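For the "Cannot measure reliably" row, one rough way to judge whether a set of timings is stable enough to compare is the coefficient of variation (standard deviation divided by mean). This is a sketch under assumptions: the function names are hypothetical, the population standard deviation is used, and the 5% threshold is an arbitrary starting point rather than a recommendation.

```typescript
// Relative spread of the samples: population standard deviation / mean.
function coefficientOfVariation(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  if (mean === 0) throw new Error("mean is zero");
  const variance =
    samples.reduce((acc, x) => acc + (x - mean) ** 2, 0) / samples.length;
  return Math.sqrt(variance) / mean;
}

// Assumed threshold: more than 5% relative spread counts as too noisy.
function isStable(samples: number[], maxCv = 0.05): boolean {
  return coefficientOfVariation(samples) <= maxCv;
}

console.log(isStable([100, 102, 98, 101])); // tight cluster
console.log(isStable([100, 180, 60, 140])); // wide spread
```

If the samples are not stable, collecting more runs or removing variance sources (background load, cold caches) is usually more productive than interpreting small differences.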
Common Optimizations
JavaScript/TypeScript
typescript
// BAD: N+1 queries (one round trip per user)
for (const user of users) {
  const posts = await db.getPosts(user.id);
}
// GOOD: Batch query
const userIds = users.map(u => u.id);
const posts = await db.getPostsForUsers(userIds);
// BAD: Repeated work (two passes over items)
const typeA = items.filter(x => x.type === 'a').map(x => x.value);
const typeB = items.filter(x => x.type === 'b').map(x => x.value);
// GOOD: Single pass
const grouped: Record<string, unknown[]> = { a: [], b: [] };
for (const x of items) {
  // Object.hasOwn ignores inherited keys such as "toString", unlike `in`
  if (Object.hasOwn(grouped, x.type)) grouped[x.type].push(x.value);
}
// BAD: Serial async
const result1 = await fetch(url1);
const result2 = await fetch(url2);
// GOOD: Parallel async (the requests must be independent)
const [result1, result2] = await Promise.all([
  fetch(url1),
  fetch(url2)
]);
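A batch query returns one flat list, so the results usually need to be grouped back by owner before use. The sketch below shows that step with an in-memory stand-in for the database. `FakeDb` and `groupPostsByUser` are hypothetical, and the method names mirror the `db.getPosts` / `db.getPostsForUsers` calls above without implying any particular library.

```typescript
interface Post { id: number; userId: number; title: string }

// In-memory stand-in for a database; counts queries so the difference is visible.
class FakeDb {
  queries = 0;
  private posts: Post[];

  constructor(posts: Post[]) {
    this.posts = posts;
  }

  async getPosts(userId: number): Promise<Post[]> {
    this.queries++;
    return this.posts.filter((p) => p.userId === userId);
  }

  async getPostsForUsers(userIds: number[]): Promise<Post[]> {
    this.queries++;
    const wanted = new Set(userIds);
    return this.posts.filter((p) => wanted.has(p.userId));
  }
}

// Groups a flat batch result back into per-user lists.
function groupPostsByUser(posts: Post[]): Map<number, Post[]> {
  const byUser = new Map<number, Post[]>();
  for (const post of posts) {
    const list = byUser.get(post.userId);
    if (list) list.push(post);
    else byUser.set(post.userId, [post]);
  }
  return byUser;
}

async function demo(): Promise<void> {
  const db = new FakeDb([
    { id: 1, userId: 1, title: "first" },
    { id: 2, userId: 2, title: "second" },
    { id: 3, userId: 1, title: "third" },
  ]);
  const userIds = [1, 2, 3];

  for (const id of userIds) await db.getPosts(id); // one query per user
  const loopQueries = db.queries;

  db.queries = 0;
  const byUser = groupPostsByUser(await db.getPostsForUsers(userIds)); // one query total
  console.log({ loopQueries, batchQueries: db.queries, usersWithPosts: byUser.size });
}

demo().catch(console.error);
```

In this run the loop issues three queries and the batch issues one. Users without posts are simply absent from the map, so callers should treat a missing key as an empty list.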
Go
go
// BAD: Allocation in loop (slice regrows as it fills)
var result []T
for _, item := range items {
	result = append(result, process(item))
}
// GOOD: Pre-allocate capacity once
result := make([]T, 0, len(items))
for _, item := range items {
	result = append(result, process(item))
}
// BAD: String concatenation (copies the string on every iteration)
s := ""
for _, item := range items {
	s += item
}
// GOOD: Builder
var b strings.Builder
for _, item := range items {
	b.WriteString(item)
}
s := b.String()
SQL
sql
-- BAD: Missing index on frequently queried column
SELECT * FROM users WHERE email = 'x@y.com';
-- GOOD: Add index
CREATE INDEX idx_users_email ON users(email);
-- BAD: SELECT *
SELECT * FROM users;
-- GOOD: Select only needed columns
SELECT id, name, email FROM users;
-- BAD: Query in loop
-- for each user: SELECT * FROM posts WHERE user_id = ?
-- GOOD: Single batch query
SELECT * FROM posts WHERE user_id IN (?, ?, ?);
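The batch query needs one placeholder per ID, which application code typically generates. The sketch below builds the statement and parameter list and splits large ID lists into chunks. `buildPostsQuery` and `chunk` are hypothetical helpers, the column names are assumed, and the `?` placeholder style follows the SQL above (some drivers use `$1, $2, ...` instead). Parameter limits vary by database, so the chunk size is a guess to adjust.

```typescript
// Builds a parameterized IN query; values travel separately from the SQL text.
function buildPostsQuery(userIds: number[]): { sql: string; params: number[] } {
  // An empty IN () list is invalid SQL, so reject it up front.
  if (userIds.length === 0) throw new Error("userIds must not be empty");
  const placeholders = userIds.map(() => "?").join(", ");
  return {
    // Column names are assumed for illustration.
    sql: `SELECT id, user_id, title FROM posts WHERE user_id IN (${placeholders})`,
    params: [...userIds],
  };
}

// Splits a list into fixed-size chunks so each query stays under parameter limits.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

const allIds = [1, 2, 3, 4, 5];
for (const ids of chunk(allIds, 2)) {
  const { sql, params } = buildPostsQuery(ids);
  console.log(sql, params);
}
```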
MCP Tools
- `mcp__plugin_aide_aide__code_search` - Find loops, queries, expensive operations
- `mcp__plugin_aide_aide__code_symbols` - Understand function structure
- `mcp__plugin_aide_aide__memory_search` - Check past performance decisions
Profiling Commands Reference
Node.js
bash
# CPU profiling
node --cpu-prof app.js
# Produces .cpuprofile file
# Memory profiling
node --heap-prof app.js
# Produces .heapprofile file
# Clinic.js for analysis
npx clinic doctor -- node app.js
npx clinic flame -- node app.js
Go
bash
# CPU profiling
go test -cpuprofile=cpu.prof -bench=.
go tool pprof -http=:8080 cpu.prof
# Memory profiling
go test -memprofile=mem.prof -bench=.
go tool pprof -http=:8080 mem.prof
# Execution trace
go test -trace=trace.out -bench=.
go tool trace trace.out
Browser
- DevTools -> Performance -> Record
- DevTools -> Memory -> Heap snapshot
- Lighthouse for overall page performance
Verification Criteria
Before completing:
- Baseline measurement recorded
- Improvement quantified (percentage)
- All tests still pass
- No correctness regressions
- Memory usage acceptable
Output Format
markdown
## Performance Analysis: [Operation/Endpoint Name]
### Baseline
- Execution time: 450ms (p50), 680ms (p95)
- Memory: 125MB peak
- Database queries: 150
### Hotspots Identified
1. `db.getUsers()` - 300ms (67% of total)
2. `processData()` - 100ms (22% of total)
3. `formatOutput()` - 50ms (11% of total)
### Optimizations Applied
1. Batched user queries - 300ms -> 50ms
2. Memoized processData for repeated calls - 100ms -> 5ms
### Results
- Execution time: 450ms -> 105ms (77% faster)
- Memory: 125MB -> 80MB (36% reduction)
- Database queries: 150 -> 3 (98% reduction)
### Verification
- All tests: PASS
- Output correctness: VERIFIED