Performance Optimization Skill

Reusable workflow extracted from otto-performance-optimizer expertise.

Purpose

Systematically identify and eliminate performance bottlenecks through data-driven profiling, algorithmic optimization, and infrastructure tuning to achieve scalability and efficiency goals.

When to Use

•Performance degradation investigation
•Pre-release performance validation
•Scalability planning and capacity assessment
•High-load optimization
•Cost optimization through efficiency
•Database query optimization
•Frontend performance improvement (Core Web Vitals)
•Infrastructure right-sizing

Workflow Steps

•
Define Performance Goals
- •Establish specific, measurable targets (e.g., P95 < 200ms)
- •Define throughput requirements (req/sec, ops/sec)
- •Set resource efficiency goals (CPU, memory, cost)
- •Identify user experience requirements (page load, TTI)
- •Document current baseline metrics
•
Baseline Measurement
- •Create reproducible benchmark suite
- •Measure current performance across key metrics
- •Identify representative workloads
- •Document environment configuration
- •Establish measurement methodology
•
Profile & Analyze
- •CPU profiling: Identify hot paths and expensive functions
- •Memory profiling: Find allocations, leaks, GC pressure
- •I/O profiling: Measure disk and network bottlenecks
- •Database profiling: Query analysis with EXPLAIN
- •Frontend profiling: Lighthouse, WebPageTest, DevTools
•
Identify Bottlenecks
- •Analyze profiling data for actual constraints
- •Distinguish symptoms from root causes
- •Quantify impact of each bottleneck
- •Prioritize by impact/effort ratio
- •Avoid premature optimization (profile first!)
•
Prioritize Optimizations
- •Quick Wins: High impact, low effort
- •Strategic: High impact, medium effort
- •Incremental: Medium impact, low effort
- •Deferred: Low impact or high complexity
- •Create optimization roadmap
•
Implement & Measure
- •Apply optimizations incrementally
- •Measure each change independently
- •Document before/after metrics
- •Verify no functional regressions
- •Track trade-offs (complexity, maintainability)
•
Validate & Compare
- •Compare against baseline and goals
- •Run load tests to verify at scale
- •Test edge cases and failure modes
- •Check resource utilization under load
- •Measure cost impact
•
Monitor & Prevent Regression
- •Set up performance monitoring
- •Create alerting for degradation
- •Add performance tests to CI/CD
- •Document optimization decisions
- •Regular performance review cadence

Inputs Required

•Performance targets: Specific latency, throughput, resource goals
•Current metrics: Baseline performance measurements
•Workload profile: Traffic patterns, peak loads, data volumes
•Constraints: Budget, timeline, acceptable trade-offs
•Environment: Production specs, infrastructure configuration

Outputs Produced

•Profiling Report: Flame graphs, hot spots, bottleneck analysis
•Optimization Roadmap: Prioritized improvements with expected impact
•Before/After Benchmarks: Quantified performance improvements
•Capacity Plan: Scalability analysis and resource projections
•Monitoring Setup: Metrics, dashboards, and alerting configuration
•Cost Analysis: Infrastructure cost savings from optimization

Profiling Tools by Category

CPU Profiling

•Python: cProfile, py-spy, line_profiler
•JavaScript/Node: Chrome DevTools, clinic.js, 0x, node --prof
•C/C++/Objective-C: Instruments, perf, Valgrind, Intel VTune
•Java/Kotlin: JProfiler, async-profiler, JFR, VisualVM
•Go: pprof, trace, benchstat

Memory Profiling

•Python: memory_profiler, tracemalloc, objgraph
•JavaScript/Node: Chrome DevTools heap profiler, node --heap-prof
•C/C++: Valgrind, AddressSanitizer, LeakSanitizer
•Java: VisualVM, JProfiler, heap dumps
•Go: pprof heap profile

Database Profiling

•PostgreSQL: EXPLAIN ANALYZE, pg_stat_statements
•MySQL: EXPLAIN, slow query log, pt-query-digest
•MongoDB: explain(), profiler, slow query log
•Redis: SLOWLOG, redis-cli --latency

System Profiling

•Linux: perf, eBPF/bpftrace, sysstat, iotop
•macOS: Instruments, dtrace, fs_usage
•Network: Wireshark, tcpdump, netstat, ss

Optimization Strategies Catalog

Algorithmic Optimization

•Complexity Reduction: O(n²) → O(n log n) → O(n)
•Data Structure Selection: Array vs Hash vs Tree
•Caching Results: Memoization, computed properties
•Lazy Evaluation: Compute only when needed
•Batch Processing: N+1 → single batch operation

Database Optimization

•Query Optimization: Rewrite inefficient queries
•Index Strategy: B-tree, hash, partial, covering indexes
•Connection Pooling: Optimal pool size (typically 2-10× CPU cores)
•Query Batching: Combine multiple queries
•Denormalization: Trade-off for read performance
•Caching: Redis/Memcached for hot data

Frontend Optimization

•
Core Web Vitals:
- •LCP (Largest Contentful Paint) < 2.5s
- •FID (First Input Delay) < 100ms
- •CLS (Cumulative Layout Shift) < 0.1
•Bundle Optimization: Code splitting, tree shaking, lazy loading
•Asset Optimization: Image compression, WebP, responsive images
•Caching: Service workers, Cache-Control headers
•CDN: Geographic distribution, edge caching

Backend Optimization

•API Response: Reduce payload size, compression
•Async Processing: Queue long-running tasks
•Connection Reuse: HTTP keep-alive, connection pooling
•Caching Layers: Application cache, CDN, database cache
•Concurrency: Proper use of async/await, workers

Infrastructure Optimization

•Auto-Scaling: Horizontal and vertical scaling policies
•Right-Sizing: Match resources to actual usage
•Load Balancing: Distribute traffic efficiently
•Geographic Distribution: Multi-region for global users
•Resource Limits: Prevent resource exhaustion

Performance Metrics Checklist

Latency Metrics

• P50 (median) latency measured
• P95 latency (95th percentile) tracked
• P99 latency (worst case) monitored
• Max latency identified

Throughput Metrics

• Requests per second (RPS) capacity known
• Transactions per second (TPS) measured
• Concurrent users handled documented
• Peak load capacity established

Resource Metrics

• CPU utilization tracked (target: <70% at peak)
• Memory usage monitored (avoid swapping)
• Disk I/O measured (IOPS, throughput)
• Network bandwidth utilization tracked

User Experience Metrics

• Time to First Byte (TTFB) < 200ms
• First Contentful Paint (FCP) < 1.8s
• Time to Interactive (TTI) < 3.8s
• Total Page Load < 3s

Example Usage

code

Input: API endpoint /api/users slow (P95: 3.2s), target: <200ms

Workflow Execution:
1. Goal: Reduce P95 latency to <200ms, increase throughput 5x
2. Baseline: Current P95 = 3.2s, 50 req/sec max
3. Profile:
   - Flame graph shows 80% time in database query
   - Query: SELECT * FROM users JOIN orders... (full table scan)
   - 5M users table, no index on email column
4. Bottleneck: Missing index causing seq scan, N+1 query pattern
5. Prioritize:
   - 🔴 Quick Win: Add index on users.email
   - 🔴 Quick Win: Fix N+1 with JOIN optimization
   - 🟡 Strategic: Add Redis cache for user profile
6. Implement:
   - CREATE INDEX idx_users_email ON users(email)
   - Rewrite query with proper JOIN
   - Add Redis cache (TTL: 5min)
7. Validate:
   - P95 latency: 3.2s → 45ms (98.6% improvement)
   - Throughput: 50 → 400 req/sec (8x improvement)
   - Database CPU: 85% → 12%
8. Monitor: Added Grafana dashboard, alert if P95 > 200ms

Output:
✅ Performance goal achieved: P95 = 45ms (target: <200ms)
✅ Throughput exceeded: 400 req/sec (target: 250 req/sec)
✅ Cost reduced: 6 → 2 database instances ($2,400/month savings)

Optimization Anti-Patterns to Avoid

Premature Optimization

•❌ Optimizing without profiling data
•✅ Profile first, identify actual bottleneck, then optimize

Micro-Optimizations

•❌ Focusing on saving nanoseconds while ignoring second-long delays
•✅ Focus on bottlenecks with measurable user impact

Benchmark Gaming

•❌ Optimizing for artificial benchmarks not real workloads
•✅ Use representative production-like workloads

Complexity Creep

•❌ Adding complexity for marginal 2% gains
•✅ Balance performance with maintainability

Ignoring Trade-offs

•❌ Not considering memory usage, code complexity, maintainability
•✅ Document trade-offs explicitly

Performance Budget Template

markdown

## Performance Budget: [Feature/Page Name]

### Targets
- P95 Latency: < [target]ms
- Throughput: > [target] req/sec
- Page Load: < [target]s
- Bundle Size: < [target]KB
- CPU Usage: < [target]%
- Memory Usage: < [target]MB

### Current Metrics
- P95 Latency: [current]ms
- Throughput: [current] req/sec
- Status: ✅ Within budget / ❌ Exceeds budget

### Action Required
[If budget exceeded, optimization plan]

Related Agents

•otto-performance-optimizer - Full agent with profiling expertise
•baccio-tech-architect - Architecture-level performance design
•dario-debugger - Performance-related bug investigation
•omri-data-scientist - ML model inference optimization
•marco-devops-engineer - Infrastructure performance tuning

ISE Engineering Fundamentals Alignment

•Leverage observability (metrics, tracing) for performance
•Load testing validates behavior under peak load
•Performance testing measures against baselines
•Stress testing finds breaking points
•Design for NFRs: performance SLAs defined upfront
•Parametrize configurations for easy tuning
•Log operation durations on critical paths
•Test under realistic load, not just happy-path