Performance Optimization Skill
Reusable workflow extracted from otto-performance-optimizer expertise.
Purpose
Systematically identify and eliminate performance bottlenecks through data-driven profiling, algorithmic optimization, and infrastructure tuning to achieve scalability and efficiency goals.
When to Use
- •Performance degradation investigation
- •Pre-release performance validation
- •Scalability planning and capacity assessment
- •High-load optimization
- •Cost optimization through efficiency
- •Database query optimization
- •Frontend performance improvement (Core Web Vitals)
- •Infrastructure right-sizing
Workflow Steps
1. Define Performance Goals
- •Establish specific, measurable targets (e.g., P95 < 200ms)
- •Define throughput requirements (req/sec, ops/sec)
- •Set resource efficiency goals (CPU, memory, cost)
- •Identify user experience requirements (page load, TTI)
- •Document current baseline metrics
2. Baseline Measurement
- •Create reproducible benchmark suite
- •Measure current performance across key metrics
- •Identify representative workloads
- •Document environment configuration
- •Establish measurement methodology
3. Profile & Analyze
- •CPU profiling: Identify hot paths and expensive functions
- •Memory profiling: Find allocations, leaks, GC pressure
- •I/O profiling: Measure disk and network bottlenecks
- •Database profiling: Query analysis with EXPLAIN
- •Frontend profiling: Lighthouse, WebPageTest, DevTools
4. Identify Bottlenecks
- •Analyze profiling data for actual constraints
- •Distinguish symptoms from root causes
- •Quantify impact of each bottleneck
- •Prioritize by impact/effort ratio
- •Avoid premature optimization (profile first!)
5. Prioritize Optimizations
- •Quick Wins: High impact, low effort
- •Strategic: High impact, medium effort
- •Incremental: Medium impact, low effort
- •Deferred: Low impact or high complexity
- •Create optimization roadmap
6. Implement & Measure
- •Apply optimizations incrementally
- •Measure each change independently
- •Document before/after metrics
- •Verify no functional regressions
- •Track trade-offs (complexity, maintainability)
7. Validate & Compare
- •Compare against baseline and goals
- •Run load tests to verify at scale
- •Test edge cases and failure modes
- •Check resource utilization under load
- •Measure cost impact
8. Monitor & Prevent Regression
- •Set up performance monitoring
- •Create alerting for degradation
- •Add performance tests to CI/CD
- •Document optimization decisions
- •Establish a regular performance review cadence
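The baseline, implement, and validate steps all depend on a repeatable measurement. The following is a minimal sketch of such a harness, assuming a single-process workload that can be invoked as a plain function. The names `benchmark`, `improvement_pct`, `slow_sum`, and `fast_sum` are hypothetical and exist only for illustration.

```python
import statistics
import time


def benchmark(fn, runs=30, warmup=3):
    """Return per-call wall-clock latencies in milliseconds."""
    for _ in range(warmup):  # warm caches before measuring
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def improvement_pct(before_ms, after_ms):
    """Percentage reduction from before to after (positive means faster)."""
    return (before_ms - after_ms) / before_ms * 100.0


def slow_sum(n=20_000):
    # Hypothetical "before" workload: builds an intermediate list.
    return sum([i * i for i in range(n)])


def fast_sum(n=20_000):
    # Hypothetical "after" workload: closed-form sum of squares below n.
    return (n - 1) * n * (2 * n - 1) // 6


if __name__ == "__main__":
    before = statistics.median(benchmark(slow_sum))
    after = statistics.median(benchmark(fast_sum))
    print(f"median before={before:.3f}ms after={after:.3f}ms "
          f"improvement={improvement_pct(before, after):.1f}%")
```

Checking that `slow_sum` and `fast_sum` return the same value before comparing timings covers the "verify no functional regressions" item in step 6.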
Inputs Required
- •Performance targets: Specific latency, throughput, resource goals
- •Current metrics: Baseline performance measurements
- •Workload profile: Traffic patterns, peak loads, data volumes
- •Constraints: Budget, timeline, acceptable trade-offs
- •Environment: Production specs, infrastructure configuration
Outputs Produced
- •Profiling Report: Flame graphs, hot spots, bottleneck analysis
- •Optimization Roadmap: Prioritized improvements with expected impact
- •Before/After Benchmarks: Quantified performance improvements
- •Capacity Plan: Scalability analysis and resource projections
- •Monitoring Setup: Metrics, dashboards, and alerting configuration
- •Cost Analysis: Infrastructure cost savings from optimization
Profiling Tools by Category
CPU Profiling
- •Python: cProfile, py-spy, line_profiler
- •JavaScript/Node: Chrome DevTools, clinic.js, 0x, node --prof
- •C/C++/Objective-C: Instruments, perf, Valgrind, Intel VTune
- •Java/Kotlin: JProfiler, async-profiler, JFR, VisualVM
- •Go: pprof, go tool trace, benchstat (for comparing benchmark runs)
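As an illustration of finding a hot path with one of the Python tools listed above, here is a small sketch using the standard-library `cProfile` and `pstats` modules. The functions `handler`, `expensive`, and `cheap` are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def expensive(n):
    # Deliberately slow: the hot path we expect the profiler to surface.
    return sum(i * i for i in range(n))


def cheap():
    return 1 + 1


def handler():
    cheap()
    return expensive(200_000)


def profile_top(fn, limit=5):
    """Run fn under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(handler))
```

Sorting by cumulative time puts callers such as `handler` above their callees, which helps trace a slow request down to the function that actually burns the time.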
Memory Profiling
- •Python: memory_profiler, tracemalloc, objgraph
- •JavaScript/Node: Chrome DevTools heap profiler, node --heap-prof
- •C/C++: Valgrind, AddressSanitizer, LeakSanitizer
- •Java: VisualVM, JProfiler, heap dumps
- •Go: pprof heap profile
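For Python, a rough sketch of allocation measurement with the standard-library `tracemalloc` module might look like this. `build_rows` and `allocation_report` are hypothetical names, and the sketch assumes nothing else in the process has tracing enabled.

```python
import tracemalloc


def build_rows(n):
    # Hypothetical allocation-heavy function.
    return [f"row-{i:08d}" for i in range(n)]


def allocation_report(fn, *args, top=3):
    """Return (result, peak_bytes, largest allocation diffs) for one call of fn."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = fn(*args)
    after = tracemalloc.take_snapshot()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Group by source line; entries are sorted by largest change first.
    diffs = after.compare_to(before, "lineno")
    return result, peak, diffs[:top]


if __name__ == "__main__":
    rows, peak, top = allocation_report(build_rows, 10_000)
    print(f"peak traced memory: {peak / 1024:.0f} KiB")
    for diff in top:
        print(diff)
```

Comparing two snapshots rather than reading a single total makes it easier to attribute growth to a specific line, which is usually what a leak investigation needs.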
Database Profiling
- •PostgreSQL: EXPLAIN ANALYZE, pg_stat_statements
- •MySQL: EXPLAIN, slow query log, pt-query-digest
- •MongoDB: explain(), profiler, slow query log
- •Redis: SLOWLOG, redis-cli --latency
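To show what plan inspection looks like in practice, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's standard `sqlite3` module. SQLite is a stand-in chosen because it needs no server; the output format differs from PostgreSQL's `EXPLAIN ANALYZE` and MySQL's `EXPLAIN`, and the `users` schema here is an assumption made for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT * FROM users WHERE email = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, ("user42@example.com",))

conn.execute("CREATE INDEX idx_users_email ON users(email)")

# With the index the planner can search by email directly.
plan_after = query_plan(conn, lookup, ("user42@example.com",))

print("before:", plan_before)
print("after: ", plan_after)
```

The same before/after comparison applies to the other databases listed: capture the plan, change one thing, capture it again.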
System Profiling
- •Linux: perf, eBPF/bpftrace, sysstat, iotop
- •macOS: Instruments, dtrace, fs_usage
- •Network: Wireshark, tcpdump, netstat, ss
Optimization Strategies Catalog
Algorithmic Optimization
- •Complexity Reduction: O(n²) → O(n log n) → O(n)
- •Data Structure Selection: Array vs Hash vs Tree
- •Caching Results: Memoization, computed properties
- •Lazy Evaluation: Compute only when needed
- •Batch Processing: N+1 → single batch operation
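A small sketch of complexity reduction through data structure selection: the same duplicate check written with nested loops and with a hash set. Both function names are hypothetical, and the example assumes the items are hashable.

```python
def has_duplicates_quadratic(items):
    """O(n^2): compares every pair of items."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """O(n) on average: one pass, remembering seen items in a set."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    data = list(range(2_000)) + [0]
    print(has_duplicates_quadratic(data), has_duplicates_linear(data))
```

The linear version trades extra memory for speed, which is the kind of trade-off worth recording alongside the benchmark numbers.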
Database Optimization
- •Query Optimization: Rewrite inefficient queries
- •Index Strategy: B-tree, hash, partial, covering indexes
- •Connection Pooling: Size the pool from load-test results (2-10× CPU cores is a typical starting range, not a fixed rule)
- •Query Batching: Combine multiple queries
- •Denormalization: Trade-off for read performance
- •Caching: Redis/Memcached for hot data
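The query batching item can be illustrated with a sketch that computes per-user order totals two ways: one query per user (the N+1 pattern) and a single grouped JOIN. It uses an in-memory SQLite database via Python's `sqlite3` module; the schema, data, and function names are assumptions made for the example.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, 4)])
    conn.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                     [(1, 10.0), (1, 5.0), (2, 7.5)])
    return conn


def totals_n_plus_one(conn):
    """One query for the users, then one more per user."""
    queries = 1
    users = conn.execute("SELECT id FROM users").fetchall()
    totals = {}
    for (user_id,) in users:
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        queries += 1
        totals[user_id] = row[0]
    return totals, queries


def totals_batched(conn):
    """Same result from a single grouped LEFT JOIN."""
    rows = conn.execute("""
        SELECT u.id, COALESCE(SUM(o.total), 0)
        FROM users u LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id
    """).fetchall()
    return dict(rows), 1


if __name__ == "__main__":
    conn = setup()
    print(totals_n_plus_one(conn))
    print(totals_batched(conn))
```

With three users the difference is four round trips versus one; the gap grows linearly with the number of users, which is why this pattern tends to show up only under production data volumes.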
Frontend Optimization
- •Core Web Vitals:
  - •LCP (Largest Contentful Paint) < 2.5s
  - •INP (Interaction to Next Paint) < 200ms (replaced FID, First Input Delay < 100ms, as a Core Web Vital in March 2024)
  - •CLS (Cumulative Layout Shift) < 0.1
- •Bundle Optimization: Code splitting, tree shaking, lazy loading
- •Asset Optimization: Image compression, WebP, responsive images
- •Caching: Service workers, Cache-Control headers
- •CDN: Geographic distribution, edge caching
Backend Optimization
- •API Response: Reduce payload size, compression
- •Async Processing: Queue long-running tasks
- •Connection Reuse: HTTP keep-alive, connection pooling
- •Caching Layers: Application cache, CDN, database cache
- •Concurrency: Proper use of async/await, workers
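For the concurrency item, a minimal sketch with Python's `asyncio` shows the difference between awaiting I/O-bound calls one at a time and running them together. `fetch` is a hypothetical stand-in that sleeps instead of doing real network I/O.

```python
import asyncio
import time


async def fetch(name, delay=0.05):
    # Stand-in for an I/O-bound call (HTTP request, database query).
    await asyncio.sleep(delay)
    return f"{name}:ok"


async def fetch_sequential(names):
    # Each call waits for the previous one to finish.
    return [await fetch(n) for n in names]


async def fetch_concurrent(names):
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fetch(n) for n in names))


def timed(coro_fn, names):
    """Run coro_fn(names) to completion and return (results, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(names))
    return list(result), time.perf_counter() - start


if __name__ == "__main__":
    names = ["users", "orders", "prices", "stock"]
    _, seq = timed(fetch_sequential, names)
    _, con = timed(fetch_concurrent, names)
    print(f"sequential={seq:.3f}s concurrent={con:.3f}s")
```

This only helps when the calls are independent and I/O-bound; CPU-bound work needs worker processes or threads instead.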
Infrastructure Optimization
- •Auto-Scaling: Horizontal and vertical scaling policies
- •Right-Sizing: Match resources to actual usage
- •Load Balancing: Distribute traffic efficiently
- •Geographic Distribution: Multi-region for global users
- •Resource Limits: Prevent resource exhaustion
Performance Metrics Checklist
Latency Metrics
- • P50 (median) latency measured
- • P95 latency (95th percentile) tracked
- • P99 latency (tail latency) monitored
- • Max latency (worst case) identified
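The percentiles in this checklist can be computed from raw samples in several ways. The sketch below uses the nearest-rank definition, which is one common convention; monitoring tools may interpolate or use histograms and report slightly different values. `percentile` and `latency_summary` are hypothetical helper names.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms):
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }


if __name__ == "__main__":
    # 98 fast requests plus two slow outliers.
    samples = [20] * 98 + [900, 3200]
    print(latency_summary(samples))
```

In the sample data the median stays at 20ms while P99 and the maximum expose the outliers, which is why the checklist tracks all four numbers.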
Throughput Metrics
- • Requests per second (RPS) capacity known
- • Transactions per second (TPS) measured
- • Concurrent users handled documented
- • Peak load capacity established
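Throughput, latency, and concurrency are linked. One way to sanity-check these numbers is Little's Law (average in-flight requests = arrival rate × average time in system), a standard queueing result that this document does not itself cite. The sketch assumes each worker handles one request at a time and borrows the 70% figure from the CPU target in this checklist; both function names are hypothetical.

```python
import math


def in_flight_requests(throughput_rps, mean_latency_s):
    """Little's Law: average concurrency = arrival rate x mean time in system."""
    return throughput_rps * mean_latency_s


def workers_needed(throughput_rps, mean_latency_s, target_utilization=0.7):
    """Workers required to keep average utilization at or below the target."""
    busy = in_flight_requests(throughput_rps, mean_latency_s)
    return math.ceil(busy / target_utilization)


if __name__ == "__main__":
    # Hypothetical service: 400 req/sec at 50ms mean latency.
    print(in_flight_requests(400, 0.05))
    print(workers_needed(400, 0.05))
```

Note that the law uses mean latency, not P95 or P99, so it gives an average-case estimate and says nothing about tail behavior under bursts.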
Resource Metrics
- • CPU utilization tracked (target: <70% at peak)
- • Memory usage monitored (avoid swapping)
- • Disk I/O measured (IOPS, throughput)
- • Network bandwidth utilization tracked
User Experience Metrics
- • Time to First Byte (TTFB) < 200ms
- • First Contentful Paint (FCP) < 1.8s
- • Time to Interactive (TTI) < 3.8s
- • Total Page Load < 3s
Example Usage
    Input: API endpoint /api/users slow (P95: 3.2s), target: <200ms

    Workflow Execution:
    1. Goal: Reduce P95 latency to <200ms, increase throughput 5x
    2. Baseline: Current P95 = 3.2s, 50 req/sec max
    3. Profile:
       - Flame graph shows 80% time in database query
       - Query: SELECT * FROM users JOIN orders... (full table scan)
       - 5M users table, no index on email column
    4. Bottleneck: Missing index causing seq scan, N+1 query pattern
    5. Prioritize:
       - 🔴 Quick Win: Add index on users.email
       - 🔴 Quick Win: Fix N+1 with JOIN optimization
       - 🟡 Strategic: Add Redis cache for user profile
    6. Implement:
       - CREATE INDEX idx_users_email ON users(email)
       - Rewrite query with proper JOIN
       - Add Redis cache (TTL: 5min)
    7. Validate:
       - P95 latency: 3.2s → 45ms (98.6% improvement)
       - Throughput: 50 → 400 req/sec (8x improvement)
       - Database CPU: 85% → 12%
    8. Monitor: Added Grafana dashboard, alert if P95 > 200ms

    Output:
    ✅ Performance goal achieved: P95 = 45ms (target: <200ms)
    ✅ Throughput exceeded: 400 req/sec (target: 250 req/sec)
    ✅ Cost reduced: 6 → 2 database instances ($2,400/month savings)
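Step 6 of the example adds a cache with a 5 minute TTL. The sketch below shows the expiry logic with a tiny in-process cache; it is a stand-in for illustration only and does not use Redis or any Redis client API. `TTLCache`, `get_or_load`, and `load_profile` are hypothetical names, and the injectable `clock` exists so the behavior can be tested without sleeping.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry (not thread-safe)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: entry has not expired yet
        value = loader(key)  # cache miss: fall back to the slow path
        self._entries[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    def load_profile(user_id):
        # Hypothetical slow lookup, e.g. a database query.
        return {"id": user_id, "name": f"user{user_id}"}

    cache = TTLCache(ttl_seconds=300)
    print(cache.get_or_load(7, load_profile))  # miss, calls load_profile
    print(cache.get_or_load(7, load_profile))  # hit, served from memory
```

A short TTL bounds how stale a profile can be, at the cost of periodic misses; that trade-off belongs in the documented optimization decisions.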
Optimization Anti-Patterns to Avoid
Premature Optimization
- •❌ Optimizing without profiling data
- •✅ Profile first, identify actual bottleneck, then optimize
Micro-Optimizations
- •❌ Focusing on saving nanoseconds while ignoring second-long delays
- •✅ Focus on bottlenecks with measurable user impact
Benchmark Gaming
- •❌ Optimizing for artificial benchmarks instead of real workloads
- •✅ Use representative production-like workloads
Complexity Creep
- •❌ Adding complexity for marginal 2% gains
- •✅ Balance performance with maintainability
Ignoring Trade-offs
- •❌ Not considering memory usage, code complexity, maintainability
- •✅ Document trade-offs explicitly
Performance Budget Template
    ## Performance Budget: [Feature/Page Name]

    ### Targets
    - P95 Latency: < [target]ms
    - Throughput: > [target] req/sec
    - Page Load: < [target]s
    - Bundle Size: < [target]KB
    - CPU Usage: < [target]%
    - Memory Usage: < [target]MB

    ### Current Metrics
    - P95 Latency: [current]ms
    - Throughput: [current] req/sec
    - Status: ✅ Within budget / ❌ Exceeds budget

    ### Action Required
    [If budget exceeded, optimization plan]
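A budget like this can also be checked automatically, for example as a CI step. The following is a rough sketch under the assumption that current metrics are already available as numbers; `BudgetLine` and `check_budget` are hypothetical names, and the sample values are taken from the worked example in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetLine:
    name: str
    target: float
    current: float
    higher_is_better: bool = False  # True for throughput-style metrics

    @property
    def within_budget(self):
        if self.higher_is_better:
            return self.current > self.target
        return self.current < self.target


def check_budget(lines):
    """Return the names of metrics that exceed the budget."""
    return [line.name for line in lines if not line.within_budget]


if __name__ == "__main__":
    budget = [
        BudgetLine("P95 Latency (ms)", target=200, current=45),
        BudgetLine("Throughput (req/sec)", target=250, current=400,
                   higher_is_better=True),
    ]
    failures = check_budget(budget)
    print("Within budget" if not failures else f"Exceeds budget: {failures}")
```

In a pipeline, a non-empty result from `check_budget` could fail the build so that regressions are caught before release.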
Related Agents
- •otto-performance-optimizer - Full agent with profiling expertise
- •baccio-tech-architect - Architecture-level performance design
- •dario-debugger - Performance-related bug investigation
- •omri-data-scientist - ML model inference optimization
- •marco-devops-engineer - Infrastructure performance tuning
ISE Engineering Fundamentals Alignment
- •Leverage observability (metrics, tracing) for performance
- •Load testing validates behavior under peak load
- •Performance testing measures against baselines
- •Stress testing finds breaking points
- •Design for NFRs: performance SLAs defined upfront
- •Parametrize configurations for easy tuning
- •Log operation durations on critical paths
- •Test under realistic load, not just happy-path
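Logging operation durations on critical paths can be as simple as a context manager around the call. This is a minimal sketch using Python's standard `logging` module; the logger name `perf`, the helper `log_duration`, and the 200ms default threshold (borrowed from the P95 target used earlier in this document) are assumptions for illustration.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("perf")


@contextmanager
def log_duration(operation, slow_ms=200.0):
    """Log how long the wrapped block took; warn when it exceeds slow_ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failures are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        level = logging.WARNING if elapsed_ms > slow_ms else logging.INFO
        logger.log(level, "%s took %.1f ms", operation, elapsed_ms)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with log_duration("load_user_profile"):
        time.sleep(0.01)  # stand-in for the real work
```

Because the threshold is a parameter, it can be tuned per operation without code changes elsewhere, in line with the point about parametrized configuration.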