Performance Profiling
When to Use
- •Establishing performance baselines before optimization
- •Diagnosing slow response times, high CPU, or memory issues
- •Identifying bottlenecks in application, database, or infrastructure
- •Planning capacity for expected load increases
- •Validating performance improvements after optimization
- •Creating performance budgets for new features
Core Methodology
The Golden Rule: Measure First
Never optimize based on assumptions. Follow this order:
- •Measure - Establish baseline metrics
- •Identify - Find the actual bottleneck
- •Hypothesize - Form a theory about the cause
- •Fix - Implement targeted optimization
- •Validate - Measure again to confirm improvement
- •Document - Record findings and decisions
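The measure-fix-validate loop above can be sketched with nothing but the standard library. This is a minimal illustration, not a prescribed harness: `measure`, `summarize`, and the two `count_common_*` functions are hypothetical names, and the workload is a stand-in for whatever code is under investigation.

```python
import statistics
import time


def measure(func, *args, repeats=5):
    """Call func(*args) `repeats` times; return each wall-clock duration in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce raw timings to the figures worth recording as a baseline."""
    return {
        "runs": len(timings),
        "median_s": statistics.median(timings),
        "worst_s": max(timings),
    }


# Hypothetical workload: count items of `a` that also appear in `b`.
def count_common_before(a, b):
    return sum(1 for x in a if x in b)  # `b` is a list, so each lookup scans it


def count_common_after(a, b):
    b_set = set(b)  # targeted fix: hash-based membership checks
    return sum(1 for x in a if x in b_set)


a = list(range(2000))
b = list(range(1000, 3000))

baseline = summarize(measure(count_common_before, a, b))   # Measure
candidate = summarize(measure(count_common_after, a, b))   # Validate
print("baseline :", baseline)
print("candidate:", candidate)
```

Recording both summaries alongside the change covers the Document step; the comparison, rather than intuition, decides whether the fix is kept.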
Profiling Hierarchy
Profile at the right level to find the actual bottleneck:
Application Level
|-- Request/Response timing
|-- Function/Method profiling
|-- Memory allocation tracking
|
System Level
|-- CPU utilization per process
|-- Memory usage patterns
|-- I/O wait times
|-- Network latency
|
Infrastructure Level
|-- Database query performance
|-- Cache hit rates
|-- External service latency
|-- Resource saturation
Profiling Patterns
CPU Profiling
Identify what code consumes CPU time:
- •Sampling profilers - Low overhead; results are statistical estimates rather than exact counts
- •Instrumentation profilers - Exact counts, higher overhead
- •Flame graphs - Visual representation of call stacks
Key metrics:
- •Self time (time in function itself)
- •Total time (self time + time in called functions)
- •Call count and frequency
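As one way to read these metrics in practice, the sketch below uses Python's built-in `cProfile`, which is an instrumentation (deterministic) profiler. In its report, `tottime` corresponds to self time, `cumtime` to total time, and `ncalls` to call count. The `parse` and `total` functions and the `profile_call` helper are hypothetical examples.

```python
import cProfile
import io
import pstats


# Hypothetical workload: `total` spends most of its time inside `parse`.
def parse(records):
    return [int(r) for r in records]


def total(records):
    return sum(parse(records))


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative (total) time and print the top 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_call(total, [str(i) for i in range(100_000)])
print(report)
```

A function with a large `cumtime` but small `tottime` is mostly waiting on its callees, so the place to look is further down the call tree.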
Memory Profiling
Track allocation patterns and detect leaks:
- •Heap snapshots - Point-in-time memory state
- •Allocation tracking - What allocates memory and when
- •Garbage collection analysis - GC frequency and duration
Key metrics:
- •Heap size over time
- •Object retention
- •Allocation rate
- •GC pause times
I/O Profiling
Measure disk and network operations:
- •Disk I/O - Read/write latency, throughput, IOPS
- •Network I/O - Latency, bandwidth, connection count
- •Database I/O - Query time, connection pool usage
Key metrics:
- •Latency percentiles (p50, p95, p99)
- •Throughput (ops/sec, MB/sec)
- •Queue depth and wait times
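To show why percentiles are listed rather than a single average, here is a small sketch using `statistics.quantiles` from the standard library. The sample data is invented for illustration: 90 fast operations and 10 slow ones.

```python
import statistics


def latency_summary(samples_ms):
    """Return mean and selected percentiles for a list of latencies in ms."""
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Invented sample: 90 operations at 10 ms, 10 operations at 1000 ms.
samples = [10.0] * 90 + [1000.0] * 10
summary = latency_summary(samples)
print(summary)
```

For this sample the median is 10 ms and the mean is 109 ms, while p95 and p99 are both 1000 ms. The mean describes no operation that actually occurred; the percentiles show that the typical case is fast and one in ten is very slow.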
Bottleneck Identification
The USE Method
For each resource, check:
- •Utilization - Percentage of time resource is busy
- •Saturation - Degree of queued work
- •Errors - Error count for the resource
The RED Method
For services, measure:
- •Rate - Requests per second
- •Errors - Failed requests per second
- •Duration - Distribution of request latencies
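The three RED signals can be derived from a plain list of request records. The sketch below assumes a hypothetical `Request` record with a duration and a success flag, collected over a known time window; real services would usually get these values from their metrics system instead.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    ok: bool


def red_metrics(requests, window_s):
    """Compute Rate, Errors, and Duration for requests seen in window_s seconds."""
    durations = [r.duration_ms for r in requests]
    failures = sum(1 for r in requests if not r.ok)
    # quantiles needs at least two samples; n=100 gives percentile cut points.
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    return {
        "rate_per_s": len(requests) / window_s,
        "errors_per_s": failures / window_s,
        "duration_p50_ms": cuts[49],
        "duration_p99_ms": cuts[98],
    }


# Invented sample: 10 requests in a 5-second window, 2 of them failed.
sample = [Request(duration_ms=10.0 * (i + 1), ok=(i < 8)) for i in range(10)]
metrics = red_metrics(sample, window_s=5.0)
print(metrics)
```

Duration is reported as percentiles rather than one number because the method asks for the distribution of latencies.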
Common Bottleneck Patterns
| Pattern | Symptoms | Typical Causes |
|---|---|---|
| CPU-bound | High CPU, low I/O wait | Inefficient algorithms, tight loops |
| Memory-bound | High memory, GC pressure | Memory leaks, large allocations |
| I/O-bound | Low CPU, high I/O wait | Slow queries, network latency |
| Lock contention | Low CPU, threads spend most time waiting | Contended locks, exhausted connection pools |
| N+1 queries | Many small DB queries | Missing joins, lazy loading |
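The N+1 row is easiest to recognize by counting statements. The sketch below uses an in-memory SQLite database with a hypothetical `authors`/`books` schema and `set_trace_callback` to record every statement the connection runs, then compares per-row lookups against a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
INSERT INTO books VALUES (1, 1, 'Notes'), (2, 2, 'Compilers'), (3, 3, 'Discipline');
""")

executed = []
conn.set_trace_callback(executed.append)  # record each SQL statement that runs


def titles_n_plus_one():
    """One query for the authors, then one more query per author."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ?", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_joined():
    """The same data fetched with a single join."""
    result = {}
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
    ).fetchall()
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result


def count_statements(func):
    """Run func and return (result, number of SQL statements it executed)."""
    executed.clear()
    result = func()
    return result, len(executed)


print(count_statements(titles_n_plus_one)[1], "statements with per-row lookups")
print(count_statements(titles_joined)[1], "statement with a join")
```

With three authors the first approach issues four statements and the second issues one; the gap grows linearly with the number of parent rows.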
Amdahl's Law
Optimization impact is limited by the fraction of time affected:
If 90% of time is in function A and 10% in function B:
- •Cutting A's time by 50% = 45% less total time
- •Cutting B's time by 50% = 5% less total time
Focus on the biggest contributors first.
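The arithmetic in the example above, together with the usual speedup form of Amdahl's Law, can be checked with a short sketch. The function names are illustrative; `fraction` is the share of total time spent in the code being optimized.

```python
def total_time_saved(fraction, local_reduction):
    """Share of total time removed when `fraction` of the run gets
    `local_reduction` (0..1) faster."""
    return fraction * local_reduction


def overall_speedup(fraction, local_speedup):
    """Amdahl's Law: whole-program speedup when `fraction` of the run
    becomes `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Function A: 90% of time, made twice as fast (time cut by 50%).
print(total_time_saved(0.9, 0.5), overall_speedup(0.9, 2.0))
# Function B: 10% of time, made twice as fast.
print(total_time_saved(0.1, 0.5), overall_speedup(0.1, 2.0))
```

Even an infinitely fast function B could not remove more than 10% of total time, which is why the larger contributor comes first.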
Capacity Planning
Baseline Establishment
Measure current capacity under production load:
- •Peak load metrics - Maximum concurrent users, requests/sec
- •Resource headroom - How close to limits at peak
- •Scaling patterns - Linear, sub-linear, or super-linear
Load Testing Approach
- •Establish baseline - Current performance at normal load
- •Ramp testing - Gradually increase load to find limits
- •Stress testing - Push beyond limits to understand failure modes
- •Soak testing - Sustained load to find memory leaks, degradation
Capacity Metrics
| Metric | What It Tells You |
|---|---|
| Throughput at saturation | Maximum system capacity |
| Latency at 80% load | Performance before degradation |
| Error rate under stress | Failure patterns |
| Recovery time | How quickly system returns to normal |
Growth Planning
Required Capacity = Current Load x Growth Factor x (1 + Safety Margin)
Example:
- •Current: 1000 req/sec
- •Expected growth: 50% per year (growth factor 1.5)
- •Safety margin: 30%
Year 1 need = 1000 x 1.5 x 1.3 = 1950 req/sec
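The worked example applies the 30% safety margin as a multiplier on the grown load, and the sketch below follows that reading. The `years` parameter and the assumption that growth compounds annually are additions for illustration, not something the example specifies.

```python
import math


def required_capacity(current_load, annual_growth, safety_margin, years=1):
    """Projected capacity need.

    current_load  -- present peak load, e.g. requests per second
    annual_growth -- expected growth per year as a fraction (0.5 = 50%)
    safety_margin -- headroom as a fraction (0.3 = 30%)
    years         -- planning horizon; growth is assumed to compound (assumption)
    """
    grown_load = current_load * (1.0 + annual_growth) ** years
    return grown_load * (1.0 + safety_margin)


year_1 = required_capacity(1000, annual_growth=0.5, safety_margin=0.3)
year_3 = required_capacity(1000, annual_growth=0.5, safety_margin=0.3, years=3)
print(round(year_1), round(year_3))
```

The one-year result matches the 1950 req/sec figure from the example; longer horizons show how quickly compounding growth outruns a fixed margin.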
Optimization Patterns
Quick Wins
- •Enable caching - Application, CDN, database query cache
- •Add indexes - For slow queries identified in profiling
- •Compression - Gzip/Brotli for responses
- •Connection pooling - Reduce connection overhead
- •Batch operations - Reduce round-trips
Algorithmic Improvements
- •Reduce complexity - O(n^2) to O(n log n)
- •Lazy evaluation - Defer work until needed
- •Memoization - Cache computed results
- •Pagination - Limit data processed at once
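Of these, memoization is the simplest to demonstrate. The sketch below uses `functools.lru_cache` from the standard library; `shipping_cost` and its pricing formula are hypothetical, and the counter exists only to make the skipped work visible.

```python
from functools import lru_cache

computations = {"count": 0}  # counts how often the real work runs


@lru_cache(maxsize=1024)
def shipping_cost(weight_kg, zone):
    """Hypothetical expensive, pure calculation; arguments must be hashable."""
    computations["count"] += 1
    return round(4.0 + 1.25 * weight_kg + 2.0 * zone, 2)


first = shipping_cost(2, 3)   # computed
second = shipping_cost(2, 3)  # served from the cache
print(first, second, computations["count"], shipping_cost.cache_info())
```

Memoization only suits functions whose result depends solely on their arguments; anything that reads changing state needs an invalidation strategy as well.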
Architectural Changes
- •Horizontal scaling - Add more instances
- •Async processing - Queue background work
- •Read replicas - Distribute read load
- •Caching layers - Redis, Memcached
- •CDN - Edge caching for static content
Best Practices
- •Profile in production-like environments; development machines typically differ in data volume, hardware, and concurrency
- •Use percentiles (p95, p99) not averages for latency
- •Monitor continuously, not just during incidents
- •Set performance budgets and enforce them in CI
- •Document baseline metrics before making changes
- •Keep profiling overhead low in production
- •Correlate metrics across layers (application, database, infrastructure)
- •Distinguish latency (time per request) from throughput (requests per unit time); improving one can worsen the other
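A performance budget enforced in CI can be as small as one timed assertion. The sketch below is one possible shape, assuming a hypothetical `render_page` function and a 50 ms budget chosen purely for illustration; it uses `timeit.repeat` and takes the best of several runs to reduce noise on shared runners.

```python
import timeit


def check_budget(func, budget_s, repeats=5):
    """Return (within_budget, best_seconds) for a single call of func."""
    # Best-of-N filters out slowdowns caused by other work on the machine.
    best = min(timeit.repeat(func, number=1, repeat=repeats))
    return best <= budget_s, best


# Hypothetical code under budget.
def render_page():
    return "".join(f"<li>{i}</li>" for i in range(1000))


within_budget, best = check_budget(render_page, budget_s=0.05)
print(f"best={best:.6f}s within_budget={within_budget}")
```

In a test suite the result would typically feed an assertion such as `assert within_budget`, so that a regression fails the build instead of surfacing in production.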
Anti-Patterns
- •Optimizing without measurement
- •Using averages for latency metrics
- •Profiling only in development
- •Ignoring tail latencies (p99, p99.9)
- •Premature optimization of non-bottleneck code
- •Over-engineering for hypothetical scale
- •Caching without invalidation strategy
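For the last item, a cache needs at least one way for stale entries to leave it. The sketch below shows two common mechanisms side by side: time-based expiry and explicit invalidation on write. `TTLCache` is a hypothetical minimal class, not a library API, and the injectable `clock` exists so the expiry logic can be tested without sleeping.

```python
import time


class TTLCache:
    """Minimal cache with time-based expiry and explicit invalidation."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl_s, value)
        return value

    def invalidate(self, key):
        """Drop a key immediately, e.g. right after the underlying data changes."""
        self._entries.pop(key, None)


# Usage with a hypothetical backing store.
store = {"user:1": "Ada"}
cache = TTLCache(ttl_s=30.0)
print(cache.get("user:1", store.get))  # loaded from the store
store["user:1"] = "Ada L."             # write to the source of truth...
cache.invalidate("user:1")             # ...and invalidate the cached copy
print(cache.get("user:1", store.get))  # reloaded with the new value
```

Without the `invalidate` call, readers would keep seeing the old value until the TTL expired, which is the failure mode this anti-pattern describes.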
References
- •Profiling Tools Reference - Tools by language and platform