Performance Profiling

When to Use

•Establishing performance baselines before optimization
•Diagnosing slow response times, high CPU, or memory issues
•Identifying bottlenecks in application, database, or infrastructure
•Planning capacity for expected load increases
•Validating performance improvements after optimization
•Creating performance budgets for new features

Core Methodology

The Golden Rule: Measure First

Never optimize based on assumptions. Follow this order:

•Measure - Establish baseline metrics
•Identify - Find the actual bottleneck
•Hypothesize - Form a theory about the cause
•Fix - Implement targeted optimization
•Validate - Measure again to confirm improvement
•Document - Record findings and decisions
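The Measure and Validate steps can be as simple as timing a baseline and a candidate fix with the same harness, then checking that both return the same answer. Below is a minimal Python sketch that assumes an in-process workload. The list-versus-set lookup is a hypothetical stand-in for whatever bottleneck profiling actually identifies.

```python
import statistics
import time
from typing import Callable


def measure(func: Callable[[], object], repeats: int = 20) -> dict:
    """Time func() `repeats` times and summarize wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


# Hypothetical workload: membership tests against a list (baseline)
# versus a set (candidate fix).
ITEMS = list(range(5_000))
ITEM_SET = set(ITEMS)
QUERIES = range(0, 5_000, 50)


def baseline() -> int:
    return sum(1 for q in QUERIES if q in ITEMS)  # linear scan per lookup


def optimized() -> int:
    return sum(1 for q in QUERIES if q in ITEM_SET)  # hash lookup per query


if __name__ == "__main__":
    # Validate correctness first, then compare the two measurements.
    assert baseline() == optimized()
    before = measure(baseline)
    after = measure(optimized)
    print(f"baseline  median {before['median'] * 1e3:.3f} ms")
    print(f"optimized median {after['median'] * 1e3:.3f} ms")
```

The sketch reports the median rather than a single run because one timing is easily skewed by other activity on the machine. Keeping both numbers also covers the Document step.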

Profiling Hierarchy

Profile at the right level to find the actual bottleneck:

```text
Application Level
    |-- Request/Response timing
    |-- Function/Method profiling
    |-- Memory allocation tracking
    |
System Level
    |-- CPU utilization per process
    |-- Memory usage patterns
    |-- I/O wait times
    |-- Network latency
    |
Infrastructure Level
    |-- Database query performance
    |-- Cache hit rates
    |-- External service latency
    |-- Resource saturation
```

Profiling Patterns

CPU Profiling

Identify what code consumes CPU time:

•Sampling profilers - Low overhead, statistical accuracy
•Instrumentation profilers - Exact counts, higher overhead
•Flame graphs - Visualize sampled call stacks, with each frame's width proportional to its share of samples

Key metrics:

•Self time (time in function itself)
•Total time (self time + time in called functions)
•Call count and frequency
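Python's built-in `cProfile` is an instrumenting profiler that reports all three metrics directly. In its output, `tottime` is self time and `cumtime` is total time. In the sketch below, `square` and `sum_of_squares` are hypothetical workload functions.

```python
import cProfile
import pstats


def square(x: int) -> int:
    return x * x


def sum_of_squares(n: int) -> int:
    # Total time for this function includes the time spent inside square().
    total = 0
    for i in range(n):
        total += square(i)
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile.

    Returns (result, summary), where summary maps function name to
    (call_count, self_seconds, total_seconds).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    summary = {}
    # Each entry: (primitive calls, total calls, self time, cumulative time, callers)
    for (_file, _line, name), (_pcalls, ncalls, tottime, cumtime, _callers) in stats.stats.items():
        summary[name] = (ncalls, tottime, cumtime)
    return result, summary


if __name__ == "__main__":
    _, summary = profile_call(sum_of_squares, 100_000)
    for name in ("sum_of_squares", "square"):
        calls, self_s, total_s = summary[name]
        print(f"{name}: calls={calls} self={self_s:.4f}s total={total_s:.4f}s")
```

For a quick interactive view, calling `sort_stats("cumulative").print_stats(10)` on a `pstats.Stats` object prints the ten most expensive entries. Instrumentation overhead inflates the cost of very small, frequently called functions like `square`, which is the trade-off noted above.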

Memory Profiling

Track allocation patterns and detect leaks:

•Heap snapshots - Point-in-time memory state
•Allocation tracking - What allocates memory and when
•Garbage collection analysis - GC frequency and duration

Key metrics:

•Heap size over time
•Object retention
•Allocation rate
•GC pause times
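Comparing two heap snapshots is a common way to find which code retains memory. This sketch uses Python's standard `tracemalloc` module. The never-cleared `_response_cache` is a hypothetical leak that we create on purpose.

```python
import tracemalloc

# Hypothetical module-level cache that is appended to but never cleared,
# a common shape for a slow memory leak.
_response_cache = []


def handle_request(request_id: int) -> None:
    _response_cache.append(b"x" * 10_000)  # roughly 10 KB retained per request


def top_allocation_growth(n_requests: int, limit: int = 3):
    """Compare heap snapshots taken before and after a burst of requests."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for i in range(n_requests):
        handle_request(i)
    after = tracemalloc.take_snapshot()
    _current, peak = tracemalloc.get_traced_memory()
    # Group by source line; results are sorted by largest absolute size change first.
    diffs = after.compare_to(before, "lineno")
    tracemalloc.stop()
    return diffs[:limit], peak


if __name__ == "__main__":
    diffs, peak = top_allocation_growth(200)
    for diff in diffs:
        print(diff)  # source line with size and count deltas
    print(f"peak traced memory: {peak / 1024:.0f} KiB")
```

A line whose size delta keeps growing across repeated bursts is a retention candidate. Growth that appears only once is more likely a warm-up effect.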

I/O Profiling

Measure disk and network operations:

•Disk I/O - Read/write latency, throughput, IOPS
•Network I/O - Latency, bandwidth, connection count
•Database I/O - Query time, connection pool usage

Key metrics:

•Latency percentiles (p50, p95, p99)
•Throughput (ops/sec, MB/sec)
•Queue depth and wait times
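Latency percentiles can be computed from raw samples with the nearest-rank method. This is one of several percentile definitions; monitoring systems often interpolate instead. The sample data below is hypothetical and shows how a mean can hide a slow tail.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> dict:
    return {f"p{p:g}": percentile(samples_ms, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    # 98 fast requests and 2 very slow ones.
    samples = [10.0] * 98 + [1000.0] * 2
    print(latency_summary(samples))     # {'p50': 10.0, 'p95': 10.0, 'p99': 1000.0}
    print(sum(samples) / len(samples))  # 29.8
```

The mean of 29.8 ms describes no real request. p50 shows the typical case, and p99 exposes the 1-second tail.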

Bottleneck Identification

The USE Method

For each resource, check:

•Utilization - Percentage of time resource is busy
•Saturation - Degree of queued work
•Errors - Error count for the resource
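The USE checklist can be expressed as a small check over per-resource samples. In the sketch below, the `ResourceSample` fields and the 80% utilization and queue-length thresholds are illustrative assumptions, not standard values. A real setup would read these numbers from OS or monitoring counters.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    name: str
    busy_seconds: float      # time the resource was busy during the window
    window_seconds: float    # length of the observation window
    avg_queue_length: float  # work waiting for the resource (saturation signal)
    errors: int              # error count for the resource in the window


def use_report(sample: ResourceSample,
               util_threshold: float = 0.8,
               queue_threshold: float = 1.0) -> list[str]:
    """Return USE findings; thresholds are illustrative defaults, not standards."""
    findings = []
    utilization = sample.busy_seconds / sample.window_seconds
    if utilization >= util_threshold:
        findings.append(f"{sample.name}: utilization {utilization:.0%}")
    if sample.avg_queue_length > queue_threshold:
        findings.append(f"{sample.name}: saturated (queue {sample.avg_queue_length:g})")
    if sample.errors > 0:
        findings.append(f"{sample.name}: {sample.errors} errors")
    return findings


if __name__ == "__main__":
    disk = ResourceSample("disk0", busy_seconds=54, window_seconds=60,
                          avg_queue_length=4, errors=0)
    print(use_report(disk))
```

The checklist goes through all three signals for every resource, because a resource can be saturated without looking fully utilized when you average over the window.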

The RED Method

For services, measure:

•Rate - Requests per second
•Errors - Failed requests per second
•Duration - Distribution of request latencies
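RED metrics for one time window can be derived from completed-request records. This sketch makes two assumptions: the request records are hypothetical, and only 5xx status codes count as errors. Services often define errors differently, so adjust that rule to match yours.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def nearest_rank(sorted_values: Sequence[float], p: int) -> Optional[float]:
    if not sorted_values:
        return None
    return sorted_values[math.ceil(p * len(sorted_values) / 100) - 1]


def red_metrics(requests: Sequence[Request], window_seconds: float) -> dict:
    """Rate, Errors, Duration for the requests completed in one time window.

    Assumption: only 5xx responses count as errors.
    """
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_rps": len(requests) / window_seconds,
        "error_rps": errors / window_seconds,
        "p50_ms": nearest_rank(durations, 50),
        "p99_ms": nearest_rank(durations, 99),
    }
```

Duration is reported as percentiles, not a single average, in keeping with the "distribution" wording above.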

Common Bottleneck Patterns

Pattern	Symptoms	Typical Causes
CPU-bound	High CPU, low I/O wait	Inefficient algorithms, tight loops
Memory-bound	High memory, GC pressure	Memory leaks, large allocations
I/O-bound	Low CPU, high I/O wait	Slow queries, network latency
Lock contention	Low CPU, high wait time	Coarse-grained locks, exhausted connection pools
N+1 queries	Many small DB queries	Missing joins, lazy loading

Amdahl's Law

Optimization impact is limited by the fraction of time affected:

```text
If 90% of time is in function A and 10% in function B:
- Halving A's time cuts total runtime by 45% (0.9 x 0.5)
- Halving B's time cuts total runtime by 5% (0.1 x 0.5)
```

Focus on the biggest contributors first.
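Amdahl's Law in its usual form gives the overall speedup when a fraction p of runtime is made s times faster: speedup = 1 / ((1 - p) + p / s). The sketch below applies the formula to the A/B example. The function names are hypothetical.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of runtime becomes `factor` times faster."""
    if not 0.0 <= fraction <= 1.0 or factor <= 0:
        raise ValueError("fraction must be in [0, 1] and factor must be positive")
    new_runtime = (1.0 - fraction) + fraction / factor  # relative to old runtime of 1.0
    return 1.0 / new_runtime


def runtime_reduction(fraction: float, factor: float) -> float:
    """Share of total runtime saved, e.g. 0.45 means 45% less time."""
    return 1.0 - 1.0 / amdahl_speedup(fraction, factor)


if __name__ == "__main__":
    print(round(runtime_reduction(0.9, 2), 2))  # 0.45 (function A halved)
    print(round(runtime_reduction(0.1, 2), 2))  # 0.05 (function B halved)
    # Even a near-infinitely fast A caps overall speedup at 1 / (1 - 0.9) = 10x.
    print(round(amdahl_speedup(0.9, 1e12), 2))  # 10.0
```

The last line is the practical takeaway. The 10% you leave untouched sets a hard ceiling on what optimizing the other 90% can achieve.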

Capacity Planning

Baseline Establishment

Measure current capacity under production load:

•Peak load metrics - Maximum concurrent users, requests/sec
•Resource headroom - How close to limits at peak
•Scaling patterns - How resource usage grows with load: linear, sub-linear, or super-linear

Load Testing Approach

•Establish baseline - Current performance at normal load
•Ramp testing - Gradually increase load to find limits
•Stress testing - Push beyond limits to understand failure modes
•Soak testing - Sustained load over an extended period to surface memory leaks and gradual degradation
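Ramp testing can be sketched as a loop that raises concurrency step by step and records throughput and p95 latency at each level. This is a toy, single-process sketch: `call_service` is a hypothetical stand-in that just sleeps, and real load tests normally drive a deployed system with a dedicated load-generation tool.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def call_service() -> None:
    """Hypothetical stand-in for one request to the system under test."""
    time.sleep(0.002)


def run_step(concurrency: int, requests: int) -> dict:
    latencies = []

    def timed_call() -> None:
        start = time.perf_counter()
        call_service()
        latencies.append(time.perf_counter() - start)  # list.append is atomic in CPython

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(requests):
            pool.submit(timed_call)
    # Leaving the with-block waits for all submitted calls to finish.
    elapsed = time.perf_counter() - started
    return {
        "concurrency": concurrency,
        "throughput_rps": requests / elapsed,
        "p95_ms": statistics.quantiles(latencies, n=20)[-1] * 1000,
    }


def ramp_test(levels=(1, 2, 4, 8), requests_per_level: int = 50) -> list[dict]:
    """Raise concurrency step by step; look for throughput flattening while p95 climbs."""
    return [run_step(c, requests_per_level) for c in levels]


if __name__ == "__main__":
    for step in ramp_test():
        print(step)
```

The level where throughput stops rising but p95 keeps growing approximates the saturation point. Stress and soak tests reuse the same loop with higher levels or longer durations.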

Capacity Metrics

Metric	What It Tells You
Throughput at saturation	Maximum system capacity
Latency at 80% load	Performance before degradation
Error rate under stress	Failure patterns
Recovery time	How quickly system returns to normal

Growth Planning

```text
Required Capacity = Current Load x Growth Factor x (1 + Safety Margin)

Example:
- Current: 1000 req/sec
- Expected growth: 50% per year (Growth Factor = 1.5)
- Safety margin: 30%

Year 1 need = (1000 x 1.5) x 1.3 = 1950 req/sec
```
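The growth calculation can be scripted for several years at once. This sketch assumes growth compounds each year and the safety margin is applied once, on top of the projected load. Both are planning assumptions, not rules.

```python
def required_capacity(current_rps: float, annual_growth: float,
                      safety_margin: float, years: int = 1) -> float:
    """Compounded growth over `years`, then a multiplicative safety margin."""
    projected = current_rps * (1 + annual_growth) ** years
    return projected * (1 + safety_margin)


if __name__ == "__main__":
    for year in (1, 2, 3):
        print(year, round(required_capacity(1000, 0.50, 0.30, year)))
```

Year 1 reproduces the 1950 req/sec from the example. Year 2 compounds to 1000 x 1.5^2 x 1.3 = 2925 req/sec.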

Optimization Patterns

Quick Wins

•Enable caching - Application, CDN, database query cache
•Add indexes - For slow queries identified in profiling
•Compression - Gzip/Brotli for responses
•Connection pooling - Reduce connection overhead
•Batch operations - Reduce round-trips
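Before enabling response compression, you can estimate its effect on a representative payload. This sketch uses only Python's standard `gzip` module, since Brotli would need a third-party package. The JSON payload is hypothetical.

```python
import gzip
import json


def compressed_size(payload: bytes, level: int = 6) -> int:
    """Size in bytes of the gzip-compressed payload."""
    return len(gzip.compress(payload, compresslevel=level))


# Hypothetical API response: repetitive JSON usually compresses well.
records = [{"id": i, "status": "active", "region": "us-east-1"} for i in range(500)]
body = json.dumps(records).encode("utf-8")

if __name__ == "__main__":
    print(f"raw {len(body)} bytes, gzip {compressed_size(body)} bytes")
```

Compression trades CPU for bandwidth. Higher `compresslevel` values shrink output further but cost more CPU per response, so measure both sides.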

Algorithmic Improvements

•Reduce complexity - O(n^2) to O(n log n)
•Lazy evaluation - Defer work until needed
•Memoization - Cache computed results
•Pagination - Limit data processed at once
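Memoization is often a one-line change in Python with `functools.lru_cache`. The recursive Fibonacci function below is a standard illustration, not taken from this guide. Uncached, it takes exponential time; cached, each `n` is computed once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Each distinct n is computed once; repeated calls become cache hits.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    print(fib(80))
    print(fib.cache_info())  # hits, misses, maxsize, currsize
```

An unbounded cache (`maxsize=None`) is fine for this bounded input space. For open-ended keys, a finite `maxsize` keeps memory from growing without limit.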

Architectural Changes

•Horizontal scaling - Add more instances
•Async processing - Queue background work
•Read replicas - Distribute read load
•Caching layers - Redis, Memcached
•CDN - Edge caching for static content
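Async processing moves slow work off the request path: the handler enqueues a job and returns at once, and a worker drains the queue. This in-process sketch uses the standard `queue` and `threading` modules. `enqueue_request` and the doubling "work" are hypothetical. A production system would typically use a message broker so that jobs survive restarts.

```python
import queue
import threading


def start_worker(jobs: queue.Queue, results: list) -> threading.Thread:
    def worker() -> None:
        while True:
            job = jobs.get()
            if job is None:  # sentinel value: stop the worker
                jobs.task_done()
                break
            results.append(job * 2)  # stand-in for slow background work
            jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def enqueue_request(jobs: queue.Queue, payload: int) -> str:
    jobs.put(payload)  # hand off the work and respond immediately
    return "accepted"


if __name__ == "__main__":
    jobs: queue.Queue = queue.Queue()
    results: list = []
    worker = start_worker(jobs, results)
    print([enqueue_request(jobs, i) for i in range(3)])
    jobs.put(None)
    worker.join()
    print(results)
```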

Best Practices

•Profile in production-like environments; development machines often differ in hardware, data volume, and configuration
•Use percentiles (p95, p99), not averages, for latency
•Monitor continuously, not just during incidents
•Set performance budgets and enforce them in CI
•Document baseline metrics before making changes
•Keep profiling overhead low in production
•Correlate metrics across layers (application, database, infrastructure)
•Understand the difference between latency and throughput
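Enforcing a performance budget in CI can be as simple as comparing measured p95 latency against a per-operation limit and exiting non-zero on a violation. In this sketch, the `BUDGETS_MS` values, the `checkout`/`search` operation names, and the measurements are all hypothetical.

```python
import math

# Hypothetical latency budgets (p95, milliseconds) per operation.
BUDGETS_MS = {"checkout": 300.0, "search": 150.0}


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples)
    return ordered[math.ceil(95 * len(ordered) / 100) - 1]


def check_budgets(measurements: dict[str, list[float]]) -> list[str]:
    """Return one message per operation whose p95 exceeds its budget."""
    violations = []
    for name, budget in BUDGETS_MS.items():
        observed = p95(measurements[name])
        if observed > budget:
            violations.append(f"{name}: p95 {observed:.1f} ms exceeds budget {budget:.1f} ms")
    return violations


def main() -> int:
    # Hypothetical benchmark output; a real CI job would load this from a results file.
    measured = {
        "checkout": [120.0] * 95 + [450.0] * 5,  # p95 = 120 ms, within budget
        "search": [80.0] * 90 + [200.0] * 10,    # p95 = 200 ms, over budget
    }
    problems = check_budgets(measured)
    for problem in problems:
        print(problem)
    return 1 if problems else 0  # a non-zero exit code fails the CI step
```

A CI step would run `sys.exit(main())` so that any violation fails the build.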

Anti-Patterns

•Optimizing without measurement
•Using averages for latency metrics
•Profiling only in development
•Ignoring tail latencies (p99, p999)
•Premature optimization of non-bottleneck code
•Over-engineering for hypothetical scale
•Caching without invalidation strategy
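An invalidation strategy can combine a time-to-live (TTL) with explicit invalidation on writes. The `TTLCache` class below is a hypothetical minimal sketch, not a specific library's API. It takes an injectable clock so that expiry can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Entries expire after ttl_seconds and can also be invalidated explicitly."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        value = loader(key)  # miss or expired: reload from the source of truth
        self._data[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call this when the underlying data changes, e.g. after a write.
        self._data.pop(key, None)
```

The TTL bounds how stale an entry can get even if an invalidation is missed. Explicit invalidation keeps data fresh right after known writes.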

References

•Profiling Tools Reference - Tools by language and platform