Infrastructure Sizing and Capacity Planning
Overview
Infrastructure sizing is the process of determining the exact amount of CPU, Memory, Storage, and Network capacity required for a workload. Effective sizing avoids both Over-provisioning (wasted money) and Under-provisioning (poor performance/outages).
Core Principle: "Sizing is not a one-time event; it is a continuous feedback loop based on real utilization metrics."
1. Right-Sizing Principles
Traditional sizing used the "Peak + Buffer" model, leading to massive waste. Modern sizing uses Demand-Driven Allocation.
| Principle | Description |
|---|---|
| Utilization Thresholds | Target 40-70% CPU utilization. Below 40% is over-provisioned; 70-80% leaves little headroom for spikes; above 80% is risky. |
| Vertical first... | Increase resource limits for single-threaded or monolithic apps. |
| ...Horizontal usually | Spread load across multiple small instances for resilience and elasticity. |
| Metric-Based | Use P95 or P99 metrics for latency, but Average for base capacity sizing. |
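As a rough illustration of the utilization thresholds in the table, the sketch below classifies a series of CPU samples by their average. The function name `classify_cpu` and the "watch" label for the 70-80% band are assumptions made for this example.

```python
from statistics import mean

def classify_cpu(samples_pct):
    """Classify average CPU utilization (percent) against the table's thresholds."""
    avg = mean(samples_pct)
    if avg < 40:
        return "over-provisioned"
    if avg <= 70:
        return "on-target"
    if avg <= 80:
        return "watch"  # assumption: label chosen here for the 70-80% band
    return "risky"

print(classify_cpu([10, 15, 12]))  # over-provisioned
print(classify_cpu([55, 60, 65]))  # on-target
```

Averages suit base capacity sizing as the table notes; latency targets would use P95 or P99 instead.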
2. Compute Sizing (EC2, VMs, GCE)
Step 1: Resource Profiling
Run your app in a staging environment and measure:
- CPU: Is the app CPU-bound (mathematical calculations, compression)?
- Memory: Is it memory-bound (caching, large payloads, in-memory DBs)?
- Thread Usage: How many concurrent requests can one CPU core handle?
Step 2: Instance Family Selection
| Family | Best For | AWS Example | GCP Example |
|---|---|---|---|
| General Purpose | Balanced workloads, small DBs | t3, m6g | n2, e2 |
| Compute Optimized | Batch processing, high-traffic APIs | c6g, c7i | c2, c3 |
| Memory Optimized | Redis, high-RAM DBs, Analytics | r6g, x2 | m1, m2 |
Sizing Formula (Basic)
Target Instances = (Peak Requests per Second * Avg Service Time per Req in seconds) / (Target Utilization per Core * Cores per Instance)
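A minimal sketch of the formula, assuming the load term is the peak arrival rate in requests per second and the service time is CPU-seconds per request, so that their product is the number of cores kept busy (Little's Law). The function name and the sample numbers are hypothetical.

```python
import math

def target_instances(peak_rps, service_time_s, target_util, cores_per_instance):
    """Instances needed so busy cores stay at or below the target utilization."""
    busy_cores = peak_rps * service_time_s            # cores kept busy at peak
    usable_cores = target_util * cores_per_instance   # cores we allow per instance
    return math.ceil(busy_cores / usable_cores)

# Hypothetical workload: 1,000 req/s, 20 ms per request, 60% target, 2 cores each
print(target_instances(1000, 0.020, 0.60, 2))  # 17
```

Rounding up matters here: 16.67 instances means 17, since a fractional instance cannot be provisioned.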
3. Database Sizing (RDS, Cloud SQL, Azure SQL)
IOPS (Input/Output Operations Per Second)
Disk performance is often the bottleneck, not CPU.
- GP3 (AWS): Baseline 3,000 IOPS included. Provision more for heavy writes.
- Provisioned IOPS (io2): For high-performance transactional DBs.
Storage Growth Calculation
Required Storage = (Initial Data Size + Daily Ingest * Retention Period) * (1 + Overhead Buffer)
- Buffer: Always keep 20% free to allow for indexing and temp file creation.
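The storage calculation can be sketched as follows, assuming the overhead buffer applies to the whole data set (initial data plus retained ingest). The function name and the sample figures are hypothetical.

```python
def required_storage_gb(initial_gb, daily_ingest_gb, retention_days, overhead=0.20):
    """Total storage in GB, with the buffer applied to all retained data."""
    retained = initial_gb + daily_ingest_gb * retention_days
    return retained * (1 + overhead)

# Hypothetical database: 500 GB today, 2 GB/day ingest, 365-day retention
print(round(required_storage_gb(500, 2, 365)))  # 1476
```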
Connection Pool Sizing
Max Connections = (Instance RAM / 10MB) - (System Reserve)
- Too many connections lead to high "Context Switching" and performance degradation.
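A small sketch of the connection formula, assuming "System Reserve" is a number of connections held back for administrative and replication use. The 10 MB per connection figure comes from the formula above; the function name and the reserve of 100 are hypothetical.

```python
def max_connections(instance_ram_mb, per_conn_mb=10, system_reserve=0):
    """Upper bound on connections from RAM, minus a reserved count."""
    return max(0, instance_ram_mb // per_conn_mb - system_reserve)

# Hypothetical 16 GB instance with 100 connections held in reserve
print(max_connections(16 * 1024, system_reserve=100))  # 1538
```

This is only a ceiling. An application-side pool is normally set well below it to avoid the context-switching cost noted above.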
4. Cache Sizing (Redis/Memcached)
Caching is a trade-off between Memory Cost and Latency Benefits.
Formula: Working Set Size
Not all data needs to be in cache. Only store the Working Set (frequently accessed data).
- Measure Total Data Size.
- Analyze Access Distribution (Pareto Principle: 80% access to 20% data).
- Cache Size = 20-30% of Total Data Size.
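Where per-key access counts are available, the working set can be estimated directly rather than assumed to be 20-30%. The sketch below sums the sizes of the hottest items until they cover a chosen share of all accesses. The function name, the input shape, and the 80% default are assumptions for illustration.

```python
def working_set_bytes(items, coverage=0.80):
    """items: list of (size_bytes, access_count) pairs.
    Returns bytes needed to serve `coverage` of accesses from the hottest items."""
    total_hits = sum(hits for _, hits in items)
    if total_hits == 0:
        return 0
    needed, covered = 0, 0
    # Walk items from most to least accessed until the coverage target is met
    for size, hits in sorted(items, key=lambda it: it[1], reverse=True):
        if covered / total_hits >= coverage:
            break
        needed += size
        covered += hits
    return needed

# Hypothetical keyspace: one hot item takes 80 of 100 accesses
sample = [(100, 80), (100, 10), (100, 5), (100, 5)]
print(working_set_bytes(sample))  # 100
```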
Eviction Policy Impact
- allkeys-lru: Best for general caching.
- noeviction: Returns errors when full (dangerous).
5. Container Sizing (Kubernetes)
Understanding the difference between Requests and Limits is critical for both stability and cost.
| Metric | Purpose | Cost Impact |
|---|---|---|
| Requests | Kubernetes guarantees this capacity. Used for scheduling. | High: Requests determine how many nodes the cluster needs; pod-billed platforms such as GKE Autopilot charge on requests directly. |
| Limits | The maximum a pod can burst to. | Low: Generally doesn't impact cost unless using serverless K8s. |
The "OOMKill" Trap
If Memory Requests < Actual Usage, the pod might be scheduled on a node that runs out of RAM, leading to an OOMKill (Out Of Memory).
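The trap can be shown with simple arithmetic. In this sketch the scheduler packs pods by their memory request, and the overcommit ratio compares real usage with what the node can hold. Function names and all figures are hypothetical.

```python
def pods_per_node(node_alloc_mib, request_mib):
    """How many pods the scheduler will place, judging only by requests."""
    return node_alloc_mib // request_mib

def memory_overcommit(node_alloc_mib, pods, actual_usage_mib):
    """Ratio above 1.0 means real usage exceeds the node's allocatable memory."""
    return pods * actual_usage_mib / node_alloc_mib

# Hypothetical node with 8192 MiB allocatable; pods request 256 MiB but use 400 MiB
pods = pods_per_node(8192, 256)
print(pods, memory_overcommit(8192, pods, 400))  # 32 1.5625
```

A ratio of 1.56 means the node would need 56% more memory than it has, so some pods will be OOMKilled.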
6. Serverless Sizing (Lambda / Cloud Functions)
Serverless "scaling" is handled by the provider, but "sizing" (Memory allocation) is handled by you.
- Power Tuning: In AWS Lambda, increasing Memory also increases CPU proportionally.
- Strategy: Use AWS Lambda Power Tuning to find the "Sweet Spot" where performance and cost intersect.
| Memory (MB) | Duration (ms) | Cost ($) | Result |
|---|---|---|---|
| 128 | 1000 | 0.0000021 | Slow |
| 512 | 200 | 0.0000016 | Winner (Faster & Cheaper) |
| 1024 | 150 | 0.0000025 | Diminishing returns |
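The cost column can be approximated from memory and duration alone. This sketch assumes a compute price of $0.0000166667 per GB-second (an x86 on-demand rate; check current pricing for your region) and ignores the per-request charge. The function name is hypothetical.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumption: verify against current pricing

def invocation_cost(memory_mb, duration_ms):
    """Compute-only cost of one invocation in dollars."""
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND

# (memory MB, duration ms) pairs taken from the table above
runs = [(128, 1000), (512, 200), (1024, 150)]
for mem, ms in runs:
    print(mem, ms, f"{invocation_cost(mem, ms):.7f}")

best = min(runs, key=lambda r: invocation_cost(*r))
print("cheapest:", best)
```

The 512 MB configuration wins because the run time falls faster than the memory price rises; past that point the speed-up no longer offsets the higher rate.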
7. Network and CDN Sizing
- Throughput: Measure P99 payload size * Peak requests per second.
- CDN Coverage: What % of your traffic can be served from the edge?
- Goal: > 80% Cache Hit Ratio for static assets.
- Impact: CDN bandwidth is 50-70% cheaper than origin egress.
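The throughput estimate and the effect of the cache hit ratio can be sketched as below. It assumes payload sizes in kilobytes (1 KB = 1,000 bytes) and treats the hit ratio as a share of bytes rather than of requests. Function names and figures are hypothetical.

```python
def peak_throughput_mbps(p99_payload_kb, peak_rps):
    """Peak bandwidth in megabits per second."""
    return p99_payload_kb * 8 * peak_rps / 1000  # KB -> kilobits -> megabits

def origin_throughput_mbps(total_mbps, cache_hit_ratio):
    """Bandwidth that still reaches the origin after the CDN absorbs cache hits."""
    return total_mbps * (1 - cache_hit_ratio)

# Hypothetical service: 50 KB P99 payload at 2,000 req/s, 80% hit ratio
total = peak_throughput_mbps(50, 2000)
print(total, round(origin_throughput_mbps(total, 0.80)))  # 800.0 160
```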
8. Load Testing for Capacity Planning
Never size based on assumptions. Use tools like k6, Locust, or JMeter.
- Stepping Test: Gradually increase users until latency spikes (The "Knee" of the curve).
- Soak Test: Run at 80% load for 24 hours to find memory leaks.
- Stress Test: Find the "Breaking Point" to configure failover/auto-scaling.
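One simple way to locate the knee in stepping-test results is to flag the first step whose latency exceeds a multiple of the baseline. The 2x factor is an assumed heuristic, not a standard definition, and the function name and sample data are hypothetical.

```python
def find_knee(steps, factor=2.0):
    """steps: list of (users, p95_latency_ms) in increasing user order.
    Returns the first user count whose latency exceeds factor x the baseline."""
    if not steps:
        return None
    baseline = steps[0][1]  # latency at the lightest load step
    for users, latency in steps:
        if latency > factor * baseline:
            return users
    return None

# Hypothetical results exported from a load-testing tool
results = [(100, 50), (200, 52), (400, 58), (800, 140), (1600, 900)]
print(find_knee(results))  # 800
```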
9. Monitoring for Right-Sizing
The Dashboard Template (Grafana/Datadog)
- CPU Heatmap: Identify idle periods (e.g., weekends).
- RAM Saturation: Identify "Memory Bloat".
- Disk Queue Depth: Identify IOPS bottlenecks.
- Network In/Out: Identify efficient vs inefficient regions.
Automated Right-Sizing Tools
- AWS Compute Optimizer: Provides JSON recommendations for instance types.
- VPA (Vertical Pod Autoscaler): Automatically adjusts K8s requests/limits.
- Goldilocks: A K8s dashboard that visualizes VPA recommendations.
10. Capacity Planning Template
| Component | Metric | Current Load | Growth (6mo) | Buffer | Target Spec |
|---|---|---|---|---|---|
| Web Tier | Peak Req/sec | 500 | 2x (1000) | 20% | 4x c6g.large |
| Database | Storage | 500GB | +100GB/mo | 30% | 1.5TB GP3 |
| Cache | Working Set | 8GB | 12GB | 10% | 16GB Node |
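The rows of the template follow the same arithmetic: project the load over the planning horizon, then add the buffer. The sketch below reproduces the table's numbers; the function name is hypothetical, and choosing the next available size above the result is left to the reader.

```python
def target_capacity(projected_load, buffer):
    """Projected load plus a fractional safety buffer."""
    return projected_load * (1 + buffer)

web_rps = target_capacity(1000, 0.20)         # 2x growth on 500 req/s
db_gb = target_capacity(500 + 100 * 6, 0.30)  # 500GB plus 100GB/mo for 6 months
cache_gb = target_capacity(12, 0.10)          # 12GB projected working set

print(round(web_rps), round(db_gb), round(cache_gb, 1))  # 1200 1430 13.2
```

The database result of 1,430GB rounds up to the 1.5TB volume in the table, and 13.2GB of cache rounds up to the 16GB node.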
11. Real Sizing Scenario: SaaS API
- Initial Setup: 10 nodes of m5.xlarge (4 vCPU, 16GB RAM). Monthly cost: $1,400.
- Observation: CPU average 12%, RAM average 40%.
- Analysis: The app is memory-bound, but CPU is idle.
- Action: Switched to 5 nodes of t3.large ($350/mo) + enabled Auto-scaling.
- Result: 75% cost reduction while maintaining the same performance metrics.
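The scenario's arithmetic can be checked with a short sketch. It assumes CPU demand stays constant after the move, takes 2 vCPU per t3.large from the published instance specification, and ignores burst-credit behavior. Function names are hypothetical.

```python
def cost_reduction(before, after):
    """Fractional saving between two monthly costs."""
    return (before - after) / before

def busy_vcpus(nodes, vcpus_per_node, avg_util):
    """Average number of vCPUs actually in use across the fleet."""
    return nodes * vcpus_per_node * avg_util

demand = busy_vcpus(10, 4, 0.12)   # about 4.8 vCPUs busy on the original fleet
new_util = demand / (5 * 2)        # spread over 5 nodes with 2 vCPU each

print(cost_reduction(1400, 350), round(new_util, 2))  # 0.75 0.48
```

The projected 48% average CPU falls inside the 40-70% target band from section 1. Memory should be verified separately against the smaller nodes before committing to the change.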
Related Skills
- 40-system-resilience/graceful-degradation
- 42-cost-engineering/cloud-cost-models
- 42-cost-engineering/budget-guardrails