Infrastructure Sizing and Capacity Planning

Overview

Infrastructure sizing is the process of determining the exact amount of CPU, Memory, Storage, and Network capacity required for a workload. Effective sizing avoids both Over-provisioning (wasted money) and Under-provisioning (poor performance/outages).

Core Principle: "Sizing is not a one-time event; it is a continuous feedback loop based on real utilization metrics."

1. Right-Sizing Principles

Traditional sizing used the "Peak + Buffer" model, leading to massive waste. Modern sizing uses Demand-Driven Allocation.

Principle	Description
Utilization Thresholds	Target 40-70% CPU utilization. Below 40% is over-provisioned; above 80% is risky.
Vertical first...	Increase resource limits for single-threaded or monolithic apps.
...Horizontal usually	Spread load across multiple small instances for resilience and elasticity.
Metric-Based	Use P95 or P99 metrics for latency, but Average for base capacity sizing.

2. Compute Sizing (EC2, VMs, GCE)

Step 1: Resource Profiling

Run your app in a staging environment and measure:

•CPU: Is the app CPU-bound (mathematical calculations, compression)?
•Memory: Is it memory-bound (caching, large payloads, in-memory DBs)?
•Thread Usage: How many concurrent requests can one CPU core handle?

Step 2: Instance Family Selection

Family	Best For	AWS Example	GCP Example
General Purpose	Balanced workloads, small DBs	`t3`, `m6g`	`n2`, `e2`
Compute Optimized	Batch processing, high-traffic APIs	`c6g`, `c7i`	`c2`, `c3`
Memory Optimized	Redis, high-RAM DBs, Analytics	`r6g`, `x2`	`m1`, `m2`

Sizing Formula (Basic)

Target Instances = (Total Peak Concurrent Requests * Avg Service Time per Req) / (Target Utilization per Core * Core Count)

3. Database Sizing (RDS, Cloud SQL, Azure SQL)

IOPS (Input/Output Operations Per Second)

Disk performance is often the bottleneck, not CPU.

•GP3 (AWS): Baseline 3,000 IOPS included. Provision more for heavy writes.
•Provisioned IOPS (io2): For high-performance transactional DBs.

Storage Growth Calculation

Required Storage = (Initial Data Size) + (Daily Ingest * Retention Period) * (1 + Overhead Buffer)

•Buffer: Always keep 20% free to allow for indexing and temp file creation.

Connection Pool Sizing

Max Connections = (Instance RAM / 10MB) - (System Reserve)

•Too many connections lead to high "Context Switching" and performance degradation.

4. Cache Sizing (Redis/Memcached)

Caching is a trade-off between Memory Cost and Latency Benefits.

Formula: Working Set Size

Not all data needs to be in cache. Only store the Working Set (frequently accessed data).

•Measure Total Data Size.
•Analyze Access Distribution (Pareto Principle: 80% access to 20% data).
•Cache Size = 20-30% of Total Data Size.

Eviction Policy Impact

•allkeys-lru: Best for general caching.
•noeviction: Returns errors when full (dangerous).

5. Container Sizing (Kubernetes)

Understanding the difference between Requests and Limits is critical for both stability and cost.

Metric	Purpose	Cost Impact
Requests	Kubernetes guarantees this capacity. Used for scheduling.	High: Cloud Providers charge based on requests.
Limits	The maximum a pod can burst to.	Low: Generally doesn't impact cost unless using serverless K8s.

The "OOMKill" Trap

If Memory Requests < Actual Usage, the pod might be scheduled on a node that runs out of RAM, leading to an OOMKill (Out Of Memory).

6. Serverless Sizing (Lambda / Cloud Functions)

Serverless "scaling" is handled by the provider, but "sizing" (Memory allocation) is handled by you.

•Power Tuning: In AWS Lambda, increasing Memory also increases CPU proportionaly.
•Strategy: Use AWS Lambda Power Tuning to find the "Sweet Spot" where performance and cost intersect.

Memory (MB)	Duration (ms)	Cost ($)	Result
128	1000	0.0000021	Slow
512	200	0.0000016	Winner (Faster & Cheaper)
1024	150	0.0000025	Diminishing returns

7. Network and CDN Sizing

•Throughput: Measure P99 payload size * Peak requests per second.
•
CDN Coverage: What % of your traffic can be served from the edge?
- •Goal: > 80% Cache Hit Ratio for static assets.
- •Impact: CDN bandwidth is 50-70% cheaper than origin egress.

8. Load Testing for Capacity Planning

Never size based on assumptions. Use tools like k6, Locust, or JMeter.

•Stepping Test: Gradually increase users until latency spikes (The "Knee" of the curve).
•Soak Test: Run at 80% load for 24 hours to find memory leaks.
•Stress Test: Find the "Breaking Point" to configure failover/auto-scaling.

9. Monitoring for Right-Sizing

The Dashboard Template (Grafana/Datadog)

•CPU Heatmap: Identify idle periods (e.g., weekends).
•RAM Saturation: Identify "Memory Bloat".
•Disk Queue Depth: Identify IOPS bottlenecks.
•Network In/Out: Identify efficient vs inefficient regions.

Automated Right-Sizing Tools

•AWS Compute Optimizer: Provides JSON recommendations for instance types.
•VPA (Vertical Pod Autoscaler): Automatically adjusts K8s requests/limits.
•Goldilocks: A K8s dashboard that visualizes VPA recommendations.

10. Capacity Planning Template

Component	Metric	Current Load	Growth (6mo)	Buffer	Target Spec
Web Tier	Peak Req/sec	500	2x (1000)	20%	4x c6g.large
Database	Storage	500GB	+100GB/mo	30%	1.5TB GP3
Cache	Working Set	8GB	12GB	10%	16GB Node

11. Real Sizing Scenario: SaaS API

•Initial Setup: 10 nodes of m5.xlarge (4 vCPU, 16GB RAM). Monthly cost: $1,400.
•Observation: CPU average 12%, RAM average 40%.
•Analysis: The app is memory-bound, but CPU is idle.
•Action: Switched to 5 nodes of t3.large ($350/mo) + enabled Auto-scaling.
•Result: 75% cost reduction while maintaining the same performance metrics.

Related Skills

•40-system-resilience/graceful-degradation
•42-cost-engineering/cloud-cost-models
•42-cost-engineering/budget-guardrails