Scalability Advisor

Provides systematic guidance for scaling systems at different growth stages, identifying bottlenecks, and designing for horizontal scalability.

When to Use

•Planning for 10x, 100x, or 1000x growth
•Diagnosing current performance bottlenecks
•Designing new systems for scale
•Evaluating scaling strategies (vertical vs. horizontal)
•Capacity planning and infrastructure sizing

Scaling Stages Framework

Stage Overview

code

┌─────────────────────────────────────────────────────────────────────┐
│                    SCALING JOURNEY                                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Stage 1        Stage 2         Stage 3         Stage 4             │
│  Startup        Growth          Scale           Enterprise          │
│  0-10K users    10K-100K        100K-1M         1M+ users           │
│                                                                     │
│  Single         Add caching,    Horizontal      Global,             │
│  server         read replicas   scaling         multi-region        │
│                                                                     │
│  $100/mo        $1K/mo          $10K/mo         $100K+/mo           │
└─────────────────────────────────────────────────────────────────────┘

Stage 1: Startup (0-10K Users)

Architecture

code

┌────────────────────────────────────────┐
│           Single Server                │
│  ┌──────────────────────────────────┐  │
│  │  App Server (Node/Python/etc)    │  │
│  │  + Database (PostgreSQL)         │  │
│  │  + File Storage (local/S3)       │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘

Key Metrics

Metric	Target	Warning
Response time (P95)	< 500ms	> 1s
Database queries/request	< 10	> 20
Server CPU	< 70%	> 85%
Database connections	< 50% pool	> 80% pool

What to Focus On

DO:

•Write clean, maintainable code
•Use database indexes on frequently queried columns
•Implement basic monitoring (uptime, errors)
•Keep architecture simple (monolith is fine)

DON'T:

•Over-engineer for scale you don't have
•Add caching before you need it
•Split into microservices prematurely
•Worry about multi-region yet

When to Move to Stage 2

•Database CPU consistently > 70%
•Response times degrading
•Single queries taking > 100ms
•Server resources maxed

Stage 2: Growth (10K-100K Users)

Architecture

code

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│    ┌─────────┐      ┌─────────────────────────────────┐     │
│    │   CDN   │      │      Load Balancer              │     │
│    └────┬────┘      └──────────────┬──────────────────┘     │
│         │                          │                        │
│         │           ┌──────────────┼──────────────┐         │
│         │           │              │              │         │
│         ▼           ▼              ▼              ▼         │
│    ┌─────────┐ ┌─────────┐   ┌─────────┐   ┌─────────┐      │
│    │ Static  │ │ App 1   │   │ App 2   │   │ App 3   │      │
│    │ Assets  │ └────┬────┘   └────┬────┘   └────┬────┘      │
│    └─────────┘      │             │             │           │
│                     └──────────────┼────────────┘           │
│                                    │                        │
│                     ┌──────────────┼──────────────┐         │
│                     │              │              │         │
│                     ▼              ▼              ▼         │
│               ┌─────────┐   ┌─────────┐   ┌─────────┐       │
│               │ Primary │   │  Read   │   │  Redis  │       │
│               │   DB    │───│ Replica │   │  Cache  │       │
│               └─────────┘   └─────────┘   └─────────┘       │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Key Additions

Component	Purpose	When to Add
CDN	Static asset caching	Images, JS, CSS taking > 20% bandwidth
Load Balancer	Distribute traffic	Single server CPU > 70%
Read Replicas	Offload reads	> 80% database ops are reads
Redis Cache	Application caching	Same queries repeated frequently
Job Queue	Async processing	Background tasks blocking requests

Caching Strategy

code

Request Flow with Caching:

1. Check CDN (static assets)         ─► HIT: Return cached
                                           │
2. Check Application Cache (Redis)   ─► HIT: Return cached
                                           │
3. Check Database                    ─► Return + Cache result

What to Cache:

•Session data (TTL: session duration)
•User profile data (TTL: 5-15 minutes)
•API responses (TTL: varies by freshness needs)
•Database query results (TTL: 1-5 minutes)
•Computed values (TTL: based on computation cost)

Database Optimization

sql

-- Find slow queries
SELECT query, calls, mean_time, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;

-- Find missing indexes
SELECT schemaname, tablename, indexrelname, idx_scan, seq_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND seq_scan > 1000;

When to Move to Stage 3

•Write traffic overwhelming single primary
•Cache hit rate plateauing despite optimization
•Read replicas can't keep up with replication lag
•Need independent scaling of components

Stage 3: Scale (100K-1M Users)

Architecture

code

┌──────────────────────────────────────────────────────────────────────┐
│                           CDN / Edge                                 │
└──────────────────────────────────────────────────────────────────────┘
                                    │
┌──────────────────────────────────────────────────────────────────────┐
│                        API Gateway                                   │
│              (Rate limiting, Auth, Routing)                          │
└──────────────────────────────────────────────────────────────────────┘
                                    │
        ┌───────────────────────────┼───────────────────────────┐
        │                           │                           │
        ▼                           ▼                           ▼
┌───────────────┐          ┌───────────────┐          ┌───────────────┐
│   Service A   │          │   Service B   │          │   Service C   │
│   (Users)     │          │   (Orders)    │          │   (Search)    │
│   Auto-scale  │          │   Auto-scale  │          │   Auto-scale  │
└───────┬───────┘          └───────┬───────┘          └───────┬───────┘
        │                          │                          │
        ▼                          ▼                          ▼
┌───────────────┐          ┌───────────────┐          ┌───────────────┐
│   User DB     │          │   Order DB    │          │ Elasticsearch │
│   (Sharded)   │          │   (Sharded)   │          │   (Cluster)   │
└───────────────┘          └───────────────┘          └───────────────┘
                                    │
                                    ▼
                    ┌───────────────────────────┐
                    │     Message Queue         │
                    │     (Kafka / SQS)         │
                    └───────────────────────────┘

Key Patterns

Database Sharding

code

Sharding Strategies:

1. Hash-based (user_id % num_shards)
   PRO: Even distribution
   CON: Hard to add shards

2. Range-based (user_id 1-1M → shard 1)
   PRO: Easy to add shards
   CON: Hotspots possible

3. Directory-based (lookup table)
   PRO: Flexible
   CON: Lookup overhead

Event-Driven Architecture

code

Synchronous → Asynchronous

Before:
  API → Service A → Service B → Service C → Response (slow)

After:
  API → Service A → Queue → Response (fast)
                      ↓
              Service B, C process async

Scaling Checklist

• Stateless application servers (no local state)
• Database read/write separation
• Asynchronous processing for non-critical paths
• Circuit breakers between services
• Distributed tracing implemented
• Auto-scaling configured with proper metrics
• Database connection pooling (PgBouncer, ProxySQL)

When to Move to Stage 4

•Need geographic distribution for latency
•Regulatory requirements (data residency)
•Single region can't handle failover
•Global user base with latency requirements

Stage 4: Enterprise (1M+ Users)

Architecture

code

┌─────────────────────────────────────────────────────────────────────────┐
│                          Global Load Balancer                           │
│                       (GeoDNS, Anycast, Route53)                        │
└─────────────────────────────────────────────────────────────────────────┘
                    │                                │
           ┌────────┴────────┐              ┌───────┴────────┐
           │                 │              │                │
           ▼                 ▼              ▼                ▼
    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
    │  US-East     │  │  US-West     │  │  EU-West     │  │  AP-South    │
    │  Region      │  │  Region      │  │  Region      │  │  Region      │
    │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │
    │  │Services│  │  │  │Services│  │  │  │Services│  │  │  │Services│  │
    │  └────────┘  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │
    │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │
    │  │Database│  │  │  │Database│  │  │  │Database│  │  │  │Database│  │
    │  │(Primary)│ │  │  │(Replica)│ │  │  │(Primary)│ │  │  │(Replica)│ │
    │  └────────┘  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │
    └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘
                              │                   │
                              └─────────┬─────────┘
                                        │
                              Cross-Region Replication

Multi-Region Patterns

Pattern	Consistency	Latency	Complexity
Active-Passive	Strong	High failover	Low
Active-Active	Eventual	Low	High
Follow-the-Sun	Strong per region	Medium	Medium

Data Consistency Strategies

code

CAP Theorem Trade-offs:

Strong Consistency (CP):
- All regions see same data
- Higher latency for writes
- Use for: Financial transactions, inventory

Eventual Consistency (AP):
- Regions may have stale data briefly
- Low latency always
- Use for: Social feeds, analytics, non-critical

Causal Consistency:
- Related operations ordered correctly
- Balance of latency and correctness
- Use for: Messaging, collaboration

Enterprise Checklist

• Multi-region deployment
• Cross-region data replication
• Global CDN with edge caching
• Disaster recovery tested
• Compliance (SOC 2, GDPR, data residency)
• 99.99% SLA architecture
• Zero-downtime deployments
• Chaos engineering practice

Bottleneck Diagnosis Guide

Finding the Bottleneck

code

Systematic Diagnosis:

1. Where is time spent?
   └─► Distributed tracing (Jaeger, Datadog)

2. Is it the database?
   └─► Check slow query logs, connection pool

3. Is it the application?
   └─► CPU profiling, memory analysis

4. Is it the network?
   └─► Latency between services, DNS resolution

5. Is it external services?
   └─► Third-party API latency, rate limits

Common Bottlenecks by Layer

Layer	Symptoms	Solutions
Database	Slow queries, high CPU	Indexing, read replicas, caching
Application	High CPU, memory	Optimize code, scale horizontally
Network	High latency, timeouts	CDN, edge caching, connection pooling
Storage	Slow I/O, high wait	SSD, object storage, caching
External APIs	Timeouts, rate limits	Circuit breakers, caching, fallbacks

Database Bottleneck Checklist

markdown

## Quick Database Health Check

1. Connection Pool
   - Current connections vs max?
   - Connection wait time?
   - Pool exhaustion events?

2. Query Performance
   - Slowest queries (pg_stat_statements)?
   - Missing indexes (seq scans > 10K)?
   - Lock contention?

3. Replication
   - Replica lag?
   - Write throughput?
   - Read distribution?

4. Storage
   - Disk I/O wait?
   - Table/index bloat?
   - WAL write latency?

Scaling Calculations

Capacity Planning Formula

code

Required Capacity = Peak Traffic × Growth Factor × Safety Margin

Example:
- Current peak: 1,000 req/sec
- Expected growth: 3x in 12 months
- Safety margin: 1.5x

Required: 1,000 × 3 × 1.5 = 4,500 req/sec capacity

Database Sizing

code

Connection Pool Size:
  connections = (num_cores × 2) + effective_spindle_count

  Example: 8 cores, SSD
  connections = (8 × 2) + 1 = 17 connections per instance

Read Replica Sizing:
  replicas = ceiling(read_traffic / single_replica_capacity)

  Example: 10,000 reads/sec, 3,000/replica capacity
  replicas = ceiling(10,000 / 3,000) = 4 replicas

Cache Sizing

code

Cache Size:
  memory = working_set_size × (1 + overhead_factor)

  Working set = frequently accessed data (usually 10-20% of total)
  Overhead = ~1.5x for Redis data structures

  Example: 10GB working set
  Redis memory = 10GB × 1.5 = 15GB

Quick Reference

Scaling Decision Matrix

Symptom	First Try	Then Try	Finally
Slow page loads	Add caching	CDN	Edge compute
Database slow	Add indexes	Read replicas	Sharding
API timeouts	Async processing	Circuit breakers	Event-driven
High server CPU	Vertical scale	Horizontal scale	Optimize code
High memory	Increase RAM	Fix memory leaks	Redesign data structures

Infrastructure Cost at Scale

Users	Architecture	Monthly Cost
10K	Single server	$100-300
100K	Load balanced + cache	$1,000-3,000
1M	Microservices + sharding	$10,000-30,000
10M	Multi-region	$100,000+

References

•Bottleneck Diagnosis Guide - Detailed troubleshooting
•Capacity Planning Calculator - Sizing formulas