System Design Patterns
Decision Frameworks
Architecture is decisions. Good architects make them explicitly.
When Monolith vs Services
Start monolith. Split only when you have evidence for ALL three:
- •Independent deployment need: teams ship on different cadences
- •Independent scaling need: one component needs 10x the resources
- •Fault isolation need: one component's failure must not cascade
If you can't name the specific component for all three, stay monolith. "Microservices for flexibility" is a premature abstraction.
When SQL vs NoSQL
| Signal | Choose SQL | Choose NoSQL |
|---|---|---|
| Data has relationships | Yes — joins are free | No — you'll denormalize anyway |
| Schema is evolving rapidly | Migrations are friction | Schema-less helps |
| Need transactions across entities | ACID is a feature | Distributed transactions are pain |
| Read pattern is "get by key" | Overkill | Document/KV stores excel |
| Write volume is extreme | Sharding SQL is hard | Built for horizontal scale |
| Need full-text search | Bolted on (fine for most) | Elasticsearch/dedicated |
Default: Postgres. It handles 90% of workloads. Add specialized stores when Postgres can't.
When Sync vs Async
- •Sync (request/response): when the caller needs the result to continue
- •Async (queue/event): when the work can happen later or the caller doesn't need the result
- •Hybrid: sync for the user-facing response, async for side effects (email, analytics, notifications)
Rule: if the user is waiting, be sync. If the user doesn't care when it happens, be async.
When to Cache
Cache when ALL of these are true:
- •Data is read far more than written (>10:1 ratio)
- •Data can be stale for some period without harm
- •Computing/fetching the data is expensive
- •You have a clear invalidation strategy
No invalidation strategy = no cache. Stale data bugs are worse than slow responses.
Architectural Review Checklist
When reviewing a system design or plan:
1. Boundaries
- •Are service/module boundaries at natural domain seams?
- •Does each component have a single reason to change?
- •Are boundaries enforced (API contracts, not shared databases)?
2. Data Flow
- •Trace a request from user to storage and back. How many hops?
- •Where does data transform? Is validation at the boundary?
- •Are there circular dependencies?
3. Failure Modes
- •What happens when each dependency is down?
- •Is there retry logic? Is it idempotent?
- •What's the blast radius of the most likely failure?
- •Are there circuit breakers where needed?
4. Scale Bottlenecks
- •What's the first thing that breaks at 10x load?
- •Are there N+1 patterns (queries, API calls, file reads)?
- •Is there a single writer bottleneck?
- •Can the hot path be cached or precomputed?
5. Coupling
- •Can you deploy component A without redeploying B?
- •Can you test component A without standing up B?
- •If you change A's internal implementation, does B break?
- •Are shared libraries creating hidden coupling?
6. Security Surface
- •Is auth at the edge or scattered through the codebase?
- •Are secrets in config, not code?
- •Is the principle of least privilege applied (services, DB users, IAM roles)?
Common Architectural Mistakes
Distributed Monolith
Services that must be deployed together, share a database, or fail together. You got the complexity of microservices with none of the benefits. Fix: merge them back or establish real boundaries.
Premature Abstraction
Building a "plugin system" for one plugin. Creating an "event bus" for two events. Writing a "generic data layer" before you have two data sources. Fix: inline it. Abstract when you have three concrete examples.
Shared Database
Two services reading/writing the same tables. Any schema change requires coordinating both teams. Fix: each service owns its data. Expose via API, not shared tables.
Synchronous Chain
A -> B -> C -> D, all synchronous. Latency is the sum. Failure in D fails A. Fix: go async where the user doesn't need an immediate result. Add timeouts and fallbacks.
God Service
One service that "orchestrates everything." It knows about every other service. It's the bottleneck for every change. Fix: distribute decision-making. Each service handles its own domain logic.
Scalability Analysis
When asked "will this scale?", work through:
- •Identify the hot path: what gets called most? (usually <5% of code handles >95% of traffic)
- •Measure, don't guess: profile before optimizing
- •Scale reads and writes separately: reads are usually cacheable, writes usually aren't
- •Look for state: stateless components scale horizontally. Stateful ones need coordination.
- •Check the database: it's almost always the bottleneck. Indexes, query plans, connection pools.