Jeff Dean Style Guide
Overview
Jeff Dean is the architect behind much of Google's infrastructure: MapReduce, BigTable, Spanner, TensorFlow, and more. He exemplifies the rare combination of deep systems knowledge, performance intuition, and practical engineering judgment. His work defines how modern internet-scale systems are built.
Core Philosophy
"Design for 10x the current load, but plan to rewrite before 100x."
"Simple solutions often require the most sophisticated understanding of the problem."
"If a problem isn't interesting at scale, it probably isn't interesting at all."
Design Principles
- •Embrace Failure: At scale, everything fails. Design systems that degrade gracefully, not catastrophically.
- •Numbers Matter: Know your latencies, throughputs, and failure rates by heart. Performance intuition comes from data.
- •Codesign Hardware and Software: The best performance comes from understanding the entire stack, from disk to datacenter.
- •Simplicity at Scale: Complex systems break in complex ways. The simplest solution that scales is usually the best.
- •Measure, Then Optimize: Never optimize without profiling. Intuition fails; data doesn't.
Numbers Every Engineer Should Know
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns
Compress 1K bytes with Zippy              3,000 ns
Send 1K bytes over 1 Gbps network        10,000 ns
Read 4K randomly from SSD               150,000 ns
Read 1 MB sequentially from memory      250,000 ns
Round trip within same datacenter       500,000 ns
Read 1 MB sequentially from SSD       1,000,000 ns
Disk seek                            10,000,000 ns
Read 1 MB sequentially from disk     20,000,000 ns
Send packet CA→Netherlands→CA       150,000,000 ns
These numbers are approximate and vary with hardware, but their relative orders of magnitude should guide every design decision.
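As a rough illustration of how these figures feed an estimate, the sketch below compares reading 1 GB sequentially from memory, SSD, and disk using only the numbers listed above. The names `NS_PER_MB_SEQUENTIAL` and `sequential_read_seconds` are hypothetical, and the result is an order-of-magnitude estimate, not a benchmark.

```python
# Latency figures copied from the list above, in nanoseconds per 1 MB read.
NS_PER_MB_SEQUENTIAL = {
    "memory": 250_000,
    "ssd": 1_000_000,
    "disk": 20_000_000,
}
DISK_SEEK_NS = 10_000_000


def sequential_read_seconds(medium: str, megabytes: int) -> float:
    """Estimate seconds to read `megabytes` sequentially from `medium`."""
    total_ns = NS_PER_MB_SEQUENTIAL[medium] * megabytes
    if medium == "disk":
        total_ns += DISK_SEEK_NS  # assume one initial seek before streaming
    return total_ns / 1e9


if __name__ == "__main__":
    for medium in NS_PER_MB_SEQUENTIAL:
        seconds = sequential_read_seconds(medium, 1024)
        print(f"{medium}: {seconds:.2f} s per GB")
```

With these inputs the estimate comes out to roughly 0.26 s from memory, 1.02 s from SSD, and 20.5 s from disk, which is usually enough to rule a design in or out before building anything.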
When Designing Systems
Always
- •Start with back-of-envelope calculations before designing
- •Design for partial failure—some machines will always be down
- •Use replication for availability, sharding for scale
- •Batch operations when possible—amortize fixed costs
- •Compress data on the wire and at rest (CPU is cheaper than I/O)
- •Add monitoring and observability from day one
- •Design for debugging—you'll need to diagnose production issues
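The batching rule can be checked with a quick cost model. This is a minimal sketch that assumes requests are issued one after another with no pipelining, and it reuses two figures from the latency list earlier in this guide; `total_time_ms` is a hypothetical helper name.

```python
import math

ROUND_TRIP_NS = 500_000  # round trip within same datacenter
SEND_1KB_NS = 10_000     # send 1K bytes over 1 Gbps network


def total_time_ms(num_items: int, batch_size: int) -> float:
    """Estimate time to send `num_items` 1 KB records, `batch_size` per request."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    num_requests = math.ceil(num_items / batch_size)
    # Fixed cost is paid once per request; transfer cost is paid once per item.
    total_ns = num_requests * ROUND_TRIP_NS + num_items * SEND_1KB_NS
    return total_ns / 1e6


if __name__ == "__main__":
    for size in (1, 10, 100):
        print(f"batch={size}: {total_time_ms(10_000, size):.0f} ms")
```

Under these assumptions, sending 10,000 records one at a time takes about 5,100 ms, while batches of 100 take about 150 ms, because the round trip is paid 100 times instead of 10,000.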
Never
- •Assume the network is reliable (it's not)
- •Assume latency is zero (it's not)
- •Assume bandwidth is infinite (it's not)
- •Optimize before measuring
- •Design only for current load (plan for 10x)
- •Ignore tail latency (p99 matters more than average)
- •Build systems you can't reason about under failure
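One way to see why tail latency matters is to look at fan-out. The sketch below assumes each backend call is independently slow with a fixed probability, which is a simplification (real slowness is often correlated), and `prob_any_slow` is a hypothetical helper name.

```python
def prob_any_slow(fanout: int, slow_fraction: float = 0.01) -> float:
    """Probability that at least one of `fanout` parallel calls is slow.

    Assumes each call is independently slow with probability
    `slow_fraction`; 0.01 means "slower than that backend's own p99".
    """
    return 1.0 - (1.0 - slow_fraction) ** fanout


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"fanout={n}: {prob_any_slow(n):.1%} of requests hit a slow call")
```

Under the independence assumption, a request that fans out to 100 backends waits on at least one p99-slow call about 63% of the time, so the backend's tail becomes the common case for the user.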
Prefer
- •Idempotent operations over exactly-once semantics
- •Eventual consistency over strong consistency (when possible)
- •Denormalization over joins at scale
- •Structured data over unstructured (schemas help)
- •Batch processing over real-time when latency allows
- •Simple retry logic over complex distributed transactions
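The first preference, idempotent operations, can be sketched with client-supplied request IDs. This is a toy in-memory example with hypothetical names (`IdempotentLedger`, `deposit`); a real service would need to persist the applied IDs durably and expire them eventually.

```python
class IdempotentLedger:
    """Toy in-memory account that applies each request ID at most once."""

    def __init__(self) -> None:
        self.balance = 0
        # Maps request_id to the balance returned when it was first applied.
        self._applied: dict[str, int] = {}

    def deposit(self, request_id: str, amount: int) -> int:
        """Apply a deposit once; replays of the same request_id change nothing."""
        if request_id in self._applied:
            return self._applied[request_id]  # duplicate: return prior result
        self.balance += amount
        self._applied[request_id] = self.balance
        return self.balance


if __name__ == "__main__":
    ledger = IdempotentLedger()
    ledger.deposit("req-1", 50)
    ledger.deposit("req-1", 50)  # retried after a timeout; applied only once
    print(ledger.balance)
```

Because a replayed request is harmless, the client can use simple retries after a timeout instead of needing an exactly-once delivery guarantee.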
Architectural Patterns
MapReduce Mental Model
Problem: Process petabytes of data
Solution:
1. Map: Transform input into (key, value) pairs in parallel
2. Shuffle: Group all values by key
3. Reduce: Aggregate values for each key
Why it works:
- Embarrassingly parallel map phase
- Fault tolerance via re-execution
- Simple programming model hides distribution
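The three phases can be illustrated with a single-process word count. This is only a sketch of the programming model: it runs sequentially and omits the distribution and fault tolerance that a real MapReduce framework provides. All function names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_reduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, list[int]], int],
) -> dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process."""
    # Map: each input record yields zero or more (key, value) pairs.
    pairs = [pair for record in inputs for pair in mapper(record)]

    # Shuffle: group all values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce: aggregate the values for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key: str, values: list[int]) -> int:
    """Sum the counts collected for one word."""
    return sum(values)


if __name__ == "__main__":
    lines = ["the quick fox", "the lazy dog the end"]
    print(map_reduce(lines, word_count_mapper, sum_reducer))
```

The user writes only the mapper and reducer; in a distributed setting the framework would own the shuffle, scheduling, and re-execution of failed tasks.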
BigTable Design
Problem: Structured storage at massive scale
Solution:
- Sparse, distributed, multi-dimensional sorted map
- (row, column, timestamp) → value
- Rows sorted lexicographically (enables range scans)
- Column families for locality
- Tablets (row ranges) as unit of distribution
Key insight: One data model, flexible enough for many use cases.
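The data model above can be approximated in a few lines. This is a toy, single-process sketch with hypothetical names (`ToySortedTable`, `put`, `get`, `scan_rows`); it shows the sorted (row, column, timestamp) → value map and a row range scan, and it leaves out column families, tablets, and persistence.

```python
import bisect
from typing import Iterator, Optional


class ToySortedTable:
    """Single-process sketch of a (row, column, timestamp) -> value map."""

    def __init__(self) -> None:
        # Keys are (row, column, -timestamp) so newer versions sort first.
        self._keys: list[tuple[str, str, int]] = []
        self._values: dict[tuple[str, str, int], bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if absent."""
        i = bisect.bisect_left(self._keys, (row, column))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, start_row: str, end_row: str) -> Iterator[str]:
        """Yield distinct row keys in [start_row, end_row), in sorted order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        last = None
        while i < len(self._keys) and self._keys[i][0] < end_row:
            if self._keys[i][0] != last:
                last = self._keys[i][0]
                yield last
            i += 1
```

Because rows are kept in lexicographic order, choosing row keys with a shared prefix (for example reversed domain names such as `com.example`) places related rows next to each other, which is what makes range scans cheap.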
Spanner's TrueTime
Problem: Global consistency requires synchronized clocks
Solution:
- GPS + atomic clocks in every datacenter
- API returns interval [earliest, latest] not a point
- Wait out uncertainty before committing
TrueTime.now() returns TTinterval: [earliest, latest]
Commit rule: Wait until TrueTime.now().earliest > commit_timestamp
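The commit rule can be sketched against a simulated clock. This is not the real TrueTime API: `SimulatedTrueTime` is a hypothetical stand-in that wraps the local monotonic clock in a fixed, assumed uncertainty bound, purely to show the shape of the wait.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Stand-in for TrueTime: local clock with a fixed uncertainty bound."""

    def __init__(self, epsilon_s: float = 0.005) -> None:
        self.epsilon_s = epsilon_s  # assumed uncertainty, chosen for illustration

    def now(self) -> TTInterval:
        t = time.monotonic()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)


def commit_with_wait(tt: SimulatedTrueTime) -> float:
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    commit_timestamp = tt.now().latest
    # Commit wait: block until every clock would agree the timestamp has passed.
    while tt.now().earliest <= commit_timestamp:
        time.sleep(0.001)
    return commit_timestamp
```

In this sketch the wait lasts roughly twice the uncertainty bound, which is why keeping the clock uncertainty small matters: it sits directly on the commit path.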
Code Patterns
Back-of-Envelope Capacity Planning
import math

def estimate_storage_needs(
    daily_active_users: int,
    actions_per_user_per_day: int,
    bytes_per_action: int,
    retention_days: int,
    replication_factor: int = 3,
) -> dict:
    """Jeff Dean-style capacity estimation (binary units: 1 TB = 1024**4 bytes)."""
    daily_bytes = daily_active_users * actions_per_user_per_day * bytes_per_action
    # Total footprint covers every replica over the full retention window.
    total_bytes = daily_bytes * retention_days * replication_factor
    return {
        "daily_raw_gb": daily_bytes / (1024**3),
        "total_storage_tb": total_bytes / (1024**4),
        "monthly_bandwidth_tb": (daily_bytes * 30) / (1024**4),
        # Round up: a fraction of a machine still requires a whole machine.
        "estimated_machines_1tb_each": math.ceil(total_bytes / (1024**4)),
    }

# Example: 100M DAU, 10 actions/day, 1 KB (1024 bytes) each, 90-day retention, 3x replication
# = about 251 TB of storage, about 252 machines at 1 TB each
Sharding Strategy
import bisect
import hashlib

class ConsistentHashRing:
    """Distribute data across nodes with minimal reshuffling."""

    def __init__(self, nodes: list[str], virtual_nodes: int = 150):
        self.ring: dict[int, str] = {}
        for node in nodes:
            # Each physical node owns many points on the ring to even out load.
            for i in range(virtual_nodes):
                self.ring[self._hash(f"{node}:{i}")] = node
        # Sort once after all nodes are placed, not once per node.
        self.sorted_keys: list[int] = sorted(self.ring)

    def get_node(self, key: str) -> str:
        """Find the node responsible for this key."""
        if not self.ring:
            raise ValueError("Empty ring")
        # Binary search for the first ring position at or after the key's hash,
        # wrapping around to the start of the ring if there is none.
        idx = bisect.bisect_left(self.sorted_keys, self._hash(key))
        return self.ring[self.sorted_keys[idx % len(self.sorted_keys)]]

    def _hash(self, key: str) -> int:
        # MD5 is used only for its uniform spread, not for security.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
Retry with Exponential Backoff
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay_ms: int = 100,
    max_delay_ms: int = 10000,
) -> T:
    """
    Retry with exponential backoff and jitter.

    At Google scale, thundering herds kill systems.
    Jitter prevents synchronized retries.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Double the delay on each attempt, capped at max_delay_ms.
            delay = min(base_delay_ms * (2 ** attempt), max_delay_ms)
            # Add up to 10% random jitter so clients do not retry in lockstep.
            jitter = random.uniform(0, delay * 0.1)
            time.sleep((delay + jitter) / 1000)
    raise RuntimeError("Unreachable")
Mental Model
Jeff Dean approaches problems with:
- •Quantify first: How much data? How many QPS? What latency budget?
- •Identify bottlenecks: Where will the system break first?
- •Design for failure: What happens when (not if) components fail?
- •Simplify ruthlessly: Can this be simpler while still meeting requirements?
- •Plan for evolution: Today's solution should be replaceable in 3 years
The Google Design Doc
1. Context & Scope
   - What problem are we solving? Why now?
2. Goals and Non-Goals
   - What this system WILL do
   - What this system explicitly WON'T do
3. Design
   - System architecture
   - Data model
   - API
4. Alternatives Considered
   - What else could we do? Why not?
5. Cross-cutting Concerns
   - Security, privacy, monitoring, rollout
6. Open Questions
   - What don't we know yet?
Warning Signs
You're violating Dean's principles if:
- •You don't know your system's p50, p99, and p999 latencies
- •You haven't done back-of-envelope capacity planning
- •Your system has no strategy for partial failure
- •You're optimizing without profiling data
- •You designed for current load, not 10x growth
- •You can't explain where every millisecond goes
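For the first warning sign, a minimal sketch of computing p50, p99, and p999 from raw latency samples is shown here. It uses the nearest-rank definition of a percentile, which is one of several common definitions, and the function names are hypothetical. Production systems usually derive these from histograms rather than sorting every sample.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; the small epsilon guards against float rounding just above
    # an integer (for example 999.0000000001 rounding up to 1000).
    rank = max(1, math.ceil(pct * len(ordered) / 100 - 1e-9))
    return ordered[rank - 1]


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize latency samples at the three percentiles named above."""
    return {
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
        "p999": percentile(samples_ms, 99.9),
    }
```

Note that p999 is only meaningful with at least a thousand samples; with fewer, nearest-rank simply returns the maximum.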
Additional Resources
- •For detailed philosophy, see philosophy.md
- •For references (papers, talks), see references.md