Event Sourcing & CQRS Patterns
Event Store Technology Selection
| Technology | Best For | Avoid If |
|---|---|---|
| EventStoreDB | Pure event sourcing, projections built-in | Need multi-purpose DB |
| PostgreSQL | Existing Postgres stack, SQL expertise | Extreme write throughput (>10K/s) |
| Kafka | High-throughput streaming, event bus | Per-aggregate queries critical |
| DynamoDB | Serverless, AWS-native, auto-scaling | Complex cross-stream queries |
Event Store Schema Design (Postgres)
CREATE TABLE events (
stream_id VARCHAR(255) NOT NULL, -- Pattern: "{Type}-{UUID}"
stream_type VARCHAR(255) NOT NULL,
event_type VARCHAR(255) NOT NULL,
event_data JSONB NOT NULL,
version BIGINT NOT NULL,
global_position BIGSERIAL, -- Critical for projections
created_at TIMESTAMPTZ DEFAULT NOW(),
CONSTRAINT unique_stream_version UNIQUE (stream_id, version)
);
CREATE INDEX idx_events_stream ON events(stream_id, version);
CREATE INDEX idx_events_global ON events(global_position); -- Projection catchup
CREATE INDEX idx_events_type ON events(event_type); -- Type-based subscriptions
Key Decisions:
- •
stream_idformat:Order-{uuid}>{uuid}alone (enables type-based queries) - •
global_positionserial vs timestamp: serial prevents race conditions - •
versionper-stream vs global: per-stream enables optimistic concurrency
Event Store Guardrails
- •Immutability: Never UPDATE or DELETE events -- add compensating events instead
- •Optimistic concurrency: Always check
expected_versionon append to prevent lost updates - •Event size: Keep <10KB; reference large payloads via URL/S3 key
- •Idempotency: Use
event_idfor deduplication; append must be idempotent - •Correlation/Causation IDs: Required for tracing --
metadata.correlation_id
CQRS: Consistency Models
| Model | When | Implementation |
|---|---|---|
| Eventual | Default -- acceptable lag | Async projections, no write-time coupling |
| Read-your-writes | User expects immediate visibility | Poll projection until version ≥ write version (5s timeout) |
| Inline projection | Strong consistency required | Update read model in same transaction as event append |
Gotcha: Inline projections couple write/read stores -- breaks scaling independence. Only for single-DB deployments.
Projection Design Patterns
Idempotency
Projections must be idempotent (events replay during catchup/rebuild). Techniques:
- •Upsert with full state (not incremental updates)
- •Track
last_processed_versionper entity - •Use
ON CONFLICT DO UPDATE(Postgres) or conditional expressions (DynamoDB)
Checkpointing
Store last_processed_global_position per projection:
- •Enables resume after restart
- •Supports independent projection versioning
- •Checkpoint frequency: every event is overkill; batch every 100-1000 events
Rebuild Strategy
Projections must support full rebuild:
- •Create new projection table (v2)
- •Replay all events into v2
- •Atomic swap: rename v1→old, v2→current
- •Drop old after validation
Never rebuild in-place -- risks data loss on failure.
Projection Types
| Type | Use Case | Tradeoff |
|---|---|---|
| Summary view | Order totals, counts | Must handle out-of-order events |
| Search index | Elasticsearch, Algolia | External dependency, harder rebuild |
| Aggregates | Daily sales rollups | Time-based bucketing complexity |
| Denormalized join | Customer + Orders in one doc | Higher storage, faster queries |
Saga & Workflow Orchestration
Choreography vs Orchestration
| Factor | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose (services react to events) | Tighter (orchestrator knows steps) |
| Visibility | Hard to trace end-to-end | Easy (orchestrator holds state) |
| Complexity ceiling | Breaks down at 4+ steps | Scales to 10+ steps |
| Best for | 2-3 steps, decoupled teams | Order-dependent steps, complex compensation |
Default to orchestration unless <4 steps with simple compensation.
Saga vs Workflow Engine
| Feature | Plain Saga | Workflow Engine (Temporal) |
|---|---|---|
| Retries | Manual | Built-in with backoff |
| State persistence | Manual saga store | Automatic |
| Determinism | Not enforced | Enforced (replay-safe) |
| Versioning | Manual migration | workflow.get_version() |
| Pick when | Simple compensating txns | Long-running, stateful, multi-service |
Compensation Design (Critical)
- •LIFO order: Compensate in reverse execution order (stack, not queue)
- •Idempotency: Compensations retry; design for multiple execution
- •Always succeed: Compensations cannot fail -- if they can, add retry/alert
- •Register before execution: Add compensation to stack before each step
- •Partial compensation: Track completed steps; only compensate those
Gotcha: A saga that completes compensation is "failed successfully" -- distinguish from unrecoverable failures in monitoring.
Workflow Engine Constraints (Temporal-style)
Workflow Code (Deterministic):
- •Prohibited:
datetime.now(),random(), threading, I/O, network - •Use instead:
workflow.now(),workflow.random(), activities for side effects
Activity Code (Non-Deterministic):
- •Must be idempotent
- •Must have timeout (activities can hang)
- •Classify errors: retryable (network) vs non-retryable (validation)
- •Use heartbeats for long-running (>30s) activities
- •2MB payload limit per argument
Gotcha: Workflow code runs repeatedly during replay -- any non-determinism causes divergence errors.
Operational Guardrails
- •Correlation IDs: Propagate through every step for tracing
- •Timeouts on every step: Never wait indefinitely (5min default)
- •Monitor: workflow duration, step failure rate, compensation trigger rate, stuck count
- •Versioning: Never modify running workflow logic; use version gates or new workflow types
Non-Obvious Gotchas
- •Choreography needs saga ID: Even without orchestrator, need correlation across events
- •Eventual consistency SLAs: Define acceptable lag (500ms? 5s?) and monitor breach rate
- •Event versioning from day one: Add
event.schema_version; upcasting is harder than prevention - •Don't query in command handlers: Commands are for writes; breaks CQRS separation
- •Business logic in workflows, not activities: Activities are I/O adapters; decisions belong in workflow
- •Projection lag snowball: If projection falls behind, writes accelerate lag -- needs backpressure or scaling
Do's and Don'ts
Event Store
- •Do: Use stream IDs with type prefix (
Order-{uuid}) - •Do: Include correlation/causation IDs in metadata
- •Do: Implement optimistic concurrency on append
- •Don't: Update or delete events
- •Don't: Store large payloads (>10KB)
CQRS
- •Do: Denormalize read models for query patterns
- •Do: Validate in command handlers before state change
- •Do: Define consistency SLAs per feature
- •Don't: Query in command handlers
- •Don't: Couple read/write schemas
Projections
- •Do: Make projections idempotent
- •Do: Store checkpoints for resume
- •Do: Support full rebuild
- •Don't: Couple projections (each is independent)
- •Don't: Ignore projection lag monitoring
Sagas
- •Do: Test compensations more than happy path
- •Do: Use orchestration for >3 steps
- •Do: Set timeouts on every step
- •Don't: Skip correlation IDs
- •Don't: Modify running workflow logic in-place