Workflow Orchestration Patterns
Master workflow orchestration architecture with Temporal, covering fundamental design decisions, resilience patterns, and best practices for building reliable distributed systems.
Use this skill when
- •Working on workflow orchestration patterns tasks or workflows
- •Needing guidance, best practices, or checklists for workflow orchestration patterns
Do not use this skill when
- •The task is unrelated to workflow orchestration patterns
- •You need a different domain or tool outside this scope
Instructions
- •Clarify goals, constraints, and required inputs.
- •Apply relevant best practices and validate outcomes.
- •Provide actionable steps and verification.
- •If detailed examples are required, open resources/implementation-playbook.md.
When to Use Workflow Orchestration
Ideal Use Cases (Source: docs.temporal.io)
- •Multi-step processes spanning machines/services/databases
- •Distributed transactions requiring all-or-nothing semantics
- •Long-running workflows (hours to years) with automatic state persistence
- •Failure recovery that must resume from last successful step
- •Business processes: bookings, orders, campaigns, approvals
- •Entity lifecycle management: inventory tracking, account management, cart workflows
- •Infrastructure automation: CI/CD pipelines, provisioning, deployments
- •Human-in-the-loop systems requiring timeouts and escalations
When NOT to Use
- •Simple CRUD operations (use direct API calls)
- •Pure data processing pipelines (use Airflow, batch processing)
- •Stateless request/response (use standard APIs)
- •Real-time streaming (use Kafka, event processors)
Critical Design Decision: Workflows vs Activities
The Fundamental Rule (Source: temporal.io/blog/workflow-engine-principles):
- •Workflows = Orchestration logic and decision-making
- •Activities = External interactions (APIs, databases, network calls)
Workflows (Orchestration)
Characteristics:
- •Contain business logic and coordination
- •MUST be deterministic (same inputs → same outputs)
- •Cannot perform direct external calls
- •State automatically preserved across failures
- •Can run for years despite infrastructure failures
Example workflow tasks:
- •Decide which steps to execute
- •Handle compensation logic
- •Manage timeouts and retries
- •Coordinate child workflows
Activities (External Interactions)
Characteristics:
- •Handle all external system interactions
- •Can be non-deterministic (API calls, DB writes)
- •Include built-in timeouts and retry logic
- •Must be idempotent (calling N times = calling once)
- •Short-lived (seconds to minutes typically)
Example activity tasks:
- •Call payment gateway API
- •Write to database
- •Send emails or notifications
- •Query external services
Design Decision Framework
- •Does it touch external systems? → Activity
- •Is it orchestration/decision logic? → Workflow
Core Workflow Patterns
1. Saga Pattern with Compensation
Purpose: Implement distributed transactions with rollback capability
Pattern (Source: temporal.io/blog/compensating-actions-part-of-a-complete-breakfast-with-sagas):
For each step:
1. Register compensation BEFORE executing
2. Execute the step (via activity)
3. On failure, run all compensations in reverse order (LIFO)
Example: Payment Workflow
- •Reserve inventory (compensation: release inventory)
- •Charge payment (compensation: refund payment)
- •Fulfill order (compensation: cancel fulfillment)
Critical Requirements:
- •Compensations must be idempotent
- •Register compensation BEFORE executing step
- •Run compensations in reverse order
- •Handle partial failures gracefully
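The register-then-execute ordering and the LIFO rollback can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Temporal SDK code: run_saga, the step tuples, and demo_saga are hypothetical names, and the lambdas stand in for activities.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are illustrative stand-ins.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, run registered compensations LIFO."""
    compensations: List[Tuple[str, Callable[[], None]]] = []
    try:
        for name, action, compensate in steps:
            # Register the compensation BEFORE executing the step, so a step
            # that fails after a partial side effect is still undone.
            compensations.append((name, compensate))
            action()
            log.append(f"done:{name}")
        return True
    except Exception as exc:
        log.append(f"failed:{exc}")
        for name, compensate in reversed(compensations):
            try:
                compensate()
                log.append(f"compensated:{name}")
            except Exception:
                # Keep going so one bad compensation does not block the rest.
                log.append(f"compensation_failed:{name}")
        return False


def _fail(message: str) -> None:
    raise RuntimeError(message)


def demo_saga() -> List[str]:
    log: List[str] = []
    steps: List[Step] = [
        ("reserve_inventory", lambda: None, lambda: None),
        ("charge_payment", lambda: None, lambda: None),
        ("fulfill_order", lambda: _fail("carrier unavailable"), lambda: None),
    ]
    run_saga(steps, log)
    return log
```

Because the compensation is registered first, the failed step's own compensation also runs. That is why compensations need to be idempotent and safe to call when the step never took effect.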
2. Entity Workflows (Actor Model)
Purpose: Long-lived workflow representing single entity instance
Pattern (Source: docs.temporal.io/evaluate/use-cases-design-patterns):
- •One workflow execution = one entity (cart, account, inventory item)
- •Workflow persists for entity lifetime
- •Receives signals for state changes
- •Supports queries for current state
Example Use Cases:
- •Shopping cart (add items, checkout, expiration)
- •Bank account (deposits, withdrawals, balance checks)
- •Product inventory (stock updates, reservations)
Benefits:
- •Encapsulates entity behavior
- •Guarantees consistency per entity
- •Natural event sourcing
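As a rough sketch of the entity idea, the class below models one cart as one long-lived object that accepts signals and answers queries. It is plain Python and only an analogy: CartEntity, signal, and query are hypothetical names, and a real entity workflow would receive these through the SDK's signal and query handlers.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CartEntity:
    """Stand-in for one long-lived workflow execution per cart."""
    cart_id: str
    items: Dict[str, int] = field(default_factory=dict)
    checked_out: bool = False

    def signal(self, name: str, payload: Tuple = ()) -> None:
        """Apply one state change; signals are handled one at a time."""
        if self.checked_out:
            return  # ignore changes after checkout
        if name == "add_item":
            sku, qty = payload
            self.items[sku] = self.items.get(sku, 0) + qty
        elif name == "remove_item":
            (sku,) = payload
            self.items.pop(sku, None)
        elif name == "checkout":
            self.checked_out = True
        else:
            raise ValueError(f"unknown signal: {name}")

    def query(self) -> Dict[str, object]:
        """Read-only view of current state; returns a copy, mutates nothing."""
        return {"items": dict(self.items), "checked_out": self.checked_out}
```

Keeping all mutations inside signal and all reads inside query mirrors the per-entity consistency benefit listed above: there is exactly one place where cart state changes.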
3. Fan-Out/Fan-In (Parallel Execution)
Purpose: Execute multiple tasks in parallel, aggregate results
Pattern:
- •Spawn child workflows or parallel activities
- •Wait for all to complete
- •Aggregate results
- •Handle partial failures
Scaling Rule (Source: temporal.io/blog/workflow-engine-principles):
- •Don't scale individual workflows
- •For 1M tasks: spawn 1K child workflows × 1K tasks each
- •Keep each workflow bounded
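The scaling rule amounts to chunking the work and fanning out per chunk. The sketch below uses asyncio as a stand-in for child workflows and activities; chunk, process_task, child_batch, and parent are hypothetical names, and the doubling in process_task is placeholder work.

```python
import asyncio
from typing import List


def chunk(items: List[int], size: int) -> List[List[int]]:
    """Split the task list into bounded batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def process_task(task: int) -> int:
    # Stand-in for an activity.
    return task * 2


async def child_batch(batch: List[int]) -> int:
    # Stand-in for one bounded child workflow: fan out within the batch.
    results = await asyncio.gather(*(process_task(t) for t in batch))
    return sum(results)


async def parent(tasks: List[int], batch_size: int) -> int:
    batches = chunk(tasks, batch_size)
    # return_exceptions=True so one failed batch does not hide the others.
    outcomes = await asyncio.gather(
        *(child_batch(b) for b in batches), return_exceptions=True
    )
    failures = [o for o in outcomes if isinstance(o, BaseException)]
    if failures:
        raise RuntimeError(f"{len(failures)} of {len(batches)} batches failed")
    return sum(outcomes)
```

With 1M tasks and a batch size of 1K, chunk yields 1K batches, so the parent tracks 1K children rather than 1M individual results.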
4. Async Callback Pattern
Purpose: Wait for external event or human approval
Pattern:
- •Workflow sends request and waits for signal
- •External system processes asynchronously
- •Sends signal to resume workflow
- •Workflow continues with response
Use Cases:
- •Human approval workflows
- •Webhook callbacks
- •Long-running external processes
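A minimal sketch of wait-for-signal with a timeout, using asyncio.Event in place of a workflow signal. ApprovalWorkflow, signal_decision, and the "escalated" outcome are hypothetical; in a real workflow the wait would be durable and survive worker restarts, which this in-memory version does not.

```python
import asyncio
from typing import Optional, Tuple


class ApprovalWorkflow:
    """Waits for an external decision, or escalates after a timeout."""

    def __init__(self) -> None:
        self._decision: Optional[str] = None
        self._received = asyncio.Event()

    def signal_decision(self, decision: str) -> None:
        # Called by the external system (webhook, approver UI, ...).
        self._decision = decision
        self._received.set()

    async def run(self, timeout_seconds: float) -> str:
        try:
            await asyncio.wait_for(self._received.wait(), timeout=timeout_seconds)
        except asyncio.TimeoutError:
            return "escalated"
        return f"resumed:{self._decision}"


async def demo_callback() -> Tuple[str, str]:
    # Case 1: the signal arrives before the timeout.
    approved = ApprovalWorkflow()
    asyncio.get_running_loop().call_later(0.01, approved.signal_decision, "approved")
    first = await approved.run(timeout_seconds=1.0)
    # Case 2: nobody responds, so the wait times out.
    ignored = ApprovalWorkflow()
    second = await ignored.run(timeout_seconds=0.01)
    return first, second
```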
State Management and Determinism
Automatic State Preservation
How Temporal Works (Source: docs.temporal.io/workflows):
- •Complete program state preserved automatically
- •Event History records every command and event
- •Seamless recovery from crashes
- •Applications restore pre-failure state
Determinism Constraints
Workflows Execute as State Machines:
- •Replay behavior must be consistent
- •Same inputs and same Event History → identical sequence of commands every time
Prohibited in Workflows (Source: docs.temporal.io/workflows):
- •❌ Threading, locks, synchronization primitives
- •❌ Random number generation (random())
- •❌ Global state or static variables
- •❌ System time (datetime.now())
- •❌ Direct file I/O or network calls
- •❌ Non-deterministic libraries
Allowed in Workflows:
- •✅ workflow.now() (deterministic time)
- •✅ workflow.random() (deterministic random)
- •✅ Pure functions and calculations
- •✅ Calling activities (non-deterministic operations)
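Why injected time and randomness replay cleanly can be shown without the SDK. In this sketch, workflow_logic and execute_and_replay are hypothetical names; the seeded random.Random and the recorded timestamp stand in for what workflow.random() and workflow.now() provide inside a workflow.

```python
import random
from typing import List


def workflow_logic(rng: random.Random, now: float) -> List[str]:
    """Pure decision logic: time and randomness are passed in, never read
    from the system clock or the global random module."""
    commands = []
    if rng.random() < 0.5:
        commands.append("send_discount")
    commands.append("expire_cart" if now > 1_000.0 else "keep_cart")
    return commands


def execute_and_replay(seed: int, recorded_now: float) -> bool:
    """Run once, then replay with the same recorded inputs and compare."""
    first = workflow_logic(random.Random(seed), recorded_now)
    replay = workflow_logic(random.Random(seed), recorded_now)
    return first == replay
```

Had workflow_logic called random.random() or datetime.now() directly, the replay could take a different branch from the original run and diverge from the recorded history.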
Versioning Strategies
Challenge: Changing workflow code while old executions still running
Solutions:
- •Versioning API: Use the SDK's versioning call for safe changes (workflow.patched() in the Python SDK, workflow.GetVersion in the Go SDK)
- •New Workflow Type: Create new workflow, route new executions to it
- •Backward Compatibility: Ensure old events replay correctly
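The idea behind a versioning call can be illustrated with a toy model: new executions record a marker and take the new branch, while replays of older executions, which have no marker, keep the old branch. This is a loose sketch, not the SDK's implementation; order_workflow, history_markers, is_replay, and the "add-fraud-check" change ID are all hypothetical.

```python
from typing import List, Set


def order_workflow(history_markers: Set[str], is_replay: bool) -> List[str]:
    def patched(change_id: str) -> bool:
        if is_replay:
            # Replaying: follow whatever the original run recorded.
            return change_id in history_markers
        # First execution on new code: record the marker, take the new path.
        history_markers.add(change_id)
        return True

    steps = ["reserve_inventory"]
    if patched("add-fraud-check"):
        steps.append("fraud_check")  # step added in the new code version
    steps.append("charge_payment")
    return steps
```

An execution started before the change replays as reserve_inventory, charge_payment; one started after it includes fraud_check on both the first run and every replay.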
Resilience and Error Handling
Retry Policies
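Default Behavior: Temporal retries failed activities with exponential backoff and no limit on attempts, until an activity timeout (such as Schedule-To-Close) expires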
Configure Retry:
- •Initial retry interval
- •Backoff coefficient (exponential backoff)
- •Maximum interval (cap retry delay)
- •Maximum attempts (eventually fail)
Non-Retryable Errors:
- •Invalid input (validation failures)
- •Business rule violations
- •Permanent failures (resource not found)
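How the four retry settings interact can be worked through with a small calculation. The sketch below is plain Python with hypothetical names (retry_delays, call_with_retry, NonRetryableError); it assumes max_attempts counts the first try and it omits the actual sleeping between attempts.

```python
from typing import Callable, List, Tuple


def retry_delays(initial: float, coefficient: float, maximum: float,
                 max_attempts: int) -> List[float]:
    """Delay before each retry: initial * coefficient**n, capped at maximum."""
    return [min(initial * coefficient ** n, maximum)
            for n in range(max_attempts - 1)]


class NonRetryableError(Exception):
    """Validation failures, business rule violations, permanent failures."""


def call_with_retry(operation: Callable[[], object],
                    max_attempts: int) -> Tuple[object, int]:
    """Return (result, attempts used). Sleeping between tries is omitted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except NonRetryableError:
            raise  # retrying cannot help, fail immediately
        except Exception:
            if attempt == max_attempts:
                raise  # attempts exhausted, surface the last error
    raise ValueError("max_attempts must be >= 1")
```

For example, with a 1 second initial interval, coefficient 2, a 10 second cap, and 6 attempts, the delays are 1, 2, 4, 8, and 10 seconds.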
Idempotency Requirements
Why Critical (Source: docs.temporal.io/activities):
- •Activities may execute multiple times
- •Network failures trigger retries
- •Duplicate execution must be safe
Implementation Strategies:
- •Idempotency keys (deduplication)
- •Check-then-act with unique constraints
- •Upsert operations instead of insert
- •Track processed request IDs
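The idempotency-key strategy can be sketched with an in-memory store. PaymentGateway and its fields are hypothetical; a real system would hold the key in durable storage with a unique constraint rather than in a dict.

```python
from typing import Dict


class PaymentGateway:
    """Hypothetical in-memory gateway that deduplicates by idempotency key."""

    def __init__(self) -> None:
        self._charges: Dict[str, int] = {}  # idempotency key -> amount charged
        self.total_charged = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._charges:
            # Duplicate delivery (for example an activity retry): return the
            # earlier result instead of charging again.
            return f"charge-{idempotency_key}"
        self._charges[idempotency_key] = amount_cents
        self.total_charged += amount_cents
        return f"charge-{idempotency_key}"
```

Calling charge three times with the same key leaves total_charged at a single charge, which is the "calling N times = calling once" property activities need.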
Activity Heartbeats
Purpose: Detect stalled long-running activities
Pattern:
- •Activity sends periodic heartbeat
- •Includes progress information
- •Timeout if no heartbeat received
- •Enables progress-based retry
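Progress-based retry means a retried attempt starts from the last heartbeated position instead of from zero. The sketch below models that in plain Python; process_rows, the heartbeat callback, and fail_at (used only to simulate a crash) are hypothetical, and heartbeat timeouts are not modeled.

```python
from typing import Callable, List, Optional


def process_rows(rows: List[str],
                 last_heartbeat: Optional[int],
                 heartbeat: Callable[[int], None],
                 fail_at: Optional[int] = None) -> List[str]:
    """Process rows, reporting the index of each completed row."""
    # Resume just after the last row a previous attempt reported as done.
    start = 0 if last_heartbeat is None else last_heartbeat + 1
    done = []
    for index in range(start, len(rows)):
        if index == fail_at:
            raise RuntimeError("worker crashed")  # simulated failure
        done.append(rows[index].upper())
        heartbeat(index)  # progress detail travels with the heartbeat
    return done
```

If the first attempt crashes at index 2 after heartbeating 0 and 1, a retry called with last_heartbeat=1 processes only rows 2 onward.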
Best Practices
Workflow Design
- •Keep workflows focused - Single responsibility per workflow
- •Small workflows - Use child workflows for scalability
- •Clear boundaries - Workflow orchestrates, activities execute
- •Test locally - Use time-skipping test environment
Activity Design
- •Idempotent operations - Safe to retry
- •Short-lived - Seconds to minutes, not hours
- •Timeout configuration - Always set timeouts
- •Heartbeat for long tasks - Report progress
- •Error handling - Distinguish retryable vs non-retryable
Common Pitfalls
Workflow Violations:
- •Using datetime.now() instead of workflow.now()
- •Threading, or async operations not managed by the workflow SDK, in workflow code
- •Calling external APIs directly from workflow
- •Non-deterministic logic in workflows
Activity Mistakes:
- •Non-idempotent operations (can't handle retries)
- •Missing timeouts (activities run forever)
- •No error classification (retry validation errors)
- •Ignoring payload limits (2MB per argument)
Operational Considerations
Monitoring:
- •Workflow execution duration
- •Activity failure rates
- •Retry attempts and backoff
- •Pending workflow counts
Scalability:
- •Horizontal scaling with workers
- •Task queue partitioning
- •Child workflow decomposition
- •Activity batching when appropriate
Additional Resources
Official Documentation:
- •Temporal Core Concepts: docs.temporal.io/workflows
- •Workflow Patterns: docs.temporal.io/evaluate/use-cases-design-patterns
- •Best Practices: docs.temporal.io/develop/best-practices
- •Saga Pattern: temporal.io/blog/saga-pattern-made-easy
Key Principles:
- •Workflows = orchestration, Activities = external calls
- •Determinism is non-negotiable for workflows
- •Idempotency is critical for activities
- •State preservation is automatic
- •Design for failure and recovery