cluster-sizing

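Use this skill when the user asks about "Temporal scaling", "history shards", "cluster capacity", "Temporal resources", "scaling Temporal", "Temporal performance", or "shard count", or needs guidance on capacity planning for Temporal clusters.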

SKILL.md
---
name: cluster-sizing
description: This skill should be used when the user asks about "Temporal sizing", "history shards", "cluster capacity", "Temporal resources", "scale Temporal", "Temporal performance", "how many shards", or needs guidance on capacity planning for Temporal clusters.
version: 1.0.0
---

Temporal Cluster Sizing

Guidance for sizing Temporal clusters based on workload requirements.

Key Sizing Factors

| Factor | Impact | Cannot Change |
| --- | --- | --- |
| History Shards | Workflow parallelism | Yes (set at creation) |
| History Replicas | Throughput, availability | No |
| Matching Replicas | Task dispatch rate | No |
| Frontend Replicas | API request rate | No |
| Database Size | History storage | No |

History Shards

Critical: The number of history shards is fixed at cluster creation and cannot be changed afterward.

Shards determine maximum workflow parallelism. Each workflow belongs to one shard.

Sizing Guidelines

| Concurrent Workflows | Recommended Shards |
| --- | --- |
| < 10,000 | 128 |
| 10,000 - 100,000 | 256 |
| 100,000 - 500,000 | 512 |
| 500,000 - 2,000,000 | 1024 |
| > 2,000,000 | 2048 or 4096 |

Calculation Formula

code
shards = ceil(max_concurrent_workflows / 1000) * safety_factor

# Round the result up to the next power of 2
# safety_factor = 2-4x for growth

Example: Expecting 50,000 concurrent workflows with 3x growth:

code
base = 50,000 / 1000 = 50
with_growth = 50 * 3 = 150
next_power_of_2 = 256 shards
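
The formula above can be sketched as a small function. This is a minimal illustration, not an official Temporal tool: the name `recommended_shards` and the default `safety_factor` of 2 are assumptions chosen here, and the rounding step picks the smallest power of 2 at or above the computed value.

```python
import math


def recommended_shards(max_concurrent_workflows: int, safety_factor: float = 2.0) -> int:
    """Apply shards = ceil(workflows / 1000) * safety_factor, rounded up to a power of 2."""
    if max_concurrent_workflows <= 0 or safety_factor <= 0:
        raise ValueError("inputs must be positive")
    base = math.ceil(max_concurrent_workflows / 1000)
    target = math.ceil(base * safety_factor)
    # Smallest power of 2 that is >= target.
    return 1 << (target - 1).bit_length()


if __name__ == "__main__":
    # Matches the worked example: 50,000 workflows with 3x growth.
    print(recommended_shards(50_000, safety_factor=3))  # 256
```

Because the shard count cannot be changed later, it is usually safer to err toward the larger safety factor when the growth estimate is uncertain.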

Shard Distribution

Shards distribute across history service replicas:

code
shards_per_replica = total_shards / history_replicas

# Example: 512 shards, 4 replicas = 128 shards/replica

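Adding replicas lowers the number of shards each replica owns, which spreads load more evenly and raises throughput. Replicas beyond the shard count add nothing, because each shard is owned by one replica at a time.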
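
When the shard count does not divide evenly by the replica count, some replicas own one more shard than others. The sketch below shows the idealized split; `shards_per_replica` is a hypothetical helper, and it assumes a perfectly even assignment, whereas the server decides real shard ownership and the result may be less uniform.

```python
def shards_per_replica(total_shards: int, history_replicas: int) -> list[int]:
    """Split shards as evenly as possible; returns one count per replica."""
    if total_shards <= 0 or history_replicas <= 0:
        raise ValueError("inputs must be positive")
    base, extra = divmod(total_shards, history_replicas)
    # The first `extra` replicas each take one additional shard.
    return [base + 1 if i < extra else base for i in range(history_replicas)]


if __name__ == "__main__":
    print(shards_per_replica(512, 4))  # [128, 128, 128, 128]
    print(shards_per_replica(512, 6))  # [86, 86, 85, 85, 85, 85]
```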

Service Sizing

Frontend Service

Handles API requests, authentication, rate limiting.

| Load Level | Replicas | CPU | Memory |
| --- | --- | --- | --- |
| Low (<100 rps) | 1-2 | 500m | 1Gi |
| Medium (100-1000 rps) | 3 | 1 | 2Gi |
| High (1000-5000 rps) | 5 | 2 | 4Gi |
| Very High (>5000 rps) | 10+ | 4 | 8Gi |

History Service

Manages workflow state and event history.

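| Shards | Replicas | CPU/replica | Memory/replica |
| --- | --- | --- | --- |
| 128 | 2 | 1 | 2Gi |
| 256 | 3 | 2 | 4Gi |
| 512 | 4-6 | 2 | 4Gi |
| 1024 | 8-12 | 4 | 8Gi |
| 2048 | 16-24 | 4 | 8Gi |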
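
One way to read the table above is in terms of how many shards each history replica ends up owning. The sketch below derives that range per row, assuming the rows read as shards 128/256/512/1024/2048 with replicas 2, 3, 4-6, 8-12, and 16-24; `HISTORY_TABLE` and `shards_per_replica_range` are illustrative names, not part of Temporal.

```python
# (shards, min_replicas, max_replicas) copied from the History Service table.
HISTORY_TABLE = [
    (128, 2, 2),
    (256, 3, 3),
    (512, 4, 6),
    (1024, 8, 12),
    (2048, 16, 24),
]


def shards_per_replica_range(shards: int, min_replicas: int, max_replicas: int) -> tuple[int, int]:
    """Return (fewest, most) shards per replica across the replica range, rounded up."""
    most = -(-shards // min_replicas)    # ceiling division
    fewest = -(-shards // max_replicas)
    return fewest, most


if __name__ == "__main__":
    for shards, lo, hi in HISTORY_TABLE:
        print(shards, shards_per_replica_range(shards, lo, hi))
```

Under these assumptions every row lands at 128 shards per replica or fewer, which can serve as a rough ceiling when picking a replica count for a shard count the table does not list.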

Matching Service

Dispatches tasks to workers.

| Task Rate | Replicas | CPU | Memory |
| --- | --- | --- | --- |
| Low (<1000/s) | 2 | 500m | 1Gi |
| Medium (1000-10000/s) | 3 | 1 | 2Gi |
| High (>10000/s) | 5+ | 2 | 4Gi |

Worker Service (Internal)

Handles internal system workflows. Scale with cluster size:

| Cluster Size | Replicas | CPU | Memory |
| --- | --- | --- | --- |
| Small | 1 | 200m | 256Mi |
| Medium | 1 | 500m | 512Mi |
| Large | 2 | 1 | 1Gi |

Database Sizing

PostgreSQL Recommendations

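| Workflow Volume | CPU | Memory | Storage | IOPS |
| --- | --- | --- | --- | --- |
| < 100K workflows | 2 | 8GB | 100GB | 3000 |
| 100K-1M workflows | 4 | 16GB | 500GB | 6000 |
| 1M-10M workflows | 8 | 32GB | 1TB | 12000 |
| > 10M workflows | 16+ | 64GB+ | 2TB+ | 20000+ |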

Storage Calculation

code
storage_per_workflow = avg_history_events * event_size
                     = 100 events * 1KB = 100KB

total_storage = workflows * storage_per_workflow * retention_multiplier
              = 1,000,000 * 100KB * 1.5 = 150GB

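Retention: Closed workflow histories are kept until the retention period expires, so a shorter retention period directly reduces storage. Set it to the shortest value your audit and debugging needs allow.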
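
The storage calculation above can be turned into a reusable estimate. This is a rough sketch: `estimate_storage_gb` is a hypothetical helper, its defaults are the example figures from this section (100 events, 1KB per event, 1.5x retention multiplier), and it uses decimal units (1 GB = 1,000,000 KB) to match the arithmetic shown. Real event sizes vary with payload size, so measure them before relying on the result.

```python
def estimate_storage_gb(
    workflows: int,
    avg_history_events: int = 100,
    event_size_kb: float = 1.0,
    retention_multiplier: float = 1.5,
) -> float:
    """Estimate history storage in GB, using decimal units (1 GB = 1,000,000 KB)."""
    storage_per_workflow_kb = avg_history_events * event_size_kb
    total_kb = workflows * storage_per_workflow_kb * retention_multiplier
    return total_kb / 1_000_000


if __name__ == "__main__":
    # Matches the worked example: 1,000,000 workflows -> 150 GB.
    print(estimate_storage_gb(1_000_000))
```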

Elasticsearch Sizing

For visibility queries (optional but recommended):

| Indexed Workflows | Nodes | CPU/node | Memory/node | Storage/node |
| --- | --- | --- | --- | --- |
| < 1M | 3 | 1 | 2Gi | 50Gi |
| 1M-10M | 3 | 2 | 4Gi | 200Gi |
| > 10M | 5+ | 4 | 8Gi | 500Gi |

Configuration Templates

Small Cluster (Dev/Test)

yaml
server:
  config:
    numHistoryShards: 128
  replicaCount:
    frontend: 1
    history: 1
    matching: 1
    worker: 1
  resources:
    frontend:
      requests: {cpu: "250m", memory: "512Mi"}
    history:
      requests: {cpu: "500m", memory: "1Gi"}
    matching:
      requests: {cpu: "250m", memory: "512Mi"}

Medium Cluster (Production Start)

yaml
server:
  config:
    numHistoryShards: 256
  replicaCount:
    frontend: 3
    history: 3
    matching: 3
    worker: 1
  resources:
    frontend:
      requests: {cpu: "500m", memory: "1Gi"}
      limits: {cpu: "2", memory: "4Gi"}
    history:
      requests: {cpu: "1", memory: "2Gi"}
      limits: {cpu: "4", memory: "8Gi"}
    matching:
      requests: {cpu: "500m", memory: "1Gi"}
      limits: {cpu: "2", memory: "4Gi"}

Large Cluster (High Volume)

yaml
server:
  config:
    numHistoryShards: 1024
  replicaCount:
    frontend: 5
    history: 10
    matching: 5
    worker: 2
  resources:
    frontend:
      requests: {cpu: "2", memory: "4Gi"}
      limits: {cpu: "4", memory: "8Gi"}
    history:
      requests: {cpu: "4", memory: "8Gi"}
      limits: {cpu: "8", memory: "16Gi"}
    matching:
      requests: {cpu: "2", memory: "4Gi"}
      limits: {cpu: "4", memory: "8Gi"}
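
Before applying a template, it helps to total the resource requests so the Kubernetes node pool can be sized to fit them. The sketch below sums the requests from the Large Cluster template; the dictionary layout and the `total_requests` helper are illustrative assumptions, and the worker service is left out because the template gives it no resource values.

```python
# service -> (replicas, cpu_request_cores, memory_request_gi), from the Large Cluster template.
LARGE_CLUSTER = {
    "frontend": (5, 2, 4),
    "history": (10, 4, 8),
    "matching": (5, 2, 4),
}


def total_requests(services: dict[str, tuple[int, float, float]]) -> tuple[float, float]:
    """Sum CPU cores and memory (Gi) requested across all replicas."""
    cpu = sum(replicas * cores for replicas, cores, _ in services.values())
    memory = sum(replicas * gi for replicas, _, gi in services.values())
    return cpu, memory


if __name__ == "__main__":
    cpu, memory = total_requests(LARGE_CLUSTER)
    print(f"{cpu} CPU cores, {memory}Gi memory requested")
```

For the Large Cluster template this comes to 60 cores and 120Gi of requests, before counting the worker service, the database, or Elasticsearch.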

Scaling Guidelines

Horizontal Scaling

Scale replicas when:

  • CPU utilization > 70% sustained
  • Memory utilization > 80%
  • Request latency p99 > SLA
  • Task backlog growing

Vertical Scaling

Increase resources when:

  • Replica count at practical limit
  • Database connection pooling maxed
  • GC pressure affecting latency

Monitoring for Sizing Decisions

Key metrics to watch:

promql
# History service load
sum(rate(temporal_persistence_requests_total[5m])) by (operation)

# Task latency, cluster-wide p99 (indicates matching capacity)
histogram_quantile(0.99, sum(rate(temporal_schedule_to_start_latency_bucket[5m])) by (le))

# Workflow throughput
sum(rate(temporal_workflow_completed_total[5m]))

# Shard distribution
temporal_history_shard_count

Common Sizing Mistakes

| Mistake | Impact | Solution |
| --- | --- | --- |
| Too few shards | Cannot scale later | Start with more shards |
| Undersized history | Latency spikes | Increase memory, replicas |
| Single frontend | Single point of failure | Minimum 2 for HA |
| No Elasticsearch | Slow visibility queries | Enable for production |

Additional Resources

Reference Files

For detailed sizing calculations, consult:

  • references/sizing-calculator.md - Detailed sizing formulas
  • references/benchmark-results.md - Performance benchmark data