Designing Simulation Workloads
When to Use This Skill
Invoke this skill when you are:
- Creating new simulation tests for actor systems or distributed components
- Improving test coverage by expanding existing workloads
- Designing randomized operation sequences to explore the state space
- Planning verification strategies (reference implementations, operation logs, invariants)
- Scaling workloads from a single client/server pair (1x1) to larger topologies (2x2, 10x10)
Related Skills
- using-buggify: Add fault injection to force edge cases in your workload
- using-chaos-assertions: Track coverage and validate safety properties during execution
- validating-with-invariants: Design cross-workload properties for global validation
Philosophy: From Test Cases to Autonomous Exploration
Traditional testing writes specific scenarios: "Do A, then B, verify C." This misses bugs hiding in unexpected combinations.
Autonomous workload testing shifts the approach: define every possible operation (an "alphabet"), generate large volumes of concurrent work, and let deterministic chaos explore the state space.
The Plinko Board Mental Model
Think of your system as a Plinko board:
```text
         [Drop Zone - Inputs]
                  │
      ○───○───○───○───○      ← Execution paths
        ○───○───○───○        ← State transitions
          ○───○───○          ← Decisions
            ○───○
     ┌───┬───┬───┬───┐
     │ 0 │$10│$50│ 0 │       ← Outcomes (success or bugs)
     └───┴───┴───┴───┘
```
- Pegs = Code paths, state transitions, decisions
- Discs = Work items (messages, requests, operations)
- Buckets = Outcomes (successful behavior OR bugs)
Traditional testing: Drop one disc down predefined paths → misses unexpected behavior
Autonomous testing: Dump an entire bucket of discs → find unexpected states through randomness and massive concurrency
Four Principles of Autonomous Testing
Principle 1: Build Properties, Not Test Cases
Test Case Thinking (limited):
```rust
assert!(insert("Alice", 100).is_ok());
assert!(get("Alice") == Some(100));
```
Property Thinking (general):
```rust
property!(valid_inserts_succeed,
    forall key: String, value: u64 =>
        is_valid(key, value) => insert(key, value).is_ok()
);
```
For actor systems, turn assumptions into testable properties:
```rust
// Assumption: "Actors activate only once"
always_assert!(
    no_duplicate_activation,
    activation_count(actor_id) <= 1,
    "Actor activated multiple times - race condition detected"
);

// Assumption: "Messages don't duplicate"
always_assert!(
    message_conservation,
    messages_received <= messages_sent,
    "Message duplication detected"
);
```
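To make the property idea concrete without any framework, the sketch below checks "insert then get returns the value" across many seeded random inputs and panics on the first counterexample. It is a minimal, std-only illustration: `SplitMix64`, `is_valid`, and `check_insert_get` are hypothetical names, and a `HashMap` stands in for the store under test.

```rust
use std::collections::HashMap;

/// Tiny deterministic PRNG (SplitMix64); a hypothetical stand-in for a seeded random provider.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical validity rule: non-empty keys of at most 100 bytes.
fn is_valid(key: &str) -> bool {
    !key.is_empty() && key.len() <= 100
}

/// Property: for every valid (key, value), an insert followed by a get returns the value.
/// Returns how many valid cases were checked; panics on the first counterexample.
fn check_insert_get(seed: u64, cases: usize) -> usize {
    let mut rng = SplitMix64(seed);
    let mut store: HashMap<String, u64> = HashMap::new(); // stand-in for the system under test
    let mut checked = 0;
    for _ in 0..cases {
        // Key length 0..=120, so some generated inputs are invalid and get skipped.
        let len = (rng.next_u64() % 121) as usize;
        let key: String = (0..len)
            .map(|_| (b'a' + (rng.next_u64() % 26) as u8) as char)
            .collect();
        let value = rng.next_u64();
        if !is_valid(&key) {
            continue; // the property only speaks about valid inputs
        }
        store.insert(key.clone(), value);
        assert_eq!(store.get(&key), Some(&value), "counterexample: key={key:?}");
        checked += 1;
    }
    checked
}
```

Because the generator is seeded, a failing case can be reproduced by rerunning with the same seed.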
Principle 2: Add Randomness (Data + Sequences)
Level 1: Random Data
```rust
let key = random.generate_string(1..100);
let value = random.range(0..u64::MAX);
```
Level 2: Random Sequences (more powerful!)
```rust
// Define alphabet, let simulation choose order
let operations = vec![
    Operation::Insert(random_key(), random_value()),
    Operation::Get(random_key()),
    Operation::Delete(random_key()),
    Operation::Update(random_key(), random_value()),
];

for _ in 0..1000 {
    let op = random.choice(&operations);
    execute(op);
}
```
Why this matters: Random sequences reveal race conditions. What if delete happens before insert? What if two activations race?
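The sketch below shows the "Level 2" idea with std only: each draw picks both the operation and its arguments from a seeded generator, so the sequence is random yet replayable from its seed. `Op`, `SplitMix64`, `random_sequence`, and `deletes_before_insert` are hypothetical helpers, not part of the simulation library. Unlike the snippet above, which fixes four operations up front and only randomizes their order, this version draws fresh keys on every step, from a small key space so that operations collide on the same key.

```rust
/// Hypothetical operation alphabet over a small key space (keys 0..8).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Insert(u8, u64),
    Get(u8),
    Delete(u8),
    Update(u8, u64),
}

/// Tiny deterministic PRNG (SplitMix64).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Draw `n` operations; the variant and its arguments both come from the seeded RNG.
fn random_sequence(seed: u64, n: usize) -> Vec<Op> {
    let mut rng = SplitMix64(seed);
    (0..n)
        .map(|_| {
            let key = (rng.next_u64() % 8) as u8;
            match rng.next_u64() % 4 {
                0 => Op::Insert(key, rng.next_u64()),
                1 => Op::Get(key),
                2 => Op::Delete(key),
                _ => Op::Update(key, rng.next_u64()),
            }
        })
        .collect()
}

/// Count deletes that hit a key before any insert of that key: one of the
/// "what if delete happens before insert?" orderings a sequence can produce.
fn deletes_before_insert(ops: &[Op]) -> usize {
    let mut inserted = [false; 8];
    let mut count = 0;
    for op in ops {
        match op {
            Op::Insert(k, _) => inserted[*k as usize] = true,
            Op::Delete(k) if !inserted[*k as usize] => count += 1,
            _ => {}
        }
    }
    count
}
```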
Principle 3: Validate Often (Not Just at End)
Traditional: Check only at completion
```rust
run_test();
assert_eq!(final_state, expected); // ← Only here
```
Autonomous: Validate throughout execution
```rust
async fn workload() {
    sometimes_assert!(
        actors_making_progress,
        active_actors > 0,
        "At least one actor is active"
    );

    always_assert!(
        directory_consistent,
        directory.count(actor) <= 1,
        "Actor appears in multiple locations"
    );

    validate_final_properties();
}
```
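Why: Assertions checked during execution catch a violation at the moment it happens, and they act as signposts that guide exploration toward interesting states, so bugs surface faster.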
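The difference between the two assertion kinds can be modeled in a few lines. This is a hypothetical std-only tracker, not the `always_assert!`/`sometimes_assert!` implementation: `always` records a violation whenever its condition is false, while `sometimes` only needs to be true once per run, and names that never fired are reported as unreached. The workload checks its invariant on every step, which catches a transient violation that a final-state check would miss.

```rust
use std::collections::BTreeMap;

/// Hypothetical tracker mirroring the semantics described above.
#[derive(Default)]
struct AssertionTracker {
    sometimes_hits: BTreeMap<&'static str, u64>,
    always_violations: Vec<&'static str>,
}

impl AssertionTracker {
    /// Must hold every time it is evaluated.
    fn always(&mut self, name: &'static str, cond: bool) {
        if !cond {
            self.always_violations.push(name);
        }
    }

    /// Must hold at least once per run; registered even when false so gaps are visible.
    fn sometimes(&mut self, name: &'static str, cond: bool) {
        let hits = self.sometimes_hits.entry(name).or_insert(0);
        if cond {
            *hits += 1;
        }
    }

    /// Names of sometimes-assertions that never held.
    fn unreached(&self) -> Vec<&'static str> {
        self.sometimes_hits
            .iter()
            .filter(|&(_, &hits)| hits == 0)
            .map(|(&name, _)| name)
            .collect()
    }
}

/// Hypothetical workload: the active-actor count moves by `steps`, checked after each step.
fn run_workload(steps: &[i64], tracker: &mut AssertionTracker) -> i64 {
    let mut active: i64 = 0;
    for &delta in steps {
        active += delta;
        tracker.always("non_negative_active", active >= 0);
        tracker.sometimes("actors_making_progress", active > 0);
    }
    active
}
```

For example, the steps `[-1, 1]` end at 0, which looks fine at completion, but the per-step check records the moment the count went negative.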
Principle 4: Generate Enough Work
❌ Don't drop one disc at a time:
```rust
for i in 0..10 {
    send_message(i);
    wait_for_response();
}
```
✅ Dump the entire bucket:
```rust
let num_operations = random.range(500..2000);
let mut tasks = vec![];

for _ in 0..num_operations {
    let op = random.choice(&operations);
    tasks.push(spawn_task(execute(op)));
}

join_all(tasks).await;
```
Why: Bugs hide in combinations and concurrency. Sequential execution misses them.
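Why sequential execution misses these bugs can be shown with a deterministic toy scheduler. In this hypothetical sketch, each task increments a shared counter in two separate steps (read, then write `read + 1`). Run one after another, the tasks always produce the correct total; when a seeded scheduler interleaves their steps, some seeds lose updates, and any losing seed replays identically. `Task`, `run`, and `find_lost_update` are illustrative names, not the simulation library's API.

```rust
/// Tiny deterministic PRNG (SplitMix64).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// A non-atomic increment: step 0 reads the counter, step 1 writes `read + 1`.
#[derive(Clone, Copy)]
struct Task {
    step: u8,
    read: u64,
}

/// Run `n` tasks. With `sequential`, each task finishes before the next starts;
/// otherwise a seeded scheduler picks which unfinished task takes its next step.
fn run(n: usize, seed: u64, sequential: bool) -> u64 {
    let mut rng = SplitMix64(seed);
    let mut counter = 0u64;
    let mut tasks = vec![Task { step: 0, read: 0 }; n];
    loop {
        let runnable: Vec<usize> = (0..n).filter(|&i| tasks[i].step < 2).collect();
        if runnable.is_empty() {
            return counter;
        }
        let i = if sequential {
            runnable[0]
        } else {
            runnable[(rng.next_u64() % runnable.len() as u64) as usize]
        };
        let t = &mut tasks[i];
        if t.step == 0 {
            t.read = counter;
        } else {
            counter = t.read + 1; // a stale read here overwrites another task's update
        }
        t.step += 1;
    }
}

/// Search seeds for an interleaving that loses an update (final counter < n).
fn find_lost_update(n: usize, max_seeds: u64) -> Option<u64> {
    (0..max_seeds).find(|&seed| run(n, seed, false) < n as u64)
}
```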
The Operation Alphabet Pattern
The key to autonomous workloads: Define all possible operations, let the fuzzer combine them.
Basic Template
```rust
enum Operation {
    // Actor lifecycle
    ActivateActor(ActorId),
    DeactivateActor(ActorId),

    // Messaging
    SendMessage(ActorId, Message),
    SendRequest(ActorId, Request),

    // State management
    SaveState(ActorId),
    LoadState(ActorId),

    // Infrastructure chaos (optional)
    CrashNode(NodeId),
    RestoreNode(NodeId),
}

async fn execute_operation(
    op: Operation,
    runtime: &ActorRuntime,
) -> Result<()> {
    match op {
        Operation::ActivateActor(id) => {
            runtime.activate_actor(id).await?;
            sometimes_assert!(actor_activated, true, "Actor activated");
        }
        Operation::SendMessage(id, msg) => {
            runtime.send_message(id, msg).await?;
            sometimes_assert!(message_sent, true, "Message sent");
        }
        // ... handle all operations
    }
    Ok(())
}
```
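A std-only model of the executor above can help check the alphabet before wiring it to a real runtime. In this sketch, `ModelRuntime`, the `u32` `ActorId` alias, and the rule that sending to an inactive actor returns an error are all hypothetical choices; the coverage map plays the role that `sometimes_assert!` plays in the real workload. Because the `match` is exhaustive, adding a variant to `Operation` will not compile until it is handled.

```rust
use std::collections::{HashMap, HashSet};

type ActorId = u32; // hypothetical: real code uses the runtime's ActorId type

#[derive(Debug, Clone)]
enum Operation {
    ActivateActor(ActorId),
    DeactivateActor(ActorId),
    SendMessage(ActorId, String),
}

/// Hypothetical in-memory stand-in for the actor runtime.
#[derive(Default)]
struct ModelRuntime {
    active: HashSet<ActorId>,
    inbox: HashMap<ActorId, Vec<String>>,
    coverage: HashMap<&'static str, u64>, // how often each branch ran
}

impl ModelRuntime {
    fn hit(&mut self, name: &'static str) {
        *self.coverage.entry(name).or_insert(0) += 1;
    }

    fn execute_operation(&mut self, op: Operation) -> Result<(), String> {
        match op {
            Operation::ActivateActor(id) => {
                // Activation is idempotent in this model.
                if self.active.insert(id) {
                    self.hit("actor_activated");
                } else {
                    self.hit("already_active");
                }
            }
            Operation::DeactivateActor(id) => {
                if self.active.remove(&id) {
                    self.hit("actor_deactivated");
                }
            }
            Operation::SendMessage(id, msg) => {
                if !self.active.contains(&id) {
                    self.hit("send_to_inactive");
                    return Err(format!("actor {id} is not active"));
                }
                self.inbox.entry(id).or_default().push(msg);
                self.hit("message_sent");
            }
        }
        Ok(())
    }
}
```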
Workload Structure
```rust
async fn autonomous_workload(
    random: SimRandomProvider,
    network: SimNetworkProvider,
    time: SimTimeProvider,
    task_provider: TokioTaskProvider,
    topology: WorkloadTopology,
) -> SimulationResult<SimulationMetrics> {
    let runtime = ActorRuntime::with_providers(
        "test",
        network,
        time,
        task_provider.clone(),
    ).await?;

    // 1. Generate operation alphabet
    let mut operations = vec![];
    for _ in 0..100 {
        let actor_id = random_actor_id();
        // Each operation takes ownership of an ActorId, so clone for all but the last use
        operations.push(Operation::ActivateActor(actor_id.clone()));
        operations.push(Operation::SendMessage(actor_id.clone(), random_msg()));
        operations.push(Operation::DeactivateActor(actor_id));
    }

    // 2. Shuffle for randomness
    random.shuffle(&mut operations);

    // 3. Execute concurrently
    let mut tasks = vec![];
    for op in operations {
        let task = task_provider.spawn_task(
            execute_operation(op, &runtime)
        );
        tasks.push(task);
    }

    // 4. Wait for completion
    for task in tasks {
        task.await?;
    }

    // 5. Final validation
    validate_final_state(&runtime).await;

    Ok(SimulationMetrics::default())
}
```
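Step 2 relies on a seeded shuffle. Assuming the random provider behaves like a standard Fisher-Yates shuffle driven by a deterministic generator (an assumption; the provider's actual algorithm is not shown here), the same seed yields the same order, and the shuffle places some `DeactivateActor` operations before the matching `ActivateActor`, which is exactly the kind of ordering concurrent execution is meant to explore. All names below are hypothetical.

```rust
/// Tiny deterministic PRNG (SplitMix64).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates shuffle: walk backwards, swapping each slot with a random earlier-or-same slot.
/// Modulo bias is ignored in this sketch.
fn shuffle<T>(rng: &mut SplitMix64, items: &mut [T]) {
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

#[derive(Debug, Clone, PartialEq)]
enum Operation {
    ActivateActor(u32),
    SendMessage(u32, u64),
    DeactivateActor(u32),
}

/// Build three operations per actor, then shuffle them with the same seeded RNG.
fn generate_operations(seed: u64, actors: u32) -> Vec<Operation> {
    let mut rng = SplitMix64(seed);
    let mut ops = Vec::new();
    for id in 0..actors {
        ops.push(Operation::ActivateActor(id));
        ops.push(Operation::SendMessage(id, rng.next_u64()));
        ops.push(Operation::DeactivateActor(id));
    }
    shuffle(&mut rng, &mut ops);
    ops
}

/// Count actors whose DeactivateActor now comes before their ActivateActor.
fn out_of_order_actors(ops: &[Operation], actors: u32) -> usize {
    (0..actors)
        .filter(|&id| {
            let act = ops.iter().position(|o| *o == Operation::ActivateActor(id));
            let deact = ops.iter().position(|o| *o == Operation::DeactivateActor(id));
            deact < act
        })
        .count()
}
```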
Three Verification Patterns
Choose the pattern that fits your system:
Pattern 1: Reference Implementation
Mirror production logic with a simple, obviously correct implementation, apply the same operations to both, and compare results.
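```rust
use std::collections::HashMap;

// Production: Complex distributed KV store
// Reference: std::collections::HashMap
let mut reference = HashMap::new();
let distributed = DistributedKV::new();

// Apply same operations to both
for op in operations {
    match op {
        Operation::Insert(k, v) => {
            // Clone for the reference so the distributed store can take ownership
            reference.insert(k.clone(), v.clone());
            distributed.insert(k, v).await;
        }
        Operation::Get(k) => {
            // `.cloned()` turns Option<&V> into Option<V>, assuming the store's get returns Option<V>
            let expected = reference.get(&k).cloned();
            let actual = distributed.get(&k).await;
            always_assert!(kv_match, actual == expected, "Mismatch");
        }
    }
}
```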
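A self-contained version of the pattern: a hypothetical sharded store (several `HashMap` shards) is driven with the same random operations as a plain `HashMap` reference, and every `Get` is compared. The `buggy` flag injects a deliberate defect (deletes go to the wrong shard) to show that the comparison catches it. `ShardedKv`, `first_mismatch`, and `SplitMix64` are illustrative names, not a real library's API.

```rust
use std::collections::HashMap;

/// Tiny deterministic PRNG (SplitMix64).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical "production" store: keys spread over shards by `key % shards`.
struct ShardedKv {
    shards: Vec<HashMap<u64, u64>>,
    buggy: bool, // when true, deletes go to the wrong shard
}

impl ShardedKv {
    fn new(n: usize, buggy: bool) -> Self {
        ShardedKv { shards: vec![HashMap::new(); n], buggy }
    }
    fn shard(&self, key: u64) -> usize {
        (key % self.shards.len() as u64) as usize
    }
    fn insert(&mut self, k: u64, v: u64) {
        let s = self.shard(k);
        self.shards[s].insert(k, v);
    }
    fn get(&self, k: u64) -> Option<u64> {
        self.shards[self.shard(k)].get(&k).copied()
    }
    fn remove(&mut self, k: u64) {
        let mut s = self.shard(k);
        if self.buggy {
            s = (s + 1) % self.shards.len(); // injected defect
        }
        self.shards[s].remove(&k);
    }
}

/// Apply identical random operations to the store and the reference.
/// Returns the index of the first Get where they disagree, if any.
fn first_mismatch(seed: u64, ops: usize, buggy: bool) -> Option<usize> {
    let mut rng = SplitMix64(seed);
    let mut reference: HashMap<u64, u64> = HashMap::new();
    let mut store = ShardedKv::new(4, buggy);
    for i in 0..ops {
        let k = rng.next_u64() % 16; // small key space forces collisions
        match rng.next_u64() % 3 {
            0 => {
                let v = rng.next_u64();
                reference.insert(k, v);
                store.insert(k, v);
            }
            1 => {
                reference.remove(&k);
                store.remove(k);
            }
            _ => {
                if store.get(k) != reference.get(&k).copied() {
                    return Some(i);
                }
            }
        }
    }
    None
}
```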
Pattern 2: Operation Logging
Record all operations, replay to verify consistency.
```rust
let mut log = Vec::new();

for op in operations {
    log.push(op.clone());
    system.execute(op).await;
}

// After execution, replay log and verify state
let final_state = system.get_state().await;
let expected = replay_operations(&log);
always_assert!(state_matches, final_state == expected, "Replay mismatch");
```
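A minimal sketch of the logging pattern, assuming a hypothetical `System` that maintains a derived running total alongside its map (the kind of cached value that could drift if an operation were applied wrongly). Each operation is cloned into the log before it runs, and `replay_operations` rebuilds the expected map from scratch so it can be compared with the live state. All types here are hypothetical stand-ins.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone)]
enum Op {
    Set(String, i64),
    Add(String, i64),
    Delete(String),
}

/// Hypothetical system under test: a map plus an incrementally maintained total.
#[derive(Default)]
struct System {
    map: BTreeMap<String, i64>,
    total: i64,
}

impl System {
    fn execute(&mut self, op: Op) {
        match op {
            Op::Set(k, v) => {
                let old = self.map.insert(k, v).unwrap_or(0);
                self.total += v - old;
            }
            Op::Add(k, d) => {
                *self.map.entry(k).or_insert(0) += d;
                self.total += d;
            }
            Op::Delete(k) => {
                if let Some(old) = self.map.remove(&k) {
                    self.total -= old;
                }
            }
        }
    }
}

/// Rebuild the expected map from scratch by folding over the log.
fn replay_operations(log: &[Op]) -> BTreeMap<String, i64> {
    let mut state = BTreeMap::new();
    for op in log {
        match op {
            Op::Set(k, v) => {
                state.insert(k.clone(), *v);
            }
            Op::Add(k, d) => {
                *state.entry(k.clone()).or_insert(0) += *d;
            }
            Op::Delete(k) => {
                state.remove(k);
            }
        }
    }
    state
}

/// Record each operation before executing it, as in the snippet above.
fn run_logged(ops: Vec<Op>) -> (System, Vec<Op>) {
    let mut system = System::default();
    let mut log = Vec::new();
    for op in ops {
        log.push(op.clone());
        system.execute(op);
    }
    (system, log)
}
```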
Pattern 3: Invariant Tracking
Maintain mathematical properties that must hold.
```rust
// Example: Total balance conservation in banking system
let initial_balance: u64 = accounts.iter().map(|a| a.balance).sum();

// ... many operations (deposits, withdrawals, transfers) ...

let final_balance: u64 = accounts.iter().map(|a| a.balance).sum();
always_assert!(
    balance_conservation,
    final_balance == initial_balance + total_deposits - total_withdrawals,
    "Money conservation violated"
);
```
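The same invariant can be checked after every operation rather than only at the end. In this hypothetical std-only sketch, deposits and withdrawals change the total, transfers must not, and overdrafts are rejected; the `assert_eq!` inside the loop is the conservation property from the snippet above. `run_bank` and `SplitMix64` are illustrative names.

```rust
/// Tiny deterministic PRNG (SplitMix64).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Run random deposits, withdrawals and transfers over `n` accounts (100 each to start)
/// and assert conservation after every operation. Returns the final total.
fn run_bank(seed: u64, n: usize, ops: usize) -> u64 {
    let mut rng = SplitMix64(seed);
    let mut balances = vec![100u64; n];
    let initial: u64 = balances.iter().sum();
    let (mut deposits, mut withdrawals) = (0u64, 0u64);
    for _ in 0..ops {
        let a = (rng.next_u64() % n as u64) as usize;
        let amount = rng.next_u64() % 50;
        match rng.next_u64() % 3 {
            0 => {
                balances[a] += amount;
                deposits += amount;
            }
            1 => {
                if balances[a] >= amount {
                    balances[a] -= amount;
                    withdrawals += amount;
                }
            }
            _ => {
                // Transfer: moves money between accounts, must not change the total.
                let b = (rng.next_u64() % n as u64) as usize;
                if balances[a] >= amount {
                    balances[a] -= amount;
                    balances[b] += amount;
                }
            }
        }
        let total: u64 = balances.iter().sum();
        assert_eq!(total, initial + deposits - withdrawals, "Money conservation violated");
    }
    balances.iter().sum()
}
```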
Topology Scaling Strategy
Start simple, scale up progressively.
1x1 Topology (Basic Functionality)
```rust
SimulationBuilder::new()
    .register_workload("client", client_workload)
    .register_workload("server", server_workload)
    .run()
    .await;
```
Tests: Basic request-response, error handling, simple state transitions
2x2 Topology (Distributed Scenarios)
```rust
SimulationBuilder::new()
    .register_workload("client_1", client_workload)
    .register_workload("client_2", client_workload)
    .register_workload("server_1", server_workload)
    .register_workload("server_2", server_workload)
    .run()
    .await;
```
Tests: Multi-connection handling, load distribution, server switching, basic race conditions
10x10 Topology (Stress Testing)
```rust
async fn run_large_topology(
    num_clients: usize,
    num_servers: usize,
) -> SimulationReport {
    let mut builder = SimulationBuilder::new()
        .use_random_config()
        .set_iteration_control(
            IterationControl::UntilAllSometimesReached(10_000)
        );

    for i in 1..=num_servers {
        builder = builder.register_workload(
            format!("server_{}", i),
            server_workload
        );
    }

    for i in 1..=num_clients {
        builder = builder.register_workload(
            format!("client_{}", i),
            client_workload
        );
    }

    builder.run().await
}
```
Tests: Rare race conditions, queue overflow, network partition behavior, high contention
ClientId-Based Work Partitioning
Use topology.client_id to partition work across multiple workload instances.
```rust
async fn partitioned_workload(
    random: SimRandomProvider,
    network: SimNetworkProvider,
    time: SimTimeProvider,
    task_provider: TokioTaskProvider,
    topology: WorkloadTopology,
) -> SimulationResult<SimulationMetrics> {
    let runtime = ActorRuntime::with_providers(/*...*/).await?;

    // Partition actor IDs by client_id
    let actor_ids: Vec<_> = (0..50)
        .filter(|i| i % topology.total_clients == topology.client_id)
        .map(|i| ActorId::virtual_actor("Test", &format!("actor_{}", i)))
        .collect();

    // Generate operations for this partition
    let mut operations = vec![];
    for actor_id in &actor_ids {
        operations.push(Operation::ActivateActor(actor_id.clone()));
        operations.push(Operation::SendMessage(actor_id.clone(), random_msg()));
    }

    // Execute...

    Ok(SimulationMetrics::default())
}
```
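Benefits: Each client owns a disjoint subset of actor IDs, so tests can scale to 10+ clients without workload instances issuing conflicting operations against the same actors.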
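It is worth checking that the modulo filter really gives each actor exactly one owner. The sketch below reproduces the filter with hypothetical helpers (`partition`, `partitions_are_disjoint_and_complete`) and assumes `client_id` is zero-based (0 to `total_clients - 1`); if client IDs were one-based instead, the last client would own nothing and actors with `i % total_clients == 0` would be left unowned.

```rust
use std::collections::HashSet;

/// Actor names owned by `client_id` (assumed 0-based) out of `total_clients`.
fn partition(num_actors: usize, client_id: usize, total_clients: usize) -> Vec<String> {
    (0..num_actors)
        .filter(|i| i % total_clients == client_id)
        .map(|i| format!("actor_{i}"))
        .collect()
}

/// True if every actor is owned by exactly one client.
fn partitions_are_disjoint_and_complete(num_actors: usize, total_clients: usize) -> bool {
    let mut seen = HashSet::new();
    let mut count = 0;
    for client_id in 0..total_clients {
        for name in partition(num_actors, client_id, total_clients) {
            count += 1;
            seen.insert(name);
        }
    }
    // No duplicates (count == seen.len()) and nothing missing (seen.len() == num_actors).
    count == num_actors && seen.len() == num_actors
}
```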
Simulation Test Setup
Basic Test Structure
```rust
#[test]
fn slow_simulation_my_workload() {
    let local_runtime = tokio::runtime::Builder::new_current_thread()
        .build_local(Default::default())
        .expect("Failed to build local runtime");

    local_runtime.block_on(async move {
        let report = SimulationBuilder::new()
            .use_random_config() // Enable chaos
            .set_iteration_control(
                IterationControl::UntilAllSometimesReached(10_000)
            )
            .register_workload("my_workload", my_workload)
            .run()
            .await;

        println!("{}", report);

        if !report.seeds_failing.is_empty() {
            panic!("Faulty seeds: {:?}", report.seeds_failing);
        }

        panic_on_assertion_violations(&report);
    });
}
```
Iteration Control Strategies
```rust
// Run until all sometimes_assert! statements succeed at least once
IterationControl::UntilAllSometimesReached(10_000)

// Fixed number of seeds (quick smoke test)
IterationControl::FixedCount(10)

// Debug specific failing seed
SimulationBuilder::new()
    .set_seed(12345)
    .set_iteration_control(IterationControl::FixedCount(1))
```
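The stopping rule behind `UntilAllSometimesReached` can be sketched as a loop over seeds that accumulates which `sometimes` assertions have fired and stops once all have, or when the iteration cap is hit. This is a hypothetical model of the behavior the first comment above describes, not the library's implementation: it assumes the numeric argument acts as an upper bound on iterations, seeds are simply 0, 1, 2, and so on, and `workload(seed)` reports the assertion names that held during that run.

```rust
use std::collections::HashSet;

/// Returns (iterations used, whether every name in `all` was reached).
fn until_all_sometimes_reached<F>(
    all: &[&'static str],
    max_iterations: u64,
    mut workload: F,
) -> (u64, bool)
where
    F: FnMut(u64) -> Vec<&'static str>,
{
    let mut reached: HashSet<&'static str> = HashSet::new();
    for seed in 0..max_iterations {
        reached.extend(workload(seed));
        if all.iter().all(|name| reached.contains(name)) {
            return (seed + 1, true); // stop early: full coverage reached
        }
    }
    (max_iterations, false) // cap hit; some sometimes-assertions never fired
}
```

Usage: a rare branch that only fires on one seed in seven extends the run until it is seen.

```rust
let (iters, ok) = until_all_sometimes_reached(&["common", "rare"], 100, |seed| {
    let mut hit = vec!["common"];
    if seed % 7 == 6 {
        hit.push("rare");
    }
    hit
});
```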
Debugging Failed Seeds
When a seed fails:
- Capture the seed: note it from the error output
- Replay that single seed with detailed logging:
```rust
// Verbose output for the replay (uses the tracing and tracing-subscriber crates)
let _ = tracing_subscriber::fmt()
    .with_max_level(tracing::Level::DEBUG)
    .try_init();

let report = SimulationBuilder::new()
    .set_seed(failing_seed)
    .set_iteration_control(IterationControl::FixedCount(1))
    // ... register the same workloads as the failing test ...
    .run()
    .await;
```
- Examine the error: read the stack trace and error message
- Fix the root cause: don't just work around the symptom
- Re-enable chaos: verify the fix under full randomness
Integration Checklist
When creating a new simulation workload:
- Define operation alphabet (enum) with all possible operations
- Implement `execute_operation()` matching on each variant
- Add `sometimes_assert!` for coverage tracking (see: using-chaos-assertions skill)
- Add `always_assert!` for safety invariants
- Generate 500-2000 concurrent operations
- Shuffle operations for randomness
- Execute operations concurrently via `spawn_task`
- Add strategic `buggify!` calls (see: using-buggify skill)
- Validate final state properties
- Give the test name a `slow_simulation` prefix
- Set a timeout in `.config/nextest.toml` (240s recommended)
- Start with 1x1 topology, scale to 2x2, then 10x10
- Use `UntilAllSometimesReached(10_000)` for comprehensive coverage
Practical Guidelines
Start Small, Scale Up
```rust
// Phase 1: Basic alphabet (5-10 operations)
enum BasicOps {
    Activate(ActorId),
    Deactivate(ActorId),
    SendMessage(ActorId, Msg),
}

// Phase 2: Add complexity incrementally

// Phase 3: Increase concurrency gradually
// Start: 100 operations
// Then: 500 operations
// Finally: 1000+ operations
```
Balance Exploration vs Test Time
```rust
// Quick smoke test (development)
let num_ops = 100;
let iterations = IterationControl::FixedCount(10);

// Thorough testing (CI)
let num_ops = 500;
let iterations = IterationControl::UntilAllSometimesReached(1000);

// Comprehensive (nightly)
let num_ops = 1000;
let iterations = IterationControl::UntilAllSometimesReached(10_000);
```
Debug Strategy
When tests fail:
- Capture seed: from the error output
- Reduce operations: set `let num_ops = 10;` to simplify the trace
- Add logging: trace each operation as it executes
- Replay deterministically: `set_seed(failing_seed)`
- Binary search: shrink the operation count until the bug disappears, then analyze the smallest count that still fails
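The binary-search step can be automated when the failure is monotonic in the operation count. The helper below is a hypothetical sketch: `fails(n)` would rerun the failing seed with `n` operations and report whether it still fails. Changing the operation count may also change the random stream a seed produces, so the predicate is not guaranteed to be monotonic in practice; treat the result as a starting point for analysis.

```rust
/// Smallest `n` in `1..=max_ops` with `fails(n) == true`, assuming that once the bug
/// appears at some count it appears at every larger count. None if `fails(max_ops)` is false.
fn minimal_failing_ops<F: FnMut(usize) -> bool>(max_ops: usize, mut fails: F) -> Option<usize> {
    if max_ops == 0 || !fails(max_ops) {
        return None;
    }
    // Invariant: fails(hi) is true and the answer lies in lo..=hi.
    let (mut lo, mut hi) = (1, max_ops);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if fails(mid) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    Some(hi)
}
```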
Key Takeaways
- Think in operations, not scenarios: Define alphabet, let simulation explore
- Properties over test cases: Make assumptions explicit as assertions
- Massive concurrency: Dump the bucket, don't drop one disc
- Validate throughout: Assertions guide exploration
- Randomness is key: Both data AND sequences
- Start simple, scale up: Gradual complexity increase
The goal is autonomous state space exploration, finding bugs you couldn't imagine!
Additional Resources
See separate reference files:
- EXAMPLES.md: Three complete workload examples (Bank Account, Directory, MessageBus)
- PATTERNS.md: Detailed operation alphabet implementations
- VERIFICATION.md: Deep dive into the three verification patterns