Traditional sequential execution wastes time:
- /optimize runs 5 quality checks sequentially (10-15 minutes)
- /ship runs 5 pre-flight checks sequentially (8-12 minutes)
- /implement processes tasks one by one even when they have no dependencies
- Prototype screens are generated sequentially when all could run in parallel
This skill analyzes operation dependencies, groups independent work into batches, and orchestrates parallel execution by issuing multiple Task() agent calls in a single message. The result: phases typically complete 2-5x faster without compromising quality or correctness. </objective>
<quick_start> <basic_pattern> When you detect multiple independent operations, send a single message with multiple tool calls:
Sequential (slow):
- Send message with Task call for security-sentry
- Wait for response
- Send message with Task call for performance-profiler
- Wait for response
- Send message with Task call for accessibility-auditor
- Total: 15 minutes
Parallel (fast):
- Send ONE message with 3 Task calls (security-sentry, performance-profiler, accessibility-auditor)
- All three run concurrently
- Total: 5 minutes </basic_pattern>
<immediate_use_cases>
- /optimize phase: Run 5 quality checks in parallel (security, performance, accessibility, code-review, type-safety)
- /ship pre-flight: Run 5 deployment checks in parallel (env-vars, build, docker, CI-config, dependency-audit)
- /implement: Process independent task batches in parallel layers
- Design variations: Generate multiple mockup variations concurrently
- Research phase: Fetch multiple documentation sources concurrently </immediate_use_cases> </quick_start>
Scan the current phase for operations that:
- Read different files/data sources
- Don't modify shared state
- Have no sequential dependencies
- Can produce results independently
Examples:
- Quality checks (security scan + performance test + accessibility audit)
- File reads (spec.md + plan.md + tasks.md)
- API documentation fetches (Stripe docs + Twilio docs + SendGrid docs)
- Test suite runs (unit tests + integration tests + E2E tests) </step>
Build a dependency graph:
- Layer 0: Operations with no dependencies (can run immediately)
- Layer 1: Operations depending only on Layer 0 outputs
- Layer 2: Operations depending on Layer 1 outputs
- etc.
Example (/optimize):
Layer 0 (parallel):
- security-sentry (reads codebase)
- performance-profiler (reads codebase + runs benchmarks)
- accessibility-auditor (reads UI components)
- type-enforcer (reads TypeScript files)
- dependency-curator (reads package.json)
Layer 1 (after Layer 0):
- Generate optimization-report.md (combines all Layer 0 results)
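The layering above can be computed mechanically once each operation's dependencies are written down. The following is a minimal sketch, not the skill's actual implementation: the `build_layers` helper and the `optimization-report` key are illustrative names, and the dependency map simply restates the /optimize example.

```python
def build_layers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group operations into layers so each layer depends only on earlier layers.

    `deps` maps an operation name to the names it depends on (hypothetical format).
    """
    remaining = {op: set(needed) for op, needed in deps.items()}
    done: set[str] = set()
    layers: list[list[str]] = []
    while remaining:
        # An operation is ready once all of its dependencies have completed.
        ready = sorted(op for op, needed in remaining.items() if needed <= done)
        if not ready:
            # Nothing can start: a cycle, or a dependency that was never declared.
            raise ValueError(f"unresolvable dependencies among: {sorted(remaining)}")
        layers.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return layers


checks = {
    "security-sentry",
    "performance-profiler",
    "accessibility-auditor",
    "type-enforcer",
    "dependency-curator",
}
optimize_deps: dict[str, set[str]] = {name: set() for name in checks}
optimize_deps["optimization-report"] = set(checks)  # needs every Layer 0 result

layers = build_layers(optimize_deps)
for index, layer in enumerate(layers):
    print(f"Layer {index}: {', '.join(layer)}")
```

Each inner list corresponds to one batch: everything in it can be launched together, and the next list only starts after the previous one has finished.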
Create batches for each layer:
- All Layer 0 operations in single message (parallel execution)
- Wait for Layer 0 completion
- All Layer 1 operations in single message
- Continue through layers
Batch size considerations:
- Optimal: 3-5 operations per batch (balanced parallelism)
- Maximum: 8 operations (avoid overwhelming system)
- Minimum: 2 operations (below 2, parallelism has no benefit) </step>
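When a layer holds more operations than the maximum batch size, it has to be split into several batches. One possible way to do that is sketched below; `split_into_batches` is a hypothetical helper, and the even-split strategy is an assumption (the text above only fixes the size limits, not how to divide an oversized layer).

```python
import math


def split_into_batches(ops: list[str], max_batch: int = 8) -> list[list[str]]:
    """Split one layer into evenly sized batches of at most `max_batch` operations."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    if not ops:
        return []
    # Use the fewest batches that respect the limit, then spread the work evenly
    # so no batch ends up with a single leftover operation when that is avoidable.
    n_batches = math.ceil(len(ops) / max_batch)
    size = math.ceil(len(ops) / n_batches)
    return [ops[i:i + size] for i in range(0, len(ops), size)]


tasks = [f"T{n:03d}" for n in range(1, 21)]  # 20 independent tasks
batches = split_into_batches(tasks)
print([len(batch) for batch in batches])
```

Splitting evenly (7, 7, 6 for 20 tasks) rather than greedily (8, 8, 4) keeps every batch close to the same duration.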
Send a single message with multiple tool calls for each batch.
Critical requirements:
- Must be a single message with multiple tool use blocks
- Each tool call must be complete and independent
- Do not use placeholders or forward references
- Each agent must have all required context in its prompt
See references/execution-patterns.md for detailed examples. </step>
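As an analogy only, the difference between one message with several tool calls and several messages in a row resembles the difference between gathering coroutines and awaiting them one at a time. The sketch below uses Python's `asyncio` to illustrate that timing behaviour; it does not call the real Task tool, and `run_agent` is a hypothetical stand-in that just sleeps.

```python
import asyncio
import time


async def run_agent(name: str, seconds: float) -> str:
    """Hypothetical stand-in for one agent run; sleeps instead of doing work."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_sequentially(batch: dict[str, float]) -> list[str]:
    # Analogous to separate messages: each agent starts after the previous one ends.
    return [await run_agent(name, secs) for name, secs in batch.items()]


async def run_batch(batch: dict[str, float]) -> list[str]:
    # Analogous to ONE message with several tool calls: all agents start together.
    return await asyncio.gather(*(run_agent(n, s) for n, s in batch.items()))


def measure(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


batch = {
    "security-sentry": 0.1,
    "performance-profiler": 0.1,
    "accessibility-auditor": 0.1,
}
seq_results, seq_time = measure(run_sequentially(batch))
par_results, par_time = measure(run_batch(batch))
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Both variants return the same results in the same order; only the wall-clock time differs, which is the property a parallel batch is expected to preserve.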
<step number="5"> **Aggregate results**
After each batch completes:
- Collect results from all parallel operations
- Check for failures or blocking issues
- Decide whether to proceed to next layer
- Aggregate findings into unified report
Failure handling:
- If any operation blocks (critical security issue), halt pipeline
- If operations have warnings (minor performance issue), continue but log
- If operations fail (agent error), retry individually or escalate </step> </workflow>
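The failure-handling rules for a finished batch can be expressed as a small decision function. This is a sketch under assumptions: the `Status` values, the `Result` record, and the returned action names (`halt`, `retry`, `proceed`) are illustrative and not defined by the skill.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"  # e.g. minor performance issue
    BLOCKED = "blocked"  # e.g. critical security issue
    FAILED = "failed"    # e.g. agent error


@dataclass
class Result:
    operation: str
    status: Status
    detail: str = ""


def decide_next_step(results: list[Result]) -> tuple[str, list[str]]:
    """Return (action, operations the action refers to) for one completed batch."""
    blocked = [r.operation for r in results if r.status is Status.BLOCKED]
    if blocked:
        return "halt", blocked  # blocking issues stop the pipeline
    failed = [r.operation for r in results if r.status is Status.FAILED]
    if failed:
        return "retry", failed  # retry these individually, or escalate
    warned = [r.operation for r in results if r.status is Status.WARNING]
    return "proceed", warned    # continue, but log the warnings


batch_results = [
    Result("security-sentry", Status.OK),
    Result("performance-profiler", Status.WARNING, "slow endpoint"),
    Result("type-enforcer", Status.OK),
]
action, affected = decide_next_step(batch_results)
print(action, affected)
```

Blocking issues are checked first so that a critical finding is never masked by an unrelated agent error in the same batch.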
<phase_specific_patterns> <optimize_phase> Operation: Run 5 quality gates in parallel
Dependency graph:
Layer 0 (parallel - 5 operations):
1. security-sentry → Scan for vulnerabilities, secrets, auth issues
2. performance-profiler → Benchmark API endpoints, detect N+1 queries
3. accessibility-auditor → WCAG 2.1 AA compliance (if UI feature)
4. type-enforcer → TypeScript strict mode compliance
5. dependency-curator → npm audit, outdated packages
Layer 1 (sequential - 1 operation):
6. Generate optimization-report.md (aggregates Layer 0 findings)
Time savings:
- Sequential: ~15 minutes (3 min per check)
- Parallel: ~5 minutes (longest check + aggregation)
- Speedup: 3x
See references/optimize-phase-parallelization.md for implementation details. </optimize_phase>
<ship_preflight> Operation: Run 5 pre-flight checks in parallel
Dependency graph:
Layer 0 (parallel - 5 operations):
1. Check environment variables (read .env.example vs .env)
2. Validate production build (npm run build)
3. Check Docker configuration (docker-compose.yml, Dockerfile)
4. Validate CI configuration (.github/workflows/*.yml)
5. Run dependency audit (npm audit --production)
Layer 1 (sequential - 1 operation):
6. Update state.yaml with pre-flight results
Time savings:
- Sequential: ~12 minutes
- Parallel: ~6 minutes (the ~5-minute build is the longest operation, plus ~1 minute to update state)
- Speedup: 2x
See references/ship-preflight-parallelization.md. </ship_preflight>
<implement_phase> Operation: Execute independent task batches in parallel
Dependency analysis:
- Read tasks.md
- Build dependency graph from task relationships
- Identify tasks with no dependencies (Layer 0)
- Group tasks by layer
Example (10 tasks):
Layer 0 (4 tasks - parallel):
- T001: Create User model
- T002: Create Product model
- T005: Setup test framework
- T008: Create API client utility
Layer 1 (3 tasks - parallel, depend on Layer 0):
- T003: User CRUD endpoints (needs T001)
- T004: Product CRUD endpoints (needs T002)
- T009: Write User model tests (needs T001, T005)
Layer 2 (2 tasks - parallel):
- T006: User-Product relationship (needs T001, T002)
- T010: Write Product model tests (needs T002, T005)
Layer 3 (sequential):
- T007: Integration tests (needs all above)
Execution:
- Batch 1: Launch 4 agents for Layer 0 tasks (parallel)
- Batch 2: Launch 3 agents for Layer 1 tasks (parallel)
- Batch 3: Launch 2 agents for Layer 2 tasks (parallel)
- Batch 4: Single agent for Layer 3
Time savings:
- Sequential: 10 tasks × 20 min = 200 minutes (3 hours 20 minutes)
- Parallel: 4 batches × 30 min = 120 minutes (2 hours)
- Speedup: ~1.7x
See references/implement-phase-parallelization.md. </implement_phase>
<prototype_screens> Operation: Generate multiple prototype screens in parallel
Use case: User wants to create 3 different screens (login, dashboard, settings)
Sequential approach (slow):
- Generate login screen
- Generate dashboard screen
- Generate settings screen
- Total: 15 minutes
Parallel approach (fast):
- Launch 3 screen agents in single message (login, dashboard, settings)
- Total: 5 minutes (all generate concurrently)
Speedup: 3x
Note: All screens share theme.yaml for consistency. </prototype_screens> </phase_specific_patterns>
<dependency_analysis> <determining_independence> Two operations are independent if:
- Read-only access to shared resources: Both only read the same files (safe to parallelize)
- Disjoint file access: They read/write completely different files
- No temporal dependencies: Neither requires the other's output
- Order-independent operations: Running them in either order produces the same result
Two operations are dependent if:
- Read-after-write: Operation B reads a file that Operation A writes
- Write-after-write: Both write to the same file (race condition)
- Data dependency: Operation B needs Operation A's output as input
- Order-dependent side effects: Both operations modify shared state </determining_independence>
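The file-access rules above reduce to a check on read and write sets: two operations conflict when one writes something the other reads or writes. A minimal sketch follows; the `Operation` record and the file paths are hypothetical, and the check only covers file access, not hidden dependencies such as shared databases or git state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    reads: frozenset[str] = frozenset()
    writes: frozenset[str] = frozenset()


def are_independent(a: Operation, b: Operation) -> bool:
    """True when neither operation writes anything the other reads or writes."""
    conflicts = (
        (a.writes & b.reads)     # b reads what a writes
        | (a.reads & b.writes)   # a reads what b writes
        | (a.writes & b.writes)  # both write the same file
    )
    return not conflicts


# Hypothetical paths, for illustration only.
security = Operation("security-sentry", reads=frozenset({"src/"}))
profiler = Operation("performance-profiler", reads=frozenset({"src/"}))
generate = Operation("generate-user-model", writes=frozenset({"models/user.py"}))
write_tests = Operation(
    "write-user-tests",
    reads=frozenset({"models/user.py"}),
    writes=frozenset({"tests/test_user.py"}),
)

print(are_independent(security, profiler))
print(are_independent(generate, write_tests))
```

The two read-only checks are independent, while generating the model and writing its tests are not, because the tests read the file the generator writes.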
<common_patterns> Independent (safe to parallelize):
- Multiple quality checks reading codebase
- Multiple file reads (spec.md, plan.md, tasks.md)
- Multiple API documentation fetches
- Multiple test suite runs (if isolated)
- Multiple lint checks on different file types
Dependent (must sequence):
- Generate code → Run tests on generated code
- Fetch API docs → Generate client based on docs
- Write file → Read file back for validation
- Create database schema → Run migrations
- Build project → Deploy built artifacts </common_patterns>
<edge_cases> Shared mutable state: If operations modify the same git branch, database, or filesystem location, they CANNOT run in parallel safely.
Resource contention: Even if logically independent, operations competing for same resource (CPU, memory, network) may not see speedup. Monitor system resources.
Cascading failures: If one parallel operation fails and others depend on it indirectly, you may need to cancel or retry the batch. </edge_cases> </dependency_analysis>
<auto_trigger_conditions> <when_to_apply> Automatically apply parallel execution when you detect:
- Multiple quality checks: ≥3 independent checks in /optimize or /ship
- Multiple file reads: ≥3 files to read that don't depend on each other
- Multiple API calls: ≥2 external API documentation fetches
- Batch task processing: ≥5 tasks in /implement with identifiable layers
- Multiple test suites: Unit, integration, E2E running independently
- Multiple design variations: ≥2 mockup/prototype variants requested
- Multiple research queries: ≥3 web searches or documentation lookups </when_to_apply>
<when_not_to_apply> Do NOT parallelize when:
- Sequential dependencies exist: Operation B needs Operation A's output
- Shared state modification: Operations write to same files/database
- Small operation count: <2 independent operations (no benefit)
- Complex coordination needed: Results must be merged in specific order
- User explicitly requests sequential: "Do X, then Y, then Z" </when_not_to_apply>
<proactive_detection> Scan for these phrases in phase workflows:
- "Run these checks: A, B, C, D, E" → Parallel candidate
- "Generate variations: X, Y, Z" → Parallel candidate
- "Fetch documentation for: Service1, Service2, Service3" → Parallel candidate
- "Execute tasks: T001, T002, T003 (no dependencies)" → Parallel candidate
When detected, immediately analyze dependencies and propose parallel execution strategy. </proactive_detection> </auto_trigger_conditions>
<examples> <example name="/optimize-phase-parallelization"> **Context**: Running /optimize on a feature with UI components
Sequential execution (15 minutes):
1. Launch security-sentry (3 min)
2. Wait for completion
3. Launch performance-profiler (4 min)
4. Wait for completion
5. Launch accessibility-auditor (3 min)
6. Wait for completion
7. Launch type-enforcer (2 min)
8. Wait for completion
9. Launch dependency-curator (2 min)
10. Wait for completion
11. Aggregate results (1 min)
Total: 15 minutes
Parallel execution (5 minutes):
1. Launch 5 agents in SINGLE message:
   - security-sentry
   - performance-profiler
   - accessibility-auditor
   - type-enforcer
   - dependency-curator
2. All run concurrently (longest is 4 min)
3. Aggregate results (1 min)
Total: 5 minutes
Implementation: See examples/optimize-phase-parallel.md </example>
<example name="/ship-preflight-parallelization"> **Context**: Running pre-flight checks before deployment
Sequential execution (12 minutes):
1. Check env vars (1 min)
2. Run build (5 min)
3. Check Docker config (2 min)
4. Validate CI config (2 min)
5. Dependency audit (2 min)
Total: 12 minutes
Parallel execution (6 minutes):
1. Launch 5 checks in SINGLE message (all concurrent)
2. Longest operation is build (5 min)
3. Update workflow state (1 min)
Total: 6 minutes
Implementation: See examples/ship-preflight-parallel.md </example>
<example name="/implement-task-batching"> **Context**: 7 tasks with dependency graph in /implement phase
Task dependencies:
- T001 (User model) → no deps
- T002 (Product model) → no deps
- T003 (User endpoints) → depends on T001
- T004 (Product endpoints) → depends on T002
- T005 (User tests) → depends on T001, T003
- T006 (Product tests) → depends on T002, T004
- T007 (Integration tests) → depends on T003, T004
Parallel execution plan:
- Batch 1 (Layer 0): T001, T002 (parallel - 2 tasks)
- Batch 2 (Layer 1): T003, T004 (parallel - 2 tasks, wait for Batch 1)
- Batch 3 (Layer 2): T005, T006 (parallel - 2 tasks, wait for Batch 2)
- Batch 4 (Layer 3): T007 (sequential - 1 task, wait for Batch 3)
Time savings:
- Sequential: 7 tasks × 20 min = 140 minutes
- Parallel: 4 batches × 25 min = 100 minutes
- Speedup: 1.4x
Implementation: See examples/implement-batching-parallel.md </example> </examples>
<anti_patterns> <anti_pattern name="fake-parallelism"> Problem: Sending multiple messages rapidly, thinking they'll run in parallel
Wrong approach:
- Send message 1: Launch agent A
- Send message 2: Launch agent B
- Send message 3: Launch agent C
These execute sequentially because each message waits for the previous to complete.
Correct approach:
- Send ONE message with 3 tool calls (A, B, C)
Rule: Multiple tool calls in a SINGLE message = parallel. Multiple messages = sequential. </anti_pattern>
<anti_pattern name="ignoring-dependencies"> Problem: Parallelizing dependent operations, which causes race conditions
Wrong approach:
Parallel batch:
- Generate User model code
- Write tests for User model (needs generated code)
The second operation will fail because the code doesn't exist yet.
Correct approach:
- Batch 1 (sequential): Generate User model code
- Batch 2 (sequential): Write tests for User model
Rule: Always build the dependency graph first. Never parallelize dependent operations. </anti_pattern>
<anti_pattern name="over-parallelization"> Problem: Launching 20 agents in parallel, overwhelming system
Wrong approach:
- Launch 20 agents in single message (all tasks at once)
System resources exhausted, agents may fail or slow down dramatically.
Correct approach:
- Batch 1: Launch 5 agents (Layer 0)
- Batch 2: Launch 5 agents (Layer 1)
- Batch 3: Launch 5 agents (Layer 2)
- Batch 4: Launch 5 agents (Layer 3)
Rule: Keep batches to 3-8 operations. More layers is better than huge batches. </anti_pattern>
<anti_pattern name="parallelizing-trivial-operations"> Problem: Using parallel execution for operations taking <30 seconds each
Wrong approach:
Parallel batch:
- Read spec.md (5 seconds)
- Read plan.md (5 seconds)
Overhead of parallel coordination exceeds time savings.
Correct approach:
Sequential:
- Read spec.md
- Read plan.md
Rule: Only parallelize operations taking ≥1 minute each. Below that, sequential is fine. </anti_pattern> </anti_patterns>
<validation> <success_indicators> After applying parallel execution optimization:
- Wall-clock time reduced: Phase completes 2-5x faster than sequential baseline
- All operations successful: No failures due to race conditions or dependencies
- Results identical: Parallel execution produces same output as sequential
- No resource exhaustion: System handles parallel load without failures
- Clear dependency graph: Can explain why operations were grouped into specific batches </success_indicators>
<testing_approach> Before parallelization:
- Run phase sequentially
- Record total time
- Record all outputs (files, reports, state changes)
After parallelization:
- Run phase with parallel batches
- Record total time
- Record all outputs
Validate:
- Time reduced by expected factor (2-5x)
- Outputs identical (diff files, compare checksums)
- No errors or warnings introduced
- Workflow state updated correctly
Rollback if:
- Parallel version produces different outputs
- Failures or race conditions occur
- Time savings <20% (not worth complexity) </testing_approach> </validation>
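The validate and rollback checks can be automated by comparing checksums of the recorded outputs and computing the relative time savings. The following is a sketch under assumptions: outputs are represented as in-memory bytes keyed by file name, `validate_run` is a hypothetical helper, and SHA-256 is chosen for illustration (the text above only says to compare checksums).

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_run(
    seq_outputs: dict[str, bytes],
    par_outputs: dict[str, bytes],
    seq_minutes: float,
    par_minutes: float,
    min_savings: float = 0.20,
) -> list[str]:
    """Return reasons to roll back; an empty list means the parallel run is acceptable."""
    problems = []
    if seq_outputs.keys() != par_outputs.keys():
        problems.append("parallel run produced a different set of output files")
    for name in sorted(seq_outputs.keys() & par_outputs.keys()):
        if checksum(seq_outputs[name]) != checksum(par_outputs[name]):
            problems.append(f"{name} differs between sequential and parallel runs")
    savings = (seq_minutes - par_minutes) / seq_minutes
    if savings < min_savings:
        problems.append(f"time savings {savings:.0%} below {min_savings:.0%} threshold")
    return problems


baseline = {"optimization-report.md": b"no critical issues\n"}
candidate = {"optimization-report.md": b"no critical issues\n"}
problems = validate_run(baseline, candidate, seq_minutes=15, par_minutes=5)
print(problems or "parallel run validated")
```

Returning a list of reasons, rather than a single boolean, keeps the rollback decision explainable when more than one check fails.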
<reference_guides> For deeper topics, see reference files:
Execution patterns: references/execution-patterns.md
- Correct vs incorrect parallel execution patterns
- Message structure for parallel tool calls
- Handling tool call failures
Phase-specific guides:
- references/optimize-phase-parallelization.md
- references/ship-preflight-parallelization.md
- references/implement-phase-parallelization.md
Dependency analysis: references/dependency-analysis-guide.md
- Building dependency graphs
- Detecting hidden dependencies
- Handling edge cases
Troubleshooting: references/troubleshooting.md
- Common failures and fixes
- Performance not improving
- Race condition debugging </reference_guides>
<success_criteria> The parallel-execution-optimizer skill is successfully applied when:
- Dependency graph created: All operations analyzed for dependencies before execution
- Batches identified: Independent operations grouped into parallel execution batches
- Single message per batch: Each batch executed via ONE message with multiple tool calls
- Time savings achieved: 2-5x speedup compared to sequential execution
- Correctness maintained: Parallel execution produces identical results to sequential
- No race conditions: No failures due to shared state or missing dependencies
- Appropriate scope: Only applied when ≥2 operations taking ≥1 minute each
- Clear documentation: Execution plan explained (layers, batches, expected speedup) </success_criteria>