Parallel Computing
Use this skill to convert parallel performance work into reproducible scaling decisions.
Workflow
- •Define scaling objective and constraints.
- •Capture workload shape, data size, and latency/throughput targets.
- •Define hardware assumptions (core count, SMT policy, NUMA context).
- •Choose parallel model and partitioning.
- •Select task/data/pipeline parallelism intentionally.
- •Set chunk size and scheduling strategy to minimize overhead and imbalance.
- •Define shared-state boundaries before coding.
- •Diagnose bottlenecks.
- •Check lock contention, false sharing, synchronization frequency, and memory bandwidth pressure.
- •Separate algorithmic limits from runtime/scheduler overhead.
- •Validate scaling behavior.
- •Compare baseline vs current throughput by thread count.
- •Evaluate parallel efficiency and regressions at each thread level.
- •Treat regressions above threshold as blockers.
- •Deliver implementation handoff.
- •Include tuning deltas, tradeoffs, and reproducible benchmark commands.
- •Provide clear patch plan for runtime/algorithm changes.
Commands
bash
python3 scripts/compare_parallel_scaling.py \ --baseline <baseline.json> \ --current <current.json> \ --regression-threshold-pct 5 \ --efficiency-drop-threshold-pct 10
Treat non-zero exits as blocker regressions.
Output Contract
Return:
- •
Scaling Context: workload and hardware assumptions. - •
Findings: thread-level throughput/speedup/efficiency deltas. - •
Optimization Plan: concrete runtime/algorithm changes. - •
Verification: benchmark commands and thresholds. - •
Residual Risks: unresolved contention or scaling ceilings.
References
- •
references/workflow.md: detailed parallel optimization sequence. - •
references/scaling-playbook.md: common bottlenecks and remedies. - •
references/signoff-template.md: concise scaling sign-off format.
Execution Rules
- •Compare like-for-like workloads and environments only.
- •Report both speedup and efficiency, not throughput alone.
- •Flag thread-level regressions above thresholds as blockers.
- •Avoid overfitting to one thread count; evaluate full scaling curve.