parallel-computing

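Design, optimize, and validate parallel execution across CPU threads/workers, backed by measurable scaling evidence. Use this skill when selecting a parallelization strategy, diagnosing contention and load imbalance, evaluating speedup/efficiency curves, tuning task granularity, or triaging baseline-vs-current parallel performance regressions.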

SKILL.md

--- frontmatter

name: parallel-computing
description: Design, optimize, and validate parallel execution across CPU threads/workers with measurable scaling evidence. Use when selecting parallelization strategy, diagnosing contention and load imbalance, evaluating speedup/efficiency curves, tuning task granularity, or triaging baseline-vs-current parallel performance regressions.

Parallel Computing

Use this skill to convert parallel performance work into reproducible scaling decisions.

Workflow

1. Define scaling objective and constraints.
   • Capture workload shape, data size, and latency/throughput targets.
   • Define hardware assumptions (core count, SMT policy, NUMA context).

2. Choose parallel model and partitioning.
   • Select task/data/pipeline parallelism intentionally.
   • Set chunk size and scheduling strategy to minimize overhead and imbalance.
   • Define shared-state boundaries before coding.

3. Diagnose bottlenecks.
   • Check lock contention, false sharing, synchronization frequency, and memory bandwidth pressure.
   • Separate algorithmic limits from runtime/scheduler overhead.

4. Validate scaling behavior.
   • Compare baseline vs current throughput by thread count.
   • Evaluate parallel efficiency and regressions at each thread level.
   • Treat regressions above threshold as blockers.

5. Deliver implementation handoff.
   • Include tuning deltas, tradeoffs, and reproducible benchmark commands.
   • Provide clear patch plan for runtime/algorithm changes.
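The validation step relies on two derived quantities: speedup (throughput relative to the 1-thread run) and parallel efficiency (speedup divided by thread count). A minimal sketch of that arithmetic follows. The names `ScalingPoint` and `scaling_table` are hypothetical and not part of this skill's scripts, and the input is assumed to be a plain mapping from thread count to measured throughput.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPoint:
    threads: int
    throughput: float  # any consistent unit, e.g. items/second
    speedup: float     # throughput relative to the 1-thread run
    efficiency: float  # speedup / threads; 1.0 means ideal linear scaling


def scaling_table(throughput_by_threads: dict[int, float]) -> list[ScalingPoint]:
    """Derive speedup and efficiency for each measured thread count."""
    if 1 not in throughput_by_threads:
        raise ValueError("a 1-thread measurement is required as the reference")
    base = throughput_by_threads[1]
    if base <= 0:
        raise ValueError("1-thread throughput must be positive")

    points = []
    for threads in sorted(throughput_by_threads):
        tput = throughput_by_threads[threads]
        speedup = tput / base
        points.append(ScalingPoint(threads, tput, speedup, speedup / threads))
    return points


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real system.
    measured = {1: 100.0, 2: 190.0, 4: 340.0, 8: 520.0}
    for p in scaling_table(measured):
        print(f"{p.threads:>2} threads  {p.throughput:8.1f}/s  "
              f"speedup {p.speedup:5.2f}  efficiency {p.efficiency:6.1%}")
```

Reporting efficiency next to throughput makes it visible when added threads still raise throughput but each contributes less, which throughput alone tends to hide.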

Commands

```bash
python3 scripts/compare_parallel_scaling.py \
  --baseline <baseline.json> \
  --current <current.json> \
  --regression-threshold-pct 5 \
  --efficiency-drop-threshold-pct 10
```

Treat non-zero exits as blocker regressions.
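To illustrate the kind of check such thresholds imply, here is a hedged sketch. It is not the implementation of `compare_parallel_scaling.py`: the function names are hypothetical, the input shape (thread count mapped to throughput) is an assumption, and both thresholds are interpreted here as relative percentage drops, which the real script may define differently.

```python
from __future__ import annotations


def efficiency(throughput: dict[int, float], threads: int) -> float:
    """Parallel efficiency: speedup over the 1-thread run, divided by threads."""
    return throughput[threads] / throughput[1] / threads


def find_blockers(
    baseline: dict[int, float],
    current: dict[int, float],
    regression_threshold_pct: float = 5.0,
    efficiency_drop_threshold_pct: float = 10.0,
) -> list[str]:
    """Return one message per thread count that breaches a threshold."""
    if 1 not in baseline or 1 not in current:
        raise ValueError("both runs need a 1-thread measurement")

    blockers = []
    # Only thread counts present in both runs are comparable like-for-like.
    for threads in sorted(set(baseline) & set(current)):
        tput_drop = (baseline[threads] - current[threads]) / baseline[threads] * 100
        if tput_drop > regression_threshold_pct:
            blockers.append(f"{threads} threads: throughput down {tput_drop:.1f}%")

        eff_base = efficiency(baseline, threads)
        eff_drop = (eff_base - efficiency(current, threads)) / eff_base * 100
        if eff_drop > efficiency_drop_threshold_pct:
            blockers.append(f"{threads} threads: efficiency down {eff_drop:.1f}%")
    return blockers


def check(baseline: dict[int, float], current: dict[int, float]) -> int:
    """Print any blockers and return a process-style exit code (0 = pass)."""
    blockers = find_blockers(baseline, current)
    for message in blockers:
        print("BLOCKER:", message)
    return 1 if blockers else 0
```

A command-line wrapper would pass the return value of `check` to `sys.exit`, so that a CI job fails whenever any thread level breaches a threshold.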

Output Contract

Return:

•Scaling Context: workload and hardware assumptions.
•Findings: thread-level throughput/speedup/efficiency deltas.
•Optimization Plan: concrete runtime/algorithm changes.
•Verification: benchmark commands and thresholds.
•Residual Risks: unresolved contention or scaling ceilings.

References

•references/workflow.md: detailed parallel optimization sequence.
•references/scaling-playbook.md: common bottlenecks and remedies.
•references/signoff-template.md: concise scaling sign-off format.

Execution Rules

•Compare like-for-like workloads and environments only.
•Report both speedup and efficiency, not throughput alone.
•Flag thread-level regressions above thresholds as blockers.
•Avoid overfitting to one thread count; evaluate full scaling curve.
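One way to read a full scaling curve rather than a single point is the Karp-Flatt metric, which estimates the serial fraction e = (1/S - 1/p) / (1 - 1/p) from the measured speedup S at p threads. This metric is not named in this skill and is offered only as an optional aid; the function names below are hypothetical.

```python
from __future__ import annotations


def karp_flatt(speedup: float, threads: int) -> float:
    """Experimentally determined serial fraction at a given thread count."""
    if threads < 2:
        raise ValueError("the metric is undefined for fewer than 2 threads")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1 / speedup - 1 / threads) / (1 - 1 / threads)


def serial_fraction_curve(speedup_by_threads: dict[int, float]) -> dict[int, float]:
    """Apply the metric to every measured thread count above 1."""
    return {
        threads: karp_flatt(speedup, threads)
        for threads, speedup in sorted(speedup_by_threads.items())
        if threads > 1
    }
```

As a rough guide, a serial fraction that stays about constant as threads increase points to an algorithmic limit, while one that grows with thread count suggests rising overhead such as synchronization, contention, or scheduling cost.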