AgentSkillsCN

cpu-gpu-performance

触发条件:性能 通过负载测量与经济高效的验证策略,监控并优化 CPU/GPU 使用情况。 触发条件:CPU 使用、GPU 使用、性能、负载监控、构建性能、训练、资源消耗、测试套件、编译 适用场景:会话开始时(自动加载并节约 Token)、规划可能使 CPU/GPU 占用超过 1 分钟的构建或训练任务,或重试资源密集型失败命令时使用。 禁忌场景:无需资源占用的简单操作。 禁忌场景:快速的单文件操作。 在资源密集型操作之前,务必先使用此技能。主动建立基准线。

SKILL.md
--- frontmatter
name: cpu-gpu-performance
description: |
  Triggers: performance
  Monitor and optimize CPU/GPU usage with load measurement and cost-effective
  validation strategies.

  Triggers: CPU usage, GPU usage, performance, load monitoring, build performance,
  training, resource consumption, test suite, compilation

  Use when: session starts (auto-load with token-conservation), planning builds
  or training that could pin CPUs/GPUs for >1 minute, retrying failed resource-heavy commands

  DO NOT use when: simple operations with no resource impact.
  DO NOT use when: quick single-file operations.

  Use this skill BEFORE resource-intensive operations. Establish baselines proactively.
location: plugin
token_budget: 400
progressive_loading: true
dependencies:
  hub: [token-conservation]
  modules: []

Table of Contents

CPU/GPU Performance Discipline

When to Use

  • At the beginning of every session (auto-load alongside token-conservation).
  • Whenever you plan to build, train, or test anything that could pin CPU cores or GPUs for more than a minute.
  • Before retrying a failing command that previously consumed significant resources.

Required TodoWrite Items

  1. cpu-gpu-performance:baseline
  2. cpu-gpu-performance:scope
  3. cpu-gpu-performance:instrument
  4. cpu-gpu-performance:throttle
  5. cpu-gpu-performance:log

Step 1: Establish Current Baseline

  • Capture current utilization:

    • uptime
    • ps -eo pcpu,cmd | head
    • nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv

    Note which hosts/GPUs are already busy.

  • Record any CI/cluster budgets (time quotas, GPU hours) before launching work.

  • Set a per-task CPU minute / GPU minute budget that respects those limits.

Step 2: Narrow the Scope

  • Avoid running "whole world" jobs after a small fix. Prefer diff-based or tag-based selective testing:
    • pytest -k
    • Bazel target patterns
    • cargo test <module>
  • Batch low-level fixes so you can validate multiple changes with a single targeted command.
  • For GPU jobs, favor unit-scale smoke inputs or lower epoch counts before scheduling the full training/eval sweep.

Step 3: Instrument Before You Optimize

  • Pick the right profiler/monitor:
    • CPU work:
      • perf
      • intel vtune
      • cargo flamegraph
      • language-specific profilers
    • GPU work:
      • nvidia-smi dmon
      • nsys
      • nvprof
      • DLProf
      • framework timeline tracers
  • Capture kernel/ops timelines, memory footprints, and data pipeline latency so you have evidence when throttling or parallelizing.
  • Record hot paths + I/O bottlenecks in notes so future reruns can jump straight to the culprit.

Step 4: Throttle and Sequence Work

  • Use nice, ionice, or Kubernetes/Slurm quotas to prevent starvation of shared nodes.
  • Chain heavy tasks with guardrails:
    • Rerun only the failed test/module
    • Then (optionally) escalate to the next-wider shard
    • Reserve the full suite for the final gate
  • Stagger GPU kernels (smaller batch sizes or gradient accumulation) when memory pressure risks eviction; prefer checkpoint/restore over restarts.

Step 5: Log Decisions and Next Steps

Conclude by documenting the commands that were run and their resource cost (duration, CPU%, GPU%), confirming whether they remained within the per-task budget. If a full suite or long training run was necessary, justify why selective or staged approaches were not feasible. Capture any follow-up tasks, such as adding a new test marker or profiling documentation, to streamline future sessions.

Output Expectations

  • Brief summary covering:
    • baseline metrics
    • scope chosen
    • instrumentation captured
    • throttling tactics
    • follow-up items
  • Concrete example(s) of what ran (e.g.):
    • "reran pytest tests/test_orders.py -k test_refund instead of pytest -m slow"
    • "profiled nvidia-smi dmon output to prove GPU idle time before scaling"

Troubleshooting

Common Issues

Command not found Ensure all dependencies are installed and in PATH

Permission errors Check file permissions and run with appropriate privileges

Unexpected behavior Enable verbose logging with --verbose flag