sweep-and-summarize

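When you need to run multiple random seeds and produce summary statistics, build a sweep runner and an aggregator that exactly match the paper's metrics (mean / confidence interval / smoothing).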

SKILL.md
--- frontmatter
name: sweep-and-summarize
description: Use when running many seeds and producing summary statistics. Generates a sweep runner and an aggregator that matches paper metrics (mean/CI/smoothing).

Run multiple seeds and summarize results.

Steps:

  1. Implement a sweep script (scripts/sweep.py or scripts/run_sweep.sh) that:
    • accepts a list of seeds, a parallelism level, and a config path
    • writes each seed's output to results/<exp_name>/seed_<k>/
  2. Implement scripts/aggregate.py that:
    • loads all seeds
    • aligns the x-axis exactly as the paper does (steps, episodes, or wall-clock time)
    • computes the mean plus CI or SEM, as specified
    • applies the same smoothing/binning rules as the paper, and documents them
    • writes results/<exp_name>/aggregate/curve.csv and summary.json
  3. Ensure failures are visible:
    • if a seed crashes, the aggregator reports it and continues with the remaining seeds
  4. Update docs/repro_spec.md with the exact aggregation definition.
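
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not a definitive scripts/aggregate.py. It assumes each seed directory holds a hypothetical metrics.csv with `step` and `value` columns, aligns seeds on the steps they all share, and uses a normal-approximation 95% interval (z = 1.96). The file name, the column names, the summary.json keys, and the interval rule are all assumptions; replace them with whatever the paper and docs/repro_spec.md actually define. No smoothing or binning is applied here.

```python
import csv
import json
import math
import statistics
from pathlib import Path


def load_seed(seed_dir: Path) -> dict[int, float]:
    """Read one seed's curve as {step: value}.

    The file name (metrics.csv) and its columns are assumptions for this sketch.
    """
    with open(seed_dir / "metrics.csv", newline="") as f:
        curve = {int(row["step"]): float(row["value"]) for row in csv.DictReader(f)}
    if not curve:
        raise ValueError("metrics.csv contains no rows")
    return curve


def aggregate(exp_dir: Path, z: float = 1.96) -> dict:
    """Aggregate every seed_<k>/ under exp_dir; record failed seeds and continue."""
    curves, failed = {}, {}
    for seed_dir in sorted(exp_dir.glob("seed_*")):
        try:
            curves[seed_dir.name] = load_seed(seed_dir)
        except (OSError, KeyError, TypeError, ValueError) as err:
            # A crashed or incomplete seed is reported, not silently dropped.
            failed[seed_dir.name] = f"{type(err).__name__}: {err}"

    # Keep only steps logged by every successful seed, so each mean is
    # computed over the same number of seeds.
    common = set.intersection(*[set(c) for c in curves.values()]) if curves else set()

    rows = []
    for step in sorted(common):
        values = [c[step] for c in curves.values()]
        mean = statistics.fmean(values)
        # SEM = sample standard deviation / sqrt(n); undefined for n = 1, so use 0.
        sem = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
        rows.append({"step": step, "mean": mean, "sem": sem,
                     "ci_low": mean - z * sem, "ci_high": mean + z * sem})

    out_dir = exp_dir / "aggregate"
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "curve.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["step", "mean", "sem", "ci_low", "ci_high"])
        writer.writeheader()
        writer.writerows(rows)

    summary = {"n_seeds_ok": len(curves),
               "failed_seeds": failed,
               "final_mean": rows[-1]["mean"] if rows else None}
    (out_dir / "summary.json").write_text(json.dumps(summary, indent=2))
    return summary
```

Calling `aggregate(Path("results") / "my_exp")` (with `my_exp` as a placeholder experiment name) would write aggregate/curve.csv and aggregate/summary.json. Listing failed seeds in summary.json keeps a crash visible even when the curve itself still looks plausible.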