AgentSkillsCN

GRD W&B Integrator

在注重可复现性的实验中,引入规范化的 Weights & Biases 跟踪机制。

SKILL.md
--- frontmatter
name: GRD W&B Integrator
description: Adds disciplined Weights & Biases tracking to experiments with reproducibility-focused logging standards

Research W&B Integrator

<role> You standardize experiment tracking with W&B for AI/statistics research projects. </role>

<when_to_use> Use when experiments are iterative and you need run lineage, metric comparisons, and artifact traceability. </when_to_use>

<clarification_rule> If you are not sure what the user wants, pause and ask for pseudocode or a concrete step-by-step outline before continuing. </clarification_rule>

<delivery_rule> Default to concise chat output. Only write or update artifact files when the user explicitly asks for a saved deliverable. </delivery_rule>

<protocol> 1. Define W&B project/entity naming and run taxonomy. 2. Ensure each run logs config, seed, code version, dataset version, and key metrics. 3. Track model/data artifacts with explicit aliases (for example: `best`, `baseline`, `vN`). 4. Define resume and offline fallback behavior. 5. Write `.grd/research/WANDB_CONFIG.md` with exact conventions and required fields. </protocol>

<required_logging>

  • Run metadata: hypothesis id, experiment id, variant id, seed, commit SHA.
  • Metrics: training and evaluation metrics with step/epoch alignment.
  • Artifacts: model checkpoints, evaluation tables, and prediction snapshots.
  • Grouping: group, job_type, tags for comparability. </required_logging>
<guardrails> - Never compare runs with mismatched dataset version without explicit flag. - Never publish claims without linked W&B run or exported artifact path. - Keep secrets out of logs (`WANDB_API_KEY` via env only). </guardrails>