W&B Tracker
Use this skill to instrument and manage experiment tracking with Weights & Biases.
Environment
Required:
- •WANDB_API_KEY
Run Initialization
Initialize each run with:
- •project name
- •entity/workspace
- •run name
- •tags (model scale, dataset version, phase, framework)
- •config snapshot (hyperparameters + data/tokenizer/model ids)
Ensure runs are reproducible and easy to filter/compare.
Metric Logging
Log metrics at consistent intervals:
- •train loss
- •eval/validation loss
- •learning rate
- •throughput (tokens/sec or samples/sec)
- •grad norm (if available)
- •GPU utilization and memory (if available)
Align metric names across runs for clean comparisons.
Artifact Versioning
Track artifacts with explicit versions and lineage:
- •model checkpoints
- •tokenizer artifacts
- •dataset manifests/snapshots
- •evaluation reports
Include metadata linking artifacts to run id, step, and source config.
Hyperparameter Tuning Comparison
Support side-by-side run comparison for tuning cycles:
- •summarize best-performing runs by target metric
- •identify stable vs unstable training regimes
- •surface cost/performance tradeoffs for candidate configs
Output concise recommendation on next experiment set.
Deliverables
- •
wandb_setup.md(project/entity/tagging conventions) - •
experiment_tracking_plan.md(metrics + logging cadence) - •
run_comparison.md(tuning summary + recommended next run)