Weights & Biases

Monitor, analyze, and compare W&B training runs.

Setup

bash

wandb login
# Or set the WANDB_API_KEY environment variable

Scripts

Characterize a Run (Full Health Analysis)

bash

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/characterize_run.py ENTITY/PROJECT/RUN_ID

Analyzes:

•Loss curve trend (start → current, % change, direction)
•Gradient norm health (exploding/vanishing detection)
•Eval metrics (if present)
•Stall detection (heartbeat age)
•Progress & ETA estimate
•Config highlights
•Overall health verdict

Options: --json for machine-readable output.
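
The loss-trend item above (start → current, % change, direction) can be illustrated with a small sketch. This is not the code in characterize_run.py; the function name loss_trend and the flat_tolerance_pct cutoff are hypothetical, chosen only to show the arithmetic.

```python
import math


def loss_trend(losses, flat_tolerance_pct=1.0):
    """Summarize a loss curve as start, current, percent change, and direction.

    flat_tolerance_pct is a hypothetical cutoff, not a value taken from the script.
    """
    if not losses:
        raise ValueError("need at least one loss value")
    start, current = losses[0], losses[-1]
    # Percent change relative to the starting loss; undefined when start is 0.
    if start == 0:
        pct_change = float("nan")
    else:
        pct_change = (current - start) / abs(start) * 100.0

    if math.isnan(pct_change):
        direction = "unknown"
    elif pct_change < -flat_tolerance_pct:
        direction = "decreasing"
    elif pct_change > flat_tolerance_pct:
        direction = "increasing"
    else:
        direction = "flat"
    return {
        "start": start,
        "current": current,
        "pct_change": pct_change,
        "direction": direction,
    }


summary = loss_trend([2.0, 1.5, 1.0])
print(summary["direction"], f"{summary['pct_change']:+.1f}%")
```

Comparing only the first and last values is sensitive to noise; averaging a few points at each end would be a reasonable refinement.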

Watch All Running Jobs

bash

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/watch_runs.py ENTITY [--projects p1,p2]

Prints a quick health summary of all running jobs, plus recently failed and completed runs. Ideal for morning briefings.

Options:

•--projects p1,p2 — Specific projects to check
•--all-projects — Check all projects
•--hours N — Hours to look back for finished runs (default: 24)
•--json — Machine-readable output
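
To call the script from Python, the options above can be assembled into an argument list. The sketch below is a minimal example: the helper names build_watch_command and watch_runs are hypothetical, and it assumes that --json prints a single JSON document on stdout. The structure of that JSON is not documented here, so the result is returned undecoded beyond json.loads.

```python
import json
import os
import subprocess

# Interpreter and script paths copied from the usage line above.
PYTHON = os.path.expanduser("~/clawd/venv/bin/python3")
SCRIPT = os.path.expanduser("~/clawd/skills/wandb/scripts/watch_runs.py")


def build_watch_command(entity, projects=None, all_projects=False,
                        hours=None, as_json=True):
    """Build the argv list for watch_runs.py from the documented options."""
    cmd = [PYTHON, SCRIPT, entity]
    if projects:
        cmd += ["--projects", ",".join(projects)]
    if all_projects:
        cmd.append("--all-projects")
    if hours is not None:
        cmd += ["--hours", str(hours)]
    if as_json:
        cmd.append("--json")
    return cmd


def watch_runs(entity, **options):
    """Run the script and parse its stdout as JSON (assumed: one JSON document)."""
    result = subprocess.run(
        build_watch_command(entity, **options),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


cmd = build_watch_command("my-team", projects=["p1", "p2"], hours=48)
print(" ".join(cmd[2:]))
```

Passing the arguments as a list avoids shell quoting issues with project names.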

Compare Two Runs

bash

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/compare_runs.py ENTITY/PROJECT/RUN_A ENTITY/PROJECT/RUN_B

Side-by-side comparison:

•Config differences (highlights important params)
•Loss curves at same steps
•Gradient norm comparison
•Eval metrics
•Performance (tokens/sec, steps/hour)
•Winner verdict
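
Comparing loss curves "at same steps" means restricting both runs to the steps they have in common. A minimal sketch of that alignment follows; it is an illustration rather than the logic in compare_runs.py, and it assumes each history has already been reduced to a dict mapping step to loss. The name align_on_common_steps is hypothetical.

```python
def align_on_common_steps(history_a, history_b):
    """Pair up values recorded at steps present in both runs.

    Each history is a dict mapping step -> loss.
    Returns (step, loss_a, loss_b, loss_b - loss_a) tuples sorted by step.
    """
    common_steps = sorted(set(history_a) & set(history_b))
    return [
        (step, history_a[step], history_b[step], history_b[step] - history_a[step])
        for step in common_steps
    ]


# Made-up example data: run B has no step 200, run A has no step 300.
run_a = {0: 2.5, 100: 1.75, 200: 1.5}
run_b = {0: 2.5, 100: 1.5, 300: 1.0}

rows = align_on_common_steps(run_a, run_b)
for step, loss_a, loss_b, delta in rows:
    print(f"step {step:>5}: A={loss_a:.3f}  B={loss_b:.3f}  delta={delta:+.3f}")
```

If the two runs log at different intervals and share few steps, interpolating onto a common grid is an alternative to exact matching.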

Python API Quick Reference

python

import wandb
api = wandb.Api()

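# Get runs in a project, filtered by state
runs = api.runs("entity/project", filters={"state": "running"})

# Get a single run by its path
run = api.run("entity/project/run_id")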

# Run properties
run.state      # running | finished | failed | crashed | killed
run.name       # display name
run.id         # unique identifier
run.summary    # final/current metrics
run.config     # hyperparameters
run.heartbeat_at # stall detection

# Get history (only rows where all requested keys were logged)
history = list(run.scan_history(keys=["train/loss", "train/grad_norm"]))

Metric Key Variations

Scripts handle these automatically:

•Loss: train/loss, loss, train_loss, training_loss
•Gradients: train/grad_norm, grad_norm, gradient_norm
•Steps: train/global_step, global_step, step, _step
•Eval: eval/loss, eval_loss, eval/accuracy, eval_acc
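
One way to handle these variations is an alias table that is tried in order. The sketch below is an assumption about how such a lookup could work, not the scripts' actual code: the names METRIC_ALIASES and resolve_key are hypothetical, the priority order is taken to be the order listed above, and the eval keys are split into loss and accuracy groups for illustration.

```python
# Alias lists copied from the bullets above; grouping and order are assumptions.
METRIC_ALIASES = {
    "loss": ["train/loss", "loss", "train_loss", "training_loss"],
    "grad_norm": ["train/grad_norm", "grad_norm", "gradient_norm"],
    "step": ["train/global_step", "global_step", "step", "_step"],
    "eval_loss": ["eval/loss", "eval_loss"],
    "eval_accuracy": ["eval/accuracy", "eval_acc"],
}


def resolve_key(row, metric):
    """Return the first alias for `metric` that has a non-None value in `row`."""
    for key in METRIC_ALIASES[metric]:
        if row.get(key) is not None:
            return key
    return None


# A made-up history row or summary dict.
row = {"loss": 0.9, "_step": 10, "eval_acc": 0.81}

loss_key = resolve_key(row, "loss")
step_key = resolve_key(row, "step")
print(loss_key, step_key, resolve_key(row, "grad_norm"))
```

Resolving the key once per run and reusing it avoids repeating the lookup for every history row.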

Health Thresholds

•Gradient norm > 10: Exploding (critical)
•Gradient norm > 5: Spiky (warning)
•Gradient norm < 0.0001: Vanishing (warning)
•Heartbeat age > 30 min: Stalled (critical)
•Heartbeat age > 10 min: Slow (warning)
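
The thresholds above translate directly into two small classifiers. This is a sketch under stated assumptions: the function names and the "ok" labels are hypothetical, and the heartbeat is assumed to be an ISO 8601 timestamp string that is in UTC when it carries no offset.

```python
from datetime import datetime, timezone


def classify_grad_norm(grad_norm):
    """Map a gradient norm to (severity, label) using the thresholds above."""
    if grad_norm > 10:
        return ("critical", "exploding")
    if grad_norm > 5:
        return ("warning", "spiky")
    if grad_norm < 0.0001:
        return ("warning", "vanishing")
    return ("ok", "healthy")


def heartbeat_age_minutes(heartbeat_at, now=None):
    """Minutes elapsed since an ISO 8601 heartbeat timestamp.

    Assumption: timestamps without an offset are UTC.
    """
    ts = datetime.fromisoformat(heartbeat_at.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - ts).total_seconds() / 60.0


def classify_heartbeat(age_minutes):
    """Map heartbeat age in minutes to (severity, label)."""
    if age_minutes > 30:
        return ("critical", "stalled")
    if age_minutes > 10:
        return ("warning", "slow")
    return ("ok", "alive")


# Fixed "now" so the example is reproducible.
now = datetime(2024, 1, 1, 12, 45, tzinfo=timezone.utc)
age = heartbeat_age_minutes("2024-01-01T12:00:00Z", now=now)
print(age, classify_heartbeat(age), classify_grad_norm(7.5))
```

The critical checks run before the warning checks so that a value above both thresholds is reported at the higher severity.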

Integration Notes

For morning briefings, use watch_runs.py --json and parse the output.

For detailed analysis of a specific run, use characterize_run.py.

For A/B testing or hyperparameter comparisons, use compare_runs.py.