Check Results
Comprehensive tool for fetching, analyzing, and comparing WandB experiment results across branches. This is the primary tool for evaluating experiments and selecting winners.
Subcommands
status — Run status overview
python .claude/skills/check-results/check_results.py --project <project> status [--branch <name>]
Shows state (running, finished, crashed, etc.) for all branches or a specific one.
summary — Summary metrics
python .claude/skills/check-results/check_results.py --project <project> summary [--metrics loss,accuracy]
Fetches final summary metrics for all finished runs. Auto-detects numeric metrics if none specified.
history — Full metric history
python .claude/skills/check-results/check_results.py --project <project> history --branch <name> [--metrics loss,accuracy] [--rows 40]
Fetches complete step-by-step metric history for a branch's latest run. Useful for analyzing training curves.
compare — Side-by-side comparison
python .claude/skills/check-results/check_results.py --project <project> compare [--branches a,b,c] [--metrics loss,accuracy]
Compares branches on each metric, identifies the best branch per metric (auto-detecting whether lower or higher is better), and shows how many metrics each branch wins.
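The per-metric winner and win-count logic described above can be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the function name `compare_branches`, the shape of the `results` dictionary, and the branch names and values are all made up for the example, and the metric direction is passed in explicitly rather than auto-detected.

```python
from collections import Counter


def compare_branches(results, lower_is_better_metrics):
    """Pick the best branch per metric and count wins per branch.

    results: {branch: {metric: value}} (hypothetical input shape).
    lower_is_better_metrics: set of metric names where smaller values win.
    """
    metrics = sorted({m for values in results.values() for m in values})
    best = {}
    for metric in metrics:
        # Only consider branches that actually logged this metric.
        candidates = {b: v[metric] for b, v in results.items() if metric in v}
        pick = min if metric in lower_is_better_metrics else max
        best[metric] = pick(candidates, key=candidates.get)
    wins = Counter(best.values())
    return best, wins


# Illustrative numbers only, not real experiment results.
results = {
    "baseline": {"loss": 0.42, "accuracy": 0.81},
    "wider-mlp": {"loss": 0.37, "accuracy": 0.84},
    "lr-sweep": {"loss": 0.39, "accuracy": 0.86},
}
best, wins = compare_branches(results, lower_is_better_metrics={"loss"})
print(best)  # {'accuracy': 'lr-sweep', 'loss': 'wider-mlp'}
print(dict(wins))
```

A raw win count treats every metric as equally important, so a branch can win on count while losing on the one metric that matters; that is worth keeping in mind when reading the comparison.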
artifacts — List logged artifacts
python .claude/skills/check-results/check_results.py --project <project> artifacts --branch <name>
Lists all artifacts (models, checkpoints, outputs) logged by a branch's run.
diagnose — Debug failed runs
python .claude/skills/check-results/check_results.py --project <project> diagnose [--branch <name>] [--limit 5]
For crashed/failed runs: shows config, summary, last history steps, and log tail. Essential for fixing branch bugs before re-launching.
report — Full winner selection report
python .claude/skills/check-results/check_results.py --project <project> report [--metrics loss,accuracy]
Comprehensive report covering all branches: status overview, metric comparison, hyperparameter diff, problematic runs, and winner recommendations.
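As an illustration of what a hyperparameter diff could look like, the sketch below keeps only the config keys whose values differ between branches. The function name `config_diff`, the input shape, and the sample configs are assumptions made for this example; the report may compute its diff differently.

```python
def config_diff(configs):
    """Return hyperparameters whose values differ across branches.

    configs: {branch: {hyperparameter: value}} (hypothetical input shape).
    A key missing from a branch is reported as None for that branch.
    """
    keys = sorted({k for cfg in configs.values() for k in cfg})
    diff = {}
    for key in keys:
        values = {branch: cfg.get(key) for branch, cfg in configs.items()}
        # Compare via repr so unhashable values (lists, dicts) also work.
        if len({repr(v) for v in values.values()}) > 1:
            diff[key] = values
    return diff


# Made-up configs for two branches.
configs = {
    "baseline": {"lr": 0.001, "batch_size": 64},
    "lr-sweep": {"lr": 0.0003, "batch_size": 64, "warmup": 500},
}
diff = config_diff(configs)
print(diff)
```

Here `batch_size` is dropped because both branches agree on it, while `lr` and `warmup` are kept.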
post-pr — Post results to GitHub PR
python .claude/skills/check-results/check_results.py --project <project> post-pr --branch <name>
Finds the PR for a branch, posts a comment with metrics table, comparison vs baseline, and updates PR labels (experiment:finished, experiment:crashed, experiment:running).
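The label update can be thought of as replacing any stale `experiment:*` label with the one matching the current run state. The sketch below is an assumption about that behavior, not the script's code: `updated_labels` is a hypothetical helper, and only the three labels named above are mapped. How other states (for example a failed run) are labeled is not specified here, so the sketch leaves labels untouched in that case.

```python
# The three labels named in the description above.
STATE_LABELS = {
    "finished": "experiment:finished",
    "crashed": "experiment:crashed",
    "running": "experiment:running",
}


def updated_labels(current_labels, run_state):
    """Return the PR label list after applying the run state.

    Unknown states leave the labels unchanged (an assumption).
    """
    new_label = STATE_LABELS.get(run_state)
    if new_label is None:
        return list(current_labels)
    # Drop any previous experiment:* label, keep everything else in order.
    kept = [l for l in current_labels if not l.startswith("experiment:")]
    return kept + [new_label]


print(updated_labels(["needs-review", "experiment:running"], "finished"))
# ['needs-review', 'experiment:finished']
```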
Global Options
- •--project (required): WandB project name
- •--entity: WandB entity (team/user)
- •--json: Output structured JSON (useful for programmatic analysis)
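For programmatic analysis, the script can be invoked from Python and its `--json` output parsed. The following is a sketch under a few assumptions: global options are placed before the subcommand (as in the usage lines above), stdout contains only the JSON document, and the structure of that JSON is not documented here, so the result is returned as-is. The helper names `build_command` and `run_json` and the project name `my-project` are hypothetical.

```python
import json
import subprocess
import sys

SCRIPT = ".claude/skills/check-results/check_results.py"


def build_command(project, subcommand, *extra, entity=None, as_json=True):
    """Assemble the argv list, with global options before the subcommand."""
    cmd = [sys.executable, SCRIPT, "--project", project]
    if entity:
        cmd += ["--entity", entity]
    if as_json:
        cmd.append("--json")
    return cmd + [subcommand, *extra]


def run_json(project, subcommand, *extra, entity=None):
    """Run the script and parse stdout as JSON (assumes stdout is pure JSON)."""
    proc = subprocess.run(
        build_command(project, subcommand, *extra, entity=entity),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)


cmd = build_command("my-project", "status", "--branch", "baseline")
print(" ".join(cmd[1:]))
```

Calling `run_json("my-project", "status")` would then return whatever structure the script emits; inspect it once before relying on specific keys.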
When to use
- •status: Check if runs are done before analyzing
- •summary/compare: Evaluate finished runs to pick winners
- •history: Deep-dive into training dynamics of a specific branch
- •diagnose: Debug crashed/failed runs to fix and re-launch
- •report: End-of-cycle full evaluation for merge decisions
- •artifacts: Inspect what a run produced (models, checkpoints)
- •post-pr: Post results to the branch's GitHub PR with metrics and baseline comparison
Metric direction heuristic
Metrics containing "loss", "error", "perplexity", "mse", "mae", "rmse" are treated as lower-is-better. All others are higher-is-better. Override by specifying --metrics explicitly.
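The heuristic amounts to a substring check on the metric name. A minimal sketch follows; the function name `lower_is_better` is hypothetical, and the case-insensitive match is an assumption not stated above.

```python
# Keywords listed in the heuristic description above.
LOWER_IS_BETTER_KEYWORDS = ("loss", "error", "perplexity", "mse", "mae", "rmse")


def lower_is_better(metric_name):
    """Return True if the metric name contains a lower-is-better keyword.

    Matching is case-insensitive here, which is an assumption.
    """
    name = metric_name.lower()
    return any(keyword in name for keyword in LOWER_IS_BETTER_KEYWORDS)


for metric in ("val/loss", "train_rmse", "accuracy", "latency_ms"):
    direction = "lower" if lower_is_better(metric) else "higher"
    print(f"{metric}: {direction} is better")
```

Under this rule a metric such as `latency_ms` matches no keyword and would be treated as higher-is-better, so metrics with unusual names deserve a manual check before trusting a winner.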
$ARGUMENTS