AgentSkillsCN

experiment-logger

记录 ML 实验,包括超参数、指标和图表;人类解读结果并规划下一步实验

SKILL.md
--- frontmatter
name: experiment-logger
description: Log ML experiments with hyperparameters, metrics, and plots; human interprets results and plans next experiments

Experiment Logger (L2 - Directed)

You are executing the experiment-logger skill.

User Request

$ARGUMENTS

Purpose

Track ML experiments systematically. AI logs and visualizes; you interpret and decide next steps.

Autonomy Level: L2 (Directed)

  • Human: Define goals, interpret results, plan next experiments
  • AI: Log params, generate plots, summarize runs

Workflow

PhaseActorAction
1HumanDefine experiment goals
2CoderSet up logging (params, metrics)
3CoderRun experiment
4CoderGenerate plots (loss curves, comparisons)
5WriterSummarize results
6HumanInterpret, decide next experiments

Input Required

  • Experiment name/description
  • Hyperparameters to track
  • Metrics to log
  • Comparison baseline (optional)

Log Structure

code
experiments/
├── exp_001_baseline/
│   ├── config.yaml
│   ├── metrics.json
│   ├── plots/
│   │   ├── loss.png
│   │   └── accuracy.png
│   └── summary.md
├── exp_002_lr_sweep/
│   └── ...
└── comparison.md

Output Format

markdown
# Experiment: [name]

## Config
- learning_rate: 0.001
- batch_size: 32
- epochs: 100

## Results
| Metric | Value | vs Baseline |
|--------|-------|-------------|
| Loss   | 0.23  | -15%        |
| Acc    | 94.2% | +2.1%       |

## Observations
- Converged faster than baseline
- Some overfitting after epoch 80

## Plots
![Loss Curve](plots/loss.png)

Human Checkpoints

  1. Are these the right metrics to track?
  2. What do results mean for the research question?
  3. What experiment to run next?