ML Experiment Tracking Skill
Track machine learning experiments with reproducible parameters and metrics.
Trigger Conditions
- Model configuration changes or hyperparameter updates
- New experiment run initiated
- User invokes with "track experiment" or "compare models"
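One way to detect the first trigger condition is to fingerprint the configuration and compare it with the fingerprint of the last tracked run. This is a minimal sketch that assumes configs are JSON-serializable dicts. `config_fingerprint` and `should_trigger` are hypothetical helper names, not part of any tracking tool.

```python
import hashlib
import json
from typing import Optional


def config_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable config.

    sort_keys=True makes the digest independent of dict key order, so only
    real value changes produce a new fingerprint.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_trigger(config: dict, last_fingerprint: Optional[str]) -> bool:
    """Trigger tracking when nothing has been logged yet or the config changed."""
    return last_fingerprint is None or config_fingerprint(config) != last_fingerprint
```

Reordering keys does not trigger a new run; changing a hyperparameter value does.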
Input Contract
- Required: Experiment parameters (model, hyperparameters, data)
- Required: Evaluation metrics
- Optional: Baseline comparison, hypothesis
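The input contract can be sketched as a Python dataclass that rejects incomplete inputs up front. This assumes metrics arrive as a name-to-value dict and that `data` is a dataset name or version string; the class and field names are illustrative, not a fixed schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ExperimentInput:
    # Required by the input contract
    model: str                        # model name or architecture
    hyperparameters: Dict[str, Any]   # e.g. learning rate, batch size
    data: str                         # dataset name/version (assumed to be a string)
    metrics: Dict[str, float]         # evaluation metric name -> value
    # Optional
    baseline: Optional[str] = None    # identifier of the run to compare against
    hypothesis: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject inputs that would break the reproducibility guarantee.
        missing = [name for name in ("model", "data") if not getattr(self, name)]
        if not self.metrics:
            missing.append("metrics")
        if missing:
            raise ValueError(f"missing required input(s): {', '.join(missing)}")
```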
Output Contract
- Experiment log entry with full reproducibility info
- Comparison table against the baseline and prior runs
- Recommendation to promote, iterate, or abandon, with rationale
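A sketch of the comparison table, assuming metrics are flat name-to-value dicts and that relative change against the baseline is the quantity to report. `compare_to_baseline` and `format_table` are hypothetical names.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float, float, float]


def compare_to_baseline(baseline: Dict[str, float], candidate: Dict[str, float]) -> List[Row]:
    """Return (metric, baseline, candidate, relative change) for metrics both runs share."""
    rows = []
    for name in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[name], candidate[name]
        rel = (cand - base) / base if base != 0 else float("nan")
        rows.append((name, base, cand, rel))
    return rows


def format_table(rows: List[Row]) -> str:
    """Render rows as a fixed-width text table with signed percentage change."""
    lines = [f"{'metric':<12}{'baseline':>10}{'candidate':>11}{'change':>9}"]
    for name, base, cand, rel in rows:
        lines.append(f"{name:<12}{base:>10.3f}{cand:>11.3f}{rel:>+9.1%}")
    return "\n".join(lines)
```

Fed the figures from the example invocation, the F1 row shows -3.3% and the latency row -73.3%.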
Tool Permissions
- Read: Model configs, training data metadata, metric logs
- Write: Experiment logs, comparison reports
- Execute: Metric collection commands
Execution Steps
- Record the experiment hypothesis and parameters
- Capture the environment (dependencies, data version, code commit)
- Execute or observe the training run
- Collect metrics and artifacts
- Compare against the baseline and prior experiments
- Recommend: promote, iterate, or abandon
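The environment-capture step can be sketched with the Python standard library alone. This assumes the run executes inside a git checkout (otherwise the commit is recorded as `None`) and that the training data is a single local file; hashing that file is one way to pin the data version, not the only one. All function names are hypothetical.

```python
import hashlib
import platform
import subprocess
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def git_commit() -> Optional[str]:
    """Return the current git HEAD commit, or None outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed versions; None marks a package that is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def file_sha256(path: str) -> str:
    """Hash a data file in 1 MiB chunks so the exact data version is recorded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_environment(packages: Iterable[str], data_path: Optional[str] = None) -> dict:
    """Bundle everything needed to rerun the experiment into one log-ready dict."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "git_commit": git_commit(),
        "packages": package_versions(packages),
        "data_sha256": file_sha256(data_path) if data_path else None,
    }
```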
Success Criteria
- Experiment is fully reproducible from logged parameters
- Metrics compared against baseline
- Clear recommendation with rationale
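The success criteria can be checked against a finished log entry. In this sketch the reproducibility fields mirror the execution steps, except `seed`, which is an added assumption; the entry layout and `check_success` name are hypothetical.

```python
from typing import Dict, List

# Fields a log entry needs to be rerunnable. "seed" is an assumption; the
# others mirror the parameters and environment captured in the execution steps.
REPRO_FIELDS = ("parameters", "code_commit", "data_version", "dependencies", "seed")
VALID_RECOMMENDATIONS = {"promote", "iterate", "abandon"}


def check_success(entry: Dict) -> List[str]:
    """Return unmet success criteria; an empty list means all criteria are met."""
    # A seed of 0 is valid, so only None, "" and {} count as missing.
    problems = [f"missing {f}" for f in REPRO_FIELDS if entry.get(f) in (None, "", {})]
    if not entry.get("baseline_comparison"):
        problems.append("no baseline comparison")
    if entry.get("recommendation") not in VALID_RECOMMENDATIONS:
        problems.append("recommendation must be promote, iterate, or abandon")
    if not entry.get("rationale"):
        problems.append("recommendation has no rationale")
    return problems
```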
Escalation Rules
- Escalate if model performance degrades vs. baseline
- Escalate if data drift is detected in the training set
- Escalate if the experiment requires new infrastructure
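The escalation rules can be evaluated mechanically. This sketch swaps in the Population Stability Index (PSI) as the drift measure, with the commonly cited 0.2 threshold as an assumption, since the skill does not name a drift test. The need for new infrastructure is passed in as a flag. `tolerance` is a hypothetical parameter: the rule as written escalates on any regression, which is the default here.

```python
import math
from typing import Dict, List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Both inputs are per-bin proportions over the same bins; eps avoids log(0).
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def escalation_reasons(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    higher_is_better: Dict[str, bool],
    ref_bins: Sequence[float],
    new_bins: Sequence[float],
    needs_new_infrastructure: bool,
    tolerance: float = 0.0,      # hypothetical: allowed absolute regression per metric
    psi_threshold: float = 0.2,  # common rule of thumb, assumed here
) -> List[str]:
    """Return the reasons this run should be escalated (empty list: no escalation)."""
    reasons = []
    for name, high_good in higher_is_better.items():
        if name not in baseline or name not in candidate:
            continue
        delta = candidate[name] - baseline[name]
        # A regression is a drop for higher-is-better metrics, a rise otherwise.
        regression = -delta if high_good else delta
        if regression > tolerance:
            reasons.append(f"{name} degraded vs. baseline")
    drift = psi(ref_bins, new_bins)
    if drift > psi_threshold:
        reasons.append(f"training data drift (PSI={drift:.2f})")
    if needs_new_infrastructure:
        reasons.append("requires new infrastructure")
    return reasons
```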
Example Invocations
Input: "Compare the BERT-base and DistilBERT models for our classification task"
Output: Experiment log: BERT-base (F1: 0.92, latency: 45ms, size: 440MB) vs DistilBERT (F1: 0.89, latency: 12ms, size: 260MB). Recommendation: promote DistilBERT to staging for an A/B test ahead of production. It gives up 0.03 F1 (about 3.3% relative) in exchange for a 73% latency reduction and a 41% smaller model.
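The recommendation in the example can be made explicit as a threshold rule. The 0.05 absolute F1 tolerance and the 50% minimum latency reduction are hypothetical defaults chosen for illustration, not values this skill specifies; `recommend` is likewise a hypothetical name.

```python
from typing import Tuple


def recommend(
    base_f1: float,
    cand_f1: float,
    base_latency_ms: float,
    cand_latency_ms: float,
    max_f1_drop: float = 0.05,      # hypothetical tolerance, absolute F1
    min_latency_gain: float = 0.5,  # hypothetical: require >= 50% latency reduction
) -> Tuple[str, str]:
    """Return (promote | iterate | abandon, rationale) for a candidate vs. baseline."""
    f1_drop = base_f1 - cand_f1
    latency_gain = (base_latency_ms - cand_latency_ms) / base_latency_ms
    rationale = f"F1 change {-f1_drop:+.2f}, latency reduction {latency_gain:.0%}"
    if f1_drop <= max_f1_drop and latency_gain >= min_latency_gain:
        return "promote", rationale
    if f1_drop <= max_f1_drop or latency_gain > 0:
        # One side of the trade-off is acceptable; worth another iteration.
        return "iterate", rationale
    return "abandon", rationale
```

With the example's numbers, `recommend(0.92, 0.89, 45, 12)` yields `("promote", "F1 change -0.03, latency reduction 73%")`.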