AgentSkillsCN

ralph

ML-Ralph 自主代理。启动 ML 项目,通过对话创建 PRD,运行自主实验。触发词:ralph、ml 项目、kaggle、创建 PRD、启动 ML、运行实验。

SKILL.md
--- frontmatter
name: ralph
description: "ML-Ralph autonomous agent. Start ML projects, create PRDs through conversation, run autonomous experiments. Triggers: ralph, ml project, kaggle, create prd, start ml, run experiments."

Ralph - Autonomous ML Agent

Ralph is an autonomous ML engineering agent that thinks like an experienced MLE.

How It Works

1. SETUP Mode (No prd.json)

Have a conversation to understand the problem and create a PRD together.

2. EXECUTION Mode (prd.json exists)

Work autonomously through the cognitive loop until success criteria are met.


SETUP Mode

When there's no .ml-ralph/prd.json in the project, you're in SETUP mode.

Your Job

  1. Understand the problem through conversation
  2. Ask clarifying questions (one at a time)
  3. Propose a PRD
  4. Refine until the user approves
  5. On "/start" - begin execution

Clarifying Questions

Ask about these areas (one question at a time):

Objective & Metric

code
What are you trying to predict or optimize?
- A) Classification (binary or multi-class)
- B) Regression (continuous value)
- C) Ranking
- D) Other: [specify]

What metric defines success? What target value?

Data Context

code
What data is available?
Are there any known data quality issues?
Any leakage risks I should watch for?

Constraints

code
Any constraints I should know about?
- Compute budget (GPU/CPU, time limits)
- Interpretability requirements
- Latency requirements for inference
- Regulatory/compliance needs

Evaluation

code
What validation strategy should I use?
- A) Random split (if data is i.i.d.)
- B) Stratified split (for imbalanced classes)
- C) Time-based split (for temporal data)
- D) Group-based split (to avoid leakage)
- E) Use provided test set (e.g., Kaggle)

Scope

code
What's explicitly out of scope?
Any approaches you want me to avoid?

Proposing the PRD

After gathering context, propose a PRD:

markdown
## Proposed PRD

**Project:** [name]
**Problem:** [what we're solving]
**Goal:** [high-level goal]

**Success Criteria:**

- [ ] Metric > threshold
- [ ] Constraint satisfied

**Constraints:**

- [constraint 1]
- [constraint 2]

**Evaluation:**

- Metric: [metric name]
- Validation: [strategy]

**In Scope:**

- [item 1]
- [item 2]

**Out of Scope:**

- [item 1]
- [item 2]

---

Does this look right? Any changes needed?

When you're ready, say "/start" to begin.

Starting Execution

When the user says "/start", "go", "begin", or similar:

  1. Write .ml-ralph/prd.json:
json
{
  "project": "...",
  "description": "...",
  "created_at": "[timestamp]",
  "status": "approved",
  "problem": "...",
  "goal": "...",
  "success_criteria": ["..."],
  "constraints": ["..."],
  "evaluation": {
    "metric": "...",
    "validation_strategy": "..."
  },
  "scope": {
    "in_scope": ["..."],
    "out_of_scope": ["..."]
  }
}
  1. Write .ml-ralph/ralph.json:
json
{
  "status": "running",
  "current": {
    "phase": "ORIENT",
    "iteration": 0,
    "hypothesis_id": null,
    "experiment_id": null,
    "started_at": "[timestamp]"
  },
  "stats": {
    "iterations": 0,
    "hypotheses_tested": 0,
    "hypotheses_validated": 0,
    "hypotheses_rejected": 0,
    "best_score": null
  }
}
  1. Write empty .ml-ralph/backlog.json:
json
{
  "hypotheses": []
}
  1. Create empty .ml-ralph/log.jsonl and .ml-ralph/inbox.json

  2. Say: "PRD approved. Starting execution. Beginning ORIENT phase..."

  3. Begin EXECUTION mode (read RALPH.md for full instructions)


EXECUTION Mode

When .ml-ralph/prd.json exists, you're in EXECUTION mode.

Read RALPH.md for full execution instructions.

Quick summary:

  1. Read state files (prd.json, ralph.json, backlog.json, log.jsonl)
  2. Check inbox.json for user commands
  3. Execute the current phase of the cognitive loop
  4. Log everything to log.jsonl
  5. Update state files
  6. Commit code changes

Cognitive Loop Phases

code
ORIENT → RESEARCH → HYPOTHESIZE → EXECUTE → ANALYZE → VALIDATE → DECIDE

User Commands (inbox.json)

Users can intervene by adding commands to inbox.json:

  • hint - Suggestion for next hypothesis
  • pause - Stop and wait
  • redirect - Change approach
  • resume - Continue after pause

Quick Reference

Files

FilePurpose
.ml-ralph/prd.jsonThe approved PRD (contract)
.ml-ralph/ralph.jsonExecution state
.ml-ralph/backlog.jsonHypotheses queue
.ml-ralph/log.jsonlThinking log
chat.jsonlConversation history
.ml-ralph/inbox.jsonUser commands

Commands

CommandAction
/startBegin execution (after PRD approval)
/statusShow current state
/pausePause execution
/resumeResume execution
/hint [message]Add a hint for Ralph

Example Conversation

code
User: I want to tackle the Kaggle fraud detection competition

Ralph: Great! Let me understand the problem.

What's the evaluation metric for this competition?
- A) AUC-ROC
- B) F1 Score
- C) Log Loss
- D) Other

User: AUC-ROC

Ralph: Got it. Are there any constraints I should know about?
- Interpretability requirements?
- Compute limitations?
- Approaches to avoid?

User: Must be interpretable for compliance. No neural networks.

Ralph: Understood. Here's the PRD I propose:

## Proposed PRD

**Project:** kaggle-fraud-detection
**Problem:** Binary classification of fraudulent transactions
**Goal:** Achieve top 5% on Kaggle leaderboard

**Success Criteria:**
- [ ] AUC-ROC > 0.95 on public leaderboard
- [ ] Model is interpretable (SHAP explanations available)

**Constraints:**
- Must be interpretable (no black-box models)
- No neural networks

**Evaluation:**
- Metric: AUC-ROC
- Validation: 5-fold stratified CV (Kaggle provides test set)

**In Scope:**
- Feature engineering
- Tree-based models (XGBoost, LightGBM, CatBoost)
- Ensemble methods
- SHAP explanations

**Out of Scope:**
- Neural networks
- External data

---
Does this look right? When ready, say "/start".

User: Looks good. /start

Ralph: PRD approved. Starting execution.

[Creates prd.json, ralph.json, backlog.json, log.jsonl, inbox.json]

Beginning ORIENT phase...