
review

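View and correct recent ALM evaluations. If ALM misclassified an evaluation, manually override its correction detection. This is the primary feedback mechanism for improving evaluation accuracy.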

SKILL.md
--- frontmatter
name: review
description: View and correct recent ALM evaluations. Override correction detection if ALM misclassified. Primary feedback mechanism for improving evaluation accuracy.
disable-model-invocation: true
user-invocable: true
argument-hint: "[number-of-recent | session-id]"

ALM Review

View recent evaluations and let the user override fields that ALM misclassified: correction detection, task type, or outcome.

Instructions

Step 1: Load Recent Evaluations

Read .jsonl files in ~/.claude/alm/evaluations/, starting from the most recent. By default show the last 10 evaluations. If $ARGUMENTS is a number, show that many. If it looks like a session ID, show that specific session.
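The loading rules above could be sketched roughly as follows. This is an illustration, not the plugin's actual loader: the helper name `load_recent`, ordering files by modification time, and treating an all-digit argument as a count are assumptions. The `sessionId` field name comes from the display template in Step 2.

```python
import json
from pathlib import Path

def load_recent(eval_dir, argument=""):
    """Return evaluations newest-first, limited by a count or a session ID.

    Assumptions (not confirmed by the skill): files are ordered by
    modification time, records inside a file are appended chronologically,
    and an all-digit argument is a count rather than a session ID.
    """
    files = sorted(Path(eval_dir).glob("*.jsonl"),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    limit = int(argument) if argument.isdigit() else 10
    session_id = argument if argument and not argument.isdigit() else None

    results = []
    for path in files:
        records = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError:
                    continue  # skip malformed lines instead of aborting
        # The newest records sit at the end of each file, so walk backwards.
        for record in reversed(records):
            if session_id is not None:
                if record.get("sessionId") == session_id:
                    results.append(record)
            else:
                if len(results) >= limit:
                    return results
                results.append(record)
    return results
```

Calling `load_recent(path)` returns up to 10 records, `load_recent(path, "25")` up to 25, and any non-numeric argument is treated as a session ID filter.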

Step 2: Present Evaluations

For each evaluation, display:

code
Session: {sessionId}  |  {timestamp}
Task Type: {taskType}  |  Outcome: {outcome}
Correction: {correctionDetected}  |  Type: {correctionType}  |  Severity: {severity}
Summary: {correctionSummary or "none"}
Learning: {learning or "none"}
Tools: {toolsUsed}  |  Success Rate: {toolSuccessRate}
Prompt: {promptText (first 100 chars)}
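A minimal sketch of how this template might be rendered from one parsed record. The function name `format_evaluation` is hypothetical, and it assumes `toolsUsed` is a list of tool names and that missing fields should display as "none"; neither is confirmed by the skill.

```python
def format_evaluation(ev):
    """Render one evaluation record using the display template above."""
    def get(key, default="none"):
        value = ev.get(key)
        # Treat missing or empty values as "none"; keep False and 0 as-is.
        return default if value is None or value == "" else value

    tools = ev.get("toolsUsed") or []
    # Assumption: toolsUsed is a list of names; anything else is stringified.
    if isinstance(tools, list):
        tools_text = ", ".join(map(str, tools)) or "none"
    else:
        tools_text = str(tools)
    prompt = str(ev.get("promptText") or "")[:100]  # first 100 characters

    return "\n".join([
        f"Session: {get('sessionId')}  |  {get('timestamp')}",
        f"Task Type: {get('taskType')}  |  Outcome: {get('outcome')}",
        f"Correction: {get('correctionDetected')}  |  "
        f"Type: {get('correctionType')}  |  Severity: {get('severity')}",
        f"Summary: {get('correctionSummary')}",
        f"Learning: {get('learning')}",
        f"Tools: {tools_text}  |  Success Rate: {get('toolSuccessRate')}",
        f"Prompt: {prompt}",
    ])
```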

Step 3: Offer Override

After showing the evaluations, ask the user whether any of them should be overridden. Common overrides:

  1. Correction was wrong — ALM detected a correction that wasn't really one (false positive). Set correctionDetected: false, correctionType: "none", severity: "none".

  2. Missed correction — ALM failed to detect a correction (false negative). Set correctionDetected: true and ask for correctionType ("explicit" or "implicit") and severity ("minor", "moderate", "major").

  3. Wrong task type — ALM misclassified the task. Update taskType to the correct value.

  4. Wrong outcome — Override outcome to "success", "partial", or "failure".
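The four override cases above map onto small sets of field updates. One possible way to build and validate them is sketched below. The function name `build_override` and the `kind` labels are hypothetical; the field names and allowed values are the ones listed above.

```python
import json

VALID_CORRECTION_TYPES = {"explicit", "implicit"}
VALID_SEVERITIES = {"minor", "moderate", "major"}
VALID_OUTCOMES = {"success", "partial", "failure"}

def build_override(kind, **fields):
    """Return the field updates for one override case.

    The kind labels (false_positive, missed_correction, task_type, outcome)
    are illustrative names for the four cases, not part of ALM itself.
    """
    if kind == "false_positive":
        return {"correctionDetected": False,
                "correctionType": "none",
                "severity": "none"}
    if kind == "missed_correction":
        ctype = fields.get("correctionType")
        severity = fields.get("severity")
        if ctype not in VALID_CORRECTION_TYPES:
            raise ValueError(f"correctionType must be one of {sorted(VALID_CORRECTION_TYPES)}")
        if severity not in VALID_SEVERITIES:
            raise ValueError(f"severity must be one of {sorted(VALID_SEVERITIES)}")
        return {"correctionDetected": True,
                "correctionType": ctype,
                "severity": severity}
    if kind == "task_type":
        task_type = fields.get("taskType")
        if not task_type:
            raise ValueError("taskType is required")
        return {"taskType": task_type}
    if kind == "outcome":
        outcome = fields.get("outcome")
        if outcome not in VALID_OUTCOMES:
            raise ValueError(f"outcome must be one of {sorted(VALID_OUTCOMES)}")
        return {"outcome": outcome}
    raise ValueError(f"unknown override kind: {kind}")

# Example: serialize an override as a JSON object of field updates.
updates_json = json.dumps(
    build_override("missed_correction", correctionType="implicit", severity="minor")
)
```

Validating the values before writing keeps typos such as "moderat" out of the evaluation files, where they would be harder to spot later.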

Step 4: Apply Overrides

When the user specifies an override:

  1. Read the evaluation's .jsonl file
  2. Find the line matching the evaluation ID
  3. Update the fields the user specified
  4. Add "overridden": true and "overriddenAt": "{ISO timestamp}" to the record
  5. Write the updated file back

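Use this Python snippet, invoked from bash, to update a specific line in a JSONL file. Set EVAL_ID to the evaluation's id and UPDATES_JSON to a JSON object containing the fields to change: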

bash
python3 -c "
import json, os, sys
from datetime import datetime, timezone

eval_dir = os.path.expanduser('~/.claude/alm/evaluations')
eval_id = sys.argv[1]                # id of the evaluation to override
updates = json.loads(sys.argv[2])    # JSON object of fields to change

found = False
for fname in sorted(os.listdir(eval_dir)):
    if not fname.endswith('.jsonl'):
        continue
    fpath = os.path.join(eval_dir, fname)
    lines = []
    modified = False
    with open(fpath) as f:
        for line in f:
            stripped = line.strip()
            if not stripped:
                lines.append(line)
                continue
            try:
                record = json.loads(stripped)
            except json.JSONDecodeError:
                # Keep unparseable lines untouched
                lines.append(line)
                continue
            if isinstance(record, dict) and record.get('id') == eval_id:
                record.update(updates)
                record['overridden'] = True
                # Timezone-aware UTC timestamp (datetime.utcnow is deprecated since Python 3.12)
                record['overriddenAt'] = datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')
                lines.append(json.dumps(record, separators=(',', ':')) + '\n')
                modified = True
            else:
                lines.append(line)
    # Rewrite only the file that contained the matching record
    if modified:
        with open(fpath, 'w') as f:
            f.writelines(lines)
        print(f'Updated {eval_id} in {fname}')
        found = True
        break

if not found:
    sys.exit(f'No evaluation with id {eval_id} found in {eval_dir}')
" "$EVAL_ID" "$UPDATES_JSON"

Step 5: Update Confidence

After applying overrides, recalculate the confidence score for each affected task type by running the following once per task type, with TASK_TYPE set accordingly:

bash
python3 -c "
import sys, os
# The shell expands the plugin root variable before Python runs
sys.path.insert(0, os.path.expanduser('${CLAUDE_PLUGIN_ROOT}/scripts'))
from lib.confidence import load_confidence, save_confidence, calculate_score, get_autonomy_level

confidence = load_confidence()
task_type = sys.argv[1]
if task_type in confidence:
    # Recompute the score, then derive the autonomy level from it
    entry = confidence[task_type]
    entry['score'] = calculate_score(entry)
    entry['autonomyLevel'] = get_autonomy_level(entry['score'])
    save_confidence(confidence)
    print(f'Recalculated {task_type}: score={entry[\"score\"]}, autonomy={entry[\"autonomyLevel\"]}')
else:
    print(f'No confidence entry for {task_type}; nothing recalculated')
" "$TASK_TYPE"

Report what was changed and suggest running /alm:reflect if significant overrides were made.