ALM Review
View recent evaluations and allow the user to override correction detection.
Instructions
Step 1: Load Recent Evaluations
Read .jsonl files in ~/.claude/alm/evaluations/, starting from the most recent. By default show the last 10 evaluations. If $ARGUMENTS is a number, show that many. If it looks like a session ID, show that specific session.
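The loading rule above can be sketched in Python. This is a minimal illustration, not the plugin's own loader: the function name `load_recent_evaluations` is hypothetical, and it assumes each record carries `sessionId` and `timestamp` keys (as the display template in Step 2 suggests) and that ISO timestamps sort correctly as plain strings.

```python
import json
from pathlib import Path


def load_recent_evaluations(eval_dir, argument=""):
    """Return evaluation records, newest first, filtered by the argument.

    Assumptions (not confirmed by the source): records have 'sessionId'
    and 'timestamp' keys, and timestamps are ISO strings.
    """
    records = []
    for path in Path(eval_dir).expanduser().glob("*.jsonl"):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines rather than abort
                if isinstance(record, dict):
                    records.append(record)
    # Newest first, based on the (assumed) ISO timestamp string.
    records.sort(key=lambda r: str(r.get("timestamp", "")), reverse=True)

    argument = argument.strip()
    if argument.isdigit():
        return records[: int(argument)]
    if argument:
        # Anything non-numeric is treated as a session ID.
        return [r for r in records if r.get("sessionId") == argument]
    return records[:10]
```

Calling `load_recent_evaluations("~/.claude/alm/evaluations", "25")` would return up to 25 records, while passing a session ID returns only that session's records.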
Step 2: Present Evaluations
For each evaluation, display:
Session: {sessionId} | {timestamp}
Task Type: {taskType} | Outcome: {outcome}
Correction: {correctionDetected} | Type: {correctionType} | Severity: {severity}
Summary: {correctionSummary or "none"}
Learning: {learning or "none"}
Tools: {toolsUsed} | Success Rate: {toolSuccessRate}
Prompt: {promptText (first 100 chars)}
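As an illustration of the template above, the following sketch renders one record as text. The helper name `format_evaluation` is hypothetical, and it assumes `toolsUsed` is a list of tool names; missing or empty fields fall back to "none".

```python
def format_evaluation(record):
    """Render one evaluation record using the display template."""

    def field(name, default="none"):
        # Fall back to the default when the key is absent or empty.
        value = record.get(name)
        return default if value is None or value == "" else value

    tools = record.get("toolsUsed") or []
    if isinstance(tools, (list, tuple)):
        # Assumption: toolsUsed is a list of tool names.
        tools = ", ".join(str(t) for t in tools) or "none"
    prompt = str(record.get("promptText") or "")[:100]  # first 100 chars

    return "\n".join([
        f"Session: {field('sessionId')} | {field('timestamp')}",
        f"Task Type: {field('taskType')} | Outcome: {field('outcome')}",
        f"Correction: {field('correctionDetected', False)}"
        f" | Type: {field('correctionType')} | Severity: {field('severity')}",
        f"Summary: {field('correctionSummary')}",
        f"Learning: {field('learning')}",
        f"Tools: {tools} | Success Rate: {field('toolSuccessRate')}",
        f"Prompt: {prompt}",
    ])
```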
Step 3: Offer Override
After showing evaluations, ask the user if they want to override any evaluations. Common overrides:
- Correction was wrong: ALM detected a correction that wasn't really one (false positive). Set correctionDetected: false, correctionType: "none", severity: "none".
- Missed correction: ALM failed to detect a correction (false negative). Set correctionDetected: true and ask for correctionType ("explicit" or "implicit") and severity ("minor", "moderate", "major").
- Wrong task type: ALM misclassified the task. Update taskType to the correct value.
- Wrong outcome: Override outcome to "success", "partial", or "failure".
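The four overrides above map onto small sets of field updates. The sketch below shows one way to build and validate them before applying anything. The function name `build_override` and the kind labels (`false_positive`, `missed_correction`, `wrong_task_type`, `wrong_outcome`) are hypothetical names chosen for illustration; only the field names and allowed values come from the list above.

```python
import json

CORRECTION_TYPES = {"explicit", "implicit"}
SEVERITIES = {"minor", "moderate", "major"}
OUTCOMES = {"success", "partial", "failure"}


def build_override(kind, **choices):
    """Return the field updates for one of the four override kinds."""
    if kind == "false_positive":
        return {"correctionDetected": False, "correctionType": "none", "severity": "none"}
    if kind == "missed_correction":
        ctype, severity = choices.get("correctionType"), choices.get("severity")
        if ctype not in CORRECTION_TYPES or severity not in SEVERITIES:
            raise ValueError("ask the user for a valid correctionType and severity")
        return {"correctionDetected": True, "correctionType": ctype, "severity": severity}
    if kind == "wrong_task_type":
        if not choices.get("taskType"):
            raise ValueError("taskType is required")
        return {"taskType": choices["taskType"]}
    if kind == "wrong_outcome":
        if choices.get("outcome") not in OUTCOMES:
            raise ValueError("outcome must be success, partial, or failure")
        return {"outcome": choices["outcome"]}
    raise ValueError(f"unknown override kind: {kind}")


# Serialize the updates as a JSON object string.
updates_json = json.dumps(
    build_override("missed_correction", correctionType="explicit", severity="minor")
)
print(updates_json)
```

A JSON string of this shape is what the Step 4 script receives as its second argument (`$UPDATES_JSON`).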
Step 4: Apply Overrides
When the user specifies an override:
- Read the evaluation's .jsonl file
- Find the line matching the evaluation ID
- Update the fields the user specified
- Add "overridden": true and "overriddenAt": "{ISO timestamp}" to the record
- Write the updated file back
Use this bash approach to update a specific line in a JSONL file:
python3 -c "
import json, sys, os
from datetime import datetime, timezone

eval_dir = os.path.expanduser('~/.claude/alm/evaluations')
eval_id = sys.argv[1]                # ID of the evaluation to override
updates = json.loads(sys.argv[2])    # JSON object of fields to change

for fname in sorted(os.listdir(eval_dir)):
    if not fname.endswith('.jsonl'):
        continue
    fpath = os.path.join(eval_dir, fname)
    lines = []
    modified = False
    with open(fpath) as f:
        for line in f:
            stripped = line.strip()
            if not stripped:
                # Preserve blank lines as they are
                lines.append(line)
                continue
            try:
                record = json.loads(stripped)
                if record.get('id') == eval_id:
                    record.update(updates)
                    record['overridden'] = True
                    # Timezone-aware UTC timestamp, written with a Z suffix
                    now = datetime.now(timezone.utc)
                    record['overriddenAt'] = now.isoformat().replace('+00:00', 'Z')
                    lines.append(json.dumps(record, separators=(',', ':')) + '\n')
                    modified = True
                else:
                    lines.append(line)
            except json.JSONDecodeError:
                # Keep malformed lines untouched
                lines.append(line)
    if modified:
        # Rewrite only the file that contained the matching record
        with open(fpath, 'w') as f:
            f.writelines(lines)
        print(f'Updated {eval_id} in {fname}')
        break
" "$EVAL_ID" "$UPDATES_JSON"
Step 5: Update Confidence
After overrides, recalculate confidence scores for affected task types by running:
python3 -c "
import sys, os
# Make the plugin's scripts directory importable
sys.path.insert(0, os.path.expanduser('${CLAUDE_PLUGIN_ROOT}/scripts'))
from lib.confidence import load_confidence, save_confidence, calculate_score, get_autonomy_level

confidence = load_confidence()
task_type = sys.argv[1]
if task_type in confidence:
    entry = confidence[task_type]
    # Recompute the score, then derive the autonomy level from it
    entry['score'] = calculate_score(entry)
    entry['autonomyLevel'] = get_autonomy_level(entry['score'])
    save_confidence(confidence)
    print(f'Recalculated {task_type}: score={entry[\"score\"]}, autonomy={entry[\"autonomyLevel\"]}')
" "$TASK_TYPE"
Report what was changed and suggest running /alm:reflect if significant overrides were made.