Metric Runner Skill
Runs verification tests and extracts metrics for ralph-loop++.
Purpose
Execute the verification test created by the Test Architect and parse the results into a usable format for decision-making.
Running a Test
IMPORTANT: Never use eval or bash -c to execute test commands. Use the safe execution script instead.
bash
# Use the run-test.sh script for safe execution
TEST_COMMAND="$1"
WORKING_DIR="$2"
# Safe execution (validates command format, no eval)
OUTPUT=$("${CLAUDE_PLUGIN_ROOT}/scripts/run-test.sh" "$TEST_COMMAND" "$WORKING_DIR")
EXIT_CODE=$?
echo "$OUTPUT"
exit $EXIT_CODE
Allowed Test Commands
The following command prefixes are allowed:
- •
node <script> - •
python <script>/python3 <script> - •
npm test/npx <command> - •
pnpm <command> - •
bun <command> - •
pytest <args> - •
jest <args> - •
vitest <args>
Commands containing shell metacharacters (;, |, &, `, $, () are rejected for security.
Parsing Results
Numeric Metric
javascript
const output = `{"success": true, "metric_value": 45.2, "unit": "ms"}`;
const result = JSON.parse(output);
if (result.success && result.metric_value !== undefined) {
console.log(`Metric: ${result.metric_value} ${result.unit || ''}`);
}
Boolean Success
javascript
const output = `{"success": true, "reason": "All tests passed"}`;
const result = JSON.parse(output);
if (result.success !== undefined) {
console.log(`Success: ${result.success}`);
if (result.reason) console.log(`Reason: ${result.reason}`);
}
Expected Output Formats
Valid Formats
json
{"success": true, "metric_value": 42, "unit": "ms"}
{"success": true, "metric_name": "latency", "metric_value": 42}
{"success": false, "error": "Connection refused"}
{"success": true, "reason": "All 10 runs passed"}
{"success": true}
Invalid Formats (Handle Gracefully)
code
# Plain text output - treat as failure
Error: something went wrong
# Non-JSON - treat as failure
42
# Missing success field - infer from metric_value
{"metric_value": 42}
Comparison Logic
Numeric Metrics
javascript
function checkNumericGoal(current, target, direction) {
if (direction === 'decrease') {
return current <= target;
} else if (direction === 'increase') {
return current >= target;
}
return false;
}
// Example
const achieved = checkNumericGoal(45, 50, 'decrease'); // true
Boolean Metrics
javascript
function checkBooleanGoal(result) {
return result.success === true;
}
Progress Tracking
After each test run, update the state file:
yaml
# Append to metrics_history
metrics_history:
- timestamp: "2024-01-06T10:30:00Z"
worker: 1
iteration: 5
metric_value: 85
goal_achieved: false
Handling Test Failures
Test Crashes
bash
if [[ $EXIT_CODE -ne 0 ]]; then
echo '{"success": false, "error": "Test crashed with exit code '$EXIT_CODE'"}'
fi
Timeout
bash
timeout 300 $TEST_COMMAND || echo '{"success": false, "error": "Test timed out after 5 minutes"}'
Invalid JSON
javascript
try {
const result = JSON.parse(output);
} catch (e) {
return {
success: false,
error: `Invalid JSON output: ${output.substring(0, 100)}`
};
}
Baseline Measurement
When establishing baseline:
javascript
// Run test multiple times for stability
const runs = 3;
const results = [];
for (let i = 0; i < runs; i++) {
const result = runTest(testCommand);
results.push(result.metric_value);
}
// Use median for baseline
results.sort((a, b) => a - b);
const baseline = results[Math.floor(runs / 2)];
console.log(`Baseline established: ${baseline}`);
Integration with Ralph Loop
The worker stop hook uses this skill to:
- •Run the verification test after each iteration
- •Parse the metric value
- •Compare against target
- •Decide whether to continue or signal completion
bash
# In stop hook RESULT=$(run-metric-test "$TEST_COMMAND" "$WORKTREE_PATH") METRIC=$(echo "$RESULT" | jq -r '.metric_value // empty') if [[ -n "$METRIC" ]] && (( $(echo "$METRIC <= $TARGET" | bc -l) )); then echo "<promise>GOAL ACHIEVED: $METRIC</promise>" fi