Name: results-audit
Rating: 92
Author: paulbroadmission

bash

# Quick anomaly detection
python3 -c "
import json, sys

results_file = 'workspace/results/iteration_001/test_results.json'  # adjust iteration
try:
    with open(results_file) as f:
        r = json.load(f)

    flags = []

    # Check for perfect or zero metrics (only float values are inspected)
    for k, v in r.items():
        if not isinstance(v, float):
            continue
        if v >= 1.0:
            flags.append(f'SUSPICIOUS: {k} = {v} (perfect score)')
        elif v == 0.0:
            flags.append(f'SUSPICIOUS: {k} = {v} (zero)')

    # Check seed is recorded
    if 'seed' not in r:
        flags.append('MISSING: random seed not recorded')

    if flags:
        print('🚩 RED FLAGS:')
        for flag in flags:  # renamed so it does not shadow the file handle f
            print(f'  - {flag}')
    else:
        print('✅ No obvious red flags')
except Exception as e:
    print(f'❌ Cannot read results: {e}')
"

json

{
  "timestamp": "...",
  "iteration": N,
  "status": "PASS | WARN | CRITICAL",
  "expected_range": [low, high],
  "actual_result": X,
  "plausibility": "PLAUSIBLE | SUSPICIOUS | OUTSIDE_RANGE",
  "statistical_validity": "PASS | FAIL",
  "reproducibility": "CONFIRMED | UNCONFIRMED",
  "red_flags": [],
  "score": X
}
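
A report in this shape can be checked mechanically before it is trusted. The following is a minimal sketch, not part of the original skill: the function name `validate_audit_report` is hypothetical, the key names and allowed values are copied from the template above, and the rule that a result outside `expected_range` should be labelled `OUTSIDE_RANGE` is an assumption inferred from the field names.

```python
from typing import Any

# Allowed values copied from the report template.
ALLOWED = {
    "status": ("PASS", "WARN", "CRITICAL"),
    "plausibility": ("PLAUSIBLE", "SUSPICIOUS", "OUTSIDE_RANGE"),
    "statistical_validity": ("PASS", "FAIL"),
    "reproducibility": ("CONFIRMED", "UNCONFIRMED"),
}
REQUIRED_KEYS = (
    "timestamp", "iteration", "status", "expected_range", "actual_result",
    "plausibility", "statistical_validity", "reproducibility",
    "red_flags", "score",
)


def validate_audit_report(report: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the report is well-formed."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in report]

    # Enumerated fields must hold exactly one of the template's values.
    for key, allowed in ALLOWED.items():
        if key in report and report[key] not in allowed:
            problems.append(f"{key}: unexpected value {report[key]!r}")

    rng = report.get("expected_range")
    if rng is not None:
        well_formed = (
            isinstance(rng, (list, tuple)) and len(rng) == 2 and rng[0] <= rng[1]
        )
        if not well_formed:
            problems.append("expected_range must be [low, high] with low <= high")
        elif "actual_result" in report:
            inside = rng[0] <= report["actual_result"] <= rng[1]
            # Assumption: an out-of-range result should be labelled OUTSIDE_RANGE.
            if not inside and report.get("plausibility") != "OUTSIDE_RANGE":
                problems.append(
                    "actual_result is outside expected_range "
                    "but plausibility is not OUTSIDE_RANGE"
                )

    if "red_flags" in report and not isinstance(report["red_flags"], list):
        problems.append("red_flags must be a list")
    return problems


if __name__ == "__main__":
    # Hypothetical values, for illustration only; not real audit results.
    example = {
        "timestamp": "2024-01-01T00:00:00Z",
        "iteration": 1,
        "status": "PASS",
        "expected_range": [0.70, 0.85],
        "actual_result": 0.78,
        "plausibility": "PLAUSIBLE",
        "statistical_validity": "PASS",
        "reproducibility": "CONFIRMED",
        "red_flags": [],
        "score": 90,
    }
    for problem in validate_audit_report(example) or ["report is well-formed"]:
        print(problem)
```

Returning a list of problems rather than raising on the first one lets the audit surface every malformed field in a single pass.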

results-audit

Results Audit — Authenticity & Statistical Validity

Automated Red Flag Scan

Verification Checklist

1. Training Log Integrity

2. Results Plausibility

3. Cross-Consistency

4. Statistical Significance

5. Reproducibility

IMMEDIATE FAIL Conditions

Output Format