AgentSkillsCN

analyze-report

Analyze benchmark and test reports to extract insights, flag anomalies, and verify performance claims against spec targets. Reads raw benchmark JSON/output data, computes derived metrics (scaling ratios, contention rates, degradation), and produces actionable findings. Use this skill after running benchmarks or tests to interpret results.

SKILL.md
---
name: analyze-report
description: >
  Analyze benchmark and test reports to extract insights, flag anomalies, and verify
  performance claims against spec targets. Reads raw benchmark JSON/output, computes
  derived metrics (scaling ratios, contention rates, degradation), and produces
  actionable findings. Use after running benchmarks or tests to interpret results.
---

# Analyze Report

Interpret benchmark and test results. Turn raw data into actionable findings.

## When to Use

- After running benchmarks — interpret results, flag anomalies
- After ctest — analyze failures, identify patterns
- Before spec updates — verify performance claims with measured data
- Periodically — track performance trends across runs

## When NOT to Use

- Running benchmarks — run them first, then use this skill to analyze
- Spec quality review — use /spec-review instead
- Test gap analysis — use /test-plan instead

## Inputs

- No argument: find and analyze all reports in the project (benchmark JSON, test output, coverage reports)
- File path: analyze a specific report file
- `--compare A B`: compare two report files (e.g., before/after optimization)

## Workflow

### Phase 1: Discovery

1. Read CLAUDE.md for project structure — find the benchmark output directory and test report directory
2. Glob for benchmark results: `**/*_bench*.json`, `**/benchmark_*.json`
3. Glob for test reports: `**/test-report*`, `**/coverage*`
4. Check for hardware baseline data (if the project has hw-baseline or similar)
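The discovery step can be sketched as a small Python helper. The glob patterns come from the list above; the function name and the grouping by kind are illustrative assumptions:

```python
# Hypothetical sketch: discover report files under a project root.
# The glob patterns mirror the discovery list; names are assumptions.
from pathlib import Path


def find_reports(root: str) -> dict[str, list[str]]:
    """Group report files by kind using recursive glob patterns."""
    patterns = {
        "benchmark": ["**/*_bench*.json", "**/benchmark_*.json"],
        "test": ["**/test-report*", "**/coverage*"],
    }
    base = Path(root)
    found: dict[str, list[str]] = {}
    for kind, pats in patterns.items():
        hits: set[str] = set()
        for pat in pats:
            # "**/" also matches files directly under root
            hits.update(str(p) for p in base.glob(pat))
        found[kind] = sorted(hits)
    return found
```

Sorting the hits keeps the report order stable across runs, which matters when diffing two analysis reports.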

### Phase 2: Parse Raw Data

For each benchmark result:

- Extract scenario names, iterations, timing data (mean, median, P50/P99, min/max)
- Identify the unit (ns, us, ms, ops/sec)
- Group by benchmark suite
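As a sketch of this parsing step, assuming Google Benchmark-style JSON (a top-level `benchmarks` array whose entries carry `name`, `real_time`, and `time_unit`) — the field names are assumptions, so adapt them to the actual tool's output:

```python
# Minimal parser sketch for benchmark JSON; field names assume a
# Google Benchmark-like layout and may need adjusting.
import json
from collections import defaultdict


def parse_benchmark_json(text: str) -> dict[str, list[dict]]:
    """Group benchmark entries by suite (the part of the name before '/')."""
    data = json.loads(text)
    suites: dict[str, list[dict]] = defaultdict(list)
    for bench in data.get("benchmarks", []):
        suite = bench["name"].split("/")[0]
        suites[suite].append({
            "name": bench["name"],
            "time": bench.get("real_time"),
            "unit": bench.get("time_unit", "ns"),
        })
    return dict(suites)
```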

For each test report:

  • Extract pass/fail counts, failure messages
  • Extract coverage data if available

### Phase 3: Compute Derived Metrics

**Scaling analysis**: Compare scenarios that differ by one parameter (e.g., 1 reader vs 4 readers).

```
Scaling ratio = time(N) / time(1)
Ideal: ratio ≈ 1.0 (independent)
Degraded: ratio > 2.0 (contention)
```
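The ratio rule can be expressed as a small helper. The 1.0 and 2.0 thresholds come from the rule above; the function name and the middle "acceptable" band are illustrative assumptions:

```python
# Sketch of the scaling-ratio assessment; thresholds from the rule above,
# everything else is an assumption.
def assess_scaling(time_1: float, time_n: float) -> tuple[float, str]:
    """Return (ratio, assessment) for an N-way scenario vs. the 1-way baseline."""
    ratio = time_n / time_1
    if ratio <= 1.0:
        return ratio, "independent"   # scales ideally
    if ratio > 2.0:
        return ratio, "contention"    # degraded scaling
    return ratio, "acceptable"        # between ideal and degraded
```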

**Contention indicators**: Retry rates, CAS failures, cache miss ratios.

**Regression detection** (when comparing two runs):

```
Regression threshold: > 10% slower on same hardware
Improvement threshold: > 10% faster
Noise band: within 10%
```
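The 10% classification rule can be sketched as a single function; the threshold comes from the rule above, while the function and label names are assumptions:

```python
# Sketch of the regression/improvement/noise classification; the 10%
# threshold is from the rule above, names are assumptions.
def classify_change(before: float, after: float, threshold: float = 0.10) -> str:
    """Classify an after-vs-before timing change (lower time is better)."""
    delta = (after - before) / before
    if delta > threshold:
        return "REGRESSION"   # more than 10% slower
    if delta < -threshold:
        return "IMPROVEMENT"  # more than 10% faster
    return "NOISE"            # within the noise band
```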

### Phase 4: Cross-Reference with Specs

For each performance claim in the specs:

- Find the corresponding benchmark result
- Compare the measured value against the spec target
- Classify: MEETS | EXCEEDS | MISSES | NO_DATA
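A minimal sketch of this classification, assuming lower-is-better timings and reusing the 10% noise band as the tolerance — that reuse is an assumption; a spec may define its own margin:

```python
# Illustrative MEETS/EXCEEDS/MISSES/NO_DATA classifier; the 10% tolerance
# is an assumption borrowed from the noise band.
from typing import Optional


def classify_vs_spec(measured: Optional[float], target: float,
                     tolerance: float = 0.10) -> str:
    """Compare a measured timing against a spec target (lower is better)."""
    if measured is None:
        return "NO_DATA"   # spec claim with no backing benchmark
    if measured <= target * (1 - tolerance):
        return "EXCEEDS"   # clearly faster than the target
    if measured <= target * (1 + tolerance):
        return "MEETS"     # within tolerance of the target
    return "MISSES"
```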

### Phase 5: Report

```markdown
# Benchmark Analysis Report

**Date**: YYYY-MM-DD HH:MM
**Hardware**: {from baseline if available}

## Summary

| Metric | Value |
|--------|-------|
| Benchmarks analyzed | N |
| Spec claims verified | N / M |
| Anomalies found | N |

## Findings

### Performance vs Spec Targets

| Component | Spec Target | Measured | Status |
|-----------|-------------|----------|--------|
| Reactor loop | ~1.6ns (design est.) | P50=4.7ns | MEETS (within order) |
| SeqLock read | ~50ns | — | NO_DATA |

### Scaling Analysis

| Benchmark | 1→N Scaling | Ratio | Assessment |
|-----------|------------|-------|------------|
| DeltaRing 1→4 readers | 2.3ns→845ns | 367x | Expected (SPMC contention) |

### Anomalies

1. **{benchmark}:{scenario}** — {description}
   Data: {numbers}
   Possible cause: {analysis}
   Action: {recommendation}

### Trends (if comparing runs)

| Benchmark | Before | After | Change |
|-----------|--------|-------|--------|
| ... | ... | ... | +/-% |
```

## Principles

- **Data first, interpretation second**: Present raw numbers before drawing conclusions.
- **Hardware context matters**: The same benchmark on different CPUs means different things. Always note the hardware.
- **Noise awareness**: Single-digit percent differences are noise, not signal. Only flag > 10% changes.
- **No spec numbers from thin air**: If a spec claims X ns but no benchmark measures it, report NO_DATA; don't estimate.
- **Actionable output**: Every anomaly should come with a "what to do next" recommendation.