Spec Review

Audit and maintain specification documents for the current project.

When to Use

•Before /test-plan — clean specs produce better test plans
•After major implementation — specs may be stale
•When adding a new component — ensure spec quality from day one
•Periodic review — catch drift between spec and code

When NOT to Use

•Code review — use /verify instead
•Test gap analysis — use /test-plan instead
•Writing new specs from scratch without existing design — draft first, then /spec-review

Inputs

•No argument: audit ALL specs in the project
•File path: audit a specific spec file
•--fix: auto-fix trivial issues (formatting, dead links) without asking
•--interactive: for each missing template section, ask structured questions instead of just reporting the gap. Uses AskUserQuestion to fill missing content collaboratively.

Workflow

Critical rule: Phases 1-4 (including 4b) are READ-ONLY. No files are modified until Phase 5. Premature fixes risk reversal (e.g., reclassifying a zone before understanding the design intent).

Phase 1: Discovery

•Read CLAUDE.md for project structure — find spec directory, source directories, and constraints
•Glob {spec-dir}/**/*.md to find all spec files
•Check for implementation status doc (if project has one)
•Build a map: which specs exist, what components they cover
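
The discovery step above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name `build_spec_map` is hypothetical, and treating Markdown `#` headings as the list of covered components is an assumption about how the specs are organized.

```python
from pathlib import Path
import re

# Matches Markdown ATX headings such as "## Interface".
HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")

def build_spec_map(spec_dir):
    """Map each spec file (path relative to spec_dir) to its headings."""
    root = Path(spec_dir)
    spec_map = {}
    for path in sorted(root.glob("**/*.md")):
        headings = []
        for line in path.read_text(encoding="utf-8").splitlines():
            match = HEADING.match(line)
            if match:
                headings.append(match.group(1))
        spec_map[path.relative_to(root).as_posix()] = headings
    return spec_map
```

The sketch does not skip fenced code blocks, so a `#` comment inside a code sample would be counted as a heading.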

Phase 2: Per-File Audit

For each spec file, check all 7 quality criteria (see Quality Criteria below). Findings must be evidence-based: every finding MUST cite spec file:line AND code file:line (or "no code found").

Phase 3: Cross-File Checks

•SSOT: Detect duplicate content across spec files (same struct defined twice, same protocol described in two places)
•Completeness: Every implemented module (check test dirs and source dirs) should have a corresponding spec section
•Orphaned specs: Spec describes something that was deleted or never implemented
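
The completeness and orphaned-spec checks above reduce to two set differences once the implemented modules and the spec-covered components are known. A minimal sketch, assuming both are available as lists of names (the function name `cross_check` and the module names in the usage note are hypothetical):

```python
def cross_check(implemented, specified):
    """Compare implemented module names against spec-covered names."""
    implemented, specified = set(implemented), set(specified)
    return {
        # Implemented but not described in any spec (Completeness).
        "unspecified": sorted(implemented - specified),
        # Described in a spec but absent from the code (Orphaned specs).
        "orphaned": sorted(specified - implemented),
    }
```

For example, `cross_check(["queue", "reactor"], ["reactor", "timer"])` reports `queue` as unspecified and `timer` as orphaned. Whether an orphaned entry was deleted or never implemented still has to be determined by reading the code and its history.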

Phase 4: Report

Output the COMPLETE findings report. Do not fix anything yet. Present ALL findings grouped by severity so the user sees the full picture before any action:

Severity Decision Tree:

Spec says X, code does Y?           → STALE (highest priority)
Spec says X but X is unclear?       → AMBIGUOUS
Spec doesn't mention feature Z?     → INCOMPLETE
Same info in file A and file B?     → DUPLICATE
Missing required template section?  → TEMPLATE
Formatting inconsistency?           → STYLE (lowest priority)
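
Grouping findings by severity for the report can be sketched as below. This assumes the order of the decision tree is the priority order (only the first and last entries are explicitly marked as highest and lowest); `group_by_severity` is a hypothetical helper name.

```python
# Assumed priority order, taken from the decision tree top to bottom.
SEVERITY_ORDER = ["STALE", "AMBIGUOUS", "INCOMPLETE", "DUPLICATE", "TEMPLATE", "STYLE"]

def group_by_severity(findings):
    """Group (severity, description) pairs, highest severity first.

    Severities with no findings are omitted. An unknown severity
    label raises KeyError rather than being silently dropped.
    """
    groups = {severity: [] for severity in SEVERITY_ORDER}
    for severity, description in findings:
        groups[severity].append(description)
    return [(severity, items) for severity, items in groups.items() if items]
```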

Phase 4b: Interactive Fill (with --interactive)

After the full report is presented, for each TEMPLATE finding (missing required section):

•Show which section is missing and WHY it matters
•Ask the user a targeted question to fill the gap
•Draft the section content from the user's answer
•Collect all drafts — do NOT write yet (writing happens in Phase 5)

Example questions by section:

•Zone: "What is the latency budget for [module]? Is it per-message (<10us), per-cycle (<500us), or event-driven (ms)?"
•Performance Budget: "What are the per-stage timing targets? Which are measured vs design estimates?"
•Testable Behaviors: "List the behaviors of [module] that a test should verify. E.g., 'push to full queue returns false'"
•Error Handling: "What happens when [error scenario]? Does it return error code, log, or crash?"

Phase 5: Discussion & Fix

User has seen the full report. Now discuss and apply changes:

•Walk through findings with user, confirm each proposed action
•Spec fixes (STALE, AMBIGUOUS, DUPLICATE, TEMPLATE, STYLE): batch-apply after confirmation
•Code findings (STALE_CODE, missing concept formalization, etc.): do NOT fix here; create separate tasks or note them in Open Items
•For STYLE: auto-fix if the --fix flag is set, otherwise list the issues
•For INCOMPLETE: suggest what to add, let user decide

Separation of concerns: spec-review modifies SPEC files only. Code changes are out of scope — they become tasks for implementation agents.

Quality Criteria

1. Clarity

Every spec section that describes behavior must be unambiguous enough to write a test from.

Check for:

•Vague quantifiers without numbers: "fast", "low latency", "approximately", "around"
•Missing units on performance claims: "~50" (50 what? ns? us? ms?)
•Conditional behavior without specifying all branches: "if X, then Y" (what if not X?)
•Undefined terms used without context

Evidence format:

AMBIGUOUS: {spec-file}:{line}
  "{quoted text}"
  -> Missing: {what's unclear and why it blocks testing}
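
A first pass over the vague-quantifier check can be automated with a word list. The sketch below is an assumption-laden illustration: the word list contains only the four examples given above, and `find_vague_terms` is a hypothetical name. Hits are candidates for review, not confirmed findings ("around" also matches "wraps around").

```python
import re

# Only the example quantifiers listed under "Check for".
VAGUE = re.compile(r"\b(fast|low latency|approximately|around)\b", re.IGNORECASE)

def find_vague_terms(spec_file, text):
    """Return AMBIGUOUS evidence blocks for lines containing a vague term."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = VAGUE.search(line)
        if match:
            findings.append(
                f'AMBIGUOUS: {spec_file}:{line_no}\n'
                f'  "{line.strip()}"\n'
                f"  -> Missing: '{match.group(0)}' has no number or unit to test against"
            )
    return findings
```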

2. Consistency with Code

Spec and code must be consistent. When they disagree, classify the disagreement — don't assume which side is right.

| Classification | Meaning | Action |
|----------------|---------|--------|
| STALE_SPEC | Code evolved intentionally | Update spec |
| STALE_CODE | Code has bug/shortcut, spec is design target | Flag for code fix |
| EVOLVED | Requirements changed | Update both |
| UNCLEAR | Can't determine | Present both, ask |

Check for:

•Struct definitions in spec vs actual headers (field names, types, sizes, alignment)
•Function signatures in spec vs actual code
•Template parameters in spec vs actual code
•Constants (capacities, limits) in spec vs actual code
•Algorithms described in spec vs actual implementation

How to verify:

•Read CLAUDE.md for source directory layout
•Read the corresponding header/source files
•Compare struct layouts, method signatures, constant values
•Flag any mismatch with both locations
•Determine WHY they disagree (commit history, design docs, code comments)

Evidence format:

DISAGREEMENT: {spec-file}:{line}
  Spec: "{what spec says}"
  Code: {source-file}:{line}
  Mismatch: {what's different}
  Classification: STALE_SPEC | STALE_CODE | EVOLVED | UNCLEAR
  Reasoning: {why you classified it this way}
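
Of the checks above, comparing constants is the easiest to mechanize. The sketch below assumes C++ headers that declare integer constants as `constexpr TYPE NAME = DIGITS;`; the regex, the function name `compare_constants`, and the constant names in the usage note are all hypothetical. It only detects a mismatch. Classifying it (STALE_SPEC, STALE_CODE, EVOLVED, UNCLEAR) still requires reading history and design intent.

```python
import re

# Simplified pattern: plain decimal integer constants only
# (no digit separators, suffixes, or expressions).
CONSTEXPR = re.compile(r"constexpr\s+[\w:]+\s+(\w+)\s*=\s*(\d+)\s*;")

def compare_constants(spec_constants, header_text):
    """Return (name, spec_value, code_value) for each mismatch.

    spec_constants maps constant names to the values stated in the spec.
    code_value is None when the header does not define the constant.
    """
    code = {name: int(value) for name, value in CONSTEXPR.findall(header_text)}
    return [
        (name, expected, code.get(name))
        for name, expected in spec_constants.items()
        if code.get(name) != expected
    ]
```

For a header containing `constexpr std::size_t kCapacity = 1024;` and a spec that states 2048, the result includes `("kCapacity", 2048, 1024)`, which then goes into a DISAGREEMENT block with both locations.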

3. Single Source of Truth (SSOT)

Each piece of information should exist in exactly ONE spec file.

Check for:

•Same struct described in multiple spec files
•Same protocol described in multiple places
•Performance targets stated in multiple files with different values
•Cross-references that copy content instead of linking

Fix pattern: Keep the canonical definition in the more specific file and replace the other copies with a cross-reference link.
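
Detecting the first case (the same struct described in multiple spec files) can be sketched as below, assuming specs embed C-style `struct Name {` definitions. The function name `find_duplicate_structs` is hypothetical, and the sketch matches on struct name only, so it cannot tell an identical copy from a conflicting one.

```python
import re
from collections import defaultdict

STRUCT = re.compile(r"\bstruct\s+(\w+)\s*\{")

def find_duplicate_structs(specs):
    """Return {struct_name: [files]} for structs defined in more than one file.

    specs maps each spec file name to its full text.
    """
    seen = defaultdict(set)
    for spec_file, text in specs.items():
        for name in STRUCT.findall(text):
            seen[name].add(spec_file)
    return {name: sorted(files) for name, files in seen.items() if len(files) > 1}
```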

4. Completeness

Every implemented component should have testable behaviors described in the spec.

Check for:

•Implemented headers without corresponding spec sections
•Spec sections that say "TBD", "TODO", or are empty
•Components with tests but no spec (spec may need to be written retroactively)
•Performance claims without benchmark references
•Edge cases mentioned in code comments but not in spec

5. Currency

Spec reflects the CURRENT state, not historical or planned future state.

Check for:

•"Will be implemented" / "planned" for things that are already done (update to present tense)
•"Will be implemented" / "planned" for things in layers not yet built (acceptable, but mark clearly)
•References to deleted code or renamed types
•Version/date markers that are stale

6. Performance Claims

Every performance number must be grounded.

Grounding levels:

| Source | Example | Trust Level |
|--------|---------|-------------|
| Benchmark data | "P50=4.7ns (see reactor_overhead_bench)" | Measured |
| Hardware baseline | "L1 latency ~1ns (hw_cache_latency_bench)" | Measured |
| Design estimate | "~1.6 ns/loop (fold expression, single comparison)" | Theoretical |
| Spec assertion | "~50 ns read latency" | Ungrounded until measured |

Check for:

•Performance claims without source: state whether it's a design estimate, measured value, or theoretical bound
•Claims contradicted by benchmark data (check test reports and benchmark results)
•Missing benchmark for a claimed number (no corresponding *_bench.cpp)
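
A rough filter for claims that lack a benchmark reference is sketched below. It assumes timing claims are written as a number followed by `ns`, `us`, or `ms`, and that benchmark references follow the `*_bench` naming seen in the examples above; `claims_without_benchmark` is a hypothetical name. A flagged line is not necessarily wrong: it may be a design estimate, which is acceptable when labeled as such.

```python
import re

NUMBER_WITH_UNIT = re.compile(r"\d+(?:\.\d+)?\s*(?:ns|us|ms)\b")
BENCH_REF = re.compile(r"\b\w+_bench\b")

def claims_without_benchmark(text):
    """Return (line_no, line) for timing claims with no *_bench reference."""
    return [
        (line_no, line.strip())
        for line_no, line in enumerate(text.splitlines(), start=1)
        if NUMBER_WITH_UNIT.search(line) and not BENCH_REF.search(line)
    ]
```

Run against the four example strings in the grounding table, this flags the design estimate and the spec assertion and passes the two benchmark-backed claims.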

7. Template Compliance

Every spec file must contain ALL required sections from the Spec Document Template (CLAUDE.md).

Check for:

•Missing required sections (Overview, Zone Classification, Data Structures, Interface, Error Handling, Testable Behaviors, Dependencies)
•HOT/WARM specs missing Performance Budget or Cache Footprint
•Testable Behaviors section missing or empty (blocks /test-plan)
•Zone Classification missing (blocks /low-latency-audit scoping)

Evidence format:

TEMPLATE: {spec-file}
  Missing: [list of missing required sections]
  Zone: [classified zone or UNCLASSIFIED]
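
The template check can be sketched as a comparison of a spec's headings against the required list. The section names come from the list above; the assumption that HOT and WARM specs need both Performance Budget and Cache Footprint, the case-insensitive matching, and the function names are illustrative choices, not rules stated in CLAUDE.md.

```python
REQUIRED = [
    "Overview", "Zone Classification", "Data Structures", "Interface",
    "Error Handling", "Testable Behaviors", "Dependencies",
]
# Assumed to be required for both HOT and WARM zones.
HOT_WARM_EXTRA = ["Performance Budget", "Cache Footprint"]

def missing_sections(headings, zone=None):
    """Return required section names absent from headings (case-insensitive)."""
    required = list(REQUIRED)
    if zone in ("HOT", "WARM"):
        required += HOT_WARM_EXTRA
    present = {heading.strip().lower() for heading in headings}
    return [section for section in required if section.lower() not in present]

def format_template_finding(spec_file, headings, zone=None):
    """Render a TEMPLATE evidence block, or None if nothing is missing."""
    missing = missing_sections(headings, zone)
    if not missing:
        return None
    return (
        f"TEMPLATE: {spec_file}\n"
        f"  Missing: {', '.join(missing)}\n"
        f"  Zone: {zone or 'UNCLASSIFIED'}"
    )
```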

Output Format

# Spec Review Report

**Scope**: [all specs | specific file]
**Date**: YYYY-MM-DD HH:MM

## Summary

| Severity | Count |
|----------|-------|
| STALE | N |
| AMBIGUOUS | N |
| INCOMPLETE | N |
| DUPLICATE | N |
| TEMPLATE | N |
| STYLE | N |

## Findings

### STALE (fix immediately — misleading)

1. **file.md:LINE** — Description
   Code: header.h:LINE — What code actually does
   Proposed fix: ...

### AMBIGUOUS (discuss — blocks testing)

1. **file.md:LINE** — Description
   Impact: Cannot write test for [behavior]
   Proposed fix: ...

[...etc for each severity...]

## Actions Taken

- [x] Fixed: description (auto-fix)
- [ ] Needs discussion: description

Principles

•Disagreement is a finding, not an automatic fix direction: When spec and code disagree, classify first (STALE_SPEC / STALE_CODE / EVOLVED / UNCLEAR). Don't blindly update spec to match code — the code might be wrong.
•Delete, don't archive: Remove stale content. Git has history.
•Compress: A shorter spec that says the same thing is a better spec. Tables > prose. Remove filler.
•No speculation: If you're not sure whether something is stale, read the code. If code doesn't exist, say so.
•One pass, all criteria: Don't scan the file 7 times. Read once, check all criteria simultaneously.