Objective
Reconstruct specifications from a repository's codebase, existing documentation, and git history. You're doing archaeology — finding the features buried in code and docs, then surfacing them as structured specs.
A spec is warranted when code implements user-facing functionality or business logic. Not every module needs a spec. Your job is to find the functional boundaries and document them.
Process
- •Gather — Read existing specs, docs, code, git history, and mapping state
- •Classify code — Distinguish functional code (needs specs) from architectural (needs ADRs) from utility (needs neither)
- •Map code to specs — Identify existing coverage, gaps, and conflicts
- •Ask — Surface unknowns, request clarification. No guessing.
- •Generate — Produce/amend specs only after confirmation
- •Verify — Ensure no relevant legacy content dropped
- •Archive — Move superseded specs to
_archiveafter confirmation - •Persist — Update mapping and changelog
Maintain state in files so work isn't lost if the conversation ends.
1. Code Classification
Functional (needs spec coverage)
- •User-facing features and workflows
- •Business logic and domain rules
- •API endpoints that serve external consumers
- •Data transformations with business meaning
- •Integration points with business significance
Architectural (needs ADR coverage, separate skill)
- •Infrastructure decisions (deployment, CI/CD)
- •Technology choices (frameworks, libraries)
- •Cross-cutting concerns (auth, logging, caching)
- •System boundaries and module structure
Utility (needs neither)
- •Helper functions and utilities
- •Generated code
- •Test infrastructure
- •Dev tooling
- •Pure technical plumbing
Judgment calls
When uncertain, ask: "If a product manager asked 'what does this do for users?', would there be a meaningful answer?" If yes, it's functional.
2. Hierarchy Relationship
Build hierarchy (source of truth, time-oriented)
specs/build/ 0.md # Vision — why the product exists 1.1.md # Epic — large user-facing goal 1.1.1.md # User Story — specific user need changelog.md
Build captures intent — what we set out to do and when.
Functional hierarchy (derived view, navigation-oriented)
specs/functional/ 0.md # Product description — what it is today 1.1.md # Domain — bounded context 1.1.1.md # Feature — specific capability changelog.md
Functional captures current state — what exists now and how it's organized.
Relationship
- •Build generates Functional: epics/stories become domains/features when implemented
- •Functional gaps trigger build investigation: missing feature → check epic history
- •During backfill: populate build first, derive functional from it
- •Both can exist independently, but functional without build loses historical context
Numbering convention
Hierarchical numbering reflects parent-child relationships:
- •
1.1.md= first child of root - •
1.1.1.md= first child of 1.1 - •
1.2.md= second child of root (sibling of 1.1)
3. Coverage Analysis
Mapping code to specs
For each functional code path, determine:
- •Does a spec exist? → Check alignment
- •No spec exists? → Flag as gap
- •Spec exists but conflicts with code? → Flag as conflict
Coverage report structure
Coverage Report =============== Covered: src/billing/invoice.py → specs/functional/1.2.1.md (Feature: Invoicing) Gap: src/notifications/email.py → No spec (appears to be email notifications) Conflict: src/auth/login.py → specs/functional/1.1.1.md (code has MFA, spec doesn't mention it)
Conflict types
- •Spec ahead of code: Spec describes feature not yet implemented
- •Code ahead of spec: Code has functionality not in spec
- •Semantic mismatch: Both exist but describe different behavior
- •Stale spec: Spec describes removed functionality
4. Gates (when to stop and ask)
Code classification ambiguity
"I'm unsure whether
src/lib/pricing_engine.pyis functional (business logic) or utility (calculation helper). It computes discounts based on customer tier. How do you see it?"
Spec mapping unclear
"Code in
src/orders/fulfillment.pyhandles shipping, returns, and tracking. Should this map to one feature or three?"
Business intent unknown
"This module processes webhook events from Stripe. I can see what it does but not why — what business need does this serve?"
Conflict resolution
"Spec says users can have max 5 projects. Code enforces max 10. Which is correct — should I update the spec or flag this as a bug?"
Gap prioritization
"I found 12 code paths without specs. Should I generate specs for all, or which are highest priority?"
Always ask before generating a spec. Present your analysis, let the user confirm or correct.
5. Spec Generation
From code (gap filling)
When generating a spec for undocumented code:
- •Extract observable behavior from code
- •Identify inputs, outputs, side effects
- •Note business rules and constraints
- •Ask user for: purpose, user story, acceptance criteria
- •Generate spec only after confirmation
From existing docs (migration)
When migrating legacy docs to spec structure:
- •Identify content that maps to features
- •Preserve all relevant information
- •Ask user to confirm nothing important is dropped
- •Archive original in
specs/_archive
6. Changelog Reconstruction
From git history
Parse git history of spec documents to rebuild changelog:
- •Identify creation dates (first commit)
- •Track significant modifications (not typo fixes)
- •Note status transitions (draft → active, active → deprecated)
- •Ask for clarification on ambiguous commits
Changelog format
# Changelog ## [YYYY-MM-DD] - Added: Feature 1.2.3 (Email notifications) - Modified: Feature 1.1.1 (Login) — added MFA requirement - Deprecated: Feature 1.3.1 (Legacy export) ## [YYYY-MM-DD] - ...
Ambiguous changes
When commit message doesn't clarify intent:
"Commit abc123 modified specs/functional/1.2.1.md significantly but message just says 'updates'. What changed and why?"
7. Verification
Before finalizing any run:
- •Coverage check — All functional code has spec mapping
- •Content preservation — No relevant legacy content dropped
- •Consistency check — No internal contradictions in specs
- •Link integrity — All cross-references (ADRs, code paths) valid
Legacy content handling
When archiving old specs:
- •Diff old vs. new content
- •Surface any content in old not in new
- •Ask user: "This content exists in legacy but not in new spec: [content]. Should it be added, or is it obsolete?"
- •Only archive after confirmation
8. Location
specs/
build/
0.md # Vision
1.1.md # Epic
1.1.1.md # User Story
...
changelog.md
functional/
0.md # Product description
1.1.md # Domain
1.1.1.md # Feature
...
changelog.md
spec-mapping.yaml # Code ↔ spec mapping (permanent)
spec-backfill-state.yaml # Session state (temporary)
_archive/ # Superseded specs (preserve history)
One document per node. One changelog per hierarchy.
Quality Bar
A good backfilled spec:
- •Could have been written before the code was built
- •Describes what and for whom, not implementation details
- •Stands alone (reader doesn't need to read the code)
- •Is honest about what's confirmed vs. inferred
- •Links to relevant ADRs for why decisions were made
A bad backfilled spec:
- •Just describes the code structure
- •Invents requirements the user didn't confirm
- •Mixes multiple features in one document
- •Contains implementation details instead of behavior
- •Conflicts with the actual code
Anti-patterns to Avoid
- •Inventing requirements: If you don't know the business need, ask. Don't fabricate.
- •Generating without code validation: Every spec claim must be verifiable against code.
- •Guessing business intent: Code shows what, not why for users. Ask.
- •Dropping legacy content silently: Always surface what's being removed for confirmation.
- •Over-specifying implementation: Specs describe behavior, not code structure.
- •Ignoring conflicts: Surface mismatches explicitly, don't paper over them.
Getting Started
When the user invokes this skill:
- •Check for existing state (
specs/spec-backfill-state.yaml), offer to resume - •If starting fresh, begin with code classification
- •Show the user your classification for validation
- •Map existing specs to code, surface gaps and conflicts
- •Ask which gaps to prioritize
- •Generate specs one at a time, confirming each before proceeding
- •Verify no legacy content lost before archiving
- •Update mapping and changelog
Reference
Mapping schema (permanent)
# specs/spec-mapping.yaml
version: 1
repository: "git@github.com:org/repo.git"
last_updated: "2024-01-15T14:22:00Z"
code_classification:
# path → {type: functional | architectural | utility, confidence: confirmed | inferred}
"src/billing/invoice.py": {type: functional, confidence: confirmed}
"src/lib/kafka_client.py": {type: architectural, confidence: confirmed}
"src/utils/formatting.py": {type: utility, confidence: inferred}
mappings:
# code_path → spec mapping
"src/billing/invoice.py":
spec_path: "specs/functional/1.2.1.md"
spec_title: "Invoicing"
status: aligned # aligned | gap | conflict | stale
last_verified: "2024-01-15"
confidence: confirmed # confirmed | inferred-pending-review
conflict_details: null
"src/notifications/email.py":
spec_path: null
spec_title: null
status: gap
last_verified: "2024-01-15"
confidence: inferred-pending-review
inferred_feature: "Email notifications"
"src/auth/login.py":
spec_path: "specs/functional/1.1.1.md"
spec_title: "User Login"
status: conflict
last_verified: "2024-01-15"
confidence: confirmed
conflict_details: "Code has MFA, spec doesn't mention it"
specs:
# spec_path → metadata
"specs/functional/1.2.1.md":
title: "Invoicing"
status: active # draft | active | deprecated | removed
last_verified: "2024-01-15"
linked_adrs: ["ADR-0012", "ADR-0015"]
code_paths: ["src/billing/invoice.py", "src/billing/line_items.py"]
"specs/build/1.2.md":
title: "Billing Epic"
status: active
last_verified: "2024-01-15"
linked_adrs: ["ADR-0012"]
child_specs: ["specs/build/1.2.1.md", "specs/build/1.2.2.md"]
archive:
# archived_path → metadata
"specs/_archive/old-billing-spec.md":
original_path: "docs/billing.md"
archived_at: "2024-01-15"
reason: "Migrated to specs/functional/1.2.x"
content_verified: true
Session state schema (temporary)
# specs/spec-backfill-state.yaml version: 1 started_at: "2024-01-15T10:30:00Z" last_updated: "2024-01-15T14:22:00Z" processing_cursor: step: "mapping" # classification | mapping | gap_analysis | generation | verification | archiving current_path: "src/notifications/email.py" substep: "awaiting_user_input" configuration: functional_roots: ["src/"] exclude_patterns: ["**/tests/**", "**/migrations/**"] spec_output_dir: "specs" require_confirmation: true
Spec templates
Vision (build/0.md)
# Vision ## Purpose [Why this product exists] ## Target Users [Who this is for] ## Core Value Proposition [What problem it solves] ## Success Metrics [How we measure success] --- *Status: active* *Last verified: YYYY-MM-DD*
Epic (build/1.x.md)
# [Epic Name] ## Goal [What user-facing outcome this achieves] ## User Stories - 1.x.1 — [Story title] - 1.x.2 — [Story title] ## Success Criteria [How we know this epic is complete] ## Related ADRs - ADR-NNNN — [Decision title] --- *Status: active* *Last verified: YYYY-MM-DD*
User Story (build/1.x.x.md)
# [Story Title] ## User Story As a [role], I want [capability] so that [benefit]. ## Acceptance Criteria - [ ] Criterion 1 - [ ] Criterion 2 ## Implementation - `src/path/to/module.py` — [what it does] --- *Status: active* *Last verified: YYYY-MM-DD* *Parent epic: 1.x*
Product Description (functional/0.md)
# [Product Name] ## Overview [What the product is and does] ## Domains - 1.1 — [Domain name] - 1.2 — [Domain name] ## Key Integrations [External systems this connects to] --- *Status: active* *Last verified: YYYY-MM-DD*
Domain (functional/1.x.md)
# [Domain Name] ## Description [What this bounded context covers] ## Features - 1.x.1 — [Feature name] - 1.x.2 — [Feature name] ## Boundaries [What's in scope vs. out of scope] --- *Status: active* *Last verified: YYYY-MM-DD*
Feature (functional/1.x.x.md)
# [Feature Name] ## Status - Status: active - Last verified: YYYY-MM-DD - Linked ADRs: ADR-NNNN ## Description [What this feature does for users] ## Behavior [Observable behavior, rules, constraints] ## Code References - `src/path/to/module.py` — [what it implements] --- *Parent domain: 1.x*