Spec Extractor

Extract a pure requirements specification from a codebase, capturing what the project does rather than how it's built. The output SPEC.md serves as a portable blueprint: an agent can rebuild the project from scratch using only this file, ideally producing simpler, cleaner code with the same features.

Modes

Mode	Flag	Behavior
Tech-preserving (default)	(none)	Section 6 lists the specific stack by name: languages, frameworks, databases, services
Tech-agnostic	`--tech-agnostic`	Section 6 describes abstract capabilities: "persistent relational storage", "server-side rendering framework"

Analysis Workflow

Execute these four phases sequentially. Read the directory tree first, then selectively read representative files — do NOT read every file.
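The "tree first, then selective reads" approach can be sketched as follows. This is a minimal illustration, not part of the skill itself: the `SKIP_DIRS` set, the `max_depth` and `per_dir` limits, and both function names are assumptions chosen for the example.

```python
import os
from pathlib import Path

# Assumed (not from the source): directories that rarely carry requirements.
SKIP_DIRS = {".git", "node_modules", "dist", "build", "__pycache__", ".venv", "target"}


def directory_tree(root, max_depth=3):
    """Return relative directory paths up to max_depth, skipping SKIP_DIRS."""
    root = Path(root)
    tree = []
    for current, dirs, _files in os.walk(root):
        rel = Path(current).relative_to(root)
        # Prune in place so os.walk does not descend into skipped directories.
        dirs[:] = sorted(d for d in dirs if d not in SKIP_DIRS)
        if len(rel.parts) >= max_depth:
            dirs[:] = []
        if rel.parts:
            tree.append(rel.as_posix())
    return sorted(tree)


def representative_files(root, per_dir=2):
    """Pick at most per_dir files from each directory instead of reading them all."""
    root = Path(root)
    picks = []
    for current, dirs, files in os.walk(root):
        dirs[:] = sorted(d for d in dirs if d not in SKIP_DIRS)
        for name in sorted(files)[:per_dir]:
            picks.append((Path(current) / name).relative_to(root).as_posix())
    return picks
```

Taking the first files alphabetically is only a placeholder for "representative"; in practice the choice would be guided by the project type identified in Phase 2.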

Phase 1: Project Identity

Purpose: Establish what the project IS.

Read: README, CLAUDE.md, package manifests (package.json, pyproject.toml, Cargo.toml, go.mod), .env.example, docker-compose.yml, config files.

Extract:

•Project name and one-line purpose
•Target users / audience
•Core value proposition
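As a rough sketch of the Phase 1 reading step, the snippet below finds which of the listed identity files exist at the project root and pulls a name and one-line purpose from `package.json` when present. The function names and the choice to look only at the root directory are assumptions for illustration; other manifests (pyproject.toml, Cargo.toml, go.mod) would need their own parsers.

```python
import json
from pathlib import Path

# File names taken from the Phase 1 reading list above.
IDENTITY_FILES = {
    "CLAUDE.md", "package.json", "pyproject.toml", "Cargo.toml",
    "go.mod", ".env.example", "docker-compose.yml",
}


def identity_sources(root):
    """List identity files present at the project root (README in any extension)."""
    found = []
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_file():
            continue
        if entry.name in IDENTITY_FILES or entry.name.upper().startswith("README"):
            found.append(entry.name)
    return found


def package_json_identity(root):
    """Read name and description from package.json; return None if it is absent."""
    manifest = Path(root) / "package.json"
    if not manifest.is_file():
        return None
    data = json.loads(manifest.read_text(encoding="utf-8"))
    return {"name": data.get("name"), "purpose": data.get("description")}
```

Target users and the value proposition usually come from the README prose, so they still need a human or agent reading rather than parsing.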

Phase 2: Feature Discovery

Purpose: Discover what the project DOES.

Read by project type:

•API/Web: Routes, controllers, middleware, pages, components
•CLI: Command parsers, subcommands, argument definitions
•Library: Public API surface, exported functions/classes
•Full-stack: Both API and frontend layers

Method: Read the directory tree first to understand the structure, then read representative files from each functional area. Group features by user-facing domain, not by code organization.

Phase 3: Data & Integrations

Purpose: Map data entities and external dependencies.

Read: Schema/migration files, ORM models/entities, .env.example for third-party services, auth middleware, API client configurations.

Extract:

•Data entities and their relationships (not column-level schema)
•External services and their purpose (database, cache, email, payment, auth providers)
•Which integrations are required vs optional
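One way to mine `.env.example` for third-party services is sketched below. It parses assignment lines into keys and groups them by the prefix before the first underscore. The grouping is a heuristic assumed for this example (keys such as `STRIPE_SECRET_KEY` and `STRIPE_WEBHOOK_SECRET` tend to belong to one service); it does not decide whether an integration is required or optional.

```python
import re
from collections import defaultdict

# Matches KEY=value, with an optional leading "export".
ENV_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=(.*)$")


def parse_env_example(text):
    """Return {KEY: example_value} for each assignment; comments and blanks are ignored."""
    keys = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = ENV_LINE.match(line)
        if match:
            keys[match.group(1)] = match.group(2).strip()
    return keys


def group_by_prefix(keys):
    """Heuristic (an assumption): a shared prefix often indicates one external service."""
    groups = defaultdict(list)
    for key in keys:
        groups[key.split("_", 1)[0]].append(key)
    return dict(groups)
```

The same key list feeds Section 7 (Configuration & Environment), while the groups are candidate rows for Section 5 (External Integrations).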

Phase 4: Behavioral Verification

Purpose: Confirm completeness by cross-referencing against tests.

Read: Test file names and descriptions (not test implementation), error types/messages, configuration keys.

Verify:

•Every test-described behavior appears in the feature list
•Error scenarios are reflected in acceptance criteria
•All configuration keys are documented

SPEC.md Output Template

Write the specification using the 9 sections below, in this order and with these headings. Scale the document to project complexity: ~1 page for a CLI tool, 3-5 pages for a full-stack app. Omit sections that don't apply (e.g., skip Non-Functional Requirements for simple projects), but do not add sections of your own.

markdown

# SPEC.md — {Project Name}

## 1. Purpose

{1-3 sentences: what the project does, who it's for, why it exists.}

## 2. User-Facing Features

{Group by domain. Describe behaviors only — what users can do, not how it works internally.}

### {Domain Group}

- {Feature behavior description}
- {Feature behavior description}

## 3. User Flows

{Numbered steps from the user's perspective. Primary flows only — trust the rebuilding agent for edge cases.}

### {Flow Name}

1. {User action}
2. {System response}
3. {Next step}

## 4. Data Entities

| Entity | Description | Relationships |
|--------|-------------|---------------|
| {Name} | {What it represents} | {How it relates to other entities} |

## 5. External Integrations

| Service | Purpose | Required |
|---------|---------|----------|
| {Name} | {What it's used for} | Yes/No |

## 6. Technology Constraints

{Tech-preserving: list specific stack by name.}
{Tech-agnostic: describe abstract capabilities needed.}

## 7. Configuration & Environment

| Key | Purpose | Required |
|-----|---------|----------|
| {KEY_NAME} | {What it controls} | Yes/No |

## 8. Non-Functional Requirements

{Only include if relevant. Examples: performance targets, security requirements, accessibility standards.}

## 9. Acceptance Criteria

{Checkbox list of testable pass/fail behaviors. Every feature and integration should have at least one criterion.}

- [ ] {Testable behavior statement}
- [ ] {Testable behavior statement}

Writing Rules

Follow these rules strictly when drafting SPEC.md:

•Describe behaviors, not mechanisms — "Users can reset their password via email" not "The PasswordResetController sends a token using SendGrid"
•User-perspective language — write from what a user sees and does, not what the code does internally
•No implementation names — omit file names, class names, function names, database column names (except in Section 6 when using tech-preserving mode)
•Group by domain — organize features by what they mean to users, not by how code is structured
•Scale to project size — a CLI tool gets a concise 1-page spec; a full-stack app gets 3-5 pages
•High-level only — capture requirements at a level where a competent agent can fill in the details during rebuild

Self-Review Checklist

Before saving SPEC.md, verify every item:

• No implementation details leaked (no file names, class names, function names, library names outside Section 6)
• Every discovered feature from Phase 2 appears in the spec
• Acceptance criteria cover all features listed in Section 2
• Acceptance criteria cover all integrations listed in Section 5
• Section 6 matches selected mode (tech-preserving or tech-agnostic)
• Document length is proportional to project complexity
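The first checklist item can be partly automated. The sketch below splits a drafted SPEC.md on its `## N. Title` headings and flags text outside Section 6 that looks like a source-file name or a function call. Both regular expressions are heuristics assumed for this example; they will miss class and library names, so a manual read-through is still needed.

```python
import re

SECTION_HEADING = re.compile(r"^## (\d+)\. .*$", re.MULTILINE)
# Heuristic patterns (assumptions): source-file names and call-like identifiers.
LEAK_PATTERNS = [
    re.compile(r"\b[\w/-]+\.(?:py|js|ts|tsx|go|rs|java|rb|php)\b"),
    re.compile(r"\b[A-Za-z_]\w*\(\)"),
]


def split_sections(spec_text):
    """Map section number -> body text, keyed on '## N. Title' headings."""
    matches = list(SECTION_HEADING.finditer(spec_text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(spec_text)
        sections[int(m.group(1))] = spec_text[m.end():end]
    return sections


def find_leaks(spec_text):
    """Return (section_number, matched_text) pairs found outside Section 6."""
    leaks = []
    for number, body in sorted(split_sections(spec_text).items()):
        if number == 6:
            continue  # Section 6 may name the stack in tech-preserving mode.
        for pattern in LEAK_PATTERNS:
            leaks.extend((number, m.group(0)) for m in pattern.finditer(body))
    return leaks
```

An empty result does not prove the spec is clean; it only means these two patterns found nothing.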

Workflow

•Detect project type:

bash

uv run shared/detect_project.py --path "$(pwd)"

•Parse arguments: Check $ARGUMENTS for the `--tech-agnostic` flag and an optional path
•Execute analysis: Run Phases 1-4 sequentially, reading the directory tree first, then selected files
•Draft SPEC.md: Follow the output template, applying writing rules
•Self-review: Walk through the checklist above, fix any violations
•Write SPEC.md: Save to the project root
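The "Parse arguments" step might look like the sketch below, which treats `$ARGUMENTS` as a single raw string. The function name, the `spec-extractor` program name, and the default path of `.` are assumptions for illustration; only the `--tech-agnostic` flag and the optional path come from the steps above.

```python
import argparse
import shlex


def parse_arguments(raw):
    """Split a raw argument string and return (mode, path)."""
    parser = argparse.ArgumentParser(prog="spec-extractor")  # hypothetical name
    parser.add_argument("--tech-agnostic", action="store_true")
    parser.add_argument("path", nargs="?", default=".")  # assumed default
    args = parser.parse_args(shlex.split(raw))
    mode = "tech-agnostic" if args.tech_agnostic else "tech-preserving"
    return mode, args.path
```

With no arguments this yields the default tech-preserving mode, matching the Modes table.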