Context Extractor Skill

You are an expert parser specializing in extracting structured context from Product Requirements Documents (PRDs). You excel at parsing markdown tables and converting them into machine-readable JSON format.

When to Use This Skill

•Extracting context from PRD files for implementation
•Parsing "All Needed Context" sections
•Converting PRD context into structured data
•Preparing context bundles for /flow:generate-prp
•Providing context to /flow:implement and /flow:validate

Input Format

This skill accepts a file path to a PRD markdown file as input. The PRD must contain an "All Needed Context" section with the following subsections:

•Code Files - Source code files relevant to the feature
•Docs / Specs - Related documentation and specifications
•Examples - Example files demonstrating patterns
•Gotchas / Prior Failures - Known pitfalls and lessons learned
•External Systems / APIs - External dependencies and integrations

Parsing Instructions

1. Locate the "All Needed Context" Section

Search for the markdown heading ## All Needed Context in the PRD file. All content between this heading and the next H2 heading (##) is part of the context section.

2. Parse Each Subsection

For each subsection (H3 heading ###), parse the markdown table that follows:

Code Files Table Format

markdown

| File Path | Purpose | Read Priority |
|-----------|---------|---------------|
| `path/to/file` | Description | High/Medium/Low |

Extract into:

json

{
  "path": "path/to/file",
  "purpose": "Description",
  "priority": "High|Medium|Low"
}

Docs / Specs Table Format

markdown

| Document | Link | Key Sections |
|----------|------|--------------|
| Doc Name | `docs/path` or URL | Sections |

Extract into:

json

{
  "title": "Doc Name",
  "link": "docs/path or URL",
  "key_sections": "Sections"
}

Examples Table Format

markdown

| Example | Location | Relevance to This Feature |
|---------|----------|---------------------------|
| Example Name | `examples/path` | Description |

Extract into:

json

{
  "name": "Example Name",
  "location": "examples/path",
  "relevance": "Description"
}

Gotchas / Prior Failures Table Format

markdown

| Gotcha | Impact | Mitigation | Source |
|--------|--------|------------|--------|
| Issue | What happens | How to fix | Reference |

Extract into:

json

{
  "issue": "Issue",
  "impact": "What happens",
  "mitigation": "How to fix",
  "source": "Reference"
}

External Systems / APIs Table Format

markdown

| System / API | Type | Documentation | Notes |
|--------------|------|---------------|-------|
| System Name | REST/GraphQL/etc | Link | Details |

Extract into:

json

{
  "name": "System Name",
  "type": "REST|GraphQL|gRPC|Database|etc",
  "documentation": "Link",
  "notes": "Details"
}

3. Handle Empty Sections

If a subsection table has only headers (no data rows), or if the subsection is missing entirely, return an empty array [] for that section.

4. Clean Up Markdown Formatting

•Remove backticks from file paths and code references
•Trim whitespace from all fields
•Convert inline code markers to plain text
•Preserve newlines in multi-line fields as \n

Output Format

Return a JSON object with the following structure:

json

{
  "code_files": [
    {
      "path": "src/flowspec_cli/commands/specify.py",
      "purpose": "Main implementation of /flow:specify command",
      "priority": "High"
    }
  ],
  "docs_specs": [
    {
      "title": "Spec-Driven Development Guide",
      "link": "docs/guides/sdd-guide.md",
      "key_sections": "Section 3: Context Management"
    }
  ],
  "examples": [
    {
      "name": "User Authentication Flow",
      "location": "examples/auth/login.py",
      "relevance": "Shows proper session handling pattern"
    }
  ],
  "gotchas": [
    {
      "issue": "Race condition in concurrent writes",
      "impact": "Data corruption under high load",
      "mitigation": "Use database transactions with proper isolation",
      "source": "task-123"
    }
  ],
  "external_systems": [
    {
      "name": "GitHub API",
      "type": "REST",
      "documentation": "https://docs.github.com/rest",
      "notes": "Rate limit: 5000 req/hour, requires PAT"
    }
  ]
}

Error Handling

If the PRD file cannot be read or parsed:

•Return an error object: {"error": "Description of error"}
•Include the file path in the error message
•Suggest remediation steps if applicable

Common Error Cases

•File not found: {"error": "PRD file not found: {path}. Verify the file exists."}
•No context section: {"error": "PRD missing 'All Needed Context' section. Add section to PRD."}
•Malformed table: {"error": "Malformed table in section '{section_name}'. Check markdown syntax."}

Usage Example

Input PRD Excerpt

markdown

## All Needed Context

### Code Files

| File Path | Purpose | Read Priority |
|-----------|---------|---------------|
| `src/flowspec_cli/commands/specify.py` | Main /flow:specify implementation | High |
| `templates/prd-template.md` | PRD template structure | Medium |

### Docs / Specs

| Document | Link | Key Sections |
|----------|------|--------------|
| SDD Guide | `docs/guides/sdd-guide.md` | Context Management |

### Examples

| Example | Location | Relevance to This Feature |
|---------|----------|---------------------------|
| Login Flow | `examples/auth/login.py` | Session handling pattern |

### Gotchas / Prior Failures

| Gotcha | Impact | Mitigation | Source |
|--------|--------|------------|--------|
| Race condition | Data corruption | Use transactions | task-123 |

### External Systems / APIs

| System / API | Type | Documentation | Notes |
|--------------|------|---------------|-------|
| GitHub API | REST | https://docs.github.com/rest | 5000 req/hour limit |

Output JSON

json

{
  "code_files": [
    {
      "path": "src/flowspec_cli/commands/specify.py",
      "purpose": "Main /flow:specify implementation",
      "priority": "High"
    },
    {
      "path": "templates/prd-template.md",
      "purpose": "PRD template structure",
      "priority": "Medium"
    }
  ],
  "docs_specs": [
    {
      "title": "SDD Guide",
      "link": "docs/guides/sdd-guide.md",
      "key_sections": "Context Management"
    }
  ],
  "examples": [
    {
      "name": "Login Flow",
      "location": "examples/auth/login.py",
      "relevance": "Session handling pattern"
    }
  ],
  "gotchas": [
    {
      "issue": "Race condition",
      "impact": "Data corruption",
      "mitigation": "Use transactions",
      "source": "task-123"
    }
  ],
  "external_systems": [
    {
      "name": "GitHub API",
      "type": "REST",
      "documentation": "https://docs.github.com/rest",
      "notes": "5000 req/hour limit"
    }
  ]
}

Integration Points

/flow:implement

Uses extracted context to:

•Identify files to read before implementation
•Prioritize reading order (High → Medium → Low)
•Discover related documentation
•Warn about gotchas early

/flow:generate-prp

Uses extracted context to:

•Build comprehensive context bundles
•Include all relevant files and docs
•Attach examples for reference
•Warn about known failure modes

/flow:validate

Uses extracted context to:

•Verify all referenced files exist
•Check that documentation is up-to-date
•Validate against known gotchas
•Test external system integrations

Validation Checklist

After parsing, verify:

• All five sections present in output (even if empty)
• File paths are clean (no backticks or extra quotes)
• Priorities are valid (High/Medium/Low only)
• JSON is valid and properly formatted
• No markdown artifacts in extracted text
• Empty sections return [] not null

Quality Standards

•Accuracy: Preserve exact meanings from PRD
•Completeness: Extract all rows from all tables
•Cleanliness: Remove markdown formatting artifacts
•Consistency: Use consistent field names and structure
•Robustness: Handle missing sections gracefully