Discovery Scan: Extract & Synthesize
Purpose
Extract symbols from the codebase using the rai discover scan CLI command, then synthesize meaningful descriptions for each component. The output is a draft component catalog ready for human validation.
Mastery Levels (ShuHaRi)
Shu (守): Follow all steps, synthesize descriptions for all public symbols.
Ha (破): Filter to public APIs only; skip internal helpers automatically.
Ri (離): Custom synthesis prompts for domain-specific codebases.
Context
When to use:
- •After /rai-discover-start has created context
- •When refreshing component descriptions after code changes
- •For targeted scan of specific directory
When to skip:
- •work/discovery/components-draft.yaml exists and is current
- •Just need to re-validate existing descriptions
Inputs required:
- •work/discovery/context.yaml from /rai-discover-start
- •OR explicit path argument for targeted scan
Output:
- •work/discovery/components-draft.yaml: Draft components with synthesized descriptions
- •Ready for /rai-discover-validate
Steps
Step 1: Load Discovery Context
Read the context file to determine scan scope:
Read: work/discovery/context.yaml
Extract:
- •languages: Which extractors to use
- •root_dirs: Directories to scan
Alternative: If user provides explicit path, use that instead of context.
Verification: Context loaded OR explicit path provided.
If you can't continue: No context and no path → Run /rai-discover-start first.
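The scope resolution in this step can be sketched in Python. This is a minimal sketch, not the skill's actual implementation: `resolve_scan_dirs` is a hypothetical helper, and it assumes the context file has already been parsed into a dict that carries a `root_dirs` list.

```python
from typing import Optional


def resolve_scan_dirs(context: Optional[dict], explicit_path: Optional[str] = None) -> list[str]:
    """Return the directories to scan, preferring an explicit path over context."""
    # An explicit path argument overrides whatever the context file says.
    if explicit_path:
        return [explicit_path]
    # Otherwise fall back to root_dirs from the parsed context (assumed shape).
    if context and context.get("root_dirs"):
        return list(context["root_dirs"])
    # Neither source is available, so the skill cannot continue.
    raise ValueError("No context and no path: run /rai-discover-start first")
```

Raising an error when both inputs are missing mirrors the "If you can't continue" rule rather than silently scanning a default directory.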
Step 2: Run Extraction
Execute the rai discover scan command:
# For each root_dir in context
rai discover scan {root_dir} --language {language} --output json
Example:
rai discover scan src/rai_cli --language python --output json
Capture the JSON output — it contains all extracted symbols.
Verification: JSON output received with symbols array.
If you can't continue: Scan fails → Check path exists and language is supported.
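If the scan is driven from a script rather than typed by hand, the invocation above could be wrapped as follows. This is a sketch under assumptions: `build_scan_command` and `run_scan` are hypothetical helper names, only the flags shown in this step are used, and the JSON is assumed to arrive on standard output.

```python
import json
import subprocess
from typing import Any


def build_scan_command(root_dir: str, language: str) -> list[str]:
    """Build the argv list for one `rai discover scan` invocation."""
    return ["rai", "discover", "scan", root_dir, "--language", language, "--output", "json"]


def run_scan(root_dir: str, language: str) -> Any:
    """Run the scan and parse its stdout as JSON (assumes JSON is printed to stdout)."""
    result = subprocess.run(
        build_scan_command(root_dir, language),
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit status raises CalledProcessError
    )
    return json.loads(result.stdout)
```

Passing the command as a list avoids shell quoting problems when a directory name contains spaces.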
Step 3: Run Analysis
Run the deterministic analyzer on the scan output to compute confidence scores, auto-categorize components, fold methods into parent classes, and group by module:
rai discover analyze --input {scan_output_json} --output human
Or pipe directly:
rai discover scan {root_dir} --language {language} --output json | rai discover analyze --output human
This produces:
- •Confidence scores for each component (high/medium/low tiers)
- •Auto-categorization from path conventions and naming patterns
- •Hierarchical folding (methods grouped under parent classes)
- •Module grouping (for parallel AI synthesis batches)
- •work/discovery/analysis.json: the primary artifact for /rai-discover-validate
Review the summary output:
- •High-confidence components can be auto-validated (no AI synthesis needed)
- •Medium-confidence components need AI synthesis in module batches
- •Low-confidence components need individual human review
Verification: work/discovery/analysis.json exists with components and module_groups.
If you can't continue: Analysis fails → Check scan output is valid JSON.
Step 4: Synthesize Descriptions (Medium/Low Confidence Only)
High-confidence components (score >= 70): Use auto_purpose and auto_category from the analyzer — no AI synthesis needed.
Medium and low-confidence components: Synthesize descriptions using the module groups from the analysis. Process each module group as a batch:
For each module group in analysis.json:
- •Read all components in that module
- •Synthesize purpose and category for components that lack good auto_purpose
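The split between components that keep their analyzer output and those that need synthesis can be sketched as below. The threshold of 70 comes from this step; the `score` field name and the `split_by_confidence` helper are assumptions for illustration, since the exact layout of analysis.json is not shown here.

```python
HIGH_CONFIDENCE_THRESHOLD = 70  # from this step: score >= 70 counts as high confidence


def split_by_confidence(components: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split components into (auto_described, needs_synthesis)."""
    auto_described: list[dict] = []
    needs_synthesis: list[dict] = []
    for component in components:
        # "score" is an assumed field name; a missing score is treated as 0.
        if component.get("score", 0) >= HIGH_CONFIDENCE_THRESHOLD:
            auto_described.append(component)
        else:
            needs_synthesis.append(component)
    return auto_described, needs_synthesis
```

Treating a missing score as low confidence errs toward review, which seems the safer default for a draft catalog.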
Synthesis approach per component:
Given:
name: {name}
kind: {kind} (class/function)
signature: {signature}
docstring: {docstring} # may be None for low-confidence
file: {file}
auto_category: {category} # from analyzer
auto_purpose: {purpose} # from docstring first sentence, may be empty
Synthesize:
- •Purpose — What does it do? Why does it exist? (1-2 sentences, focus on reuse value)
- •Category — Verify or correct the auto_category
- •Dependencies — Key types/classes it depends on (from signature)
Quality criteria:
- •Purpose is actionable (describes what, not how)
- •Purpose highlights reuse value
- •Category matches the symbol's role
- •Dependencies are specific, not generic (BaseModel, not pydantic)
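One way to assemble the "Given" block for a single component is sketched below. `format_synthesis_input` is a hypothetical helper, and the dict keys are assumed to match the field names listed in this step.

```python
def format_synthesis_input(component: dict) -> str:
    """Render the 'Given' block for one component as plain text."""
    # Fields copied from the scan output; a missing docstring renders as None.
    fields = ["name", "kind", "signature", "docstring", "file"]
    lines = [f"{key}: {component.get(key)}" for key in fields]
    # Analyzer hints; auto_purpose may be empty for low-confidence components.
    lines.append(f"auto_category: {component.get('auto_category')}")
    lines.append(f"auto_purpose: {component.get('auto_purpose') or ''}")
    return "\n".join(lines)
```

Keeping the rendering in one function makes it easier to hold the input format constant across a module batch.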
Step 5: Generate Component IDs
Create unique IDs for each component:
Pattern: comp-{module}-{name}
- •module = file stem without extension, lowercase
- •name = symbol name, lowercase, hyphens for underscores
Examples:
- •Symbol in scanner.py → comp-scanner-symbol
- •scan_directory in scanner.py → comp-scanner-scan-directory
- •ConceptNode in models.py → comp-models-conceptnode
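The ID pattern can be expressed as a small function. This is a sketch, with `component_id` as a hypothetical name. Stripping leading and trailing underscores is an assumption inferred from the draft YAML in Step 6, where `_get_signature` maps to `comp-scanner-get-signature`.

```python
from pathlib import PurePosixPath


def component_id(file: str, name: str) -> str:
    """Build comp-{module}-{name} from a source path and a symbol name."""
    # module: file stem without extension, lowercased.
    module = PurePosixPath(file).stem.lower()
    # name: lowercased, outer underscores dropped (assumption), inner ones become hyphens.
    symbol = name.strip("_").lower().replace("_", "-")
    return f"comp-{module}-{symbol}"
```

Because only the file stem is used, two files with the same name in different directories would produce colliding IDs, so uniqueness may need an extra check on larger codebases.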
Step 6: Write Draft YAML
Create work/discovery/components-draft.yaml:
# work/discovery/components-draft.yaml
# Generated by /rai-discover-scan
# Review with /rai-discover-validate
generated_at: {ISO_TIMESTAMP}
source_command: "rai discover scan {path} --language {lang}"
symbol_count: {N}
components:
  # Module: src/rai_cli/discovery/scanner.py
  - id: comp-scanner-symbol
    name: Symbol
    kind: class
    file: src/rai_cli/discovery/scanner.py
    line: 44
    signature: "class Symbol(BaseModel)"
    docstring: |
      A code symbol extracted from source.
      ...
    # Synthesized by Rai:
    purpose: "Represents a code symbol extracted from source files. Core data structure for discovery output."
    category: model
    depends_on:
      - pydantic.BaseModel
    internal: false
    validated: false
  - id: comp-scanner-get-signature
    name: _get_signature
    kind: function
    file: src/rai_cli/discovery/scanner.py
    line: 96
    signature: "def _get_signature(node: ast.ClassDef | ast.FunctionDef | ast.AsyncFunctionDef) -> str"
    docstring: "Extract signature from an AST node."
    purpose: "Internal helper for signature extraction."
    category: utility
    depends_on:
      - ast.ClassDef
      - ast.FunctionDef
    internal: true
    validated: false
Write the file using the Write tool.
Verification: File created at work/discovery/components-draft.yaml.
Step 7: Display Summary
Present scan results to user:
## Discovery Scan Complete
**Scanned:** {path}
**Language:** {language}
**Symbols found:** {total}
- Classes: {N}
- Functions: {N}
- Methods: {N}
**Output:** `work/discovery/components-draft.yaml`
### Component Categories
- Models: {N}
- Services: {N}
- Utilities: {N}
- ...
### Internal (skipped for validation): {N}
### Next Step
Run `/rai-discover-validate` to review and approve component descriptions.
Verification: Summary displayed; user knows next step.
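The counts in the summary template can be derived from the draft components. The sketch below assumes each component is a dict with the `kind`, `category`, and `internal` fields from the Component Schema; `summarize` is a hypothetical helper name.

```python
from collections import Counter


def summarize(components: list[dict]) -> dict:
    """Compute the totals shown in the scan summary."""
    kinds = Counter(c["kind"] for c in components)
    categories = Counter(c["category"] for c in components)
    # Internal components are counted separately because validation skips them.
    internal = sum(1 for c in components if c.get("internal"))
    return {
        "total": len(components),
        "kinds": dict(kinds),
        "categories": dict(categories),
        "internal": internal,
    }
```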
Output
- •Artifacts:
  - •work/discovery/analysis.json: Deterministic analysis with confidence scores and module groups
  - •work/discovery/components-draft.yaml: Draft components with synthesized descriptions
- •Telemetry: skill_event via Stop hook
- •Next: /rai-discover-validate
Component Schema
id: string # Unique ID (comp-{module}-{name})
name: string # Symbol name
kind: string # class | function | method | module
file: string # Relative path to source
line: number # Line number (1-indexed)
signature: string # Full signature
docstring: string # Original docstring (if present)
purpose: string # Synthesized description (1-2 sentences)
category: string # service | model | utility | handler | parser | builder | schema | command | test
depends_on: string[] # Key dependencies
internal: boolean # True if private/internal
validated: boolean # True after human review (false initially)
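For tooling that consumes the draft, the schema above could be mirrored as a Python dataclass with basic checks. This is an illustrative sketch only: the `Component` class and its validation rules are assumptions drawn from the schema comments, not the project's actual model.

```python
from dataclasses import dataclass, field

KINDS = {"class", "function", "method", "module"}
CATEGORIES = {
    "service", "model", "utility", "handler", "parser",
    "builder", "schema", "command", "test",
}


@dataclass
class Component:
    id: str
    name: str
    kind: str
    file: str
    line: int
    signature: str
    purpose: str
    category: str
    docstring: str = ""
    depends_on: list[str] = field(default_factory=list)
    internal: bool = False
    validated: bool = False  # stays False until human review

    def __post_init__(self) -> None:
        # Reject values outside the enumerations listed in the schema.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        # Line numbers are 1-indexed.
        if self.line < 1:
            raise ValueError("line must be >= 1")
```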
Category Definitions
| Category | Description | Examples |
|---|---|---|
| service | Business logic, orchestration | UserService, AuthHandler |
| model | Data structures, schemas | User, ConceptNode, Symbol |
| utility | Helper functions, tools | format_date, parse_yaml |
| handler | Request/event handlers | handle_request, on_click |
| parser | Parsing, extraction | parse_markdown, extract_yaml |
| builder | Construction, generation | build_graph, create_report |
| schema | Validation, type definitions | UserSchema, ConfigModel |
| command | CLI commands | scan, query, emit |
| test | Test utilities | fixtures, mocks |
Notes
Synthesis Quality
Good synthesis focuses on reuse value:
- •What problem does this solve?
- •When would someone use this?
- •What does it integrate with?
Bad synthesis describes implementation:
- •"Uses a loop to iterate..."
- •"Calls the database..."
Handling Large Codebases
For codebases with >100 symbols:
- •Scan in chunks (by directory)
- •Focus on public APIs first
- •Use --exclude patterns for tests/vendors
Targeted Scans
To scan a specific path, pass it as an argument:
/rai-discover-scan src/specific/module
This overrides context.yaml root_dirs for this scan.
References
- •Previous skill: /rai-discover-start
- •Next skill: /rai-discover-validate
- •CLI docs: rai discover scan --help
- •Design: work/stories/f13.3/design.md