Phase 1: Ingest
Read and catalog all input materials to prepare for analysis.
Trigger
Execute when:
- Starting Ontology Builder Pipeline
- New documents added to _input/
- Re-processing requested
Process
Step 1: Validate Input Structure
Check _input/ folder:
code
Required:
  ✓ project-context.md (MUST exist)
Optional but valuable:
  ○ domain-hints.md
  ○ requirements/*.md
  ○ interviews/*.md
  ○ existing-docs/*
  ○ references/*
If project-context.md is missing:
→ STOP and request it from the user
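The check in Step 1 could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual implementation: the function name `validate_input` and the shape of the returned report are assumptions, and only the file names and patterns come from the list above.

```python
from pathlib import Path

# Names and patterns taken from the Step 1 listing.
REQUIRED = ["project-context.md"]
OPTIONAL_PATTERNS = [
    "domain-hints.md",
    "requirements/*.md",
    "interviews/*.md",
    "existing-docs/*",
    "references/*",
]


def validate_input(input_dir):
    """Return (ok, report) for an _input/ folder (hypothetical helper).

    ok is False when a required file is missing, which corresponds to
    the STOP condition in Step 1.
    """
    root = Path(input_dir)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    optional = {
        pattern: sorted(
            p.relative_to(root).as_posix()
            for p in root.glob(pattern)
            if p.is_file()
        )
        for pattern in OPTIONAL_PATTERNS
    }
    return (not missing), {"missing_required": missing, "optional": optional}
```

A caller would stop and ask the user for input whenever `ok` is false, and could use the `optional` map to note which valuable inputs are absent.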
Step 2: Read Project Context
Extract from project-context.md:
- Project name, module, domain
- Region (for regulatory context)
- Current state and pain points
- Stakeholders
- Constraints
Store as context_vars for later phases.
Step 3: Catalog All Files
For each file in _input/:
yaml
file_entry:
  path: [relative path]
  type: [user-story | interview | existing-doc | reference | other]
  format: [md | pdf | docx | txt | image]
  size: [file size]
  summary: [2-3 sentence summary of content]
  key_topics: [list of main topics]
  entities_mentioned: [list of nouns that might be entities]
  workflows_mentioned: [list of processes/actions mentioned]
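As a rough sketch, the mechanical fields of file_entry (path, type, format, size) could be filled in as shown below, leaving the summary and topic fields for the reading pass. The folder-to-type mapping, the set of image suffixes, and the names `FileEntry` and `catalog` are assumptions made for illustration; the source only defines the field names and allowed values.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Assumption: the type is inferred from the top-level folder name.
TYPE_BY_FOLDER = {
    "requirements": "user-story",
    "interviews": "interview",
    "existing-docs": "existing-doc",
    "references": "reference",
}
# Assumption: these suffixes are reported as format "image".
IMAGE_SUFFIXES = {"png", "jpg", "jpeg", "gif", "svg"}


@dataclass
class FileEntry:
    path: str
    type: str
    format: str
    size: int  # bytes
    summary: str = ""
    key_topics: list = field(default_factory=list)
    entities_mentioned: list = field(default_factory=list)
    workflows_mentioned: list = field(default_factory=list)


def catalog(input_dir):
    """Build one FileEntry per file under input_dir (hypothetical helper)."""
    root = Path(input_dir)
    entries = []
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        rel = p.relative_to(root)
        suffix = p.suffix.lower().lstrip(".")
        # Suffixes outside the documented list are passed through unchanged.
        fmt = "image" if suffix in IMAGE_SUFFIXES else suffix
        # Files directly in _input/ have no folder to infer a type from.
        ftype = TYPE_BY_FOLDER.get(rel.parts[0], "other") if len(rel.parts) > 1 else "other"
        entries.append(
            FileEntry(path=rel.as_posix(), type=ftype, format=fmt, size=p.stat().st_size)
        )
    return entries
```

The descriptive fields default to empty so that the Gate 1 check for files without a summary has something concrete to test.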
Step 4: Identify File Relationships
Map relationships between files:
- Which files reference each other?
- Which files cover same topics?
- Which files contradict each other?
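The first two questions lend themselves to a simple mechanical pass, sketched below under stated assumptions: a "reference" is taken to mean that one file's text mentions another file's name, and topic overlap is a case-insensitive match on key_topics. The function `map_relationships` and its input shape are hypothetical. Contradictions are not detected here, since they require actually reading and comparing the content.

```python
from itertools import combinations
from pathlib import PurePosixPath


def map_relationships(files):
    """Find filename mentions and shared topics between files.

    files: dict mapping relative path -> {"text": str, "key_topics": list of str}
    (an assumed shape, not defined by the source).
    """
    references = []
    shared_topics = []
    for a, b in combinations(sorted(files), 2):
        # Check both directions: does src mention dst's file name?
        for src, dst in ((a, b), (b, a)):
            if PurePosixPath(dst).name in files[src]["text"]:
                references.append((src, dst))
        topics_a = {t.lower() for t in files[a]["key_topics"]}
        topics_b = {t.lower() for t in files[b]["key_topics"]}
        common = sorted(topics_a & topics_b)
        if common:
            shared_topics.append((a, b, common))
    return {"references": references, "shared_topics": shared_topics}
```

Pairs that share topics are the natural candidates to review by hand for contradictions.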
Step 5: Assess Input Quality
Rate input completeness:
| Aspect | Score | Notes |
|---|---|---|
| Requirements coverage | [1-5] | [notes] |
| Domain context | [1-5] | [notes] |
| Stakeholder input | [1-5] | [notes] |
| Examples/scenarios | [1-5] | [notes] |
Step 6: Generate Ingestion Report
Output: _output/_logs/ingestion-report.md
markdown
# Ingestion Report

**Generated**: [timestamp]
**Project**: [from context]
**Module**: [from context]

## Input Summary

| Category | Files | Status |
|----------|-------|--------|
| Project Context | 1 | ✓ Valid |
| Requirements | [N] | [status] |
| Interviews | [N] | [status] |
| Existing Docs | [N] | [status] |
| References | [N] | [status] |

## File Catalog
[List all files with summaries]

## Preliminary Extraction

### Potential Entities (Raw)
[List all nouns that appear to be domain entities]

### Potential Workflows (Raw)
[List all processes/actions mentioned]

### Potential Business Rules (Raw)
[List any rules or constraints mentioned]

## Quality Assessment
[Quality scores and notes]

## Gaps Identified
[List what seems to be missing]

## Recommendations
[Suggestions for Phase 2]
Output
- _output/_logs/ingestion-report.md
- _output/_logs/gate-1-manifest.yaml (verification manifest)
- Ready for Phase 2: Analyze
Gate 1: Self-Verification
Before completing Phase 1, generate verification manifest:
yaml
# _output/_logs/gate-1-manifest.yaml
gate: 1
name: "Post-Ingest Verification"
timestamp: "[ISO timestamp]"
structural_checks:
  - check: "ingestion-report.md exists"
    status: PASS
  - check: "All input files cataloged"
    status: PASS | FAIL
    details:
      input_files_count: [N]
      cataloged_count: [N]
      missing: []
  - check: "project-context.md processed"
    status: PASS | FAIL
consistency_checks:
  - check: "No duplicate entries"
    status: PASS | FAIL
  - check: "All file paths valid"
    status: PASS | FAIL
traceability_checks:
  - check: "Each file has summary"
    status: PASS | FAIL
    files_without_summary: []
result:
  status: PASS | FAIL | WARN
  blocking_failures: []
  warnings: []
  proceed_to_next_phase: true | false
Verification Rule: Proceed to Phase 2 only if the result status is PASS or WARN.
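One way the result block might be derived from the individual checks is sketched below. The source does not say which failures are blocking, so this sketch assumes that any check with status FAIL is blocking and that WARN arises only from separately collected warnings. The function name `evaluate_gate` is hypothetical, and the manifest is assumed to be already loaded into a dict.

```python
# Check groups named in the manifest template.
CHECK_GROUPS = ("structural_checks", "consistency_checks", "traceability_checks")


def evaluate_gate(manifest, warnings=()):
    """Compute the manifest's result block (hypothetical helper).

    Assumption: every FAIL is a blocking failure; warnings are passed in
    by the caller, for example for skipped or truncated files.
    """
    failures = [
        check["check"]
        for group in CHECK_GROUPS
        for check in manifest.get(group, [])
        if check.get("status") == "FAIL"
    ]
    if failures:
        status = "FAIL"
    elif warnings:
        status = "WARN"
    else:
        status = "PASS"
    return {
        "status": status,
        "blocking_failures": failures,
        "warnings": list(warnings),
        # Mirrors the verification rule: proceed on PASS or WARN only.
        "proceed_to_next_phase": status in ("PASS", "WARN"),
    }
```

Keeping `proceed_to_next_phase` as a pure function of `status` means the two fields cannot disagree in the written manifest.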
Error Handling
| Issue | Action |
|---|---|
| No project-context.md | STOP, request from user |
| Empty _input/ folder | STOP, request input files |
| Unreadable file | Log warning, skip file |
| Very large file (>100KB) | Summarize first 10K, note truncation |
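The last two rows of the table could be handled along these lines. This is a sketch under assumptions: "10K" is read here as 10,000 characters (the table does not state a unit), files are assumed to be UTF-8 text, and `read_for_summary` is a hypothetical helper name.

```python
from pathlib import Path

LARGE_FILE_BYTES = 100 * 1024  # the ">100KB" threshold from the table
EXCERPT_CHARS = 10_000         # assumption: "10K" interpreted as characters


def read_for_summary(path):
    """Return (text, warnings); text is None when the file is skipped."""
    p = Path(path)
    try:
        size = p.stat().st_size
        text = p.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        # Unreadable file: log a warning and skip it.
        return None, [f"unreadable, skipped: {p.name} ({type(exc).__name__})"]
    warnings = []
    if size > LARGE_FILE_BYTES:
        # Very large file: keep only the first excerpt and note the truncation.
        text = text[:EXCERPT_CHARS]
        warnings.append(f"truncated to first {EXCERPT_CHARS} characters: {p.name}")
    return text, warnings
```

The returned warnings can be collected across files and recorded in the ingestion report and the gate manifest.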
Next Phase
After completing ingestion:
→ Load phase-2-analyze/SKILL.md
→ Pass ingestion-report.md as input