DocumentIndex Skills
Skills for processing, indexing, searching, and extracting information from structured documents such as SEC filings, earnings calls, and research reports.
Quick Decision Guide
Choose the right skill based on your needs:
- Need to process a new document? → document-indexing.md
- Looking for specific content in a document? → node-searching.md
- Have a specific question to answer? → agentic-qa.md
- Need ALL evidence on a topic for compliance/audit? → provenance-extraction.md
Available Patterns
Core Skills
- Document Indexing - Transform unstructured documents into hierarchical tree structures with summaries, metadata, and cross-references
- Node Searching - Find relevant document sections using LLM reasoning instead of vector similarity
- Agentic Question Answering - Answer questions through iterative reasoning with confidence scoring and full reasoning traces
- Provenance Extraction - Exhaustively scan entire documents to find ALL evidence for a topic, with progress tracking
When to Use This Skill
Use DocumentIndex patterns when you need to:
- Process and index financial documents (10-K, 10-Q, 8-K, earnings calls)
- Search for specific information within structured documents
- Answer questions about document content with citations
- Extract comprehensive evidence for compliance, audits, or research
- Navigate complex document hierarchies (PART → ITEM → Section → Note)
- Follow cross-references between document sections
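The hierarchy navigation mentioned above can be pictured as a simple tree walk. The following is a minimal sketch, not DocumentIndex's actual data model: the `Node` class, its fields, and the `find_path` helper are hypothetical names introduced only for illustration, and the section titles are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; not the DocumentIndex node type."""
    title: str
    level: str  # e.g. "PART", "ITEM", "Section", "Note"
    children: list[Node] = field(default_factory=list)


def find_path(root: Node, title: str) -> list[str] | None:
    """Depth-first search returning the titles from the root down to the match."""
    if root.title == title:
        return [root.title]
    for child in root.children:
        sub_path = find_path(child, title)
        if sub_path is not None:
            return [root.title] + sub_path
    return None


# Illustrative 10-K-style hierarchy: PART -> ITEM -> Section -> Note
note = Node("Note 7: Debt", "Note")
section = Node("Notes to Financial Statements", "Section", [note])
item = Node("Item 8. Financial Statements", "ITEM", [section])
part = Node("PART II", "PART", [item])

print(" → ".join(find_path(part, "Note 7: Debt")))
```

Returning the full path rather than just the matching node keeps the surrounding context (which PART and ITEM a note belongs to) available for citations.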
Complexity Levels
- Beginner: Skill 1 (Document Indexing) - Start here to understand the foundation
- Intermediate: Skills 2-3 (Node Searching, Agentic QA) - Build on indexed documents
- Advanced: Skill 4 (Provenance Extraction) - Comprehensive evidence gathering
Learning Path
- Start with Document Indexing - Learn how to transform raw documents into queryable structures
- Explore Node Searching - Understand how to find relevant sections using LLM reasoning
- Try Agentic QA - Answer specific questions with confidence scoring and citations
- Master Provenance Extraction - Extract exhaustive evidence for compliance and research
Recommended progression:
- First-time users: Read skills in order (1 → 2 → 3 → 4)
- Experienced users: Jump to the skill matching your use case via the Quick Decision Guide
Typical Workflow
    1. Index Document (Skill 1)
           ↓
    2. Choose your approach:
       ├─ Quick question? → Agentic QA (Skill 3)
       ├─ Find sections? → Node Searching (Skill 2)
       └─ Need ALL evidence? → Provenance Extraction (Skill 4)
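As a rough illustration of step 2, the choice of approach can be written as a small routing function. This is a sketch under assumptions: the `Need` enum and the `choose_skill` function are hypothetical, and only the skill file names come from the Quick Decision Guide.

```python
from enum import Enum


class Need(Enum):
    """Hypothetical labels for the three branches in step 2."""
    QUICK_QUESTION = "quick question"
    FIND_SECTIONS = "find sections"
    ALL_EVIDENCE = "all evidence"


# Skill files as named in the Quick Decision Guide.
SKILL_FOR_NEED = {
    Need.QUICK_QUESTION: "agentic-qa.md",
    Need.FIND_SECTIONS: "node-searching.md",
    Need.ALL_EVIDENCE: "provenance-extraction.md",
}


def choose_skill(is_indexed: bool, need: Need) -> str:
    """Return the skill to read next; indexing always comes first."""
    if not is_indexed:
        return "document-indexing.md"
    return SKILL_FOR_NEED[need]


print(choose_skill(is_indexed=False, need=Need.QUICK_QUESTION))
print(choose_skill(is_indexed=True, need=Need.ALL_EVIDENCE))
```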
Configuration Trade-offs
Speed vs. Quality
- Fast/Cheap: Lower thresholds, fewer iterations, no summaries
- Balanced: Default configurations (recommended for most use cases)
- Thorough: Higher thresholds, more iterations, with summaries
Coverage vs. Precision
- Maximum coverage: Low relevance thresholds (0.5-0.6)
- Balanced: Medium relevance thresholds (0.6-0.7)
- High precision: High relevance thresholds (0.7-0.8+)
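To see why the threshold shifts results between coverage and precision, consider filtering a set of scored nodes. The sketch below uses made-up section titles and relevance scores; `filter_by_relevance` is a hypothetical helper, not a DocumentIndex function, and it assumes scores between 0 and 1 where a node is kept when its score is at least the threshold.

```python
from __future__ import annotations


def filter_by_relevance(scored_nodes: dict[str, float], threshold: float) -> list[str]:
    """Keep node titles whose relevance score meets the threshold."""
    return [title for title, score in scored_nodes.items() if score >= threshold]


# Made-up scores for illustration only.
scored_nodes = {
    "Item 1A. Risk Factors": 0.92,
    "Item 7. MD&A": 0.74,
    "Item 1. Business": 0.63,
    "Item 9B. Other Information": 0.52,
}

for threshold in (0.5, 0.65, 0.8):
    kept = filter_by_relevance(scored_nodes, threshold)
    print(f"threshold={threshold}: {len(kept)} nodes kept")
```

A low threshold keeps marginal sections (more coverage, more noise), while a high one keeps only the strongest matches.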
Working Examples
See the examples/ directory for complete, runnable code examples:
- indexer_deep_dive.py - Document indexing patterns
- searcher_showcase.py - Node searching strategies
- agentic_qa_tutorial.py - Question answering workflows
- provenance_patterns.py - Evidence extraction patterns
- basic_usage.py - Getting started examples
- caching_example.py - Performance optimization
- streaming_example.py - Streaming responses
- multi_provider_example.py - Multi-LLM support
See examples/README.md for setup instructions and detailed descriptions.
Integration Patterns
Sequential Processing
    DocumentIndexer → NodeSearcher    → AgenticQA
    (Index once)    → (Find sections) → (Answer questions)
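The "index once" part of this pattern can be sketched with a cache in front of the indexing step. Everything below is a stand-in: `index_document`, `search`, and `answer` are hypothetical functions that split and scan plain text, not the DocumentIndexer, NodeSearcher, or AgenticQA APIs.

```python
from functools import lru_cache

index_calls = 0  # counts how often the (expensive) indexing step really runs


@lru_cache(maxsize=None)
def index_document(text: str) -> tuple:
    """Hypothetical indexing step: split text into paragraph 'sections'."""
    global index_calls
    index_calls += 1
    return tuple(part.strip() for part in text.split("\n\n") if part.strip())


def search(text: str, keyword: str) -> list:
    """Hypothetical search step: reuse the cached index."""
    return [s for s in index_document(text) if keyword.lower() in s.lower()]


def answer(text: str, keyword: str) -> str:
    """Hypothetical QA step: return the first matching section."""
    hits = search(text, keyword)
    return hits[0] if hits else "not found"


document = "Revenue grew in 2024.\n\nCyber incidents are a key risk.\n\nDebt matures in 2030."

print(answer(document, "risk"))
print(answer(document, "debt"))
print("index calls:", index_calls)
```

However many searches and questions follow, the indexing function body runs only once per document.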
Parallel Analysis
    DocumentIndexer → ProvenanceExtractor (multiple topics in parallel)
    (Index once)    → (Extract evidence for: climate, cyber, regulatory, etc.)
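Because each topic is independent, the per-topic extractions can share one index and run concurrently. The following sketch uses Python's standard `concurrent.futures` module; `extract_evidence` and the `indexed_doc` dictionary are hypothetical stand-ins (a simple keyword scan over made-up text), not the ProvenanceExtractor API.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_evidence(indexed_doc: dict, topic: str) -> list:
    """Hypothetical per-topic extraction: keyword scan over section text."""
    return [
        title
        for title, text in indexed_doc["sections"].items()
        if topic in text.lower()
    ]


# Stand-in for a document that has already been indexed once.
indexed_doc = {
    "sections": {
        "Risk Factors": "Climate and cyber risks may affect operations.",
        "Legal Proceedings": "Regulatory inquiries are ongoing.",
        "Business": "We sell software.",
    }
}

topics = ["climate", "cyber", "regulatory"]

# Run one extraction per topic against the same index; map() preserves topic order.
with ThreadPoolExecutor(max_workers=3) as pool:
    evidence_lists = pool.map(lambda t: extract_evidence(indexed_doc, t), topics)
    results = dict(zip(topics, evidence_lists))

for topic, sections in results.items():
    print(topic, "->", sections)
```

Threads suit this shape of work when each extraction mostly waits on network calls to an LLM provider; that is an assumption about the workload, not something the pattern requires.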
Hybrid Approach
    DocumentIndexer → AgenticQA (quick questions)
           ↓
           → ProvenanceExtractor (comprehensive evidence when needed)
Performance Considerations
| Component | Time | Cost | Use Case |
|---|---|---|---|
| Document Indexing | 30-60s | $0.10-0.50 | One-time per document |
| Node Searching | 2-5s | $0.01-0.05 | Per search query |
| Agentic QA | 5-15s | $0.02-0.10 | Per question |
| Provenance Extraction | 30-90s | $0.05-0.20 | Per topic extraction |
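
The per-call figures in the table can be combined into a rough session estimate. This is a back-of-the-envelope sketch: the `estimate_cost` helper and the example session are hypothetical, and the cost ranges are simply copied from the table.

```python
from __future__ import annotations

# (low, high) cost per call in USD, copied from the table.
COST_USD = {
    "indexing": (0.10, 0.50),
    "search": (0.01, 0.05),
    "qa": (0.02, 0.10),
    "provenance": (0.05, 0.20),
}


def estimate_cost(calls: dict[str, int]) -> tuple[float, float]:
    """Return the (low, high) total cost in USD for the given call counts."""
    low = sum(COST_USD[name][0] * count for name, count in calls.items())
    high = sum(COST_USD[name][1] * count for name, count in calls.items())
    return round(low, 2), round(high, 2)


# Example session: index one document, ask 10 questions, extract 3 topics.
low, high = estimate_cost({"indexing": 1, "qa": 10, "provenance": 3})
print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

Since indexing is a one-time cost per document, the per-question and per-topic terms dominate as a session grows.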
Additional Resources
- Source Code: /documentindex package
- Documentation: Project README
- Main Skills README: ../README.md
Decision Framework Summary
    ┌─────────────────────────────────────────┐
    │         What do you need to do?         │
    └─────────────────────────────────────────┘
                         │
            ┌────────────┼────────────┐
            │            │            │
            ▼            ▼            ▼
         New Doc?  Find Content?  Answer Q?
            │            │            │
            ▼            ▼            ▼
         Indexing    Searching       QA
        (Skill 1)    (Skill 2)    (Skill 3)
                         │
                Need ALL evidence?
                         │
                         ▼
                    Provenance
                     (Skill 4)
Last Updated: 2026-02-01
Skills Version: 1.0
Compatible with: DocumentIndex v0.1.0+