Documentation Discovery & Analysis
Overview
Intelligent discovery and analysis of technical documentation through multiple strategies:
- llms.txt-first: Search for standardized, AI-friendly documentation
- Repository analysis: Use Repomix to analyze GitHub repositories
- Parallel exploration: Deploy multiple Explorer agents for comprehensive coverage
- Fallback research: Use Researcher agents when the other methods are unavailable
Core Workflow
Phase 1: Initial Discovery
Identify target
- Extract the library/framework name from the user request
- Note version requirements (default: latest)
- Clarify scope if ambiguous
- Determine whether the target is a GitHub repository or a website
Search for llms.txt (PRIORITIZE context7.com)
First: Try context7.com patterns
For GitHub repositories:
Pattern: https://context7.com/{org}/{repo}/llms.txt
Examples:
- https://github.com/imagick/imagick → https://context7.com/imagick/imagick/llms.txt
- https://github.com/vercel/next.js → https://context7.com/vercel/next.js/llms.txt
- https://github.com/better-auth/better-auth → https://context7.com/better-auth/better-auth/llms.txt
For websites:
Pattern: https://context7.com/websites/{normalized-domain-path}/llms.txt
Examples:
- https://docs.imgix.com/ → https://context7.com/websites/imgix/llms.txt
- https://docs.byteplus.com/en/docs/ModelArk/ → https://context7.com/websites/byteplus_en_modelark/llms.txt
- https://docs.haystack.deepset.ai/docs → https://context7.com/websites/haystack_deepset_ai/llms.txt
- https://ffmpeg.org/doxygen/8.0/ → https://context7.com/websites/ffmpeg_doxygen_8_0/llms.txt
Topic-specific searches (when the user asks about a specific feature):
Pattern: https://context7.com/{path}/llms.txt?topic={query}
Examples:
- https://context7.com/shadcn-ui/ui/llms.txt?topic=date
- https://context7.com/shadcn-ui/ui/llms.txt?topic=button
- https://context7.com/vercel/next.js/llms.txt?topic=cache
- https://context7.com/websites/ffmpeg_doxygen_8_0/llms.txt?topic=compress
Fallback: Traditional llms.txt search
WebSearch: "[library name] llms.txt site:[docs domain]"
Common patterns:
- https://docs.[library].com/llms.txt
- https://[library].dev/llms.txt
- https://[library].io/llms.txt

→ Found? Proceed to Phase 2.
→ Not found? Proceed to Phase 3.
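The search order above can be sketched as a small Bash helper that prints candidate llms.txt URLs, most preferred first. This is a minimal sketch, assuming the GitHub URL has the form `https://github.com/{org}/{repo}[/...]` and that the topic is plain ASCII; `urlencode` and `llms_candidates` are hypothetical names. It deliberately does not derive the `/websites/...` slug, because the normalization rule is only shown by example.

```bash
# Percent-encode a query value (assumes ASCII input).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;                 # unreserved: keep as-is
      *) out+=$(printf '%%%02X' "'$c") ;;           # everything else: %XX
    esac
  done
  printf '%s' "$out"
}

# Print candidate llms.txt URLs in priority order:
#   1. context7.com (GitHub targets only), optionally narrowed with ?topic=
#   2. the traditional patterns on the library's own domains
# Usage: llms_candidates <library-name> [github-url] [topic]
llms_candidates() {
  local lib="$1" repo_url="${2:-}" topic="${3:-}"
  if [[ "$repo_url" == https://github.com/* ]]; then
    local slug="${repo_url#https://github.com/}"
    slug="${slug%/}"
    local org="${slug%%/*}" rest="${slug#*/}"
    local repo="${rest%%/*}"                        # drop /tree/... suffixes
    repo="${repo%.git}"
    local url="https://context7.com/${org}/${repo}/llms.txt"
    [[ -n "$topic" ]] && url+="?topic=$(urlencode "$topic")"
    printf '%s\n' "$url"
  fi
  printf 'https://docs.%s.com/llms.txt\n' "$lib"
  printf 'https://%s.dev/llms.txt\n' "$lib"
  printf 'https://%s.io/llms.txt\n' "$lib"
}
```

For example, `llms_candidates nextjs https://github.com/vercel/next.js cache` prints `https://context7.com/vercel/next.js/llms.txt?topic=cache` first, followed by the three traditional guesses.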
Phase 2: llms.txt Processing
Single URL:
- Use WebFetch to retrieve the content
- Extract and present the relevant information
Multiple URLs (3+):
- CRITICAL: Launch multiple Explorer agents in parallel
- Assign one agent per major documentation section (max 5 in the first batch)
- Each agent reads its assigned URLs
- Aggregate the findings into a consolidated report
Example:
Launch 3 Explorer agents simultaneously:
- Agent 1: getting-started.md, installation.md
- Agent 2: api-reference.md, core-concepts.md
- Agent 3: examples.md, best-practices.md
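Choosing between the single-URL and multi-URL paths means counting the links an llms.txt file points to. A minimal sketch, assuming the file uses Markdown links of the form `[title](https://...)`; `llms_links` is a hypothetical helper, and `curl` stands in for WebFetch in the commented example.

```bash
# Read llms.txt content on stdin and print each distinct http(s) link target
# once, in order of first appearance. Any parenthesized URL matches, which
# covers Markdown link targets such as [Intro](https://...).
llms_links() {
  grep -oE '[(]https?://[^) ]+[)]' \
    | sed -E 's/^[(]//; s/[)]$//' \
    | awk '!seen[$0]++'
}

# Example (requires network access):
#   url=https://context7.com/vercel/next.js/llms.txt
#   count=$(curl -fsSL "$url" | llms_links | wc -l)
#   if (( count >= 3 )); then echo "fan out to parallel Explorer agents"; fi
```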
Phase 3: Repository Analysis
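When no llms.txt is found:
- Find the GitHub repository via WebSearch
- Use Repomix to pack the repository:
```bash
# Install Repomix globally only if it is not already on PATH
command -v repomix >/dev/null || npm install -g repomix
# Shallow clone: commit history is not needed for documentation analysis
git clone --depth 1 [repo-url] /tmp/docs-analysis
cd /tmp/docs-analysis
# Pack the whole repository into a single AI-friendly file
repomix --output repomix-output.xml
```
- Read repomix-output.xml and extract the documentation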
Repomix benefits:
- Entire repository in a single AI-friendly file
- Preserves directory structure
- Optimized for AI consumption
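Extracting the documentation from the packed file can start by listing the Markdown-like files it contains. A minimal sketch, assuming Repomix's XML output wraps each file as a `<file path="...">` line, the file body, and a closing `</file>` line; `list_doc_files` and `extract_file` are hypothetical helpers.

```bash
# List documentation-like files (.md, .mdx, .rst) recorded in a Repomix pack.
list_doc_files() {
  local packed="${1:-repomix-output.xml}"
  grep -oE '<file path="[^"]+\.(md|mdx|rst)"' "$packed" \
    | sed -E 's/^<file path="//; s/"$//'
}

# Print the body of one packed file, e.g. extract_file repomix-output.xml docs/intro.md
extract_file() {
  local packed="$1" target="$2"
  awk -v start="<file path=\"$target\">" '
    index($0, start) == 1 { inside = 1; next }   # opening tag for the target
    inside && $0 == "</file>" { exit }           # stop at its closing tag
    inside { print }
  ' "$packed"
}
```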
Phase 4: Fallback Research
When no GitHub repository exists:
- Launch multiple Researcher agents in parallel
- Focus areas: official docs, tutorials, API references, community guides
- Aggregate findings into a consolidated report
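The fan-out/fan-in shape of this phase can be illustrated with background shell jobs. This is only an analogy, not how agents are actually launched: `research_area` is a hypothetical stand-in for one Researcher agent, and results are collected in launch order so the consolidated report is deterministic.

```bash
# Hypothetical stand-in for one Researcher agent covering one focus area.
research_area() {
  local library="$1" area="$2"
  printf '## %s\n(findings about %s would go here)\n' "$area" "$library"
}

# Run one background job per focus area, wait for all, then aggregate.
research_all() {
  local library="$1"; shift
  local workdir area i=0 j pids=()
  workdir=$(mktemp -d)
  for area in "$@"; do
    research_area "$library" "$area" > "$workdir/$i.md" &
    pids+=("$!")
    i=$((i + 1))
  done
  wait "${pids[@]}"
  # Concatenate in launch order, not completion order.
  for (( j = 0; j < i; j++ )); do
    cat "$workdir/$j.md"
  done
  rm -rf "$workdir"
}
```

Usage: `research_all "Acme SDK" "official docs" "tutorials" "API references" "community guides"`.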
Agent Distribution Guidelines
- 1-3 URLs: single Explorer agent
- 4-10 URLs: 3-5 Explorer agents (2-3 URLs each)
- 11+ URLs: 5-7 Explorer agents (prioritize the most relevant URLs)
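One way to turn these ranges into a concrete plan is sketched below. The specific choice inside each range (about 2 URLs per agent for 4-10 URLs, about 3 for 11+, clamped to the range bounds) is an assumption, as are the helper names `agent_count` and `assign_urls`. URLs are expected on stdin, already sorted by relevance.

```bash
# Pick an Explorer-agent count within the guideline ranges.
agent_count() {
  local n="$1" k
  if (( n <= 3 )); then
    k=1
  elif (( n <= 10 )); then
    k=$(( (n + 1) / 2 ))                     # ~2 URLs per agent
    (( k < 3 )) && k=3; (( k > 5 )) && k=5
  else
    k=$(( (n + 2) / 3 ))                     # ~3 URLs per agent
    (( k < 5 )) && k=5; (( k > 7 )) && k=7
  fi
  printf '%s\n' "$k"
}

# Split URLs (one per line on stdin) into per-agent groups, one group per line.
assign_urls() {
  local n="$1" agents per
  agents=$(agent_count "$n")
  per=$(( (n + agents - 1) / agents ))       # ceiling division
  xargs -n "$per"
}
```

Usage: `printf '%s\n' "${urls[@]}" | assign_urls "${#urls[@]}"`. Because the 4-10 range asks for at least three agents at two to three URLs each, very small inputs (such as 4 URLs) cannot satisfy both constraints; in that case this sketch produces fewer groups than `agent_count` reports.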
Version Handling
Latest (default):
- Search without a version specifier
- Use current documentation paths
Specific version:
- Include the version in searches: [library] v[version] llms.txt
- Check versioned paths: /v[version]/llms.txt
- For repositories: check out the specific tag or branch
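The version-specific lookups above can be sketched as follows. The helper names are hypothetical, and trying both `v1.2.3` and `1.2.3` tags is an assumption, since projects differ in how they name release tags; `git clone --branch` accepts a tag name and checks out that commit.

```bash
# Print the versioned WebSearch query and the versioned llms.txt path.
version_queries() {
  local lib="$1" version="$2" base="$3"
  printf '%s v%s llms.txt\n' "$lib" "$version"       # search query
  printf '%s/v%s/llms.txt\n' "${base%/}" "$version"  # versioned path
}

# Shallow-clone a repository at a release tag, with and without a "v" prefix.
clone_version() {
  local repo_url="$1" version="$2" dest="$3" tag
  for tag in "v$version" "$version"; do
    git clone --depth 1 --branch "$tag" "$repo_url" "$dest" 2>/dev/null && return 0
  done
  return 1
}
```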
Output Format
```markdown
# Documentation for [Library] [Version]

## Source
- Method: [llms.txt / Repository / Research]
- URLs: [list of sources]
- Date accessed: [current date]

## Key Information
[Extracted relevant information organized by topic]

## Additional Resources
[Related links, examples, references]

## Notes
[Any limitations, missing information, or caveats]
```
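Filling in the report skeleton can be scripted so the Source section is never skipped. A minimal sketch: `render_report` is a hypothetical helper, every argument is caller-supplied, and the access date comes from the local clock. The content sections are left empty for the findings.

```bash
# Usage: render_report <library> <version> <method> <url>...
render_report() {
  local library="$1" version="$2" method="$3"; shift 3
  local urls
  urls=$(printf '%s, ' "$@"); urls="${urls%, }"   # comma-separated source list
  printf '# Documentation for %s %s\n\n' "$library" "$version"
  printf '## Source\n'
  printf -- '- Method: %s\n' "$method"
  printf -- '- URLs: %s\n' "$urls"
  printf -- '- Date accessed: %s\n\n' "$(date +%Y-%m-%d)"
  printf '## Key Information\n\n## Additional Resources\n\n## Notes\n'
}
```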
Quick Reference
Tool selection:
- WebSearch → Find llms.txt URLs, GitHub repositories
- WebFetch → Read single documentation pages
- Task (Explore) → Multiple URLs, parallel exploration
- Task (Researcher) → Scattered documentation, diverse sources
- Repomix → Complete codebase analysis
Popular llms.txt locations (try context7.com first):
- Astro: https://context7.com/withastro/astro/llms.txt
- Next.js: https://context7.com/vercel/next.js/llms.txt
- Remix: https://context7.com/remix-run/remix/llms.txt
- shadcn/ui: https://context7.com/shadcn-ui/ui/llms.txt
- Better Auth: https://context7.com/better-auth/better-auth/llms.txt
Fall back to official sites if context7.com is unavailable:
- Astro: https://docs.astro.build/llms.txt
- Next.js: https://nextjs.org/llms.txt
- Remix: https://remix.run/llms.txt
- SvelteKit: https://kit.svelte.dev/llms.txt
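Falling back from context7.com to an official site amounts to trying URLs in order and keeping the first that responds. A minimal sketch with hypothetical helper names; `fetch_url` wraps `curl` (`-f` treats HTTP errors as failures) so it can be swapped for another fetcher or a test stub.

```bash
# Fetch one URL; non-zero exit on network or HTTP errors.
fetch_url() {
  curl -fsSL --max-time 20 "$1"
}

# Print the first URL that can be fetched, trying them in the order given.
first_available() {
  local url
  for url in "$@"; do
    if fetch_url "$url" > /dev/null 2>&1; then
      printf '%s\n' "$url"
      return 0
    fi
  done
  return 1
}
```

Usage: `first_available https://context7.com/withastro/astro/llms.txt https://docs.astro.build/llms.txt`.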
Error Handling
- llms.txt not accessible → Try alternative domains → Repository analysis
- Repository not found → Search official website → Use Researcher agents
- Repomix fails → Try /docs directory only → Manual exploration
- Multiple conflicting sources → Prioritize official → Note versions
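The "Repomix fails → Try /docs directory only" step can be expressed as a retry with Repomix's `--include` glob filter. A minimal sketch: `pack_docs` is a hypothetical helper, and the include patterns are an assumption about where a given project keeps its documentation.

```bash
# Pack the whole repository; if that fails, retry with documentation files only.
pack_docs() {
  local out="${1:-repomix-output.xml}"
  repomix --output "$out" \
    || repomix --include "docs/**,**/*.md,**/*.mdx" --output "$out"
}
```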
Key Principles
- Prioritize context7.com for llms.txt: it is the most comprehensive and up-to-date aggregator
- Use topic parameters when applicable: ?topic=... enables targeted searches
- Use parallel agents aggressively: faster results, better coverage
- Verify official sources as a fallback: use them when context7.com is unavailable
- Report methodology: tell the user which approach was used
- Handle versions explicitly: confirm the requested version instead of silently assuming latest
Detailed Documentation
For comprehensive guides, examples, and best practices:
Workflows:
- WORKFLOWS.md: Detailed workflow examples and strategies
Reference guides:
- Tool Selection: Complete guide to choosing and using tools
- Documentation Sources: Common sources and patterns across ecosystems
- Error Handling: Troubleshooting and resolution strategies
- Best Practices: 8 essential principles for effective discovery
- Performance: Optimization techniques and benchmarks
- Limitations: Boundaries and success criteria