Researcher Skill
Sharded Deep Research for analyzing large codebases. Uses LangGraph with a Map-Plan-Loop-Synthesize architecture to handle repositories that exceed LLM context limits.
Architecture
code
┌─────────┐     ┌──────────────┐     ┌────────────────┐     ┌──────────────┐
│  Setup  │ --> │  Architect   │ --> │ Process Shard  │ --> │  Synthesize  │
│  Clone  │     │    (Plan)    │     │     (Loop)     │     │   Index.md   │
└─────────┘     └──────────────┘     └────────────────┘     └──────────────┘
     │                 │                     │
     │            3-5 shards             compress
     │            defined by             + analyze
     │                LLM               each shard
Commands
run_research_graph
[CORE] Execute the Sharded Deep Research Workflow.
This autonomously:
- Clones the repository to a temporary workspace
- Maps the file structure (god view)
- Plans 3-5 logical analysis shards (subsystems) via LLM
- Iterates through each shard:
  - Compress with repomix (shard-specific config)
  - Analyze with LLM
  - Save shard analysis to shards/<id>_<name>.md
- Synthesizes index.md linking all shard analyses
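The shard file naming and index synthesis steps can be sketched as follows. This is a minimal illustration, not the skill's actual code: the helper names `shard_filename` and `index_line` are hypothetical, and the naming rule (two-digit id, lowercase name with non-alphanumerics collapsed to underscores) is an assumption inferred from the example file names in this document.

```python
import re


def shard_filename(shard_id: int, name: str) -> str:
    """Build '<id>_<name>.md' with a zero-padded id and a snake_case name (assumed rule)."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"{shard_id:02d}_{slug}.md"


def index_line(shard_id: int, name: str, summary: str) -> str:
    """Format one index.md bullet that links to a shard analysis file."""
    return f"- **[{name}](./shards/{shard_filename(shard_id, name)})**: {summary}"


# Hypothetical shard plan; in the real workflow the LLM chooses these.
shards = [("Core Kernel", "Main business logic"), ("API Layer", "HTTP handlers")]
index_md = "\n".join(
    index_line(i, name, summary) for i, (name, summary) in enumerate(shards, start=1)
)
print(index_md)
```

Keeping the link format in one helper means the index and the returned shard summaries cannot drift apart.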
Parameters:
- repo_url (string, required): Git repository URL to analyze
- request (string, optional): Research goal/focus (default: "Analyze the architecture")
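One way to mirror these parameters on the caller's side is a small typed container. The sketch below is an assumption for illustration only: `ResearchParams` is a hypothetical name, and the skill itself may validate its inputs differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchParams:
    """Hypothetical container mirroring the documented parameters."""

    repo_url: str  # required: Git repository URL to analyze
    request: str = "Analyze the architecture"  # optional: documented default

    def __post_init__(self) -> None:
        # Reject an empty or whitespace-only URL early, before any cloning starts.
        if not self.repo_url.strip():
            raise ValueError("repo_url is required")


params = ResearchParams(repo_url="https://github.com/example/large-repo")
print(params.request)
```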
Returns:
json
{
"success": true,
"harvest_dir": "/path/to/.data/harvested/20250123-repo/",
"shards_analyzed": 4,
"shard_summaries": [
"- **[Core Kernel](./shards/01_core_kernel.md)**: Main business logic",
"- **[API Layer](./shards/02_api_layer.md)**: HTTP handlers"
],
"summary": "Research Complete!..."
}
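Each entry in shard_summaries is a markdown bullet, so a caller that needs the title and file path separately has to parse it. The sketch below shows one way to do that against the sample payload from the Returns example; `parse_summaries` and the regular expression are hypothetical helpers, and they assume the bullet shape shown in that example.

```python
import json
import re

# Sample payload copied from the Returns example (summary field omitted).
raw = """
{
  "success": true,
  "harvest_dir": "/path/to/.data/harvested/20250123-repo/",
  "shards_analyzed": 4,
  "shard_summaries": [
    "- **[Core Kernel](./shards/01_core_kernel.md)**: Main business logic",
    "- **[API Layer](./shards/02_api_layer.md)**: HTTP handlers"
  ]
}
"""

SUMMARY_RE = re.compile(
    r"- \*\*\[(?P<title>[^\]]+)\]\((?P<path>[^)]+)\)\*\*: (?P<text>.*)"
)


def parse_summaries(result: dict) -> list[dict]:
    """Split each markdown bullet into title, relative path, and summary text."""
    parsed = []
    for line in result.get("shard_summaries", []):
        match = SUMMARY_RE.match(line)
        if match:  # skip lines that do not follow the expected bullet shape
            parsed.append(match.groupdict())
    return parsed


result = json.loads(raw)
if result["success"]:
    for item in parse_summaries(result):
        print(item["title"], "->", item["path"])
```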
Output Location:
code
.data/harvested/<date>-<repo_name>/
├── index.md                  # Master index with all shard links
└── shards/
    ├── 01_core_kernel.md     # Shard 1 analysis
    ├── 02_api_layer.md       # Shard 2 analysis
    └── ...
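To locate the output programmatically, the directory name can be derived from the repository URL and the run date. The sketch below is a guess at that rule, not the skill's implementation: `harvest_dir` is a hypothetical helper, and the YYYYMMDD date format is inferred from the "20250123-repo" example in the return value.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlparse


def harvest_dir(repo_url: str, on: date, root: str = ".data/harvested") -> PurePosixPath:
    """Return '<root>/<date>-<repo_name>' for a repository URL (assumed naming)."""
    # Take the last path segment of the URL and drop a trailing '.git' if present.
    repo_name = urlparse(repo_url).path.rstrip("/").rsplit("/", 1)[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return PurePosixPath(root) / f"{on:%Y%m%d}-{repo_name}"


out = harvest_dir("https://github.com/example/large-repo", date(2025, 1, 23))
print(out)             # directory that would hold index.md
print(out / "shards")  # directory that would hold the per-shard files
```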
Usage Example
python
# Analyze a repository's security patterns.
# Must run inside an async context; assumes `researcher` is the loaded skill object.
result = await researcher.run_research_graph(
    repo_url="https://github.com/example/large-repo",
    request="Analyze security patterns and vulnerability surfaces",
)
# Result: multiple shard analyses saved under .data/harvested/
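The usage example relies on `await`, which only works inside a coroutine. A minimal sketch of driving the call from a plain script is shown below. `StubResearcher` is a hypothetical stand-in so the example runs on its own; how the real `researcher` object is obtained is not covered in this document.

```python
import asyncio


class StubResearcher:
    """Hypothetical stand-in so the sketch runs without the real skill."""

    async def run_research_graph(
        self, repo_url: str, request: str = "Analyze the architecture"
    ) -> dict:
        await asyncio.sleep(0)  # placeholder for the real clone/plan/analyze work
        return {"success": True, "shards_analyzed": 0, "shard_summaries": []}


async def main() -> dict:
    researcher = StubResearcher()  # replace with the real skill object
    return await researcher.run_research_graph(
        repo_url="https://github.com/example/large-repo",
        request="Analyze security patterns and vulnerability surfaces",
    )


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
print(result["success"])
```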
Technical Details
- Repomix: Invoked directly (not via npx) to compress code
- Sharding: The LLM determines shard boundaries dynamically, based on the repository structure
- Loop: Conditional edges in LangGraph keep processing shards until the shard queue is empty
- Checkpoint: MemorySaver enables resumption of interrupted workflows
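The loop and checkpoint behavior described above can be illustrated with a plain-Python emulation. This is deliberately not LangGraph code: the `route` function plays the role of a conditional edge, and a list of state snapshots stands in for an in-memory checkpointer. All function names and the state layout are assumptions made for the sketch.

```python
from typing import Callable

State = dict  # plain dict standing in for the graph state


def process_shard(state: State) -> State:
    """Pop one shard from the queue and record a placeholder analysis."""
    shard, *rest = state["queue"]
    analysis = f"analysis of {shard}"  # the real node would run repomix + an LLM here
    return {**state, "queue": rest, "done": state["done"] + [analysis]}


def synthesize(state: State) -> State:
    """Produce the final index from all completed shard analyses."""
    return {**state, "index": "\n".join(state["done"])}


def route(state: State) -> str:
    """Conditional edge: keep looping while shards remain, then synthesize."""
    return "process_shard" if state["queue"] else "synthesize"


NODES: dict[str, Callable[[State], State]] = {
    "process_shard": process_shard,
    "synthesize": synthesize,
}


def run(state: State, checkpoints: list[State]) -> State:
    """Drive the loop, saving a snapshot after every node (in-memory checkpointing)."""
    while True:
        node = route(state)
        state = NODES[node](state)
        checkpoints.append(state)
        if node == "synthesize":
            return state


checkpoints: list[State] = []
final = run({"queue": ["core", "api", "cli"], "done": []}, checkpoints)
print(final["index"])

# Resuming: restart from any saved snapshot instead of from scratch.
resumed = run(checkpoints[0], [])
assert resumed["index"] == final["index"]
```

Because each node returns a new state rather than mutating the old one, every snapshot stays valid as a restart point, which is the property that makes resumption possible.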