Researcher Skill

Name: researcher
Rating: 92
Author: tao3k

Sharded Deep Research for analyzing large codebases. Uses LangGraph with a Map-Plan-Loop-Synthesize architecture to handle repositories that exceed LLM context limits.

Architecture

code

┌─────────┐     ┌──────────────┐     ┌────────────────┐     ┌──────────────┐
│  Setup  │ --> │  Architect   │ --> │ Process Shard  │ --> │  Synthesize  │
│  Clone  │     │   (Plan)     │     │    (Loop)      │     │   Index.md   │
└─────────┘     └──────────────┘     └────────────────┘     └──────────────┘
     │                  │                    │
     │              3-5 shards          compress
     │              defined by           + analyze
     │              LLM                  each shard

Commands

run_research_graph

[CORE] Execute the Sharded Deep Research Workflow.

This command autonomously:

•Clones the repository to a temporary workspace
•Maps the file structure (god view)
•Plans 3-5 logical analysis shards (subsystems) via LLM
•Iterates through each shard:
  - Compresses it with repomix (shard-specific config)
  - Analyzes it with the LLM
  - Saves the shard analysis to shards/<id>_<name>.md
•Synthesizes index.md linking all shard analyses
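
The mapping step ("god view") can be illustrated with a minimal sketch. This is not the skill's actual implementation, which is not shown here. The function name `map_file_tree` and the default `skip_dirs` set are hypothetical; the sketch only assumes that the map is a flat list of repo-relative paths handed to the planner.

```python
import os

def map_file_tree(root: str, skip_dirs: frozenset = frozenset({".git", "node_modules"})) -> list:
    """Return sorted repo-relative file paths under root (hypothetical helper)."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Normalize separators so the listing reads the same on every platform.
            paths.append(rel.replace(os.sep, "/"))
    return sorted(paths)
```

A listing like this is small enough to fit in a single prompt even when the file contents are not, which is presumably why planning happens on the structure rather than on the code itself.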

Parameters:

•repo_url (string, required): Git repository URL to analyze
•request (string, optional): Research goal/focus (default: "Analyze the architecture")

Returns:

json

{
  "success": true,
  "harvest_dir": "/path/to/.data/harvested/20250123-repo/",
  "shards_analyzed": 4,
  "shard_summaries": [
    "- **[Core Kernel](./shards/01_core_kernel.md)**: Main business logic",
    "- **[API Layer](./shards/02_api_layer.md)**: HTTP handlers"
  ],
  "summary": "Research Complete!..."
}
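
Each entry in `shard_summaries` is a Markdown list item, so a caller may want to split it into title, path, and description. A minimal sketch follows, assuming every entry has the shape shown in the example above; `parse_shard_summaries` and `SUMMARY_RE` are hypothetical names, not part of the skill.

```python
import re

# Matches entries shaped like "- **[Title](./shards/file.md)**: description".
SUMMARY_RE = re.compile(
    r"^- \*\*\[(?P<title>[^\]]+)\]\((?P<path>[^)]+)\)\*\*: (?P<desc>.*)$"
)

def parse_shard_summaries(result: dict) -> list:
    """Turn shard_summaries strings into dicts; entries that do not match are skipped."""
    shards = []
    for line in result.get("shard_summaries", []):
        match = SUMMARY_RE.match(line)
        if match:
            shards.append(match.groupdict())
    return shards

# Example input copied from the return value documented above.
example = {
    "success": True,
    "shard_summaries": [
        "- **[Core Kernel](./shards/01_core_kernel.md)**: Main business logic",
        "- **[API Layer](./shards/02_api_layer.md)**: HTTP handlers",
    ],
}
parsed = parse_shard_summaries(example)
```

Here `parsed[0]["path"]` is the relative link to the first shard file, which can be joined with `harvest_dir` to locate the analysis on disk.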

Output Location:

code

.data/harvested/<date>-<repo_name>/
├── index.md                    # Master index with all shard links
└── shards/
    ├── 01_core_kernel.md       # Shard 1 analysis
    ├── 02_api_layer.md         # Shard 2 analysis
    └── ...
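
The layout above can be reproduced with a short sketch. The exact naming rules are assumptions inferred from the examples in this document: a `YYYYMMDD` date prefix, a two-digit shard index, and a lowercase underscore-separated shard name. The helpers `slugify`, `harvest_dir`, and `shard_path` are hypothetical.

```python
import re
from datetime import date
from pathlib import PurePosixPath

def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

def harvest_dir(repo_url: str, day: date, base: str = ".data/harvested") -> PurePosixPath:
    """Build <base>/<date>-<repo_name>, taking the repo name from the last URL segment."""
    repo_name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return PurePosixPath(base) / f"{day:%Y%m%d}-{repo_name}"

def shard_path(root: PurePosixPath, index: int, name: str) -> PurePosixPath:
    """Build shards/<id>_<name>.md under the harvest directory."""
    return root / "shards" / f"{index:02d}_{slugify(name)}.md"

root = harvest_dir("https://github.com/example/large-repo", date(2025, 1, 23))
first = shard_path(root, 1, "Core Kernel")
```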

Usage Example

python

# Analyze a repository's security patterns
await researcher.run_research_graph(
    repo_url="https://github.com/example/large-repo",
    request="Analyze security patterns and vulnerability surfaces"
)

# Result: Multiple shard analyses saved to .data/harvested/

Technical Details

•Repomix: Used directly (not via npx) for code compression
•Sharding: LLM dynamically determines shard boundaries based on repo structure
•Loop: Conditional edges in LangGraph process shards until the queue is empty
•Checkpoint: MemorySaver enables resumption of interrupted workflows
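
The loop and checkpoint behavior can be illustrated in plain Python. This sketch deliberately does not use the LangGraph API: `route` stands in for the conditional edge, a list stands in for MemorySaver, and the state keys `queue` and `done` are assumptions made for illustration only.

```python
from typing import Optional

def route(state: dict) -> str:
    """Conditional-edge stand-in: keep looping while shards remain, else synthesize."""
    return "process_shard" if state["queue"] else "synthesize"

def process_shard(state: dict) -> dict:
    """Pop one shard off the queue; real work (repomix + LLM) is omitted here."""
    shard = state["queue"][0]
    return {"queue": state["queue"][1:], "done": state["done"] + [shard]}

def run(state: dict, checkpoints: list, max_steps: Optional[int] = None) -> dict:
    """Process shards until the queue is empty, saving state after every step."""
    steps = 0
    while route(state) == "process_shard":
        if max_steps is not None and steps >= max_steps:
            return state  # simulated interruption before the queue is drained
        state = process_shard(state)
        checkpoints.append(state)  # stand-in for a checkpointer write
        steps += 1
    # Synthesize step: list one file name per processed shard.
    index = [f"{i:02d}_{name}.md" for i, name in enumerate(state["done"], 1)]
    return {**state, "index": index}

checkpoints = []
start = {"queue": ["core_kernel", "api_layer", "cli"], "done": []}
partial = run(start, checkpoints, max_steps=1)  # interrupted after one shard
final = run(checkpoints[-1], checkpoints)       # resumed from the last checkpoint
```

Because each step returns a new state rather than mutating the old one, any saved checkpoint is a valid place to resume from, which is the property a checkpointer relies on.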