Multi-Agent Research
Production patterns from Anthropic's research system for conducting complex, multi-faceted research efficiently.
Source: How we built our multi-agent research system
When to Use
Use multi-agent-research patterns when:
- The research has 3+ independent dimensions to explore
- You are comparing multiple alternatives simultaneously
- You need to synthesize information from 10+ sources
- The research is time-sensitive and benefits from parallelization
- The technical landscape is complex, with unclear paths
- Breadth-first exploration is needed
Don't use for:
- Simple fact-finding (1-2 sources)
- Single-dimension queries
- Cases where sequential research is sufficient
Core Principles
1. Scale Effort to Query Complexity
Assess complexity BEFORE starting research to allocate appropriate resources.
Simple Fact-Finding (3-10 tool calls)
- Single specific question with a clear answer
- 1-2 authoritative sources needed
- Examples: Version number lookup, API syntax check, single DocType field validation
Approach: Direct search → validate → document
// Example: Check ERPNext Task field
mcp__ref__ref_search_documentation({
  query: "ERPNext Task DocType fields"
})
// Review results → Document finding with citation
Direct Comparison (10-15 tool calls)
- Compare 2-3 specific alternatives
- Field mapping between systems
- Feature parity analysis
Approach: Parallel searches for each option → compare → recommend
// Example: Compare DocTypes - execute in parallel
mcp__ref__ref_search_documentation({ query: "ERPNext Task DocType" })
mcp__ref__ref_search_documentation({ query: "ERPNext Project Task DocType" })
mcp__ref__ref_search_documentation({ query: "ERPNext ToDo DocType" })
// Compare results → Build comparison table → Recommend
Complex Multi-Faceted Research (20+ tool calls)
- Architecture decision with multiple unknowns
- Integration across multiple systems
- Novel feature requiring ecosystem research
- Multiple independent research dimensions
Approach: Use deep researcher OR spawn parallel focused sub-tasks
// Example: Complex research with deep researcher
mcp__exasearch__deep_researcher_start({
  instructions: "Research OpenTelemetry integration for AWS Lambda Node.js. Cover: (1) Lambda layer vs manual instrumentation trade-offs, (2) X-Ray backend compatibility, (3) cold start performance impact, (4) 2025 best practices. Include code examples with versions.",
  model: "exa-research"
})
// Poll for completion, extract findings, add to research doc
Complexity Heuristic: If the research question has 3+ independent dimensions, use the parallel or deep researcher approach.
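The three tiers can be written down as a small classifier. This is a minimal sketch, not part of the source system: the `ResearchQuestion` fields, the `assessComplexity` name, and the rule that two or more alternatives imply the comparison tier are assumptions made for illustration. Only the tool-call ranges and the 3+ dimensions heuristic come from the text above.

```typescript
// Hypothetical types for illustration; names are not from any real library.
type Tier = "simple" | "comparison" | "complex";

interface ResearchQuestion {
  dimensions: number;   // independent sub-questions to explore
  alternatives: number; // options being compared (0 if none)
}

interface Plan {
  tier: Tier;
  toolCallBudget: string; // range quoted from the tiers above
  approach: string;
}

function assessComplexity(q: ResearchQuestion): Plan {
  // Heuristic from the text: 3+ independent dimensions means parallel or deep research.
  if (q.dimensions >= 3) {
    return { tier: "complex", toolCallBudget: "20+", approach: "deep researcher or parallel sub-tasks" };
  }
  // Assumed rule: two or more alternatives put the question in the comparison tier.
  if (q.alternatives >= 2) {
    return { tier: "comparison", toolCallBudget: "10-15", approach: "parallel searches, then compare" };
  }
  return { tier: "simple", toolCallBudget: "3-10", approach: "direct search, validate, document" };
}

// Usage: comparing three DocTypes along a single dimension.
const plan = assessComplexity({ dimensions: 1, alternatives: 3 });
console.log(plan.tier, plan.toolCallBudget);
```

Checking the dimension count first means a comparison that also spans several independent dimensions is treated as complex rather than as a direct comparison.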
2. Parallel Tool Execution
Execute independent searches simultaneously. The source article reports that parallel tool calling cut research time by up to 90% on complex queries.
When to Parallelize
Comparing alternatives:
// ❌ DON'T: Sequential searches (slow)
await mcp__ref__ref_search_documentation({ query: "Redis session store" });
// wait...
await mcp__ref__ref_search_documentation({ query: "Memcached session store" });
// wait...
// ✅ DO: Parallel searches (single message, multiple tool calls)
mcp__ref__ref_search_documentation({ query: "Redis session store" })
mcp__ref__ref_search_documentation({ query: "Memcached session store" })
mcp__exasearch__web_search_exa({ query: "Redis vs Memcached 2025 comparison" })
Multi-source validation:
// Execute all simultaneously
mcp__ref__ref_search_documentation({ query: "OpenTelemetry Lambda official docs" })
mcp__exasearch__web_search_exa({ query: "OpenTelemetry Lambda examples 2025" })
mcp__exasearch__web_search_exa({ query: "OpenTelemetry Lambda deprecated OR EOL" })
Independent research dimensions:
// All can run in parallel
mcp__ref__ref_search_documentation({ query: "Stripe API authentication" })
mcp__exasearch__web_search_exa({ query: "Stripe API rate limits" })
mcp__exasearch__web_search_exa({ query: "Stripe webhook best practices 2025" })
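In an agent, parallelism comes from issuing several tool calls in one message. When the same idea is implemented in application code, the analogous pattern is `Promise.all`. The sketch below assumes a hypothetical `SearchFn` and a stub search with an artificial delay; neither is a real MCP client API. The stub counts how many calls are in flight at once so the difference between the two strategies is observable.

```typescript
// `SearchFn` is a hypothetical stand-in for any search tool.
type SearchFn = (query: string) => Promise<string>;

// Stub search that records the peak number of concurrent calls.
function makeStubSearch(delayMs: number) {
  let inFlight = 0;
  let maxInFlight = 0;
  const search: SearchFn = async (query) => {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    inFlight--;
    return `results for: ${query}`;
  };
  return { search, peak: () => maxInFlight };
}

async function searchSequential(search: SearchFn, queries: string[]): Promise<string[]> {
  const out: string[] = [];
  for (const q of queries) {
    out.push(await search(q)); // each call waits for the previous one to finish
  }
  return out;
}

async function searchParallel(search: SearchFn, queries: string[]): Promise<string[]> {
  // All calls start immediately; Promise.all returns results in input order.
  return Promise.all(queries.map((q) => search(q)));
}
```

With three independent queries, the sequential version takes roughly three times the per-call latency, while the parallel version takes roughly one. This only holds when the queries do not depend on each other's results.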
3. Search Strategy: Start Wide, Then Narrow
Progressive refinement prevents missing important context.
Anti-Pattern: Overly Specific Initial Queries
❌ DON'T START WITH:
"how to implement OpenTelemetry auto-instrumentation for AWS Lambda with X-Ray backend using custom sampling rules in Node.js 18"
Result: Few or no results; alternative approaches are missed
Recommended: Progressive Refinement
✅ STEP 1 - Broad Exploration (2-5 results):
"OpenTelemetry AWS Lambda Node.js"
→ Discover: What approaches exist? What's recommended?
✅ STEP 2 - Evaluate Landscape:
Review results, identify main approaches (layer vs manual instrumentation)
✅ STEP 3 - Narrow Focus (3-10 results):
"OpenTelemetry Lambda layer vs manual instrumentation"
→ Compare trade-offs
✅ STEP 4 - Specific Details:
"OpenTelemetry Lambda layer installation guide 2025"
→ Find working examples
Query Pattern Templates
| Purpose | Pattern | Example |
|---|---|---|
| Discovery | [technology] [use case] | "GraphQL federation microservices" |
| Comparison | [option A] vs [option B] [criteria] | "REST vs GraphQL performance" |
| Implementation | [specific approach] [version] guide | "GraphQL Apollo Federation v2 guide" |
| Validation | [library] deprecated OR EOL OR migration | "Apollo Federation deprecated" |
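The four patterns in the table can be captured as small template functions so queries are built consistently. This is an illustrative sketch; `queryTemplates` and its parameter names are hypothetical and only mirror the table's placeholders.

```typescript
// Hypothetical helpers: one function per row of the table above.
const queryTemplates = {
  discovery: (technology: string, useCase: string): string =>
    `${technology} ${useCase}`,
  comparison: (optionA: string, optionB: string, criteria: string): string =>
    `${optionA} vs ${optionB} ${criteria}`,
  implementation: (approach: string, version: string): string =>
    `${approach} ${version} guide`,
  validation: (library: string): string =>
    `${library} deprecated OR EOL OR migration`,
};

// Usage: reproduce two of the table's example queries.
console.log(queryTemplates.comparison("REST", "GraphQL", "performance"));
console.log(queryTemplates.implementation("GraphQL Apollo Federation", "v2"));
```

Running the templates in the order discovery, comparison, implementation, validation follows the same wide-then-narrow progression described earlier in this section.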
4. Thinking Process for Research
Use extended and interleaved thinking to plan strategy and evaluate results.
Planning Phase (Extended Thinking)
Before tool calls, think through:
[Extended thinking example]:
This ERPNext DocType selection question has 3 candidates to evaluate.
I need to research each in parallel:
- Field mappings (official docs)
- Custom field requirements (API specs)
- Community usage patterns (real-world examples)
I'll use ref.tools for official ERPNext docs (3 parallel calls) and exa for community examples (3 parallel calls).
Total: 6 parallel tool calls for comprehensive coverage.
Planning Checklist:
- What are the independent sub-questions?
- Which tools fit this research?
- What's the complexity level? (Simple/Comparison/Complex)
- Which searches can run simultaneously?
After Tool Results (Interleaved Thinking)
After each set of results, evaluate:
[Interleaved thinking example]:
Results from 6 parallel searches received:
Quality check:
- Official docs: High confidence, current (v14)
- Community examples: Medium confidence, mix of v13/v14
Gap analysis:
- Tasks DocType: 80% field coverage, clear
- Project Tasks: 60% coverage, BUT community prefers for workflow integration
- Missing: Understanding WHY community prefers Project Tasks
Next step: Need one more targeted search on workflow capabilities difference
Evaluation Checklist:
- ✅ Are sources authoritative? Current? Relevant?
- ✅ What's still missing? What contradicts?
- ✅ Should I go deeper or pivot direction?
- ✅ What's my confidence level? (High/Medium/Low)
5. Deep Research Delegation
Know when to delegate to specialized deep research vs direct tool calls.
Decision Framework
| Use Direct Research | Use Deep Researcher |
|---|---|
| Query scope clear and bounded | Open-ended exploration needed |
| 2-4 specific sources | Unclear which sources to check |
| Can complete in 10-15 tool calls | Requires 10+ sources |
| Example: "Compare Redis vs Memcached for sessions" | Example: "Research state of GraphQL federation in 2025 - solutions, trade-offs, migration paths" |
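The decision table can be read as a short rule chain. The sketch below is one possible encoding under stated assumptions: the `ScopeAssessment` fields and the `chooseApproach` name are hypothetical, and the thresholds (10+ sources, more than 15 tool calls) are lifted from the table rather than from any tool's documentation.

```typescript
// Hypothetical input describing what is known before research starts.
interface ScopeAssessment {
  scopeIsBounded: boolean;     // is the question clear and bounded?
  knownSources: number | null; // null when it is unclear which sources to check
  estimatedToolCalls: number;
}

type Approach = "direct" | "deep-researcher";

function chooseApproach(s: ScopeAssessment): Approach {
  if (!s.scopeIsBounded) return "deep-researcher";           // open-ended exploration
  if (s.knownSources === null) return "deep-researcher";     // unclear which sources to check
  if (s.knownSources >= 10) return "deep-researcher";        // requires 10+ sources
  if (s.estimatedToolCalls > 15) return "deep-researcher";   // beyond the 10-15 call range
  return "direct";
}

// Usage: "Compare Redis vs Memcached for sessions" is bounded with a few known sources.
console.log(chooseApproach({ scopeIsBounded: true, knownSources: 3, estimatedToolCalls: 12 }));
```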
Deep Researcher Workflow
1. Start task with detailed instructions:
mcp__exasearch__deep_researcher_start({
  instructions: `Research OpenTelemetry integration for AWS Lambda Node.js functions.
Focus areas:
1. Lambda layer vs manual instrumentation (trade-offs, pros/cons)
2. X-Ray backend compatibility (setup, configuration)
3. Cold start performance impact (benchmarks, mitigation)
4. Current best practices as of 2025 (official recommendations)
Deliverables:
- Code examples with specific version numbers
- Official documentation links
- Deprecation warnings if any
- Recommended approach with justification`,
  model: "exa-research" // or "exa-research-pro" for very complex research
})
// Returns: { taskId: "abc123" }
2. Poll for results (repeat until status: "completed"):
mcp__exasearch__deep_researcher_check({
  taskId: "abc123"
})
// Tool includes 5-second delay before checking
// Keep calling until status: "completed"
3. Extract and integrate findings:
## Findings from Deep Research
**Source**: Deep Researcher Task abc123
### Finding 1: Lambda Layer Recommended
- **Summary**: Official docs recommend layer approach over manual
- **Evidence**: "Layer reduces cold start by 200ms, handles auto-instrumentation"
- **Source**: OpenTelemetry Lambda Docs (High confidence)
- **Code Example**: `layers: ['arn:aws:lambda:...:opentelemetry-nodejs']`
[Additional findings...]
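Step 2 of this workflow (polling until the task completes) can be sketched as a bounded loop. The code below is an assumption-laden illustration: `CheckFn`, `TaskStatus`, the `report` field, the `"failed"` status, and the `maxAttempts` cap are all hypothetical. The text only establishes a `taskId` and a `status` of `"completed"`. No explicit sleep is added because the text says the check tool already includes a 5-second delay.

```typescript
// Hypothetical shapes; only `status: "completed"` is taken from the text.
interface TaskStatus {
  status: string;  // "completed" per the text; other values are assumptions
  report?: string; // assumed field carrying the final output
}

// Stand-in for the check tool; not a real client API.
type CheckFn = (taskId: string) => Promise<TaskStatus>;

async function pollUntilComplete(
  check: CheckFn,
  taskId: string,
  maxAttempts = 60, // assumed cap so a stuck task cannot loop forever
): Promise<TaskStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await check(taskId);
    if (result.status === "completed") return result;
    // Assumed failure status: stop early instead of polling a dead task.
    if (result.status === "failed") throw new Error(`Task ${taskId} failed`);
  }
  throw new Error(`Task ${taskId} not completed after ${maxAttempts} checks`);
}
```

Capping the number of checks is a design choice for this sketch: it turns a task that never finishes into an explicit error that can be reported, rather than an unbounded wait.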
6. Self-Improvement from Tool Failures
Adapt when tools don't work as expected.
Document Failure Patterns
When a tool fails repeatedly:
TOOL FAILURE PATTERN:
- Tool: mcp__ref__ref_search_documentation
- Query: "ERPNext v14 custom fields"
- Parameters: { query: "ERPNext v14 custom field creation API" }
- Expected: Documentation on custom field APIs
- Actual: No results returned (0 matches)
- Frequency: 5/5 attempts with different phrasings
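A failure pattern like the one above can be derived from a log of attempts instead of being written by hand. The following is a minimal sketch: the `Attempt` and `FailurePattern` shapes, the `detectFailurePattern` name, and the default threshold of 3 failures are assumptions for illustration, and "failure" is simplified to mean a search that returned zero results.

```typescript
// Hypothetical record of one tool call and how many results it returned.
interface Attempt {
  tool: string;
  query: string;
  resultCount: number; // 0 means the search returned nothing
}

interface FailurePattern {
  tool: string;
  failedQueries: string[];
  frequency: string;    // "failed/total", matching the template above
  shouldAdapt: boolean; // true once failures reach the threshold
}

function detectFailurePattern(tool: string, attempts: Attempt[], threshold = 3): FailurePattern {
  const forTool = attempts.filter((a) => a.tool === tool);
  const failed = forTool.filter((a) => a.resultCount === 0);
  return {
    tool,
    failedQueries: failed.map((a) => a.query),
    frequency: `${failed.length}/${forTool.length}`,
    shouldAdapt: failed.length >= threshold,
  };
}
```

For the example above, five zero-result attempts against the same tool would yield a frequency of `5/5` and set `shouldAdapt`, which is the cue to move on to an adaptation strategy.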
Adaptation Strategies
- Try alternative tool:
// Primary tool failing, switch to web search
mcp__exasearch__web_search_exa({
  query: "ERPNext v14 custom field creation site:erpnext.com"
})
- Rephrase query:
// Original: "ERPNext v14 custom fields"
// Rephrased: "ERPNext custom field API" (drop version)
// Rephrased: "Frappe custom field creation" (use framework name)
- Break into smaller queries:
// Original: "ERPNext v14 custom field creation and validation API"
// Split into:
mcp__ref__ref_search_documentation({ query: "ERPNext custom field creation" })
mcp__ref__ref_search_documentation({ query: "ERPNext field validation" })
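Two of these strategies (rephrasing and switching tools) combine naturally into a fallback chain. The sketch below assumes a hypothetical `Search` function type and tool descriptors; it is not a real MCP client API. It tries every phrasing on the primary tool before moving to the next tool, which is one reasonable ordering, not the only one.

```typescript
// Hypothetical search signature: resolves to a list of result snippets.
type Search = (query: string) => Promise<string[]>;

interface Tool {
  name: string;
  search: Search;
}

interface FallbackResult {
  tool: string | null;  // tool that produced results, or null if none did
  query: string | null; // phrasing that worked
  results: string[];
  attempts: number;     // total calls made, useful for a tool issue report
}

async function searchWithFallback(tools: Tool[], phrasings: string[]): Promise<FallbackResult> {
  let attempts = 0;
  for (const tool of tools) {
    for (const query of phrasings) {
      attempts++;
      const results = await tool.search(query);
      if (results.length > 0) {
        return { tool: tool.name, query, results, attempts };
      }
    }
  }
  return { tool: null, query: null, results: [], attempts };
}
```

The attempt count returned here is the kind of number a tool issue report needs when describing how many phrasings were tried before a workaround succeeded.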
Report to Planning
TOOL ISSUE REPORT:
**Tool**: mcp__ref__ref_search_documentation
**Issue**: Consistently returns no results for ERPNext v14 queries, v13 queries work
**Attempted**: 5 different query phrasings
**Workaround**: Using web_search_exa with site:erpnext.com filter, then validating
**Impact**: 30% slower research, but results accurate
**Recommendation**: Tool may need ERPNext v14 docs indexed
7. Findings Compression Strategy
Compress vast research into actionable context for downstream agents.
The Problem
Complex research draws on 10+ sources that can run to hundreds of pages. The Action Agent needs only the compressed, decision-critical information.
4-Element Compression Pattern
For each major finding:
- Core claim (1 sentence)
- Evidence (quote or concrete example)
- Source (URL + confidence level)
- Relevance (why this matters for decision)
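The four elements map directly onto a typed record, which makes it harder to hand off a finding with a piece missing. This is a minimal sketch: the `Finding` interface and `renderFinding` function are hypothetical names, and the rendered layout simply follows the bold-label style this document uses for findings.

```typescript
type Confidence = "High" | "Medium" | "Low";

// Hypothetical record holding the four elements plus a title.
interface Finding {
  title: string;
  coreClaim: string;  // 1 sentence
  evidence: string;   // quote or concrete example
  sourceUrl: string;
  confidence: Confidence;
  relevance: string;  // why this matters for the decision
}

function renderFinding(f: Finding): string {
  return [
    `### ${f.title}`,
    `**Core Claim**: ${f.coreClaim}`,
    `**Evidence**: ${f.evidence}`,
    `**Source**: ${f.sourceUrl} (${f.confidence} confidence)`,
    `**Relevance**: ${f.relevance}`,
  ].join("\n");
}

// Usage with placeholder values; the URL is an example, not a real citation.
console.log(
  renderFinding({
    title: "Finding 1: Example",
    coreClaim: "Option A fits the requirement.",
    evidence: "Quoted sentence from the source.",
    sourceUrl: "https://example.com/docs",
    confidence: "High",
    relevance: "Removes the need for custom code.",
  }),
);
```

Because every field is required by the type, a finding without a source or a relevance statement fails to compile instead of reaching the downstream agent incomplete.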
Anti-Pattern: Copying Entire Docs
❌ DON'T:
Found 15-page OpenTelemetry Lambda guide covering:
- History of observability (3 pages)
- Architecture deep-dive (4 pages)
- Deployment options (2 pages)
- Configuration reference (5 pages)
- Troubleshooting (1 page)
[Paste entire guide]
Better: Extract Decision-Critical Information
✅ DO:
**Finding**: OpenTelemetry Lambda layer recommended over manual instrumentation
- **Core Claim**: Layer approach reduces cold start by 200ms vs manual
- **Evidence**: Official guide states "Layer handles auto-instrumentation and reduces cold start overhead through optimized initialization"
- **Source**: [OpenTelemetry Lambda Docs - Deployment Options](https://opentelemetry.io/docs/faas/lambda-nodejs/#deployment) (High confidence - official docs)
- **Code Example**:
```typescript
layers: ['arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-nodejs-ver-1-18-0:1']
```
- **Relevance**: Meets performance requirement (<500ms cold start) without custom instrumentation code, reduces maintenance burden
**Alternative Considered**: Manual instrumentation - rejected due to cold start penalty and maintenance overhead
Compression Checklist
Before handing off findings:
- [ ] Each finding fits in 3-5 sentences
- [ ] Code examples are minimal but working (not full files)
- [ ] Links to docs (don't paste full docs)
- [ ] Clear connection to decision criteria
- [ ] Confidence level explicit (High/Medium/Low)
- [ ] Alternatives considered and rejection rationale
Integration with Existing Workflows
For Researcher Agent
Apply these patterns in existing research workflow:
1. Query Complexity Assessment → Add before "Search Existing Project Documentation"
2. Parallel Tool Execution → Use in "Conduct External Research" phase
3. Search Strategy → Apply when formulating search queries
4. Thinking Process → Use before tool calls and after results
5. Deep Researcher → Option in "Conduct External Research" for complex topics
6. Tool Failures → Add to Error Handling section
7. Findings Compression → Apply in "Document Findings with Citations"
Output Format
Maintain existing output structure, enhance with compression:
## Key Findings
### Finding 1: [Title]
**Core Claim**: [1 sentence]
**Evidence**: [Quote or concrete example]
**Source**: [URL + confidence]
**Relevance**: [Decision impact]
**Code Example** (if applicable):
```[language]
// Minimal working example
```
[Repeat for 3-5 key findings]
Success Metrics
Applying multi-agent research patterns should result in:
- Speed: Up to 90% faster for breadth-first queries (via parallelization)
- Quality: Higher-confidence findings (progressive refinement)
- Efficiency: Right-sized effort (complexity assessment)
- Completeness: Better coverage (parallel exploration)
- Usability: Actionable findings (compression)
Examples
Example 1: Simple Query
Task: Check if Redis supports session TTL
Complexity: Simple fact-finding
Approach: Direct search (3 tool calls)
// Single targeted search
mcp__ref__ref_search_documentation({
  query: "Redis TTL expire session keys"
})
// Validate in official docs
// Document finding with citation
Result: 2 minutes, High confidence
Example 2: Comparison Query
Task: Compare Redis vs Memcached for session storage
Complexity: Direct comparison
Approach: Parallel searches (10 tool calls)
// Execute in parallel (single message)
mcp__ref__ref_search_documentation({ query: "Redis session storage features" })
mcp__ref__ref_search_documentation({ query: "Memcached session storage features" })
mcp__exasearch__web_search_exa({ query: "Redis vs Memcached session store 2025" })
mcp__exasearch__web_search_exa({ query: "Redis session TTL persistence" })
mcp__exasearch__web_search_exa({ query: "Memcached session limitations" })
Result: 8 minutes, comparison table with pros/cons, High confidence recommendation
Example 3: Complex Multi-Faceted Query
Task: Research GraphQL federation migration from REST API
Complexity: Complex (architecture decision, multiple unknowns)
Approach: Deep researcher (40+ sources)
mcp__exasearch__deep_researcher_start({
  instructions: `Research migrating from REST to GraphQL federation for microservices.
Focus areas:
1. Available federation solutions (Apollo, Mercurius, etc.) - compare features, maturity
2. Migration strategies (big bang vs incremental, REST wrapper patterns)
3. Schema stitching vs federation trade-offs
4. Performance implications (n+1 queries, caching)
5. Client migration (breaking changes, backward compatibility)
6. 2025 best practices and anti-patterns
Deliverables:
- Solution comparison table
- Recommended migration path with phases
- Code examples for federation setup
- Known gotchas and mitigation strategies`,
  model: "exa-research-pro"
})
// Poll until complete
mcp__exasearch__deep_researcher_check({ taskId: "..." })
// Extract findings, compress to key decisions
// Build recommendation with phased approach
Result: 45 minutes, comprehensive analysis with migration roadmap, Medium-High confidence (validated against official docs)
Anti-Patterns
❌ Starting Too Specific
Query: "implement OpenTelemetry auto-instrumentation AWS Lambda X-Ray custom sampling Node 18"
Result: 0 results, or better approaches are missed
Fix: Start broad, progressively narrow
❌ Sequential When Could Parallelize
await search("Redis features");
await search("Memcached features");
await search("comparison");
Fix: Single message with 3 tool calls
❌ Pasting Entire Docs
Finding: [15 pages of OpenTelemetry docs copied]
Fix: Extract 4-element compressed findings
❌ Skipping Thinking Steps
[Immediately calls 10 tools without planning]
Fix: Extended thinking to plan, interleaved thinking to evaluate
❌ Using Deep Researcher for Simple Queries
Task: "What's the latest Redis version?"
Approach: deep_researcher_start (overkill)
Fix: Simple direct search (1-2 tool calls)
References
- Anthropic: How we built our multi-agent research system
- Anthropic: Agents cookbook - Prompts
- Linear Issue LAW-76 - Implementation tracking