deep-reasoning-agent — Senior Research Quality Director
Overview
The deep-reasoning-agent is the sole quality gatekeeper in the MAIRA pipeline. It replaces the
need for separate validation, fact-checking, and quality-checking agents by performing ALL verification
tasks in a single, structured pass using 7 specialized tools.
Dictionary-Based SubAgent Definition:
```python
deep_reasoning_subagent = {
    "name": "deep-reasoning-agent",
    "description": "Unified draft verification agent that performs citation validation, fact-checking, content quality assessment, and source cross-referencing in a single pass.",
    "system_prompt": "...",  # Full prompt below
    "tools": [
        validate_citations,
        verify_draft_completeness,
        fact_check_claims,
        assess_content_quality,
        cross_reference_sources,
        internet_search,
        extract_webpage,
    ],
    "model": subagent_model,  # Default: gemini_3_flash
}
```
When the Main Agent Should Invoke This Subagent
- Tier 3 (Deep Research), Step 4 (Verification), after the draft-subagent produces a draft
- Called every time a new or revised draft is produced
- Part of the verification loop (max 3 revision cycles)
Invocation Pattern:
```python
task(name="deep-reasoning-agent", task="Verify the following research draft for accuracy, citations, completeness, and quality: [draft content + original query]")
```
Tools
| Tool | Purpose | Phase |
|---|---|---|
| `validate_citations` | Check citation format, URL accessibility, distribution across sections | Phase 1 |
| `verify_draft_completeness` | Verify draft adequately covers the original research query | Phase 1 |
| `fact_check_claims` | Extract and verify 3–5 critical factual claims | Phase 2 |
| `internet_search` | Deep-investigate unverified or disputed claims | Phase 2 |
| `extract_webpage` | Verify claims from specific authoritative URLs | Phase 2 |
| `assess_content_quality` | Check structural completeness, content depth, table presence | Phase 3 |
| `cross_reference_sources` | Ensure all gathered sources are properly cited in the draft | Phase 3 |
Three-Phase Verification Workflow
Phase 1: Citation & Completeness Validation
```
1. validate_citations        → Check format, URL accessibility, section distribution
2. verify_draft_completeness → Verify topic coverage against original query
```
Phase 2: Fact-Checking
```
3. fact_check_claims → Extract and verify 3-5 critical claims
4. internet_search   → Deep-investigate disputed/unverified claims (max 5 searches)
5. extract_webpage   → Verify against authoritative sources
```
Phase 3: Content Quality & Source Utilization
```
6. assess_content_quality  → Structural completeness, depth, table quality
7. cross_reference_sources → All gathered sources properly cited?
```
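The five-search cap in Phase 2 can be enforced with a small counter. The following is a minimal sketch, not the agent's actual implementation: `SearchBudget`, `investigate`, and the `search` callable are hypothetical names, and `search` merely stands in for `internet_search`, whose real signature is not given here.

```python
from typing import Callable, Dict, Iterable, Optional


class SearchBudget:
    """Counts searches and refuses new ones once the limit is reached."""

    def __init__(self, limit: int = 5) -> None:  # 5 = cap stated for Phase 2
        self.limit = limit
        self.used = 0

    def try_spend(self) -> bool:
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def investigate(
    claims: Iterable[str],
    search: Callable[[str], str],  # hypothetical stand-in for internet_search
    budget: SearchBudget,
) -> Dict[str, Optional[str]]:
    """Search each claim in order; claims past the budget map to None."""
    results: Dict[str, Optional[str]] = {}
    for claim in claims:
        results[claim] = search(claim) if budget.try_spend() else None
    return results
```

Claims that map to `None` were never searched, so under this sketch they would be reported as unverified rather than contradicted.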
Scoring Weights
| Component | Weight | What It Measures |
|---|---|---|
| Citation Validation | 15% | Format correctness, URL accessibility, distribution |
| Completeness | 10% | Topic alignment with original query |
| Fact Accuracy | 35% | Correctness of factual claims (HIGHEST weight) |
| Content Quality | 25% | Structure, depth, tables, formatting |
| Source Utilization | 15% | Proper citation of all gathered sources |
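The overall score is the weighted sum of the five component scores. A minimal sketch of that arithmetic, assuming each component is scored on a 0-100 scale; the dictionary keys and the `overall_score` helper are illustrative names, not identifiers from the pipeline:

```python
from typing import Dict

# Weights copied from the table above; the key names are illustrative.
WEIGHTS: Dict[str, float] = {
    "citation_validation": 0.15,
    "completeness": 0.10,
    "fact_accuracy": 0.35,
    "content_quality": 0.25,
    "source_utilization": 0.15,
}


def overall_score(component_scores: Dict[str, float]) -> float:
    """Return the weighted sum of the 0-100 component scores."""
    missing = WEIGHTS.keys() - component_scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


example = {
    "citation_validation": 90,
    "completeness": 80,
    "fact_accuracy": 70,
    "content_quality": 85,
    "source_utilization": 100,
}
# 13.5 + 8.0 + 24.5 + 21.25 + 15.0 = 82.25
print(round(overall_score(example), 2))
```

Because Fact Accuracy carries 35% of the total, a weak fact-check result pulls the overall score down more than any other component: here a 70 on facts keeps an otherwise strong draft below the 85 needed for approval.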
Output Format
```markdown
### 📊 DEEP REASONING VERIFICATION REPORT

#### AGGREGATED SCORES
| Component           | Score   | Weight | Weighted |
|---------------------|---------|--------|----------|
| Citation Validation | [0-100] | 15%    | [score]  |
| Completeness        | [0-100] | 10%    | [score]  |
| Fact Accuracy       | [0-100] | 35%    | [score]  |
| Content Quality     | [0-100] | 25%    | [score]  |
| Source Utilization  | [0-100] | 15%    | [score]  |
| **OVERALL SCORE**   | -       | 100%   | **[X]**  |

#### 🔗 CITATION VALIDATION
- Citations: [X] valid / [Y] total ([Z]%)
- Broken URLs: [list if any]
- Sections missing citations: [list if any]

#### 🎯 COMPLETENESS CHECK
- Topic Alignment: [score]%
- Missing Key Topics: [list if any]
- Word Count: [count]

#### ✅ FACT-CHECK RESULTS
- Claims Checked: [X]
- Verified: [X] | Unverified: [Y] | Contradicted: [Z]
- Critical Contradictions: [list if any]
- Deep Investigation Notes: [summary of searches performed]

#### 📋 CONTENT QUALITY
- Structure Score: [score]%
- Content Depth Score: [score]%
- Table Quality: [score]%
- Missing Sections: [list if any]
- Short Sections: [list if any]

#### 🔄 SOURCE UTILIZATION
- Sources Gathered: [X] | Cited: [Y]
- Coverage: [Z]%
- Unused Sources: [count]

#### ⚠️ ALL ISSUES (Prioritized)
**CRITICAL** (Must Fix):
1. [Issue]

**MAJOR** (Should Fix):
1. [Issue]

**MINOR** (Nice to Fix):
1. [Issue]

#### 🎯 FINAL VERIFICATION DECISION
**STATUS**: [VALID | NEEDS_REVISION | INVALID]
**Reasoning**: [Brief explanation based on scores and issues]

#### ✨ RECOMMENDATIONS
[Specific next steps based on STATUS]
```
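Whoever consumes this report has to read the final decision back out of it. A minimal sketch, assuming the report arrives as plain text and the decision sits on a `**STATUS**:` line exactly as in the template; `extract_status` is a hypothetical helper, not part of the pipeline:

```python
import re
from typing import Optional

# Matches a filled-in status line such as "**STATUS**: NEEDS_REVISION".
# The unfilled template line "**STATUS**: [VALID | ...]" does not match,
# because "[" sits between the colon and the first status word.
STATUS_PATTERN = re.compile(r"\*\*STATUS\*\*:\s*(VALID|NEEDS_REVISION|INVALID)\b")


def extract_status(report: str) -> Optional[str]:
    """Return the status found in the report, or None if there is none."""
    match = STATUS_PATTERN.search(report)
    return match.group(1) if match else None


report = (
    "#### FINAL VERIFICATION DECISION\n"
    "**STATUS**: NEEDS_REVISION\n"
    "**Reasoning**: Two claims could not be verified.\n"
)
print(extract_status(report))
```

Returning `None` instead of guessing lets the caller treat a malformed report as its own failure case.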
Decision Thresholds
VALID (Draft Approved)
```
- Overall Score >= 85
- Zero critical issues
- Zero contradicted facts
- All required sections present
```
→ Proceed to Summary (Step 5)
NEEDS_REVISION
```
- Overall Score 60–84
- Max 2 critical issues
- Max 1 contradicted fact
- Minor structural issues acceptable
```
→ Send back to draft-subagent with specific revision feedback
INVALID
```
- Overall Score < 60
- >2 critical issues
- >1 contradicted fact
- Major structural gaps
```
→ Return to research phase with refined approach
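The thresholds do not say how to classify a draft that meets some conditions of one status and some of another, for example a score of 90 with one critical issue. The sketch below makes that choice explicit. It is an assumption, not the agent's documented behavior: any single INVALID condition disqualifies the draft, VALID requires all of its conditions at once, and everything else is NEEDS_REVISION. `decide_status` and its parameter names are hypothetical.

```python
def decide_status(
    overall: float,
    critical_issues: int,
    contradicted_facts: int,
    all_sections_present: bool = True,
    major_structural_gaps: bool = False,
) -> str:
    """Map aggregated results to VALID, NEEDS_REVISION, or INVALID."""
    # Assumption: any one INVALID condition is enough to reject the draft.
    if (
        overall < 60
        or critical_issues > 2
        or contradicted_facts > 1
        or major_structural_gaps
    ):
        return "INVALID"
    # Assumption: VALID requires every VALID condition simultaneously.
    if (
        overall >= 85
        and critical_issues == 0
        and contradicted_facts == 0
        and all_sections_present
    ):
        return "VALID"
    # Everything in between goes back for revision.
    return "NEEDS_REVISION"


print(decide_status(90, 0, 0))  # all VALID conditions hold
print(decide_status(90, 1, 0))  # high score, but one critical issue
print(decide_status(72, 3, 0))  # too many critical issues
```

Checking INVALID first means a high score cannot mask a disqualifying issue count.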
Verification Loop Integration
The deep-reasoning-agent is part of a revision loop controlled by the main agent:
```
Draft → Deep Reasoning → VALID? → Summary → Report
                           ↓ NO
        Revision (max 3x) → Re-draft → Deep Reasoning → ...
```
- Main agent tracks `revision_count` (starts at 0, hard cap at 3)
- On NEEDS_REVISION/INVALID: main agent re-invokes `draft-subagent` with specific feedback
- After 3 failures: main agent proceeds with LOW CONFIDENCE warning
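The loop can be sketched as a plain function. This is an illustration under stated assumptions rather than the main agent's actual control code: `verify` and `revise` are hypothetical callables standing in for invoking the deep-reasoning-agent and the draft-subagent, and the cap is read as counting revisions, so a draft may be verified up to four times (the original plus three revisions).

```python
from typing import Callable, Tuple

MAX_REVISIONS = 3  # hard cap on revision_count stated above


def run_verification_loop(
    draft: str,
    verify: Callable[[str], Tuple[str, str]],  # returns (status, feedback)
    revise: Callable[[str, str], str],         # returns the revised draft
) -> Tuple[str, str, int]:
    """Return (final_draft, confidence, revision_count)."""
    revision_count = 0
    while True:
        status, feedback = verify(draft)
        if status == "VALID":
            return draft, "OK", revision_count
        if revision_count >= MAX_REVISIONS:
            # Cap reached: proceed anyway, flagged as low confidence.
            return draft, "LOW_CONFIDENCE", revision_count
        draft = revise(draft, feedback)
        revision_count += 1


# Stub callables for illustration only.
def stub_verify(draft: str) -> Tuple[str, str]:
    if "[fixed]" in draft:
        return "VALID", ""
    return "NEEDS_REVISION", "add citations"


def stub_revise(draft: str, feedback: str) -> str:
    return draft + " [fixed]"


print(run_verification_loop("initial draft", stub_verify, stub_revise))
```

Returning the revision count alongside the draft gives the caller what it needs to attach the LOW CONFIDENCE warning to the final report.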
Critical Rules
- Run ALL Phases: never skip a verification layer
- Be Efficient: max 5 internet searches for fact-checking
- Aggregate Fairly: apply weights consistently
- Prioritize Issues: Critical > Major > Minor
- Be Decisive: make a clear final STATUS decision
- Be Actionable: provide specific next steps, not vague suggestions
- Use Authoritative Sources: prefer official, academic, or established news sources
- Document Everything: the investigation trail helps the main agent make decisions
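The Critical > Major > Minor ordering can be applied with a stable sort. A minimal sketch, assuming each issue is a `(severity, description)` pair; that representation, the `prioritize` helper, and the sample issues are hypothetical:

```python
from typing import List, Tuple

# Lower rank sorts first: Critical > Major > Minor.
SEVERITY_RANK = {"CRITICAL": 0, "MAJOR": 1, "MINOR": 2}

Issue = Tuple[str, str]  # (severity, description)


def prioritize(issues: List[Issue]) -> List[Issue]:
    """Order issues by severity, keeping discovery order within a level."""
    # sorted() is stable, so equal-severity issues keep their input order.
    return sorted(issues, key=lambda issue: SEVERITY_RANK[issue[0]])


found = [
    ("MINOR", "Inconsistent heading capitalization"),
    ("CRITICAL", "Claim contradicted by cited source"),
    ("MAJOR", "Section has no citations"),
    ("CRITICAL", "Broken URL on a key reference"),
]
for severity, description in prioritize(found):
    print(f"{severity}: {description}")
```

An unknown severity label raises `KeyError`, which surfaces a malformed issue instead of silently misplacing it.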