Research Synthesis Workflow
This skill provides a systematic methodology for conducting research, synthesizing findings from multiple sources, and producing actionable knowledge artifacts.
Core Competencies
- •Source Evaluation: Assessing credibility, relevance, and bias
- •Information Extraction: Systematic note-taking and annotation
- •Synthesis Methods: Thematic analysis, meta-analysis, framework building
- •Knowledge Artifacts: Reports, literature reviews, decision frameworks
Research Workflow Overview
code
┌──────────────────────────────────────────────────────────────┐ │ Research Synthesis Workflow │ ├──────────────────────────────────────────────────────────────┤ │ │ │ 1. SCOPE 2. GATHER 3. EXTRACT │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ │ Define │─────▶│ Find │─────▶│ Capture │ │ │ │ Question│ │ Sources │ │ Insights│ │ │ └─────────┘ └─────────┘ └─────────┘ │ │ │ │ │ │ │ 5. PRODUCE 4. SYNTHESIZE │ │ │ ┌─────────┐ ┌─────────┐ │ │ └─────────▶│ Create │◀─────│ Connect │ │ │ │ Artifact│ │ Themes │ │ │ └─────────┘ └─────────┘ │ │ │ └──────────────────────────────────────────────────────────────┘
Phase 1: Scope Definition
Research Question Framework
Transform vague topics into answerable questions:
| Type | Pattern | Example |
|---|---|---|
| Exploratory | What is X? How does X work? | What is vector search? |
| Comparative | How does X compare to Y? | PostgreSQL vs. Neo4j for graphs? |
| Evaluative | Is X effective for Y? | Is RAG effective for technical docs? |
| Causal | What causes X? What are effects of X? | What causes LLM hallucinations? |
| Prescriptive | How should we implement X? | How to design a RAG pipeline? |
Scope Boundaries
Define explicitly:
- •In scope: Topics to cover
- •Out of scope: Adjacent topics to exclude
- •Depth: Survey (broad) vs. deep-dive (narrow)
- •Time bounds: Cut-off dates for sources
- •Source types: Academic, industry, primary data
Example Scope Document
markdown
## Research Scope: Vector Database Selection ### Research Question Which vector database best fits our production RAG system requiring <50ms latency at 10M+ vectors? ### In Scope - Pinecone, Weaviate, Milvus, Qdrant, pgvector - Latency benchmarks at scale - Cost analysis (cloud vs self-hosted) - Operational complexity ### Out of Scope - General-purpose databases with vector extensions - Sub-million vector use cases - Academic/research-only systems ### Success Criteria Recommendation with supporting evidence for 2-3 top candidates
Phase 2: Source Gathering
Source Quality Assessment
Evaluate each source on:
| Criterion | High Quality | Low Quality |
|---|---|---|
| Authority | Expert author, peer-reviewed | Anonymous, no credentials |
| Currency | Recent, updated | Outdated, no dates |
| Accuracy | Citations, verifiable | Unsupported claims |
| Purpose | Inform, educate | Sell, persuade |
| Coverage | Comprehensive | Superficial |
Source Types and Uses
code
Primary Sources (original) ├── Research papers ├── Official documentation ├── Benchmark data └── Expert interviews Secondary Sources (analysis) ├── Review articles ├── Technical blogs ├── Industry reports └── Book chapters Tertiary Sources (summaries) ├── Wikipedia ├── Textbooks └── Encyclopedias
Search Strategies
Keyword expansion:
- •Start: "vector database performance"
- •Expand: "approximate nearest neighbor", "HNSW benchmark", "embedding search latency"
Citation chaining:
- •Forward: Who cites this paper?
- •Backward: What does this paper cite?
Author tracking:
- •Find key researchers, follow their work
Source Documentation
For each source, capture:
markdown
## Source: [Title] - **URL/DOI**: - **Author(s)**: - **Date**: - **Type**: [paper/blog/docs/report] - **Quality Score**: [1-5] - **Relevance**: [high/medium/low] - **Key Topics**: - **Notes**:
Phase 3: Information Extraction
Structured Note-Taking
Use consistent templates for extraction:
markdown
## Claim: [Specific assertion] - **Source**: [reference] - **Evidence**: [supporting data/reasoning] - **Strength**: [strong/moderate/weak] - **My Assessment**: [agree/disagree/uncertain] - **Related Claims**: [links to other notes]
Evidence Classification
| Type | Description | Weight |
|---|---|---|
| Empirical | Measured data, experiments | High |
| Analytical | Logical derivation | Medium-High |
| Anecdotal | Case studies, examples | Medium |
| Expert Opinion | Authority statements | Medium |
| Theoretical | Model predictions | Medium-Low |
Contradiction Tracking
When sources disagree:
markdown
## Conflict: [Topic] ### Position A: [Claim] - Sources: [list] - Evidence: [summary] ### Position B: [Claim] - Sources: [list] - Evidence: [summary] ### Analysis - Methodological differences: - Context differences: - Possible resolution: - My conclusion:
Phase 4: Synthesis
Thematic Analysis
- •Code individual insights with tags
- •Cluster related codes into themes
- •Review themes for coherence
- •Define each theme clearly
- •Relate themes to research question
code
Codes Themes Findings ├─ fast queries ─┐ ├─ low latency ─┼── Performance ─┬── Theme 1: Performance ├─ high throughput ─┘ │ varies significantly ├─ managed service ─┐ │ by workload type ├─ self-hosted ─┼── Deployment ─┼── Theme 2: Cloud vs ├─ kubernetes ─┘ │ self-hosted tradeoff ├─ pricing tiers ─┐ │ ├─ compute costs ─┼── Economics ─┴── Theme 3: Total cost ├─ hidden costs ─┘ drives final choice
Framework Building
Create decision frameworks from synthesis:
markdown
## Vector Database Selection Framework ### Decision Tree 1. Scale requirement? - <1M vectors → pgvector (simplicity) - 1M-100M vectors → Continue to 2 - >100M vectors → Milvus/Weaviate (distributed) 2. Operational capacity? - Limited DevOps → Pinecone (managed) - Strong DevOps → Continue to 3 3. Cost sensitivity? - Budget constrained → Qdrant (open source) - Budget flexible → Evaluate all options ### Comparison Matrix | Criterion | Weight | Pinecone | Milvus | Qdrant | |----------------|--------|----------|--------|--------| | Latency | 30% | 4 | 5 | 4 | | Scalability | 25% | 5 | 5 | 4 | | Operations | 20% | 5 | 3 | 4 | | Cost | 15% | 2 | 4 | 5 | | Features | 10% | 4 | 5 | 4 | | **Weighted** | | **4.0** | **4.4**| **4.2**|
Phase 5: Knowledge Artifact Production
Artifact Types
| Format | Purpose | Audience |
|---|---|---|
| Executive Summary | Quick decision support | Leadership |
| Technical Report | Detailed analysis | Engineers |
| Literature Review | Academic synthesis | Researchers |
| Decision Framework | Structured evaluation | Decision makers |
| Reference Guide | Quick lookup | Practitioners |
Structure Templates
Executive Summary (1-2 pages):
- •Context and question
- •Key findings (3-5 bullets)
- •Recommendation
- •Risks and considerations
Technical Report (5-20 pages):
- •Executive summary
- •Background and scope
- •Methodology
- •Findings by theme
- •Analysis and discussion
- •Recommendations
- •Appendices (data, sources)
Quality Checklist
Before finalizing:
- • Research question answered?
- • All claims supported by evidence?
- • Contradictions addressed?
- • Limitations acknowledged?
- • Actionable recommendations?
- • Sources properly cited?
- • Appropriate for audience?
Best Practices
Avoiding Bias
- •Seek disconfirming evidence actively
- •Include multiple perspectives
- •Note your priors and update them
- •Separate observation from interpretation
- •Document methodology for transparency
Managing Scope Creep
- •Return to research question frequently
- •Park interesting tangents in "Future Research"
- •Time-box each phase
- •Define "good enough" criteria upfront
Iteration
Research is rarely linear:
- •New sources may require scope adjustment
- •Synthesis may reveal gaps requiring more gathering
- •Artifacts may need multiple drafts
References
- •
references/evaluation-rubrics.md- Source quality scoring guides - •
references/synthesis-methods.md- Detailed synthesis techniques - •
references/artifact-templates.md- Document templates and examples