Trace: Concept Lineage Deep Dive
Trace the complete research genealogy of a single concept. Answers two questions: "Where did this come from?" and "How did it evolve?"
When to Use
- "Trace [concept]"
- "Where did [concept] come from?"
- "Show me [concept]'s paper history"
- "History of the attention mechanism"
- Understanding one concept's full evolution
When NOT to Use
- Learning multiple concepts → use deep-dive
- General domain exploration → use domain-vocab
- Latest research only → use frontier
Core Value
Following one concept from its roots through its descendants shows how the surrounding field developed around it.
Trace provides vertical depth (one concept, its complete history), whereas deep-dive provides horizontal breadth (many concepts, the key papers for each).
Workflow
Phase 1: Concept Identification
Input: Concept name (optionally with domain context)
Actions:
- Confirm the concept and its domain
- Identify potential ambiguity (e.g., "Attention" in NLP vs. psychology)
- Prime the context with a lightweight pass of domain-vocab terms
Output:
```yaml
concept: "Attention Mechanism"
domain: "NLP / Deep Learning"
disambiguation: "Neural attention for sequence models, not cognitive attention"
search_keywords: ["attention", "neural machine translation", "sequence to sequence"]
```
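A minimal Python sketch of Phase 1, assuming the concept record is held in a dataclass that mirrors the output fields above. `KNOWN_SENSES`, `ConceptSpec`, and `candidate_domains` are hypothetical names, and the sense table is a hard-coded stand-in for whatever disambiguation source the skill actually draws on.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# HYPOTHETICAL sense table: head word -> domains where it means different things.
# A real run would derive this from domain-vocab or search results, not a literal dict.
KNOWN_SENSES = {
    "attention": ["NLP / Deep Learning", "Cognitive Psychology"],
    "transformer": ["NLP / Deep Learning", "Electrical Engineering"],
}


@dataclass
class ConceptSpec:
    """Mirrors the Phase 1 output fields."""
    concept: str
    domain: str
    disambiguation: str = ""
    search_keywords: List[str] = field(default_factory=list)


def candidate_domains(name: str, domain: Optional[str] = None) -> List[str]:
    """Return the domains the user must choose between; empty means no question is needed."""
    if domain is not None:
        return []  # the user already supplied domain context
    words = name.lower().split()
    senses = KNOWN_SENSES.get(words[0], []) if words else []
    return senses if len(senses) > 1 else []


spec = ConceptSpec(
    concept="Attention Mechanism",
    domain="NLP / Deep Learning",
    disambiguation="Neural attention for sequence models, not cognitive attention",
    search_keywords=["attention", "neural machine translation", "sequence to sequence"],
)
```

A non-empty result from `candidate_domains` is the cue to ask the user before moving on to root discovery.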
Phase 2: Root Paper Discovery
Objective: Find the seminal paper that introduced this concept
Actions:
- Search Semantic Scholar and arXiv for the concept
- Filter by:
  - High citation count
  - Early publication date
  - Title or abstract explicitly introduces the concept
- Verify with heuristics:
  - Does the abstract say "we propose" or "we introduce"?
  - Is it widely cited as the origin?
- If several candidates remain, ask the user to confirm
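The search and filtering steps could look roughly like this. The URL targets the Semantic Scholar Graph API paper-search endpoint (`query`, `fields`, `limit`, and `year` parameters); the arXiv lookup is omitted, and the 1000-citation cutoff, `top_n`, and the helper names are illustrative assumptions. The sketch only builds the URL, so it runs offline.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

# Semantic Scholar Graph API relevance-search endpoint.
S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(keywords: List[str], limit: int = 50,
                     year_range: Optional[Tuple[int, int]] = None) -> str:
    """Build (but do not send) a paper-search request for the concept keywords."""
    params = {
        "query": " ".join(keywords),
        "fields": "title,year,citationCount,abstract,authors,venue,externalIds",
        "limit": limit,
    }
    if year_range is not None:
        params["year"] = f"{year_range[0]}-{year_range[1]}"
    return f"{S2_SEARCH}?{urlencode(params)}"


def shortlist(papers: List[Dict], min_citations: int = 1000, top_n: int = 3) -> List[Dict]:
    """Keep highly cited papers with a known year, earliest first (ties: more citations first)."""
    cited = [p for p in papers
             if p.get("year") and (p.get("citationCount") or 0) >= min_citations]
    cited.sort(key=lambda p: (p["year"], -p["citationCount"]))
    return cited[:top_n]
```

Send the URL with any HTTP client, pass the papers from the JSON response's `data` list to `shortlist`, and ask the user to choose whenever more than one paper survives.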
Decision Matrix (sum the scores of the signals a candidate shows; the highest total is the most likely root):
| Signal | Evidence Strength | Score |
|---|---|---|
| "We propose/introduce X" in abstract | Strong | +3 |
| Published before all high-citation papers on topic | Strong | +3 |
| 1000+ citations | Likely influential | +2 |
| Cited by survey papers as origin | Strong | +2 |
| Author is known pioneer | Supporting | +1 |
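One way to apply the decision matrix is to sum the points per candidate and flag ties for the user. This sketch assumes simplified paper dicts (author names as plain strings, a `paperId` key) plus caller-supplied sets of survey-cited IDs and known pioneers; none of these names come from a specific library. Counting a same-year publication as "before" is a design choice, since year-level dates are coarse.

```python
import re
from typing import Dict, Iterable, List, Optional, Set, Tuple

INTRO = re.compile(r"\bwe (?:propose|introduce)\b", re.IGNORECASE)


def root_score(paper: Dict, other_high_cited_years: Iterable[int],
               survey_origin_ids: Set[str], pioneers: Set[str]) -> int:
    """Sum the decision-matrix points for one candidate paper."""
    score = 0
    if INTRO.search(paper.get("abstract") or ""):
        score += 3  # "We propose/introduce X" in abstract
    if all(paper["year"] <= y for y in other_high_cited_years):
        score += 3  # no earlier high-citation paper on the topic (same year counts)
    if (paper.get("citationCount") or 0) >= 1000:
        score += 2  # 1000+ citations
    if paper["paperId"] in survey_origin_ids:
        score += 2  # surveys name it as the origin
    if any(name in pioneers for name in paper.get("authors", [])):
        score += 1  # an author is a known pioneer
    return score


def pick_root(scored: List[Tuple[int, Dict]]) -> Optional[Dict]:
    """Return the top candidate, or None when the list is empty or the top score is tied."""
    ranked = sorted(scored, key=lambda sp: sp[0], reverse=True)
    if not ranked or (len(ranked) > 1 and ranked[0][0] == ranked[1][0]):
        return None  # caller presents the options and asks the user
    return ranked[0][1]
```

The regex is a literal phrase match, so wordings such as "and propose to extend this" slip past it; the +3 is a hint, not proof.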
Output:
```yaml
root_paper:
  id: "semantic_scholar_id"
  title: "Neural Machine Translation by Jointly Learning to Align and Translate"
  authors: ["Bahdanau", "Cho", "Bengio"]
  year: 2014
  venue: "ICLR 2015"
  citations: 45000+
  key_contribution: "Introduced attention mechanism for NMT"
  abstract: "..."
  url: "https://arxiv.org/abs/1409.0473"
```
Phase 3: Ancestry Extraction (Parents)
Objective: What did the root paper build upon?
Actions:
- Retrieve the root paper's references
- Identify influential predecessors:
  - Papers cited multiple times in the root paper
  - Papers supplying key techniques the root paper uses
  - Conceptual foundations
- Categorize ancestors:
  - Direct parents: immediate building blocks
  - Grandparents: foundational work
  - Parallel influences: contemporary related work
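A rough heuristic for the three ancestry buckets, assuming each reference carries a year and an in-text mention count (Semantic Scholar's references endpoint can return citation contexts, from which such a count could be derived). The `parent_window` and `min_mentions` thresholds are assumptions, not values defined by this workflow.

```python
from typing import Dict, List


def categorize_references(root_year: int, refs: List[Dict],
                          parent_window: int = 3, min_mentions: int = 2) -> Dict[str, List[str]]:
    """Split the root paper's references into the three ancestry buckets.

    Each ref is a dict with 'title', 'year' and 'mentions' (how many times the
    root paper cites it in its text). Thresholds are illustrative defaults.
    """
    out: Dict[str, List[str]] = {"direct_parents": [], "grandparents": [], "parallel_influences": []}
    for ref in refs:
        age = root_year - ref["year"]
        if age > parent_window:
            out["grandparents"].append(ref["title"])         # older foundational work
        elif ref["mentions"] >= min_mentions:
            out["direct_parents"].append(ref["title"])       # recent and leaned on heavily
        else:
            out["parallel_influences"].append(ref["title"])  # recent but cited in passing
    return out
```

Under these defaults a same-year paper the root cites repeatedly becomes a direct parent, while a same-year paper mentioned once is treated as a parallel influence.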
Output:
```yaml
ancestry:
  direct_parents:
    - title: "Sequence to Sequence Learning with Neural Networks"
      authors: ["Sutskever", "Vinyals", "Le"]
      year: 2014
      relationship: "Seq2Seq framework that attention extends"
    - title: "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation"
      authors: ["Cho", "van Merrienboer", "et al."]
      year: 2014
      relationship: "GRU architecture used in attention model"
  grandparents:
    - title: "Long Short-Term Memory"
      authors: ["Hochreiter", "Schmidhuber"]
      year: 1997
      relationship: "Foundational recurrent architecture"
  parallel_influences:
    - title: "Neural Turing Machines"
      authors: ["Graves", "Wayne", "Danihelka"]
      year: 2014
      relationship: "Independent attention-like mechanism"
```
Phase 4: Descendant Extraction (Children)
Objective: How did the concept evolve after introduction?
Actions:
- Retrieve papers that cite the root paper
- Filter for high-impact descendants (citations > threshold)
- Identify evolution branches:
  - Direct extensions: improve the original mechanism
  - Applications: carry the concept into new domains
  - Paradigm shifts: fundamentally reimagine it
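The branch labels could be assigned with a crude heuristic like the one below. The rules are assumptions, not part of the workflow: a descendant that outgrows the root's citation count is treated as a paradigm shift, one in a different domain as an application, and everything else as a direct extension. Borderline cases would still need human or model judgment.

```python
from typing import Dict, List


def classify_descendants(root: Dict, citing: List[Dict],
                         min_citations: int = 1000) -> Dict[str, List[str]]:
    """Bucket high-impact citing papers into the three evolution branches.

    Papers are dicts with 'title', 'domain' and 'citationCount'.
    """
    branches: Dict[str, List[str]] = {"direct_extensions": [], "applications": [], "paradigm_shifts": []}
    for paper in sorted(citing, key=lambda p: p["citationCount"], reverse=True):
        if paper["citationCount"] < min_citations:
            continue  # below the impact threshold
        if paper["citationCount"] > root["citationCount"]:
            branches["paradigm_shifts"].append(paper["title"])    # outgrew its ancestor
        elif paper["domain"] != root["domain"]:
            branches["applications"].append(paper["title"])       # same idea, new field
        else:
            branches["direct_extensions"].append(paper["title"])
    return branches
```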
Output:
```yaml
descendants:
  direct_extensions:
    - title: "Effective Approaches to Attention-based Neural Machine Translation"
      authors: ["Luong", "Pham", "Manning"]
      year: 2015
      contribution: "Local vs. global attention variants"
      citations: 12000+
  paradigm_shifts:
    - title: "Attention Is All You Need"
      authors: ["Vaswani", "et al."]
      year: 2017
      contribution: "Architecture built entirely on self-attention, eliminating recurrence"
      citations: 100000+
      spawned: ["BERT", "GPT", "Vision Transformer"]
  applications:
    - title: "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
      authors: ["Xu", "et al."]
      year: 2015
      contribution: "Attention for image captioning"
      domain: "Computer Vision"
```
Phase 5: Timeline Construction
Objective: Visualize the complete evolution
Output:
```mermaid
timeline
    title Attention Mechanism Evolution
    1997 : LSTM (Hochreiter)
         : Foundation for sequence modeling
    2014 : Seq2Seq (Sutskever)
         : Encoder-decoder framework
    2014 : Bahdanau Attention ⭐
         : ROOT - Attention mechanism introduced
    2015 : Luong Attention
         : Local/global variants
    2015 : Show, Attend and Tell
         : Attention for vision
    2017 : Transformer ⭐⭐
         : Self-attention revolution
    2018 : BERT, GPT
         : Pretrained transformers
    2020 : Vision Transformer
         : Attention conquers CV
    2023 : Modern LLMs
         : Attention at scale
```
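If lineage data is kept as (year, label, note) tuples, the Mermaid `timeline` source could be generated instead of hand-written. This is a sketch; the tuple shape and the `to_mermaid_timeline` name are hypothetical.

```python
from typing import List, Tuple


def to_mermaid_timeline(title: str, events: List[Tuple[int, str, str]]) -> str:
    """Render (year, label, note) events as Mermaid `timeline` source."""
    lines = ["timeline", f"    title {title}"]
    for year, label, note in sorted(events, key=lambda e: e[0]):  # stable: same-year order kept
        head = f"    {year} : "
        lines.append(head + label)
        # Continuation lines start with ':' aligned under the period's colon.
        lines.append(" " * (len(head) - 2) + ": " + note)
    return "\n".join(lines)
```

Because Python's sort is stable, two events in the same year (such as Seq2Seq and Bahdanau Attention in 2014) keep the order in which they were listed.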
Phase 6: Insight Synthesis
Objective: Extract learnings from the lineage
Output:
```markdown
## Key Insights from Tracing "Attention Mechanism"

### Origin Story
Attention emerged from a practical problem: RNN encoder-decoder models struggled with long sequences because the encoder compressed the whole input into a single fixed-length vector. Bahdanau's insight was to let the decoder "look back" at relevant parts of the input.

### Evolution Pattern
1. **Problem → Solution**: Long-range dependency problem → Attention
2. **Generalization**: NMT-specific → General sequence mechanism
3. **Paradigm Shift**: Auxiliary mechanism → Primary architecture (Transformer)
4. **Cross-Domain Transfer**: NLP → Vision → Multimodal

### Branching Points
- 2015: Luong's local attention (efficiency branch)
- 2017: Transformer (self-attention revolution)
- 2020: Vision Transformer (modality transfer)

### Key Researchers
- Dzmitry Bahdanau: Original attention
- Ashish Vaswani: Transformer architecture
- Alexey Dosovitskiy: Vision Transformer

### If You Learn This
Understanding attention's evolution helps you:
- See why Transformers replaced RNNs
- Understand architectural decisions in modern LLMs
- Predict where attention might go next
```
Output Formats
Format A: Lineage Report (Default)
Complete markdown report with all phases.
Format B: Visual Tree
```text
                     [LSTM 1997]
                          │
                   [Seq2Seq 2014]
                          │
              [Bahdanau Attention 2014] ⭐ ┄┄ [Neural Turing Machines 2014] (parallel influence)
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
  [Luong 2015]    [Show, Attend and [Pointer Networks
        │            Tell 2015]           2015]
        │
        └─────────────────┐
                          │
                 [Transformer 2017] ⭐⭐
                          │
            ┌─────────────┼─────────────┐
            │             │             │
       [BERT 2018]   [GPT 2018]    [ViT 2020]
```
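Hand-drawing a centered layout like this gets tedious. A simpler indented variant (the `├──`/`└──` style used in the example session) can be rendered from a parent-to-children mapping; the function names and input shape here are assumptions.

```python
from typing import Dict, FrozenSet, List


def _branches(node: str, children: Dict[str, List[str]], prefix: str,
              path: FrozenSet[str]) -> List[str]:
    lines: List[str] = []
    kids = children.get(node, [])
    for i, kid in enumerate(kids):
        last = i == len(kids) - 1
        lines.append(prefix + ("└── " if last else "├── ") + kid)
        if kid not in path:  # skip re-expanding ancestors if noisy data forms a cycle
            lines.extend(_branches(kid, children, prefix + ("    " if last else "│   "), path | {kid}))
    return lines


def render_tree(root: str, children: Dict[str, List[str]]) -> str:
    """Render a parent -> children mapping as an indented box-drawing tree."""
    return "\n".join([root] + _branches(root, children, "", frozenset({root})))
```

A paper with two parents in the mapping is simply drawn under both; only a true cycle stops the recursion.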
Format C: Obsidian Canvas
Visual representation with clickable paper nodes.
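Obsidian canvases are stored as JSON Canvas files: a top-level `nodes` array and an `edges` array whose entries connect nodes via `fromNode`/`toNode`. A minimal sketch that places one column per year and makes each paper a text node containing a Markdown link; the layout constants, the `build_canvas` name, and the paper dict shape are assumptions.

```python
import json
from typing import Dict, List, Tuple


def build_canvas(papers: List[Dict], links: List[Tuple[str, str]],
                 col_width: int = 360, row_height: int = 160) -> str:
    """Lay papers out in one column per year and return JSON Canvas text.

    papers: dicts with 'id', 'title', 'year', 'url'; links: (ancestor_id, descendant_id).
    """
    years = sorted({p["year"] for p in papers})
    next_row: Dict[int, int] = {}
    nodes = []
    for p in sorted(papers, key=lambda p: p["year"]):
        row = next_row.get(p["year"], 0)
        next_row[p["year"]] = row + 1
        nodes.append({
            "id": p["id"],
            "type": "text",  # text nodes render Markdown, so the link is clickable
            "text": f"**[{p['title']}]({p['url']})**\n{p['year']}",
            "x": years.index(p["year"]) * col_width,
            "y": row * row_height,
            "width": 300,
            "height": 120,
        })
    edges = [
        {"id": f"edge-{i}", "fromNode": src, "fromSide": "right",
         "toNode": dst, "toSide": "left"}
        for i, (src, dst) in enumerate(links)
    ]
    return json.dumps({"nodes": nodes, "edges": edges}, indent=2, ensure_ascii=False)
```

Writing the returned string to a file with a `.canvas` extension inside a vault should make it open as a canvas.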
Format D: Timeline Only
Compact timeline view for quick reference.
Example Session
Input: "Trace Transformer architecture"
Output Summary:
```markdown
# Transformer Lineage

## Root Paper
"Attention Is All You Need" (Vaswani et al., 2017)
- Google Brain / Google Research
- 100,000+ citations
- Introduced: an encoder-decoder built entirely on attention, with multi-head scaled dot-product self-attention and sinusoidal positional encoding

## Ancestry
├── Bahdanau Attention (2014) - Attention mechanism
├── Seq2Seq (2014) - Encoder-decoder framework
├── Layer Normalization (2016) - Training stability
└── Residual Connections (2015) - Deep network training

## Key Descendants
├── BERT (2018) - Bidirectional pretraining
├── GPT (2018) - Autoregressive pretraining
├── GPT-2/3/4 (2019-2023) - Scale revolution
├── Vision Transformer (2020) - CV application
└── Multimodal Models (2021+) - Cross-modal attention

## Timeline: 7 years of dominance
2017 → 2024: From NMT improvement to foundation of modern AI
```
Depth Levels
| Level | Ancestry Depth | Descendant Depth | Papers |
|---|---|---|---|
| Quick | Parents only | Top 5 children | ~10 |
| Standard | Parents + grandparents | 2 generations | ~25 |
| Exhaustive | 3 generations | 3 generations | ~50+ |
Default: Standard
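The depth levels amount to a breadth-first walk that stops after a fixed number of generations in each direction. In this sketch the generation counts follow the table; the per-paper fan-out cap of 5 is an assumption (not part of the table) that keeps heavily cited papers from flooding the walk, and a hard total cap would still be needed to hit the paper budgets exactly. Graphs are plain dicts keyed by paper ID.

```python
from collections import deque
from typing import Dict, List

# Hypothetical encoding of the table: (ancestor generations, descendant generations,
# max neighbors kept per paper). The fan-out value is an assumption.
DEPTH_LEVELS = {
    "quick": (1, 1, 5),
    "standard": (2, 2, 5),
    "exhaustive": (3, 3, 5),
}


def bounded_bfs(start: str, graph: Dict[str, List[str]], max_depth: int, fanout: int) -> List[str]:
    """Breadth-first walk up to max_depth hops, keeping the first `fanout` neighbors of each node.

    Neighbor lists are assumed pre-sorted by citation count, most cited first.
    """
    seen = {start}
    found: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # generation limit reached; do not expand further
        for nxt in graph.get(node, [])[:fanout]:
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


def collect_lineage(root: str, references: Dict[str, List[str]],
                    citations: Dict[str, List[str]], level: str = "standard") -> Dict[str, List[str]]:
    """Walk references upward and citations downward to the depth the level allows."""
    up, down, fanout = DEPTH_LEVELS[level]
    return {
        "ancestors": bounded_bfs(root, references, up, fanout),
        "descendants": bounded_bfs(root, citations, down, fanout),
    }
```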
Error Handling
| Situation | Recovery |
|---|---|
| Multiple possible roots | Present options, ask user |
| Concept too recent (< 2 years) | Note as "emerging", limited lineage |
| Concept from practice (not academia) | Note origin, trace related academic work |
| Too many descendants (1000+) | Filter by citations, ask for sub-branch focus |
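The error table maps onto a few guard checks run before building the report. A sketch under stated assumptions: `origin` takes hypothetical values such as `"academia"` or `"practice"`, the thresholds come from the table, and `today` is injectable so the "< 2 years" rule can be tested.

```python
from datetime import date
from typing import List, Optional


def recovery_actions(root_candidates: List[str], root_year: Optional[int],
                     origin: str, descendant_count: int,
                     today: Optional[date] = None) -> List[str]:
    """Map the error-handling table onto concrete next steps (empty list = proceed)."""
    year = (today or date.today()).year
    actions: List[str] = []
    if len(root_candidates) > 1:
        actions.append("present root options and ask the user")
    if root_year is not None and year - root_year < 2:
        actions.append("mark as emerging; expect a limited lineage")
    if origin == "practice":
        actions.append("note the non-academic origin; trace related academic work")
    if descendant_count >= 1000:
        actions.append("filter descendants by citations; ask for a sub-branch focus")
    return actions
```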