Foundation: Building Fundamental Understanding

Build a deep foundational understanding of a domain. Goes beyond surface definitions to explore the meaning, connectivity, and historical development of its concepts.

Position in Learning Pipeline

code

domain-vocab → foundation → trace/frontier/paper-flow
(WHAT)         (WHY/HOW)    (WHERE)

Concepts  →  Understanding  →  Research Origins

When to Use

•"Build foundation for [domain]"
•"I know the terms but don't understand how they connect"
•"Foundation [domain]"
•"[domain] fundamentals deeply"
•"How did [domain] develop?"
•"What's the big picture of [domain]?"
•After domain-vocab, before diving into papers

When NOT to Use

•Just need term definitions → use domain-vocab
•Want specific paper lineage → use trace
•Want latest research → use frontier
•Want full concept + paper treatment → use deep-dive

Core Value

Knowing terms is vocabulary. Understanding how they connect, why they exist, and how they evolved is fluency.

Foundation transforms scattered concepts into a coherent mental map.

Workflow

Phase 1: Deep Meaning Extraction

Objective: Go beyond dictionary definitions to understand what terms really mean.

Actions:

•For each key concept, extract:
  - Literal meaning: What the term denotes
  - Connotative meaning: What practitioners imply when using it
  - Historical meaning: How the meaning evolved over time
  - Contextual meaning: How meaning shifts in different sub-domains
•Identify semantic layers:
  - Surface level (beginner understanding)
  - Working level (practitioner understanding)
  - Deep level (expert/researcher understanding)

Output:

yaml

concept: "Gradient Descent"
meanings:
  literal: "Iteratively moving in the direction of steepest decrease"
  connotative: "The workhorse optimization method; implies iterative refinement"
  historical:
    - 1847: Cauchy introduces the method for solving systems of equations
    - 1960s: Applied to neural networks
    - 2010s: Becomes synonymous with "training"
  contextual:
    optimization: "A first-order method"
    deep_learning: "The training algorithm"
    practitioner: "Just call loss.backward() and optimizer.step()"
semantic_layers:
  surface: "Go downhill to find minimum"
  working: "Compute gradients, update parameters with learning rate"
  deep: "Navigating loss landscape geometry, escaping saddles"
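
One way to keep Phase 1 records complete is to check each one against the fields in the YAML above. The sketch below is a minimal, hypothetical checker in Python: the function name `missing_fields` and the choice to treat empty values as missing are assumptions, not part of the skill itself.

```python
from typing import Any

# Field names mirror the Phase 1 YAML; the checker itself is hypothetical.
MEANING_FACETS = ("literal", "connotative", "historical", "contextual")
SEMANTIC_LAYERS = ("surface", "working", "deep")


def missing_fields(record: dict[str, Any]) -> list[str]:
    """Return dotted paths of Phase 1 fields that are absent or empty."""
    missing = []
    if not record.get("concept"):
        missing.append("concept")
    for section, keys in (("meanings", MEANING_FACETS),
                          ("semantic_layers", SEMANTIC_LAYERS)):
        block = record.get(section) or {}
        for key in keys:
            if not block.get(key):
                missing.append(f"{section}.{key}")
    return missing


# A deliberately incomplete record, to show what the check reports.
record = {
    "concept": "Gradient Descent",
    "meanings": {
        "literal": "Iteratively moving in the direction of steepest decrease",
        "connotative": "The workhorse optimization method",
    },
    "semantic_layers": {"surface": "Go downhill to find minimum"},
}
print(missing_fields(record))
```

For this record the check reports the historical and contextual meanings and the working and deep layers as still to be filled in.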

Phase 2: Connectivity Mapping

Objective: Reveal how concepts relate, depend, and interact.

Actions:

•Build Prerequisite Graph:
  - What must you understand before concept X?
  - What concepts assume X as given?
•Build Synergy Map:
  - Which concepts amplify each other?
  - Which are frequently used together?
•Build Tension Map:
  - Which concepts compete or conflict?
  - What are the fundamental tradeoffs?
•Identify Bridge Concepts:
  - Concepts that connect different sub-domains
  - Concepts imported from other fields

Output:

yaml

connectivity:
  prerequisites:
    "Neural Network":
      requires: ["Linear Algebra", "Calculus", "Probability"]
      enables: ["Deep Learning", "Backpropagation", "Architectures"]

  synergies:
    - concepts: ["Attention", "Transformer"]
      relationship: "Self-attention replaces recurrence, letting Transformers process sequence positions in parallel"
    - concepts: ["Dropout", "Regularization"]
      relationship: "Dropout implements stochastic regularization"

  tensions:
    - concepts: ["Bias", "Variance"]
      tradeoff: "Reducing one typically increases the other"
    - concepts: ["Interpretability", "Accuracy"]
      tradeoff: "More complex models are often more accurate but harder to explain"

  bridges:
    - concept: "Information Theory"
      connects: ["ML Theory", "Compression", "Generalization"]
      imported_from: "Communication Theory (Shannon)"
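
The two prerequisite questions (what comes before X, and what assumes X as given) can both be answered from a single `requires` mapping. The following is a minimal Python sketch under stated assumptions: the helper names `dependents` and `all_prerequisites` are hypothetical, the `Backpropagation` entry is inferred from the `enables` list above, and its `Calculus` prerequisite is added purely for illustration.

```python
# Prerequisite edges: concept -> concepts it requires.
requires = {
    "Neural Network": ["Linear Algebra", "Calculus", "Probability"],
    # Assumed entry for illustration (not stated in this form in the skill).
    "Backpropagation": ["Neural Network", "Calculus"],
}


def dependents(requires):
    """Invert the graph: for each concept, which concepts assume it as given?"""
    out = {}
    for concept, prereqs in requires.items():
        for prereq in prereqs:
            out.setdefault(prereq, []).append(concept)
    return out


def all_prerequisites(concept, requires):
    """Collect direct and indirect prerequisites with a depth-first walk."""
    seen = set()
    stack = list(requires.get(concept, []))
    while stack:
        prereq = stack.pop()
        if prereq not in seen:  # the seen-set also guards against cycles
            seen.add(prereq)
            stack.extend(requires.get(prereq, []))
    return seen


print(dependents(requires)["Calculus"])
print(sorted(all_prerequisites("Backpropagation", requires)))
```

Storing only `requires` and deriving the inverse keeps the two views from drifting apart, which is easy to do when `requires` and `enables` are both written by hand.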

Phase 3: Historical Flow

Objective: Trace the development arc of the field.

Actions:

•Identify Era Boundaries:
  - What paradigm shifts divided the field's history?
  - What changed "before" vs "after" key moments?
•Map Evolution Threads:
  - How did major ideas evolve?
  - What parallel paths merged or diverged?
•Extract Key Inflection Points:
  - Breakthroughs that changed everything
  - Failures that redirected the field
  - External events that accelerated progress
•Capture Conceptual Archaeology:
  - Ideas that died and were resurrected
  - Terms that changed meaning over time
  - Approaches that fell out of favor and why

Output:

markdown

## Historical Flow: Machine Learning

### Era Map
| Era | Period | Paradigm | Key Development |
|-----|--------|----------|-----------------|
| Symbolic | 1950-1980 | Logic & rules | Expert systems, LISP |
| Connectionist Winter | 1970-1986 | Criticism of neural nets | Minsky and Papert's Perceptrons critique (1969) |
| Revival | 1986-2006 | Backpropagation | Rumelhart, Hinton, Williams |
| Deep Learning | 2006-2017 | Deep architectures | GPU training, ImageNet |
| Foundation Models | 2017-now | Scale + attention | Transformers, GPT, BERT |

### Inflection Points
- **1986**: Backpropagation popularized → Neural networks become trainable
- **2012**: AlexNet wins ImageNet → Deep learning proven at scale
- **2017**: "Attention Is All You Need" → Transformer architecture
- **2020**: GPT-3 → Few-shot learning without fine-tuning

### Resurrection Stories
- **Neural Networks**: Dismissed in the 1970s → revived in 1986 with backpropagation, and again after 2006 as deep learning
- **Reinforcement Learning**: Academic curiosity → AlphaGo 2016
- **Perceptrons**: Single-layer perceptrons cannot represent XOR → multi-layer networks can

Phase 4: Structural Understanding

Objective: Build the "big picture" mental model of the field.

Actions:

•Identify Core Pillars:
  - What are the 3-5 fundamental ideas the field rests on?
  - What would collapse if removed?
•Map Sub-domain Architecture:
  - How does the field divide into areas?
  - What are the boundaries and overlaps?
•Extract Governing Principles:
  - What laws, theorems, or heuristics guide the field?
  - What are the "physics" of this domain?
•Identify Open Questions:
  - What does the field not yet understand?
  - Where are the active frontiers?

Output:

yaml

structure:
  core_pillars:
    - "Optimization": "Finding parameters that minimize loss"
    - "Generalization": "Performance on unseen data"
    - "Representation": "Learning useful features"
    - "Architecture": "Structure of the model"

  sub_domains:
    supervised:
      includes: ["Classification", "Regression"]
      key_assumption: "Labeled training data available"
    unsupervised:
      includes: ["Clustering", "Dimensionality Reduction", "Generative"]
      key_assumption: "Find structure without labels"
    reinforcement:
      includes: ["Policy Learning", "Value Functions", "Exploration"]
      key_assumption: "Learn from interaction with environment"

  governing_principles:
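    - "No Free Lunch": "No algorithm is best across all possible problems"
    - "Bias-Variance Tradeoff": "Fundamental tension in learning"
    - "Occam's Razor": "Prefer the simpler model when both explain the data equally well"
    - "Universal Approximation": "A sufficiently wide network can approximate any continuous function on a compact domain"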

  open_questions:
    - "Why do overparameterized models generalize?"
    - "How to achieve robust out-of-distribution performance?"
    - "What is the right inductive bias for reasoning?"

Phase 5: Misconception Clearing

Objective: Identify and correct common misunderstandings.

Actions:

•Catalog Beginner Misconceptions:
  - What do newcomers typically get wrong?
  - What intuitions from other fields mislead?
•Identify Expert Blind Spots:
  - What do experts assume everyone knows?
  - What jargon obscures understanding?
•Debunk Persistent Myths:
  - What false beliefs persist despite evidence?
  - What oversimplifications are dangerous?

Output:

yaml

misconceptions:
  beginner:
    - myth: "More data always improves performance"
      reality: "Returns diminish, and noisy or mislabeled data can hurt; quality often matters as much as quantity"

    - myth: "Deep learning requires massive datasets"
      reality: "Transfer learning, augmentation, and pretraining reduce data needs"

    - myth: "Higher accuracy means better model"
      reality: "Depends on the task; calibration, fairness, and robustness also matter"

  expert_blind_spots:
    - assumption: "Everyone knows what a tensor is"
      clarification: "In ML frameworks, an n-dimensional array; in mathematics, a multilinear object with defined transformation rules"

    - jargon: "The model is overfitting"
      clarification: "Training performance far exceeds test performance; the model memorizes rather than generalizes"

  persistent_myths:
    - myth: "Neural networks are black boxes"
      reality: "Interpretability methods exist: attention analysis, gradient-based attribution, probing"

    - myth: "AI will be generally intelligent soon"
      reality: "Current systems are narrow; AGI timeline highly uncertain"
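
The myth/reality records above map naturally onto the one-line "❌ myth → reality" form used in report summaries. A small Python sketch of that rendering follows; the function name `render_misconceptions` is hypothetical, and it assumes only the `beginner` and `persistent_myths` groups carry `myth`/`reality` keys (the `expert_blind_spots` group uses different keys and is skipped here).

```python
def render_misconceptions(misconceptions):
    """Turn myth/reality records into one-line report entries."""
    lines = []
    for group in ("beginner", "persistent_myths"):
        for item in misconceptions.get(group, []):
            lines.append(f'❌ "{item["myth"]}" → {item["reality"]}')
    return lines


# Sample data copied from the Phase 5 structure; blind spots are ignored.
misconceptions = {
    "beginner": [
        {"myth": "Deep learning requires massive datasets",
         "reality": "Transfer learning, augmentation, and pretraining reduce data needs"},
    ],
    "expert_blind_spots": [
        {"jargon": "The model is overfitting",
         "clarification": "Training performance >> test performance"},
    ],
}
for line in render_misconceptions(misconceptions):
    print(line)
```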

Output Formats

Format A: Foundation Report (Default)

markdown

# [Domain] Foundation Report

## Quick Orientation
{3-4 sentences positioning this field}

## Core Pillars
{The 3-5 fundamental ideas}

## Concept Deep Dives
{Detailed meaning exploration for key concepts}

## How It Connects
{Connectivity map: prerequisites, synergies, tensions}

## Historical Arc
{Timeline with eras and inflection points}

## Mental Models
{Frameworks for thinking about this domain}

## Common Misconceptions
{What to unlearn}

## Before You Continue
{Prerequisites checklist}
{Recommended next steps}

Format B: Visual Concept Map

Mermaid diagrams showing:

•Prerequisite chains
•Era timeline
•Sub-domain relationships
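
A prerequisite chain can be turned into Mermaid text mechanically. The sketch below is a hypothetical Python helper (`to_mermaid` is not part of the skill) that emits a top-down `graph TD` flowchart; it assumes concept names contain no double quotes, since those would break the node labels.

```python
def to_mermaid(requires):
    """Render a concept -> prerequisites mapping as Mermaid flowchart text."""
    ids = {}

    def node_id(name):
        # Node ids are short generated tokens; labels carry the real names.
        if name not in ids:
            ids[name] = f"n{len(ids)}"
        return ids[name]

    lines = ["graph TD"]
    for concept, prereqs in requires.items():
        for prereq in prereqs:
            # Arrow points from the prerequisite to the concept it unlocks.
            lines.append(
                f'    {node_id(prereq)}["{prereq}"] --> {node_id(concept)}["{concept}"]'
            )
    return "\n".join(lines)


print(to_mermaid({"Neural Network": ["Linear Algebra", "Calculus"]}))
```

Generating ids separately from labels sidesteps problems with spaces and punctuation in concept names.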

Format C: Study Guide

Structured learning path with:

•Ordered concept sequence
•Checkpoint questions
•Practical exercises
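
An ordered concept sequence is, in effect, a topological sort of the prerequisite graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the sample data and the checkpoint wording are illustrative assumptions, and `study_sequence` and `study_guide` are hypothetical names.

```python
from graphlib import TopologicalSorter

# concept -> prerequisites (illustrative data, not output from the skill)
requires = {
    "Markov Chains": ["Probability"],
    "Policy": ["Markov Chains"],
    "Value Functions": ["Markov Chains", "Dynamic Programming"],
    "Actor-Critic": ["Policy", "Value Functions"],
}


def study_sequence(requires):
    """Order concepts so every prerequisite appears before what needs it."""
    return list(TopologicalSorter(requires).static_order())


def study_guide(requires):
    """Pair each concept in the sequence with a generic checkpoint prompt."""
    lines = []
    for step, concept in enumerate(study_sequence(requires), start=1):
        lines.append(f"{step}. {concept}")
        lines.append(f"   Checkpoint: explain {concept} in your own words.")
    return "\n".join(lines)


print(study_guide(requires))
```

If the prerequisite data contains a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful signal that two concepts have been marked as requiring each other.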

Example Session

Input: "Build foundation for reinforcement learning"

Output Summary:

markdown

# Reinforcement Learning Foundation

## Quick Orientation
RL is learning through interaction: an agent takes actions in an environment,
receives rewards, and learns to maximize cumulative reward. Unlike supervised
learning (given correct answers) or unsupervised (find patterns), RL discovers
behavior through trial and error.

## Core Pillars
1. **Agent-Environment Loop**: State → Action → Reward → Next State
2. **Value Functions**: Predicting future cumulative reward
3. **Policy**: Mapping states to actions
4. **Exploration vs Exploitation**: Try new vs use known

## Deep Meaning: "Reward"
- Literal: Scalar signal from environment
- Connotative: "What we want the agent to do"
- Deeper: Reward shaping is hard; sparse rewards make credit assignment difficult
- Historical: From behaviorist psychology (Skinner, operant conditioning)

## Connectivity
Prerequisites: Probability, Markov Chains, Dynamic Programming
Synergies: Value + Policy → Actor-Critic
Tensions: Exploration ↔ Exploitation, Sample Efficiency ↔ Asymptotic Performance

## Historical Arc
- 1950s: Dynamic programming (Bellman)
- 1989: Q-learning (Watkins)
- 1992: TD-Gammon (Tesauro) - backgammon via self-play
- 2013: DQN (Mnih et al.) - Atari from pixels
- 2016: AlphaGo - RL combined with tree search defeats Go champion Lee Sedol
- 2020+: RLHF - RL for language model alignment

## Misconceptions
❌ "Reward must be designed perfectly" → Reward shaping and inverse RL help
❌ "RL is just trial and error" → Planning and model-based methods exist
❌ "RL needs millions of samples" → Offline RL and model-based methods reduce this

Integration with Other Skills

Flow	Description
domain-vocab → foundation	Concepts identified → now understand deeply
foundation → trace	Understand field → trace specific concept origins
foundation → frontier	Understand history → see where it's heading
foundation → deep-dive	Optional: combine with paper genealogy

Error Handling

Situation	Recovery
Domain too broad	Focus on sub-domain, ask user preference
Insufficient historical data	Note gaps, focus on structural understanding
Conflicting sources	Present multiple perspectives with context
Highly interdisciplinary	Map connections to source fields