The Three-Layer Agent Stack
Overview
A framework for building effective AI agents by synchronizing innovation across three distinct layers: Model, API, and Harness. Success requires tight integration across the layers rather than treating the model as a black box.
Core principle: A feature like "compaction", which keeps long-running tasks within the context window, requires coordinated changes across all three layers.
The Stack
    ┌─────────────────────────────────────────────────────────────────┐
    │  LAYER 3: HARNESS / PRODUCT LAYER                               │
    │  ─────────────────────────────────────────────────────────────  │
    │  The environment that executes actions and provides context     │
    │  • VS Code / IDE integration                                    │
    │  • Terminal / Shell access                                      │
    │  • Sandbox / Secure execution environment                       │
    ├─────────────────────────────────────────────────────────────────┤
    │  LAYER 2: API LAYER                                             │
    │  ─────────────────────────────────────────────────────────────  │
    │  Interface handling state, context windows, and orchestration   │
    │  • Context management / Compaction                              │
    │  • State handoff between sessions                               │
    │  • Tool routing and formatting                                  │
    ├─────────────────────────────────────────────────────────────────┤
    │  LAYER 1: MODEL LAYER                                           │
    │  ─────────────────────────────────────────────────────────────  │
    │  Foundation model providing reasoning and intelligence          │
    │  • Code generation / Reasoning                                  │
    │  • Summarization for compaction                                 │
    │  • Environment-specific training                                │
    └─────────────────────────────────────────────────────────────────┘
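To make the division of labor concrete, here is a minimal Python sketch of one agent turn passing through the three layers. Everything in it is hypothetical and chosen for illustration: `ToolCall`, `stub_model`, `fake_shell`, `HARNESS_TOOLS`, `route_tool_call`, and `run_turn` are not part of any real API, and the "shell" tool only echoes its input instead of executing anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToolCall:
    name: str      # which harness tool the model wants to use
    argument: str  # a single string argument, kept simple for the sketch


# Layer 1 (Model): a stand-in that always asks to run a shell command.
def stub_model(prompt: str) -> ToolCall:
    return ToolCall(name="shell", argument=f"echo {prompt}")


# Layer 3 (Harness): owns the tools. This fake shell does not execute
# anything; a real harness would run the command inside a sandbox.
def fake_shell(command: str) -> str:
    return f"(sandboxed) ran: {command}"


HARNESS_TOOLS: Dict[str, Callable[[str], str]] = {"shell": fake_shell}


# Layer 2 (API): routes the model's tool call to the harness and formats
# the result so it can be appended to the model's context.
def route_tool_call(call: ToolCall, tools: Dict[str, Callable[[str], str]]) -> str:
    if call.name not in tools:
        return f"[tool_error] unknown tool: {call.name}"
    return f"[tool_result:{call.name}] {tools[call.name](call.argument)}"


def run_turn(prompt: str) -> str:
    """One pass through the stack: model -> API routing -> harness tool."""
    call = stub_model(prompt)
    return route_tool_call(call, HARNESS_TOOLS)
```

Calling `run_turn("hello")` goes through all three layers once. The point of the sketch is that the routing function has to agree with both sides: it must know the tool names the model emits and the tools the harness actually exposes.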
Key Principles
| Principle | Description |
|---|---|
| Full-Stack Iteration | Changes often need Model + API + Harness together |
| Harness Specificity | Models perform best when trained for specific environments |
| Feedback Loops | Product usage (Harness) must inform model training |
| Safety Sandboxing | Harness provides secure environment for code execution |
Common Mistakes
- Model-only optimization: Changing the model without adapting the harness
- Generic API assumptions: Assuming a generic API supports agentic behaviors
- No feedback loop: The harness doesn't feed back into model training
Real-World Example
Implementing "compaction" so that Codex can run for 24 hours:
- Model: Must understand summarization
- API: Must handle the context handoff
- Harness: Must prepare and format the payload
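The three responsibilities above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how Codex implements compaction: the function names (`stub_summarize`, `prepare_payload`, `compact`) are hypothetical, the budget is counted in messages rather than tokens, and the "model" is a stub that only reports how many messages it would have summarized.

```python
from typing import Callable, List, Tuple

Summarizer = Callable[[List[str]], str]


def stub_summarize(messages: List[str]) -> str:
    """Model layer stand-in: a real model would write an actual summary."""
    return f"[summary of {len(messages)} earlier messages]"


def prepare_payload(history: List[str], keep_recent: int) -> Tuple[List[str], List[str]]:
    """Harness: split history into messages to summarize and messages to keep verbatim."""
    cut = max(len(history) - max(keep_recent, 0), 0)
    return history[:cut], history[cut:]


def compact(
    history: List[str],
    budget: int,
    keep_recent: int,
    summarize: Summarizer,
) -> List[str]:
    """API: when history exceeds the budget, hand off a summary plus recent messages."""
    if len(history) <= budget:
        return history  # still fits, nothing to compact
    old, recent = prepare_payload(history, keep_recent)
    if not old:
        return history  # nothing old enough to summarize
    return [summarize(old)] + recent


history = [f"m{i}" for i in range(10)]
compacted = compact(history, budget=6, keep_recent=3, summarize=stub_summarize)
```

Here `compacted` holds one summary entry followed by the three most recent messages. Even in this toy version, each function depends on the other two: the summary format, the split point, and the handoff threshold have to be chosen together.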
Source: Alexander Embiricos (OpenAI Codex) via Lenny's Podcast