Context Engineering
Overview
context-engineering provides systematic strategies for optimizing Claude Code context window usage. It helps you monitor token consumption, reduce context load, design context-efficient skills, and apply proven optimization patterns.
Purpose: Maximize Claude Code effectiveness while managing token costs and maintaining conversation quality
The 5 Context Optimization Operations:
- •Monitor Context Usage - Track token consumption, identify heavy consumers
- •Reduce Context Load - Remove stale content, minimize loaded files
- •Optimize Skill Design - Progressive disclosure, efficient reference loading
- •Separate Planning/Execution - Keep execution context clean
- •File-Based Strategies - Externalize large data (95% token savings)
Key Benefits:
- •29-39% performance improvement with context editing strategies
- •95% token savings using file-based approaches for large data
- •Sustained quality in multi-turn conversations
- •Cost reduction through efficient token usage
- •Prevention of context overflow and conversation drift
Context Window Sizes (2025):
- •Sonnet 4/4.5: 200k tokens (standard), 500k-1M (beta for Tier 4)
- •Auto-compaction: Triggers around 80% usage (~160k for 200k window)
When to Use
Use context-engineering when:
- •Approaching Context Limits - Context usage >60-70%, need to optimize before hitting limits
- •Building Large Skills - Creating skills with extensive documentation, need efficient loading strategies
- •Token Cost Management - Reducing API costs through optimization
- •Multi-Turn Conversations - Maintaining coherence across extended sessions
- •Skill Design Phase - Planning context-efficient architecture from start
- •Performance Optimization - Improving response quality and latency
- •Conversation Quality - Preventing drift and maintaining focus
- •MCP Integration - Managing context from Model Context Protocol servers
- •Large Data Handling - Working with extensive datasets or outputs
Prerequisites
- •Understanding of Claude context windows
- •Access to context monitoring (Claude Code
/contextcommand) - •Familiarity with progressive disclosure pattern
- •Skills under development or optimization
Operations
Operation 1: Monitor Context Usage
Purpose: Track token consumption, identify context-heavy elements, and detect optimization opportunities
When to Use This Operation:
- •Beginning of optimization effort (baseline measurement)
- •During development (continuous monitoring)
- •When approaching context limits (>60-70% usage)
- •Investigating performance issues
Process:
- •
Check Current Context Usage
codeUse /context command in Claude Code to view: - Total tokens used - Percentage of context window - Files loaded - Recent tool calls
- •
Identify Heavy Consumers
- •Large files loaded (>5,000 tokens each)
- •Extensive conversation history
- •Many tool call results
- •Large CLAUDE.md files
- •
Analyze Usage Patterns
- •Which files are loaded but rarely referenced?
- •Are all tool results still relevant?
- •Is conversation history necessary?
- •Are there duplicate or redundant contexts?
- •
Document Baseline
- •Record current token usage
- •Note context-heavy elements
- •Identify optimization targets
- •
Set Optimization Goals
- •Target token reduction (e.g., reduce by 20%)
- •Performance improvement targets
- •Quality maintenance requirements
Validation Checklist:
- • Current context usage measured (tokens and percentage)
- • Context-heavy elements identified (files, history, tool results)
- • Baseline documented for comparison
- • Optimization targets set
- • High-impact optimization opportunities noted
Outputs:
- •Current context usage metrics
- •List of context-heavy elements
- •Baseline measurement
- •Optimization targets
- •Priority optimization opportunities
Time Estimate: 10-15 minutes
Example:
Context Usage Analysis ====================== Current Usage: 145,000 tokens (72% of 200k window) Heavy Consumers: 1. CLAUDE.md: 25,000 tokens (17%) 2. Large skill files: 40,000 tokens (28%) - planning-architect/SKILL.md: 15,000 tokens - development-workflow/common-patterns.md: 12,000 tokens - review-multi/scoring-rubric.md: 8,000 tokens 3. Conversation history: 30,000 tokens (21%) 4. Tool call results: 20,000 tokens (14%) Optimization Opportunities: - Split large CLAUDE.md (25k → 10k target) - Use references/ loading instead of full files (40k → 15k) - Clear old tool results (20k → 5k) Target: Reduce to ~100k tokens (50% of window, 31% reduction)
Operation 2: Reduce Context Load
Purpose: Remove stale content, minimize loaded files, and reduce token consumption
When to Use This Operation:
- •Context usage >70% (approaching limits)
- •Performance degradation noticed
- •Before major operations (clear space)
- •Periodic maintenance (every few hours)
Process:
- •
Remove Stale Tool Results
- •Identify old tool call results no longer needed
- •Tool results from exploratory work
- •Superseded information
- •
Minimize File Loading
- •Only load files actively needed
- •Use Grep instead of Read for searching
- •Load specific sections, not entire files
- •Unload files when done with them
- •
Optimize Conversation History
- •Context editing auto-clears stale content
- •Summarize long conversations if needed
- •Start fresh session for new major tasks
- •
Reduce CLAUDE.md Size
- •Keep only essential long-term instructions
- •Move project-specific details to separate files
- •Use CLAUDE.local.md for temporary preferences
- •Target: <5,000 tokens for CLAUDE.md
- •
Apply Progressive Loading
- •Load overview/index files first
- •Load detailed references only when needed
- •Use skill references/ on-demand loading
Validation Checklist:
- • Stale tool results cleared
- • Only necessary files loaded
- • CLAUDE.md size optimized (<5,000 tokens if possible)
- • Progressive loading applied where relevant
- • Context usage reduced measurably
Outputs:
- •Reduced token count
- •Cleaner context window
- •List of removed/optimized elements
- •New context usage measurement
Time Estimate: 15-30 minutes
Example Reduction:
Before Optimization: 145,000 tokens (72%) Actions Taken: 1. Cleared 50 old tool results: -15,000 tokens 2. Unloaded 3 large files no longer needed: -18,000 tokens 3. Optimized CLAUDE.md (split to CLAUDE.local.md): -12,000 tokens 4. Used references/ loading instead of full files: -25,000 tokens After Optimization: 75,000 tokens (37%) Reduction: 70,000 tokens (48% reduction) Quality Impact: None - relevant context maintained
Operation 3: Optimize Skill Design
Purpose: Design context-efficient skills using progressive disclosure, lazy loading, and token-aware architecture
When to Use This Operation:
- •Planning new skills (design for efficiency from start)
- •Refactoring existing skills (improve context efficiency)
- •Building large/complex skills (manage context proactively)
- •Creating skill ecosystems (coordinate context usage)
Process:
- •
Apply Progressive Disclosure
- •SKILL.md: Overview + essentials only (<1,200 lines, ~3,000-5,000 tokens)
- •references/: Detailed guides loaded on-demand (300-600 lines each, ~1,000-2,000 tokens)
- •scripts/: Automation loaded when needed
Token Impact: 70-80% reduction vs monolithic (5k vs 20k+ tokens)
- •
Design for Lazy Loading
- •Separate content into focused reference files
- •Each reference file covers one topic
- •Load specific reference, not entire skill
- •Example: Load
references/structure-review-guide.mdnot entire review-multi
- •
Optimize File Sizes
- •SKILL.md target: 800-1,200 lines (2,500-4,000 tokens)
- •Reference files: 300-600 lines (1,000-2,000 tokens each)
- •Keep files focused and concise
- •
Use Token-Efficient Formats
- •Tables instead of prose (higher information density)
- •Lists instead of paragraphs (more scannable)
- •Code blocks for examples (clear and concise)
- •Quick Reference sections (high-density lookup)
- •
Consider Context Budget
- •Estimate token usage for skill
- •Simple skill: 3,000-8,000 tokens total
- •Medium skill: 10,000-20,000 tokens total
- •Complex skill: 25,000-40,000 tokens total
- •Design within budget
Validation Checklist:
- • Progressive disclosure applied (SKILL.md + references/)
- • SKILL.md <1,200 lines (~4,000 tokens or less)
- • Reference files 300-600 lines each
- • Files focused on single topics (lazy loadable)
- • Token-efficient formats used (tables, lists, code blocks)
- • Estimated total tokens within budget
- • Skill can be partially loaded (references/ on-demand)
Outputs:
- •Context-efficient skill design
- •Token budget estimate
- •Progressive disclosure plan
- •Reference file organization
Time Estimate: 20-40 minutes (during planning phase)
Example:
Skill Design: api-integration Token Budget Analysis: - SKILL.md: 900 lines → ~3,000 tokens - references/ (3 files): - api-guide.md: 400 lines → ~1,300 tokens - auth-patterns.md: 350 lines → ~1,200 tokens - examples.md: 300 lines → ~1,000 tokens - README.md: 300 lines → ~1,000 tokens Total if all loaded: ~7,500 tokens Typical usage: SKILL.md only → 3,000 tokens (60% savings) With 1 reference: 3,000 + 1,200 → 4,200 tokens (44% savings) Progressive Disclosure Impact: - Monolithic (all in SKILL.md): ~7,500 tokens always loaded - Progressive (SKILL.md + on-demand refs): 3,000-4,500 tokens typical - Token Savings: 40-60% depending on usage Design: ✅ Context-efficient with progressive disclosure
Operation 4: Separate Planning from Execution
Purpose: Keep execution context clean by separating exploratory planning from focused implementation
When to Use This Operation:
- •Starting complex development work
- •Context getting cluttered with exploration
- •Need clean context for implementation
- •Before critical/focused work
Process:
- •
Planning Phase (Separate Session)
- •Broad codebase exploration
- •Research and pattern discovery
- •Architecture decisions
- •Task breakdown
- •Output: Plan documents, task lists
Characteristics: High context usage, exploratory, broad
- •
Execution Phase (Fresh Session)
- •Load plan documents (not exploration history)
- •Focused implementation
- •Specific file operations
- •Minimal context bloat
Characteristics: Clean context, focused, efficient
- •
Session Transition
- •End planning session when plan complete
- •Save planning artifacts (plans, task lists, decisions)
- •Start new session for execution
- •Load only: plan docs, CLAUDE.md, immediate dependencies
- •
Maintain Clean Execution Context
- •Don't re-explore during execution
- •Follow plan, don't re-research
- •Load files as needed, unload when done
- •Keep focus on implementation
Validation Checklist:
- • Planning and execution in separate sessions (when appropriate)
- • Planning artifacts saved and documented
- • Execution session starts with clean context
- • Only plan docs and essentials loaded for execution
- • No re-exploration during execution
- • Context stays focused on current task
Outputs:
- •Clean execution context
- •Focused implementation
- •Reduced context bloat
- •Better performance and quality
Time Estimate: Planning decision (0-5 min), session management (as needed)
Example:
Planning Session (Context: 150k tokens, 75% usage): - Explored 20 files for research - Analyzed patterns across codebase - Made architecture decisions - Created detailed plan - Output: skill-plan.md (comprehensive) [End session, save plan] Execution Session (Context: 30k tokens, 15% usage): - Load: CLAUDE.md + skill-plan.md + development-workflow - Context: Clean, focused, 30k tokens - Implementation: Follow plan, build skill - Load references as needed (not all at once) Result: 80% context reduction (150k → 30k) Quality: Higher (clean context, focused work)
Research Finding: "Separating planning from execution keeps implementation context clean" - confirmed by 2025 best practices
Operation 5: File-Based Optimization
Purpose: Externalize large data to temporary files for on-demand analysis, achieving 95% token savings
When to Use This Operation:
- •Handling large data sets (>5,000 tokens)
- •Processing extensive outputs (logs, reports)
- •Working with large MCP responses
- •Managing generated content
Process:
- •
Identify Large Data
- •Data >5,000 tokens (typically >10,000 characters)
- •Repeated reference to same large content
- •Extensive generated outputs
- •Large MCP server responses
- •
Externalize to Files
bash# Save large data to temp file echo "large data here" > /tmp/large-data.txt
Instead of keeping in conversation context
- •
Reference File Instead of Content
markdownLarge dataset saved to: /tmp/analysis-data.json (50,000 tokens) To analyze: Read /tmp/analysis-data.json when needed
Token Impact: 50,000 tokens → ~500 tokens (95% reduction)
- •
Load On-Demand
- •Read file only when specific analysis needed
- •Process in chunks if necessary
- •Don't keep full content in context
- •
Clean Up Temp Files
- •Remove temp files when no longer needed
- •Don't accumulate unused files
Validation Checklist:
- • Large data identified (>5,000 tokens)
- • Data externalized to files
- • File paths documented (reference in conversation)
- • On-demand loading used (not full content in context)
- • Token savings measured (before/after)
- • Quality maintained (can access data when needed)
- • Temp files cleaned up when done
Outputs:
- •Externalized data files
- •File path references
- •Significant token savings (often 90-95%)
- •Maintained data accessibility
Time Estimate: 10-20 minutes (setup and management)
Example:
Scenario: Analyzing large log file (100,000 tokens) Before Optimization: - Full log in conversation context: 100,000 tokens - Context usage: 50% just for log data After Optimization: 1. Save log to /tmp/app-log.txt 2. Reference in context: "Log saved to /tmp/app-log.txt (100k tokens)" 3. Read specific sections when needed: - Read first 50 lines for overview - Grep for errors - Read relevant sections on-demand Token Usage: - Before: 100,000 tokens in context - After: ~500 tokens (file reference) + ~2,000 tokens (specific reads) - Savings: 97,500 tokens (97.5% reduction) Quality: Maintained - can still analyze log on-demand Access: Full log available when needed
Research Finding: "File-based approach achieves 95% token savings" - proven in 2025 optimization studies
Best Practices
1. Quality Over Quantity
Practice: Focus on relevant, high-quality context rather than loading everything
Rationale: Every piece should be current, accurate, and directly relevant to task
Application: Before loading file, ask: "Do I need this right now for current task?"
2. Progressive Disclosure Always
Practice: Design all skills with SKILL.md + references/ pattern
Rationale: 70-80% token reduction vs monolithic design
Application: SKILL.md <1,200 lines, details in references/ loaded on-demand
3. Monitor Regularly
Practice: Check context usage periodically, especially in long sessions
Rationale: Auto-compaction triggers at ~80%, but proactive monitoring prevents drift
Application: Check /context every 30-60 minutes in active development
4. File-Based for Large Data
Practice: Externalize data >5,000 tokens to files
Rationale: 95% token savings while maintaining accessibility
Application: Save to /tmp/, reference file path, load on-demand
5. Separate Planning from Execution
Practice: Plan in one session, execute in clean session with plan artifacts
Rationale: Keeps execution context focused, prevents exploratory noise
Application: When planning >1 hour, start fresh session for implementation
6. Clean Context Before Critical Work
Practice: Reduce context load before important/complex operations
Rationale: Clean context improves quality and performance
Application: Before complex implementation, clear unnecessary context
7. Use Appropriate Tools
Practice: Choose tools that minimize context usage
Rationale: Some tools add more context than others
Application:
- •Grep instead of Read for searching (doesn't load full file)
- •Glob for finding files (doesn't load content)
- •Task agents for exploration (separate context)
8. Optimize CLAUDE.md
Practice: Keep CLAUDE.md concise (<5,000 tokens), split if needed
Rationale: CLAUDE.md loaded every session, large files waste context
Application:
- •Essential standards in CLAUDE.md (project level)
- •Temporary preferences in CLAUDE.local.md
- •Specific domain knowledge in separate files loaded as needed
Context Budgets for Skills
Simple Skills (Total: 5,000-10,000 tokens)
Structure:
- •SKILL.md only or SKILL.md + 1-2 small references
- •No scripts or minimal automation
- •~400-800 lines total
Token Breakdown:
- •SKILL.md: 3,000-4,000 tokens
- •References (if any): 1,000-2,000 tokens each
- •README: 1,000 tokens
Example: format-validator, simple helpers
Medium Skills (Total: 15,000-30,000 tokens)
Structure:
- •SKILL.md + 3-5 references + scripts
- •~1,500-3,000 lines total
Token Breakdown:
- •SKILL.md: 3,000-5,000 tokens
- •References: 4-6 files × 1,500 tokens = 6,000-9,000 tokens
- •Scripts: 2-3 files × 1,000 tokens = 2,000-3,000 tokens
- •README: 1,000-1,500 tokens
Progressive Loading: Load SKILL.md (3-5k) + specific reference when needed (+1.5k) = 4.5-6.5k typical
Example: prompt-builder, skill-researcher
Complex Skills (Total: 40,000-60,000 tokens)
Structure:
- •SKILL.md + 7-10 references + 4+ scripts
- •~4,000-7,000 lines total
Token Breakdown:
- •SKILL.md: 4,000-6,000 tokens
- •References: 7-10 files × 2,000 tokens = 14,000-20,000 tokens
- •Scripts: 4-6 files × 1,500 tokens = 6,000-9,000 tokens
- •README: 1,500-2,000 tokens
Progressive Loading: Load SKILL.md (4-6k) + 1-2 references as needed (+2-4k) = 6-10k typical
Example: review-multi, testing-validator
Key: Even complex skills only load 6-10k tokens typically (not full 40-60k)
Common Mistakes
Mistake 1: Loading Everything Upfront
Symptom: Context quickly fills with all skill files
Cause: Reading all files instead of progressive loading
Fix: Load SKILL.md first, load references/ only when needed for specific operations
Prevention: Follow progressive disclosure pattern
Mistake 2: Not Monitoring Context
Symptom: Unexpected context overflow, performance degradation
Cause: No visibility into token usage
Fix: Check /context regularly, monitor usage patterns
Prevention: Check context every 30-60 min in active sessions
Mistake 3: Large Monolithic CLAUDE.md
Symptom: Every session starts with 20k-50k tokens used
Cause: Putting everything in CLAUDE.md
Fix: Split CLAUDE.md - essentials only, separate files for detailed knowledge
Prevention: Keep CLAUDE.md <5,000 tokens, use multiple files
Mistake 4: Keeping Stale Tool Results
Symptom: Context bloated with old exploratory results
Cause: Not clearing old tool calls
Fix: Context editing auto-clears, but can manually manage by starting fresh sessions
Prevention: Fresh session for major transitions (planning → execution)
Mistake 5: Not Using File-Based Strategies
Symptom: Large data sets consuming 30-50% of context
Cause: Keeping large outputs in conversation
Fix: Save to /tmp/ files, reference file path, load on-demand
Prevention: Any data >5,000 tokens → externalize to file
Mistake 6: Ignoring Progressive Disclosure
Symptom: Skills with 2,000+ line SKILL.md files
Cause: Not using references/ for detailed content
Fix: Extract detailed content to references/, keep SKILL.md as overview
Prevention: Design with progressive disclosure from start (use planning-architect)
Quick Reference
Context Window Limits (2025)
| Model | Standard | Beta (Tier 4) | Auto-Compact |
|---|---|---|---|
| Sonnet 4/4.5 | 200k tokens | 500k-1M tokens | ~80% (~160k for 200k) |
Token Savings Strategies
| Strategy | Token Savings | Application |
|---|---|---|
| Progressive Disclosure | 70-80% | SKILL.md + references/ vs monolithic |
| File-Based Externalization | 95% | Large data >5k tokens to /tmp/ files |
| Context Editing | 29-39% | Auto-clears stale content |
| Lazy Loading | 60-70% | Load references on-demand vs all upfront |
| Optimized CLAUDE.md | Variable | Keep <5k tokens vs 20-50k bloat |
Skill Token Budgets
| Skill Complexity | Total Tokens | Typical Load | Progressive Load |
|---|---|---|---|
| Simple | 5k-10k | 3k-4k | SKILL.md only |
| Medium | 15k-30k | 4k-6k | SKILL.md + 1 reference |
| Complex | 40k-60k | 6k-10k | SKILL.md + 2 references |
Context Usage Guidelines
| Usage % | Status | Action |
|---|---|---|
| <50% | ✅ Healthy | Normal operation |
| 50-70% | ⚠️ Monitor | Check periodically, plan optimization |
| 70-80% | ⚠️ Optimize | Reduce context load soon |
| >80% | ❌ Critical | Immediate optimization needed (auto-compact triggers) |
Optimization Decision Tree
Is context >70%?
├─ Yes → Reduce immediately (Operation 2)
│ ├─ Clear stale tool results
│ ├─ Unload unnecessary files
│ └─ Start fresh session if needed
│
└─ No → Preventive optimization
├─ Is data >5k tokens? → File-based (Operation 5)
├─ Building skills? → Progressive disclosure (Operation 3)
└─ Long session? → Consider planning/execution split (Operation 4)
Quick Optimization Actions
Immediate (When context >80%):
1. Check usage: /context command 2. Clear old tool results (context editing helps) 3. Start fresh session with essentials only 4. Load plan docs, not exploration history
Preventive (During development):
1. Design skills with progressive disclosure 2. Use file-based for large data (>5k tokens) 3. Monitor context every 30-60 min 4. Separate planning from execution (for complex work)
Common Commands
# Monitor context usage /context # Read specific lines (not full file) Read file_path --limit 50 # Search without loading (uses Grep, more efficient) Grep "pattern" path/ # Find files without loading content Glob "*.py" path/ # Externalize large data Bash: command > /tmp/output.txt # Then reference: "See /tmp/output.txt for results"
Token Estimation
Quick Estimation:
- •1 line of code/text ≈ 3-4 tokens (average)
- •1,000 lines ≈ 3,000-4,000 tokens
- •Dense prose: 3-3.5 tokens/line
- •Code with comments: 2.5-3 tokens/line
- •Tables/lists: 2-2.5 tokens/line (more efficient)
For More Information
- •Context monitoring: references/context-monitoring-guide.md
- •Reduction strategies: references/reduction-strategies.md
- •Optimization patterns: references/optimization-patterns.md
- •Analysis script: scripts/analyze-context-usage.py
context-engineering helps you maximize Claude Code effectiveness through strategic context management, ensuring optimal performance, quality, and cost-efficiency throughout development.