Context Optimization

Context Optimization Techniques

Extend effective context capacity through strategic compression, masking, caching, and partitioning.

Prerequisites

•Understanding of context windows
•Familiarity with token economics

Instructions

Optimization Strategies

Compaction: When the context approaches the window limit, summarize it and reinitialize the session with that summary.

•Target 50-70% token reduction
•Keep quality degradation below 5%
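
The compaction step above can be sketched as follows. This is a minimal illustration, not a reference implementation: `estimate_tokens` uses a rough 4-characters-per-token heuristic, and `summarize` is a hypothetical callable standing in for whatever model call produces the summary.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude estimate assuming about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def count_tokens(messages: List[str]) -> int:
    """Total estimated tokens across all messages."""
    return sum(estimate_tokens(m) for m in messages)


def compact(messages: List[str], limit: int,
            summarize: Callable[[str], str],
            threshold: float = 0.8) -> List[str]:
    """Leave the context alone below the threshold; otherwise replace it
    with a single summary message (the reinitialized context)."""
    if count_tokens(messages) < threshold * limit:
        return messages
    summary = summarize("\n".join(messages))
    return [f"Summary of earlier context: {summary}"]


def token_reduction(before: int, after: int) -> float:
    """Fraction of tokens removed, for checking against the 50-70% target."""
    return 1 - after / before
```

Measuring `token_reduction(count_tokens(old), count_tokens(new))` after each compaction gives a number to compare against the target; quality degradation needs a separate, task-specific evaluation.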

Observation Masking: Replace verbose tool outputs with references.

•Tool outputs can be 80%+ of token usage
•Store full output, return summary + reference
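
A minimal sketch of the store-and-reference pattern, assuming an in-memory dict as the store and a content hash as the reference. The names `mask_observation` and `fetch_observation` are hypothetical, and the "summary" here is just the first line of the output; a real system might summarize with a model instead.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store; a real system might use a file or database.
_store: Dict[str, str] = {}


def mask_observation(output: str, max_chars: int = 200) -> Tuple[str, Optional[str]]:
    """Return (text_for_context, reference). Short outputs pass through
    unchanged with reference None; long ones are stored and replaced."""
    if len(output) <= max_chars:
        return output, None
    ref = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
    _store[ref] = output  # keep the full output retrievable
    lines = output.splitlines()
    first_line = lines[0][:80] if lines else ""
    return f"[masked: ref={ref}, {len(output)} chars] {first_line}", ref


def fetch_observation(ref: str) -> str:
    """Recover the full output when the agent needs it again."""
    return _store[ref]
```

The agent sees only the short placeholder, and a retrieval tool can call `fetch_observation` with the reference if the details become relevant later.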

KV-Cache Optimization: Reuse cached computations for common prefixes.

•Place stable elements first (system prompt, tool definitions)
•Avoid dynamic content such as timestamps in the prefix, since any change invalidates the cache from that point onward

Context Partitioning: Split work across sub-agents with isolated contexts.

•Each sub-agent gets fresh context
•Aggregate results at coordination layer
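
As a rough sketch of partitioning, the coordinator below builds a fresh context for each task and collects the results. `worker` is a hypothetical callable standing in for a sub-agent invocation; nothing here is tied to a particular agent framework.

```python
from typing import Callable, List


def run_partitioned(tasks: List[str], system_prompt: str,
                    worker: Callable[[List[str]], str]) -> List[str]:
    """Run each task in its own isolated context and collect the results."""
    results = []
    for task in tasks:
        # Fresh context per sub-agent: no history from other tasks leaks in.
        fresh_context = [system_prompt, task]
        results.append(worker(fresh_context))
    return results


def aggregate(results: List[str]) -> str:
    """Coordination layer: merge sub-agent results into one compact report."""
    return "\n".join(f"- {r}" for r in results)
```

Only the aggregated report enters the coordinator's context, so the intermediate work of each sub-agent never counts against it.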

Compaction Priority

•Tool outputs → replace with summaries
•Old turns → summarize early conversation
•Retrieved docs → summarize if more recent versions exist
•Never compress → system prompt
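
One way to encode this priority list, as a sketch: segments are tagged with a kind, and anything not in the priority table (including the system prompt) is never offered for compression. The `Segment` type and the kind labels are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

# Lower number = compress first. The system prompt is deliberately absent.
PRIORITY = {"tool_output": 0, "old_turn": 1, "retrieved_doc": 2}


@dataclass
class Segment:
    kind: str  # e.g. "system_prompt", "tool_output", "old_turn", "retrieved_doc"
    text: str


def compaction_order(segments: List[Segment]) -> List[Segment]:
    """Return compressible segments in the order they should be compacted."""
    candidates = [s for s in segments if s.kind in PRIORITY]
    # sorted() is stable, so segments of the same kind keep their original order.
    return sorted(candidates, key=lambda s: PRIORITY[s.kind])
```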

Observation Masking Strategy

Never mask: Critical observations, most recent turn, active reasoning

Consider masking: Outputs from 3+ turns ago, verbose outputs with extractable key points

Always mask: Repeated outputs, boilerplate, already-summarized content
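
The three tiers can be expressed as a small decision function. This is a sketch under two assumptions not stated in the rules above: "never" takes precedence over "always" when both apply, and an observation matching no rule is kept as-is. The field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    turns_ago: int                 # 0 means the most recent turn
    critical: bool = False
    in_active_reasoning: bool = False
    repeated: bool = False
    boilerplate: bool = False
    already_summarized: bool = False
    verbose: bool = False


def masking_decision(obs: Observation) -> str:
    """Classify an observation as 'never', 'always', 'consider', or 'keep'."""
    # Assumption: protection rules win over every masking rule.
    if obs.critical or obs.turns_ago == 0 or obs.in_active_reasoning:
        return "never"
    if obs.repeated or obs.boilerplate or obs.already_summarized:
        return "always"
    if obs.turns_ago >= 3 or obs.verbose:
        return "consider"
    return "keep"
```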

Cache-Friendly Ordering

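python

# Placeholder values for illustration; substitute your own components.
system_prompt, tool_definitions = "You are an agent.", "[tool schemas]"
reused_templates, unique_content = "[few-shot examples]", "[user request]"
context = [system_prompt, tool_definitions]  # Stable, cacheable
context += [reused_templates]                 # Reusable
context += [unique_content]                   # Unique per request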
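
To see why ordering matters, the sketch below counts how many leading segments two requests share. Real prefix caches operate on tokens rather than list items, so this is a segment-level approximation, and `shared_prefix_len` is a hypothetical helper.

```python
from typing import List


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading segments that are identical in both contexts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


stable = ["SYSTEM PROMPT", "TOOL DEFINITIONS"]

# Stable elements first: both requests share the whole stable prefix.
good_1 = stable + ["question 1"]
good_2 = stable + ["question 2"]

# A timestamp at the front changes every request, so nothing is shared.
bad_1 = ["time: 10:00"] + stable + ["question 1"]
bad_2 = ["time: 10:01"] + stable + ["question 2"]
```

Here `shared_prefix_len(good_1, good_2)` is 2 while `shared_prefix_len(bad_1, bad_2)` is 0, which mirrors how a single dynamic value early in the prompt can prevent reuse of everything after it.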

Guidelines

•Measure before optimizing: know your current token usage and quality baseline
•Apply compaction before masking when possible
•Design for cache stability with consistent prompts
•Partition before context becomes problematic
•Balance token savings against quality preservation

Notes

•Context quality matters more than quantity
•Optimization can double or triple effective capacity
•Monitor token utilization and trigger optimization at 70-80% of the context window
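
A minimal sketch of the utilization check, assuming token counts are already available from your tokenizer or API usage data. The default trigger of 0.75 is simply a midpoint of the 70-80% range, not a recommended value.

```python
def utilization(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window currently in use."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens


def should_optimize(used_tokens: int, window_tokens: int,
                    trigger: float = 0.75) -> bool:
    """True once utilization reaches the trigger threshold."""
    return utilization(used_tokens, window_tokens) >= trigger
```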

Source: muratcankoylan/Agent-Skills-for-Context-Engineering