Memory Orchestration
Analyzes context management and memory systems.
Process
- Trace context assembly: how prompts are built from components
- Identify eviction policies: how context overflow is handled
- Map memory tiers: short-term (RAM) to long-term (DB)
- Analyze token management: counting, budgeting, truncation
Context Assembly Analysis
Standard Assembly Order
code
┌─────────────────────────────────────────┐
│ 1. System Prompt                        │
│    - Role definition                    │
│    - Behavioral guidelines              │
│    - Output format instructions         │
├─────────────────────────────────────────┤
│ 2. Retrieved Context / Memory           │
│    - Relevant past interactions         │
│    - Retrieved documents (RAG)          │
│    - User preferences                   │
├─────────────────────────────────────────┤
│ 3. Tool Definitions                     │
│    - Available tools and schemas        │
│    - Usage examples                     │
├─────────────────────────────────────────┤
│ 4. Conversation History                 │
│    - Previous turns (user/assistant)    │
│    - Prior tool calls and results       │
├─────────────────────────────────────────┤
│ 5. Current Input                        │
│    - User's current message             │
│    - Any attachments/context            │
├─────────────────────────────────────────┤
│ 6. Agent Scratchpad (Optional)          │
│    - Current thinking/planning          │
│    - Intermediate results               │
└─────────────────────────────────────────┘
Assembly Patterns
Template-Based
python
PROMPT_TEMPLATE = """
{system_prompt}
## Available Tools
{tool_descriptions}
## Conversation
{history}
## Current Request
{user_input}
"""
prompt = PROMPT_TEMPLATE.format(
    system_prompt=self.system_prompt,
    tool_descriptions=self._format_tools(),
    history=self._format_history(),
    user_input=message,
)
Message List (Chat API)
python
messages = [
    {"role": "system", "content": system_prompt},
    *self._get_history_messages(),
    {"role": "user", "content": user_input},
]
Programmatic Assembly
python
def build_prompt(self, user_input):
    # `user_input` avoids shadowing the built-in `input`
    builder = PromptBuilder()
    builder.add_system(self.system_prompt)
    builder.add_context(self.memory.retrieve(user_input))
    builder.add_tools(self.tools)
    builder.add_history(self.history, max_tokens=2000)
    builder.add_user(user_input)
    return builder.build()
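`PromptBuilder` is not defined anywhere in this document. The following is a minimal sketch of what such a builder might look like, assuming context and tools arrive as plain strings, history is a list of role/content dicts, and tokens are estimated at roughly four characters each. Every name in it is hypothetical.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


@dataclass
class PromptBuilder:
    """Hypothetical builder that emits sections in the standard assembly order."""
    system: str = ""
    context: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    history: list = field(default_factory=list)
    user: str = ""

    def add_system(self, text: str) -> None:
        self.system = text

    def add_context(self, snippets: list) -> None:
        self.context.extend(snippets)

    def add_tools(self, tool_descriptions: list) -> None:
        self.tools.extend(tool_descriptions)

    def add_history(self, messages: list, max_tokens: int) -> None:
        # Walk backwards so the most recent turns survive when over budget.
        kept, used = [], 0
        for msg in reversed(messages):
            cost = estimate_tokens(msg["content"])
            if used + cost > max_tokens:
                break
            kept.append(msg)
            used += cost
        self.history = list(reversed(kept))

    def add_user(self, text: str) -> None:
        self.user = text

    def build(self) -> list:
        # System prompt, retrieved context, and tool text share one system message.
        parts = [self.system, *self.context, *self.tools]
        system_content = "\n\n".join(part for part in parts if part)
        return [
            {"role": "system", "content": system_content},
            *self.history,
            {"role": "user", "content": self.user},
        ]
```

Folding context and tool descriptions into the system message is only one possible layout; frameworks that pass tools through a dedicated API parameter would omit them here.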
Eviction Policies
FIFO (First In, First Out)
python
def trim_history(self, max_messages: int):
    while len(self.history) > max_messages:
        self.history.pop(0)  # Remove oldest
Pros: Simple, predictable
Cons: May lose important early context
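Each `pop(0)` on a list shifts every remaining element. As an alternative sketch, assuming the history is only ever appended to, a `collections.deque` with `maxlen` gives the same FIFO eviction without an explicit trim step:

```python
from collections import deque

# A bounded deque silently drops the oldest item when a new one is appended.
history = deque(maxlen=3)
for i in range(5):
    history.append({"role": "user", "content": f"message {i}"})

print([m["content"] for m in history])
# ['message 2', 'message 3', 'message 4']
```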
Sliding Window
python
def get_context_window(self, max_tokens: int):
    window = []
    token_count = 0
    # Walk from newest to oldest until the budget is exhausted
    for msg in reversed(self.history):
        msg_tokens = count_tokens(msg)
        if token_count + msg_tokens > max_tokens:
            break
        window.insert(0, msg)
        token_count += msg_tokens
    return window
Pros: Token-aware, keeps recent
Cons: Still loses old context
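`count_tokens` is not defined in the snippet above. The standalone sketch below assumes a hypothetical estimator (about four characters per token plus a small per-message overhead) so the window logic can be run end to end. It also appends and reverses once rather than calling `insert(0, ...)` for every message.

```python
def count_tokens(msg: dict) -> int:
    """Hypothetical estimator: ~4 characters per token plus 4 tokens of overhead."""
    return len(msg["content"]) // 4 + 4


def get_context_window(history: list, max_tokens: int) -> list:
    window = []
    token_count = 0
    for msg in reversed(history):  # newest first
        msg_tokens = count_tokens(msg)
        if token_count + msg_tokens > max_tokens:
            break
        window.append(msg)
        token_count += msg_tokens
    window.reverse()  # restore chronological order
    return window


history = [
    {"role": "user", "content": "x" * 400},       # 104 estimated tokens
    {"role": "assistant", "content": "y" * 200},  # 54 estimated tokens
    {"role": "user", "content": "z" * 100},       # 29 estimated tokens
]
window = get_context_window(history, max_tokens=100)
print([m["role"] for m in window])
# ['assistant', 'user']
```

The two newest messages cost 83 estimated tokens together; adding the oldest would reach 187, so it is dropped.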
Summarization
python
def summarize_and_trim(self, max_tokens: int):
    if self.total_tokens < max_tokens:
        return
    # Summarize the older half of the conversation
    split = len(self.history) // 2
    old_messages = self.history[:split]
    summary = self.llm.summarize(old_messages)
    # Replace the older half with the summary
    self.history = [
        {"role": "system", "content": f"Previous conversation summary: {summary}"},
        *self.history[split:],
    ]
Pros: Preserves context semantically
Cons: Expensive (LLM call), lossy
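To exercise this policy without a model call, the summarizer can be passed in as a function. The sketch below is an illustration under that assumption: `fake_summarize` is a stand-in, not a real LLM client, and the token totals are supplied by the caller.

```python
from typing import Callable


def summarize_and_trim(
    history: list,
    total_tokens: int,
    max_tokens: int,
    summarize: Callable[[list], str],
) -> list:
    """Return history with its older half replaced by a summary when over budget."""
    if total_tokens < max_tokens:
        return history
    split = len(history) // 2
    summary = summarize(history[:split])
    return [
        {"role": "system", "content": f"Previous conversation summary: {summary}"},
        *history[split:],
    ]


def fake_summarize(messages: list) -> str:
    # Stand-in for an LLM call; a real summarizer would send `messages` to a model.
    return f"{len(messages)} earlier messages omitted"


history = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
trimmed = summarize_and_trim(history, total_tokens=900, max_tokens=500, summarize=fake_summarize)
print(len(trimmed), "|", trimmed[0]["content"])
# 4 | Previous conversation summary: 3 earlier messages omitted
```

When the policy fires repeatedly, the previous summary sits in the older half and gets summarized again, so detail degrades with each pass.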
Vector Store Swapping
python
def manage_context(self, current_input: str, max_tokens: int):
    # Move old messages to vector store
    if self.total_tokens > max_tokens:
        to_archive = self.history[:-10]
        self.vector_store.add(to_archive)
        self.history = self.history[-10:]
    # Retrieve relevant context
    relevant = self.vector_store.search(current_input, k=5)
    return self._build_prompt(relevant, self.history)
Pros: Scalable, relevance-based
Cons: Complex, retrieval quality matters
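The `vector_store` object above is left abstract. As a rough illustration of the `add`/`search` interface it implies, here is a toy in-memory store that ranks by bag-of-words cosine similarity. This is an assumption made for the sake of a runnable example: real systems would use learned embeddings and a dedicated vector database, and `ToyVectorStore` is a hypothetical name.

```python
import math
from collections import Counter


class ToyVectorStore:
    """Illustrative store using word-count vectors, not real embeddings."""

    def __init__(self):
        self._items = []  # list of (term-count vector, message)

    @staticmethod
    def _embed(text: str) -> Counter:
        return Counter(text.lower().split())

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def add(self, messages: list) -> None:
        for msg in messages:
            self._items.append((self._embed(msg["content"]), msg))

    def search(self, query: str, k: int = 5) -> list:
        q = self._embed(query)
        ranked = sorted(self._items, key=lambda item: self._cosine(q, item[0]), reverse=True)
        return [msg for _, msg in ranked[:k]]


store = ToyVectorStore()
store.add([
    {"role": "user", "content": "my favorite database is postgres"},
    {"role": "user", "content": "the weather is sunny today"},
])
top = store.search("which database do I prefer", k=1)
print(top[0]["content"])
# my favorite database is postgres
```

Exact word overlap is a weak proxy for relevance ("prefer" does not match "favorite" here), which is the retrieval-quality concern listed in the cons.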
Importance Scoring
python
def score_and_trim(self, max_tokens: int):
    scored = []
    for msg in self.history:
        score = self._compute_importance(msg)
        scored.append((score, msg))
    # Keep highest scoring until budget.
    # Sort on the score only: comparing the message dicts on a tie raises TypeError.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    kept = []
    tokens = 0
    for score, msg in scored:
        msg_tokens = count_tokens(msg)
        if tokens + msg_tokens > max_tokens:
            break
        kept.append(msg)
        tokens += msg_tokens
    # Restore chronological order
    self.history = sorted(kept, key=lambda m: m['timestamp'])
Pros: Keeps important context
Cons: Expensive to compute
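The scoring function `_compute_importance` is not specified here. One possible heuristic, sketched below purely as an assumption, combines a per-role weight, an exponential recency term, and an optional `pinned` flag. The weights, the half-life, and the `pinned` field are all hypothetical.

```python
# Assumed role weights; tune or replace for a real system.
ROLE_WEIGHT = {"system": 1.0, "user": 0.6, "tool": 0.5, "assistant": 0.4}


def compute_importance(msg: dict, newest_ts: float, half_life: float = 600.0) -> float:
    """Hypothetical score: role weight + recency (halves every `half_life` seconds) + pin bonus."""
    age = max(0.0, newest_ts - msg["timestamp"])
    recency = 0.5 ** (age / half_life)
    pinned = 1.0 if msg.get("pinned") else 0.0
    return ROLE_WEIGHT.get(msg["role"], 0.3) + recency + pinned


history = [
    {"role": "user", "content": "My API key rotates weekly.", "timestamp": 0.0, "pinned": True},
    {"role": "assistant", "content": "Noted.", "timestamp": 10.0},
    {"role": "user", "content": "What's next?", "timestamp": 1200.0},
]
newest = max(m["timestamp"] for m in history)
scores = [compute_importance(m, newest) for m in history]
ranked = sorted(zip(scores, range(len(history))), reverse=True)
print([index for _, index in ranked])
# [0, 2, 1]
```

The pinned early message outranks the newest turn, while the old unpinned acknowledgement falls to the bottom. A cheap heuristic like this avoids the cost of scoring with an LLM, at the price of cruder judgments.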
Memory Tier Mapping
code
┌─────────────────────────────────────────────────────┐
│                    MEMORY TIERS                     │
├─────────────────────────────────────────────────────┤
│ Tier 1: Working Memory (In-Prompt)                  │
│ ├── Current conversation turns                      │
│ ├── Active tool results                             │
│ └── Immediate scratchpad                            │
│ Latency: 0ms | Capacity: Context window             │
├─────────────────────────────────────────────────────┤
│ Tier 2: Session Memory (RAM)                        │
│ ├── Full conversation history                       │
│ ├── Session state                                   │
│ └── Cached retrievals                               │
│ Latency: <1ms | Capacity: GB                        │
├─────────────────────────────────────────────────────┤
│ Tier 3: Persistent Memory (Database)                │
│ ├── Vector store (semantic search)                  │
│ ├── SQL/Document store (structured)                 │
│ └── User profiles and preferences                   │
│ Latency: 10-100ms | Capacity: TB+                   │
└─────────────────────────────────────────────────────┘
Tier Promotion/Demotion
python
class MemoryManager:
    def on_turn_end(self, turn):
        # Tier 1 → Tier 2: Move from prompt to session
        self.session_memory.add(turn)
        # Tier 2 → Tier 3: Persist important turns
        if self.should_persist(turn):
            self.persistent_memory.add(turn)

    def on_session_end(self):
        # Tier 2 → Tier 3: Archive session
        summary = self.summarize_session()
        self.persistent_memory.add(summary)
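The class above leaves `should_persist`, `summarize_session`, and both stores undefined. A runnable sketch under simple assumptions follows: plain lists stand in for the session and persistent stores, persistence is decided by a hypothetical keyword rule, and the session summary is a placeholder string rather than an LLM output.

```python
class MemoryManager:
    """Runnable sketch: lists stand in for the Tier 2 and Tier 3 stores."""

    def __init__(self, persist_keywords=("remember", "always", "never")):
        self.session_memory = []
        self.persistent_memory = []
        self.persist_keywords = persist_keywords  # hypothetical persistence rule

    def should_persist(self, turn: dict) -> bool:
        text = turn["content"].lower()
        return any(word in text for word in self.persist_keywords)

    def summarize_session(self) -> dict:
        # Placeholder; a real system would likely call an LLM here.
        return {"role": "system", "content": f"Session with {len(self.session_memory)} turns"}

    def on_turn_end(self, turn: dict) -> None:
        self.session_memory.append(turn)         # Tier 1 -> Tier 2
        if self.should_persist(turn):
            self.persistent_memory.append(turn)  # Tier 2 -> Tier 3

    def on_session_end(self) -> None:
        self.persistent_memory.append(self.summarize_session())
        self.session_memory.clear()


manager = MemoryManager()
manager.on_turn_end({"role": "user", "content": "Always answer in French."})
manager.on_turn_end({"role": "user", "content": "What time is it?"})
manager.on_session_end()
print([m["content"] for m in manager.persistent_memory])
# ['Always answer in French.', 'Session with 2 turns']
```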
Token Management
Counting Strategies
| Method | Accuracy | Speed |
|---|---|---|
| `tiktoken` | Exact | Fast |
| `len(text) / 4` | Rough estimate | Instant |
| API response | Post-hoc | After call |
| Tokenizer model | Exact | Medium |
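The first two rows can be combined into one helper: use `tiktoken` when it is installed and fall back to the character heuristic otherwise. This is a sketch, assuming the `cl100k_base` encoding suits the target model; counts from `tiktoken` are exact only for models that use the chosen encoding, and the function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token for English text."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count with tiktoken if available, else fall back to the heuristic."""
    try:
        import tiktoken
    except ImportError:
        return estimate_tokens(text)
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


print(estimate_tokens("How are prompts assembled?"))
# 6
```

The heuristic tends to undercount for code and non-English text, so a safety margin on the budget is a reasonable precaution when relying on it.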
Budget Allocation
python
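class TokenBudget:
    def __init__(self, total: int = 8000):
        self.total = total
        # Planned split per section; these values sum to the default total
        self.allocations = {
            'system': 1000,
            'tools': 1500,
            'history': 4000,
            'input': 1000,
            'output_reserve': 500
        }

    def remaining_for_history(self, used: dict) -> int:
        # Subtract what the fixed sections and the current input actually consumed;
        # leaving the input out would let history crowd out the user's message
        fixed = used.get('system', 0) + used.get('tools', 0) + used.get('input', 0)
        return self.total - fixed - self.allocations['output_reserve']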
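As a worked example of the arithmetic, the standalone function below computes the history budget from measured usage. It is a sketch with assumed numbers; it also counts the current input against the budget and clamps the result at zero so an oversized prompt never yields a negative allowance.

```python
def history_budget(total: int, used: dict, output_reserve: int) -> int:
    """Tokens left for history after fixed sections, current input, and output reserve."""
    fixed = used.get("system", 0) + used.get("tools", 0) + used.get("input", 0)
    return max(0, total - fixed - output_reserve)


used = {"system": 850, "tools": 1200, "input": 300}
print(history_budget(total=8000, used=used, output_reserve=500))
# 5150  (8000 - 2350 - 500)
```

Because the system prompt and tools used less than their planned 2,500 tokens, history can grow past its nominal 4,000-token allocation.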
Output Template
markdown
## Memory Orchestration Analysis: [Framework Name]

### Context Assembly
- **Order**: [System → Memory → Tools → History → Input]
- **Method**: [Template/Message List/Programmatic]
- **Location**: `path/to/prompt_builder.py`

### Eviction Policy
- **Strategy**: [FIFO/Window/Summarization/Vector/Importance]
- **Trigger**: [Token count/Message count/Explicit]
- **Location**: `path/to/memory.py:L45`

### Memory Tiers
| Tier | Storage | Capacity | Retrieval |
|------|---------|----------|-----------|
| Working | In-prompt | ~4K tokens | Immediate |
| Session | Dict/List | Unlimited | Direct |
| Persistent | [Chroma/Pinecone/SQL] | Unlimited | Semantic |

### Token Management
- **Counting**: [tiktoken/estimate/API]
- **Budget Allocation**: [Description]
- **Overflow Handling**: [Truncate/Summarize/Error]
Integration
- Prerequisite: `codebase-mapping` to identify memory files
- Feeds into: `comparative-matrix` for context strategies
- Related: `control-loop-extraction` for scratchpad usage