LLM Memory Expert
Use this skill when working on memory systems, personalization, pattern detection, or any feature that requires understanding how LLMs should remember, learn, and adapt to users.
Core Principles
1. Never RAG at Inference Time
This is the most consistent lesson from the production systems surveyed below.
| Wrong | Right |
|---|---|
| Query vector DB during response | Pre-compute and inject into prompt |
| 500ms+ retrieval latency | <50ms from hot cache |
| RAG pipeline in critical path | Async extraction, sync injection |
Voice assistants, chat interfaces, and real-time systems cannot tolerate retrieval latency. OpenAI, DeepSeek, and MemoryOS all converged on this.
Pattern:
code
+-----------------------+      +-----------------------------------+
| HOT MEMORY (Redis)    |----> | System Prompt Injection (<50ms)   |
| ~500 tokens max       |      | User profile + active patterns    |
+-----------------------+      +-----------------------------------+
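A minimal sketch of the injection step, assuming the HOT tier has already been read into a struct. The `HotMemory` type, the `BuildSystemPrompt` function, and the "about 4 characters per token" heuristic are illustrative assumptions, not part of any library; a real system would read from Redis and use the model's tokenizer.

```go
package main

import (
	"fmt"
	"strings"
)

// HotMemory is a hypothetical in-process stand-in for the Redis-backed HOT tier.
type HotMemory struct {
	Profile  string
	Patterns []string // confirmed patterns, highest confidence first
}

// approxTokens is a rough assumption (about 4 characters per token), not a real tokenizer.
func approxTokens(s string) int { return (len(s) + 3) / 4 }

// BuildSystemPrompt appends the profile and as many patterns as fit in the token budget.
func BuildSystemPrompt(base string, mem HotMemory, budget int) string {
	var b strings.Builder
	b.WriteString(base)
	used := 0
	add := func(line string) bool {
		cost := approxTokens(line)
		if used+cost > budget {
			return false
		}
		used += cost
		b.WriteString("\n- " + line)
		return true
	}
	if mem.Profile != "" {
		add(mem.Profile)
	}
	for _, p := range mem.Patterns {
		if !add(p) {
			break // budget exhausted: drop lower-priority patterns
		}
	}
	return b.String()
}

func main() {
	mem := HotMemory{
		Profile:  "Name: Alex, timezone: UTC",
		Patterns: []string{"Gyms at 7am on weekdays", "Prefers 30-min meetings"},
	}
	fmt.Println(BuildSystemPrompt("Known about the user:", mem, 500))
}
```

No retrieval happens on the request path here: the only work at inference time is string concatenation under a fixed budget.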
2. Memory is Metabolic (SimpleMem)
Memory should:
- •Compress - Not store everything verbatim
- •Consolidate - Merge related facts into abstractions
- •Forget - Expire stale or contradicted information
Anti-pattern: Growing unbounded memory list
Pattern: Tiered storage with consolidation pipeline
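The consolidate and forget steps can be sketched as a single pass over stored facts. This is an illustration under stated assumptions, not SimpleMem's algorithm: the `Fact` type, the integer day counter standing in for timestamps, and the `maxAge` cutoff are all hypothetical, and semantic compression is out of scope here.

```go
package main

import (
	"fmt"
	"sort"
)

// Fact is a hypothetical stored memory; LastSeen is a day number standing in for a timestamp.
type Fact struct {
	Key      string
	Value    string
	LastSeen int
}

// Consolidate forgets facts older than maxAge days and keeps only the newest value per key.
func Consolidate(facts []Fact, today, maxAge int) []Fact {
	latest := make(map[string]Fact)
	for _, f := range facts {
		if today-f.LastSeen > maxAge {
			continue // forget: stale
		}
		if cur, ok := latest[f.Key]; !ok || f.LastSeen > cur.LastSeen {
			latest[f.Key] = f // consolidate: the newer value supersedes the older one
		}
	}
	out := make([]Fact, 0, len(latest))
	for _, f := range latest {
		out = append(out, f)
	}
	// Sort by key so the result is deterministic despite map iteration order.
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	facts := []Fact{
		{"meeting_length", "60 min", 80},
		{"meeting_length", "30 min", 95},
		{"gym_time", "7am", 98},
		{"old_office", "Berlin", 1},
	}
	for _, f := range Consolidate(facts, 100, 30) {
		fmt.Printf("%s = %s\n", f.Key, f.Value)
	}
}
```

The output list is bounded by the number of distinct keys rather than by the number of events ever seen.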
3. Separation of Concerns
| Tier | Access Time | Contents | Update Frequency |
|---|---|---|---|
| HOT | <1ms | Profile, top-N patterns | On confirmation |
| WARM | 10-50ms | All facts, graph, candidates | After each session |
| COLD | 100ms+ | Full history, audit trail | Batch/async |
4. Patterns Emerge from Observations
Don't treat individual events as memories. Record them as observations and promote only the patterns they form.
code
Observation 1: Gym at 7am Monday
Observation 2: Gym at 7am Tuesday
Observation 3: Gym at 7am Wednesday
Observation 4: Gym at 7am Thursday
Observation 5: Gym at 7am Friday
---------------------------------
Pattern: "User gyms at 7am on weekdays" (confidence: 0.85)
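One simple way to surface such a pattern is to group observations by activity and clock time and keep the groups that recur often enough. The sketch below assumes a simplified `Event` type and a hypothetical `sampleWeek` fixture; it only counts matches and does not compute a confidence score.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a simplified, hypothetical observation: what happened and when.
type Event struct {
	Activity string
	At       time.Time
}

// key groups events by activity and clock time (HH:MM).
type key struct {
	Activity string
	Clock    string
}

// DetectRecurring returns the groups seen at least minCount times.
func DetectRecurring(events []Event, minCount int) map[key]int {
	counts := make(map[key]int)
	for _, e := range events {
		counts[key{e.Activity, e.At.Format("15:04")}]++
	}
	for k, n := range counts {
		if n < minCount {
			delete(counts, k) // deleting during range is safe in Go
		}
	}
	return counts
}

// sampleWeek builds five 7am gym visits (Monday to Friday) plus one unrelated event.
func sampleWeek() []Event {
	monday := time.Date(2025, time.January, 6, 7, 0, 0, 0, time.UTC)
	var events []Event
	for d := 0; d < 5; d++ {
		events = append(events, Event{"gym", monday.AddDate(0, 0, d)})
	}
	return append(events, Event{"dentist", monday.Add(8 * time.Hour)})
}

func main() {
	for k, n := range DetectRecurring(sampleWeek(), 5) {
		fmt.Printf("%s at %s seen %d times\n", k.Activity, k.Clock, n)
	}
}
```

The one-off dentist visit is dropped; only the gym group reaches the five-observation minimum.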
State of the Art Reference
OpenAI ChatGPT Memory
- •No RAG - 4-layer injection (profile, history, extracted knowledge, active context)
- •Bio tool for memory management
- •Memories evolve with interactions, separate from chat history
DeepSeek Engram (January 2026)
- •O(1) lookups via N-gram hashing for static patterns
- •75/25 split - 75% compute, 25% memory
- •Memory improved reasoning MORE than knowledge retrieval
- •100B parameters offloadable to CPU DRAM with <3% penalty
PersonaMem-v2 (December 2025)
- •GRPO training for memory distillation
- •2,048 token max human-readable memory
- •16x more efficient than full-history approaches
- •80% MCQ + 20% open-ended training mix is critical
- •Frontier LLMs score only 37-48% accuracy on implicit personalization
SimpleMem (January 2026)
- •30x token reduction via semantic compression
- •Three stages: Compress -> Consolidate -> Retrieve
- •Recursive consolidation: 31.3% improvement in multi-hop reasoning
- •Inspired by Complementary Learning Systems (CLS) theory
Mem0
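- •Graph memory for relationships
- •Hybrid: vector DB + graph DB + key-value
- •Reported 26% relative improvement over OpenAI's memory on an LLM-as-a-Judge metric
- •Reported 91% lower p95 latency and about 90% token savings versus a full-context baseline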
MemGPT/Letta
- •LLM as OS - virtual context management
- •Agent self-manages memory via tool calls
- •Two tiers: in-context (editable) + external (archival + recall)
MemoryOS (EMNLP 2025)
- •STM/MTM/LPM hierarchy (Short/Mid/Long-term)
- •49% F1 improvement, 4.9 LLM calls vs 13 for competitors
- •MTM (topical grouping) provides most value
Google Titans/MIRAS (December 2025)
- •Surprise-based retention - models learn what to remember
- •Scales to 2M+ tokens with 98% accuracy
- •Test-time training: compress context into weights
Pattern Detection Implementation
Observation Schema
go
import "time"

type Observation struct {
	ID        string         `json:"id"`
	UserID    string         `json:"user_id"`
	Type      string         `json:"type"`   // calendar, email, command
	Action    string         `json:"action"` // created, sent, updated
	Timestamp time.Time      `json:"timestamp"`
	Features  map[string]any `json:"features"`
	// Features examples:
	// - time_of_day: "07:00"
	// - day_of_week: "monday"
	// - duration_minutes: 30
	// - participants: ["sarah@company.com"]
	// - category: "fitness"
}
Pattern Types
| Type | Example | Signal |
|---|---|---|
| Temporal | "Gyms at 7am weekdays" | N events at same time |
| Preference | "Prefers 30-min meetings" | N meetings with same duration |
| Workflow | "Blocks focus after standup" | N sequences of events |
| Relationship | "Always CCs Sarah on legal" | N emails with same pattern |
| Avoidance | "Never Friday afternoons" | Absence in time slots |
Confidence Calculation
go
// From PersonaMem-v2 research
func CalculateConfidence(candidate PatternCandidate) float64 {
	// 1. Count score (40% weight)
	// Convert to float64 before dividing: integer division would truncate
	// to 0 until the confirmation count is reached. (Builtin min needs Go 1.21+.)
	countScore := min(float64(len(candidate.Observations))/CONFIRMATION_COUNT, 1.0)
	// 2. Recency score (20% weight) - recent observations matter more
	recencyScore := calculateRecencyDecay(candidate.Observations)
	// 3. Consistency score (30% weight) - how similar are observations?
	consistencyScore := calculateConsistency(candidate)
	// 4. Span score (10% weight) - observed over longer period = reliable
	spanDays := candidate.LastSeen.Sub(candidate.FirstSeen).Hours() / 24
	spanScore := min(spanDays/14.0, 1.0) // 2 weeks ideal
	return 0.4*countScore + 0.2*recencyScore + 0.3*consistencyScore + 0.1*spanScore
}

const CONFIRMATION_COUNT = 5       // 5 observations to confirm
const CONFIRMATION_THRESHOLD = 0.7 // 70% confidence needed
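To make the weighting concrete, here is a worked instance with the recency and consistency scores passed in directly. The `WeightedConfidence` helper and the input values (recency 0.9, consistency 1.0) are assumptions chosen for illustration; only the 40/20/30/10 weights and the 5-observation and 14-day targets come from the text above.

```go
package main

import (
	"fmt"
	"math"
)

const (
	confirmationCount = 5    // observations needed for a full count score
	spanDaysTarget    = 14.0 // observation window that earns a full span score
)

// WeightedConfidence applies the 40/20/30/10 weighting.
// recency and consistency are assumed to be precomputed scores in [0, 1].
func WeightedConfidence(observations int, recency, consistency, spanDays float64) float64 {
	countScore := math.Min(float64(observations)/confirmationCount, 1.0)
	spanScore := math.Min(spanDays/spanDaysTarget, 1.0)
	return 0.4*countScore + 0.2*recency + 0.3*consistency + 0.1*spanScore
}

func main() {
	// Five gym visits, Monday to Friday (a 4-day span):
	// 0.4*1.0 + 0.2*0.9 + 0.3*1.0 + 0.1*(4/14) = 0.909 (rounded)
	fmt.Printf("confidence = %.3f\n", WeightedConfidence(5, 0.9, 1.0, 4))
}
```

With these inputs the short observation window costs little, because span carries only 10% of the weight.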
Pattern Lifecycle
code
+--------------+     +--------------+     +--------------+
|   TRACKING   |---->|  CONFIRMED   |---->|   EXPIRED    |
|              |     |              |     |              |
| Accumulating |     | In HOT tier  |     | Contradicted |
| observations |     | Prompt inject|     | or decayed   |
+--------------+     +--------------+     +--------------+
       |                    |                    |
       |                    |                    |
       v                    v                    v
  confidence++       confidence decay      remove/demote
    on match        5%/day if not seen after 3 contradictions
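The lifecycle can be read as a small state machine. The sketch below is one interpretation, not a definitive implementation: it assumes that a confidence below 0.3 expires a pattern from any live state, and that three contradictions send a confirmed pattern back to tracking (the diagram says "remove/demote", so the choice between the two is an assumption). The `State` type and `Next` function are hypothetical names.

```go
package main

import "fmt"

// State is a pattern's position in the lifecycle.
type State int

const (
	Tracking State = iota
	Confirmed
	Expired
)

func (s State) String() string {
	return [...]string{"TRACKING", "CONFIRMED", "EXPIRED"}[s]
}

// Next returns the state after one evaluation of a pattern.
func Next(s State, confidence float64, contradictions int) State {
	switch {
	case s == Expired:
		return Expired // terminal
	case confidence < 0.3:
		return Expired // decayed below the expiry threshold
	case s == Confirmed && contradictions >= 3:
		return Tracking // demote and re-evaluate
	case s == Tracking && confidence >= 0.7:
		return Confirmed // promote to the HOT tier
	}
	return s
}

func main() {
	fmt.Println(Next(Tracking, 0.75, 0))  // CONFIRMED
	fmt.Println(Next(Confirmed, 0.80, 3)) // TRACKING
	fmt.Println(Next(Confirmed, 0.20, 0)) // EXPIRED
}
```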
Decay and Expiration
go
// From SimpleMem research
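func DecayPatterns(patterns []PatternCandidate) {
	// Index into the slice: ranging by value would decay a copy and
	// silently discard the updated confidence.
	for i := range patterns {
		p := &patterns[i]
		daysSince := time.Since(p.LastSeen).Hours() / 24
		// Note: the factor covers the whole interval since LastSeen, so
		// running this more than once per pattern compounds the decay.
		decayFactor := math.Pow(0.95, daysSince) // 5% per day
		p.Confidence *= decayFactor
		if p.Confidence < 0.3 { // Below 30% = expire
			expirePattern(*p)
		}
	}
}

func CheckContradiction(pattern Pattern, observation Observation) {
	if contradicts(pattern, observation) {
		// Metadata is a map, so this increment is visible to the caller even
		// though pattern is passed by value (assumes map[string]int).
		pattern.Metadata["contradiction_count"]++
		if pattern.Metadata["contradiction_count"] >= 3 {
			demoteToTracking(pattern) // Re-evaluate
		}
	}
}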
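A quick calculation shows what the 5% daily decay implies in practice: how many unseen days it takes for a pattern to fall below the 0.3 expiry threshold. `DaysUntilExpiry` is a hypothetical helper; it assumes decay is applied exactly once per day with 0 < rate < 1.

```go
package main

import (
	"fmt"
	"math"
)

// DaysUntilExpiry returns the smallest whole number of days n such that
// start * rate^n < floor. It assumes 0 < rate < 1 and floor > 0.
func DaysUntilExpiry(start, rate, floor float64) int {
	if start < floor {
		return 0
	}
	// Solve start * rate^n = floor for n, then step to the next whole day.
	n := math.Log(floor/start) / math.Log(rate)
	return int(math.Floor(n)) + 1
}

func main() {
	// A pattern confirmed at exactly 0.7: 0.7 * 0.95^17 is about 0.293.
	fmt.Println(DaysUntilExpiry(0.7, 0.95, 0.3)) // 17
	// A fully confident pattern lasts about a week longer.
	fmt.Println(DaysUntilExpiry(1.0, 0.95, 0.3)) // 24
}
```

So a pattern that just cleared the confirmation threshold survives roughly two and a half weeks without a supporting observation.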
Memory Architecture Patterns
Pattern 1: Hierarchical Storage (MemoryOS)
code
                   User Request
                         |
                         v
+------------------------------------------------+
| SHORT-TERM MEMORY (STM)                        |
| - Last N conversation turns                    |
| - FIFO eviction to MTM                         |
| - ~5-10 items max                              |
+------------------------+-----------------------+
                         | overflow
                         v
+------------------------------------------------+
| MID-TERM MEMORY (MTM)                          |
| - Topically grouped "segments"                 |
| - Cosine + Jaccard similarity                  |
| - Consolidation merges similar items           |
+------------------------+-----------------------+
                         | summary
                         v
+------------------------------------------------+
| LONG-TERM PERSONAL MEMORY (LPM)                |
| - User traits and preferences                  |
| - Confirmed patterns                           |
| - Permanent unless contradicted                |
+------------------------------------------------+
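The STM-to-MTM overflow step can be sketched as a bounded FIFO queue whose evicted turns are filed by topic. This is a simplification under stated assumptions: the `Memory` and `Turn` types are hypothetical, and the topic label is assumed to be assigned upstream, whereas the diagram describes grouping by cosine and Jaccard similarity.

```go
package main

import "fmt"

// Turn is one conversation turn; Topic is assumed to be assigned upstream.
type Turn struct {
	Topic string
	Text  string
}

// Memory holds a bounded STM queue and topic-grouped MTM segments.
type Memory struct {
	capacity int
	STM      []Turn            // most recent turns, oldest first
	MTM      map[string][]Turn // evicted turns grouped by topic
}

// NewMemory creates a memory whose STM holds at most capacity turns.
func NewMemory(capacity int) *Memory {
	return &Memory{capacity: capacity, MTM: make(map[string][]Turn)}
}

// Add appends a turn and evicts the oldest turns into MTM on overflow.
func (m *Memory) Add(t Turn) {
	m.STM = append(m.STM, t)
	for len(m.STM) > m.capacity {
		oldest := m.STM[0]
		m.STM = m.STM[1:]
		m.MTM[oldest.Topic] = append(m.MTM[oldest.Topic], oldest)
	}
}

func main() {
	m := NewMemory(2)
	m.Add(Turn{"fitness", "I went to the gym at 7am"})
	m.Add(Turn{"work", "Standup ran long today"})
	m.Add(Turn{"fitness", "Gym again tomorrow"})
	fmt.Println("STM size:", len(m.STM), "MTM fitness segment:", len(m.MTM["fitness"]))
}
```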
Pattern 2: Graph Memory (Mem0)
code
Entities:
  Person: {name, role, company}
  Company: {name, industry}
  Project: {name, status}
Relationships:
  Person --WORKS_AT--> Company
  Person --COLLABORATES_WITH--> Person
  Person --MANAGES--> Project
Query: "Who handles legal?"
Graph: MATCH (p:Person {role: "legal"})-[:WORKS_AT]->(c:Company) RETURN p
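For illustration only, the same entity and relationship model can be held as an in-memory edge list; a production system would delegate this to a graph database. The `Graph`, `Person`, and `Edge` types, the `WhoWorksAt` method, and the sample data ("Acme", "Sarah", "Alex") are hypothetical.

```go
package main

import "fmt"

// Person mirrors part of the Person entity above.
type Person struct {
	Name string
	Role string
}

// Edge is a directed, labeled relationship between two node names.
type Edge struct {
	From, Rel, To string
}

// Graph is a minimal in-memory stand-in for a graph store.
type Graph struct {
	People map[string]Person
	Edges  []Edge
}

// WhoWorksAt returns the names of people with the given role who have
// a WORKS_AT edge to the given company, in edge order.
func (g Graph) WhoWorksAt(company, role string) []string {
	var out []string
	for _, e := range g.Edges {
		if e.Rel != "WORKS_AT" || e.To != company {
			continue
		}
		if p, ok := g.People[e.From]; ok && p.Role == role {
			out = append(out, p.Name)
		}
	}
	return out
}

// sampleGraph builds a tiny fixture with two employees and one collaboration edge.
func sampleGraph() Graph {
	return Graph{
		People: map[string]Person{
			"Sarah": {"Sarah", "legal"},
			"Alex":  {"Alex", "engineering"},
		},
		Edges: []Edge{
			{"Sarah", "WORKS_AT", "Acme"},
			{"Alex", "WORKS_AT", "Acme"},
			{"Alex", "COLLABORATES_WITH", "Sarah"},
		},
	}
}

func main() {
	fmt.Println(sampleGraph().WhoWorksAt("Acme", "legal")) // [Sarah]
}
```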
Pattern 3: Dual Embedding (Alfred Current)
code
+------------------------------------------------+
| CLOUD EMBEDDING (Gemini-embedding-001)         |
| - Primary, 768-dim                             |
| - For sync and cloud search                    |
+------------------------------------------------+
                        +
+------------------------------------------------+
| LOCAL EMBEDDING (Qwen3-0.6B)                   |
| - Fallback, 1024-dim                           |
| - For offline capability                       |
+------------------------------------------------+
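A hedged sketch of the fallback logic: try the cloud embedder first and fall back to the local one on error. The `Embedder` interface, `FallbackEmbedder`, and the `stub` type are hypothetical stand-ins, not real client APIs. Because the two models emit vectors of different dimensions, the result records its source so that vectors from different models are never compared with each other.

```go
package main

import (
	"errors"
	"fmt"
)

// Embedder is a hypothetical interface over any embedding backend.
type Embedder interface {
	Embed(text string) ([]float32, error)
}

// Result records which model produced the vector.
type Result struct {
	Source string // "cloud" or "local"
	Vector []float32
}

// FallbackEmbedder prefers Cloud and uses Local when Cloud fails.
type FallbackEmbedder struct {
	Cloud, Local Embedder
}

// Embed returns the cloud vector if available, otherwise the local one.
func (f FallbackEmbedder) Embed(text string) (Result, error) {
	if v, err := f.Cloud.Embed(text); err == nil {
		return Result{"cloud", v}, nil
	}
	v, err := f.Local.Embed(text)
	if err != nil {
		return Result{}, fmt.Errorf("both embedders failed: %w", err)
	}
	return Result{"local", v}, nil
}

// stub is a fake backend that returns a zero vector of a fixed dimension.
type stub struct {
	dim  int
	fail bool
}

func (s stub) Embed(string) ([]float32, error) {
	if s.fail {
		return nil, errors.New("backend unavailable")
	}
	return make([]float32, s.dim), nil
}

func main() {
	offline := FallbackEmbedder{Cloud: stub{768, true}, Local: stub{1024, false}}
	r, err := offline.Embed("gym at 7am")
	fmt.Println(r.Source, len(r.Vector), err) // local 1024 <nil>
}
```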
Personalization Techniques
Explicit vs Implicit
| Explicit | Implicit |
|---|---|
| User says "Remember I like X" | System observes repeated X behavior |
| Direct tool call | Pattern detection |
| Immediate storage | Confidence accumulation |
| 100% confidence | Variable confidence |
The Personalization Ladder
- •Level 0: No memory (stateless)
- •Level 1: Explicit facts ("User likes morning meetings")
- •Level 2: Relationship tracking ("Sarah is user's cofounder")
- •Level 3: Implicit patterns ("User always blocks 9-10am")
- •Level 4: Predictive ("User probably wants to block 9-10am tomorrow")
Anti-Patterns
| Anti-Pattern | Why It's Wrong | Alternative |
|---|---|---|
| Store everything | Unbounded growth, noise | Compress and consolidate |
| No expiration | Stale data misleads | Decay over time |
| Single embedding model | Offline fails | Dual cloud+local |
| RAG in critical path | Latency kills UX | Pre-inject to prompt |
| Flat storage | No structure | Hierarchical tiers |
Implementation Checklist
Starting a Memory System
- • Define memory categories (preference, habit, alias, etc.)
- • Choose storage tiers (HOT/WARM/COLD)
- • Implement observation extraction
- • Build pattern accumulator with confidence scoring
- • Set up async consolidation pipeline
- • Create profile injection for system prompt
- • Add decay and expiration logic
- • Test with realistic user behavior sequences
Evaluating Memory Quality
- • Implicit pattern detection rate (should detect patterns backed by 5+ observations)
- • False positive rate (patterns that don't hold)
- • Retrieval latency (<50ms for HOT tier)
- • Token efficiency (track memory tokens / useful context)
- • Contradiction handling (demote after 3 violations)
Quick Reference
Memory Types (Recommended)
| Category | Example | Tier |
|---|---|---|
| profile | Name, timezone, work style | HOT |
| preference | "Likes morning meetings" | HOT |
| habit | "Gyms at 7am weekdays" | HOT |
| alias | "Cofounder = Alex" | HOT |
| relationship | "Sarah works in legal" | WARM |
| fact | "Meeting was productive" | WARM |
| episode | Full conversation log | COLD |
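The table above maps directly onto a lookup. In this sketch the `Tier` type and `TierFor` function are hypothetical names, and defaulting unknown categories to WARM is an assumption rather than something the table specifies.

```go
package main

import "fmt"

// Tier names a storage tier.
type Tier string

const (
	Hot  Tier = "HOT"
	Warm Tier = "WARM"
	Cold Tier = "COLD"
)

// tierByCategory encodes the recommended category-to-tier mapping.
var tierByCategory = map[string]Tier{
	"profile":      Hot,
	"preference":   Hot,
	"habit":        Hot,
	"alias":        Hot,
	"relationship": Warm,
	"fact":         Warm,
	"episode":      Cold,
}

// TierFor returns the tier for a category; unknown categories
// default to WARM (an assumption, not taken from the table).
func TierFor(category string) Tier {
	if t, ok := tierByCategory[category]; ok {
		return t
	}
	return Warm
}

func main() {
	fmt.Println(TierFor("habit"), TierFor("fact"), TierFor("episode"))
}
```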
Confidence Thresholds
| Threshold | Action |
|---|---|
| 0.7+ | Confirm pattern -> HOT tier |
| 0.5-0.7 | Keep tracking |
| 0.3-0.5 | Low priority candidate |
| <0.3 | Expire |
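As a sketch, the thresholds translate into a single switch. The table leaves the boundary values ambiguous, so this version assumes each band includes its lower bound; `ActionFor` and its return strings are hypothetical.

```go
package main

import "fmt"

// ActionFor maps a confidence score to the action in the table above.
// Each band is assumed to include its lower bound.
func ActionFor(confidence float64) string {
	switch {
	case confidence >= 0.7:
		return "confirm" // promote to HOT tier
	case confidence >= 0.5:
		return "track"
	case confidence >= 0.3:
		return "low-priority"
	default:
		return "expire"
	}
}

func main() {
	for _, c := range []float64{0.85, 0.6, 0.4, 0.1} {
		fmt.Printf("%.2f -> %s\n", c, ActionFor(c))
	}
}
```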
Key Constants
go
const (
	CONFIRMATION_COUNT     = 5    // Observations to confirm
	CONFIRMATION_THRESHOLD = 0.7  // Confidence to promote
	DECAY_RATE             = 0.95 // 5% per day
	CONTRADICTION_LIMIT    = 3    // Before demote
	SPAN_DAYS_TARGET       = 14   // Ideal observation window
)