LLM Memory Expert

Use this skill when working on memory systems, personalization, pattern detection, or any feature that requires understanding how LLMs should remember, learn, and adapt to users.

Core Principles

1. Never RAG at Inference Time

This is the single most important lesson from the production systems surveyed below.

Wrong	Right
Query vector DB during response	Pre-compute and inject into prompt
500ms+ retrieval latency	<50ms from hot cache
RAG pipeline in critical path	Async extraction, sync injection

Voice assistants, chat interfaces, and real-time systems cannot tolerate retrieval latency. OpenAI, DeepSeek, and MemoryOS all converged on this.

Pattern:

+-----------------------+    +-----------------------------------+
| HOT MEMORY (Redis)    |----> System Prompt Injection (<50ms)  |
| ~500 tokens max       |    | User profile + active patterns    |
+-----------------------+    +-----------------------------------+
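
The injection path above can be sketched as follows. This is a minimal illustration, not a reference implementation: `HotCache` is a hypothetical in-process stand-in for the Redis hot tier, and `estimateTokens` is a rough 4-characters-per-token heuristic rather than a real tokenizer. The point is that the request path only reads pre-computed lines and stops at the token budget.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// HotCache is a hypothetical in-process stand-in for the Redis hot tier.
type HotCache struct {
	mu      sync.RWMutex
	entries map[string][]string // userID -> pre-computed memory lines, highest priority first
}

func NewHotCache() *HotCache {
	return &HotCache{entries: make(map[string][]string)}
}

// Put is meant to be called by the async extraction pipeline, never by the request path.
func (c *HotCache) Put(userID string, lines []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[userID] = lines
}

// estimateTokens is a rough heuristic (about 4 characters per token), not a real tokenizer.
func estimateTokens(s string) int {
	return (len(s) + 3) / 4
}

// BuildSystemPrompt appends memory lines to the base prompt until the token budget is used up.
func (c *HotCache) BuildSystemPrompt(base, userID string, maxTokens int) string {
	c.mu.RLock()
	lines := c.entries[userID]
	c.mu.RUnlock()

	var b strings.Builder
	b.WriteString(base)
	used := 0
	for _, line := range lines {
		cost := estimateTokens(line)
		if used+cost > maxTokens {
			break // budget exhausted; lower-priority lines are dropped
		}
		b.WriteString("\n- ")
		b.WriteString(line)
		used += cost
	}
	return b.String()
}

func main() {
	cache := NewHotCache()
	cache.Put("u1", []string{"Timezone: Europe/Berlin", "Gyms at 7am on weekdays"})
	fmt.Println(cache.BuildSystemPrompt("You are a helpful assistant.", "u1", 500))
}
```

Ordering the lines by priority before they are cached keeps the truncation decision out of the request path.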

2. Memory is Metabolic (SimpleMem)

Memory should:

•Compress - Not store everything verbatim
•Consolidate - Merge related facts into abstractions
•Forget - Expire stale or contradicted information

Anti-pattern: Growing unbounded memory list
Pattern: Tiered storage with consolidation pipeline
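
A minimal sketch of the consolidate and forget steps, assuming a hypothetical `Fact` record keyed by a normalized subject. Consolidation here is simply "newest fact per key wins", which also resolves contradictions; real systems would typically merge or summarize instead. The `Metabolize` name and the age limit in days are illustrative choices, not taken from SimpleMem.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Fact is a hypothetical memory record; Key is a normalized subject such as "meeting_length".
type Fact struct {
	Key       string
	Value     string
	UpdatedAt time.Time
}

// Metabolize keeps the newest fact per key (consolidate) and drops
// facts older than maxAgeDays (forget). The result is sorted by key.
func Metabolize(facts []Fact, now time.Time, maxAgeDays int) []Fact {
	maxAge := time.Duration(maxAgeDays) * 24 * time.Hour
	latest := make(map[string]Fact)
	for _, f := range facts {
		if now.Sub(f.UpdatedAt) > maxAge {
			continue // stale: forget
		}
		if cur, ok := latest[f.Key]; !ok || f.UpdatedAt.After(cur.UpdatedAt) {
			latest[f.Key] = f // newer fact replaces the older, possibly contradicted one
		}
	}
	out := make([]Fact, 0, len(latest))
	for _, f := range latest {
		out = append(out, f)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

// exampleFacts returns sample data: two versions of one fact and one stale fact.
func exampleFacts() ([]Fact, time.Time) {
	now := time.Date(2025, time.March, 1, 0, 0, 0, 0, time.UTC)
	return []Fact{
		{"meeting_length", "60 min", now.AddDate(0, 0, -20)},
		{"meeting_length", "30 min", now.AddDate(0, 0, -2)},
		{"office", "Berlin", now.AddDate(0, 0, -400)},
	}, now
}

func main() {
	facts, now := exampleFacts()
	for _, f := range Metabolize(facts, now, 90) {
		fmt.Printf("%s = %s\n", f.Key, f.Value)
	}
}
```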

3. Separation of Concerns

Tier	Access Time	Contents	Update Frequency
HOT	<1ms	Profile, top-N patterns	On confirmation
WARM	10-50ms	All facts, graph, candidates	After each session
COLD	100ms+	Full history, audit trail	Batch/async

4. Patterns Emerge from Observations

Don't promote individual events to memory. Record each event as an observation, and confirm a pattern only once enough matching observations accumulate.

Observation 1: Gym at 7am Monday
Observation 2: Gym at 7am Tuesday
Observation 3: Gym at 7am Wednesday
Observation 4: Gym at 7am Thursday
Observation 5: Gym at 7am Friday
---------------------------------
Pattern: "User gyms at 7am on weekdays" (confidence: 0.85)
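
One way to turn the observations above into a pattern candidate is to bucket events by category and hour and count the distinct weekdays covered. The sketch below assumes a simplified `Event` type and a `slotKey` grouping key, both hypothetical; it only counts and leaves confidence scoring to a later step.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a simplified, hypothetical observation.
type Event struct {
	Category string
	At       time.Time
}

// slotKey groups events by category and hour of day.
type slotKey struct {
	Category string
	Hour     int
}

// WeekdaySlots returns, for each (category, hour) slot, the number of
// distinct weekdays (Monday to Friday) that have at least one event.
func WeekdaySlots(events []Event) map[slotKey]int {
	seen := make(map[slotKey]map[time.Weekday]bool)
	for _, e := range events {
		wd := e.At.Weekday()
		if wd == time.Saturday || wd == time.Sunday {
			continue
		}
		k := slotKey{e.Category, e.At.Hour()}
		if seen[k] == nil {
			seen[k] = make(map[time.Weekday]bool)
		}
		seen[k][wd] = true
	}
	counts := make(map[slotKey]int, len(seen))
	for k, days := range seen {
		counts[k] = len(days)
	}
	return counts
}

// exampleWeek returns five 7am fitness events, Monday through Friday.
func exampleWeek() []Event {
	monday := time.Date(2025, time.January, 6, 7, 0, 0, 0, time.UTC) // a Monday
	var events []Event
	for i := 0; i < 5; i++ {
		events = append(events, Event{"fitness", monday.AddDate(0, 0, i)})
	}
	return events
}

func main() {
	for k, n := range WeekdaySlots(exampleWeek()) {
		if n >= 5 {
			fmt.Printf("candidate: %s at %02d:00 on weekdays\n", k.Category, k.Hour)
		}
	}
}
```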

State of the Art Reference

OpenAI ChatGPT Memory

•No RAG - 4-layer injection (profile, history, extracted knowledge, active context)
•Bio tool for memory management
•Memories evolve with interactions, separate from chat history

DeepSeek Engram (January 2026)

•O(1) lookups via N-gram hashing for static patterns
•75/25 split - 75% compute, 25% memory
•Memory improved reasoning MORE than knowledge retrieval
•100B parameters offloadable to CPU DRAM with <3% penalty

PersonaMem-v2 (December 2025)

•GRPO training for memory distillation
•2,048 token max human-readable memory
•16x more efficient than full-history approaches
•80% MCQ + 20% open-ended training mix is critical
•Frontier LLMs only 37-48% on implicit personalization

SimpleMem (January 2026)

•30x token reduction via semantic compression
•Three stages: Compress -> Consolidate -> Retrieve
•Recursive consolidation: 31.3% improvement in multi-hop reasoning
•Inspired by Complementary Learning Systems (CLS) theory

Mem0

•Graph memory for relationships (26% improvement over OpenAI)
•Hybrid: vector DB + graph DB + key-value
•91% lower p95 latency, 90% token savings

MemGPT/Letta

•LLM as OS - virtual context management
•Agent self-manages memory via tool calls
•Two tiers: in-context (editable) + external (archival + recall)

MemoryOS (EMNLP 2025)

•STM/MTM/LPM hierarchy (Short/Mid/Long-term)
•49% F1 improvement, 4.9 LLM calls vs 13 for competitors
•MTM (topical grouping) provides most value

Google Titans/MIRAS (December 2025)

•Surprise-based retention - models learn what to remember
•Scales to 2M+ tokens with 98% accuracy
•Test-time training: compress context into weights

Pattern Detection Implementation

Observation Schema

type Observation struct {
    ID        string         `json:"id"`
    UserID    string         `json:"user_id"`
    Type      string         `json:"type"`      // calendar, email, command
    Action    string         `json:"action"`    // created, sent, updated
    Timestamp time.Time      `json:"timestamp"`
    Features  map[string]any `json:"features"`
    // Features examples:
    // - time_of_day: "07:00"
    // - day_of_week: "monday"
    // - duration_minutes: 30
    // - participants: ["sarah@company.com"]
    // - category: "fitness"
}

Pattern Types

Type	Example	Signal
Temporal	"Gyms at 7am weekdays"	N events at same time
Preference	"Prefers 30-min meetings"	N meetings with same duration
Workflow	"Blocks focus after standup"	N sequences of events
Relationship	"Always CCs Sarah on legal"	N emails with same pattern
Avoidance	"Never Friday afternoons"	Absence in time slots

Confidence Calculation

// From PersonaMem-v2 research
// Requires Go 1.21+ for the built-in min.
func CalculateConfidence(candidate PatternCandidate) float64 {
    // 1. Count score (40% weight)
    // Convert to float64 first; integer division would yield 0 until the count reaches 5.
    countScore := min(float64(len(candidate.Observations))/CONFIRMATION_COUNT, 1.0)

    // 2. Recency score (20% weight) - recent observations matter more
    recencyScore := calculateRecencyDecay(candidate.Observations)

    // 3. Consistency score (30% weight) - how similar are observations?
    consistencyScore := calculateConsistency(candidate)

    // 4. Span score (10% weight) - observed over longer period = reliable
    spanDays := candidate.LastSeen.Sub(candidate.FirstSeen).Hours() / 24
    spanScore := min(spanDays/14.0, 1.0) // 2 weeks ideal

    return 0.4*countScore + 0.2*recencyScore + 0.3*consistencyScore + 0.1*spanScore
}

const CONFIRMATION_COUNT = 5       // 5 observations to confirm
const CONFIRMATION_THRESHOLD = 0.7 // 70% confidence needed
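
The two helper scores are left undefined above. The sketch below fills them in with assumed definitions so the weighting can be run end to end: `recencyScore` uses an exponential decay with a 7-day half-life, and `consistencyScore` is the share of observations at the most common time of day. Both definitions, the `Obs` type, and the `gymWeek` sample are illustrative assumptions, not the PersonaMem-v2 formulas.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	CONFIRMATION_COUNT = 5
	SPAN_DAYS_TARGET   = 14.0
)

// Obs is a simplified, hypothetical observation.
type Obs struct {
	Timestamp time.Time
	TimeOfDay string // e.g. "07:00"
}

// recencyScore averages a 7-day half-life decay over all observations (assumed definition).
func recencyScore(obs []Obs, now time.Time) float64 {
	if len(obs) == 0 {
		return 0
	}
	sum := 0.0
	for _, o := range obs {
		ageDays := now.Sub(o.Timestamp).Hours() / 24
		sum += math.Pow(0.5, ageDays/7)
	}
	return sum / float64(len(obs))
}

// consistencyScore is the share of observations at the most common time of day (assumed definition).
func consistencyScore(obs []Obs) float64 {
	if len(obs) == 0 {
		return 0
	}
	counts := make(map[string]int)
	best := 0
	for _, o := range obs {
		counts[o.TimeOfDay]++
		if counts[o.TimeOfDay] > best {
			best = counts[o.TimeOfDay]
		}
	}
	return float64(best) / float64(len(obs))
}

// Confidence applies the 40/20/30/10 weighting; obs must be sorted oldest first.
func Confidence(obs []Obs, now time.Time) float64 {
	if len(obs) == 0 {
		return 0
	}
	countScore := math.Min(float64(len(obs))/CONFIRMATION_COUNT, 1.0)
	spanDays := obs[len(obs)-1].Timestamp.Sub(obs[0].Timestamp).Hours() / 24
	spanScore := math.Min(spanDays/SPAN_DAYS_TARGET, 1.0)
	return 0.4*countScore + 0.2*recencyScore(obs, now) + 0.3*consistencyScore(obs) + 0.1*spanScore
}

// gymWeek returns five 7am observations (Monday to Friday) and "now" set to the last one.
func gymWeek() ([]Obs, time.Time) {
	monday := time.Date(2025, time.January, 6, 7, 0, 0, 0, time.UTC)
	var obs []Obs
	for i := 0; i < 5; i++ {
		obs = append(obs, Obs{monday.AddDate(0, 0, i), "07:00"})
	}
	return obs, obs[4].Timestamp
}

func main() {
	obs, now := gymWeek()
	fmt.Printf("confidence: %.2f\n", Confidence(obs, now))
}
```

With these assumed helpers, five consecutive weekday observations clear the 0.7 threshold even though the span score is low, because count and consistency carry 70% of the weight.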

Pattern Lifecycle

+--------------+     +--------------+     +--------------+
|  TRACKING    |---->|  CONFIRMED   |---->|   EXPIRED    |
|              |     |              |     |              |
| Accumulating |     | In HOT tier  |     | Contradicted |
| observations |     | Prompt inject|     | or decayed   |
+--------------+     +--------------+     +--------------+
       |                    |                    |
       |                    |                    |
       v                    v                    v
   confidence++        confidence decay     remove/demote
   on match            5%/day if not seen   after 3 contradictions
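
The lifecycle can be read as a small state machine. The sketch below is one possible encoding, assuming a hypothetical `Pattern` struct and an `Advance` method; it follows the demote-to-tracking behaviour for contradictions and treats EXPIRED as terminal.

```go
package main

import "fmt"

type State int

const (
	Tracking State = iota
	Confirmed
	Expired
)

func (s State) String() string {
	return [...]string{"TRACKING", "CONFIRMED", "EXPIRED"}[s]
}

// Pattern is a hypothetical, minimal pattern record.
type Pattern struct {
	State          State
	Confidence     float64
	Contradictions int
}

// Advance applies at most one lifecycle transition based on the current counters.
func (p *Pattern) Advance() {
	switch {
	case p.State == Expired:
		// terminal state: nothing to do
	case p.Confidence < 0.3:
		p.State = Expired
	case p.State == Confirmed && p.Contradictions >= 3:
		p.State = Tracking // demote and re-evaluate
		p.Contradictions = 0
	case p.State == Tracking && p.Confidence >= 0.7:
		p.State = Confirmed
	}
}

func main() {
	p := &Pattern{Confidence: 0.75}
	p.Advance()
	fmt.Println(p.State)

	p.Contradictions = 3
	p.Advance()
	fmt.Println(p.State)

	p.Confidence = 0.2
	p.Advance()
	fmt.Println(p.State)
}
```

A demoted pattern keeps its confidence in this sketch, so the caller is expected to recompute it before calling `Advance` again.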

Decay and Expiration

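// From SimpleMem research
// Assumes p.Confidence holds the value as of p.LastSeen; running this
// repeatedly on already-decayed values would compound the decay.
func DecayPatterns(patterns []PatternCandidate) {
    for i := range patterns {
        p := &patterns[i] // take a pointer; ranging by value would update a copy
        daysSince := time.Since(p.LastSeen).Hours() / 24
        decayFactor := math.Pow(0.95, daysSince) // 5% per day
        p.Confidence *= decayFactor

        if p.Confidence < 0.3 { // Below 30% = expire
            expirePattern(p)
        }
    }
}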

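// Takes a pointer so that a demotion is visible to the caller.
// Assumes Metadata is a map[string]int; with map[string]any the
// counter would need a type assertion before incrementing.
func CheckContradiction(pattern *Pattern, observation Observation) {
    if contradicts(pattern, observation) {
        pattern.Metadata["contradiction_count"]++
        if pattern.Metadata["contradiction_count"] >= 3 {
            demoteToTracking(pattern)  // Re-evaluate
        }
    }
}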
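
To see what a 5% daily decay means in practice, the number of days until expiry can be computed in closed form: the smallest whole d with conf * rate^d < threshold. `DaysUntilExpiry` is a hypothetical helper added for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// DaysUntilExpiry returns the first whole day on which conf * rate^days
// falls below threshold. It assumes conf > 0, threshold > 0 and 0 < rate < 1.
func DaysUntilExpiry(conf, threshold, rate float64) int {
	if conf < threshold {
		return 0 // already below the threshold
	}
	days := math.Log(threshold/conf) / math.Log(rate)
	return int(math.Floor(days)) + 1
}

func main() {
	fmt.Println(DaysUntilExpiry(0.7, 0.3, 0.95)) // freshly confirmed pattern
	fmt.Println(DaysUntilExpiry(1.0, 0.3, 0.95)) // explicit, fully confident memory
}
```

With the constants used here, a pattern that was just confirmed at 0.7 expires after 17 unseen days, and one at full confidence after 24.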

Memory Architecture Patterns

Pattern 1: Hierarchical Storage (MemoryOS)

User Request
     |
     v
+------------------------------------------------+
| SHORT-TERM MEMORY (STM)                        |
| - Last N conversation turns                    |
| - FIFO eviction to MTM                         |
| - ~5-10 items max                              |
+------------------------+-----------------------+
                         | overflow
                         v
+------------------------------------------------+
| MID-TERM MEMORY (MTM)                          |
| - Topically grouped "segments"                 |
| - Cosine + Jaccard similarity                  |
| - Consolidation merges similar items           |
+------------------------+-----------------------+
                         | summary
                         v
+------------------------------------------------+
| LONG-TERM PERSONAL MEMORY (LPM)                |
| - User traits and preferences                  |
| - Confirmed patterns                           |
| - Permanent unless contradicted                |
+------------------------------------------------+
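
The STM-to-MTM overflow can be sketched as a bounded FIFO whose evicted turns are filed into topical segments. This simplified version uses word-level Jaccard similarity only (the diagram also lists cosine similarity, which would need embeddings) and compares against the first item of each segment. The `Memory` type and its fields are hypothetical, and the LPM summary step is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet lowercases s and returns its set of whitespace-separated words.
func wordSet(s string) map[string]bool {
	set := make(map[string]bool)
	for _, w := range strings.Fields(strings.ToLower(s)) {
		set[w] = true
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| over the word sets of a and b.
func jaccard(a, b string) float64 {
	setA, setB := wordSet(a), wordSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 0
	}
	inter := 0
	for w := range setA {
		if setB[w] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	return float64(inter) / float64(union)
}

// Memory is a hypothetical two-level store: a bounded STM and topical MTM segments.
type Memory struct {
	STM       []string
	MTM       [][]string
	stmCap    int
	threshold float64
}

func NewMemory(stmCap int, threshold float64) *Memory {
	return &Memory{stmCap: stmCap, threshold: threshold}
}

// Add appends a turn to STM and evicts the oldest turns to MTM on overflow (FIFO).
func (m *Memory) Add(turn string) {
	m.STM = append(m.STM, turn)
	for len(m.STM) > m.stmCap {
		evicted := m.STM[0]
		m.STM = m.STM[1:]
		m.toMTM(evicted)
	}
}

// toMTM files a turn into the first sufficiently similar segment, or starts a new one.
func (m *Memory) toMTM(turn string) {
	for i, seg := range m.MTM {
		if jaccard(seg[0], turn) >= m.threshold {
			m.MTM[i] = append(seg, turn)
			return
		}
	}
	m.MTM = append(m.MTM, []string{turn})
}

func main() {
	m := NewMemory(2, 0.3)
	for _, t := range []string{"book gym session", "lunch with sarah", "gym session moved", "lunch sarah friday", "anything else"} {
		m.Add(t)
	}
	fmt.Println("STM:", m.STM)
	fmt.Println("MTM:", m.MTM)
}
```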

Pattern 2: Graph Memory (Mem0)

Entities:
  Person: {name, role, company}
  Company: {name, industry}
  Project: {name, status}

Relationships:
  Person --WORKS_AT--> Company
  Person --COLLABORATES_WITH--> Person
  Person --MANAGES--> Project

Query: "Who handles legal?"
Graph: MATCH (p:Person {role: "legal"}) RETURN p
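
For readers without a graph database at hand, the same entity and relationship model can be approximated in memory. This is an illustrative sketch only: the `Graph`, `Node`, and `Edge` types and the sample data are hypothetical and do not reflect how Mem0 stores its graph.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a labeled entity with string properties.
type Node struct {
	Label string
	Props map[string]string
}

// Edge is a directed, typed relationship between two node IDs.
type Edge struct{ From, Rel, To string }

// Graph is a hypothetical in-memory property graph.
type Graph struct {
	Nodes map[string]Node
	Edges []Edge
}

func NewGraph() *Graph { return &Graph{Nodes: make(map[string]Node)} }

func (g *Graph) AddNode(id, label string, props map[string]string) {
	g.Nodes[id] = Node{label, props}
}

func (g *Graph) AddEdge(from, rel, to string) {
	g.Edges = append(g.Edges, Edge{from, rel, to})
}

// FindByProp returns the sorted IDs of nodes with the given label whose property equals value.
func (g *Graph) FindByProp(label, key, value string) []string {
	var ids []string
	for id, n := range g.Nodes {
		if n.Label == label && n.Props[key] == value {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

// Neighbors returns the sorted IDs reachable from a node over edges of type rel.
func (g *Graph) Neighbors(from, rel string) []string {
	var ids []string
	for _, e := range g.Edges {
		if e.From == from && e.Rel == rel {
			ids = append(ids, e.To)
		}
	}
	sort.Strings(ids)
	return ids
}

// demoGraph builds a tiny sample graph.
func demoGraph() *Graph {
	g := NewGraph()
	g.AddNode("sarah", "Person", map[string]string{"name": "Sarah", "role": "legal"})
	g.AddNode("alex", "Person", map[string]string{"name": "Alex", "role": "cofounder"})
	g.AddNode("acme", "Company", map[string]string{"name": "Acme"})
	g.AddEdge("sarah", "WORKS_AT", "acme")
	g.AddEdge("alex", "COLLABORATES_WITH", "sarah")
	return g
}

func main() {
	g := demoGraph()
	fmt.Println("Who handles legal?", g.FindByProp("Person", "role", "legal"))
	fmt.Println("Sarah works at:", g.Neighbors("sarah", "WORKS_AT"))
}
```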

Pattern 3: Dual Embedding (Alfred Current)

+------------------------------------------------+
| CLOUD EMBEDDING (Gemini-embedding-001)         |
| - Primary, 768-dim                             |
| - For sync and cloud search                    |
+------------------------------------------------+
                      +
+------------------------------------------------+
| LOCAL EMBEDDING (Qwen3-0.6B)                   |
| - Fallback, 1024-dim                           |
| - For offline capability                       |
+------------------------------------------------+
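
A fallback wrapper is one way to wire the two embedders together. The sketch below assumes a hypothetical `Embedder` interface and uses `stubEmbedder` placeholders instead of real model clients. Because the two models produce vectors of different sizes (768 vs 1024), each result carries its model name so vectors from different models are never compared or stored in the same index.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Embedding tags a vector with the model that produced it.
type Embedding struct {
	Model  string
	Vector []float32
}

// Embedder is a hypothetical interface over any embedding backend.
type Embedder interface {
	Embed(ctx context.Context, text string) (Embedding, error)
}

// Fallback tries Primary first and uses Secondary only if Primary fails.
type Fallback struct {
	Primary, Secondary Embedder
}

func (f Fallback) Embed(ctx context.Context, text string) (Embedding, error) {
	emb, err := f.Primary.Embed(ctx, text)
	if err == nil {
		return emb, nil
	}
	emb, err2 := f.Secondary.Embed(ctx, text)
	if err2 != nil {
		return Embedding{}, errors.Join(err, err2)
	}
	return emb, nil
}

// stubEmbedder is a placeholder backend: it returns a zero vector of size dim, or a fixed error.
type stubEmbedder struct {
	model string
	dim   int
	err   error
}

func (s stubEmbedder) Embed(_ context.Context, _ string) (Embedding, error) {
	if s.err != nil {
		return Embedding{}, s.err
	}
	return Embedding{Model: s.model, Vector: make([]float32, s.dim)}, nil
}

// offlineDemo simulates the cloud model being unreachable.
func offlineDemo() (Embedding, error) {
	cloud := stubEmbedder{model: "cloud", dim: 768, err: errors.New("offline")}
	local := stubEmbedder{model: "local", dim: 1024}
	return Fallback{cloud, local}.Embed(context.Background(), "gym at 7am")
}

func main() {
	emb, err := offlineDemo()
	if err != nil {
		fmt.Println("embed failed:", err)
		return
	}
	fmt.Println(emb.Model, len(emb.Vector))
}
```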

Personalization Techniques

Explicit vs Implicit

Explicit	Implicit
User says "Remember I like X"	System observes repeated X behavior
Direct tool call	Pattern detection
Immediate storage	Confidence accumulation
100% confidence	Variable confidence

The Personalization Ladder

•Level 0: No memory (stateless)
•Level 1: Explicit facts ("User likes morning meetings")
•Level 2: Relationship tracking ("Sarah is user's cofounder")
•Level 3: Implicit patterns ("User always blocks 9-10am")
•Level 4: Predictive ("User probably wants to block 9-10am tomorrow")

Anti-Patterns

Anti-Pattern	Why It's Wrong	Alternative
Store everything	Unbounded growth, noise	Compress and consolidate
No expiration	Stale data misleads	Decay over time
Single embedding model	Offline fails	Dual cloud+local
RAG in critical path	Latency kills UX	Pre-inject to prompt
Flat storage	No structure	Hierarchical tiers

Implementation Checklist

Starting a Memory System

• Define memory categories (preference, habit, alias, etc.)
• Choose storage tiers (HOT/WARM/COLD)
• Implement observation extraction
• Build pattern accumulator with confidence scoring
• Set up async consolidation pipeline
• Create profile injection for system prompt
• Add decay and expiration logic
• Test with realistic user behavior sequences

Evaluating Memory Quality

• Implicit pattern detection rate (should detect patterns backed by 5+ observations)
• False positive rate (patterns that don't hold)
• Retrieval latency (<50ms for HOT tier)
• Token efficiency (track memory tokens / useful context)
• Contradiction handling (demote after 3 violations)

Quick Reference

Memory Types (Recommended)

Category	Example	Tier
profile	Name, timezone, work style	HOT
preference	"Likes morning meetings"	HOT
habit	"Gyms at 7am weekdays"	HOT
alias	"Cofounder = Alex"	HOT
relationship	"Sarah works in legal"	WARM
fact	"Meeting was productive"	WARM
episode	Full conversation log	COLD

Confidence Thresholds

Threshold	Action
0.7+	Confirm pattern -> HOT tier
0.5-0.7	Keep tracking
0.3-0.5	Low priority candidate
<0.3	Expire

Key Constants

CONFIRMATION_COUNT = 5      // Observations to confirm
CONFIRMATION_THRESHOLD = 0.7 // Confidence to promote
DECAY_RATE = 0.95           // 5% per day
CONTRADICTION_LIMIT = 3     // Before demote
SPAN_DAYS_TARGET = 14       // Ideal observation window
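
The confidence thresholds above map naturally onto a single routing function. In this sketch the lower bound of each band is treated as inclusive, which is an assumption since the table's ranges touch at 0.5 and 0.7; the `Action` names are hypothetical.

```go
package main

import "fmt"

// Action is the step to take for a pattern candidate at a given confidence.
type Action string

const (
	Confirm      Action = "confirm"       // promote to HOT tier
	KeepTracking Action = "keep_tracking" // keep accumulating observations
	LowPriority  Action = "low_priority"  // low priority candidate
	Expire       Action = "expire"        // drop the candidate
)

// ActionFor maps a confidence value to an action; band lower bounds are inclusive.
func ActionFor(confidence float64) Action {
	switch {
	case confidence >= 0.7:
		return Confirm
	case confidence >= 0.5:
		return KeepTracking
	case confidence >= 0.3:
		return LowPriority
	default:
		return Expire
	}
}

func main() {
	for _, c := range []float64{0.85, 0.6, 0.4, 0.1} {
		fmt.Printf("%.2f -> %s\n", c, ActionFor(c))
	}
}
```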
