AgentSkillsCN

llm-memory-expert

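Reference guide for building LLM memory systems that combine pattern detection, tiered storage, and personalization. Use it when implementing memory features, RAG alternatives, observation tracking, or confidence-based pattern lifecycle management.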

SKILL.md
---
name: llm-memory-expert
description: Reference guide for building LLM memory systems with pattern detection, tiered storage, and personalization. Use when implementing memory, RAG alternatives, observation tracking, or confidence-based pattern lifecycle.
---

LLM Memory Expert

Use this skill when working on memory systems, personalization, pattern detection, or any feature that requires understanding how LLMs should remember, learn, and adapt to users.


Core Principles

1. Never RAG at Inference Time

This is the #1 lesson from industry leaders.

| Wrong | Right |
| --- | --- |
| Query vector DB during response | Pre-compute and inject into prompt |
| 500ms+ retrieval latency | <50ms from hot cache |
| RAG pipeline in critical path | Async extraction, sync injection |

Voice assistants, chat interfaces, and real-time systems cannot tolerate retrieval latency. OpenAI, DeepSeek, and MemoryOS all converged on this.

Pattern:

code
+-----------------------+     +------------------------------------+
| HOT MEMORY (Redis)    |---->| System Prompt Injection (<50ms)    |
| ~500 tokens max       |     | User profile + active patterns     |
+-----------------------+     +------------------------------------+
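
The injection step above can be sketched as a pure function over a snapshot that was already loaded from the hot cache. This is a minimal sketch, not a definitive implementation: `HotMemory`, `BuildMemoryBlock`, and the word-count token estimate are all hypothetical, and a real system would use its model's tokenizer and read the snapshot from Redis before calling it.

```go
package main

import "strings"

// HotMemory is a hypothetical pre-computed snapshot kept in the HOT tier.
type HotMemory struct {
    Profile  []string // e.g. "Name: Alex"
    Patterns []string // confirmed patterns, highest confidence first
}

// estimateTokens is a crude stand-in for a real tokenizer:
// it counts whitespace-separated words.
func estimateTokens(s string) int {
    return len(strings.Fields(s))
}

// BuildMemoryBlock renders profile lines first, then patterns, and stops
// at the first line that would push the block over the token budget.
func BuildMemoryBlock(m HotMemory, budget int) string {
    var b strings.Builder
    used := 0
    lines := append(append([]string{}, m.Profile...), m.Patterns...)
    for _, line := range lines {
        entry := "- " + line
        cost := estimateTokens(entry)
        if used+cost > budget {
            break
        }
        b.WriteString(entry)
        b.WriteByte('\n')
        used += cost
    }
    return b.String()
}
```

Because no retrieval happens inside `BuildMemoryBlock`, its cost is bounded by the size of the snapshot, which is what keeps the critical path short.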

2. Memory is Metabolic (SimpleMem)

Memory should:

  • Compress - Not store everything verbatim
  • Consolidate - Merge related facts into abstractions
  • Forget - Expire stale or contradicted information

Anti-pattern: Growing unbounded memory list

Pattern: Tiered storage with consolidation pipeline
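
A minimal sketch of the consolidate and forget steps, assuming facts are keyed by topic and carry an age in days. The `Fact` type, its fields, and `Consolidate` are hypothetical. Real semantic compression would need a model or similarity measure, so this version only merges by exact key and expires by age.

```go
package main

import "sort"

// Fact is a hypothetical memory record.
type Fact struct {
    Key     string  // topic, e.g. "meeting_length"
    Value   string  // e.g. "30-min"
    AgeDays float64 // days since the fact was last confirmed
}

// Consolidate keeps one fact per key (the most recent one, so a newer value
// supersedes an older, contradicted one) and forgets facts older than
// maxAgeDays. The result is sorted by key for stable output.
func Consolidate(facts []Fact, maxAgeDays float64) []Fact {
    newest := make(map[string]Fact)
    for _, f := range facts {
        if f.AgeDays > maxAgeDays {
            continue // forget stale facts
        }
        if cur, ok := newest[f.Key]; !ok || f.AgeDays < cur.AgeDays {
            newest[f.Key] = f
        }
    }
    out := make([]Fact, 0, len(newest))
    for _, f := range newest {
        out = append(out, f)
    }
    sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
    return out
}
```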

3. Separation of Concerns

| Tier | Access Time | Contents | Update Frequency |
| --- | --- | --- | --- |
| HOT | <1ms | Profile, top-N patterns | On confirmation |
| WARM | 10-50ms | All facts, graph, candidates | After each session |
| COLD | 100ms+ | Full history, audit trail | Batch/async |

4. Patterns Emerge from Observations

Don't treat individual events as memories. Record them as observations and promote only the patterns they reveal.

code
Observation 1: Gym at 7am Monday
Observation 2: Gym at 7am Tuesday
Observation 3: Gym at 7am Wednesday
Observation 4: Gym at 7am Thursday
Observation 5: Gym at 7am Friday
---------------------------------
Pattern: "User gyms at 7am on weekdays" (confidence: 0.85)
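
One way to turn the observations above into a candidate is to group them by shared features and keep groups that reach the confirmation count. The sketch below assumes a simplified `Obs` type and groups on category plus time of day only; both the type and `DetectCandidates` are hypothetical, and it does not compute a confidence score.

```go
package main

import "sort"

// Obs is a hypothetical, simplified observation.
type Obs struct {
    Category  string // e.g. "fitness"
    TimeOfDay string // e.g. "07:00"
    DayOfWeek string // e.g. "monday"
}

// Candidate is a group of observations that share category and time of day.
type Candidate struct {
    Category  string
    TimeOfDay string
    Count     int
}

// DetectCandidates returns every (category, time of day) group with at
// least minCount observations, sorted for stable output.
func DetectCandidates(obs []Obs, minCount int) []Candidate {
    counts := make(map[[2]string]int)
    for _, o := range obs {
        counts[[2]string{o.Category, o.TimeOfDay}]++
    }
    var out []Candidate
    for k, n := range counts {
        if n >= minCount {
            out = append(out, Candidate{Category: k[0], TimeOfDay: k[1], Count: n})
        }
    }
    sort.Slice(out, func(i, j int) bool {
        if out[i].Category != out[j].Category {
            return out[i].Category < out[j].Category
        }
        return out[i].TimeOfDay < out[j].TimeOfDay
    })
    return out
}
```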

State of the Art Reference

OpenAI ChatGPT Memory

  • No RAG - 4-layer injection (profile, history, extracted knowledge, active context)
  • Bio tool for memory management
  • Memories evolve with interactions, separate from chat history

DeepSeek Engram (January 2026)

  • O(1) lookups via N-gram hashing for static patterns
  • 75/25 split - 75% compute, 25% memory
  • Memory improved reasoning MORE than knowledge retrieval
  • 100B parameters offloadable to CPU DRAM with <3% penalty

PersonaMem-v2 (December 2025)

  • GRPO training for memory distillation
  • 2,048 token max human-readable memory
  • 16x more efficient than full-history approaches
  • 80% MCQ + 20% open-ended training mix is critical
  • Frontier LLMs only 37-48% on implicit personalization

SimpleMem (January 2026)

  • 30x token reduction via semantic compression
  • Three stages: Compress -> Consolidate -> Retrieve
  • Recursive consolidation: 31.3% improvement in multi-hop reasoning
  • Inspired by Complementary Learning Systems (CLS) theory

Mem0

  • Graph memory for relationships (26% improvement over OpenAI)
  • Hybrid: vector DB + graph DB + key-value
  • 91% lower p95 latency, 90% token savings

MemGPT/Letta

  • LLM as OS - virtual context management
  • Agent self-manages memory via tool calls
  • Two tiers: in-context (editable) + external (archival + recall)

MemoryOS (EMNLP 2025)

  • STM/MTM/LPM hierarchy (Short/Mid/Long-term)
  • 49% F1 improvement, 4.9 LLM calls vs 13 for competitors
  • MTM (topical grouping) provides most value

Google Titans/MIRAS (December 2025)

  • Surprise-based retention - models learn what to remember
  • Scales to 2M+ tokens with 98% accuracy
  • Test-time training: compress context into weights

Pattern Detection Implementation

Observation Schema

go
type Observation struct {
    ID        string         `json:"id"`
    UserID    string         `json:"user_id"`
    Type      string         `json:"type"`      // calendar, email, command
    Action    string         `json:"action"`    // created, sent, updated
    Timestamp time.Time      `json:"timestamp"`
    Features  map[string]any `json:"features"`
    // Features examples:
    // - time_of_day: "07:00"
    // - day_of_week: "monday"
    // - duration_minutes: 30
    // - participants: ["sarah@company.com"]
    // - category: "fitness"
}
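
The temporal entries in `Features` can be derived from the timestamp rather than supplied by the caller. A minimal sketch, assuming UTC for simplicity (a real system would likely convert to the user's timezone first); `TemporalFeatures` and `FeaturesFromUnix` are hypothetical helpers.

```go
package main

import (
    "strings"
    "time"
)

// TemporalFeatures derives the time-based feature keys used in the
// Observation schema from a timestamp.
func TemporalFeatures(ts time.Time) map[string]any {
    return map[string]any{
        "time_of_day": ts.Format("15:04"),
        "day_of_week": strings.ToLower(ts.Weekday().String()),
    }
}

// FeaturesFromUnix is a convenience wrapper for events that arrive with
// epoch-second timestamps. It interprets them in UTC (an assumption).
func FeaturesFromUnix(sec int64) map[string]any {
    return TemporalFeatures(time.Unix(sec, 0).UTC())
}
```

For example, `FeaturesFromUnix(1736146800)` corresponds to Monday 2025-01-06 07:00 UTC and yields `time_of_day: "07:00"` and `day_of_week: "monday"`.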

Pattern Types

| Type | Example | Signal |
| --- | --- | --- |
| Temporal | "Gyms at 7am weekdays" | N events at same time |
| Preference | "Prefers 30-min meetings" | N meetings with same duration |
| Workflow | "Blocks focus after standup" | N sequences of events |
| Relationship | "Always CCs Sarah on legal" | N emails with same pattern |
| Avoidance | "Never Friday afternoons" | Absence in time slots |
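
Avoidance is the odd one out because its signal is missing data, and absence only means something once there is enough evidence overall. The sketch below illustrates that idea at weekday granularity (not time slots within a day); `AvoidedDays` and the `minTotal` guard are assumptions, not part of any referenced system.

```go
package main

// weekdays lists the slots that are checked for absence.
var weekdays = []string{"monday", "tuesday", "wednesday", "thursday", "friday"}

// AvoidedDays returns the weekdays on which no event was observed.
// It returns nil when fewer than minTotal events exist, because an empty
// slot in a small sample is not evidence of avoidance.
func AvoidedDays(eventDays []string, minTotal int) []string {
    if len(eventDays) < minTotal {
        return nil
    }
    seen := make(map[string]bool)
    for _, d := range eventDays {
        seen[d] = true
    }
    var avoided []string
    for _, d := range weekdays {
        if !seen[d] {
            avoided = append(avoided, d)
        }
    }
    return avoided
}
```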

Confidence Calculation

go
// From PersonaMem-v2 research
// Requires Go 1.21+ for the built-in min.
func CalculateConfidence(candidate PatternCandidate) float64 {
    // 1. Count score (40% weight). Convert to float64 before dividing:
    //    integer division would yield 0 until the count reaches CONFIRMATION_COUNT.
    countScore := min(float64(len(candidate.Observations))/CONFIRMATION_COUNT, 1.0)

    // 2. Recency score (20% weight) - recent observations matter more
    recencyScore := calculateRecencyDecay(candidate.Observations)

    // 3. Consistency score (30% weight) - how similar are observations?
    consistencyScore := calculateConsistency(candidate)

    // 4. Span score (10% weight) - observed over longer period = reliable
    spanDays := candidate.LastSeen.Sub(candidate.FirstSeen).Hours() / 24
    spanScore := min(spanDays/14.0, 1.0) // 2 weeks ideal

    return 0.4*countScore + 0.2*recencyScore + 0.3*consistencyScore + 0.1*spanScore
}

const CONFIRMATION_COUNT = 5       // 5 observations to confirm
const CONFIRMATION_THRESHOLD = 0.7 // 70% confidence needed
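
`calculateRecencyDecay` and `calculateConsistency` are not defined in this guide. The sketch below shows one plausible shape for them so the weighting can be checked by hand. The 7-day half-life, the modal-value definition of consistency, and all function names are assumptions.

```go
package main

import "math"

// recencyScore averages an exponential weight per observation, where an
// observation halfLifeDays old counts half as much as one from today.
func recencyScore(agesDays []float64, halfLifeDays float64) float64 {
    if len(agesDays) == 0 {
        return 0
    }
    sum := 0.0
    for _, age := range agesDays {
        sum += math.Pow(0.5, age/halfLifeDays)
    }
    return sum / float64(len(agesDays))
}

// consistencyScore is the share of observations that carry the most
// common feature value.
func consistencyScore(values []string) float64 {
    if len(values) == 0 {
        return 0
    }
    counts := make(map[string]int)
    best := 0
    for _, v := range values {
        counts[v]++
        if counts[v] > best {
            best = counts[v]
        }
    }
    return float64(best) / float64(len(values))
}

// confidence applies the 40/20/30/10 weighting from CalculateConfidence.
func confidence(count int, recency, consistency, spanDays float64) float64 {
    countScore := math.Min(float64(count)/5, 1)
    spanScore := math.Min(spanDays/14, 1)
    return 0.4*countScore + 0.2*recency + 0.3*consistency + 0.1*spanScore
}
```

Under these assumptions, five identical gym observations on consecutive days, scored on the fifth day, come out at roughly 0.89: the count and consistency terms are saturated, while the 4-day span contributes only about 0.03 of its possible 0.1.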

Pattern Lifecycle

code
+--------------+     +--------------+     +--------------+
|  TRACKING    |---->|  CONFIRMED   |---->|   EXPIRED    |
|              |     |              |     |              |
| Accumulating |     | In HOT tier  |     | Contradicted |
| observations |     | Prompt inject|     | or decayed   |
+--------------+     +--------------+     +--------------+
       |                    |                    |
       |                    |                    |
       v                    v                    v
   confidence++        confidence decay     remove/demote
   on match            5%/day if not seen   after 3 contradictions
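
The lifecycle can be read as a small state machine. The sketch below is one interpretation of the diagram, assuming confidence and contradiction counters are updated elsewhere and `Step` only applies transitions. The type and constant names are hypothetical.

```go
package main

// State is the lifecycle stage of a pattern.
type State int

const (
    Tracking State = iota
    Confirmed
    Expired
)

const (
    confirmThreshold   = 0.7 // promote at or above this confidence
    expireThreshold    = 0.3 // expire below this confidence
    contradictionLimit = 3   // demote after this many contradictions
)

// Pattern holds only the fields the transitions need.
type Pattern struct {
    State          State
    Confidence     float64
    Contradictions int
}

// Step applies at most one transition. Expired is terminal in this sketch.
func (p *Pattern) Step() {
    switch {
    case p.State == Expired:
        // no way back; a new candidate would have to be created
    case p.Confidence < expireThreshold:
        p.State = Expired
    case p.State == Confirmed && p.Contradictions >= contradictionLimit:
        p.State = Tracking // demote for re-evaluation
        p.Contradictions = 0
    case p.State == Tracking && p.Confidence >= confirmThreshold:
        p.State = Confirmed
    }
}
```

A demoted pattern keeps its confidence here, so it would be re-confirmed on the next step unless the caller also lowers the score. Whether demotion should carry a confidence penalty is left open.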

Decay and Expiration

go
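// From SimpleMem research
// Note: the factor is computed from LastSeen, so apply it to the confidence
// recorded at LastSeen. Re-running it on an already-decayed value compounds the decay.
func DecayPatterns(patterns []PatternCandidate) {
    for i := range patterns {
        p := &patterns[i] // pointer into the slice; ranging by value would only update a copy
        daysSince := time.Since(p.LastSeen).Hours() / 24
        decayFactor := math.Pow(0.95, daysSince) // 5% per day
        p.Confidence *= decayFactor

        if p.Confidence < 0.3 { // Below 30% = expire
            expirePattern(*p)
        }
    }
}

// Assumes Pattern.Metadata is a map[string]int. Maps are reference types, so the
// increment is visible to the caller even though pattern is passed by value.
func CheckContradiction(pattern Pattern, observation Observation) {
    if contradicts(pattern, observation) {
        pattern.Metadata["contradiction_count"]++
        if pattern.Metadata["contradiction_count"] >= 3 {
            demoteToTracking(pattern) // Re-evaluate
        }
    }
}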
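
It helps to know how long a pattern survives without new observations. With a daily factor r, confidence falls from c to a floor f after ln(f/c) / ln(r) days. The helper below is a hypothetical convenience for that arithmetic, not part of any referenced system.

```go
package main

import "math"

// DaysUntilBelow returns how many unseen days it takes for a confidence
// value to decay from `from` down to `floor` at the given daily factor.
// It returns 0 if the value is already at or below the floor, and +Inf
// if the factor does not decay (dailyRate >= 1) or is not positive.
func DaysUntilBelow(from, floor, dailyRate float64) float64 {
    if from <= floor {
        return 0
    }
    if dailyRate <= 0 || dailyRate >= 1 {
        return math.Inf(1)
    }
    return math.Log(floor/from) / math.Log(dailyRate)
}
```

With the constants used here, a pattern confirmed at exactly 0.7 reaches the 0.3 expiry floor after about 16.5 unseen days, and one at 1.0 after about 23.5 days.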

Memory Architecture Patterns

Pattern 1: Hierarchical Storage (MemoryOS)

code
User Request
     |
     v
+------------------------------------------------+
| SHORT-TERM MEMORY (STM)                        |
| - Last N conversation turns                    |
| - FIFO eviction to MTM                         |
| - ~5-10 items max                              |
+------------------------+-----------------------+
                         | overflow
                         v
+------------------------------------------------+
| MID-TERM MEMORY (MTM)                          |
| - Topically grouped "segments"                 |
| - Cosine + Jaccard similarity                  |
| - Consolidation merges similar items           |
+------------------------+-----------------------+
                         | summary
                         v
+------------------------------------------------+
| LONG-TERM PERSONAL MEMORY (LPM)                |
| - User traits and preferences                  |
| - Confirmed patterns                           |
| - Permanent unless contradicted                |
+------------------------------------------------+
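
A minimal sketch of the STM-to-MTM overflow path in the diagram above. It uses word-level Jaccard similarity only (the cosine component would need embeddings) and compares each evicted item against the first item of every segment. `Memory`, its fields, and the threshold are hypothetical, and the LPM summary step is omitted.

```go
package main

import "strings"

// wordSet lowercases s and returns its distinct words.
func wordSet(s string) map[string]bool {
    set := make(map[string]bool)
    for _, w := range strings.Fields(strings.ToLower(s)) {
        set[w] = true
    }
    return set
}

// jaccard returns |A ∩ B| / |A ∪ B| over the word sets of a and b.
func jaccard(a, b string) float64 {
    sa, sb := wordSet(a), wordSet(b)
    inter := 0
    for w := range sa {
        if sb[w] {
            inter++
        }
    }
    union := len(sa) + len(sb) - inter
    if union == 0 {
        return 0
    }
    return float64(inter) / float64(union)
}

// Memory holds a bounded STM queue and topical MTM segments.
type Memory struct {
    STM       []string   // newest last
    MTM       [][]string // each inner slice is one topical segment
    Cap       int        // maximum STM size
    Threshold float64    // minimum similarity to join a segment
}

// Add appends a turn to STM and evicts the oldest turns into MTM (FIFO).
func (m *Memory) Add(turn string) {
    m.STM = append(m.STM, turn)
    for len(m.STM) > m.Cap {
        oldest := m.STM[0]
        m.STM = m.STM[1:]
        m.toMTM(oldest)
    }
}

// toMTM joins the first sufficiently similar segment or starts a new one.
func (m *Memory) toMTM(item string) {
    for i, seg := range m.MTM {
        if jaccard(seg[0], item) >= m.Threshold {
            m.MTM[i] = append(seg, item)
            return
        }
    }
    m.MTM = append(m.MTM, []string{item})
}
```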

Pattern 2: Graph Memory (Mem0)

code
Entities:
  Person: {name, role, company}
  Company: {name, industry}
  Project: {name, status}

Relationships:
  Person --WORKS_AT--> Company
  Person --COLLABORATES_WITH--> Person
  Person --MANAGES--> Project

Query: "Who handles legal?"
Graph: MATCH (p:Person)-[:WORKS_AT]->(c:Company {dept: "legal"}) RETURN p
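
For small graphs the same idea works without a graph database. The sketch below stores typed edges in memory and answers "who has relation R to X". It is an illustration only and does not reflect how Mem0 stores its graph; `Graph`, `Edge`, and the node name "Acme Legal" are hypothetical.

```go
package main

// Edge is a directed, typed relationship between two named nodes.
type Edge struct {
    From string
    Rel  string
    To   string
}

// Graph is a minimal in-memory edge list.
type Graph struct {
    edges []Edge
}

// Add records a directed edge from -> to with the given relationship type.
func (g *Graph) Add(from, rel, to string) {
    g.edges = append(g.edges, Edge{From: from, Rel: rel, To: to})
}

// Sources returns every node that has an edge of type rel pointing at to,
// in insertion order.
func (g *Graph) Sources(rel, to string) []string {
    var out []string
    for _, e := range g.edges {
        if e.Rel == rel && e.To == to {
            out = append(out, e.From)
        }
    }
    return out
}
```

A linear scan is fine for a per-user graph of a few hundred edges. Beyond that, an index keyed by relationship and target would be the next step.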

Pattern 3: Dual Embedding (Alfred Current)

code
+------------------------------------------------+
| CLOUD EMBEDDING (Gemini-embedding-001)         |
| - Primary, 768-dim                             |
| - For sync and cloud search                    |
+------------------------------------------------+
                      +
+------------------------------------------------+
| LOCAL EMBEDDING (Qwen3-0.6B)                   |
| - Fallback, 1024-dim                           |
| - For offline capability                       |
+------------------------------------------------+
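
A sketch of the fallback logic, assuming both models sit behind a common interface. Because the two models produce vectors of different sizes (768 vs 1024), the result is tagged with the model name so that vectors from different models are never compared with each other. The `Embedder` interface, `EmbedWithFallback`, and the stub are hypothetical; no real embedding API is called.

```go
package main

import (
    "errors"
    "fmt"
)

// Embedder abstracts over the cloud and local embedding models.
type Embedder interface {
    Name() string
    Embed(text string) ([]float32, error)
}

// Embedding records which model produced the vector.
type Embedding struct {
    Model  string
    Vector []float32
}

// EmbedWithFallback tries the primary model and uses the fallback only
// when the primary returns an error.
func EmbedWithFallback(primary, fallback Embedder, text string) (Embedding, error) {
    if v, err := primary.Embed(text); err == nil {
        return Embedding{Model: primary.Name(), Vector: v}, nil
    }
    v, err := fallback.Embed(text)
    if err != nil {
        return Embedding{}, fmt.Errorf("primary and fallback embedding both failed: %w", err)
    }
    return Embedding{Model: fallback.Name(), Vector: v}, nil
}

// stubEmbedder is a test double that returns zero vectors of a fixed size.
type stubEmbedder struct {
    name string
    dim  int
    fail bool
}

func (s stubEmbedder) Name() string { return s.name }

func (s stubEmbedder) Embed(string) ([]float32, error) {
    if s.fail {
        return nil, errors.New(s.name + " unavailable")
    }
    return make([]float32, s.dim), nil
}
```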

Personalization Techniques

Explicit vs Implicit

| Explicit | Implicit |
| --- | --- |
| User says "Remember I like X" | System observes repeated X behavior |
| Direct tool call | Pattern detection |
| Immediate storage | Confidence accumulation |
| 100% confidence | Variable confidence |

The Personalization Ladder

  1. Level 0: No memory (stateless)
  2. Level 1: Explicit facts ("User likes morning meetings")
  3. Level 2: Relationship tracking ("Sarah is user's cofounder")
  4. Level 3: Implicit patterns ("User always blocks 9-10am")
  5. Level 4: Predictive ("User probably wants to block 9-10am tomorrow")

Anti-Patterns

| Anti-Pattern | Why It's Wrong | Alternative |
| --- | --- | --- |
| Store everything | Unbounded growth, noise | Compress and consolidate |
| No expiration | Stale data misleads | Decay over time |
| Single embedding model | Offline fails | Dual cloud+local |
| RAG in critical path | Latency kills UX | Pre-inject to prompt |
| Flat storage | No structure | Hierarchical tiers |

Implementation Checklist

Starting a Memory System

  • Define memory categories (preference, habit, alias, etc.)
  • Choose storage tiers (HOT/WARM/COLD)
  • Implement observation extraction
  • Build pattern accumulator with confidence scoring
  • Set up async consolidation pipeline
  • Create profile injection for system prompt
  • Add decay and expiration logic
  • Test with realistic user behavior sequences

Evaluating Memory Quality

  • Implicit pattern detection rate (patterns backed by 5+ observations should be found)
  • False positive rate (patterns that don't hold)
  • Retrieval latency (<50ms for HOT tier)
  • Token efficiency (track memory tokens / useful context)
  • Contradiction handling (demote after 3 violations)
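
The first two metrics can be computed from a labelled replay: run a scripted user history through the detector and compare the detected patterns with the ones planted in the script. In this sketch the "false positive rate" is the share of detected patterns that were not planted, which is one common reading of the term. `Evaluate` and `Eval` are hypothetical names.

```go
package main

// Eval holds the two detection-quality metrics.
type Eval struct {
    DetectionRate     float64 // planted patterns found / planted patterns
    FalsePositiveRate float64 // detected but not planted / detected
}

// Evaluate compares detected pattern IDs against the planted ground truth.
// Duplicate IDs in either list are counted once.
func Evaluate(detected, truth []string) Eval {
    truthSet := make(map[string]bool)
    for _, t := range truth {
        truthSet[t] = true
    }
    seen := make(map[string]bool)
    hits, falsePositives := 0, 0
    for _, d := range detected {
        if seen[d] {
            continue
        }
        seen[d] = true
        if truthSet[d] {
            hits++
        } else {
            falsePositives++
        }
    }
    var e Eval
    if len(truthSet) > 0 {
        e.DetectionRate = float64(hits) / float64(len(truthSet))
    }
    if len(seen) > 0 {
        e.FalsePositiveRate = float64(falsePositives) / float64(len(seen))
    }
    return e
}
```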

Quick Reference

Memory Types (Recommended)

| Category | Example | Tier |
| --- | --- | --- |
| profile | Name, timezone, work style | HOT |
| preference | "Likes morning meetings" | HOT |
| habit | "Gyms at 7am weekdays" | HOT |
| alias | "Cofounder = Alex" | HOT |
| relationship | "Sarah works in legal" | WARM |
| fact | "Meeting was productive" | WARM |
| episode | Full conversation log | COLD |
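
The table maps directly to a lookup. In this sketch the fallback to WARM for unknown categories is an assumption, chosen so that unclassified data never lands in the prompt by default. `Tier` and `TierFor` are hypothetical names.

```go
package main

// Tier names a storage tier.
type Tier string

const (
    Hot  Tier = "HOT"
    Warm Tier = "WARM"
    Cold Tier = "COLD"
)

// tierByCategory mirrors the recommended memory types table.
var tierByCategory = map[string]Tier{
    "profile":      Hot,
    "preference":   Hot,
    "habit":        Hot,
    "alias":        Hot,
    "relationship": Warm,
    "fact":         Warm,
    "episode":      Cold,
}

// TierFor returns the storage tier for a memory category.
// Unknown categories default to WARM (an assumption).
func TierFor(category string) Tier {
    if t, ok := tierByCategory[category]; ok {
        return t
    }
    return Warm
}
```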

Confidence Thresholds

| Threshold | Action |
| --- | --- |
| 0.7+ | Confirm pattern -> HOT tier |
| 0.5-0.7 | Keep tracking |
| 0.3-0.5 | Low priority candidate |
| <0.3 | Expire |
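
The thresholds can be expressed as a single switch. The table's ranges share their endpoints, so this sketch assumes each lower bound is inclusive (0.7 promotes, 0.5 keeps tracking, 0.3 stays a low priority candidate). The `Action` type and its values are hypothetical.

```go
package main

// Action is what the pipeline does with a candidate at a given confidence.
type Action string

const (
    Promote      Action = "promote_to_hot"
    KeepTracking Action = "keep_tracking"
    LowPriority  Action = "low_priority"
    Expire       Action = "expire"
)

// ActionFor maps a confidence score to an action, treating each lower
// bound as inclusive.
func ActionFor(confidence float64) Action {
    switch {
    case confidence >= 0.7:
        return Promote
    case confidence >= 0.5:
        return KeepTracking
    case confidence >= 0.3:
        return LowPriority
    default:
        return Expire
    }
}
```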

Key Constants

go
const (
    CONFIRMATION_COUNT     = 5    // Observations to confirm
    CONFIRMATION_THRESHOLD = 0.7  // Confidence to promote
    DECAY_RATE             = 0.95 // Daily retention factor (5% decay per day)
    CONTRADICTION_LIMIT    = 3    // Before demote
    SPAN_DAYS_TARGET       = 14   // Ideal observation window
)

Sources