LightMem: Memory-Augmented Generation

Lightweight framework for adding long-term memory to LLM applications. Extracts facts from conversations, stores in vector DB, retrieves relevant memories for context.

Prerequisites

bash

# Install LightMem
pip install lightmem

# Or from source
git clone https://github.com/zjunlp/LightMem.git
cd LightMem && pip install -e .

# Download required models
# 1. LLMLingua-2: microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank
# 2. Embedding: sentence-transformers/all-MiniLM-L6-v2

Environment variables:

•OPENAI_API_KEY (if using OpenAI backend)

Quick Start

Minimal Setup

python

from lightmem.memory.lightmem import LightMemory

config = {
    "pre_compress": False,
    "topic_segment": False,
    "metadata_generate": True,
    "text_summary": True,
    "memory_manager": {
        "model_name": "openai",
        "configs": {
            "model": "gpt-4o-mini",
            "api_key": "your-api-key",
        }
    },
    "index_strategy": "embedding",
    "text_embedder": {
        "model_name": "huggingface",
        "configs": {
            "model": "/path/to/all-MiniLM-L6-v2",
            "embedding_dims": 384,
        },
    },
    "retrieve_strategy": "embedding",
    "embedding_retriever": {
        "model_name": "qdrant",
        "configs": {
            "collection_name": "my_memory",
            "embedding_model_dims": 384,
            "path": "./qdrant_data/my_memory",
        }
    },
    "update": "offline",
}

lightmem = LightMemory.from_config(config)

Full Setup (with compression and segmentation)

python

config = {
    "pre_compress": True,
    "pre_compressor": {
        "model_name": "llmlingua-2",
        "configs": {
            "llmlingua_config": {
                "model_name": "/path/to/llmlingua-2-model",
                "device_map": "cuda",
                "use_llmlingua2": True,
            },
            "compress_config": {"rate": 0.6}
        }
    },
    "topic_segment": True,
    "precomp_topic_shared": True,
    "topic_segmenter": {"model_name": "llmlingua-2"},
    "messages_use": "user_only",
    "metadata_generate": True,
    "text_summary": True,
    "memory_manager": {
        "model_name": "openai",  # or "ollama", "deepseek", "vllm"
        "configs": {
            "model": "gpt-4o-mini",
            "api_key": "your-api-key",
            "max_tokens": 16000,
        }
    },
    "extract_threshold": 0.1,
    "index_strategy": "embedding",
    "text_embedder": {
        "model_name": "huggingface",
        "configs": {
            "model": "/path/to/all-MiniLM-L6-v2",
            "embedding_dims": 384,
            "model_kwargs": {"device": "cuda"},
        },
    },
    "retrieve_strategy": "embedding",
    "embedding_retriever": {
        "model_name": "qdrant",
        "configs": {
            "collection_name": "my_memory",
            "embedding_model_dims": 384,
            "path": "./qdrant_data/my_memory",
        }
    },
    "update": "offline",
}

lightmem = LightMemory.from_config(config)

Core Workflow

Phase 1: Add Memory

Store conversation turns with timestamps:

python

messages = [
    {"role": "user", "content": "My favorite color is blue.", "time_stamp": "2024-01-15T10:30:00"},
    {"role": "assistant", "content": "Got it, blue is a nice color.", "time_stamp": "2024-01-15T10:30:00"},
]

result = lightmem.add_memory(
    messages=messages,
    force_segment=True,   # Force topic segmentation
    force_extract=True    # Force fact extraction
)

What happens:

•Messages normalized with timestamps
•(Optional) Pre-compressed to reduce tokens
•(Optional) Segmented by topic
•Facts extracted by LLM (e.g., "User's favorite color is blue")
•Stored in vector database

Phase 2: Offline Update (Optional)

Consolidate related memories after batch additions:

python

# Build update queue (find related memories)
lightmem.construct_update_queue_all_entries(top_k=20, keep_top_n=10)

# Update/delete redundant memories
lightmem.offline_update_all_entries(score_threshold=0.8)

Phase 3: Retrieve

Search for relevant memories:

python

query = "What is the user's favorite color?"
memories = lightmem.retrieve(query, limit=5)
print(memories)
# Output: "2024-01-15T10:30:00 Mon User's favorite color is blue."

Phase 4: Use in Generation

Inject memories into LLM context:

python

from openai import OpenAI

client = OpenAI()
memories = lightmem.retrieve(user_question, limit=10)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": f"Relevant memories:\n{memories}"},
        {"role": "user", "content": user_question}
    ]
)

Configuration Reference

Option	Values	Description
`pre_compress`	true/false	Enable token compression
`topic_segment`	true/false	Enable topic-based segmentation
`messages_use`	user_only/assistant_only/hybrid	Which messages to extract facts from
`metadata_generate`	true/false	Extract metadata (keywords, entities)
`text_summary`	true/false	Generate text summaries
`index_strategy`	embedding/context/hybrid	Indexing method
`retrieve_strategy`	embedding/context/hybrid	Retrieval method
`update`	online/offline	When to consolidate memories

Backend Options

OpenAI

python

"memory_manager": {
    "model_name": "openai",
    "configs": {"model": "gpt-4o-mini", "api_key": "..."}
}

Ollama (Local)

python

"memory_manager": {
    "model_name": "ollama",
    "configs": {"model": "llama3:latest", "host": "http://localhost:11434"}
}

DeepSeek

python

"memory_manager": {
    "model_name": "deepseek",
    "configs": {"model": "deepseek-chat", "api_key": "...", "deepseek_base_url": "..."}
}

MCP Server

LightMem provides an MCP server for integration:

bash

# Install MCP dependencies
pip install 'lightmem[mcp]'

# Run server
cd LightMem
fastmcp run mcp/server.py:mcp --transport http --port 8000

Available tools:

•add_memory: Add conversation to memory
•retrieve_memory: Search memories
•offline_update: Consolidate memories
•get_timestamp: Get current timestamp
•show_lightmem_instance: Show configuration

Algorithm Details

For the full extraction prompts, update logic, and retrieval algorithms, see references/algorithm.md.