SuperLocalMemory: Build Knowledge Graph
Build or rebuild the knowledge graph from existing memories to improve search quality and discover hidden relationships.
Usage
slm build-graph [--force] [--clustering]
What It Does
1. Entity Extraction (TF-IDF)
- •Scans all memories
- •Identifies important terms (entities)
- •Creates nodes in knowledge graph
- •Examples: "FastAPI", "JWT", "PostgreSQL", "React hooks"
2. Relationship Discovery
- •Finds memories sharing entities
- •Calculates similarity scores
- •Creates edges between related nodes
- •Discovers indirect connections
3. Topic Clustering (Optional)
- •Groups related memories into clusters
- •Uses Leiden algorithm (community detection)
- •Creates semantic topic groups
- •Examples: "Authentication cluster", "Database cluster"
Examples
Example 1: Basic Graph Build
$ slm build-graph
Output:
🔄 Building Knowledge Graph... Phase 1: Entity Extraction Scanning 1,247 memories... Extracted 892 unique entities Created 892 graph nodes ✓ Complete (3.2s) Phase 2: Relationship Discovery Computing similarity scores... Created 3,456 edges (relationships) Avg edges per node: 3.9 ✓ Complete (5.1s) Phase 3: Optimization Indexing graph structure... Pruning weak edges (score < 0.3)... Final edge count: 2,134 ✓ Complete (1.2s) ✅ Knowledge graph built successfully! Graph Statistics: Nodes: 892 Edges: 2,134 Density: 0.27% Largest Component: 856 nodes (96%) Next: Use `slm recall` to see improved search results
Example 2: Force Rebuild
$ slm build-graph --force
Rebuilds from scratch (deletes existing graph first)
Use when:
- •Graph seems corrupted
- •Major bulk import completed
- •Want fresh start
Example 3: With Clustering
$ slm build-graph --clustering
Requires optional dependencies:
pip3 install python-igraph leidenalg
Additional output:
Phase 4: Topic Clustering (Leiden)
Detecting communities...
Found 47 clusters
Largest cluster: 89 memories
Smallest cluster: 3 memories
Modularity score: 0.82 (excellent)
✓ Complete (2.3s)
Discovered Clusters:
Cluster 1 (89 memories): "Authentication & Security"
Top entities: JWT, OAuth, tokens, auth, security
Cluster 2 (76 memories): "Database & PostgreSQL"
Top entities: PostgreSQL, database, SQL, queries, indexes
Cluster 3 (54 memories): "React & Frontend"
Top entities: React, hooks, components, state, props
...
Arguments
| Argument | Description | When to Use |
|---|---|---|
--force | Delete existing graph and rebuild | Corruption, fresh start |
--clustering | Run topic clustering | Want to discover topic groups |
--verbose | Show detailed progress | Debugging, understanding process |
--dry-run | Preview without saving | Testing, analysis |
When to Run
Always Run After:
- •Bulk imports - Added 50+ memories at once
- •Database restore - Restored from backup
- •Major project milestone - Sprint complete, project phase done
Run Periodically:
- •Monthly - Keep graph optimized
- •After 500 new memories - Maintain quality
- •When search feels slow - Rebuild indexes
Run on Issues:
- •Poor search results - Graph may be stale
- •Missing relationships - Rebuild connections
- •Corrupted graph errors - Force rebuild
What Gets Built
Graph Nodes
Entities extracted from memories:
- •Technologies: "FastAPI", "PostgreSQL", "React"
- •Concepts: "authentication", "performance", "testing"
- •Patterns: "TDD", "async", "REST API"
- •Decisions: "prefer X over Y"
Node properties:
- •Entity text
- •Frequency (how many memories mention it)
- •Importance score
- •First seen / last seen
Graph Edges
Relationships between entities:
- •Similarity edge: Memories share similar content
- •Co-occurrence edge: Entities appear together
- •Sequential edge: Memories created close in time
Edge properties:
- •Similarity score (0.0 - 1.0)
- •Shared entities list
- •Edge type
Clusters (if --clustering)
Topic groups discovered:
- •Cluster ID
- •Cluster name (auto-generated from top entities)
- •Member memories (which memories belong)
- •Top entities in cluster
- •Modularity score (how well-defined)
Performance
| Memory Count | Build Time | Notes |
|---|---|---|
| 100 | ~1s | Instant |
| 1,000 | ~10s | Fast |
| 10,000 | ~2min | Acceptable |
| 50,000+ | ~15min | Plan accordingly |
With clustering (add ~50%):
- •1,000 memories: ~15s
- •10,000 memories: ~3min
Factors affecting speed:
- •Memory content length
- •Vocabulary size (unique words)
- •Hardware (CPU, RAM)
Advanced Usage
Incremental Updates
# Add new memories slm remember "New content..." --tags new # Incremental graph update (fast) slm build-graph # Only processes new memories # Force full rebuild (slower, thorough) slm build-graph --force
Monitoring Quality
# Check graph stats before slm status | grep "Knowledge Graph" # Build graph slm build-graph --verbose # Check stats after slm status | grep "Knowledge Graph"
Scripting & Automation
Weekly rebuild (cron job):
#!/bin/bash # Every Sunday at 3 AM echo "$(date): Starting graph rebuild" slm build-graph --clustering >> /var/log/slm-build.log 2>&1 echo "$(date): Graph rebuild complete"
Post-import hook:
#!/bin/bash # After bulk import memories_added=$1 if [ "$memories_added" -gt 50 ]; then echo "Large import detected, rebuilding graph..." slm build-graph fi
Clustering Analysis
# Build with clustering slm build-graph --clustering # Check discovered clusters slm status --verbose | grep -A 20 "Topic Clusters" # Search within specific cluster slm recall "FastAPI" --cluster "Backend & APIs"
Troubleshooting
"Build failed: Memory error"
Cause: Not enough RAM for large graph
Solution:
# Build in chunks (process fewer memories at once) slm build-graph --chunk-size 1000 # Or increase system memory # Or archive old memories
"Clustering requires python-igraph"
Cause: Optional dependencies not installed
Solution:
pip3 install python-igraph leidenalg # Verify python3 -c "import igraph; import leidenalg" # Try again slm build-graph --clustering
"Graph build slow"
Causes:
- •Large database
- •Slow disk I/O
- •Complex memory content
Solutions:
# Show progress slm build-graph --verbose # Skip clustering (faster) slm build-graph # No --clustering flag # Check disk space df -h ~/.claude-memory/
"Edges seem wrong"
Cause: Stale graph or poor similarity threshold
Solution:
# Force complete rebuild slm build-graph --force # Adjust similarity threshold (advanced) slm build-graph --min-similarity 0.4 # Default: 0.3
Graph Metrics Explained
Node Count
Total unique entities found
- •Good: > 100 for 1,000 memories
- •Poor: < 10 for 1,000 memories
Why matters: More nodes = richer semantic understanding
Edge Count
Total relationships discovered
- •Good: Edges/Nodes ratio > 2
- •Poor: Ratio < 1 (disconnected graph)
Why matters: More edges = better search via relationships
Density
How connected the graph is
- •Formula: (Edges / Possible Edges) × 100
- •Typical: 0.1% - 1%
- •Too low (<0.05%): Memories very disconnected
- •Too high (>5%): May indicate poor entity extraction
Largest Component
Size of biggest connected subgraph
- •Good: >80% of nodes
- •Poor: <50% (fragmented knowledge)
Why matters: Smaller component = isolated knowledge islands
Modularity (Clustering)
How well-defined clusters are
- •Excellent: >0.7
- •Good: 0.5 - 0.7
- •Poor: <0.3
Why matters: Higher = clearer topic separation
Impact on Other Commands
slm recall (Search)
Before graph build:
- •Relies mainly on keyword matching
- •May miss related memories
After graph build:
- •Discovers indirect relationships
- •Finds conceptually similar memories
- •Better ranked results
Example:
Query: "authentication" Before: - Direct matches only (JWT, auth, login) After: - Direct matches (JWT, auth, login) - + Related concepts (security, tokens, OAuth) - + Connected memories (API design, user management)
slm status
Shows updated graph statistics
slm switch-profile
Each profile has separate graph
Notes
- •Non-destructive: Original memories never modified
- •Idempotent: Can run multiple times safely
- •Automatic: Search uses graph automatically after build
- •Privacy: All processing local
Related Commands
- •
slm recall- Search uses the graph - •
slm status- Check graph stats - •
slm remember- Add memories (triggers incremental update)
Created by: Varun Pratap Bhardwaj (Solution Architect) Project: SuperLocalMemory V2 License: MIT with attribution requirements (see ATTRIBUTION.md) Repository: https://github.com/varun369/SuperLocalMemoryV2
Open source doesn't mean removing credit. Attribution must be preserved per MIT License terms.