Memory Evolution
Iron Law
EVOLVE ONLY WITH EVIDENCE.
Every evolution action (refine, deprecate, merge, archive, induce) must be justified by:
- Q-value data (usage + outcome signals)
- Usage history (success/failure patterns)
- Explicit feedback signals
Never evolve based on gut feeling or speculation.
Trigger Conditions
This skill is activated when:
- Scheduled: Every 6 hours (configurable)
- Q-update threshold: 100+ Q-value updates accumulated
- Process threshold: 20+ new Process nodes created
- Manual: User runs the /evolve command
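The trigger conditions above can be sketched as a single check. This is a minimal illustration, not the skill's actual scheduler: the `EvolutionState` class, its field names, and the order in which triggers are tested are all assumptions; only the thresholds (6 hours, 100 updates, 20 Processes) come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EvolutionState:
    """Hypothetical counters accumulated since the last evolution run."""
    last_run: datetime
    q_updates: int = 0
    new_processes: int = 0
    manual_request: bool = False


def should_evolve(
    state: EvolutionState,
    now: datetime,
    interval: timedelta = timedelta(hours=6),  # "Every 6 hours (configurable)"
    q_update_threshold: int = 100,
    process_threshold: int = 20,
) -> Optional[str]:
    """Return the name of the first trigger that fired, or None."""
    if state.manual_request:
        return "manual"
    if state.q_updates >= q_update_threshold:
        return "q_update_threshold"
    if state.new_processes >= process_threshold:
        return "process_threshold"
    if now - state.last_run >= interval:
        return "scheduled"
    return None
```

Returning the trigger name rather than a bare boolean makes it easy to record why a given run started.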
Evolution Workflow
Step 1: Load Supporting Skills
Before taking action, load these meta skills for guidance:
```
skill_search("skill-editing")             → How to edit/deprecate
skill_search("skill-creation")            → How to induce new skills
skill_search("learning-from-experience")  → Feedback patterns
```
Step 2: Query Evolution Candidates
Use the query_evolution_candidates tool to find memories needing attention:
```
Candidates by Priority:
├── CRITICAL: Q < 0.2, usage >= 10
│   └── Action: Likely deprecation (consistent failure)
├── HIGH: Q < 0.3, usage >= 5
│   └── Action: Consider refinement (fixable issues)
├── MEDIUM: 90+ days stale
│   └── Action: Consider archival (no recent relevance)
└── LOW: Process clusters without skills
    └── Action: Consider skill induction
```
Query with different criteria:
- query_evolution_candidates(criteria="low_q") → Low Q-value nodes
- query_evolution_candidates(criteria="stale") → Unused for 90+ days
- query_evolution_candidates(criteria="unlinked_processes") → Processes without skills
- query_evolution_candidates(criteria="similar_pairs") → Potential merge candidates
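The priority tiers in this step can be expressed as a small classifier. This is a sketch under stated assumptions: the function name, its parameters, and the rule that the highest matching tier wins are illustrative, and it does not reproduce how query_evolution_candidates ranks results internally.

```python
from typing import Optional


def candidate_priority(
    q_value: Optional[float],
    usage_count: int,
    days_since_last_use: int,
    is_unlinked_process_cluster: bool = False,
) -> Optional[str]:
    """Map a candidate's signals to a priority tier (highest match wins)."""
    if q_value is not None and q_value < 0.2 and usage_count >= 10:
        return "CRITICAL"  # likely deprecation
    if q_value is not None and q_value < 0.3 and usage_count >= 5:
        return "HIGH"      # consider refinement
    if days_since_last_use >= 90:
        return "MEDIUM"    # consider archival
    if is_unlinked_process_cluster:
        return "LOW"       # consider skill induction
    return None            # not an evolution candidate
```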
Step 3: Evaluate Each Candidate
For each candidate, use get_usage_history to understand patterns:
```
get_usage_history(node_id, limit=10)
→ Returns: [{timestamp, outcome, context}, ...]
```
Decision matrix:
| Q-value | Usage Count | Pattern | Action |
|---|---|---|---|
| < 0.2 | >= 10 | Consistent failure | deprecate |
| < 0.3 | >= 5 | Identifiable fix | refine |
| < 0.3 | >= 5 | No clear fix | Keep, monitor |
| any | >= 5 | Similarity > 0.95 | merge |
| any | 0 for 90+ days | No recent queries | archive |
| N/A | 3+ Processes | Similar triggers | induce skill |
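The per-node rows of the decision matrix can be sketched as one function. The `Evidence` fields and the top-to-bottom precedence between rows are assumptions (the matrix does not say which row wins when several match), and the induce row is left out because it applies to Process clusters rather than to a single node.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of one node's Q-value and usage history."""
    q_value: float
    usage_count: int
    days_since_last_use: int = 0
    consistent_failure: bool = False
    identifiable_fix: bool = False
    max_similarity: float = 0.0  # similarity to the closest other node


def decide_action(ev: Evidence) -> str:
    """Apply the matrix rows top to bottom; fall back to keeping the node."""
    if ev.q_value < 0.2 and ev.usage_count >= 10 and ev.consistent_failure:
        return "deprecate"
    if ev.q_value < 0.3 and ev.usage_count >= 5:
        # Low Q with enough usage: refine only if a fix is identifiable.
        return "refine" if ev.identifiable_fix else "monitor"
    if ev.usage_count >= 5 and ev.max_similarity > 0.95:
        return "merge"
    if ev.days_since_last_use >= 90:
        return "archive"
    return "keep"
```

For example, `decide_action(Evidence(q_value=0.25, usage_count=6))` returns `"monitor"`, matching the "Keep, monitor" row.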
Step 4: Execute Actions
Deprecate (mark as obsolete)
```
deprecate_node(
    node_id="fact_xxx",
    node_type="Fact",
    reason="Low Q-value (0.15) with 12 uses, consistent failure pattern"
)
```
- Node will no longer appear in retrieval
- Provenance chain preserved for auditing
Refine (create improved version)
```
refine_node(
    node_id="process_xxx",
    node_type="Process",
    improved_content="...",  # New trigger/action/outcome
    reason="Clarified trigger condition based on failure patterns"
)
```
- Creates new node with SUPERSEDES relationship
- Old node marked as superseded
- New node starts with Q=0.5
Merge (combine redundant nodes)
```
merge_nodes(
    node_ids=["fact_001", "fact_002"],
    node_type="Fact",
    merged_content="Combined fact with complete information"
)
```
- Keeps node with highest Q-value
- Merges provenance chains
- Others marked MERGED_INTO
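The three merge effects listed here can be illustrated with a small planning function. This is only a sketch of the described behavior, not the merge_nodes implementation: the node dictionaries, their keys, and the order-preserving deduplication of provenance entries are assumptions.

```python
from typing import Any, Dict, List


def plan_merge(nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the surviving node and describe what happens to the others.

    Each node is a hypothetical dict with 'id', 'q_value', and
    'provenance' (a list of source identifiers).
    """
    if len(nodes) < 2:
        raise ValueError("merging requires at least two nodes")

    # Keep the node with the highest Q-value (first one wins on ties).
    survivor = max(nodes, key=lambda n: n["q_value"])

    # Merge provenance chains, dropping duplicates but keeping order.
    provenance: List[Any] = []
    for node in nodes:
        for entry in node["provenance"]:
            if entry not in provenance:
                provenance.append(entry)

    # Every other node points at the survivor via MERGED_INTO.
    merged_into = {
        n["id"]: survivor["id"] for n in nodes if n["id"] != survivor["id"]
    }
    return {
        "survivor_id": survivor["id"],
        "provenance": provenance,
        "merged_into": merged_into,
    }
```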
Archive (mark as historical)
```
archive_node(
    node_id="fact_xxx",
    node_type="Fact"
)
```
- Node marked as archived with timestamp
- Still accessible but deprioritized in retrieval
Induce Skill (from Process cluster)
```
induce_skill(
    process_ids=["proc_001", "proc_002", "proc_003"],
    skill_name="debug-memory-leak",
    description="Use when encountering OOM errors",
    content="# Steps\n1. Capture heap dump\n..."
)
```
- Requires 3+ similar Processes (follow skill-creation guidelines)
- Creates INSTANCE_OF relationships
- Skill written to file system
Step 5: Report Results
Return structured summary:
```json
{
  "candidates_evaluated": 15,
  "actions_taken": {
    "deprecated": 2,
    "refined": 3,
    "merged": 1,
    "archived": 5,
    "induced_skills": 1
  },
  "details": [
    {"node_id": "fact_xxx", "action": "deprecated", "reason": "..."},
    {"node_id": "process_yyy", "action": "refined", "new_id": "process_zzz", "reason": "..."}
  ],
  "summary": "Deprecated 2 low-quality facts, refined 3 processes with fixable issues, induced 1 skill from OOM debugging patterns"
}
```
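One way to keep the counters consistent with the detail entries is to derive them rather than count by hand. The helper below is a hypothetical sketch: `build_report` is not part of the skill's tool set, and the `"induced"` action label for detail entries is an assumption, since the example above only shows `"deprecated"` and `"refined"`.

```python
import json
from typing import Any, Dict, List

# Maps a detail entry's action label to its counter key in "actions_taken".
# The "induced" label is assumed; the source example does not show it.
COUNTER_KEY = {
    "deprecated": "deprecated",
    "refined": "refined",
    "merged": "merged",
    "archived": "archived",
    "induced": "induced_skills",
}


def build_report(
    candidates_evaluated: int,
    details: List[Dict[str, Any]],
    summary: str,
) -> Dict[str, Any]:
    """Assemble the structured summary, deriving counts from the details."""
    actions_taken = {key: 0 for key in COUNTER_KEY.values()}
    for entry in details:
        # An unknown action label raises KeyError instead of being dropped.
        actions_taken[COUNTER_KEY[entry["action"]]] += 1
    return {
        "candidates_evaluated": candidates_evaluated,
        "actions_taken": actions_taken,
        "details": details,
        "summary": summary,
    }


if __name__ == "__main__":
    report = build_report(
        candidates_evaluated=3,
        details=[
            {"node_id": "fact_xxx", "action": "deprecated", "reason": "..."},
        ],
        summary="Deprecated 1 low-quality fact",
    )
    print(json.dumps(report, indent=2))
```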
Evidence Requirements
| Action | Minimum Evidence |
|---|---|
| Deprecate | Q < 0.2, usage >= 10, failure pattern in history |
| Refine | Q < 0.3, usage >= 5, identifiable improvement path |
| Merge | Similarity > 0.95, same node type |
| Archive | No usage for 90+ days |
| Induce | 3+ similar generalizable Processes |
Anti-patterns
DO NOT:
- Deprecate without checking usage history
- Refine when fundamental approach is wrong (should deprecate instead)
- Merge semantically different memories just because wording is similar
- Induce skills from < 3 processes (insufficient evidence)
- Archive frequently used memories
- Evolve protected meta skills without explicit justification
Conservative by Default
When in doubt:
- Keep > Deprecate: A marginally useful memory is better than none
- Monitor > Act: Wait for more evidence if uncertain
- Refine > Replace: Improve existing rather than start over
Integration with Q-Learning
Evolution actions feed back into the Q-learning cycle:
```
┌─────────────────────────────────────────────────────┐
│  Low Q-value detected                               │
│          ↓                                          │
│  Evolution evaluates                                │
│          ↓                                          │
│  Action taken (refine/deprecate)                    │
│          ↓                                          │
│  New/updated memory enters retrieval                │
│          ↓                                          │
│  Usage generates new feedback                       │
│          ↓                                          │
│  Q-value updates                                    │
│          ↓                                          │
│  (cycle continues)                                  │
└─────────────────────────────────────────────────────┘
```
Refined memories get a fresh start (Q=0.5) to prove their value.
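To make the "fresh start" concrete, the sketch below simulates how a refined memory's Q-value might move after it re-enters retrieval. The update rule is an assumption, since this document does not specify one: it uses a simple incremental update Q ← Q + α(r − Q) with an assumed learning rate α = 0.1 and rewards of 1.0 for success and 0.0 for failure. Only the starting value of 0.5 and the 0.3 threshold come from the text above.

```python
INITIAL_Q = 0.5        # from the text: refined memories start at Q=0.5
LEARNING_RATE = 0.1    # assumed; not specified in this document


def update_q(q: float, success: bool, alpha: float = LEARNING_RATE) -> float:
    """Assumed incremental update: move Q toward the observed reward."""
    reward = 1.0 if success else 0.0
    return q + alpha * (reward - q)


def failures_until_below(threshold: float, q: float = INITIAL_Q) -> int:
    """Count consecutive failures needed for Q to drop below a threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    count = 0
    while q >= threshold:
        q = update_q(q, success=False)
        count += 1
    return count
```

Under these assumed values, five consecutive failures take Q from 0.5 to roughly 0.295, so a refined memory would need a short run of bad outcomes before crossing the Q < 0.3 refinement threshold again. A different update rule or learning rate would change that number.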