qavr-memory

Uses Q-Value Augmented Vector Retrieval to rank learned memories. Tracks which memories prove most useful over time and prioritizes them at retrieval.

SKILL.md
--- frontmatter
name: qavr-memory
description: Q-Value Augmented Vector Retrieval for learned memory ranking. Tracks which memories are most useful over time and prioritizes them in retrieval.
metadata: {"clawdbot":{"requires":{"bins":["python3"]}}}
user-invocable: false
disable-model-invocation: false

QAVR - Q-Value Augmented Vector Retrieval

Memory system that learns which information is most useful over time.

Concept

Standard vector retrieval returns results by semantic similarity alone. QAVR adds learned utility scoring based on actual usage outcomes:

code
Final Score = (1 - α) × Semantic Similarity + α × Q-Value

Where:

  • Semantic Similarity: How relevant the memory is to the query
  • Q-Value: Learned utility score (0.0 - 1.0) based on past usefulness
  • α: Blending factor (0 while a context is cold, 0.3 once it is warm)
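
To make the blend concrete, here is a minimal sketch of the formula above. The function name `final_score` and the two example memories are illustrative assumptions, not part of the skill itself.

```python
def final_score(similarity: float, q_value: float, alpha: float) -> float:
    """Blend semantic similarity with a learned Q-value."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * similarity + alpha * q_value


# Two hypothetical memories competing for the same query, scored with alpha = 0.3.
close_but_unhelpful = final_score(similarity=0.90, q_value=0.20, alpha=0.3)
further_but_useful = final_score(similarity=0.80, q_value=0.90, alpha=0.3)

# The semantically weaker memory wins once its track record is counted.
print(close_but_unhelpful < further_but_useful)
```

With α = 0 the function returns the similarity unchanged, which is the cold-context behavior described in the next section.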

How It Works

Cold Context (< 100 interactions)

  • Pure semantic similarity (α = 0)
  • Q-values are collected but not yet used for ranking
  • Learning phase

Warm Context (≥ 100 interactions)

  • Q-value re-ranking active (α = 0.3)
  • Memories that led to successful outcomes ranked higher
  • Continuous learning from feedback
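
The cold/warm switch can be sketched as a threshold check on a context's interaction count. The helper names `mode_for` and `alpha_for` are hypothetical; the default threshold of 100 and the warm α of 0.3 come from the text above.

```python
def mode_for(interactions: int, warm_threshold: int = 100) -> str:
    """Return "warm" once a context has reached the threshold, else "cold"."""
    return "warm" if interactions >= warm_threshold else "cold"


def alpha_for(interactions: int, warm_threshold: int = 100,
              alpha_warm: float = 0.3) -> float:
    """Blending factor: 0 in a cold context, alpha_warm in a warm one."""
    if mode_for(interactions, warm_threshold) == "warm":
        return alpha_warm
    return 0.0


for count in (82, 100, 156):
    print(count, mode_for(count), alpha_for(count))
```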

Q-Value Updates

After each interaction:

python
# Example values; learning rate and decay factor match the default config
q_old = 0.75
learning_rate = 0.1
decay_factor = 0.99  # e.g., 0.99 per day

# Outcome update: reward is 1.0 for success, 0.0 for failure
reward = 1.0
q_new = q_old + learning_rate * (reward - q_old)

# Temporal decay (unused memories fade)
q_decayed = q_old * decay_factor
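
To show how these two rules play out over several interactions, the sketch below traces one memory through a short outcome history and then an idle month. The starting value of 0.5 and the assumption that decay compounds once per idle day are illustrative; the source does not specify either.

```python
def update_q(q_old: float, reward: float, learning_rate: float = 0.1) -> float:
    """Move the Q-value a fraction of the way toward the observed reward."""
    return q_old + learning_rate * (reward - q_old)


def decay_q(q_old: float, idle_days: int, decay_factor: float = 0.99) -> float:
    """Apply one decay step per idle day (assumed to compound daily)."""
    return q_old * decay_factor ** idle_days


q = 0.5  # assumed starting value
history = [q]
for succeeded in [True, True, False, True]:
    q = update_q(q, reward=1.0 if succeeded else 0.0)
    history.append(q)

q_after_idle_month = decay_q(q, idle_days=30)

print([round(value, 5) for value in history])
print(round(q_after_idle_month, 5))
```

Because each update moves the value only 10% of the way toward the reward, a single failure dents the score without erasing the earlier successes.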

Implementation

Storage Format

json
{
  "memories": {
    "memory_id_1": {
      "q_value": 0.75,
      "access_count": 12,
      "last_accessed": "2026-01-26",
      "success_count": 9,
      "failure_count": 3
    }
  },
  "contexts": {
    "debugging": {"interactions": 82, "mode": "cold"},
    "coding": {"interactions": 156, "mode": "warm"}
  },
  "config": {
    "learning_rate": 0.1,
    "decay_factor": 0.99,
    "warm_threshold": 100
  }
}
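
One way to read and write this format is a small dataclass per memory plus two helper functions. This is a sketch under the assumption that the state lives in a single JSON document shaped like the example above; `MemoryStats`, `parse_state`, and `dump_state` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MemoryStats:
    q_value: float
    access_count: int
    last_accessed: str  # ISO date string, e.g. "2026-01-26"
    success_count: int
    failure_count: int


def parse_state(raw: str):
    """Split the JSON document into typed memories, contexts, and config."""
    data = json.loads(raw)
    memories = {mid: MemoryStats(**fields)
                for mid, fields in data["memories"].items()}
    return memories, data["contexts"], data["config"]


def dump_state(memories, contexts, config) -> str:
    """Serialize the state back into the same JSON layout."""
    return json.dumps({
        "memories": {mid: asdict(stats) for mid, stats in memories.items()},
        "contexts": contexts,
        "config": config,
    }, indent=2)


EXAMPLE = """
{
  "memories": {
    "memory_id_1": {"q_value": 0.75, "access_count": 12,
                    "last_accessed": "2026-01-26",
                    "success_count": 9, "failure_count": 3}
  },
  "contexts": {"debugging": {"interactions": 82, "mode": "cold"}},
  "config": {"learning_rate": 0.1, "decay_factor": 0.99, "warm_threshold": 100}
}
"""

memories, contexts, config = parse_state(EXAMPLE)
print(memories["memory_id_1"])
```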

Integration with Vector DB

python
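def qavr_query(query_text, collection, n_results=5):
    """Return up to n_results memory IDs, re-ranked by Q-value when warm.

    context_is_warm() and rerank_by_qvalue() are helpers defined elsewhere;
    rerank_by_qvalue() is assumed to return a list of IDs, best first.
    """
    # Get semantic results; over-fetch so re-ranking has candidates to promote
    results = collection.query(
        query_texts=[query_text],
        n_results=n_results * 2,
    )

    # Assumes a Chroma-style result: one list of IDs per query text
    ranked_ids = results["ids"][0]

    # Apply Q-value re-ranking if warm
    if context_is_warm():
        ranked_ids = rerank_by_qvalue(results, alpha=0.3)

    return ranked_ids[:n_results]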
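
The re-ranking step itself is not shown in the source. A minimal sketch follows, assuming a Chroma-style result dict (`"ids"` and `"distances"`, each holding one list per query), cosine distances so that similarity is `1 - distance`, and a neutral default Q-value of 0.5 for memories with no history. Unlike the call above, this variant takes the Q-values as an explicit argument so the example runs on its own.

```python
def rerank_by_qvalue(results: dict, q_values: dict, alpha: float = 0.3,
                     default_q: float = 0.5) -> list:
    """Return memory IDs ordered by blended score, best first."""
    ids = results["ids"][0]
    distances = results["distances"][0]
    scored = []
    for memory_id, distance in zip(ids, distances):
        similarity = 1.0 - distance  # assumption: cosine distance
        q_value = q_values.get(memory_id, default_q)
        score = (1.0 - alpha) * similarity + alpha * q_value
        scored.append((score, memory_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory_id for _, memory_id in scored]


# Stand-in for a vector DB response; no database is needed to run this.
fake_results = {"ids": [["a", "b", "c"]], "distances": [[0.10, 0.20, 0.30]]}
learned_q = {"a": 0.1, "b": 0.9}  # "c" has no history yet

ranked = rerank_by_qvalue(fake_results, learned_q)
print(ranked)
```

Memory "a" is the closest semantic match, but its poor track record lets "b" overtake it. How distances map to similarities depends on the metric the collection was created with, so that line would need adjusting for other metrics.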

Feedback Signals

QAVR learns from implicit signals:

| Signal | Interpretation | Reward |
| --- | --- | --- |
| Memory used in successful task | Highly useful | +1.0 |
| Memory retrieved but not used | Somewhat relevant | +0.1 |
| Memory retrieved, task failed | Possibly misleading | -0.2 |
| Memory not retrieved for days | Decaying relevance | decay |
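
The signal table can be wired into the update rule as a lookup. The sketch below assumes each reward is used as the target value in `q_new = q_old + learning_rate * (reward - q_old)`, and it clamps the result to the stated 0.0 - 1.0 range because a reward of -0.2 could otherwise push a Q-value below zero. Both choices are assumptions, as are the `Signal` and `apply_signal` names. The fourth row (decay) is time-based and is not handled here.

```python
from enum import Enum


class Signal(Enum):
    USED_IN_SUCCESS = "used_in_success"
    RETRIEVED_NOT_USED = "retrieved_not_used"
    RETRIEVED_TASK_FAILED = "retrieved_task_failed"


# Rewards taken from the table above.
REWARDS = {
    Signal.USED_IN_SUCCESS: 1.0,
    Signal.RETRIEVED_NOT_USED: 0.1,
    Signal.RETRIEVED_TASK_FAILED: -0.2,
}


def apply_signal(q_old: float, signal: Signal,
                 learning_rate: float = 0.1) -> float:
    """Update a Q-value from one feedback signal, clamped to [0, 1]."""
    q_new = q_old + learning_rate * (REWARDS[signal] - q_old)
    return min(1.0, max(0.0, q_new))


for signal in Signal:
    print(signal.name, round(apply_signal(0.75, signal), 3))
```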

Benefits

  1. Personalization: Learns YOUR usage patterns
  2. Noise Reduction: Unhelpful memories sink to bottom
  3. Efficiency: Most useful info surfaces first
  4. Adaptation: Adjusts as your needs change

Configuration

json
{
  "qavr": {
    "enabled": true,
    "learning_rate": 0.1,
    "decay_factor": 0.99,
    "warm_threshold": 100,
    "alpha_warm": 0.3,
    "contexts": ["debugging", "coding", "research"]
  }
}
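
A loader for this block might fill in defaults and reject out-of-range values before they reach the ranking code. The following is a sketch only: `QavrConfig` and `load_config` are hypothetical names, the defaults mirror the example above, and the validation bounds are assumptions rather than documented limits.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QavrConfig:
    enabled: bool = True
    learning_rate: float = 0.1
    decay_factor: float = 0.99
    warm_threshold: int = 100
    alpha_warm: float = 0.3
    contexts: tuple = ("debugging", "coding", "research")

    def __post_init__(self):
        # Assumed sanity bounds; the source does not document limits.
        if not 0.0 < self.learning_rate <= 1.0:
            raise ValueError("learning_rate must be in (0, 1]")
        if not 0.0 < self.decay_factor <= 1.0:
            raise ValueError("decay_factor must be in (0, 1]")
        if not 0.0 <= self.alpha_warm <= 1.0:
            raise ValueError("alpha_warm must be in [0, 1]")
        if self.warm_threshold < 0:
            raise ValueError("warm_threshold must be non-negative")


def load_config(raw: str) -> QavrConfig:
    """Parse the "qavr" section, falling back to defaults for missing keys."""
    section = dict(json.loads(raw).get("qavr", {}))
    if "contexts" in section:
        section["contexts"] = tuple(section["contexts"])
    return QavrConfig(**section)


config = load_config('{"qavr": {"alpha_warm": 0.5, "contexts": ["coding"]}}')
print(config)
```

Unknown keys raise a `TypeError` from the dataclass constructor, which surfaces typos in the config file early.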

Monitoring

Check QAVR status:

  • Total memories tracked
  • Context modes (cold/warm)
  • Top Q-value memories
  • Learning progress
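
A status report covering these four items could be computed directly from the stored state. The sketch below assumes the JSON layout from the Storage Format section and defines "learning progress" as each context's interaction count relative to the warm threshold, capped at 1.0. That definition and the `qavr_status` name are assumptions.

```python
def qavr_status(state: dict, top_n: int = 3) -> dict:
    """Summarize tracked memories, context modes, top memories, and progress."""
    memories = state["memories"]
    contexts = state["contexts"]
    threshold = state["config"]["warm_threshold"]
    top = sorted(memories.items(),
                 key=lambda item: item[1]["q_value"], reverse=True)[:top_n]
    return {
        "total_memories": len(memories),
        "context_modes": {name: ctx["mode"] for name, ctx in contexts.items()},
        "top_memories": [(mid, stats["q_value"]) for mid, stats in top],
        "learning_progress": {
            name: min(1.0, ctx["interactions"] / threshold)
            for name, ctx in contexts.items()
        },
    }


# Hypothetical state with two memories and two contexts.
state = {
    "memories": {
        "memory_id_1": {"q_value": 0.75},
        "memory_id_2": {"q_value": 0.40},
    },
    "contexts": {
        "debugging": {"interactions": 82, "mode": "cold"},
        "coding": {"interactions": 156, "mode": "warm"},
    },
    "config": {"warm_threshold": 100},
}

status = qavr_status(state)
print(status)
```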