Coder Memory Recall V3.2 - Clean Vector Architecture

Purpose: Efficiently retrieve relevant patterns using two-stage retrieval (preview → full content).

When to Use:

•Before starting complex/unfamiliar tasks
•When encountering technical problems
•User says "--recall" or asks for past patterns
•Need architectural guidance or debugging strategies

When NOT to Use:

•Routine/trivial tasks
•Just recalled similar knowledge
•Project-specific questions (use git history)

Remember: Search for both #success and #failure - failures often more valuable.

EMBEDDED ROLE CONFIGURATION

yaml

# Embedded configuration - no external files needed
role_collections:
  global:
    universal:
      name: "universal-patterns"
      description: "Search here for cross-domain patterns"
      query_hints: ["general", "architecture", "debugging", "performance"]

    backend:
      name: "backend-patterns"
      description: "Backend engineering patterns"
      query_hints: ["api", "database", "auth", "server", "microservices"]

    frontend:
      name: "frontend-patterns"
      description: "Frontend engineering patterns"
      query_hints: ["react", "vue", "ui", "component", "state"]

    quant:
      name: "quant-patterns"
      description: "Quantitative finance patterns"
      query_hints: ["trading", "backtest", "risk", "portfolio"]

    devops:
      name: "devops-patterns"
      description: "DevOps and infrastructure patterns"
      query_hints: ["docker", "kubernetes", "ci-cd", "terraform"]

    ml:
      name: "ml-patterns"
      description: "Machine learning patterns"
      query_hints: ["model", "training", "neural", "llm", "embedding"]

    security:
      name: "security-patterns"
      description: "Security engineering patterns"
      query_hints: ["vulnerability", "encryption", "auth", "pentest"]

    mobile:
      name: "mobile-patterns"
      description: "Mobile development patterns"
      query_hints: ["ios", "android", "react-native", "flutter"]

# Role detection from task context
role_detection:
  patterns:
    backend: "api|endpoint|database|server|auth|rest|graphql"
    frontend: "react|vue|component|ui|dom|css|state"
    quant: "trading|backtest|portfolio|risk|market"
    devops: "deploy|docker|kubernetes|ci|cd"
    ml: "model|training|neural|embedding|llm"
    security: "vulnerability|encryption|pentest|jwt"
    mobile: "ios|android|native|flutter|swift"

  multi_role_strategy: "search_all"  # When multiple roles detected
  default_role: "universal"          # When no clear role

PHASE 1: Intelligent Query Construction

Role Detection

python

# Analyze task context for role keywords
detected_roles = detect_roles_from_context(task_description)

if len(detected_roles) == 0:
    roles_to_search = ["universal"]
elif len(detected_roles) == 1:
    roles_to_search = detected_roles
else:  # Multiple roles detected
    roles_to_search = detected_roles + ["universal"]

Query Building

Semantic Query (not just keywords):

python

# Build 2-3 sentence summary for semantic search
query = build_semantic_query(
    problem_description,
    technical_context,
    desired_outcome
)

# Example output:
# "Implementing rate limiting for REST API to prevent abuse.
#  Need patterns for handling burst traffic and client fairness.
#  Looking for proven approaches that scale."

Memory Type Hints:

•Starting new implementation? → Focus on procedural/semantic
•Debugging specific issue? → Focus on episodic
•Need architecture guidance? → Focus on semantic
•Unclear? → Search all types

PHASE 2: Two-Stage Retrieval

Stage 1: Search for Previews

python

all_previews = []
for role in roles_to_search:
    previews = search_memory(
        query=semantic_query,
        memory_level="global",
        role=role,
        limit=10  # Cast wider net
    )
    all_previews.extend(previews.results)

# Sort by similarity across all roles
all_previews.sort(key=lambda x: x.similarity, reverse=True)

Preview Analysis (Intelligence, not thresholds):

python

relevant_previews = []
for preview in all_previews:
    relevance = analyze_preview(
        preview.title,
        preview.description,
        preview.tags,
        task_context
    )
    if relevance.is_relevant:
        relevant_previews.append({
            "preview": preview,
            "relevance_reason": relevance.reason,
            "priority": relevance.priority  # high/medium/low
        })

Stage 2: Retrieve Full Content

Efficient Batch Retrieval:

python

# Group by role for efficient retrieval
by_role = group_by_role(relevant_previews)

full_memories = {}
for role, preview_group in by_role.items():
    doc_ids = [p.preview.doc_id for p in preview_group]

    # Batch retrieve for efficiency
    memories = batch_get_memories(
        doc_ids=doc_ids,
        memory_level="global",
        role=role
    )
    full_memories[role] = memories.memories

Token Efficiency:

•Search examined: 10-30 previews (~500 tokens)
•Full content retrieved: 3-5 memories (~2000 tokens)
•Savings: ~80% vs retrieving all full content

PHASE 3: Relevance Ranking

Multi-Factor Scoring

python

def rank_memories(full_memories, task_context):
    scored = []
    for memory in full_memories:
        score = calculate_relevance(
            semantic_similarity=memory.similarity,
            memory_type_match=matches_needed_type(memory.type),
            tag_overlap=count_relevant_tags(memory.tags),
            temporal_relevance=get_recency_weight(memory.created_at),
            outcome_alignment=matches_desired_outcome(memory.tags)
        )
        scored.append((memory, score))

    # Return top 3 most relevant
    return sorted(scored, key=lambda x: x[1], reverse=True)[:3]

Outcome Consideration

•Need proven solution? → Prioritize #success
•Debugging failure? → Prioritize #failure patterns
•Exploring options? → Mix both

PHASE 4: Present Results

Standard Format

code

🔍 Memory Recall Results

**Query**: <user question or inferred need>
**Roles Searched**: backend, universal
**Previews Examined**: 24
**Memories Retrieved**: 5

---

## 🥇 Most Relevant: [Title]

**Role**: backend-patterns
**Type**: Procedural
**Relevance**: High - Direct pattern match for rate limiting
**Tags**: #backend #api #rate-limiting #success

<Full memory content here>

**Why This Helps**: <1-2 sentences on specific applicability>

---

## 🥈 Relevant: [Title]

[Similar format]

---

## 🥉 Related: [Title]

[Similar format]

---

## 💡 Application Guidance

<2-3 sentences synthesizing insights and suggesting application>

**Token Efficiency**: Saved ~2,400 tokens by retrieving only relevant memories

No Results Format

code

🔍 Memory Recall Results

**Query**: <query>
**Roles Searched**: backend, frontend, universal
**Previews Examined**: 30
**Relevant Found**: 0

No existing patterns match your specific need.

**Suggestions**:
- This might be a novel problem worth storing after solving
- Try broader search terms or different role
- Proceed with first principles and document learnings

PHASE 5: Learning Feedback Loop

Update Recall Metadata

python

# Track which memories were helpful
for memory in presented_memories:
    update_memory(
        doc_id=memory.doc_id,
        document=memory.document,  # Unchanged
        metadata={
            ...existing,
            "last_recall_time": now(),
            "recall_count": existing.recall_count + 1,
            "helpfulness": track_if_applied()  # Future enhancement
        },
        memory_level="global"
    )

Pattern Recognition

If no relevant memories found but task succeeds:

•Strong signal to store new pattern after completion
•Indicates knowledge gap in memory system

Key V3.2 Improvements

•True Two-Stage: Actually implemented, saves ~80% tokens
•Embedded Config: No external files to maintain
•Multi-Role Search: Searches across relevant roles
•Intelligent Ranking: Context-aware relevance scoring
•Efficiency Metrics: Shows token savings

Workflow Examples

Example 1: API Development

code

Task: "Implement webhook processing system"

1. Detect roles: ["backend", "universal"]
2. Build query: "Implementing webhook processing system. Need patterns
                 for reliable delivery, retry logic, and event ordering."
3. Search both collections → 18 previews
4. Analyze previews → 4 relevant
5. Retrieve full → 4 memories
6. Rank by relevance → Present top 3
7. Token efficiency: 80% saved

Example 2: Debugging Session

code

Task: "React component re-rendering infinitely"

1. Detect roles: ["frontend"]
2. Build query: "React component stuck in infinite re-render loop.
                 Need debugging patterns for render cycles and state updates."
3. Search frontend-patterns → 12 previews
4. Focus on #failure tags → 3 relevant
5. Retrieve full → 3 memories
6. All 3 are relevant failure patterns
7. Present with emphasis on failure lessons

Example 3: Cross-Domain Architecture

code

Task: "Design event-driven microservices"

1. Detect roles: ["backend", "devops", "universal"]
2. Build query: "Designing event-driven microservice architecture.
                 Need patterns for event sourcing, service communication."
3. Search all three → 35 previews
4. Analyze → 7 relevant across roles
5. Retrieve full → 7 memories
6. Rank → Mix of backend (events) + devops (deploy) + universal (architecture)
7. Present top 3 with cross-domain synthesis

Testing This Skill

Test Two-Stage

python

# Search should return previews only
result = search_memory("api patterns", "global", role="backend")
assert "doc_id" in result.results[0]
assert "document" not in result.results[0]  # No full content

# Get should return full content
full = get_memory(result.results[0].doc_id, "global", "backend")
assert "document" in full  # Has full content

Test Role Detection

code

"Building REST API" → ["backend"]
"React component styling" → ["frontend"]
"Deploy with Docker" → ["devops"]
"Neural network training" → ["ml"]
"API with React frontend" → ["backend", "frontend", "universal"]

Test Efficiency

code

# Measure tokens
search_tokens = count_tokens(search_results)  # ~500
full_tokens = count_tokens(all_full_memories)  # ~10,000
efficient_tokens = count_tokens(retrieved_only)  # ~2,000
savings = 1 - (search_tokens + efficient_tokens) / full_tokens  # ~75%