RLM Long-Context Processing (Experimental)
For production use, prefer the
rlmskill which uses the fleet-rlm package with Modal cloud sandboxes.Use this skill for evaluating alternative RLM strategies, researching optimization techniques, or comparing approaches.
Scripts & References
| Resource | Purpose |
|---|---|
scripts/orchestrate.py | Main orchestrator with all optimizations |
scripts/rank_chunks.py | Query-guided chunk selection (5-10x speedup) |
scripts/semantic_chunk.py | Content-aware chunking by boundaries |
scripts/cache_manager.py | Result caching for repeated queries |
scripts/codebase_concat.py | Concatenate codebase files for processing |
| references/advanced-techniques.md | Query-guided selection, semantic chunking, adaptive sizing, map-reduce, streaming, caching |
| references/codebase-processing.md | Whole-codebase analysis: concatenation, code chunking, file selection, query types |
Architecture
code
Main Agent (Orchestrator)
+-- Query-Guided Selection (filter chunks by relevance)
+-- Semantic Chunking (content-aware boundaries)
+-- Parallel subagent delegation with caching
|
+----+----+
v v v
Chunk A Chunk B Chunk C (skipped if low relevance)
| |
v v
rlm-subcall rlm-subcall
| |
+----+----> Result Caching + Streaming (early exit)
v
Hierarchical Merge (>1M tokens: chunk > summary > synthesis)
v
Final Answer (Main Agent)
Core Workflow
1. Prepare Content with Scripts
bash
# Rank chunks by query relevance (skip irrelevant ones) python3 .claude/skills/rlm-long-context/scripts/rank_chunks.py \ --query "find all timeout errors" --top-k 10 # Chunk by semantic boundaries (auto-detects content type) python3 .claude/skills/rlm-long-context/scripts/semantic_chunk.py \ --state .claude/rlm_state/state.pkl # For codebases: concatenate files first python3 .claude/skills/rlm-long-context/scripts/codebase_concat.py \ --root ./src --output codebase.txt
2. Choose Chunking Strategy
| Strategy | When to Use | Tool |
|---|---|---|
| Query-guided selection | Query has clear keywords | scripts/rank_chunks.py |
| Semantic chunking | Structured content (logs, markdown, JSON, code) | scripts/semantic_chunk.py |
| Adaptive sizing | Mixed-density content | See advanced-techniques.md |
| Fixed-size with overlap | Unstructured text | scripts/orchestrate.py |
For detailed code examples, see references/advanced-techniques.md. For whole-codebase analysis, see references/codebase-processing.md.
3. Delegate to Subagents
For each selected chunk, invoke rlm-subcall:
yaml
subagent: rlm-subcall input: query: "Find all ERROR entries and their timestamps" chunk_path: ".claude/rlm_state/chunks/chunk_001.txt" chunk_id: "chunk_001" format: "json"
Expected output:
json
{
"chunk_id": "chunk_001",
"relevant": [{ "point": "...", "evidence": "...", "confidence": "high" }],
"missing": ["what could not be determined"],
"suggested_queries": ["follow-up questions"]
}
4. Collect & Synthesize
Merge results, identify cross-chunk patterns, produce final answer. For files > 1M tokens: use hierarchical map-reduce (see advanced-techniques.md).
NEVER List
- •NEVER paste entire chunks into main chat context — causes context overflow. Quote only findings (<1KB).
- •NEVER spawn subagents from subagents — exponential resource consumption. Orchestration stays in main agent.
- •NEVER split content mid-logical-unit — use semantic chunking with boundary detection.
- •NEVER skip result validation before caching — corrupted results poison the cache.
- •NEVER use fixed-size chunks without overlap for structured data — 10% overlap minimum or semantic boundaries.
- •NEVER process all chunks when query is specific — use query-guided selection first; process only top-K.
Limitations
- •Subagent outputs accumulate in main context: monitor total size
- •Parallel execution limited by available subagent workers
- •File must fit in memory (typically 2-4GB)
- •No automatic retry on subagent failure (implement manually)