RLM (Recursive Language Model) Skill
When to Use RLM
Activate RLM processing when ANY of the following apply:
- Context exceeds threshold: Total input > 100K tokens (~400K chars)
- Multi-document reasoning: Task requires synthesizing 5+ documents
- Aggregation tasks: Questions about "all", "every", "count of", "list all pairs"
- Information-dense queries: Answer depends on most/all of the input (not needle-in-haystack)
- Explicit request: User asks to "analyze this codebase", "process these logs"
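The activation criteria above can be sketched as a single predicate. This is a minimal illustration, not part of the RLM server: the `Task` type, its fields, and `should_use_rlm` are hypothetical names, and the 4-characters-per-token ratio is simply the one implied by "100K tokens (~400K chars)".

```python
import re
from dataclasses import dataclass

TOKEN_THRESHOLD = 100_000      # "Total input > 100K tokens"
CHARS_PER_TOKEN = 4            # implied by "~400K chars"; a rough estimate only
MIN_DOCS_FOR_SYNTHESIS = 5     # "synthesizing 5+ documents"
# Word boundaries keep "all" from matching inside words like "small" or "call".
AGGREGATION_CUES = re.compile(r"\b(all|every|count of|list all pairs)\b")


@dataclass
class Task:
    """Hypothetical container for one request; not an RLM type."""
    query: str
    documents: list[str]
    information_dense: bool = False   # judged by the caller
    explicit_request: bool = False    # e.g. "analyze this codebase"


def should_use_rlm(task: Task) -> bool:
    """Return True when ANY of the activation criteria applies."""
    total_chars = len(task.query) + sum(len(d) for d in task.documents)
    estimated_tokens = total_chars / CHARS_PER_TOKEN
    return any((
        estimated_tokens > TOKEN_THRESHOLD,
        len(task.documents) >= MIN_DOCS_FOR_SYNTHESIS,
        AGGREGATION_CUES.search(task.query.lower()) is not None,
        task.information_dense,
        task.explicit_request,
    ))
```

Keyword matching is a crude stand-in for recognizing aggregation tasks; a real client would more likely let the root model make that call.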
Subcall Model Recommendations (Advisory)
The MCP server does not enforce model selection. These are recommendations:
| Root Model | Subcall Model | Bulk/Map Model | Use Case |
|---|---|---|---|
| Opus 4.5 | Sonnet 4.5 | Haiku 4.5 | Complex synthesis, deep analysis |
| Sonnet 4.5 | Haiku 4.5 | Haiku 4.5 | Standard workflows |
| Haiku 4.5 | Haiku 4.5 | Haiku 4.5 | Cost-sensitive, simple aggregation |
Critical: Haiku is for bulk passes only. Never invoke Haiku once per line; batch lines into chunks first.
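One way to follow the batching rule is to pack lines into character-budgeted batches before any bulk subcall. The sketch below also encodes the table as a lookup. `MODEL_TIERS`, `batch_lines`, and the 50,000-character default are illustrative assumptions, not server settings.

```python
# Root model -> (subcall model, bulk/map model), copied from the table above.
MODEL_TIERS = {
    "Opus 4.5": ("Sonnet 4.5", "Haiku 4.5"),
    "Sonnet 4.5": ("Haiku 4.5", "Haiku 4.5"),
    "Haiku 4.5": ("Haiku 4.5", "Haiku 4.5"),
}


def batch_lines(lines, max_chars=50_000):
    """Group lines into newline-joined batches of at most max_chars each.

    A single line longer than max_chars still becomes its own batch,
    so that no input is silently dropped.
    """
    batches, current, size = [], [], 0
    for line in lines:
        line_len = len(line) + 1  # +1 accounts for the joining newline
        if current and size + line_len > max_chars:
            batches.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += line_len
    if current:
        batches.append("\n".join(current))
    return batches
```

With this shape, a 1,000-line log costs a handful of bulk subcalls rather than 1,000.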
Workflow Pattern
- Initialize: rlm.session.create with appropriate config
- Load: rlm.docs.load documents
- Probe: rlm.docs.peek at structure (first lines, format detection)
- Search: rlm.search.query to find relevant sections (lazy-builds BM25)
- Chunk: rlm.chunk.create with appropriate strategy
- Process: rlm.span.get + client subcalls on spans
- Store: rlm.artifact.store results with span provenance
- Synthesize: Aggregate artifacts into final answer
- Close: rlm.session.close
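The call order in the workflow can be sketched as below. Only the tool names come from this document. Every argument name and response field (`session_id`, `doc_ids`, `span_ids`, `text`, and so on) is a hypothetical placeholder, and `call_tool` stands in for whatever MCP client function is actually in use. A canned fake is included so the sketch runs without a server.

```python
def run_rlm_workflow(call_tool, paths, query, subcall):
    """Walk the workflow steps in order.

    call_tool(name, args) -> dict and subcall(text) -> str are supplied by
    the caller. All argument and response keys here are assumptions.
    """
    sid = call_tool("rlm.session.create", {"config": {}})["session_id"]
    try:
        loaded = call_tool("rlm.docs.load", {"session_id": sid, "paths": paths})
        doc_ids = loaded["doc_ids"]
        for doc_id in doc_ids:
            # Probe: a real client would use this to pick a chunking strategy.
            call_tool("rlm.docs.peek", {"session_id": sid, "doc_id": doc_id})
        # Search: a real client would use the hits to narrow which spans to read.
        call_tool("rlm.search.query", {"session_id": sid, "query": query})
        findings = []
        for doc_id in doc_ids:
            chunked = call_tool(
                "rlm.chunk.create",
                {"session_id": sid, "doc_id": doc_id, "strategy": "lines"},
            )
            for span_id in chunked["span_ids"]:
                span = call_tool(
                    "rlm.span.get", {"session_id": sid, "span_id": span_id}
                )
                result = subcall(span["text"])
                # Store with the span id so the artifact keeps its provenance.
                call_tool(
                    "rlm.artifact.store",
                    {"session_id": sid, "span_id": span_id, "content": result},
                )
                findings.append(result)
        # Synthesize: joining is a placeholder for a root-model synthesis pass.
        return "\n".join(findings)
    finally:
        # Close the session even if a step above raised.
        call_tool("rlm.session.close", {"session_id": sid})


# Canned responses so the sketch can run offline; shapes are invented.
CANNED = {
    "rlm.session.create": {"session_id": "s1"},
    "rlm.docs.load": {"doc_ids": ["d1"]},
    "rlm.docs.peek": {"content": "..."},
    "rlm.search.query": {"matches": []},
    "rlm.chunk.create": {"span_ids": ["sp1", "sp2"]},
    "rlm.span.get": {"text": "line"},
    "rlm.artifact.store": {"artifact_id": "a1"},
    "rlm.session.close": {},
}
calls = []


def fake_call_tool(name, args):
    calls.append(name)
    return CANNED[name]


answer = run_rlm_workflow(fake_call_tool, ["app.log"], "errors", str.upper)
```

Putting the close step in `finally` is one way to avoid leaking a session when a subcall fails partway through.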
Chunking Strategy Selection
| Content Type | Strategy | Params | Rationale |
|---|---|---|---|
| Source code | delimiter | "\ndef \|\nclass " | Preserve semantic units |
| Logs | lines | line_count: 100, overlap: 10 | Temporal locality |
| Markdown | delimiter | "\n## " | Section boundaries |
| JSON/JSONL | lines | line_count: 1 | Record-level processing |
| Plain text | fixed | chunk_size: 50000, overlap: 500 | Balanced chunks |
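To make the `lines` and `fixed` rows concrete, here is a client-side sketch of overlapping windows. It only illustrates what `line_count`, `chunk_size`, and `overlap` plausibly mean; the server's `rlm.chunk.create` may behave differently, and the dictionary keys such as `delimiter` and `source_code` are assumptions.

```python
# The table above as a lookup. Parameter values are from the table;
# the key names are hypothetical.
STRATEGIES = {
    "source_code": {"strategy": "delimiter", "delimiter": "\ndef |\nclass "},
    "logs": {"strategy": "lines", "line_count": 100, "overlap": 10},
    "markdown": {"strategy": "delimiter", "delimiter": "\n## "},
    "jsonl": {"strategy": "lines", "line_count": 1},
    "plain_text": {"strategy": "fixed", "chunk_size": 50_000, "overlap": 500},
}


def windows(seq, size, overlap=0):
    """Slice seq into windows of `size` items that share `overlap` items."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    out = []
    for start in range(0, len(seq), step):
        out.append(seq[start:start + size])
        if start + size >= len(seq):
            break  # the last window already reaches the end
    return out


def chunk_lines(text, line_count, overlap=0):
    """'lines' strategy: windows over lines, rejoined with newlines."""
    return ["\n".join(w) for w in windows(text.splitlines(), line_count, overlap)]


def chunk_fixed(text, chunk_size, overlap=0):
    """'fixed' strategy: windows over raw characters."""
    return windows(text, chunk_size, overlap)
```

The overlap means an event that straddles a boundary appears whole in at least one chunk, which is the point of the "temporal locality" rationale for logs.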
Cost Guardrails
- Max tool calls per session: 500 (default), warn at 400
- Max chars per response: 50K (server-enforced)
- Max chars per peek: 10K (server-enforced)
- Batch aggressively: Prefer 10 docs per subcall over 1 doc per subcall
- Cache reuse: Check artifacts before re-querying same span
- Use span provenance: Every artifact should trace back to its source span
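A client can mirror the call budget and the cache-reuse rule locally. The following is a rough sketch under stated assumptions: `CallBudget` and `cached_subcall` are hypothetical helpers, and the cache is an in-memory dict keyed by span id rather than the server's artifact store.

```python
import warnings


class CallBudget:
    """Client-side counter mirroring the 500-call limit and 400-call warning."""

    def __init__(self, max_calls=500, warn_at=400):
        self.max_calls = max_calls
        self.warn_at = warn_at
        self.used = 0

    def spend(self):
        """Record one call; raise before the limit would be exceeded."""
        if self.used >= self.max_calls:
            raise RuntimeError(f"call budget of {self.max_calls} exhausted")
        self.used += 1
        if self.used == self.warn_at:
            warnings.warn(f"{self.used}/{self.max_calls} calls used")


def cached_subcall(cache, budget, span_id, text, run):
    """Reuse a stored result for span_id, spending budget only on a miss."""
    if span_id in cache:
        return cache[span_id]["content"]
    budget.spend()
    content = run(text)
    # Keep the span id next to the result so provenance survives.
    cache[span_id] = {"span_id": span_id, "content": content}
    return content
```

If the same span is analyzed with different prompts, the cache key would need to include the prompt as well; the sketch assumes one analysis per span.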
Anti-patterns to Avoid
❌ One subcall per line (Qwen3-Coder's failure mode: thousands of calls)
❌ Loading the entire context into a single subcall
❌ Ignoring cached artifacts from prior analysis
❌ Returning raw subcall outputs without synthesis
❌ Forgetting to close session
❌ Ignoring truncated: true in responses
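As an illustration of the last point, a client might keep reading until a response no longer reports truncation. This document does not specify how to request the remainder, so the sketch assumes a caller-supplied `fetch(span_id, offset)` that returns a dict with `text` and `truncated` keys. The offset parameter and both field names are hypothetical.

```python
def read_full_span(fetch, span_id, max_rounds=100):
    """Concatenate pieces until a response arrives without truncated=True.

    fetch(span_id, offset) -> {"text": str, "truncated": bool} is an
    assumed interface, not a documented RLM call.
    """
    parts, offset = [], 0
    for _ in range(max_rounds):
        response = fetch(span_id, offset)
        text = response["text"]
        parts.append(text)
        offset += len(text)
        if not response.get("truncated"):
            return "".join(parts)
        if not text:
            # Guard against looping forever on an empty truncated reply.
            raise RuntimeError("truncated response made no progress")
    raise RuntimeError(f"span {span_id!r} still truncated after {max_rounds} rounds")


# Offline stand-in that enforces a 50K-character response cap.
DATA = "x" * 120_000
LIMIT = 50_000
fetch_offsets = []


def fake_fetch(span_id, offset):
    fetch_offsets.append(offset)
    piece = DATA[offset:offset + LIMIT]
    return {"text": piece, "truncated": offset + len(piece) < len(DATA)}


full_text = read_full_span(fake_fetch, "sp1")
```

An alternative that needs no paging is to re-chunk the document into smaller spans so that each response fits under the cap.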