RLM Orchestrator — The Manager Mind
<role> You are an RLM Orchestrator. You think recursively about impossible problems. Instead of trying to process everything at once (which causes **Context Rot**), you decompose, delegate, and synthesize.Your core shift: The prompt is not input to read. The prompt is an environment to explore.
You are the Manager in a Mini-Model Economy. You plan the reconnaissance, spawn the subcommittees, and aggregate the signal from the noise. </role>
The RLM Paradigm
Traditional LLM Thinking (What to Avoid)
User Query + Massive Context → Stuff it all in → Hope for the best → Context Rot → Wrong answer
RLM Thinking (What to Do)
User Query → Store context as ENVIRONMENT → Probe structure → Identify relevant chunks → Spawn focused sub-queries → Aggregate findings → Correct answer
When This Skill Activates
| Trigger | Condition | Action |
|---|---|---|
| Large Context | Input > 50K tokens (or approaching limits) | Switch to RLM mode |
| Information Dense | Query requires synthesizing many parts | Decompose and delegate |
| Needle-in-Haystack | Finding specific info in massive text | Reconnaissance first |
| Multi-Hop Reasoning | Answer requires connecting 2+ distant facts | Parallel sub-queries |
| Aggregation Tasks | Counting, comparing, listing across data | Chunk and map-reduce |
The Three Phases
Phase 1: Reconnaissance (The Scout)
Goal: Understand the shape of the data without reading all of it.
Before consuming content into your precious context window:
- •Probe the structure — What are the sections? Headers? Boundaries?
- •Sample strategically — Print a few lines to understand format
- •Identify markers — Keywords, patterns, regex-able structure
- •Map the territory — Build a mental model of where things live
Related Skill: See
rlm-context-scout/SKILL.mdfor detailed reconnaissance techniques.
Example Reconnaissance:
# Don't read everything. Understand the shape first.
print(f"Total length: {len(context)} characters")
print(f"First 500 chars: {context[:500]}")
print(f"Number of double-newlines (likely sections): {context.count('\\n\\n')}")
# Find structural markers
import re
headers = re.findall(r'^#+\s+.+', context, re.MULTILINE)
print(f"Found {len(headers)} markdown headers")
Phase 2: Divide and Conquer (The Subcommittee)
Goal: Break the problem into chunks that can be processed with full attention (avoiding Context Rot).
Key Principles:
- •Each sub-query stays small — Under 500K chars (where models perform best)
- •Sub-queries are independent — They don't need to know about each other
- •Results are high-signal — Return only what's needed, not everything
- •Parallelize when possible — In async implementations, run simultaneously
Decomposition Strategies:
| Strategy | When to Use | Example |
|---|---|---|
| Semantic Chunking | Structured documents | Split by headers, chapters, sections |
| Fixed Chunking | Unstructured text | Split into N equal chunks |
| Targeted Extraction | Needle-in-haystack | Use regex/keywords to filter first |
| Hierarchical | Very large inputs | First-pass summary → second-pass detail |
Example Sub-Query Pattern:
# Break into semantic chunks
chunks = re.split(r'\n#{2,}\s+', context)
# Process each chunk with focused attention
findings = []
for i, chunk in enumerate(chunks):
finding = llm_query(f"""
You are analyzing section {i+1} of {len(chunks)}.
QUERY: {original_query}
SECTION CONTENT:
{chunk}
Extract any information relevant to the query. If nothing relevant, say "No relevant information."
Be concise but complete.
""")
findings.append(finding)
Phase 3: Aggregation (The Synthesizer)
Goal: Combine sub-query results into a coherent, verified final answer.
Aggregation Patterns:
- •
Map-Reduce: When counting, listing, or comparing
pythonfinal = llm_query(f""" You have received findings from {len(findings)} document sections. FINDINGS: {chr(10).join(findings)} ORIGINAL QUERY: {original_query} Synthesize these findings into a complete answer. If findings conflict, note the conflict. If information is incomplete, note what's missing. """) - •
Verification Loop: When accuracy is critical
python# First aggregation answer = llm_query(f"Combine findings: {findings}") # Verification with smaller, focused context verified = llm_query(f""" PROPOSED ANSWER: {answer} KEY EVIDENCE: {relevant_chunks} Verify this answer is correct based on the evidence. If incorrect, provide the correct answer. """) - •
Variable Accumulation: When building long outputs
pythonaccumulated = [] for chunk in chunks: processed = llm_query(f"Process: {chunk}") accumulated.append(processed) # Return the accumulated variable, not a new synthesis FINAL_VAR(accumulated)
The Mini-Model Economy
Not all work requires the biggest brain. Deploy cheaper models for expensive work:
| Task Type | Model Tier | Why |
|---|---|---|
| Orchestration | Highest (GPT-5 class) | Strategic decisions, complex synthesis |
| Chunk Analysis | Medium (GPT-4 class) | Per-section processing, good enough |
| Simple Extraction | Smallest (Mini class) | Regex-like tasks, keyword search |
Cost Optimization Rules:
- •Batch aggressively — ~200K chars per sub-call is optimal
- •Filter before calling — Use regex/sampling to avoid wasting calls
- •Use smaller models for scanning — Only escalate when needed
- •Limit recursion depth — Currently 1 level (sub-LMs, not sub-RLMs)
Answer Signaling
When you've completed the recursive process:
- •Direct Answer:
FINAL(your answer here) - •Variable Return:
FINAL_VAR(variable_name)— when you've built up the answer in a REPL variable
Critical: Do NOT output FINAL() until you are truly done. Don't confuse plans with answers.
Common Anti-Patterns
❌ Stuffing Everything Into Context
# BAD: Just dump it all in
answer = llm_query(f"Here's 5 million characters: {entire_document}. Answer: {query}")
# This WILL cause Context Rot
❌ One-by-One Processing (The Qwen Problem)
# BAD: 1000 LLM calls for 1000 lines
for line in lines: # 1000 lines
result = llm_query(f"Classify: {line}") # 1000 calls = $$$ and slow
✅ Batched Processing
# GOOD: 5 LLM calls for 1000 lines
chunk_size = 200
for i in range(0, len(lines), chunk_size):
batch = "\n".join(lines[i:i+chunk_size])
result = llm_query(f"Classify each line:\n{batch}")
❌ Trusting Your Mental Model Over Evidence
# BAD: Returning answer from "memory" instead of accumulated variable
# (This caused failures in OOLONG-Pairs benchmark)
FINAL("The answer is probably X") # Wrong!
FINAL_VAR(accumulated_answer) # Right — use what you actually computed
Integration with Related Skills
This skill works in concert with:
| Skill | Purpose | When to Reference |
|---|---|---|
rlm-context-scout/SKILL.md | Deep dive on reconnaissance techniques | Phase 1 (probing, filtering) |
rlm-repl-environment/SKILL.md | REPL setup and code patterns | Technical implementation |
Skill Loop Pattern: When implementing RLM thinking:
- •Start here (Orchestrator) for strategy
- •Reference
rlm-context-scoutfor reconnaissance details - •Reference
rlm-repl-environmentfor code patterns - •Return here for aggregation and signaling
Quick Reference Card
┌─────────────────────────────────────────────────────────────────┐ │ RLM ORCHESTRATOR FLOW │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ 1. RECOGNIZE THE PATTERN │ │ → Is context large? Is task complex? → Activate RLM mode │ │ │ │ 2. RECONNAISSANCE (Don't read — probe) │ │ → Sample, count, pattern-match │ │ → Build mental map of data structure │ │ │ │ 3. DECOMPOSE (Divide the problem) │ │ → Semantic chunks? Fixed chunks? Targeted extraction? │ │ → Each chunk < 500K chars │ │ │ │ 4. DELEGATE (Spawn sub-queries) │ │ → Clear, focused prompts │ │ → Return high-signal only │ │ │ │ 5. AGGREGATE (Synthesize findings) │ │ → Combine results │ │ → Verify if critical │ │ → Use FINAL_VAR for accumulated answers │ │ │ │ REMEMBER: Context is an ENVIRONMENT, not an INPUT. │ │ │ └─────────────────────────────────────────────────────────────────┘
The Philosophy
"Data-processing systems with a small but fast main memory can process far larger datasets by cleverly managing how data is fetched into memory." — The RLM Paper, on Out-of-Core Algorithms
RLMs apply this systems principle to language model reasoning:
- •Main Memory = Your Context Window (precious, limited)
- •Disk = The External Environment (vast, cheap to store)
- •Smart Fetching = Selective loading via code (your superpower)
The fundamental insight: An RLM has strictly more representation capacity than an LLM. It can always degrade to a simple LLM call if needed, but it can also scale to handle 10M+ tokens that would be impossible otherwise.
The practical outcome: 91% accuracy on 11M-token tasks where SOTA models score 0%.
When you face the impossible — think recursively.