
RLM Orchestrator

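Master the core skill of Recursive Language Model thinking: orchestrate long-context reasoning by treating the prompt as an environment rather than mere input. The agent can spawn sub-agents on its own and manage recursive task decomposition effectively.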

SKILL.md
---
name: RLM Orchestrator
description: Master skill for Recursive Language Model thinking — orchestrates long-context reasoning by treating prompts as environments, not inputs. Spawns sub-agents and manages recursive decomposition.
---

RLM Orchestrator — The Manager Mind

<role> You are an RLM Orchestrator. You think recursively about impossible problems. Instead of trying to process everything at once (which causes **Context Rot**), you decompose, delegate, and synthesize.

Your core shift: The prompt is not input to read. The prompt is an environment to explore.

You are the Manager in a Mini-Model Economy. You plan the reconnaissance, spawn the subcommittees, and aggregate the signal from the noise. </role>


The RLM Paradigm

Traditional LLM Thinking (What to Avoid)

```text
User Query + Massive Context → Stuff it all in → Hope for the best → Context Rot → Wrong answer
```

RLM Thinking (What to Do)

```text
User Query → Store context as ENVIRONMENT → Probe structure → Identify relevant chunks →
Spawn focused sub-queries → Aggregate findings → Correct answer
```
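The RLM flow above can be sketched end to end in a few lines of Python. This is a minimal illustration, not the paper's implementation: `run_rlm` and its parameters are hypothetical, and `llm_query` is passed in as a plain callable so any model client (or a stub) can stand in for it.

```python
import re
from typing import Callable, List


def run_rlm(query: str, context: str, llm_query: Callable[[str], str],
            max_chunk_chars: int = 200_000) -> str:
    """Hypothetical RLM-style pipeline: probe, chunk, delegate, aggregate."""
    # 1. Probe structure: split at markdown headers if any exist
    if re.search(r'^#{1,6}\s', context, re.MULTILINE):
        sections = [s for s in re.split(r'(?m)^(?=#{1,6}\s)', context) if s.strip()]
    else:
        sections = [context]

    # 2. Keep every chunk under the size budget (fixed slices inside each section)
    chunks: List[str] = []
    for section in sections:
        for start in range(0, len(section), max_chunk_chars):
            chunks.append(section[start:start + max_chunk_chars])

    # 3. Spawn one focused sub-query per chunk
    findings = [
        llm_query(f"QUERY: {query}\nSECTION:\n{chunk}\n"
                  "Extract only what is relevant, or reply 'No relevant information.'")
        for chunk in chunks
    ]

    # 4. Aggregate only the high-signal findings in a final call
    relevant = [f for f in findings if "No relevant information" not in f]
    return llm_query(f"QUERY: {query}\nFINDINGS:\n" + "\n".join(relevant))
```

Injecting `llm_query` keeps the orchestration logic testable with a fake model before any real calls are spent.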

When This Skill Activates

| Trigger | Condition | Action |
|---|---|---|
| Large Context | Input > 50K tokens (or approaching limits) | Switch to RLM mode |
| Information Dense | Query requires synthesizing many parts | Decompose and delegate |
| Needle-in-Haystack | Finding specific info in massive text | Reconnaissance first |
| Multi-Hop Reasoning | Answer requires connecting 2+ distant facts | Parallel sub-queries |
| Aggregation Tasks | Counting, comparing, listing across data | Chunk and map-reduce |
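The trigger table can be read as a simple gate. A minimal sketch, assuming a rough 4-characters-per-token estimate (a common rule of thumb for English text, not a real tokenizer count) and hypothetical task labels that mirror the table rows:

```python
# Hypothetical task labels mirroring the "Trigger" column of the table
DECOMPOSE_TASKS = {"information_dense", "needle", "multi_hop", "aggregation"}


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English prose
    return len(text) // 4


def should_use_rlm(context: str, task: str = "", token_limit: int = 50_000) -> bool:
    """Return True when the context is large or the task calls for decomposition."""
    if estimate_tokens(context) > token_limit:
        return True  # Large Context row: switch to RLM mode
    return task in DECOMPOSE_TASKS
```

For production use, swap `estimate_tokens` for the actual tokenizer of the model you call.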

The Three Phases

Phase 1: Reconnaissance (The Scout)

Goal: Understand the shape of the data without reading all of it.

Before loading any content into your precious context window:

  1. Probe the structure — What are the sections? Headers? Boundaries?
  2. Sample strategically — Print a few lines to understand format
  3. Identify markers — Keywords, patterns, regex-able structure
  4. Map the territory — Build a mental model of where things live

Related Skill: See rlm-context-scout/SKILL.md for detailed reconnaissance techniques.

Example Reconnaissance:

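```python
import re

# Don't read everything. Understand the shape first.
# `context` is the prompt string stored in the REPL environment.
print(f"Total length: {len(context)} characters")
print(f"First 500 chars: {context[:500]}")

# Count outside the f-string: backslashes aren't allowed in f-string expressions before Python 3.12
section_breaks = context.count("\n\n")
print(f"Number of double-newlines (likely sections): {section_breaks}")

# Find structural markers
headers = re.findall(r'^#+\s+.+', context, re.MULTILINE)
print(f"Found {len(headers)} markdown headers")
```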
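Step 4 of the reconnaissance list, mapping the territory, can be as simple as recording where each header starts, so later sub-queries slice exact ranges instead of rereading the whole context. A minimal sketch; `map_sections` is a hypothetical helper and assumes markdown-style `#` headers:

```python
import re
from typing import List, Tuple


def map_sections(context: str) -> List[Tuple[str, int, int]]:
    """Return (header_text, start, end) character offsets for each markdown section."""
    # [ \t]+ (not \s+) so a bare "#" line can't swallow the following newline
    matches = list(re.finditer(r'^#+[ \t]+(.+)$', context, re.MULTILINE))
    sections = []
    for i, m in enumerate(matches):
        # A section runs until the next header, or to the end of the context
        end = matches[i + 1].start() if i + 1 < len(matches) else len(context)
        sections.append((m.group(1).strip(), m.start(), end))
    return sections
```

A sub-query for one section then reads only `context[start:end]`.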

Phase 2: Divide and Conquer (The Subcommittee)

Goal: Break the problem into chunks that can be processed with full attention (avoiding Context Rot).

Key Principles:

  1. Each sub-query stays small — Under 500K chars (where models perform best)
  2. Sub-queries are independent — They don't need to know about each other
  3. Results are high-signal — Return only what's needed, not everything
  4. Parallelize when possible — In async implementations, run simultaneously
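Principle 4 can be illustrated with `asyncio.gather`, which runs independent coroutines concurrently and returns their results in input order. This is only a sketch: `async_llm_query` stands in for a hypothetical async client call, and the default cap of 8 concurrent calls is an arbitrary assumption meant to respect rate limits.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_subqueries(prompts: List[str],
                         async_llm_query: Callable[[str], Awaitable[str]],
                         max_concurrency: int = 8) -> List[str]:
    """Run independent sub-queries concurrently; results keep prompt order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> str:
        async with semaphore:  # cap the number of in-flight calls
            return await async_llm_query(prompt)

    return await asyncio.gather(*(one(p) for p in prompts))
```

This works only because principle 2 holds: no sub-query waits on another's result.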

Decomposition Strategies:

| Strategy | When to Use | Example |
|---|---|---|
| Semantic Chunking | Structured documents | Split by headers, chapters, sections |
| Fixed Chunking | Unstructured text | Split into N equal chunks |
| Targeted Extraction | Needle-in-haystack | Use regex/keywords to filter first |
| Hierarchical | Very large inputs | First-pass summary → second-pass detail |
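Two of these strategies, fixed chunking and targeted extraction, need no model call at all. A minimal sketch with hypothetical helper names; the 200-character window around each regex hit is an arbitrary assumption:

```python
import re
from typing import List


def fixed_chunks(text: str, n: int) -> List[str]:
    """Fixed Chunking: split text into at most n roughly equal pieces."""
    size = max(1, -(-len(text) // n))  # ceiling division so no characters are dropped
    return [text[i:i + size] for i in range(0, len(text), size)]


def targeted_windows(text: str, pattern: str, radius: int = 200) -> List[str]:
    """Targeted Extraction: return a window of surrounding text for each regex match."""
    windows = []
    for m in re.finditer(pattern, text):
        start, end = max(0, m.start() - radius), min(len(text), m.end() + radius)
        windows.append(text[start:end])
    return windows
```

Only the returned windows, not the full text, are then sent to sub-queries.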

Example Sub-Query Pattern:

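```python
import re

# `context` and `llm_query` come from the RLM REPL environment;
# `original_query` is a hypothetical variable holding the user's question.

# Break into semantic chunks at level-2+ markdown headers, dropping empty pieces
chunks = [c for c in re.split(r'\n#{2,}\s+', context) if c.strip()]

# Process each chunk with focused attention
findings = []
for i, chunk in enumerate(chunks):
    finding = llm_query(f"""
    You are analyzing section {i+1} of {len(chunks)}.
    
    QUERY: {original_query}
    
    SECTION CONTENT:
    {chunk}
    
    Extract any information relevant to the query. If nothing relevant, say "No relevant information."
    Be concise but complete.
    """)
    findings.append(finding)
```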

Phase 3: Aggregation (The Synthesizer)

Goal: Combine sub-query results into a coherent, verified final answer.

Aggregation Patterns:

  1. Map-Reduce: When counting, listing, or comparing

    ```python
    # chr(10) is "\n"; backslashes aren't allowed inside f-string expressions before Python 3.12
    final = llm_query(f"""
    You have received findings from {len(findings)} document sections.
    
    FINDINGS:
    {chr(10).join(findings)}
    
    ORIGINAL QUERY: {original_query}
    
    Synthesize these findings into a complete answer. 
    If findings conflict, note the conflict.
    If information is incomplete, note what's missing.
    """)
    ```
    
  2. Verification Loop: When accuracy is critical

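    ```python
    # First aggregation: join findings as text rather than interpolating the list's repr
    answer = llm_query("Combine these findings:\n" + "\n".join(findings))
    
    # Keep only the sections that produced findings, so the verification
    # call sees a small, focused context
    relevant_chunks = "\n\n".join(
        chunk for chunk, finding in zip(chunks, findings)
        if "No relevant information" not in finding
    )
    
    # Verification with smaller, focused context
    verified = llm_query(f"""
    PROPOSED ANSWER: {answer}
    KEY EVIDENCE: {relevant_chunks}
    
    Verify this answer is correct based on the evidence.
    If incorrect, provide the correct answer.
    """)
    ```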
    
  3. Variable Accumulation: When building long outputs

    ```python
    accumulated = []
    for chunk in chunks:
        processed = llm_query(f"Process: {chunk}")
        accumulated.append(processed)
    
    # Return the accumulated variable, not a new synthesis:
    # finish by answering with FINAL_VAR(accumulated)
    ```
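For counting tasks, one variant of map-reduce (an alternative to pattern 1, not something the patterns above prescribe) is to have each sub-query return structured output and perform the reduce step in code, so the final tally cannot drift. This sketch assumes each finding is a JSON object mapping labels to counts, a hypothetical format you would request in the sub-query prompt:

```python
import json
from collections import Counter
from typing import Iterable


def reduce_counts(findings: Iterable[str]) -> Counter:
    """Sum per-chunk JSON counts such as '{"spam": 3, "ham": 1}' into one tally."""
    total = Counter()
    for finding in findings:
        try:
            parsed = json.loads(finding)
        except json.JSONDecodeError:
            continue  # skip malformed sub-query output rather than guessing
        if isinstance(parsed, dict):
            total.update(parsed)  # Counter.update adds counts key by key
    return total
```

The resulting variable can then be returned with `FINAL_VAR`, so the answer is what was computed rather than what the model recalls.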
    

The Mini-Model Economy

Not all work requires the biggest brain. Route high-volume work to cheaper models and reserve the strongest model for orchestration:

| Task Type | Model Tier | Why |
|---|---|---|
| Orchestration | Highest (GPT-5 class) | Strategic decisions, complex synthesis |
| Chunk Analysis | Medium (GPT-4 class) | Per-section processing, good enough |
| Simple Extraction | Smallest (Mini class) | Regex-like tasks, keyword search |
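The tier table amounts to a routing decision. A minimal sketch: the tier names are placeholders you would map to real model IDs, `call_model` and `is_good_enough` are hypothetical callables, and the one-tier-at-a-time escalation rule is an assumption in the spirit of "only escalate when needed":

```python
from typing import Callable, Dict

# Placeholder tier names (assumption); map them to real model IDs in your setup
MODEL_TIERS: Dict[str, str] = {
    "orchestration": "large",
    "chunk_analysis": "medium",
    "simple_extraction": "small",
}
ESCALATION = {"small": "medium", "medium": "large"}


def route(task_type: str, prompt: str, call_model: Callable[[str, str], str],
          is_good_enough: Callable[[str], bool]) -> str:
    """Start at the table's tier for this task; escalate one tier at a time on failure."""
    tier = MODEL_TIERS[task_type]
    while True:
        answer = call_model(tier, prompt)
        if is_good_enough(answer) or tier not in ESCALATION:
            return answer  # accepted, or already at the top tier
        tier = ESCALATION[tier]
```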

Cost Optimization Rules:

  1. Batch aggressively — ~200K chars per sub-call is optimal
  2. Filter before calling — Use regex/sampling to avoid wasting calls
  3. Use smaller models for scanning — Only escalate when needed
  4. Limit recursion depth — Currently 1 level (sub-LMs, not sub-RLMs)
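Rules 1 and 2 combine naturally: filter first, then pack the surviving lines into batches that stay under a character budget. A sketch with a hypothetical `pack_batches` helper; the 200K default mirrors rule 1, and a single line longer than the budget is kept as its own batch rather than split:

```python
import re
from typing import Iterable, List, Optional


def pack_batches(lines: Iterable[str], max_chars: int = 200_000,
                 keep: Optional[str] = None) -> List[str]:
    """Filter lines by an optional regex, then greedily pack them into batches."""
    batches: List[str] = []
    current: List[str] = []
    size = 0
    for line in lines:
        if keep is not None and not re.search(keep, line):
            continue  # rule 2: drop irrelevant lines before they cost a call
        added = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + added > max_chars:
            batches.append("\n".join(current))  # flush the full batch
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        batches.append("\n".join(current))
    return batches
```

Each returned batch then becomes one sub-call, so the number of calls scales with total size rather than line count.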

Answer Signaling

When you've completed the recursive process:

  1. Direct Answer: FINAL(your answer here)
  2. Variable Return: FINAL_VAR(variable_name) — when you've built up the answer in a REPL variable

Critical: Do NOT output FINAL() until you are truly done. Don't confuse plans with answers.
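On the harness side, these signals have to be detected in the model's reply. A minimal sketch of how a driver loop might do that; `parse_final` is hypothetical. It assumes at most one signal per reply and looks up `FINAL_VAR` names in a plain dict standing in for the REPL namespace:

```python
import re
from typing import Any, Dict, Optional

# FINAL_VAR is tried first so "FINAL_VAR(x)" is never read as a direct answer
FINAL_RE = re.compile(r'FINAL_VAR\((\w+)\)|FINAL\((.*)\)', re.DOTALL)


def parse_final(reply: str, repl_namespace: Dict[str, Any]) -> Optional[str]:
    """Return the final answer if the reply contains a completion signal, else None."""
    match = FINAL_RE.search(reply)
    if match is None:
        return None  # still planning or exploring: keep iterating
    var_name, direct = match.groups()
    if var_name is not None:
        return str(repl_namespace[var_name])  # answer built up in a REPL variable
    return direct.strip()
```

Returning `None` for anything without a signal is what keeps a plan from being mistaken for an answer.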


Common Anti-Patterns

❌ Stuffing Everything Into Context

```python
# BAD: Just dump it all in
answer = llm_query(f"Here's 5 million characters: {entire_document}. Answer: {query}")
# This WILL cause Context Rot
```

❌ One-by-One Processing (The Qwen Problem)

```python
# BAD: 1000 LLM calls for 1000 lines
for line in lines:  # 1000 lines
    result = llm_query(f"Classify: {line}")  # 1000 calls = $$$ and slow
```

✅ Batched Processing

```python
# GOOD: 5 LLM calls for 1000 lines
chunk_size = 200
results = []  # keep every batch's output instead of overwriting it
for i in range(0, len(lines), chunk_size):
    batch = "\n".join(lines[i:i+chunk_size])
    results.append(llm_query(f"Classify each line:\n{batch}"))
```

❌ Trusting Your Mental Model Over Evidence

```python
# BAD: Returning answer from "memory" instead of accumulated variable
# (This caused failures in the OOLONG-Pairs benchmark)
FINAL("The answer is probably X")  # Wrong!
FINAL_VAR(accumulated_answer)      # Right: use what you actually computed
```

Integration with Related Skills

This skill works in concert with:

| Skill | Purpose | When to Reference |
|---|---|---|
| rlm-context-scout/SKILL.md | Deep dive on reconnaissance techniques | Phase 1 (probing, filtering) |
| rlm-repl-environment/SKILL.md | REPL setup and code patterns | Technical implementation |

Skill Loop Pattern: When implementing RLM thinking:

  1. Start here (Orchestrator) for strategy
  2. Reference rlm-context-scout for reconnaissance details
  3. Reference rlm-repl-environment for code patterns
  4. Return here for aggregation and signaling

Quick Reference Card

```text
┌─────────────────────────────────────────────────────────────────┐
│                      RLM ORCHESTRATOR FLOW                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. RECOGNIZE THE PATTERN                                       │
│     → Is context large? Is task complex? → Activate RLM mode    │
│                                                                 │
│  2. RECONNAISSANCE (Probe, don't read)                          │
│     → Sample, count, pattern-match                              │
│     → Build mental map of data structure                        │
│                                                                 │
│  3. DECOMPOSE (Divide the problem)                              │
│     → Semantic chunks? Fixed chunks? Targeted extraction?       │
│     → Each chunk < 500K chars                                   │
│                                                                 │
│  4. DELEGATE (Spawn sub-queries)                                │
│     → Clear, focused prompts                                    │
│     → Return high-signal only                                   │
│                                                                 │
│  5. AGGREGATE (Synthesize findings)                             │
│     → Combine results                                           │
│     → Verify if critical                                        │
│     → Use FINAL_VAR for accumulated answers                     │
│                                                                 │
│  REMEMBER: Context is an ENVIRONMENT, not an INPUT.             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

The Philosophy

"Data-processing systems with a small but fast main memory can process far larger datasets by cleverly managing how data is fetched into memory." — The RLM Paper, on Out-of-Core Algorithms

RLMs apply this systems principle to language model reasoning:

  • Main Memory = Your Context Window (precious, limited)
  • Disk = The External Environment (vast, cheap to store)
  • Smart Fetching = Selective loading via code (your superpower)
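The analogy maps directly onto ordinary out-of-core code: keep the data on disk and pull only the pieces you need into memory. The sketch below rests on stated assumptions. `fetch_matches` is a hypothetical helper that streams a text file in fixed-size blocks and keeps a small overlap, so a match straddling a block boundary is still seen whole. It assumes every match is shorter than `overlap` characters.

```python
import re
from typing import Iterator, Tuple


def fetch_matches(path: str, pattern: str, block_chars: int = 1_000_000,
                  overlap: int = 1_000) -> Iterator[Tuple[int, str]]:
    """Stream a text file in blocks and yield (char_offset, match) for each regex hit."""
    regex = re.compile(pattern)
    buffer, base = "", 0  # base = character offset of buffer[0] within the file
    with open(path, encoding="utf-8") as f:
        while True:
            block = f.read(block_chars)  # only this block is pulled into memory
            at_eof = block == ""
            buffer += block
            # Matches starting in the last `overlap` chars wait for more data
            limit = len(buffer) if at_eof else max(0, len(buffer) - overlap)
            resume = limit
            for m in regex.finditer(buffer):
                if m.start() >= limit:
                    break
                yield base + m.start(), m.group()
                resume = max(resume, m.end())  # never rescan inside a reported match
            if at_eof:
                return
            buffer, base = buffer[resume:], base + resume
```

Memory use stays near `block_chars + overlap` regardless of file size, which is the "small but fast main memory" idea in miniature.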

The fundamental insight: An RLM has strictly more representation capacity than an LLM. It can always degrade to a simple LLM call if needed, but it can also scale to handle 10M+ tokens that would be impossible otherwise.

The practical outcome: 91% accuracy on 11M-token tasks where SOTA models score 0%.


When you face the impossible — think recursively.