
rlm-long-context

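(Experimental) Research into implementing RLM long-context processing with standalone Python scripts. For production use, prefer the fleet-rlm skill combined with Modal sandboxes. This skill is intended for experimentation, evaluation, and alternative implementation approaches.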

SKILL.md
--- frontmatter
name: rlm-long-context
description: (EXPERIMENTAL) Research implementation for RLM long-context processing using standalone Python scripts. For production use, prefer the rlm skill which uses the fleet-rlm package with Modal sandboxes. This skill is for experimentation, evaluation, and alternative implementation patterns.

RLM Long-Context Processing (Experimental)

For production use, prefer the rlm skill which uses the fleet-rlm package with Modal cloud sandboxes.

Use this skill for evaluating alternative RLM strategies, researching optimization techniques, or comparing approaches.

Scripts & References

| Resource | Purpose |
| --- | --- |
| scripts/orchestrate.py | Main orchestrator with all optimizations |
| scripts/rank_chunks.py | Query-guided chunk selection (5-10x speedup) |
| scripts/semantic_chunk.py | Content-aware chunking by boundaries |
| scripts/cache_manager.py | Result caching for repeated queries |
| scripts/codebase_concat.py | Concatenate codebase files for processing |
| references/advanced-techniques.md | Query-guided selection, semantic chunking, adaptive sizing, map-reduce, streaming, caching |
| references/codebase-processing.md | Whole-codebase analysis: concatenation, code chunking, file selection, query types |

Architecture

```
Main Agent (Orchestrator)
  +-- Query-Guided Selection (filter chunks by relevance)
  +-- Semantic Chunking (content-aware boundaries)
  +-- Parallel subagent delegation with caching
       |
  +----+----+
  v    v    v
 Chunk A  Chunk B  Chunk C (skipped if low relevance)
  |    |
  v    v
 rlm-subcall    rlm-subcall
  |    |
  +----+----> Result Caching + Streaming (early exit)
       v
  Hierarchical Merge (>1M tokens: chunk > summary > synthesis)
       v
  Final Answer (Main Agent)
```
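
The parallel delegation stage in the diagram could be modeled roughly as follows. This is a minimal sketch, not the code in scripts/orchestrate.py: `run_chunks` and the `process` callable are hypothetical names, and `process` stands in for whatever actually invokes rlm-subcall on one chunk.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_chunks(
    chunk_ids: list[str],
    process: Callable[[str], dict],  # hypothetical stand-in for one rlm-subcall invocation
    max_workers: int = 4,
) -> dict[str, dict]:
    """Process the selected chunks in parallel and collect one result per chunk."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process, cid): cid for cid in chunk_ids}
        for future in as_completed(futures):
            cid = futures[future]
            try:
                results[cid] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the whole run.
                results[cid] = {"chunk_id": cid, "error": str(exc)}
    return results
```

Only chunks that survived relevance filtering would be passed in as `chunk_ids`; a failed chunk is recorded with an `error` field so the merge step can decide how to handle it.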

Core Workflow

1. Prepare Content with Scripts

```bash
# Rank chunks by query relevance (skip irrelevant ones)
python3 .claude/skills/rlm-long-context/scripts/rank_chunks.py \
  --query "find all timeout errors" --top-k 10

# Chunk by semantic boundaries (auto-detects content type)
python3 .claude/skills/rlm-long-context/scripts/semantic_chunk.py \
  --state .claude/rlm_state/state.pkl

# For codebases: concatenate files first
python3 .claude/skills/rlm-long-context/scripts/codebase_concat.py \
  --root ./src --output codebase.txt
```
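
To illustrate what query-guided ranking involves, here is a minimal keyword-overlap sketch. It is an assumption about the general idea, not the algorithm in scripts/rank_chunks.py: the function names and the scoring metric (query-term hits divided by chunk length) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_chunks(query: str, chunks: list[str], top_k: int = 10) -> list[tuple[int, float]]:
    """Return (chunk_index, score) pairs for the top_k most relevant chunks.

    The score is the number of query-term occurrences divided by the chunk's
    token count (a hypothetical metric chosen for illustration).
    """
    terms = set(tokenize(query))
    scored = []
    for index, chunk in enumerate(chunks):
        tokens = tokenize(chunk)
        counts = Counter(tokens)
        hits = sum(counts[term] for term in terms)
        if hits:
            scored.append((index, hits / len(tokens)))
    # Highest score first; sorted() is stable, so ties keep document order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Chunks with no matching term are dropped entirely, which is where the savings come from when a query is specific. Exact-token matching misses variants such as "error" versus "errors", so a real ranker would likely need stemming or embeddings.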

2. Choose Chunking Strategy

| Strategy | When to Use | Tool |
| --- | --- | --- |
| Query-guided selection | Query has clear keywords | scripts/rank_chunks.py |
| Semantic chunking | Structured content (logs, markdown, JSON, code) | scripts/semantic_chunk.py |
| Adaptive sizing | Mixed-density content | See advanced-techniques.md |
| Fixed-size with overlap | Unstructured text | scripts/orchestrate.py |

For detailed code examples, see references/advanced-techniques.md. For whole-codebase analysis, see references/codebase-processing.md.
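
As a rough illustration of the fixed-size-with-overlap strategy, the sketch below splits text into character-based windows that share a configurable fraction of their content. The function name, the character-based sizing, and the default values are assumptions for illustration; scripts/orchestrate.py may size chunks differently (for example by tokens).

```python
def chunk_with_overlap(text: str, chunk_size: int = 4000, overlap_ratio: float = 0.1) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the tail of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)
    step = max(1, chunk_size - overlap)  # always advance by at least one character
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With `chunk_size=10` and `overlap_ratio=0.2`, each chunk begins with the last two characters of the previous one, so a short pattern that straddles a boundary still appears whole in at least one chunk.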

3. Delegate to Subagents

For each selected chunk, invoke rlm-subcall:

```yaml
subagent: rlm-subcall
input:
  query: "Find all ERROR entries and their timestamps"
  chunk_path: ".claude/rlm_state/chunks/chunk_001.txt"
  chunk_id: "chunk_001"
  format: "json"
```

Expected output:

```json
{
  "chunk_id": "chunk_001",
  "relevant": [{ "point": "...", "evidence": "...", "confidence": "high" }],
  "missing": ["what could not be determined"],
  "suggested_queries": ["follow-up questions"]
}
```
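
Because subagent output feeds both the merge step and the cache, it may help to check each response against the expected shape before using it. The following is a minimal sketch: `validate_result` is a hypothetical helper, and the set of allowed confidence values is an assumption (only "high" appears in the example above).

```python
import json

# Assumption: "medium" and "low" are guessed; the source shows only "high".
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def validate_result(raw: str, expected_chunk_id: str) -> dict:
    """Parse a subagent response and raise ValueError if it does not match the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    if data.get("chunk_id") != expected_chunk_id:
        raise ValueError(f"chunk_id mismatch: expected {expected_chunk_id!r}")
    for key in ("relevant", "missing", "suggested_queries"):
        if not isinstance(data.get(key), list):
            raise ValueError(f"{key!r} must be a list")
    for item in data["relevant"]:
        if not isinstance(item, dict) or not {"point", "evidence", "confidence"} <= item.keys():
            raise ValueError("each finding needs point, evidence, and confidence")
        confidence = item["confidence"]
        if not isinstance(confidence, str) or confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(f"unexpected confidence value: {confidence!r}")
    return data
```

Checking `chunk_id` against the chunk that was dispatched catches responses that were mixed up between parallel subagents.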

4. Collect & Synthesize

Merge the subagent results, identify cross-chunk patterns, and produce the final answer. For inputs larger than 1M tokens, use hierarchical map-reduce (see advanced-techniques.md).
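
The hierarchical reduce step can be sketched as repeated batching: summarize groups of results, then summarize the summaries, until one remains. This is a minimal sketch under stated assumptions: `hierarchical_merge`, `fan_in`, and the `summarize` callable are hypothetical, and `summarize` stands in for whatever subagent or model call condenses a batch of texts.

```python
from typing import Callable


def hierarchical_merge(
    items: list[str],
    summarize: Callable[[list[str]], str],  # hypothetical: condenses one batch into one text
    fan_in: int = 8,
) -> str:
    """Reduce many chunk-level results to one synthesis, fan_in items at a time."""
    if not items:
        raise ValueError("nothing to merge")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so every level shrinks")
    level = list(items)
    while len(level) > 1:
        # Each pass replaces every batch of up to fan_in items with a single summary.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]
```

Keeping `fan_in` small bounds how much text any single summarization call sees, at the cost of more levels.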

NEVER List

  • NEVER paste entire chunks into main chat context — causes context overflow. Quote only findings (<1KB).
  • NEVER spawn subagents from subagents — exponential resource consumption. Orchestration stays in main agent.
  • NEVER split content mid-logical-unit — use semantic chunking with boundary detection.
  • NEVER skip result validation before caching — corrupted results poison the cache.
  • NEVER use fixed-size chunks without overlap for structured data: use at least 10% overlap, or chunk on semantic boundaries instead.
  • NEVER process all chunks when query is specific — use query-guided selection first; process only top-K.
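
The validate-before-caching rule might look like the sketch below. It is not the implementation in scripts/cache_manager.py: the function names, the one-JSON-file-per-entry layout, and the minimal validity check are all assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def cache_key(query: str, chunk_text: str) -> str:
    """Derive a key from the query and chunk content, so edited chunks never hit stale entries."""
    digest = hashlib.sha256()
    digest.update(query.encode("utf-8"))
    digest.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(chunk_text.encode("utf-8"))
    return digest.hexdigest()


def is_valid(result: object) -> bool:
    """Minimal shape check (assumed): a dict with a string chunk_id and a list of findings."""
    return (
        isinstance(result, dict)
        and isinstance(result.get("chunk_id"), str)
        and isinstance(result.get("relevant"), list)
    )


def cache_put(cache_dir: Path, key: str, result: dict) -> bool:
    """Store a result only if it validates; return whether it was written."""
    if not is_valid(result):
        return False
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp_path = cache_dir / f"{key}.json.tmp"
    tmp_path.write_text(json.dumps(result), encoding="utf-8")
    tmp_path.replace(cache_dir / f"{key}.json")  # rename so readers never see a partial file
    return True


def cache_get(cache_dir: Path, key: str) -> Optional[dict]:
    """Return a cached result, or None if it is missing, unreadable, or invalid."""
    path = cache_dir / f"{key}.json"
    if not path.exists():
        return None
    try:
        result = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None
    return result if is_valid(result) else None
```

Validating on read as well as on write means a file corrupted on disk is treated as a cache miss rather than returned as a result.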

Limitations

  • Subagent outputs accumulate in main context: monitor total size
  • Parallel execution limited by available subagent workers
  • File must fit in memory (typically 2-4GB)
  • No automatic retry on subagent failure (implement manually)
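
Since retries must be implemented manually, one option is a small wrapper with exponential backoff. This is a minimal sketch: `with_retry` and its parameters are hypothetical, and `call` stands in for whatever invokes a subagent on one chunk.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    call: Callable[[], T],  # hypothetical: one subagent invocation
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Run call(), retrying on any exception with delays of base_delay * 2**n seconds."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the orchestrator
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Catching every `Exception` is a simplification; in practice it may be better to retry only on errors known to be transient and to fail fast on the rest.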