AgentSkillsCN

rlm

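Run the Recursive Language Model with fleet-rlm for long-context tasks. This skill uses a persistent Modal Sandbox Python REPL, with the rlm-subcall subagent acting as the sub-LLM (llm_query). Use it when processing large files, extracting information from documents, or running code in a cloud sandbox.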

SKILL.md
--- frontmatter
name: rlm
description: Run the Recursive Language Model using fleet-rlm for long-context tasks. Uses a persistent Modal Sandbox Python REPL and an rlm-subcall subagent as the sub-LLM (llm_query). Use when processing large files, extracting information from documents, or running code in cloud sandboxes.

RLM — Modal Sandbox Long-Context Skill

Process files exceeding context limits using DSPy's Recursive Language Model backed by Modal cloud sandboxes. The sandbox is a persistent Python REPL where code navigates data programmatically; the rlm-subcall subagent acts as the sub-LLM for semantic analysis of individual chunks.

Delegation Guidance

This skill provides domain knowledge — load it for RLM best practices. For execution delegation, combine it with subagents:

Scenario | Approach
Process a large file inline | Load this skill, use ModalInterpreter directly
Delegate large-file processing | Delegate to rlm-orchestrator subagent (which loads this skill)
Analyze individual chunks | Delegate to rlm-subcall subagent (leaf node)
Debug a failing pipeline | Use rlm-debug skill or delegate to rlm-specialist
Parallel document analysis (agent team) | Each teammate loads this skill automatically via CLAUDE.md

Synergy: Skills inject knowledge; subagents isolate execution. Use both: delegate to rlm-orchestrator, which loads this skill together with rlm-execute and rlm-memory.


Prerequisites

  1. Modal account configured: uv run modal setup
  2. Modal secret named LITELLM with DSPy env vars:
    bash
    modal secret create LITELLM \
      DSPY_LM_MODEL=openai/gemini-3-flash-preview \
      DSPY_LM_API_BASE=https://your-proxy \
      DSPY_LLM_API_KEY=sk-... \
      DSPY_LM_MAX_TOKENS=65536
    
  3. Local .env at project root with the same vars (for the planner LM).
  4. Dependencies synced: uv sync
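
Before starting a run, it can help to confirm that the four variables are visible locally. The following is a minimal sketch, not part of fleet-rlm: `missing_vars` is a hypothetical helper, and it assumes the `.env` values have already been loaded into the process environment (for example by your shell).

```python
import os

# Variable names copied from the `modal secret create LITELLM` command above.
REQUIRED_VARS = (
    "DSPY_LM_MODEL",
    "DSPY_LM_API_BASE",
    "DSPY_LLM_API_KEY",
    "DSPY_LM_MAX_TOKENS",
)


def missing_vars(env):
    """Return the required variable names that are absent or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All DSPy variables are set.")
```

The check only covers the local planner side; the Modal secret itself still has to be created as shown in step 2.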

Quick Mode — CLI One-Liner

For standard long-context tasks, use the CLI directly:

bash
# Analyze a document
uv run fleet-rlm run-long-context \
  --docs-path <FILE> \
  --query "<QUERY>" \
  --mode analyze \
  --max-iterations 30 \
  --max-llm-calls 50 \
  --timeout 900

# Summarize a document with focus
uv run fleet-rlm run-long-context \
  --docs-path <FILE> \
  --query "<FOCUS_TOPIC>" \
  --mode summarize \
  --timeout 900

# With persistent volume
uv run fleet-rlm run-long-context \
  --docs-path <FILE> \
  --query "<QUERY>" \
  --mode analyze \
  --volume-name rlm-volume-dspy

All run-* commands support --max-iterations, --max-llm-calls, --verbose, --timeout, --secret-name, --volume-name, and --full-output. Run uv run fleet-rlm --help for full details.
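
When the CLI is driven from a script, building the argument list programmatically avoids shell-quoting mistakes in the query. This is a small sketch using only the flags shown above; `build_run_long_context` and the file name `report.txt` are hypothetical and exist only for illustration.

```python
import shlex


def build_run_long_context(docs_path, query, mode="analyze", timeout=900, volume_name=None):
    """Assemble the argv list for `fleet-rlm run-long-context`."""
    cmd = [
        "uv", "run", "fleet-rlm", "run-long-context",
        "--docs-path", str(docs_path),
        "--query", query,
        "--mode", mode,
        "--timeout", str(timeout),
    ]
    # The volume flag is optional, so add it only when a name is given.
    if volume_name:
        cmd += ["--volume-name", volume_name]
    return cmd


cmd = build_run_long_context("report.txt", "What changed in Q3?")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` without going through a shell.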


Interactive Mode — Custom Workflows with ModalInterpreter

For multi-step or custom workflows, use ModalInterpreter directly:

python
from pathlib import Path

from fleet_rlm import ModalInterpreter

# Read the document locally, then inject it into the sandbox as `content`.
content = Path('rlm_content/dspy-knowledge/dspy-doc.txt').read_text()

with ModalInterpreter(
    timeout=600,
    volume_name='rlm-volume-dspy',
) as interp:
    result = interp.execute(
        'print(f"Loaded {len(content):,} chars")',
        variables={'content': content},
    )
    print(result)

Scout the Content

Once content is in the sandbox, use the injected sandbox-side helpers. These calls assume the interp session opened above is still active:

python
# See first 3000 chars
result = interp.execute("print(peek(content, 0, 3000))")

# Find all mentions of "optimizer"
result = interp.execute("matches = grep(content, 'optimizer', context=1); print(len(matches))")

# Split into sections
result = interp.execute("""
sections = chunk_by_headers(content)
for i, s in enumerate(sections):
    print(f"{i}: {s['header'][:60]}  ({len(s['content'])} chars)")
""")

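To try the scouting steps on a small string before paying for a sandbox, rough local stand-ins can be useful. The functions below are hypothetical approximations, not the helpers fleet-rlm injects: the real `peek`, `grep`, and `chunk_by_headers` may differ in matching rules and return shapes. The sketch assumes line-based, case-insensitive matching and Markdown-style `#` headers.

```python
import re


def peek_local(text, start, end):
    """Return the slice of text between two character offsets."""
    return text[start:end]


def grep_local(text, pattern, context=0):
    """Return each matching line with `context` lines on either side."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if re.search(pattern, line, flags=re.IGNORECASE):
            lo, hi = max(0, i - context), min(len(lines), i + context + 1)
            hits.append("\n".join(lines[lo:hi]))
    return hits


def chunk_by_headers_local(text):
    """Split text into {'header', 'content'} dicts at lines starting with '#'."""
    sections, header, body = [], "", []
    for line in text.splitlines():
        if line.startswith("#"):
            # Close the previous section before starting a new one.
            if header or body:
                sections.append({"header": header, "content": "\n".join(body)})
            header, body = line.lstrip("# ").strip(), []
        else:
            body.append(line)
    if header or body:
        sections.append({"header": header, "content": "\n".join(body)})
    return sections


sample = "# Intro\nDSPy has an optimizer.\n# Usage\nCall the Optimizer here.\nDone."
print(peek_local(sample, 0, 7))
print(len(grep_local(sample, "optimizer")))
print([s["header"] for s in chunk_by_headers_local(sample)])
```
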
Chunk and Write to Filesystem

Write chunks to /tmp/chunks/ (ephemeral) or /data/chunks/ (volume-persisted):

python
result = interp.execute("""
import os

# The arguments are presumably chunk size and overlap, both in characters.
chunks = chunk_by_size(content, 8000, 400)
os.makedirs('/tmp/chunks', exist_ok=True)

manifest = []
for i, chunk in enumerate(chunks):
    path = f'/tmp/chunks/chunk_{i:04d}.txt'
    with open(path, 'w') as f:
        f.write(chunk)
    manifest.append({'id': f'chunk_{i:04d}', 'path': path, 'chars': len(chunk)})

SUBMIT(chunk_count=len(manifest), manifest=manifest)
""")

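The overlap between neighbouring chunks is what keeps a sentence that straddles a boundary visible to at least one subcall. As an illustration of the idea, here is a hypothetical local equivalent; it assumes the two numeric arguments above mean chunk size and overlap in characters, which the sandbox helper may or may not implement this way.

```python
def chunk_by_size_local(text, size, overlap):
    """Split text into chunks of at most `size` chars, each sharing
    `overlap` chars with the previous chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        # Stop once a chunk reaches the end, so no trailing chunk is
        # made up entirely of text the previous chunk already covers.
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks


print(chunk_by_size_local("abcdefghij", 4, 1))
```

Each chunk after the first begins `overlap` characters before the previous one ended, so larger overlaps trade more subcalls for better boundary coverage.
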
Subcall Loop (rlm-subcall subagent)

For each chunk, invoke the rlm-subcall subagent:

code
Subagent: rlm-subcall
Input:
  chunk_path: /tmp/chunks/chunk_0001.txt
  query: "What modules does DSPy provide?"
  chunk_id: chunk_0001

The subagent returns structured JSON with relevant, missing, and suggested_queries fields. Collect all results, then synthesize.
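
Collecting results is easier if each response is normalized as it arrives. The sketch below is one possible way to do that, under assumptions: it treats the subagent output as a JSON object string, treats `missing` and `suggested_queries` as lists of strings, and uses the `point` and `confidence` keys that the synthesis step relies on. `parse_subcall_result` and `collect_followups` are hypothetical helpers, and the sample payload is made up.

```python
import json

EXPECTED_KEYS = ("relevant", "missing", "suggested_queries")


def parse_subcall_result(raw):
    """Parse one subagent response and fill absent fields with empty lists."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("subcall result must be a JSON object")
    return {key: data.get(key) or [] for key in EXPECTED_KEYS}


def collect_followups(results):
    """Gather suggested queries across chunks, dropping repeats in order."""
    seen, followups = set(), []
    for r in results:
        for q in r["suggested_queries"]:
            if q not in seen:
                seen.add(q)
                followups.append(q)
    return followups


# Made-up payload for illustration only.
raw = (
    '{"relevant": [{"point": "dspy.Predict is a module", "confidence": "high"}],'
    ' "missing": [], "suggested_queries": ["Which optimizers exist?"]}'
)
parsed = parse_subcall_result(raw)
print(collect_followups([parsed, parsed]))
```

The list of parsed dicts plays the role of `all_results` in the synthesis step.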

Synthesize in the Sandbox

python
result = interp.execute("""
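# Keep only findings the subagent rated high or medium confidence.
findings = []
for r in all_results:
    for item in r.get('relevant', []):
        if item.get('confidence') in ('high', 'medium'):
            findings.append(item)

# Deduplicate by the text of each point, preserving first-seen order.
seen = set()
unique = []
for f in findings:
    if f['point'] not in seen:
        seen.add(f['point'])
        unique.append(f)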

SUBMIT(findings=unique, total=len(unique))
""", variables={'all_results': all_results})

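The filter-and-deduplicate logic can be dry-run locally on made-up results before it is sent to the sandbox. This is a sketch with hypothetical sample data; `synthesize` is an illustrative name, and the only assumption about the result shape is the `relevant`, `point`, and `confidence` keys used above.

```python
def synthesize(all_results, keep=("high", "medium")):
    """Return confident findings, keeping the first occurrence of each point."""
    seen, unique = set(), []
    for r in all_results:
        for item in r.get("relevant", []):
            if item.get("confidence") in keep and item["point"] not in seen:
                seen.add(item["point"])
                unique.append(item)
    return unique


# Made-up results: "B" is low confidence, "A" appears twice.
sample_results = [
    {"relevant": [{"point": "A", "confidence": "high"},
                  {"point": "B", "confidence": "low"}]},
    {"relevant": [{"point": "A", "confidence": "medium"},
                  {"point": "C", "confidence": "medium"}]},
    {},  # a chunk with nothing relevant
]
unique = synthesize(sample_results)
print([f["point"] for f in unique])
```

Because the first occurrence wins, the confidence reported for a repeated point depends on chunk order.
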
Full RLM Mode — dspy.RLM with ModalInterpreter

For fully automated RLM execution (the LLM writes its own code):

python
from pathlib import Path

import dspy
from fleet_rlm import ModalInterpreter, AnalyzeLongDocument

# Read the document up front so no file handle is left open.
document = Path('rlm_content/dspy-knowledge/dspy-doc.txt').read_text()

with ModalInterpreter(timeout=900, volume_name='rlm-volume-dspy') as interp:
    rlm = dspy.RLM(
        signature=AnalyzeLongDocument,
        interpreter=interp,
        max_iterations=20,
        max_llm_calls=30,
        verbose=True,
    )
    result = rlm(
        document=document,
        query="What are the main design decisions?",
    )
    print(f"Findings: {result.findings}")
    print(f"Answer: {result.answer}")