RLM — Modal Sandbox Long-Context Skill
Process files exceeding context limits using DSPy's Recursive Language Model
backed by Modal cloud sandboxes. The sandbox is a persistent Python REPL
where code navigates data programmatically; the rlm-subcall subagent acts
as the sub-LLM for semantic analysis of individual chunks.
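To make the "persistent Python REPL" idea concrete, here is a minimal local stand-in that assumes nothing about how ModalInterpreter is implemented. `LocalREPL` is a hypothetical class, not part of fleet_rlm. It keeps one namespace across `execute()` calls, injects `variables`, and returns captured stdout. The real sandbox runs remotely on Modal; this sketch only mirrors the calling pattern used in this skill.

```python
import contextlib
import io
from typing import Any, Dict, Optional


class LocalREPL:
    """Hypothetical local stand-in for a persistent sandbox REPL.

    All code runs in one namespace dict, so names defined by one
    execute() call stay visible to later calls.
    """

    def __init__(self) -> None:
        self.namespace: Dict[str, Any] = {}

    def execute(self, code: str, variables: Optional[Dict[str, Any]] = None) -> str:
        # Inject caller-supplied variables before running the code
        if variables:
            self.namespace.update(variables)
        buf = io.StringIO()
        # Capture anything the code prints and hand it back as the result
        with contextlib.redirect_stdout(buf):
            exec(code, self.namespace)
        return buf.getvalue()


if __name__ == "__main__":
    repl = LocalREPL()
    repl.execute("size = len(content)", variables={"content": "x" * 12_345})
    # `size` survives from the previous call
    print(repl.execute('print(f"Loaded {size:,} chars")'), end="")  # Loaded 12,345 chars
```

The single shared namespace is the whole point: scouting, chunking and synthesis steps can build on each other without re-sending the document.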
Delegation Guidance
This skill provides domain knowledge — load it for RLM best practices. For execution delegation, combine it with subagents:
| Scenario | Approach |
|---|---|
| Process a large file inline | Load this skill, use ModalInterpreter directly |
| Delegate large-file processing | Delegate to rlm-orchestrator subagent (which loads this skill) |
| Analyze individual chunks | Delegate to rlm-subcall subagent (leaf node) |
| Debug a failing pipeline | Use rlm-debug skill or delegate to rlm-specialist |
| Parallel document analysis (agent team) | Each teammate loads this skill automatically via CLAUDE.md |
Synergy: Skills inject knowledge; subagents isolate execution. Use both: delegate to
rlm-orchestrator, which loads this skill + rlm-execute + rlm-memory.
Additional Resources
- For the complete ModalInterpreter API, sandbox helpers, DSPy signatures, and troubleshooting, see references/api-reference.md
Prerequisites
- Modal account configured: `uv run modal setup`
- Modal secret named `LITELLM` with DSPy env vars:
  ```bash
  modal secret create LITELLM \
    DSPY_LM_MODEL=openai/gemini-3-flash-preview \
    DSPY_LM_API_BASE=https://your-proxy \
    DSPY_LLM_API_KEY=sk-... \
    DSPY_LM_MAX_TOKENS=65536
  ```
- Local `.env` at project root with the same vars (for the planner LM).
- Dependencies synced: `uv sync`
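A quick pre-flight check can confirm that the local `.env` carries the same variables as the `LITELLM` secret. This sketch uses a deliberately minimal KEY=VALUE parser written for illustration; `parse_env_file` and `missing_vars` are hypothetical helpers, not a full dotenv implementation, and the required names are copied from the secret definition above.

```python
from pathlib import Path
from typing import Dict, List

# Names copied from the LITELLM secret above; adjust if your setup differs.
REQUIRED_VARS = (
    "DSPY_LM_MODEL",
    "DSPY_LM_API_BASE",
    "DSPY_LLM_API_KEY",
    "DSPY_LM_MAX_TOKENS",
)


def parse_env_file(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and # comments and
    strips surrounding quotes. No interpolation or multi-line values."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env_text: str) -> List[str]:
    """Return required names that are absent or empty."""
    values = parse_env_file(env_text)
    return [name for name in REQUIRED_VARS if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print(".env not found in the current directory")
    else:
        missing = missing_vars(env_path.read_text())
        print("Missing:", ", ".join(missing) if missing else "none")
```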
Quick Mode — CLI One-Liner
For standard long-context tasks, use the CLI directly:
```bash
# Analyze a document
uv run fleet-rlm run-long-context \
  --docs-path <FILE> \
  --query "<QUERY>" \
  --mode analyze \
  --max-iterations 30 \
  --max-llm-calls 50 \
  --timeout 900

# Summarize a document with focus
uv run fleet-rlm run-long-context \
  --docs-path <FILE> \
  --query "<FOCUS_TOPIC>" \
  --mode summarize \
  --timeout 900

# With persistent volume
uv run fleet-rlm run-long-context \
  --docs-path <FILE> \
  --query "<QUERY>" \
  --mode analyze \
  --volume-name rlm-volume-dspy
```
All `run-*` commands support `--max-iterations`, `--max-llm-calls`, `--verbose`,
`--timeout`, `--secret-name`, `--volume-name`, and `--full-output`. Run
`uv run fleet-rlm --help` for full details.
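If you script batch runs, building the argument list in Python avoids shell-quoting mistakes in `--query`. This sketch uses only the flags listed above; `build_run_long_context` is a hypothetical helper, and options left as `None` are omitted so the CLI's own defaults apply.

```python
import shlex
from typing import List, Optional


def build_run_long_context(
    docs_path: str,
    query: str,
    mode: str = "analyze",
    max_iterations: Optional[int] = None,
    max_llm_calls: Optional[int] = None,
    timeout: Optional[int] = None,
    volume_name: Optional[str] = None,
) -> List[str]:
    """Assemble argv for `fleet-rlm run-long-context` (hypothetical helper)."""
    cmd = [
        "uv", "run", "fleet-rlm", "run-long-context",
        "--docs-path", docs_path,
        "--query", query,
        "--mode", mode,
    ]
    optional = [
        ("--max-iterations", max_iterations),
        ("--max-llm-calls", max_llm_calls),
        ("--timeout", timeout),
        ("--volume-name", volume_name),
    ]
    # Only pass flags the caller actually set
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_run_long_context("report.txt", "What changed in v2?", timeout=900)
    print(shlex.join(cmd))  # inspect the command before running it
    # To execute: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list (rather than a single shell string) to `subprocess.run` means queries containing quotes or `$` reach the CLI unchanged.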
Interactive Mode — Custom Workflows with ModalInterpreter
For multi-step or custom workflows, use ModalInterpreter directly:
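```python
import pathlib

from fleet_rlm import ModalInterpreter

with ModalInterpreter(
    timeout=600,
    volume_name='rlm-volume-dspy',
) as interp:
    # Read the document locally, then inject it into the sandbox as `content`
    content = pathlib.Path('rlm_content/dspy-knowledge/dspy-doc.txt').read_text()
    result = interp.execute(
        'print(f"Loaded {len(content):,} chars")',
        variables={'content': content},
    )
    print(result)
    # The sandbox is a persistent REPL: the snippets in the following
    # steps run inside this same `with` block and can reuse `content`.
```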
Scout the Content
Once content is in the sandbox, use the injected sandbox-side helpers:
```python
# See the first 3000 chars
result = interp.execute("print(peek(content, 0, 3000))")

# Find all mentions of "optimizer"
result = interp.execute("matches = grep(content, 'optimizer', context=1); print(len(matches))")

# Split into sections
result = interp.execute("""
sections = chunk_by_headers(content)
for i, s in enumerate(sections):
    print(f"{i}: {s['header'][:60]} ({len(s['content'])} chars)")
""")
```
Chunk and Write to Filesystem
Write chunks to /tmp/chunks/ (ephemeral) or /data/chunks/ (volume-persisted):
result = interp.execute("""
import os, json
chunks = chunk_by_size(content, 8000, 400)
os.makedirs('/tmp/chunks', exist_ok=True)
manifest = []
for i, chunk in enumerate(chunks):
path = f'/tmp/chunks/chunk_{i:04d}.txt'
with open(path, 'w') as f:
f.write(chunk)
manifest.append({'id': f'chunk_{i:04d}', 'path': path, 'chars': len(chunk)})
SUBMIT(chunk_count=len(manifest), manifest=manifest)
""")
Subcall Loop (rlm-subcall subagent)
For each chunk, invoke the rlm-subcall subagent:
```
Subagent: rlm-subcall
Input:
  chunk_path: /tmp/chunks/chunk_0001.txt
  query: "What modules does DSPy provide?"
  chunk_id: chunk_0001
```
The subagent returns structured JSON with `relevant`, `missing`, and
`suggested_queries` fields. Collect all results into a list named `all_results`, then synthesize.
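Subagent replies are not guaranteed to be well-formed, so it can help to normalise each one before appending it to `all_results`. A sketch, assuming each reply arrives as a JSON string; `parse_subcall_result` and the extra `chunk_id`/`parse_error` keys are hypothetical additions, not part of the rlm-subcall contract.

```python
import json
from typing import Any, Dict

EXPECTED_FIELDS = ("relevant", "missing", "suggested_queries")


def parse_subcall_result(raw: str, chunk_id: str) -> Dict[str, Any]:
    """Normalise one rlm-subcall reply: absent fields default to [],
    and unparseable replies are flagged instead of aborting the loop."""
    try:
        data = json.loads(raw)
        parse_error = not isinstance(data, dict)
    except json.JSONDecodeError:
        data, parse_error = {}, True
    if parse_error:
        data = {}
    result: Dict[str, Any] = {f: data.get(f) or [] for f in EXPECTED_FIELDS}
    result["chunk_id"] = chunk_id        # hypothetical: kept for traceability
    result["parse_error"] = parse_error  # hypothetical: lets you re-run bad chunks
    return result


if __name__ == "__main__":
    reply = '{"relevant": [{"point": "dspy.Predict", "confidence": "high"}]}'
    all_results = [parse_subcall_result(reply, "chunk_0001")]
    print(all_results[0]["missing"])  # []
```

Keeping `relevant` items as dictionaries with `point` and `confidence` keys matches what the synthesis step reads.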
Synthesize in the Sandbox
result = interp.execute("""
import json
findings = []
for r in all_results:
for item in r.get('relevant', []):
if item['confidence'] in ('high', 'medium'):
findings.append(item)
seen = set()
unique = [f for f in findings if f['point'] not in seen and not seen.add(f['point'])]
SUBMIT(findings=unique, total=len(unique))
""", variables={'all_results': all_results})
Full RLM Mode — dspy.RLM with ModalInterpreter
For fully automated RLM execution (the LLM writes its own code):
```python
import pathlib

import dspy
from fleet_rlm import ModalInterpreter, AnalyzeLongDocument

# Assumes the planner LM has already been configured (e.g. from the local .env).
with ModalInterpreter(timeout=900, volume_name='rlm-volume-dspy') as interp:
    rlm = dspy.RLM(
        signature=AnalyzeLongDocument,
        interpreter=interp,
        max_iterations=20,
        max_llm_calls=30,
        verbose=True,
    )
    result = rlm(
        document=pathlib.Path('rlm_content/dspy-knowledge/dspy-doc.txt').read_text(),
        query="What are the main design decisions?",
    )
    print(f"Findings: {result.findings}")
    print(f"Answer: {result.answer}")
```