rlm

Orchestrate processing of documents that exceed context window limits using the rlm-rs CLI tool. This skill implements the RLM pattern from arXiv:2512.24601, enabling analysis of content up to 100x larger than typical context windows.

| RLM Concept | Implementation |
| --- | --- |
| Root LLM | Main Claude Code conversation (Opus/Sonnet) |
| Sub-LLM (llm_query) | rlm-subcall agent (Haiku) |
| External Environment | rlm-rs CLI with SQLite storage |

Verify rlm-rs is installed and available:

```sh
command -v rlm-rs >/dev/null 2>&1 || echo "INSTALL REQUIRED: cargo install rlm-rs"
```

Installation options:

```sh
# Via Cargo (recommended)
cargo install rlm-rs

# Via Homebrew
brew install zircote/tap/rlm-rs
```

Create or verify the RLM database:

```sh
rlm-rs init
rlm-rs status
```

If already initialized, status shows current buffers and state.

Load the large document into a buffer with appropriate chunking:

```sh
# Semantic chunking (recommended for structured content)
rlm-rs load <file_path> --name <buffer_name> --chunker semantic

# Fixed chunking (for unstructured text)
rlm-rs load <file_path> --name <buffer_name> --chunker fixed --chunk-size 6000

# With overlap for continuity
rlm-rs load <file_path> --name <buffer_name> --chunker fixed --chunk-size 6000 --overlap 1000
```

Examine the beginning and end to understand structure:

```sh
# View first 3000 characters
rlm-rs peek <buffer_name> --start 0 --end 3000

# View last 3000 characters
rlm-rs peek <buffer_name> --start -3000
```

Search for relevant sections:

```sh
rlm-rs grep <buffer_name> "<pattern>" --max-matches 20 --window 150
```

Use hybrid semantic + BM25 search to find chunks matching your query:

```sh
# Hybrid search (semantic + BM25 with rank fusion)
rlm-rs search "your query" --buffer <buffer_name> --top-k 100

# JSON output for programmatic use
rlm-rs --format json search "your query" --top-k 100
```

Output includes chunk IDs with relevance scores and document position (index):

```json
{
  "count": 2,
  "mode": "hybrid",
  "query": "your query",
  "results": [
    {"chunk_id": 42, "buffer_id": 1, "index": 5, "score": 0.0328, "semantic_score": 0.0499, "bm25_score": 1.6e-6},
    {"chunk_id": 17, "buffer_id": 1, "index": 2, "score": 0.0323, "semantic_score": 0.0457, "bm25_score": 1.2e-6}
  ]
}
```
  • index: Sequential position within the document (0-based); use it for temporal ordering
  • buffer_id: Which buffer/document the chunk belongs to

Extract chunk IDs sorted by document position:

```sh
jq -r '.results | sort_by(.index) | .[].chunk_id'
```
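The extraction step can also be sketched in Python. The payload below mirrors the sample search output shown above (the values are illustrative):

```python
import json

# Sample payload in the shape of rlm-rs search output (values illustrative).
search_output = json.dumps({
    "count": 2,
    "mode": "hybrid",
    "query": "your query",
    "results": [
        {"chunk_id": 42, "buffer_id": 1, "index": 5, "score": 0.0328},
        {"chunk_id": 17, "buffer_id": 1, "index": 2, "score": 0.0323},
    ],
})

def chunk_ids_in_document_order(raw: str) -> list[int]:
    """Return chunk IDs sorted by document position (index), not by score."""
    results = json.loads(raw)["results"]
    return [r["chunk_id"] for r in sorted(results, key=lambda r: r["index"])]

print(chunk_ids_in_document_order(search_output))  # → [17, 42]
```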

Get specific chunk content via pass-by-reference:

```sh
# Get chunk content
rlm-rs chunk get 42

# With metadata
rlm-rs --format json chunk get 42 --metadata
```

Only process chunks returned by search. Batch chunk IDs to reduce agent calls:

  1. Search returns chunk IDs with relevance scores and document indices
  2. Sort all chunk IDs by index (document position) to preserve temporal context
  3. Group sorted chunk IDs into batches (default 10, configurable via batch_size argument)
  4. Invoke rlm-subcall agent once per batch using only the two required arguments
  5. Launch batches in parallel via multiple Task calls in one response
  6. Agent handles retrieval internally via rlm-rs chunk get <id> (NO buffer ID needed)
  7. Collect structured JSON findings from all batches

IMPORTANT: Sort chunks by index before batching to preserve document flow. Each subagent should receive chunks in document order (e.g., 3,7,12,15,22, not 22,3,15,7,12). This ensures temporal context is maintained: definitions appear before usages, causes before effects.
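A minimal sketch of the sort-then-batch steps above (the chunk IDs, indices, and batch size are illustrative; real values come from search output):

```python
def batch_in_document_order(results: list[dict], batch_size: int = 10) -> list[list[int]]:
    """Sort search results by document index, then group chunk IDs into batches."""
    ordered = sorted(results, key=lambda r: r["index"])
    ids = [r["chunk_id"] for r in ordered]
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

# Unsorted search results; sorting by index restores document order.
sample = [
    {"chunk_id": 22, "index": 9},
    {"chunk_id": 3, "index": 1},
    {"chunk_id": 15, "index": 6},
    {"chunk_id": 7, "index": 3},
    {"chunk_id": 12, "index": 5},
]
print(batch_in_document_order(sample, batch_size=3))  # → [[3, 7, 12], [15, 22]]
```

Each inner list then becomes one rlm-subcall invocation.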

CORRECT Task invocation (pass ONLY the query and chunk_ids arguments, sorted by index):

```
Task subagent_type="rlm-rs:rlm-subcall" prompt="query='What errors occurred?' chunk_ids='3,7,12,15,22'"
Task subagent_type="rlm-rs:rlm-subcall" prompt="query='What errors occurred?' chunk_ids='28,31,45'"
```

CRITICAL - DO NOT:

  • Write narrative prompts - the agent already knows what to do
  • Include buffer ID or buffer NAME anywhere in the prompt
  • Mention the buffer at all - chunk IDs are globally unique across all buffers

WRONG (causes exit code 2):

```
prompt="Analyze chunks from buffer 1..."              # NO - has buffer ID
prompt="Analyze chunks from buffer 'myfile.txt'..."   # NO - has buffer name
prompt="Use rlm-rs chunk 1 <id>..."                   # NO - buffer ID in command
prompt="Use rlm-rs chunk get <id> --buffer x..."      # NO - --buffer flag doesn't exist
```

RIGHT:

```
prompt="query='the user question' chunk_ids='5,105,2,3,74'"   # YES - just args!
```
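Formatting this two-argument prompt from a batch can be sketched as follows (a hypothetical helper; only the output format comes from the examples above):

```python
def subcall_prompt(query: str, batch: list[int]) -> str:
    """Build the rlm-subcall prompt: only query and chunk_ids, no buffer reference."""
    ids = ",".join(str(i) for i in batch)
    return f"query='{query}' chunk_ids='{ids}'"

print(subcall_prompt("What errors occurred?", [3, 7, 12, 15, 22]))
# → query='What errors occurred?' chunk_ids='3,7,12,15,22'
```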

Once all chunks are processed:

  1. Collect all JSON findings from subcall agents
  2. Pass findings directly to rlm-synthesizer agent (no intermediate files)
  3. Present the final synthesized response to the user

Example Task tool invocation:

```
Task agent=rlm-synthesizer query="What errors occurred?" findings='[...]' chunk_ids="42,17,23"
```

  • Never paste large chunks into the main context: use peek/grep to extract only relevant excerpts
  • Keep subagent outputs compact: request JSON format with short evidence fields
  • Orchestration stays in the main conversation: subagents cannot spawn other subagents
  • State persists in SQLite: all buffers survive across sessions via .rlm/rlm-state.db
  • No file I/O for chunk passing: use pass-by-reference with chunk IDs
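Collecting the per-batch findings into one payload for rlm-synthesizer might look like this; the findings schema here is an assumption (the subcall agent's actual JSON output governs):

```python
import json

def merge_findings(batch_outputs: list[str]) -> str:
    """Concatenate the JSON arrays returned by each subcall batch into one array."""
    merged = []
    for raw in batch_outputs:
        merged.extend(json.loads(raw))  # each batch is assumed to return a JSON array
    return json.dumps(merged)

# Hypothetical outputs from two subcall batches.
payload = merge_findings([
    '[{"chunk_id": 42, "evidence": "connection timeout"}]',
    '[{"chunk_id": 17, "evidence": "retry exhausted"}]',
])
print(payload)
```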
| Content Type | Recommended Strategy |
| --- | --- |
| Markdown docs | semantic |
| Source code | semantic |
| JSON/XML | semantic |
| Plain logs | fixed with overlap |
| Unstructured text | fixed |
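A small selector matching the table can be sketched as follows (the extension-to-content-type mapping and the chunk sizes are assumptions):

```python
def chunking_args(filename: str) -> str:
    """Map a file to rlm-rs load flags per the strategy table above."""
    if filename.endswith((".md", ".py", ".rs", ".json", ".xml")):
        return "--chunker semantic"                               # structured content
    if filename.endswith(".log"):
        return "--chunker fixed --chunk-size 6000 --overlap 500"  # plain logs
    return "--chunker fixed --chunk-size 6000"                    # unstructured text

print(f"rlm-rs load server.log --name logs {chunking_args('server.log')}")
```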

For detailed chunking guidance, refer to the rlm-chunking skill.

| Command | Purpose |
| --- | --- |
| init | Initialize database |
| status | Show state summary |
| load | Load file into buffer |
| list | List all buffers |
| show | Show buffer details |
| peek | View buffer content slice |
| grep | Search with regex |
| search | Hybrid semantic + BM25 search |
| chunk get | Retrieve chunk by ID |
| chunk list | List buffer chunks |
| chunk embed | Generate embeddings |
| chunk status | Show embedding status |
| write-chunks | Export chunks to files (legacy) |
| add-buffer | Store intermediate results |
| export-buffers | Export all buffers |
| var | Get/set context variables |
| reset | Clear all state |
```sh
# 1. Initialize
rlm-rs init

# 2. Load a large log file
rlm-rs load server.log --name logs --chunker fixed --chunk-size 6000 --overlap 500

# 3. Search for relevant chunks
rlm-rs --format json search "database connection errors" --buffer logs --top-k 100

# 4. For each relevant chunk ID, invoke the rlm-subcall agent
# 5. Collect JSON findings
# 6. Pass findings to the rlm-synthesizer agent
# 7. Present the final answer
```
  • references/cli-reference.md - Complete CLI documentation
  • rlm-subcall agent - Chunk-level analysis (Haiku)
  • rlm-synthesizer agent - Result aggregation (Sonnet)
  • rlm-chunking skill - Chunking strategy selection