# Architecture
This document describes the architecture of the rlm-rs Claude Code plugin, which implements the Recursive Language Model (RLM) pattern from arXiv:2512.24601.
## Overview

The RLM pattern enables processing of documents that far exceed LLM context window limits (up to 100x larger) by using a hierarchical approach with chunking, distributed analysis, and synthesis.
```
┌─────────────────────────────────────────────────────────────────┐
│                     Claude Code (Root LLM)                      │
│                       Opus/Sonnet Model                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐      │
│  │   Skills    │  │  Commands   │  │       Agents        │      │
│  │             │  │             │  │                     │      │
│  │ • rlm       │  │ • rlm-init  │  │ • rlm-subcall       │      │
│  │ • rlm-      │  │ • rlm-load  │  │   (Haiku)           │      │
│  │   chunking  │  │ • rlm-status│  │                     │      │
│  │             │  │ • rlm-query │  │ • rlm-synthesizer   │      │
│  │             │  │             │  │   (Sonnet)          │      │
│  └─────────────┘  └─────────────┘  └─────────────────────┘      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ CLI Integration
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        rlm-rs CLI (Rust)                        │
│                      External Environment                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐      │
│  │  Chunking   │  │   Buffer    │  │       SQLite        │      │
│  │   Engine    │  │   Manager   │  │       Storage       │      │
│  │             │  │             │  │                     │      │
│  │ • Fixed     │  │ • Load      │  │ .rlm/rlm-state.db   │      │
│  │ • Semantic  │  │ • Peek      │  │                     │      │
│  │ • Parallel  │  │ • Grep      │  │ • Buffers           │      │
│  │             │  │ • Search    │  │ • Chunks            │      │
│  │             │  │ • Chunk Get │  │ • Embeddings        │      │
│  └─────────────┘  └─────────────┘  └─────────────────────┘      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Component Mapping
| RLM Paper Concept | Plugin Implementation |
|---|---|
| Root LLM | Main Claude Code conversation (Opus/Sonnet) |
| Sub-LLM (`llm_query`) | rlm-subcall agent (Haiku) |
| External Environment | rlm-rs CLI with SQLite storage |
| Context Buffer | SQLite buffer tables |
| Recursive Calls | Task tool invocations |
## Data Flow

### Processing Pipeline
```
1. User Request
       │
       ▼
2. Skill Activation (rlm skill)
       │
       ▼
3. CLI: Initialize Database
       │   rlm-rs init
       ▼
4. CLI: Load & Chunk Document
       │   rlm-rs load <file> --chunker <strategy>
       ▼
5. CLI: Search for Relevant Chunks
       │   rlm-rs search "<query>" --buffer <name> --top-k 100
       │   (Embeddings generated automatically on first search)
       ▼
6. Subcall Loop (targeted, parallel)
       │   For each relevant chunk ID from search:
       │     Task tool → rlm-subcall agent (chunk_id) → JSON findings
       │   Agent retrieves content via: rlm-rs chunk get <id>
       ▼
7. Synthesis
       │   Task tool → rlm-synthesizer agent (JSON findings) → Final answer
       ▼
8. Present to User
```

### Pass-by-Reference Pattern
The 1.0.0 workflow uses pass-by-reference for chunk content:
```
┌─────────────┐     chunk_id     ┌─────────────┐
│   Search    │ ───────────────→ │   Subcall   │
│   Results   │                  │    Agent    │
└─────────────┘                  └──────┬──────┘
                                        │
                                        │ rlm-rs chunk get <id>
                                        ▼
                                 ┌─────────────┐
                                 │   SQLite    │
                                 │   Storage   │
                                 └─────────────┘
```

Benefits:
- No file I/O: Chunks stay in SQLite, not written to disk
- Efficient: Only relevant chunks retrieved (via search)
- Atomic: Chunk retrieval by ID is guaranteed consistent
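As a minimal sketch of this pattern, the loop below passes only chunk IDs between steps and dereferences them on demand. The `HashMap` store and the `chunk_store`/`analyze_chunk` functions are illustrative stand-ins for the SQLite store and the subcall agent, not rlm-rs APIs:

```rust
use std::collections::HashMap;

// Stand-in for the SQLite chunks table: chunk_id -> content.
fn chunk_store() -> HashMap<u32, String> {
    let mut store = HashMap::new();
    store.insert(42, String::from("fn main() { println!(\"hello\"); }"));
    store.insert(43, String::from("// TODO: error handling"));
    store
}

// The orchestrator hands each agent only a chunk ID...
fn analyze_chunk(store: &HashMap<u32, String>, chunk_id: u32) -> Option<String> {
    // ...and the agent dereferences the ID on demand,
    // analogous to `rlm-rs chunk get <id>`.
    let content = store.get(&chunk_id)?;
    Some(format!("chunk {}: {} bytes", chunk_id, content.len()))
}

fn main() {
    let store = chunk_store();
    // Only the IDs travel through the conversation context.
    let relevant_ids = [42, 43];
    for id in relevant_ids {
        if let Some(finding) = analyze_chunk(&store, id) {
            println!("{}", finding);
        }
    }
}
```

The full chunk text never enters the root conversation; each agent pays the token cost only for the chunk it actually inspects.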
## State Persistence

All state persists in `.rlm/rlm-state.db`:
```sql
-- Buffers table
CREATE TABLE buffers (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE,
    content TEXT,
    chunker TEXT,
    chunk_size INTEGER,
    overlap INTEGER,
    created_at TIMESTAMP
);

-- Chunks table (with embeddings)
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    buffer_id INTEGER REFERENCES buffers(id),
    chunk_index INTEGER,
    start_offset INTEGER,
    end_offset INTEGER,
    content TEXT,
    embedding BLOB  -- Vector embedding for semantic search
);

-- Variables table (for context passing)
CREATE TABLE variables (
    key TEXT PRIMARY KEY,
    value TEXT
);
```

## Component Details
### Skills
#### rlm (Main Orchestrator)
Purpose: Orchestrates the complete RLM workflow
Trigger Phrases:
- “process a large file”
- “analyze document exceeding context”
- “use RLM”
- “handle long context”
Allowed Tools: Read, Write, Bash, Glob, Grep, Task
Workflow:
- Verify rlm-rs installation
- Initialize database
- Load document with appropriate chunking
- Scout content structure (peek, grep)
- Search for relevant chunks (hybrid semantic + BM25)
- Invoke subcall agents for each relevant chunk (by ID)
- Synthesize results from JSON findings
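The search → subcall → synthesis portion of this workflow is ordinary fan-out/fan-in control flow, sketched below. Everything here (`Finding`, `subcall`, `synthesize`, the placeholder relevance test) is a hypothetical stand-in for the Task-tool invocations, not actual plugin code:

```rust
// Stand-in for the JSON findings a subcall agent returns.
#[derive(Debug)]
struct Finding {
    chunk_id: u32,
    relevant: bool,
    note: String,
}

// Stand-in for one rlm-subcall invocation on a single chunk ID.
fn subcall(chunk_id: u32, query: &str) -> Finding {
    Finding {
        chunk_id,
        relevant: chunk_id % 2 == 0, // placeholder relevance test
        note: format!("checked against '{}'", query),
    }
}

// Stand-in for the rlm-synthesizer pass over all findings.
fn synthesize(findings: &[Finding]) -> String {
    let relevant: Vec<u32> = findings
        .iter()
        .filter(|f| f.relevant)
        .map(|f| f.chunk_id)
        .collect();
    format!(
        "{} of {} chunks relevant: {:?}",
        relevant.len(),
        findings.len(),
        relevant
    )
}

fn main() {
    // Step 5 would produce these IDs via `rlm-rs search`.
    let ids = [2, 3, 4];
    let findings: Vec<Finding> =
        ids.iter().map(|&id| subcall(id, "error handling")).collect();
    println!("{}", synthesize(&findings));
}
```

In the real plugin the fan-out runs as parallel Haiku agents and the fan-in as a single Sonnet agent, but the data flow is the same shape.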
#### rlm-chunking (Strategy Guide)

Purpose: Help select optimal chunking parameters
Trigger Phrases:
- “chunking strategies”
- “how to chunk a file”
- “semantic vs fixed chunking”
Allowed Tools: Read, Bash
### Commands

| Command | Arguments | Description |
|---|---|---|
| `/rlm-init` | `--force` (optional) | Initialize SQLite database |
| `/rlm-load` | file, name, chunker, chunk-size, overlap | Load file into buffer |
| `/rlm-status` | none | Show current state |
| `/rlm-query` | query, buffer, top_k, batch_size | Run analysis query |
### Agents

#### rlm-subcall (Chunk Analyzer)

- Model: Haiku (fast, cost-effective)
- Color: Cyan
- Tools: Bash, Grep

Input: Query + chunk ID (pass-by-reference)

Output: Structured JSON with findings
The agent retrieves chunk content via `rlm-rs chunk get <id>`:
{ "chunk_id": 42, "relevant": true, "findings": [...], "metadata": {...}}rlm-synthesizer (Result Aggregator)
- Model: Sonnet (more capable for synthesis)
- Color: Green
- Tools: Bash, Read

Input: Original query + JSON findings (inline or buffer name)

Output: Coherent markdown response with:
- Executive summary
- Key findings
- Analysis
- Recommendations
The synthesizer accepts findings directly as JSON (preferred) or as a buffer name to retrieve.
## Chunking Strategies

### Fixed Chunking
Section titled “Fixed Chunking”Document: [==========================================] ↓Chunks: [=====][=====][=====][=====][=====][=====] ← size →- Splits at exact byte boundaries
- Predictable chunk sizes
- May split mid-concept
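A minimal fixed chunker along these lines (illustrative only; it assumes ASCII input and sidesteps the UTF-8 boundary handling a real implementation needs):

```rust
// Split a document at exact byte offsets into fixed-size chunks.
// Chunk sizes are predictable, but boundaries can fall mid-word.
fn fixed_chunks(text: &str, size: usize) -> Vec<&str> {
    text.as_bytes()
        .chunks(size)
        .map(|c| std::str::from_utf8(c).expect("ASCII input assumed"))
        .collect()
}

fn main() {
    let doc = "The quick brown fox jumps over the lazy dog";
    for chunk in fixed_chunks(doc, 10) {
        // Note how "quick " and "brown" land in different chunks.
        println!("[{}]", chunk);
    }
}
```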
### Semantic Chunking

```
Document:  [# Heading 1   ][# Heading 2   ]
           [ paragraph... ][ paragraph... ]
           [ code block...][ - list item  ]
                               ↓
Chunks:    [# Heading 1   ][# Heading 2   ]
           [ content...   ][ content...   ]
```

- Respects document structure
- Variable chunk sizes
- Maintains semantic coherence
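A toy version of the idea, starting a new chunk at each markdown heading (the real rlm-rs strategy is presumably more involved; this only shows the structural principle):

```rust
// Split a markdown document into chunks at heading boundaries,
// so chunk sizes vary but each chunk stays semantically coherent.
fn semantic_chunks(text: &str) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    for line in text.lines() {
        // A heading opens a new chunk; other lines extend the current one.
        if line.starts_with('#') || chunks.is_empty() {
            chunks.push(String::new());
        }
        let chunk = chunks.last_mut().unwrap();
        if !chunk.is_empty() {
            chunk.push('\n');
        }
        chunk.push_str(line);
    }
    chunks
}

fn main() {
    let doc = "# Intro\nsome text\n# Usage\nmore text\nand more";
    for (i, c) in semantic_chunks(doc).iter().enumerate() {
        println!("chunk {}: {:?}", i, c);
    }
}
```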
### Overlap

```
Chunk 1: [===============]
                    [===] ← overlap
Chunk 2:            [===============]
                               [===] ← overlap
Chunk 3:                       [===============]
```

- Ensures context continuity
- Prevents loss of boundary information
- Increases total chunks slightly
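The effect on chunk counts follows from simple arithmetic: with overlap, each chunk after the first advances by only `size - overlap` bytes. A sketch (parameter names mirror the `/rlm-load` arguments, but this formula is an assumption about the chunker's behavior, not its actual code):

```rust
// Number of chunks for a document of `doc_len` bytes, given
// chunk `size` and `overlap` in bytes. Each chunk after the
// first advances by (size - overlap), so overlap inflates the
// total slightly.
fn chunk_count(doc_len: usize, size: usize, overlap: usize) -> usize {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    if doc_len <= size {
        return 1;
    }
    let stride = size - overlap;
    // First chunk covers `size` bytes; ceiling division for the rest.
    1 + (doc_len - size + stride - 1) / stride
}

fn main() {
    // 10 KB document, 1 KB chunks:
    println!("no overlap:    {}", chunk_count(10_000, 1_000, 0)); // 10 chunks
    println!("200 B overlap: {}", chunk_count(10_000, 1_000, 200)); // 13 chunks
}
```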
## Security Considerations

- File Access: Plugin only accesses files explicitly provided by the user
- State Isolation: Each project has its own `.rlm/` directory
- No Network Access: CLI operates entirely locally
- Subprocess Containment: Agents run in Claude Code’s sandboxed environment
## Performance Characteristics

| Metric | Typical Value |
|---|---|
| Chunk processing | ~2-5s per chunk (Haiku) |
| Synthesis | ~5-15s (Sonnet) |
| Max document size | ~20MB practical limit |
| Typical chunks | 10-100 for large docs |
## Extension Points

### Adding New Chunking Strategies

Implement in the rlm-rs CLI (Rust):
- Add variant to `ChunkStrategy` enum
- Implement `Chunker` trait
- Register in CLI argument parser
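A sketch of what such an extension might look like. `ChunkStrategy` and `Chunker` are named in the steps above, but the signatures and the `Sentence` strategy here are invented for illustration and will not match the real rlm-rs definitions exactly:

```rust
// Hypothetical trait signature; the real rlm-rs `Chunker` trait
// may differ. Each chunk is a (start_offset, end_offset) byte span.
trait Chunker {
    fn chunk(&self, text: &str) -> Vec<(usize, usize)>;
}

// Step 1: add a variant (existing variants per the docs above).
#[allow(dead_code)]
enum ChunkStrategy {
    Fixed,
    Semantic,
    Sentence, // the new strategy
}

// Step 2: implement the trait for the new strategy.
struct SentenceChunker;

impl Chunker for SentenceChunker {
    fn chunk(&self, text: &str) -> Vec<(usize, usize)> {
        let mut spans = Vec::new();
        let mut start = 0;
        for (i, ch) in text.char_indices() {
            if ch == '.' {
                spans.push((start, i + 1));
                start = i + 1;
            }
        }
        if start < text.len() {
            spans.push((start, text.len()));
        }
        spans
    }
}

fn main() {
    // Step 3 would wire "sentence" into the CLI argument parser.
    let spans = SentenceChunker.chunk("One. Two. Three");
    println!("{:?}", spans); // one byte span per sentence
}
```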
### Adding New Finding Types

Update the rlm-subcall agent:
- Add type to finding types list
- Document expected evidence format
- Update synthesizer to handle new type