Technical Overview
> Status: Early development / Experimental
> Version: 1.2.0 (Unreleased)
> Last Updated: January 2026
Introduction
rlm-rs is a Rust implementation of the Recursive Language Model (RLM) pattern described in arXiv:2512.24601. This project extends the original paper’s concepts with modern information retrieval techniques to create a practical tool for processing documents that exceed typical LLM context windows.
This implementation is designed to be assistant-agnostic—while the current plugin targets Claude Code, the underlying rlm-rs CLI and skill architecture can be adapted to any AI assistant platform that supports skills, tools, or plugin systems.
Research Foundations
The RLM Pattern (arXiv:2512.24601)
The Recursive Language Model pattern addresses a fundamental limitation of LLMs: fixed context windows. The key insight is that an LLM can orchestrate its own “sub-calls” to process content in manageable chunks, then synthesize results into a coherent response.
Core concepts from the paper:
| Concept | Description | Our Implementation |
|---|---|---|
| Root LLM | Orchestrates the overall workflow | Main assistant conversation |
| Sub-LLM | Processes individual chunks | rlm-subcall agent (lightweight model) |
| External Environment | Stores state between calls | rlm-rs CLI with SQLite |
| Context Buffer | Holds intermediate results | SQLite buffer tables |
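The division of labor in the table can be sketched in a few lines of Rust. This is an illustration under assumptions, not the rlm-rs implementation: `process_chunks`, `synthesize`, and `keyword_sub_llm` are hypothetical names, and the sub-LLM is replaced by a plain keyword check so the example runs without a model.

```rust
/// Fan a query out over chunks. `sub_llm` stands in for the Sub-LLM call:
/// it sees one chunk at a time and returns a finding, or None if the
/// chunk is irrelevant.
fn process_chunks<F>(query: &str, chunks: &[&str], sub_llm: F) -> Vec<String>
where
    F: Fn(&str, &str) -> Option<String>,
{
    chunks
        .iter()
        .filter_map(|&chunk| sub_llm(query, chunk))
        .collect()
}

/// Root step: combine per-chunk findings into one answer.
/// A real root LLM would reason over the findings; this sketch only joins them.
fn synthesize(findings: &[String]) -> String {
    if findings.is_empty() {
        "no relevant content found".to_string()
    } else {
        findings.join("; ")
    }
}

/// Toy sub-LLM (assumption): a chunk is "relevant" when it contains the
/// query string, and the finding is the trimmed chunk itself.
fn keyword_sub_llm(query: &str, chunk: &str) -> Option<String> {
    if chunk.contains(query) {
        Some(chunk.trim().to_string())
    } else {
        None
    }
}
```

Passing `keyword_sub_llm` to `process_chunks` and handing the result to `synthesize` mirrors one root-level pass. In practice the closure would wrap a model call, and the findings would live in the external environment instead of a local vector.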
Hybrid Search: BM25 + Semantic + RRF
A key extension beyond the original paper is our hybrid search system that combines multiple retrieval strategies:
BM25 (Best Match 25)
BM25 is a probabilistic ranking function used by search engines since the 1990s. It excels at:
- Exact keyword matching
- Term frequency analysis
- Document length normalization
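$$
\text{score}(D, Q) = \sum_{i} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}
$$

Where:
- $f(q_i, D)$ = frequency of term $q_i$ in document $D$
- $|D|$ = document length
- $\text{avgdl}$ = average document length
- $k_1$, $b$ = tuning parameters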
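To make the formula concrete, here is a minimal Rust sketch that scores one tokenized document against a query. Several things are assumptions for illustration: the function name `bm25_score`, the pre-tokenized inputs, and in particular the IDF variant, which the formula above leaves unspecified. The values `k1 = 1.2` and `b = 0.75` used in testing are commonly cited defaults, not necessarily what rlm-rs uses.

```rust
use std::collections::HashMap;

/// BM25 score of one document for a query.
/// `query` and `doc` are pre-tokenized; `doc_freq` maps a term to the number
/// of documents containing it; `n_docs` is the size of the collection.
fn bm25_score(
    query: &[&str],
    doc: &[&str],
    doc_freq: &HashMap<&str, usize>,
    n_docs: usize,
    avgdl: f64,
    k1: f64,
    b: f64,
) -> f64 {
    let doc_len = doc.len() as f64;
    query
        .iter()
        .map(|&term| {
            // f(qi, D): how often the term occurs in this document.
            let f = doc.iter().filter(|&&t| t == term).count() as f64;
            // Number of documents that contain the term.
            let n = *doc_freq.get(term).unwrap_or(&0) as f64;
            // ASSUMPTION: smoothed, always-positive IDF variant.
            let idf = (1.0 + (n_docs as f64 - n + 0.5) / (n + 0.5)).ln();
            idf * (f * (k1 + 1.0)) / (f + k1 * (1.0 - b + b * doc_len / avgdl))
        })
        .sum()
}
```

A term that never occurs in the document contributes exactly zero, and documents longer than `avgdl` are penalized through the `b` term, which is the length normalization listed above.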
Semantic Search (Vector Embeddings)
Semantic search uses dense vector representations to find conceptually similar content, even without keyword overlap:
- Captures meaning and context
- Handles synonyms and paraphrases
- Works across languages and terminology
Embeddings are generated on-demand and cached in SQLite for subsequent queries.
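Comparing embeddings usually comes down to a vector similarity measure. The sketch below uses cosine similarity, which is an assumption: this page does not say which measure rlm-rs applies. The names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the vectors are plain `f32` slices instead of rows read from SQLite.

```rust
/// Cosine similarity between two embedding vectors.
/// Returns None when lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Rank chunk ids by similarity to the query embedding, best first.
/// Chunks whose similarity is undefined are skipped.
fn rank_by_similarity(query: &[f32], chunks: &[(u32, Vec<f32>)]) -> Vec<u32> {
    let mut scored: Vec<(u32, f32)> = chunks
        .iter()
        .filter_map(|(id, emb)| cosine_similarity(query, emb).map(|s| (*id, s)))
        .collect();
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored.into_iter().map(|(id, _)| id).collect()
}
```

The resulting id list is a ranking in its own right, which is the form a rank-based fusion step needs as input.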
Reciprocal Rank Fusion (RRF)
RRF combines multiple ranking strategies into a single unified score. Originally introduced by Cormack, Clarke, and Buettcher (2009), it’s elegant in its simplicity:

$$
\text{RRF\_score}(d) = \sum_{i} \frac{1}{k + \text{rank}_i(d)}
$$

Where:
- $k$ = constant (default: 60) that controls influence of high-ranked documents
- $\text{rank}_i(d)$ = rank of document $d$ in ranking system $i$
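Why RRF? It works on rank positions alone, so BM25 scores and embedding similarities never need to be put on a common scale. It has a single parameter (k), and in the evaluation by Cormack et al. it outperformed the individual rankers being combined.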
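The RRF formula translates almost directly into code. The following Rust sketch fuses any number of rankings, each given as a list of document ids ordered best first. The name `rrf_fuse` and the use of `u32` ids are assumptions for illustration; ranks are taken to be 1-based, and a document absent from a ranking simply contributes nothing for that ranker.

```rust
use std::collections::HashMap;

/// Fuse several rankings (document ids, best first) with Reciprocal Rank Fusion.
/// Returns (id, fused score) pairs sorted by descending score.
fn rrf_fuse(rankings: &[Vec<u32>], k: f64) -> Vec<(u32, f64)> {
    let mut scores: HashMap<u32, f64> = HashMap::new();
    for ranking in rankings {
        for (i, &doc) in ranking.iter().enumerate() {
            // Position 0 in the list is rank 1.
            let rank = (i + 1) as f64;
            *scores.entry(doc).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

With `k = 60`, a BM25 ranking of `[10, 20, 30]` and a semantic ranking of `[20, 40, 10]` fuse to the order 20, 10, 40, 30: documents found by both rankers rise above documents found by only one.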
Chunking Strategies
The implementation supports multiple chunking approaches:
| Strategy | Best For | Trade-offs |
|---|---|---|
| Semantic | Markdown, code, structured docs | Respects boundaries, variable sizes |
| Fixed | Logs, unstructured text | Predictable sizes, may split concepts |
| Parallel | Very large files (>10MB) | Fast processing, same trade-offs as Fixed |
Semantic chunking respects document structure (headings, paragraphs, code blocks) to maintain coherent context within each chunk.
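The difference between the Fixed and Semantic strategies can be shown with two small Rust functions. Both are simplified sketches with hypothetical names: `fixed_chunks` cuts every `size` characters, while `paragraph_chunks` approximates structure-aware chunking by treating blank lines as the only boundary. The actual heuristics in rlm-rs (headings, code blocks) are not reproduced here.

```rust
/// Fixed chunking: cut every `size` characters, regardless of content.
fn fixed_chunks(text: &str, size: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(size.max(1)) // guard against a zero chunk size
        .map(|c| c.iter().collect::<String>())
        .collect()
}

/// Simplified structure-aware chunking: split on blank lines and pack whole
/// paragraphs into a chunk until adding one more would exceed `max_len` bytes.
/// A paragraph longer than `max_len` becomes its own oversized chunk.
fn paragraph_chunks(text: &str, max_len: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        // Close the current chunk if this paragraph would not fit.
        if !current.is_empty() && current.len() + 2 + para.len() > max_len {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

The fixed splitter yields predictable sizes but can cut a sentence in half. The paragraph splitter yields variable sizes but never splits a paragraph, which is the trade-off summarized in the table.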
Architecture
SQLite as External Environment
All state persists in a local SQLite database (.rlm/rlm-state.db):

    ┌─────────────────────────────────────────┐
    │             SQLite Database             │
    ├─────────────────────────────────────────┤
    │ buffers    │ Loaded documents           │
    │ chunks     │ Document segments          │
    │ embeddings │ Vector representations     │
    │ variables  │ Context state              │
    └─────────────────────────────────────────┘

Why SQLite?
- Zero configuration, single file
- ACID transactions for consistency
- Portable across platforms
- No external services required
Pass-by-Reference Pattern
Instead of writing chunks to files, rlm-rs 1.0.0 uses pass-by-reference:

    Search Results → chunk_id → Sub-Agent → rlm-rs chunk get <id> → Content

Benefits:
- No per-chunk file I/O overhead
- Only relevant chunks are retrieved
- Atomic, consistent access via database
- Reduced disk usage
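The flow above can be modeled in Rust with an in-memory map standing in for the SQLite chunk table. Everything here is a hypothetical illustration: `ChunkStore`, its methods, and `sub_agent_summary` are invented names, and the substring match is only a placeholder for the hybrid search. The point is the shape of the data flow: search hands out ids, and content is fetched by id only where it is consumed.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the SQLite chunk table.
struct ChunkStore {
    chunks: HashMap<u64, String>,
}

impl ChunkStore {
    /// Search returns only chunk ids, never the chunk text itself.
    fn search(&self, needle: &str) -> Vec<u64> {
        let mut ids: Vec<u64> = self
            .chunks
            .iter()
            .filter(|(_, text)| text.contains(needle))
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }

    /// Plays the role of `rlm-rs chunk get <id>`: resolve an id to content.
    fn get(&self, id: u64) -> Option<&str> {
        self.chunks.get(&id).map(String::as_str)
    }
}

/// A sub-agent receives an id and fetches the content only when it needs it.
fn sub_agent_summary(store: &ChunkStore, id: u64) -> String {
    match store.get(id) {
        Some(text) => format!("chunk {}: {} bytes", id, text.len()),
        None => format!("chunk {}: not found", id),
    }
}
```

Because only ids travel between the search step and the sub-agent, the orchestrating conversation never has to hold chunk text it does not use.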
Portability
Assistant-Agnostic Design
The rlm-rs CLI is a standalone Rust binary with no dependencies on any specific AI platform. It can be integrated with:
- Claude Code (current implementation via skills/commands/agents)
- OpenAI GPTs (via function calling)
- LangChain (as a tool)
- Any assistant with tool/skill support
Skill-Based Architecture
The plugin uses a skill-based architecture where:
- Skills are markdown files with frontmatter metadata
- Commands map to CLI invocations
- Agents are sub-LLM definitions with specific roles
This pattern translates to most modern AI assistant frameworks:
| This Plugin | OpenAI | LangChain | Generic |
|---|---|---|---|
| Skill | Custom GPT Instructions | Chain | Workflow |
| Command | Function | Tool | Action |
| Agent | Assistant | Agent | Sub-agent |
Adapting to Other Platforms
To use rlm-rs with a different assistant:
- Install the CLI:

      cargo install rlm-rs

- Define tools that wrap CLI commands
- Create a workflow matching the RLM pattern:
  - Initialize → Load → Search → Process chunks → Synthesize
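Wrapping a CLI command as a tool might look like the following Rust sketch. It uses only the search invocation that appears in this page's generic tool definition (`rlm-rs --format json search <query> --top-k <n>`); the function names `search_args` and `run_search` are hypothetical, and the structure of the JSON output is not assumed, so it is returned as raw text.

```rust
use std::process::Command;

/// Build the argument list for:
///   rlm-rs --format json search <query> --top-k <n>
fn search_args(query: &str, top_k: usize) -> Vec<String> {
    vec![
        "--format".to_string(),
        "json".to_string(),
        "search".to_string(),
        query.to_string(),
        "--top-k".to_string(),
        top_k.to_string(),
    ]
}

/// Run the search and return raw stdout (JSON text) on success.
/// Requires the `rlm-rs` binary to be on PATH.
fn run_search(query: &str, top_k: usize) -> std::io::Result<String> {
    let output = Command::new("rlm-rs")
        .args(search_args(query, top_k))
        .output()?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        // Surface the CLI's stderr as the error message.
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ))
    }
}
```

Passing the query as a separate argument, instead of interpolating it into a shell string, sidesteps quoting problems when the query contains spaces or quotes.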
Example tool definition (generic):

    {
      "name": "rlm_search",
      "description": "Search loaded documents using hybrid semantic + BM25",
      "parameters": {
        "query": "Search query text",
        "top_k": "Maximum results (default: 100)",
        "buffer": "Filter by buffer name"
      },
      "command": "rlm-rs --format json search \"$query\" --top-k $top_k"
    }

Current Limitations
> This is early-stage software under active development.
Known Limitations
- Embedding Model: Currently uses a single embedding model; future versions may support model selection
- Search Tuning: BM25 and RRF parameters are not yet user-configurable
- Large Scale: Tested primarily on documents up to ~20MB
- Concurrent Access: Single-user, single-process access to database
Experimental Features
- Semantic chunking heuristics are being refined
- Overlap handling in hybrid search
- Performance optimization for very large documents
Roadmap
Potential future directions (not committed):
- Configurable embedding models
- User-tunable search parameters
- Multi-document cross-referencing
- Streaming chunk processing
- Remote/shared database support
References
Papers
- RLM Pattern: arXiv:2512.24601 - Recursive Language Models for long-context processing
- BM25: Robertson, S., & Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond
- RRF: Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods
Resources
Contributing
This project is open source under the MIT license. Contributions, bug reports, and feedback are welcome.
See CONTRIBUTING.md for development guidelines.
This document describes rlm-rs v1.0.0. Features and behavior may change in future versions.