
# Technical Overview

> Status: Early development / Experimental
> Version: 1.2.0 (Unreleased)
> Last Updated: January 2026

rlm-rs is a Rust implementation of the Recursive Language Model (RLM) pattern described in arXiv:2512.24601. This project extends the original paper’s concepts with modern information retrieval techniques to create a practical tool for processing documents that exceed typical LLM context windows.

This implementation is designed to be assistant-agnostic—while the current plugin targets Claude Code, the underlying rlm-rs CLI and skill architecture can be adapted to any AI assistant platform that supports skills, tools, or plugin systems.

The Recursive Language Model pattern addresses a fundamental limitation of LLMs: fixed context windows. The key insight is that an LLM can orchestrate its own “sub-calls” to process content in manageable chunks, then synthesize results into a coherent response.

Core concepts from the paper:

| Concept | Description | Our Implementation |
|---|---|---|
| Root LLM | Orchestrates the overall workflow | Main assistant conversation |
| Sub-LLM | Processes individual chunks | rlm-subcall agent (lightweight model) |
| External Environment | Stores state between calls | rlm-rs CLI with SQLite |
| Context Buffer | Holds intermediate results | SQLite buffer tables |
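
To make the division of labor concrete, here is a deliberately simplified sketch of the loop in Rust. The `llm` closure is a hypothetical stand-in for whatever completion API the host assistant exposes, and the prompting is far cruder than the real workflow described below:

```rust
/// Conceptual skeleton of the RLM pattern: sub-calls process chunks
/// independently, then the root call synthesizes the partial answers.
/// `llm` is a hypothetical completion function, not a real API.
fn rlm_answer(llm: &dyn Fn(&str) -> String, chunks: &[String], question: &str) -> String {
    // Sub-LLM phase: each chunk is handled in its own focused sub-call.
    let partials: Vec<String> = chunks
        .iter()
        .map(|chunk| llm(&format!("Answer '{question}' using only:\n{chunk}")))
        .collect();

    // Root-LLM phase: synthesize the intermediate results.
    llm(&format!(
        "Combine these partial answers to '{question}' into one response:\n{}",
        partials.join("\n---\n")
    ))
}
```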

A key extension beyond the original paper is our hybrid search system that combines multiple retrieval strategies:

BM25 is a probabilistic ranking function used by search engines since the 1990s. It excels at:

  • Exact keyword matching
  • Term frequency analysis
  • Document length normalization

```
score(D, Q) = Σ IDF(q_i) · (f(q_i, D) · (k1 + 1)) / (f(q_i, D) + k1 · (1 - b + b · |D| / avgdl))
```

Where:

  • f(q_i, D) = frequency of term q_i in document D
  • |D| = document length
  • avgdl = average document length across the corpus
  • k1, b = tuning parameters (k1 controls term-frequency saturation, b controls length normalization)
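
A direct Rust transcription of the formula may help. This is an illustrative sketch, not the rlm-rs scoring code, and it assumes the term frequencies and IDF values have been computed elsewhere:

```rust
use std::collections::HashMap;

/// Illustrative BM25 score of one document against a query.
/// `tf` maps a query term to its frequency in the document; `idf`
/// holds precomputed inverse document frequencies for the corpus.
fn bm25_score(
    tf: &HashMap<&str, f64>,
    idf: &HashMap<&str, f64>,
    query: &[&str],
    doc_len: f64,     // |D|
    avg_doc_len: f64, // avgdl
    k1: f64,          // term-frequency saturation, often ~1.2
    b: f64,           // length normalization, often ~0.75
) -> f64 {
    query
        .iter()
        .map(|term| {
            let f = tf.get(term).copied().unwrap_or(0.0);
            let idf_t = idf.get(term).copied().unwrap_or(0.0);
            idf_t * (f * (k1 + 1.0)) / (f + k1 * (1.0 - b + b * doc_len / avg_doc_len))
        })
        .sum()
}
```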

Semantic search uses dense vector representations to find conceptually similar content, even without keyword overlap:

  • Captures meaning and context
  • Handles synonyms and paraphrases
  • Works across languages and terminology

Embeddings are generated on-demand and cached in SQLite for subsequent queries.
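
For intuition, dense retrieval typically ranks chunks by cosine similarity between the query embedding and each cached chunk embedding. A generic sketch of that measure (not rlm-rs internals):

```rust
/// Cosine similarity between two embedding vectors: 1.0 means same
/// direction, 0.0 means orthogonal (unrelated) in the embedding space.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```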

RRF combines multiple ranking strategies into a single unified score. Originally introduced by Cormack, Clarke, and Buettcher (2009), it’s elegant in its simplicity:

```
RRF_score(d) = Σ_i 1 / (k + rank_i(d))
```

Where:

  • k = smoothing constant (default: 60); larger values damp the influence of top-ranked documents
  • rank_i(d) = rank of document d in ranking system i

Why RRF? It’s robust, parameter-light, and consistently outperforms individual rankers in practice.
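
A minimal sketch of the fusion step, assuming each ranker hands back document IDs ordered best-first (illustrative, not the rlm-rs source):

```rust
use std::collections::HashMap;

/// Fuse several rankings into one RRF score per document.
/// `rankings` holds ordered document IDs (best first) from each
/// ranker; `k` is the constant from the formula above (default 60).
fn rrf_fuse(rankings: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // Ranks are 1-based in the formula; enumerate() is 0-based.
            *scores.entry(doc.clone()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```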

The implementation supports multiple chunking approaches:

| Strategy | Best For | Trade-offs |
|---|---|---|
| Semantic | Markdown, code, structured docs | Respects boundaries; variable chunk sizes |
| Fixed | Logs, unstructured text | Predictable sizes; may split concepts |
| Parallel | Very large files (>10 MB) | Fast processing; same trade-offs as Fixed |

Semantic chunking respects document structure (headings, paragraphs, code blocks) to maintain coherent context within each chunk.
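
As a rough illustration of boundary-aware splitting, a naive version might start a new chunk at every markdown heading. The actual heuristics also account for paragraphs, code blocks, and size limits, but the sketch shows the idea:

```rust
/// Naive structure-aware chunking: start a new chunk at each markdown
/// heading so a section never straddles two chunks.
fn chunk_by_headings(text: &str) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    for line in text.lines() {
        if line.starts_with('#') && !current.is_empty() {
            // Close the previous section before the new heading begins.
            chunks.push(std::mem::take(&mut current));
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```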

All state persists in a local SQLite database (.rlm/rlm-state.db):

```
┌─────────────────────────────────────────┐
│ SQLite Database                         │
├─────────────────────────────────────────┤
│ buffers     │ Loaded documents          │
│ chunks      │ Document segments         │
│ embeddings  │ Vector representations    │
│ variables   │ Context state             │
└─────────────────────────────────────────┘
```

Why SQLite?

  • Zero configuration, single file
  • ACID transactions for consistency
  • Portable across platforms
  • No external services required
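
For illustration, tables along these lines could be created with the rusqlite crate. The column layouts below are assumptions for the sketch, not the actual rlm-state.db schema:

```rust
use rusqlite::{Connection, Result};

/// Open (or create) the state database and ensure the core tables
/// exist. Column layouts here are illustrative, not the real schema.
fn open_state_db(path: &str) -> Result<Connection> {
    let conn = Connection::open(path)?;
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS buffers (
             id   INTEGER PRIMARY KEY,
             name TEXT NOT NULL UNIQUE,
             path TEXT NOT NULL
         );
         CREATE TABLE IF NOT EXISTS chunks (
             id        INTEGER PRIMARY KEY,
             buffer_id INTEGER NOT NULL REFERENCES buffers(id),
             content   TEXT NOT NULL
         );
         CREATE TABLE IF NOT EXISTS embeddings (
             chunk_id INTEGER PRIMARY KEY REFERENCES chunks(id),
             vector   BLOB NOT NULL
         );
         CREATE TABLE IF NOT EXISTS variables (
             key   TEXT PRIMARY KEY,
             value TEXT NOT NULL
         );",
    )?;
    Ok(conn)
}
```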

Instead of writing chunks to files, rlm-rs (since version 1.0.0) passes chunks by reference:

```
Search Results → chunk_id → Sub-Agent → rlm-rs chunk get <id> → Content
```

Benefits:

  • No file I/O overhead
  • Only relevant chunks are retrieved
  • Atomic, consistent access via database
  • Reduced disk usage
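
In practice, a sub-agent (or any other wrapper) resolves a chunk_id to its content by shelling out to the CLI. A minimal sketch using only the rlm-rs chunk get command shown above:

```rust
use std::process::Command;

/// Resolve a chunk_id to its content via the CLI instead of reading
/// a chunk file from disk. Only the ID crosses the agent boundary.
fn fetch_chunk(chunk_id: &str) -> std::io::Result<String> {
    let output = Command::new("rlm-rs")
        .args(["chunk", "get", chunk_id])
        .output()?;
    if !output.status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```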

The rlm-rs CLI is a standalone Rust binary with no dependencies on any specific AI platform. It can be integrated with:

  • Claude Code (current implementation via skills/commands/agents)
  • OpenAI GPTs (via function calling)
  • LangChain (as a tool)
  • Any assistant with tool/skill support

The plugin uses a skill-based architecture where:

  • Skills are markdown files with frontmatter metadata
  • Commands map to CLI invocations
  • Agents are sub-LLM definitions with specific roles

This pattern translates to most modern AI assistant frameworks:

| This Plugin | OpenAI | LangChain | Generic |
|---|---|---|---|
| Skill | Custom GPT Instructions | Chain | Workflow |
| Command | Function | Tool | Action |
| Agent | Assistant | Agent | Sub-agent |

To use rlm-rs with a different assistant:

  1. Install the CLI: cargo install rlm-rs
  2. Define tools that wrap CLI commands
  3. Create workflow matching the RLM pattern:
    • Initialize → Load → Search → Process chunks → Synthesize

Example tool definition (generic):

```json
{
  "name": "rlm_search",
  "description": "Search loaded documents using hybrid semantic + BM25",
  "parameters": {
    "query": "Search query text",
    "top_k": "Maximum results (default: 100)",
    "buffer": "Filter by buffer name"
  },
  "command": "rlm-rs --format json search \"$query\" --top-k $top_k"
}
```

> This is early-stage software under active development.

Known limitations:

  1. Embedding Model: Currently uses a single embedding model; future versions may support model selection
  2. Search Tuning: BM25 and RRF parameters are not yet user-configurable
  3. Large Scale: Tested primarily on documents up to ~20MB
  4. Concurrent Access: Single-user, single-process access to the database

Areas under active refinement:

  • Semantic chunking heuristics
  • Overlap handling in hybrid search
  • Performance optimization for very large documents

Potential future directions (not committed):

  • Configurable embedding models
  • User-tunable search parameters
  • Multi-document cross-referencing
  • Streaming chunk processing
  • Remote/shared database support
References:

  1. RLM pattern: arXiv:2512.24601. Recursive Language Models for long-context processing.
  2. BM25: Robertson, S., & Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond.
  3. RRF: Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods.

This project is open source under the MIT license. Contributions, bug reports, and feedback are welcome.

See CONTRIBUTING.md for development guidelines.


This document describes rlm-rs v1.2.0 (unreleased). Features and behavior may change in future versions.