
Plugin Integration

This guide explains how to integrate rlm-rs with AI coding assistants through plugins, skills, and commands. While the examples focus on Claude Code, the patterns apply to any AI assistant that can execute shell commands.

rlm-cli is designed as a CLI-first tool that AI assistants invoke via shell execution. This architecture enables:

  • Universal Compatibility: Any assistant with shell access can use rlm-cli
  • No Custom APIs: Standard stdin/stdout/stderr communication
  • JSON Output: Machine-readable format for programmatic integration
  • Stateless Commands: Each invocation is independent (state lives in SQLite)

The rlm-rs Claude Code plugin implements the RLM pattern:

User-invocable shortcuts for common operations:

| Command | Description | Maps To |
| --- | --- | --- |
| `/rlm-load` | Load file into RLM | `rlm-cli load <file>` |
| `/rlm-search` | Search loaded content | `rlm-cli search <query>` |
| `/rlm-status` | Show RLM state | `rlm-cli status` |
| `/rlm-analyze` | Full RLM analysis workflow | Orchestrated multi-step |

Example Skill Definition (.claude/skills/rlm-load.md):

---
name: rlm-load
description: Load a file or directory into RLM for analysis
arguments:
  - name: path
    description: File or directory to load
    required: true
  - name: name
    description: Buffer name (defaults to filename)
    required: false
---
Load content into RLM for semantic search and chunk-based analysis.
## Workflow
1. Check if rlm-rs is installed: `which rlm-cli`
2. Initialize if needed: `rlm-cli init`
3. Load the content: `rlm-cli load {{path}} --name {{name}} --chunker semantic`
4. Report status: `rlm-cli status --format json`
## Output
Report the number of chunks created and confirm embeddings were generated.

Specialized agents for chunk-level processing:

rlm-subcall Agent (.claude/agents/rlm-subcall.md):

---
name: rlm-subcall
model: haiku
description: Efficient chunk-level analysis for RLM workflow
tools:
- Bash
- Read
---
You are a focused analysis agent processing individual chunks from large documents.
## Instructions
1. Retrieve the chunk: `rlm-cli chunk get <chunk_id>`
2. Analyze according to the prompt
3. Return structured JSON findings:
```json
{
  "chunk_id": <id>,
  "findings": [...],
  "relevance": "high|medium|low",
  "summary": "Brief summary"
}
```

Keep responses concise. You’re part of a larger workflow.

rlm-synthesizer Agent (.claude/agents/rlm-synthesizer.md):
---
name: rlm-synthesizer
model: sonnet
description: Synthesize findings from multiple chunk analyses
tools:
- Read
- Bash
---
You aggregate results from multiple rlm-subcall analyses.
## Instructions
1. Review all chunk findings
2. Identify patterns and connections
3. Synthesize into coherent narrative
4. Highlight key insights and recommendations

Automated triggers for RLM operations:

Auto-load on large files (.claude/hooks/large-file-rlm.md):

---
event: PostToolUse
tool: Read
---
If the file read was larger than 50KB, suggest loading it into RLM:
"This is a large file. Consider using `/rlm-load {{file_path}}` for semantic search."
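
The 50KB trigger described by the hook can be sketched as a standalone check; the `is_large_file` helper name and the demo file are illustrative, not part of any hook API:

```shell
# Illustrative helper: succeeds when a file exceeds the 50KB threshold
# used by the hook above.
is_large_file() {
  local path="$1"
  local threshold=$((50 * 1024))
  local size
  size=$(wc -c < "$path")
  [ "$size" -gt "$threshold" ]
}

# Demo against a temporary 60KB file
TMP=$(mktemp)
head -c 61440 /dev/zero > "$TMP"
if is_large_file "$TMP"; then
  echo "suggest: /rlm-load $TMP"
fi
rm -f "$TMP"
```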

Any AI assistant can integrate with rlm-rs using these patterns:

# 1. Load content (one-time setup)
rlm-cli load large-document.md --name docs
# 2. Search for relevant chunks
RESULTS=$(rlm-cli --format json search "your query" --top-k 5)
# 3. Extract chunk IDs
CHUNK_IDS=$(echo "$RESULTS" | jq -r '.results[].chunk_id')
# 4. Retrieve and process each chunk
for ID in $CHUNK_IDS; do
  CONTENT=$(rlm-cli chunk get $ID)
  # Process $CONTENT...
done
# Find specific patterns
rlm-cli grep docs "TODO|FIXME|HACK" --format json --max-matches 50
# Get context around matches
rlm-cli grep docs "error.*handling" --window 200
# Broad search first
rlm-cli search "authentication" --top-k 20
# Narrow down
rlm-cli search "JWT token validation" --top-k 5 --mode semantic
# Exact match
rlm-cli search "validateToken function" --mode bm25

All commands with --format json return structured data:

Search Results:

{
  "count": 3,
  "mode": "hybrid",
  "query": "authentication",
  "results": [
    {
      "chunk_id": 42,
      "buffer_id": 1,
      "buffer_name": "auth.rs",
      "score": 0.0328,
      "semantic_score": 0.0499,
      "bm25_score": 0.0000016
    }
  ]
}

Status:

{
  "initialized": true,
  "db_path": ".rlm/rlm-state.db",
  "db_size_bytes": 245760,
  "buffer_count": 3,
  "chunk_count": 42,
  "total_content_bytes": 125000,
  "embeddings_count": 42
}

Chunk:

{
  "id": 42,
  "buffer_id": 1,
  "buffer_name": "auth.rs",
  "index": 3,
  "byte_range": [12000, 15000],
  "size": 3000,
  "content": "...",
  "has_embedding": true
}

Copilot can invoke rlm-rs through its terminal integration:

@terminal rlm-cli load src/ --name code
@terminal rlm-cli search "error handling"

Codex can execute rlm-rs commands directly:

codex "Load the documentation and find sections about API authentication"
# Codex runs: rlm-cli load docs/ && rlm-cli search "API authentication"

These tools can use rlm-cli as an external helper:

# In .aider.conf.yml or similar
tools:
  - name: rlm-search
    command: rlm-cli --format json search "$QUERY"

Extensions should use execFile instead of exec for security (avoids shell injection):

import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

interface SearchResult {
  chunk_id: number;
  score: number;
}

interface SearchResponse {
  results: SearchResult[];
}

async function searchRLM(query: string): Promise<SearchResult[]> {
  // Using execFile (not exec) prevents shell injection
  const { stdout } = await execFileAsync('rlm-cli', [
    '--format', 'json',
    'search', query
  ]);
  const response: SearchResponse = JSON.parse(stdout);
  return response.results;
}

rlm-cli load src/ --chunker semantic --chunk-size 3000

Semantic chunking respects function and class boundaries.

rlm-cli load src/auth/ --name auth-module
rlm-cli load src/api/ --name api-handlers
rlm-cli load docs/ --name documentation

This makes search results more interpretable.

rlm-cli search "query" --mode hybrid

Hybrid combines semantic understanding with keyword matching.

Instead of sequential calls, use parallel Task invocations:

# Good: Parallel
Task(rlm-subcall, chunk 12) || Task(rlm-subcall, chunk 27) || Task(rlm-subcall, chunk 33)
# Avoid: Sequential
Task(rlm-subcall, chunk 12)
Task(rlm-subcall, chunk 27)
Task(rlm-subcall, chunk 33)
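
In plain shell, the same fan-out can be approximated with background jobs; the `process_chunk` body below is a placeholder for a real `rlm-cli chunk get` call plus an analysis step:

```shell
# Parallel fan-out sketch: background one job per chunk ID, then wait.
process_chunk() {
  # Placeholder; in practice: rlm-cli chunk get "$1" | <analysis step>
  echo "processed chunk $1"
}

for ID in 12 27 33; do
  process_chunk "$ID" &
done
wait   # block until every background job has finished
```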
# After subcall analysis
rlm-cli add-buffer auth-analysis "$(cat subcall-results.json)"
# Later retrieval
rlm-cli show auth-analysis

When integrating rlm-rs into AI workflows, proper error handling ensures graceful recovery and good user experience. This section provides structured patterns for handling common errors.

All rlm-cli commands return:

  • Exit code 0: Success
  • Exit code 1: Error (details in stderr)
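
A generic wrapper for this exit-code contract might look like the following sketch; `run_checked` is an illustrative name, and any `rlm-cli` invocation can be substituted for the demo commands:

```shell
# Sketch: capture stdout+stderr and branch on the exit-code contract
# (0 = success, non-zero = error with details on stderr).
run_checked() {
  local out
  if out=$("$@" 2>&1); then
    echo "ok: $out"
  else
    echo "error: $out"
  fi
}

run_checked echo "hello"                    # ok: hello
run_checked sh -c 'echo boom >&2; exit 1'   # error: boom
```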

With JSON format, errors are structured:

{
  "error": "storage error: RLM not initialized. Run: rlm-rs init",
  "code": "NOT_INITIALIZED"
}

| Error Message | Cause | Recovery Strategy |
| --- | --- | --- |
| `RLM not initialized` | Database not created | Run `rlm-cli init` |
| `buffer not found: <name>` | Buffer doesn't exist | Run `rlm-cli list` to verify |
| `chunk not found: <id>` | Invalid chunk ID | Re-run search to get valid IDs |
| `No results found` | Query too specific | Broaden query or lower threshold |
| `embedding error` | Model loading issue | Check disk space, retry once |
| `file not found` | Invalid path | Verify path exists before load |
# Robust error handling for AI assistants
RESULT=$(rlm-cli --format json search "$QUERY" 2>&1)
EXIT_CODE=$?
if [ $EXIT_CODE -ne 0 ]; then
  # Parse error
  ERROR=$(echo "$RESULT" | jq -r '.error // empty')
  case "$ERROR" in
    *"not initialized"*)
      rlm-cli init
      # Retry original command
      RESULT=$(rlm-cli --format json search "$QUERY")
      ;;
    *"buffer not found"*)
      echo "Buffer not found. Available buffers:"
      rlm-cli list
      ;;
    *"No results"*)
      echo "No results. Try broader query or: --threshold 0.1"
      ;;
    *)
      echo "Error: $ERROR"
      ;;
  esac
fi

For transient errors (embedding model loading, database locks):

MAX_RETRIES=3
RETRY_DELAY=1
for i in $(seq 1 $MAX_RETRIES); do
  RESULT=$(rlm-cli --format json chunk embed "$BUFFER" 2>&1)
  if [ $? -eq 0 ]; then
    break
  fi
  if [ $i -lt $MAX_RETRIES ]; then
    sleep $RETRY_DELAY
    RETRY_DELAY=$((RETRY_DELAY * 2)) # Exponential backoff
  fi
done

Before complex workflows, verify prerequisites:

# Check 1: rlm-cli is installed
if ! command -v rlm-cli &> /dev/null; then
  echo "rlm-cli not found. Install with: cargo install rlm-cli"
  exit 1
fi
# Check 2: Database is initialized
if ! rlm-cli status &> /dev/null; then
  rlm-cli init
fi
# Check 3: Content is loaded
BUFFER_COUNT=$(rlm-cli --format json status | jq '.buffer_count')
if [ "$BUFFER_COUNT" -eq 0 ]; then
  echo "No content loaded. Use: rlm-cli load <file>"
  exit 1
fi
# Check 4: Embeddings exist for semantic search
EMBED_COUNT=$(rlm-cli --format json chunk status | jq '.embedded_chunks')
if [ "$EMBED_COUNT" -eq 0 ]; then
  echo "No embeddings. Generating..."
  rlm-cli chunk embed --all
fi

When semantic search fails, fall back to BM25:

# Try semantic first
RESULT=$(rlm-cli --format json search "$QUERY" --mode semantic 2>&1)
if echo "$RESULT" | jq -e '.error' > /dev/null 2>&1; then
  # Fall back to BM25 (keyword search, no embeddings required)
  RESULT=$(rlm-cli --format json search "$QUERY" --mode bm25)
fi

When reporting errors to users, provide actionable guidance:

**Good**: "Buffer 'config' not found. Available buffers: main, auth. Did you mean one of these?"
**Bad**: "Error: buffer not found: config"
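
As a sketch, a small helper can upgrade the bare error into the actionable form; the `report_missing_buffer` name and the buffer list are illustrative:

```shell
# Illustrative: expand "buffer not found" into an actionable message
# that lists the buffers that do exist.
report_missing_buffer() {
  local missing="$1"; shift
  local available
  available=$(printf '%s, ' "$@")
  available=${available%, }   # trim trailing ", "
  echo "Buffer '$missing' not found. Available buffers: $available. Did you mean one of these?"
}

report_missing_buffer config main auth
# → Buffer 'config' not found. Available buffers: main, auth. Did you mean one of these?
```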

# Check installation
which rlm-cli
# Install if missing
cargo install rlm-cli
# or
brew install zircote/tap/rlm-rs
rlm-cli init

If search returns no results:

  1. Check if content is loaded: rlm-cli list
  2. Verify embeddings exist: rlm-cli chunk status
  3. Try broader query or lower threshold: --threshold 0.1

Ensure you’re using --format json:

rlm-cli --format json search "query" # Correct
rlm-cli search "query" --format json # Also correct

Ready-to-use system prompts for AI assistants integrating with rlm-rs are available in the prompts/ directory:

| Template | Purpose | Recommended Model |
| --- | --- | --- |
| `rlm-orchestrator.md` | Coordinates search, dispatch, and synthesis | sonnet |
| `rlm-analyst.md` | Analyzes individual chunks | haiku |
| `rlm-synthesizer.md` | Aggregates analyst findings | sonnet |
  1. Orchestrator receives user request and searches for relevant chunks
  2. Analysts (parallel) process individual chunks and return structured findings
  3. Synthesizer aggregates findings into a coherent report
User Request
     │
     ▼
┌──────────────┐
│ Orchestrator │──▶ rlm-cli search "query"
└──────────────┘
     │ dispatch
     ▼
┌─────────────────────────────────────┐
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Analyst 1│ │Analyst 2│ │Analyst N│ │  (parallel)
│ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────┘
     │ collect
     ▼
┌──────────────┐
│ Synthesizer  │──▶ Final Report
└──────────────┘