Features

rlm-cli uses Cargo features to provide optional functionality and reduce binary size for specific use cases.
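As a sketch of how optional functionality like this is typically wired up, the feature graph described below might look roughly like the following Cargo manifest excerpt. This is an illustrative assumption, not copied from rlm-cli's actual Cargo.toml; the feature names match this page, but the dependency versions are guesses.

```toml
# Hypothetical [features] table; the real rlm-cli manifest may differ.
[features]
default = ["fastembed-embeddings"]
fastembed-embeddings = ["dep:fastembed"]
usearch-hnsw = ["dep:usearch"]
full-search = ["fastembed-embeddings", "usearch-hnsw"]

[dependencies]
fastembed = { version = "4", optional = true }
# Pinned below 2.24 for Windows compatibility (see usearch-hnsw notes below).
usearch = { version = ">=2.23, <2.24", optional = true }
```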

fastembed-embeddings (default)

What it does: Enables semantic search using FastEmbed ONNX-based embedding models.

Dependencies:

  • fastembed crate (ONNX Runtime binaries)
  • BGE-M3 embedding model (1024 dimensions)

Use when:

  • You need semantic similarity search
  • Context-aware document retrieval is important
  • Hybrid search (semantic + BM25) is required

Binary size impact: ~100MB (includes ONNX runtime + model weights)

Build:

# Enabled by default
cargo build --release
# Explicitly enable
cargo build --release --features fastembed-embeddings

Skip when:

  • You only need keyword/regex search
  • Binary size is critical (embedded systems, containers)
  • BM25 full-text search is sufficient

Build without:

cargo build --release --no-default-features

usearch-hnsw

What it does: Enables high-performance vector search using the HNSW (Hierarchical Navigable Small World) algorithm.

Dependencies:

  • usearch crate v2.23.x from crates.io (pinned <2.24 for Windows compatibility)
  • Requires C++ compiler (C++17 or later)

Note: Version 2.24.0+ is excluded due to Windows compilation issues. See Troubleshooting for details.

Use when:

  • Working with large document collections (>10,000 chunks)
  • Low-latency vector search is required (<10ms)
  • Memory usage is acceptable (HNSW index ~4x embedding size)

Performance:

  • Exact search (SQLite): O(n) - 100ms for 10K chunks
  • HNSW search: O(log n) - <10ms for 10K chunks
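To make the O(n) baseline concrete, here is a minimal sketch of what exact cosine-similarity search does conceptually: score every stored embedding against the query, then keep the top-k. The function names are illustrative, not rlm-cli internals.

```rust
// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Brute-force top-k: scan every chunk embedding, O(n * dims).
/// This is the exact-search strategy HNSW replaces with an O(log n) graph walk.
fn exact_top_k(query: &[f32], corpus: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    // Sort descending by similarity, then truncate to k results.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```

The linear scan is why exact-search latency grows with chunk count, while HNSW's graph traversal keeps latency near-constant at the cost of approximate results and extra index memory.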

Build:

cargo build --release --features usearch-hnsw

Skip when:

  • Document collection is small (<1,000 chunks)
  • Build environment lacks C++ toolchain
  • Approximate nearest neighbor trade-offs are unacceptable

full-search

What it does: Combines fastembed-embeddings and usearch-hnsw for complete semantic search capabilities.

Use when:

  • Production deployment with large-scale semantic search
  • Maximum search performance is required
  • You want the complete feature set

Build:

cargo build --release --features full-search

Features             | Embedding | Vector Search  | BM25 | Use Case
(none)               | No        | No             | Yes  | Keyword search only, minimal binary
fastembed-embeddings | Yes       | Exact (SQLite) | Yes  | Hybrid search, moderate scale
usearch-hnsw         | No        | No             | Yes  | No embeddings, BM25 only
full-search          | Yes       | HNSW           | Yes  | Production, large scale

Without fastembed-embeddings

Semantic search commands will fall back to BM25-only:

# This command requires embeddings
rlm-cli search "query" --mode semantic
# Error: FastEmbed not available, falling back to BM25
# Suggestion: Rebuild with --features fastembed-embeddings

The CLI will automatically use hash-based pseudo-embeddings for compatibility, but results will be degraded.
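A hash-based pseudo-embedding can be sketched as hashing each token into a fixed number of buckets and L2-normalizing the result. This is a hedged illustration of the general technique; rlm-cli's actual fallback implementation may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative hash-based pseudo-embedding: each whitespace token is
/// hashed into one of `dims` buckets, then the vector is L2-normalized
/// so cosine similarity is well-defined. No semantics are captured,
/// which is why results are degraded compared to real embeddings.
fn pseudo_embed(text: &str, dims: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dims];
    for token in text.split_whitespace() {
        let mut h = DefaultHasher::new();
        token.hash(&mut h);
        v[(h.finish() as usize) % dims] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
    v
}
```

Such vectors only match documents sharing exact tokens with the query, so they behave closer to keyword search than to semantic search.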

Without usearch-hnsw

Vector search uses exact SQLite-based cosine similarity:

rlm-cli search "query" --mode hybrid --top-k 100
# Uses exact search - slower but accurate

Performance degrades linearly with chunk count.

# Smallest binary, keyword search only
cargo build --release --no-default-features
# Result: ~5MB binary, no embedding dependencies

# Default: FastEmbed embeddings + SQLite vector search
cargo build --release
# Result: ~100MB binary, hybrid search, moderate scale

# Full features: FastEmbed + HNSW
cargo build --release --features full-search
# Result: ~105MB binary, maximum performance

# Dockerfile example - minimal size
FROM rust:1.88-slim AS builder
WORKDIR /app
COPY . .
RUN cargo build --release --no-default-features

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/rlm-cli /usr/local/bin/
CMD ["rlm-cli"]

First-Time Model Download (fastembed-embeddings)

When first running with embeddings enabled:

rlm-cli load document.md --name docs
# Downloads BGE-M3 model (~1GB) to ~/.cache/fastembed/
# Progress: Downloading model... 100%
# Generating embeddings... Done (5000 chunks in 30s)

Model cache location: $HOME/.cache/fastembed/

Download size: ~1GB (one-time)

Check which features are compiled:

rlm-cli --version
# Output:
# rlm-cli 1.2.4
# Features: fastembed-embeddings, usearch-hnsw
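Feature reporting like this is resolvable at compile time in Rust via the `cfg!` macro. The sketch below shows the general pattern under that assumption; the function is illustrative, not rlm-cli's actual implementation.

```rust
/// Illustrative sketch: collect the names of Cargo features that were
/// enabled at compile time. `cfg!(feature = "...")` is evaluated by the
/// compiler, so the returned list reflects the build, not the runtime.
fn compiled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(feature = "fastembed-embeddings") {
        features.push("fastembed-embeddings");
    }
    if cfg!(feature = "usearch-hnsw") {
        features.push("usearch-hnsw");
    }
    features
}
```

A `--version` handler can then join this list into the "Features:" line shown above.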

Troubleshooting

Error: "error: failed to compile usearch"

Solution: Install C++ compiler

# Ubuntu/Debian
sudo apt-get install build-essential
# macOS
xcode-select --install
# Or disable HNSW
cargo build --release --features fastembed-embeddings

Error: ONNX Runtime not found

Solution: Use bundled binaries (enabled by default)

# Explicitly enable bundled ONNX
cargo build --release --features fastembed-embeddings

Issue: Embedding generation is slow

Solutions:

  • Use --chunker parallel for multi-threaded chunking
  • Reduce chunk size: --chunk-size 50000 (default: 100k)
  • Check CPU resources (embedding uses all cores)

Issue: High memory usage during search

Solutions:

  • Without HNSW: Memory = chunk_count × 1024 × 4 bytes
  • With HNSW: Memory = chunk_count × 1024 × 16 bytes (includes index)
  • Use --top-k to limit result set: --top-k 10
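The formulas above are simple products, as this back-of-envelope helper shows (4 bytes per f32 dimension; the helper name is illustrative):

```rust
/// Estimated raw vector storage: chunks x dims x bytes-per-dimension.
/// Without HNSW, bytes_per_dim = 4 (one f32 per dimension); with HNSW,
/// the docs above use 16 to account for graph-index overhead.
fn vector_bytes(chunks: usize, dims: usize, bytes_per_dim: usize) -> usize {
    chunks * dims * bytes_per_dim
}
```

For example, 50,000 chunks at 1024 dimensions without HNSW gives 50,000 × 1024 × 4 ≈ 205 MB of raw vectors, consistent with the ~250MB total measured in the benchmark below once runtime overhead is included.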

Benchmark: 50,000 chunks, BGE-M3 embeddings (1024d)

Configuration        | Search Time   | Memory | Binary Size
No features          | 200ms (BM25)  | 50MB   | 5MB
fastembed-embeddings | 800ms (exact) | 250MB  | 100MB
full-search          | 8ms (HNSW)    | 450MB  | 105MB

Recommendation: Use fastembed-embeddings (default) for most use cases. Enable usearch-hnsw only for large-scale deployments (>10K chunks).