ADR-010: Switch to BGE-M3 Embedding Model
Status
Accepted
Context
Background and Problem Statement
The original embedding model choice (all-MiniLM-L6-v2) was made for its small size and fast inference. However, production usage revealed limitations:
- 384 dimensions provide lower semantic resolution
- ~512 token context often truncates larger chunks
- Multilingual support is limited
BGE-M3 offers significant improvements at the cost of larger model size.
Current Limitations
- Dimension mismatch: 384 dimensions limit semantic expressiveness
- Token truncation: ~512 token limit truncates chunks near the 2000-byte default size
- Multilingual gaps: MiniLM has limited non-English support
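The truncation gap can be sketched with rough arithmetic. The bytes-per-token ratio below is an assumption (roughly 3–4 bytes per token for English prose, lower for code or non-Latin scripts), not the output of a real tokenizer:

```rust
// Back-of-the-envelope illustration of the truncation problem.
// The bytes-per-token heuristic is an assumption, not a real tokenizer.
const CHUNK_BYTES: usize = 2000; // default chunk size
const MINILM_CONTEXT: usize = 512; // approximate MiniLM token window
const BGE_M3_CONTEXT: usize = 8192; // BGE-M3 token window

fn approx_tokens(bytes: usize, bytes_per_token: usize) -> usize {
    bytes / bytes_per_token
}

fn main() {
    // At ~3 bytes/token a 2000-byte chunk is ~666 tokens: over MiniLM's
    // window, so the tail of the chunk is silently dropped.
    let tokens = approx_tokens(CHUNK_BYTES, 3);
    println!("~{tokens} tokens");
    println!("fits MiniLM: {}", tokens <= MINILM_CONTEXT); // false
    println!("fits BGE-M3: {}", tokens <= BGE_M3_CONTEXT); // true
}
```

At ~4 bytes per token the same chunk just squeezes under MiniLM's window, so whether truncation occurs depends heavily on content; BGE-M3's 8192-token window removes the question entirely.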
Decision Drivers
Primary Decision Drivers
- Context coverage: BGE-M3’s 8192 token limit covers full chunks without truncation
- Embedding quality: 1024 dimensions provide richer semantic representation
- Consistency: Full chunk content is embedded, not truncated
Secondary Decision Drivers
- Multilingual support: BGE-M3 handles non-English content better
- Model maturity: BGE-M3 is well-established in the embedding community
- fastembed support: Both models supported by fastembed-rs
Considered Options
Option 1: Switch to BGE-M3
Description: Replace all-MiniLM-L6-v2 with the BGE-M3 embedding model.
Technical Characteristics:
- 1024 dimensions (vs 384)
- 8192 token context (vs ~512)
- ~1.3GB model size (vs ~90MB)
- Stronger multilingual support
Advantages:
- Full chunk coverage without truncation
- Higher semantic resolution
- Better multilingual embeddings
- More accurate semantic search
Disadvantages:
- Larger model download (~1.3GB vs ~90MB)
- Slightly slower inference
- Breaking change: requires schema migration
- Existing embeddings must be regenerated
Risk Assessment:
- Technical Risk: Low. fastembed-rs supports BGE-M3 well
- Schedule Risk: Low. Simple model swap
- Ecosystem Risk: Low. Well-established model
Option 2: Keep all-MiniLM-L6-v2
Description: Maintain the current model.
Technical Characteristics:
- 384 dimensions
- ~512 token context
- ~90MB model size
Advantages:
- Smaller model download
- Faster inference
- No migration needed
Disadvantages:
- Continued truncation issues
- Lower semantic quality
- Limited multilingual support
Disqualifying Factor: Token truncation undermines semantic search quality for typical chunk sizes.
Risk Assessment:
- Technical Risk: None. No change
- Schedule Risk: None. No change
- Ecosystem Risk: Low. Status quo
Option 3: External Embedding API
Description: Switch to OpenAI or a similar embedding API.
Technical Characteristics:
- API-based embedding generation
- Higher quality models available
- Requires network and API key
Advantages:
- Access to latest models
- No local model storage
Disadvantages:
- Network dependency
- Privacy concerns
- API costs
- Conflicts with offline-first design
Disqualifying Factor: Violates offline-first and privacy principles established in ADR-007.
Risk Assessment:
- Technical Risk: Low. APIs are simple
- Schedule Risk: Low. Easy integration
- Ecosystem Risk: High. API dependency
Decision
Switch from all-MiniLM-L6-v2 to BGE-M3 as the default embedding model.
The implementation will:
- Change `EmbeddingModel::AllMiniLML6V2` to `EmbeddingModel::BGEM3`
- Update `DEFAULT_DIMENSIONS` from 384 to 1024
- Add a schema migration (v2→v3) to clear incompatible embeddings
- Keep model download silent (existing behavior)
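The shape of the change can be sketched as below. The enum is a standalone illustration mirroring the identifiers named in this decision (the real `EmbeddingModel` type comes from fastembed-rs):

```rust
// Illustrative sketch of the configuration change, not the real
// fastembed-rs type: the variant names mirror the identifiers above.
#[derive(Debug)]
enum EmbeddingModel {
    AllMiniLML6V2,
    BGEM3,
}

impl EmbeddingModel {
    fn dimensions(&self) -> usize {
        match self {
            EmbeddingModel::AllMiniLML6V2 => 384,
            EmbeddingModel::BGEM3 => 1024,
        }
    }
    fn max_tokens(&self) -> usize {
        match self {
            EmbeddingModel::AllMiniLML6V2 => 512,
            EmbeddingModel::BGEM3 => 8192,
        }
    }
}

const DEFAULT_MODEL: EmbeddingModel = EmbeddingModel::BGEM3;
const DEFAULT_DIMENSIONS: usize = 1024;

fn main() {
    // The stored dimension constant must track the default model,
    // otherwise stored vectors and query vectors cannot be compared.
    assert_eq!(DEFAULT_MODEL.dimensions(), DEFAULT_DIMENSIONS);
    println!("{:?}: {} dims, {} token context",
        DEFAULT_MODEL, DEFAULT_MODEL.dimensions(), DEFAULT_MODEL.max_tokens());
}
```

Keeping the dimension constant next to the model selection makes the coupling explicit: changing one without the other is exactly the mismatch the v2→v3 migration guards against.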
Consequences
Positive
- Full chunk coverage: the 8192-token context handles any reasonable chunk size
- Better search quality: 1024 dimensions capture more semantic nuance
- Multilingual improvement: Better handling of non-English content
- Future-proof: More headroom for chunk size increases
Negative
- Breaking change: Existing embeddings incompatible (different dimensions)
- Larger download: ~1.3GB model vs ~90MB
- Slower inference: Larger model has higher compute cost
- Migration required: Users must regenerate embeddings after upgrade
Neutral
- Model download timing: Same lazy loading pattern as before
Decision Outcome
BGE-M3 provides a significant quality improvement for semantic search. The migration clears existing embeddings, requiring users to re-embed their content, but this is a one-time cost for lasting quality improvements.
Mitigations:
- Schema migration (v2→v3) automatically clears old embeddings
- Clear error messages guide users to re-embed
- Document migration in CHANGELOG
- Lazy loading preserves cold start for non-embedding operations
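A minimal sketch of the migration's clearing step, assuming a simple in-memory store; the `Store` type and `migrate` function are hypothetical, not the actual `src/storage/schema.rs` code:

```rust
// Hypothetical sketch of the v2→v3 migration described above; the
// Store type and field names are illustrative, not the real schema code.
const SCHEMA_VERSION: u32 = 3;
const DEFAULT_DIMENSIONS: usize = 1024; // BGE-M3

struct Store {
    schema_version: u32,
    embeddings: Vec<Vec<f32>>,
}

fn migrate(store: &mut Store) {
    if store.schema_version < SCHEMA_VERSION {
        // 384-dimensional MiniLM vectors cannot be compared against
        // 1024-dimensional BGE-M3 vectors, so drop anything that does
        // not match the new dimension; users re-embed on next use.
        store.embeddings.retain(|v| v.len() == DEFAULT_DIMENSIONS);
        store.schema_version = SCHEMA_VERSION;
    }
}

fn main() {
    let mut store = Store {
        schema_version: 2,
        embeddings: vec![vec![0.0; 384], vec![0.0; 384]],
    };
    migrate(&mut store);
    println!("version {}, {} embeddings left",
             store.schema_version, store.embeddings.len());
}
```

Filtering by dimension rather than deleting unconditionally makes the migration idempotent: re-running it on a v3 store with valid 1024-dimensional vectors leaves them untouched.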
Related Decisions
- ADR-007: Embedded Embedding Model - Embedding infrastructure
- ADR-008: Hybrid Search - Semantic search uses embeddings
- BGE-M3 Paper - Model research paper
- fastembed-rs - Rust embedding library
- BAAI/bge-m3 - Hugging Face model card
More Information
- Date: 2025-01-20
- Source: Production usage feedback and quality analysis
- Related ADRs: ADR-007, ADR-008
2025-01-20
Status: Compliant
Findings:
| Finding | Files | Lines | Assessment |
|---|---|---|---|
| BGE-M3 model configured | src/embedding/fastembed_impl.rs | L66 | compliant |
| DEFAULT_DIMENSIONS = 1024 | src/embedding/mod.rs | L27 | compliant |
| Schema version bumped to 3 | src/storage/schema.rs | L6 | compliant |
| Migration clears embeddings | src/storage/schema.rs | L179-183 | compliant |
Summary: BGE-M3 model switch fully implemented with schema migration.
Action Required: None