Deduplication service for pre-compact hook.
This module provides three-tier deduplication checking:
- Exact match: SHA256 hash comparison via tag search
- Semantic similarity: FastEmbed embeddings with cosine similarity threshold
- Recent capture: In-memory LRU cache with TTL-based expiration
The service implements short-circuit evaluation, exiting early on first match.
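The short-circuit behaviour can be sketched as a loop that returns at the first tier reporting a duplicate. This is a minimal illustration, not the crate's implementation: `Reason`, `Tier`, `FixedTier`, and `first_duplicate` are hypothetical names, and the tier order (exact, then semantic, then recent) is assumed from the order of the list above.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the crate's `DuplicateReason`;
/// the real variants are not listed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reason {
    ExactMatch,
    SemanticSimilar,
    RecentCapture,
}

/// A single deduplication tier (hypothetical, not the crate's `Deduplicator`).
pub trait Tier {
    fn reason(&self) -> Reason;
    fn is_duplicate(&self, content: &str) -> bool;
}

/// Stub tier that returns a fixed answer and counts how often it is called,
/// so the early exit can be observed.
pub struct FixedTier {
    reason: Reason,
    answer: bool,
    calls: Cell<u32>,
}

impl FixedTier {
    pub fn new(reason: Reason, answer: bool) -> Self {
        Self { reason, answer, calls: Cell::new(0) }
    }

    pub fn calls(&self) -> u32 {
        self.calls.get()
    }
}

impl Tier for FixedTier {
    fn reason(&self) -> Reason {
        self.reason
    }

    fn is_duplicate(&self, _content: &str) -> bool {
        self.calls.set(self.calls.get() + 1);
        self.answer
    }
}

/// Runs the tiers in order and stops at the first one that reports a duplicate.
/// Later tiers are never invoked once a match is found.
pub fn first_duplicate(tiers: &[&dyn Tier], content: &str) -> Option<Reason> {
    for tier in tiers {
        if tier.is_duplicate(content) {
            return Some(tier.reason());
        }
    }
    None
}
```

Ordering the tiers from cheapest to most expensive means the costlier checks only run when the cheaper ones found nothing; whether the real service orders them this way is an assumption here.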
§Architecture
┌─────────────────────────────────────────────────────────────────┐
│ DeduplicationService │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ ExactMatch │ │ Semantic │ │ RecentCapture │ │
│ │ Checker │ │ Checker │ │ Checker │ │
│ │ │ │ │ │ │ │
│ │ SHA256 hash │ │ Embedding │ │ LRU Cache with TTL │ │
│ │ comparison │ │ similarity │ │ (5 min window) │ │
│ └──────────────┘ └──────────────┘ └────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
§Example
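use subcog::services::deduplication::{DeduplicationService, DeduplicationConfig};

// `recall`, `embedder`, and `Namespace` are assumed to be in scope, and the
// enclosing function is assumed to return a `Result` so that `?` compiles.
let config = DeduplicationConfig::default();
let service = DeduplicationService::new(recall, embedder, config);

// Check one piece of content against all tiers before capturing it.
let result = service.check_duplicate("Use PostgreSQL for primary storage", Namespace::Decisions)?;
if result.is_duplicate {
    println!("Skipping duplicate: {:?}", result.reason);
}
Modules§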
- config 🔒
- Deduplication configuration.
- exact_match 🔒
- Exact match deduplication checker.
- hasher 🔒
- Content hashing utility for deduplication.
- recent 🔒
- Recent capture deduplication checker.
- semantic 🔒
- Semantic similarity deduplication checker.
- service 🔒
- Deduplication service orchestrator.
- types 🔒
- Deduplication result types.
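As an illustration of what the `semantic` tier compares, the following sketch computes cosine similarity between two embedding vectors and applies a threshold. It assumes embeddings arrive as `f32` slices; `cosine_similarity` and `is_semantic_duplicate` are hypothetical helpers, and the threshold value is not stated on this page, so the caller supplies it.

```rust
/// Cosine similarity of two vectors.
/// Returns `None` when the lengths differ, the input is empty,
/// or either vector has zero magnitude (the similarity is undefined).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Treats two embeddings as duplicates when their similarity reaches `threshold`.
/// Undefined similarity is treated as "not a duplicate".
pub fn is_semantic_duplicate(a: &[f32], b: &[f32], threshold: f32) -> bool {
    cosine_similarity(a, b).map_or(false, |score| score >= threshold)
}
```

Because cosine similarity ignores vector length, two embeddings pointing in the same direction count as identical regardless of magnitude.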
Structs§
- ContentHasher
- Content hasher for deduplication.
- DeduplicationConfig
- Configuration for the deduplication service.
- DeduplicationService
- Service for deduplication checking.
- DuplicateCheckResult
- Result of a deduplication check.
Enums§
- DuplicateReason
- The reason content was identified as a duplicate.
Traits§
- Deduplicator
- Trait for deduplication checking.
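The recent-capture tier from the architecture diagram (an LRU cache with a 5 minute TTL) could look roughly like the sketch below. `RecentCache` and `check_and_record` are hypothetical names, the capacity is an assumed parameter, and the clock is passed in explicitly so the expiry logic can be exercised without sleeping. A linear scan keeps the example short; a production cache would likely use a map for lookups.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Small cache of recently seen keys, ordered from least to most recently used.
pub struct RecentCache {
    entries: VecDeque<(String, Instant)>,
    capacity: usize,
    ttl: Duration,
}

impl RecentCache {
    pub fn new(capacity: usize, ttl: Duration) -> Self {
        Self { entries: VecDeque::new(), capacity, ttl }
    }

    /// Returns `true` if `key` was seen within the TTL window, then records
    /// (or refreshes) it as the most recently used entry.
    pub fn check_and_record(&mut self, key: &str, now: Instant) -> bool {
        // Drop entries whose age has reached the TTL.
        let ttl = self.ttl;
        self.entries
            .retain(|(_, seen)| now.saturating_duration_since(*seen) < ttl);

        // Look for a live entry and remove it so it can be re-added at the back.
        let hit = match self.entries.iter().position(|(k, _)| k == key) {
            Some(index) => {
                self.entries.remove(index);
                true
            }
            None => false,
        };

        // Record the key as most recently used, evicting the oldest if full.
        self.entries.push_back((key.to_string(), now));
        if self.entries.len() > self.capacity {
            self.entries.pop_front();
        }
        hit
    }
}
```

With `Duration::from_secs(300)` as the TTL, a key recorded at time zero is reported as a duplicate one minute later, and that hit refreshes its timestamp.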