pub struct DeduplicationService<E: Embedder + Send + Sync, V: VectorBackend + Send + Sync> {
    config: DeduplicationConfig,
    exact_match: ExactMatchChecker,
    semantic: Option<SemanticSimilarityChecker<E, V>>,
    recent: RecentCaptureChecker,
    domain: Domain,
}
Service for deduplication checking.
Orchestrates three-tier deduplication:
- Exact match (fastest) - SHA256 hash lookup
- Semantic similarity - Embedding cosine similarity
- Recent capture - In-memory LRU cache
Uses short-circuit evaluation: stops on first match.
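The tier ordering and short-circuit behaviour described above can be sketched in isolation. This is a minimal illustration, not the service's actual code: `Tier` and `first_match` are hypothetical names, and each tier is modelled as a closure that returns `true` on a match. The semantic tier is optional, mirroring the `Option` field on the struct.

```rust
/// Which tier reported the match (illustrative names, not part of subcog).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Exact,
    Semantic,
    Recent,
}

/// Runs the tiers in order and stops at the first one that reports a match.
/// Later tiers are never invoked once an earlier tier has matched.
pub fn first_match(
    exact: impl FnOnce() -> bool,
    semantic: Option<impl FnOnce() -> bool>,
    recent: impl FnOnce() -> bool,
) -> Option<Tier> {
    if exact() {
        return Some(Tier::Exact);
    }
    // The semantic tier is skipped entirely when it is not configured.
    if let Some(check) = semantic {
        if check() {
            return Some(Tier::Semantic);
        }
    }
    if recent() {
        return Some(Tier::Recent);
    }
    None
}
```

Ordering the cheapest check first means the embedding lookup is only paid for when the hash lookup finds nothing.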
§Example
use subcog::services::deduplication::{DeduplicationService, DeduplicationConfig};
use subcog::services::recall::RecallService;
use subcog::embedding::FastEmbedEmbedder;
use subcog::storage::vector::UsearchBackend;
use std::sync::Arc;
let recall = Arc::new(RecallService::default());
let embedder = Arc::new(FastEmbedEmbedder::new());
let vector = Arc::new(UsearchBackend::in_memory(384));
let config = DeduplicationConfig::default();
let service = DeduplicationService::new(recall, embedder, vector, config);
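// Assumes `Namespace` and the `Deduplicator` trait are in scope, and that the
// enclosing function returns a `Result` so that `?` compiles.
let result = service.check_duplicate("Use PostgreSQL", Namespace::Decisions)?;
if result.is_duplicate {
    println!("Duplicate found: {:?} - {}", result.reason, result.matched_urn.unwrap());
}
Fields§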
config: DeduplicationConfig
Configuration.
exact_match: ExactMatchChecker
Exact match checker.
semantic: Option<SemanticSimilarityChecker<E, V>>
Semantic similarity checker (optional - may be disabled).
recent: RecentCaptureChecker
Recent capture checker.
domain: Domain
Domain for URN construction.
Implementations§
impl<E: Embedder + Send + Sync, V: VectorBackend + Send + Sync> DeduplicationService<E, V>
pub fn new(
    recall: Arc<RecallService>,
    embedder: Arc<E>,
    vector: Arc<V>,
    config: DeduplicationConfig,
) -> Self
Creates a new deduplication service with all checkers.
§Arguments
- recall - RecallService for exact match searches
- embedder - Embedder for semantic similarity
- vector - VectorBackend for semantic similarity searches
- config - Configuration including thresholds
pub fn without_embeddings(
    recall: Arc<RecallService>,
    config: DeduplicationConfig,
) -> Self
Creates a service without semantic checking.
Useful when embeddings are unavailable or disabled. Only performs exact match and recent capture checks.
§Arguments
- recall - RecallService for exact match searches
- config - Configuration
pub fn with_domain(self, domain: Domain) -> Self
Sets the domain for URN construction.
fn domain_string(&self) -> String
Returns the domain string for URN construction.
fn check_exact_match(
    &self,
    content: &str,
    namespace: Namespace,
    domain: &str,
    start: Instant,
) -> Option<DuplicateCheckResult>
Performs exact match check.
fn check_semantic(
    &self,
    content: &str,
    namespace: Namespace,
    domain: &str,
    start: Instant,
) -> Option<DuplicateCheckResult>
Performs semantic similarity check.
fn check_recent(
    &self,
    content: &str,
    namespace: Namespace,
    start: Instant,
) -> Option<DuplicateCheckResult>
Performs recent capture check.
fn record_unique_check_metrics(&self, namespace: Namespace, duration_ms: u64)
Records final metrics for a unique check.
pub fn check(
    &self,
    content: &str,
    namespace: Namespace,
) -> Result<DuplicateCheckResult>
Checks if content is a duplicate.
Performs checks in order: exact match → semantic → recent capture. Returns early on first match (short-circuit evaluation).
§Arguments
- content - The content to check
- namespace - The namespace to check within
§Returns
A DuplicateCheckResult with match details.
§Errors
Returns an error if a check fails. Individual check failures are handled gracefully (logged and skipped).
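One way to read "logged and skipped" is that a tier which fails is treated as having found no match, so the remaining tiers still run. The sketch below illustrates that reading only; `TierResult`, `skip_on_error`, and `first_match_skipping_errors` are hypothetical names, and the real service's logging and error types are not shown in this page.

```rust
/// Outcome of one tier: `Ok(Some(id))` on a match, `Ok(None)` when no match
/// was found, `Err(..)` when the tier itself failed.
pub type TierResult = Result<Option<String>, String>;

/// Converts a tier failure into "no match" after logging it to stderr.
pub fn skip_on_error(tier_name: &str, outcome: TierResult) -> Option<String> {
    match outcome {
        Ok(hit) => hit,
        Err(err) => {
            eprintln!("dedup tier {} failed, skipping: {}", tier_name, err);
            None
        }
    }
}

/// Walks tier outcomes in order and returns the first match, ignoring
/// tiers that failed. `find_map` stops consuming the iterator at the first hit.
pub fn first_match_skipping_errors<'a>(
    outcomes: impl IntoIterator<Item = (&'a str, TierResult)>,
) -> Option<String> {
    outcomes
        .into_iter()
        .find_map(|(name, outcome)| skip_on_error(name, outcome))
}
```

Under this reading, a failed exact-match lookup degrades detection quality but does not block a capture.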
pub fn record_capture(
    &self,
    content: &str,
    memory_id: &MemoryId,
    namespace: Namespace,
)
Records a successful capture for future duplicate detection.
Should be called after a memory is successfully captured to enable recent-capture detection.
§Arguments
- content - The captured content
- memory_id - The ID of the captured memory
- namespace - The namespace the content was captured to
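To illustrate why a capture has to be recorded before the recent-capture tier can detect it, here is a small bounded cache keyed by content hash. It is a sketch under assumptions: `RecentCaptures` is a hypothetical type, memory IDs are plain strings, and eviction is by least recent use only. The internals of `RecentCaptureChecker` (including any time-based expiry) are not documented here.

```rust
use std::collections::{HashMap, VecDeque};

/// A small bounded cache of recently captured content hashes (illustrative).
pub struct RecentCaptures {
    capacity: usize,
    by_hash: HashMap<String, String>, // content hash -> memory id
    order: VecDeque<String>,          // least recently used hash at the front
}

impl RecentCaptures {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, by_hash: HashMap::new(), order: VecDeque::new() }
    }

    /// Records a capture, evicting the least recently used entry when full.
    pub fn record(&mut self, hash: &str, memory_id: &str) {
        if self.capacity == 0 {
            return;
        }
        if self.by_hash.insert(hash.to_string(), memory_id.to_string()).is_some() {
            // Already present: drop the stale queue position before re-queuing.
            self.order.retain(|h| h.as_str() != hash);
        } else if self.by_hash.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.by_hash.remove(&oldest);
            }
        }
        self.order.push_back(hash.to_string());
    }

    /// Returns the memory id of a recent capture and marks it as recently used.
    pub fn lookup(&mut self, hash: &str) -> Option<String> {
        let id = self.by_hash.get(hash).cloned()?;
        self.order.retain(|h| h.as_str() != hash);
        self.order.push_back(hash.to_string());
        Some(id)
    }
}
```

A lookup for content that was never recorded returns `None`, which is why skipping the record step after a capture silently disables this tier.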
pub fn record_capture_by_hash(
    &self,
    content_hash: &str,
    memory_id: &MemoryId,
    namespace: Namespace,
)
Records a capture by content hash.
Useful when the hash has already been computed.
§Arguments
- content_hash - The pre-computed content hash
- memory_id - The ID of the captured memory
- namespace - The namespace the content was captured to
pub fn content_to_tag(content: &str) -> String
Returns the hash tag for content.
This tag should be added to the memory’s tags during capture to enable future exact-match detection.
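The tag-based flow can be sketched as follows: hash the content, wrap the hash in a tag, and treat any stored memory carrying the same tag as an exact duplicate. Everything here is an assumption made for illustration. The service is described as using SHA256, but this sketch swaps in the standard library's `DefaultHasher` so that it needs no external crate; the `hash:` prefix and the function names are invented, and the real tag format is not shown in this page.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes content to a 16-character hex string. `DefaultHasher` is a stand-in
/// for SHA-256 here; it is not cryptographic and not collision resistant.
pub fn sketch_hash(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Builds a tag from the hash. The `hash:` prefix is a made-up format.
pub fn sketch_tag(content: &str) -> String {
    format!("hash:{}", sketch_hash(content))
}

/// Exact-match lookup: content is a duplicate if a stored tag equals its tag.
pub fn has_exact_match(existing_tags: &[String], content: &str) -> bool {
    let tag = sketch_tag(content);
    existing_tags.iter().any(|t| *t == tag)
}
```

Because the tag is derived only from the content, the check reduces to a tag lookup and never needs the original text of stored memories.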
pub fn hash_content(content: &str) -> String
Returns the content hash for the given content.
pub const fn is_enabled(&self) -> bool
Returns true if deduplication is enabled.
pub fn get_threshold(&self, namespace: Namespace) -> f32
Returns the configured threshold for a namespace.
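A per-namespace threshold is presumably compared against the cosine similarity of two embeddings in the semantic tier. The sketch below shows that comparison under stated assumptions: namespaces are plain strings, thresholds live in a `HashMap` with a fallback default, and all three function names are hypothetical. It does not reflect how `DeduplicationConfig` stores its values.

```rust
use std::collections::HashMap;

/// Cosine similarity of two vectors. Returns 0.0 when the lengths differ
/// or when either vector has zero magnitude.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Looks up a namespace-specific threshold, falling back to a default.
pub fn threshold_for(overrides: &HashMap<String, f32>, namespace: &str, default: f32) -> f32 {
    overrides.get(namespace).copied().unwrap_or(default)
}

/// Two embeddings count as duplicates when their similarity reaches the threshold.
pub fn is_semantic_duplicate(a: &[f32], b: &[f32], threshold: f32) -> bool {
    cosine_similarity(a, b) >= threshold
}
```

A higher threshold for a namespace makes the semantic tier stricter there, so fewer near-paraphrases are rejected as duplicates.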
Trait Implementations§
impl<E: Embedder + Send + Sync, V: VectorBackend + Send + Sync> Deduplicator for DeduplicationService<E, V>
Implementation of the Deduplicator trait.
fn check_duplicate(
    &self,
    content: &str,
    namespace: Namespace,
) -> Result<DuplicateCheckResult>
Auto Trait Implementations§
impl<E, V> !Freeze for DeduplicationService<E, V>
impl<E, V> !RefUnwindSafe for DeduplicationService<E, V>
impl<E, V> Send for DeduplicationService<E, V>
impl<E, V> Sync for DeduplicationService<E, V>
impl<E, V> Unpin for DeduplicationService<E, V>
impl<E, V> !UnwindSafe for DeduplicationService<E, V>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> FutureExt for T
fn with_context(self, otel_cx: Context) -> WithContext<Self>
fn with_current_context(self) -> WithContext<Self>
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> IntoRequest<T> for T
fn into_request(self) -> Request<T>
Wrap the input message T in a tonic::Request.
impl<L> LayerExt<L> for L
fn named_layer<S>(&self, service: S) -> Layered<<L as Layer<S>>::Service, S>
where
    L: Layer<S>,
Applies the layer to a service and wraps it in Layered.