DeduplicationService

Struct DeduplicationService 

pub struct DeduplicationService<E: Embedder + Send + Sync, V: VectorBackend + Send + Sync> {
    config: DeduplicationConfig,
    exact_match: ExactMatchChecker,
    semantic: Option<SemanticSimilarityChecker<E, V>>,
    recent: RecentCaptureChecker,
    domain: Domain,
}

Service for deduplication checking.

Orchestrates three-tier deduplication:

  1. Exact match (fastest) - SHA-256 hash lookup
  2. Semantic similarity - Embedding cosine similarity
  3. Recent capture - In-memory LRU cache

Uses short-circuit evaluation: stops on first match.
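
The short-circuit ordering described above can be sketched with `Option::or_else`, which only runs the next tier when the previous one returned `None`. This is a minimal illustration, not the crate's implementation: `Match` and `first_match` are hypothetical names, and each tier is modelled as a plain closure rather than a real checker.

```rust
/// Hypothetical marker for which tier found a duplicate
/// (a stand-in, not the crate's `DuplicateCheckResult`).
#[derive(Debug, PartialEq)]
pub enum Match {
    Exact,
    Semantic,
    Recent,
}

/// Runs the tiers in order and stops at the first one that returns `Some`.
/// Each tier is a closure, so later tiers are never evaluated after a match.
pub fn first_match(
    exact: impl FnOnce() -> Option<Match>,
    semantic: impl FnOnce() -> Option<Match>,
    recent: impl FnOnce() -> Option<Match>,
) -> Option<Match> {
    exact().or_else(semantic).or_else(recent)
}
```

Passing closures instead of precomputed results is what keeps the more expensive tiers from running once a cheaper tier has already matched.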

§Example

use subcog::services::deduplication::{DeduplicationService, DeduplicationConfig};
use subcog::services::recall::RecallService;
use subcog::embedding::FastEmbedEmbedder;
use subcog::storage::vector::UsearchBackend;
use std::sync::Arc;

let recall = Arc::new(RecallService::default());
let embedder = Arc::new(FastEmbedEmbedder::new());
let vector = Arc::new(UsearchBackend::in_memory(384));
let config = DeduplicationConfig::default();

let service = DeduplicationService::new(recall, embedder, vector, config);

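// `Namespace` and the `Deduplicator` trait (which provides `check_duplicate`) must also be in scope.
let result = service.check_duplicate("Use PostgreSQL", Namespace::Decisions)?;
if result.is_duplicate {
    // Avoid `unwrap()` so a match without a URN cannot panic.
    if let Some(urn) = &result.matched_urn {
        println!("Duplicate found: {:?} - {}", result.reason, urn);
    }
}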

Fields§

§config: DeduplicationConfig

Configuration.

§exact_match: ExactMatchChecker

Exact match checker.

§semantic: Option<SemanticSimilarityChecker<E, V>>

Semantic similarity checker (optional - may be disabled).

§recent: RecentCaptureChecker

Recent capture checker.

§domain: Domain

Domain for URN construction.

Implementations§

impl<E: Embedder + Send + Sync, V: VectorBackend + Send + Sync> DeduplicationService<E, V>

pub fn new( recall: Arc<RecallService>, embedder: Arc<E>, vector: Arc<V>, config: DeduplicationConfig, ) -> Self

Creates a new deduplication service with all checkers.

§Arguments
  • recall - RecallService for exact match searches
  • embedder - Embedder for semantic similarity
  • vector - VectorBackend for semantic similarity searches
  • config - Configuration including thresholds

pub fn without_embeddings( recall: Arc<RecallService>, config: DeduplicationConfig, ) -> Self

Creates a service without semantic checking.

Useful when embeddings are unavailable or disabled. Only performs exact match and recent capture checks.

§Arguments
  • recall - RecallService for exact match searches
  • config - Configuration

pub fn with_domain(self, domain: Domain) -> Self

Sets the domain for URN construction.

fn domain_string(&self) -> String

Returns the domain string for URN construction.

fn check_exact_match( &self, content: &str, namespace: Namespace, domain: &str, start: Instant, ) -> Option<DuplicateCheckResult>

Performs exact match check.

fn check_semantic( &self, content: &str, namespace: Namespace, domain: &str, start: Instant, ) -> Option<DuplicateCheckResult>

Performs semantic similarity check.
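
As an illustration of the embedding cosine similarity this tier relies on, the following is a minimal sketch that compares two vectors directly. `cosine_similarity` and `is_semantic_duplicate` are hypothetical helpers; the actual checker searches a `VectorBackend`, and whether it treats a score equal to the threshold as a match is an assumption here.

```rust
/// Cosine similarity of two embeddings. Returns `None` when the lengths
/// differ or either vector has zero magnitude (the value is undefined there).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Assumed rule: content counts as a semantic duplicate when the
/// similarity is at least `threshold`.
pub fn is_semantic_duplicate(a: &[f32], b: &[f32], threshold: f32) -> bool {
    cosine_similarity(a, b).map_or(false, |score| score >= threshold)
}
```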

fn check_recent( &self, content: &str, namespace: Namespace, start: Instant, ) -> Option<DuplicateCheckResult>

Performs recent capture check.

fn record_unique_check_metrics(&self, namespace: Namespace, duration_ms: u64)

Records final metrics for a unique check.

pub fn check( &self, content: &str, namespace: Namespace, ) -> Result<DuplicateCheckResult>

Checks if content is a duplicate.

Performs checks in order: exact match → semantic → recent capture. Returns early on first match (short-circuit evaluation).

§Arguments
  • content - The content to check
  • namespace - The namespace to check within
§Returns

A DuplicateCheckResult with match details.

§Errors

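The signature allows an error to be returned if a check fails. In practice, individual check failures are handled gracefully: the failing check is logged and skipped, so it does not by itself prevent the remaining checks from running.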
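
The "logged and skipped" behaviour can be sketched as a small adapter that turns a failed tier into "no match". This is an assumption about the shape of the logic, not the crate's code: `skip_on_error` is a hypothetical helper, and `eprintln!` stands in for whatever logging the service actually uses.

```rust
use std::fmt::Display;

/// Converts a tier's outcome into an `Option`: a failure is reported
/// and then treated the same as "no duplicate found by this tier".
pub fn skip_on_error<T, E: Display>(tier: &str, outcome: Result<Option<T>, E>) -> Option<T> {
    match outcome {
        Ok(found) => found,
        Err(err) => {
            // Stand-in for structured logging.
            eprintln!("dedup check '{}' failed, skipping: {}", tier, err);
            None
        }
    }
}
```

With this shape, a broken tier degrades detection quality rather than blocking the capture path.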

pub fn record_capture( &self, content: &str, memory_id: &MemoryId, namespace: Namespace, )

Records a successful capture for future duplicate detection.

Should be called after a memory is successfully captured to enable recent-capture detection.

§Arguments
  • content - The captured content
  • memory_id - The ID of the captured memory
  • namespace - The namespace the content was captured to
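
The record-then-check flow behind recent-capture detection can be sketched with a small bounded LRU cache keyed by content hash. Everything here is hypothetical (`RecentCaptures`, its fields, and string memory IDs); the real `RecentCaptureChecker` may differ, for example in how entries expire.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical bounded cache: content hash -> ID of the memory captured with it.
pub struct RecentCaptures {
    capacity: usize,
    order: VecDeque<String>,      // least recently used hash at the front
    ids: HashMap<String, String>, // content hash -> memory ID
}

impl RecentCaptures {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), ids: HashMap::new() }
    }

    /// Records a capture, evicting the least recently used entry when full.
    pub fn record(&mut self, content_hash: &str, memory_id: &str) {
        if self.capacity == 0 {
            return;
        }
        // Drop any stale position so the hash is re-added as most recent.
        self.order.retain(|h| h.as_str() != content_hash);
        if self.ids.len() >= self.capacity && !self.ids.contains_key(content_hash) {
            if let Some(oldest) = self.order.pop_front() {
                self.ids.remove(&oldest);
            }
        }
        self.order.push_back(content_hash.to_string());
        self.ids.insert(content_hash.to_string(), memory_id.to_string());
    }

    /// Returns the memory ID for a recently captured hash and marks it as used.
    pub fn lookup(&mut self, content_hash: &str) -> Option<&str> {
        if self.ids.contains_key(content_hash) {
            self.order.retain(|h| h.as_str() != content_hash);
            self.order.push_back(content_hash.to_string());
        }
        self.ids.get(content_hash).map(String::as_str)
    }
}
```

If `record` is never called after a capture, `lookup` has nothing to find, which is why recording after every successful capture matters for this tier.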

pub fn record_capture_by_hash( &self, content_hash: &str, memory_id: &MemoryId, namespace: Namespace, )

Records a capture by content hash.

Useful when the hash has already been computed.

§Arguments
  • content_hash - The pre-computed content hash
  • memory_id - The ID of the captured memory
  • namespace - The namespace the content was captured to

pub fn content_to_tag(content: &str) -> String

Returns the hash tag for content.

This tag should be added to the memory’s tags during capture to enable future exact-match detection.
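
The tag-based approach can be sketched as follows: derive a deterministic tag from the content, store it with the memory, and later look for the same tag. To stay dependency-free, this sketch swaps in the standard library's `DefaultHasher` in place of SHA-256, so its digests will not match the crate's. The function names and the `hash:` prefix are hypothetical, and `DefaultHasher` output is not guaranteed to be stable across Rust releases, so it is unsuitable for persisted tags.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for `hash_content`: hex digest from `DefaultHasher`, NOT SHA-256.
pub fn hash_content_sketch(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Stand-in for `content_to_tag`: the `hash:` prefix is a hypothetical format.
pub fn content_to_tag_sketch(content: &str) -> String {
    format!("hash:{}", hash_content_sketch(content))
}

/// Exact-match check: does any stored tag equal the tag of the new content?
pub fn has_exact_match(existing_tags: &[String], content: &str) -> bool {
    let tag = content_to_tag_sketch(content);
    existing_tags.iter().any(|stored| *stored == tag)
}
```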

pub fn hash_content(content: &str) -> String

Returns the content hash for the given content.

pub const fn is_enabled(&self) -> bool

Returns true if deduplication is enabled.

pub fn get_threshold(&self, namespace: Namespace) -> f32

Returns the configured threshold for a namespace.
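
One plausible shape for a per-namespace threshold lookup is a map of overrides with a default fallback. This is only a sketch under that assumption: `ThresholdConfig`, its fields, and the `Patterns` variant are hypothetical, and the layout of the real `DeduplicationConfig` is not shown on this page.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `Namespace` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Namespace {
    Decisions,
    Patterns,
}

/// Hypothetical config: a default similarity threshold plus per-namespace overrides.
pub struct ThresholdConfig {
    pub default_threshold: f32,
    pub overrides: HashMap<Namespace, f32>,
}

impl ThresholdConfig {
    /// Returns the override for `namespace` if present, else the default.
    pub fn threshold_for(&self, namespace: Namespace) -> f32 {
        self.overrides
            .get(&namespace)
            .copied()
            .unwrap_or(self.default_threshold)
    }
}
```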

Trait Implementations§

impl<E: Embedder + Send + Sync, V: VectorBackend + Send + Sync> Deduplicator for DeduplicationService<E, V>

Implementation of the Deduplicator trait.

fn check_duplicate( &self, content: &str, namespace: Namespace, ) -> Result<DuplicateCheckResult>

Checks if content is a duplicate.

fn record_capture(&self, content_hash: &str, memory_id: &MemoryId)

Records a successful capture for recent-capture tracking. Unlike the inherent record_capture method, this trait method takes a pre-computed content hash rather than the raw content, and has no namespace parameter.

Auto Trait Implementations§

impl<E, V> !Freeze for DeduplicationService<E, V>

impl<E, V> !RefUnwindSafe for DeduplicationService<E, V>

impl<E, V> Send for DeduplicationService<E, V>

impl<E, V> Sync for DeduplicationService<E, V>

impl<E, V> Unpin for DeduplicationService<E, V>

impl<E, V> !UnwindSafe for DeduplicationService<E, V>
