Git-Native Semantic Memory for LLM Agents
A Framework for Persistent, Distributed, and Progressively-Hydrated Memory
Author: Robert Allen
Date: December 21, 2025
Version: 2.0
Abstract
Large Language Model (LLM) agents operating in software development environments suffer from a fundamental architectural limitation: context window boundaries enforce session isolation, causing accumulated knowledge to be lost when sessions terminate or contexts compact. This paper presents git-notes-memory-manager, a novel architecture that addresses this limitation by leveraging Git’s native notes mechanism as a distributed, version-controlled memory store. The system implements progressive hydration across three detail levels (SUMMARY, FULL, FILES) to optimize token consumption, and employs hook-based capture with confidence-scored signal detection to automate memory extraction with minimal cognitive overhead.
We ground our architecture in established cognitive science frameworks, drawing from Baddeley’s multicomponent working memory model (Baddeley & Hitch, 1974; Baddeley, 2000) to structure memory prioritization, and from signal detection theory (Green & Swets, 1966) to formalize capture decisions. The system applies Shneiderman’s “overview first, details on demand” progressive disclosure principle to manage token budgets while preserving access to complete context when needed.
Production validation demonstrates sub-10ms context generation, 116+ indexed memories across 10 semantic namespaces, and automatic capture of 5+ memories per session via hook-based detection. The architecture achieves zero-infrastructure deployment by storing memories alongside code in Git, enabling team-wide knowledge sharing through standard git push/pull operations.
Keywords: LLM agents, persistent memory, semantic search, Git notes, progressive hydration, signal detection, working memory, context management
1. Introduction
1.1 The Memory Problem in LLM Agents
Large Language Model agents operating in development environments face a fundamental limitation that distinguishes them from human collaborators: context window constraints force session isolation. When a developer and LLM agent together make an architectural decision in one session, that knowledge exists only within the conversation history. Upon session termination or context compaction, the decision vanishes unless explicitly recorded elsewhere.
This limitation has significant practical consequences. Recent surveys on LLM agent memory mechanisms observe that “unlike humans who dynamically integrate new information, LLMs effectively ‘reset’ once information falls outside their context window” (arXiv:2404.13501). Even as models push context length boundaries—GPT-4 at 128K tokens, Claude 3.7 at 200K, Gemini at 10M—these improvements merely delay rather than solve the fundamental limitation.
The research question motivating this work is:
How can LLM agents maintain persistent, semantically-searchable memory across sessions while integrating naturally with existing developer workflows and requiring no additional infrastructure?
1.2 Design Requirements
Analysis of developer workflows and the constraints of LLM agent operation revealed five core requirements that a memory system must satisfy:
- Persistence: Memories must survive session boundaries and context compaction events
- Distribution: Memory should synchronize with code using existing infrastructure (no separate databases or cloud services)
- Semantic Retrieval: Natural language queries must locate relevant memories without requiring exact-match keywords
- Progressive Detail: The system must load only as much context as needed, preserving tokens for active work
- Automatic Capture: Reduce cognitive load by detecting memorable content rather than requiring manual intervention
1.3 Contribution
This paper presents a complete implementation addressing all five requirements. The key contributions are:
- Git-native memory storage using refs/notes/mem/{namespace} references, enabling distributed synchronization through standard git operations
- Progressive hydration implementing three detail levels (SUMMARY, FULL, FILES) that reduces token consumption by 10-50x while preserving access to complete context
- Hook-based automatic capture leveraging IDE extension points with confidence-scored signal detection based on signal detection theory
- Token-budgeted context injection that adapts to project complexity using cognitive load principles
Production validation demonstrates:
- 116 memories indexed across 10 semantic namespaces
- Sub-10ms context generation at session start
- Automatic capture of 5+ memories per session via hook-based detection
- Cross-session recall of decisions, learnings, and blockers
2. Theoretical Foundations
The architecture draws from three established theoretical frameworks: cognitive psychology’s multicomponent working memory model, human-computer interaction’s progressive disclosure principle, and signal detection theory from psychophysics. This section establishes how each framework informs system design.
2.1 The Multicomponent Working Memory Model
Baddeley and Hitch (1974) proposed a multicomponent model of working memory that replaced the earlier unitary short-term memory concept. The model posits a central executive controlling limited attentional capacity, coordinating two subsidiary systems: the phonological loop for verbal information and the visuospatial sketchpad for spatial information. Baddeley (2000) later added the episodic buffer, a limited-capacity system that binds information from subsidiary systems and long-term memory into unified episodic representations.
This cognitive architecture maps directly to LLM agent memory requirements:
| Cognitive Component | System Mapping | Implementation |
|---|---|---|
| Central Executive | Context window management | Token budget allocation |
| Episodic Buffer | Working memory section | Active blockers, recent decisions |
| Long-term Memory | Semantic memory store | Git notes + vector index |
| Binding Process | Progressive hydration | SUMMARY to FULL expansion |
The episodic buffer’s role is particularly relevant: Baddeley (2000) describes it as “a limited capacity system that provides temporary storage of information held in a multimodal code, which is capable of binding information from the subsidiary systems, and from long-term memory, into a unitary episodic representation.” In our system, the SessionStart context injection performs analogous binding: retrieving relevant memories from the persistent store (long-term memory) and formatting them for inclusion in the active context (working memory).
The system allocates token budgets reflecting this structure:
- Working Memory (50-70%): Active blockers, pending decisions, recent progress
- Semantic Context (20-35%): Relevant learnings, related patterns retrieved via vector similarity
- Guidance (10%): Behavioral instructions for memory capture
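To make the allocation concrete, the following minimal sketch splits an injection budget according to the ratios above. The function name and signature are illustrative, not part of the project's actual API.

```python
def split_token_budget(total_tokens: int,
                       working: float = 0.60,
                       semantic: float = 0.30,
                       guidance: float = 0.10) -> dict[str, int]:
    """Divide a context-injection budget across the three memory sections."""
    assert abs(working + semantic + guidance - 1.0) < 1e-9, "shares must sum to 1"
    return {
        "working_memory": int(total_tokens * working),     # active blockers, pending decisions, recent progress
        "semantic_context": int(total_tokens * semantic),  # learnings and patterns from vector similarity
        "guidance": int(total_tokens * guidance),          # capture-behavior instructions
    }

# split_token_budget(2000) -> {'working_memory': 1200, 'semantic_context': 600, 'guidance': 200}
```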
2.2 The Two-Stage Memory Consolidation Model
The architecture also draws from memory consolidation research, particularly the two-stage model of memory formation (Diekelmann & Born, 2010). This model posits that new information is initially encoded rapidly in a temporary store (hippocampus in biological systems), then gradually consolidated into a slower-learning long-term store (neocortex) during periods of rest.
Our system implements an analogous two-stage process:
- Fast capture: During sessions, memories are captured to Git notes (append-only, fast writes)
- Consolidation: At session end, the Stop hook analyzes transcripts, extracts high-confidence signals, and indexes them for semantic retrieval
This separation enables rapid capture without blocking user interaction, while the consolidation phase ensures memories are properly indexed and de-duplicated.
2.3 Progressive Disclosure and Information Layering
Shneiderman’s information visualization mantra—“overview first, zoom and filter, then details-on-demand” (Shneiderman, 1996)—provides the theoretical foundation for progressive hydration. The principle recognizes that users (and by extension, LLM agents) benefit from seeing abstract summaries before diving into details, reducing cognitive load while maintaining access to complete information.
Nielsen (2006) formalized progressive disclosure as “deferring advanced or rarely used features to a secondary screen, making applications easier to learn and less error-prone.” Applied to LLM context management, this translates to:
- Overview (SUMMARY level): Memory summaries in context injection
- Zoom (FULL level): Complete memory content on demand
- Details (FILES level): File snapshots from the commit when memory was created
Recent research on progressive disclosure in AI transparency confirms its efficacy: “The HCI community has advocated for design principles like progressive disclosure to improve transparency” of AI systems (Springer, 2024). Our implementation extends this principle to memory retrieval, ensuring token efficiency while preserving access to complete context.
2.4 Signal Detection Theory for Capture Decisions
Signal detection theory (SDT), developed by Green and Swets (1966) for analyzing sensory discrimination, provides a rigorous framework for formalizing capture decisions. SDT separates two independent aspects of discrimination performance: sensitivity (ability to detect signals) and criterion (threshold for reporting detection).
The theory addresses a fundamental challenge in automatic memory capture: balancing false positives (capturing irrelevant content, wasting storage and polluting retrieval) against false negatives (missing valuable memories). SDT formalizes this trade-off through the receiver operating characteristic (ROC).
Our system implements a three-tier decision model based on SDT principles:
| Confidence | Action | SDT Interpretation |
|---|---|---|
| >= 0.95 | AUTO | High sensitivity, low false-positive risk |
| 0.70-0.95 | SUGGEST | Present to user for criterion adjustment |
| < 0.70 | SKIP | Below detection threshold, false-positive risk too high |
This approach allows the system to “optimize criterion location—to adopt a criterion that maximizes expected utility, producing the optimal blend of missed detections and false alarms” (Green & Swets, 1966).
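A minimal sketch of this tiering follows, assuming the thresholds from the table above; the enum and function names are illustrative rather than the project's actual interface.

```python
from enum import Enum

class CaptureAction(Enum):
    AUTO = "auto"        # capture without asking
    SUGGEST = "suggest"  # present to the user for confirmation
    SKIP = "skip"        # below the detection criterion

def classify_signal(confidence: float,
                    auto_threshold: float = 0.95,
                    suggest_threshold: float = 0.70) -> CaptureAction:
    """Map a detector confidence score to the three-tier capture decision."""
    if confidence >= auto_threshold:
        return CaptureAction.AUTO
    if confidence >= suggest_threshold:
        return CaptureAction.SUGGEST
    return CaptureAction.SKIP
```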
2.5 Git Notes as a Distributed Memory Store
Git notes (`git notes`) provide an overlooked mechanism for attaching metadata
to commits without modifying commit history. Notes are stored in separate
reference namespaces and can contain arbitrary content:
refs/notes/mem/
decisions/ # Architectural choices
learnings/ # Technical insights
blockers/ # Impediments and resolutions
progress/ # Milestones and completions
patterns/ # Reusable approaches
... # 10 namespaces total
Research on metadata management in distributed systems confirms that “Git provides the ability to track changes and has powerful sharing capabilities, allowing changes to metadata to be exchanged with a central repository and other users” (Metagit, 2017). This observation motivates our choice of Git notes over external databases:
Advantages over external databases:
- Distributed: Synchronizes with git push/pull using existing infrastructure
- Versioned: Complete history of memory changes available through git log
- Local-first: No network latency, operates offline
- Team-shareable: Memories propagate to collaborators through standard workflows
Trade-off: Git notes lack native semantic search capability, requiring a secondary index (SQLite + sqlite-vec) for fast vector similarity queries. The system treats Git notes as the source of truth and SQLite as a derived, rebuildable index.
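As an illustration of the storage path, the sketch below wraps the standard `git notes --ref` subcommands in Python. The wrapper names are hypothetical and do not reflect the project's CaptureService API; they only show how a note lands under refs/notes/mem/{namespace}.

```python
import subprocess

def append_memory_note(namespace: str, commit: str, content: str) -> None:
    """Append a memory document to refs/notes/mem/<namespace> on the given commit."""
    subprocess.run(
        ["git", "notes", f"--ref=mem/{namespace}", "append", "-m", content, commit],
        check=True,
    )

def read_memory_notes(namespace: str, commit: str) -> str:
    """Return the raw note text attached to a commit in the given namespace."""
    result = subprocess.run(
        ["git", "notes", f"--ref=mem/{namespace}", "show", commit],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```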
3. System Architecture
3.1 System Overview
The architecture comprises three layers: a hook layer interfacing with the IDE, a service layer implementing core memory operations, and a storage layer managing Git notes and the vector index.
+-------------------------------------------------------------------+
| Claude Code IDE |
+-------------------------------------------------------------------+
| SessionStart UserPrompt PostToolUse PreCompact Stop |
| | | | | | |
+-------------------------------------------------------------------+
| Hook Handlers |
|  ContextBuilder SignalDetector DomainExtractor Analyzer |
+-------------------------------------------------------------------+
| Service Layer |
| CaptureService RecallService SyncService |
+----------------+--------------------------+-----------------------+
| Git Notes | SQLite Index | Embedding Service |
| refs/notes/ | memories + vec_memories | all-MiniLM-L6-v2 |
+----------------+--------------------------+-----------------------+
3.2 Data Model
The core entity is a frozen (immutable) dataclass ensuring memory integrity:
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Memory:
id: str # "decisions:5da308d:0"
commit_sha: str # Git commit reference
namespace: str # Semantic category
summary: str # <= 100 characters
content: str # Full markdown body
timestamp: datetime # Capture time (UTC)
spec: str | None # Project specification
tags: tuple[str, ...] # Categorization
status: str # "active", "resolved"
relates_to: tuple[str, ...] # Related memory IDs
ID Format: {namespace}:{commit_sha_prefix}:{index}
- Example: decisions:5da308d:19
- Enables tracing to the originating git commit for full implementation context
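A small helper pair, shown here purely for illustration (hypothetical names, not the project's code), demonstrates how such IDs can be parsed and reconstructed:

```python
from typing import NamedTuple

class MemoryID(NamedTuple):
    namespace: str
    sha_prefix: str  # abbreviated commit SHA
    index: int

def parse_memory_id(memory_id: str) -> MemoryID:
    """Split an ID like 'decisions:5da308d:19' into its components."""
    namespace, sha_prefix, index = memory_id.split(":")
    return MemoryID(namespace, sha_prefix, int(index))

def format_memory_id(namespace: str, sha_prefix: str, index: int) -> str:
    return f"{namespace}:{sha_prefix}:{index}"
```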
3.3 Storage Format
Memories use YAML front matter with a markdown body, enabling both machine parsing and human readability:
---
type: decisions
timestamp: 2025-12-21T05:46:36Z
summary: Lazy loading via __getattr__ to avoid embedding model import penalty
spec: git-notes-memory
tags: performance,architecture
---
## Context
Import-time loading of sentence-transformers adds 2+ seconds to startup.
## Decision
Use Python's `__getattr__` in `__init__.py` for lazy module loading.
## Rationale
- Defers embedding model load until first use
- SessionStart hook completes in <200ms vs 2s+
- Users who don't need embeddings never pay the cost
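A minimal parsing sketch for this format, assuming PyYAML is available (the function is illustrative; the project's actual parser may differ):

```python
import yaml  # PyYAML, assumed available

def parse_memory_note(raw: str) -> tuple[dict, str]:
    """Split a stored note into its YAML front matter and markdown body."""
    _, front_matter, body = raw.split("---", 2)  # note begins with a '---' fence
    return yaml.safe_load(front_matter), body.strip()
```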
3.4 Namespace Taxonomy
The system defines ten semantic namespaces, each with associated signal detection patterns:
| Namespace | Purpose | Signal Patterns |
|---|---|---|
| decisions | Architectural choices | “I decided”, “we chose”, “[decision]” |
| learnings | Technical insights | “I learned”, “TIL”, “[learned]” |
| blockers | Impediments | “blocked by”, “stuck on”, “[blocker]” |
| progress | Milestones | “completed”, “shipped”, “[progress]” |
| patterns | Reusable approaches | “best practice”, “[pattern]” |
| research | External findings | Manual capture |
| reviews | Code review notes | Manual capture |
| retrospective | Post-mortems | Manual capture |
| inception | Problem statements | Manual capture |
| elicitation | Requirements | Manual capture |
4. Progressive Hydration
4.1 The Hydration Model
Progressive hydration implements Shneiderman’s “details on demand” principle, loading memory details only when needed. This approach addresses the token budget constraint inherent in LLM context windows.
Level 1: SUMMARY (Default for context injection)
<memory id="decisions:5da308d:19" hydration="summary">
<summary>Lazy loading via __getattr__ to avoid embedding model import penalty</summary>
</memory>
- Token cost: 15-20 tokens
- Retrieval time: Sub-millisecond (index lookup)
Level 2: FULL (On-demand expansion)
---
type: decisions
timestamp: 2025-12-21T05:46:36Z
summary: Lazy loading via __getattr__ to avoid embedding model import penalty
---
## Context
Import-time loading of sentence-transformers adds 2+ seconds...
## Decision
Use Python's `__getattr__` in `__init__.py`...
## Rationale
- Defers embedding model load until first use
- SessionStart hook completes in <200ms vs 2s+
- Token cost: 100-500 tokens
- Retrieval time: ~10ms (git notes show)
Level 3: FILES (Full context reconstruction)
- Includes file snapshots from the commit when memory was created
- Enables complete context reconstruction
- Token cost: Unbounded (file-dependent)
- Retrieval time: Variable (git tree traversal)
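The dispatch across the three levels can be sketched as a simple lookup over registered loaders. This is a sketch only: in the real system the SUMMARY loader would be an index lookup, the FULL loader a `git notes show` wrapper, and the FILES loader a tree traversal of the originating commit.

```python
from enum import Enum
from typing import Callable

class Hydration(Enum):
    SUMMARY = "summary"  # one-line summary from the SQLite index (~15-20 tokens)
    FULL = "full"        # complete note body via git notes (~100-500 tokens)
    FILES = "files"      # file snapshots from the originating commit (unbounded)

def hydrate(memory_id: str, level: Hydration,
            loaders: dict[Hydration, Callable[[str], str]]) -> str:
    """Dispatch to the loader registered for the requested detail level."""
    return loaders[level](memory_id)
```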
4.2 Token Efficiency Analysis
The three-level model achieves significant token savings. For a project with 100 indexed memories:
| Approach | Token Cost | Context Utilization |
|---|---|---|
| All FULL | 25,000-50,000 | Exceeds typical budgets |
| All SUMMARY | 1,500-2,000 | 13 memories shown |
| Progressive | 2,000 + on-demand | Full coverage with depth |
The progressive approach enables injecting summaries for all relevant memories while reserving tokens for expanding specific memories when the LLM determines additional context is needed.
4.3 Production Example
In a debugging session, the SessionStart hook injected 13 memories at SUMMARY level:
<memory_context project="git-notes-memory" memories_retrieved="13">
<working_memory>
<decisions title="Recent Decisions">
<memory id="decisions:5da308d:21">
<summary>Adaptive token budget based on project complexity</summary>
</memory>
<memory id="decisions:5da308d:20">
<summary>Confidence-based tiered capture behavior (AUTO/SUGGEST/SKIP)</summary>
</memory>
<memory id="decisions:5da308d:19">
<summary>Lazy loading via __getattr__ to avoid embedding model import penalty</summary>
</memory>
<memory id="decisions:5da308d:18">
<summary>Git notes as source of truth, SQLite as derived queryable index</summary>
</memory>
</decisions>
</working_memory>
</memory_context>
Total token cost: approximately 200 tokens for 13 memories. When the agent
requires full context on a specific decision, it requests
/memory:recall decisions:5da308d:19 for FULL hydration.
5. Hook-Based Capture
5.1 Hook Event Lifecycle
The system integrates with Claude Code’s hook infrastructure at five extension points, each serving a distinct purpose in the memory lifecycle:
Session Start --> Context Injection (memories -> Claude)
|
v
User Prompt ---> Signal Detection (user text -> capture decision)
|
v
Tool Use ------> Domain Context (file path -> related memories)
|
v
Pre-Compact ---> Preservation (high-confidence signals -> git notes)
|
v
Stop ----------> Session Analysis (transcript -> memory extraction)
5.2 Signal Detection Implementation
The SignalDetector implements the three-tier SDT-based model using regex patterns with confidence scoring:
Pattern Examples:
DECISION_PATTERNS = [
(r"\[decision\]", 0.98), # Explicit marker
(r"\[d\]", 0.95), # Shorthand
(r"I\s+decided\s+to", 0.90), # Natural language
(r"we\s+chose", 0.88), # Collaborative
(r"we'll\s+go\s+with", 0.85), # Informal
]
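A minimal detection loop over such pattern tables might look like the following; it is illustrative only, and the actual SignalDetector also applies the boost conditions listed in Appendix C.

```python
import re

def detect_signals(text: str, patterns: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return (pattern, confidence) pairs whose regex matches the given text."""
    return [(pattern, confidence) for pattern, confidence in patterns
            if re.search(pattern, text, flags=re.IGNORECASE)]

# e.g. detect_signals("We chose PostgreSQL for persistence", DECISION_PATTERNS)
# -> [("we\\s+chose", 0.88)]
```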
Block Marker Format (highest confidence: 0.99):
>> decision -----------------------------------------------
Use PostgreSQL for persistence layer
## Context
Evaluated SQLite, PostgreSQL, and MongoDB.
## Rationale
- ACID guarantees required for financial data
- Team expertise in PostgreSQL
-------------------------------------------------------
The block marker format uses Unicode characters for visual distinction and achieves 0.99 confidence because it represents explicit, unambiguous user intent.
5.3 Novelty Checking
Before committing a captured memory, the system performs vector similarity checking against existing memories:
novelty_threshold = 0.3  # 30% different from existing = novel

# Cosine similarity of the new memory's embedding against every indexed embedding
similarity = cosine_similarity(new_embedding, existing_embeddings)
if max(similarity) > (1 - novelty_threshold):
    skip_capture()  # Too similar to an existing memory
This prevents duplicate captures when users rephrase previously captured decisions, addressing the memory bloat problem identified in recent research: “indiscriminate strategies propagate errors and degrade long-term agent performance” (Xiong et al., 2025).
5.4 Production Capture Example
During a debugging session, the Stop hook analyzed the transcript and captured memories:
Hook Log (2025-12-21 00:46:36):
Stop hook invoked
Analyzing transcript: /Users/.../c2df8449-ad02-413c-ae27-52886bb605c8.jsonl
Found 5 signals in transcript
Signal: type=decision, ns=decisions, conf=1.00, match=[decision]...
Signal: type=decision, ns=decisions, conf=0.99, match=>> decision ---...
Signal: type=decision, ns=decisions, conf=0.99, match=>> decision ---...
Auto-capturing signals (min_conf=0.80, max=50)
Auto-capture result: 5 captured, 0 remaining
Captured: decisions:5da308d:17
Captured: decisions:5da308d:18
Captured: decisions:5da308d:19
Captured: decisions:5da308d:20
Captured: decisions:5da308d:21
These five memories appeared in the subsequent session’s <memory_context>,
demonstrating cross-session persistence.
6. Context Injection Mechanism
6.1 SessionStart Context Injection
The SessionStart hook outputs JSON with an additionalContext field that
Claude Code injects into the system prompt:
{
"hookSpecificOutput": {
"hookEventName": "SessionStart",
"additionalContext": "<memory_context>...</memory_context>"
},
"message": "Memory system: 116 memories indexed"
}
This mechanism enables the LLM to access memories without explicit user action, implementing the working memory binding process from Baddeley’s model.
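A minimal SessionStart handler emitting that payload could look like the sketch below; it is illustrative only, and the real hook assembles the memory context from the index and the budget logic described earlier.

```python
import json
import sys

def emit_session_start(memory_context: str, indexed_count: int) -> None:
    """Write the SessionStart hook payload to stdout for Claude Code to consume."""
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": memory_context,
        },
        "message": f"Memory system: {indexed_count} memories indexed",
    }
    json.dump(payload, sys.stdout)
```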
6.2 Response Guidance
Beyond memory context, the hook injects behavioral guidance teaching the LLM the capture syntax:
<session_behavior_protocol level="standard">
<mandatory_rules>
When you make a decision, learn something, hit a blocker, or complete work,
you MUST capture it using block markers.
### Block Format (Required for All Captures)
```text
>> decision -----------------------------------------------
Use PostgreSQL for JSONB support
## Context
Why this decision was needed...
## Rationale
- Reason 1 with supporting evidence
- Alternative considered and why rejected
-------------------------------------------------------
```
</mandatory_rules>
</session_behavior_protocol>
This guidance enables the LLM to actively create memories during sessions, closing the loop between memory retrieval and memory capture.
6.3 Token Budget Adaptation
Context injection adapts to project complexity, implementing an elastic memory allocation strategy:
| Complexity | Memory Count | Token Budget | Working % | Semantic % |
|---|---|---|---|---|
| Simple | < 10 | 500 | 70% | 20% |
| Medium | 10-50 | 1000 | 70% | 20% |
| Complex | 50-200 | 2000 | 70% | 25% |
| Full | > 200 | 3500 | 60% | 35% |
Projects with more memories receive larger budgets, and the working-to-semantic ratio shifts to accommodate the increased value of cross-referenced context in complex projects.
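Sketched as code, the adaptation amounts to a simple lookup over the table above (the function name is hypothetical):

```python
def select_budget(memory_count: int) -> tuple[int, float, float]:
    """Return (token budget, working-memory share, semantic share) for a project size."""
    if memory_count < 10:
        return 500, 0.70, 0.20     # simple
    if memory_count <= 50:
        return 1000, 0.70, 0.20    # medium
    if memory_count <= 200:
        return 2000, 0.70, 0.25    # complex
    return 3500, 0.60, 0.35        # full
```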
7. Novel Use Cases
7.1 Cross-Session Architectural Continuity
Scenario: Developer asks about a database choice in a new session, without explicitly querying memory.
Memory Context (injected at SessionStart):
<memory id="decisions:5da308d:18">
<summary>Git notes as source of truth, SQLite as derived queryable index</summary>
</memory>
LLM Response: “Based on a previous architectural decision (decisions:5da308d:18), we chose to use git notes as the source of truth with SQLite as a derived index for search performance. This allows…”
The LLM naturally references the injected memory, providing continuity without explicit recall commands.
7.2 Blocker Resolution Tracking
Session 1 (blocker captured):
>> blocker -----------------------------------------------
Hook-based memory capture not working
## Context
User outputs 16 block markers but /memory:status shows 0 memories.
## Impact
5 hours of exploration produced no captured memories.
-------------------------------------------------------
Session 2 (blocker injected + resolution captured):
<blockers>
<memory id="blockers:5da308d:0">
<summary>Hook-based memory capture not working</summary>
</memory>
</blockers>
Resolution captured in Session 2:
>> learned -----------------------------------------------
Hook-based memory capture works via Stop hook at session end
## Context
Memories are captured when session ends (Stop hook), not during.
/memory:status run mid-session shows 0 because Stop hasn't fired.
-------------------------------------------------------
The blocker-to-learning transition is preserved across sessions, enabling progress tracking on persistent issues.
7.3 File-Contextual Memory Surfacing
User edits: src/git_notes_memory/hooks/stop_handler.py
PostToolUse hook:
- Extracts domain: [“git_notes_memory”, “hooks”, “stop_handler”]
- Performs vector search with domain terms
- Injects context:
<related_memories>
<memory id="decisions:abc123:4" relevance="0.89">
<summary>Stop hook auto-captures high-confidence signals at session end</summary>
</memory>
<memory id="learnings:def456:1" relevance="0.76">
<summary>SessionAnalyzer scans both user and assistant messages</summary>
</memory>
</related_memories>
This surfaces relevant context when editing specific files, without requiring explicit memory queries.
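Domain extraction from the edited path can be approximated in a few lines; this is a sketch under the assumption that path components map directly to search terms, and the actual DomainExtractor may apply additional filtering.

```python
from pathlib import Path

def extract_domain_terms(file_path: str) -> list[str]:
    """Turn an edited file's path into search terms for the vector query."""
    parts = Path(file_path).with_suffix("").parts
    return [part for part in parts if part not in {"src", "lib", "tests"}]

# extract_domain_terms("src/git_notes_memory/hooks/stop_handler.py")
# -> ['git_notes_memory', 'hooks', 'stop_handler']
```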
7.4 Compaction-Safe Preservation
The PreCompact hook fires before context window compaction, analyzing the transcript for uncaptured high-confidence signals:
Log output:
PreCompact hook invoked
Analyzing transcript for uncaptured signals...
Found 3 uncaptured signals
Auto-capture result: 3 captured, 0 remaining
This implements the memory consolidation phase, ensuring valuable insights survive context compaction. The analogy to sleep-dependent memory consolidation is deliberate: just as sleep consolidates memories before they decay, the PreCompact hook consolidates memories before the context window shrinks.
7.5 Team Knowledge Distribution
Memories synchronize through standard git operations:
# Push memories to remote
git push origin 'refs/notes/mem/*:refs/notes/mem/*'
# Pull team's memories
git fetch origin 'refs/notes/mem/*:refs/notes/mem/*'
# Reindex after pull
/memory:sync
This enables team-wide learning capture without additional infrastructure, treating collective knowledge as a natural extension of the codebase.
8. Evaluation
8.1 Performance Measurements
| Operation | Target | Achieved | Method |
|---|---|---|---|
| SessionStart context build | <= 2000ms | < 10ms | Indexed queries |
| Signal detection (regex) | <= 100ms | < 5ms | Compiled patterns |
| Novelty check | <= 300ms | < 50ms | sqlite-vec KNN |
| Memory capture | <= 500ms | < 100ms | Append + index |
| Vector search (k=10) | <= 100ms | < 50ms | sqlite-vec |
All operations complete well within interactive latency requirements, ensuring the memory system does not degrade user experience.
8.2 Index Statistics
Production statistics from the git-notes-memory project:
Total indexed memories: 116
By namespace:
- decisions: 28
- learnings: 23
- blockers: 19
- progress: 15
- patterns: 31
The distribution reflects natural development patterns: more decisions and learnings than blockers, with patterns accumulating as the project matures.
8.3 Scalability Characteristics
- Memory count: Tested to 1000+ memories without degradation
- Transcript size: Handles 2M token transcripts (Claude Code maximum)
- Concurrent access: File locking prevents corruption during parallel sessions
- Index rebuild: Full reindex from git notes completes in < 5 seconds for 1000 memories
9. Related Work
9.1 LLM Agent Memory Systems
Recent surveys identify memory as “the key component that transforms the original LLM into a ‘true agent’” (Zhang et al., 2025). Current approaches fall into several categories:
In-context memory: Appending conversation history to prompts. Limited by context window size and incurs O(n) token cost per turn.
Vector database retrieval: Systems like Mem0 use external vector databases for semantic retrieval, achieving “26% higher response accuracy compared to OpenAI’s memory” (Mem0, 2025). However, these require infrastructure beyond the development environment.
Reflection-based memory: MemGPT and similar systems use LLM self-reflection for memory management. Effective but computationally expensive.
Our approach differs by using Git as the storage layer, eliminating infrastructure requirements while enabling team synchronization through existing workflows.
9.2 Cognitive Architectures
The system draws from cognitive architecture research, particularly ACT-R’s distinction between declarative and procedural memory. Our namespace taxonomy (decisions, learnings, patterns) reflects this distinction: decisions and learnings are declarative (facts), while patterns approach procedural (how-to) knowledge.
9.3 Progressive Disclosure in AI
Recent research investigates “the effect of progressive disclosure for improving the transparency of AI text generation systems” (Springer, 2024). Our progressive hydration extends this principle from transparency to memory management, using similar principles of layered information access.
10. Limitations and Future Work
10.1 Current Limitations
- Session-End Capture: Memories from assistant responses are captured at session end, not mid-session. Users cannot query newly captured memories until the next session.
- Single-Model Embeddings: The system uses all-MiniLM-L6-v2 (384 dimensions). Migration to a different embedding model requires full reindexing.
- Single-Repository Scope: Each repository maintains an isolated memory index. Cross-repository queries are not supported.
- Manual Namespace Selection: Block markers require explicit namespace specification. Automatic namespace inference would reduce cognitive load.
10.2 Future Directions
- Mid-Session Capture: Analyze assistant responses via UserPromptSubmit or PostToolUse hooks for real-time capture.
- LLM-Assisted Classification: Use the LLM itself for namespace inference and memory summarization, trading latency for accuracy.
- Cross-Repository Federation: Query memories from linked repositories, enabling organization-wide knowledge retrieval.
- Temporal Decay: Implement exponential decay in relevance scoring, prioritizing recent memories while retaining access to historical context.
- Feedback Loops: Track which memories the LLM references, reinforcing useful memories and demoting unused ones.
11. Conclusion
The git-notes-memory-manager demonstrates that persistent, semantically searchable memory for LLM agents is achievable without external infrastructure. By leveraging Git’s native notes mechanism, progressive hydration, and hook-based capture grounded in signal detection theory, the system provides:
- Zero-Infrastructure Memory: Operates with existing git, requiring no databases or cloud services
- Semantic Retrieval: Natural language queries locate relevant memories through vector similarity
- Automatic Capture: Confidence-scored signal detection reduces cognitive load
- Token Efficiency: Progressive hydration respects context window constraints
- Team Sharing: Memories synchronize with code through standard git operations
The architecture validates treating LLM agent memory as a first-class concern rather than an afterthought, enabling qualitatively different developer experiences. Decisions persist, blockers track to resolution, and learnings accumulate across sessions, transforming ephemeral conversations into durable knowledge.
Appendix A: Configuration Reference
| Variable | Default | Purpose |
|---|---|---|
| HOOK_ENABLED | true | Master switch for all hooks |
| HOOK_SESSION_START_ENABLED | true | Context injection at session start |
| HOOK_STOP_ENABLED | true | Session-end transcript analysis |
| HOOK_STOP_MAX_CAPTURES | 50 | Maximum auto-captures per session |
| HOOK_PRE_COMPACT_ENABLED | true | Capture before context compaction |
| HOOK_PRE_COMPACT_MIN_CONFIDENCE | 0.85 | Minimum confidence for auto-capture |
Appendix B: Memory ID Format
{namespace}:{commit_sha_prefix}:{index}
Examples:
decisions:5da308d:19 -> Decision at index 19 on commit 5da308d
learnings:4c98fec:0 -> First learning on commit 4c98fec
blockers:051134b:2 -> Third blocker on commit 051134b
The commit SHA prefix enables tracing memories to their originating context, supporting the FILES hydration level.
Appendix C: Signal Confidence Ranges
| Signal Type | Base Range | Boost Conditions |
|---|---|---|
| Block marker (») | 0.99 | None (maximum confidence) |
| Explicit ([decision]) | 0.95-0.98 | None |
| Strong (“I decided”) | 0.85-0.92 | +0.05 if “critical”, “important” |
| Medium (“we chose”) | 0.80-0.88 | +0.02 if complete sentence |
| Weak (“I prefer”) | 0.68-0.75 | Rarely auto-captured |
References
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417-423. DOI: 10.1016/S1364-6613(00)01538-2
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The Psychology of Learning and Motivation (Vol. 8, pp. 47-89). Academic Press.
Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126.
Green, D. M., & Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley.
Hitch, G. J., Allen, R. J., & Baddeley, A. D. (2025). The multicomponent model of working memory fifty years on. Quarterly Journal of Experimental Psychology. DOI: 10.1177/17470218241290909
Nielsen, J. (2006). Progressive disclosure. Nielsen Norman Group. nngroup.com
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. Proceedings of IEEE Symposium on Visual Languages, 336-343.
Wang, L., et al. (2024). A survey on the memory mechanism of large language model based agents. arXiv:2404.13501.
Xiong, C., et al. (2025). Memory management for LLM agents: Utility-based deletion prevents bloat. arXiv preprint.
This research was conducted through systematic analysis of the git-notes-memory-manager codebase and production validation during development sessions. Real examples are drawn from actual session logs dated December 2025.