Git-Native Semantic Memory for LLM Agents
A Framework for Persistent, Distributed, and Progressively-Hydrated Memory
Author: Robert Allen
Date: December 21, 2025
Version: 2.0
Abstract
Large Language Model (LLM) agents operating in software development environments suffer from a fundamental architectural limitation: context window boundaries enforce session isolation, causing accumulated knowledge to be lost when sessions terminate or contexts compact. This paper presents git-notes-memory-manager, a novel architecture that addresses this limitation by leveraging Git’s native notes mechanism as a distributed, version-controlled memory store. The system implements progressive hydration across three detail levels (SUMMARY, FULL, FILES) to optimize token consumption, and employs hook-based capture with confidence-scored signal detection to automate memory extraction with minimal cognitive overhead.
We ground our architecture in established cognitive science frameworks, drawing from Baddeley’s multicomponent working memory model (Baddeley & Hitch, 1974; Baddeley, 2000) to structure memory prioritization, and from signal detection theory (Green & Swets, 1966) to formalize capture decisions. The system applies Shneiderman’s “overview first, details on demand” progressive disclosure principle to manage token budgets while preserving access to complete context when needed.
Production validation demonstrates sub-10ms context generation, 116+ indexed memories across 10 semantic namespaces, and automatic capture of 5+ memories per session via hook-based detection. The architecture achieves zero-infrastructure deployment by storing memories alongside code in Git, enabling team-wide knowledge sharing through standard git push/pull operations.
Keywords: LLM agents, persistent memory, semantic search, Git notes, progressive hydration, signal detection, working memory, context management
1. Introduction
1.1 The Memory Problem in LLM Agents
Large Language Model agents operating in development environments face a fundamental limitation that distinguishes them from human collaborators: context window constraints force session isolation. When a developer and LLM agent together make an architectural decision in one session, that knowledge exists only within the conversation history. Upon session termination or context compaction, the decision vanishes unless explicitly recorded elsewhere.
This limitation has significant practical consequences. Recent surveys on LLM agent memory mechanisms observe that “unlike humans who dynamically integrate new information, LLMs effectively ‘reset’ once information falls outside their context window” (arXiv:2404.13501). Even as models push context length boundaries—GPT-4 at 128K tokens, Claude 3.7 at 200K, Gemini at 10M—these improvements merely delay rather than solve the fundamental limitation.
The research question motivating this work is:
How can LLM agents maintain persistent, semantically-searchable memory across sessions while integrating naturally with existing developer workflows and requiring no additional infrastructure?
1.2 Design Requirements
Analysis of developer workflows and the constraints of LLM agent operation revealed five core requirements that a memory system must satisfy:
- Persistence: Memories must survive session boundaries and context compaction events
- Distribution: Memory should synchronize with code using existing infrastructure (no separate databases or cloud services)
- Semantic Retrieval: Natural language queries must locate relevant memories without requiring exact-match keywords
- Progressive Detail: The system must load only as much context as needed, preserving tokens for active work
- Automatic Capture: Reduce cognitive load by detecting memorable content rather than requiring manual intervention
1.3 Contribution
This paper presents a complete implementation addressing all five requirements. The key contributions are:
- Git-native memory storage using refs/notes/mem/{namespace} references, enabling distributed synchronization through standard git operations
- Progressive hydration implementing three detail levels (SUMMARY, FULL, FILES) that reduces token consumption by 10-50x while preserving access to complete context
- Hook-based automatic capture leveraging IDE extension points with confidence-scored signal detection based on signal detection theory
- Token-budgeted context injection that adapts to project complexity using cognitive load principles
Production validation demonstrates:
- 116 memories indexed across 10 semantic namespaces
- Sub-10ms context generation at session start
- Automatic capture of 5+ memories per session via hook-based detection
- Cross-session recall of decisions, learnings, and blockers
2. Theoretical Foundations
The architecture draws from three established theoretical frameworks: cognitive psychology’s multicomponent working memory model, human-computer interaction’s progressive disclosure principle, and signal detection theory from psychophysics. This section establishes how each framework informs system design.
2.1 The Multicomponent Working Memory Model
Baddeley and Hitch (1974) proposed a multicomponent model of working memory that replaced the earlier unitary short-term memory concept. The model posits a central executive controlling limited attentional capacity, coordinating two subsidiary systems: the phonological loop for verbal information and the visuospatial sketchpad for spatial information. Baddeley (2000) later added the episodic buffer, a limited-capacity system that binds information from subsidiary systems and long-term memory into unified episodic representations.
This cognitive architecture maps directly to LLM agent memory requirements:
| Cognitive Component | System Mapping | Implementation |
|---|---|---|
| Central Executive | Context window management | Token budget allocation |
| Episodic Buffer | Working memory section | Active blockers, recent decisions |
| Long-term Memory | Semantic memory store | Git notes + vector index |
| Binding Process | Progressive hydration | SUMMARY to FULL expansion |
The episodic buffer’s role is particularly relevant: Baddeley (2000) describes it as “a limited capacity system that provides temporary storage of information held in a multimodal code, which is capable of binding information from the subsidiary systems, and from long-term memory, into a unitary episodic representation.” In our system, the SessionStart context injection performs analogous binding: retrieving relevant memories from the persistent store (long-term memory) and formatting them for inclusion in the active context (working memory).
The system allocates token budgets reflecting this structure:
- Working Memory (50-70%): Active blockers, pending decisions, recent progress
- Semantic Context (20-35%): Relevant learnings, related patterns retrieved via vector similarity
- Guidance (10%): Behavioral instructions for memory capture
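To make the allocation concrete, the following minimal sketch splits an injection budget according to the ratios above. The function name and signature are illustrative, not part of the project's actual API.

```python
def split_token_budget(total_tokens: int,
                       working: float = 0.60,
                       semantic: float = 0.30,
                       guidance: float = 0.10) -> dict[str, int]:
    """Divide a context-injection budget across the three memory sections."""
    assert abs(working + semantic + guidance - 1.0) < 1e-9, "shares must sum to 1"
    return {
        "working_memory": int(total_tokens * working),     # active blockers, pending decisions, recent progress
        "semantic_context": int(total_tokens * semantic),  # learnings and patterns from vector similarity
        "guidance": int(total_tokens * guidance),          # capture-behavior instructions
    }

# split_token_budget(2000) -> {'working_memory': 1200, 'semantic_context': 600, 'guidance': 200}
```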
2.2 The Two-Stage Memory Consolidation Model
The architecture also draws from memory consolidation research, particularly the two-stage model of memory formation (Diekelmann & Born, 2010). This model posits that new information is initially encoded rapidly in a temporary store (hippocampus in biological systems), then gradually consolidated into a slower-learning long-term store (neocortex) during periods of rest.
Our system implements an analogous two-stage process:
- Fast capture: During sessions, memories are captured to Git notes (append-only, fast writes)
- Consolidation: At session end, the Stop hook analyzes transcripts, extracts high-confidence signals, and indexes them for semantic retrieval
This separation enables rapid capture without blocking user interaction, while the consolidation phase ensures memories are properly indexed and de-duplicated.
2.3 Progressive Disclosure and Information Layering
Shneiderman’s information visualization mantra—“overview first, zoom and filter, then details-on-demand” (Shneiderman, 1996)—provides the theoretical foundation for progressive hydration. The principle recognizes that users (and by extension, LLM agents) benefit from seeing abstract summaries before diving into details, reducing cognitive load while maintaining access to complete information.
Nielsen (2006) formalized progressive disclosure as “deferring advanced or rarely used features to a secondary screen, making applications easier to learn and less error-prone.” Applied to LLM context management, this translates to:
- Overview (SUMMARY level): Memory summaries in context injection
- Zoom (FULL level): Complete memory content on demand
- Details (FILES level): File snapshots from the commit when memory was created
Recent research on progressive disclosure in AI transparency confirms its efficacy: “The HCI community has advocated for design principles like progressive disclosure to improve transparency” of AI systems (Springer, 2024). Our implementation extends this principle to memory retrieval, ensuring token efficiency while preserving access to complete context.
2.4 Signal Detection Theory for Capture Decisions
Signal detection theory (SDT), developed by Green and Swets (1966) for analyzing sensory discrimination, provides a rigorous framework for formalizing capture decisions. SDT separates two independent aspects of discrimination performance: sensitivity (ability to detect signals) and criterion (threshold for reporting detection).
The theory addresses a fundamental challenge in automatic memory capture: balancing false positives (capturing irrelevant content, wasting storage and polluting retrieval) against false negatives (missing valuable memories). SDT formalizes this trade-off through the receiver operating characteristic (ROC).
Our system implements a three-tier decision model based on SDT principles:
| Confidence | Action | SDT Interpretation |
|---|---|---|
| >= 0.95 | AUTO | High sensitivity, low false-positive risk |
| 0.70-0.95 | SUGGEST | Present to user for criterion adjustment |
| < 0.70 | SKIP | Below detection threshold, false-positive risk too high |
This approach allows the system to “optimize criterion location—to adopt a criterion that maximizes expected utility, producing the optimal blend of missed detections and false alarms” (Green & Swets, 1966).
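A minimal sketch of this tiering follows, assuming the thresholds from the table above; the enum and function names are illustrative rather than the project's actual interface.

```python
from enum import Enum

class CaptureAction(Enum):
    AUTO = "auto"        # capture without asking
    SUGGEST = "suggest"  # present to the user for confirmation
    SKIP = "skip"        # below the detection criterion

def classify_signal(confidence: float,
                    auto_threshold: float = 0.95,
                    suggest_threshold: float = 0.70) -> CaptureAction:
    """Map a detector confidence score to the three-tier capture decision."""
    if confidence >= auto_threshold:
        return CaptureAction.AUTO
    if confidence >= suggest_threshold:
        return CaptureAction.SUGGEST
    return CaptureAction.SKIP
```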
2.5 Git Notes as a Distributed Memory Store
Git notes (`git notes`) provide an overlooked mechanism for attaching metadata
to commits without modifying commit history. Notes are stored in separate
reference namespaces and can contain arbitrary content:
refs/notes/mem/
decisions/ # Architectural choices
learnings/ # Technical insights
blockers/ # Impediments and resolutions
progress/ # Milestones and completions
patterns/ # Reusable approaches
... # 10 namespaces total
Research on metadata management in distributed systems confirms that “Git provides the ability to track changes and has powerful sharing capabilities, allowing changes to metadata to be exchanged with a central repository and other users” (Metagit, 2017). This observation motivates our choice of Git notes over external databases:
Advantages over external databases:
- Distributed: Synchronizes with git push/pull using existing infrastructure
- Versioned: Complete history of memory changes available through git log
- Local-first: No network latency, operates offline
- Team-shareable: Memories propagate to collaborators through standard workflows
Trade-off: Git notes lack native semantic search capability, requiring a secondary index (SQLite + sqlite-vec) for fast vector similarity queries. The system treats Git notes as the source of truth and SQLite as a derived, rebuildable index.
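As an illustration of the storage path, the sketch below wraps the standard `git notes --ref` subcommands in Python. The wrapper names are hypothetical and do not reflect the project's CaptureService API; they only show how a note lands under refs/notes/mem/{namespace}.

```python
import subprocess

def append_memory_note(namespace: str, commit: str, content: str) -> None:
    """Append a memory document to refs/notes/mem/<namespace> on the given commit."""
    subprocess.run(
        ["git", "notes", f"--ref=mem/{namespace}", "append", "-m", content, commit],
        check=True,
    )

def read_memory_notes(namespace: str, commit: str) -> str:
    """Return the raw note text attached to a commit in the given namespace."""
    result = subprocess.run(
        ["git", "notes", f"--ref=mem/{namespace}", "show", commit],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```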
3. System Architecture
3.1 System Overview
The architecture comprises three layers: a hook layer interfacing with the IDE, a service layer implementing core memory operations, and a storage layer managing Git notes and the vector index.
+-------------------------------------------------------------------+
| Claude Code IDE |
+-------------------------------------------------------------------+
| SessionStart UserPrompt PostToolUse PreCompact Stop |
| | | | | | |
+-------------------------------------------------------------------+
| Hook Handlers |
|  ContextBuilder SignalDetector DomainExtractor Analyzer |
+-------------------------------------------------------------------+
| Service Layer |
| CaptureService RecallService SyncService |
+----------------+--------------------------+-----------------------+
| Git Notes | SQLite Index | Embedding Service |
| refs/notes/ | memories + vec_memories | all-MiniLM-L6-v2 |
+----------------+--------------------------+-----------------------+
3.2 Data Model
The core entity is a frozen (immutable) dataclass ensuring memory integrity:
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Memory:
id: str # "decisions:5da308d:0"
commit_sha: str # Git commit reference
namespace: str # Semantic category
summary: str # <= 100 characters
content: str # Full markdown body
timestamp: datetime # Capture time (UTC)
spec: str | None # Project specification
tags: tuple[str, ...] # Categorization
status: str # "active", "resolved"
relates_to: tuple[str, ...] # Related memory IDs
ID Format: {namespace}:{commit_sha_prefix}:{index}
- Example: decisions:5da308d:19
- Enables tracing to the originating git commit for full implementation context
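A small helper pair, shown here purely for illustration (hypothetical names, not the project's code), demonstrates how such IDs can be parsed and reconstructed:

```python
from typing import NamedTuple

class MemoryID(NamedTuple):
    namespace: str
    sha_prefix: str  # abbreviated commit SHA
    index: int

def parse_memory_id(memory_id: str) -> MemoryID:
    """Split an ID like 'decisions:5da308d:19' into its components."""
    namespace, sha_prefix, index = memory_id.split(":")
    return MemoryID(namespace, sha_prefix, int(index))

def format_memory_id(namespace: str, sha_prefix: str, index: int) -> str:
    return f"{namespace}:{sha_prefix}:{index}"
```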
3.3 Storage Format
Memories use YAML front matter with a markdown body, enabling both machine parsing and human readability:
---
type: decisions
timestamp: 2025-12-21T05:46:36Z
summary: Lazy loading via __getattr__ to avoid embedding model import penalty
spec: git-notes-memory
tags: performance,architecture
---
## Context
Import-time loading of sentence-transformers adds 2+ seconds to startup.
## Decision
Use Python's `__getattr__` in `__init__.py` for lazy module loading.
## Rationale
- Defers embedding model load until first use
- SessionStart hook completes in <200ms vs 2s+
- Users who don't need embeddings never pay the cost
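A minimal parsing sketch for this format, assuming PyYAML is available (the function is illustrative; the project's actual parser may differ):

```python
import yaml  # PyYAML, assumed available

def parse_memory_note(raw: str) -> tuple[dict, str]:
    """Split a stored note into its YAML front matter and markdown body."""
    _, front_matter, body = raw.split("---", 2)  # note begins with a '---' fence
    return yaml.safe_load(front_matter), body.strip()
```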
3.4 Namespace Taxonomy
The system defines ten semantic namespaces, each with associated signal detection patterns:
| Namespace | Purpose | Signal Patterns |
|---|---|---|
| decisions | Architectural choices | “I decided”, “we chose”, “[decision]” |
| learnings | Technical insights | “I learned”, “TIL”, “[learned]” |
| blockers | Impediments | “blocked by”, “stuck on”, “[blocker]” |
| progress | Milestones | “completed”, “shipped”, “[progress]” |
| patterns | Reusable approaches | “best practice”, “[pattern]” |
| research | External findings | Manual capture |
| reviews | Code review notes | Manual capture |
| retrospective | Post-mortems | Manual capture |
| inception | Problem statements | Manual capture |
| elicitation | Requirements | Manual capture |
4. Progressive Hydration
4.1 The Hydration Model
Progressive hydration implements Shneiderman’s “details on demand” principle, loading memory details only when needed. This approach addresses the token budget constraint inherent in LLM context windows.
Level 1: SUMMARY (Default for context injection)
<memory id="decisions:5da308d:19" hydration="summary">
<summary>Lazy loading via __getattr__ to avoid embedding model import penalty</summary>
</memory>
- Token cost: 15-20 tokens
- Retrieval time: Sub-millisecond (index lookup)
Level 2: FULL (On-demand expansion)
---
type: decisions
timestamp: 2025-12-21T05:46:36Z
summary: Lazy loading via __getattr__ to avoid embedding model import penalty
---
## Context
Import-time loading of sentence-transformers adds 2+ seconds...
## Decision
Use Python's `__getattr__` in `__init__.py`...
## Rationale
- Defers embedding model load until first use
- SessionStart hook completes in <200ms vs 2s+
- Token cost: 100-500 tokens
- Retrieval time: ~10ms (git notes show)
Level 3: FILES (Full context reconstruction)
- Includes file snapshots from the commit when memory was created
- Enables complete context reconstruction
- Token cost: Unbounded (file-dependent)
- Retrieval time: Variable (git tree traversal)
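The dispatch across the three levels can be sketched as a simple lookup over registered loaders. This is a sketch only: in the real system the SUMMARY loader would be an index lookup, the FULL loader a `git notes show` wrapper, and the FILES loader a tree traversal of the originating commit.

```python
from enum import Enum
from typing import Callable

class Hydration(Enum):
    SUMMARY = "summary"  # one-line summary from the SQLite index (~15-20 tokens)
    FULL = "full"        # complete note body via git notes (~100-500 tokens)
    FILES = "files"      # file snapshots from the originating commit (unbounded)

def hydrate(memory_id: str, level: Hydration,
            loaders: dict[Hydration, Callable[[str], str]]) -> str:
    """Dispatch to the loader registered for the requested detail level."""
    return loaders[level](memory_id)
```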
4.2 Token Efficiency Analysis
The three-level model achieves significant token savings. For a project with 100 indexed memories:
| Approach | Token Cost | Context Utilization |
|---|---|---|
| All FULL | 25,000-50,000 | Exceeds typical budgets |
| All SUMMARY | 1,500-2,000 | 13 memories shown |
| Progressive | 2,000 + on-demand | Full coverage with depth |
The progressive approach enables injecting summaries for all relevant memories while reserving tokens for expanding specific memories when the LLM determines additional context is needed.
4.3 Production Example
In a debugging session, the SessionStart hook injected 13 memories at SUMMARY level:
<memory_context project="git-notes-memory" memories_retrieved="13">
<working_memory>
<decisions title="Recent Decisions">
<memory id="decisions:5da308d:21">
<summary>Adaptive token budget based on project complexity</summary>
</memory>
<memory id="decisions:5da308d:20">
<summary>Confidence-based tiered capture behavior (AUTO/SUGGEST/SKIP)</summary>
</memory>
<memory id="decisions:5da308d:19">
<summary>Lazy loading via __getattr__ to avoid embedding model import penalty</summary>
</memory>
<memory id="decisions:5da308d:18">
<summary>Git notes as source of truth, SQLite as derived queryable index</summary>
</memory>
</decisions>
</working_memory>
</memory_context>
Total token cost: approximately 200 tokens for 13 memories. When the agent
requires full context on a specific decision, it requests
/memory:recall decisions:5da308d:19 for FULL hydration.
5. Hook-Based Capture
5.1 Hook Event Lifecycle
The system integrates with Claude Code’s hook infrastructure at five extension points, each serving a distinct purpose in the memory lifecycle:
Session Start --> Context Injection (memories -> Claude)
|
v
User Prompt ---> Signal Detection (user text -> capture decision)
|
v
Tool Use ------> Domain Context (file path -> related memories)
|
v
Pre-Compact ---> Preservation (high-confidence signals -> git notes)
|
v
Stop ----------> Session Analysis (transcript -> memory extraction)
5.2 Signal Detection Implementation
The SignalDetector implements the three-tier SDT-based model using regex patterns with confidence scoring:
Pattern Examples:
DECISION_PATTERNS = [
(r"\[decision\]", 0.98), # Explicit marker
(r"\[d\]", 0.95), # Shorthand
(r"I\s+decided\s+to", 0.90), # Natural language
(r"we\s+chose", 0.88), # Collaborative
(r"we'll\s+go\s+with", 0.85), # Informal
]
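A minimal detection loop over such pattern tables might look like the following; it is illustrative only, and the actual SignalDetector also applies the boost conditions listed in Appendix C.

```python
import re

def detect_signals(text: str, patterns: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return (pattern, confidence) pairs whose regex matches the given text."""
    return [(pattern, confidence) for pattern, confidence in patterns
            if re.search(pattern, text, flags=re.IGNORECASE)]

# e.g. detect_signals("We chose PostgreSQL for persistence", DECISION_PATTERNS)
# -> [("we\\s+chose", 0.88)]
```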
Block Marker Format (highest confidence: 0.99):
>> decision -----------------------------------------------
Use PostgreSQL for persistence layer
## Context
Evaluated SQLite, PostgreSQL, and MongoDB.
## Rationale
- ACID guarantees required for financial data
- Team expertise in PostgreSQL
-------------------------------------------------------
The block marker format uses Unicode characters for visual distinction and achieves 0.99 confidence because it represents explicit, unambiguous user intent.
5.3 Novelty Checking
Before committing a captured memory, the system performs vector similarity checking against existing memories:
novelty_threshold = 0.3  # 30% different from existing = novel

# Cosine similarity of the new memory's embedding against every indexed embedding
similarity = cosine_similarity(new_embedding, existing_embeddings)
if max(similarity) > (1 - novelty_threshold):
    skip_capture()  # Too similar to an existing memory
This prevents duplicate captures when users rephrase previously captured decisions, addressing the memory bloat problem identified in recent research: “indiscriminate strategies propagate errors and degrade long-term agent performance” (Xiong et al., 2025).
5.4 Production Capture Example
During a debugging session, the Stop hook analyzed the transcript and captured memories:
Hook Log (2025-12-21 00:46:36):
Stop hook invoked
Analyzing transcript: /Users/.../c2df8449-ad02-413c-ae27-52886bb605c8.jsonl
Found 5 signals in transcript
Signal: type=decision, ns=decisions, conf=1.00, match=[decision]...
Signal: type=decision, ns=decisions, conf=0.99, match=>> decision ---...
Signal: type=decision, ns=decisions, conf=0.99, match=>> decision ---...
Auto-capturing signals (min_conf=0.80, max=50)
Auto-capture result: 5 captured, 0 remaining
Captured: decisions:5da308d:17
Captured: decisions:5da308d:18
Captured: decisions:5da308d:19
Captured: decisions:5da308d:20
Captured: decisions:5da308d:21
These five memories appeared in the subsequent session’s <memory_context>,
demonstrating cross-session persistence.
6. Context Injection Mechanism
6.1 SessionStart Context Injection
The SessionStart hook outputs JSON with an additionalContext field that
Claude Code injects into the system prompt:
{
"hookSpecificOutput": {
"hookEventName": "SessionStart",
"additionalContext": "<memory_context>...</memory_context>"
},
"message": "Memory system: 116 memories indexed"
}
This mechanism enables the LLM to access memories without explicit user action, implementing the working memory binding process from Baddeley’s model.
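A minimal SessionStart handler emitting that payload could look like the sketch below; it is illustrative only, and the real hook assembles the memory context from the index and the budget logic described earlier.

```python
import json
import sys

def emit_session_start(memory_context: str, indexed_count: int) -> None:
    """Write the SessionStart hook payload to stdout for Claude Code to consume."""
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": memory_context,
        },
        "message": f"Memory system: {indexed_count} memories indexed",
    }
    json.dump(payload, sys.stdout)
```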
6.2 Response Guidance
Beyond memory context, the hook injects behavioral guidance teaching the LLM the capture syntax:
<session_behavior_protocol level="standard">
<mandatory_rules>
When you make a decision, learn something, hit a blocker, or complete work,
you MUST capture it using block markers.
### Block Format (Required for All Captures)
```text
>> decision -----------------------------------------------
Use PostgreSQL for JSONB support
## Context
Why this decision was needed...
## Rationale
- Reason 1 with supporting evidence
- Alternative considered and why rejected
-------------------------------------------------------
```
</mandatory_rules>
</session_behavior_protocol>
This guidance enables the LLM to actively create memories during sessions, closing the loop between memory retrieval and memory capture.
6.3 Token Budget Adaptation
Context injection adapts to project complexity, implementing an elastic memory allocation strategy:
| Complexity | Memory Count | Token Budget | Working % | Semantic % |
|---|---|---|---|---|
| Simple | < 10 | 500 | 70% | 20% |
| Medium | 10-50 | 1000 | 70% | 20% |
| Complex | 50-200 | 2000 | 70% | 25% |
| Full | > 200 | 3500 | 60% | 35% |
Projects with more memories receive larger budgets, and the working-to-semantic ratio shifts to accommodate the increased value of cross-referenced context in complex projects.
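Sketched as code, the adaptation amounts to a simple lookup over the table above (the function name is hypothetical):

```python
def select_budget(memory_count: int) -> tuple[int, float, float]:
    """Return (token budget, working-memory share, semantic share) for a project size."""
    if memory_count < 10:
        return 500, 0.70, 0.20     # simple
    if memory_count <= 50:
        return 1000, 0.70, 0.20    # medium
    if memory_count <= 200:
        return 2000, 0.70, 0.25    # complex
    return 3500, 0.60, 0.35        # full
```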
7. Novel Use Cases
7.1 Cross-Session Architectural Continuity
Scenario: Developer asks about a database choice in a new session, without explicitly querying memory.
Memory Context (injected at SessionStart):
<memory id="decisions:5da308d:18">
<summary>Git notes as source of truth, SQLite as derived queryable index</summary>
</memory>
LLM Response: “Based on a previous architectural decision (decisions:5da308d:18), we chose to use git notes as the source of truth with SQLite as a derived index for search performance. This allows…”
The LLM naturally references the injected memory, providing continuity without explicit recall commands.
7.2 Blocker Resolution Tracking
Session 1 (blocker captured):
>> blocker -----------------------------------------------
Hook-based memory capture not working
## Context
User outputs 16 block markers but /memory:status shows 0 memories.
## Impact
5 hours of exploration produced no captured memories.
-------------------------------------------------------
Session 2 (blocker injected + resolution captured):
<blockers>
<memory id="blockers:5da308d:0">
<summary>Hook-based memory capture not working</summary>
</memory>
</blockers>
Resolution captured in Session 2:
>> learned -----------------------------------------------
Hook-based memory capture works via Stop hook at session end
## Context
Memories are captured when session ends (Stop hook), not during.
/memory:status run mid-session shows 0 because Stop hasn't fired.
-------------------------------------------------------
The blocker-to-learning transition is preserved across sessions, enabling progress tracking on persistent issues.
7.3 File-Contextual Memory Surfacing
User edits: src/git_notes_memory/hooks/stop_handler.py
PostToolUse hook:
- Extracts domain: [“git_notes_memory”, “hooks”, “stop_handler”]
- Performs vector search with domain terms
- Injects context:
<related_memories>
<memory id="decisions:abc123:4" relevance="0.89">
<summary>Stop hook auto-captures high-confidence signals at session end</summary>
</memory>
<memory id="learnings:def456:1" relevance="0.76">
<summary>SessionAnalyzer scans both user and assistant messages</summary>
</memory>
</related_memories>
This surfaces relevant context when editing specific files, without requiring explicit memory queries.
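Domain extraction from the edited path can be approximated in a few lines; this is a sketch under the assumption that path components map directly to search terms, and the actual DomainExtractor may apply additional filtering.

```python
from pathlib import Path

def extract_domain_terms(file_path: str) -> list[str]:
    """Turn an edited file's path into search terms for the vector query."""
    parts = Path(file_path).with_suffix("").parts
    return [part for part in parts if part not in {"src", "lib", "tests"}]

# extract_domain_terms("src/git_notes_memory/hooks/stop_handler.py")
# -> ['git_notes_memory', 'hooks', 'stop_handler']
```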
7.4 Compaction-Safe Preservation
The PreCompact hook fires before context window compaction, analyzing the transcript for uncaptured high-confidence signals:
Log output:
PreCompact hook invoked
Analyzing transcript for uncaptured signals...
Found 3 uncaptured signals
Auto-capture result: 3 captured, 0 remaining
This implements the memory consolidation phase, ensuring valuable insights survive context compaction. The analogy to sleep-dependent memory consolidation is deliberate: just as sleep consolidates memories before they decay, the PreCompact hook consolidates memories before the context window shrinks.
7.5 Team Knowledge Distribution
Memories synchronize through standard git operations:
# Push memories to remote
git push origin 'refs/notes/mem/*:refs/notes/mem/*'
# Pull team's memories
git fetch origin 'refs/notes/mem/*:refs/notes/mem/*'
# Reindex after pull
/memory:sync
This enables team-wide learning capture without additional infrastructure, treating collective knowledge as a natural extension of the codebase.
8. Evaluation
8.1 Performance Measurements
| Operation | Target | Achieved | Method |
|---|---|---|---|
| SessionStart context build | <= 2000ms | < 10ms | Indexed queries |
| Signal detection (regex) | <= 100ms | < 5ms | Compiled patterns |
| Novelty check | <= 300ms | < 50ms | sqlite-vec KNN |
| Memory capture | <= 500ms | < 100ms | Append + index |
| Vector search (k=10) | <= 100ms | < 50ms | sqlite-vec |
All operations complete well within interactive latency requirements, ensuring the memory system does not degrade user experience.
8.2 Index Statistics
Production statistics from the git-notes-memory project:
Total indexed memories: 116
By namespace:
- decisions: 28
- learnings: 23
- blockers: 19
- progress: 15
- patterns: 31
The distribution reflects natural development patterns: more decisions and learnings than blockers, with patterns accumulating as the project matures.
8.3 Scalability Characteristics
- Memory count: Tested to 1000+ memories without degradation
- Transcript size: Handles 2M token transcripts (Claude Code maximum)
- Concurrent access: File locking prevents corruption during parallel sessions
- Index rebuild: Full reindex from git notes completes in < 5 seconds for 1000 memories
9. Related Work
9.1 LLM Agent Memory Systems
Recent surveys identify memory as “the key component that transforms the original LLM into a ‘true agent’” (Zhang et al., 2025). Current approaches fall into several categories:
In-context memory: Appending conversation history to prompts. Limited by context window size and incurs O(n) token cost per turn.
Vector database retrieval: Systems like Mem0 use external vector databases for semantic retrieval, achieving “26% higher response accuracy compared to OpenAI’s memory” (Mem0, 2025). However, these require infrastructure beyond the development environment.
Reflection-based memory: MemGPT and similar systems use LLM self-reflection for memory management. Effective but computationally expensive.
Our approach differs by using Git as the storage layer, eliminating infrastructure requirements while enabling team synchronization through existing workflows.
9.2 Cognitive Architectures
The system draws from cognitive architecture research, particularly ACT-R’s distinction between declarative and procedural memory. Our namespace taxonomy (decisions, learnings, patterns) reflects this distinction: decisions and learnings are declarative (facts), while patterns approach procedural (how-to) knowledge.
9.3 Progressive Disclosure in AI
Recent research investigates “the effect of progressive disclosure for improving the transparency of AI text generation systems” (Springer, 2024). Our progressive hydration extends this principle from transparency to memory management, using similar principles of layered information access.
10. Limitations and Future Work
10.1 Current Limitations
- Session-End Capture: Memories from assistant responses are captured at session end, not mid-session. Users cannot query newly captured memories until the next session.
- Single-Model Embeddings: The system uses all-MiniLM-L6-v2 (384 dimensions). Migration to a different embedding model requires full reindexing.
- Single-Repository Scope: Each repository maintains an isolated memory index. Cross-repository queries are not supported.
- Manual Namespace Selection: Block markers require explicit namespace specification. Automatic namespace inference would reduce cognitive load.
10.2 Future Directions
- Mid-Session Capture: Analyze assistant responses via UserPromptSubmit or PostToolUse hooks for real-time capture.
- LLM-Assisted Classification: Use the LLM itself for namespace inference and memory summarization, trading latency for accuracy.
- Cross-Repository Federation: Query memories from linked repositories, enabling organization-wide knowledge retrieval.
- Temporal Decay: Implement exponential decay in relevance scoring, prioritizing recent memories while retaining access to historical context.
- Feedback Loops: Track which memories the LLM references, reinforcing useful memories and demoting unused ones.
11. Conclusion
The git-notes-memory-manager demonstrates that persistent, semantically searchable memory for LLM agents is achievable without external infrastructure. By leveraging Git’s native notes mechanism, progressive hydration, and hook-based capture grounded in signal detection theory, the system provides:
- Zero-Infrastructure Memory: Operates with existing git, requiring no databases or cloud services
- Semantic Retrieval: Natural language queries locate relevant memories through vector similarity
- Automatic Capture: Confidence-scored signal detection reduces cognitive load
- Token Efficiency: Progressive hydration respects context window constraints
- Team Sharing: Memories synchronize with code through standard git operations
The architecture validates treating LLM agent memory as a first-class concern rather than an afterthought, enabling qualitatively different developer experiences. Decisions persist, blockers track to resolution, and learnings accumulate across sessions, transforming ephemeral conversations into durable knowledge.
Appendix A: Configuration Reference
| Variable | Default | Purpose |
|---|---|---|
| HOOK_ENABLED | true | Master switch for all hooks |
| HOOK_SESSION_START_ENABLED | true | Context injection at session start |
| HOOK_STOP_ENABLED | true | Session-end transcript analysis |
| HOOK_STOP_MAX_CAPTURES | 50 | Maximum auto-captures per session |
| HOOK_PRE_COMPACT_ENABLED | true | Capture before context compaction |
| HOOK_PRE_COMPACT_MIN_CONFIDENCE | 0.85 | Minimum confidence for auto-capture |
Appendix B: Memory ID Format
{namespace}:{commit_sha_prefix}:{index}
Examples:
decisions:5da308d:19 -> Decision at index 19 on commit 5da308d
learnings:4c98fec:0 -> First learning on commit 4c98fec
blockers:051134b:2 -> Third blocker on commit 051134b
The commit SHA prefix enables tracing memories to their originating context, supporting the FILES hydration level.
Appendix C: Signal Confidence Ranges
| Signal Type | Base Range | Boost Conditions |
|---|---|---|
| Block marker (») | 0.99 | None (maximum confidence) |
| Explicit ([decision]) | 0.95-0.98 | None |
| Strong (“I decided”) | 0.85-0.92 | +0.05 if “critical”, “important” |
| Medium (“we chose”) | 0.80-0.88 | +0.02 if complete sentence |
| Weak (“I prefer”) | 0.68-0.75 | Rarely auto-captured |
References
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417-423. DOI: 10.1016/S1364-6613(00)01538-2
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The Psychology of Learning and Motivation (Vol. 8, pp. 47-89). Academic Press.
Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126.
Green, D. M., & Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley.
Hitch, G. J., Allen, R. J., & Baddeley, A. D. (2025). The multicomponent model of working memory fifty years on. Quarterly Journal of Experimental Psychology. DOI: 10.1177/17470218241290909
Nielsen, J. (2006). Progressive disclosure. Nielsen Norman Group. nngroup.com
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. Proceedings of IEEE Symposium on Visual Languages, 336-343.
Wang, L., et al. (2024). A survey on the memory mechanism of large language model based agents. arXiv:2404.13501.
Xiong, C., et al. (2025). Memory management for LLM agents: Utility-based deletion prevents bloat. arXiv preprint.
This research was conducted through systematic analysis of the git-notes-memory-manager codebase and production validation during development sessions. Real examples are drawn from actual session logs dated December 2025.