Large Result Offloading

When MIF operations return large result sets, the full payload consumes significant context window tokens in the AI assistant conversation. Large Result Offloading (LRO) specifies a protocol in which results exceeding a token threshold are written to a temporary JSONL file. The tool then returns a compact prompt containing the file path, the line schema, and ready-to-use jq recipes, so the assistant can selectively extract only what it needs.

Motivation

Context windows are finite and expensive. A recall_memories call returning 200 memories at Full detail can easily exceed 40,000 tokens, consuming half or more of the available context for raw data the assistant will typically filter or summarize. LRO preserves full result fidelity while returning a compact inline response that guides the assistant to selectively read only the data it needs.

LRO applies to any operation that can return a large result set, including recall, list, search, and inject.

Threshold Detection

LRO uses a single global token threshold to decide whether results are returned inline or offloaded to a file.

/// Global token threshold for LRO activation.
/// Results estimated to exceed this threshold are offloaded to JSONL.
/// Default: 6400 tokens. Configurable via [prompt.offload] config section.
pub const DEFAULT_OFFLOAD_THRESHOLD_TOKENS: usize = 6400;

Token estimation MUST use the same heuristic defined in Prompt Integration (Context Window Budgeting): tokens ≈ characters / 4 for Latin-script content, with model-specific tokenizers RECOMMENDED for CJK or mixed-script content.

The threshold check occurs after the operation completes but before formatting the response. Implementations MUST:

  1. Execute the operation (recall, list, search, inject) normally.
  2. Estimate the total token count of the result set at the requested detail level.
  3. If estimated_tokens > threshold_tokens, offload to JSONL and return an OffloadResponse.
  4. If estimated_tokens <= threshold_tokens, return results inline as usual.

Normative: The threshold is evaluated against the total result set, not individual memories. A single large memory below the threshold is returned inline; many small memories that collectively exceed the threshold are offloaded.

JSONL File Format

Offloaded results are written as line-delimited JSON (JSONL). Each file consists of a header line followed by one MIF memory object per line.

Header Line (Line 1)

The first line is a metadata header conforming to OffloadHeader:

/// Metadata header written as the first line of an offloaded JSONL file.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OffloadHeader {
    /// Marker identifying this as an LRO header. Always `"lro_header"`.
    #[serde(rename = "type")]
    pub header_type: String,

    /// The operation that produced these results.
    pub operation: String,

    /// The query string (if applicable).
    pub query: Option<String>,

    /// Total number of memory lines following the header.
    pub count: usize,

    /// MIF schema version of the memory objects.
    pub schema_version: String,

    /// ISO 8601 timestamp of when the file was written.
    pub timestamp: String,

    /// Estimated total tokens of the result set.
    pub estimated_tokens: usize,

    /// Detail level used for serialization.
    pub detail: String,
}

Memory Lines (Lines 2+)

Each subsequent line is a complete MIF Memory object serialized as JSON, including all fields present at the requested detail level.
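As an illustration, a two-memory recall result might offload to a file like the one below. The memory fields shown (id, title, namespace, memory_type, content, entities, provenance.confidence) are those assumed by the standard jq recipes; the concrete values, elided IDs, and the schema_version string are illustrative, not normative:

```jsonl
{"type":"lro_header","operation":"recall","query":"rust async patterns","count":2,"schema_version":"1.0","timestamp":"2025-01-15T10:30:00Z","estimated_tokens":9200,"detail":"full"}
{"id":"01H...","title":"Tokio task spawning","namespace":"_semantic/rust","memory_type":"semantic","content":"...","entities":["tokio"],"provenance":{"confidence":0.91}}
{"id":"01H...","title":"Async trait workarounds","namespace":"_semantic/rust","memory_type":"semantic","content":"...","entities":[],"provenance":{"confidence":0.84}}
```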

File Naming

Files MUST be written to a configurable output directory using the following naming convention:

{output_dir}/atlatl-{operation}-{ulid}.jsonl

Where `{operation}` is the operation name (e.g. recall, search) and `{ulid}` is a freshly generated ULID, giving filenames that are both unique and lexicographically sortable by creation time. Each written file is tracked as an OffloadedResult:

/// Represents an offloaded result file.
#[derive(Debug, Clone)]
pub struct OffloadedResult {
    /// Absolute path to the JSONL file.
    pub path: PathBuf,

    /// Header metadata.
    pub header: OffloadHeader,

    /// Time-to-live for this file. After expiry, custodial cleanup MAY delete it.
    pub ttl: Duration,

    /// When this file was created.
    pub created_at: DateTime<Utc>,
}

Inline Response Format

When LRO activates, the tool returns an OffloadResponse instead of the full result set. This response is designed as a self-contained prompt that gives the AI assistant everything it needs to work with the offloaded data.

/// Compact response returned when results are offloaded to JSONL.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OffloadResponse {
    /// Indicates this is an offloaded result.
    pub offloaded: bool,

    /// Summary of the result set.
    pub summary: OffloadSummary,

    /// Absolute path to the JSONL file.
    pub file_path: String,

    /// JSON Schema describing each memory line in the JSONL file.
    pub line_schema: serde_json::Value,

    /// Ready-to-use jq recipes for common extraction patterns.
    pub jq_recipes: Vec<JqRecipe>,

    /// Usage guidance for the AI assistant.
    pub guidance: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OffloadSummary {
    /// Total number of memories in the file.
    pub count: usize,

    /// Estimated total tokens saved by offloading.
    pub estimated_tokens: usize,

    /// The operation that was performed.
    pub operation: String,

    /// Top namespaces represented (up to 5).
    pub top_namespaces: Vec<String>,

    /// Score range (min, max) if applicable.
    pub score_range: Option<(f64, f64)>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct JqRecipe {
    /// Human-readable description of what this recipe does.
    pub description: String,

    /// The jq command to execute.
    pub command: String,
}

Standard jq Recipe Library

Implementations MUST include the following recipes in every OffloadResponse:

  1. List all titles with scores:
     tail -n +2 {file} | jq -r '[.title, .provenance.confidence] | @tsv'
  2. Filter by namespace prefix:
     tail -n +2 {file} | jq 'select(.namespace | startswith("_semantic"))'
  3. Search titles by keyword:
     tail -n +2 {file} | jq 'select(.title | test("keyword"; "i"))'
  4. Sort by confidence (descending):
     tail -n +2 {file} | jq -s 'sort_by(-.provenance.confidence)'
  5. Extract IDs and titles only:
     tail -n +2 {file} | jq '{id, title, namespace}'
  6. Filter by memory type:
     tail -n +2 {file} | jq 'select(.memory_type == "semantic")'
  7. Get memories with entities:
     tail -n +2 {file} | jq 'select(.entities | length > 0)'
  8. Count by namespace:
     tail -n +2 {file} | jq -s 'group_by(.namespace) | map({namespace: .[0].namespace, count: length}) | sort_by(-.count)'
  9. Get top N by score:
     tail -n +2 {file} | jq -s 'sort_by(-.provenance.confidence) | .[:10]'
  10. Full-text search in content:
      tail -n +2 {file} | jq 'select(.content | test("pattern"; "i"))'

Note: All recipes use tail -n +2 to skip the header line. The {file} placeholder MUST be replaced with the actual file path from OffloadResponse.file_path.

Guidance Prompt

The guidance field MUST contain a brief instruction block for the AI assistant. Implementations SHOULD use the following template:

Results offloaded to JSONL ({count} memories, ~{tokens} tokens saved).
File: {path}

Use the jq recipes above to extract specific data. Common patterns:
- Browse: recipe #1 (titles with scores)
- Filter: recipe #2 (by namespace) or #3 (by keyword)
- Analyze: recipe #8 (count by namespace)

Read the file directly only if you need the complete dataset.
The header line (line 1) contains metadata; memory objects start at line 2.

Decision Flow

flowchart TD
    A[Operation completes] --> B[Estimate total tokens]
    B --> C{tokens > threshold?}
    C -->|No| D[Return inline response]
    C -->|Yes| E[Write JSONL to temp file]
    E --> F[Build OffloadResponse]
    F --> G[Include summary + recipes]
    G --> H[Return OffloadResponse]

    style C fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#9f9,stroke:#333
    style H fill:#9f9,stroke:#333

Cleanup and Lifecycle

Offloaded JSONL files are ephemeral and MUST be cleaned up after their TTL expires.

TTL

Each offloaded file carries a time-to-live measured from its created_at timestamp: 3600 seconds (1 hour) by default, configurable via prompt.offload.ttl_seconds. Once the TTL elapses, the file becomes eligible for custodial deletion.

Custodial Integration

Implementations SHOULD register an offload_cleanup custodial task that:

  1. Scans the output_dir for files matching atlatl-*.jsonl
  2. Deletes files whose created_at + ttl has elapsed
  3. Emits OffloadFileExpired events for observability

| Task Name | Default Schedule | Description |
|---|---|---|
| offload_cleanup | Every hour | Delete expired LRO JSONL files |

Error Handling

If the temporary file write fails (disk full, permission denied, etc.), implementations MUST fall back to returning an inline truncated result:

  1. Truncate the result set to fit within threshold_tokens.
  2. Include a warning in the response indicating that LRO failed and results are truncated.
  3. Emit an OffloadWriteFailed event with the error details.

Implementations MUST NOT fail the entire operation due to an LRO write failure; offloading is an optimization, and the operation itself succeeded.

Configuration

LRO configuration lives under the [prompt.offload] section:

[prompt.offload]
enabled = true                          # Enable/disable LRO globally
threshold_tokens = 6400                 # Token threshold for offloading
ttl_seconds = 3600                      # File TTL (1 hour default)
output_dir = ""                         # Empty = system temp dir

| Key | Type | Default | Description |
|---|---|---|---|
| prompt.offload.enabled | bool | true | Enable or disable LRO |
| prompt.offload.threshold_tokens | usize | 6400 | Token threshold for activation |
| prompt.offload.ttl_seconds | u64 | 3600 | Seconds before file cleanup |
| prompt.offload.output_dir | String | "" (system temp) | Directory for JSONL files |

Environment variable mapping follows the standard convention:

| Config Key | Environment Variable |
|---|---|
| prompt.offload.enabled | ATLATL_PROMPT__OFFLOAD__ENABLED |
| prompt.offload.threshold_tokens | ATLATL_PROMPT__OFFLOAD__THRESHOLD_TOKENS |
| prompt.offload.ttl_seconds | ATLATL_PROMPT__OFFLOAD__TTL_SECONDS |
| prompt.offload.output_dir | ATLATL_PROMPT__OFFLOAD__OUTPUT_DIR |

Conformance Requirements

| Conformance Level | Requirement |
|---|---|
| Level 1 | MAY implement LRO. If implemented, MUST support threshold detection and JSONL output. |
| Level 2 | SHOULD implement LRO. If implemented, MUST include the standard jq recipe library and custodial cleanup. |
| Level 3 | MUST implement LRO with threshold detection, JSONL output, full jq recipe library, custodial cleanup, and error fallback. |