Large Result Offloading

When MIF operations return large result sets, the full payload consumes significant context window tokens in the AI assistant conversation. Large Result Offloading (LRO) specifies a protocol in which results exceeding a token threshold are written to a temporary JSONL file. The tool then returns a compact prompt containing the file path, the line schema, and ready-to-use jq recipes, so the assistant can selectively extract only what it needs.

Motivation

Context windows are finite and expensive. A recall_memories call returning 200 memories at Full detail can easily exceed 40,000 tokens, consuming half or more of the available context for raw data the assistant will typically filter or summarize. LRO preserves full result fidelity while returning a compact inline response that guides the assistant to selectively read only the data it needs.

LRO applies to any operation that can return a large result set, including recall, list, search, and inject.

Threshold Detection

LRO uses a single global token threshold to decide whether results are returned inline or offloaded to a file.

/// Global token threshold for LRO activation.
/// Results estimated to exceed this threshold are offloaded to JSONL.
/// Default: 6400 tokens. Configurable via [prompt.offload] config section.
pub const DEFAULT_OFFLOAD_THRESHOLD_TOKENS: usize = 6400;

Token estimation MUST use the same heuristic defined in Prompt Integration (Context Window Budgeting): tokens ≈ characters / 4 for Latin-script content, with model-specific tokenizers RECOMMENDED for CJK or mixed-script content.

The threshold check occurs after the operation completes but before formatting the response. Implementations MUST:

  1. Execute the operation (recall, list, search, inject) normally.
  2. Estimate the total token count of the result set at the requested detail level.
  3. If estimated_tokens > threshold_tokens, offload to JSONL and return an OffloadResponse.
  4. If estimated_tokens <= threshold_tokens, return results inline as usual.

Normative: The threshold is evaluated against the total result set, not individual memories. A single large memory below the threshold is returned inline; many small memories that collectively exceed the threshold are offloaded.

JSONL File Format

Offloaded results are written as line-delimited JSON (JSONL). Each file consists of a header line followed by one MIF memory object per line.

Header Line (Line 1)

The first line is a metadata header conforming to OffloadHeader:

/// Metadata header written as the first line of an offloaded JSONL file.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OffloadHeader {
    /// Marker identifying this as an LRO header. Always `"lro_header"`.
    #[serde(rename = "type")]
    pub header_type: String,

    /// The operation that produced these results.
    pub operation: String,

    /// The query string (if applicable).
    pub query: Option<String>,

    /// Total number of memory lines following the header.
    pub count: usize,

    /// MIF schema version of the memory objects.
    pub schema_version: String,

    /// ISO 8601 timestamp of when the file was written.
    pub timestamp: String,

    /// Estimated total tokens of the result set.
    pub estimated_tokens: usize,

    /// Detail level used for serialization.
    pub detail: String,
}

Memory Lines (Lines 2+)

Each subsequent line is a complete MIF Memory object serialized as JSON, including all fields present at the requested detail level.
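As an illustration, a two-memory recall result might offload to a file like the one below. The memory fields shown (id, title, namespace, memory_type, content, entities, provenance.confidence) are those assumed by the standard jq recipes; the concrete values, elided IDs, and the schema_version string are illustrative, not normative:

```jsonl
{"type":"lro_header","operation":"recall","query":"rust async patterns","count":2,"schema_version":"1.0","timestamp":"2025-01-15T10:30:00Z","estimated_tokens":9200,"detail":"full"}
{"id":"01H...","title":"Tokio task spawning","namespace":"_semantic/rust","memory_type":"semantic","content":"...","entities":["tokio"],"provenance":{"confidence":0.91}}
{"id":"01H...","title":"Async trait workarounds","namespace":"_semantic/rust","memory_type":"semantic","content":"...","entities":[],"provenance":{"confidence":0.84}}
```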

File Naming

Files MUST be written to a configurable output directory using the following naming convention:

{output_dir}/atlatl-{operation}-{ulid}.jsonl

Where `{operation}` is the operation name (e.g. recall, search) and `{ulid}` is a freshly generated ULID, giving filenames that are both unique and lexicographically sortable by creation time. Each written file is tracked as an OffloadedResult:

/// Represents an offloaded result file.
#[derive(Debug, Clone)]
pub struct OffloadedResult {
    /// Absolute path to the JSONL file.
    pub path: PathBuf,

    /// Header metadata.
    pub header: OffloadHeader,

    /// Time-to-live for this file. After expiry, custodial cleanup MAY delete it.
    pub ttl: Duration,

    /// When this file was created.
    pub created_at: DateTime<Utc>,
}

Inline Response Format

When LRO activates, the tool returns an OffloadResponse instead of the full result set. This response is designed as a self-contained prompt that gives the AI assistant everything it needs to work with the offloaded data.

/// Compact response returned when results are offloaded to JSONL.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OffloadResponse {
    /// Indicates this is an offloaded result.
    pub offloaded: bool,

    /// Summary of the result set.
    pub summary: OffloadSummary,

    /// Absolute path to the JSONL file.
    pub file_path: String,

    /// JSON Schema describing each memory line in the JSONL file.
    pub line_schema: serde_json::Value,

    /// Ready-to-use jq recipes for common extraction patterns.
    pub jq_recipes: Vec<JqRecipe>,

    /// Usage guidance for the AI assistant.
    pub guidance: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OffloadSummary {
    /// Total number of memories in the file.
    pub count: usize,

    /// Estimated total tokens saved by offloading.
    pub estimated_tokens: usize,

    /// The operation that was performed.
    pub operation: String,

    /// Top namespaces represented (up to 5).
    pub top_namespaces: Vec<String>,

    /// Score range (min, max) if applicable.
    pub score_range: Option<(f64, f64)>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct JqRecipe {
    /// Human-readable description of what this recipe does.
    pub description: String,

    /// The jq command to execute.
    pub command: String,
}

Standard jq Recipe Library

Implementations MUST include the following recipes in every OffloadResponse:

  1. List all titles with scores:
     tail -n +2 {file} | jq -r '[.title, .provenance.confidence] | @tsv'
  2. Filter by namespace prefix:
     tail -n +2 {file} | jq 'select(.namespace | startswith("_semantic"))'
  3. Search titles by keyword:
     tail -n +2 {file} | jq 'select(.title | test("keyword"; "i"))'
  4. Sort by confidence (descending):
     tail -n +2 {file} | jq -s 'sort_by(-.provenance.confidence)'
  5. Extract IDs and titles only:
     tail -n +2 {file} | jq '{id, title, namespace}'
  6. Filter by memory type:
     tail -n +2 {file} | jq 'select(.memory_type == "semantic")'
  7. Get memories with entities:
     tail -n +2 {file} | jq 'select(.entities | length > 0)'
  8. Count by namespace:
     tail -n +2 {file} | jq -s 'group_by(.namespace) | map({namespace: .[0].namespace, count: length}) | sort_by(-.count)'
  9. Get top N by score:
     tail -n +2 {file} | jq -s 'sort_by(-.provenance.confidence) | .[:10]'
  10. Full-text search in content:
      tail -n +2 {file} | jq 'select(.content | test("pattern"; "i"))'

Note: All recipes use tail -n +2 to skip the header line. The {file} placeholder MUST be replaced with the actual file path from OffloadResponse.file_path.

Guidance Prompt

The guidance field MUST contain a brief instruction block for the AI assistant. Implementations SHOULD use the following template:

Results offloaded to JSONL ({count} memories, ~{tokens} tokens saved).
File: {path}

Use the jq recipes above to extract specific data. Common patterns:
- Browse: recipe #1 (titles with scores)
- Filter: recipe #2 (by namespace) or #3 (by keyword)
- Analyze: recipe #8 (count by namespace)

Read the file directly only if you need the complete dataset.
The header line (line 1) contains metadata; memory objects start at line 2.

Decision Flow

flowchart TD
    A[Operation completes] --> B[Estimate total tokens]
    B --> C{tokens > threshold?}
    C -->|No| D[Return inline response]
    C -->|Yes| E[Write JSONL to temp file]
    E --> F[Build OffloadResponse]
    F --> G[Include summary + recipes]
    G --> H[Return OffloadResponse]

    style C fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#9f9,stroke:#333
    style H fill:#9f9,stroke:#333

Cleanup and Lifecycle

Offloaded JSONL files are ephemeral and MUST be cleaned up after their TTL expires.

TTL

Each offloaded file carries a time-to-live measured from its created_at timestamp: 3600 seconds (1 hour) by default, configurable via prompt.offload.ttl_seconds. Once the TTL elapses, the file becomes eligible for custodial deletion.

Custodial Integration

Implementations SHOULD register an offload_cleanup custodial task that:

  1. Scans the output_dir for files matching atlatl-*.jsonl
  2. Deletes files whose created_at + ttl has elapsed
  3. Emits OffloadFileExpired events for observability

| Task Name | Default Schedule | Description |
|---|---|---|
| offload_cleanup | Every hour | Delete expired LRO JSONL files |

Error Handling

If the temporary file write fails (disk full, permission denied, etc.), implementations MUST fall back to returning an inline truncated result:

  1. Truncate the result set to fit within threshold_tokens.
  2. Include a warning in the response indicating that LRO failed and results are truncated.
  3. Emit an OffloadWriteFailed event with the error details.

Implementations MUST NOT fail the entire operation due to an LRO write failure; offloading is an optimization, and the operation itself succeeded.

Configuration

LRO configuration lives under the [prompt.offload] section:

[prompt.offload]
enabled = true                          # Enable/disable LRO globally
threshold_tokens = 6400                 # Token threshold for offloading
ttl_seconds = 3600                      # File TTL (1 hour default)
output_dir = ""                         # Empty = system temp dir

| Key | Type | Default | Description |
|---|---|---|---|
| prompt.offload.enabled | bool | true | Enable or disable LRO |
| prompt.offload.threshold_tokens | usize | 6400 | Token threshold for activation |
| prompt.offload.ttl_seconds | u64 | 3600 | Seconds before file cleanup |
| prompt.offload.output_dir | String | "" (system temp) | Directory for JSONL files |

Environment variable mapping follows the standard convention:

| Config Key | Environment Variable |
|---|---|
| prompt.offload.enabled | ATLATL_PROMPT__OFFLOAD__ENABLED |
| prompt.offload.threshold_tokens | ATLATL_PROMPT__OFFLOAD__THRESHOLD_TOKENS |
| prompt.offload.ttl_seconds | ATLATL_PROMPT__OFFLOAD__TTL_SECONDS |
| prompt.offload.output_dir | ATLATL_PROMPT__OFFLOAD__OUTPUT_DIR |

Conformance Requirements

| Conformance Level | Requirement |
|---|---|
| Level 1 | MAY implement LRO. If implemented, MUST support threshold detection and JSONL output. |
| Level 2 | SHOULD implement LRO. If implemented, MUST include the standard jq recipe library and custodial cleanup. |
| Level 3 | MUST implement LRO with threshold detection, JSONL output, full jq recipe library, custodial cleanup, and error fallback. |