# Large Result Offloading
When MIF operations return large result sets, the full payload consumes significant context window tokens in the AI assistant conversation. Large Result Offloading (LRO) specifies a protocol where results exceeding a token threshold are written to a temporary JSONL file, and the tool returns a compact prompt with the file path, schema, and ready-to-use jq recipes, enabling the assistant to selectively extract only what it needs.
## Motivation
Context windows are finite and expensive. A `recall_memories` call returning 200 memories at Full detail can easily exceed 40,000 tokens, consuming half or more of the available context for raw data the assistant will typically filter or summarize. LRO preserves full result fidelity while returning a compact inline response that guides the assistant to selectively read only the data it needs.
LRO applies to the following operations:

- `recall_memories`
- `list_memories`
- `inject_context`
- Search queries via `SearchService`
## Threshold Detection
LRO uses a single global token threshold to decide whether results are returned inline or offloaded to a file.
```rust
/// Global token threshold for LRO activation.
/// Results estimated to exceed this threshold are offloaded to JSONL.
/// Default: 6400 tokens. Configurable via [prompt.offload] config section.
pub const DEFAULT_OFFLOAD_THRESHOLD_TOKENS: usize = 6400;
```
Token estimation MUST use the same heuristic defined in Prompt Integration (Context Window Budgeting): tokens ≈ characters / 4 for Latin-script content, with model-specific tokenizers RECOMMENDED for CJK or mixed-script content.
The threshold check occurs after the operation completes but before formatting the response. Implementations MUST:
- Execute the operation (recall, list, search, inject) normally.
- Estimate the total token count of the result set at the requested detail level.
- If `estimated_tokens > threshold_tokens`, offload to JSONL and return an `OffloadResponse`.
- If `estimated_tokens <= threshold_tokens`, return results inline as usual.
Normative: The threshold is evaluated against the total result set, not individual memories. A single large memory below the threshold is returned inline; many small memories exceeding the threshold collectively are offloaded.
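The estimation heuristic and the total-vs-individual rule above can be sketched as follows. This is an illustrative, std-only sketch; the helper names (`estimate_tokens`, `should_offload`) are not part of the spec:

```rust
const DEFAULT_OFFLOAD_THRESHOLD_TOKENS: usize = 6400;

/// Estimate tokens for Latin-script content: tokens ≈ characters / 4.
/// (CJK or mixed-script content SHOULD use a model-specific tokenizer.)
fn estimate_tokens(serialized: &str) -> usize {
    serialized.chars().count() / 4
}

/// Decide whether the *total* result set should be offloaded.
/// The threshold is evaluated against the sum, never a single memory.
fn should_offload(serialized_results: &[String], threshold: usize) -> bool {
    let total: usize = serialized_results.iter().map(|s| estimate_tokens(s)).sum();
    total > threshold
}

fn main() {
    // One large memory below the threshold stays inline (~5000 tokens)...
    let one_big = vec!["x".repeat(20_000)];
    assert!(!should_offload(&one_big, DEFAULT_OFFLOAD_THRESHOLD_TOKENS));

    // ...but many small memories exceeding it collectively are offloaded (~10000 tokens).
    let many_small: Vec<String> = (0..400).map(|_| "y".repeat(100)).collect();
    assert!(should_offload(&many_small, DEFAULT_OFFLOAD_THRESHOLD_TOKENS));
}
```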
## JSONL File Format
Offloaded results are written as line-delimited JSON (JSONL). Each file consists of a header line followed by one MIF memory object per line.
### Header Line (Line 1)
The first line is a metadata header conforming to `OffloadHeader`:
```rust
/// Metadata header written as the first line of an offloaded JSONL file.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OffloadHeader {
    /// Marker identifying this as an LRO header. Always `"lro_header"`.
    #[serde(rename = "type")]
    pub header_type: String,
    /// The operation that produced these results.
    pub operation: String,
    /// The query string (if applicable).
    pub query: Option<String>,
    /// Total number of memory lines following the header.
    pub count: usize,
    /// MIF schema version of the memory objects.
    pub schema_version: String,
    /// ISO 8601 timestamp of when the file was written.
    pub timestamp: String,
    /// Estimated total tokens of the result set.
    pub estimated_tokens: usize,
    /// Detail level used for serialization.
    pub detail: String,
}
```
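For illustration, a header line could be assembled as in the hand-rolled sketch below. The `schema_version`, `timestamp`, and `detail` values are placeholders, and a real implementation would serialize `OffloadHeader` with serde_json rather than build the string by hand:

```rust
/// Build a sample LRO header line (sketch; placeholder values).
fn header_line(operation: &str, count: usize, estimated_tokens: usize) -> String {
    let timestamp = "2025-01-01T00:00:00Z"; // placeholder; real code uses the current time
    let detail = "full";                    // placeholder detail level
    format!(
        "{{\"type\":\"lro_header\",\"operation\":\"{operation}\",\"query\":null,\
\"count\":{count},\"schema_version\":\"1.0\",\"timestamp\":\"{timestamp}\",\
\"estimated_tokens\":{estimated_tokens},\"detail\":\"{detail}\"}}"
    )
}

fn main() {
    let line = header_line("recall", 200, 41_000);
    assert!(line.contains("\"type\":\"lro_header\""));
    assert!(line.contains("\"count\":200"));
    println!("{line}");
}
```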
### Memory Lines (Lines 2+)

Each subsequent line is a complete MIF `Memory` object serialized as JSON, including all fields present at the requested detail level:

- Core fields: `id`, `memory_type`, `content`, `created`, `modified`, `namespace`, `title`, `tags`, `status`
- Enrichment fields: `entities`, `relationships`, `wiki_links`, `embedding`, `summary`
- Provenance: `provenance` (confidence, trust_level, source_type, agent)
- Temporal: `temporal` (decay config and state, TTL, valid_from/until)
- Extensions: `extensions`, `blocks`, `citations`
### File Naming
Files MUST be written to a configurable output directory using the following naming convention:

```
{output_dir}/atlatl-{operation}-{ulid}.jsonl
```

Where:

- `{output_dir}` defaults to the system temporary directory (e.g., `/tmp`)
- `{operation}` is the operation name (`recall`, `list`, `search`, `inject`)
- `{ulid}` is a ULID providing both uniqueness and temporal ordering
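The naming convention can be sketched as a simple path builder. The ULID in the example is a hard-coded sample; a real implementation would generate one (for instance with the `ulid` crate, an assumption, not a spec requirement):

```rust
use std::path::PathBuf;

/// Build an offload file path per the {output_dir}/atlatl-{operation}-{ulid}.jsonl convention.
fn offload_path(output_dir: &str, operation: &str, ulid: &str) -> PathBuf {
    PathBuf::from(format!("{output_dir}/atlatl-{operation}-{ulid}.jsonl"))
}

fn main() {
    // Sample ULID for illustration only.
    let p = offload_path("/tmp", "recall", "01J8ZC2V5W7Q9XN4T6B3R8K1MD");
    assert_eq!(
        p.to_str().unwrap(),
        "/tmp/atlatl-recall-01J8ZC2V5W7Q9XN4T6B3R8K1MD.jsonl"
    );
    println!("{}", p.display());
}
```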
```rust
/// Represents an offloaded result file.
#[derive(Debug, Clone)]
pub struct OffloadedResult {
    /// Absolute path to the JSONL file.
    pub path: PathBuf,
    /// Header metadata.
    pub header: OffloadHeader,
    /// Time-to-live for this file. After expiry, custodial cleanup MAY delete it.
    pub ttl: Duration,
    /// When this file was created.
    pub created_at: DateTime<Utc>,
}
```
## Inline Response Format

When LRO activates, the tool returns an `OffloadResponse` instead of the full result set. This response is designed as a self-contained prompt that gives the AI assistant everything it needs to work with the offloaded data.
```rust
/// Compact response returned when results are offloaded to JSONL.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OffloadResponse {
    /// Indicates this is an offloaded result.
    pub offloaded: bool,
    /// Summary of the result set.
    pub summary: OffloadSummary,
    /// Absolute path to the JSONL file.
    pub file_path: String,
    /// JSON Schema describing each memory line in the JSONL file.
    pub line_schema: serde_json::Value,
    /// Ready-to-use jq recipes for common extraction patterns.
    pub jq_recipes: Vec<JqRecipe>,
    /// Usage guidance for the AI assistant.
    pub guidance: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OffloadSummary {
    /// Total number of memories in the file.
    pub count: usize,
    /// Estimated total tokens saved by offloading.
    pub estimated_tokens: usize,
    /// The operation that was performed.
    pub operation: String,
    /// Top namespaces represented (up to 5).
    pub top_namespaces: Vec<String>,
    /// Score range (min, max) if applicable.
    pub score_range: Option<(f64, f64)>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct JqRecipe {
    /// Human-readable description of what this recipe does.
    pub description: String,
    /// The jq command to execute.
    pub command: String,
}
```
## Standard jq Recipe Library

Implementations MUST include the following recipes in every `OffloadResponse`:
| # | Description | Command |
|---|---|---|
| 1 | List all titles with scores | `tail -n +2 {file} \| jq -r '[.title, .provenance.confidence] \| @tsv'` |
| 2 | Filter by namespace prefix | `tail -n +2 {file} \| jq 'select(.namespace \| startswith("_semantic"))'` |
| 3 | Search titles by keyword | `tail -n +2 {file} \| jq 'select(.title \| test("keyword"; "i"))'` |
| 4 | Sort by confidence (descending) | `tail -n +2 {file} \| jq -s 'sort_by(-.provenance.confidence)'` |
| 5 | Extract IDs, titles, and namespaces only | `tail -n +2 {file} \| jq '{id, title, namespace}'` |
| 6 | Filter by memory type | `tail -n +2 {file} \| jq 'select(.memory_type == "semantic")'` |
| 7 | Get memories with entities | `tail -n +2 {file} \| jq 'select(.entities \| length > 0)'` |
| 8 | Count by namespace | `tail -n +2 {file} \| jq -s 'group_by(.namespace) \| map({namespace: .[0].namespace, count: length}) \| sort_by(-.count)'` |
| 9 | Get top N by score | `tail -n +2 {file} \| jq -s 'sort_by(-.provenance.confidence) \| .[:10]'` |
| 10 | Full-text search in content | `tail -n +2 {file} \| jq 'select(.content \| test("pattern"; "i"))'` |
Note: All recipes use `tail -n +2` to skip the header line. The `{file}` placeholder MUST be replaced with the actual file path from `OffloadResponse.file_path`.
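The placeholder substitution is a plain string replacement, as this sketch shows (the helper name `render_recipe` is illustrative):

```rust
/// Substitute the {file} placeholder in a recipe command template.
fn render_recipe(command_template: &str, file_path: &str) -> String {
    command_template.replace("{file}", file_path)
}

fn main() {
    // Recipe #6 from the library, rendered against a sample file path.
    let tpl = "tail -n +2 {file} | jq 'select(.memory_type == \"semantic\")'";
    let cmd = render_recipe(tpl, "/tmp/atlatl-recall-01J8ZC.jsonl");
    assert_eq!(
        cmd,
        "tail -n +2 /tmp/atlatl-recall-01J8ZC.jsonl | jq 'select(.memory_type == \"semantic\")'"
    );
}
```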
## Guidance Prompt

The `guidance` field MUST contain a brief instruction block for the AI assistant. Implementations SHOULD use the following template:

```text
Results offloaded to JSONL ({count} memories, ~{tokens} tokens saved).
File: {path}

Use the jq recipes above to extract specific data. Common patterns:
- Browse: recipe #1 (titles with scores)
- Filter: recipe #2 (by namespace) or #3 (by keyword)
- Analyze: recipe #8 (count by namespace)

Read the file directly only if you need the complete dataset.
The header line (line 1) contains metadata; memory objects start at line 2.
```
## Decision Flow
```mermaid
flowchart TD
    A[Operation completes] --> B[Estimate total tokens]
    B --> C{tokens > threshold?}
    C -->|No| D[Return inline response]
    C -->|Yes| E[Write JSONL to temp file]
    E --> F[Build OffloadResponse]
    F --> G[Include summary + recipes]
    G --> H[Return OffloadResponse]
    style C fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#9f9,stroke:#333
    style H fill:#9f9,stroke:#333
```
## Cleanup and Lifecycle
Offloaded JSONL files are ephemeral and MUST be cleaned up after their TTL expires.
### TTL
- Default TTL: 3600 seconds (1 hour)
- Configurable via `prompt.offload.ttl_seconds`
- Implementations MUST record `created_at` for each offloaded file
### Custodial Integration
Implementations SHOULD register an `offload_cleanup` custodial task that:

- Scans the `output_dir` for files matching `atlatl-*.jsonl`
- Deletes files whose `created_at + ttl` has elapsed
- Emits `OffloadFileExpired` events for observability
| Task Name | Default Schedule | Description |
|---|---|---|
| `offload_cleanup` | Every hour | Delete expired LRO JSONL files |
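The expiry test at the heart of the cleanup task can be sketched with std timestamps (the `is_expired` helper is illustrative; real code would read `created_at` from the `OffloadedResult` record or file metadata):

```rust
use std::time::{Duration, SystemTime};

/// True if an offload file's TTL has elapsed and custodial cleanup may delete it.
fn is_expired(created_at: SystemTime, ttl: Duration, now: SystemTime) -> bool {
    match now.duration_since(created_at) {
        Ok(age) => age >= ttl,
        // `created_at` is in the future relative to `now`; treat as not expired.
        Err(_) => false,
    }
}

fn main() {
    let created = SystemTime::now();
    let ttl = Duration::from_secs(3600); // the 1-hour default

    // Half an hour in: the file survives.
    assert!(!is_expired(created, ttl, created + Duration::from_secs(1800)));
    // Just past the TTL: the file is eligible for deletion.
    assert!(is_expired(created, ttl, created + Duration::from_secs(3601)));
}
```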
## Error Handling
If the temporary file write fails (disk full, permission denied, etc.), implementations MUST fall back to returning an inline truncated result:
- Truncate the result set to fit within `threshold_tokens`.
- Include a warning in the response indicating that LRO failed and results are truncated.
- Emit an `OffloadWriteFailed` event with the error details.
Implementations MUST NOT fail the entire operation due to an LRO write failure: offloading is an optimization, and the underlying operation itself succeeded.
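The truncation fallback can be sketched as a greedy prefix scan under the token budget, reusing the chars/4 heuristic (the helper name `truncate_to_budget` is illustrative):

```rust
/// Fallback when the JSONL write fails: keep only as many results (in order)
/// as fit under the token threshold, using the chars/4 estimation heuristic.
fn truncate_to_budget(results: &[String], threshold_tokens: usize) -> Vec<String> {
    let mut kept = Vec::new();
    let mut used = 0usize;
    for r in results {
        let cost = r.chars().count() / 4;
        if used + cost > threshold_tokens {
            break; // the next result would blow the budget; stop here
        }
        used += cost;
        kept.push(r.clone());
    }
    kept
}

fn main() {
    // Ten results of ~1000 tokens each against the 6400-token default:
    // only the first six fit.
    let results: Vec<String> = (0..10).map(|_| "z".repeat(4000)).collect();
    let kept = truncate_to_budget(&results, 6400);
    assert_eq!(kept.len(), 6);
}
```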
## Configuration
LRO configuration lives under the `[prompt.offload]` section:
```toml
[prompt.offload]
enabled = true           # Enable/disable LRO globally
threshold_tokens = 6400  # Token threshold for offloading
ttl_seconds = 3600       # File TTL (1 hour default)
output_dir = ""          # Empty = system temp dir
```
| Key | Type | Default | Description |
|---|---|---|---|
| `prompt.offload.enabled` | bool | `true` | Enable or disable LRO |
| `prompt.offload.threshold_tokens` | usize | `6400` | Token threshold for activation |
| `prompt.offload.ttl_seconds` | u64 | `3600` | Seconds before file cleanup |
| `prompt.offload.output_dir` | String | `""` (system temp) | Directory for JSONL files |
Environment variable mapping follows the standard convention:
| Config Key | Environment Variable |
|---|---|
| `prompt.offload.enabled` | `ATLATL_PROMPT__OFFLOAD__ENABLED` |
| `prompt.offload.threshold_tokens` | `ATLATL_PROMPT__OFFLOAD__THRESHOLD_TOKENS` |
| `prompt.offload.ttl_seconds` | `ATLATL_PROMPT__OFFLOAD__TTL_SECONDS` |
| `prompt.offload.output_dir` | `ATLATL_PROMPT__OFFLOAD__OUTPUT_DIR` |
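The mapping in the table follows a mechanical rule: prefix with `ATLATL_`, uppercase, and replace each `.` with `__`. A sketch (helper name illustrative):

```rust
/// Map a config key to its environment variable per the standard convention:
/// prefix "ATLATL_", uppercase the key, and turn each "." into "__".
fn env_var_for(key: &str) -> String {
    format!("ATLATL_{}", key.to_uppercase().replace('.', "__"))
}

fn main() {
    assert_eq!(
        env_var_for("prompt.offload.ttl_seconds"),
        "ATLATL_PROMPT__OFFLOAD__TTL_SECONDS"
    );
    assert_eq!(
        env_var_for("prompt.offload.enabled"),
        "ATLATL_PROMPT__OFFLOAD__ENABLED"
    );
}
```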
## Conformance Requirements
| Conformance Level | Requirement |
|---|---|
| Level 1 | MAY implement LRO. If implemented, MUST support threshold detection and JSONL output. |
| Level 2 | SHOULD implement LRO. If implemented, MUST include the standard jq recipe library and custodial cleanup. |
| Level 3 | MUST implement LRO with threshold detection, JSONL output, full jq recipe library, custodial cleanup, and error fallback. |