
JSONL Log Analyzer

Automated schema-aware analysis of large JSONL log files. Discovers the field schema dynamically, generates tailored jq extraction recipes, and orchestrates the standard RLM fan-out/fan-in pipeline — making JSONL log analysis a single-prompt operation.

Related skills:


```mermaid
flowchart TD
    A[User prompt + JSONL file path] --> B[Phase 1: Schema Discovery]
    B -->|inline: head + jq| C[Schema extract + field classification]
    C --> D[Phase 2: Partition & Generate Prompts]
    D -->|chunk file by line count| E[Chunk 1]
    D --> F[Chunk 2]
    D --> G[Chunk N]
    E --> H["swarm:rlm-json-analyzer<br/>(Haiku)"]
    F --> I["swarm:rlm-json-analyzer<br/>(Haiku)"]
    G --> J["swarm:rlm-json-analyzer<br/>(Haiku)"]
    H --> K[Phase 3: Synthesis]
    I --> K
    J --> K
    K -->|"swarm:rlm-synthesizer<br/>(Sonnet)"| L[Final Report]
```

Three phases:

  1. Schema Discovery — Extract field paths, types, and presence counts using head/tail + jq. No raw log lines enter the orchestrator’s context.
  2. Partition & Generate Prompts — Split the file by line count, generate per-chunk analyst prompts with schema summary and tailored jq recipes.
  3. Synthesis — Aggregate analyst findings into a coherent report with log-specific guidance (temporal patterns, error clustering, service breakdown).

| Scenario | Use This Skill? |
| --- | --- |
| Large JSONL log file (>1500 lines) | Yes |
| Incident investigation in structured logs | Yes |
| Traffic/request analysis from JSONL event streams | Yes |
| JSONL files with unknown or evolving schema | Yes |
| Plain text logs (unstructured) | No — use basic RLM with swarm:rlm-chunk-analyzer |
| Small JSONL file (<1500 lines) | No — Claude handles it directly |
| JSON array (not line-delimited) | No — use Content-Aware JSON RLM |
| CSV/TSV data files | No — use Content-Aware CSV RLM |

Schema extraction uses shell commands only — minimal token cost, and lossless for the data it derives. No Read tool, no LLM parsing of raw lines.

First, count the total lines:

```sh
wc -l < input.jsonl
```

Store the result as total_lines.

Sample the first 20 lines and extract every unique field path with its observed types:

```sh
head -20 input.jsonl | jq -s '
[.[] | paths(type != "object" and type != "array") as $p |
{path: ($p | map(tostring) | join(".")), type: (getpath($p) | type)}
] | group_by(.path) |
map({path: .[0].path, types: ([.[].type] | unique), count: length}) |
sort_by(.path)'
```

Output is a compact JSON array of {path, types, count} objects — one per unique leaf field. This captures:

  • All field paths including nested (dot notation: metadata.source)
  • Type(s) per field (detects mixed types like ["string", "null"])
  • Presence count out of N (required vs optional: count=20 means required, count<20 means optional)
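When jq is unavailable, the same extraction can be sketched in Python. This is a minimal sketch, not part of the skill itself; one deliberate simplification is that `count` here is the number of sampled lines containing the path (which matches the required/optional reading above), whereas the jq version counts every occurrence, including repeats inside arrays:

```python
import json
from collections import defaultdict

# jq's type names for Python's scalar types (bool resolves before int
# because dict lookup uses the exact type).
TYPE_NAMES = {str: "string", bool: "boolean", int: "number",
              float: "number", type(None): "null"}

def leaf_paths(value, prefix=()):
    """Yield (dotted_path, jq_type) for every scalar leaf, mirroring
    jq's paths(type != "object" and type != "array")."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (str(key),))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from leaf_paths(child, prefix + (str(i),))
    else:
        yield ".".join(prefix), TYPE_NAMES[type(value)]

def extract_schema(lines):
    """Return a sorted list of {path, types, count} entries; count is the
    number of sampled lines in which the path appears at least once."""
    stats = defaultdict(lambda: {"types": set(), "lines": set()})
    for n, line in enumerate(lines):
        for path, typ in leaf_paths(json.loads(line)):
            stats[path]["types"].add(typ)
            stats[path]["lines"].add(n)
    return [{"path": p, "types": sorted(s["types"]), "count": len(s["lines"])}
            for p, s in sorted(stats.items())]
```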

If the file is large (>1000 lines), also sample the tail to catch schema evolution:

```sh
tail -20 input.jsonl | jq -s '[same extraction filter as above]'
```

Merge the two schema extracts:

  • Union of paths
  • Union of types per path
  • Sum counts (out of 40 total samples)

If the tail introduces new paths absent from the head, flag them as potential schema drift.
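The merge rules above can be sketched in Python, assuming each extract is the list of `{path, types, count}` objects produced by schema discovery (the function name is illustrative):

```python
def merge_schemas(head, tail):
    """Union paths and types, sum counts across the head and tail extracts;
    paths present only in the tail are flagged as potential schema drift."""
    merged = {e["path"]: {"types": set(e["types"]), "count": e["count"]}
              for e in head}
    drift = []
    for e in tail:
        entry = merged.get(e["path"])
        if entry is None:
            drift.append(e["path"])  # new path absent from the head sample
            entry = merged[e["path"]] = {"types": set(), "count": 0}
        entry["types"] |= set(e["types"])
        entry["count"] += e["count"]
    schema = [{"path": p, "types": sorted(v["types"]), "count": v["count"]}
              for p, v in sorted(merged.items())]
    return schema, drift
```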

From the extracted paths, classify using substring matching on path names (case-insensitive):

| Category | Path substrings |
| --- | --- |
| Timestamp | time, timestamp, date, created_at, @timestamp, ts |
| Level | level, severity, priority, log_level |
| Error | error, exception, stack, traceback, err |
| Identifier | request_id, trace_id, correlation_id, session_id, span_id |
| Message | message, msg, body, text |
| Status | status, code, http_status, status_code |
| Source | source, service, host, hostname, component, logger |
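The classification step is simple substring matching; a minimal sketch (a path can legitimately land in more than one category, e.g. `error.message` in both Error and Message):

```python
# Category -> path substrings, per the table above (case-insensitive match).
CATEGORY_SUBSTRINGS = {
    "timestamp": ["time", "timestamp", "date", "created_at", "@timestamp", "ts"],
    "level": ["level", "severity", "priority", "log_level"],
    "error": ["error", "exception", "stack", "traceback", "err"],
    "identifier": ["request_id", "trace_id", "correlation_id", "session_id", "span_id"],
    "message": ["message", "msg", "body", "text"],
    "status": ["status", "code", "http_status", "status_code"],
    "source": ["source", "service", "host", "hostname", "component", "logger"],
}

def classify_fields(paths):
    """Map each category to the discovered paths matching one of its
    substrings; categories with no match are omitted."""
    out = {}
    for category, needles in CATEGORY_SUBSTRINGS.items():
        hits = [p for p in paths if any(n in p.lower() for n in needles)]
        if hits:
            out[category] = hits
    return out
```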

The output of Phase 1 is:

  1. Schema extract — compact JSON array of {path, types, count} objects
  2. Field classification map — which discovered paths map to which log-relevant categories
  3. Total line count — for partition sizing
  4. Drift flag — whether tail sampling revealed new fields

This is what gets injected into analyst prompts — no raw log lines enter the orchestrator’s context.


Phase 2: Partition & Generate Analyst Prompts


Generate recipes only for fields that exist in the discovered schema. Each recipe uses {field} placeholders replaced with actual discovered field paths.

| # | Recipe | Category | Template |
| --- | --- | --- | --- |
| 1 | Filter by level | Level | `select(.{level_field} == "ERROR")` |
| 2 | Extract errors | Error | `select(.{error_field} != null) \| {ts: .{timestamp_field}, err: .{error_field}, msg: .{message_field}}` |
| 3 | Count by status | Status | `group_by(.{status_field}) \| map({status: .[0].{status_field}, count: length})` |
| 4 | Time range filter | Timestamp | `select(.{timestamp_field} >= "START" and .{timestamp_field} <= "END")` |
| 5 | Search messages | Message | `select(.{message_field} \| test("PATTERN"; "i"))` |
| 6 | Top errors | Error | `group_by(.{error_field}) \| map({error: .[0].{error_field}, count: length}) \| sort_by(-.count) \| .[0:10]` |
| 7 | Duration outliers | Timestamp | `select(.{duration_field} > THRESHOLD) \| {ts: .{timestamp_field}, dur: .{duration_field}, msg: .{message_field}}` |
| 8 | Correlation trace | Identifier | `select(.{trace_id_field} == "TRACE_ID") \| sort_by(.{timestamp_field})` |
| 9 | Count by source | Source | `group_by(.{source_field}) \| map({source: .[0].{source_field}, count: length}) \| sort_by(-.count)` |
| 10 | Aggregation | Any numeric | `.{numeric_field} \| numbers` (for downstream stats) |

Only include recipes where the required field category has a match in the schema. For example, if no Source field was discovered, omit recipe #9.
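Placeholder substitution and category-based omission can be sketched as follows. This is a sketch with a shortened recipe list; it also assumes the first matching path is substituted when a category matched several:

```python
# A few recipe templates from the table above; {*_field} placeholders are
# replaced with the first discovered path for that category.
RECIPES = [
    ("Filter by level", "level", 'select(.{level_field} == "ERROR")'),
    ("Search messages", "message", 'select(.{message_field} | test("PATTERN"; "i"))'),
    ("Count by source", "source",
     'group_by(.{source_field}) | map({source: .[0].{source_field}, count: length})'),
]

def generate_recipes(classification):
    """Emit only recipes whose required category exists in the schema."""
    out = []
    for name, category, template in RECIPES:
        paths = classification.get(category)
        if not paths:
            continue  # e.g. no Source field discovered -> omit "Count by source"
        out.append((name, template.replace("{%s_field}" % category, paths[0])))
    return out
```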

Standard JSONL partitioning:

  • Chunk size: 500-1000 lines per chunk
  • Partition count: ceil(total_lines / chunk_size)
  • Use smaller chunks (500) for wide schemas (>15 fields) or deeply nested objects
  • Use larger chunks (1000) for narrow schemas (<10 fields)
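The sizing rules above can be sketched as a small helper. The 750-line value for schemas between 10 and 15 fields is an assumption (the text only pins down the two extremes), and the function name is illustrative:

```python
import math

def plan_partitions(total_lines, field_count, deeply_nested=False):
    """Pick a chunk size: 500 lines for wide (>15 fields) or deeply nested
    schemas, 1000 for narrow (<10 fields), 750 otherwise (assumed middle).
    Returns (chunk_size, partition_count)."""
    if field_count > 15 or deeply_nested:
        chunk_size = 500
    elif field_count < 10:
        chunk_size = 1000
    else:
        chunk_size = 750
    return chunk_size, math.ceil(total_lines / chunk_size)
```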

Create one task per chunk. Each task description includes:

```text
User query: {original user query}
File: {file_path}
Chunk: {N} of {M} (lines {start}-{end})
Format: jsonl
Schema summary:
{compact schema JSON from Phase 1}
Field classification:
- Timestamp: {field_path}
- Level: {field_path}
- Error: {field_path}
...
Tailored jq recipes:
1. Filter errors: jq 'select(.level == "ERROR")'
2. Extract error details: jq 'select(.error != null) | {ts: .timestamp, err: .error, msg: .message}'
...
Instructions: Read your assigned chunk using the Read tool with offset={start_line} and limit={chunk_size}. Apply the jq recipes mentally to count and categorize entries. Report findings as structured JSON.
```

Analyst type: swarm:rlm-json-analyzer (existing — no new agent needed)

Spawn analysts as teammates with team_name + name, staged in batches of ~15 for large workloads. Each analyst gets fresh context (1:1 analyst-per-partition model).
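The 1:1 analyst-per-partition bookkeeping can be sketched as two helpers (a sketch; helper names are illustrative):

```python
def chunk_ranges(total_lines, chunk_size):
    """1-based inclusive (start, end) line ranges, one per analyst chunk;
    start/end feed the Read tool's offset and limit."""
    return [(start, min(start + chunk_size - 1, total_lines))
            for start in range(1, total_lines + 1, chunk_size)]

def staged_batches(tasks, batch_size=15):
    """Split analyst spawn tasks into staged batches of ~15 for large
    workloads."""
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]
```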


Use swarm:rlm-synthesizer with:

  • The original user query
  • Schema summary from Phase 1
  • Log-specific synthesis guidance:
```text
Synthesis guidance for JSONL log analysis:
- Identify temporal patterns (error spikes, traffic waves, latency trends)
- Cluster errors by type and root cause — deduplicate across chunks
- Break down metrics by service/source if the field exists
- Note schema drift if analysts report inconsistent fields across chunks
- Correlate request IDs / trace IDs that span multiple chunks
- Highlight the top 5-10 most actionable findings
```

```text
Analyze the application logs at /var/log/app/events.jsonl for error patterns.
Use the JSONL log analyzer skill. I need to understand:
- What types of errors are most frequent?
- Are there temporal spikes?
- Which services are generating the most errors?
```

```text
Use the JSONL log analyzer to analyze the API gateway log at
data/gateway-access.jsonl. Report on:
- Request volume by endpoint and status code
- P50/P95 latency patterns over time
- Any anomalous traffic patterns or suspicious request bursts
```

```text
Investigate the production incident using logs at /tmp/incident-2026-02-25.jsonl.
Use the JSONL log analyzer skill to:
- Build a timeline of events leading to the outage
- Trace affected request IDs across services
- Identify the root cause service and error type
```

| File Size | Chunk Size | ~Partitions | Analyst Batching |
| --- | --- | --- | --- |
| 1,500-5,000 lines | 1,000 | 2-5 | All at once |
| 5,000-20,000 lines | 750 | 7-27 | Batch of ~15 |
| 20,000-100,000 lines | 500 | 40-200 | Staged batches of ~15 |
| 100,000+ lines | 500 | 200+ | Staged batches of ~15 |

For very large files (100k+ lines), consider pre-filtering with jq or grep to reduce the dataset before analysis — e.g., filter to a specific time window or error level.


The schema extract flattens paths to 3 levels of nesting. Paths within that limit, like metadata.request.headers.content_type, are preserved verbatim; deeper paths are abbreviated as deep.path...leaf to keep the schema compact.

Some JSONL files begin with a metadata line (keys like _meta, _header, _schema). If the first line’s keys are entirely distinct from lines 2-20, skip line 1 during schema extraction and note it as a metadata header.
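The key-disjointness check can be sketched as (a sketch; the function name is illustrative):

```python
import json

def is_metadata_header(sample_lines):
    """True if the first line's top-level keys are entirely disjoint from
    the keys observed across the rest of the sample (lines 2..N)."""
    first = set(json.loads(sample_lines[0]))
    rest = set()
    for line in sample_lines[1:]:
        rest |= set(json.loads(line))
    return bool(first) and bool(rest) and not (first & rest)
```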

If jq fails on some sampled lines during schema discovery, note the failure rate and proceed with parseable lines. Include the malformed line rate in the analyst prompt so analysts can report it per-chunk.
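A tolerant sampling pass that tracks the malformed-line rate might look like this sketch:

```python
import json

def parse_sample(lines):
    """Parse sampled lines, skipping malformed ones; returns the parsed
    records plus the malformed-line rate to surface in analyst prompts."""
    records, failures = [], 0
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            failures += 1
    rate = failures / len(lines) if lines else 0.0
    return records, rate
```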

If schema discovery reveals highly divergent object shapes (e.g., event types with completely different fields), note the distinct shapes and include all shapes in the schema summary. Analysts will report shape distribution per chunk.

If tail sampling reveals fields absent from the head sample, flag this as schema drift. Include both the “early schema” and “late schema” differences in analyst prompts so they can report where the transition occurs.


This skill is a specialization of the RLM pattern, not a replacement:

| Aspect | Standard JSON/JSONL RLM | JSONL Log Analyzer |
| --- | --- | --- |
| Schema discovery | Manual or implicit | Automated via jq |
| jq recipes | User provides or none | Auto-generated from schema |
| Field classification | None | Timestamp, level, error, ID, etc. |
| Synthesis guidance | Generic | Log-specific (temporal, error clustering) |
| Analyst type | swarm:rlm-json-analyzer | Same — swarm:rlm-json-analyzer |
| Partitioning | Standard JSONL line-count | Same — standard JSONL line-count |

Use standard JSON/JSONL RLM when:

  • The file is a JSON array (not line-delimited)
  • The data isn’t log-structured (e.g., product catalog, config dump)
  • You want full manual control over analysis prompts

Use this skill when:

  • The file is JSONL with log-structured data
  • You want automated schema discovery and jq recipe generation
  • You’re investigating incidents, analyzing traffic, or profiling errors