JSONL Item Reader

Name: read-jsonl
Rating: 92
Author: zhuxiaohai

Quick Start

Extract specific items by ID:

```bash
# Single ID
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_group_h.jsonl --ids xsum_0

# Multiple IDs
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_group_h.jsonl --ids xsum_0 xsum_5 xsum_10

# Comma-separated IDs
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_group_h.jsonl --ids "xsum_0,xsum_5,xsum_10"
```
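The ID lookup can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual code of reader.py. It assumes each line holds one JSON object with an "id" key. The names parse_ids and find_items are hypothetical. The sketch accepts space-separated and comma-separated IDs alike, and it stops reading the file once every requested ID has been found.

```python
import json
from typing import Iterable, Iterator


def parse_ids(raw_ids: Iterable[str]) -> list[str]:
    """Flatten space- and comma-separated IDs, keeping order and dropping duplicates."""
    ids: list[str] = []
    for token in raw_ids:
        for part in token.split(","):
            part = part.strip()
            if part and part not in ids:
                ids.append(part)
    return ids


def find_items(lines: Iterable[str], wanted: list[str]) -> Iterator[dict]:
    """Yield records whose "id" is in `wanted`, stopping once all of them are found."""
    remaining = set(wanted)
    for line in lines:
        if not remaining:
            break  # every requested ID has been seen; skip the rest of the file
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("id") in remaining:
            remaining.discard(record["id"])
            yield record
```

Matches come back in file order, not in the order the IDs were requested. To use this on a real file, pass an open file handle as lines, for example inside `with open(path, encoding="utf-8") as f:`.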

Display Options

Choose what to display:

```bash
# Show all fields (default)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0

# Show only specific fields
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --fields id text_human

# Show only text fields (text_human, text_ai_base, text_ai_humanized)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --text-only

# Compact view (metadata only, no text content)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --compact

# Pretty print (line numbers, long text truncated)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --pretty
```
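The sketch below shows one plausible way the display flags could map onto a record, written as a hypothetical select_view helper. It rests on three assumptions:

- --fields takes priority over the other flags.
- --text-only keeps id plus the three text fields named under Field Groups.
- --compact drops every key starting with text_.

The real script may resolve these flags differently.

```python
from typing import Optional, Sequence

# Assumed text field names, taken from the Field Groups list.
TEXT_FIELDS = ("text_human", "text_ai_base", "text_ai_humanized")


def select_view(record: dict, fields: Optional[Sequence[str]] = None,
                text_only: bool = False, compact: bool = False) -> dict:
    """Return the subset of a record that one display mode would show."""
    if fields:
        # --fields: keep only the requested keys, in the requested order
        return {key: record[key] for key in fields if key in record}
    if text_only:
        # --text-only: id plus whichever text fields the record has
        return {key: record[key] for key in ("id", *TEXT_FIELDS) if key in record}
    if compact:
        # --compact: metadata only, so drop every text_* field
        return {key: value for key, value in record.items() if not key.startswith("text_")}
    # default: show all fields
    return dict(record)
```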

Output Formats

```bash
# Human-readable (default)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0

# JSON output
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --format json

# JSON Lines (one item per line)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --format jsonl

# Export to file
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --output output.json
```
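Here is a sketch of the three output formats. It assumes that --format json emits one JSON array for all matched items. It also assumes the human-readable view prints an ID header followed by one "key: value" line per field. The name render is hypothetical, and the real default layout is not documented here.

```python
import json


def render(items: list[dict], fmt: str = "text") -> str:
    """Render matched records in one of the three output formats."""
    if fmt == "json":
        # one JSON array holding every matched item (assumed shape)
        return json.dumps(items, ensure_ascii=False, indent=2)
    if fmt == "jsonl":
        # one compact JSON object per line
        return "\n".join(json.dumps(item, ensure_ascii=False) for item in items)
    if fmt == "text":
        # human-readable: an ID header followed by key: value lines
        blocks = []
        for item in items:
            lines = [f"=== {item.get('id', '?')} ==="]
            lines += [f"{key}: {value}" for key, value in item.items() if key != "id"]
            blocks.append("\n".join(lines))
        return "\n\n".join(blocks)
    raise ValueError(f"unknown format: {fmt!r}")
```

Under this sketch, --output would amount to writing the rendered string to a file, for example with pathlib's Path("output.json").write_text(render(items, "json"), encoding="utf-8").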

Common Use Cases

Compare text lengths across stages:

```bash
python .claude/skills/read-jsonl/scripts/reader.py \
  data/processed/xsum_group_h_ai_gpt-4o_humanized_gpt-4o.jsonl \
  --ids xsum_0 \
  --fields id text_human text_ai_base text_ai_humanized \
  --stats
```
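The --stats option reports character and word counts for text fields. Below is a minimal sketch of that computation. It assumes words are whitespace-separated tokens and that only fields whose names start with text_ are counted. The helper length_stats is hypothetical.

```python
def length_stats(record: dict, fields: list[str]) -> dict[str, dict[str, int]]:
    """Character and word counts for the requested text_* fields of one record."""
    stats: dict[str, dict[str, int]] = {}
    for field in fields:
        value = record.get(field)
        # only string-valued text_* fields are counted; id, scores, etc. are skipped
        if field.startswith("text_") and isinstance(value, str):
            stats[field] = {
                "chars": len(value),
                "words": len(value.split()),  # whitespace tokens count as words
            }
    return stats
```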

Inspect error cases:

```bash
python .claude/skills/read-jsonl/scripts/reader.py \
  data/processed/xsum_group_h_ai_gpt-4o.jsonl \
  --ids xsum_42 \
  --fields id generation_status error text_ai_base
```
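To collect candidate IDs for this kind of inspection without the analyzer, a short scan for records with a non-empty error or humanizer_error field can serve as a stand-in. This illustrative sketch is not part of the read-jsonl skill, and ids_with_errors is a hypothetical name. It assumes that null or empty-string errors mean "no error".

```python
import json
from typing import Iterable, Sequence


def ids_with_errors(lines: Iterable[str],
                    error_fields: Sequence[str] = ("error", "humanizer_error")) -> list[str]:
    """Return the ids of records where any of the error fields is non-empty."""
    found: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # None, "" and missing keys are all treated as "no error"
        if any(record.get(field) for field in error_fields):
            found.append(record.get("id"))
    return found
```

You can join the returned list with spaces and pass it straight to --ids.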

Extract detection scores:

```bash
python .claude/skills/read-jsonl/scripts/reader.py \
  data/processed/xsum_group_h_winston_text_human.jsonl \
  --ids xsum_0 \
  --fields id ai_probability score
```

Features

The reader provides:

• 🎯 Exact ID matching - Extract specific entries by ID
• 📊 Length statistics - Character and word counts for text fields
• 🔍 Flexible display - Show all fields, specific fields, or text-only
• 💾 Multiple formats - Human-readable, JSON, or JSONL output
• 📝 Pretty printing - Formatted text with line numbers and truncation
• ⚡ Fast lookup - Efficient ID-based extraction
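The pretty printer is described as adding line numbers and truncation. Below is a minimal sketch of that behavior. It assumes truncation happens both per line (width) and per field (max_lines). The name pretty_text and its defaults are hypothetical, not the script's actual parameters.

```python
def pretty_text(text: str, max_lines: int = 20, width: int = 100) -> str:
    """Number each line, shorten long lines, and cap the number of lines shown."""
    lines = text.splitlines() or [""]
    out = []
    for number, line in enumerate(lines[:max_lines], start=1):
        if len(line) > width:
            line = line[: width - 3] + "..."  # keep the shortened line at `width` chars
        out.append(f"{number:>4} | {line}")
    hidden = len(lines) - max_lines
    if hidden > 0:
        out.append(f"     ... {hidden} more line(s)")
    return "\n".join(out)
```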

Field Groups

Common field selections:

• Metadata: id dataset chunk_type
• Text (Group H): text_human
• Text (Group A): text_human text_ai_base
• Text (Group B): text_human text_ai_base text_ai_humanized
• Generation: generation_status error was_truncated model
• Humanization: humanization_status humanizer_error humanizer_was_truncated humanizer_model
• Detection: ai_probability score status_code
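The groups above can be kept as a small lookup table and expanded into a --fields argument. The group keys and the fields_for helper are hypothetical conveniences. Nothing here suggests that reader.py itself accepts group names.

```python
# Hypothetical group keys; the field lists mirror the groups listed above.
FIELD_GROUPS = {
    "metadata": ["id", "dataset", "chunk_type"],
    "text_h": ["text_human"],
    "text_a": ["text_human", "text_ai_base"],
    "text_b": ["text_human", "text_ai_base", "text_ai_humanized"],
    "generation": ["generation_status", "error", "was_truncated", "model"],
    "humanization": ["humanization_status", "humanizer_error",
                     "humanizer_was_truncated", "humanizer_model"],
    "detection": ["ai_probability", "score", "status_code"],
}


def fields_for(*groups: str) -> list[str]:
    """Concatenate the fields of several groups, dropping duplicates in order."""
    fields: list[str] = []
    for group in groups:
        for field in FIELD_GROUPS[group]:
            if field not in fields:
                fields.append(field)
    return fields
```

For example, " ".join(fields_for("metadata", "detection")) yields a string you can paste after --fields.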

Integration

For programmatic usage and batch processing, see API.md.

For detailed examples and patterns, see GUIDE.md.

Typical Workflow

```bash
# 1. Find problematic IDs with the analyzer
python .claude/skills/analyze-jsonl/scripts/analyzer.py data/processed/xsum_ai.jsonl

# 2. Read specific entries to investigate
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_ai.jsonl --ids xsum_42 --pretty

# 3. Compare across pipeline stages
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_humanized.jsonl --ids xsum_42 --text-only
```
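Step 3 can also be done programmatically for a single ID across several stage files. This sketch assumes every stage file uses the same id values and the same text_* field naming. The functions load_by_id and compare_stages are hypothetical and independent of the API described in API.md.

```python
import json
from pathlib import Path
from typing import Optional


def load_by_id(path: Path, item_id: str) -> Optional[dict]:
    """Return the first record in a JSONL file whose "id" equals item_id, or None."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("id") == item_id:
                return record
    return None


def compare_stages(paths: list[Path], item_id: str) -> dict[str, dict[str, int]]:
    """Map each file name to the character length of each text_* field for one ID."""
    table: dict[str, dict[str, int]] = {}
    for path in paths:
        record = load_by_id(path, item_id) or {}  # a missing ID yields an empty row
        table[path.name] = {
            key: len(value)
            for key, value in record.items()
            if key.startswith("text_") and isinstance(value, str)
        }
    return table
```

For the workflow above, you would call compare_stages([Path("data/processed/xsum_ai.jsonl"), Path("data/processed/xsum_humanized.jsonl")], "xsum_42") and print each row.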