read-jsonl

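Read and extract specific entries from JSONL files by ID. Use this skill when you need to inspect individual items, compare text fields, or analyze specific samples from data/processed/.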

SKILL.md
--- frontmatter
name: read-jsonl
description: Read and extract specific entries from JSONL files by ID. Use when you need to inspect individual items, compare text fields, or analyze specific samples from data/processed/.

JSONL Item Reader

Quick Start

Extract specific items by ID:

bash
# Single ID
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_group_h.jsonl --ids xsum_0

# Multiple IDs
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_group_h.jsonl --ids xsum_0 xsum_5 xsum_10

# Comma-separated IDs
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_group_h.jsonl --ids "xsum_0,xsum_5,xsum_10"
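
The three invocation styles above suggest that `--ids` accepts space-separated values, comma-separated values, or a mix. The internals of `reader.py` are not shown here, so the following is only a minimal sketch of how such an argument could be normalized with `argparse`. The helper names `parse_ids` and `build_parser` are hypothetical, and dropping duplicate IDs is an assumption rather than documented behavior.

```python
import argparse


def parse_ids(raw_values):
    """Flatten space- and comma-separated ID arguments into one ordered list."""
    ids = []
    for value in raw_values:
        for part in value.split(","):
            part = part.strip()
            # Assumption: empty fragments and repeated IDs are dropped.
            if part and part not in ids:
                ids.append(part)
    return ids


def build_parser():
    # Hypothetical parser; the real script may define more options.
    parser = argparse.ArgumentParser(description="Sketch of an ID-based JSONL reader CLI")
    parser.add_argument("path", help="JSONL file to read")
    parser.add_argument("--ids", nargs="+", required=True, help="IDs, space- or comma-separated")
    return parser


args = build_parser().parse_args(["file.jsonl", "--ids", "xsum_0,xsum_5", "xsum_10"])
print(parse_ids(args.ids))  # ['xsum_0', 'xsum_5', 'xsum_10']
```

Using `nargs="+"` lets the shell split space-separated IDs, while the inner `split(",")` handles the quoted comma-separated form.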

Display Options

Choose what to display:

bash
# Show all fields (default)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0

# Show only specific fields
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --fields id text_human

# Show only text fields (text_human, text_ai_base, text_ai_humanized)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --text-only

# Compact view (metadata only, no text content)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --compact

# Pretty print (formatted text with line numbers and truncation)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --pretty
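
To make the difference between the display options concrete, here is a rough sketch of the kind of filtering they imply. It is not taken from `reader.py`. The function `select_fields`, the `TEXT_FIELDS` tuple, and the precedence between options are all assumptions made for illustration.

```python
# Assumption: these are the "text" fields, based on the field names used in this document.
TEXT_FIELDS = ("text_human", "text_ai_base", "text_ai_humanized")


def select_fields(item, fields=None, text_only=False, compact=False):
    """Return a filtered copy of one JSONL item.

    Assumed precedence: explicit fields, then text-only, then compact, else everything.
    """
    if fields:
        keys = [key for key in fields if key in item]
    elif text_only:
        keys = [key for key in TEXT_FIELDS if key in item]
    elif compact:
        keys = [key for key in item if key not in TEXT_FIELDS]
    else:
        keys = list(item)
    return {key: item[key] for key in keys}


item = {"id": "xsum_0", "dataset": "xsum", "text_human": "A short summary."}
print(select_fields(item, fields=["id", "text_human"]))
print(select_fields(item, text_only=True))
print(select_fields(item, compact=True))
```

Requested fields that are missing from an item are silently skipped in this sketch; the real script may report them instead.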

Output Formats

bash
# Human-readable (default)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0

# JSON output
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --format json

# JSON Lines (one item per line)
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --format jsonl

# Export to file
python .claude/skills/read-jsonl/scripts/reader.py file.jsonl --ids xsum_0 --output output.json
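
The `json` and `jsonl` formats differ only in how the selected items are serialized: one JSON array versus one object per line. The sketch below shows that distinction using the standard `json` module. The `render` function is hypothetical and does not reproduce the human-readable default.

```python
import json


def render(items, fmt="json"):
    """Serialize a list of items as a JSON array or as JSON Lines."""
    if fmt == "json":
        # One document containing every item, indented for readability.
        return json.dumps(items, ensure_ascii=False, indent=2)
    if fmt == "jsonl":
        # One compact JSON object per line, in the same order as the input.
        return "\n".join(json.dumps(item, ensure_ascii=False) for item in items)
    raise ValueError(f"unsupported format: {fmt}")


items = [{"id": "xsum_0", "score": 0.12}, {"id": "xsum_5", "score": 0.87}]
print(render(items, "json"))
print(render(items, "jsonl"))
```

The JSONL form can be appended to an existing file or piped line by line, whereas the JSON array form has to be parsed as a whole.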

Common Use Cases

Compare text lengths across stages:

bash
python .claude/skills/read-jsonl/scripts/reader.py \
  data/processed/xsum_group_h_ai_gpt-4o_humanized_gpt-4o.jsonl \
  --ids xsum_0 \
  --fields id text_human text_ai_base text_ai_humanized \
  --stats
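
The Features list describes the length statistics as character and word counts for text fields. How `--stats` computes them is not documented here, so the following is a sketch under the assumption that words are whitespace-separated tokens. The function name `length_stats` is hypothetical.

```python
def length_stats(item, fields):
    """Compute character and word counts for each string-valued field."""
    stats = {}
    for field in fields:
        value = item.get(field)
        # Non-string or missing fields (for example a failed generation) are skipped.
        if isinstance(value, str):
            stats[field] = {"chars": len(value), "words": len(value.split())}
    return stats


item = {
    "id": "xsum_0",
    "text_human": "one two three",
    "text_ai_base": "one two three four five",
    "text_ai_humanized": None,
}
print(length_stats(item, ["text_human", "text_ai_base", "text_ai_humanized"]))
```

Comparing the counts side by side makes it easy to spot a stage that truncated or inflated the text.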

Inspect error cases:

bash
python .claude/skills/read-jsonl/scripts/reader.py \
  data/processed/xsum_group_h_ai_gpt-4o.jsonl \
  --ids xsum_42 \
  --fields id generation_status error text_ai_base

Extract detection scores:

bash
python .claude/skills/read-jsonl/scripts/reader.py \
  data/processed/xsum_group_h_winston_text_human.jsonl \
  --ids xsum_0 \
  --fields id ai_probability score

Features

The reader provides:

  • 🎯 Exact ID matching - Extract specific entries by ID
  • 📊 Length statistics - Character and word counts for text fields
  • 🔍 Flexible display - Show all fields, specific fields, or text-only
  • 💾 Multiple formats - Human-readable, JSON, or JSONL output
  • 📝 Pretty printing - Formatted text with line numbers and truncation
  • Fast lookup - Efficient ID-based extraction
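
For readers who want the same behavior outside the CLI, exact ID matching over a JSONL file can be sketched as a single streaming pass that stops once every requested ID has been found. This is an illustration, not the actual implementation in `reader.py`. The function `read_items` and its `id_field` parameter are hypothetical, and each line is assumed to hold one JSON object.

```python
import json


def read_items(path, wanted_ids, id_field="id"):
    """Return the items whose ID is in wanted_ids, in the order requested."""
    wanted = set(wanted_ids)
    found = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            item = json.loads(line)
            item_id = item.get(id_field)
            # Exact match only; the first occurrence of an ID wins.
            if item_id in wanted and item_id not in found:
                found[item_id] = item
                if len(found) == len(wanted):
                    break  # stop early once everything is found
    return [found[item_id] for item_id in wanted_ids if item_id in found]
```

A call such as `read_items("data/processed/xsum_group_h.jsonl", ["xsum_0", "xsum_5"])` would return at most two dictionaries. IDs that are absent from the file are simply omitted from the result.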

Field Groups

Common field selections:

  • Metadata: id dataset chunk_type
  • Text (Group H): text_human
  • Text (Group A): text_human text_ai_base
  • Text (Group B): text_human text_ai_base text_ai_humanized
  • Generation: generation_status error was_truncated model
  • Humanization: humanization_status humanizer_error humanizer_was_truncated humanizer_model
  • Detection: ai_probability score status_code
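
When the same selections are reused often, the groups above can be kept in a small lookup table and expanded into a `--fields` argument. The sketch below does this. The group keys (`metadata`, `text_b`, and so on) and the helper `fields_for` are names invented for illustration, while the field names themselves are copied from the list above.

```python
# Field names come from the list above; the group keys are hypothetical labels.
FIELD_GROUPS = {
    "metadata": ["id", "dataset", "chunk_type"],
    "text_h": ["text_human"],
    "text_a": ["text_human", "text_ai_base"],
    "text_b": ["text_human", "text_ai_base", "text_ai_humanized"],
    "generation": ["generation_status", "error", "was_truncated", "model"],
    "humanization": [
        "humanization_status",
        "humanizer_error",
        "humanizer_was_truncated",
        "humanizer_model",
    ],
    "detection": ["ai_probability", "score", "status_code"],
}


def fields_for(*groups):
    """Expand group names into one de-duplicated, ordered list of field names."""
    fields = []
    for group in groups:
        for name in FIELD_GROUPS[group]:
            if name not in fields:
                fields.append(name)
    return fields


# Build the value to pass after --fields on the command line.
print(" ".join(fields_for("metadata", "generation")))
```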

Integration

For programmatic usage and batch processing, see API.md.

For detailed examples and patterns, see GUIDE.md.

Typical Workflow

bash
# 1. Find problematic IDs with analyzer
python .claude/skills/analyze-jsonl/scripts/analyzer.py data/processed/xsum_ai.jsonl

# 2. Read specific entries to investigate
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_ai.jsonl --ids xsum_42 --pretty

# 3. Compare across pipeline stages
python .claude/skills/read-jsonl/scripts/reader.py data/processed/xsum_humanized.jsonl --ids xsum_42 --text-only