dump-agent-data-for-rg

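Technical reference for dumping Codex and Claude agent conversations to searchable text files. Covers the command reference, flags, and output format. This skill is a prerequisite for the search-agent-conversation skill.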

SKILL.md
---
name: dump-agent-data-for-rg
description: Technical reference for dumping Codex and Claude agent conversations to searchable text files. Command reference, flags, and output format. Used as prerequisite by search-agent-conversation skill.
---

Dump Agent Data for rg

[Created by Claude: 7f04f921-3ad9-48b4-a951-f8227c466a3e]

Relationship to Other Skills

Used by: search-agent-conversation (this is the prerequisite)

This skill provides technical details about the dump tool. For search strategies and when to use this tool, see search-agent-conversation.


❌ CRITICAL WARNING

NEVER grep sse_lines.jsonl directly!

Always use this dump tool first, then rg the output.


Quick Start

bash
# Last hour (both agents, auto-generated unique dir)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --start-hour 1 --end-hour 0

# Last 24 hours, codex only
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --start-hour 24 --end-hour 0 --agent codex

# Precise time window (local naive time)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-23 10:00:00" --to-time "2026-01-23 12:00:00"

# From specific time to NOW (note the printed --to-time!)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-23 10:00:00"

Sample Output

code
Time window: 2026-01-23 12:00:00 → 2026-01-23 12:42:47 (42m 47s)
# --to-time defaulted to: 2026-01-23 12:42:47 (use this for next --since-time)
Output:      /tmp/agent-dump-conversations/codex-and-claude-0.7h-019be960-66bc-7b2d-a0d3-3ed0f1d36221

Codex   0 sessions  →  /tmp/agent-dump-conversations/.../codex/
Claude  1 session   →  /tmp/agent-dump-conversations/.../claude/

codex-and-claude-0.7h-019be960-66bc-7b2d-a0d3-3ed0f1d36221/
├── codex/
└── claude/
    └── 7f04f921-3ad9-48b4-a951-f8227c466a3e-0/
        └── conversation.txt

# Search conversations:
rg "vscode extension" /tmp/agent-dump-conversations/codex-and-claude-0.7h-...
# Reminder: Do NOT search sse_lines.jsonl directly, use this dump instead!

Total size: 0.11 MB

Advanced Usage: Git Diff Workflow

⚠️ MUST use --since-time + --to-time for this workflow

WHY: Relative hours (--start-hour) are resolved against the current time on every run, so two consecutive dumps never share an exact boundary. If a dump takes 10 seconds, you lose 10 seconds of conversations between runs.

Step-by-Step

bash
# Step 1: Initial 24h dump (note the exact --to-time printed!)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-22 12:00:00" \
  --to-time "2026-01-23 12:00:00"

# Output shows: Time window: 2026-01-22 12:00:00 → 2026-01-23 12:00:00

cd /tmp/agent-dump-conversations/codex-and-claude-24h-xxxxx
git init && git add . && git commit -m "24h baseline"

# Step 2: Later dump (use EXACT --to-time from previous as --since-time)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-23 12:00:00"
# Omit --to-time = dumps up to NOW
# Output shows: # --to-time defaulted to: 2026-01-23 14:30:15 (use this for next --since-time)

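# Copy new data into the git repo (run from the repo root).
# NEW_DUMP is the path printed on the "Output:" line of the Step 2 dump;
# a bare codex-and-claude-* glob would also match the baseline dump itself.
NEW_DUMP="/tmp/agent-dump-conversations/codex-and-claude-..."
cp -r "$NEW_DUMP"/codex/. codex/
cp -r "$NEW_DUMP"/claude/. claude/

# Step 3: stage, then diff to see only NEW conversations (no loss!).
# New conversation files are untracked, so a plain `git diff` would not list them.
git add -A && git diff --cached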

Key insight: The printed --to-time defaulted to: message tells you the exact cutoff. Use that value as --since-time for the next dump to ensure no gaps.
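The hand-off between dumps can be scripted. The following is a minimal sketch, assuming the hint line looks exactly like the one in Sample Output; the helper name `next_since_time` is hypothetical and not part of the dump tool.

```python
import re
from typing import Optional

# Pattern for the hint line shown in Sample Output, e.g.
# "# --to-time defaulted to: 2026-01-23 12:42:47 (use this for next --since-time)"
_TO_TIME_RE = re.compile(
    r"--to-time defaulted to: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def next_since_time(dump_output: str) -> Optional[str]:
    """Return the cutoff printed by the dump, or None if the line is absent.

    The line is absent when --to-time was passed explicitly; in that case
    the value you passed is already the next --since-time.
    """
    match = _TO_TIME_RE.search(dump_output)
    return match.group(1) if match else None


sample = (
    "Time window: 2026-01-23 12:00:00 → 2026-01-23 12:42:47 (42m 47s)\n"
    "# --to-time defaulted to: 2026-01-23 12:42:47 (use this for next --since-time)\n"
)
print(next_since_time(sample))  # 2026-01-23 12:42:47
```

Feeding the captured stdout of one run through this helper gives the exact string to pass as --since-time on the next run.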


Script Locations

Wrapper (recommended)

code
~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py

Individual scripts

code
~/swe/telemetry_projects/agent_dump/launchers/dump_codex_conversations.py
~/swe/telemetry_projects/agent_dump/launchers/dump_claude_conversations.py

Symlinks exist in both projects

code
~/swe/telemetry_projects/codex_sqlite/launchers/dump_conversations.py
~/swe/telemetry_projects/claude_sqlite/launchers/dump_conversations.py

Command Reference

Time Modes (mutually exclusive)

| Mode | Flags | Example |
|------|-------|---------|
| Relative | --start-hour + --end-hour | --start-hour 24 --end-hour 0 (last 24h) |
| Precise | --since-time [+ --to-time] | --since-time "2026-01-23 10:00:00" |

Time Formats

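| Flag | Format | Default | Example |
|------|--------|---------|---------|
| --start-hour | Float | Required | 24, 1.5 |
| --end-hour | Float | Required | 0, 0.5 |
| --since-time | YYYY-MM-DD HH:MM:SS | Required | "2026-01-23 10:00:00" |
| --to-time | YYYY-MM-DD HH:MM:SS | now | "2026-01-23 12:00:00" |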

Note: Times are local naive (no timezone suffix).
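Before invoking the tool from a script, it can help to validate timestamps and to see what a relative window resolves to. This is a rough sketch, not the tool's own code: `parse_naive` and `relative_window` are hypothetical helpers, and reading the relative flags as "hours before now" is an assumption based on the "last 24h" example above.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Format accepted by --since-time / --to-time (local naive, no timezone suffix).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_naive(value: str) -> datetime:
    """Parse a timestamp; raises ValueError if it does not match TIME_FORMAT."""
    return datetime.strptime(value, TIME_FORMAT)


def relative_window(
    start_hour: float, end_hour: float, now: datetime
) -> Tuple[datetime, datetime]:
    """Resolve --start-hour/--end-hour into (since, to).

    Assumption: both values mean "hours before now".
    """
    if start_hour <= end_hour:
        raise ValueError("start_hour must be further in the past than end_hour")
    return now - timedelta(hours=start_hour), now - timedelta(hours=end_hour)


now = parse_naive("2026-01-23 12:00:00")
since, to = relative_window(1.5, 0.5, now)
print(since.strftime(TIME_FORMAT), "→", to.strftime(TIME_FORMAT))
# 2026-01-23 10:30:00 → 2026-01-23 11:30:00
```

Because `now` changes on every run, the resolved boundaries drift from one run to the next.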

Other Flags

| Flag | Description | Example |
|------|-------------|---------|
| --agent | Which agent(s): codex, claude, both (default: both) | --agent codex |
| --data-dir | Custom output directory (default: auto-generated) | --data-dir /tmp/my-dump |
| --sid | Filter by session ID suffix (can specify multiple) | --sid abc123 --sid def456 |
| --include-reasoning | Include thinking/reasoning blocks (disabled by default) | --include-reasoning |
| -q, --quiet | Suppress output | -q |

About Reasoning/Thinking Content

Disabled by default because reasoning is transient and exploratory:

  • Agents exploring possibilities, not final decisions
  • Often discarded or revised in final output
  • Can add noise when searching for what agents actually did

When to include (--include-reasoning):

  • Debugging agent decision-making process
  • Understanding why agents chose specific approaches
  • Analyzing agent behavior patterns

Note: Tool calls are included by default (they show what agents actually executed).


Output Structure

code
/tmp/agent-dump-conversations/codex-and-claude-24h-{uuid}/
├── codex/
│   ├── {sid}-{pid}/
│   │   └── conversation.txt
│   └── ...
└── claude/
    ├── {sid}-{pid}/
    │   └── conversation.txt
    └── ...

Each conversation.txt contains:

  • Agent type (CODEX or CLAUDE)
  • Session ID and PID
  • Round-by-round conversations
  • User prompts
  • Assistant responses
  • Timestamps
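To enumerate the sessions in a dump without opening every file, the directory layout above is enough. The sketch below assumes that layout holds exactly as shown; `iter_sessions` is a hypothetical helper. Session IDs contain hyphens themselves, so the directory name is split at the last hyphen to separate the PID.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_sessions(dump_root: Path) -> Iterator[Tuple[str, str, str, Path]]:
    """Yield (agent, sid, pid, transcript_path) for each dumped session."""
    for agent in ("codex", "claude"):
        agent_dir = dump_root / agent
        if not agent_dir.is_dir():
            continue  # e.g. a dump made with --agent codex only
        for session_dir in sorted(agent_dir.iterdir()):
            transcript = session_dir / "conversation.txt"
            if not transcript.is_file():
                continue
            # "{sid}-{pid}": the sid has hyphens, so split on the last one.
            sid, _, pid = session_dir.name.rpartition("-")
            yield agent, sid, pid, transcript


# Usage (the path is whatever the dump printed on its "Output:" line):
# for agent, sid, pid, path in iter_sessions(Path("/tmp/my-dump")):
#     print(agent, sid, pid, path)
```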

⚠️ IMPORTANT: Direct SQLite Querying Protocol

Agents are strongly discouraged from querying SQLite databases directly, especially for metadata-only queries. Use the dump tool instead.

If You Must Query SQLite Directly

When querying ~/centralized-logs/sqlite-dbs/codex-rounds.sqlite or ~/centralized-logs/sqlite-dbs/claude-rounds.sqlite:

Timeout Rules:

  • Maximum timeout: 16 seconds
  • On each timeout: Halve the timeout (16s → 8s → 4s → 2s → 1s)
  • After multiple timeouts: Stop and report the issue

Background Execution (Recommended):

  • Run queries in a background terminal if possible
  • Background terminals must also comply with the timeout rule
  • Use timeout 16s sqlite3 ... to enforce limits

Example:

bash
# With timeout protection
timeout 16s sqlite3 ~/centralized-logs/sqlite-dbs/codex-rounds.sqlite "SELECT COUNT(*) FROM sessions"

# If timeout occurs, retry with 8s
timeout 8s sqlite3 ~/centralized-logs/sqlite-dbs/codex-rounds.sqlite "SELECT COUNT(*) FROM sessions"

Why this matters: SQLite can lock under concurrent writes. The dump tool is optimized for safe, read-only access to conversation data.
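The halving rule can be wrapped in a small helper so that a script never exceeds the limits above. This is a sketch under the stated rules only; `timeout_schedule` and `run_with_halving_timeout` are hypothetical names, and the behaviour on a non-zero exit status (raising an error) is a design choice made here, not something the protocol specifies.

```python
import subprocess
from typing import Iterator, Optional, Sequence


def timeout_schedule(start: float = 16.0, floor: float = 1.0) -> Iterator[float]:
    """Yield 16, 8, 4, 2, 1: the timeout is halved after every attempt."""
    limit = start
    while limit >= floor:
        yield limit
        limit /= 2


def run_with_halving_timeout(
    cmd: Sequence[str], start: float = 16.0, floor: float = 1.0
) -> Optional[str]:
    """Run cmd, retrying with a halved timeout; return stdout or None.

    None means every attempt timed out: stop and report the issue.
    A non-zero exit status raises subprocess.CalledProcessError.
    """
    for limit in timeout_schedule(start, floor):
        try:
            result = subprocess.run(
                list(cmd), capture_output=True, text=True,
                timeout=limit, check=True,
            )
        except subprocess.TimeoutExpired:
            continue  # the child is killed; retry with the next, shorter limit
        return result.stdout
    return None


# Usage, mirroring the shell example above:
# from pathlib import Path
# db = Path.home() / "centralized-logs/sqlite-dbs/codex-rounds.sqlite"
# print(run_with_halving_timeout(["sqlite3", str(db), "SELECT COUNT(*) FROM sessions"]))
```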


🚀 FAST QUERY: --prompt-only Mode

When you only need user prompts with timestamps, use --prompt-only instead of querying SQLite directly:

bash
# Get all prompts from last 24 hours (JSON to stdout)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --start-hour 24 --end-hour 0 --prompt-only

# Get codex-only prompts from precise time window
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-23 10:00:00" --agent codex --prompt-only

Example Output

json
[
  {
    "agent": "claude",
    "sid": "03d9e752-394b-477b-84d5-b7770322c09f",
    "pid": 0,
    "prompt_count": 3,
    "user_prompts": [
      {"t": "2026-01-23 12:00:01", "prompt": "Upgrade the repo to have more observability"},
      {"t": "2026-01-23 12:15:30", "prompt": "Add tests for the new feature"},
      {"t": "2026-01-23 12:45:00", "prompt": "Fix the failing CI"}
    ]
  },
  {
    "agent": "codex",
    "sid": "019be4a6-fa69-7d91-a963-ecca7d9185c6",
    "pid": 12351,
    "prompt_count": 2,
    "user_prompts": [
      {"t": "2026-01-23 11:30:00", "prompt": "Port the documentation"},
      {"t": "2026-01-23 12:00:00", "prompt": "Update the changelog"}
    ]
  }
]

Use Cases

| Task | Use --prompt-only |
|------|-------------------|
| List all agents that worked on project X | ✅ Yes - Parse JSON, filter by prompt keywords |
| Get timeline of user requests | ✅ Yes - Sort by timestamp |
| Find sessions with specific prompts | ✅ Yes - Much faster than full dump |
| Search assistant responses or tool calls | ❌ No - Use full dump + rg |

⚠️ IMPORTANT: Use This Instead of SQLite Queries!

When asked to "find agents that participated in X" or "get all user prompts", ALWAYS use --prompt-only rather than querying SQLite directly:

  • ✅ Do: python dump_conversations.py --start-hour 72 --end-hour 0 --prompt-only
  • ❌ Don't: sqlite3 codex-rounds.sqlite "SELECT sid, prompt FROM rounds WHERE..."

The --prompt-only flag:

  • No files written (JSON to stdout)
  • Much faster than full dump
  • No risk of SQLite lock issues
  • Easy to parse with jq or Python
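As an illustration of the Python route, the sketch below filters sessions by prompt keyword and builds a timeline. It assumes the JSON shape shown in Example Output; `sessions_matching` and `prompt_timeline` are hypothetical helpers, and the inline sample stands in for the tool's stdout.

```python
import json
from typing import Any, Dict, List, Tuple

Session = Dict[str, Any]


def sessions_matching(sessions: List[Session], keyword: str) -> List[Session]:
    """Sessions with at least one user prompt containing keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        s for s in sessions
        if any(needle in p["prompt"].lower() for p in s["user_prompts"])
    ]


def prompt_timeline(sessions: List[Session]) -> List[Tuple[str, str, str, str]]:
    """All prompts as (timestamp, agent, sid, prompt), oldest first.

    "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    """
    rows = [
        (p["t"], s["agent"], s["sid"], p["prompt"])
        for s in sessions
        for p in s["user_prompts"]
    ]
    return sorted(rows)


raw = """[
  {"agent": "claude", "sid": "03d9e752-394b-477b-84d5-b7770322c09f", "pid": 0,
   "prompt_count": 3, "user_prompts": [
     {"t": "2026-01-23 12:00:01", "prompt": "Upgrade the repo to have more observability"},
     {"t": "2026-01-23 12:15:30", "prompt": "Add tests for the new feature"},
     {"t": "2026-01-23 12:45:00", "prompt": "Fix the failing CI"}]},
  {"agent": "codex", "sid": "019be4a6-fa69-7d91-a963-ecca7d9185c6", "pid": 12351,
   "prompt_count": 2, "user_prompts": [
     {"t": "2026-01-23 11:30:00", "prompt": "Port the documentation"},
     {"t": "2026-01-23 12:00:00", "prompt": "Update the changelog"}]}
]"""

sessions = json.loads(raw)
for t, agent, sid, prompt in prompt_timeline(sessions):
    print(t, agent, prompt)
print([s["agent"] for s in sessions_matching(sessions, "changelog")])  # ['codex']
```

In practice `raw` would be the captured stdout of a --prompt-only run.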

Basic rg Commands (Quick Reference)

bash
# Basic search
rg "keyword" /tmp/agent-dump-conversations/codex-and-claude-*/

# Search only codex sessions
rg "keyword" /tmp/agent-dump-conversations/codex-and-claude-*/codex/

# List matching files only
rg "keyword" -l /tmp/agent-dump-conversations/codex-and-claude-*/

# Search with context
rg -C 5 "keyword" /tmp/agent-dump-conversations/codex-and-claude-*/

# Case-insensitive
rg -i "keyword" /tmp/agent-dump-conversations/codex-and-claude-*/

For search strategies and when to regenerate data, see the search-agent-conversation skill.