Dump Agent Data for rg

[Created by Claude: 7f04f921-3ad9-48b4-a951-f8227c466a3e]

Relationship to Other Skills

Used by: search-agent-conversation (this skill is its prerequisite)

This skill provides technical details about the dump tool. For search strategies and when to use this tool, see search-agent-conversation.

❌ CRITICAL WARNING

NEVER grep sse_lines.jsonl directly!

✅ Always use this dump tool first, then rg the output.

Quick Start

bash

# Last hour (both agents, auto-generated unique dir)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --start-hour 1 --end-hour 0

# Last 24 hours, codex only
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --start-hour 24 --end-hour 0 --agent codex

# Precise time window (local naive time)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-23 10:00:00" --to-time "2026-01-23 12:00:00"

# From specific time to NOW (note the printed --to-time!)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-23 10:00:00"

Sample Output

code

Time window: 2026-01-23 12:00:00 → 2026-01-23 12:42:47 (42m 47s)
# --to-time defaulted to: 2026-01-23 12:42:47 (use this for next --since-time)
Output:      /tmp/agent-dump-conversations/codex-and-claude-0.7h-019be960-66bc-7b2d-a0d3-3ed0f1d36221

Codex   0 sessions  →  /tmp/agent-dump-conversations/.../codex/
Claude  1 session   →  /tmp/agent-dump-conversations/.../claude/

codex-and-claude-0.7h-019be960-66bc-7b2d-a0d3-3ed0f1d36221/
├── codex/
└── claude/
    └── 7f04f921-3ad9-48b4-a951-f8227c466a3e-0/
        └── conversation.txt

# Search conversations:
rg "vscode extension" /tmp/agent-dump-conversations/codex-and-claude-0.7h-...
# Reminder: Do NOT search sse_lines.jsonl directly, use this dump instead!

Total size: 0.11 MB

Advanced Usage: Git Diff Workflow

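⚠️ MUST use --since-time (with an explicit or defaulted --to-time) for this workflow

WHY: Relative hours (--start-hour) are measured from the moment each dump runs, so the windows of consecutive dumps do not share an exact boundary. If a dump takes 10 seconds, you can lose 10 seconds of conversations between runs.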

Step-by-Step

bash

# Step 1: Initial 24h dump (note the exact --to-time printed!)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-22 12:00:00" \
  --to-time "2026-01-23 12:00:00"

# Output shows: Time window: 2026-01-22 12:00:00 → 2026-01-23 12:00:00

cd /tmp/agent-dump-conversations/codex-and-claude-24h-xxxxx
git init && git add . && git commit -m "24h baseline"

# Step 2: Later dump (use EXACT --to-time from previous as --since-time)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-23 12:00:00"
# Omit --to-time = dumps up to NOW
# Output shows: # --to-time defaulted to: 2026-01-23 14:30:15 (use this for next --since-time)

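# Copy new data into the git repo, preserving the codex/ and claude/ layout.
# NEW_DUMP is the "Output:" path printed by Step 2 (placeholder shown here).
NEW_DUMP=/tmp/agent-dump-conversations/codex-and-claude-xxxxx
mkdir -p codex claude
cp -r "$NEW_DUMP"/codex/. codex/
cp -r "$NEW_DUMP"/claude/. claude/

# Step 3: stage everything; the staged diff shows only NEW conversations (no loss!)
# Plain `git diff` skips untracked files, so brand-new sessions would not appear.
git add -A
git diff --cached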

Key insight: The printed --to-time defaulted to: message tells you the exact cutoff. Use that value as --since-time for the next dump to ensure no gaps.
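
The hand-off between dumps can be automated by reading the cutoff out of the tool's printed output. The following is a minimal sketch, assuming the message keeps the exact wording shown in Sample Output; the function name next_since_time is hypothetical and not part of the dump tool.

```python
import re
from typing import Optional

# Pattern based on the sample output line:
#   "# --to-time defaulted to: 2026-01-23 12:42:47 (use this for next --since-time)"
# Assumption: the real tool prints this line verbatim, as in Sample Output.
_TO_TIME_RE = re.compile(
    r"--to-time defaulted to: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def next_since_time(dump_output: str) -> Optional[str]:
    """Return the printed cutoff timestamp, or None if the line is absent."""
    match = _TO_TIME_RE.search(dump_output)
    return match.group(1) if match else None


sample = (
    "Time window: 2026-01-23 12:00:00 -> 2026-01-23 12:42:47 (42m 47s)\n"
    "# --to-time defaulted to: 2026-01-23 12:42:47 (use this for next --since-time)\n"
)
print(next_since_time(sample))  # 2026-01-23 12:42:47
```

A None result means --to-time was passed explicitly, in which case that explicit value is already the next --since-time.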

Script Locations

Wrapper (recommended)

code

~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py

Individual scripts

code

~/swe/telemetry_projects/agent_dump/launchers/dump_codex_conversations.py
~/swe/telemetry_projects/agent_dump/launchers/dump_claude_conversations.py

Symlinks exist in both projects

code

~/swe/telemetry_projects/codex_sqlite/launchers/dump_conversations.py
~/swe/telemetry_projects/claude_sqlite/launchers/dump_conversations.py

Command Reference

Time Modes (mutually exclusive)

Mode	Flags	Example
Relative	`--start-hour` + `--end-hour`	`--start-hour 24 --end-hour 0` (last 24h)
Precise	`--since-time` [+ `--to-time`]	`--since-time "2026-01-23 10:00:00"`

Time Formats

Flag	Format	Default	Example
`--start-hour`	Float	Required in relative mode	`24`, `1.5`
`--end-hour`	Float	Required in relative mode	`0`, `0.5`
`--since-time`	`YYYY-MM-DD HH:MM:SS`	Required in precise mode	`"2026-01-23 10:00:00"`
`--to-time`	`YYYY-MM-DD HH:MM:SS`	now	`"2026-01-23 12:00:00"`

Note: Times are naive local times: they are interpreted in the machine's local timezone and written without a timezone suffix.
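
To move from relative mode to precise mode (for example, before starting the Git Diff Workflow), the relative window can be converted into the timestamp format above. This is a sketch under one assumption: that --start-hour and --end-hour both count hours back from "now", as the "last 24h" example suggests. The helper relative_to_precise is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # renders as YYYY-MM-DD HH:MM:SS


def relative_to_precise(
    start_hour: float, end_hour: float, now: datetime
) -> Tuple[str, str]:
    """Convert hours-ago bounds into (--since-time, --to-time) strings.

    Assumption: both values count hours back from `now`.
    """
    since = now - timedelta(hours=start_hour)
    to = now - timedelta(hours=end_hour)
    if since >= to:
        raise ValueError("start_hour must be larger than end_hour")
    return since.strftime(TIME_FORMAT), to.strftime(TIME_FORMAT)


# A naive local datetime, matching the tool's "local naive" convention.
now = datetime(2026, 1, 23, 12, 0, 0)
print(relative_to_precise(24, 0, now))
# ('2026-01-22 12:00:00', '2026-01-23 12:00:00')
print(relative_to_precise(1.5, 0.5, now))
# ('2026-01-23 10:30:00', '2026-01-23 11:30:00')
```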

Other Flags

Flag	Description	Example
`--agent`	Which agent(s): `codex`, `claude`, `both` (default: `both`)	`--agent codex`
`--data-dir`	Custom output directory (default: auto-generated)	`--data-dir /tmp/my-dump`
`--sid`	Filter by session ID suffix (can specify multiple)	`--sid abc123 --sid def456`
`--include-reasoning`	Include thinking/reasoning blocks (disabled by default)	`--include-reasoning`
`-q, --quiet`	Suppress output	`-q`

About Reasoning/Thinking Content

Disabled by default because reasoning is transient and exploratory:

•Agents exploring possibilities, not final decisions
•Often discarded or revised in final output
•Can add noise when searching for what agents actually did

When to include (--include-reasoning):

•Debugging agent decision-making process
•Understanding why agents chose specific approaches
•Analyzing agent behavior patterns

Note: Tool calls are included by default (they show what agents actually executed).

Output Structure

code

/tmp/agent-dump-conversations/codex-and-claude-24h-{uuid}/
├── codex/
│   ├── {sid}-{pid}/
│   │   └── conversation.txt
│   └── ...
└── claude/
    ├── {sid}-{pid}/
    │   └── conversation.txt
    └── ...

Each conversation.txt contains:

•Agent type (CODEX or CLAUDE)
•Session ID and PID
•Round-by-round conversations
•User prompts
•Assistant responses
•Timestamps
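
When a script needs to enumerate a dump rather than search it, the layout above can be walked directly. The sketch below assumes only the directory structure shown (agent folder, then {sid}-{pid}, then conversation.txt); iter_sessions is a hypothetical helper and the example path is a placeholder.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_sessions(dump_root: Path) -> Iterator[Tuple[str, str, int, Path]]:
    """Yield (agent, sid, pid, conversation_path) for each dumped session."""
    for agent in ("codex", "claude"):
        agent_dir = dump_root / agent
        if not agent_dir.is_dir():
            continue
        for session_dir in sorted(agent_dir.iterdir()):
            conversation = session_dir / "conversation.txt"
            if not conversation.is_file():
                continue
            # Names look like "{sid}-{pid}". The sid itself contains dashes,
            # so split on the last dash only.
            sid, _, pid = session_dir.name.rpartition("-")
            if not sid or not pid.isdigit():
                continue  # not a session directory in the expected shape
            yield agent, sid, int(pid), conversation


# Placeholder path: substitute the "Output:" directory printed by the tool.
for agent, sid, pid, path in iter_sessions(Path("/tmp/agent-dump-conversations/example-dump")):
    print(agent, sid, pid, path)
```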

⚠️ IMPORTANT: Direct SQLite Querying Protocol

Agents are strongly discouraged from querying the SQLite databases directly, especially for metadata-only queries. Use the dump tool instead.

If You Must Query SQLite Directly

When querying ~/centralized-logs/sqlite-dbs/codex-rounds.sqlite or ~/centralized-logs/sqlite-dbs/claude-rounds.sqlite:

Timeout Rules:

•Maximum timeout: 16 seconds
•On each timeout: Halve the timeout (16s → 8s → 4s → 2s → 1s)
•After multiple timeouts: Stop and report the issue

Background Execution (Recommended):

•Run queries in a background terminal if possible
•Background terminals must also comply with the timeout rule
•Use timeout 16s sqlite3 ... to enforce limits

Example:

bash

# With timeout protection
timeout 16s sqlite3 ~/centralized-logs/sqlite-dbs/codex-rounds.sqlite "SELECT COUNT(*) FROM sessions"

# If timeout occurs, retry with 8s
timeout 8s sqlite3 ~/centralized-logs/sqlite-dbs/codex-rounds.sqlite "SELECT COUNT(*) FROM sessions"

Why this matters: SQLite can lock under concurrent writes. The dump tool is optimized for safe, read-only access to conversation data.
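
The timeout rules can also be enforced from a script instead of by hand. The following is a rough sketch, not part of the dump tool: it assumes the retry sequence ends after the 1s attempt (the text above only says "after multiple timeouts"), and both function names are hypothetical.

```python
import subprocess
from typing import List, Optional, Sequence


def timeout_schedule(max_timeout: float = 16.0, min_timeout: float = 1.0) -> List[float]:
    """Timeouts to try in order: start at the maximum and halve each time."""
    schedule = []
    current = max_timeout
    while current >= min_timeout:
        schedule.append(current)
        current /= 2
    return schedule


def run_with_halving_timeout(
    cmd: Sequence[str], max_timeout: float = 16.0, min_timeout: float = 1.0
) -> Optional[subprocess.CompletedProcess]:
    """Run cmd, halving the timeout after each timeout; None if all attempts fail."""
    for seconds in timeout_schedule(max_timeout, min_timeout):
        try:
            return subprocess.run(
                list(cmd), capture_output=True, text=True, timeout=seconds
            )
        except subprocess.TimeoutExpired:
            print(f"timed out after {seconds:g}s")
    # Assumption: once the shortest timeout is exhausted, stop and report.
    return None


print(timeout_schedule())  # [16.0, 8.0, 4.0, 2.0, 1.0]
```

A call would pass the sqlite3 command as a list, for example ["sqlite3", db_path, query]; a None return value is the signal to stop and report the issue rather than retry again.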

🚀 FAST QUERY: --prompt-only Mode

When you only need user prompts with timestamps, use --prompt-only instead of querying SQLite directly:

bash

# Get all prompts from last 24 hours (JSON to stdout)
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --start-hour 24 --end-hour 0 --prompt-only

# Get codex-only prompts from precise time window
python ~/swe/telemetry_projects/agent_dump/launchers/dump_conversations.py \
  --since-time "2026-01-23 10:00:00" --agent codex --prompt-only

Example Output

json

[
  {
    "agent": "claude",
    "sid": "03d9e752-394b-477b-84d5-b7770322c09f",
    "pid": 0,
    "prompt_count": 3,
    "user_prompts": [
      {"t": "2026-01-23 12:00:01", "prompt": "Upgrade the repo to have more observability"},
      {"t": "2026-01-23 12:15:30", "prompt": "Add tests for the new feature"},
      {"t": "2026-01-23 12:45:00", "prompt": "Fix the failing CI"}
    ]
  },
  {
    "agent": "codex",
    "sid": "019be4a6-fa69-7d91-a963-ecca7d9185c6",
    "pid": 12351,
    "prompt_count": 2,
    "user_prompts": [
      {"t": "2026-01-23 11:30:00", "prompt": "Port the documentation"},
      {"t": "2026-01-23 12:00:00", "prompt": "Update the changelog"}
    ]
  }
]

Use Cases

Task	Use `--prompt-only`
List all agents that worked on project X	✅ Yes - Parse JSON, filter by prompt keywords
Get timeline of user requests	✅ Yes - Sort by timestamp
Find sessions with specific prompts	✅ Yes - Much faster than full dump
Search assistant responses or tool calls	❌ No - Use full dump + rg
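
The first two tasks in the table can be sketched in a few lines once the JSON is loaded. This example assumes the field names shown in Example Output (agent, sid, user_prompts, t, prompt); the helper names are hypothetical, and the inline sample is a shortened copy of that output.

```python
import json
from typing import Any, Dict, List, Tuple


def sessions_mentioning(dump: List[Dict[str, Any]], keyword: str) -> List[Tuple[str, str]]:
    """Return (agent, sid) for sessions with a prompt containing keyword."""
    needle = keyword.lower()
    return [
        (session["agent"], session["sid"])
        for session in dump
        if any(needle in p["prompt"].lower() for p in session["user_prompts"])
    ]


def prompt_timeline(dump: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, agent, prompt) rows in chronological order."""
    rows = [
        (p["t"], session["agent"], p["prompt"])
        for session in dump
        for p in session["user_prompts"]
    ]
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    return sorted(rows)


raw = """[
  {"agent": "claude", "sid": "03d9e752-394b-477b-84d5-b7770322c09f", "pid": 0,
   "prompt_count": 1,
   "user_prompts": [{"t": "2026-01-23 12:15:30", "prompt": "Add tests for the new feature"}]},
  {"agent": "codex", "sid": "019be4a6-fa69-7d91-a963-ecca7d9185c6", "pid": 12351,
   "prompt_count": 1,
   "user_prompts": [{"t": "2026-01-23 11:30:00", "prompt": "Port the documentation"}]}
]"""
dump = json.loads(raw)
print(sessions_mentioning(dump, "documentation"))
# [('codex', '019be4a6-fa69-7d91-a963-ecca7d9185c6')]
print(prompt_timeline(dump)[0])
# ('2026-01-23 11:30:00', 'codex', 'Port the documentation')
```

With real data, replace the inline sample with json.load(sys.stdin) and pipe the --prompt-only output into the script.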

⚠️ IMPORTANT: Use This Instead of SQLite Queries!

When asked to "find agents that participated in X" or "get all user prompts", ALWAYS use --prompt-only rather than querying SQLite directly:

•✅ python dump_conversations.py --start-hour 72 --end-hour 0 --prompt-only
•❌ sqlite3 codex-rounds.sqlite "SELECT sid, prompt FROM rounds WHERE..."

With the --prompt-only flag, the tool:

•Writes no files (JSON goes to stdout)
•Runs much faster than a full dump
•Carries no risk of SQLite lock issues
•Produces output that is easy to parse with jq or Python

Basic rg Commands (Quick Reference)

bash

# Basic search
rg "keyword" /tmp/agent-dump-conversations/codex-and-claude-*/

# Search only codex sessions
rg "keyword" /tmp/agent-dump-conversations/codex-and-claude-*/codex/

# List matching files only
rg "keyword" -l /tmp/agent-dump-conversations/codex-and-claude-*/

# Search with context
rg -C 5 "keyword" /tmp/agent-dump-conversations/codex-and-claude-*/

# Case-insensitive
rg -i "keyword" /tmp/agent-dump-conversations/codex-and-claude-*/

For search strategies and when to regenerate data, see the search-agent-conversation skill.