Terminal Automation with pilotty
CRITICAL: Argument Positioning
All flags (--name, -s, --format, etc.) MUST come BEFORE positional arguments:
# CORRECT - flags before command/arguments
pilotty spawn --name myapp vim file.txt
pilotty key -s myapp Enter
pilotty snapshot -s myapp --format text
# WRONG - flags after command (they get passed to the app, not pilotty!)
pilotty spawn vim file.txt --name myapp   # FAILS: --name goes to vim
pilotty key Enter -s myapp                # FAILS: -s goes nowhere useful
This is the #1 cause of agent failures. When in doubt: flags first, then command/args.
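One way to make the flags-first rule hard to violate is to route calls through a small shell function that always places pilotty's own flags ahead of the spawned command. This is only a sketch: the name `spawn_named` is hypothetical and not part of pilotty.

```shell
# Hypothetical helper (not part of pilotty): always emits --name before the
# command, so nothing meant for pilotty ends up in the spawned app's arguments.
spawn_named() {
  local name="$1"
  if [ "$#" -lt 2 ]; then
    echo "usage: spawn_named <session-name> <command> [args...]" >&2
    return 2
  fi
  shift
  # Everything left in "$@" is the command and its own arguments.
  pilotty spawn --name "$name" "$@"
}
```

With this in place, `spawn_named editor vim file.txt` runs `pilotty spawn --name editor vim file.txt`.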
Quick start
pilotty spawn vim file.txt        # Start TUI app in managed session
pilotty wait-for "file.txt"       # Wait for app to be ready
pilotty snapshot                  # Get screen state with UI elements
pilotty key i                     # Enter insert mode
pilotty type "Hello, World!"      # Type text
pilotty key Escape                # Exit insert mode
pilotty kill                      # End session
Core workflow
- •Spawn: pilotty spawn <command> starts the app in a background PTY
- •Wait: pilotty wait-for <text> ensures the app is ready
- •Snapshot: pilotty snapshot returns screen state with detected UI elements
- •Understand: Parse elements[] to identify buttons, inputs, toggles
- •Interact: Use keyboard commands (key, type) to navigate and interact
- •Re-snapshot: Check content_hash to detect screen changes
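The workflow steps can be strung together in a single function. The sketch below is illustrative only: `drive_once` is a hypothetical name, and it assumes `jq` is installed and that `spawn` and `wait-for` exit non-zero on failure.

```shell
# Hypothetical driver for one pass through the workflow.
# Usage: drive_once <session> <ready-text> <key> <command> [args...]
drive_once() {
  local session="$1" ready="$2" key="$3" before
  shift 3
  # Spawn the app in a named session (flags before the command).
  pilotty spawn --name "$session" "$@" || return 1
  # Wait until the expected text shows that the app is ready.
  pilotty wait-for -s "$session" "$ready" || return 1
  # Snapshot and remember the current screen hash.
  before=$(pilotty snapshot -s "$session" | jq -r '.content_hash')
  # Interact through the keyboard.
  pilotty key -s "$session" "$key"
  # Re-snapshot, blocking until the hash differs from the remembered one.
  pilotty snapshot -s "$session" --await-change "$before"
}
```

For example, `drive_once editor "file.txt" i vim file.txt` spawns vim, waits for the file name to appear, presses `i`, and prints the snapshot taken after the screen changes.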
Commands
Session management
pilotty spawn <command>             # Start TUI app (e.g., pilotty spawn htop)
pilotty spawn --name myapp <cmd>    # Start with custom session name (--name before command)
pilotty kill                        # Kill default session
pilotty kill -s myapp               # Kill specific session
pilotty list-sessions               # List all active sessions
pilotty daemon                      # Manually start daemon (usually auto-starts)
pilotty shutdown                    # Stop daemon and all sessions
pilotty examples                    # Show end-to-end workflow example
Screen capture
pilotty snapshot                    # Full JSON with text content and elements
pilotty snapshot --format compact   # JSON without text field
pilotty snapshot --format text      # Plain text with cursor indicator
pilotty snapshot -s myapp           # Snapshot specific session
# Wait for screen to change (eliminates need for sleep!)
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH               # Block until screen changes
pilotty snapshot --await-change $HASH --settle 50   # Wait for 50ms stability
Input
pilotty type "hello"                  # Type text at cursor
pilotty type -s myapp "text"          # Type in specific session
pilotty key Enter                     # Press Enter
pilotty key Ctrl+C                    # Send interrupt
pilotty key Escape                    # Send Escape
pilotty key Tab                       # Send Tab
pilotty key F1                        # Function key
pilotty key Alt+F                     # Alt combination
pilotty key Up                        # Arrow key
pilotty key -s myapp Ctrl+S           # Key in specific session
# Key sequences (space-separated, sent in order)
pilotty key "Ctrl+X m"                # Emacs chord: Ctrl+X then m
pilotty key "Escape : w q Enter"      # vim :wq sequence
pilotty key --delay 50 "a b c"        # Send a, b, c with 50ms delay (flag before keys)
pilotty key -s myapp "Tab Tab Enter"  # Sequence in specific session
Interaction
pilotty click 5 10              # Click at row 5, col 10
pilotty click -s myapp 10 20    # Click in specific session
pilotty scroll up               # Scroll up 1 line
pilotty scroll down 5           # Scroll down 5 lines
pilotty scroll -s myapp up 10   # Scroll in specific session (flag before arguments)
Terminal control
pilotty resize 120 40              # Resize terminal to 120 cols x 40 rows
pilotty resize -s myapp 80 24      # Resize specific session
pilotty wait-for "Ready"           # Wait for text to appear (30s default)
pilotty wait-for -r "Error"        # Wait for regex pattern
pilotty wait-for -t 5000 "Done"    # Wait with 5s timeout
pilotty wait-for -s editor "~"     # Wait in specific session
Global options
| Option | Description |
|---|---|
| -s, --session <name> | Target specific session (default: "default") |
| --format <fmt> | Snapshot format: full, compact, text |
| -t, --timeout <ms> | Timeout for wait-for and await-change (default: 30000) |
| -r, --regex | Treat wait-for pattern as regex |
| --name <name> | Session name for spawn command |
| --delay <ms> | Delay between keys in a sequence (default: 0, max: 10000) |
| --await-change <hash> | Block snapshot until content_hash differs |
| --settle <ms> | Wait for screen to be stable for this many ms (default: 0) |
Environment variables
PILOTTY_SESSION="mysession"          # Default session name
PILOTTY_SOCKET_DIR="/tmp/pilotty"    # Override socket directory
RUST_LOG="debug"                     # Enable debug logging
Snapshot Output
The snapshot command returns structured JSON with detected UI elements:
{
"snapshot_id": 42,
"size": { "cols": 80, "rows": 24 },
"cursor": { "row": 5, "col": 10, "visible": true },
"text": "Settings:\n [x] Notifications [ ] Dark mode\n [Save] [Cancel]",
"elements": [
{ "kind": "toggle", "row": 1, "col": 2, "width": 3, "text": "[x]", "confidence": 1.0, "checked": true },
{ "kind": "toggle", "row": 1, "col": 20, "width": 3, "text": "[ ]", "confidence": 1.0, "checked": false },
{ "kind": "button", "row": 2, "col": 2, "width": 6, "text": "[Save]", "confidence": 0.8 },
{ "kind": "button", "row": 2, "col": 10, "width": 8, "text": "[Cancel]", "confidence": 0.8 }
],
"content_hash": 12345678901234567890
}
Use --format text for a plain text view with cursor indicator:
--- Terminal 80x24 | Cursor: (5, 10) ---
bash-3.2$ [_]
The [_] shows cursor position. Use the text content to understand screen state and navigate with keyboard commands.
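When only the text view is available, the cursor position can be recovered from the header line. The sketch below assumes the header always has the shape shown in the example above, `--- Terminal 80x24 | Cursor: (5, 10) ---`, with the row first and the column second; `cursor_from_text` is a hypothetical name, and the JSON `cursor` field remains the more dependable source.

```shell
# Hypothetical helper: read `--format text` output on stdin and print "row col".
# Assumes the first line looks like: --- Terminal 80x24 | Cursor: (5, 10) ---
cursor_from_text() {
  # Only line 1 is inspected; nothing is printed if the pattern does not match.
  sed -n '1s/.*Cursor: (\([0-9][0-9]*\), *\([0-9][0-9]*\)).*/\1 \2/p'
}
```

Usage: `pilotty snapshot --format text | cursor_from_text`.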
Element Detection
pilotty automatically detects interactive UI elements in terminal applications. Elements provide read-only context to help understand UI structure.
Element Kinds
| Kind | Detection Patterns | Confidence | Fields |
|---|---|---|---|
| toggle | [x], [ ], [*], ☑, ☐ | 1.0 | checked: bool |
| button | Inverse video, [OK], <Cancel>, (Submit) | 1.0 / 0.8 | focused: bool (if true) |
| input | Cursor position, ____ underscores | 1.0 / 0.6 | focused: bool (if true) |
Element Fields
| Field | Type | Description |
|---|---|---|
| kind | string | Element type: button, input, or toggle |
| row | number | Row position (0-based from top) |
| col | number | Column position (0-based from left) |
| width | number | Width in terminal cells (CJK chars = 2) |
| text | string | Text content of the element |
| confidence | number | Detection confidence (0.0-1.0) |
| focused | bool | Whether element has focus (only present if true) |
| checked | bool | Toggle state (only present for toggles) |
Confidence Levels
| Confidence | Meaning |
|---|---|
| 1.0 | High confidence: Cursor position, inverse video, checkbox patterns |
| 0.8 | Medium confidence: Bracket patterns [OK], <Cancel> |
| 0.6 | Lower confidence: Underscore input fields ____ |
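The confidence value makes it easy to ignore weaker detections. As a minimal sketch, assuming `jq` is installed, the hypothetical helper below keeps only elements at or above a threshold; the 0.8 default is an arbitrary choice for illustration.

```shell
# Hypothetical helper: read snapshot JSON on stdin and print, one compact JSON
# object per line, the elements whose confidence is at least the threshold.
# Usage: pilotty snapshot | confident_elements [min-confidence]
confident_elements() {
  local min="${1:-0.8}"
  # --argjson passes the threshold to jq as a number rather than a string.
  jq -c --argjson min "$min" '.elements[] | select(.confidence >= $min)'
}
```

Calling `confident_elements 1.0` would keep only cursor, inverse-video, and checkbox detections, according to the table above.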
Wait for Screen Changes (Recommended)
Stop guessing sleep durations! Use --await-change to wait for the screen to actually update:
# Capture baseline hash
HASH=$(pilotty snapshot | jq '.content_hash')
# Perform action
pilotty key Enter
# Wait for screen to change (blocks until hash differs)
pilotty snapshot --await-change $HASH
# Or wait for screen to stabilize (for apps that render progressively)
pilotty snapshot --await-change $HASH --settle 100
Flags:
| Flag | Description |
|---|---|
| --await-change <HASH> | Block until content_hash differs from this value |
| --settle <MS> | After change detected, wait for screen to be stable for MS |
| -t, --timeout <MS> | Maximum wait time (default: 30000) |
Why this is better than sleep:
- •sleep 1 is a guess - too short causes race conditions, too long slows automation
- •--await-change waits exactly as long as needed - no more, no less
- •--settle handles apps that render progressively (show partial, then complete)
Waiting for Streaming AI Responses
When interacting with AI-powered TUIs that stream responses (opencode, for example), use a longer --settle time, since the screen keeps updating as tokens arrive:
# 1. Capture hash before sending prompt
HASH=$(pilotty snapshot -s myapp | jq -r '.content_hash')
# 2. Type prompt and submit
pilotty type -s myapp "write me a poem about ai agents"
pilotty key -s myapp Enter
# 3. Wait for streaming response to complete
#    - Use longer settle (2-3s) since AI apps pause between chunks
#    - Extend timeout for long responses (60s+)
pilotty snapshot -s myapp --await-change "$HASH" --settle 3000 -t 60000
# 4. Response may be scrolled - scroll up if needed to see full output
pilotty scroll -s myapp up 10
pilotty snapshot -s myapp --format text
Key parameters for streaming:
- •--settle 2000-3000: AI responses have pauses between chunks; 2-3 seconds ensures streaming is truly done
- •-t 60000: Extend timeout beyond the 30s default for longer generations
- •The settle timer resets on each screen change, so it naturally waits until streaming stops
Manual Change Detection
For manual polling (not recommended), use content_hash directly:
# Get initial state
SNAP1=$(pilotty snapshot)
HASH1=$(echo "$SNAP1" | jq -r '.content_hash')
# Perform action
pilotty key Tab
# Check if screen changed
SNAP2=$(pilotty snapshot)
HASH2=$(echo "$SNAP2" | jq -r '.content_hash')
if [ "$HASH1" != "$HASH2" ]; then
  echo "Screen changed - re-analyze elements"
fi
Using Elements Effectively
Elements are read-only context for understanding the UI. Use keyboard navigation for reliable interaction:
# 1. Get snapshot to understand UI structure
pilotty snapshot | jq '.elements'
# Output shows toggles (checked/unchecked) and buttons with positions
# 2. Navigate and interact with keyboard (reliable approach)
pilotty key Tab      # Move to next element
pilotty key Space    # Toggle checkbox
pilotty key Enter    # Activate button
# 3. Verify state changed
pilotty snapshot | jq '.elements[] | select(.kind == "toggle")'
Key insight: Use elements to understand WHAT is on screen, use keyboard to interact with it.
Navigation Approach
pilotty uses keyboard-first navigation, just like a human would:
# 1. Take snapshot to see the screen
pilotty snapshot --format text
# 2. Navigate using keyboard
pilotty key Tab       # Move to next element
pilotty key Enter     # Activate/select
pilotty key Escape    # Cancel/back
pilotty key Up        # Move up in list/menu
pilotty key Space     # Toggle checkbox
# 3. Type text when needed
pilotty type "search term"
pilotty key Enter
# 4. Click at coordinates for mouse-enabled TUIs
pilotty click 5 10    # Click at row 5, col 10
Key insight: Parse the snapshot text and elements to understand what's on screen, then use keyboard commands to navigate. This works reliably across all TUI applications.
Example: Edit file with vim
# 1. Spawn vim
pilotty spawn --name editor vim /tmp/hello.txt
# 2. Wait for vim to load and capture baseline hash
pilotty wait-for -s editor "hello.txt"
HASH=$(pilotty snapshot -s editor | jq '.content_hash')
# 3. Enter insert mode
pilotty key -s editor i
# 4. Type content
pilotty type -s editor "Hello from pilotty!"
# 5. Wait for screen to update, then exit (no sleep needed!)
pilotty snapshot -s editor --await-change $HASH --settle 50
pilotty key -s editor "Escape : w q Enter"
# 6. Verify session ended
pilotty list-sessions
Alternative using individual keys:
pilotty key -s editor Escape
pilotty type -s editor ":wq"
pilotty key -s editor Enter
Example: Dialog checklist interaction
# 1. Spawn dialog checklist (--name before command)
pilotty spawn --name opts dialog --checklist "Select features:" 12 50 4 \
"notifications" "Push notifications" on \
"darkmode" "Dark mode theme" off \
"autosave" "Auto-save documents" on \
"telemetry" "Usage analytics" off
# 2. Wait for dialog to render (use --settle, not sleep!)
pilotty snapshot -s opts --settle 200 # Wait for initial render to stabilize
# 3. Get snapshot and examine elements, capture hash
SNAP=$(pilotty snapshot -s opts)
echo "$SNAP" | jq '.elements[] | select(.kind == "toggle")'
HASH=$(echo "$SNAP" | jq '.content_hash')
# 4. Navigate to "darkmode" and toggle it
pilotty key -s opts Down # Move to second option
pilotty key -s opts Space # Toggle it on
# 5. Wait for change and verify
pilotty snapshot -s opts --await-change $HASH | jq '.elements[] | select(.kind == "toggle") | {text, checked}'
# 6. Confirm selection
pilotty key -s opts Enter
# 7. Clean up
pilotty kill -s opts
Example: Form filling with elements
# 1. Spawn a form application
pilotty spawn --name form my-form-app
# 2. Get snapshot to understand form structure
pilotty snapshot -s form | jq '.elements'
# Shows inputs, toggles, and buttons with positions for click command
# 3. Tab to first input (likely already focused)
pilotty type -s form "myusername"
# 4. Tab to password field
pilotty key -s form Tab
pilotty type -s form "mypassword"
# 5. Tab to remember me and toggle
pilotty key -s form Tab
pilotty key -s form Space
# 6. Tab to Login and activate
pilotty key -s form Tab
pilotty key -s form Enter
# 7. Check result
pilotty snapshot -s form --format text
Example: Monitor with htop
# 1. Spawn htop
pilotty spawn --name monitor htop
# 2. Wait for display
pilotty wait-for -s monitor "CPU"
# 3. Take snapshot to see current state
pilotty snapshot -s monitor --format text
# 4. Send commands
pilotty key -s monitor F9    # Kill menu
pilotty key -s monitor q     # Quit
# 5. Kill session
pilotty kill -s monitor
Example: Interact with AI TUI (opencode, etc.)
AI-powered TUIs stream responses, requiring special handling:
# 1. Spawn the AI app
pilotty spawn --name ai opencode
# 2. Wait for the prompt to be ready
pilotty wait-for -s ai "Ask anything" -t 15000
# 3. Capture baseline hash
HASH=$(pilotty snapshot -s ai | jq -r '.content_hash')
# 4. Type prompt and submit
pilotty type -s ai "explain the architecture of this codebase"
pilotty key -s ai Enter
# 5. Wait for streaming response to complete
#    - settle=3000: Wait 3s of no changes to ensure streaming is done
#    - timeout=60000: Allow up to 60s for long responses
pilotty snapshot -s ai --await-change "$HASH" --settle 3000 -t 60000 --format text
# 6. If response is long and scrolled, scroll up to see full output
pilotty scroll -s ai up 20
pilotty snapshot -s ai --format text
# 7. Clean up
pilotty kill -s ai
Gotchas with AI apps:
- •Use --settle 2000-3000 because AI responses pause between chunks
- •Extend timeout with -t 60000 for complex prompts
- •Long responses may scroll the terminal; use scroll up to see the beginning
- •The settle timer resets on each screen update, so it waits for true completion
Sessions
Each session is isolated with its own:
- •PTY (pseudo-terminal)
- •Screen buffer
- •Child process
# Run multiple apps (--name must come before the command)
pilotty spawn --name monitoring htop
pilotty spawn --name editor vim file.txt
# Target specific session
pilotty snapshot -s monitoring
pilotty key -s editor Ctrl+S
# List all
pilotty list-sessions
# Kill specific
pilotty kill -s editor
The first session spawned without --name is automatically named default.
Important: The --name flag must come before the command. Everything after the command is passed as arguments to that command.
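Because each session owns a PTY and a child process, it helps to make sure a session is killed even when the steps run against it fail. The sketch below shows one way to do that; `with_session` and the callback convention are hypothetical, not part of pilotty.

```shell
# Hypothetical helper: spawn a named session, run a callback function with the
# session name as its only argument, then always kill the session.
# Usage: with_session <session> <callback> <command> [args...]
with_session() {
  local name="$1" callback="$2" status=0
  shift 2
  pilotty spawn --name "$name" "$@" || return 1
  # Remember the callback's exit status instead of aborting before cleanup.
  "$callback" "$name" || status=$?
  pilotty kill -s "$name"
  return "$status"
}
```

A callback could be as small as `show() { pilotty snapshot -s "$1" --format text; }`, used as `with_session monitoring show htop`.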
Daemon Architecture
pilotty uses a background daemon for session management:
- •Auto-start: Daemon starts on first command
- •Auto-stop: Shuts down after 5 minutes with no sessions
- •Session cleanup: Sessions removed when process exits (within 500ms)
- •Shared state: Multiple CLI calls share sessions
You rarely need to manage the daemon manually.
Error Handling
Errors include actionable suggestions:
{
"code": "SESSION_NOT_FOUND",
"message": "Session 'abc123' not found",
"suggestion": "Run 'pilotty list-sessions' to see available sessions"
}
{
"code": "SPAWN_FAILED",
"message": "Failed to spawn process: command not found",
"suggestion": "Check that the command exists and is in PATH"
}
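A script can turn these error objects into a one-line message for logs. This is a sketch under assumptions: it relies on `jq`, it uses only the `code`, `message`, and `suggestion` fields shown above, and it reads the JSON from standard input because this document does not say which stream pilotty writes errors to. `describe_error` is a hypothetical name.

```shell
# Hypothetical helper: read one error object on stdin and print a summary line.
# Falls back to "none" when the suggestion field is missing or null.
describe_error() {
  jq -r '"\(.code): \(.message) (hint: \(.suggestion // "none"))"'
}
```

Fed the first error above, it prints the code, the message, and the suggested `pilotty list-sessions` command on a single line.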
Common Patterns
Reliable action + wait (recommended)
# The pattern: capture hash, act, await change
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH --settle 50
# This replaces fragile patterns like:
# pilotty key Enter && sleep 1 && pilotty snapshot   # BAD: guessing
Wait then act
pilotty spawn my-app
pilotty wait-for "Ready"    # Ensure app is ready
pilotty snapshot            # Then snapshot
Check state before action
pilotty snapshot --format text | grep "Error"   # Check for errors
pilotty key Enter                               # Then proceed
Check for specific element
# Check if the first toggle is checked
pilotty snapshot | jq 'first(.elements[] | select(.kind == "toggle")) | {text, checked}'
# Find element at specific position
pilotty snapshot | jq '.elements[] | select(.row == 5 and .col == 10)'
Retry on timeout
pilotty wait-for -t 5000 "Ready" || {
pilotty snapshot --format text # Check what's on screen
# Adjust approach based on actual state
}
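The same idea extends to a bounded retry loop. The sketch below assumes, as the `||` in the pattern above implies, that `wait-for` exits non-zero on timeout; `wait_with_retries` is a hypothetical name.

```shell
# Hypothetical helper: retry wait-for up to <attempts> times, printing the
# screen to stderr after each failed attempt so the cause is visible.
# Usage: wait_with_retries <attempts> <timeout-ms> <text>
wait_with_retries() {
  local attempts="$1" timeout_ms="$2" text="$3" i=1
  while [ "$i" -le "$attempts" ]; do
    if pilotty wait-for -t "$timeout_ms" "$text"; then
      return 0
    fi
    # Show what is actually on screen before trying again.
    pilotty snapshot --format text >&2
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_with_retries 3 5000 "Ready"` waits up to three times, five seconds each, and returns non-zero if the text never appears.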
Deep-dive Documentation
For detailed patterns and edge cases, see:
| Reference | Description |
|---|---|
| references/session-management.md | Multi-session patterns, isolation, cleanup |
| references/key-input.md | Complete key combinations reference |
| references/element-detection.md | Detection rules, confidence, patterns |
Ready-to-use Templates
Executable workflow scripts:
| Template | Description |
|---|---|
| templates/vim-workflow.sh | Edit file with vim, save, exit |
| templates/dialog-interaction.sh | Handle dialog/whiptail prompts |
| templates/multi-session.sh | Parallel TUI orchestration |
| templates/element-detection.sh | Element detection demo |
Usage:
./templates/vim-workflow.sh /tmp/myfile.txt "File content here"
./templates/dialog-interaction.sh
./templates/multi-session.sh
./templates/element-detection.sh