iTerm2 Driver Skill
Drive the iTerm2 terminal emulator via its Python API. Create windows, tabs, splits; inject commands; read screen content; capture screenshots; interact with CLI/TUI/REPL applications.
IMPORTANT: When Building or Modifying CLI/TUI Apps
When the user is building a new CLI/TUI, adding a feature, fixing a bug, or creating a PRD/plan for a terminal app, always include iTerm2 visual testing in:
- PRDs: Add an "L4 Visual Testing" section with specific visual test cases
- Plans: Include a phase/task for creating `.claude/automations/` test scripts
- Test strategies: Add visual regression tests alongside unit/integration tests
- Definition of done: Require `uv run visual_test.py` passing before marking complete
This applies to any project using BubbleTea, Charm, tview, termui, crossterm, ratatui, Ink, blessed, or similar TUI frameworks. Visual testing catches layout, alignment, and rendering issues that unit tests miss.
Prerequisites
- macOS: iTerm2 is macOS-only (no Linux/cloud support; see `references/cloud-and-limitations.md`)
- iTerm2 installed with the Python API enabled (Preferences > General > Magic > Enable Python API)
- Python 3.14+ and the `uv` package manager
- Screen Recording permission: required for `screencapture -l` screenshots
- iTerm2 must be running, and the target windows must not be minimized, for screenshots to work
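Some of these prerequisites can be checked from Python before any iTerm2 call is made. The following is a minimal sketch with a hypothetical `preflight()` helper. It assumes `uv` should be discoverable on `PATH`. It cannot see the Python API toggle or the Screen Recording permission, so those still need a manual check.

```python
import shutil
import sys


def preflight(min_python=(3, 14)):
    """Return a list of human-readable problems (an empty list means OK).

    Hypothetical helper: only checks what is visible from Python itself.
    The Python API toggle and Screen Recording permission are NOT checked.
    """
    problems = []
    if sys.platform != "darwin":
        problems.append(f"iTerm2 is macOS-only; this platform is {sys.platform!r}")
    if sys.version_info < min_python:
        have = ".".join(map(str, sys.version_info[:2]))
        want = ".".join(map(str, min_python))
        problems.append(f"Python {want}+ required; running {have}")
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("PREFLIGHT:", problem)
```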
CRITICAL: Script Format
Every Python script MUST use uv with inline metadata:
```python
# /// script
# requires-python = ">=3.14"
# dependencies = [
#   "iterm2",
#   "pyobjc",
# ]
# ///
```
For screenshots, add `"pyobjc-framework-Quartz"` to dependencies.
Execution: `uv run script_name.py`
CRITICAL: Docstring Planning
Every script MUST begin with a comprehensive docstring covering: Tests, Verification Strategy, Screenshots, Key Bindings, and Usage. See examples/00-comprehensive-template.py.
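The following is a hypothetical check for that requirement, built on `ast.get_docstring()` from the standard library. It assumes each section is introduced by a line holding only its name and a colon (for example `Usage:`). The actual template may format its sections differently.

```python
import ast

REQUIRED_SECTIONS = ("Tests", "Verification Strategy", "Screenshots", "Key Bindings", "Usage")


def missing_docstring_sections(source, sections=REQUIRED_SECTIONS):
    """Return the planning sections absent from a script's module docstring.

    A section counts as present when some docstring line consists of the
    section name followed by a colon, e.g. "Usage:". Comments such as the
    inline metadata block may precede the docstring; ast ignores them.
    """
    doc = ast.get_docstring(ast.parse(source)) or ""
    heads = {
        line.strip().rstrip(":")
        for line in doc.splitlines()
        if line.strip().endswith(":")
    }
    return [s for s in sections if s not in heads]
```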
Connection Architecture
- Python API connects via WebSocket over a Unix domain socket
- Socket path: `~/Library/Application Support/iTerm2/private/socket`
- Protocol: Google Protocol Buffers over WebSocket
- Multiple simultaneous connections are supported; each gets a unique auth cookie
- Authentication: automatic via the `ITERM2_COOKIE` env var (set by iTerm2 AppleScript)
- If the connection fails, check that the socket exists and is not stale, and that iTerm2 is running with the API enabled
- Use `retry=True` in `run_until_complete()` for automatic reconnection
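The socket checks in the list above can be scripted with the standard library alone. This minimal sketch uses a hypothetical `diagnose_socket()`. It treats a socket file that refuses connections as stale. It performs no WebSocket handshake or authentication, so "listening" only means that something accepted the connection.

```python
import socket
from pathlib import Path

# Socket path from the Connection Architecture list.
SOCKET_PATH = Path.home() / "Library/Application Support/iTerm2/private/socket"


def diagnose_socket(path=SOCKET_PATH, timeout=1.0):
    """Classify the API socket as 'missing', 'stale', or 'listening'.

    'stale' means the file exists but nothing accepts connections on it,
    for example a socket file left behind after iTerm2 exited.
    """
    path = Path(path)
    if not path.exists():
        return "missing"
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    probe.settimeout(timeout)
    try:
        probe.connect(str(path))
    except (ConnectionRefusedError, FileNotFoundError):
        return "stale"
    finally:
        probe.close()  # we only probe; no WebSocket handshake is attempted
    return "listening"
```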
Core Concepts
- Hierarchy: `App` → `Window` → `Tab` → `Session`
- Connection: `iterm2.run_until_complete(main)` for standalone scripts
- Window creation: `await iterm2.Window.async_create(connection)` creates a new window
- Session targeting: always use session IDs; never rely on "current" references in parallel
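Walking the hierarchy and listing IDs is a quick way to see which windows and sessions a script actually owns. It also surfaces the session IDs that the last bullet recommends targeting. `describe_hierarchy()` is a hypothetical helper. It only reads `terminal_windows`, `tabs`, `sessions`, `window_id`, `tab_id`, `session_id` and `name`, so plain stand-in objects work for trying it out.

```python
def describe_hierarchy(app):
    """Return one line per session: window id / tab id / session id / name.

    `app` is the object from `await iterm2.async_get_app(connection)`.
    Only the attributes read here are required.
    """
    rows = []
    for window in app.terminal_windows:
        for tab in window.tabs:
            for session in tab.sessions:
                rows.append(
                    f"{window.window_id} / {tab.tab_id} / "
                    f"{session.session_id} / {session.name or '-'}"
                )
    return rows
```

Storing `session.session_id` rather than the session object lets a later step look the session up again instead of trusting whatever is "current".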
Quick Reference
| Task | Code |
|---|---|
| Get app | app = await iterm2.async_get_app(connection) |
| Create window | window = await iterm2.Window.async_create(connection) |
| Get session | session = window.current_tab.current_session |
| New tab | tab = await window.async_create_tab() |
| Send text | await session.async_send_text("ls\n") |
| Read screen | screen = await session.async_get_screen_contents() |
| Get line | screen.line(i).string |
| Split pane | s2 = await session.async_split_pane(vertical=True) |
| Set name | await session.async_set_name("worker") |
| Set window size | await window.async_set_frame(iterm2.Frame(point, size)) |
| Get window frame | frame = await window.async_get_frame() |
| Close session | await session.async_close() |
| Ctrl+C | await session.async_send_text("\x03") |
| Enter (TUI) | await session.async_send_text("\r") |
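
The "Read screen" and "Get line" entries combine naturally into a polling wait. This is the readiness-probe style that the Guidelines recommend over fixed sleeps. The following is a minimal sketch with hypothetical names `read_screen_lines()` and `wait_for_text()`. It assumes the screen-contents object exposes `number_of_lines` alongside `line(i)`.

```python
import asyncio
import time


async def read_screen_lines(session):
    """Return the visible screen as a list of strings."""
    screen = await session.async_get_screen_contents()
    return [screen.line(i).string for i in range(screen.number_of_lines)]


async def wait_for_text(session, needle, timeout=10.0, interval=0.2):
    """Poll the screen until `needle` appears; return the matching lines.

    Raises TimeoutError with a full screen dump if the text never shows up.
    """
    deadline = time.monotonic() + timeout
    while True:
        lines = await read_screen_lines(session)
        hits = [line for line in lines if needle in line]
        if hits:
            return hits
        if time.monotonic() >= deadline:
            dump = "\n".join(lines)
            raise TimeoutError(f"{needle!r} not seen within {timeout}s:\n{dump}")
        await asyncio.sleep(interval)
```

Typical use: `await session.async_send_text("ls\n")`, then `await wait_for_text(session, "README")`.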
CRITICAL: Window Creation (Parallel-Safe)
Never use `app.current_terminal_window`: it returns whichever window is frontmost, causing race conditions when multiple agents run simultaneously.
Always create your own window using the pattern below. This is the single most important pattern in this skill; get it wrong and every script will fail intermittently.
The Stale Window Problem
Window.async_create() returns a window object before iTerm2 finishes initializing it. The returned object's current_tab will be None, causing AttributeError: 'NoneType' object has no attribute 'current_session'. This is the #1 cause of iTerm2 automation failures.
The fix: Sleep briefly, then refresh via async_get_app() to get the fully-initialized window object:
```python
import asyncio

import iterm2


async def create_window(connection, name="test", x_pos=100, width=700, height=500):
    """Create an isolated window. Handles the stale-window-object bug.

    IMPORTANT: Window.async_create() returns BEFORE iTerm2 finishes init.
    The returned object's current_tab is None. We MUST refresh via
    async_get_app() to get the real, initialized window object.
    """
    window = await iterm2.Window.async_create(connection)
    window_id = window.window_id
    await asyncio.sleep(0.5)  # REQUIRED: let iTerm2 init the window

    # REQUIRED: refresh. The returned object is stale and does not update
    # itself, so re-fetch the window by ID on every readiness-probe attempt.
    for _ in range(20):
        app = await iterm2.async_get_app(connection)
        fresh = next((w for w in app.terminal_windows if w.window_id == window_id), None)
        if fresh and fresh.current_tab and fresh.current_tab.current_session:
            window = fresh
            break
        await asyncio.sleep(0.2)
    else:
        raise RuntimeError(f"Window '{name}' not ready after refresh + probe")

    session = window.current_tab.current_session
    await session.async_set_name(name)

    # Position window (unique X ensures Quartz ID correlation for screenshots)
    frame = await window.async_get_frame()
    await window.async_set_frame(iterm2.Frame(
        iterm2.Point(x_pos, frame.origin.y),
        iterm2.Size(width, height),
    ))
    await asyncio.sleep(0.3)
    return window, session
```
Copy this function into every script. Do not skip the `asyncio.sleep(0.5)` or the `async_get_app()` refresh; both are required.
See `references/parallel-patterns.md` for full parallel agent patterns.
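When one script drives several agents, their windows can be opened concurrently. The following sketch uses a hypothetical `open_agent_windows()` helper that takes `create_window` as a parameter (pass the function defined above), so it has no hard dependency on `iterm2`. It staggers each window's X origin by an assumed `step` of 60 points so that position-based Quartz correlation can tell the windows apart. Overlap is acceptable because `screencapture -l` does not need the window to be frontmost. Whether iTerm2 handles many simultaneous creations smoothly is untested here; a sequential loop is the conservative alternative.

```python
import asyncio


async def open_agent_windows(connection, names, create_window, start_x=100, step=60):
    """Open one window per agent concurrently, each at its own X origin.

    Hypothetical helper. Returns (opened, errors): windows that did open
    are returned even if others failed, so the caller can still close
    them in its `finally` block.
    """
    results = await asyncio.gather(
        *(
            create_window(connection, name, x_pos=start_x + i * step)
            for i, name in enumerate(names)
        ),
        return_exceptions=True,  # one failure must not hide the others
    )
    opened = [r for r in results if not isinstance(r, BaseException)]
    errors = [r for r in results if isinstance(r, BaseException)]
    return opened, errors
```

A caller would then extend its tracking list with `created_sessions.extend(s for _w, s in opened)` before inspecting `errors`.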
CRITICAL: Orphaned Window Prevention
Automation scripts that crash mid-run leave orphaned iTerm2 windows. This is the #2 pain point from real-world usage. Every script MUST:
- Track all created windows/sessions in a list at the top of `main()`
- Clean up in a `finally` block: even on a crash, close what you can
- Run the cleanup janitor at the START of each run to close stale windows from previous crashes:
```python
import asyncio

import iterm2


async def cleanup_stale_windows(connection, prefix="agent-"):
    """Close windows from previous crashed runs. Call at script start."""
    app = await iterm2.async_get_app(connection)
    # Iterate over copies: closing sessions can change the app's layout.
    for window in list(app.terminal_windows):
        for tab in list(window.tabs):
            for session in list(tab.sessions):
                if session.name and session.name.startswith(prefix):
                    try:
                        await session.async_send_text("exit\n")
                        await asyncio.sleep(0.1)
                    except Exception:
                        pass  # the shell may already be gone
                    try:
                        await session.async_close()  # attempted even if "exit" failed
                    except Exception:
                        pass  # already closed by "exit"
```
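The janitor above closes every session whose name starts with the prefix. That test would also match a window that another agent, running in parallel, is still using. One way to narrow it (an assumption, not part of the skill's scripts) is to embed the owning process ID in the session name. A session then counts as orphaned only when that process no longer exists. `owned_session_name()` and `is_orphaned()` are hypothetical helpers. PID reuse can occasionally make a dead run's window look alive, which leaves it open rather than closing a live one.

```python
import os
import re

PREFIX = "agent-"


def owned_session_name(label, prefix=PREFIX):
    """Name a session after this process, e.g. 'agent-4242-worker'."""
    return f"{prefix}{os.getpid()}-{label}"


def is_orphaned(name, prefix=PREFIX):
    """True if `name` follows the pid scheme and its owning process is gone.

    Names that do not match the scheme are left alone (returns False).
    """
    m = re.fullmatch(re.escape(prefix) + r"(\d+)-.*", name or "")
    if not m:
        return False
    try:
        os.kill(int(m.group(1)), 0)  # signal 0: existence check only
    except ProcessLookupError:
        return True  # no such process: the owning run has exited
    except PermissionError:
        return False  # process exists but belongs to another user
    return False
```

In `cleanup_stale_windows()`, the `startswith(prefix)` test would become `is_orphaned(session.name)`.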
CRITICAL: Multi-Level Cleanup
Always use try-except-finally with multi-level cleanup. Track all resources globally:
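```python
import asyncio

import iterm2


async def main(connection):
    created_sessions = []  # track every session this run creates
    try:
        # create_window() is the helper from "Window Creation" above
        window, session = await create_window(connection, "test")
        created_sessions.append(session)
        # ... test logic ...
    except Exception as e:
        print(f"ERROR: {e}")
        raise
    finally:
        # Multi-level cleanup: Ctrl+C, then exit the shell, then close.
        for s in created_sessions:
            try:
                await s.async_send_text("\x03")
                await asyncio.sleep(0.1)
                await s.async_send_text("exit\n")
                await asyncio.sleep(0.1)
                await s.async_close()
            except Exception:
                pass


iterm2.run_until_complete(main)
```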
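In the cleanup above, one `try` wraps all three steps for each session. If the Ctrl+C fails, `async_close()` is therefore skipped. The variant below attempts every step independently. It also uses `contextlib.AsyncExitStack` to register each session's cleanup as soon as the session is created. `close_session_safely()` and `run_with_sessions()` are hypothetical names, and `open_session` and `body` stand in for your own coroutines.

```python
import asyncio
import contextlib


async def close_session_safely(session, pause=0.1):
    """Interrupt, exit, then close, attempting every step even if one fails.

    Returns the exceptions raised along the way so the caller can log them.
    """
    steps = (
        lambda: session.async_send_text("\x03"),    # Ctrl+C the foreground job
        lambda: session.async_send_text("exit\n"),  # leave the shell
        session.async_close,                        # close the session itself
    )
    errors = []
    for step in steps:
        try:
            await step()
        except Exception as exc:
            errors.append(exc)
        await asyncio.sleep(pause)
    return errors


async def run_with_sessions(open_session, labels, body):
    """Open sessions, run `body(sessions)`, and always clean them up."""
    async with contextlib.AsyncExitStack() as stack:
        sessions = []
        for label in labels:
            session = await open_session(label)
            # Registered immediately: if opening the next session fails,
            # the ones already open are still closed (in reverse order).
            stack.push_async_callback(close_session_safely, session)
            sessions.append(session)
        return await body(sessions)
```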
Screenshot Capture (Parallel-Safe)
Use position-based Quartz correlation to capture the correct window — name-based matching fails when commands change the window title:
```python
import subprocess

import Quartz


async def capture_screenshot(window, output_path):
    """Capture screenshot of a specific window (no focus required)."""
    frame = await window.async_get_frame()
    window_list = Quartz.CGWindowListCopyWindowInfo(
        Quartz.kCGWindowListOptionOnScreenOnly
        | Quartz.kCGWindowListExcludeDesktopElements,
        Quartz.kCGNullWindowID,
    )
    best_id, best_score = None, float("inf")
    for w in window_list:
        if "iTerm" not in w.get("kCGWindowOwnerName", ""):
            continue
        b = w.get("kCGWindowBounds", {})
        # Lower score = closer match on X position (weighted x2) and size.
        score = (abs(float(b.get("X", 0)) - frame.origin.x) * 2
                 + abs(float(b.get("Width", 0)) - frame.size.width)
                 + abs(float(b.get("Height", 0)) - frame.size.height))
        if score < best_score:
            best_score, best_id = score, w.get("kCGWindowNumber")
    if best_id and best_score < 30:
        subprocess.run(["screencapture", "-x", "-l", str(best_id), output_path])
        return output_path
    return None
```
Key facts:
- `screencapture -l` works for non-frontmost windows; no focus is required
- Minimized windows cannot be captured (they are excluded by `kCGWindowListOptionOnScreenOnly`)
- Each agent needs its own window (not tab) for independent screenshots
- Tabs share the same Quartz window ID, so a screenshot shows the active tab only
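`capture_screenshot()` runs `screencapture` without checking its exit status, so a caller may want to confirm that a real image was written. This sketch assumes the default PNG output format. It reads only the 8-byte PNG signature and the IHDR header that must follow it, using the standard library. `png_size()` is a hypothetical helper. The pixel size need not equal the frame size in points (for example on Retina displays), so compare it against a minimum rather than an exact value.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) of a PNG file, or None if it is not a PNG."""
    try:
        with open(path, "rb") as f:
            head = f.read(24)  # signature (8) + chunk length (4) + "IHDR" (4) + w/h (8)
    except OSError:
        return None
    if len(head) < 24 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", head[16:24])  # big-endian width, height
```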
TUI Layout Verification
TUI elements frequently misalign. Always verify layout integrity. See references/verification-patterns.md for complete helpers including box integrity, modal boundaries, and status bar checks.
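Here is a hypothetical `check_box()` that illustrates a box-integrity check. The skill's own helpers live in `references/verification-patterns.md`. The function works on the list of screen lines and checks that the vertical edges and bottom corners of the first `┌…┐` box line up. It assumes one character per terminal cell, so wide (for example CJK) characters would shift the columns.

```python
def check_box(lines):
    """Return problems with the first box drawn using ┌ ┐ │ └ ┘ (empty = intact)."""
    top = next((i for i, line in enumerate(lines) if "┌" in line), None)
    if top is None:
        return ["no box found"]
    left = lines[top].index("┌")
    right = lines[top].find("┐", left)
    if right == -1:
        return [f"row {top}: top edge has no ┐"]
    problems = []
    for row in range(top + 1, len(lines)):
        line = lines[row].ljust(right + 1)  # pad short rows so indexing is safe
        if line[left] == "└":
            if line[right] != "┘":
                problems.append(f"row {row}: bottom-right corner misaligned")
            return problems
        for col in (left, right):
            if line[col] != "│":
                problems.append(f"row {row}, col {col}: expected '│', got {line[col]!r}")
    problems.append("box has no bottom edge")
    return problems
```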
Test Reporting
Track results with a `results` dict containing `passed`, `failed`, and a `tests` list. See `references/reporting.md` for JSON/JUnit export patterns.
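The following is a minimal sketch of that results dict, with a JUnit XML export built on the standard library's `xml.etree.ElementTree`. The per-test fields (`name`, `passed`, `detail`) and the helper names are assumptions. `references/reporting.md` holds the skill's actual patterns.

```python
import xml.etree.ElementTree as ET


def new_results():
    return {"passed": 0, "failed": 0, "tests": []}


def record(results, name, ok, detail=""):
    """Append one test outcome and update the matching counter."""
    results["tests"].append({"name": name, "passed": ok, "detail": detail})
    results["passed" if ok else "failed"] += 1


def to_junit_xml(results, suite="iterm2-visual"):
    """Render the results dict as a minimal JUnit XML string."""
    root = ET.Element(
        "testsuite",
        name=suite,
        tests=str(len(results["tests"])),
        failures=str(results["failed"]),
    )
    for t in results["tests"]:
        case = ET.SubElement(root, "testcase", name=t["name"])
        if not t["passed"]:
            ET.SubElement(case, "failure", message=t["detail"] or "failed")
    return ET.tostring(root, encoding="unicode")
```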
Special Keys Reference
| Key | Code | Notes |
|---|---|---|
| Enter | \r | Prefer over \n in TUIs |
| Esc | \x1b | |
| Ctrl+C | \x03 | |
| Ctrl+D | \x04 | EOF |
| Ctrl+X | \x18 | |
| Tab | \t | |
| Up Arrow | \x1b[A | |
| Down Arrow | \x1b[B | |
| Right Arrow | \x1b[C | |
| Left Arrow | \x1b[D | |
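
The table above can be turned into a lookup so that tests send keys by name instead of raw escape strings. `KEYS`, `key_sequence()` and `press()` are hypothetical names. `press()` relies only on `async_send_text()` from the Quick Reference. Sending several keys in one write is usually fine. However, some TUIs read an Esc immediately followed by another key as an Alt/Meta combination, so a short pause between such presses may be safer.

```python
# Byte sequences from the Special Keys Reference table.
KEYS = {
    "enter": "\r",
    "esc": "\x1b",
    "ctrl+c": "\x03",
    "ctrl+d": "\x04",
    "ctrl+x": "\x18",
    "tab": "\t",
    "up": "\x1b[A",
    "down": "\x1b[B",
    "right": "\x1b[C",
    "left": "\x1b[D",
}


def key_sequence(*names):
    """Join named keys into one string, e.g. key_sequence('down', 'enter')."""
    try:
        return "".join(KEYS[n.lower()] for n in names)
    except KeyError as exc:
        raise ValueError(f"unknown key name: {exc.args[0]}") from None


async def press(session, *names):
    """Send named keys to a session with async_send_text()."""
    await session.async_send_text(key_sequence(*names))
```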
Guidelines
- Always use `uv run`: never run Python directly
- Create your own window: never use `app.current_terminal_window`
- Use readiness probes: never rely on a fixed `sleep()` for initialization
- Track all resources: close all created sessions/windows in `finally` blocks
- Use `\r` for Enter in TUIs: it is safer than `\n` for prompts
- Dump the screen on failure: always show what went wrong
- Verify layout: check that box-drawing characters connect properly
- Use `suppress_broadcast=True` when broadcast input may be enabled (prevents text leaking to other sessions)
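To support "dump the screen on failure", a small formatter can number each row so that a failed assertion points at an exact line. This is a sketch with hypothetical names. `lines` is assumed to be the list of row strings read through `async_get_screen_contents()` and `screen.line(i).string`.

```python
def format_screen_dump(lines, title="SCREEN DUMP"):
    """Number each screen row so failures point at an exact line."""
    width = len(str(max(len(lines) - 1, 0)))
    body = [f"{i:>{width}} | {line}" for i, line in enumerate(lines)]
    return "\n".join([f"--- {title} ---", *body, "-" * (len(title) + 8)])


def assert_on_screen(lines, expected):
    """Raise AssertionError with the full dump when `expected` is missing."""
    if not any(expected in line for line in lines):
        raise AssertionError(f"{expected!r} not on screen\n{format_screen_dump(lines)}")
```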
Script Storage
| Scope | Location | Git |
|---|---|---|
| Project-specific | ./.claude/automations/{script}.py or ./.agent/automations/{script}.py | COMMIT — these are project assets |
| General utility | ~/.claude/automations/{script}.py | N/A (user home) |
| Screenshots | ./.claude/screenshots/ | GITIGNORE — local verification only |
Important: Automation scripts SHOULD be committed to the repository. They are project assets that enforce test coverage and enable reproducible testing across machines and agents. Use .claude/automations/ or the agent-neutral .agent/automations/ folder.
No PII in scripts: Since scripts are committed and shareable, they MUST NOT contain:
- Hardcoded usernames, hostnames, or paths specific to one developer
- API keys, tokens, or credentials
- Personal file paths (use `~`, `$HOME`, or relative paths instead)
- Machine-specific configurations (use environment variables instead)
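A rough scan can flag the most common leaks before a script is committed. The patterns below are hypothetical heuristics, not a complete PII detector. They catch absolute macOS/Linux home paths and quoted values assigned to names like `API_KEY` or `token`.

```python
import re

# Hypothetical heuristics; extend them for your own team's conventions.
PII_PATTERNS = {
    "absolute home path": re.compile(r"/(?:Users|home)/[A-Za-z0-9._-]+"),
    "possible secret": re.compile(
        r"""(?i)\b(?:api[_-]?key|token|secret|password)\s*=\s*["'][^"']+["']"""
    ),
}


def scan_for_pii(source):
    """Return (line_number, label, text) for lines that look machine-specific."""
    hits = []
    for n, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                hits.append((n, label, line.strip()))
    return hits
```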
Screenshots MUST NOT be committed. Add to .gitignore:
```
.claude/screenshots/
.agent/screenshots/
screenshots/
```
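A small idempotent helper can append these entries so the rule is not forgotten in new repositories. `ensure_gitignore()` is a hypothetical name. It appends only the missing lines and leaves existing content untouched.

```python
from pathlib import Path

SCREENSHOT_IGNORES = (".claude/screenshots/", ".agent/screenshots/", "screenshots/")


def ensure_gitignore(repo_root, entries=SCREENSHOT_IGNORES):
    """Append any missing entries to <repo_root>/.gitignore; return what was added.

    Idempotent: a second run adds nothing.
    """
    path = Path(repo_root) / ".gitignore"
    text = path.read_text() if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [e for e in entries if e not in present]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        path.write_text(text + prefix + "\n".join(missing) + "\n")
    return missing
```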
Examples & References
Examples (examples/ directory):
- `00-comprehensive-template.py`: Complete template with all patterns
- `01-basic-tab.py`: Simple tab creation and command execution
- `02-dev-layout.py`: Multi-pane development layout
- `03-repl-driver.py`: REPL automation with verification
- `04-nano-automation.py`: TUI editor interaction with cleanup
- `05-screen-monitor.py`: ScreenStreamer for real-time monitoring
- `06-environment-vars.py`: Environment variable handling
- `07-cleanup-sessions.py`: Session cleanup patterns
- `08-badge-control.py`: Badge and tab control
- `09-special-keys.py`: Special key sequences for TUI navigation
- `10-session-reuse.py`: Get-or-create session reuse pattern
- `11-layout-verification.py`: TUI layout alignment checks
- `12-parallel-agents.py`: Multiple concurrent agents with independent screenshots
- `13-connection-diagnostics.py`: Pre-flight checks and troubleshooting
References (references/ directory):
- `templates.md`: Copy-paste script templates (single + parallel)
- `verification-patterns.md`: All verification helpers, including layout checks
- `reporting.md`: Test reporting patterns, JSON/JUnit export
- `parallel-patterns.md`: Parallel agent patterns, Quartz correlation, cleanup
- `cloud-and-limitations.md`: Platform support matrix, cloud alternatives