Browser Automation with z-agent-browser
BEFORE YOU START - Mode Selection
STOP. Ask the user which mode to use before executing any browser commands.
| Mode | Best For | Requires |
|---|---|---|
| Native (default) | General automation, scraping, testing | Nothing extra |
| Native + Stealth | Sites with bot detection (Cloudflare, etc.) | Nothing extra |
| CDP (Real Chrome) | Saved passwords, CAPTCHA, 2FA, Google/Gmail | Chrome running with --remote-debugging-port=9222 |
| Playwright MCP | Control user's existing Chrome tabs | Extension from GitHub + token |
Ask the user:
"I can automate browsers in a few ways:
- •Headless - Fast, invisible automation (default)
- •Headed - Visible browser window (for debugging or manual steps)
- •Stealth - Bypasses bot detection (for protected sites)
- •Real Chrome (CDP) - Uses your actual Chrome with saved passwords
- •Playwright MCP - Controls your existing Chrome tabs (experimental)
Which mode should I use? (or describe what you're trying to do and I'll recommend one)"
Quick Decision Guide
Need saved passwords or 2FA? → CDP Mode (Real Chrome) Site blocks automation? → Native + Stealth Google/Gmail? → CDP for login, then Stealth Just testing/scraping? → Native (default) Want to watch it run? → Native + Headed Control existing Chrome tabs? → Playwright MCP
Mode Setup Commands
# Native (default) z-agent-browser start # Native + Headed (visible) z-agent-browser start --headed # Native + Stealth (anti-detection) z-agent-browser start --stealth # CDP (Real Chrome) - requires custom profile directory (Chrome 136+ blocks CDP on default profile) # First-time: cp -R "$HOME/Library/Application Support/Google/Chrome" ~/.z-agent-browser/chrome-profile # Then launch: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 --user-data-dir="$HOME/.z-agent-browser/chrome-profile" & z-agent-browser connect 9222 # Playwright MCP - requires extension from https://github.com/microsoft/playwright-mcp/releases # Load unpacked in chrome://extensions, get token from extension icon, then: # export PLAYWRIGHT_MCP_EXTENSION_TOKEN=<token> # export AGENT_BROWSER_BACKEND=playwright-mcp z-agent-browser open <url>
Core Workflow
- •Navigate:
z-agent-browser open <url> - •Snapshot:
z-agent-browser snapshot -i(returns refs like@e1,@e2) - •Interact using refs from snapshot
- •Re-snapshot after navigation or DOM changes
Token Efficiency: eval vs snapshot
Use snapshot -i for navigation (finding what to click):
z-agent-browser snapshot -i # ~200-500 tokens
Use eval for data extraction (getting information):
z-agent-browser eval "document.querySelectorAll('.item').length"
z-agent-browser eval "[...document.querySelectorAll('a')].map(a => a.href)"
| Task | Best Tool | Tokens |
|---|---|---|
| Find button to click | snapshot -i | ~200-500 |
| Count items on page | eval | ~10 |
| Extract all links/data | eval | ~50-100 |
| Fill a form | snapshot -i + refs | ~200-500 |
Rule: Need to CLICK? → snapshot. Need to READ/EXTRACT? → eval.
Commands
Browser Lifecycle
z-agent-browser start # Start headless browser z-agent-browser start --headed # Start visible browser z-agent-browser start --stealth # Start with anti-detection z-agent-browser status # Check current mode z-agent-browser stop # Stop browser z-agent-browser connect 9222 # Connect to Chrome via CDP
Navigation
z-agent-browser open <url> # Navigate to URL z-agent-browser back # Go back z-agent-browser forward # Go forward z-agent-browser reload # Reload page z-agent-browser close # Close browser
Snapshot (page analysis)
z-agent-browser snapshot # Full accessibility tree z-agent-browser snapshot -i # Interactive elements only (recommended) z-agent-browser snapshot -c # Compact output z-agent-browser snapshot -d 3 # Limit depth to 3 z-agent-browser snapshot -s "#main" # Scope to CSS selector
Interactions (use @refs from snapshot)
z-agent-browser click @e1 # Click z-agent-browser dblclick @e1 # Double-click z-agent-browser fill @e2 "text" # Clear and type z-agent-browser type @e2 "text" # Type without clearing z-agent-browser press Enter # Press key z-agent-browser press Control+a # Key combination z-agent-browser hover @e1 # Hover z-agent-browser check @e1 # Check checkbox z-agent-browser uncheck @e1 # Uncheck checkbox z-agent-browser select @e1 "value" # Select dropdown z-agent-browser scroll down 500 # Scroll page z-agent-browser scrollintoview @e1 # Scroll element into view z-agent-browser drag @e1 @e2 # Drag and drop z-agent-browser upload @e1 file.pdf # Upload files
Get Information
z-agent-browser get text @e1 # Get element text z-agent-browser get html @e1 # Get innerHTML z-agent-browser get value @e1 # Get input value z-agent-browser get attr @e1 href # Get attribute z-agent-browser get title # Get page title z-agent-browser get url # Get current URL z-agent-browser get count ".item" # Count matching elements
Check State
z-agent-browser is visible @e1 # Check if visible z-agent-browser is enabled @e1 # Check if enabled z-agent-browser is checked @e1 # Check if checked
Screenshots & PDF
z-agent-browser screenshot # Screenshot to stdout (base64) z-agent-browser screenshot path.png # Save to file z-agent-browser screenshot --full # Full page z-agent-browser pdf output.pdf # Save as PDF
State Persistence
z-agent-browser state save auth.json # Save cookies/localStorage z-agent-browser state load auth.json # Restore session
Video Recording
z-agent-browser record start demo.webm # Start recording z-agent-browser record stop # Stop and save z-agent-browser record restart take2.webm # Restart with new file
Wait
z-agent-browser wait @e1 # Wait for element z-agent-browser wait 2000 # Wait milliseconds z-agent-browser wait --text "Success" # Wait for text z-agent-browser wait --url "**/dashboard" # Wait for URL pattern z-agent-browser wait --load networkidle # Wait for network idle
JavaScript
z-agent-browser eval "document.title"
z-agent-browser eval "[...document.querySelectorAll('a')].map(a => a.href)"
Browser Settings
z-agent-browser set viewport 1920 1080
z-agent-browser set device "iPhone 14"
z-agent-browser set headers '{"X-Key":"v"}'
Tabs
z-agent-browser tab # List tabs z-agent-browser tab new [url] # New tab z-agent-browser tab 2 # Switch to tab z-agent-browser tab close # Close tab
Debug
z-agent-browser console # View console messages z-agent-browser errors # View page errors z-agent-browser highlight @e1 # Highlight element
Login Persistence
Use state save/load to persist login sessions:
# First time: Login manually in headed mode z-agent-browser start --headed z-agent-browser open "https://github.com" # [User logs in manually] z-agent-browser state save ~/.z-agent-browser/github.json z-agent-browser stop # Later: Restore session headlessly z-agent-browser start z-agent-browser state load ~/.z-agent-browser/github.json z-agent-browser open "https://github.com" # Already logged in!
CDP Mode (Real Chrome)
For CAPTCHA, 2FA, or using saved passwords.
Note: Chrome 136+ blocks CDP on default profile. You must use
--user-data-dir.
First-time setup (copy your Chrome profile):
# macOS mkdir -p ~/.z-agent-browser cp -R "$HOME/Library/Application Support/Google/Chrome" ~/.z-agent-browser/chrome-profile # Linux cp -R ~/.config/google-chrome ~/.z-agent-browser/chrome-profile
Launch and connect:
# Quit existing Chrome pkill -9 "Google Chrome" # Launch with CDP + custom profile "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.z-agent-browser/chrome-profile" & z-agent-browser connect 9222 z-agent-browser open "https://github.com" # Uses real Chrome with your logins
Gmail/Google (Hybrid Workflow)
Google blocks Chromium automation. Use real Chrome for login, then stealth for automation.
Prerequisite: Complete "CDP Mode" first-time setup above (copy Chrome profile).
# 1. Launch real Chrome with CDP (using copied profile) pkill -9 "Google Chrome" "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.z-agent-browser/chrome-profile" & # 2. Connect and login if needed z-agent-browser connect 9222 z-agent-browser open "https://mail.google.com" # User completes login in Chrome window # 3. Save state z-agent-browser state save ~/.z-agent-browser/gmail-state.json z-agent-browser close && pkill -9 "Google Chrome" # 4. Use stealth with saved state z-agent-browser start --stealth z-agent-browser state load ~/.z-agent-browser/gmail-state.json z-agent-browser open "https://mail.google.com" # Logged in!
Important Notes
- •Always use
snapshot -ito reduce output size - •Use refs (@e1, @e2) from snapshot, not CSS selectors
- •Re-snapshot after navigation - refs change when page changes
- •Save state after login -
state savepersists sessions - •For Google/Gmail - use hybrid CDP workflow, not direct login
- •Use
evalfor extraction - much more token efficient than parsing snapshots