agent-browser: CLI Browser Automation
Vercel's headless browser CLI designed for AI agents. Uses ref-based selection (@e1, @e2) from accessibility snapshots.
Setup Check
bash
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED"
Install if needed
bash
npm install -g agent-browser agent-browser install # Downloads Chromium
Core Workflow
The snapshot + ref pattern is optimal for LLMs:
- •Navigate to URL
- •Snapshot to get interactive elements with refs
- •Interact using refs (@e1, @e2, etc.)
- •Re-snapshot after navigation or DOM changes
bash
# Step 1: Open URL agent-browser open https://example.com # Step 2: Get interactive elements with refs agent-browser snapshot -i # Step 3: Interact using refs agent-browser click @e1 agent-browser fill @e2 "search query" # Step 4: Re-snapshot after changes agent-browser snapshot -i
Key Commands
Navigation
bash
agent-browser open <url> # Navigate to URL agent-browser back # Go back agent-browser forward # Go forward agent-browser reload # Reload page agent-browser close # Close browser
Snapshots (Essential for AI)
bash
agent-browser snapshot # Full accessibility tree agent-browser snapshot -i # Interactive elements only (recommended) agent-browser snapshot -i --json # JSON output for parsing agent-browser snapshot -c # Compact (remove empty elements) agent-browser snapshot -d 3 # Limit depth agent-browser snapshot -s @e5 # Scope to element subtree
Interactions
bash
agent-browser click @e1 # Click element agent-browser dblclick @e1 # Double-click agent-browser fill @e1 "text" # Clear and fill input agent-browser type @e1 "text" # Type without clearing agent-browser press Enter # Press key agent-browser hover @e1 # Hover element agent-browser check @e1 # Check checkbox agent-browser uncheck @e1 # Uncheck checkbox agent-browser select @e1 "option" # Select dropdown option agent-browser scroll down 500 # Scroll (up/down/left/right) agent-browser scrollintoview @e1 # Scroll element into view
Get Information
bash
agent-browser get text @e1 # Get element text agent-browser get html @e1 # Get element HTML agent-browser get value @e1 # Get input value agent-browser get attr href @e1 # Get attribute agent-browser get title # Get page title agent-browser get url # Get current URL agent-browser get count "button" # Count matching elements
Screenshots & PDFs
bash
agent-browser screenshot # Viewport screenshot agent-browser screenshot --full # Full page agent-browser screenshot output.png # Save to file agent-browser pdf output.pdf # Save as PDF
Wait
bash
agent-browser wait @e1 # Wait for element agent-browser wait 2000 # Wait milliseconds agent-browser wait "text" # Wait for text to appear agent-browser wait --url "pattern" # Wait for URL match
Semantic Locators (Alternative to Refs)
bash
agent-browser find role button click --name "Submit" agent-browser find text "Sign up" click agent-browser find label "Email" fill "user@example.com" agent-browser find placeholder "Search..." fill "query"
Sessions & Profiles
Sessions (Parallel Isolated Browsers)
bash
agent-browser --session browser1 open https://site1.com agent-browser --session browser2 open https://site2.com agent-browser session list
Profiles (Persistent State Across Restarts)
bash
# Profiles preserve cookies, localStorage, login sessions agent-browser --profile ~/.myapp-profile open https://app.example.com
Authentication
Skip UI Login with Headers
bash
agent-browser open https://api.example.com --headers '{"Authorization": "Bearer <token>"}'
Save/Load Auth State
bash
# After logging in via UI agent-browser state save auth-state.json # Reuse in future sessions agent-browser state load auth-state.json agent-browser open https://app.example.com # Already logged in
Debug Mode
bash
# Run with visible browser window agent-browser --headed open https://example.com agent-browser --headed snapshot -i agent-browser --headed click @e1
Examples
Login Flow
bash
agent-browser open https://app.example.com/login agent-browser snapshot -i # Output: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3] agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait 2000 agent-browser snapshot -i # Verify logged in
Search and Extract
bash
agent-browser open https://news.ycombinator.com agent-browser snapshot -i agent-browser get text @e12 # Get headline text agent-browser click @e12 # Click to open story
Form Filling
bash
agent-browser open https://forms.example.com agent-browser snapshot -i agent-browser fill @e1 "John Doe" agent-browser fill @e2 "john@example.com" agent-browser select @e3 "United States" agent-browser check @e4 # Agree to terms agent-browser click @e5 # Submit agent-browser screenshot confirmation.png
JSON Output
bash
agent-browser snapshot -i --json
Returns structured data with refs for programmatic parsing.
vs Playwright MCP
| Feature | agent-browser (CLI) | Playwright MCP |
|---|---|---|
| Interface | Bash commands | MCP tools |
| Selection | Refs (@e1) | Refs (e1) |
| Output | Text/JSON | Tool responses |
| Parallel | Sessions | Tabs |
| Best for | Quick automation | Tool integration |
Use agent-browser when:
- •You prefer Bash-based workflows
- •You want simpler CLI commands
- •You need quick one-off automation
Use Playwright MCP when:
- •You need deep MCP tool integration
- •You want tool-based responses
- •You're building complex automation