Browser Automation with agent-browser
Installation: The agent-browser binaries for all platforms are available in this skill directory. Use the platform-specific launcher:
- •On Windows:
./agent-browser.bat- •On macOS/Linux:
./agent-browser.shAll command examples below use./agent-browser.sh(macOS/Linux syntax). On Windows, substitute with./agent-browser.bat.
Core Workflow
Every browser automation follows this pattern:
- •Navigate:
./agent-browser.sh open <url> - •Snapshot:
./agent-browser.sh snapshot -i(get element refs like@e1,@e2) - •Interact: Use refs to click, fill, select
- •Re-snapshot: After navigation or DOM changes, get fresh refs
./agent-browser.sh open https://example.com/form ./agent-browser.sh snapshot -i # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit" ./agent-browser.sh fill @e1 "user@example.com" ./agent-browser.sh fill @e2 "password123" ./agent-browser.sh click @e3 ./agent-browser.sh wait --load networkidle ./agent-browser.sh snapshot -i # Check result
Essential Commands
# Navigation ./agent-browser.sh open <url> # Navigate (aliases: goto, navigate) ./agent-browser.sh close # Close browser # Snapshot ./agent-browser.sh snapshot -i # Interactive elements with refs (recommended) ./agent-browser.sh snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer) ./agent-browser.sh snapshot -s "#selector" # Scope to CSS selector # Interaction (use @refs from snapshot) ./agent-browser.sh click @e1 # Click element ./agent-browser.sh fill @e2 "text" # Clear and type text ./agent-browser.sh type @e2 "text" # Type without clearing ./agent-browser.sh select @e1 "option" # Select dropdown option ./agent-browser.sh check @e1 # Check checkbox ./agent-browser.sh press Enter # Press key ./agent-browser.sh scroll down 500 # Scroll page # Get information ./agent-browser.sh get text @e1 # Get element text ./agent-browser.sh get url # Get current URL ./agent-browser.sh get title # Get page title # Wait ./agent-browser.sh wait @e1 # Wait for element ./agent-browser.sh wait --load networkidle # Wait for network idle ./agent-browser.sh wait --url "**/page" # Wait for URL pattern ./agent-browser.sh wait 2000 # Wait milliseconds # Capture ./agent-browser.sh screenshot # Screenshot to temp dir ./agent-browser.sh screenshot --full # Full page screenshot ./agent-browser.sh pdf output.pdf # Save as PDF
Common Patterns
Form Submission
./agent-browser.sh open https://example.com/signup ./agent-browser.sh snapshot -i ./agent-browser.sh fill @e1 "Jane Doe" ./agent-browser.sh fill @e2 "jane@example.com" ./agent-browser.sh select @e3 "California" ./agent-browser.sh check @e4 ./agent-browser.sh click @e5 ./agent-browser.sh wait --load networkidle
Authentication with State Persistence
# Login once and save state ./agent-browser.sh open https://app.example.com/login ./agent-browser.sh snapshot -i ./agent-browser.sh fill @e1 "$USERNAME" ./agent-browser.sh fill @e2 "$PASSWORD" ./agent-browser.sh click @e3 ./agent-browser.sh wait --url "**/dashboard" ./agent-browser.sh state save auth.json # Reuse in future sessions ./agent-browser.sh state load auth.json ./agent-browser.sh open https://app.example.com/dashboard
Session Persistence
# Auto-save/restore cookies and localStorage across browser restarts ./agent-browser.sh --session-name myapp open https://app.example.com/login # ... login flow ... ./agent-browser.sh close # State auto-saved to ~/.agent-browser/sessions/ # Next time, state is auto-loaded ./agent-browser.sh --session-name myapp open https://app.example.com/dashboard # Encrypt state at rest export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32) ./agent-browser.sh --session-name secure open https://app.example.com # Manage saved states ./agent-browser.sh state list ./agent-browser.sh state show myapp-default.json ./agent-browser.sh state clear myapp ./agent-browser.sh state clean --older-than 7
Data Extraction
./agent-browser.sh open https://example.com/products ./agent-browser.sh snapshot -i ./agent-browser.sh get text @e5 # Get specific element text ./agent-browser.sh get text body > page.txt # Get all page text # JSON output for parsing ./agent-browser.sh snapshot -i --json ./agent-browser.sh get text @e1 --json
Parallel Sessions
./agent-browser.sh --session site1 open https://site-a.com ./agent-browser.sh --session site2 open https://site-b.com ./agent-browser.sh --session site1 snapshot -i ./agent-browser.sh --session site2 snapshot -i ./agent-browser.sh session list
Connect to Existing Chrome
# Auto-discover running Chrome with remote debugging enabled ./agent-browser.sh --auto-connect open https://example.com ./agent-browser.sh --auto-connect snapshot # Or with explicit CDP port ./agent-browser.sh --cdp 9222 snapshot
Visual Browser (Debugging)
./agent-browser.sh --headed open https://example.com ./agent-browser.sh highlight @e1 # Highlight element ./agent-browser.sh record start demo.webm # Record session
Local Files (PDFs, HTML)
# Open local files with file:// URLs ./agent-browser.sh --allow-file-access open file:///path/to/document.pdf ./agent-browser.sh --allow-file-access open file:///path/to/page.html ./agent-browser.sh screenshot output.png
iOS Simulator (Mobile Safari)
# List available iOS simulators ./agent-browser.sh device list # Launch Safari on a specific device ./agent-browser.sh -p ios --device "iPhone 16 Pro" open https://example.com # Same workflow as desktop - snapshot, interact, re-snapshot ./agent-browser.sh -p ios snapshot -i ./agent-browser.sh -p ios tap @e1 # Tap (alias for click) ./agent-browser.sh -p ios fill @e2 "text" ./agent-browser.sh -p ios swipe up # Mobile-specific gesture # Take screenshot ./agent-browser.sh -p ios screenshot mobile.png # Close session (shuts down simulator) ./agent-browser.sh -p ios close
Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.
Ref Lifecycle (Important)
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
- •Clicking links or buttons that navigate
- •Form submissions
- •Dynamic content loading (dropdowns, modals)
./agent-browser.sh click @e5 # Navigates to new page ./agent-browser.sh snapshot -i # MUST re-snapshot ./agent-browser.sh click @e1 # Use new refs
Semantic Locators (Alternative to Refs)
When refs are unavailable or unreliable, use semantic locators:
./agent-browser.sh find text "Sign In" click ./agent-browser.sh find label "Email" fill "user@test.com" ./agent-browser.sh find role button click --name "Submit" ./agent-browser.sh find placeholder "Search" type "query" ./agent-browser.sh find testid "submit-btn" click
JavaScript Evaluation (eval)
Use eval to run JavaScript in the browser context. Shell quoting can corrupt complex expressions -- use --stdin or -b to avoid issues.
# Simple expressions work with regular quoting
./agent-browser.sh eval 'document.title'
./agent-browser.sh eval 'document.querySelectorAll("img").length'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
./agent-browser.sh eval --stdin <<'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Alternative: base64 encoding (avoids all shell escaping issues)
./agent-browser.sh eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
Why this matters: When the shell processes your command, inner double quotes, ! characters (history expansion), backticks, and $() can all corrupt the JavaScript before it reaches agent-browser. The --stdin and -b flags bypass shell interpretation entirely.
Rules of thumb:
- •Single-line, no nested quotes -> regular
eval 'expression'with single quotes is fine - •Nested quotes, arrow functions, template literals, or multiline -> use
eval --stdin <<'EVALEOF' - •Programmatic/generated scripts -> use
eval -bwith base64
Deep-Dive Documentation
| Reference | When to Use |
|---|---|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
Ready-to-Use Templates
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse state |
| templates/capture-workflow.sh | Content extraction with screenshots |
./templates/form-automation.sh https://example.com/form ./templates/authenticated-session.sh https://app.example.com/login ./templates/capture-workflow.sh https://example.com ./output