Browser Automation with agent-browser
Installation Required
This skill requires agent-browser to be installed. If you see an error that this skill is unavailable, follow the installation guide below.
Quick Install
Choose your preferred method:
Option 1: npm (Recommended)
bash
npm install -g @agent-tools/browser
Option 2: Use the installation script
bash
# macOS/Linux bash scripts/install-agent-browser.sh # Windows powershell scripts/install-agent-browser.ps1
Option 3: Manual installation
- •Download from: https://github.com/hdresearch/nolita/releases
- •Add to your system PATH
Verify installation:
bash
agent-browser --version
Platform-Specific Notes
Windows Users:
- •Use
agent-browser.cmdinstead ofagent-browserin commands - •Or add
.cmdextension:agent-browser.cmd open <url> - •The installation script handles this automatically
All Platforms:
- •After installation, you may need to restart your terminal
- •Verify with:
agent-browser --version
Quick start
bash
agent-browser open <url> # Navigate to page agent-browser snapshot -i # Get interactive elements with refs agent-browser click @e1 # Click element by ref agent-browser fill @e2 "text" # Fill input by ref agent-browser close # Close browser
Windows: Replace agent-browser with agent-browser.cmd in all commands above.
Core workflow
- •Navigate:
.\agent-browser.cmd open <url> - •Snapshot:
.\agent-browser.cmd snapshot -i(returns elements with refs like@e1,@e2) - •Interact using refs from the snapshot
- •Re-snapshot after navigation or significant DOM changes
Commands
Navigation
bash
.\agent-browser.cmd open <url> # Navigate to URL .\agent-browser.cmd back # Go back .\agent-browser.cmd forward # Go forward .\agent-browser.cmd reload # Reload page .\agent-browser.cmd close # Close browser
Snapshot (page analysis)
bash
.\agent-browser.cmd snapshot # Full accessibility tree .\agent-browser.cmd snapshot -i # Interactive elements only (recommended) .\agent-browser.cmd snapshot -c # Compact output .\agent-browser.cmd snapshot -d 3 # Limit depth to 3 .\agent-browser.cmd snapshot -s "#main" # Scope to CSS selector
Interactions (use @refs from snapshot)
bash
.\agent-browser.cmd click @e1 # Click .\agent-browser.cmd dblclick @e1 # Double-click .\agent-browser.cmd focus @e1 # Focus element .\agent-browser.cmd fill @e2 "text" # Clear and type .\agent-browser.cmd type @e2 "text" # Type without clearing .\agent-browser.cmd press Enter # Press key .\agent-browser.cmd press Control+a # Key combination .\agent-browser.cmd keydown Shift # Hold key down .\agent-browser.cmd keyup Shift # Release key .\agent-browser.cmd hover @e1 # Hover .\agent-browser.cmd check @e1 # Check checkbox .\agent-browser.cmd uncheck @e1 # Uncheck checkbox .\agent-browser.cmd select @e1 "value" # Select dropdown .\agent-browser.cmd scroll down 500 # Scroll page .\agent-browser.cmd scrollintoview @e1 # Scroll element into view .\agent-browser.cmd drag @e1 @e2 # Drag and drop .\agent-browser.cmd upload @e1 file.pdf # Upload files
Get information
bash
.\agent-browser.cmd get text @e1 # Get element text .\agent-browser.cmd get html @e1 # Get innerHTML .\agent-browser.cmd get value @e1 # Get input value .\agent-browser.cmd get attr @e1 href # Get attribute .\agent-browser.cmd get title # Get page title .\agent-browser.cmd get url # Get current URL .\agent-browser.cmd get count ".item" # Count matching elements .\agent-browser.cmd get box @e1 # Get bounding box
Check state
bash
.\agent-browser.cmd is visible @e1 # Check if visible .\agent-browser.cmd is enabled @e1 # Check if enabled .\agent-browser.cmd is checked @e1 # Check if checked
Screenshots & PDF
bash
.\agent-browser.cmd screenshot # Screenshot to stdout .\agent-browser.cmd screenshot path.png # Save to file .\agent-browser.cmd screenshot --full # Full page .\agent-browser.cmd pdf output.pdf # Save as PDF
Video recording
bash
.\agent-browser.cmd record start ./demo.webm # Start recording (uses current URL + state) .\agent-browser.cmd click @e1 # Perform actions .\agent-browser.cmd record stop # Stop and save video .\agent-browser.cmd record restart ./take2.webm # Stop current + start new recording
Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.
Wait
bash
.\agent-browser.cmd wait @e1 # Wait for element .\agent-browser.cmd wait 2000 # Wait milliseconds .\agent-browser.cmd wait --text "Success" # Wait for text .\agent-browser.cmd wait --url "**/dashboard" # Wait for URL pattern .\agent-browser.cmd wait --load networkidle # Wait for network idle .\agent-browser.cmd wait --fn "window.ready" # Wait for JS condition
Mouse control
bash
.\agent-browser.cmd mouse move 100 200 # Move mouse .\agent-browser.cmd mouse down left # Press button .\agent-browser.cmd mouse up left # Release button .\agent-browser.cmd mouse wheel 100 # Scroll wheel
Semantic locators (alternative to refs)
bash
.\agent-browser.cmd find role button click --name "Submit" .\agent-browser.cmd find text "Sign In" click .\agent-browser.cmd find label "Email" fill "user@test.com" .\agent-browser.cmd find first ".item" click .\agent-browser.cmd find nth 2 "a" text
Browser settings
bash
.\agent-browser.cmd set viewport 1920 1080 # Set viewport size
.\agent-browser.cmd set device "iPhone 14" # Emulate device
.\agent-browser.cmd set geo 37.7749 -122.4194 # Set geolocation
.\agent-browser.cmd set offline on # Toggle offline mode
.\agent-browser.cmd set headers '{"X-Key":"v"}' # Extra HTTP headers
.\agent-browser.cmd set credentials user pass # HTTP basic auth
.\agent-browser.cmd set media dark # Emulate color scheme
Cookies & Storage
bash
.\agent-browser.cmd cookies # Get all cookies .\agent-browser.cmd cookies set name value # Set cookie .\agent-browser.cmd cookies clear # Clear cookies .\agent-browser.cmd storage local # Get all localStorage .\agent-browser.cmd storage local key # Get specific key .\agent-browser.cmd storage local set k v # Set value .\agent-browser.cmd storage local clear # Clear all
Network
bash
.\agent-browser.cmd network route <url> # Intercept requests
.\agent-browser.cmd network route <url> --abort # Block requests
.\agent-browser.cmd network route <url> --body '{}' # Mock response
.\agent-browser.cmd network unroute [url] # Remove routes
.\agent-browser.cmd network requests # View tracked requests
.\agent-browser.cmd network requests --filter api # Filter requests
Tabs & Windows
bash
.\agent-browser.cmd tab # List tabs .\agent-browser.cmd tab new [url] # New tab .\agent-browser.cmd tab 2 # Switch to tab .\agent-browser.cmd tab close # Close tab .\agent-browser.cmd window new # New window
Frames
bash
.\agent-browser.cmd frame "#iframe" # Switch to iframe .\agent-browser.cmd frame main # Back to main frame
Dialogs
bash
.\agent-browser.cmd dialog accept [text] # Accept dialog .\agent-browser.cmd dialog dismiss # Dismiss dialog
JavaScript
bash
.\agent-browser.cmd eval "document.title" # Run JavaScript
Example: Form submission
bash
.\agent-browser.cmd open https://example.com/form .\agent-browser.cmd snapshot -i # Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3] .\agent-browser.cmd fill @e1 "user@example.com" .\agent-browser.cmd fill @e2 "password123" .\agent-browser.cmd click @e3 .\agent-browser.cmd wait --load networkidle .\agent-browser.cmd snapshot -i # Check result
Example: Authentication with saved state
bash
# Login once .\agent-browser.cmd open https://app.example.com/login .\agent-browser.cmd snapshot -i .\agent-browser.cmd fill @e1 "username" .\agent-browser.cmd fill @e2 "password" .\agent-browser.cmd click @e3 .\agent-browser.cmd wait --url "**/dashboard" .\agent-browser.cmd state save auth.json # Later sessions: load saved state .\agent-browser.cmd state load auth.json .\agent-browser.cmd open https://app.example.com/dashboard
Sessions (parallel browsers)
bash
.\agent-browser.cmd --session test1 open site-a.com .\agent-browser.cmd --session test2 open site-b.com .\agent-browser.cmd session list
JSON output (for parsing)
Add --json for machine-readable output:
bash
.\agent-browser.cmd snapshot -i --json .\agent-browser.cmd get text @e1 --json
Debugging
bash
.\agent-browser.cmd open example.com --headed # Show browser window .\agent-browser.cmd console # View console messages .\agent-browser.cmd errors # View page errors .\agent-browser.cmd record start ./debug.webm # Record from current page .\agent-browser.cmd record stop # Save recording .\agent-browser.cmd open example.com --headed # Show browser window .\agent-browser.cmd --cdp 9222 snapshot # Connect via CDP .\agent-browser.cmd console # View console messages .\agent-browser.cmd console --clear # Clear console .\agent-browser.cmd errors # View page errors .\agent-browser.cmd errors --clear # Clear errors .\agent-browser.cmd highlight @e1 # Highlight element .\agent-browser.cmd trace start # Start recording trace .\agent-browser.cmd trace stop trace.zip # Stop and save trace