Web Browser Skill (agent-browser)

Use agent-browser for web automation. It runs a headless Chromium instance by default and exposes a CLI optimized for AI agents.

Full command reference: agent-browser --help

Installation

```bash
npm install -g agent-browser
agent-browser install              # Download Chromium
# Linux only:
agent-browser install --with-deps  # Install system deps
```
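
A setup script can skip installation when the CLI is already on the PATH. The following is a minimal sketch, assuming npm is available; ensure_agent_browser is a hypothetical helper name and is not part of agent-browser.

```bash
# Hypothetical helper (not part of agent-browser): install the CLI only
# when it is not already on PATH. Assumes npm is available.
ensure_agent_browser() {
  if command -v agent-browser >/dev/null 2>&1; then
    echo "agent-browser already installed"
    return 0
  fi
  npm install -g agent-browser &&
    agent-browser install   # Download Chromium
}

# Usage:
#   ensure_agent_browser
```

On Linux, the --with-deps variant shown above may still be needed for system dependencies.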

Core Workflow (recommended)

• Open a page

```bash
agent-browser open https://example.com
```

• Get a snapshot (refs)

```bash
agent-browser snapshot -i        # Interactive elements only
# or JSON for machine parsing
agent-browser snapshot -i --json
```

• Interact using refs

```bash
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser get text @e1
```

• Re-snapshot after changes

```bash
agent-browser snapshot -i --json
```

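Refs (@e1, @e2, …) are deterministic: each one points to a specific element in the snapshot that produced it, which makes them well suited to AI workflows. After the page changes, take a new snapshot rather than reusing refs from an old one.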
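
The interact and re-snapshot steps above can be bundled into one shell function. This is a sketch only: fill_and_submit is a hypothetical helper, and the refs passed to it must come from a snapshot of the actual page.

```bash
# Hypothetical helper: fill one field, click one button, then take a fresh
# snapshot so that later steps work from up-to-date refs.
# Arguments: <field_ref> <button_ref> <text>
fill_and_submit() {
  field_ref=$1
  button_ref=$2
  text=$3

  agent-browser fill "$field_ref" "$text" &&
    agent-browser click "$button_ref" &&
    agent-browser snapshot -i --json   # Re-snapshot after the page changes
}

# Usage (the refs are placeholders; read the real ones from the snapshot):
#   agent-browser open https://example.com
#   agent-browser snapshot -i
#   fill_and_submit @e3 @e2 "test@example.com"
```

Chaining with && stops the sequence at the first failing command, so a failed fill is not followed by a click.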

Common Commands

```bash
agent-browser open <url>            # Navigate (alias: goto)
agent-browser snapshot              # Accessibility tree with refs
agent-browser click <sel|@ref>      # Click an element
agent-browser fill <sel|@ref> <text>  # Clear the field, then fill it
agent-browser type <sel|@ref> <text>  # Type into the element
agent-browser press <key>           # e.g. Enter, Tab, Control+a
agent-browser get text <sel|@ref>   # Print the element's text
agent-browser screenshot [path]     # Use --full for full page
agent-browser close                 # Close browser
```
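
Several of these commands are often used together. The sketch below fills a search field, submits it with Enter, and saves a screenshot. search_and_capture is a hypothetical helper, and it assumes the field keeps keyboard focus after fill so that Enter submits it; that behavior has not been verified here.

```bash
# Hypothetical helper: fill a field, press Enter, and save a screenshot.
# Arguments: <selector or @ref> <query> <output path>
# Assumption: the filled field keeps focus, so Enter submits the form.
search_and_capture() {
  field=$1
  query=$2
  out=$3

  agent-browser fill "$field" "$query" &&
    agent-browser press Enter &&
    agent-browser screenshot "$out"
}

# Usage (the ref is a placeholder taken from a prior snapshot):
#   search_and_capture @e4 "headless chromium" results.png
#   agent-browser close
```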

Semantic Finders (optional)

```bash
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
```
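
When a form's labels and button names are known in advance, the two finder forms above can be wrapped in small functions. This is a sketch that assumes exactly the finder syntax shown above; fill_by_label and click_button are hypothetical names.

```bash
# Hypothetical wrappers around the two finder forms shown above.

# Fill the input associated with a visible label.
# Arguments: <label> <text>
fill_by_label() {
  agent-browser find label "$1" fill "$2"
}

# Click the button with the given accessible name.
# Arguments: <name>
click_button() {
  agent-browser find role button click --name "$1"
}

# Usage:
#   fill_by_label "Email" "test@test.com"
#   click_button "Submit"
```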

Helpful Options

• Headed mode (visible browser):

```bash
agent-browser open https://example.com --headed
```

• Persistent profile (cookies/logins):

```bash
agent-browser --profile ~/.myapp-profile open https://example.com
```

• Isolated sessions:

```bash
agent-browser --session agent1 open https://example.com
```

• Agent-friendly JSON output:

```bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json
```

• Local files (file://):

```bash
agent-browser --allow-file-access open file:///path/to/page.html
```
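
When several agents share one machine, prefixing every call with the same session name keeps their browser state apart. A minimal sketch, with ab_session as a hypothetical wrapper; it places --session before the command, as in the example above.

```bash
# Hypothetical wrapper: run any agent-browser command inside a named session.
# Arguments: <session name> <command and its arguments...>
ab_session() {
  session=$1
  shift
  agent-browser --session "$session" "$@"
}

# Usage: two agents working in separate sessions
#   ab_session agent1 open https://example.com
#   ab_session agent1 snapshot -i --json
#   ab_session agent2 open https://example.com
```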

When to Use

Use this skill whenever the agent needs to browse the web, inspect pages, click buttons, fill forms, or capture screenshots.