AgentSkillsCN

agent-browser

面向 AI 代理的浏览器自动化 CLI。当用户需要与网站交互,例如浏览页面、填写表单、点击按钮、截取屏幕截图、提取数据、测试 Web 应用,或自动化任何浏览器任务时,可使用此工具。触发条件包括:“打开网站”、“填写表单”、“点击按钮”、“截屏”、“从页面抓取数据”、“测试这个 Web 应用”、“登录站点”、“自动化浏览器操作”,或任何需要程序化网页交互的任务。

SKILL.md
--- frontmatter
name: agent-browser
description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
allowed-tools: Bash(agent-browser:*)

Browser Automation with agent-browser

Core Workflow

Every browser automation follows this pattern:

  1. Navigate: agent-browser open <url>
  2. Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
  3. Interact: Use refs to click, fill, select
  4. Re-snapshot: After navigation or DOM changes, get fresh refs

```bash agent-browser open https://example.com/form agent-browser snapshot -i

Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Check result ```

Essential Commands

```bash

Navigation

agent-browser open <url> # Navigate (aliases: goto, navigate) agent-browser close # Close browser

Snapshot

agent-browser snapshot -i # Interactive elements with refs (recommended) agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer) agent-browser snapshot -s "#selector" # Scope to CSS selector

Interaction (use @refs from snapshot)

agent-browser click @e1 # Click element agent-browser fill @e2 "text" # Clear and type text agent-browser type @e2 "text" # Type without clearing agent-browser select @e1 "option" # Select dropdown option agent-browser check @e1 # Check checkbox agent-browser press Enter # Press key agent-browser scroll down 500 # Scroll page

Get information

agent-browser get text @e1 # Get element text agent-browser get url # Get current URL agent-browser get title # Get page title

Wait

agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait --url "**/page" # Wait for URL pattern agent-browser wait 2000 # Wait milliseconds

Capture

agent-browser screenshot # Screenshot to temp dir agent-browser screenshot --full # Full page screenshot agent-browser pdf output.pdf # Save as PDF ```

Common Patterns

Form Submission

```bash agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "jane@example.com" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle ```

Authentication with State Persistence

```bash

Login once and save state

agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json

Reuse in future sessions

agent-browser state load auth.json agent-browser open https://app.example.com/dashboard ```

Data Extraction

```bash agent-browser open https://example.com/products agent-browser snapshot -i agent-browser get text @e5 # Get specific element text agent-browser get text body > page.txt # Get all page text

JSON output for parsing

agent-browser snapshot -i --json agent-browser get text @e1 --json ```

Parallel Sessions

```bash agent-browser --session site1 open https://site-a.com agent-browser --session site2 open https://site-b.com

agent-browser --session site1 snapshot -i agent-browser --session site2 snapshot -i

agent-browser session list ```

Visual Browser (Debugging)

```bash agent-browser --headed open https://example.com agent-browser highlight @e1 # Highlight element agent-browser record start demo.webm # Record session ```

Local Files (PDFs, HTML)

```bash

Open local files with file:// URLs

agent-browser --allow-file-access open file:///path/to/document.pdf agent-browser --allow-file-access open file:///path/to/page.html agent-browser screenshot output.png ```

iOS Simulator (Mobile Safari)

```bash

List available iOS simulators

agent-browser device list

Launch Safari on a specific device

agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

Same workflow as desktop - snapshot, interact, re-snapshot

agent-browser -p ios snapshot -i agent-browser -p ios tap @e1 # Tap (alias for click) agent-browser -p ios fill @e2 "text" agent-browser -p ios swipe up # Mobile-specific gesture

Take screenshot

agent-browser -p ios screenshot mobile.png

Close session (shuts down simulator)

agent-browser -p ios close ```

Requirements: macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)

Real devices: Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.

Ref Lifecycle (Important)

Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:

  • Clicking links or buttons that navigate
  • Form submissions
  • Dynamic content loading (dropdowns, modals)

```bash agent-browser click @e5 # Navigates to new page agent-browser snapshot -i # MUST re-snapshot agent-browser click @e1 # Use new refs ```

Semantic Locators (Alternative to Refs)

When refs are unavailable or unreliable, use semantic locators:

```bash agent-browser find text "Sign In" click agent-browser find label "Email" fill "user@test.com" agent-browser find role button click --name "Submit" agent-browser find placeholder "Search" type "query" agent-browser find testid "submit-btn" click ```

Deep-Dive Documentation

ReferenceWhen to Use
references/commands.mdFull command reference with all options
references/snapshot-refs.mdRef lifecycle, invalidation rules, troubleshooting
references/session-management.mdParallel sessions, state persistence, concurrent scraping
references/authentication.mdLogin flows, OAuth, 2FA handling, state reuse
references/video-recording.mdRecording workflows for debugging and documentation
references/proxy-support.mdProxy configuration, geo-testing, rotating proxies

Ready-to-Use Templates

TemplateDescription
templates/form-automation.shForm filling with validation
templates/authenticated-session.shLogin once, reuse state
templates/capture-workflow.shContent extraction with screenshots

```bash ./templates/form-automation.sh https://example.com/form ./templates/authenticated-session.sh https://app.example.com/login ./templates/capture-workflow.sh https://example.com ./output ```