browser-use

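An agent skill for automating browser workflows. Use it for any browser workflow, such as navigating websites, interacting with pages, filling in forms, capturing screenshots, or extracting on-page data. It can also serve as a fallback for web fetching when the target site relies heavily on JavaScript or is protected by a CAPTCHA. It can be used for frontend testing as well, but the `agent-browser` skill is preferred for that.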

SKILL.md
--- frontmatter
name: browser-use
description: Agent Skill for browser workflow automation. Use for any browser workflow, such as navigating websites, interacting with pages, filling forms, capturing screenshots, or extracting on-page data. Can also be used as a fallback for web fetching if the website is heavily JS-based or CAPTCHA-protected. Can also be used for frontend testing, but prefer the `agent-browser` skill for that.
allowed-tools: Bash(browser-use:*), Bash(uvx browser-use:*)

Browser Automation with browser-use CLI

The browser-use command provides fast, persistent browser automation. It maintains browser sessions across commands, enabling complex multi-step workflows.

If browser-use is not installed, run it via uvx browser-use.
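
The fallback above can be wrapped in a small shell function so that later commands do not need to care which form is available. This is only a sketch: the name `bu` is a hypothetical helper, not part of browser-use, and it assumes `uvx` is on PATH whenever `browser-use` is not.

```shell
# Hypothetical helper (not part of browser-use): call the CLI directly when it
# is on PATH, otherwise fall back to `uvx browser-use`.
bu() {
  if command -v browser-use >/dev/null 2>&1; then
    browser-use "$@"
  else
    uvx browser-use "$@"
  fi
}
```

With this in place, `bu open https://example.com` behaves like the matching `browser-use` command in either setup.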

Quick Start

bash
browser-use open https://example.com           # Navigate to URL
browser-use state                              # Get page elements with indices
browser-use click 5                            # Click element by index
browser-use type "Hello World"                 # Type text
browser-use screenshot                         # Take screenshot
browser-use close                              # Close browser

Core Workflow

  1. Navigate: browser-use open <url> - Opens URL (starts browser if needed)
  2. Inspect: browser-use state - Returns clickable elements with indices
  3. Interact: Use indices from state to interact (browser-use click 5, browser-use input 3 "text")
  4. Verify: browser-use state or browser-use screenshot to confirm actions
  5. Repeat: Browser stays open between commands
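
Steps 3 and 4 are usually run as a pair, so they can be bundled. The sketch below assumes the index was read from an earlier `browser-use state` call on the same page; `interact_and_verify` is a hypothetical name introduced here for illustration.

```shell
# Hypothetical helper: type text into an element, then print the page state
# again so the result can be checked.
# Usage: interact_and_verify INDEX TEXT
interact_and_verify() {
  browser-use input "$1" "$2" || return 1   # 3. Interact
  browser-use state                         # 4. Verify
}
```

A typical call sequence would be `browser-use open <url>`, then `browser-use state` to read the indices, then `interact_and_verify 3 "text"`.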

Browser Modes

bash
browser-use --browser chromium open <url>      # Default: headless Chromium
browser-use --browser chromium --headed open <url>  # Visible Chromium window
browser-use --browser real open <url>          # User's Chrome with login sessions
browser-use --browser remote open <url>        # Cloud browser (requires API key)

  • chromium: Fast, isolated, headless by default
  • real: Uses your Chrome with cookies, extensions, logged-in sessions
  • remote: Cloud-hosted browser with proxy support (requires BROWSER_USE_API_KEY)

Commands

Navigation

bash
browser-use open <url>                    # Navigate to URL
browser-use back                          # Go back in history
browser-use scroll down                   # Scroll down
browser-use scroll up                     # Scroll up

Page State

bash
browser-use state                         # Get URL, title, and clickable elements
browser-use screenshot                    # Take screenshot (outputs base64)
browser-use screenshot path.png           # Save screenshot to file
browser-use screenshot --full path.png    # Full page screenshot

Interactions (use indices from browser-use state)

bash
browser-use click <index>                 # Click element
browser-use type "text"                   # Type text into focused element
browser-use input <index> "text"          # Click element, then type text
browser-use keys "Enter"                  # Send keyboard keys
browser-use keys "Control+a"              # Send key combination
browser-use select <index> "option"       # Select dropdown option

Tab Management

bash
browser-use switch <tab>                  # Switch to tab by index
browser-use close-tab                     # Close current tab
browser-use close-tab <tab>               # Close specific tab

JavaScript & Data

bash
browser-use eval "document.title"         # Execute JavaScript, return result
browser-use extract "all product prices"  # Extract data using LLM (requires API key)

Python Execution (Persistent Session)

bash
browser-use python "x = 42"               # Set variable
browser-use python "print(x)"             # Access variable (outputs: 42)
browser-use python "print(browser.url)"   # Access browser object
browser-use python --vars                 # Show defined variables
browser-use python --reset                # Clear Python namespace
browser-use python --file script.py       # Execute Python file

The Python session maintains state across commands. The browser object provides:

  • browser.url - Current page URL
  • browser.title - Page title
  • browser.goto(url) - Navigate
  • browser.click(index) - Click element
  • browser.type(text) - Type text
  • browser.screenshot(path) - Take screenshot
  • browser.scroll() - Scroll page
  • browser.html - Get page HTML

Agent Tasks (Requires API Key)

bash
browser-use run "Fill the contact form with test data"    # Run AI agent
browser-use run "Extract all product prices" --max-steps 50

Agent tasks use an LLM to autonomously complete complex browser tasks. Requires BROWSER_USE_API_KEY or configured LLM API key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc).
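
Because agent tasks fail without a key, a preflight check can save a wasted call. The sketch below only looks at the three variables named above; other provider keys covered by "etc" and the `--api-key` option are not checked. Both function names are hypothetical.

```shell
# Succeeds if at least one of the API-key variables named in this document is set.
have_agent_key() {
  [ -n "${BROWSER_USE_API_KEY:-}" ] ||
  [ -n "${OPENAI_API_KEY:-}" ] ||
  [ -n "${ANTHROPIC_API_KEY:-}" ]
}

# Hypothetical wrapper: run an agent task only when a key appears to be available.
run_agent_task() {
  if ! have_agent_key; then
    echo "No API key found in the environment; skipping agent task." >&2
    return 1
  fi
  browser-use run "$@"
}
```

For example, `run_agent_task "Extract all product prices" --max-steps 50` forwards its arguments unchanged to `browser-use run`.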

Session Management

bash
browser-use sessions                      # List active sessions
browser-use close                         # Close current session
browser-use close --all                   # Close all sessions

Server Control

bash
browser-use server status                 # Check if server is running
browser-use server stop                   # Stop server
browser-use server logs                   # View server logs

Global Options

| Option | Description |
| --- | --- |
| --session NAME | Use named session (default: "default") |
| --browser MODE | Browser mode: chromium, real, remote |
| --headed | Show browser window (chromium mode) |
| --profile NAME | Chrome profile (real mode only) |
| --json | Output as JSON |
| --api-key KEY | Override API key |

Session behavior: All commands without --session use the same "default" session. The browser stays open and is reused across commands. Use --session NAME to run multiple browsers in parallel.

Examples

Form Submission

bash
browser-use open https://example.com/contact
browser-use state
# Shows: [0] input "Name", [1] input "Email", [2] textarea "Message", [3] button "Submit"
browser-use input 0 "John Doe"
browser-use input 1 "john@example.com"
browser-use input 2 "Hello, this is a test message."
browser-use click 3
browser-use state  # Verify success

Multi-Session Workflows

bash
browser-use --session work open https://work.example.com
browser-use --session personal open https://personal.example.com
browser-use --session work state    # Check work session
browser-use --session personal state  # Check personal session
browser-use close --all             # Close both sessions

Data Extraction with Python

bash
browser-use open https://example.com/products
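browser-use python "
for i in range(20):
    browser.scroll('down')          # Scroll to trigger lazy-loaded items
browser.screenshot('products.png')  # Save a screenshot of the loaded page
html = browser.html                 # Keep the page HTML in the session
"
browser-use python "print(f'Captured {len(html)} characters of HTML')"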

Using Real Browser (Logged-In Sessions)

bash
browser-use --browser real open https://gmail.com
# Uses your actual Chrome with existing login sessions
browser-use state  # Already logged in!

Tips

  1. Always run browser-use state first to see available elements and their indices
  2. Use --headed for debugging to see what the browser is doing
  3. Sessions persist - the browser stays open between commands
  4. Use --json for parsing output programmatically
  5. Python variables persist across browser-use python commands within a session
  6. Real browser mode preserves your login sessions and extensions

Troubleshooting

Browser won't start?

bash
browser-use server stop               # Stop any stuck server
browser-use --headed open <url>       # Try with visible window

Element not found?

bash
browser-use state                     # Check current elements
browser-use scroll down               # Element might be below fold
browser-use state                     # Check again
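
The scroll-and-recheck steps can be looped. This sketch assumes the plain-text output of `browser-use state` includes each element's label, as in the Form Submission example (`[3] button "Submit"`); with `--json` the matching would need to change. `find_after_scroll` is a hypothetical helper.

```shell
# Hypothetical helper: look for LABEL in the state output, scrolling down
# between attempts. Prints the matching line(s) and succeeds when found.
# Usage: find_after_scroll LABEL [MAX_SCROLLS]
find_after_scroll() {
  fas_label=$1
  fas_max=${2:-5}
  fas_n=0
  while [ "$fas_n" -lt "$fas_max" ]; do
    if browser-use state | grep -F -- "$fas_label"; then
      return 0
    fi
    browser-use scroll down >/dev/null   # Element might be below the fold
    fas_n=$((fas_n + 1))
  done
  return 1
}
```

The loop checks first and scrolls second, so an element that is already visible is found without scrolling at all.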

Session issues?

bash
browser-use sessions                  # Check active sessions
browser-use close --all               # Clean slate
browser-use open <url>                # Fresh start

Cleanup

Always close the browser when done. Run this after completing browser automation:

bash
browser-use close
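
To make sure the close step also runs when an earlier command fails, the work can be passed through a wrapper. This is a minimal sketch; `with_browser_cleanup` is a hypothetical name, and it closes only the current session (the same as `browser-use close` above).

```shell
# Hypothetical helper: run the given command, then always close the browser.
# Returns the exit status of the wrapped command, not of `browser-use close`.
# Usage: with_browser_cleanup COMMAND [ARGS...]
with_browser_cleanup() {
  wbc_status=0
  "$@" || wbc_status=$?
  browser-use close
  return "$wbc_status"
}
```

The wrapped command can be a shell function that contains the whole workflow, for example `with_browser_cleanup my_steps`, where `my_steps` is a function you define.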