Browser Automation with browser-use CLI

The browser-use command provides fast, persistent browser automation. It maintains browser sessions across commands, enabling complex multi-step workflows.

Quick Start

bash

browser-use open https://example.com           # Navigate to URL
browser-use state                              # Get page elements with indices
browser-use click 5                            # Click element by index
browser-use type "Hello World"                 # Type text
browser-use screenshot                         # Take screenshot
browser-use close                              # Close browser

Core Workflow

• Navigate: `browser-use open <url>` - Opens the URL (starts the browser if needed)
• Inspect: `browser-use state` - Returns clickable elements with their indices
• Interact: Use the indices from `state` to act on elements (`browser-use click 5`, `browser-use input 3 "text"`)
• Verify: Run `browser-use state` or `browser-use screenshot` to confirm the action
• Repeat: The browser stays open between commands

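The inspect/interact/verify cycle can be wrapped in a small shell function so that every action is immediately followed by a fresh `state` listing. This is a minimal sketch. `act_and_verify` is a hypothetical helper, not part of the CLI, and it only calls the commands documented here.

```bash
# Hypothetical helper (not part of browser-use): run one action, then
# re-inspect the page so the next step uses up-to-date element indices.
act_and_verify() {
    # "$@" is the action plus its arguments, e.g.: click 5
    browser-use "$@" || return 1
    # Indices can change after any interaction, so list elements again.
    browser-use state
}

# Usage:
#   browser-use open https://example.com
#   act_and_verify click 5
#   act_and_verify input 3 "text"
```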
Browser Modes

bash

browser-use --browser chromium open <url>      # Default: headless Chromium
browser-use --browser chromium --headed open <url>  # Visible Chromium window
browser-use --browser real open <url>          # User's Chrome with login sessions
browser-use --browser remote open <url>        # Cloud browser (requires API key)

• chromium: Fast, isolated, headless by default
• real: Uses your Chrome with its cookies, extensions, and logged-in sessions
• remote: Cloud-hosted browser with proxy support (requires BROWSER_USE_API_KEY)

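A script that runs both locally and in the cloud can choose the mode from the environment. The policy below is only one possible choice: use `remote` whenever BROWSER_USE_API_KEY is set, and fall back to headless `chromium` otherwise. `pick_browser_mode` is a hypothetical helper.

```bash
# Hypothetical helper: choose a --browser mode for this run.
# remote mode requires BROWSER_USE_API_KEY, so only pick it when the key
# is present; otherwise use the default headless Chromium.
pick_browser_mode() {
    if [ -n "${BROWSER_USE_API_KEY:-}" ]; then
        echo remote
    else
        echo chromium
    fi
}

# Usage:
#   browser-use --browser "$(pick_browser_mode)" open https://example.com
```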
Commands

Navigation

bash

browser-use open <url>                    # Navigate to URL
browser-use back                          # Go back in history
browser-use scroll down                   # Scroll down
browser-use scroll up                     # Scroll up

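Pages that load content lazily often need several scrolls before the elements you want exist. The sketch below repeats `scroll down` a given number of times and then lists the elements. `scroll_down_times` and its default of 3 steps are hypothetical choices.

```bash
# Hypothetical helper: scroll down a fixed number of times (for example to
# load lazily rendered content), then list the elements now on the page.
scroll_down_times() {
    _count=${1:-3}   # number of scroll steps; 3 is an arbitrary default
    _i=0
    while [ "$_i" -lt "$_count" ]; do
        browser-use scroll down || return 1
        _i=$((_i + 1))
    done
    browser-use state
}

# Usage: scroll_down_times 5
```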
Page State

bash

browser-use state                         # Get URL, title, and clickable elements
browser-use screenshot                    # Take screenshot (outputs base64)
browser-use screenshot path.png           # Save screenshot to file
browser-use screenshot --full path.png    # Full page screenshot

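When screenshots are taken repeatedly, a fixed file name gets overwritten. A minimal sketch is to build a timestamped path and pass it to `screenshot --full`. The helper name `save_full_screenshot` and the default `screenshots` directory are assumptions of this example.

```bash
# Hypothetical helper: save a full-page screenshot under a timestamped file
# name (one-second resolution) so repeated runs do not overwrite each other.
save_full_screenshot() {
    _dir=${1:-screenshots}   # assumed default output directory
    mkdir -p "$_dir" || return 1
    _file="$_dir/page-$(date +%Y%m%d-%H%M%S).png"
    browser-use screenshot --full "$_file" || return 1
    printf '%s\n' "$_file"   # report where the image was written
}

# Usage: save_full_screenshot ./shots
```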
Interactions (use indices from `browser-use state`)

bash

browser-use click <index>                 # Click element
browser-use type "text"                   # Type text into focused element
browser-use input <index> "text"          # Click element, then type text
browser-use keys "Enter"                  # Send keyboard keys
browser-use keys "Control+a"              # Send key combination
browser-use select <index> "option"       # Select dropdown option

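These commands combine naturally into a small login routine: fill two fields by index, press Enter, and re-read the page. The indices are placeholders that must come from a preceding `browser-use state`. `log_in` and the `SITE_PASSWORD` variable in the usage line are hypothetical.

```bash
# Hypothetical helper for a simple username/password form. Pass the element
# indices reported by `browser-use state`; they differ from page to page.
log_in() {
    _user_idx=$1 _pass_idx=$2 _user=$3 _pass=$4
    browser-use input "$_user_idx" "$_user" || return 1
    browser-use input "$_pass_idx" "$_pass" || return 1
    browser-use keys "Enter" || return 1   # submit from the password field
    browser-use state                      # see where the login landed
}

# Usage (password read from the environment rather than hard-coded):
#   log_in 1 2 "alice@example.com" "$SITE_PASSWORD"
```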
Tab Management

bash

browser-use switch <tab>                  # Switch to tab by index
browser-use close-tab                     # Close current tab
browser-use close-tab <tab>               # Close specific tab

JavaScript & Data

bash

browser-use eval "document.title"         # Execute JavaScript, return result
browser-use extract "all product prices"  # Extract data using LLM (requires API key)

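Because `eval` returns its result on the command line, it can drive simple checks in a shell script. The exact output format is not specified here. This sketch therefore assumes only that the JavaScript value appears somewhere in standard output, and it matches a substring rather than the whole line. `expect_title` is a hypothetical helper.

```bash
# Hypothetical check: fail unless the current page title contains a string.
expect_title() {
    _expected=$1
    _title=$(browser-use eval "document.title") || return 1
    case "$_title" in
        *"$_expected"*) printf 'ok: %s\n' "$_title" ;;
        *) printf 'unexpected title: %s\n' "$_title" >&2; return 1 ;;
    esac
}

# Usage: browser-use open https://example.com && expect_title "Example"
```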
Python Execution (Persistent Session)

bash

browser-use python "x = 42"               # Set variable
browser-use python "print(x)"             # Access variable (outputs: 42)
browser-use python "print(browser.url)"   # Access browser object
browser-use python --vars                 # Show defined variables
browser-use python --reset                # Clear Python namespace
browser-use python --file script.py       # Execute Python file

The Python session maintains state across commands. The browser object provides:

• `browser.url` - Current page URL
• `browser.title` - Page title
• `browser.goto(url)` - Navigate to a URL
• `browser.click(index)` - Click element
• `browser.type(text)` - Type text
• `browser.screenshot(path)` - Take screenshot
• `browser.scroll()` - Scroll page
• `browser.html` - Page HTML

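Longer Python snippets are easier to keep correct in a file than inside a double-quoted shell argument. This minimal sketch assumes `--file` accepts any readable path. The hypothetical `run_py` helper copies a here-document into a temporary `.py` file, runs it with `browser-use python --file`, and removes it afterwards.

```bash
# Hypothetical helper: read Python code from stdin into a temporary file,
# run it in the persistent session via --file, then clean up.
run_py() {
    _dir=$(mktemp -d) || return 1
    _script="$_dir/snippet.py"
    cat > "$_script"                 # the Python source arrives on stdin
    browser-use python --file "$_script"
    _status=$?                       # keep the command's exit status
    rm -rf "$_dir"
    return "$_status"
}

# Usage (the quoted 'EOF' stops the shell from expanding anything inside):
#   run_py <<'EOF'
#   browser.goto('https://example.com')
#   print(browser.url, browser.title)
#   EOF
```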
Agent Tasks (Requires API Key)

bash

browser-use run "Fill the contact form with test data"    # Run AI agent
browser-use run "Extract all product prices" --max-steps 50

Agent tasks use an LLM to complete multi-step browser tasks autonomously. They require BROWSER_USE_API_KEY or an API key for a configured LLM provider (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).

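A script can fail fast when no key is configured, rather than starting an agent run that cannot reach an LLM. The hypothetical `run_agent` wrapper below checks only the three variables named above before delegating to `browser-use run`. It does not consider other providers' variables or the `--api-key` option.

```bash
# Hypothetical guard: refuse to start an agent task unless one of the keys
# named above is set, then pass all arguments through to `browser-use run`.
run_agent() {
    if [ -z "${BROWSER_USE_API_KEY:-}${OPENAI_API_KEY:-}${ANTHROPIC_API_KEY:-}" ]; then
        echo "run_agent: set BROWSER_USE_API_KEY or an LLM API key first" >&2
        return 1
    fi
    browser-use run "$@"
}

# Usage: run_agent "Extract all product prices" --max-steps 50
```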
Session Management

bash

browser-use sessions                      # List active sessions
browser-use close                         # Close current session
browser-use close --all                   # Close all sessions

Server Control

bash

browser-use server status                 # Check if server is running
browser-use server stop                   # Stop server
browser-use server logs                   # View server logs

Global Options

Option	Description
`--session NAME`	Use named session (default: "default")
`--browser MODE`	Browser mode: chromium, real, remote
`--headed`	Show browser window (chromium mode)
`--profile NAME`	Chrome profile (real mode only)
`--json`	Output as JSON
`--api-key KEY`	Override API key

Session behavior: All commands without --session use the same "default" session. The browser stays open and is reused across commands. Use --session NAME to run multiple browsers in parallel.

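Because each named session has its own browser, independent pages can be opened concurrently from a script. This sketch assumes the CLI tolerates simultaneous invocations for different sessions; if it does not, drop the `&`. The helper and the session names `site1`, `site2`, and so on are hypothetical.

```bash
# Hypothetical helper: open each URL in its own named session, in parallel.
open_in_parallel() {
    _n=0
    for _url in "$@"; do
        _n=$((_n + 1))
        # Run each open as a background job with a distinct session name.
        browser-use --session "site$_n" open "$_url" &
    done
    wait   # wait for all jobs; this does not report individual failures
}

# Usage:
#   open_in_parallel https://example.com https://example.org
#   browser-use --session site2 state
#   browser-use close --all
```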
Examples

Form Submission

bash

browser-use open https://example.com/contact
browser-use state
# Example output (indices differ per page): [0] input "Name", [1] input "Email", [2] textarea "Message", [3] button "Submit"
browser-use input 0 "John Doe"
browser-use input 1 "john@example.com"
browser-use input 2 "Hello, this is a test message."
browser-use click 3
browser-use state  # Verify success

Multi-Session Workflows

bash

browser-use --session work open https://work.example.com
browser-use --session personal open https://personal.example.com
browser-use --session work state    # Check work session
browser-use --session personal state  # Check personal session
browser-use close --all             # Close both sessions

Data Extraction with Python

bash

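browser-use open https://example.com/products
browser-use python "
for _ in range(20):
    browser.scroll('down')   # scroll so lazy-loaded products render
html = browser.html          # page HTML after scrolling
with open('products.html', 'w') as f:
    f.write(html)            # keep the markup for offline parsing
browser.screenshot('products.png')
"
browser-use python "print(f'Saved {len(html)} characters of HTML')"  # html persists between calls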

Using Real Browser (Logged-In Sessions)

bash

browser-use --browser real open https://gmail.com
# Uses your actual Chrome with existing login sessions
browser-use state  # Already logged in, if Chrome was signed in to Gmail

Tips

• Always run `browser-use state` first to see the available elements and their indices
• Use `--headed` when debugging to see what the browser is doing
• Sessions persist: the browser stays open between commands
• Use `--json` when you need to parse output programmatically
• Python variables persist across `browser-use python` commands within a session
• Real browser mode preserves your login sessions and extensions

Troubleshooting

Browser won't start?

bash

browser-use server stop               # Stop any stuck server
browser-use --headed open <url>       # Try with visible window

Element not found?

bash

browser-use state                     # Check current elements
browser-use scroll down               # Element might be below fold
browser-use state                     # Check again

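The manual scroll-and-check loop can be automated. The hypothetical `scroll_until_visible` helper below scrolls until a given piece of text, such as a button label, appears in the `state` listing, or until a try limit is reached. Matching plain text in that listing is an assumption about its format.

```bash
# Hypothetical helper: scroll down until some text appears in the state
# listing, giving up after a fixed number of scrolls.
scroll_until_visible() {
    _text=$1
    _tries=${2:-10}   # maximum number of scrolls; 10 is arbitrary
    while :; do
        # grep -F matches the text literally; -q only sets the exit status.
        if browser-use state | grep -F -q -- "$_text"; then
            return 0
        fi
        [ "$_tries" -gt 0 ] || break
        browser-use scroll down >/dev/null || return 1
        _tries=$((_tries - 1))
    done
    printf "scroll_until_visible: '%s' not found\n" "$_text" >&2
    return 1
}

# Usage: scroll_until_visible "Submit" && browser-use state
```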
Session issues?

bash

browser-use sessions                  # Check active sessions
browser-use close --all               # Clean slate
browser-use open <url>                # Fresh start

Cleanup

Always close the browser when done. Run this after completing browser automation:

bash

browser-use close

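In a script, registering the close command with `trap` makes the cleanup happen even when a later step fails under `set -e` or the script calls `exit` early. This is a minimal POSIX shell sketch, and `cleanup_browser` is a hypothetical name.

```bash
# Close the browser whenever the script exits (normal end, `exit`, or an
# abort caused by `set -e`).
cleanup_browser() {
    # Ignore errors, e.g. when the session was already closed.
    browser-use close >/dev/null 2>&1 || true
}
trap cleanup_browser EXIT

# Automation steps follow, for example:
#   browser-use open https://example.com
#   browser-use state
```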