Browser Automation

This skill provides comprehensive guidance for browser automation using the browser_subagent tool.

Overview

The browser_subagent enables autonomous browser control with automatic session recording. All interactions are captured as WebP videos saved to the artifacts directory.

Core Capabilities

•Navigation - Open URLs, navigate pages, handle redirects
•Interaction - Click, type, scroll, hover, drag-and-drop
•Extraction - Read DOM content, capture screenshots, scrape data
•Verification - Test user flows, validate UI changes
•Recording - Automatic video capture of all sessions

When to Use

Use browser_subagent instead of read_url_content when:

•JavaScript execution is required
•User interaction is needed (forms, clicks, navigation)
•Authentication or session state is required
•Dynamic content loads after page render
•Visual verification or screenshots are needed
•Recording demonstrations or tutorials

Use read_url_content for static HTML content where JavaScript isn't needed.
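
The criteria above can be sketched as a small decision helper. This is a minimal illustration in Python, not part of either tool: PageNeeds and choose_tool are hypothetical names introduced here, and only the two tool names come from this document.

```python
from dataclasses import dataclass


@dataclass
class PageNeeds:
    """Hypothetical checklist mirroring the bullets above."""
    needs_javascript: bool = False
    needs_interaction: bool = False
    needs_session: bool = False
    has_dynamic_content: bool = False
    needs_screenshots: bool = False
    needs_recording: bool = False


def choose_tool(needs: PageNeeds) -> str:
    """Suggest a tool name: any requirement set means a real browser is needed."""
    if any(vars(needs).values()):
        return "browser_subagent"
    return "read_url_content"


print(choose_tool(PageNeeds()))                        # static page
print(choose_tool(PageNeeds(needs_interaction=True)))  # form or click flow
```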

Tool Parameters

TaskName (required)

Human-readable title for the browser task.

•Should be properly capitalized
•Example: "Testing Login Flow", "Scraping Product Data"
•Avoid URLs or technical jargon

Task (required)

Detailed instructions for the browser subagent. Be explicit about:

•What to do
•When to stop
•What information to return

Critical: The subagent is autonomous and one-shot. Provide comprehensive instructions upfront.

RecordingName (required)

Filename for the video recording.

•All lowercase with underscores
•Maximum 3 words
•Describes what the recording contains
•Example: login_flow_demo, checkout_process
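
The parameter rules above can be checked before a call is made. The following is a rough sketch in Python under stated assumptions: BrowserTaskRequest and validate are hypothetical helpers, not part of the tool, the capitalization check only looks at the first character, and allowing digits in RecordingName is a guess.

```python
import re
from dataclasses import dataclass

# One to three lowercase words joined by underscores, e.g. "login_flow_demo".
# Digits are allowed here as an assumption; the rules above do not mention them.
RECORDING_NAME = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+){0,2}")


@dataclass(frozen=True)
class BrowserTaskRequest:
    """Hypothetical container for the three required parameters."""
    task_name: str
    task: str
    recording_name: str


def validate(request: BrowserTaskRequest) -> list[str]:
    """Return a list of problems; an empty list means the request looks fine."""
    problems = []
    name = request.task_name.strip()
    if not name:
        problems.append("TaskName is required")
    else:
        if "://" in name:
            problems.append("TaskName should not contain a URL")
        if not name[0].isupper():
            problems.append("TaskName should be capitalized")
    if not request.task.strip():
        problems.append("Task is required")
    if not RECORDING_NAME.fullmatch(request.recording_name):
        problems.append(
            "RecordingName must be 1-3 lowercase words joined by underscores"
        )
    return problems
```

Running validate on a request with RecordingName "Login Flow Demo Now" would report the naming problem, while "login_flow_demo" passes.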

Best Practices

1. Clear Task Instructions

Bad: Vague, with no completion criteria

code

Task: Go to example.com and check it

Good: Specific, with clear completion criteria

code

Task: Navigate to https://example.com, wait for the page to fully load, 
verify that the main heading contains "Welcome", capture a screenshot of 
the page, then return the page title and the text content of the main 
heading.

2. Return Conditions

Always specify what the subagent should return:

code

Task: Navigate to the product page at https://shop.example.com/products/123
and extract the following data:
- Product title
- Price
- Availability status
- Number of reviews

Return this information in a structured format when complete.
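
One way to make "a structured format" concrete is to ask for JSON and check the reply. This is only a sketch: it assumes the subagent's result comes back as plain text and that it honors the requested format, neither of which this document guarantees. return_clause and parse_result are hypothetical helpers.

```python
import json


def return_clause(fields: list[str]) -> str:
    """Build a sentence to append to a Task, naming the exact keys wanted."""
    keys = ", ".join(f'"{field}"' for field in fields)
    return (
        "When complete, return a single JSON object with exactly these keys: "
        f"{keys}. Use null for any value that cannot be found."
    )


def parse_result(text: str, fields: list[str]) -> dict:
    """Parse the returned text and fail loudly if a requested key is absent."""
    data = json.loads(text)
    missing = [field for field in fields if field not in data]
    if missing:
        raise ValueError(f"result is missing keys: {missing}")
    return data


fields = ["title", "price", "availability", "review_count"]
print(return_clause(fields))
```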

3. Error Handling

Instruct the subagent how to handle failures:

code

Task: Attempt to log in to https://app.example.com with username "testuser" 
and password "testpass123". If login succeeds, navigate to the dashboard 
and return the user's display name. If login fails, capture a screenshot 
of the error message and return the error text.

4. Wait Conditions

Specify wait conditions for dynamic content:

code

Task: Navigate to https://example.com/search, type "widgets" into the 
search box, click the search button, and wait until the results list 
appears (look for element with class "search-results"). Once results load, 
count the number of result items and return that count.

5. Multi-Step Flows

Break down complex flows into clear steps:

code

Task: Complete the following checkout flow:
1. Navigate to https://shop.example.com
2. Click "Add to Cart" on the first product
3. Click the cart icon in the top right
4. Click "Proceed to Checkout"
5. Fill in the shipping form with test data
6. Capture a screenshot of the order summary
7. Return the total price shown on the order summary
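
A numbered Task like the one above can be assembled from a list of steps, which keeps long flows easy to edit. This is a minimal sketch; build_task is a hypothetical helper, and the final "Return ..." step follows the pattern used in this section.

```python
def build_task(goal: str, steps: list[str], returns: str) -> str:
    """Join a goal line and numbered steps, ending with an explicit return step."""
    numbered = steps + [f"Return {returns}"]
    lines = [f"{goal}:"]
    lines += [f"{i}. {step}" for i, step in enumerate(numbered, start=1)]
    return "\n".join(lines)


task = build_task(
    "Complete the following checkout flow",
    [
        "Navigate to https://shop.example.com",
        'Click "Add to Cart" on the first product',
        "Click the cart icon in the top right",
    ],
    "the total price shown on the order summary",
)
print(task)
```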

Common Patterns

Authentication Testing

code

TaskName: "Testing User Login"
Task: Navigate to https://app.example.com/login, enter "user@example.com" 
in the email field, enter "password123" in the password field, click the 
"Sign In" button, and wait for navigation to complete. If login succeeds and 
you see a dashboard, return "Login successful". If there's an error message, 
return the error text.
RecordingName: login_test

Data Scraping

code

TaskName: "Scraping Product Listings"
Task: Navigate to https://shop.example.com/products, wait for all product 
cards to load, then extract the title and price from each product card. 
Return a list of products with their titles and prices. If pagination 
exists, only scrape the first page.
RecordingName: product_scrape

Form Submission

code

TaskName: "Submitting Contact Form"
Task: Navigate to https://example.com/contact, fill in the form with:
- Name: "Test User"
- Email: "test@example.com"
- Message: "This is a test message"
Then click the submit button and wait for the confirmation message. 
Return the confirmation message text.
RecordingName: contact_form

Screenshot Capture

code

TaskName: "Capturing Homepage Design"
Task: Navigate to https://example.com, wait for complete page load including 
all images, scroll to show the full page layout, capture a full-page 
screenshot, and return confirmation that the screenshot was saved.
RecordingName: homepage_capture

UI Verification

code

TaskName: "Verifying Responsive Layout"
Task: Navigate to https://example.com, resize the browser window to mobile 
width (375px), capture a screenshot, then resize to desktop width (1920px), 
and capture another screenshot. Return the dimensions used and confirm both 
screenshots were captured.
RecordingName: responsive_check

Element Selectors

The browser subagent can find elements using:

•CSS selectors
•Text content
•ARIA labels
•Position/proximity
•Visual descriptions

Be specific when describing elements:

Good:

•"Click the blue 'Submit' button at the bottom of the form"
•"Type into the input field labeled 'Email Address'"
•"Click the first product card in the grid"

Avoid:

•"Click the button" (which button?)
•"Fill in the field" (which field?)

Waiting Strategies

Wait for Navigation

code

After clicking "Submit", wait for the page to navigate to the success page.

Wait for Elements

code

Wait until the spinner disappears and the results table is visible.

Wait for Content

code

Wait until the product count shows a number greater than 0.

Fixed Delays (use sparingly)

code

Wait 3 seconds for animations to complete.

Recording Best Practices

Naming Convention

•Use lowercase with underscores
•Be descriptive but concise
•Maximum 3 words
•Examples: login_flow, checkout_test, nav_demo, form_submit
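
A name that follows this convention can be derived from a task title. The sketch below is illustrative only: make_recording_name is a hypothetical helper, and truncating to the first three words is one possible way to satisfy the word limit.

```python
import re


def make_recording_name(title: str, max_words: int = 3) -> str:
    """Lowercase the title, keep alphanumeric words, join the first few with underscores."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    if not words:
        raise ValueError("title contains no usable words")
    return "_".join(words[:max_words])


print(make_recording_name("Testing Login Flow"))
```

Because truncation can drop the most descriptive word, it is worth reading the generated name before using it.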

Recording Purpose

Recordings are automatically saved and useful for:

•Debugging failed automation
•Demonstrating user flows
•Documenting test results
•Creating tutorials
•Reviewing UI behavior

Advanced Techniques

Session State

The browser maintains state during a single subagent execution:

•Cookies persist across navigation
•Login sessions remain active
•Form data can carry forward
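
Since state persists only within a single execution, steps that depend on a logged-in session belong in the same Task as the login itself. A minimal sketch of composing such a Task follows; compose_session_task is a hypothetical helper and the wording of the joined text is only a suggestion.

```python
def compose_session_task(setup: str, action: str) -> str:
    """Combine session setup and the dependent action into one Task string."""
    return f"{setup.strip()} Then, in the same session: {action.strip()}"


task = compose_session_task(
    'Log in at https://app.example.com/login as "user@example.com".',
    "open the dashboard and return the user's display name.",
)
print(task)
```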

Multiple Tabs

If needed, the subagent can work with multiple tabs:

code

Task: Open https://example.com in the current tab, then open a new tab 
and navigate to https://example.com/compare. Switch between tabs to 
compare data from both pages.

File Downloads

code

Task: Navigate to https://example.com/downloads, click the "Download Report" 
button, wait for the download to complete, and return confirmation.

Iframes

code

Task: Navigate to https://example.com, locate the embedded iframe 
containing the video player, switch context to that iframe, then click 
the play button.

Error Recovery

If the browser tool encounters issues:

•The subagent will report what went wrong
•Read the error message carefully
•Adjust your Task instructions
•Try again with more specific instructions or wait conditions

Common issues:

•Element not found: Be more specific about element description
•Timeout: Add explicit wait conditions or increase wait time
•Navigation failed: Check that the URL is valid and the network is reachable
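
The mapping from common issues to adjustments can be kept as a small lookup for the person revising the Task. This is a sketch under assumptions: the matched phrases are hypothetical, since the exact wording of the subagent's error reports is not documented here, and suggest_fixes is an illustrative helper.

```python
# Keys are lowercase phrases assumed to appear in an error report.
FIXES = {
    "element not found": "Describe the element more specifically (label, text, position).",
    "timeout": "Add an explicit wait condition or allow more time.",
    "navigation failed": "Check that the URL is valid and the network is reachable.",
}


def suggest_fixes(error_message: str) -> list[str]:
    """Return the adjustments whose trigger phrase appears in the message."""
    lowered = error_message.lower()
    return [fix for phrase, fix in FIXES.items() if phrase in lowered]


for fix in suggest_fixes("Timeout while waiting for the results table"):
    print(fix)
```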

Performance Tips

•Be Specific: Clear selectors are faster than vague descriptions
•Minimize Waits: Only wait when necessary; don't add arbitrary delays
•Single Purpose: One task per browser_subagent call
•Return Fast: Return as soon as the required information is collected

Examples

See the examples/ directory for complete working examples:

•examples/login_test.md - Authentication flow
•examples/form_automation.md - Form submission
•examples/data_extraction.md - Web scraping
•examples/ui_testing.md - Visual verification

Limitations

•Each subagent call is independent (no session sharing between calls)
•Cannot execute arbitrary JavaScript (but can interact with page elements)
•Video recordings use system resources (keep sessions focused)
•Some sites may block automation (CAPTCHA, bot detection)

Integration with Workflows

Browser automation pairs well with:

•Testing workflows: Automate E2E tests
•Data collection: Scrape and process information
•Documentation: Record user flows automatically
•Verification: Validate deployments

When NOT to Use

Avoid browser automation when:

•Static HTML scraping is sufficient → use read_url_content
•API endpoints are available → use direct API calls
•File processing is needed → use file manipulation tools
•The task requires human judgment (solving a CAPTCHA, subjective visual review)