AgentSkillsCN

browser-automation

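Remotely control a Chrome browser via CDP for tab management, page navigation, screenshots, and automation. Use it to: (1) find existing browser tabs by domain without opening new ones; (2) navigate to a given URL in an existing or new tab; (3) take screenshots of web pages; (4) automate browser interactions; (5) manage multiple browser contexts and pages.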

SKILL.md
--- frontmatter
name: browser-automation
description: "Control remote Chrome browser via CDP for tab management, navigation, screenshots, and automation. Use when: (1) Finding existing browser tabs by domain without creating duplicates, (2) Navigating to specific URLs in existing or new tabs, (3) Taking screenshots of web pages, (4) Automating browser interactions, (5) Managing multiple browser contexts and pages."

Browser Automation Skill

Control a remote Chrome browser via the Chrome DevTools Protocol (CDP) for reliable tab management and automation.

Core Features

  • No duplicate tabs: Reuses an existing tab for the domain before creating a new one
  • Domain-based search: Matches tabs by domain (handles x.com/twitter.com variations)
  • Automatic navigation: Opens the target URL in the found or newly created tab
  • CDP integration: Returns tab ID and WebSocket URL for direct control
  • Cross-context search: Checks ALL browser contexts, not just the first
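The cross-context search can be pictured as a loop over every context and every page in it. The following is a minimal sketch, not the skill's actual code: `find_page_in_contexts` is a hypothetical helper, and it assumes Playwright-style objects where each context exposes a `pages` list and each page has a `url` attribute. Treating subdomains as matches is also an assumption.

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def find_page_in_contexts(contexts, domain):
    """Return (context_index, page) for the first page whose host matches domain.

    Hypothetical helper: it walks every context, not just contexts[0],
    and returns (None, None) when nothing matches.
    """
    domain = domain.lower()
    for ctx_index, context in enumerate(contexts):
        for page in context.pages:
            host = (urlparse(page.url).hostname or "").lower()
            # Accept the bare domain and any subdomain (e.g. www.github.com).
            if host == domain or host.endswith("." + domain):
                return ctx_index, page
    return None, None


# Stand-in objects that mimic the shape of Playwright contexts and pages.
contexts = [
    SimpleNamespace(pages=[SimpleNamespace(url="https://example.org/")]),
    SimpleNamespace(pages=[SimpleNamespace(url="https://www.github.com/explore")]),
]
ctx_index, page = find_page_in_contexts(contexts, "github.com")
print(ctx_index, page.url)
```

With a live connection, the same function could be called with `browser.contexts` from Playwright; the match here sits in the second context, which a first-context-only search would miss.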

Prerequisites

  • Chrome running with remote debugging enabled (e.g., --remote-debugging-port=9222)
  • CDP endpoint reachable from the machine running the skill (e.g., http://10.0.9.105:9222)
  • Python 3.8+ with playwright installed

Quick Start

python
import asyncio
import sys

sys.path.insert(0, '<skill-path>/scripts')  # make browser_tools importable
from browser_tools import find_or_create_tab

async def example():
    # Find a tab on domain "x.com" (or create one) and navigate it to the URL
    page, browser, p, is_new, cdp_id, ws_url = await find_or_create_tab(
        "x.com",                           # Domain to search for
        "https://x.com/oraclesrun"         # URL to open in the tab
    )

    # is_new = False if an existing tab was reused, True if a new one was created
    # cdp_id = Chrome DevTools tab ID
    # ws_url = WebSocket URL for direct CDP connection

    await page.screenshot(path="/tmp/screenshot.png")

    # Close the connection but keep the tab open
    await browser.close()
    await p.stop()

asyncio.run(example())

Main Functions

find_or_create_tab(domain, target_url=None)

Find an existing tab by domain, or create a new one if none matches, then navigate to target_url when it is provided.

Parameters:

  • domain (str): Domain to search for (e.g., "x.com", "github.com")
  • target_url (str, optional): URL to navigate to in the tab

Returns: (page, browser, p, is_new, cdp_id, ws_url)

  • page: Playwright page object
  • browser: Playwright browser object
  • p: Playwright instance
  • is_new: Boolean (False=reused existing, True=created new)
  • cdp_id: Chrome DevTools tab ID
  • ws_url: WebSocket debugger URL

Example:

python
# Inside an async function: find the x.com tab and navigate to a specific profile
page, browser, p, is_new, cdp_id, ws_url = await find_or_create_tab(
    "x.com",
    "https://x.com/elonmusk"
)

if is_new:
    print(f"Created new tab: {cdp_id}")
else:
    print(f"Reused existing tab: {cdp_id}")

list_tabs()

List all open tabs across ALL browser contexts with CDP information.

Returns: List of dicts with keys:

  • index: Tab index
  • context: Context index
  • cdp_id: Chrome DevTools tab ID
  • websocket_url: WebSocket debugger URL
  • url: Tab URL
  • domain: Extracted domain
  • title: Page title

Example:

python
tabs = await list_tabs()
for tab in tabs:
    print(f"{tab['cdp_id']}: {tab['url']}")
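For orientation, Chrome's remote debugging port also serves the target list as JSON at `/json/list`, with `id`, `type`, `url`, `title`, and `webSocketDebuggerUrl` fields. The sketch below shows how such entries could be mapped onto the keys listed above. Whether `list_tabs()` uses this endpoint internally is not stated in this document, so `targets_to_tabs` and `fetch_tabs` are hypothetical names, and the `context` key is omitted because `/json/list` does not report it.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def targets_to_tabs(targets):
    """Convert raw /json/list entries into dicts shaped like list_tabs() output."""
    tabs = []
    for target in targets:
        if target.get("type") != "page":  # skip service workers, iframes, etc.
            continue
        url = target.get("url", "")
        tabs.append({
            "index": len(tabs),
            "cdp_id": target.get("id"),
            "websocket_url": target.get("webSocketDebuggerUrl"),
            "url": url,
            "domain": urlparse(url).hostname or "",
            "title": target.get("title", ""),
        })
    return tabs


def fetch_tabs(cdp_url, timeout=5):
    """Fetch and convert the live target list (needs a reachable CDP endpoint)."""
    with urlopen(f"{cdp_url}/json/list", timeout=timeout) as response:
        return targets_to_tabs(json.load(response))


# Illustrative payload; the IDs and URLs are made up for this example.
sample = [
    {"id": "AAA111", "type": "page", "url": "https://x.com/home", "title": "Home",
     "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/page/AAA111"},
    {"id": "BBB222", "type": "service_worker", "url": "https://x.com/sw.js",
     "title": "sw", "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/page/BBB222"},
]
tabs = targets_to_tabs(sample)
print(tabs)
```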

Domain Matching

The skill handles domain variations automatically:

Search Domain | Matches URLs
x.com | x.com, twitter.com
twitter.com | twitter.com, x.com
github.com | github.com only
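The matching rules in the table can be sketched with a small alias table. This is an illustration under assumptions, not the skill's implementation: `ALIAS_GROUPS`, `equivalent_domains`, and `url_matches_domain` are hypothetical names, and counting subdomains such as www.x.com as matches is an assumption the table does not confirm.

```python
from urllib.parse import urlparse

# Hypothetical alias table: each set lists domains treated as the same site.
ALIAS_GROUPS = [{"x.com", "twitter.com"}]


def equivalent_domains(domain):
    """Return every domain considered equivalent to the one searched for."""
    domain = domain.lower()
    for group in ALIAS_GROUPS:
        if domain in group:
            return group
    return {domain}  # no aliases known: the domain only matches itself


def url_matches_domain(url, domain):
    """True if the URL's host is the domain, one of its aliases, or a subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == candidate or host.endswith("." + candidate)
        for candidate in equivalent_domains(domain)
    )


print(url_matches_domain("https://twitter.com/home", "x.com"))
print(url_matches_domain("https://x.com/home", "github.com"))
```

Comparing parsed hostnames rather than searching for a substring keeps a host like notx.com from matching x.com.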

Command Line Usage

bash
# List all tabs
python3 browser_tools.py list_tabs

# Find or create tab
python3 browser_tools.py find_or_create_tab <domain> [url]

# Examples:
python3 browser_tools.py find_or_create_tab x.com https://x.com/home
python3 browser_tools.py find_or_create_tab github.com
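The command shape above (a subcommand, a required domain, an optional URL) maps naturally onto `argparse` subparsers. The following sketch shows one way such a command line could be parsed; `build_parser` is a hypothetical name, and the real `browser_tools.py` may parse its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the two documented commands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="browser_tools.py")
    commands = parser.add_subparsers(dest="command", required=True)

    commands.add_parser("list_tabs", help="List all open tabs")

    find = commands.add_parser("find_or_create_tab", help="Find or create a tab")
    find.add_argument("domain", help="Domain to search for, e.g. x.com")
    find.add_argument("url", nargs="?", default=None, help="Optional URL to open")
    return parser


parser = build_parser()
with_url = parser.parse_args(["find_or_create_tab", "x.com", "https://x.com/home"])
without_url = parser.parse_args(["find_or_create_tab", "github.com"])
print(with_url.command, with_url.domain, with_url.url)
print(without_url.command, without_url.domain, without_url.url)
```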

Best Practices

Always check is_new

python
page, browser, p, is_new, cdp_id, ws_url = await find_or_create_tab("x.com")

if is_new:
    print("WARNING: Created new tab - check for duplicates!")

Never close tabs, only connections

python
# Correct: Close browser connection, keep tab
await browser.close()
await p.stop()

# Wrong: Don't close the page/tab
# await page.close()  # ❌ Don't do this
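To make sure the connection is released even when an operation fails partway, the two cleanup calls can be wrapped in an async context manager. This is a sketch under assumptions: `tab_session` is a hypothetical helper, and `open_tab` stands for any coroutine function with the same return shape as `find_or_create_tab`. Stand-in objects are used so the example runs without a browser.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def tab_session(open_tab, domain, target_url=None):
    """Yield the page, then always disconnect without closing the tab."""
    page, browser, p, is_new, cdp_id, ws_url = await open_tab(domain, target_url)
    try:
        yield page
    finally:
        # Disconnect only; page.close() is deliberately never called.
        await browser.close()
        await p.stop()


class _Recorder:
    """Stand-in for the browser and Playwright objects; records cleanup calls."""

    def __init__(self, log, name):
        self.log, self.name = log, name

    async def close(self):
        self.log.append(f"{self.name}.close")

    async def stop(self):
        self.log.append(f"{self.name}.stop")


async def demo():
    log = []

    async def fake_open_tab(domain, target_url=None):
        return "page", _Recorder(log, "browser"), _Recorder(log, "p"), False, "ID", "ws://"

    try:
        async with tab_session(fake_open_tab, "x.com"):
            raise RuntimeError("simulated failure mid-operation")
    except RuntimeError:
        pass
    return log


calls = asyncio.run(demo())
print(calls)
```

In real use the first argument would be `find_or_create_tab` itself, for example `async with tab_session(find_or_create_tab, "x.com") as page:`.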

Work without focus

The skill operates in the background via CDP, so there is no need to bring the browser window to the front.

Configuration

Edit browser_tools.py to change the CDP endpoint:

python
CDP_URL = "http://10.0.9.105:9222"  # Change to your Chrome debug URL
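If editing the file for each environment is inconvenient, one common alternative is to read the endpoint from an environment variable and fall back to the hard-coded value. The skill itself does not document this: `BROWSER_CDP_URL` and `resolve_cdp_url` are hypothetical names used only for illustration.

```python
import os

DEFAULT_CDP_URL = "http://10.0.9.105:9222"  # same default as in browser_tools.py


def resolve_cdp_url(environ=None):
    """Return the CDP endpoint, preferring the (hypothetical) BROWSER_CDP_URL variable."""
    environ = os.environ if environ is None else environ
    # Strip a trailing slash so paths like /json/version can be appended safely.
    return environ.get("BROWSER_CDP_URL", DEFAULT_CDP_URL).rstrip("/")


print(resolve_cdp_url({}))
print(resolve_cdp_url({"BROWSER_CDP_URL": "http://127.0.0.1:9222/"}))
```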

Troubleshooting

Duplicate tabs created

  • Check the version: browser_tools.__version__ should be 1.0.4 or later
  • Verify that the CDP endpoint is correct
  • Check that Chrome is running with remote debugging enabled

Connection refused

  • Verify Chrome is running with --remote-debugging-port=9222
  • Check firewall rules for the CDP port
  • Ensure CDP_URL matches your Chrome debug endpoint
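A quick way to work through the checks above is to request `/json/version`, an HTTP endpoint that Chrome serves on the remote debugging port. The sketch below uses only the standard library; `probe_cdp` is a hypothetical helper, and the address in the demo is a placeholder for your own endpoint.

```python
import http.client
import json
from urllib.request import urlopen


def probe_cdp(cdp_url, timeout=3):
    """Return Chrome's /json/version payload as a dict, or None if unreachable."""
    try:
        with urlopen(f"{cdp_url.rstrip('/')}/json/version", timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and URLError;
        # ValueError covers a malformed URL or a non-JSON reply.
        return None


info = probe_cdp("http://127.0.0.1:9222")  # placeholder address
if info is None:
    print("CDP endpoint not reachable: check the port, firewall, and CDP_URL")
else:
    print("Connected to", info.get("Browser"))
```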

Page operations timeout

  • Some operations (such as screenshots) may time out on heavy pages
  • Call page.wait_for_load_state() before running such operations
  • Consider waiting for the "networkidle" or "domcontentloaded" state instead of the default "load"
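The advice above can be combined into a helper that waits for a load state with an upper bound and only then takes the screenshot. This is a sketch, with `screenshot_when_loaded` as a hypothetical name; it bounds the wait with `asyncio.wait_for` rather than Playwright's own `timeout` argument so that it runs without Playwright installed, and it uses a stand-in page object for the demonstration.

```python
import asyncio


async def screenshot_when_loaded(page, path, state="domcontentloaded", timeout_s=15.0):
    """Wait for a load state, then take a screenshot; return False on timeout."""
    try:
        # Playwright's page.wait_for_load_state accepts "load",
        # "domcontentloaded", or "networkidle".
        await asyncio.wait_for(page.wait_for_load_state(state), timeout=timeout_s)
    except asyncio.TimeoutError:
        return False
    await page.screenshot(path=path)
    return True


class _FakePage:
    """Stand-in for a Playwright page so the sketch runs without a browser."""

    def __init__(self, load_seconds):
        self.load_seconds = load_seconds
        self.shots = []

    async def wait_for_load_state(self, state):
        await asyncio.sleep(self.load_seconds)

    async def screenshot(self, path):
        self.shots.append(path)


fast, slow = _FakePage(0.0), _FakePage(5.0)
fast_ok = asyncio.run(screenshot_when_loaded(fast, "/tmp/fast.png", timeout_s=0.5))
slow_ok = asyncio.run(screenshot_when_loaded(slow, "/tmp/slow.png", timeout_s=0.05))
print(fast_ok, slow_ok)
```

Returning False instead of raising lets the caller decide whether to retry with a lighter state such as "domcontentloaded" or to skip the screenshot.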

Version History

  • 1.0.4: Fixed navigation - now opens target_url in found tabs
  • 1.0.3: Added CDP ID and WebSocket URL in output
  • 1.0.2: Fixed tab search across ALL contexts
  • 1.0.1: Added x.com/twitter.com domain matching
  • 1.0.0: Initial release