AgentSkillsCN

browser-control

通过控制平面的浏览器自动化 API 控制 Chrome 浏览器。适用于自动化网页交互、截取屏幕截图、提取页面内容,或执行诸如点击、输入文本、滚动等浏览器操作。控制平面的端点位于 /browser/*(默认端口 18420)。

SKILL.md
--- frontmatter
name: browser-control
description: Control Chrome browser via the Control Plane's browser automation API. Use when automating web interactions, taking screenshots, extracting page content, or performing browser actions like clicking, typing, scrolling. Endpoints available at /browser/* on the Control Plane (default port 18420).

Browser Control API

Automate Chrome via CDP (Chrome DevTools Protocol) using Playwright.

Prerequisites

Chrome must be running with remote debugging:

bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

Endpoints

MethodPathDescription
GET/browser/statusCheck connection status
POST/browser/connectConnect to Chrome via CDP
GET/browser/tabsList open tabs
POST/browser/navigateNavigate to URL
GET/browser/snapshotGet AI-readable page snapshot
POST/browser/screenshotCapture screenshot (base64 PNG)
POST/browser/actPerform actions (click, type, etc.)
POST/browser/evaluateExecute JavaScript
POST/browser/closeClose a page

Quick Start

bash
# Check status
curl -s 'http://127.0.0.1:18420/browser/status' -H 'x-cp-token: dev'

# Connect to browser
curl -s -X POST 'http://127.0.0.1:18420/browser/connect' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' -d '{}'

# Navigate
curl -s -X POST 'http://127.0.0.1:18420/browser/navigate' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'

# Get snapshot for AI analysis
curl -s 'http://127.0.0.1:18420/browser/snapshot' -H 'x-cp-token: dev'

# Screenshot (save to file)
curl -s -X POST 'http://127.0.0.1:18420/browser/screenshot' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"fullPage": true}' | jq -r '.image' | base64 -d > screenshot.png

Actions (POST /browser/act)

ActionRequiredOptionalExample
clickreftargetId{"kind":"click","ref":"#btn"}
dblclickreftargetId{"kind":"dblclick","ref":".item"}
typeref, textsubmit, targetId{"kind":"type","ref":"#input","text":"hello","submit":true}
presskeytargetId{"kind":"press","key":"Enter"}
hoverreftargetId{"kind":"hover","ref":".menu"}
selectref, valuestargetId{"kind":"select","ref":"#dropdown","values":["US"]}
waittimeMs or waitForTexttargetId{"kind":"wait","waitForText":"Success"}
scroll-direction, targetId{"kind":"scroll","direction":"down"}

Element Reference Formats

The ref parameter accepts:

  • CSS selector: #id, .class, button
  • Aria ref (from snapshot): e72 (pattern: e<number>)
  • XPath: /html/body/div/button (starts with /)
  • Playwright pseudo: :has-text("Submit")

Action Examples

bash
# Click button
curl -s -X POST 'http://127.0.0.1:18420/browser/act' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"kind":"click","ref":"#submit"}'

# Type and submit
curl -s -X POST 'http://127.0.0.1:18420/browser/act' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"kind":"type","ref":"#search","text":"query","submit":true}'

# Press key
curl -s -X POST 'http://127.0.0.1:18420/browser/act' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"kind":"press","key":"Escape"}'

# Wait for text
curl -s -X POST 'http://127.0.0.1:18420/browser/act' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"kind":"wait","waitForText":"Loading complete"}'

# Scroll down
curl -s -X POST 'http://127.0.0.1:18420/browser/act' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"kind":"scroll","direction":"down"}'

Evaluate JavaScript

bash
# Get page title
curl -s -X POST 'http://127.0.0.1:18420/browser/evaluate' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"expression":"document.title"}'

# Get all links
curl -s -X POST 'http://127.0.0.1:18420/browser/evaluate' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"expression":"Array.from(document.querySelectorAll(\"a\")).map(a=>a.href)"}'

Tab Management

bash
# List tabs
curl -s 'http://127.0.0.1:18420/browser/tabs' -H 'x-cp-token: dev'
# Response: {"ok":true,"tabs":[{"targetId":"page_abc123","url":"...","title":""}]}

# Close specific tab
curl -s -X POST 'http://127.0.0.1:18420/browser/close' \
  -H 'x-cp-token: dev' -H 'Content-Type: application/json' \
  -d '{"targetId":"page_abc123"}'

Response Format

All endpoints return JSON:

json
{"ok": true, ...}           // Success
{"ok": false, "error": "..."} // Error

Workflow Example

bash
# 1. Connect and navigate
curl -X POST 'http://127.0.0.1:18420/browser/connect' -H 'x-cp-token: dev' -H 'Content-Type: application/json' -d '{}'
curl -X POST 'http://127.0.0.1:18420/browser/navigate' -H 'x-cp-token: dev' -H 'Content-Type: application/json' -d '{"url":"https://github.com"}'

# 2. Get snapshot to find element refs
curl 'http://127.0.0.1:18420/browser/snapshot' -H 'x-cp-token: dev' | jq '.snapshot'

# 3. Interact using refs from snapshot
curl -X POST 'http://127.0.0.1:18420/browser/act' -H 'x-cp-token: dev' -H 'Content-Type: application/json' -d '{"kind":"click","ref":"e42"}'

# 4. Screenshot result
curl -X POST 'http://127.0.0.1:18420/browser/screenshot' -H 'x-cp-token: dev' -H 'Content-Type: application/json' -d '{}' | jq -r '.image' | base64 -d > result.png

Notes

  • Default CDP URL: http://127.0.0.1:9222
  • Control Plane dev port: 18420, stable port: 8420
  • targetId is a synthetic hash of the page URL (changes on navigation)
  • Snapshot uses Playwright's _snapshotForAI when available, falls back to accessibility tree