browser

Unified browser automation skill. A single entry point for Rodney CLI, agent-browser CLI, dev-browser (TypeScript/Playwright), and the browser-tools helper scripts.

SKILL.md
---
name: browser
description: Unified browser automation skill. Single entry point for Rodney CLI, agent-browser CLI, dev-browser (TypeScript/Playwright), and browser-tools helper scripts.
---

Unified Browser Automation Skill

Use this as the single browser instruction set.

It covers four methods, each optimized for different tasks:

  1. Rodney CLI (fastest command-by-command automation)
  2. agent-browser CLI (ARIA snapshot refs + rich CLI surface)
  3. dev-browser (TypeScript + Playwright for complex workflows)
  4. browser-tools scripts (interactive picker, profile-copy launch, readability extraction)

Quick Tool Chooser

| Situation | Use | Why |
| --- | --- | --- |
| 1–3 quick CLI operations, assertions, waits, screenshots/PDF | Rodney | Lowest per-command overhead and broad built-ins |
| Need ARIA snapshot refs (@e1) and agent-browser workflow | agent-browser | Best ref-based element discovery + robust CLI |
| 3+ linked steps, branching logic, loops, network interception, complex form/app flows | dev-browser | Full Playwright API in one script |
| Need user-assisted element selection on visible page | browser-tools browser-pick.js | Interactive click-to-select picker |
| Need to reuse local Chrome profile copy for auth/cookies | browser-tools browser-start.js --profile | Copies profile into automation dir |
| Need markdown content extraction from dynamic page | browser-tools browser-content.js | Readability + Turndown pipeline |

Default selection order

When no special constraints are given:

  1. Start with Rodney
  2. Use agent-browser if snapshot refs or agent-browser-specific features are needed
  3. Use dev-browser for scripted multi-step workflows
  4. Use browser-tools for picker/profile/content-extract specific needs

Performance Heuristics (local measurements)

Recent runs on this machine:

  • Rodney: title read ~3ms (warm), screenshot ~36ms, start ~232ms
  • agent-browser: title read ~152ms
  • browser-tools (after timer fix): eval ~201ms, nav ~257ms
  • dev-browser: server startup ~1–1.5s, batched script (~5 reads + screenshot) ~642ms

Interpretation:

  • Rodney is best for fast incremental CLI loops.
  • dev-browser wins once multiple operations are batched in one TS run.
  • browser-tools is now viable for quick helpers (the old 5s lag is gone).

Port / Session Rules (important)

  • Rodney
    • rodney start --local creates a directory-scoped session (./.rodney/)
    • rodney connect <host:port> attaches to an existing Chrome instead
  • browser-tools uses Chrome remote debugging on :9222
  • dev-browser uses an HTTP control server on :9222 and Chrome CDP on :9223

⚠️ Do not run browser-tools and dev-browser simultaneously on :9222.

If needed, clear port 9222 first:

```bash
lsof -ti:9222 | xargs -r kill
```
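Because the one-liner above kills every process that `lsof` reports for :9222, it can be worth checking who holds the port first. The following is a minimal sketch, assuming `lsof` is installed; the function name `port_is_free` is illustrative and not part of any tool described here. The `-sTCP:LISTEN` filter limits the match to listening sockets, so clients that are merely connected to the port are not reported.

```bash
# Hypothetical helper (not part of browser-tools or dev-browser).
# Returns 0 if nothing listens on the given TCP port, 1 otherwise.
port_is_free() {
  local port="$1" pids
  # -t: print PIDs only; -sTCP:LISTEN: listeners only, not connected clients
  pids="$(lsof -t -iTCP:"$port" -sTCP:LISTEN 2>/dev/null)"
  if [ -n "$pids" ]; then
    # Unquoted $pids collapses one-PID-per-line output onto a single line
    echo "port $port is held by PID(s): $(echo $pids)" >&2
    return 1
  fi
  return 0
}
```

A possible use is `port_is_free 9222 || echo "stop the other tool before starting this one"`, which leaves the decision to kill anything with the caller.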

Method A: Rodney CLI (primary quick CLI)

Common flow

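```bash
rodney start --local                      # start a session stored in ./.rodney/
rodney open https://example.com --local   # navigate to the page
rodney title --local                      # print the page title
rodney waitidle --local                   # wait for the page to go idle
rodney screenshot --local /tmp/page.png   # save a screenshot
rodney stop --local                       # shut the session down
```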

Useful commands

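```bash
rodney text <selector> --local                    # print an element's text
rodney click <selector> --local                   # click an element
rodney input <selector> "value" --local           # type a value into a field
rodney exists <selector> --local                  # check whether the selector matches
rodney assert '<js expression>' [expected] --local  # check a JS expression
rodney pages --local                              # list open pages
rodney newpage [url] --local                      # open a new page
rodney ax-tree --local --depth 2                  # dump the accessibility tree
rodney ax-find --role link --local --json         # find accessibility nodes by role
rodney pdf --local /tmp/page.pdf                  # save the page as PDF
```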

Notes

  • rodney with no subcommand prints help and exits with code 2 (expected).
  • Use rodney connect 127.0.0.1:9222 --local to drive an already-running Chrome.
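Since exit code 2 signals a usage or runtime error, a script may want to tell that apart from a check that simply did not pass. The wrapper below is a sketch under an assumption this document does not state: that `rodney exists` exits 0 when the selector matches and 1 when it does not. Confirm that against `rodney --help` before relying on it. The name `rodney_has` is hypothetical.

```bash
# Hypothetical wrapper around `rodney exists`.
# Assumed exit codes: 0 = found, 1 = not found, 2 = usage/runtime error.
rodney_has() {
  local selector="$1" status=0
  rodney exists "$selector" --local >/dev/null 2>&1 || status=$?
  case "$status" in
    0) echo "found" ;;
    1) echo "missing" ;;
    *) echo "rodney error (exit $status): check the subcommand and arguments" >&2 ;;
  esac
  # Pass the original exit code through so callers can still branch on it
  return "$status"
}
```

With this in place, `rodney_has 'h1' || echo "no heading yet"` reacts to a failed check, while a code-2 failure still prints a separate diagnostic on stderr.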

Method B: agent-browser CLI (ref-based automation)

Common flow

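```bash
agent-browser open "https://example.com"   # open the page
agent-browser snapshot                     # list elements with their @eN refs
agent-browser click @e5                    # click the element with ref e5
agent-browser get title                    # print the page title
agent-browser screenshot /tmp/page.png     # save a screenshot
agent-browser close                        # close the browser
```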

Best when

  • You want snapshot refs (@e1, @e2) rather than CSS selectors
  • You are already in an agent-browser workflow/session
  • You need its broader CLI feature set (network, storage, trace, etc.)
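When refs are used from a script rather than read by eye, the ref for a known label has to be pulled out of the snapshot text. The helper below is a sketch only: it assumes snapshot lines carry the ref in the form `[ref=e5]`, which should be checked against the actual output of `agent-browser snapshot`. The name `ref_for` is hypothetical.

```bash
# Hypothetical helper. Assumes snapshot lines look like:
#   - link "More information..." [ref=e5]
# Reads snapshot text on stdin and prints the first matching ref as @eN.
ref_for() {
  local label="$1"
  # grep -F treats the label as a fixed string; sed extracts the ref id
  grep -F -- "$label" | sed -n 's/.*\[ref=\(e[0-9][0-9]*\)].*/@\1/p' | head -n 1
}
```

One way to use it would be `agent-browser click "$(agent-browser snapshot | ref_for 'More information')"`. An empty result means no line matched, so a caller should check for that before clicking.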

Method C: dev-browser (TypeScript + Playwright)

Use for robust, multi-step logic and richer scripting.

Start server

<skill_dir> = directory containing this SKILL.md

```bash
cd <skill_dir>
./server.sh --headless
```

Script pattern

```bash
cd <skill_dir> && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

// Connect to the running dev-browser server and get a named page
const client = await connect();
const page = await client.page("workflow");

await page.goto("https://example.com");
await waitForPageLoad(page);

// Read the title and save a screenshot in the same run
console.log(await page.title());
await page.screenshot({ path: "/tmp/workflow.png" });

await client.disconnect();
EOF
```

Best when

  • You need loops/conditionals/data transforms
  • You need Playwright-native APIs (routing/interception, advanced waits)
  • You want to batch many steps into one run
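Because the script pattern only works when run from `<skill_dir>`, a small wrapper can make that hard to forget. This is a sketch, not part of dev-browser: the function name `dev_browser_run` is hypothetical, and it relies only on `npx tsx` reading the script from stdin, as in the pattern above.

```bash
# Hypothetical wrapper: run a TypeScript snippet from stdin inside the skill
# directory so that the "@/client.js" import resolves.
# Usage: dev_browser_run <skill_dir> <<'EOF' ... EOF
dev_browser_run() {
  local dir="$1"
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "usage: dev_browser_run <skill_dir>  (not a directory: '$dir')" >&2
    return 1
  fi
  # The subshell keeps the caller's working directory unchanged
  ( cd "$dir" && npx tsx )
}
```

The heredoc from the script pattern can be passed to it unchanged. The caller's shell stays in its own directory afterwards, which is convenient when the same session also drives Rodney with `--local`.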

Method D: browser-tools helper scripts

Use these scripts for the specific workflows where they are still the best fit.

Setup (once)

```bash
cd <skill_dir>/../browser-tools
npm install
```

Start Chrome for browser-tools

```bash
<skill_dir>/../browser-tools/browser-start.js             # start Chrome without your profile
<skill_dir>/../browser-tools/browser-start.js --profile   # start Chrome with a copy of your profile
```

Core helpers

```bash
<skill_dir>/../browser-tools/browser-nav.js https://example.com       # navigate
<skill_dir>/../browser-tools/browser-eval.js 'document.title'         # evaluate JavaScript in the page
<skill_dir>/../browser-tools/browser-screenshot.js                    # take a screenshot
<skill_dir>/../browser-tools/browser-cookies.js                       # show cookies
<skill_dir>/../browser-tools/browser-content.js https://example.com   # extract page content as markdown
<skill_dir>/../browser-tools/browser-pick.js "Select the elements to scrape"   # interactive picker
```

Best when

  • You need interactive element picking (browser-pick.js)
  • You want readable markdown extraction (browser-content.js)
  • You want a quick profile-copy launch (--profile)
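The helper paths are long, so a shorthand can reduce typing errors in interactive use. The sketch below is purely illustrative: both the function name `bt` and the `BROWSER_TOOLS_DIR` variable are assumptions and are not provided by browser-tools.

```bash
# Hypothetical shorthand:  bt nav https://example.com
#   runs  $BROWSER_TOOLS_DIR/browser-nav.js https://example.com
# BROWSER_TOOLS_DIR is an assumed variable, e.g. "<skill_dir>/../browser-tools".
bt() {
  if [ "$#" -lt 1 ]; then
    echo "usage: bt <start|nav|eval|screenshot|cookies|content|pick> [args...]" >&2
    return 2
  fi
  local name="$1" script
  shift
  script="${BROWSER_TOOLS_DIR:-.}/browser-$name.js"
  # Fail early with a clear message instead of a raw "command not found"
  [ -x "$script" ] || { echo "no executable helper: $script" >&2; return 1; }
  "$script" "$@"
}
```

After `export BROWSER_TOOLS_DIR=<skill_dir>/../browser-tools`, a call such as `bt eval 'document.title'` maps onto the corresponding core helper.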

Smart Decision Checklist for Agents

Before choosing a method, ask:

  1. How many steps?
    • Few quick steps → Rodney
    • Many scripted steps → dev-browser
  2. Need interactive user picking?
    • Yes → browser-pick.js
  3. Need snapshot refs (@eX)?
    • Yes → agent-browser
  4. Need profile-copy Chrome startup on :9222?
    • Yes → browser-tools browser-start.js --profile
  5. Any :9222 conflicts?
    • Ensure only one of browser-tools/dev-browser owns that port
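The checklist can be written down as a small function, which makes the order of the questions explicit. This is a sketch under stated assumptions: the checklist does not say which rule wins when several apply, so the precedence used here (picker, then profile, then refs, then step count) and the threshold of more than 3 steps for dev-browser are illustrative choices. The name `choose_browser_tool` is hypothetical.

```bash
# Hypothetical helper encoding the checklist above.
# Assumptions: rule precedence and the ">3 steps" threshold are illustrative.
# Usage: choose_browser_tool <steps> [pick=yes|no] [refs=yes|no] [profile=yes|no]
choose_browser_tool() {
  local steps="${1:-1}" pick="${2:-no}" refs="${3:-no}" profile="${4:-no}"
  if [ "$pick" = yes ]; then
    echo "browser-tools browser-pick.js"
  elif [ "$profile" = yes ]; then
    echo "browser-tools browser-start.js --profile"
  elif [ "$refs" = yes ]; then
    echo "agent-browser"
  elif [ "$steps" -gt 3 ]; then
    echo "dev-browser"
  else
    echo "rodney"
  fi
}
```

For example, `choose_browser_tool 2` falls through to Rodney, while `choose_browser_tool 8` selects dev-browser. Port ownership (question 5) is left out on purpose, since it is a runtime check and not a choice of tool.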

Quick Troubleshooting

  • Could not connect to browser (:9222)
    • Start the correct service (browser-tools browser-start.js or the dev-browser server)
    • Check for port conflicts (lsof -i :9222)
  • dev-browser scripts fail to import @/client.js
    • Run scripts from <skill_dir>
  • Rodney command exits with code 2 unexpectedly
    • Verify the subcommand and arguments; code 2 indicates a usage or runtime error
  • Need clean isolation per project
    • Prefer rodney --local sessions