AgentSkillsCN

Browser

以“调试优先”的理念,打造始终可见的浏览器自动化工具。适用于浏览器操作、截屏、页面导航、Web 测试、UI 验证,以及“VERIFY”阶段的网页调试。会话默认自动启动,并按需捕获控制台与网络日志。

SKILL.md
--- frontmatter
name: Browser
version: 1.2.0
description: Debug-first browser automation with always-on visibility. USE WHEN browser, screenshot, navigate, web testing, verify UI, VERIFY phase, debug web page. Session auto-starts, captures console/network by default.

Browser - Debug-First Browser Automation

Browser automation with debugging visibility by DEFAULT.

Console logs, network requests, and errors are always captured - because debugging shouldn't be opt-in.


Philosophy: Debug-First

Traditional browser automation treats debugging as an afterthought. You run your test, it fails, and THEN you add logging to figure out why.

This skill flips that: debugging is enabled from the start. Every page load captures:

  • Console logs (errors, warnings, info)
  • Network requests and responses
  • Failed requests (4xx, 5xx)
  • Page load status

When something breaks, the diagnostic data already exists.


Session Architecture

Session auto-starts on first use. No explicit start command needed.

code
Any CLI command → Session running?
                      ├─ Yes → Execute command
                      └─ No → Auto-start session → Execute command

Key behaviors:

  • First command starts a persistent browser session
  • Session stays alive between commands (fast subsequent operations)
  • 30-minute idle timeout auto-cleans up zombie processes
  • State stored in /tmp/browser-session.json

CLI Commands (Primary Interface)

Location: $PAI_DIR/skills/Browser/Tools/Browse.ts

Primary Command - Navigate with Diagnostics

bash
bun run Browse.ts <url>

This is the main command. Navigates to the URL and outputs:

  • Screenshot path
  • Console errors (if any)
  • Console warnings (if any)
  • Failed requests (if any)
  • Network summary
  • Page load status

Example output:

code
📸 Screenshot: /tmp/browse-1704567890.png

🔴 Console Errors (1):
   • Uncaught TypeError: Cannot read property 'map' of undefined

🌐 Failed Requests (1):
   • GET /api/users → 500 Internal Server Error

📊 Network: 23 requests | 847KB | avg 156ms
⚠️ Page: "My App" loaded with issues

Query Commands

Check current session state without navigating:

bash
bun run Browse.ts errors      # Console errors only
bun run Browse.ts warnings    # Console warnings only
bun run Browse.ts console     # All console output
bun run Browse.ts network     # All network activity
bun run Browse.ts failed      # Failed requests (4xx, 5xx)

Interaction Commands

bash
bun run Browse.ts navigate <url>           # Navigate without diagnostics
bun run Browse.ts screenshot [path]        # Screenshot current page
bun run Browse.ts click <selector>         # Click element
bun run Browse.ts fill <selector> <value>  # Fill input field
bun run Browse.ts type <selector> <text>   # Type with delay
bun run Browse.ts eval "<javascript>"      # Execute JavaScript
bun run Browse.ts open <url>               # Open in default browser

Session Management

bash
bun run Browse.ts status     # Show session info
bun run Browse.ts restart    # Fresh session (clears logs)
bun run Browse.ts stop       # Stop session

STOP - CLI First, Always

The Wrong Pattern

DO NOT write new TypeScript code for simple browser tasks:

typescript
// WRONG - Writing new code defeats the purpose
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/src/index.ts'
const browser = new PlaywrightBrowser()
await browser.launch({ headless: true })
await browser.navigate('https://example.com')
await browser.screenshot({ path: '/tmp/shot.png' })
await browser.close()

Problems:

  • 5+ lines of boilerplate every time
  • Manual browser lifecycle management
  • No automatic diagnostic capture

The Right Pattern

USE the CLI - it handles everything:

bash
# One command = navigate + screenshot + diagnostics
bun run Browse.ts https://example.com

Decision Tree

code
                    What are you trying to do?
                              |
           ┌──────────────────┴──────────────────┐
           ▼                                     ▼
    ┌─────────────┐                      ┌─────────────┐
    │   SIMPLE    │                      │   COMPLEX   │
    │ Single task │                      │ Multi-step  │
    └─────────────┘                      └─────────────┘
           │                                     │
           ▼                                     ▼
    ┌─────────────┐                      ┌─────────────┐
    │ • Navigate  │                      │ • Form fill │
    │ • Screenshot│                      │ • Auth flow │
    │ • Click     │                      │ • Conditionals│
    │ • Fill      │                      │ • Scraping  │
    └─────────────┘                      └─────────────┘
           │                                     │
           ▼                                     ▼
    ┌─────────────┐                      ┌─────────────┐
    │ USE CLI     │                      │ USE WORKFLOW│
    │ Browse.ts   │                      │ or API      │
    └─────────────┘                      └─────────────┘

VERIFY Phase Integration

The Browser skill is MANDATORY for VERIFY phase of web changes.

Before claiming ANY web change is "live" or "working":

bash
# 1. Navigate with full diagnostics
bun run Browse.ts https://example.com/changed-page

# 2. View the screenshot
Read /tmp/browse-*.png

If you haven't LOOKED at the rendered page and its diagnostics, you CANNOT claim it works.


Debugging Workflow Example

Scenario: "Why isn't the user list loading?"

bash
# Step 1: Load the page with diagnostics
$ bun run Browse.ts https://myapp.com/users

📸 Screenshot: /tmp/browse-1704567890.png

🔴 Console Errors (1):
   • Uncaught TypeError: Cannot read property 'map' of undefined

🌐 Failed Requests (1):
   • GET /api/users → 500 Internal Server Error

📊 Network: 23 requests | 847KB | avg 156ms
⚠️ Page: "User List" loaded with issues

Immediately identified:

  1. API returning 500 error
  2. Frontend crashing because no data
  3. Specific error location
bash
# Step 2: Dig deeper
$ bun run Browse.ts console    # Full console output
$ bun run Browse.ts network    # All network activity
$ bun run Browse.ts failed     # Just the failures

Server Endpoints (for advanced use)

The persistent session runs an HTTP server on port 9222:

EndpointMethodDescription
/healthGETHealth check
/diagnosticsGETFull diagnostic summary
/consoleGETConsole logs
/networkGETNetwork activity
/navigatePOSTNavigate to URL
/clickPOSTClick element
/fillPOSTFill input
/screenshotPOSTTake screenshot
/stopPOSTStop server

Workflow Routing

For complex, multi-step tasks:

TriggerWorkflow
Fill forms, interact with pageWorkflows/Interact.md
Extract page contentWorkflows/Extract.md
Complex verification sequenceWorkflows/VerifyPage.md
Screenshot with custom optionsWorkflows/Screenshot.md

TypeScript API (Advanced)

Only use this for custom automation that CLI cannot handle.

typescript
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/src/index.ts'

const browser = new PlaywrightBrowser()
await browser.launch({ headless: true })
await browser.navigate('https://example.com')
// ... custom logic ...
await browser.close()

API Reference

Navigation: launch(), navigate(), goBack(), goForward(), reload(), close()

Capture: screenshot(), getVisibleText(), getVisibleHtml(), savePdf(), getAccessibilityTree()

Interaction: click(), fill(), type(), select(), pressKey(), hover(), drag(), uploadFile()

Monitoring: getConsoleLogs(), getNetworkLogs(), getNetworkStats(), clearNetworkLogs()

Waiting: waitForSelector(), waitForText(), waitForNavigation(), waitForNetworkIdle(), wait()

Viewport: resize(), setDevice()


Token Savings

ApproachTokensNotes
Playwright MCP~13,700Loaded at startup
CLI tool~0Executes pre-written code
TypeScript API~50-200Only what you write
CLI Savings99%+Compared to MCP