Browser - Debug-First Browser Automation
Browser automation with debugging visibility by DEFAULT.
Console logs, network requests, and errors are always captured - because debugging shouldn't be opt-in.
Philosophy: Debug-First
Traditional browser automation treats debugging as an afterthought. You run your test, it fails, and THEN you add logging to figure out why.
This skill flips that: debugging is enabled from the start. Every page load captures:
- Console logs (errors, warnings, info)
- Network requests and responses
- Failed requests (4xx, 5xx)
- Page load status
When something breaks, the diagnostic data already exists.
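As a rough illustration of what "the diagnostic data already exists" can look like, the sketch below models captured entries and filters out the failures. The type names (`ConsoleEntry`, `NetworkEntry`, `Diagnostics`) and both helper functions are hypothetical and are not taken from Browse.ts. Only the 4xx/5xx rule comes from the list above, and the sample data is invented for the example.

```typescript
// Hypothetical shapes for captured diagnostics; not Browse.ts's actual types.
type ConsoleEntry = { level: "error" | "warning" | "info"; text: string };
type NetworkEntry = { method: string; url: string; status: number };

interface Diagnostics {
  console: ConsoleEntry[];
  network: NetworkEntry[];
}

// A request counts as failed when its status is in the 4xx or 5xx range.
function failedRequests(d: Diagnostics): NetworkEntry[] {
  return d.network.filter((r) => r.status >= 400 && r.status <= 599);
}

// Console entries logged at the "error" level.
function consoleErrors(d: Diagnostics): ConsoleEntry[] {
  return d.console.filter((e) => e.level === "error");
}

// Invented sample data, for illustration only.
const sample: Diagnostics = {
  console: [
    { level: "error", text: "Uncaught TypeError: Cannot read property 'map' of undefined" },
    { level: "info", text: "App mounted" },
  ],
  network: [
    { method: "GET", url: "/api/users", status: 500 },
    { method: "GET", url: "/index.html", status: 200 },
  ],
};

console.log(failedRequests(sample).length, consoleErrors(sample).length);
```

Because the entries are collected on every page load, queries like these can run after a failure without re-running anything.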
Session Architecture
Session auto-starts on first use. No explicit start command needed.
Any CLI command → Session running?
├─ Yes → Execute command
└─ No → Auto-start session → Execute command
Key behaviors:
- First command starts a persistent browser session
- Session stays alive between commands (fast subsequent operations)
- 30-minute idle timeout auto-cleans up zombie processes
- State stored in /tmp/browser-session.json
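The auto-start decision and the idle timeout can be sketched as a pure function. This is a minimal illustration under assumptions: the `SessionState` fields (`pid`, `lastUsedMs`) and the `decide` helper are hypothetical, since the actual contents of /tmp/browser-session.json are not described here.

```typescript
// Hypothetical state shape; the real /tmp/browser-session.json layout may differ.
interface SessionState {
  pid: number;
  lastUsedMs: number; // epoch milliseconds of the most recent command
}

const IDLE_TIMEOUT_MS = 30 * 60 * 1000; // 30-minute idle timeout

type Action = "execute" | "start-then-execute";

// Decide what a CLI command should do given the stored state (null if none exists).
function decide(state: SessionState | null, nowMs: number): Action {
  if (state === null) return "start-then-execute";
  const idleForMs = nowMs - state.lastUsedMs;
  // A session idle past the timeout is assumed to have been cleaned up.
  return idleForMs >= IDLE_TIMEOUT_MS ? "start-then-execute" : "execute";
}

const now = 1704567890000;
console.log(decide(null, now));
console.log(decide({ pid: 1234, lastUsedMs: now - 60000 }, now));
console.log(decide({ pid: 1234, lastUsedMs: now - IDLE_TIMEOUT_MS }, now));
```

Keeping the decision separate from file and process handling makes the timeout rule easy to check in isolation.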
CLI Commands (Primary Interface)
Location: $PAI_DIR/skills/Browser/Tools/Browse.ts
Primary Command - Navigate with Diagnostics
bun run Browse.ts <url>
This is the main command. Navigates to the URL and outputs:
- Screenshot path
- Console errors (if any)
- Console warnings (if any)
- Failed requests (if any)
- Network summary
- Page load status
Example output:
📸 Screenshot: /tmp/browse-1704567890.png
🔴 Console Errors (1):
   • Uncaught TypeError: Cannot read property 'map' of undefined
🌐 Failed Requests (1):
   • GET /api/users → 500 Internal Server Error
📊 Network: 23 requests | 847KB | avg 156ms
⚠️ Page: "My App" loaded with issues
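A network summary line of this kind can be derived from per-request records. The sketch below shows one way to compute it, assuming hypothetical fields `bytes` and `durationMs`. The rounding rules are also assumptions and not documented Browse.ts behavior.

```typescript
// Hypothetical per-request record; field names are assumptions.
interface RequestRecord {
  bytes: number;
  durationMs: number;
}

// Build a summary in the style "N requests | XKB | avg Yms".
function networkSummary(records: RequestRecord[]): string {
  const count = records.length;
  const totalBytes = records.reduce((sum, r) => sum + r.bytes, 0);
  const totalMs = records.reduce((sum, r) => sum + r.durationMs, 0);
  const kb = Math.round(totalBytes / 1024);
  // Avoid dividing by zero when nothing was requested.
  const avgMs = count === 0 ? 0 : Math.round(totalMs / count);
  return `${count} requests | ${kb}KB | avg ${avgMs}ms`;
}

const records: RequestRecord[] = [
  { bytes: 2048, durationMs: 100 },
  { bytes: 1024, durationMs: 200 },
];
console.log(networkSummary(records)); // "2 requests | 3KB | avg 150ms"
```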
Query Commands
Check current session state without navigating:
bun run Browse.ts errors      # Console errors only
bun run Browse.ts warnings    # Console warnings only
bun run Browse.ts console     # All console output
bun run Browse.ts network     # All network activity
bun run Browse.ts failed      # Failed requests (4xx, 5xx)
Interaction Commands
bun run Browse.ts navigate <url>            # Navigate without diagnostics
bun run Browse.ts screenshot [path]         # Screenshot current page
bun run Browse.ts click <selector>          # Click element
bun run Browse.ts fill <selector> <value>   # Fill input field
bun run Browse.ts type <selector> <text>    # Type with delay
bun run Browse.ts eval "<javascript>"       # Execute JavaScript
bun run Browse.ts open <url>                # Open in default browser
Session Management
bun run Browse.ts status      # Show session info
bun run Browse.ts restart     # Fresh session (clears logs)
bun run Browse.ts stop        # Stop session
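The sketch below shows one way a first argument could be told apart as a subcommand or a bare URL. The subcommand names come from the command lists above. The `parseArgs` function and its return shape are hypothetical and do not describe Browse.ts's actual parser.

```typescript
// Subcommand names as listed in this document.
const SUBCOMMANDS = [
  "errors", "warnings", "console", "network", "failed",
  "navigate", "screenshot", "click", "fill", "type", "eval", "open",
  "status", "restart", "stop",
] as const;

type Subcommand = (typeof SUBCOMMANDS)[number];

type Parsed =
  | { kind: "diagnose"; url: string } // bare URL: navigate with diagnostics
  | { kind: "subcommand"; name: Subcommand; args: string[] }
  | { kind: "usage" }; // no arguments given

// Hypothetical parser: a known first word is a subcommand, anything else a URL.
function parseArgs(argv: string[]): Parsed {
  const [first, ...rest] = argv;
  if (first === undefined) return { kind: "usage" };
  if ((SUBCOMMANDS as readonly string[]).includes(first)) {
    return { kind: "subcommand", name: first as Subcommand, args: rest };
  }
  return { kind: "diagnose", url: first };
}

console.log(parseArgs(["https://example.com"]));
console.log(parseArgs(["click", "#submit"]));
```

Treating any unrecognized first argument as a URL is what keeps the primary command as short as `bun run Browse.ts <url>`.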
STOP - CLI First, Always
The Wrong Pattern
DO NOT write new TypeScript code for simple browser tasks:
// WRONG - Writing new code defeats the purpose
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/src/index.ts'
const browser = new PlaywrightBrowser()
await browser.launch({ headless: true })
await browser.navigate('https://example.com')
await browser.screenshot({ path: '/tmp/shot.png' })
await browser.close()
Problems:
- 5+ lines of boilerplate every time
- Manual browser lifecycle management
- No automatic diagnostic capture
The Right Pattern
USE the CLI - it handles everything:
# One command = navigate + screenshot + diagnostics
bun run Browse.ts https://example.com
Decision Tree
          What are you trying to do?
                       |
       ┌───────────────┴───────────────┐
       ▼                               ▼
┌──────────────┐               ┌────────────────┐
│    SIMPLE    │               │    COMPLEX     │
│ Single task  │               │   Multi-step   │
└──────────────┘               └────────────────┘
       │                               │
       ▼                               ▼
┌──────────────┐               ┌────────────────┐
│ • Navigate   │               │ • Form fill    │
│ • Screenshot │               │ • Auth flow    │
│ • Click      │               │ • Conditionals │
│ • Fill       │               │ • Scraping     │
└──────────────┘               └────────────────┘
       │                               │
       ▼                               ▼
┌──────────────┐               ┌────────────────┐
│   USE CLI    │               │  USE WORKFLOW  │
│  Browse.ts   │               │     or API     │
└──────────────┘               └────────────────┘
VERIFY Phase Integration
The Browser skill is MANDATORY for the VERIFY phase of web changes.
Before claiming ANY web change is "live" or "working":
# 1. Navigate with full diagnostics
bun run Browse.ts https://example.com/changed-page
# 2. View the screenshot
Read /tmp/browse-*.png
If you haven't LOOKED at the rendered page and its diagnostics, you CANNOT claim it works.
Debugging Workflow Example
Scenario: "Why isn't the user list loading?"
# Step 1: Load the page with diagnostics
$ bun run Browse.ts https://myapp.com/users
📸 Screenshot: /tmp/browse-1704567890.png
🔴 Console Errors (1):
   • Uncaught TypeError: Cannot read property 'map' of undefined
🌐 Failed Requests (1):
   • GET /api/users → 500 Internal Server Error
📊 Network: 23 requests | 847KB | avg 156ms
⚠️ Page: "User List" loaded with issues
Immediately identified:
- API returning 500 error
- Frontend crashing because no data
- Specific error location
# Step 2: Dig deeper
$ bun run Browse.ts console   # Full console output
$ bun run Browse.ts network   # All network activity
$ bun run Browse.ts failed    # Just the failures
Server Endpoints (for advanced use)
The persistent session runs an HTTP server on port 9222:
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /diagnostics | GET | Full diagnostic summary |
| /console | GET | Console logs |
| /network | GET | Network activity |
| /navigate | POST | Navigate to URL |
| /click | POST | Click element |
| /fill | POST | Fill input |
| /screenshot | POST | Take screenshot |
| /stop | POST | Stop server |
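For advanced use, the GET endpoints could be queried along these lines. This is a hedged sketch: the `localhost` host is an assumption (only the port is given above), the response format is not documented here so the body is returned as raw text, and `getRaw` and `FetchLike` are hypothetical names. The fetch function is injected so that the example runs against a stub without a live session.

```typescript
// Assumption: the session server listens on localhost; only the port is documented.
const BASE_URL = "http://localhost:9222";

// GET endpoints listed in the table above.
type GetEndpoint = "/health" | "/diagnostics" | "/console" | "/network";

// Minimal fetch-like signature, so a stub can stand in for a live server.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Fetch an endpoint and return the raw body; no parsing is attempted
// because the response format is not specified in this document.
async function getRaw(endpoint: GetEndpoint, fetchImpl: FetchLike): Promise<string> {
  const res = await fetchImpl(`${BASE_URL}${endpoint}`);
  if (!res.ok) throw new Error(`${endpoint} returned HTTP ${res.status}`);
  return res.text();
}

// Stub used for demonstration; it echoes the requested URL.
const stub: FetchLike = async (url) => ({
  ok: true,
  status: 200,
  text: async () => `stubbed response for ${url}`,
});

getRaw("/health", stub).then((body) => console.log(body));
```

With a session running, passing the runtime's built-in `fetch` in place of `stub` would query the real server.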
Workflow Routing
For complex, multi-step tasks:
| Trigger | Workflow |
|---|---|
| Fill forms, interact with page | Workflows/Interact.md |
| Extract page content | Workflows/Extract.md |
| Complex verification sequence | Workflows/VerifyPage.md |
| Screenshot with custom options | Workflows/Screenshot.md |
TypeScript API (Advanced)
Only use this for custom automation that the CLI cannot handle.
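// $PAI_DIR is a placeholder: import specifiers are not shell-expanded, so substitute the resolved path.
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/src/index.ts'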
const browser = new PlaywrightBrowser()
await browser.launch({ headless: true })
await browser.navigate('https://example.com')
// ... custom logic ...
await browser.close()
API Reference
Navigation: launch(), navigate(), goBack(), goForward(), reload(), close()
Capture: screenshot(), getVisibleText(), getVisibleHtml(), savePdf(), getAccessibilityTree()
Interaction: click(), fill(), type(), select(), pressKey(), hover(), drag(), uploadFile()
Monitoring: getConsoleLogs(), getNetworkLogs(), getNetworkStats(), clearNetworkLogs()
Waiting: waitForSelector(), waitForText(), waitForNavigation(), waitForNetworkIdle(), wait()
Viewport: resize(), setDevice()
Token Savings
| Approach | Tokens | Notes |
|---|---|---|
| Playwright MCP | ~13,700 | Loaded at startup |
| CLI tool | ~0 | Executes pre-written code |
| TypeScript API | ~50-200 | Only what you write |
| CLI Savings | 99%+ | Compared to MCP |
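The savings figure follows from simple arithmetic on the table's approximate token counts. The sketch below works it through; `savingsPercent` is an illustrative helper.

```typescript
// Approximate MCP baseline from the table above.
const MCP_TOKENS = 13700;

// Percentage of tokens saved relative to the MCP baseline.
function savingsPercent(tokens: number): number {
  return (1 - tokens / MCP_TOKENS) * 100;
}

console.log(savingsPercent(0).toFixed(1)); // "100.0" (CLI tool, ~0 tokens)
console.log(savingsPercent(50).toFixed(1)); // "99.6" (TypeScript API, low end)
console.log(savingsPercent(200).toFixed(1)); // "98.5" (TypeScript API, high end)
```

By the same calculation, the TypeScript API also saves roughly 98.5% to 99.6% relative to MCP.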