AgentSkillsCN

browser-automation

交互式网页任务、浏览器登录、点击、滚动、表单交互、已认证会话、Playwright MCP(项目)

SKILL.md
--- frontmatter
name: browser-automation
description: Interactive web tasks, browser login, click, scroll, form interaction, authenticated sessions, Playwright MCP (project)

Browser Automation Skill

Use the Playwright MCP server for interactive web tasks that require login, clicking, scrolling, or form interaction.

Quick Start

When to use Browser Automation:

  • Pages requiring authentication (login, sessions)
  • Interactive UI (clicking buttons, filling forms)
  • Dynamic content (scrolling to load more)
  • Screenshots of authenticated pages
  • Multi-step web workflows

When NOT to use Browser (use these instead):

  • WebSearch: Public information searches
  • WebFetch: Reading public pages (no login)
  • curl: Simple API calls

Tool Selection Guide

NeedToolWhy
Search the webWebSearchFast, no browser needed
Read a public URLWebFetchSimpler, converts HTML to markdown
Login to a sitePlaywright BrowserHandles auth, cookies, sessions
Click a buttonPlaywright BrowserInteractive elements
Fill a formPlaywright BrowserForm fields, validation
Scroll to loadPlaywright BrowserDynamic/infinite scroll
Screenshot dashboardPlaywright BrowserAuthenticated page
API endpointcurlDirect HTTP, no browser

Examples: When to Use Each

Use WebSearch/WebFetch

code
"What's the latest React documentation?"
→ Use WebSearch

"Summarize this blog post: https://example.com/post"
→ Use WebFetch

"Find documentation on Python asyncio"
→ Use WebSearch

Use Playwright Browser

code
"Check my Twitter mentions"
→ Use browser_navigate to twitter.com, then browser_snapshot

"Scroll through my LinkedIn feed"
→ Use browser_navigate, then browser_evaluate with scroll

"Fill out this form and submit"
→ Use browser_fill_form or browser_type + browser_click

"Take a screenshot of my dashboard"
→ Use browser_navigate + browser_take_screenshot

"Click the export button on this SaaS tool"
→ Use browser_snapshot to find ref, then browser_click

Playwright MCP Tools

ToolPurposeWhen to Use
browser_navigateGo to URLStart any session
browser_snapshotGet accessibility treeBefore clicking (find element refs)
browser_clickClick elementButtons, links, interactive elements
browser_typeType into fieldText inputs, search boxes
browser_fill_formFill multiple fieldsLogin forms, multi-field forms
browser_take_screenshotCapture imageVisual documentation
browser_evaluateRun JavaScriptScroll, complex interactions
browser_press_keyPress keyboard keyEnter, Escape, arrows
browser_hoverHover over elementReveal tooltips, dropdowns
browser_select_optionSelect dropdownDropdown menus
browser_wait_forWait for text/timeContent loading
browser_tabsManage tabsMultiple pages
browser_closeClose browserEnd session

Common Workflows

Login to a Site

code
1. browser_navigate → url: "https://site.com/login"
2. browser_snapshot → find form field refs
3. browser_fill_form → username, password fields
4. browser_click → submit button (ref from snapshot)
5. browser_wait_for → text: "Dashboard" (or success indicator)

Extract Content After Login

code
1. Login workflow (above)
2. browser_navigate → target page
3. browser_snapshot → get page content as accessibility tree
4. Process the content

Scroll and Load More

code
1. browser_navigate → page with infinite scroll
2. browser_evaluate → function: "async (page) => { window.scrollTo(0, document.body.scrollHeight); }"
3. browser_wait_for → time: 2 (seconds for content to load)
4. browser_snapshot → get loaded content
5. Repeat 2-4 as needed

Fill and Submit Form

code
1. browser_navigate → form page
2. browser_snapshot → find field refs
3. browser_fill_form → fields: [
     {name: "Email", type: "textbox", ref: "R123", value: "email@example.com"},
     {name: "Phone", type: "textbox", ref: "R124", value: "555-1234"}
   ]
4. browser_click → submit button ref
5. browser_wait_for → confirmation text

Persistent Sessions

The Playwright MCP can maintain logged-in sessions across Claude Code restarts:

bash
# Setup with persistent browser profile
claude mcp add playwright -- npx @playwright/mcp@latest --user-data-dir ~/.playwright-data/pcp-browser

Benefits:

  • Login once, stay logged in
  • Cookies and local storage preserved
  • Faster subsequent interactions
  • No re-authentication needed

Browser Snapshot vs Screenshot

MethodOutputUse For
browser_snapshotAccessibility tree (text)Finding elements, extracting text, action refs
browser_take_screenshotImage fileVisual documentation, debugging

Rule: Use browser_snapshot for actions, browser_take_screenshot for visuals.

When User Says...

User RequestWhat To Do
"Check my Twitter mentions"Navigate to twitter.com/notifications, snapshot
"Log in to X and check Y"Login workflow, then navigate + snapshot
"Scroll through my feed"Navigate, evaluate scroll, wait, snapshot
"Fill out this form"Navigate, snapshot for refs, fill_form, click submit
"Screenshot my dashboard"Navigate, wait for load, take_screenshot
"Click the button that says X"Snapshot to find ref, click with ref
"What's on this page?"If public: WebFetch. If login: browser + snapshot

Error Handling

Browser not installed:

code
Error: Browser not found
→ Use browser_install tool

Element not found:

code
Error: Element ref not found
→ Run browser_snapshot first to get current refs

Login expired:

code
Content shows login page instead of expected content
→ Re-authenticate or use persistent user-data-dir

Integration with PCP

Browser automation works with PCP capabilities:

  1. Extract → Capture: Get content via browser, capture to vault

    python
    content = browser_snapshot()
    smart_capture(f"From Twitter feed: {content}")
    
  2. Browser → Email: Draft email based on browser data

    python
    data = browser_get_form_data()
    create_draft(to="x@y.com", subject="Form data", body=data)
    
  3. Browser → Knowledge: Store permanent facts found in browser

    python
    price = extract_from_browser()
    add_knowledge(f"Widget price is {price}", category="fact")
    

Related Skills

  • native-tools - When CLI tools suffice (curl for APIs)
  • email-processing - For Outlook email (use Graph API, not browser)
  • vault-operations - Capture browser-extracted content
  • knowledge-base - Store permanent facts from browser research