AgentSkillsCN

browser-test

为E2E浏览器测试创建、更新并管理浏览器使用的YAML测试提示。 根据代码变更自动生成针对性的测试定义。当新增前端页面、API解析器,或面向用户的特性时,该技能会自动触发。

SKILL.md
--- frontmatter
name: browser-test
description: |
  Create, update, and manage browser-use YAML test prompts for E2E browser testing.
  Analyzes code changes to generate targeted test definitions. Auto-triggers when new
  frontend pages, API resolvers, or user-facing features are added.

Browser Test Skill

AI-powered end-to-end browser testing using browser-use. Manages YAML test prompts in tests/browser/ and executes them via the browser-use CLI. Complements static analysis skills (a11y-audit, ux-review, performance-check) with actual runtime browser validation.

Arguments

  • $ARGUMENTS -- One of:
    • Path to a test directory (default: tests/browser/)
    • Path to a specific YAML test file
    • A natural-language task description (runs as ad-hoc test)
    • create -- Generate new test prompts based on recent code changes
    • list -- List all existing browser test files
    • dry-run -- Validate YAML test files without executing

Auto-Trigger Criteria

This skill suggests creating or updating browser tests when it detects:

Change TypeDetectionExample
New page/routeNew file in pages/, app/, routes/app/settings/page.tsx
New API resolverNew GraphQL resolver or REST endpointresolvers.ts changes
Auth changesFiles matching auth, login, sessionmiddleware/auth.ts
Form additionsNew <form> elements or form handlerscomponents/checkout-form.tsx

When auto-triggered, suggest test prompts inline without blocking workflow.


Instructions

Step 0: Consult Knowledge Base

Before starting, check for known patterns relevant to browser testing:

bash
~/.claude/scripts/learning_capture.sh query --language typescript --format llm
~/.claude/scripts/learning_capture.sh query --language general --format llm

If the knowledge base contains relevant antipatterns or insights:

  • Include them as additional considerations when generating test prompts
  • Flag known UI/UX issues that tests should cover

This step is non-blocking -- if the knowledge base is empty or the query fails, proceed with the standard workflow.

Phase 1: Environment Detection

Check for browser-use availability and test infrastructure:

bash
# Check browser-use CLI
command -v browser-use

# Check Python environment
python3 -c "import browser_use" 2>/dev/null

# Check for existing test directory
ls tests/browser/*.yaml 2>/dev/null

If browser-use is not installed:

  • Report as skip (not fail)

  • Print install instructions:

    bash
    pip install browser-use
    # or
    pipx install browser-use
    
  • Continue with test prompt creation/validation (YAML-only, no execution)

Phase 2: Test Discovery

Scan for YAML test prompts:

bash
find tests/browser/ -name "*.yaml" -o -name "*.yml" | sort

For each file, validate the schema:

FieldRequiredTypeDescription
taskYesstringNatural language task for the browser agent
judge_contextYeslist[string]Success criteria for evaluation
max_stepsNoint (default: 15)Maximum agent steps
urlNostringStarting URL (overrides task navigation)
tagsNolist[string]Categorization tags (smoke, auth, crud, a11y)

Report invalid YAML files with specific validation errors.

Phase 3: Execute Tests

For each valid test file, run via the wrapper script:

bash
~/.claude/scripts/browser_test.sh run tests/browser/smoke-test.yaml

Or for all tests:

bash
~/.claude/scripts/browser_test.sh run-all tests/browser/

Execution behavior:

  • 60-second timeout per test (configurable via --timeout)
  • Headless mode by default (--headless)
  • Screenshots saved to /tmp/browser_test_screenshots/
  • Results captured as JSON

For create mode: Analyze recent code changes and generate YAML test prompts:

  1. Run git diff --name-only HEAD~1 to identify changed files
  2. Filter for frontend/API changes matching auto-trigger criteria
  3. For each change, generate a YAML test prompt with:
    • Descriptive task covering the user-facing behavior
    • Specific judge_context for success validation
    • Appropriate max_steps based on complexity
    • tags matching the change type (smoke, auth, crud, form)
  4. Write to tests/browser/<feature-name>.yaml

For dry-run mode: Validate all YAML files without browser execution:

  1. Parse each YAML file
  2. Validate schema (task, judge_context required)
  3. Report valid/invalid files
  4. Estimate token cost based on max_steps

Phase 4: Results Report

Generate a structured report:

markdown
## Browser Test Report

**Project**: {absolute-path}
**Tests**: {total} total, {passed} passed, {failed} failed, {skipped} skipped
**Timestamp**: {ISO-8601}

### Results

| Test | Status | Steps | Duration | Summary |
|------|--------|-------|----------|---------|
| smoke-test.yaml | pass | 5/10 | 12s | Homepage loaded, nav visible |
| auth-flow.yaml | fail | 15/15 | 45s | Login button not found |
| checkout.yaml | skip | - | - | browser-use not installed |

### Failures

#### auth-flow.yaml
- **Task**: Navigate to login page, enter credentials, verify dashboard
- **Failed at step**: 8 — Could not find login button
- **Judge context**: "User must reach the dashboard after login"
- **Screenshot**: /tmp/browser_test_screenshots/auth-flow-step8.png

### Verdict

- **PASS**: All tests passed
- **WARN**: Some tests skipped (tool missing)
- **FAIL**: One or more tests failed

Parallel Agent Integration

This skill uses parallel agents conditionally:

TriggerReason
Critical user flows (auth, payment)Cross-verify test prompt quality
3+ test files in a single runValidate coverage completeness
Test failures on critical pathsGet multiple perspectives on fix

When triggered:

bash
~/.claude/scripts/parallel_agent.sh --json --validate --timeout 600 \
  --cursor-model flash --claude-model sonnet \
  "Review these browser test definitions for completeness and correctness: <YAML_CONTENTS>"

Non-Blocking Behavior

This skill follows the auto-trigger pattern:

  • Never blocks user workflow or command execution
  • Suggests tests when code changes match trigger criteria
  • Reports inline without interrupting primary tasks
  • Degrades gracefully when browser-use is not installed

Safety Checks

  • Never install packages -- report missing tools with install instructions
  • Never modify source code -- only creates/updates files in tests/browser/
  • Respect .gitignore -- do not scan ignored directories
  • Cap test execution at 60 seconds per test (configurable)
  • Cap max_steps at 25 to limit LLM token consumption
  • Save screenshots to temp directory, not project tree
  • Sanitize test prompts to avoid exposing real credentials

YAML Test Prompt Format

yaml
# tests/browser/smoke-test.yaml
task: "Navigate to http://localhost:3000 and verify the homepage loads correctly"
judge_context:
  - "The page must load without JavaScript errors"
  - "The main navigation must be visible"
  - "The page title must match the app name"
max_steps: 10
url: "http://localhost:3000"
tags: [smoke, homepage]

Common Test Categories

TagPurposeExample
smokeBasic page load and navigationHomepage renders, nav works
authAuthentication flowsLogin, logout, session persistence
crudCreate/read/update/delete operationsAdd item, edit, delete
formForm submissions and validationSubmit form, check validation errors
a11yRuntime accessibility checksKeyboard navigation, screen reader
perfPerformance measurementPage load time, interaction delay

Learning Capture (Optional)

After completing test execution, capture significant findings:

  1. For recurring test failures:

    bash
    ~/.claude/scripts/learning_capture.sh add \
      --category antipattern --language typescript \
      --title "<failure pattern>" \
      --description "<what failed and recommended fix>" \
      --source browser-test --confidence medium
    
  2. For new testing patterns discovered:

    bash
    ~/.claude/scripts/learning_capture.sh add \
      --category pattern --language general \
      --title "<testing pattern>" \
      --description "<what works well for browser testing>" \
      --source browser-test --confidence medium
    
  3. This step is non-blocking -- failures in learning capture should not affect the test report.