QA Verification
Test like a human QA — navigate the app, look at the screen, and flag what looks wrong.
Core Rules
- Screenshots are your eyes. Every verification judgment must come from looking at a screenshot. If you can't see it in the screenshot, you can't claim it passes or fails.
- `browser_snapshot` is for interaction only. You need snapshot refs to click buttons and fill forms, and that's fine. But NEVER use snapshot/DOM data to verify acceptance criteria. A human QA doesn't open DevTools to check whether an element exists; they look at the screen.
- Flag visual problems even if they aren't in the acceptance criteria. A human QA notices when a modal is cut off, text overflows its container, buttons overlap, or the layout looks broken. You should too. Report these as additional findings, separate from acceptance criteria results.
- Navigate like a user. Click links, fill forms, wait for pages to load. Don't skip steps or assume state.
What to Look For
Beyond acceptance criteria, flag anything a human would notice:
- Layout issues: elements overlapping, content cut off at viewport edges, modals extending off-screen, unexpected scrollbars
- Text problems: truncated labels, text overflowing containers, unreadable contrast, placeholder text still showing
- Broken interactions: buttons that don't appear clickable, missing hover/focus states, forms that don't respond
- Loading/state issues: spinners that never resolve, flash of unstyled content, empty states where data should be
- Visual inconsistencies: misaligned elements, inconsistent spacing, elements that look out of place
Workflow
1. Parse Task Specification
Read the task file from `docs/tasks/task-<id>.md` or `docs/TASKS.md`. Extract:
- Acceptance criteria: specific conditions to verify
- User flows: step-by-step interactions to walk through

If the user specifies a PR number, use `gh pr view <number>` to read the PR (adding `--json files` lists the affected files) and infer the relevant task.
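The extraction step can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual parser: it assumes a hypothetical task-file layout in which criteria are bullets (optionally checkboxes) under a heading containing "Acceptance Criteria". The function name `extract_acceptance_criteria` is made up for this example.

```python
import re


def extract_acceptance_criteria(markdown_text: str) -> list[str]:
    """Return bullet items found under an 'Acceptance Criteria' heading.

    Assumed (hypothetical) layout: a markdown heading containing
    'Acceptance Criteria', followed by '-' or '*' bullets, optionally
    written as '[ ]' / '[x]' checkboxes.
    """
    criteria: list[str] = []
    in_section = False
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any heading either opens or closes the criteria section.
            in_section = "acceptance criteria" in stripped.lower()
            continue
        if in_section:
            match = re.match(r"^[-*]\s+(?:\[[ xX]\]\s+)?(.+)$", stripped)
            if match:
                criteria.append(match.group(1).strip())
    return criteria
```

If the real task files use a different structure, only the heading test and the bullet pattern need to change.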
2. Gather Test Context
Before starting, collect:
| Item | Source | Default |
|---|---|---|
| Target URL | User-provided | http://localhost:3000 (also works with preview/staging/production URLs) |
| Task/PR | User-specified | Ask user |
| Auth required? | Task spec or inference | Assume no |
| Screenshot dir | User preference | ./qa-screenshots/ |
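One way to hold this context is a small record whose defaults mirror the table. The sketch below is illustrative only; the class name `QAContext` and its field names are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class QAContext:
    """Test context with the defaults from the table (names are hypothetical)."""

    task_or_pr: Optional[str] = None  # no default in the table: ask the user
    target_url: str = "http://localhost:3000"
    auth_required: bool = False  # "Assume no" unless the task spec says otherwise
    screenshot_dir: Path = Path("./qa-screenshots/")

    def missing_inputs(self) -> list[str]:
        """List the items that still have to be requested from the user."""
        return [] if self.task_or_pr else ["task_or_pr"]
```

For example, `QAContext(task_or_pr="3.1", target_url="http://localhost:5173")` overrides the URL and leaves the other defaults in place.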
3. Verification Loop
For each acceptance criterion:
1. SCREENSHOT the current state
2. LOOK at the screenshot: does everything look right?
3. ACT: click, type, navigate (use `browser_snapshot` only to get element refs)
4. WAIT for the page to settle
5. SCREENSHOT the result
6. LOOK at the screenshot: did the expected thing happen? Does anything look off?
7. RECORD the result with screenshot references
At every screenshot, ask yourself: "If I were a human looking at this screen, would anything catch my eye as wrong?" Flag it even if it's unrelated to the current criterion.
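The order of the loop can be made explicit with a short sketch. Everything here is hypothetical scaffolding: the callables stand in for the browser tools and for the tester's visual judgment (`look` returns anything that catches the eye, `expected_visible` answers whether the criterion is met in the screenshot), and the file names are simplified.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CriterionResult:
    criterion: str
    status: str  # "PASS" or "FAIL"
    screenshots: list[str] = field(default_factory=list)
    extra_findings: list[str] = field(default_factory=list)


def verify_criterion(
    criterion: str,
    screenshot: Callable[[str], str],          # takes a file name, returns its path
    look: Callable[[str], list[str]],          # visual review of one screenshot
    act: Callable[[], None],                   # click, type, navigate
    wait: Callable[[], None],                  # let the page settle
    expected_visible: Callable[[str], bool],   # judged from the screenshot only
) -> CriterionResult:
    before = screenshot(f"{criterion}-before.png")   # 1. SCREENSHOT
    findings = list(look(before))                    # 2. LOOK
    act()                                            # 3. ACT
    wait()                                           # 4. WAIT
    after = screenshot(f"{criterion}-after.png")     # 5. SCREENSHOT
    findings.extend(look(after))                     # 6. LOOK
    status = "PASS" if expected_visible(after) else "FAIL"
    # 7. RECORD, keeping unrelated visual findings alongside the verdict
    return CriterionResult(criterion, status, [before, after], findings)
```

Note that the verdict depends only on the "after" screenshot, never on snapshot or DOM data, which is the point of the rule set above.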
4. Authentication Handling
Google Account Selection — handle automatically:
If the page shows a Google account picker ("Choose an account", list of emails), click the first account and continue. Do NOT ask the user for help.
1. `browser_take_screenshot` to see the auth screen
2. If it looks like a Google account picker, `browser_snapshot` to get the ref
3. `browser_click` the first account
4. `browser_wait_for(time=2)` for redirect
5. `browser_take_screenshot` to confirm auth completed
6. Continue verification
Credential-based Login — defer to user:
If the page has a username/password form, pause and ask:
    ## Auth Required

    I've encountered a login screen at [URL].

    **Screenshot:** auth-required.png

    **Options:**
    1. **Manual login**: I'll wait while you log in via the browser, then continue
    2. **Skip auth flows**: Mark auth-required tests as SKIPPED
    3. **Provide credentials**: Share test credentials to proceed
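The two auth branches reduce to a small decision, sketched below. The inputs are what the tester saw in the screenshot; the enum and function names are invented for illustration, and giving the account picker priority when both are seen is an assumption.

```python
from enum import Enum


class AuthAction(Enum):
    SELECT_FIRST_GOOGLE_ACCOUNT = "select_first_google_account"
    ASK_USER = "ask_user"
    CONTINUE = "continue"


def decide_auth_action(sees_account_picker: bool, sees_password_form: bool) -> AuthAction:
    """Map what is visible on the auth screen to the next step."""
    if sees_account_picker:
        # Google account picker: handle automatically, never ask the user.
        return AuthAction.SELECT_FIRST_GOOGLE_ACCOUNT
    if sees_password_form:
        # Username/password form: pause and present the options to the user.
        return AuthAction.ASK_USER
    return AuthAction.CONTINUE
```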
5. Screenshot Strategy
| When | Filename Pattern | Why |
|---|---|---|
| Before any action | {criterion}-before-{action}.png | Baseline |
| After action completes | {criterion}-after-{action}.png | Verify result |
| Something looks wrong | {criterion}-FAIL-{desc}.png | Evidence |
| Visual issue unrelated to criteria | visual-issue-{desc}.png | Additional finding |
- Use `fullPage: true` for layout verification
- Use element screenshots for component-level detail
- Save screenshots in PNG format
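A small helper can keep file names consistent with the patterns in the table. This is a sketch under assumptions: the `kind` labels, the function names, and the slug rule (lowercase, non-alphanumerics collapsed to hyphens) are choices made for this example.

```python
import re


def slug(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def screenshot_name(kind: str, desc: str, criterion: str = "") -> str:
    """Build a file name for kind 'before', 'after', 'fail', or 'visual-issue'."""
    if kind == "visual-issue":
        return f"visual-issue-{slug(desc)}.png"
    if kind == "fail":
        return f"{slug(criterion)}-FAIL-{slug(desc)}.png"
    if kind in ("before", "after"):
        return f"{slug(criterion)}-{kind}-{slug(desc)}.png"
    raise ValueError(f"unknown screenshot kind: {kind!r}")
```

For instance, `screenshot_name("fail", "Modal cut off!", "AC2")` yields `ac2-FAIL-modal-cut-off.png`.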
6. Generate Report
    # QA Report

    **Task:** [task ID or PR number]
    **URL:** [target URL]
    **Date:** [timestamp]
    **Screenshots:** [directory path]

    ## Summary
    - Passed: X
    - Failed: Y
    - Skipped: Z
    - Visual Issues: N (not in acceptance criteria but worth noting)

    ## Acceptance Criteria Results

    ### 1. [Criterion]
    **Status:** PASS / FAIL / SKIP
    **Steps:**
    1. [What you did + screenshot ref]
    2. [What you saw + screenshot ref]
    **Notes:** [What you verified visually]

    ## Additional Visual Issues

    ### [Issue description]
    **Screenshot:** [ref]
    **Severity:** Minor / Major
    **Details:** [What looks wrong and where]
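The Summary block of the report can be derived mechanically from the recorded statuses. The sketch below assumes results are kept as a plain list of "PASS" / "FAIL" / "SKIP" strings; the function name `render_summary` is hypothetical.

```python
from collections import Counter


def render_summary(statuses: list[str], visual_issue_count: int) -> str:
    """Render the Summary section from per-criterion statuses."""
    counts = Counter(statuses)  # missing keys count as 0
    return "\n".join([
        "## Summary",
        f"- Passed: {counts['PASS']}",
        f"- Failed: {counts['FAIL']}",
        f"- Skipped: {counts['SKIP']}",
        f"- Visual Issues: {visual_issue_count}",
    ])
```

Counting from the recorded results, instead of tallying by hand, keeps the summary in step with the per-criterion sections.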
Playwright Tools
For interaction (getting refs, clicking, typing):
- `browser_snapshot`: get element refs so you can interact. NOT for verification.
- `browser_click`, `browser_type`, `browser_fill_form`, `browser_select_option`: interact with the page
- `browser_navigate`: go to URLs
- `browser_wait_for`: wait for page updates
For verification (looking at the screen):
- `browser_take_screenshot`: this is how you see. Use it constantly.
Example Session
User: "Verify task 3.1 against localhost:5173"
- Read `docs/tasks/task-3.1.md`, extract acceptance criteria
- `browser_navigate` to `http://localhost:5173`
- `browser_take_screenshot`: look at the landing state, note anything off
- For each criterion:
  - Screenshot before
  - `browser_snapshot` to get refs, then interact
  - Wait for result
  - Screenshot after
  - Look at the screenshot: does it pass? Anything else look wrong?
  - Record result
- If auth encountered: auto-select Google account or pause for credential login
- Generate report with all screenshots, criteria results, and any additional visual issues found