Agentic UI Walkthrough Skill
Systematic approach to reviewing user interfaces — finding bugs, assessing UX quality, and verifying visual correctness through screenshot-driven analysis.
When to Use
- •After implementing UI changes (web, Electron, SwiftUI, native)
- •Before presenting work to the user for review
- •When the user reports visual issues or "something feels off"
- •During UI polish or UX improvement sprints
- •As part of the audit/harden cycle in Codex collaboration
Screenshot-Driven Review Flow
Step 1: Capture Baseline
Capture screenshots of every distinct UI state.
[!IMPORTANT] For this repo's web UI: CC agents call the MCP API for visual testing using the visual-review skill (.agent/skills/visual-review/SKILL.md). CC agents must NOT run Playwright or capture screenshots directly. The commands below apply only to ux-visual-reviewer-resident agents or platform-specific contexts (iOS/watchOS).
    # iOS/iPad simulator
    xcrun simctl io DEVICE_UUID screenshot /tmp/ui-baseline.png

    # watchOS simulator
    xcrun simctl io DEVICE_UUID screenshot /tmp/watch-baseline.png
Step 2: Walk Through Every State
Document each state you can reach:
- •Default/empty state
- •Loading state
- •Populated/data state
- •Error state
- •Edge cases (no data, very long text, single item, 100+ items)
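The state list above can double as a coverage check before analysis starts. The following is a minimal sketch, not part of this repo's tooling: the state keys, the `missing_states` helper, and the `/tmp/ui-populated.png` path are hypothetical names chosen for illustration.

```python
# Hypothetical keys for the states listed above; rename to suit your project.
REQUIRED_STATES = ("default", "loading", "populated", "error", "edge-case")


def missing_states(captured: dict[str, list[str]]) -> list[str]:
    """Return the required states that have no screenshot recorded yet."""
    return [state for state in REQUIRED_STATES if not captured.get(state)]


# Example: screenshots gathered so far, keyed by state (paths are illustrative).
captured = {
    "default": ["/tmp/ui-baseline.png"],
    "loading": [],
    "populated": ["/tmp/ui-populated.png"],
}

print(missing_states(captured))  # ['loading', 'error', 'edge-case']
```

An empty result means every state has at least one screenshot. It does not prove that every edge case within a state was exercised.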
Step 3: Analyze Against Heuristics
Use the UX Heuristic Checklist (below) to systematically evaluate each screen.
Step 4: Document Findings
Create a prioritized report with embedded screenshots showing issues.
Step 5: Fix ALL Findings
[!CAUTION] Every P-level finding must be fixed — not deferred, dismissed, or rationalized. Agents must NEVER classify findings as "acceptable," "cosmetic-only," or "not the major issue." UX Sandbox checks are explicitly refined by the project owner and are intentional. If the NRQA or visual review flags it, fix it. The only exception is a single finding estimated at >2 hours of work — document it as a follow-up issue with user confirmation.
- •P0-P1: Fix immediately — blocks ship
- •P2: Fix in the same session — do NOT defer
- •P3: Fix in the same session — do NOT log for "future iteration"
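The rules above reduce to a small decision function. This sketch is one possible reading, not the project's implementation: the `Finding` type, the `plan_action` helper, and the returned labels are hypothetical. It also assumes the 2-hour exception applies regardless of priority, which the text does not state explicitly.

```python
from dataclasses import dataclass

# Threshold taken from the caution note above; everything else is illustrative.
FOLLOW_UP_THRESHOLD_HOURS = 2.0


@dataclass
class Finding:
    title: str
    priority: str  # "P0", "P1", "P2", or "P3"
    estimated_hours: float
    user_confirmed_follow_up: bool = False


def plan_action(finding: Finding) -> str:
    """Decide what to do with one finding. There is no 'defer' or 'dismiss' outcome."""
    if finding.estimated_hours > FOLLOW_UP_THRESHOLD_HOURS:
        # Assumption: the exception needs explicit user confirmation first.
        if finding.user_confirmed_follow_up:
            return "follow-up issue"
        return "ask user to confirm follow-up"
    if finding.priority in ("P0", "P1"):
        return "fix immediately"
    return "fix this session"


print(plan_action(Finding("Misaligned icon", "P3", 0.25)))  # fix this session
```

Note that the function has no branch that returns "acceptable" or "cosmetic-only": a small P3 still resolves to a fix in the current session.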
UX Heuristic Checklist
Rate each item ✅ (good), ⚠️ (needs work), or ❌ (broken):
Visual Design
- • Visual hierarchy — most important information is prominent
- • Consistent spacing — margins and padding follow a system (4/8/12/16/24px)
- • Typography — readable font sizes, clear hierarchy (h1 > h2 > body > caption)
- • Color contrast — text is readable against background (WCAG AA minimum)
- • Alignment — elements are properly aligned (no off-by-1 pixel issues)
- • Empty states — empty views have helpful messaging, not blank screens
- • Loading states — spinners/skeletons show when data is loading
- • Error states — errors are clearly communicated with actionable recovery
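The color contrast item can be checked numerically rather than by eye. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio, with the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are illustrative, and it assumes opaque six-digit hex colors.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb' (alpha is not handled)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG AA minimum for the given text size."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


print(passes_aa("#767676", "#ffffff"))  # True
print(passes_aa("#777777", "#ffffff"))  # False
```

The two grays in the example differ by a single step per channel, which is why a computed ratio is more reliable than judging contrast from a screenshot.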
Interaction Design
- • Responsive — layout works at all supported sizes
- • Touch targets — buttons/links are at least 44×44pt (mobile) or clearly clickable (web)
- • Feedback — user actions have visible responses (hover, press, loading)
- • Navigation — user always knows where they are and how to go back
- • Affordance — interactive elements look interactive
- • Consistency — similar actions behave the same way everywhere
Content & Data
- • Data freshness — displayed data reflects current state
- • Overflow handling — long text truncates gracefully (ellipsis, not clipping)
- • Number formatting — numbers, dates, times are formatted for readability
- • Localization-ready — no hardcoded strings that would break in translation
Performance & Polish
- • Smooth transitions — animations are fluid, no jank
- • No layout shifts — content doesn't jump around during load
- • Image quality — images are sharp on retina displays
- • Scroll performance — long lists scroll smoothly
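Per-item ratings need to be rolled up into one score per category for a scorecard. The checklist does not say how to aggregate, so this sketch assumes the worst item rating wins; `category_score` and the constant names are hypothetical.

```python
GOOD, NEEDS_WORK, BROKEN = "✅", "⚠️", "❌"

# Higher number means a worse rating.
_SEVERITY = {GOOD: 0, NEEDS_WORK: 1, BROKEN: 2}


def category_score(item_ratings: dict[str, str]) -> str:
    """Return the worst rating among a category's checklist items.

    Raises ValueError when the category has no rated items.
    """
    return max(item_ratings.values(), key=_SEVERITY.__getitem__)


visual_design = {
    "Visual hierarchy": GOOD,
    "Color contrast": NEEDS_WORK,
    "Alignment": GOOD,
}

print(category_score(visual_design) == NEEDS_WORK)  # True
```

Taking the worst rating keeps a single broken item from being averaged away by several good ones.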
Bug Classification Matrix
| Category | Examples | Priority |
|---|---|---|
| Crash | App crashes on action, unhandled exception | P0 |
| Data Loss | User input lost, unsaved changes discarded | P0 |
| Functional | Button does nothing, wrong data displayed | P1 |
| Visual | Layout broken, overlapping elements, wrong colors | P1 |
| UX | Confusing flow, missing feedback, poor affordance | P2 |
| Performance | Slow load, janky scroll, delayed response | P2 |
| Polish | Minor spacing, subtle animation issues | P3 |
| Accessibility | Missing labels, low contrast, no keyboard nav | P2 |
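The matrix can be encoded as a lookup so that every finding gets the same priority for the same category. This is a minimal sketch: the dictionary mirrors the table above, while the `prioritize` helper and the sample findings are hypothetical.

```python
# Category -> priority, copied from the matrix above.
PRIORITY_BY_CATEGORY = {
    "Crash": "P0",
    "Data Loss": "P0",
    "Functional": "P1",
    "Visual": "P1",
    "UX": "P2",
    "Performance": "P2",
    "Accessibility": "P2",
    "Polish": "P3",
}


def prioritize(findings: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Take (title, category) pairs and return (priority, category, title),
    most urgent first. Raises KeyError for a category not in the matrix."""
    ranked = [(PRIORITY_BY_CATEGORY[category], category, title)
              for title, category in findings]
    # sorted() is stable, so findings with equal priority keep their input order.
    return sorted(ranked, key=lambda row: row[0])


findings = [
    ("Tight padding", "Polish"),
    ("Save button does nothing", "Functional"),
    ("Crash on open", "Crash"),
]

for row in prioritize(findings):
    print(row)
```

An unknown category raises an error instead of silently defaulting to a low priority, which suits a process where findings may not be downgraded.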
Review Report Format
    # UI Review: [Feature/Screen Name]

    **Date:** YYYY-MM-DD
    **Platform:** [Web/iOS/iPadOS/watchOS/Electron]
    **Reviewer:** [Agent name]

    ## Summary
    [1-2 sentence overall assessment]

    ## Screenshots

    ## Findings

    ### P1 — Must Fix
    1. **[Title]** — [Description]
       - Screenshot: [reference]
       - Location: [file:line]
       - Suggested fix: [approach]

    ### P2 — Should Fix
    ...

    ### P3 — Nice to Have
    ...

    ## UX Scorecard
    | Category | Score |
    |----------|-------|
    | Visual Design | ✅ / ⚠️ / ❌ |
    | Interaction | ✅ / ⚠️ / ❌ |
    | Content & Data | ✅ / ⚠️ / ❌ |
    | Performance | ✅ / ⚠️ / ❌ |
Platform-Specific Patterns
Electron/Web (Two-Tier Testing via UX Visual Reviewer)
[!IMPORTANT] CC agents run visual reviews by calling the MCP API via the visual-review skill (.agent/skills/visual-review/SKILL.md). CC agents must NOT run Playwright or testing tools directly.
Tier 1 — UX Visual Reviewer Playwright (layout, structure, styling):
- •UX Visual Reviewer loads via file:// and http://127.0.0.1:3050/gui/ — tests HTML/CSS/DOM
- •Tests layout, responsive breakpoints, DOM structure, form input styling
- •Uses injected click handlers to test sidebar view toggling
- •Executed by: ux-visual-reviewer infrastructure only (CC agents call MCP API, consume artifacts)
Tier 2 — Live Browser Evidence (data, functional proof):
- •http://127.0.0.1:3050/gui/ serves the full GUI with browser-renderer.js for HTTP-based data loading
- •UX Visual Reviewer captures screenshots showing real data (workspaces, machines, agents)
- •Exercise EVERY sidebar/navigation view — 📂, 📡, 🎯
- •Verify data loads: workspace cards, machine list, agent activity
- •Check styled components: form inputs, buttons, selects matching dark theme
- •Test at multiple viewport sizes (1280×800, 1440×900, 1728×1117)
- •Executed by: ux-visual-reviewer infrastructure only
Both tiers are required. Tier 1 alone proves only structure. Tier 2 alone proves only one moment in time. Together they provide comprehensive evidence. CC agents verify that the returned artifacts meet the gate checklist.
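When verifying returned artifacts, it helps to list the captures that should exist and compare that list with what came back. The sketch below assumes every sidebar view is captured at every listed viewport, a cross product the text implies but does not state. It also assumes each artifact can be reduced to a `(view, width, height)` tuple. The helper names are hypothetical and this is not the visual-review skill's actual artifact format.

```python
from itertools import product

# Views and viewport sizes taken from the Tier 2 list above.
VIEWS = ("📂", "📡", "🎯")
VIEWPORTS = ((1280, 800), (1440, 900), (1728, 1117))


def expected_captures() -> set[tuple[str, int, int]]:
    """Every (view, width, height) combination that should have a screenshot."""
    return {(view, w, h) for view, (w, h) in product(VIEWS, VIEWPORTS)}


def missing_captures(returned: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Combinations with no returned artifact, in a stable sorted order."""
    return sorted(expected_captures() - set(returned))


returned = [("📂", 1280, 800), ("📡", 1280, 800), ("🎯", 1280, 800)]
print(len(missing_captures(returned)))  # 6
```

This check only inspects metadata about artifacts that were already produced. It does not run Playwright or capture anything itself.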
SwiftUI (iOS/iPad/Watch)
- •Test both light and dark appearance
- •Test Dynamic Type (accessibility font sizes)
- •Check Safe Area handling (notch, home indicator)
- •Verify NavigationStack push/pop animations
- •Test rotating orientation (iPad)
watchOS-Specific
- •Verify Digital Crown scrolling
- •Check text fits on small screens (40mm vs 49mm)
- •Test complications if applicable
- •Verify data loads before app is suspended
Anti-Patterns to Catch
- •"It works on my screen" — always test multiple sizes/states
- •Invisible errors — errors that log but don't inform the user
- •Phantom loading — loading indicator that never resolves
- •Dead-end states — no way to recover or navigate away
- •Data staleness — showing cached data without indicating age
- •Truncation without affordance — text cut off with no way to see full content
- •Inconsistent spacing — mixing 10px, 12px, 15px instead of a spacing scale
- •Missing empty states — blank screen when there's no data
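The inconsistent spacing anti-pattern is easy to detect once spacing values have been extracted from styles or layout code. This is a minimal sketch that uses the 4/8/12/16/24px scale from the checklist. The `off_scale` helper is hypothetical, and treating 0 as always allowed is an assumption.

```python
# Spacing scale from the Visual Design checklist, in pixels.
SPACING_SCALE_PX = (4, 8, 12, 16, 24)


def off_scale(values_px: list[int]) -> list[int]:
    """Return spacing values that are not on the scale (0 is treated as allowed)."""
    return [v for v in values_px if v != 0 and v not in SPACING_SCALE_PX]


print(off_scale([10, 12, 15, 16]))  # [10, 15]
```

How the values are collected (computed styles, design tokens, SwiftUI padding constants) depends on the platform and is left open here.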