AgentSkillsCN

agent-eyes

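Visual context analysis tool for AI agents. Provides screenshots, accessibility scans, DOM snapshots, and element descriptions for web pages. Call it when you need to see what a web page looks like, analyze accessibility issues, inspect DOM structure, or get detailed information about an element. Triggers include "take a screenshot", "check accessibility", "what does this page look like?", "analyze the UI", "inspect this element", or any task related to visual/UI analysis.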

SKILL.md
--- frontmatter
name: agent-eyes
description: Visual context analyzer for AI agents. Provides screenshots, accessibility scans, DOM snapshots, and element descriptions for web pages. Use when you need to see what a web page looks like, analyze accessibility issues, inspect DOM structure, or get detailed element information. Triggers on requests like "take a screenshot", "check accessibility", "what does this page look like", "analyze the UI", "inspect this element", or any visual/UI analysis task.

Agent Eyes

Visual context analyzer for web pages. Provides AI agents with the ability to "see" web applications through screenshots, accessibility scans, DOM snapshots, and element descriptions.

Prerequisites

  • Python 3.10+
  • uv package manager (recommended)
  • Playwright browsers installed: playwright install chromium

Compact Mode (Token-Efficient Output)

All commands support the --compact (-c) flag for token-efficient output:

| Mode | Screenshot | DOM | A11y | Total Tokens |
|------|------------|-----|------|--------------|
| Standard | Base64 inline | depth=5, 20 children | Full violations | ~500K+ |
| Compact | File path only | depth=3, 10 children | Summary only | ~3-5K |

Use compact mode when context window size is a concern (which is most of the time).

bash
# SKILL_DIR must be set first (see its definition under Commands)
# Compact context - reduces ~500K tokens to ~3-5K tokens
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --compact

# Compact screenshot - always saves to file, never returns base64
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --compact

# Compact a11y - returns summary + top N issues only
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --compact

# Compact DOM - stricter limits on depth and children
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000 --compact

Commands

All commands use uv run for automatic dependency management:

bash
SKILL_DIR=".claude/skills/agent-eyes/scripts"

Screenshot

Capture full page or element screenshots:

bash
# Full page screenshot (saves to .canvas/screenshots/)
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000

# Element screenshot
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --selector ".hero"

# Save to specific path
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --output ./tmp/page.png

# Get as base64 (for inline context) - NOT recommended, use --compact instead
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --base64

# RECOMMENDED: Compact mode - always saves to file, never returns base64
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --compact

Accessibility Scan

Run axe-core accessibility analysis:

bash
# Full page scan (WCAG 2.1 AA)
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000

# Scoped to element
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --selector "main"

# WCAG AAA level
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --level AAA

# RECOMMENDED: Compact mode - summary + top issues only (~1-2K tokens vs 100K+)
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --compact
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --compact --max-issues 5

DOM Snapshot

Get simplified DOM tree:

bash
# Full page DOM
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000

# Subtree only
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000 --selector ".content"

# Control depth and children
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000 --depth 3 --max-children 10

# RECOMMENDED: Compact mode - depth=3, max-children=10, text=50 chars
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000 --compact
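
To make the depth, max-children, and text limits concrete, here is a minimal Python sketch of what such truncation could look like on a nested dict. It is not the tool's implementation, which this document does not show. The "tag" and "children" keys match the compact output example; the "text" key, the function name simplify, and the exact meaning of depth (levels of children kept below the root) are assumptions made for illustration.

```python
def simplify(node: dict, depth: int = 3, max_children: int = 10, max_text: int = 50) -> dict:
    """Return a copy of node truncated by depth, child count, and text length."""
    out = {"tag": node["tag"]}
    if "text" in node:
        # Hypothetical "text" field, cut to max_text characters.
        out["text"] = node["text"][:max_text]
    children = node.get("children", [])
    if depth > 0 and children:
        # Keep only the first max_children children, one level shallower.
        out["children"] = [
            simplify(child, depth - 1, max_children, max_text)
            for child in children[:max_children]
        ]
    return out


# Twelve children, each with 80 characters of text and one grandchild.
tree = {
    "tag": "body",
    "children": [
        {"tag": "div", "text": "x" * 80, "children": [{"tag": "p"}]}
        for _ in range(12)
    ],
}
trimmed = simplify(tree, depth=1)
print(len(trimmed["children"]), len(trimmed["children"][0]["text"]))  # 10 50
```

Lower limits shrink the output roughly in proportion to the number of nodes dropped, which is why compact mode's stricter defaults save so many tokens on large pages.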

Describe Element

Get detailed element information (styles, bounding box, attributes):

bash
uv run $SKILL_DIR/agent_eyes.py describe http://localhost:3000 --selector ".hero-button"

Full Context

Get a comprehensive context bundle (screenshot + a11y + DOM + description):

bash
# Full context for page
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000

# Focused on element
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --selector ".hero"

# Without screenshot (smaller output)
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --no-screenshot

# RECOMMENDED: Compact mode - file paths only, limited DOM/a11y (~3-5K tokens)
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --compact

# Compact with custom limits
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --compact \
  --dom-depth 2 --max-children 5 --max-issues 5

Output Format

All commands return JSON to stdout:

json
{
  "ok": true,
  "...": "command-specific fields"
}

On error:

json
{
  "ok": false,
  "error": "Error description"
}
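
Since every command prints a JSON object with an ok field, a caller can branch on that field before reading anything else. The following is a minimal Python sketch, assuming stdout contains exactly one JSON document. The names parse_result and AgentEyesError are hypothetical helpers, not part of the tool.

```python
import json


class AgentEyesError(RuntimeError):
    """Raised when a command reports ok=false (hypothetical helper type)."""


def parse_result(stdout: str) -> dict:
    """Parse a command's stdout and return the payload, or raise on failure."""
    result = json.loads(stdout)
    if not result.get("ok", False):
        # "error" is the field shown in the error example above.
        raise AgentEyesError(result.get("error", "unknown error"))
    return result
```

Raising on ok=false keeps the success path free of repeated checks; a caller that prefers to inspect failures can catch AgentEyesError instead.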

Compact Mode Output Examples

Compact context output (~3-5K tokens instead of ~500K):

json
{
  "ok": true,
  "url": "http://localhost:3000",
  "title": "My App",
  "timestamp": "2026-01-22T10-30-00-000Z",
  "compact": true,
  "screenshot_path": ".canvas/screenshots/2026-01-22T10-30-00-000Z.png",
  "screenshot_size": 443281,
  "dom": {
    "tag": "body",
    "children": [...]
  },
  "a11y_summary": {
    "total_violations": 5,
    "by_severity": {"critical": 1, "serious": 2, "moderate": 2, "minor": 0},
    "top_issues": [
      {"id": "color-contrast", "impact": "serious", "affected_count": 3}
    ]
  }
}

Compact a11y output (~1-2K tokens instead of ~100K):

json
{
  "ok": true,
  "total_violations": 15,
  "by_severity": {"critical": 2, "serious": 5, "moderate": 6, "minor": 2},
  "by_category": {"color": 3, "aria": 5, "keyboard": 2},
  "top_issues": [
    {
      "id": "color-contrast",
      "impact": "serious",
      "description": "Elements must have sufficient color contrast...",
      "affected_count": 3,
      "help_url": "https://dequeuniversity.com/rules/axe/..."
    }
  ],
  "passes": 42,
  "incomplete": 3
}
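
The compact a11y summary is small enough to act on directly, for example to decide whether a change should be blocked. Below is a minimal Python sketch using a trimmed copy of the example above. The policy of treating "critical" and "serious" as blocking, and the helper names, are assumptions for illustration.

```python
import json

# Trimmed copy of the compact a11y example above.
SAMPLE = """{
  "ok": true,
  "total_violations": 15,
  "by_severity": {"critical": 2, "serious": 5, "moderate": 6, "minor": 2},
  "top_issues": [
    {"id": "color-contrast", "impact": "serious", "affected_count": 3}
  ]
}"""

BLOCKING = ("critical", "serious")  # assumed policy, adjust as needed


def blocking_count(summary: dict) -> int:
    """Count violations whose severity is in the blocking set."""
    by_severity = summary.get("by_severity", {})
    return sum(by_severity.get(level, 0) for level in BLOCKING)


def blocking_issue_ids(summary: dict) -> list[str]:
    """Return ids of top issues with a blocking impact, in reported order."""
    return [
        issue["id"]
        for issue in summary.get("top_issues", [])
        if issue.get("impact") in BLOCKING
    ]


summary = json.loads(SAMPLE)
print(blocking_count(summary), blocking_issue_ids(summary))  # 7 ['color-contrast']
```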

Typical Agent Workflow

  1. Start dev server (if not running):

    bash
    npm run dev &
    
  2. Take initial screenshot to see current state:

    bash
    uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000
    
  3. Run accessibility scan to find issues:

    bash
    uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000
    
  4. Inspect specific element for details:

    bash
    uv run $SKILL_DIR/agent_eyes.py describe http://localhost:3000 --selector ".problematic-button"
    
  5. Get full context for comprehensive analysis:

    bash
    uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --selector ".hero"
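
The workflow steps above can also be driven from a script. This is a minimal Python sketch that only assembles the command lines already shown in this document and parses their JSON output. The function names build_command and run are hypothetical, and defaulting to --compact is a choice made here, not something the tool requires.

```python
import json
import subprocess

SKILL_DIR = ".claude/skills/agent-eyes/scripts"  # value from the Commands section


def build_command(command, url, selector=None, compact=True):
    """Build the argv list for one agent_eyes.py invocation."""
    argv = ["uv", "run", f"{SKILL_DIR}/agent_eyes.py", command, url]
    if selector is not None:
        argv += ["--selector", selector]
    if compact:
        argv.append("--compact")
    return argv


def run(command, url, **kwargs):
    """Run one command and return its parsed JSON output."""
    proc = subprocess.run(
        build_command(command, url, **kwargs),
        capture_output=True,
        text=True,
    )
    # Assumes stdout holds a single JSON object, as described under Output Format.
    return json.loads(proc.stdout)
```

For example, run("describe", "http://localhost:3000", selector=".problematic-button") corresponds to step 4 with compact output enabled.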
    

Example: Analyze and Fix A11y Issues

bash
# 1. Get accessibility violations
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000

# Output shows violations like:
# {
#   "ok": true,
#   "violations": [
#     {
#       "id": "color-contrast",
#       "impact": "serious",
#       "description": "Elements must have sufficient color contrast",
#       "nodes": [{"html": "<button class='cta'>..."}]
#     }
#   ]
# }

# 2. Describe the element to understand current styles
uv run $SKILL_DIR/agent_eyes.py describe http://localhost:3000 --selector ".cta"

# 3. Make code changes to fix the contrast issue

# 4. Re-run a11y to verify fix
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000
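
Between steps 1 and 2, the standard output has to be turned into a list of elements to inspect. The following is a minimal Python sketch, assuming the output has the shape shown in the comment above; the helper name violation_targets is hypothetical.

```python
import json

# Copy of the standard a11y output shown above, including its elided HTML.
SAMPLE = """{
  "ok": true,
  "violations": [
    {
      "id": "color-contrast",
      "impact": "serious",
      "description": "Elements must have sufficient color contrast",
      "nodes": [{"html": "<button class='cta'>..."}]
    }
  ]
}"""


def violation_targets(result: dict) -> list[tuple[str, str]]:
    """Return (rule id, html snippet) pairs, one per affected node."""
    return [
        (violation["id"], node["html"])
        for violation in result.get("violations", [])
        for node in violation.get("nodes", [])
    ]


for rule_id, html in violation_targets(json.loads(SAMPLE)):
    print(f"{rule_id}: {html}")
```

Each snippet shows which element failed, which is how the example arrives at the ".cta" selector used in step 2.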

Notes

  • Screenshots are saved to .canvas/screenshots/ by default with ISO timestamps
  • The tool runs headless Chromium via Playwright
  • All commands wait for networkidle before capturing
  • DOM snapshots are simplified to reduce output size
  • A11y scans use axe-core, a widely used open-source accessibility testing engine
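
The timestamp in the compact example (2026-01-22T10-30-00-000Z) looks like an ISO 8601 UTC time with the colons and the millisecond separator replaced by hyphens, which keeps it safe in a file name. How the tool builds it is not shown in this document; the Python sketch below only reproduces that apparent format, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def screenshot_stamp(moment: datetime) -> str:
    """Format an aware datetime as a file-name-safe UTC timestamp."""
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H-%M-%S-") + f"{millis:03d}Z"


def default_screenshot_path(moment: datetime) -> str:
    """Build a path under the default screenshot directory named in the notes."""
    return f".canvas/screenshots/{screenshot_stamp(moment)}.png"


example = datetime(2026, 1, 22, 10, 30, 0, tzinfo=timezone.utc)
print(default_screenshot_path(example))
```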

Token Budget Guide

| Operation | Standard Mode | Compact Mode |
|-----------|---------------|--------------|
| Screenshot | ~100-470K tokens (base64) | ~50 tokens (path only) |
| DOM Snapshot | ~50-150K tokens | ~2-3K tokens |
| A11y Scan | ~50-100K tokens | ~500-1K tokens |
| Full Context | ~500K+ tokens | ~3-5K tokens |

Recommendation: Use the --compact flag unless you specifically need base64 data for inline image processing. For a full context bundle, compact mode cuts token usage by roughly 99% (from ~500K+ to ~3-5K tokens) while keeping the screenshot path, a trimmed DOM tree, and an accessibility summary.
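
The 99% figure can be checked against the Full Context row of the table. Here is a short Python calculation, treating the approximate numbers in the table as exact for the sake of arithmetic:

```python
def reduction(standard_tokens: int, compact_tokens: int) -> float:
    """Fraction of tokens saved by switching from standard to compact mode."""
    return 1 - compact_tokens / standard_tokens


# Full Context row: ~500K tokens standard vs ~3-5K tokens compact.
low = reduction(500_000, 5_000)
high = reduction(500_000, 3_000)
print(f"{low:.1%} to {high:.1%}")  # 99.0% to 99.4%
```

Because the standard figure is a lower bound (~500K+), the real saving on large pages may be somewhat higher.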

When to Use Each Mode

| Mode | Use Case |
|------|----------|
| Standard | Debugging, cases that need full HTML snippets, feeding screenshots to a vision model |
| Compact | Most agent workflows, design reviews, accessibility audits, CI/CD pipelines |