Screenshot Analysis Skill

This skill explains how to use the Screenshot Analyzer MCP server to validate visual elements, UI states, and game mechanics through AI-powered vision analysis.

When to Use Screenshot Analysis

Screenshot analysis is valuable for:

•UI Validation: Verify colors, text, layout, and component visibility
•Game State Checks: Validate player position, game mechanics, and in-game UI elements
•Visual Regression Testing: Compare screenshots across versions or browsers
•Accessibility Checks: Verify contrast, text readability, and visual hierarchy
•Cross-Browser Validation: Ensure consistent appearance across browsers
•Dynamic Content Verification: Check content that loads asynchronously
•Error State Detection: Identify error messages, warnings, or unexpected states

Available Tools

`analyze_screenshot`

Analyzes a screenshot using the OpenAI Vision API.

Key Parameters:

•screenshot (required): The full file path to the screenshot image.
•prompt (required): Your custom analysis prompt.
•model: OpenAI model (default: "gpt-4o")
•responseFormat: "text" or "json_object" (default: "text")
•maxTokens: Maximum tokens in response
•temperature: Generation temperature (default: 0)
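
As a minimal sketch of how these parameters fit together, the helper below assembles an argument object using the defaults listed above. `buildAnalyzeArgs` and the example path are hypothetical and not part of the server; only the parameter names and defaults come from the list.

```javascript
// Hypothetical helper: builds an argument object for analyze_screenshot.
// Parameter names and defaults mirror the list above; the helper itself
// is an illustration, not part of the MCP server.
function buildAnalyzeArgs({
  screenshot,
  prompt,
  model = "gpt-4o",
  responseFormat = "text",
  maxTokens,
  temperature = 0,
} = {}) {
  if (!screenshot) throw new Error("screenshot (full file path) is required");
  if (!prompt) throw new Error("prompt is required");
  if (responseFormat !== "text" && responseFormat !== "json_object") {
    throw new Error(`unsupported responseFormat: ${responseFormat}`);
  }
  const args = { screenshot, prompt, model, responseFormat, temperature };
  // No default is documented for maxTokens, so include it only when set.
  if (maxTokens !== undefined) args.maxTokens = maxTokens;
  return args;
}

const args = buildAnalyzeArgs({
  screenshot: "/tmp/screenshots/login.png", // hypothetical path
  prompt: "Is the 'Sign In' button visible in the top-right corner?",
  maxTokens: 200,
});
console.log(args);
```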

Prompt Crafting Best Practices

1. Be Specific and Contextual

Bad:

"What do you see?"

Good:

"Check if the login button in the top-right corner is visible and what color it is. Also verify if the text says 'Sign In' or 'Log In'."

2. Include Location and Context

Always specify:

•Where to look (top-left, center, specific region)
•What to look for (button, text, color, layout)
•Expected state (visible, hidden, specific value)

Example:

"In the header navigation bar, verify that the 'Cart' button is visible and displays the number '3' in a red badge. The button should be positioned on the right side."
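
One way to make sure no part of the checklist above is forgotten is to assemble prompts from the three pieces. This is only a sketch: `buildLocatedPrompt` is a hypothetical helper, and the sentence template is one possible wording.

```javascript
// Hypothetical helper: composes a prompt from where / what / expected state.
function buildLocatedPrompt({ where, what, expected }) {
  for (const [name, value] of Object.entries({ where, what, expected })) {
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`missing prompt part: ${name}`);
    }
  }
  return `In ${where}, look for ${what}. Expected state: ${expected}.`;
}

const prompt = buildLocatedPrompt({
  where: "the header navigation bar",
  what: "the 'Cart' button",
  expected: "visible on the right side, showing the number '3' in a red badge",
});
console.log(prompt);
```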

3. Request Structured Output When Needed

For programmatic use, request JSON format:

"Analyze this UI and return a JSON object with: { 'loginButtonVisible': boolean, 'loginButtonColor': string, 'headerText': string, 'errorMessages': string[] }"
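
Since a model may omit or mistype fields, it can help to check the parsed response against the shape requested in the prompt before relying on it. The sketch below assumes the response text has already been extracted; `checkShape` and the sample response are made up for illustration.

```javascript
// Expected fields and their types, matching the prompt above.
const expectedShape = {
  loginButtonVisible: "boolean",
  loginButtonColor: "string",
  headerText: "string",
  errorMessages: "string[]",
};

// Hypothetical helper: returns a list of fields that do not match the shape.
function checkShape(value, shape) {
  if (value === null || typeof value !== "object") {
    return ["response is not a JSON object"];
  }
  const problems = [];
  for (const [key, type] of Object.entries(shape)) {
    const field = value[key];
    const ok =
      type === "string[]"
        ? Array.isArray(field) && field.every((item) => typeof item === "string")
        : typeof field === type;
    if (!ok) problems.push(`${key}: expected ${type}`);
  }
  return problems;
}

// Sample response text, invented for this example.
const raw =
  '{"loginButtonVisible": true, "loginButtonColor": "blue", "headerText": "Welcome", "errorMessages": []}';
const parsed = JSON.parse(raw);
console.log(checkShape(parsed, expectedShape));
```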

4. Handle Dynamic Content

Account for:

•Loading states ("Check if the loading spinner is visible")
•Animations: capture the screenshot after animations finish, since the model sees only a single static frame ("Check whether any element appears mid-transition")
•Dynamic data ("Verify the user's name appears in the profile section")

5. Cost Optimization

•Use gpt-4o-mini for simple checks (colors, visibility, text)
•Use gpt-4o for complex analysis (layout analysis, multiple elements, detailed descriptions)
•Set maxTokens to limit response length when appropriate
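
The model choice above can be sketched as a small lookup. The check categories (`color`, `visibility`, `text`) and the `pickModel` helper are assumptions made for this example; only the two model names come from the guidance above.

```javascript
// Hypothetical categories treated as "simple" checks, following the list above.
const SIMPLE_CHECKS = new Set(["color", "visibility", "text"]);

// Picks the cheaper model only when every requested check is simple.
function pickModel(checkKinds) {
  const allSimple =
    checkKinds.length > 0 && checkKinds.every((kind) => SIMPLE_CHECKS.has(kind));
  return allSimple ? "gpt-4o-mini" : "gpt-4o";
}

console.log(pickModel(["color", "text"]));
console.log(pickModel(["text", "layout"]));
```

Falling back to the larger model when the list is empty or contains an unknown category keeps the helper conservative.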

6. Multi-Step Validation

Break complex validations into steps:

•Capture screenshot
•Analyze for specific element
•Based on result, capture another screenshot or analyze further

Common Use Cases

UI Element Validation

Prompt Example:

"Verify that the submit button is visible, enabled (not grayed out), and displays the text 'Submit Order'. Check if it's positioned below the form fields."

Color and Styling Checks

Prompt Example:

"Check the color of the primary action button. It should be blue (#0066CC). Also verify the text is white and the button has rounded corners."
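
A color reported by a vision model is likely to be an estimate rather than an exact hex value, so a tolerance-based comparison may be more robust than string equality. The sketch below assumes the model was asked to report a 6-digit hex color; the helpers and the tolerance of 40 are arbitrary choices for illustration.

```javascript
// Parses a 6-digit hex color (with or without '#') into [r, g, b].
function hexToRgb(hex) {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!match) throw new Error(`not a 6-digit hex color: ${hex}`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 255, (value >> 8) & 255, value & 255];
}

// Compares two colors by Euclidean distance in RGB space.
// The default tolerance is an arbitrary starting point, not a standard.
function colorsRoughlyMatch(expectedHex, reportedHex, tolerance = 40) {
  const [r1, g1, b1] = hexToRgb(expectedHex);
  const [r2, g2, b2] = hexToRgb(reportedHex);
  return Math.hypot(r1 - r2, g1 - g2, b1 - b2) <= tolerance;
}

console.log(colorsRoughlyMatch("#0066CC", "#0A6AD0")); // close shade of blue
console.log(colorsRoughlyMatch("#0066CC", "#CC0000")); // red
```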

Text Content Verification

Prompt Example:

"Read the heading text at the top of the page. It should say 'Welcome to Dashboard'. Also check if there's any error message displayed in red text."

Layout and Positioning

Prompt Example:

"Verify the layout: the sidebar should be on the left (200px wide), the main content area should be in the center, and the header should span the full width at the top."

Game State Validation

Prompt Example:

"Check the game state: verify the player's health bar shows 75%, the score displays '1,250', and there are 3 enemies visible on screen. The pause button should be in the top-right corner."

Visual Regression

Prompt Example:

"Compare this screenshot to the expected design. Check for: 1) Header height matches (should be 60px), 2) Logo is positioned correctly, 3) Navigation items are aligned, 4) No unexpected elements or spacing issues."

Workflow Patterns

Pattern 1: Simple Validation

•Obtain screenshot
•Analyze with specific prompt
•Parse result and validate

Pattern 2: Multi-Element Check

•Obtain screenshot
•Analyze with structured JSON prompt requesting multiple elements
•Parse JSON and validate all elements

Pattern 3: Conditional Validation

•Obtain initial screenshot
•Analyze to determine current state
•Based on state, obtain additional screenshots or perform actions
•Validate final state
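
The conditional pattern might look like the sketch below. `capture`, `analyze`, and `dismissDialog` are injected stand-ins (all hypothetical) for the screenshot source, the `analyze_screenshot` call with its JSON already parsed, and whatever action the state calls for; the `dialogOpen` and `headerVisible` fields are example names.

```javascript
// Sketch of conditional validation with injected, hypothetical dependencies.
async function validateWithCondition({ capture, analyze, dismissDialog }) {
  // Steps 1-2: capture and determine the current state.
  const first = await analyze(
    await capture(),
    "Return JSON with: { 'dialogOpen': boolean }",
  );
  // Step 3: act only if the state requires it.
  if (first.dialogOpen) {
    await dismissDialog();
  }
  // Step 4: capture again and validate the final state.
  const final = await analyze(
    await capture(),
    "Return JSON with: { 'dialogOpen': boolean, 'headerVisible': boolean }",
  );
  if (final.dialogOpen || !final.headerVisible) {
    throw new Error("Final state is not as expected");
  }
  return final;
}

// Stubbed usage so the sketch runs without a server or a browser.
let dialogOpen = true;
validateWithCondition({
  capture: async () => "/tmp/screenshots/page.png", // hypothetical path
  analyze: async () => ({ dialogOpen, headerVisible: true }),
  dismissDialog: async () => {
    dialogOpen = false;
  },
}).then((state) => console.log(state));
```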

Pattern 4: Comparison Validation

•Obtain baseline (expected) screenshot
•Obtain screenshot of current state
•Analyze both (or use comparison logic)
•Validate differences are expected
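
One possible form of the comparison logic is to analyze the reference and the current screenshot with the same JSON prompt and then diff the two parsed results. `diffAnalyses` and the field names below are hypothetical; the sample objects are invented.

```javascript
// Hypothetical helper: lists fields that differ between two parsed analyses,
// skipping fields that are allowed to change.
function diffAnalyses(baseline, current, allowedToChange = []) {
  const keys = new Set([...Object.keys(baseline), ...Object.keys(current)]);
  const unexpected = [];
  for (const key of keys) {
    if (allowedToChange.includes(key)) continue;
    // JSON.stringify is a simple deep comparison for plain JSON values;
    // it is sensitive to key order inside nested objects.
    if (JSON.stringify(baseline[key]) !== JSON.stringify(current[key])) {
      unexpected.push({ key, baseline: baseline[key], current: current[key] });
    }
  }
  return unexpected;
}

// Invented sample results for the two screenshots.
const baselineResult = { headerVisible: true, navItemCount: 5, bannerText: "Spring sale" };
const currentResult = { headerVisible: true, navItemCount: 4, bannerText: "Summer sale" };

// bannerText is expected to change between versions; navItemCount is not.
console.log(diffAnalyses(baselineResult, currentResult, ["bannerText"]));
```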

Error Handling

•Analysis Failures: Verify the API key, check that the prompt is clear, and confirm the screenshot is passed as a full file path
•Invalid Format: Ensure the screenshot parameter is a valid full file path to an image
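
Some of these failures can be caught before calling the tool. The sketch below is a hypothetical pre-flight check that runs under Node.js; the list of image extensions is an assumption, and the API key check is left out because it depends on how the server is configured.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumed set of acceptable image extensions; adjust to what the server supports.
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".webp", ".gif"]);

// Hypothetical helper: returns a list of problems found before the tool call.
function preflight({ screenshot, prompt }) {
  const problems = [];
  if (typeof prompt !== "string" || prompt.trim() === "") {
    problems.push("prompt is empty");
  }
  if (typeof screenshot !== "string" || !path.isAbsolute(screenshot)) {
    problems.push("screenshot must be a full (absolute) file path");
  } else if (!fs.existsSync(screenshot)) {
    problems.push(`screenshot not found: ${screenshot}`);
  } else if (!IMAGE_EXTENSIONS.has(path.extname(screenshot).toLowerCase())) {
    problems.push(`unexpected file extension: ${path.extname(screenshot)}`);
  }
  return problems;
}

// A relative path and an empty prompt each produce one problem.
console.log(preflight({ screenshot: "screenshots/login.png", prompt: "" }));
```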

Tips for Effective Analysis

•Start Broad, Then Narrow: First verify overall page state, then focus on specific elements
•Use Descriptive Prompts: Include visual cues (colors, positions, sizes) in your prompts
•Leverage Structured Output: Use JSON format for programmatic validation
•Combine with Other Tools: Use screenshot analysis alongside DOM inspection or API checks
•Use Full File Paths: Screenshots must be provided as full file paths, as the screenshot parameter requires
•Iterate on Prompts: Refine prompts based on analysis results

Example Workflow

javascript

// 1. Obtain screenshot (from external source)
// The screenshot must be provided as a full file path
const screenshotPath = "...";

// 2. Analyze for specific elements
const analysis = await analyze_screenshot({
  screenshot: screenshotPath,
  prompt:
    "Return JSON with: { 'headerVisible': boolean, 'userName': string, 'notificationCount': number, 'primaryButtonColor': string }",
  responseFormat: "json_object",
  model: "gpt-4o",
});

// 3. Validate results
const result = JSON.parse(analysis.analysis);
if (!result.headerVisible) {
  throw new Error("Header is not visible");
}
if (result.notificationCount !== 3) {
  throw new Error(
    `Expected 3 notifications, found ${result.notificationCount}`,
  );
}

