Cursor IDE Browser Automation

A browser automation tool for Cursor IDE. It uses the cursor-ide-browser MCP (Model Context Protocol) server and accessibility snapshots for precise element interaction.

Core Mechanism

Accessibility Snapshot First: Always get a snapshot before interacting with elements. The snapshot provides structured page information with element references (ref) needed for all interactions.

javascript

// Standard workflow
browser_navigate(url="https://example.com")
browser_snapshot()  // Required: Get element references
browser_click(element="Button", ref="ref-from-snapshot")

Essential Workflow

•Navigate to target page
•Snapshot to get element references (required before any interaction)
•Convert to Markdown (⭐ Recommended) for easier searching, locating, and reading
•Search the Markdown file with grep to find information or locate interactive elements
•Interact using refs from snapshot
•Wait for dynamic content if needed
•Verify with screenshots or console messages

Quick example:

javascript

browser_navigate(url="https://example.com")
browser_snapshot()  // Creates .log file
mcp_snapshot-query_convert_to_markdown(file_path="snapshot.log")
grep(pattern="button|登录", path="snapshot.md")  // Find elements (登录 = "log in")
browser_click(element="Login", ref="ref-from-grep-results")

Key Tools

Navigation:

•browser_navigate(url, position?) - Navigate to URL
•browser_navigate_back() - Go back

Page Information:

•browser_snapshot() - Required before interactions - Get accessibility tree with element refs
•browser_take_screenshot(fullPage?, filename?) - Capture visual
•browser_console_messages() - Get console logs
•browser_network_requests() - Get network activity

Element Interaction:

•browser_click(element, ref, doubleClick?, button?, modifiers?) - Click element
•browser_type(element, ref, text, submit?, slowly?) - Type text
•browser_hover(element, ref) - Hover
•browser_select_option(element, ref, values) - Select dropdown
•browser_press_key(key) - Press key (supports PageDown, PageUp, ArrowDown, ArrowUp, Space, End, Home for scrolling)

Synchronization:

•browser_wait_for(text?, textGone?, time?) - Wait for text or time

Tab Management:

•browser_tabs(action, index?, position?) - Manage tabs (list/new/close/select)

Element References

•**element**: Human-readable description (for permission confirmation)
•**ref**: Technical reference from snapshot (required for interaction)
•Refs are page-state specific - get a new snapshot after navigation or page changes
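
To make the element/ref pairing concrete, here is a minimal Python sketch that walks an already-parsed snapshot tree and returns the ref for a given role and name. The role, ref, name, and children keys are the fields snapshot files are described as containing; the exact nesting and the sample refs are assumptions for illustration, and a real snapshot would first have to be loaded with a YAML parser.

```python
from typing import Iterator, Optional

# Hypothetical, hand-written stand-in for a parsed snapshot tree.
SAMPLE_TREE = {
    "role": "document",
    "ref": "ref-root",
    "name": "Example",
    "children": [
        {"role": "textbox", "ref": "ref-user", "name": "Username"},
        {
            "role": "group",
            "ref": "ref-form",
            "name": "Actions",
            "children": [
                {"role": "button", "ref": "ref-login", "name": "Login"},
            ],
        },
    ],
}


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield the node and all of its descendants, depth-first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def find_ref(tree: dict, role: str, name: str) -> Optional[str]:
    """Return the ref of the first node matching role and name, or None."""
    for node in iter_nodes(tree):
        if node.get("role") == role and node.get("name") == name:
            return node.get("ref")
    return None


login_ref = find_ref(SAMPLE_TREE, role="button", name="Login")
print(login_ref)  # ref-login
```

The returned ref is only meaningful for the page state the tree was captured from, which is why a fresh snapshot is needed after navigation.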

Snapshot Files

Snapshots are automatically saved as YAML-formatted files with a .log extension:

•Location (Windows): C:\Users\{username}\.cursor\browser-logs\snapshot-{timestamp}.log
•Format: YAML accessibility tree with role, ref, name, children
•Usage: Extract ref values for element interactions
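
Because each snapshot gets a timestamped name, a common need is to pick the most recent file. The following Python sketch shows one way to do that. The timestamp layout is an assumption inferred from a name such as snapshot-2026-01-10T23-43-30-351Z.log, and the helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed timestamp layout, inferred from a file name like
# snapshot-2026-01-10T23-43-30-351Z.log
TIMESTAMP_FORMAT = "%Y-%m-%dT%H-%M-%S-%fZ"


def snapshot_time(path: Path) -> datetime:
    """Parse the timestamp embedded in a snapshot file name."""
    stamp = path.stem[len("snapshot-"):]
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)


def latest_snapshot(log_dir: Path) -> Optional[Path]:
    """Return the most recent snapshot-*.log file in log_dir, or None."""
    candidates = list(log_dir.glob("snapshot-*.log"))
    return max(candidates, key=snapshot_time, default=None)
```

A call such as `latest_snapshot(Path.home() / ".cursor" / "browser-logs")` would match the location listed above, assuming the home directory resolves to C:\Users\{username}. A file whose name does not follow the assumed layout raises ValueError rather than being silently skipped.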

Querying Snapshots

⭐ Recommended Workflow: Convert to Markdown + Grep

Best practice for finding information and locating interactive elements:

•Get snapshot → Creates .log file
•Convert to Markdown → More readable format with structured content
•Use grep → Fast text search across the entire document
•Extract refs → Use found refs for interactions

javascript

// Step 1: Get page snapshot
browser_snapshot()  // Creates: snapshot-2026-01-10T23-43-30-351Z.log

// Step 2: Convert to Markdown (RECOMMENDED)
mcp_snapshot-query_convert_to_markdown(
  file_path="snapshot-2026-01-10T23-43-30-351Z.log",
  include_ref=true
)  // Saves snapshot-2026-01-10T23-43-30-351Z.md

// Step 3: Search with grep (much easier than querying raw YAML)
// 搜索 = "search", 登录 = "log in"
grep(pattern="搜索|button|登录", path="snapshot-2026-01-10T23-43-30-351Z.md", -i=true)
grep(pattern="^\\[.*\\]\\(ref-|^\\*\\*.*\\*\\* `ref-", path="snapshot-2026-01-10T23-43-30-351Z.md")  // Find all links/buttons

// Step 4: Use found refs for interaction
browser_click(element="Login button", ref="ref-found-from-grep")

Why this workflow is preferred:

•✅ More readable: Markdown format is human-friendly
•✅ Faster search: grep scans the flat text directly, with no need to walk the nested YAML tree
•✅ Better context: See surrounding lines with grep's -C flag
•✅ Easy element discovery: Links and buttons clearly formatted
•✅ Preserves refs: All element references included for interaction
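
For readers who want to see what the grep step does, here is a small Python sketch of the same idea: find matching lines with surrounding context, then pull out the refs. The sample Markdown is hypothetical; its `[text](ref-...)` link form and `**text** `ref-...`` button form are assumed from the grep pattern used in this section, not taken from real converter output.

```python
import re

# Hypothetical converted snapshot (format assumed, see lead-in).
SAMPLE_MD = """\
# Example site
Welcome back.
[Sign in](ref-a1)
**Search** `ref-b2`
Footer text
"""

REF_PATTERN = re.compile(r"ref-[A-Za-z0-9]+")


def grep_context(text: str, pattern: str, context: int = 0) -> list[str]:
    """Return matching lines plus `context` lines on each side, like grep -i -C."""
    lines = text.splitlines()
    regex = re.compile(pattern, re.IGNORECASE)
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))
    return [lines[i] for i in sorted(keep)]


def extract_refs(lines: list[str]) -> list[str]:
    """Collect every ref token found in the given lines, in order."""
    return [ref for line in lines for ref in REF_PATTERN.findall(line)]


hits = grep_context(SAMPLE_MD, r"sign in|search", context=1)
print(hits)
print(extract_refs(hits))  # ['ref-a1', 'ref-b2']
```

The context lines are what make the Markdown route convenient: a match on a label also brings in the neighbouring line that carries the ref.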

Alternative: Direct Query Tools

For programmatic element finding, use snapshot-query MCP tools:

Command line:

bash

# First generate a snapshot with the browser_snapshot() tool, then query the file:
uvx snapshot-query snapshot.log find-name "search"  # Find element by name

MCP tools:

javascript

mcp_snapshot-query_find_by_name(file_path="snapshot.log", name="搜索")  // 搜索 = "search"
mcp_snapshot-query_find_by_role(file_path="snapshot.log", role="button")
mcp_snapshot-query_find_by_text(file_path="snapshot.log", text="登录")  // 登录 = "log in"
mcp_snapshot-query_find_by_regex(file_path="snapshot.log", pattern="\\d+\\s*ft", field="name")
mcp_snapshot-query_find_by_name_bm25(file_path="snapshot.log", name="search query", top_k=5)
mcp_snapshot-query_count_elements(file_path="snapshot.log")
mcp_snapshot-query_get_element_path(file_path="snapshot.log", ref="ref-xxx")
mcp_snapshot-query_extract_all_refs(file_path="snapshot.log")

Integrated workflow:

javascript

browser_snapshot()  // Creates snapshot file
// Query the snapshot to find the element's ref
mcp_snapshot-query_find_by_name(file_path="snapshot.log", name="Login")
// Pass the ref returned by the query to the click call
browser_click(element="Login", ref="ref-from-query-result")

⭐ snapshot-query works with OCR results too:

The snapshot-query tools can process OCR results from fast-paddleocr-mcp. After OCR processing, you get a .snapshot.log file that can be queried just like browser snapshots:

javascript

// OCR generates webpage.png.snapshot.log
mcp_fast-paddleocr-mcp_ocr_image(image_path="webpage.png", language="ch")

// Query OCR results with snapshot-query
mcp_snapshot-query_find_by_text(
  file_path="webpage.png.snapshot.log",
  text="8 ft",
  case_sensitive=false
)

// Use regex to find measurements (group the units so the number applies to each)
mcp_snapshot-query_find_by_regex(
  file_path="webpage.png.snapshot.log",
  pattern="\\d+\\s*(ft|cm|meters?)",
  field="name"
)

// BM25 ranked keyword search when the exact wording is unknown
mcp_snapshot-query_find_by_name_bm25(
  file_path="webpage.png.snapshot.log",
  name="height measurement",
  top_k=5
)

// Convert to Markdown for analysis
mcp_snapshot-query_convert_to_markdown(
  file_path="webpage.png.snapshot.log",
  include_ref=true
)

See references/snapshot-query.md for complete snapshot-query documentation.
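
When a measurement pattern combines a numeric prefix with several alternative units, note that `|` binds more loosely than concatenation, so the units need a group. The Python sketch below illustrates the difference on a made-up sentence. Python's re module is used as a stand-in; that find_by_regex treats alternation the same way is an assumption.

```python
import re

# Made-up sample text; "acme" contains the letters "cm".
text = "acme door: 8 ft tall (244 cm)"

# Ungrouped: parsed as (\d+\s*ft) OR (cm) OR (meters?)
loose = re.findall(r"\d+\s*ft|cm|meters?", text)

# Grouped: a number followed by any one of the units
strict = re.findall(r"\d+\s*(?:ft|cm|meters?)", text)

print(loose)   # ['cm', '8 ft', 'cm']
print(strict)  # ['8 ft', '244 cm']
```

The ungrouped pattern matches a bare "cm" anywhere, including inside unrelated words, and drops the number in front of it.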

Common Patterns

Login flow:

javascript

browser_navigate(url="https://example.com/login")
browser_snapshot()
// Find username input ref from snapshot
browser_type(element="Username", ref="ref-username", text="user")
// Find password input ref from snapshot
browser_type(element="Password", ref="ref-password", text="pass")
// Find login button ref from snapshot
browser_click(element="Login", ref="ref-login-btn")
browser_wait_for(text="Welcome")

Search and extract (with Markdown workflow):

javascript

// The query string means "How many children does Khamenei have?"
browser_navigate(url="https://www.baidu.com/s?wd=哈梅内伊有几个孩子")
browser_snapshot()  // Creates snapshot.log
// Convert to Markdown for easier searching
mcp_snapshot-query_convert_to_markdown(
  file_path="snapshot.log",
  include_ref=true
)
// Search for information using grep (六名 = "six people", 6个 = "six", 子女 = "children")
grep(pattern="六名|6个|子女", path="snapshot.md", -i=true, -C=3)
// Find interactive elements (links/buttons)
grep(pattern="^\\[.*\\]\\(ref-|^\\*\\*.*\\*\\* `ref-", path="snapshot.md")
// Click on found link using ref
browser_click(element="Article link", ref="ref-45py92vjdrs")
browser_wait_for(text="Results")
browser_take_screenshot(filename="results.png")

Debug page issues:

javascript

browser_snapshot()
browser_console_messages()  // Check for errors
browser_network_requests()  // Check failed requests

Scrolling web pages:

javascript

// Pick the key that matches the scroll distance you need
browser_press_key(key="PageDown")   // Scroll down one page
browser_press_key(key="PageUp")     // Scroll up one page
browser_press_key(key="ArrowDown")  // Scroll down line by line
browser_press_key(key="ArrowUp")    // Scroll up line by line
browser_press_key(key="Space")      // Scroll down one screen
browser_press_key(key="End")        // Scroll to bottom
browser_press_key(key="Home")       // Scroll to top
browser_wait_for(time=1)            // Wait after scrolling for content to load
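
Scrolling is usually repeated until some target content shows up. The Python sketch below shows a bounded version of that loop. Both callables are hypothetical stand-ins: in a real session, scroll_once would correspond to pressing PageDown and waiting, and is_visible to taking a fresh snapshot and searching it.

```python
from typing import Callable


def scroll_until(
    is_visible: Callable[[], bool],
    scroll_once: Callable[[], None],
    max_scrolls: int = 10,
) -> bool:
    """Scroll until is_visible() returns True or max_scrolls is reached."""
    for _ in range(max_scrolls):
        if is_visible():
            return True
        scroll_once()
    return is_visible()


# Simulated page: the target becomes visible after three scrolls.
state = {"scrolls": 0}


def fake_scroll() -> None:
    state["scrolls"] += 1


def fake_visible() -> bool:
    return state["scrolls"] >= 3


found = scroll_until(fake_visible, fake_scroll, max_scrolls=5)
print(found, state["scrolls"])  # True 3
```

The max_scrolls bound keeps the loop from running forever on pages that load content endlessly.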

OCR processing with fast-paddleocr-mcp:

javascript

// Take screenshot of webpage
browser_take_screenshot(filename="webpage.png", fullPage=false)

// Process with OCR (generates .md and .snapshot.log files)
mcp_fast-paddleocr-mcp_ocr_image(
  image_path="webpage.png",
  language="ch"  // Use "ch" for Chinese+English, "en" for English only
)

// Query OCR results with snapshot-query
mcp_snapshot-query_find_by_text(
  file_path="webpage.png.snapshot.log",
  text="tallest",
  case_sensitive=false
)

// Use BM25 ranked keyword search when the exact wording is unknown
mcp_snapshot-query_find_by_name_bm25(
  file_path="webpage.png.snapshot.log",
  name="height tallest person",
  top_k=5
)

// Convert OCR snapshot to Markdown for easier analysis
mcp_snapshot-query_convert_to_markdown(
  file_path="webpage.png.snapshot.log",
  include_ref=true
)

Cross-verification workflow:

javascript

// Navigate to multiple sources for verification
browser_navigate(url="https://source1.com/article")
browser_snapshot()
// Extract information from source 1

browser_navigate(url="https://source2.com/article")
browser_snapshot()
// Extract information from source 2

// Compare and verify information consistency
// Prefer authoritative sources (Wikipedia, official records, etc.)
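
The compare step can be as simple as a vote over the values extracted from each source. Here is a minimal Python sketch of that idea. The source names and values are placeholders, and the normalization rule (lower-casing and collapsing whitespace) is an assumption that would need adapting to the data being compared.

```python
from collections import Counter
from typing import Optional


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(value.lower().split())


def consensus(claims: dict[str, str], min_agree: int = 2) -> Optional[str]:
    """Return the value reported by at least min_agree sources, else None."""
    counts = Counter(normalize(v) for v in claims.values())
    value, votes = counts.most_common(1)[0] if counts else (None, 0)
    return value if votes >= min_agree else None


# Placeholder values standing in for what was extracted from each source.
claims = {
    "source1.com": "42 m",
    "source2.com": "42  M",
    "source3.com": "40 m",
}
print(consensus(claims))  # 42 m
```

A plain vote treats all sources equally; weighting authoritative sources more heavily would be a natural refinement, and a tie between two values is resolved here simply by whichever was seen first.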

Important Notes

•Always snapshot before interaction - Refs are required and page-specific
•⭐ Convert to Markdown first - Use convert_to_markdown + grep for finding information and elements (much easier than querying raw YAML)
•Wait for dynamic content - Use browser_wait_for() for async operations
•Refs expire - Get new snapshot after navigation or page changes
•Multi-tab support - Use viewId parameter or browser_tabs() to manage tabs
•Position control - Use position="side" when user mentions side panel
•OCR limitations - OCR may merge adjacent text (e.g., "otherreliablesourcesccordingtoG"). Key information is usually extracted correctly, but verify important details
•Cross-verification - For critical information, verify across multiple authoritative sources (Wikipedia, official records, etc.)
•Tool combination - Combine browser automation + OCR + snapshot-query for comprehensive web content analysis
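
Merged OCR text can still be searched if whitespace is ignored on both sides. The Python sketch below shows that approach using the merged string quoted in the OCR note; the helper names are hypothetical. It does not handle dropped characters, which would need fuzzy matching instead.

```python
def squash(text: str) -> str:
    """Remove all whitespace and lower-case the text."""
    return "".join(text.split()).lower()


def contains_ignoring_spaces(haystack: str, needle: str) -> bool:
    """True if needle occurs in haystack once whitespace and case are ignored."""
    return squash(needle) in squash(haystack)


ocr_line = "otherreliablesourcesccordingtoG"  # merged OCR output from the note

print(contains_ignoring_spaces(ocr_line, "reliable sources"))  # True
print(contains_ignoring_spaces(ocr_line, "according to"))      # False: the OCR dropped a letter
```

The second lookup fails because the OCR output lost the leading "a" of "according", which is the kind of detail worth verifying against the original image.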

Best Practices & Lessons Learned

Workflow Optimization

•Standard workflow: Navigate → Snapshot → Convert to Markdown → Search → Interact
•OCR workflow: Screenshot → OCR → Query with snapshot-query → Extract information
•Verification workflow: Multiple sources → Extract → Compare → Verify consistency

Tool Integration

•Browser + OCR: Use browser_take_screenshot() + fast-paddleocr-mcp to extract text from visual content
•OCR + snapshot-query: OCR generates .snapshot.log files that can be queried with all snapshot-query tools
•Markdown + grep: Convert snapshots/OCR results to Markdown for easier searching

Key Insights

•snapshot-query is universal: Works with both browser snapshots and OCR results
•Markdown conversion is recommended: Much easier to search and read than raw YAML
•BM25 ranked search: find_by_name_bm25() ranks elements by keyword relevance, which helps when exact matches are unclear
•Cross-verification: Always verify critical information from multiple authoritative sources
•OCR accuracy: Works well for key information but may merge adjacent text - verify important details
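
To show what kind of ranking BM25 provides, here is a compact Python sketch of Okapi BM25 scoring over a list of element names. It is not the snapshot-query implementation; the tokenizer (lower-case, split on whitespace), the k1 and b defaults, and the sample names are all assumptions chosen for illustration. BM25 scores by shared terms, weighted by term rarity and document length, so it rewards keyword overlap rather than meaning.

```python
import math
from collections import Counter


def bm25_rank(
    query: str, docs: list[str], k1: float = 1.5, b: float = 0.75
) -> list[tuple[float, str]]:
    """Score docs against query with Okapi BM25; highest score first."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    # Document frequency: number of docs that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)

    scored = []
    for doc, tokens in zip(docs, tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical element names, standing in for the name field of a snapshot.
names = ["Search", "Search results for height", "Login", "Height of tallest person"]
ranked = bm25_rank("height tallest person", names)
for score, name in ranked[:2]:
    print(f"{score:.2f}  {name}")
```

Names that share more query terms, and rarer ones, rank higher; names with no shared term score zero, which is why a synonym-only match would not be found this way.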

Detailed Reference

•Complete tool reference: See references/tools.md for all tools with full parameters
•Examples and patterns: See references/examples.md for detailed workflows
•Snapshot file format: See references/snapshot-format.md for YAML structure details
•Snapshot querying: See references/snapshot-query.md for querying snapshot files