Cursor IDE Browser Automation
Browser automation for Cursor IDE using the cursor-ide-browser MCP (Model Context Protocol) server, with accessibility snapshots for precise element interaction.
Core Mechanism
Accessibility Snapshot First: Always get a snapshot before interacting with elements. The snapshot provides structured page information with element references (ref) needed for all interactions.
// Standard workflow
browser_navigate(url="https://example.com")
browser_snapshot()  // Required: get element references
browser_click(element="Button", ref="ref-from-snapshot")
Essential Workflow
- Navigate to target page
- Snapshot to get element references (required before any interaction)
- Convert to Markdown (⭐ Recommended) for easier searching, locating, and reading
- Search with grep in the Markdown file to find information or locate interactive elements
- Interact using refs from snapshot
- Wait for dynamic content if needed
- Verify with screenshots or console messages
Quick example:
browser_navigate(url="https://example.com")
browser_snapshot()  // Creates .log file
mcp_snapshot-query_convert_to_markdown(file_path="snapshot.log")
grep(pattern="button|登录", path="snapshot.md")  // Find elements (登录 = "log in")
browser_click(element="Login", ref="ref-from-grep-results")
Key Tools
Navigation:
- `browser_navigate(url, position?)` - Navigate to URL
- `browser_navigate_back()` - Go back
Page Information:
- `browser_snapshot()` - Required before interactions - Get accessibility tree with element refs
- `browser_take_screenshot(fullPage?, filename?)` - Capture a screenshot
- `browser_console_messages()` - Get console logs
- `browser_network_requests()` - Get network activity
Element Interaction:
- `browser_click(element, ref, doubleClick?, button?, modifiers?)` - Click element
- `browser_type(element, ref, text, submit?, slowly?)` - Type text
- `browser_hover(element, ref)` - Hover
- `browser_select_option(element, ref, values)` - Select dropdown option
- `browser_press_key(key)` - Press key (supports PageDown, PageUp, ArrowDown, ArrowUp, Space, End, Home for scrolling)
Synchronization:
- `browser_wait_for(text?, textGone?, time?)` - Wait for text or time
Tab Management:
- `browser_tabs(action, index?, position?)` - Manage tabs (list/new/close/select)
Element References
- **element**: Human-readable description (for permission confirmation)
- **ref**: Technical reference from snapshot (required for interaction)
- Refs are page-state specific - get a new snapshot after navigation or page changes
Snapshot Files
Snapshots are automatically saved as YAML files:
- Location: `C:\Users\{username}\.cursor\browser-logs\snapshot-{timestamp}.log`
- Format: YAML accessibility tree with `role`, `ref`, `name`, `children`
- Usage: Extract `ref` values for element interactions
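As a minimal sketch of the "extract ref values" step, the Python below pulls refs out of snapshot text and picks the newest snapshot file in a directory. It assumes each element carries a line of the form `ref: ref-...` and that file names use fixed-width timestamps like the one in `snapshot-2026-01-10T23-43-30-351Z.log`; the helper names `latest_snapshot` and `extract_refs` are illustrative and not part of any tool described here.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumption: every element has a line such as "ref: ref-45py92vjdrs"
# (optionally quoted). Adjust the pattern if your snapshots differ.
REF_LINE = re.compile(r"""\bref:\s*["']?(ref-[\w-]+)""")


def latest_snapshot(log_dir: Path) -> Optional[Path]:
    """Return the newest snapshot-*.log in log_dir, or None if there is none."""
    # Fixed-width timestamps such as 2026-01-10T23-43-30-351Z sort
    # chronologically when compared as plain strings.
    candidates = sorted(log_dir.glob("snapshot-*.log"), key=lambda p: p.name)
    return candidates[-1] if candidates else None


def extract_refs(snapshot_text: str) -> List[str]:
    """Collect ref values in document order, dropping duplicates."""
    refs: List[str] = []
    for match in REF_LINE.finditer(snapshot_text):
        if match.group(1) not in refs:
            refs.append(match.group(1))
    return refs
```

For a real run, `log_dir` would point at the browser-logs directory listed above; the resulting refs are only valid for the page state that produced that snapshot.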
Querying Snapshots
⭐ Recommended Workflow: Convert to Markdown + Grep
Best practice for finding information and locating interactive elements:
- Get snapshot → Creates `.log` file
- Convert to Markdown → More readable format with structured content
- Use grep → Fast text search across the entire document
- Extract refs → Use found refs for interactions
// Step 1: Get page snapshot
browser_snapshot()
// Creates: snapshot-2026-01-10T23-43-30-351Z.log

// Step 2: Convert to Markdown (RECOMMENDED)
mcp_snapshot-query_convert_to_markdown(
  file_path="snapshot-2026-01-10T23-43-30-351Z.log",
  include_ref=true
)
// Saves to snapshot-2026-01-10T23-43-30-351Z.md

// Step 3: Search with grep (much easier than querying raw YAML)
grep(pattern="搜索|button|登录", path="snapshot.md", -i=true)  // 搜索 = "search", 登录 = "log in"
grep(pattern="^\\[.*\\]\\(ref-|^\\*\\*.*\\*\\* `ref-", path="snapshot.md")  // Find all links/buttons

// Step 4: Use found refs for interaction
browser_click(element="Login button", ref="ref-found-from-grep")
Why this workflow is preferred:
- ✅ More readable: Markdown format is human-friendly
- ✅ Faster search: `grep` is more efficient than parsing YAML
- ✅ Better context: See surrounding content with the `-C` flag
- ✅ Easy element discovery: Links and buttons clearly formatted
- ✅ Preserves refs: All element references included for interaction
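To make the "easy element discovery" point concrete, here is a small Python sketch that does the same job as the links/buttons grep pattern. The Markdown shapes it expects (`[label](ref-xxxx)` for links and a bold label followed by a backticked ref for buttons) are inferred from that grep pattern, not from a documented format, and `find_interactive` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumed Markdown shapes, inferred from the grep pattern in the workflow:
#   link:   [label](ref-xxxx)
#   button: **label** `ref-xxxx`
INTERACTIVE = re.compile(
    r"^\[(?P<link>[^\]]*)\]\((?P<link_ref>ref-[^)\s]+)\)"
    r"|^\*\*(?P<button>.+?)\*\* `(?P<button_ref>ref-[^`\s]+)`"
)


def find_interactive(markdown: str) -> List[Tuple[str, str, str]]:
    """Return (kind, label, ref) for each line that starts with a link or button."""
    found = []
    for line in markdown.splitlines():
        match = INTERACTIVE.match(line)
        if match is None:
            continue
        if match.group("link_ref"):
            found.append(("link", match.group("link"), match.group("link_ref")))
        else:
            found.append(("button", match.group("button"), match.group("button_ref")))
    return found
```

Like the grep pattern, this only matches links and buttons at the start of a line; inline links in running text are ignored.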
Alternative: Direct Query Tools
For programmatic element finding, use snapshot-query MCP tools:
Command line:
browser_snapshot()  # Generate snapshot
uvx snapshot-query snapshot.log find-name "search"  # Find element
MCP tools:
mcp_snapshot-query_find_by_name(file_path="snapshot.log", name="搜索")  // 搜索 = "search"
mcp_snapshot-query_find_by_role(file_path="snapshot.log", role="button")
mcp_snapshot-query_find_by_text(file_path="snapshot.log", text="登录")  // 登录 = "log in"
mcp_snapshot-query_find_by_regex(file_path="snapshot.log", pattern="\\d+\\s*ft", field="name")
mcp_snapshot-query_find_by_name_bm25(file_path="snapshot.log", name="search query", top_k=5)
mcp_snapshot-query_count_elements(file_path="snapshot.log")
mcp_snapshot-query_get_element_path(file_path="snapshot.log", ref="ref-xxx")
mcp_snapshot-query_extract_all_refs(file_path="snapshot.log")
Integrated workflow:
browser_snapshot()  // Creates snapshot file
// Query snapshot to find element ref
const result = mcp_snapshot-query_find_by_name(file_path="snapshot.log", name="Login")
browser_click(element="Login", ref=result.ref)  // Use ref from query
⭐ snapshot-query works with OCR results too:
The snapshot-query tools can process OCR results from fast-paddleocr-mcp. After OCR processing, you get a .snapshot.log file that can be queried just like browser snapshots:
// OCR generates webpage.png.snapshot.log
mcp_fast-paddleocr-mcp_ocr_image(image_path="webpage.png", language="ch")

// Query OCR results with snapshot-query
mcp_snapshot-query_find_by_text(
  file_path="webpage.png.snapshot.log",
  text="8 ft",
  case_sensitive=false
)

// Use regex to find measurements (units are grouped so every alternative requires a number)
mcp_snapshot-query_find_by_regex(
  file_path="webpage.png.snapshot.log",
  pattern="\\d+\\s*(ft|cm|meters?)",
  field="name"
)

// BM25 relevance-ranked search for better results
mcp_snapshot-query_find_by_name_bm25(
  file_path="webpage.png.snapshot.log",
  name="height measurement",
  top_k=5
)

// Convert to Markdown for analysis
mcp_snapshot-query_convert_to_markdown(
  file_path="webpage.png.snapshot.log",
  include_ref=true
)
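Unit patterns passed to `find_by_regex` are easy to get subtly wrong, because `|` has the lowest precedence in a regular expression. The sketch below uses Python's `re` module to compare an ungrouped and a grouped measurement pattern. It assumes snapshot-query's regex engine treats alternation the same way, which is not stated here; the sample sentence is made up for illustration.

```python
import re

text = "The wall is 8 ft tall, about 244 cm; cm alone is not a measurement."

# Ungrouped: three separate alternatives, "\d+\s*ft" OR "cm" OR "meters?".
loose = re.compile(r"\d+\s*ft|cm|meters?")
# Grouped: a number followed by any one of the units.
# (?:...) is non-capturing, so findall still returns whole matches.
grouped = re.compile(r"\d+\s*(?:ft|cm|meters?)")

print(loose.findall(text))    # ['8 ft', 'cm', 'cm']
print(grouped.findall(text))  # ['8 ft', '244 cm']
```

The ungrouped form loses the number in front of `cm` and also matches a bare unit, so grouping the units is usually what a measurement search wants.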
See references/snapshot-query.md for complete snapshot-query documentation.
Common Patterns
Login flow:
browser_navigate(url="https://example.com/login")
browser_snapshot()
// Find username input ref from snapshot
browser_type(element="Username", ref="ref-username", text="user")
// Find password input ref from snapshot
browser_type(element="Password", ref="ref-password", text="pass")
// Find login button ref from snapshot
browser_click(element="Login", ref="ref-login-btn")
browser_wait_for(text="Welcome")
Search and extract (with Markdown workflow):
// Query text: "How many children does Khamenei have?"
browser_navigate(url="https://www.baidu.com/s?wd=哈梅内伊有几个孩子")
browser_snapshot()  // Creates snapshot.log

// Convert to Markdown for easier searching
mcp_snapshot-query_convert_to_markdown(
  file_path="snapshot.log",
  include_ref=true
)

// Search for information using grep (六名 = "six people", 6个 = "6", 子女 = "children")
grep(pattern="六名|6个|子女", path="snapshot.md", -i=true, -C=3)

// Find interactive elements (links/buttons)
grep(pattern="^\\[.*\\]\\(ref-|^\\*\\*.*\\*\\* `ref-", path="snapshot.md")

// Click on found link using ref
browser_click(element="Article link", ref="ref-45py92vjdrs")
browser_wait_for(text="Results")
browser_take_screenshot(filename="results.png")
Debug page issues:
browser_snapshot()
browser_console_messages()  // Check for errors
browser_network_requests()  // Check failed requests
Scrolling web pages:
browser_press_key("PageDown") // Scroll down one page
browser_press_key("PageUp") // Scroll up one page
browser_press_key("ArrowDown") // Scroll down line by line
browser_press_key("ArrowUp") // Scroll up line by line
browser_press_key("Space") // Scroll down one screen
browser_press_key("End") // Scroll to bottom
browser_press_key("Home") // Scroll to top
browser_wait_for(time=1) // Wait after scrolling for content to load
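The scroll-then-wait pattern above is often repeated until some target content shows up. A minimal Python sketch of that loop follows. The three callables are hypothetical stand-ins for whatever issues the actual tool calls (for example `browser_press_key("PageDown")`, `browser_wait_for(time=1)`, and a check against a fresh snapshot); nothing here is part of the cursor-ide-browser API.

```python
from typing import Callable


def scroll_until(
    is_found: Callable[[], bool],
    scroll_once: Callable[[], None],
    settle: Callable[[], None],
    max_scrolls: int = 10,
) -> bool:
    """Scroll until is_found() is true or max_scrolls is used up."""
    for _ in range(max_scrolls):
        if is_found():
            return True
        scroll_once()  # hypothetical wrapper, e.g. around browser_press_key("PageDown")
        settle()       # hypothetical wrapper, e.g. around browser_wait_for(time=1)
    return is_found()  # final check after the last scroll
```

Capping the number of scrolls keeps the loop from running forever on infinite-scroll pages, and because refs are page-state specific, `is_found` should look at a fresh snapshot each time.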
OCR processing with fast-paddleocr-mcp:
// Take screenshot of webpage
browser_take_screenshot(filename="webpage.png", fullPage=false)

// Process with OCR (generates .md and .snapshot.log files)
mcp_fast-paddleocr-mcp_ocr_image(
  image_path="webpage.png",
  language="ch"  // Use "ch" for Chinese+English, "en" for English only
)

// Query OCR results with snapshot-query
mcp_snapshot-query_find_by_text(
  file_path="webpage.png.snapshot.log",
  text="tallest",
  case_sensitive=false
)

// Use BM25 relevance-ranked search for better results
mcp_snapshot-query_find_by_name_bm25(
  file_path="webpage.png.snapshot.log",
  name="height tallest person",
  top_k=5
)

// Convert OCR snapshot to Markdown for easier analysis
mcp_snapshot-query_convert_to_markdown(
  file_path="webpage.png.snapshot.log",
  include_ref=true
)
Cross-verification workflow:
// Navigate to multiple sources for verification
browser_navigate(url="https://source1.com/article")
browser_snapshot()
// Extract information from source 1

browser_navigate(url="https://source2.com/article")
browser_snapshot()
// Extract information from source 2

// Compare and verify information consistency
// Prefer authoritative sources (Wikipedia, official records, etc.)
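The "compare and verify" step can be as simple as counting how many sources agree. The Python below is one possible sketch, not a prescribed method: `cross_check` is a hypothetical helper, the source names are placeholders, and the rule of requiring at least two agreeing sources is an assumption chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def cross_check(
    claims: Dict[str, str], min_agreement: int = 2
) -> Tuple[Optional[str], List[str]]:
    """Return (consensus value or None, sources that need review).

    claims maps a source name to the value extracted from that source.
    The consensus value is returned in normalized (lowercase) form.
    """
    # Normalize case and whitespace so "Six" and " six " count as equal.
    normalized = {src: " ".join(val.lower().split()) for src, val in claims.items()}
    ranked = Counter(normalized.values()).most_common()
    if not ranked:
        return None, []
    top_value, top_votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if top_votes < min_agreement or tied:
        return None, sorted(normalized)  # no consensus: every source needs review
    dissenting = sorted(src for src, val in normalized.items() if val != top_value)
    return top_value, dissenting


print(cross_check({"source1": "Six", "source2": " six ", "source3": "four"}))
# ('six', ['source3'])
```

A plain vote treats every source equally; weighting authoritative sources more heavily, as recommended above, would need an extra step.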
Important Notes
- Always snapshot before interaction - Refs are required and page-specific
- ⭐ Convert to Markdown first - Use `convert_to_markdown` + `grep` for finding information and elements (much easier than querying raw YAML)
- Wait for dynamic content - Use `browser_wait_for()` for async operations
- Refs expire - Get a new snapshot after navigation or page changes
- Multi-tab support - Use the `viewId` parameter or `browser_tabs()` to manage tabs
- Position control - Use `position="side"` when the user mentions a side panel
- OCR limitations - OCR may merge adjacent text (e.g., "otherreliablesourcesccordingtoG"). Key information is usually extracted correctly, but verify important details
- Cross-verification - For critical information, verify across multiple authoritative sources (Wikipedia, official records, etc.)
- Tool combination - Combine browser automation + OCR + snapshot-query for comprehensive web content analysis
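Because OCR may merge adjacent words, an exact phrase search can miss text that is actually present. One way to work around lost spacing, sketched in Python below, is to compare the phrase and the OCR text with whitespace and punctuation removed. The helper names `squash` and `ocr_contains` are hypothetical, and the approach only compensates for missing spaces, not for dropped or misread characters.

```python
import re


def squash(text: str) -> str:
    """Lowercase and remove whitespace, punctuation, and underscores."""
    return re.sub(r"[\W_]+", "", text.lower())


def ocr_contains(ocr_text: str, phrase: str) -> bool:
    """True if phrase occurs in ocr_text once spacing and punctuation are ignored."""
    needle = squash(phrase)
    return bool(needle) and needle in squash(ocr_text)


merged = "otherreliablesourcesccordingtoG"
print(ocr_contains(merged, "reliable sources"))  # True
print(ocr_contains(merged, "according to"))      # False: the OCR dropped the "a"
```

Ignoring spaces can also produce matches that span unrelated words, so hits found this way are still worth checking against the screenshot.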
Best Practices & Lessons Learned
Workflow Optimization
- Standard workflow: Navigate → Snapshot → Convert to Markdown → Search → Interact
- OCR workflow: Screenshot → OCR → Query with snapshot-query → Extract information
- Verification workflow: Multiple sources → Extract → Compare → Verify consistency
Tool Integration
- Browser + OCR: Use `browser_take_screenshot()` + `fast-paddleocr-mcp` to extract text from visual content
- OCR + snapshot-query: OCR generates `.snapshot.log` files that can be queried with all snapshot-query tools
- Markdown + grep: Convert snapshots/OCR results to Markdown for easier searching
Key Insights
- snapshot-query is universal: Works with both browser snapshots and OCR results
- Markdown conversion is recommended: Much easier to search and read than raw YAML
- BM25 ranked search: Use `find_by_name_bm25()` for relevance-ranked keyword matching when exact matches are unclear
- Cross-verification: Always verify critical information from multiple authoritative sources
- OCR accuracy: Works well for key information but may merge adjacent text - verify important details
Detailed Reference
- Complete tool reference: See references/tools.md for all tools with full parameters
- Examples and patterns: See references/examples.md for detailed workflows
- Snapshot file format: See references/snapshot-format.md for YAML structure details
- Snapshot querying: See references/snapshot-query.md for querying snapshot files