Authenticated Scrape
A guided workflow for scraping authenticated pages using Chrome DevTools automation. This skill uses dev-browser under the hood to capture network requests with auth headers automatically.
What This Skill Does
Helps you scrape data from authenticated pages by:
- •Opening a browser and navigating to the target site
- •Letting you log in normally (or automating it)
- •Capturing authenticated API requests automatically
- •Extracting the data you need
- •Creating reusable code/scripts for future scraping
Workflow
Step 1: Launch Browser & Navigate
- •Use mcp__chrome-devtools__list_pages to check existing pages
- •Use mcp__chrome-devtools__new_page or mcp__chrome-devtools__navigate_page to open the target site
- •Ask the user to log in manually, OR offer to automate login if they want
Step 2: Navigate to Target Content
- •Once authenticated, navigate to the page with the data they want to scrape
- •Use mcp__chrome-devtools__take_snapshot to verify the page loaded correctly
Step 3: Capture Network Requests
- •Use mcp__chrome-devtools__list_network_requests to capture all API calls
- •Filter for XHR/Fetch requests (these usually contain the data)
- •Show user a clean list of endpoints captured (with their URLs and types)
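The filtering and listing in this step can be sketched in Python. The record shape used here (dicts with url, method, resourceType, and status keys) is an assumption for illustration; the actual output of mcp__chrome-devtools__list_network_requests may be structured differently, so adapt the key names to what the tool returns.

```python
from typing import Iterable

# Hypothetical record shape: the real tool output may use different keys.
API_RESOURCE_TYPES = {"xhr", "fetch"}


def filter_api_requests(requests: Iterable[dict]) -> list[dict]:
    """Keep only XHR/Fetch entries, which usually carry the page's data."""
    return [
        r for r in requests
        if str(r.get("resourceType", "")).lower() in API_RESOURCE_TYPES
    ]


def summarize(requests: Iterable[dict]) -> list[str]:
    """Format one numbered line per request for showing to the user."""
    return [
        f"{i}. {r.get('method', '?')} {r.get('url', '?')} - {r.get('status', '?')}"
        for i, r in enumerate(requests, start=1)
    ]


# Made-up sample data standing in for a captured request list.
captured = [
    {"url": "/api/users", "method": "GET", "resourceType": "xhr", "status": 200},
    {"url": "/static/app.js", "method": "GET", "resourceType": "script", "status": 200},
    {"url": "/api/events", "method": "POST", "resourceType": "fetch", "status": 204},
]
for line in summarize(filter_api_requests(captured)):
    print(line)
```

Filtering on resource type rather than on URL patterns keeps the sketch site-agnostic. If a site loads its data through another request type, widen API_RESOURCE_TYPES accordingly.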
Step 4: Identify Target Request
- •Ask user which request contains the data they want
- •Use mcp__chrome-devtools__get_network_request to show details
- •Display the request URL, headers (including auth), and response preview
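When displaying headers that include auth values, it may be safer to show only a short prefix of each secret. The following is a minimal sketch under assumptions: the header set and the 13-character prefix length are illustrative choices, not part of the skill or of the DevTools tooling.

```python
# Header names treated as secrets in this sketch (an illustrative choice).
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie"}


def redact_headers(headers: dict[str, str], keep: int = 13) -> dict[str, str]:
    """Return a copy with sensitive values truncated to a short prefix."""
    redacted = {}
    for name, value in headers.items():
        if name.lower() in SENSITIVE_HEADERS:
            # Keep just enough to recognise the scheme and token family.
            redacted[name] = value[:keep] + "..."
        else:
            redacted[name] = value
    return redacted


# Made-up header values for demonstration only.
sample = {
    "Authorization": "Bearer eyJhbGciOiJIUzI1NiJ9.payload.signature",
    "Accept": "application/json",
}
for name, value in redact_headers(sample).items():
    print(f"{name}: {value}")
```

The full, unredacted values are still needed when building a reusable script, so redaction here applies only to what is shown to the user.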
Step 5: Extract Data
- •Get the full response data from the request
- •Parse JSON or HTML as needed
- •Ask what specific data points they want to extract
- •Use jq, JavaScript, or other tools to extract the data
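As one way to do the extraction, here is a small Python sketch that pulls a single field out of a JSON response. The response body, the top-level "users" key, and the field names are hypothetical; they mirror the example interaction rather than any real API.

```python
import json

# Hypothetical response body; the structure is assumed for illustration.
response_body = """
{"users": [
  {"id": 1, "email": "ada@example.com", "name": "Ada", "created_at": "2024-01-02"},
  {"id": 2, "email": "lin@example.com", "name": "Lin", "created_at": "2024-02-03"},
  {"id": 3, "name": "No Email", "created_at": "2024-03-04"}
]}
"""


def extract_field(body: str, list_key: str, field: str) -> list:
    """Parse a JSON body and collect one field from each record in a list."""
    data = json.loads(body)
    # Accept either a wrapped list ({"users": [...]}) or a bare top-level list.
    records = data[list_key] if isinstance(data, dict) else data
    # Skip records that lack the field instead of failing on them.
    return [rec[field] for rec in records if field in rec]


emails = extract_field(response_body, "users", "email")
print(emails)
```

Records without the requested field are skipped silently here; depending on the use case, it may be better to report them to the user.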
Step 6: Make It Reusable
- •Offer to create a standalone script that:
  - •Uses the same headers/cookies
  - •Makes the request programmatically
  - •Parses and extracts the data
- •Save as a Node.js script, Python script, or simple curl command
- •Remind them that auth tokens expire
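A standalone script of this kind might look like the following sketch, which uses only the Python standard library. The endpoint URL and the environment variable names SCRAPE_TOKEN and SCRAPE_COOKIE are assumptions; substitute the URL and header values captured in Step 4. Reading secrets from the environment keeps tokens out of the script file itself.

```python
import json
import os
import urllib.request

# Hypothetical endpoint; replace with the URL captured in Step 4.
API_URL = "https://example.com/api/users"


def build_request(url: str, token: str = "", cookie: str = "") -> urllib.request.Request:
    """Build a GET request that replays the captured auth headers."""
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    if cookie:
        headers["Cookie"] = cookie
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # SCRAPE_TOKEN and SCRAPE_COOKIE are assumed environment variable names.
    token = os.environ.get("SCRAPE_TOKEN", "")
    cookie = os.environ.get("SCRAPE_COOKIE", "")
    if token or cookie:
        data = fetch_json(build_request(API_URL, token, cookie))
        print(json.dumps(data, indent=2))
    else:
        print("Set SCRAPE_TOKEN or SCRAPE_COOKIE before running.")
```

Some sites also check headers such as User-Agent or a CSRF token, so a request that works in the browser may still need additional captured headers copied into build_request.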
Important Reminders
- •Security: Network requests contain sensitive auth tokens. Handle carefully.
- •Token Expiration: Session tokens expire. Scripts may need token refresh logic.
- •Ethics: Only scrape your own authenticated sessions and respect ToS.
- •Rate Limiting: Be respectful if automating frequent requests.
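For the token expiration reminder, a saved script can check whether a captured token is still usable before sending a request. The sketch below assumes the token is a JWT carrying an exp claim; opaque session tokens and cookies cannot be inspected this way. It decodes the payload only and does not verify the signature, so it is suitable for an expiry check and nothing more.

```python
import base64
import json
import time


def jwt_expiry(token: str):
    """Return the exp claim (Unix seconds) of a JWT, or None if unavailable.

    Decodes the payload only; the signature is NOT verified.
    """
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # covers base64, UTF-8, and JSON decoding errors
        return None
    return claims.get("exp") if isinstance(claims, dict) else None


def is_expired(token: str, leeway: int = 60, now=None) -> bool:
    """True if the token expires within `leeway` seconds; False if no exp."""
    exp = jwt_expiry(token)
    if exp is None:
        return False
    current = time.time() if now is None else now
    return current >= exp - leeway


def _b64(obj) -> str:
    """Encode a dict as unpadded base64url JSON, for building a demo token."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Made-up unsigned token used only to exercise the functions above.
demo = ".".join([_b64({"alg": "none"}), _b64({"exp": 1_700_000_000}), "sig"])
print(jwt_expiry(demo))
```

When is_expired returns True, the simplest recovery is to log in again in the browser and recapture the request.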
Known Limitations
Automated Login Detection: Many sites (GitHub, Google, banking sites) detect browsers that are automated through the Chrome DevTools Protocol and block login attempts. This is a security feature.
Workarounds:
- •Manual login approach: Ask user to log in manually in the browser window
- •Regular browser first: Have user log in via regular Chrome, then capture requests
- •Focus on data extraction: Skip automated login, focus on capturing already-authenticated sessions
- •API-friendly sites: Some demo/test APIs (ReqRes, JSONPlaceholder) are more lenient
What Works Well:
- •✅ Capturing network requests from any page
- •✅ Extracting headers, cookies, auth tokens
- •✅ Parsing JSON/HTML responses
- •✅ Generating reusable scripts
- •✅ Sites without strict bot detection
What May Not Work:
- •❌ Automated login on major platforms (GitHub, Google, Facebook)
- •❌ Sites with aggressive bot detection
- •❌ Multi-factor authentication (requires manual intervention)
Troubleshooting
"Could not log in - This browser may not be secure"
Cause: Site detects automated browser (DevTools Protocol)
Solution: Have user log in manually in the browser window instead of automating login
Empty network requests / No XHR captured
Cause: Page hasn't loaded data yet or uses different request types
Solution:
- •Wait for page to fully load
- •Check all request types, not just XHR/Fetch
- •Navigate to the page that actually loads the data
"Unexpected token '<'" or HTML instead of JSON
Cause: API endpoint requires authentication or returns error page
Solution:
- •Verify the endpoint URL is correct
- •Check if authentication headers were captured
- •Try the request in the browser first to confirm it works
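A generated script can surface this failure more clearly by checking for HTML before attempting to parse JSON. This is a minimal sketch; the function name and the error wording are illustrative, and the leading "<" check is a heuristic rather than a reliable content test.

```python
import json


def parse_json_or_explain(body: str, content_type: str = ""):
    """Parse a response body as JSON, raising a clearer error for HTML."""
    looks_like_html = (
        "text/html" in content_type.lower() or body.lstrip().startswith("<")
    )
    if looks_like_html:
        raise ValueError(
            "Got HTML instead of JSON; the endpoint may have returned a login "
            "or error page. Check the URL and the captured auth headers."
        )
    return json.loads(body)


print(parse_json_or_explain('{"ok": true}', "application/json"))
```

Checking the Content-Type header first catches the common case where an expired session redirects to a login page served as text/html.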
Tips
- •If the page uses token-based auth (OAuth, JWT), capture the Authorization header
- •For cookie-based auth, capture the Cookie header
- •If requests fail later, you may need to recapture with fresh tokens
- •For pagination, help identify the pagination parameters in the request
- •Best practice: Let user log in manually, then capture the authenticated session
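For the pagination tip, the loop below sketches one common pattern: page-number pagination that stops at the first empty page. It assumes the site uses an incrementing page parameter; cursor- or offset-based APIs need a different loop. The fetch_page callable is a hypothetical stand-in for whatever function performs the authenticated request, and the delay between pages is there to keep the request rate modest.

```python
import time
from typing import Callable, Iterator


def paginate(
    fetch_page: Callable[[int], list],
    start: int = 1,
    max_pages: int = 100,
    delay_seconds: float = 1.0,
) -> Iterator:
    """Yield items page by page until an empty page or max_pages is reached."""
    for page in range(start, start + max_pages):
        items = fetch_page(page)
        if not items:
            break
        yield from items
        time.sleep(delay_seconds)  # pause between requests to limit load


# Made-up in-memory pages standing in for real API responses.
fake_pages = {1: ["a", "b"], 2: ["c"]}
print(list(paginate(lambda p: fake_pages.get(p, []), delay_seconds=0)))
```

The max_pages cap guards against an endpoint that never returns an empty page.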
Example Interaction
User: /authenticated-scrape
You: I'll help you scrape authenticated content. First, let me check if there's already a browser page open.
[Lists pages, or creates new one]
You: I've opened https://example.com/dashboard. Please log in manually in the browser.
[User logs in]
You: Great! Now navigate to the page with the data you want to scrape.
[User navigates to /api/users endpoint or data page]
You: I've captured 12 network requests. Here are the XHR/Fetch requests:
1. GET /api/users - 200 OK (JSON, 45KB)
2. GET /api/analytics - 200 OK (JSON, 12KB)
3. POST /api/events - 204 No Content
Which request has the data you want?
User: The first one
You: [Shows request details with headers and preview]
Found Authorization: Bearer eyJhbG... and Cookie: session_id=abc123...
The response contains an array of 200 users with fields: id, email, name, created_at.
What data would you like to extract?
User: All the emails
You: [Extracts emails and offers to save]
I can create a reusable Node.js script that makes this request with the same auth headers. Would you like me to do that?
Start Here
When the skill is invoked, begin by asking the user:
- •What website/service they want to scrape
- •Whether they're already logged in or need help with authentication
Then proceed with Step 1 of the workflow.