Site Checker
Validate all pages reachable from an index URL. Checks HTTP status, broken media, SEO tags, accessibility basics, and console errors.
Arguments
The user provides:
- Index URL (required): e.g., `http://localhost:4321/blog`, `https://nx.dev/docs`, `https://deploy-preview-123--nx-dev.netlify.app/blog`
- Limit (optional): max pages to check, default 50
- Compare URL (optional): base URL of a reference site to diff DOM against (e.g., `https://nx.dev` when checking a preview deploy)
- Path filter (optional): only follow links matching this prefix. Defaults to the path of the index URL.
Process
Phase 1: Extract Links from Index Page
- Navigate to the index URL using Playwright MCP:
  ```
  mcp__playwright__browser_navigate({ url: INDEX_URL })
  mcp__playwright__browser_snapshot()
  ```
- Extract all `<a href>` links from the page using `browser_evaluate`:
  ```javascript
  mcp__playwright__browser_evaluate({
    expression: `JSON.stringify(
      [...document.querySelectorAll('a[href]')]
        .map(a => new URL(a.href, document.baseURI).href)
        .filter((v, i, arr) => arr.indexOf(v) === i)
    )`
  })
  ```
- Filter links:
  - Same origin only: must match the index URL's origin (no external links)
  - Path prefix: must start with the base path (e.g., `/blog/`) unless the user specified a different filter
  - Deduplicate: strip query params and fragments before checking for uniqueness
  - Exclude anchors: skip `#`-only links
  - Apply limit: cap at the user's limit (default 50)
- Report: "Found N links matching `<prefix>` on `<index URL>`. Checking M pages (limit: L)."
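The filter rules above can be applied outside the browser. Below is a minimal sketch using a hypothetical `filterLinks` helper. It assumes `links` is the array of absolute URLs returned by the extraction snippet. It also returns both the full match list and the capped list, so the report can state N and M.

```javascript
// Hypothetical helper for the Phase 1 filter rules. `links` is assumed to be the
// array of absolute URLs returned by the browser_evaluate call.
function filterLinks(links, indexUrl, { prefix, limit = 50 } = {}) {
  const index = new URL(indexUrl);
  // Default prefix: the index path with a trailing slash, e.g. "/blog" -> "/blog/"
  const basePath = prefix ?? (index.pathname.endsWith('/') ? index.pathname : `${index.pathname}/`);
  const indexKey = new URL(index);
  indexKey.search = '';
  indexKey.hash = '';
  const seen = new Set();
  const found = [];
  for (const href of links) {
    const url = new URL(href, index);
    if (url.origin !== index.origin) continue;        // same origin only
    if (!url.pathname.startsWith(basePath)) continue; // path prefix
    url.search = '';                                   // drop query params
    url.hash = '';                                     // drop fragments
    // A "#"-only link resolves to the index page itself, so skip it
    if (url.href === indexKey.href || seen.has(url.href)) continue;
    seen.add(url.href);
    found.push(url.href);
  }
  // found = N matching links; toCheck = the M pages actually checked
  return { found, toCheck: found.slice(0, limit) };
}
```

Take the inputs `http://localhost:4321/blog/a?ref=nav`, `http://localhost:4321/blog/a#intro`, `http://localhost:4321/blog#top`, `http://localhost:4321/docs/b` and `https://example.com/blog/c`, with the index `http://localhost:4321/blog`. The helper keeps only `http://localhost:4321/blog/a`.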
Phase 2: Check Each Page
Spawn parallel subagents (up to 5 at a time) to check pages concurrently. Each agent checks one page using Playwright MCP.
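The orchestration might be scripted rather than delegated to subagents. In that case, the "at most 5 at a time" rule can be sketched as a small worker pool. `runWithLimit` and `checkPage` are hypothetical names, and `checkPage` stands in for one subagent's work on one URL.

```javascript
// Hypothetical sketch: run checkPage over urls with at most `concurrency`
// tasks in flight, keeping results in the same order as the input.
async function runWithLimit(urls, concurrency, checkPage) {
  const results = new Array(urls.length);
  let next = 0;
  // Each worker claims the next unchecked URL until none remain
  async function worker() {
    while (next < urls.length) {
      const i = next++;
      results[i] = await checkPage(urls[i]);
    }
  }
  const workers = Array.from({ length: Math.min(concurrency, urls.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Workers pull from a shared index instead of processing fixed batches of 5. A slow page therefore only holds up its own slot.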
For each page URL, perform ALL of the following checks:
Check 1: HTTP Status
- Navigate to the URL with `mcp__playwright__browser_navigate`
- Record the final status. A page is OK if it loads without a network error.
- If the page shows a 404/error page, flag it.
- Record any redirect chains (the page navigated away from the expected URL).
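A minimal sketch of the redirect check, using a hypothetical `detectRedirect` helper. It assumes the final URL is read back from the page after navigation, for example `location.href` via `browser_evaluate`. It sees only the start and end of a chain, not intermediate hops.

```javascript
// Hypothetical helper: report a redirect when the page ended up somewhere other
// than the requested URL. Fragments are ignored; a trailing-slash-only change is
// still reported, since it usually comes from a server redirect.
function detectRedirect(requestedUrl, finalUrl) {
  const from = new URL(requestedUrl);
  const to = new URL(finalUrl);
  from.hash = '';
  to.hash = '';
  return from.href === to.href ? null : { from: from.href, to: to.href };
}
```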
Check 2: Broken Media
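Use `browser_evaluate` to check all media elements:
```javascript
mcp__playwright__browser_evaluate({
  expression: `JSON.stringify({
    images: [...document.querySelectorAll('img')].map(img => ({
      src: img.src,
      alt: img.alt,
      // img.alt is "" both when alt is absent and when it is empty; only absence is an a11y issue
      hasAlt: img.hasAttribute('alt'),
      // Lazy images outside the viewport may not have loaded yet, so "broken" can be a false positive
      lazy: img.loading === 'lazy',
      broken: !img.complete || img.naturalWidth === 0
    })),
    videos: [...document.querySelectorAll('video source, video[src]')].map(v => ({
      src: v.src || v.getAttribute('src')
    })),
    pictures: [...document.querySelectorAll('picture source')].map(s => ({
      srcset: s.srcset
    }))
  })`
})
```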
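- Flag any image where `broken === true`
- Flag images with an empty or missing `src`
- Flag images missing the `alt` attribute (accessibility). An empty `alt=""` is valid for decorative images, so test for the attribute's presence, not for a non-empty value.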
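The image rules can be turned into report lines with a hypothetical `imageFindings` helper. It assumes each image record has the shape `{ src, broken, hasAlt }`, where `hasAlt` would have to be collected in the page with `img.hasAttribute('alt')`. The severities follow the classification table: broken images are ERROR and missing alt text is WARNING. An empty `src` is treated here as a broken image, which is an assumption.

```javascript
// Hypothetical helper: convert Check 2 image records into report findings.
// Assumes records shaped like { src, broken, hasAlt }.
function imageFindings(images) {
  const findings = [];
  for (const img of images) {
    if (!img.src) {
      findings.push({ severity: 'ERROR', tag: 'IMAGE', message: 'Image with empty or missing src' });
    } else if (img.broken) {
      findings.push({ severity: 'ERROR', tag: 'IMAGE', message: `Broken image: ${img.src}` });
    }
    if (!img.hasAlt) {
      findings.push({ severity: 'WARNING', tag: 'A11Y', message: `Image missing alt: ${img.src || '(no src)'}` });
    }
  }
  return findings;
}
```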
Check 3: SEO Tags
Use `browser_evaluate`:
```javascript
mcp__playwright__browser_evaluate({
  expression: `JSON.stringify({
    title: document.title,
    metaDesc: document.querySelector('meta[name="description"]')?.content || null,
    ogTitle: document.querySelector('meta[property="og:title"]')?.content || null,
    ogDesc: document.querySelector('meta[property="og:description"]')?.content || null,
    ogImage: document.querySelector('meta[property="og:image"]')?.content || null,
    ogUrl: document.querySelector('meta[property="og:url"]')?.content || null,
    canonical: document.querySelector('link[rel="canonical"]')?.href || null,
    twitterCard: document.querySelector('meta[name="twitter:card"]')?.content || null,
    h1Count: document.querySelectorAll('h1').length,
    h1Text: document.querySelector('h1')?.textContent?.trim() || null,
    robots: document.querySelector('meta[name="robots"]')?.content || null
  })`
})
```
Flag if:
- Missing `<title>` or it's empty
- Missing `meta[name="description"]`
- Missing `og:title`, `og:description`, or `og:image`
- `og:image` URL returns 404 (verify with a fetch)
- Missing `canonical` link
- `h1` count is not exactly 1
- `og:url` doesn't match the current page URL
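The rules above can be sketched as a hypothetical `seoFindings` helper over the object returned by the SEO snippet. The `og:image` 404 check needs a network fetch and is left out. Severity is also left out, since it comes from the classification table. Treating a trailing-slash-only difference in `og:url` as a match is an assumption.

```javascript
// Hypothetical helper: apply the Check 3 rules to the SEO snippet's output.
function seoFindings(seo, pageUrl) {
  const out = [];
  const flag = (message) => out.push({ tag: 'SEO', message });
  if (!seo.title || !seo.title.trim()) flag('Missing or empty <title>');
  if (!seo.metaDesc) flag('Missing meta description');
  for (const [key, name] of [['ogTitle', 'og:title'], ['ogDesc', 'og:description'], ['ogImage', 'og:image']]) {
    if (!seo[key]) flag(`Missing ${name}`);
  }
  if (!seo.canonical) flag('Missing canonical link');
  if (seo.h1Count !== 1) flag(`Expected exactly 1 <h1>, found ${seo.h1Count}`);
  if (seo.ogUrl && normalize(seo.ogUrl, pageUrl) !== normalize(pageUrl, pageUrl)) {
    flag(`og:url (${seo.ogUrl}) does not match page URL`);
  }
  return out;
}

// Assumption: ignore fragments and a single trailing slash when comparing URLs
function normalize(url, base) {
  const u = new URL(url, base);
  u.hash = '';
  if (u.pathname.length > 1 && u.pathname.endsWith('/')) u.pathname = u.pathname.slice(0, -1);
  return u.href;
}
```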
Check 4: Console Errors
Check for JavaScript errors after page load:
```javascript
mcp__playwright__browser_console_messages()
```
- Flag any messages with level `error`
- Ignore common benign errors (e.g., favicon.ico 404, third-party analytics)
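A minimal sketch of the benign-error filter, using a hypothetical `relevantConsoleErrors` helper. It assumes each console message has been reduced to `{ level, text }`. The pattern list is only an example and should be adjusted to the site's actual third-party scripts.

```javascript
// Example benign patterns (assumption): favicon misses and common analytics hosts
const BENIGN_PATTERNS = [
  /favicon\.ico/i,
  /google-analytics\.com|googletagmanager\.com/i,
];

// Hypothetical helper: keep error-level messages that match no benign pattern
function relevantConsoleErrors(messages) {
  return messages.filter(
    (m) => m.level === 'error' && !BENIGN_PATTERNS.some((re) => re.test(m.text))
  );
}
```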
Check 5: Internal Link Health
Use `browser_evaluate` to extract all internal links on the page:
```javascript
mcp__playwright__browser_evaluate({
  expression: `JSON.stringify(
    [...document.querySelectorAll('a[href]')]
      .map(a => new URL(a.href, document.baseURI))
      .filter(u => u.origin === location.origin)
      .map(u => u.pathname)
      .filter((v, i, arr) => arr.indexOf(v) === i)
  )`
})
```
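- Do NOT follow these links (that would be recursion); just log them for awareness.
- If any link looks broken (e.g., an empty `href` or `javascript:void(0)`), flag it. The snippet above cannot catch these cases. A `javascript:` URL has an opaque origin (`"null"`), so it fails the same-origin filter, and an empty `href` resolves to the current page. Read the raw value with `a.getAttribute('href')` instead.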
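A minimal sketch of the broken-link test, using a hypothetical `suspiciousHref` helper. It runs on raw attribute values, for example `[...document.querySelectorAll('a')].map(a => a.getAttribute('href'))`, rather than on the resolved `a.href`. It flags only the two cases named above.

```javascript
// Hypothetical helper: classify a raw href attribute value (null when absent)
function suspiciousHref(raw) {
  const value = (raw ?? '').trim();
  if (value === '') return 'empty href';
  if (/^javascript:/i.test(value)) return 'javascript: URL';
  return null; // looks like a normal link
}
```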
Check 6: Mixed Content
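On HTTPS pages, flag any subresources (images, scripts, stylesheets, iframes, media) loaded over HTTP. Plain `<a href="http://...">` links are navigations, not mixed content, so anchors are excluded:
```javascript
// Only subresource-loading elements are selected; every srcset candidate is checked
mcp__playwright__browser_evaluate({
  expression: `JSON.stringify(
    [...document.querySelectorAll('img, script[src], iframe[src], source, video[src], audio[src], link[rel~="stylesheet"][href]')]
      .flatMap(el => [el.src, el.href, ...(el.srcset || '').split(',').map(c => c.trim().split(' ')[0])])
      .filter(url => typeof url === 'string' && url.startsWith('http://'))
  )`
})
```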
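The raw attribute values can also be post-processed outside the browser, for example to unit-test the rules. A hypothetical `mixedContent` helper might look like this. A `srcset` holds several comma-separated candidates such as `a.png 1x, b.png 2x`, so each one is checked, not just the first. Non-HTTPS pages return nothing, because mixed content only applies to HTTPS pages.

```javascript
// Hypothetical helper: return http:// subresource URLs on an HTTPS page.
// `urls` are plain src/href values; `srcsets` are raw srcset strings.
function mixedContent(pageUrl, { urls = [], srcsets = [] }) {
  if (new URL(pageUrl).protocol !== 'https:') return [];
  const fromSrcset = srcsets.flatMap((s) =>
    s.split(',').map((c) => c.trim().split(/\s+/)[0]).filter(Boolean)
  );
  return [...urls, ...fromSrcset].filter((u) => /^http:\/\//i.test(u));
}
```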
Check 7: Structured Data
```javascript
mcp__playwright__browser_evaluate({
  expression: `JSON.stringify(
    [...document.querySelectorAll('script[type="application/ld+json"]')]
      .map(s => { try { JSON.parse(s.textContent); return { valid: true }; } catch(e) { return { valid: false, error: e.message, content: s.textContent.slice(0, 200) }; }})
  )`
})
```
- Flag invalid JSON-LD
Phase 3: Compare Mode (Optional)
Only run if the user provides a compare URL (e.g., "compare against prod").
For each page that was checked:
- Derive the equivalent URL on the reference site (same path, different origin)
- Navigate to both pages
- Extract the main content area DOM structure:
  ```javascript
  mcp__playwright__browser_evaluate({ expression: `document.querySelector('main, article, [role="main"], .content')?.innerHTML || document.body.innerHTML` })
  ```
- Compare:
  - Tag structure: Are the same HTML elements present? Flag missing/extra elements.
  - Broken Markdoc: Look for raw `{% %}` text visible in the page content (indicates unprocessed Markdoc tags)
  - Content presence: Major text blocks should exist in both versions
  - Image count: Same number of images in content area
Report differences as:
- CRITICAL: Raw Markdoc tags visible, entire sections missing, content significantly shorter
- WARNING: Minor structural differences, different class names (expected with redesigns)
- INFO: Cosmetic differences, additional wrapper elements
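Two of the compare steps can be sketched as hypothetical helpers: deriving the reference URL and spotting raw Markdoc tags. Both work on extracted text, for example the `innerText` of the main content element. The "significantly shorter" threshold of 50% is an assumption, not a rule from this checklist.

```javascript
// Hypothetical helper: same path and query on the reference origin
function referenceUrl(pageUrl, compareBase) {
  const page = new URL(pageUrl);
  return new URL(page.pathname + page.search, compareBase).href;
}

// Unprocessed Markdoc tags look like {% tabs %} or {% /tabs %}
function rawMarkdocTags(text) {
  return text.match(/\{%[\s\S]*?%\}/g) ?? [];
}

// Hypothetical helper: CRITICAL findings for raw tags and much shorter content
function compareContent(previewText, referenceText) {
  const findings = [];
  for (const tag of rawMarkdocTags(previewText)) {
    findings.push({ level: 'CRITICAL', message: `Raw Markdoc tag visible: \`${tag}\`` });
  }
  if (previewText.length < referenceText.length * 0.5) { // assumed threshold
    findings.push({ level: 'CRITICAL', message: 'Content significantly shorter than reference' });
  }
  return findings;
}
```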
Phase 4: Report
Generate a markdown report with two outputs:
- Inline in conversation: summary table + failures only
- File at `.ai/<today's date>/site-check-report.md`: full details
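A small sketch for building the report path, using a hypothetical `reportPath` helper. It assumes "today's date" means the local date in `YYYY-MM-DD` form.

```javascript
// Hypothetical helper: .ai/YYYY-MM-DD/site-check-report.md using the local date
function reportPath(date = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  const day = `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
  return `.ai/${day}/site-check-report.md`;
}
```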
Report Format
# Site Check Report
**Index:** <URL>
**Date:** <date>
**Pages checked:** N / M found
**Compare:** <reference URL or "none">
## Summary
| Status | Count |
|--------|-------|
| OK | N |
| Warnings | N |
| Errors | N |
## Errors
### <page URL>
- **[STATUS]** Page returned 404
- **[IMAGE]** Broken image: `<src>`
- **[SEO]** Missing og:title
- ...
## Warnings
### <page URL>
- **[SEO]** Missing meta description
- **[A11Y]** Image missing alt: `<src>`
- **[CONSOLE]** JS error: `<message>`
- ...
## Compare Differences (if applicable)
### <path>
- **[CRITICAL]** Raw Markdoc tag visible: `{% tabs %}`
- **[WARNING]** Section "Getting Started" has 3 fewer paragraphs
- ...
Severity Classification
| Severity | Criteria |
|---|---|
| ERROR | 404/5xx status, broken images in content, `og:image` URL that returns 404 |
| WARNING | Missing SEO tags, missing alt text, console errors, redirect chains |
| INFO | Minor issues, cosmetic compare differences |
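The Summary table can be derived by ranking each page by its worst finding. Below is a minimal sketch using a hypothetical `summarize` helper. It assumes findings shaped like `{ page, severity }`, with severity `ERROR`, `WARNING`, or `INFO`. Counting INFO-only pages as OK is an assumption.

```javascript
// Hypothetical sketch: classify pages by worst finding and render the Summary table
function summarize(pages, findings) {
  const worst = new Map(pages.map((p) => [p, 'OK']));
  for (const { page, severity } of findings) {
    if (severity === 'ERROR') worst.set(page, 'Errors');
    else if (severity === 'WARNING' && worst.get(page) !== 'Errors') worst.set(page, 'Warnings');
    // INFO findings do not change a page's status
  }
  const counts = { OK: 0, Warnings: 0, Errors: 0 };
  for (const status of worst.values()) counts[status]++;
  return [
    '| Status | Count |',
    '|--------|-------|',
    ...Object.entries(counts).map(([status, n]) => `| ${status} | ${n} |`),
  ].join('\n');
}
```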
Important Notes
- Never follow external links: only same-origin URLs
- Shallow crawl only: only visit pages linked from the index; don't recurse
- Respect limits: default 50 pages, user can override
- Parallel execution: use up to 5 subagents for page checks to speed things up
- Don't cache: each check should be a fresh page load
- Report promptly: show the inline summary as soon as checks complete, then write the file
- When running on localhost, the user must have the dev server running. If navigation fails, remind them to start it.
- Netlify preview URLs look like `https://deploy-preview-NNN--SITE.netlify.app/`
- For `og:image` validation, use WebFetch to check that the image URL returns 200
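A small sketch for recognizing the Netlify preview pattern above, using a hypothetical `parseNetlifyPreview` helper. It could, for example, label the report with the PR number being checked. It reads only the hostname. The site name may itself contain hyphens, which the `--` separator after the numeric PR part handles.

```javascript
// Hypothetical helper: extract PR number and site name from
// https://deploy-preview-NNN--SITE.netlify.app/, or return null
function parseNetlifyPreview(url) {
  const m = /^deploy-preview-(\d+)--([a-z0-9-]+)\.netlify\.app$/i.exec(new URL(url).hostname);
  return m ? { pr: Number(m[1]), site: m[2] } : null;
}
```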