Verify FireCrawl Crawled Content

Name: verify-crawl
Rating: 78
Author: ihainan

This skill verifies that Markdown files scraped by FireCrawl accurately reflect the content of the original web pages.

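When to Use

Use this skill when the user asks to verify a crawled Markdown file against its original web page. The workflow is:
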
•Read the Markdown file specified by the user
•Extract metadata from the YAML frontmatter:
- source_url: The original URL that was scraped
- scraped_at: When the content was scraped
•Fetch the original web page using the WebFetch tool with the source_url
•Compare content between the Markdown file and the freshly fetched content
•Generate verification report

The Markdown files should have this structure:

---
source_url: https://example.com/page
scraped_at: 2026-01-09
---

# Page Title

Content...
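
The metadata extraction for a file in this format could be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `parse_frontmatter` is a hypothetical helper name, and it handles only flat `key: value` pairs rather than full YAML.

```python
def parse_frontmatter(text):
    """Split a Markdown document into (metadata dict, body string).

    Simplified sketch: supports only flat `key: value` lines between
    two `---` delimiters, not nested or multi-line YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at the top of the file
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter: everything after it is the page body.
            return meta, "\n".join(lines[i + 1:])
        if ":" in line:
            # Split on the first colon only, so URLs keep their "://".
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return {}, text  # opening delimiter without a closing one


sample = """---
source_url: https://example.com/page
scraped_at: 2026-01-09
---

# Page Title

Content...
"""

meta, body = parse_frontmatter(sample)
print(meta["source_url"])
print(meta["scraped_at"])
```

For frontmatter that uses lists, quoting, or nesting, a real YAML parser would be the safer choice.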

When comparing, check for:

•Title Match: Does the main heading match?
•Key Sections Present: Are all major sections from the original present in the scraped file?
•Important Data Accuracy: Are dates, names, numbers accurately captured?
•Link Integrity: Are important links preserved?
•Content Completeness: Is there significant missing content?
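
The mechanical parts of these checks (title, sections, links) can be approximated in code. The sketch below assumes the freshly fetched page is already available as Markdown-like text; the function names are hypothetical. Data accuracy and overall completeness still need a careful read, so this covers only part of the list above.

```python
import re


def first_heading(md):
    """Return the text of the first level-1 heading, or None."""
    match = re.search(r"^#[ \t]+(.+)$", md, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def section_headings(md):
    """Return the set of level-2 to level-6 heading texts, lowercased."""
    found = re.findall(r"^#{2,6}[ \t]+(.+)$", md, flags=re.MULTILINE)
    return {heading.strip().lower() for heading in found}


def link_targets(md):
    """Return the set of URLs used in inline Markdown links."""
    return set(re.findall(r"\[[^\]]*\]\(([^)\s]+)", md))


def compare(scraped, original):
    """Report title match plus sections and links missing from the scrape."""
    return {
        "title_match": first_heading(scraped) == first_heading(original),
        "missing_sections": sorted(
            section_headings(original) - section_headings(scraped)
        ),
        "missing_links": sorted(link_targets(original) - link_targets(scraped)),
    }


scraped = "# Page Title\n\n## Dates\n\nSee [call](https://example.com/call).\n"
original = (
    "# Page Title\n\n## Dates\n\n## Venue\n\n"
    "See [call](https://example.com/call) and [map](https://example.com/map).\n"
)
result = compare(scraped, original)
print(result)
```

Comparing sets rather than raw text keeps the check insensitive to ordering and formatting differences.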

Return the verification result in this format:

File: <file_path>
URL: <source_url>
Status: PASS | FAIL
Comment: <brief explanation of findings>

•PASS: The Markdown file accurately represents the original web page content with no significant discrepancies
•FAIL: There are notable differences, missing content, or errors between the Markdown and the original page
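
As an illustration of the report format and the PASS/FAIL rule, a small helper might look like the one below. `format_report` and its `problems` parameter are assumptions made for this sketch: any recorded discrepancy yields FAIL, otherwise PASS.

```python
def format_report(file_path, source_url, problems):
    """Build the four-line verification report.

    `problems` is a list of short discrepancy descriptions; an empty
    list means no significant differences were found.
    """
    status = "FAIL" if problems else "PASS"
    if problems:
        comment = "; ".join(problems)
    else:
        comment = "No significant discrepancies found"
    return "\n".join(
        [
            f"File: {file_path}",
            f"URL: {source_url}",
            f"Status: {status}",
            f"Comment: {comment}",
        ]
    )


ok_report = format_report("data/page.md", "https://example.com/page", [])
bad_report = format_report(
    "data/page.md",
    "https://example.com/page",
    ["section 'Venue' missing", "deadline date differs"],
)
print(ok_report)
print(bad_report)
```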

User: "Verify the crawled file at data/aaai-26/bridge-program/bridge-program.md"

Notes:

•If the original page has been updated since scraping, note this in the comment
•Focus on content accuracy, not formatting differences (minor whitespace/formatting differences are acceptable)
•If the page requires JavaScript rendering, mention this as a potential cause for discrepancies
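
One way to keep formatting differences out of the comparison is to normalize both texts first. The sketch below is an assumption about how that could be done, not part of the skill itself: it drops asterisk emphasis markers, collapses whitespace, and ignores case, so only differences in the words themselves remain.

```python
import re


def normalize(text):
    """Reduce text to a form where only content differences remain."""
    text = text.replace("*", "")      # drop bold/italic asterisks
    text = re.sub(r"\s+", " ", text)  # collapse runs of spaces and newlines
    return text.strip().lower()


def same_content(a, b):
    """True when two snippets differ only in whitespace, emphasis, or case."""
    return normalize(a) == normalize(b)


print(same_content("Deadline:  **Jan 9**\n\n2026", "Deadline: Jan 9 2026"))
print(same_content("Deadline: Jan 9", "Deadline: Jan 10"))
```

Case folding is a deliberate simplification here; drop the `.lower()` call if capitalization matters for names or acronyms.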