Web Scraping with Firecrawl
This skill enables intelligent web scraping and data extraction using Firecrawl.
When to Use
Use Firecrawl when you need to:
- •Extract content from web pages and convert to markdown
- •Crawl multiple pages from a website
- •Search the web and retrieve scraped results
- •Discover all URLs on a website (sitemap generation)
- •Get summaries, screenshots, or links from web pages
Available Tools
firecrawl_scrape
Scrape a single URL and return clean content.
- •Best for: Single page extraction, documentation pages, articles
- •Formats: markdown, html, rawHtml, screenshot, links, summary
firecrawl_batch_scrape
Scrape multiple URLs in a single request.
- •Best for: Known list of specific pages
- •Returns: Array of content from each URL
firecrawl_crawl
Crawl an entire website starting from a URL.
- •Best for: Comprehensive site extraction, documentation sites
- •Returns: Content from multiple discovered pages
- •Note: Async operation - use
firecrawl_check_crawl_statusto poll until complete
firecrawl_check_crawl_status
Check the status of an ongoing crawl operation.
- •Use after starting a crawl to monitor progress
- •Poll until status is "completed" or "failed"
firecrawl_map
Discover all URLs on a website.
- •Best for: Site structure analysis, pre-crawl planning
- •Returns: Array of discovered URLs
firecrawl_search
Search the web and return scraped results.
- •Best for: Research, finding current information
- •Returns: Search results with scraped content
firecrawl_agent
Autonomous AI agent for complex web data gathering.
- •Best for: Multi-step research, finding data across multiple sources, complex queries
- •Just describe what data you need - the agent searches, navigates, and extracts automatically
- •No URLs required - agent finds the information autonomously
- •Returns: Structured data matching your request
firecrawl_agent_status
Check the status of an agent job.
- •Use after starting an agent to monitor progress
- •Poll until status is "completed" or "failed"
Available Formats
When scraping, you can request different output formats:
- •
markdown- Clean markdown content (default) - •
html- Raw HTML content - •
rawHtml- Unprocessed HTML - •
screenshot- Screenshot of the page - •
links- Extract all links from the page - •
summary- AI-generated summary of the content
Best Practices
- •
Start with map: For large sites, use
firecrawl_mapfirst to understand the site structure - •
Use appropriate tool:
- •Known single URL →
scrape - •Known multiple URLs →
batch_scrape - •Unknown pages on a site →
crawl - •Need to find pages →
search - •Complex multi-source research →
agent
- •Known single URL →
- •
Handle crawls automatically: When crawling, always poll
firecrawl_check_crawl_statusuntil complete - don't ask users to check manually - •
Choose the right format: Use
summaryfor quick overviews,markdownfor full content,linksfor navigation analysis - •
Handle rate limits: Be mindful of API credits and rate limits on large operations
- •
JavaScript-rendered content: Firecrawl automatically handles dynamic content - no special configuration needed
Example Workflows
Research Task
- •Use
searchto find relevant pages - •Use
scrapeon promising results for full content
Documentation Extraction
- •Use
mapto discover all documentation pages - •Use
crawlorbatch_scrapeto extract content - •Poll
check_crawl_statusuntil complete
Quick Page Summary
- •Use
scrapewithsummaryformat for a quick overview - •Follow up with full
markdownscrape if more detail needed