Firecrawl Automation
Run Firecrawl web crawling and extraction directly from Claude Code. Scrape individual pages, crawl entire sites, extract structured data with AI, batch process URL lists, and map website structures without leaving your terminal.
Toolkit docs: composio.dev/toolkits/firecrawl
Setup
- •Add the Composio MCP server to your configuration:
code
https://rube.app/mcp
- •Connect your Firecrawl account when prompted. The agent will provide an authentication link.
- •Be mindful of credit consumption -- scope your crawls tightly and test on small URL sets before scaling.
Core Workflows
1. Scrape a Single Page
Fetch content from a URL in multiple formats with optional browser actions for dynamic pages.
Tool: FIRECRAWL_SCRAPE
Key parameters:
- •
url(required) -- fully qualified URL to scrape - •
formats-- output formats:markdown(default),html,rawHtml,links,screenshot,json - •
onlyMainContent(default true) -- extract main content only, excluding nav/footer/ads - •
waitFor-- milliseconds to wait for JS rendering (default 0) - •
timeout-- max wait in ms (default 30000) - •
actions-- browser actions before scraping (click, write, wait, press, scroll) - •
includeTags/excludeTags-- filter by HTML tags - •
jsonOptions-- for structured extraction withschemaand/orprompt
Example prompt: "Scrape the main content from https://example.com/pricing as markdown"
2. Crawl an Entire Site
Discover and scrape multiple pages from a website with configurable depth, path filters, and concurrency.
Tool: FIRECRAWL_CRAWL_V2
Key parameters:
- •
url(required) -- starting URL for the crawl - •
limit(default 10) -- max pages to crawl - •
maxDiscoveryDepth-- depth limit from the root page - •
includePaths/excludePaths-- regex patterns for URL paths - •
allowSubdomains-- include subdomains (default false) - •
crawlEntireDomain-- follow sibling/parent links, not just children (default false) - •
sitemap--include(default),skip, oronly - •
prompt-- natural language to auto-configure crawler settings - •
scrapeOptions_formats-- output format for each page - •
scrapeOptions_onlyMainContent-- main content extraction per page
Example prompt: "Crawl the docs section of firecrawl.dev, max 50 pages, only paths matching docs"
3. Extract Structured Data
Extract structured JSON data from web pages using AI with a natural language prompt or JSON schema.
Tool: FIRECRAWL_EXTRACT
Key parameters:
- •
urls(required) -- array of URLs to extract from (max 10 in beta). Supports wildcards likehttps://example.com/blog/* - •
prompt-- natural language description of what to extract - •
schema-- JSON Schema defining the desired output structure - •
enable_web_search-- allow crawling links outside initial domains (default false)
At least one of prompt or schema must be provided.
Check extraction status with FIRECRAWL_EXTRACT_GET using the returned job id.
Example prompt: "Extract company name, pricing tiers, and feature lists from https://example.com/pricing"
4. Batch Scrape Multiple URLs
Scrape many URLs concurrently with shared configuration for efficient bulk data collection.
Tool: FIRECRAWL_BATCH_SCRAPE
Key parameters:
- •
urls(required) -- array of URLs to scrape - •
formats-- output format for all pages (defaultmarkdown) - •
onlyMainContent(default true) -- main content extraction - •
maxConcurrency-- parallel scrape limit - •
ignoreInvalidURLs(default true) -- skip bad URLs instead of failing the batch - •
location-- geolocation settings withcountrycode - •
actions-- browser actions applied to each page - •
blockAds(default true) -- block advertisements
Example prompt: "Batch scrape these 20 product page URLs as markdown with ad blocking"
5. Map Website Structure
Discover all URLs on a website from a starting URL, useful for planning crawls or auditing site structure.
Tool: FIRECRAWL_MAP_MULTIPLE_URLS_BASED_ON_OPTIONS
Key parameters:
- •
url(required) -- starting URL (must behttps://orhttp://) - •
search-- guide URL discovery toward specific page types - •
limit(default 5000, max 100000) -- max URLs to return - •
includeSubdomains(default true) -- include subdomains - •
ignoreQueryParameters(default true) -- dedupe URLs differing only by query params - •
sitemap--include,skip, oronly
Example prompt: "Map all URLs on docs.example.com, focusing on API reference pages"
6. Monitor and Manage Crawl Jobs
Track crawl progress, retrieve results, and cancel runaway jobs.
Tools: FIRECRAWL_CRAWL_GET, FIRECRAWL_GET_THE_STATUS_OF_A_CRAWL_JOB, FIRECRAWL_CANCEL_A_CRAWL_JOB
- •
FIRECRAWL_CRAWL_GET-- get status, progress, credits used, and crawled page data - •
FIRECRAWL_CANCEL_A_CRAWL_JOB-- stop an active or queued crawl
Both require the cr