AgentSkillsCN

enact/firecrawl

使用Firecrawl API抓取、爬取、搜索和提取网站上的结构化数据——将网页转换为LLM可用的markdown

SKILL.md
--- frontmatter
name: enact/firecrawl
version: 1.2.1
description: Scrape, crawl, search, and extract structured data from websites using Firecrawl API - converts web pages to LLM-ready markdown
enact: "2.0"

from: python:3.12-slim

build:
  - pip install requests

env:
  FIRECRAWL_API_KEY:
    description: Your Firecrawl API key from firecrawl.dev
    secret: true

command: python /workspace/firecrawl.py ${action} ${url} ${formats} ${limit} ${only_main_content} ${prompt} ${schema}

timeout: 300s

license: MIT

tags:
  - web-scraping
  - crawling
  - markdown
  - llm
  - ai
  - data-extraction
  - search
  - structured-data

annotations:
  readOnlyHint: true
  openWorldHint: true

inputSchema:
  type: object
  properties:
    action:
      type: string
      description: |
        The action to perform:
        - scrape: Extract content from a single URL
        - crawl: Discover and scrape all subpages of a website
        - map: Get all URLs from a website (fast discovery)
        - search: Search the web and get scraped results
        - extract: Extract structured data using AI
      enum:
        - scrape
        - crawl
        - map
        - search
        - extract
      default: scrape
    url:
      type: string
      description: The URL to process (for scrape, crawl, map, extract) or search query (for search action)
    formats:
      type: string
      description: Comma-separated output formats (markdown, html, links, screenshot). Used by scrape and crawl actions.
      default: markdown
    limit:
      type: integer
      description: Maximum number of pages to crawl (crawl action) or search results to return (search action)
      default: 10
    only_main_content:
      type: boolean
      description: Extract only the main content, excluding headers, navs, footers (scrape action)
      default: true
    prompt:
      type: string
      description: |
        Multi-purpose field:
        - For map: Search query to filter URLs
        - For extract: Natural language instruction for what to extract
      default: ""
    schema:
      type: string
      description: JSON schema string for structured extraction (extract action only). Define the shape of data you want to extract.
      default: ""
  required:
    - url

outputSchema:
  type: object
  properties:
    success:
      type: boolean
      description: Whether the operation succeeded
    action:
      type: string
      description: The action that was performed
    url:
      type: string
      description: The URL or query that was processed
    data:
      type: object
      description: The scraped/crawled/extracted data including markdown, metadata, and structured content
    error:
      type: string
      description: Error message if the operation failed

examples:
  - input:
      url: "https://example.com"
      action: "scrape"
    description: Scrape a single page and get markdown
  - input:
      url: "https://docs.example.com"
      action: "crawl"
      limit: 5
    description: Crawl a documentation site (up to 5 pages)
  - input:
      url: "https://example.com"
      action: "map"
    description: Get all URLs from a website
  - input:
      url: "latest AI news"
      action: "search"
      limit: 5
    description: Search the web and get scraped results
  - input:
      url: "https://news.ycombinator.com"
      action: "extract"
      prompt: "Extract the top 5 news headlines with their URLs and point counts"
    description: Extract structured data from a page using AI

Firecrawl Web Scraping Tool

A powerful web scraping tool that uses the Firecrawl API to convert websites into clean, LLM-ready markdown and extract structured data.

Features

  • Scrape: Extract content from a single URL as markdown, HTML, or with screenshots
  • Crawl: Automatically discover and scrape all accessible subpages of a website
  • Map: Get a list of all URLs from a website without scraping content (extremely fast)
  • Search: Search the web and get full scraped content from results
  • Extract: Use AI to extract structured data from pages with natural language prompts

Setup

  1. Get an API key from firecrawl.dev
  2. Set your API key as a secret:
    bash
    enact env set FIRECRAWL_API_KEY <your-api-key> --secret --namespace enact
    

This stores your API key securely in your OS keyring (macOS Keychain, Windows Credential Manager, or Linux Secret Service).

Usage Examples

CLI

Scrape a single page

bash
enact run enact/firecrawl -a '{"url": "https://example.com", "action": "scrape"}'

Crawl an entire documentation site

bash
enact run enact/firecrawl -a '{"url": "https://docs.example.com", "action": "crawl", "limit": 20}'

Map all URLs on a website

bash
enact run enact/firecrawl -a '{"url": "https://example.com", "action": "map"}'

Search the web

bash
enact run enact/firecrawl -a '{"url": "latest AI developments 2024", "action": "search", "limit": 5}'

Extract structured data with AI

bash
enact run enact/firecrawl -a '{"url": "https://news.ycombinator.com", "action": "extract", "prompt": "Extract the top 10 news headlines with their URLs"}'

Extract with a JSON schema

bash
enact run enact/firecrawl -a '{
  "url": "https://example.com/pricing",
  "action": "extract",
  "prompt": "Extract pricing information",
  "schema": "{\"type\":\"object\",\"properties\":{\"plans\":{\"type\":\"array\",\"items\":{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"},\"price\":{\"type\":\"string\"}}}}}}"
}'

MCP (for LLMs/Agents)

When using this tool via MCP, call enact__firecrawl with these parameters:

Scrape a single page

Call with url set to the target URL and action set to "scrape".

Crawl a documentation site

Call with url, action set to "crawl", and limit to control the maximum number of pages.

Map all URLs on a website

Call with url and action set to "map" to discover all URLs without scraping content.

Search the web

Call with url set to your search query (e.g., "latest AI news") and action set to "search". Use limit to control result count.

Extract structured data with AI

Call with url, action set to "extract", and prompt describing what data to extract. Optionally provide a schema for structured output.

Output

The tool returns JSON with:

  • markdown: Clean, LLM-ready content
  • metadata: Title, description, language, source URL
  • extract: Structured data (for extract action)
  • links: Discovered URLs (for map action)

API Features

Firecrawl handles the hard parts of web scraping:

  • Anti-bot mechanisms
  • Dynamic JavaScript content
  • Proxies and rate limiting
  • PDF and document parsing
  • Screenshot capture