AgentSkillsCN

enact/firecrawl

使用Firecrawl API抓取、爬取、搜索和提取网站的结构化数据——将网页转换为LLM就绪的markdown

SKILL.md
--- frontmatter
name: enact/firecrawl
version: 1.2.0
description: Scrape, crawl, search, and extract structured data from websites using Firecrawl API - converts web pages to LLM-ready markdown
enact: "2.0"

from: python:3.12-slim

build:
  - pip install requests

env:
  FIRECRAWL_API_KEY:
    description: Your Firecrawl API key from firecrawl.dev
    secret: true

command: python /workspace/firecrawl.py ${action} ${url} ${formats} ${limit} ${only_main_content} ${prompt} ${schema}

timeout: 300s

license: MIT

tags:
  - web-scraping
  - crawling
  - markdown
  - llm
  - ai
  - data-extraction
  - search
  - structured-data

annotations:
  readOnlyHint: true
  openWorldHint: true

inputSchema:
  type: object
  properties:
    action:
      type: string
      description: |
        The action to perform:
        - scrape: Extract content from a single URL
        - crawl: Discover and scrape all subpages of a website
        - map: Get all URLs from a website (fast discovery)
        - search: Search the web and get scraped results
        - extract: Extract structured data using AI
      enum:
        - scrape
        - crawl
        - map
        - search
        - extract
      default: scrape
    url:
      type: string
      description: The URL to process (for scrape, crawl, map, extract) or search query (for search action)
    formats:
      type: string
      description: Comma-separated output formats (markdown, html, links, screenshot). Used by scrape and crawl actions.
      default: markdown
    limit:
      type: integer
      description: Maximum number of pages to crawl (crawl action) or search results to return (search action)
      default: 10
    only_main_content:
      type: boolean
      description: Extract only the main content, excluding headers, navs, footers (scrape action)
      default: true
    prompt:
      type: string
      description: |
        Multi-purpose field:
        - For map: Search query to filter URLs
        - For extract: Natural language instruction for what to extract
      default: ""
    schema:
      type: string
      description: JSON schema string for structured extraction (extract action only). Define the shape of data you want to extract.
      default: ""
  required:
    - url

outputSchema:
  type: object
  properties:
    success:
      type: boolean
      description: Whether the operation succeeded
    action:
      type: string
      description: The action that was performed
    url:
      type: string
      description: The URL or query that was processed
    data:
      type: object
      description: The scraped/crawled/extracted data including markdown, metadata, and structured content
    error:
      type: string
      description: Error message if the operation failed

examples:
  - input:
      url: "https://example.com"
      action: "scrape"
    description: Scrape a single page and get markdown
  - input:
      url: "https://docs.example.com"
      action: "crawl"
      limit: 5
    description: Crawl a documentation site (up to 5 pages)
  - input:
      url: "https://example.com"
      action: "map"
    description: Get all URLs from a website
  - input:
      url: "latest AI news"
      action: "search"
      limit: 5
    description: Search the web and get scraped results
  - input:
      url: "https://news.ycombinator.com"
      action: "extract"
      prompt: "Extract the top 5 news headlines with their URLs and point counts"
    description: Extract structured data from a page using AI

Firecrawl Web Scraping Tool

A powerful web scraping tool that uses the Firecrawl API to convert websites into clean, LLM-ready markdown and extract structured data.

Features

  • Scrape: Extract content from a single URL as markdown, HTML, or with screenshots
  • Crawl: Automatically discover and scrape all accessible subpages of a website
  • Map: Get a list of all URLs from a website without scraping content (extremely fast)
  • Search: Search the web and get full scraped content from results
  • Extract: Use AI to extract structured data from pages with natural language prompts

Setup

  1. Get an API key from firecrawl.dev
  2. Set your API key as a secret:
    bash
    enact env set FIRECRAWL_API_KEY <your-api-key> --secret --namespace enact
    

This stores your API key securely in your OS keyring (macOS Keychain, Windows Credential Manager, or Linux Secret Service).

Usage Examples

Scrape a single page

bash
enact run enact/firecrawl --url "https://example.com" --action scrape

Crawl an entire documentation site

bash
enact run enact/firecrawl --url "https://docs.example.com" --action crawl --limit 20

Map all URLs on a website

bash
enact run enact/firecrawl --url "https://example.com" --action map

Search the web

bash
enact run enact/firecrawl --url "latest AI developments 2024" --action search --limit 5

Extract structured data with AI

bash
enact run enact/firecrawl --url "https://news.ycombinator.com" --action extract --prompt "Extract the top 10 news headlines with their URLs"

Extract with a JSON schema

bash
enact run enact/firecrawl \
  --url "https://example.com/pricing" \
  --action extract \
  --prompt "Extract pricing information" \
  --schema '{"type":"object","properties":{"plans":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}}}}}}'

Output

The tool returns JSON with:

  • markdown: Clean, LLM-ready content
  • metadata: Title, description, language, source URL
  • extract: Structured data (for extract action)
  • links: Discovered URLs (for map action)

API Features

Firecrawl handles the hard parts of web scraping:

  • Anti-bot mechanisms
  • Dynamic JavaScript content
  • Proxies and rate limiting
  • PDF and document parsing
  • Screenshot capture