Crawl4AI Web Crawler
Converts web pages to clean, LLM-friendly markdown using crawl4ai.
Requirements
- uv must be installed (https://docs.astral.sh/uv/)
- The skill manages its own isolated Python environment via uv sync
Setup
Before first use, run the setup script:
bash
skills/crawl4ai/scripts/setup.sh
This will:
- Check for uv installation
- Run uv sync to create .venv/ and install dependencies
- Run crawl4ai-setup to install playwright browsers
- Run crawl4ai-doctor to verify installation
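The contents of setup.sh are not reproduced here, so the following is only a rough Python sketch of the same four steps. It assumes the two crawl4ai commands are invoked through uv run (as in the Troubleshooting section); the names SETUP_COMMANDS, uv_available, and run_setup are hypothetical and not part of the skill.

```python
import shutil
import subprocess

# Commands mirroring the setup steps listed above, in order.
# Assumption: the crawl4ai tools are run inside the uv-managed environment.
SETUP_COMMANDS = [
    ["uv", "sync"],                    # create .venv/ and install dependencies
    ["uv", "run", "crawl4ai-setup"],   # install playwright browsers
    ["uv", "run", "crawl4ai-doctor"],  # verify installation
]


def uv_available() -> bool:
    """Return True if the uv executable is on PATH."""
    return shutil.which("uv") is not None


def run_setup() -> None:
    """Run each setup command in order, stopping at the first failure."""
    if not uv_available():
        raise RuntimeError("uv not found; see https://docs.astral.sh/uv/")
    for cmd in SETUP_COMMANDS:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling run_setup() from the skill directory would then approximate what the shell script is described as doing.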
Usage
To crawl a URL and get markdown (from the skill directory):
bash
cd skills/crawl4ai
uv run python scripts/crawl.py "https://example.com"
Options:
- --output FILE: Save markdown to a file instead of stdout
- --include-links: Include hyperlinks in markdown output
- --include-images: Include image references
- --no-headless: Show browser window (default: headless)
- --timeout SECONDS: Page load timeout (default: 30)
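To make the defaults above concrete, here is a minimal argparse sketch of how a script could accept these flags. The source of scripts/crawl.py is not shown in this document, so build_parser is a hypothetical name, and treating the timeout as an integer is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags documented above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Crawl a URL and emit markdown.")
    parser.add_argument("url", help="Page to crawl")
    parser.add_argument("--output", metavar="FILE", default=None,
                        help="Save markdown to FILE instead of stdout")
    parser.add_argument("--include-links", action="store_true",
                        help="Include hyperlinks in markdown output")
    parser.add_argument("--include-images", action="store_true",
                        help="Include image references")
    # store_false makes `headless` default to True and flip when the flag is given.
    parser.add_argument("--no-headless", dest="headless", action="store_false",
                        help="Show browser window")
    parser.add_argument("--timeout", type=int, default=30, metavar="SECONDS",
                        help="Page load timeout in seconds")
    return parser
```

With this sketch, build_parser().parse_args(["https://example.com"]) leaves headless enabled and the timeout at 30 seconds, matching the documented defaults.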
Examples
Basic crawl
bash
cd skills/crawl4ai
uv run python scripts/crawl.py "https://docs.python.org/3/"
Save to file
bash
uv run python scripts/crawl.py "https://example.com" --output content.md
With all options
bash
uv run python scripts/crawl.py "https://example.com" \
  --include-links \
  --include-images \
  --timeout 60 \
  --output result.md
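For readers who want to call crawl4ai directly rather than through the script, a crawl along these lines is one plausible shape. This is a sketch, not the contents of scripts/crawl.py: it assumes a crawl4ai release that provides BrowserConfig and CrawlerRunConfig, and crawl_to_markdown and main are hypothetical names.

```python
import asyncio


async def crawl_to_markdown(url: str, headless: bool = True, timeout: int = 30) -> str:
    """Fetch `url` with crawl4ai and return the page content as markdown."""
    # Imported lazily so this module still loads where crawl4ai is not installed.
    from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

    browser_config = BrowserConfig(headless=headless)
    # crawl4ai expects page_timeout in milliseconds, so convert from seconds.
    run_config = CrawlerRunConfig(page_timeout=timeout * 1000)

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url=url, config=run_config)

    if not result.success:
        raise RuntimeError(f"Crawl failed: {result.error_message}")
    return str(result.markdown)


def main(url: str) -> None:
    """Synchronous entry point: crawl `url` and print the markdown."""
    print(asyncio.run(crawl_to_markdown(url)))
```

Run inside the skill's environment (for example via uv run python), main("https://example.com") would print the page's markdown to stdout.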
Troubleshooting
If crawling fails:
- Diagnose the installation: uv run crawl4ai-doctor
- Ensure playwright browsers are installed: uv run python -m playwright install chromium
- Check that the URL is accessible and not blocked by robots.txt
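The robots.txt check in the last item can be done by hand with Python's standard library. The sketch below is independent of crawl4ai; allowed_by_robots and fetch_and_check are hypothetical helper names, and the "*" user agent is an assumption about which rules apply to the crawler.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def allowed_by_robots(url: str, robots_lines: list[str], user_agent: str = "*") -> bool:
    """Check `url` against robots.txt rules that are already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def fetch_and_check(url: str, user_agent: str = "*") -> bool:
    """Download the site's robots.txt and check whether `url` may be fetched."""
    parser = RobotFileParser(urljoin(url, "/robots.txt"))
    parser.read()  # performs a network request
    return parser.can_fetch(user_agent, url)
```

If fetch_and_check returns False for the target URL, the failure may stem from the site's crawl policy rather than from the local installation.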