AgentSkillsCN

scraper

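Extracts readable content from websites. Triggered when the user asks to "Read this website", "Summarize this URL", or "Scrape this page". Gives cleaner results than fetching the page with a plain curl command.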

SKILL.md
---
name: "scraper"
description: "Extracts readable content from websites. Invoke when the user asks 'Read this website', 'Summarize this URL', or 'Scrape this page'. Gives cleaner output than fetching the raw page with plain curl."
---

Web Scraper (Advanced)

Converts a webpage to Markdown for easier reading by the LLM.

Requirements

  • Tool: r.jina.ai (Jina AI's free Reader API). No installation needed; the commands below call it with curl.

Cross-Platform Method (Python)

Works on Windows, Linux, and Mac.

  1. Run the script (on Linux and macOS, use python3 if python is not on the PATH):

    ```bash
    python workspace/skills/scraper/scripts/scrape.py "https://example.com"
    ```
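
The contents of scrape.py are not shown in this skill. As a rough, hypothetical sketch, assuming the script simply wraps the same r.jina.ai endpoint that the curl commands below use, it could look like this (standard library only; the function names and the User-Agent value are illustrative and not taken from the real script):

```python
#!/usr/bin/env python3
"""Hypothetical sketch of scripts/scrape.py: print a web page as Markdown via r.jina.ai."""
import sys
import urllib.request

# Same endpoint the curl examples use: the target URL is appended after this prefix.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target: str) -> str:
    """Return the Reader URL for `target`, e.g. https://r.jina.ai/https://example.com."""
    return READER_PREFIX + target


def fetch_markdown(target: str, timeout: float = 30.0) -> str:
    """Download the Reader's Markdown rendering of `target` and return it as a string."""
    request = urllib.request.Request(
        reader_url(target),
        # Assumption: an explicit User-Agent, since some services reject urllib's default one.
        headers={"User-Agent": "scraper-skill-sketch"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header if present, otherwise assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: scrape.py URL", file=sys.stderr)
    else:
        print(fetch_markdown(sys.argv[1]))
```

Using only urllib keeps the sketch consistent with the "no installation needed" requirement: nothing beyond a Python interpreter is assumed.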

Commands (Bash/Linux/Mac)

Read Page (Markdown)

Fetches the URL and converts it to clean Markdown.

```bash
curl -s "https://r.jina.ai/https://example.com"
```

Read Page (Text Only)

Fetches the URL and returns plain text.

```bash
curl -s -H "Accept: text/plain" "https://r.jina.ai/https://example.com"
```
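The two curl commands differ only in the Accept header. A minimal Python sketch of the same choice follows, assuming (as the first command implies) that the Reader returns Markdown when no Accept header is sent; the mode names and the build_request/read_page helpers are hypothetical:

```python
import urllib.request

READER_PREFIX = "https://r.jina.ai/"

# Extra request headers per output mode.
MODE_HEADERS = {
    "markdown": {},                    # no extra header, like the first curl command
    "text": {"Accept": "text/plain"},  # same header as the second curl command
}


def build_request(target: str, mode: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) a Reader request for `target` in the given mode."""
    if mode not in MODE_HEADERS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_HEADERS)}")
    return urllib.request.Request(READER_PREFIX + target, headers=dict(MODE_HEADERS[mode]))


def read_page(target: str, mode: str = "markdown", timeout: float = 30.0) -> str:
    """Send the request and decode the body, falling back to UTF-8 if no charset is given."""
    with urllib.request.urlopen(build_request(target, mode), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, read_page("https://example.com", mode="text") should return the same body as the second curl command. Separating build_request from the network call makes the header choice easy to check without fetching anything.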

Usage

Ghost uses this to "read" documentation, news articles, and blog posts whose raw HTML and JavaScript would otherwise be too cluttered for direct analysis.