AgentSkillsCN

web-scraping

Extract structured data from web pages using CSS selectors and XPath

SKILL.md
---
name: web-scraping
description: Extract structured data from web pages using CSS selectors and XPath
---

Web Scraping

Extract structured data from web pages.

Capabilities

  • Fetch HTML content from URLs
  • Parse and extract specific elements (tables, lists, text)
  • Handle pagination
  • Output in JSON or CSV format
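The fetching, pagination, and output steps above can be sketched with only the Python standard library. This is a minimal illustration, not the skill's own implementation. It makes three assumptions:

- Listed items are `<li>` elements.
- The next page is linked with `<a rel="next">`.
- The names `ListPageParser`, `scrape_all_pages`, `to_json`, and `to_csv` are hypothetical.

The `fetch` callable is passed in so the logic can be exercised offline. In practice it could wrap `urllib.request.urlopen`.

```python
from __future__ import annotations

import csv
import io
import json
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin


class ListPageParser(HTMLParser):
    """Collects the text of <li> elements and the href of a rel="next" link.
    The markup is an assumption; real sites need their own selectors."""

    def __init__(self) -> None:
        super().__init__()
        self.items: list[str] = []
        self.next_href: Optional[str] = None
        self._in_li = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "li":
            self._in_li = True
            self._buf = []
        elif tag == "a" and "next" in (a.get("rel") or "").split():
            self.next_href = a.get("href")

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            self.items.append("".join(self._buf).strip())
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self._buf.append(data)


def scrape_all_pages(start_url: str, fetch: Callable[[str], str], max_pages: int = 10) -> list[dict]:
    """Follows rel="next" links until there are none, a page repeats, or max_pages is hit."""
    rows: list[dict] = []
    seen: set[str] = set()
    url: Optional[str] = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ListPageParser()
        parser.feed(fetch(url))
        parser.close()
        rows.extend({"text": text, "page": url} for text in parser.items)
        # Resolve relative links such as "?page=2" against the current page URL.
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return rows


def to_json(rows: list[dict]) -> str:
    return json.dumps(rows, indent=2)


def to_csv(rows: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["text", "page"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The `seen` set and the `max_pages` cap guard against pagination loops, where a "next" link points back to a page that was already visited.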

Supported Selectors

  • CSS selectors: .class, #id, tag
  • XPath expressions
  • Text patterns (regex)
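As a rough standard-library sketch, each selector style in the list above can be approximated as follows. The function names are hypothetical, and each approach has limits:

- `css_select` handles only a single simple selector (`tag`, `.class`, or `#id`) and expects properly closed tags.
- `xpath_select` relies on `xml.etree.ElementTree`, which implements only a subset of XPath and needs well-formed XML/XHTML.
- `regex_select` is a thin wrapper around `re.findall`.

A production scraper would more likely use a dedicated HTML parsing library.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SimpleSelectorParser(HTMLParser):
    """Collects the text of elements matching one simple selector: 'tag', '.class' or '#id'.
    No combinators, attribute selectors or pseudo-classes; when matches are nested,
    only the outermost one is collected."""

    def __init__(self, selector: str) -> None:
        super().__init__()
        self.selector = selector
        self.results = []
        self._depth = 0  # nesting depth inside the current match; 0 means outside
        self._buf = []

    def _matches(self, tag, attrs):
        if self.selector.startswith("."):
            return self.selector[1:] in (attrs.get("class") or "").split()
        if self.selector.startswith("#"):
            return attrs.get("id") == self.selector[1:]
        return tag == self.selector

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self._matches(tag, dict(attrs)):
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def css_select(html: str, selector: str) -> list:
    parser = SimpleSelectorParser(selector)
    parser.feed(html)
    parser.close()
    return parser.results


def xpath_select(xhtml: str, path: str) -> list:
    # ElementTree supports paths like ".//span[@class='price']", not full XPath 1.0.
    root = ET.fromstring(xhtml)
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


def regex_select(text: str, pattern: str) -> list:
    # With one capturing group, re.findall returns just the captured strings.
    return re.findall(pattern, text)
```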

Rate Limiting

Always respect robots.txt and wait between consecutive requests. Default delay: 1 second.
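A minimal Python sketch of this rule, assuming the robots.txt body has already been downloaded. With the standard library, `RobotFileParser.set_url()` followed by `read()` would do that step. The sketch uses a site's `Crawl-delay` directive when one is present and otherwise falls back to the 1-second default. The `PoliteFetcher` class, its user-agent string, and the injectable clock and sleep functions are illustrative assumptions, not part of the skill.

```python
import time
import urllib.request
import urllib.robotparser
from typing import Callable, Iterable, Optional

DEFAULT_DELAY_SECONDS = 1.0  # the skill's default delay between requests


class PoliteFetcher:
    """Checks robots.txt rules and enforces a minimum gap between requests."""

    def __init__(self, robots_lines: Iterable[str], user_agent: str = "example-scraper",
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.user_agent = user_agent  # hypothetical user-agent string
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(robots_lines)
        crawl_delay = self.robots.crawl_delay(user_agent)
        # Prefer the site's Crawl-delay; otherwise use the 1-second default.
        self.delay = float(crawl_delay) if crawl_delay is not None else DEFAULT_DELAY_SECONDS
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def allowed(self, url: str) -> bool:
        return self.robots.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        """Sleeps just long enough that consecutive requests are at least self.delay apart."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.delay - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now

    def fetch(self, url: str, timeout: float = 10.0) -> str:
        if not self.allowed(url):
            raise PermissionError(f"robots.txt disallows {url}")
        self.wait_turn()
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
```

The delay is measured from the previous request with a monotonic clock. Time already spent parsing a page therefore counts toward the wait instead of being added on top of it.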

Example

Scrape product names and prices from example.com/products
Output as JSON with fields: name, price, url
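The example request might translate into code roughly like the following sketch. Several details are assumptions:

- The product markup is assumed, not known: `div.product` blocks, each containing an `a.name` link and a `span.price` element.
- `ProductParser` and `scrape_products` are hypothetical names.
- Prices are turned into numbers with a regular expression, which assumes a simple format such as `$9.99` with no thousands separators.

```python
import json
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

PRICE_PATTERN = re.compile(r"\d+(?:\.\d+)?")  # assumes prices like "$9.99"


class ProductParser(HTMLParser):
    """Builds {name, price, url} records from assumed markup:
    <div class="product"><a class="name" href="..."></a><span class="price"></span></div>"""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.products = []
        self._current = None  # record for the most recent product block
        self._field = None    # "name" or "price" while inside that element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if "product" in classes:
            self._current = {"name": "", "price": None, "url": None}
            self.products.append(self._current)
        elif self._current is not None and "name" in classes:
            self._field = "name"
            if tag == "a" and a.get("href"):
                # Resolve relative links against the listing page URL.
                self._current["url"] = urljoin(self.base_url, a["href"])
        elif self._current is not None and "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None  # any closing tag ends the current field

    def handle_data(self, data):
        if self._current is None or self._field is None:
            return
        if self._field == "name":
            self._current["name"] += data.strip()
        else:
            match = PRICE_PATTERN.search(data)
            if match:
                self._current["price"] = float(match.group())


def scrape_products(html: str, base_url: str) -> str:
    parser = ProductParser(base_url)
    parser.feed(html)
    parser.close()
    return json.dumps(parser.products, indent=2)
```

The sketch parses an HTML string so the extraction logic can be checked in isolation. Fetching the live page would still go through a robots.txt-aware, rate-limited client. Storing prices as floats keeps the JSON simple, but `decimal.Decimal` may be preferable where exact currency values matter.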