AgentSkillsCN

salesforce-help-site-scraper

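Scrape Salesforce Help articles into clean Markdown, with consent-banner handling and content cleanup. Use when you need an internal, readable snapshot of Help content for research or documentation support.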

SKILL.md
---
name: salesforce-help-site-scraper
description: 'Scrape Salesforce Help articles into clean Markdown with consent handling and content cleanup. Use when you need an internal, readable snapshot of Help content for research or documentation support.'
license: Forward Proprietary
compatibility: VS Code 1.x+, Node.js 18+
---

Salesforce Help Site Scraper

Use this skill to extract Salesforce Help article content into clean Markdown when pages render dynamically or are blocked by consent banners.

When to Use This Skill

  • You need a readable Markdown snapshot of a Help article for internal research.
  • OneTrust cookie banners block access to the main content.
  • You want to remove headers, footers, or navigation chrome before extraction.
  • NOT for: high-volume crawling, bypassing access controls, or republishing Salesforce content.

Prerequisites

  • Node.js 18+
  • Scraper script at skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js
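
The two prerequisites above can be verified before a run. The following is a minimal preflight sketch, not part of the skill itself; the helper names `nodeMajor` and `checkPrerequisites` are hypothetical, and the script path is the one listed above.

```javascript
const fs = require("node:fs");

// Path taken from the Prerequisites list.
const SCRIPT_PATH =
  "skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js";

// Extract the major version from a string such as "18.19.0".
function nodeMajor(version = process.versions.node) {
  return Number(version.split(".")[0]);
}

// Return a list of human-readable problems; an empty list means "ready".
function checkPrerequisites(scriptPath = SCRIPT_PATH, version = process.versions.node) {
  const problems = [];
  if (nodeMajor(version) < 18) {
    problems.push(`Node.js 18+ is required, found ${version}`);
  }
  if (!fs.existsSync(scriptPath)) {
    problems.push(`Scraper script not found at ${scriptPath}`);
  }
  return problems;
}

for (const problem of checkPrerequisites()) {
  console.error(problem);
}
```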

How to Use

Basic Usage

```bash
node skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js \
  --url "https://help.salesforce.com/s/articleView?id=sf.flow.htm&type=5" \
  --out "./artifacts/online-research/help_flow_overview.md" \
  --consent-selector "#onetrust-accept-btn-handler" \
  --remove-selectors "header,footer,nav,aside" \
  --wait 2500
```

Script Options

| Option | Required | Description |
| --- | --- | --- |
| --url | Yes | Target Help article URL. |
| --out | Yes | Output Markdown file path. |
| --consent-selector | No | Selector for cookie/consent accept button (OneTrust). |
| --remove-selectors | No | Comma-separated selectors to remove before extraction. |
| --wait | No | Milliseconds to wait after navigation or consent click. |
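
To make the option semantics concrete, here is a hedged sketch of how a script like this might read its flags using Node's built-in `util.parseArgs`. The real script's parsing code is not shown in this skill, so the function name `parseScraperArgs`, the returned field names, and the default wait of 0 ms are all assumptions.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical parser for the documented flags; not the script's actual code.
function parseScraperArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: "string" },
      out: { type: "string" },
      "consent-selector": { type: "string" },
      "remove-selectors": { type: "string" },
      wait: { type: "string" },
    },
  });

  // --url and --out are the only required options.
  if (!values.url || !values.out) {
    throw new Error("--url and --out are required");
  }

  // Assumption: a missing --wait means no extra delay.
  const waitMs = values.wait === undefined ? 0 : Number(values.wait);
  if (!Number.isFinite(waitMs) || waitMs < 0) {
    throw new Error("--wait must be a non-negative number of milliseconds");
  }

  return {
    url: values.url,
    out: values.out,
    consentSelector: values["consent-selector"] ?? null,
    // Split the comma-separated list and drop empty entries.
    removeSelectors: (values["remove-selectors"] ?? "")
      .split(",")
      .map((selector) => selector.trim())
      .filter(Boolean),
    waitMs,
  };
}
```

Splitting `--remove-selectors` up front keeps each selector independent, which makes it easier to spot one that accidentally matches the main content container.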

Compliance Notes

  • Prefer the Salesforce Knowledge APIs for structured, supported access where possible.
  • Check and respect robots.txt before scraping.
  • Do not republish or redistribute Salesforce Help content.
  • Attribute content to Salesforce when used internally.
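
The robots.txt check mentioned above can be scripted before each scrape. The sketch below is deliberately simplified and is not part of the skill: it only reads `User-agent: *` groups and treats each `Disallow` value as a plain path prefix, ignoring wildcards, `Allow` rules, and agent-specific groups. The function names are hypothetical, and it relies on the global `fetch` available in Node.js 18+.

```javascript
// Collect Disallow prefixes that apply to all agents ("User-agent: *").
function disallowedPrefixes(robotsTxt) {
  const prefixes = [];
  let inStarGroup = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const idx = line.indexOf(":");
    if (!line || idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      inStarGroup = lastWasAgent ? inStarGroup || value === "*" : value === "*";
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if (field === "disallow" && inStarGroup && value) prefixes.push(value);
    }
  }
  return prefixes;
}

// True when no collected Disallow prefix matches the path.
function isPathAllowed(robotsTxt, path) {
  return !disallowedPrefixes(robotsTxt).some((prefix) => path.startsWith(prefix));
}

// Fetch robots.txt for the target's origin and test the target path.
async function checkRobots(targetUrl) {
  const url = new URL(targetUrl);
  const res = await fetch(new URL("/robots.txt", url.origin));
  if (res.status >= 500) return false; // server error: assume disallowed
  if (!res.ok) return true; // assumption: no robots.txt means no restrictions
  return isPathAllowed(await res.text(), url.pathname + url.search);
}
```

A dedicated robots.txt parser is the safer choice for anything beyond a quick manual check, since real files often rely on wildcards and `Allow` overrides that this sketch skips.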

Examples

Example: Capture a Flow Help article

```bash
node skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js \
  --url "https://help.salesforce.com/s/articleView?id=sf.flow_build.htm&type=5" \
  --out "./artifacts/online-research/help_flow_build.md" \
  --consent-selector "#onetrust-accept-btn-handler" \
  --remove-selectors "header,footer,nav,aside" \
  --wait 2500
```

Troubleshooting

Issue: Output is empty or too short

Solution: Increase --wait or refine --remove-selectors to avoid removing the main content container.
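
One way to apply this advice is to re-run the scraper with progressively longer waits until the output looks complete. The wrapper below is a sketch under assumptions: the 500-character threshold, the wait schedule, and the helper names are hypothetical, and only the flags documented under Script Options are used.

```javascript
const { spawnSync } = require("node:child_process");
const fs = require("node:fs");

const SCRIPT =
  "skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js";

// Hypothetical threshold: under 500 characters counts as "too short".
function isTooShort(markdown, minChars = 500) {
  return markdown.trim().length < minChars;
}

// Build the documented CLI arguments for one attempt.
function buildArgs({ url, out, consentSelector, removeSelectors, waitMs }) {
  const args = [SCRIPT, "--url", url, "--out", out, "--wait", String(waitMs)];
  if (consentSelector) args.push("--consent-selector", consentSelector);
  if (removeSelectors) args.push("--remove-selectors", removeSelectors);
  return args;
}

// Retry with longer waits; return the first attempt that yields enough content.
function scrapeWithRetries(options, waits = [2500, 5000, 10000]) {
  for (const waitMs of waits) {
    const result = spawnSync(process.execPath, buildArgs({ ...options, waitMs }), {
      stdio: "inherit",
    });
    if (result.status === 0 && fs.existsSync(options.out)) {
      const markdown = fs.readFileSync(options.out, "utf8");
      if (!isTooShort(markdown)) return { waitMs, length: markdown.length };
    }
  }
  return null; // every attempt produced missing or short output
}
```

If every attempt still comes back short, the wait is probably not the cause, and the next step is to try again with a narrower `--remove-selectors` list.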

Issue: Consent banner blocks content

Solution: Provide --consent-selector for the OneTrust accept button.
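
For intuition about what the two selector options do, the sketch below shows the kind of page-side logic they imply, written against the standard DOM API (`querySelector`, `querySelectorAll`, `click`, `remove`). This is an assumption about the behavior, not the script's actual code; `clickConsent` and `removeChrome` are hypothetical names. Pasting the functions into the browser console on a Help page is a quick way to confirm that a selector matches anything.

```javascript
// Click the consent button if the selector matches; report whether it did.
function clickConsent(doc, selector) {
  const button = doc.querySelector(selector);
  if (!button) return false;
  button.click();
  return true;
}

// Remove every element matched by a comma-separated selector list.
// Returns the number of removed elements so over-broad selectors are easy to spot.
function removeChrome(doc, selectorList) {
  let removed = 0;
  const selectors = selectorList
    .split(",")
    .map((selector) => selector.trim())
    .filter(Boolean);
  for (const selector of selectors) {
    for (const element of doc.querySelectorAll(selector)) {
      element.remove();
      removed += 1;
    }
  }
  return removed;
}
```

In a browser console, `clickConsent(document, "#onetrust-accept-btn-handler")` returning `false` suggests the banner uses a different selector or has not rendered yet.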

References