Name: docs-scraping
Rating: 92
Author: victoryforphil

What I do

•Turn a user docs scraping request into a reusable script and generated docs snapshot output.
•Standardize output under docs/external/<source>/ with one page-level .ext.md file per discovered page.
•Ensure each generated page has a metadata header and notes footer, and create an index.ext.md manifest.

Accepted request patterns

Use this skill when the request includes one or more of:

•Script: scripts/scrapes/scrape_<source>_docs.sh.ts
•Output directory: docs/external/<source>/
•
Output files:
- •Per-page files: <stable-page-stem>.ext.md
- •Index file: index.ext.md

•
Resolve source metadata from the user request:
- •source key (safe folder/script slug)
- •docs root URL
- •discovery method (sitemap.xml preferred)
•
Reuse project script conventions:
- •Bun shebang (#!/usr/bin/env bun)
- •*.sh.ts naming
- •helpers under scripts/helpers/
•
Use resilient scraping strategy:
- •Primary: r.jina.ai markdown proxy
- •Fallback: direct HTML fetch + conversion to markdown
•
Normalize filenames from docs paths:
- •deterministic flattening (for example docs__guides__intro.ext.md)
•
Regenerate output cleanly:
- •remove old *.ext.md in target source directory
- •write fresh per-page files and index.ext.md

•Inspect existing scraper scripts for reuse patterns (scripts/scrapes/scrape_*.sh.ts).
•Create or update scripts/scrapes/scrape_<source>_docs.sh.ts.
•Run the script once to generate docs output.
•Report totals (pages, ok, failed) and notable blocked pages.
•If a new script entrypoint was introduced, update README.md and AGENTS.md.

•Never embed secrets, auth headers, or private tokens in script or output files.
•Skip private/authenticated docs pages unless explicit credentials handling is requested and safe.
•Keep scripts idempotent and deterministic where practical.