Name: docs-scraping
Author: victoryforphil

What I do

•Turn a user's docs-scraping request into a reusable script and a generated snapshot of the docs.
•Standardize output under docs/external/<source>/ with one page-level .ext.md file per discovered page.
•Ensure each generated page has a metadata header and notes footer, and create an index.ext.md manifest.
•Prefer routing implementation through the dedicated scraper subagent.
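
The skill requires a metadata header and a notes footer on every generated page but does not fix their format. A minimal sketch, assuming a YAML-style front matter header and a "## Notes" footer section (both hypothetical choices, as are the `renderPage` and `PageMeta` names):

```typescript
// Hypothetical metadata fields; the skill asks for "a metadata header" without fixing its shape.
interface PageMeta {
  source: string; // source key, e.g. "example"
  url: string; // original page URL
  fetchedVia: "r.jina.ai" | "direct-html";
}

// Assemble one <stable-page-stem>.ext.md file: header, page body, notes footer.
function renderPage(meta: PageMeta, markdown: string, notes: string[] = []): string {
  const header = [
    "---",
    `source: ${meta.source}`,
    `url: ${meta.url}`,
    `fetched_via: ${meta.fetchedVia}`,
    "---",
  ].join("\n");
  const noteLines = notes.length > 0 ? notes.map((n) => `- ${n}`) : ["- none"];
  const footer = ["## Notes", "", ...noteLines].join("\n");
  return `${header}\n\n${markdown.trim()}\n\n${footer}\n`;
}
```

The header deliberately leaves out a scrape timestamp: a timestamp would change every file on every run and work against the goal of deterministic output.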

Accepted request patterns

Use this skill when the request includes one or more of:

•Script: scripts/scrapes/scrape_<source>_docs.sh.ts
•Output directory: docs/external/<source>/
•Output files:
  - Per-page files: <stable-page-stem>.ext.md
  - Index file: index.ext.md
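
The index.ext.md manifest could be rendered along these lines. This sketch assumes a plain markdown list with one link and one status per page; the layout, the `renderIndex` helper, and the `IndexEntry` fields are illustrative, not a format the skill prescribes:

```typescript
// One entry per discovered page; field names are illustrative assumptions.
interface IndexEntry {
  file: string; // e.g. "docs__guides__intro.ext.md"
  url: string; // original page URL
  status: "ok" | "failed" | "blocked";
}

function renderIndex(source: string, rootUrl: string, entries: IndexEntry[]): string {
  // Plain code-unit comparison instead of localeCompare, so ordering never depends on locale.
  const sorted = [...entries].sort((a, b) => (a.file < b.file ? -1 : a.file > b.file ? 1 : 0));
  const ok = sorted.filter((e) => e.status === "ok").length;
  const lines = [
    `# ${source} docs snapshot`,
    "",
    `Root: ${rootUrl}`,
    `Pages: ${sorted.length} (ok: ${ok}, not ok: ${sorted.length - ok})`,
    "",
    ...sorted.map((e) => `- [${e.file}](./${e.file}): ${e.url} (${e.status})`),
  ];
  return lines.join("\n") + "\n";
}
```

Sorting by file name means a rerun over the same pages produces a byte-identical manifest, whatever order the pages were fetched in.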

Implementation requirements

•Resolve source metadata from the user request:
  - source key (a filesystem-safe slug used in the folder and script names)
  - docs root URL
  - discovery method (sitemap.xml preferred)
•Reuse the project's script conventions:
  - Bun shebang (#!/usr/bin/env bun)
  - *.sh.ts naming
  - helpers under scripts/helpers/
•Use a resilient scraping strategy:
  - Primary: the r.jina.ai markdown proxy
  - Fallback: fetch the HTML directly and convert it to markdown
•Normalize filenames from docs paths:
  - flatten deterministically (for example, docs__guides__intro.ext.md)
•Regenerate output cleanly:
  - remove old *.ext.md files from the target source directory
  - write fresh per-page files and index.ext.md
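
Resolving the source key and discovering pages from sitemap.xml, as described in the list above, might look like the sketch below. It assumes the sitemap lives at `<origin>/sitemap.xml`, that the key uses underscores as separators, and that only URLs under the docs root are wanted; `SourceConfig`, `toSourceKey`, `parseSitemap`, and `discoverPages` are hypothetical names.

```typescript
// Hypothetical shape of the metadata resolved from a request.
interface SourceConfig {
  key: string; // slug for docs/external/<key>/ and scrape_<key>_docs.sh.ts
  rootUrl: string; // docs root, e.g. "https://example.com/docs"
}

// Lowercase, collapse every run of characters outside [a-z0-9] to "_", trim edge underscores.
function toSourceKey(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
}

// Extract <loc> values and keep only pages under the docs root.
// Handles a flat <urlset>; a <sitemapindex> would need one fetch per child sitemap.
function parseSitemap(xml: string, rootUrl: string): string[] {
  const root = rootUrl.replace(/\/+$/, "");
  const locs = Array.from(xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g), (m) =>
    m[1].replace(/&amp;/g, "&"), // sitemaps escape "&" in URLs
  );
  return locs.filter((u) => u === root || u.startsWith(root + "/"));
}

// Network step: fetch <origin>/sitemap.xml (assumed location) and parse it.
async function discoverPages(cfg: SourceConfig): Promise<string[]> {
  const sitemapUrl = new URL("/sitemap.xml", cfg.rootUrl).toString();
  const res = await fetch(sitemapUrl);
  if (!res.ok) {
    throw new Error(`no sitemap at ${sitemapUrl} (HTTP ${res.status}); another discovery method is needed`);
  }
  return parseSitemap(await res.text(), cfg.rootUrl);
}
```

A regex is enough for well-formed sitemap files; a site whose sitemap uses CDATA sections or unusual escaping would call for a real XML parser.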
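
The primary/fallback fetch strategy from the list above can be sketched as follows. It assumes the r.jina.ai reader is used by appending the target URL to `https://r.jina.ai/`, and it substitutes a deliberately naive tag stripper for a real HTML-to-markdown converter; `fetchPageMarkdown`, `naiveHtmlToMarkdown`, and the injectable `fetcher` parameter are hypothetical, the last one added so the logic can be exercised without network access.

```typescript
type Fetcher = (url: string) => Promise<Response>;

interface FetchResult {
  markdown: string;
  via: "r.jina.ai" | "direct-html";
}

// Placeholder conversion only: a real script would use a proper HTML-to-markdown library.
function naiveHtmlToMarkdown(html: string): string {
  return html
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, "") // drop scripts and styles entirely
    .replace(/<\/(p|div|li|h[1-6])>/gi, "\n\n") // end of a block becomes a paragraph break
    .replace(/<[^>]+>/g, "") // strip every remaining tag
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

async function fetchPageMarkdown(url: string, fetcher: Fetcher = fetch): Promise<FetchResult> {
  // Primary: the reader proxy; no auth headers are sent.
  try {
    const res = await fetcher(`https://r.jina.ai/${url}`);
    if (res.ok) {
      const text = await res.text();
      if (text.trim().length > 0) return { markdown: text, via: "r.jina.ai" };
    }
  } catch {
    // Network error reaching the proxy: fall through to the direct fetch.
  }
  // Fallback: fetch the page HTML directly and convert it.
  const res = await fetcher(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return { markdown: naiveHtmlToMarkdown(await res.text()), via: "direct-html" };
}
```

Recording which path produced each page (`via`) gives the per-page metadata header something concrete to report.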
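
One possible deterministic flattening rule joins path segments with double underscores, which yields docs__guides__intro.ext.md for a page at /docs/guides/intro. The handling of trailing slashes, `.html` suffixes, query strings, and the docs root are assumptions made for this sketch, and `toPageFileName` is a hypothetical name.

```typescript
function toPageFileName(pageUrl: string): string {
  const { pathname } = new URL(pageUrl); // query string and #fragment are not part of pathname
  const segments = pathname
    .replace(/\.html?$/i, "") // "/intro.html" and "/intro" give the same stem
    .split("/")
    .filter((s) => s.length > 0) // ignore leading, trailing, and doubled slashes
    .map((s) => decodeURIComponent(s).replace(/[^A-Za-z0-9._-]+/g, "-"));
  const stem = segments.join("__");
  // Keep the docs root from colliding with the index.ext.md manifest.
  return `${stem === "" || stem === "index" ? "root" : stem}.ext.md`;
}
```

Because the name depends only on the URL path, rerunning the scraper rewrites the same files instead of accumulating renamed copies.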
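
Clean regeneration can be sketched with `node:fs`, which Bun supports. The `outputDir` and `regenerate` helpers are hypothetical; in practice they would live in scripts/scrapes/scrape_<source>_docs.sh.ts (whose first line is `#!/usr/bin/env bun`) or be shared from scripts/helpers/. This version deletes only `*.ext.md` files, so any hand-written files in the directory survive.

```typescript
import { mkdirSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// docs/external/<source>/, relative to the repository root.
function outputDir(repoRoot: string, source: string): string {
  return join(repoRoot, "docs", "external", source);
}

// Delete previously generated *.ext.md files (other files are left alone),
// then write the fresh per-page files and index.ext.md.
function regenerate(dir: string, pages: Map<string, string>, indexMarkdown: string): void {
  mkdirSync(dir, { recursive: true });
  for (const name of readdirSync(dir)) {
    if (name.endsWith(".ext.md")) rmSync(join(dir, name));
  }
  for (const name of Array.from(pages.keys()).sort()) {
    writeFileSync(join(dir, name), pages.get(name) ?? "");
  }
  writeFileSync(join(dir, "index.ext.md"), indexMarkdown);
}
```

Deleting before writing is what removes pages that disappeared from the upstream docs; overwriting alone would leave them behind as stale snapshots.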

Workflow

•Inspect existing scraper scripts (scripts/scrapes/scrape_*.sh.ts) for reusable patterns.
•Create or update scripts/scrapes/scrape_<source>_docs.sh.ts.
•Run the script once to generate docs output.
•Report totals (pages, ok, failed) and notable blocked pages.
•If a new script entrypoint was introduced, update README.md and AGENTS.md.
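
The end-of-run report could be produced along these lines. Treating HTTP 401 and 403 as "blocked" and counting blocked pages inside the failed total are assumptions of this sketch, as are the `classify` and `summarize` names.

```typescript
type PageStatus = "ok" | "failed" | "blocked";

interface PageOutcome {
  url: string;
  status: PageStatus;
}

// Assumed mapping: 2xx is ok, 401/403 mark a private or authenticated page, anything else failed.
function classify(httpStatus: number): PageStatus {
  if (httpStatus >= 200 && httpStatus < 300) return "ok";
  if (httpStatus === 401 || httpStatus === 403) return "blocked";
  return "failed";
}

// Totals line (pages, ok, failed) plus the blocked URLs worth calling out.
// Blocked pages are counted inside "failed" here.
function summarize(outcomes: PageOutcome[]): string {
  const ok = outcomes.filter((o) => o.status === "ok").length;
  const blocked = outcomes.filter((o) => o.status === "blocked").map((o) => o.url);
  const lines = [`pages=${outcomes.length} ok=${ok} failed=${outcomes.length - ok}`];
  if (blocked.length > 0) lines.push(`blocked (${blocked.length}): ${blocked.join(", ")}`);
  return lines.join("\n");
}
```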

Guardrails

•Never embed secrets, auth headers, or private tokens in scripts or output files.
•Skip private or authenticated docs pages unless credential handling is explicitly requested and can be done safely.
•Keep scripts idempotent and deterministic where practical.
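
A sketch of how the determinism and no-secrets rules might be applied to the discovered page list: fragments are dropped, duplicates removed, the list sorted, and URLs carrying embedded credentials or secret-looking query parameters rejected. The parameter list and the `normalizePageList` name are illustrative assumptions, not an exhaustive check.

```typescript
// Illustrative, non-exhaustive list of query parameters that usually carry secrets.
const SECRET_PARAMS = ["token", "access_token", "api_key", "apikey", "signature"];

function normalizePageList(urls: string[]): string[] {
  const unique = new Set<string>();
  for (const raw of urls) {
    const u = new URL(raw);
    u.hash = ""; // "#section" anchors point into the same page
    if (u.username !== "" || u.password !== "") {
      throw new Error(`refusing URL with embedded credentials: ${u.host}${u.pathname}`);
    }
    for (const p of SECRET_PARAMS) {
      if (u.searchParams.has(p)) {
        throw new Error(`refusing URL with secret-like parameter "${p}": ${u.host}${u.pathname}`);
      }
    }
    unique.add(u.toString());
  }
  // Sorted output makes reruns fetch and write pages in the same order.
  return Array.from(unique).sort();
}
```

Failing loudly, rather than silently skipping such URLs, keeps a credential from ending up in index.ext.md or in a page header, and the error messages print only host and path so the secret itself is not logged.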