read-webpage-content-as-markdown

Name: read-webpage-content-as-markdown
Rating: 88
Author: santiago-afonso

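Read a webpage into cleaned markdown using curl + markitdown + codex exec. Use whenever asked to read a webpage or extract article content from a URL. Static HTML only; JS/client-rendered pages require a Playwright workflow.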

SKILL.md

--- frontmatter

name: read-webpage-content-as-markdown
description: Read a webpage into cleaned markdown using curl + markitdown + codex exec. Use whenever asked to read a webpage or extract article content from a URL. Static HTML only; JS/client-rendered pages require a Playwright workflow.

Read Webpage Content as Markdown

Use:

bash

scripts/read-webpage-content-as-markdown.sh [--navlinks] <url> [output_md]

Notes:

• Uses curl (static HTML only); JavaScript is not executed.
• Temp artifacts are stored under /tmp.
• Output includes YAML frontmatter with the fields source_url, accessed_at, and commands.
• Output path defaults to /tmp/read-webpage-content-as-markdown.<timestamp>.md; relative output paths are written under /tmp/.
• --navlinks keeps only topic-relevant navigation links (e.g., an in-page table of contents); it drops site-wide menus and unrelated links.
• If the script reports JS/client rendering, retry with a Playwright workflow.
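
The output-path rule in the notes above can be sketched as a small shell function. This is an illustration only, not the script's actual code: the function name resolve_output_path and the timestamp format are assumptions, and the real script may resolve paths differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the real script): maps the optional
# [output_md] argument to the path the notes describe.
resolve_output_path() {
  local out="${1:-}"
  if [ -z "$out" ]; then
    # No path given: timestamped default under /tmp.
    # The timestamp format here is an assumption.
    printf '/tmp/read-webpage-content-as-markdown.%s.md\n' "$(date +%Y%m%dT%H%M%S)"
  elif [ "${out#/}" != "$out" ]; then
    # Absolute path (starts with "/"): used unchanged.
    printf '%s\n' "$out"
  else
    # Relative path: written under /tmp/.
    printf '/tmp/%s\n' "$out"
  fi
}

resolve_output_path              # timestamped default under /tmp
resolve_output_path article.md   # relative, so placed under /tmp/
resolve_output_path /data/a.md   # absolute, so kept as given
```

In practice this means a bare filename such as article.md ends up at /tmp/article.md rather than in the current working directory, so pass an absolute path if the output should live elsewhere.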