HackerNews Extract
Extract a HackerNews post (article + comments) into clean Markdown for quick reading or LLM input.
See Example Output.
What it does
- •Accepts a HackerNews ID, URL, or a saved Algolia JSON file.
- •Scrapes the linked article content with trafilatura, cleans HTML, and formats it.
- •Fetches the story metadata and comment tree from https://hn.algolia.com/api/v1/items/<id>.
- •Outputs a readable combined Markdown file with the original article, threaded comments, and key metadata.
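The three accepted input forms can be told apart with a few standard-library checks. The following is a minimal sketch, not the script's actual code: the names `resolve_input` and `algolia_url` are hypothetical, and treating every non-numeric, non-URL value as a file path is an assumption.

```python
from urllib.parse import parse_qs, urlparse

# Endpoint named in the list above; {id} is the HackerNews item ID.
ALGOLIA_ITEM_URL = "https://hn.algolia.com/api/v1/items/{id}"


def resolve_input(value: str) -> tuple[str, str]:
    """Classify the input as ("id", <item id>) or ("file", <path>).

    Hypothetical helper for illustration only.
    """
    value = value.strip()
    # Bare numeric ID, e.g. "46861313".
    if value.isdigit():
        return ("id", value)
    # HackerNews item URL, e.g. https://news.ycombinator.com/item?id=46861313
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):
        ids = parse_qs(parsed.query).get("id", [])
        if ids and ids[0].isdigit():
            return ("id", ids[0])
        raise ValueError(f"no item id found in URL: {value}")
    # Assumption: anything else is a path to a saved Algolia JSON file.
    return ("file", value)


def algolia_url(item_id: str) -> str:
    """Build the Algolia items URL for a given item ID."""
    return ALGOLIA_ITEM_URL.format(id=item_id)
```

For example, `resolve_input("https://news.ycombinator.com/item?id=46861313")` yields the ID `46861313`, which `algolia_url` turns into the request URL.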
Requirements
- •uv installed and in PATH.
Install
No install beyond having uv.
uv installs the dependencies automatically into a dedicated venv when you run the script.
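uv's script mode reads dependencies from an inline metadata block (PEP 723) at the top of the script. The header of hn-extract.py is not shown here, so the block below is only a sketch of what such a header looks like; trafilatura is the one dependency this document names, and the real list may be longer.

```python
# /// script
# dependencies = [
#     "trafilatura",
# ]
# ///
# Hypothetical header: the actual dependency list of hn-extract.py may differ.
# `uv run --script` resolves the packages listed above into an isolated
# environment before executing the rest of the file.
```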
Usage Workflow (Mandatory for Agents)
When an agent is asked to extract a HackerNews post:
- •Run the script with an output path: uv run --script ${baseDir}/hn-extract.py <input> -o /tmp/hn-<id>.md.
- •Send ONE combined message: Upload the file and ask the question in the same tool call. Use the message tool (action=send, filePath="/tmp/hn-<id>.md", message="Extraction complete. Do you want me to summarize it?").
- •Do not output the full text or a summary directly in the chat unless specifically requested.
Usage
```bash
# run as uv script
uv run --script ${baseDir}/hn-extract.py <hn-id|hn-url|path/to/item.json> [-o path/to/output.md]

# Examples
uv run --script ${baseDir}/hn-extract.py 46861313 -o /tmp/output.md
uv run --script ${baseDir}/hn-extract.py "https://news.ycombinator.com/item?id=46861313"
uv run --script ${baseDir}/hn-extract.py data/item.json
```
- •Omit -o to print to stdout.
- •Directories for -o are created automatically.
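The -o behavior described above can be sketched with pathlib. This is an illustration under assumptions, not the script's own code: `write_output` is a hypothetical name, and UTF-8 is an assumed output encoding.

```python
import sys
from pathlib import Path
from typing import Optional


def write_output(markdown: str, output_path: Optional[str] = None) -> None:
    """Write Markdown to output_path, or to stdout when no path is given.

    Hypothetical helper for illustration only.
    """
    if output_path is None:
        # No -o given: print the result to stdout.
        sys.stdout.write(markdown)
        return
    path = Path(output_path)
    # Create any missing parent directories before writing the file.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
```

Calling `write_output(text, "/tmp/hn/out.md")` would create `/tmp/hn` if it does not exist yet.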
Notes
- •Retries are enabled for HTTP fetches.
- •Comments are indented by thread depth.
- •Article fetch uses trafilatura.fetch_url with liberal SSL handling to make it more usable.
- •Sites that require authentication or block scraping may still fail.
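Indenting comments by thread depth amounts to a recursive walk over the nested children lists in the Algolia item JSON. The sketch below assumes each node carries author, text (HTML, or null for deleted comments), and children fields; the function names, the two-space indent, and the bullet layout are illustrative choices, not the script's actual output format.

```python
import html
import re
from typing import Any


def plain_text(comment_html: str) -> str:
    """Crudely strip tags and decode entities (illustrative, not a full HTML cleaner)."""
    without_tags = re.sub(r"<[^>]+>", " ", comment_html)
    return " ".join(html.unescape(without_tags).split())


def render_comments(children: list[dict[str, Any]], depth: int = 0) -> list[str]:
    """Return one Markdown bullet per comment, indented by thread depth."""
    lines: list[str] = []
    indent = "  " * depth  # assumed: two spaces per nesting level
    for node in children:
        text = node.get("text")
        if text:  # skip nodes without text, such as deleted comments
            author = node.get("author") or "[unknown]"
            lines.append(f"{indent}- {author}: {plain_text(text)}")
        # Replies are rendered one level deeper than their parent.
        lines.extend(render_comments(node.get("children") or [], depth + 1))
    return lines
```

Joining the returned lines with newlines gives a nested Markdown list in which each reply sits one level below the comment it answers.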