hn-extract

HackerNews Extract

Extract a HackerNews post (article + comments) into clean Markdown for quick reading or LLM input.

see Example Output

What it does

•Accepts a HackerNews ID, URL, or a saved Algolia JSON file.
•Scrapes the linked article content with trafilatura, cleans HTML, and formats it.
•Fetches the story metadata and comment tree from https://hn.algolia.com/api/v1/items/<id>.
•Outputs readable, combined Markdown containing the original article, threaded comments, and key metadata.
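
As an illustration of the first and third steps, here is a minimal sketch of how the three input forms could be told apart and mapped to the Algolia endpoint. The names resolve_item_id and api_url are hypothetical and are not taken from hn-extract.py; the actual script may handle inputs differently.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Endpoint named in this README; {id} is the HackerNews item id.
API_TEMPLATE = "https://hn.algolia.com/api/v1/items/{id}"


def resolve_item_id(value: str) -> Optional[str]:
    """Return the item id for a bare id or an item URL.

    Returns None for anything else (for example a path to a saved
    JSON file), so the caller can fall back to reading from disk.
    Hypothetical helper, not the script's real function.
    """
    value = value.strip()
    if value.isascii() and value.isdigit():
        return value
    parsed = urlparse(value)
    if parsed.netloc.endswith("news.ycombinator.com"):
        ids = parse_qs(parsed.query).get("id", [])
        if ids and ids[0].isdigit():
            return ids[0]
    return None


def api_url(item_id: str) -> str:
    """Build the Algolia items URL for a given id."""
    return API_TEMPLATE.format(id=item_id)
```

With this sketch, resolve_item_id("data/item.json") yields None, which signals that the input should be treated as a local file rather than fetched.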

Requirements

•uv installed and in PATH.

Install

No installation is needed beyond uv itself. When you run the script, uv automatically installs its dependencies into a dedicated venv.

Usage Workflow (Mandatory for Agents)

When an agent is asked to extract a HackerNews post:

•Run the script with an output path: uv run --script ${baseDir}/hn-extract.py <input> -o /tmp/hn-<id>.md.
•Send ONE combined message: Upload the file and ask the question in the same tool call. Use the message tool (action=send, filePath="/tmp/hn-<id>.md", message="Extraction complete. Do you want me to summarize it?").
•Do not output the full text or a summary directly in the chat unless specifically requested.

Usage

bash

# run as uv script
uv run --script ${baseDir}/hn-extract.py <hn-id|hn-url|path/to/item.json> [-o path/to/output.md]

# Examples
uv run --script ${baseDir}/hn-extract.py 46861313 -o /tmp/output.md
uv run --script ${baseDir}/hn-extract.py "https://news.ycombinator.com/item?id=46861313"
uv run --script ${baseDir}/hn-extract.py data/item.json

•Omit -o to print to stdout.
•Directories for -o are created automatically.
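
The two output rules above could be implemented roughly as follows. This is a sketch under the assumption that the script uses pathlib; write_output is a hypothetical name, not the script's actual function.

```python
import sys
from pathlib import Path
from typing import Optional


def write_output(markdown: str, output: Optional[str] = None) -> None:
    """Write Markdown to a file, or to stdout when no path is given.

    Hypothetical helper illustrating the -o behaviour described above.
    """
    if output is None:
        # No -o given: print the result to stdout.
        sys.stdout.write(markdown)
        return
    path = Path(output)
    # Create any missing parent directories before writing.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
```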

Notes

•Retries are enabled for HTTP fetches.
•Comments are indented by thread depth.
•The article is fetched with trafilatura.fetch_url using lenient SSL handling, so more sites can be fetched successfully.
•Sites that require authentication or block scraping may still fail.
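
To make the depth-based indentation concrete, here is a small sketch that walks a comment tree shaped like the Algolia items response (each node carrying author, text, and children). The function name render_comments, the bullet format, and the crude tag stripping are assumptions made for illustration; the script's real formatting may differ.

```python
import re
from html import unescape


def render_comments(children, depth=0, indent="  "):
    """Return Markdown list lines, indented one level per reply depth.

    Hypothetical helper: `children` is a list of dicts with "author",
    "text" (HTML), and "children" keys.
    """
    lines = []
    for child in children:
        text = child.get("text")
        if text:  # nodes without text (e.g. deleted comments) are skipped
            author = child.get("author") or "[deleted]"
            # Crude HTML cleanup: drop tags, decode entities, collapse spaces.
            body = " ".join(unescape(re.sub(r"<[^>]+>", " ", text)).split())
            lines.append(f"{indent * depth}- **{author}**: {body}")
        # Replies are rendered one level deeper than their parent.
        lines.extend(render_comments(child.get("children", []), depth + 1, indent))
    return lines
```

Joining the returned lines with newlines gives a nested Markdown list in which each reply sits one indent level below its parent.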