dufuddle
A CLI wrapper for Defuddle that extracts clean content and metadata from any web page.
Setup
Install dependencies in the scripts folder:
bash
cd .opencode/skill/dufuddle/scripts && bun install
Usage
bash
bun run .opencode/skill/dufuddle/scripts/dufuddle.ts [options] <URL>
Options
| Option | Description |
|---|---|
--help | Show help message |
--dry-run | Verify URL is reachable without parsing content |
--content | Output only clean markdown content |
--json | Output full result as JSON |
--timeout <ms> | Set request timeout (default: 10000ms) |
Examples
Extract article content as clean Markdown
bash
bun run .opencode/skill/dufuddle/scripts/dufuddle.ts https://example.com/article
By default, outputs only the clean Markdown content (when piped to a file or non-TTY).
Verify URL accessibility
bash
bun run .opencode/skill/dufuddle/scripts/dufuddle.ts --dry-run https://example.com
Get JSON output for programmatic use
bash
bun run .opencode/skill/dufuddle/scripts/dufuddle.ts --json https://example.com/article | jq
Set custom timeout
bash
bun run .opencode/skill/dufuddle/scripts/dufuddle.ts --timeout 30000 https://slow-site.com
Output Format
Default Output (clean Markdown)
When running in a terminal (TTY), displays full output with title, content, and metadata. When piping to a file or non-TTY, outputs only the clean Markdown content:
markdown
# Article Title Article content in clean Markdown...
Full Output (TTY mode only)
code
--- TITLE ---
Article Title
--- CONTENT (Markdown) ---
# Heading
Article content in clean Markdown...
--- METADATA ---
{
"author": "Author Name",
"published": "2025-01-01",
"site": "Example Site",
"wordCount": 1500,
"domain": "example.com"
}
JSON Output (--json)
json
{
"title": "Article Title",
"content": "# Heading\nArticle content...",
"author": "Author Name",
"published": "2025-01-01",
"site": "Example Site",
"wordCount": 1500,
"domain": "example.com"
}
Bundled Resources
- •
scripts/dufuddle.ts- Main CLI script - •
scripts/package.json- Dependencies (defuddle, jsdom)
Best Practices
- •Use
--dry-runfirst to verify URL accessibility before full extraction - •Use
--contentfor clean markdown exports to files - •Use
--jsonwhen integrating with other tools or scripts - •Increase
--timeoutfor slow-loading pages - •The tool uses a browser-like User-Agent to avoid blocking by some sites