markdown-converter

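Converts documents and files of many kinds to Markdown using the markitdown tool. PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, remote URLs (including YouTube), and EPUB files can all be converted to Markdown for LLM processing or text analysis.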

SKILL.md
--- frontmatter
name: markdown-converter
description: Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, remote URLs (including YouTube), or EPubs to Markdown format for LLM processing or text analysis.

Markdown Converter

Convert files to Markdown using uvx markitdown. markitdown itself needs no separate installation, since uvx fetches and runs it on demand (uv must already be installed).

Basic Usage

bash
# Convert to stdout
uvx markitdown input.pdf

# Convert a remote URL (markitdown will fetch it)
uvx markitdown https://example.com

# Save to file
uvx markitdown input.pdf -o output.md
uvx markitdown input.docx > output.md

# From stdin
cat input.pdf | uvx markitdown
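
To convert several files at once, the single-file commands above could be wrapped in a loop. The following is a minimal sketch, not part of markitdown itself: `md_name` and `convert_all` are hypothetical helper names, and the globbed extensions are an assumption chosen for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: map an input path to a .md path in the same directory.
md_name() {
  local src="$1" dir base
  dir=$(dirname -- "$src")
  base=$(basename -- "$src")
  # Strip the last extension from the file name only, then append .md
  printf '%s/%s.md\n' "$dir" "${base%.*}"
}

# Hypothetical helper: convert every matching file in one directory.
# The extension list here is an assumption; adjust it as needed.
convert_all() {
  local dir="$1" f
  for f in "$dir"/*.pdf "$dir"/*.docx "$dir"/*.pptx "$dir"/*.xlsx; do
    [ -e "$f" ] || continue   # skip glob patterns that matched nothing
    uvx markitdown "$f" -o "$(md_name "$f")" || echo "failed: $f" >&2
  done
}
```

Calling `convert_all ./docs` would then write `./docs/report.md` next to `./docs/report.docx`, and so on. Failures are reported on stderr and do not stop the loop.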

Supported Formats

  • Documents: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
  • Web/Data: HTML, CSV, JSON, XML
  • Media: Images (EXIF + OCR), Audio (EXIF + transcription)
  • Other: ZIP (iterates contents), remote URLs (HTTP/HTTPS, including YouTube), EPub
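
When a script has to decide whether a file is worth passing to markitdown, the list above can be mirrored in a small lookup. This is only a sketch: `listed_kind` is a hypothetical name, and the concrete image and audio extensions are assumptions, since the list names those categories without spelling out extensions.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the category from the list above for a file name,
# or return non-zero if the extension is not in the list.
listed_kind() {
  local ext
  # Take the text after the last dot and lowercase it
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf|docx|pptx|xlsx|xls) echo documents ;;
    html|csv|json|xml)      echo web-data ;;
    jpg|jpeg|png|wav|mp3)   echo media ;;    # assumed extensions, not from the list
    zip|epub)               echo other ;;
    *)                      return 1 ;;
  esac
}
```

For example, `listed_kind Report.PDF` prints `documents`, while `listed_kind notes.txt` prints nothing and returns a non-zero status. The function only reflects the list in this document; it does not query markitdown.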

Options

bash
-o OUTPUT      # Output file
-x EXTENSION   # Hint file extension (for stdin)
-m MIME_TYPE   # Hint MIME type
-c CHARSET     # Hint charset (e.g., UTF-8)
-d             # Use Azure Document Intelligence
-e ENDPOINT    # Document Intelligence endpoint
--use-plugins  # Enable 3rd-party plugins
--list-plugins # Show installed plugins
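
The hint options matter most when reading from stdin, where markitdown has no file name to inspect. The sketch below wraps that case; `md_stdin` and the `MARKITDOWN` override variable are hypothetical names introduced here, and only the `-x` and `-c` flags come from the option list above.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper: convert stdin with an extension hint and an optional charset hint.
# MARKITDOWN is an assumed override variable (handy for dry runs); it defaults to "uvx markitdown".
md_stdin() {
  local ext="$1" charset="${2:-}"
  set -- -x "$ext"                 # always pass the extension hint
  if [ -n "$charset" ]; then
    set -- "$@" -c "$charset"      # add the charset hint only when one was given
  fi
  # Intentionally unquoted so the default splits into "uvx" and "markitdown"
  ${MARKITDOWN:-uvx markitdown} "$@"
}
```

Usage would look like `cat export.csv | md_stdin .csv UTF-8 > export.md`. Setting `MARKITDOWN=echo` prints the flags that would be passed instead of running the converter.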

Examples

bash
# Convert Word document
uvx markitdown report.docx -o report.md

# Convert a remote document
uvx markitdown https://example.com/report.pdf -o report.md

# Convert Excel spreadsheet
uvx markitdown data.xlsx > data.md

# Convert PowerPoint presentation
uvx markitdown slides.pptx -o slides.md

# Convert with file type hint (for stdin)
cat document | uvx markitdown -x .pdf > output.md

# Use Azure Document Intelligence for better PDF extraction
uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"

Notes

  • Output aims to preserve document structure: headings, tables, lists, links
  • The first run downloads and caches markitdown and its dependencies; subsequent runs reuse the cache and start faster
  • For complex PDFs with poor extraction, use -d together with -e ENDPOINT to run the conversion through Azure Document Intelligence
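
The last note can be turned into an automatic fallback: convert normally first, and retry with Azure Document Intelligence only if the result looks nearly empty. This is a rough sketch under stated assumptions: `too_sparse` and `convert_pdf` are hypothetical helpers, and the 200-character threshold is an arbitrary heuristic, not something markitdown defines.

```bash
#!/usr/bin/env bash

# Hypothetical heuristic: succeed if the file has fewer than MIN non-whitespace
# characters (default 200, an arbitrary assumption).
too_sparse() {
  local file="$1" min="${2:-200}" count
  count=$(tr -d '[:space:]' < "$file" | wc -c | tr -d ' ')
  [ "$count" -lt "$min" ]
}

# Hypothetical wrapper: try the default conversion, then retry with -d and -e
# if the output looks sparse. The endpoint is supplied by the caller.
convert_pdf() {
  local src="$1" out="$2" endpoint="$3"
  uvx markitdown "$src" -o "$out"
  if too_sparse "$out"; then
    echo "sparse output, retrying with Document Intelligence" >&2
    uvx markitdown "$src" -d -e "$endpoint" -o "$out"
  fi
}
```

A call might look like `convert_pdf scan.pdf scan.md "https://your-resource.cognitiveservices.azure.com/"`. A character count is a crude signal; a scanned PDF can also yield long but garbled text, which this check would not catch.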