AgentSkillsCN

markitdown

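Convert local documents to Markdown with Microsoft's markitdown CLI. The tool is especially well suited to PDF, Word, Excel, PowerPoint, images (OCR), and audio files. It can also fetch web URLs, but Jina is faster for web content. Typical triggers: "convert to markdown", "read PDF", "parse document", "extract text", "docx, xlsx, pptx files", "OCR image", "local file".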

SKILL.md
--- frontmatter
name: markitdown
description: "Convert local documents to Markdown using Microsoft's markitdown CLI. Best for: PDF, Word, Excel, PowerPoint, images (OCR), audio. Can fetch URLs but Jina is faster for web. Triggers on: convert to markdown, read PDF, parse document, extract text from, docx, xlsx, pptx, OCR image, local file."
compatibility: "Requires markitdown. Install: pip install 'markitdown[all]' (since 0.1.0, a bare pip install markitdown omits the optional PDF and Office converters)"
allowed-tools: "Bash"

markitdown - Document to Markdown

Convert local documents to clean Markdown. One tool for PDF, Word, Excel, PowerPoint, images, and more.

When to Use markitdown

| Use Case | Recommendation |
| --- | --- |
| Local files (PDF, Word, Excel) | Use markitdown (the only one of the three tools compared here that reads local files) |
| Web pages | ❌ Use Jina (r.jina.ai/), about 5x faster |
| Blocked/anti-bot sites | ❌ Use Firecrawl |
| OCR on images | Use markitdown |
| Audio transcription | Use markitdown |
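
The routing rule in the recommendations above can be sketched as a small shell helper. This is only an illustration: `pick_tool` and `to_markdown` are hypothetical names, not part of markitdown, and the Jina call assumes the `https://r.jina.ai/<URL>` prefix form mentioned in the table.

```bash
# pick_tool INPUT: print the tool the recommendations above suggest for INPUT.
# Hypothetical helper; not part of markitdown.
pick_tool() {
  case "$1" in
    http://*|https://*) echo "jina" ;;        # web pages: Jina is faster
    *)                  echo "markitdown" ;;  # anything else is treated as a local file
  esac
}

# to_markdown INPUT: write the Markdown for INPUT to stdout.
to_markdown() {
  if [ "$(pick_tool "$1")" = "jina" ]; then
    # Assumption: Jina Reader is addressed by prefixing the target URL.
    curl -fsSL "https://r.jina.ai/$1"
  else
    markitdown "$1"
  fi
}
```

For example, `to_markdown report.pdf > report.md` would go through markitdown, while `to_markdown https://example.com` would go through Jina. Blocked or anti-bot sites are not handled here; those would still need Firecrawl.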

Basic Usage

bash
# Local files (primary use case)
markitdown document.pdf
markitdown report.docx
markitdown data.xlsx
markitdown slides.pptx
markitdown screenshot.png    # OCR

# URLs (works, but Jina is faster)
markitdown https://example.com

# Save output
markitdown document.pdf > document.md
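
When several files need converting, the "save output" pattern above can be wrapped in a loop. The following is a minimal sketch, not a markitdown feature: `md_name`, `convert_all`, and the `CONVERTER` override variable are hypothetical names introduced here, and the sketch assumes each input file name has an extension.

```bash
# md_name FILE: print the output path for FILE, with its extension replaced by .md.
# Assumes FILE has an extension (report.pdf -> report.md).
md_name() {
  printf '%s.md\n' "${1%.*}"
}

# convert_all FILE...: convert each file and write the Markdown next to the source.
# CONVERTER is a hypothetical override hook; it defaults to markitdown.
convert_all() {
  for f in "$@"; do
    # Skip arguments that are not regular files (e.g. an unmatched glob).
    [ -f "$f" ] || { echo "skip: $f (not a file)" >&2; continue; }
    "${CONVERTER:-markitdown}" "$f" > "$(md_name "$f")" || echo "failed: $f" >&2
  done
}
```

A typical call would be `convert_all *.pdf *.docx`. Note that a failed conversion still leaves an empty or partial `.md` file behind, so the `failed:` messages on stderr are worth checking.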

Supported Formats

| Format | Extensions | Notes |
| --- | --- | --- |
| PDF | .pdf | Text extraction, tables |
| Word | .docx | Formatting preserved |
| Excel | .xlsx | Tables to markdown |
| PowerPoint | .pptx | Slides as sections |
| Images | .jpg, .png | OCR text extraction |
| HTML | .html | Clean conversion |
| Audio | .mp3, .wav | Speech-to-text |
| Text | .txt, .csv, .json, .xml | Pass-through/structure |
| URLs | https://... | Works but slower than Jina |
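
A script can check a file's extension against the table above before calling markitdown. The sketch below is an assumption-laden illustration: `is_supported` is a hypothetical helper that mirrors only the extensions listed in the table, and markitdown itself may accept more than these.

```bash
# is_supported FILE: succeed if FILE's extension is one listed in the table above.
# Hypothetical helper; it does not query markitdown itself.
is_supported() {
  # Lowercase the name so that REPORT.PDF matches as well.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.pdf|*.docx|*.xlsx|*.pptx) return 0 ;;   # documents
    *.jpg|*.png|*.mp3|*.wav)    return 0 ;;   # images and audio
    *.html|*.txt|*.csv|*.json|*.xml) return 0 ;;  # text-like formats
    *) return 1 ;;
  esac
}
```

Usage might look like `is_supported "$file" && markitdown "$file" > out.md`, which skips files the table does not cover instead of letting the conversion fail.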

Benchmarked Performance (URLs)

| Tool | Avg Time | Success Rate |
| --- | --- | --- |
| Jina | 0.5s | 10/10 |
| markitdown | 2.5s | 9/10 |
| Firecrawl | 4.5s | 10/10 |

Verdict: For URLs, use Jina. For local files, markitdown is the only one of these three tools that applies.

Examples

bash
# PDF to markdown (primary use case)
markitdown report.pdf > report.md

# Excel spreadsheet
markitdown financials.xlsx

# Image with text (OCR)
markitdown screenshot.png

# PowerPoint deck
markitdown presentation.pptx > slides.md

# Audio transcription
markitdown meeting.mp3 > transcript.md

Comparison with Alternatives

| Task | markitdown | Alternative |
| --- | --- | --- |
| PDF text | markitdown file.pdf | PyMuPDF, pdfplumber |
| Word docs | markitdown file.docx | python-docx |
| Excel | markitdown file.xlsx | pandas, openpyxl |
| OCR | markitdown image.png | Tesseract |
| Web pages | Use Jina instead | r.jina.ai/URL (5x faster) |

markitdown's advantage: One CLI covers every local document format listed above, where the alternatives each need a separate library and custom code.