Document Converter

All-in-one document conversion skill: it imports external documents (PDF, DOCX, PPTX) as Markdown and exports Markdown analysis reports as PDF or DOCX.

Overview

┌─────────────────┐              ┌─────────────────┐
│   INPUT FILES   │              │  OUTPUT FILES   │
│  PDF, DOCX,     │ ──import──▶  │   Markdown      │
│  PPTX (OCR)     │              │   (.md)         │
└─────────────────┘              └─────────────────┘
                                        │
                                        │ export
                                        ▼
                                 ┌─────────────────┐
                                 │  FINAL REPORTS  │
                                 │  PDF (styled),  │
                                 │  DOCX           │
                                 └─────────────────┘

Capabilities

1. IMPORT: PDF/DOCX/PPTX → Markdown

Uses markdowner.py to convert documents to Markdown.

bash

# Basic conversion
python3 .agent/skills/document-converter/scripts/markdowner.py input.pdf

# Force OCR for scanned documents
python3 .agent/skills/document-converter/scripts/markdowner.py input.pdf --ocr

# Specify output path
python3 .agent/skills/document-converter/scripts/markdowner.py input.pdf -o output.md

# Batch convert directory
python3 .agent/skills/document-converter/scripts/markdowner.py /input_dir/ -o /output_dir/

Supported formats: .pdf, .docx, .pptx
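
In batch mode the converter has to decide which files in the input directory qualify and where each result goes. A minimal sketch of that mapping, assuming each output keeps the input's stem and gets a `.md` suffix; `plan_batch` is a hypothetical helper, not a function from markdowner.py, whose real naming scheme may differ:

```python
from pathlib import Path

# Extensions listed under "Supported formats".
SUPPORTED = {".pdf", ".docx", ".pptx"}


def plan_batch(input_dir, output_dir):
    """Return (source, target) pairs for every supported file in input_dir.

    Hypothetical helper: assumes each output keeps the input's stem and
    gets a .md suffix; markdowner.py's real naming may differ.
    """
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    pairs = []
    for src in sorted(in_dir.iterdir()):
        # Compare suffixes case-insensitively so REPORT.PDF is picked up too.
        if src.is_file() and src.suffix.lower() in SUPPORTED:
            pairs.append((src, out_dir / (src.stem + ".md")))
    return pairs
```

Printing the pairs before converting gives a cheap dry run for large directories.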

Features:

•Layout preservation with pdftotext
•OCR fallback with Tesseract for scanned documents
•DOCX extraction with pypandoc, PPTX extraction with python-pptx
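
The OCR fallback presumably triggers when a PDF's text layer yields little or no text. A sketch of that decision follows, with the extractors passed in as callables so the logic stays independent of pdftotext and Tesseract. The function names and the 25-characters-per-page threshold are assumptions for illustration, not values taken from markdowner.py:

```python
def looks_scanned(text, page_count, min_chars_per_page=25):
    """Heuristic: treat a PDF as scanned if its text layer is nearly empty.

    The threshold is an illustrative assumption, not a value from
    markdowner.py.
    """
    visible = sum(1 for ch in text if not ch.isspace())
    return visible < min_chars_per_page * max(page_count, 1)


def extract_with_fallback(path, page_count, extract_text, run_ocr, force_ocr=False):
    """Return (text, method), where method is "text-layer" or "ocr".

    extract_text and run_ocr are caller-supplied callables that take a path
    and return a string (for example thin wrappers around pdftotext and
    pytesseract). force_ocr mirrors the --ocr flag.
    """
    if not force_ocr:
        text = extract_text(path)
        if not looks_scanned(text, page_count):
            return text, "text-layer"
    return run_ocr(path), "ocr"
```

Returning the method alongside the text makes it easy to log which documents went through OCR, since those usually deserve a manual check.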

2. EXPORT: Markdown → PDF/DOCX

Uses compile_report.py to generate professional reports.

bash

# Professional PDF with cover page
python3 .agent/skills/document-converter/scripts/compile_report.py \
    report.md \
    --format pdf \
    --title "Analysis Report" \
    --subtitle "Q1 2026" \
    --color "2980b9"

# Simple DOCX for editing
python3 .agent/skills/document-converter/scripts/compile_report.py \
    report.md --format docx

PDF Features:

•Professional cover page with custom title, subtitle, author
•Auto-centered and scaled images
•LaTeX styling with Eisvogel-like headers
•Page breaks handled automatically

DOCX Features:

•Standard conversion for further editing
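
When the export is driven from Python rather than the shell, the command line can be assembled as an argument list. The sketch below uses only the flags that appear in the examples in this section (`--format`, `--title`, `--subtitle`, `--color`); `build_export_command` and its validation rules are assumptions about what compile_report.py accepts, not part of the script:

```python
import re

SCRIPT = ".agent/skills/document-converter/scripts/compile_report.py"


def build_export_command(markdown_path, fmt="pdf", title=None, subtitle=None, color=None):
    """Build the argv list for one compile_report.py run.

    Only flags shown in this document are used. The validation rules are
    assumptions about what the script accepts.
    """
    if fmt not in ("pdf", "docx"):
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["python3", SCRIPT, str(markdown_path), "--format", fmt]
    if title is not None:
        cmd += ["--title", title]
    if subtitle is not None:
        cmd += ["--subtitle", subtitle]
    if color is not None:
        # The documented example passes six hex digits without a leading '#'.
        if not re.fullmatch(r"[0-9a-fA-F]{6}", color):
            raise ValueError(f"expected a 6-digit hex color, got {color!r}")
        cmd += ["--color", color]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Using a list instead of a shell string avoids quoting problems with titles that contain spaces.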

Dependencies

System Packages

bash

sudo apt install poppler-utils tesseract-ocr pandoc texlive-xetex texlive-fonts-extra

Python Packages

bash

pip install pypandoc pdfminer.six pdf2image pytesseract python-pptx Pillow

Workflow Examples

Import → Process → Export

bash

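# 1. Import external PDF to Markdown (run from the scripts/ directory, or use the full script path)
python3 markdowner.py survey_report.pdf -o source.md

# 2. Process/analyze source.md in Python or manually, saving the result as analysis.md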

# 3. Export final report
python3 compile_report.py analysis.md \
    --format pdf \
    --title "Survey Analysis"

Batch Processing

bash

# Import all PDFs in folder
python3 markdowner.py /reports/ -o /markdown/

# Export all markdown to PDFs
for f in /markdown/*.md; do
    python3 compile_report.py "$f" --format pdf
done
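
The shell loop above keeps going when one export fails and does not report which files failed. A Python sketch of the same loop that collects the failures is shown below. It assumes compile_report.py signals failure through a nonzero exit code, which this document does not state, and `export_all` is a hypothetical helper:

```python
import subprocess
from pathlib import Path


def export_all(markdown_dir, script="compile_report.py", run=subprocess.run):
    """Export every .md file in markdown_dir to PDF; return the files that failed.

    `run` defaults to subprocess.run and is injectable so the loop can be
    exercised without the real script. Failure detection relies on the exit
    code, which is an assumption about compile_report.py.
    """
    failed = []
    for md in sorted(Path(markdown_dir).glob("*.md")):
        result = run(["python3", script, str(md), "--format", "pdf"])
        if result.returncode != 0:
            failed.append(md)
    return failed
```

Calling `export_all("/markdown/")` and printing the returned list gives a short summary of which reports need attention.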

Troubleshooting

| Issue | Solution |
|---|---|
| Missing text in PDF | Use `--ocr` flag |
| OCR not working | Install `tesseract-ocr` |
| PDF styling broken | Install `texlive-xetex texlive-fonts-extra` |
| PPTX error | Install `python-pptx` |
| Pandoc not found | Install `pandoc` |
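
Most of these issues come down to a missing dependency, so a preflight check can save a failed run. The sketch below looks for the executables shipped by the listed apt packages and the modules shipped by the listed pip packages. `missing_dependencies` is a hypothetical helper, and it cannot detect a missing `texlive-fonts-extra`, which provides fonts rather than an executable:

```python
import importlib.util
import shutil

# Executable -> apt package that provides it.
BINARIES = {
    "pdftotext": "poppler-utils",
    "tesseract": "tesseract-ocr",
    "pandoc": "pandoc",
    "xelatex": "texlive-xetex",
}

# Importable module -> pip package that provides it.
MODULES = {
    "pypandoc": "pypandoc",
    "pdfminer": "pdfminer.six",
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "pptx": "python-pptx",
    "PIL": "Pillow",
}


def missing_dependencies(which=shutil.which, find_spec=importlib.util.find_spec):
    """Return (apt_packages, pip_packages) that appear to be missing."""
    apt = [pkg for exe, pkg in BINARIES.items() if which(exe) is None]
    pip = [pkg for mod, pkg in MODULES.items() if find_spec(mod) is None]
    return apt, pip


if __name__ == "__main__":
    apt, pip = missing_dependencies()
    if apt:
        print("sudo apt install " + " ".join(apt))
    if pip:
        print("pip install " + " ".join(pip))
```

Run as a script, it prints the install commands for whatever is absent and prints nothing when everything is found.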

File Structure

.agent/skills/document-converter/
├── SKILL.md              # This file
└── scripts/
    ├── markdowner.py     # PDF/DOCX/PPTX → Markdown
    └── compile_report.py # Markdown → PDF/DOCX

Migration Note

This skill replaces and unifies:

•pdf-to-markdown (import functionality)
•report-writer (export functionality)

The original skills are deprecated but kept for backward compatibility.