Learning Markdown to DOCX Converter
Objectives
- •Convert .md files to .docx with proper formatting
- •Preserve images, tables, and code blocks
- •Apply academic document styling
Instructions
1. Pre-process Markdown
Remove horizontal rules before conversion:
- •Horizontal rules
---in markdown become visible lines in Word - •Remove all
---separators if clean layout is desired - •Alternative: Use blank lines for section spacing
2. Use Python-docx or Pandoc
Recommended: Pandoc (more reliable)
bash
pandoc input.md -o output.docx --reference-doc=template.docx
Alternative: Python script with python-docx
python
from docx import Document from docx.shared import Inches, Pt import markdown
2. Formatting Requirements
Document structure:
- •Title: Bold, 16pt
- •Student info: Name, ID, Section, Date
- •Headings: Hierarchical (Heading 1, 2, 3)
- •Body text: 11pt, single spacing
- •Code blocks: Courier New, 10pt, gray background
- •Images: Centered, with captions
Image handling:
- •Convert relative paths to absolute before processing
- •Resize images to fit page width (max 6 inches)
- •Control captions via alt text:
becomes figure caption - •Use empty alt text
to avoid automatic captions - •Maintain aspect ratio
3. Conversion Steps
- •
Pre-process markdown:
- •Remove horizontal rules
---if clean layout is needed - •Resolve relative image paths
- •Clean up formatting inconsistencies
- •Verify all images exist
- •Remove horizontal rules
- •
Convert to DOCX:
- •Use pandoc or python-docx
- •Apply formatting rules
- •Insert images with proper sizing
- •
Post-process DOCX:
- •Verify all images display correctly
- •Check page breaks
- •Ensure consistent formatting
- •Add page numbers if required
4. Pandoc Command Examples
Basic conversion:
bash
pandoc Lab1_Template.md -o Lab1.docx
With custom styling:
bash
pandoc Lab1_Template.md -o Lab1.docx \ --reference-doc=academic_template.docx \ --toc \ --number-sections
With metadata:
bash
pandoc Lab1_Template.md -o Lab1.docx \ -M title="Lab 1: Zipf's Law" \ -M author="Student Name" \ -M date="2026-01-20"
A ready-to-use script is available: scripts/convert_md_to_docx.py
Usage:
bash
# Basic conversion python scripts/convert_md_to_docx.py Lab1_Template.md # Specify output filename python scripts/convert_md_to_docx.py Lab1_Template.md Lab1.docx # The script will: # - Auto-download pandoc if needed # - Handle relative image paths # - Auto-remove image alt text (prevents captions in Word) # - Provide validation checklist
Features:
- •Automatic pandoc installation
- •Relative path resolution for images
- •Auto-preprocessing: removes image alt text (e.g.,
→)- •This prevents alt text from appearing as captions below images in Word
- •Error handling and troubleshooting tips
- •Validation checklist after conversion
Validation
Check the generated .docx:
- • All images display correctly
- • Headings are properly formatted
- • Code blocks are readable
- • Tables are formatted correctly
- • Page layout is appropriate
- • File size is reasonable (<10MB)
Common Issues
- •Missing images: Ensure all image paths are resolved before conversion
- •Broken formatting: Use
--reference-docwith proper template - •Large file size: Compress images before conversion
- •Chinese characters: Ensure UTF-8 encoding:
pandoc -f markdown+east_asian_line_breaks - •Unwanted text below images: Image alt text appears as captions in Word. Use empty alt text
![]()or short text![Code]()to avoid verbose captions
Installation
Pandoc:
bash
# Windows (using chocolatey) choco install pandoc # Or download from: https://pandoc.org/installing.html
Python packages:
bash
pip install pypandoc python-docx