DOCX Document Skill
Read, create, and edit .docx documents with professional formatting and visual verification.
Overview
This skill handles Word document operations end-to-end: reading existing .docx files, creating new documents with structured formatting, editing content while preserving layout, and verifying visual quality through a rendering pipeline that converts DOCX to PNG pages.
Core capabilities:
- •Create
.docxfiles with headings, styles, tables, lists, and images viapython-docx - •Edit existing documents while preserving formatting, styles, and section properties
- •Render documents to PNG pages via LibreOffice and Poppler for visual inspection
- •Validate layout quality (typography, spacing, alignment, table integrity)
When to use:
- •Read or review DOCX content where layout matters (tables, diagrams, pagination)
- •Create or edit DOCX files with professional formatting
- •Validate visual layout before delivery
Dependencies
Install with uv (preferred) or pip:
uv pip install python-docx pdf2image
System tools for rendering (LibreOffice + Poppler):
# Ubuntu/Debian sudo apt-get install -y libreoffice poppler-utils # macOS brew install libreoffice poppler
If system tools cannot be installed, python-docx still works for document creation and editing; only the rendering pipeline requires LibreOffice and Poppler.
Workflow
Step 1: Assess Environment
python3 -c "import docx; print('python-docx:', docx.__version__)" 2>/dev/null || echo "MISSING"
command -v soffice >/dev/null 2>&1 && echo "LibreOffice: OK" || echo "MISSING: LibreOffice"
command -v pdftoppm >/dev/null 2>&1 && echo "Poppler: OK" || echo "MISSING: Poppler"
Step 2: Gather Requirements
Ask the user for objective, mode (create/edit/review), input file path, output path (default: output/doc/<name>.docx), and style requirements.
Step 3: Execute Document Operation
Create a document:
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
style = doc.styles["Normal"]
style.font.name = "Calibri"
style.font.size = Pt(11)
doc.add_heading("Title", level=0)
table = doc.add_table(rows=3, cols=3, style="Table Grid")
table.cell(0, 0).text = "Header 1"
doc.add_picture("image.png", width=Inches(4.0))
doc.save("output/doc/result.docx")
Read a document:
from docx import Document
doc = Document("input.docx")
for para in doc.paragraphs:
print(f"[{para.style.name}] {para.text}")
for table in doc.tables:
for row in table.rows:
print([cell.text for cell in row.cells])
Edit a document:
from docx import Document
doc = Document("input.docx")
for para in doc.paragraphs:
if "old text" in para.text:
for run in para.runs:
run.text = run.text.replace("old text", "new text")
doc.save("output/doc/edited.docx")
Step 4: Render and Inspect
SKILL_DIR="<path-to-this-skill>" python3 "$SKILL_DIR/scripts/render_docx.py" output/doc/result.docx \ --output_dir tmp/doc/pages --width 1600 --height 2000
Manual rendering without the bundled script:
soffice -env:UserInstallation=file:///tmp/lo_profile_$$ \ --headless --convert-to pdf --outdir tmp/doc/ output/doc/result.docx pdftoppm -png tmp/doc/result.pdf tmp/doc/result
Inspect each page for: clipped text, broken tables, encoding issues, spacing, image overflow.
Step 5: Iterate and Deliver
Fix formatting issues, re-render, re-inspect. Save final .docx, report path, clean up tmp/doc/.
File Conventions
| Path | Purpose |
|---|---|
tmp/doc/ | Intermediate files (PDFs, PNGs); delete when done |
output/doc/ | Final document artifacts |
Best Practices
Do:
- •Set explicit fonts, sizes, and spacing via
doc.styles - •Use
Inches(),Pt(),Cm(),Emu()for dimensions - •Set column widths explicitly with
cell.width = Cm(N) - •Re-render after every structural change
- •Use ASCII hyphens only; avoid U+2011, U+2013, U+2014
Do not:
- •Leave formatting to Word defaults (they vary across systems)
- •Mix direct formatting with style-based formatting
- •Insert images without specifying width
- •Skip visual verification before delivery
Common Patterns
Table with Header Shading
from docx import Document
from docx.shared import Pt, Cm
from docx.oxml.ns import qn
doc = Document()
table = doc.add_table(rows=4, cols=3, style="Table Grid")
for i, header in enumerate(["Name", "Role", "Status"]):
cell = table.cell(0, i)
cell.text = header
run = cell.paragraphs[0].runs[0]
run.bold = True
run.font.size = Pt(10)
shading = cell._element.get_or_add_tcPr()
shading_elm = shading.makeelement(qn("w:shd"), {
qn("w:val"): "clear", qn("w:color"): "auto", qn("w:fill"): "4472C4"})
shading.append(shading_elm)
for row in table.rows:
row.cells[0].width = Cm(5)
row.cells[1].width = Cm(5)
row.cells[2].width = Cm(3)
Page Setup
from docx import Document from docx.shared import Inches from docx.enum.section import WD_ORIENT doc = Document() section = doc.sections[0] section.top_margin = Inches(1.0) section.bottom_margin = Inches(1.0) section.left_margin = Inches(1.25) section.right_margin = Inches(1.25) section.orientation = WD_ORIENT.LANDSCAPE section.page_width, section.page_height = section.page_height, section.page_width
Examples
Example 1: Create a report
Prompt: Create a quarterly sales report with summary table
Steps: Set up Calibri 11pt, add title heading, create a 5-row table with Q1-Q4 revenue columns, add chart placeholder, render to PNG, save to
output/doc/quarterly-report.docx
Example 2: Edit an existing document
Prompt: Update the header in proposal.docx to say "2026 Budget Proposal"
Steps: Load with
python-docx, find heading paragraph, replace text preserving formatting, re-render, save tooutput/doc/proposal.docx
Example 3: Visual review
Prompt: Check if the tables in report.docx are aligned properly
Steps: Run
render_docx.py, inspect each PNG page for alignment, report findings on column widths and border consistency, suggest fixes
Output Checklist
Before delivering a document, verify:
- • Document opens without errors in Word or LibreOffice
- • Fonts render consistently (no fallback to default serif)
- • Tables have aligned columns with no text overflow
- • Images are properly sized within page margins
- • Page margins and orientation match requirements
- • Headings follow consistent hierarchy (H1, H2, H3)
- • No encoding artifacts (mojibake, replacement characters)
- • Temporary files cleaned up from
tmp/doc/
Troubleshooting
| Problem | Solution |
|---|---|
ModuleNotFoundError: docx | Run uv pip install python-docx (package name differs from import) |
soffice not found | Install LibreOffice: apt-get install libreoffice or brew install libreoffice |
pdftoppm not found | Install Poppler: apt-get install poppler-utils or brew install poppler |
| LibreOffice profile lock | Use unique profile: -env:UserInstallation=file:///tmp/lo_profile_$$ |
| PDF conversion fails | Try DOCX to ODT to PDF fallback (bundled script handles this) |
| Table columns misaligned | Set explicit widths: cell.width = Cm(N) |
| Image overflows page | Specify width: doc.add_picture(path, width=Inches(N)) |
| Unicode dash rendering | Replace U+2011, U+2013, U+2014 with ASCII hyphen 0x2D |
Reference Map
- •
references/python-docx-patterns.md: API patterns for paragraphs, tables, styles, images, sections - •
references/rendering-pipeline.md: DOCX-to-PNG conversion, DPI calculation, troubleshooting