DOCX Skill
Create, edit, and analyze Word documents (.docx files) with support for tracked changes, comments, and formatting preservation.
Reading/Analyzing DOCX
Extract Text Content
bash
pandoc input.docx -t markdown -o output.md
Access Raw XML (for comments, formatting, structure, metadata)
bash
python ooxml/scripts/unpack.py input.docx unpacked/
Then read the relevant XML files:
- word/document.xml: main content
- word/comments.xml: comments
- word/styles.xml: style definitions
- docProps/core.xml: metadata
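A .docx file is a ZIP archive, so the main XML part can also be read directly when a quick look is enough. The following is a minimal sketch using only the Python standard library; the function name paragraph_texts is hypothetical and is not part of the repository's scripts. For untrusted files, the defusedxml package listed under Dependencies provides defusedxml.ElementTree.fromstring as a safer replacement for the parser call.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_source):
    """Return the text of each paragraph in word/document.xml.

    docx_source may be a file path or a binary file-like object.
    (Hypothetical helper for illustration only.)
    """
    with zipfile.ZipFile(docx_source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # w:t holds run text; w:delText (tracked deletions) is skipped on purpose.
        parts = [node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t")]
        texts.append("".join(parts))
    return texts
```

Calling paragraph_texts("input.docx") returns one string per paragraph, which is usually enough to locate the text you want before opening the unpacked XML.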
Creating New DOCX
IMPORTANT: Before using JavaScript/TypeScript to create documents, read the complete docx-js.md file (~500 lines).
Basic Structure
javascript
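const fs = require('fs');
const { Document, Paragraph, TextRun, Packer } = require('docx');

// Build a document with one section containing a single paragraph.
const doc = new Document({
  sections: [{
    children: [
      new Paragraph({
        children: [new TextRun("Hello World")],
      }),
    ],
  }],
});

// Packer.toBuffer returns a Promise, so resolve it before writing to disk.
Packer.toBuffer(doc).then((buffer) => {
  fs.writeFileSync("output.docx", buffer);
});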
Editing Existing DOCX
For Basic Changes
Use Python's python-docx library for simple text replacements.
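A simple replacement with python-docx might look like the sketch below. The helper names replace_in_paragraphs and replace_in_docx are hypothetical. The approach works run by run so each run keeps its formatting, which also means text split across two runs will not be matched. Only body paragraphs are covered here, since doc.paragraphs does not include tables, headers, or footers.

```python
def replace_in_paragraphs(paragraphs, old, new):
    """Replace `old` with `new` inside each run; return the number of runs changed.

    Editing run.text (rather than paragraph.text) keeps each run's formatting.
    Text that spans two runs is NOT matched by this simple approach.
    """
    changed = 0
    for paragraph in paragraphs:
        for run in paragraph.runs:
            if old in run.text:
                run.text = run.text.replace(old, new)
                changed += 1
    return changed


def replace_in_docx(src_path, dst_path, old, new):
    """Hypothetical wrapper: apply the replacement to body paragraphs and save."""
    # python-docx is imported lazily so the helper above has no hard dependency.
    from docx import Document

    doc = Document(src_path)
    count = replace_in_paragraphs(doc.paragraphs, old, new)
    doc.save(dst_path)
    return count
```

For example, replace_in_docx("input.docx", "output.docx", "CLIENT", "Acme") writes a copy with the placeholder replaced and returns how many runs were touched. A return value of 0 usually means the text is split across runs.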
For Professional/Legal Documents (Redlining)
MANDATORY WORKFLOW for contracts, legal documents, or any document requiring tracked changes:
- Read ooxml.md completely
- Unpack: python ooxml/scripts/unpack.py input.docx unpacked/
- Edit XML using the Document class
- Pack: python ooxml/scripts/pack.py unpacked/ output.docx
Redlining Best Practices
- Group changes into 3-10 item batches
- Organize by section, type, or proximity
- Use minimal, precise edits
- Only changed text receives tracked markup
- Preserve ALL original formatting elements
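One possible way to apply the batching guideline is sketched below. The function batch_changes is hypothetical. It assumes the changes are already ordered by section, type, or proximity, and it splits them into near-equal batches of at most 10, so a long list never ends with a tiny leftover batch.

```python
import math


def batch_changes(changes, max_size=10):
    """Split an ordered list of changes into near-equal batches of at most max_size.

    With more than max_size changes, every batch holds at least max_size // 2
    items. A list with fewer than 3 changes simply yields one small batch.
    """
    if not changes:
        return []
    n_batches = math.ceil(len(changes) / max_size)
    base, extra = divmod(len(changes), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional item.
        size = base + (1 if i < extra else 0)
        batches.append(changes[start:start + size])
        start += size
    return batches
```

With 25 changes this produces batches of 9, 8, and 8 items rather than 10, 10, and 5.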
Tracked Changes XML
Insertion
xml
<w:ins w:author="Claude" w:date="2024-01-15T10:30:00Z">
<w:r><w:t>new text</w:t></w:r>
</w:ins>
Deletion
xml
<w:del w:author="Claude" w:date="2024-01-15T10:30:00Z">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
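The two fragments above can be generated programmatically. This is a minimal sketch that mirrors their shape exactly; the function names are hypothetical and are not the Document class from the repository. It escapes the text and stamps a UTC date. Documents produced by Word typically also carry a w:id attribute on these elements, which is left out here because the examples above omit it.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape, quoteattr


def _timestamp(now=None):
    """Format a timezone-aware datetime (default: current time) as UTC with a Z suffix."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def tracked_insertion(text, author="Claude", now=None):
    """Return a w:ins fragment wrapping `text` (hypothetical helper)."""
    return (
        f'<w:ins w:author={quoteattr(author)} w:date="{_timestamp(now)}">'
        f"<w:r><w:t>{escape(text)}</w:t></w:r></w:ins>"
    )


def tracked_deletion(text, author="Claude", now=None):
    """Return a w:del fragment; deleted text goes in w:delText, not w:t."""
    return (
        f'<w:del w:author={quoteattr(author)} w:date="{_timestamp(now)}">'
        f"<w:r><w:delText>{escape(text)}</w:delText></w:r></w:del>"
    )
```

If the inserted or deleted text starts or ends with a space, adding xml:space="preserve" to the w:t or w:delText element keeps that whitespace from being dropped.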
Visual Analysis
Convert DOCX to images for visual inspection:
bash
libreoffice --headless --convert-to pdf input.docx
pdftoppm -jpeg -r 150 input.pdf output
Dependencies
| Tool | Purpose |
|---|---|
| pandoc | Text extraction to markdown |
| docx (npm) | JavaScript document creation |
| python-docx | Python document manipulation |
| LibreOffice | PDF conversion |
| poppler-utils | PDF to image conversion |
| defusedxml | Secure XML parsing |