DOCX Skill

Create, edit, and analyze Word documents (.docx files) with support for tracked changes, comments, and formatting preservation.

Reading/Analyzing DOCX

Extract Text Content

bash

pandoc input.docx -t markdown -o output.md
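pandoc is the primary route. When only plain paragraph text is needed, or pandoc is unavailable, the body can be read directly, because a .docx is a ZIP archive whose main content lives in word/document.xml. The sketch below is a rough standard-library fallback. `paragraph_texts` is a hypothetical helper name. It skips tabs, line breaks, fields and deleted text (<w:delText>). For untrusted files, `defusedxml.ElementTree.fromstring` can replace `ET.fromstring`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def paragraph_texts(docx):
    """Return the text of every paragraph (including table cells), in document order.

    `docx` may be a path or a binary file-like object. Only <w:t> elements are
    joined, so tabs, breaks, fields and <w:delText> content are left out.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return ["".join(t.text or "" for t in p.iter(f"{W}t"))
            for p in root.iter(f"{W}p")]
```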

Access Raw XML (for comments, formatting, structure, metadata)

bash

python ooxml/scripts/unpack.py input.docx unpacked/

Then read the relevant XML files:

•word/document.xml — main content
•word/comments.xml — comments
•word/styles.xml — style definitions
•docProps/core.xml — metadata
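Because a .docx is a ZIP archive, these parts can also be read without unpacking to disk. This is a minimal standard-library sketch, assuming a hypothetical helper `list_comments`, that pulls each comment's id, author and text out of word/comments.xml. That part is absent when a document has no comments.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_comments(docx):
    """Return (id, author, text) for each <w:comment> in word/comments.xml.

    Paragraphs inside one comment are joined with newlines. Returns an empty
    list when the package has no comments part.
    """
    with zipfile.ZipFile(docx) as zf:
        if "word/comments.xml" not in zf.namelist():
            return []
        root = ET.fromstring(zf.read("word/comments.xml"))
    comments = []
    for c in root.iter(f"{W}comment"):
        text = "\n".join("".join(t.text or "" for t in p.iter(f"{W}t"))
                         for p in c.iter(f"{W}p"))
        comments.append((c.get(f"{W}id"), c.get(f"{W}author"), text))
    return comments
```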

Creating New DOCX

IMPORTANT: Before using JavaScript/TypeScript to create documents, read the complete docx-js.md file (~500 lines).

Basic Structure

javascript

const fs = require('fs');
const { Document, Paragraph, TextRun, Packer } = require('docx');

const doc = new Document({
    sections: [{
        children: [
            new Paragraph({
                children: [new TextRun("Hello World")],
            }),
        ],
    }],
});

// Packer.toBuffer returns a Promise; top-level await is not available in CommonJS.
Packer.toBuffer(doc).then((buffer) => {
    fs.writeFileSync("output.docx", buffer);
});

Editing Existing DOCX

For Basic Changes

Use Python's python-docx library for simple text replacements that do not need tracked changes.

For Professional/Legal Documents (Redlining)

MANDATORY WORKFLOW for contracts, legal documents, or any document requiring tracked changes:

•Read ooxml.md completely
•Unpack: python ooxml/scripts/unpack.py input.docx unpacked/
•Edit XML using the Document class
•Pack: python ooxml/scripts/pack.py unpacked/ output.docx

Redlining Best Practices

•Group changes into 3-10 item batches
•Organize by section, type, or proximity
•Use minimal, precise edits
•Apply tracked-change markup only to text that actually changes
•Preserve ALL original formatting elements (for example, copy the original run's <w:rPr> into any new runs)
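The last three practices can be sketched together. The hypothetical `minimal_redline` below compares old and new text word by word and wraps only the differing middle span in <w:del>/<w:ins>. It copies the original run's <w:rPr> into every run it emits. It assumes the edit sits inside a single run and is one contiguous change. The caller must keep the w:id values unique in the document.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def _tokens(text):
    # Words and the whitespace between them; "".join(_tokens(s)) == s.
    return re.findall(r"\s+|\S+", text)


def minimal_redline(old, new, rpr="", author="Claude",
                    date="2024-01-15T10:30:00Z", first_id=1):
    """Return runs that turn `old` into `new`, tracking only the changed words.

    `rpr` is the original run's <w:rPr> XML (or ""), repeated in every run so
    formatting is preserved. Deletion/insertion ids start at `first_id`.
    """
    a, b = _tokens(old), _tokens(new)
    # Count shared leading tokens.
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    # Count shared trailing tokens that do not overlap the prefix.
    s = 0
    while s < min(len(a), len(b)) - p and a[-1 - s] == b[-1 - s]:
        s += 1
    prefix = "".join(a[:p])
    deleted = "".join(a[p:len(a) - s])
    inserted = "".join(b[p:len(b) - s])
    suffix = "".join(a[len(a) - s:])

    attrs = f"w:author={quoteattr(author)} w:date={quoteattr(date)}"
    parts, next_id = [], first_id
    if prefix:  # unchanged text stays an ordinary run
        parts.append(f'<w:r>{rpr}<w:t xml:space="preserve">{escape(prefix)}</w:t></w:r>')
    if deleted:
        parts.append(f'<w:del w:id="{next_id}" {attrs}><w:r>{rpr}'
                     f'<w:delText xml:space="preserve">{escape(deleted)}</w:delText></w:r></w:del>')
        next_id += 1
    if inserted:
        parts.append(f'<w:ins w:id="{next_id}" {attrs}><w:r>{rpr}'
                     f'<w:t xml:space="preserve">{escape(inserted)}</w:t></w:r></w:ins>')
    if suffix:
        parts.append(f'<w:r>{rpr}<w:t xml:space="preserve">{escape(suffix)}</w:t></w:r>')
    return "".join(parts)
```

Changing "30 days" to "60 days" this way marks only "30" as deleted and "60" as inserted. The rest of the sentence stays in ordinary runs.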

Tracked Changes XML

Insertion

xml

<w:ins w:id="1" w:author="Claude" w:date="2024-01-15T10:30:00Z">
    <w:r><w:t>new text</w:t></w:r>
</w:ins>

Deletion

xml

<w:del w:id="2" w:author="Claude" w:date="2024-01-15T10:30:00Z">
    <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
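You can check what a paragraph will read like once its tracked changes are accepted or rejected by reading both views back out of its XML. This is a minimal standard-library sketch using a hypothetical `tracked_views` helper. The fragment must declare the w: namespace, as it does inside document.xml. Nested insertions and deletions by different authors are not handled.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tracked_views(paragraph_xml):
    """Return (accepted_text, original_text) for one <w:p> element.

    Accepted view: <w:t> text everywhere, including inside <w:ins>; <w:delText> dropped.
    Original view: <w:t> text outside <w:ins>, plus <w:delText>.
    """
    root = ET.fromstring(paragraph_xml)
    accepted, original = [], []

    def walk(elem, inside_ins):
        for child in elem:
            if child.tag == f"{W}t":
                accepted.append(child.text or "")
                if not inside_ins:
                    original.append(child.text or "")
            elif child.tag == f"{W}delText":
                original.append(child.text or "")
            else:
                walk(child, inside_ins or child.tag == f"{W}ins")

    walk(root, False)
    return "".join(accepted), "".join(original)
```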

Visual Analysis

Convert DOCX to images for visual inspection:

bash

libreoffice --headless --convert-to pdf input.docx
pdftoppm -jpeg -r 150 input.pdf output
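The same two steps can be scripted from Python. This sketch uses hypothetical helper names. It adds LibreOffice's --outdir option so the PDF lands next to the images. It assumes the binaries are on PATH as libreoffice and pdftoppm; some installations name LibreOffice soffice. pdftoppm names its output <prefix>-<n>.jpg.

```python
import subprocess
from pathlib import Path


def render_commands(docx_path, outdir, dpi=150):
    """Build the LibreOffice and pdftoppm argument lists for one .docx file."""
    docx_path, outdir = Path(docx_path), Path(outdir)
    # LibreOffice writes <stem>.pdf into --outdir.
    pdf_path = outdir / f"{docx_path.stem}.pdf"
    return [
        ["libreoffice", "--headless", "--convert-to", "pdf",
         "--outdir", str(outdir), str(docx_path)],
        # pdftoppm writes <prefix>-<page>.jpg, zero-padding page numbers to equal width.
        ["pdftoppm", "-jpeg", "-r", str(dpi), str(pdf_path),
         str(outdir / docx_path.stem)],
    ]


def render_pages(docx_path, outdir, dpi=150):
    """Run both commands and return the page images sorted by file name."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for cmd in render_commands(docx_path, outdir, dpi):
        subprocess.run(cmd, check=True)
    return sorted(Path(outdir).glob(f"{Path(docx_path).stem}-*.jpg"))
```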

Dependencies

Tool	Purpose
pandoc	Text extraction to markdown
docx (npm)	JavaScript document creation
python-docx	Python document manipulation
LibreOffice	PDF conversion
poppler-utils	PDF to image conversion
defusedxml	Secure XML parsing
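This is a quick way to see which of these dependencies are installed. The sketch assumes the executable names pandoc, libreoffice or soffice, and pdftoppm. It also assumes the import names docx (for python-docx) and defusedxml. The npm docx package is not checked.

```python
import importlib.util
import shutil

# Command-line tools from the table, looked up on PATH.
# LibreOffice is often installed as "soffice" rather than "libreoffice".
CLI_TOOLS = {
    "pandoc": ["pandoc"],
    "LibreOffice": ["libreoffice", "soffice"],
    "poppler-utils": ["pdftoppm"],
}
# Python packages, keyed by the name they are imported under.
PY_MODULES = {"python-docx": "docx", "defusedxml": "defusedxml"}


def check_dependencies():
    """Map each dependency name to True if it appears to be installed."""
    status = {name: any(shutil.which(exe) for exe in exes)
              for name, exes in CLI_TOOLS.items()}
    for name, module in PY_MODULES.items():
        status[name] = importlib.util.find_spec(module) is not None
    return status
```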