AgentSkillsCN

Docx

Docx

SKILL.md

DOCX Skill

Create, edit, and analyze Word documents (.docx files) with support for tracked changes, comments, and formatting preservation.


Reading/Analyzing DOCX

Extract Text Content

bash
pandoc input.docx -t markdown -o output.md

Access Raw XML (for comments, formatting, structure, metadata)

bash
python ooxml/scripts/unpack.py input.docx unpacked/

Then read the relevant XML files:

  • word/document.xml — main content
  • word/comments.xml — comments
  • word/styles.xml — style definitions
  • docProps/core.xml — metadata

Creating New DOCX

IMPORTANT: Before using JavaScript/TypeScript to create documents, read the complete docx-js.md file (~500 lines).

Basic Structure

javascript
const { Document, Paragraph, TextRun, Packer } = require('docx');

const doc = new Document({
    sections: [{
        children: [
            new Paragraph({
                children: [new TextRun("Hello World")],
            }),
        ],
    }],
});

const buffer = await Packer.toBuffer(doc);
fs.writeFileSync("output.docx", buffer);

Editing Existing DOCX

For Basic Changes

Use Python's python-docx library for simple text replacements.

For Professional/Legal Documents (Redlining)

MANDATORY WORKFLOW for contracts, legal documents, or any document requiring tracked changes:

  1. Read ooxml.md completely
  2. Unpack: python ooxml/scripts/unpack.py input.docx unpacked/
  3. Edit XML using the Document class
  4. Pack: python ooxml/scripts/pack.py unpacked/ output.docx

Redlining Best Practices

  • Group changes into 3-10 item batches
  • Organize by section, type, or proximity
  • Use minimal, precise edits
  • Only changed text receives tracked markup
  • Preserve ALL original formatting elements

Tracked Changes XML

Insertion

xml
<w:ins w:author="Claude" w:date="2024-01-15T10:30:00Z">
    <w:r><w:t>new text</w:t></w:r>
</w:ins>

Deletion

xml
<w:del w:author="Claude" w:date="2024-01-15T10:30:00Z">
    <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>

Visual Analysis

Convert DOCX to images for visual inspection:

bash
libreoffice --headless --convert-to pdf input.docx
pdftoppm -jpeg -r 150 input.pdf output

Dependencies

ToolPurpose
pandocText extraction to markdown
docx (npm)JavaScript document creation
python-docxPython document manipulation
LibreOfficePDF conversion
poppler-utilsPDF to image conversion
defusedxmlSecure XML parsing