DOCX Document Analysis Skill

Name: docx_document_analysis
Rating: 78
Author: niranjannav

Quick Start — Extract Text

python

from docx import Document

doc = Document('document.docx')
for para in doc.paragraphs:
    if para.text.strip():
        print(f"[{para.style.name}] {para.text}")

Extract with Structure

python

from docx import Document

doc = Document('document.docx')
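for para in doc.paragraphs:
    style_name = para.style.name
    # Built-in heading styles are named "Heading 1" through "Heading 9"
    level = style_name.replace('Heading ', '', 1)
    if style_name.startswith('Heading ') and level.isdigit():
        print(f"\n{'#' * int(level)} {para.text}")
    elif para.text.strip():
        print(para.text)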

Extract Tables

python

from docx import Document

doc = Document('document.docx')
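# doc.tables lists top-level tables only; tables nested in a cell are not included
for i, table in enumerate(doc.tables):
    print(f"\nTable {i+1}:")
    for row in table.rows:
        # A horizontally merged cell is returned once per grid column it spans
        cells = [cell.text.strip() for cell in row.cells]
        print(" | ".join(cells))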

Extract Document Metadata

python

from docx import Document

doc = Document('document.docx')
props = doc.core_properties
print(f"Title: {props.title}")
print(f"Author: {props.author}")
print(f"Created: {props.created}")
print(f"Modified: {props.modified}")
# Counts cover top-level body content; paragraphs inside table cells are not included
print(f"Paragraphs: {len(doc.paragraphs)}")
print(f"Tables: {len(doc.tables)}")

Search for Content

python

from docx import Document

doc = Document('document.docx')
search_term = "conclusion"
for i, para in enumerate(doc.paragraphs):
    if search_term.lower() in para.text.lower():
        print(f"Paragraph {i} [{para.style.name}]: {para.text[:200]}")

Analysis Guidelines

• Headings indicate document structure; use them to navigate content
• Tables often contain key data; always extract and analyze them
• Track list numbering and cross-references; automatic list numbers are generated when the document is rendered, so they do not appear in paragraph text
• Note images and charts that the text references but that cannot be extracted as text
• Comments and tracked changes may contain important editorial context
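
The last two guidelines can be sketched with the standard library alone, by reading the package as a ZIP archive and scanning word/document.xml. This is a minimal sketch, not part of the original skill: the helper names tracked_changes and embedded_media are hypothetical, and the media listing assumes the conventional word/media/ folder that Word uses (charts are usually stored elsewhere in the package).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tracked_changes(docx_file):
    """Return (insertions, deletions), each a list of (author, text) tuples.

    Hypothetical helper: scans <w:ins> and <w:del> revision elements.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    def collect(change_tag, text_tag):
        found = []
        for change in root.iter(W + change_tag):
            text = "".join(t.text or "" for t in change.iter(W + text_tag))
            if text:  # skip revision markers that carry no text
                found.append((change.get(W + "author"), text))
        return found

    # Inserted text lives in <w:t>, deleted text in <w:delText>
    return collect("ins", "t"), collect("del", "delText")


def embedded_media(docx_file):
    """List package parts under word/media/ (assumed location of images)."""
    with zipfile.ZipFile(docx_file) as zf:
        return [n for n in zf.namelist() if n.startswith("word/media/")]
```

A call such as insertions, deletions = tracked_changes('document.docx') gives the pending edits with their authors, and embedded_media('document.docx') shows whether the document carries images that the text extraction above will not surface.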