Legislative Text Flattener

This skill processes Word documents (DOCX) containing legislative or regulatory text and converts hierarchical structures into a flat, numbered list of discrete requirements or provisions.

When to Use This Skill

Activate this skill when the user wants to:

•Flatten hierarchical legislative or regulatory text
•Extract requirements from compliance frameworks
•Convert nested sections/subsections into a linear list
•Prepare legislative text for requirement mapping
•Process Word documents containing legal or regulatory content

Quick Start

Install required dependencies:

bash

pip install python-docx openpyxl

Basic usage (outputs formatted XLSX by default):

bash

python flattener_utility.py input.docx output.xlsx

Or specify format explicitly:

bash

python flattener_utility.py input.docx output.xlsx --format xlsx
python flattener_utility.py input.docx output.csv --format csv
python flattener_utility.py input.docx output.json --format json

Processing Instructions

1. Input Analysis

First, understand the input document structure:

•Read the DOCX file (use python-docx library or convert to text)
•Identify the hierarchical structure (sections, subsections, paragraphs, sub-paragraphs)
•Detect numbering schemes (1.2.3, (a)(b)(c), Article-Section-Paragraph, etc.)
•Note any special formatting or emphasis (bold, italics, SHALL/MUST keywords)

2. Extraction Process

Extract content while preserving context:

•Each discrete requirement or provision becomes a separate item
•Maintain parent context for nested items
•Preserve the original numbering/reference system
•Extract complete sentences or logical requirement units
•Identify normative language (shall, must, should, may)

3. Flattening Strategy

Convert hierarchy to flat structure:

•Assign sequential flat numbering (1, 2, 3...)
•Include original reference in metadata (e.g., "Section 500.02(b)(3)")
•Preserve full context path for each item
•Handle multi-level lists and nested requirements
•Maintain logical groupings where appropriate

4. Output Format

Generate a structured flat list with these fields for each requirement:

code

Flat ID: [Sequential number]
Original Reference: [Original section/subsection identifier]
Context Path: [Full hierarchical path, e.g., "Article 5 > Section 2 > Paragraph b"]
Requirement Type: [Mandate/Prohibition/Permission/Definition]
Normative Level: [SHALL/MUST/SHOULD/MAY/INFORMATIVE]
Text: [Full requirement text]
Keywords: [Extracted key concepts or compliance areas]
---

5. Implementation Approach

Use this workflow:

python

# Install required library if needed
# pip install python-docx

from docx import Document
import re

def flatten_legislative_text(docx_path):
    """
    Flattens hierarchical legislative text from DOCX.
    Returns a list of flattened requirements.
    """
    doc = Document(docx_path)
    flattened = []
    flat_id = 1
    context_stack = []

    for paragraph in doc.paragraphs:
        # Skip empty paragraphs
        if not paragraph.text.strip():
            continue

        # Detect hierarchy level (by style or numbering)
        level = detect_level(paragraph)
        original_ref = extract_reference(paragraph)

        # Update context stack
        update_context(context_stack, level, paragraph.text)

        # Extract if it's a requirement (not just a heading)
        if is_requirement(paragraph):
            item = {
                'flat_id': flat_id,
                'original_ref': original_ref,
                'context_path': ' > '.join(context_stack),
                'requirement_type': classify_requirement(paragraph.text),
                'normative_level': extract_normative_level(paragraph.text),
                'text': clean_text(paragraph.text),
                'keywords': extract_keywords(paragraph.text)
            }
            flattened.append(item)
            flat_id += 1

    return flattened

6. Output Options

Provide results in user's preferred format:

•XLSX (Excel): Formatted Excel workbook with styled headers, auto-filter, frozen panes, and optimized column widths (default and recommended)
•CSV: Simple tabular format with all fields as columns
•JSON: Structured data for programmatic use
•Markdown: Human-readable with clear section breaks
•Database insert: SQL statements or database-ready format

7. Quality Checks

Validate the flattening:

•Ensure no requirements are lost or duplicated
•Verify context paths are complete and accurate
•Check that normative language is correctly identified
•Confirm original references are preserved
•Review that similar requirements are consistently formatted

Example Output (Markdown Format)

markdown

## Flattened Requirements

### Requirement 1
- **Flat ID**: 1
- **Original Reference**: Section 500.02(a)
- **Context Path**: Part 500 > Section 500.02 > Paragraph (a)
- **Requirement Type**: Mandate
- **Normative Level**: SHALL
- **Text**: Each Covered Entity shall maintain a cybersecurity program designed to protect the confidentiality, integrity and availability of the Covered Entity's Information Systems.
- **Keywords**: cybersecurity program, confidentiality, integrity, availability, Information Systems

---

### Requirement 2
- **Flat ID**: 2
- **Original Reference**: Section 500.02(b)
- **Context Path**: Part 500 > Section 500.02 > Paragraph (b)
- **Requirement Type**: Mandate
- **Normative Level**: SHALL
- **Text**: The cybersecurity program shall be based on the Covered Entity's Risk Assessment and designed to perform the following core cybersecurity functions...
- **Keywords**: Risk Assessment, core cybersecurity functions

---

Notes

•
XLSX output is recommended for most use cases as it provides:
- •Professional formatting with styled headers
- •Auto-filter for easy data exploration
- •Frozen header row for scrolling large datasets
- •Optimized column widths for readability
- •Text wrapping for long requirement text
•Handle tables within documents by processing each cell as potential requirement text
•Preserve cross-references between sections where they exist
•Flag ambiguous or incomplete requirements for manual review
•Support batch processing of multiple documents
•Maintain a processing log of any items that couldn't be automatically classified

XLSX Features

The XLSX export includes:

•Header styling: Bold white text on blue background
•Auto-filter: Filter requirements by any column
•Frozen panes: Header row stays visible when scrolling
•Column widths: Optimized for each field type (10-60 characters)
•Text wrapping: Long text automatically wraps for readability
•Borders: Clean grid lines for professional appearance

Supporting Files

See examples.md for sample input/output pairs and template.md for customizable output templates.