Legislative Text Flattener
This skill processes Word documents (DOCX) containing legislative or regulatory text and converts hierarchical structures into a flat, numbered list of discrete requirements or provisions.
When to Use This Skill
Activate this skill when the user wants to:
- •Flatten hierarchical legislative or regulatory text
- •Extract requirements from compliance frameworks
- •Convert nested sections/subsections into a linear list
- •Prepare legislative text for requirement mapping
- •Process Word documents containing legal or regulatory content
Quick Start
Install required dependencies:
pip install python-docx openpyxl
Basic usage (outputs formatted XLSX by default):
python flattener_utility.py input.docx output.xlsx
Or specify format explicitly:
python flattener_utility.py input.docx output.xlsx --format xlsx python flattener_utility.py input.docx output.csv --format csv python flattener_utility.py input.docx output.json --format json
Processing Instructions
1. Input Analysis
First, understand the input document structure:
- •Read the DOCX file (use python-docx library or convert to text)
- •Identify the hierarchical structure (sections, subsections, paragraphs, sub-paragraphs)
- •Detect numbering schemes (1.2.3, (a)(b)(c), Article-Section-Paragraph, etc.)
- •Note any special formatting or emphasis (bold, italics, SHALL/MUST keywords)
2. Extraction Process
Extract content while preserving context:
- •Each discrete requirement or provision becomes a separate item
- •Maintain parent context for nested items
- •Preserve the original numbering/reference system
- •Extract complete sentences or logical requirement units
- •Identify normative language (shall, must, should, may)
3. Flattening Strategy
Convert hierarchy to flat structure:
- •Assign sequential flat numbering (1, 2, 3...)
- •Include original reference in metadata (e.g., "Section 500.02(b)(3)")
- •Preserve full context path for each item
- •Handle multi-level lists and nested requirements
- •Maintain logical groupings where appropriate
4. Output Format
Generate a structured flat list with these fields for each requirement:
Flat ID: [Sequential number] Original Reference: [Original section/subsection identifier] Context Path: [Full hierarchical path, e.g., "Article 5 > Section 2 > Paragraph b"] Requirement Type: [Mandate/Prohibition/Permission/Definition] Normative Level: [SHALL/MUST/SHOULD/MAY/INFORMATIVE] Text: [Full requirement text] Keywords: [Extracted key concepts or compliance areas] ---
5. Implementation Approach
Use this workflow:
# Install required library if needed
# pip install python-docx
from docx import Document
import re
def flatten_legislative_text(docx_path):
"""
Flattens hierarchical legislative text from DOCX.
Returns a list of flattened requirements.
"""
doc = Document(docx_path)
flattened = []
flat_id = 1
context_stack = []
for paragraph in doc.paragraphs:
# Skip empty paragraphs
if not paragraph.text.strip():
continue
# Detect hierarchy level (by style or numbering)
level = detect_level(paragraph)
original_ref = extract_reference(paragraph)
# Update context stack
update_context(context_stack, level, paragraph.text)
# Extract if it's a requirement (not just a heading)
if is_requirement(paragraph):
item = {
'flat_id': flat_id,
'original_ref': original_ref,
'context_path': ' > '.join(context_stack),
'requirement_type': classify_requirement(paragraph.text),
'normative_level': extract_normative_level(paragraph.text),
'text': clean_text(paragraph.text),
'keywords': extract_keywords(paragraph.text)
}
flattened.append(item)
flat_id += 1
return flattened
6. Output Options
Provide results in user's preferred format:
- •XLSX (Excel): Formatted Excel workbook with styled headers, auto-filter, frozen panes, and optimized column widths (default and recommended)
- •CSV: Simple tabular format with all fields as columns
- •JSON: Structured data for programmatic use
- •Markdown: Human-readable with clear section breaks
- •Database insert: SQL statements or database-ready format
7. Quality Checks
Validate the flattening:
- •Ensure no requirements are lost or duplicated
- •Verify context paths are complete and accurate
- •Check that normative language is correctly identified
- •Confirm original references are preserved
- •Review that similar requirements are consistently formatted
Example Output (Markdown Format)
## Flattened Requirements ### Requirement 1 - **Flat ID**: 1 - **Original Reference**: Section 500.02(a) - **Context Path**: Part 500 > Section 500.02 > Paragraph (a) - **Requirement Type**: Mandate - **Normative Level**: SHALL - **Text**: Each Covered Entity shall maintain a cybersecurity program designed to protect the confidentiality, integrity and availability of the Covered Entity's Information Systems. - **Keywords**: cybersecurity program, confidentiality, integrity, availability, Information Systems --- ### Requirement 2 - **Flat ID**: 2 - **Original Reference**: Section 500.02(b) - **Context Path**: Part 500 > Section 500.02 > Paragraph (b) - **Requirement Type**: Mandate - **Normative Level**: SHALL - **Text**: The cybersecurity program shall be based on the Covered Entity's Risk Assessment and designed to perform the following core cybersecurity functions... - **Keywords**: Risk Assessment, core cybersecurity functions ---
Notes
- •XLSX output is recommended for most use cases as it provides:
- •Professional formatting with styled headers
- •Auto-filter for easy data exploration
- •Frozen header row for scrolling large datasets
- •Optimized column widths for readability
- •Text wrapping for long requirement text
- •Handle tables within documents by processing each cell as potential requirement text
- •Preserve cross-references between sections where they exist
- •Flag ambiguous or incomplete requirements for manual review
- •Support batch processing of multiple documents
- •Maintain a processing log of any items that couldn't be automatically classified
XLSX Features
The XLSX export includes:
- •Header styling: Bold white text on blue background
- •Auto-filter: Filter requirements by any column
- •Frozen panes: Header row stays visible when scrolling
- •Column widths: Optimized for each field type (10-60 characters)
- •Text wrapping: Long text automatically wraps for readability
- •Borders: Clean grid lines for professional appearance
Supporting Files
See examples.md for sample input/output pairs and template.md for customizable output templates.