HTML Structure Validate Skill
Purpose
This skill is a BLOCKING quality gate that ensures generated HTML meets minimum structural requirements. It is the first deterministic validation of probabilistic AI-generated output.
The skill checks:
- HTML5 compliance: proper DOCTYPE and tags
- Tag closure: all tags properly closed
- Required elements: meta tags and stylesheet link
- Well-formedness: valid overall structure
If validation fails, the pipeline STOPS and triggers a hook to notify the user.
This enforces the principle that Python, not the AI, does the validating, so quality checks stay deterministic.
What to Do
1. Load the HTML file to validate
   - Read 04_page_XX.html generated by the AI skill
   - Verify the file exists and is readable
   - Confirm the file is text (not binary)
2. Run validation checks
   - Check HTML5 structure compliance
   - Verify tag closure
   - Validate head section
   - Check required CSS link
   - Validate page container structure
3. Generate validation report
   - Document all checks performed
   - List any errors found
   - Note warnings (non-blocking)
   - Record informational findings
4. Save validation report as JSON
   - Save to: output/chapter_XX/page_artifacts/page_YY/06_validation_structure.json
   - Include timestamp
   - Include all check results
5. Exit with appropriate code
   - Return 0 if VALID (continue pipeline)
   - Return 1 if INVALID (STOP pipeline, trigger hook)
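Steps 1 and 5 can be sketched in Python as below. This is a minimal illustration, not the code of validate_html.py: the helper names load_html_text and exit_code_for are hypothetical, and treating any NUL byte as a sign of a binary file (and UTF-8 as the encoding) is an assumption.

```python
from pathlib import Path


def load_html_text(path: str) -> str:
    """Step 1 (hypothetical helper): verify the file exists, is readable, and is text."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"HTML file not found: {path}")
    raw = p.read_bytes()  # raises PermissionError if the file is unreadable
    # Assumption: a NUL byte means the file is binary rather than HTML text.
    if b"\x00" in raw:
        raise ValueError(f"File appears to be binary: {path}")
    # Assumption: pages are UTF-8, matching the <meta charset="UTF-8"> requirement.
    # UnicodeDecodeError is a ValueError subclass, so callers can catch ValueError.
    return raw.decode("utf-8")


def exit_code_for(is_valid: bool) -> int:
    """Step 5: 0 lets the pipeline continue, 1 stops it and triggers the hook."""
    return 0 if is_valid else 1
```

A caller would typically finish with sys.exit(exit_code_for(is_valid)) so the pipeline sees the 0/1 convention listed above.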
Input Parameters
html_file: <str>     - Path to 04_page_XX.html
output_dir: <str>    - Directory for validation report
strict_mode: <bool>  - If true, warnings also fail (default: false)
page_number: <int>   - Page number (for reporting)
chapter: <int>       - Chapter number (for reporting)
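The parameters above, and the effect of strict_mode on the outcome, could be modelled as in this sketch. The dataclass and the overall_status helper are hypothetical names; only the field names and the "warnings also fail in strict mode" rule come from this skill.

```python
from dataclasses import dataclass


@dataclass
class StructureValidationParams:
    """Hypothetical container mirroring this skill's input parameters."""
    html_file: str
    output_dir: str
    page_number: int
    chapter: int
    strict_mode: bool = False  # default: false


def overall_status(error_count: int, warning_count: int, strict_mode: bool) -> str:
    """Errors always fail; warnings fail only when strict_mode is true."""
    if error_count > 0:
        return "FAIL"
    if strict_mode and warning_count > 0:
        return "FAIL"
    return "PASS"
```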
Validation Checks
Check 1: DOCTYPE Declaration
Requirement: File must start with proper DOCTYPE
<!DOCTYPE html>
Checks:
- File contains <!DOCTYPE html> (case-insensitive)
- DOCTYPE appears before any tags
- DOCTYPE is on the first line or near the beginning
Error if: Missing or incorrect DOCTYPE
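A minimal sketch of Check 1, assuming a regex is acceptable for this one-line test. check_doctype is a hypothetical name, and it requires the DOCTYPE to be the first non-whitespace content (after an optional byte-order mark), which is slightly stricter than "near the beginning".

```python
import re

# Case-insensitive match for the HTML5 doctype: <!DOCTYPE html>
DOCTYPE_RE = re.compile(r"<!doctype\s+html\s*>", re.IGNORECASE)


def check_doctype(html: str) -> tuple[bool, str]:
    """Return (passed, details) for Check 1."""
    # Ignore a UTF-8 byte-order mark and leading whitespace before the DOCTYPE.
    stripped = html.lstrip("\ufeff \t\r\n")
    if DOCTYPE_RE.match(stripped):
        return True, "Valid HTML5 DOCTYPE found"
    if DOCTYPE_RE.search(html):
        return False, "DOCTYPE found, but not before the first tag"
    return False, "Missing or incorrect DOCTYPE"
```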
Check 2: HTML Tags
Requirement: Proper <html> opening and closing tags
<html lang="en">
...
</html>
Checks:
- <html> tag present
- </html> closing tag present
- Tags are properly paired
- No unclosed <html> tags
Error if: Missing either tag or improperly paired
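Check 2 can be approximated by counting opening and closing <html> tags, as sketched here. check_html_tags is a hypothetical helper; because it is a plain regex scan, it does not know about comments, so an <html> inside <!-- --> would be counted.

```python
import re

# <html> or <html lang="en">, but not <htmlx> or <!DOCTYPE html>
HTML_OPEN_RE = re.compile(r"<html(?:\s[^>]*)?>", re.IGNORECASE)
HTML_CLOSE_RE = re.compile(r"</html\s*>", re.IGNORECASE)


def check_html_tags(html: str) -> tuple[bool, str]:
    """Return (passed, details) for Check 2."""
    opens = len(HTML_OPEN_RE.findall(html))
    closes = len(HTML_CLOSE_RE.findall(html))
    if opens == 0 or closes == 0:
        return False, f"Missing <html> tag ({opens} opening, {closes} closing)"
    if opens != 1 or closes != 1:
        return False, f"Improperly paired <html> tags ({opens} opening, {closes} closing)"
    # The closing tag must come after the opening tag.
    if HTML_CLOSE_RE.search(html).start() < HTML_OPEN_RE.search(html).start():
        return False, "</html> appears before <html>"
    return True, "Proper <html> opening and closing tags"
```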
Check 3: Head Section
Requirement: Complete <head> section with metadata
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>...</title>
<link rel="stylesheet" href="../../styles/main.css">
</head>
Checks:
- <head> and </head> tags present
- <meta charset="UTF-8"> present
- <meta name="viewport"> present (warning if missing)
- <title> tag with content present
- CSS <link> tag present with href attribute
Error if: Missing charset, title, or CSS link
Warning if: Missing viewport meta tag
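One way to run Check 3 with the standard-library html.parser module is sketched below. HeadInspector and check_head are hypothetical names; the sketch only looks at elements that appear between <head> and </head>, and treats a <link> as the CSS link when its rel contains "stylesheet" and it has a non-empty href.

```python
from html.parser import HTMLParser


class HeadInspector(HTMLParser):
    """Records which Check 3 elements appear inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.saw_head_open = False
        self.saw_head_close = False
        self.has_charset = False
        self.has_viewport = False
        self.has_css_link = False
        self.title_text = ""

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. <input disabled>) arrive as None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "head":
            self.in_head = True
            self.saw_head_open = True
        elif not self.in_head:
            return
        elif tag == "meta":
            if a.get("charset", "").lower() == "utf-8":
                self.has_charset = True
            if a.get("name", "").lower() == "viewport":
                self.has_viewport = True
        elif tag == "title":
            self.in_title = True
        elif tag == "link":
            rels = a.get("rel", "").lower().split()
            if "stylesheet" in rels and a.get("href"):
                self.has_css_link = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
            self.saw_head_close = True
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_text += data


def check_head(html: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for Check 3."""
    p = HeadInspector()
    p.feed(html)
    p.close()
    errors, warnings = [], []
    if not (p.saw_head_open and p.saw_head_close):
        errors.append("Missing <head> or </head> tag")
    if not p.has_charset:
        errors.append('Missing <meta charset="UTF-8">')
    if not p.title_text.strip():
        errors.append("Missing or empty <title>")
    if not p.has_css_link:
        errors.append("Missing stylesheet <link> with href")
    if not p.has_viewport:
        warnings.append('Missing <meta name="viewport">')
    return errors, warnings
```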
Check 4: Body Section
Requirement: Proper <body> tags with content
<body>
<div class="page-container">
<main class="page-content">
...
</main>
</div>
</body>
Checks:
- <body> and </body> tags present
- <div class="page-container"> present
- <main class="page-content"> present inside container
- Body contains substantial content (> 100 bytes)
Error if: Missing tags or required container divs
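Check 4 can be sketched by tracking open elements while parsing, so that <main class="page-content"> only counts when an enclosing <div> carries the page-container class. BodyInspector and check_body are hypothetical names; measuring "> 100 bytes" as the UTF-8 size of everything between <body> and </body>, markup included, is an assumption.

```python
import re
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}
BODY_RE = re.compile(r"<body\b[^>]*>(.*)</body\s*>", re.IGNORECASE | re.DOTALL)


class BodyInspector(HTMLParser):
    """Tracks <body>, the page container, and <main> placement for Check 4."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, bool]] = []  # (tag, is_page_container)
        self.saw_body_open = False
        self.saw_body_close = False
        self.has_container = False
        self.main_in_container = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        is_container = tag == "div" and "page-container" in classes
        if tag == "body":
            self.saw_body_open = True
        if is_container:
            self.has_container = True
        if tag == "main" and "page-content" in classes:
            # Only counts if some enclosing element is the page container.
            if any(flag for _, flag in self.stack):
                self.main_in_container = True
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, is_container))

    def handle_endtag(self, tag):
        if tag == "body":
            self.saw_body_close = True
        # Pop back to the most recent matching open tag; strict closure is Check 5's job.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def check_body(html: str) -> list[str]:
    """Return the list of Check 4 errors (empty means PASS)."""
    inspector = BodyInspector()
    inspector.feed(html)
    inspector.close()
    errors = []
    if not (inspector.saw_body_open and inspector.saw_body_close):
        errors.append("Missing <body> or </body> tag")
    if not inspector.has_container:
        errors.append('Missing <div class="page-container">')
    if not inspector.main_in_container:
        errors.append('Missing <main class="page-content"> inside the container')
    match = BODY_RE.search(html)
    size = len(match.group(1).encode("utf-8")) if match else 0
    if size <= 100:
        errors.append(f"Body content too small ({size} bytes)")
    return errors
```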
Check 5: Tag Closure Validation
Requirement: All tags must be properly closed
Checks for:
- Unmatched opening tags (e.g., <p> without </p>)
- Improper nesting (e.g., <p><h2>text</h2></p>)
- Self-closing tags used correctly (e.g., <br/>, <img/>)
- Comment blocks properly formatted (<!-- -->)
Validation method:
- Parse HTML into tree structure
- Verify all nodes properly matched
- Check nesting doesn't violate HTML5 rules
Error if: Any unmatched or improperly nested tags
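The validation method above is essentially a stack matcher over parser events, sketched here with html.parser. ClosureChecker and check_tag_closure are hypothetical names. Two simplifications are assumptions: only a hand-picked set of block elements is treated as illegal inside an open <p>, and comment formatting is not checked. Note also that HTML5 itself makes some end tags (such as </p>) optional; like this skill, the sketch still requires them.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}
# Block elements that may not appear inside an open <p> (subset, assumption).
CLOSES_P = {"address", "article", "aside", "blockquote", "div", "dl",
            "fieldset", "footer", "form", "h1", "h2", "h3", "h4", "h5",
            "h6", "header", "main", "nav", "ol", "p", "pre", "section",
            "table", "ul"}


class ClosureChecker(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, int]] = []  # (tag, line) of open elements
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag in CLOSES_P and any(t == "p" for t, _ in self.stack):
            self.errors.append(f"line {line}: <{tag}> nested inside <p>")
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, line))

    def handle_startendtag(self, tag, attrs):
        # <br/> and <img/> are fine; <div/> and similar are not void elements.
        if tag not in VOID_ELEMENTS:
            self.errors.append(f"line {self.getpos()[0]}: <{tag}/> is not a void element")

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if tag in VOID_ELEMENTS:
            self.errors.append(f"line {line}: void element </{tag}> must not be closed")
            return
        if tag not in (t for t, _ in self.stack):
            self.errors.append(f"line {line}: stray closing tag </{tag}>")
            return
        # Anything opened after the matching tag was never closed.
        while self.stack[-1][0] != tag:
            open_tag, open_line = self.stack.pop()
            self.errors.append(f"line {open_line}: <{open_tag}> not closed before </{tag}>")
        self.stack.pop()

    def finish(self) -> list[str]:
        self.close()
        for open_tag, open_line in self.stack:
            self.errors.append(f"line {open_line}: <{open_tag}> never closed")
        return self.errors


def check_tag_closure(html: str) -> list[str]:
    checker = ClosureChecker()
    checker.feed(html)
    return checker.finish()
```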
Check 6: Heading Tags (h1-h6)
Requirement: Valid heading hierarchy
<h1>Chapter Title</h1>
<h2>Section Heading</h2>
<h3>Subsection</h3>
Checks:
- All heading tags properly closed
- First heading should be h1 (warning if not)
- Heading levels don't skip dramatically (h1 → h4 is suspicious)
- All headings have text content (not empty)
Error if: Heading tags improperly closed
Warning if: Suspicious hierarchy
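A regex-based sketch of Check 6 follows. check_headings is a hypothetical helper, and three choices are assumptions: a jump of more than two levels counts as "suspicious", empty headings are logged as warnings (the skill does not state their severity), and headings inside comments are not excluded.

```python
import re

HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>", re.IGNORECASE | re.DOTALL)
HEADING_OPEN_RE = re.compile(r"<h[1-6]\b", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
MAX_LEVEL_JUMP = 2  # assumption: h1 -> h3 is tolerated, h1 -> h4 is flagged


def check_headings(html: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for Check 6."""
    errors, warnings = [], []
    levels = []
    for match in HEADING_RE.finditer(html):
        level = int(match.group(1))
        levels.append(level)
        if not TAG_RE.sub("", match.group(2)).strip():
            warnings.append(f"Empty <h{level}> heading")
    # Every opening <hN> should have produced one matched pair above.
    if len(levels) != len(HEADING_OPEN_RE.findall(html)):
        errors.append("Heading tag(s) improperly closed")
    if levels and levels[0] != 1:
        warnings.append(f"First heading is h{levels[0]}, typically should be h1 for page opening")
    for prev, cur in zip(levels, levels[1:]):
        if cur - prev > MAX_LEVEL_JUMP:
            warnings.append(f"Heading jump h{prev} -> h{cur}")
    return errors, warnings
```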
Check 7: Content Structure
Requirement: Meaningful content in page container
Checks:
- <main class="page-content"> contains elements
- Content includes headings or paragraphs
- No completely empty content area
- Text nodes or elements present (> 100 words total)
Error if: No content or empty structure
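The word and element counts for Check 7 could be gathered as in this sketch. MainContentCounter and check_content are hypothetical names; splitting text nodes on whitespace is an assumed definition of a "word".

```python
from html.parser import HTMLParser

MIN_WORDS = 100  # "> 100 words total" from Check 7
TEXT_BLOCKS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


class MainContentCounter(HTMLParser):
    """Counts words and child elements inside <main class="page-content">."""

    def __init__(self) -> None:
        super().__init__()
        self.main_depth = 0  # > 0 while inside the content area
        self.words = 0
        self.element_counts: dict[str, int] = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.main_depth == 0:
            if tag == "main" and "page-content" in classes:
                self.main_depth = 1
            return
        self.element_counts[tag] = self.element_counts.get(tag, 0) + 1
        if tag == "main":
            self.main_depth += 1  # a nested <main> must not end the area early

    def handle_endtag(self, tag):
        if tag == "main" and self.main_depth > 0:
            self.main_depth -= 1

    def handle_data(self, data):
        if self.main_depth > 0:
            self.words += len(data.split())


def check_content(html: str) -> tuple[list[str], int]:
    """Return (errors, word_count) for Check 7."""
    counter = MainContentCounter()
    counter.feed(html)
    counter.close()
    errors = []
    if not counter.element_counts:
        errors.append("Main content area is empty")
    elif not any(tag in TEXT_BLOCKS for tag in counter.element_counts):
        errors.append("Main content has no headings or paragraphs")
    if counter.words <= MIN_WORDS:
        errors.append(f"Only {counter.words} words in main content (need > {MIN_WORDS})")
    return errors, counter.words
```

The returned word count can also be reused in the report details (e.g. "Main content area contains 245 words").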
Check 8: List Integrity
Requirement: All lists properly structured
Checks for each <ul> or <ol>:
- List opening and closing tags matched
- List contains <li> elements
- All <li> tags properly closed
- <li> count matches opening/closing pairs
- No nested <ul> or <ol> improperly closed
Error if: Empty lists or unmatched <li> tags
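The list checks map naturally onto a stack of open lists plus a counter of open <li> tags, as in this sketch. ListChecker and check_lists are hypothetical names; HTML5 allows </li> to be omitted, but the sketch follows this skill and requires it.

```python
from html.parser import HTMLParser


class ListChecker(HTMLParser):
    """Tracks <ul>/<ol> nesting and <li> pairing for Check 8."""

    def __init__(self) -> None:
        super().__init__()
        self.lists: list[list] = []  # stack of [tag, li_count]
        self.open_li = 0             # <li> opened but not yet closed
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.lists.append([tag, 0])
        elif tag == "li":
            if self.lists:
                self.lists[-1][1] += 1  # count items of the innermost list
            else:
                self.errors.append("<li> outside of <ul>/<ol>")
            self.open_li += 1

    def handle_endtag(self, tag):
        if tag == "li":
            if self.open_li == 0:
                self.errors.append("</li> without matching <li>")
            else:
                self.open_li -= 1
        elif tag in ("ul", "ol"):
            if not self.lists or self.lists[-1][0] != tag:
                self.errors.append(f"</{tag}> does not match the open list")
                return
            _, li_count = self.lists.pop()
            if li_count == 0:
                self.errors.append(f"Empty <{tag}> list")

    def finish(self) -> list[str]:
        self.close()
        for tag, _ in self.lists:
            self.errors.append(f"<{tag}> never closed")
        if self.open_li:
            self.errors.append(f"{self.open_li} <li> tag(s) never closed")
        return self.errors


def check_lists(html: str) -> list[str]:
    checker = ListChecker()
    checker.feed(html)
    return checker.finish()
```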
Check 9: Image and Link Tags
Requirement: Image and link tags properly formatted
Checks:
- All <img> tags have src and alt attributes
- All <a> tags have valid href attributes
- Image paths don't have obvious errors (no broken syntax)
- Self-closing tags use proper syntax
Warning if: Images missing alt text or links missing href
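A sketch of Check 9 that only produces warnings, matching the severity above. ImageLinkChecker and check_images_links are hypothetical names. Two choices are assumptions: a missing src is also logged as a warning, and a src containing spaces, backslashes, quotes, or angle brackets is treated as "broken syntax". An empty alt="" is accepted because it is legitimate for decorative images.

```python
from html.parser import HTMLParser

# Assumption: these characters in a src usually indicate broken path syntax.
SUSPICIOUS_SRC_CHARS = set(' \\"<>')


class ImageLinkChecker(HTMLParser):
    """Collects Check 9 warnings for <img> and <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.warnings: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        line = self.getpos()[0]
        if tag == "img":
            src = (a.get("src") or "").strip()
            if not src:
                self.warnings.append(f"line {line}: <img> missing src")
            elif SUSPICIOUS_SRC_CHARS & set(src):
                self.warnings.append(f"line {line}: <img> src looks malformed: {src!r}")
            # Only a missing alt attribute is flagged; alt="" is allowed.
            if "alt" not in a:
                self.warnings.append(f"line {line}: <img> missing alt")
        elif tag == "a" and not (a.get("href") or "").strip():
            self.warnings.append(f"line {line}: <a> missing href")


def check_images_links(html: str) -> list[str]:
    checker = ImageLinkChecker()
    checker.feed(html)
    checker.close()
    return checker.warnings
```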
Check 10: Table Tags (if present)
Requirement: Proper table structure
Checks:
- <table>, <tr>, <td>, <th> tags properly nested
- All rows have consistent column counts
- Table headers and body properly structured
Error if: Malformed table structure
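Column consistency can be checked by summing each row's cells, as in this sketch. TableChecker and check_tables are hypothetical names. Counting colspan toward a row's width is an assumption, rowspan is ignored (so rows covered by a rowspan would look short), and thead/tbody structure is not examined beyond cell nesting.

```python
from html.parser import HTMLParser


class TableChecker(HTMLParser):
    """Checks row/cell nesting and column counts for Check 10."""

    def __init__(self) -> None:
        super().__init__()
        # One entry per open <table>: widths of finished rows, width of the open row.
        self.tables: list[dict] = []
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append({"widths": [], "row": None})
        elif tag == "tr":
            if not self.tables:
                self.errors.append("<tr> outside <table>")
                return
            if self.tables[-1]["row"] is not None:
                self.errors.append("<tr> not closed before next <tr>")
            self.tables[-1]["row"] = 0
        elif tag in ("td", "th"):
            if not self.tables or self.tables[-1]["row"] is None:
                self.errors.append(f"<{tag}> outside <tr>")
                return
            span = (dict(attrs).get("colspan") or "1").strip()
            self.tables[-1]["row"] += int(span) if span.isdigit() else 1

    def handle_endtag(self, tag):
        if not self.tables:
            return
        table = self.tables[-1]
        if tag == "tr" and table["row"] is not None:
            table["widths"].append(table["row"])
            table["row"] = None
        elif tag == "table":
            self.tables.pop()
            if table["row"] is not None:
                self.errors.append("<tr> not closed before </table>")
            if len(set(table["widths"])) > 1:
                self.errors.append(f"Inconsistent column counts: {table['widths']}")


def check_tables(html: str) -> list[str]:
    checker = TableChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + ["<table> never closed"] * len(checker.tables)
```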
Validation Report Format
Output: 06_validation_structure.json
{
"page": 16,
"book_page": 17,
"chapter": 2,
"validation_type": "structure",
"validation_timestamp": "2025-11-08T14:34:00Z",
"overall_status": "PASS",
"error_count": 0,
"warning_count": 1,
"checks_performed": [
{
"check_name": "DOCTYPE Declaration",
"status": "PASS",
"details": "Valid HTML5 DOCTYPE found"
},
{
"check_name": "HTML Tags",
"status": "PASS",
"details": "Proper <html> opening and closing tags"
},
{
"check_name": "Head Section",
"status": "PASS",
"details": "All required meta tags and title present"
},
{
"check_name": "Body Section",
"status": "PASS",
"details": "Body and content structure valid"
},
{
"check_name": "Tag Closure",
"status": "PASS",
"details": "All tags properly matched and closed"
},
{
"check_name": "Heading Hierarchy",
"status": "WARNING",
"details": "4 headings found; first heading is h2, not h1"
},
{
"check_name": "Content Structure",
"status": "PASS",
"details": "Main content area contains 245 words across 3 paragraphs"
},
{
"check_name": "List Integrity",
"status": "PASS",
"details": "1 list with 3 items, all properly formed"
},
{
"check_name": "Image Tags",
"status": "PASS",
"details": "No images on this page"
},
{
"check_name": "Table Tags",
"status": "PASS",
"details": "No tables on this page"
}
],
"errors": [],
"warnings": [
{
"check": "Heading Hierarchy",
"message": "First heading is h2, typically should be h1 for page opening",
"severity": "LOW"
}
],
"summary": {
"total_checks": 10,
"passed": 9,
"failed": 0,
"warnings": 1,
"html_valid": true,
"tags_matched": true,
"content_substantial": true
}
}
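A report with this shape could be assembled and written as sketched below. build_report and save_report are hypothetical helpers; the sketch omits book_page and the extra summary flags (html_valid, tags_matched, content_substantial), and it assumes a check that produced only warnings is recorded with status "WARNING", which is how 9 passed, 0 failed, and 1 warning add up to 10 checks.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_report(page: int, chapter: int, checks: list[dict], errors: list[dict],
                 warnings: list[dict], strict_mode: bool = False) -> dict:
    """Assemble a dict shaped like the 06_validation_structure.json example."""
    statuses = [c["status"] for c in checks]
    failed = bool(errors) or (strict_mode and bool(warnings))
    return {
        "page": page,
        "chapter": chapter,
        "validation_type": "structure",
        "validation_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "overall_status": "FAIL" if failed else "PASS",
        "error_count": len(errors),
        "warning_count": len(warnings),
        "checks_performed": checks,
        "errors": errors,
        "warnings": warnings,
        "summary": {
            "total_checks": len(checks),
            "passed": statuses.count("PASS"),
            "failed": statuses.count("FAIL"),
            "warnings": statuses.count("WARNING"),
        },
    }


def save_report(report: dict, output_dir: str) -> Path:
    """Write the report as 06_validation_structure.json in output_dir."""
    out = Path(output_dir) / "06_validation_structure.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return out
```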
Validation Rules
PASS Criteria
- DOCTYPE present and valid
- All required tags (html, head, body, main, div.page-container) present
- All tags properly closed and matched
- Title tag with content
- CSS stylesheet link present
- Content structure valid
- No structural errors
FAIL Criteria (BLOCKS PIPELINE)
- Missing DOCTYPE
- Missing required tags
- Unmatched or improperly nested tags
- Missing title or CSS link
- Empty content
- Malformed lists or tables
WARNING (Logged but doesn't block, unless strict_mode is true)
- Missing viewport meta tag
- First heading is not h1
- Large heading jumps (h1 → h4)
- Missing alt text on images
- Missing href on links
Implementation: Using Python Script
This validation is performed by the existing validate_html.py tool, run in structure validation mode:
cd Calypso/tools

# Validate single page HTML
python3 validate_html.py \
  ../output/chapter_02/page_artifacts/page_16/04_page_16.html \
  --output-json ../output/chapter_02/page_artifacts/page_16/06_validation_structure.json \
  --strict-structure

# Exit code:
#   0 = VALID (continue to next skill)
#   1 = INVALID (STOP pipeline)
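An orchestrator written in Python might wrap the command above as sketched here. run_structure_gate is a hypothetical name; it reuses only the flags shown in the command, substitutes sys.executable for python3, and assumes paths are given relative to Calypso/tools as in the example.

```python
import subprocess
import sys


def run_structure_gate(html_path: str, report_path: str,
                       tools_dir: str = "Calypso/tools") -> bool:
    """Run validate_html.py in structure mode; True means continue the pipeline."""
    result = subprocess.run(
        [sys.executable, "validate_html.py", html_path,
         "--output-json", report_path, "--strict-structure"],
        cwd=tools_dir,        # paths are relative to the tools directory
        capture_output=True,  # keep the tool's console output for logging
        text=True,
    )
    # Exit code 0 = VALID; anything else is treated as INVALID (stop the pipeline).
    return result.returncode == 0
```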
Hook Integration
When validation FAILS:
# Trigger hook: .claude/hooks/validate-structure.sh
# Receives:
#   - Page number
#   - HTML file path
#   - Validation report path
#   - Error details
# Hook behavior:
#   - Log failure with details
#   - Save error report
#   - Notify user
#   - STOP pipeline (no further processing)
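How the hook receives its inputs is not specified here. The sketch below assumes it takes them as positional arguments in the order listed, with the error details serialized as a JSON string; trigger_structure_hook is a hypothetical helper.

```python
import json
import subprocess


def trigger_structure_hook(page_number: int, html_file: str, report_path: str,
                           errors: list[str],
                           hook: str = ".claude/hooks/validate-structure.sh") -> int:
    """Invoke the failure hook (argument order is an assumption) and return its exit code."""
    result = subprocess.run(
        ["bash", hook, str(page_number), html_file, report_path, json.dumps(errors)],
        check=False,  # the caller decides what a failing hook means
    )
    return result.returncode
```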
Error Recovery
If validation fails:
- User reviews validation report
- User identifies issue in AI-generated HTML
- Options:
  - Fix HTML manually and re-validate
  - Re-run AI generation with improved prompt
  - Review source extraction data for errors
  - Proceed with caution (expert override)
Quality Metrics
Validation provides metrics:
- Percentage of checks passing
- Error severity levels
- Content size (word count, element count)
- Structure complexity
These metrics feed into final quality reports.
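The first two metrics can be derived directly from a saved report, as in this sketch. quality_metrics is a hypothetical helper; with the example report above (9 of 10 checks passed, one LOW warning) it yields a 90.0% pass rate and one LOW finding.

```python
from collections import Counter


def quality_metrics(report: dict) -> dict:
    """Derive pass rate and finding severities from a structure validation report."""
    summary = report["summary"]
    total = summary["total_checks"]
    pass_rate = round(100.0 * summary["passed"] / total, 1) if total else 0.0
    # Errors and warnings share the same severity field in the report format.
    severities = Counter(item.get("severity", "UNSPECIFIED")
                         for item in report["errors"] + report["warnings"])
    return {"pass_rate_pct": pass_rate, "findings_by_severity": dict(severities)}
```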
Success Criteria
✓ Validation completes successfully
✓ All structural checks pass (0 errors)
✓ Validation report saved in JSON format
✓ Exit code 0 returned (or 1 if invalid)
✓ Clear error messages if validation fails
Next Steps After PASS
If validation passes:
- All pages of the chapter are processed through this gate
- Skill 4 (consolidate pages) merges individual page HTMLs
- Quality Gate 2 (semantic validate) checks semantic structure
- Continue through the validation pipeline
Next Steps After FAIL
If validation fails:
- PIPELINE STOPS
- Hook validate-structure.sh triggered
- User receives error report with details
- User must fix issues and retry
Design Notes
- This is the first deterministic quality gate
- Uses the proven validate_html.py tool
- Catches structural issues before semantic analysis
- Provides clear, actionable error messages
- Essential for ensuring pipeline reliability
Testing
To test structure validation:
# Test with known-good HTML
python3 validate_html.py ../output/chapter_01/chapter_01.html
# Should show: ✓ VALID

# Test with invalid HTML (if needed)
python3 validate_html.py broken_html.html
# Should show: ✗ INVALID with specific errors