template-analyzer SKILL.md
This skill analyzes uploaded document templates and produces a Template Schema JSON describing sections, fields, types, and replacement strategy so downstream skills (intent-classifier, prompt-generator, content-generator, renderer) can operate automatically.
Goals
- •Accept a template file (hwpx/docx/html/md/txt) and produce a JSON Template Schema.
- •Detect placeholders and candidate fields using heuristics + LLM fallback.
- •Annotate each field with: field_id, label, type, required, replacement (bulk|sequential), constraints, example, prompt_hint.
- •Provide an interactive review step (auto-suggest then user confirms/edits schema).
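The field annotations listed above can be pictured as a small record type. The following is a minimal sketch, not the skill's actual data model: the class name `TemplateField`, the default values, and the `"text"` type label are assumptions made for illustration; only the attribute names and the bulk/sequential choice come from the list above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional

REPLACEMENT_MODES = ("bulk", "sequential")


@dataclass
class TemplateField:
    """One annotated field; attribute names mirror the list above."""
    field_id: str
    label: str
    type: str = "text"            # the type vocabulary is an assumption
    required: bool = False
    replacement: str = "bulk"     # "bulk" or "sequential"
    constraints: Dict[str, Any] = field(default_factory=dict)
    example: Optional[str] = None
    prompt_hint: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject anything other than the two documented strategies.
        if self.replacement not in REPLACEMENT_MODES:
            raise ValueError(f"replacement must be one of {REPLACEMENT_MODES}")


title = TemplateField(
    field_id="title",
    label="Report title",
    required=True,
    constraints={"max_length": 80},   # hypothetical constraint key
    example="Q3 Sales Report",
    prompt_hint="One-line report title",
)
print(json.dumps(asdict(title), indent=2, ensure_ascii=False))
```

Validating `replacement` at construction time keeps malformed suggestions out of the schema before the review step sees them.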
Core components
- •File loaders/parsers
- •HWPX: extract at the ZIP level, and use hwpx.ObjectFinder when it is available. To enumerate the text inside t tags, prefer a ZIP-level text search of Contents/*.xml.
- •DOCX: use python-docx to inspect paragraphs, runs, tables, and placeholders.
- •HTML: use BeautifulSoup to extract headings, placeholders ({{...}}), lists, and tables.
- •Markdown/plain text: parse headings (#/##), lists, and placeholder tokens with lightweight line-based rules.
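As an illustration of the ZIP-level text search for HWPX, here is a minimal sketch that uses only the standard library. The function name `iter_t_text` is hypothetical, the regular expression is a text-level shortcut rather than a full XML parse, and the demo archive is a hand-built stand-in, not a real HWPX file.

```python
import io
import re
import zipfile
from html import unescape

# Matches <t>...</t> with any namespace prefix; a text-level search, not an XML parse.
T_TAG = re.compile(r"<(?:\w+:)?t(?:\s[^>]*)?>(.*?)</(?:\w+:)?t>", re.DOTALL)
INNER_TAG = re.compile(r"<[^>]+>")


def iter_t_text(source):
    """Yield (member_name, text) for each non-empty t element under Contents/."""
    with zipfile.ZipFile(source) as archive:
        for name in sorted(archive.namelist()):
            if not (name.startswith("Contents/") and name.endswith(".xml")):
                continue
            xml = archive.read(name).decode("utf-8")
            for match in T_TAG.finditer(xml):
                # Drop any markup nested inside the t element, then decode entities.
                text = unescape(INNER_TAG.sub("", match.group(1))).strip()
                if text:
                    yield name, text


# Stand-in archive for demonstration only.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "stand-in")
    zf.writestr(
        "Contents/section0.xml",
        "<hp:p><hp:run><hp:t>{{title}}</hp:t></hp:run></hp:p>"
        "<hp:p><hp:run><hp:t>Author &amp; date</hp:t></hp:run></hp:p>",
    )
buf.seek(0)
found = list(iter_t_text(buf))
print(found)
```

Because the search works on raw text, it also sees placeholders that a higher-level object model might split or normalize, which is the reason for preferring it when enumerating candidates.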
- •Heuristics extractor
- •Headings -> sections
- •Bold/ALLCAPS/large-font -> candidate field labels
- •Placeholder tokens like {{field}} or repeated short phrases -> placeholder fields
- •Tables -> table-type fields
- •Repeated identical text -> candidate for sequential replacement
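Three of the heuristics above (headings, placeholder tokens, repeated text) can be sketched for Markdown input as follows. This is an illustrative reading, not the skill's implementation: the function name `extract_candidates`, the output keys, and the rule "more than one occurrence suggests sequential replacement" are assumptions.

```python
import re
from collections import Counter

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.-]*)\s*\}\}")


def extract_candidates(markdown_text):
    """Return candidate sections and placeholder fields for a Markdown template."""
    sections = []
    counts = Counter()
    first_section = {}
    current = None
    for line in markdown_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(2)
            sections.append({"title": current, "level": len(heading.group(1))})
        for name in PLACEHOLDER.findall(line):
            counts[name] += 1
            first_section.setdefault(name, current)
    fields = [
        {
            "field_id": name,
            "section": first_section[name],
            "occurrences": n,
            # Repeated identical tokens are only *candidates* for sequential replacement.
            "replacement": "sequential" if n > 1 else "bulk",
        }
        for name, n in counts.items()
    ]
    return {"sections": sections, "fields": fields}


sample = """# Weekly report
Author: {{author}}
## Items
- {{item}}
- {{item}}
"""
result = extract_candidates(sample)
print(result)
```

The output is a suggestion for the review step; a repeated token may still be meant for bulk replacement, so the reviewer has the final say.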
- •LLM assistance (optional)
- •When the structure is ambiguous, call an LLM with a prompt such as: "Suggest a JSON schema of fields and their types for this template. Provide examples."
- •Save the LLM output as a suggestion only; require human confirmation before it becomes part of the schema.
- •Output: a Template Schema JSON structure (see templates/report_template_hwpx_v1.json for an example)
Interactive review
- •After auto-generation, write the suggested JSON to /workspace/tmp and open it for user review. Provide a CLI to accept, modify, or reject each field.
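One possible shape for the review step is sketched below. The function names `write_suggestion` and `review_fields`, the prompt wording, and the `.suggested.json` file suffix are hypothetical. The `ask` parameter defaults to `input` and is replaced with scripted answers here so the example runs without a terminal; a temporary directory stands in for /workspace/tmp.

```python
import json
import tempfile
from pathlib import Path


def write_suggestion(schema, tmp_dir, template_id):
    """Write the suggested schema where the user can inspect it."""
    path = Path(tmp_dir) / f"{template_id}.suggested.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(schema, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def review_fields(fields, ask=input):
    """Ask accept/modify/reject for each field; any other answer accepts."""
    kept = []
    for item in fields:
        answer = ask(f"{item['field_id']}: [a]ccept / [m]odify / [r]eject? ").strip().lower()
        if answer.startswith("r"):
            continue
        if answer.startswith("m"):
            new_label = ask("  new label (blank keeps current): ").strip()
            if new_label:
                item = {**item, "label": new_label}
        kept.append(item)
    return kept


fields = [
    {"field_id": "title", "label": "Title"},
    {"field_id": "author", "label": "Autor"},
    {"field_id": "footer", "label": "Footer"},
]
suggestion_path = write_suggestion({"fields": fields}, tempfile.mkdtemp(), "user_template_v1")

# Scripted answers: accept title, relabel author, reject footer.
scripted = iter(["a", "m", "Author", "r"])
reviewed = review_fields(fields, ask=lambda prompt: next(scripted))
print(suggestion_path.name, reviewed)
```

Injecting `ask` keeps the review logic testable and lets the same function back a different front end later.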
Implementation notes
- •Always preserve the original template file: copy it to a working path before making any modifications.
- •For HWPX: prefer ZIP-level string replacement. Provide the utilities zip_replace and zip_replace_sequential (examples are included in hwpx_my SKILL.md).
- •Save schema files under workspace/templates/{template_id}.json
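To make the bulk versus sequential distinction concrete, here is a hedged sketch of what the two utilities might look like. The signatures and behavior below are assumptions for illustration and may differ from the versions in hwpx_my SKILL.md; in particular, escaping replacement values for XML and restricting edits to Contents/*.xml are choices made here.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from xml.sax.saxutils import escape


def _rewrite_contents(src, dst, transform):
    """Copy src to dst (different paths), transforming only Contents/*.xml."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.startswith("Contents/") and info.filename.endswith(".xml"):
                data = transform(data.decode("utf-8")).encode("utf-8")
            zout.writestr(info, data)  # reusing ZipInfo keeps the compression type


def zip_replace(src, dst, old, new):
    """Bulk: every occurrence of old gets the same value."""
    _rewrite_contents(src, dst, lambda xml: xml.replace(old, escape(new)))


def zip_replace_sequential(src, dst, old, values):
    """Sequential: the n-th occurrence of old gets the n-th value."""
    remaining = [escape(v) for v in values]

    def transform(xml):
        parts = xml.split(old)
        out = [parts[0]]
        for part in parts[1:]:
            # Leave the token in place once the values run out.
            out.append(remaining.pop(0) if remaining else old)
            out.append(part)
        return "".join(out)

    _rewrite_contents(src, dst, transform)


workdir = Path(tempfile.mkdtemp())
original = workdir / "template.hwpx"          # stand-in archive, not a real HWPX
with zipfile.ZipFile(original, "w") as zf:
    zf.writestr("Contents/section0.xml", "<hp:t>{{title}}</hp:t><hp:t>ITEM</hp:t><hp:t>ITEM</hp:t>")

working = workdir / "working.hwpx"
shutil.copy2(original, working)               # the original is never modified
step1 = workdir / "step1.hwpx"
zip_replace(working, step1, "{{title}}", "R&D report")
final = workdir / "final.hwpx"
zip_replace_sequential(step1, final, "ITEM", ["first", "second"])

with zipfile.ZipFile(final) as zf:
    result = zf.read("Contents/section0.xml").decode("utf-8")
print(result)
```

Writing each step to a new path, rather than rewriting an archive in place, keeps every intermediate file intact if a later step fails.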
Validation & tests
- •Provide unit tests for parser modules using sample templates in assets/
Security & safety
- •Do not send sensitive template text to remote services unless the user explicitly allows it (LLM calls are optional and require user consent in secure settings).
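The consent rule can be enforced at the single point where template text would leave the machine. The sketch below is one way to do that under stated assumptions: `suggest_schema`, the `llm_call` callable, and the returned dictionary keys are hypothetical, and no real LLM client is used. The prompt string is the one quoted under LLM assistance.

```python
from typing import Callable, Optional

PROMPT = (
    "Suggest a JSON schema of fields and their types for this template. "
    "Provide examples."
)


def suggest_schema(
    template_text: str,
    llm_call: Optional[Callable[[str], str]],
    user_consented: bool,
):
    """Return an unconfirmed suggestion, or None if no call is permitted."""
    # Without explicit consent, the template text never reaches llm_call.
    if not user_consented or llm_call is None:
        return None
    raw = llm_call(f"{PROMPT}\n\n{template_text}")
    # The output is only a suggestion until a human confirms it.
    return {"status": "suggestion", "confirmed": False, "raw": raw}


sent = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a remote client; records what would have been sent."""
    sent.append(prompt)
    return '{"fields": []}'


denied = suggest_schema("Title: {{title}}", fake_llm, user_consented=False)
allowed = suggest_schema("Title: {{title}}", fake_llm, user_consented=True)
print(denied, allowed["confirmed"], len(sent))
```

Passing consent as an explicit argument, rather than reading a global setting inside the function, makes each call site show whether remote access was approved.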
Example usage (CLI)
python -m template_analyzer.analyze --input uploads/user_template.hwpx --output templates/user_template_v1.json --interactive
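For orientation, the flags in the command above could be parsed along these lines. This is a sketch with `argparse`, not the module's actual entry point: the help texts, the `detect_format` helper, and the format labels are assumptions; only the three flag names and the five file extensions come from this document.

```python
import argparse
from pathlib import Path

# Extensions from the Goals list; the labels on the right are illustrative.
SUPPORTED = {
    ".hwpx": "hwpx",
    ".docx": "docx",
    ".html": "html",
    ".md": "markdown",
    ".txt": "plain",
}


def detect_format(path):
    """Pick a loader label from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported template format: {suffix or path}")
    return SUPPORTED[suffix]


def build_parser():
    parser = argparse.ArgumentParser(prog="template_analyzer.analyze")
    parser.add_argument("--input", required=True, help="path to the template file")
    parser.add_argument("--output", required=True, help="path for the schema JSON")
    parser.add_argument("--interactive", action="store_true",
                        help="review the suggested schema before saving")
    return parser


args = build_parser().parse_args([
    "--input", "uploads/user_template.hwpx",
    "--output", "templates/user_template_v1.json",
    "--interactive",
])
print(detect_format(args.input), args.output, args.interactive)
```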