AgentSkillsCN

template-analyzer

模板分析器

SKILL.md
--- frontmatter
name: template-analyzer
summary: Analyze user-provided document templates (hwpx/docx/html/md/plain) and produce a machine-readable Template Schema (JSON). Designed to work with the hwpx_my skill conventions.

template-analyzer SKILL.md

This skill analyzes uploaded document templates and produces a Template Schema JSON describing sections, fields, types, and replacement strategy so downstream skills (intent-classifier, prompt-generator, content-generator, renderer) can operate automatically.

Goals

  • Accept a template file (hwpx/docx/html/md/txt) and produce a JSON Template Schema.
  • Detect placeholders and candidate fields using heuristics + LLM fallback.
  • Annotate each field with: field_id, label, type, required, replacement (bulk|sequential), constraints, example, prompt_hint.
  • Provide an interactive review step (auto-suggest then user confirms/edits schema).

Core components

  1. File loaders/parsers

    • HWPX: use ZIP-level extraction + hwpx.ObjectFinder when available. Prefer ZIP-level text search of Contents/*.xml to enumerate t-tag text.
    • DOCX: python-docx to inspect paragraphs, runs, tables, and placeholders.
    • HTML: BeautifulSoup to extract headings, placeholders ({{...}}), lists and tables.
    • Markdown/Plain: lightweight parsing of headings (#/##), lists, and placeholder tokens.
  2. Heuristics extractor

    • Headings -> sections
    • Bold/ALLCAPS/large-font -> candidate field labels
    • Placeholder tokens like {{field}} or repeated short phrases -> placeholder fields
    • Tables -> table-type fields
    • Repeated identical text -> candidate for sequential replacement
  3. LLM assistance (optional)

    • When structure is ambiguous, call LLM: "Suggest a JSON schema of fields and their types for this template. Provide examples."
    • Save LLM output as suggestion; require human confirmation.
  4. Output: Template Schema JSON structure (see templates/report_template_hwpx_v1.json for example)

Interactive review

  • After auto-generation, write suggested JSON to /workspace/tmp and open for user review. Provide a CLI to accept/modify/reject fields.

Implementation notes

  • Always preserve original template file: copy to a working path before any modifications.
  • For HWPX: prefer ZIP-level string replacement. Provide utilities zip_replace and zip_replace_sequential (examples included in hwpx_my SKILL.md).
  • Save schema files under workspace/templates/{template_id}.json

Validation & tests

  • Provide unit tests for parser modules using sample templates in assets/

Security & safety

  • Do not call remote services with sensitive template text unless user explicitly allows (LLM calls are optional and require user consent in secure settings).

Example usage (CLI)

python -m template_analyzer.analyze --input uploads/user_template.hwpx --output templates/user_template_v1.json --interactive