PDF -> DOCX Paper Summary (LLM-Enforced)

Overview

Generate per-paper Word summaries that match a provided template .docx, while forcing the agent to: (1) read the PDF, (2) write a logical Chinese review (not sentence-by-sentence translation), and (3) pass QA before producing the final docx.

Workflow (per PDF)

Inputs:

•Template docx (e.g. 示例.docx)
•Paper PDF (e.g. pdfs/<paper>.pdf)
•Output folder (recommended: out/<paper-id>/)

Steps:

•Extract a “paper pack” (text excerpts + figures/tables + key facts)

bash

python "${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm/scripts/extract_paper_pack.py" \
  --pdf 'pdfs/<paper>.pdf' \
  --assets-dir 'out/<paper-id>/assets' \
  --out-json 'out/<paper-id>/paper_pack.json'

•Read the PDF (LLM step) and write the config JSON

Create docs/plans/<paper-id>.json following:

•references/config-format.md
•references/writing-standard.md (this is the “示例.docx” writing standard)

Hard rules (do not violate):

•No “原文要点改写 / 补充讨论 / 逐句翻译堆段落”
•Each section must be logical and paper-specific (models/data/metrics/codes/limits)
•Paragraphs must not devolve into one-sentence repetition

•Run QA on the JSON (must pass before building docx)

bash

python "${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm/scripts/qa_config.py" \
  --config 'docs/plans/<paper-id>.json' \
  --pack 'out/<paper-id>/paper_pack.json'

If QA fails, revise the JSON (do not “paper over” errors with generic filler).

•Build the final docx (template-preserving)

bash

python "${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm/scripts/build_paper_docx.py" \
  --template '示例.docx' \
  --config 'docs/plans/<paper-id>.json' \
  --pdf 'pdfs/<paper>.pdf' \
  --out-docx 'out/<paper-id>/<paper-id>_summary.docx' \
  --assets-dir 'out/<paper-id>/assets'

Workflow (batch)

Process a directory of PDFs by looping the per-PDF workflow. In batch mode, do not move on to the next paper until the current paper passes qa_config.py.

Recommended output layout:

code

docs/plans/<paper-id>.json
out/<paper-id>/paper_pack.json
out/<paper-id>/assets/FigXX.png
out/<paper-id>/<paper-id>_summary.docx

Notes

•This skill intentionally refuses generic padding. If you cannot extract enough paper-specific details, go back to the PDF and read deeper (Methods/Results).
•Keep images supportive, not dominant (QA also checks text quality).