
pdf-docx-paper-summary-llm

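LLM-enforced PDF-to-DOCX paper summaries: read each PDF page by page, write a Chinese review-style passage covering research background, research content, and main conclusions, backed by an anti-repetition QA check; extract figures and tables, then render into the specified DOCX template (e.g., 示例.docx). Suited to batch-processing academic PDFs into high-quality, logically clear Word summaries.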

SKILL.md
--- frontmatter
name: pdf-docx-paper-summary-llm
description: "LLM-enforced PDF to DOCX paper summaries: read each PDF, write Chinese review-style sections (研究背景/研究内容/主要结论) with anti-repetition QA, extract figures/tables, and render into a provided template docx (e.g., 示例.docx). Use for batch-processing academic PDFs into high-quality, logic-driven Word summaries."

PDF -> DOCX Paper Summary (LLM-Enforced)

Overview

Generate per-paper Word summaries that match a provided template .docx, while forcing the agent to: (1) read the PDF, (2) write a logical Chinese review (not sentence-by-sentence translation), and (3) pass QA before producing the final docx.

Workflow (per PDF)

Inputs:

  • Template docx (e.g. 示例.docx)
  • Paper PDF (e.g. pdfs/<paper>.pdf)
  • Output folder (recommended: out/<paper-id>/)

Steps:

  1. Extract a “paper pack” (text excerpts + figures/tables + key facts)
bash
python "${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm/scripts/extract_paper_pack.py" \
  --pdf 'pdfs/<paper>.pdf' \
  --assets-dir 'out/<paper-id>/assets' \
  --out-json 'out/<paper-id>/paper_pack.json'
  2. Read the PDF (LLM step) and write the config JSON

Create docs/plans/<paper-id>.json following:

  • references/config-format.md
  • references/writing-standard.md (this is the “示例.docx” writing standard)

Hard rules (do not violate):

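  • No “原文要点改写 / 补充讨论 / 逐句翻译堆段落” (paraphrased key points from the original / supplementary discussion / paragraphs stacked from sentence-by-sentence translation)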
  • Each section must be logical and paper-specific (models/data/metrics/codes/limits)
  • Paragraphs must not devolve into one-sentence repetition
  3. Run QA on the JSON (must pass before building docx)
bash
python "${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm/scripts/qa_config.py" \
  --config 'docs/plans/<paper-id>.json' \
  --pack 'out/<paper-id>/paper_pack.json'

If QA fails, revise the JSON (do not “paper over” errors with generic filler).
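
To make the anti-repetition rule concrete, here is a minimal sketch of the kind of check it implies. It is not the logic of qa_config.py, whose internals are not described here. It is a hypothetical illustration that flags near-duplicate sentences within one paragraph using character-bigram Jaccard overlap. The function names and the 0.8 threshold are assumptions chosen for the example.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Split on Chinese and Western sentence-ending marks (hypothetical helper)."""
    parts = re.split(r"[。！？!?]+", paragraph)
    return [p.strip() for p in parts if p.strip()]


def char_bigrams(sentence: str) -> set[str]:
    """Character bigrams suit Chinese text, which has no word spaces."""
    text = re.sub(r"\s+", "", sentence)
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def repeated_pairs(paragraph: str, threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return sentence pairs whose bigram overlap meets the assumed threshold."""
    sentences = split_sentences(paragraph)
    grams = [char_bigrams(s) for s in sentences]
    pairs = []
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if jaccard(grams[i], grams[j]) >= threshold:
                pairs.append((sentences[i], sentences[j]))
    return pairs
```

A paragraph that returns any pair from `repeated_pairs` is a candidate for rewriting with more paper-specific detail rather than rewording.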

  4. Build the final docx (template-preserving)
bash
python "${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm/scripts/build_paper_docx.py" \
  --template '示例.docx' \
  --config 'docs/plans/<paper-id>.json' \
  --pdf 'pdfs/<paper>.pdf' \
  --out-docx 'out/<paper-id>/<paper-id>_summary.docx' \
  --assets-dir 'out/<paper-id>/assets'

Workflow (batch)

Process a directory of PDFs by looping the per-PDF workflow. In batch mode, do not move on to the next paper until the current paper passes qa_config.py.
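
The "do not move on" rule can be sketched as a small gate. This is a minimal illustration, assuming qa_config.py signals failure with a non-zero exit code (the skill text does not state this). `qa_command`, `run_returncode`, and `first_blocked_paper` are hypothetical names, and `skill_dir` stands for the skill's install directory.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, Optional, Sequence


def qa_command(skill_dir: Path, paper_id: str) -> list[str]:
    """Build the qa_config.py invocation with the flags shown in the workflow."""
    return [
        "python",
        str(skill_dir / "scripts" / "qa_config.py"),
        "--config", f"docs/plans/{paper_id}.json",
        "--pack", f"out/{paper_id}/paper_pack.json",
    ]


def run_returncode(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd).returncode


def first_blocked_paper(
    paper_ids: Iterable[str],
    skill_dir: Path,
    run: Callable[[Sequence[str]], int] = run_returncode,
) -> Optional[str]:
    """Return the first paper id whose QA fails, or None if every paper passes.

    Assumption: a non-zero exit code from qa_config.py means QA failed.
    """
    for paper_id in paper_ids:
        if run(qa_command(skill_dir, paper_id)) != 0:
            return paper_id  # stop here; later papers are not checked
    return None
```

When the function returns a paper id, revise that paper's JSON and rerun before continuing with the rest of the directory.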

Recommended output layout:

code
docs/plans/<paper-id>.json
out/<paper-id>/paper_pack.json
out/<paper-id>/assets/FigXX.png
out/<paper-id>/<paper-id>_summary.docx
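
The layout above can be derived from a paper id with a small helper, so every step reads and writes the same paths. This is a sketch only: `paper_layout` and its dictionary keys are hypothetical names, not part of the skill's scripts.

```python
from pathlib import PurePosixPath


def paper_layout(paper_id: str) -> dict[str, PurePosixPath]:
    """Map a paper id to the recommended config, pack, assets, and docx paths."""
    out_dir = PurePosixPath("out") / paper_id
    return {
        "config": PurePosixPath("docs/plans") / f"{paper_id}.json",
        "pack": out_dir / "paper_pack.json",
        "assets_dir": out_dir / "assets",
        "docx": out_dir / f"{paper_id}_summary.docx",
    }
```

The values can be passed directly as the `--config`, `--pack`/`--out-json`, `--assets-dir`, and `--out-docx` arguments shown in the per-PDF workflow.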

Notes

  • This skill intentionally refuses generic padding. If you cannot extract enough paper-specific details, go back to the PDF and read deeper (Methods/Results).
  • Keep images supportive, not dominant (QA also checks text quality).