Task Goal

Name: paper-convert
Rating: 92
Author: LetItBe12345

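Workflow

• Parse the user request
  - Determine the action: convert / split / group.
  - Determine the input: a single PDF path or a directory.
• Choose the script arguments
  - Single PDF: pass the path directly.
  - Multiple PDFs: pass the directory path (the script scans the directory for PDFs).
• Run the script
  - Script path: /home/jin/.codex/skills/paper-convert/scripts/paper_pipeline.py
  - Rules file (optional): /home/jin/.codex/skills/paper-convert/references/section_rules.json
• Check the output
  - Each PDF produces a directory with the same name.
  - sections/ holds the split section files.
  - _grouped/ holds the sections grouped by section name and numbered.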
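The input handling and output check above can be sketched in Python. This is a minimal sketch under stated assumptions, not the script's actual code: `collect_pdfs` and `expected_section_dir` are hypothetical helpers, the directory scan is assumed to be non-recursive, and `out_root` is an assumed parameter because the source does not say where the same-name directory is created.

```python
from pathlib import Path


def collect_pdfs(target):
    """Return the PDFs for a single-file or directory input (hypothetical helper)."""
    path = Path(target)
    if path.is_dir():
        # Assumption: only the top level of the directory is scanned.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    if path.is_file() and path.suffix.lower() == ".pdf":
        return [path]
    raise ValueError(f"not a PDF file or a directory: {target}")


def expected_section_dir(pdf, out_root):
    """Path where the split sections of one PDF would be expected.

    Assumption: the same-name directory lives directly under out_root.
    """
    return Path(out_root) / Path(pdf).stem / "sections"
```

After a run, comparing `expected_section_dir(pdf, out_root).is_dir()` for each collected PDF gives a quick check that every paper was split.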

• Convert only

bash

python /home/jin/.codex/skills/paper-convert/scripts/paper_pipeline.py --convert <paper.pdf>

• Convert and split

bash

python /home/jin/.codex/skills/paper-convert/scripts/paper_pipeline.py --convert --split <paper.pdf>

• Split only (a temporary conversion still runs, but only the split results are kept)

bash

python /home/jin/.codex/skills/paper-convert/scripts/paper_pipeline.py --split <paper.pdf>

• Split and group (directory input)

bash

python /home/jin/.codex/skills/paper-convert/scripts/paper_pipeline.py --split --group <pdf_dir>
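When the commands above are launched from another Python program, the argument list can be assembled in one place. The sketch below is only an illustration: `build_command` is a hypothetical helper, it uses only the three flags shown above (`--convert`, `--split`, `--group`), and it does not check which flag combinations the script actually accepts.

```python
# Script path as given in the workflow above.
SCRIPT = "/home/jin/.codex/skills/paper-convert/scripts/paper_pipeline.py"


def build_command(target, convert=False, split=False, group=False):
    """Build the argument list for one pipeline run (hypothetical helper)."""
    if not (convert or split or group):
        raise ValueError("choose at least one action: convert, split, or group")
    cmd = ["python", SCRIPT]
    if convert:
        cmd.append("--convert")
    if split:
        cmd.append("--split")
    if group:
        cmd.append("--group")
    # The target is a single PDF path or a directory of PDFs.
    cmd.append(str(target))
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.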

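• Title cleaning:
  - Strip numbering prefixes: 1.Introduction -> Introduction
  - Strip embedded numbering fragments: Introduction 1.1 Introduction -> Introduction
  - Strip trailing colons: Introduction: -> Introduction
  - Collapse redundant whitespace
• Grouping match:
  - Match primarily on the cleaned title, then apply light rules (see section_rules.json).
  - Introduction also matches Intro / Idea Introduction.
  - Background is not grouped under Introduction.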
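The cleaning and matching rules above can be approximated with a few regular expressions. This is a rough sketch, not the script's implementation: `clean_title`, `group_name`, and `GROUP_ALIASES` are hypothetical names, the alias table is hard-coded here instead of being read from section_rules.json (whose format the source does not show), and collapsing a title that repeats back to back is an assumption made so that the second example above yields a single `Introduction`.

```python
import re

# Leading numbering such as "1.", "2 ", "1.1 " or "1.1. ".
_NUM_PREFIX = re.compile(r"^\s*\d+(?:\.\d+)*(?:\.\s*|\s+)")
# Numbering that appears in the middle or at the end of a title.
_NUM_FRAGMENT = re.compile(r"\s+\d+(?:\.\d+)*\.?(?=\s|$)")

# Assumed alias table; the real rules live in section_rules.json.
GROUP_ALIASES = {
    "Introduction": {"introduction", "intro", "idea introduction"},
}


def clean_title(raw):
    """Normalize a section title following the cleaning rules above."""
    title = _NUM_PREFIX.sub("", raw)
    title = _NUM_FRAGMENT.sub(" ", title)
    title = re.sub(r"\s+", " ", title).strip()
    title = title.rstrip(":").strip()
    # Assumption: a title repeated back to back is collapsed to one copy.
    words = title.split(" ")
    half = len(words) // 2
    if half and len(words) % 2 == 0 and words[:half] == words[half:]:
        title = " ".join(words[:half])
    return title


def group_name(raw):
    """Map a raw title to its group; unmatched titles keep their cleaned form."""
    title = clean_title(raw)
    for group, aliases in GROUP_ALIASES.items():
        if title.lower() in aliases:
            return group
    return title
```

Because `Background` is absent from the alias set, it stays in its own group, matching the last rule above.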

• To adjust the grouping scope, edit:
  - /home/jin/.codex/skills/paper-convert/references/section_rules.json
• For stricter or looser matching, modify the rules in the script or pass in a custom rules file.