pdf-summary

通过 Python（pymupdf）提取文本，生成结构化摘要，从而对 PDF 文档进行精简概括。当用户希望从 PDF 文件中获取摘要、概览或关键要点时，这套工具将为您省去大量阅读与整理的时间。

SKILL.md

--- frontmatter

name: pdf-summary
description: Summarize PDF documents by extracting text with Python (pymupdf) and generating structured summaries. Use when the user wants a summary, overview, or key points from a PDF file.
compatibility: Requires Python 3 and pymupdf (pip install pymupdf)

PDF Summary

Extract text from PDF files and produce structured summaries.

When to use this skill

•User asks to summarize a PDF document
•User wants key points or an executive overview of a PDF
•User needs a chapter-by-chapter breakdown of a long PDF

Steps

•
Ensure pymupdf is installed. If not:
bash
```
pip install pymupdf
```
•
Extract text from the PDF using the bundled script.
bash
```
python scripts/extract_text.py "INPUT_FILE_PATH"
```
The script prints extracted text directly to stdout (page-separated). You do NOT need to read a separate file — just use the shell output. Options: --quiet (no page markers), --save (also write a .txt file), --password <pwd> (encrypted PDF).
•
Produce a summary based on the extracted text in this structure:
- •Document: title or filename
- •Pages: total count
- •Executive Summary: 2–3 sentence overview
- •Key Points: 5–10 bullet items
- •Detailed Sections: section-by-section breakdown (if requested)

Edge cases

•Scanned/image-only PDFs: If extracted text is empty or garbled, inform the user that OCR is required and suggest pip install pytesseract.
•Very large PDFs (100+ pages): Extract text first, then summarize in chunks rather than loading the entire text at once.
•Password-protected PDFs: pymupdf supports passwords — pass it via fitz.open(path, password="xxx"). Ask the user for the password if needed.

Scripts

•extract_text.py — Extract all pages as plain text