PDF Summary
Extract text from PDF files and produce structured summaries.
When to use this skill
- The user asks to summarize a PDF document
- The user wants key points or an executive overview of a PDF
- The user needs a chapter-by-chapter breakdown of a long PDF
Steps
- Ensure `pymupdf` is installed. If it is not:
  ```bash
  pip install pymupdf
  ```
- Extract text from the PDF using the bundled script:
  ```bash
  python scripts/extract_text.py "INPUT_FILE_PATH"
  ```
  The script prints the extracted text, separated by page, directly to stdout. You do NOT need to read a separate file; just use the shell output. Options:
  - `--quiet`: omit page markers
  - `--save`: also write a `.txt` file
  - `--password <pwd>`: password for an encrypted PDF
- Produce a summary from the extracted text using this structure:
  - Document: title or filename
  - Pages: total page count
  - Executive Summary: 2–3 sentence overview
  - Key Points: 5–10 bullet items
  - Detailed Sections: section-by-section breakdown (only if requested)
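The summary structure above can be sketched as a small Python helper. This is a minimal illustration, not part of the skill: `PdfSummary` and `render_summary` are hypothetical names, and the plain-text layout is an assumption. It only shows the field order and the 5–10 key-point constraint.

```python
from dataclasses import dataclass, field


@dataclass
class PdfSummary:
    """Hypothetical container mirroring the summary structure."""
    document: str                 # title or filename
    pages: int                    # total page count
    executive_summary: str        # 2-3 sentence overview
    key_points: list[str]         # 5-10 bullet items
    sections: dict[str, str] = field(default_factory=dict)  # only if requested


def render_summary(s: PdfSummary) -> str:
    """Render the summary as plain text in the skill's field order."""
    if not 5 <= len(s.key_points) <= 10:
        raise ValueError("expected 5-10 key points")
    lines = [
        f"Document: {s.document}",
        f"Pages: {s.pages}",
        f"Executive Summary: {s.executive_summary}",
        "Key Points:",
    ]
    lines += [f"- {point}" for point in s.key_points]
    if s.sections:  # the detailed breakdown is optional
        lines.append("Detailed Sections:")
        for title, body in s.sections.items():
            lines.append(f"- {title}: {body}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = PdfSummary(
        document="report.pdf",
        pages=12,
        executive_summary="A short overview of the report.",
        key_points=[f"Point {i}" for i in range(1, 6)],
    )
    print(render_summary(example))
```

Validating the key-point count in code is a design choice. It makes a summary that drifts outside the requested range fail loudly instead of silently.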
Edge cases
- Scanned/image-only PDFs: If the extracted text is empty or garbled, tell the user that OCR is required and suggest `pip install pytesseract` (pytesseract also requires the Tesseract OCR engine to be installed on the system).
- Very large PDFs (100+ pages): Extract the text first, then summarize it in chunks and combine the chunk summaries, rather than working from the entire text at once.
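- Password-protected PDFs: `pymupdf` can open encrypted files. After `doc = fitz.open(path)`, call `doc.authenticate(password)` when `doc.needs_pass` is true. With the bundled script, pass `--password <pwd>`. Ask the user for the password if needed.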
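The scanned-PDF and large-PDF cases above can be sketched as two pure-Python helpers that operate on a list of per-page texts. Both are assumptions, not part of the bundled script. `looks_scanned` is a hypothetical heuristic that detects only empty or near-empty text; it cannot recognize garbled text. `chunk_pages` groups consecutive pages under a character budget whose default value is arbitrary.

```python
def looks_scanned(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Heuristic (assumed threshold): too little non-whitespace text per page suggests OCR is needed."""
    if not page_texts:
        return True
    visible = sum(len("".join(text.split())) for text in page_texts)
    return visible / len(page_texts) < min_chars_per_page


def chunk_pages(page_texts: list[str], max_chars: int = 20_000) -> list[tuple[int, int, str]]:
    """Group consecutive pages into (first_page, last_page, text) chunks.

    Sizes are approximate: the joining newlines are not counted, and a single
    page longer than max_chars becomes its own chunk.
    """
    chunks: list[tuple[int, int, str]] = []
    start, buffer, size = 1, [], 0
    for number, text in enumerate(page_texts, start=1):
        # Close the current chunk before it would exceed the budget.
        if buffer and size + len(text) > max_chars:
            chunks.append((start, number - 1, "\n".join(buffer)))
            start, buffer, size = number, [], 0
        buffer.append(text)
        size += len(text)
    if buffer:
        chunks.append((start, len(page_texts), "\n".join(buffer)))
    return chunks
```

Keeping the page range in each chunk lets each partial summary cite where its content came from. That range is also what the "Detailed Sections" breakdown needs.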
Scripts
- `extract_text.py`: extracts all pages as plain text