PDF Processing Skill
When to use this skill
Use this skill when the user mentions:
- •Reading or extracting content from PDFs
- •Working with tables in PDF documents
- •Merging, splitting, or reorganizing PDFs
- •Creating new PDF documents
- •Filling out PDF forms
- •Processing scanned PDFs (OCR)
Progressive Disclosure Strategy
Level 1: Quick Start
When user asks about PDF:
- •Ask ONLY for the file path
- •Choose the appropriate script based on their need
- •Execute the script with minimal parameters
- •Show results
Level 2: Core Operations
Only reveal these options after initial execution or if user asks.
Level 3: Advanced Features
Only introduce when user explicitly needs them.
Available Scripts
All scripts are in the ./scripts/ directory. Execute them using bash or python.
1. Extract Text: extract_text.py
When to use: User wants to read PDF content, extract text.
Usage:
python ./scripts/extract_text.py <pdf_path> [--pages PAGES] [--output OUTPUT]
Parameters:
- •
pdf_path: (required) Path to PDF file - •
--pages: (optional) Page range, e.g., "1-5" or "1,3,5". Default: all pages - •
--output: (optional) Output file path. Default: stdout
Example:
# Extract all text python ./scripts/extract_text.py report.pdf # Extract specific pages python ./scripts/extract_text.py report.pdf --pages "1-5" # Save to file python ./scripts/extract_text.py report.pdf --output extracted.txt
Progressive disclosure: Start with just the pdf_path, add options only if user needs them.
2. Extract Tables: extract_tables.py
When to use: User wants to extract tables, data, or spreadsheet content from PDF.
Usage:
python ./scripts/extract_tables.py <pdf_path> [--format FORMAT] [--output OUTPUT]
Parameters:
- •
pdf_path: (required) Path to PDF file - •
--format: (optional) Output format: excel, csv, json. Default: excel - •
--output: (optional) Output file path. Default: auto-generated
Example:
# Extract to Excel (default) python ./scripts/extract_tables.py financial_report.pdf # Extract to CSV python ./scripts/extract_tables.py data.pdf --format csv # Custom output path python ./scripts/extract_tables.py data.pdf --output results/tables.xlsx
Progressive disclosure: Default to Excel format unless user specifies otherwise.
3. Merge PDFs: merge_pdfs.py
When to use: User wants to combine multiple PDFs into one.
Usage:
python ./scripts/merge_pdfs.py <output_path> <pdf1> <pdf2> [pdf3 ...]
Parameters:
- •
output_path: (required) Path for merged PDF - •
pdf1, pdf2, ...: (required) PDFs to merge (in order)
Example:
# Merge PDFs python ./scripts/merge_pdfs.py merged.pdf doc1.pdf doc2.pdf doc3.pdf # Merge all PDFs in directory python ./scripts/merge_pdfs.py combined.pdf *.pdf
4. Split PDF: split_pdf.py
When to use: User wants to split PDF into separate files.
Usage:
python ./scripts/split_pdf.py <pdf_path> [--mode MODE] [--output-dir OUTPUT_DIR]
Parameters:
- •
pdf_path: (required) Path to PDF file - •
--mode: (optional) Split mode: pages (one per file), range. Default: pages - •
--output-dir: (optional) Output directory. Default: ./split_output
Example:
# Split into individual pages python ./scripts/split_pdf.py document.pdf # Split into custom directory python ./scripts/split_pdf.py document.pdf --output-dir pages/
5. Create PDF: create_pdf.py
When to use: User wants to create a PDF from text or content.
Usage:
python ./scripts/create_pdf.py <output_path> [--title TITLE] [--content CONTENT] [--input INPUT_FILE]
Parameters:
- •
output_path: (required) Path for new PDF - •
--title: (optional) Document title - •
--content: (optional) Text content (for short content) - •
--input: (optional) Text file to read content from
Example:
# From text content python ./scripts/create_pdf.py report.pdf --title "Monthly Report" --content "Report content here..." # From text file python ./scripts/create_pdf.py report.pdf --title "Report" --input content.txt
6. Fill PDF Form: fill_form.py
When to use: User needs to fill out a PDF form programmatically.
Usage:
python ./scripts/fill_form.py <form_path> <data_file> <output_path>
Parameters:
- •
form_path: (required) Path to PDF form - •
data_file: (required) JSON file with field data - •
output_path: (required) Path for filled PDF
Example:
# Fill form with data from JSON python ./scripts/fill_form.py application.pdf data.json filled_application.pdf
See: ./references/form-filling.md for data format details.
7. OCR PDF: ocr_pdf.py
When to use: User has scanned/image-based PDF and needs searchable text.
Usage:
python ./scripts/ocr_pdf.py <pdf_path> [--lang LANG] [--output OUTPUT]
Parameters:
- •
pdf_path: (required) Path to scanned PDF - •
--lang: (optional) OCR language code. Default: eng - •
--output: (optional) Output PDF path. Default: <original>_ocr.pdf
Example:
# OCR English document python ./scripts/ocr_pdf.py scanned.pdf # OCR with different language python ./scripts/ocr_pdf.py document.pdf --lang chi_sim
Note: Requires tesseract-ocr installed on system.
Interaction Guidelines
✅ Good Interaction Pattern
User: "I need to read a PDF"
Agent: "I'll help you extract the text. What's the PDF file path?"
User: "report.pdf"
Agent: [Executes] python ./scripts/extract_text.py report.pdf
Agent: "✅ Extracted text from 5 pages. Here's a preview: [shows first 500 chars]
Would you like to see the full text or extract tables?"
❌ Poor Interaction Pattern
User: "I need to read a PDF"
Agent: "I can extract text, tables, images, metadata. For text I can preserve layout or not,
use different encodings, extract specific pages using ranges or lists,
output to txt/docx/html. For tables I need to know format (Excel/CSV/JSON),
encoding, empty cell handling... Please specify all parameters."
Decision Tree for PDF Tasks
User mentions PDF ├─ "read", "extract text", "content" → execute extract_text.py ├─ "table", "data", "spreadsheet" → execute extract_tables.py ├─ "merge", "combine", "join" → execute merge_pdfs.py ├─ "split", "separate", "break" → execute split_pdf.py ├─ "create", "generate", "make" → execute create_pdf.py ├─ "form", "fill out", "application" → execute fill_form.py └─ "scanned", "image", "OCR" → execute ocr_pdf.py
Error Handling
If a script fails:
- •Show the error message clearly
- •Suggest common fixes (file not found, missing dependencies)
- •Don't repeat the same command - try alternatives or ask for clarification
Common errors:
- •
FileNotFoundError: Check if file path is correct, suggest listing directory - •
ImportError(pdfplumber, pypdf, etc.): Install missing package - •
No tables found: Suggest OCR if PDF might be scanned - •
Permission denied: Check file permissions
Dependencies
Scripts require these Python packages:
pip install pdfplumber pypdf reportlab pandas openpyxl pytesseract pdf2image
System dependencies:
# Ubuntu/Debian sudo apt-get install tesseract-ocr poppler-utils # macOS brew install tesseract poppler
Reference Documents
- •
./references/form-filling.md: Detailed guide for PDF form data format - •
./references/ocr-guide.md: OCR best practices and language codes - •
./references/troubleshooting.md: Common issues and solutions
Examples
See ./examples/ directory for complete workflow examples:
- •
example_extract_report.sh: Extract text and tables from report - •
example_merge_quarterly.sh: Merge quarterly PDFs - •
example_batch_process.sh: Process multiple PDFs
Key Principles
- •Scripts are self-contained: Each script handles one task completely
- •Minimal parameters: Use smart defaults, accept overrides
- •Clear output: Scripts print success/error messages with emojis
- •Exit codes: 0 for success, non-zero for errors
- •Help text: All scripts support
--helpflag
Tips for Agent
- •Start simple: Use minimal parameters first
- •Show progress: For long operations, show that work is happening
- •Be helpful: Suggest next logical steps after each operation
- •Learn preferences: If user always wants CSV, remember that
- •Handle errors gracefully: Don't just show stack traces, explain the problem
Skill Maintenance
This skill follows the Agent Skills standard:
- •name and description in frontmatter enable discovery
- •SKILL.md provides full instructions for activation
- •scripts/ contain executable code for execution
- •Progressive disclosure reduces cognitive load
Version: 1.0.0 Last updated: 2025-02-02