PDF Document Skill
Comprehensive toolkit for PDF manipulation tasks, including text and table extraction, document creation, merging, splitting, and form handling.
Trigger
- PDF creation or manipulation
- Text or table extraction from PDF documents
- Merging or splitting PDF files
- Filling in or processing PDF forms
- OCR processing of scanned documents
Core Capabilities
Text & Table Operations:
- Extract text with its layout preserved, using pdfplumber
- Detect tables automatically and convert them to data formats such as Excel
- Process scanned documents with OCR
Document Manipulation:
- Merge multiple PDFs
- Split documents into individual pages
- Rotate pages and add watermarks
- Apply password encryption and decrypt protected files
PDF Creation:
- Generate new documents from scratch using reportlab
- Build multi-page documents
- Apply custom formatting and styling
Metadata & Forms:
- Read document properties (title, author, subject)
- Fill in and process PDF forms
- Inspect security settings and permissions
Primary Libraries
- pypdf (basic operations)
- pdfplumber (structured extraction)
- reportlab (creation)
- Command-line tools: qpdf, pdftotext
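The command-line tools can be driven from Python with `subprocess`. This sketch builds each argument list separately from running it, so the commands are easy to inspect; the helper names are hypothetical, and qpdf and pdftotext must be installed and on `PATH` for `run` to succeed. The flags shown (`--empty --pages ... --`, `--split-pages`, `--password=... --decrypt`, `-layout`) are standard qpdf and pdftotext options.

```python
import shutil
import subprocess


def qpdf_merge_cmd(inputs, output):
    # start from an empty PDF and append all pages of each input, in order
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def qpdf_split_cmd(src, out_pattern):
    # one file per page; "%d" in out_pattern is replaced by the page number
    return ["qpdf", "--split-pages", src, out_pattern]


def qpdf_decrypt_cmd(src, output, password):
    return ["qpdf", f"--password={password}", "--decrypt", src, output]


def pdftotext_cmd(src, output="-"):
    # -layout keeps the physical layout; "-" sends the text to stdout
    return ["pdftotext", "-layout", src, output]


def run(cmd):
    """Run a command list and return its stdout; raise if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

A password passed as a command-line argument can be visible to other users on the same machine through the process list, so the pypdf route may be preferable for sensitive files.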
Use When
- Creating professional PDF reports
- Extracting data from PDF documents
- Merging multiple documents
- Processing scanned forms with OCR
- Automating document workflows
- Protecting documents with encryption