PDF Document Skill

Comprehensive toolkit for PDF manipulation tasks including text/table extraction, document creation, merging, splitting, and form handling.

Trigger

•Creating or manipulating PDF files
•Text or table extraction from PDF documents
•Merging or splitting PDF files
•PDF form filling or processing
•OCR processing of scanned documents

Core Capabilities

Text & Table Operations:

•Extract text with layout preservation using pdfplumber
•Detect tables automatically and convert them to data formats such as Excel
•Process scanned documents with OCR
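
The extraction bullets above can be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: the function names `extract_text_and_tables` and `write_table_csv` are hypothetical, and CSV is used as an Excel-readable stand-in because writing `.xlsx` directly would need a library this document does not name. OCR is not shown for the same reason.

```python
import csv


def extract_text_and_tables(pdf_path):
    """Return (page_texts, tables) for the PDF at pdf_path."""
    # Imported here so write_table_csv below works without pdfplumber installed.
    import pdfplumber

    page_texts, tables = [], []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # layout=True asks pdfplumber to approximate the page's visual layout.
            page_texts.append(page.extract_text(layout=True) or "")
            # Each table is a list of rows; each row is a list of cell strings or None.
            tables.extend(page.extract_tables())
    return page_texts, tables


def write_table_csv(table, out_path):
    """Write one extracted table to CSV, replacing empty (None) cells with ""."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for row in table:
            writer.writerow(["" if cell is None else cell for cell in row])
```

A typical call would be `texts, tables = extract_text_and_tables("report.pdf")` followed by `write_table_csv(tables[0], "table.csv")`, where both file names are placeholders.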

Document Manipulation:

•Merge multiple PDFs
•Split documents into individual pages
•Rotate pages and add watermarks
•Apply password encryption and decrypt protected files

PDF Creation:

•Generate new documents from scratch using reportlab
•Build multi-page documents
•Apply custom formatting and styling

Metadata & Forms:

•Read document properties (title, author, subject)
•Fill and process PDF forms
•Inspect security settings and permissions

Primary Libraries

•pypdf (basic operations)
•pdfplumber (structured extraction)
•reportlab (creation)
•Command-line tools: qpdf, pdftotext
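
The command-line tools can also be driven from Python. The sketch below only builds argument lists and assumes `pdftotext` and `qpdf` are installed and on the PATH. The helper names are hypothetical.

```python
import subprocess


def pdftotext_command(src, dst, keep_layout=True):
    """Build a pdftotext call; -layout keeps the original physical layout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")
    return cmd + [src, dst]


def qpdf_merge_command(sources, dst):
    """Build a qpdf call that merges all pages of sources into dst."""
    return ["qpdf", "--empty", "--pages", *sources, "--", dst]


def run(cmd):
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

For instance, `run(pdftotext_command("in.pdf", "out.txt"))` would write the extracted text to `out.txt`. Both file names are placeholders.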

Use When

•Creating professional PDF reports
•Extracting data from PDF documents
•Merging multiple documents
•Processing scanned forms with OCR
•Automating document workflows
•Protecting documents with encryption