PDF Processor Skill
Quick Reference
| Task | Method |
|---|---|
| Extract text | Run scripts/extract_text.py |
| Get form fields | Run scripts/get_form_fields.py |
| Fill form | Run scripts/fill_form.py |
| Convert to images | Run scripts/pdf_to_images.py |
Text Extraction
To extract text from a PDF:
```bash
python scripts/extract_text.py input.pdf
```
This outputs the text content to stdout. For large PDFs, it processes page by page.
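When the script is called from Python instead of a shell, its stdout can be captured with `subprocess`. This is a minimal sketch that assumes only what is stated above: the script takes the PDF path as its single argument and prints the text to stdout. The wrapper name `extract_text` and its `script` parameter are hypothetical, not part of the skill.

```python
import subprocess
import sys


def extract_text(pdf_path, script="scripts/extract_text.py"):
    """Run the extraction script and return whatever it prints to stdout.

    Hypothetical wrapper: it assumes the script accepts one positional
    argument (the PDF path) and exits non-zero on failure.
    """
    result = subprocess.run(
        [sys.executable, str(script), str(pdf_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return result.stdout
```

Calling `extract_text("input.pdf")` returns the text as one string, which can then be written to a file or searched directly.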
Form Processing
Get Form Fields
First, identify what fields exist:
```bash
python scripts/get_form_fields.py form.pdf
```
Output is JSON with field names, types, and current values.
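The exact JSON layout is not specified here, so the sketch below rests on a loudly stated assumption: that the output is an array of objects with `name`, `type`, and `value` keys. Both the key names and the helper `summarize_fields` are hypothetical; adjust them to match what the script actually prints.

```python
import json


def summarize_fields(raw_json):
    """Map each field name to a (type, current value) pair.

    ASSUMPTION: the output is a JSON array of objects with "name",
    "type", and "value" keys. The real layout may differ.
    """
    fields = json.loads(raw_json)
    return {f["name"]: (f.get("type"), f.get("value")) for f in fields}


# Hypothetical sample output, for illustration only.
sample = (
    '[{"name": "email", "type": "text", "value": ""},'
    ' {"name": "subscribe", "type": "checkbox", "value": "Off"}]'
)
for name, (kind, value) in summarize_fields(sample).items():
    print(f"{name}: {kind} (current value: {value!r})")
```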
Fill Form Fields
Create a JSON file with field values:
```json
{
  "name": "John Doe",
  "email": "john@example.com",
  "date": "2024-01-15"
}
```
Then fill the form:
```bash
python scripts/fill_form.py form.pdf values.json output.pdf
```
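A misspelled key in the values file is an easy mistake to make, so it can help to compare the keys against the field names reported by `get_form_fields.py` before writing the file. The sketch below assumes the field names have already been collected into a list; how to obtain that list depends on the script's JSON layout, which is not specified here. The helpers `unknown_keys` and `write_values` are hypothetical.

```python
import json


def unknown_keys(values, field_names):
    """Return the keys in `values` that match no known field name."""
    known = set(field_names)
    return sorted(k for k in values if k not in known)


def write_values(values, field_names, path="values.json"):
    """Write `values` as JSON, refusing if any key is not a known field."""
    bad = unknown_keys(values, field_names)
    if bad:
        raise ValueError(f"unknown form fields: {bad}")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(values, fh, indent=2)
```

For example, `unknown_keys({"emial": "john@example.com"}, ["name", "email", "date"])` flags the misspelled `emial` key before the form is filled.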
Page Manipulation
See references/page_operations.md for:
- Rotating pages
- Extracting specific pages
- Merging multiple PDFs
- Splitting PDFs
Conversion
PDF to Images
```bash
python scripts/pdf_to_images.py input.pdf output_dir/
```
Creates one PNG per page.
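Since the script creates one PNG per page, a quick sanity check is to count the PNG files in the output directory and compare that count with the expected number of pages. The sketch below matches only on the `.png` extension because the file naming scheme is not specified here; the helper names are hypothetical.

```python
from pathlib import Path


def list_page_images(output_dir):
    """Return the PNG files in `output_dir`, sorted by file name.

    Note: a plain name sort matches page order only if the page
    numbers in the file names are zero-padded.
    """
    return sorted(Path(output_dir).glob("*.png"))


def page_count_matches(output_dir, expected_pages):
    """True if the directory holds exactly one PNG per expected page."""
    return len(list_page_images(output_dir)) == expected_pages
```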
Error Handling
- If a PDF is encrypted, you'll need to provide the password
- Text extraction may be incomplete or out of reading order for PDFs with complex layouts, such as multi-column pages or tables
- Scanned PDFs require OCR (not included in this skill)
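Two of the cases above can be flagged early with rough heuristics. This is a sketch, not how the skill's scripts detect these conditions: it assumes that an encrypted PDF references an `/Encrypt` dictionary somewhere in its bytes, and that a scanned PDF yields almost no extractable text. The helper names and the `min_chars` threshold are hypothetical, and both checks can give false positives or negatives.

```python
def looks_encrypted(pdf_path):
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary.

    Reads the whole file into memory, which is fine for a sketch but
    wasteful for very large files.
    """
    with open(pdf_path, "rb") as fh:
        return b"/Encrypt" in fh.read()


def looks_scanned(extracted_text, min_chars=20):
    """Heuristic: almost no extractable text suggests image-only pages.

    `min_chars` is an arbitrary threshold chosen for illustration.
    """
    return len(extracted_text.strip()) < min_chars
```

`looks_scanned` is meant to be applied to the text that `scripts/extract_text.py` prints; an empty or near-empty result is a hint that the PDF needs OCR.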
Guidelines
- Always verify form field names before filling
- Back up original files before modifications
- Check output files after processing
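The backup and output checks in these guidelines can be sketched with the standard library alone. The helper names and the `.bak` suffix are hypothetical; the output check only confirms that the file exists, is non-empty, and begins with the `%PDF-` header, so it does not prove the form was filled correctly.

```python
import shutil
from pathlib import Path


def back_up(path, suffix=".bak"):
    """Copy `path` to `path + suffix` (metadata preserved) and return the copy."""
    source = Path(path)
    backup = source.with_name(source.name + suffix)
    shutil.copy2(source, backup)
    return backup


def output_looks_valid(path):
    """Check that the file exists, is non-empty, and starts with the PDF header."""
    target = Path(path)
    if not target.is_file() or target.stat().st_size == 0:
        return False
    with open(target, "rb") as fh:
        return fh.read(5) == b"%PDF-"
```

A typical sequence would be `back_up("form.pdf")` before running `scripts/fill_form.py`, then `output_looks_valid("output.pdf")` afterwards.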