Simple PDF Skill

Name: simple-pdf-skill
Rating: 76
Author: aiguozhi123456

Quick guide for PDF processing using Python libraries.

Library Selection Guide

Choose the right library based on your task:

Task	Library	Guide
Create new PDFs	reportlab	reportlab-guide.md
Edit existing PDFs	PyMuPDF (fitz)	pymupdf-guide.md
Extract text/tables	pdfplumber or PyMuPDF	pdfplumber-guide.md
Merge/split PDFs	PyMuPDF or pypdf	pymupdf-guide.md
Add annotations	PyMuPDF	pymupdf-guide.md
Extract images	PyMuPDF	pymupdf-guide.md
Render to images	pypdfium2	pypdfium2-guide.md
Password protection	pypdf	pypdf-guide.md
Generate charts	matplotlib + reportlab	chart-guide.md

Quick Start Workflow

1. Identify the Task Type

Creating PDFs:

•Use reportlab for new documents from scratch
•See reportlab-guide.md for complete API

Editing Existing PDFs:

•Use PyMuPDF (fitz) for any modifications
•Common edits: highlights, annotations, watermarks, merging, splitting
•See pymupdf-guide.md

Extracting Content:

•Text extraction: pdfplumber or PyMuPDF
•Table extraction: pdfplumber (better for tables)
•Image extraction: PyMuPDF
•See pdfplumber-guide.md or pymupdf-guide.md
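
The extraction choices above can be sketched as follows. This is a minimal illustration, not the skill's own code: the function names `table_to_records` and `extract_page_content` are hypothetical, and the sketch assumes the first row of each extracted table is a header row, which will not hold for every PDF.

```python
from typing import Dict, List, Optional

Row = List[Optional[str]]


def table_to_records(table: List[Row]) -> List[Dict[str, Optional[str]]]:
    """Turn a header-first table (a list of rows) into a list of dicts."""
    if not table:
        return []
    # Assumption: the first row holds the column names.
    header = [(cell or "").strip() for cell in table[0]]
    return [dict(zip(header, row)) for row in table[1:]]


def extract_page_content(pdf_path: str, page_index: int = 0):
    """Return (text, tables) for one page using pdfplumber."""
    import pdfplumber  # imported lazily so table_to_records works without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        text = page.extract_text() or ""  # image-only pages yield no text
        tables = page.extract_tables()    # each table is a list of rows
    return text, [table_to_records(t) for t in tables]
```

Calling `extract_page_content("report.pdf")` (a placeholder file name) would return the page text and one list of records per detected table. Scanned pages need OCR first, since pdfplumber only reads text that is already embedded in the PDF.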

2. Special Considerations

Chinese Text Support:

•CRITICAL: reportlab's built-in fonts (Helvetica, Times-Roman, Courier) contain no Chinese glyphs
•Register a Chinese TrueType font before drawing any Chinese text in reportlab
•See reportlab-guide.md → Chinese Font Support section
•Recommended fonts: WQY Microhei (4.4MB), Noto Sans SC (15MB)
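
A minimal sketch of the registration step, assuming a Chinese TrueType font file is available locally. The helper names (`first_existing`, `register_chinese_font`, `draw_chinese`), the font name "ChineseFont", and any font paths you pass in are illustrative assumptions; reportlab-guide.md remains the reference for the skill's own approach.

```python
import os
from typing import Iterable, Optional


def first_existing(paths: Iterable[str]) -> Optional[str]:
    """Return the first path that points to an existing file, else None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


def register_chinese_font(font_path: str, name: str = "ChineseFont") -> str:
    """Register a TrueType font with reportlab and return its name."""
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont

    pdfmetrics.registerFont(TTFont(name, font_path))
    return name


def draw_chinese(output_path: str, text: str, font_path: str) -> None:
    """Write one line of Chinese text near the top of a single-page PDF."""
    from reportlab.pdfgen import canvas

    font_name = register_chinese_font(font_path)
    c = canvas.Canvas(output_path)
    c.setFont(font_name, 14)       # must come after registration
    c.drawString(72, 720, text)    # coordinates in points from bottom-left
    c.save()
```

Checking candidate paths with `first_existing` before registering lets a script fail early with a clear message instead of producing a PDF full of blank boxes.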

Performance:

•For large PDFs, process pages in chunks instead of holding the whole document's content in memory
•Use fitz (PyMuPDF) for best performance on editing tasks
•Use pdfplumber for reliable text extraction
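
One way to read "process in chunks" is to walk the document in fixed-size page ranges. The sketch below assumes that interpretation; `page_chunks`, `extract_text_in_chunks`, and the default chunk size of 50 are hypothetical choices, not values taken from the skill.

```python
from typing import Iterator, Tuple


def page_chunks(page_count: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) page-index ranges; stop is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, page_count, chunk_size):
        yield start, min(start + chunk_size, page_count)


def extract_text_in_chunks(pdf_path: str, chunk_size: int = 50) -> Iterator[str]:
    """Yield the text of each chunk of pages using PyMuPDF."""
    import fitz  # PyMuPDF, imported lazily

    doc = fitz.open(pdf_path)
    try:
        for start, stop in page_chunks(doc.page_count, chunk_size):
            parts = [doc.load_page(i).get_text() for i in range(start, stop)]
            yield "\n".join(parts)
    finally:
        doc.close()  # release the file handle even if the caller stops early
```

Because `extract_text_in_chunks` is a generator, only one chunk of text is held at a time, and the caller can write each chunk to disk before requesting the next.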

3. Implementation Reference

For implementation patterns and examples:

•Code patterns: PATTERNS.md
•Complete examples: EXAMPLES.md
•Real-world scenarios: SCENARIOS.md
•Workflow details: WORKFLOWS.md

Installation

Install required libraries:

bash

pip install reportlab
pip install pymupdf
pip install pdfplumber
pip install pypdf
pip install pypdfium2
pip install fonttools  # For TTF font extraction

For advanced features (OCR, CLI tools):

bash

# For OCR (scanned PDFs)
pip install pytesseract pdf2image
sudo apt-get install tesseract-ocr

# For command-line tools
sudo apt-get install poppler-utils
sudo apt-get install qpdf

Key Rules

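•reportlab: Canvas coordinates start at (0,0) in the bottom-left corner; font sizes and positions are plain numbers in points (1/72 inch); multiply by the inch, cm, or mm constants from reportlab.lib.units for positioning (reportlab has no Pt() or Inch() helpers)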
•PyMuPDF: Uses RGB tuples (0-1 range), not 0-255
•Always: Call save() to finalize documents, close documents to free resources
•Chinese fonts: ALWAYS register Chinese fonts before using Chinese text in reportlab
•Large PDFs: Process in chunks to avoid memory issues
•Encrypted PDFs: Check whether a file is encrypted before reading it, pass the password explicitly, and raise a clear error when decryption fails
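
The unit, colour, and encryption rules above can be sketched with a few small helpers. All function names here are hypothetical; the constant 72 matches the value of `inch` in reportlab.lib.units, and the pypdf part assumes a recent pypdf release where `decrypt` returns a falsy value on a wrong password.

```python
from typing import Optional, Tuple

POINTS_PER_INCH = 72.0  # same value as reportlab.lib.units.inch


def rgb255_to_unit(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert 0-255 RGB to the 0-1 range that PyMuPDF expects."""
    return (r / 255.0, g / 255.0, b / 255.0)


def top_left_to_reportlab_y(y_from_top: float, page_height: float) -> float:
    """Convert a distance from the top edge to reportlab's bottom-left y."""
    return page_height - y_from_top


def open_reader(pdf_path: str, password: Optional[str] = None):
    """Open a PDF with pypdf, decrypting it first when necessary."""
    from pypdf import PdfReader  # imported lazily

    reader = PdfReader(pdf_path)
    if reader.is_encrypted:
        if password is None:
            raise ValueError("PDF is encrypted; a password is required")
        if not reader.decrypt(password):
            raise ValueError("could not decrypt PDF with the given password")
    return reader
```

For example, a point one inch below the top of a US Letter page (792 points tall) is `top_left_to_reportlab_y(1 * POINTS_PER_INCH, 792)` in reportlab coordinates, and a red annotation colour for PyMuPDF is `rgb255_to_unit(255, 0, 0)`.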