pdf

Use when you need to extract text or tables from PDFs, merge or split documents, fill in forms, or generate new PDFs. Keywords: PDF, pypdf, pdfplumber, extract text, OCR, merge PDF.

SKILL.md
---
name: pdf
description: "Use when you need to extract text/tables from PDFs, merge/split documents, fill forms, or generate new PDFs. Keywords: pdf, pypdf, pdfplumber, extract text, ocr, merge pdf."
---

PDF Processing Expert

Overview

This skill provides efficient methods for PDF manipulation. It prioritizes performance and correct tool selection.

[!TIP] Performance First: For simple text extraction or page operations, CLI tools (pdftotext, qpdf) can be 10-50x faster than Python libraries. See the Performance Guide.
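The CLI route mentioned in the tip can be driven from Python by shelling out. The following is a minimal sketch, assuming pdftotext (from Poppler) and qpdf are installed and on PATH. The helper names `pdftotext_cmd`, `qpdf_merge_cmd`, and `run_tool` are hypothetical and not part of any library, and the actual speedup depends on the file.

```python
import shutil
import subprocess


def pdftotext_cmd(pdf_path, txt_path, first_page=None, last_page=None, layout=False):
    """Build a pdftotext command line (hypothetical helper)."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # keep the physical layout of the page
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert (1-based)
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert (1-based)
    return cmd + [pdf_path, txt_path]


def qpdf_merge_cmd(inputs, output):
    """Build a qpdf command that concatenates all pages of the input files."""
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def run_tool(cmd):
    """Run a command, failing early with a clear message if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, `run_tool(pdftotext_cmd("document.pdf", "document.txt", layout=True))` would write the extracted text next to the source file, and `run_tool(qpdf_merge_cmd(["doc1.pdf", "doc2.pdf"], "merged.pdf"))` would merge two documents. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.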

Quick Start

1. Read Text (Best for reliability)

```python
import pdfplumber

# Open the PDF and print the text of the first page
with pdfplumber.open("document.pdf") as pdf:
    print(pdf.pages[0].extract_text())
```
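The same approach extends to whole documents and to tables. This is a rough sketch, assuming pdfplumber is installed. The helper names `clean_rows`, `extract_text_and_tables`, and `write_table_csv` are hypothetical, and table detection quality depends heavily on how the PDF was produced.

```python
import csv


def clean_rows(table):
    """Replace None cells with "" and strip whitespace; a table is a list of rows."""
    return [[(cell or "").strip() for cell in row] for row in table]


def extract_text_and_tables(pdf_path):
    """Return (text, tables) for the whole document, reading one page at a time."""
    import pdfplumber  # imported lazily so clean_rows works without it

    texts, tables = [], []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            texts.append(page.extract_text() or "")  # guard against empty pages
            tables.extend(clean_rows(t) for t in page.extract_tables())
    return "\n".join(texts), tables


def write_table_csv(table, csv_path):
    """Write one extracted table to a CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(table)
```

A typical call would be `text, tables = extract_text_and_tables("document.pdf")`, followed by `write_table_csv(tables[0], "table.csv")` if at least one table was found.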

2. Merge Documents (Best for speed)

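```python
from pypdf import PdfWriter

writer = PdfWriter()
# Append every page of each input file, in order
writer.append("doc1.pdf")
writer.append("doc2.pdf")
writer.write("merged.pdf")
writer.close()  # release the file handles held by the writer
```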
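Splitting is the mirror image of merging. The sketch below assumes pypdf is installed and uses a hypothetical page-spec syntax ("1-3,5", 1-based) chosen only for illustration. `parse_page_ranges` and `extract_pages` are not pypdf functions.

```python
def parse_page_ranges(spec):
    """Turn a 1-based spec such as "1-3,5" into 0-based page indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(n) for n in part.split("-", 1))
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        indices.extend(range(start - 1, end))
    return indices


def extract_pages(src_path, dst_path, spec):
    """Copy the pages named by spec from src_path into a new PDF at dst_path."""
    from pypdf import PdfReader, PdfWriter  # lazy import: parsing works without pypdf

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in parse_page_ranges(spec):
        writer.add_page(reader.pages[index])  # IndexError if the page does not exist
    writer.write(dst_path)
```

For instance, `extract_pages("merged.pdf", "excerpt.pdf", "1-3,5")` would write pages 1, 2, 3, and 5 to a new file.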

Common Tasks & Tool Selection

| Goal | Recommended Tool | Reference |
| --- | --- | --- |
| Extract Text/Tables | pdfplumber (Python) or pdftotext (CLI) | Library Guide |
| Merge/Split/Rotate | pypdf (Python) or qpdf (CLI) | Library Guide |
| Generate PDFs | reportlab | Library Guide |
| Fill Forms | pypdf or pdf-lib | See forms.md |
| OCR Scanned Docs | pytesseract + pdf2image | Library Guide |
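For scanned documents, one possible pattern is to try normal text extraction first and fall back to OCR only for pages that yield almost nothing. This is a sketch under assumptions: pdfplumber, pytesseract, and pdf2image are installed, along with the Tesseract and Poppler binaries they rely on. The `needs_ocr` heuristic, its 20-character threshold, and the function names are hypothetical choices, not part of any of these libraries.

```python
def needs_ocr(text, min_chars=20):
    """Heuristic: treat a page as scanned if it yields almost no extractable text."""
    return len((text or "").strip()) < min_chars


def page_texts_with_ocr(pdf_path, dpi=300, lang="eng"):
    """Extract text per page, falling back to OCR for pages that look scanned."""
    import pdfplumber
    import pytesseract
    from pdf2image import convert_from_path

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            text = page.extract_text() or ""
            if needs_ocr(text):
                # Rasterize only this page (pdf2image page numbers are 1-based)
                image = convert_from_path(
                    pdf_path, dpi=dpi, first_page=number, last_page=number
                )[0]
                text = pytesseract.image_to_string(image, lang=lang)
            results.append(text)
    return results
```

Rasterizing one page at a time keeps memory use bounded on long documents, at the cost of launching the converter once per scanned page.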

Documentation & References

  • Library Guide: Detailed code snippets for pypdf, pdfplumber, reportlab.
  • Performance Guide: Optimization tips for large files and low-memory environments.
  • Forms Guide: Special instructions for handling PDF forms.
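As a quick illustration of form handling with pypdf, the sketch below fills text fields and checks the requested names against the form first. It assumes pypdf is installed, that the fields live on the first page, and that field names match those reported by `get_fields()`. The helpers `unknown_fields` and `fill_form` are hypothetical, and the Forms Guide remains the authoritative reference for this skill.

```python
def unknown_fields(values, available):
    """Return requested field names that the form does not define, sorted."""
    return sorted(set(values) - set(available))


def fill_form(src_path, dst_path, values):
    """Fill text fields on the first page of src_path and save to dst_path."""
    from pypdf import PdfReader, PdfWriter  # lazy import

    reader = PdfReader(src_path)
    fields = reader.get_fields() or {}  # None when the PDF has no form
    missing = unknown_fields(values, fields)
    if missing:
        raise KeyError(f"fields not in form: {missing}")

    writer = PdfWriter()
    writer.append(reader)  # copy pages together with the form definition
    writer.update_page_form_field_values(
        writer.pages[0], values, auto_regenerate=False
    )
    writer.write(dst_path)
```

A call such as `fill_form("form.pdf", "filled.pdf", {"name": "Jane Doe"})` would raise `KeyError` if the form has no field called `name`, which catches typos before an apparently successful but empty fill.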