pdf

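A comprehensive PDF toolkit for extracting text and tables, creating new PDFs, merging or splitting documents, and handling forms. It is most useful when Claude needs to fill in a PDF form or to programmatically process, generate, or analyze PDF documents at scale.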

SKILL.md
--- frontmatter
name: pdf
description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
source: anthropics/skills
license: Apache-2.0

PDF Processing Guide

Quick Start

python
from pypdf import PdfReader, PdfWriter

# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text (fall back to "" in case a page yields no text)
text = ""
for page in reader.pages:
    text += page.extract_text() or ""

Python Libraries

pypdf - Basic Operations

Merge PDFs

python
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)

Split PDF

python
from pypdf import PdfReader, PdfWriter

# Write each page of input.pdf to its own single-page file
reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)
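
The split example above writes one page per file. To pull out a chosen set of pages instead, a page specification can first be turned into indices. The following is a minimal sketch, not part of pypdf: the helper name `parse_page_ranges` and the `"1-3,5"` spec format are assumptions made for illustration.

```python
def parse_page_ranges(spec: str, total_pages: int) -> list[int]:
    """Turn a 1-based spec such as "1-3,5" into zero-based page indices.

    Hypothetical helper for illustration; the spec format is an assumption.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range {part!r} for {total_pages} pages")
        indices.extend(range(start - 1, end))
    return indices
```

Each returned index can then be passed to `writer.add_page(reader.pages[i])`, with `total_pages` taken from `len(reader.pages)`.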

pdfplumber - Text and Table Extraction

Extract Tables

python
import pdfplumber
import pandas as pd

with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table:
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)
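
Extracted tables often need light cleanup before `table[0]` can serve as column names. The sketch below assumes each table is a list of rows whose empty cells may be `None`, which is how pdfplumber typically returns them. The helper name `clean_table` and the `col_N` fallback names are hypothetical.

```python
def clean_table(table):
    """Return (header, rows) with None cells replaced by "" and whitespace trimmed.

    Hypothetical helper; assumes `table` is a list of rows of str-or-None cells.
    """
    cleaned = [
        [(cell or "").strip() for cell in row]
        for row in table
        if row and any(cell for cell in row)  # skip fully empty rows
    ]
    if not cleaned:
        return [], []
    header, rows = cleaned[0], cleaned[1:]

    # Give blank or repeated header cells a unique name
    seen = {}
    unique_header = []
    for idx, name in enumerate(header):
        name = name or f"col_{idx}"
        count = seen.get(name, 0)
        seen[name] = count + 1
        unique_header.append(name if count == 0 else f"{name}_{count}")
    return unique_header, rows
```

The result can be passed to pandas as `pd.DataFrame(rows, columns=header)` in place of the direct `table[1:]` / `table[0]` split.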

reportlab - Create PDFs

python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter
c.drawString(100, height - 100, "Hello World!")
c.save()

Command-Line Tools

bash
# Extract text (poppler-utils)
pdftotext input.pdf output.txt

# Merge PDFs (qpdf)
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
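
When these tools are driven from a script, the qpdf merge command above can be assembled as an argument list and run with `subprocess`. This is a rough sketch that assumes qpdf is installed and on `PATH`. The function names `build_qpdf_merge_command` and `merge_with_qpdf` are hypothetical.

```python
import shutil
import subprocess


def build_qpdf_merge_command(inputs, output):
    """Build the argument list for: qpdf --empty --pages <inputs...> -- <output>."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def merge_with_qpdf(inputs, output):
    """Run the merge; raises if qpdf is missing or exits with an error."""
    if shutil.which("qpdf") is None:
        raise FileNotFoundError("qpdf not found on PATH")
    # A list of arguments (no shell) keeps filenames with spaces intact
    subprocess.run(build_qpdf_merge_command(inputs, output), check=True)
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before anything is run.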

Quick Reference

| Task | Best Tool | Command/Code |
|------|-----------|--------------|
| Merge PDFs | pypdf | `writer.add_page(page)` |
| Split PDFs | pypdf | One page per file |
| Extract text | pdfplumber | `page.extract_text()` |
| Extract tables | pdfplumber | `page.extract_tables()` |
| Create PDFs | reportlab | Canvas or Platypus |
| OCR scanned PDFs | pytesseract | Convert to image first |