AgentSkillsCN

pdf-processing

Extract text and tables from PDF files.

SKILL.md
--- frontmatter
name: pdf-processing
description: Extract text and tables from PDF files.
compatibility: Requires the PyMuPDF library.

PDF Processing

When to use this skill

Use this skill when the task involves reading, extracting, or transforming content from PDF documents.

How to use

For standard extraction, run the bundled script:

python scripts/extract.py <file.pdf>

The script reads the PDF and outputs structured content with page numbers and detected tables.
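The bundled script itself is not shown here. As a rough sketch of what such a script might do, the following assumes PyMuPDF is installed and recent enough to provide `find_tables()`. The function names `page_record` and `extract` and the record fields are hypothetical, not taken from the skill.

```python
def page_record(number, text, tables):
    """Bundle one page's content; the field names are illustrative assumptions."""
    return {"page": number, "text": text.strip(), "tables": tables}


def extract(path):
    """Return one record per page, with 1-based page numbers."""
    try:
        import pymupdf  # module name in recent PyMuPDF releases
    except ImportError:
        import fitz as pymupdf  # legacy module name for PyMuPDF

    records = []
    with pymupdf.open(path) as doc:
        for index, page in enumerate(doc):
            # Each detected table is extracted as a list of rows (lists of cell strings).
            tables = [table.extract() for table in page.find_tables().tables]
            records.append(page_record(index + 1, page.get_text(), tables))
    return records
```

Calling `extract("file.pdf")` would then yield a list with one dictionary per page, which a script could print or serialize.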

Output

Return extracted text with page numbers and any detected tables in a structured format.
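The exact structure is not specified. One possible shape, shown only as an assumption, is a JSON document with one entry per page. The field names `page_count`, `pages`, `page`, `text`, and `tables` are hypothetical, and the sample data is made up for illustration.

```python
import json


def to_json(pages):
    """Serialize per-page records; each table is a list of rows, each row a list of cells."""
    payload = {"page_count": len(pages), "pages": pages}
    return json.dumps(payload, ensure_ascii=False, indent=2)


# Made-up sample records, not output from a real PDF.
sample = [
    {"page": 1, "text": "Summary", "tables": [[["Region", "Units"], ["North", "12"]]]},
    {"page": 2, "text": "Appendix", "tables": []},
]
print(to_json(sample))
```

Keeping the page number on every record lets a reader trace each piece of text or table back to its source page.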

Notes

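Scanned PDFs typically contain page images rather than a text layer, so text extraction may return little or nothing. For these, and for PDFs with complex layouts, OCR processing may be required.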
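A simple way to spot pages that may need OCR is to check how much text extraction returns for each page. The sketch below is a heuristic only: the function name `pages_needing_ocr` and the 20-character threshold are assumptions, not part of the skill.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based numbers of pages whose extracted text is shorter than min_chars."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        visible = "".join(text.split())  # drop all whitespace before counting
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged


# Made-up sample: one page with real text, two that yielded nothing useful.
texts = ["Chapter 1: Introduction to the system", "", "  \n "]
print(pages_needing_ocr(texts))  # prints [2, 3]
```

The threshold would need tuning per document set, since short but legitimate pages (a title page, for example) would also be flagged.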