PDF Reading Skill

When you need to read content from a PDF file, use this skill to convert it to markdown first, then read the markdown file.

Why Use This Skill

•Large PDFs often fail to load due to size limits
•PDF text extraction is more reliable through pdftotext
•Markdown output preserves layout and is easier to process
•Works with single files or entire directories of PDFs

How to Process PDFs

Step 1: Convert PDF to Markdown

Run the pdf_to_markdown.sh script located in this skill's directory. The first argument is a PDF file or a directory of PDFs; the second is the markdown file to write:

bash

# Single PDF file
pdf_to_markdown.sh "/path/to/file.pdf" "/tmp/pdf_output.md"

# Multiple PDFs in a directory
pdf_to_markdown.sh "/path/to/pdf_directory/" "/tmp/combined_output.md"

Step 2: Read the Markdown Output

After conversion, read the markdown file using the Read tool:

code

Read /tmp/pdf_output.md

Step 3: Clean Up (Optional)

Remove temporary markdown files when done:

bash

rm /tmp/pdf_output.md
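
A fixed path such as /tmp/pdf_output.md can be overwritten if two conversions run at the same time, and the file is easy to forget. As a minimal sketch, assuming a shell with mktemp available, the output file could be created with a unique name and removed automatically on exit. The variable name out_md is only illustrative and is not part of the skill's script.

```shell
#!/usr/bin/env bash
# Create a uniquely named temporary file for the converted output.
# (out_md is a hypothetical variable name used for illustration.)
out_md="$(mktemp "${TMPDIR:-/tmp}/pdf_output.XXXXXX")"

# Remove the file automatically when the shell exits, even after an error.
trap 'rm -f "$out_md"' EXIT

# Conversion and reading would happen here, for example:
#   pdf_to_markdown.sh "/path/to/file.pdf" "$out_md"
echo "temporary output file: $out_md"
```

The temporary file has no .md extension in this sketch, because a portable mktemp template must end in the X placeholders; the content is the same either way.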

Workflow Example

When a user asks you to read or analyze a PDF:

•First, convert the PDF to markdown using the script
•Read the generated markdown file
•Process/analyze the content as requested
•Provide your response based on the extracted text

Requirements

The script requires pdftotext, which is part of poppler. If it is not installed, install it with:

•macOS: brew install poppler
•Ubuntu: sudo apt-get install poppler-utils
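
Before running the conversion, it may help to confirm that pdftotext is actually on the PATH. The following is a small sketch of such a check; have_cmd is a hypothetical helper name introduced here, not something provided by the skill.

```shell
#!/usr/bin/env bash
# Return success if the named command can be found on PATH.
# (have_cmd is a hypothetical helper, not part of pdf_to_markdown.sh.)
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftotext; then
  echo "pdftotext found: $(command -v pdftotext)"
else
  echo "pdftotext not found; install poppler (macOS) or poppler-utils (Ubuntu)" >&2
fi
```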

Notes

•The script preserves text layout from the original PDF
•UTF-8 encoding is used for proper character support
•Failed extractions are noted in the output file
•For very large PDFs, the generated markdown may be too long to read at once; read it in chunks using the Read tool's offset/limit parameters
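
The behaviors in these notes can be approximated with a short wrapper around pdftotext. What follows is a hypothetical sketch, not the contents of pdf_to_markdown.sh, which may work differently. The function names convert_one and convert_pdfs, the "## filename" heading, and the wording of the failure marker are all assumptions; the pdftotext options -layout and -enc UTF-8 are standard poppler options.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real pdf_to_markdown.sh may differ.

# Append one PDF's text to the output file under a heading.
convert_one() {
  local pdf="$1" out="$2"
  printf '## %s\n\n' "$(basename "$pdf")" >> "$out"
  # -layout keeps the physical text layout, -enc UTF-8 sets the output
  # encoding, and the trailing "-" sends the extracted text to stdout.
  if ! pdftotext -layout -enc UTF-8 "$pdf" - >> "$out" 2>/dev/null; then
    # Note the failure in the output instead of stopping.
    printf '[extraction failed for %s]\n' "$pdf" >> "$out"
  fi
  printf '\n' >> "$out"
}

# Convert a single PDF, or every *.pdf directly inside a directory.
convert_pdfs() {
  local src="$1" out="$2" pdf
  : > "$out"   # start with an empty output file
  if [ -d "$src" ]; then
    for pdf in "$src"/*.pdf; do
      [ -e "$pdf" ] || continue   # skip when the glob matches nothing
      convert_one "$pdf" "$out"
    done
  else
    convert_one "$src" "$out"
  fi
}
```

With these functions loaded, `convert_pdfs "/path/to/file.pdf" "/tmp/pdf_output.md"` would handle a single file and `convert_pdfs "/path/to/pdf_directory/" "/tmp/combined_output.md"` a directory. A file that cannot be extracted leaves a marker line in the output rather than aborting the whole run.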