CapturePdf

Name: CapturePdf
Rating: 87
Author: Alejfanjul

Captures PDF books and converts them to markdown for the ai-brain knowledge base.

How It Works

• Extracts text from every page of the PDF
• Cleans the content (removes tracking and formatting artifacts)
• Saves the result as markdown in sources/
• Auto-commits the new file to git
• Leaves the file ready for embedding via embed_sources.py
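
The cleaning step above could look roughly like the sketch below. It is only an illustration: the names `strip_tracking` and `clean_text`, and the specific rules (form feeds, words hyphenated across line breaks, `utm_` query parameters, trailing whitespace, runs of blank lines), are assumptions and not taken from capture_pdf.py, which may clean differently.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Drop utm_* query parameters from a URL (assumed meaning of 'tracking')."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith("utm_")]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_text(raw: str) -> str:
    """Apply a few illustrative cleanup rules to text extracted from a PDF."""
    text = raw.replace("\f", "\n")                 # form feeds often mark page breaks
    # Rejoin words split across lines; this also merges genuine compounds
    # such as "well-\nknown", which a real cleaner may want to handle.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"https?://\S+", lambda m: strip_tracking(m.group(0)), text)
    text = re.sub(r"[ \t]+\n", "\n", text)         # strip trailing whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)         # collapse runs of blank lines
    return text.strip() + "\n"
```

Keeping the rules as small, ordered substitutions makes it easy to add or drop one when a particular PDF produces a new kind of artifact.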

Usage

Basic (interactive; prompts for author and title if they are not in the PDF metadata):

bash

python3 ~/ai-brain/scripts/capture_pdf.py /path/to/book.pdf

With metadata (recommended for cleaner filenames):

bash

python3 ~/ai-brain/scripts/capture_pdf.py /path/to/book.pdf --author "Author Name" --title "Book Title"

Without auto-commit:

bash

python3 ~/ai-brain/scripts/capture_pdf.py /path/to/book.pdf --no-commit
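
For readers who want to see how the three invocations above relate, here is a minimal sketch of a command-line parser that accepts the same arguments. It assumes an `argparse`-based interface; the function name `build_parser` and the help strings are hypothetical, and the actual capture_pdf.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="capture_pdf.py",
        description="Capture a PDF book as markdown.",
    )
    parser.add_argument("pdf", help="path to the PDF file")
    parser.add_argument("--author", help="author name (otherwise taken from metadata or prompted)")
    parser.add_argument("--title", help="book title (otherwise taken from metadata or prompted)")
    parser.add_argument("--no-commit", action="store_true", help="skip the automatic git commit")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["/path/to/book.pdf", "--author", "Author Name", "--no-commit"])
print(args.pdf, args.author, args.title, args.no_commit)
```

In this sketch an omitted `--author` or `--title` is simply `None`, which is the point where a script could fall back to PDF metadata or an interactive prompt.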

After Capture

Generate embeddings for the new content:

bash

python3 ~/ai-brain/scripts/embed_sources.py

Examples

Example 1: Capture a fitness book

code

User: "/capture-pdf"
User provides: /home/user/downloads/531-forever.pdf
→ Runs capture script with --author "Jim Wendler" --title "5/3/1 Forever"
→ Creates sources/2026-01-18-jim-wendler-531-forever.md
→ Commits to git
→ User runs embed_sources.py to index
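
The filename in Example 1 suggests a date-author-title pattern. The sketch below reproduces that one filename under assumed rules (lowercase, punctuation removed, whitespace turned into hyphens); `slugify` and `source_filename` are hypothetical helpers, not functions confirmed to exist in capture_pdf.py.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase, drop punctuation, and hyphenate whitespace (assumed rules)."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s-]", "", text)   # "5/3/1" becomes "531"
    return re.sub(r"[\s-]+", "-", text).strip("-")


def source_filename(author: str, title: str, captured: date) -> str:
    """Build a sources/ filename of the form YYYY-MM-DD-author-title.md."""
    return f"{captured.isoformat()}-{slugify(author)}-{slugify(title)}.md"


print(source_filename("Jim Wendler", "5/3/1 Forever", date(2026, 1, 18)))
```

This also shows why passing `--author` and `--title` gives cleaner filenames: the slug is built from the values supplied, not from whatever the PDF metadata happens to contain.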

Example 2: Capture with prompts

code

User: "capture this pdf ~/books/some-book.pdf"
→ Runs capture script
→ Script prompts for author/title if not detected
→ Creates markdown in sources/
→ Ready for embeddings

Example 3: Batch capture

code

User: "I have 3 Wendler PDFs to capture"
→ Runs the capture script once for each PDF
→ Then runs embed_sources.py once at the end
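
A batch run like Example 3 could be scripted along these lines. This is a sketch under assumptions: the helper names `capture_command` and `capture_batch` and the `dry_run` option are hypothetical, and only the script paths and the `--author` flag come from the usage section. If a title is missing from a PDF's metadata, the capture script would presumably still prompt for it on each file.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Script location as given in the usage section.
SCRIPTS = Path.home() / "ai-brain" / "scripts"


def capture_command(pdf: str, author: Optional[str] = None) -> List[str]:
    """Build the capture_pdf.py command line for one PDF."""
    cmd = ["python3", str(SCRIPTS / "capture_pdf.py"), pdf]
    if author:
        cmd += ["--author", author]
    return cmd


def capture_batch(pdfs: List[str], author: Optional[str] = None,
                  dry_run: bool = False) -> List[List[str]]:
    """Capture each PDF, then run embed_sources.py once at the end."""
    commands = [capture_command(pdf, author) for pdf in pdfs]
    commands.append(["python3", str(SCRIPTS / "embed_sources.py")])
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))             # show what would run
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands


if __name__ == "__main__":
    # Placeholder paths; dry_run=True only prints the commands.
    capture_batch(["a.pdf", "b.pdf", "c.pdf"], author="Jim Wendler", dry_run=True)
```

Running the embedding step once after the loop, as the example describes, avoids re-indexing after every individual capture.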