Nano PDF
Extract and process PDF content with the Poppler command-line utilities (`pdftotext`, `pdfinfo`, `pdftohtml`).
Extract Text
```bash
# "-" as the output file sends the extracted text to stdout
pdftotext input.pdf -
```
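To convert a whole folder instead of streaming one file, you can give `pdftotext` an explicit output path rather than `-`. The sketch below assumes the PDFs sit in one directory and writes each `.txt` next to its source. `pdf_dir_to_text` is a hypothetical helper name, not part of Poppler.

```bash
#!/usr/bin/env bash
# Convert every PDF in a directory to a .txt file next to it.
# pdf_dir_to_text is a hypothetical helper, not a Poppler command.
pdf_dir_to_text() {
  local dir=${1:-.}
  local f
  for f in "$dir"/*.pdf; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    # An explicit output path (instead of "-") writes to a file.
    pdftotext "$f" "${f%.pdf}.txt"
  done
}

# Usage: pdf_dir_to_text ~/Documents/papers
```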
Extract Specific Pages
```bash
# -f sets the first page and -l the last page (pages 1 to 5 here)
pdftotext -f 1 -l 5 input.pdf -
```
Get PDF Info
```bash
# Prints metadata such as title, author, page count, page size and PDF version
pdfinfo input.pdf
```
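`pdfinfo` reports the page count on a `Pages:` line, which combines naturally with the `-f`/`-l` flags to extract one page per file. This is a minimal sketch. The helper names and the `page_N.txt` naming scheme are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Print the page count from pdfinfo output read on stdin.
# pdf_page_count and pdf_pages_to_files are hypothetical helper names.
pdf_page_count() {
  awk '/^Pages:/ { print $2 }'
}

# Extract every page of a PDF into its own file (page_1.txt, page_2.txt, ...)
# by calling pdftotext once per page with -f N -l N.
pdf_pages_to_files() {
  local pdf=$1
  local outdir=${2:-.}
  local pages n
  pages=$(pdfinfo "$pdf" | pdf_page_count)
  if [ -z "$pages" ]; then
    echo "could not read page count from $pdf" >&2
    return 1
  fi
  mkdir -p "$outdir"
  n=1
  while [ "$n" -le "$pages" ]; do
    pdftotext -f "$n" -l "$n" "$pdf" "$outdir/page_$n.txt"
    n=$((n + 1))
  done
}

# Usage: pdf_pages_to_files input.pdf /tmp/pages
```

Calling `pdftotext` once per page is slower than a single pass, but it keeps the page numbering exact even when a page is blank.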
Extract as HTML
```bash
# /tmp/output is the base name used for the generated HTML files
pdftohtml input.pdf /tmp/output
```
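By default `pdftohtml` produces a frameset spread over several files. If you only want one HTML file of text, a sketch like the following may be simpler; it assumes Poppler's `-noframes` (no frameset) and `-i` (ignore images) options, and `pdf_to_single_html` is a hypothetical wrapper name.

```bash
#!/usr/bin/env bash
# Convert a PDF to a single HTML file without frames or extracted images.
#   -noframes : write one HTML file instead of a frameset
#   -i        : ignore images
# pdf_to_single_html is a hypothetical helper name.
pdf_to_single_html() {
  local pdf=$1
  local base=$2
  pdftohtml -noframes -i "$pdf" "$base"
}

# Usage: pdf_to_single_html input.pdf /tmp/output   # expected: /tmp/output.html
```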
Tips
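- Use the `-layout` flag to preserve the original physical layout (columns and spacing)
- Use `-nopgbrk` for continuous text without page breaks; `-raw` instead keeps text in content-stream order
- Pipe the stdout output (`-`) to other tools for further processing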