PDF Processor - Extract Data from PDFs

Extract text, tables, and structured data from PDF documents.

Workflow

Step 1: Fetch PDF Content

Use Linkup to fetch the PDF from its URL:

bash

orth api run linkup /fetch --body '{"url": "https://example.com/document.pdf"}'

Step 2: Extract with AI

Use ScrapeGraph to extract specific content:

bash

orth api run scrapegraph /v1/smartscraper --body '{
  "website_url": "https://example.com/report.pdf",
  "user_prompt": "Extract all financial figures, tables, and key metrics from this document"
}'
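
Prompts often contain double quotes, which break a hand-written JSON body. The sketch below builds the same body from shell variables. The helper names json_escape and smartscraper_body are hypothetical and not part of orth, and the escaping covers only backslashes and double quotes, so it assumes single-line prompts.

```bash
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Assumption: the value has no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the smartscraper request body from a PDF URL ($1) and a prompt ($2).
smartscraper_body() {
  printf '{"website_url": "%s", "user_prompt": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}
```

The result can then be passed to the command shown above: orth api run scrapegraph /v1/smartscraper --body "$(smartscraper_body "$url" "$prompt")"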

Step 3: Extract Tables

Get structured table data:

bash

orth api run riveter /v1/run --body '{
  "input": {
    "urls": ["https://example.com/report.pdf"]
  },
  "output": {
    "tables": {"prompt": "Extract all tables with titles, headers, and rows", "contexts": ["urls"]}
  }
}'

Step 4: Convert to Markdown

Get readable markdown output:

bash

orth api run scrapegraph /v1/markdownify --body '{"website_url": "https://example.com/document.pdf"}'

Example Usage

bash

# Extract data from a financial report
orth api run scrapegraph /v1/smartscraper --body '{
  "website_url": "https://example.com/annual-report.pdf",
  "user_prompt": "Extract revenue, profit, and key business metrics with their values"
}'

# Extract invoice data
orth api run riveter /v1/run --body '{
  "input": {"urls": ["https://example.com/invoice.pdf"]},
  "output": {
    "vendor": {"prompt": "Vendor name", "contexts": ["urls"]},
    "amount": {"prompt": "Total amount", "contexts": ["urls"]},
    "date": {"prompt": "Invoice date", "contexts": ["urls"]}
  }
}'
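
When the list of fields changes from document to document, the riveter body can be generated instead of typed by hand. This is a minimal sketch that mirrors the body layout of the invoice example. The helper names riveter_body and json_escape are hypothetical, and each field is passed as a name=prompt pair.

```bash
# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a riveter /v1/run body from a PDF URL ($1) and name=prompt pairs.
# The function body runs in a subshell so its variables do not leak.
riveter_body() (
  url=$1
  shift
  fields=""
  for pair in "$@"; do
    name=${pair%%=*}    # text before the first "="
    prompt=${pair#*=}   # text after the first "="
    entry=$(printf '"%s": {"prompt": "%s", "contexts": ["urls"]}' \
      "$(json_escape "$name")" "$(json_escape "$prompt")")
    if [ -n "$fields" ]; then
      fields="$fields, $entry"
    else
      fields=$entry
    fi
  done
  printf '{"input": {"urls": ["%s"]}, "output": {%s}}\n' \
    "$(json_escape "$url")" "$fields"
)
```

For instance, riveter_body "https://example.com/invoice.pdf" "vendor=Vendor name" "amount=Total amount" prints a body that can be passed to orth api run riveter /v1/run via --body.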

Tips

• Specify the exact data you need for better extraction
• Use schemas for consistent structured output
• Handle multi-page documents in chunks
• Verify extracted numbers against the source document
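
The last tip can be partly automated. The sketch below assumes you have saved the document text to a local file (for example, the output of Step 4) and checks that each extracted figure appears in it verbatim. The helper names appears_in_source and report_missing are hypothetical. A literal match will miss figures the extractor reformatted (such as 1,200 versus 1200), so treat a miss as a prompt for manual review, not as proof of an error.

```bash
# Succeed if the literal value ($1) occurs anywhere in the text file ($2).
appears_in_source() {
  grep -F -q -- "$1" "$2"
}

# Print one line for every value that is not found verbatim in the source file.
# Usage: report_missing SOURCE_FILE VALUE...
report_missing() (
  src=$1
  shift
  for value in "$@"; do
    appears_in_source "$value" "$src" || printf 'not found in source: %s\n' "$value"
  done
)
```

For instance, report_missing report.md "4,200" "310" prints nothing when both figures occur in report.md.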

Discover More

List all endpoints of an API, or add a path to see that endpoint's parameters:

bash

orth api show linkup
orth api show riveter
orth api show scrapegraph

Example: orth api show scrapegraph /v1/smartscraper shows the parameters for that endpoint.