PaddleOCR Document Parsing

Name: paddleocr-doc-parsing
Rating: 65
Author: openclaw

Parses images and PDF files through PaddleOCR's API. Supports multiple document parsing algorithms and returns structured output.

Key Features

•Multi-format support: PDF and image files (JPG, PNG, BMP, TIFF)
•Layout analysis: Automatic detection of text blocks, tables, formulas
•Multi-language: Support for 110+ languages
•Structured output: Markdown format with preserved document structure

Setup

•Obtain credentials from the PaddleOCR official website: click the “API” button, choose the desired algorithm (e.g., PP-Structure, PaddleOCR-VL-1.5), then copy the API URL and the access token.
•Set environment variables:

bash

export PADDLEOCR_API_URL="https://your-endpoint-here"
export PADDLEOCR_ACCESS_TOKEN="your_access_token"
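
Before running the script, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch; the function name `check_paddleocr_env` is hypothetical and not part of the skill itself.

```shell
# Hypothetical helper (not shipped with the skill): report which of the two
# required variables are missing and return non-zero if any are.
check_paddleocr_env() {
  _status=0
  if [ -z "${PADDLEOCR_API_URL:-}" ]; then
    echo "Missing environment variable: PADDLEOCR_API_URL" >&2
    _status=1
  fi
  if [ -z "${PADDLEOCR_ACCESS_TOKEN:-}" ]; then
    echo "Missing environment variable: PADDLEOCR_ACCESS_TOKEN" >&2
    _status=1
  fi
  return "$_status"
}
```

Calling `check_paddleocr_env || exit 1` at the top of a wrapper script stops early with a readable message instead of failing later inside the request.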

Usage Examples

Run Script

bash

# Parse local image
{baseDir}/paddleocr_parse.sh document.jpg

# Parse local PDF file
{baseDir}/paddleocr_parse.sh -t pdf document.pdf

# Parse document from URL
{baseDir}/paddleocr_parse.sh -t pdf https://example.com/document.pdf

# Output to stdout (default)
{baseDir}/paddleocr_parse.sh document.jpg

# Save output to file
{baseDir}/paddleocr_parse.sh -o result.json document.jpg
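
To parse several PDFs in one go, the commands above can be wrapped in a loop. This is only a sketch: the function name `parse_all_pdfs` is hypothetical, and it assumes the `-t` and `-o` options can be combined in a single call, which the examples above do not show explicitly.

```shell
# Hypothetical helper: run the parse script on every PDF in a directory and
# save each response next to its input as <name>.json.
# Assumption: -t and -o may be used together in one invocation.
#   $1 = path to paddleocr_parse.sh, $2 = directory containing PDF files
parse_all_pdfs() {
  _script=$1
  _dir=$2
  for _file in "$_dir"/*.pdf; do
    # Skip the unexpanded pattern when the directory has no PDF files.
    [ -e "$_file" ] || continue
    "$_script" -t pdf -o "${_file%.pdf}.json" "$_file" \
      || echo "Failed to parse: $_file" >&2
  done
}
```

For example, `parse_all_pdfs {baseDir}/paddleocr_parse.sh ./scans` would write `./scans/report.json` for `./scans/report.pdf`, where `./scans` is a placeholder directory.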

Response Structure

json

{
  "logId": "unique_request_id",
  "errorCode": 0,
  "errorMsg": "Success",
  "result": {
    "layoutParsingResults": [
      {
        "prunedResult": [...],
        "markdown": {
          "text": "# Document Title\n\nParagraph content...",
          "images": {}
        },
        "outputImages": [...],
        "inputImage": "http://input-image"
      }
    ],
    "dataInfo": {...}
  }
}

Important Fields:

•prunedResult - Detailed information about each detected layout element, including its position and category.
•markdown - The document content converted to Markdown, with structure and formatting preserved.
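
As an illustration of working with these fields, the sketch below pulls the Markdown text out of a saved response such as `result.json`. It assumes `jq` is installed and that `errorCode` is 0 on success, as in the sample response; the function name `extract_markdown` is hypothetical.

```shell
# Hypothetical helper: print markdown.text for every entry in
# layoutParsingResults of a saved response file.
# Assumptions: jq is available; errorCode 0 means success (as in the sample).
extract_markdown() {
  # jq -e exits non-zero when the expression evaluates to false or null.
  if ! jq -e '.errorCode == 0' "$1" >/dev/null; then
    echo "Request failed: $(jq -r '.errorMsg' "$1")" >&2
    return 1
  fi
  # -r prints raw strings, so "\n" escapes become real line breaks.
  jq -r '.result.layoutParsingResults[].markdown.text' "$1"
}
```

A typical call would be `extract_markdown result.json > document.md`.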

Quota Information

See official documentation: https://ai.baidu.com/ai-doc/AISTUDIO/Xmjclapam