OCR Super Surya

Name: ocr-super-surya
Rating: 92
Author: aktsmm

GPU-optimized OCR using Surya.

When to Use

•Extracting text from screenshots, photos, or scanned images
•Processing PDFs with embedded images
•Multi-language document OCR (90+ languages including Japanese)
•Layout analysis and table detection

Features

Feature	Description
Accuracy	2x better than Tesseract (0.97 vs 0.88)
GPU	PyTorch-based, CUDA optimized
Languages	90+ including CJK
Layout	Document layout, table recognition

Quick Start

Installation

bash

# 1. Check GPU
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

# 2. Install (with CUDA if GPU available)
pip install surya-ocr

# If CUDA=False but you have GPU, reinstall PyTorch:
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Usage

bash

# CLI
python scripts/ocr_helper.py image.png
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt

# Or use surya directly
surya_ocr image.png --output_dir ./results

Python API

python

from PIL import Image
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor

image = Image.open("document.png")
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()

predictions = recognition_predictor([image], det_predictor=detection_predictor)
for page in predictions:
    for line in page.text_lines:
        print(line.text)

GPU Configuration

Variable	Default	Description
`RECOGNITION_BATCH_SIZE`	512	Reduce for lower VRAM
`DETECTOR_BATCH_SIZE`	36	Reduce if OOM

bash

export RECOGNITION_BATCH_SIZE=256
surya_ocr image.png

Scripts

Script	Description
`scripts/ocr_helper.py`	Helper with OOM auto-retry, batch support

Troubleshooting

Issue	Solution
CUDA=False with GPU	Reinstall PyTorch with CUDA
OOM Error	Reduce batch sizes
CPU Fallback	Auto-detected (slower)

License

•This skill: CC BY-NC 4.0
•Surya: GPL-3.0 (code), commercial license for >$2M revenue