AgentSkillsCN

ocr-super-surya

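GPU-optimized OCR using Surya. Use this skill when you need to: (1) extract text from images or screenshots; (2) process PDFs with embedded images; (3) run OCR on multi-language documents; (4) perform layout analysis and table detection. Supports 90+ languages with 2x accuracy over Tesseract.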

SKILL.md
--- frontmatter
name: ocr-super-surya
description: "GPU-optimized OCR using Surya. Use when: (1) Extracting text from images/screenshots, (2) Processing PDFs with embedded images, (3) Multi-language document OCR, (4) Layout analysis and table detection. Supports 90+ languages with 2x accuracy over Tesseract."
license: CC BY-NC 4.0
metadata:
  author: yamapan (https://github.com/aktsmm)

OCR Super Surya

GPU-optimized OCR using Surya.

When to Use

  • Extracting text from screenshots, photos, or scanned images
  • Processing PDFs with embedded images
  • Multi-language document OCR (90+ languages including Japanese)
  • Layout analysis and table detection

Features

| Feature | Description |
| --- | --- |
| Accuracy | Higher than Tesseract (benchmark score 0.97 vs 0.88) |
| GPU | PyTorch-based, CUDA-optimized |
| Languages | 90+, including CJK |
| Layout | Document layout analysis, table recognition |

Quick Start

Installation

bash
# 1. Install Surya (PyTorch is pulled in as a dependency)
pip install surya-ocr

# 2. Check whether PyTorch can see the GPU
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

# 3. If this prints "CUDA: False" on a machine with an NVIDIA GPU,
#    reinstall PyTorch from a CUDA wheel index (cu121 = CUDA 12.1):
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
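
The GPU check above can also be done from a script that does not crash when PyTorch is missing. The following is a minimal sketch; the helper name `pick_device` is hypothetical and is not part of Surya or of this skill's scripts.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    # Avoid an ImportError when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Device: {pick_device()}")
```

If this reports `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is most likely CPU-only and the reinstall step applies.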

Usage

bash
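# Helper script bundled with this skill
python scripts/ocr_helper.py image.png
# -l sets the document languages, -o writes the text to a file
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt

# Or call Surya's own CLI directly
surya_ocr image.png --output_dir ./results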
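
The source of `scripts/ocr_helper.py` is not shown here, so the following is only a sketch of how a helper with this command-line shape might parse its arguments. The long option names `--langs` and `--output` and the function `build_parser` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: INPUT [-l LANG [LANG ...]] [-o OUTPUT]."""
    parser = argparse.ArgumentParser(description="OCR an image or PDF with Surya")
    parser.add_argument("input", help="Path to an image or PDF file")
    # nargs="+" lets "-l ja en" collect several language codes into one list.
    parser.add_argument("-l", "--langs", nargs="+", default=None,
                        help="Language codes, e.g. ja en")
    parser.add_argument("-o", "--output", default=None,
                        help="Write the recognized text to this file instead of stdout")
    return parser


args = build_parser().parse_args(["document.pdf", "-l", "ja", "en", "-o", "result.txt"])
print(args.input, args.langs, args.output)
```

Because `-l` accepts one or more values, the input path has to come before `-l` (as in the usage lines above) or after another option such as `-o`.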

Python API

python
from PIL import Image
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor
from surya.recognition import RecognitionPredictor

# Load the page to OCR; any PIL-readable image works.
image = Image.open("document.png")

# Load the models once and reuse them across images.
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()

# One result per input image: detection finds the text lines, recognition reads them.
predictions = recognition_predictor([image], det_predictor=detection_predictor)
for page in predictions:
    for line in page.text_lines:
        print(line.text)
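
Printing every line is often followed by a small post-processing step. The sketch below joins a page's lines into one string and drops low-confidence lines. It assumes each text line exposes a `confidence` attribute next to `text`; the function `page_to_text` and the sample data are hypothetical, and stand-in objects are used so the example runs without Surya installed.

```python
from types import SimpleNamespace


def page_to_text(page, min_confidence: float = 0.0) -> str:
    """Join a page's recognized lines, skipping empty and low-confidence ones."""
    kept = []
    for line in page.text_lines:
        # Assumption: lines carry a `confidence` score; treat a missing one as 1.0.
        confidence = getattr(line, "confidence", None)
        if confidence is None:
            confidence = 1.0
        text = line.text.strip()
        if text and confidence >= min_confidence:
            kept.append(text)
    return "\n".join(kept)


# Stand-in for one element of `predictions` (made-up sample data).
fake_page = SimpleNamespace(text_lines=[
    SimpleNamespace(text="Invoice 2024-001", confidence=0.98),
    SimpleNamespace(text="~~#", confidence=0.31),
    SimpleNamespace(text="Total: 1,200 JPY", confidence=0.95),
])
print(page_to_text(fake_page, min_confidence=0.5))
```

With real output, the same function would be called once per element of `predictions`.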

GPU Configuration

| Variable | Default | Description |
| --- | --- | --- |
| RECOGNITION_BATCH_SIZE | 512 | Reduce for lower VRAM |
| DETECTOR_BATCH_SIZE | 36 | Reduce if OOM |

bash
export RECOGNITION_BATCH_SIZE=256
surya_ocr image.png
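
When Surya is driven from Python rather than the shell, the same variables can be set in the process environment. This is a minimal sketch: `configure_batch_sizes` is a hypothetical helper, and it assumes Surya reads these settings when the package is first imported, so the call should come before any `import surya...` line.

```python
import os


def configure_batch_sizes(recognition: int, detector: int, env=None) -> dict:
    """Write Surya's batch-size variables into `env` (defaults to os.environ)."""
    if recognition < 1 or detector < 1:
        raise ValueError("batch sizes must be positive integers")
    target = os.environ if env is None else env
    # Environment values are always strings.
    target["RECOGNITION_BATCH_SIZE"] = str(recognition)
    target["DETECTOR_BATCH_SIZE"] = str(detector)
    return {key: target[key] for key in ("RECOGNITION_BATCH_SIZE", "DETECTOR_BATCH_SIZE")}


# Half of the listed defaults (512 and 36); a plain dict keeps this demo side-effect free.
print(configure_batch_sizes(recognition=256, detector=18, env={}))
```

Calling it without `env` writes to `os.environ`, which mirrors the `export` line above.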

Scripts

| Script | Description |
| --- | --- |
| scripts/ocr_helper.py | Helper with OOM auto-retry and batch support |
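
The helper's source is not reproduced here, so the following is only a sketch of what an OOM auto-retry could look like: run the job, and on an out-of-memory error halve the batch size and try again. The names `run_with_oom_retry` and `fake_ocr` are hypothetical, and a stand-in function replaces the real OCR call so the example runs without a GPU.

```python
def run_with_oom_retry(run, batch_size: int, min_batch_size: int = 1):
    """Call run(batch_size), halving batch_size after each out-of-memory error.

    Returns (result, batch_size_that_worked).
    """
    while True:
        try:
            return run(batch_size), batch_size
        except RuntimeError as exc:
            # PyTorch's CUDA OOM error is a RuntimeError whose message mentions "out of memory".
            is_oom = "out of memory" in str(exc).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_ocr(batch_size: int) -> str:
    # Stand-in for a real OCR call: pretend anything above 128 exhausts VRAM.
    if batch_size > 128:
        raise RuntimeError("CUDA out of memory")
    return "ok"


print(run_with_oom_retry(fake_ocr, batch_size=512))  # prints ('ok', 128)
```

In a real run it would also be reasonable to call `torch.cuda.empty_cache()` before each retry so memory held by the failed attempt is released.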

Troubleshooting

| Issue | Solution |
| --- | --- |
| CUDA=False with GPU | Reinstall PyTorch with CUDA |
| OOM Error | Reduce batch sizes |
| CPU Fallback | Auto-detected (slower) |

License

  • This skill: CC BY-NC 4.0
  • Surya: GPL-3.0 (code); a commercial license is required above $2M in revenue