When to use

•Convert textbook/lecture images to markdown notes
•Batch OCR processing of scanned documents
•Extract text from image sequences (iPhone photos, screenshots)
•Create searchable markdown from visual content
•Process documents privately without cloud services

ollama-deepseek-ocr-tool Skill

Purpose

This skill provides access to ollama-deepseek-ocr-tool, a CLI tool for fast, private batch OCR processing using DeepSeek-OCR via Ollama. Converts sequences of images (textbook pages, slides, scans) into a single coherent markdown document.

Key capabilities:

•⚡ Fast processing (~3s per image on M4)
•🔒 Private - runs entirely locally
•📝 Clean markdown output (tables, headings, lists)
•🔄 Natural sorting (IMG_1 < IMG_2 < IMG_10)
•💰 Free - no API costs or rate limits

When to Use This Skill

Use this skill when:

•Converting textbook chapters to Obsidian notes
•Processing lecture slides or handouts to markdown
•Extracting text from scanned documents
•Creating searchable study materials from images
•Need comprehensive examples and troubleshooting

Do NOT use this skill for:

•Cloud-based OCR (this is local-only)
•Describing image content (extracts text only)
•Handwritten text recognition (printed text only)
•Real-time streaming OCR (batch processing only)

CLI Tool: ollama-deepseek-ocr-tool

The ollama-deepseek-ocr-tool processes multiple images in sequence and creates a single markdown document with extracted text. Images are sorted naturally and text is appended sequentially for coherent reading.

Installation

bash

# Clone and install
git clone https://github.com/dnvriend/ollama-deepseek-ocr-tool.git
cd ollama-deepseek-ocr-tool
uv tool install .

Prerequisites

•
Ollama - Local LLM runtime
bash
```
brew install ollama
ollama serve
```
•
DeepSeek-OCR model (~6GB download)
bash
```
ollama pull deepseek-ocr
```
•
Python 3.14+ and uv package manager

Quick Start

bash

# Example 1: Process textbook chapter from iPhone photos
ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md

# Example 2: Convert lecture slides to markdown
ollama-deepseek-ocr-tool "lecture-week5/*.jpg" week5-summary.md

# Example 3: With verbose logging to debug issues
ollama-deepseek-ocr-tool "*.png" output.md -vv

Main Command - Batch OCR Processing

Process images matching a glob pattern and create a markdown document.

Usage:

bash

ollama-deepseek-ocr-tool GLOB_PATTERN OUTPUT_FILE [OPTIONS]

Arguments:

•GLOB_PATTERN: Pattern to match images (e.g., ".png", "dir/.jpg")
•OUTPUT_FILE: Path to output markdown file (will be overwritten)
•-v/-vv/-vvv: Verbosity (INFO/DEBUG/TRACE)
•--help: Show comprehensive help with examples
•--version: Show version

Examples:

bash

# Basic: Process all PNGs in current directory
ollama-deepseek-ocr-tool "*.png" output.md

# Process specific directory
ollama-deepseek-ocr-tool "textbook-ch3/*.jpg" chapter-3.md

# With verbose logging
ollama-deepseek-ocr-tool "*.png" output.md -vv

# Preview help (shows all examples)
ollama-deepseek-ocr-tool --help

Output Format:

markdown

<!-- Source: IMG_4170.png -->

[extracted text from image 1]

---

<!-- Source: IMG_4171.png -->

[extracted text from image 2]

</details> <details> <summary><strong>⚙️ Advanced Features (Click to expand)</strong></summary>

Multi-Level Verbosity Logging

Control logging detail with progressive verbosity levels. All logs output to stderr.

Logging Levels:

Flag	Level	Output	Use Case
(none)	WARNING	Errors and warnings only	Production, quiet mode
`-v`	INFO	+ High-level operations	Normal debugging
`-vv`	DEBUG	+ Detailed info, full tracebacks	Development, troubleshooting
`-vvv`	TRACE	+ Library internals	Deep debugging

Examples:

bash

# INFO level - see operations
ollama-deepseek-ocr-tool command -v

# DEBUG level - see detailed info
ollama-deepseek-ocr-tool command -vv

# TRACE level - see all internals
ollama-deepseek-ocr-tool command -vvv

What Can Be Extracted

Text & Formatting:

•✅ Headings (H1, H2, H3)
•✅ Body text with bold/italic
•✅ Bulleted and numbered lists
•✅ Multi-column layouts

Tables:

•✅ Clean markdown table format
•✅ Headers and structure preserved
•✅ Merged cells handled

Diagrams & Figures:

•✅ Text labels extracted
•✅ Figure captions captured
•❌ Visual content not described
•❌ Flowchart arrows not preserved

Performance Characteristics

•Speed: ~3 seconds per image (M4 MacBook)
•Memory: ~6GB (DeepSeek-OCR model)
•Throughput: ~20 images per minute
•Scalability: Sequential processing (no parallel batching)

</details> <details> <summary><strong>🔧 Troubleshooting (Click to expand)</strong></summary>

Common Issues

Issue: "No files match pattern"

bash

# Check your glob pattern and current directory
ls *.png  # Verify files exist

# Use absolute or relative paths correctly
ollama-deepseek-ocr-tool "./images/*.png" output.md

Issue: "Connection refused" / "OCR extraction failed"

bash

# Ensure Ollama is running
ollama serve

# Verify model is installed
ollama list | grep deepseek-ocr

# Pull model if missing
ollama pull deepseek-ocr

Issue: Poor quality extraction

•Use -vv flag to see word counts and verify extraction
•Check image quality (resolution, clarity)
•For complex layouts, results may vary
•Tables and diagrams work best with clear text

Issue: Slow processing

•Expected: ~3 seconds per image on M4
•Check if Ollama is using GPU acceleration
•Sequential processing is by design (6GB model)

Getting Help

bash

# Show comprehensive help with examples
ollama-deepseek-ocr-tool --help

# Use verbose logging to debug
ollama-deepseek-ocr-tool "*.png" output.md -vv

</details>

Exit Codes

•0: Success - all images processed
•1: Validation error - no files match pattern or invalid arguments
•2: Runtime error - Ollama connection failed or model not found

Best Practices

•Organize images before processing: Name files sequentially (IMG_001, IMG_002) for natural sorting
•Use descriptive output names: chapter-3-entrepreneurship.md not output.md
•Start with small batches: Test with 2-3 images first to verify quality
•Enable verbose logging for debugging: Use -vv to see extraction progress and word counts
•Review output after processing: OCR may miss formatting or misread complex layouts
•Keep images at good resolution: Higher quality = better extraction
•Process similar content together: Keep textbook pages separate from diagrams

Resources

•GitHub: https://github.com/dnvriend/ollama-deepseek-ocr-tool
•Python Package Index: https://pypi.org/project/ollama-deepseek-ocr-tool/
•Documentation: