Smart Screenshot
Intelligent screen capture with OCR, markdown conversion, and smart formatting. Capture screen regions, extract text, convert images/PDFs to markdown, and save with automated formatting.
Quick Start
Trigger methods:
- •Keyboard shortcut: Press
PrtSc(customizable) - •Command line:
python scripts/capture.py - •Claude Code: Ask Claude to "take a screenshot"
Workflow:
- •Press PrtSc → Capture mode activates
- •Choose: Image or Text
- •Select region/window
- •If Image: Save with annotation options
- •If Text: OCR → MarkItDown → Save markdown
Prerequisites
System Requirements
- •Windows 10/11, macOS 10.14+, or Linux
- •Python 3.8+
- •Screen with display access
Install Dependencies
Core (required):
# Screenshot and OCR pip install pillow pyautogui mss pytesseract pyscreenshot --break-system-packages # MarkItDown (Microsoft's converter) pip install markitdown --break-system-packages # Keyboard hooks pip install keyboard pynput --break-system-packages # GUI for dialogs pip install tkinter --break-system-packages # May be pre-installed
OCR engine (Tesseract):
Windows:
# Download installer from: # https://github.com/UB-Mannheim/tesseract/wiki # Install to: C:\Program Files\Tesseract-OCR\ # Add to PATH
macOS:
brew install tesseract
Linux:
sudo apt-get install tesseract-ocr # or sudo dnf install tesseract
Optional enhancements:
# Better OCR (EasyOCR - slower but more accurate) pip install easyocr --break-system-packages # PDF handling pip install pdf2image pypdf2 --break-system-packages # Image enhancement pip install opencv-python --break-system-packages # Clipboard integration pip install pyperclip --break-system-packages
See reference/setup-guide.md for detailed installation.
Features
Capture Modes
1. Region Selection
- •Click and drag to select area
- •Real-time preview
- •Pixel-perfect selection
2. Window Capture
- •Automatically detect windows
- •Capture specific application
- •Includes/excludes borders
3. Full Screen
- •Entire display
- •Multi-monitor support
- •All screens at once
4. Scrolling Capture
- •Capture long web pages
- •Auto-scroll and stitch
- •Perfect for documentation
Text Extraction
OCR Engines:
- •Tesseract - Fast, free, 100+ languages
- •EasyOCR - Slower, more accurate
- •Cloud OCR - Azure/Google (highest accuracy)
Smart text processing:
- •Automatic language detection
- •Text cleanup and formatting
- •Table recognition
- •Layout preservation
Markdown Conversion
Using MarkItDown (Microsoft):
- •Images → Markdown with alt text
- •PDFs → Clean markdown
- •Screenshots → Formatted text
- •Tables → Markdown tables
- •Code blocks → Syntax highlighting
Conversion features:
- •Smart heading detection
- •List preservation
- •Link extraction
- •Code formatting
- •Table structure recognition
Core Operations
Quick Capture
Keyboard shortcut:
# Run as background service python scripts/screenshot_service.py # Now press PrtSc anytime: # 1. Screen freezes # 2. Choose "Image" or "Text" # 3. Select region # 4. Auto-process and save
Command line:
# Capture with UI python scripts/capture.py # Capture full screen immediately python scripts/capture.py --fullscreen --output screenshot.png # Capture region with coordinates python scripts/capture.py --region 100,100,800,600 --output region.png
Text Mode (OCR → Markdown)
Interactive:
# Start capture python scripts/capture.py --mode text # Process: # 1. Select region # 2. OCR extracts text # 3. MarkItDown formats # 4. Save dialog opens # 5. Save as .md file
Automatic:
# Capture and OCR python scripts/capture_text.py --output extracted.md # With specific language python scripts/capture_text.py --lang eng+fra --output text.md # With enhancement python scripts/capture_text.py --enhance --output clean.md
Image Mode
Interactive:
# Start capture python scripts/capture.py --mode image # Process: # 1. Select region # 2. Annotation tools appear # 3. Add arrows, boxes, text # 4. Save dialog opens
With annotations:
# Capture and annotate python scripts/capture_annotate.py --output annotated.png # Annotation tools: # - Arrow # - Rectangle # - Circle # - Text # - Highlight # - Blur (redact sensitive info)
PDF to Markdown
Convert PDF to markdown:
# Using MarkItDown python scripts/pdf_to_markdown.py --input document.pdf --output document.md # With OCR for scanned PDFs python scripts/pdf_to_markdown.py --input scanned.pdf --ocr --output text.md # Batch convert folder python scripts/batch_pdf_convert.py --input ./pdfs/ --output ./markdown/
Screenshot from Image
Process existing image:
# Extract text to markdown python scripts/image_to_markdown.py --input screenshot.png --output text.md # Clean up image first python scripts/enhance_and_extract.py --input noisy.png --output clean.md
Configuration
Settings file: config.yaml
# Keyboard shortcut
hotkey: "Print" # or "ctrl+shift+s", "cmd+shift+5", etc.
# Default capture mode
default_mode: "prompt" # "image", "text", or "prompt"
# OCR settings
ocr:
engine: "tesseract" # "tesseract", "easyocr", or "cloud"
language: "eng"
enhance: true # Pre-process image for better OCR
# Output settings
output:
directory: "~/Screenshots"
filename_pattern: "Screenshot-{date}-{time}"
auto_save: false # true = skip save dialog
clipboard: true # Copy to clipboard
# Markdown settings
markdown:
format_code_blocks: true
detect_tables: true
preserve_formatting: true
# Annotation defaults
annotation:
arrow_color: "#FF0000"
box_color: "#0000FF"
text_color: "#000000"
text_size: 12
line_width: 2
Common Workflows
Workflow 1: Code Documentation
Scenario: Capture code from screen → Markdown documentation
# 1. Run screenshot service
python scripts/screenshot_service.py &
# 2. Press PrtSc on your keyboard
# 3. Select "Text" mode
# 4. Select code region on screen
# 5. OCR extracts code
# 6. MarkItDown formats as code block:
```python
def example_function():
return "formatted code"
7. Save dialog opens → Save as code-snippet.md
### Workflow 2: Meeting Notes from Slides **Scenario:** Capture presentation slides → Formatted notes ```bash # Capture multiple slides python scripts/capture_sequence.py \ --count 5 \ --delay 3 \ --mode text \ --output slides.md # Result: All slides as markdown in one file
Workflow 3: Email/Document Processing
Scenario: Screenshot email → Extract and format text
# Capture email python scripts/capture.py --mode text --enhance # Text extracted, formatted, and saved # Perfect for archiving or processing
Workflow 4: Research Paper Annotation
Scenario: Screenshot paper → Annotate → Save
# Capture and annotate python scripts/capture_annotate.py --output paper-notes.png # Add arrows, highlights, notes # Save annotated version
Workflow 5: Batch PDF Conversion
Scenario: Convert all PDFs to markdown
# Convert folder of PDFs python scripts/batch_pdf_convert.py \ --input ~/Documents/PDFs/ \ --output ~/Documents/Markdown/ \ --ocr # Enable OCR for scanned docs # Progress shown for each file # All PDFs → Clean markdown
MarkItDown Features
Microsoft's MarkItDown converts:
Images:
- •Screenshots → Extracted text
- •Diagrams → Alt text descriptions
- •Charts → Data tables
PDFs:
- •Native PDFs → Clean markdown
- •Scanned PDFs → OCR + markdown
- •Preserve structure and formatting
Documents:
- •Word docs → Markdown
- •PowerPoint → Slide content
- •Excel → Markdown tables
Code:
- •Syntax highlighted code blocks
- •Language detection
- •Proper indentation
Tables:
- •Visual tables → Markdown tables
- •Preserved alignment
- •Header detection
Keyboard Shortcuts
During capture:
- •
Esc- Cancel capture - •
Space- Toggle crosshair/selection - •
Enter- Confirm selection - •
Ctrl+Z- Undo annotation - •
Ctrl+C- Copy to clipboard - •
Ctrl+S- Save
Annotation mode:
- •
A- Arrow tool - •
R- Rectangle tool - •
C- Circle tool - •
T- Text tool - •
H- Highlight tool - •
B- Blur tool - •
Delete- Remove last annotation
OCR Accuracy Tips
Better results:
- •Enhance image first - Increase contrast, denoise
- •Correct language - Specify language(s)
- •Proper DPI - Higher resolution = better OCR
- •Clean background - Remove clutter
- •Good lighting - For camera captures
- •Straight text - Rotate if needed
Pre-processing:
# Enhance before OCR python scripts/enhance_image.py \ --input screenshot.png \ --output enhanced.png \ --operations "grayscale,contrast,denoise" # Then OCR python scripts/image_to_markdown.py --input enhanced.png
Multi-Monitor Support
Capture from specific monitor:
# List monitors python scripts/list_monitors.py # Capture from monitor 2 python scripts/capture.py --monitor 2 # Capture all monitors python scripts/capture.py --all-monitors
Save Dialog Options
When save dialog appears:
- •Filename: Auto-generated or custom
- •Location: Last used or default directory
- •Format: .md, .txt, .png, .jpg
- •Options:
- •Copy to clipboard
- •Open in editor
- •Share
Skip dialog (auto-save):
# config.yaml output: auto_save: true directory: "~/Screenshots"
Integration with Clipboard
Copy to clipboard automatically:
# Text mode - copies markdown python scripts/capture.py --mode text --clipboard # Image mode - copies image python scripts/capture.py --mode image --clipboard # Both python scripts/capture.py --clipboard --save
Paste from clipboard:
# Process clipboard image python scripts/process_clipboard.py --output result.md
Running as Service
Windows
Install as Windows service:
# Install python scripts/install_windows_service.py # Service runs on startup # PrtSc always available
Or use Task Scheduler:
1. Open Task Scheduler 2. Create Basic Task 3. Trigger: At log on 4. Action: Start program 5. Program: python 6. Arguments: path\to\screenshot_service.py
macOS
LaunchAgent setup:
# Install service python scripts/install_macos_service.py # Creates ~/Library/LaunchAgents/com.screenshot.service.plist # Runs on login
Manual:
# Create LaunchAgent plist # Load with launchctl launchctl load ~/Library/LaunchAgents/com.screenshot.service.plist
Linux
Systemd service:
# Install python scripts/install_linux_service.py # Creates ~/.config/systemd/user/screenshot.service # Enable and start: systemctl --user enable screenshot systemctl --user start screenshot
Or use autostart:
# Copy desktop entry cp screenshot.desktop ~/.config/autostart/
Cloud OCR (Optional)
For highest accuracy:
Azure Computer Vision:
export AZURE_CV_KEY="your-key" export AZURE_CV_ENDPOINT="https://your-region.api.cognitive.microsoft.com/" python scripts/capture.py --mode text --ocr-engine azure
Google Vision:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json" python scripts/capture.py --mode text --ocr-engine google
Costs:
- •Azure: 1000 transactions/month free, then $1/1000
- •Google: 1000 units/month free, then $1.50/1000
Scripts Reference
Capture:
- •
capture.py- Main interactive capture - •
capture_text.py- Text mode only - •
capture_annotate.py- Image with annotations - •
capture_sequence.py- Multiple captures
Service:
- •
screenshot_service.py- Background service - •
install_windows_service.py- Windows installer - •
install_macos_service.py- macOS installer - •
install_linux_service.py- Linux installer
Conversion:
- •
image_to_markdown.py- Image → Markdown - •
pdf_to_markdown.py- PDF → Markdown - •
batch_pdf_convert.py- Batch conversion
Processing:
- •
enhance_image.py- Image enhancement - •
process_clipboard.py- Clipboard processing - •
extract_tables.py- Table extraction
Utilities:
- •
list_monitors.py- List displays - •
test_ocr.py- Test OCR accuracy - •
configure.py- Interactive config
Best Practices
- •Run as service - Always available with hotkey
- •Configure hotkey - Choose comfortable shortcut
- •Enable clipboard - Quick copy-paste workflow
- •Enhance first - Better OCR results
- •Use appropriate OCR - Tesseract for speed, Cloud for accuracy
- •Organize output - Set default directory
- •Backup settings - Save config.yaml
- •Test thoroughly - Verify OCR accuracy for your use case
Troubleshooting
"Tesseract not found"
# Install Tesseract # Windows: Download installer # macOS: brew install tesseract # Linux: apt install tesseract-ocr # Check installation tesseract --version
"Permission denied" (screenshot)
Windows: Run as Administrator macOS: System Preferences → Security → Privacy → Screen Recording Linux: Check X11 permissions
"Keyboard hook failed"
# Requires administrator/root privileges # Windows: Run as Administrator # macOS: Grant Accessibility permissions # Linux: Run with sudo or add user to input group
"Poor OCR quality"
# Enhance image first python scripts/enhance_image.py --input screenshot.png # Try different OCR engine python scripts/capture.py --ocr-engine easyocr # Specify language python scripts/capture.py --lang eng+fra
"MarkItDown not working"
pip install --upgrade markitdown --break-system-packages # Check version python -c "import markitdown; print(markitdown.__version__)"
Platform-Specific Notes
Windows
- •PrtSc key native support
- •Windows Ink integration available
- •OneDrive sync compatible
- •Notification system integration
macOS
- •cmd+shift+5 alternative
- •Quick Look preview
- •iCloud Drive sync
- •Notification Center integration
Linux
- •Wayland/X11 support
- •Various hotkey daemons
- •Desktop environment integration
- •Screenshot directories vary
Integration Examples
See examples/ for complete workflows:
- •examples/documentation-workflow.md - Code docs
- •examples/research-notes.md - Paper processing
- •examples/meeting-capture.md - Meeting slides
- •examples/email-archival.md - Email processing
Reference Documentation
- •reference/setup-guide.md - Complete setup
- •reference/ocr-engines.md - OCR comparison
- •reference/markitdown-guide.md - MarkItDown features
- •reference/hotkey-config.md - Keyboard shortcuts
- •reference/service-install.md - Service setup