OCR Error Fixer for LaTeX Manuscript
Fixes OCR errors in the LaTeX manuscript of "Direct Use of the Sun's Energy" by Farrington Daniels. Corrects typical OCR scanning errors while preserving LaTeX formatting and technical accuracy.
Workflow
Execute this sequential workflow when asked to "fix OCR errors" or "clean up the OCR":
1. Identify target file(s)
Determine which chapter file(s) need correction. If not specified, ask which file to work on.
Target files: manuscript/content/*/*.tex
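As a minimal sketch of how the target pattern above could be resolved, the following uses `pathlib` globbing. The function name `find_target_files` and the `root` parameter are illustrative assumptions, not part of the workflow itself.

```python
from pathlib import Path


def find_target_files(root="."):
    """Return the sorted .tex files matching manuscript/content/*/*.tex under root."""
    # The pattern mirrors the "Target files" line; root is assumed to be the repo root.
    return sorted(Path(root).glob("manuscript/content/*/*.tex"))
```

If this returns more than one file and the user named none, the workflow still calls for asking which file to work on.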
2. Read and analyze
Open the file and scan for common OCR errors while understanding the solar energy technical context.
3. Fix hyphenated words split across lines
Rejoin words that OCR split at a line break, leaving a stray space inside the word:
- •"conven ience" → "convenience"
- •"re duced" → "reduced"
- •"ab sorbed" → "absorbed"
- •"tempera ture" → "temperature"
- •"effi ciency" → "efficiency"
Important: Only fix incorrect splits. Preserve intentional hyphenation at line breaks that are part of LaTeX formatting.
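One way to approximate this step is a dictionary check: two adjacent fragments are joined only when the joined form is a known word. The sketch below assumes a tiny hypothetical word list (`KNOWN_WORDS`); a real pass would need a full dictionary and human review, since pairs such as "in to" can be legitimate.

```python
import re

# Hypothetical word list built from the examples above; a real run would load a dictionary.
KNOWN_WORDS = {"convenience", "reduced", "absorbed", "temperature", "efficiency"}

# Match a fragment plus one space, and peek at the following fragment without consuming it,
# so every adjacent pair of words is considered.
SPLIT_PAIR = re.compile(r"\b([A-Za-z]+) (?=([A-Za-z]+)\b)")


def rejoin_split_words(text, known_words=KNOWN_WORDS):
    """Drop the space between two fragments when their concatenation is a known word."""
    def join(match):
        left, right = match.group(1), match.group(2)
        if (left + right).lower() in known_words:
            return left  # remove the space; the right fragment follows immediately
        return match.group(0)

    return SPLIT_PAIR.sub(join, text)
```

For example, `rejoin_split_words("the tempera ture was re duced")` yields `"the temperature was reduced"`, while ordinary word pairs are left alone because their concatenation is not in the list.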
4. Convert hyphens to en dashes
Replace hyphens (-) with en dashes (--) in:
- •Ranges: "60-90 cm" → "60--90 cm"
- •Measurements: "10-20 degrees" → "10--20 degrees"
- •Compound terms: "east-west" → "east--west", "north-south" → "north--south"
- •Date ranges: "1950-1960" → "1950--1960"
Do NOT change hyphens in:
- •Hyphenated words (e.g., "well-known", "self-contained")
- •Command-line options or technical terms
- •URLs or file paths
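The numeric part of this rule can be sketched with a regular expression. The version below is an assumption-laden illustration: it only converts digit-hyphen-digit ranges, skips text that already contains `--`, and avoids digits adjacent to letters, hyphens, or slashes. Compound terms such as "east-west" would need an explicit list and are not handled here.

```python
import re

# digits, one hyphen, digits; the lookarounds reject existing "--", hyphenated words,
# longer digit-hyphen runs (e.g. ISO dates), and path-like contexts containing "/".
NUMERIC_RANGE = re.compile(r"(?<![\w/-])(\d+)-(\d+)(?![\w/-])")


def hyphens_to_en_dashes(text):
    """Replace the hyphen in plain numeric ranges with a LaTeX en dash (--)."""
    return NUMERIC_RANGE.sub(r"\1--\2", text)
```

Because the pattern requires digits on both sides, hyphenated words like "well-known" are never touched.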
5. Fix common OCR scanning errors
Correct typical OCR misreadings:
Character substitutions:
- •"rn" misread as "m" (e.g., "moming" → "morning")
- •"cl" misread as "d" (e.g., "dass" → "class")
- •"h" misread as "li" (e.g., "tliis" → "this")
- •"w" misread as "vv" (e.g., "vvater" → "water")
- •"l" misread as "1" or vice versa
- •"O" (letter) misread as "0" (zero) or vice versa
Spacing issues:
- •Missing spaces between words
- •Extra spaces within words
- •Spaces before punctuation
Punctuation errors:
- •Incorrect quotation marks
- •Missing or extra periods, commas
- •Malformed apostrophes
Mathematical notation:
- •Superscripts (e.g., "ft?" → "ft²", "m?" → "m²")
- •Subscripts (e.g., "H20" → "H₂O" or "H\textsubscript{2}O")
- •Degree symbols (e.g., "60°" → "60\textdegree" or "60°")
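A conservative way to apply the character-substitution and spacing fixes is a whole-word lookup table rather than blanket character replacement, which would corrupt correct words. The sketch below is illustrative only: the `SCANNOS` map contains just the examples from this step, matching is lowercase-only, and both function names are hypothetical.

```python
import re

# Illustrative scanno map taken from the examples in this step; extend per manuscript.
SCANNOS = {"moming": "morning", "dass": "class", "tliis": "this", "vvater": "water"}


def fix_scannos(text, scannos=SCANNOS):
    """Replace whole-word scannos; return the fixed text and the number of replacements."""
    # (?<!\\) keeps the pattern from matching inside a LaTeX command name.
    pattern = re.compile(r"(?<!\\)\b(" + "|".join(map(re.escape, scannos)) + r")\b")
    count = 0

    def repl(match):
        nonlocal count
        count += 1
        return scannos[match.group(1)]

    return pattern.sub(repl, text), count


def fix_space_before_punctuation(text):
    """Remove spaces between a word character and the punctuation that follows it."""
    return re.sub(r"(?<=\w) +([,.;:!?])", r"\1", text)
```

The replacement count returned by `fix_scannos` can feed the "Scannos corrected" figure in the final summary.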
6. Preserve LaTeX formatting (CRITICAL)
DO NOT change valid LaTeX quotes:
- •Opening double quotes: `` (two backticks) are CORRECT
- •Closing double quotes: '' (two single quotes) are CORRECT
- •Opening single quotes: ` (one backtick) is CORRECT
- •Closing single quotes: ' (one apostrophe) is CORRECT
Only fix Unicode smart quotes (incorrect in LaTeX):
- •“ (U+201C) → `` (two backticks)
- •” (U+201D) → '' (two single quotes)
- •‘ (U+2018) → ` (one backtick)
- •’ (U+2019) → ' (one apostrophe)
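The four Unicode replacements above map directly onto a translation table. A minimal sketch (the function name is an assumption) that leaves existing LaTeX quotes untouched because it only matches the Unicode code points:

```python
# Unicode smart quotes and their LaTeX equivalents, as listed above.
SMART_QUOTES = {
    "\u201c": "``",  # left double quotation mark
    "\u201d": "''",  # right double quotation mark
    "\u2018": "`",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
}


def smart_quotes_to_latex(text):
    """Convert Unicode smart quotes to LaTeX quote sequences."""
    return text.translate(str.maketrans(SMART_QUOTES))
```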
Preserve all LaTeX commands:
- •\textasciicircum, \footnote{}, \endnote{}
- •\label{}, \ref{}
- •\textit{}, \textbf{}, \emph{}
- •\section{}, \subsection{}
- •Math environments: \[...\], $...$, \begin{equation}...\end{equation}
- •All other LaTeX commands and environments
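One possible way to enforce this is to run text fixes only on the stretches between LaTeX constructs. The sketch below is a rough approximation, not a LaTeX parser: the `PROTECTED` pattern and the helper name `apply_outside_latex` are assumptions, command arguments other than `\label`/`\ref` keys are still treated as ordinary text, and comments and verbatim environments are not handled.

```python
import re

PROTECTED = re.compile(
    r"\\\[.*?\\\]"                                # display math \[...\]
    r"|\$[^$]*\$"                                 # inline math $...$
    r"|\\begin\{equation\}.*?\\end\{equation\}"   # equation environment
    r"|\\(?:label|ref)\{[^}]*\}"                  # \label / \ref including the key
    r"|\\[A-Za-z]+\*?"                            # any other command name
    r"|\\[^A-Za-z]",                              # escaped characters such as \$ or \%
    re.DOTALL,
)


def apply_outside_latex(text, fix):
    """Apply fix() to the text between protected spans; copy protected spans verbatim."""
    pieces = []
    last = 0
    for match in PROTECTED.finditer(text):
        pieces.append(fix(text[last:match.start()]))
        pieces.append(match.group(0))
        last = match.end()
    pieces.append(fix(text[last:]))
    return "".join(pieces)
```

Any single-string fixer can be passed as `fix`. A fix that would need to span a protected construct is simply not applied, which errs on the side of leaving LaTeX alone.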
7. Maintain technical accuracy
Ensure all corrections preserve:
- •Technical terminology: Solar energy terms must be accurate (e.g., "photovoltaic", "thermal collector", "insolation", "irradiance")
- •Scientific accuracy: Don't change technical descriptions or measurements
- •Original intent: Keep the author's voice and meaning intact
- •Context: Consider the surrounding text when making corrections
8. Review and verify
Before saving:
- •Scan through corrected text to ensure no LaTeX commands were broken
- •Verify technical terms are spelled correctly
- •Check mathematical notation is properly formatted
- •Ensure text still makes sense in context
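Part of this review can be automated before saving. The following is a hedged sketch with hypothetical function names: it checks that the corrected text uses the same LaTeX commands the same number of times as the original and that unescaped braces balance. It does not replace reading the text in context.

```python
import re
from collections import Counter

COMMAND = re.compile(r"\\[A-Za-z]+")


def latex_commands(text):
    """Count each LaTeX command name appearing in the text."""
    return Counter(COMMAND.findall(text))


def braces_balanced(text):
    """Return True if unescaped { and } are balanced and never close before opening."""
    depth = 0
    # Drop every backslash plus the character after it so \{ and \} are ignored.
    for ch in re.sub(r"\\.", "", text, flags=re.DOTALL):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def verify(original, corrected):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if latex_commands(original) != latex_commands(corrected):
        problems.append("LaTeX command counts differ")
    if not braces_balanced(corrected):
        problems.append("unbalanced braces")
    return problems
```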
9. Save the corrected file
Write the corrected content back to the original file.
10. Report summary
Provide a brief summary:
OCR errors fixed! ✨
Summary:
- File: <filename>
- Hyphenated words rejoined: <count>
- Hyphens → en dashes: <count>
- Scannos corrected: <count>
- LaTeX formatting preserved: ✓
- Technical accuracy maintained: ✓
Important Notes
- •Be conservative: When in doubt, don't change it. Better to miss an error than introduce a new one.
- •Context matters: Always consider surrounding text and technical context before making corrections.
- •LaTeX is sacred: Never break LaTeX commands or environments. If unsure, leave it alone.
- •Technical accuracy first: This is a scientific text about solar energy. Preserve technical accuracy above all else.
- •Ask if uncertain: If you encounter something ambiguous, ask the user before making the change.
- •Work incrementally: For large files, consider working section by section to maintain quality.
- •No formatting changes: Don't reformat file structure, indentation, or line breaks unless fixing OCR errors.
Common Solar Energy Terms
- •photovoltaic, thermal collector, insolation, irradiance, absorptance, emittance
- •concentrator, reflector, flat-plate collector, evacuated tube
- •kilowatt-hour (kWh), British thermal unit (Btu), calorie, joule
- •efficiency, transmittance, absorber, glazing, selective surface
- •declination, azimuth, zenith angle, solar constant
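The term list above can also serve as a spell-check reference. As a sketch under stated assumptions (the `SOLAR_TERMS` subset, the 0.85 cutoff, and the function name are all illustrative), the standard library's `difflib.get_close_matches` can flag words that are close to, but not exactly, a known term. The output is meant for human review, not automatic replacement, since a legitimate word such as "isolation" would also be flagged.

```python
import difflib
import re

# Single-word terms drawn from the list above.
SOLAR_TERMS = [
    "photovoltaic", "insolation", "irradiance", "absorptance", "emittance",
    "concentrator", "reflector", "transmittance", "declination", "azimuth", "zenith",
]


def suspicious_terms(text, terms=SOLAR_TERMS, cutoff=0.85):
    """Return (word, suggestion) pairs for words that nearly match a known term."""
    known = set(terms)
    flagged = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower in known:
            continue  # spelled correctly
        match = difflib.get_close_matches(lower, terms, n=1, cutoff=cutoff)
        if match:
            flagged.append((word, match[0]))
    return flagged
```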