AgentSkillsCN

scanno-fixer

在保留格式与技术准确性的前提下,修复 LaTeX 稿件中的 OCR 错误(scannos)。当您需要修复 OCR 错误、清理 OCR 文本、纠正 scannos,或处理数字化后的 LaTeX 稿件时,此技能便能派上用场。

SKILL.md
--- frontmatter
name: scanno-fixer
description: Fixes OCR errors (scannos) in LaTeX manuscripts while preserving formatting and technical accuracy. Use when fixing OCR errors, cleaning up OCR text, correcting scannos, or when working with digitized LaTeX manuscripts.

OCR Error Fixer for LaTeX Manuscript

Fixes OCR errors in LaTeX manuscripts for "Direct Use of the Sun's Energy" by Farrington Daniels. Preserves LaTeX formatting and technical accuracy while correcting typical OCR scanning errors.

Workflow

Execute this sequential workflow when asked to "fix OCR errors" or "clean up the OCR":

1. Identify target file(s)

Determine which chapter file(s) need correction. If not specified, ask which file to work on.

Target files: manuscript/content/*/*.tex

2. Read and analyze

Open the file and scan for common OCR errors while understanding the solar energy technical context.

3. Fix hyphenated words split across lines

Rejoin words split by OCR line breaks:

  • "conven ience" → "convenience"
  • "re duced" → "reduced"
  • "ab sorbed" → "absorbed"
  • "tempera ture" → "temperature"
  • "effi ciency" → "efficiency"

Important: Only fix incorrect splits. Preserve intentional hyphenation at line breaks that are part of LaTeX formatting.

4. Convert hyphens to en dashes

Replace hyphens (-) with en dashes (--) in:

  • Ranges: "60-cm" → "60--cm"
  • Measurements: "10-20 degrees" → "10--20 degrees"
  • Compound terms: "east-west" → "east--west", "north-south" → "north--south"
  • Date ranges: "1950-1960" → "1950--1960"

Do NOT change hyphens in:

  • Hyphenated words (e.g., "well-known", "self-contained")
  • Command-line options or technical terms
  • URLs or file paths

5. Fix common OCR scanning errors

Correct typical OCR misreadings:

Character substitutions:

  • "rn" misread as "m" (e.g., "moming" → "morning")
  • "cl" misread as "d" (e.g., "dass" → "class")
  • "ii" misread as "n" (e.g., "tliis" → "this")
  • "vv" misread as "w" (e.g., "vvater" → "water")
  • "l" misread as "1" or vice versa
  • "O" (letter) misread as "0" (zero) or vice versa

Spacing issues:

  • Missing spaces between words
  • Extra spaces within words
  • Spaces before punctuation

Punctuation errors:

  • Incorrect quotation marks
  • Missing or extra periods, commas
  • Malformed apostrophes

Mathematical notation:

  • Superscripts (e.g., "ft?" → "ft²", "m?" → "m²")
  • Subscripts (e.g., "H20" → "H₂O" or "H\textsubscript{2}O")
  • Degree symbols (e.g., "60°" → "60\textdegree" or "60°")

6. Preserve LaTeX formatting (CRITICAL)

DO NOT change valid LaTeX quotes:

  • Opening double quotes: `` (two backticks) are CORRECT
  • Closing double quotes: '' (two single quotes) are CORRECT
  • Opening single quotes: ` (one backtick) is CORRECT
  • Closing single quotes: ' (one apostrophe) is CORRECT

Only fix Unicode smart quotes (incorrect in LaTeX):

  • " (U+201C) → `` (two backticks)
  • " (U+201D) → '' (two single quotes)
  • ' (U+2018) → ` (one backtick)
  • ' (U+2019) → ' (one apostrophe)

Preserve all LaTeX commands:

  • \textasciicircum, \footnote{}, \endnote{}
  • \label{}, \ref{}
  • \textit{}, \textbf{}, \emph{}
  • \section{}, \subsection{}
  • Math environments: \[...\], $...$, \begin{equation}...\end{equation}
  • All other LaTeX commands and environments

7. Maintain technical accuracy

Ensure all corrections preserve:

  • Technical terminology: Solar energy terms must be accurate (e.g., "photovoltaic", "thermal collector", "insolation", "irradiance")
  • Scientific accuracy: Don't change technical descriptions or measurements
  • Original intent: Keep the author's voice and meaning intact
  • Context: Consider the surrounding text when making corrections

8. Review and verify

Before saving:

  • Scan through corrected text to ensure no LaTeX commands were broken
  • Verify technical terms are spelled correctly
  • Check mathematical notation is properly formatted
  • Ensure text still makes sense in context

9. Save the corrected file

Write the corrected content back to the original file.

10. Report summary

Provide a brief summary:

code
OCR errors fixed! ✨

Summary:
- File: <filename>
- Hyphenated words rejoined: <count>
- Hyphens → en dashes: <count>
- Scannos corrected: <count>
- LaTeX formatting preserved: ✓
- Technical accuracy maintained: ✓

Important Notes

  • Be conservative: When in doubt, don't change it. Better to miss an error than introduce a new one.
  • Context matters: Always consider surrounding text and technical context before making corrections.
  • LaTeX is sacred: Never break LaTeX commands or environments. If unsure, leave it alone.
  • Technical accuracy first: This is a scientific text about solar energy. Preserve technical accuracy above all else.
  • Ask if uncertain: If you encounter something ambiguous, ask the user before making the change.
  • Work incrementally: For large files, consider working section by section to maintain quality.
  • No formatting changes: Don't reformat file structure, indentation, or line breaks unless fixing OCR errors.

Common Solar Energy Terms

  • photovoltaic, thermal collector, insolation, irradiance, absorptance, emittance
  • concentrator, reflector, flat-plate collector, evacuated tube
  • kilowatt-hour (kWh), British thermal unit (Btu), calorie, joule
  • efficiency, transmittance, absorber, glazing, selective surface
  • declination, azimuth, zenith angle, solar constant