EPUB Translation Skill
Translate EPUB files between any language pair, with optimized support for Japanese-to-Korean and English-to-Korean translation.
When to Use This Skill
Use this skill when:
- User wants to translate an EPUB ebook to another language
- User mentions translating Japanese/English/Chinese novels or books
- User has multiple EPUB files to translate in batch
- User needs to preserve EPUB formatting and structure during translation
Usage
```
/epub-translator <epub_path> [options]
```
Arguments
- `<epub_path>`: EPUB file or directory containing EPUBs
Options
| Option | Description | Default |
|---|---|---|
| --source-lang | Source language code | ja |
| --target-lang | Target language code | ko |
| --dict | Custom dictionary (JSON) | none |
| --output-dir | Output directory | ./translated |
| --parallel | Concurrent agents | 5 |
| --split-threshold | File size for splitting (KB) | 30 |
| --split-parts | Parts to split large files | 4 |
| --high-quality | Use Opus model for translation | false |
| --vertical | Output vertical writing (ja/zh only) | false |
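The option table can be read as a command-line spec. The sketch below is a hypothetical illustration only. The skill receives these as slash-command arguments, and this `argparse` parser is not part of it. It shows the defaults and the rule that `--vertical` is ignored for targets other than ja/zh:

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the option table and its defaults.
    p = argparse.ArgumentParser(prog="epub-translator")
    p.add_argument("epub_path", help="EPUB file or directory containing EPUBs")
    p.add_argument("--source-lang", default="ja")
    p.add_argument("--target-lang", default="ko")
    p.add_argument("--dict", default=None, help="custom dictionary (JSON)")
    p.add_argument("--output-dir", default="./translated")
    p.add_argument("--parallel", type=int, default=5)
    p.add_argument("--split-threshold", type=int, default=30, help="size in KB")
    p.add_argument("--split-parts", type=int, default=4)
    p.add_argument("--high-quality", action="store_true")
    p.add_argument("--vertical", action="store_true")
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # --vertical is only valid for ja/zh targets; silently ignore it otherwise.
    if args.target_lang not in ("ja", "zh"):
        args.vertical = False
    return args


args = parse(["/books/novel.epub", "--source-lang", "en", "--target-lang", "ja", "--vertical"])
print(args.source_lang, args.target_lang, args.vertical)
```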
Language Codes
ja (Japanese), en (English), ko (Korean), zh (Chinese), es (Spanish), fr (French), de (German), ru (Russian), ar (Arabic), or any ISO 639-1 code.
Examples
```bash
# Japanese novel to Korean (default)
/epub-translator "/books/novel.epub"

# English to Korean
/epub-translator "/books/english.epub" --source-lang en

# Japanese to English
/epub-translator "/books/jp_novel.epub" --source-lang ja --target-lang en

# High-quality translation using Opus model
/epub-translator "/books/important.epub" --high-quality

# Batch with larger split threshold (less splitting)
/epub-translator "/books/" --split-threshold 50 --parallel 10

# More aggressive splitting for slower connections
/epub-translator "/books/large.epub" --split-threshold 20 --split-parts 6

# English to Japanese with vertical writing (우종서/縦書き)
/epub-translator "/books/novel.epub" --source-lang en --target-lang ja --vertical

# Korean to Chinese with vertical writing
/epub-translator "/books/korean.epub" --source-lang ko --target-lang zh --vertical
```
Architecture
```mermaid
graph TB
    O["ORCHESTRATOR<br/>• Analyzes EPUBs and creates task manifest<br/>• Spawns parallel translator agents (foreground)<br/>• Collects results directly from agent responses<br/>• Validates translation quality<br/>• Handles retries and error recovery"]
    T1["Translator<br/>Agent 1"]
    T2["Translator<br/>Agent 2"]
    TN["Translator<br/>Agent N"]
    O --> T1
    O --> T2
    O --> TN
```
Key Constraint: Sub-agents do NOT have Task tool access. They use only Read, Edit, Write, and Bash.
Execution Model: All Task agents run in foreground mode (not background). Multiple Tasks can be spawned in a single message for parallel execution, but results are collected synchronously.
Model Selection
Default (no flags)
| Task | Model |
|---|---|
| Content translation | Sonnet |
| Metadata/TOC | Haiku |
| Validation | Haiku |
With --high-quality
| Task | Model |
|---|---|
| Content translation | Opus |
| Metadata/TOC | Sonnet |
| Validation | Sonnet |
Automatic Upgrade
- If the average quality score is below 70: re-translate flagged files with Opus
- If translation fails: retry with an upgraded model
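The model rules above reduce to a small lookup. The following is a hypothetical Python rendering, not code shipped with the skill. The one-step upgrade ladder (haiku → sonnet → opus) used after a failure is an assumption, because the text only says "upgraded model":

```python
from typing import Optional

# Model per task type, taken from the two tables above.
DEFAULT_MODELS = {"translation": "sonnet", "metadata": "haiku", "validation": "haiku"}
HIGH_QUALITY_MODELS = {"translation": "opus", "metadata": "sonnet", "validation": "sonnet"}
# Assumed upgrade ladder for failed tasks (hypothetical).
UPGRADE = {"haiku": "sonnet", "sonnet": "opus", "opus": "opus"}


def pick_model(task: str, high_quality: bool = False) -> str:
    """Return the model for 'translation', 'metadata', or 'validation'."""
    table = HIGH_QUALITY_MODELS if high_quality else DEFAULT_MODELS
    return table[task]


def retranslation_model(average_score: float) -> Optional[str]:
    """Flagged files are re-translated with Opus when the average score is below 70."""
    return "opus" if average_score < 70 else None


def upgrade_after_failure(model: str) -> str:
    """Next model to try after a failed translation (assumed ladder)."""
    return UPGRADE[model]


print(pick_model("translation", high_quality=True))
```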
Execution Workflow
Phase 1: Analysis
1. Create the work directory:
   ```bash
   WORK_DIR="/tmp/epub_translate_$(date +%s)"
   mkdir -p "$WORK_DIR"/{extracted,sections,translated,status,logs}
   ```
2. Analyze EPUBs (with configurable split threshold):
   ```bash
   # 30 KB / 4 parts are the defaults; substitute the --split-threshold and --split-parts values
   python3 scripts/analyze_epub.py \
     --epub "{EPUB_PATH}" \
     --work-dir "$WORK_DIR" \
     --source-lang "{SOURCE_LANG}" \
     --target-lang "{TARGET_LANG}" \
     --split-threshold 30 \
     --split-parts 4
   ```
3. Review `$WORK_DIR/manifest.json` for the task count.
Phase 2: Translation
1. Select the translator prompt from `references/`:
   - Japanese: `translator_ja.md`
   - English: `translator_en.md`
   - Other: `translator_generic.md`
2. Spawn Task agents in foreground mode (batched):
   - `model: "sonnet"` (`"opus"` with `--high-quality`)
   - CRITICAL: Spawn multiple Tasks in a single message for parallel execution
   - Process in batches of `--parallel` tasks (default: 5)
   - Results are returned directly; no status file monitoring is needed
3. Batch execution pattern:
   ```
   For each batch of N tasks:
     - Spawn N Task agents in a single message (foreground, parallel)
     - Collect results directly from agent responses
     - Track completed/failed tasks
     - Proceed to the next batch
   ```
4. Retry failed tasks (max 2 attempts; upgrade to `opus` if failures persist)
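The batch-and-retry loop can be sketched in Python. This version assumes a hypothetical `run_batch(ids, model)` callable that stands in for spawning one foreground Task agent per ID in a single message and returning `{id: succeeded}`. It reads "max 2 attempts" as two retries after the initial pass, with the final retry upgraded to Opus. That interpretation is an assumption, and the skill may count attempts differently:

```python
from typing import Callable, Dict, Iterator, List, Tuple


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_all(
    task_ids: List[str],
    run_batch: Callable[[List[str], str], Dict[str, bool]],  # hypothetical agent spawner
    parallel: int = 5,
    max_retries: int = 2,
) -> Tuple[List[str], List[str]]:
    """Run tasks in batches; retry failures, using Opus on the last retry."""
    completed: List[str] = []
    pending = list(task_ids)
    for attempt in range(max_retries + 1):
        # Initial pass and earlier retries use Sonnet; the final retry upgrades to Opus.
        model = "opus" if attempt > 0 and attempt == max_retries else "sonnet"
        failed: List[str] = []
        for batch in batches(pending, parallel):
            results = run_batch(batch, model)  # one message, N parallel agents
            for task_id in batch:
                (completed if results.get(task_id) else failed).append(task_id)
        pending = failed
        if not pending:
            break
    return completed, pending  # pending = tasks that still failed
```

Keeping `run_batch` as a parameter separates the scheduling logic from the agent-spawning mechanism, so the loop can be exercised with a stub.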
Phase 3: Finalization
1. Merge split files:
   ```bash
   python3 scripts/merge_xhtml.py --work-dir "$WORK_DIR" --manifest manifest.json
   ```
2. Translate metadata and navigation (LLM-based):
   - Spawn a metadata translation agent using `translator_metadata.md`
   - Translate: `toc.ncx`, `nav.xhtml`, `content.opf` (title, author, description)
   - Translate: `cover.xhtml`, `titlepage.xhtml` (if present)
   - Ensure TOC entries match translated chapter headings
3. Apply layout conversion (CRITICAL: must be done before packaging).

   Determine the conversion type from the target language and the `--vertical` option:

   | Target Language | --vertical | Result |
   |---|---|---|
   | ko, en, etc. | (ignored) | horizontal-tb, ltr |
   | ja, zh | false (default) | horizontal-tb, ltr |
   | ja, zh | true | vertical-rl, rtl (우종서/縦書き) |
   | ar, he, fa | (ignored) | horizontal-tb, rtl |

   A. Horizontal output (default for all languages):
   ```bash
   TRANSLATED_DIR="$WORK_DIR/translated/{VOLUME_ID}"
   # Note: `sed -i ''` is BSD/macOS syntax; with GNU sed (Linux), use plain `sed -i`

   # Convert CSS files: vertical-rl → horizontal-tb
   find "$TRANSLATED_DIR" -name "*.css" -exec sed -i '' \
     -e 's/writing-mode:[[:space:]]*vertical-rl/writing-mode: horizontal-tb/g' \
     -e 's/-webkit-writing-mode:[[:space:]]*vertical-rl/-webkit-writing-mode: horizontal-tb/g' \
     -e 's/-epub-writing-mode:[[:space:]]*vertical-rl/-epub-writing-mode: horizontal-tb/g' \
     {} \;

   # Convert content.opf: page direction and writing mode
   find "$TRANSLATED_DIR" -name "content.opf" -exec sed -i '' \
     -e 's/page-progression-direction="rtl"/page-progression-direction="ltr"/g' \
     -e 's/primary-writing-mode" content="vertical-rl"/primary-writing-mode" content="horizontal-tb"/g' \
     {} \;

   # Convert XHTML inline styles if present
   find "$TRANSLATED_DIR" -name "*.xhtml" -exec sed -i '' \
     -e 's/writing-mode:[[:space:]]*vertical-rl/writing-mode: horizontal-tb/g' \
     {} \;
   ```

   B. Vertical output (only when `--vertical` is set AND the target is ja/zh):
   ```bash
   TRANSLATED_DIR="$WORK_DIR/translated/{VOLUME_ID}"

   # Convert CSS files: horizontal-tb → vertical-rl
   find "$TRANSLATED_DIR" -name "*.css" -exec sed -i '' \
     -e 's/writing-mode:[[:space:]]*horizontal-tb/writing-mode: vertical-rl/g' \
     -e 's/-webkit-writing-mode:[[:space:]]*horizontal-tb/-webkit-writing-mode: vertical-rl/g' \
     -e 's/-epub-writing-mode:[[:space:]]*horizontal-tb/-epub-writing-mode: vertical-rl/g' \
     {} \;

   # Convert content.opf: page direction and writing mode for vertical
   find "$TRANSLATED_DIR" -name "content.opf" -exec sed -i '' \
     -e 's/page-progression-direction="ltr"/page-progression-direction="rtl"/g' \
     -e 's/primary-writing-mode" content="horizontal-tb"/primary-writing-mode" content="vertical-rl"/g' \
     {} \;

   # Convert XHTML inline styles if present
   find "$TRANSLATED_DIR" -name "*.xhtml" -exec sed -i '' \
     -e 's/writing-mode:[[:space:]]*horizontal-tb/writing-mode: vertical-rl/g' \
     {} \;
   ```

   C. RTL output (for ar/he/fa targets):
   ```bash
   TRANSLATED_DIR="$WORK_DIR/translated/{VOLUME_ID}"

   # Convert page direction (content.opf may sit in a subdirectory, so locate it with find)
   find "$TRANSLATED_DIR" -name "content.opf" -exec sed -i '' \
     -e 's/page-progression-direction="ltr"/page-progression-direction="rtl"/g' \
     {} \;

   # Convert CSS direction
   find "$TRANSLATED_DIR" -name "*.css" -exec sed -i '' \
     -e 's/direction:[[:space:]]*ltr/direction: rtl/g' \
     {} \;
   ```

   Note: If the source is already vertical and `--vertical` is set, skip the CSS conversion (keep the existing vertical layout).

   See `references/layout_conversion.md` for complete conversion patterns.
4. Verify that the source text has been removed:
   ```bash
   python3 scripts/verify.py --work-dir "$WORK_DIR" --source-lang "{SOURCE_LANG}"
   ```
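The detection rules used by `verify.py` are not documented here. As an assumption-labeled sketch, a residual-source check for Japanese or Chinese sources could strip tags and count characters of the source script: kana for `ja` and CJK ideographs for `zh`. The patterns and helper names are hypothetical:

```python
import re
from typing import Dict, List

# Hypothetical script patterns; verify.py's real rules may differ.
SOURCE_SCRIPT = {
    "ja": re.compile(r"[\u3040-\u30ff]"),  # Hiragana and Katakana blocks
    "zh": re.compile(r"[\u4e00-\u9fff]"),  # CJK Unified Ideographs
}
TAG = re.compile(r"<[^>]+>")


def residual_source_chars(xhtml: str, source_lang: str) -> int:
    """Count source-script characters left in the visible text of one file."""
    text = TAG.sub("", xhtml)  # crude tag stripping: attributes are dropped with the tag
    pattern = SOURCE_SCRIPT.get(source_lang)
    return len(pattern.findall(text)) if pattern else 0


def files_with_residue(pages: Dict[str, str], source_lang: str) -> List[str]:
    """Names of files that still contain source-script text."""
    return sorted(name for name, xhtml in pages.items()
                  if residual_source_chars(xhtml, source_lang) > 0)


pages = {"ch01.xhtml": "<p>그는 웃었다.</p>", "ch02.xhtml": "<p>ありがとう</p>"}
print(files_with_residue(pages, "ja"))
```

Kana is a reliable signal for Japanese because it does not occur in Korean text. Kanji can legitimately appear in some targets, so it is not counted for `ja`.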
Phase 4: Quality Validation (LLM-Based)
1. Extract text for validation (token-efficient format):
   ```bash
   python3 scripts/extract_for_validation.py \
     --dir "$WORK_DIR/translated" \
     --output-dir "$WORK_DIR/validation" \
     --max-tokens 8000
   ```
2. Select the validator prompt from `references/`:
   - Korean target: `validator_ko.md` (extends `validator_generic.md`)
   - Other targets: `validator_generic.md`
3. Spawn validation Task agents in foreground mode (batched):
   - Read `$WORK_DIR/validation/validation_manifest.json`
   - For each chunk, spawn a validator agent with `model: "haiku"` (sufficient for validation; `"sonnet"` with `--high-quality`)
   - CRITICAL: Spawn multiple Tasks in a single message for parallel execution
   - Process in batches and collect results directly
4. Aggregate results:
   - Collect validation results from agent responses
   - Calculate the average quality score
   - Identify files flagged for re-translation
5. If the average score is below 70: re-translate flagged files with `model: "opus"`
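The aggregation step and the score bands from the Quality Score section can be sketched as follows. The per-chunk result shape (`score`, `flagged_files`) is hypothetical, and the real validator output may be structured differently:

```python
from statistics import mean
from typing import Dict, List, Optional


def quality_band(score: float) -> str:
    """Map a 0-100 score to the bands listed under Quality Score."""
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 60:
        return "Acceptable"
    return "Poor"


def aggregate(results: List[Dict]) -> Dict:
    """Average per-chunk scores and collect flagged files (hypothetical result shape)."""
    average = mean(r["score"] for r in results)
    flagged = sorted({f for r in results for f in r.get("flagged_files", [])})
    # Below an average of 70, flagged files are re-translated with Opus.
    retranslate: Optional[str] = "opus" if average < 70 and flagged else None
    return {
        "average": average,
        "band": quality_band(average),
        "flagged": flagged,
        "retranslate_with": retranslate,
    }


report = aggregate([
    {"score": 80, "flagged_files": []},
    {"score": 55, "flagged_files": ["ch03.xhtml"]},
])
print(report)
```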
Phase 5: Packaging
1. Package the EPUB:
   ```bash
   bash scripts/package_epub.sh "$WORK_DIR" "{OUTPUT_DIR}"
   ```
2. Generate the final report with quality metrics
File Splitting Configuration
Conservative defaults prevent context overflow in translation agents:
| Setting | Default | Description |
|---|---|---|
| split-threshold | 30 KB | Files larger than this are split |
| split-parts | 4 | Number of sections per large file |
Tuning Guidelines
- Slow connection / timeouts: Lower threshold (20 KB), more parts (6)
- Fast connection / large context: Higher threshold (50 KB), fewer parts (3)
- Very large files (100 KB+): Split into more parts automatically
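The following sketch illustrates the splitting decision under stated assumptions. A file above the threshold gets `--split-parts` sections. A file large enough that those sections would still exceed the threshold gets `ceil(size / threshold)` sections instead. That formula is a guess at what "more parts automatically" means and is not taken from `analyze_epub.py`. Sections are cut only at top-level block boundaries, so no tag is broken:

```python
import math
from typing import List


def plan_parts(size_kb: float, threshold_kb: float = 30, parts: int = 4) -> int:
    """Number of sections for a file; 1 means it is not split (assumed formula)."""
    if size_kb <= threshold_kb:
        return 1
    return max(parts, math.ceil(size_kb / threshold_kb))


def split_blocks(blocks: List[str], n: int) -> List[List[str]]:
    """Split top-level body blocks into up to n contiguous, size-balanced sections."""
    total = sum(len(b.encode("utf-8")) for b in blocks)
    target = total / n
    sections: List[List[str]] = [[]]
    acc = 0  # bytes assigned so far
    for block in blocks:
        # Open a new section once the running total reaches the next boundary.
        if sections[-1] and acc >= target * len(sections) and len(sections) < n:
            sections.append([])
        sections[-1].append(block)
        acc += len(block.encode("utf-8"))
    return sections


paragraphs = ["<p>xxx</p>"] * 8
print([len(s) for s in split_blocks(paragraphs, plan_parts(120))])
```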
Quality Validation (LLM-Based)
Translation quality is validated by LLM sub-agents, not regex patterns. This provides:
- Context-aware naturalness assessment
- Understanding of literary style and tone
- Detection of subtle translation issues
Validator Instructions
| Target Language | Primary Instruction | Base Instruction |
|---|---|---|
| Korean | validator_ko.md | validator_generic.md |
| Other | validator_generic.md | - |
Korean-Specific Checks
- Translationese (번역투): `~하는 것이다`, `~라고 하는`, etc.
- Pronoun overuse: excessive `그녀는` ("she"), `그는` ("he")
- Particle chains: awkward `의의의` patterns (stacked possessive `의`)
- Honorific consistency: speech level matching
Quality Score
- 90-100: Excellent - reads naturally
- 75-89: Good - minor issues
- 60-74: Acceptable - review recommended
- <60: Poor - re-translation needed
Validation Workflow
- Text extracted in token-efficient format
- Chunked for parallel validation (8000 tokens each)
- LLM validators spawned in foreground batches
- Results collected directly from agent responses
- Results aggregated into final report
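Chunking validation text to a token budget (8000 by default) could work like the greedy packer below. The token estimate is a rough heuristic assumed for illustration: about one token per CJK or Hangul character and about four characters per token otherwise. `extract_for_validation.py` may count tokens differently:

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): 1 token per wide char, ~4 chars per token otherwise."""
    wide = sum(1 for ch in text if ord(ch) >= 0x1100)  # Hangul Jamo and above
    return wide + (len(text) - wide + 3) // 4


def chunk_paragraphs(paragraphs: List[str], max_tokens: int = 8000) -> List[List[str]]:
    """Greedily pack whole paragraphs into chunks that stay under max_tokens."""
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(para)  # an oversized paragraph is kept whole in its own chunk
        used += cost
    if current:
        chunks.append(current)
    return chunks


print([len(c) for c in chunk_paragraphs(["가" * 5000, "나" * 4000, "다" * 1000])])
```

Packing whole paragraphs keeps each validator's input readable as prose, at the cost of chunks that are slightly under the budget.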
Language-Specific Processing
Source Language Handling
| Source | Special Handling |
|---|---|
| Japanese | Remove ruby tags, handle vertical writing |
| Chinese | Handle traditional/simplified, remove pinyin |
| Arabic/Hebrew | Handle RTL text direction |
| English | Standard processing |
Layout Conversion (Target-Based)
Key Principle: All languages default to horizontal LTR (except RTL languages).
| Target Language | Page Direction | Writing Mode | Text Direction | Notes |
|---|---|---|---|---|
| Korean (ko) | ltr | horizontal-tb | ltr | |
| English (en) | ltr | horizontal-tb | ltr | |
| Japanese (ja) | ltr | horizontal-tb | ltr | Default |
| Japanese (ja) + --vertical | rtl | vertical-rl | ltr | 縦書き (우종서) |
| Chinese (zh) | ltr | horizontal-tb | ltr | Default |
| Chinese (zh) + --vertical | rtl | vertical-rl | ltr | 縱排 (우종서) |
| Arabic (ar) | rtl | horizontal-tb | rtl | |
| Hebrew (he) | rtl | horizontal-tb | rtl | |
Note: --vertical option is only valid for Japanese (ja) and Chinese (zh) targets. It will be ignored for other languages.
See references/layout_conversion.md for complete conversion scripts.
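The layout table maps directly onto a lookup. The sketch below is hypothetical Python that mirrors the table and the `sed` rewrite of `writing-mode` declarations. It is not the skill's conversion script. `fa` is treated as RTL, following the Phase 3 table:

```python
import re
from typing import NamedTuple


class Layout(NamedTuple):
    page_direction: str  # OPF spine page-progression-direction
    writing_mode: str    # CSS writing-mode
    text_direction: str  # CSS direction


RTL_LANGS = {"ar", "he", "fa"}
VERTICAL_LANGS = {"ja", "zh"}


def target_layout(target_lang: str, vertical: bool = False) -> Layout:
    """Resolve the table row; --vertical only has an effect for ja/zh."""
    if target_lang in RTL_LANGS:
        return Layout("rtl", "horizontal-tb", "rtl")
    if vertical and target_lang in VERTICAL_LANGS:
        return Layout("rtl", "vertical-rl", "ltr")
    return Layout("ltr", "horizontal-tb", "ltr")


# Also matches -webkit-/-epub- prefixed forms, since they end in "writing-mode:".
WRITING_MODE = re.compile(r"(writing-mode:\s*)(?:horizontal-tb|vertical-rl)")


def convert_css(css: str, layout: Layout) -> str:
    """Rewrite every writing-mode declaration to the target mode."""
    return WRITING_MODE.sub(lambda m: m.group(1) + layout.writing_mode, css)


css = "body { -epub-writing-mode: vertical-rl; writing-mode:vertical-rl; }"
print(convert_css(css, target_layout("ko")))
```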
Custom Dictionary (Optional)
The translator works without external dictionary files. It naturally translates based on context.
Use custom dictionaries ONLY for:
- Proper nouns: names, places, organizations, brands
- Document-specific terms: proprietary terms unique to this document
Do NOT add common words - let the translator handle them naturally.
Creating a Custom Dictionary
See assets/template.json for format:
```json
{
  "proper_nouns": { "names": { "田中太郎": "Tanaka Taro" } },
  "domain_terms": { "ProprietaryTech": "고유 기술명" }
}
```
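How the dictionary reaches the translator is not specified here. One plausible approach, sketched under that assumption with hypothetical helper names, flattens the nested categories into a source-to-target map. It then includes only the terms that occur in each chunk, so the prompt stays small:

```python
import json
from typing import Dict


def flatten_dictionary(node: dict) -> Dict[str, str]:
    """Collapse nested categories (proper_nouns.names, domain_terms, ...) into term -> translation.

    Assumes every non-dict leaf is a term mapping, as in assets/template.json.
    """
    flat: Dict[str, str] = {}
    for key, value in node.items():
        if isinstance(value, dict):
            flat.update(flatten_dictionary(value))  # category level: recurse
        else:
            flat[key] = value  # leaf: source term -> translation
    return flat


def load_dictionary(path: str) -> Dict[str, str]:
    with open(path, encoding="utf-8") as fh:
        return flatten_dictionary(json.load(fh))


def glossary_for(chunk: str, terms: Dict[str, str]) -> str:
    """Only the terms that appear in this chunk, as 'source => target' lines."""
    return "\n".join(f"{src} => {dst}" for src, dst in terms.items() if src in chunk)


template = {
    "proper_nouns": {"names": {"田中太郎": "Tanaka Taro"}},
    "domain_terms": {"ProprietaryTech": "고유 기술명"},
}
print(glossary_for("田中太郎は言った。", flatten_dictionary(template)))
```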
Academic/Technical Template
For academic or technical documents, use assets/template_academic.json.
Work Directory Structure
```
$WORK_DIR/
├── manifest.json          # Task manifest
├── extracted/             # Extracted EPUB contents
├── sections/              # Split large files
├── translated/            # Translated files
├── validation/            # Validation input/output files
│   ├── validation_manifest.json
│   ├── validate_001_input.txt
│   ├── validate_001_result.json
│   └── ...
├── status/                # Task status files
└── logs/                  # Log files
```
Status Codes
| Status | Meaning |
|---|---|
| pending | Not started |
| in_progress | Being translated |
| completed | Done |
| failed | Error occurred |
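The following hypothetical sketch models the status codes as a Python enum, with a helper to tally tasks. Where statuses are stored (for example, files under `status/`) and which transitions are allowed are not specified here, so neither is modeled:

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class Status(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


def summarize(statuses: Iterable[str]) -> Dict[Status, int]:
    """Count tasks per status; an unknown status string raises ValueError."""
    counts = Counter(Status(s) for s in statuses)
    return {status: counts.get(status, 0) for status in Status}


def all_finished(summary: Dict[Status, int]) -> bool:
    """True when nothing is pending or in progress (tasks are completed or failed)."""
    return summary[Status.PENDING] == 0 and summary[Status.IN_PROGRESS] == 0


summary = summarize(["completed", "failed", "completed"])
print({s.value: n for s, n in summary.items()}, all_finished(summary))
```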
Error Handling
| Error | Action |
|---|---|
| Extraction failure | Skip corrupted file |
| Translation timeout | Split further, retry |
| XML error | Attempt fix, report |
| Remaining source text | Re-translate or manual review |
| Low quality score | Review samples, re-translate if needed |
File Reference
| Path | Description |
|---|---|
| SKILL.md | This file |
| references/orchestrator.md | Detailed orchestrator instructions |
| references/translator_*.md | Language-specific translator prompts |
| references/translator_metadata.md | Metadata and TOC translation instructions |
| references/layout_conversion.md | Writing direction and layout conversion guide |
| references/validator_generic.md | Generic validation instructions |
| references/validator_ko.md | Korean-specific validation instructions |
| scripts/analyze_epub.py | EPUB analysis (configurable splitting) |
| scripts/split_xhtml.py | File splitting |
| scripts/merge_xhtml.py | Section merging |
| scripts/verify.py | Source text verification |
| scripts/extract_for_validation.py | Token-efficient text extraction for LLM validation |
| scripts/package_epub.sh | EPUB packaging |
| assets/template.json | Dictionary template |
| assets/template_academic.json | Academic dictionary template |