
epub-translator

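Translates EPUB ebook files between languages with parallel processing. Supports Japanese, English, Chinese, and many other languages. Large files can be split by chapter, multiple volumes can be managed at once, and the EPUB structure and formatting are fully preserved. Includes built-in translation quality validation. Suitable for novels, books, and any other EPUB content.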

SKILL.md
--- frontmatter
name: epub-translator
description: Translates EPUB ebook files between languages with parallel processing. Supports Japanese, English, Chinese, and other languages. Handles large files by splitting into sections, manages multiple volumes simultaneously, and preserves EPUB structure and formatting. Includes translation quality validation. Use when translating novels, books, or any EPUB content.
compatibility: Requires Python 3.8+, zip/unzip commands. Optional epubcheck for validation.
allowed-tools: Read Write Edit Bash(python3:*) Bash(bash:*) Bash(mkdir:*) Bash(find:*) Bash(echo:*) Bash(wc:*) Bash(cat:*) Bash(ls:*)
metadata:
  author: Haze Lee
  version: "1.0.0"
  category: translation

EPUB Translation Skill

Translate EPUB files between any language pair, with optimized support for Japanese-to-Korean and English-to-Korean translation.

When to Use This Skill

Use this skill when:

  • User wants to translate an EPUB ebook to another language
  • User mentions translating Japanese/English/Chinese novels or books
  • User has multiple EPUB files to translate in batch
  • User needs to preserve EPUB formatting and structure during translation

Usage

bash
/epub-translator <epub_path> [options]

Arguments

  • <epub_path>: EPUB file or directory containing EPUBs

Options

| Option | Description | Default |
| --- | --- | --- |
| --source-lang | Source language code | ja |
| --target-lang | Target language code | ko |
| --dict | Custom dictionary (JSON) | none |
| --output-dir | Output directory | ./translated |
| --parallel | Concurrent agents | 5 |
| --split-threshold | File size for splitting (KB) | 30 |
| --split-parts | Parts to split large files | 4 |
| --high-quality | Use Opus model for translation | false |
| --vertical | Output vertical writing (ja/zh only) | false |

Language Codes

ja (Japanese), en (English), ko (Korean), zh (Chinese), es (Spanish), fr (French), de (German), ru (Russian), ar (Arabic), or any ISO 639-1 code.
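
The options and defaults above can be mirrored in a small argument parser. The following is a minimal sketch, not the skill's actual entry point: `build_parser`, `parse_options`, and the `dict_path` attribute are hypothetical names, and only the flag names and default values come from the table.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser(prog="epub-translator")
    p.add_argument("epub_path", help="EPUB file or directory containing EPUBs")
    p.add_argument("--source-lang", default="ja")
    p.add_argument("--target-lang", default="ko")
    p.add_argument("--dict", dest="dict_path", default=None,
                   help="Custom dictionary (JSON)")
    p.add_argument("--output-dir", default="./translated")
    p.add_argument("--parallel", type=int, default=5)
    p.add_argument("--split-threshold", type=int, default=30,
                   help="File size for splitting (KB)")
    p.add_argument("--split-parts", type=int, default=4)
    p.add_argument("--high-quality", action="store_true")
    p.add_argument("--vertical", action="store_true")
    return p

def parse_options(argv):
    """Parse argv and drop --vertical when the target is not ja or zh."""
    args = build_parser().parse_args(argv)
    if args.vertical and args.target_lang not in ("ja", "zh"):
        args.vertical = False  # documented as ignored for other targets
    return args

opts = parse_options(["/books/novel.epub", "--source-lang", "en",
                      "--target-lang", "ja", "--vertical"])
print(opts.source_lang, opts.target_lang, opts.vertical, opts.parallel)
```

Normalizing `--vertical` at parse time keeps the later layout step from having to re-check the target language.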

Examples

bash
# Japanese novel to Korean (default)
/epub-translator "/books/novel.epub"

# English to Korean
/epub-translator "/books/english.epub" --source-lang en

# Japanese to English
/epub-translator "/books/jp_novel.epub" --source-lang ja --target-lang en

# High-quality translation using Opus model
/epub-translator "/books/important.epub" --high-quality

# Batch with larger split threshold (less splitting)
/epub-translator "/books/" --split-threshold 50 --parallel 10

# More aggressive splitting for slower connections
/epub-translator "/books/large.epub" --split-threshold 20 --split-parts 6

# English to Japanese with vertical writing (우종서/縦書き)
/epub-translator "/books/novel.epub" --source-lang en --target-lang ja --vertical

# Korean to Chinese with vertical writing
/epub-translator "/books/korean.epub" --source-lang ko --target-lang zh --vertical

Architecture

mermaid
graph TB
    O["ORCHESTRATOR<br/>• Analyzes EPUBs and creates task manifest<br/>• Spawns parallel translator agents (foreground)<br/>• Collects results directly from agent responses<br/>• Validates translation quality<br/>• Handles retries and error recovery"]
    T1["Translator<br/>Agent 1"]
    T2["Translator<br/>Agent 2"]
    TN["Translator<br/>Agent N"]

    O --> T1
    O --> T2
    O --> TN

Key Constraint: Sub-agents do NOT have Task tool access. They use only Read, Edit, Write, and Bash.

Execution Model: All Task agents run in foreground mode (not background). Multiple Tasks can be spawned in a single message for parallel execution, but results are collected synchronously.


Model Selection

Default (no flags)

| Task | Model |
| --- | --- |
| Content translation | Sonnet |
| Metadata/TOC | Haiku |
| Validation | Haiku |

With --high-quality

| Task | Model |
| --- | --- |
| Content translation | Opus |
| Metadata/TOC | Sonnet |
| Validation | Sonnet |

Automatic Upgrade

  • If quality score < 70: Re-translate flagged files with Opus
  • If translation fails: Retry with upgraded model
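
The model tables and upgrade rules above amount to a small lookup. The sketch below is illustrative only: the function names and task keys are hypothetical, and reading "upgraded model" as "the next tier up" is an assumption rather than something the source spells out.

```python
# Model tiers in ascending order, using the names from the tables above.
TIERS = ["haiku", "sonnet", "opus"]

DEFAULT_MODELS = {"content": "sonnet", "metadata": "haiku", "validation": "haiku"}
HIGH_QUALITY_MODELS = {"content": "opus", "metadata": "sonnet", "validation": "sonnet"}

def select_model(task: str, high_quality: bool = False) -> str:
    """Pick the model for a task type, honoring --high-quality."""
    table = HIGH_QUALITY_MODELS if high_quality else DEFAULT_MODELS
    return table[task]

def upgrade_model(model: str) -> str:
    """Return the next tier up (assumption); opus has nowhere to go."""
    index = TIERS.index(model)
    return TIERS[min(index + 1, len(TIERS) - 1)]

def model_for_retranslation(quality_score: float, current: str) -> str:
    """Scores below 70 trigger re-translation with Opus, per the rule above."""
    return "opus" if quality_score < 70 else current

print(select_model("content"), select_model("metadata", high_quality=True))
print(upgrade_model("haiku"), model_for_retranslation(65, "sonnet"))
```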

Execution Workflow

Phase 1: Analysis

  1. Create work directory:

    bash
    WORK_DIR="/tmp/epub_translate_$(date +%s)"
    mkdir -p "$WORK_DIR"/{extracted,sections,translated,status,logs}
    
  2. Analyze EPUBs (with configurable split threshold):

    bash
    python3 scripts/analyze_epub.py \
      --epub "{EPUB_PATH}" \
      --work-dir "$WORK_DIR" \
      --source-lang "{SOURCE_LANG}" \
      --target-lang "{TARGET_LANG}" \
      --split-threshold 30 \
      --split-parts 4
    
  3. Review $WORK_DIR/manifest.json for task count.

Phase 2: Translation

  1. Select translator prompt from references/:

    • Japanese: translator_ja.md
    • English: translator_en.md
    • Other: translator_generic.md
  2. Spawn Task agents in foreground mode (batched):

    • model: "sonnet" (or "opus" when --high-quality is set)
    • CRITICAL: Multiple Tasks in single message for parallel execution
    • Process in batches of --parallel count (default: 5)
    • Results are returned directly - no status file monitoring needed
  3. Batch execution pattern:

    code
    For each batch of N tasks:
      - Spawn N Task agents in a single message (foreground, parallel)
      - Collect results directly from agent responses
      - Track completed/failed tasks
      - Proceed to next batch
    
  4. Retry failed tasks (max 2 attempts, upgrade to opus if persistent)
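
The batch-and-retry pattern above can be sketched as ordinary control flow. This is a simplified illustration, not the orchestrator itself: `run_batch` is a hypothetical callable standing in for "spawn N Task agents in one message and collect their responses", and the choice to switch to opus only on the final retry is an assumption.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def run_all(tasks, run_batch, parallel=5, max_retries=2):
    """Run tasks in batches; retry failures up to `max_retries` times.

    run_batch(task_ids, model) must return a dict of task_id -> bool (success).
    Returns (completed, failed) lists of task ids.
    """
    completed = []
    pending = list(tasks)
    for attempt in range(max_retries + 1):
        if not pending:
            break
        # Assumption: only the last retry upgrades the model to opus.
        model = "opus" if attempt > 0 and attempt == max_retries else "sonnet"
        still_failing = []
        for batch in batches(pending, parallel):
            results = run_batch(batch, model)  # one message, N parallel agents
            for task_id in batch:
                if results.get(task_id):
                    completed.append(task_id)
                else:
                    still_failing.append(task_id)
        pending = still_failing
    return completed, pending

def fake_run_batch(task_ids, model):
    """Stand-in runner: "ch03" only succeeds once the model is opus."""
    return {t: (t != "ch03" or model == "opus") for t in task_ids}

done, failed = run_all([f"ch{i:02d}" for i in range(1, 8)], fake_run_batch, parallel=3)
print(len(done), failed)
```

Because each batch returns its results directly, the loop needs no polling or status files; it only tracks which task ids are still pending.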

Phase 3: Finalization

  1. Merge split files:

    bash
    python3 scripts/merge_xhtml.py --work-dir "$WORK_DIR" --manifest manifest.json
    
  2. Translate metadata and navigation (LLM-based):

    • Spawn metadata translation agent using translator_metadata.md
    • Translate: toc.ncx, nav.xhtml, content.opf (title, author, description)
    • Translate: cover.xhtml, titlepage.xhtml (if present)
    • Ensure TOC entries match translated chapter headings
  3. Apply layout conversion (CRITICAL - must be done before packaging):

    Determine conversion type based on target language and --vertical option:

    | Target Language | --vertical | Result |
    | --- | --- | --- |
    | ko, en, etc. | (ignored) | horizontal-tb, ltr |
    | ja, zh | false (default) | horizontal-tb, ltr |
    | ja, zh | true | vertical-rl, rtl (우종서/縦書き) |
    | ar, he, fa | (ignored) | horizontal-tb, rtl |

    A. Horizontal output (default for all languages):

    bash
    TRANSLATED_DIR="$WORK_DIR/translated/{VOLUME_ID}"
    
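    # Note: `sed -i ''` is the BSD/macOS form; with GNU sed (Linux), use `sed -i` with no empty argument.
    # The same applies to every sed call in this step.
    # Convert CSS files: vertical-rl → horizontal-tb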
    find "$TRANSLATED_DIR" -name "*.css" -exec sed -i '' \
        -e 's/writing-mode:[[:space:]]*vertical-rl/writing-mode: horizontal-tb/g' \
        -e 's/-webkit-writing-mode:[[:space:]]*vertical-rl/-webkit-writing-mode: horizontal-tb/g' \
        -e 's/-epub-writing-mode:[[:space:]]*vertical-rl/-epub-writing-mode: horizontal-tb/g' \
        {} \;
    
    # Convert content.opf: page direction and writing mode
    find "$TRANSLATED_DIR" -name "content.opf" -exec sed -i '' \
        -e 's/page-progression-direction="rtl"/page-progression-direction="ltr"/g' \
        -e 's/primary-writing-mode" content="vertical-rl"/primary-writing-mode" content="horizontal-tb"/g' \
        {} \;
    
    # Convert XHTML inline styles if present
    find "$TRANSLATED_DIR" -name "*.xhtml" -exec sed -i '' \
        -e 's/writing-mode:[[:space:]]*vertical-rl/writing-mode: horizontal-tb/g' \
        {} \;
    

    B. Vertical output (only when --vertical AND target is ja/zh):

    bash
    TRANSLATED_DIR="$WORK_DIR/translated/{VOLUME_ID}"
    
    # Convert CSS files: horizontal-tb → vertical-rl
    find "$TRANSLATED_DIR" -name "*.css" -exec sed -i '' \
        -e 's/writing-mode:[[:space:]]*horizontal-tb/writing-mode: vertical-rl/g' \
        -e 's/-webkit-writing-mode:[[:space:]]*horizontal-tb/-webkit-writing-mode: vertical-rl/g' \
        -e 's/-epub-writing-mode:[[:space:]]*horizontal-tb/-epub-writing-mode: vertical-rl/g' \
        {} \;
    
    # Convert content.opf: page direction and writing mode for vertical
    find "$TRANSLATED_DIR" -name "content.opf" -exec sed -i '' \
        -e 's/page-progression-direction="ltr"/page-progression-direction="rtl"/g' \
        -e 's/primary-writing-mode" content="horizontal-tb"/primary-writing-mode" content="vertical-rl"/g' \
        {} \;
    
    # Convert XHTML inline styles if present
    find "$TRANSLATED_DIR" -name "*.xhtml" -exec sed -i '' \
        -e 's/writing-mode:[[:space:]]*horizontal-tb/writing-mode: vertical-rl/g' \
        {} \;
    

    C. RTL output (for ar/he/fa targets):

    bash
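    TRANSLATED_DIR="$WORK_DIR/translated/{VOLUME_ID}"

    # Convert page direction (content.opf may sit in a subdirectory such as OEBPS/)
    find "$TRANSLATED_DIR" -name "content.opf" -exec sed -i '' \
        -e 's/page-progression-direction="ltr"/page-progression-direction="rtl"/g' \
        {} \;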
    
    # Convert CSS direction
    find "$TRANSLATED_DIR" -name "*.css" -exec sed -i '' \
        -e 's/direction:[[:space:]]*ltr/direction: rtl/g' \
        {} \;
    

    Note: If source is already vertical and --vertical is set, skip CSS conversion (keep existing vertical layout).

    See references/layout_conversion.md for complete conversion patterns.

  4. Verify source text removed:

    bash
    python3 scripts/verify.py --work-dir "$WORK_DIR" --source-lang "{SOURCE_LANG}"
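
One way to picture the check for leftover source text is a scan for characters from the source script. The sketch below is an assumption about the general technique and does not reproduce scripts/verify.py: the `SCRIPT_PATTERNS` table and `leftover_source_lines` are hypothetical, and only Japanese (kana) and Korean (Hangul) are covered, because Han characters are shared between Japanese and Chinese and cannot be attributed this simply.

```python
import re

# Character ranges that strongly indicate a given source language.
SCRIPT_PATTERNS = {
    "ja": re.compile(r"[\u3041-\u3096\u30A1-\u30FA]"),  # hiragana and katakana letters
    "ko": re.compile(r"[\uAC00-\uD7A3]"),               # Hangul syllables
}
TAG = re.compile(r"<[^>]+>")  # crude tag stripper, enough for a sketch

def leftover_source_lines(xhtml: str, source_lang: str):
    """Return (line_number, text) pairs that still contain source-script characters."""
    pattern = SCRIPT_PATTERNS.get(source_lang)
    if pattern is None:
        return []  # no script-based check available for this language
    hits = []
    for lineno, line in enumerate(xhtml.splitlines(), start=1):
        text = TAG.sub("", line).strip()
        if pattern.search(text):
            hits.append((lineno, text))
    return hits

sample = "<p>안녕하세요</p>\n<p>これは残っています</p>"
print(leftover_source_lines(sample, "ja"))
```

Lines reported this way are candidates for the "Re-translate or manual review" action; a Latin-script source such as English would need a different heuristic.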
    

Phase 4: Quality Validation (LLM-Based)

  1. Extract text for validation (token-efficient format):

    bash
    python3 scripts/extract_for_validation.py \
      --dir "$WORK_DIR/translated" \
      --output-dir "$WORK_DIR/validation" \
      --max-tokens 8000
    
  2. Select validator prompt from references/:

    • Korean target: validator_ko.md (extends validator_generic.md)
    • Other targets: validator_generic.md
  3. Spawn validation Task agents in foreground mode (batched):

    • Read $WORK_DIR/validation/validation_manifest.json
    • For each chunk, spawn a validator agent with:
      • model: "haiku" (sufficient for validation)
    • CRITICAL: Multiple Tasks in single message for parallel execution
    • Process in batches, collect results directly
  4. Aggregate results:

    • Collect validation results from agent responses
    • Calculate average quality score
    • Identify files flagged for re-translation
  5. If average score < 70: Re-translate flagged files with model: "opus"
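
The aggregation step can be sketched as follows. The result schema is an assumption: the keys `file`, `score`, and `flagged` are hypothetical, since the source does not show what a validator returns. Only the threshold of 70 comes from the workflow above.

```python
from statistics import mean

RETRANSLATE_THRESHOLD = 70  # average score below this triggers re-translation

def aggregate(results, threshold=RETRANSLATE_THRESHOLD):
    """Summarize validator results into an average and a re-translation list.

    Each result is assumed to look like
    {"file": "ch01.xhtml", "score": 82, "flagged": False}.
    """
    scores = [r["score"] for r in results]
    average = mean(scores) if scores else 0.0
    flagged = sorted({r["file"] for r in results if r.get("flagged")})
    return {
        "average": average,
        "flagged": flagged,
        "retranslate": bool(flagged) and average < threshold,
    }

report = aggregate([
    {"file": "ch01.xhtml", "score": 82, "flagged": False},
    {"file": "ch02.xhtml", "score": 55, "flagged": True},
    {"file": "ch03.xhtml", "score": 64, "flagged": True},
])
print(report)
```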

Phase 5: Packaging

  1. Package EPUB:

    bash
    bash scripts/package_epub.sh "$WORK_DIR" "{OUTPUT_DIR}"
    
  2. Generate final report with quality metrics


File Splitting Configuration

Conservative defaults prevent context overflow in translation agents:

| Setting | Default | Description |
| --- | --- | --- |
| split-threshold | 30 KB | Files larger than this are split |
| split-parts | 4 | Number of sections per large file |

Tuning Guidelines

  • Slow connection / Timeouts: Lower threshold (20 KB), more parts (6)
  • Fast connection / Large context: Higher threshold (50 KB), fewer parts (3)
  • Very large files (100KB+): Will be split into more parts automatically
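
The threshold and part count interact roughly as sketched below. This is an illustration under stated assumptions, not the logic of scripts/analyze_epub.py or scripts/split_xhtml.py: `plan_split` and `split_paragraphs` are hypothetical helpers, 1 KB is taken as 1024 bytes, sections are cut at paragraph boundaries, and the automatic increase for very large files is left out because the source does not state its rule.

```python
def plan_split(size_bytes: int, threshold_kb: int = 30, parts: int = 4) -> int:
    """Return how many sections a file should become (1 means no split)."""
    if size_bytes <= threshold_kb * 1024:
        return 1
    return parts

def split_paragraphs(paragraphs, parts):
    """Divide paragraphs into `parts` contiguous groups of near-equal length."""
    parts = max(1, min(parts, len(paragraphs) or 1))
    base, extra = divmod(len(paragraphs), parts)
    groups, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)  # spread the remainder
        groups.append(paragraphs[start:end])
        start = end
    return groups

print(plan_split(20 * 1024), plan_split(45 * 1024))
print(split_paragraphs(list(range(10)), 4))
```

Keeping groups contiguous matters here because the sections are merged back in order during finalization.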

Quality Validation (LLM-Based)

Translation quality is validated by LLM sub-agents, not regex patterns. This provides:

  • Context-aware naturalness assessment
  • Understanding of literary style and tone
  • Detection of subtle translation issues

Validator Instructions

| Target Language | Primary Instruction | Base Instruction |
| --- | --- | --- |
| Korean | validator_ko.md | validator_generic.md |
| Other | validator_generic.md | - |

Korean-Specific Checks

  • Translationese (번역투): ~하는 것이다, ~라고 하는, etc.
  • Pronoun overuse: Excessive 그녀는, 그는
  • Particle chains: Awkward 의의의 patterns
  • Honorific consistency: Speech level matching

Quality Score

  • 90-100: Excellent - reads naturally
  • 75-89: Good - minor issues
  • 60-74: Acceptable - review recommended
  • <60: Poor - re-translation needed
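
The score bands above map directly onto a small classifier. This is a sketch with a hypothetical function name; the band boundaries are the ones listed, with fractional scores assigned to the band whose lower bound they meet.

```python
def score_band(score: float) -> str:
    """Map a 0-100 quality score to the band names used above."""
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 60:
        return "Acceptable"
    return "Poor"

for value in (95, 80, 67, 42):
    print(value, score_band(value))
```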

Validation Workflow

  1. Text extracted in token-efficient format
  2. Chunked for parallel validation (8000 tokens each)
  3. LLM validators spawned in foreground batches
  4. Results collected directly from agent responses
  5. Results aggregated into final report

Language-Specific Processing

Source Language Handling

| Source | Special Handling |
| --- | --- |
| Japanese | Remove ruby tags, handle vertical writing |
| Chinese | Handle traditional/simplified, remove pinyin |
| Arabic/Hebrew | Handle RTL text direction |
| English | Standard processing |

Layout Conversion (Target-Based)

Key Principle: All languages default to horizontal LTR (except RTL languages).

| Target Language | Page Direction | Writing Mode | Text Direction | Notes |
| --- | --- | --- | --- | --- |
| Korean (ko) | ltr | horizontal-tb | ltr | |
| English (en) | ltr | horizontal-tb | ltr | |
| Japanese (ja) | ltr | horizontal-tb | ltr | Default |
| Japanese (ja) + --vertical | rtl | vertical-rl | ltr | 縦書き (우종서) |
| Chinese (zh) | ltr | horizontal-tb | ltr | Default |
| Chinese (zh) + --vertical | rtl | vertical-rl | ltr | 縱排 (우종서) |
| Arabic (ar) | rtl | horizontal-tb | rtl | |
| Hebrew (he) | rtl | horizontal-tb | rtl | |

Note: --vertical option is only valid for Japanese (ja) and Chinese (zh) targets. It will be ignored for other languages.

See references/layout_conversion.md for complete conversion scripts.
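
The target-based rules can be expressed as a lookup plus a single regular-expression rewrite. The sketch below is an illustrative alternative to the sed commands and is not taken from references/layout_conversion.md: `resolve_layout` and `apply_writing_mode` are hypothetical names, and including `fa` among the RTL targets follows the Phase 3 conversion table.

```python
import re

RTL_LANGS = {"ar", "he", "fa"}
VERTICAL_LANGS = {"ja", "zh"}

def resolve_layout(target_lang: str, vertical: bool = False) -> dict:
    """Return page direction, writing mode, and text direction for a target."""
    if target_lang in RTL_LANGS:
        return {"page_direction": "rtl", "writing_mode": "horizontal-tb",
                "text_direction": "rtl"}
    if vertical and target_lang in VERTICAL_LANGS:
        return {"page_direction": "rtl", "writing_mode": "vertical-rl",
                "text_direction": "ltr"}
    # Everything else, including ja/zh without --vertical, is horizontal LTR.
    return {"page_direction": "ltr", "writing_mode": "horizontal-tb",
            "text_direction": "ltr"}

# Matches writing-mode and its -webkit-/-epub- prefixed variants.
_WRITING_MODE = re.compile(
    r"((?:-webkit-|-epub-)?writing-mode:\s*)(?:vertical-rl|horizontal-tb)"
)

def apply_writing_mode(css: str, writing_mode: str) -> str:
    """Rewrite every writing-mode declaration in `css` to `writing_mode`."""
    return _WRITING_MODE.sub(lambda m: m.group(1) + writing_mode, css)

layout = resolve_layout("ko")
css = "body { -webkit-writing-mode: vertical-rl; writing-mode:vertical-rl; }"
print(apply_writing_mode(css, layout["writing_mode"]))
```

Doing the rewrite in Python sidesteps the difference between BSD and GNU `sed -i`, at the cost of reading and writing each file explicitly.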


Custom Dictionary (Optional)

The translator works without external dictionary files. It naturally translates based on context.

Use custom dictionaries ONLY for:

  • Proper nouns: names, places, organizations, brands
  • Document-specific terms: proprietary terms unique to this document

Do NOT add common words - let the translator handle them naturally.

Creating a Custom Dictionary

See assets/template.json for format:

json
{
  "proper_nouns": { "names": { "田中太郎": "Tanaka Taro" } },
  "domain_terms": { "ProprietaryTech": "고유 기술명" }
}
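
A nested dictionary like the one above can be flattened into a single source-to-target glossary before it is handed to a translator prompt. This is a sketch of one possible approach: `flatten_glossary` is a hypothetical helper, and treating every string leaf as a glossary entry is an assumption about the template format.

```python
import json

def flatten_glossary(node: dict, out: dict = None) -> dict:
    """Collect every source -> target string pair from a nested dictionary."""
    if out is None:
        out = {}
    for key, value in node.items():
        if isinstance(value, dict):
            flatten_glossary(value, out)  # descend into category groups
        else:
            out[key] = value
    return out

sample = json.loads("""{
  "proper_nouns": { "names": { "田中太郎": "Tanaka Taro" } },
  "domain_terms": { "ProprietaryTech": "고유 기술명" }
}""")
glossary = flatten_glossary(sample)
print(glossary)
```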

Academic/Technical Template

For academic or technical documents, use assets/template_academic.json.


Work Directory Structure

code
$WORK_DIR/
├── manifest.json           # Task manifest
├── extracted/              # Extracted EPUB contents
├── sections/               # Split large files
├── translated/             # Translated files
├── validation/             # Validation input/output files
│   ├── validation_manifest.json
│   ├── validate_001_input.txt
│   ├── validate_001_result.json
│   └── ...
├── status/                 # Task status files
└── logs/                   # Log files
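
For callers that prefer Python to the shell `mkdir` in Phase 1, the same layout can be created as sketched below. `create_work_dir` is a hypothetical helper; it creates only the five subdirectories named in the Phase 1 command, on the assumption that `validation/` is produced later by the extraction step.

```python
import time
from pathlib import Path

SUBDIRS = ("extracted", "sections", "translated", "status", "logs")

def create_work_dir(base: str = "/tmp") -> Path:
    """Create a timestamped work directory with the Phase 1 subdirectories."""
    work_dir = Path(base) / f"epub_translate_{int(time.time())}"
    for name in SUBDIRS:
        (work_dir / name).mkdir(parents=True, exist_ok=True)
    return work_dir
```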

Status Codes

| Status | Meaning |
| --- | --- |
| pending | Not started |
| in_progress | Being translated |
| completed | Done |
| failed | Error occurred |

Error Handling

| Error | Action |
| --- | --- |
| Extraction failure | Skip corrupted file |
| Translation timeout | Split further, retry |
| XML error | Attempt fix, report |
| Remaining source text | Re-translate or manual review |
| Low quality score | Review samples, re-translate if needed |

File Reference

| Path | Description |
| --- | --- |
| SKILL.md | This file |
| references/orchestrator.md | Detailed orchestrator instructions |
| references/translator_*.md | Language-specific translator prompts |
| references/translator_metadata.md | Metadata and TOC translation instruction |
| references/layout_conversion.md | Writing direction and layout conversion guide |
| references/validator_generic.md | Generic validation instruction |
| references/validator_ko.md | Korean-specific validation instruction |
| scripts/analyze_epub.py | EPUB analysis (configurable splitting) |
| scripts/split_xhtml.py | File splitting |
| scripts/merge_xhtml.py | Section merging |
| scripts/verify.py | Source text verification |
| scripts/extract_for_validation.py | Token-efficient text extraction for LLM validation |
| scripts/package_epub.sh | EPUB packaging |
| assets/template.json | Dictionary template |
| assets/template_academic.json | Academic dictionary template |