Book Converter Skill
Convert EPUB books into professionally formatted Markdown books with AI-assisted quality improvements.
Overview
This skill converts EPUB files into high-quality Markdown documents by:
- Using pandoc to extract raw Markdown from the EPUB
- Creating a structured project directory
- Planning and executing AI-driven formatting fixes
- Producing chapter-by-chapter formatted output
- Generating a merged book file with a Table of Contents
Quick Start
User provides an EPUB file path:
/Users/username/Downloads/Book.Name.2024.epub
Execute the conversion workflow:
python3 scripts/convert_book.py "/path/to/book.epub"
This starts Phase 1 (setup and extraction). The remaining phases follow the workflow below.
Workflow
CRITICAL: Use subagents for all formatting work to avoid polluting the main context.
Phase 1: Setup and Extraction (Main Agent)
Run the conversion script:
python3 scripts/convert_book.py "/path/to/book.epub"
This script:
- Verifies that the EPUB file exists
- Creates the project structure:
  - books/book-name/ - Main directory
  - books/book-name/raw/ - Pandoc output
  - books/book-name/chapters/ - Formatted chapters
  - books/book-name/images/ - Extracted images
- Runs pandoc to extract Markdown
- Copies the formatting standards (references/) and the helper scripts (analyze_structure.py, merge_book.py) to the project directory
Output: Raw Markdown in books/book-name/raw/book-parsed.md
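The Phase 1 steps above can be sketched in a few lines of Python. This is a minimal sketch, not the actual convert_book.py. The slug rule and the names slugify, create_project, pandoc_command and run_phase1 are hypothetical, and copying the references and scripts is omitted. The pandoc options used (-f epub, -t markdown, --extract-media, -o) are standard pandoc flags.

```python
import re
import shutil
import subprocess
from pathlib import Path


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def create_project(epub_path: Path, books_root: Path) -> Path:
    """Verify the EPUB exists, then create raw/, chapters/ and images/."""
    if not epub_path.is_file():
        raise FileNotFoundError(f"EPUB not found: {epub_path}")
    project = books_root / slugify(epub_path.stem)
    for sub in ("raw", "chapters", "images"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


def pandoc_command(epub_path: Path, project: Path) -> list:
    """Build the pandoc call: EPUB in, Markdown out, media extracted under images/."""
    return [
        "pandoc", str(epub_path),
        "-f", "epub", "-t", "markdown",
        f"--extract-media={project / 'images'}",
        "-o", str(project / "raw" / "book-parsed.md"),
    ]


def run_phase1(epub_path: Path, books_root: Path) -> Path:
    """Create the project and run pandoc (requires pandoc on PATH)."""
    project = create_project(epub_path, books_root)
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(epub_path, project), check=True)
    return project
```

With this slug rule, /Users/username/Downloads/Book.Name.2024.epub would map to books/book-name-2024/. The real script may choose a different directory name.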
Phase 2: Analysis and Planning (Script + Subagent)
Step 1: Run the structure analysis script (Main Agent):
python3 books/book-name/analyze_structure.py books/book-name
This script:
- Extracts all headers with line numbers
- Detects formatting issues by sampling
- Suggests chapter boundaries
- Creates a compact STRUCTURE_ANALYSIS.md report (~5-10 KB) that can be read instead of 35k+ lines of raw Markdown
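Header extraction with line numbers is the core of this analysis. This is a minimal sketch of that step. It assumes ATX (#) headers in the pandoc output and treats bold-only lines as candidate headers that lost their # syntax. extract_headers and find_bold_headers are hypothetical names, not the analyze_structure.py API.

```python
import re

# ATX header; an optional closing '#' run must be preceded by whitespace
ATX_HEADER = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
BOLD_ONLY_LINE = re.compile(r"^\*\*([^*]+)\*\*$")


def extract_headers(lines):
    """Return (line_number, level, text) for each ATX header; line numbers are 1-based."""
    headers = []
    in_fence = False
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # '#' comments inside code blocks are not headers
            continue
        match = None if in_fence else ATX_HEADER.match(line)
        if match:
            headers.append((number, len(match.group(1)), match.group(2)))
    return headers


def find_bold_headers(lines):
    """Flag lines consisting only of bold text: likely headers that lost their # syntax."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        match = BOLD_ONLY_LINE.match(line.strip())
        if match:
            flagged.append((number, match.group(1)))
    return flagged
```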
Step 2: Launch a general subagent to create mapping files:
```python
Task(
    subagent_type="general",
    description="Create chapter map and formatting plan",
    prompt="""Create CHAPTER_MAP.md and FORMATTING_PLAN.md:

1. Read books/book-name/STRUCTURE_ANALYSIS.md (concise report with headers and issues)
2. Read books/book-name/references/chapter-map-template.md for format
3. Read books/book-name/references/formatting-plan-template.md for format
4. Create books/book-name/CHAPTER_MAP.md:
   - Use suggested chapter boundaries from analysis
   - Verify line ranges make sense
   - Create proper slugged filenames
5. Create books/book-name/FORMATTING_PLAN.md:
   - Document issues found in analysis
   - Add severity and priority
   - Note book-specific patterns
6. Update books/book-name/progress.md to mark Phase 2 complete

Return: Summary of chapters found and major issues identified."""
)
```
Output: CHAPTER_MAP.md, FORMATTING_PLAN.md, and updated progress.md
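Two of the subagent's checks in step 4, "verify line ranges make sense" and "create proper slugged filenames", can be done mechanically. This sketch assumes 1-based, inclusive line ranges listed in book order and the chapter-01-title.md naming used in Phase 3. chapter_filename and check_ranges are hypothetical helpers.

```python
import re


def chapter_filename(number: int, title: str) -> str:
    """Build a slugged filename such as chapter-01-getting-started.md."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"chapter-{number:02d}-{slug}.md"


def check_ranges(ranges, total_lines):
    """Return problems found in (start, end) ranges: 1-based, inclusive, in book order."""
    problems = []
    previous_end = 0
    for index, (start, end) in enumerate(ranges, start=1):
        if start > end:
            problems.append(f"range {index}: start {start} is after end {end}")
        if start <= previous_end:
            problems.append(f"range {index}: overlaps or precedes the previous range")
        if end > total_lines:
            problems.append(f"range {index}: end {end} exceeds {total_lines} lines")
        previous_end = max(previous_end, end)
    return problems
```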
Phase 3: Chapter Formatting (Use Subagents)
For EACH chapter, launch a separate general subagent:
```python
# Example for Chapter 1
Task(
    subagent_type="general",
    description="Format Chapter 1",
    prompt="""Format Chapter 1 following the chapter formatting workflow.

**Critical Instructions:**
1. Read and follow ALL steps in books/book-name/references/chapter-workflow.md
2. Apply formatting rules from books/book-name/references/formatting-standards.md
3. Use books/book-name/CHAPTER_MAP.md to find line ranges for Chapter 1
4. Read books/book-name/FORMATTING_PLAN.md for known issues to watch for

**Workflow Summary (see chapter-workflow.md for complete details):**

Step 1: Read Standards and Chapter Map
- Read references/formatting-standards.md
- Read CHAPTER_MAP.md for your chapter's line ranges
- Read FORMATTING_PLAN.md for known issues

Step 2: Extract Chapter Content
- Extract Chapter 1 from raw/book-parsed.md using line ranges

Step 3: Identify Issues (following the standards)
- Headers using bold instead of #
- Shattered code blocks
- Split paragraphs
- Missing code language identifiers
- Emphasis artifacts [word]
- Corrupted footnotes
- Missing image alt text
- Broken links

Step 4: Apply Formatting Fixes
- Follow the three-pass approach in chapter-workflow.md:
  * First pass: Structure (headers, code blocks)
  * Second pass: Content (paragraphs, emphasis)
  * Third pass: Details (footnotes, images, links)

Step 5: Create Output File
- Write to books/book-name/chapters/chapter-01-title.md
- Use structure from chapter-workflow.md

Step 6: Update Progress
- Update books/book-name/progress.md with completion status
- Document fixes applied

**Quality Checklist (from chapter-workflow.md):**
- [ ] All headers use proper # syntax
- [ ] All code blocks have language identifiers
- [ ] No shattered code blocks remain
- [ ] Text flows naturally without mid-sentence breaks
- [ ] All footnotes have [^N] format with definitions
- [ ] Images have descriptive alt text

Return: Confirmation with summary of fixes applied."""
)
```
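Step 2 of the prompt needs only the chapter's own lines, so a subagent never has to load the whole raw file into context. This sketch assumes the line ranges in CHAPTER_MAP.md are 1-based and inclusive. extract_chapter is a hypothetical helper, not part of the shipped scripts.

```python
from itertools import islice
from pathlib import Path


def extract_chapter(raw_path: Path, start: int, end: int) -> str:
    """Return lines start..end (1-based, inclusive) of the raw pandoc output."""
    if start < 1 or end < start:
        raise ValueError(f"invalid line range {start}-{end}")
    with raw_path.open(encoding="utf-8") as handle:
        # islice streams the file, so only the chapter's lines are kept in memory
        return "".join(islice(handle, start - 1, end))
```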
Important:
- Launch subagents in parallel batches (3-5 at a time) for efficiency
- Each subagent must read chapter-workflow.md and formatting-standards.md
- Follow the systematic workflow to ensure consistent quality
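The batches can be planned up front. This sketch uses a batch size of 4, an arbitrary choice inside the suggested 3-5 range. The chapter list (front matter plus 15 chapters) is illustrative. The Task tool launches the subagents, so that step appears only as a comment.

```python
def batches(items, size=4):
    """Split chapters into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


chapters = ["Front Matter"] + [f"Chapter {n}" for n in range(1, 16)]
for wave, group in enumerate(batches(chapters, size=4), start=1):
    # Launch one formatting subagent per chapter in `group`, then wait for
    # the whole wave to finish before starting the next one.
    print(f"Wave {wave}: {', '.join(group)}")
```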
Output: Formatted chapters in books/book-name/chapters/
Phase 4: Book Assembly (Main Agent)
The merge_book.py script has already been copied to your project directory. Simply run it:
python3 books/book-name/merge_book.py books/book-name
The script will:
- Read CHAPTER_MAP.md for chapter order
- Load all formatted chapters from chapters/
- Extract headers for the Table of Contents
- Fix image paths so they resolve relative to the merged file's location
- Combine all chapters in order
- Generate a comprehensive TOC
- Write the output to books/book-name-book.md
Output: books/book-name-book.md with complete formatted book
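Two of the merge steps are TOC generation and image-path fixing. This sketch shows both, not merge_book.py's actual code. It approximates GitHub-style heading anchors and does not de-duplicate repeated titles. It assumes chapter image paths are relative to chapters/ and that the merged file sits in books/, next to the book-name/ project directory. anchor, build_toc and fix_image_paths are hypothetical names.

```python
import posixpath
import re

HEADER = re.compile(r"^(#{1,3})\s+(.+?)\s*$")
# Markdown image target that is not a URL and not an absolute path
IMAGE = re.compile(r"(!\[[^\]]*\]\()(?![a-z]+://|/)([^)\s]+)")


def anchor(title: str) -> str:
    """Approximate GitHub-style anchor (duplicate titles are not de-duplicated)."""
    cleaned = re.sub(r"[^\w\s-]", "", title.lower())
    return cleaned.strip().replace(" ", "-")


def build_toc(chapter_texts):
    """Return an indented TOC with one entry per level 1-3 header, in chapter order."""
    entries, in_fence = [], False
    for text in chapter_texts:
        for line in text.splitlines():
            if line.lstrip().startswith(("```", "~~~")):
                in_fence = not in_fence  # '#' lines inside code are not headers
                continue
            match = None if in_fence else HEADER.match(line)
            if match:
                indent = "  " * (len(match.group(1)) - 1)
                title = match.group(2)
                entries.append(f"{indent}- [{title}](#{anchor(title)})")
    return "\n".join(entries)


def fix_image_paths(text, chapter_dir="book-name/chapters"):
    """Rewrite image paths written relative to chapters/ so they resolve from books/."""
    def rewrite(match):
        target = posixpath.normpath(posixpath.join(chapter_dir, match.group(2)))
        return match.group(1) + target
    return IMAGE.sub(rewrite, text)
```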
Note: The merge script is reusable - no need to create it per book!
Critical: Chapter Formatting Requirements
Every subagent in Phase 3 MUST:
- Read chapter-workflow.md first - Contains the complete step-by-step process
- Read formatting-standards.md - Contains all formatting rules (678 lines)
- Follow the workflow systematically - Don't skip steps
- Use the three-pass approach:
  - First pass: Fix structure (headers, code blocks)
  - Second pass: Fix content (paragraphs, emphasis)
  - Third pass: Fix details (footnotes, images, links)
- Complete the quality checklist - Verify all items before finishing
Why this matters:
- Ensures consistent quality across all chapters
- Prevents common mistakes (skipped issues, inconsistent style)
- Proven on the Clean Code Collection (35k+ lines)
- Each chapter is formatted only once, so that pass must be thorough
The workflow documents are your complete instructions - trust them!
Subagent Usage Principles
Never process book content in the main context. Always use subagents to:
- Keep the main context clean: Book content is large and would pollute it
- Enable parallelization: Format multiple chapters simultaneously
- Isolate formatting work: Each chapter gets a fresh context
- Avoid token limits: Raw content can exceed context windows
Subagent Selection: Always use subagent_type="general" for all book processing tasks.
Progress Tracking
Create and maintain books/book-name/progress.md:
```markdown
# Book Name - Conversion Progress

## Phase 1: Setup ✓
- [x] EPUB extracted
- [x] Project structure created

## Phase 2: Planning ✓
- [x] Chapter map created (15 chapters identified)
- [x] Formatting plan documented

## Phase 3: Chapter Formatting (5/15 complete)
- [x] Front Matter
- [x] Chapter 1: Introduction
- [x] Chapter 2: Getting Started
- [x] Chapter 3: Advanced Topics
- [x] Chapter 4: Best Practices
- [ ] Chapter 5: Performance
- [ ] ...

## Phase 4: Assembly
- [ ] Merge script run
- [ ] Final book generated
```
Update after each subagent completes.
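Ticking an item and refreshing the Phase 3 counter is a small text edit. This sketch assumes the progress file follows the template above and that updates are applied one at a time. For example, the main agent could apply each update after a subagent returns, which avoids parallel writes to the same file. mark_done is a hypothetical helper.

```python
import re

# Phase 3 heading with its "(N/M complete)" counter, plus the section body
PHASE3 = re.compile(
    r"(## Phase 3:[^\n]*\()(\d+)(/\d+ complete\))(.*?)(?=^## |\Z)", re.S | re.M
)


def mark_done(progress: str, item: str) -> str:
    """Tick '- [ ] <item>' and recount the ticked items in the Phase 3 section."""
    box = re.compile(rf"^- \[ \] {re.escape(item)}[ \t]*$", re.M)
    # A lambda replacement avoids backslash handling in the item text
    progress, ticked = box.subn(lambda _: f"- [x] {item}", progress, count=1)
    if not ticked:
        raise ValueError(f"no open checkbox for {item!r}")

    def recount(match):
        done = match.group(4).count("- [x]")
        return f"{match.group(1)}{done}{match.group(3)}{match.group(4)}"

    return PHASE3.sub(recount, progress, count=1)
```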
Quality Standards
All formatted output must meet these criteria:
- Headers: Use proper `#` syntax, not bold text
- Code Blocks: Include language identifiers, merge shattered blocks
- Text Flow: Join split sentences into natural paragraphs
- Emphasis: Use `*italic*` and `**bold**`, not `[brackets]`
- Footnotes: Standard `[^1]` format with definitions
- Images: Descriptive alt text, not generic filenames
- Links: Clean anchors, no PDF conversion artifacts
Complete standards reference: references/formatting-standards.md
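Several of these criteria can be checked mechanically after a subagent finishes. This sketch assumes backtick code fences and `[^N]` footnotes. check_chapter is a hypothetical helper. It only reports problems and leaves the fixes to the subagent, since formatting itself needs judgment.

```python
import re


def check_chapter(text: str) -> list:
    """Report mechanically detectable violations of the quality standards."""
    problems = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            # An opening fence with nothing after the backticks has no language
            if not in_fence and stripped == "```":
                problems.append(f"line {number}: code block without language identifier")
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if re.fullmatch(r"\*\*[^*]+\*\*", stripped):
            problems.append(f"line {number}: bold text used as a header")
        if "![](" in line:
            problems.append(f"line {number}: image without alt text")
    # Footnote references ([^N] not followed by ':') versus definitions ([^N]: ...)
    refs = set(re.findall(r"\[\^([^\]]+)\](?!:)", text))
    defs = set(re.findall(r"^\[\^([^\]]+)\]:", text, re.M))
    for label in sorted(refs - defs):
        problems.append(f"footnote [^{label}] has no definition")
    return problems
```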
Example Usage
User Request:
"Convert this EPUB to Markdown: /Users/john/Downloads/Effective.Java.3rd.Edition.epub"
Skill Execution:
- Run conversion script to extract content
- Analyze structure and create chapter map
- Format each chapter using AI subagents
- Merge into final book with TOC
- Provide the user with books/effective-java-book.md
Scripts
- convert_book.py: Main conversion script (Phase 1) - Extracts EPUB and sets up project
- analyze_structure.py: Structure analyzer (Phase 2) - Extracts headers and detects issues efficiently
- merge_book.py: Reusable merge script (Phase 4) - Combines all chapters into final book
References
- formatting-standards.md: Complete formatting rules (loaded as needed during formatting)
- chapter-workflow.md: Detailed chapter formatting workflow (loaded as needed)
- progress-template.md: Template for progress tracking file
- chapter-map-template.md: Template for chapter mapping
- formatting-plan-template.md: Template for formatting issue documentation
Notes
- High Quality Focus: Manual AI-driven formatting ensures prose flows naturally
- No Automated Formatting: Fixes such as line joining require human-like judgment, so scripts handle only extraction, analysis, and merging
- Preserve Content: Never alter meaning or remove content
- Code Accuracy: Ensure code blocks are syntactically complete