Podcast Generator
Generates podcast-style audio that plays directly in the browser. Zero cost, no API keys needed.
Workflow Decision Tree
User provides formatted dialogue
→ Use existing dialogue as-is if quality is good
→ Refine structure only if flow needs improvement
User provides article, list, or text content
→ Create dialogue from content (see "Dialogue Creation Process")
User provides topic only
→ Request source material before proceeding
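The decision tree above can be sketched as a small classifier. This is only an illustration: the function name `chooseWorkflow`, the `<speakerN>` tag check for "formatted dialogue", and the 40-word threshold for "topic only" are all assumptions, not part of the skill.

```javascript
// Hypothetical helper mirroring the three branches of the decision tree.
function chooseWorkflow(input) {
  const text = (input || "").trim();

  // Assumption: formatted dialogue is recognised by <speakerN> tags.
  if (/<speaker\d+>/i.test(text)) return "use-or-refine-dialogue";

  // Assumption: very short input is treated as "topic only".
  const wordCount = text.split(/\s+/).filter(Boolean).length;
  if (wordCount < 40) return "request-source-material";

  // Anything longer is treated as article, list, or text content.
  return "create-dialogue";
}
```

In practice the judgment is editorial rather than mechanical; the threshold is only a stand-in for it.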
Dialogue Creation Process
Follow this two-phase workflow when creating podcast dialogue from content:
Phase 1: Analyze Source Content
- •Read source material completely
- •Detect language from content (en, de, fr, es, it, etc.)
- •Identify key information: facts, dates, names, numbers, details
- •Organize by theme: chronology, category, or logical grouping
Phase 2: Create Dialogue
- •Structure conversation:
  - •Host (Speaker 1): ~20% - questions and transitions
  - •Expert (Speaker 2): ~80% - factual responses from source
  - •Length: as many exchanges as needed to cover all content (typically 10-50+ lines)
- •Apply TTS formatting: Read reference/tts-formatting.md for complete rules
- •Generate JSX file from template (see "Implementation Steps")
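To check a draft against the ~20% / ~80% split, one option is to measure each speaker's share of the total word count. The sketch below assumes the dialogue is already available as an array of `{ speaker, text }` objects; that shape and the name `speakerShares` are hypothetical.

```javascript
// Returns each speaker's share of the total word count (0..1).
function speakerShares(lines) {
  const counts = {};
  let total = 0;
  for (const { speaker, text } of lines) {
    const words = text.split(/\s+/).filter(Boolean).length;
    counts[speaker] = (counts[speaker] || 0) + words;
    total += words;
  }
  const shares = {};
  for (const speaker of Object.keys(counts)) {
    shares[speaker] = total === 0 ? 0 : counts[speaker] / total;
  }
  return shares;
}
```

The targets are approximate, so a result such as 0.25 for the Host is a prompt to review the draft, not a failure.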
Information Accuracy
The goal is to convert the source text to audio, not to add to it. Use only facts from the source material.
Expert responses:
- •Use only names, dates, numbers, details explicitly stated in source
- •Never invent examples, context, or explanations not in source
- •Never add interpretations, opinions, or evaluations
- •Never explain WHY something happened unless source explains it
Host phrases:
- •Use neutral transitions: "I see", "Tell me more", "Can you elaborate?"
- •Reference previous statements: "You mentioned X - how does that connect to Y?" (when source shows connection)
- •Never add new facts, context, or interpretations
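A rough automated check for invented figures is to compare the numbers in the dialogue with the numbers in the source. The sketch below is an assumption-laden aid, not a replacement for reading the draft: `unsupportedNumbers` is a hypothetical name, it only catches digits (not names or spelled-out numbers), and it may report false positives if the TTS formatting rules rewrite numbers.

```javascript
// Returns numeric tokens that occur in the dialogue but not in the source.
function unsupportedNumbers(sourceText, dialogueText) {
  const numberPattern = /\d+(?:[.,]\d+)*/g;

  // Strip <speakerN> tags so their digits are not counted as facts.
  const spoken = dialogueText.replace(/<speaker\d+>/gi, " ");

  const sourceNumbers = new Set(sourceText.match(numberPattern) || []);
  const dialogueNumbers = new Set(spoken.match(numberPattern) || []);
  return [...dialogueNumbers].filter((n) => !sourceNumbers.has(n));
}
```

An empty result does not prove accuracy; a non-empty result points to lines worth rechecking against the source.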
Dialogue Format Guidelines
Host (Speaker 1) - ~20% of content
- •Introduce topic with opening question
- •Ask transition questions between topics
- •Reference Expert's previous statements: "You mentioned X - can you elaborate?"
- •Use conversational acknowledgments: "I see", "Tell me more"
- •Never introduce facts not in source
Expert (Speaker 2) - ~80% of content
- •Provide comprehensive factual responses from source material
- •Include specific details: names, dates, numbers, locations
- •Organize information logically by theme, chronology, or category
- •Structure facts narratively using only source material
- •Never repeat information already stated
- •Never add context or examples not in source
Natural Conversation Techniques
Use these patterns without adding information:
- •Vary question styles: "What happened next?" / "Can you explain that further?" / "Tell me about..."
- •Ask follow-up questions based on Expert's previous response
- •Expert elaborates when source provides multiple details about a topic
- •Clear transitions between sections: "Moving to the next category...", "In the European context..."
Avoid
- •Personal opinions: "I think...", "That's crazy..."
- •Value judgments: "amazing", "fascinating", "interesting"
- •Humor, irony, jokes
- •Rapid back-and-forth after every sentence
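The phrases listed under "Avoid" can be screened for with a simple substring check. This is a minimal sketch: the list contains only the examples quoted above, the name `findAvoidedPhrases` is hypothetical, and substring matching will also flag words such as "uninteresting".

```javascript
// Example phrases taken from the "Avoid" list; extend as needed.
const AVOIDED_PHRASES = ["i think", "that's crazy", "amazing", "fascinating", "interesting"];

// Returns the avoided phrases found in one dialogue line.
function findAvoidedPhrases(line) {
  const lower = line.toLowerCase();
  return AVOIDED_PHRASES.filter((phrase) => lower.includes(phrase));
}
```

Humor, irony, and rapid back-and-forth cannot be detected this way and still need a manual read.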
Implementation Steps
When user requests a podcast:
- •Analyze source content and create dialogue following format above
- •Detect language from content
- •Read template from assets/podcast-template.jsx
- •Replace values in template:
  - •PODCAST_SCRIPT - your generated dialogue
  - •PODCAST_TITLE - descriptive title from content
  - •PODCAST_LANGUAGE - detected language code
- •Save as JSX file - Use the Write tool to save the modified template as a .jsx file. The file will render as an interactive podcast player.
- •Recommend Microsoft Edge browser for best voice quality (250+ Natural voices vs Chrome's 19)
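The value-replacement step can be sketched as follows. The actual layout of assets/podcast-template.jsx is not shown here, so this assumes each value is declared on a single line of the form `const NAME = ...;`. The function name `fillTemplate` is hypothetical.

```javascript
// Assumption: the template declares each value as a single-line `const NAME = ...;`.
function fillTemplate(template, values) {
  let output = template;
  for (const [name, value] of Object.entries(values)) {
    // Names are plain identifiers, so they are safe to embed in the pattern.
    const declaration = new RegExp(`^([ \\t]*const\\s+${name}\\s*=).*$`, "m");
    if (!declaration.test(output)) {
      throw new Error(`Template has no single-line declaration for ${name}`);
    }
    // JSON.stringify escapes quotes and newlines so the result is a valid string literal.
    output = output.replace(declaration, (_match, prefix) => `${prefix} ${JSON.stringify(value)};`);
  }
  return output;
}
```

A call might look like `fillTemplate(templateText, { PODCAST_SCRIPT: script, PODCAST_TITLE: title, PODCAST_LANGUAGE: "en" })`. If the real template uses multi-line declarations, the pattern would need to be adapted.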
Technical Reference
Script Format
<speaker1>Host's question or statement. <speaker2>Expert's response with factual information.
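To illustrate how the tagged format can be consumed, here is a minimal parser that turns a script into `{ speaker, text }` lines. It is a sketch, not necessarily how the template parses the script: `parseScript` is a hypothetical name, and the pattern assumes the spoken text itself contains no `<` character.

```javascript
// Splits "<speaker1>... <speaker2>..." into an ordered list of lines.
function parseScript(script) {
  const lines = [];
  const tagPattern = /<speaker(\d+)>([^<]*)/gi;
  for (const match of script.matchAll(tagPattern)) {
    const text = match[2].trim();
    // Skip tags that are not followed by any text.
    if (text) lines.push({ speaker: Number(match[1]), text });
  }
  return lines;
}
```

For example, `parseScript("<speaker1>What happened next? <speaker2>The bridge opened in 1932.")` yields two lines, the first for speaker 1 and the second for speaker 2.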
Voice Configuration (automatic)
- •Speaker 1 (Host): Pitch 1.05, Rate 0.95
- •Speaker 2 (Expert): Pitch 0.88, Rate 0.93
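These values map onto the `pitch` and `rate` properties of a browser `SpeechSynthesisUtterance`. The sketch below shows one way to apply them; the names `VOICE_SETTINGS` and `applyVoiceSettings` are hypothetical, and the neutral fallback for other speakers is an assumption.

```javascript
// Pitch and rate per speaker, as listed above.
const VOICE_SETTINGS = {
  1: { pitch: 1.05, rate: 0.95 }, // Host
  2: { pitch: 0.88, rate: 0.93 }, // Expert
};

// Sets pitch and rate on an utterance-like object and returns it.
function applyVoiceSettings(utterance, speaker) {
  // Assumption: speakers without an entry fall back to neutral values.
  const settings = VOICE_SETTINGS[speaker] || { pitch: 1, rate: 1 };
  utterance.pitch = settings.pitch;
  utterance.rate = settings.rate;
  return utterance;
}
```

In the browser this would be called as `applyVoiceSettings(new SpeechSynthesisUtterance(text), 1)`.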
Platform-Aware Voice Selection
Platform detection:
- •Automatically detects iOS, Android, Desktop Edge, or other desktop browsers
- •Selects best available voices based on platform
Desktop Edge:
- •Priority: Microsoft Neural/Natural voices (Katja, Conrad, Aria, Guy, etc.)
- •250+ high-quality voices available
Desktop Chrome:
- •Priority: Google voices (Google UK English Female, Google Deutsch, etc.)
- •~19 voices available (lower quality than Edge)
- •Fallback: local system voices
iOS (Safari/Mobile):
- •Priority: Native Siri voices (Samantha, Anna, Daniel, etc.)
- •Best quality on iOS devices
Android (Chrome/Mobile):
- •Priority: Google TTS voices (Google Deutsch, Google UK English Female, etc.)
- •Wavenet voices preferred when available
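The platform rules above can be approximated with a user-agent check followed by a name-based ranking. This is a sketch under assumptions: the user-agent patterns are common conventions rather than the template's actual logic, the priority patterns use only the voice names quoted above, and `detectPlatform` and `rankVoices` are hypothetical names. Voices are expected to have `name` and `lang` properties, as `SpeechSynthesisVoice` objects do.

```javascript
// Classifies a user-agent string into one of four platforms.
function detectPlatform(userAgent) {
  if (/iPhone|iPad|iPod/.test(userAgent)) return "ios";
  if (/Android/.test(userAgent)) return "android";
  if (/Edg\//.test(userAgent)) return "desktop-edge";
  return "desktop";
}

// Assumption: preferred voices are recognised by name.
const VOICE_PRIORITY = {
  "desktop-edge": /Natural|Neural/i,
  desktop: /^Google/i,
  android: /^Google/i,
  ios: /Samantha|Anna|Daniel/i,
};

// Returns voices for the language, preferred ones first.
function rankVoices(voices, platform, language) {
  const prefix = language.toLowerCase();
  const matching = voices.filter((v) => v.lang.toLowerCase().startsWith(prefix));
  const preferred = matching.filter((v) => VOICE_PRIORITY[platform].test(v.name));
  const rest = matching.filter((v) => !VOICE_PRIORITY[platform].test(v.name));
  return [...preferred, ...rest];
}
```

In the browser the inputs would come from `navigator.userAgent` and `speechSynthesis.getVoices()`.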
Voice assignment:
- •Automatically assigns different voices to Speaker 1 and Speaker 2
- •Uses modulo distribution for 3+ speakers
- •Ensures distinct voices even with limited availability
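Modulo distribution can be sketched in a few lines: each speaker takes the voice at its index modulo the number of available voices. The function name `assignVoices` is hypothetical, and the template's exact logic may differ.

```javascript
// Maps each speaker to a voice, cycling through the list when speakers outnumber voices.
function assignVoices(speakers, voices) {
  const assignment = {};
  if (voices.length === 0) return assignment;
  speakers.forEach((speaker, index) => {
    assignment[speaker] = voices[index % voices.length];
  });
  return assignment;
}
```

With speakers `[1, 2, 3]` and two voices, speaker 3 reuses the first voice. When only one voice is available, the per-speaker pitch and rate settings are what still distinguish the speakers.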
Player Features
- •Play/Pause/Resume with full playback control
- •Stop to reset to beginning
- •Click any transcript line to resume from there
- •Progress bar shows current position
- •Auto-scroll follows current line
Technical Constraints
- •Keep each sentence under 14 seconds of spoken audio (Chrome limitation)
- •350ms pause between speakers
- •Microsoft Edge browser provides 250+ high-quality Natural voices (best option)
- •Chrome provides only 19 lower-quality voices with utterance bugs
- •Firefox has very limited voice support
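One way to respect the 14-second limit is to estimate speaking time from word count and split anything longer. The sketch below assumes a speaking speed of 2.5 words per second at rate 1.0, which is a rough guess and not a measured value; `estimateSeconds` and `splitForTts` are hypothetical names.

```javascript
const MAX_SECONDS = 14;
const WORDS_PER_SECOND = 2.5; // Assumption: rough speaking speed at rate 1.0.

// Estimates how long a text takes to speak at the given rate.
function estimateSeconds(text, rate = 1) {
  const words = text.split(/\s+/).filter(Boolean).length;
  return words / (WORDS_PER_SECOND * rate);
}

// Splits text into sentences, then cuts any sentence that would exceed the limit.
function splitForTts(text, rate = 1) {
  // Largest word count that stays strictly under MAX_SECONDS.
  const maxWords = Math.max(1, Math.ceil(MAX_SECONDS * WORDS_PER_SECOND * rate) - 1);
  // Split after sentence-ending punctuation followed by whitespace (keeps "3.5" intact).
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).filter(Boolean);
    for (let i = 0; i < words.length; i += maxWords) {
      chunks.push(words.slice(i, i + maxWords).join(" "));
    }
  }
  return chunks;
}
```

Cutting at a word boundary can sound abrupt, so it is better to write shorter sentences in the dialogue and treat the split as a safety net.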
Quality Requirements
- •Factual Accuracy: Expert responses use only source facts
- •Natural Flow: Avoid rapid back-and-forth and value judgments
- •TTS Compliance: All text must play without pronunciation errors
- •Zero Hallucination: No invented examples or context
- •Complete Coverage: Include all important facts from source
- •No Duplicates: Each fact appears exactly once