podcastfy-generator

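Generates AI podcast-style audio conversations from URLs, YouTube videos, PDF documents, or text topics, in a NotebookLM-style two-host dialogue format. Use it when the user asks to "create a podcast", "generate an audio summary", or "turn this article into a podcast", or wants existing content converted into an audio discussion.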

SKILL.md
--- frontmatter
name: podcastfy-generator
description: Generate AI podcast-style audio conversations from URLs, YouTube videos, PDFs, or text topics. Creates NotebookLM-style two-host dialogues. Use when user asks to "create a podcast", "make an audio summary", "turn this article into a podcast", or wants content converted to audio discussion format.
homepage: https://github.com/kesslerio/podcastfy-generator-openclaw-skill
metadata: {"openclaw": {"emoji": "🎙️", "requires": {"bins": ["ffmpeg", "uv"], "env": ["OPENAI_API_KEY", "GEMINI_API_KEY"]}, "primaryEnv": "OPENAI_API_KEY", "optionalEnv": ["ELEVENLABS_API_KEY"]}}

Podcastfy Generator 🎙️

Generate AI podcast-style audio conversations from any content. Creates engaging two-host dialogues similar to Google NotebookLM's Audio Overview feature.

Capabilities

  • URLs → Fetch article content, generate podcast discussion
  • YouTube → Extract transcript, create audio summary
  • PDFs → Parse document, synthesize key points as dialogue
  • Text/Topics → Generate podcast from plain text or topic prompts
  • Multi-lingual → English, German, French, Spanish (auto-detect or specify)
  • Custom Identity → Name the podcast, name the hosts, pick their voices

Quick Examples

code
"Create a podcast about this article: https://example.com/tech-news"
"Turn this YouTube video into a podcast: https://youtube.com/watch?v=..."
"Generate a German podcast discussing quantum computing"
"Make a podcast called 'Deep Dive' with hosts Alex and Sam about this PDF"

Usage

Basic Generation

bash
# From URL
<skill>/scripts/generate.py --url "https://example.com/article"

# From YouTube
<skill>/scripts/generate.py --url "https://youtube.com/watch?v=abc123"

# From text
<skill>/scripts/generate.py --text "Your content here..."

# From PDF
<skill>/scripts/generate.py --pdf "/path/to/document.pdf"

# Multiple sources
<skill>/scripts/generate.py --url "https://url1.com" --url "https://url2.com"

Podcast Identity

bash
# Name the podcast
<skill>/scripts/generate.py --url "https://..." --podcast-name "Deep Dive"

# Name the hosts (they'll use each other's names in conversation)
<skill>/scripts/generate.py --url "https://..." --host-name Alex --cohost-name Sam

# No podcast name (hosts introduce topic naturally, no show branding)
<skill>/scripts/generate.py --url "https://..." --podcast-name ""

# Full customization
<skill>/scripts/generate.py --url "https://..." \
  --podcast-name "Tech Talk" --podcast-tagline "Breaking down the future" \
  --host-name Alex --cohost-name Kiki

Language Options

bash
# Auto-detect (default)
<skill>/scripts/generate.py --url "https://example.de/artikel"

# Explicit language
<skill>/scripts/generate.py --url "https://example.com" --lang de

Supported: en (English), de (German), fr (French), es (Spanish)

TTS Provider & Voice Options

Default: OpenAI TTS (tts-1-hd with onyx + nova voices)

Optional: ElevenLabs for higher quality, more natural voices:

bash
# Use ElevenLabs with defaults (Daniel + Alice)
<skill>/scripts/generate.py --url "https://..." --elevenlabs

# Custom voices per host
<skill>/scripts/generate.py --url "https://..." --elevenlabs \
  --host-voice Daniel --cohost-voice Alice

# OpenAI custom voices
<skill>/scripts/generate.py --url "https://..." \
  --host-voice echo --cohost-voice shimmer

Optional: sherpa-onnx for local synthesis:

bash
# Use local sherpa-onnx TTS (free, offline, unlimited)
<skill>/scripts/generate.py --url "https://..." --sherpa

OpenAI voices: alloy, echo, fable, onyx, nova, shimmer

ElevenLabs voices (premade): Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Alice, Matilda, Will, Jessica, Eric, Bella, Chris, Brian, Daniel, Lily, Adam, Bill

Sherpa-onnx (local): Uses Piper VITS models. Voice paths configured in config/conversation.yaml under text_to_speech.sherpa. Requires sherpa-onnx-offline-tts binary (set SHERPA_ONNX_TTS_BIN or install to ~/.openclaw/tools/sherpa-onnx-tts/). Performance note: CPU-based synthesis, typically ~2-10x realtime, requires ~2GB+ RAM, and quality is good but generally below ElevenLabs.

Browse ElevenLabs voices: https://elevenlabs.io/voice-library

All CLI Options

| Option | Description | Example |
| --- | --- | --- |
| --url | URL to process (repeatable) | --url https://... |
| --text | Plain text content | --text "AI is..." |
| --pdf | Path to PDF file | --pdf report.pdf |
| --lang | Output language | --lang de |
| --podcast-name | Podcast name (empty = none) | --podcast-name "Deep Dive" |
| --podcast-tagline | Podcast tagline | --podcast-tagline "..." |
| --host-name | Host name (Person1) | --host-name Alex |
| --cohost-name | Co-host name (Person2) | --cohost-name Kiki |
| --elevenlabs | Use ElevenLabs TTS | --elevenlabs |
| --sherpa | Use local sherpa-onnx TTS (free) | --sherpa |
| --host-voice | Voice for host | --host-voice Daniel |
| --cohost-voice | Voice for co-host | --cohost-voice Alice |
| --output, -o | Output file path | -o podcast.ogg |
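
As a rough illustration of how these flags combine, the sketch below assembles an argument list for generate.py from Python. The helper name build_command and its keyword arguments are hypothetical and exist only for this example; only the flag names are taken from the options listed above, and the script path is whatever `<skill>` resolves to on your system.

```python
from typing import Optional, Sequence


def build_command(
    script: str,
    urls: Sequence[str] = (),
    text: Optional[str] = None,
    pdf: Optional[str] = None,
    lang: Optional[str] = None,
    podcast_name: Optional[str] = None,
    host_name: Optional[str] = None,
    cohost_name: Optional[str] = None,
    output: Optional[str] = None,
    elevenlabs: bool = False,
) -> list[str]:
    """Assemble an argv list for generate.py (hypothetical helper)."""
    cmd = [script]
    for url in urls:  # --url may be given more than once
        cmd += ["--url", url]
    # Value flags are emitted only when set. An empty string is passed
    # through on purpose: --podcast-name "" disables show branding.
    value_flags = [
        ("--text", text),
        ("--pdf", pdf),
        ("--lang", lang),
        ("--podcast-name", podcast_name),
        ("--host-name", host_name),
        ("--cohost-name", cohost_name),
        ("--output", output),
    ]
    for flag, value in value_flags:
        if value is not None:
            cmd += [flag, value]
    if elevenlabs:  # boolean switch, takes no value
        cmd.append("--elevenlabs")
    return cmd
```

Building a list rather than a single shell string means the result can be handed to subprocess.run without a shell, which avoids quoting problems with URLs and empty arguments.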

Output

The script outputs an OGG audio file path. Use the OpenClaw message tool to send it:

python
# Agent workflow (pseudocode): exec and message stand for agent tool calls,
# not Python built-ins such as exec().
audio_path = exec("<skill>/scripts/generate.py --url 'https://...'")
message(action="send", media=audio_path, target=user_chat)
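
Outside the agent runtime, the same step could look roughly like this in plain Python. It is a sketch resting on one assumption the text above does not confirm: that the generated file's path is the last non-empty line the script writes to stdout. Both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def last_nonempty_line(stdout: str) -> str:
    """Return the last non-blank line of the script's output, stripped."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("script produced no output")
    return lines[-1]


def generate_podcast(cmd: list[str], timeout: float = 600.0) -> Path:
    """Run the generator command and return the reported audio path.

    Assumption: the path is the final stdout line. Adjust the parsing
    if the script reports its result differently.
    """
    result = subprocess.run(
        cmd, capture_output=True, text=True, check=True, timeout=timeout
    )
    path = Path(last_nonempty_line(result.stdout))
    if not path.is_file():
        raise FileNotFoundError(f"expected audio file at {path}")
    return path
```

With check=True a non-zero exit raises CalledProcessError, so a failed generation is never mistaken for a valid path.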

Configuration

Default podcast style is configured in <skill>/config/conversation.yaml. CLI flags override config values.

Key config options:

  • podcast_name — Show name (empty = content-driven intro)
  • roles_person1 / roles_person2 — Host role descriptions
  • text_to_speech.{provider}.default_voices — Default voice per provider
  • language_voices.{provider}.{Language} — Per-language voice overrides (applied when no --host-voice/--cohost-voice is set)
  • conversation_style — Style keywords (engaging, concise, etc.)
  • creativity — 0-1 scale (higher = more creative dialogue)
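
To make the precedence rule concrete, here is a minimal sketch of "CLI flags override config values". It is not the skill's actual loader: the function name and the sample values are hypothetical, and only the key names podcast_name, conversation_style, and creativity come from the list above. A flag that was not given is modeled as None, so an explicit empty string still counts as an override.

```python
from typing import Any, Mapping


def apply_cli_overrides(
    config: Mapping[str, Any], cli: Mapping[str, Any]
) -> dict[str, Any]:
    """Return a copy of config with every supplied CLI value applied."""
    merged = dict(config)
    for key, value in cli.items():
        if value is not None:  # None means "flag not given"
            merged[key] = value
    return merged


# Sample values are illustrative, not the shipped defaults.
config = {
    "podcast_name": "Quick Brief",
    "conversation_style": ["engaging", "concise"],
    "creativity": 0.7,
}
cli = {"podcast_name": "", "creativity": None}
merged = apply_cli_overrides(config, cli)
```

Here merged keeps creativity from the config file, while podcast_name becomes the empty string, which corresponds to running with --podcast-name "".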

Environment Variables

| Variable | Required | Purpose |
| --- | --- | --- |
| OPENAI_API_KEY | Yes | TTS audio generation (default) |
| GEMINI_API_KEY | Yes | Transcript/dialogue generation |
| ELEVENLABS_API_KEY | No | ElevenLabs TTS (required for --elevenlabs) |
| SHERPA_ONNX_TTS_BIN | No | Path to sherpa-onnx-offline-tts binary (for --sherpa) |
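
A preflight check along these lines can surface missing keys before a long generation run starts. It is only a sketch: the function name and the provider labels "openai", "elevenlabs", and "sherpa" are hypothetical. It follows the variables as documented above, so OPENAI_API_KEY and GEMINI_API_KEY are always treated as required; whether the skill relaxes that for other TTS providers is not stated here.

```python
import os
from typing import Mapping, Optional

# Marked "Yes" in the documented variable list.
ALWAYS_REQUIRED = ("OPENAI_API_KEY", "GEMINI_API_KEY")

# Extra variables per TTS provider (provider labels are hypothetical).
PROVIDER_REQUIRED = {
    "openai": (),
    "elevenlabs": ("ELEVENLABS_API_KEY",),
    # SHERPA_ONNX_TTS_BIN stays optional: the binary may instead be
    # installed under ~/.openclaw/tools/sherpa-onnx-tts/.
    "sherpa": (),
}


def missing_env(
    provider: str, env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """List required variables that are unset or empty for a provider."""
    env = os.environ if env is None else env
    needed = ALWAYS_REQUIRED + PROVIDER_REQUIRED[provider]
    return [name for name in needed if not env.get(name)]
```

Passing an explicit mapping instead of reading os.environ keeps the check easy to exercise in isolation.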

Get your ElevenLabs API key at: https://elevenlabs.io/app/settings/api-keys

Installation

First-time setup (run once):

bash
<skill>/scripts/install.sh

Requirements

  • ffmpeg — Audio format conversion
  • uv — Python environment management
  • Python 3.11+ — Runtime
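
These requirements can also be checked programmatically before running the installer. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the lookup function is injectable purely so the check can be exercised on a machine without the tools installed.

```python
import shutil
import sys
from typing import Callable, Optional, Sequence


def missing_tools(
    names: Sequence[str] = ("ffmpeg", "uv"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the required executables that are not found on PATH."""
    return [name for name in names if which(name) is None]


def python_ok(
    version: Optional[Sequence[int]] = None,
    minimum: tuple[int, int] = (3, 11),
) -> bool:
    """Check the (major, minor) interpreter version against the minimum."""
    current = sys.version_info[:2] if version is None else tuple(version[:2])
    return current >= minimum
```

Calling missing_tools() with no arguments checks the real PATH, and an empty list means both ffmpeg and uv were found.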

Troubleshooting

"ffmpeg not found"

Install ffmpeg: brew install ffmpeg (macOS) or sudo apt install ffmpeg (Debian/Ubuntu)

"API key not set"

Ensure OPENAI_API_KEY and GEMINI_API_KEY are in your environment or secrets.conf

Hosts say "Quick Brief" or reference a show name

Set podcast_name: "" in config/conversation.yaml or use --podcast-name ""

Generation takes too long

Podcastfy processes content through LLM + TTS. Expect 30-90 seconds for short podcasts.

Audio quality issues

Try ElevenLabs (--elevenlabs) for more natural voices. OpenAI tts-1-hd is decent but synthetic.