AgentSkillsCN

tts-openai

借助 OpenAI TTS API,生成高质量的语音旁白。

SKILL.md
--- frontmatter
name: tts-openai
description: Generate high-quality voiceover using OpenAI TTS API
allowed-tools:
  - Bash
  - Read
  - Write

TTS (Text-to-Speech) Skill

Generate professional voiceover audio using OpenAI's TTS API.

Prerequisites

  • OpenAI API key set as environment variable: OPENAI_API_KEY
  • curl installed (standard on most systems)

Available Voices

VoiceDescriptionBest For
alloyNeutral, balancedGeneral purpose
echoWarm, conversationalStorytelling
fableBritish accent, narrativeDocumentaries
onyxDeep, authoritativeNews, serious topics
novaFriendly, upbeatTutorials, explainers
shimmerSoft, gentleMeditation, ASMR

Available Models

ModelQualitySpeedCost
tts-1StandardFastLower
tts-1-hdHigh-definitionSlowerHigher

Generate Voiceover

Using curl (Recommended)

bash
# Generate voiceover
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "input": "Your script text goes here. This will be converted to speech.",
    "voice": "nova"
  }' \
  --output output/voiceover.mp3

For Long Scripts

Split into chunks if text is very long (max ~4000 characters per request):

bash
# Generate part 1
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "input": "First part of the script...",
    "voice": "nova"
  }' \
  --output output/vo_part1.mp3

# Generate part 2
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "input": "Second part of the script...",
    "voice": "nova"
  }' \
  --output output/vo_part2.mp3

# Combine parts
ffmpeg -i "concat:output/vo_part1.mp3|output/vo_part2.mp3" \
  -acodec copy output/voiceover_combined.mp3

Audio Post-Processing

Normalize Audio Levels

bash
ffmpeg -i output/voiceover.mp3 \
  -filter:a loudnorm=I=-16:TP=-1.5:LRA=11 \
  output/voiceover_normalized.mp3

Add Background Music

bash
# Mix voiceover with background music (music at 15% volume)
ffmpeg -i output/voiceover.mp3 -i assets/music/background.mp3 \
  -filter_complex "[1:a]volume=0.15[bg];[0:a][bg]amix=inputs=2:duration=first" \
  output/voiceover_with_music.mp3

Get Audio Duration

bash
ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 \
  output/voiceover.mp3

Voice Selection Guide

Content TypeRecommended Voice
Tech newsonyx (authoritative)
Tutorialsnova (friendly)
Storytellingfable or echo
Generalalloy
Calm/ASMRshimmer

Best Practices

  1. Clean the script: Remove stage directions, keep only spoken text
  2. Add punctuation: Periods and commas create natural pauses
  3. Use SSML-like formatting: "..." creates pauses in speech
  4. Test voices: Generate samples before full production
  5. Normalize audio: Ensure consistent levels
  6. Save originals: Keep unprocessed audio for re-editing

Cost Estimation

ModelCost per 1M chars~Cost per minute
tts-1$15.00~$0.10
tts-1-hd$30.00~$0.20

(Based on ~150 words/min, ~6 chars/word = ~900 chars/min)

Output Files

Save audio files to:

  • Raw TTS: output/voiceover_{project}.mp3
  • Normalized: output/voiceover_{project}_normalized.mp3
  • With music: output/voiceover_{project}_final.mp3