text-to-speech

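Convert text to natural, fluent speech with DIA TTS, Kokoro, Chatterbox, and other models via the inference.sh CLI. Supported models: DIA TTS (conversational speech synthesis), Kokoro TTS, Chatterbox, Higgs Audio, VibeVoice (built for podcasts). Capabilities: text-to-speech, voice cloning, multi-speaker dialogue, podcast generation, expressive speech output. Use cases: voiceovers, audiobooks, podcasts, accessibility, video narration, IVR systems, voice assistants. Triggers: text to speech, TTS, voice generation, AI voice, speech synthesis, voiceover, generate speech, AI narrator, voice cloning, text to audio, ElevenLabs alternative, voice AI, AI voiceover, speech generator, natural voice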

SKILL.md
---
name: text-to-speech
description: |
  Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI.
  Models: DIA TTS (conversational), Kokoro TTS, Chatterbox, Higgs Audio, VibeVoice (podcasts).
  Capabilities: text-to-speech, voice cloning, multi-speaker dialogue, podcast generation, expressive speech.
  Use for: voiceovers, audiobooks, podcasts, accessibility, video narration, IVR, voice assistants.
  Triggers: text to speech, tts, voice generation, ai voice, speech synthesis, voice over,
  generate speech, ai narrator, voice cloning, text to audio, elevenlabs alternative,
  voice ai, ai voiceover, speech generator, natural voice
allowed-tools: Bash(infsh *)
---

Text-to-Speech

Convert text to natural speech via inference.sh CLI.

Quick Start

bash
# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Hello, welcome to our product demo."}'

Available Models

| Model | App ID | Best For |
|-------|--------|----------|
| DIA TTS | infsh/dia-tts | Conversational, expressive |
| Kokoro TTS | infsh/kokoro-tts | Fast, natural |
| Chatterbox | infsh/chatterbox | General purpose |
| Higgs Audio | infsh/higgs-audio | Emotional control |
| VibeVoice | infsh/vibevoice | Podcasts, long-form |

Browse All Audio Apps

bash
infsh app list --category audio

Examples

Basic Text-to-Speech

bash
infsh app run infsh/kokoro-tts --input '{"text": "Welcome to our tutorial."}'
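
Inline JSON wrapped in single quotes breaks as soon as the text contains an apostrophe or a double quote. The following is a minimal sketch for building the payload from arbitrary text, assuming bash and no extra tools. The helper names `json_escape` and `build_tts_input` are hypothetical and not part of the CLI, and the only input field assumed is the `text` field shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the infsh CLI): build a {"text": ...}
# payload from arbitrary text without hand-writing the JSON.

# Escape the characters that most commonly break a JSON string.
# Other control characters are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# Wrap the escaped text in the single-field JSON object used above.
build_tts_input() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

# Usage (commented out so the helpers can be sourced safely):
# infsh app run infsh/kokoro-tts --input "$(build_tts_input "It's ready. Say \"hello\".")"
```

Passing the text as a `printf` argument rather than splicing it into the format string keeps any `%` characters in the script from being interpreted.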

Conversational TTS with DIA

bash
infsh app sample infsh/dia-tts --save input.json

# Edit input.json:
# {
#   "text": "Hey! How are you doing today? I'm really excited to share this with you.",
#   "voice": "conversational"
# }

infsh app run infsh/dia-tts --input input.json

Long-form Audio (Podcasts)

bash
infsh app sample infsh/vibevoice --save input.json

# Edit input.json with your podcast script
infsh app run infsh/vibevoice --input input.json

Expressive Speech with Higgs

bash
infsh app sample infsh/higgs-audio --save input.json

# {
#   "text": "This is absolutely incredible!",
#   "emotion": "excited"
# }

infsh app run infsh/higgs-audio --input input.json

Use Cases

  • Voiceovers: Product demos, explainer videos
  • Audiobooks: Convert text to spoken word
  • Podcasts: Generate podcast episodes
  • Accessibility: Make content accessible
  • IVR: Phone system voice prompts
  • Video Narration: Add narration to videos

Combine with Video

Generate speech, then create a talking head video:

bash
# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Your script here"}' > speech.json

# 2. Use the audio URL with OmniHuman for avatar video
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
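
Step 2 needs the audio URL from the output saved in step 1. The output schema is not shown here, so the sketch below rests on a loud assumption: `speech.json` contains the audio URL as a plain `https://` string, and the first URL in the file is the one you want. The helper names `first_url` and `omnihuman_input` are hypothetical; inspect `speech.json` yourself before relying on this.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the infsh CLI).

# Print the first http(s) URL found in a file.
# ASSUMPTION: the audio URL is the first URL in the saved output and is
# stored unescaped (no "\/" sequences).
first_url() {
  grep -oE 'https?://[^"[:space:]]+' "$1" | head -n 1
}

# Build the step-2 input from a portrait URL and an audio URL.
# ASSUMPTION: neither URL contains double quotes or backslashes.
omnihuman_input() {
  printf '{"image_url": "%s", "audio_url": "%s"}' "$1" "$2"
}

# Usage (commented out so the helpers can be sourced safely):
# audio_url=$(first_url speech.json)
# infsh app run bytedance/omnihuman-1-5 \
#   --input "$(omnihuman_input "https://portrait.jpg" "$audio_url")"
```

If the saved output turns out to hold several URLs, replace `first_url` with a lookup of the specific field once you know its name.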

Related Skills

bash
# Full platform skill (all 150+ apps)
npx skills add inferencesh/skills@inference-sh

# AI avatars (combine TTS with talking heads)
npx skills add inferencesh/skills@ai-avatar-video

# AI music generation
npx skills add inferencesh/skills@ai-music-generation

# Speech-to-text (transcription)
npx skills add inferencesh/skills@speech-to-text

# Video generation
npx skills add inferencesh/skills@ai-video-generation

Browse all apps: infsh app list

Documentation