TTS Skill - Text-to-Speech Synthesis
Slice: slices/tts/
Type: Voice Synthesis Service
Purpose
Text-to-speech synthesis with voice personas for PMOVES.AI content. Use this skill when:
- Generating audio narration from text
- Creating multi-speaker podcast content
- Applying punctuation engineering for prosody control
Voice Personas
| Persona | Engine | Style | Use Case |
|---|---|---|---|
| HOST | Kokoro | Warm, natural flow | Introductions, narration |
| ARCHITECT | Fish Speech | Fast, excited tech jargon | Technical explanations |
| OPS | IndexTTS2 | Gritty, authoritative | Infrastructure descriptions |
| NEUTRAL | Any | Default voice | General content |
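The persona-to-engine mapping in the table can be sketched as a plain lookup. The `Persona` enum, the `PERSONA_ENGINE` dictionary, and `engine_for` below are illustrative stand-ins, not the actual `slices.tts` definitions. Falling back to Kokoro when NEUTRAL accepts "Any" engine is an assumption.

```python
from enum import Enum


class Persona(Enum):
    """Hypothetical stand-in for slices.tts.VoicePersona."""
    HOST = "host"
    ARCHITECT = "architect"
    OPS = "ops"
    NEUTRAL = "neutral"


# Engine names follow the table above. NEUTRAL accepts any engine,
# so None here means "no fixed engine".
PERSONA_ENGINE = {
    Persona.HOST: "kokoro",
    Persona.ARCHITECT: "fish_speech",
    Persona.OPS: "indextts2",
    Persona.NEUTRAL: None,
}


def engine_for(persona: Persona, default: str = "kokoro") -> str:
    """Return the engine for a persona, using `default` for NEUTRAL (assumed)."""
    return PERSONA_ENGINE[persona] or default
```

Keeping the mapping in one dictionary means a new persona only needs a new entry, while the `default` argument leaves the NEUTRAL choice to the caller.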
Components
1. Simple Synthesis
python
from slices.tts import TTSService, VoicePersona

# Create the service; `client` is an async HTTP client created elsewhere
# (for example httpx.AsyncClient, as in the Discord example below)
service = TTSService(http_client=client)

# Synthesize with a persona
result = await service.synthesize(
    text="Welcome to PMOVES.AI",
    persona=VoicePersona.HOST,
)
print(f"Audio URL: {result.audio_url}")
print(f"Duration: {result.duration_ms}ms")
2. Punctuation Engineering
The KOKORO engine supports punctuation engineering for prosody control:
| Pattern | Effect | Duration |
|---|---|---|
| ... | Long pause | ~600ms |
| — | Sharp break/tone shift | ~300ms |
| , | Short breath | ~150ms |
python
# Punctuation engineering is applied automatically for the HOST persona
text = """
Imagine a world where your local hardware isn't just a server...
but a living organism. We are looking at the P-MOVES dot A-I
orchestration mesh. It's a distributed architecture—one that mirrors
high-end production environments—but it lives entirely on your local metal.
"""
result = await service.synthesize(text, persona=VoicePersona.HOST)
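To get a feel for how much silence a script's punctuation adds, the approximate durations from the table can be summed. This is a rough sketch: `PAUSE_MS` and `estimate_pause_ms` are hypothetical helpers, not part of `slices.tts`, and the real engine's timing may differ.

```python
import re

# Approximate pause lengths from the table above, in milliseconds.
# "\u2014" is the long dash listed in the table.
PAUSE_MS = {"...": 600, "\u2014": 300, ",": 150}

# The ellipsis alternative comes first so its dots are matched as one token.
_PAUSE_RE = re.compile(r"\.\.\.|\u2014|,")


def estimate_pause_ms(text: str) -> int:
    """Sum the approximate pauses implied by the punctuation in `text`."""
    return sum(PAUSE_MS[m.group(0)] for m in _PAUSE_RE.finditer(text))
```

For example, `estimate_pause_ms("Wait... what, really?")` counts one ellipsis and one comma, giving 750.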
3. Reference Audio Cloning
The FISH_SPEECH engine supports voice cloning from reference audio:
python
from slices.tts import VoiceConfig, TTSEngine

# Configure with reference audio
config = VoiceConfig(
    engine=TTSEngine.FISH_SPEECH,
    reference_audio_url="https://example.com/reference.mp3",
    reference_text="This is the fastest CPU we have ever tested.",
    speed=1.2,
)
result = await service.synthesize(
    text="This is not just storage—this is Local Inference!",
    voice_config=config,
)
4. Emotion Prompts
The INDEXTTS2 engine supports natural-language style prompts:
python
from slices.tts import VoiceConfig, TTSEngine, EmotionStyle

# Configure with emotion
config = VoiceConfig(
    engine=TTSEngine.INDEXTTS2,
    emotion=EmotionStyle.WHISPER,
    pitch=0.8,
    speed=0.7,
)
result = await service.synthesize(
    text="And it learns. The Evo-Controller reads those packets...",
    voice_config=config,
)
5. Multi-Speaker Podcast Mode
Generate conversations with multiple voice personas:
python
from slices.tts import MultiSpeakerSegment, VoicePersona

# One segment per speaker turn, in playback order
segments = [
    MultiSpeakerSegment(
        speaker=VoicePersona.HOST,
        text="Imagine a world where your local hardware is a living organism.",
    ),
    MultiSpeakerSegment(
        speaker=VoicePersona.OPS,
        text="It starts with Local-First Infrastructure. We aren't renting power.",
    ),
    MultiSpeakerSegment(
        speaker=VoicePersona.ARCHITECT,
        text="Correct! And this is not just storage—this is Local Inference!",
    ),
]
result = await service.synthesize_multi_speaker(
    segments=segments,
    crossfade_ms=100,
)
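When planning episode length, it can help to estimate how `crossfade_ms` shortens the combined audio. The sketch below assumes each crossfade overlaps two adjacent segments by `crossfade_ms`, capped at the length of the shorter segment. That overlap model and the `estimate_total_ms` helper are assumptions for illustration, not the documented behavior of `synthesize_multi_speaker`.

```python
from typing import Sequence


def estimate_total_ms(durations_ms: Sequence[int], crossfade_ms: int = 100) -> int:
    """Estimate combined length when adjacent segments overlap by a crossfade."""
    total = sum(durations_ms)
    # Each adjacent pair shares one overlap, which cannot exceed either segment.
    for prev, nxt in zip(durations_ms, durations_ms[1:]):
        total -= min(crossfade_ms, prev, nxt)
    return total
```

Under this model, three segments of 2000 ms, 3000 ms, and 1500 ms with a 100 ms crossfade come to 6300 ms, because two overlaps of 100 ms are removed from the 6500 ms sum.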
Engine Configuration
| Engine | URL | API Key Required |
|---|---|---|
| Kokoro | http://localhost:8090 | No |
| Fish Speech | http://localhost:8091 | No |
| IndexTTS2 | http://localhost:8092 | No |
| ElevenLabs | Cloud API | Yes |
| OpenAI TTS | Cloud API | Yes |
python
import os

# Local engines take a base URL; cloud engines take an API key from the environment
service = TTSService(
    http_client=client,
    kokoro_url="http://localhost:8090",
    fish_speech_url="http://localhost:8091",
    indextts_url="http://localhost:8092",
    elevenlabs_api_key=os.environ.get("ELEVENLABS_API_KEY"),
    openai_api_key=os.environ.get("OPENAI_API_KEY"),
)
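Because the cloud engines need API keys while the local ones do not, a small pre-flight check can report which engines are usable before any synthesis call. This is a sketch only: `LOCAL_ENGINES`, `CLOUD_ENGINE_KEYS`, and `configured_engines` are hypothetical names, and the check looks only at environment variables, not at whether the local servers are actually running.

```python
import os
from typing import List, Mapping, Optional

# Local engines and default URLs from the table above (no API key needed).
LOCAL_ENGINES = {
    "kokoro": "http://localhost:8090",
    "fish_speech": "http://localhost:8091",
    "indextts2": "http://localhost:8092",
}

# Cloud engines and the environment variable that holds each API key.
CLOUD_ENGINE_KEYS = {
    "elevenlabs": "ELEVENLABS_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def configured_engines(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List local engines plus any cloud engine whose API key is set and non-empty."""
    env = os.environ if env is None else env
    engines = list(LOCAL_ENGINES)
    engines += [name for name, var in CLOUD_ENGINE_KEYS.items() if env.get(var)]
    return engines
```

Passing a dictionary instead of relying on `os.environ` keeps the function easy to test, for example `configured_engines({"OPENAI_API_KEY": "sk-test"})`.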
Integration with Discord Bot
python
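# In Discord bot command handler (`bot` is assumed to be the existing bot instance)
import discord
import httpx
from slices.tts import TTSService, VoicePersona

@bot.tree.command(name="speak", description="Generate speech from text")
async def speak_cmd(
    interaction: discord.Interaction,
    text: str,
    persona: str = "host",
):
    await interaction.response.defer()
    # Reject unknown persona names instead of letting the command fail silently
    try:
        persona_enum = VoicePersona(persona.lower())
    except ValueError:
        await interaction.followup.send(f"Unknown persona: {persona}")
        return
    # Close the HTTP client once synthesis has finished
    async with httpx.AsyncClient() as client:
        tts_service = TTSService(http_client=client)
        result = await tts_service.synthesize(text, persona=persona_enum)
    if result.error:
        await interaction.followup.send(f"TTS failed: {result.error}")
    else:
        await interaction.followup.send(f"Audio: {result.audio_url}")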
References
- docs/agents/PMOVES_Engine_Templates.md - TTS template specifications
- Kokoro TTS documentation
- Fish Speech documentation
- IndexTTS2 documentation