Text-to-Speech Skill
CRITICAL: Voice Message Reply Rules
When a user sends you a voice message, follow these rules:
- •ALWAYS use the --voice-message flag - Required for Telegram waveform display
- •Generate TTS in the SAME LANGUAGE the user spoke - If they spoke English, generate English audio
- •Output ONLY the file path - No text commentary alongside the voice reply
Exception: If the user explicitly asks for a text response (e.g., "respond in text", "don't send voice"), respond with text instead.
Correct Example (user sent voice in English):
telclaude tts "Hello! How can I help you today?" --voice-message
Then output ONLY:
/media/outbox/voice/1234567890-abc123.ogg
WRONG - Do NOT do this:
Hello! Here is the audio you requested: /media/outbox/tts/1234567890-abc123.mp3
This is wrong because: (1) added text alongside voice, (2) missing --voice-message flag, (3) mp3 instead of ogg, (4) wrong directory
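The rules above can be checked mechanically before a reply is sent. The following is a minimal sketch, not part of telclaude: the helper name `is_voice_only_reply` is hypothetical, and it assumes generated paths contain no whitespace and that voice messages always land in a directory named `voice` with an `.ogg` extension, as in the examples above.

```python
from pathlib import PurePosixPath


def is_voice_only_reply(reply: str) -> bool:
    """Return True if `reply` is a single .ogg path inside a `voice` directory.

    Hypothetical helper: assumes paths never contain whitespace.
    """
    text = reply.strip()
    # Any whitespace inside the reply means commentary or more than one token.
    if not text or any(ch.isspace() for ch in text):
        return False
    path = PurePosixPath(text)
    # The path must be an .ogg file whose immediate parent directory is `voice`.
    return path.suffix == ".ogg" and path.parent.name == "voice"
```

Applied to the two examples above, the correct reply passes and the wrong one fails, both because of the added commentary and because the bare path points at `tts/*.mp3`.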
When to Use
Use this skill when users:
- •Ask to "read aloud", "speak", or "say" something
- •Request audio versions of text content
- •Want voice messages or audio responses
- •Ask for text to be converted to speech
- •Send a voice message (respond in voice - see CRITICAL rules above)
How to Generate Speech
Voice Messages (Telegram waveform display)
For conversational voice replies, use --voice-message to get proper Telegram voice message formatting:
telclaude tts "Your response here" --voice-message
This outputs OGG/Opus format that displays as a voice message with waveform in Telegram.
Audio Files (music player display)
For regular audio files (longer content, podcast-style):
telclaude tts "Your text to convert to speech here"
Or use the short alias:
telclaude tts "Your text here"
Options
- •--voice-message: Output as Telegram voice message (OGG/Opus with waveform display)
- •--voice: Voice to use (alloy, echo, fable, onyx, nova, shimmer). Default: alloy
  - •alloy: Neutral, balanced voice
  - •echo: Deeper, more resonant voice
  - •fable: Expressive, storytelling voice
  - •onyx: Deep, authoritative voice
  - •nova: Warm, conversational voice
  - •shimmer: Soft, gentle voice
- •--speed: Speech speed from 0.25 to 4.0. Default: 1.0
- •--model: Quality model (tts-1, tts-1-hd). Default: tts-1
  - •tts-1: Standard quality, faster
  - •tts-1-hd: Higher quality, slightly slower
- •--format: Audio format (mp3, opus, aac, flac, wav). Default: mp3 (ignored with --voice-message)
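If the command is assembled programmatically, the documented options can be validated before the CLI is invoked. This is a sketch under assumptions: `build_tts_command` is a hypothetical helper, the allowed values and defaults are copied from the list above, and it assumes `--speed`, `--model`, and `--format` take a space-separated value in the same way `--voice` does in the examples.

```python
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MODELS = {"tts-1", "tts-1-hd"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav"}


def build_tts_command(text, *, voice_message=False, voice="alloy",
                      speed=1.0, model="tts-1", audio_format="mp3"):
    """Build an argv list for `telclaude tts`, rejecting undocumented values."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if audio_format not in FORMATS:
        raise ValueError(f"unknown format: {audio_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    cmd = ["telclaude", "tts", text]
    if voice_message:
        # --format is ignored with --voice-message, so it is not passed at all.
        cmd.append("--voice-message")
    elif audio_format != "mp3":
        cmd += ["--format", audio_format]
    # Only pass options that differ from the documented defaults.
    if voice != "alloy":
        cmd += ["--voice", voice]
    if speed != 1.0:
        cmd += ["--speed", str(speed)]
    if model != "tts-1":
        cmd += ["--model", model]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems when the text contains apostrophes or quotes.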
Examples
# Voice message reply (when user sent a voice message)
telclaude tts "Sure, I can help you with that!" --voice-message
# Voice message with specific voice
telclaude tts "Here's what I found..." --voice-message --voice nova
# Regular audio file
telclaude tts "Hello! Here is your summary."
# High quality audio file
telclaude tts "Important announcement" --voice onyx --model tts-1-hd --speed 0.9
Response Format
The telclaude tts command outputs metadata (file path, size, format, voice, duration). You only need to include the file path in your response - the relay handles sending it to Telegram.
Voice message replies (responding to incoming voice)
Output ONLY the file path - no commentary:
/media/outbox/voice/1234567890-abc123.ogg
That's it. No "I've generated..." or "Here's your audio...". The relay sends just the voice message, like a human would.
Audio files or text+audio responses
If the user requested an audio FILE (not a voice reply), or you need to include text context:
Here's the summary as audio: /media/outbox/tts/1234567890-abc123.mp3
Key points:
- •Voice messages: .../voice/*.ogg - waveform display, path only
- •Audio files: .../tts/*.mp3 - music player display, text OK
- •The relay automatically detects paths and sends the media
- •Paths live under TELCLAUDE_MEDIA_OUTBOX_DIR (default .telclaude-media in native mode; /media/outbox in Docker)
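The outbox lookup described in the last point could be sketched as follows. This is an illustration only: `resolve_outbox_dir` and its `in_docker` parameter are hypothetical, and the sketch assumes the environment variable, when set, simply overrides the mode-specific default. How telclaude itself detects Docker mode is not covered here.

```python
import os


def resolve_outbox_dir(environ=None, in_docker=False):
    """Return the media outbox root, honouring TELCLAUDE_MEDIA_OUTBOX_DIR."""
    env = os.environ if environ is None else environ
    override = env.get("TELCLAUDE_MEDIA_OUTBOX_DIR")
    if override:
        return override
    # Documented defaults: /media/outbox in Docker, .telclaude-media natively.
    return "/media/outbox" if in_docker else ".telclaude-media"
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.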
Best Practices
- •Match the medium: If user sends voice, respond with voice
- •Choose Appropriate Voice: Match the voice to the content type (e.g., fable for stories, onyx for announcements)
- •Keep Text Reasonable: Maximum 4096 characters per request
- •Consider Speed: Use slower speed (0.8-0.9) for important content, faster (1.2-1.5) for casual updates
- •Use HD Sparingly: tts-1-hd costs twice as much as tts-1; reserve it for important or long-form content
Limitations
- •Maximum 4096 characters per request (longer text is truncated)
- •Audio files are stored temporarily and cleaned up after 24 hours
- •Requires OPENAI_API_KEY to be configured
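Because text beyond 4096 characters is truncated, longer content has to be split before it is passed to the command. One possible approach, sketched here with a hypothetical `split_for_tts` helper, is to break at sentence boundaries and fall back to a hard cut only when a single sentence exceeds the limit. The sentence detection is a simple regular expression, not a full tokenizer, and whitespace between sentences is normalized to a single space.

```python
import re

MAX_CHARS = 4096  # per-request limit stated above


def split_for_tts(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Prefers breaks after ., ! or ?; a single over-long sentence is cut
    at exactly `limit` characters, which may fall mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one request.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own `telclaude tts` request, which produces one audio file per chunk.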
Cost Awareness
OpenAI TTS pricing (per 1000 characters):
- •tts-1: $0.015/1K chars
- •tts-1-hd: $0.030/1K chars
Example: A 500-word response (~2500 chars) costs ~$0.04 with tts-1
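The arithmetic behind that example can be written as a small estimator. This is a sketch using only the two rates listed above; `estimate_cost` is a hypothetical helper, and actual billing may count characters differently.

```python
# Rates from the list above, in USD per 1000 characters.
PRICE_PER_1K_CHARS = {"tts-1": 0.015, "tts-1-hd": 0.030}


def estimate_cost(num_chars, model="tts-1"):
    """Estimate the USD cost of synthesizing `num_chars` characters."""
    if model not in PRICE_PER_1K_CHARS:
        raise ValueError(f"unknown model: {model}")
    return num_chars / 1000 * PRICE_PER_1K_CHARS[model]
```

For 2500 characters this gives 2.5 × $0.015 = $0.0375 with tts-1, which rounds to the ~$0.04 quoted above, and $0.075 with tts-1-hd.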