Voice Reply Skill

Generate spoken audio responses using OpenAI's Text-to-Speech API.

When to Use

•When the user asks to "reply by voice", "voice reply", "speak this", or makes a similar voice-related request.
•Command trigger: when the user sends /voice_note, resend the last message as a voice note.
  •Remove all emojis from the text before converting it to speech.
  •Rephrase if needed so it sounds natural when spoken (e.g., convert bullet points to flowing sentences).

How to Use

1. Prepare the Response

Write your response without emojis — they don't translate well to speech.

2. Generate Audio

Important: Use opus format for Telegram voice notes (shows waveform bubble).

bash

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "<your text here>",
    "voice": "echo",
    "speed": 1.2,
    "response_format": "opus"
  }' --fail -s --output /tmp/voice_reply.ogg
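Pasting the text straight into the single-quoted JSON body breaks as soon as it contains a double quote, an apostrophe, or a line break. One possible way around that is to escape the text first and pipe the payload to curl. The helper names json_escape, build_tts_payload, and generate_voice below are hypothetical, and the escaping is a minimal sketch that handles only backslashes, double quotes, tabs, and newlines; a tool such as jq would be more thorough if it is available.

```bash
#!/usr/bin/env bash
# All function names here are hypothetical helpers, not part of the skill.

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, tab and newline only (assumption: no other
# control characters appear in the text).
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e "s/$(printf '\t')/\\\\t/g" \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Build the request body. $1 = text, $2 = voice (default echo), $3 = speed (default 1.2).
build_tts_payload() {
  printf '{"model":"gpt-4o-mini-tts","input":"%s","voice":"%s","speed":%s,"response_format":"opus"}' \
    "$(json_escape "$1")" "${2:-echo}" "${3:-1.2}"
}

# Send the request. $1 = text, $2 = output path.
# --fail makes curl exit non-zero on an HTTP error instead of saving the
# error body as if it were audio.
generate_voice() {
  build_tts_payload "$1" | curl --fail --silent --show-error \
    https://api.openai.com/v1/audio/speech \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @- --output "$2"
}

# Usage (commented out so sourcing this file makes no network call):
#   generate_voice 'She said "hello" and left.' /tmp/voice_reply.ogg
```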

3. Send the Audio

Copy the file to the outbound folder, then send it with the message tool:

bash

mkdir -p /home/exedev/.clawdbot/media/outbound
cp /tmp/voice_reply.ogg /home/exedev/.clawdbot/media/outbound/voice_reply.ogg

Then use the message tool with asVoice: true for proper voice message format:

json

{
  "action": "send",
  "channel": "telegram",
  "to": "<user_id>",
  "media": "/home/exedev/.clawdbot/media/outbound/voice_reply.ogg",
  "asVoice": true
}

Important:

•Use the .ogg (opus) format, which Telegram requires for voice notes
•asVoice: true sends the audio as a voice bubble with a waveform
•A message caption is optional for voice notes
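Because the file has to be real Ogg audio, it can help to check it before copying it to the outbound folder. The sketch below is an illustration under assumptions: is_ogg and stage_voice_file are hypothetical names, and the check only looks at the first four bytes ("OggS", the Ogg page marker), so it does not prove the stream is Opus.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the skill.

# Succeeds if the file starts with the Ogg capture pattern "OggS".
is_ogg() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "OggS" ]
}

# Copy an audio file into the outbound folder and print the staged path.
# $1 = source file, $2 = outbound folder (defaults to the path used in this document).
stage_voice_file() {
  src="$1"
  outdir="${2:-/home/exedev/.clawdbot/media/outbound}"
  if ! is_ogg "$src"; then
    echo "not an Ogg file: $src" >&2
    return 1
  fi
  mkdir -p "$outdir" && cp "$src" "$outdir/" && printf '%s\n' "$outdir/$(basename "$src")"
}

# Usage (commented out so sourcing this file copies nothing):
#   stage_voice_file /tmp/voice_reply.ogg
```

The printed path is what would go into the "media" field of the message tool call.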

Configuration Options

Voice Options

Voice	Description
`alloy`	Neutral, balanced
`echo`	Warm, conversational (default)
`fable`	British, expressive
`onyx`	Deep, authoritative
`nova`	Friendly, upbeat
`shimmer`	Soft, calm

Speed

•Range: 0.25 to 4.0
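•Default in this skill: 1.2 (slightly faster than normal; the API's own default is 1.0)
•Note: OpenAI's API reference lists speed as not working with gpt-4o-mini-tts, so the setting may only take effect with tts-1 or tts-1-hd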

Model

•gpt-4o-mini-tts — Fast, cost-effective
•tts-1 — Standard quality
•tts-1-hd — High definition
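The options above can be checked before a request is sent. This is a minimal sketch with hypothetical function names; it validates only against the voices, speed range, and models listed in this document, so it would reject any other value the API might accept.

```bash
#!/usr/bin/env bash
# Hypothetical validators based solely on the options listed in this document.

valid_voice() {
  case "$1" in
    alloy|echo|fable|onyx|nova|shimmer) return 0 ;;
    *) return 1 ;;
  esac
}

valid_model() {
  case "$1" in
    gpt-4o-mini-tts|tts-1|tts-1-hd) return 0 ;;
    *) return 1 ;;
  esac
}

# Speed must be a plain decimal number between 0.25 and 4.0 inclusive.
valid_speed() {
  awk -v s="$1" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.25 && s + 0 <= 4.0) }'
}

# Usage (commented out so sourcing this file prints nothing):
#   valid_voice "echo" && valid_speed "1.2" && valid_model "tts-1" && echo "config ok"
```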

Example Workflow

bash

# 1. Generate audio (opus format for Telegram voice notes)
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Hey Oscar! Your main task today is Create Task Skill. Let me know if you need help!",
    "voice": "echo",
    "speed": 1.2,
    "response_format": "opus"
  }' --fail -s --output /tmp/reply.ogg

# 2. Copy to outbound (create the folder first if it does not exist)
mkdir -p ~/.clawdbot/media/outbound
cp /tmp/reply.ogg ~/.clawdbot/media/outbound/reply.ogg

# 3. Send via message tool with asVoice: true

Tips

•Keep responses concise for voice — long text becomes tiring to listen to
•Avoid special characters, URLs, and code blocks
•Use natural language, as if speaking to someone
•Write numbers and dates the way they are spoken (e.g., "March third" rather than "03/03")
