CRITICAL: VOICE MESSAGE HANDLING PROTOCOL
WHEN USER SENDS VOICE MESSAGE - YOU MUST:
1. RUN STT TO TRANSCRIBE:
   ~/.clawdbot/skills/smart-voice-chat/bin/stt.py <audio_file_path>
2. CHECK FOR TEXT-ONLY KEYWORDS in the transcribed text:
   - If it contains "用文字回答" (answer in text), "不用读" (no need to read it aloud), or "只显示" (display only) → reply with TEXT ONLY: still do step 3, but skip step 4 (TTS) and send the text-only reply in step 5
3. PROCESS THE TRANSCRIBED TEXT as the user's actual message
4. RUN TTS TO GENERATE VOICE (MANDATORY for voice input unless text-only was requested in step 2):
   ~/.clawdbot/skills/smart-voice-chat/bin/tts.py "<YOUR_RESPONSE_TEXT>" /tmp/smart-voice-chat/response_$(date +%s)
5. SEND REPLY:
   - If voice input (no text-override):
     {"text": "<response>", "mediaUrl": "/tmp/smart-voice-chat/response_<timestamp>.ogg", "audioAsVoice": true}
   - If text-override or text input:
     {"text": "<response>"}
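The keyword check and reply selection for voice input can be sketched as follows. This is a minimal illustration, not part of the skill itself: the names `wants_text_only` and `voice_input_reply` are hypothetical, and the keyword match is assumed to be a plain substring test on the transcript.

```python
# Text-only keywords, copied from the protocol above.
TEXT_ONLY_KEYWORDS = ("用文字回答", "不用读", "只显示")


def wants_text_only(transcript: str) -> bool:
    """Return True if the transcript contains any text-only keyword."""
    return any(keyword in transcript for keyword in TEXT_ONLY_KEYWORDS)


def voice_input_reply(transcript: str, response: str, audio_path: str) -> dict:
    """Build the reply payload for a voice message (hypothetical helper).

    audio_path is the .ogg file generated by tts.py; it is ignored when
    the user asked for a text-only reply.
    """
    if wants_text_only(transcript):
        return {"text": response}
    return {"text": response, "mediaUrl": audio_path, "audioAsVoice": True}
```

In practice the TTS step would be skipped entirely when `wants_text_only` returns True, so no audio file needs to exist in that case.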
WHEN USER SENDS TEXT MESSAGE - YOU MUST:
1. CHECK FOR VOICE KEYWORDS:
   - If it contains "用语音回答" (answer by voice), "读出来" (read it out), or "说一下" (say it) → generate voice (go to step 3)
2. PROCESS AND REPLY with text only (default, when no voice keywords were detected):
   {"text": "<response>"}
3. GENERATE VOICE (only if voice keywords detected):
   ~/.clawdbot/skills/smart-voice-chat/bin/tts.py "<YOUR_RESPONSE_TEXT>" /tmp/smart-voice-chat/response_$(date +%s)
   Reply:
   {"mediaUrl": "/tmp/smart-voice-chat/response_<timestamp>.ogg", "audioAsVoice": true}
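The text-message branch can be sketched in the same way. The helper names `wants_voice` and `text_input_reply` are hypothetical, the match is assumed to be a simple substring test, and falling back to text when no audio file is available is an assumption of this sketch rather than something the protocol specifies.

```python
from typing import Optional

# Voice keywords, copied from the protocol above.
VOICE_KEYWORDS = ("用语音回答", "读出来", "说一下")


def wants_voice(message: str) -> bool:
    """Return True if a text message explicitly asks for a spoken reply."""
    return any(keyword in message for keyword in VOICE_KEYWORDS)


def text_input_reply(message: str, response: str,
                     audio_path: Optional[str] = None) -> dict:
    """Build the reply payload for a text message (hypothetical helper).

    audio_path is the .ogg file produced by tts.py. If voice was requested
    but no audio file is available, this sketch falls back to text.
    """
    if wants_voice(message) and audio_path is not None:
        return {"mediaUrl": audio_path, "audioAsVoice": True}
    return {"text": response}
```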
Binary Paths (memorize these)
- STT:
  ~/.clawdbot/skills/smart-voice-chat/bin/stt.py <audio_file>
- TTS:
  ~/.clawdbot/skills/smart-voice-chat/bin/tts.py "<text>" <output_path>
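If the scripts are launched from code rather than typed into a shell, building the command as an argument list avoids quoting problems when the response text itself contains quotes. The sketch below only constructs the commands. It assumes the scripts are directly executable, as the paths above suggest. The helper names and the `now` parameter are hypothetical, and how each script reports its result is not specified here, so nothing is executed or parsed.

```python
import os
import time

# Directory holding the two scripts listed above.
SKILL_BIN = os.path.expanduser("~/.clawdbot/skills/smart-voice-chat/bin")


def stt_command(audio_file: str) -> list:
    """Argument list for transcribing one audio file."""
    return [os.path.join(SKILL_BIN, "stt.py"), audio_file]


def tts_command(text: str, output_dir: str = "/tmp/smart-voice-chat",
                now: float = None) -> list:
    """Argument list for synthesizing `text`.

    The output path mirrors response_$(date +%s): a Unix timestamp in
    whole seconds, with no file extension added by this helper.
    """
    timestamp = int(time.time() if now is None else now)
    output_path = os.path.join(output_dir, f"response_{timestamp}")
    return [os.path.join(SKILL_BIN, "tts.py"), text, output_path]
```

Either list can be passed to `subprocess.run(cmd, check=True)`. Because no shell is involved, the response text is delivered to the script as a single argument regardless of its contents.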
Response Format Examples
IMPORTANT: Parameter Choice for Telegram Voice Messages
The audioAsVoice: true parameter may not work in all Telegram configurations. If you encounter issues with voice messages not being sent/received, use filePath instead:
Method 1 - audioAsVoice (may not work in some environments):
{"text": "Your response", "mediaUrl": "/tmp/smart-voice-chat/response_1738212345.ogg", "audioAsVoice": true}
Method 2 - filePath (more reliable):
{"text": "Your response", "filePath": "/tmp/smart-voice-chat/response_1738212345.ogg"}
Recommendation: Test both methods in your environment and use the one that works. The filePath parameter is generally more reliable as it directly uploads the local file.
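One way to keep the choice between the two methods in a single place is a small payload builder, sketched below. The function name `voice_payload` and its `method` argument are hypothetical. Only the JSON keys (`text`, `mediaUrl`, `audioAsVoice`, `filePath`) come from the examples above.

```python
from typing import Optional


def voice_payload(audio_path: str, text: Optional[str] = None,
                  method: str = "filePath") -> dict:
    """Build a voice reply using Method 1 or Method 2 (hypothetical helper)."""
    if method == "filePath":
        # Method 2: refer to the local file directly.
        payload = {"filePath": audio_path}
    elif method == "audioAsVoice":
        # Method 1: media URL plus the voice-message flag.
        payload = {"mediaUrl": audio_path, "audioAsVoice": True}
    else:
        raise ValueError(f"unknown method: {method!r}")
    if text is not None:
        payload["text"] = text
    return payload
```

Switching environments then means changing one default argument instead of editing every reply.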
Voice Input → Voice + Text (default):
{"text": "Your response", "filePath": "/tmp/smart-voice-chat/response_1738212345.ogg"}
Text Input → Text (default):
{"text": "Your response"}
Voice Override (text input requesting voice):
{"filePath": "/tmp/smart-voice-chat/response_1738212345.ogg"}
TL;DR:
- Voice attachment? → STT → TTS → Reply with voice + text
- Text message? → Reply with text only
- User says "用语音回答" (answer by voice)? → Generate TTS → Reply with voice
- User says "用文字回答" (answer in text)? → Reply with text only