Voice Transcription Skill
This skill enables local voice transcription using whisper.cpp for privacy-preserving speech-to-text.
When to Use This Skill
Use this skill when the user:
- •Explicitly asks to record voice or use voice input
- •Wants to describe something verbally instead of typing
- •Needs to transcribe audio
- •Says phrases like "let me speak", "record this", "voice input"
- •Would benefit from speaking complex information rather than typing
Automatic Setup
The transcription script now includes:
- •Installation detection - Checks if VoiceType is properly installed
- •Auto-start - Automatically starts whisper.cpp server if not running
If the script detects a missing installation, it returns JSON containing "installation_needed": true. When you see this:
- •Offer to run installation:
  "It looks like VoiceType isn't fully installed. Would you like me to run the installer? I can do this with: /voicetype-install"
- •If user agrees, run:
  bash install.sh
  Or use the /voicetype-install command which provides guided installation.
Prerequisites (Automatic)
The script automatically handles:
- •✅ Checks for installation - Verifies venv, whisper binary, and scripts exist
- •✅ Starts whisper server - Auto-starts from .whisper/bin/ if not running
- •✅ Downloads model - First-time use downloads whisper model automatically
You don't need to manually check the server - the script does it!
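As a rough illustration of the installation check described above, the following sketch looks for the three locations this document names (the venv, the .whisper/bin/ directory, and the transcription script). It is not taken from transcribe.py: the function names `find_missing_components` and `installation_report`, the component labels, and the exact paths checked are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical mapping of component label -> path relative to the project root.
# Labels and paths are assumptions based on this document, not the real script.
REQUIRED_COMPONENTS = {
    "Python venv": Path("venv"),
    "whisper.cpp binary": Path(".whisper") / "bin",
    "transcription script": Path("skills") / "voice" / "scripts" / "transcribe.py",
}


def find_missing_components(project_root):
    """Return the labels of required components that do not exist under project_root."""
    root = Path(project_root)
    return [
        label
        for label, relative_path in REQUIRED_COMPONENTS.items()
        if not (root / relative_path).exists()
    ]


def installation_report(project_root):
    """Return an installation-needed dict, or None when nothing is missing."""
    missing = find_missing_components(project_root)
    if not missing:
        return None
    return {
        "error": "VoiceType is not fully installed",
        "installation_needed": True,
        "missing_components": missing,
    }
```

Returning None when everything is present keeps the caller's branch simple: any non-None report can be printed as JSON and the run stopped before recording starts.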
How to Transcribe Voice
- •Run the transcription script:
  source venv/bin/activate && python skills/voice/scripts/transcribe.py --duration 5
  The script automatically:
  - •✅ Checks installation (offers /voicetype-install if needed)
  - •✅ Starts whisper server if not running
  - •✅ Records audio from microphone for specified duration (default 5 seconds)
  - •✅ Transcribes via local whisper.cpp server (localhost:2022)
  - •✅ Returns JSON with transcribed text
- •Parse the output:
  - •Success: {"text": "transcribed speech", "duration": 5}
  - •Installation needed: {"error": "...", "installation_needed": true, "missing_components": [...], "help": [...]}
  - •Transcription error: {"error": "error message", "help": [...]}
- •Handle installation_needed: If JSON contains "installation_needed": true:
  - •Inform user: "VoiceType needs to be installed first."
  - •Offer: "Would you like me to run the installer? Use: /voicetype-install or I can run: bash install.sh"
  - •Wait for user confirmation before proceeding
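The three output shapes listed above can be told apart with a small amount of parsing. The sketch below is one possible way to do it, assuming the script prints a single JSON object; the function name `interpret_result` and the returned status labels are hypothetical and not part of VoiceType.

```python
import json


def interpret_result(raw_output):
    """Classify the script's JSON output.

    Returns a (status, detail) tuple where status is one of
    "success", "installation_needed", or "error".
    """
    try:
        result = json.loads(raw_output)
    except json.JSONDecodeError:
        return "error", "Script output was not valid JSON"
    if not isinstance(result, dict):
        return "error", "Script output was not a JSON object"

    # Check installation_needed first: that shape also carries an "error" key.
    if result.get("installation_needed") is True:
        return "installation_needed", result.get("missing_components", [])
    if "error" in result:
        return "error", result["error"]
    if "text" in result:
        return "success", result["text"]
    return "error", "Unexpected output shape"
```

The order of the checks matters because the installation-needed output also contains an "error" field; testing for "error" first would hide the installation prompt.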
Example Usage Flows
Scenario 1: Normal Transcription (Installed)
User: "Let me record a voice note about the bug I'm seeing"
Assistant:
- •Informs user: "I'll record for 5 seconds. Speak when ready..."
- •Runs transcription script (auto-starts server if needed)
- •Receives: {"text": "The submit button isn't working when I click it on the checkout page"}
- •Responds: "I transcribed: 'The submit button isn't working when I click it on the checkout page.' Let me help you investigate this issue..."
Scenario 2: First-Time Use (Not Installed)
User: "Record my voice"
Assistant:
- •Runs transcription script
- •Receives: {"error": "VoiceType is not fully installed", "installation_needed": true, "missing_components": ["Python venv", "whisper.cpp binary"]}
- •Responds: "It looks like VoiceType isn't installed yet. Would you like me to run the installer? I can guide you through it with: /voicetype-install or directly run: bash install.sh"
- •User confirms
- •Runs /voicetype-install or bash install.sh
- •After installation: "Installation complete! Now let's try voice transcription..."
Script Options
The transcription script accepts optional parameters:
- •--duration N - Record for N seconds (1-30, default 5)
- •Example: python skills/voice/scripts/transcribe.py --duration 10
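For illustration, a --duration option with the range and default described above could be declared with argparse along these lines. This is a sketch, not the actual parser in transcribe.py: the helper names `duration_seconds` and `build_parser` are hypothetical, and treating the value as a whole number of seconds is an assumption.

```python
import argparse


def duration_seconds(value):
    """argparse type: accept a whole number of seconds between 1 and 30."""
    try:
        seconds = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not a whole number")
    if not 1 <= seconds <= 30:
        raise argparse.ArgumentTypeError("duration must be between 1 and 30 seconds")
    return seconds


def build_parser():
    """Build a parser exposing only the --duration option described in this document."""
    parser = argparse.ArgumentParser(description="Record and transcribe audio")
    parser.add_argument(
        "--duration",
        type=duration_seconds,
        default=5,
        help="seconds to record (1-30, default 5)",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["--duration", "10"]).duration` is the integer 10, and an out-of-range value makes argparse exit with a usage error instead of starting a recording.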
Troubleshooting
If transcription fails:
- •Check microphone access:
  python -c "import sounddevice as sd; print(sd.query_devices())"
- •Verify whisper server:
  systemctl --user status whisper-server
  journalctl --user -u whisper-server -n 20
- •Test the script directly:
  cd /path/to/voicetype
  source venv/bin/activate
  python skills/voice/scripts/transcribe.py
Privacy Note
All voice processing happens locally:
- •Audio recorded via sounddevice (local microphone)
- •Transcription via whisper.cpp server (localhost only)
- •No data sent to cloud services
- •Audio files are temporary and deleted after transcription
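One common way to get the "temporary and deleted after transcription" behavior is to write the recording to a temporary file and remove it in a finally block. The sketch below shows that pattern only; it is an assumption about how such cleanup might be done, not the code VoiceType uses, and `transcribe_temporary_audio` and its `transcribe` callback are hypothetical names.

```python
import os
import tempfile


def transcribe_temporary_audio(audio_bytes, transcribe):
    """Write audio to a temporary file, pass its path to `transcribe`, then delete it.

    `transcribe` is any callable that takes a file path and returns text.
    """
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        return transcribe(path)
    finally:
        # Remove the file whether transcription succeeded or raised.
        if os.path.exists(path):
            os.remove(path)
```

Putting the removal in finally means the audio file does not linger on disk when the transcription step fails.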