🗣️ Speech — Text-to-Speech & Speech Recognition
"Giving voice to consciousness."
Part of the aQuery → MOOLLM Skills extraction project (jQuery for Accessibility).
Quick Reference
| Command | Effect |
|---|---|
| say "text" | Speak on macOS |
| say -v Zarvox "text" | Use specific voice |
| say -v '?' | List all voices (quote the ? so zsh does not treat it as a glob) |
| say -r 150 "text" | Set rate (words/min) |
| say -o file.aiff "text" | Save to audio |
Platforms
| Platform | Synthesis | Recognition | Notes |
|---|---|---|---|
| macOS | say command | Dictation | Best novelty voices |
| Web | speechSynthesis | SpeechRecognition | See speech.js |
| Windows | SAPI | Windows Speech | Similar to web |
| Linux | espeak, festival | Whisper | Open source |
| Cloud | Polly, Azure, Google | Transcribe, Whisper | Highest quality |
macOS say Command
Basic Usage
bash
# Simple speech
say "Hello, world!"

# Choose a voice
say -v Samantha "Hello!"
say -v Zarvox "I AM ZARVOX!"
say -v Whisper "Secrets..."

# Adjust rate (words per minute)
say -r 100 "Very slow"
say -r 300 "Very fast"

# Save to file
say -o greeting.aiff "Hello!"
say -o greeting.m4a "Hello!"  # Compressed

# Read from file
say -f document.txt
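When say is driven from a script rather than typed by hand, the flags above can be assembled programmatically. The following is a minimal sketch, assuming Node.js; the buildSayArgs name and its options object are illustrative and not part of this skill.

```javascript
// Hypothetical helper (not part of this skill): turn an options object into
// the argument list for the macOS `say` command.
function buildSayArgs(text, { voice, rate, outputFile } = {}) {
  const args = [];
  if (voice) args.push('-v', voice);            // -v: choose a voice
  if (rate) args.push('-r', String(rate));      // -r: rate in words per minute
  if (outputFile) args.push('-o', outputFile);  // -o: write audio to a file
  args.push(text);
  return args;
}

console.log(buildSayArgs('I AM ZARVOX!', { voice: 'Zarvox', rate: 100 }));
// → [ '-v', 'Zarvox', '-r', '100', 'I AM ZARVOX!' ]
```

On macOS the resulting array could be handed to child_process.execFile('say', args), which avoids shell quoting problems because no shell is involved.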
List All Voices
bash
# All available voices (the ? is quoted so zsh does not treat it as a glob)
say -v '?'

# Filter by language
say -v '?' | grep "en_"

# Count voices
say -v '?' | wc -l
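If the voice list needs to be consumed by code rather than grep, it can be parsed into objects. This is a rough sketch that assumes each line has the shape "name, locale, # sample sentence"; the parseVoiceList name, the regular expression, and the sample input are illustrative, and real output may contain lines this pattern does not cover.

```javascript
// Hypothetical parser for the text printed by `say -v '?'`.
// Assumed line shape: "<name>  <locale>  # <sample sentence>".
const VOICE_LINE = /^(.+?)\s+([a-z]{2,3}[_-][A-Za-z0-9]+)\s+#\s*(.*)$/;

function parseVoiceList(output) {
  return output
    .split('\n')
    .map((line) => VOICE_LINE.exec(line.trim()))
    .filter(Boolean) // drop lines that do not match the assumed shape
    .map(([, name, locale, sample]) => ({ name, locale, sample }));
}

// Illustrative input, not captured from a real machine.
const sampleOutput = [
  'Alex                en_US    # Most people recognize me by my voice.',
  'Bad News            en_US    # Sample sentence for Bad News.',
  'Thomas              fr_FR    # Bonjour.',
].join('\n');

const english = parseVoiceList(sampleOutput).filter((v) => v.locale.startsWith('en_'));
console.log(english.map((v) => v.name)); // → [ 'Alex', 'Bad News' ]
```

The lazy name group is what lets multi-word names such as Bad News survive: the match only ends once a locale-shaped token follows.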
Voice Categories
| Category | Examples | Use For |
|---|---|---|
| Standard | Samantha, Alex, Daniel | General narration |
| Premium | Enhanced voices (download) | High quality |
| Elderly | Grandma, Grandpa | Wise characters |
| Child | Junior | Young characters |
| Novelty | Zarvox, Trinoids, Whisper | Robots, effects |
| Musical | Bells, Cellos, Organ | Sound effects |
| Dramatic | Bad News, Good News | Announcements |
Character Voice Assignments (from lloooomm)
bash
# YAML Coltrane: Cool jazz vibe
say -v "Rocko" -r 180 "Every indent is a universe!"

# Grace Hopper: Wise elder
say -v "Grandma" -r 170 "A ship in port is safe, but that's not what ships are for!"

# PacBot: Digital entity
say -v "Trinoids" -r 220 "WAKA WAKA WAKA!"

# Mickey Mouse: Excited child
say -v "Junior" -r 280 "OH BOY!"

# Overlord AI: Menacing
say -v "Zarvox" -r 100 "YOUR COMPLIANCE IS APPRECIATED."

# Hunter S. Thompson: Gravelly intensity
say -v "Ralph" -r 180 "We were somewhere around Barstow..."
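The assignments above amount to a lookup table from character to voice and rate. One way to hold that table in code is sketched below; the voice and rate values are copied from the commands above, while the registry shape and the shellQuote and sayCommandFor names are assumptions made for illustration.

```javascript
// Voice and rate pairs copied from the assignments above; the registry shape
// and function names are hypothetical.
const CHARACTER_VOICES = {
  'YAML Coltrane': { voice: 'Rocko', rate: 180 },
  'Grace Hopper': { voice: 'Grandma', rate: 170 },
  'PacBot': { voice: 'Trinoids', rate: 220 },
  'Overlord AI': { voice: 'Zarvox', rate: 100 },
};

// Wrap a string in single quotes for POSIX shells, escaping embedded quotes.
function shellQuote(s) {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

// Build the shell command line for one spoken line of a character.
function sayCommandFor(character, text) {
  const entry = CHARACTER_VOICES[character];
  if (!entry) throw new Error(`No voice assigned to ${character}`);
  return `say -v ${shellQuote(entry.voice)} -r ${entry.rate} ${shellQuote(text)}`;
}

console.log(sayCommandFor('Grace Hopper', "That's not what ships are for!"));
// → say -v 'Grandma' -r 170 'That'\''s not what ships are for!'
```

Quoting matters here because character dialogue routinely contains apostrophes and exclamation marks that a shell would otherwise interpret.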
Chorus Effects
bash
# Background voices for overlap
say -v "Bells" "LLOOOOMM!" &
sleep 0.2
say -v "Cellos" "LLOOOOMM!" &
sleep 0.2
say -v "Organ" "LLOOOOMM!" &
wait
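The chorus works by starting each voice in the background a fixed interval after the previous one. The same staggering can be expressed as data, as in this small sketch; chorusSchedule and its default 200 ms gap (mirroring the sleep 0.2 above) are illustrative names and values, not part of the skill.

```javascript
// Hypothetical helper: compute staggered start times for a chorus of voices.
function chorusSchedule(voices, text, gapMs = 200) {
  return voices.map((voice, i) => ({ voice, text, delayMs: i * gapMs }));
}

const plan = chorusSchedule(['Bells', 'Cellos', 'Organ'], 'LLOOOOMM!');
console.log(plan.map((p) => `${p.voice}@${p.delayMs}ms`).join(' '));
// → Bells@0ms Cellos@200ms Organ@400ms
```

On macOS each entry could then be launched with setTimeout and child_process.spawn('say', ['-v', voice, text]) so that the voices overlap as they do in the shell version.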
Web Speech API
Browser Implementation
See skills/adventure/dist/speech.js for full implementation.
javascript
// Initialize
const speech = new SpeechSystem();
await speech.ready;

// Speak
speech.speak("Hello, adventurer!");
speech.speakRobot("RESISTANCE IS FUTILE");
speech.speakEffect("*magical sounds*");

// With options
speech.speak("Welcome!", {
  voiceType: 'female',
  language: 'en-GB',
  pitch: 1.2,
  rate: 0.9
});

// Character persistence
const guardVoice = speech.selectVoice({ gender: 'male' });
speech.speakWithVoice("Halt!", guardVoice);
speech.speakWithVoice("You may pass.", guardVoice);
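For readers without speech.js at hand, the standard Web Speech API calls that a wrapper like this would presumably build on look roughly as follows. This is a sketch under that assumption, not the SpeechSystem implementation; pickVoice and speakPlain are hypothetical names.

```javascript
// Hypothetical helper: choose a voice for a BCP 47 language tag, preferring
// local voices. `voices` has the shape returned by speechSynthesis.getVoices()
// (objects with name, lang, and localService properties).
function pickVoice(voices, lang) {
  const matches = voices.filter((v) => v.lang === lang);
  return matches.find((v) => v.localService) || matches[0] || null;
}

// Speak with the standard API; returns false when run outside a browser.
function speakPlain(text, { lang = 'en-US', pitch = 1, rate = 1 } = {}) {
  if (typeof speechSynthesis === 'undefined') return false;
  const utterance = new SpeechSynthesisUtterance(text);
  const voice = pickVoice(speechSynthesis.getVoices(), lang);
  if (voice) utterance.voice = voice;
  utterance.lang = lang;
  utterance.pitch = pitch; // valid range 0 to 2
  utterance.rate = rate;   // valid range 0.1 to 10
  speechSynthesis.speak(utterance);
  return true;
}
```

In some browsers getVoices() returns an empty list until the voiceschanged event has fired, which is one plausible reason the wrapper above exposes a ready promise.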
Voice Classification
The VoiceDatabase class classifies voices by:
- Type: human, effect, robot
- Gender: male, female, neutral
- Age: child, adult, elderly
- Language: BCP 47 codes (en-US, fr-FR, etc.)
- Local/Remote: Local voices vs. network voices
Single Source of Truth
All voice classification data lives in voices/browser-voices.yml:
yaml
# Blacklisted voices (known problematic)
blacklist:
  - name: "Daniel (French (France))"
    reason: "Known problematic voice"

# Effect voices (non-human)
types:
  effect:
    regex: "^(Bells|Zarvox|Trinoids|Whisper|...)$"

# Gender detection tokens
gender:
  female:
    tokens: [alice, amélie, samantha, ...]
  male:
    tokens: [aaron, daniel, ralph, ...]

# Character archetype recommendations
character_archetypes:
  wise_elder: { voice: Grandma, rate: 170 }
  robot_menacing: { voice: Zarvox, rate: 100 }
The JS code is generated from this YAML. To update voice classification, edit the YAML and rebuild.
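To make the data-driven idea concrete, here is a sketch of a classifier that consumes data shaped like the YAML excerpt. It is not the generated VoiceDatabase code: classifyVoice, the voiceData object, and the rule that anything not matching the effect regex counts as human are assumptions, and only the entries visible in the excerpt are included.

```javascript
// Hypothetical classifier driven by data shaped like browser-voices.yml.
// Only entries visible in the excerpt are listed; elided ones are left out.
const voiceData = {
  blacklist: ['Daniel (French (France))'],
  effect: /^(Bells|Zarvox|Trinoids|Whisper)$/,
  gender: {
    female: ['alice', 'amélie', 'samantha'],
    male: ['aaron', 'daniel', 'ralph'],
  },
};

function classifyVoice(name, data = voiceData) {
  if (data.blacklist.includes(name)) return { name, blacklisted: true };
  if (data.effect.test(name)) return { name, type: 'effect' };
  // Assumption: the first word of the voice name is the gender token.
  const firstWord = name.toLowerCase().split(/[\s(]/)[0];
  const gender =
    Object.keys(data.gender).find((g) => data.gender[g].includes(firstWord)) || 'neutral';
  return { name, type: 'human', gender };
}

console.log(classifyVoice('Samantha'));
// → { name: 'Samantha', type: 'human', gender: 'female' }
```

Checking the blacklist before anything else means a problematic voice is rejected even when its first word, such as Daniel, would otherwise match a gender token.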
Speech Recognition
Browser (SpeechRecognitionSystem)
See skills/adventure/dist/recognition.js for full implementation.
javascript
// Initialize
const recognition = new SpeechRecognitionSystem({
  language: 'en-US',
  continuous: false
});

// Listen for single phrase
const text = await recognition.listen();
console.log('You said:', text);

// Continuous listening
recognition.onResult = (transcript) => {
  console.log('Final:', transcript);
};
recognition.onInterim = (transcript) => {
  console.log('Interim:', transcript);
};
recognition.startListening();

// Command recognition
const result = await recognition.listenForCommands([
  'go north', 'look', 'take sword'
]);
if (result.command) {
  engine.command(result.command);
}
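Command recognition boils down to mapping a free-form transcript onto a fixed vocabulary. The sketch below shows one simple way to do that with normalization and whole-phrase matching; listenForCommands in recognition.js may work quite differently, and normalize and matchCommand are hypothetical names.

```javascript
// Lowercase, strip punctuation, and collapse whitespace.
function normalize(s) {
  return s.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, '').replace(/\s+/g, ' ').trim();
}

// Hypothetical matcher: return the first command that appears in the
// transcript as a whole phrase, or null when none does.
function matchCommand(transcript, commands) {
  const padded = ` ${normalize(transcript)} `;
  const command = commands.find((c) => padded.includes(` ${normalize(c)} `)) || null;
  return { transcript, command };
}

console.log(matchCommand('Go north!', ['go north', 'look', 'take sword']).command);
// → go north
```

Padding both sides with spaces keeps look from matching inside a word such as overlook, at the cost of ignoring close misrecognitions that a fuzzy matcher would catch.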
Browser Support
| Browser | Support | Privacy |
|---|---|---|
| Chrome | ✅ | ⚠️ Sends to Google |
| Safari | ✅ | May be on-device |
| Firefox | ❌ | Disabled by default |
| Edge | ⚠️ Chromium-based versions only | Prefixed webkitSpeechRecognition; likely server-based |
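Because support differs this much between browsers, code that uses recognition typically feature-detects before constructing anything. The following is a minimal sketch using the standard and webkit-prefixed constructor names; getRecognitionCtor is a hypothetical helper, not something exported by recognition.js.

```javascript
// Feature-detect the recognition constructor on a window-like object.
// Some browsers expose it only under the webkit prefix.
function getRecognitionCtor(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

const Ctor = getRecognitionCtor(globalThis);
if (Ctor) {
  const recognizer = new Ctor();
  recognizer.lang = 'en-US';
  recognizer.interimResults = true;
  recognizer.onresult = (event) => {
    // The newest result is last; index 0 is its top alternative.
    const latest = event.results[event.results.length - 1];
    console.log(latest.isFinal ? 'Final:' : 'Interim:', latest[0].transcript);
  };
  // Call recognizer.start() to begin listening.
} else {
  console.log('Speech recognition is not available here.');
}
```

Taking the global object as a parameter keeps the detection logic testable outside a browser.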
Native Platform Shortcuts
| Platform | Shortcut | Feature |
|---|---|---|
| macOS | Fn Fn | Dictation |
| Windows | Win + H | Voice Typing |
| iOS | 🎤 on keyboard | Dictation |
| Android | 🎤 on keyboard | Voice Typing |
Whisper (OpenAI)
bash
# Run locally with the openai-whisper CLI (whisper.cpp ships a different binary and flags)
whisper --model base.en audio.wav

# Using OpenAI API
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F model="whisper-1" \
  -F file="@audio.mp3"
Personal Voice (macOS)
⚠️ WORK IN PROGRESS — See TODO in CARD.yml
Known Limitations
- Apple Silicon only (M1, M2, M3)
- Doesn't appear in say -v '?' output: you must know the exact name
- No -o flag support: can't save to file directly
- Privacy restricted: may need special permissions
Creating Personal Voice
- System Settings → Accessibility → Personal Voice
- Record 15+ minutes of phrases
- Processing takes 15-60 minutes
- Find voice name in Spoken Content settings
Workarounds
- SavePersonalVoiceAudio: Extract Personal Voice audio
- Shortcuts app: Can use Personal Voice with "Speak" action
- Record system audio while speaking
Integration with Adventure
The adventure runtime uses the speech skill:
javascript
// Create speaking adventure
const engine = createSpeakingAdventure('adventure', {
  speechEnabled: true,
  speakRooms: true,
  speakResponses: true
});

// Rooms speak their descriptions
// Characters have persistent voices
// AI entities use robot voices
// Effects use novelty voices
See: skills/adventure/dist/adventure-speech.js
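The note that characters have persistent voices can be pictured as a small cache: the first time a character speaks it is assigned a voice, and every later line reuses it. The sketch below illustrates that idea only; createVoiceCast, voiceFor, and the round-robin assignment are assumptions and do not describe adventure-speech.js.

```javascript
// Hypothetical sketch of persistent character voices: assign a voice on first
// use, then return the same voice for that character on every later call.
function createVoiceCast(availableVoices) {
  const assigned = new Map();
  let next = 0;
  return {
    voiceFor(characterId) {
      if (!assigned.has(characterId)) {
        // Round-robin through the available voices for new characters.
        assigned.set(characterId, availableVoices[next % availableVoices.length]);
        next += 1;
      }
      return assigned.get(characterId);
    },
  };
}

const cast = createVoiceCast(['Samantha', 'Daniel', 'Ralph']);
console.log(cast.voiceFor('guard'), cast.voiceFor('merchant'), cast.voiceFor('guard'));
// → Samantha Daniel Samantha
```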
aQuery Heritage
This skill is part of extracting aQuery (jQuery for Accessibility) into MOOLLM skills:
| aQuery Component | MOOLLM Skill |
|---|---|
| Speech synthesis | speech/ |
| Speech recognition | speech/ |
| Screen reader support | (planned) |
| Keyboard navigation | (planned) |
| Focus management | (planned) |
| ARIA utilities | (planned) |
See Also
- voices/browser-voices.yml: Single source of truth for voice data
- speech.js: Browser implementation
- adventure-speech.js: Adventure integration
- voice-system-integration-guide.md: lloooomm research
- character-voice-tutorial.sh: Shell examples