VoiceMix Skill
Table of Contents
- •Overview
- •Installation
- •Environment Setup
- •Core API
- •Common Tasks
- •Agent Usage Rules
- •Troubleshooting
- •References
Overview
VoiceMix is a chainable text-to-speech Node.js library that generates audio files (mp3/wav) from text using multiple TTS provider APIs.
Use this skill when:
- •A task requires generating speech audio from text
- •Integrating text-to-speech into a Node.js project
- •Working with ElevenLabs, Resemble AI, or Cartesia APIs
- •Batch-processing scripts of dialogue lines into audio files
Do NOT use this skill when:
- •The task requires real-time streaming audio playback (VoiceMix writes to files)
- •The target environment is browser-only (VoiceMix uses Node.js fs)
- •Speech-to-text (transcription) is needed; this is text-to-speech only
Installation
bash
npm install voicemix dotenv
- •Requires Node.js with ESM support ("type": "module" in package.json).
- •dotenv is optional but recommended for loading API keys from .env.
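The ESM requirement can be checked before installing anything. The sketch below is illustrative only: isEsmPackage is a hypothetical helper name, not part of VoiceMix, and it takes the text of package.json as a string so it makes no assumptions about where the file lives.

```javascript
// Hypothetical helper (not part of VoiceMix): reports whether a package.json
// declares "type": "module", which the import syntax in this guide relies on.
function isEsmPackage(packageJsonText) {
  let pkg;
  try {
    pkg = JSON.parse(packageJsonText);
  } catch {
    return false; // unparseable package.json: treat as not ESM
  }
  return pkg !== null && typeof pkg === 'object' && pkg.type === 'module';
}

console.log(isEsmPackage('{"name":"demo","type":"module"}')); // true
console.log(isEsmPackage('{"name":"demo"}'));                 // false
```

In a real project the string would come from reading the project's package.json; if the check returns false, add "type": "module" before importing voicemix.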
Environment Setup
Create a .env file with the relevant provider key(s):
plaintext
ELEVENLABS_API_KEY="your-elevenlabs-key"
RESEMBLE_API_KEY="your-resemble-key"
CARTESIA_API_KEY="your-cartesia-key"
Only the key for the provider being used is required. ElevenLabs is the default provider.
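A pre-flight check can catch a missing key before any audio is requested. This is a minimal sketch: the environment variable names come from the .env example above, while missingKeyFor and the provider labels ('elevenlabs', 'resemble', 'cartesia') are hypothetical names chosen for illustration, not VoiceMix identifiers.

```javascript
// Env var names are from the .env example; the provider labels used as keys
// here are illustrative and are not part of the VoiceMix API.
const PROVIDER_ENV_VARS = new Map([
  ['elevenlabs', 'ELEVENLABS_API_KEY'],
  ['resemble', 'RESEMBLE_API_KEY'],
  ['cartesia', 'CARTESIA_API_KEY'],
]);

// Hypothetical helper: returns the name of the env var that is missing for the
// given provider, or null when a non-empty key is present.
function missingKeyFor(provider = 'elevenlabs', env = process.env) {
  const name = PROVIDER_ENV_VARS.get(provider);
  if (!name) throw new Error(`Unknown provider label: ${provider}`);
  const value = env[name];
  return typeof value === 'string' && value.trim() !== '' ? null : name;
}

// Check the default provider and warn if its key has not been loaded.
const missing = missingKeyFor('elevenlabs');
if (missing) {
  console.warn(`Set ${missing} in .env before generating audio.`);
}
```

Run the check after dotenv.config() so that values from .env are already in process.env.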
Core API
VoiceMix uses a chainable fluent API. Every method (except save()) returns this.
Constructor
javascript
import { VoiceMix } from 'voicemix';
const vm = new VoiceMix(); // defaults: ElevenLabs, multilingual_v2, mp3, cwd
const wavVm = new VoiceMix({ filePath: './audio', format: 'wav' }); // with options
Constructor options (all optional):
| Option | Default | Description |
|---|---|---|
| filePath | './' | Output directory for audio files |
| format | 'mp3' | Output format (mp3 or wav) |
| filePrefix | '' | Prefix for generated filenames |
| drymode | false | Skip API calls, return filename |
| apiKey | env var | Override provider API key |
Provider Selection
javascript
vm.useElevenLabs(apiKey?) // default; reads ELEVENLABS_API_KEY
vm.useResemble(apiKey?)   // reads RESEMBLE_API_KEY
vm.useCartesia(apiKey?)   // reads CARTESIA_API_KEY
ElevenLabs Model Selection
javascript
vm.monolingual_v1()  // English only
vm.multilingual_v1() // First multilingual
vm.multilingual_v2() // Default; improved multilingual
vm.v3()              // Latest, most advanced
Speech Generation Chain
javascript
vm.voice('voiceId') // required — set provider voice ID
.say('Hello world') // required — set text, auto-generates hashed filename
.save(); // returns Promise<string> resolving to the full file path
Additional Methods
javascript
vm.lang('en-US') // set language (used by Resemble for SSML)
vm.prompt('Friendly tone') // set voice style prompt (Resemble only)
vm.path('./output') // change output directory
vm.prefix('ch1_') // set filename prefix
vm.file('custom-name') // override auto-generated filename
vm.id('voiceId') // alias for .voice()
Resemble-Specific
javascript
vm.setSampleRate(48000) // default 48000
vm.setPrecision('PCM_16') // MULAW | PCM_16 | PCM_24 | PCM_32
vm.setOutputFormat('mp3') // mp3 | wav
Common Tasks
Generate a Single Audio File (ElevenLabs)
javascript
import { VoiceMix } from 'voicemix';
import dotenv from 'dotenv';
dotenv.config();
const vm = new VoiceMix();
await vm
.voice('EbhcCfMvNsbvjN6OhjpJ')
.say('Hello, world!')
.save();
Generate with ElevenLabs v3
javascript
const vm = new VoiceMix();
await vm
.v3()
.voice('dxvGlXoa4TLMyfYR6uC9')
.say('This uses the latest ElevenLabs model.')
.save();
Generate with Resemble AI (with Prompt Styling)
javascript
const vm = new VoiceMix();
await vm
.useResemble()
.prompt('Friendly and conversational tone')
.voice('ba875a0a')
.lang('en-US')
.say('Your text here')
.save();
Generate with Cartesia
javascript
const vm = new VoiceMix();
await vm
.useCartesia()
.voice('6ccbfb76-1fc6-48f7-b71d-91ac6298247b')
.say('Your text here')
.save();
Batch Process a Script from JSON
javascript
import { VoiceMix } from 'voicemix';
import fs from 'fs';
const script = JSON.parse(fs.readFileSync('./lines.json', 'utf8'));
const vm = new VoiceMix({ filePath: './audio' });
for (const entry of script) {
  await vm
    .prompt(entry.prompt || 'Friendly and conversational tone')
    .voice(entry.voiceId)
    .say(entry.english)
    .save();
}
Expected lines.json format:
json
[
{
"prompt": "Friendly and conversational tone",
"english": "Hello, how are you today?",
"voiceId": "EbhcCfMvNsbvjN6OhjpJ"
}
]
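Validating the parsed script before the loop runs avoids spending API calls on entries that cannot succeed. The following is a sketch under stated assumptions: the field names (prompt, english, voiceId) come from the format above, and findInvalidEntries is a hypothetical helper, not something VoiceMix provides.

```javascript
// True for strings that contain something other than whitespace.
const isText = (value) => typeof value === 'string' && value.trim() !== '';

// Hypothetical helper (not part of VoiceMix): lists problems in a parsed
// lines.json array so bad entries are caught before any API call is made.
function findInvalidEntries(script) {
  if (!Array.isArray(script)) return ['lines.json must contain a JSON array'];
  const problems = [];
  script.forEach((entry, index) => {
    if (entry === null || typeof entry !== 'object') {
      problems.push(`entry ${index}: not an object`);
      return;
    }
    if (!isText(entry.english)) problems.push(`entry ${index}: missing "english" text`);
    if (!isText(entry.voiceId)) problems.push(`entry ${index}: missing "voiceId"`);
    // "prompt" is optional: the batch loop above falls back to a default.
    if (entry.prompt !== undefined && typeof entry.prompt !== 'string') {
      problems.push(`entry ${index}: "prompt" must be a string`);
    }
  });
  return problems;
}

const sample = [
  { prompt: 'Friendly and conversational tone', english: 'Hello, how are you today?', voiceId: 'EbhcCfMvNsbvjN6OhjpJ' },
  { english: '', voiceId: 'EbhcCfMvNsbvjN6OhjpJ' },
];
// Reports a single problem, for the second entry (index 1), whose text is empty.
console.log(findInvalidEntries(sample));
```

An empty result means every entry has the fields the batch loop reads; otherwise, fix lines.json before generating audio.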
Save to a Custom Path and Filename
javascript
const vm = new VoiceMix();
await vm
.voice('EbhcCfMvNsbvjN6OhjpJ')
.path('./output/chapter1')
.prefix('line_')
.say('Opening narration here.')
.save();
Agent Usage Rules
- •Always load environment variables: call dotenv.config() (or equivalent) before constructing VoiceMix so provider API keys are available via process.env.
- •Check if voicemix is already installed before running npm install.
- •Ensure "type": "module" exists in the project's package.json; VoiceMix is ESM-only.
- •Never hardcode API keys: use environment variables or pass keys via constructor/provider methods.
- •Always await the .save() call: it returns a Promise. Without await, files may not be written before the process exits.
- •Voice ID is required: calling .save() without .voice() throws a ValidationError.
- •Use the correct voice IDs for the selected provider: ElevenLabs, Resemble, and Cartesia voice IDs are not interchangeable.
- •Filenames are auto-hashed: .say(text) generates a deterministic filename from the text + config. The same input produces the same filename (useful for caching). Use .file('name') only when an explicit filename is needed.
- •Provider methods are provider-scoped: .prompt() only works with Resemble; .v3() only works with ElevenLabs. Calling them on the wrong provider is a no-op (no error thrown).
- •Batch processing uses an internal queue (batch size 3): multiple .save() calls are automatically batched and processed concurrently.
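Because provider-scoped methods fail silently on the wrong provider, it can help to make the mismatch loud. The sketch below is one possible guard, not part of VoiceMix: configureProvider, its provider labels, and the useV3 option are hypothetical names, and the only VoiceMix methods it calls are the ones documented above (useElevenLabs, useResemble, useCartesia, v3, prompt).

```javascript
// Hypothetical helper (not part of VoiceMix): selects a provider on a VoiceMix
// instance and rejects options that the chosen provider would silently ignore.
function configureProvider(vm, provider, { prompt, useV3 = false } = {}) {
  if (prompt !== undefined && provider !== 'resemble') {
    throw new Error('.prompt() is only honored by Resemble');
  }
  if (useV3 && provider !== 'elevenlabs') {
    throw new Error('.v3() is only honored by ElevenLabs');
  }
  switch (provider) {
    case 'elevenlabs':
      vm.useElevenLabs();
      if (useV3) vm.v3(); // ElevenLabs-only model selector
      break;
    case 'resemble':
      vm.useResemble();
      if (prompt) vm.prompt(prompt); // Resemble-only style prompt
      break;
    case 'cartesia':
      vm.useCartesia();
      break;
    default:
      throw new Error(`Unknown provider label: ${provider}`);
  }
  return vm; // returned so the caller can continue with .voice().say().save()
}
```

A call such as configureProvider(new VoiceMix(), 'resemble', { prompt: 'Friendly tone' }) would then continue with .voice(...).say(...).save(). Throwing is a deliberate choice in this sketch; logging a warning would be a gentler alternative.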
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| ProviderError: ElevenLabs API key is required | Missing env var | Set ELEVENLABS_API_KEY in .env and call dotenv.config() |
| ProviderError: Cartesia API key is required | Missing env var | Set CARTESIA_API_KEY in .env |
| ValidationError: Voice ID is required | .voice() not called | Chain .voice('id') before .save() |
| 401 / 403 from provider API | Invalid or expired key | Verify the API key in provider dashboard |
| Files not appearing | save() not awaited | Add await before .save() |
| Wrong provider voice ID | Mixing IDs across providers | Use voice IDs from the active provider's dashboard |
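The message-based rows of the table can be turned into a small lookup for use inside a catch block. This is a sketch only: the message fragments and fixes are copied from the table, suggestFix is a hypothetical helper, and matching on error.message is an assumption about how VoiceMix surfaces these errors.

```javascript
// Message fragments and fixes are taken from the troubleshooting table.
// Matching on error.message is an assumption, not documented VoiceMix behavior.
const KNOWN_FIXES = [
  ['ElevenLabs API key is required', 'Set ELEVENLABS_API_KEY in .env and call dotenv.config()'],
  ['Cartesia API key is required', 'Set CARTESIA_API_KEY in .env'],
  ['Voice ID is required', "Chain .voice('id') before .save()"],
];

// Hypothetical helper: returns the suggested fix for a known error, else null.
function suggestFix(error) {
  const message = String(error && error.message ? error.message : error);
  const match = KNOWN_FIXES.find(([fragment]) => message.includes(fragment));
  return match ? match[1] : null;
}

console.log(suggestFix(new Error('Voice ID is required')));
```

Wrapping await vm...save() in try/catch and logging suggestFix(err) alongside the original error keeps the hint next to the failure. Errors the table does not cover return null and should be reported as-is.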