ElevenLabs Voice Design Prompting
Expert guidance for creating AI-generated voices using ElevenLabs Voice Design.
Quick Start Decision Tree
| Goal | Approach |
|---|---|
| Generic narrator | Short prompt: "A calm male narrator" |
| Specific character | Detailed prompt with age, accent, tone, pacing, emotion |
| High audio quality | Add "perfect audio quality" or "studio-quality recording" |
| Stylized/lo-fi audio | Add "low-fidelity audio" or "sounds like a voicemail" |
| Strong accent | Use "thick" (not "strong"): "thick French accent" |
| Subtle accent | Use "slight": "slight Southern drawl" |
Prompt Structure Template
Build prompts by combining these elements:
[Audio Quality] + [Age] + [Gender] + [Accent] + [Tone/Timbre] + [Pacing] + [Character/Emotion]
Example:
"Perfect audio quality. A man in his 40s with a thick British accent. His voice is deep and warm, speaking at a natural conversational pace. He sounds confident and approachable."
Phrasing Experimentation
How you phrase descriptors matters. The same concept written differently can produce noticeably different results:
| Phrasing A | Phrasing B | Notes |
|---|---|---|
| "Perfect audio quality" | "The audio quality is perfect" | May produce different tonal qualities |
| "Speaking quickly" | "A fast pace" | Affects rhythm differently |
| "Deep voice" | "His voice is deep" | Contextual vs standalone descriptor |
| "Thick accent" | "A very pronounced accent" | Intensity perception varies |
Best Practice: When iterating on a voice, try rephrasing key descriptors rather than just adding more details. Small wording changes can unlock the exact voice you're looking for.
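One way to iterate on phrasing systematically is to generate every combination of alternative wordings and try each in turn. This is only a sketch: the `phrasing_variants` helper and the placeholder names `quality` and `depth` are hypothetical, and the alternatives are taken from the table above.

```python
from itertools import product


def phrasing_variants(template: str, options: dict[str, list[str]]) -> list[str]:
    """Fill each {placeholder} in the template with every combination of phrasings."""
    keys = list(options)
    variants = []
    for combo in product(*(options[key] for key in keys)):
        variants.append(template.format(**dict(zip(keys, combo))))
    return variants


variants = phrasing_variants(
    "{quality} A man in his 40s. {depth}",
    {
        "quality": ["Perfect audio quality.", "The audio quality is perfect."],
        "depth": ["Deep voice.", "His voice is deep."],
    },
)
for variant in variants:
    print(variant)
```

Two alternatives for each of two descriptors yield four prompts to compare, which keeps the experiment small enough to evaluate by ear.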
Core Attributes Reference
Age Descriptors
| Descriptor | Effect |
|---|---|
| Adolescent | Youthful, higher energy |
| Young adult / in their 20s | Fresh, vibrant |
| Middle-aged / in their 40s | Mature, experienced |
| Elderly / in their 80s | Weathered, wise |
Tone/Timbre Descriptors
| Category | Options |
|---|---|
| Depth | Deep, low-pitched, booming, resonant |
| Texture | Smooth, gravelly, raspy, breathy, airy |
| Quality | Warm, mellow, rich, buttery |
| Edge | Nasally, shrill, harsh, tinny, metallic |
| Special | Ethereal, robotic, throaty |
Pacing Descriptors
| Speed | Descriptors |
|---|---|
| Fast | Speaking quickly, fast-paced, hurried cadence, staccato |
| Normal | Normal pace, conversational, relaxed pacing |
| Slow | Speaking slowly, deliberate, measured, drawn out |
| Variable | Erratic pacing, rhythmic, musical |
Accent Guidance
Use "thick" for prominent accents and "slight" for subtle ones:
- "A middle-aged man with a thick French accent"
- "A young woman with a slight Southern drawl"
- "An old man with a heavy Eastern European accent"
Avoid: "foreign", "exotic" (too vague)
For fantasy characters, reference real accents:
- "An elf with a proper thick British accent. He is regal and lyrical."
- "A goblin with a raspy Eastern European accent."
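The accent wording rules can be captured in a small helper so prompts stay consistent. This is a sketch under stated assumptions: the function name `accent_phrase` and the `"prominent"`/`"subtle"` intensity labels are hypothetical, while the "thick"/"slight" wording and the vague terms to reject come from the guidance above.

```python
VAGUE_ACCENT_TERMS = {"foreign", "exotic"}  # flagged above as too vague
INTENSITY_WORDS = {"prominent": "thick", "subtle": "slight"}


def accent_phrase(accent: str, intensity: str = "prominent") -> str:
    """Return an accent fragment such as 'a thick French accent'."""
    accent = accent.strip()
    if accent.lower() in VAGUE_ACCENT_TERMS:
        raise ValueError(f"'{accent}' is too vague; name a real accent instead")
    if intensity not in INTENSITY_WORDS:
        raise ValueError(f"intensity must be one of {sorted(INTENSITY_WORDS)}")
    return f"a {INTENSITY_WORDS[intensity]} {accent} accent"


print("A middle-aged man with " + accent_phrase("French") + ".")
print("A goblin with " + accent_phrase("Eastern European", "subtle") + ".")
```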
Technical Parameters
Guidance Scale Settings
| Scenario | Guidance Scale | Notes |
|---|---|---|
| Accent/tone accuracy critical | 35-40% | Higher adherence to prompt |
| Balanced quality + accuracy | 25-30% | Good middle ground |
| Performance quality priority | 15-25% | More creative freedom |
| Very niche/specific prompts | Lower (20%) | Prevents audio artifacts |
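The table can be turned into a starting-point lookup. The sketch below is an illustration only: `suggest_guidance`, the priority labels, and the choice of the range midpoint are assumptions, and capping niche prompts at 20% is one reading of the last table row. Treat the result as a first value to try, then adjust by ear.

```python
# Percent ranges copied from the table above; the keys are hypothetical labels.
GUIDANCE_RANGES = {
    "accuracy": (35, 40),     # accent/tone accuracy critical
    "balanced": (25, 30),     # balanced quality + accuracy
    "performance": (15, 25),  # performance quality priority
}


def suggest_guidance(priority: str, niche_prompt: bool = False) -> int:
    """Return a starting guidance scale in percent for the given priority."""
    if priority not in GUIDANCE_RANGES:
        raise ValueError(f"priority must be one of {sorted(GUIDANCE_RANGES)}")
    low, high = GUIDANCE_RANGES[priority]
    value = (low + high) // 2  # midpoint of the range, rounded down
    if niche_prompt:
        value = min(value, 20)  # assumed cap for very niche prompts
    return value


print(suggest_guidance("accuracy"))
print(suggest_guidance("accuracy", niche_prompt=True))
```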
Loudness Control
Loudness controls the volume of generated previews and of the saved voice.
| Setting | Use Case |
|---|---|
| Higher loudness | Energetic voices, announcers, shouting characters |
| Default/medium | Most conversational voices |
| Lower loudness | Soft-spoken characters, whispers, intimate narration |
Tip: Adjust loudness to match the character's energy level. A drill sergeant should be louder than a meditation guide.
Preview Text Best Practices
- Match emotional tone: preview text should complement the voice description
- Use longer text: full sentences or paragraphs produce more stable results
- Avoid contradictions: don't use aggressive text for a calm voice description
Bad pairing:
Voice: "calm and reflective younger female voice" Preview: "Hey! I can't stand what you've done!!!"
Good pairing:
Voice: "calm and reflective younger female voice" Preview: "It's been quiet lately... I've had time to think, and maybe that's what I needed most."
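A rough pre-flight check can catch the most obvious mismatches before generating a preview. The heuristic below is an assumption for illustration, not an ElevenLabs rule: the `CALM_WORDS` list, the 12-word minimum, and the use of exclamation marks as a sign of aggressive text are all hypothetical choices.

```python
import re

# Hypothetical list of descriptors that signal a calm voice.
CALM_WORDS = {"calm", "reflective", "soft-spoken", "gentle", "soothing"}


def preview_warnings(voice_description: str, preview_text: str,
                     min_words: int = 12) -> list[str]:
    """Return human-readable warnings for a voice description/preview pairing."""
    warnings = []
    if len(preview_text.split()) < min_words:
        warnings.append("preview text is short; longer text tends to be more stable")
    described = set(re.findall(r"[a-z\-]+", voice_description.lower()))
    if described & CALM_WORDS and "!" in preview_text:
        warnings.append("calm voice description paired with exclamatory preview text")
    return warnings


voice = "calm and reflective younger female voice"
bad = preview_warnings(voice, "Hey! I can't stand what you've done!!!")
good = preview_warnings(
    voice,
    "It's been quiet lately... I've had time to think, "
    "and maybe that's what I needed most.",
)
print(bad)
print(good)
```

With these assumed thresholds, the bad pairing above triggers both warnings and the good pairing triggers none.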
Special Effects in Preview Text
Use these in preview text for expressive delivery:
- `[laughs]` - Laughter
- `[sighs]` - Sighs
- `[exhales]` - Exhale
- `[lip smacks]` - Lip smack
- `(maniacal laughter)` - Parenthetical actions
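When reviewing preview text, it can help to list the delivery cues it contains. The sketch below uses Python's standard `re` module; the function name `find_delivery_cues` is hypothetical, and the pattern treats every bracketed or parenthesized span as a cue, so an ordinary parenthetical aside would also be reported.

```python
import re

# Matches [square-bracket] cues and (parenthetical) cues, without nesting.
CUE_PATTERN = re.compile(r"\[([^\[\]]+)\]|\(([^()]+)\)")


def find_delivery_cues(preview_text: str) -> list[str]:
    """Return the text of each bracketed or parenthetical cue, in order."""
    return [m.group(1) or m.group(2) for m in CUE_PATTERN.finditer(preview_text)]


cues = find_delivery_cues(
    "[sighs] I told you not to open that door. (maniacal laughter) Too late now. [laughs]"
)
print(cues)
```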
Common Voice Archetypes
| Archetype | Key Prompt Elements |
|---|---|
| Sports Commentator | High-energy, thick accent, quick pace, enthusiastic |
| Drill Sergeant | Angry, fast pace, shouting, authoritative |
| Movie Trailer | Dramatic, builds anticipation, deep, resonant |
| Friendly Narrator | Warm, conversational pace, approachable |
| Evil Villain | Deep, resonant, slow, menacing |
| Cute Character | Squeaky, high-pitched, playful |
Detailed Reference
For complete attribute tables, example prompts with preview text, and advanced techniques, read:
- `references/voice-attributes.md` - Complete attribute reference with all descriptors
- `references/example-prompts.md` - Full example prompts with preview text and guidance scales
Key Reminders
- More detail gives better accuracy for specific characters
- Simple prompts work for generic or neutral voices
- Prefer "thick" over "strong" for accent prominence
- Preview text matters: match it to your voice description
- Longer preview text gives more stable voice generation
- Guidance scale is a tradeoff: higher values follow the prompt more closely but can reduce audio quality
- Experiment with phrasing: the same concept in different words can produce different results
- Adjust loudness to match the character's energy level