AgentSkillsCN

voicemode

为Claude Code提供语音交互支持。当用户提及语音模式、语音输入、对话交流、语音状态或语音故障排查时,可调用此技能。

SKILL.md
--- frontmatter
name: voicemode
description: Voice interaction for Claude Code. Use when users mention voice mode, speak, talk, converse, voice status, or voice troubleshooting.

First-Time Setup

If VoiceMode isn't working or MCP fails to connect, run:

code
/voicemode:install

After install, reconnect MCP: /mcp → select voicemode → "Reconnect" (or restart Claude Code).


VoiceMode

Natural voice conversations with Claude Code using speech-to-text (STT) and text-to-speech (TTS).

Note: The Python package is voice-mode (hyphen), but the CLI command is voicemode (no hyphen).

When to Use MCP vs CLI

TaskUseWhy
Voice conversationsMCP voicemode:converseFaster - server already running
Service start/stopMCP voicemode:serviceWorks within Claude Code
InstallationCLI voice-mode-installOne-time setup
ConfigurationCLI voicemode configEdit settings directly
DiagnosticsCLI voicemode diagAdministrative tasks

Usage

Use the converse MCP tool to speak to users and hear their responses:

python
# Speak and listen for response (most common usage)
voicemode:converse("Hello! What would you like to work on?")

# Speak without waiting (for narration while working)
voicemode:converse("Searching the codebase now...", wait_for_response=False)

For most conversations, just pass your message - defaults handle everything else.

ParameterDefaultDescription
messagerequiredText to speak
wait_for_responsetrueListen after speaking
voiceautoTTS voice

For all parameters, see Converse Parameters.

Best Practices

  1. Narrate without waiting - Use wait_for_response=False when announcing actions
  2. One question at a time - Don't bundle multiple questions in voice mode
  3. Check status first - Verify services are running before starting conversations
  4. Let VoiceMode auto-select - Don't hardcode providers unless user has preference
  5. First run is slow - Model downloads happen on first start (2-5 min), then instant

Handling Pauses and Wait Requests

When the user asks you to wait or give them time:

Short pauses (up to 60 seconds): If the user says something ending with "wait" (e.g., "hang on", "give me a sec", "wait"), VoiceMode automatically pauses for 60 seconds then resumes listening. This is built-in.

Longer pauses (2+ minutes): Use bash sleep N where N is seconds. For example, if the user says "give me 5 minutes":

bash
sleep 300  # Wait 5 minutes

Then call converse again when the wait is over:

python
voicemode:converse("Five minutes is up. Ready when you are.")

Configuration: The short pause duration is configurable via VOICEMODE_WAIT_DURATION (default: 60 seconds).

Check Status

bash
voicemode service status          # All services
voicemode service status whisper  # Specific service

Shows service status including running state, ports, and health.

Installation

bash
# Install VoiceMode CLI and configure services
uvx voice-mode-install --yes

# Install local services (Apple Silicon recommended)
voicemode service install whisper
voicemode service install kokoro

See Getting Started for detailed steps.

Service Management

python
# Start/stop services
voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")

# View logs for troubleshooting
voicemode:service("whisper", "logs", lines=50)
ServicePortPurpose
whisper2022Speech-to-text
kokoro8880Text-to-speech
voicemode8765HTTP/SSE server

Actions: status, start, stop, restart, logs, enable, disable

Configuration

bash
voicemode config list                           # Show all settings
voicemode config set VOICEMODE_TTS_VOICE nova   # Set default voice
voicemode config edit                           # Edit config file

Config file: ~/.voicemode/voicemode.env

See Configuration Guide for all options.

DJ Mode

Background music during VoiceMode sessions with track-level control.

bash
# Core playback
voicemode dj play /path/to/music.mp3  # Play a file or URL
voicemode dj status                    # What's playing
voicemode dj pause                     # Pause playback
voicemode dj resume                    # Resume playback
voicemode dj stop                      # Stop playback

# Navigation and volume
voicemode dj next                      # Skip to next chapter
voicemode dj prev                      # Go to previous chapter
voicemode dj volume 30                 # Set volume to 30%

# Music For Programming
voicemode dj mfp list                  # List available episodes
voicemode dj mfp play 49               # Play episode 49
voicemode dj mfp sync                  # Convert CUE files to chapters

# Music library
voicemode dj find "daft punk"          # Search library
voicemode dj library scan              # Index ~/Audio/music
voicemode dj library stats             # Show library info

# Play history and favorites
voicemode dj history                   # Show recent plays
voicemode dj favorite                  # Toggle favorite on current track

Configuration: Set VOICEMODE_DJ_VOLUME in ~/.voicemode/voicemode.env to customize startup volume (default: 50%).

CLI Cheat Sheet

bash
# Service management
voicemode service status            # All services
voicemode service start whisper     # Start a service
voicemode service logs kokoro       # View logs

# Diagnostics
voicemode deps                      # Check dependencies
voicemode diag info                 # System info
voicemode diag devices              # Audio devices

# History search
voicemode history search "keyword"
voicemode history play <exchange_id>

# DJ Mode
voicemode dj play <file|url>        # Start playback
voicemode dj status                 # What's playing
voicemode dj next/prev              # Navigate chapters
voicemode dj stop                   # Stop playback
voicemode dj mfp play 49            # Music For Programming

Voice Handoff Between Agents

Transfer voice conversations between Claude Code agents for multi-agent workflows.

Use cases:

  • Personal assistant routing to project-specific foremen
  • Foremen delegating to workers for focused tasks
  • Returning control when work is complete

Quick Reference

python
# 1. Announce the transfer
voicemode:converse("Transferring you to a project agent.", wait_for_response=False)

# 2. Spawn with voice instructions (mechanism depends on your setup)
spawn_agent(path="/path", prompt="Load voicemode skill, use converse to greet user")

# 3. Go quiet - let new agent take over

Hand-back:

python
voicemode:converse("Transferring you back to the assistant.", wait_for_response=False)
# Stop conversing, exit or go idle

Key Principles

  1. Announce transfers: Always tell the user before transferring
  2. One speaker: Only one agent should use converse at a time
  3. Distinct voices: Different voices make handoffs audible
  4. Provide context: Tell receiving agent why user is being transferred

Detailed Documentation

See Call Routing for comprehensive guides:

Documentation Index

TopicLink
Converse ParametersAll Parameters
InstallationGetting Started
ConfigurationConfiguration Guide
Claude Code PluginPlugin Guide
Whisper STTWhisper Setup
Kokoro TTSKokoro Setup
PronunciationPronunciation Guide
TroubleshootingTroubleshooting
CLI ReferenceCLI Docs
DJ ModeBackground Music

Related Skills

  • VoiceMode Connect - Remote voice via mobile/web clients (no local STT/TTS needed)