Voice Transcription Skill

Transcribe audio files (voice memos, recordings, podcasts) to text using OpenAI's Whisper API or local transcription tools.

Purpose

This skill enables PCP to:

•Transcribe voice memos sent via Discord
•Convert audio recordings to searchable text
•Process podcast clips or meeting recordings
•Handle any audio-to-text conversion

When to Use

•User sends a voice memo or audio file
•User asks to "transcribe this"
•Any audio file needs to be converted to text
•Voice notes need to be captured in the vault

Supported Formats

OpenAI Whisper supports: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, flac

For other formats (like opus), ffmpeg will convert automatically.

How to Execute

Step 1: Identify the Audio File

Check if the file exists and identify its format:

bash

file /path/to/audio.file

Step 2: Convert if Necessary

If the format isn't directly supported, convert with ffmpeg:

bash

# Convert to WAV (most compatible)
ffmpeg -i input.opus -ar 16000 -ac 1 -y output.wav

# Common conversions:
ffmpeg -i input.ogg -y output.mp3
ffmpeg -i input.webm -y output.mp3

Step 3: Transcribe Using Helper Script

Use the bundled transcribe.py script:

bash

python /workspace/skills/voice-transcription/transcribe.py /path/to/audio.wav

Or in Python:

python

from pathlib import Path
import sys
sys.path.insert(0, "/workspace/skills/voice-transcription")
from transcribe import transcribe_audio

result = transcribe_audio("/path/to/audio.wav")
if result["success"]:
    print(result["text"])
else:
    print(f"Error: {result['error']}")

Step 4: Return Results

Present the transcription to the user in a readable format. Optionally capture it to the vault:

python

from vault_v2 import smart_capture

# Capture the transcription
result = smart_capture(
    f"Voice memo transcription: {transcription_text}",
    capture_type="note"
)

Fallback: Local Transcription

If OPENAI_API_KEY is not available but local whisper CLI is installed:

bash

# Check if whisper is installed
which whisper

# Use local whisper
whisper /path/to/audio.wav --model base --output_format txt

Configuration

Configure in /workspace/config/pcp.yaml:

yaml

skills:
  entries:
    voice-transcription:
      enabled: true
      model: whisper-1           # OpenAI model
      language: en               # Default language (optional, auto-detect if not set)
      fallback_to_local: true    # Use local whisper if API fails
      max_file_size_mb: 25       # Maximum file size to process

Error Handling

Error	Solution
"File too large"	Split audio into smaller chunks
"Unsupported format"	Convert with ffmpeg first
"API rate limit"	Wait and retry, or use local fallback
"No API key"	Set OPENAI_API_KEY or use local whisper

Example Workflow

User sends voice memo via Discord:

•Discord attachment saved to /tmp/discord_attachments/voice-memo.ogg
•PCP detects audio file, activates voice-transcription skill
•Convert to supported format: ffmpeg -i voice-memo.ogg -y voice-memo.mp3
•Transcribe: transcribe_audio("voice-memo.mp3")
•Return text to user: "Here's what you said: ..."
•Optionally capture to vault

Related Skills

•/vault-operations - Capture transcriptions
•/email-processing - Transcribe voice messages from emails
•/task-delegation - Delegate long audio files to background processing