
local-whisper

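Local speech-to-text powered by OpenAI Whisper. Runs fully offline once the model has been downloaded, supports multiple model sizes, and produces high-quality transcriptions. Use it to transcribe audio files and voice memos, or to extract text from video or audio.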

SKILL.md
---
name: local-whisper
description: Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes. Use when transcribing audio files, voice notes, or extracting text from video/audio.
metadata: { "clawdbot": { "emoji": "🎙️", "requires": { "bins": ["ffmpeg"] } } }
---

Local Whisper STT

Local speech-to-text using OpenAI's Whisper. Fully offline after initial model download.

Usage

```bash
# Basic transcription (uses the default base model)
python3 /app/skills/local-whisper/scripts/transcribe.py audio.wav

# Larger, more accurate model
python3 /app/skills/local-whisper/scripts/transcribe.py audio.wav --model turbo

# Timestamps, written as JSON
python3 /app/skills/local-whisper/scripts/transcribe.py audio.wav --timestamps --output_format json

# SRT subtitles with the language set explicitly
python3 /app/skills/local-whisper/scripts/transcribe.py audio.wav --model turbo --language English --output_format srt
```
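An SRT file is a plain-text sequence of numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line. As a rough illustration of what `--output_format srt` produces, the sketch below converts Whisper-style segments (dicts with `start` and `end` in seconds plus `text`) into SRT. `srt_timestamp` and `segments_to_srt` are hypothetical helpers, not part of `transcribe.py`, and the script's real writer may differ in details such as whitespace handling.

```python
from collections.abc import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render Whisper-style segments as numbered SRT cues (hypothetical helper)."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Whisper segment text usually begins with a space, so strip it.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 3661.25, "text": " This is the second cue."},
    ]
    print(segments_to_srt(demo))
```

The same segment list could feed VTT or TSV writers; only the timestamp format and separators change.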

Models

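| Model | Parameters | VRAM required | Notes |
|---|---|---|---|
| `tiny` | 39M | ~1 GB | Fastest |
| `base` | 74M | ~1 GB | Default |
| `small` | 244M | ~2 GB | Good balance |
| `turbo` | 809M | ~6 GB | Best speed/quality trade-off |
| `large-v3` | 1550M | ~10 GB | Maximum accuracy |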
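To pick a model automatically, one approach is to take the largest entry in the table above whose approximate VRAM requirement fits the memory you have. The sketch below hard-codes the table's approximate figures; the `choose_model` helper and its fallback to `tiny` are hypothetical, not part of the skill, and the VRAM numbers are rough guides rather than hard limits.

```python
# Approximate VRAM needs (GB) from the Models table, smallest to largest.
MODEL_VRAM_GB = [
    ("tiny", 1.0),
    ("base", 1.0),
    ("small", 2.0),
    ("turbo", 6.0),
    ("large-v3", 10.0),
]


def choose_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits the budget.

    Hypothetical helper: falls back to "tiny" when nothing fits, since
    Whisper can also run (slowly) on CPU without a GPU.
    """
    chosen = "tiny"
    # Requirements are non-decreasing, so the last fit is the largest model.
    for name, need in MODEL_VRAM_GB:
        if need <= available_vram_gb:
            chosen = name
    return chosen


if __name__ == "__main__":
    for budget in (0.5, 4, 8, 12):
        print(budget, "GB ->", choose_model(budget))
```

The returned name can then be passed to `--model`.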

Options

| Option | Description |
|---|---|
| `--model`, `-m` | Model size (default: `base`) |
| `--language`, `-l` | Language code (auto-detected if omitted) |
| `--task` | Task type: `transcribe` (default) or `translate` (translate to English) |
| `--output_format`, `-f` | Output format: `srt`, `vtt`, `txt`, `json`, `tsv`, or `all` |
| `--word_timestamps` | Add precise timing for every word |
| `--output_dir`, `-o` | Directory to save output files |
| `--timestamps`, `-t` | Include word timestamps in output |
| `--quiet`, `-q` | Suppress progress output |
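The script's source isn't shown here, but assuming it wraps the `openai-whisper` Python package, the options above map fairly directly onto `whisper.load_model()` and the model's `transcribe()` method, which accepts `language`, `task`, `word_timestamps`, and `verbose` keyword arguments. The sketch below is a hypothetical illustration of that mapping (`build_transcribe_kwargs` and `run` are made-up names), not the script's actual code. `whisper` is imported inside `run()` so the option mapping can be exercised without the package installed.

```python
from typing import Optional


def build_transcribe_kwargs(
    language: Optional[str] = None,
    task: str = "transcribe",
    word_timestamps: bool = False,
    quiet: bool = False,
) -> dict:
    """Translate CLI-style options into keyword arguments for whisper's transcribe()."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    kwargs = {
        "task": task,
        "word_timestamps": word_timestamps,
        # verbose=None prints nothing; verbose=False shows only a progress bar.
        "verbose": None if quiet else False,
    }
    # Omitting language lets Whisper detect it from the first 30 seconds of audio.
    if language is not None:
        kwargs["language"] = language
    return kwargs


def run(audio_path: str, model_name: str = "base", **options) -> dict:
    """Load a Whisper model and transcribe one file (needs openai-whisper and ffmpeg)."""
    import whisper  # imported lazily; the model is downloaded on first use

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, **build_transcribe_kwargs(**options))
```

The dict returned by `transcribe()` holds the full `text`, the detected `language`, and a list of `segments`, which is the raw material for every output format in the table.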

Setup

Requires ffmpeg and Python dependencies:

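```bash
pip install click openai-whisper torch
```

For a CPU-only install (smaller download), install PyTorch from its CPU wheel index first, then the remaining packages from PyPI. `--index-url` replaces PyPI as the package source, so putting all three packages in one command fails for packages the PyTorch index does not host, such as `openai-whisper`:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install click openai-whisper
```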
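Before the first run it can help to confirm the prerequisites are in place. The sketch below uses the standard library plus an optional `torch` import: `shutil.which` checks that `ffmpeg` is on `PATH`, and `torch.cuda.is_available()` reports whether a GPU is usable (a CPU-only install reports `False`). The `check_environment` name and the shape of its result are hypothetical.

```python
import shutil


def check_environment() -> dict:
    """Report whether the prerequisites for local Whisper transcription look available."""
    report = {
        # Whisper runs ffmpeg as a subprocess to decode audio, so it must be on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "torch": False,
        "cuda": False,
    }
    try:
        import torch
    except ImportError:
        return report
    report["torch"] = True
    # False on CPU-only builds or machines without a supported GPU.
    report["cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```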