AgentSkillsCN

whisper-transcribe

利用 OpenAI Whisper 将音频与视频文件转录为文本(本地 CLI 或 API 方式)

SKILL.md
--- frontmatter
name: whisper-transcribe
version: 1.0.0
description: Transcribe audio and video files to text using OpenAI Whisper (local CLI or API)
author: Code Buddy
tags: whisper, transcribe, speech-to-text, stt, audio

Whisper Transcription

Overview

Transcribe audio/video files to text using OpenAI Whisper — either locally (free, no API key) or via the API.

Local Whisper (no API key)

Install

bash
pip install openai-whisper
# or with conda:
conda install -c conda-forge openai-whisper

Transcribe

bash
whisper audio.mp3 --model medium --output_format txt
whisper audio.m4a --model small --language fr --output_format srt
whisper meeting.wav --model large --output_format json

Models (speed vs accuracy)

ModelSizeSpeedAccuracyUse When
tiny39MFastestLowQuick drafts, short clips
base74MFastFairSimple audio, clear speech
small244MMediumGoodGeneral use
medium769MSlowGreatMeetings, accented speech
large1.5GSlowestBestCritical accuracy needed

Models download to ~/.cache/whisper on first use.

Output Formats

FormatExtensionUse
txt.txtPlain text
srt.srtSubtitles with timestamps
vtt.vttWeb subtitles
json.jsonProgrammatic access
tsv.tsvSpreadsheet import

Translation (to English)

bash
whisper audio_french.mp3 --task translate --model medium

API Whisper (requires OPENAI_API_KEY)

bash
curl -s https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file="@audio.mp3" \
  -F model="whisper-1" \
  -F response_format="text"

Max file size: 25MB. For larger files, split with ffmpeg:

bash
ffmpeg -i long_audio.mp3 -f segment -segment_time 600 -c copy chunk_%03d.mp3

Tips

  • Extract audio from video: ffmpeg -i video.mp4 -vn -acodec libmp3lame audio.mp3
  • Reduce file size: ffmpeg -i input.wav -ar 16000 -ac 1 output.mp3
  • For meetings: use medium or large model with --language flag
  • Combine chunks: concatenate text output files