AgentSkillsCN

transcribing-audio

Use this tool whenever you need to transcribe audio files to text. Local speech-to-text (STT) transcription based on Parakeet MLX on Apple Silicon: fast, private, and usable offline. Triggered by phrases such as "transcribe audio", "convert audio to text", "speech to text", "STT", "transcription", "get text from audio", "audio file transcription", "voice to text", "extract text from recording", "transcribe podcast", "transcribe meeting", and "transcribe voice memo".

SKILL.md
--- frontmatter
name: transcribing-audio
description: "MUST be used when you need to transcribe audio files to text. Local speech-to-text (STT) transcription using Parakeet MLX on Apple Silicon - fast, private, offline. Triggers on: transcribe audio, convert audio to text, speech to text, STT, transcription, get text from audio, audio file transcription, voice to text, extract text from recording, transcribe podcast, transcribe meeting, transcribe voice memo."

Transcribing Audio

Use this skill to transcribe audio files to text with Parakeet MLX, an MLX port of NVIDIA's Parakeet ASR models that runs locally on Apple Silicon.

When to Use

Use this skill when the user wants to:

  • Transcribe an audio file (MP3, WAV, M4A, etc.)
  • Convert speech to text
  • Get a text transcript from a recording
  • Extract text from a podcast, meeting, or voice memo
  • Generate subtitles (SRT/VTT) from audio

Prerequisites

The parakeet-mlx command is available in PATH via ~/.local/bin/parakeet-mlx.

bash
parakeet-mlx <audio_file> [options]
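
A quick preflight check can confirm this prerequisite before any transcription is attempted. The following is a minimal sketch: the function name `require_cmd` is hypothetical, and it relies only on the POSIX `command -v` builtin.

```shell
#!/bin/sh
# require_cmd NAME: succeed if NAME resolves on PATH; otherwise print a hint
# to stderr and return 1. The function name is illustrative only.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    printf '%s not found on PATH (check ~/.local/bin)\n' "$1" >&2
    return 1
}

# Usage: stop early if the CLI is missing.
# require_cmd parakeet-mlx || exit 1
```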

Basic Usage

bash
# Simple transcription (outputs .srt file)
parakeet-mlx /path/to/audio.mp3

# Output as plain text
parakeet-mlx /path/to/audio.mp3 --output-format txt

# Output as JSON with timestamps
parakeet-mlx /path/to/audio.mp3 --output-format json

# All formats (txt, srt, vtt, json)
parakeet-mlx /path/to/audio.mp3 --output-format all

# With word-level timestamps in subtitles
parakeet-mlx /path/to/audio.mp3 --output-format vtt --highlight-words

# Specify output directory
parakeet-mlx /path/to/audio.mp3 --output-dir /path/to/output

Output Formats

| Format | Description | Use Case |
| --- | --- | --- |
| txt | Plain text, no timestamps | Reading, copying, searching |
| srt | SubRip subtitles with timestamps | Video subtitles, editing |
| vtt | WebVTT subtitles with timestamps | Web video, HTML5 |
| json | Structured data with full timestamps | Programmatic use, analysis |
| all | All of the above | When you need everything |
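
When the format value comes from user input, it can be checked against the five values listed above before the CLI is invoked. This is a small sketch, not part of parakeet-mlx itself; the helper name `is_output_format` is hypothetical.

```shell
#!/bin/sh
# is_output_format VALUE: succeed only for the formats documented above.
# Hypothetical helper; the CLI performs its own validation as well.
is_output_format() {
    case "$1" in
        txt|srt|vtt|json|all) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: fall back to plain text when the requested format is unknown.
# is_output_format "$fmt" || fmt=txt
```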

Options

| Option | Default | Description |
| --- | --- | --- |
| --output-format | srt | Output format (txt/srt/vtt/json/all) |
| --output-dir | . | Directory for output files |
| --highlight-words | false | Word-level timestamps in SRT/VTT |
| --verbose / -v | false | Show detailed progress |
| --chunk-duration | 120 | Chunk duration in seconds for long audio |

Model

Uses mlx-community/parakeet-tdt-0.6b-v3 by default, a 600M-parameter model that runs efficiently on Apple Silicon with excellent accuracy.

Examples

Transcribe a voice memo

bash
parakeet-mlx ~/Desktop/voice-memo.m4a --output-format txt

Generate subtitles for a video

bash
parakeet-mlx video.mp4 --output-format srt --highlight-words

Transcribe multiple files

bash
parakeet-mlx file1.mp3 file2.mp3 file3.mp3 --output-format txt
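
The single invocation above passes every file at once. Where a per-file failure report is wanted, a loop is one alternative. The sketch below uses only the `--output-format` and `--output-dir` flags from the Options table; the function name `transcribe_each` and the `TRANSCRIBE_CMD` override are hypothetical names introduced for this example, not part of parakeet-mlx.

```shell
#!/bin/sh
# transcribe_each FORMAT OUTDIR FILE...: run the transcriber once per file so
# that one bad input does not stop the rest. Failed files are reported on
# stderr; the function succeeds only if every file succeeded.
# TRANSCRIBE_CMD is a hypothetical override of this sketch (handy for testing);
# it defaults to the parakeet-mlx command. Variables are global (POSIX sh).
transcribe_each() {
    format=$1
    outdir=$2
    shift 2
    failures=0
    mkdir -p "$outdir" || return 1
    for audio in "$@"; do
        if ! "${TRANSCRIBE_CMD:-parakeet-mlx}" "$audio" \
                --output-format "$format" --output-dir "$outdir"; then
            printf 'failed: %s\n' "$audio" >&2
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}

# Usage:
# transcribe_each txt ./transcripts file1.mp3 file2.mp3 file3.mp3
```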

Reading the Output

After transcription, read the output file to show the user. Output files are written to --output-dir (the current directory by default):

bash
cat ./audio.txt  # Text output for audio.mp3

For JSON output, you can parse the timestamps:

bash
jq '.sentences[] | {text, start, end}' ./audio.json
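
Locating the output file can also be scripted. The sketch below assumes that each output is named after the input's base name with the format as its extension (as the audio.mp3 to audio.txt examples suggest) and that it lands in --output-dir, which defaults to the current directory. The helper name `transcript_path` is hypothetical.

```shell
#!/bin/sh
# transcript_path AUDIO FORMAT [OUTDIR]: print the path where the transcript
# is expected. Assumption of this sketch: <OUTDIR>/<input stem>.<FORMAT>,
# with OUTDIR defaulting to "." as in the Options table.
transcript_path() {
    audio=$1
    format=$2
    outdir=${3:-.}
    name=$(basename "$audio")
    stem=${name%.*}   # strip the last extension only
    printf '%s/%s.%s\n' "$outdir" "$stem" "$format"
}

# Usage:
# cat "$(transcript_path ~/Desktop/voice-memo.m4a txt)"
```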

Notes

  • First run may download the model (~600MB), which is then cached for future use
  • Runs entirely locally - no API calls, fully private
  • Supports WAV, MP3, M4A, FLAC, and other common audio formats
  • Long audio is chunked automatically, in segments of --chunk-duration seconds (120 by default)