Transcribing Audio

Use this skill to transcribe audio files to text using Parakeet MLX - NVIDIA's state-of-the-art ASR model running locally on Apple Silicon.

When to Use

Use this skill when the user wants to:

•Transcribe an audio file (MP3, WAV, M4A, etc.)
•Convert speech to text
•Get a text transcript from a recording
•Extract text from a podcast, meeting, or voice memo
•Generate subtitles (SRT/VTT) from audio

Prerequisites

The parakeet-mlx command is available in PATH via ~/.local/bin/parakeet-mlx.

bash

parakeet-mlx <audio_file> [options]

Basic Usage

bash

# Simple transcription (outputs .srt file)
parakeet-mlx /path/to/audio.mp3

# Output as plain text
parakeet-mlx /path/to/audio.mp3 --output-format txt

# Output as JSON with timestamps
parakeet-mlx /path/to/audio.mp3 --output-format json

# All formats (txt, srt, vtt, json)
parakeet-mlx /path/to/audio.mp3 --output-format all

# With word-level timestamps in subtitles
parakeet-mlx /path/to/audio.mp3 --output-format vtt --highlight-words

# Specify output directory
parakeet-mlx /path/to/audio.mp3 --output-dir /path/to/output

Output Formats

Format	Description	Use Case
`txt`	Plain text, no timestamps	Reading, copying, searching
`srt`	SubRip subtitles with timestamps	Video subtitles, editing
`vtt`	WebVTT subtitles with timestamps	Web video, HTML5
`json`	Structured data with full timestamps	Programmatic use, analysis
`all`	All of the above	When you need everything

Options

Option	Default	Description
`--output-format`	`srt`	Output format (txt/srt/vtt/json/all)
`--output-dir`	`.`	Directory for output files
`--highlight-words`	`false`	Word-level timestamps in SRT/VTT
`--verbose` / `-v`	`false`	Show detailed progress
`--chunk-duration`	`120`	Chunk duration in seconds for long audio

Model

Uses mlx-community/parakeet-tdt-0.6b-v3 by default - a 600M parameter model that runs efficiently on Apple Silicon with excellent accuracy.

Examples

Transcribe a voice memo

bash

parakeet-mlx ~/Desktop/voice-memo.m4a --output-format txt

Generate subtitles for a video

bash

parakeet-mlx video.mp4 --output-format srt --highlight-words

Transcribe multiple files

bash

parakeet-mlx file1.mp3 file2.mp3 file3.mp3 --output-format txt

Reading the Output

After transcription, read the output file to show the user:

bash

cat /path/to/audio.txt  # For text output

For JSON output, you can parse the timestamps:

bash

cat /path/to/audio.json | jq '.sentences[] | {text, start, end}'

Notes

•First run may download the model (~600MB) which is cached for future use
•Runs entirely locally - no API calls, fully private
•Supports WAV, MP3, M4A, FLAC, and other common audio formats
•For very long audio (>2 hours), chunking is automatic