Audio Transcription with Groq
Fast, accurate speech-to-text transcription powered by Groq's Whisper API
Prerequisites
- `GROQ_API_KEY` environment variable must be set (provided automatically when connected in Settings)
- `GROQ_TRANSCRIPTION_MODEL` environment variable is set to `whisper-large-v3-turbo` by default
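A script can fail early when the key is missing and fall back to the documented default model. This is a minimal sketch; the function name `check_groq_env` is hypothetical and not part of any Groq tooling.

```bash
# Hypothetical helper: verify the environment before calling the API.
check_groq_env() {
  if [ -z "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY is not set" >&2
    return 1
  fi
  # Fall back to the documented default when no model is configured.
  : "${GROQ_TRANSCRIPTION_MODEL:=whisper-large-v3-turbo}"
  export GROQ_TRANSCRIPTION_MODEL
}
```

Calling `check_groq_env || exit 1` at the top of a script keeps later `curl` commands from sending an empty bearer token.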
API Basics
- Base URL: `https://api.groq.com/openai/v1`
- Auth: Bearer token
- Max file size: 25MB
- Supported formats: MP3, MP4, MPEG, MPGA, M4A, WAV, WebM, OGG, FLAC
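The size and format limits above can be checked locally before uploading. A minimal sketch: `check_audio_file` and `MAX_BYTES` are hypothetical names, and the 25MB limit is read conservatively as 25,000,000 bytes, which is an assumption.

```bash
# Hypothetical pre-flight check: reject files the API would likely refuse.
MAX_BYTES=25000000  # 25MB, read as decimal megabytes (assumption)

check_audio_file() {
  local file="$1" ext size
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }

  # Compare the lowercased extension against the supported formats.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm|ogg|flac) ;;
    *) echo "unsupported format: $ext" >&2; return 1 ;;
  esac

  # wc -c is portable; stat flags differ between GNU and BSD.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
}
```

The check looks only at the file extension, not the actual container, so a mislabeled file can still be rejected by the API.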
Quick Start
Transcribe a Local Audio File
```bash
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  -F "model=$GROQ_TRANSCRIPTION_MODEL"
```
Transcribe from URL (Download First)
```bash
# Download the audio file first
curl -L -o /tmp/audio.mp4 "https://example.com/audio.mp4"

# Then transcribe
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/tmp/audio.mp4" \
  -F "model=$GROQ_TRANSCRIPTION_MODEL"
```
Response Formats
Basic Response (Default)
```json
{
  "text": "The transcribed text content goes here..."
}
```
Verbose JSON (With Timestamps)
Add `response_format=verbose_json` for detailed output:
```bash
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  -F "model=$GROQ_TRANSCRIPTION_MODEL" \
  -F "response_format=verbose_json"
```
Response:
```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 45.67,
  "text": "Full transcription text...",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.5,
      "text": "First segment text"
    },
    {
      "id": 1,
      "start": 3.5,
      "end": 7.2,
      "text": "Second segment text"
    }
  ]
}
```
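The `segments` array in a response like this can be flattened for downstream tools. A minimal sketch, assuming `jq` is installed and the response was saved to a file; `segments_to_tsv` is a hypothetical helper name.

```bash
# Hypothetical helper: print one tab-separated line per segment
# (start, end, text) from a saved verbose_json response file.
segments_to_tsv() {
  jq -r '.segments[] | [.start, .end, .text] | @tsv' "$1"
}
```

For example, `segments_to_tsv response.json` emits one row per segment, which is convenient to pipe into `awk` or `cut`.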
Timestamp Options
Segment-Level Timestamps
```bash
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  -F "model=$GROQ_TRANSCRIPTION_MODEL" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=segment"
```
Word-Level Timestamps
```bash
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  -F "model=$GROQ_TRANSCRIPTION_MODEL" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=word"
```
Word-level response includes:
```json
{
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.3},
    {"word": "world", "start": 0.35, "end": 0.7}
  ]
}
```
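The `start` and `end` values appear to be offsets in seconds from the beginning of the audio (an assumption based on the sample values). When building captions by hand from word or segment data, they can be converted to the `HH:MM:SS,mmm` form used by subtitle files. A minimal sketch; `format_timestamp` is a hypothetical helper name.

```bash
# Hypothetical helper: format seconds (e.g. 3.5) as HH:MM:SS,mmm.
format_timestamp() {
  awk -v t="$1" 'BEGIN {
    ms = int(t * 1000 + 0.5)                 # round to whole milliseconds
    h  = int(ms / 3600000); ms -= h * 3600000
    m  = int(ms / 60000);   ms -= m * 60000
    s  = int(ms / 1000);    ms -= s * 1000
    printf "%02d:%02d:%02d,%03d\n", h, m, s, ms
  }'
}
```

For example, `format_timestamp 3.5` prints `00:00:03,500`.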
Both Segment and Word Timestamps
```bash
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  -F "model=$GROQ_TRANSCRIPTION_MODEL" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=segment" \
  -F "timestamp_granularities[]=word"
```
Language Options
Specify Input Language
Improve accuracy by specifying the language (ISO-639-1 code):
```bash
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  -F "model=$GROQ_TRANSCRIPTION_MODEL" \
  -F "language=en"
```
Common language codes: en, es, fr, de, it, pt, ja, ko, zh
Instagram Reels Workflow
Combine with Apify to transcribe Instagram Reels:
Step 1: Get Reel Video URL with Apify
```bash
REEL_DATA=$(curl -s -X POST "https://api.apify.com/v2/acts/apify~instagram-reel-scraper/run-sync-get-dataset-items?token=$APIFY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "directUrls": ["https://www.instagram.com/reel/ABC123/"],
    "resultsLimit": 1
  }')

VIDEO_URL=$(echo "$REEL_DATA" | jq -r '.[0].videoUrl')
```
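When the scraper returns no items or the field is absent, `jq -r` prints the literal string `null`, and the download step would then fetch nothing useful. A minimal guard, with `require_video_url` as a hypothetical helper name:

```bash
# Hypothetical guard: accept only values that look like an HTTP(S) URL.
# jq -r prints the string "null" when .videoUrl is missing.
require_video_url() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) echo "no usable video URL in Apify response: '$1'" >&2; return 1 ;;
  esac
}
```

Running `require_video_url "$VIDEO_URL" || exit 1` before the download stops the workflow at the point where the problem occurred.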
Step 2: Download and Transcribe
```bash
# Download the video
curl -L -o /tmp/reel.mp4 "$VIDEO_URL"

# Transcribe
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/tmp/reel.mp4" \
  -F "model=$GROQ_TRANSCRIPTION_MODEL" \
  -F "response_format=verbose_json"
```
Available Models
| Model | Description |
|---|---|
| `whisper-large-v3-turbo` | Fast, recommended for most use cases |
| `whisper-large-v3` | Most accurate, slightly slower |
| `distil-whisper-large-v3-en` | English-only, fastest |
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| `file` | file | Audio file to transcribe (required) |
| `model` | string | Model to use (required) |
| `language` | string | ISO-639-1 language code |
| `prompt` | string | Optional context to improve transcription |
| `response_format` | string | `json`, `text`, `srt`, `verbose_json`, `vtt` |
| `temperature` | float | Sampling temperature (0-1) |
| `timestamp_granularities[]` | array | `segment` and/or `word` |
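The optional parameters can be wired into a small wrapper so that a field is sent only when a value is supplied. This is a sketch under stated assumptions: `transcribe` is a hypothetical function name, its argument order is arbitrary, and it relies on `-F` to make curl send the request as `multipart/form-data`.

```bash
# Hypothetical wrapper: transcribe FILE [RESPONSE_FORMAT] [LANGUAGE]
transcribe() {
  local file="$1" format="${2:-json}" language="${3:-}"
  # Build the curl arguments in the positional parameters so quoting survives.
  set -- -sS -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -F "file=@$file" \
    -F "model=${GROQ_TRANSCRIPTION_MODEL:-whisper-large-v3-turbo}" \
    -F "response_format=$format"
  # Send the language hint only when the caller provided one.
  if [ -n "$language" ]; then
    set -- "$@" -F "language=$language"
  fi
  curl "$@"
}
```

For example, `transcribe audio.mp3 verbose_json en` requests timestamps with an English language hint, while `transcribe audio.mp3` sends only the required fields plus `response_format=json`.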
Response Formats
| Format | Description |
|---|---|
| `json` | Simple JSON with text field |
| `text` | Plain text only |
| `verbose_json` | Full JSON with segments, timing, language |
| `srt` | SubRip subtitle format |
| `vtt` | WebVTT subtitle format |
SRT Output Example
```bash
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  -F "model=$GROQ_TRANSCRIPTION_MODEL" \
  -F "response_format=srt"
```
Output:
```
1
00:00:00,000 --> 00:00:03,500
First subtitle text

2
00:00:03,500 --> 00:00:07,200
Second subtitle text
```
Environment Variables
| Variable | Description |
|---|---|
| `GROQ_API_KEY` | Groq API key for authentication |
| `GROQ_TRANSCRIPTION_MODEL` | Default model (`whisper-large-v3-turbo`) |
Error Handling
| Status | Description |
|---|---|
| 400 Bad Request | Invalid file format or missing parameters |
| 401 Unauthorized | Invalid API key |
| 413 Payload Too Large | File exceeds 25MB limit |
| 429 Too Many Requests | Rate limit exceeded |
| 500 Internal Server Error | Server error, retry |
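A script can branch on these codes by separating transient failures from ones that need a corrected request. A minimal sketch; `is_retryable_status` is a hypothetical helper, and treating every 5xx code as transient is an assumption that goes beyond the single 500 row in the table.

```bash
# Hypothetical helper: succeed when an HTTP status is worth retrying.
is_retryable_status() {
  case "$1" in
    429|5??) return 0 ;;   # rate limit or server error: likely transient
    *)       return 1 ;;   # e.g. 400, 401, 413: fix the request instead
  esac
}
```

The status can be captured with curl's `-w '%{http_code}'` option while the body goes to a file via `-o`, for example `status=$(curl -sS -o response.json -w '%{http_code}' ...)`.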
Best Practices
- File size - Keep files under 25MB; split longer audio
- Format - MP3/MP4 work well; convert unusual formats first
- Language hint - Specify language for better accuracy
- Use verbose_json - Get timestamps for subtitles/analysis
- Clean up - Remove temporary files after transcription
- Error handling - Implement retry logic for transient errors
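The retry advice above can be sketched as a generic wrapper with exponential backoff. The function name `retry` and its argument order are hypothetical, and the doubling delay is one common choice rather than anything Groq prescribes.

```bash
# Hypothetical helper: retry MAX_ATTEMPTS INITIAL_DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, doubling the delay after each failure.
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry 3 2 curl --fail -sS ...` makes up to three attempts, waiting 2 and then 4 seconds between them. Note that `--fail` also makes curl exit non-zero on 4xx responses, which retrying will not fix. For the clean-up practice, a temporary file created with `mktemp` can be removed on exit with `trap 'rm -f "$tmpfile"' EXIT`.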
Translation
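To translate audio to English, use the translations endpoint. Groq's documentation lists `whisper-large-v3` as the model that supports translation, so set it explicitly rather than relying on the `whisper-large-v3-turbo` default in `$GROQ_TRANSCRIPTION_MODEL`:
```bash
curl -X POST "https://api.groq.com/openai/v1/audio/translations" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@spanish_audio.mp3" \
  -F "model=whisper-large-v3"
```
Note: Translation always outputs English text.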