WAV Audio Analysis Skill
<!-- ABOUTME: Analyze WAV audio files for debugging TTS/audio pipelines -->
<!-- ABOUTME: Statistics, waveform patterns, and format validation -->
Description
Analyze WAV audio files to debug audio generation pipelines. Provides statistical analysis, format validation, and quality metrics for diagnosing issues with generated speech.
Triggers: wav, audio, waveform, samples, amplitude, audio analysis, sound quality, audio debug
Analysis Capabilities
Basic Statistics
- Sample count and duration
- Min/max amplitude
- Standard deviation (expected ~3000-8000 for speech)
- Near-silent sample percentage
Quality Indicators
- Zero crossing rate (speech typically 50-200 per 1000 samples)
- Clipping detection (samples at ±32767)
- NaN/Inf detection (if processing raw floats)
- DC offset analysis
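The clipping, DC offset, and zero-crossing checks listed above could be sketched as follows. This is a minimal illustration that assumes 16-bit PCM samples already loaded into a NumPy array. The function name `quality_indicators` and its dictionary keys are hypothetical, and zero-valued samples are treated as non-negative when counting sign changes, which is one possible convention. The NaN/Inf check applies only to float buffers and is left out here.

```python
import numpy as np


def quality_indicators(samples: np.ndarray) -> dict:
    """Clipping, DC offset, and zero-crossing figures for int16 PCM samples."""
    # Widen to int32 so that arithmetic on -32768 cannot overflow int16.
    s = np.asarray(samples).astype(np.int32)

    # Samples at or beyond +/-32767 are counted as clipped.
    clipped = int(np.sum((s >= 32767) | (s <= -32767)))

    # DC offset is the mean of the signal; healthy audio sits near zero.
    dc_offset = float(s.mean()) if s.size else 0.0

    # A zero crossing is a sign change between consecutive samples.
    # Assumption: a sample equal to 0 counts as non-negative.
    non_negative = s >= 0
    crossings = int(np.sum(non_negative[1:] != non_negative[:-1]))
    zc_per_1000 = 1000.0 * crossings / max(s.size - 1, 1)

    return {
        "clipped": clipped,
        "dc_offset": dc_offset,
        "zero_crossings_per_1000": zc_per_1000,
    }


if __name__ == "__main__":
    demo = np.array([0, 32767, -32768, 100, -100], dtype=np.int16)
    print(quality_indicators(demo))
```

Normalizing the crossing count by the number of sample pairs makes the figure comparable across files of different lengths.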
Format Validation
- Sample rate verification (24 kHz for Qwen3-Omni TTS)
- Bit depth check
- Channel count
- RIFF header validation
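One way to run these format checks is with Python's standard `wave` module, which parses the RIFF header and raises `wave.Error` when the file is not valid WAVE data. The sketch below is illustrative only: `validate_wav` and `make_wav_bytes` are hypothetical helpers, and the 16-bit mono defaults are assumptions (only the 24 kHz rate comes from the list above).

```python
import io
import wave


def validate_wav(source, expected_rate=24000, expected_width=2, expected_channels=1):
    """Return a list of format problems; an empty list means nothing was flagged.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    try:
        with wave.open(source, "rb") as w:
            if w.getframerate() != expected_rate:
                problems.append(f"sample rate {w.getframerate()} != {expected_rate}")
            if w.getsampwidth() != expected_width:
                problems.append(f"sample width {w.getsampwidth()} bytes != {expected_width}")
            if w.getnchannels() != expected_channels:
                problems.append(f"channels {w.getnchannels()} != {expected_channels}")
    except (wave.Error, EOFError) as exc:
        # wave.Error covers a bad RIFF/WAVE header; EOFError covers truncated input.
        problems.append(f"invalid RIFF/WAVE header: {exc}")
    return problems


def make_wav_bytes(rate):
    """Build a short silent 16-bit mono WAV in memory (demo helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * 240)
    return buf.getvalue()


if __name__ == "__main__":
    print(validate_wav(io.BytesIO(make_wav_bytes(24000))))
    print(validate_wav(io.BytesIO(make_wav_bytes(16000))))
    print(validate_wav(io.BytesIO(b"not a wav file at all")))
```

Letting `wave` parse the header also copes with files that carry extra chunks before the audio data, where a fixed header length would not.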
Usage
To analyze a WAV file, provide the path and I'll run comprehensive diagnostics:
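```python
import numpy as np

SAMPLE_RATE = 24000  # Hz; assumes 24 kHz mono 16-bit PCM

with open("audio.wav", "rb") as f:
    header = f.read(44)  # assumes the canonical 44-byte header with no extra chunks
    data = f.read()

# Drop a trailing odd byte so the buffer holds a whole number of int16 samples
samples = np.frombuffer(data[: len(data) // 2 * 2], dtype="<i2")

print(f"Samples: {len(samples)}")
print(f"Duration: {len(samples) / SAMPLE_RATE:.2f} sec")
print(f"Min/Max: {samples.min()} / {samples.max()}")
print(f"Std dev: {np.std(samples):.1f}")

# Quality check: widen to int32 first, because abs(-32768) overflows in int16
near_silent = np.sum(np.abs(samples.astype(np.int32)) < 100)
print(f"Near-silent: {100 * near_silent / len(samples):.1f}%")

# Zero crossings (voice activity indicator)
if len(samples) >= 1000:
    zc = np.sum(np.diff(np.sign(samples[:1000])) != 0)
    print(f"Zero crossings (first 1000): {zc}")
```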
Typical Values for Good Speech Audio
| Metric | Expected Range | Meaning |
|---|---|---|
| Std dev | 3000-8000 | Audio energy level |
| Near-silent | <5% | Minimal silent padding |
| Zero crossings | 50-200/1000 | Voice frequency activity |
| Min/Max | ±20000-32000 | Healthy amplitude range |
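The table above can be turned into an automated check. The following sketch assumes the four metrics have already been measured and are passed in as a dictionary. The key names and the `check_ranges` function are hypothetical, and the thresholds are copied directly from the table.

```python
# Expected ranges from the table, expressed as predicates on each metric.
EXPECTED = {
    "std_dev": lambda v: 3000 <= v <= 8000,
    "near_silent_pct": lambda v: v < 5,
    "zero_crossings_per_1000": lambda v: 50 <= v <= 200,
    "peak_abs": lambda v: 20000 <= v <= 32000,
}


def check_ranges(metrics):
    """Map each metric name to True if it falls inside its expected range."""
    return {name: bool(test(metrics[name])) for name, test in EXPECTED.items()}


if __name__ == "__main__":
    measured = {
        "std_dev": 5200.0,
        "near_silent_pct": 2.1,
        "zero_crossings_per_1000": 120,
        "peak_abs": 27000,
    }
    for name, ok in check_ranges(measured).items():
        print(f"{name}: {'ok' if ok else 'OUT OF RANGE'}")
```

A metric outside its range is a prompt for closer inspection rather than proof of a fault, since the ranges describe typical speech.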
Common Issues
99% Near-Silent
- Cause: NaN values converted to zeros
- Fix: Check for numerical overflow in pipeline
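To find where NaN or Inf values enter the pipeline, it can help to inspect the float buffer at each stage before it is converted to int16. The sketch below is one possible check, assuming the intermediate output is available as a float array. `non_finite_report` is a hypothetical helper name.

```python
import numpy as np


def non_finite_report(float_samples):
    """Count NaN/Inf values in a float buffer and locate the first one."""
    x = np.asarray(float_samples, dtype=np.float64)
    bad = ~np.isfinite(x)  # True where the value is NaN, +Inf, or -Inf
    count = int(bad.sum())
    return {
        "count": count,
        # argmax on a boolean array returns the index of the first True.
        "first_index": int(np.argmax(bad)) if count else None,
        "percent": 100.0 * count / x.size if x.size else 0.0,
    }


if __name__ == "__main__":
    stage_output = [0.1, 0.2, float("nan"), float("inf"), 0.3]
    print(non_finite_report(stage_output))
```

Running the same report after each processing stage narrows down which stage first produces non-finite values.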
Low Std Dev (<1000)
- Cause: Values too quiet before output normalization
- Fix: Check gain stages, ensure proper scaling
Constant Value Runs
- Cause: Chunked processing with context overlap issues
- Fix: Verify chunk stitching logic
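Constant-value runs can be located by finding the longest stretch of identical consecutive samples. The sketch below is one way to do that with NumPy. `longest_constant_run` is a hypothetical helper, and how long a run has to be before it counts as suspicious is left to the reader.

```python
import numpy as np


def longest_constant_run(samples):
    """Return (start_index, length, value) of the longest run of equal samples."""
    s = np.asarray(samples)
    if s.size == 0:
        return 0, 0, None

    # Positions where a sample differs from its predecessor start a new run.
    change = np.flatnonzero(s[1:] != s[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [s.size]))
    lengths = ends - starts

    i = int(np.argmax(lengths))  # first run with the maximum length
    return int(starts[i]), int(lengths[i]), int(s[starts[i]])


if __name__ == "__main__":
    demo = np.array([1, 2, 2, 2, 3, 3, 7], dtype=np.int16)
    start, length, value = longest_constant_run(demo)
    print(f"Longest run: {length} samples of {value} starting at index {start}")
```

If the start index of a long run lines up with a multiple of the chunk size, that points toward the stitching logic.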
Clipping (values at ±32767)
- Cause: Overflow or missing tanh/clamp
- Fix: Add output clamping before int16 conversion
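Output clamping before the int16 conversion might look like the sketch below. It assumes the pipeline produces finite float samples in a nominal [-1.0, 1.0] range, which the source does not state, and `float_to_int16` is a hypothetical helper name. A hard clamp is shown; `np.tanh` could be swapped in where a softer limit is preferred.

```python
import numpy as np


def float_to_int16(float_samples):
    """Clamp float audio to [-1, 1] and convert it to 16-bit PCM.

    Assumes finite input in a nominal [-1.0, 1.0] range (an assumption,
    not something the pipeline is known to guarantee).
    """
    x = np.asarray(float_samples, dtype=np.float64)
    x = np.clip(x, -1.0, 1.0)  # hard clamp: out-of-range values saturate
    return np.round(x * 32767.0).astype(np.int16)


if __name__ == "__main__":
    demo = [0.0, 0.25, 1.5, -2.0, -1.0]
    print(float_to_int16(demo).tolist())
```

Clamping keeps out-of-range values from overflowing during the cast. Long stretches of saturated samples still suggest too much gain earlier in the pipeline.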