WaveCap Silence Detection & Chunking Skill
Use this skill to tune how WaveCap detects speech boundaries and groups audio into chunks for transcription.
Configuration Location
Silence/chunking settings are in the whisper: section:
- •User config:
/Users/thw/Projects/WaveCap/state/config.yaml - •Default config:
/Users/thw/Projects/WaveCap/backend/default-config.yaml
Core Parameters
Silence Threshold (amplitude floor for speech detection)
yaml
whisper: silenceThreshold: 0.015 # RMS amplitude (0.0-1.0)
| Value | Effect |
|---|---|
| 0.005-0.01 | Very sensitive, detects quiet speech, may trigger on noise |
| 0.015-0.02 | Balanced for most radio sources |
| 0.025-0.04 | Less sensitive, ignores background noise, may miss soft speech |
Symptoms & Fixes:
- •Speech cut off at start → Lower threshold (e.g., 0.015 → 0.01)
- •False triggers on noise → Raise threshold (e.g., 0.015 → 0.025)
Silence Hold Seconds (wait before cutting)
yaml
whisper: silenceHoldSeconds: 1.8 # Seconds of silence before flush
| Value | Effect |
|---|---|
| 0.5-0.8 | Fast updates, may clip trailing words |
| 1.0-1.5 | Balanced latency vs completeness |
| 1.8-2.5 | Safer for complete sentences, higher latency |
Symptoms & Fixes:
- •Words cut off at end → Increase (e.g., 1.2 → 1.8)
- •Updates feel slow → Decrease (e.g., 1.8 → 1.2)
Silence Lookback Seconds (evaluation window)
yaml
whisper: silenceLookbackSeconds: 3.0 # Window for silence detection
| Value | Effect |
|---|---|
| 1.0-2.0 | Responsive but may trigger on brief pauses |
| 3.0-4.0 | More stable, requires sustained silence |
| 5.0+ | Very stable, may delay chunk boundaries |
Active Samples Percentage (VAD sensitivity)
yaml
whisper: activeSamplesInLookbackPct: 0.10 # 10% of samples must be active
| Value | Effect |
|---|---|
| 0.05 (5%) | Very sensitive, keeps chunks open longer |
| 0.10 (10%) | Balanced default |
| 0.20 (20%) | Requires more sustained speech to keep chunk open |
Chunk Length (maximum chunk duration)
yaml
whisper: chunkLength: 20 # Maximum seconds before forced flush
| Value | Effect |
|---|---|
| 10-15 | Low latency, may break mid-sentence |
| 20-30 | Balanced for real-time monitoring |
| 45-60 | Better sentence structure, higher latency |
Minimum Chunk Duration
yaml
whisper: minChunkDurationSeconds: 12 # Minimum before silence triggers flush
| Value | Effect |
|---|---|
| 4-8 | Frequent small updates |
| 10-15 | Balanced, avoids tiny fragments |
| 15-20 | Fewer but larger chunks |
Context Overlap (between chunks)
yaml
whisper: contextSeconds: 1.5 # Overlap for Whisper context
| Value | Effect |
|---|---|
| 0.5 | Low latency, may lose speech at boundaries |
| 1.0-2.0 | Balanced overlap for boundary recovery |
| 3.0-5.0 | Maximum safety, more redundant processing |
View Current Settings
bash
grep -E "(silence|chunk|context|active)" /Users/thw/Projects/WaveCap/state/config.yaml
Diagnose Clipping Issues
Check for premature clipping (speech cut at start)
bash
curl -s http://localhost:8000/api/transcriptions/export | \
jq '[.[] | select(.text | startswith("..."))] | length' | \
xargs -I{} echo "Transcriptions starting with '...': {}"
Check for late clipping (speech cut at end)
bash
curl -s http://localhost:8000/api/transcriptions/export | \
jq '[.[] | select(.text | test("[a-z]$"))] | length' | \
xargs -I{} echo "Transcriptions ending mid-word: {}"
Analyze audio amplitude at boundaries
bash
cd /Users/thw/Projects/WaveCap/backend && source .venv/bin/activate && python3 << 'EOF'
import numpy as np
from pathlib import Path
RECORDINGS = Path("/Users/thw/Projects/WaveCap/state/recordings")
def analyze_boundaries(filepath):
with open(filepath, 'rb') as f:
f.read(44)
data = f.read()
samples = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
if len(samples) < 1600:
return None
first_50ms = np.sqrt(np.mean(samples[:800]**2))
last_50ms = np.sqrt(np.mean(samples[-800:]**2))
return first_50ms, last_50ms
print("Boundary analysis (high RMS at start/end = potential clipping):")
print("-" * 70)
files = sorted(RECORDINGS.glob("*.wav"), key=lambda x: x.stat().st_mtime, reverse=True)[:15]
clip_start = clip_end = 0
for f in files:
result = analyze_boundaries(f)
if result:
first, last = result
issues = []
if first > 0.06: issues.append("HIGH_START"); clip_start += 1
if last > 0.06: issues.append("HIGH_END"); clip_end += 1
status = ", ".join(issues) if issues else "OK"
print(f"{f.name[-45:]}: start={first:.4f} end={last:.4f} [{status}]")
print("-" * 70)
print(f"Summary: {clip_start} potential start clips, {clip_end} potential end clips")
EOF
Tuning Scenarios
Reduce Clipping (speech being cut off)
yaml
whisper: silenceThreshold: 0.012 # More sensitive silenceHoldSeconds: 2.0 # Wait longer before cutting silenceLookbackSeconds: 3.5 # Larger evaluation window contextSeconds: 2.0 # More overlap activeSamplesInLookbackPct: 0.08 # Keep chunks open easier
Lower Latency (faster updates)
yaml
whisper: silenceThreshold: 0.025 # Less sensitive silenceHoldSeconds: 0.8 # Cut quickly on silence silenceLookbackSeconds: 1.5 # Smaller window chunkLength: 15 # Shorter max chunks minChunkDurationSeconds: 6 # Allow smaller chunks contextSeconds: 0.5 # Minimal overlap
Noisy Environment (false triggers)
yaml
whisper: silenceThreshold: 0.035 # Ignore low-level noise silenceHoldSeconds: 1.0 # Quick silence detection activeSamplesInLookbackPct: 0.20 # Require 20% active
Quiet Environment (soft speakers)
yaml
whisper: silenceThreshold: 0.008 # Very sensitive silenceHoldSeconds: 2.0 # Patient with pauses activeSamplesInLookbackPct: 0.05 # 5% is enough
Apply Changes
bash
launchctl stop com.wavecap.server && sleep 2 && launchctl start com.wavecap.server
Latency vs Accuracy Trade-offs
| Priority | chunkLength | minChunk | silenceHold | context |
|---|---|---|---|---|
| Low latency | 15 | 6 | 0.8 | 0.5 |
| Balanced | 20 | 12 | 1.5 | 1.5 |
| High accuracy | 45 | 20 | 2.5 | 3.0 |
Tips
- •Changes take effect on service restart
- •Test with live audio after changes
- •Monitor clipping rates over 50+ transcriptions before final tuning
- •
silenceThresholdandsilenceHoldSecondshave the biggest impact - •Higher
contextSecondshelps but adds processing overhead