Video Editing
Goal
Automatically edit talking-head videos: remove silences with neural VAD (voice activity detection), then add a swivel teaser preview.
Scripts
- •./scripts/jump_cut_vad_singlepass.py - VAD silence removal
- •./scripts/insert_3d_transition.py - Swivel teaser insertion
- •./scripts/simple_video_edit.py - Basic FFmpeg editing
Quick Start
```bash
# Step 1: Remove silences
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 .tmp/edited.mp4

# Step 2: Add swivel teaser
python3 ./scripts/insert_3d_transition.py .tmp/edited.mp4 output.mp4 --bg-image .tmp/bg.png

# One-liner
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 .tmp/edited.mp4 && \
  python3 ./scripts/insert_3d_transition.py .tmp/edited.mp4 output.mp4 --bg-image .tmp/bg.png
```
Step 1: VAD Silence Removal
How It Works
- •Extracts audio as WAV (16kHz mono)
- •Runs Silero VAD to detect speech segments
- •Merges close segments, adds padding
- •Uses FFmpeg trim+concat to join segments in single pass
- •Hardware encodes with hevc_videotoolbox (H.265, 17Mbps, 30fps)
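The trim+concat step can be sketched as building one `filter_complex` string, so FFmpeg decodes the input only once. This is a minimal illustration and not the script's actual code: the function name `build_filter_graph` and the stream labels (`v0`, `a0`, `outv`, `outa`) are hypothetical.

```python
def build_filter_graph(segments):
    """Build an FFmpeg filter_complex string that trims each (start, end)
    segment (in seconds) from input 0 and concatenates them in order."""
    if not segments:
        raise ValueError("no speech segments to keep")
    parts = []
    labels = []
    for i, (start, end) in enumerate(segments):
        # Video: cut the segment and reset timestamps so the pieces join cleanly.
        parts.append(
            f"[0:v]trim=start={start:.3f}:end={end:.3f},setpts=PTS-STARTPTS[v{i}]"
        )
        # Audio: the same cut, using the audio variants of trim and setpts.
        parts.append(
            f"[0:a]atrim=start={start:.3f}:end={end:.3f},asetpts=PTS-STARTPTS[a{i}]"
        )
        labels.append(f"[v{i}][a{i}]")
    # concat expects its inputs interleaved as v0 a0 v1 a1 ...
    parts.append(
        f"{''.join(labels)}concat=n={len(segments)}:v=1:a=1[outv][outa]"
    )
    return ";".join(parts)


graph = build_filter_graph([(0.0, 2.5), (4.0, 7.25)])
print(graph)
```

The resulting string would be passed to FFmpeg with `-filter_complex`, and the outputs selected with `-map "[outv]" -map "[outa]"`.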
CLI Arguments
| Argument | Default | Description |
|---|---|---|
| `--min-silence` | 0.5 | Min silence duration to cut (seconds) |
| `--min-speech` | 0.25 | Min speech duration to keep (seconds) |
| `--padding` | 100 | Padding around speech (ms) |
| `--merge-gap` | 0.3 | Merge segments closer than this (seconds) |
| `--keep-start` | true | Always start from 0:00 |
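As a rough illustration of how these defaults could shape the cut list, the sketch below post-processes raw speech segments in plain Python. It is an assumption about the order of operations (drop short speech, merge, pad, then pin the start), not the script's real logic; `plan_segments` and `_merge` are hypothetical names, and `--min-silence` is left out on the assumption that it is applied during detection.

```python
def _merge(segments, gap):
    """Merge consecutive (start, end) segments separated by less than `gap` seconds."""
    merged = []
    for start, end in segments:
        if merged and start - merged[-1][1] < gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def plan_segments(speech, total_duration, min_speech=0.25, merge_gap=0.3,
                  padding_ms=100, keep_start=True):
    """Turn raw speech segments (seconds, sorted) into the segments to keep."""
    pad = padding_ms / 1000.0
    # Drop speech bursts shorter than --min-speech.
    kept = [(s, e) for s, e in speech if e - s >= min_speech]
    # Merge segments closer together than --merge-gap.
    merged = _merge(kept, merge_gap)
    # Add --padding on both sides, clamped to the bounds of the video.
    padded = [(max(0.0, s - pad), min(total_duration, e + pad)) for s, e in merged]
    # With --keep-start, the first kept segment always begins at 0:00.
    if keep_start and padded:
        padded[0] = (0.0, padded[0][1])
    # Large padding can make neighbours overlap, so merge once more.
    return _merge(padded, 0.0)


segments = plan_segments(
    [(1.0, 3.0), (3.2, 5.0), (5.1, 5.2), (9.0, 12.0)], total_duration=12.05
)
print(segments)
```

In this example the 0.1 s blip is dropped, the first two segments are merged because their gap is under 0.3 s, and the last segment's padding is clamped to the end of the video.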
Step 2: Swivel Teaser
How It Works
- •Extracts frames from later in video (default: 60s onwards)
- •Creates 3D rotating "swivel" animation
- •Splits video: intro, transition, main content
- •Re-encodes and concatenates with audio preserved
CLI Arguments
| Argument | Default | Description |
|---|---|---|
| `--insert-at` | 3 | Where to insert teaser (seconds) |
| `--duration` | 5 | Teaser duration (seconds) |
| `--teaser-start` | 60 | Where to sample content from (seconds) |
| `--bg-image` | none | Background image for 3D effect |
Final Timeline
```
[0-3s intro] [3-8s swivel teaser @ 100x] [8s onwards: edited content]
Audio: Original audio plays continuously
```
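The timeline boundaries follow directly from the Step 2 defaults. The sketch below works through the arithmetic; `teaser_timeline` is a hypothetical helper, and it assumes "@ 100x" means the teaser plays its sampled footage at 100 times normal speed, which the source does not spell out.

```python
def teaser_timeline(insert_at=3.0, duration=5.0, teaser_start=60.0, speed=100.0):
    """Compute timeline boundaries (seconds) from the Step 2 defaults."""
    teaser_end = insert_at + duration
    return {
        # Untouched opening of the edited video.
        "intro": (0.0, insert_at),
        # Window in which the swivel teaser replaces the picture.
        "teaser": (insert_at, teaser_end),
        # Edited content resumes here; audio is continuous throughout.
        "main_from": teaser_end,
        # Assumption: at `speed`x, the teaser covers this much source footage.
        "sampled_source": (teaser_start, teaser_start + duration * speed),
    }


timeline = teaser_timeline()
print(timeline)
```

Under that assumption, the default 5-second teaser would skim 500 seconds of footage, from 60 s to 560 s of the edited video.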
Processing Time (49-min 4K video)
- •Step 1 (VAD + encode): ~8 minutes
- •Step 2 (swivel teaser): ~3 minutes
- •Total: ~11 minutes
Troubleshooting
| Issue | Solution |
|---|---|
| Cuts feel abrupt | --padding 200 |
| Too much cut | --min-silence 1.0 |
| Too little cut | --min-speech 0.1 |
| Won't play in QuickTime | Ensure hvc1 codec tag |
| Swivel has blank frames | Extract 300 frames for 5s teaser |
Dependencies
```bash
pip install torch                  # For Silero VAD
brew install ffmpeg node           # macOS
cd video_effects && npm install    # For 3D rendering
```
Technical Details
- •macOS: Hardware encoding (hevc_videotoolbox) H.265 at 17Mbps
- •Fallback: libx265 CRF 18
- •Audio: AAC 192kbps
- •Uses hvc1 codec tag for QuickTime compatibility
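These settings map onto standard FFmpeg output options. The sketch below shows one way the encoder choice could be expressed; `encoder_args` is a hypothetical helper, and selecting the encoder by platform name is an assumption about how the fallback is triggered.

```python
import sys


def encoder_args(platform=sys.platform):
    """Return FFmpeg output options matching the settings listed above."""
    if platform == "darwin":
        # macOS: VideoToolbox hardware H.265 encoder at a 17 Mbps target bitrate.
        video = ["-c:v", "hevc_videotoolbox", "-b:v", "17M"]
    else:
        # Elsewhere: software libx265 at constant quality (CRF 18).
        video = ["-c:v", "libx265", "-crf", "18"]
    # Tag the stream as hvc1 so QuickTime will play it.
    tag = ["-tag:v", "hvc1"]
    audio = ["-c:a", "aac", "-b:a", "192k"]
    return video + tag + audio


print(" ".join(encoder_args("darwin")))
print(" ".join(encoder_args("linux")))
```

The returned list can be appended to an FFmpeg command line just before the output path.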