🎵 Voice Note to MIDI
Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.
What It Does
This skill provides a complete audio-to-MIDI conversion pipeline that:
- •Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
- •ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
- •Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
- •Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction
- •Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output
Pipeline Architecture
Audio Input (WAV/M4A/MP3)
↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS) │
│ - Isolate harmonic content │
│ - Remove drums/percussion │
│ - Noise gating │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection │
│ - Basic Pitch ML model (Spotify) │
│ - Polyphonic note detection │
│ - Onset/offset estimation │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 3: Analysis │
│ - Pitch class distribution │
│ - Key detection │
│ - Dominant note identification │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup │
│ - Timing grid snap │
│ - Key-aware pitch correction │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning │
│ - Note merging (legato) │
│ - Velocity normalization │
└─────────────────────────────────────┘
↓
MIDI Output (Standard MIDI File)
Setup
Prerequisites
- •Python 3.11+ (Python 3.14+ recommended)
- •FFmpeg (for audio format support)
- •pip
Installation
Quick Install (Recommended):
cd /path/to/voice-note-to-midi ./setup.sh
This automated script will:
- •Check Python 3.11+ is installed
- •Create the
~/melody-pipelinedirectory - •Set up the virtual environment
- •Install all dependencies (basic-pitch, librosa, music21, etc.)
- •Download and configure the hum2midi script
- •Add melody-pipeline to your PATH
Manual Install:
If you prefer manual setup:
mkdir -p ~/melody-pipeline cd ~/melody-pipeline python3 -m venv venv-bp source venv-bp/bin/activate pip install basic-pitch librosa soundfile mido music21 chmod +x ~/melody-pipeline/hum2midi
- •Add to your PATH (optional):
echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc source ~/.bashrc
Verify Installation
cd ~/melody-pipeline ./hum2midi --help
Usage
Basic Usage
Convert a voice memo to MIDI:
./hum2midi my_humming.wav
This creates my_humming.mid with 16th-note quantization.
Specify Output File
./hum2midi input.wav output.mid
Command-Line Options
| Option | Description | Default |
|---|---|---|
--grid <value> | Quantization grid: 1/4, 1/8, 1/16, 1/32 | 1/16 |
--min-note <ms> | Minimum note duration in milliseconds | 50 |
--no-quantize | Skip quantization (output raw Basic Pitch MIDI) | disabled |
--key-aware | Enable key-aware pitch correction | disabled |
--no-analysis | Skip pitch analysis and key detection | disabled |
Usage Examples
Quantize to eighth notes
./hum2midi melody.wav --grid 1/8
Key-aware quantization (recommended for tonal music)
./hum2midi song.wav --key-aware
Require longer minimum notes
./hum2midi humming.wav --min-note 100
Skip analysis for faster processing
./hum2midi quick.wav --no-analysis
Combine options
./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80
Processing MIDI Input
You can also process existing MIDI files through the quantization pipeline:
./hum2midi input.mid output.mid --grid 1/16 --key-aware
This skips the audio processing steps and goes directly to analysis and quantization.
Sample Output
═══════════════════════════════════════════════════════════════ hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition) [Key-Aware Mode Enabled] ═══════════════════════════════════════════════════════════════ Input: my_humming.wav Output: my_humming.mid → Step 1: Stem Separation (HPSS) Isolating melodic content... Loaded: 5.23s @ 44100Hz ✓ Melody stem extracted → 5.23s → Step 2: Audio-to-MIDI Conversion (Basic Pitch) Running Spotify's Basic Pitch ML model on melody stem... ✓ Raw MIDI generated (Basic Pitch) → Step 3: Pitch Analysis & Key Detection Notes detected: 42 total, 7 unique Note range: C3 - G4 Pitch classes: C3, E3, G3, A3, C4, D4, G4 Dominant note: G3 (23.8% of notes) Detected key: G major → Step 4: Quantization & Cleanup Octave pruning: removed 3 harmonic notes above 67 (median+12) Overlap pruning: removed 2 harmonic notes at overlapping positions Note merging: merged 5 staccato chunks into legato notes (gap<=60 ticks) Grid: 240 ticks (1/16) Notes: 38 notes Key: G major Key-aware: 2 notes corrected to scale Tempo: 120 BPM ✓ Quantized MIDI saved ═══════════════════════════════════════════════════════════════ ✓ Done! Output: my_humming.mid ═══════════════════════════════════════════════════════════════ 📊 ANALYSIS SUMMARY ───────────────────────────────────────────────────────────── Detected Notes: C3, E3, G3, A3, C4, D4, G4 Detected Key: G major Quantization: Key-aware mode (notes snapped to scale) MIDI Info: 38 notes, 7 unique pitches, 120 BPM Pitches: C3, E3, G3, A3, C4, D4, G4
Notes & Limitations
Audio Quality Matters
- •Clear, loud melody produces the best results
- •Background noise can cause false note detection
- •Reverb and effects may confuse pitch detection
- •Close-mic'd vocals work significantly better than room recordings
Musical Considerations
- •Monophonic sources work best (single melody line)
- •Polyphonic audio (chords, multiple instruments) will produce messy results
- •Vibrato and pitch bends may be quantized to stepped pitches
- •Rapid note passages may be missed or merged
Technical Limitations
- •Tempo is fixed at 120 BPM in output (time positions are preserved, but tempo may need adjustment in your DAW)
- •Note velocities are normalized but may need manual adjustment
- •Very short notes (<50ms) may be filtered out by default
- •Extreme pitch ranges may cause octave detection issues
Post-Processing Recommendations
After generating MIDI, you may want to:
- •Import into your DAW and adjust tempo to match your original recording
- •Quantize further if stricter timing is needed
- •Adjust note velocities for dynamics
- •Apply swing/groove templates if the rigid grid sounds too mechanical
- •Edit individual notes that were misdetected (common with fast runs)
Supported Audio Formats
Input formats supported via FFmpeg:
- •WAV, AIFF, FLAC (uncompressed, best quality)
- •MP3, M4A, AAC (compressed, acceptable)
- •OGG, OPUS (open source formats)
- •Most other formats FFmpeg supports
Troubleshooting
No notes detected
- •Check that input file isn't silent or corrupted
- •Try increasing
--min-notethreshold - •Verify audio has clear melodic content (not just noise)
Too many notes / messy output
- •Enable octave pruning and overlap pruning (on by default)
- •Use
--key-awareto constrain to musical scale - •Check for background noise in source audio
Wrong key detected
- •Key detection works best with at least 8-10 measures of music
- •Chromatic passages may confuse the detector
- •Manually review and adjust in your DAW if needed
Notes in wrong octave
- •Basic Pitch sometimes detects harmonics instead of fundamentals
- •The pipeline includes pruning, but some may slip through
- •Use your DAW's transpose function for simple octave shifts
References
- •Basic Pitch - Spotify's polyphonic pitch detection model
- •librosa HPSS - Harmonic-Percussive Source Separation
- •Krumhansl-Kessler Key Profiles - Key detection algorithm
License
This skill integrates Basic Pitch by Spotify, which is licensed under Apache 2.0. The pipeline script and documentation are provided under MIT license.