Searching for Similar Phrases in Music Audio Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Engineering, Columbia University, NY USA http://labrosa.ee.columbia.edu/
1. Motivation: Similar Phrases 2. Phrase Matching System 3. Experiments 4. Conclusions & Future
Similar Phrases in Music - Ellis 2007-12-18 p. 1 /12 1. Motivation: Similar Phrases • Idea: Music is a sequence of reused pieces e.g. melodic runs, chord sequences, ... • Can we identify them in large music databases? ... which we have i.e. machine learning • Applications classification and matching of pieces compressed representation data-driven musicology
Similar Phrases in Music - Ellis 2007-12-18 p. 2 /12 Common Phrase Discovery
Beat tracking
Locality Music Chroma Key Landmark Sensitive audio features normalization identification Hash Table • Chop up music into short descriptions of musical content 24-beat beat-chroma matrices? • Choose a few that appear to be “starts” • Put into LSH table (similar items fall in same bin) • Find the bins with most entries
Similar Phrases in Music - Ellis 2007-12-18 p. 3 /12 2. Phrase Matching: Beat Tracking Ellis ’06,’07 • Goal: One feature vector per ‘beat’ (tatum) for tempo normalization, efficiency • “Onset Strength Envelope” sumf(max(0, difft(log |X(t, f)|))) 40 30 20 freq / mel 10
0
0 5 10 time / sec 15 • Autocorr. + window → global tempo estimate
0
0168.5100 BPM 200 300 400 500 600 700 800 900 1000 lag / 4 ms samples Similar Phrases in Music - Ellis 2007-12-18 p. 4 /12 Chroma Features • Chroma features convert spectral energy into musical weights in a canonical octave i.e. 12 semitone bins Piano chromatic scale IF chroma
4 a z m
H G o k r
/
3 h Piano F c q e r scale f 2 D 1 C
0 A 2 4 6 8 10 time / sec 100 200 300 400 500 600 700 time / frames • Can resynthesize as “Shepard Tones”
all octa12 Shepardves toneat spectra once Shepard tone resynth 0 4 -10 3 -20 level / dB
-30 freq / kHz 2 -40 1 -50 -60 0 0 500 1000 1500 2000 2500 freq / Hz 2 4 6 8 10 time / sec
Similar Phrases in Music - Ellis 2007-12-18 p. 5 /12 Key Estimation Ellis ICASSP ’07 • Covariance of chroma reflects key • Normalize by transposing for best fit Taxman Eleanor Rigby I'm Only Sleeping
G G G F F F
single Gaussian D D D model of one piece aligned chroma C C C A A A A C D F G A C D F G find ML rotation Love You To Aligned Global model Yellow Submarine G G G of other pieces F F F
D D D model all C C C
A A A transposed pieces A C D F G A C D F G A C D F G She Said She Said Good Day Sunshine And Your Bird Can Sing
iterate until G G G convergence F F F D D D C C C
A A A A C D F G A C D F G A C D F G aligned chroma Similar Phrases in Music - Ellis 2007-12-18 p. 6 /12 Landmark Location • Looking for “beginnings” of phrases e.g. abrupt change in harmony, instruments, etc. use likelihood ratio test: weighted windows either side of boundary vs. all Come Together - Spectrogram, Beat-sync chromogram, and top 10 segment points
5 freq / kHz 4 • Choose top 10 3 locally-normalized 2 1
peaks 0 .. to control 12 10 chroma bin data size 8
6
4
2
0 50 100 150 200 250 time / sec Similar Phrases in Music - Ellis 2007-12-18 p. 7 /12 Locality Sensitive Hashes • Goal: Quantize high-dimensional data so ‘similar’ items fall into same bin .. for fast and scalable nearest-neighbor search • Idea: Multiple random scalar projections each one will tend to keep neighbors nearby items close together in all projections are probably neighbors
from Slaney & Casey ‘08 Similar Phrases in Music - Ellis 2007-12-18 p. 8 /12 3. Experiments • Data “artist20” - 20 artist x 6 albums = 1413 tracks (up to) 10 landmarks/track = 14,078 patches each patch = 12 chroma bins x 24 beats (288 dims) # neighbors within 2.0 - 14078 a20 patches Performance 100 • 90
feature calculation: 80
~ 60 min 70 LSH 14k NNs: 60 50 ~ 30 sec count 40
51 patches have 30 >10 NNs 20 within r = 2.0 10 0 0 5 10 15 20 25 30 35 40 # near neighbors Similar Phrases in Music - Ellis 2007-12-18 p. 9 /12 Results - artist20
radiohead 01-You 192.7-198.5s green day 02-Blood Sex And Booze 122.3-129.8s 12 12
10 10
8 8
6 6 chroma bin chroma bin
4 4
2 2
radiohead 07-Ripcord 177.4-182.5s radiohead 11-Genchildren hidden 30.9-35.0s 12 12
10 10
8 8
6 6 chroma bin chroma bin
4 4
2 2
5 10 15 20 5 10 15 20 beat beat mainly sustained notes Similar Phrases in Music - Ellis 2007-12-18 p. 10 /12 Results - Beatles • Only the 86 Beatles tracks • All beat offsets = 41,705 patches LSH takes 300 sec - approx NlogN in patches?
02-I Should Have Known Better 92.4-97.7s 05-Here There And Everywhere 12.1-20.5s • High-pass 12 12 10 10
along time 8 8
6 6
to avoid chroma bin chroma bin sustained 4 4 2 2
notes 09-Martha My Dear 90.9-98.6s 12-Piggies 22.0-29.6s 12 12 Song filter 10 10 • 8 8
remove hits 6 6 chroma bin chroma bin in same 4 4 track 2 2 5 10 15 20 5 10 15 20 beat beat Similar Phrases in Music - Ellis 2007-12-18 p. 11 /12 Summary / Conclusions
Beat tracking
Locality Music Chroma Key Landmark Sensitive audio features normalization identification Hash Table • Lots of data find motifs by counting near neighbors • Common patterns e.g. melodic/harmonic-beat sequences • Future different features and/or pre-emphasis better landmark points complete dictionary
Similar Phrases in Music - Ellis 2007-12-18 p. 12 /12