<<

Searching for Similar Phrases in Music Audio Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Engineering, Columbia University, NY USA http://labrosa.ee.columbia.edu/

1. Motivation: Similar Phrases 2. Phrase Matching System 3. Experiments 4. Conclusions & Future

Similar Phrases in Music - Ellis 2007-12-18 p. 1 /12 1. Motivation: Similar Phrases • Idea: Music is a sequence of reused pieces e.g. melodic runs, chord sequences, ... • Can we identify them in large music databases? ... which we have i.e. machine learning • Applications classification and matching of pieces compressed representation data-driven

Similar Phrases in Music - Ellis 2007-12-18 p. 2 /12 Common Phrase Discovery

Beat tracking

Locality Music Chroma Key Landmark Sensitive audio features normalization identification Hash Table • Chop up music into short descriptions of musical content 24- beat-chroma matrices? • Choose a few that appear to be “starts” • Put into LSH table (similar items fall in same bin) • Find the bins with most entries

Similar Phrases in Music - Ellis 2007-12-18 p. 3 /12 2. Phrase Matching: Beat Tracking Ellis ’06,’07 • Goal: One feature vector per ‘beat’ (tatum) for tempo normalization, efficiency • “Onset Strength Envelope” sumf(max(0, difft(log |X(t, f)|))) 40 30 20 freq / mel 10

0

0 5 10 time / sec 15 • Autocorr. + window → global tempo estimate

0

0168.5100 BPM 200 300 400 500 600 700 800 900 1000 lag / 4 ms samples Similar Phrases in Music - Ellis 2007-12-18 p. 4 /12 Chroma Features • Chroma features convert spectral energy into musical weights in a canonical i.e. 12 bins Piano chromatic scale IF chroma

4 a z m

H G o k r

/

3 h Piano F q e r scale f 2 D 1 C

0 A 2 4 6 8 10 time / sec 100 200 300 400 500 600 700 time / frames • Can resynthesize as “Shepard Tones”

all octa12 Shepardves toneat spectra once Shepard tone resynth 0 4 -10 3 -20 level / dB

-30 freq / kHz 2 -40 1 -50 -60 0 0 500 1000 1500 2000 2500 freq / Hz 2 4 6 8 10 time / sec

Similar Phrases in Music - Ellis 2007-12-18 p. 5 /12 Key Estimation Ellis ICASSP ’07 • Covariance of chroma reflects key • Normalize by transposing for best fit Taxman Eleanor Rigby I'm Only Sleeping

G G G F F F

single Gaussian D D D model of one piece aligned chroma C C C A A A A C D F G A C D F G find ML rotation Love You To Aligned Global model Yellow Submarine G G G of other pieces F F F

D D D model all C C C

A A A transposed pieces A C D F G A C D F G A C D F G She Said She Said Good Day Sunshine And Your Bird Can Sing

iterate until G G G convergence F F F D D D C C C

A A A A C D F G A C D F G A C D F G aligned chroma Similar Phrases in Music - Ellis 2007-12-18 p. 6 /12 Landmark Location • Looking for “beginnings” of phrases e.g. abrupt change in harmony, instruments, etc. use likelihood ratio test: weighted windows either side of boundary vs. all Come Together - Spectrogram, Beat-sync chromogram, and top 10 segment points

5 freq / kHz 4 • Choose top 10 3 locally-normalized 2 1

peaks 0 .. to control 12 10 chroma bin data size 8

6

4

2

0 50 100 150 200 250 time / sec Similar Phrases in Music - Ellis 2007-12-18 p. 7 /12 Locality Sensitive Hashes • Goal: Quantize high-dimensional data so ‘similar’ items fall into same bin .. for fast and scalable nearest-neighbor search • Idea: Multiple random scalar projections each one will tend to keep neighbors nearby items close together in all projections are probably neighbors

from Slaney & Casey ‘08 Similar Phrases in Music - Ellis 2007-12-18 p. 8 /12 3. Experiments • Data “artist20” - 20 artist x 6 albums = 1413 tracks (up to) 10 landmarks/track = 14,078 patches each patch = 12 chroma bins x 24 beats (288 dims) # neighbors within 2.0 - 14078 a20 patches Performance 100 • 90

feature calculation: 80

~ 60 min 70 LSH 14k NNs: 60 50 ~ 30 sec count 40

51 patches have 30 >10 NNs 20 within r = 2.0 10 0 0 5 10 15 20 25 30 35 40 # near neighbors Similar Phrases in Music - Ellis 2007-12-18 p. 9 /12 Results - artist20

radiohead 01-You 192.7-198.5s green day 02-Blood Sex And Booze 122.3-129.8s 12 12

10 10

8 8

6 6 chroma bin chroma bin

4 4

2 2

radiohead 07-Ripcord 177.4-182.5s radiohead 11-Genchildren hidden 30.9-35.0s 12 12

10 10

8 8

6 6 chroma bin chroma bin

4 4

2 2

5 10 15 20 5 10 15 20 beat beat mainly sustained notes Similar Phrases in Music - Ellis 2007-12-18 p. 10 /12 Results - Beatles • Only the 86 Beatles tracks • All beat offsets = 41,705 patches LSH takes 300 sec - approx NlogN in patches?

02-I Should Have Known Better 92.4-97.7s 05-Here There And Everywhere 12.1-20.5s • High-pass 12 12 10 10

along time 8 8

6 6

to avoid chroma bin chroma bin sustained 4 4 2 2

notes 09-Martha My Dear 90.9-98.6s 12-Piggies 22.0-29.6s 12 12 Song filter 10 10 • 8 8

remove hits 6 6 chroma bin chroma bin in same 4 4 track 2 2 5 10 15 20 5 10 15 20 beat beat Similar Phrases in Music - Ellis 2007-12-18 p. 11 /12 Summary / Conclusions

Beat tracking

Locality Music Chroma Key Landmark Sensitive audio features normalization identification Hash Table • Lots of data find motifs by counting near neighbors • Common patterns e.g. melodic/harmonic-beat sequences • Future different features and/or pre-emphasis better landmark points complete dictionary

Similar Phrases in Music - Ellis 2007-12-18 p. 12 /12