ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 11: Chroma and Chords
1. Features for Music Audio 2. Chroma Features 3. Chord Recognition
Dan Ellis
Dept. Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 1 /18

1. Features for Music Audio
• Challenges of large music databases
  how to find “what we want”...
• Euclidean metaphor
  music tracks as points in space
• What are the dimensions?
  “sound” - timbre, instruments → MFCC
  melody, chords → Chroma
  rhythm, tempo → Rhythmic bases

MFCCs (Logan 2000)
• The standard feature for speech recognition
[Figure: MFCC computation pipeline. Sound → FFT X[k] → spectra → Mel scale freq. warp → log |X[k]| → audspec → IFFT → cepstra → Truncate → MFCCs. Axes: time / s, freq / Hz, freq / Mel, quefrency]
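The pipeline in the figure can be sketched for a single frame as follows. This is a minimal illustration, not the course's reference code: the mel formula, the triangular filterbank, the 40-band / 13-coefficient sizes, and the use of a DCT-II in place of the slide's IFFT step are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (assumed; the slide does not name one)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly in mel; shape (n_mels, n_fft//2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc_frame(frame, sr, n_mels=40, n_ceps=13):
    """One frame: FFT -> mel warp -> log -> DCT -> truncate."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
    audspec = mel_filterbank(n_mels, n_fft, sr) @ spectrum
    log_audspec = np.log(audspec + 1e-10)   # small floor avoids log(0)
    # DCT-II as a matrix product; keeping only n_ceps rows is the truncation
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_mels)
    return dct @ log_audspec

sr = 16000
t = np.arange(512) / sr
frame = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 880.0 * t)
coeffs = mfcc_frame(frame, sr)
print(coeffs.shape)
```

Truncating to the low-order coefficients keeps the broad spectral envelope and discards fine harmonic detail.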
MFCC Example
• Resynthesize by imposing spectrum on noise
  MFCCs capture instruments, not notes
[Figure: three panels against time / sec: Let It Be - log-freq specgram (LIB-1), freq / Hz; MFCCs, coefficient; Noise excited MFCC resynthesis (LIB-2), freq / Hz]
MFCC Artist Classification (Ellis 2007)
• 20 Artists x 6 albums each
  train models on 5 albums, classify tracks from last
• Model as MFCC mean + covariance per artist
  “single Gaussian” model
  20 (mean) + 10 x 19 (covariance) parameters
  55% correct (guessing ~5%)

[Figure: Confusion: MFCCs (acc 55.13%); true artist against model. Artists: aerosmith, beatles, creedence_c_r, cure, dave_matthews_b, depeche_mode, fleetwood_mac, garth_brooks, green_day, led_zeppelin, madonna, metallica, prince, queen, radiohead, roxette, steely_dan, suzanne_vega, tori_amos, u2]

2. Chroma Features
• What about modeling tonal content (notes)?
  melody spotting
  chord recognition
  cover songs...
• MFCCs exclude tonal content
• Polyphonic transcription is too hard
  e.g. sinusoidal tracking: confused by harmonics

[Figure: Recognized vs. True notes; MIDI note number against time / s]

• Chroma features as solution...

Chroma Features (Fujishima 1999)
• Idea: Project all energy onto 12 semitones regardless of octave
  maintains main “musical” distinction
  invariant to musical equivalence
  no need to worry about harmonics?
[Figure: chroma weighting against fft bin and freq / kHz; chroma against time / sec and time / frame]

$$C(b) = \sum_{k=0}^{N_M} B\big(12 \log_2(k/k_0) - b\big)\, W(k)\, |X[k]|$$

W(k) is a weighting; B(·) selects arguments that are ≈ 0 mod 12
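A minimal sketch of this projection, assuming a hard nearest-semitone choice for B and a simple band-pass for W(k). The reference frequency `f_ref` (C4, mapped to chroma bin 0), the band limits, and the function name are illustrative assumptions, not values from the slide.

```python
import numpy as np

def chroma_from_spectrum(mag, sr, n_fft, f_ref=261.63, fmin=100.0, fmax=2000.0):
    """Fold an FFT magnitude spectrum onto 12 chroma bins.

    f_ref is the frequency assigned to chroma bin 0 (assumed C4).
    W(k) is 1 inside [fmin, fmax] and 0 outside (assumed weighting).
    """
    chroma = np.zeros(12)
    freqs = np.arange(len(mag)) * sr / n_fft
    for k in range(1, len(mag)):           # skip DC, where log2 is undefined
        if not (fmin <= freqs[k] <= fmax):
            continue                        # W(k) = 0 outside the band
        semis = 12.0 * np.log2(freqs[k] / f_ref)
        b = int(np.round(semis)) % 12       # B(.) keeps the nearest semitone, mod 12
        chroma[b] += mag[k]
    return chroma

sr, n_fft = 8000, 4096
t = np.arange(n_fft) / sr
x = np.sin(2 * np.pi * 440.0 * t) + np.sin(2 * np.pi * 880.0 * t)
mag = np.abs(np.fft.rfft(x * np.hanning(n_fft)))
chroma = chroma_from_spectrum(mag, sr, n_fft)
names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
print(names[int(np.argmax(chroma))])
```

Both test tones (440 Hz and 880 Hz) are an octave apart, so they land in the same chroma bin, which is the octave invariance the slide describes.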
Better Chroma
• Problems:
  blurring of bins close to edges
  limitation of FFT bin resolution
• Solutions:
  peak picking - only keep energy at center of peaks

[Figure: peak-picked spectrum against freq / Hz and freq / kHz; chroma against time / sec and time / frame]

  Instantaneous Frequency - high-resolution estimates
  adapt tuning center based on histogram of pitches
Chroma Resynthesis (Ellis & Poliner 2007)
• Chroma describes the notes in an octave
  ... but not the octave
• Can resynthesize by presenting all octaves
  ... with a smooth envelope
  “Shepard tones” - octave is ambiguous

$$y_b(t) = \sum_{o=1}^{M} W\!\left(o + \tfrac{b}{12}\right) \cos\!\left(2^{\,o + b/12}\, w_0 t\right)$$

[Figure: Shepard tone spectra, level / dB against freq / Hz; Shepard tone resynth, freq / kHz against time / sec]

  endless sequence illusion
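The Shepard-tone sum can be sketched directly from the formula. The base frequency `f0` (27.5 Hz, so that b = 0 falls on A), the number of octaves, and the Gaussian shape of the envelope W are assumptions made for illustration; the slide only asks for a smooth envelope.

```python
import numpy as np

def shepard_tone(b, dur=0.5, sr=8000, f0=27.5, n_oct=6, center=4.0, width=1.0):
    """y_b(t) = sum over o = 1..M of W(o + b/12) cos(2^(o + b/12) w0 t)."""
    t = np.arange(int(dur * sr)) / sr
    w0 = 2.0 * np.pi * f0
    y = np.zeros_like(t)
    for o in range(1, n_oct + 1):
        pos = o + b / 12.0                                      # octaves above f0
        weight = np.exp(-0.5 * ((pos - center) / width) ** 2)   # smooth envelope W
        y += weight * np.cos((2.0 ** pos) * w0 * t)
    return y

def chroma_to_audio(chroma, **kw):
    """Weight the 12 Shepard tones by one chroma vector and sum them."""
    return sum(c * shepard_tone(b, **kw) for b, c in enumerate(chroma))

y = shepard_tone(0)
peak_hz = np.argmax(np.abs(np.fft.rfft(y))) * 8000 / len(y)
print(len(y), peak_hz)
```

Because every octave of the pitch class is present and the envelope fades at both ends, stepping b upward past 11 returns to nearly the same spectrum, which is what makes the endless sequence illusion possible.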
Chroma Example
• Simple Shepard tone resynthesis
  can also reimpose broad spectrum from MFCCs

[Figure: four panels against time / sec: Let It Be - log-freq specgram (LIB-1), freq / Hz; Chroma features, chroma bin; Shepard tone resynthesis of chroma (LIB-3), freq / Hz; MFCC-filtered shepard tones (LIB-4), freq / Hz]
Beat-Synchronous Chroma (Bartsch & Wakefield 2001)
• Drastically reduce data size by recording one chroma frame per beat

[Figure: four panels against time / sec: Let It Be - log-freq specgram (LIB-1), freq / Hz; Onset envelope + beat times; Beat-synchronous chroma, chroma bin; Beat-synchronous chroma + Shepard resynthesis (LIB-6), freq / Hz]
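One way to get one chroma frame per beat is to average the frames that fall between consecutive beat times. The sketch below assumes beat times are already available from a beat tracker; `beat_sync` and the synthetic input are hypothetical, and averaging (rather than, say, taking a median) is an assumed choice.

```python
import numpy as np

def beat_sync(chroma, frame_times, beat_times):
    """Average chroma frames (12 x n_frames) between consecutive beat times."""
    frame_times = np.asarray(frame_times)
    out = []
    for start, end in zip(beat_times[:-1], beat_times[1:]):
        idx = (frame_times >= start) & (frame_times < end)
        if idx.any():
            out.append(chroma[:, idx].mean(axis=1))
        else:
            out.append(np.zeros(chroma.shape[0]))   # no frames fell in this beat
    return np.stack(out, axis=1)

rng = np.random.default_rng(0)
chroma = rng.random((12, 80))                # stand-in for real chroma frames
frame_times = np.arange(80) * 0.125          # one frame every 125 ms
beat_times = [0.0, 2.5, 5.0, 7.5, 10.0]      # hypothetical beat tracker output
sync = beat_sync(chroma, frame_times, beat_times)
print(chroma.shape, "->", sync.shape)
```

In this toy case 80 frames collapse to 4 beat-length frames, which is the data reduction the slide refers to.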
3. Chord Recognition
• Beat-synchronous chroma look like chords

[Figure: beat-synchronous chroma, chroma bin against time / sec, with segments labelled A-C-E, C-E-G, B-D-G, A-C-D-F]

  can we transcribe them?
• Two approaches
  manual templates (prior knowledge)
  learned models (from training data)
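The manual-template approach can be sketched as matching each chroma frame against binary triad patterns. The 24 major/minor templates and the cosine-similarity score are assumptions chosen for illustration; the slide does not specify a template set or a distance.

```python
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad_templates():
    """24 binary templates: 12 major (root, +4, +7) and 12 minor (root, +3, +7)."""
    templates = {}
    for root in range(12):
        for name, third in (("maj", 4), ("min", 3)):
            t = np.zeros(12)
            t[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
            templates[f"{NOTES[root]}:{name}"] = t
    return templates

def label_chord(chroma, templates):
    """Return the label whose template has the highest cosine similarity."""
    chroma = np.asarray(chroma, dtype=float)
    def score(t):
        return chroma @ t / (np.linalg.norm(chroma) * np.linalg.norm(t) + 1e-12)
    return max(templates, key=lambda k: score(templates[k]))

templates = triad_templates()
frame = np.zeros(12)
frame[[9, 0, 4]] = [1.0, 0.8, 0.9]     # energy on A, C, E
print(label_chord(frame, templates))
```

A frame with energy on A, C and E matches the A minor template best; four-note patterns such as A-C-D-F have no exact triad match, which is one reason learned models are the alternative.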
Chord Recognition System (Sheh & Ellis 2003)
• Analogous to speech recognition
  Gaussian models of features for each chord
  Hidden Markov Models for chord transitions

[Figure: system block diagram. Blocks: Audio; Beat track; 100-1600 Hz BPF, Chroma; 25-400 Hz BPF, Chroma; beat-synchronous chroma features; train / test; Labels, Resample; Root normalize, Gaussian, Unnormalize; 24 Gauss models (e.g. C maj, c min); Count transitions, 24x24 transition matrix; HMM Viterbi; chord labels]
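The two trained components, per-chord Gaussians and a counted transition matrix, can be sketched as below. This is an illustration under assumptions: chords are integer indices, the example uses 3 chords rather than 24, the add-one smoothing and the diagonal covariance regularizer are not from the slide, and the root normalize / unnormalize step of the diagram is omitted.

```python
import numpy as np

def fit_chord_models(chroma, labels, n_chords):
    """Per-chord Gaussian (mean, covariance) from labelled beat-synchronous chroma.

    chroma: (n_beats, 12) array; labels: one chord index per beat.
    """
    labels = np.asarray(labels)
    models = {}
    for c in range(n_chords):
        frames = chroma[labels == c]
        if len(frames) < 2:
            continue                      # too little data for a covariance
        # Small diagonal term keeps the covariance invertible (assumed regularizer)
        cov = np.cov(frames, rowvar=False) + 1e-3 * np.eye(chroma.shape[1])
        models[c] = (frames.mean(axis=0), cov)
    return models

def count_transitions(labels, n_chords, smooth=1.0):
    """Row-normalized p(q_{n+1} | q_n) counted from a label sequence."""
    counts = np.full((n_chords, n_chords), smooth)   # add-one smoothing (assumed)
    for prev, nxt in zip(labels[:-1], labels[1:]):
        counts[prev, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)

labels = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2]
rng = np.random.default_rng(0)
chroma = rng.random((len(labels), 12))    # stand-in for real features
models = fit_chord_models(chroma, labels, n_chords=3)
trans = count_transitions(labels, n_chords=3)
print(trans.round(2))
```

Self-transitions dominate each row because chords usually persist across several beats.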
HMMs
• Hidden Markov Models are good for inferring hidden states
  underlying Markov “generative model”
  each state has emission distribution
  observations tell us something about state...
  infer smoothed state sequence

Transition probabilities p(qn+1|qn), rows qn, columns qn+1:

       S    A    B    C    E
  S    0    1    0    0    0
  A    0   .8   .1   .1    0
  B    0   .1   .8   .1    0
  C    0   .1   .1   .7   .1
  E    0    0    0    0    1

[Figure: state diagram over S, A, B, C, E; example state sequence S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E; emission distributions p(x|q) for q = A, q = B, q = C against observation x; observation sequence xn against time step n]

HMM Inference
• HMM defines emission distribution p(x|q) and transition probabilities p(qn|qn−1)
• Likelihood of the observations together with a state sequence:

$$p(\{x_n\}, \{q_n\}) = \prod_n p(x_n | q_n)\, p(q_n | q_{n-1})$$
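The product above can be evaluated by brute force, by the forward recursion, or with Viterbi for the single best path. The sketch below uses the small model M1 from this slide's worked example (transition and observation-likelihood values copied from the slide); the dictionary layout and variable names are illustrative choices.

```python
from itertools import product

# Model M1: transition probabilities p(q_n | q_{n-1}); missing entries are 0
trans = {
    "S": {"A": 0.9, "B": 0.1},
    "A": {"A": 0.7, "B": 0.2, "E": 0.1},
    "B": {"B": 0.8, "E": 0.2},
}
# Observation likelihoods p(x_n | q) for x_1, x_2, x_3
obs = {"A": [2.5, 0.2, 0.1], "B": [0.1, 2.2, 2.3]}
states = ["A", "B"]

def joint(path):
    """p(X, Q | M) for one emitting path, including S -> q_1 and q_3 -> E."""
    p, prev = 1.0, "S"
    for n, q in enumerate(path):
        p *= trans[prev].get(q, 0.0) * obs[q][n]
        prev = q
    return p * trans[prev].get("E", 0.0)

# Brute force: sum the joint over every length-3 path
total_enum = sum(joint(path) for path in product(states, repeat=3))

# Forward recursion: alpha[q] = p(x_1..x_n, q_n = q)
alpha = {q: trans["S"].get(q, 0.0) * obs[q][0] for q in states}
for n in (1, 2):
    alpha = {q: obs[q][n] * sum(alpha[r] * trans[r].get(q, 0.0) for r in states)
             for q in states}
total_fwd = sum(alpha[q] * trans[q]["E"] for q in states)

# Viterbi: replace the sum with a max and carry the best path along
delta = {q: (trans["S"].get(q, 0.0) * obs[q][0], [q]) for q in states}
for n in (1, 2):
    new = {}
    for q in states:
        score, path = max(
            ((delta[r][0] * trans[r].get(q, 0.0), delta[r][1]) for r in states),
            key=lambda sp: sp[0],
        )
        new[q] = (score * obs[q][n], path + [q])
    delta = new
best_score, best_path = max(
    ((delta[q][0] * trans[q]["E"], delta[q][1]) for q in states),
    key=lambda sp: sp[0],
)
print(round(total_enum, 4), round(total_fwd, 4), best_path, round(best_score, 4))
```

The forward recursion reaches the same total as enumerating all paths, but its cost grows linearly with sequence length instead of exponentially, which is why dynamic programming is used in practice.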
[Figure: Model M1 with states S, A, B, E; observations x1, x2, x3; trellis of states against time n = 0..4 showing the paths from S to E]

Model M1 transition probabilities p(qn|qn−1), rows qn−1, columns qn:

       S    A    B    E
  S    •   0.9  0.1   •
  A    •   0.7  0.2  0.1
  B    •    •   0.8  0.2
  E    •    •    •    1

Observation likelihoods p(x|q):

        x1   x2   x3
  A    2.5  0.2  0.1
  B    0.1  2.2  2.3

All possible 3-emission paths Qk from S to E:

  q0 q1 q2 q3 q4   p(Q | M) = Πn p(qn|qn-1)       p(X | Q,M) = Πn p(xn|qn)     p(X,Q | M)
  S  A  A  A  E    .9 x .7 x .7 x .1 = 0.0441     2.5 x 0.2 x 0.1 = 0.05       0.0022
  S  A  A  B  E    .9 x .7 x .2 x .2 = 0.0252     2.5 x 0.2 x 2.3 = 1.15       0.0290
  S  A  B  B  E    .9 x .2 x .8 x .2 = 0.0288     2.5 x 2.2 x 2.3 = 12.65      0.3643
  S  B  B  B  E    .1 x .8 x .8 x .2 = 0.0128     0.1 x 2.2 x 2.3 = 0.506      0.0065
                   Σ = 0.1109                                                  Σ = p(X | M) = 0.4020

• By dynamic programming, we can also identify the best state sequence given just the observations

Key Normalization
• Chord transitions depend on key of piece
  dominant, relative minor, etc...
• Chord transition probabilities should be key-relative
  estimate main key of piece
  rotate all chroma features
  learn models

[Figure: panels for Taxman, Eleanor Rigby, I'm Only Sleeping, Love You To, Yellow Submarine, She Said She Said, Good Day Sunshine, And Your Bird Can Sing, plus an Aligned Global model; both axes: aligned chroma]
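The estimate-then-rotate steps can be sketched with a circular shift of the chroma axis. The binary C-major scale profile used here is only a stand-in for a learned global model, and both function names are hypothetical; the slide does not say how the main key is estimated.

```python
import numpy as np

# Binary C-major scale profile; a stand-in for a learned global model (assumption)
MAJOR_PROFILE = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1], dtype=float)

def estimate_key_shift(chroma, profile=MAJOR_PROFILE):
    """Rotation in semitones that best aligns the track's mean chroma with the profile."""
    mean_chroma = chroma.mean(axis=1)
    scores = [np.roll(mean_chroma, -s) @ profile for s in range(12)]
    return int(np.argmax(scores))

def key_normalize(chroma, profile=MAJOR_PROFILE):
    """Rotate every chroma frame (12 x n_beats) so the estimated key lands on bin 0."""
    shift = estimate_key_shift(chroma, profile)
    return np.roll(chroma, -shift, axis=0), shift

# Toy track whose energy sits on the D major scale (C major shifted up 2 semitones)
track = np.tile(np.roll(MAJOR_PROFILE, 2)[:, None], (1, 8))
aligned, shift = key_normalize(track)
print(shift)
```

After rotation, tracks in different keys share one coordinate system, so a single set of chord models and transition probabilities can be learned from all of them.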
Chord Recognition
• Often works:

[Figure: Let It Be/06-Let It Be, time 0 to 18. Panels: Audio, freq / Hz; Ground truth chord, with labels including C, G, A:min, A:min/b7, F:maj7, F:maj6, F; Beat-synchronous chroma; Recognized: C G a F C G F C G a]

• But only about 60% of the time
Summary
• Music Audio Features
  capture information useful for classification
• Chroma Features
  12 bins to robustly summarize notes
• Chord Recognition
  Sometimes easy, sometimes subtle
References
• B. Logan, “Mel frequency cepstral coefficients for music modeling,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR, Plymouth, September 2000.
• D. Ellis, “Classifying Music Audio with Timbral and Chroma Features,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR-07, pp. 339-340, Vienna, October 2007.
• T. Fujishima, “Realtime chord recognition of musical sound: A system using common lisp music,” in Proc. Int. Comp. Music Conf., pp. 464-467, Beijing, 1999.
• D. Ellis and G. Poliner, “Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking,” in Proc. ICASSP-07, pp. IV-1429-1432, Hawai'i, April 2007.
• M. A. Bartsch and G. H. Wakefield, “To catch a chorus: Using chroma-based representations for audio thumbnailing,” in Proc. IEEE WASPAA, Mohonk, October 2001.
• A. Sheh and D. Ellis, “Chord Segmentation and Recognition using EM-Trained Hidden Markov Models,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR-03, pp. 185-191, Baltimore, October 2003.