<<

ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 11: Chroma and Chords

1. Features for Music Audio 2. Chroma Features 3. Chord Recognition

Dan Ellis Dept. Electrical Engineering, Columbia University [email protected] http://www.ee.columbia.edu/~dpwe/e4896/ E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 1 /18 1. Features for Music Audio • Challenges of large music databases how to find “what we want”... • Euclidean metaphor music tracks as points in space • What are the dimensions? “” - timbre, instruments → MFCC melody, chords → Chroma rhythm, tempo → Rhythmic bases E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 2 /18 MFCCs Logan 2000 • The standard feature for speech recognition

0.5 Sound 0

−0.5 FFT X[k] 0.25 0.255 0.26 0.265 0.27 time / s spectra 10 5

Mel scale 0 4 freq. warp 0x 10 1000 2000 3000 freq / Hz 2 1 0 log |X[k]| 0 5 10 15 freq / Mel 100 audspec 50 0

IFFT 0 5 10 15 freq / Mel 200

cepstra 0

−200 Truncate 0 10 20 30 quefrency

MFCCs E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 3 /18 MFCC Example • Resynthesize by imposing spectrum on noise MFCCs capture instruments, not notes

Let It Be - log-freq specgram (LIB-1)

6000 1400 freq / Hz 300

MFCCs

12 10 8 6 coefficient 4 2 Noise excited MFCC resynthesis (LIB-2)

6000 1400 freq / Hz 300

0 5 10 15 20 25 time / sec

E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 4 /18 MFCC Artist Classification Ellis 2007 • 20 Artists x 6 albums each train models on 5 albums, classify tracks from last • Model as MFCC mean + covariance per artist “single Gaussian” Confusion: MFCCs (acc 55.13%) u2 model tori_amos

true suzanne_vega steely_dan 20 (mean) roxette radiohead + 10 x 19 queen prince metallica (covariance) madonna led_zeppelin parameters green_day garth_brooks fleetwood_mac 55% correct depeche_mode dave_matthews_b (guessing ~5%) cure creedence_c_r beatles ma aerosmith me ae be ga qu u2 da de cu su gr ra ro pr to cr le st fl E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 5 /18 2. Chroma Features

• What about modeling75 tonal content (notes)? melody spotting 70 65

chord recognitionMIDI note number 60 cover songs... 55 50 MFCCs exclude45 tonal content • 40 Polyphonic • 75 transcription 70 65

is too hard MIDI note number 60 e.g. sinusoidal 55 tracking: 50 Recognized 45 True confused by 40 22 24 26 28 30 32 34 36 harmonics time / s • Chroma features as solution... E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 6 /18 Chroma Features Fujishima 1999 • Idea: Project all energy onto 12 regardless of maintains main “musical” distinction invariant to musical equivalence no need to worry about harmonics?

4 G G

chroma F 3 chroma F freq / kHz / freq D 2 D 1 C

A 0 A 50 100 150 2 4 6 8 fft bin time / sec 50 100 150 200 250 time / frame

NM C(b)= B(12 log (k/k ) b)W (k) X[k] 2 0 | | k=0 W(k) is weighting, B(b) selects every ~ mod12

E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 7 /18 Better Chroma • Problems: blurring of bins close to edges limitation of FFT bin resolution • Solutions: peak picking - only keep energy at center of peaks

4 G G

chroma F 3 chroma F freq / kHz / freq D 2 D C ( 1 ) C A 0 A 0 2000 freq / Hz 2 4 6 8 time / sec 50 100 150 200 time / frame

Instantaneous Frequency - high-resolution estimates adapt tuning center based on histogram of pitches

E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 8 /18 Chroma Resynthesis Ellis & Poliner 2007 • Chroma describes the notes in an octave ... but not the octave • Can resynthesize by presenting all ... with a smooth envelope “Shepard tones” - octave is ambiguous M b o+ b y (t)= W (o + ) cos 2 12 w t b 12 0 o=1 12 Shepard tone spectra Shepard tone resynth 0 4 -10 3 -20 level / dB

-30 freq / kHz 2 -40 1 -50 -60 0 0 500 1000 1500 2000 2500 freq / Hz 2 4 6 8 10 time / sec endless sequence illusion

E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 9 /18 Chroma Example • Simple Shepard tone resynthesis can also reimpose broad spectrum from MFCCs Let It Be - log-freq specgram (LIB-1)

6000 1400 freq / Hz 300

Chroma features B A G E chroma bin D C Shepard tone resynthesis of chroma (LIB-3) 6000 1400 freq / Hz 300

MFCC-filtered shepard tones (LIB-4)

6000 1400 freq / Hz 300

0 5 10 15 20 25 time / sec E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 10 /18 -Synchronous Chroma Bartsch & Wakefield 2001 • Drastically reduce data size by recording one chroma frame per beat

Let It Be - log-freq specgram (LIB-1)

6000 1400 freq / Hz / freq 300

Onset envelope + beat times

Beat-synchronous chroma B A G E chroma bin D C Beat-synchronous chroma + Shepard resynthesis (LIB-6)

6000 1400 freq / Hz / freq 300

0510152025 time / sec

E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 11 /18 3. Chord Recognition • Beat synchronous chroma look like chords

G E D chroma bin C A 05101520... time / sec A-C-E C-E-G B-D-G A-C-D-F can we transcribe them? • Two approaches manual templates (prior knowledge) learned models (from training data)

E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 12 /18 Chord Recognition System Sheh & Ellis 2003 • Analogous to speech recognition Gaussian models of features for each chord Hidden Markov Models for chord transitions

Beat track

beat-synchronous Audio 100-1600 Hz Chroma BPF chroma features HMM chord Viterbi labels

B 25-400 Hz Chroma A C maj BPF G test E D 24 C train C D E G A B Gauss B Root A c min Gaussian Unnormalize G normalize models E D C C D E G A B

Labels Resample b

a

g

f e Count d 24x24 c B transitions transition A G

F matrix E D

C

C D E F G A B c d e f g a b E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 13 /18 HMMs • Hidden Markov Models are good for .8 .8 inferring hidden states S qn+1 .1 p(qn+1|qn) S A B C E A .1 B underlying Markov S 0 1 0 0 0 .1 .1 A 0 .8 .1 .1 0 “generative model” .1 .1 qn B 0 .1 .8 .1 0 C C 0 .1 .1 .7 .1 each state has .1 E 0 0 0 0 1 E emission distribution .7 S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E

State sequence observations Emission distributions AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC q = A q = B q = C ) 0.8 Observation q 3 | sequence

tell us x 0.6 ( n

p 2 something 0.4 x 0.2 1 about state... 0 0 )

q 0.8 | q = A q = B q = C 3 x

( 0.6 p n 2 infer smoothed 0.4 x 0.2 1

0 0 state sequence 0 1 2 3 4 0 10 20 30 observation x time step n E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 14 /18 HMM Inference HMM defines emission distribution p(x q) • | and transition probabilities p(qn qn 1) | • Likelihood of observed given state sequence: p( xn qn )= p(xn qn)p(qn qn 1) { }|{ } | | n

Model M1 x Observations Observation n x , x , x likelihoods 0.7 0.8 p(x|B) 1 2 3 p(x|q) x1 x2 x3 0.9 0.2 0.2 p(x|A) A 2.5 0.2 0.1 S A B E q { 0.1 2.2 2.3 0.1 0.1 1 23n B S A B E Paths S • 0.9 0.1 • E A • 0.7 0.2 0.1 0.8 0.8 0.2 By dynamic programming, B • • 0.8 0.2 B 0.1 0.2 0.2 0.1 • E • • • 1 A

States 0.7 0.7 S 0.9 we can also identify the 01234time n All possible 3-emission paths Qk from S to E best state sequence given q0 q1 q2 q3 q4 p(Q | M) = Πn p(qn|qn-1) p(X | Q,M) = Πn p(xn|qn) p(X,Q | M) S AAAE .9 x .7 x .7 x .1 = 0.0441 2.5 x 0.2 x 0.1 = 0.05 0.0022 S AAB E .9 x .7 x .2 x .2 = 0.0252 2.5 x 0.2 x 2.3 = 1.15 0.0290 just the observations S A BBE .9 x .2 x .8 x .2 = 0.0288 2.5 x 2.2 x 2.3 = 12.65 0.3643 S BBBE .1 x .8 x .8 x .2 = 0.0128 0.1 x 2.2 x 2.3 = 0.506 0.0065 Σ = 0.1109 Σ = p(X | M) = 0.4020 E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 15 /18 Key Normalization • Chord transitions depend on key of piece dominant, relative minor, etc...

Taxman Eleanor Rigby I'm Only Sleeping

G G G F F F • Chord transition D D D aligned chroma C C C

probabilities should A A A ACDFG ACDFG be key-relative Love You ToAligned Global model Yellow Submarine G G G estimate main key of F F F piece D D D C C C rotate all chroma A A A ACDFG ACDFG ACDFG features She Said She Said Good Day Sunshine And Your Bird Can Sing G G G learn models F F F D D D C C C

A A A ACDFG ACDFG ACDFG aligned chroma E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 16 /18 Chord Recognition • Often works: Let It Be/06-Let It Be freq / Hz / freq 2416

Audio 761

240

Ground truth C G A:min C G F C C G A:min F:maj7 F:maj6 F:maj7 chord A:min/b7 A:min/b7 G F Beat- E synchronous D chroma C B A Recognized CGaFCGFC Ga

0 2 4 6 8 10 12 14 16 18 • But only about 60% of the time

E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 17 /18 Summary • Music Audio Features capture information useful for classification

• Chroma Features 12 bins to robustly summarize notes

• Chord Recognition Sometimes easy, sometimes subtle

E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 18 /18 References • B. Logan, “Mel frequency cepstral coefficients for music modeling,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR, Plymouth, September 2000. • D. Ellis, “Classifying Music Audio with Timbral and Chroma Features,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR-07, pp. 339-340, Vienna, October 2007. • T. Fujishima, “Realtime chord recognition of musical sound: A system using common lisp music,” In Proc. Int. Comp. Music Conf., pp. 464–467, Beijing, 1999. • D. Ellis and G. Poliner, “Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking,” Proc. ICASSP-07, pp. IV-1429-1432, Hawai'i, April 2007. • M. A. Bartsch and G. H. Wakefield, “To catch a : Using chroma-based representations for audio thumbnailing,” in Proc. IEEE WASPAA, Mohonk, October 2001. • A. Sheh and D. Ellis, “Chord Segmentation and Recognition using EM-Trained Hidden Markov Models,” Int. Symp. Music Inf. Retrieval ISMIR-03, pp. 185-191, Baltimore, October 2003.

E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - 19 /18