Towards Automatic Extraction of Harmony Information from Music Signals
Thesis submitted in partial fulfilment of the requirements of the University of London for the Degree of Doctor of Philosophy

Christopher Harte

August 2010

Department of Electronic Engineering, Queen Mary, University of London

I certify that this thesis, and the research to which it refers, are the product of my own work, and that any ideas or quotations from the work of other people, published or otherwise, are fully acknowledged in accordance with the standard referencing practices of the discipline. I acknowledge the helpful guidance and support of my supervisor, Professor Mark Sandler.

Abstract

In this thesis we address the subject of automatic extraction of harmony information from audio recordings. We focus on chord symbol recognition and methods for evaluating algorithms designed to perform that task. We present a novel six-dimensional model for equal tempered pitch space based on concepts from neo-Riemannian music theory. This model is employed as the basis of a harmonic change detection function which we use to improve the performance of a chord recognition algorithm. We develop a machine readable text syntax for chord symbols and present a hand labelled chord transcription collection of 180 Beatles songs annotated using this syntax. This collection has been made publicly available and is already widely used for evaluation purposes in the research community. We also introduce methods for comparing chord symbols which we subsequently use for analysing the statistics of the transcription collection. To ensure that researchers are able to use our transcriptions with confidence, we demonstrate a novel alignment algorithm based on simple audio fingerprints that allows local copies of the Beatles audio files to be accurately aligned to our transcriptions automatically.
Evaluation methods for chord symbol recall and segmentation measures are discussed in detail and we use our chord comparison techniques as the basis for a novel dictionary-based chord symbol recall calculation. At the end of the thesis, we evaluate the performance of fifteen chord recognition algorithms (three of our own and twelve entrants to the 2009 MIREX chord detection evaluation) on the Beatles collection. Results are presented for several different evaluation measures using a range of evaluation parameters. The algorithms are compared with each other in terms of performance, but we also pay special attention to analysing and discussing the benefits and drawbacks of the different evaluation methods that are used.

In memory of Stefan Demuth

Acknowledgements

I am very glad to finally have the opportunity of thanking all the people who have helped me to produce this thesis. It’s taken a long time to finish it, so the list has become very long...

I would like to start by thanking my supervisor Mark Sandler for his guidance and support throughout the last seven years. His continued confidence in me and my work has helped me immensely. I would also like to thank Mike Davies, Mark Plumbley and Juan Bello for their helpful advice and enthusiasm, particularly in my first few years as a Ph.D student. My special thanks go to Matthew Davies for always being available to talk over ideas and particularly for his support while I was in the final stages of writing up. Without his help, I am quite sure I would never have completed this thesis.

I am greatly indebted to Laurie Cuthbert, Yue Chen and Andy Watson for reorganising my teaching commitments and extending work deadlines, giving me the space to complete my write up during term time. I would also like to thank my slightly neglected year 3 and year 4 students for their understanding this term.
I am especially grateful to the following people for sitting through the listening tests that enabled me to verify my Beatles chord transcriptions. I hope that their ears will one day forgive me: Amir Dotan, Jonny Amer, Matthew Davies, Katy Noland, Ramona Behravan, Andrew Nesbit, Peyman Heydarian, Dawn Black, Chris Cannam, Kurt Jacobson, Yves Raimond, Mark Levy, Andrew Robertson, Chris Sutton, Enrique Peres Gonzales, George Fazekas, Maria Jafari, Beiming Wang, Emanuel Vincent and Becky Stewart.

I am grateful to my friends and research colleagues at UPF Barcelona and OFAI in Vienna for being so welcoming and fun to work with. Thank you to Emilia Gómez, David Garcia, Fabien Gouyon, Elias Pampalk, Martin Gasser, Simon Dixon, Werner Goebel, David Stevens and Margit Demuth.

Thanks to Jan Weil, Craig Sapp and Johan Pauwels for help and comments on my transcriptions and chord tools. Special thanks also to Mert Bay for providing the raw output data from the MIREX09 chord detection competition and answering my many questions about the MIREX evaluation process.

Thanks also to my friends and colleagues past and present at Queen Mary. In no particular order: Anssi, Samer, Chris D, Chris L, Dawn, Katy, Piem, Nico, Andrew, Thomas, Manu, Matthias, Amélie, Dan S, Chris S, Yves, Frank, Ling Ma, Karen, Ho, Andy M, Xue, Bindu, Zhang Dan, Gina, Wang Yi, Yashu, Paula and Vindya.

Many thanks to Mum, Dad, Simon, Jan and David for helping to proof read bits of the thesis for me. Thanks to Simon Travaglia’s BOFH for providing me with amusement when I needed a break from writing.

Finally I would like to thank Lingling, my friends, family and my ever-patient landlady Margaret for putting up with me while I was writing up.

Contents

Abstract 3
Acknowledgements 5

1 Introduction 21
1.1 Motivation 21
1.2 Thesis structure 23
1.3 Contribution 27
1.4 Publications 28

2 Theoretical underpinnings 29
2.1 Musical fundamentals 29
2.1.1 Musical terms of reference 29
2.1.2 Pitch perception and the harmonic series 34
2.1.3 Pitch height and chroma 34
2.2 Harmony in western tonal music 37
2.2.1 Consonance and Dissonance 37
2.2.2 Chords 39
2.2.3 Tuning Systems 43
2.3 The Tonnetz 45
2.3.1 Longuet-Higgins’ pitch space 47
2.3.2 Assuming pitchname equivalence: Chew’s Spiral array 54
2.3.3 Toroidal models of tonal space 57
2.4 Extension of the toroidal pitch space model 60
2.4.1 Distances in the six dimensional tonal model 62

3 Chord recognition from audio 65
3.1 Basic chord recognition system 66
3.1.1 Audio front end 66
3.2 Improved system using segmentation algorithms 77
3.2.1 Tonal centroid 77
3.2.2 Harmonic change detection function 79
3.2.3 Chord segmentation with the HCDF 80
3.2.4 Analysis of HCDF performance 82

4 Representing chords in plain text 86
4.1 Problems for plain text chord symbols 87
4.1.1 Annotation style ambiguity 87
4.1.2 Undefined syntax ambiguity 88
4.1.3 Capitalisation and character choice ambiguity 88
4.1.4 Key context ambiguity 89
4.1.5 Chord label semantics 90
4.1.6 Inflexible labelling models 92
4.2 Development of a chord symbol syntax 93
4.2.1 Specification 93
4.2.2 Logical model 94
4.2.3 Developing a syntax for plain text 98
4.2.4 Shorthand 100
4.3 Formal definition of chord label syntax 105
4.4 A reference software toolkit 107

5 Chord symbol comparison methods 109
5.1 Manipulating chords and other musical objects 110
5.1.1 Pitchnames and intervals 111
5.2 Formalising the chord matching problem 116
5.2.1 String matching 117
5.2.2 Pitchname set comparison 118
5.2.3 Pitchclass set comparison 120
5.3 Chordtype comparison 122
5.3.1 String matching chordtypes 122
5.3.2 Interval set matching 123
5.3.3 Relative pitchclass set matching 124
5.4 Cardinality-limited comparison 124
5.5 Treatment of bass intervals 127
5.6 Chord likeness measure 128
5.6.1 Comparison of chord likeness function with TPSD 131

6 A reference transcription dataset 136
6.1 The Beatles studio albums 136
6.2 Transcription file format 139
6.3 The transcription process 139
6.4 Transcription policy 141
6.4.1 Treatment of the melody line 142
6.4.2 Consistent inconsistencies 142
6.5 Verification of transcriptions 144
6.5.1 Converting chord transcriptions to MIDI files 145
6.5.2 Synthesising MIDI files as digital audio 145
6.5.3 Listening tests 147
6.6 Transcription collection statistics 148
6.6.1 Chord cardinality 148
6.6.2 Chord symbol counts: frequency and duration 151
6.6.3 Chord roots 168
6.6.4 Triad content 172
6.6.5 Chord times 174

7 Local audio file alignment 179
7.1 Access to audio for annotated data sets 180
7.2 Aligning wave files using audio fingerprints 182
7.2.1 Audio fingerprinting 182
7.2.2 Fingerprint technique 182
7.2.3 Alignment of local audio files using fingerprints 184
7.2.4 Checking the alignment 186
7.3 Testing the fingerprinting technique 187
7.3.1 An alignment tool for the Beatles album collection 187
7.3.2 Fingerprint data files 189
7.3.3 Testing 191

8 Chord recognition evaluation methods 192
8.1 Chord symbol recall 193
8.1.1 Frame-based chord symbol recall 193
8.1.2 Segment-based chord symbol recall 195
8.1.3 Defining a correct match 196
8.1.4 Dictionary-based recall evaluation 205
8.2 Chord sequence likeness measure 212
8.3 Chord recognition segmentation measurement 213
8.3.1 Problems with using recall on its own 214
8.3.2 Directional hamming distance 216
8.4 Practical considerations 219
8.4.1 Frame-based vs segment-based recall evaluation 220
8.4.2 Combined chord recognition F-measure