Audio Description and Corpus Analysis of Popular Music
Beschrijving en corpusanalyse van populaire muziek (with a summary in Dutch)

Dissertation for the degree of doctor at Utrecht University, under the authority of the Rector Magnificus, Prof. dr. G.J. van der Zwaan, in accordance with the decision of the Doctorate Board, to be defended in public on Wednesday 15 June 2016 at 2.30 p.m., by Jan Markus Harrie van Balen, born on 26 October 1988 in Leuven, Belgium.

Promotor: Prof. dr. R.C. Veltkamp
Copromotor: Dr. F. Wiering

Jan Van Balen, 2016. This thesis is licensed under a Creative Commons Attribution 4.0 International license. The research in this thesis was supported by the NWO CATCH project COGITCH (640.005.004).

ABSTRACT

In the field of sound and music computing, only a handful of studies are concerned with the pursuit of new musical knowledge. There is a substantial body of corpus analysis research focused on new musical insight, but almost all of it deals with symbolic data: scores, chords or manual annotations. In contrast, and despite the wide availability of audio data and tools for audio content analysis, very little work has been done on the corpus analysis of audio data.

This thesis presents a number of contributions to the scientific study of music, based on audio corpus analysis. We focus on three themes: audio description, corpus analysis methodology, and the application of these description and analysis techniques to the study of music similarity and ‘hooks’.

On the theme of audio description, we first present, in Part I, an overview of the audio description methods that have been proposed in the music information retrieval literature, focusing on timbre, harmony and melody. We critically review current practices in terms of their relevance to audio corpus analysis. Throughout Parts II and III, we then propose new feature sets and audio description strategies.
Contributions include the introduction of audio bigram features, pitch descriptors that can be used for retrieval as well as corpus analysis, and second-order audio features, which quantify the distinctiveness and recurrence of feature values given a reference corpus.

On the theme of audio corpus analysis methodology, we first situate corpus analysis in the disciplinary context of music information retrieval, empirical musicology and music cognition. In Part I, we then present a review of audio corpus analysis, and a case study comparing two influential corpus-based investigations into the evolution of popular music [122, 175]. Based on this analysis, we formulate a set of nine recommendations for audio corpus analysis research. In Parts II and III, we present, alongside the new audio description techniques, four new analysis methods for the study of song sections and within-song variation in a large corpus. Contributions on this theme include the first use of a probabilistic graphical model for the analysis of audio features.

Finally, we apply the new audio description and corpus analysis techniques to address two research problems of the COGITCH project, of which our research was a part: improving audio-based models of music similarity, and the analysis of hooks in popular music. In Parts I and II, we introduce soft audio fingerprinting, an umbrella MIR task that includes any efficient audio-based content identification. We then focus on the problem of scalable cover song detection, and evaluate several solutions based on audio bigram features. In Part III, we review the prevailing perspectives on musical catchiness, recognisability and hooks. We describe Hooked, a game we designed to collect data on the recognisability of a set of song fragments. We then present a corpus analysis of hooks, and new findings on what makes music catchy.
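The audio bigram idea above can be pictured as counting co-occurrences of successive quantised feature values. The following is a minimal sketch of that picture only: the function name, the simplification to one dominant pitch class per frame, and the toy sequence are illustrative assumptions, not the features the thesis actually develops (those are defined in Chapters 5 and 6).

```python
import numpy as np

def audio_bigrams(frames, n_bins=12):
    """Count transitions between successive quantised feature values.

    frames: sequence of per-frame feature indices in [0, n_bins),
    e.g. the dominant pitch class of each analysis frame.
    Returns an (n_bins, n_bins) matrix of bigram counts.
    """
    counts = np.zeros((n_bins, n_bins), dtype=int)
    # Each consecutive pair of frames contributes one count.
    for a, b in zip(frames[:-1], frames[1:]):
        counts[a, b] += 1
    return counts

# Toy pitch-class sequence: C, C, E, G, C
M = audio_bigrams([0, 0, 4, 7, 0])
print(M[0, 0], M[0, 4], M[4, 7], M[7, 0])  # 1 1 1 1
```

A normalised matrix of this kind can serve both as a compact fingerprint for retrieval and as a per-song descriptor for corpus-level comparison, which is the dual use the abstract refers to.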
Across the three themes above, we present several contributions to the available methods and technologies for audio description and audio corpus analysis. Along the way, we present new insights into choruses, catchiness, recognisability and hooks. By applying the proposed technologies, following the proposed methods, we show that rigorous audio corpus analysis is possible and that the technologies to engage in it are available.

ACKNOWLEDGEMENTS

For the last four years, I have been able to do work I thoroughly enjoy on a topic I truly care about, and I have been able to do this because of the vision and work of everyone involved in making the COGITCH project a reality.

I am very grateful to my supervisors Frans and Remco for their guidance, feedback and ideas. Your help has been essential to this thesis, and the freedom you have given me to explore many different topics from many different angles has helped me find my passion as a researcher.

I also want to thank my closest collaborators on this project, Dimitrios and Ashley, for all the amazing work they did, and for the many discussions we had at the office. This research would have been far less exciting without the amazing dataset that is the Hooked! and Hooked on Music data. It wouldn’t have existed without their tireless work on both of the games.

It also wouldn’t have existed without the work of Henkjan Honing. I am grateful for his initiative and guidance, and for making me feel welcome in the Music Cognition Group and music reading group at the University of Amsterdam.

I’m also grateful for many insightful discussions with colleagues at the Meertens Institute and Beeld en Geluid. I am especially grateful to everyone at the Meertens Institute for giving me a place to work in Amsterdam, and to the members of Tunes and Tales and FACT for their company. A special word of gratitude goes out to Louis Grijp.
Louis’ extraordinary vision laid the foundations for much of the digital music research in the Netherlands, and the COGITCH project would not have existed without his decades of work on the Dutch Song Database.

For their support inside and outside the office, I also thank my fellow PhD students in the department (Marcelo, Vincent, Anna), as well as those in Amsterdam, London and Barcelona, and at the different ISMIR conferences I was lucky to attend; I learned a lot from the interactions with students of my generation.

Finally, I would like to thank my brothers and my parents for their patience, hospitality and help; you gave me a welcome home in Belgium, London and South Africa. A very big thank you also goes to Maria for her patience and support, and for letting me turn our house into a coffee lab during the last months. And last but not least, I must thank my friends in Leuven and Amsterdam: thank you for the drinks and dinners that kept me sane.

CONTENTS

Part I: Introduction to Audio Features and Corpus Analysis

1 Introduction
  1.1 Research Goals
    1.1.1 Audio Corpus Analysis
    1.1.2 The COGITCH Project
    1.1.3 Research Goals
  1.2 Disciplinary Context
    1.2.1 Musicology in the Twentieth Century
    1.2.2 Music Information Retrieval
    1.2.3 Empirical Musicology and Music Cognition
    1.2.4 Why Audio?
    1.2.5 Conclusion
  1.3 Outline
    1.3.1 Structure
    1.3.2 How to Read This Thesis
2 Audio Description
  2.1 Audio Features
    2.1.1 Basis Features
    2.1.2 Timbre Description
    2.1.3 Harmony Description
    2.1.4 Melody Extraction and Transcription
    2.1.5 Psycho-acoustic Features
    2.1.6 Learned Features
  2.2 Applications of Audio Features
    2.2.1 Audio Descriptors and Classification
    2.2.2 Structure Analysis
    2.2.3 Audio Fingerprinting and Cover Song Detection
  2.3 Summary
3 Audio Corpus Analysis
  3.1 Audio Corpus Analysis
    3.1.1 Corpus Analysis
    3.1.2 Audio Corpus Analysis
  3.2 Review: Corpus Analysis in Music Research
    3.2.1 Corpus Analysis Based on Manual Annotations
    3.2.2 Corpus Analysis Based on Symbolic Data
    3.2.3 Corpus Analysis Based on Audio Data
  3.3 Methodological Reflections
    3.3.1 Research Questions and Hypotheses
    3.3.2 Choice of Data in Corpus Analysis
    3.3.3 Reflections on Audio Features
    3.3.4 Reflections on Analysis Methods
  3.4 Case Study: the Evolution of Popular Music
    3.4.1 Serrà, 2012
    3.4.2 Mauch, 2015
    3.4.3 Discussion
    3.4.4 Conclusion
  3.5 Summary and Desiderata
    3.5.1 Research Questions and Hypotheses
    3.5.2 Data
    3.5.3 Audio Features
    3.5.4 Analysis Methods
  3.6 To Conclude

Part II: Chorus Analysis & Pitch Description

4 Chorus Analysis
  4.1 Introduction
    4.1.1 Motivation
    4.1.2 Chorus Detection
    4.1.3 Chorus Analysis
  4.2 Methodology
    4.2.1 Datasets
    4.2.2 Audio Features
  4.3 Choruses in Early Popular Music
  4.4 Choruses in the Billboard Dataset
    4.4.1 Graphical Models
    4.4.2 Chorusness
    4.4.3 Implementation
    4.4.4 Analysis Results
    4.4.5 Discussion
    4.4.6 Regression
    4.4.7 Validation
  4.5 Conclusions
5 Cognition-Informed Pitch Description
  5.1 Introduction
    5.1.1 Improving Pitch Description
    5.1.2 Audio Description and Cover Song Detection
  5.2 Cognition-inspired Pitch Description
    5.2.1 Pitch-based Audio Bigrams
    5.2.2 Pitch Interval-based Audio Bigrams
    5.2.3 Summary
  5.3 Experiments
    5.3.1 Data
    5.3.2 Methods
    5.3.3 Results & Discussion
  5.4 Conclusions
6 Audio Bigrams
  6.1 Introduction
  6.2 Soft Audio Fingerprinting
  6.3 Unifying Model
    6.3.1 Fingerprints as Audio Bigrams
    6.3.2 Efficient Computation
    6.3.3 Audio Bigrams and 2DFTM
  6.4 Implementation
    6.4.1 pytch
    6.4.2 Code Example
    6.4.3 Example Experiment
  6.5 Conclusions and Future Work

Part III: Corpus Analysis of Hooks

7 Hooked
  7.1 Catchiness, Earworms and Hooks
  7.2 Hooks in Musicology and Music Cognition
    7.2.1 Hooks in Musicology
    7.2.2 Hooks and Music Cognition
    7.2.3 Summary: Hook Types
  7.3 Experiment Design
    7.3.1 Measuring Recognisability
    7.3.2 Games and Music Research
    7.3.3 Gameplay
    7.3.4 Experiment Parameters
  7.4 Implementations
    7.4.1 Hooked!
    7.4.2 Hooked on Music
  7.5 Conclusion
8 Hook Analysis
  8.1 Second-Order Audio Features
    8.1.1 Second-Order Features
    8.1.2 Second-Order Symbolic Features
    8.1.3 Second-Order Audio Features
    8.1.4 Song- vs.