Computational Models of Music Similarity and Their Applications In
DISSERTATION

Computational Models of Music Similarity and their Application in Music Information Retrieval

submitted for the purpose of obtaining the academic degree of Doktor der technischen Wissenschaften (Doctor of Technical Sciences)

under the supervision of
Univ. Prof. Dipl.-Ing. Dr. techn. Gerhard Widmer
Institut für Computational Perception
Johannes Kepler Universität Linz

submitted to the Technische Universität Wien, Fakultät für Informatik

by Elias Pampalk
Matrikelnummer: 9625173
Eglseegasse 10A, 1120 Wien

Wien, March 2006

Kurzfassung (German abstract)

The aim of this dissertation is to develop methods that support users in accessing and discovering music. The main part consists of two chapters.

Chapter 2 gives an introduction to computational models of music similarity. In addition, the optimization of the combination of different approaches is described, and the largest evaluation of music similarity measures published to date is presented. The best combination performs significantly better than the baseline method in most evaluation categories. Special precautions were taken to avoid overfitting. To cross-check the results of the evaluation based on genre classification, a listening test was conducted. The results of the test confirm that genre-based evaluations are appropriate for efficiently evaluating large parameter spaces. Chapter 2 ends with recommendations regarding the use of similarity measures.

Chapter 3 describes three applications of such similarity measures. The first application demonstrates how music collections can be organized and visualized so that users can control the aspect of similarity they are interested in. The second application demonstrates how music collections can be organized hierarchically into overlapping groups at the artist level. These groups are summarized using words from web pages associated with the artists. The third application demonstrates how playlists can be generated with minimal user input.

Abstract

This thesis aims to develop techniques that support users in accessing and discovering music. The main part consists of two chapters.

Chapter 2 gives an introduction to computational models of music similarity. The combination of different approaches is optimized and the largest evaluation of music similarity measures published to date is presented. The best combination performs significantly better than the baseline approach in most of the evaluation categories. A particular effort is made to avoid overfitting. To cross-check the results from the evaluation based on genre classification, a listening test is conducted. The test confirms that genre-based evaluations are suitable to efficiently evaluate large parameter spaces. Chapter 2 ends with recommendations on the use of similarity measures.

Chapter 3 describes three applications of such similarity measures. The first application demonstrates how music collections can be organized and visualized so that users can control the aspect of similarity they are interested in. The second application demonstrates how music collections can be organized hierarchically into overlapping groups at the artist level. These groups are summarized using words from web pages associated with the respective artists. The third application demonstrates how playlists can be generated with minimal user input.

Acknowledgments

This work was carried out while I was working as a researcher at the Austrian Research Institute for Artificial Intelligence (OFAI). I want to thank Gerhard Widmer for giving me the opportunity to work at OFAI and supervising me. His constant feedback and guidance have shaped this thesis from the wording of the title to the structure of the conclusions.
I want to thank Andreas Rauber for reviewing this thesis, for convincing me to start working on MIR in the first place (back in 2001), and for his supervision during my Master’s studies.

Colleagues at OFAI

I want to thank my colleagues at OFAI for lots of interesting discussions, various help, and making life at OFAI so enjoyable: Simon Dixon, Werner Goebl, Dominik Schnitzer, Martin Gasser, Søren Tjagvad Madsen, Asmir Tobudic, Paolo Petta, Markus Mottl, Fabien Gouyon, and those who have moved to Linz in the meantime: Tim Pohle, Peter Knees, and Markus Schedl. I especially want to thank: Simon for pointers to related work, help with signal processing, Linux, English expressions, lots of useful advice, and helpful comments on this thesis; Werner for introducing me to the fascinating world of expressive piano performances; Dominik for some very inspiring discussions and for the collaboration on industry projects related to this thesis; Martin for helping me implement the “Simple Playlist Generator” and for lots of help with Java; Paolo for always being around when I was looking for someone to talk to; Arthur for interesting discussions related to statistical evaluations; Markus M. for help with numerical problems; Fabien for interesting discussions related to rhythm features and helpful comments on this thesis; Tim for the collaboration on feature extraction and playlist generation; Peter for the collaboration on web-based artist similarity; and Markus S. for the collaboration on visualizations. I also want to thank the colleagues from the other groups at OFAI, the system administrators, and the always helpful secretariat. It was a great time and I would have never managed to complete this thesis without all the help I got.

Other Colleagues

I want to thank Xavier Serra and Mark Sandler for inviting me to visit their labs where I had the chance to get to know many great colleagues.
I want to thank Perfecto Herrera for all his help during my visit to the Music Technology Group at UPF, for his help designing and conducting listening tests (to evaluate similarity measures for drum sounds), and for many valuable pointers to related work. I’m also very grateful to Perfe for his tremendous efforts writing and organizing the SIMAC project proposal. Without SIMAC this thesis might not have been related to MIR. (Special thanks to the European Commission for funding it!) Juan Bello has been very helpful in organizing my visit to the Centre for Digital Music at QMUL. I’m thankful for the interesting discussions on chord progressions and harmony in general. Which brings me to Christopher Harte, who has taught me virtually everything I know about chords, harmony, and tonality. I’m also grateful for the guitar lessons he gave me. I want to thank Peter Hlavac for very interesting discussions on the music industry, and in particular for help and ideas related to the organization of drum sample libraries. I want to thank John Madsen for insightful discussions which guided the direction of my research.

I’m grateful to many other colleagues in the MIR community who have been very helpful in many ways. For example, my opinions and views on evaluation procedures for music similarity measures were influenced by the numerous discussions on the MIREX and Music-IR mailing lists. In this context I particularly want to thank Kris West for all his enthusiasm and efforts. I’m also thankful to: Jean-Julien Aucouturier for a helpful review of part of my work; Beth Logan for help understanding the KL-Divergence; Masataka Goto for giving me the motivation I needed to finish this thesis a lot sooner than I would have otherwise; Stephan Baumann for interesting discussions about the future of MIR; and Emilia Gómez for interesting discussions related to harmony. Special thanks also to Markus Frühwirth who helped me get started with MIR.

Friends & Family

I’m very grateful to my friends for being who they are, and for helping me balance my life and giving me valuable feedback and ideas related to MIR applications. I especially want to thank the Nordsee gang for many very interesting discussions. Finally, I want to thank those who are most important: my parents and my two wonderful sisters. I want to thank them for sharing so much with me, for always being there for me, for all the love, encouragement, and support.

Financial Support

This work was supported by the EU project SIMAC (FP6-507142) and by the project Y99-INF, sponsored by the FWF in the form of a START Research Prize.

Contents

1 Introduction
  1.1 Outline of this Thesis
    1.1.1 Contributions of this Doctoral Thesis
  1.2 Matlab Syntax
2 Audio-based Similarity Measures
  2.1 Introduction
    2.1.1 Sources of Information
    2.1.2 Related Work
  2.2 Techniques
    2.2.1 The Basic Idea (ZCR Illustration)
    2.2.2 Preprocessing (MFCCs)
      2.2.2.1 Power Spectrum
      2.2.2.2 Mel Frequency
      2.2.2.3 Decibel
      2.2.2.4 DCT
      2.2.2.5 Parameters
    2.2.3 Spectral Similarity
      2.2.3.1 Related Work
      2.2.3.2 Thirty Gaussians and Monte Carlo Sampling (G30)
      2.2.3.3 Thirty Gaussians Simplified (G30S)
      2.2.3.4 Single Gaussian (G1)
      2.2.3.5 Computation Times
      2.2.3.6 Distance Matrices
    2.2.4 Fluctuation Patterns
      2.2.4.1 Details
      2.2.4.2 Illustrations