A Computational Approach to Rhythm Description Audio
Total Page:16
File Type:pdf, Size:1020Kb
A COMPUTATIONAL APPROACH TO RHYTHM DESCRIPTION AUDIO FEATURES FOR THE COMPUTATION OF RHYTHM PERIODICITY FUNCTIONS AND THEIR USE IN TEMPO INDUCTION AND MUSIC CONTENT PROCESSING A DISSERTATION SUBMITTED TO THE DEPARTMENT OF TECHNOLOGY OF THE UNIVERSITY POMPEU FABRA FOR THE PROGRAM IN COMPUTER SCIENCE AND DIGITAL COMMUNICATION IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR PER LA UNIVERSITAT POMPEU FABRA WITH THE MENTION OF EUROPEAN DOCTOR Fabien Gouyon 2005 Dipòsit legal: B.10835-2006 ISBN: 84-689-7256-8 c Copyright by Fabien Gouyon 2005 All Rights Reserved ii DOCTORAL DISSERTATION DIRECTION Co-Advisor Dr. Xavier Serra Department of Technology Pompeu Fabra University, Barcelona Co-Advisor Dr. Gerhard Widmer Department of Computational Perception Johannes Kepler University, Linz iii iv Abstract This dissertation is about musical rhythm. More precisely, it is concerned with computer programs that automatically extract rhythmic descriptions from musical audio signals. New algorithms are presented for tempo induction, tatum estimation, time signature determi- nation, swing estimation, swing transformations and classification of ballroom dance music styles. These algorithms directly process digitized recordings of acoustic musical signals. The backbones of these algorithms are rhythm periodicity functions: functions measuring the salience of a rhythmic pulse as a function of the period (or frequency) of the pulse, calculated from selected instantaneous physical attributes (henceforth features) emphasizing rhythmic aspects of sound. These features are computed at a constant time rate on small chunks (frames) of audio signal waveforms. Our algorithms determine tempo and tatum of different genres of music, with almost constant tempo, with over 80% accuracy if we do not insist on finding a specific metrical level. They identify time signature with around 90% accuracy, assuming lower metrical levels are known. They classify ballroom dance music in 8 categories with around 80% accuracy when taking nothing but rhythmic aspects of the music into account. Finally they add (or remove) swing to musical audio signals in a fully-automatic fashion, while conserving very good sound quality. From a more general standpoint, this dissertation substantially contributes to the field of com- putational rhythm description a) by proposing an unifying functional framework; b) by reviewing the architecture of many existing systems with respect to individual blocks of this framework; c) by organizing the first public evaluation of tempo induction algorithms; and d) by identifying promis- ing research directions, particularly with respect to the selection of instantaneous features which are best suited to the computation of useful rhythm periodicity functions and the strategy to combine and parse multiple sources of rhythmic information. This research was performed at the Music Technology Group of Pompeu Fabra University in Barcelona. Primary support was provided by the EU projects FP6-507142 SIMAC (http://www.semanticaudio.org) and IST-1999-20194 CUIDADO. v vi Acknowledgments It goes without saying that the work reported in this document is not the work of a single person. It really is a collaborative effort and, at the risk of unfair omission, I want to express my gratitude to those who participated. I have been extremely lucky to reside for five years in the Music Technology Group of Pompeu Fabra University in Barcelona, where I could interact with such brilliant and insightful researchers. The first person I would like to thank is Xavier Serra for inviting me to join the Music Technology Group, when we met at CCRMA back in 1999, and for encouraging and supporting me since then. I am also grateful for the multiple opportunities I had to make my work public. I truly enjoyed working with many people in the MTG and I want to thank them here for the interesting discussion we had and for proof-reading writings of mine: Alex` Loscos, Martin Kaltenbrunner, G¨unter Geiger, Alvaro´ Barbosa, Lars Fabig, Emilia G´omez, Jordi Bonada, Jordi Janer, Sergi Jord`a, Ross Bencina, Bram de Jong, Sebastian Streich, Bee-Suan Ong, Oscar Celma, Markus Koppenberger, Jos´e-Pedro Garc´ıa, Hendrik Purwins, Nicolas Wack, Oscar Mayor, Hugo Sol´ıs, Amaury Hazan and Maarten Grachten. I also want to thank Marteen de Boer, Ram´on Loureiro and Carlos Atance for providing the means to work in amazingly smooth material conditions. Other technical thanks go to David Garc´ıa and Miquel Ram´ırez. Thanks also to Joana Clotet and Cristina Garrido for helping in all administrative tasks and, most importantly, for creating such a nice atmosphere around them. Pedro Cano, Perfecto Herrera and Simon Dixon deserve a special mention for their enormous (and I mean it) influence on my work, for being amazing sources of knowledge and inspiration and always being willing to spend a couple of hours (with or without beer or “vieille prune”) discussing this or that aspect of some scientific problem, or proof-reading hundreds of pages. Special credits should also be given to Xavier Serra and Gerhard Widmer, my supervisors, for their expert guidance throughout the years. I also wish to reflect on the past and thank the people who brought me to this point (writing the acknowledgments of a PhD dissertation). The first credits should be given to my parents, Jean-Paul and Christine Gouyon, my elder sister Val´erie Brunot and my grand-parents, Georges and Jeanne Gouyon and Lucienne Bonin, for their love and support of course but also for transmitting to me the love of learning. I also thank Zizou and Daniel Tardivon for their unconditional support while I vii lived in Paris as well as Pierre-Fran¸cois, Lise and Fanny Brunot for their love. Many other people (professors, work colleagues or just fellow researchers with whom I had the chance to have insightful discussions) influenced my research in the last years. I want to thank them here, in a chronological order: R´egine Andr´e-Obrecht, Philippe Lepain and Philippe Joly (at IRIT), Francis Castani´e, Jean-Yves Tourneret and Corinne Mailhes (at ENSEEIHT), Julius Orion Smith III, Harvey Thornburg, Stefan Bilbao, Bob Sturm, Hendrik Purwins, Tamara Smyth, Stefania Serafin, Perry Cook, Juan Pampin, David Berners, Fernando Lopez-Lezcano, Dan Levitin and Caroline Traube (at CCRMA), Jeremy Marozeau, Thibaut Ehrette, Emilia G´omez, Xavier Rodet, Olivier Warusfel, Jean-Claude Risset, G´erard Assayag, Daniel Arfib, Richard Kronland-Martinet, Philippe Depalle, Stephen McAdams, Alain de Cheveign´e, Hughes Vinet, Geoffroy Peeters, Olivier Lartillot and Benoit Meudic (at IRCAM), Fran¸cois Pachet, Olivier Delerue, Luc Steels and Gert Westermann (at Sony CSL), Elias Pampalk and Werner Goebl (at OFAI),¨ Anssi Klapuri (with a special thanks for his thorough and inspiring review of this document), Mark Sandler, Juan-Pablo Bello, George Tzanetakis, Tristan Jehan, Daniel Pressnitzer, Peter Desain, Martin McKinney, Douglas Eck, Robert Zatorre, Barbara Tillmann, Sofia Dahl and Guy Madison. More closely related to this document, I also want to thank the coauthors of my papers who all contributed an important part of the work presented here and anonymous reviewers of these papers who (usually) helped in improving them. For being great friends, making Barcelona a place where it is so difficult to work and (for some of them) being able to cope with my moody behavior in the last months of writing my thesis, I would like to thank Pedro Cano, Chiara Capuccio, Martin Kaltenbrunner, Lars Fabig, Jordi Janer, Marie- Florence Deruffi, Daniele Bertolucci, G¨unter Geiger, Nadine Alber, Alvaro´ Barbosa, Cristina Alves de S´a, Joachim and Line Haas, Alex` Loscos, Jos´e-Pedro Garc´ıa, Markus Koppenberger, Rossella Cascone, Marion Hassler, Alejandro Ramirez, Cecilia Gayon, Vadim Tarasov, Mila Makarevitch, Emilia G´omez, Jordi Bonada, Maira Laranjeira, Aurora Iraita, Laura Colloridi, Begonya Samit, Hagar Zeligman, Arantxa, Silvia Poblete, Claudia Rivera, Macarena Francia, los del caj´on and Joana Martines. But Barcelona is not all... In other parts of the world, other people cannot go unmentioned for they shared fun parts (and helped me through hard parts) of my life since I began this work. I wish to thank them in a geographical order (why not?, from the closest to the farthest): Ivan and Colin Segalowitch, Philippe Nadal, Philippe Terral, Jordan Treff´e, Christine and Jos´e Morat´e, Marianne Chambon, Camille Bonaldi, Alice and Etienne´ Bertaud du Chazaud, B´eatrice Bou´e, Michael Janus, Gerda Strobl, Claudia Kogler, Eva Kahler, Claudia Vargas and Katsuhiko Sakamoto. Energy was supplied by 30 plats, el Salvador, el Rodri, el Sarima, el Foro, el Econ´omic, “el de en frente del Econ´omic,” el Rinc´on de la Ciutadela, el Pasatore, “los pescaditos,” la Plata, el Refugio del Puerto and of course... “las alemanas.” Burning energy excesses was possible thanks to the Club Nataci´oAtl`etic-Barceloneta and the beach on which I could run while admiring such wonderful scenes. viii Contents Abstract v 1 Introduction 1 1.1 Rhythmperiodicityfunctions . ......... 1 1.2 Methodology ..................................... ... 2 1.3 Aimsanddissertationoutline . ......... 4 2 Survey 7 2.1 Representingmusicalrhythm . ........ 7 2.1.1 Metricalstructure ............................. .... 9 2.1.2 Tempo—Tactus.................................. 11 2.1.3 Timing ....................................... 13 2.2 Computationalrhythmdescription . .......... 15 2.2.1 Featurelistcreation . ..... 17 2.2.1.1 Event-wisefeatures . 19 2.2.1.2 Frame-wisefeatures