EXPRESSIVITY-AWARE TEMPO TRANSFORMATIONS OF MUSIC PERFORMANCES USING CASE BASED REASONING
A dissertation submitted to the Department of Technology of the Pompeu Fabra University for the program in Computer Science and Digital Communication in partial fulfilment of the requirements for the degree of Doctor per la Universitat Pompeu Fabra
Maarten Grachten
July 2006

© 2006 Copyright by Maarten Grachten. All rights reserved.

DOCTORAL DISSERTATION DIRECTION
Thesis Advisor
Dr. Josep Lluís Arcos
IIIA, Artificial Intelligence Research Institute
CSIC, Spanish Council for Scientific Research
Barcelona
Tutor
Dr. Xavier Serra
Department of Technology
Pompeu Fabra University
Barcelona

Dipòsit legal: B.54191-2006
ISBN: 978-84-690-3514-6

Abstract
This dissertation is about expressivity-aware tempo transformations of monophonic audio recordings of saxophone jazz performances. It is a contribution to content-based audio processing, a field of technology that has recently emerged as an answer to the increased need to deal intelligently with the ever growing amount of digital multimedia information available nowadays. Content-based audio processing applications may for example search a database for music that has a particular instrumentation or musical form, rather than just searching for music based on meta-data such as the artist or title of the piece. Content-based audio processing also includes making changes to the audio to meet specific musical needs. The work presented here is an example of such content-based transformation. We have investigated the problem of how a musical performance played at a particular tempo can be rendered automatically at another tempo, while preserving naturally sounding expressivity. Or, stated differently: how does expressivity change with global tempo? Changing the tempo of a given melody is a problem that cannot be reduced to just applying a uniform transformation to all the notes of the melody. The expressive resources for emphasizing the musical structure of the melody and the affective content differ depending on the performance tempo. We present a case based reasoning system to address this problem. It automatically performs melodic and expressive analysis, and it contains a set of example tempo transformations; when a new musical performance must be tempo-transformed, it uses the most similar example tempo transformation to infer the changes of expressivity that are necessary to make the result sound natural. We have validated the system experimentally, and show that expressivity-aware tempo transformations are more similar to human performances than tempo transformations obtained by uniform time stretching, the current standard technique for tempo transformation. Apart from this contribution as an intelligent audio processing application prototype, several other contributions have been made in this dissertation. Firstly, we present a representation scheme of musical expressivity that is substantially more elaborate than existing representations, and we describe a technique to automatically annotate music
performances using this representation scheme. This is an important step towards fully-automatic case acquisition for musical CBR applications. Secondly, our method for reusing past cases provides an example of solving synthetic tasks with multi-layered sequential data, a kind of task that has not been explored much in case based reasoning research. Thirdly, we introduce a similarity measure for melodies that computes similarity at a semi-abstract musical level. In a comparison with other state-of-the-art melodic similarity techniques, this similarity measure gave the best results. Lastly, a novel evaluation methodology is presented to assess the quality of predictive models of musical expressivity.
This research was performed at the Artificial Intelligence Research Institute (IIIA), Barcelona, Spain, and was funded by the Spanish Ministry of Science and Technology through the projects TIC 2000-1094-C02 TABASCO and TIC 2003-07776-C02 PROMUSIC.

Acknowledgments
I think behind every dissertation there is a host of people, all vital in some sense or another. The word has more than one sense:

vital (adjective)
1. critical, vital (urgently needed; absolutely necessary) "a critical element of the plan"; "critical medical supplies"; "vital for a healthy society"; "of vital interest"
2. vital, life-sustaining (performing an essential function in the living body) "vital organs"; "blood and other vital fluids"; "the loss of vital heat in shock"; "a vital spot"; "life-giving love and praise"
3. full of life, lively, vital (full of spirit) "a dynamic full of life woman"; "a vital and charismatic leader"; "this whole lively world"
4. vital (manifesting or characteristic of life) "a vital, living organism"; "vital signs"
(WordNet 2.1, http://wordnet.princeton.edu/)
For this dissertation, many people were vital in various of those senses, and I wish to express my appreciation and gratitude to all of them. First and foremost Josep Lluís Arcos, who was my primary guide and source of feedback and motivation for this research. He and Ramon López de Mántaras have given me the opportunity to work at the Artificial Intelligence Research Institute (IIIA) and suggested that I take up the Ph.D. in the first place. They have helped me a lot by suggesting or evaluating new ideas, tirelessly revising texts, having me participate in conferences, and expressing their faith in me. However, I wouldn't even have known them without Rineke Verbrugge, my M.Sc. supervisor, who inspired me to work at the IIIA. The three of them were definitely vital in sense 1. I am also indebted to Xavier Serra, who has generously given me a place to work as a collaborator at the Music Technology Group. It's a great place to be, like a second home. Furthermore I wish to thank colleagues, fellow researchers, co-authors and (anonymous) paper reviewers who contributed to my work by their work, by giving feedback, by
discussions, etcetera. Emilia Gómez and Esteban Maestre were of great help; they provided audio analysis/synthesis tools and melody descriptions. Amaury Hazan, Rafael Ramírez, Alejandro García, Raquel Ros, Perfecto Herrera, Henkjan Honing, Gerhard Widmer, and Roger Dannenberg provided feedback (at some point or throughout the years) and made me rethink and refine ideas. I thank Rainer Typke, Frans Wiering, and the IMIRSEL crew (especially Xiao Hu and Stephen Downie) for the great amount of work they put in the MIREX '05 Symbolic Melodic Similarity contest. David Temperley, Emilios Cambouropoulos, Belinda Thom, Mikael Djurfeldt, Lars Fabig, Jordi Bonada and Santi Ontañón provided programs and source code I used. Finally, I owe respect to Eugene Narmour, whose work has been of great importance in my research.

Another group of people I hardly know but whose work I appreciate very much are the free software developers. Throughout the years I have used GNU/Linux as a free operating system, and grew to love it to an almost evangelical degree (to the despair of some). I should mention especially Richard Stallman, the godfather of free software, and the Debian/Ubuntu developers in general. Also, I got a lot of help from the developers of Guile.

I wish to thank my parents Harma Vos and Henri Grachten, and my sister Gerdi Grachten. They have shown me love, kindness and generosity throughout my life. And also my grandparents and relatives, particularly Elly Tomberg, Henk van Rees, Willemien van Heck, and Wim Mulder. And more people have been important to me, as friends, by being there and being vital (sense 3) every day, just now and then, or on a single occasion: Gabriela Villa, Hugo Solís, Áurea Bucio, Alfonso Pérez, Sylvain le Groux, Amaury Hazan, Noemí Pérez, Stephanie Langevin, Raquel Ros, Alejandro García, Dani Polak, Catarina Peres, Daniele Montagner, Patrick Zanon, Ping Xiao, Fabien Gouyon, Rui Pedro Paiva, Leandro Da Rold, Hendrik Purwins, Mary DeBacker, Helge Leonhardt, Michiel Hollanders, Sjoerd Druiven, Volker Nannen, Marten Voulon, and (not least) Lieveke van Heck.

Contents
Abstract

Acknowledgments

1 Introduction
   1.1 Musical Expressivity and Tempo
   1.2 Problem Definition, Scope and Research Objectives
       1.2.1 High-level Content-Based Tempo Transformation
       1.2.2 Scope
       1.2.3 Representation of Expressivity
       1.2.4 Melodic Retrieval Mechanisms
       1.2.5 Performance Model Evaluation
   1.3 Outline of the Dissertation

2 Preliminaries and Related Work
   2.1 Expressivity in Musical Performances
   2.2 Functions of Expressivity
       2.2.1 Expressivity and Musical Structure
       2.2.2 Expressivity and Emotional Content
   2.3 Definitions of Expressivity
   2.4 Computational Models of Expressivity
       2.4.1 Common Prerequisites
       2.4.2 Analysis by Measurement
       2.4.3 Analysis by Synthesis
       2.4.4 Automatic Rule Induction
       2.4.5 Case Based Reasoning
       2.4.6 Evaluation of Expressive Models
   2.5 Melodic Similarity
   2.6 Case Based Reasoning
       2.6.1 The CBR Cycle
       2.6.2 CBR Characterization
   2.7 Conclusions

3 System Architecture/Data Modeling
   3.1 A Global View of the System
       3.1.1 Problem Analysis
       3.1.2 Problem Solving
   3.2 Musical Corpus
   3.3 Melody Representation
   3.4 Musical Models
       3.4.1 GTTM
       3.4.2 The Implication-Realization Model
   3.5 Melody Segmentation
       3.5.1 Melisma
       3.5.2 Local Boundary Detection Model
       3.5.3 I-R based Segmentation
       3.5.4 n-Grams Segmentation
       3.5.5 Binary Segmentation
       3.5.6 Characterization Segmentation Methods
   3.6 Performance Representation
       3.6.1 Describing Expressivity through Performance Events

4 Knowledge Acquisition
   4.1 The Implication-Realization Parser
       4.1.1 Scope and Limitations
       4.1.2 Implementation
           Gather Interval Information
           Mark Closure
           Mark Inhibition of Closure
           Identify I-R Structures
           Aggregate Chained P and D Structures
       4.1.3 I-R Analysis Representation
   4.2 The Edit-Distance
       4.2.1 Context-Sensitive Cost Functions
   4.3 Automatic Performance Annotation
       4.3.1 Proposed Edit Cost Model
   4.4 Connection of the Musical Data

5 Problem Solving
   5.1 Cases and Proto Cases
       5.1.1 Proto Cases
       5.1.2 Cases
   5.2 Proto Case Retrieval
       5.2.1 Tempo Filtering
       5.2.2 Melodic Phrase Comparison
       5.2.3 Case Creation from Proto Cases
           Performance Segmentation and Annotation
   5.3 Constructive Adaptation
   5.4 Case Adaptation at Phrase Segment Level
       5.4.1 Melody Segment Retrieval/Alignment
       5.4.2 Linking of Performance Events
       5.4.3 Adaptation Rules for Establishing Analogies
       5.4.4 Transfer of Expressive Values
           Ornamentation Introduction
           Insertion Introduction
           Analogy Transfer
           Preservation of Monophony
           Concluding Remarks

6 Experimentation
   6.1 Optimizing Performance Annotation
       6.1.1 Experiment Setup
       6.1.2 Fitness Calculation
       6.1.3 Results
       6.1.4 Conclusions
   6.2 Melodic Similarity
       6.2.1 Looking for a Good Abstraction Level
           Comparison Criteria
           Experiment Setup
           Results
           Conclusions
       6.2.2 Ground Truth Prediction
           MIREX 2005
           Optimization of the I-R Measure
           Results
           Conclusions
   6.3 Large Scale Evaluation of Tempo Transformations
       6.3.1 Evaluation Setup
       6.3.2 Obtaining Ground Truth
       6.3.3 Modeling the Ground Truth
       6.3.4 Comparison of TempoExpress and UTS
       6.3.5 Conclusions

7 Conclusions
   7.1 Summary
   7.2 Contributions of this Dissertation
       7.2.1 An I-R Parser for Melody
       7.2.2 An I-R based Distance Measure for Melody
       7.2.3 Comparison of Melodic Distance Measures
       7.2.4 A Scheme for Performance Annotation
       7.2.5 A Technique for Automated Performance Annotation
       7.2.6 An Architecture for Case Based Tempo-Transformation
       7.2.7 Transfer of Expressivity across Performances
       7.2.8 An Evaluation Method for Expressive Models
   7.3 Future Directions
       7.3.1 Finer Grained Performance-Annotations
       7.3.2 Case Base Quality Evaluation
       7.3.3 User-Interaction
       7.3.4 Hybrid Problem Solving

Bibliography

A Abbreviations and Notational Conventions
   A.1 Abbreviations
   A.2 Names of I-R structures
   A.3 Notation

B I-R Annotated Phrases in Case Base

C Songs Used in Melodic Distance Comparison

D List of Publications/Awards by the Author

List of Figures
1.1 The frequency of occurrence of several kinds of performance events as a function of global performance tempo
1.2 Dissimilarities between performances (of the same phrase) vs. their difference in tempo

2.1 Problem solving in CBR
2.2 The Case Based Reasoning cycle, which shows the relations among the four different stages in problem solving: Retrieve, Reuse, Revise, Retain

3.1 Diagram of the major components of TempoExpress
3.2 The problem solving process from phrase input problem to phrase solution
3.3 Eight of the basic structures of the I/R model
3.4 First measures of All of Me, annotated with I/R structures
3.5 A taxonomy of performance events for performance annotation
3.6 Examples of performance event classes encountered in real performances. The boxes below the score represent the performance in piano roll notation. The letters denote performance events (T = Transformation; I = Insertion; D = Deletion; O = Ornamentation; F = Fragmentation; C = Consolidation)
3.7 The frequency of occurrence of several kinds of performance events as a function of nominal tempo

4.1 The data preparation part of TempoExpress
4.2 Interval information for the first measures of All Of Me
4.3 Closure information for the first measures of All Of Me
4.4 Decision tree for labeling pairs of consecutive melodic intervals, I1 and I2. SameDir: I1 and I2 have the same registral direction; SimInt: I1 and I2 are similar in size; IntDiff: |I1| − |I2|
4.5 Representation of I-R analysis as a sequence of I-R structure objects
4.6 An alignment of elements using a hypothetical context-sensitive edit-operation with pre-context sizes Kpre, Lpre, operation ranges K, L, and post-context sizes Kpost, Lpost, respectively for the source and target sequences
4.7 A note deletion event in the case of repeated notes with the same pitch forms a problem for mapping, if note positions are not taken into account (left). When note positions are known, the performed notes can be mapped to the nearest score note
4.8 Cost values for insertion and ornamentation operations as a function of note duration
4.9 An example of interconnected representations of the different kinds of musical data

5.1 The problem solving part of TempoExpress
5.2 The information contained in a proto case
5.3 Schematic overview of a case, containing a problem description and solution
5.4 Filtering cases using the source and target tempo
5.5 Comparison of melodies using an I-R based edit-distance
5.6 Two examples of the derivation of performance annotation and performance segments from melody segments
5.7 Constructive adaptation: the segment-wise construction of the solution through state expansion
5.8 Example of case reuse for a melodic phrase segment; Ts and Tt refer to source and target tempo, respectively; the letters T, O, C, and F in the performance annotations (gray bars) respectively represent transformation, ornamentation, consolidation, and fragmentation events

6.1 Estimated parameter values for two different training sets (Tr1 and Tr2). Three runs were done for each set (a, b, and c). The x-axis shows the nine different parameters of the cost functions (see section 4.3.1). For each parameter the values are shown for each run on both training sets
6.2 Distribution of distances for four melodic similarity measures. The x axis represents the normalized values for the distances between pairs of phrases. The y axis represents the number of pairs that have the distance shown on the x axis
6.3 Left: Discriminatory power (measured as entropy); Right: KL-Divergence between within-song distance distribution and between-song distance distribution. The Interval+IOI and Direction+IOI measures were computed with k = 2.0
6.4 Distributions of distances of the direction measure for various weights of interonset intervals
6.5 MIREX 2005 evaluation results, from table 6.6
6.6 Screenshot of the web survey on performance similarity
6.7 Performance of TempoExpress vs. uniform time stretching as a function of tempo change (measured as the ratio between target tempo and source tempo). The lower plot shows the probability of incorrectly rejecting H0 (non-directional) for the Wilcoxon signed-rank tests
6.8 Performance of TempoExpress vs. UTS as a function of tempo change (measured in beats per minute). The lower plot shows the probability of incorrectly rejecting H0 (non-directional) for the Wilcoxon signed-rank tests

7.1 Screenshot of the TempoExpress GUI prototype

B.1 Body And Soul
B.2 Like Someone In Love
B.3 Once I Loved
B.4 Up Jumped Spring

List of Tables
2.1 Characterization of IBL/CBR systems in terms of knowledge intensiveness (adapted from Aamodt [2004])

3.1 Songs used to populate the case base
3.2 Characterization of eight basic I/R structures; in the second column, 'S' denotes small, 'L' large, and '0' a prime interval
3.3 Classification of melodic segmentation strategies
3.4 Distribution of kinds of performance events over slow and fast performances (slow = slower than the nominal tempo; fast = faster than the nominal tempo). The numbers express the proportion of events per tempo category for each kind of event

6.1 Annotation errors produced by the obtained solutions for three different runs (denoted by the letters a, b, and c) on two different training sets (Tr1 and Tr2) and a test set
6.2 Cross-correlations of the parameter values that were optimized using two different training sets (Tr1 and Tr2), and three runs for each set (a, b, and c)
6.3 The distances compared pairwise. The distances are shown in the uppermost row. In the left column, X1 and X2 respectively refer to the two measures under comparison
6.4 Parameter values found by evolutionary optimization, using train data from the RISM A/II database
6.5 MIREX 2005 symbolic melodic similarity contest participants
6.6 Results for the MIREX 2005 contest for symbolic melodic similarity, ranked according to Average Dynamic Recall
6.7 Characterization of participating algorithms in terms of matching approach and abstraction level of the melody representation
6.8 Optimized values of edit-operation cost parameters
6.9 Overall comparison between TempoExpress and uniform time stretching, for upwards and downwards tempo transformations, respectively

Chapter 1
Introduction
This dissertation is an investigation into the use of case based reasoning for expressivity-aware tempo transformation of audio recorded performances of melodies. This specific task is illustrative of a wider application domain of science and technology that has emerged during the past decade, and the impact and importance of which is only recently being realized: content-based multimedia processing. Large scale access to the world wide web and the omnipresence of computers have led to a strong increase of image, video, and audio information available in digital format, and it is only realistic to assume that in the future the majority of information will be distributed in the form of multimedia. This shift toward non-text media asks for new ways of managing information. In parallel with the technology that aims at automatic semantic description of image, video, and audio information to make its content accessible, there is a need for technology that enables us to handle such information according to its content. This includes for example content based information retrieval, but also content based transformation, where previous information is reused for new purposes, often requiring non-trivial adaptations of the information to its new context.
Progress has been made both in content extraction and in manipulation (see Aigrain [1999] for an overview). Nowadays, high-quality audio time stretching algorithms exist (e.g. [Röbel, 2003; Bonada, 2000]), making pitch-invariant temporal expansion and compression of audio possible without significant loss in sound quality. Such algorithms perform low level content based transformation, i.e. they segment the audio signal into transient and stationary parts based on spectro-temporal content and stretch the audio selectively. The main goal of those algorithms is to maintain sound quality, rather than the musical quality of the audio (in the case of recorded musical performances). But as such, they can
be used as tools to build higher level content based audio transformation applications. A recent example of this is an application that allows the user to change the swing-ratio of recorded musical performances [Gouyon et al., 2003]. Such audio applications can be valuable especially in the context of audio and video post-production, where recorded performances must commonly be tailored to fit specific requirements. For instance, for a recorded musical performance to accompany video, it must usually meet tight constraints imposed by the video with respect to the timing or the duration of the recording, often requiring a tempo transformation.

In order to realize a tempo transformation that maintains the musical quality of the performance, higher level content of the audio must be taken into account (as we will argue in the next section). Content based transformation of music performances inevitably demands a thorough grip on musical expressivity, a vital aspect of any performed music. We use the term musical expressivity to refer to the deviations of the music as it is performed with respect to some norm, for example the score. This phenomenon is notoriously complex. Changes in the musical setting (for instance changes of tempo) often lead to subtle changes in expressivity that may nevertheless be indispensable to maintain a feeling of musical correctness. Through the use of case based reasoning, a state-of-the-art AI problem-solving methodology that has proved its merits in a variety of tasks, we try to realize tempo transformations of musical performances that are expressivity-aware. This means that apart from changing the rate at which the performance is being reproduced, changes to the expressive character of the performance are made to the extent that a human musician would change her way of performing to make the performance sound good at the new tempo.

The task and chosen approach raise a number of challenging topics. First there is the question of data modeling. It is an open issue how expressivity information can be extracted from the performance and appropriately represented, and what aspects of the melody should be explicitly described for content based manipulation of performed music. Secondly, from the case based reasoning perspective, tempo transformation of performed melodies is an interesting problem domain, since the problem and solution data are composite structures of a temporal nature, and the domain expertise (expressively performing music) is almost entirely a tacit skill. Thirdly, since problem solving in case based reasoning is based on the reuse of previous problems, similarity measures for melodies and performances play a central role. This creates an overlap with the field of music information retrieval. Lastly, this research raises the question of how the quality of transformed performances can be evaluated. The evaluation of models for expressive music performance is an important unsettled issue that deserves broad attention.
In the remainder of this chapter, we explain the problem of expressivity-aware tempo transformation in more detail (section 1.1). Then we will outline the scope and the specific problems addressed in this dissertation (section 1.2). Finally, we give a brief overview of the structure of the dissertation (section 1.3).
1.1 Musical Expressivity and Tempo
It has been long established that when humans perform music from score, the result is never a literal, mechanical rendering of the score (the so-called nominal performance). Even when musicians intentionally play in a mechanical manner, noticeable differences from the nominal performance occur [Seashore, 1938; Bengtsson and Gabrielsson, 1980]. Furthermore, different performances of the same piece, by the same performer, or even by different performers, have been observed to have a large number of commonalities [Henderson, 1937; Seashore, 1938]. Repp [1995b] showed that graduate piano students were just as capable as professional piano players of repeatedly producing highly similar performances of the same piece.

Given that expressivity is a vital part of performed music, an important issue is the effect of tempo on expressivity. It has been argued that temporal aspects of performance scale uniformly when tempo changes [Repp, 1994]. That is, the durations of all performed notes maintain their relative proportions. This hypothesis is called relational invariance (of timing under tempo changes). However, counter-evidence for this hypothesis has also been provided [Desain and Honing, 1994; Friberg and Sundström, 2002; Timmers et al., 2002], and a recent study shows that listeners are able to determine above chance level whether audio recordings of jazz and classical performances are uniformly time stretched or original recordings, based solely on expressive aspects of the performances [Honing, 2006].

A brief look at the corpus of recorded performances we will use in this study (details about the corpus are given in section 3.2) indeed reveals that the expressive content of the performances varies with tempo. Figure 1.1 shows the frequency of occurrence of various types of expressivity, such as ornamentation and consolidation, as a function of the nominal tempo of the performances (the tempo that is notated in the score). In section 4.3 we will introduce the various types of performance events as manifestations of musical expressivity in detail. Note that this figure shows the occurrence of discrete events, rather than continuous numerical aspects of expressivity such as timing or dynamics deviations.

Figure 1.1: The frequency of occurrence of several kinds of performance events (deletion, insertion, ornamentation, fragmentation, consolidation) as a function of global performance tempo. [Axes: relative event frequency as a proportion of all performance events, vs. tempo as a proportion of the nominal tempo.]

The figure clearly shows that the occurrence of certain types of expressivity (such as ornamentation) decreases with increasing tempo, whereas the occurrence of others (consolidation most notably) increases with increasing tempo.

Figure 1.2 shows how various expressive parameters change systematically with tempo. The points in the figure represent comparisons of performances of the same phrase at different tempos (tempos are specified in the number of beats per minute, or BPM). The x-axis shows the difference in tempo between the two performances. In the top left figure, the y-axis shows the root mean square (RMS) value of the pairwise absolute difference in note onset. The top right figure shows the RMS value of the pairwise duration difference (proportion). The bottom left figure shows the RMS value of the pairwise energy difference (proportion). The bottom right figure shows the distance value between the performances as sequences of performance events. The distance increases when the sequences contain different performance events. In all four expressive parameters, values tend to differ more as the tempo difference between the compared phrases increases. In some parameters the change as a function of tempo seems only small, but it must be kept in mind that the actual performances are a result of the combination of all parameters. The effects in the individual parameters are therefore cumulative.
Figure 1.2: Dissimilarities between performances (of the same phrase) vs. their difference in tempo. [Four panels: average onset differences (RMS), average duration differences (RMS), average energy differences (RMS), and distance on event types, each plotted against tempo difference in BPM.]
The above observations support the belief that although in some circumstances relational invariance may hold for some aspects of expressivity, in general it cannot be assumed that all aspects of expressivity remain constant (or scale proportionally) when the tempo of the performance is changed. In other words, tempo transformation of musical performances involves more than uniform time stretching (UTS). Throughout this dissertation, we will use the term UTS to refer to the scaling of the temporal aspects of a performance by a constant factor. For example, dynamics and pitch will be left unchanged, and no notes will be inserted or removed. Only the durations and onsets of the notes will be affected. Furthermore, we will use the term UTS in an abstract sense: depending on the data under consideration, different methods are required to realize it. For example, it requires non-trivial signal-processing techniques to apply pitch-invariant UTS to the audio recording of a performance. In symbolic descriptions of the performance, on the other hand, UTS consists in a multiplication of all temporal values by a constant. Note that this holds if the descriptions measure time in absolute units (e.g. seconds). When time is measured in score units (e.g. beats), UTS makes no sense, since changing the tempo of the performance only changes the translation of score time units to absolute units of time.
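For symbolic data, UTS is thus a one-line operation. The sketch below, which assumes a hypothetical note representation of (onset, duration, pitch, energy) tuples with time in seconds, is only meant to make the definition above explicit; it is not the signal-processing variant used for audio.

    def uniform_time_stretch(notes, factor):
        # Scale all temporal values of a symbolic performance by a constant.
        # Pitch and dynamics are left unchanged, and no notes are inserted
        # or removed; only onsets and durations are affected.
        return [(onset * factor, duration * factor, pitch, energy)
                for (onset, duration, pitch, energy) in notes]

    # Slowing a performance down from 120 BPM to 100 BPM stretches it by 120/100:
    performance = [(0.0, 0.45, 67, 0.8), (0.5, 0.40, 69, 0.7)]
    slower = uniform_time_stretch(performance, 120 / 100)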
1.2 Problem Definition, Scope and Research Objectives
In this section we describe the main problem we address in this dissertation. We define the scope of the project with regard to the type of musical data and the level of processing. After that we will list the secondary objectives that derive from the main problem.
1.2.1 A System for High-level Content-Based Tempo Transformation
As mentioned at the beginning of this chapter, the primary objective of this research is to develop an expressivity-aware system for musical tempo transformation of audio recorded performances, one that maintains not only the sound quality of the recording, but also the musical quality of the performance. There are several ways to concretize the criteria for success of the system. For instance, we can say the tempo transformation of a recorded performance is successful if:
• its expressive characteristics (statistically speaking) are in accordance with human performances at that tempo (more than with human performances at other tempos);
• it is preferred by human listeners over a tempo-transformed performance that was obtained by UTS;
• it is not recognized by human listeners as being a manipulated performance; they regard it as an original recording of a human performance.
Although there is not a strictly logical relation between the above criteria, we feel that in practice each criterion is implied by the ones that follow it. That is, the criteria are ordered from weak to strong. In subsection 2.4.6 we review evaluation paradigms for expressive music prediction models in general, and in section 6.3 we propose a hybrid evaluation approach that combines human judgment with a quantitative assessment of tempo-transformed performances.
1.2.2 Scope
In order to optimize the conditions for investigating expressivity-aware tempo transformation and to make the problem feasible as a project, the scope of research must be chosen appropriately. We work with a corpus of recorded performances that has some rather specific characteristics that make it suitable for this research. Firstly, the recordings are performances of jazz standards, taken from The Real Book [2004]. Jazz is a good genre for studying expressivity because of its emphasis on liberal and expressive performance rather than precise reproduction of notated melodies. Secondly, the performances are played on a saxophone, an instrument that offers a very broad range of sound qualities, allowing a skilled performer to play expressively in an elaborate manner. Thirdly, the melodies are monophonic, which removes the need for voice separation and lets us focus directly on the expressivity in the performance. Finally, the type of expressivity in the recordings is what we call 'natural': the melodies are performed as the performer (a professional musician) thinks they should be, without any explicit intention of expressing a particular mood or affect. See section 3.2 for more details on the musical corpus.

In addition to the focus on a specific type of musical data, we focus on a particular level of data processing. As explained before, we will work within the paradigm that separates content extraction from content analysis/manipulation, as opposed to, for example, purely signal processing approaches to audio transformation1. The research presented in this dissertation exclusively addresses content analysis/manipulation. For the content extraction (audio analysis) and reconstruction of audio from content descriptions (audio re-synthesis), we rely on an external system for melodic content extraction from audio, developed by Gómez et al. [2003b,a].
1This two-level paradigm has been argued for by e.g. Scheirer [1995].
1.2.3 Representation of Expressivity
One of the secondary objectives is the development of a suitable representation scheme for the expressivity in the data that we work with. This will require a study of the expressive phenomena encountered in the musical corpus.
1.2.4 Melodic Retrieval Mechanisms
The case based approach to the tempo transformation problem implies the need for a retrieval mechanism for cases that is based on the melodic material contained in the cases. We will investigate and compare several approaches to melodic similarity.
1.2.5 Performance Model Evaluation
In order to evaluate the proposed tempo transformation approach we need an evaluation methodology that assesses the quality of tempo transformed performances. We will discuss common evaluation paradigms.
1.3 Outline of the Dissertation
In chapter 2, we provide the relevant background of the work presented in this dissertation. We discuss the principal concepts involved, notably musical expressivity, performance models and their evaluation, melodic similarity, and case based reasoning. Along with this survey we review related research, methodologies, and notable findings.

In chapter 3 we propose a system architecture for the tempo transformation system. We define the terms that will be used, and summarize the goal and functionality of the different components. Furthermore, we detail the musical corpus used for experimentation, present various representation schemes of the musical data, and discuss their value for the current application.

In chapter 4 we explain in detail the methods we have developed to process the input data, in order to form cases that contain not just the raw input data, but provide higher level knowledge-enriched data descriptions that interpret and interrelate the input data, and are used at various stages throughout the case based reasoning process.

Chapter 5 is devoted to the problem solving process, which consists in applying case based reasoning to construct a tempo transformed performance of the input phrase.

Chapter 6 describes a set of experiments that validate the different components of the system, notably knowledge acquisition (performance annotation), abstract melody representation, and melody retrieval. The final section of the chapter describes an experimental validation of the system as a whole.

In chapter 7 we summarize the research presented in the dissertation. We list the main contributions of our work, and identify future work.

The appendices respectively contain abbreviations and notational conventions, annotated scores of the phrases used as musical corpus to form the case base, a list of songs used for a melodic similarity comparison experiment, and a list of publications by the author.

Chapter 2
Preliminaries and Related Work
In this chapter we provide the context in which the present research is situated. The first part of the chapter is dedicated to musical expressivity, providing a description of the phenomenon of expressivity, its communicational functions, and alternative definitions (sections 2.1, 2.2, and 2.3). Then, we discuss various methodologies for developing computational models of expressivity, and alternative ways of evaluating such models (section 2.4). After that, we briefly survey research on melodic similarity and different computational approaches to assess melodic similarity (section 2.5). In section 2.6 we introduce case based reasoning and give a characterization of CBR with respect to other learning approaches. We conclude the chapter with some observations and concluding remarks on the reviewed work (section 2.7).
2.1 Expressivity in Musical Performances
Investigations into the performance of music date from at least the end of the nineteenth century and the early twentieth century. For example, around 1896, Binet and Courtier [1896] prepared a grand piano to record the dynamics of the pressed keys on paper. In 1913, Johnstone [1913] observed that in piano performances, the notes that belong to the melody are often played slightly earlier than chord notes at the same metrical position. Around 1930, extensive research on music performance was carried out and reported by a group of researchers led by Seashore [1938]. This research was mainly concerned with singing, violin, and piano performances.

The performance of a musical piece is determined by several factors. Firstly, the physical conditions of the musician and her instrument are of influence. Obviously, the type of instrument determines to a large extent the character of the performance. Also, physiological conditions of the musician (such as fatigue, or state of health) can play a role. Secondly, the motor skills of the musician are of importance. This becomes clear when comparing the performances of a novice to those of a skilled musician. With practice, the musician trains her motor speed and accuracy, reducing the amount of unintentional deviation of the performance from the score. A third factor consists of the cognitive and affective aspects of the musician. It has been shown by Sloboda [1983] that performers deviate systematically from the score when they play variations of the same score that consist of exactly the same sequence of notes (only their placement within the meter was changed). This result rules out the possibility of deviations due to motor incapacities, and shows the influence of meter on the performance of a score. Other studies have shown systematic deviations in performances that were played with different moods or expressive intentions [Rigg, 1964; Gabrielsson, 1995; De Poli et al., 2001].
2.2 Functions of Expressivity
Insofar as performance deviations are intentional (that is, they originate from cognitive and affective sources, as opposed to e.g. motor sources), they are commonly thought of as conveying musical expressivity. But what is it that is being expressed? Two main functions of musical expressivity are generally recognized. We will address both functions.
2.2.1 Expressivity and Musical Structure
Firstly, expressivity serves to clarify the musical structure (in the broad sense of the word: this includes metrical structure, but also the phrasing of a musical piece, harmonic structure, etc.). Sloboda [1983] showed the influence of metrical structure on performances by having pianists perform identical sequences of notes, differing only in the position of the bar lines. For instance, the pianists varied their performances such that the notes at the beginning of measures were played louder and more legato than other notes. Furthermore, he observed that the more advanced the performers were, the more they utilized this kind of expressivity, and the better listeners were able to transcribe the performed music correctly. This is a clear indication that expressivity clarifies metrical structure.

Phrase structure also has a salient effect on performance. Phrases were found to start and end slowly, and be faster in the middle [Henderson, 1937]. Moreover, Todd [1985, 1989] devised a model that predicts the level of rubato of a given musical piece, given a hierarchical grouping structure of the piece. The predictions of this model are similar to the rubato patterns in professional performances of the piece. Gabrielsson [1987] found that pianists performing Mozart's Piano Sonata K. 331 tended to lengthen note durations considerably at the end of phrases. Similarly, he found the tones to be relatively loud in the middle of the phrases and relatively soft at the beginning and end of the phrase.

Another form of structure that influences performance is harmonic and melodic tension. Harmonic and melodic tension are commonly defined by reference to the circle of fifths (where the tension is low for notes or chords that are close together on the circle of fifths and high for those that are far apart) [Lerdahl, 1996]. Palmer [1996] calculated a positive correlation between note lengthening and tension. In contrast, no correlation was found between note intensity and tension. In the same article, Palmer showed that melodic expectancy, the extent to which an implied continuation of a melody is actually realized, did correlate with note intensity (unexpected notes were played louder), but not with note lengthening. As an explanation of the fact that the expression of tension-relaxation and melodic expectancy are realized in unrelated ways, Palmer observes that the two phenomena manifest themselves on different time scales: tension-relaxation is a phenomenon at the level of phrases and sub-phrases (that is, a large time scale), whereas melodic expectancy is manifested from note to note, i.e. on a smaller time scale.

In a general survey of the relation between expressivity and musical structure, Clarke [1988] proposes the interesting view that expressivity is tied to structure by a limited set of rules (like the rules proposed by Sundberg et al. [1991]; Friberg [1991], see subsection 2.4.3). Hence, the diversity of ways in which a piece can be played is not due to ambiguous rules for expressivity, but due to the diversity of ways in which the music can be structurally interpreted (i.e. ambiguous musical structure). In this context, Clarke notes the practice of live performances of jazz standards. In this practice, the music that is played belongs to a widely known and fixed repertoire.
Therefore the audience is usually acquainted with the music, and the expressiveness of the performance is not constrained by the requirement that it should clarify the basic musical structure to the listeners. The musicians can thus freely vary their expressiveness to surprise the audience through an unexpected musical interpretation of the piece.

Another structural aspect of music that has been found to influence musical expressivity is the melody. In ensemble performance (but also in polyphonic piano performances), the voice that plays the melody tends to be slightly (by around 20–30 ms) ahead of the accompaniment [Rasch, 1979; Palmer, 1988]. The purpose of this melody lead is presumed to be the avoidance of masking of the melody by the accompaniment, and the facilitation of voice separation in human perception.

In a study of performances of jazz melodies by well-known jazz musicians, Ashley
[2002] has shown that the patterns of rubato tend to be related to motivic structure. He also found that note displacement was related to the underlying harmony (chord tones are more displaced than non-chord tones).

For more extensive surveys of music performance research in relation to structural aspects, see [Friberg and Battel, 2002; Clarke, 1991; Palmer, 1997].
2.2.2 Expressivity and Emotional Content
Secondly, expressivity serves as a way of communicating or accentuating affective content. Langer [1953] proposed the view that the structure of music and the structure of moods or feelings are isomorphic. Langer observes that similar properties are ascribed to music and emotional life (such as 'excitation' and 'relief'). Another influential theory about music and meaning (which subsumes emotion) is that of Meyer [1956]. He states that meaning (be it emotional or aesthetic) arises in music when expectations raised by the music are not realized. Early work investigating the relation between emotional character and musical structure is reviewed by Rigg [1964]. Some typical regularities were found, e.g.: solemn music is often played slowly, low-pitched, and avoiding irregular rhythms and dissonant harmonies; happy music is played fast and high-pitched, contains little dissonance, and is in major mode; exciting music is performed fast and loud and is apt to contain dissonance [Gabrielsson, 1995] (another mapping between emotional characters and musical cues can be found in [Juslin, 2001]).

Gabrielsson [1995] and Lindström [1992] have studied the relation between emotional intentions and microstructure in music (e.g. timing deviations, intensity changes and articulation). They compared versions of "Oh, my darling Clementine", played with emphasis on different emotional characters ('happy', 'sad', 'solemn', 'angry', 'soft', 'expressionless'). Their results with regard to the overall properties tempo and loudness were mainly in accord with previous results: angry and happy versions were played faster than sad, soft, and solemn versions; angry versions were played loudest. With respect to the microstructure they also found clear differentiation between the different versions, notably in the variance of articulation and loudness. Duration ratios in the rhythm were also different for different emotional characters (the performers were given great freedom of performance; only the pitch sequence was to be kept intact).

Similar results have been found by Canazza et al. [1997a], who have studied how physical parameters of recorded performances (e.g. timbre, and temporal parameters like articulation and global tempo) are affected by varying expressive intentions of the musician. The performer was told to communicate a particular mood through his performance, described by a sensorial term, e.g. 'heavy', 'light', 'bright', or 'dark'. The sonological analysis of the recordings made it possible to tie particular parametric values to particular moods (for instance, a light performance turned out to be in fast tempo, with shortened note durations and soft attacks). The results were validated by performing listening tests on synthetic performances that were generated with the physical parameter values corresponding to particular moods. The subjects were able to recognize the intended mood in the synthesized performances.

In a related perceptual analysis of listener judgments of expressed moods in musical performances, Canazza et al. [1997b] found that there is a large degree of consistency between the mood ratings of the listeners. They concluded that tempo and note attack time are two important factors by which listeners rank the sensorial terms (such as those mentioned above). Moods like 'heavy' were opposed to 'light' and 'bright' on the scale of tempo, whereas 'soft', 'dark' and 'hard' were distinguished from each other on the scale of attack time.
See [Gabrielsson, 1999, 2003] for more general reviews on music performance research, including the role of expressivity in the communication of emotions.
2.3 Definitions of Expressivity
Until now, we have used the term expressivity in performances loosely as deviations or irregularities that occur when a score is performed by a musician. But deviations or irregularities with respect to what? Several definitions have been proposed [Timmers and Honing, 2002]:
Expressivity as Microstructure (EM) Firstly, there is the definition of 'expressivity as microstructure' [Repp, 1990; Clynes, 1983]. This definition conceives of expression as everything that is left unspecified in the score. This definition does not view expressivity strictly as deviations from a standard. Rather, it holds that the score only specifies the macro structure of the music, leaving undecided how the low level attributes of the macro elements should be realized. To quantify this microstructure, however, at least with regard to the timing of notes, the difference with nominal values is computed [Repp, 1995a]. As a result, although conceptually different, this definition is in practice equivalent to the following definition of expressivity.
Expressivity as Deviation from the Score (EDS) The second, and most common, definition is 'expressivity as deviation from a musical score'. This definition was employed already in early music performance research [Seashore, 1938], and was stated more
explicitly by Gabrielsson [1987]. In this definition expressivity is regarded as the deviation from the norm dictated by notation, in terms of intrinsic note attributes like pitch, timbre, timing, and dynamics.
In spite of its intuitive appeal, this definition has been criticized [Huron, 1988] for stemming from an exaggerated distinction1 between content and form. Another criticism is that knowing the score is not indispensable for listeners to appreciate expressiveness [Desain and Honing, 1991].
Expressivity as Deviation within the Performance (EDP) As an alternative to the EDS definition, Desain and Honing proposed to define expressivity as deviation within a performance [Desain and Honing, 1991]. More precisely, 'expression is the deviation of a lower order unit from the norm as set by a higher order unit' [Timmers and Honing, 2002]. This definition thus assumes a hierarchical description of the musical piece. For example, the deviation of a beat duration can be related to the deviation of the duration of the enveloping bar.
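As an illustration of the EDP definition (a simplified sketch of the idea, not a formulation taken from the cited work), the timing of each performed beat can be expressed relative to the norm set by its enveloping bar:

    def beat_deviations(beat_durations):
        # Express each beat duration relative to the norm set by the bar,
        # here taken to be the mean beat duration of that bar. A value
        # above 1 means the beat was locally slower than the bar as a whole.
        bar_norm = sum(beat_durations) / len(beat_durations)
        return [d / bar_norm for d in beat_durations]

    # A 4/4 bar performed with a slight ritardando on the last beat:
    print(beat_deviations([0.50, 0.50, 0.52, 0.58]))  # roughly [0.95, 0.95, 0.99, 1.10]

No reference to a notated score is needed here: the norm comes from a higher order unit of the performance itself.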
Although we do acknowledge the problems with EDS, we believe their impact is limited. For example, much popular and jazz music has a fixed-tempo accompaniment that steadily marks the beat. This means that the timing deviations in the melody will be perceived with respect to this beat, and effectively, with respect to the nominal score. In other words, in restricted contexts, particularly where the melody has a regular background and the (performance) style is familiar to the listener, the listener's conception of the melody she hears may correspond closely to the score. As a consequence, the definition of expressivity as deviations from the score may (in such contexts) be an acceptable approximation of a more sophisticated definition of expressivity.
However, we note another problem with the usual interpretation of EDS in the research that we have encountered: the deviations with respect to the score are defined on a note-to-note basis. We have found this interpretation of expressivity to be too limited to accurately represent the deviations encountered in the actual musical performances that we use in the research presented in chapters 3 to 6. Therefore, we propose an extension to the standard note-to-note fashion of computing performance deviations from the score (section 3.6).
1Huron argues that this strict separation may arise from the Western tradition of the division of labor between composers and performers.
2.4 Computational Models of Expressivity
A natural step in the study of expressive music performance is the development of computational models of expressivity. Generally speaking, such models are intended to express and clarify the structure and regularities that occur in musical expressivity. In particular, they seek to establish a relation between the form that expressivity takes and the musical context. Such contextual aspects include characteristics of the melody like phrase structure and meter. A number of such models have been proposed in the past decades. Due to the richness and complexity of the subject, the proposed models are typically limited to a few aspects of expressivity, or to a particular musical style. Apart from differences in area of focus, there is also divergence in methodologies. In the rest of this section we briefly discuss various methodologies, and highlight their corresponding research and models.
2.4.1 Common Prerequisites
Every methodology for constructing models of expressivity has certain requirements, especially with regard to data:
Recordings of Performances The most essential part of nearly all methodologies is the availability of recorded performances. Such data are commonly gathered through one of the following methods:
Audio Recording The most straightforward approach to gathering performance data is to record the acoustical signal that the instrument emits when the musician plays. The result is a (discretized) continuous audio signal.
MIDI Recording Performance data can also be gathered in MIDI format, when the musician plays a MIDI instrument. The output of the digital instrument is a non-continuous series of events symbolically describing the states of the instrument.
Recording of Augmented Instruments When there is no satisfactory MIDI implementation of the instrument, a solution is to augment the acoustical instrument with sensors that capture for example finger or lip pressure (depending on the type of instrument). In this way, detailed and multi-channel information can be gathered in a non-intrusive way2.
2Very few professional musicians feel comfortable with a MIDI implementation of their instrument, since the tactile characteristics, the 'feel' of the instrument, are very distinct. This holds notably for grand piano vs. MIDI keyboard.
Recorded performances are not strictly necessary for the analysis by synthesis approach, which in turn relies strongly on synthesis of performances. See [Goebl et al., 2005] for a more complete survey of performance data acquisition methods.
Annotation / Extraction of Expressivity It is uncommon that recorded performances are used directly for the modeling of expressivity. Usually, an extensive and often manual processing of the data is necessary to annotate the performed data so that the information of interest (e.g. dynamics and timing deviations per note) is easily accessible. This step requires most effort when the performance is recorded as audio. Automatic pitch and onset detection algorithms (e.g. [Goto, 2004; Gómez et al., 2003b; Klapuri, 1999]) can be of help to obtain a description of the performed melody, after which a symbolic performance-to-score alignment can be performed [Desain et al., 1997; Arcos et al., 2003]. Integrated melodic extraction and score alignment are also common (e.g. [Dannenberg and Hu, 2003; Orio and Schwarz, 2001]).
Musical Expertise The analysis by synthesis approach in particular relies on the a priori postulation of hypotheses in the form of rules for applying expressivity based on the score. Since the space of possible hypotheses is so large, musical intuition and experience are indispensable to guide the generation of hypotheses. In other approaches the role of musical expertise is less explicit, but the choice of representation schemes for score and performance/expressivity is critical for a successful model and requires a certain degree of musical knowledge as well.
2.4.2 Analysis by Measurement
The analysis by measurement approach in essence tries to construct models for expressive phenomena that are revealed by measuring expressivity in a set of recorded musical performances. A common way to construct models is to hypothesize a particular relation between some aspect of the score and some type of deviation (in, for example, the timing of the notes) and test the statistical significance of the hypothesis on the data.

Many expressive effects have been addressed individually. Perhaps most widely studied are expressive deviations of performed notes in terms of timing3, dynamics, and, to a slightly lesser degree, articulation. Todd has proposed a simple model of the relation between timing and dynamics stating that notes are played ‘the faster, the louder’ and vice versa [Todd, 1992]. Subsequent research has shown that this relation is too simplistic to account for a wider range of observations [Windsor and Clarke, 1997]. In [Drake and Palmer, 1993], timing, dynamics and articulation were found to accentuate melodic turns (peaks or valleys in the melodic contour) and rhythmic grouping; accentuation through these variables is obtained by playing louder, delaying notes, and playing staccato, respectively. In a study of Beethoven Minuet performances by 19 famous pianists, Repp found a large amount of common variation in expressive timing, in part as a response to phrase endings and melodic inflections [Repp, 1990]. He provided (weak) counter-evidence against the existence of a ‘composer’s pulse’ [Clynes, 1983] (see next subsection). Special attention has been paid to the timing of musical ornaments (that is, notated ornaments in classical music). It was shown that the timing of ornaments is of a different nature than the timing of regular notes [Timmers et al., 2002]. In particular, the timing of grace notes scales to a lesser degree with tempo than that of ordinary notes.

A notable case of analysis by measurement is a mathematical model for rubato [Todd, 1985, 1989]. In accordance with other studies [Palmer, 1989; Repp, 1995a], Todd argues that rubato, the pattern of speeding up and slowing down during a performance, is at least partly determined by the phrase structure of the music. To demonstrate this, he designed a model that predicts a rubato curve based on the time-span reduction of a musical piece [Lerdahl and Jackendoff, 1983] (see subsection 3.4.1). The system maps parabolic functions to the different subtrees of the global time-span reduction tree, and adds these functions to obtain a final rubato curve. Apart from phrase structure, other contextual factors of the music also have an effect on rubato, such as metrical structure and the presence of a second melody [Timmers et al., 2000].

A specific form of rubato is the final ritard, the slowing down that frequently occurs at the end of a piece. Several studies [Todd, 1995; Friberg and Sundberg, 1999] establish an analogy between the final ritard and the motion of physical bodies, arguing that their dynamics are very similar. This kinematic model of the final ritard has been criticized for not taking into account any musical context, in particular rhythm [Honing, 2003]. Honing et al. argue that, despite the elegance of the mechanical model, a model for the final ritard might rather take the form of a set of constraints.

Timmers has fit a regression model to predict the subjective quality ratings of continued performances based on factors including the rubato, dynamics, and articulation of the initial part of the performance [Timmers, 2003].

3 It should be noted that the term timing in some studies is used to refer either to incidental changes of the onset/offset of individual notes, to the local tempo at which the melody is performed, or to both.
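The parabola-summing idea behind Todd's rubato model can be sketched in a few lines. The code below is a minimal illustration under our own simplifying assumptions (normalized score time, hand-picked segment boundaries and weights); it is not Todd's actual formulation:

    import numpy as np

    def rubato_curve(segments, n_points=100):
        """Sum one parabola per (start, end, weight) segment of the
        time-span hierarchy; higher levels get larger weights."""
        t = np.linspace(0.0, 1.0, n_points)            # normalized score time
        curve = np.zeros(n_points)
        for start, end, weight in segments:
            inside = (t >= start) & (t <= end)
            x = (t[inside] - start) / (end - start)    # position 0..1 within segment
            curve[inside] += weight * 4 * x * (1 - x)  # parabola peaking mid-segment
        return curve

    # Hypothetical two-level structure: one whole phrase, two sub-phrases.
    segments = [(0.0, 1.0, 1.0), (0.0, 0.5, 0.4), (0.5, 1.0, 0.4)]
    tempo_factor = 1.0 + rubato_curve(segments)        # faster mid-phrase, slower at boundaries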
2.4.3 Analysis by Synthesis
One approach to the automatic performance of music is to construct a set of performance principles allowing for the reconstruction of real expressive performances: that is, a grammar that describes the structure of musical expression in terms of the musical score. This approach has been taken by Friberg [1991]; Sundberg et al. [1991]. They have defined a set of context-dependent rules that prescribe small deviations in the timing and dynamics of notes, based on their musical context. The rules can act in ensemble to sequentially process a sequence of notes and synthesize an expressive performance. In this way, the validity of the rules can be checked, either by judging the musical acceptability of the synthesized performance, or by rating the quantitative difference of expressive deviations from a human performance of the same piece of music. This approach is called analysis-by-synthesis, referring to the procedure of analyzing musical expressivity by synthetically mimicking it. In this approach, the factors contributing to musical expressivity are validated by judging their effects on performances. The rules have been incorporated into a software application called Director Musices (DM), which processes MIDI files to generate expressive performances.
The rules were developed with the help of a musical expert (L. Frydén, a music performer and music teacher). The method of rule development is as follows: an intuition of the expert about the effect of a particular aspect of a musical score on the performance of that score is translated into an initial rule. This rule is used to synthesize an ‘expressive’ MIDI performance of the score. The expert judges the result and adjusts his instructions if the result is unsatisfactory. Thus, performance rules evolve in an iterative process of synthesis, judgment, and adjustment. The rules affect the duration, overall sound level (dynamics), time envelope of the sound level (articulation), vibrato speed and depth, and fine tuning of the pitch. The rules have a conditional part that specifies the target notes to which the rule applies. Furthermore, each rule has a corresponding quantity parameter, specifying how large the effect of the rule should be (i.e. a scaling factor for the deviations prescribed by the rule).
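This rule format is easy to illustrate in code: a conditional part selecting target notes, a prescribed deviation, and a quantity parameter scaling the effect. The following toy rule is our own invented example in the spirit of the DM rule format, not one of the actual Director Musices rules:

    def leap_lengthening(notes, quantity=1.0):
        """Toy performance rule: lengthen a note (by up to 10%) when it is
        followed by an upward leap of more than 4 semitones. 'quantity'
        scales the size of the effect, as in the DM rule format."""
        for prev, nxt in zip(notes, notes[1:]):
            if nxt["pitch"] - prev["pitch"] > 4:          # conditional part
                prev["duration"] *= 1.0 + 0.1 * quantity  # prescribed deviation

    notes = [{"pitch": 60, "duration": 1.0}, {"pitch": 67, "duration": 1.0}]
    leap_lengthening(notes, quantity=0.5)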
An interesting hypothesis that has led to further experimentation is that emotional colorings of performances may correspond to specific parameter settings for the rule set. Subjects were able to recognize intended emotions by listening to performances that were generated with a specific ‘rule palette’ to suggest a particular emotion (e.g. anger, tenderness, sadness, etc.) [Bresin and Friberg, 2000].
Clynes has argued that the timing of classical pieces is in part dependent on the composer. For a number of famous composers from the Classical and Romantic periods (e.g. Beethoven and Mozart), he established a ‘composer’s pulse’, a pattern of timing and dynamics deviations that is characteristic of authentic performances of the works of that composer [Clynes, 1983, 1995]. He applied an analysis by synthesis approach to evaluate different combinations of pulses and composers.
2.4.4 Automatic Rule Induction
The development of machine learning techniques for the discovery of knowledge from data [Mitchell, 1997] has also had an impact on expressive music performance research. Techniques such as inductive logic programming [Muggleton, 1991; Quinlan, 1990] and decision tree learning [Quinlan, 1986; Mitchell, 1997] have proved useful for deriving predictive, and to some extent explanatory, models for music performance [Widmer, 2000]. Widmer has derived a number of performance principles from a large set of piano performances of Mozart sonatas. The models are typically in the form of rules that predict expressive deviations based on local score context. An advantage of this method over analysis by synthesis is that no prior hypotheses have to be specified; rather, the hypotheses are extracted from the data. This circumvents a possible bias in the generation of hypotheses due to limitations in the ability of musical experts to verbalize their musical knowledge. On the other hand, another kind of bias may arise, namely bias due to the form of representing the musical data, and the representation language of the extracted hypotheses. Another drawback of this method is that score/expressivity regularities are only detected if they are statistically significant, although infrequent (and thus statistically insignificant) expressive phenomena may still be meaningful4.

Model generation using inductive algorithms is subject to a trade-off. On the one hand, predictive models can be derived to maximize predictive power; such models provide high predictive accuracy through the induction of a large set of highly complex rules, which are generally unintelligible. On the other hand, the induction of descriptive models involves a conservative approach to the introduction of rules, trying to keep the complexity of the rules as low as possible, at the cost of reduced accuracy.

A specialized algorithm for learning rules from data, PLCG, was proposed by Widmer [2003]. This algorithm is oriented towards finding simple and robust classification rules in complex and noisy data. This is achieved by first learning many rules that each cover a subset of the data, and clustering the learned rules so that similar rules (syntactically and semantically) are in the same cluster; then, for each cluster of rules, a new rule is formed, generalizing the rules in the cluster. From the resulting set of rules, the rules that have the highest percentage of correct predictions are selected. In this way the obtained rule set consists of a modest number of rules with a reasonable degree of generality.

Ramirez and Hazan [2004] have compared k-nearest neighbor, support vector machine, and tree-learning (C4.5) techniques for predicting the durations of notes in jazz performances (using the classes ‘lengthen’, ‘shorten’, and ‘same’). They used the same set of performances as used in this dissertation, and also included the Implication-Realization analysis of the melody (see subsection 3.4.2) as part of the data instance representation. Some rules with general validity were induced that linked the timing of notes to the I-R structures. Structural descriptions of the melodic surface as part of the instance representation were also used for rule induction by Widmer [1996]. In another study, Hazan et al. [2006a,b] have proposed an evolutionary generative regression tree model for the expressive rendering of melodies. The model is learned by an evolutionary process over a population of candidate models. The quality of rendered melodies was computed both as the root mean square error and as a tuned edit-distance with respect to a human target performance of the melody.

A machine learning approach has also been adopted in an attempt to classify the expressive style of various professional pianists [Goebl et al., 2004; Widmer, 2002]. Performances were first represented as curves in the local tempo/dynamics plane. Those curves were segmented and the segments were clustered. The curve segments of the most important clusters were collapsed into prototype curve-segments. This yielded an alphabet of characteristic curve-forms that was used to represent a performance as a string of symbols from the alphabet. Standard pattern matching techniques were subsequently employed to discover commonalities in the performances of the same pianist.

4 This limitation of inductive approaches in the context of expressive music performance research was mentioned by Johann Sundberg during the Music Performance panel held at the MOSART workshop, Barcelona, 2001.
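For a concrete impression of the kind of classifier comparison performed by Ramirez and Hazan [2004] above, the following sketch uses the scikit-learn library to compare k-nearest neighbor, support vector machine, and decision tree classifiers on the three-class duration task. The per-note feature set and the tiny data set are invented for illustration only:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical per-note features: [duration (beats), metrical strength,
    # pitch interval to next note, I-R structure label (encoded as int)].
    X = np.array([[1.0, 1.0,  2, 0], [0.5, 0.5, -1, 1], [2.0, 1.0, 0, 2],
                  [1.5, 0.5,  4, 0], [0.25, 0.25, -2, 1], [1.0, 0.5, 0, 2]])
    y = np.array(["lengthen", "shorten", "same",
                  "lengthen", "shorten", "same"])      # target classes

    for model in (KNeighborsClassifier(n_neighbors=1), SVC(), DecisionTreeClassifier()):
        scores = cross_val_score(model, X, y, cv=2)    # 2-fold cross-validation
        print(type(model).__name__, scores.mean())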
2.4.5 Case Based Reasoning
An alternative to the automatic rule induction techniques described above is case based reasoning (CBR). It has in common with rule induction that it uses example data to predict expressivity for unseen melodic material, but CBR employs a different strategy: instead of generating a set of rules in an analysis phase, it uses the most similar examples (called cases) directly to process new melodic material. We explain the case based reasoning process in more detail in section 2.6. Here we present related case based reasoning approaches to the generation of musical expressivity.

SaxEx [Arcos et al., 1998] is a CBR system that generates expressive music performances by using previously performed melodies as examples. The work presented in this document springs from and extends this previous work. Hence, we will address some of the topics involved (especially CBR and musical models) in more detail in subsequent sections, and in the following overview of SaxEx, we will mention those topics only briefly, referring to the relevant sections for further details.

The goal of SaxEx was to transform inexpressive performances into expressive performances, allowing user control over the nature of the expressivity, in terms of expressive labels like ‘tender’, ‘aggressive’, ‘sad’, and ‘joyful’. The input to the system is a musical phrase in MIDI format, and the audio recording of an inexpressive performance of the phrase to be played expressively. To obtain appropriate values for the expressive features of the performance, case based reasoning (see section 2.6) is used. Thus, a case base is employed, containing examples of expressively played phrases. For each note of the input phrase, a set of similar notes is retrieved from the case base. This retrieval step employs different perspectives on the phrases. One perspective used in SaxEx refers to the musical analysis of the phrase, using music theories such as the Implication-Realization model and GTTM (see subsection 3.4.2). Such analyses are useful to determine the role of notes in their musical contexts and can be used to assess similarities between notes. Another perspective is constructed using the desired expressive character of the output performance. This requirement is specified in advance by the user, on three bipolar scales (tender-aggressive, sad-joyful, and calm-restless, respectively). Each pole on these scales is associated with particular expressive values. In this way, the affective requirement perspective is used to bias retrieval towards performances in the case base that meet those requirements.

The inexpressive performance sound file is analyzed using spectral modeling synthesis techniques (SMS) [Serra, 1997], yielding a parametric description of the audio. The parameters, which correspond to expressive features like note onsets, dynamics, and vibrato, are changed according to the expressive feature values derived from earlier examples by CBR, and the sound is re-synthesized with the new parameter values using SMS, resulting in an expressive performance.

Note that although, strictly speaking, SaxEx performs a transformation of a performance, it uses the input performance only as a basis for the re-synthesis. The CBR process does include expressive features as a part of the case descriptions. On the expressive level the task of SaxEx is therefore performance generation rather than transformation.
Another example of a CBR system for expressive music performance is Kagurame, proposed by Suzuki [2003]. This system renders expressive performances of MIDI scores, given performance conditions that specify the desired characteristics of the performance. Kagurame employs the edit-distance for performance-score alignment, but it discards deletions/insertions and retains just the matched elements, in order to build a list of timing/dynamics deviations that represent the performance. Furthermore, its score segmentation approach is a hierarchical binary division of the piece into equal parts; the obtained segments thus do not reflect melodic structure. Kagurame operates on polyphonic MIDI, and the expressive parameters it manipulates are local tempo, durations, dynamics, and chord-spread.

Tobudic and Widmer have taken a k-nearest neighbor approach (akin to CBR) to predict timing and dynamics deviations for Mozart sonatas [Tobudic and Widmer, 2003]. They constructed a hierarchical phrase analysis for every melody and, for every level in the hierarchy, a polynomial function is fitted to the timing and dynamics curves. The lower level polynomials are fitted to the residual curves that remain after fitting the higher level polynomials. Training instances are defined to contain phrase constituents of the melody, with attributes describing overall properties of the melody fragment, for example the phrase length and the start-end pitch interval. The similarity between a training instance and a target instance is defined as the inverse of the Euclidean distance between the attribute vectors; the solution consists of the parameters of the polynomial tempo and dynamics curves. The initially poor prediction results could be improved by averaging the solutions over the 10 nearest neighbors instead of adopting the solution of the single nearest instance as is. The results were also improved by taking into account only the lowest three levels. Finally, results were improved by additionally employing the PLCG algorithm (see subsection 2.4.4) to derive rules that account for the residual dynamics and timing curves after subtracting all polynomials. Tobudic and Widmer have extended this approach by replacing the attribute-value instance representation scheme with a first order predicate representation scheme [Tobudic and Widmer, 2004]. The distance measure for instances was a maximal matching set distance: the distance between two literals was computed as before (i.e. the Euclidean distance between the arguments of a predicate), and sets of literals were compared by computing the maximal match between literals and adding a constant cost for each unmatched literal in either of the sets.
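The level-wise curve fitting of Tobudic and Widmer can be illustrated compactly: fit a polynomial per phrase segment at the top level, then fit the next level to the residual, and so on. The sketch below, using numpy, reflects this idea under arbitrary illustrative choices (polynomial degree, two-level segmentation, synthetic tempo curve):

    import numpy as np

    def fit_levels(times, tempo, levels, degree=2):
        """Fit one polynomial per phrase segment, level by level;
        each level is fitted to the residual of the levels above it."""
        residual = tempo.astype(float).copy()
        fits = []
        for segments in levels:                      # each level: list of (lo, hi) index ranges
            level_fit = np.zeros_like(residual)
            for lo, hi in segments:
                coefs = np.polyfit(times[lo:hi], residual[lo:hi], degree)
                level_fit[lo:hi] = np.polyval(coefs, times[lo:hi])
            fits.append(level_fit)
            residual -= level_fit
        return fits, residual                        # residual: what rule learning would model

    times = np.linspace(0, 8, 32)                    # score time in beats
    tempo = 1.0 + 0.2 * np.sin(times)                # stand-in for a measured tempo curve
    levels = [[(0, 32)], [(0, 16), (16, 32)]]        # whole phrase, then two sub-phrases
    fits, residual = fit_levels(times, tempo, levels)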
2.4.6 Evaluation of Expressive Models
An important question in the computational modeling of musical expressivity is how models should be evaluated. Clearly, the evaluation depends on the aims of the model.

Descriptive or explanatory models are intended to explain expressive variations of the performance by hypothesized principles. In accordance with the criteria for evaluating scientific theories in general, they should have explanatory power (account for as much variation as possible) and be parsimonious at the same time. As mentioned before, expressivity is thought to arise from many different factors, such as the communication/expression of emotions/affects, and various forms of musical structure. Many existing models cover only a single aspect. This has led Desain and Honing to propose an evaluation technique that consists in fitting complementary partial models for expressivity jointly to performance data [Desain and Honing, 1997]. Alternative (sets of) models can then be compared by the amount of variation in the performance data they explain after optimization. Although theoretically appealing, this approach relies on some strong assumptions, such as the assumption that the partial models jointly represent all sources of expressivity, and that a poor explanation of variance by a model is not caused by sub-optimal parameter settings.

Perceptual or cognitive models, on the other hand, describe the cognitive processes involved in listening to expressive performances, such as the emergence of perceptual categories, for example in rhythm perception, or the mental association of rubato with physical motion. Such models may be validated by testing hypotheses derived from the model. Hypotheses could for example predict subject response times in recognition tasks, or particular phenomena in EEG signals.

Most relevant for this dissertation, however, is the evaluation of predictive models, which predict deviations of note attributes such as timing and/or dynamics, and rubato, given the score. The evaluation of such models is most naturally based on the prediction result of the model. Generally, two distinct evaluation approaches can be identified. Firstly, one can measure how different the predicted performance is quantitatively from a target performance. The prediction accuracy is thus used as an indicator of the model quality. In this apparently simple approach, some issues must be addressed:

What is the right target performance? Although expressive performance research throughout the years has revealed some clear regularities in the form of expressivity, it is generally acknowledged that musical expressivity is not completely determined by such regularities, and that there is a considerable degree of freedom for the performer to play a melody in a variety of musically acceptable ways, which can differ substantially in quantitative terms. To evaluate a predicted performance for a given score by comparing it to just a single target performance can therefore be overly restrictive. An accurate prediction for this particular target performance is of course a satisfactory result, but when the prediction is not accurate, there may still be another (unknown) target performance that it resembles.

How to evaluate partial models? Another problem when assessing models by comparing their results to a real performance is that most models of expressivity only model a small subset of the possible sources of expressivity.
That is, the predicted performance contains only the expressive deviations that arise from the modeled sources (e.g. metrical structure). However, any performance played by a human musician is bound to contain expressive deviations that arise from a much wider range of sources. The resulting pattern of expressive deviations is constituted by overlaying the patterns due to the individual sources.

A second approach to evaluating predictive expressive performance models is to rate how well the generated performances sound in themselves. The advantages of such an approach are that the arbitrariness of choosing a target performance is avoided, and also that the evaluation does not depend on a numerical comparison between expressive performances that may not be representative of human perception. It does introduce new questions, however:

Subjectivity The rating of the ‘quality’ of a performance is inherently a subjective task. The rating subjects may have different notions of the concept of quality, and it is unclear how to interpret a lack of consensus in the ratings of different subjects. A panel of subjects who discuss their opinions to form a consensus may resolve the latter problem.

Which subjects? A related question is which subjects should be chosen. There is evidence that, at least for some musical listening tasks, results diverge for trained and untrained musical listeners [Lamont and Dibben, 2001; Canazza et al., 1997b; Granot and Donchin, 2002]. Differences in rating may arise from different levels of musical background knowledge, or differences in selectivity of perception.

The Rendering Contest (Rencon) [Hiraga et al., 2004] is an annual event, started in 2002, that strives to evaluate computer generated performances by human judgment. Contestants participate by rendering a given musical piece (semi-)automatically using a predictive performance model. In a loose sense the contest employs a Turing test paradigm; that is, a successful system is one that renders performances in such a way that they are believed to be human performances. In practice, since most automatically rendered performances are still easily discerned from human performances, the contestant performances are ranked according to listener preference. The judgments are made by the audience of the event, and the ranking of the contestant performances is established by voting. The contest has a compulsory section, where a small number of fixed musical pieces (piano pieces by Mozart and Chopin) must be performed. The rendered expressive performance is delivered in MIDI format, which is synthesized to audio using a high quality sampled piano instrument. An open section was also added to the contest to allow for the participation of a wider range of models, for example models that focus on the rendering of monophonic melodies, or that render expressivity for instruments other than the piano, e.g. the human voice or wind instruments. The comparison of contestant performances of different pieces may be problematic, because the listener's aesthetic perception of the performance is likely to be influenced mainly by the piece rather than by the way it is performed.

To summarize, the evaluation of expressive performance models is a complex topic that allows for different perspectives. We have pointed out two major evaluation strategies and some problems that they must address. In part these are practical problems, such as the problem of gathering reference data, and the incompleteness of models.
Although rendering systems exist for diverse types of music and instrumentation, the number of participating systems in the Rendering Contest is not yet large enough to establish fixed compulsory sections for different musical styles and instruments. Other problems are more conceptual. For example, should models be quantitatively or qualitatively assessed, and if they are assessed quantitatively, what is the point of reference for comparison?
2.5 Melodic Similarity
The assessment of similarity between melodies is a topic of wide interest, partly due to its commercially relevant applications such as music recommender systems, composer attribution, and plagiarism detection. In music analysis, and in music processing systems that involve the comparison of musical material, such as the one presented in this dissertation, melodic similarity is also an important issue.

Despite its acknowledged importance, the concept of melodic similarity is hard to pin down, firstly because it is of an intrinsically subjective nature, and secondly because melodic similarity is strongly multi-faceted, implying that two melodies can be similar in one respect but dissimilar in another. Because of this, it is difficult to speak of melodic similarity in a universal and unambiguous manner.

Nevertheless, useful research has been done on the topic, including theoretical, empirical, and computational studies. For example, Deliège [1997] proposed a theory of the cognitive processes underlying human similarity judgments of music. She states that when humans hear music, they abstract from what they hear by means of cues. She defines a cue as ‘a salient element that is prominent at the musical surface’ (Deliège [2001], page 237). This concept is akin to Gestalt psychology, since it conceives of the perception of music as having a background (the normal flow of music) and a foreground (salient elements in this flow). Memorization is realized through the cues, which are the parts of the music that are stored in the listener's long-term memory. The cues serve as a ‘summary’ of the music. The similarity judgment between two fragments of music, according to Deliège, is based on the similarity (or identity) of cues. This is implied by the fact that music is memorized as cues, since similarity judgments involve the listener's memory of the music (the fragments of music compared are not heard at the same time, and may not be heard at the moment of judgment at all). In particular, the perception of different musical fragments as variations of each other is thought to be caused by the fact that the fragments give rise to identical cues. In this way, the concept of cues can form the basis of a theory of categorization of music, where musical motifs can belong to the same category based on the perceptual cues they provide, with varying degrees of prototypicality.
Lamont and Dibben [2001] have performed listening experiments to investigate melodic similarity from a perceptual point of view. They are interested in questions such as: Do listeners perceive similarity while listening? Is perceived similarity based on deep, thematic relations between the musical fragments, or rather on shared surface attributes of the music? Do listeners employ the same similarity criteria across different styles of music? Are the employed criteria the same for listeners with different levels of musical experience? The listening experiments involved both trained musicians and non-musicians as subjects, who were asked to rate the similarity of pairs of musical extracts. Based on the results, Lamont and Dibben concluded that similarity perceptions in general (i.e. across musical styles) are based on surface features such as contour, rhythmic organization, texture, and orchestration. Remarkably, they concluded that ‘deep’ structural relationships, which are often regarded as decisive for the similarity of musical motifs, do not seem to have a significant impact on listeners' similarity judgments.

Scheirer [2000] has measured perceived similarity for short polyphonic audio fragments. He uses a multiple regression model to predict listeners' similarity ratings based on psycho-acoustic features of the audio. In contrast to the previous experiment, this experiment also takes into account non-melodic features of the music, such as orchestration.

A variety of computational methods exist to quantify the degree of similarity between two melodies. We will summarize a selection of methods:
Edit-Distance A method based on string-matching techniques used in other fields such as genetics and text-mining (see section 4.2). The edit-distance has proved useful in a variety of computer music applications such as score-following [Dannenberg, 1984], performance error-detection [Large, 1993], and pattern-detection [Rolland, 1998]. It has also been used to compute melodic similarity, e.g. for melody retrieval or melodic clustering [Mongeau and Sankoff, 1990; Smith et al., 1998; Lemström and Ukkonen, 2000; Grachten et al., 2005a, 2004b].
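As an illustration, the sketch below computes a bare-bones edit-distance between two melodies represented as (pitch, duration) pairs. The substitution cost is a toy weighting of pitch and duration differences; the full Mongeau-Sankoff scheme additionally includes fragmentation and consolidation operations:

    def melodic_edit_distance(a, b, indel=1.0):
        """Dynamic-programming edit distance over note sequences.
        a, b: lists of (midi_pitch, duration_in_beats) tuples."""
        def subst(x, y):  # toy substitution cost: pitch and duration differences
            return 0.1 * abs(x[0] - y[0]) + abs(x[1] - y[1])
        n, m = len(a), len(b)
        d = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1): d[i][0] = i * indel
        for j in range(1, m + 1): d[0][j] = j * indel
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d[i][j] = min(d[i - 1][j] + indel,          # deletion
                              d[i][j - 1] + indel,          # insertion
                              d[i - 1][j - 1] + subst(a[i - 1], b[j - 1]))
        return d[n][m]

    print(melodic_edit_distance([(60, 1.0), (62, 0.5)], [(60, 1.0), (64, 0.5)]))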
Earth-Mover-Distance Typke et al. [2003] apply the Earth-Mover-Distance (EMD) [Rubner et al., 1998] to compare melodies. The EMD can be understood metaphorically as conceiving the notes of the two melodies under comparison as heaps of earth and holes, respectively, where the size of the heaps and holes represents note attributes like duration and pitch. The distance is proportional to the minimal effort it takes to move the heaps of earth into the holes.

n-Grams n-Gram-based melody comparison rates the similarity between melodies by segmenting both of the melodies under comparison into overlapping segments of n notes (n-grams), and counting the number of segments the two melodies have in common: the more n-grams the melodies share, the more similar they are rated [Doraisamy and Rüger, 2003; Orio, 2005; Suyoto and Uitdenbogerd, 2005].
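A minimal n-gram similarity over pitch-interval sequences might look as follows. Using intervals rather than absolute pitches makes the measure transposition-invariant; the normalization by the smaller n-gram set is just one of several options found in the literature:

    def ngram_similarity(pitches_a, pitches_b, n=3):
        """Share-count similarity over n-grams of pitch intervals."""
        def ngrams(pitches):
            intervals = [q - p for p, q in zip(pitches, pitches[1:])]
            return {tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)}
        A, B = ngrams(pitches_a), ngrams(pitches_b)
        if not A or not B:
            return 0.0
        return len(A & B) / min(len(A), len(B))

    print(ngram_similarity([60, 62, 64, 65, 67], [55, 57, 59, 60, 62]))  # 1.0: same contour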
Compression Cilibrasi et al. [2003] have introduced a distance measure for musical similarity that is based on compression. The core idea is that two musical fragments are similar to the extent that one fragment can be efficiently compressed using the other fragment. Ideally, the efficiency of compression would be measured by the Kolmogorov complexity of the fragments: the length of the smallest string of bits that allows exact reconstruction of one melody from the other. Because this quantity is effectively uncomputable, existing compression algorithms such as ‘bzip2’ are used as an approximation.
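In practice this idea is commonly operationalized as the normalized compression distance (NCD). The sketch below uses Python's built-in bzip2 compressor on melodies serialized as plain pitch bytes, which is a simplification of the note encodings used in the cited work:

    import bz2

    def ncd(x: bytes, y: bytes) -> float:
        """Normalized compression distance: near 0 for (near-)identical
        inputs, values around 1 for unrelated inputs."""
        cx = len(bz2.compress(x))
        cy = len(bz2.compress(y))
        cxy = len(bz2.compress(x + y))
        return (cxy - min(cx, cy)) / max(cx, cy)

    melody_a = bytes([60, 62, 64, 65, 67] * 8)   # pitches serialized as bytes
    melody_b = bytes([60, 62, 64, 65, 68] * 8)
    print(ncd(melody_a, melody_a), ncd(melody_a, melody_b))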
Hybrid Approaches Different existing melodic similarity algorithms can be employed in ensemble to form a final distance measure. Frieler and Müllensiefen [2005] linearly combine over 50 melodic distances, and tune the linear weight of each of the distances according to ground truth data.
2.6 Case Based Reasoning
Case based reasoning (CBR) is the process of problem solving using the solutions of previously solved problems. It refers both to the cognitive process of problem solving and to a formal method that can be used for a variety of tasks such as supervised learning (e.g. classification or regression tasks), design, and planning. The basis for CBR as a formal method is a model of dynamic memory by Schank [1982]. Henceforth, we will use the term CBR in the latter sense, unless otherwise specified.

Problem solving by CBR is achieved by retrieving a problem similar to the problem that is to be solved from a case base of previously solved problems (the cases), and using the corresponding solution to obtain the solution for the current problem. From this it becomes clear that CBR is based on the general assumption that if two problems are similar, their solutions will also be similar. In many domains this is a reasonable assumption, since solutions are often related to the characteristics of problems in some systematic way. Of course this does not imply that the relation between problem and solution is easily captured explicitly, e.g. in terms of rules.
CBR can be regarded as a special form of instance based learning (IBL), a ‘lazy’ machine learning technique where the problem solving task is performed by using individual instances of training data, as opposed to ‘eager’ learning techniques that induce general knowledge from the complete set of training data and use that knowledge to solve problems [Mitchell, 1997]. We will go into this distinction in more detail in subsection 2.6.2.
Instance based learning is typically used for classification and regression tasks, and in such applications instances are usually represented as a vector of numerical features, where the distance between instances is often the (weighted) sum of differences between features. CBR, on the other hand, is characterized by the use of a richer representation of the instances. Features may have non-numeric attributes, such as symbols, text, or references to other cases, and cases may contain more than just the information that defines a problem and its solution. Meta-information may also be included, for example an indication of the quality of the solution, time-stamps (in case the domain is dynamic, and valid solutions to problems at one time are not necessarily valid at another), a description of the steps that were taken to arrive at the solution, or a justification of the solution in terms of domain knowledge. As a result, similarity computation between cases can involve complex comparisons, in which general domain knowledge may also be used (for example in the form of rules). Due to its representational richness, CBR lends itself to tasks other than classification, such as query/answer systems [Kolodner, 1983], structural design [Hinrichs, 1992], diagnosis [Koton, 1988], and legal reasoning [Ashley, 1990].
CBR systems can be categorized according to the type of problem solving task they address: analytical or synthetic tasks [Plaza and Arcos, 2002]. In analytical tasks there is a limited number of solutions, and the solutions are simple, non-aggregate entities (classification/diagnosis is a typical analytical task). In synthetic tasks, the solutions have a composite structure, and as a result the number of possible solutions is usually very large. A typical example of a synthetic task is structural design.
The concept of problem solving using CBR is illustrated in figure 2.1, where problem solving is conceived of as mapping points on a problem plane to points on a solution plane. The transition from the problem plane to the solution plane is made via one or more neighboring problems for which the solution is known. Relating the unsolved problem to a nearby solved problem is referred to as retrieval, and proposing a solution based on the nearby solution is referred to as reuse, as explained in the next subsection.
Figure 2.1: Problem solving in CBR.
2.6.1 The CBR Cycle
The CBR problem solving process is commonly described as a cycle of four principal steps [Aamodt and Plaza, 1994]:
Retrieve Pick previously solved problems from the case base that are similar to the problem currently to be solved.
Reuse Adapt (or construct) an initial solution for the current problem, based on the solutions of the retrieved problems.
Revise Test the proposed solution for adequacy (typically by user feedback) and if necessary revise the solution to meet the requirements.
Retain Store the problem, together with its revised solution, in the case base for future use.
Figure 2.2 shows how the four CBR tasks are organized within the problem solving process. For generating a solution, a CBR system must at least provide methods for the Retrieve and Reuse tasks.

In the retrieve step the input problem is compared to the cases in the case base (which contain a problem and a solution). This requires a distance measure that ranks cases according to their similarity to the input problem. Typically a similarity threshold is established in advance to filter out all cases that are not similar enough to serve as examples for solving the input problem.
Figure 2.2: The Case Based Reasoning cycle, which shows the relations among the four different stages in problem solving: Retrieve, Reuse, Revise, Retain.

The distance measure is a crucial component of the CBR system, because it should ensure that the ‘similar problems have similar solutions’ assumption holds. To achieve this, a distance measure can be chosen and/or optimized to fit a known ‘real’ similarity relation between cases, or alternatively, to fit a utility function that expresses how well one case serves to solve another case [Smyth and Keane, 1998; Stahl, 2004]. Both the distance measure and the threshold value usually require domain-specific knowledge. The sensitivity of the distance measure to problem features should be proportional to the influence of those features on the form of the solution. The threshold value embodies knowledge of the range of variation that is tolerable between cases without their solutions becoming incomparable.

The objective of the reuse step is to propose a solution for the input problem based on the solutions of one or more retrieved problems. More specifically, the task is to determine how the solution changes as a function of (relatively small) changes in the problem. Depending on the problem solving task, adaptation can range from very simple to very elaborate operations. CBR systems for classification usually employ a simple reuse strategy, such as majority voting among the retrieved cases to predict the class of the input problem. In synthetic tasks, on the other hand, the solution is not a nominal value but a composite structure; reuse in such tasks may consist in the adjustment of parameters in the structure, or the substitution of sub-structures. Like the retrieve step, reusing retrieved cases typically requires domain knowledge, for example in the form of adaptation rules that specify how to modify a solution given certain conditions of the input problem and the retrieved problem.

The solution derived by the CBR system through retrieval and reuse is suggested as an initial solution to the input problem. This solution is then assessed for its effectiveness, normally by the user who queried the CBR system with the input problem. Testing effectiveness can be done by trying out the solution in the real world, although in some domains this may be too dangerous or costly (e.g. in decision support for oil well drilling [Skalle and Aamodt, 2004]). Based on this feedback, revisions are made to the case to improve the quality of the solution. This is the revise step.

Finally, in the retain step, if the revision of the solution is successful, that is, if the revised solution is of sufficient quality, the solution is stored together with the input problem as a new case in the case base. In this way the case can be used for solving future problems. One aspect of case retention is the actual storage of the case, possibly including case indexing for faster retrieval. Another aspect is case base maintenance. Depending on the actual and desired characteristics of the case base, a case may be stored or not. For example, if the case base size must be minimized due to memory constraints, a case that introduces redundancy will be left out, implying that the solution must be derived again if the system is queried with the same problem.
If the cost of adaptation is high (e.g. due to computational complexity or time constraints), the case base redundancy introduced by the new case may be an acceptable trade-off. The revise and retain steps enable the CBR system to learn incrementally. That is, the system can learn to solve new problems one at a time, without the need to have all training data available at the same time.
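The four steps can be condensed into a generic skeleton. The sketch below is our own schematic illustration of the cycle, not the implementation of any particular system; the distance, adaptation, quality-test, and repair functions are placeholders to be supplied with domain knowledge:

    def cbr_solve(problem, case_base, distance, adapt, acceptable, repair, threshold=1.0):
        """One pass through the CBR cycle: Retrieve, Reuse, Revise, Retain.
        A case is a (problem, solution) pair."""
        # Retrieve: keep only cases similar enough to the input problem.
        candidates = [c for c in case_base if distance(problem, c[0]) <= threshold]
        if not candidates:
            return None
        best = min(candidates, key=lambda c: distance(problem, c[0]))
        # Reuse: adapt the retrieved solution to the input problem.
        solution = adapt(problem, best)
        # Revise: test the proposed solution (a domain-supplied predicate
        # standing in for user feedback) and repair it if necessary.
        if not acceptable(problem, solution):
            solution = repair(problem, solution)
        # Retain: store the confirmed case for future problem solving.
        case_base.append((problem, solution))
        return solution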
2.6.2 CBR Characterization
As mentioned before in this section, CBR is a special form of IBL. This approach, which is called lazy learning, differs from eager learning approaches such as inductive logic programming or decision-tree learning in that the induction step from seen instances to unseen instances is postponed until a new unseen instance is presented, whereas eager learning approaches generalize from the training data without regard to any particular problems to be solved. The advantage of postponing generalization is that the training data remain accessible in all their detail. If a new problem must be solved that is very different from the seen data, inevitably a strong generalization must be made that may result in low accuracy; but if, on the other hand, a problem similar to the seen data must be solved, little generalization is required and the system may benefit from the detailed data. In contrast, an eager learning system determines a single generalization of the data a priori. In order for this generalization to be broad enough to cover problems that are different from the seen data, accuracy is sacrificed for problems that are similar to the seen data. Stated otherwise, lazy learning approaches construct many local approximations of the target function5, rather than a single global approximation. This generally allows more complex target functions to be learned.

Aamodt and Plaza [1994] note that in CBR, as opposed to most other problem solving approaches, it is possible to employ both specific knowledge about individual cases and general knowledge. The degree to which general domain knowledge is employed may vary, leading to a scale of knowledge-intensiveness on which CBR approaches can be located [Aamodt, 2004]. Characterizations of the low and high ends of the scale are given in table 2.1.

    Low knowledge intensiveness       High knowledge intensiveness
    -----------------------------     ------------------------------------
    No explicit general knowledge     Substantial general knowledge
    Many cases                        Few cases
    Cases are data records            Cases are user experiences
    Simple case structures            Complex case structures
    Global similarity metric          Similarity assessment is explanation
    No adaptation                     Knowledge-based adaptation

Table 2.1: Characterization of IBL/CBR systems in terms of knowledge intensiveness (adapted from Aamodt [2004])

A common critique is that problem solving using CBR is opaque: it does not explain the obtained solution to the user. It is argued that learning techniques that build general models, on the other hand, do explain the solution, since the model (e.g. a decision tree) allows the user to trace how the solution was obtained. We feel that this argument is not convincing. Firstly, we believe that, like induced general models, cases can provide the user with an intuition of why a particular solution is predicted. Just as learning a new task is often easier when examples are given than when just the rules are summed up, providing a case to the user that is known to be related to the input problem may give her insight into which aspects of the problem are relevant for the solution (by comparing the input and retrieved problems), and where the solution differs as a function of the differences between the problems. Secondly, the level of insight the user gets from a model is likely to be related to the complexity of the model, rather than the type of model. Tracing how an expressive performance was obtained through a decision tree that has thousands of branches will not in any way clarify the principles that underlie expressive performance. Neither will a CBR system with overly specific cases. On the other hand, models that have an adequate instance representation, and are not over-specified, can be transparent both as eager and as lazy learners.

5 The function that maps problems into solutions.
2.7 Conclusions
From the review of the different types of models of expressivity in music performance we observe that most modeling approaches belong to one of two opposed approaches. One approach can be roughly described as top-down. It starts from the concepts that are assumed to (partly) determine the expressivity (like phrase structure, or emotions/affects), and investigates how these concepts manifest themselves through expressivity. The other approach can be called bottom-up. It does not interpret expressive information primarily as conveying underlying concepts, but rather takes it as data whose syntactical form is to be discovered. Both approaches have their merits and limitations. The top-down approach, although it may give more insight into the nature of expressivity, is unlikely to account for detailed and low-level aspects of expressivity. It typically derives rather general rules such as: ‘Sadness is conveyed by playing softly and legato’ [Juslin, 2001]. The bottom-up approach, on the other hand, typically accounts for detailed forms of expressivity without putting them in an interpretative framework (e.g. ‘Ascending melodic lines are played faster than descending lines’ [Friberg, 1991]). In order to provide a more complete picture of expressivity, existing models may have to be integrated into a general model.

It also becomes clear from the survey of related work that most studies are devoted to expressive performance generation rather than performance transformation. In generation the focus is on how expressive deviations are related to the score alone (or to other forms of musical structure), whereas performance transformation deals with an initial performance that already contains expressive deviations. These deviations may play a role when deciding how to play the new performance.

Furthermore, although research on jazz music performance exists, many of the predictive models are based on classical music rather than jazz. This, together with the performance practice in classical music that tends to follow the score more strictly than jazz performance, may account for the fact that many predictive models are restricted to predicting dynamics and tempo curves. In jazz performance practice it is common that the score is interpreted loosely, as an outline of the melody rather than a complete specification of it. Even in performances of melodies that would not be classified as improvisations or variations on the melody (which depart even more from the score), phenomena like note ornamentations, fragmentations, and consolidations frequently occur, as pointed out in section 1.1. To deal with such forms of expressivity, a richer representation of expressivity is required. In section 3.6 we propose a representation scheme for these forms of expressivity.

We have also concluded that the evaluation of predictive models is a complex topic and that both of the approaches explained have their drawbacks. Although we do not pretend to solve all evaluation questions, in this dissertation we present a novel hybrid approach to evaluating performance models that combines human subject ratings with a computational distance measure between performances (see section 6.3).

As for the method of modeling expressivity, we consider case based reasoning a promising approach for several reasons.
Firstly, expressive music performance apparently involves a large amount of background knowledge, but musicians hardly employ this knowledge in an explicit form (e.g. as rules). CBR is an appropriate problem solving method for such domains, as it applies cases directly, without the need to infer explicit knowledge from them. Secondly, an advantage over analysis-by-synthesis that CBR shares with inductive learning approaches is that the range of expressive effects is not limited to a set of rules that human musical experts can verbalize (as discussed in subsection 2.4.4); rather, it will use the expressive effects that actually occur in the training data. The other side of the coin is that for inductive learning approaches to pick up a certain expressive regularity, it must be statistically significant in the training data, whereas Sundberg (see footnote 4) argued that this is not a necessary condition for an expressive effect to be musically significant. CBR does not suffer from this drawback.

Chapter 3
The TempoExpress System Architecture and Data Modeling
In this chapter we present TempoExpress, a case based reasoning system for applying global tempo transformations to monophonic audio recordings of jazz saxophone performances. The system deals with tempo transformations as problems to be solved using a set of previously solved tempo transformations. In line with the case based reasoning process discussed in the previous chapter, the previously solved problems are not used to learn a general model for applying tempo transformations; rather, they are stored as cases in a case base, and only relevant cases are retrieved and used to solve particular problems. The focus of the problem solving is on the musical aspects involved in the tempo transformation, as opposed to the signal processing aspects required to render the tempo-transformed performance into audio. In chapters 4 and 5 we will present the main components of TempoExpress in more detail. The present chapter is intended to give an overview of the system, introduce the key concepts involved, motivate design decisions, and present the models used to represent the input data, such as scores and performances. The chapter also includes a section providing details about the musical corpus we used to construct cases.
3.1 A Global View of the System
The structure of the system is shown schematically in figure 3.1. The complete process of tempo transformation of audio is divided into an audio analysis/synthesis part and a content manipulation part. The audio analysis/synthesis part is responsible for the derivation of a melodic content description from audio, and vice versa.
Figure 3.1: Diagram of the major components of TempoExpress

This part falls outside the scope of this dissertation. The content manipulation part is embodied by the CBR system we have named TempoExpress. Its task is to revise the content description of the source audio in such a way that the audio, when resynthesized according to this revised description, will be at the target tempo and will have expressive features that are in accordance with that tempo. The input data of TempoExpress consist of:
1. a melodic description of the input audio recording of which the tempo is to be transformed;
2. a MIDI score with the nominal melody that is played;
3. a target tempo, the tempo (in BPM) at which the recorded performance should be rendered.
The melodic description (MD) is a formatted description of the melody as it is detected in the input audio recording, containing both a frame-by-frame description of the audio (with descriptors like fundamental frequency candidates, and energy) and a segment-by-segment description of the audio. The audio segment descriptions correspond to the individual notes detected in the audio, and apart from their begin and end times (i.e. note onset and offset) they include mean energy (dynamics) and estimated fundamental frequency (pitch) as descriptors. The method for deriving MDs from monophonic audio recordings is described in more detail in [Gómez et al., 2003a; Maestre and Gómez, 2005].

The result of the transformation is a revised MD of the output audio recording to be generated. The difference between the revised MD and the original MD defines the changes that must be made to the original audio to obtain the output audio.
3.1.1 Problem Analysis
This subsection provides an overview of the topics that will be covered in more depth in chapter 4.

Within TempoExpress, two major subtasks can again be identified, indicated in the figure by the boxes ‘problem analysis’ and ‘problem solving’. The first task is to build a phrase problem specification from the given input data. This is a data structure that contains all information necessary to define a tempo transformation task for a musical phrase, plus additional information that will improve or facilitate the problem solving. A phrase problem specification contains the following information (a minimal data-structure sketch is given after the list):
1. a MIDI score, the score as a sequence of notes;
2. a musical analysis, an abstract description of the melody;
3. a source tempo, the tempo (in BPM) of the input performance;
4. a target tempo;
5. an input performance, the performance as a sequence of performed notes;
6. a performance annotation, a description of the expressivity in the performance; and
7. a list of segment boundaries, indicating how the score of the phrase is divided into segments.
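As a concrete illustration, the phrase problem specification can be pictured as a simple record type. The sketch below is our own guess at suitable field types, not the actual TempoExpress data structure:

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class PhraseProblem:
        midi_score: List[Tuple[int, float, float]]          # (pitch, onset, duration) in beats
        musical_analysis: object                            # abstract description of the melody
        source_tempo: float                                 # BPM of the input performance
        target_tempo: float                                 # BPM requested by the user
        input_performance: List[Tuple[int, float, float]]   # performed notes
        performance_annotation: List[str]                   # description of the expressivity
        segment_boundaries: List[int] = field(default_factory=list)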
As figure 3.1 shows, only items (1) and (4) of this list are directly provided by the user. The musical analysis (2) is derived from the MIDI score and contains information about various kinds of structural aspects of the score, like metrical structure, an analysis of the melodic surface, and note grouping. The phrase segmentation (7) is also derived from the MIDI score, and is intended to capture the musical groupings inherent in the phrase.

The performance annotation module takes the MIDI score and the MD of the input audio recording as input, and provides the source tempo (3), the input performance (5), and the performance annotation (6). The source tempo (3) is estimated by comparing the total duration of the audio (time in the MD is specified in seconds) with the duration of the MIDI score (which specifies time in musical beats). Although this way of estimating the global tempo is simple, it works well for the data we used1. The input performance (5) is a symbolic representation of the performed notes, with MIDI pitch numbers (estimated from the fundamental frequency), duration, onset, and dynamics information. This information is readily available from the MD. To facilitate comparison between the performed notes and the MIDI score notes, the duration and onset values of the performed notes are converted from seconds to beats, using the computed source tempo. Finally, the performance annotation (6) is computed by comparing the MIDI score and the input performance.

We will refer to the phrase problem specification that was built from the input data as the phrase input problem, being the problem specification for which a solution should be found. The solution of a tempo transformation will consist of a performance annotation. The performance annotation can be interpreted as a sequence of changes that must be applied to the score in order to render the score expressively. The result of applying these transformations is a sequence of performed notes, the output performance, which can be directly translated to an MD at the target tempo, suitable to be used as a directive to synthesize the audio of the transformed performance.
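The tempo estimate and the seconds-to-beats conversion described above amount to a couple of lines; the function names below are ours, for illustration:

    def estimate_tempo(score_duration_beats, audio_duration_seconds):
        """Global tempo in BPM, from total score length and recording length."""
        return 60.0 * score_duration_beats / audio_duration_seconds

    def seconds_to_beats(t_seconds, tempo_bpm):
        """Convert a performed onset/duration from seconds to beats."""
        return t_seconds * tempo_bpm / 60.0

    tempo = estimate_tempo(score_duration_beats=32, audio_duration_seconds=16.0)  # 120 BPM
    print(tempo, seconds_to_beats(0.5, tempo))  # 0.5 s corresponds to 1 beat at 120 BPM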
3.1.2 Problem Solving
This subsection provides an overview of the topics that will be covered in more depth in chapter 5. In a typical CBR setup, the input problem is used to query the case base, where the cases contain problem specifications similar in form to the input problem, together with a solution. The solution of the most similar case is then used to generate a solution for the input problem as a whole. In the current setting of music performance transformation, however, this approach does not seem the most suitable. Firstly, the solution is not a single numeric or nominal value, as in e.g. classification or numeric prediction tasks; rather, it takes the form of a performance annotation, which is a composite structure. Secondly, melodies are usually composed of parts that form wholes in themselves (a phrase is typically composed of various motifs). The first observation implies that solving a problem as a whole would require a huge case base, since the space of possible solutions is so vast. The second observation, on the other hand, suggests that a solution may be regarded as a concatenation
of separate (not necessarily independent)² partial solutions, which somewhat alleviates the need for a very large case base, since the partial solutions are less complex than complete solutions. This has led us to the design of the problem-solving process illustrated in figure 3.2.
[Figure 3.2: The problem solving process from phrase input problem to phrase solution. The phrase input problem is segmented into segment input problems 1...K; each is solved by constructive adaptation (retrieval + reuse) against a case base constructed from proto cases; the segment solutions 1...K are concatenated into the phrase solution.]
The phrase input problem is broken down into phrase segment problems (called segment input problems, or simply input problems henceforward), which are then solved individually. The solutions found for the individual segments are concatenated to obtain the solution for the phrase input problem. However, a preliminary retrieval action is performed using the problem at the phrase level. The goal of this preliminary retrieval is to set up the case base for the segment-level problem solving from what we will call proto cases. Proto cases are information units that contain phrase-related information, like the MIDI score, the musical analysis, the segmentation boundaries, and all performances (with varying global tempos) available for that phrase. The case base is formed by pooling the segments of the selected proto cases; hence the number of cases M it will contain depends on the selectivity of the preliminary retrieval and on the number of segments per phrase: if
$C$ is the subset of proto cases that were selected during preliminary retrieval, and $s_i$ the number of segments in the $i$-th proto case from $C$, then the case base size is:
$$M = \sum_{i=1}^{|C|} s_i$$
²For example, the way of performing one motif in a phrase may affect the (in)appropriateness of particular ways of playing other (adjacent, or repeated) motifs. Although such constraints are currently not defined in TempoExpress, we will explain in section 5.3 how the reuse infrastructure can easily accommodate this.
The case base obtained in this way contains cases consisting of a segment problem specification and a solution at the segment level. The cases contain the same type of information as the input problem specifications and solutions at the phrase level, but they span a smaller number of notes. Solving the phrase input problem is achieved by searching the space of partially solved phrase input problems. A partially solved phrase input problem corresponds to a state where zero or more segment input problems have a solution; a complete solution is a state where all segment input problems have a solution. Solutions for the segment input problems are generated by adapting retrieved (segment) cases. This technique for case reuse is called constructive adaptation [Plaza and Arcos, 2002]. The expansion of a state is realized by generating a solution for a segment input problem. To achieve this, the retrieve step ranks the cases according to the similarity between the MIDI scores of the segment input problem and the cases. The reuse step consists of mapping the score notes of the retrieved case to the score notes of the input problem, and using this mapping to 'transfer' the performance annotation of the case solution to the input problem.
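As a rough sketch of one such step, the retrieve-and-reuse process can be pictured as below. This is our own greedy simplification of constructive adaptation, which in fact searches the space of partially solved problems; score_similarity, align_notes, and transfer stand in for the edit-distance machinery described in chapter 4:

    def solve_segment(problem, case_base, score_similarity, align_notes, transfer):
        """Retrieve: rank segment cases by melodic similarity between MIDI scores.
        Reuse: map the retrieved case's score notes onto the input problem's
        notes and transfer the case's performance annotation via that mapping."""
        best = max(case_base, key=lambda case: score_similarity(case.score, problem.score))
        mapping = align_notes(best.score, problem.score)
        return transfer(best.annotation, mapping)

    def solve_phrase(segment_problems, case_base, score_similarity, align_notes, transfer):
        """Concatenating the segment solutions yields the phrase solution."""
        return [solve_segment(p, case_base, score_similarity, align_notes, transfer)
                for p in segment_problems]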
3.2 Musical Corpus
In this section we present the musical material that was used for the tempo-transformation experiments reported in this dissertation. We had at our disposal a set of monophonic saxophone recordings, which were made in the context of Tabasco³, a research project focusing on content-based audio transformation. Four different songs were recorded using professional recording equipment. The performing artist was a professional jazz saxophone player. Every song was performed and recorded at various tempos. One of these tempos was the nominal tempo, that is, the tempo at which the song is intended to be played. This is usually notated in the score. If the nominal tempo was not notated, the musician would determine the nominal tempo as the one that seemed most natural to him. The other tempos were chosen to be around the nominal tempo, increasing and decreasing in steps of 5 BPM (in slow tempo ranges) or 10 BPM (in faster tempo ranges). About 12 tempos per song were recorded. The musician performed on his own, accompanied by a metronome indicating the global tempo of the piece. In total, 170 interpretations of phrases were recorded, amounting to 4256 performed notes. Table 3.1 shows the top-level phrase structure of the songs (determined manually from the score) and the tempo range per song.
³CICYT Grant TIC 2000-1094-C02, 28-12-2000/27-12-2003.
Title                  Composer          Song Structure  Tempo Range (BPM)
Body and Soul          J. Green          A A B1 B2 A     35–100
Like Someone in Love   Van Heusen/Burke  A B1 B2         65–260
Once I Loved           A.C. Jobim        A B C D         55–220
Up Jumped Spring       F. Hubbard        A1 A2 B         90–270
Table 3.1: Songs used to populate the case base
The musician was instructed to perform the music in a way that seemed natural to him, and appropriate for the tempo at which he was performing. Note that the word natural does not imply the instruction to play inexpressively, or to achieve 'dead-pan' interpretations of the score (that is, to imitate machine renderings of the score). Rather, the musician was asked not to strongly color his interpretation with a particular mood of playing. The songs were manually split up into phrases⁴ (in the case of jazz standards, the phrase structure of a song is often easily determined by looking at the score, if it is not annotated explicitly), and the recordings of each song were split up into phrases accordingly.
3.3 Melody Representation
The history of computer music comprises a wealth of melody representation schemes [Selfridge-Field, 1997], some being oriented towards graphical notation, others towards, for example, efficient data storage or real-time usability. According to their purpose, they represent different aspects of the music. For example, they may or may not convey information about timbre, dynamics, or articulation. The bare minimum of information that will always be present in some way or another (it can be considered the essence of the melody) is the information conveyed by piano-roll notation: pitch, duration, and the temporal position of the notes. The temporal position may sometimes be implicitly defined as the summed duration of the previous notes/rests. This is also the principal information present in traditional Western music notation, where pitch is conveyed by the vertical position of the note heads on the staff, and duration by the graphical form of the note. Another aspect of the melody that is represented in traditionally notated music is meter. The melody is normally preceded by a time signature, and the notes of the melody are accordingly separated by barlines indicating the beginning/end of each bar. Because time signatures by convention have a fixed pattern of accents, this implicitly defines the metrical strength of each note in the melody. The notation of classical music usually contains additional hints for performance.
⁴See appendix B for score representations of the individual phrases.
For example, crescendo/decrescendo signs may indicate dynamical evolution, and articulation may be indicated by slurs and dots. In jazz, however (notably in The Real Book [2004], the primary resource of notated jazz music), such expressive directions are usually absent. Another difference with the notation of classical music is that the melody is typically transcribed monophonically, and harmony is not notated using notes on the staff, but is instead annotated using chord labels that describe the degree and tension of the chords. This alleviates the problem of defining/separating melody and harmony in polyphonically transcribed music.
Our focus in this dissertation will be on jazz music. Hence, given the above observations, we will assume that a melody representation in the form of a sequence of note elements that includes pitch, duration, onset, and metrical information for each note captures all that is essential in the score. The reason for not including further expressive hints in our melody representation scheme is that the task of the system we propose is performance transformation rather than performance generation. That is, the aim of the system is not to generate an expressive performance from a nominal melody representation, but rather to manipulate expressive effects present in one performance, in order to obtain a performance with different characteristics (in our case, a different tempo). The source of the expressivity is therefore the input performance rather than expressive hints in the score.
3.4 Musical Models
The melody representation presented above is commonly used in computational systems dealing with music, probably because it is the most obvious and straightforward scheme. Although this representation conveys the most essential melodic information, it leaves implicit a number of aspects of melody that are of importance in musical tasks such as melody comparison, pattern finding, and segmentation of melodies. Aspects not present in the note sequence representation are, for example, metrical structure, (sub)phrase structure, and the development of harmonic tension. These aspects are all emergent properties of the melody as a perceived whole, rather than of a sequence of notes.
3.4.1 GTTM
A number of theories exist that describe how such higher-level constructs arise from sequences of notes. The most well-known of these is GTTM [Lerdahl and Jackendoff, 1983], a theory that models the evolution of harmonic tension (prolongational reduction) and the relative importance of notes (time-span reduction) within a melody. This theory is based on systematic musicology rather than cognitive studies. A more recent approach provides a framework for extracting a variety of musical structures, like meter, phrase structure, and key [Temperley, 2001]. The theory is accompanied by Melisma⁵, a set of algorithms that implement the theoretical framework.
3.4.2 The Implication-Realization Model
Another, strongly cognitively motivated model of musical structure is the Implication-Realization (I-R) model [Narmour, 1990, 1992]. This model tries to explicitly describe the patterns of expectations generated in the listener with respect to the continuation of the melody. It applies the principles of Gestalt theory to melody perception, an approach introduced by Meyer [1956]. The model describes both the continuation implied by particular melodic intervals, and the extent to which this (expected) continuation is actually realized by the following interval.
According to the I-R model, the sources of the listener's expectations about the continuation of a melody are twofold: innate and learned. The innate sources are 'hard-wired' into our brain and peripheral nervous system, according to Narmour, whereas learned factors are due to exposure to music as a cultural phenomenon, and to familiarity with musical styles and pieces in particular. The innate expectation mechanism is closely related to the Gestalt theory of visual perception [Koffka, 1935; Köhler, 1947]. Gestalt theory states that perceptual elements are (in the process of perception) grouped together to form a single perceived whole (a 'gestalt'). This grouping follows certain principles (gestalt principles), the most important being proximity (two elements are perceived as a whole when they are perceptually close), similarity (two elements are perceived as a whole when they have similar perceptual features, e.g. color or form in visual perception), and good continuation (two elements are perceived as a whole if one is a 'good' or 'natural' continuation of the other). Narmour claims that similar principles hold for the perception of melodic sequences. In his theory, these principles take the form of implications: any two consecutively perceived notes constitute a melodic interval, and if this interval is not conceived as complete, or closed, it is an implicative interval: an interval that implies a subsequent interval with certain characteristics. In other words, some notes are more likely to follow the two heard notes than others. Two main principles concern registral direction and intervallic difference. The principle of registral direction (PRD) states that small intervals imply an interval in the same registral direction (a small upward interval implies another upward interval, and analogously for downward intervals), and large intervals imply a change in registral direction (a large upward interval implies a downward interval, and analogously for downward intervals).
⁵URL: http://www.link.cs.cmu.edu/melisma/
[Figure 3.3: Eight of the basic structures of the I/R model: P, D, ID, IP, VP, R, IR, VR (notated examples omitted).]
[Figure 3.4: First measures of All of Me, annotated with I/R structures (P, ID, P, P).]
The principle of intervallic difference (PID) states that a small interval (five semitones or less) implies a similarly-sized interval (plus or minus two semitones), and a large interval (seven semitones or more) implies a smaller interval. Based on these two principles, melodic patterns can be identified that either satisfy or violate the implication predicted by the principles. Such patterns are called structures, and are labeled to denote their characteristics in terms of registral direction and intervallic difference. Eight such structures are shown in figure 3.3. For example, the P structure ('Process') is a small interval followed by another small interval (of similar size), thus satisfying both the registral direction principle and the intervallic difference principle. Similarly, the IP ('Intervallic Process') structure satisfies intervallic difference, but violates registral direction.
Another principle that is assumed to hold concerns the concept of closure. Closure refers to the situation where a perceived interval does not imply a following interval, that is, the listener's expectation is inhibited. Occurrences of this kind of closure might coincide with those of a different, more commonly used concept of closure in music theory, which refers to the finalization or completion of a musical whole, such as a phrase. In the I-R model, closure (that is, the inhibition of expectations) can be evoked in several dimensions of the music: when the melody changes direction, or when a small interval is followed by a large interval. Additional factors that can add to the effect of closure are metrical position (strong metrical positions contribute to closure), rhythm (notes with a long duration contribute to closure), and harmony (resolution of dissonance into consonance contributes to closure). The degree to which the listener's expectations are inhibited depends on the accumulated degrees of closure in each dimension. Points of closure mark the boundaries of I-R structures. When there is closure, but only to a slight degree (weak closure), the end-note of one I-R structure also marks the beginning of the next I-R structure, causing the structures to share one note.
Structure  Interval sizes  Same direction?  PID satisfied?  PRD satisfied?
P          S S             yes              yes             yes
D          0 0             yes              yes             yes
ID         S S (eq)        no               yes             no
IP         S S             no               yes             no
VP         S L             yes              no              yes
R          L S             no               yes             yes
IR         L S             yes              yes             no
VR         L L             no               no              yes
Table 3.2: Characterization of the eight basic I/R structures. In the second column, 'S' denotes a small interval, 'L' a large interval, and '0' a prime interval.
When strong closure occurs, the next I-R structure begins on the note after the strong closure point. On the other hand, at notes where no closure occurs, or where closure in some dimensions is overruled by continuing processes (such as repeated syncopation, or ritardando) in other dimensions, linked I-R structures occur. That is, the middle note of one I-R structure is the first note of the next I-R structure, causing the structures to share an interval. In this way two or more I-R structures may be chained together. An example of this is the combined ID and P structure in figure 3.4.
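Table 3.2 can be read directly as a classification rule for pairs of consecutive intervals. A minimal sketch of such a classifier (our own code, not the parser of chapter 4; it treats 6-semitone intervals as large, anticipating the convention of section 4.1.2, and returns '?' for pairs outside the eight basic rows):

    def classify_ir(i1: int, i2: int) -> str:
        """Classify two consecutive melodic intervals (signed semitones)
        into one of the eight basic I/R structures of table 3.2."""
        small1, small2 = abs(i1) < 6, abs(i2) < 6
        same_dir = (i1 > 0 and i2 > 0) or (i1 < 0 and i2 < 0)
        if i1 == 0 and i2 == 0:
            return "D"                                   # two prime intervals
        if small1 and small2:
            if same_dir:
                return "P"
            return "ID" if abs(i1) == abs(i2) else "IP"  # equal vs. merely similar size
        if small1 and not small2 and same_dir:
            return "VP"
        if not small1 and small2:
            return "IR" if same_dir else "R"
        if not small1 and not small2 and not same_dir:
            return "VR"
        return "?"                                       # retrospective/other cases

    print(classify_ir(2, 3))    # -> 'P'  (small interval continued in the same direction)
    print(classify_ir(9, -4))   # -> 'R'  (large interval reversed by a smaller one)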
3.5 Melody Segmentation
As we argued in section 3.1, working with musical chunks smaller than complete phrases is a natural step, because melodies by their nature are composed of several smaller note groups. Furthermore, segmentation is likely to improve the utility of the case base, since it relaxes the constraint that, in order to reuse the solution of a case, the complete phrases of the input problem and the retrieved case must match. Rather, it allows for partial reuse of the musical examples in the case base, so that only the parts of the retrieved case that match the problem are reused (and only for the part of the problem that matches).
Given that cases should contain subphrase chunks of musical information, the next question is how the musical material should be segmented. In this section we present some melody segmentation strategies, and discuss their characteristics and advantages/disadvantages. We use the term melody segmentation to refer to a partition of a phrase, as a sequence of notes, into subsequences. Most automatic melody segmentation approaches are situated in the context of modeling note grouping according to human perception, and therefore conceive of melody segmentations as a complete and non-overlapping partitioning of the sequence of notes. Here, however, there is no a priori reason to limit the concept to non-overlapping segments, since the aim of segmentation is not primarily to identify segments as 'perceptual wholes', but just to extract melodic material at a short time scale from phrases.
             overlapping                     non-overlapping
musical      I-R (weak closure)              Melisma, LBDM, I-R (strong closure)
non-musical  n-grams, binary (hierarchical)  binary (flat)
Table 3.3: Classification of melodic segmentation strategies
In order to clarify how different segmentation methods relate to each other, we propose to classify melodic segmentations along two major dichotomies: overlapping vs. non-overlapping, and musical (that is, reflecting musical structure) vs. non-musical segmentations. In the following subsections we review various melodic segmentation techniques, exemplifying the approaches discussed above; they can be classified as shown in table 3.3.
3.5.1 Melisma
The Melisma Music Analyzer [Temperley, 2001] provides a set of tools for various types of musical analysis. One of its components, the Grouper, is a tool that segments melodies into smaller units. It is flexible in the sense that it can operate on various time scales: for example, it can segment melodies into phrases, but also into smaller segments like motifs. It works in a straightforward way. First, it calculates a gap score for every consecutive pair of notes; this gap score is the sum of the inter-onset interval and the offset-to-onset interval. The score for a group of notes is proportional to the gap scores at its boundaries. A penalty is applied proportionally to the difference between the length of the group and a specified optimal length. Additionally, a penalty is applied if the metrical position of its beginning differs from the metrical position of the beginning of the previous group.
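As an illustration of the gap-score idea, here is our own reconstruction (not Melisma's actual code; the weight, the optimal length, and the omission of the metrical penalty are simplifications):

    def gap_score(prev_note, next_note):
        """Gap score between consecutive notes: inter-onset interval plus
        offset-to-onset interval. Notes are (onset, offset) pairs in beats."""
        ioi = next_note[0] - prev_note[0]
        ooi = next_note[0] - prev_note[1]
        return ioi + ooi

    def group_score(notes, start, end, optimal_len=8, len_weight=0.5):
        """Score of candidate group notes[start:end]: gap scores at both
        boundaries, minus a penalty for deviating from an optimal length."""
        left = gap_score(notes[start - 1], notes[start]) if start > 0 else 0.0
        right = gap_score(notes[end - 1], notes[end]) if end < len(notes) else 0.0
        return left + right - len_weight * abs((end - start) - optimal_len)

    notes = [(0.0, 0.9), (1.0, 1.9), (2.0, 3.8), (4.5, 5.4), (5.5, 6.4)]
    print(group_score(notes, 0, 3))  # the rest after the third note favors this group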
3.5.2 Local Boundary Detection Model
The Local Boundary Detection Model (LBDM) [Cambouropoulos, 2001] is based on two Gestalt-related principles: identity/change and proximity. The model focuses on the degree of change in several parameters of the melody, such as pitch, IOI, and rests. The change in these parameters is represented by taking the intervals (absolute differences) between the values for consecutive notes. Based on this, a boundary strength is calculated per interval that is proportional to the change in consecutive intervals. The model thus computes the second-order derivative of the parameter sequence. Local maxima in the (weighted sum of the) second-order derivatives are postulated by the model as local group boundaries in the melody. Like Melisma, the LBDM has several parameters to control its behavior. The most important of these is a threshold for the local maxima values to be recognized as a boundary; by increasing the threshold, longer segments are found.
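A single-parameter sketch of this idea (our own simplification; the full model combines weighted boundary-strength profiles for pitch, IOI, and rests):

    def lbdm_boundaries(values, threshold):
        """Intervals are absolute differences of consecutive values; boundary
        strength is the change between consecutive intervals (a discrete
        second-order derivative); boundaries are local maxima above a
        threshold. Returned indices are note positions near the peaks."""
        intervals = [abs(b - a) for a, b in zip(values, values[1:])]
        strengths = [abs(j - i) for i, j in zip(intervals, intervals[1:])]
        return [k + 1 for k in range(1, len(strengths) - 1)
                if strengths[k] > threshold
                and strengths[k] >= strengths[k - 1]
                and strengths[k] >= strengths[k + 1]]

    # MIDI pitches: a leap after stepwise motion yields one boundary candidate.
    pitches = [60, 62, 64, 65, 72, 74, 76]
    print(lbdm_boundaries(pitches, threshold=3))  # -> [3]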
3.5.3 I-R based Segmentation
Since I-R structure boundaries are determined by the level of perceived closure, it is not surprising that the boundaries of musical constructs like motifs often coincide with boundaries of I-R structures. This is not to say the converse, that every I-R structure boundary corresponds to a motif boundary. A motif may for example contain a weak closure (inducing an I-R structure boundary). If no closure occurs within a motif, the motif may be formed by multiple chained I-R structures. Several melodic segmentation strategies can be based on the I-R analysis, depending on the 'closure threshold' that is adopted. For example, one can define a segment as the sequence of I-R structures that occur between two strong closure points. As mentioned in subsection 3.4.2, no overlap between I-R structures occurs at strong closure points, so this approach yields non-overlapping segmentations. When weak closure points are also taken to define segment boundaries, the segments may overlap by at most one note. A third, and less musically meaningful, alternative is to define a segment for each I-R structure, even when they are chained. This may result in segments that overlap by two notes. A drawback of I-R segmentation is that the above-mentioned parameter to control segment overlap provides only moderate control over the resulting segment length. Especially segmentation into non-overlapping segments may result in segments that vary greatly in length, depending on the melody. This can be seen in appendix B: non-overlapping I-R segmentation results in segments that correspond to the joined horizontal brackets above the I-R labels. For example, Up Jumped Spring, phrase B, would result in a single segment containing all 19 notes of the phrase, whereas phrases A1 and A2 of the same song contain segments of only 3 notes. Choosing weakly overlapping segments leads to segments of more constant length.
3.5.4 n-Grams Segmentation
n-Gram segmentation is commonly used in melody retrieval [Melucci and Orio, 2002; Uitdenbogerd and Zobel, 2002; Doraisamy and Rüger, 2003]. An n-gram is a subsequence of n items from a given sequence of items. By an n-gram segmentation of a melody we refer to the ordered set of all n-grams that can be extracted from that melody. This is a very simple overlapping segmentation technique that employs no musical knowledge at all.
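For concreteness, a sketch in a few lines (our own code):

    def ngram_segments(notes, n=4):
        """All overlapping n-grams of a note sequence; no musical knowledge used."""
        return [notes[i:i + n] for i in range(len(notes) - n + 1)]

    print(ngram_segments(list("CDEFG"), n=3))
    # -> [['C', 'D', 'E'], ['D', 'E', 'F'], ['E', 'F', 'G']]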
3.5.5 Binary Segmentation
In Kagurame Phase-II [Suzuki, 2003], Suzuki employs a form of binary segmentation of the melody that recursively divides the phrase into two parts of equal duration, to a desired level of granularity, for example measures or beats. Because this method refers to measures and beats to divide the melody, it is not clear how situations should be handled where the number of measures or beats is uneven. However, we can easily define a variation of Suzuki's method for binary segmentation that circumvents this problem by dividing the melody by duration (rather than by the number of measures). In this setup, the melody is recursively split in two by finding its middle position (the position that is the average of the onset of the first note and the offset of the last note), and defining the first group as all notes whose onsets are before the middle position, and the second group as all notes whose onsets are after the middle position. The depth of the recursion can be used as a parameter to control the number of segments. A notable difference with n-gram segmentation is that n-grams define segments of equal numbers of notes, whereas binary segmentation defines segments of (approximately) equal duration. When the higher-level segments of the recursive segmentation are included in the segmentation, those segments will obviously overlap with their child segments. When only the lowest level of segments is retained, a non-overlapping segmentation is obtained.
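A sketch of this duration-based variation (our own code; notes are (onset, offset) pairs, and only the lowest, non-overlapping level is returned):

    def binary_segments(notes, depth):
        """Recursively split at the temporal midpoint between the first onset
        and the last offset; the recursion depth controls segment count."""
        if depth == 0 or len(notes) < 2:
            return [notes]
        middle = (notes[0][0] + notes[-1][1]) / 2.0
        first = [n for n in notes if n[0] < middle]
        second = [n for n in notes if n[0] >= middle]
        if not first or not second:   # avoid empty halves on degenerate input
            return [notes]
        return binary_segments(first, depth - 1) + binary_segments(second, depth - 1)

    notes = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
    print(binary_segments(notes, depth=1))  # two segments of two notes each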
3.5.6 Characterization of Segmentation Methods
Each segmentation method has advantages and disadvantages. Depending on the application context of the segmentation, these (dis)advantages may be more or less relevant. We discuss the different methods in the light of our current context: melody retrieval for tempo transformation.
The main advantage of non-overlapping segmentations for the reuse of performance segments in the current setting of tempo transformation is that the final phrase result can be obtained straightforwardly, by concatenating the results of consecutive segments. If the problem solving were done segment-wise using overlapping segments, the solutions (performances) to the segments could not simply be concatenated, because the notes belonging to overlapping segments would have multiple solutions. In this case, an additional strategy is needed to decide which solution is preferred for these notes. However, this apparent drawback of overlapping segments can also be regarded as an opportunity to evaluate the continuity of two consecutive segment performances: since the performances are in part performances of the same notes, the similarity between the two performances can be taken as an indicator of the continuity of the performance across segments. Another advantage of overlapping segmentations is that they potentially allow for the creation of larger sets of segments from the original data set. Of course this introduces redundancy between segments, but the number of non-identical melodic segments in the case base is larger, and therefore case retrieval may offer useful results for a wider range of input problems than with non-overlapping segmentations.
Taking into account the empirical fact that a melodic phrase is usually composed of short motifs, and that part of what defines musical styles is the length and melodic form of motifs (this seems to hold at least for jazz music), motifs of melodies that belong to the same style are likely to show recurring patterns. This means that segments that coincide with musical motifs are more likely to be found in other melodies of the same style than segments that intersect musical grouping boundaries, and are thus more useful as cases. To clarify this, a loose analogy might be made to spectral analysis of signals: by using an adaptive basis, wavelet analysis may yield both more accurate and sparser representations of the spectral content of the signal than traditional Fourier analysis. Suzuki [2003] argues that when the melodic segmentation is used just for case retrieval, it is not necessary that the segmentation reflect the musical structure. Even segments that contain musical boundaries may be useful, although the expected utility of such segments is lower, requiring a larger case base. An additional observation regarding the class of musical grouping-aware segmentation strategies is that musical (grouping) structure can be intrinsically ambiguous and, due to its perceptual motivation, is partially a subjective matter. Different segmentation approaches, even if they all intend to capture musical structure, may therefore yield different results.
We can summarize the above as follows: 1) the expected utility is higher for segments that correspond to musical groups than for segments that do not correspond to such groups, which reduces the need for a large case base; 2) for a given set of melodies, overlapping segments result in a larger case base than non-overlapping segments.
However, joining the tempo-transformation results of overlapping segments into a tempo transformation of the entire phrase is non-trivial. Although this creates an opportunity for deriving more consistent expressivity at the phrase level, exploiting it requires expressivity comparison criteria that we do not have at present. In conclusion, the most appropriate segmentation approach in our current situation seems to be a non-overlapping segmentation that coincides with musical grouping. We therefore employ Melisma as the segmentation algorithm in our current setup; note that the LBDM and I-R (weak closure) algorithms may be used as valid alternatives.
3.6 Performance Representation
It has been widely acknowledged that human performances of musical material are virtually always quite different from mechanical renderings of the music. These differences are thought to be vital for the aesthetic quality of the performance. Although it is not the only proposed definition of musical expressivity (see section 1.1), it is common to define expressivity as the deviations of the performance from the (mechanical rendering of the) score. This is a rather general definition, in the sense that it does not specify the dimensions in which the difference should be measured. Most studies on expressive performance, however, focus on two or three dimensions, typically a subset of dynamics, timing (in onset and duration of notes), and local tempo changes (rubato) [Canazza et al., 1997a; Desain and Honing, 1994; Repp, 1995c; Widmer, 2000]. The paradigmatic representation of expressivity is a list of expressive values for each note and for each dimension, where the values represent the deviation between the nominal values as notated in the score and the observed values of the corresponding note in the performance. In this setting, spontaneous insertions or deletions of notes by the performer are often discarded as artifacts, or performance errors. This may be due to the fact that most of this research is focused on the performance practice of classical music, where the interpretation of notated music is rather strict. Contrastingly, in jazz music performers often favor a more liberal interpretation of the score, and as a consequence expressive variation is not limited to variations in the timing of score notes, but also comes in the form of e.g. deliberately inserted and deleted notes. Thus, research concerning expressivity in jazz music should pay heed to these phenomena: in addition to capturing the temporal/dynamical variations of score notes, the expressive behavior of the performer should be described in terms of note insertions, deletions, ornamentations, etcetera. In this section we propose an annotation scheme for performances that provides such a deliberate representation of expressive phenomena. In the next chapter (section 4.2) we describe how such an annotation can be automatically derived using the edit-distance, and in chapter 6 (section 6.1) we show how this mechanism can be tuned to reduce annotation errors.
[Figure 3.5: A taxonomy of performance events for performance annotation. Reference events divide into score reference events and performance reference events; correspondence events (with pitch, duration, and onset deviation attributes) inherit from both. Deletion is a score reference event; insertion is a performance reference event, with ornamentation (number of notes, melodic direction) as a special case; transformation, consolidation, and fragmentation (number of notes) are correspondence events.]
3.6.1 Describing Expressivity through Performance Events
In the present context, we use the word 'performance' to refer to a sequence of performed notes: the notes that are played by the musician when she interprets a musical score. The expressivity, defined as the deviations of the performance from the score (see section 1.1), can be conceived of as a sequence of events (we will call them performance events) that describe how the notes in the score are translated to the notes present in the performance. The first question that must be addressed is which types of performance events we distinguish. On a very abstract level, performance events are reference events: they refer either to notes in the performance, or to notes in the score, or both. When viewing reference events as a class, we can thus distinguish two sub-classes: the events that refer to score elements (score reference events), and the events that refer to performance elements (performance reference events). Events that refer to both are called correspondence events, forming a sub-class of both score reference events and performance reference events. This taxonomy of performance events provides a framework in which more specific kinds of performance events can be defined. Figure 3.5 shows this hierarchy, including the more specific types of performance events that will be discussed below.
Note that the choice of performance events is an important, but fundamentally subjective, decision. Since the performance of a melody by a human musician is not a causal process, there is no way to systematically deduce which performance elements are the manifestation of a particular score element. There may be situations where there is ambiguity as to the way of representing a particular part of the musician's interpretation through performance events. In practice, the majority of notes in a performance resemble their score counterparts closely enough in terms of pitch, metrical position, and duration to unambiguously establish the association⁶. Such occurrences of performed notes that clearly correspond to a score note are interpreted as instances of the class of correspondence events mentioned above. Within a certain range of perceivable sameness, the onset timing, pitch, and duration of the performed note may deviate from the score note. These deviations are stored in the attributes onset deviation, pitch deviation, and duration deviation of correspondence events. When a correspondence occurs between a single score note and a single performance note, we represent this by a sub-class of correspondence events, transformation events (the attributes of the score note are transformed in the performance).
The necessity of correspondence events for representing the musician's performance behavior is evident, but it is less obvious what other kinds of performance events are needed for a complete account of the performance. We have analyzed the musical material that was used to populate our case base (see section 3.2), focusing on the situations where a one-to-one correspondence could not be made between score and performance notes. In the following paragraphs, we define various classes of performance events that account for the different situations we encountered where transformation events cannot describe the performance, and illustrate each performance event class with an example found in the real performances.
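As an illustration, the taxonomy of figure 3.5 maps naturally onto a class hierarchy. The sketch below is our own; attribute names and defaults are assumptions, not the TempoExpress data model:

    class ReferenceEvent:
        """Root of the performance event taxonomy (figure 3.5)."""

    class ScoreReferenceEvent(ReferenceEvent):
        """Events that refer to one or more score notes."""

    class PerformanceReferenceEvent(ReferenceEvent):
        """Events that refer to one or more performed notes."""

    class CorrespondenceEvent(ScoreReferenceEvent, PerformanceReferenceEvent):
        """Links score and performance notes; deviations carry expressive values."""
        def __init__(self, pitch_dev=0, onset_dev=0.0, duration_dev=0.0):
            self.pitch_dev = pitch_dev        # semitones
            self.onset_dev = onset_dev        # beats
            self.duration_dev = duration_dev  # beats

    class Transformation(CorrespondenceEvent):
        """One score note rendered as one performed note."""

    class Deletion(ScoreReferenceEvent):
        """A score note that was not played."""

    class Insertion(PerformanceReferenceEvent):
        """A performed note with no counterpart in the score."""

    class Ornamentation(Insertion):
        """Short inserted note(s) leading chromatically into the next score note."""
        def __init__(self, n_notes=1, direction="ascending"):
            self.n_notes = n_notes
            self.direction = direction

    class Consolidation(CorrespondenceEvent):
        """Several score notes played as a single long note."""

    class Fragmentation(CorrespondenceEvent):
        """A single score note played as several notes of the same pitch."""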
The first situation is where a performance note occurs without a clear counterpart in the score, whereas the score counterparts of the surrounding notes are unambiguous. In other words, the musician inserts a note that is not present in the score. We account for this situation with an insertion event, which is a sub-class of performance reference event. It is not a sub-class of score reference event, since the inserted note has no counterpart in the score. An example of an insertion event from the real performances described in section 3.2 is shown in figure 3.6a. The figure shows the score that was performed, together with the notes that were performed, shown as boxes. The vertical positions of the boxes correspond to the pitches of the performed notes (the higher the pitch, the higher the position), their widths are proportional to the durations of the notes, and their horizontal positions are proportional to the onset times of the notes.
⁶One can argue that in cases where it is very difficult to recognize the score melody in the performance, the performance should be conceived of as a variation on, or improvisation over, the melody, rather than an expressive interpretation of it.
[Figure 3.6: Examples of performance event classes encountered in real performances. The boxes below the score represent the performance in piano roll notation. The letters denote performance events (T = Transformation; I = Insertion; D = Deletion; O = Ornamentation; F = Fragmentation; C = Consolidation).]
This way of representing melodies is common in MIDI editing software and is usually called piano roll notation. The letters in between the score and the performance represent the performance events that establish the relation between score and performance elements.
The second situation is the opposite of insertion: deletion. This occurs when a note in the score is not played by the performer, and consequently has no counterpart in the performance. Hence, deletion events are a sub-class of score reference events only. An example case of deletion is shown in figure 3.6b (denoted by the letter 'D').
Thirdly, a special kind of insertion that occurs very frequently is ornamentation. Although ornamentation also involves performed notes that have no counterpart in the score, ornamentation events differ from ordinary insertion events in several respects. Firstly, the inserted notes are usually very short (typically about one tenth of a second). Secondly, they often come as a duple or triple of very short notes. Thirdly, the inserted notes virtually always form a chromatic (occasionally diatonic) ascending progression with the note that follows the ornamentation notes. Ornamentation events are as such a sub-class of insertion events. An example is shown in figure 3.6c. Note that the triple of ornamentation notes in the example is treated as a single ornamentation event. We argue that this is a more appropriate conception of ornamentation events than viewing each inserted note as a separate ornamentation: because of their pitch relationship and closeness in time, the ornamentation notes form a very strong perceptual whole. They are perceived as a single musical gesture. Annotating this situation with three distinct ornamentation events would suggest that each of them could have occurred individually, which is clearly not the case.
The fourth kind of performance event occurs when several consecutive notes from the score are played as a single long note. The notes typically have the same pitch, and the duration of the performed note is close to the total duration of the score notes. This many-to-one mapping of notes was described in [Mongeau and Sankoff, 1990] as consolidation, in the context of comparison of melodies (that is, theme and variations). It is interesting to observe that consolidation of notes is a phenomenon that occurs both as a matter of melody variation (at the level of composition) and in music performance (at the level of musical expressivity). An example is shown in figure 3.6d.
The last kind of performance event described here is fragmentation. This is the converse of consolidation, and involves the performance of a single score note as multiple notes of the same pitch, with a total duration that is close to the duration of the score note. This situation (which was also described in [Mongeau and Sankoff, 1990], to relate variations to a thematic melody) was found to occur in a number of performances as well. Figure 3.6e shows such a situation.
The kinds of performance events introduced above were adequate to cover the situations where no obvious one-to-one correspondence could be established between score and performance elements. Once the events have been defined and have been found to occur in particular performances, it is interesting to see the frequency of occurrence of each type of event as a function of the tempo at which the performance was played. Figure 3.7 shows these numbers for each type of performance event (transformation events have been left out, since their mere occurrence does not convey expressivity; rather, it is the kind and amount of transformation that bears expressive significance). The tempo is denoted as a proportion of the nominal tempo (see section 3.2), that is, the average of all tempos available for a given phrase.
[Figure 3.7: The frequency of occurrence of several kinds of performance events (deletion, insertion, ornamentation, fragmentation, consolidation) as a function of tempo. Vertical axis: relative event frequency (proportion of all performance events); horizontal axis: tempo (proportion of nominal tempo).]
The frequency of occurrence of a given kind of performance event is given as the proportion of that kind of event with respect to the total number of performance events (including transformation events) for that tempo. The curves have been slightly smoothed to improve the legibility of the graph.
Some systematic trends can be observed from the figure. The most salient of these is that ornamentation events are by far the most frequent type of event. Furthermore, they occur most frequently at low tempos, and their occurrence steadily diminishes with increasing tempo. This is in accordance with the intuition that, since the density of notes is lower at slow tempos, the musician has more opportunity to add detailed forms of expressivity like ornamentation. For the same reason, the general trend that the number of consolidations increases with tempo is not surprising: consolidation of notes decreases note density, and thus compensates for the increased note density due to the tempo increase. Similarly, insertion events are more frequent at low tempos than at high tempos, and vice versa for deletion events. Only fragmentation events behave contrary to expectation: one would expect that fragmentations (which increase note density) would be more frequent at low tempos than at high tempos, but the opposite is the case.
event type     slow  fast
deletion       0.18  0.82
insertion      0.60  0.40
ornamentation  0.67  0.33
fragmentation  0.12  0.88
consolidation  0.39  0.61
Table 3.4: Distribution of kinds of performance events over slow and fast performances (slow = slower than the nominal tempo; fast = faster than the nominal tempo). The numbers express the proportion of events per tempo category for each kind of event.
Table 3.4 makes these trends explicit. It shows, per event type, how the occurrence of that event type is distributed over slow (< 1) and fast (> 1) tempos. The numbers in the second and third columns are proportions of the total occurrence of the event types. We do not want to derive any quantitative conclusions from these numbers, since making robust claims about the occurrence of specific types of events as a function of tempo would likely require more data. However, we do think the figures presented here are strong evidence that expressive phenomena such as ornamentation and consolidation do not remain constant across tempos. This justifies our claim that, in addition to widely recognized expressive features such as note timing and dynamics, more complex features like ornamentation, consolidation, etcetera should be taken into account when dealing with expressivity in the context of tempo transformation of performances.
Chapter 4
Knowledge Acquisition
This chapter is devoted to the knowledge acquisition aspects of TempoExpress. Knowledge acquisition refers to the process of obtaining knowledge-enriched representations of raw input data. Such representations are in some cases unavoidable (for example, performances must be aligned to their corresponding scores in a knowledgeable way in order to arrive at an adequate description of the performance expressivity). In other cases they form part of the particular strategy we apply to solve tempo transformation problems. For example, we use Implication-Realization analysis to compare melodies during case retrieval. The motivation for this approach is that it is widely recognized that expressivity in music performance is strongly related to the musical structure of the melody (as discussed in chapter 2). Widmer [1996] has shown that structural representations of the melody improve expressive performance prediction over simple note-level representations. This suggests that a proper grasp of expressive aspects of music performance involves at least some abstraction from the note-level perspective on the melody.
Knowledge acquisition occurs at two stages that are very much related: case acquisition and input problem specification. Case acquisition refers to the construction of cases to populate the case base, which is done before the CBR system becomes operational. Input problem specification is done at the time of problem solving, and consists of building a problem description containing all the information needed to solve the problem. Because a case is essentially a problem description together with its solution, the task of input problem analysis is in fact subsumed by the task of case acquisition. Figure 4.1 shows the TempoExpress system components that will be addressed. Section 4.1 describes the musical analysis component, in particular the procedure we developed to obtain the I-R analysis of a melody. In section 4.2 we review the edit-distance, a distance measure and alignment technique for sequential data that we use extensively in this work. Section 4.3 deals with performance annotation. The last section describes how these representations are stored together as cases.
[Figure 4.1: The data preparation part of TempoExpress: audio analysis yields the melodic description and input performance; performance annotation, musical analysis, and the target tempo form the phrase input problem, which is passed to retrieval and reuse over the case base.]
4.1 The Implication-Realization Parser
The Implication-Realization model provides a way to describe the pattern of implied and realized expectations that a melody generates through its surface form. As such, the model is hypothesized to describe the cognitive processes that constitute the listener's auditory experience of the melody. Typical I-R structures span three notes, and may be chained across two or three structures (i.e. consecutive structures overlap by one or two notes), depending on the particular form of the melody. Rather than reducing the melody as a whole to a single construct (such as a tree), the I-R analysis primarily expresses the local structure of the melody. It reflects the temporal and streaming nature of melody, and the sizes of the structures are in accordance with current knowledge of the characteristics of human short-term memory. This adds to the cognitive plausibility of the model.
The character of the I-R analysis as outlined above makes it a good candidate representation for melody processing applications. In this dissertation, its major application is melodic similarity computation, a step that is crucial for case retrieval. An empirical study showed that human subjects tend to base similarity judgments of melodic motifs on surface features (like contour and rhythmic organization), rather than on deep structure [Lamont and Dibben, 2001]. This result implies that for melodic similarity computation, the I-R analysis as a melody representation is preferable to deep structure analyses such as GTTM's time-span reduction trees. In order to employ the I-R model in TempoExpress, we have implemented a parser for monophonic melodies that generates the corresponding I-R analyses. In this section, we first describe the features and limitations of the I-R parser (subsection 4.1.1), and then explain the parsing steps involved in deriving the I-R analysis from the melody (subsection 4.1.2).
4.1.1 Scope and Limitations
The I-R model has a high degree of complexity, and although the core part of the theory is stated explicitly and in great detail, not all aspects of the theory are very amenable to computational modeling. It is therefore hard to achieve a fully automatic derivation of I-R analyses of melodies that incorporates all aspects of the I-R model. In this sense, the I-R parser that we present here is only a partial implementation of the I-R model.
Firstly, the parser implements bottom-up rather than top-down processes. Thus, according to the theory, it captures the innate factors that govern expectation, not the learned factors. In effect, the parser has no way to identify exceptions to the expectation generation mechanism, as imposed by familiarity with the particularities of the musical piece being parsed (intra-opus style) and by the embedding in a particular musical culture (extra-opus style). The I-R model does not specify top-down processes in as much detail as the bottom-up processes, supposedly because they are harder to identify and formalize. Moreover, since the bottom-up processes are claimed to be style-, culture-, and training-independent [Narmour, 1990; Cuddy and Lunney, 1995; Schellenberg, 1996], it seems justified to focus on that part of the model.
The parser can identify 17 different types of I-R structures, namely the basic structures P, D, R, ID, their derivatives VR, IR, VP, and IP, the retrospective structures (P), (ID), (R), (VR), (IR), (VP), (IP), and dyadic and monadic structures. Combinations (two structures linked together) and chains (more than two linked structures) of these structures are also identified.
We designed the parser to operate on monophonic melodies that are represented as sequences of notes having only pitch, onset, and duration attributes. As a consequence, the concept of closure/non-closure is implemented only for the parameters meter and rhythm. Closure (and its inhibition) induced by harmony, melodic consonance/dissonance, dynamics, and ritardando/accelerando effects is not taken into account, since the necessary information is not present in the representation of the melody. Lastly, the parser analyzes only the melodic surface; no higher-level structures are obtained.
4.1.2 Implementation
In this subsection we present the parser for constructing I-R analyses from melodies. Melody, rhythm, and meter are processed to produce the I-R analysis as a sequence of I-R structures. The parser takes as input a melody, represented as a sequence of notes having pitch, duration, and position attributes. Additionally, the meter is known (that is, the metrical strength of each beat can be inferred). The strategy applied to obtain the analysis is, roughly, to first determine the boundary notes of structures (or chains of structures) by computing the level of (non-)closure in various dimensions, and then to identify the I-R structures between those boundaries, based on pitch and interval information. Determining non-closure is not entirely independent of the I-R structure identification, but this dependency can be resolved by 'peeking forward' to determine the I-R structure at the position in question, and does not create a bootstrapping problem. More concretely, the parsing process can be divided into the following steps:
1. Gather interval information;
2. Mark the level of closure at each note position;
3. Mark inhibition of closure;
4. Identify structures between points of strong closure, based on interval information; and
5. Aggregate chained D and P structures.
In the following sub-sections, we will explain these steps in more detail.
Gather Interval Information
The first step is to compute the interval sizes and their directions ('up', 'down', and 'lateral'). In addition to the numerical interval values (measured in semitones), we apply the so-called syntactic parametric scale to the intervals, in order to categorize their size as either small ('S') or large ('L'). In general, the I-R model assumes that intervals smaller than 6 semitones are perceived as small intervals, and those larger than 6 semitones as large intervals.
[Figure 4.2: Interval information for the first measures of All Of Me.
  interval (semitones):  −5  −3   8   2  −2  −1  −3  −4
  registral direction:    d   d   u   u   d   d   d   d
  interval size:          S   S   L   S   S   S   S   S
  interval difference:      2   5   6   0   1   2   1
  similar intervals?:       T   F   F   T   T   T   T
  same reg. direction?:     T   F   T   F   T   T   T ]
Intervals of 6 semitones are ambiguous (and may therefore give rise to multiple I-R analyses of the same melody). To avoid this additional complexity, the syntactic parametric scale in the parser defines intervals of 6 semitones to be large as well. Figure 4.2 illustrates this for the first measures of All of Me. Additionally, the differences in size between pairs of consecutive intervals are computed, and the pairs are labeled with two Boolean labels: intervallic similarity and registral sameness. These two features are of primary importance in the classification of a sequence of notes as a particular I-R structure. Two consecutive intervals are said to be intervallically similar whenever their intervallic differentiation (i.e. the difference between the interval sizes) is less than or equal to a minor third. The registral sameness predicate holds if subsequent intervals have the same registral direction, which can be either upward, downward, or lateral (in the case of a prime interval).
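Step 1 thus reduces to a few elementwise computations over the pitch sequence. A sketch (our own code; the example pitches are chosen so as to reproduce the interval row of figure 4.2, not quoted from the score):

    def interval_info(pitches):
        """Interval sizes, directions, and the two Boolean features used for
        I/R classification: intervals of 6 or more semitones count as large,
        and consecutive intervals are 'similar' if their sizes differ by at
        most a minor third (3 semitones)."""
        ivs = [b - a for a, b in zip(pitches, pitches[1:])]
        direction = ["u" if i > 0 else "d" if i < 0 else "lat" for i in ivs]
        size = ["L" if abs(i) >= 6 else "S" for i in ivs]
        similar = [abs(abs(j) - abs(i)) <= 3 for i, j in zip(ivs, ivs[1:])]
        same_dir = [d1 == d2 for d1, d2 in zip(direction, direction[1:])]
        return ivs, direction, size, similar, same_dir

    print(interval_info([72, 67, 64, 72, 74, 72, 71, 68, 64]))  # compare figure 4.2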
Mark Closure
In the I-R model, the term closure is used for the inhibition of implication. This inhibition can be established by several conditions, the most prominent being rests, durational cumulation (which occurs where a note has a significantly longer duration than its predecessor), metrical accents, and the resolution of dissonance into consonance. These conditions can occur in any combination. Depending on the number of dimensions in which closure occurs and the degree of closure in each dimension, the overall degree of closure will differ. The closure detection step of the parser presented here detects four conditions for closure, all related to meter and rhythm:
1. durational cumulation (the duration of a note is substantially longer than its predecessor);
2. the onset of a note falls on a metrical accent;
3. the occurrence of a rest after a note; and
4. a metrical accent occurs while the note is sounding.
For each note in the melody, every condition is evaluated, as illustrated in figure 4.3. The total degree of closure depends on the conditions that are satisfied. Weak closure occurs when any of the closure conditions holds. Strong closure occurs either when condition (3) holds, or when conditions (1) and (3) hold and (1) does not hold for the next position. But in order to determine the final positions where strong or weak closure occurs, one must also take into account factors that may inhibit closure.
[Figure 4.3: Closure information for the first measures of All Of Me.
  durational cumulation?:  F  F  9  F  F  F  1.5  F  13
  metrical emphasis?:      T  F  F  T  F  F  T    F  F
  precedes rest?:          F  F  F  F  F  F  F    F  F
  spans metrical emph.?:   F  F  T  F  F  F  F    F  T
  (positions of strong and weak closure follow from these conditions).]
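A sketch of this condition-marking step (our own code; notes are (onset, duration) pairs in beats, the metrical accent grid is supplied as a predicate, and the durational-cumulation threshold is an assumption):

    def mark_closure(notes, is_metrical_accent, cumulation_ratio=1.5):
        """Evaluate the four closure conditions for each note."""
        marks = []
        for i, (onset, dur) in enumerate(notes):
            prev_dur = notes[i - 1][1] if i > 0 else dur
            next_onset = notes[i + 1][0] if i + 1 < len(notes) else onset + dur
            cond1 = i > 0 and dur >= cumulation_ratio * prev_dur  # durational cumulation
            cond2 = is_metrical_accent(onset)                     # onset on an accent
            cond3 = next_onset - (onset + dur) > 0                # a rest follows
            cond4 = any(is_metrical_accent(b)                     # accent while sounding
                        for b in range(int(onset) + 1, int(onset + dur) + 1)
                        if b < onset + dur)
            marks.append((cond1, cond2, cond3, cond4))
        return marks

    notes = [(0, 1), (1, 1), (2, 2), (4.5, 1)]
    print(mark_closure(notes, lambda b: b % 2 == 0))  # accents on even beats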
Mark Inhibition of Closure
The conditions mentioned in the previous subsection in themselves imply closure. However, there are other factors that may inhibit such closure. This causes I-R structures to chain and combine. There are eight conditions that inhibit closure in other parameters. Four of them deal with harmony, melodic consonance/dissonance, dynamics, and accelerando/ritardando, respectively. As mentioned before, those factors are ignored in the parser. The remaining factors are:
1. The absence of metrical emphasis;
2. The envelopment of a metric accent by a P process in the context of additive (equally long) or counter-cumulative (long to short) durations;
3. The envelopment of a metric accent by a D process in the context of additive (equally long) durations; and
4. The occurrence of repeated syncopation.
When any of these conditions is satisfied, any metrical closure that was detected in the previous step is canceled. Notice that conditions (2) and (3) require that the I-R structure is known for the intervals surrounding the note that is investigated for closure inhibition. Although the final I-R structures can only be calculated after the structure boundaries have been identified, we compute the I-R structure of the relevant interval pair regardless of the boundary positions, in order to evaluate these conditions. In the example shown in figures 4.2 and 4.3, the last two triplet notes at the end of the second bar satisfy condition (1), and the last triplet note additionally satisfies condition (3); as a result, there is no boundary on either of these notes.
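The cancelation itself is then straightforward; a minimal sketch, assuming the per-note closure degrees from the previous step and the inhibition conditions given as predicates over note indices:

    def apply_inhibition(closure, inhibitors):
        """Cancel metrical closure wherever an inhibition condition holds.

        closure holds 'strong', 'weak', or None per note; inhibitors is a
        list of predicates mapping a note index to True when the respective
        condition holds (conditions (2) and (3) internally consult the
        provisionally computed I-R structure, as described above).
        """
        return [None if any(inh(i) for inh in inhibitors) else c
                for i, c in enumerate(closure)]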
Identify I-R Structures
When the final degree of closure is established for each point in the sequence, the next step is the categorization of the interval pairs, based on the information gathered during the first step. A note on which weak closure occurs will be the terminal note of one I-R structure and the initial note of its successor. A note on which strong closure occurs will be the terminal note of an I-R structure, but its successor structure will begin on the next note. The interval between these two notes is not part of any I-R structure, expressing that this interval neither implies nor is implied by anything. In between two points of closure, combining and chaining occurs (i.e. consecutive structures share one interval), due to the lack of closure. The I-R structures are identified by taking every consecutive pair of intervals between closure points and traversing the decision tree depicted in figure 4.4, based on the interval information gathered earlier. Two special situations deserve attention. Firstly, the case where two successive notes both have strong closure. In this situation, a dyadic structure applies, which spans only one interval and is denoted by a number indicating the size of the interval in semitones. The second case occurs when a note occurs in isolation (e.g. separated from the melody by rests). This leads to a monadic structure, denoted by the letter M.
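To illustrate, the segmentation logic can be sketched as follows. This is a simplification that glosses over boundary subtleties (e.g. the exact treatment of consecutive strong closures); pitches is a list of MIDI pitch numbers, closure holds the per-note closure degrees established above, and classify stands for a traversal of the decision tree of figure 4.4:

    def identify_structures(pitches, closure, classify):
        """Carve the melody into I-R structures between closure points."""
        # Split the note indices into regions delimited by closure points.
        regions, start = [], 0
        for i in range(len(pitches)):
            if closure[i] == 'weak':
                regions.append((start, i))   # the boundary note is shared
                start = i
            elif closure[i] == 'strong':
                regions.append((start, i))   # the successor starts on the
                start = i + 1                # next note; the gap interval
                                             # belongs to no structure
        if start < len(pitches) - 1:
            regions.append((start, len(pitches) - 1))

        structures = []
        for a, b in regions:
            if a == b:                       # isolated note: monadic ('M')
                structures.append((a, 'M'))
            elif b - a == 1:                 # single interval: dyadic
                structures.append((a, str(abs(pitches[b] - pitches[a]))))
            else:                            # every consecutive interval pair
                for s in range(a, b - 1):
                    structures.append((s, classify(pitches[s:s + 3])))
        return structures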
Aggregate Chained P and D Structures
In the I-R model, there are two basic structures that can span more than three notes, viz. P and D structures. The former occur when three or more notes form an ascending or descending sequence of similar intervals; the latter occur when three or more notes continue in lateral direction, that is, have the same pitch. A further condition is the absence of closure, for example by durational cumulation. In the previous step, only structures of three notes have been identified. Longer P and D structures therefore necessarily manifest themselves, at this stage, as combined or chained repetitions of P and D structures, respectively. Thus, to arrive at a correct I-R analysis, any combined or chained repetition of P or D structures is replaced by a single P or D structure, spanning all of the notes that were spanned by the original structures.

Figure 4.4: Decision tree for labeling pairs of consecutive melodic intervals $I_1$ and $I_2$. SameDir: $I_1$ and $I_2$ have the same registral direction; SimInt: $I_1$ and $I_2$ are similar in size; IntDiff: $|I_1| - |I_2|$.
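Operationally, this aggregation is a single pass that collapses runs of note-sharing, identically labeled structures; a minimal sketch, assuming structures arrive as (label, first note, last note) triples:

    def aggregate(structures):
        """Collapse chained or combined repetitions of P or D structures.

        structures is a list of (label, first_note, last_note) triples in
        melodic order; overlapping structures with the same P or D label
        are merged into one structure spanning all of their notes.
        """
        merged = []
        for label, first, last in structures:
            if (merged and label in ('P', 'D') and merged[-1][0] == label
                    and merged[-1][2] >= first):      # they share notes
                _, prev_first, _ = merged.pop()
                merged.append((label, prev_first, last))
            else:
                merged.append((label, first, last))
        return merged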
Figure 4.5: Representation of the I-R analysis as a sequence of I-R structure objects:

    Label:       P     ID    P     P
    NrOfNotes:   3     3     3     3
    Direction:   down  up    down  down
    Overlap:     0     2     1     0
4.1.3 I-R Analysis Representation
The final I-R analysis for the example is shown in figure 4.5. At the bottom, the figure shows the concrete representation of the analysis, as output by the parser. The analysis is a sequence of I-R structure objects that have as attributes:
• the name of the I-R structure (e.g. P, VP)
• the number of notes the structure spans
• the registral direction (defined as the sign of pitchLastNote − pitchFirstNote; if zero, then the sign of pitchLastNote − pitchMiddleNote)
• the number of notes shared with successor structure
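Rendered in Python (an illustrative sketch; the parser's concrete output syntax is not reproduced here), such objects and the analysis of figure 4.5 could look like:

    from dataclasses import dataclass

    @dataclass
    class IRStructure:
        label: str       # name of the I-R structure, e.g. 'P', 'ID', 'VP'
        n_notes: int     # number of notes the structure spans
        direction: int   # sign of the overall pitch motion, as defined above
        overlap: int     # number of notes shared with the successor structure

    # The analysis of figure 4.5 as a sequence of such objects
    # (direction: -1 = down, +1 = up):
    analysis = [IRStructure('P', 3, -1, 0), IRStructure('ID', 3, +1, 2),
                IRStructure('P', 3, -1, 1), IRStructure('P', 3, -1, 0)]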
In section 5.2.2, we will propose a method for computing melodic similarity that is based on the I-R analysis. This method compares sequences of I-R structures using the edit-distance. In the next section we give a general overview of the edit-distance and show how it can be extended to allow for context-sensitive comparison of sequence elements.
4.2 Flexible Sequence Alignment/Comparison: the Edit-Distance
The edit-distance, or Levenshtein-distance [Levenshtein, 1966], is a measure for comparing sequences (or strings) that is commonly used in a variety of fields and applications dealing with sequential data. Common application domains are, for example, spell-checking of text and bio-computing, where it has been used to solve pattern finding and information retrieval problems. In the domain of music, the edit-distance has been used for computing melodic similarity [Mongeau and Sankoff, 1990; Smith et al., 1998], score following/automatic accompaniment [Dannenberg, 1984; Puckette and Lippe, 1992], and performance transcription [Large, 1993; Pardo and Birmingham, 2002].

The edit-distance is defined as the minimal cost of a sequence of editions needed to transform a source sequence into a target sequence, given a predefined set of edit-operations. The canonical set of operations consists of insertion, deletion, and replacement of sequence elements. The cost of a particular edit-operation is defined through a cost function $w$ for that operation, which computes the cost of applying that operation to the elements of the source and target sequences given as parameters to $w$. Let $A$ be the alphabet, i.e. the set of possible symbols occurring in sequences, and let $s$ and $t$ be finite sequences over $A$, of length $M$ and $N$ respectively. Then the typical cost functions associated with insertion, deletion, and replacement are, respectively:
\begin{align}
  w(\emptyset, t_j) &= 1 && \text{(insertion)} \tag{4.1} \\
  w(s_i, \emptyset) &= 1 && \text{(deletion)} \tag{4.2} \\
  w(s_i, t_j) &= \begin{cases} 0 & \text{if } s_i = t_j \\ 1 & \text{otherwise} \end{cases} && \text{(replacement)} \tag{4.3}
\end{align}
where $s_i$ ($0 \leq i < M$) and $t_j$ ($0 \leq j < N$) are the elements of $s$ and $t$, respectively. In case $M \neq N$, insertion and deletion operations are used to accommodate the superfluous elements of the longer sequence. The above edit cost scenario is called the unit cost model: the cost of edit-operations does not depend on any characteristic of the elements under consideration, apart from their equality or inequality in the case of replacement.
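Under the unit cost model just described, the edit-distance is computable with the classic dynamic programming scheme; the following minimal sketch is an illustration, not the implementation used in this dissertation:

    def edit_distance(s, t):
        """Unit cost edit-distance (Levenshtein) by dynamic programming."""
        M, N = len(s), len(t)
        d = [[0] * (N + 1) for _ in range(M + 1)]
        for i in range(M + 1):
            d[i][0] = i                    # delete all of s[:i]
        for j in range(N + 1):
            d[0][j] = j                    # insert all of t[:j]
        for i in range(1, M + 1):
            for j in range(1, N + 1):
                d[i][j] = min(d[i - 1][j] + 1,           # deletion
                              d[i][j - 1] + 1,           # insertion
                              d[i - 1][j - 1]
                              + (s[i - 1] != t[j - 1]))  # replacement
        return d[M][N]

    assert edit_distance('kitten', 'sitting') == 3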
More specialized cost models can be defined that take into account the semantic relations between the symbols of the alphabet. The sequences may also consist of composite structures, such as attribute-value pairs, rather than plain symbols. Moreover, the set of edit-operations for computing the edit-distance is not necessarily limited to insertion, deletion, and replacement: edit-operations may operate on subsequences of arbitrary size from the source and the target sequences, and the edit-distance can be defined to deal with any set of edit-operations. To do this, we write $w_{K,L}$ to denote that a cost function $w$ takes a subsequence of length $K$ of the source sequence as its first input parameter, and a subsequence of length $L$ of the target sequence as its second. For example, a deletion operation would have a cost function $w_{1,0}$. For a given set of edit-operations, let $W$ be the set of corresponding cost functions, and let
$$V_{i,j} = \{\, w_{K,L} \mid K \leq i \wedge L \leq j \,\} \subseteq W$$
be the subset of cost functions that accept subsequences with maximal lengths of $i$ and $j$, respectively. Furthermore, let $s_{1:i} = \langle s_1, \cdots, s_i \rangle$ and $t_{1:j} = \langle t_1, \cdots, t_j \rangle$ be the source and target sequences, respectively. Then the edit-distance $d_{i,j}$ between $s_{1:i}$ and $t_{1:j}$ is defined recursively as¹: