Computational Methods for Tonality-Based Style Analysis of Classical Music Audio Recordings
Total Page:16
File Type:pdf, Size:1020Kb
Fakult¨at fur¨ Elektrotechnik und Informationstechnik Computational Methods for Tonality-Based Style Analysis of Classical Music Audio Recordings Christof Weiß geboren am 16.07.1986 in Regensburg Dissertation zur Erlangung des akademischen Grades Doktoringenieur (Dr.-Ing.) Angefertigt im: Fachgebiet Elektronische Medientechnik Institut fur¨ Medientechnik Fakult¨at fur¨ Elektrotechnik und Informationstechnik Gutachter: Prof. Dr.-Ing. Dr. rer. nat. h. c. mult. Karlheinz Brandenburg Prof. Dr. rer. nat. Meinard Muller¨ Prof. Dr. phil. Wolfgang Auhagen Tag der Einreichung: 25.11.2016 Tag der wissenschaftlichen Aussprache: 03.04.2017 urn:nbn:de:gbv:ilm1-2017000293 iii Acknowledgements This thesis could not exist without the help of many people. I am very grateful to everybody who supported me during the work on my PhD. First of all, I want to thank Prof. Karlheinz Brandenburg for supervising my thesis but also, for the opportunity to work within a great team and a nice working enviroment at Fraunhofer IDMT in Ilmenau. I also want to mention my colleagues of the Metadata department for having such a friendly atmosphere including motivating scientific discussions, musical activity, and more. In particular, I want to thank all members of the Semantic Music Technologies group for the nice group climate and for helping with many things in research and beyond. Especially|thank you Alex, Ronny, Christian, Uwe, Estefan´ıa, Patrick, Daniel, Ania, Christian, Anna, Sascha, and Jakob for not only having a prolific working time in Ilmenau but also making friends there. Furthermore, I want to thank several students at TU Ilmenau who worked with me on my topic. Special thanks go to Prof. Meinard Muller¨ for co-supervising my thesis, for a lot of scientific input, and for some very fruitful collaborations during my PhD work, but also for being always welcome in his great research group where I have now the honour of being part of. Thank you also Jonathan, Thomas, Stefan, Christian, Patricio, Frank, Julia, and Vlora for this pleasant working atmosphere and the good time in the past year. I also received inspiration from other sides. In particular, I want to thank Simon Dixon, Matthias Mauch, and several PhD students from the Centre for Digital Music at Queen Mary University of London. Thank you for the chance to spend two extended research stays, which pushed me forward a lot. I am also grateful for all collaborations on the musicology side involving people in Wurzburg,¨ Saarbrucken,¨ and others. In this context, I also want to thank Prof. Wolfgang Auhagen for co-supervising my dissertation. Furthermore, I am thankful to those who taught me to understand and love music, with a special mention of Karin Berndt-Vogel, Hermann Beyer, Prof. Zsolt G´ardonyi, Tobias Schneid, and Prof. Heinz Winbeck. Fortunately, I had the opportunity to focus on my work and following my own ideas with- out being distracted with many other tasks. I am very grateful to the Foundation of German Business (Stiftung der Deutschen Wirtschaft), whose financial support during both my stud- ies and my PhD time opened up many possibilities for me. Far more importantly, the unique spirit of this great community gave me a lot of inspiration. Thanks to everyone working in Berlin for creating this special atmosphere and thanks to all sdw friends in Wurzburg,¨ Ilmenau, Erfurt, and elsewhere for bringing this spirit to life. Finally, I want to say a deep \thank you" to my parents Rita and Josef Weiß. Thank you for accompanying me all the way, for giving me the chance to do the things I want to do, and for laying the foundations to this. I also want to thank my brother Thomas and my extended family for all support and interest. And last, thank you, Ulli, for your love and support, and for being there in the good and in the harder times. iv v Abstract With the tremendously growing impact of digital technology, the ways of accessing music crucially changed. Nowadays, streaming services, download platforms, and private archives provide a large amount of music recordings to listeners. As tools for organizing and browsing such collections, automatic methods have become important. In the area of Music Informa- tion Retrieval, researchers are developing algorithms for analyzing and comparing music data with respect to musical characteristics. One typical application scenario is the classification of music recordings according to categories such as musical genres. In this thesis, we approach such classification problems with the goal of discriminating subgenres within Western classical music. In particular, we focus on typical categories such as historical periods or individual composers. From a musicological point of view, this classi- fication problem relates to the question of musical style, which constitutes a rather ill-defined and abstract concept. Usually, musicologists analyze musical scores in a manual fashion in order to acquire knowledge about style and its determining factors. This thesis contributes with computational methods for realizing such analyses on comprehensive corpora of audio recordings. Though it is hard to extract explicit information such as note events from audio data, the computational analysis of audio recordings might bear great potential for musi- cological research. One reason for this is the limited availability of symbolic scores in high quality. The style analysis experiments presented in this thesis focus on the fields of harmony and tonality. In the first step, we use signal processing techniques for computing chroma representations of the audio data. These semantic \mid-level" representations capture the pitch class content of an audio recording in a robust way and, thus, constitute a suitable starting point for subsequent processing steps. From such chroma representations, we derive measures for quantitatively describing stylistic properties of the music. Since chroma features suppress timbral characteristics to a certain extent, we hope to achieve invariance to timbre and instrumentation for our analysis methods. Inspired by the characteristics of the chroma representations, we model in this thesis specific concepts from music theory and propose algorithms to measure the occurence of certain tonal structures in audio recordings. One of the proposed methods aims at estimating the global key of a piece by considering the particular role of the final chord. Another contribution of this thesis is an automatic method to visualize modulations regarding diatonic scales as well as scale types over the course of a piece. Furthermore, we propose novel techniques for estimating the presence of specific interval and chord types and for measuring more abstract notions such as tonal complexity. In first experiments, we show the features' behavior for individual pieces and discuss their musical meaning. On the basis of these novel types of audio features, we perform comprehensive experiments for analyzing and classifying audio recordings regarding musical style. For this purpose, we apply methods from the field of machine learning. Using unsupervised clustering methods, we investigate the similarity of musical works across composers and composition years. Even though the underlying feature representations may be imprecise and error-prone in some cases, we can observe interesting tendencies that may exhibit some musical meaning when vi analyzing large databases. For example, we observe an increase of tonal complexity during the 19th and 20th century on the basis of our features. As an essential contribution of this dissertation, we perform automatic classification experiments according to historical periods (\eras") and composers. We compile two datasets, on which we test common classifiers using both our tonal features and standardized audio features. Despite the vagueness of the task and the complexity of the data, we obtain good results for the classification with respect to historical periods. This indicates that the tonal features proposed in this thesis seem to robustly capture some stylistic properties. In contrast, using standardized timbral features for classification often leads to overfitting to the training data resulting in worse performance. Comparing different types of tonal features revealed that features relating to interval types, tonal complexity, and chord progressions are useful for classifying audio recordings with respect to musical style. This seems to validate the hypothesis that tonal characteristics can be discriminative for style analysis and that we can measure such characteristics directly from audio recordings. In summary, the interplay between musicology and audio signal processing can be very promising. When applied to a specific example, we have to be careful with the results of computational methods, which, of course, cannot compete with the experienced judgement of a musicologist. For analyzing comprehensive corpora, however, computer-assisted techniques provide interesting opportunities to recognize fundamental trends and to verify hypotheses. vii Zusammenfassung Im Zuge der fortschreitenden Digitalisierung vieler Lebensbereiche ist eine deutliche Ver¨ande- rung des Musikangebots festzustellen. Streamingdienste, Downloadportale und auch private Archive stellen dem H¨orer umfangreiche Kollektionen von Musikaufnahmen zur Verfugung.¨ Bei der Strukturierung solcher Archive und der Suche nach Inhalten spielen automatische Methoden eine immer wichtigere Rolle. In diesem Kontext widmet sich der noch junge For- schungsbereich des Music Information Retrieval