Improving Supervised Music Classification by Means of Multi
Total Page:16
File Type:pdf, Size:1020Kb
Improving Supervised Music Classification by Means of Multi-Objective Evolutionary Feature Selection Dissertation zur Erlangung des Grades eines Doktors der Naturwissenschaften der Technischen Universit¨atDortmund an der Fakult¨atf¨urInformatik von Igor Vatolkin Dortmund 2013 II Tag der m¨undlichen Pr¨ufung : 17.04.2013 Dekan : Prof. Dr.-Ing. Gernot Fink Gutachter : Prof. Dr. G¨unter Rudolph : Prof. Dr. Claus Weihs Contents Abstract 1 Acknowledgements3 1. Introduction5 1.1. Motivation and scope..............................5 1.2. Main achievements and structure of the thesis.................7 1.3. Previous own publications............................8 2. Music Data Analysis 11 2.1. Background.................................... 11 2.1.1. Application scenarios........................... 12 2.1.2. Music data sources............................ 14 2.1.3. Algorithm chain............................. 17 2.2. Feature extraction................................ 21 2.2.1. Low-level and high-level descriptors.................. 21 2.2.2. Music and signal processing....................... 24 2.2.3. Audio features.............................. 28 2.2.3.1. Timbre and energy....................... 30 2.2.3.2. Chroma and harmony..................... 31 2.2.3.3. Tempo, rhythm, and structure................ 33 2.2.3.4. High-level descriptors from classification models...... 33 2.3. Feature processing................................ 33 2.3.1. Preprocessing............................... 36 2.3.2. Processing of feature dimension..................... 37 2.3.3. Processing of time dimension...................... 38 2.3.4. Building of classification frames..................... 43 2.4. Classification................................... 44 2.4.1. Decision trees and random forest.................... 47 2.4.2. Naive Bayes................................ 49 2.4.3. Support vector machines......................... 50 3. Feature Selection 53 3.1. Targets and methodology............................ 53 3.2. Evolutionary feature selection.......................... 56 3.2.1. Basics of evolutionary algorithms.................... 56 3.2.2. Multi-objective optimisation...................... 58 3.2.3. Reasons for evolutionary multi-objective feature selection...... 60 3.2.4. SMS-EMOA customisation for feature selection............ 62 3.3. Sliding feature selection............................. 64 IV Contents 3.4. Related works................................... 66 3.4.1. Evolutionary feature selection...................... 66 3.4.2. Feature selection in music classification................ 68 4. Evaluation Methods 71 4.1. Evaluation metrics................................ 71 4.1.1. Confusion matrix and classification performance measures...... 72 4.1.2. Further metrics.............................. 76 4.2. Organisation of data for evaluation....................... 79 4.3. Statistical hypothesis testing.......................... 82 5. Application of Feature Selection 87 5.1. Recognition of high-level features........................ 87 5.1.1. Instruments................................ 87 5.1.2. Moods................................... 91 5.1.3. GFKL 2011 features........................... 95 5.2. Recognition of genres and styles......................... 97 5.2.1. Low-level feature selection........................ 104 5.2.2. High-level feature selection....................... 108 5.2.3. Comparison of low-level and high-level feature sets.......... 111 5.2.4. Analysis of high-level features...................... 118 6. Conclusions 127 6.1. Summary of results................................ 127 6.2. Directions for future research.......................... 130 A. Feature Lists 135 A.1. Timbre and energy features (low-level)..................... 135 A.2. Chroma and harmony features (low-level and high-level)........... 136 A.3. Temporal characteristics (low-level and high-level).............. 137 A.4. Instruments (high-level)............................. 138 A.5. Moods (high-level)................................ 139 A.6. GFKL-2011 (high-level)............................. 140 A.7. Structural complexity characteristics and features for their estimation... 143 B. Song Lists 145 B.1. Genre and style album distribution....................... 145 B.2. Genre and style training sets.......................... 148 B.3. Genre and style optimisation and holdout sets................. 151 Bibliography 157 List of Symbols 177 Index 179 Abstract In this work, several strategies are developed to reduce the impact of the two limitations of most current studies in supervised music classification: the classification rules and music features have often a low interpretability, and the evaluation of algorithms and feature subsets is almost always done with respect to only one or a few common evaluation criteria separately. Although music classification is in most cases user-centered and it is desired to understand well the properties of related music categories, many current approaches are based on low- level characteristics of the audio signal. We have designed a large set of more meaningful and interpretable high-level features, which may completely replace the baseline low-level feature set and are even capable to significantly outperform it for the categorisation into three music styles. These features provide a comprehensible insight into the properties of music genres and styles: instrumentation, moods, harmony, temporal, and melodic char- acteristics. A crucial advantage of audio high-level features is that they can be extracted from any digitally available music piece, independently of its popularity, availability of the corresponding score, or the Internet connection for the download of the metadata and community features, which are sometimes erroneous and incomplete. A part of high-level features, which are particularly successful for classification into genres and styles, has been developed based on the novel approach called sliding feature selection. Here, high-level features are estimated from low-level and other high-level ones during a sequence of super- vised classification steps, and an integrated evolutionary feature selection helps to search for the most relevant features in each step of this sequence. Another drawback of many related state-of-the-art studies is that the algorithms and fea- ture sets are almost always compared using only one or a few evaluation criteria separately. However, different evaluation criteria are often in conflict: an algorithm optimised only with respect to classification quality may be slow, have high storage demands, perform worse on imbalanced data, or require high user efforts for labelling of songs. The simulta- neous optimisation of multiple conflicting criteria remains until now almost unexplored in music information retrieval, and it was applied for feature selection in music classification for the first time in this thesis, except for several preliminary own publications. As an exemplarily multi-objective approach for optimisation of feature selection, we simultane- ously minimise the classification error and the number of features used for classification. The sets with more features lead to a higher classification quality. On the other side, the sets with fewer features and a lower classification performance may help to strongly decrease the demands for storage and computing time and to reduce the risk of too com- plex and overfitted classification models. Further, we describe several groups of evaluation criteria and discuss other reasonable multi-objective optimisation scenarios for music data analysis. Acknowledgements This work would not have been possible without the support of many colleagues. First of all, I would like to thank both supervisors of the thesis, Prof. G¨unter Rudolph and Prof. Claus Weihs. In my opinion, they allowed me the perfect balance for my work and helped to carry out many of my own ideas. The supervisors always had enough time for discussions, encouraged me to present my results at conferences, and gave me an opportunity to participate in the organisation and supervision of student seminars, a project group and several master theses. A solid basis for any classification task is the feature set. I would like to thank Markus Eichhoff, Dr. Antti Eronen, Dr. Olivier Lartillot, Dr. Matthias Mauch, Dr. Ingo Mierswa, Prof. Meinard M¨uller,Anil Nagathil and Dr. Wolfgang Theimer for their help on the integration of features from different software toolboxes. Our student assistants, Daniel Stoller and Clemens W¨altken, contributed to the imple- mentation of AMUSE. Thank you! The first steps in music classification and many fruitful discussions were carried out during my cooperation with the Nokia Research Center, Bochum. During that time I had a great support by Prof. Martin Botteck (now FH S¨udwestfalen, Meschede) and Dr. Wolfgang Theimer (now BlackBerry, Bochum). I am grateful to several colleagues, which supported me in certain aspects of the work. Several of discussions led to joint publications. I would like to thank Bernd Bischl for the introduction into the basics of statistical testing and proper evaluation, Prof. Dietmar Jannach for discussions on music recommendation, Markus Eichhoff for the cooperation on instrument identification, Mike Preuß for discussions on the evolutionary algorithms, and Prof. G¨unther R¨otterfor the design of many high-level features and the discussions related to music science. Especially encouraging talks and discussions were held during the regular meetings of the \Special Interest Group on Music Analysis"