Hidden Markov Models for Music Classification
By Xinru Liu

A Study Presented to the Faculty of Wheaton College in Partial Fulfillment of the Requirements for Graduation with Departmental Honors in Mathematics

Norton, Massachusetts
May 2019

Acknowledgement

I would like to first thank my thesis advisor, Professor Michael Kahn, for helping me accomplish this year-long honors thesis. I have been taking classes with him since I was a freshman, and I was lucky to be able to complete this honors thesis with him in my senior year. I appreciate all the instruction, care, encouragement and love he gave me over the past four years. Secondly, I would like to thank my parents and my sister, Xinyi Liu, for giving me moral support when I was struggling and frustrated during my senior year. Without their support, I would not have been able to get through this hardship. I would also like to thank my piano teacher, Professor Lisa Romanul, for teaching me for four years. This thesis is motivated by my curiosity about the similarities between Western piano pieces, and it was taking piano lessons with her that made me fascinated with classical music. I am grateful for all the care she gave me. I would also like to thank all the professors at Wheaton College whose classes I took, especially my committee members, Professor Mike Gousie and Professor Tommy Ratliff, who gave many good suggestions on this thesis. Finally, I would like to thank all my friends who shared the joy and sorrow with me during the past four years, including Xinyi Liu, Zhuo Chen, Cheng Zhang, Shi Shen, Martha Bodell, Jenny Migotsky, Keran Yang, Weiqi Feng and many others. I had my happiest four years at Wheaton, and they made my life full of fun and energy.

Contents

Acknowledgement
1 Introduction
2 Music Information Retrieval
  2.1 Music Information Retrieval
  2.2 Mel-Frequency Cepstral Coefficients
  2.3 Implementation of MFCCs
3 Hidden Markov Model
  3.1 Hidden Markov Model
    3.1.1 Markov Chain
    3.1.2 Hidden Markov Models
  3.2 Forward Algorithm
  3.3 Backward Algorithm
  3.4 Parameter Estimation
    3.4.1 Expectation-Maximization Algorithm
    3.4.2 Baum-Welch Algorithm
  3.5 Similarity Metric
4 Initialization
  4.1 Initial Parameter Estimation
    4.1.1 Model-Based Agglomerative Hierarchical Clustering
5 Experiments and Results
  5.1 Composers
    5.1.1 Bach
    5.1.2 Beethoven
    5.1.3 Schubert
    5.1.4 Chopin
    5.1.5 Debussy
    5.1.6 Schumann
    5.1.7 Schoenberg
  5.2 Experiments
  5.3 Discussion
    5.3.1 Validation
    5.3.2 Analysis of the Result
    5.3.3 Accuracy
    5.3.4 Problems and Concerns
6 Conclusion and Future Work
A Appendix
List of Figures

Chapter 1

Introduction

Music Information Retrieval (MIR) is concerned with extracting features from music (audio signal or notated music), as well as developing different search and retrieval schemes[1]. With the explosion in the availability of music in the past two decades (both digital audio and musical scores), more individuals have access to large music collections. People use online or streaming music repositories, such as Spotify[2] and Pandora[3], to access music.
Others obtain music scores from music stores or from online repositories such as IMSLP[4]. MIR developed in response to music retrieval applications focused on matching personal tastes to corresponding music. For the digital audio representation of music, the main goal of MIR is to characterize different musical features of the audio and to bridge the gap between extractable features and human music perception[1]. One of the factors that determines how music is perceived is musical "similarity", which is difficult to define and is particularly complex because of its numerous parameters (timbre, melody, rhythm, harmony). Similarity metrics measure some inherent structure of a music collection, and the acceptance of a music retrieval system crucially depends on whether the user can recognize similarities between the given piece and the retrieved music. One way of comparing audio recordings is to extract features from the audio signal that reflect the relevant aspects of the recordings, and then to define and compute a measure of similarity on the extracted information. Timbre is a feature used to distinguish the same tone performed by different instruments, and it is one of the most important dimensions of a piece of music. Mel-frequency cepstral coefficients (MFCCs) are a good measure of the perceptual timbre space[5]. MFCCs also capture some of the melodic structure in the music, but pitch-related features, such as chroma-based features (which meaningfully categorize pitches), are the most powerful representation for describing harmonic information[5]. Rhythmic features also provide information about the music's structure. Timbre, melody and rhythm are three of the most important features representing the perceptual cues of a piece of music, and researchers mainly focus on extracting these features from the music.
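To make the MFCC computation discussed above concrete, the following is a minimal numpy sketch of the standard pipeline (framing, Hamming windowing, power spectrum, mel filterbank, log compression, DCT). The frame length, hop size and filter count are illustrative defaults, not the settings used in this thesis, and the helper names are my own:

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy's formula for the mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr, n_mfcc=13, n_fft=512, hop=256, n_filters=26):
    # 1. Slice the signal into overlapping frames; apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Mel filterbank energies, then log compression.
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 4. DCT-II decorrelates the log energies; keep the first n_mfcc coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_filters))
    return log_energy @ dct.T  # shape: (n_frames, n_mfcc)

# Usage: one second of a 440 Hz sine sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
coeffs = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
print(coeffs.shape)  # (30, 13): 30 frames, 13 coefficients each
```

In practice a library such as librosa computes MFCCs with more careful defaults (pre-emphasis, liftering, mel-scale variants); the sketch only shows where the mel scale and the DCT enter the pipeline.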
Motivated by the goal of recognizing the similarities and relationships between different music pieces, several statistical models have been developed for analysis. Both supervised and unsupervised classification methods have been applied in previous research. In Xu et al.'s paper, the authors proposed a music genre classification method using the support vector machine, a supervised machine learning algorithm[6]. The paper concludes that multi-layer support vector machines perform better than traditional Euclidean-distance-based methods and statistical learning methods. In Tzanetakis and Cook's paper[7], the Gaussian Mixture Model (GMM) and K-Nearest Neighbors are employed to classify music pieces. However, those methods treat the data as independent and identically distributed samples and fail to take the dependent, dynamic features of the music into consideration[8]. In Qi et al.'s paper[8], the Hidden Markov Model (HMM) was proposed to accurately represent the characteristics of sequential data. The authors apply an HMM mixture model in a Bayesian setting, using a non-parametric Dirichlet process (DP) as a prior distribution, and compute a similarity matrix between the HMM mixture models trained on each piece. The paper compares the results from DP HMM mixture models and DP Gaussian mixture models and concludes that the HMM mixture models better distinguish the content of the given music by taking its temporal character into account, providing sharper contrasts in similarities than the GMMs[8].

The motivation for this thesis comes from my own curiosity about the similarities between Western classical piano pieces by different composers. In a narrow sense, "Classical" refers to the period from 1750 to 1820; the major time divisions of Western classical music include the Baroque, Classical and Romantic periods.
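The HMM-based approaches above all reduce to scoring an observation sequence against a trained model, which is exactly what the forward algorithm of Chapter 3 computes. As a preview, here is a minimal numpy sketch for a discrete-observation HMM; the two toy models and the observation sequence are invented for illustration and are not trained from music:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood log P(obs | model) of a discrete-observation HMM,
    computed with the forward algorithm and per-step scaling to avoid
    numerical underflow on long sequences."""
    alpha = pi * B[:, obs[0]]          # initial forward variables
    logp = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
        c = alpha.sum()                # scaling constant for this step
        logp += np.log(c)
        alpha = alpha / c
    return logp

# Two toy 2-state models sharing initial and transition probabilities
# but differing in their emission matrices (rows = states, cols = symbols).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
B1 = np.array([[0.8, 0.2],
               [0.2, 0.8]])   # each state prefers one symbol
B2 = np.array([[0.5, 0.5],
               [0.5, 0.5]])   # emissions carry no information

obs = np.array([0] * 6 + [1] * 6)  # two long same-symbol runs

# The run-structured sequence is more likely under the "sticky" model B1.
print(forward_loglik(obs, pi, A, B1) > forward_loglik(obs, pi, A, B2))  # True
```

Comparing such log-likelihoods of one piece under models trained on other pieces is the germ of the similarity metric developed later in this thesis.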
Prominent composers during the entire classical era include Johann Sebastian Bach, Wolfgang Amadeus Mozart, Ludwig van Beethoven, Franz Schubert, Frederic Chopin, Robert Schumann, Claude Debussy and more. Although composing styles differ between composers and across time periods, there exist interesting connections between these composers and their music. For example, Franz Schubert and Ludwig van Beethoven are often compared and contrasted in a number of ways because of their temporal and spatial proximity in the early nineteenth century. Indeed, there are many similarities in the two composers' piano works. However, their music can be distinguished not only by their different compositional processes but also by their distinctive personalities. Although these characteristics cannot be measured directly, they are reflected in the music, which can be quantified. Similar connections occur when comparing Schumann and Chopin. One of the piano pieces in Schumann's Carnaval, Op. 9, is called "Chopin", which Schumann wrote in homage to his colleague, Frederic Chopin. There are so many "Chopin elements" in the piece that many people misclassify it as a work by Chopin. So the question is: if the human ear can detect similarities and differences between piano pieces, can a model be built that is able to tell which pieces are more similar to each other? Is there a metric that can be built to measure this similarity? The goal of this thesis is to build Hidden Markov Models on piano pieces by different composers and to develop a similarity metric to measure the similarity between piano works. In order to examine the effectiveness of the model and the similarity metric, a database of pre-characterized piano pieces, each trained with a Hidden Markov Model, will be built. A new piece can then be put into those trained models and the "similarity" between the new piece