IMPROVING MUSIC MOOD CLASSIFICATION USING LYRICS, AUDIO AND SOCIAL TAGS

BY

XIAO HU

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Library and Information Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2010

Urbana, Illinois

Doctoral Committee:

Professor Linda C. Smith, Chair
Associate Professor J. Stephen Downie, Director of Research
Associate Professor ChengXiang Zhai
Associate Professor P. Bryan Heidorn, University of Arizona

ABSTRACT

The affective aspect of music (popularly known as music mood) is a newly emerging metadata type and access point to music information, but it has not been well studied in information science. A suitable set of mood categories that reflects the reality of music listening and can be widely adopted in the Music Information Retrieval (MIR) community has yet to be developed. As music repositories have grown to an unprecedentedly large scale, there is a pressing demand for automatic tools for music classification and recommendation. However, only a few music mood classification systems have been built, their performance is suboptimal, and most of them are based solely on the audio content of the music. Lyric text and social tags are resources independent of and complementary to audio content, but they have yet to be fully exploited.

This dissertation research takes up these problems and aims to 1) summarize fundamental insights in music psychology that can help information scientists interpret music mood; 2) identify mood categories that are frequently used by real-world music listeners, through an empirical investigation of real-life social tags applied to music; and 3) advance the technology of automatic music mood classification through a thorough investigation of lyric text analysis and of the combination of lyrics and audio.

Using linguistic resources and human expertise, 36 mood categories were identified from the most popular social tags collected from last.fm, a major Western music tagging site. A ground truth dataset of 5,296 songs in 18 mood categories was built, with mood labels given by a number of real-life users. Both commonly used text features and advanced linguistic features were investigated, as well as different feature representation models and feature combinations. The best performing lyric feature sets were then compared to a leading audio-based system. In combining the lyric and audio sources, both feature concatenation and late fusion (linear interpolation) of classifiers were examined and compared. Finally, system performance with various numbers of training examples and with different audio lengths was compared.

The results indicate that: 1) social tags can help identify mood categories suitable for a real-world music listening environment; 2) the most useful lyric features are linguistic features combined with text stylistic features; 3) lyric features outperform audio features in terms of averaged accuracy across all considered mood categories; 4) systems combining lyrics and audio outperform audio-only and lyric-only systems; and 5) combining lyrics and audio can reduce the amount of training data required, both in the number of examples and in audio length.

The contributions of this research are threefold. On methodology, it improves the state of the art in music mood classification and text affect analysis in the music domain. The mood categories identified from empirical social tags can complement those in theoretical psychology models. In addition, many of the lyric text features examined in this study have never been formally studied in the context of music mood classification, nor compared to one another on a common dataset. On evaluation, the ground truth dataset built in this research is large and unique in having three kinds of information available: audio, lyrics and social tags. Part of the dataset has been made available to the MIR community through the Music Information Retrieval Evaluation eXchange (MIREX) 2009 and 2010, the community-based evaluation framework. The proposed method of deriving ground truth from social tags provides an effective alternative to expensive human assessments of music and thus clears the way for large-scale experiments. On application, the findings of this research help build effective and efficient music mood classification and recommendation systems by optimizing the interplay of music audio and lyrics. A prototype of such a system can be accessed at http://moodydb.com.

To Mom, Huamin, and Ethan

ACKNOWLEDGMENTS

I would never have made it to this stage of Ph.D. study without the help and support of many people. First and foremost, I am greatly indebted to my academic advisor and mentor, Dr. Linda C. Smith. I am blessed with an advisor and mentor who is supportive and encouraging, sincerely cares about students professionally and personally, provides detailed guidance while respecting one's initiative and academic creativity, and treats her advisees as future fellow scholars. The impact Linda has had on me will keep motivating and guiding me as I step beyond my dissertation.

I also cannot say thank you enough to my research director, Dr. J. Stephen Downie. I am grateful that Stephen introduced me to the general topic of Music Information Retrieval and then gave me the freedom to identify a project that fit my interests. He also set me up with research funding and the opportunity to gain hands-on experience on real-world research projects. In addition, Stephen offered incredible guidance on writing research articles as well as on completing the manuscript of this dissertation. Through my interactions with Stephen, I also learned the benefits of efficiency and of the active pursuit of research opportunities.

Many thanks go to Dr. ChengXiang Zhai. An early experience working with Cheng opened my eyes to information retrieval and text analytics. Besides helping me make progress on my research by giving me invaluable feedback, advice and encouragement, Cheng also inspired me to mature as an academic and to consider the big picture of research in information science.

Moreover, I am very grateful to Dr. P. Bryan Heidorn. Bryan was my research supervisor when I first joined the Ph.D. program and has since served on every committee of mine during my Ph.D. study. He provided kind support and guidance as I familiarized myself with a new academic and living environment. Over the years, Bryan has given me plenty of advice, not only on research methodology but also on future career paths.

I am thankful to the Graduate School of Library and Information Science for awarding me a Dissertation Completion Fellowship and for providing financial aid to complete this program and to attend academic conferences, so that I could meet and communicate with researchers from other parts of the world.
I also want to thank my teammates in the International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL): Mert Bay, Andreas Ehmann, Anatoliy Gruzd, Cameron Jones, Jin Ha Lee and Kris West, for their wonderful collaboration and beneficial discussions on research, as well as for the sweet memories of marching to conferences together. I also thank my peer doctoral students, who shared my joys and frustrations during the years in the program. It was a privilege to grow up with them all.

Finally, I simply owe too much to my family, especially my mom, my husband and my son. They endured the long process with great patience and never stopped believing in me. This dissertation is dedicated to them.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
CHAPTER 1: INTRODUCTION
1.1 INTRODUCTION
1.2 ISSUES IN MUSIC MOOD CLASSIFICATION
1.2.1 Mood Categories
1.2.2 Ground Truth Datasets
1.2.3 Multi-modal Classification
1.3 RESEARCH QUESTIONS
1.3.1 Identifying Mood Categories from Social Tags
1.3.2 Best Lyric Features
1.3.3 Lyrics vs. Audio
1.3.4 Combining Lyrics and Audio
1.3.5 Learning Curves and Audio Lengths
1.3.6 Research Question Summary
1.4 CONTRIBUTIONS
1.4.1 Contributions to Methodology