UCLA Electronic Theses and Dissertations
Total Page:16
File Type:pdf, Size:1020Kb
UCLA UCLA Electronic Theses and Dissertations Title Data-Driven Structural Sequence Representations of Songs and Applications Permalink https://escholarship.org/uc/item/7dc0q8cw Author Wang, Chih-Li Publication Date 2013 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA Los Angeles Data-Driven Structural Sequence Representations of Songs and Applications A dissertation submitted in partial satisfaction of requirements for the degree Doctor of Philosophy in Electrical Engineering by Chih-Li Wang 2013 © Copyright by Chih-Li Wang 2013 ABSTRACT OF THE DISSERTATION Data-Driven Structural Sequence Representations of Songs and Applications by Chih-Li Wang Doctor of Philosophy in Electrical Engineering University of California, Los Angeles, 2013 Professor Vwani Roychowdhury, Chair Content-based music analysis has attracted considerable attention due to the rapidly growing digital music market. A number of specific functionalities, such as the exact look-up of melodies from an existing database or classification of music into well-known genres, can now be executed on a large scale, and are even available as consumer services from several well-known social media and mobile phone companies. In spite of these advances, robust representations of music that allow efficient execution of tasks, seemingly simple to many humans, such as identifying a cover song, that is, a new recording of an old song, or breaking up a song into its constituent structural parts, are yet to be invented. Motivated by this challenge, we introduce a method for determining approximate structural sequence representations purely from the chromagram of songs without adopting any prior knowledge from musicology. Each song is represented by a sequence of states of an underlying Hidden Markov Model, where each state may represent a property of a song, such as the harmony, chord, or melody. Then, by adapting ii different versions of the sequence alignment algorithms, the method is applied to the problems of: (i) Exploring and identifying repeating parts in a song; (ii) identifying cover songs; and (iii) extracting similar sections from two different songs. The proposed method has a number of advantages, including elimination of the unreliable beat estimation step and the capability to match parts of songs. The invariance of key transpositions among cover songs is achieved by cyclically rotating the chromatic domain of a chromagram. Our data-driven method is shown to be robust against the reordering, insertion, and deletion of sections of songs, and its performance is superior to that of other known methods for the cover song identification task. iii The dissertation of Chih-Li Wang is approved. Lieven Vandenberghe Ying Nian Wu Kung Yao Vwani Roychowdhury, Committee Chair University of California, Los Angeles 2013 iv Table of Contents Biography ..................................................................................................................................... viii 1 Introduction ............................................................................................................................. 1 2 Features .................................................................................................................................... 8 2.1 Root-mean-square Energy (RMS) .................................................................................... 9 2.2 Zero–crossing Rate ........................................................................................................... 9 2.3 Mel-frequency Cepstral Coefficient (MFCC) ................................................................ 10 2.4 Fluctuation Pattern ......................................................................................................... 11 2.5 Unwrapped/Wrapped Chromagram ............................................................................... 16 2.5.1 Chromagram Estimation ......................................................................................... 16 2.5.2 Key Transposition ................................................................................................... 18 2.6 Key Strength ................................................................................................................... 20 3 Music Feature Interpretation ................................................................................................. 22 3.1 Self-similarity Matrix ..................................................................................................... 22 3.2 Interpretation .................................................................................................................. 22 3.2.1 Key Variation .......................................................................................................... 24 3.2.2 Remove/Reduce Vocal Signals ............................................................................... 26 3.2.3 Wahwah Effect........................................................................................................ 27 3.2.4 Tempo Variation ..................................................................................................... 28 3.3 Music Feature Analysis by Clustering ........................................................................... 32 v 3.3.1 K-means .................................................................................................................. 32 3.3.2 Fuzzy C-mean ......................................................................................................... 32 3.3.3 Self-organizing Map (SOM) ................................................................................... 33 3.3.4 Results ..................................................................................................................... 35 4 Applications ........................................................................................................................... 79 4.1 Structure of a Song ......................................................................................................... 79 4.1.1 Hidden Markov Model Estimation ......................................................................... 80 4.1.2 State-Splitting State Hidden Markov Model (SSS HMM) ..................................... 81 4.1.3 State Sequence Alignment ...................................................................................... 85 4.1.4 Score for the Local Alignment ................................................................................ 88 4.1.5 Method of Music Summarization ........................................................................... 88 4.1.6 Cover Song Identification ....................................................................................... 89 4.2 Extraction of Similar Melodies from Two Different Songs ........................................... 91 4.3 Results ............................................................................................................................ 92 4.3.1 Data-Driven Chord Sequence Representation ........................................................ 92 4.3.2 Music Summary ...................................................................................................... 94 4.3.3 Cover Song Identification ..................................................................................... 102 4.3.4 Tolerance of Tempo Variations ............................................................................ 104 4.3.5 Extraction of Similar Melodies from Two Songs ................................................. 105 5 Conclusions ......................................................................................................................... 107 vi 6 Appendix ............................................................................................................................. 108 7 References ........................................................................................................................... 115 vii Biography Chih-Li Wang received his B.S. degree in Electrical Engineering from the National Tsing Hua University, Taiwan, and M.S. degree in Electrical Engineering from the National Taiwan University, Taiwan. His research focuses on signal processing for music analysis and music information retrieval. Publications: C.L. Wang, Q. Zhong, S.Y. Wang, and V. Roychowdhury, “Cover Song Identification by Sequence Alignment Algorithms,” presented at the International Conference on Signal and Information Processing, 2010, pp. 187-191. C.L. Wang, Q. Zhong, S.Y. Wang, and V. Roychowdhury, “Data-Driven Chord-Sequence Representations of Songs and Applications” presented at the International Association of Science and Technology for Development (IASTED) on Signal and Image Processing, 2011. viii 1 Introduction The proliferation of the Internet has facilitated the acquisition of digital music. However, digital recordings, especially older recordings, are not always tagged with appropriate metadata. Therefore, content-based automatic music analysis, such as organizing songs into genres, music summarization, locating a specific song, or finding a cover version among thousands of songs, has attracted considerable attention in recent years. Various groups of researchers have focused on content-based automatic music analysis. Normally, automatic music analysis comprises two steps. The first is numerical representation of a musical property, such as timbre, pitch, and