Popular Music Analysis: Chorus and Emotion Detection
Chia-Hung Yeh 1, Yu-Dun Lin 1, Ming-Sui Lee 2 and Wen-Yu Tseng 1
1 Department of Electrical Engineering, National Sun Yat-sen University, 804 Taiwan
E-mail: [email protected], Tel: +886-7-5252000 Ext. 4112
2 Department of Computer Science and Information Engineering, National Taiwan University, 106 Taiwan
E-mail: [email protected], Tel: +886-2-33664888 Ext. 520

Abstract—In this paper, a chorus detection and an emotion detection algorithm for popular music are proposed. First, a popular song is decomposed into chorus and verse segments based on its color representation and MFCCs (Mel-frequency cepstral coefficients). Four features, including intensity, tempo and rhythm regularity, are extracted from these structured segments for emotion detection. The emotion of a song is classified into one of four classes, happy, angry, depressed and relaxed, via a back-propagation neural network classifier. Experimental results show that the average recall and precision of the proposed chorus detection are approximately 95% and 84%, respectively, and the average precision rate of emotion detection is 88.3% for a test database consisting of 210 popular songs.

Keywords: Chorus, MFCC, music emotion, neural network

I. INTRODUCTION

Multimedia information has evolved over the last decade, and digital popular music now takes a significant place in our daily lives. Knowing how to manage a large number of music files has become increasingly important because of their mounting quantity and availability. Therefore, a great number of methods for music database management and retrieval have been investigated [1]-[8]. Music retrieval systems find specific songs based on users' demands. In general, they can be classified into three categories: content-based, text-based, and emotion-based. Content-based retrieval systems enable users to retrieve a musical piece by humming, singing, whistling or playing a fragment of it; the result is an output list containing the best-matching (e.g., top 5) songs. Text-based music retrieval systems classify songs by metadata such as album, artist, alphabet, title and genre. However, manual classification is time-consuming and not suitable for some cases. For example, exciting or happy music with fast tempo and steady rhythms is always in demand for a party, and cheerful music might be useful for soothing a depressed person. There are also cases in which users feel like listening to a series of songs, rather than specific ones, as long as these songs match their mood at that time. To be able to retrieve music with the tag "happy" from a database, it is reasonable and necessary to categorize songs by emotion, so that users do not have to waste time searching through files to find suitable songs. In this paper, we aim at establishing a music retrieval system which classifies songs by emotion so as to provide the user with proper music for certain scenarios.

The rest of this paper is organized as follows. Section II reviews the background of music emotion detection. Section III describes the proposed scheme, including chorus detection and emotion detection. Experimental results are shown in Sec. IV to evaluate the performance of the proposed method. Finally, concluding remarks and recommendations for future work are given in Sec. V.

II. BACKGROUND REVIEW

Friedrich Nietzsche, a famous German philosopher, once said that "Without music, life would be a mistake." Music does represent a significant part of our daily life; moreover, music can both express emotion and produce emotion in listeners. Therefore, we want to figure out what kind of emotion a song, especially a popular song, conveys. Most popular music has a simple musical structure that includes the repetition of a chorus. Stein [9] defines the chorus as a section of a song that is heard several times, repeating the same lyrics. Choruses are the most noticeable and easily remembered parts of a song. Once the repeated part, i.e., the chorus, is detected, we obtain a clear picture of the chorus/verse structure of the song. In general, the verse sections of a song lay out its theme, while the chorus sections allow one to remember and sing along by repeating its motifs.

The main objective of this research is to analyze the verse-chorus structure of popular songs so as to dig out their emotion, for quickly browsing music databases and other purposes. The problem of structure analysis of a song has been addressed recently [10]-[14]. Most studies employ a similarity matrix to analyze the structure of a song; in this research, we examine the problem via a color representation.

As mentioned in [15], "Music arouses strong emotions in people, and they want to know why." One of the major difficulties in music emotion classification is that emotions are hard to express. Thayer's model [16] is commonly used to categorize emotions; it defines emotion on a two-dimensional model. As for the taxonomy of music emotion, to simplify the emotion detection procedure, we classify emotions into four classes, which are described in detail in Sec. IV.
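To make the four-class taxonomy concrete, the sketch below maps a point on a two-dimensional arousal-valence plane (the axes used by Thayer's model) to one of the four classes used in this paper. The quadrant-to-class assignment follows the conventional reading of the model and is an assumption here; the paper itself defers the exact taxonomy to Sec. IV.

```python
def classify_quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) point to one of the four emotion classes.
    Values are assumed to be centered at 0; the quadrant assignment is the
    conventional reading of Thayer's two-dimensional model, not taken
    verbatim from the paper."""
    if arousal >= 0:
        return "happy" if valence >= 0 else "angry"
    return "relaxed" if valence >= 0 else "depressed"
```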
III. PROPOSED METHOD

A. Chorus Detection

A framework for extracting the chorus from popular music based on structural content analysis is proposed. Fig. 1 (a) shows the flowchart of chorus detection. For each frame of audio data, we extract a feature vector by calculating the energy of three features (intensity, high band, and low band in the frequency domain). We map the feature vectors to the R, G, B color space to obtain a music color map that represents the structure of the song. A color image adaptive clustering segmentation algorithm [17] is employed to cluster regions with similar color distributions. Then, MFCCs are extracted from each region obtained from the color map as the feature for classifying verse and chorus sections of a popular song. Finally, some post-processing steps exclude fragile regions in order to enhance the detection result. Experimental results show the efficiency of the proposed system for chorus detection. The following are the details of the proposed chorus detection algorithm.

[Fig. 1 The proposed chorus detection method: (a) flowchart of chorus detection; (b) color map of the song "Let It Go"; (c) structure information of the song "Let It Go" obtained by employing RPCL; (d) illustration of the MFCCs of combined chorus sections; (e) result of chorus detection after post-processing.]

Color Map Generation

Color information is utilized to represent music structure. Three audio signal features are extracted from the music and converted into the RGB color space so as to generate a color map. The color map not only helps us find repeating patterns but also gives us an overall impression of a song. First, the summation of energy of a song, F1, is calculated in (1). The summations of energy in the low and high frequency bands are also calculated as F2 and F3. Based on numerous experiments, the ranges of the low and high frequency bands are set to 1~2048 Hz and 2049~22050 Hz, respectively, as demonstrated in the following equations:

F_1 = \sum_{i=1}^{44100} e^2(i),
F_2 = \sum_{h=1}^{2048} \left| \mathrm{fft}(f_i) \right| \times W(h),        (1)
F_3 = \sum_{h=2049}^{22050} \left| \mathrm{fft}(f_i) \right| \times W(h),

where f_i is the i-th frame of audio data and W(h) is a window function at h Hz. We convert the values of F1, F2 and F3 to R, G, and B, respectively, by a mapping function M as seen in (2); the simplest mapping function is a normalization operation:

(R, G, B) = M(F_1, F_2, F_3).        (2)
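As a concrete illustration of (1) and (2), the sketch below computes the three band-energy features for each audio frame and min-max normalizes them into an RGB triple. The 1-second frame length at 44.1 kHz (consistent with the 44100-sample sum in (1) and the 22050 Hz upper band edge), the rectangular window W(h) = 1, and the min-max normalization standing in for M are illustrative assumptions; the paper does not fix these choices beyond the band ranges.

```python
import numpy as np

def frame_color_map(frames, sr=44100):
    """Sketch of the color-map features in (1)-(2), assuming 1-second
    frames at 44.1 kHz and a rectangular window W(h) = 1. A simple
    min-max normalization stands in for the mapping M."""
    feats = []
    for f in frames:                                   # f: 1-D array of samples
        spec = np.abs(np.fft.rfft(f))                  # magnitude spectrum
        hz = np.fft.rfftfreq(len(f), d=1.0 / sr)       # bin frequencies in Hz
        F1 = np.sum(f.astype(float) ** 2)              # total energy, eq. (1)
        F2 = np.sum(spec[(hz >= 1) & (hz <= 2048)])    # low-band energy
        F3 = np.sum(spec[(hz >= 2049) & (hz <= 22050)])  # high-band energy
        feats.append((F1, F2, F3))
    feats = np.array(feats)
    # mapping M: normalize each feature to [0, 1] -> (R, G, B), eq. (2)
    rgb = (feats - feats.min(axis=0)) / (np.ptp(feats, axis=0) + 1e-12)
    return rgb
```

Each row of the returned array is one frame's (R, G, B) value; stacking the rows over time gives a color map of the kind shown in Fig. 1 (b).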
Chorus and Verse Designation

For each of the combined segments, its MFCCs are calculated. MFCCs provide a simple way to represent band-frequency energy, creating a simplified spectrum with 13 coefficients as follows:

C_s(n) = \sum_{l=1}^{L} Y(l) \cos\!\left( \frac{\pi s}{L} (l - 0.5) \right),        (3)

where C_s(n) is the s-th coefficient of the n-th frame, Y(l) is the l-th filter bank output, which is representative of the critical bands in the human auditory system, and L is the number of filter banks. C_0(n) is the energy band, and the other twelve coefficients C_s(n), s = 1, 2, ..., 12, are generally adopted in audio signal analysis. Fig. 1 (d) demonstrates the MFCCs of the chorus sections. Then, (4) measures the similarity [19] of each cluster as follows:

S_w(i, j) = \frac{1}{w} \sum_{k=0}^{w-1} \left( v_{i+k} \cdot v_{j+k} \right),        (4)

where w is the size of the cluster. To obtain a high similarity value, the coefficients within a cluster should be similar. In this step, the cluster with the greatest similarity value is regarded as the chorus, following the definition of the chorus [9].
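To make (3) and (4) concrete, the sketch below computes the 13 coefficients of one frame from its filter-bank outputs and evaluates the windowed similarity between two frame positions. The interpretation of v as the sequence of per-frame MFCC vectors, and the absence of a log compression before the DCT, are assumptions; the recoverable text does not pin either down.

```python
import numpy as np

def mfcc_frame(Y, n_coeffs=13):
    """Eq. (3) applied to one frame: a DCT over the L filter-bank outputs
    Y(1..L). Standard MFCC pipelines log-compress the filter-bank energies
    first; eq. (3) is followed literally here."""
    Y = np.asarray(Y, dtype=float)
    L = len(Y)
    l = np.arange(1, L + 1)                        # filter-bank index l = 1..L
    return np.array([np.sum(Y * np.cos(np.pi * s * (l - 0.5) / L))
                     for s in range(n_coeffs)])    # C_0(n) .. C_12(n)

def cluster_similarity(v, i, j, w):
    """Eq. (4): S_w(i, j), the average dot product between two windows of
    w consecutive feature vectors starting at frames i and j. v is assumed
    to be the sequence of per-frame MFCC vectors (not defined explicitly
    in the recoverable text)."""
    return sum(np.dot(v[i + k], v[j + k]) for k in range(w)) / w
```

Under this reading, the combined segment whose cluster yields the largest S_w value is the one designated as the chorus, as described above.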