Automatic Emotion Prediction of Song Excerpts: Index Construction, Algorithm Design, and Empirical Comparison
Total Page:16
File Type:pdf, Size:1020Kb
CORE Metadata, citation and similar papers at core.ac.uk Provided by CiteSeerXsankarr 18/4/08 15:46 NNMR_A_292950 (XML) Journal of New Music Research 2007, Vol. 36, No. 4, pp. 283–301 Automatic Emotion Prediction of Song Excerpts: Index Construction, Algorithm Design, and Empirical Comparison Karl F. MacDorman, Stuart Ough and Chin-Chang Ho School of Informatics, Indiana University, USA Abstract enough to find and organize desired pieces of music. Even genre offers limited insight into the style of music, Music’s allure lies in its power to stir the emotions. But the because one piece may encompass several genres. These relation between the physical properties of an acoustic limitations indicate a need for a more meaningful, signal and its emotional impact remains an open area of natural way to search and organize a music collection. research. This paper reports the results and possible Emotion has the potential to provide an important implications of a pilot study and survey used to construct means of music classification and selection allowing an emotion index for subjective ratings of music. The listeners to appreciate more fully their music libraries. dimensions of pleasure and arousal exhibit high reliability. There are now several commercial software products for Eighty-five participants’ ratings of 100 song excerpts are searching and organizing music based on emotion. used to benchmark the predictive accuracy of several MoodLogic (2001) allowed users to create play lists from combinations of acoustic preprocessing and statistical their digital music libraries by sorting their music based on learning algorithms. The Euclidean distance between genre, tempo, and emotion. The project began with over acoustic representations of an excerpt and corresponding 50,000 listeners submitting song profiles. MoodLogic emotion-weighted visualizations of a corpus of music analysed its master song library to ‘‘fingerprint’’ new excerpts provided predictor variables for linear regression music profiles and associate them with other songs in the that resulted in the highest predictive accuracy of mean library. The software explored a listener’s music library, pleasure and arousal values of test songs. This new attempting to match its songs with over three million songs technique also generated visualizations that show how in its database. Although MoodLogic has been discon- rhythm, pitch, and loudness interrelate to influence our tinued, the technology is used in AMG’s product Tapestry. appreciation of the emotional content of music. Other commercial applications include All Media Guide (n.d.), which allows users to explore their music library through 181 emotions and Pandora.com, which 1. Introduction uses trained experts to classify songs based on attributes including melody, harmony, rhythm, instrumentation, The advent of digital formats has given listeners greater arrangement, and lyrics. Pandora (n.d.) allows listeners to access to music. Vast music libraries easily fit on create ‘‘stations’’ consisting of similar music based on an computer hard drives, are accessed through the Internet, initial artist or song selection. Stations adapt as the listener and accompany people in their MP3 players. Digital rates songs ‘‘thumbs up’’ or ‘‘thumbs down.’’ A profile of jukebox applications, such as Winamp, Windows Media the listener’s music preferences emerge, allowing Pandora Player, and iTunes offer a means of cataloguing music to propose music that the listener is more likely to enjoy. collections, referencing common data such as artist, title, While not an automatic process of classification, Pandora album, genre, song length, and publication year. But as offers listeners song groupings based on both their own libraries grow, this kind of information is no longer pleasure ratings and expert feature examination. Correspondence: Karl F. MacDorman, Indiana University School of Informatics, 535 West Michigan Street, Indianapolis, IN 46202, USA. DOI: 10.1080/09298210801927846 Ó 2007 Taylor & Francis 284 Karl F. MacDorman et al. As technology and methodologies advance, they open consuming and challenging, especially for those who are up new ways of characterizing music and are likely to offer not musically trained. Automatic classification based on useful alternatives to today’s time-consuming categoriza- acoustic properties is one method of assisting the listener. tion options. This paper attempts to study the classifica- The European Research and Innovation Division of tion of songs through the automatic prediction of human Thomson Multimedia worked with musicologists to emotional response. The paper contributes to psychology define parameters that characterize a piece of music by refining an index to measure pleasure and arousal (Thomson Multimedia, 2002). Recognizing that a song responses to music. It contributes to music visualization can include a wide range of styles, Thomson’s formula by developing a representation of pleasure and arousal evaluates it at approximately forty points along its with respect to the perceived acoustic properties of music, timeline. The digital signal processing system combines namely, bark bands (pitch), frequency of reaching a given this information to create a three-dimensional fingerprint sone (loudness) value, modulation frequency, and rhythm. of the song. The k-means algorithm was used to form It contributes to pattern recognition by designing and clusters based on similarities; however, the algorithm testing an algorithm to predict accurately pleasure and stopped short of assigning labels to the clusters. arousal responses to music. Sony Corporation has also explored the automatic extraction of acoustic properties through the develop- ment of the Extractor Discovery System (EDS, Pachet & 1.1 Organization of the paper Zils, 2004). This program uses signal processing and Section 2 reviews automatic methods of music classifica- genetic programming to examine such acoustic dimen- tion, providing a benchmark against which to evaluate the sions as frequency, amplitude, and time. These dimen- performance of the algorithms proposed in Section 5. sions are translated into descriptors that correlate to Section 3 reports a pilot study on the application to music human-perceived qualities of music and are used in the of the pleasure, arousal, and dominance model of grouping process. MusicIP has also created software that Mehrabian and Russell (1974). This results in the uses acoustic ‘‘fingerprints’’ to sort music by similarities. development of a new pleasure and arousal index. In MusicIP includes an interface to enable users to create a Section 4, the new index is used in a survey to collect play list of similar songs from their music library based sufficient data from human listeners to evaluate ade- on a seed song instead of attempting to assign meaning quately the predictive accuracy of the algorithms pre- to musical similarities. sented in Section 5. An emotion-weighted visualization of Another common method for classifying music is acoustic representations is developed. Section 5 introduces genre; however, accurate genre classification may require and analyses the algorithms. Their potential applications some musical training. Given the size of music libraries are discussed in Section 6. and the fact that some songs belong to two or more genres, sorting through a typical music library is not easy. Pampalk (2001) created a visualization method 2. Methods of automatic music classification called Islands of Music to represent a corpus of music visually. The method represented similarities between The need to sort, compare, and classify songs has grown songs in terms of their psychoacoustic properties. The with the size of listeners’ digital music libraries, because Fourier transform was used to convert pulse code larger libraries require more time to organize them. modulation data to bark frequency bands based on a Although there are some services to assist with managing model of the inner ear. The system also extracted a library (e.g. MoodLogic, All Music Guide, Pandora), they rhythmic patterns and fluctuation strengths. Principal are also labour-intensive in the sense that they are based on component analysis (PCA) reduced the dimensions of the human ratings of each song in their corpus. However, music to 80, and then Kohonen’s self-organizing maps research into automated classification of music based on clustered the music. The resulting clusters form ‘‘islands’’ measures of acoustic similarity, genre, and emotion has led on a two-dimensional map. to the development of increasingly powerful software (Pampalk, 2001; Pampalk et al., 2002; Tzanetakis & Cook, 2.2 Grouping by genre 2002; Yang, 2003; Neve & Orio, 2004; Pachet & Zils, 2004; Pohle et al., 2005). This section reviews different ways of Scaringella et al. (2006) survey automatic genre classifi- grouping music automatically, and the computational cation by expert systems and supervised and unsuper- methods used to achieve each kind of grouping. vised learning. In an early paper in this area, Tzanetakis and Cook (2002) investigate genre classification using statistical pattern recognition on training and sample 2.1 Grouping by acoustic similarity music collections. They focused on three features of One of the most natural means of grouping music is to audio they felt characterized a genre: timbre, pitch, and listen for similar sounding passages; however, this is time rhythm. Mel frequency cepstral coefficients (MFCC), Automatic emotion prediction of song excerpts 285 which are