Identifying the Emotional Polarity of Song Lyrics through Natural Language Processing

Ashley M. Oudenne
Swarthmore College
Swarthmore, PA
[email protected]

Sarah E. Chasins
Swarthmore College
Swarthmore, PA
[email protected]

Abstract

The analysis of song lyrics can provide important meta-information about a song, such as its genre, emotion, and theme; however, obtaining this information can be difficult. Classifying a song based solely on its lyrics can be challenging for a variety of reasons, and as yet no single algorithm results in highly accurate classifications.

In this paper, we examine why the task of classification based solely on lyrics is so challenging by predicting the emotional polarity of a song based on the song's lyrics. We use natural language processing to classify a song as either positive or negative according to four sets of frequency-based or machine learning algorithms which we develop and test on a dataset of 420 songs. We determine which factors complicate classification, such as generic subjectivity lexicons and non-subjective lyrics, and suggest strategies for overcoming these difficulties.

1 Introduction

Lyrics provide high-level information about a song. Humans gather meta-information such as the genre, sentiment, and theme of a song simply by reading its lyrics. However, automating this task is very difficult.

In automatic music classification systems, research typically focuses on classification by audio features or collaborative filtering. However, these methods have numerous drawbacks. Audio feature processing relies on having the actual recording of a song, which may be difficult to obtain under copyright laws. Collaborative filtering can suffer from the long-tail problem: obscure songs are not likely to have adequate representation on social networking sites, which makes similarity analysis difficult.

Recently, researchers have turned to analyzing lyrics in order to extract information about a song. This information can be combined with audio features or collaborative filtering data or used alone in a music classification system.

Song lyrics are a good source of data because many lyrics are freely available in a semi-structured format on the Internet. However, they are difficult to analyze for sentiment and, as a result, many systems that rely on lyrical analysis alone for classification yield poor results.

In this paper, we will explore song lyric analysis by focusing on the simple yet non-trivial task of categorizing music by emotional polarity. We classify songs as positive or negative depending upon whether they express a mainly positive or negative overall emotion. We test different algorithms that analyze word frequencies, word presence, and cosine similarity to classify a dataset of 420 unique lyrics. We also evaluate the performance of generic versus corpus-specific subjectivity lexicons in order to determine how lyrics-based sentiment analysis differs from traditional sentiment analysis.

The rest of this paper is laid out as follows: in Section 2 we present related work in sentiment analysis and how it has been extended to song lyrics. In Section 3, we introduce our data set and the methods we use to classify the data. In Section 4 we present our results, and we discuss different ways to classify lyrics by emotion in Section 5. Finally, in Section 6, we conclude by discussing future work in song lyric analysis.
2 Related Work

2.1 Sentiment Analysis

Identifying the primary sense of a document in a corpus is very useful in natural language processing. Knowing whether a document's sentiment is positive or negative overall can aid in classification tasks. To that end, Janyce Wiebe created a lexicon of subjectivity clues, such as repressive and celebrate, that are good indicators of positive or negative sentiment (Wilson et al., 2005). These words are tagged with their part of speech, polarity (positive, negative, neutral, or both), and the strength of their subjectivity (strong or weak). Many researchers make use of this lexicon for sentiment analysis, as do we.

Subjectivity lexicons like the one described above can be used to extract the overall sentiment of a document. (O'Connor et al., 2010) use sentiment analysis to classify Twitter tweets about the economy and the presidential election. They classify tweets as either positive or negative by counting the number of positive or negative subjectivity clues from Wiebe's subjectivity lexicon that occur in the tweet. They then use this classification to predict consumer confidence poll data. They note that Wiebe's subjectivity clues lead to many instances of falsely-detected sentiment, because the subjectivity clues are used differently in tweets than in the corpus that generated the clues. However, they were able to successfully predict polling data with tweet sentiment, in spite of the fact that they took no negation of sentiment words into account (e.g. "I do not think that the economy is good"). Based on their results, we have included sentiment negation in some of our algorithms.

Sentiment classification has also been successfully used to classify movie reviews. (Pang et al., 2002) use machine learning techniques to classify movie reviews as either positive or negative. They conclude that subjectivity lexicons must be corpus-specific. In an experiment comparing human-suggested sentiment word lists to a corpus-specific most-frequent word list, they achieved higher accuracy using corpus-specific words. This suggests that researchers should not take a "one-size-fits-all" approach to subjectivity lexicons like Wiebe's. A lexicon specific to the corpus and the task will perform better than a generic lexicon.

Additionally, they use machine learning algorithms such as Naive Bayes to perform sentiment classification. They use a bag-of-features framework where the features can be bigrams, unigrams, positional data, and part-of-speech data. Unigram features perform the best, and their performance increases as they are combined with other features. We also experiment with the Naive Bayes algorithm in our work.
2.2 Sentiment Analysis with Lyrics

Lyrical analysis for classification is a relatively new area of research. Due largely to sites such as Lyrics.com and elyrics.net, millions of lyrics are now available on the Internet to researchers in semi-structured formats that are amenable to web-scraping. Lyrics are typically analyzed as part of a music classification task, where songs are classified by genre, mood, or emotion (Cho and Lee, 2006; Hu and Downie, 2010).

(McKay et al., 2010) use lyrical features combined with audio, cultural, and symbolic features to classify music by genre. They find that lyrics alone are poor indicators of a song's genre, but that when lyrical analysis is combined with other features, their system is able to achieve high genre classification accuracy. However, some of the songs in their data set were instrumental, so there may not have been enough data for training. In our experiments, we use a larger data set that includes lyrics for every song.

Compared to sentiment analysis in movie reviews, we expect sentiment analysis to be more difficult for lyrics. Movie reviewers typically use opinion words (e.g. "I liked this movie because...", "This scene was horrible") in their reviews, which can be extracted with relative ease with the proper subjectivity lexicon. Lyrics, however, are more challenging. There are three main difficulties with lyric-based sentiment analysis:

1. Songs can contain a series of negative lyrics but end on an uplifting, positive note, or vice versa. Love songs in particular can be misleading because the lyrics often express how happy the singer was while in love, and then at the end of the song the singer expresses his sadness over a sudden breakup.

2. Songs may not contain any of the subjectivity clues in a general subjectivity lexicon, yet express positive or negative emotions. For example, the song "It's Still Rock And Roll To Me" by Billy Joel includes the following stanza:

   What's the matter with the clothes I'm wearing?
   Can't you tell that your tie's too wide?
   Maybe I should buy some old tab collars?
   Welcome back to the age of jive.

   It's not immediately apparent which of the words in this stanza would have positive connotations; yet, taken together, the stanza expresses a positive emotion. This occurs in both positive and negative songs, and it can be difficult to separate the overall emotion of a song from the sentiments expressed by each line of its lyrics.

3. Songs can express positive emotions about negative things, and vice versa. Rap songs in particular suffer from this problem: their lyrics often express positive emotions about negative events like shootings and robbery. This adds an additional level of confusion to a classification system.

We address these problems in various ways in our different algorithms for sentiment analysis.

3 Experimental Methodology

3.1 The Dataset

We use a dataset of 420 unique song lyrics, divided equally between positive and negative emotional polarities (hereafter referred to as positive lyrics and negative lyrics).

3.1.1 Gathering the Data

In order to gather a dataset of unique song lyrics, we utilized Jamrock Entertainment's list of top 100 songs by year (http://www.jamrockentertainment.com/billboard-music-top-100-songs-listed-by-year.html). We mined the top-100 song lists for the years 1980-2009, and then searched for lyrics matching those songs from Lyrics.com (http://www.lyrics.com). We mined the lyrics and cleaned them of XML tags and other extraneous text. This created an initial data set of 1,652 lyrics.

3.1.2 Classifying the Data

To gather the ground truth emotional polarity labels for the song lyrics, we used Last.fm's developer API (http://www.last.fm/api). For each song in our initial dataset, we queried the Last.fm API and retrieved the user-specified top tags (semantic text descriptors) for that song. We then searched through the top tags to see if any of them occurred in Jan Wiebe's subjectivity lexicon. We kept a count of the number of positive and negative subjectivity clues that occurred in the top tags of the song, and then labeled the song as the emotion with the greater number of subjectivity clues in the top tags. We manually annotated the few songs that did not appear in the Last.fm database. This resulted in an intermediate dataset of 1,515 lyrics.

3.1.3 Equalizing the Data

It is necessary to equalize the dataset so that there is an equal number of positive and negative lyrics. This prevents classification systems from learning biases inherent in the dataset. When we attempted to equalize our dataset, we realized that the number of positive lyrics far surpasses the number of negative lyrics. This is to be expected, because we relied on top-100 songs to create our dataset, and top-100 songs are more likely to express positive emotions. Consequently, our equalized data set consists of 420 songs, 210 of which are labeled 'positive' and 210 of which are labeled 'negative'. These labels were manually checked by a human annotator.
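The tag-based labeling step in Section 3.1.2 amounts to counting lexicon matches among a song's top tags. The sketch below illustrates that idea under our own assumptions: the tag list would come from the Last.fm API (the API call is not shown), the two clue sets stand in for the positive and negative entries of Wiebe's lexicon, and the tie-handling fallback to manual annotation is our guess, since the paper does not say how ties were resolved.

```python
# Minimal sketch of the tag-based labeling step (Section 3.1.2).
# `top_tags` would come from the Last.fm API; `pos_clues` and `neg_clues`
# are assumed to be sets of positive/negative clues from Wiebe's lexicon.

def label_from_tags(top_tags, pos_clues, neg_clues):
    """Return 'positive', 'negative', or None if the tags are uninformative."""
    pos = sum(1 for tag in top_tags if tag.lower() in pos_clues)
    neg = sum(1 for tag in top_tags if tag.lower() in neg_clues)
    if pos == neg:
        return None          # tie or no matches: fall back to manual annotation
    return 'positive' if pos > neg else 'negative'

# Example usage with made-up tags and a toy lexicon:
pos_clues = {"happy", "fun", "love"}
neg_clues = {"sad", "melancholy", "heartbreak"}
print(label_from_tags(["sad", "heartbreak", "80s"], pos_clues, neg_clues))  # negative
```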
3.2 Word Lists

We test a variety of algorithms on our dataset. The first and simplest algorithm is based on straightforward word counts. A classification system built on this approach requires very little information, nothing more than two lists of words. The system begins by segregating the training set into positive songs and negative songs. Next, it creates a list of the words that appear in the positive set, and a list of the words that appear in the negative set. The lists are ordered by word frequencies in their respective classes. A maximum list size n is passed to the algorithm, which is selected by the researchers. All positive words that are not among the n most frequent positive words are discarded, as are all negative words that do not appear among the n most frequent negative words. From these two lists of size n, the algorithm removes any word that appears in both lists. (Note that even if a word is in first place in the positive list and nth place in the negative list, the word is removed from both lists.) Thus, no words are shared. The two lists that remain at this stage are the only resources necessary for classification.

Each song is classified, in this method, by keeping separate counts of the number of times a word from each list appears in the lyrics. The algorithm loops through the song's words. Each time it encounters a word from one of the lists, it increments the appropriate count, even if the word has already appeared elsewhere in the lyrics. The song's sentiment is predicted simply by comparing these two counts. If the positive word count is higher, the system predicts that the song is positive.
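To make the Word List procedure concrete, here is a minimal sketch in Python. It assumes songs are already tokenized into lists of strings; the function names and the tie-breaking behavior (ties default to negative) are ours, not the authors'.

```python
from collections import Counter

def build_word_lists(pos_songs, neg_songs, n):
    """Build disjoint positive/negative word lists from tokenized training songs."""
    pos_freq = Counter(w for song in pos_songs for w in song)
    neg_freq = Counter(w for song in neg_songs for w in song)
    top_pos = {w for w, _ in pos_freq.most_common(n)}
    top_neg = {w for w, _ in neg_freq.most_common(n)}
    shared = top_pos & top_neg              # drop any word that appears in both lists
    return top_pos - shared, top_neg - shared

def classify(song, pos_list, neg_list):
    """Predict a label by counting list hits; repeated words count every time."""
    pos = sum(1 for w in song if w in pos_list)
    neg = sum(1 for w in song if w in neg_list)
    # Ties fall to 'negative' here; the paper does not specify tie handling.
    return 'positive' if pos > neg else 'negative'
```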
3.3 Word Dictionaries

The second variety of classification algorithms relies on retaining more information. It uses knowledge of, at the very least, every word that appears in the training set and in which class of lyrics it appears. One algorithm in this category requires no more information than that. The Present In One algorithm is provided with a dictionary of all words that appear in positive lyrics, and a second dictionary of all the words that appear in negative lyrics. To classify a song, it loops through the words, checking whether each word is in both dictionaries, neither dictionary, or in only a single dictionary. The first two cases are ignored. If, however, the word appears in only one dictionary, the count for that sentiment is incremented. That is, if the word appears in the negative dictionary but not the positive dictionary, a point is added to the negative score. If, in the end, the negative score is higher than the positive score, the song is classified as negative.

The next algorithm in this class still requires knowledge of every word in the training set, but also needs a record of the frequency in the training set. More specifically, it needs to know the percentage of times a word appears in each list. The Prevalent In One algorithm also keeps positive and negative scores for each song. A threshold t is passed into this algorithm. A word in the lyrics increments the positive score if its occurrences in the positive training data divided by its occurrences in the negative training data exceed t, that is, if count(word|positive) / count(word|negative) > t. After all words have been examined, the song is labeled with the sentiment of the higher score.

The final member of this grouping was a traditional approach to classification problems, a simple Naive Bayes model. To calculate P(sentiment|lyrics), the algorithm calculates P(lyrics|sentiment) * P(sentiment) / P(lyrics). Since there was no need to compare across multiple lyrics, but only to compare the probabilities of different sentiments for the same lyrics, P(lyrics) would have no effect on the comparisons under consideration. Thus, P(lyrics|sentiment) * P(sentiment) became the essential calculation. The lyrics are represented by the series of n words that constitutes the lyrics, and the algorithm makes the naive assumption that the probability of seeing any given word depends solely on the sentiment, and not at all on the presence of other words in the lyrics. That is, it assumes that P(w_k|sentiment) is the same value as P(w_k|sentiment, w_1, ..., w_{k-1}). The final probability is therefore:

P(sentiment|lyrics) = Π_{i=1}^{n} P(w_i|sentiment)

The probability of w_i conditional on sentiment is calculated using dictionaries of words and their frequencies in the positive and negative training data. If P(negative|lyrics) is greater than P(positive|lyrics), the song is classified as negative.
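Because the product over P(w_i|sentiment) underflows quickly for long lyrics, an implementation of the Naive Bayes model above would typically sum log probabilities instead. The sketch below does so and adds add-one smoothing for words unseen in a class; both details are our assumptions, since the paper does not specify them.

```python
import math
from collections import Counter

def train_nb(pos_songs, neg_songs):
    """Collect per-class word frequencies and class priors from tokenized songs."""
    counts = {'positive': Counter(w for s in pos_songs for w in s),
              'negative': Counter(w for s in neg_songs for w in s)}
    total_songs = len(pos_songs) + len(neg_songs)
    priors = {'positive': len(pos_songs) / total_songs,
              'negative': len(neg_songs) / total_songs}
    return counts, priors

def classify_nb(song, counts, priors):
    """Return the sentiment with the higher log P(sentiment) + sum of log P(w|sentiment)."""
    vocab = set(counts['positive']) | set(counts['negative'])
    scores = {}
    for label in ('positive', 'negative'):
        total = sum(counts[label].values())
        score = math.log(priors[label])
        for w in song:
            # Add-one smoothing (our assumption; not described in the paper).
            score += math.log((counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```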
3.4 Cosine Similarity

We also attempt to classify song lyrics by clustering. In this algorithm, we calculate the inverse document frequency (idf) of a training set of lyrics. For each word in the training set, we define the idf of that term as idf_i = log(|D| / |{d : t_i ∈ d}|), where D is the training set of song lyrics, d is a song lyric document in the training corpus, and i is the index of the current term t in document d. Idf measures the importance of a term: terms that occur in many song lyrics are down-weighted, and terms that are rare are up-weighted.

For every word in each document in the entire corpus (both training and test), we then compute the term frequency of the word. The term frequency (tf) is defined as tf_{i,j} = n_{i,j} / Σ_k n_{k,j}, where n_{i,j} is the number of times a specific term t_i appears in document d_j, and the denominator is the number of tokens in d_j. Tf measures the importance of a term within a given document.

For every document, we then compute the tf-idf score of the document by multiplying the tf score by the idf score for every term in the document: (tf-idf)_{i,j} = tf_{i,j} * idf_i. This up-weights terms that occur frequently in a specific document yet occur infrequently in the corpus as a whole. We use the tf-idf score to construct a vector of terms and their tf-idf scores for every document in the corpus.

We then calculate the cosine similarity between each document in the test corpus and every document in the training corpus. Cosine similarity is defined as cos(θ) = (A · B) / (||A|| ||B||), where A is a document in the test corpus and B is a document in the training corpus. After the cosine similarity is calculated, we take the nine training songs with the highest similarity scores to the test song. We then classify the test song based on the polarity of the majority of those nine songs: if the majority of the songs are positive, the test song is labeled as positive, and vice versa. We use 5-fold cross-validation to test the entire dataset.

In another experiment, we replace the tf-idf weight with a word-sense similarity weight. We use the WordNet similarity package to calculate the path similarity of each word in every document in the corpus to the words "happy" and "sad" (Miller et al., 1993). We then take the ratio of the similarities, so that the weight of each term t_i in document d_j can be expressed as weight_{i,j} = pathSimilarity(t_i, "happy") / pathSimilarity(t_i, "sad"). In our experiments, we calculated the similarity by first calculating the synsets (sets of synonyms) of each term t_i and of the emotion words "happy" and "sad". We then use the first sense of each emotion word and compare it to every sense in the term's synset. We choose the sense that maximizes the similarity between the term and the emotion word, and use this to calculate the ratio between the two sense scores. We place these values in a vector and calculate the cosine similarity as described above.
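A self-contained sketch of the tf-idf and cosine-similarity classifier described above, written without external libraries. The helper names are ours; the tf-idf and cosine formulas follow the definitions in this section, and k = 9 mirrors the nine-neighbor majority vote.

```python
import math
from collections import Counter

def idf_weights(training_songs):
    """idf_i = log(|D| / |{d : t_i in d}|), computed over the tokenized training lyrics."""
    df = Counter(w for song in training_songs for w in set(song))
    n_docs = len(training_songs)
    return {w: math.log(n_docs / df_w) for w, df_w in df.items()}

def tf_idf_vector(tokens, idf):
    """Term frequency times inverse document frequency for one song."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify_knn(test_tokens, train_vectors, train_labels, idf, k=9):
    """Label a song by majority vote over its k most similar training songs.

    train_vectors[i] is assumed to be tf_idf_vector(train_songs[i], idf).
    """
    v = tf_idf_vector(test_tokens, idf)
    ranked = sorted(range(len(train_vectors)),
                    key=lambda i: cosine(v, train_vectors[i]), reverse=True)
    top = [train_labels[i] for i in ranked[:k]]
    return max(set(top), key=top.count)
```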
0.9952 0.9476 Lyrics with Majority of Non-Unique Subjectiv- ity Clues with Same Polarity as Song’s Polarity Pos. Neg. 0.9619 0.0762 Lyrics with Majority of Unique Subjectivity Clues with Same Polarity as Song’s Polarity Pos. Neg. 0.9429 0.0667 Figure 1: Average and maximum accuracy from 100 trials of the Word List algorithm. Table 2: A Comparison of Generic and Corpus- Specific Subjectivity Clues While generic subjectivity clues had perfect neg- Pos. Neg Generic love, know, baby, want long, cry, crazy, sorry ative precision, they had incredibly low negative re- Specific oh, hey, wanna, will but, was, back, duuh call (0.01). This confirms the results of our first ex- periment, in which we determined that generic sub- jectivity clues cannot attain a negative lyric recall of words in positive lyrics and negative lyrics. For higher than 6.67%. However, positive precision was the positive words, we discard any word that oc- 0.5 and positive recall was 1.0, suggesting that the curs less than 1.5 more times for positive words than generic positive subjectivity clues are useful in clas- for negative words, and we do the same for neg- sifying documents. ative words. We set this parameter low so words Corpus-specific subjectivity clues performed that occur in both positive and negative songs, such slightly better. Although negative precision and pos- as “promises”, are counted more heavily than song- itive recall dropped by half, negative recall rose specific words (e.g. “fergalicious”), which may oc- to 0.59, and the accuracy increased to 0.54. This cur frequently in only one document. 1.5 was chosen demonstrates that even a corpus-based subjectivity after experimenting with values between 1 and 20, lexicon created with a simplistic algorithm will gen- and 1.5 was selected because it discarded overly fre- erally outperform generic lexicons. quent words without penalizing common sentiment indicator words. We then select the 30 most frequent 4.2 Word Lists positive and negative clues and classify each song We run two different versions of the Word List al- in the training set as described above. Examples of gorithm. In the first version, each space-separated these words are listed in Table 2. The results of these word is treated as its own entry in the lists. Punc- experiments are shown in Table 3. Both experiments tuation marks (e.g. periods, commas, and quota- were conducted using 5-fold cross-validation. tion marks) before or after any alphabetic characters are removed and entered as separate entries. How- ever, non-alphabetic characters that appear between Table 3: Comparison of Lyric Classification Us- alphabetic characters (such as internal apostrophes ing Generic Subjectivity Clues and Corpus-Specific or hyphens) remain within the words. Thus: Subjectivity Clues Pos. Prec. Neg. Prec. Pos. Rec. Neg. Rec. Acc. She didn’t say ”yes.” Generic 0.50 1.0 1.0 0.01 0.50 Specific 0.54 0.54 0.50 0.59 0.54 would be entered as: increments of 10 words, repeating the algorithm 100 times for each threshold value. Keeping track of the accuracy of each trial’s classifier yields average and maximum accuracies for word lists of varying sizes. Recall from Section 3.2 that with a maximum list size of n, we keep the n most frequent words from the positive list, and the n most frequent words from the negative list. We then eliminate words that ap- pear in both. Because words common in one are often common in the other, the final word lists are generally much smaller than the maximum list size. 
4.2 Word Lists

We run two different versions of the Word List algorithm. In the first version, each space-separated word is treated as its own entry in the lists. Punctuation marks (e.g. periods, commas, and quotation marks) before or after any alphabetic characters are removed and entered as separate entries. However, non-alphabetic characters that appear between alphabetic characters (such as internal apostrophes or hyphens) remain within the words. Thus:

She didn't say "yes."

would be entered as:

She / didn't / say / " / yes / . / "

The second version of the Word List algorithm treats punctuation in exactly the same way. However, it is designed to control for negation by a very simple mechanism. In the Negation Word List, any word that follows "no" or "not" is appended, and the two words are included in the list as a single entry. Thus:

I am not happy.

would be entered as:

I / am / not happy / .

It is also important to note, however, that the sentence:

I am not very happy.

would be entered as:

I / am / not very / happy / .

These two versions of the Word List algorithm are trained on 150 positive songs and 150 negative songs. They are then tested on 60 positive songs and 60 negative songs. This training and testing process is repeated 100 times. Each of the 100 times, the 210 songs from each class are randomly shuffled, so that the training and testing sets vary from trial to trial.

Casual inspection of early data revealed that the highest accuracies were achieved with maximum list sizes below 500 words. We thus run the algorithm for maximum list size thresholds from 0 to 490, at increments of 10 words, repeating the algorithm 100 times for each threshold value. Keeping track of the accuracy of each trial's classifier yields average and maximum accuracies for word lists of varying sizes.

Recall from Section 3.2 that with a maximum list size of n, we keep the n most frequent words from the positive list, and the n most frequent words from the negative list. We then eliminate words that appear in both. Because words common in one are often common in the other, the final word lists are generally much smaller than the maximum list size. In the 200 trials (100 without negation, and 100 with negation) that used a maximum list size of 490, there is never a final list with more than 136 words. In the no-negation condition, there are multiple data points for each final list size up to 131 words. In the negation condition, there are multiple data points for each final list size up to 133 words.

The no-negation results appear in Figure 1. The results with negation appear in Figure 2. In the no-negation condition, the highest average accuracy is 58.0%, with a final list size of 32. The highest maximum accuracy is 68.6%, achieved with a final list size of 36. In the negation condition, the highest average accuracy is 58.53%, with 33 words in the final list. The highest maximum accuracy is 67.7%, with a final list size of 35.

Figure 1: Average and maximum accuracy from 100 trials of the Word List algorithm.

Figure 2: Average and maximum accuracy from 100 trials of the Word List algorithm with simple negation.
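The tokenization and simple negation folding described at the start of this subsection can be sketched as follows; the regular expression is ours and only approximates the punctuation rules above, so treat it as illustrative.

```python
import re

def tokenize(text):
    """Split punctuation off word edges but keep internal apostrophes and hyphens."""
    return re.findall(r"[A-Za-z0-9]+(?:['\-][A-Za-z0-9]+)*|[^\sA-Za-z0-9]", text)

def merge_negation(tokens):
    """Fold the word following 'no' or 'not' into a single compound entry."""
    merged, skip = [], False
    for i, tok in enumerate(tokens):
        if skip:
            skip = False
            continue
        if tok.lower() in ("no", "not") and i + 1 < len(tokens):
            merged.append(tok + " " + tokens[i + 1])
            skip = True
        else:
            merged.append(tok)
    return merged

print(tokenize('She didn\'t say "yes."'))                # ['She', "didn't", 'say', '"', 'yes', '.', '"']
print(merge_negation(["I", "am", "not", "happy", "."]))  # ['I', 'am', 'not happy', '.']
```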

Table 5: Maximum Accuracy for Basic, Ignore List, and Negation Conditions Basic Ignore List Negation Ignore and Negation Present In One 0.576271186441 0.576271186441 0.576271186441 0.576271186441 Prevalent In One 0.669491525424 0.661016949153 0.661016949153 0.652542372881 Naive Bayes 0.635593220339 0.64406779661 0.64406779661 0.652542372881

100 runs. The average accuracies for each condition appear in Table 4, and the maximum accuracies ap- pear in Table 5. Of the algorithms that require knowledge of ev- ery word in the training set, only the Prevalent In One algorithm requires a parameter selected by the researcher. One important task, before the above ex- periments were run, was to select the best possible value for this threshold. It was evident from early data that negation made no significant improvements in the performance of the Prevalent In One algo- rithm. Whether the ignore list had an impact was less clear. We decided to run the basic Prevalent In One algorithm and the ignore list Prevalent In One versions 100 times each, for each of fifty thresholds. Figure 3: Average and maximum accuracy from 100 The thresholds range from 0 to 2.45 by increments trials of the Prevalent In One algorithm. of .05. Average and maximum accuracies are col- lected for each threshold value. The accuracies for the basic version can be seen in Figure 3, and the accuracies for the ignore list version are found in Figure 4. In the basic version of the algorithm, the highest average accuracy is 65.5%, achieved with a thresh- old of 1.4. The highest maximum accuracy in this condition is 67.0%, achieved with a threshold of 1.5. In the ignore list condition, the highest average ac- curacy is 57.1%, when using a threshold of 1.5. The highest maximum is 67.0%, with threshold values of 1.45 and 1.5. We elected to use the high aver- age accuracy thresholds for further testing of this al- gorithm, in which it is compared to . Figure 4: Average and maximum accuracy from 100 Therefore, a threshold of 1.4 is used for the two con- trials of the Prevalent In One algorithm with the ig- ditions that lacked the ignore list, and a threshold of nore list. 1.5 is used for the two conditions that utilized the Table 9: Positive and Negative Word Lists for Vary- ing Maximum List Sizes Max Final Pos. Words Neg. Words 30 3 baby, we, just but, no, when 40 5 oh, this, can, one, got no, when, what, up, out 70 3 go, come, I’ll back, not, they 100 12 gonna, will, man, from, !, why, duuh, won’t, her, away, hey, too, life, every, he, could, ya, little, about, day, more, ” that’s, boy

4.4 Cosine Similarity

The results of five-fold cross-validation on the dataset using cosine similarity can be seen in Table 6. Cosine Similarity using tf-idf weighting performs about as well as the algorithms in Table 4, with an accuracy of 0.57. Positive recall was lower than the other values in the table, which is somewhat surprising considering that, according to the experiment described in Section 4.1, there are more positive subjectivity clues in our corpus than negative ones, which should make it easier to identify positive lyrics. However, since we did not use the Wiebe subjectivity lexicon in this experiment, it is possible that the algorithm is identifying negative terms that are more meaningful for classification than the ones in the subjectivity lexicon. The WordNet algorithm performs worse than the tf-idf algorithm, with an accuracy of 0.52 and precision and recall across both classifications of about 0.5.

Table 6: Cosine Similarity Results
          Pos. Prec.  Neg. Prec.  Pos. Rec.  Neg. Rec.  Acc.
TF-IDF    0.59        0.56        0.49       0.66       0.57
WordNet   0.52        0.51        0.52       0.53       0.52

5 Discussion

5.1 Exploratory Statistics

Based on the results in Table 2, it initially seems as if generic subjectivity clues should generate more accurate classification results. Words like "sorry" and "baby" are more useful to humans than words like "oh" and "duuh" when they are faced with classification tasks. However, the list of corpus-specific words performs better. (Pang et al., 2002) saw similar results when they attempted to classify movie reviews based on human-generated subjectivity clues and corpus-based subjectivity clues. Regardless of how unhelpful they seem to humans, corpus-based clues provide useful data to a classification algorithm.

The improvement gained by using corpus-specific words, however, is minimal. We used a very simplistic algorithm to generate the corpus-specific lexicon, which probably resulted in the relatively low accuracy. However, we expect that better subjectivity-lexicon generation algorithms will result in better classification accuracy, and it is encouraging that an algorithm as basic as ours results in improvement over using generic subjectivity clues.

5.2 Word Lists

The Word List algorithm was moderately successful. With word list sizes that reliably yield 57-58% accuracy, and which can occasionally reach even 68% accuracy, this algorithm compares favorably with the other methods explored in this paper. However, from the gap between maximum and average performance, hovering at approximately 10%, it is clear that performance is extremely variable.

Table 8: Accuracies from Sample Run 2
                  Basic           Ignore List     Negation        Ignore and Negation
Present In One    0.525423728814  0.525423728814  0.516949152542  0.516949152542
Prevalent In One  0.669491525424  0.64406779661   0.627118644068  0.669491525424
Naive Bayes       0.610169491525  0.610169491525  0.593220338983  0.601694915254

The variability of the accuracy suggests several facts about the Word List algorithm. First, since only the composition of the training and test sets varies from trial to trial at a single threshold, it seems clear that the contents of the training set have a vast influence on the accuracy of the resulting classifier. This is not surprising, but it does suggest that performance could be tremendously improved by access to additional training data.

Figure 5: Average accuracy, by list size, for Word List with and without negation over 100 trials.

The comparison of average negation and no-negation Word List accuracies in Figure 5 demonstrates that there appears to be no significant effect of including negation in the algorithm. The negation version performs better in some trials and worse in others. And, while negation has the slightly higher average accuracy peak at 58.5%, the no-negation condition has the higher maximum accuracy with a peak of 68.6%. In both cases, the difference between the peaks is trivial. Since negation also does not harm performance, it seems possible that the compound entries formed by "no" and "not" and the words that follow them are rarely frequent enough to be included in the final list of frequent words. While this is true for trials with low maximum list sizes, this is not the case with maximum list sizes of greater than 200. In several runs with multiple list sizes, the entries "no one", "no more" and "no matter" appear in multiple final lists, beginning at a maximum list size of 230 to 380. In other runs, "not goin" appears in the lists beginning with maximum list sizes of approximately 340.

Considering negation's apparent lack of effect on performance, it is interesting to note that "no" and "not" do appear in many of the final lists in the no-negation condition. With maximum list sizes of 70 to 80, "not" becomes a common entry, and "no" consistently appears with maximum list sizes between 30 and 50. Their short-lived presence in the lists presumably explains the fact that their absence in the negation condition matters very little.

Examination of the lists used to distinguish between positive and negative songs does provide several insights into the results. Table 9 gives some sense of the number of frequent words that appear on both the positive and negative lists. In the run depicted in the table, 67 of the 70 most frequent positive and negative words appear in both lists. Only three words from each list are specific to a particular sentiment.

It also becomes clear that words are eliminated from the final lists very quickly, indicating that the frequency rankings of those words are very similar in the negative and positive lyrics lists. Notice that when the maximum list size goes from 30 to 40, bringing into consideration only 10 more words from each list, the new lists are composed entirely of new words, save for "no" and "when" in the negative list. This demonstrates that "baby", "we", and "just" - all among the top 30 positive words - are also among the top 40 negative words. Thus, even words that can be found to appear more frequently in one set will appear only barely more frequently.

The presence of extremely song-specific words, such as "duuh" and "ya," which are unlikely to appear with the same spelling in multiple songs, indicates the need for a larger training set. The final word lists should reflect the entirety of the sentiment class, and not be overly influenced by the unusual words of any individual song. The fact that a word like "duuh" appears already among the 100 most frequent words suggests that many of the words that enter consideration after the 100-word level may also be similarly impractical for classifying other songs. This is in fact the case, with words such as "mmm", "t", "1", and "da" appearing in later lists with greater maximum list sizes.
Interestingly, however, these larger lists do begin to reflect prototypical notions of the types of words that a positive and negative list might contain. For instance, with a maximum list size of 290, one positive list includes such words as: "kiss", "sweet", "friends", "together", "music", and "sugar." The negative list for the same run includes: "miss", "gone", "leave", "left", "mean", "nobody", "fool", "tears", "hurt", "goodbye", "broken", "cold", "gun", and "pain." These lists, however, with final sizes of 68, are significantly above the optimal level of final list size. It seems likely that, while these words would be excellent for identifying the sentiment of a song that contains many of them, such uncommon, specific words may not appear in enough songs to be useful for categorizing all of the test lyrics.

5.3 Word Dictionaries

It is clear again that there are large differences in the average and maximum accuracy values for the Word Dictionary algorithms, as evident in Tables 4 and 5. This brings to the fore the central role of training data in determining the accuracy of a classifier and suggests the possible improvements that might be accomplished with additional data.

Examining the average performance in Table 4 reveals that Present In One is an extremely ineffective algorithm. Prevalent In One and Naive Bayes, on the other hand, yield good results, with Prevalent In One slightly outperforming Naive Bayes in most cases. The accuracy differences between the two are unlikely to be significant.
The best average accuracy is found with Prevalent In One, in the ignore list condition. Negation appears to slightly harm Prevalent In One performance, with or without the ignore list. Naive Bayes also shows very slight losses in accuracy with the introduction of negation, while Present In One appears to benefit by it. The Present In One performance is on average boosted by approximately 1 percentage point through the use of negation, although its maximum accuracy appears to be entirely independent of condition.

The use of an ignore list provides a very slight advantage in all three algorithms, and a slightly larger advantage in the Naive Bayes approach. Again, the differences in performance are so slight as to be almost certainly insignificant.

In Tables 7 and 8 we see the algorithms' performance in runs on the same training and test sets. The performance of the three algorithms appears to co-vary, with high performance of one predicting high performance of the others. The pattern of best results is not, however, always consistent with the average accuracies. In Run 1, Naive Bayes performs better than Prevalent In One in two conditions. In Run 2, Prevalent In One outperforms Naive Bayes in all conditions. In some runs, Naive Bayes outperforms Prevalent In One under all conditions.

Ultimately, it appears that these three algorithms are best used with an ignore list, and depend on receiving a particular distribution of data to create a high-performing classifier. Of the three, Prevalent In One is the most reliably accurate, trailed immediately by Naive Bayes. The Present In One algorithm is unproductive, performing on average worse than a single-sentiment classifier, and worse than random chance.

5.4 Cosine Similarity

The cosine similarity algorithms suffer from the third lyrics analysis problem described in Section 2.2: they often miscategorize songs that express positive emotions about negative events or words. For example, in "Whatta Man" by Salt-N-Pepa, the following stanza is problematic:

My man gives real loving, that's why I call him Killer
He's not a wham-bam-thank-you-ma'am, he's a thriller
He takes his time and does everything right
Knocks me out with one shot for the rest of the night

In this stanza, the words "Killer", "thriller", and "shot" have negative connotations, and one would expect them to appear most frequently in negative documents. However, the singer is using these words in a positive manner, which the cosine similarity algorithm cannot model. The overall emotion expressed by this stanza, and this song in general, is positive. The WordNet algorithm has particular difficulty with this problem because it uses the highest similarity score in its ratio calculation. The path distance score of "thriller" and "sad" will be much higher than the score of "thriller" and "happy", which will result in misclassification.

Cosine similarity algorithms also suffer from the second problem with lyrics analysis described in Section 2.2: the words in a song may have no obvious polarity, yet the song expresses a polar emotion. In Will Smith's "Miami", the following stanza contains mainly neutral words:

Hottest club in the city and its right on the beach.
Temperature, get to ya', it's about to reach
Five hundred degrees in the Caribbean seas
With the hot mommies screaming "Ayy papi"

There is no word in this stanza that specifically expresses a polar emotion. "Hot" and "hottest" can be either positive or negative, and the rest of the words are relatively neutral. The only exception is the phrase "Ayy papi", which is not present in any other document in the corpus, so tf-idf weighting is useless in identifying it as a positive emotion indicator. Additionally, most of these words are not at all related to either "happy" or "sad", so the WordNet algorithm has difficulty classifying songs like this one.
However, cosine similarity provides good results when the song lyrics exhibit the first problem of misleading initial lyrics from Section 2.2. "Always" by Bon Jovi demonstrates this problem. The beginning of the song includes lines like "It's been raining since you left me" and "But without you I give up", which might lead a human reader to expect the song to express a negative emotion. However, it ends on a positive note:

Well, there ain't no luck in these loaded dice
But baby, if you give me just one more try
We can pack up our old dreams, and our old lives,
We'll find a place, where the sun still shines

The overall tone of the song is positive, in spite of the negative initial lyrics. The tf portion of the tf-idf weighting that we use in this algorithm down-weights the repeated negative words and up-weights the less frequent positive words, so that both can be fairly compared to other documents. As a result, tf-idf weighting is more useful for a classification task than WordNet similarity scoring according to our experiments.

6 Conclusion and Future Work

Lyric sentiment analysis is not an easy task. Lyrics are more difficult to analyze than traditional sentiment analysis corpora, such as movie reviews, because they often express an emotion without using words that are sentiment-laden. Classifiers that rely on the presence of subjectivity clues perform poorly in this situation because of the paucity of subjectivity clues in most lyrics. Additionally, lyrics sometimes begin by expressing one emotion and then conclude with the opposite emotion. This complicates frequency-based analysis techniques because the first emotion is usually more prevalent, and causes the system to classify based on the incorrect emotion. A final problem with lyrics is that they can express emotions opposite to those that the connotations of the lyrics would lead an algorithm to expect. Many songs in recent years use words like "shoot" and "killer" in a positive way, which makes modeling these usages difficult for a system that trains on data from different time periods.

As a result, our accuracies are lower than are typically seen in other sentiment analysis tasks. Movie review tasks in particular are able to achieve high accuracy. However, our results are consistent with (McKay et al., 2010), leading us to conclude that while lyrics may improve a music classification system that relies on additional data, current analysis methods limit their effectiveness when they are used alone for classification.

In our experiments, we determined that, in addition to the problems listed above, the lexicon of typical songs is different from that of a generic lexicon. For this reason, using a generic subjectivity lexicon to classify songs results in worse accuracy than using a corpus-specific one. The lexicon that we generated for our experiments was created with the most basic statistical analysis, and we expect that if it were improved, the accuracy of our classifications would increase. This is something that we would like to explore at a later date.

Algorithms that consider the prevalence of terms seem to do better than presence-based algorithms. We see improvements in our classification when we consider the frequency of terms in positive or negative documents, rather than just their presence, for two main reasons. The first is that prevalence-based algorithms do not ignore words that occur in both positive and negative documents, since this excludes most of the words in each song class. The second reason is that these algorithms have a threshold parameter that can be tuned to the corpus. For our dataset, we achieve the best accuracy with a threshold of about 1.5; for other corpora, a different value may be more appropriate.
There is still much work to be done in lyrical analysis. Our current implementation of the Naive Bayes algorithm takes as features the words in the lyrics. However, the words themselves may not be the best features to model the emotional polarity of songs. Perhaps the number of syllables, the average length of each line in the lyrics, or the number of repeated words could also provide useful information in a classification task. It would be interesting to see if training the Naive Bayes classifier on these features in isolation would result in more accurate classifications, or if combining these new features with our current implementation would be better.

In fact, many of our algorithms were limited to analyzing only words. Many other language features could be relevant to this classification task. For instance, the song length, the repetition of word series, the use of proper nouns, the number of slang words, and any number of other characteristics could all prove to be predictive metrics, which we did not explore. Incorporating these features and many others could vastly improve our algorithms' performance. The reduction of a set of lyrics into only a series of words is a naive simplification.

Additionally, algorithms such as Word List, Naive Bayes, and Prevalent In One are tested using the full sequence of words in the lyrics as they appear in the song. It would be interesting to use similar approaches with simplified song representations that remove repeated words, to provide only a list of the words that appear in the song. Classifying songs based entirely on the kind of words that appear in them would be a challenging and revealing task. Conversely, utilizing word order, or knowledge of word placement within a song, could conceivably simplify the classification task. Consider, for example, the possible effects of identifying which part of a song constitutes the chorus, and whether treating the chorus differently from the remaining lyrics could improve accuracy. Perhaps words in the chorus should be down-weighted because they are more frequent than the other words, or perhaps they should be up-weighted because they are important enough to be a part of the repeated chorus and central to the emotion of the song.

In song classification, as in so many areas of sentiment classification, negation proves a difficult phenomenon to model. Our approach to negation notably worsened performance in at least one subclass of algorithms. Accurately measuring emotion in lyrics may require additional advances in techniques for accounting for negation. Whether negation should be modeled differently in song lyrics than in other texts remains to be seen, and could prove an intriguing topic for future research.

We are confident that lyrical analysis for music classification will one day achieve the accuracies of other classification tasks. Until that time, however, more research needs to be conducted on extracting and analyzing the features that are salient to classification.

7 Acknowledgements

We would like to thank Dr. Richard Wicentowski for his guidance on this project.

References

Young Hwan Cho and Kong Joo Lee. 2006. Automatic affect recognition using natural language processing techniques and manually built affect lexicon. IEICE Transactions on Information and Systems, E89-D(12), December.

Xiao Hu and J. Stephen Downie. 2010. Improving mood classification in music digital libraries by combining lyrics and audio. In JCDL '10: Proceedings of the 10th Annual Joint Conference on Digital Libraries.

Cory McKay, John Ashley Burgoyne, Jason Hockman, Jordan B. L. Smith, Gabriel Vigliensoni, and Ichiro Fujinaga. 2010. Evaluating the genre classification performance of lyrical features relative to audio, symbolic, and cultural features. In Proceedings of the 11th International Society for Music Information Retrieval Conference, pages 213–218.

George A. Miller, Claudia Leacock, Randee Tengi, and Ross T. Bunker. 1993. A semantic concordance. In Proceedings of the ARPA Workshop on Human Language Technology, pages 303–308.

Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From tweets to polls: Linking text sentiment to public opinion time series. In Proceedings of the International AAAI Conference on Weblogs and Social Media, May.
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79–86, July.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing.