Hit Songs' Sentiments Harness Public Mood & Predict Stock Market
Total Page:16
File Type:pdf, Size:1020Kb
Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market Rachel Harsley1, Bhavesh Gupta2, Barbara Di Eugenio1, and Huayi Li1 1University of Illinois at Chicago, Chicago, IL, USA 2Cerner Corporation, Kansas City, MO, USA {rharsl2, bdieugen, hli47}@uic.edu [email protected] from the Billboard Hot 100, a weekly listing of Abstract the top 100 songs, is representative of public mood. Moreover, we aimed to explore correlat- This work explores the relationship be- ing, causal, and even predictive relationships be- tween the sentiment of lyrics in Bill- tween song lyrics, public opinion, and the stock board Top 100 songs, stocks, and a con- market. sumer confidence index. We hypothe- Thus, we aimed to explore the sentiment of sized that sentiment of Top 100 songs top song lyrics in a manner similar to researchers could be representative of public mood who used Twitter to ascertain public mood and and correlate to stock market changes as correlated this sentiment to public opinion polls well. We analyzed the sentiment for po- (Bollen et al., 2011; O’Connor et al., 2010). larity and mood in terms of seven dimen- While Twitter offers a fine-grained time-based sions. We gathered data from 2008 to approach to harnessing public expressions, other 2013 and found statistically significant mediums, such as popular song lyrics, may offer correlations between lyrical sentiment the same insight while being less costly to obtain polarity and DJIA closing values and be- and less susceptible to the “burstiness” of Twit- tween anxiety in lyrics and consumer ter. In other words, use of popular song lyrics confidence. We also found strong would automatically filter out the noise of Granger-causal relationships involving ephemeral, popular happenings which pervade anxiety, hope, anger, and both societal Twitter. indicators. Finally, we introduced a vec- The Hot 100 listing is calculated based on a tor autoregression model with time lag single’s selling performance, radio airplay audi- which is able to capture stock and con- ence impression, and online streaming activity sumer confidence indices (R2=.97, (Trust, Gary, 2013). We explored if correlation p<.001 and R2=0.72, p<.01 respectively). would reflect in the Thomson Reuters/University of Michigan Consumer Confidence Index (ICC) 1 Introduction and the Dow Jones Industrial Average (DJIA), a major U.S. stock market index. Many would agree that the success of top songs For our work, we gathered the entire set of is due to a complicated mix of marketing, popu- weekly Hot 100 songs between 2008 and 2013. larity, and blockbuster theory in which compa- We used OpinionFinder to analyze the positive nies invest big money into few products. Yet, we and negative polarity of lyrics (Wilson et al., also hypothesized that the lyrical sentiment of 2005). We then used a second tool, WordNet top songs can be viewed as transient, but genuine Affect, to perform sentiment analysis along nine- snapshot of public mood. There is a plethora of dimensions (Strapparava and Valitutti, 2004). research which unequivocally confirms both the We assessed the strength of sentiment correlation influence of mood on music choice and the influ- to the DJIA and ICC. We then explored poignant ence of music on mood and even buying behav- Granger-causal relations and created a predictive ior (Areni and Kim, 1993; Bruner, 1990; Chen et model for both societal indicators. In this work, al., 2007; R McCraty, 1998) we discover, there are indeed correlating and While researchers have attempted to model causal relationships between the song sentiments public opinion indices and the stock market via and these societal indicators. sentiment analysis of news articles, microblog- ging and social media sites, no research has tak- en this correlation-seeking approach using popu- lar song lyrics. We hypothesized that song lyrics 2 Related Work linking the effect of music on mood and social behaviors including buying decisions and even Bollen et al. (2011) explored the notion that pub- it’s inverse; the role of mood in music lic mood can be correlated to and even predictive choice(Areni and Kim, 1993; Bruner, 1990; of economic indicators. They used sentiment North and Hargreaves, 1997; R McCraty, 1998; analysis of large scale twitter feeds and compare Sloboda, 2011). Due to the strong relationship it with the Dow Jones Industrial Average over between music and mood, we considered it to be time. High correlation results led them to create a reasonable hypothesis that top music choice of a neural network to predict the DJIA given their the nation, via the Hot 100, could in some ways Twitter sentiment insights. They reached 87% be representative of public mood. accuracy in predicting the daily up and down changes of the DJIA. Similarly, O’ Connor et al. 3 Dataset (2010) connected measures of public opinion measured from polls with the results of sentiment 3.1 Lyrics analysis over text on twitter feeds. They analyzed The first step in preparing our analysis was to several surveys on consumer confidence and po- collect the song listings themselves. Our dataset litical opinion between 2008 and 2009 and found spans six years, 2008 through 2013. In order to correlation between sentiment word frequencies perform this, we consulted the Ultimate Music in twitter messages. Database (http://www.umdmusic.com/) which Acerbi et al. (2013) examined the usage of has the full listing of Billboard Music Charts “mood” in the context of 20th century books songs available. We collected a total of 36,000 written in English.(Acerbi et al., 2013).They used WordNet Affect song listings (with some songs being repeated). to perform sentiment analysis on the literature For each listing, we queried and scraped Lyr- and found evidence for distinct historical periods icsWikia (http://lyrics.wikia.com/) for actual lyr- of positive and negative moods in American Lit- ics. The lyrical data along with full chart listings erature. Further, these periods often correlated to are available for public use by contacting the au- historical happenings. thors. Daas and Puts (2014) explored changes in the sentiment in Dutch public blogs and social 3.2 Societal Indicators media messages i.e. Twitter, Facebook and For the purpose of this research, we examined LinkedIn over a 3.5-year period.(Daas and Puts, 2014)They per- two societal indicators; Dow Jones Industrial formed sentiment analysis on the text and com- Average (DJIA) and Thomson Reu- pared results with changes in Netherlands month- ters/University of Michigan Index of Consumer ly consumer confidence. They discovered a high Confidence (ICC). The two were chosen due to correlation (up to r=0.9) and that changes in so- their role in previously researched correlations as cial media sentiment precede the consumer con- described in the Related Work section. fidence changes. The DJIA shows how 30 large publicly While there has been much interest in auto- owned companies based in the United States matically determining the sentiment of songs have traded during a standard trading session in from both acoustic and natural language pro- the stock market. It is the second oldest U.S. cessing communities, there has been far less suc- market index and is influenced by many factors. cess in performing the task. Xia et al (2008) pro- The ICC aims to measure consumers’ level of posed using a sentiment vector space model in optimism towards their own financial situation, conjunction with a support vector machine to short term general economy and long term gen- overcome the hurdle of lyrical data sparseness.(Xia et al., 2008) eral economy, according to their share of spend- Other research has focused on combining audio ing and savings (Curtin, 2004). and lyrical data for ascertaining the mood of a given song (Hu and Downie, 2010; Zhong et al., 4 Methods 2012). While past work has looked at correlations In this section, we describe our approach to find between Twitter, literature, and other social me- correlations between the lyrical sentiment of top dia in regards to stocks and public opinion, our songs and societal indicators. work looks at the correlation between the senti- ment of popular song lyrics and these societal measures. There is an abundance of research 4.1 Sentiment Analysis fear is distinguished in contrast to fear, which may also signify reverence. Finally, we included We chose to utilize two simplistic approaches to positive emotion and negative emotion to com- sentiment mining given the insight of past work pare to OpinionFinder’s polarity results. showing the uniqueness of song sentiment classi- In order to effectively compare the range of fication. Positive results using the simplest of sentiment counts and the societal indicators, each sentiment analysis techniques would be indica- time series was normalized to their z-score using tive of opportunities for fine-tuning the sentiment the overall mean and standard deviation. The analysis method for enhanced results. We per- normalization allows all time series to fluctuate formed our first analysis of the lyrics using Opin- around a zero mean and be expressed on the ionFinder. OpinionFinder performs sentiment scale of a single standard deviation. The Z score analysis by labeling words as either positive, of time series X is denotes Z is defined as: negative, or neutral. It generates a text file that X tags the words in the document with respect to their contextual polarity. Using OpinionFinder �8 − � �6 = (2) results, we calculated the polarity for a song us- � ing the following ratio: where � and � represent the mean and standard deviation of the X time series. ���_��� + ���_���� + .1 �������� = (1) 4.2 Correlation Analysis ���_��� + ���_���� + .1 We began by examining the Pearson Correlation where num_pos, num_neut, and num_neg Coefficient for the sentiment time series in com- represent the number of words with positive, parison to the societal indicator data.