DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS, SWEDEN 2017

And the winner is... Predicting the outcome of Melodifestivalen by analyzing the sentiment value of Tweets

ALEXANDER KOSKI AND JENNIFER PERSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

And the winner is...

Predicting the outcome of Melodifestivalen by analyzing the sentiment value of Tweets

ALEXANDER KOSKI AND JENNIFER PERSSON

Bachelor's Thesis in Computer Science
Date: June 5, 2017
Supervisor: Iolanda Leite
Examiner: Örjan Ekeberg
Swedish title: Och vinnaren är... Att med sentimentalanalys på tweets förutspå resultatet av Melodifestivalen
School of Computer Science and Communication


Abstract

In a world where a lot of people post their feelings about things on social media, an interest in using sentiment analysis to collect and understand these feelings has arisen. This thesis aims to investigate the possibility of predicting the outcome of a television competition, decided partly by the viewers' votes, using sentiment analysis on tweets. The lexicon AFINN and a Swedish translation of it were used for the lexical sentiment analysis in this report. After pre-processing the tweets gathered from Twitter with the competition's hashtag, the tweets were analysed and mapped to the different competitors. Each mapped tweet was scored with a sentiment value according to the lexicons. Six different predictions were derived from the sentiment values of the tweets. The predictions were compared to the real result of the competition using the Kendall tau distance, where a shorter distance indicates greater similarity between the lists. The results show that it is possible to make a rough prediction of the outcome of the competition, where the best prediction was achieved by ranking the top 5 artists based on the sum of positive sentiment values for the songs.

Sammanfattning

In a world where people post their feelings and opinions about things on social media, the interest in compiling and understanding them through sentiment analysis has grown. This thesis investigates whether it is possible to predict the result of a competition, where the winner is partly chosen by the viewers' votes, by performing sentiment analysis on tweets. The English lexicon AFINN and a Swedish translation of it were used for the lexical sentiment analysis in this report. After processing all tweets with the competition's hashtag, collected from Twitter, the analysis was carried out and the tweets were mapped to the different entries. Each mapped tweet was assigned a sentiment value with the help of the lexicons. Six alternative result lists were produced based on the sentiment values of all the sorted tweets. The alternative result lists were compared with the actual outcome of the competition, where the similarity between them was measured with the Kendall tau distance, a shorter distance indicating greater similarity between the lists. The results show that an approximate prediction of the competition result can be made using sentiment analysis, where the alternative result that best matched reality was obtained by ranking the 5 most popular entries by the sum of their positive sentiment values.

Contents

1 Introduction
  1.1 Problem Statement
  1.2 Scope

2 Background
  2.1 Melodifestivalen
  2.2 Data-Mining using Twitter
  2.3 Sentiment Analysis
    2.3.1 Lexical Analysis
  2.4 Kendall tau distance
  2.5 Related works

3 Method
  3.1 Data-gathering
  3.2 Data pre-processing
  3.3 Producing ranking lists
    3.3.1 Scoring the tweets using lexical analysis
    3.3.2 Metrics to calculate from the existing data
    3.3.3 Kendall Tau comparison

4 Result
  4.1 Six ways of ranking
  4.2 Kendall tau distance comparison
    4.2.1 Complete list of rankings
    4.2.2 Top 5 list of rankings

5 Discussion
  5.1 Real World Comparison
  5.2 Limitations
  5.3 Further Research

6 Conclusion

Bibliography

Chapter 1

Introduction

Using computers to predict the outcome of future events with mathematical models is something that has been done for a long time. The big advantage of using computers for making predictions is the computer's ability to handle large amounts of data quickly. It would not be possible for a human to analyse the same amount of data in reasonable time, while a computer is able to do it in a matter of minutes. Today we can see many uses for computers' predictive powers, with models that predict traffic patterns on highway systems (Allström 2005), the rise and fall of stock prices (Aase 2011) and much more. In the recent decade social media has been growing exponentially, with more and more people connecting and sharing their opinions and views on different matters. This has opened the door for mass collection of opinions from the public. This new data can be used to predict outcomes that relate to people's feelings on specific topics. Political elections and brand evaluation are examples of areas where data from the masses is used to draw conclusions about a population's future behavior (Chin, Zappone, and Zhao 2016; Brandwatch 2017).

To analyse all of this new data a field called sentiment analysis has emerged. Sentiment analysis is a way of analysing texts with the purpose of extracting their sentiment value. There are many different methods for performing sentiment analysis on texts; one of them is called lexical analysis. Lexical analysis polarizes a given text as either positive, neutral or negative by scoring the sentiment of the individual words with the help of a pre-defined lexicon (Thelwall et al. 2010).

By using sentiment analysis, this thesis aims to investigate the possibility of predicting the outcome of a competition where the viewers' opinions play a great part in the result. A popular Swedish music competition is Melodifestivalen, in which artists compete with song and dance performances. The winner is partially determined by which artist receives the highest number of votes cast by the viewers of the competition. Recently the competition has encouraged its viewers to interact with the show by tweeting using the competition's official hashtag (Account 2017). This has created the opportunity to find out people's opinions about the competing artists before the final voting results have been presented on live television.

1.1 Problem Statement

The aim of this thesis is to investigate how well a sentiment analysis of what people post online can predict the outcome of a competition based mostly on votes. This will give insight into how well sentiment analysis can be applied to predicting future events and how it may best be used. The research question follows:

• Is it possible to predict the outcome of the competition Melodifestivalen by doing a sentiment analysis of tweets posted by viewers during the show, before the result is announced?

1.2 Scope

The focus of this report is to perform sentiment analysis on posts made by viewers of the show on social media, specifically on Twitter. Limiting the research to a single social media platform makes it possible to be more specific in the data gathering, and the decision to only look at posts on Twitter was made because of the limited length of the tweets and the ease of finding related data by looking at hashtags. This study is based on the assumption that the fetched tweets are a sample of how the viewers of the competition are planning to vote.

A decision was made to look only at tweets written in Swedish and English, since the majority of tweets were expected to be written in these languages. This decision was also based on the difficulty of guaranteeing the quality of lexicons in other languages.

Finally, the voting period coincides with the air time of the competition, and thus this research only analyses tweets posted during that time. The tweets included in the study will therefore only reflect the sentiment of the viewers before the winner was announced.

Chapter 2

Background

This chapter aims to introduce the competition Melodifestivalen and the concept of data mining using Twitter. It will also present the concept of sentiment analysis, and the method of lexical sentiment analysis will be explained in more detail. After that, a method for comparing two lists to each other will be shown. Lastly, some related work will be presented.

2.1 Melodifestivalen

Melodifestivalen is an annual Swedish music competition arranged by the Swedish national public TV broadcaster Sveriges Television (SVT). The competition is divided into a couple of qualification heats before the final. Twelve artists compete in every heat by presenting their own musical performance on stage, and two winners from each heat qualify for the grand finale with a chance of winning the whole competition. The winner of the grand finale is decided by a jury in combination with the popular vote, and will later represent Sweden in competing against other European countries. The jury gives half of the final points and the popular vote decides the other half, and the song with the highest combined score is the winner. During the final of the 2016 competition, roughly 12.5 million votes were registered over telephone (SVT 2016), which shows how popular this contest is in Sweden. In the last couple of years, Melodifestivalen has had its own hashtag, #melfest, on Twitter (Account 2017), to encourage the viewers of the show to tweet their opinions about the competition.


2.2 Data-Mining using Twitter

Twitter is a microblogging network with 319 million monthly active users in the fourth quarter of 2016 (Twitter 2017). The users can write their own short blog posts, so called tweets, or they can retweet other people's posts. Hashtags are used to relate tweets to a certain subject. Twitter provides several Application Programming Interfaces (APIs) online, which give access to and the ability to use data from Twitter. The Twitter database has become a popular and common way for companies to analyze their customers' opinions on their products (Brandwatch 2017) and for scientists to predict the rise and fall of stock markets and other events (Pagolu et al. 2016).

One way of retrieving tweets with content related to a live ongoing event is by using Twitter's streaming API. Through this streaming API one gets access to Twitter's global stream of tweets in real time as they are posted by the users to the global feed. In order to only receive tweets related to a specific event, the API provides a filter function, which lets the user filter the global stream to only show tweets containing, for example, a specific hashtag (D. D. Twitter 2017).

Many of the tweets on the social network are part of a bigger conversation or are aimed towards specific receivers. It is possible to identify these tweets and the targeted user by examining the mentions. This helps the data mining process with classifying the tweet into the right context.

2.3 Sentiment Analysis

Sentiment analysis is the concept of extracting the emotions from the contents of a text. When extracting a sentiment from a text it is common to polarize the text down to three sentiments, positive, neutral or negative, rather than a range of emotions such as sad, angry, happy etc. However, there are ways to extract a greater range of feelings by looking at smaller portions of the text or even the individual words (Thelwall et al. 2010).

It is suggested to divide the sentiment analysis process into two or three steps. The first step is to split the text into smaller portions. The second step is to discard the objective text parts, since they do not display any emotions; lastly, the subjective parts of the text remain and are ready to be analysed (Thelwall et al. 2010). Common methods of extracting the sentiment of a text are decision trees, machine learning and lexical analysis.

2.3.1 Lexical Analysis

Lexical analysis is a method for performing sentiment analysis on a text. In lexical analysis, the words in a text are matched against a predefined lexicon in which words are given a value that conveys the estimated sentiment of the word. A common range of sentiment values is -5 to +5, with negative values representing negative sentiment and positive values representing positive sentiment. This score is set by the author of the lexicon used. The complete sentiment value of a text is the total sum of the values of its words (Sommar and Wielondek 2015).

The word lexicon that this thesis uses is called AFINN. While there are multiple lexicons available, most of them have been around for a long period of time. This means that many of the word lists may not contain new words, phrases and/or slang often used on the internet. The fact that older lexicons were not accurate in extracting the correct sentiment out of a small text from online social media motivated Nielsen to create the AFINN lexicon (Nielsen 2011). The AFINN word lexicon consists of 2477 words and phrases based on what is most often found in tweets. Alex Gustafsson made a Swedish translation of the AFINN word lexicon, making it available for use on tweets in Swedish (Gustafsson 2017).
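To make the scoring rule concrete, the following is a minimal sketch in Python of lexicon-based scoring as described above. The lexicon entries shown are illustrative stand-ins and not values taken from the actual AFINN list, and tokenisation is deliberately simplified to whitespace splitting.

    # Minimal sketch of lexicon-based sentiment scoring.
    # The entries below are illustrative stand-ins, not real AFINN values.
    AFINN_LIKE_LEXICON = {
        "love": 3,
        "like": 2,
        "annoying": -2,
        "hate": -3,
    }

    def sentiment_score(text, lexicon):
        """Sum the lexicon values of all words found in the text."""
        return sum(lexicon.get(word, 0) for word in text.lower().split())

    def polarity(score):
        """Polarize the total score into positive, neutral or negative."""
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"

    # Example: the positive and negative words cancel out, giving a neutral tweet.
    tweet = "I really like this song but the chorus is annoying"
    print(sentiment_score(tweet, AFINN_LIKE_LEXICON))  # 2 + (-2) = 0
    print(polarity(0))                                 # "neutral"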

2.4 Kendall tau distance

The Kendall tau distance is a well-known measure of how closely related two lists are to each other. The value derived from the Kendall tau distance corresponds to the number of pairwise disagreements between the two lists. The number of disagreements is normalised to obtain a value between 0 and 1, where 0 means the lists are identical, while a value of 1 is obtained when every pairwise ranking of items differs in order.

Since the original Kendall tau distance only applies to two sets containing the same elements, the regular Kendall tau distance can not be applied to the top 5 ranking lists, since the same artists do not always appear in the top 5 of both lists. Fagin, Kumar, and Sivakumar (2003) propose a fix to the Kendall tau distance to make it work on top lists where elements might not appear in both lists. This method introduces a penalty that extends the distance between the lists, based on certain criteria, when an element does not appear in both lists.
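A small sketch of how the normalized Kendall tau distance can be computed for two rankings of the same items is given below. The pair-counting approach follows the description above; the helper itself is written for illustration and is not taken from the thesis code.

    from itertools import combinations

    def normalized_kendall_tau_distance(list_a, list_b):
        """Normalized Kendall tau distance between two rankings of the same items.

        Counts the item pairs ordered differently in the two lists and divides by
        the total number of pairs, giving a value between 0 (identical order) and
        1 (completely reversed order).
        """
        assert set(list_a) == set(list_b), "both rankings must contain the same items"
        pos_a = {item: rank for rank, item in enumerate(list_a)}
        pos_b = {item: rank for rank, item in enumerate(list_b)}
        disagreements = sum(
            1
            for x, y in combinations(list_a, 2)
            if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
        )
        n = len(list_a)
        return disagreements / (n * (n - 1) / 2)

    # One swapped pair out of three possible pairs gives a distance of 1/3.
    print(normalized_kendall_tau_distance(["a", "b", "c"], ["a", "c", "b"]))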

2.5 Related works

Predicting a specific outcome based on Twitter data is nothing new. One can read all kinds of studies where people predict different events based on tweets, such as the rise and fall of stock markets (Pagolu et al. 2016). Other attempts have been made to predict golf and basketball players' performance in games based on the sentiment of the players' tweets. Xu and Yu (2015) could make a connection between basketball players' physical performance and the sentiment of the athletes' tweets, while Abdelmassih and Hultman (2016) were not able to make a connection between golf players' tweets and their performances.

Caplar (2015) conducted a pre-study on analysing the most mentioned competitors in the Eurovision Song Contest by looking at tweets from Twitter. From the amount of tweets he could derive a prediction of how the public might vote and which artist was most likely to win the contest. This model has predicted the results of the competition for three years in a row with varying results. His study looked only at the amount of tweets mentioning an artist, and the model did not take the sentiment of the tweets into account. His prediction showed a correlation between how much a song is tweeted about and the winner of the contest.

Johansson and Lilja (2016) did a pre-study comparing three different methods of extracting the sentiment value from three different data sets. One of these data sets was a collection of randomly collected tweets. The methods used were lexical analysis using the word list AFINN, a decision tree, and a Naive Bayes machine learning algorithm. In their study they found that for shorter texts like tweets the lexical analysis performed significantly better than the two other methods used (Johansson and Lilja 2016).

Chapter 3

Method

This chapter describes the method of the thesis and the technical implementation of the techniques previously explained. First, a description of the data gathering process will be presented, then how the data was pre-processed before the prediction process. Lastly, the methods for producing results from the pre-processed data will be presented. The code used in this chapter for fetching tweets, processing data and calculating Kendall tau distances can be found in the git repository at the link: https://gits-15.sys.kth.se/akoski/twitter-sentiment-ranking.

3.1 Data-gathering

In this study one stream connection was set up with the API using the filter #melfest. This hashtag is the commonly used hashtag in tweets related to the Swedish Melodifestivalen. The airtime of the competition was between 8 pm and 10 pm. We started collecting data through the previously mentioned stream at half past 7 pm, and the stream collected tweets up until a quarter past 10 pm. The total amount of tweets collected was 46,378, with 17,288 tweets in Swedish and 13,802 tweets in English. See the language distribution in figure 3.1.

When looking at the amount of tweets plotted over time, there is a significant increase in tweets containing the official hashtag of Melodifestivalen, #melfest, during the show's airtime, as seen in figure 3.2. The graph shows a spike in the number of tweets posted at around 21:57. It was around this time that the winner of the competition was announced. These tweets were excluded from the calculations in this study, since the aim was to explore whether it is possible to predict the outcome of the competition before the winner is announced.

The tweets were stored with the following information: the timestamp of when the tweet was posted on Twitter, the tweeting user's username, the contents of the tweet and the language of the tweet. The language of a tweet is determined automatically by Twitter's own algorithms. The tweets were stored as JSON objects in a text file.
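The thesis does not specify which client library was used against the streaming API. As an illustration only, the sketch below shows how the described setup (filtering the global stream on #melfest and storing each tweet as a JSON object in a text file) could look with the tweepy 3.x library and placeholder credentials; the repository code may be organised differently.

    import json
    import tweepy

    class HashtagListener(tweepy.StreamListener):
        """Writes every incoming tweet as one JSON object per line."""

        def __init__(self, outfile):
            super().__init__()
            self.outfile = outfile

        def on_status(self, status):
            record = {
                "timestamp": status.created_at.isoformat(),  # when the tweet was posted
                "user": status.user.screen_name,              # the tweeting user's username
                "text": status.text,                          # the contents of the tweet
                "lang": status.lang,                          # language set by Twitter's algorithms
            }
            self.outfile.write(json.dumps(record) + "\n")

    # Placeholder credentials; real keys are obtained from the Twitter developer portal.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

    with open("melfest_tweets.jsonl", "a") as outfile:
        stream = tweepy.Stream(auth=auth, listener=HashtagListener(outfile))
        stream.filter(track=["#melfest"])  # only tweets containing the hashtag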

Figure 3.1: The chart shows the language distribution of the collected tweets

3.2 Data pre-processing

The first step was to sort the tweets by language, where only the Swedish and English tweets were kept for further processing. Then, the tweets sent out after the winner was announced were also filtered away. In order to make a prediction based on the gathered tweets, the tweets were mapped to the different competitors. This was done by considering two factors: mentions related to a specific performance and the timestamp of the tweet.

Tweets with a mention of an artist's name, Twitter username (if such exists), song title or song number were mapped to that performing artist. This is a directly linked tweet, where the relevance of the content is close to guaranteed.

Figure 3.2: The graph shows the amount of tweets gathered using the official hashtag over time.

Tweets that mentioned multiple artists or songs were mapped to the artist or song first mentioned within the tweet, since many of the tweets with several artists mentioned included some sort of ranking with the highest ranked artists mentioned first. See the quote below.

“My top 12 of #Melfest final 1. Jon Henrik & Aninia 2. Robin 3. Wiktoria 4. Anton 5. Owe 12. The rest”

The next step was to filter out the irrelevant tweets, to make sure that tweets with mentions of no relevance were not taken into account. This was done by looking at mentions of the show hosts, the mid-time entertainment etc. and removing them. For example, tweets mentioning the artist who performed during the mid-time entertainment were removed.

Finally, the timetable of each performance during the competition was received from SVT on request, and the timestamps of the tweets were matched against that information. All tweets posted within a time period of 1 minute and 40 seconds before and after each song were considered related to that particular song. The time limit was decided based on the shortest time between any two songs during this competition. The tweets that remained unclassified after this filtering process were disregarded.

After pre-processing the data, a total of 18,497 tweets sorted and mapped to the different competitors remained.
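The mapping step itself is not reproduced from the repository. The sketch below illustrates the two factors described above under assumed data structures: a hypothetical keyword map per competitor (name, username, song title) and a hypothetical timetable of performance times, with the 1 minute and 40 second window applied around each performance time as a simplification.

    from datetime import datetime, timedelta
    from typing import Optional

    # Hypothetical keyword map: mentions that link a tweet directly to a competitor.
    ARTIST_KEYWORDS = {
        "Robin": ["robin", "i can't go on"],
        "Nano": ["nano", "hold on"],
    }

    # Hypothetical timetable of performance times (the real one was received from SVT).
    TIMETABLE = {
        "Nano": datetime(2017, 3, 11, 20, 10),
        "Robin": datetime(2017, 3, 11, 20, 35),
    }
    WINDOW = timedelta(minutes=1, seconds=40)

    def map_tweet(text: str, timestamp: datetime) -> Optional[str]:
        """Map a tweet to a competitor: direct mentions first, then the timetable."""
        lowered = text.lower()
        # Direct link: when several competitors are mentioned, the first mention wins.
        first_hits = []
        for artist, keywords in ARTIST_KEYWORDS.items():
            positions = [lowered.find(k) for k in keywords if k in lowered]
            if positions:
                first_hits.append((min(positions), artist))
        if first_hits:
            return min(first_hits)[1]
        # Fall back on the performance time window.
        for artist, performance_time in TIMETABLE.items():
            if abs(timestamp - performance_time) <= WINDOW:
                return artist
        return None  # unclassified tweets are disregarded

    print(map_tweet("Wow, Nano before Robin for me tonight! #melfest",
                    datetime(2017, 3, 11, 21, 0)))  # -> "Nano" (first mention)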

3.3 Producing ranking lists

When producing the final result, the lexical analysis was performed first, then six different ways of ranking the results were applied, and finally a Kendall tau distance comparison of the ranking lists was made to give an indication of how accurate the results were. The methods used are presented here.

3.3.1 Scoring the tweets using lexical analysis

The Swedish lexicon used was provided in code by Gustafsson (2017) and the English lexicon by Sliwinski (2017). Only the tweets that were successfully linked to an artist were scored. The positive, negative and neutral scores were stored separately, to allow the scores to be analysed from different angles.

Below are five tweets collected from the study. The tweets have been chosen to illustrate what positive, negative and neutral tweets look like in relation to their scored sentiment value. The sentiment is, as mentioned in section 2.3.1, the sum of the individual words' sentiment values as determined by the lexicon. The score is shown in the leftmost column of table 3.1 and the tweet in the rightmost column.

Table 3.1: Sample of scored tweets

Value  Tweet
 7     #melfest And here’s the winner! Love me some old school d’n’b when the beats kick in.
 2     I actually really like Anton’s song #Melfest https://t.co/Cow7BjaXi1
 0     I would like this song a lot more if that annoying dubstept would be removed. #Melfest
-2     Of the 7 songs I don’t like this is the one I find most tolerable. #MelFest
-7     I’m just gonna say it though: I fucking hate this song. SO much. Someone has to, mmk? #melfest

3.3.2 Metrics to calculate from the existing data

There are many different ways of ranking the artists based on the data received and processed from Twitter. We have chosen to create six different rankings, derived from the following metrics:

• Number of tweets (NT)

• Number of positively scored tweets (NTP)

• The sum of the positive sentiment values (SSP)

• The average positive sentiment value for a tweet (ASP)

• The average sentiment value for a tweet, excluding tweets with a sentiment value of 0 (AS)

• The average sentiment value for a tweet, including tweets with a sentiment value of 0 (AS0)

A distinction was made between including and excluding the neutral tweets (those with a sentiment score of 0) in the result. Since a lot of tweets were scored with a value of 0, the average values differ depending on whether these neutral tweets are included or excluded.

More attention was also put on the tweets with a positive sentiment value rather than the negative ones, since there is no way of downvoting an artist. It was assumed that the positive sentiment would show a more accurate relation to the voting result.
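As an illustration of these six metrics, the sketch below computes them from a mapping of each artist to the list of sentiment scores of their linked tweets. The input format and function names are assumptions made for the example, not taken from the repository.

    # Sketch of the six metrics, computed per artist from the tweet sentiment scores.
    def compute_metrics(scores_per_artist):
        """scores_per_artist: dict mapping artist -> list of tweet sentiment scores."""
        metrics = {}
        for artist, scores in scores_per_artist.items():
            positive = [s for s in scores if s > 0]
            nonzero = [s for s in scores if s != 0]
            metrics[artist] = {
                "NT": len(scores),                                       # number of tweets
                "NTP": len(positive),                                    # positively scored tweets
                "SSP": sum(positive),                                    # sum of positive sentiment values
                "ASP": sum(positive) / len(positive) if positive else 0, # average positive sentiment
                "AS": sum(nonzero) / len(nonzero) if nonzero else 0,     # average, neutral tweets excluded
                "AS0": sum(scores) / len(scores) if scores else 0,       # average, neutral tweets included
            }
        return metrics

    def ranking(metrics, key):
        """Rank the artists from best to worst on a single metric, e.g. ranking(m, "SSP")."""
        return sorted(metrics, key=lambda artist: metrics[artist][key], reverse=True)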

3.3.3 Kendall Tau comparison

The six ranking lists mentioned above were compared individually to two reference rankings. The reference rankings are the actual rankings based on the outcome of the competition. The two reference lists are described below:

• Ranking of the artists according to only the viewers' votes (Viewers Ranking)

• Final ranking of the artists derived from both the viewers' and the jury's votes (Final Ranking).

The comparison was made using the normalized Kendall tau distance. In competitions, the top ranking competitors generally draw more attention than the lower rankings; therefore a comparison was also made between the same set of lists restricted to the top 5 rankings, using the method provided by Fagin, Kumar, and Sivakumar (2003). The method is an enhancement of the regular Kendall tau distance, since the regular Kendall tau distance can only be applied to two lists containing the same set of elements, and the top 5 lists used here do not fulfil that criterion.
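The sketch below implements the case analysis of Fagin, Kumar, and Sivakumar (2003) for top-k lists, using the optimistic penalty p = 0 by default. Both the choice of p and the normalisation by the number of item pairs in the union of the two lists are assumptions, since the thesis does not state which variant its comparison used.

    from itertools import combinations

    def topk_kendall_distance(top_a, top_b, p=0.0):
        """Kendall distance with penalty parameter p for two top-k lists,
        following the case analysis of Fagin, Kumar, and Sivakumar (2003),
        normalised by the number of item pairs in the union of the lists."""
        pos_a = {item: rank for rank, item in enumerate(top_a)}
        pos_b = {item: rank for rank, item in enumerate(top_b)}
        penalty = 0.0
        union = sorted(set(top_a) | set(top_b))
        for i, j in combinations(union, 2):
            in_a = (i in pos_a, j in pos_a)
            in_b = (i in pos_b, j in pos_b)
            if all(in_a) and all(in_b):
                # Case 1: both items in both lists - penalise opposite orderings.
                if (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) < 0:
                    penalty += 1
            elif all(in_a) and any(in_b):
                # Case 2: both in A, one missing from B - the missing item is
                # implicitly ranked below everything in B, so penalise if A disagrees.
                present, missing = (i, j) if i in pos_b else (j, i)
                if pos_a[missing] < pos_a[present]:
                    penalty += 1
            elif all(in_b) and any(in_a):
                # Case 2 (mirrored): both in B, one missing from A.
                present, missing = (i, j) if i in pos_a else (j, i)
                if pos_b[missing] < pos_b[present]:
                    penalty += 1
            elif any(in_a) and any(in_b):
                # Case 3: each item appears in only one of the lists - always a disagreement.
                penalty += 1
            else:
                # Case 4: both items in one list, neither in the other - penalty p.
                penalty += p
        pairs = len(union) * (len(union) - 1) / 2
        return penalty / pairs

    # Two top-3 lists sharing two items: 2 disagreeing pairs out of 6 -> 1/3.
    print(topk_kendall_distance(["x", "y", "z"], ["y", "x", "w"]))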

Chapter 4

Result

In this chapter, the results obtained from the sentiment analysis and the Kendall tau distance comparison will be shown. The results will first be presented as six different ways of ranking the artists. Finally, the results of using the Kendall tau distance to compare these rankings with the two official rankings from the competition will be presented.

4.1 Six ways of ranking

Table 4.1 below contains the six metrics described in the method, section 3.3.2, calculated for each artist. The ranking predictions are derived column-wise based on the numbers in this table. The six different rankings derived from the columns in table 4.1 are shown in table 4.2. The two rightmost columns of table 4.2 are the reference rankings used in the comparison to calculate the Kendall tau distance, see section 3.3.3.

Table 4.1: Artists' scores and other metrics

Name      NT    NTP   SSP   ASP   AS    AS0
Ace       1675  565   1942  3.45  2.15  0.7
Boris     808   398   1253  3.15  2.1   1.15
Lisa      1047  358   1055  2.95  0.45  0.85
Robin     2592  1073  3647  3.4   2.0   0.85
Jon       1806  802   2587  3.25  1.85  0.7
Anton     1432  583   1885  3.25  1.8   0.75
Mariette  1488  680   2324  3.45  2.45  1.25
FO&O      1446  466   1355  2.9   1.1   0.65
Nano      1468  725   2590  3.55  2.3   1.35
Wiktoria  1789  750   2653  3.55  2.05  1.15
Benjamin  1385  549   1668  3.05  1.4   0.65
Owe       1561  654   1963  3.0   1.45  0.65

Table 4.2: Artist ranking according to table 4.1

Rank  NT        NTP       SSP       ASP       AS        AS0       Viewers Ranking  Final Ranking
1     Robin     Robin     Robin     Nano      Mariette  Nano      Nano             Robin
2     Jon       Jon       Wiktoria  Wiktoria  Nano      Mariette  Wiktoria         Nano
3     Wiktoria  Wiktoria  Nano      Ace       Ace       Boris     Robin            Jon
4     Ace       Nano      Jon       Mariette  Boris     Wiktoria  Jon              Mariette
5     Owe       Mariette  Mariette  Robin     Wiktoria  Lisa      Mariette         Benjamin
6     Mariette  Owe       Owe       Jon       Robin     Robin     Anton            Wiktoria
7     Nano      Anton     Ace       Anton     Jon       Anton     FO&O             Ace
8     FO&O      Ace       Anton     Boris     Anton     Ace       Benjamin         Boris
9     Anton     Benjamin  Benjamin  Benjamin  Owe       Jon       Ace              Lisa
10    Benjamin  FO&O      FO&O      Owe       Benjamin  FO&O      Owe              Anton
11    Lisa      Boris     Boris     Lisa      FO&O      Benjamin  Boris            FO&O
12    Boris     Lisa      Lisa      FO&O      Lisa      Owe       Lisa             Owe

Looking at the predicted rankings, one can see that Robin was in first place three out of six times, which corresponds well with the official final ranking where Robin came in first place. Robin achieved the highest ranking based on the total number of tweets (NT), the total number of positive tweets (NTP) and the total sum of positive sentiment values (SSP). However, Robin ranked in 6th place when looking at the overall average sentiment value (AS) and the average sentiment value including zero (AS0). It is also worth noticing that Nano was predicted first in two out of six rankings, as he was the highest ranked artist in the viewers' official ranking.

When comparing the viewers' ranking with the predicted rankings, one may notice that the top 5 rankings of the sum of the positive sentiment values (SSP) and the total number of positive tweets (NTP) include the same 5 artists as the public vote, although in a slightly different order.

4.2 Kendall tau distance comparison

In this section, the results from the Kendall tau distance comparison will be shown. First the comparison with the complete lists of rankings will be presented, and then the comparison with the top 5.

4.2.1 Complete list of rankings

Table 4.3 shows the normalized Kendall tau distances derived from comparing the predicted rankings with the two official reference rankings from section 3.3.3. From the values in table 4.3 it is possible to calculate the average Kendall tau distance. The average distance between the predicted rankings and the Viewers ranking, 0.472, is shorter than the average distance between the predicted rankings and the Final ranking, 0.482. This indicates a better correlation between the predicted rankings and the Viewers ranking than with the Final ranking. Collectively, the average for the whole of table 4.3 is 0.477 with a standard deviation of 0.078.

The smallest distance, i.e. the best correlation, was achieved when comparing the predicted ranking based on the total amount of tweets (NT) with the Viewers ranking, followed by the distances for the rankings based on average sentiment values (AS0 and ASP), also in relation to the Viewers ranking.

Table 4.3: Tau comparison for full rankings

                 NT     NTP    SSP    ASP    AS     AS0
Viewers Ranking  0.364  0.606  0.47   0.424  0.576  0.394
Final Ranking    0.561  0.5    0.455  0.5    0.379  0.5
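For reference, the averages and the standard deviation quoted above can be reproduced directly from table 4.3; treating the reported 0.078 as a sample standard deviation is an assumption.

    import statistics

    # Distances from table 4.3 (full-list comparison).
    viewers = [0.364, 0.606, 0.47, 0.424, 0.576, 0.394]
    final = [0.561, 0.5, 0.455, 0.5, 0.379, 0.5]

    print(sum(viewers) / len(viewers))        # ~0.472, average distance to the Viewers ranking
    print(sum(final) / len(final))            # ~0.482, average distance to the Final ranking
    print(sum(viewers + final) / 12)          # ~0.477, average over the whole table
    print(statistics.stdev(viewers + final))  # ~0.078, sample standard deviation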

4.2.2 Top 5 list of rankings

Table 4.4 shows the normalized Kendall tau distances derived from making the same comparisons as in section 4.2.1, but this time only focusing on the top 5 rankings. Again, from the values in table 4.4 it is possible to calculate the average Kendall tau distance. The average distance in table 4.4 for the Viewers ranking, 0.229, is also in this table smaller than the average distance for the Final ranking, 0.271. The general average distance for the whole table, 0.25, is also smaller than the mean distance in table 4.3, but with a slightly larger standard deviation of 0.162.

The lowest distance appears when comparing the sum of positive sentiment (SSP) to the Viewers Ranking. The same ranking, SSP, also has the lowest distance to the Final Ranking, shared with the ranking derived from the number of positively scored tweets (NTP).

Table 4.4: Tau comparison for top 5 rankings

                 NT     NTP    SSP    ASP    AS     AS0
Viewers Ranking  0.244  0.111  0.067  0.111  0.576  0.267
Final Ranking    0.356  0.089  0.089  0.289  0.379  0.422

Chapter 5

Discussion

In this chapter the different ways of making a prediction and the results of these predictions will be analysed. An attempt at explaining and comparing the results will be made. The topic of limitations of the results will also be raised, as well as possible further research.

5.1 Real World Comparison

Comparing the results from the six different ways of ranking, the sum of positive sentiment values (SSP) had the shortest distance of all the predicted rankings to the Viewers ranking, but its distance to the Final ranking is slightly longer. One of the reasons is that Benjamin, who had received a low amount of votes from the viewers, as well as a low score in all of our ranking systems, was ranked high by the jury.

Since the final result of the competition is not solely based on the viewers' voting pattern, the six different predictions managed to predict the viewers' voting pattern better than the final result. This is confirmed by comparing the distances to the Viewers ranking and the Final ranking, since the Viewers ranking got a lower average Kendall tau distance. See sections 4.2.1 and 4.2.2.

The results also suggest that the best way of predicting the outcome is not to look at the average sentiment of a tweet (AS, AS0 and ASP), since it is not interesting to know how strongly someone feels about an artist. It is simply more interesting to know the amount of people tweeting with a positive sentiment, since they might vote. Loving or liking an artist does not in itself result in a more valuable vote, since voting is a binary action.

This research managed to correctly guess Robin as the winner of the competition in 3 of the 6 different ranking predictions. The first (1) of these was by only looking at the number of tweets (NT) mentioning the artist, which does not involve sentiment analysis at all. The second (2) prediction that guessed the winner correctly was based on the number of positive tweets (NTP). This result was expected, since the previously mentioned prediction, using only the number of tweets (NT), was successful. Table 4.1 shows that the average sentiment of all the tweets is roughly the same for all artists, so when one artist receives more tweets with the same average as artists with fewer tweets, that artist still gets ahead in the ranking based on the total amount of positive tweets. The last prediction (3) was based on the sum of positive sentiment values (SSP) mapped to the artists. Robin received the greatest number of positive tweets, so when adding them up the result is not surprising.

The sum of positive sentiment values (SSP) has shown to be correct both in guessing the winner and in achieving the shortest distance to the real outcome of the competition when looking at the top 5. This prediction was also the closest one to the Viewers Ranking. Considering these results, it is clear that in this case SSP was the best method for making a prediction of Melodifestivalen.

The average of the distances for the full ranking lists was significantly higher than the average of the distances for the top 5 rankings. The deviation was low for both data sets. Therefore a better prediction of the outcome can be made by only looking at the top 5 rankings instead of the whole list.

However, the rankings show that while it is relatively easy to get a good estimation of the overall result of the competition, it is hard to predict the specific rankings of the artists. There might still be some misplacements among the predicted rankings of the artists, since a low Kendall tau distance only indicates an overall similarity between lists.

5.2 Limitations

While conducting this study, some aspects stood out as possible limitations on our results.

One of them was the difficulty of making sure that the lexicons were properly set up with currently popular Internet slang. In this case, there are reasons to believe that the lexicons used in this study were a bit outdated. For example, there was a tweet with the content “She slayed yassss #melfest”, which was given a neutral score instead of the positive sentiment that the poster intended. This was because the lexicon recognized neither the word “slayed” nor the word “yassss” as something positive.

There is also the effect of the lexicon not recognising words that are misspelled, resulting in a lot of tweets not being given the correct sentiment value. A suggestion for future research in this area would be to perform a spell check on the tweets before doing the lexical analysis.

It was also noticed that some of the translations of the words in the Swedish lexicon were not as accurate as they should have been. For example, the word “pretty” in the English lexicon was given a score of +1, suggesting that the intended meaning of the word is “attractive” or “beautiful”, rather than the neutral synonym of “slightly”. However, in the Swedish lexicon the word “pretty” was translated into “ganska”, the Swedish word for “slightly”, giving a positive sentiment to a word that most likely is neutral.

Comparing the amount of tweets initially gathered with the amount of tweets actually analysed, a large gap is noticed. Roughly a third of the tweets were filtered away in the first step of keeping only the tweets in English and Swedish. This opens up a discussion of how the tweets in other languages might have altered the results. However, given the inability to guarantee the quality of lexicons in other languages, the accuracy of those results would be questionable. The fact that people outside of Sweden can not vote in the competition also justifies the choice to only look at tweets in Swedish and English, since these are the most widely used languages in the country. See figure 3.1.

Another complication that might have affected the result is if a few users tweet a lot. These few users could express an opinion that gets interpreted as the opinion of a whole population, since this study does not apply any weights to lower the importance of tweets posted from accounts with many other posts.

It should also be considered that the study was only conducted on a single competition. This is most likely not enough to confirm a pattern in how to predict the outcome of Melodifestivalen using sentiment analysis.

5.3 Further Research

While this research did not give a conclusive answer, it has given insight into how to predict the outcome of future events by performing sentiment analysis on tweets. Performing the same tests as in this research on future competitions, tentatively future finals of Melodifestivalen or competitions with a similar setup, might make it possible to notice a pattern in the sentiment analysis. A pattern would really help in understanding how to gain the most useful information while performing sentiment analysis.

It would be interesting to undertake research on how to best set up a lexicon to ensure that modern online language is represented in it, so as to improve the accuracy when performing lexical analysis on text posted on social media. Another aspect that could improve the accuracy is to analyse the emoticons used in the tweets and add them to the sentiment value.

Chapter 6

Conclusion

The results show that it is possible to make a rough prediction of the outcome of the competition Melodifestivalen by doing sentiment analysis on tweets posted by viewers of the show. However, the methods described in this study were not on their own enough to achieve a perfect prediction. The prediction of the top 5 part of the outcome was more accurate than the prediction of the full outcome. The most accurate prediction was made by looking at the sum of positive sentiment for a song. Comparing the number of tweets with the sum of positive sentiment shows that including sentiment analysis gives a better prediction than solely looking at the number of tweets. This supports the conclusion that it is possible to use sentiment analysis to predict the outcome of this competition, and that it is better than the other option in this study. However, this study would need to be repeated on more competitions to find the overall best way to make a prediction using sentiment analysis.

The techniques used in this study can also be applied to concert evaluation, general elections and other situations where it would be interesting to derive rankings based on people's opinions. This prediction process could easily be modified to run at the same time as the election process is happening, to get a real-time prediction of the outcome.

Bibliography

Aase, Kim-Georg (2011). Text mining of news articles for stock price predictions.

Abdelmassih, Christian and Axel Hultman (2016). Förutspå golfresultat med hjälp av sentimentanalys på Twitter.

Account, Melodifestivalen (2017). Melodifestivalen (@SVTmelfest). URL: https://twitter.com/svtmelfest (visited on 06/05/2017).

Allström, Andreas (2005). “Korttidsprediktering av restider med Holt-Winters metod”. MA thesis. Linköping University, The Institute of Technology, p. 48.

Brandwatch (2017). Brandwatch Analytics. URL: https://www.brandwatch.com/brandwatch-analytics/ (visited on 06/05/2017).

Caplar, Neven (2015). Predicting eurovision 2015 scores from twitter data. URL: http://astrodataiscool.com/2015/05/predicting-eurovision-2015-scores-from-twitter-data (visited on 06/05/2017).

Chin, Delenn, Anna Zappone, and Jessica Zhao (2016). Analyzing Twitter Sentiment of the 2016 Presidential Candidates.

Fagin, Ronald, Ravi Kumar, and Dakshinamurthi Sivakumar (2003). “Comparing top k lists”. In: SIAM Journal on Discrete Mathematics 17.1, pp. 134–160.

Gustafsson, Alex (2017). Swedish translated, AFINN-based sentiment analysis for Node.js. URL: https://github.com/AlexGustafsson/sentiment-swedish (visited on 02/26/2017).

Johansson, Henrik and Anton Lilja (2016). Method performance difference of sentiment analysis on social media databases: Sentiment classification in social media.

Nielsen, Finn Årup (2011). “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs”. In: CoRR abs/1103.2903. URL: http://arxiv.org/abs/1103.2903.


Pagolu, Venkata Sasank et al. (2016). “Sentiment Analysis of Twitter Data for Predicting Stock Market Movements”. In: CoRR abs/1610.09225. URL: http://arxiv.org/abs/1610.09225.

Sliwinski, Andrew (2017). AFINN-based sentiment analysis for Node.js. URL: https://github.com/thisandagain/sentiment (visited on ).

Sommar, Fredrik and Milosz Wielondek (2015). Combining Lexicon- and Learning-based Approaches for Improved Performance and Convenience in Sentiment Classification.

SVT (2016). Svenska folkets röster, hjärtröster, telefon och SMS. Alla röstningssiffror Melodifestivalen 2016. URL: http://www.svtstatic.se/image-cms/svtse/1458075562/melodifestivalen/article7181233.svt/BINARY/Alla%20r%C3%B6stningssiffror%20Melodifestivalen%202016.pdf (visited on 06/05/2017).

Thelwall, Mike et al. (2010). “Sentiment strength detection in short informal text”. In: Journal of the American Society for Information Science and Technology 61.12, pp. 2544–2558. ISSN: 1532-2890. DOI: 10.1002/asi.21416. URL: http://dx.doi.org/10.1002/asi.21416.

Twitter (2017). Quarterly Results. URL: https://investor.twitterinc.com/results.cfm (visited on 06/05/2017).

Twitter, Development Documents (2017). Streaming APIs. URL: https://dev.twitter.com/streaming/overview (visited on 06/05/2017).

Xu, Chenyan and Yang Yu (2015). “Measuring NBA Players' Mood by Mining Athlete-Generated Content”. In: System Sciences (HICSS), 2015 48th Hawaii International Conference on. IEEE, pp. 1706–1713.