Institutionen för datavetenskap Department of Computer and Information Science

Bachelor's Thesis (Examensarbete)

Twitter as the Second Channel

by Matteus Hemström and Anton Niklasson

LIU-IDA/LITH-EX-G--14/063--SE

2014-06-04

Linköpings universitet, Institutionen för datavetenskap, SE-581 83 Linköping, Sweden


Supervisor: Niklas Carlsson
Examiner: Nahid Shahmehri

Students in the five-year Information Technology program complete a semester-long software development project during their sixth semester (third year). The project is completed in mid-sized groups, and the students implement a mobile application intended to be used in a multi-actor setting, currently a search and rescue scenario. In parallel they study several topics relevant to the technical and ethical considerations in the project. The project culminates in a demonstration of a working product and a written report documenting the results of the practical development process, including requirements elicitation. During the final stage of the semester, students form small groups and specialise in one topic, resulting in a bachelor thesis. The current report presents the results obtained during this specialisation work. Hence, the thesis should be viewed as part of a larger body of work required to pass the semester, including the conditions and requirements for a bachelor thesis.

Abstract

People share a big part of their lives and opinions on platforms such as Facebook and Twitter. The companies behind these sites do their best to collect as much data as possible. This data can be used to extract opinions in many different ways. Every company, organization or public person is probably curious about what is being said about them right now. There are also areas where opinions are related to the outcome of an event, for example presidential elections or the Eurovision Song Contest. In these events, people's votes directly determine the outcome of the election or contest. We have developed a simple prototype that predicts the result of the Eurovision Song Contest using sentiment analysis on tweets. The prototype collects tweets about the event, performs sentiment analysis, and uses different filters to predict the ranks of the contestants. We evaluated our results against the actual voting results of the event and found a Pearson correlation of approximately 0.65. With more time and resources we believe that it is possible to create a highly accurate prediction model. It could be used in many different contexts: politicians and their parties could use it to evaluate their campaigns, the press could use it to create more interesting news reports, and companies could investigate how their brand is perceived. A system like this could be useful in many different fields.

Contents

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Contributions

2 Theory and Related Work
  2.1 Sentiment Analysis
  2.2 Collaborative Filtering

3 Expectations

4 Methodology
  4.1 Overview
  4.2 Data Collection
  4.3 Sentiment Analysis
  4.4 Visualization
  4.5 Filters

5 Results
  5.1 Dataset Characteristics
  5.2 Sentiment Analysis
  5.3 Prediction Results

6 Discussion
  6.1 Methodology
    6.1.1 Filters
  6.2 Results
  6.3 Ethics

7 Conclusion
  7.1 Future Work

A Entity Mentions

B Language Distribution

C Correlation Plots

D Results From Visualization

Chapter 1

Introduction

1.1 Motivation

The Eurovision Song Contest is a very popular event in most countries across Europe. It engages hundreds of millions of people over the course of a few weeks each spring. The whole show is broadcast live by multiple TV channels and people gather at home to support their favourite act. Although most people interact with the contest by watching TV, there is a lot of activity in social media as well. The fact that the result of this contest is based on people's votes, and that people continuously share their opinions for anyone to read, creates a great opportunity for analysis. Our goal is to collect tweets via the Twitter API, analyse them in terms of sentiment and create a prediction of the final results. We would like to find out whether our result can predict the outcome of the event with sufficient accuracy.

1.2 Problem Statement

Using simple entity extraction and sentiment analysis, this thesis explores how information in tweets can be used to predict outcomes of competitions, such as the Eurovision Song Contest. The core question of this paper is:

• Is it possible to predict the result of the Eurovision Song Contest by running sentiment analysis on tweets related to the topic?


1.3 Contributions

We have created a system that we call Eagle. It includes three modules:

• Data Collection
  This module is responsible for talking to the Twitter REST API. It downloads tweets and users, and saves the data in a MySQL database.

• Data Analysis
  This module is responsible for extracting entities and analysing sentiment using our heuristics. The entity extraction connects the tweet with one or more of our pre-defined entities. We do this by simple comparisons between the tweet's body text and a list of identifiers that we have manually decided upon.

• Visualization
  This module is responsible for presenting the analysed tweets. It provides a simple web interface for building database queries. The website then presents a bar graph showing the result. It comes with 3 heuristics, 2 filters and an option for languages.

We have also collected 737,793 tweets tagged with #eurovision. Since Twitter does not allow API access to tweets older than 8-10 days, this data could be useful in other projects.

Chapter 2

Theory and Related Work

Only a couple of years ago, we had nowhere near as much data available as we do today. Since then, a lot of studies have been done in this field, and there is a growing interest in language analysis in commercial markets.

2.1 Sentiment Analysis

The concept of sentiment analysis is to determine opinions in text. Often the opinions are directed towards an entity; the entities are typically political parties or competitors of some kind. We are focusing on the Eurovision Song Contest, so let us use that as an example: we think of each individual act as an entity, which means that each tweet can express an opinion on each entity.

There are many tools for performing sentiment analysis. A popular technique to characterize sentiment in short texts is Linguistic Inquiry and Word Count (LIWC), which has been used to determine the sentimental value of tweets [4]. LIWC is a commercial product; The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods [9] is a paper describing how it was created.

Another algorithm that has been used for sentiment analysis of tweets is SentiStrength [10]. SentiStrength was designed to extract sentimental values from MySpace comments [10]. Thelwall et al. [11] claim that "[..] the accurate detection of sentiment is domain-dependant" and that the SentiStrength algorithm is suitable for Twitter comments since they are similar to MySpace comments. They also state that informal language and abbreviations may be common, which somewhat contradicts the work of Hu et al. [3], who claim that Twitter is an "evolving medium whose language is a projection of the language of more formal media like news and blogs into a space restricted by size".

2.2 Collaborative Filtering

As explained by Kim et al. [4], the entities and users can be organized in a matrix. Each column represents an entity, and each row a user. Each cell is then given a rating of a user's sentiment towards an entity. Users that share sentiment towards a few entities are likely to have similar opinions on other entities. It is therefore possible to extrapolate opinions by identifying similar users.

This is called collaborative filtering. It is most commonly used for recommending content to users. Even though our work is not centred around recommendations this is an interesting technique, as recommendations are basically predictions of opinions. While we did not implement any collaborative filtering in our analysis, we expect that it could be added as a complement to the summarization model that we developed.
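The idea can be illustrated with a short sketch. The Go fragment below is a minimal, hypothetical example (not part of Eagle): it stores the user-entity sentiment matrix as nested maps, measures user similarity with cosine similarity over shared entities, and predicts a missing rating as a similarity-weighted average.

```go
package main

import (
	"fmt"
	"math"
)

// Each user's row: aggregated sentiment towards each entity (contestant).
type ratings map[string]map[string]float64

// cosine similarity over the entities both users have rated.
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for e, ra := range a {
		if rb, ok := b[e]; ok {
			dot, na, nb = dot+ra*rb, na+ra*ra, nb+rb*rb
		}
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// predict a missing rating as a similarity-weighted average of other users.
func predict(r ratings, user, entity string) float64 {
	var num, den float64
	for other, row := range r {
		if other == user {
			continue
		}
		if val, ok := row[entity]; ok {
			s := cosine(r[user], row)
			num += s * val
			den += math.Abs(s)
		}
	}
	if den == 0 {
		return 0 // no overlapping users: fall back to a neutral prediction
	}
	return num / den
}

func main() {
	r := ratings{
		"alice": {"se": 3, "at": 4},
		"bob":   {"se": 3, "at": 4, "pl": 5},
	}
	// alice has not rated "pl"; the similar user bob suggests she would like it.
	fmt.Printf("predicted: %.1f\n", predict(r, "alice", "pl"))
}
```

In a complete system, such predicted ratings for users who never tweeted about an entity could complement the summed scores described later in this thesis.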

Chapter 3

Expectations

Our expectation for this project is that the system will be able to accurately predict three entities out of the actual top five, without any internal order or ranking. Initially we had the idea of predicting the complete results, but a few weeks in we felt that this was a bit too ambitious. Predicting 60 % of the top contestants is still useful while also achievable from our point of view. Predicting 60 % is relevant in the context of the Eurovision Song Contest; that number would not mean much if we were predicting a presidential election or something else with a lower number of entities. Having many entities makes the task harder in some ways. A big hurdle is deciding which entity a given tweet is mentioning.

Sentiment analysis is difficult, even for humans. Interestingly, humans are only about 80 % accurate when it comes to deciding sentiment [8]. This means that a computer which is correct 10 out of 10 times would still not be considered correct by a human in every case, making it difficult to get a good end result.

Since we are working with a tight deadline we present a preliminary prediction analysis based on a simple prediction design. With more time on our hands we would try to make our prediction more accurate in terms of internal rankings in the top 5. We expect it to be fairly easy to recognize the most popular entities; the real challenge is to decide exactly how popular different entities are. Developing a complete system like this is a time-consuming task. Our goal is to create a prototype and present some of the potential in this area of research.

Chapter 4

Methodology

4.1 Overview

We have tried to define sub-systems, or modules, within our larger system. They operate completely independently of each other. Our goal was to make each module small and lucid. We ended up with three different sub-systems: data collection, sentiment analysis and visualization. This is shown in Figure 4.1.

Figure 4.1: Low-tech overview

4.2 Data Collection

The task of this module is to talk to the Twitter API. It collects relevant tweets for a given hashtag and saves them. Our goal was to make the module smart enough to expand its topic and figure out relevant hashtags on its own, but unfortunately we did not have enough time to develop it that far. We are currently feeding it a hashtag manually.

The data collection module is written in the Go language, which performs really well for this sort of task and allows for powerful asynchronous code. The module keeps track of which tweets it has already downloaded by storing two state variables called max and since. The usage of these variables is well documented in the Twitter API documentation [13]. By tracking these variables and using them for the Twitter API search pagination, the module can make sure that we do not miss any tweet and that we do not download any tweet we already have.
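The sketch below is a minimal illustration of this pagination logic, not the actual Eagle module: searchPage is a hypothetical wrapper around the Twitter search endpoint's since_id/max_id parameters (authentication and JSON decoding are omitted), and collect walks backwards through the result pages so that nothing is missed or fetched twice.

```go
package main

import "fmt"

// tweet is a minimal stand-in for the fields the collector stores.
type tweet struct {
	ID   int64
	Text string
	Lang string
}

// searchPage is a hypothetical wrapper around GET search/tweets with the
// since_id and max_id parameters; authentication and JSON decoding omitted.
func searchPage(query string, sinceID, maxID int64) ([]tweet, error) {
	// ... call the Twitter REST API here ...
	return nil, nil
}

// collect fetches everything newer than `since` for the given query and
// returns the new high-water mark, so repeated calls never miss a tweet
// and never re-download one.
func collect(query string, since int64, save func(tweet)) (int64, error) {
	var max int64 = 0 // 0 = no upper bound on the first page
	newest := since
	for {
		page, err := searchPage(query, since, max)
		if err != nil || len(page) == 0 {
			return newest, err
		}
		for _, t := range page {
			save(t)
			if t.ID > newest {
				newest = t.ID // remember the newest ID seen: next run's `since`
			}
			if max == 0 || t.ID < max {
				max = t.ID // remember the oldest ID seen on this page
			}
		}
		max-- // next page: strictly older tweets than any seen so far
	}
}

func main() {
	latest, _ := collect("#eurovision", 0, func(t tweet) { fmt.Println(t.ID, t.Text) })
	fmt.Println("new since_id:", latest)
}
```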

4.3 Sentiment Analysis

The module that analyses the raw data is the most sophisticated one. Its task is to look at each individual tweet and analyse it in terms of sentimental value. This means that the module needs to understand what is considered positive and what is considered negative. The analysis module is also written in Go. To accomplish the sentiment analysis, the module takes tweets that it has not yet analysed from a database shared with the data collection module.

The analysis module separates its work into two different tasks, entity extraction and sentiment analysis. These tasks are modular and can therefore be executed asynchronously and in parallel. This module runs completely independently of the data collection module.

The entity extraction uses a very simple approach. Each entity is associated with multiple identifiers. For example, the entity Sanna Nielsen (Sweden's contestant in the contest) has both Sanna and Nielsen as identifiers. We also add the countries the acts represent and the names of their songs. Whenever an identifier is found in a tweet, the analysis module creates an association between the entity and the tweet.

The sentiment analysis is more complex. We created a system capable of running multiple sentiment analysis heuristics. This makes it possible to compare different algorithms and approaches. To optimize for performance we made sure that the module kept track of which heuristic had been run on which tweet. This feature gives us the ability to analyse tweets in near real time, with just a few seconds of delay, which we did on the night of the final.

A great challenge for this module is that not all tweets are written in the same language. This means we either need to translate every tweet, or have customized algorithms for each individual language. Initially we wanted to translate every tweet to English and we did some research on the subject. We came to the conclusion that translation services are expensive: Google Translate's API would cost us hundreds of dollars, which is not something we are ready to invest at this point.

One of our heuristics uses the SentiStrength algorithm, which is able to process 16,000 "social web texts" per second according to its website [7]. Our heuristic takes the text of a tweet, passes it to the standard input of a SentiStrength process, and reads the response from standard output. SentiStrength is primarily focused on analysing English text, but it is also possible to extend its support for other languages by feeding it data files for each language. SentiStrength is widely used and multiple languages are supported; a number of language data packages are available on its website. These files were generated with machine learning. The languages provided, which we incorporated in our heuristic, are: Arabic, Dutch, English, French, German, Greek, Italian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish and Welsh. It is worth mentioning that the quality of these data packages varies quite a bit; most of them are untested and not fully reliable. We chose to use them anyway, since running some analysis on a per-language basis is still better than running an English analysis on foreign tweets.

Another heuristic we created uses a very simple algorithm of our own that we named SimpleWords. It searches the tweet text for predefined words that are classified as positive or negative. The score is then determined by how many positive versus negative words are found in the text. We created this heuristic as a simple proof of concept and to allow us to compare a simple and naive approach with more complex algorithms.

4.4 Visualization

Data collection and sentiment analysis are of little use if there is no way of presenting the results in an understandable way. We wanted to present our analysed data in a simple and accessible manner. To accomplish this we chose to create a website where the data is visualized with diagrams. To create the website we used a PHP web framework called Laravel. This framework provides many features: simple routing, database migrations, database seeding, an ORM and much more. We used Laravel to create and execute database queries. Since many of the database queries are heavy and slow, the results of these queries are cached. To visualize the result of the queries, we use the JavaScript library d3.js. The library is provided with the results in JSON format and is then used to produce bar graphs.


Figure 4.2: Frontpage of the website

4.5 Filters

We decided to add filters to the model. Our intention with these filters was to create a more realistic approach to summarizing the scores. The filters we created are not that advanced; the intention is merely to suggest an improvement to the model, as a proof-of-concept design. We realized that the scale is in fact not linear, in the sense that 80 points is not worth twice as much as 40 points: a tweet with 80 points is probably not worth the same as two tweets with 40 points when it comes down to the actual voting process. We also realized that we wanted to be able to modify the impact of negative tweets. As we show in Chapter 5, most tweets are positive, meaning that fewer people decide to share their negative opinions. We wanted to create a better representation of a sort of general opinion, and that is why we decided to amplify negative values. It is a compensation for the fact that fewer people tweet negative opinions.

Every heuristic is used as a function h_n(e, t), where e is an entity and t is a tweet. The set of entities is E, the set of tweets is T, and the set of tweets associated with an entity e ∈ E is T_e. Note that T_e ⊆ T. Without any filter applied the equation is simple:

\sum_{t \in T_e} h_n(e, t), \quad \forall\, e \in E. \qquad (4.1)

The function h_n(e, t) is executed on every tweet attached to an entity to calculate that entity's total score. Each filter is defined as a function f_n(x).


Figure 4.3: Plot of the low-pass filter
Figure 4.4: Plot of the negative amplifier filter

Furthermore, each filter takes one input parameter and returns a scaling factor for that given value. Since filters are intended for scaling, we multiply the value returned by each filter with the score returned by h_n(e, t). Equation 4.1 defines the summarization without filters. With a single filter f_1(x) the equation becomes:

\sum_{t \in T_e} h_n(e, t) \cdot f_1(h_n(e, t)), \quad \forall\, e \in E. \qquad (4.2)

Finally, the equation is slightly different with two filters:

\sum_{t \in T_e} h_n(e, t) \cdot f_1(h_n(e, t)) \cdot f_2(h_n(e, t)), \quad \forall\, e \in E. \qquad (4.3)

This model is flexible and scalable since more filters could easily be added to tweak the equation even further. We have defined two filters. The first is a low-pass filter as defined in equation 4.4, and graphically illustrated in Figure 4.3.

f(x) = \frac{1}{0.0001 \cdot x^2 + 1} \qquad (4.4)

This filter is designed specifically for our domain. We know −100 < x < 100 and chose 0.0001 based on that. We also defined a negative amplifier function

f(x) = \left(2 - \frac{|x|}{x}\right)^{0.92}, \qquad (4.5)

also illustrated in Figure 4.4. This filter is also specifically designed for our needs. We chose the parameter 0.92 because it seemed like a good fit; further investigation could probably fine-tune it even more. The reasoning behind these filters is further discussed in Chapter 6.
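The summation model and the two filters can be expressed compactly in code. The following Go sketch implements equations 4.1-4.3 together with the low-pass filter (4.4) and the negative amplifier (4.5). It assumes heuristic scores in (-100, 100), and the handling of a score of exactly zero, where equation 4.5 is undefined, is our own assumption here (neutral tweets are left unscaled).

```go
package main

import (
	"fmt"
	"math"
)

// lowPass implements equation 4.4: squeezes large magnitudes towards
// smaller ones, since 80 points is not "worth" twice as much as 40.
func lowPass(x float64) float64 {
	return 1 / (0.0001*x*x + 1)
}

// negAmp implements equation 4.5: scores below zero are amplified by a
// factor of about 2.75, positive scores are left unchanged.
// Equation 4.5 is undefined at x = 0; we assume neutral tweets are unscaled.
func negAmp(x float64) float64 {
	if x == 0 {
		return 1
	}
	return math.Pow(2-math.Abs(x)/x, 0.92)
}

// entityScore sums h(e, t) over the tweets associated with an entity,
// multiplying each term by every filter's scaling factor (eq. 4.1-4.3).
func entityScore(scores []float64, filters ...func(float64) float64) float64 {
	total := 0.0
	for _, s := range scores {
		term := s
		for _, f := range filters {
			term *= f(s)
		}
		total += term
	}
	return total
}

func main() {
	tweets := []float64{40, 80, -20} // heuristic scores h(e, t) for one entity
	fmt.Printf("no filters: %.1f\n", entityScore(tweets))
	fmt.Printf("low-pass:   %.1f\n", entityScore(tweets, lowPass))
	fmt.Printf("both:       %.1f\n", entityScore(tweets, lowPass, negAmp))
}
```

The variadic filter list reflects the flexibility discussed above: adding a third filter only requires passing one more function.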

Chapter 5

Results

5.1 Dataset Characteristics

We ended up with 737,793 tweets, following only the hashtag #eurovision. About 65 % of those tweets are written in English. Appendix A shows how many times each entity was mentioned in a tweet. The language distribution of the tweets can be seen in Figures 5.1 and 5.2. The majority of all tweets are represented by only a handful of languages. This is a fairly common phenomenon and indicates that the distribution is heavily skewed, as with Pareto's law and power-law distributions [6].

Power functions are described as f(x) = αx^(−η). A power function turns into a straight line with slope −η and y-intercept log(α) when taking the logarithm on both axes. If the data followed a power law, or a Zipf distribution, it should therefore show straight-line behaviour when plotted on a log-log scale. We can see that 10 % of the detected languages account for about 80 % of all tweets. This follows Pareto's principle, but the tail of the languages is bounded and does not appear to follow a Zipf distribution. Figure 5.2 displays the language distribution with a logarithmic y-axis but a linear x-axis. On these axes it somewhat resembles a straight line, meaning it is not a Zipf distribution; the long tail is simply too short. We believe that a larger dataset and better language classification would get us closer to a Zipf distribution.

An interesting aspect of the language distribution is that Twitter runs some sort of language recognition on every tweet. We have no insight into how they do it. We believe their algorithm is fairly accurate, but it is still something we cannot control.
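To make the straight-line argument above explicit, taking the logarithm of both sides of the power function gives:

```latex
f(x) = \alpha x^{-\eta}
\quad\Longrightarrow\quad
\log f(x) = \log \alpha - \eta \log x ,
```

which on log-log axes is a line with slope −η and intercept log α.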


A single tweet was recognized as being written in the Kannada language, which is spoken in some parts of India. The reason is that the user tweeted only a smiley consisting of characters from that alphabet.

Figure 5.1: Language distribution of tweets
Figure 5.2: Language distribution of tweets

5.2 Sentiment Analysis

As stated in the Methodology chapter, we used three different heuristics to calculate a sentimental value for each tweet. These algorithms are nowhere near perfect and should be considered proof of concept. As mentioned previously, we also applied filters in combination with each heuristic. Appendix D shows the total score per entity for every combination of heuristic and filter.

Not every tweet is considered either positive or negative. The majority of the tweets we ran through our heuristics were considered neutral; their sentimental value is equal to zero. Figure 5.3 shows the distribution of sentiment calculated by the SentiStrength algorithm. We believe this is an area that is not performing particularly well. We know that each of these tweets is about some entity and that most of them probably state a positive or negative opinion. The problem is that most of them are too vague or hard to interpret and end up being read as neutral. With a more complex and tailor-made algorithm we are certain that a higher rate of opinions could be extracted.



Figure 5.3: Distribution of sentiment score

This is not a surprising result. Our algorithms are defensive and most words do not influence a tweet's score. The SimpleWords heuristic is a good example: it contains only 10 positive and 10 negative words from the English language.

5.3 Prediction Results

We knew from the start that our result would not come close to the result presented on TV. The final result consists of votes from the 30 countries competing. Our goal was to find similarities with the votes cast by countries with an English-speaking population. England, Ireland and a few others are the countries with English as their native language, but there are still more countries where communicating in English is not uncommon. Our results are not compared on a per-country basis due to time constraints; we believe that would give a better correlation in any future work.

The result when running the SentiStrength algorithm without filters on all tweets can be seen in Figure 5.4. It places the top three as: Donatan & Cleo (Poland), Conchita Wurst (Austria) and Ruth Lorenzo (Spain). Figure 5.4 illustrates each entity's total score.


Figure 5.4: SentiStrength on all tweets

Juries cast about 50 % of the votes in the Eurovision Song Contest; the rest of the votes are cast by the people watching. Therefore we chose to mainly compare our results against the telephone ranking only. That rank was extracted and calculated from data provided by www.eurovision.tv. To clarify, the comparison is made with the telephone outcome from all of Europe. We discussed whether or not to compare against the United Kingdom exclusively, but we cannot know for sure whether an English tweet is written by a UK resident or not.

To be able to compare our result with reality we calculated the Pearson correlation. It is a measure of similarity and ranges from -1 to 1 inclusive, where -1 means total negative correlation and 1 means total correlation. We created scatter plots for all combinations of heuristics and filters; those plots can be found in Appendix C. To our surprise, the highest correlation was achieved by our own heuristic SimpleWords with no filters (see Figure 5.5).
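For reference, the sketch below shows how the Pearson correlation between two equally long rank vectors can be computed in Go; it is a standard textbook formulation, not necessarily the exact evaluation code we used for Table 5.1.

```go
package main

import (
	"fmt"
	"math"
)

// pearson returns the Pearson correlation coefficient of x and y,
// which must have the same non-zero length.
func pearson(x, y []float64) float64 {
	n := float64(len(x))
	var sx, sy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
	}
	mx, my := sx/n, sy/n
	var cov, vx, vy float64
	for i := range x {
		dx, dy := x[i]-mx, y[i]-my
		cov += dx * dy
		vx += dx * dx
		vy += dy * dy
	}
	return cov / math.Sqrt(vx*vy)
}

func main() {
	predicted := []float64{1, 2, 3, 4, 5} // predicted ranks (illustrative)
	actual := []float64{2, 1, 3, 5, 4}    // actual telephone ranks (illustrative)
	fmt.Printf("r = %.2f\n", pearson(predicted, actual))
}
```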



Figure 5.5: Telephone rank correlation: SimpleWords without filters

The x-axis is the entity index and the y-axis represents rank. The red line is what we were aiming for, the actual result. Note that the Pearson correlation accounts for all entities when calculating the correlation. That is not exactly what we want, but it is still a good measure of overall performance. Table 5.1 displays the Pearson correlation values we found for each heuristic and filter combination. The Mentions heuristic is not applicable to any filters since it only accounts for mentions, not actual sentimental value.

Our results vary quite a bit between different combinations of heuristics and filters. The average correlation with the telephone votes only is 0.5079, while the average for telephone votes combined with the jury is 0.4468. This is in line with our initial expectations, since we cannot really predict the votes of the jury. We can also see a pattern in the way our filters impact the result: both our filters lower the correlation, meaning they need to be evaluated further.

Heuristic and Filter              Jury + Telephone Correlation   Telephone Correlation
SentiStrength (no filters)        0.4489                         0.5521
SentiStrength (low-pass)          0.4386                         0.5515
SentiStrength (amp. negatives)    0.2205                         0.3415
SimpleWords (no filters)          0.6561                         0.6533
SimpleWords (low-pass)            0.4386                         0.5515
SimpleWords (amp. negatives)      0.4892                         0.3415
Mentions                          0.4359                         0.5638

Table 5.1: Pearson correlation

Chapter 6

Discussion

6.1 Methodology

We knew beforehand that not every tweet would be written in English, but it made things harder than we expected. We thought that a simple solution would be to translate all tweets to English before processing them. However, it turned out we had far too much text to translate and it would be too expensive in terms of money; Google charges $20 per million characters for both translation and language detection. We did feed SentiStrength with some language data, but not nearly enough for it to perform well. Also, we did not spend any time configuring SentiStrength for our specific needs. We feel that SentiStrength has the potential to outperform SimpleWords with a bit more effort.

Every component in Eagle affects the result, which also means that every little tweak we make affects the result. A diverse and multi-language approach is needed to make sure we collect tweets from the whole spectrum. For example, we are only looking at one specific hashtag when fetching tweets. There is room for improvement: a more advanced collection methodology should strive to be highly customized on a per-country basis and collect tweets from related hashtags.

Entity extraction is the process of connecting a tweet with an entity. Doing this is hard and we tried to refine the process; it is still nowhere near perfect. With better entity recognition we would get a better result. Another issue with this process is that a tweet can have different sentiment towards different entities. We simplified this and gave each entity mentioned in a tweet the same score, which is not ideal either.

We realized quite early on that simply adding tweet scores together would not give us a perfect result. There are some differences between our model and reality that make everything more complex:

• A tweet can have a negative sentiment, but you cannot vote "negative".

• Our algorithm is far from perfect at extracting entities from tweets.

• Sentiment analysis is hard. We cannot know for sure that we interpret and rate the sentiment correctly every time.

• "The vote of the people" is only 50 % of the total score in the Eurovision final. There is a jury from each country deciding on the second half of the score.

• In the Eurovision Song Contest each country's votes are weighted equally. In our model a country with more tweeters gets a higher influence.

• Related to the previous point, our model allows people to "vote" for their own country. This is not allowed in the Eurovision Song Contest.

Initially we took each tweet about each entity and added all the points together. This gives a total score that points in the right direction, though it is far from correct. We believe that a more accurate model consists of a more advanced summarization.

6.1.1 Filters

Each person is allowed to vote as many times as they feel like. This means that a positive tweet could correspond to more than one vote, which adds a lot of complexity to the way tweets can be interpreted. Without any kind of weighted sum model the scale is linear: a tweet with 20 points is worth twice as much as a tweet with 10 points. In terms of voting this is probably not the case. A person who writes a tweet valued at 20 points will probably not vote twice as many times as another person writing a tweet worth 10 points. We needed some way of translating points to votes in a smart way. We decided to run every tweet through some kind of filter to create a more "normalized" summarization. We decided on a low-pass filter. Our intention was to bring lower points "closer" to higher points, essentially preventing 100 points from being worth five times as much as 20 points.

We also created a filter that amplifies negative points. We did this mostly as an experiment. We believe that Twitter could represent "humanity", but for that to work we need to weight negative opinions more heavily than positive ones. Expressing a negative opinion often takes a bit more courage and effort, and should therefore be considered worth more. We cannot really tell whether this filter adds something or not, but after some consideration we decided to include it in the system.


Furthermore, we have discussed some other approaches to simulating the voting process. Everything we have done is built on what each tweet says about some entity. One alternative approach is to stop looking at the tweets in isolation, and instead care about who wrote each tweet. There is a person behind each tweet, and that person is the one who may actually cast a vote. By looking at a tweet we believe we know what that person is thinking, but in fact the tweet only shows a fraction of their thoughts. One way to deal with this would be to look at the average sentiment across every tweet from that person. This could give us a more accurate score and better reflect what that person votes for, and how much they vote. By only looking at individual tweets we fool ourselves a bit. This approach would probably be our next step if we had some more time.

6.2 Results

We are fairly satisfied with our results. In general everything went according to our initial plan and we were able to generate some interesting results. We had no way of testing our model on an event with a known outcome: the Twitter API provides tweets no older than 8-10 days, meaning we were not able to collect tweets from previous years. This meant we had to build our prediction model without really knowing what would work. This had a huge impact on our results and we are sure we would have been able to create a more accurate system with more feedback. With that being said, we did get results that correlated with the real outcome. We experimented quite a bit with different methods, and we hope that we can shed some light on how a future prediction model might be constructed. For someone else wanting to create a similar product, our mistakes could save them time. This paper is of course focusing on the patterns that we found and the actual results that we got, but it is worth mentioning that mistakes are also valuable; nothing is more time consuming than making the same mistakes over and over again.

Correlation is an interesting aspect of our result, but it could also be misleading. When we set out to predict the result of the Eurovision Song Contest our main goal was not to find a high correlation. Our goal was to find the winner, or at least a highly accurate top 5. Correlation gives an understanding of how our whole model performed across all entities. Although that is interesting, the unique selling point of a system like this would be to accurately predict the top performers. That is what predictions are all about, and what people are interested in. One could even argue that correlation is somewhat irrelevant in this context, since knowing who will finish 24th is not really that interesting. Our result was actually pretty good when it came to the top 5; that does not show in the correlation.


To make for a better result, and a better interpretation of the result, some rules should be set up. These are rules that limit the impact of entities where the data is inadequate:

• An entity with too little "buzz" around them should not be considered in the interpretation.

• Entities whose fellow citizens write too much about them should probably not be considered.

• If an entity has too much negative hype around them, it is probably not correct to use a plain summarization to calculate their total score.

These rules, or rather guidelines, are part of our contribution to any future work in this field.

6.3 Ethics

The way the Internet works today, we are able to find detailed information about almost anything. For that to be viable, everyone involved has a responsibility to continuously evaluate how he or she handles information. The information we deal with in this paper is not "private"; we are not doing anything illegal or unethical to obtain it. We are using information created by humans and put on the Internet by themselves.

Our concerns revolve around personal integrity. An unwritten rule on the Internet is that personal integrity must be respected. Not everyone respects it, but no one wants their own integrity compromised. We are reading what people are sharing on the Internet. They do not know it, and we have not done anything to ask for their permission. The data is publicly available via the Twitter API, but it is still worth considering how our work could affect someone's personal integrity.

We believe that we are doing our best to respect every Twitter user's personal integrity. We talk about the dataset, and not about an individual tweet or user. We do our best to ignore the people behind the tweets, because it is not relevant to our work. The way our system is used today, there is no risk of it being harmful to anyone's personal integrity. It all depends on how we use it.

The system could potentially be used in more harmful ways. Say person A wants to find out what person B thinks about person C. Person A uses our system to collect opinions from B in a harmful way. This hurts C's integrity, but also B's. This system is meant to be used for gathering opinions from a larger crowd. As soon as the crowd becomes just a handful of people, the extraction of opinions could compromise someone's personal integrity.

Chapter 7

Conclusion

Our initial goal was to investigate whether Twitter users tweeting about a very specific topic could represent a much larger crowd. If we were able to determine what people tweeting about our topic were thinking, we thought that we would be able to predict the opinions of the larger crowd. The topic we focused on was the Eurovision Song Contest. It is a large event and involves a lot of people. The result is based on people's votes, which means a prediction would be highly relevant.

We built heuristics to rank the acts and find the winner. Our goal was to predict a top 5 close to the actual results. We had the real winner in 2nd place and a couple of others close to their actual spots, but we also found big differences between our result and the real outcome.

We also studied the correlation of the results. It shows some interesting data, but it is important to reflect on the way correlation works: every position in the field contributes just as much weight to the equation as every other. Our initial plan was to focus on the top 5, but the correlation effectively calculates an average over the whole field.

The most naive heuristic performed very well in terms of Pearson correlation. All of our heuristics gave us an average of about 0.5 in Pearson correlation.

We have come to the realization that a prediction using social media in this way is complex. Our model is very simplified and is not as sophisticated as reality. We believe future work could benefit from our implementations.


7.1 Future Work

It is very difficult to analyse a big dataset based on social values. Our findings may not hold for another dataset and there is no method that is guaranteed to work; finding such a method will take time. We would like to end this paper with some questions that we think are highly relevant for any future work in this area:

• How many active tweeters, or tweets, from a country are needed to predict that country's votes in the Eurovision Song Contest?

• Which part of our model and implementation is the weakest link? Is it the data collection, the data analysis or the visualization?
  – Which part performs well?

• Would a prediction model like this perform better in a context with a lower number of entities?

Bibliography

[1] P.C. Guerra, W. Meira Jr., C. Cardie. Sentiment Analysis on Evolving Social Streams: How Self-Report Imbalances Can Help, In proc. WSDM Conference 2014, New York City.

[2] Go Programming Language, http://golang.org/

[3] Y. Hu, K. Talamadupula, S. Kambhampati. Dude, srsly?: The Surprisingly Formal Nature of Twitter's Language, In proc. International AAAI Conference on Weblogs and Social Media, 2013.

[4] J. Kim, J. Yoo, H. Lim, H. Qui, Z. Kozareva, A. Galstyan. Sentiment Prediction Using Collaborative Filtering, In proc. International AAAI Conference on Weblogs and Social Media, 2013.

[5] J. Lee, M. Sun, G. Lebanon. A Comparative Study of Collaborative Filtering, Technical Report (arXiv), May 2012.

[6] Anik. Mahanti, N. Carlsson, Anir. Mahanti, M. Arlitt, C. Williamson. A Tale of the Tails: Power-laws in Internet Measurements, IEEE Network, Vol. 27, No. 1, Jan/Feb. 2013.

[7] SentiStrength, http://sentistrength.wlv.ac.uk/

[8] G. Shirolkar, R. Shukla, H. Shah, R. Shah. Mental State Classification for Hypnotherapy Using Sentiment Analysis, International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, iss. 10, October 2013.

[9] Y.R. Tausczik, J.W. Pennebaker. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods, Journal of Language and Social Psychology, March 2010; vol. 29, 1: p. 24-54.

[10] M. Thelwall, K. Buckley, G. Paltoglou. Sentiment In Twitter Events, University of Wolverhampton, 2011.

[11] M. Thelwall, K. Buckley, G. Paltoglou, D. Cai, A. Kappas. Sentiment Strength Detection in Short Informal Text, Journal of the American Society for Information Science and Technology, 2010.

[12] Topsy, http://topsy.com/

[13] Twitter API, https://dev.twitter.com/docs/api

Appendix A

Entity Mentions

Country          Code   Entity                        Mentions
Poland           pl     Donatan & Cleo                29880
Austria          at     Conchita Wurst                26697
Spain            es     Ruth Lorenzo                  26613
Belarus          by     TEO                           18918
United Kingdom   uk     Molly                         18672
Ukraine          ua     Maria Yaremchuk               17004
Finland          fi                                   14138
Russia           ru     Tolmachevy Sisters            13874
Sweden           se     Sanna Nielsen                 11691
Italy            it                                   11645
Switzerland      ch                                   11522
Ireland          ie     Can-Linn feat. Kasey Smith    10188
Greece           gr     feat. Risky Kidd              10125
Hungary          hu     András Kállay-Saunders         8927
Netherlands      nl                                    8797
Armenia          am     Aram                           8550
Iceland          is     Pollapönk                      7759
San Marino       sm                                    7432
Malta            mt     Firelight                      6551
Israel           il     Mei Finegold                   6503
Latvia           lv     Aarzemnieki                    6462
Romania          ro     Paula and Ovi                  6429
Denmark          dk     Basim                          6309
France           fr                                    6143
Montenegro       me     Sergej Cetkovic                5778
Norway           no                                    5748
Belgium          be     Axel Hirsoux                   5497
Azerbaijan       az     Dilara Kazimova                5173
Estonia          ee     Tanja                          4142
Slovenia         si     Tinkara Kovac                  4092
Macedonia        mk     Tijana Dapcevic                3754
Georgia          ge     & Mariko                       3586
Lithuania        lt     Vilija Matačiūnaitė            3401
Moldova          md                                    3374
Albania          al     Hersi                          3083
Portugal         pt     Susy                           1960
Germany          de                                    1876

Appendix B

Language Distribution

Language   Tweet count
en         482262
es         90405
it         31179
ru         30578
fr         24972
de         12493
nl         11462
tr         8969
sv         7490
pl         5695
el         5047
bg         3398
da         3111
in         3051
fi         2820
sl         2793
pt         2477
sk         2419
hu         1405
uk         825
et         717
ja         715
lt         711
is         693
lv         575
no         513
ht         301
tl         279
vi         177
ko         48
hy         48
fa         43
zh         42
ar         32
iw         28
th         9
ur         8
ka         2
kn         1

Appendix C

Correlation Plots

Panels (left to right, top to bottom): Outcome without filters; Telephone outcome without filters; Telephone outcome with low-pass filters; Telephone outcome with amplified negatives.

Figure C.1: Correlation plots

Appendix D

Results From Visualization

Figure D.1: Graph for heuristic Mention with no filter

Figure D.2: Graph for heuristic SentiStrength with no filter

Figure D.3: Graph for heuristic SimpleWords with no filter


Figure D.4: Graph for heuristic SentiStrength with low pass filter

Figure D.5: Graph for heuristic SimpleWords with low pass filter


Figure D.6: Graph for heuristic SentiStrength with amplify negatives filter

Figure D.7: Graph for heuristic SimpleWords with amplify negatives filter


The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Matteus Hemström and Anton Niklasson
