
Radboud University

Master Thesis Computer Science

Manifestation of real world social events on

Author: M. Van de Voort

Supervisor: dr. S. Verberne Second reader: prof. dr. T.M. Heskes

August 13, 2014

ABSTRACT

Situations in which many people come together can result in very interesting and positive, lively social events. Unfortunately, large events with many people involved also include risks, ranging from small injuries to death. It would be useful if social media could be used to monitor these events, and to predict unwanted situations and casualties. To be able to know in advance how an event will develop based on information on social media, a good understanding of the relation between the development of real world events and their manifestation online is needed. In this work, we study the relationship between real world events and their online manifestation. We create a model of both the real world event and its online manifestation. We compare these two models using data about five real world social events and Twitter data about them. We determine the correlation between the online and real world models. We find several weak to moderate correlations between online and real world characteristics of events. The intensity, readability and sentiment of the tweets are examples of variables in the online model that show a correlation with the real world, and the weekends, school holidays and position of the moon are examples of real world variables that manifest themselves on Twitter.

Contents

1 Introduction
    1.1 Research questions
    1.2 Methodology

2 Related Work
    2.1 Event detection in social media
        2.1.1 Twitter
        2.1.2 Event detection in social media other than Twitter
    2.2 Process of event detection
        2.2.1 Data pre-processing
        2.2.2 New event detection
        2.2.3 Event tracking and known event detection
        2.2.4 Event prediction
        2.2.5 Event summarizing

3 Definition of events
    3.1 Time and duration
    3.2 Place
    3.3 People involved
    3.4 Associated events
    3.5 Content

4 Model of online and real-world social events
    4.1 Model of real-world events
        4.1.1 Time
        4.1.2 Place
        4.1.3 People involved
        4.1.4 Associated events
        4.1.5 Content
        4.1.6 Context
    4.2 Model of online manifestation of events
        4.2.1 Notation
        4.2.2 Time and duration
        4.2.3 Place
        4.2.4 People
        4.2.5 Associated events
        4.2.6 Content
    4.3 Comparison between models

5 Dataset selection and preprocessing
    5.1 Data selection and preprocessing
    5.2 Tweets
    5.3 The relations between users within the datasets

6 Implementation of online and real world variables
    6.1 Real world features
        6.1.1 Associated events
        6.1.2 Content
        6.1.3 Contextual events
    6.2 Online features
        6.2.1 Time
        6.2.2 Place
        6.2.3 People
        6.2.4 Content
    6.3 Comparison between the models

7 Results and analysis
    7.1 Relations between real world variables
    7.2 Relations between online variables
        7.2.1 Place related variables
        7.2.2 People related variables
        7.2.3 Content related variables
        7.2.4 Place and people related variables
        7.2.5 Place and content related variables
        7.2.6 People and content related variables
    7.3 Relations between online and real world variables
        7.3.1 Correlations between online place and real world variables
        7.3.2 Correlations between the relations between people online and the real world variables
        7.3.3 Correlations between the tweet content and the real world variables
    7.4 Discussion

8 Conclusion
    8.1 Future work

Bibliography

List of Figures

List of Tables

Appendices

A Weather information
B Holidays
C Related events
D Content events
    D.1 Program Dance Valley
    D.2 Program
        D.2.1 Program Pinkpop
        D.2.2 Programma
        D.2.3 Programma Pukkelpop
        D.2.4 Programma Pukkelpop
        D.2.5 Programma
E Real World Variables
F Online place related variables
G Online people related variables
H Online content related variables
I Correlations real world variables
J Correlations online variables
K Average correlations online variables
L Correlations online and real world variables
M Average correlations real world and online variables (part 1)
N Average correlations real world and online variables (part 2)
O Overview correlations real and online variables for individual datasets

Chapter 1

Introduction

Situations in which many people come together can result in very interesting and positive, lively social events. Unfortunately, large events with many people involved also include risks, ranging from small injuries to death [40, 59]. In recent years we have seen, for example, a collapsed tent at a pop festival¹, a birthday party that got out of hand², a Loveparade that became so crowded that people were hurt and even died³, New Year's Eve parties that ended in riots⁴, and soccer matches with violent supporters⁵. In all of these cases, many people were involved in or witnessed the event. The events started as positive happenings and ended in negative sentiment, with casualties, chaos, or even vandalism, fights and deaths.

Knowing how events develop might help to decrease these risks, or might enable early intervention and prevent escalation. To monitor an event closely, we can use information that is available on social media: people who are involved in an event often publish all sorts of information on these media, including information about the event. Currently, Twitter users produce an average of 5,700 tweets per second [76]. This data contains information about personal activities, social interactions, public opinion, news, developments in science and arts, regional information about weather, traffic and social activity, and much more. Many people are able to share information “on the road”. In the third quarter of 2013, 72% of people in the had a smartphone⁶. The percentage of people that use internet was 94% in the last year, and 56% of people used mobile internet⁷. This allows people not only to share information on social media after the fact, but also to share it “as it happens”.
To be able to use the information people publish online during an event to predict future developments and to accurately anticipate unwanted situations, it is necessary to first understand how online developments relate to developments in the real world. In this research we investigate the relation between the online world and the real world, within the scope of music festivals. We choose to first study medium to large scale music events without irregularities, to get a better understanding of how the real world and the online world relate to each other in a normal situation. We choose these events because we expect that many young people visit these events and assume that these young people often use social media, providing us with enough information on social media about the event. In our research we limit our scope to Twitter.

We hope our work will contribute to a better understanding of how messages online relate to occurrences in the real world. We think our results can contribute to the development of new techniques, or the improvement of existing techniques, that are able to predict developments in events that involve large crowds, in order to prevent disturbances from happening and to allow detection of irregularities at an early stage.

1 http://nos.nl/video/265456-tenten-omgewaaid-op-pukkelpop.html
2 http://www.volkskrant.nl/vk/nl/2686/Binnenland/article/detail/3326464/2012/10/04/Project-X-Haren-Niet-zo-janken-gewoon-feesten.dhtml
3 http://www.volkskrant.nl/vk/nl/2664/Nieuws/archief/article/detail/1011559/2010/07/26/19-doden-en-342-gewonden-door-paniek-op-dancefeest.dhtml
4 http://nos.nl/artikel/592166-100-mensen-gearresteerd-in-veen.html
5 http://www.telegraaf.nl/feed/22365935/__Ook_rellen_bij_Euroborg__.html
6 http://www.telecompaper.com/news/dutch-smartphone-penetration-hits-72-in-q3--973995
7 http://www.cbs.nl/nl-NL/menu/themas/dossiers/eu/publicaties/archief/2013/2013-3851-wm.htm

1.1 Research questions

In this research, we look into the relation between large social events in the real world and what happens online in social media. Our assumption is that the development of the online manifestation of a real world event in social media is strongly related to the development of the event in the real world. Our research question is: How do developments of social events in the real world relate to messages about these events on Twitter? We divide this question into the following sub-questions:

1. Which characteristics of events can be used to describe the social events offline and their online manifestation?

2. How do the online and real world characteristics of events relate to each other? Which online characteristics represent real world developments of events most distinctively?

3. How can the relation between events and their online manifestation be used to predict the developments of events based on online information from social media?

1.2 Methodology

We investigate the relation between the development of an event in the real world and the manifestation of this event in messages on Twitter in four steps:

1. We make a model for events online and events in the real world. We base our model of social events on a generic model of an event that has five properties: time and duration, place, people involved, associated events and content of the event (see Chapter 3). We specify our description of online and real-world events by defining metrics that make these five properties measurable. We choose three statistical measures to compare the online model with the real world model (see Chapter 4).

2. We use data about real events to compare the real world events with their online manifestation. To this end, we choose one type of large scale social event, music events, and choose several examples of this type of event that took place in 2013 in the Netherlands and . We use two dance events (Dance Valley and Sensation) and three pop festivals (Pukkelpop, Pinkpop and Lowlands). We gather data about these events on Twitter (see Chapter 5).

3. We implement our online event model using the data we gather on Twitter about the five events we use, and also implement our real world event model using the data about the developments during and surrounding the event in the real world. (See Chapter 6).

4. We describe the correlation results for both models separately and combined, and discuss our findings (see Chapters 7 and 8).
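The comparison in step 4 rests on correlating an online variable with a real-world variable. As a minimal sketch, assuming both variables are available as aligned per-day values, a Pearson correlation coefficient could be computed as shown below. The function name and the example values are hypothetical and are not data from this thesis; the statistical measures that are actually used are defined in Chapter 4.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustration values (not measurements from the datasets).
tweets_per_day = [120, 340, 560, 410, 150]  # an online variable
is_weekend = [0, 0, 1, 1, 0]                # a real-world variable
r = pearson(tweets_per_day, is_weekend)
```

A coefficient close to 1 or -1 indicates a strong linear relation, while values near 0 indicate little or no linear relation.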

Chapter 2

Related Work

Event detection on Twitter has been a popular research topic in recent years [5]. The events that are detected vary in nature. Examples are the detection of earthquakes and typhoons [62], disasters and crisis management [41, 46], political preferences, political sentiment or outcomes of elections [27, 75, 66, 73], disaster-related events [39], geosocial events in general [38], crime-related events [39], and more specific crimes such as threats [49, 50].

Event detection methods use different characteristics of Twitter messages to detect events. Some use changes in the number of messages that are tweeted, or similarity between messages, to find messages that describe similar events. Others use natural language processing techniques, clustering-based methods, or machine learning algorithms [9, 7, 1, 54, 56].

Event detection is not the only topic of research. For example, Tops et al., Kunneman and Van den Bosch, and Hürriyetoğlu et al. investigate the prediction of events [74, 35, 31]. Furthermore, research has been done into summarizing (information about) events [45, 48, 15, 65, 68].

We describe work related to event detection on Twitter and other social media in Section 2.1 and discuss the tasks performed in event detection in Section 2.2.

2.1 Event detection in social media

The term social media is used to refer to websites like Twitter, and Flickr, and covers media that can be used for communication with friends, as a source of information, as a means for companies to spread information, and for sharing news and expressing opinions [46]. According to Kaplan and Haenlein, the term refers to “a group of internet based applications that build on the ideological and technological foundations of Web 2.0 and that allow the creation and exchange of User Generated Content” [32]. The term Web 2.0 was introduced in 1999 by Darcy DiNucci and describes a new way of using the World Wide Web [21]. In this new way of using the web, both content and applications are continuously updated by users in a collaborative fashion, while in Web 1.0 content and applications are published by individuals. The Organisation for Economic Co-operation and Development calls Web 2.0 “the participatory web”. They also define User Generated Content (or User Created Content) as “content made publicly available over the Internet, which reflects a certain amount of creative effort, and which is created outside of professional routines and practices.” [80].

Social media are used intensively by millions of people. A phone survey conducted in late 2012 among 1,802 internet users in the shows that 67% of them use Facebook; among these users, women aged 18-29 form the largest group. Twitter is used by 16% of the respondents, and is most popular among urban residents. Pinterest is used by 15% of the respondents, and is popular among women under 50. Instagram is used by 13% and is popular among young people in the age group 18-29. Gigya shows that Facebook is the medium of choice to share content for 50% of the people that share information in social networks. Twitter is used by 24% and Pinterest by 16%.

2.1.1 Twitter

In this research, we look into the usage of Twitter for event detection. Twitter is an online micro-blogging service and social network site. The service allows users to publish short status updates of at most 140 characters. Users can subscribe to other users, allowing them to easily follow their status. Users can also direct messages to each other using a special sign (‘@user’), or tag a message about a specific topic using ‘#topic’ [36]. The Twitter messages of a user can be open to the public, or restricted to a select group of Twitter users.

Twitter produces a large amount of data, of which a large portion is publicly available. Currently, an average of 5,700 tweets per second is produced by 241 million average monthly active users¹ [76]. This data contains information ranging from reports of personal activities, social interactions, public opinion, news and developments in science and arts to regional information about weather, traffic, social activity and more.

Although Twitter provides large amounts of data, finding useful information can be difficult. Weng and Lee [78] report two challenges in detecting events on Twitter: dealing with the amount of data that is produced and the speed at which it is published, and dealing with the ratio of “pointless babbles” (noise) to information that might be about events or other topics of interest. Petrovic et al. also indicate that finding events on Twitter is hard because of the high volume of data to analyze and a higher noise level [55].

Agichtein et al. state that while traditional media have a small range of quality, the quality in social media varies from very high to very low. In their research, they look into how high quality information can be identified, and use data from the website ‘Yahoo answers’ for their experiments. They use three different properties of the information to determine the quality of answers: intrinsic content quality (spelling, punctuation, grammar), relationships between users and items, and usage statistics (time spent, number of clicks) [2]. Using these properties, they are able to assess the quality of answers with a high level of accuracy.

The credibility of information on Twitter is investigated by Castillo et al. [13]. Twitter is able to break news directly from first observers, following a fast route for the process of gathering, filtering and propagating compared to traditional media. In their research, Castillo et al. try to automatically assess the credibility of the information on Twitter. They define credibility as “offering reasonable grounds for being believed”. They consider four factors to influence this credibility: the sentiment in the reaction to the message, the level of certainty of users that propagate the information, external sources that are cited, and the properties of the users that forward the information. Based on their experiments, they conclude that they are able to determine credibility with precision and recall in the range of 70% to 80%.

Other work that looks into the correctness of Twitter messages is performed by Mendoza et al. They look into tweets posted after an earthquake in Chile in 2010 to investigate the propagation of false rumours on Twitter. They find that these rumours tend to be questioned much more often than messages that contain information that is confirmed to be true [43]. Ratkiewicz et al. build a system that tracks political memes on Twitter and can identify misinformation such as astroturfing and smear campaigns. They are able to detect astroturf political campaigns using network analysis and sentiment analysis, resulting in detection of false information with high accuracy [60, 61].

Petrovic et al. remark that an important benefit of Twitter is that not only the occurrence of events can be analysed, but also the impact of the event and the reaction of people to it [55]. Finding the impact of events in this sense would be similar to sentiment analysis, which has been a topic of research concerning Twitter [70, 52, 11, 10, 47].

1 https://blog.twitter.com/2013/new-tweets-per-second-record-and-how

2.1.2 Event detection in social media other than Twitter

Apart from Twitter, other social media have been used for event detection as well. Ciglan and Norvag analyse event detection using Wikipedia page views [16]. Tinati et al. detect social trends and events in Wikipedia [71].

Combinations of Wikipedia and Twitter are also a topic of research. Osborne et al. investigate how Wikipedia can be used to improve event detection on Twitter. They conclude that Wikipedia lags behind Twitter by about two hours. They further investigate how the information found on Wikipedia can be used to improve the information found on Twitter. Their results indicate that combining the results from Twitter and Wikipedia can significantly improve the quality of event detection [51]. Steiner et al. developed an application called Wikipedia Live Monitor [67]. This application searches for events on Wikipedia and social media. In contrast to the work of Osborne et al., Steiner et al. first search for events on Wikipedia, and use social media to validate their findings. While Osborne et al. look at hourly accumulated page view logs, Steiner et al. look into article edit log streams, which are real-time. While Osborne et al. find a delay of two hours between Wikipedia and social media sites, Steiner et al. find that this might be 30 minutes for breaking news, and even 5 minutes or less for global breaking news like celebrity deaths.

Other combinations of (social) media are also used for event detection. Becker investigates event detection on Twitter, YouTube and Flickr [8]. Aggarwal performs event detection using metadata on Flickr [1]. Yan et al. perform cross-network social analysis to find out whether people first post their news on Twitter or on YouTube [81]. Rabbath et al. investigate a detection method for Facebook that finds the photos of the same event on the pages of different Facebook users, distributed over multiple photo albums [58].

2.2 Process of event detection

Event detection involves several tasks, depending on the kind of event that is being detected. Dou et al. identify four tasks: new event detection, event tracking, event summarizing and event association [22]. Allan identifies three tasks: data segmentation, event detection and event tracking [3]. In this work, we consider five tasks within the process of event detection, based mainly on the steps identified by Dou et al. and Allan. Four tasks are event-related: new event detection, event tracking and known event detection, event prediction, and event summarizing. The fifth, data pre-processing, prepares the data for the four event-related tasks.

In the event detection process, the data needs to be prepared before the actual analysis can be performed. We consider the data segmentation task identified by Allan to be part of data pre-processing. The event detection and tracking steps are listed in both works. Event detection can be about new events or about known events. New events have many unknown features, while known events have features that can be used to search for the event. Event tracking is the task of labelling new data according to the event it belongs to. Since the features of the event are already known in event tracking, we consider event tracking and known event detection to be closely related. We therefore distinguish the task of new event detection from the task of event tracking and known event detection. We furthermore consider event summarizing, identified by Dou et al., to be a separate task, and add event prediction to the list of tasks. In the following sections, we describe the five event detection tasks and show techniques used to perform these tasks.

2.2.1 Data pre-processing

The objective of the data pre-processing task is to prepare data for further analysis. The data that is provided by Twitter is noisy: the density of relevant information is low (“babble”) [75] and the messages often do not comply with rules for spelling or grammar [57]. Various pre-processing methods can be used. Stopwords (on, of, and, are) can be removed [5]. When only the content of the tweets is needed, the addressing in the tweets, the URLs that are included, or the hashtags can be removed [55]. Keywords can be extracted from the tweet using different approaches for keyword extraction (methods that extract nouns and named entities from messages). A Porter stemming algorithm can be used to stem the extracted keywords and make it easier to identify similar messages [63].

In the work of Allan et al. the ‘segmentation’ task is described as a separate part of event detection (or topic detection and tracking). Although Allan et al. describe this as a process that follows pre-processing, the segmentation itself can also be seen as a part of it. Segmentation is defined as dividing the incoming information stream into homogeneous blocks. This task needs to be performed before analysis [3].

Another task that can be considered part of pre-processing is the transformation of the data into a form that can be handled by the analysis technique. According to Atefeh and Khreich the traditional way of representing data is by using term vectors or a bag of words; alternatives are the named entity vector and the mixed vector. Furthermore, some features can be represented using specific metrics such as distance, for example time differences or location differences [54]. For specific features, specific metrics can be chosen, such as SIFT for images [77, 58].
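The cleaning steps mentioned in this section (removing addressing, URLs, hashtags and stopwords) can be illustrated with a short sketch. It is not the pipeline of any of the cited works: the regular expressions, the tiny stopword list and the function name `preprocess` are assumptions made for illustration, and stemming is left out.

```python
import re

# Tiny illustrative stopword list (an assumption); real systems use a full list.
STOPWORDS = {"on", "of", "and", "are", "the", "a", "in", "is", "to", "at"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(tweet, keep_hashtag_text=False):
    """Return the content tokens of a tweet.

    URLs and @-mentions are always removed. Hashtags are removed entirely,
    unless keep_hashtag_text is True, in which case only the '#' sign is
    dropped and the tag text is kept as a normal token.
    """
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    if keep_hashtag_text:
        text = text.replace("#", " ")
    else:
        text = HASHTAG_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOPWORDS]


# Hypothetical example tweet.
example = "@anna The tents are shaking at #pukkelpop http://t.co/abc"
tokens = preprocess(example)
```

Whether hashtags should be dropped or kept as ordinary words depends on the analysis that follows, which is why the sketch exposes this as a parameter.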

2.2.2 New event detection

New event detection is about detecting an event whose occurrence is not known in advance. This means that not only the content of the event is unknown, but also the time, the place and the people involved, implying that the event cannot be detected by just using a query with search terms. In research about event detection, detection can be split into different categories based on the approach to handling the data, the features that are looked at and the techniques that are used. The detection task can be split into two different detection approaches: retrospective event detection and online or new event detection. Retrospective event detection is performed on the full data set, while online event detection happens on the fly: for each sample it is decided whether it describes an event, before the next sample is processed [3, 5, 22, 82].

Atefeh and Khreich furthermore differentiate between document pivot event detection techniques and feature pivot event detection techniques. Document pivot techniques cluster documents based on their textual similarities, while feature pivot techniques assume that certain features show increased frequency at the moment the event occurs, based on the assumption that an event in text streams behaves like a ‘bursty activity’ [5]. The techniques used for event detection can be split into unsupervised, supervised and hybrid event detection techniques [5] or into statistical methods, probabilistic methods, AI and ML methods and composite methods [33].

Clustering-based techniques are used in a number of works to detect new events. Becker et al. propose an ensemble clustering approach using a single pass incremental clustering algorithm to identify events in social media content [7]. They use tags to calculate three features: the cosine distance between tf-idf weights for the textual information of the media content, the difference between the time stamps of the data, and the Haversine distance for the geographical similarity. They evaluate their work using a large dataset that contains photos from Flickr that were manually tagged with an event id that corresponds to events in the Flickr upcoming event database. Aggarwal and Subbian distinguish between novel events and evolution events. A novel event in terms of clustering is a data point that does not fit in existing clusters and thus forms a new cluster in itself. An evolution event is a change in activity within an existing cluster. They use both data about the network structure and data containing the content of the messages, and find that using the network structure in addition to the content improves the accuracy of their algorithm [1]. Petkos et al. propose a multimodal clustering algorithm to find social events in collections of multimedia found on social media [54].

Bernardus uses several natural language processing techniques to detect trending topics [9]. He performs two experiments in which he combines the results of several NLP techniques: in his first experiment he combines raw frequency and relative normalized term frequency for both unigrams and bigrams that were extracted from the data. The second experiment uses raw frequency, tf-idf and entropy. Petrovic et al. use an adapted version of locality sensitive hashing that is optimized for large amounts of streaming data [55].

Weng and Lee propose a technique called Event Detection with Clustering of Wavelet-based signals (EDCoW) [78]. They calculate bursts in the frequency of usage of a specific word by applying wavelet analysis. Between these results, they calculate cross correlations. Events are clustered using modularity based graph partitioning.

Sayyadi et al. create a keygraph based on the data: a graph that contains keywords extracted from the data as nodes and has edges between keywords that appear together in a document [63]. They apply community analysis techniques to this graph to identify events. Each community that is identified contains several keywords, is considered an artificial document, and is used to cluster the documents in the dataset. They analysed the characteristics of the events they identified and found that the temporal characteristics matched the expected characteristics of a news event.
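The textual and geographical features described above for clustering-based detection can be sketched as follows. This is an illustrative sketch and not the implementation of Becker et al.: the function names, the plain `log(n / df)` idf variant and the mean Earth radius of 6371 km are assumptions. A cosine distance is obtained as one minus the cosine similarity computed here.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Terms that occur in every document get weight 0 under this idf variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical tokenised messages.
docs = [["tent", "storm"], ["tent", "music"], ["music", "dance"]]
v0, v1, v2 = tfidf_vectors(docs)
```

A single pass incremental clustering algorithm would compare each incoming item with the existing clusters using such measures, and start a new cluster when no cluster is similar enough.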

2.2.3 Event tracking and known event detection

The task of event tracking follows the detection task and is about labelling incoming samples that discuss a specific and already detected event [3]. This technique is used to track the development of an event and is also applied to find related events (‘event chains’) [22]. Relations between events are mentioned by several researchers and are given various names (saga or story versus event [63, 3]; see Section 3.5). We consider event tracking equal to the task of known event detection, as opposed to new event detection. While new event detection techniques detect events that have many unknown features, known event detection techniques detect events of which the content is already known, but where time and place are still unknown. The methods used for known event detection are often rule based or use a query containing keywords that describe the event.

Work on event tracking or known event detection often uses query terms. Li et al. [39] detect events using TEDAS, a Twitter-based Event Detection and Analysis System. This system detects events, ranks them according to their importance and also determines a temporal and spatial pattern for these events. To detect events, they initially use “tracking rules” based on keywords that are commonly associated with the intended events. From the tweets that describe events they then extract new keywords, on the assumption that keywords often occur together, and derive new rules whose confidence is based on the ratio of event messages to non-event messages. Baldwin et al. built a system that finds events based on a user defined query containing key terms within a specified time window. As a result of this query they show the message, the geolocation of the message and the predicted probability of the location [6].

Oostdijk and Van Halteren find tweets containing threats using n-gram based recognition [49]. They use two approaches: in the first approach they manually construct n-grams based on their own knowledge of Dutch and on ideas they got while looking into the datasets they used. In the second approach they use a machine learning algorithm to learn the n-grams. In their next work, Oostdijk and Van Halteren [50] try to improve the precision of their threat detection. To this end, they extend their manual approach with a parse system that applies rules that are manually crafted based on the meaning of specific words.

Popescu and Pennacchiotti propose three models for detecting controversial events on Twitter. They define controversial events as events that “provoke a public discussion in which audience members express opposing opinions, surprise or disbelief” [56]. The features they use to represent their data are based on analysis of Twitter data and meta-data using several lexicons, including polarity words and a dictionary, and on information from the internet about items on news sites.
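Known event detection with keyword rules inside a time window, as described in this section, can be sketched as follows. The rule format (a set of keywords that must all occur in a message), the function names and the example messages are hypothetical; this is not the rule language of TEDAS or of any other cited system.

```python
from datetime import datetime


def matches_rule(tokens, rule):
    """A rule is a set of keywords that must all occur in the message."""
    return rule <= set(tokens)


def track(messages, rules, start, end):
    """Return the messages inside [start, end] that satisfy at least one rule.

    messages: iterable of (timestamp, text) pairs.
    """
    hits = []
    for timestamp, text in messages:
        if not (start <= timestamp <= end):
            continue
        tokens = text.lower().split()
        if any(matches_rule(tokens, rule) for rule in rules):
            hits.append((timestamp, text))
    return hits


# Hypothetical rules and messages for illustration only.
rules = [{"fire", "stage"}, {"riot"}]
messages = [
    (datetime(2013, 8, 16, 20, 0), "Fire near the main stage"),
    (datetime(2013, 8, 16, 21, 0), "Great show tonight"),
    (datetime(2013, 8, 20, 9, 0), "riot downtown"),
]
hits = track(messages, rules, datetime(2013, 8, 16), datetime(2013, 8, 18))
```

The third message matches a rule but falls outside the time window, so only the first message is returned.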

2.2.4 Event prediction

The event prediction task is about estimating when an event will happen. This means that the starting time of the event should be in the future. Other properties apart from the starting time, such as the content or the location, might either already be known or still be unknown.

Tops et al. [74] estimate the time before an event is going to happen based on Twitter messages. They use a data set containing tweets referring to soccer matches in the Netherlands. For their prediction, support vector machines, Naive Bayes and k-nearest neighbor classification are used. They find that humans have a higher prediction error than the algorithm with the lowest prediction error, but are better at determining whether a tweet was posted before the event occurred.

Kunneman and Van den Bosch [35] try to identify whether tweets are published before, during or after a scheduled or unscheduled event. They use five supervised classification methods: k-nearest neighbor, Winnow, SVM, MaxEnt and Naive Bayes. They were able to accurately determine whether a tweet was posted before a scheduled event, but had low performance rates for unscheduled events.

Hürriyetoğlu et al. [31] investigate the applicability of three methods to find the time-to-event based on Twitter messages: linear regression, local regression, and time-series analysis. They evaluate their methods using Twitter messages with hashtags referring to football matches in the Netherlands in a period of three weeks before these matches. They found that, looking at a period of 200 hours before the event, the time-series analysis is most accurate in predicting the event between 200 and 100 hours before the event, while in the last 100 hours before the event the local regression method performs better. However, both methods have an error of about a day for these two periods.

2.2.5 Event summarizing

Event summarizing is the task of creating a summary of an event based on the available information that can be found in related messages. This summary can be based on content (e.g. Twitter messages) or on meta-data that is contained in these messages [22]. Summarization has been a topic of interest for many researchers. Das and Martins [17] provide an overview of summarization techniques. They differentiate between techniques for summarization of single documents, techniques for multiple documents and other techniques. A number of these techniques, mainly those for summarization of multiple documents, might be (partly) applicable to Twitter messages, even though Twitter messages, with their limit of 140 characters, are significantly shorter than the documents these summarization methods are intended for.

A couple of works focus on summarizing events on Twitter. Nichols et al. create a system that detects events and provides a journalistic summary of these events. They test their method using messages about soccer matches [45]. O’Connor et al. [48] try to offer a solution for the large amount of data provided by social media by organizing the data for the user. In their system, the user enters a query, after which the results are grouped and summarized. Chakrabarti and Punera [15] also search for a solution for the large amount of data: they provide summaries of events on Twitter using an adapted version of the Hidden Markov Model. Sharifi et al. created an algorithm that creates a summary of a phrase that is specified by a user. Their system gathers Twitter messages based on the user query and provides a summary of the findings [65]. Takamura et al. summarize Twitter messages using an extractive approach. They experiment on soccer messages on Twitter [68], making factual summaries. Using their experiments they show that they are able to make good summaries of Twitter messages on sports matches.

Chapter 3

Definition of events

In the literature, several different definitions of what an event is are provided. In this chapter we provide an overview of these definitions. Based on these definitions, we identify five aspects of events that can be used to define what is meant by ‘event’.

A general definition of an event is provided by Kerman et al.: according to them an event is “a significant occurrence or large scale activity that is unusual relative to normal patterns of behaviour” [33]. Using this definition, event detection seems to be about finding abnormalities in the data, and outlier detection might be closely related, since both try to identify anomalous data points in a dataset. Two definitions for outliers are the following: “an outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs” by Grubbs [28], and “an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism” by Hawkins [30]. A crucial difference between outlier detection and event detection is that outliers refer to a more generic phenomenon that deviates from the rest of the data, while an event seems to refer to a deviating or outstanding occurrence in time and place. A more specific definition of an event would be “something (non-trivial) happening in a certain place at a certain time” [82].

In current research in event detection on social media, we find that very different aspects of events are used to define the type of event that is being detected. There are social events or real-world events [7], events and sagas [63], events and stories [3], and trends [42, 9]. Based on the various definitions, we find that an event always shows a change in a certain aspect of the state of the world. This can be a social occurrence like a party or a football match, a natural phenomenon like an earthquake, or something intangible like a change in sentiment.
We define an event as an occurrence that indicates a change in a certain aspect of the state of the world and that has the following properties: the event has a starting time and a duration, the event has a place associated with it, there can be people involved in the event, the event can be related to other events, and finally the event is about something: it has content. In the following sections we provide an overview of related work in event detection that uses these aspects of events.

3.1 Time and duration

An event has a beginning and an end. During the event, the state of the world is different from before and after the event. The start of an event is usually indicated by an increase of messages on Twitter concerning the event. Dou et al. offer a definition of event in the context of social media that includes the mentioning of time: “An occurrence causing change in the volume of text data that discusses the associated topic at a specific time. This occurrence is characterized by topic and time and often associated with entities such as people and location” [23]. Mathioudakis and Koudas also look at an increase of keywords being mentioned to find events: they differentiate between detecting bursty keywords and detecting trends [42]. They label a keyword as ‘bursty’ when it occurs at an unusually high rate in a stream of tweets. A trend is defined as a set of bursty keywords that often occur together. In the definition of Mathioudakis and Koudas, something that others might call an event is called a trend. They are not the only researchers who state that they detect trends or, in relation to Twitter, trending topics. For example, according to the definition of Benhardus, a trending topic is “a word or phrase that is experiencing an increase in usage, both in relation to its long-term usage and in relation to the usage of other words” [9].

Often, instead of ‘event detection’, researchers state that they look into ‘trend detection’. The term ‘trend’ then often refers to ‘trending topics’ [9], where trend detection has to do with identifying novel events [37]. However, the traditional meaning of trend has to do with tendency, implying that a trend has a longer duration or a less incidental character than an event. Oxford defines it as the general direction into which something is developing or changing [14]. In the work of Wu et al. [79], the following definition of a trend is provided: “The trend is an intrinsically fitted monotonic function or a function in which there can be at most one extremum within a given data span.” The term ‘trend’ is also used by Twitter to describe topics that are mentioned often by users, called ‘trending topics’. These ‘trending topics’ are defined by Twitter as follows: “Trends are determined by an algorithm ... this algorithm identifies topics that are immediately popular, rather than topics that have been popular for a while or on a daily basis”¹.

Based on these definitions, we differentiate between events and trends using the duration of the occurrence (or the mentioning thereof on Twitter). In this research, we consider a trend to follow the definition of Wu et al., with the extension that a trend is something that indicates a general direction of development and thus manifests for a longer period of time. In our definition a trend does not refer to an occurrence with a related time or place. Data indicating an event, on the other hand, shows a development of a certain topic that is comparable to a trend, but over a shorter period of time. For an event, the period of time in which this manifestation takes place clearly relates to the occurrence of the event.

3.2 Place

The occurrence of an event can be in the real world or online according to Becker et al. [7]. They investigate how to separate real world events from non-event content. Online there can be events that only consist of increased Twitter activity surrounding trending topics, but that do not reflect any event that happens in the real world. According to these authors these are non-events. Events that happen in the real world are called ‘real world events’. The authors use the following definition: “an event is defined as a real-world occurrence e with (1) an associated time period Te and (2) a time-ordered stream of Twitter messages Me, of substantial volume, discussing the occurrence and published during time Te”.

Examples of event detection applications in which the location of the event is important can be found in the works of Sakaki et al. [62] and Li et al. [39]. Sakaki et al. try to find earthquakes and typhoons and are also interested in their location. They determine the location of tweets using either the GPS location or the location of the Twitter user as entered in the Twitter profile. Li et al. [39] detect crime and disaster events and provide temporal and spatial information about these events. They determine the location of the tweets using three resources: the GPS tag in the tweet, which according to them is accurate but sparse; the location mentioned in the content of the tweet; and the location of the user, which is estimated from the historical Twitter messages of the user.

The location of tweets is not only a property of interest as a result of event detection, but can also be a starting point in event detection. Lee and Sumiya anticipate the development that smartphone applications become available that automatically update the status of the user on Twitter using the location of the user [38]. They assume that, through these applications, a large amount of data containing geosocial information will become available. In their research, Lee and Sumiya aim at performing large-scale geosocial event detection using Twitter messages. They divide their target area into regions, and for each region they determine the characteristic patterns: the number of tweets, the number of users and the movement activity of the users per time period. Finally they compare new data with the patterns determined using the training data. If the results are outside the usual range, the tweets are considered to indicate an event.

¹ http://support.twitter.com/entries/101125

3.3 People involved

People are involved in or affected by the events that are identified using event detection techniques on Twitter. Whether the people on Twitter are actually involved in the event, are a witness to the event or are interested in the event does not matter: by using Twitter they discuss or mention the event, which means they have some relation to the event and are therefore involved in it.

Some research involves detecting events in which people play a crucial role. For example, Petkos et al. look into the detection of social events, which are defined as “events that are organized by people and attended mostly by people who are not directly involved in the organization of the events” [54]. Other research looks for events that involve only specific people. Oostdijk and Van Halteren [49, 50] investigate threat detection on Twitter, where a threat indicates an increased risk that an interpersonal event will happen.

The relations between Twitter users can be used as a feature to find information on Twitter. In research about event detection on Twitter, Twitter users that talk about an event are used to find the location of the event [62, 39]. For this, not only the location given in the Twitter profile is used, but also assumptions about the behaviour of these people, to estimate their location. The location of the user might not be known, but according to Li et al. the location can be estimated using historical messages. According to them the location of the user is more likely to appear in their messages than other locations. Also, the locations of the friends of the user are often close to the location of the user, so interpersonal relations also provide information about events [39]. The relations between Twitter users are also used in research performed by Golbeck and Hanson into the political preferences of the audience of media outlets using Twitter [27].
The Twitter users they investigated were followers of a member of Congress, which, according to research, made it possible to label these Twitter users as liberal or conservative based on the voting of the Congress member. They then looked into the online media outlets these followers were following and used the voting preferences of these followers to determine the average political preferences of the followers of these online media outlets.

The value of the online behaviour of Twitter users compared to the real world is investigated by Tumasjan et al. [75]. These authors look into whether the messages on Twitter give a meaningful reflection of the political sentiment in the real world by comparing online and real world behaviour of people. They found that the tweets of politicians accurately reflected their behaviour in the real world.

The reaction of people to events can be useful to determine the meaning and impact of an event. It can be measured using sentiment analysis techniques. An overview of work in sentiment detection is provided by Pang and Lee [53]. Asur and Huberman [4] predict box-office revenues using Twitter messages. Go et al. classify Twitter messages as either positive or negative in relation to a specific query term [26]. Thelwall analyses sentiment on Twitter, relates it to events and finds that the sentiments around Twitter events are associated with negative sentiment strength [70].

3.4 Associated events

Some events do not stand on their own, but are a small part of a larger event or are related to other events in another way. Sayyadi et al. give a definition for long ‘broader context events’ and smaller events. They differentiate between ‘episodes’ and ‘sagas’: an episode refers to one event, while a saga contains a collection of events related within a broader context. They explain this using the presidential election as a saga, containing multiple episodes like a speech at a rally [63]. Allan et al. distinguish between ‘event’ and ‘story’: they assume that an information stream contains a sequence of multiple stories. These stories contain information about one or multiple events. They differentiate between unexpected and expected events: in case of an unexpected event, the event is followed by stories about the event, while in case of an expected event the event is both preceded and followed by stories referring to the event [3].

3.5 Content

According to our definition an event indicates a change in a certain aspect of the world. This aspect is the content or the topic of the event. These aspects might be natural phenomena or natural disasters, such as earthquakes and typhoons [62], disasters and crisis management [41, 46], political preferences, political sentiment or outcomes of elections [27, 75, 66, 73], disaster related events [39], geosocial events in general [38], crime-related events [39] or more specific crimes like threats [49, 50].

Chapter 4

Model of online and real-world social events

We compare real world social events with online events using a model that describes social events in the real world and a model that describes the online manifestation of these events. In this chapter we first describe a model for the real world events (Section 4.1), followed by the model for the online manifestation of events (Section 4.2). We end this chapter with a description of the methodology for the comparison of these two models (Section 4.3).

4.1 Model of real-world events

We define a model for the real world occurrence of the event using the characteristics of events that are defined in Chapter 3. These characteristics are time, place, people, associated events and content. When we look at what happens during an event, we find that the content of an event can often be described as a series of smaller events. Each sub-event has its own duration, place and content. The same is true for the associated events. We therefore define an event e ∈ E as a tuple (T, l, P, ea, c). Each event has a time period T = [t0, tend], a location le, people p ∈ P that are involved in the event, associated events ea ∈ AE ⊆ E and a description of the content of the event c. The associated events and the description of the content are either sets of events or a description d of the content. This description can be a description in natural language or a number. The event takes place in a context that also influences the outcome of the event. We model the context either as a sequence of events or as a continuous time function that describes the behaviour of an influential context variable. The notations we use for the definition of the real world events can be found in Table 4.1.
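As an illustration only, the event tuple could be represented as a small data structure. The sketch below is not part of the model definition: the class name `Event`, its field names, the method `is_defined_at` and the example values are assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Event:
    """Illustrative container for the tuple e = (T, l, P, e_a, c)."""
    period: Tuple[float, float]                               # T = [t0, t_end]
    location: Tuple[float, float]                             # l = (phi, lambda)
    people: List[str] = field(default_factory=list)           # P
    associated: List["Event"] = field(default_factory=list)   # e_a
    # c: a sequence of sub-events, a natural-language description or a number
    content: Union[List["Event"], str, float] = ""

    def is_defined_at(self, t: float) -> bool:
        """An event is only defined within its period [t0, t_end]."""
        t0, t_end = self.period
        return t0 <= t <= t_end


# Hypothetical example: a festival whose content is one scheduled concert.
concert = Event(period=(2.0, 3.0), location=(51.84, 5.86), content="concert")
festival = Event(period=(0.0, 10.0), location=(51.84, 5.86), content=[concert])
```

Because the content of an event may itself be a list of events, sub-events nest naturally in such a representation.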

4.1.1 Time

We define the model for the manifestation of real world events based on time. We use a discrete approach to describe the occurrences within the event and to define the occurrence of associated events before, during and after the event. An event occurs during a period T and has a beginning and an end, defined by [t0, tend], during which the event exists. Before and after this period the event is not defined.

Table 4.1: Notation real world model

Description              Definition
Events                   $E$
Event                    $e = (T, l, P, e_a, c)$
Associated events        $e_a \in AE \subseteq E$
Time period              $T = [t_0, t_{end}]$
Event location           $l_e \in L_e,\ l_e = (l, n)$
Event content            $c = (e_1, e_2, \ldots, e_n) \lor c \in D$
Content descriptions     $D$, with $\forall d \in D,\ d \in \mathbb{R} \cup \Sigma_{nat.lang.}$
Remote location          $l_r \in L_r,\ l_r = (l, t)$
Location                 $l = (\varphi, \lambda)$
Name                     $n$
People                   $\{A \cup R \cup Pf \cup S\} \subseteq P$
Event category           $t \in \{UME, PU, PE, PA\}$
Location category tag    $t \in \{H, E, F, R, N\}$
Context                  $co = (e_1, e_2, \ldots, e_n) \lor co \in f(t)$

Figure 4.1: Event and occurrence of properties over time

Figure 4.1 provides an example of how an event might be defined in time. In this figure the lower line represents the time axis. The bars above this axis are the event and occurrences of specific features of the event. This means that an event can be described with a list of events and related events that occurred at a specific period between the start and the end of the event. This also implies that there might be periods between the start and the end of the event for which no description by means of occurrences of events is available, as can be seen in the figure between a and b and between d and e, and that there can be moments at which multiple events are happening, as is the case at the points of overlap between b and c, c and d, and e and f.

4.1.2 Place

The place where the event takes place can be split into the primary location of the event and the secondary locations where the event is followed remotely, using internet, television or radio. The primary location is the actual, physical location where the event takes place. This location is denoted as Le and can be expressed using geographical coordinates (φ, λ) and the name n of the place where the event is hosted, if the place is named.

The secondary locations are places where the event is experienced through TV broadcasts, live broadcasts on the internet, social media or news feeds on the internet that offer regular updates. These locations can be anywhere: people often watch television at home or in bars. When people follow the event using the internet, they can be anywhere they have internet access. The set of remote locations where the event is experienced, LR, consists of locations that are denoted using geographical coordinates (φ, λ) and a tag that describes the type of location: home (H), place of entertainment (E), work (W), on the road (R), nature (N), with friends or family (F). The notations for the place of the event are listed in Table 4.1.

Figure 4.2: People that are involved in the event: Audience (A), Performers (Pf), Staff (S) and Reporters (R)

4.1.3 People involved

People that are involved in the event can be split into four groups: the performers Pf, the audience A, the employees S (staff) of the event and the reporters R. The reporters can be television reporters or journalists of (online) newspapers, but also people that publish developments of the event on social media. These groups can overlap: e.g., performers can become audience before or after their performance, and members of the audience can become reporters when they start to publish news using social media. Part of the audience is not present at the event, but is remotely included in the event through information provided by the reporters using any type of (real-time) medium.

4.1.4 Associated events

We differentiate between different types of associated events: unscheduled micro-events during the event (UME), unrelated events that occur on the same day (UE), previous editions of the event (PE) and events that target the same audience and are scheduled close to the current event (PA). Each associated event ea ∈ AE is defined as an event as shown in Table 4.1.

The unscheduled micro-events are incidents that occur during the event, which can range from, for example, someone who suddenly collapses to a spontaneous performance or other spontaneous actions, small disturbances or even riots. (Scheduled sub-events are part of the content of the event, see Section 4.1.5.)

Unrelated events that occur on the same day are events that have nothing in common with the event itself, but might influence the event because they occur on the same day and the people that attend the event might be concerned with, interested in or influenced by this other event. An example is a soccer match during a music festival: if these two events occur on the same day, people present at the music festival might simultaneously be following the soccer match. The outcome of this match might influence their mood and might therefore influence the course of the event.

Previous editions of the same event might be the subject of conversations during the event and might influence the expectations of the people present at the event.

Events that target the same audience and are scheduled close to the current event might also be a topic of conversation. The course of past events might also influence the course of this event when the same people are present. For example: if the same group of people visits multiple events of the same type and there are always fireworks at the end of these events, they might be very disappointed when this does not happen at the current event.

4.1.5 Content

The content of the event can be described as a sequence of scheduled or expected small events that occur within the borders of the event region and within the period in which the event is scheduled. This content of the event is based on the program for the event and consists of small events such as talks, performances or matches. The events are described using the notation for events that is listed in Table 4.1. These smaller events might overlap in time or might even occur simultaneously.

4.1.6 Context

The context of the event might influence the event, even though the contextual variables do not have a direct relation with the event. An example of a context variable is the weather. The weather has no direct relationship with the program of an event, but changes in the weather might cause changes in the program of the event or might change the sentiment of the event. The context co is either modelled as a sequence of events (e1, e2, ..., en), or as a continuous time function f(t) that describes the behaviour of an influential context variable. (See also Table 4.1.)

4.2 Model of online manifestation of events

We define a model that describes the manifestation of social events that happen in the real world online on Twitter. We base our model on the five characteristics of an event as defined in Chapter 3: time and duration, place, people involved, associated events and content. For each characteristic, we define a number of metrics to make these characteristics measurable. In the following sections we explain and motivate the various metrics that we use to measure characteristics of social events online.

4.2.1 Notation

We use a dedicated set of symbols to describe our model of the online manifestation of events. For this description we assume that we study a set of Twitter messages M which is relevant to the event that is being observed. This set of Twitter messages contains a number of messages m ∈ M. Each of these messages is defined as a 4-tuple containing the time t at which the message is published, the user u that published the message, the location l from which it is published and the content c.

A person that publishes messages on Twitter is defined as a 7-tuple. The person has a unique name n, a set of friends fr and a set of followers fo, a place of residence rl, a current location l, a gender g and an age a. Person p exists in the set of persons P. There are four partly overlapping subsets within the set of persons. The first group is the set of users U. The users are the Twitter users that publish messages that are part of set M, containing messages about the event that is being investigated. The second and third group are the friends and followers of the users. Followers are the users that subscribed to the status updates of the user, while friends are the Twitter users that are being followed by the users in U. The friends and followers of the users might overlap with the group of users and also with each other. The fourth group is the set of relatives. This is the group of persons that are follower or friend to the users, but are not part of the group of users. These groups are described in Figure 4.3. The dots each indicate a person, the arrow heads point from a follower to a friend.

For some of the metrics that we use, it is necessary to walk through a set of sets. For example, the set of friends can be described using the following notation: $\bigcup_{u_i \in U_{M_x}} fr_{u_i}$. We use the notation $U_{M_x}$ to indicate that we take the set of all users U that belong to all the messages in the set of messages $M_x$, which could also be expressed using $\bigcup_{m_i \in M_x} u_{m_i}$. To indicate that we use all friends of the i-th user $u_i$, we use the notation $fr_{u_i}$. The symbols used in our model are also shown in Table 4.2.

Table 4.2: Notation online model

Description                  Definition
Twitter message              $m_i = (t, u, l, c)$, $m \in M$
Person                       $p = (n, fr, fo, rl, l, a, g) \in P$
Users                        $U = \bigcup_{m_i \in M} u_{m_i} \subseteq P$
Friends                      $fr \in Fr \subseteq P$
Followers                    $fo \in Fo \subseteq P$
Relatives                    $R = \bigcup_{m_i \in M} (Fo_{u_{m_i}} \cup Fr_{u_{m_i}}) \setminus U$
Persons                      $P = R \cup U$
Name                         $n$, with $\forall p_x, p_y \in P,\ n_{p_x} \neq n_{p_y}$
Age                          $a \in \mathbb{N}$
Gender                       $g \in \{♀, ♂\}$
Time                         $t \in T$, with $t_{m_i} \le t_{m_{i+1}}$
Location (long. and lat.)    $l = (\varphi, \lambda)$
Location of residence        $rl$
Distance                     $D(l_x, l_y) = \sqrt{|\varphi_{l_x} - \varphi_{l_y}|^2 + |\lambda_{l_x} - \lambda_{l_y}|^2}$
Twitter alphabet             $\Sigma$ = set of UTF8-characters
Hashtag                      $ht = \#\Sigma^*$, $|ht| \le 140$
Mention                      $dir = @\Sigma^*$, $|dir| \le 140$
Emoticon                     $e \in E = \{$ :-), ;-), ... $\}$
Twitter text                 $s = \Sigma^*$, $|s| \le 140$
Twitter language             $TL = \{c = (dir, ht, e, s)^* \mid |c| \le 140\}$

Figure 4.3: Different groups of Twitter users. Panels: (a) Users: $\bigcup_{m_i \in M} u_{m_i}$; (b) Friends: $\bigcup_{u_i \in U} fr_{u_i}$; (c) Followers: $\bigcup_{u_i \in U} fo_{u_i}$; (d) Relatives: $(Fr \cup Fo) \setminus U$. The black points and the red squares are Twitter users. The users that are depicted in the large circle have posted tweets about the events, and form the group of users (U), the users that are outside this circle are friends and relatives of these users, but have not posted tweets about the events themselves. For each sub-figure: the red points are the users that are described in the formula in the caption. The arrows indicate the direction of the relation. An arrow from a to b indicates that a is a follower of b and that b is a friend of a.
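The groups of persons described above can be sketched with plain set operations. This is a minimal sketch, not the implementation used in this work: the function names, the dictionary-based representation of friends and followers and the example data are assumptions.

```python
from typing import Dict, List, Set, Tuple

# A message m = (t, u, l, c): time, user, location and content.
Message = Tuple[float, str, Tuple[float, float], str]


def users(messages: List[Message]) -> Set[str]:
    """U: every user that published at least one message in M."""
    return {u for (_t, u, _l, _c) in messages}


def relatives(messages: List[Message],
              friends: Dict[str, Set[str]],
              followers: Dict[str, Set[str]]) -> Set[str]:
    """R: friends and followers of the users that are not users themselves."""
    active = users(messages)
    linked: Set[str] = set()
    for u in active:
        linked |= friends.get(u, set()) | followers.get(u, set())
    return linked - active


# Hypothetical data: a and b tweet about the event, c and d do not.
M = [(1.0, "a", (0.0, 0.0), "tweet 1"), (2.0, "b", (0.0, 0.0), "tweet 2")]
fr = {"a": {"b", "c"}, "b": set()}   # accounts that each user follows
fo = {"a": {"d"}, "b": {"a"}}        # accounts that follow each user
```

In this example, `users(M)` contains a and b, while `relatives(M, fr, fo)` contains only c and d, since b is a friend of a but is also a user.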

4.2.2 Time and duration

We define the model for the manifestation of events based on time. We therefore express the metrics about location, people, associated events and content as a function of time. Some metrics, like the event moving speed which is defined in Section 4.2.3, are already defined using time. Other properties can be expressed as a function of time using a sliding window. We define property P as a function of time t using window w as described in Equation 4.1. Using this sliding window, we only calculate the property based on a part of the full set of messages M, which is the portion of messages that fits within the window w: Mw.

\[ M_w = \bigcup_{t_{m_i}}^{(t_{m_i} - w)} m_i \tag{4.1} \]

Intensity

The intensity of the event is measured using the number of messages that are being sent, based on the assumption that people increase the amount of information they share when they experience the event as intense. We use a time window w to measure the intensity: we count the messages that are published within the window and divide this count by the window size. We express intensity in Equation 4.2.

\[ \mathrm{Intensity}(t) = \frac{\sum_{m_t \in M,\ 0 \le t - m_t \le w} 1}{w} \tag{4.2} \]
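The intensity metric can be sketched in a few lines. This is a minimal sketch under the assumption that each message is represented only by its publication time; the function name `intensity` and the example times are illustrative.

```python
from typing import Iterable


def intensity(message_times: Iterable[float], t: float, w: float) -> float:
    """Number of messages published in the window [t - w, t], divided by w."""
    count = sum(1 for t_m in message_times if 0 <= t - t_m <= w)
    return count / w


# Hypothetical publication times (e.g. in hours since the start of the event).
times = [1.0, 2.0, 3.0, 10.0]
```

Evaluating `intensity(times, t, w)` for a range of values of t yields the intensity as a function of time.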

4.2.3 Place

We use a couple of metrics that are based on the location of users or the event. To use these metrics, we first need to define some location types. We then use these definitions to describe location-dependent metrics.

Definitions

In our model we consider the location of the event el, the current location of the Twitter user ul and the living place of the Twitter user rlu. We define the location of the event el as the location where most people are at the time they are posting their tweets: the center of a region with event_diameter = 2r in which the density of tweets with a tag referring to the event is highest compared to alternative regions.

\[ el = \arg\max \sum_{l_{m_i} \in M \,\mid\, D(l_{m_x}, l_{m_i}) < r} 1 \quad \text{subject to } l_{m_x} \in L_M \tag{4.3} \]
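The definition of the event location can be sketched as a search over the tweet locations. The sketch below rests on several assumptions: the candidate centers are the tweet locations themselves, the distance is the Euclidean distance on the coordinates, a tweet counts when it lies closer than r to the candidate, and the first candidate wins in case of a tie. The function names are illustrative.

```python
import math
from typing import List, Tuple

Location = Tuple[float, float]  # (phi, lambda)


def distance(a: Location, b: Location) -> float:
    """Euclidean distance on the coordinates (an assumption of this sketch)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def event_location(tweet_locations: List[Location], r: float) -> Location:
    """Tweet location with the most tweets closer than r (first one wins ties)."""
    def tweets_nearby(center: Location) -> int:
        return sum(1 for l in tweet_locations if distance(center, l) < r)
    return max(tweet_locations, key=tweets_nearby)


# Hypothetical tweet locations: three close together and one far away.
locations = [(1.0, 0.0), (0.0, 0.0), (-1.0, 0.0), (10.0, 10.0)]
```

This brute-force search compares every pair of tweet locations, so it is only practical for small sets of geotagged tweets.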

The living place of the Twitter user (rlu) can be described as an optimization problem, based on the work of Li et al. [39]. Their model relies on a couple of observations about the behaviour of the user: the location of the user is more likely to appear in the tweets of the user than other locations, the friends of the user tend to be close to the user, and the user has mentioned his location at least once or the location is the same as the location of his friends. In this model rlu is the place of residence of user u, MUu ⊆ M is the set of tweets of user u in which locations are mentioned and Fu is the set of locations of the friends of the user. The location for user u is described as follows:

\[ rl_u = \arg\min \left( \sum_{m_i \in MU_u} D(rl_x, l_{m_i}) + \sum_{f_j \in F_u} D(rl_x, rl_{f_j}) \right) \quad \text{subject to } l_{u_x} \in L_{MU_u \cup F_u} \tag{4.4} \]

We assume that the current location of the user (ul) is equal to the location of the tweets.

\[ ul = l_{m_x} \tag{4.5} \]

User involvement

For each event, people will be participating from the location of the event, but also from various other places in the world. We distinguish between interested users and involved users. Interested users tweet opinions from another location than the event, while involved users are present at the location of the event. We use the event diameter that is also used in the definition of the location of the event. The user involvement, the fraction of users that are involved, is defined in Eq. 4.8.

\[ \forall u \in U_{inv}: \; D(l_u, l_e) \le \text{event diameter} \tag{4.6} \]

\[ \forall u \in U_{int}: \; D(l_u, l_e) > \text{event diameter} \tag{4.7} \]

\[ \mathrm{Involvement}(U) = \frac{|U_{inv}|}{|U_{int}| + |U_{inv}|} \tag{4.8} \]
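The user involvement can be sketched as follows. The sketch assumes that the current location of every user is known, that the distance is the Euclidean distance on the coordinates and that the set of users is not empty; the function name and the example data are illustrative.

```python
import math
from typing import Dict, Tuple

Location = Tuple[float, float]  # (phi, lambda)


def involvement(user_locations: Dict[str, Location], event_loc: Location,
                event_diameter: float) -> float:
    """Share of users whose distance to the event is at most event_diameter."""
    involved = sum(
        1 for (phi, lam) in user_locations.values()
        if math.hypot(phi - event_loc[0], lam - event_loc[1]) <= event_diameter
    )
    return involved / len(user_locations)


# Hypothetical current locations of three tweeting users.
current = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (10.0, 0.0)}
```

With the event at (0, 0) and an event diameter of 2, users a and b count as involved and user c as interested, so the involvement is 2/3.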

Event moving speed Some events are stationary, other events move. A demonstration, for example, physically moves from one place in a city to another place. Holidays like new year or Christmas move over the world like a wave. After-parties can move the event from the original place to another place in the city. Football riots do not always just happen in the football stadium, but can also start before the match or continue on the way home. The speed of movement of the event is defined in Eq.4.9.

$$v_{event}(t) = \frac{D(el_{t_i}, el_{t_{i-1}})}{\Delta t} \tag{4.9}$$
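Eq. 4.9 can be sketched as follows for a sequence of event locations. For brevity the example assumes planar coordinates and uses the Euclidean distance as a stand-in for D; any other distance function can be passed in. The name `event_speed` and the structure of `track` are assumptions made for this illustration.

```python
import math


def event_speed(track, distance=math.dist):
    """Speed of the event between consecutive observations (Eq. 4.9).

    track: list of (time, position) tuples sorted by time.
    Returns a list of (time, speed) tuples, one per consecutive pair.
    """
    return [
        (t1, distance(p0, p1) / (t1 - t0))
        for (t0, p0), (t1, p1) in zip(track, track[1:])
    ]
```

An event that moves 5 distance units in the first 10 time units and then stays in place yields the speeds 0.5 and 0.0.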

Distance event to place of residence The average distance D of the place of residence of Twitter users that participate in the discussion to the location of the event is defined in Eq. 4.10.

$$D(l_e, U) = \frac{1}{|U|} \sum_{u_i \in U} D(l_e, rl_{u_i}) \tag{4.10}$$

Average distance between Twitter users The average distance of place of residence of Twitter users that participate in the discussion to each other is defined in Eq. 4.11.

$$D(U) = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} D(rl_{u_i}, rl_{u_j}) \tag{4.11}$$
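The two distance averages of Eqs. 4.10 and 4.11 can be sketched as follows. The example assumes planar coordinates with the Euclidean distance as a stand-in for D; the function names are hypothetical.

```python
import math
from itertools import combinations


def average_distance_to_event(event_location, residences, distance=math.dist):
    """Mean distance between the event and the users' residences (Eq. 4.10)."""
    return sum(distance(event_location, r) for r in residences) / len(residences)


def average_pairwise_distance(residences, distance=math.dist):
    """Mean distance over all unordered pairs of residences (Eq. 4.11)."""
    n = len(residences)
    if n < 2:
        return 0.0  # assumption: a single user has no pairwise distance
    total = sum(distance(a, b) for a, b in combinations(residences, 2))
    return 2 * total / (n * (n - 1))
```

Iterating over `combinations(residences, 2)` visits every pair with $i < j$ exactly once, which corresponds to the double sum in Eq. 4.11.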

4.2.4 People

We model metrics that capture characteristics of the people that are involved in the event. We define metrics that are based on individual characteristics of people, such as age or gender, and metrics that are based on characteristics of the group of users that is active on Twitter.

Gender of the users We define the male ratio as the number of active male Twitter users divided by the total number of active users. The male ratio is given in Eq. 4.12:

$$\text{Male}(U) = \frac{\sum |U_{♂}|}{\sum |U_{♂} + U_{♀}|} \tag{4.12}$$

Age of the participants The average age of participants in the discussion is defined in Eq. 4.13:

$$\text{Average age}(U) = \frac{\sum_{u_i \in U} a_{u_i}}{|U|} \tag{4.13}$$

User activity The activity of the users is determined by comparing the number of tweets within a certain time window to the average number of tweets per time window based on the full dataset. We define this activity measure in Equation 4.14, where the expected value is given in Eq. 4.15.

$$\text{userActivity}(t) = \frac{\sum_{m \in M,\; t - w \le t_m \le t}}{P_m} \tag{4.14}$$

$$P_m = \frac{|M| / |U|}{(t_{end} - t_0) / w} = \frac{|M|\,w}{|U|\,(t_{end} - t_0)} = \frac{|M|\,w}{|U|\,|M|} = \frac{w}{|U|} \tag{4.15}$$

We also look at the change in activity, for which we use the first derivative of the user activity (Equation 4.16).

$$\text{activityChange}(t) = \text{userActivity}'(t) \tag{4.16}$$
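The activity measure can be sketched as below. The sketch assumes that the expected value is taken from the second form of Eq. 4.15, $|M| w / (|U| (t_{end} - t_0))$, and that the derivative in Eq. 4.16 is approximated by the difference between two consecutive windows. Both choices, as well as the function names, are assumptions made for this illustration.

```python
def user_activity(timestamps, n_users, t, w):
    """Number of tweets in [t - w, t] relative to the expected number (Eq. 4.14)."""
    t0, t_end = min(timestamps), max(timestamps)
    expected = len(timestamps) * w / (n_users * (t_end - t0))
    in_window = sum(1 for ts in timestamps if t - w <= ts <= t)
    return in_window / expected


def activity_change(timestamps, n_users, t, w):
    """Backward difference as a discrete stand-in for userActivity'(t)."""
    return (user_activity(timestamps, n_users, t, w)
            - user_activity(timestamps, n_users, t - w, w))
```

For tweets that arrive at a constant rate the change in activity is zero, and a burst of tweets shows up as a positive change followed by a negative one.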

Relations between people The relations between the users and their friends and followers can be described using centrality measures that originate from graph theory. We view the group of users and their friends and followers as nodes and describe their relations as edges. The centrality measures give an indication of how connected the network is. A high value implies that many people know each other without having many links in between them. A low value means that many people are needed to connect two people to each other via existing relationships. We use the degree centrality, the closeness centrality and the betweenness centrality to define the centrality of the group of active users. The degree centrality describes the number of paths from a node to other nodes in the network, compared to the possible number of paths in the network [12]. The degree centrality is defined in Equation 4.17.

$$C_D(u_i) = \frac{|Fr_{u_i} \in U \cup Fo_{u_i}|}{|U|} \tag{4.17}$$

The degree centrality for the group of users can be calculated using Freeman's general formula for centralization [25]. This measure for degree centrality is defined in Equation 4.18. In this formula, $C'_D(p^*)$ is the maximum centrality value in the network, $C'_D(p_i)$ is the degree centrality value for node $p_i$, and $n$ is the number of nodes.

$$C_D = \frac{\sum_{i=1}^{n} [C'_D(p^*) - C'_D(p_i)]}{n^2 - 3n + 2} \tag{4.18}$$

The closeness centrality of a node is a number that describes the sum of the lengths of the shortest paths to all the other nodes in the network [64]. In a social network this describes how connected people are with each other. The shortest path between two nodes a and b is called the geodesic and is referred to as $g(a, b)$. The closeness centrality for a node in a network is defined in Equation 4.19.

$$C_C(u_i) = \sum_{u_j \in U \mid u_j \ne u_i} g(u_i, u_j) \tag{4.19}$$

The closeness centrality for the group of users can be calculated using Freeman's general formula for centralization [25]. This measure for closeness centrality is defined in Equation 4.20.

$$C_C = \frac{\sum_{i=1}^{n} [C'_C(p^*) - C'_C(p_i)]}{(n^2 - 3n + 2)/(2n - 3)} \tag{4.20}$$

The betweenness centrality of a node in a network expresses the number of geodesic paths that pass through the node [64]. This value is calculated for each node separately and is defined in Equation 4.21. In this equation the geodesic path between nodes a and b is called $g(a, b)$; the geodesic that leads through node $i$ is expressed as $g_i(a, b)$. When there is no path between a and b, the distance is usually set to infinity.

$$C_B(i) = \sum_{j} \frac{g_i(ab)}{g_j} \tag{4.21}$$

Analogous to the degree and closeness centrality, the betweenness centrality for the group of users is calculated using Freeman's general formula for centralization [25], as defined in Equation 4.22.

$$C_B = \frac{\sum_{i=1}^{n} [C'_B(p^*) - C'_B(p_i)]}{n^3 - 4n^2 + 5n - 2} \tag{4.22}$$
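To illustrate how Freeman's centralization behaves, the following sketch evaluates the degree variant (Eq. 4.18) on a small undirected graph. It assumes that the graph is given as an adjacency dictionary and that the raw degree counts are used as node centralities; `degree_centralization` is a hypothetical helper, not the implementation used in this work.

```python
def degree_centralization(adjacency):
    """Freeman degree centralization of an undirected graph (Eq. 4.18).

    adjacency: dict mapping each node to the set of its neighbours.
    Assumption: raw degree counts are used as node centralities.
    """
    n = len(adjacency)
    if n < 3:
        raise ValueError("centralization needs at least three nodes")
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / (n * n - 3 * n + 2)
```

A star graph, in which one node is connected to all others, is the most centralized shape for the degree measure and yields the maximum value of 1, while a graph in which all nodes have the same degree yields 0. Sparse networks without a dominant hub therefore produce values close to 0.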

Inclusiveness The inclusiveness describes how many people that are involved in the event do not have any relations with other users that are involved in the event. We define the inclusiveness in Equation 4.23.

$$\text{Inclusiveness}(U) = \frac{U \setminus (U \cap (Fr \cup Fo))}{|U|} \tag{4.23}$$

Social Equality In the group of users, the social equality expresses the ratio of the number of bidirectional relations to the number of unidirectional relations. This is calculated by taking the number of relations in which the users are both each other's friend and follower and dividing that number by the number of relations that consist of only being a friend or only being a follower.

$$\text{Equality}(U) = \sum_{u_i \in U} \frac{|Fr_{u_i} \in Fr \cap Fo_{u_i} \in Fo|}{|Fr_{u_i} \in Fr \,\triangle\, Fo_{u_i} \in Fo|} \tag{4.24}$$

Popularity of the Twitter users We define the popularity as the number of followers in relation to the number of friends the user has. If a user has more followers than friends, the popularity of this user is higher than when the number of friends is higher than the number of followers. To express the direction of the relations within the network, we use the indegree and outdegree of the graph [64]. The number of followers for each user equals the indegree of each user or node in the graph. We calculate the average indegree as defined in Equation 4.25.

$$mFo(U) = \frac{1}{|U|} \sum_{u_i \in U} |Fo_{u_i}| \tag{4.25}$$

The number of friends for each user equals the outdegree of each node in the graph. We calculate the average outdegree as defined in Equation 4.26.

$$mFr(U) = \frac{1}{|U|} \sum_{u_i \in U} |Fr_{u_i}| \tag{4.26}$$

We calculate the popularity as defined in Equation 4.27.

$$\text{Popularity}(U) = \frac{mFo(U)}{mFr(U)} \tag{4.27}$$
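The relation-based measures of Eqs. 4.24 to 4.27 map directly onto set operations. The sketch below assumes that the friends and followers of each user are available as sets of user ids and that users without unidirectional relations are skipped in the equality sum to avoid a division by zero. The function names and this skipping rule are assumptions made for this illustration.

```python
def social_equality(friends, followers):
    """Sum over users of |Fr ∩ Fo| / |Fr △ Fo| (Eq. 4.24).

    friends, followers: dicts mapping a user id to a set of user ids.
    """
    total = 0.0
    for user in friends:
        mutual = friends[user] & followers[user]    # friend and follower
        one_way = friends[user] ^ followers[user]   # only friend or only follower
        if one_way:
            total += len(mutual) / len(one_way)
    return total


def popularity(friends, followers):
    """Average number of followers divided by average number of friends (Eq. 4.27)."""
    mean_followers = sum(len(s) for s in followers.values()) / len(followers)
    mean_friends = sum(len(s) for s in friends.values()) / len(friends)
    return mean_followers / mean_friends
```

A popularity above 1 means that the users in the group have, on average, more followers than friends.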

Experience The experience of the user is defined as the number of tweets that have been published by this user. The experience is defined in Equation 4.28.

$$\text{Experience}(u_i) = \sum_{m_j \in M} u_{m_j} = u_i \tag{4.28}$$

People of interest For each group of users the people of interest ratio is determined by dividing the number of famous people and other people of interest by the total number of users. This ratio is defined in Equation 4.29.

$$\text{People of interest}(U) = \frac{|U_{int}|}{|U|} \tag{4.29}$$

4.2.5 Associated events

The associated events can be sub-events, events that occur on the same day or in the same period, or events that are related because of the type of event. Hash tags are used not only to label events with the name of the event, but also to label the event with a qualification, theme, or relation to a place, company, organization or other events. We assume that if an event has an associated event, tags that refer to this associated event co-occur with the tags of the main event.

Number of Topics We calculate the average number of topics for the tweets by assuming that each topic is indicated using a hash tag. We calculate both the number of unique topics within the set of messages, as expressed in Equation 4.30, and the average number of topics mentioned per message, which is defined in Equation 4.31.

$$\text{Sum unique topics}(M) = \left| \bigcup_{m_i \in M} tag_{c_{m_i}} \right| \tag{4.30}$$

$$\text{Average topics}(M) = \frac{\sum_{m_i \in M} |tag_{c_{m_i}}|}{|M|} \tag{4.31}$$
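A minimal sketch of Eqs. 4.30 and 4.31 is given below. It assumes that hash tags can be found with a simple regular expression and that tags are compared case-insensitively; the pattern and the function name `topic_counts` are assumptions made for this illustration.

```python
import re

HASHTAG = re.compile(r"#(\w+)")  # assumed hash tag pattern


def topic_counts(messages):
    """Return (number of unique tags, average number of tags per message)."""
    tags_per_message = [
        {tag.lower() for tag in HASHTAG.findall(message)} for message in messages
    ]
    unique_tags = set().union(*tags_per_message)
    average = sum(len(tags) for tags in tags_per_message) / len(messages)
    return len(unique_tags), average
```

For the three messages "Op naar #pp13 #pinkpop", "#pp13 was geweldig" and "geen tags", the sketch finds two unique topics and an average of one topic per message.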

Co-occurrence of tags We cluster tags based on co-occurrence, and assume that tags that occur together distinguish different aspects of events, different sub-events or associated events. We cluster these tags by counting how often the tags occur in the same messages. If the tags co-occur more than n times, we assume that together they describe specific properties of the event.
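The counting step can be sketched as follows. The sketch assumes that the tags of each message are available as a set and only covers the counting and thresholding of tag pairs, not a full clustering; `cooccurring_tag_pairs` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def cooccurring_tag_pairs(tag_sets, n):
    """Return the tag pairs that occur together in more than n messages.

    tag_sets: one set of tags per message.
    """
    counts = Counter()
    for tags in tag_sets:
        # Sorting gives every pair a canonical order, so (a, b) and (b, a)
        # are counted as the same pair.
        counts.update(combinations(sorted(tags), 2))
    return {pair: count for pair, count in counts.items() if count > n}
```

The resulting pairs can be read as edges of a tag graph, in which connected groups of tags form candidate sub-events or associated events.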

4.2.6 Content

We defined variables that describe different characteristics of the tweet that are based on the content of the tweet.

Originality We calculate the average originality of the contributions as the number of messages that are not a retweet divided by the total number of messages (Equation 4.32).

$$\text{Originality}(M) = \frac{\sum_{m_i \in M \mid rt_{c_{m_i}} = \text{false}}}{\sum_{m_i \in M}} \tag{4.32}$$

Emotionality In tweets the author sometimes indicates emotions using emoticons. The usage of emoticons depends on the context: emoticons are used more in a social context than in a task-oriented context [20], they are used more often with friends than with strangers [19], and the exact meaning of the usage of emoticons still seems to be unknown. Nevertheless, emoticons are used in sentiment analysis on Twitter in several projects [34, 44]. We count the number of emoticons in the message to define the emotionality of the message. We define the emotionality for the messages within a time window as the sum of the number of emoticons for each message divided by the total number of messages. The emotionality is defined in Equation 4.33.

$$\text{Emotionality}(M) = \frac{\sum_{m_i \in M} \left( \sum_{e_j \in E(c_{m_i})} 1 \right)}{|M|} \tag{4.33}$$

Sentiment We measure the positivity using a list L containing sentiment values for words. The sentiment value for a word w is defined as L(w).

The words in the content of message $m_i$ are obtained with the function $words(c_{m_i})$. The sentiment for the messages $M$ is expressed as the sum of the sentiment values for all words in the tweets divided by the number of messages and is defined in Equation 4.34.

$$\text{Sentiment}(M) = \frac{\sum_{m_i \in M} \left( \sum_{w \in words(c_{m_i})} L(w) \right)}{|M|} \tag{4.34}$$
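Eq. 4.34 can be sketched with a dictionary as the sentiment list L. The sketch assumes that words are obtained by lowercasing and splitting on whitespace and that words that are missing from the list contribute a value of 0; the example lexicon is made up for this illustration.

```python
def sentiment(messages, lexicon):
    """Summed word sentiment divided by the number of messages (Eq. 4.34).

    lexicon: dict mapping a word to its sentiment value L(w).
    """
    total = sum(
        lexicon.get(word, 0.0)
        for message in messages
        for word in message.lower().split()
    )
    return total / len(messages)


# Hypothetical two-word lexicon, for illustration only.
example_lexicon = {"geweldig": 2.0, "slecht": -1.5}
```

Because the sum is divided by the number of messages and not by the number of words, longer messages can contribute more to the sentiment than shorter ones.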

Newsworthiness We calculate the newsworthiness by determining whether a message contains news or chatter using various properties of the tweet. A mention is considered an indication that the message contains conversation, while we assume that tweets that are not addressed to individual users are more likely to be news or of interest to every Twitter user. We also assume that question marks are part of conversations, since they ask for a reply. If messages contain emoticons we assume that they are part of chatter [20]. A tweet that contains a url is expected to be more likely newsworthy than to contain chatter. We determine whether a message contains news by computing the newsworthiness score in Equation 4.35. The variables a to e will be determined based on experiments and the characteristics of the dataset.

$$\text{News}(m) = \begin{cases} +a & \text{if } url \in c_m \\ +b & \text{if } @ \notin c_m \\ +c & \text{if } ? \in c_m \\ +d & \text{if } emoticon \in c_m \\ +e & \text{if } @ \in c_m \end{cases} \tag{4.35}$$

We calculate the average newsworthiness for a set of messages as defined in Equation 4.36.

$$\text{Newsworthiness}(M) = \frac{\sum_{m_i \in M} \text{News}(m_i)}{|M|} \tag{4.36}$$
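The additive score of Eq. 4.35 can be sketched as follows. The weights a to e are not fixed in the text, so the default values below (positive for news indicators, negative for chatter indicators) are placeholders chosen purely for illustration. The url and emoticon patterns are simplified assumptions as well.

```python
import re

URL = re.compile(r"https?://\S+")                # assumed url pattern
EMOTICON = re.compile(r"[:;=][-oO]?[()\[\]DpP]")  # assumed emoticon pattern


def news_score(text, a=1.0, b=1.0, c=-1.0, d=-1.0, e=-1.0):
    """Additive newsworthiness score of one message (Eq. 4.35).

    The default weights are hypothetical placeholders.
    """
    score = 0.0
    if URL.search(text):
        score += a
    score += e if "@" in text else b
    if "?" in text:
        score += c
    if EMOTICON.search(text):
        score += d
    return score


def newsworthiness(messages, **weights):
    """Average score over a set of messages (Eq. 4.36)."""
    return sum(news_score(m, **weights) for m in messages) / len(messages)
```

With these placeholder weights a message with a url and no mention scores 2, while a message with a mention, a question mark and an emoticon scores -3. Note that a url with a query string also contains a question mark, which a more careful implementation would need to exclude.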

Readability Some tweets contain more complex sentences than other tweets. For texts, readability measures exist that are based on the length of the sentences and the length of the words. We base our readability scale on the Flesch-Douma scale¹. Using this scale, the words in the content of message $m_i$ are obtained with the function $words(c_{m_i})$. The word length of word $w$ is defined as $|w|$. The readability for a set of messages is defined in Equation 4.37.

¹ http://www.kennislink.nl/publicaties/hoe-begrijpelijk-is-mijn-tekst

$$\text{Readability}(M) = \frac{\sum_{m_i \in M} \left( 206.84 - 0.77 \times \frac{\sum_{w \in words(c_{m_i})} |w|}{|words(c_{m_i})|} - 0.93 \times |words(c_{m_i})| \right)}{|M|} \tag{4.37}$$
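Eq. 4.37 can be sketched as follows, assuming that words are obtained by splitting the message text on whitespace and that messages without words are left out of the average. These choices and the function name are assumptions made for this illustration.

```python
def readability(messages):
    """Average adapted Flesch-Douma score over a set of messages (Eq. 4.37)."""
    scores = []
    for message in messages:
        words = message.split()
        if not words:
            continue  # assumption: empty messages are skipped
        average_word_length = sum(len(w) for w in words) / len(words)
        scores.append(206.84 - 0.77 * average_word_length - 0.93 * len(words))
    return sum(scores) / len(scores)
```

Both longer words and more words per message lower the score, so a higher value corresponds to a message that is easier to read.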

4.3 Comparison between models

We compare the model of online manifestation of social events with our model of real-world events. To be able to compare both models we express each variable of both the online and real-world events as discrete time functions, which we will call $f_o(i) = v_{t_1}, v_{t_2}, \ldots, v_{t_n}$ and $f_r(j) = w_{t_1}, w_{t_2}, \ldots, w_{t_n}$, where $i$ and $j$ correspond to the moments in time at which the tweets are posted. How we implemented this can be found in Chapter 6.

We expect that if online and real-world events are related to each other, their variables show a correlation. Two measures that are suitable for finding correlations between two variables that are measured on a ratio scale are the Pearson product-moment correlation coefficient and Spearman's rank correlation coefficient [29]. The Pearson correlation coefficient expects a normal distribution of the data and a linear dependency between the variables, while the Spearman correlation coefficient searches for a monotonic function that can describe the dependency between the two variables. We do not know in advance what the distribution of each variable will be, nor do we know the kind of relationship between any of the online and real-world variables. We therefore choose to calculate both correlation measures. The Pearson product-moment correlation coefficient is defined in Equation 4.38:

$$\rho_P(f_o, f_r) = \frac{\text{cov}(f_o, f_r)}{\sigma_{f_o} \sigma_{f_r}} \tag{4.38}$$

where cov is the covariance, $\sigma_{f_o}$ is the standard deviation of $f_o$ and $\mu_{f_o}$ is the mean of $f_o$.

Spearman's correlation coefficient calculates the correlation for the variables using their rank numbers instead of their values. This means that the values for each variable are put in order and are then given a ranking number. The original pairs of the data are used to calculate Spearman's correlation coefficient. If tied values occur, the mean of their ranks is used as the rank for each of the tied values. An example of the translation of the values of the variables X and Y to ranks $r_i$ and $s_i$ is given in Equation 4.40. Spearman's rank correlation coefficient is defined in Equation 4.39:

$$\rho_S(r_i, s_i) = 1 - \frac{\sum_i D_i^2}{2\left(\sum_i r_i^2 - n\bar{r}^2\right)} \tag{4.39}$$

where $r_i$ and $s_i$ are the ranking numbers for the variables $f_o$ and $f_r$ and $D_i = r_i - s_i$.
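The rank translation with averaged ranks for tied values and the coefficient of Equation 4.39 can be sketched as follows. The sketch implements the formula as printed above; the function names are hypothetical, and the calculation of the p-value is left out.

```python
def average_ranks(values):
    """Rank the values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values that starts at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation following Equation 4.39."""
    r, s = average_ranks(x), average_ranks(y)
    n = len(r)
    mean_r = sum(r) / n
    d_squared = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1 - d_squared / (2 * (sum(ri * ri for ri in r) - n * mean_r ** 2))
```

Applied to the values 2, 3, 5, 7, 3 and 4, 4, 5, 6, 9 of the example in Equation 4.40, `average_ranks` returns the ranks 1, 2.5, 4, 5, 2.5 and 1.5, 1.5, 3, 4, 5.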

2 4  1 1, 5 3 4 2, 5 1, 5     [fo, fr] = 5 5 → [ri, si] =  4 3  (4.40)     7 6  5 4  3 9 2, 5 5 We do not expect that if two related events occur online and in the real-world, they occur at the same time. This implies that the Pearson correlation calculated on data pairs where the points in the pair are coupled based on the occurrence in time, might not show a relationship that has a delay between the online and real world. To overcome this we also calculate the cross correlation between the two variables. The cross-correlation is similar to the convolution of the two variables and is shown in Equation 4.41.

$$(f_o \star f_r)[n] \overset{\text{def}}{=} \sum_{m=-\infty}^{\infty} f_o^*[m]\, f_r[m+n] \tag{4.41}$$

When the cross-correlation is calculated, the sum of the products of the two functions is calculated while they are shifted against each other. The highest value for the cross-correlation occurs at the lag where the correlation between the variables is the highest. Using the highest value of the cross-correlation we can find the lag between the functions with the highest correlation. We use this lag to manually shift the functions and calculate the Pearson correlation and its p-values for these shifted functions.

When calculating the cross-correlation we can shift the functions both ways, assuming the real-world events occur before the online events or vice versa. We expect that online behaviour can be the result of real-world occurrences, but we also expect that online behaviour can be a symptom of something of which the physical symptoms are revealed later in the real world. Because of this two-fold assumption we test for correlations with time shifts in both directions.

For both the Pearson and Spearman correlation coefficients we calculate the p-values. We consider a correlation significant if p < 0.01. We furthermore consider a correlation between 0.15 and 0.36 as a (very) weak correlation, a correlation between 0.36 and 0.68 as a moderate correlation and a correlation higher than 0.68 as a strong correlation [69].

This research is of an exploratory character: we do not have any prior expectation about possible correlations between online and real-world variables, nor do we know which techniques would be best to measure these variables. We do not adapt our techniques of measuring the online and real-world variables in such a way that they would fit better with the dataset, nor do we consider the character of the variables for the method of data selection. We do expect that if we find significant correlations, these will be low because of these factors. We will therefore also consider lower correlation values.

When we calculate the lag between the online and real-world events, a limit needs to be set on how large this lag can be. We assume that if two variables in the online and the real world are dependent, the delay between the event and its manifestation, in either direction, is not longer than 36 hours. We base this on the kind of event we study: the festivals typically do not last longer than a couple of days, so we take a delay of 36 hours as the maximum delay for assuming a dependency between two variables.
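The lag search described above can be sketched as follows. The sketch assumes real-valued series that are sampled at the same regular interval, subtracts the mean before the cross-correlation is calculated, and restricts the sum in Equation 4.41 to the overlapping samples. These choices and the function names are assumptions made for this illustration, and the calculation of the p-values is left out.

```python
import math


def cross_correlation(fo, fr, n):
    """Sum of fo[m] * fr[m + n] over the overlapping samples (Eq. 4.41)."""
    return sum(fo[m] * fr[m + n] for m in range(len(fo)) if 0 <= m + n < len(fr))


def best_lag(fo, fr, max_lag):
    """Lag in [-max_lag, max_lag] with the highest cross-correlation."""
    mean_o, mean_r = sum(fo) / len(fo), sum(fr) / len(fr)
    centred_o = [v - mean_o for v in fo]
    centred_r = [v - mean_r for v in fr]
    return max(range(-max_lag, max_lag + 1),
               key=lambda n: cross_correlation(centred_o, centred_r, n))


def shifted_pearson(fo, fr, n):
    """Pearson correlation of fo[m] and fr[m + n] on the overlapping samples."""
    pairs = [(fo[m], fr[m + n]) for m in range(len(fo)) if 0 <= m + n < len(fr)]
    x, y = zip(*pairs)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

With hourly samples, the limit of 36 hours corresponds to `max_lag=36`. A negative lag n pairs `fo[m]` with the earlier sample `fr[m + n]`, which means that the online series trails the real-world series by |n| steps; a positive lag means the opposite.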

Chapter 5

Dataset selection and preprocessing

The data used in this research consists of tweets about five music events in 2013. We describe how we selected and preprocessed the data in Section 5.1. We analyse the five datasets by looking at the tweets in Section 5.2 and by looking at the users that contribute tweets to the datasets in Section 5.3.

5.1 Data selection and preprocessing

The data used in this research consists of tweets about five music events in 2013: Dance Valley, Sensation, Pukkelpop, Pinkpop, and Lowlands. The tweets have been selected using the Twiqs dataset [72]¹. This dataset contains about 40% of Dutch tweets from December 2010 until now. From this dataset the data about the five selected events is retrieved using the search terms that are listed in Table 5.1.

The tweets that are retrieved from the Twiqs dataset contain the following information: the user's id, the tweet id, the date and time at which the tweet was published, the id of the tweet to which this tweet is a reply, the id of the original tweet if this tweet is a retweet, the name of the user that published the tweet and the text of the tweet. For our research we need more information about the tweets, such as the location of the tweet, the friends and followers of the users, the number of retweets and the number of previous tweets by the users. We use the tweet id to download the tweet and its metadata using the Twitter REST API². Using this API we were able to download most of the original dataset in weeks 17 and 18 of 2014. Some of the missing tweets were removed by the users and some users changed their visibility to 'protected', causing the tweets to be unavailable. The number of tweets is listed in Table 5.1.

The tweets were retrieved in JSON format, containing information about the tweet, retweets, the user, other contributions and the location of the tweet. We selected and combined some of these fields, resulting in a data structure that is described in Table 5.2. From the text field we removed emoticons, user mentions, retweet indications at the beginning of the message ('RT') and punctuation marks. We removed all other characters that were not alphanumerical. We calculated the time from epoch based on the Twitter timestamp. For the other fields we used the original values from Twitter.
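The text cleaning and the timestamp conversion can be sketched as follows. The regular expressions are simplified assumptions and not the exact rules used for the dataset, and the timestamp format string assumes the `created_at` layout of the Twitter API ("Sat Aug 17 12:00:00 +0000 2013"). The function names are hypothetical.

```python
import re
from datetime import datetime


def clean_text(text):
    """Remove the leading retweet marker, mentions and non-alphanumeric characters."""
    text = re.sub(r"^RT\s+", "", text)            # retweet indication at the start
    text = re.sub(r"@\w+", " ", text)             # user mentions
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)   # emoticons, punctuation, other symbols
    return " ".join(text.split())                 # collapse repeated whitespace


def to_epoch(created_at):
    """Convert a Twitter timestamp to seconds since the epoch."""
    parsed = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return int(parsed.timestamp())
```

Note that restricting the text to the characters a-z, A-Z and 0-9 also removes accented letters, which occur regularly in Dutch text.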

¹ www.twiqs.nl
² https://dev.twitter.com/docs/api/1.1

For each set of tweets we created a list containing the unique users that contributed tweets. For each user, we retrieved friends and followers using the Twitter REST API.

Table 5.1: Dataset

| Event | Event date | Search period | Search terms | #tweets |
|---|---|---|---|---|
| Dance Valley | 3 Aug 2013 | 20130501h00-20131101h23 | "dance valley", #dv2013, #dv13 | 2,484 |
| Sensation | 6 July 2013 | 20130401h00-20131001h23 | sensation, #sensation | 6,235 |
| Pukkelpop | 15-17 Aug 2013 | 20130514h00-20131116h23 | #pkp13, pukkelpop | 16,978 |
| Pinkpop | 14-16 June 2013 | 20130315h00-20130715h23 | #pp13, pinkpop | 36,712 |
| Lowlands | 16-18 Aug 2013 | 20130513h00-20131116h23 | lowlands, #ll13 | 48,935 |

5.2 Tweets

We analysed the tweets to see whether there are remarkable differences between the tweets about the different music events. We compared variables like the number of words per tweet, the average length of the words, the usage of punctuation marks, and the occurrences of emoticons, urls, hashtags and mentions in each message, and we counted the percentage of messages that are marked as favourite or that are retweeted. The numbers for each dataset are specified in Table 5.3.

We analysed a total of 111,563 tweets. An average tweet contains 13.2 words with an average length of 4.8 characters and 3.2 punctuation marks. Only 0.5% of tweets contain emoticons, and each tweet with an emoticon contains on average just one emoticon. User mentions occur in 45% of tweets and each of these tweets contains 1.3 user mentions on average. Hashtags occur in 47% of the tweets and each tweet with hashtags contains an average of 1.6 tags. Urls are found in 25% of the tweets. 31% of the messages are retweets, and each retweeted message has been retweeted 10 times on average. Only 5% of the messages are marked as favourite, and on average each of these messages has been marked as favourite 1.6 times. 0.6% of the messages are sent by a verified account (verification is used by Twitter to confirm the identity of famous people and brands).

We see that these statistics are similar for the five datasets. The number of favourites in the Pukkelpop data is about four times higher than in the other datasets, and the number of retweets for each retweeted tweet in the Pinkpop and Lowlands datasets is more than three times higher than for the other three datasets. We do not know what causes these deviations.

³ https://dev.twitter.com/docs/platform-objects/tweets

Table 5.2: Structure of the Twitter data. Fields are based on the Twitter tweet format³. Each entry gives the field, its type and a description; subfields are indented below their parent field.

Id (Int64): The integer representation of the unique identifier for this Tweet.
Created at (Int): Epoch representation of time of creation of the Tweet.
Retweet count (Int): Number of times this Tweet has been retweeted.
Favorite count (Int): Indicates approximately how many times this Tweet has been "favorited" by Twitter users.
Place: Indicates that the tweet is associated with (but not necessarily originating from) a place (a).
    Full name (String): Full human-readable representation of the place's name.
    Country code (String): Shortened country code representing the country containing this place.
    Coordinates (Collection of Float): Represents the geographic location, calculated using the centre of the bounding box location. Format: [φ,λ]
    Country (String): Name of the country containing this place.
    Place type (String): The type of location represented by this place.
    Name (String): Short human-readable representation of the place's name.
Coordinates (Collection of Float): Represents the geographic location of this Tweet as reported by the user or client application. Format: [φ,λ]
Text (String): String containing alphanumerical characters; mentions, urls, hashtags, emoticons and punctuation marks are removed.
Punctuation (Collection of Char): Characters that are not in [a-zA-Z0-9] but are within the ascii keyset.
Emoticons: Emoticons that are present in the tweet.
    Happy (Collection of String): Emoticons found with RegExp=’(\^_\^|’ + [:=] + (|o|O|-) + [D\)\]] + ’)’+\b
    Sad (Collection of String): Emoticons found with RegExp= [:=] + (|o|O|-) + [\(\[]+\b
    Wink (Collection of String): Emoticons found with RegExp=’[;] + (|o|O|-) + [D\)\]]+\b
    Tongue (Collection of String): Emoticons found with RegExp= [:=] + (|o|O|-) + [pP]+\b
    Other (Collection of String): Emoticons found with RegExp=’(’+[:=]+’|’+[;]+’)’ + (|o|O|-) + [doO/\\] +\b
Entities: Entities which have been parsed out of the text of the Tweet (b).
    Hashtags (Collection of String): List of hashtag text.
    User mentions (Collection of Int64): List of ids of twitter users.
    Urls (Collection of String): Urls for each tweet.
User: Information about the user who posted this Tweet (c).
    Created at (Int): Epoch representation of time of creation of the user account.
    Favourites count (Int): The number of tweets this user has favorited in the account's lifetime.
    Followers count (Int): The number of followers this account currently has.
    Friends count (Int): The number of users this account is following (AKA their "followings").
    Id (Int64): The integer representation of the unique identifier for this User.
    Name (String): The name of the user, as they've defined it. Not necessarily a person's name. Typically capped at 20 characters.
    Screen name (String): The screen name, handle, or alias that this user identifies themselves with. Screen names are unique but subject to change.
    Statuses count (Int): The number of tweets (including retweets) issued by the user.
    Verified (Bool): When true, indicates that the user has a verified account.
    Location (String): The user-defined location for this account's profile. Not necessarily a location nor parseable.
    Description (String): The user-defined UTF-8 string describing their account.

(a) https://dev.twitter.com/docs/platform-objects/places
(b) https://dev.twitter.com/docs/platform-objects/entities
(c) https://dev.twitter.com/docs/platform-objects/users

Table 5.3: Statistics about the Twitter messages in the five datasets. We describe the occurrences per tweet and the number of tweets in which the property occurs. E.g.: A percentage following the description 'emoticons (%)' gives the percentage of tweets in which emoticons occur. The numbers following 'Emoticons' are the average number of emoticons for the tweets in which the emoticons occur.

| Variable | Dance Valley | Sensation | Pukkelpop | Pinkpop | Lowlands | Average |
|---|---|---|---|---|---|---|
| Total number of messages | 2,586 | 6,254 | 16,986 | 36,744 | 48,993 | |
| Words per tweet | 12.87 | 13.49 | 12.19 | 13.3 | 13.74 | 13.14 |
| Average word length | 4.86 | 4.94 | 4.91 | 4.66 | 4.85 | 4.84 |
| Punctuation marks per tweet | 3.11 | 3.51 | 2.82 | 3.01 | 3.32 | 3.16 |
| Emoticons (%) | 1.34 | 0.6 | 0.34 | 0.13 | 0.12 | 0.5 |
| Emoticons per tweet | 1.04 | 1.06 | 1.06 | 1.05 | 1.03 | 1.05 |
| Urls (%) | 29.58 | 24.96 | 25.73 | 16.84 | 29.21 | 25.25 |
| Urls per tweet | 1.03 | 1.03 | 1.02 | 1.02 | 1.03 | 1.03 |
| Hashtags (%) | 34.69 | 37.42 | 58.09 | 49.39 | 56.91 | 47.23 |
| Hashtags per tweet | 1.67 | 1.67 | 1.53 | 1.58 | 1.58 | 1.61 |
| User mentions (%) | 39.75 | 47.33 | 44.81 | 41.86 | 48.69 | 44.55 |
| User mentions per tweet | 1.33 | 1.45 | 1.35 | 1.3 | 1.31 | 1.35 |
| Favorites (%) | 2.59 | 2.69 | 14.91 | 2.99 | 4.92 | 5.47 |
| Favorite count | 1.15 | 1.78 | 1.66 | 1.69 | 1.61 | 1.58 |
| Retweets (%) | 26.95 | 40.66 | 25.87 | 26.24 | 34.07 | 30.92 |
| Retweets per retweet | 5.53 | 6.29 | 4.93 | 19.52 | 15.39 | 10.48 |
| Verified tweets (%) | 0.43 | 0.37 | 0.68 | 0.57 | 0.72 | 0.55 |

5.3 The relations between users within the datasets

We analysed the relations between the users that contributed to the tweets in our dataset. We found that there is some overlap between the users that contributed to data about the different events. This overlap is shown in Fig. 5.1. In this diagram the groups of users are identified that contributed to data about the same events. We recovered the user names of users that contribute to all five datasets or to four of the five datasets to see what kind of users these are. These users are listed in Tables 5.4 and 5.5. Within both groups of users a substantial portion consists of radio programs and news media, especially in the group of users that contribute to five datasets.

Within the followers and friends of the users for each dataset we also compared overlapping users; the results are shown in Table 5.6. The number of users for each dataset is listed as the number of unique users: $|U|$. The user occurrences refer to the number of times the users are listed as either follower or friend: $|U| + |U \cup Fo| + |U \cup Fr|$. The relations between the users refer to the number of all relations within the group of users, friends and followers together. The number of cumulative friends is the number of friends that are listed in the dataset, including friends that are listed as friends for multiple users: $\sum_{n \in U} |Fr_n|$. The unique number of friends is defined as $|\bigcup_{n \in U} Fr_n|$. The same definitions apply to the followers.

To show the differences between the datasets we also expressed the numbers of friends and followers and their relations relative to the number of users in the dataset. These numbers can be found in Table 5.7. This table shows that the datasets are similar with respect to the number of user occurrences and the cumulative and unique numbers of friends. The cumulative and unique numbers of followers, however, are much higher for the Dance Valley and Sensation datasets than for the Pukkelpop, Pinkpop and Lowlands data. One of the reasons for this can be that both datasets are smaller than the other three, while all five datasets contain a number of people with large numbers of followers. In a small dataset a few users with large numbers of followers or friends can immediately affect the cumulative number of followers.

We analysed the relations between the users by modelling the users and their relations as a graph. The users are the nodes in this graph; the relationships between these users, being friend or follower, are modelled as edges between the two user nodes. As an example of what the relations between the users within a dataset look like, we included a graph of the users within the Dance Valley dataset in Fig. 5.2. We depict only the users and their relations and disregard the followers and friends that are not part of the group of users in our dataset, because of the high number of nodes this would result in. We used the tool Gephi⁴ to generate the graph. In Gephi we used the Yifan Hu Multilevel layout algorithm to create a layout and used a modularity algorithm to colour clusters within the graph. A modularity algorithm is used to detect communities within a graph. A high modularity indicates that there are communities within the graph in which the connectivity is high, while the connectivity between the nodes in different communities is low. The modularity for this graph is 0.54, which indicates that there are communities present in this data.

We show a number of network measures for each dataset in Table 5.8. We analyse the relations within the group of users and disregard the friends and followers that do not belong to the group of users because of the size of the graphs. We calculate the indegree and the outdegree. The outdegree is the number of friends for each user, while the indegree is the number of followers. We use the Freeman measure for centrality to calculate one value for the network, as defined in Equation 5.1 [24]. In this measure, $C'_X(p_i)$ is the centrality value for node $p_i$ and $C'_X(p^*)$ is the maximum centrality value for the nodes $p_1 \ldots p_n$ in the network. For details about the centrality measure, see Section 4.2.4.

⁴ https://gephi.org

[Figure: Venn diagram of the user sets for Dance Valley, Sensation, Lowlands, Pukkelpop and Pinkpop]

Figure 5.1: Overlapping users in the Twitter data about Dance Valley, Sensation, Pukkelpop, Lowlands and Pinkpop

$$C_X = \frac{\sum_{i=1}^{n} [C'_X(p^*) - C'_X(p_i)]}{\max \sum_{i=1}^{n} [C'_X(p^*) - C'_X(p_i)]} \tag{5.1}$$

We compare different centrality measures. The degree centrality is the fraction of the nodes in the network that is connected to a node v. The betweenness centrality of a node v in a network is the portion of all the paths between nodes w and x in the network that pass through v. The closeness centrality of one node is the distance of this node to all the other nodes in the network, or to the nodes in the sub-graph when the graph is not connected.

The numbers indicating the centrality are very small, as can be observed in Table 5.8. We used the Freeman measure for centrality in the network. This measure compares the centrality of the network with the maximum possible centrality for a network of the same size. The shape of this optimum depends on the centrality measure. For the degree centrality, for example, the optimal shape is a star or a wheel. Our networks of nodes for the different datasets are very sparse, which results in low centrality values.

Figure 5.2: Users in the Dance Valley dataset, displayed using the Yifan Hu Multilevel layout algorithm in the network analysis tool Gephi⁵. The colours have been assigned using a modularity algorithm and show different clusters in the graph. The modularity for this graph is 0.54, which indicates that there are communities present in this data.

Table 5.4: Users that contribute to four sets of tweets

Eric-Jan Dol Rob van der Zwaan noordholland Harm Groustra RadioNL Pierre Oitmann Manon redactie NU.nl Muziek Shownieuws Jim Aasgier P1pp1 Rob Min Edgar Kruize Spitsnieuws Radiowereld Ochtendkrant Raphaël Varane Headlinez Den Haag & Nieuws Quinty Crew The Matrix André Verzaal Skunk Hitradio/Party feedNL Nieuws - News Dagelijks nieuws! follow me Amsterdam_Nu 01shownieuws Meneer Noll lijtie jimmy Nieuwsflitser Het Laatste Nieuws Spreekt voor zich! Tweet Meldingen thom gewoontelui (sn) News 24h Nederland Prov. Noord-Holland Het Beste Nieuws Feit van de dag SocialFM The Radio Station Dagblad De Limburger

Table 5.5: Users that contribute to five sets of tweets

3voor12 Patrick Petersen Festival-/Podiuminfo Richard Spoelstra ekvin RETWEET FEITEN Miss Carly & Co. Metro Entertainment Dagblad Metro Nederland Vandaag United Artists woutveldhoen Sander Headlines.nl NieuwsActueel Media Cultuur Opennow.eu ! NL_Nieuws Nedernieuws

Table 5.6: User statistics

Variable                   Dance Valley    Sensation    Pukkelpop      Pinkpop     Lowlands
#Unique users                     1,566        3,882        6,417       15,978       17,586
#User occurrences                 3,430        8,843       13,133       32,673       37,582
#Relations within users          12,214       45,950      352,040      532,858    1,078,128
#Cumulative friends             739,267    1,899,098    2,709,460    6,221,601    8,453,776
#Unique friends                 436,736    1,004,001      825,968    1,873,795    2,163,125
#Cumulative followers         2,205,684    7,139,092    5,136,089   13,098,941   25,230,731
#Unique followers             1,434,086    4,394,470    1,809,226    4,375,679    8,373,383

Table 5.7: Ratio between number user variables and the number of users

Variable                   Dance Valley    Sensation    Pukkelpop      Pinkpop     Lowlands
Unique users                        1.0          1.0          1.0          1.0          1.0
User occurrences                   2.19         2.28         2.05         2.04         2.14
Relations within users                8           12           55           33           61
Cumulative friends                  472          489          422          389          481
Unique friends                      279          259          129          117          123
Cumulative followers              1,408        1,839          800          820        1,435
Unique followers                    916        1,132          282          274          476

Table 5.8: Network properties for users within the different datasets

Variable                  Dance Valley     Sensation       Pukkelpop       Pinkpop         Lowlands
Number of nodes           1,355            3,332           6,157           15,474          17,260
Number of edges           7,319            26,869          203,495         334,951         686,688
Average degree            5.40             8.06            33.05           21.65           39.78
Indegree centrality       54.41 · 10^-6    13.88 · 10^-6   27.37 · 10^-6   7.10 · 10^-6    7.42 · 10^-6
Outdegree centrality      13.03 · 10^-5    3.85 · 10^-5    6.73 · 10^-5    1.76 · 10^-5    1.44 · 10^-5
Degree centrality         18.47 · 10^-5    3.81 · 10^-5    6.84 · 10^-5    1.78 · 10^-5    1.70 · 10^-5
Betweenness centrality    1.25 · 10^-7     1.31 · 10^-8    4.15 · 10^-9    N/A             N/A
Closeness centrality      42.90 · 10^-2    45.12 · 10^-2   61.94 · 10^-2   N/A             N/A

Chapter 6

Implementation of online and real world variables

We implemented both the online and the real world variables. We used Python for the implementation, because this language has many libraries with functions we can use for the implementation of our model, such as a library for network analysis and libraries for mathematical functions and plotting. In this chapter we describe how we implemented the online and real world model.

6.1 Real world features

Based on the model we defined in Chapter 4.1, we used information from various sources to describe the occurrences in the real world. In our model we list six categories: time, place, people, associated events, content and context events. Events are described as tuples e = (T, l, P, ea, c) ∈ E, consisting of the time period T = [t0, tend], the people p ∈ P that are involved in the event, a location l, associated events ea ∈ AE ⊂ E and a description c of the content of the event. The associated events and the content are expressed either as sets of events or as a description d. This description can be in natural language or a number. The context is modelled either as a sequence of events or as a continuous time function that describes the behaviour of an influential context variable.

To calculate the correlation between the variables, it is convenient to express each variable as a function of time. We therefore create a discrete time function v(t) for each type of event and for the context. This function returns 0.0 when no event occurs. For a moment ti at which an event occurs, the function takes a value 0 < v(ti) ≤ 1, which depends on the type of event. Details about the implementation of these events can be found in the following description of the implementation of the real world features. For context we use a function that describes the development of an influential variable. Since people and place are already included in the definition of the associated events and content events, we do not describe these separately.

6.1.1 Associated events

We implemented three types of associated events: news events and sport events, which we consider unrelated events, and other music festivals that occur in the same year, which we consider to be events that target the same audience and that are scheduled close to the current event (UE and PA, see Chapter 4.1.4).

We implemented the news and sport events as two discrete time functions that return 1.0 if there are activities at that moment, and 0.0 if there are none. We used an overview of news events (including sport events) during the year and selected the news events that we considered to have enough impact to be at least a topic of conversation for many people. We used our own judgement to determine which events to include.

We selected music events from a number of lists of music events in the Netherlands in 2013¹ ², none of which is expected to be exhaustive. We used the size of the event (at least around 10,000 visitors) and our own judgement to decide which events are interesting enough to include in our implementation. We included both pop festivals and dance festivals. We expressed the music events as a discrete time function that returns, for each moment in time, the sum of the (normalized) number of visitors of each music event that is scheduled for that moment. The selected news, sport and music events can be found in Appendix C.

6.1.2 Content

We used the programs of the five music events as content of the event. These programs consist of a list of events (performances), each of which has a time, a place (stage), people (the performers) and content (the performance). Of this information we used the time and place to create a function that describes the program. Each event has more than one stage where performances are scheduled, which means that these events have overlapping performances. For each event we expressed the scheduled activities in a time function that returns the number of performances at each moment in time, divided by the number of stages that is available. For example, if two performances are scheduled at Dance Valley for t = 3 Aug 2013 13:00 and two stages are available at Dance Valley, the function yields DV(t) = 2/2 = 1.0.
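The program function described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the representation of a performance as a (start, end) pair, the function name `program_value` and the example times are assumptions.

```python
from datetime import datetime
from typing import List, Tuple

# Assumption: a performance is represented by its start and end time only.
Performance = Tuple[datetime, datetime]


def program_value(t: datetime, performances: List[Performance], n_stages: int) -> float:
    """Number of performances running at time t, divided by the number of stages."""
    running = sum(1 for start, end in performances if start <= t < end)
    return running / n_stages


# Two overlapping performances on two stages (hypothetical times).
dance_valley = [
    (datetime(2013, 8, 3, 13, 0), datetime(2013, 8, 3, 14, 0)),
    (datetime(2013, 8, 3, 12, 30), datetime(2013, 8, 3, 13, 30)),
]
dv_at_13 = program_value(datetime(2013, 8, 3, 13, 0), dance_valley, n_stages=2)
dv_at_1345 = program_value(datetime(2013, 8, 3, 13, 45), dance_valley, n_stages=2)
```

With both stages occupied the value is 1.0, with one stage occupied it is 0.5, and outside the program it is 0.0, so the function stays within the range used for the other real world variables.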

6.1.3 Contextual events

We implemented two types of contextual variables, describing the weather and the holidays.

Weather

For the contextual weather information we used historical meteorological data that is made publicly available by the KNMI³. We selected data for the time spans of the five datasets. Data is available for a number of weather stations within the Netherlands. We chose the Schiphol weather station for Dance Valley and Sensation, Lelystad for Lowlands, and Maastricht for Pukkelpop and Pinkpop. A map with the weather stations can be found in Figure 6.1; the estimated distances can be found in Table 6.2. The data that is made available by the KNMI consists of 24 columns with hourly information about rain, wind, snow, etc. We summarized this information into eight functions that return values ranging from 0 to 1. The functions are listed in Table 6.1. For details about the weather data from the KNMI and the translation from the KNMI data to the functions used in our research, we refer to Appendix A.

¹ http://reizen-en-recreatie.infonu.nl/evenementen/71847-popfestivals-nederland-2013-overzicht-pinkpop-tm-lowlands.html
² http://reizen-en-recreatie.infonu.nl/buitenland/71849-popfestivals-europa-2013.html, http://www.yellowtipi.nl/
³ http://www.knmi.nl/klimatologie/uurgegevens

Table 6.1: Description of weather event functions

Event   Description      Value
RD      Rain duration    [0, 1]
RI      Rain intensity   [0, 1]
S       Sun              [0, 1]
T       Temperature      t ∈ {0, 1}
F       Fog              f ∈ {0, 1}
SN      Snow             0 or 1
TS      Thunderstorm     0 or 1
W       Wind             [0, 1]
M       Moon             m ∈ {0, 0.5, 1}

Table 6.2: Time period of the data about the events, and the distance between the events and the weather stations

Event          Search period              Event location    Weather station   Distance (km)
Dance Valley   20130501h00-20131101h23    Spaarnwoude       Schiphol          12 ± 5
Sensation      20130401h00-20131001h23    Amsterdam Arena   Schiphol          12 ± 5
Pinkpop        20130514h00-20131116h23    Landgraaf         Maastricht        23 ± 5
Pukkelpop      20130315h00-20130715h23                      Maastricht        26 ± 5
Lowlands       20130513h00-20131116h23    Biddinghuizen     Lelystad          17 ± 5

We furthermore used a table containing the position of the moon in 2013⁴ to create a function that returns 0 for the first quarter, 0.5 for the second and fourth quarter and 1 for the third quarter.

Holidays

We distinguished between three types of holidays: School Holidays (SH), Public Holidays (PH) and Construction industry Holidays (CH).

We created a function SH(t) that returns the portion of schools that have a holiday at a given time. Since not all schools have their holidays on the same days, the function returns, for each time, the proportion of combinations of region and school type that have a holiday at that time. Each combination of region and type of education contributes 1/6, so that on days on which all schools have a holiday the holiday value is 1.

The public holidays are the same for the whole country. For the public holidays we created a function PH(t) that returns 1 on days that are a public holiday and 0 on all other days.

For the construction industry holidays there are three regions in the Netherlands. We expressed the construction industry holidays in a function CH(t) that returns 1 when all three regions have a holiday on that date, and 2/3 or 1/3 when two regions or one region have a holiday on that date. The dates of the school holidays, public holidays and construction industry holidays can be found in Appendix B.
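The functions SH(t) and CH(t) both return the fraction of groups (combinations of region and school type, or construction industry regions) that have a holiday on a given day. A minimal sketch of such a function is given below. The function name, the region names and in particular the dates are hypothetical and only serve as an illustration; they are not the dates listed in Appendix B.

```python
from datetime import date
from typing import Dict, List, Tuple

# First and last day of a holiday period, both inclusive.
Period = Tuple[date, date]


def holiday_fraction(day: date, periods_per_group: Dict[str, List[Period]]) -> float:
    """Fraction of groups that have a holiday on the given day."""
    on_holiday = sum(
        1
        for periods in periods_per_group.values()
        if any(first <= day <= last for first, last in periods)
    )
    return on_holiday / len(periods_per_group)


# Hypothetical construction industry holidays for three regions.
CH_PERIODS = {
    "north": [(date(2013, 7, 22), date(2013, 8, 9))],
    "middle": [(date(2013, 7, 29), date(2013, 8, 16))],
    "south": [(date(2013, 7, 15), date(2013, 8, 2))],
}

ch_all = holiday_fraction(date(2013, 7, 30), CH_PERIODS)   # all three regions
ch_two = holiday_fraction(date(2013, 7, 23), CH_PERIODS)   # north and south
ch_one = holiday_fraction(date(2013, 8, 12), CH_PERIODS)   # middle only
```

The same function yields SH(t) when it is called with six groups, one for each combination of region and school type, and PH(t) when it is called with a single group for the whole country.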

⁴ http://plazilla.com/page/4295039744/de-maanstanden-van-2013-de-data-van-het-laatste-kwartier-nieuwe-maan-eerste-kwartier-en-volle-maan
⁵ http://www.knmi.nl/klimatologie/images_algemeen/stations.jpg

Figure 6.1: Meteorological stations of the KNMI with the locations of the events⁵. Schiphol is used for weather information about Sensation and Dance Valley, Maastricht for Pukkelpop and Pinkpop, and Lelystad for Lowlands.

6.2 Online features

We implemented the online features as functions of time, using a running average. We put all tweets for each dataset in chronological order, and used a sliding window with a width of 50 tweets to calculate the values of the variables. The implementation is not complete: not all variables that are described in the model are implemented. We implemented the time, place, people and content related variables. We did not implement the associated event-related variables. In this section we describe the variables that we have implemented.

6.2.1 Time

Intensity

We calculate the intensity of the period within the sliding window by first calculating the time difference between the first and the last sample. We convert this number to a difference in days. We then divide the window size by this time difference, resulting in the number of tweets per day.
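The intensity calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the timestamps are already sorted, the function name `intensity` is hypothetical, and a window in which all tweets share the same timestamp is given an infinite intensity, a case the description above does not cover.

```python
from datetime import datetime, timedelta
from typing import List


def intensity(timestamps: List[datetime], window: int = 50) -> List[float]:
    """Tweets per day for every position of a sliding window over sorted timestamps."""
    values = []
    for i in range(len(timestamps) - window + 1):
        span = timestamps[i + window - 1] - timestamps[i]
        days = span.total_seconds() / 86400.0
        # Assumption: a zero-length window is reported as infinite intensity.
        values.append(window / days if days > 0 else float("inf"))
    return values


# Six tweets, one every six hours; a window of five tweets spans exactly one day.
start = datetime(2013, 8, 3, 0, 0)
stamps = [start + timedelta(hours=6 * k) for k in range(6)]
per_day = intensity(stamps, window=5)
```

In this example every window of five tweets covers 24 hours, so both window positions yield five tweets per day.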

6.2.2 Place

We implemented three measures for observing changes that relate to the location of the user or the tweet. In our model we defined three locations: the place of residence of the user, the event location and the current location of the user. In the description of these locations we assumed that we have access to the location from which the tweet is published, descriptions of the location (place of residence) of the user, the tweets that the user posted previously and the locations of the friends of the user (see Chapter 4.2.3). About 72% of the users in our dataset have filled out their location in their Twitter profile. Around 2% of the tweets contain coordinates of the place from which the tweet was posted. We do not have multiple tweets available for each user. We therefore used an adapted approach to estimate the location of the user, tweet and event.

We determined the place of residence based on the information the user provided in the ‘location’ text of the user profile. We used a list of zip codes, places, municipalities and coordinates for the Netherlands (obtained from d-centralize⁶ ⁷). From this list we created a new list containing place and municipality names and coordinates. In case a place and a municipality have the same name, we used the coordinates of the place as coordinates for both the municipality and the place. In the location description we replaced ‘(’, ‘)’ and ‘/’ by white space, and removed all characters except letters and ’’. We first compared the results with a manually compiled list of provinces and a list of names of places that have different spelling variants. We replaced the names of provinces with the name of the capital of the province. We then searched the list of known places for the separate words and for the full combination of words in the resulting description. If a match was found, the coordinates of the matching city were used as the location of the place of residence of the user.
We calculated the number of locations we were able to identify using our algorithm. We were able to identify 60% of the available locations. When we compared the average scores for each data set we found an average of 56% ± 18%; the large spread is caused by a remarkably low share of identified locations (21%) in the Pukkelpop dataset. We inspected this dataset and found that many of the locations are in Belgium. We furthermore checked our approach manually on 1000 sequential tweets in the Pinkpop dataset. Of these 1000 tweets, 263 had no location filled out. Of the 737 location descriptions that were analysed by the algorithm, a location could not be determined for 272. Manual inspection of these unidentified descriptions showed that only two of them actually contained a city, municipality, region or province. The other unidentified locations consisted of descriptions that are too general (The Netherlands, Europe), places in other countries, telephone area codes (020 for Amsterdam) and other descriptions (“Outer Space”, “Achter ~3600000 pixels”). We also inspected the 465 locations that were identified by our algorithm and found one incorrectly identified location. Based on our test on this small sample we determined that the algorithm has a precision of 99% and a recall of 99%.

⁶ http://www.d-centralize.nl/projects/6pp/
⁷ https://github.com/ddsc/webclient/blob/master/app/data/4pp.csv

Table 6.3: Coordinates of the locations of Dance Valley, Sensation, Pukkelpop, Pinkpop and Lowlands

Event          Location           φ           λ
Dance Valley   Spaarnwoude        52.404615   4.694549
Sensation      Amsterdam Arena    52.314145   4.940639
Pukkelpop      Hasselt (Kiewit)   50.964167   5.354722
Pinkpop        Landgraaf          50.892765   6.022408
Lowlands       Biddinghuizen      52.455236   5.692693

Using this algorithm, we determined for each tweet the coordinates of the place of residence of the user that posted it. We used this information to calculate the average distance between event and user, between event and the place where the tweet is posted, and between users. We calculated the average distance between the event and the place of residence of the user for the users that are active within the sliding window. We used the haversine⁸ formula, which gives the great-circle distance between two locations on a sphere given their latitude (φ) and longitude (λ). The coordinates we used for the events are listed in Table 6.3. If no location was available for a user, the tweet was disregarded in the calculation of the average distance.

We also calculated the distance between the users. We used a sliding window of 50 tweets. For each first tweet in this window we calculated the distance from the user that published the tweet to the other users that published tweets within the sliding window.
If no location was available for a user, the user was disregarded in the calculation of the average distance.

We furthermore calculated the average distance from the event to the location of the tweet. For that we used the coordinates that are attached to the tweet. However, this information is very sparse. As already mentioned, only around 2% of the tweets contain the location at the moment the tweet was published. Since we use a sliding window with a size of 50 tweets, we have on average about one location for each window. We calculated the distance between the provided location and the event location using the haversine function. We do not, however, expect this to result in reliable information because of the sparseness of the data.
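The distance calculations in this section can be illustrated with a short Python sketch of the haversine formula. The Earth radius of 6371 km, the function names and the use of `None` for users without a known location are assumptions made for this example.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    d_phi = phi2 - phi1
    d_lambda = math.radians(b[1] - a[1])
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_distance_to_event(event: Coordinate,
                           locations: List[Optional[Coordinate]]) -> Optional[float]:
    """Average distance to the event, skipping users without a known location."""
    distances = [haversine_km(event, loc) for loc in locations if loc is not None]
    return sum(distances) / len(distances) if distances else None


# Event coordinates as listed in Table 6.3.
dance_valley = (52.404615, 4.694549)
sensation = (52.314145, 4.940639)
between_events = haversine_km(dance_valley, sensation)
```

Returning `None` for a window without any known location keeps such windows distinguishable from windows in which all users live at the event location.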

6.2.3 People

Of the variables that are described in the online model in Chapter 4, we implemented the centrality measures, the inclusiveness, social equality, popularity, experience and the people of interest.

⁸ http://en.wikipedia.org/wiki/Haversine_formula

Centrality measures

In Chapter 4.2.4 we defined the metrics to describe the relations between people. We used a sliding window of 50 tweets and collected the users that published tweets in this window. For each user we found the friends and followers that are part of the group of users for this dataset. We chose to only consider friends and followers within the group of users because of the calculation time of the centrality measures for large graphs; as can be seen in Table 5.6, the number of followers and friends is very high. We described the relations between the friends and followers as an undirected graph and used the Python NetworkX package⁹ to calculate the degree, closeness and betweenness centrality. The functions degree_centrality, betweenness_centrality and closeness_centrality each return a centrality value for every node in the network. We used the Freeman measure for centrality to calculate one value for each window. For details about this measure, see Chapter 4.2.4.

Inclusiveness

The inclusiveness is implemented by collecting the friends and followers within the group of users for each user within the sliding window. We calculated the size of the set difference between the users and the union of the friends and followers, which yields the number of users within the sliding window that are not a friend or a follower of the user.

Social equality

The social equality is the number of bidirectional relations for a group of users in relation to the total number of relations. The number of bidirectional relations for each user is calculated by collecting the users within the dataset that are both friend and follower of one of the users within the sliding window. The unidirectional relations are the users that are either friend or follower of this user, but not both. We counted these relationships for each user within the sliding window, and used the number of bidirectional relations divided by the total number of relations as a measure for social equality.
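The social equality measure can be sketched as follows. This is an illustration only: the representation of the relations as dictionaries of sets, the function name and the value 0.0 for a window without any relations are assumptions.

```python
from typing import Dict, Iterable, Set


def social_equality(window_users: Iterable[str],
                    friends: Dict[str, Set[str]],
                    followers: Dict[str, Set[str]]) -> float:
    """Bidirectional relations divided by all relations of the users in the window."""
    bidirectional = 0
    total = 0
    for user in window_users:
        user_friends = friends.get(user, set())
        user_followers = followers.get(user, set())
        bidirectional += len(user_friends & user_followers)  # both friend and follower
        total += len(user_friends | user_followers)          # friend, follower or both
    # Assumption: a window without relations gets the value 0.0.
    return bidirectional / total if total else 0.0


# Hypothetical relations within a dataset.
friends = {"anna": {"bram", "carl"}, "bram": {"anna"}}
followers = {"anna": {"bram", "dirk"}, "bram": {"anna"}}
equality = social_equality(["anna", "bram"], friends, followers)
```

In this example anna has one bidirectional relation out of three and bram has one out of one, so the measure is 2/4 = 0.5.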

Popularity

The popularity for each user is calculated by dividing the total number of friends by the total number of followers. The friends and followers do not need to be in the set of users. For each time window the average of this ratio is calculated for all the users that published a tweet within this time frame.
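A minimal sketch of the popularity calculation for one time window is given below. The function name and the input format are hypothetical, and users without followers are skipped to avoid a division by zero, which is an assumption that the description above does not state.

```python
from typing import List, Optional, Tuple


def mean_popularity(users: List[Tuple[int, int]]) -> Optional[float]:
    """Average friends-to-followers ratio for (friends, followers) pairs in a window."""
    # Assumption: users with zero followers are left out of the average.
    ratios = [n_friends / n_followers for n_friends, n_followers in users if n_followers > 0]
    return sum(ratios) / len(ratios) if ratios else None


window_popularity = mean_popularity([(100, 50), (10, 40), (5, 0)])
```

The two users with followers have ratios 2.0 and 0.25, which gives an average of 1.125 for this window.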

Experience

The user experience is described using the status-count field of the tweet, which indicates the number of tweets that the user has published before.

People of interest

For each time window, the number of tweets that is sent from a verified account is counted.

⁹ https://networkx.github.io/

6.2.4 Content

Of the content variables that are introduced in Chapter 4, we implemented the variables describing the originality, emotionality, sentiment, newsworthiness, readability and intensity.

Originality

We calculated the originality of the tweet by counting the total number of times that the tweet has been retweeted.

Emotionality

We calculated the emotionality of the tweet by counting the total number of emoticons in the tweet. Five types of emoticons are counted: happy and sad emoticons, winking emoticons, emoticons with tongue and other emoticons. The code for finding the emoticons is provided by O’Brian¹⁰. The regular expressions that describe the emoticons are listed in Algorithm 1.

Algorithm 1 Regular expressions to find different emoticons
Happy ← RegExp('(\^_\^|' + '[:=]' + '(|o|O|-)' + '[D\)\]]' + ')' + '\b', tweet)
Sad ← RegExp('[:=]' + '(|o|O|-)' + '[\(\[]' + '\b', tweet)
Wink ← RegExp('[;]' + '(|o|O|-)' + '[D\)\]]' + '\b', tweet)
Tongue ← RegExp('[:=]' + '(|o|O|-)' + '[pP]' + '\b', tweet)
Other ← RegExp('(' + '[:=]' + '|' + '[;]' + ')' + '(|o|O|-)' + '[doO/\\]' + '\b', tweet)
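The expressions of Algorithm 1 can be tried out with Python's `re` module, as in the sketch below. It uses the same character classes for eyes, nose and mouth, but leaves out the trailing `\b`; that omission is an assumption made here so that emoticons at the end of a tweet or before a space are also counted. The dictionary and function names are hypothetical.

```python
import re
from typing import Dict

EYES = r"[:=]"
WINK = r"[;]"
NOSE = r"(|o|O|-)"  # optional nose

# Same building blocks as Algorithm 1, without the trailing word boundary.
EMOTICON_PATTERNS = {
    "happy": re.compile(r"(\^_\^|" + EYES + NOSE + r"[D\)\]]" + ")"),
    "sad": re.compile(EYES + NOSE + r"[\(\[]"),
    "wink": re.compile(WINK + NOSE + r"[D\)\]]"),
    "tongue": re.compile(EYES + NOSE + r"[pP]"),
    "other": re.compile("(" + EYES + "|" + WINK + ")" + NOSE + r"[doO/\\]"),
}


def count_emoticons(tweet: str) -> Dict[str, int]:
    """Number of matches of every emoticon type in the tweet."""
    return {name: sum(1 for _ in pattern.finditer(tweet))
            for name, pattern in EMOTICON_PATTERNS.items()}


def emotionality(tweet: str) -> int:
    """Total number of emoticons in the tweet."""
    return sum(count_emoticons(tweet).values())


counts = count_emoticons("Top dag :) :-D ^_^ maar nu naar huis :-( ;) :P")
```

Note that the "other" expression also matches the `:/` in a URL such as `http://`, so URLs would have to be removed first if such matches are not wanted.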

Sentiment

We analysed the sentiment in the tweet by using the meaning of certain words in the tweet, the meaning of the emoticons and the presence of punctuation marks. We used a list of Dutch words with, for each word, a number that indicates the sentiment value. We created this list from two lists. The first list was created by De Smedt and Daelemans [18] and is available as part of a Python part-of-speech tagger for Dutch called ‘Pattern’¹¹. This list contains 41636 non-unique adjectives with polarity and subjectivity numbers for each word. The polarity values range from negative to positive, with values from -1.0 to +1.0, and the subjectivity values range from objective to subjective, with values from +0.0 to +1.0. The list contains multiple interpretations of several words. The adjectives are part of (formal) written language and many of these words do not occur in the text of the tweets used in this research, since the language used in the tweets is often colloquial. Using solely this list, many tweets do not contain even one word from the list.

We therefore compiled a list of all words that occur in the datasets we used, sorted by the number of occurrences in the five datasets. From the 2120 most frequently occurring words we manually selected 300 words that in our opinion have a polarity (different from 0); the frequencies of the selected words range from 2944 to 54 in the five datasets together. We manually assigned a polarity value to each word. We then combined the two lists and removed double occurrences of words by randomly selecting one of the interpretations of the duplicate words. The resulting list contained 3416 words, including adjectives, verbs and nouns, but also words of abuse, vulgarisms and street language.

¹⁰ https://github.com/aritter/twitter_nlp/blob/master/python/emoticons.py
¹¹ http://www.clips.ua.ac.be/pages/pattern-nl

Dutch adjectives change their ending depending on whether they are used in combination with a definite or an indefinite article. For example: “Het mooie huis”, “Een mooi gebouw”. Usually an -e is added when the definite article is used. However, there are words in which other letters change as well, such as ‘hopeloos’ and ‘hopeloze’. Although we most likely did not correct for every possible type of adjective, we created some rules for translating frequently occurring types of adjective endings from the definite ending to the indefinite ending. The function we used for this translation is listed in Algorithm 2.

To determine the sentiment we summed the polarity values of the words in the tweet that occur in the sentiment word list. We counted the number of happy and sad emoticons and the number of exclamation marks. We determined the sentiment by adding the polarity sum and the emoticon counts multiplied by constant values. We used the number of exclamation marks as a booster for the sentiment, an approach that is also used in the Pattern library. The constant values were determined empirically. The resulting value was cut off at -1 and 1. The formula used to determine the sentiment value is given in Equation 6.1.

$$S(m) = \max\bigl(-1, \min\bigl(1, (\mathrm{polarity}(m) + 0.3 \cdot \mathrm{happy}(m) - 0.3 \cdot \mathrm{sad}(m)) \cdot \mathrm{exclamation}(m)\bigr)\bigr) \qquad (6.1)$$

where polarity(m) for a message m is based on the polarity values of the words, happy(m) and sad(m) are the number of happy and sad emoticons and exclamation(m) is the number of exclamation marks in the message.

We tested the sentiment analysis algorithm by manually labelling 200 samples from the Dance Valley dataset. We labelled each tweet in this sample as either ‘positive’ or ‘negative’, where ‘positive’ corresponds to a polarity value of 0 and higher, and ‘negative’ to a value lower than 0. We compared the manually assigned labels with the values that are assigned by the algorithm. Sentiment values above 0 were counted as positive and sentiment values below 0 as negative. In this data sample the algorithm has a precision of 87% and a recall of 97% (with regard to determining positive sentiment). Within the data that is used for the test, the ratio of positive to negative samples was 188:12.
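The structure of Equation 6.1 can be sketched as follows. This is not the implementation used in this research. In particular, the booster is written here as 1 + k · n for n exclamation marks, which is an assumption: a literal multiplication by the number of exclamation marks would give a sentiment of 0 for every message without an exclamation mark. The value k = 0.25, the function name and the two-word lexicon are hypothetical.

```python
from typing import Dict, List


def sentiment(words: List[str], lexicon: Dict[str, float],
              n_happy: int, n_sad: int, n_exclamation: int,
              boost_per_mark: float = 0.25) -> float:
    """Polarity sum plus weighted emoticon counts, boosted and clamped to [-1, 1]."""
    polarity = sum(lexicon.get(word.lower(), 0.0) for word in words)
    raw = polarity + 0.3 * n_happy - 0.3 * n_sad
    # ASSUMPTION: exclamation marks act as a multiplicative booster of 1 + k * n.
    booster = 1.0 + boost_per_mark * n_exclamation
    return max(-1.0, min(1.0, raw * booster))


# Hypothetical polarity values, not taken from the word list described above.
lexicon = {"mooi": 0.6, "slecht": -0.7}
mild = sentiment(["wat", "mooi"], lexicon, n_happy=1, n_sad=0, n_exclamation=0)
clipped_high = sentiment(["mooi", "mooi"], lexicon, n_happy=2, n_sad=0, n_exclamation=2)
clipped_low = sentiment(["slecht"], lexicon, n_happy=0, n_sad=2, n_exclamation=0)
```

The clamping with `max` and `min` keeps the result between -1 and 1, which corresponds to cutting off the value at -1 and 1 as described above.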

Newsworthiness

We defined the newsworthiness of a tweet t using the number of urls u, the number of mentions m, a boolean r indicating whether the tweet is a retweet, the number of question marks q and the number of emoticons e. We found the constants in the formula empirically, based on a training set that was part of the Dance Valley dataset. We tested the algorithm on 150 samples of the Pinkpop dataset, where a newsworthiness value >= 0.5 is counted as a newsworthy tweet. We manually labelled these 150 samples as either news or chatter and compared our labels with the labels assigned by the algorithm. This resulted in a precision of 69% and a recall of 68%. The formula we used, including the constants we determined, is given in Equation 6.2. Examples of newsworthiness scores of tweets can be found in Table 6.5.

$$NW(t) = 0.3\,u - 0.2\,m + 0.5\,r - 0.3\,q - 0.3\,e \qquad (6.2)$$

where u is the number of urls in t, m is the number of mentions in t, r is 1 if t is a retweet and 0 otherwise, q is the number of question marks in t, and e is the number of emoticons in t.
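The weighted terms of Equation 6.2 can be combined as in the sketch below. The way urls, mentions and retweets are recognised (simple regular expressions and an `RT ` prefix), the removal of urls before counting question marks, and the function name are assumptions made for this example. The sketch implements only the weighted sum of the listed terms; any offset or clipping of the resulting value is not part of it.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def newsworthiness_terms(tweet: str, n_emoticons: int = 0) -> float:
    """Weighted sum of the terms of Equation 6.2 for one tweet."""
    n_urls = len(URL_RE.findall(tweet))
    # Assumption: question marks inside urls are not counted.
    text = URL_RE.sub(" ", tweet)
    n_mentions = len(MENTION_RE.findall(text))
    is_retweet = 1 if text.lstrip().startswith("RT ") else 0
    n_questions = text.count("?")
    return (0.3 * n_urls - 0.2 * n_mentions + 0.5 * is_retweet
            - 0.3 * n_questions - 0.3 * n_emoticons)


retweet_with_link = newsworthiness_terms("RT @gebruiker: nieuws over het festival http://t.co/abc")
question = newsworthiness_terms("@gebruiker ga jij ook?")
```

The emoticon count is passed in as an argument so that the sketch does not depend on a particular emoticon detector.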

Algorithm 2 Algorithm for rewriting the ending of adjectives
function RewriteAdverbial(word)
    origEnd ← regexp('(((i?[aeiou]?[tvz]))|([bdfgklmnprst]{1,2}))e$', word)
    if length(origEnd) = 2 then
        word ← word[: length(word) − 1]                              ▷ Remove -e
    else
        if length(origEnd) = 3 then
            if origEnd[0] = origEnd[1] then                          ▷ -sse → -s
                newEnd ← origEnd[0]
            else                                                     ▷ -ave, -ene, -oze → -aaf, -een, -oos
                newEnd ← origEnd[0] + origEnd[0]                     ▷ -a,e,o,u- → -aa,ee,oo,uu-
                if origEnd[1] = 'z' then
                    newEnd ← newEnd + 's'                            ▷ -z- → -s
                else if origEnd[1] = 'v' then
                    newEnd ← newEnd + 'f'                            ▷ -v- → -f
                else
                    newEnd ← newEnd + origEnd[1]                     ▷ -e?e → -ee?
                end if
            end if
        else if length(origEnd) = 4 then                             ▷ -ieve, -ieze, -iete → -ief, -ies, -iet
            newEnd ← origEnd[0] + origEnd[1]                         ▷ -ie- → -ie-
            if origEnd[2] = 'z' then
                newEnd ← newEnd + 's'                                ▷ -ze → -s
            else if origEnd[2] = 'v' then
                newEnd ← newEnd + 'f'                                ▷ -ve → -f
            else
                newEnd ← newEnd + origEnd[2]                         ▷ -?e → -?
            end if
        end if
        word ← word[: length(word) − length(origEnd)] + newEnd       ▷ Replace ending
    end if
    return word
end function
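A possible Python rendering of Algorithm 2 is sketched below. It follows the rules of the listing, with two additions that are assumptions of this sketch and not part of the listing: a word whose ending does not match the regular expression is returned unchanged, and an ending of two different consonants followed by -e (such as -ste) only loses the -e. The function name is hypothetical.

```python
import re

ENDING_RE = re.compile(r"((i?[aeiou]?[tvz])|([bdfgklmnprst]{1,2}))e$")
DEVOICE = {"z": "s", "v": "f"}  # -z- becomes -s, -v- becomes -f


def rewrite_adjective(word: str) -> str:
    """Rewrite an inflected Dutch adjective ending to its uninflected form."""
    match = ENDING_RE.search(word)
    if match is None:
        return word  # assumption: leave words without a matching ending unchanged
    orig_end = match.group(0)
    stem = word[: len(word) - len(orig_end)]
    if len(orig_end) == 2:                       # consonant + e: remove -e
        return word[:-1]
    if len(orig_end) == 3:
        first, second = orig_end[0], orig_end[1]
        if first == second:                      # -sse -> -s
            new_end = first
        elif first in "aeiou":                   # -ave, -ene, -oze -> -aaf, -een, -oos
            new_end = first * 2 + DEVOICE.get(second, second)
        else:                                    # assumption: -ste -> -st
            new_end = first + second
    else:                                        # length 4: -ieve, -ieze, -iete
        new_end = orig_end[:2] + DEVOICE.get(orig_end[2], orig_end[2])
    return stem + new_end


examples = {w: rewrite_adjective(w) for w in ["hopeloze", "grote", "lieve", "losse", "kleine", "mooi"]}
```

As in the listing, not every type of adjective is covered: ‘mooie’, for example, does not match the regular expression and is left as it is.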

Sent.   Tweet
 1.0    Ze lopen aardig te knallen op Dance Valley, wat een hardcore!? Zo nu en dan wel een relaxed house nummertje #dv2013 #achtertuin
 1.0    Facking vet om helemaal vooraan de mainstage te staan bij dance valley!
 0.9    ’Nieuwe koers Dance Valley groot succes’ #NUnl http://t.co/3LRYNO0MnQ
 0.8    Ben ook nog is op dance valley, het kan niet beter
 0.6    Thuis mee kunnen feesten met Dance Valley
 0.3    @jacvandijk hey, moest jij niet naar dance valley vandaag? of is dat niets voor jou?
 0.2    Nog een laatste sfeerimpressie van mijn mooie werkdag en dan zit #DV13 er voor mij helaas weer op. http://t.co/cSdJ6gW6Bp
 0.0    ’Dance Valley heeft ziel terug’ http://t.co/oFr7ala26K
 0.0    RT @IamNizzle: Met 180 km/h richting Dance Valley #dancevalley < @ReggiedeWit moet voor Mclaren gaan rijden.
 0.0    Ik hoor Dance Valley als ik in de tuin staa
 0.0    #show #twente ’Dance Valley heeft ziel terug’ - SPAARNWOUDE - Dance Valley isvolgens festivaldirecteur Brian Bout... http://t.co/gEy8sXeSxN
 0.0    ’Dance Valley heeft ziel terug’ http://t.co/LAh9T0AZXH #Nieuwsflitser
-0.4    Ik wou ook naar dance valley maar nee ik moet werken en vrij krijgen mog niet bluhh
-0.4    Vanmiddag noodhulp "draaien". Ivm Dance Valley weinig mensen in dienst dus assisteren en de 2e noodhulpauto bemannen
-0.2    Dance Valley is echt sick !!!
-0.6    2 jaar terug.. deze zaterdag.. dance valley.. sloot jij je ogen.. ik mis je..
-0.6    Wil ook aanwezig zijn bij de dance valley, dit
-0.6    Je hoort eigenlijk niemand over Dance Valley!
-0.8    RT @Sidhappens: Ik zie Gaypride, Pandemonium en Solar op twitter en FB voorbij komen. Maar geen Dance Valley. Of ik volg niemand uit Purmer?
-0.8    Bonk, bonk, bonk Dance Valley, wind waait helaas de verkeerde kant op
-1.0    Ik heb alleen maar kut fotos. Maar dance valley is kapot gedraaid. Klaar voor morgen. WHHHOOOOOOPPPPP
-1.0    Dance Valley zou niet slecht zijn nee
-1.0    Ik ga helemaal bad dat ik niet bij dance valley ben, maar een lichtelijk goedmakertje met mn maatje van vroeger, #loveland @DanielvVliet

Table 6.4: Examples of sentiment analysis on tweets in the Dance Valley dataset. S is the value that is returned by the sentiment analysis algorithm. A negative value indicates negative sentiment, a positive value positive sentiment

Readability We based our readability value on the Flesch-Douma readability test12. This index is calculated using the formula: RE = 206.83 − (0.93 × SL) − (0.77 × W LS) (6.3) , where SL is the average number of words in the sentence and WL is the average number of syllables in each word. The formula uses the number of syllables for each word, which we don’t know. We do know the number of characters for each word. To preserve the ratio between the word length and the sentence length, we use a constant estimate to calculate the number of syllables based on

12http://www.kennislink.nl/publicaties/hoe-begrijpelijk-is-mijn-tekst

59 Table 6.5: Examples of tweets and their newsworthiness scores and manually assigned labels (C or N for chatter and news)

NW Label Tweet 1.0 N RT @politieKEN: Goede sfeer op Dance Valley, foto van net buiten het terrein. #DV13 #DV2013 http://t.co/wmxf6sDJzI 1.0 N #Entertainment ’Dance Valley groot succes’ - Telegraaf.nl:Shownieuws’Dance Valley groot s... http://t.co/RuCpHJi6aD #europa #openomroep 0.8 C Super leuke dag gehad op Dance Valley!! http://t.co/nRy3rkdSy7 0.8 N RT @politieKEN: Met de bus naar huis va DV? Pendelbussen rijden tot 00.00 u naar NS Adam Sloterdijk. #DV13 @DV2013 Meer vervoersinfo: http:? 0.6 N ’Nieuwe koers Dance Valley groot succes’ http://t.co/kLNDuQYLxh via @NUnl 0.5 N Soundflow is vandaag aanwezig op Dance Valley. Morgenavond komt het sfeerverslag, inclusief interviews, online op de site. 0.5 N Vandaag helpen wij de collega’s bij het solarweeekend bij dance valley en bij colourfestival in utrecht. Rvsecurity heeft 75 man werken. 0.5 C Bij Dance Valley! 0.5 C achter af wilde ik ook naar dance valley.. 0.2 C ?@xLENYBRITT: ik wil ook Dance Valley nu? waar woon je? 0.2 C bijna gaan die stekkers eruit at #Dancevalley Dance valley was toch wel die shit weet iemand nog een chille afterparty in A’dam? #getlaid 0.1 C RT @xMarleennJansen RT @Patriciaaa_x: Naar Dance Valley luisteren zonder een kaartje te hoeven betalen, gewoon in m’n tuin 0.0 C Wie van m’n volgers zijn NU in Dance Valley en kan ik vrijdag bellen in #MJJWI op @RTVRondeVenen? RT pls the number of characters in each word. We determined this number by counting letters and syllables in 250 randomly selected words from the Pinkpop dataset. We found 4.5 letters and 1.44 syllables for each word. We assume that the distribution of letters over syllables does not depend on the length of the word. Based on the number we found and the average number of words for all data sets (See table 5.3) we determined a ratio of 4.8:1.5 between word length in characters and syllable. Instead of the constant value 0.77 in the Flesch-Douma formula, we used 0.77*1.5/4.8 =0.24. 
The formula we use is therefore:

\[ RE = 206.83 - (0.93 \times SL) - (0.24 \times WLL) \tag{6.4} \]

where SL is the average number of words per sentence and WLL is the average number of letters per word. Some tweets contain more than one sentence. However, punctuation marks are used in several ways: to indicate emotions, to draw emoticons, and as punctuation. These marks are also often used incorrectly. We therefore estimated the number of sentences from the punctuation marks that are present in the tweet. We used the marks ‘.’, ‘?’, and ‘!’ as sentence separators. If multiple punctuation marks are used directly next to each other, we count them as one separator. For example, ‘...’, ‘!!!’ and ‘?!’ were each counted as one separation. We counted the total number of separations in the tweet, and if no separation was found at the end of the tweet, one extra separation was added. The number of separations was used as an indication of the number of sentences in the tweet.

We do not expect that the resulting readability values have the same meaning as when the formula is used as it is designed (for formal text), but we expect that more complicated tweets will result in lower readability scores. We selected some examples of tweets and show their text, readability scores, number of words and estimated number of sentences in Table 6.6. We saw that higher scores indeed indicate simpler sentences. Lower scores indicate more complex sentences or sentences that lack punctuation marks, which might also result in a higher complexity.
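The sentence estimate and Equation 6.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the thesis: the function names are hypothetical, words are taken to be whitespace-separated tokens, only alphabetic characters are counted as letters, and no pre-processing (such as removing URLs, whose dots would count as separators) is applied.

```python
import re

# Constants from Equation 6.4; 0.24 is 0.77 * 1.5 / 4.8 rounded as in the text.
INTERCEPT = 206.83
SENTENCE_WEIGHT = 0.93
LETTER_WEIGHT = 0.24

# A run of '.', '?' or '!' (e.g. '...', '!!!', '?!') counts as one separator.
SEPARATOR_RUN = re.compile(r"[.?!]+")


def estimate_sentences(tweet):
    """Estimate the number of sentences from runs of punctuation marks."""
    text = tweet.strip()
    if not text:
        return 0
    separations = len(SEPARATOR_RUN.findall(text))
    # One extra separation when the tweet does not end with a separator.
    if text[-1] not in ".?!":
        separations += 1
    return separations


def readability(tweet):
    """Adapted Flesch-Douma score; higher values indicate simpler text."""
    # Assumption: words are whitespace-separated tokens and only
    # alphabetic characters count as letters.
    words = tweet.split()
    sentences = estimate_sentences(tweet)
    if not words or sentences == 0:
        raise ValueError("tweet contains no words")
    sl = len(words) / sentences                      # words per sentence
    letters = sum(ch.isalpha() for word in words for ch in word)
    wll = letters / len(words)                       # letters per word
    return INTERCEPT - SENTENCE_WEIGHT * sl - LETTER_WEIGHT * wll
```

For a short tweet such as "Bij Dance Valley!" this counts three words, one sentence and fourteen letters. Because the tokenisation here is an assumption, scores from this sketch need not match the values in Table 6.6.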

Table 6.6: Examples of Flesch-Douma scores of tweets in the Dance Valley data set. #W is the total number of words in the tweet, #S the estimated number of sentences in the tweet

FD-score | Tweet | W/S | #W | #S
204 | Dance Valley .....wie spaarnwoude..? | 4.7 | 3 | 2
204 | Vandaag dv2013 | 7.0 | 1 | 1
200 | RT @politieKEN: 19e editie van Dance Valley, op enkel incident na, goed en veilig verlopen. Veilige thuisreis! DV13 DV2013 | 5.4 | 16 | 3
199 | Bijkomen van een GIGA DV13 zit er nier in we gaan gewoon door!!! ? het was zooooo VET gister! Thnx @ http://t.co/CM7JXeqHw3 | 4.1 | 19 | 3
197 | Bekijk hier de foto’s van Dance Valley 2013: IJMUIDEN - Fotograaf Michel van Bergen was voor ... http://t.co/DaQodddIRH Haarlems Dagblad | 4.9 | 18 | 2
196 | Aan het bijkomen van Dance Valley? Geniet na en zie jezelf misschienwel terug op onze foto’s en video: http://t.co/ugI4sLtxZY DV13 | 4.1 | 20 | 2
195 | Super mooie dag gehad op dance valley, hardwell is de man leven | 4.2 | 11 | 1
193 | Ik kan thuis lekker live meegenieten van Dance Valley en jullie lekker niet | 4.8 | 13 | 1
191 | @WendyK81 deze zaterdag Dance Valley en volgende week nog iets in Spaarnwoude, en dan weer rust :) | 4.7 | 15 | 1
190 | Waarom voor 60eu naar dance valley gaan als ik een priveconcert heb in mn tuin van ze.. | 4.1 | 17 | 1
187 | damn wat een dag fucking warm Kapper geweest dus haar weer fresh fresh morgen richting Amsterdam voor DANCE Valley whoop | 5.1 | 20 | 1
186 | kwart voor 11 en nog steeds 25 graden in de huiskamer,maar in de verte hoor ik onweer,morgen dance valley | 4.0 | 21 | 1
184 | over 21 dagen vliegen we naar Malta, tijd gaat snel als je geniet, maar morgen eerst maar eens flink feesten op dance Valley | 4.3 | 23 | 1
182 | Dus ik ga naar dance valley en hoop een kaartje te bemachtigen, als dit niet het geval is heb ik het op zijn minst geprobeerd | 4.0 | 25 | 1
181 | Zo weer terug van Dance Valley Was een super feestje Set van de dag was toch wel die van @djblackburnNL wat een heerlijke tracks draaide die | 4.4 | 26 | 1

6.3 Comparison between the models

We calculated the correlation for pairs of variables that are expressed as discrete time functions. We first rescaled the values of these variables to the range [0, 1] using feature scaling as defined in Equation 6.5:

\[ f' = \frac{f - \min(f)}{\max(f) - \min(f)} \tag{6.5} \]

We then calculated the Pearson correlation coefficient with its two-sided p-value and the Spearman correlation coefficient with its two-sided p-value for each combination of online and real world variables. To calculate the correlation coefficients and the p-values, we used the functions scipy.stats.spearmanr and scipy.stats.pearsonr from the SciPy library (footnote 13).

For each combination of variables we also determined the lag that has the highest cross-correlation, and for this lag we calculated the correlation coefficients and p-value. To calculate the cross-correlation we used scipy.signal.correlate, which results in an array containing the cross-correlations. Using the cross-correlation we can find the highest positive correlation. However, we are also interested in the negative correlations. We therefore inverted the second variable by using the difference between this function and the maximum value of this variable. We found the cross-correlation with the highest value for the positive and negative cross-correlations and determined the lag. The lag we found indicates the shift between the two inputs that results in the highest correlation. This lag should not be too high, otherwise it might not be realistic to assume a dependency or causal relationship between the occurrences online and in the real world. We limited the lag to 36 hours. The variables do not have values at regular intervals, so we calculated the average time difference between the tweets, and used this to determine the range in which the maximum cross-correlation and its corresponding lag should be found. We then shifted the two variables based on the lag we found, and calculated the Pearson correlation coefficient and its p-value. We determined the correlations both ways: assuming a causal relationship between the online and real world in both directions.
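The scaling of Equation 6.5 and the search for the best lag can be illustrated with the following sketch. It deliberately differs from the procedure described above: it uses only the Python standard library instead of SciPy, it assumes that both variables are already sampled at the same regular interval, and it evaluates the Pearson coefficient directly at every shift rather than using scipy.signal.correlate with an inverted second variable. The function names and the lag sign convention are assumptions made for this example.

```python
import math


def min_max_scale(values):
    """Rescale a sequence to the range [0, 1] (Equation 6.5)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant input; treated as 0 in this sketch
    return cov / math.sqrt(var_x * var_y)


def best_lag(x, y, max_lag):
    """Return (lag, r) with the largest |r| for shifts of at most max_lag steps.

    Assumed convention: a positive lag pairs x[i] with y[i + lag],
    so y follows x by `lag` samples.
    """
    best = (0, pearson(x, y))
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        if len(xs) < 3:
            continue  # too few overlapping samples to correlate
        r = pearson(xs, ys)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Toy example: y is x delayed by two samples.
x = min_max_scale([0, 1, 4, 9, 4, 1, 0, 0, 0, 0])
y = min_max_scale([0, 0, 0, 1, 4, 9, 4, 1, 0, 0])
lag, r = best_lag(x, y, max_lag=3)
```

Comparing absolute values lets one search cover both positive and negative correlations. With hourly samples, the 36-hour limit from the text would correspond to max_lag=36.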

13 http://docs.scipy.org/doc/scipy/reference/genindex.html

Chapter 7

Results and analysis

We calculated the real world variables and the online variables for each dataset as functions of time. For the real world variables we describe the correlation results in four categories: weather (rain, sun, temperature, max. temperature, fog, thunderstorm, wind, moon), holidays (public holidays, school holidays, construction industry holidays, weekends), context events (news, sports and festival events) and festival activities (Dance Valley, Sensation, Pukkelpop, Pinkpop and Lowlands). The online variables are described in three categories: place (distance between event and place of residence, distance between event and user and the distance between users), people (degree centrality, betweenness centrality, closeness centrality, inclusiveness, social equality, popularity, experience and people of interest) and content (originality, emotionality, sentiment, newsworthiness, readability and intensity). Of the people related variables, we did not calculate the betweenness and the closeness centrality measures for the Pukkelpop, Pinkpop and Lowlands datasets. We compared both the real world and the online variables with themselves to see which combinations of variables show weak, moderate or strong correlations. We then compared the real world variables with the online variables. We calculated the Pearson and Spearman correlation coefficients for each combination of variables. We only describe correlations that are moderate (|ρ| > 0.36) or high (|ρ| > 0.68) and have a p-value below 0.01, or weak correlations (0.15 < |ρ| ≤ 0.36) that we consider remarkable with regard to the meaning of the variables. When we state that we did not find any significant correlations, we mean that we did not find any correlation coefficients with |ρ| > 0.15 and p < 0.01. Since the Spearman and the Pearson correlation coefficients are often similar in our results, we often only describe the Pearson correlation coefficient. 
In the cases where we describe the Spearman correlation coefficient, we explicitly say so. When we refer to linear correlations or non-linear correlations, we refer to the Pearson correlation coefficient or the Spearman correlation coefficient respectively.
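The reporting thresholds can be written down as a small helper. This is a sketch with a hypothetical function name. It assumes that the thresholds apply to the magnitude of the coefficient, so that negative correlations are classified in the same way as positive ones.

```python
# Thresholds as stated in the text; applying them to |rho| is an assumption.
WEAK, MODERATE, STRONG = 0.15, 0.36, 0.68
ALPHA = 0.01


def classify(rho, p_value):
    """Label a correlation as strong, moderate, weak or not significant."""
    size = abs(rho)
    if p_value >= ALPHA or size <= WEAK:
        return "not significant"
    if size > STRONG:
        return "strong"
    if size > MODERATE:
        return "moderate"
    return "weak"
```

Under these assumptions a coefficient of 0.77 is labelled strong, 0.478 moderate and −0.26 weak.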

7.1 Relations between real world variables

We calculated the correlation coefficients for each possible combination of real world variables, which are distributed over four categories: weather, holidays, context events and festival activities.

Among the weather variables, the temperature and the daily maximum temperature have a strong correlation (ρP = 0.77). The sun intensity also has moderate correlations with the temperature (ρP = 0.478) and with the strength of the wind (ρP = 0.547). This is probably caused by the fact that all three variables are measured per hour and that the sun shines during the day, causing higher temperatures. The wind is also partly caused by the sun: when the sun shines, air starts moving, resulting in turbulence. Because of this the wind is stronger during the day than during the night.

The school holidays and the construction industry holidays have a strong correlation (ρP = 0.865), which has to do with the large overlap of both holidays in the summer. The weekends and the school holidays only have a moderate Spearman correlation coefficient (ρS = 0.391). This might have to do with the days that were chosen as boundaries for the holidays. The holidays start on the Saturday of the first weekend and end on the Sunday of the last weekend, and therefore overlap with the weekends.

Of the five music festivals we use in this research, the festival activities of Lowlands and Pukkelpop show a strong correlation (ρP = 0.695). This is caused by the overlap of two days between the festivals. Pukkelpop starts on August 15 and ends on August 17, while Lowlands starts on August 16 and ends on August 18.

We compared the weather variables and the holiday variables. We found that both the school holidays and the construction industry holidays have a moderate correlation with the temperature (ρP = 0.561 for both). Both types of holidays also have a high correlation with the daily maximum temperature (ρP = 0.705 and ρP = 0.695). Both correlations can be explained by the fact that the largest holiday in both categories is in the summer, when the temperature is higher. A remarkable but weak negative correlation is found between the weekends and the rain intensity and duration (ρ = −0.26). It seems that it rained less during the weekends than during weekdays in 2013.

Between the weather and context variables no significant moderate correlation values are found. However, a weak correlation between festival activities and thunderstorms is found (ρ = 0.216), which could be caused by the fact that both thunderstorms and festivals occur in the summer. We also find a weak correlation between the school holidays and music events (ρP = 0.218, ρS = 0.328). Between weekends and music events the Pearson correlation coefficient is very low, but the Spearman correlation coefficient is moderate (ρP = 0.071, ρS = 0.376). Music festivals are often held during holidays and weekends, which might cause these correlation values. Between the news, music and sports events we could not find significant correlations. We also could not find significant (moderate) correlations between the weather variables and festival activities and between the festival activities and the holidays.

An overview of the significant correlations between the real world variables can be found in Table 7.1. A graphical representation of the variables can be found in Appendix E. An overview of all correlation values for the real world variables can be found in Appendix I.

7.2 Relations between online variables

We calculated the correlation coefficients for each combination of online variables for each dataset and also calculated the average correlations for the datasets combined, based on the results of the five datasets. We describe the remarkable, moderate or strong correlations we found both in the average correlations and in the correlations for the separate datasets. An overview of the significant average correlation results can be found in Table 7.2. In the following paragraphs, we describe the results for each combination of categories of online variables. An extended summary can be found in Appendix J; an overview of the average correlations for the online variables can be found in Appendix K.

Table 7.1: Correlations between real world variables

Variable 1 | Variable 2 | ρP | P-value | ρS | P-value
Pukkelpop | Lowlands | strong 0.695 | 0.0 | moderate 0.632 | 0.0
School Holidays | Const. Ind. holidays | strong 0.865 | 0.0 | strong 0.826 | 0.0
School Holidays | weekend | weak 0.354 | 0.0 | moderate 0.391 | 0.0
Const. Ind. holidays | News | weak -0.2 | 0.0 | weak -0.208 | 0.0
Const. Ind. holidays | Sport | weak -0.16 | 0.0
School Holidays | Festivals | weak 0.218 | 0.0 | weak 0.328 | 0.0
School Holidays | News | weak -0.229 | 0.0
School Holidays | Sport | weak -0.186 | 0.0 | weak -0.173 | 0.0
weekend | Festivals | | | moderate 0.376 | 0.0
School Holidays | Dance Valley | weak 0.204 | 0.0 | weak 0.233 | 0.0
weekend | Dance Valley | moderate 0.363 | 0.0 | moderate 0.369 | 0.0
Max Temp | Moon | weak 0.202 | 0.0
Max Temp | Wind | weak -0.247 | 0.0 | weak -0.33 | 0.0
Rain | Max Temp | weak -0.251 | 0.0 | weak -0.298 | 0.0
Rain | Temperature | weak -0.241 | 0.0 | weak -0.249 | 0.0
Rain | Thunderstorm | weak 0.208 | 0.0 | weak 0.208 | 0.0
Sun | Temperature | moderate 0.478 | 0.0 | moderate 0.574 | 0.0
Sun | Wind | moderate 0.547 | 0.0 | moderate 0.532 | 0.0
Temperature | Max Temp | strong 0.77 | 0.0 | strong 0.709 | 0.0
Temperature | Moon | weak 0.15 | 0.0
Moon | News | weak 0.192 | 0.0 | weak 0.2 | 0.0
Thunderstorm | Festivals | weak 0.216 | 0.0
Sun | Dance Valley | weak -0.238 | 0.0 | weak -0.255 | 0.0
Max Temp | Const. Ind. holidays | strong 0.695 | 0.0 | moderate 0.554 | 0.0
Max Temp | School Holidays | strong 0.705 | 0.0 | moderate 0.536 | 0.0
Rain | School Holidays | weak -0.162 | 0.0
Rain | weekend | weak -0.26 | 0.0 | weak -0.256 | 0.0
Temperature | Const. Ind. holidays | moderate 0.561 | 0.0 | moderate 0.479 | 0.0
Temperature | School Holidays | moderate 0.561 | 0.0 | moderate 0.482 | 0.0

Table 7.2: Average correlations between online variables

Variable 1 | Variable 2 | µ(ρP) | σ(ρP) | P-value | µ(ρS) | σ(ρS) | P-value
Emotionality | Intensity | weak -0.167 | 0.054 | 0.0
Emotionality | Newsworthiness | weak -0.253 | 0.061 | 0.0 | weak -0.235 | 0.061 | 0.0
Originality | Newsworthiness | weak 0.189 | 0.272 | 0.0 | weak 0.219 | 0.183 | 0.0
Readability | Intensity | weak 0.27 | 0.074 | 0.0 | moderate 0.392 | 0.103 | 0.0
Betweenness centr. | Closeness centr. | weak 0.257 | 0.094 | 0.0 | weak 0.253 | 0.098 | 0.0
Betweenness centr. | Inclusiveness | weak -0.271 | 0.109 | 0.0 | weak -0.257 | 0.054 | 0.0
Degree centr. | Betweenness centr. | moderate 0.552 | 0.038 | 0.0 | moderate 0.457 | 0.078 | 0.0
Degree centr. | Closeness centr. | moderate 0.412 | 0.036 | 0.0 | moderate 0.426 | 0.054 | 0.0
Degree centr. | People of int. | weak -0.16 | 0.069 | 0.001
Degree centr. | Popularity | | | | weak -0.217 | 0.063 | 0.0
Degree centr. | Social Equality | weak 0.292 | 0.16 | 0.0
Inclusiveness | Popularity
Popularity | Experience | weak 0.311 | 0.254 | 0.0 | weak 0.281 | 0.215 | 0.0
Degree centr. | Intensity | weak -0.154 | 0.13 | 0.0 | weak -0.22 | 0.179 | 0.0
Experience | Newsworthiness | weak 0.242 | 0.199 | 0.0
Experience | Originality | weak -0.185 | 0.164 | 0.0
Inclusiveness | Intensity | weak 0.191 | 0.278 | 0.001 | weak 0.193 | 0.325 | 0.0
People of int. | Intensity | weak 0.182 | 0.063 | 0.0
Popularity | Newsworthiness | weak 0.154 | 0.222 | 0.0
Social Equality | Readability | weak -0.169 | 0.18 | 0.0 | weak -0.178 | 0.201 | 0.0
Dist Event - Res | Distance Users | moderate 0.657 | 0.254 | 0.0 | moderate 0.477 | 0.35 | 0.0
Dist Event - Res | Intensity | weak -0.164 | 0.3 | 0.0
Distance Users | Intensity | weak -0.162 | 0.286 | 0.0

Figure 7.1: Place and content related variables for Dance Valley (3 Aug 12:00 - 22:00)

Figure 7.2: People related variables for Dance Valley (3 Aug 12:00 - 22:00)

7.2.1 Place related variables

We calculated the distance between the users, between the current location of the users and the event, and between the place of residence of the users and the place of the event. We show the place related variables during one day in the Dance Valley dataset in Figure 7.1. The graphs containing the other festivals and the whole period of the dataset can be found in Appendix F.

We found that for the place related variables, only the Pearson correlation between the distance between the places of residence of the users and the distance between the event and the places of residence of the users is relatively high (ρP = 0.657 ± 0.254). This means that, given a changing group of people that is active on Twitter during a period of time, when the average distance between the event and the place of residence of this group rises, so does the distance between the users. This would suggest that the event is situated close to the centre of the area where these people live.

When we look at the correlations between these variables for the five festivals we see that Dance Valley and Sensation both show very strong correlations between the distance from the place of residence of the user to the event and the distance between the users (0.888 and 0.898). Both festivals are held close to Amsterdam. The demographic centre of the Netherlands is the Randstad, containing around 7 million people, which includes Amsterdam. We suspect that the fact that these two festivals are held in this populated area explains these correlation values. Lowlands, Pinkpop and Pukkelpop are held outside the Randstad. Based on the distances from these festivals to the Randstad, we would expect Lowlands to have a lower correlation between the two distances than we found for Dance Valley and Sensation, and Pinkpop and Pukkelpop to have even lower correlation values than Lowlands, but values that are close to each other. For Lowlands, we indeed found a lower correlation coefficient (0.655), and for Pukkelpop we only found a weak correlation of 0.196. For Pinkpop we would expect a similar number, but we found a correlation of 0.648. We do not have an explanation for this correlation value.
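As a worked example, the average reported for these two distances can be reproduced from the five per-festival coefficients quoted above. The sketch below assumes that 0.888 and 0.898 belong to Dance Valley and Sensation in the order in which they are named, and it uses the population standard deviation. The text does not state which estimator was used, so that choice is an assumption.

```python
from statistics import mean, pstdev

# Pearson coefficients between "distance event - residence" and
# "distance between users", as quoted in the text for each festival.
place_correlations = {
    "Dance Valley": 0.888,
    "Sensation": 0.898,
    "Lowlands": 0.655,
    "Pukkelpop": 0.196,
    "Pinkpop": 0.648,
}

values = list(place_correlations.values())
average = mean(values)    # 0.657, matching the reported average
spread = pstdev(values)   # population standard deviation, roughly 0.25
```

The mean agrees with the reported 0.657, and the population standard deviation is close to the reported 0.254. The small remaining difference is consistent with the per-festival values being rounded to three decimals.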

7.2.2 People related variables

We calculated the degree centrality, closeness centrality, betweenness centrality, social equality, inclusiveness, popularity, experience and people of interest for the five datasets. We show the people related variables during one day in the Dance Valley dataset in Figure 7.2. The graphs containing the other festivals and the whole period of the dataset can be found in Appendix F.

We found weak to moderate correlations between the centrality measures in each dataset. The average correlation between the degree centrality and the betweenness centrality is ρP = 0.552 ± 0.038, and between the degree centrality and the closeness centrality ρP = 0.412 ± 0.036. The betweenness centrality and the closeness centrality only have an average Pearson correlation coefficient of ρP = 0.257 ± 0.094. The betweenness and closeness centrality are only measured for Dance Valley and Sensation.

The betweenness centrality shows a weak correlation with the inclusiveness (ρP = −0.271 ± 0.109). This means that if there are people that know more people (are more ‘connected’), at the same time the number of users that do not have friends or followers in that time window is lower. The degree centrality reflects the ratio between the number of people that users have a relation with and the number of people that they could have had a relation with. The degree centrality shows a weak positive correlation with the social equality (ρP = 0.292 ± 0.16). The social equality is the share of a user's relationships (friends and followers) that are unidirectional. Both variables have higher values when people are more ‘connected’, which explains the correlation value.

The degree centrality furthermore shows a weak non-linear correlation with the popularity of the users (ρS = −0.217 ± 0.063). The popularity is measured by the ratio of friends to followers. The degree centrality does not differentiate between unidirectional and bidirectional relationships and just counts the number of relationships that exist, compared to the number of people that are available to have a relationship with. Only people within the dataset are considered. The popularity considers just the total number of friends and the total number of followers, regardless of whether the users belong to the dataset. According to this correlation value, people that are less connected with the other people that tweet about the same festival might have more friends than followers. It is not clear what this finding means.

The experience of people is measured by the number of tweets that they have posted. This number is also related to the popularity of these users (ρP = 0.311 ± 0.25). This correlation could be explained by the fact that people have more friends when they have been using Twitter for a longer period of time.

When we looked into the correlations between the online variables for the individual datasets, we found a strong correlation between social equality and popularity for the Sensation dataset (ρP = 0.675). We found a Spearman correlation for the Pukkelpop dataset (ρS = −0.323), but the other datasets did not show a correlation between these variables. This correlation can be explained by the definitions of these variables: both variables say something about the ratio between followers and friends.
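To make the verbal definitions of three of the people related variables concrete, the sketch below computes them for a small set of follow relations. It is an interpretation of the descriptions in this section only, not the implementation used for the thesis. The function names are hypothetical, relations are given as (follower, followed) pairs, and the per-window averaging over users is simplified.

```python
def degree_centrality(users, follows):
    """Mean share of the possible relations that users actually have.

    Direction is ignored: a follow in either direction counts as one
    relation, and only relations between the given users are considered.
    """
    users = set(users)
    n = len(users)
    if n < 2:
        return 0.0
    pairs = {frozenset((a, b)) for a, b in follows
             if a != b and a in users and b in users}
    degree = dict.fromkeys(users, 0)
    for pair in pairs:
        for user in pair:
            degree[user] += 1
    return sum(d / (n - 1) for d in degree.values()) / n


def social_equality(follows):
    """Share of relations that are unidirectional (assumed reading)."""
    edges = {(a, b) for a, b in follows if a != b}
    pairs = {frozenset(edge) for edge in edges}
    if not pairs:
        return 0.0
    one_way = 0
    for pair in pairs:
        a, b = tuple(pair)
        if not ((a, b) in edges and (b, a) in edges):
            one_way += 1
    return one_way / len(pairs)


def popularity(friends, followers):
    """Ratio of friends to followers, using a user's overall totals."""
    if followers == 0:
        return float("inf")  # assumption: no followers gives an unbounded ratio
    return friends / followers


# Toy window: a and b follow each other, a follows c, d is isolated.
users = ["a", "b", "c", "d"]
follows = [("a", "b"), ("b", "a"), ("a", "c")]
centrality = degree_centrality(users, follows)
equality = social_equality(follows)
```

In this toy window the users hold on average one third of their possible relations, and one of the two relations is unidirectional.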

7.2.3 Content related variables

We calculated the originality, emotionality, sentiment, newsworthiness, readability and intensity of the tweets. We show the content related variables during one day in the Dance Valley dataset in Figure 7.1. The graphs containing the other festivals and the whole period of the dataset can be found in Appendix F.

For the content related variables we found a weak correlation between the emotionality and the intensity (ρP = −0.167 ± 0.054) and between the readability and the intensity (ρP = 0.27 ± 0.074 and ρS = 0.392 ± 0.103). This means that when the number of tweets per time unit is higher, people are sending shorter messages which include fewer emoticons.

We also found correlations between the newsworthiness and the emotionality (ρP = −0.253 ± 0.061) and between the originality and the newsworthiness (ρP = 0.189 ± 0.272). These correlations can be explained by the definition of the newsworthiness: we consider whether the message is retweeted as an indication of newsworthiness, which also defines the originality, and we consider the presence of emoticons as a contra-indication for the newsworthiness of the tweet. It should be noted that while the average correlation between the newsworthiness and the originality is weak and positive, and the correlation for Sensation, Pukkelpop, Pinkpop and Lowlands is weak to moderate and positive (ρP = 0.302, ρP = 0.334, ρP = 0.52, ρP = 0.23), the correlation for the Dance Valley dataset is negative (ρP = −0.306). It is unclear what causes these differences.

When we looked at the individual datasets, we also found correlations between the readability and the originality in the Sensation dataset (ρS = 0.308) and the Pinkpop dataset (ρP = 0.328). However, in the Pinkpop dataset we found a linear correlation, while for the Sensation dataset we found a non-linear correlation (ρP = 0.079). It seems that shorter and simpler tweets are more often retweeted than more complex tweets.

We furthermore found correlations between the intensity and the originality (ρP = 0.241) and the intensity and the sentiment (ρP = 0.293), between the readability and the emotionality (ρP = −0.22) and between the readability and the sentiment (ρP = 0.212) in the Sensation dataset. In the Pinkpop dataset we found a weak correlation between the readability and the newsworthiness of the tweets (ρP = 0.244). These correlations are not found in the other datasets.

7.2.4 Place and people related variables

Between the variables that describe place and people related characteristics we could not find combinations with significant average correlations.

The social equality and the distance between the users showed a weak negative correlation for four datasets (Dance Valley: ρP = −0.285, Pinkpop: ρP = −0.105, ρS = −0.164 and Lowlands: ρP = −0.124). This suggests that when users that have more unidirectional relationships compared to the total number of relationships they have are active, the average distance between the places of residence of these users is smaller. However, in the Sensation dataset we found a positive linear correlation and an even higher non-linear correlation (ρP = 0.198, ρS = 0.473). We do not know what causes these differences.

When we looked into the individual datasets, we found more weak to moderate non-linear correlations between the place and people related variables than linear correlations. In the Dance Valley and the Sensation dataset, we found several weak and moderate linear and non-linear correlations between the people and place related variables, while we did not find any in the Pukkelpop, Pinkpop and Lowlands datasets.

In the Dance Valley dataset we found weak correlations between the distance between the users and the experience (ρS = −0.258) and the social equality (ρP = −0.285), and between the distance between the places of residence of the users and the event and the betweenness centrality (ρP = −0.217).

In the Sensation dataset we also found a non-linear moderate correlation between the distance between the place of the event and the place of residence of the user and the social equality (ρS = 0.507). We furthermore found correlations between this same distance and the degree centrality (ρS = 0.368) and the popularity (ρS = −0.387). There is also a non-linear correlation between the distance between the users and the popularity (ρS = −0.4) and the social equality (ρS = 0.473).

7.2.5 Place and content related variables

On average over the datasets we found a weak non-linear correlation between the distance between the event and the place of residence of the users and the intensity of the tweets (ρS = −0.164 ± 0.286). This value has a high standard deviation, which implies large differences between the datasets. Indeed, we find different values. Dance Valley (ρS = −0.431) and Sensation (ρS = −0.563) show moderate negative correlations, while Pukkelpop (ρS = −0.281), Pinkpop (ρS = 0.28) and Lowlands (ρS = 0.182) show weaker correlations. We suspect that a lower distance in these datasets means that the users are living closer to the Randstad (see also Section 7.2.1). This correlation would then mean that at moments that more people from the Randstad are active, the number of tweets per time unit is also higher. A possible explanation is that during the festivals the intensity is higher and people from Amsterdam are tweeting more about these festivals because they are held close to Amsterdam.

We furthermore found weak and moderate correlations between variables in the five separate datasets. In the Dance Valley dataset we found a correlation between the distance from the event to the place of residence and the originality of the tweet (ρP = 0.302) and between the distance between the users and the newsworthiness (ρP = −0.373). In the Sensation dataset we found correlations between the readability and the distance between the event and the place of residence (ρP = −0.402) and between the distance between the active users and the intensity (ρP = −0.542). In the Pukkelpop dataset we found a correlation between the distance between the event and the place where the tweet is published and the readability (ρP = −0.303). For Pinkpop and Lowlands we did not find any other significant correlations between place and content related variables.

7.2.6 People and content related variables

When we compared the people related variables with the content related variables we found a weak negative correlation between the intensity and the degree centrality (ρP = −0.154 ± 0.13). The standard deviation of this correlation is relatively low, which means that this value is fairly consistent across the datasets. We did however find a moderate non-linear negative correlation between these variables in the Sensation dataset (ρS = −0.499). This negative correlation shows that at moments that people are more ‘connected’ within the group of people that tweet about the same music event, the number of tweets per time unit is lower.

Between the intensity and the inclusiveness we found a weak average correlation (ρP = 0.191 ± 0.278), but with a high standard deviation. This is caused by the moderate correlations between these values which we found in the Pinkpop (ρP = 0.486) and the Lowlands (ρP = 0.445) datasets. This correlation means that in periods with a higher number of tweets per time unit, the number of users that have no friends or followers within the group of people that tweet about the same music event is higher. This is in line with the previous finding: in both cases a higher intensity coincides with users that are less connected. The inclusiveness and the degree centrality themselves are not correlated, which is caused by the different ways in which they define the (lack of) “connectedness” of the users.

We furthermore found a weak negative correlation between the social equality and the readability (ρP = −0.169 ± 0.18). The social equality appears to have a stronger correlation with the readability in the Sensation dataset (ρP = −0.386). In the Sensation dataset we furthermore found moderate non-linear correlations between the popularity and the intensity (ρS = 0.538) and between the intensity and the social equality (ρS = −0.694).

7.3 Relations between online and real world variables

We compared the online variables with the real world variables by calculating the Pearson correlation for the two variables while being shifted in time. We only considered results with p-values below 0.01 and with correlation values above 0.15 or below −0.15. We furthermore calculated the maximum correlations for shifting the variables up to a maximum time difference of 36 hours in both directions. We only considered the shifted value with the highest Pearson correlation. We describe only a selection of the correlations we found in the paragraphs below. An overview of the average correlations for the five datasets can be found in Table 7.3. For more details see Appendix J for the significant correlations for the different datasets; the complete average correlations can be found in Appendix K.

7.3.1 Correlations between online place and real world variables

We compared the place variables with the real world variables. We found some weak average correlations with the holiday variables and no significant average correlations with the weather, festival activities or context variables. We mainly found significant correlations in the Dance Valley and Sensation datasets. The place related variables for which we found correlations are mostly the distance between the places of residence of the users and the distance between the place of residence of the users and the event.

Table 7.3: Average cross correlations between online and real world variables

Variable 1 | Variable 2 | µ(ρP) | σ(ρP) | P-value | µ(lag) | σ(lag)
Festivals | Intensity | weak 0.23 | 0.27 | 0.0 | 6:9:55 | 12:19:51
Festivals | Intensity | weak 0.235 | 0.269 | 0.0 | -28:16:42 | 6:23:17
Festivals | Readability | weak 0.192 | 0.181 | 0.0 | 16:1:44 | 14:33:25
Festivals | Betweenness centr. | weak 0.154 | 0.049 | 0.0 | -16:40:17 | 8:57:54
Festivals | Closeness centr. | weak 0.177 | 0.054 | 0.0 | 0:0:0 | 0:0:0
Festivals | Closeness centr. | weak 0.213 | 0.026 | 0.0 | -22:59:31 | 12:26:25
Festivals | Social Equality | weak -0.198 | 0.14 | 0.0 | 15:29:57 | 12:18:2
Festivals | Social Equality | weak -0.181 | 0.131 | 0.0 | -1:54:39 | 3:45:44
Dance Valley | Betweenness centr. | weak -0.154 | 0.094 | 0.0 | 13:18:52 | 2:45:45
School Holidays | Intensity | weak 0.245 | 0.231 | 0.0 | 12:50:46 | 15:52:21
School Holidays | Intensity | weak 0.246 | 0.23 | 0.0 | -7:0:8 | 14:0:17
School Holidays | Readability | weak 0.229 | 0.196 | 0.0 | 14:1:4 | 14:55:48
School Holidays | Readability | weak 0.228 | 0.193 | 0.0 | -5:17:6 | 8:56:31
weekend | Emotionality | weak -0.19 | 0.041 | 0.0 | 11:39:27 | 14:13:45
weekend | Emotionality | weak -0.185 | 0.06 | 0.0 | -15:15:45 | 7:49:44
weekend | Intensity | moderate 0.392 | 0.249 | 0.0 | 13:39:43 | 15:58:46
weekend | Intensity | moderate 0.395 | 0.245 | 0.0 | -9:59:25 | 12:58:12
weekend | Readability | weak 0.218 | 0.076 | 0.0 | 17:11:49 | 14:32:3
weekend | Readability | weak 0.198 | 0.064 | 0.0 | -16:21:1 | 13:57:28
weekend | Sentiment | weak 0.179 | 0.109 | 0.003 | -3:36:52 | 5:19:57
Const. Ind. holidays | Social Equality | weak 0.162 | 0.106 | 0.0 | 17:18:41 | 15:19:28
School Holidays | Degree centr. | weak -0.152 | 0.108 | 0.006 | 15:30:43 | 16:18:8
School Holidays | Dist Event - Res | weak -0.159 | 0.136 | 0.0 | 29:44:0 | 4:30:11
Max Temp | Readability | weak 0.174 | 0.155 | 0.0 | 1:41:14 | 3:22:28
Max Temp | Readability | weak 0.161 | 0.168 | 0.0 | -3:49:12 | 6:19:53
Sun | Newsworthiness | weak 0.154 | 0.057 | 0.0 | 7:40:22 | 10:30:44
Temperature | Intensity | weak 0.159 | 0.095 | 0.0 | -8:20:53 | 13:4:25
Temperature | Readability | weak 0.163 | 0.118 | 0.0 | 7:6:9 | 13:49:26

Weather and place related variables

We compared the place related variables with the weather related variables and did not find any combinations that on average result in a weak or stronger correlation. When we inspected the five datasets separately, we found a number of weak correlations, most of them in the Dance Valley and Sensation datasets. When we compared the distance between the event and the place of residence with the weather, we found a weak correlation with the maximum temperature for the Dance Valley dataset (ρP = −0.213). We furthermore found correlations between the position of the moon and the distance between the place of residence of the users (ρP = 0.285 for the Dance Valley dataset and ρP = −0.301 for the Sensation dataset). These correlations have different signs. We also found corresponding correlations between the distances between the users and the position of the moon for these datasets (ρP = 0.289 for the Dance Valley dataset and ρP = −0.211 for the Sensation dataset). We do not know how to explain these correlations.

Holiday and place related variables

When we compared the holiday related variables with the place related variables, we found a weak to moderate negative correlation between the weekend variable and both the distance between the place of residence and the place of the event and the distance between the users. We found these correlations for the Dance Valley and Sensation datasets. We know from section 7.2.1 that these two distance measures are strongly correlated in the Sensation and Dance Valley datasets, which might cause similar behaviour when they are correlated with another variable. We also found that the lag for these variables is similar (see also Table 7.4). In our implementation, the weekend starts on Saturday at 00:00. The lag that we found implies that the correlation is stronger if we use Friday and Saturday as the weekend instead of Saturday and Sunday. In other words, the distance between the event and the place of residence and the distance between the users are smaller on Fridays and Saturdays. Since the place of the event is (close to) Amsterdam for both festivals, this might imply that more people who live in the Randstad talk about these festivals on Fridays and Saturdays, while on other days more people from other regions in the Netherlands participate.

Table 7.4: Holiday and place related variables

| Dataset | Real world variable | Online variable | ρP | lag |
|---|---|---|---|---|
| Dance Valley | weekend | Dist Event - Res | N -0.394 | -24:35:06 |
| Sensation | weekend | Dist Event - Res | M -0.261 | -31:17:56 |
| Dance Valley | weekend | Distance Users | N -0.419 | -25:38:12 |
| Sensation | weekend | Distance Users | M -0.178 | -31:17:56 |
| Dance Valley | School Holidays | Dist Event - Res | N -0.411 | -27:43:39 |
| Dance Valley | School Holidays | Dist Event - Res | N -0.415 | 30:49:39 |
| Sensation | School Holidays | Dist Event - Res | M -0.172 | 35:25:57 |

Given this, we would expect a similar correlation with the school holidays. Indeed, we found a moderate correlation for Dance Valley. However, we also found a correlation of comparable strength for the Dance Valley and Sensation datasets with a lag of about a day in the other direction. This would mean that there is a correlation between the places of residence of the active users and something that starts on Sundays and ends on Mondays. Possibly, the correlations between the school holidays and the distances contain information about both the weekends and the school holidays.
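The lags reported in this section come from cross correlations between a real world series and an online series. The sketch below illustrates the idea under several assumptions that are not taken from this thesis: both series are sampled hourly, the function names are hypothetical, and a negative lag is taken to mean that the online series changes before the real world series. It compares a Saturday/Sunday weekend indicator with a synthetic online series that is active on Fridays and Saturdays.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant series
    return sxy / sqrt(sxx * syy)


def lagged_correlation(real, online, lag):
    """Correlate real[t] with online[t + lag]; lag is in samples (hours).

    Assumed convention: a negative lag pairs each real world value with an
    earlier online value, i.e. the online series leads.
    """
    n = len(real)
    if lag >= 0:
        a, b = real[:n - lag], online[lag:]
    else:
        a, b = real[-lag:], online[:n + lag]
    return pearson(a, b)


def best_lag(real, online, max_lag):
    """Lag in [-max_lag, max_lag] with the largest absolute correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: abs(lagged_correlation(real, online, k)))


# Synthetic data: four weeks of hourly samples, hour 0 is Monday 00:00.
HOURS = 4 * 7 * 24
weekend = [1 if (h // 24) % 7 in (5, 6) else 0 for h in range(HOURS)]  # Sat, Sun
online = [1 if (h // 24) % 7 in (4, 5) else 0 for h in range(HOURS)]   # Fri, Sat

print(best_lag(weekend, online, 36))  # -24: the online series leads by one day
```

With this convention, a best lag of about minus one day corresponds to the observation that the weekend correlation is stronger for Fridays and Saturdays than for Saturdays and Sundays. The sign convention used for the lags in the tables may differ.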

Festival and place related variables

In the Dance Valley dataset we found that both the distance between the event and the place of residence of the users and the distance between the places of residence of the users have a weak negative correlation with the occurrence of Dance Valley. When we consider the direction of the lag with the highest correlation values, it appears that one day after the occurrence of Dance Valley people who live in the Randstad are more active on Twitter (Dist Event - Res: ρP = −0.293, lag +28:45:41; Distance Users: ρP = −0.322, lag +27:43:39). We also found that in the Pinkpop dataset there is a positive correlation with the occurrence of Lowlands (ρP = 0.278, lag −28:54:24) and Pukkelpop (ρP = 0.281, lag −33:45:11), also with a lag of around 30 hours, but in the other direction. This would mean that people from a larger region in the Netherlands are active while Pukkelpop and Lowlands are taking place, starting 30 hours before the event. We would also expect such a relationship for Pinkpop itself, but we did not find one.

Context and place related variables

In the Dance Valley dataset we found a weak negative correlation between the distance between the users and news events (ρP = −0.33). In the Sensation dataset we found a weak positive correlation between sport events and the distance between the place of residence of the users and the event (ρP = 0.355), and a moderate positive correlation between sport events and the distance between the users (ρP = 0.45). Each of these three correlations has a positive lag of 30 hours. The lag of 30 hours means that the correlated behaviour is found 30 hours after the start of the day on which the event occurred in the real world. The correlation found with the news events means that thirty hours after a news event occurs, the people that tweet about Dance Valley have a shorter distance from the place where they tweet to the place of the event. In the case of Dance Valley, this means that more people who are closer to Amsterdam are tweeting about Dance Valley. When we looked at the sport events, we found that 30 hours after a sport event, people who live further away from Amsterdam tweet more often about Sensation, and that the users live further away from each other. A possible explanation for these findings might be that even though people tweet about Dance Valley, they might be triggered to tweet by news or sport events. We do not know how to explain the correlation with regard to the location. We did not find comparable results for the Pinkpop, Lowlands or Pukkelpop datasets.

7.3.2 Correlations between the relations between people online and the real world variables

When we compared the people related variables with the real world variables, we found weak significant average correlations for the holidays and for the context variables. For the festival activities and the weather we did not find any significant average correlations with the relations between people. When we looked at the different datasets, the Dance Valley and Sensation datasets showed the strongest correlations. The variables that describe the relations between people and show the strongest correlation with the real world occurrences are the social equality, experience and degree centrality.

Weather and people related variables

Between the weather and people related variables we found no significant average correlations. When we looked into the results for the five datasets separately, we found very diverse weak correlations between the two types of variables. The most remarkable correlations are between the position of the moon and various people related variables, which occur mainly in the Dance Valley and Sensation datasets. In the Dance Valley dataset we found that during the period around the full moon the closeness and degree centrality tend to be higher, while the popularity tends to be lower. In the Sensation dataset we found that the degree centrality, inclusiveness and social equality tend to be lower during full moon, while the popularity tends to be higher. In the Pinkpop dataset the inclusiveness and the number of people of interest tend to be lower during the period around full moon. The variables that are correlated with the moon are listed in Table 7.5. We found conflicting correlation values with opposite signs, but given their strength and the number of significant correlations we found, we expect that the position of the moon does have an impact on people. We cannot, however, explain how this variable affects people.

Table 7.5: Weather and people related variables

| Dataset | Real world variable | Online variable | ρP |
|---|---|---|---|
| Dance Valley | Moon | Closeness centr. | M 0.279 |
| Dance Valley | Moon | Degree centr. | M 0.296 |
| Sensation | Moon | Degree centr. | M -0.323 |
| Dance Valley | Moon | Popularity | M -0.203 |
| Sensation | Moon | Popularity | M 0.396 |
| Sensation | Moon | Social Equality | M -0.448 |
| Pinkpop | Moon | Inclusiveness | M -0.316 |
| Sensation | Moon | Inclusiveness | M -0.216 |
| Pinkpop | Moon | People of int. | M -0.222 |

Holiday and people related variables

We found only very weak correlations between the holiday related variables and the people related variables for the five datasets together. We did find more weak correlations, and also some moderate correlations, within the Dance Valley and Sensation datasets. For example, in the Dance Valley dataset the inclusiveness showed a moderate correlation with the school holidays (ρP = 0.453) and the construction industry holidays (ρP = 0.411), which would imply that during the holidays more people who do not have friends or followers in the Sensation dataset tweet about Sensation. Sensation took place on the 6th of July; part of the school holidays starts around and after this date, and the construction industry holidays start after this date. We would therefore not expect people to tweet more about Sensation after the festival has been held. We do not have an explanation for this correlation. Possibly, it is caused by tweets about the HTC Sensation smartphone.

Festival and people related variables

When we compared festival activities with the people related variables, we did not find average correlations for the five datasets that indicate a relationship between these variables. When we inspected the results for the separate datasets, we found weak correlations between these categories of variables. In particular, the Pinkpop, Pukkelpop and Lowlands activities showed weak correlations with some of the people related variables in the Pinkpop, Pukkelpop, Lowlands and Sensation datasets. We assume that these three festivals have similar characteristics, so we compare the people related variables for the five datasets while combining the festivals. The variables with a significant correlation can be found in Table 7.6. We see that the results for the dance festival datasets, Sensation and Dance Valley, are very different from the results for the pop festivals Pinkpop, Pukkelpop and Lowlands. For both the inclusiveness and the social equality, the correlations between the festival activities and these people related variables are the inverse of those in the other datasets. Both the inclusiveness and the social equality are related to the number of friends the user has, and are influenced by the fact that we only consider friends and followers that have published tweets within these datasets to determine these variables. Possibly the people that tweet about the pop festivals are ‘outsiders’ in the group of people that tweet about the dance festivals, while the people that tweet about pop festivals do have more social relations with the people that tweet about these pop festivals.

Table 7.6: Festival and people related variables

| Dataset | Real world variable | Online variable | ρP | lag |
|---|---|---|---|---|
| Sensation | Lowlands | Closeness centr. | M 0.218 | 33:22:3 |
| Sensation | Pukkelpop | Closeness centr. | M 0.235 | 0:0:0 |
| Pukkelpop | Dance Valley | Degree centr. | M 0.162 | 15:56:9 |
| Sensation | Lowlands | Degree centr. | M 0.263 | 4:54:30 |
| Sensation | Pukkelpop | Degree centr. | M 0.262 | 0:42:4 |
| Pukkelpop | Pukkelpop | Degree centr. | M -0.207 | 31:15:27 |
| Sensation | Pukkelpop | Experience | M 0.16 | -23:42:33 |
| Dance Valley | Dance Valley | Experience | M 0.2 | -12:45:28 |
| Sensation | Lowlands | Experience | M 0.171 | -7:42:23 |
| Pinkpop | Pinkpop | Experience | M -0.189 | 4:17:19 |
| Sensation | Pukkelpop | Inclusiveness | M -0.188 | 20:14:52 |
| Sensation | Lowlands | Inclusiveness | M -0.189 | 35:25:57 |
| Pinkpop | Pinkpop | Inclusiveness | M 0.37 | 33:38:44 |
| Pinkpop | Pinkpop | Inclusiveness | M 0.405 | -33:45:11 |
| Pukkelpop | Pukkelpop | Inclusiveness | M -0.204 | 24:23:17 |
| Lowlands | Pukkelpop | Inclusiveness | M 0.269 | 26:55:50 |
| Lowlands | Lowlands | Inclusiveness | M 0.241 | 2:49:26 |
| Pukkelpop | Lowlands | People of int. | M 0.158 | -29:14:30 |
| Sensation | Sensation | People of int. | M 0.227 | -35:25:57 |
| Pukkelpop | Pukkelpop | People of int. | M 0.251 | -23:40:46 |
| Lowlands | Pukkelpop | People of int. | M 0.206 | 19:19:45 |
| Lowlands | Lowlands | People of int. | M 0.266 | 16:33:21 |
| Pinkpop | Pinkpop | Social Equality | M -0.166 | 31:49:7 |
| Lowlands | Lowlands | Social Equality | M -0.211 | 35:0:44 |
| Lowlands | Pukkelpop | Social Equality | M -0.218 | 29:48:51 |
| Pukkelpop | Lowlands | Social Equality | M -0.215 | 30:8:31 |
| Pukkelpop | Pukkelpop | Social Equality | M -0.287 | 33:53:48 |
| Sensation | Pukkelpop | Social Equality | M 0.273 | -34:3:21 |
| Sensation | Lowlands | Social Equality | M 0.256 | -4:12:26 |

When we disregard the results from the Sensation and Dance Valley datasets, we find a couple of correlations between pop festival activities and people related variables. During the festivals, we observed that fewer tweets are posted by people who do not know anyone in the group of users that tweet about the same festival. This is observed around 30 hours after the festival activities have started. We furthermore found that before and after the festivals have started slightly more people with verified accounts post tweets about these festivals or about one of the other festivals. Around 30 hours after the festival starts the social equality tends to decrease. The social equality is the proportion of a user's relationships (friends and followers) that are unidirectional, relative to the total number of relationships that the user has. If this number decreases, people have more bidirectional relations and are more connected with the group.
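The definition of social equality given above can be sketched in code. This is a minimal illustration, not the implementation used in this thesis. It assumes that a user's friends are the accounts the user follows, that followers are the accounts that follow the user, that a relationship is unidirectional when it exists in only one direction, and that each related account is counted once in the total. The function name and the optional `active_users` argument, which restricts the relations to accounts that published tweets within the dataset, are hypothetical.

```python
def social_equality(friends, followers, active_users=None):
    """Share of a user's relationships that are unidirectional.

    friends:      accounts the user follows (assumed meaning)
    followers:    accounts that follow the user (assumed meaning)
    active_users: optional set of accounts with tweets in the dataset;
                  relations with other accounts are ignored
    """
    friends, followers = set(friends), set(followers)
    if active_users is not None:
        friends &= set(active_users)
        followers &= set(active_users)
    related = friends | followers          # every account with any relation
    if not related:
        return 0.0                         # assumption: no relations gives 0
    unidirectional = friends ^ followers   # relation in one direction only
    return len(unidirectional) / len(related)


# Hypothetical user: follows a, b, c and is followed by b, c, d.
print(social_equality({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

A lower value means that a larger share of the relations is bidirectional, which matches the interpretation given above.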

Context and people related variables

Between the context and people related variables we found a weak negative correlation between the occurrence of festivals and the social equality amongst the people that tweet about the five festivals (ρP = −0.198 ± 0.14). This means that the people that are active on Twitter while a festival occurs have more unidirectional relations. What causes this correlation is unclear. We furthermore found in the Sensation dataset a weak positive correlation between the occurrence of festivals and the popularity (ρP = 0.393) and a weak negative correlation with the social equality (ρP = −0.461). In the Pukkelpop dataset we found a weak correlation between the occurrence of festivals and the inclusiveness (ρP = 0.324). This would imply that during the occurrence of festivals, more people who have more followers than friends, and who also have fewer bidirectional relationships, tweet about Sensation. During the occurrence of festivals more people who do not have friends or followers in the Pukkelpop dataset tweet about Pukkelpop. We do not know what causes these correlations.

7.3.3 Correlations between the tweet content and the real world variables

We found several weak to moderate average correlations between the tweet content and the real world variables. We found these correlations in all datasets. The content variables that show the strongest correlations are the intensity, readability and newsworthiness.

Weather and content related variables

We found very weak average correlations between the weather and content related variables. We found a very weak positive correlation between the sun intensity and the newsworthiness of the tweets (ρP = 0.154 ± 0.057). The sun intensity is higher during the day than during the night. Possibly, this correlation arises because people send more newsworthy tweets during the day than during the night. The temperature is also higher during the day and shows a very weak correlation with the intensity (ρP = 0.159 ± 0.095) and the readability (ρP = 0.163 ± 0.118). The readability shows a very weak correlation with the maximum temperature (ρP = 0.174 ± 0.155). The maximum temperature is determined per day and thus shows the changing temperature trends instead of the temperature changes within one day. This correlation with the readability shows that during the warmer periods the tweets are a little simpler and shorter than when the maximum temperature is lower. This correlation is strongest in the Dance Valley dataset (ρP = 0.399). In the Dance Valley dataset we furthermore found a moderate positive correlation between the maximum temperature and the intensity (ρP = 0.408), which implies that during the warmer periods the number of messages per time unit is higher. This might be related to the weak positive correlation between the intensity and the readability. In the Dance Valley dataset we found a remarkable moderate negative correlation between the sun intensity and the sentiment (ρP = −0.446). This would imply that during the day the sentiment is more negative than during the night and that people are more negative when the sun shines stronger. The percentage of tweets with negative sentiment is very low (<10%), which might influence the reliability of this correlation value. Since the sun intensity increases during the morning and decreases during the afternoon, it might also have to do with the fact that people are having more fun in the evenings, when they do not have to work.

A remarkable finding is that, although the correlations we found between combinations of the same variables are usually consistent across the different datasets, we found a number of inconsistent correlations between the moon and various content related variables (see Table 7.7). Although the originality has a weak positive correlation with the position of the moon in the Pinkpop dataset, this correlation is of similar strength but negative in the Dance Valley dataset. We saw the same for the Pinkpop and Sensation datasets with regard to the correlation between the intensity and the position of the moon, and between the readability and the position of the moon. We also found an inverse lag for the correlation between the moon and the intensity and between the moon and the readability for the different datasets. We do not know how to explain this.

Table 7.7: Weather and content related variables

| Dataset | Real world variable | Online variable | ρP | lag |
|---|---|---|---|---|
| Dance Valley | Moon | Originality | M -0.222 | -30:49:39 |
| Pinkpop | Moon | Originality | M 0.202 | -32:34:16 |
| Sensation | Moon | Intensity | N 0.366 | 35:25:57 |
| Pinkpop | Moon | Intensity | M -0.237 | -33:45:11 |
| Sensation | Moon | Readability | M 0.336 | -35:25:57 |
| Pinkpop | Moon | Readability | M -0.289 | 33:45:11 |

We furthermore found two weak negative correlations between the wind and the sentiment (Dance Valley dataset, ρP = −0.308) and between the wind and the intensity (Sensation dataset, ρP = 0.337). These numbers would imply that when the wind is stronger, people that tweet about Dance Valley are less positive and people that tweet about Sensation publish fewer tweets per time unit.

Holiday and content related variables

We found several weak average correlations between the holidays and the content related variables. The intensity shows a weak correlation with the school holidays (ρP = 0.245 ± 0.231), which is strongest in the Sensation dataset (ρP = 0.524). The intensity also shows a weak positive correlation with the weekends (ρP = 0.395), which is strongest in the Pinkpop dataset (ρP = 0.547). We furthermore found an average weak positive correlation between the readability and the school holidays (ρP = 0.229 ± 0.196) and between the readability and the weekends (ρP = 0.218 ± 0.076). The correlation with the school holidays is stronger in the Lowlands dataset (ρP = 0.371). When we looked at the results for the individual datasets we found that the readability is also correlated with the construction industry holidays in the Dance Valley dataset (ρP = 0.502).

Festival and content related variables

When we compared the festival activities at the five festivals with the content related variables, we found that the festival activities show weak to moderate correlations with the intensity. We found this for the Pukkelpop (ρP = 0.32), Lowlands (ρP = 0.584), Pinkpop (ρP = 0.669) and Dance Valley (ρP = 0.442) activities in the corresponding datasets. We also found correlations for Pukkelpop activities in the Lowlands dataset (ρP = 0.237) and vice versa (ρP = 0.542). This is probably because the programs for Pukkelpop and Lowlands partly overlap. We furthermore found a weak correlation between the originality in the Dance Valley dataset and the Pinkpop activities (ρP = 0.336), and between the originality in the Sensation dataset and the Lowlands (ρP = 0.555) and Pukkelpop (ρP = 0.504) activities. This means that the number of retweets about Dance Valley and Sensation respectively increases during these other festivals. Possibly, people retweet messages that mention both music events.

Context and content related variables

We found weak average correlations between the occurrence of festivals and the intensity (ρP = 0.235) and the readability (ρP = 0.192) of the tweets. When we looked at the results for the five datasets, we found that the correlation between the readability and festival activities is moderate in the Sensation dataset (ρP = 0.457), and weak in the Pukkelpop (ρP = 0.233) and Lowlands (ρP = 0.241) datasets. We found that the correlation in the Dance Valley dataset has an opposite sign (ρP = −0.249). When we looked at the correlation results for the datasets for the festival activities and the intensity, we found that this correlation is strong (ρP = 0.659) in the Sensation dataset, weak (ρP = 0.348) in the Pinkpop dataset and very weak (ρP = 0.252) in the Lowlands dataset. This would imply that people that tweet about the five festivals tweet a little more during other festivals and that these tweets are a little shorter and simpler than usual. We also found a moderate positive correlation with the sentiment in the Sensation dataset (ρP = 0.379), meaning that the tweets about Sensation are more positive on days with festivals. We furthermore found that the occurrences of news events show a weak correlation with the readability in the Dance Valley (ρP = −0.315) and Sensation (ρP = −0.228) datasets. This means that the tweets about both dance festivals on days with news events are a little longer than usual.

7.4 Discussion

In the correlation results for the combinations of online variables and for the combinations of real world variables, we found high correlation values for combinations of variables for which high correlations could be expected. For example, when the temperature is higher the sun intensity is higher. These findings suggest that the method used for the comparison is most likely valid. When we compared the online with the real world variables, we found mostly weak correlations between these variables. The p-values are very low, so the chance that these findings are random is small.

We compared all average correlations for the five datasets. For each real world variable we calculated the average correlation with all online variables with a p-value below 0.01, and vice versa. These numbers give an indication of how well each online variable correlates with the real world variables. These results can be found in Tables 7.8 and 7.9, under ‘overall average’.

The differences in the occurrence of significant correlations are large between the datasets. The Sensation dataset has the largest number of significant correlations, followed by those of Dance Valley, Pukkelpop, Pinkpop and Lowlands. The number of correlations in the Sensation dataset and in the Lowlands dataset differs by a factor of 6. The significant average correlations that were found in the different datasets are comparable in strength, ranging from 0.201 to 0.253 for the real world variables and from 0.199 to 0.263 for the online variables.

We suspect that diversity within the dataset causes lower correlations. The number of significant correlations we found corresponds with the order of the datasets in size. Possibly the size of the dataset causes a larger diversity of behaviours within the dataset, causing fewer significant correlations.
We furthermore suspect that, apart from the size of the data, the selection of the data possibly also causes the data to be very diverse in nature: the data contains tweets from before, during and after the events. It is possible that, for example, a strong sun intensity increases the Twitter intensity during a festival but causes a decrease in intensity before and after the festival.

We found correlations which we could explain, such as the correlations between intensity and festival activities. It makes sense that the number of tweets about a festival is higher in the period in which the festival is held. We also found correlations which are very hard to interpret, such as the correlation between the number of followers and friends during a festival, or the number of unidirectional relations during an occurrence of a festival. We suspect that in some cases we might have found spurious associations. A clear example is the correlation we found between sun and festival occurrences. The confounding factor that we did not include in our comparison is the summer: festivals are held in the summer, and the sun intensity is higher during the summer. Another example is the relation between the sun intensity and the negative sentiment. This might be because both the sun intensity and the negative sentiment are higher at certain times of the day. During the day, the sun intensity is higher and people have to work, which might result in a more negative sentiment.

Of the real world variables, the school holidays, the weekends and the festivals have the strongest influence on all online variables. Several real world variables are reflected in the online variables of the different datasets with significant correlations. Sorted in decreasing order of the average number of correlations, these are: the weekends, school holidays, position of the moon, festivals, maximum temperature and temperature. For the online variables these correlations are less strong and significant correlations occur less often. The content variables have the highest correlations and occur most often in significant correlations. The online variables in decreasing order of the number of significant correlations are: intensity, readability, sentiment, newsworthiness, emotionality, inclusiveness and social equality.
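The AC, ASC and # columns of Tables 7.8 and 7.9 summarise, per variable, its correlations with all variables of the other type. A minimal sketch of such a summary is given below. It assumes that the averages are taken over absolute correlation coefficients and that ASC is reported as 0 when no correlation is significant; both are assumptions, as are the function name and the example values, which are not taken from our data.

```python
def summarise_correlations(correlations, alpha=0.01):
    """Return (AC, ASC, #) for a list of (rho, p_value) pairs.

    AC  = average absolute correlation over all pairs (assumed definition)
    ASC = average absolute correlation over pairs with p_value < alpha
    #   = number of pairs with p_value < alpha
    """
    all_abs = [abs(rho) for rho, _ in correlations]
    sig_abs = [abs(rho) for rho, p in correlations if p < alpha]
    ac = sum(all_abs) / len(all_abs) if all_abs else 0.0
    asc = sum(sig_abs) / len(sig_abs) if sig_abs else 0.0
    return ac, asc, len(sig_abs)


# Hypothetical correlations of one real world variable with three online variables.
example = [(0.30, 0.001), (-0.20, 0.005), (0.05, 0.40)]
ac, asc, count = summarise_correlations(example)
print(round(ac, 3), round(asc, 3), count)
```

Because the non-significant pair is included in AC but not in ASC, AC is lower than ASC here, as it is for the rows of Tables 7.8 and 7.9 that contain significant correlations.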

Table 7.8: Average correlations between real world variables and all online variables for the separate datasets and the datasets combined. The Dataset average is based on the significant correlations for the datasets, while the dataset average is based on all values. AC = Average correlation, ASC = Average significant correlation, # = Number of significant correlations

| Variable | Dance Valley AC | Dance Valley ASC | Dance Valley # | Sensation AC | Sensation ASC | Sensation # | Pukkelpop AC | Pukkelpop ASC | Pukkelpop # | Pinkpop AC | Pinkpop ASC | Pinkpop # | Lowlands AC | Lowlands ASC | Lowlands # | Overall average AC | Overall average ASC | Overall average # | Dataset average AC | Dataset average ASC | Dataset average # |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Public Holidays | 0.025 | 0 | 0 | 0.09 | M 0.174 | 1 | 0.066 | 0 | 0 | 0.069 | M 0.155 | 2 | 0.074 | M 0.227 | 2 | 0.014 | 0 | 0 | 0.054 | 0.093 | 0.8 |
| School Holidays | M 0.229 | M 0.302 | 12 | 0.133 | M 0.204 | 8 | 0.136 | M 0.22 | 7 | 0.079 | M 0.215 | 2 | 0.112 | M 0.272 | 4 | 0.083 | M 0.232 | 4 | 0.115 | M 0.202 | 5.5 |
| Const. Ind. holidays | M 0.233 | M 0.303 | 11 | 0.149 | M 0.223 | 7 | 0.077 | M 0.236 | 2 | 0.073 | M 0.173 | 3 | 0.073 | M 0.175 | 2 | 0.049 | 0.129 | 1 | 0.101 | M 0.185 | 4.2 |
| weekend | M 0.263 | M 0.325 | 12 | M 0.211 | M 0.262 | 11 | 0.116 | M 0.183 | 6 | 0.132 | M 0.28 | 6 | 0.108 | M 0.201 | 4 | 0.116 | M 0.185 | 4 | 0.138 | M 0.209 | 6.5 |
| Rain | 0.13 | M 0.208 | 5 | 0.132 | M 0.201 | 8 | 0.062 | 0 | 0 | 0.064 | M 0.227 | 1 | 0.065 | M 0.191 | 1 | 0.026 | 0 | 0 | 0.075 | 0.138 | 2.5 |
| Sun | 0.148 | M 0.255 | 7 | 0.109 | M 0.214 | 4 | 0.073 | 0 | 0 | 0.078 | M 0.162 | 2 | 0.078 | M 0.212 | 1 | 0.054 | M 0.212 | 1 | 0.081 | 0.14 | 2.3 |
| Temperature | 0.124 | M 0.219 | 5 | M 0.155 | M 0.222 | 8 | 0.11 | M 0.22 | 4 | 0.077 | M 0.207 | 3 | 0.088 | M 0.17 | 5 | 0.067 | 0.124 | 2 | 0.092 | M 0.173 | 4.2 |
| Max Temp | M 0.164 | M 0.239 | 10 | M 0.161 | M 0.22 | 9 | 0.113 | M 0.199 | 6 | 0.082 | M 0.193 | 3 | 0.053 | M 0.155 | 1 | 0.057 | 0.086 | 1 | 0.096 | M 0.168 | 4.8 |
| Fog | 0.026 | M 0.188 | 1 | 0.047 | M 0.212 | 2 | 0.022 | 0 | 0 | 0.038 | M 0.17 | 1 | 0.055 | 0 | 0 | 0.01 | 0 | 0 | 0.031 | 0.095 | 0.7 |
| Thunderstorm | 0.036 | 0 | 0 | 0.047 | 0 | 0 | 0.061 | 0 | 0 | 0.041 | 0 | 0 | 0.049 | 0 | 0 | 0.018 | 0 | 0 | 0.039 | 0.0 | 0.0 |
| Wind | 0.092 | M 0.218 | 3 | 0.117 | M 0.236 | 5 | 0.08 | M 0.178 | 3 | 0.062 | 0 | 0 | 0.067 | M 0.16 | 1 | 0.027 | 0 | 0 | 0.07 | 0.132 | 2.0 |
| Moon | M 0.191 | M 0.241 | 11 | M 0.204 | M 0.278 | 12 | 0.078 | M 0.165 | 1 | 0.122 | M 0.236 | 7 | 0.064 | M 0.156 | 1 | 0.031 | 0 | 0 | 0.11 | M 0.179 | 5.3 |
| Dance Valley | M 0.163 | M 0.289 | 7 | 0.044 | 0 | 0 | 0.068 | M 0.169 | 2 | 0.037 | 0 | 0 | 0.042 | 0 | 0 | 0.044 | 0.0 | 1 | 0.059 | 0.076 | 1.5 |
| Sensation | 0.0 | 0 | 0 | 0.105 | M 0.197 | 3 | 0.011 | 0 | 0 | 0.017 | 0 | 0 | 0.018 | 0 | 0 | 0.018 | 0 | 0 | 0.025 | 0.033 | 0.5 |
| Pukkelpop | 0.056 | 0 | 0 | M 0.156 | M 0.264 | 7 | M 0.169 | M 0.244 | 10 | 0.062 | M 0.281 | 1 | 0.127 | M 0.28 | 5 | 0.039 | 0 | 0 | 0.095 | M 0.178 | 3.8 |
| Pinkpop | 0.056 | M 0.336 | 1 | 0.071 | 0 | 0 | 0.06 | 0 | 0 | 0.146 | M 0.33 | 5 | 0.055 | 0 | 0 | 0.025 | 0 | 0 | 0.065 | 0.111 | 1.0 |
| Lowlands | 0.025 | 0 | 0 | 0.149 | M 0.262 | 7 | 0.135 | M 0.205 | 7 | 0.066 | M 0.278 | 1 | 0.139 | M 0.298 | 5 | 0.017 | 0 | 0 | 0.086 | M 0.174 | 3.3 |
| News | 0.135 | M 0.219 | 7 | 0.131 | M 0.185 | 7 | 0.073 | 0 | 0 | 0.064 | 0 | 0 | 0.079 | M 0.172 | 2 | 0.05 | 0 | 0 | 0.08 | 0.096 | 2.7 |
| Sport | 0.078 | M 0.231 | 3 | 0.135 | M 0.299 | 5 | 0.071 | 0 | 0 | 0.061 | M 0.19 | 1 | 0.081 | M 0.159 | 1 | 0.053 | 0 | 0 | 0.071 | 0.147 | 1.7 |
| Festivals | 0.127 | M 0.226 | 4 | M 0.276 | M 0.312 | 14 | 0.089 | M 0.187 | 5 | 0.119 | M 0.26 | 4 | 0.092 | M 0.219 | 3 | 0.106 | 0.117 | 5 | 0.117 | M 0.201 | 5.0 |
| Average | 0.115 | M 0.253 | 5.0 | 0.131 | M 0.233 | 5.9 | 0.083 | M 0.201 | 2.6 | 0.074 | M 0.224 | 2.1 | 0.076 | M 0.203 | 1.9 | 0.045 | 0.136 | 0.9 | 0.08 | 0.136 | 2.9 |

Table 7.9: Average correlations between online variables and all real world variables for the separate datasets and the datasets combined. The Dataset average is based on the significant correlations for the datasets, while the dataset average is based on all values. AC = Average correlation, ASC = Average significant correlation, # = Number of significant correlations

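| Variable | Dance Valley AC | Dance Valley ASC | Dance Valley # | Sensation AC | Sensation ASC | Sensation # | Pukkelpop AC | Pukkelpop ASC | Pukkelpop # | Pinkpop AC | Pinkpop ASC | Pinkpop # | Lowlands AC | Lowlands ASC | Lowlands # | Overall average AC | Overall average ASC | Overall average # | Dataset average AC | Dataset average ASC | Dataset average # |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dist Event - Res | M 0.162 | M 0.295 | 8 | M 0.155 | M 0.233 | 11 | 0.065 | M 0.16 | 1 | 0.056 | 0 | 0 | 0.058 | M 0.261 | 1 | 0.052 | 0.044 | 1 | 0.083 | M 0.158 | 3.5 |
| Dist. Event - User | 0.079 | M 0.261 | 3 | 0.084 | M 0.186 | 4 | 0.084 | M 0.174 | 3 | 0.054 | 0 | 0 | 0.065 | 0 | 0 | 0.023 | 0 | 0 | 0.061 | 0.104 | 1.7 |
| Distance Users | 0.132 | M 0.267 | 6 | 0.122 | M 0.248 | 7 | 0.093 | M 0.161 | 2 | 0.075 | M 0.243 | 3 | 0.062 | 0 | 0 | 0.042 | 0 | 0 | 0.081 | M 0.153 | 3.0 |
| Degree centr. | 0.112 | M 0.302 | 4 | M 0.159 | M 0.245 | 9 | 0.086 | M 0.184 | 2 | 0.04 | 0 | 0 | 0.091 | M 0.184 | 2 | 0.046 | M 0.168 | 1 | 0.081 | M 0.152 | 2.8 |
| Betweenness centr. | 0.113 | M 0.209 | 8 | 0.091 | M 0.183 | 2 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.053 | 0.0 | 2 | 0.034 | 0.065 | 1.7 |
| Closeness centr. | 0.104 | M 0.223 | 4 | 0.087 | M 0.213 | 3 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0 | 0 | 0.054 | 0.0 | 1 | 0.032 | 0.073 | 1.2 |
| Inclusiveness | 0.102 | M 0.354 | 4 | 0.119 | M 0.196 | 8 | 0.087 | M 0.174 | 3 | 0.139 | M 0.296 | 6 | 0.102 | M 0.238 | 3 | 0.04 | 0 | 0 | 0.092 | M 0.21 | 4.0 |
| Social Equality | 0.067 | M 0.231 | 1 | M 0.184 | M 0.284 | 10 | 0.123 | M 0.212 | 7 | 0.083 | M 0.176 | 3 | 0.105 | M 0.189 | 4 | 0.045 | 0.11 | 2 | 0.094 | M 0.182 | 4.2 |
| Popularity | 0.081 | M 0.232 | 4 | M 0.154 | M 0.257 | 9 | 0.066 | M 0.186 | 1 | 0.049 | 0 | 0 | 0.063 | M 0.194 | 1 | 0.034 | 0 | 0 | 0.069 | 0.145 | 2.5 |
| Experience | 0.086 | M 0.237 | 5 | 0.099 | M 0.187 | 4 | 0.075 | M 0.16 | 2 | 0.066 | M 0.177 | 2 | 0.059 | 0 | 0 | 0.022 | 0 | 0 | 0.064 | 0.127 | 2.2 |
| People of int. | 0.087 | M 0.25 | 4 | 0.102 | M 0.224 | 4 | 0.081 | M 0.205 | 2 | 0.064 | M 0.189 | 2 | 0.099 | M 0.191 | 5 | 0.031 | 0 | 0 | 0.072 | M 0.176 | 2.8 |
| Originality | 0.119 | M 0.242 | 8 | 0.124 | M 0.357 | 4 | 0.097 | M 0.193 | 4 | 0.08 | M 0.18 | 2 | 0.072 | M 0.163 | 3 | 0.015 | 0 | 0 | 0.082 | M 0.189 | 3.5 |
| Emotionality | 0.128 | M 0.219 | 9 | 0.115 | M 0.184 | 7 | 0.106 | M 0.189 | 6 | 0.105 | M 0.228 | 5 | 0.068 | 0 | 0 | 0.044 | 0.12 | 1 | 0.087 | 0.137 | 4.5 |
| Sentiment | 0.132 | M 0.274 | 7 | M 0.154 | M 0.238 | 10 | 0.098 | M 0.211 | 4 | 0.082 | M 0.176 | 3 | 0.07 | M 0.159 | 1 | 0.051 | 0.139 | 1 | 0.089 | M 0.176 | 4.2 |
| Newsworthiness | 0.112 | M 0.234 | 6 | 0.11 | M 0.177 | 7 | 0.12 | M 0.233 | 6 | 0.105 | M 0.188 | 5 | 0.084 | M 0.187 | 2 | 0.05 | M 0.212 | 1 | 0.089 | M 0.17 | 4.3 |
| Readability | M 0.171 | M 0.338 | 8 | M 0.166 | M 0.26 | 10 | 0.135 | M 0.241 | 7 | 0.1 | M 0.218 | 3 | 0.116 | M 0.206 | 7 | 0.079 | M 0.191 | 5 | 0.115 | M 0.21 | 5.8 |
| Intensity | M 0.169 | M 0.296 | 10 | M 0.203 | N 0.366 | 9 | 0.103 | M 0.298 | 3 | M 0.167 | M 0.321 | 8 | M 0.177 | M 0.307 | 9 | 0.087 | M 0.268 | 4 | 0.137 | M 0.265 | 6.5 |
| Average | 0.115 | M 0.263 | 5.8 | 0.131 | M 0.238 | 6.9 | 0.083 | M 0.199 | 3.1 | 0.074 | M 0.217 | 2.5 | 0.076 | M 0.207 | 2.2 | 0.045 | 0.125 | 1.1 | 0.08 | M 0.158 | 3.4 |

Chapter 8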

Conclusion

In this research we looked into the relation between real world social events and the online man- ifestation of these events on Twitter. The aim of this research was to find whether developments of social events in the real world relate to messages about these events on Twitter. This work was intended as a first exploratory step into using the relations between real world and online developments of social events to predict the future development of events, and to allow the anticipation of unwanted situations during these events. To investigate the relation between the real world and online developments of social events, we chose five music events in The Netherlands and Belgium in 2013 and compared information about the real world development of these events and the time before and after the event with their online manifestations. We compared four categories of real world variables (festival activities, weather, holidays, context events) with three categories of online variables (place, people, content of tweet). We found weak to moderate correlations between some of the combinations of these different variables for the different datasets. The differences between the datasets were large. In the smaller datasets of the dance festivals Dance Valley and Sensation we found up to six times more significant correlations than in the larger datasets of the larger pop festivals Pukkelpop, Pinkpop and Lowlands, which might be caused by diversity in user behaviours. Within the individual datasets we found some moderate and many weak correlations. The on- line variables in decreasing order of the number of significant correlations with the real world are: intensity, readability, sentiment, newsworthiness, emotionality, inclusiveness and social equality. 
The real world variables that were most strongly reflected online, in decreasing order of the average number of correlations with online variables, are: the weekends, school holidays, position of the moon, festivals, maximum temperature and temperature.

The significant correlations we found have weak correlation coefficients. The significant average correlations range from 0.201 to 0.253 for the real world variables and from 0.199 to 0.263 for the online variables, all with very low p-values (p < 0.01). Given the noisy nature of the data, the differences between the datasets and the many factors that influence the behaviour of people, we find our results promising. We can identify online variables that appear to depend in some way on particular real world variables within the context of a social event.

The goal of our research was to identify online variables that can be used to identify real world developments. Based on our results we advise monitoring the intensity, readability, sentiment, newsworthiness and emotionality of tweets, because we found that in a normal situation these variables reflect developments that take place in the real world. We do not know how these variables behave during events with incidents, but this could be a subject of future research.
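To make the kind of comparison summarised above concrete, the following minimal Python sketch computes a Pearson and a Spearman coefficient for one real world series and one online series. The data values, the function names and the permutation test used to estimate a p-value are assumptions made for illustration only. They are not the code used in this research and they do not reproduce its results.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # The coefficient is undefined for a constant series;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, corr=pearson, trials=2000, seed=0):
    """Two-sided p-value estimated by shuffling y (assumed test, see lead-in)."""
    rng = random.Random(seed)
    observed = abs(corr(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(corr(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Hypothetical values: maximum temperature and tweet intensity per time slot.
max_temperature = [18, 21, 24, 27, 25, 20, 17, 22]
tweets_per_slot = [120, 150, 210, 260, 230, 140, 110, 160]

print("Pearson: ", pearson(max_temperature, tweets_per_slot))
print("Spearman:", spearman(max_temperature, tweets_per_slot))
print("p-value: ", permutation_p_value(max_temperature, tweets_per_slot))
```

The permutation test is used here only because it needs no external library. A test based on the t-distribution would serve the same purpose.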

8.1 Future work

With our research we intended to find variables that show correlation between online and real world developments, to allow the prediction of the future development and possible escalation of social events. In our research we took a first step towards this goal by studying the relation between the real world and the online world in a normal setting without incidents. Our results are promising, but we are still far from able to predict the development of social events. We list suggestions for future research:

• We selected our data by searching for the name of the event in the tweet and in the hashtag of the tweet. We do not know how many tweets about the event we missed, nor do we know how many tweets in the data are not about the event. Different selection methods might result in a more complete or less noisy dataset. Investigating the quality of our datasets and, if needed based on the findings, looking into methods for better extraction of information from Twitter could be a next step.

• Not all tweets about the event might be useful for our research. Especially when something happens at an event, it might be useful to know which people are present at the event and which people follow the event from a remote location. It would be useful to find a method to separate these two groups of Twitter users.

• In larger datasets the correlation results are lower than in smaller datasets. It would be useful to understand why this is the case. Manual inspection of parts of the data might be useful, and it would also be good to get a better understanding of the groups of people that are involved in the five events we study. Possibly the heterogeneity of the larger groups of people that are involved in large music events makes it harder to find correlations between online and real world variables.

• Some of the online variables give promising results, such as the readability or the newsworthiness. A next step would be to further investigate the relationship between these variables and developments in the real world in a different context, to find out whether there really is a dependency between these variables and certain real world variables and to get a better understanding of these variables. Assuming that variables such as the readability or the newsworthiness really depend on certain real world variables, it might be useful to find out how these variables can be defined in such a way that the relation between the online world and the real world can be measured more precisely.

• In this research, we calculated the correlation between online and real world variables for a long period of time that included the data about the event. It would be interesting to also study the correlations for the periods before, during and after the event separately, to get a better understanding of how the online and real world variables connect.

• In this research, only data about music events is used to investigate the relation between the real world and the online world. It would be useful to also investigate other types of social events and to investigate past events that escalated.

• In this research we have looked at the correlations between online and real world variables. The next step would be to investigate how these results can be used to predict the future development of events based on the developments online.
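The first suggestion above concerns the way tweets were selected: by the name of the event in the tweet text or in a hashtag. A minimal Python sketch of such a filter is given below. The exact matching rules used in this research are not specified here, so the function name, the normalisation and the example tweets are all assumptions for illustration.

```python
import re


def mentions_event(text, event_name):
    """Return True if the event name occurs in the text or inside a hashtag.

    Assumed rules: matching is case-insensitive, the name must appear as a
    whole phrase in the text, and hashtags are compared without spaces.
    """
    lowered = text.lower()
    name = event_name.lower()
    compact = name.replace(" ", "")  # "dance valley" -> "dancevalley"

    hashtags = re.findall(r"#(\w+)", lowered)
    in_hashtag = any(compact in tag for tag in hashtags)
    in_text = re.search(r"\b" + re.escape(name) + r"\b", lowered) is not None
    return in_hashtag or in_text


# Hypothetical tweets, not taken from the datasets.
tweets = [
    "Op weg naar #DanceValley!",         # matched through the hashtag
    "Dance Valley was geweldig",         # matched through the text
    "Zin in morgen #dv2013",             # about the event, but missed
    "Mooie wandeling door de valley",    # unrelated, correctly skipped
]
selected = [t for t in tweets if mentions_event(t, "Dance Valley")]
print(selected)
```

The third example shows how a name-based filter can miss tweets that use an abbreviation, which is one source of the incompleteness mentioned above.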

84 Bibliography

[1] Charu C. Aggarwal and Karthik Subbian. Event detection in social streams. In Proceedings of the Twelfth SIAM International Conference on Data Mining, Anaheim, , USA, April 26-28, 2012, pages 624–635. SIAM / Omnipress, 2012.

[2] Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, and Gilad Mishne. Finding high-quality content in social media. In Proceedings of the 2008 International Con- ference on Web Search and Data Mining, WSDM ’08, pages 183–194, New York, NY, USA, 2008. ACM.

[3] James Allan, Jaime Carbonell, George Doddington, Jonathan Yamron, and Yiming Yang. Topic detection and tracking pilot study final report. In In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pages 194–218, 1998.

[4] Sitaram Asur and Bernardo A Huberman. Predicting the future with social media. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM Interna- tional Conference on, volume 1, pages 492–499. IEEE, 2010.

[5] Farzindar Atefeh and Wael Khreich. A survey of techniques for event detection in twitter. Computational Intelligence, pages n/a–n/a, 2013.

[6] Timothy Baldwin, Paul Cook, Bo Han, Aaron Harwood, Shanika Karunasekera, and Masud Moshtaghi. A support platform for event detection using social intelligence. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 69–72. Association for Computational Linguistics, 2012.

[7] Hila Becker, Mor Naaman, and Luis Gravano. Event identification in social media. In Proceedings of the ACM SIGMOD Workshop on the Web and Databases, 2009.

[8] Hila Becker, Mor Naaman, and Luis Gravano. Beyond trending topics: Real-world event identification on twitter. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 2011.

[9] James Benhardus and Jugal Kalita. Streaming trend detection in twitter. Int. J. Web Based Communities, 9(1):122–139, January 2013.

[10] Johan Bollen, Huina Mao, and Xiaojun Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1–8, 2011.

[11] Johan Bollen, Alberto Pepe, and Huina Mao. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 2011.

85 [12] S.P. Borgatti and D. Halgin. Analyzing affiliation networks. In P. Carrington and J. Scott, editors, The Sage Handbook of Social Network Analysis. Sage Publications, Oxford, In press.

[13] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. Information credibility on twitter. In Proceedings of the 20th international conference on World wide web, pages 675–684. ACM, 2011.

[14] Sara Hawker Catherine Soanes, Angus Stevenson, editor. Concise Oxford English Dictionary 11e Revised on CD-ROM. Oxford University Press, 2006.

[15] Deepayan Chakrabarti and Kunal Punera. Event summarization using tweets. In ICWSM, 2011.

[16] Marek Ciglan and Kjetil Nørvåg. Wikipop: Personalized event detection system based on wikipedia page view statistics. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10, pages 1931–1932, New York, NY, USA, 2010. ACM.

[17] Dipanjan Das and André FT Martins. A survey on automatic text summarization. Literature Survey for the Language and Statistics II course at CMU, 4:192–195, 2007.

[18] Tom De Smedt and Walter Daelemans. "vreselijk mooi!" (terribly beautiful): A subjectivity lexicon for dutch adjectives. In Proceedings of the 8th Language Resources and Evaluation Conference (LREC’12), page 3568â“3572, 2012.

[19] Daantje Derks, Arjan E. R. Bos, and Jasper von Grumbkow. Emoticons in computer- mediated communication: Social motives and social context. CyberPsychology & Behavior, 11:99–101, February 2008.

[20] Daantje Derks, Arjan E.R. Bos, and Jasper von Grumbkow. Emoticons and social interaction on the internet: the importance of social context. Computers in Human Behavior, 23(1):842 – 849, 2007.

[21] D. DiNucci. Fragmented future. Print Magazine, 53(4)(32), 1999.

[22] W. Dou, X. Wang, W. Ribarsky, and M. Zhou. Event detection in social media data. In IEEE VisWeek Workshop on Interactive Visual Text Analytics - Task Driven Analytics of Social Media Content, 2012.

[23] Wenwen Dou, Xiaoyu Wang, D. Skau, W. Ribarsky, and M.X. Zhou. Leadline: Interactive visual analysis of text data through event identification and exploration. In Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on, pages 93–102, 2012.

[24] Eric Eaton. Network centrality, 2013. Lecture slides".

[25] L. C. Freeman. Centrality in social networks: Conceptual clarification. Social Networks, 1:215â“239, 1979.

[26] Alec Go, Richa Bhayani, and Lei Huang. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, pages 1–12, 2009.

[27] Jennifer Golbeck and Derek Hansen. Computing political preference among twitter followers. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, pages 1105–1108, New York, NY, USA, 2011. ACM.

86 [28] F.E. Grubbs. Procedures for detecting outlying observations in samples. Technometrics, 11:1–1, 1969. [29] Jan Hauke and Tomasz Kossowski. Comparison of values of pearson’s and spearman’s correlation coefficients on the same sets of data. Quaestiones Geographicae, 30:87â“93, 2011. [30] D.M. Hawkins. Identification of Outliers. Chapman and Hall, , 1980. [31] A Hürriyetoˇglu,F Kunneman, and A van den Bosch. Estimating the time between twit- ter messages and future events. In Proceedings of the 13th Dutch-Belgian Workshop on Information Retrieval, 2013. [32] Andreas M. Kaplan and Michael Haenlein. Users of the world, unite! the challenges and opportunities of social media. Business Horizons, 53(1):59 – 68, 2010. [33] M. C. Kerman, W Jiang, A. F. Blumberg, and S. E. Butterey. Event detection challenges, methods, and applications in natural and artificial systems. In Proceedings of 14th Inter- national Command and Control Research and Technology Symposium: “C2 and Agility”, 2009. [34] Efthymios Kouloumpis, Theresa Wilson, and Johanna Moore. Twitter sentiment analysis: The good the bad and the omg! ICWSM, 11:538–541, 2011. [35] FA Kunneman and A van den Bosch. Leveraging unscheduled event prediction through mining scheduled event tweets. In Proceedings of the 24th Belgium-Netherlands Artificial Intelligence Conference (BNAIC), 2012. [36] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, pages 591–600. ACM, 2010. [37] Jey Han Lau, Nigel Collier, and Timothy Baldwin. On-line trend analysis with topic models:\# twitter trends detection topic model online. In COLING, pages 1519–1534, 2012. [38] Ryong Lee and Kazutoshi Sumiya. Measuring geographical regularities of crowd behaviors for twitter-based geo-social event detection. 
In Proceedings of the 2Nd ACM SIGSPATIAL International Workshop on Location Based Social Networks, LBSN ’10, pages 1–10, New York, NY, USA, 2010. ACM. [39] Rui Li, Kin Hou Lei, R. Khadiwala, and K.C.-C. Chang. Tedas: A twitter-based event detection and analysis system. In Data Engineering (ICDE), 2012 IEEE 28th International Conference on, pages 1273–1276, 2012. [40] David J. Low. Statistical physics: Following the crowd. Nature, 407(6803):465–466, 2000. [41] Alan M MacEachren, Anthony C Robinson, Anuj Jaiswal, Scott Pezanowski, Alexander Savelyev, Justine Blanford, and Prasenjit Mitra. Geo-twitter analytics: Applications in crisis management. In Proceedings, 25th International Cartographic Conference, Paris, , 2011. [42] Michael Mathioudakis and Nick Koudas. Twittermonitor: Trend detection over the twitter stream. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pages 1155–1158, New York, NY, USA, 2010. ACM.

87 [43] Marcelo Mendoza, Barbara Poblete, and Carlos Castillo. Twitter under crisis: can we trust what we rt? In Proceedings of the first workshop on social media analytics, pages 71–79. ACM, 2010.

[44] Nasir Naveed, Thomas Gottron, Jérôme Kunegis, and Arifah Che Alhadi. Bad news travel fast: A content-based analysis of interestingness on twitter. In Proceedings of the 3rd In- ternational Web Science Conference, WebSci ’11, pages 8:1–8:7, New York, NY, USA, 2011. ACM.

[45] Jeffrey Nichols, Jalal Mahmud, and Clemens Drews. Summarizing sporting events using twitter. In Proceedings of the 2012 ACM International Conference on Intelligent User In- terfaces, IUI ’12, pages 189–198, New York, NY, USA, 2012. ACM.

[46] A. Nurwidyantoro and E. Winarko. Event detection in social media: A survey. In ICT for Smart Society (ICISS), 2013 International Conference on, pages 1–5, 2013.

[47] Brendan O’Connor, Ramnath Balasubramanyan, Bryan R Routledge, and Noah A Smith. From tweets to polls: Linking text sentiment to public opinion time series. ICWSM, 11:122– 129, 2010.

[48] Brendan O’Connor, Michel Krieger, and David Ahn. Tweetmotif: Exploratory search and topic summarization for twitter. In ICWSM, 2010.

[49] Nelleke Oostdijk and Hans Halteren. N-gram-based recognition of threatening tweets. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, vol- ume 7817 of Lecture Notes in Computer Science, pages 183–196. Springer Heidelberg, 2013.

[50] Nelleke Oostdijk and Hans van Halteren. Shallow parsing for recognizing threats in dutch tweets. In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’13, pages 1034–1041, New York, NY, USA, 2013. ACM.

[51] Miles Osborne, Sasa Petrovic, Richard McCreadie, Craig Macdonald, and Iadh Ounis. Bieber no more: First story detection using twitter and wikipedia. In Proceedings of the SIGIR Workshop on Time-aware Information Access, 2012.

[52] Alexander Pak and Patrick Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. In LREC, 2010.

[53] Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Found. Trends Inf. Retr., 2(1-2):1–135, January 2008.

[54] Georgios Petkos, Symeon Papadopoulos, and Yiannis Kompatsiaris. Social event detection using multimodal clustering and integrating supervisory signals. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, page 23. ACM, 2012.

[55] Sasa Petrovic, Miles Osborne, and Victor Lavrenko. Streaming first story detection with application to twitter. In HLT-NAACL’10, pages 181–189, 2010.

[56] Ana-Maria Popescu, Marco Pennacchiotti, and Deepa Paranjpe. Extracting events and event descriptions from twitter. In Proceedings of the 20th international conference companion on World wide web, pages 105–106. ACM, 2011.

88 [57] Vahed Qazvinian, Emily Rosengren, Dragomir R. Radev, and Qiaozhu Mei. Rumor has it: Identifying misinformation in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 1589–1599, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.

[58] Mohamad Rabbath, Philipp Sandhaus, and Susanne Boll. Analysing facebook features to support event detection for photo-based facebook applications. In Proceedings of the 2Nd ACM International Conference on Multimedia Retrieval, ICMR ’12, pages 11:1–11:8, New York, NY, USA, 2012. ACM.

[59] Aldo Raineri and Cameron Earl. Crowd management for outdoor music festivals. Journal of Occupational Health and Safety, 3(21):205–215, 2005.

[60] Jacob Ratkiewicz, Michael Conover, Mark Meiss, Bruno Gonçalves, Alessandro Flammini, and Filippo Menczer. Detecting and tracking political abuse in social media. In ICWSM, 2011.

[61] Jacob Ratkiewicz, Michael Conover, Mark Meiss, Bruno Gonçalves, Snehal Patil, Alessandro Flammini, and Filippo Menczer. Truthy: mapping the spread of astroturf in microblog streams. In Proceedings of the 20th international conference companion on World wide web, pages 249–252. ACM, 2011.

[62] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes twitter users: Real-time event detection by social sensors. In Proceedings of the 19th International Con- ference on World Wide Web, WWW ’10, pages 851–860, New York, NY, USA, 2010. ACM.

[63] Hassan Sayyadi, Matthew Hurst, and Alexey Maykov. Event detection and tracking in social streams. In ICWSM, 2009.

[64] John Scott. Social Network Analysis: A Handbook. SAGE publications, 1991.

[65] Beaux Sharifi, Mark-Anthony Hutton, and Jugal Kalita. Summarizing microblogs auto- matically. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 685– 688, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

[66] Tamara A. Small. What the hashtag? Information, Communication & Society, 14(6):872– 895, 2011.

[67] Thomas Steiner, Seth van Hooland, and Ed Summers. Mj no more: Using concurrent wikipedia edit spikes with social network plausibility checks for breaking news detection. In Proceedings of the 22Nd International Conference on World Wide Web Companion, WWW ’13 Companion, pages 791–794, Republic and Canton of Geneva, , 2013. Inter- national World Wide Web Conferences Steering Committee.

[68] Hiroya Takamura, Hikaru Yokono, and Manabu Okumura. Summarizing a document stream. In Advances in Information Retrieval, pages 177–188. Springer, 2011.

[69] Richard Taylor. Interpretation of the correlation coefficient: a basic review. Journal of diagnostic medical sonography, 6(1):35–39, 1990.

[70] Mike Thelwall, Kevan Buckley, and Georgios Paltoglou. Sentiment in twitter events. Journal of the American Society for Information Science and Technology, 62(2):406–418, 2011.

89 [71] Ramine Tinati, Thanassis Tiropanis, and Lesie Carr. An approach for using wikipedia to measure the flow of trends across countries. In Proceedings of the 22Nd International Con- ference on World Wide Web Companion, WWW ’13 Companion, pages 1373–1378, Repub- lic and Canton of Geneva, Switzerland, 2013. International World Wide Web Conferences Steering Committee.

[72] Erik Tjong Kim Sang. Dealing with big data: The case of twitter. Computational Linguistics in the Netherlands Journal, 3:121–134, 2013. [73] Erik Tjong Kim Sang and Johan Bos. Predicting the 2011 dutch senate election results with twitter. In SASN2012 (EACL 2012 Workshop on Semantic Analysis in Social Media), page 8. ACL, 2012.

[74] H Tops, A van den Bosch, and F Kunneman. Predicting time-to-event from twitter messages. In Proceedings of the 25th Belgium-Netherlands Artificial Intelligence Conference (BNAIC), 2013. [75] Andranik Tumasjan, Timm Sprenger, Philipp Sandner, and Isabell Welpe. Predicting elec- tions with twitter: What 140 characters reveal about political sentiment. In International AAAI Conference on Weblogs and Social Media, 2010.

[76] Twitter Inc. Twitter reports fourth quarter and fiscal year 2013 results. https://investor. twitterinc.com/releasedetail.cfm?releaseid=823321, Feb. 5 2014. [77] Zhiyu Wang, Peng Cui, Lexing Xie, Hao Chen, Wenwu Zhu, and Shiqiang Yang. Analyzing social media via event facets. In Proceedings of the 20th ACM International Conference on Multimedia, MM ’12, pages 1359–1360, New York, NY, USA, 2012. ACM. [78] Jianshu Weng and Bu-Sung Lee. Event detection in twitter. In International AAAI Con- ference on Weblogs and Social Media, 2011.

[79] Zhaohua Wu, Norden E. Huang, Steven R. Long, and Chung-Kang Peng. On the trend, detrending, and variability of nonlinear and nonstationary time series. Proceedings of the National Academy of Sciences, 104(38):14889–14894, 2007. [80] Sacha Wunsch-Vincent and Graham Vickery. participative web: user created content. Tech- nical report, Organisation for Economic Co-operation and Development, April 12 2007.

[81] Ming Yan, Zhengyu Deng, Jitao Sang, and Changsheng Xu. User-oriented social analysis across social media sites. In New Trends in Image Analysis and Processing–ICIAP 2013, pages 482–490. Springer, 2013. [82] Yiming Yang, Tom Pierce, and Jaime Carbonell. A study of retrospective and on-line event detection. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’98, pages 28–36, New York, NY, USA, 1998. ACM.

List of Figures

4.1 Event and occurrence of properties over time
4.2 People that are involved in the event
4.3 Different groups of Twitter users. The black points and the red squares are Twitter users. The users that are depicted in the large circle have posted tweets about the events, and form the group of users (U), the users that are outside this circle are friends and relatives of these users, but have not posted tweets about the events themselves. For each sub-figure: the red points are the users that are described in the formula in the caption. The arrows indicate the direction of the relation. An arrow from a to b indicates that a is a follower of b and that b is a friend of a.

5.1 Overlapping users in the Twitter data about Dance Valley, Sensation, Pukkelpop, Lowlands and Pinkpop
6.1 Meteorological stations KNMI with the locations of the events. Schiphol is used for weather information about Sensation and Dance Valley, Maastricht for Pukkelpop and Pinkpop and Lelystad for Lowlands.

7.1 Place and content related variables for Dance Valley (3 Aug 12:00 - 22:00)
7.2 People related variables for Dance Valley (3 Aug 12:00 - 22:00)

List of Tables

4.1 Notation real world model
4.2 Notation online model

5.1 Dataset
5.3 Statistics about the Twitter messages in the five datasets. We describe the occurrences per tweet and the number of tweets in which the property occurs. E.g.: A percentage following the description ‘emoticons (%)’ gives the percentage of tweets in which emoticons occur. The numbers following ‘Emoticons’ are the average number of emoticons for the tweets in which the emoticons occur.
5.4 Users that contribute to four sets of tweets
5.5 Users that contribute to five sets of tweets
5.6 User statistics
5.7 Ratio between number user variables and the number of users
5.8 Network properties for users within the different datasets

6.1 Description of weather event functions
6.2 Time period of the data about the events, and the distance between the events and the weather stations
6.3 Coordinates of the locations of Dance Valley, Sensation, Pukkelpop, Pinkpop and Lowlands
6.4 Examples of sentiment analysis on tweets in the Dance Valley dataset. S is the value that is returned by the sentiment analysis algorithm. A negative value indicates negative sentiment, a positive value positive sentiment
6.5 Examples of tweets and their newsworthiness scores and manually assigned labels (C or N for chatter and news)
6.6 Examples of Flesch-Douma scores of tweets in the Dance Valley data set. W is the total number of words in the tweet, S number of estimated sentences in the tweet.

7.1 Correlations between real world variables
7.2 Average correlations between online variables
7.3 Average cross correlations between online and real world variables
7.4 Holiday and place related variables
7.5 Weather and people related variables
7.6 Festival and people related variables
7.7 Weather and content related variables

7.8 Average correlations between real world variables and all online variables for the separate datasets and the datasets combined. The Dataset average is based on the significant correlations for the datasets, while the dataset average is based on all values. AR = Average correlation, ASC = Average significant correlation, # = Number of significant correlations
7.9 Average correlations between online variables and all real world variables for the separate datasets and the datasets combined. The Dataset average is based on the significant correlations for the datasets, while the dataset average is based on all values. AR = Average correlation, ASC = Average significant correlation, # = Number of significant correlations

A.1 Description of the meteorological data
A.2 Description of events expressed using the KNMI data. Norm() means that the data is normalized to a value within the range [0,1]

B.1 School holiday dates, region Noord, school year 2012-2013
B.2 School holiday dates, region Midden, school year 2012-2013
B.3 School holiday dates, region Zuid, school year 2012-2013
B.4 Bouwvak (construction industry) summer holiday
B.5 National holidays 2013

C.1 News
C.2 Sport events
D.1 Program Dance Valley 2013
D.2 Program Sensation 2013
D.3 Program Pinkpop 2013
D.4 Program Pukkelpop 2013 Thursday 15th August
D.5 Program Pukkelpop 2013 Friday 16th August
D.6 Program Pukkelpop 2013 Saturday 17th August
D.7 Program Lowlands 2013
D.8 Program Lowlands 2013
D.9 Program Lowlands 2013

I.1 Pearson Correlation coefficient weather variables
I.2 Spearman Correlation coefficient weather variables
I.3 Pearson Correlation coefficient holiday variables
I.4 Spearman Correlation coefficient holiday variables
I.5 Pearson Correlation coefficient Festivals
I.6 Spearman Correlation coefficient Festivals
I.7 Pearson Correlation coefficient Context
I.8 Spearman Correlation coefficient Context
I.9 Pearson Correlation coefficient weather and holiday variables
I.10 Spearman Correlation coefficient weather and holiday variables
I.11 Pearson Correlation coefficient weather and festival variables
I.12 Spearman Correlation coefficient weather and festival variables
I.13 Pearson Correlation coefficient weather and context variables
I.14 Spearman Correlation coefficient weather and context variables
I.15 Pearson Correlation coefficient holiday and festival variables
I.16 Spearman Correlation coefficient holiday and festival variables
I.17 Pearson Correlation coefficient holiday and context variables
I.18 Spearman Correlation coefficient holiday and context variables
I.19 Pearson Correlation coefficient festival and context variables
I.20 Spearman Correlation coefficient festival and context variables

J.1 Average cross correlations between online variables Dance Valley
J.2 Average cross correlations between online variables Sensation
J.3 Average cross correlations between online variables Pukkelpop
J.4 Average cross correlations between online variables Pinkpop
J.5 Average cross correlations between online variables Lowlands

K.1 Pearson Average correlations for place related variables
K.2 Spearman Average correlations for place related variables
K.3 Pearson Average correlations for people related variables
K.4 Spearman Average correlations for people related variables
K.5 Pearson Average correlations for content related variables
K.6 Spearman Average correlations for content related variables
K.7 Pearson Average correlations for place and people related variables
K.8 Spearman Average correlations for place and people related variables
K.9 Pearson Average correlations for place and content related variables
K.10 Spearman Average correlations for place and content related variables
K.11 Pearson Average correlations for people and content related variables
K.12 Spearman Average correlations for people and content related variables

L.1 Average cross correlations between online and offline variables Dance Valley
L.2 Average cross correlations between online and offline variables Sensation
L.3 Average cross correlations between online and offline variables Pukkelpop
L.4 Average cross correlations between online and offline variables Pinkpop
L.5 Average cross correlations between online and offline variables Lowlands

M.1 Pearson average correlations for weather and place related variables
M.2 Spearman average correlations for weather and place related variables
M.3 Cross average correlations for weather and place related variables
M.4 Cross average correlations for weather and place related variables
M.5 Cross average correlations for weather and place related variables (continued)
M.6 Pearson average correlations for holiday and place related variables
M.7 Spearman average correlations for holiday and place related variables
M.8 Cross average correlations for holiday and place related variables
M.9 Cross average correlations for holiday and place related variables
M.10 Pearson average correlations for festival and place related variables
M.11 Spearman average correlations for festival and place related variables
M.12 Cross average correlations for festival and place related variables
M.13 Cross average correlations for festival and place related variables
M.14 Cross average correlations for festival and place related variables (continued)
M.15 Pearson average correlations for context and place related variables
M.16 Spearman average correlations for context and place related variables
M.17 Cross average correlations for context and place related variables
M.18 Cross average correlations for context and place related variables
M.19 Pearson average correlations for weather and people related variables
M.20 Spearman average correlations for weather and people related variables
M.21 Cross average correlations for weather and people related variables
M.22 Cross average correlations for weather and people related variables (continued)
M.23 Cross average correlations for weather and people related variables
M.24 Cross average correlations for weather and people related variables (continued)
M.25 Pearson average correlations for holiday and people related variables
M.26 Spearman average correlations for holiday and people related variables
M.27 Cross average correlations for holiday and people related variables
M.28 Cross average correlations for holiday and people related variables (continued)
M.29 Cross average correlations for holiday and people related variables

N.1 Pearson average correlations for festival and people related variables
N.2 Spearman average correlations for festival and people related variables
N.3 Cross average correlations for festival and people related variables
N.4 Cross average correlations for festival and people related variables (continued)
N.5 Cross average correlations for festival and people related variables
N.6 Cross average correlations for festival and people related variables (continued)
N.7 Pearson average correlations for context and people related variables
N.8 Spearman average correlations for context and people related variables
N.9 Cross average correlations for context and people related variables
N.10 Cross average correlations for context and people related variables (continued)
N.11 Cross average correlations for context and people related variables
N.12 Pearson average correlations for weather and content related variables
N.13 Spearman average correlations for weather and content related variables
N.14 Cross average correlations for weather and content related variables
N.15 Cross average correlations for weather and content related variables (continued)
N.16 Cross average correlations for weather and content related variables
N.17 Cross average correlations for weather and content related variables (continued)
N.18 Pearson average correlations for holiday and content related variables
N.19 Spearman average correlations for holiday and content related variables
N.20 Cross average correlations for holiday and content related variables
N.21 Cross average correlations for holiday and content related variables (continued)
N.22 Cross average correlations for holiday and content related variables
N.23 Pearson average correlations for festival and content related variables
N.24 Spearman average correlations for festival and content related variables
N.25 Cross average correlations for festival and content related variables
N.26 Cross average correlations for festival and content related variables (continued)
N.27 Cross average correlations for festival and content related variables
N.28 Cross average correlations for festival and content related variables (continued)
N.29 Pearson average correlations for context and content related variables
N.30 Spearman average correlations for context and content related variables
N.31 Cross average correlations for context and content related variables
N.32 Cross average correlations for context and content related variables (continued)
N.33 Cross average correlations for context and content related variables

O.1 Correlation between online and real world variables for Dance Valley
O.2 Correlation between online and real world variables for Sensation
O.3 Correlation between online and real world variables for Pukkelpop
O.4 Correlation between online and real world variables for Pinkpop
O.5 Correlation between online and real world variables for Lowlands

Appendices


Appendix A

Weather information

In Table A.1, a description of the KNMI data is provided. The translation of the KNMI-data into the functions used in this research can be found in Table A.2.

1 http://www.knmi.nl/klimatologie/daggegevens/selectie.cgi

Table A.1: Description of the meteorological data 1

Tag        Description
YYYYMMDD   Date (YYYY=year, MM=month, DD=day)
HH         Time (HH=hour, in UT; 12 UT = 13 MET (CET) = 14 MEZT (CEST)). Hourly slot 05 runs from 04.00 UT to 05.00 UT
DD         Wind direction (in degrees) averaged over the last 10 minutes of the past hour (360=north, 90=east, 180=south, 270=west, 0=calm, 990=variable)
FH         Hourly mean wind speed (in 0.1 m/s)
FF         Wind speed (in 0.1 m/s) averaged over the last 10 minutes of the past hour
FX         Highest wind gust (in 0.1 m/s) during the past hourly slot
T          Temperature (in 0.1 degrees Celsius) at 1.50 m height at the time of observation
T10N       Minimum temperature (in 0.1 degrees Celsius) at 10 cm height during the past 6 hours
TD         Dew point temperature (in 0.1 degrees Celsius) at 1.50 m height at the time of observation
SQ         Sunshine duration (in 0.1 hours) per hourly slot, calculated from global radiation (-1 for <0.05 hour)
Q          Global radiation (in J/cm2) per hourly slot
DR         Precipitation duration (in 0.1 hour) per hourly slot
RH         Hourly precipitation amount (in 0.1 mm) (-1 for <0.05 mm)
P          Air pressure (in 0.1 hPa) reduced to sea level, at the time of observation
VV         Horizontal visibility at the time of observation (0=less than 100 m, 1=100-200 m, 2=200-300 m, ..., 49=4900-5000 m, 50=5-6 km, 56=6-7 km, 57=7-8 km, ..., 79=29-30 km, 80=30-35 km, 81=35-40 km, ..., 89=more than 70 km)
N          Cloud cover (coverage of the sky in eighths) at the time of observation (9=sky not visible)
U          Relative humidity (in percent) at 1.50 m height at the time of observation
WW         Weather code (00-99), observed visually (WW) or automatically (WaWa), for the current weather or the weather in the past hour
IX         Weather code indicator for the mode of observation at a manned or automatic station (1=manned, using the code from visual observations; 2,3=manned and omitted (no significant weather phenomenon, no data); 4=automatic and included (using the code from visual observations); 5,6=automatic and omitted (no significant weather phenomenon, no data); 7=automatic, using the code from automatic observations)
M          Fog: 0=did not occur, 1=occurred in the preceding hour and/or at the time of observation
R          Rain: 0=did not occur, 1=occurred in the preceding hour and/or at the time of observation
S          Snow: 0=did not occur, 1=occurred in the preceding hour and/or at the time of observation
O          Thunderstorm: 0=did not occur, 1=occurred in the preceding hour and/or at the time of observation
Y          Ice formation: 0=did not occur, 1=occurred in the preceding hour and/or at the time of observation

Table A.2: Description of events expressed using the KNMI data. Norm() means that the data is normalized to a value within the range [0,1]

Event  Description     KNMI value      Value
RD     Rain duration   DR              [0,1]
RI     Rain intensity  RH              [0,1]
S      Sun             (SQ+Q)/2        [0,1]
T      Temperature     T               [0,1]
F      Fog             M               0 or 1
SN     Snow            S               0 or 1
TS     Thunderstorm    O               0 or 1
W      Wind            (FH+FF+FX)/3    [0,1]
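The mapping in Table A.2 can be illustrated with a short sketch. The thesis only states that values are normalized to [0,1]; the min-max scaling used here, the choice to normalize each KNMI column before averaging, and the treatment of the KNMI code -1 as 0 are all assumptions made for illustration. The function names (`norm`, `no_trace`, `mean_of`, `weather_events`) are hypothetical.

```python
from typing import Dict, List

Series = List[float]


def norm(values: Series) -> Series:
    """Min-max scale a series to [0, 1] (assumed form of Norm()).

    A constant series is mapped to zeros to avoid division by zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def no_trace(values: Series) -> Series:
    """KNMI codes 'less than 0.05' as -1; treat it as 0 (assumption)."""
    return [max(v, 0.0) for v in values]


def mean_of(*series: Series) -> Series:
    """Element-wise mean of equally long series."""
    return [sum(vals) / len(vals) for vals in zip(*series)]


def weather_events(knmi: Dict[str, Series]) -> Dict[str, Series]:
    """Map hourly KNMI columns to the event variables of Table A.2."""
    return {
        "RD": norm(knmi["DR"]),
        "RI": norm(no_trace(knmi["RH"])),
        # Normalizing each column before averaging is an assumption.
        "S": mean_of(norm(no_trace(knmi["SQ"])), norm(knmi["Q"])),
        "T": norm(knmi["T"]),
        "F": knmi["M"],   # binary indicators are passed through unchanged
        "SN": knmi["S"],
        "TS": knmi["O"],
        "W": mean_of(norm(knmi["FH"]), norm(knmi["FF"]), norm(knmi["FX"])),
    }


# Three made-up hourly observations, only to show the call.
hours = {
    "DR": [0, 5, 10], "RH": [-1, 10, 20], "SQ": [0, 5, 10], "Q": [0, 100, 200],
    "T": [100, 150, 200], "M": [0, 1, 0], "S": [0, 0, 0], "O": [0, 0, 1],
    "FH": [10, 20, 30], "FF": [10, 20, 30], "FX": [20, 40, 60],
}
events = weather_events(hours)
```

Normalizing per column keeps SQ (tenths of hours) and Q (J/cm2) on a comparable scale before they are averaged; averaging the raw values would let Q dominate.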

Appendix B

Holidays

In the tables below, an overview is provided of the dates of the holidays that are used to describe the contextual events. The school holidays are based on Tables B.1, B.2 and B.3. The public holidays are based on the dates in Table B.5. The construction industry holidays are based on Table B.4.

Table B.1: School holiday dates, region North, school year 2012-2013

Holiday           Primary & special education     Secondary education
Autumn holiday    20.10.2012 to 28.10.2012        20.10.2012 to 28.10.2012
Christmas holiday 22.12.2012 to 06.01.2013        22.12.2012 to 06.01.2013
Spring holiday    16.02.2013 to 24.02.2013 **     16.02.2013 to 24.02.2013 **
May holiday       27.04.2013 to 05.05.2013        27.04.2013 to 05.05.2013
Summer holiday    06.07.2013 to 18.08.2013        06.07.2013 to 25.08.2013
Autumn holiday    19.10.2013 to 27.10.2013        same
Christmas holiday 21.12.2013 to 05.01.2014        same

Table B.2: School holiday dates, region Middle, school year 2012-2013

Holiday           Primary & special education     Secondary education
Autumn holiday    13.10.2012 to 21.10.2012        13.10.2012 to 21.10.2012
Christmas holiday 22.12.2012 to 06.01.2013        22.12.2012 to 06.01.2013
Spring holiday    16.02.2013 to 24.02.2013 **     16.02.2013 to 24.02.2013 **
May holiday       27.04.2013 to 05.05.2013        27.04.2013 to 05.05.2013
Summer holiday    20.07.2013 to 01.09.2013        13.07.2013 to 01.09.2013
Autumn holiday    19.10.2013 to 27.10.2013        same
Christmas holiday 21.12.2013 to 05.01.2014        same

Table B.3: School holiday dates, region South, school year 2012-2013

Holiday           Primary & special education     Secondary education
Autumn holiday    13.10.2012 to 21.10.2012        13.10.2012 to 21.10.2012
Christmas holiday 22.12.2012 to 06.01.2013        22.12.2012 to 06.01.2013
Spring holiday    23.02.2013 to 03.03.2013 **     23.02.2013 to 03.03.2013 **
May holiday       27.04.2013 to 05.05.2013        27.04.2013 to 05.05.2013
Summer holiday    29.06.2013 to 11.08.2013        29.06.2013 to 18.08.2013
Autumn holiday    12.10.2013 to 20.10.2013        same
Christmas holiday 21.12.2013 to 05.01.2014        same

Table B.4: Construction industry summer holiday (bouwvak)

Region                  Period
North Netherlands       22 Jul 2013 to 9 Aug 2013
Middle Netherlands      29 Jul 2013 to 16 Aug 2013
South Netherlands       15 Jul 2013 to 2 Aug 2013

Table B.5: National holidays 2013

Holiday                  Date
New Year's Day           1 January 2013
Good Friday              29 March 2013
Easter (1st day)         31 March 2013
Easter (2nd day)         1 April 2013
Queen's Day              30 April 2013
Liberation Day           5 May 2013
Ascension Thursday       9 May 2013
Pentecost (1st day)      19 May 2013
Pentecost (2nd day)      20 May 2013
Christmas (1st day)      25 December 2013
Christmas (2nd day)      26 December 2013
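As an illustration of how the dates in this appendix could be turned into contextual variables, the sketch below encodes Tables B.4 and B.5 and derives 0/1 indicators per day. That the holiday variables are binary per day is an assumption, as are the function names and the lowercase region keys; only the dates are taken from the tables.

```python
from datetime import date

# Inclusive date ranges from Table B.4 (construction industry holiday).
# The region keys are illustrative labels, not names used in the thesis.
CONSTRUCTION_HOLIDAY = {
    "north": (date(2013, 7, 22), date(2013, 8, 9)),
    "middle": (date(2013, 7, 29), date(2013, 8, 16)),
    "south": (date(2013, 7, 15), date(2013, 8, 2)),
}

# Dates from Table B.5 (national holidays 2013).
PUBLIC_HOLIDAYS = {
    date(2013, 1, 1), date(2013, 3, 29), date(2013, 3, 31), date(2013, 4, 1),
    date(2013, 4, 30), date(2013, 5, 5), date(2013, 5, 9), date(2013, 5, 19),
    date(2013, 5, 20), date(2013, 12, 25), date(2013, 12, 26),
}


def construction_holiday(day: date, region: str) -> int:
    """Return 1 if `day` falls in the construction holiday of `region`."""
    start, end = CONSTRUCTION_HOLIDAY[region]
    return int(start <= day <= end)


def public_holiday(day: date) -> int:
    """Return 1 if `day` is one of the national holidays of Table B.5."""
    return int(day in PUBLIC_HOLIDAYS)


def weekend(day: date) -> int:
    """Return 1 for Saturday or Sunday (weekday() is 5 or 6)."""
    return int(day.weekday() >= 5)
```

For example, `construction_holiday(date(2013, 7, 22), "north")` yields 1 and `weekend(date(2013, 8, 5))` yields 0, since 5 August 2013 was a Monday. The school holidays of Tables B.1 to B.3 could be encoded in the same way as date ranges per region.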

Appendix C

Related events

Below, the associated events that are used in this research are listed. We used news and sport events as unrelated events; these are listed in Tables C.1 and C.2. Related events are listed in Table C.3. (Related events are labeled PA, unrelated events UR; see Table 4.1.)

1 http://reizen-en-recreatie.infonu.nl/evenementen/71847-popfestivals-nederland-2013-overzicht-pinkpop-tm-lowlands.html
  http://reizen-en-recreatie.infonu.nl/buitenland/71849-popfestivals-europa-2013.html
  http://www.yellowtipi.nl/

Table C.1: News

Date                Event
28 Jan 2013         Beatrix announces that she will abdicate
15 Feb 2013         A meteorite strikes near Chelyabinsk
24 Feb 2013         Oscars ceremony in LA
28 Feb 2013         Pope Benedict XVI resigns as pope
13 Feb 2013         Jorge Mario Bergoglio is elected as the new pope
19 Feb 2013         Pope Francis is installed
15 March 2013       Boston bombing
30 March 2013       Change of throne in the Netherlands
5 May 2013          Train disaster in Wetteren
18 May 2013         Eurovision Song Contest
15 June 2013        Gezi Park protests, Istanbul
16-19 July 2013     Four Days Marches (Vierdaagse) and summer festivities in Nijmegen
21 July 2013        Change of throne in Belgium
25 July 2013        Heat wave in Belgium and the Netherlands
21 Aug 2013         Poison gas attack in Syria
21 Sept 2013        Attack on the Westgate shopping mall, Kenya
1 Oct 2013          Government shutdown in the US
28 Oct 2013         Storm in the Netherlands
8 Nov 2013          Typhoon in the Philippines
21 Oct 2013         Demonstrations at Independence Square, Kiev
19 Oct 2013         Pardon for two members of Pussy Riot and crew members of the Arctic Sunrise

Table C.2: Sport events

Date                        Event
7 Sept 2013                 IOC announces that the 2020 Summer Olympics will be held in Tokyo
25 May 2013                 UEFA Cup final
29 June 2013 - 21 July 2013 Tour de France
21 April 2013               Liège-Bastogne-Liège
22 - 23 June 2013           Dutch national cycling championships (NK Wielrennen)
13 Jan 2013                 European speed skating championships
15 Jan 2013                 Marathon skating on natural ice

Table C.3: Music events 1

Description            Time period         Location            People (event category)
Dance Valley           3 August 2013       Spaarnwoude         60,000 (2012)
Sensation              6 July 2013         Amsterdam           20,000
                       26-28 July 2013     Boom, Belgium       180,000
Loveland               10 Aug 2013         Amsterdam           20,000
Amsterdam Open Air     8-9 June 2013       Amsterdam           15,000
                       24 Aug 2013         Hoofddorp           60,000
                       4-7 July            Rotselaar (Belgium) 340,000 visitors. Daily 67,000 combi-ticket holders and 18,000 single-day visitors, which makes 139,000 unique visitors over four days.
Zwarte Cross festival  26-28 July 2013     Lichtenvoorde       162,795
Szigetfestival         5-12 Aug 2013       Sziget              379,000 (2012)
Pukkelpop              15-17 Aug 2013      Hasselt             189,000, spread over three days (2012)
Pinkpop                14-16 June          Landgraaf           60,000 visitors; 32,000 were present the whole weekend (Saturday and Sunday), 28,000 pop fans
Appelpop               13-14 Sept 2013     Tiel                160,000 (spread over two days, 2012)
Lowlands               16-18 Aug 2013      Biddinghuizen       55,000 (2012)
Waterpop               9-10 Aug 2013       Wateringen          28,000 (2012)
                       13-14 July 2013     Weer                20,000 (2012)
Paaspop                29 - 31 March 2013  Schijndel           52.00 (spread over 3 days)
Huntenpop              23-24 Aug 2013      Ulft                20,000
Parkpop                30 June 2013        Den Haag            150,000 (2012)

Appendix D

Content events

D.1 Program Dance Valley

Table D.1: Program Dance Valley 2013

Stage            Time           Performance
Mainstage        12:00 - 13:00  Leroy Styles
                 13:00 - 14:00  Franky Rizardo
                 14:00 - 15:00  Live:
                 15:00 - 16:00  Thomas Gold (Germany)
                 16:00 - 17:00  NERVO (Australia)
                 17:00 - 18:00  Zedd (Germany)
                 18:00 - 19:00  Live: Faithless (United Kingdom)
                 19:00 - 20:15  Sunnery James & Ryan Marciano
                 20:15 - 21:30  Alesso (Sweden)
                 21:30 - 23:00  Hardwell
?                12:00 - 13:30  Maarten de Jong & The Tunnel & Paul Van Dyk Politics of Dancing
                 13:30 - 15:00  Simon Patterson (United Kingdom)
                 15:00 - 16:30  Super8 & Tab (Finland)
                 16:30 - 17:30  Live: Giuseppe Ottaviani (Italy)
                 17:30 - 18:30  Jerome Isma-Ae (Germany)
                 18:30 - 20:00  Hard Rock Sofa (Russia)
                 20:00 - 22:00  Paul van Dyk (Germany)
The Warehouse    12:00 - 13:30  Live: Terence Fixmer (France)
                 13:30 - 15:00  Rebekah (United Kingdom)
                 15:00 - 16:00  Live: Brian Sanhaji (Germany)
                 16:00 - 17:45  Monoloc (Germany)
                 17:45 - 19:00  Live: Planetary Assault Systems (United Kingdom)
                 19:00 - 22:30  Chris Liebing (Germany)
The Crib         12:00 - 13:30  Flava
                 13:30 - 14:30  Dyna
                 14:30 - 15:30  WaxFiend
                 15:30 - 16:30  Irwan
                 16:30 - 17:30  Lady Bee
                 17:30 - 18:30  Abstract
                 18:30 - 19:30  FS Green
                 19:30 - 20:30
                 20:30 - 21:00  Band:
                 21:00 - 22:00  The Flexican, MC: Fit, MC: Mr. VI, MC: Sef
The Club         12:00 - 13:30  Roul and Doors
                 13:30 - 14:45  Jordy Dazz
                 14:45 - 16:00
                 16:00 - 17:00  Vato Gonzalez
                 17:00 - 18:00
                 18:00 - 19:00  Daddy's Groove (Italy)
                 19:00 - 20:00  John Dahlbäck (Sweden)
                 20:00 - 21:00
                 21:00 - 22:00  Live: Far East Movement (United States)
Refinery         12:00 - 13:00  Bass Chaserz
                 13:00 - 14:00  F8trix
                 14:00 - 15:00  Geck-o
                 15:00 - 16:00  Isaac
                 16:00 - 17:00  Slim Shore The Pitcher
                 17:00 - 18:00  D-Block & S-te-Fan
                 18:00 - 19:00  Zany
                 19:00 - 20:00  Tuneboy (Italy)
                 20:00 - 21:00  Thera
                 21:00 - 21:30  Live: Donkey Rollers
                 21:30 - 22:30  Deepack, MC: Da Syndrome
The Underground  12:00 - 12:45  Norman
                 12:45 - 13:30  G-Town Madness
                 13:30 - 14:30  Meccano Twins (Italy)
                 14:30 - 15:30  Ophidian
                 15:30 - 16:30  The Dreamteam
                 16:30 - 17:30  D-Passion Promo
                 17:30 - 18:30  The Outside Agency
                 18:30 - 19:30  Ruffneck
                 19:30 - 20:30  Tommyknocker (Italy)
                 20:30 - 21:30  Tieum (France)
                 21:30 - 22:30  Unexist (Italy), MC: Da Mouth of Madness
Junkyard         12:00 - 13:00  Duruz Zirkum
                 13:00 - 14:30  Cellrock (Germany)
                 14:30 - 15:30  Cold Case
                 15:30 - 16:30  Blackburn
                 16:30 - 17:30  Phuture Noize
                 17:30 - 18:30  Dark Pact Warface
                 18:30 - 19:30  Prefix & Density
                 19:30 - 20:30  Twice
                 20:30 - 22:00  Degos & Re-Done, MC: Livid
Crazy Stage      12:00 - 14:00  Cherr du Perr Darling 10
                 14:00 - 16:00
                 16:00 - 17:00
                 17:00 - 18:00  FeestDJRuud
                 18:00 - 19:00  Vic Crezée
                 19:00 - 20:00  Naffz Punish
                 20:00 - 21:00  Yellow Claw
                 21:00 - 22:00  Steve Sweet

D.2 Program Sensation

Table D.2: Program Sensation 2013

Stage        Time           Performance
MAIN STAGE   23:00 - 23:50  Mr White
             23:50 - 01:00  Nic Fanciulli
             01:00 - 02:15  Sunnery James & Ryan Marciano
             02:15 - 03:30
             03:30 - 04:45
             04:45 - 06:00  Mark Knight
             23:00 - 06:00  Mc Gee
DELUXE       21:00 - 23:30  Ferreck Dawn
             23:30 - 01:00  Claptone
             01:00 - 03:00  Dj Sneak
             03:00 - 05:00  Onno
             05:00 - 08:00  Olivier Weiter b2b Miss Melera

D.2.1 Program Pinkpop

Table D.3: Program Pinkpop 2013

Stage             Time           Performance

FRIDAY 14 JUNE, FESTIVAL GROUNDS OPEN FROM 13.00
MAINSTAGE         15.30 - 16.15  HANDSOME POETS
                  17.05 - 18.05  PARAMORE
                  19.00 - 20.00  THE SCRIPT
                  21.15 - 22.30
3FM STAGE         16.15 - 17.05  MASTERS OF REALITY
                  18.05 - 19.00  JIMMY EAT WORLD
                  20.00 - 21.15
BRAND BIER STAGE  15.00 - 15.30  CHRISTOPHER GREEN
                  16.15 - 17.05  ANDY BURROWS
                  18.10 - 19.00  KODALINE
                  20.00 - 21.15  NETSKY LIVE!

SATURDAY 15 JUNE, FESTIVAL GROUNDS OPEN FROM 11.00
MAINSTAGE         13.40 - 14.30  LA PEGATINA
                  15.25 - 16.15  PASSENGER
                  17.15 - 18.15  THE OPPOSITES
                  19.15 - 20.15
                  21.15 - 22.30
3FM STAGE         14.30 - 15.25  DOUWE BOB
                  16.15 - 17.15  FUN.
                  18.15 - 19.15
                  20.15 - 21.15  PHOENIX
BRAND BIER STAGE  13.00 - 13.40  PALMA VIOLETS
                  14.30 - 15.25  GRAVEYARD
                  16.15 - 17.15  MILES KANE
                  18.15 - 19.15  ELLIE GOULDING
                  20.15 - 21.15  C2C

SUNDAY 16 JUNE, FESTIVAL GROUNDS OPEN FROM 10.00
MAINSTAGE         12.50 - 13.40  KENSINGTON
                  14.40 - 15.40  WILL AND THE PEOPLE
                  16.40 - 17.30  THE VACCINES
                  18.30 - 19.30
                  20.30 - 22.30
3FM STAGE         12.00 - 12.50  TOM ODELL
                  13.40 - 14.40  TRIXIE WHITLEY
                  15.40 - 16.40  BLAUDZUN
                  17.30 - 18.30
                  19.30 - 20.30
BRAND BIER STAGE  12.00 - 12.50  PUGGY
                  13.40 - 14.40  BASTILLE
                  15.40 - 16.40  LIANNE LA HAVAS
                  17.30 - 18.30  DIE ANTWOORD
                  19.30 - 20.30  ALT-J

D.2.2 Program Pukkelpop

Table D.4: Program Pukkelpop 2013 Thursday 15th August 1

Stage        Time         Performance
Main Stage   11:25-12:05  School Is Cool
             12:45-13:25  Imagine Dragons
             14:05-14:50  Mac Miller
             15:30-16:20
             17:05-18:00  Deftones
             18:50-19:50  Fall Out Boy
             20:40-21:55
             22:40-00:10
             01:00-02:00  Chase & Status Live
Marquee      11:50-12:25  Psycho 44
             13:00-13:40  Merchandise
             14:20-15:00  Parquet Courts
             15:40-16:25  Surfer Blood
             17:05-17:55  Villagers
             18:40-19:30  Miles Kane
             20:15-21:05  Johnny Marr
             21:50-22:50  HURTS
             00:15-01:45  Godspeed ! Black Emperor
Dance Hall   12:50-13:30  Safi & Spreej
             14:10-14:55  Charli XCX
             15:35-16:35  Klangkarussell
             17:15-18:05  Danny Brown
             18:45-19:30  AlunaGeorge
             20:10-21:00  The Parov Stellar Band
             21:40-22:40  TNGHT ( x )
             23:20-00:20  NERO
             01:00-02:00  Duck Sauce
The Shelter  12:25-13:05  Oathbreaker
             13:40-14:25  The Menzingers
             15:00-15:45  Hawk Eyes
             16:25-17:10  Zebrahead
             17:55-18:45  Fucked Up
             19:30-20:20  Quicksand
             21:05-21:55  The Bronx
             23:10-00:10  Alkaline Trio
Boiler Room  12:00-13:00  The Whatevers
             13:00-14:00  The Mixfitz
             14:00-15:30  DJ Green Lantern
             15:30-17:00  Just
             17:00-18:30  Wilkinson
             18:30-20:00  Dillon Francis
             20:00-21:30
             21:30-23:00  A-Trak
             23:00-00:30  Mark Ronson DJ Set
             00:30-02:00  Rudimental DJ Set
             02:00-04:00  TLP
Castello     14:55-15:35  Vuurwerk
             16:35-17:15  Lowell
             18:05-18:45  Kate Boy
             19:30-20:15  Bombino
             21:00-22:00  Naughty Boy
             22:50-23:50
             00:40-01:40  Araabmuzik
Club         13:25-14:05  Mikal Cronin
             14:50-15:30  Allah-Las
             16:20-17:00  BadBadNotGood
             18:00-18:45  Phosphorescent
             19:50-20:35  Glen Hansard
             21:55-22:40  Savages
             00:10-01:00  Crystal Fighters
Wablief?!    13:40-14:20  Polaroid Fiction
             15:00-15:45  Pomrad
             16:25-17:05  Few Bits
             17:50-18:40  Float Fall
             19:30-20:20  The Happy
             21:00-21:50  Steak Number Eight
             22:55-23:50  Meuris

D.2.3 Program Pukkelpop

Table D.5: Program Pukkelpop 2013 Friday 16th August 2

Stage        Time         Performance
Main Stage   12:25-13:05  Puggy
             13:45-14:30
             15:10-16:00  Noah And The Whale
             16:40-17:40
             18:25-19:15  FUN.
             20:05-21:05
             22:05-23:15  Eels
             00:45-02:00
Marquee      13:05-13:45  Nina Nesbit
             14:25-15:05  Frank Turner & The Sleeping Souls
             15:55-16:40  Little Green Cars
             17:20-18:05
             18:55-19:45  Girls In Hawaii
             20:30-21:20  Johnny Borrell & Zazou
             22:05-22:55  Local Natives
             23:45-00:45  James Blake
Dance Hall   12:35-13:15  Animal-Music
             13:55-14:40  Dope D.O.D.
             15:10-15:55  The Opposites
             16:05-16:50  Yellow Claw
             17:20-18:20  Sound
             19:00-20:00  RONE presents MODULE
             20:35-21:35  Proxy
             22:20-23:20  katy B
             00:05-01:05  DJ Fresh (live)
The Shelter  12:30-13:10  Homer
             13:45-14:25  Palm Reader
             15:10-15:55  Cerebral Ballzy
             16:40-17:20  Sylosis
             18:05-18:55  Caliban
             19:45-20:30  We Came As Romans
             21:15-22:00  Architects
             22:45-23:45  Killswitch Engage
Boiler Room  12:30-14:00  The Oddword
             14:00-15:30  Dismantle
             15:30-17:00  Hazard
             17:00-18:30  TC
             18:30-20:00  Duke Dumont
             20:00-21:30  Zeds Dead
             21:30-23:00  Bingo Players
             23:00-00:30  The Magician
             00:30-02:00  (dj set)
             02:00-04:00  Nadiem Shah
Castello     11:50-12:30  In The Valley Below
             13:10-13:50  Cloud Boat
             14:30-15:15  Slow Magic
             15:55-16:40  Factory Floor
             17:20-18:10  Mala In Cuba
             18:50-19:35  Ms Mr
             20:15-21:05
             21:45-22:35  Totally Enormous Extinct Dinosaurs
             23:00-00:30  Maya Jane Coles
             00:30-02:00  SBTRKT dj set
Club         11:50-12:25  SKATERS
             13:05-13:45  Chuck Ragan
             14:30-15:10  Lord Huron
             16:00-16:40  Lucy Rose
             17:40-18:20  Unknown Mortal Orchestra
             19:15-19:55  Daughter
             21:05-21:55  Poliça
             23:20-00:20  Low
Wablief?!    15:10-15:50  BRNS
             16:40-17:20  The Black Heart Rebellion
             18:05-18:50  Raketkanon
             19:45-20:30  Gruppo di Pawlowski
             21:20-22:05  Compact Disk Dummies
             22:55-23:45  Dez Mona

D.2.4 Program Pukkelpop

Table D.6: Program Pukkelpop 2013 Saturday 17th August 3

Stage        Time         Performance
Main Stage   11:15-11:50  Mintzkov
             12:25-13:05  Noisettes
             13:45-14:45  Regina Spektor
             15:25-16:25  Alabama Shakes
             17:05-18:05  Triggerfinger
             18:45-19:35  Foals
             20:15-21:15  Franz Ferdinand
             22:00-23:10  The xx
             00:00-01:00  Goose
Marquee      11:45-12:25  San Cisco
             13:05-13:45  Clock Opera
             14:45-15:25  I Am Kloot
             16:10-17:00  Kodaline
             17:45-18:45  Bonobo
             19:25-20:25  Bat For Lashes
             21:15-22:26  The Knife
             23:15-00:35  Paul Kalkbrenner
Dance Hall   11:45-12:25  Robert DeLong
             13:00-13:45  Superpoze
             14:25-15:15  Waka Flocka Flame
             15:55-16:55  Foreign Beggars
             17:35-18:25  chk chk chk
             19:15-20:15  Crystal Castles
             21:05-22:05  Noisia
             23:00-00:00  Knife Party
The Shelter  11:15-11:50  Spoil Engine
             12:25-13:05  Don Broco
             13:55-14:35  While She Sleeps
             15:25-16:10  Your Demise
             17:00-17:45  Cult Of Luna
             18:40-19:25  Filter
             20:25-21:15  Gojira
             22:25-23:15  Lamb Of God
             00:00-01:00
Boiler Room  12:30-13:30  A.N.D.Y.
             13:30-15:00  Gorgon City
             15:00-16:30
             16:30-18:00  RL
             18:00-19:00  Flosstradamus
             19:00-20:30  Friction
             20:30-22:00  Doctor P
             22:00-23:30  Erol Alkan
             23:30-01:00  Mr Oizo
             01:00-04:00  Michael Midnight
Castello     12:25-13:05  The Haxan Cloak
             13:45-14:25  SOHN
             15:10-15:50  Jagwar Ma
             16:35-17:15  Holy Other
             18:00-18:50  The Soft Moon
             19:30-20:30
             20:30-21:30  Mosca
             21:30-22:30  Ben Pearce
             22:30-00:00  Julio Bashmore
             00:00-01:00  Jamie xx
Club         11:50-12:25  Pokey LaFarge
             13:05-13:45  The Family Rain
             14:45-15:25  Bosnian Rainbows
             16:25-17:05  Deap Vally
             18:05-18:45  Frightened Rabbit
             19:35-20:15  Andy Burrows
             21:15-22:00  HAIM
             23:10-00:00  Midlake
Wablief?!    12:30-12:40  Rhinos Are People Too
             12:40-12:50  Tout Va Bien
             12:50-13:00  Soldier's Heart
             13:50-14:35  The Sedan Vault
             15:25-16:10  Delv!s
             17:00-17:45  dans dans
             18:40-19:25  Sir Yes Sir
             20:25-21:15  Amenra
             22:25-23:15  Baloji

D.2.5 Program Lowlands

Table D.7: Program Lowlands 2013 Friday 4

Stage          Time         Performance
Alpha          13:30-14:10  Tom Odell
               14:50-15:50  Seasick Steve
               16:30-17:30
               18:10-19:10  Biffy Clyro
               19:50-20:50  De Jeugd van Tegenwoordig
               21:30-23:00  Nine Inch Nails
Bravo          14:10-15:00  AlunaGeorge
               16:00-17:00  Gold Panda
               17:50-18:50  Hurts
               19:40-20:40  Baauer
               21:30-22:45  Disclosure (live)
               23:00-00:30  Ben Pearce
               00:30-02:00  Julio Bashmore
               02:00-03:30  Maceo Plex
               03:30-04:55  Maya Jane Coles
Grolsch        14:00-14:45  The Joy Formidable
               15:30-16:30  Bullet For My Valentine
               17:15-18:15  Kendrick Lamar
               19:00-20:00  Crystal Fighters
               20:50-21:50  Slayer
               23:00-03:00  Pop-O-Matic hosted by DJ St. Paul & VJ Switchdoctor
Charlie        14:10-14:50  Theme Park
               15:50-16:30  James McCartney
               17:30-18:10  Sir Yes Sir
               19:10-19:50  The Menzingers
               20:50-21:30  Jagwar Ma
India          13:40-14:25  Deep Sea Arcade
               15:10-16:00  The 1975
               16:45-17:45  Villagers
               18:30-19:30
               20:15-21:05  Mikal Cronin
               22:00-22:50  Mister And Mississippi
               23:15-04:15  Global cLIMAx to the Bass feat Gato Preto, Fellow & Tommi + Bomb Diggy
Lima           13:30-14:10  Tim Vantol
               14:45-15:30  The Staves
               16:25-17:25  Pokey LaFarge
               18:15-19:15  Beans & Fatback
               20:05-21:05  Bombino
               22:00-23:00  Watcha Clan
               23:00-03:00  Swingtastic
X-Ray          13:30-14:15  traumahelikopter
               15:00-15:50  Matias Aguayo & Mostro
               16:30-17:30  Charanjit Singh
               18:15-19:10  Snakehips
               20:05-21:05  Petite Noir
               22:00-22:55  MMM live
               23:00-00:00  Know V.A.
               00:00-01:00
               01:00-02:30  LeFtO
               02:30-03:30
               03:30-04:45  Jameszoo
Juliet         14:30-15:40  Trouble Man / Sadettin Kirmiziyüz in collaboration with the Sadists - 'Somedaymyprincewill.com'
               17:00-17:50  iLL Skill Squad in co-production with Theatergroep DOX - '155'
               19:00-20:00  Dolf Jansen
               20:45-21:35  Herman In Een Bakje Geitenkwark
               22:15-23:15  De Bourgondische Belgen
               00:00-00:45  URLAND - 'House On Mars: Pixel '
Echo           13:45-14:30  LL University - Pieter Jonker
               15:00-15:45  LL University - Ernst Hirsch Ballin
               16:15-16:45  VPRO's Nationale Wetenschaps Quiz
               17:30-19:00  'Spring Breakers'
               19:20-20:35  'Life @ Lowlands'
               21:00-22:40  Lowlands Keuzefilm
               23:10-00:40  'Citadel'
               01:10-03:00  'This Is The End'
Tïtty Twïster  14:30-15:00  Joubert Pignon & Matthijs Leeuwis - 'Waar beren drinken'
               15:30-16:00  Vrouwkje Tuinman
               16:30-17:00  De Grote Lowlands Schrijfwedstrijd
               17:30-18:15  Literaire Stoelendans
               20:30-05:00  Tïtty Twïster
Mike           18:00-19:00  Lulverhalen
               19:30-20:00  Tex de Wit
               20:30-21:00  Henry Van Loon
               21:30-22:00  Bert Gabriëls
               22:30-23:00  Ava Vidal
               23:30-00:00  René van Meurs

Table D.8: Program Lowlands 2013 Saturday 5

Stage          Time         Performance
Alpha          13:00-14:00
               14:45-15:45  Balthazar
               16:30-17:30  The Opposites
               18:15-19:15  The Lumineers
               20:00-21:00  Chase And Status
               21:50-23:00  Editors
Bravo          11:00-11:45  Wakker Worden Met Scappuccino!
               13:30-14:30  Great Minds
               15:30-16:30  Mount Kimbie
               17:30-18:30  Miles Kane
               19:30-20:30  Buraka Som Sistema
               21:35-22:50
               23:00-00:00  Sandrien
               00:00-01:30  Nina Kraviz
               01:30-03:30  Richie Hawtin
               03:30-04:55  Marcel Fengler
Grolsch        13:45-14:45  Seeed
               15:25-16:25  Imagine Dragons
               17:10-18:10  Michael Kiwanuka
               19:00-20:00  Empire Of The Sun
               20:50-22:05  Major Lazer
               23:00-03:00  Kill All Hipsters
Charlie        14:00-14:40  Little Green Cars
               15:45-16:30  Torche
               17:30-18:15  Satellite Stories
               19:15-20:00  Twenty One Pilots
               21:00-21:45  Cerebral Ballzy
India          11:30-12:30  Daughter
               13:10-14:00  Half Moon Run
               14:50-15:40  Mozes And The Firstborn
               16:30-17:30  The Veils
               18:20-19:20  Unknown Mortal Orchestra
               20:15-21:00  Chvrches
               22:00-23:00  Poliça
               23:15-04:15  feat Lunice, Kutmah, , DJ , D-Styles,
Lima           10:30-11:00  Tai Chi
               11:30-12:30  Llowgenda
               13:00-14:00  Heartless Bastards
               14:40-15:40  Maison du Malheur
               16:25-17:25  Ben Zabo
               18:15-19:15  Eläkeläiset
               20:00-21:00  Protoje & The Indiggnation
               21:50-22:50  Dubioza Kolektiv
               23:00-03:00  Dosvedanya with DJ Pizdabolkin & Kameraden
X-Ray          13:00-13:50  Mmoths
               14:30-15:15  Rainbow Arabia
               16:00-16:50  Factory Floor
               17:30-18:15  John Coffey
               19:00-19:50  Forest Swords
               20:30-21:20  The Hard Way
               22:05-23:00  Austra
               23:00-00:30  Job Jobse
               00:30-03:00  DJ Koze
               03:00-04:45  George FitzGerald
Juliet         12:30-13:30  Esther Scheldwacht - 'De Sunshine Show'
               14:15-15:00  Itamar Serussi Sahar - 'Mono'
               16:00-17:00  'Nineties: Live Your Life Like A Rave Machine'
               17:45-18:45  Comedy Parade
               19:30-20:30  Maxim Hartman - 'Een College In Chaos'
               21:00-21:50  Judah Friedlander
               22:45-23:45  Remko Vrijdag & Martine Sandifort - 'Hulphond'
               00:15-01:15  Thijs van Domburg - 'Stand-up Superhero'
Echo           12:30-13:15  LL University - Christine Mummery
               13:45-14:30  LL University - Leo Kouwenhoven
               15:00-15:30  VPRO's Nationale Wetenschaps Quiz
               16:45-18:15  'A Field In England'
               18:45-20:40  'Sound City'
               21:10-22:40  'Cinema Curioso'
               23:10-00:40  ''
               01:00-03:00  'You're Next'
Tïtty Twïster  12:00-12:45  Lekkere wijven, lelijke gasten
               13:15-14:00  Acteur Leest Auteur
               14:45-15:45  De Magie van het Woord
               16:15-17:00  Lamoer
               17:30-18:15  Neil Gaiman
               20:30-05:00  Tïtty Twïster
Mike           11:30-12:10  LowSofie: So You Think?!
               13:00-13:40  LowSofie: Marc Slors
               15:00-15:40  LowSofie: Jos de Mul
               16:30-17:10  LowSofie: Discussie
               18:00-19:00  Lulverhalen
               19:30-20:00  Maartje & Kine
               20:30-21:00  William Boeva
               21:30-22:00  Tim Fransen
               22:30-23:00  Ava Vidal
               23:30-00:00  De Partizanen

Table D.9: Program Lowlands 2013 Sunday 6

Stage          Time         Performance
Alpha          12:50-13:50  The World Orchestra
               14:40-15:40  Jake Bugg
               16:25-17:25  Fall Out Boy
               18:15-19:15  Alabama Shakes
               20:00-21:00  Franz Ferdinand
               21:50-23:00  Nick Cave & The Bad Seeds
Bravo          14:00-15:00  NJO Reich Ensemble
               15:40-16:40  Totally Enormous Extinct Dinosaurs
               17:30-18:30  James Blake
               19:30-20:30  Nero
               21:30-22:30
               22:45-23:45  Guerilla Speakerz
               23:45-00:45  TNGHT (Hudson Mohawke X Lunice)
               00:45-01:45  Just Blaze
               01:45-02:45  Noisia
               02:45-04:00  Calyx & Teebee feat. MC AD0
               04:00-04:55  Black Sun Empire
Grolsch        13:50-14:35  Kodaline
               15:25-16:25  Bonobo
               17:15-18:15  Bat For Lashes
               19:05-20:05  Foals
               21:00-22:10  The Knife - Shaking The Habitual Show
               23:00-03:00  Superstijl
Charlie        13:55-14:40  The Bronx
               15:40-16:25  Frightened Rabbit
               17:25-18:10  Bo Ningen
               19:15-20:00
               21:00-21:45  We Came As Romans
India          11:30-12:20  London Grammar
               13:05-13:55  Jacco Gardner
               14:40-15:40  HAIM
               16:35-17:35  Noah And The Whale
               18:20-19:20  MS MR
               20:00-21:00  Gojira
               22:00-23:00  Goat
               23:15-04:15  90's Alternative
Lima           10:30-11:00  Tai Chi
               11:30-12:30  Llowgenda
               13:00-13:50  The Boxettes
               14:30-15:30  Los Chinches
               16:25-17:15  Delv!s
               18:15-19:05  Sam Amidon
               20:00-21:00  Patchanka
               21:45-22:45  Fiddler's Green
               23:00-03:00  Afroodoo with DJ Bertú and DJ Safri
X-Ray          13:00-13:50  Kate Boy
               14:30-15:20  Mt. Wolf
               16:00-16:45  Deap Vally
               17:30-18:20  Machinedrum
               19:00-19:50  Vondelpark
               20:30-21:15  Robert DeLong
               22:00-23:00  AraabMuzik
               23:00-00:30  Sadar Bahar
               00:30-02:00  Young Marco
               02:00-03:30  Midland
               03:30-04:45  William Kouam Djoko
Juliet         13:00-13:35  Kofferband (Orkater) - 'De Ongelukkige Dag'
               15:00-15:45  Conny Janssen Danst - 'Meer Ruis'
               17:00-17:45  Scapino Ballet Rotterdam - 'Pearl'
               19:15-20:15  Monica da Silva Trio
               21:00-22:00  De Gebroeders Fretz - 'Revolte'
               22:30-23:30  Martijn Koning
               00:15-01:15  Comedytrain
Echo           12:30-13:15  LL University - Hans Achterhuis
               13:45-14:30  LL University - Henk Barendregt
               15:00-15:30  VPRO's Nationale Wetenschaps Quiz
               17:00-18:30  'Eraserhead' with Cercueil
               18:50-20:40  'Greetings From Tim Buckley'
               21:00-22:30  'Safety Not Guaranteed'
               23:00-01:00  'Borgman'
               01:25-03:00  'Maniac'
Tïtty Twïster  12:00-13:00  Wed and Walk
               13:30-14:15  Poetracks
               14:45-15:30  Dimitri Verhulst
               16:15-17:00  David Vann
               17:30-18:15  Achievers
               20:30-05:00  Tïtty Twïster
Mike           11:30-12:10  LowSofie: So You Think?!
               13:00-13:40  LowSofie: René ten Bos
               15:00-15:40  LowSofie: Ad Verbrugge
               16:30-17:10  LowSofie: Discussie
               18:00-19:00  Lulverhalen
               19:30-20:00  Soundos el Ahmadi
               20:30-21:00  Pepijn Schoneveld
               21:30-22:00  Ava Vidal
               22:30-23:00  Kees Van Amstel
               23:30-00:00  The Wonderfull Days AKA Carolien Borgers goes happy hardcore

Appendix E

Real World Variables

This version does not contain all appendices.

Appendix F

Online place related variables

This version does not contain all appendices.

Appendix G

Online people related variables

This version does not contain all appendices.

Appendix H

Online content related variables

This version does not contain all appendices.

Appendix I

Correlations real world variables
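The tables in this appendix report Pearson (ρP) and Spearman (ρS) coefficients together with p-values (P). As a minimal sketch of how the coefficients themselves can be computed, the pure-Python functions below implement both; they are illustrative, not the code used in the thesis, and the p-values are omitted because they require a t-distribution. The function names are hypothetical. Returning `nan` when one variable is constant is one plausible reason for the `nan` entries in Tables I.5 and I.6, but that interpretation is an assumption.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; nan if either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Spearman only uses the ordering of the values, so it can differ from Pearson whenever the relation between two variables is monotonic but not linear.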

131 Table I.1: Pearson Correlation coefficient weather variables

Variables Rain Sun Temperature Max temp. Fog Thunderstorm Wind Moon ρP P ρP P ρP P ρP P ρP P ρP P ρP P ρP P Rain - -0.078 0.0 -0.241 0.0 -0.251 0.0 -0.011 0.581 0.208 0.0 0.083 0.0 -0.147 0.0 Sun -0.078 0.0 - †0.478 0.0 -0.042 0.033 -0.044 0.028 -0.039 0.05 †0.547 0.0 0.001 0.951 Temperature -0.241 0.0 †0.478 0.0 - ‡0.77 0.0 -0.011 0.587 -0.007 0.708 0.091 0.0 0.15 0.0 Max temp. -0.251 0.0 -0.042 0.033 ‡0.77 0.0 - 0.002 0.903 0.039 0.052 -0.247 0.0 0.111 0.0 Fog -0.011 0.581 -0.044 0.028 -0.011 0.587 0.002 0.903 - -0.003 0.887 -0.05 0.012 0.016 0.41 Thunderstorm 0.208 0.0 -0.039 0.05 -0.007 0.708 0.039 0.052 -0.003 0.887 - 0.004 0.841 0.042 0.033 Wind 0.083 0.0 †0.547 0.0 0.091 0.0 -0.247 0.0 -0.05 0.012 0.004 0.841 - -0.1 0.0

132 Moon -0.147 0.0 0.001 0.951 0.15 0.0 0.111 0.0 0.016 0.41 0.042 0.033 -0.1 0.0 - Table I.2: Spearman Correlation coefficient weather variables

Variables Rain Sun Temperature Max temp. Fog Thunderstorm Wind Moon ρS P ρS P ρS P ρS P ρS P ρS P ρS P ρS P Rain - -0.136 0.0 -0.249 0.0 -0.298 0.0 -0.011 0.581 0.208 0.0 0.11 0.0 -0.14 0.0 Sun -0.136 0.0 - †0.574 0.0 0.04 0.046 -0.037 0.062 -0.066 0.001 †0.532 0.0 0.009 0.637 Temperature -0.249 0.0 †0.574 0.0 - ‡0.709 0.0 -0.019 0.329 -0.017 0.394 0.141 0.0 0.141 0.0 Max temp. -0.298 0.0 0.04 0.046 ‡0.709 0.0 - -0.005 0.812 0.06 0.002 -0.33 0.0 0.202 0.0 Fog -0.011 0.581 -0.037 0.062 -0.019 0.329 -0.005 0.812 - -0.003 0.887 -0.057 0.004 0.016 0.41 Thunderstorm 0.208 0.0 -0.066 0.001 -0.017 0.394 0.06 0.002 -0.003 0.887 - 0.017 0.402 0.045 0.022 Wind 0.11 0.0 †0.532 0.0 0.141 0.0 -0.33 0.0 -0.057 0.004 0.017 0.402 - -0.09 0.0 Moon -0.14 0.0 0.009 0.637 0.141 0.0 0.202 0.0 0.016 0.41 0.045 0.022 -0.09 0.0 - Table I.3: Pearson Correlation coefficient holiday variables

Variables Public holidays School holidays Const. ind. holidays weekend ρP P ρP P ρP P ρP P Public Holidays - -0.059 0.003 -0.046 0.021 0.027 0.176 School Holidays -0.059 0.003 - ‡0.865 0.0 0.354 0.0 Const Ind Holidays -0.046 0.021 ‡0.865 0.0 - 0.102 0.0

133 weekend 0.027 0.176 0.354 0.0 0.102 0.0 - Table I.4: Spearman Correlation coefficient holiday variables

Variables Public holidays School holidays Const. ind. holidays weekend ρS P ρS P ρS P ρS P Public Holidays - -0.052 0.008 -0.041 0.04 0.026 0.186 School Holidays -0.052 0.008 - ‡0.826 0.0 †0.391 0.0 Const Ind Holidays -0.041 0.04 ‡0.826 0.0 - 0.0 0.995 weekend 0.026 0.186 †0.391 0.0 0.0 0.995 - Table I.5: Pearson Correlation coefficient Festivals

Variables Dance Valley Sensation Pukkelpop Pinkpop Lowlands
ρP P ρP P ρP P ρP P ρP P
Dance Valley - nan 1.0 -0.023 0.247 -0.016 0.424 -0.015 0.465
Sensation nan 1.0 - nan 1.0 nan 1.0 nan 1.0
Pukkelpop -0.023 0.247 nan 1.0 - -0.003 0.898 ‡0.695 0.0
Pinkpop -0.016 0.424 nan 1.0 -0.003 0.898 - -0.002 0.935
Lowlands -0.015 0.465 nan 1.0 ‡0.695 0.0 -0.002 0.935 -

Table I.6: Spearman Correlation coefficient Festivals

Variables Dance Valley Sensation Pukkelpop Pinkpop Lowlands
ρS P ρS P ρS P ρS P ρS P
Dance Valley - nan nan -0.025 0.211 -0.018 0.377 -0.016 0.43
Sensation nan nan - nan nan nan nan nan nan
Pukkelpop -0.025 0.211 nan nan - -0.003 0.888 †0.632 0.0
Pinkpop -0.018 0.377 nan nan -0.003 0.888 - -0.002 0.929
Lowlands -0.016 0.43 nan nan †0.632 0.0 -0.002 0.929 -

Table I.7: Pearson Correlation coefficient Context

Variables News Sport Festivals
ρP P ρP P ρP P
News - -0.019 0.334 -0.096 0.0
Sport -0.019 0.334 - -0.051 0.011
Festivals -0.096 0.0 -0.051 0.011 -

Table I.8: Spearman Correlation coefficient Context

Variables News Sport Festivals
ρS P ρS P ρS P
News - -0.019 0.334 -0.136 0.0
Sport -0.019 0.334 - -0.077 0.0
Festivals -0.136 0.0 -0.077 0.0 -

Table I.9: Pearson Correlation coefficient weather and holiday variables

Variables Public holidays School holidays Const. ind. holidays weekend
ρP P ρP P ρP P ρP P
Rain -0.009 0.652 -0.162 0.0 -0.046 0.022 -0.26 0.0
Sun 0.025 0.203 -0.022 0.274 -0.053 0.008 0.042 0.035
Temperature -0.034 0.083 †0.561 0.0 †0.561 0.0 0.001 0.952
Max temp. -0.064 0.001 ‡0.705 0.0 ‡0.695 0.0 0.043 0.031
Fog -0.001 0.961 -0.042 0.036 -0.034 0.087 -0.016 0.412
Thunderstorm -0.002 0.908 0.038 0.056 0.011 0.595 -0.004 0.827
Wind 0.006 0.768 -0.073 0.0 -0.048 0.015 0.074 0.0
Moon -0.005 0.79 0.018 0.354 -0.048 0.016 -0.075 0.0

Table I.10: Spearman Correlation coefficient weather and holiday variables

Variables Public holidays School holidays Const. ind. holidays weekend
ρS P ρS P ρS P ρS P
Rain -0.009 0.652 -0.147 0.0 -0.001 0.954 -0.256 0.0
Sun 0.025 0.208 0.014 0.484 -0.049 0.013 0.078 0.0
Temperature -0.039 0.048 †0.482 0.0 †0.479 0.0 0.049 0.013
Max temp. -0.04 0.044 †0.536 0.0 †0.554 0.0 0.066 0.001
Fog -0.001 0.961 -0.036 0.069 -0.032 0.111 -0.018 0.366
Thunderstorm -0.002 0.908 0.023 0.243 -0.003 0.879 -0.008 0.698
Wind 0.012 0.541 -0.075 0.0 -0.048 0.016 -0.001 0.967
Moon -0.006 0.763 -0.08 0.0 -0.057 0.004 -0.093 0.0

Table I.11: Pearson Correlation coefficient weather and festival variables

Variables Dance Valley Sensation Pukkelpop Pinkpop Lowlands
ρP P ρP P ρP P ρP P ρP P
Rain -0.121 0.0 nan 1.0 0.037 0.06 -0.013 0.501 0.057 0.004
Sun -0.238 0.0 nan 1.0 0.002 0.921 -0.008 0.69 -0.021 0.292
Temperature -0.081 0.0 nan 1.0 0.005 0.792 -0.049 0.013 -0.006 0.748
Max temp. 0.001 0.956 nan 1.0 0.009 0.641 -0.039 0.052 0.014 0.476
Fog -0.013 0.512 nan 1.0 -0.002 0.916 -0.001 0.942 -0.001 0.947
Thunderstorm -0.031 0.117 nan 1.0 -0.005 0.802 -0.003 0.862 -0.003 0.874
Wind 0.028 0.157 nan 1.0 -0.003 0.887 -0.016 0.424 -0.016 0.409
Moon -0.071 0.0 nan 1.0 -0.011 0.565 -0.08 0.0 -0.007 0.716

Table I.12: Spearman Correlation coefficient weather and festival variables

Variables Dance Valley Sensation Pukkelpop Pinkpop Lowlands
ρS P ρS P ρS P ρS P ρS P
Rain -0.126 0.0 nan nan 0.024 0.235 -0.014 0.476 0.056 0.005
Sun -0.255 0.0 nan nan 0.002 0.925 -0.021 0.286 -0.034 0.087
Temperature -0.14 0.0 nan nan 0.008 0.686 -0.058 0.003 -0.016 0.432
Max temp. -0.096 0.0 nan nan 0.012 0.548 -0.06 0.003 0.032 0.112
Fog -0.014 0.494 nan nan -0.002 0.913 -0.002 0.939 -0.001 0.945
Thunderstorm -0.032 0.102 nan nan -0.005 0.795 -0.004 0.854 -0.003 0.869
Wind -0.036 0.072 nan nan 0.009 0.644 -0.012 0.544 -0.019 0.339
Moon -0.084 0.0 nan nan -0.013 0.499 -0.075 0.0 -0.008 0.67

Table I.13: Pearson Correlation coefficient weather and context variables

Variables News Sport Festivals
ρP P ρP P ρP P
Rain -0.055 0.005 0.039 0.048 -0.007 0.736
Sun 0.006 0.746 0.0 0.984 -0.006 0.773
Temperature -0.033 0.094 -0.099 0.0 0.051 0.011
Max temp. -0.051 0.01 -0.132 0.0 0.081 0.0
Fog -0.007 0.734 -0.003 0.866 0.017 0.404
Thunderstorm -0.016 0.417 -0.008 0.686 0.216 0.0
Wind -0.057 0.004 0.067 0.001 0.043 0.03
Moon 0.192 0.0 0.055 0.006 -0.083 0.0

Table I.14: Spearman Correlation coefficient weather and context variables

Variables News Sport Festivals
ρS P ρS P ρS P
Rain -0.055 0.005 0.039 0.048 -0.117 0.0
Sun 0.001 0.962 -0.006 0.752 0.012 0.552
Temperature -0.006 0.764 -0.102 0.0 0.013 0.5
Max temp. 0.041 0.037 -0.135 0.0 -0.069 0.001
Fog -0.007 0.734 -0.003 0.866 0.001 0.967
Thunderstorm -0.016 0.417 -0.008 0.686 0.142 0.0
Wind -0.043 0.03 0.059 0.003 0.22 0.0
Moon 0.2 0.0 0.058 0.004 -0.117 0.0

Table I.15: Pearson Correlation coefficient holiday and festival variables

Variables Dance Valley Sensation Pukkelpop Pinkpop Lowlands
ρP P ρP P ρP P ρP P ρP P
Public Holidays -0.011 0.592 nan 1.0 -0.002 0.932 -0.001 0.953 -0.001 0.957
School Holidays 0.204 0.0 nan 1.0 0.006 0.759 -0.087 0.0 0.004 0.846
Const Ind Holidays 0.114 0.0 nan 1.0 -0.055 0.005 -0.068 0.001 -0.045 0.023
weekend †0.363 0.0 nan 1.0 -0.028 0.161 0.04 0.043 0.009 0.635

Table I.16: Spearman Correlation coefficient holiday and festival variables

Variables Dance Valley Sensation Pukkelpop Pinkpop Lowlands
ρS P ρS P ρS P ρS P ρS P
Public Holidays -0.011 0.577 nan nan -0.002 0.929 -0.001 0.95 -0.001 0.955
School Holidays 0.233 0.0 nan nan -0.073 0.0 -0.083 0.0 -0.046 0.021
Const Ind Holidays 0.06 0.002 nan nan -0.065 0.001 -0.065 0.001 -0.047 0.017
weekend †0.369 0.0 nan nan -0.028 0.164 0.042 0.037 0.012 0.536

Table I.17: Pearson Correlation coefficient holiday and context variables

Variables News Sport Festivals
ρP P ρP P ρP P
Public Holidays -0.006 0.781 -0.003 0.89 -0.015 0.464
School Holidays -0.12 0.0 -0.186 0.0 0.218 0.0
Const Ind Holidays -0.2 0.0 -0.16 0.0 0.015 0.46
weekend -0.085 0.0 0.087 0.0 0.071 0.0

Table I.18: Spearman Correlation coefficient holiday and context variables

Variables News Sport Festivals
ρS P ρS P ρS P
Public Holidays -0.006 0.781 -0.003 0.89 -0.022 0.266
School Holidays -0.229 0.0 -0.173 0.0 0.328 0.0
Const Ind Holidays -0.208 0.0 -0.142 0.0 -0.017 0.387
weekend -0.086 0.0 0.086 0.0 †0.376 0.0

Table I.19: Pearson Correlation coefficient festival and context variables

Variables News Sport Festivals
ρP P ρP P ρP P
Dance Valley -0.074 0.0 -0.037 0.062 0.003 0.897
Sensation nan 1.0 nan 1.0 nan 1.0
Pukkelpop -0.012 0.549 -0.006 0.765 0.085 0.0
Pinkpop 0.055 0.005 -0.004 0.837 0.0 0.989
Lowlands -0.008 0.705 -0.004 0.85 0.062 0.002

Table I.20: Spearman Correlation coefficient festival and context variables

Variables News Sport Festivals
ρS P ρS P ρS P
Dance Valley -0.077 0.0 -0.039 0.052 0.383 0.0
Sensation nan nan nan nan nan nan
Pukkelpop -0.012 0.534 -0.006 0.757 0.097 0.0
Pinkpop 0.085 0.0 -0.004 0.827 0.043 0.03
Lowlands -0.008 0.695 -0.004 0.845 0.061 0.002

Appendix J

Correlations online variables
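
As a reading aid for the ρP and ρS columns in this appendix, the following minimal Python sketch shows how a Pearson and a Spearman coefficient can be computed for two equally long series. It is an illustrative sketch rather than the implementation used for the tables: the function names `pearson`, `ranks` and `spearman` are hypothetical, ties are handled with average ranks as an assumption, and P-values are not computed. The sketch returns nan when one of the series is constant, because the coefficient is undefined for zero variance.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long series (nan if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # zero variance: coefficient is undefined
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed series."""
    return pearson(ranks(x), ranks(y))
```

Spearman only depends on the ordering of the values, so a monotone but non-linear relation such as `spearman([1, 2, 3, 4], [1, 4, 9, 16])` still yields a perfect rank correlation, whereas the Pearson coefficient of the same series is below one.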

Table J.1: Average cross correlations between online variables Dance Valley

Variable 1 Variable 2 ρP P-value ρS P-value
Emotionality Newsworthiness M -0.319 0.0 M -0.339 0.0
Newsworthiness Intensity M 0.283 0.0 M 0.231 0.0
Newsworthiness Readability M -0.181 0.0
Originality Intensity M -0.188 0.0 M -0.194 0.0
Originality Newsworthiness M -0.306 0.0
Originality Readability M 0.152 0.0
Originality Sentiment M -0.189 0.0 M -0.162 0.0
Readability Intensity M 0.328 0.0 N 0.488 0.0
Betweenness centr. Closeness centr. M 0.35 0.0 M 0.351 0.0
Betweenness centr. Experience M -0.265 0.0
Betweenness centr. Inclusiveness N -0.38 0.0 M -0.311 0.0
Betweenness centr. Popularity M -0.224 0.0
Closeness centr. Inclusiveness M -0.152 0.0 M -0.169 0.0
Closeness centr. Social Equality M -0.266 0.0 M -0.242 0.0
Degree centr. Betweenness centr. N 0.589 0.0 N 0.535 0.0
Degree centr. Closeness centr. N 0.376 0.0 N 0.48 0.0
Degree centr. Experience M -0.18 0.0 M -0.18 0.0
Degree centr. Inclusiveness N -0.514 0.0 M -0.316 0.0
Degree centr. Popularity M -0.15 0.0
Degree centr. Social Equality M 0.211 0.0
Experience People of int. M -0.183 0.0 M -0.197 0.0
Inclusiveness People of int. M 0.186 0.0 M 0.192 0.0
Inclusiveness Social Equality M -0.248 0.0
Popularity Experience  0.783 0.0  0.69 0.0
Popularity People of int. M -0.168 0.0
Betweenness centr. Intensity M -0.215 0.0 M -0.267 0.0
Betweenness centr. Readability M -0.195 0.0 M -0.2 0.0
Closeness centr. Readability M -0.227 0.0
Degree centr. Emotionality M -0.237 0.0 M -0.282 0.0
Degree centr. Intensity M -0.172 0.0 M -0.204 0.0
Experience Intensity N 0.378 0.0 M 0.285 0.0
Experience Newsworthiness N 0.611 0.0 N 0.492 0.0
Experience Originality M -0.19 0.0 N -0.506 0.0
Experience Sentiment M -0.239 0.0
Inclusiveness Emotionality M 0.303 0.0 M 0.296 0.0
Inclusiveness Intensity M 0.286 0.0 M 0.333 0.0
Inclusiveness Newsworthiness M -0.298 0.0 M -0.265 0.0
Inclusiveness Readability M 0.305 0.0 N 0.374 0.0
People of int. Intensity M 0.224 0.0

Popularity Intensity M 0.358 0.0 M 0.201 0.0
Popularity Newsworthiness N 0.554 0.0 N 0.455 0.0
Popularity Originality M -0.151 0.0 M -0.276 0.0
Popularity Sentiment M -0.339 0.0 M -0.161 0.0
Social Equality Emotionality M -0.215 0.0 M -0.171 0.0
Social Equality Newsworthiness M 0.154 0.0 M 0.157 0.0
Social Equality Originality M -0.288 0.0 M -0.206 0.0
Dist Event - Res Distance Users  0.888 0.0  0.933 0.0
Dist Event - Res Emotionality M 0.224 0.0 M 0.293 0.0
Dist Event - Res Intensity N -0.382 0.0 N -0.426 0.0
Dist Event - Res Newsworthiness M -0.302 0.0 M -0.26 0.0
Dist Event - Res Originality M 0.337 0.0
Distance Users Emotionality M 0.299 0.0 M 0.285 0.0
Distance Users Intensity M -0.349 0.0 N -0.431 0.0
Distance Users Newsworthiness N -0.373 0.0 M -0.345 0.0
Distance Users Originality M 0.236 0.0
Dist Event - Res Degree centr. M -0.153 0.0
Dist Event - Res Experience M -0.186 0.0
Dist Event - Res People of int. M -0.157 0.0
Dist Event - Res Social Equality M -0.187 0.0 M -0.184 0.0
Dist. Event - User Betweenness centr. M -0.217 0.0
Dist. Event - User Closeness centr. M -0.192 0.0
Distance Users Experience M -0.258 0.0
Distance Users Social Equality M -0.285 0.0 M -0.293 0.0

Table J.2: Average cross correlations between online variables Sensation

Variable 1 Variable 2 ρP P-value ρS P-value
Emotionality Intensity M -0.168 0.0 M -0.165 0.0
Emotionality Newsworthiness M -0.214 0.0 M -0.172 0.0
Emotionality Readability M -0.22 0.0 M -0.224 0.0
Originality Emotionality M -0.213 0.0
Originality Intensity M 0.241 0.0 N 0.439 0.0
Originality Newsworthiness M 0.302 0.0 M 0.196 0.0
Originality Readability M 0.308 0.0
Readability Intensity N 0.362 0.0 N 0.53 0.0
Sentiment Intensity M 0.293 0.0 N 0.392 0.0
Sentiment Readability M 0.212 0.0 M 0.242 0.0
Betweenness centr. Closeness centr. M 0.163 0.0 M 0.155 0.0
Betweenness centr. Inclusiveness M -0.162 0.0 M -0.203 0.0
Closeness centr. Popularity M -0.16 0.0 M -0.192 0.0
Closeness centr. Social Equality M 0.284 0.0 M 0.185 0.0
Degree centr. Betweenness centr. N 0.514 0.0 N 0.379 0.0
Degree centr. Closeness centr. N 0.448 0.0 N 0.372 0.0
Degree centr. Experience M 0.151 0.0
Degree centr. People of int. M -0.163 0.0 M -0.228 0.0
Degree centr. Popularity M -0.304 0.0 M -0.319 0.0
Degree centr. Social Equality N 0.555 0.0 N 0.466 0.0
Inclusiveness Experience M -0.155 0.0 M -0.215 0.0
Inclusiveness Popularity N -0.439 0.0 N -0.398 0.0
Inclusiveness Social Equality M 0.177 0.0
Popularity Experience M 0.161 0.0
Social Equality Experience M 0.222 0.0 M 0.15 0.0
Social Equality Popularity  -0.675 0.0  -0.716 0.0
Closeness centr. Originality M 0.238 0.0
Degree centr. Intensity M -0.349 0.0 N -0.499 0.0
Degree centr. Newsworthiness M 0.2 0.0
Degree centr. Originality M -0.216 0.0

Degree centr. Readability M -0.274 0.0 M -0.326 0.0
Degree centr. Sentiment M -0.214 0.0 M -0.196 0.0
Experience Intensity M -0.214 0.0
Experience Newsworthiness M 0.176 0.0
Experience Originality M -0.15 0.0 M -0.155 0.0
Inclusiveness Intensity M -0.251 0.0
Inclusiveness Originality M -0.177 0.0 M -0.341 0.0
People of int. Intensity M 0.234 0.0
People of int. Readability M 0.223 0.0 M 0.276 0.0
People of int. Sentiment M 0.187 0.0
Popularity Intensity M 0.188 0.0 N 0.538 0.0
Popularity Newsworthiness M -0.256 0.0 M -0.22 0.0
Popularity Originality M -0.152 0.0 M 0.182 0.0
Popularity Readability M 0.322 0.0 M 0.275 0.0
Popularity Sentiment M 0.342 0.0 M 0.349 0.0
Social Equality Emotionality M 0.168 0.0
Social Equality Intensity N -0.418 0.0  -0.694 0.0
Social Equality Readability N -0.386 0.0 N -0.433 0.0
Social Equality Sentiment N -0.363 0.0 N -0.374 0.0
Dist Event - Res Distance Users  0.898 0.0  0.871 0.0
Dist Event - Res Emotionality M 0.22 0.0 M 0.177 0.0
Dist Event - Res Intensity M -0.27 0.0 N -0.563 0.0
Dist Event - Res Originality M -0.238 0.0
Dist Event - Res Readability N -0.402 0.0 N -0.402 0.0
Dist Event - Res Sentiment M -0.207 0.0 M -0.275 0.0
Dist. Event - User Originality M 0.207 0.0
Distance Users Emotionality M 0.216 0.0
Distance Users Intensity M -0.174 0.0 N -0.542 0.0
Distance Users Readability M -0.294 0.0 M -0.337 0.0
Distance Users Sentiment M -0.163 0.0 M -0.327 0.0
Dist Event - Res Degree centr. M 0.242 0.0 N 0.368 0.0
Dist Event - Res Inclusiveness M 0.202 0.0 M 0.205 0.0
Dist Event - Res People of int. M -0.174 0.0
Dist Event - Res Popularity M -0.295 0.0 N -0.387 0.0
Dist Event - Res Social Equality M 0.326 0.0 N 0.507 0.0
Distance Users Degree centr. M 0.312 0.0
Distance Users Inclusiveness M 0.171 0.0 M 0.222 0.0
Distance Users People of int. M -0.176 0.0
Distance Users Popularity M -0.178 0.0 N -0.4 0.0
Distance Users Social Equality M 0.198 0.0 N 0.473 0.0

Table J.3: Average cross correlations between online variables Pukkelpop

Variable 1 Variable 2 ρP P-value ρS P-value
Emotionality Intensity M -0.209 0.0 M -0.271 0.0
Emotionality Newsworthiness M -0.332 0.0 M -0.27 0.0
Emotionality Readability M -0.154 0.0
Newsworthiness Intensity M 0.246 0.0 M 0.338 0.0
Newsworthiness Readability M 0.2 0.0 M 0.217 0.0
Originality Newsworthiness M 0.198 0.0 M 0.334 0.0
Originality Sentiment M -0.169 0.0
Readability Intensity M 0.252 0.0 N 0.378 0.0
Sentiment Intensity M -0.16 0.0
Degree centr. Inclusiveness N 0.388 0.0 N 0.375 0.0
Degree centr. People of int. M -0.274 0.0 M -0.318 0.0
Degree centr. Popularity M -0.225 0.0
Degree centr. Social Equality M 0.339 0.0 M 0.352 0.0
Popularity Experience N 0.37 0.0 M 0.306 0.0

Popularity People of int. M 0.296 0.0
Social Equality People of int. N -0.365 0.0 M -0.36 0.0
Social Equality Popularity M -0.189 0.0 M -0.323 0.0
Degree centr. Emotionality M 0.178 0.0
Degree centr. Intensity M -0.168 0.0 M -0.225 0.0
Degree centr. Newsworthiness M -0.218 0.0 M -0.182 0.0
Degree centr. Readability M -0.187 0.0 M -0.19 0.0
Experience Intensity M -0.184 0.0 M -0.273 0.0
Experience Readability M -0.178 0.0 M -0.237 0.0
Inclusiveness Intensity M -0.225 0.0
Inclusiveness Newsworthiness M -0.175 0.0 M -0.163 0.0
Popularity Newsworthiness M 0.205 0.0 M 0.273 0.0
Social Equality Emotionality M 0.158 0.0
Social Equality Intensity M -0.244 0.0
Social Equality Newsworthiness M -0.297 0.0 M -0.301 0.0
Social Equality Readability M -0.305 0.0 M -0.319 0.0
Dist Event - Res Distance Users M 0.196 0.0 M 0.223 0.0
Dist. Event - User Intensity M -0.185 0.0 M -0.281 0.0
Dist. Event - User Originality M 0.165 0.0
Dist. Event - User Readability M -0.195 0.0 M -0.303 0.0
Distance Users Originality M 0.188 0.0
Dist. Event - User Degree centr. M 0.167 0.0
Dist. Event - User Experience M 0.172 0.0
Dist. Event - User Social Equality M 0.195 0.0
Distance Users Experience M 0.169 0.0

Table J.4: Average cross correlations between online variables Pinkpop

Variable 1 Variable 2 ρP P-value ρS P-value
Emotionality Intensity M -0.226 0.0 M -0.248 0.0
Emotionality Newsworthiness M -0.184 0.0 M -0.192 0.0
Newsworthiness Readability M 0.244 0.0 M 0.18 0.0
Originality Newsworthiness N 0.52 0.0 N 0.441 0.0
Originality Readability M 0.328 0.0 M 0.221 0.0
Readability Intensity M 0.263 0.0
Degree centr. Experience M 0.312 0.0
Degree centr. Inclusiveness M 0.256 0.0
Degree centr. People of int. M -0.164 0.0 M -0.206 0.0
Degree centr. Popularity M -0.239 0.0
Inclusiveness Experience M -0.211 0.0 M -0.29 0.0
Inclusiveness Popularity M -0.191 0.0
Social Equality People of int. M -0.204 0.0 M -0.227 0.0
Social Equality Popularity M -0.212 0.0
Experience Intensity M -0.206 0.0 M -0.322 0.0
Experience Newsworthiness M 0.223 0.0
Experience Sentiment M -0.158 0.0 M -0.172 0.0
Inclusiveness Intensity N 0.486 0.0 N 0.554 0.0
Inclusiveness Readability M 0.163 0.0
Inclusiveness Sentiment M 0.172 0.0 M 0.164 0.0
Social Equality Intensity M -0.18 0.0
Social Equality Newsworthiness M -0.234 0.0 M -0.247 0.0
Social Equality Originality M -0.155 0.0
Social Equality Readability M -0.2 0.0 M -0.203 0.0
Dist Event - Res Distance Users N 0.648 0.0
Dist Event - Res Intensity M 0.28 0.0
Dist Event - Res Degree centr. M 0.19 0.0
Dist Event - Res Inclusiveness M 0.218 0.0
Distance Users Social Equality M -0.164 0.0

Table J.5: Average cross correlations between online variables Lowlands

Variable 1 Variable 2 ρP P-value ρS P-value
Emotionality Intensity M -0.163 0.0 M -0.164 0.0
Emotionality Newsworthiness M -0.215 0.0 M -0.203 0.0
Originality Newsworthiness M 0.23 0.0 M 0.228 0.0
Readability Intensity M 0.262 0.0 M 0.303 0.0
Degree centr. People of int. M -0.163 0.0
Degree centr. Popularity M -0.15 0.0
Degree centr. Social Equality M 0.285 0.0 M 0.34 0.0
Inclusiveness Social Equality M -0.332 0.0 M -0.329 0.0
Popularity Experience M 0.185 0.0
Social Equality People of int. M -0.214 0.0 M -0.214 0.0
Degree centr. Intensity M -0.235 0.0
Experience Newsworthiness M 0.239 0.0
Experience Sentiment M -0.181 0.0
Inclusiveness Intensity N 0.445 0.0 N 0.464 0.0
Inclusiveness Readability M 0.157 0.0 M 0.174 0.0
People of int. Intensity M 0.224 0.0 M 0.229 0.0
Popularity Newsworthiness M 0.151 0.0
Social Equality Intensity M -0.334 0.0 N -0.397 0.0
Social Equality Newsworthiness M -0.187 0.0 M -0.202 0.0
Dist Event - Res Distance Users N 0.655 0.0 M 0.244 0.0
Distance Users Intensity M 0.182 0.0
Distance Users Degree centr. M -0.17 0.0 M -0.23 0.0

Appendix K

Average correlations online variables
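
The µ(ρ) and σ columns in this appendix summarise a coefficient over the five festivals. As a hedged illustration, the sketch below averages the Pearson coefficients for Dist Event - Res and Distance Users as they appear in Tables J.1 to J.5 (the single Pinkpop value is taken to be the Pearson entry). The mean reproduces the 0.657 listed in Table K.1, and the population standard deviation comes out near 0.255, close to the listed 0.254, which is consistent with the inputs being rounded to three decimals. Using the population rather than the sample standard deviation is an assumption inferred from these numbers, and the variable names are hypothetical. Appendix J appears to list only coefficients with a magnitude of roughly 0.15 or more, so most other entries in this appendix cannot be recomputed from those tables alone.

```python
import statistics

# Pearson coefficients for "Dist Event - Res" vs "Distance Users",
# copied from Tables J.1 to J.5 (one value per festival).
rho_per_festival = {
    "Dance Valley": 0.888,
    "Sensation": 0.898,
    "Pukkelpop": 0.196,
    "Pinkpop": 0.648,   # assumed to be the Pearson column
    "Lowlands": 0.655,
}

values = list(rho_per_festival.values())

# Mean coefficient over the festivals, i.e. the mu(rho) column.
mu = statistics.mean(values)

# Spread over the festivals; population form is an assumption.
sigma = statistics.pstdev(values)

print(f"mu(rho_P) = {mu:.3f}, sigma = {sigma:.3f}")
```

A large σ relative to µ, as in this pair, indicates that the relation differs considerably between festivals even though the average correlation is moderate.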

Table K.1: Pearson Average correlations for place related variables

Variables Dist Event - Res Dist. Event - User Distance Users
µ(ρP) σ µ(ρP) σ µ(ρP) σ
Dist Event - Res - 0.003 0.035 N 0.657 0.254
Dist. Event - User 0.003 0.035 - -0.014 0.047
Distance Users N 0.657 0.254 -0.014 0.047 -

Table K.2: Spearman Average correlations for place related variables

Variables Dist Event - Res Dist. Event - User Distance Users
µ(ρS) σ µ(ρS) σ µ(ρS) σ
Dist Event - Res - 0.023 0.025 N 0.477 0.35
Dist. Event - User 0.023 0.025 - -0.035 0.032
Distance Users N 0.477 0.35 -0.035 0.032 -

Table K.3: Pearson Average correlations for people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Degree centr. - N 0.552 0.038 N 0.412 0.036 0.017 0.294 M 0.292 0.16 -0.115 0.103 0.069 0.159 M -0.16 0.069
Betweenness centr. N 0.552 0.038 - M 0.257 0.094 M -0.271 0.109 0.095 0.02 -0.027 0.073 -0.046 0.066 -0.038 0.016
Closeness centr. N 0.412 0.036 M 0.257 0.094 - -0.13 0.022 0.009 0.275 -0.128 0.032 -0.077 0.06 0.01 0.02
Inclusiveness 0.017 0.294 M -0.271 0.109 -0.13 0.022 - -0.082 0.182 -0.108 0.174 -0.052 0.125 0.021 0.102
Social Equality M 0.292 0.16 0.095 0.02 0.009 0.275 -0.082 0.182 - -0.217 0.236 0.065 0.09 -0.186 0.117
Popularity -0.115 0.103 -0.027 0.073 -0.128 0.032 -0.108 0.174 -0.217 0.236 - M 0.311 0.254 -0.016 0.085
Experience 0.069 0.159 -0.046 0.066 -0.077 0.06 -0.052 0.125 0.065 0.09 M 0.311 0.254 - -0.058 0.078
People of int. M -0.16 0.069 -0.038 0.016 0.01 0.02 0.021 0.102 -0.186 0.117 -0.016 0.085 -0.058 0.078 -

Table K.4: Spearman Average correlations for people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Degree centr. - N 0.457 0.078 N 0.426 0.054 0.091 0.234 0.267 0.156 M -0.217 0.063 -0.019 0.108 -0.189 0.095
Betweenness centr. N 0.457 0.078 - M 0.253 0.098 M -0.257 0.054 -0.021 0.029 -0.088 0.135 -0.117 0.148 -0.03 0.022
Closeness centr. N 0.426 0.054 M 0.253 0.098 - -0.12 0.05 -0.028 0.214 -0.109 0.083 -0.061 0.021 0.014 0.006
Inclusiveness 0.091 0.234 M -0.257 0.054 -0.12 0.05 - -0.036 0.175 -0.157 0.137 -0.089 0.147 0.019 0.111
Social Equality 0.267 0.156 -0.021 0.029 -0.028 0.214 -0.036 0.175 - M -0.292 0.227 0.104 0.062 -0.179 0.126
Popularity M -0.217 0.063 -0.088 0.135 -0.109 0.083 -0.157 0.137 M -0.292 0.227 - M 0.281 0.215 0.087 0.124
Experience -0.019 0.108 -0.117 0.148 -0.061 0.021 -0.089 0.147 0.104 0.062 M 0.281 0.215 - -0.047 0.095
People of int. -0.189 0.095 -0.03 0.022 0.014 0.006 0.019 0.111 -0.179 0.126 0.087 0.124 -0.047 0.095 -

Table K.5: Pearson Average correlations for content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Originality - -0.069 0.041 -0.102 0.082 M 0.189 0.272 0.096 0.138 0.004 0.144
Emotionality -0.069 0.041 - 0.019 0.05 M -0.253 0.061 -0.079 0.098 M -0.167 0.054
Sentiment -0.102 0.082 0.019 0.05 - -0.088 0.044 0.039 0.101 0.057 0.133
Newsworthiness M 0.189 0.272 M -0.253 0.061 -0.088 0.044 - 0.05 0.17 0.071 0.175
Readability 0.096 0.138 -0.079 0.098 0.039 0.101 0.05 0.17 - M 0.27 0.074
Intensity 0.004 0.144 M -0.167 0.054 0.057 0.133 0.071 0.175 M 0.27 0.074 -

Table K.6: Spearman Average correlations for content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Originality - -0.075 0.074 -0.008 0.087 M 0.219 0.183 0.088 0.15 0.085 0.207
Emotionality -0.075 0.074 - 0.013 0.038 M -0.235 0.061 -0.081 0.095 -0.172 0.091
Sentiment -0.008 0.087 0.013 0.038 - -0.051 0.051 0.028 0.111 0.081 0.177
Newsworthiness M 0.219 0.183 M -0.235 0.061 -0.051 0.051 - 0.072 0.11 0.112 0.169
Readability 0.088 0.15 -0.081 0.095 0.028 0.111 0.072 0.11 - N 0.392 0.103
Intensity 0.085 0.207 -0.172 0.091 0.081 0.177 0.112 0.169 N 0.392 0.103 -

Table K.7: Pearson Average correlations for place and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Dist Event - Res 0.061 0.105 0.022 0.026 -0.059 0.042 0.074 0.068 0.024 0.17 -0.078 0.109 -0.019 0.064 -0.057 0.066
Dist. Event - User 0.012 0.089 -0.14 0.077 -0.06 0.132 -0.055 0.063 0.032 0.076 0.047 0.037 0.06 0.066 0.016 0.05
Distance Users -0.023 0.114 0.03 0.002 -0.036 0.026 0.059 0.077 -0.065 0.155 -0.042 0.083 -0.008 0.107 -0.01 0.06

Table K.8: Spearman Average correlations for place and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Dist Event - Res 0.102 0.177 -0.016 0.026 -0.054 0.056 0.096 0.096 0.058 0.242 M -0.155 0.12 -0.027 0.099 -0.074 0.084
Dist. Event - User 0.003 0.055 -0.085 0.017 0.036 0.0 -0.042 0.061 0.049 0.078 0.004 0.067 0.029 0.072 -0.014 0.054
Distance Users -0.021 0.186 -0.021 0.07 -0.045 0.01 0.07 0.091 -0.038 0.265 -0.1 0.164 -0.023 0.16 -0.013 0.106

Table K.9: Pearson Average correlations for place and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Dist Event - Res 0.075 0.143 0.08 0.119 -0.063 0.083 -0.075 0.117 -0.091 0.161 -0.135 0.172
Dist. Event - User 0.102 0.024 0.021 0.053 0.006 0.063 0.05 0.1 -0.045 0.106 -0.046 0.104
Distance Users 0.082 0.11 0.089 0.146 -0.059 0.08 -0.055 0.164 -0.038 0.147 -0.117 0.14

Table K.10: Spearman Average correlations for place and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Dist Event - Res -0.004 0.134 0.057 0.153 -0.061 0.112 -0.062 0.114 -0.079 0.163 M -0.164 0.3
Dist. Event - User 0.126 0.061 0.029 0.063 -0.049 0.102 0.039 0.092 -0.076 0.116 -0.062 0.138
Distance Users -0.001 0.121 0.047 0.146 -0.076 0.144 -0.018 0.167 -0.037 0.173 M -0.162 0.286

Table K.11: Pearson Average correlations for people and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Degree centr. 0.047 0.014 0.012 0.137 -0.054 0.088 0.029 0.132 -0.121 0.097 M -0.154 0.13
Betweenness centr. 0.023 0.04 -0.047 0.02 -0.023 0.048 0.024 0.114 -0.114 0.081 -0.069 0.146
Closeness centr. 0.123 0.115 0.015 0.022 -0.012 0.057 0.077 0.003 -0.101 0.047 -0.029 0.049
Inclusiveness -0.003 0.098 0.064 0.15 0.035 0.086 -0.104 0.14 0.074 0.168 M 0.191 0.278
Social Equality -0.069 0.116 0.045 0.133 -0.067 0.155 -0.105 0.172 M -0.169 0.18 -0.183 0.167
Popularity -0.072 0.069 -0.07 0.04 -0.008 0.216 0.141 0.259 0.079 0.129 0.115 0.141
Experience -0.092 0.077 -0.013 0.069 -0.112 0.105 M 0.242 0.199 -0.046 0.113 -0.032 0.23
People of int. -0.023 0.036 -0.006 0.061 0.02 0.063 0.024 0.092 0.104 0.061 0.093 0.074

Table K.12: Spearman Average correlations for people and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Degree centr. -0.001 0.142 0.004 0.148 -0.034 0.099 0.01 0.126 -0.131 0.124 M -0.22 0.179
Betweenness centr. 0.095 0.013 -0.076 0.0 0.011 0.031 -0.008 0.079 -0.088 0.113 -0.074 0.194
Closeness centr. 0.082 0.033 0.029 0.024 -0.01 0.03 0.094 0.011 -0.129 0.098 -0.042 0.038
Inclusiveness -0.12 0.116 0.066 0.144 0.022 0.085 -0.094 0.12 0.094 0.191 M 0.193 0.325
Social Equality -0.128 0.141 0.065 0.122 -0.071 0.157 -0.099 0.188 M -0.178 0.201 M -0.277 0.271
Popularity -0.023 0.147 -0.053 0.058 0.008 0.177 M 0.154 0.222 0.087 0.107 0.138 0.226
Experience M -0.185 0.164 0.026 0.062 -0.096 0.055 0.175 0.171 -0.084 0.107 -0.065 0.231
People of int. 0.117 0.154 -0.019 0.039 0.002 0.105 0.037 0.098 0.121 0.081 M 0.182 0.063

Appendix L

Correlations online and real world variables
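
The lag column in the tables of this appendix suggests that each coefficient was evaluated after shifting one series relative to the other. The following sketch shows one common way to do this: correlate x[t] with y[t + lag] over the overlapping samples and keep the lag with the largest absolute coefficient. The integer lag in samples, the function names `lagged_pearson` and `best_lag`, and the selection rule are assumptions made for illustration; the tables express lags in a time format whose conversion to samples is not reproduced here, and the sketch is not the procedure used to produce the tables.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long series (nan if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def lagged_pearson(x, y, lag):
    """Correlate x[t] with y[t + lag]; x and y must have the same length."""
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    return pearson(xs, ys)


def best_lag(x, y, lags):
    """Return (lag, rho) with the largest |rho| among the candidate lags."""
    scored = [(lag, lagged_pearson(x, y, lag)) for lag in lags]
    scored = [(lag, rho) for lag, rho in scored if not math.isnan(rho)]
    return max(scored, key=lambda item: abs(item[1]))


# Hypothetical example: y repeats x two samples later.
x = [0, 1, 0, 0, 2, 0, 0, 1, 0, 0]
y = [0, 0] + x[:-2]
lag, rho = best_lag(x, y, range(0, 4))
```

Scanning non-negative and non-positive lags separately, for example `range(0, 4)` and `range(-3, 1)`, would give one best coefficient per direction, which is one possible reading of the paired rows with lags such as 0:0:0 and -0:0:0.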

Table L.1: Average cross correlations between online and offline variables Dance Valley

Variable 1 Variable 2 ρP P-value lag
Festivals Emotionality M 0.185 0.0 30:49:39
Festivals Readability M -0.249 0.0 -30:49:39
News Intensity M -0.154 0.0 0:0:0
News Intensity M -0.155 0.0 -12:45:28
News Readability M -0.315 0.0 27:43:39
News Readability M -0.211 0.0 -0:0:0
Sport Newsworthiness M -0.17 0.0 24:35:6
Sport Originality M 0.27 0.0 30:49:39
Festivals Closeness centr. M 0.232 0.0 0:0:0
Festivals Closeness centr. M 0.239 0.0 -10:33:6
Festivals Social Equality M -0.211 0.0 0:0:0
Festivals Social Equality M -0.231 0.0 -9:26:7
News Betweenness centr. M 0.192 0.0 30:49:39
News Closeness centr. M 0.185 0.0 -24:35:6
News Dist Event - Res M 0.164 0.0 -30:49:39
News Dist. Event - User M -0.33 0.0 30:49:39
News Distance Users M 0.19 0.0 -16:3:38
Sport Dist Event - Res M 0.254 0.0 25:38:12
Sport Dist Event - Res M 0.152 0.0 -0:0:0
Dance Valley Intensity N 0.412 0.0 0:0:0
Dance Valley Intensity N 0.442 0.0 -30:49:39
Dance Valley Newsworthiness M 0.181 0.0 30:49:39
Dance Valley Newsworthiness M 0.191 0.0 -21:24:58
Dance Valley Sentiment M 0.232 0.0 0:0:0
Dance Valley Sentiment M 0.331 0.0 -30:49:39
Pinkpop Originality M 0.283 0.0 0:0:0
Pinkpop Originality M 0.336 0.0 -18:13:18
Dance Valley Betweenness centr. M -0.247 0.0 10:33:6
Dance Valley Betweenness centr. M -0.243 0.0 -0:0:0
Dance Valley Experience M 0.198 0.0 0:0:0
Dance Valley Experience M 0.2 0.0 -12:45:28
Dance Valley Dist Event - Res M -0.293 0.0 28:45:41
Dance Valley Dist Event - Res M -0.27 0.0 -0:0:0
Dance Valley Distance Users M -0.322 0.0 27:43:39
Dance Valley Distance Users M -0.298 0.0 -7:10:16

Const. Ind. holidays Emotionality M 0.255 0.0 23:31:50
Const. Ind. holidays Emotionality M 0.251 0.0 -0:0:0
Const. Ind. holidays Intensity M 0.325 0.0 0:0:0
Const. Ind. holidays Intensity M 0.334 0.0 -30:49:39
Const. Ind. holidays Originality M -0.203 0.0 30:49:39
Const. Ind. holidays Originality M -0.199 0.0 -0:0:0
Const. Ind. holidays Readability N 0.502 0.0 0:0:0
Const. Ind. holidays Readability N 0.502 0.0 -0:0:0
School Holidays Emotionality M 0.227 0.0 0:0:0
School Holidays Emotionality M 0.227 0.0 -0:0:0
School Holidays Intensity N 0.437 0.0 0:0:0
School Holidays Intensity N 0.437 0.0 -0:0:0
School Holidays Originality M -0.311 0.0 0:0:0
School Holidays Originality M -0.312 0.0 -10:33:6
School Holidays Readability N 0.401 0.0 0:0:0
School Holidays Readability N 0.401 0.0 -0:0:0
School Holidays Sentiment M 0.202 0.0 0:0:0
School Holidays Sentiment M 0.202 0.0 -0:0:0
weekend Emotionality M -0.243 0.0 0:0:0
weekend Emotionality M -0.255 0.0 -10:33:6
weekend Intensity N 0.647 0.0 30:49:39
weekend Intensity N 0.634 0.0 -0:0:0
weekend Newsworthiness M 0.327 0.0 23:31:50
weekend Newsworthiness M 0.325 0.0 -8:18:14
weekend Originality M -0.164 0.0 -30:49:39
weekend Readability M 0.291 0.0 0:0:0
weekend Readability M 0.291 0.0 -1:16:49
weekend Sentiment M 0.235 0.0 6:1:16
weekend Sentiment M 0.229 0.0 -0:0:0
Const. Ind. holidays Betweenness centr. M -0.217 0.0 0:0:0
Const. Ind. holidays Betweenness centr. M -0.217 0.0 -0:0:0
Const. Ind. holidays Degree centr. M -0.322 0.0 0:0:0
Const. Ind. holidays Degree centr. M -0.322 0.0 -0:0:0
Const. Ind. holidays Inclusiveness N 0.403 0.0 0:0:0
Const. Ind. holidays Inclusiveness N 0.411 0.0 -6:1:16
Const. Ind. holidays People of int. N 0.397 0.0 0:0:0
Const. Ind. holidays People of int. N 0.397 0.0 -0:0:0
School Holidays Betweenness centr. M -0.192 0.0 0:0:0
School Holidays Betweenness centr. M -0.192 0.0 -0:0:0
School Holidays Degree centr. M -0.338 0.0 0:0:0
School Holidays Degree centr. M -0.338 0.0 -0:0:0
School Holidays Experience M 0.173 0.0 0:0:0
School Holidays Experience M 0.183 0.0 -27:43:39
School Holidays Inclusiveness N 0.451 0.0 0:0:0
School Holidays Inclusiveness N 0.453 0.0 -1:16:49
School Holidays People of int. M 0.235 0.0 0:0:0
School Holidays People of int. M 0.235 0.0 -0:0:0
weekend Betweenness centr. M -0.228 0.0 0:0:0
weekend Betweenness centr. M -0.228 0.0 -0:0:0
weekend Experience M 0.359 0.0 0:0:0
weekend Experience N 0.391 0.0 -26:40:54
weekend Inclusiveness M 0.254 0.0 30:49:39
weekend Inclusiveness M 0.181 0.0 -0:0:0
weekend Popularity M 0.259 0.0 0:0:0
weekend Popularity M 0.298 0.0 -30:49:39
Const. Ind. holidays Dist Event - Res M -0.326 0.0 14:57:43
Const. Ind. holidays Dist Event - Res M -0.346 0.0 -30:49:39
Const. Ind. holidays Dist. Event - User M 0.16 0.0 0:0:0
Const. Ind. holidays Dist. Event - User M 0.193 0.0 -9:26:7

Const. Ind. holidays Distance Users M -0.152 0.0 -4:52:10
School Holidays Dist Event - Res N -0.415 0.0 30:49:39
School Holidays Dist Event - Res N -0.411 0.0 -27:43:39
School Holidays Distance Users M -0.196 0.0 30:49:39
School Holidays Distance Users M -0.228 0.0 -27:43:39
weekend Dist Event - Res M -0.341 0.0 0:0:0
weekend Dist Event - Res N -0.394 0.0 -24:35:6
weekend Distance Users M -0.357 0.0 16:3:38
weekend Distance Users N -0.419 0.0 -25:38:12
Fog Originality M 0.187 0.0 0:0:0
Fog Originality M 0.188 0.0 -2:28:59
Max Temp Emotionality M 0.279 0.0 0:0:0
Max Temp Emotionality M 0.313 0.0 -29:47:39
Max Temp Intensity M 0.164 0.0 0:0:0
Max Temp Intensity M 0.189 0.0 -30:49:39
Max Temp Newsworthiness M -0.214 0.0 0:0:0
Max Temp Newsworthiness M -0.214 0.0 -0:0:0
Max Temp Readability N 0.399 0.0 0:0:0
Max Temp Readability N 0.399 0.0 -0:0:0
Moon Intensity M -0.17 0.0 30:49:39
Moon Intensity M -0.15 0.0 -30:49:39
Moon Newsworthiness M -0.273 0.0 20:21:22
Moon Newsworthiness M -0.241 0.0 -0:0:0
Moon Originality M -0.222 0.0 -30:49:39
Rain Emotionality M 0.17 0.0 30:49:39
Rain Intensity M -0.237 0.0 9:26:7
Rain Intensity M -0.234 0.0 -0:0:0
Rain Originality M 0.238 0.0 20:21:22
Rain Originality M 0.175 0.0 -0:0:0
Rain Readability M -0.204 0.0 8:18:14
Rain Readability M -0.182 0.0 -0:0:0
Rain Sentiment M -0.194 0.0 8:18:14
Rain Sentiment M -0.186 0.0 -3:40:38
Sun Emotionality M -0.172 0.0 0:0:0
Sun Emotionality M -0.172 0.0 -0:0:0
Sun Newsworthiness M 0.228 0.0 0:0:0
Sun Newsworthiness M 0.228 0.0 -0:0:0
Sun Sentiment N -0.38 0.0 0:0:0
Sun Sentiment N -0.446 0.0 -30:49:39
Temperature Emotionality M 0.163 0.0 0:0:0
Temperature Emotionality M 0.238 0.0 -30:49:39
Temperature Intensity M 0.163 0.0 0:0:0
Temperature Intensity M 0.163 0.0 -0:0:0
Temperature Readability M 0.338 0.0 0:0:0
Temperature Readability M 0.338 0.0 -0:0:0
Temperature Sentiment M -0.199 0.0 -30:49:39
Wind Emotionality M -0.155 0.0 0:0:0
Wind Emotionality M -0.157 0.0 -1:16:49
Wind Intensity M 0.189 0.0 24:35:6
Wind Intensity M 0.185 0.0 -30:49:39
Wind Sentiment M -0.308 0.0 0:0:0
Wind Sentiment M -0.308 0.0 -0:0:0
Max Temp Betweenness centr. M -0.152 0.0 -11:39:16
Max Temp Degree centr. M -0.211 0.0 0:0:0
Max Temp Degree centr. M -0.252 0.0 -9:26:7
Max Temp Inclusiveness M 0.297 0.0 0:0:0
Max Temp Inclusiveness M 0.297 0.0 -0:0:0
Max Temp People of int. M 0.209 0.0 29:47:39
Max Temp People of int. M 0.182 0.0 -0:0:0

159 Table L.1 – continued from previous page Variable 1 Variable 2 ρP P-value lag Max Temp Popularity M -0.156 0.0 0:0:0 Max Temp Popularity M -0.156 0.0 -0:0:0 Moon Betweenness centr. M 0.199 0.0 -30:49:39 Moon Closeness centr. M 0.275 0.0 0:0:0 Moon Closeness centr. M 0.279 0.0 -1:16:49 Moon Degree centr. M 0.296 0.0 -30:49:39 Moon Experience M -0.167 0.0 0:0:0 Moon Experience M -0.171 0.0 -4:52:10 Moon Popularity M -0.17 0.0 0:0:0 Moon Popularity M -0.203 0.0 -30:49:39 Sun Betweenness centr. M 0.199 0.0 3:40:38 Sun Betweenness centr. M 0.241 0.0 -16:3:38 Sun Closeness centr. M 0.166 0.0 0:0:0 Sun Closeness centr. M 0.189 0.0 -13:51:42 Sun Experience M 0.204 0.0 0:0:0 Sun Experience M 0.237 0.0 -30:49:39 Sun Popularity M 0.264 0.0 0:0:0 Sun Popularity M 0.271 0.0 -7:10:16 Temperature People of int. M 0.159 0.0 -30:49:39 Max Temp Dist Event - Res M -0.213 0.0 19:17:39 Max Temp Dist Event - Res M -0.163 0.0 -0:0:0 Moon Dist Event - Res M 0.285 0.0 30:49:39 Moon Dist Event - Res M 0.201 0.0 -0:0:0 Moon Dist. Event - User M -0.262 0.0 0:0:0 Moon Dist. Event - User M -0.262 0.0 -0:0:0 Moon Distance Users M 0.289 0.0 8:18:14 Moon Distance Users M 0.282 0.0 -0:0:0

Table L.2: Average cross correlations between online and offline variables (Sensation)

Variable 1            Variable 2             ρP      P-value  lag
Festivals             Emotionality        M  -0.209  0.0      0:0:0
Festivals             Emotionality        M  -0.209  0.0      -0:0:0
Festivals             Intensity           N  0.658   0.0      0:0:0
Festivals             Intensity           N  0.659   0.0      -23:1:7
Festivals             Newsworthiness      M  -0.156  0.0      35:25:57
Festivals             Readability         N  0.457   0.0      15:22:54
Festivals             Readability         N  0.438   0.0      -0:0:0
Festivals             Sentiment           N  0.379   0.0      0:0:0
Festivals             Sentiment           N  0.379   0.0      -0:0:0
News                  Newsworthiness      M  0.153   0.0      35:25:57
News                  Readability         M  -0.228  0.0      -35:25:57
News                  Sentiment           M  -0.167  0.0      35:25:57
Sport                 Emotionality        M  0.222   0.0      23:1:7
Festivals             Betweenness centr.  M  0.193   0.0      0:0:0
Festivals             Betweenness centr.  M  0.202   0.0      -7:42:23
Festivals             Closeness centr.    M  0.188   0.0      -35:25:57
Festivals             Degree centr.       M  -0.336  0.0      35:25:57
Festivals             Degree centr.       M  -0.249  0.0      -0:0:0
Festivals             Inclusiveness       M  -0.194  0.0      35:25:57
Festivals             Inclusiveness       M  -0.151  0.0      -0:0:0
Festivals             Popularity          N  0.393   0.0      35:25:57
Festivals             Popularity          M  0.341   0.0      -0:0:0
Festivals             Social Equality     N  -0.461  0.0      35:25:57
Festivals             Social Equality     N  -0.406  0.0      -0:0:0
News                  Inclusiveness       M  0.191   0.0      -17:28:4
News                  Popularity          M  -0.202  0.0      35:25:57
News                  Popularity          M  -0.154  0.0      -0:0:0
News                  Social Equality     M  0.196   0.0      35:25:57
Sport                 Degree centr.       M  0.156   0.0      35:25:57
Sport                 People of int.      M  0.314   0.0      13:59:17
Sport                 People of int.      M  0.238   0.0      -0:0:0
Festivals             Dist Event - Res    M  -0.319  0.0      35:25:57
Festivals             Dist Event - Res    M  -0.294  0.0      -0:0:0
Festivals             Dist. Event - User  M  -0.189  0.0      0:42:4
Festivals             Dist. Event - User  M  -0.188  0.0      -0:0:0
Festivals             Distance Users      M  -0.222  0.0      35:25:57
Festivals             Distance Users      M  -0.203  0.0      -0:0:0
News                  Dist Event - Res    M  0.158   0.0      -31:59:19
Sport                 Dist Event - Res    M  0.355   0.0      29:55:9
Sport                 Distance Users      N  0.45    0.0      29:55:9
Lowlands              Newsworthiness      M  0.184   0.0      35:25:57
Lowlands              Originality         N  0.555   0.0      27:50:59
Lowlands              Originality         M  0.346   0.0      -0:0:0
Pukkelpop             Newsworthiness      M  0.222   0.0      31:59:19
Pukkelpop             Originality         N  0.504   0.0      10:30:6
Pukkelpop             Originality         N  0.46    0.0      -0:0:0
Sensation             Readability         M  0.195   0.0      0:0:0
Sensation             Readability         M  0.203   0.0      -16:4:37
Sensation             Sentiment           M  0.159   0.0      0:0:0
Sensation             Sentiment           M  0.161   0.0      -2:48:17
Lowlands              Closeness centr.    M  0.218   0.0      33:22:3
Lowlands              Closeness centr.    M  0.188   0.0      -0:0:0
Lowlands              Degree centr.       M  0.263   0.0      4:54:30
Lowlands              Degree centr.       M  0.243   0.0      -0:0:0
Lowlands              Experience          M  0.171   0.0      -7:42:23
Lowlands              Inclusiveness       M  -0.189  0.0      35:25:57
Lowlands              Social Equality     M  0.254   0.0      0:0:0
Lowlands              Social Equality     M  0.256   0.0      -4:12:26
Pukkelpop             Closeness centr.    M  0.235   0.0      0:0:0
Pukkelpop             Closeness centr.    M  0.235   0.0      -0:0:0
Pukkelpop             Degree centr.       M  0.262   0.0      0:42:4
Pukkelpop             Degree centr.       M  0.262   0.0      -0:0:0
Pukkelpop             Experience          M  0.16    0.0      -23:42:33
Pukkelpop             Inclusiveness       M  -0.188  0.0      20:14:52
Pukkelpop             Inclusiveness       M  -0.159  0.0      -0:0:0
Pukkelpop             Social Equality     M  0.198   0.0      0:0:0
Pukkelpop             Social Equality     M  0.273   0.0      -34:3:21
Sensation             People of int.      M  0.227   0.0      -35:25:57
Const. Ind. holidays  Intensity           M  -0.185  0.0      35:25:57
Const. Ind. holidays  Intensity           M  -0.184  0.0      -26:28:12
Const. Ind. holidays  Sentiment           M  -0.174  0.0      2:48:17
Const. Ind. holidays  Sentiment           M  -0.173  0.0      -0:0:0
Public Holidays       Newsworthiness      M  0.174   0.0      0:0:0
Public Holidays       Newsworthiness      M  0.174   0.0      -0:0:0
School Holidays       Emotionality        M  -0.171  0.0      0:0:0
School Holidays       Emotionality        M  -0.171  0.0      -0:0:0
School Holidays       Intensity           M  0.315   0.0      35:25:57
School Holidays       Intensity           M  0.314   0.0      -0:0:0
School Holidays       Originality         M  0.195   0.0      20:56:31
School Holidays       Originality         M  0.177   0.0      -0:0:0
School Holidays       Readability         M  0.277   0.0      30:36:32
School Holidays       Readability         M  0.26    0.0      -0:0:0
School Holidays       Sentiment           M  0.186   0.0      -35:25:57
weekend               Emotionality        M  -0.18   0.0      0:0:0
weekend               Emotionality        M  -0.186  0.0      -2:48:17
weekend               Intensity           N  0.521   0.0      35:25:57
weekend               Intensity           N  0.524   0.0      -6:18:27
weekend               Readability         M  0.327   0.0      34:44:39
weekend               Readability         M  0.253   0.0      -35:25:57
weekend               Sentiment           M  0.34    0.0      18:51:31
weekend               Sentiment           M  0.304   0.0      -0:0:0
Const. Ind. holidays  Degree centr.       M  0.167   0.0      0:0:0
Const. Ind. holidays  Degree centr.       M  0.209   0.0      -35:25:57
Const. Ind. holidays  Popularity          M  -0.224  0.0      18:51:31
Const. Ind. holidays  Popularity          M  -0.215  0.0      -0:0:0
Const. Ind. holidays  Social Equality     N  0.371   0.0      0:0:0
Const. Ind. holidays  Social Equality     N  0.382   0.0      -35:25:57
School Holidays       Degree centr.       M  -0.156  0.0      35:25:57
weekend               Betweenness centr.  M  0.164   0.0      12:35:37
weekend               Degree centr.       M  -0.247  0.0      -35:25:57
weekend               People of int.      M  0.201   0.0      35:25:57
weekend               People of int.      M  0.172   0.0      -0:0:0
weekend               Popularity          M  0.163   0.0      27:50:59
weekend               Social Equality     M  -0.288  0.0      2:48:17
weekend               Social Equality     M  -0.287  0.0      -0:0:0
Const. Ind. holidays  Dist Event - Res    M  0.17    0.0      35:25:57
Const. Ind. holidays  Dist. Event - User  M  -0.218  0.0      0:0:0
Const. Ind. holidays  Dist. Event - User  M  -0.218  0.0      -0:0:0
School Holidays       Dist Event - Res    M  -0.172  0.0      35:25:57
School Holidays       Dist Event - Res    M  -0.159  0.0      -0:0:0
School Holidays       Dist. Event - User  M  -0.163  0.0      0:0:0
School Holidays       Dist. Event - User  M  -0.163  0.0      -0:0:0
weekend               Dist Event - Res    M  -0.261  0.0      -31:17:56
weekend               Distance Users      M  -0.178  0.0      -31:17:56
Max Temp              Emotionality        M  -0.172  0.0      0:0:0
Max Temp              Emotionality        M  -0.175  0.0      -2:6:12
Max Temp              Intensity           N  0.408   0.0      35:25:57
Max Temp              Intensity           N  0.407   0.0      -0:0:0
Max Temp              Newsworthiness      M  -0.175  0.0      0:0:0
Max Temp              Newsworthiness      M  -0.175  0.0      -0:0:0
Max Temp              Readability         M  0.218   0.0      0:0:0
Max Temp              Readability         M  0.221   0.0      -2:48:17
Max Temp              Sentiment           M  0.28    0.0      18:51:31
Max Temp              Sentiment           M  0.261   0.0      -0:0:0
Moon                  Emotionality        M  -0.152  0.0      -35:25:57
Moon                  Intensity           N  0.366   0.0      35:25:57
Moon                  Intensity           N  0.363   0.0      -0:0:0
Moon                  Readability         M  0.296   0.0      0:0:0
Moon                  Readability         M  0.336   0.0      -35:25:57
Moon                  Sentiment           M  0.204   0.0      0:0:0
Moon                  Sentiment           M  0.257   0.0      -28:32:24
Rain                  Intensity           M  -0.172  0.0      35:25:57
Rain                  Intensity           M  -0.169  0.0      -0:0:0
Rain                  Readability         M  -0.175  0.0      28:32:24
Rain                  Sentiment           M  -0.191  0.0      14:41:6
Sun                   Originality         M  -0.15   0.0      0:0:0
Sun                   Originality         M  -0.175  0.0      -27:50:59
Temperature           Intensity           M  0.325   0.0      16:4:37
Temperature           Intensity           M  0.324   0.0      -0:0:0
Temperature           Newsworthiness      M  -0.162  0.0      0:0:0
Temperature           Newsworthiness      M  -0.176  0.0      -4:12:26
Temperature           Readability         M  0.181   0.0      34:44:39
Temperature           Readability         M  0.16    0.0      -0:0:0
Temperature           Sentiment           M  0.244   0.0      0:0:0
Temperature           Sentiment           M  0.244   0.0      -0:0:0
Wind                  Emotionality        M  0.172   0.0      -31:59:19
Wind                  Intensity           M  -0.337  0.0      0:0:0
Wind                  Intensity           M  -0.337  0.0      -0:0:0
Wind                  Readability         M  -0.199  0.0      0:0:0
Wind                  Readability         M  -0.199  0.0      -0:0:0
Max Temp              Inclusiveness       M  -0.187  0.0      0:0:0
Max Temp              Inclusiveness       M  -0.187  0.0      -0:0:0
Max Temp              Popularity          M  0.226   0.0      0:0:0
Max Temp              Popularity          M  0.226   0.0      -0:42:4
Max Temp              Social Equality     M  -0.151  0.0      23:42:33
Moon                  Degree centr.       M  -0.323  0.0      0:0:0
Moon                  Degree centr.       M  -0.323  0.0      -0:0:0
Moon                  Inclusiveness       M  -0.216  0.0      35:25:57
Moon                  Inclusiveness       M  -0.157  0.0      -0:0:0
Moon                  People of int.      M  0.151   0.0      35:25:57
Moon                  Popularity          N  0.396   0.0      4:54:30
Moon                  Popularity          N  0.393   0.0      -0:0:0
Moon                  Social Equality     N  -0.448  0.0      0:0:0
Moon                  Social Equality     N  -0.448  0.0      -0:0:0
Rain                  Degree centr.       M  0.249   0.0      31:59:19
Rain                  Degree centr.       M  0.164   0.0      -2:48:17
Rain                  Popularity          M  -0.167  0.0      -18:51:31
Rain                  Social Equality     M  0.214   0.0      7:0:26
Rain                  Social Equality     M  0.205   0.0      -11:53:46
Sun                   Experience          M  0.242   0.0      9:6:16
Sun                   Experience          M  0.243   0.0      -20:14:52
Sun                   Inclusiveness       M  -0.186  0.0      5:36:29
Sun                   Inclusiveness       M  -0.158  0.0      -0:0:0
Sun                   Popularity          M  0.254   0.0      5:36:29
Sun                   Popularity          M  0.239   0.0      -0:0:0
Temperature           Experience          M  0.173   0.0      9:6:16
Temperature           Experience          M  0.157   0.0      -0:0:0
Temperature           Inclusiveness       M  -0.221  0.0      5:36:29
Temperature           Inclusiveness       M  -0.205  0.0      -0:0:0
Temperature           Popularity          M  0.285   0.0      6:18:27
Temperature           Popularity          M  0.276   0.0      -0:0:0
Temperature           Social Equality     M  -0.174  0.0      27:50:59
Fog                   Dist Event - Res    M  0.195   0.0      -23:42:33
Fog                   Distance Users      M  0.23    0.0      -30:36:32
Max Temp              Dist Event - Res    M  -0.161  0.0      2:48:17
Max Temp              Dist Event - Res    M  -0.155  0.0      -0:0:0
Moon                  Dist Event - Res    M  -0.301  0.0      0:0:0
Moon                  Dist Event - Res    M  -0.301  0.0      -0:0:0
Moon                  Dist. Event - User  M  0.173   0.0      0:0:0
Moon                  Dist. Event - User  M  0.173   0.0      -0:0:0
Moon                  Distance Users      M  -0.211  0.0      0:0:0
Moon                  Distance Users      M  -0.211  0.0      -0:0:0
Rain                  Dist Event - Res    M  0.248   0.0      32:40:43
Rain                  Dist Event - Res    M  0.166   0.0      -0:0:0
Rain                  Distance Users      M  0.195   0.0      32:40:43
Rain                  Distance Users      M  0.162   0.0      -13:59:17
Wind                  Dist Event - Res    M  0.218   0.0      13:17:27
Wind                  Dist Event - Res    M  0.154   0.0      -0:0:0
Wind                  Distance Users      M  0.252   0.0      27:50:59
Wind                  Distance Users      M  0.152   0.0      -0:0:0

Table L.3: Average cross correlations between online and offline variables (Pukkelpop)

Variable 1            Variable 2             ρP      P-value  lag
Festivals             Emotionality        M  -0.195  0.0      14:57:3
Festivals             Emotionality        M  -0.185  0.0      -0:0:0
Festivals             Originality         M  -0.182  0.0      0:0:0
Festivals             Originality         M  -0.196  0.0      -22:29:27
Festivals             Readability         M  0.231   0.0      0:0:0
Festivals             Readability         M  0.233   0.0      -3:19:55
Festivals             Social Equality     M  -0.159  0.0      5:53:31
Festivals             Social Equality     M  -0.159  0.0      -0:0:0
Festivals             Distance Users      M  -0.153  0.0      0:0:0
Festivals             Distance Users      M  -0.153  0.0      -0:0:0
Dance Valley          Originality         M  0.177   0.0      33:53:48
Lowlands              Emotionality        M  -0.179  0.0      33:53:48
Lowlands              Emotionality        M  -0.17   0.0      -0:0:0
Lowlands              Intensity           M  0.237   0.0      33:53:48
Lowlands              Intensity           M  0.184   0.0      -0:0:0
Lowlands              Newsworthiness      M  0.28    0.0      33:53:48
Lowlands              Newsworthiness      M  0.247   0.0      -0:0:0
Lowlands              Readability         M  0.21    0.0      19:21:55
Lowlands              Readability         M  0.206   0.0      -0:0:0
Pukkelpop             Emotionality        M  -0.188  0.0      22:58:2
Pukkelpop             Emotionality        M  -0.184  0.0      -0:0:0
Pukkelpop             Intensity           M  0.32    0.0      33:53:48
Pukkelpop             Intensity           M  0.277   0.0      -0:0:0
Pukkelpop             Newsworthiness      M  0.221   0.0      27:11:4
Pukkelpop             Newsworthiness      M  0.195   0.0      -0:0:0
Pukkelpop             Originality         M  -0.184  0.0      21:31:59
Pukkelpop             Originality         M  -0.168  0.0      -14:12:31
Pukkelpop             Readability         N  0.394   0.0      0:0:0
Pukkelpop             Readability         N  0.406   0.0      -33:53:48
Dance Valley          Degree centr.       M  0.162   0.0      15:56:9
Lowlands              People of int.      M  0.158   0.0      -29:14:30
Lowlands              Social Equality     M  -0.215  0.0      30:8:31
Lowlands              Social Equality     M  -0.199  0.0      -0:0:0
Pukkelpop             Degree centr.       M  -0.207  0.0      31:15:27
Pukkelpop             Degree centr.       M  -0.188  0.0      -0:0:0
Pukkelpop             Inclusiveness       M  -0.204  0.0      24:23:17
Pukkelpop             Inclusiveness       M  -0.161  0.0      -0:0:0
Pukkelpop             People of int.      M  0.211   0.0      0:0:0
Pukkelpop             People of int.      M  0.251   0.0      -23:40:46
Pukkelpop             Social Equality     M  -0.287  0.0      33:53:48
Pukkelpop             Social Equality     M  -0.281  0.0      -0:0:0
Lowlands              Dist. Event - User  M  -0.156  0.0      33:14:46
Pukkelpop             Dist. Event - User  M  -0.175  0.0      33:14:46
Pukkelpop             Dist. Event - User  M  -0.172  0.0      -0:0:0
Const. Ind. holidays  Originality         M  0.215   0.0      22:15:5
Const. Ind. holidays  Originality         M  0.21    0.0      -33:53:48
Const. Ind. holidays  Sentiment           M  -0.258  0.0      28:47:18
Const. Ind. holidays  Sentiment           M  -0.222  0.0      -0:0:0
School Holidays       Emotionality        M  -0.204  0.0      21:31:59
School Holidays       Emotionality        M  -0.19   0.0      -5:7:31
School Holidays       Intensity           M  0.338   0.0      0:0:0
School Holidays       Intensity           M  0.338   0.0      -0:0:0
School Holidays       Newsworthiness      M  0.235   0.0      19:21:55
School Holidays       Newsworthiness      M  0.241   0.0      -32:8:41
School Holidays       Readability         M  0.244   0.0      4:21:26
School Holidays       Readability         M  0.249   0.0      -22:58:2
School Holidays       Sentiment           M  -0.158  0.0      28:6:14
weekend               Emotionality        M  -0.194  0.0      33:14:46
weekend               Newsworthiness      M  0.179   0.0      33:53:48
weekend               Newsworthiness      M  0.163   0.0      -20:48:46
weekend               Sentiment           M  0.222   0.0      0:0:0
weekend               Sentiment           M  0.236   0.0      -3:4:32
School Holidays       Experience          M  -0.156  0.0      11:42:48
weekend               Inclusiveness       M  -0.151  0.0      -26:57:14
weekend               Social Equality     M  -0.176  0.0      33:53:48
School Holidays       Dist. Event - User  M  -0.193  0.0      1:47:39
School Holidays       Dist. Event - User  M  -0.191  0.0      -0:0:0
weekend               Dist Event - Res    M  -0.16   0.0      -10:42:34
Max Temp              Emotionality        M  -0.176  0.0      0:0:0
Max Temp              Emotionality        M  -0.176  0.0      -0:0:0
Max Temp              Newsworthiness      M  0.208   0.0      0:0:0
Max Temp              Newsworthiness      M  0.208   0.0      -0:0:0
Max Temp              Readability         M  0.228   0.0      8:26:11
Max Temp              Readability         M  0.21    0.0      -0:0:0
Temperature           Newsworthiness      M  0.268   0.0      0:0:0
Temperature           Newsworthiness      M  0.268   0.0      -0:0:0
Temperature           Readability         M  0.213   0.0      0:46:9
Temperature           Readability         M  0.212   0.0      -0:0:0
Wind                  Readability         M  0.152   0.0      0:0:0
Wind                  Readability         M  0.152   0.0      -0:0:0
Wind                  Sentiment           M  0.192   0.0      20:34:21
Max Temp              Popularity          M  0.186   0.0      1:32:16
Max Temp              Popularity          M  0.185   0.0      -0:0:0
Max Temp              Social Equality     M  -0.227  0.0      13:27:49
Max Temp              Social Equality     M  -0.19   0.0      -0:0:0
Moon                  Experience          M  0.165   0.0      25:5:36
Temperature           Inclusiveness       M  -0.167  0.0      8:56:40
Temperature           Social Equality     M  -0.23   0.0      10:42:34
Temperature           Social Equality     M  -0.183  0.0      -0:0:0
Wind                  Social Equality     M  -0.191  0.0      12:57:52
Wind                  Social Equality     M  -0.156  0.0      -0:0:0
Max Temp              Distance Users      M  0.17    0.0      33:53:48

Table L.4: Average cross correlations between online and offline variables (Pinkpop)

Variable 1            Variable 2             ρP      P-value  lag
Festivals             Emotionality        M  -0.212  0.0      16:31:12
Festivals             Emotionality        M  -0.207  0.0      -0:0:0
Festivals             Intensity           M  0.348   0.0      0:0:0
Festivals             Intensity           M  0.348   0.0      -18:38:10
Festivals             Newsworthiness      M  -0.156  0.0      6:10:33
Sport                 Readability         M  -0.19   0.0      -9:40:27
Festivals             Inclusiveness       M  0.324   0.0      30:12:9
Festivals             Inclusiveness       M  0.288   0.0      -4:52:48
Pinkpop               Emotionality        M  -0.224  0.0      0:0:0
Pinkpop               Emotionality        M  -0.224  0.0      -0:0:0
Pinkpop               Intensity           N  0.64    0.0      0:0:0
Pinkpop               Intensity           N  0.669   0.0      -30:38:3
Pinkpop               Experience          M  -0.189  0.0      4:17:19
Pinkpop               Experience          M  -0.187  0.0      -0:0:0
Pinkpop               Inclusiveness       N  0.37    0.0      33:38:44
Pinkpop               Inclusiveness       N  0.405   0.0      -33:45:11
Pinkpop               Social Equality     M  -0.166  0.0      31:49:7
Lowlands              Distance Users      M  0.278   0.0      -28:54:24
Pukkelpop             Distance Users      M  0.281   0.0      -33:45:11
Const. Ind. holidays  Newsworthiness      M  0.17    0.0      26:44:34
Const. Ind. holidays  Originality         M  0.158   0.0      1:11:44
Const. Ind. holidays  Originality         M  0.158   0.0      -0:0:0
Const. Ind. holidays  Sentiment           M  0.191   0.0      4:59:53
Const. Ind. holidays  Sentiment           M  0.186   0.0      -1:11:44
Public Holidays       Sentiment           M  -0.152  0.0      0:0:0
Public Holidays       Sentiment           M  -0.152  0.0      -0:0:0
School Holidays       Intensity           M  -0.209  0.0      28:47:56
School Holidays       Intensity           M  -0.206  0.0      -0:0:0
weekend               Emotionality        M  -0.215  0.0      0:50:15
weekend               Emotionality        M  -0.247  0.0      -21:30:27
weekend               Intensity           N  0.547   0.0      0:0:0
weekend               Intensity           N  0.547   0.0      -8:37:55
weekend               Newsworthiness      M  -0.191  0.0      3:6:11
weekend               Newsworthiness      M  -0.181  0.0      -0:0:0
weekend               Readability         M  0.174   0.0      12:26:10
weekend               Readability         M  0.175   0.0      -26:57:35
Public Holidays       Inclusiveness       M  -0.154  0.0      0:0:0
Public Holidays       Inclusiveness       M  -0.158  0.0      -0:28:45
School Holidays       Inclusiveness       M  -0.222  0.0      21:56:48
School Holidays       Inclusiveness       M  -0.219  0.0      -0:0:0
weekend               Experience          M  -0.165  0.0      -25:26:22
weekend               Inclusiveness       M  0.355   0.0      11:10:33
weekend               Inclusiveness       M  0.336   0.0      -5:28:10
Max Temp              Emotionality        M  -0.233  0.0      0:0:0
Max Temp              Emotionality        M  -0.234  0.0      -0:21:34
Max Temp              Intensity           M  0.153   0.0      0:0:0
Max Temp              Intensity           M  0.16    0.0      -32:53:37
Max Temp              Sentiment           M  0.186   0.0      0:0:0
Max Temp              Sentiment           M  0.186   0.0      -0:0:0
Moon                  Intensity           M  -0.201  0.0      0:0:0
Moon                  Intensity           M  -0.237  0.0      -33:45:11
Moon                  Newsworthiness      M  0.192   0.0      -31:49:7
Moon                  Originality         M  0.202   0.0      -32:34:16
Moon                  Readability         M  -0.289  0.0      33:45:11
Moon                  Readability         M  -0.232  0.0      -0:0:0
Rain                  Intensity           M  0.171   0.0      0:0:0
Rain                  Intensity           M  0.227   0.0      -33:12:57
Sun                   Intensity           M  -0.168  0.0      7:6:54
Sun                   Intensity           M  -0.156  0.0      -0:0:0
Temperature           Emotionality        M  -0.223  0.0      2:23:18
Temperature           Emotionality        M  -0.22   0.0      -0:0:0
Temperature           Newsworthiness      M  0.233   0.0      5:21:6
Temperature           Newsworthiness      M  0.19    0.0      -0:0:0
Moon                  Inclusiveness       M  -0.316  0.0      18:38:10
Moon                  Inclusiveness       M  -0.305  0.0      -0:0:0
Moon                  People of int.      M  -0.222  0.0      28:28:29
Moon                  People of int.      M  -0.213  0.0      -0:0:0
Moon                  Social Equality     M  0.196   0.0      27:56:4
Moon                  Social Equality     M  0.175   0.0      -12:12:26
Sun                   People of int.      M  0.155   0.0      -28:47:56
Temperature           Social Equality     M  -0.165  0.0      4:3:6
Fog                   Distance Users      M  0.17    0.0      17:38:9

Table L.5: Average cross correlations between online and offline variables (Lowlands)

Variable 1            Variable 2             ρP      P-value  lag
Festivals             Intensity           M  0.237   0.0      0:0:0
Festivals             Intensity           M  0.252   0.0      -35:0:44
Festivals             Originality         M  0.164   0.0      1:0:16
Festivals             Originality         M  0.164   0.0      -0:0:0
Festivals             Readability         M  0.241   0.0      34:40:4
Festivals             Readability         M  0.237   0.0      -0:0:0
News                  Intensity           M  -0.171  0.0      16:11:49
News                  Intensity           M  -0.18   0.0      -35:0:44
Sport                 Readability         M  -0.159  0.0      21:49:3
News                  Social Equality     M  0.163   0.0      34:40:4
Lowlands              Intensity           N  0.584   0.0      35:0:44
Lowlands              Intensity           N  0.547   0.0      -0:0:0
Lowlands              Readability         M  0.173   0.0      35:0:44
Lowlands              Readability         M  0.187   0.0      -34:55:34
Pukkelpop             Intensity           N  0.542   0.0      27:58:51
Pukkelpop             Intensity           N  0.478   0.0      -0:0:0
Pukkelpop             Readability         M  0.155   0.0      17:59:22
Pukkelpop             Readability         M  0.167   0.0      -35:0:44
Lowlands              Inclusiveness       M  0.241   0.0      2:49:26
Lowlands              Inclusiveness       M  0.239   0.0      -0:0:0
Lowlands              People of int.      M  0.266   0.0      16:33:21
Lowlands              People of int.      M  0.194   0.0      -9:37:2
Lowlands              Social Equality     M  -0.211  0.0      35:0:44
Lowlands              Social Equality     M  -0.176  0.0      -0:0:0
Pukkelpop             Inclusiveness       M  0.269   0.0      26:55:50
Pukkelpop             Inclusiveness       M  0.262   0.0      -0:0:0
Pukkelpop             People of int.      M  0.206   0.0      19:19:45
Pukkelpop             People of int.      M  0.186   0.0      -0:0:0
Pukkelpop             Social Equality     M  -0.218  0.0      29:48:51
Pukkelpop             Social Equality     M  -0.186  0.0      -0:0:0
Const. Ind. holidays  Intensity           M  -0.191  0.0      35:0:44
Const. Ind. holidays  Intensity           M  -0.174  0.0      -35:0:44
Const. Ind. holidays  Sentiment           M  -0.159  0.0      35:0:44
Const. Ind. holidays  Sentiment           M  -0.152  0.0      -0:0:0
School Holidays       Intensity           M  0.345   0.0      0:0:0
School Holidays       Intensity           M  0.348   0.0      -35:0:44
School Holidays       Readability         N  0.368   0.0      1:22:9
School Holidays       Readability         N  0.371   0.0      -3:27:29
weekend               Intensity           M  0.298   0.0      0:0:0
weekend               Intensity           M  0.322   0.0      -35:0:44
weekend               Originality         M  0.167   0.0      22:58:3
weekend               Readability         M  0.16    0.0      4:54:27
Public Holidays       Popularity          M  0.194   0.0      20:34:33
School Holidays       Degree centr.       M  -0.163  0.0      35:0:44
School Holidays       Degree centr.       M  -0.168  0.0      -5:16:11
School Holidays       Inclusiveness       M  0.184   0.0      3:0:18
School Holidays       Inclusiveness       M  0.202   0.0      -35:0:44
weekend               People of int.      M  0.158   0.0      -33:32:49
Public Holidays       Dist Event - Res    M  0.261   0.0      -33:22:27
Moon                  Readability         M  -0.156  0.0      30:30:42
Rain                  Intensity           M  0.191   0.0      7:26:39
Rain                  Intensity           M  0.186   0.0      -0:0:0
Sun                   Newsworthiness      M  0.212   0.0      2:54:52
Sun                   Newsworthiness      M  0.171   0.0      -0:0:0
Temperature           Intensity           M  0.151   0.0      -7:59:17
Temperature           Newsworthiness      M  0.163   0.0      3:27:29
Wind                  Originality         M  0.16    0.0      5:27:3
Wind                  Originality         M  0.158   0.0      -0:43:50
Max Temp              People of int.      M  0.155   0.0      23:13:59
Temperature           Degree centr.       M  -0.2    0.0      0:0:0
Temperature           Degree centr.       M  -0.2    0.0      -0:0:0
Temperature           People of int.      M  0.17    0.0      1:5:44
Temperature           People of int.      M  0.167   0.0      -0:0:0
Temperature           Social Equality     M  -0.164  0.0      2:11:17

Appendix M

Average correlations real world and online variables (part 1)

Table M.1: Pearson average correlations for weather and place related variables

Variables     Dist Event - Res    Dist. Event - User  Distance Users
              µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ
Rain          0.034     0.092     0.013     0.039     0.022     0.067
Sun           -0.029    0.024     0.021     0.03      0.01      0.043
Temperature   -0.057    0.059     -0.034    0.033     0.028     0.073
Max Temp      -0.051    0.088     -0.052    0.036     0.018     0.074
Fog           0.037     0.034     0.011     0.013     0.032     0.032
Thunderstorm  0.022     0.021     0.003     0.029     0.034     0.032
Wind          0.007     0.093     0.004     0.041     0.041     0.061
Moon          -0.001    0.164     0.014     0.149     0.034     0.163

Table M.2: Spearman average correlations for weather and place related variables

Variables     Dist Event - Res    Dist. Event - User  Distance Users
              µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ
Rain          0.025     0.102     0.022     0.06      0.036     0.079
Sun           -0.059    0.029     0.025     0.039     0.002     0.071
Temperature   -0.04     0.069     -0.029    0.056     0.008     0.12
Max Temp      -0.014    0.107     -0.05     0.08      -0.003    0.126
Fog           0.017     0.023     0.01      0.018     0.015     0.021
Thunderstorm  0.025     0.03      0.0       0.032     0.042     0.028
Wind          -0.001    0.08      -0.007    0.047     0.05      0.053
Moon          -0.043    0.203     -0.007    0.115     -0.033    0.233

Table M.3: Cross average correlations for weather and place related variables

Variables     Dist Event - Res                            Dist. Event - User                          Distance Users
              µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
Rain          0.054      0.118      12:25:30   15:15:6    -0.028     0.057      15:40:19   4:54:1     0.036      0.1        21:23:22   11:31:20
Sun           -0.029     0.024      0:0:0      0:0:0      0.009      0.052      7:31:3     7:47:24    0.037      0.05       6:27:54    7:19:10
Temperature   -0.048     0.071      3:53:14    6:27:52    -0.057     0.008      12:38:10   14:38:19   0.05       0.083      7:48:17    8:28:37
Max Temp      -0.06      0.105      6:34:41    7:29:11    -0.07      0.024      9:32:33    12:20:2    0.029      0.093      9:19:24    12:37:4
Fog           0.061      0.022      6:4:33     6:8:15     0.025      0.013      13:46:8    11:41:14   0.078      0.048      16:12:35   14:12:30
Thunderstorm  0.024      0.038      16:19:21   12:28:58   -0.005     0.053      10:52:52   10:30:44   0.017      0.045      10:40:50   13:56:27
Wind          0.028      0.113      5:49:37    7:10:56    0.0        0.045      4:28:25    3:58:36    0.071      0.094      11:0:19    10:31:33
Moon          0.016      0.186      8:51:22    11:29:48   0.02       0.15       7:8:12     7:50:29    0.029      0.172      17:18:25   12:49:8

Table M.4: Cross average correlations for weather and place related variables

Variables           Rain                                        Sun                                         Temperature                                 Max Temp
                    µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
Dist Event - Res    0.042      0.108      -8:3:47    11:36:44   -0.043     0.027      -4:54:55   4:29:58    -0.025     0.082      -6:41:30   9:54:19    -0.035     0.104      -5:11:0    4:48:11
Dist. Event - User  0.041      0.029      -14:16:5   14:16:12   0.032      0.034      -7:23:58   10:50:12   -0.039     0.032      -1:6:28    1:33:52    -0.053     0.036      -0:18:40   0:29:48
Distance Users      0.032      0.085      -13:36:5   9:57:11    0.018      0.047      -4:49:32   5:23:27    0.042      0.078      -5:12:46   5:52:40    0.019      0.079      -7:52:57   8:24:24

Table M.5: Cross average correlations for weather and place related variables (continued)

Variables           Fog                                         Thunderstorm                                Wind                                        Moon
                    µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
Dist Event - Res    0.077      0.067      -13:28:52  8:25:35    0.036      0.026      -13:25:14  13:54:2    0.013      0.098      -2:11:43   2:45:22    -0.001     0.164      -0:0:0     0:0:0
Dist. Event - User  0.026      0.014      -6:2:31    5:55:50    0.004      0.031      -0:27:40   0:55:21    0.027      0.04       -11:45:44  14:30:35   0.015      0.149      -1:6:29    2:12:59
Distance Users      0.077      0.078      -18:43:51  11:49:47   0.041      0.037      -12:20:25  12:11:7    0.057      0.055      -5:19:5    6:31:49    0.025      0.166      -4:25:8    7:14:1

Table M.6: Pearson average correlations for holiday and place related variables

Variables             Dist Event - Res    Dist. Event - User  Distance Users
                      µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ
Public Holidays       0.0       0.024     0.053     0.035     0.005     0.021
School Holidays       -0.142    0.137     -0.105    0.078     -0.077    0.1
Const. Ind. holidays  -0.033    0.158     -0.018    0.122     -0.006    0.091
weekend               -0.076    0.163     -0.05     0.043     -0.097    0.142

Table M.7: Spearman average correlations for holiday and place related variables

Variables             Dist Event - Res    Dist. Event - User  Distance Users
                      µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ
Public Holidays       -0.004    0.048     0.043     0.018     0.02      0.052
School Holidays       -0.108    0.124     -0.026    0.033     -0.086    0.13
Const. Ind. holidays  0.032     0.156     -0.003    0.017     0.03      0.131
weekend               -0.054    0.19      -0.061    0.028     -0.11     0.147

Table M.8: Cross average correlations for holiday and place related variables

Variables             Dist Event - Res                            Dist. Event - User                          Distance Users
                      µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
Public Holidays       -0.001     0.044      17:56:39   10:51:43   0.061      0.033      6:18:35    8:8:50     0.035      0.06       20:25:30   12:30:32
School Holidays       M -0.159   0.136      29:44:0    4:30:11    -0.104     0.082      12:48:7    14:57:43   -0.091     0.103      24:0:53    12:59:46
Const. Ind. holidays  -0.028     0.165      17:24:15   12:35:27   -0.015     0.124      5:10:36    10:18:29   -0.007     0.092      10:52:7    12:27:54
weekend               -0.072     0.169      4:25:58    6:39:15    -0.014     0.105      18:36:12   11:42:41   -0.093     0.149      5:16:27    6:14:18

Table M.9: Cross average correlations for holiday and place related variables

Variables           Public Holidays                             School Holidays                             Const. Ind. holidays                        weekend
                    µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
Dist Event - Res    0.083      0.094      -24:58:0   12:32:33   -0.144     0.142      -5:32:43   11:5:27    -0.036     0.168      -13:15:7   16:17:44   -0.125     0.195      -13:19:7   12:44:31
Dist. Event - User  0.067      0.033      -13:44:57  13:25:23   -0.087     0.104      -5:32:43   11:5:27    -0.007     0.134      -7:41:24   11:16:17   -0.034     0.074      -4:55:1    9:50:2
Distance Users      0.045      0.017      -19:21:45  11:1:57    -0.085     0.111      -6:19:40   10:48:24   -0.004     0.093      -15:3:46   16:33:23   -0.14      0.164      -13:14:48  12:42:46

Table M.10: Pearson average correlations for festival and place related variables

Variables     Dist Event - Res    Dist. Event - User  Distance Users
              µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ
Dance Valley  -0.061    0.106     0.032     0.053     -0.061    0.119
Sensation     -0.008    0.021     -0.001    0.002     -0.01     0.012
Pukkelpop     0.018     0.046     -0.056    0.072     0.026     0.052
Pinkpop       0.027     0.054     -0.001    0.021     0.028     0.031
Lowlands      -0.009    0.036     -0.058    0.059     0.013     0.056

Table M.11: Spearman average correlations for festival and place related variables

Variables     Dist Event - Res    Dist. Event - User  Distance Users
              µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ
Dance Valley  -0.058    0.105     0.052     0.088     -0.058    0.122
Sensation     -0.008    0.027     -0.007    0.011     -0.015    0.023
Pukkelpop     0.035     0.042     -0.101    0.145     0.055     0.057
Pinkpop       0.076     0.134     -0.022    0.044     0.042     0.04
Lowlands      -0.003    0.046     -0.07     0.061     0.028     0.07

Table M.12: Cross average correlations for festival and place related variables

Variables     Dist Event - Res                            Dist. Event - User                          Distance Users
              µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
Dance Valley  -0.05      0.125      24:28:55   7:19:51    0.048      0.064      16:20:48   12:37:59   -0.065     0.131      15:54:22   12:42:37
Sensation     -0.015     0.032      16:57:7    10:58:3    0.022      0.005      11:59:37   9:50:30    -0.005     0.027      24:49:35   10:34:10
Pukkelpop     0.022      0.057      17:10:32   14:58:8    -0.048     0.091      22:36:3    11:46:0    0.025      0.064      21:57:57   12:5:10
Pinkpop       0.039      0.061      21:5:31    10:3:3     -0.004     0.035      15:50:48   13:47:54   0.025      0.059      23:4:21    5:19:59
Lowlands      -0.019     0.053      23:42:44   12:16:38   -0.052     0.079      29:55:39   5:12:52    0.004      0.074      26:13:29   11:37:55

Table M.13: Cross average correlations for festival and place related variables

Variables           Dance Valley                                Sensation                                   Pukkelpop
                    µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
Dist Event - Res    -0.052     0.112      -6:15:37   8:58:20    -0.001     0.024      -13:56:31  14:37:45   0.041      0.065      -12:42:42  12:55:22
Dist. Event - User  0.047      0.064      -10:16:48  13:0:4     0.023      0.008      -17:37:39  12:16:46   -0.056     0.072      -0:0:0     0:0:0
Distance Users      -0.051     0.124      -8:36:59   13:40:52   -0.002     0.019      -13:44:15  13:54:50   0.085      0.112      -12:3:6    13:27:9

Table M.14: Cross average correlations for festival and place related variables (continued)

Variables           Pinkpop                                     Lowlands
                    µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
Dist Event - Res    0.067      0.045      -23:5:28   13:35:8    0.012      0.06       -10:31:34  11:53:38
Dist. Event - User  0.008      0.036      -3:25:7    6:50:15    -0.058     0.059      -0:0:0     0:0:0
Distance Users      0.038      0.035      -8:48:36   12:49:10   0.07       0.119      -7:48:31   11:8:15

Table M.15: Pearson average correlations for context and place related variables

Variables     Dist Event - Res    Dist. Event - User  Distance Users
              µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ
News          0.051     0.064     0.02      0.038     0.045     0.063
Sport         0.049     0.062     0.019     0.029     0.02      0.056
Festivals     -0.048    0.127     -0.096    0.053     -0.037    0.122

Table M.16: Spearman average correlations for context and place related variables

Variables     Dist Event - Res    Dist. Event - User  Distance Users
              µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ
News          0.079     0.082     0.025     0.069     0.084     0.093
Sport         0.036     0.04      0.024     0.03      0.007     0.059
Festivals     -0.036    0.183     -0.107    0.061     -0.067    0.163

Table M.17: Cross average correlations for context and place related variables

Variables     Dist Event - Res                            Dist. Event - User                          Distance Users
              µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
News          0.062      0.054      3:46:33    7:33:6     -0.027     0.161      19:34:1    13:8:39    0.055      0.082      9:0:43     11:20:36
Sport         0.121      0.157      26:6:23    6:3:19     0.066      0.037      22:11:25   9:12:18    0.089      0.194      16:23:43   10:51:2
Festivals     -0.057     0.137      13:16:33   15:22:3    -0.108     0.048      14:11:59   13:50:55   -0.051     0.126      12:17:43   15:4:33

Table M.18: Cross average correlations for context and place related variables

Variables           News                                        Sport                                       Festivals
                    µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ          µ(ρX)      σ          µ(lag)     σ
Dist Event - Res    0.08       0.072      -17:46:59  14:21:41   0.069      0.057      -2:5:55    2:40:20    -0.047     0.129      -7:20:38   11:57:41
Dist. Event - User  0.043      0.048      -9:45:53   10:10:3    0.052      0.039      -9:57:15   10:21:13   -0.098     0.051      -1:4:34    2:9:9
Distance Users      0.066      0.081      -14:28:45  11:11:30   0.038      0.055      -4:9:52    4:1:27     -0.012     0.138      -16:6:6    14:28:14

Table M.19: Pearson average correlations for weather and people related variables

Variables     Degree centr.       Betweenness centr.  Closeness centr.    Inclusiveness       Social Equality     Popularity          Experience          People of int.
              µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ         µ(ρP)     σ
Rain          0.022     0.088     -0.027    0.017     -0.043    0.058     0.036     0.024     0.032     0.093     -0.023    0.064     -0.001    0.062     -0.026    0.04
Sun           -0.026    0.082     0.086     0.106     0.069     0.097     -0.09     0.05      -0.073    0.06      0.114     0.114     0.075     0.119     0.026     0.1
Temperature   -0.086    0.066     0.084     0.009     0.096     0.044     -0.031    0.115     -0.112    0.076     0.083     0.104     0.03      0.09      0.117     0.037
Max Temp      -0.073    0.094     -0.006    0.096     0.056     0.035     0.019     0.172     -0.065    0.079     0.047     0.142     -0.009    0.051     0.103     0.052
Fog           0.015     0.025     0.018     0.009     0.009     0.005     0.015     0.014     0.002     0.024     -0.009    0.012     -0.006    0.015     -0.018    0.012
Thunderstorm  0.026     0.041     0.003     0.015     0.068     0.056     0.016     0.042     0.002     0.046     -0.004    0.039     0.014     0.042     -0.005    0.019
Wind          -0.022    0.067     -0.06     0.056     0.029     0.097     -0.024    0.038     -0.033    0.096     0.03      0.063     -0.006    0.045     0.035     0.035
Moon          -0.037    0.154     0.069     0.063     0.129     0.146     -0.123    0.1       -0.035    0.222     0.056     0.188     -0.003    0.094     0.006     0.119

Table M.20: Spearman average correlations for weather and people related variables

Variables     Degree centr.       Betweenness centr.  Closeness centr.    Inclusiveness       Social Equality     Popularity          Experience          People of int.
              µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ         µ(ρS)     σ
Rain          0.016     0.095     -0.019    0.021     -0.053    0.057     0.031     0.029     0.028     0.093     -0.053    0.049     0.017     0.058     -0.027    0.037
Sun           -0.05     0.094     0.084     0.112     0.069     0.112     -0.093    0.043     -0.093    0.068     0.136     0.069     0.043     0.098     0.039     0.086
Temperature   -0.11     0.09      0.123     0.025     0.106     0.031     -0.053    0.116     -0.143    0.106     0.133     0.107     -0.018    0.083     0.135     0.045
Max Temp      -0.059    0.086     0.045     0.06      0.068     0.055     -0.003    0.123     -0.088    0.127     0.062     0.119     -0.037    0.057     0.094     0.037
Fog           0.017     0.027     0.026     0.012     0.015     0.005     0.016     0.015     0.006     0.026     -0.019    0.03      -0.006    0.023     -0.019    0.013
Thunderstorm  0.018     0.031     -0.003    0.03      0.05      0.043     0.012     0.044     0.001     0.043     -0.016    0.041     0.008     0.051     -0.003    0.02
Wind          -0.032    0.096     -0.048    0.101     0.027     0.115     -0.041    0.031     -0.04     0.097     0.039     0.069     -0.007    0.056     0.036     0.056
Moon          -0.044    0.173     0.101     0.047     0.094     0.126     -0.117    0.088     -0.052    0.24      0.044     0.208     -0.014    0.143     0.008     0.121

Table M.21: Cross average correlations for weather and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Rain 0.057 0.117 12:35:34 12:30:56 0.034 0.079 17:22:19 17:22:19 -0.018 0.082 16:20:21 16:20:21 0.055 0.024 18:10:16 12:53:52
Sun -0.032 0.087 2:28:20 2:27:41 0.074 0.125 3:14:28 0:26:10 0.048 0.118 5:35:58 5:35:58 -0.108 0.048 4:23:54 3:16:11
Temperature -0.092 0.069 1:26:57 2:46:49 0.094 0.018 3:0:38 3:0:38 0.098 0.045 1:50:19 1:50:19 -0.04 0.13 6:3:1 5:7:6
Max Temp -0.058 0.108 9:36:31 13:19:22 -0.001 0.101 7:41:27 7:41:27 0.062 0.03 5:49:38 5:49:38 0.01 0.178 4:45:19 5:53:38
Fog 0.019 0.034 16:19:29 13:29:18 0.001 0.044 13:58:42 2:4:55 -0.039 0.003 31:45:11 0:55:32 -0.009 0.035 11:52:5 7:12:7
Thunderstorm 0.069 0.03 19:23:2 14:10:28 0.041 0.003 23:45:3 10:59:35 0.09 0.055 3:20:42 0:51:43 0.022 0.049 11:54:23 12:55:59
Wind -0.006 0.072 3:10:8 6:20:16 -0.067 0.064 1:3:6 1:3:6 0.021 0.105 1:45:10 1:45:10 -0.026 0.038 0:42:4 1:24:8
Moon -0.031 0.159 14:10:33 13:16:59 0.069 0.062 0:21:2 0:21:2 0.129 0.146 0:0:0 0:0:0 -0.112 0.144 19:53:57 12:23:51

Table M.22: Cross average correlations for weather and people related variables (continued)

Variables Social Equality Popularity Experience People of int.
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Rain 0.007 0.117 8:17:12 9:1:50 0.002 0.085 14:10:44 11:54:36 0.01 0.073 9:27:2 3:46:13 -0.01 0.082 14:38:35 10:52:26
Sun -0.085 0.066 1:26:27 1:19:10 0.135 0.101 13:15:25 12:42:51 0.09 0.119 6:24:26 8:59:0 0.029 0.102 1:37:18 2:18:17
Temperature -0.139 0.091 8:57:35 10:6:2 0.093 0.104 7:41:25 9:37:15 0.032 0.095 2:22:12 3:31:52 0.126 0.032 3:4:4 3:9:5
Max Temp -0.084 0.091 13:34:59 12:22:38 0.05 0.142 0:55:41 1:14:23 -0.009 0.051 0:0:0 0:0:0 0.115 0.061 19:31:53 13:6:57
Fog 0.015 0.038 16:23:37 10:16:58 -0.008 0.023 20:14:27 13:18:4 -0.005 0.03 18:43:11 4:52:5 -0.013 0.022 6:46:23 11:58:31
Thunderstorm 0.018 0.051 5:54:14 5:57:53 -0.001 0.054 17:31:13 11:25:5 0.003 0.048 11:34:8 12:3:22 -0.009 0.045 8:59:15 5:37:49
Wind -0.037 0.117 6:19:4 6:39:21 0.044 0.063 10:6:55 10:11:50 0.021 0.07 9:12:30 8:55:41 0.039 0.037 3:13:2 3:35:39
Moon -0.035 0.229 11:6:41 11:20:11 0.06 0.188 9:20:55 11:42:47 0.014 0.11 13:11:50 11:48:51 -0.003 0.131 23:9:38 12:0:34

Table M.23: Cross average correlations for weather and people related variables

Variables Rain Sun Temperature Max Temp
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Degree centr. 0.021 0.097 -2:44:9 3:0:15 0.006 0.102 -9:17:31 9:55:27 -0.102 0.064 -6:9:55 12:19:51 -0.069 0.117 -6:11:37 8:29:44
Betweenness centr. -0.032 0.022 -1:14:29 1:14:29 0.092 0.149 -21:57:19 5:53:40 0.088 0.004 -1:45:10 1:45:10 -0.031 0.121 -5:49:38 5:49:38
Closeness centr. -0.023 0.097 -17:30:38 10:20:21 0.081 0.108 -6:55:51 6:55:51 0.103 0.036 -3:30:13 3:30:13 0.049 0.087 -32:47:9 1:57:29
Inclusiveness 0.048 0.033 -3:22:58 3:37:30 -0.105 0.038 -2:47:22 3:49:10 -0.031 0.115 -4:51:39 9:22:1 0.019 0.172 -0:0:0 0:0:0
Social Equality 0.057 0.087 -6:57:44 5:20:35 -0.07 0.067 -5:45:8 11:30:16 -0.112 0.076 -0:15:21 0:30:43 -0.077 0.074 -5:32:43 11:5:27
Popularity -0.041 0.092 -12:6:32 7:8:22 0.123 0.108 -6:49:13 10:3:28 0.085 0.102 -4:43:3 9:26:6 0.046 0.143 -0:47:44 1:16:12
Experience 0.019 0.082 -16:9:15 13:49:27 0.093 0.133 -13:35:49 10:24:11 0.03 0.09 -0:15:21 0:30:43 -0.012 0.063 -9:1:54 12:15:21
People of int. 0.008 0.075 -12:58:40 11:43:59 0.029 0.109 -14:11:59 12:8:57 0.125 0.036 -12:50:20 15:44:42 0.108 0.05 -5:29:45 7:53:23

Table M.24: Cross average correlations for weather and people related variables (continued)

Variables Fog Thunderstorm Wind Moon
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Degree centr. 0.031 0.027 -9:28:58 12:30:52 0.049 0.033 -12:36:39 11:15:46 -0.013 0.077 -3:21:8 6:21:35 0.005 0.198 -10:43:23 12:12:11
Betweenness centr. 0.02 0.007 -1:35:31 0:53:27 0.018 0.038 -12:59:37 11:35:29 -0.034 0.082 -10:10:41 10:10:41 0.089 0.111 -31:24:29 0:34:49
Closeness centr. 0.032 0.002 -7:5:22 3:24:43 0.069 0.055 -17:42:58 17:42:58 0.029 0.097 -0:0:0 0:0:0 0.131 0.147 -0:38:24 0:38:24
Inclusiveness 0.016 0.037 -10:21:50 6:17:41 0.022 0.042 -1:41:11 2:11:48 -0.023 0.048 -3:19:50 5:35:31 -0.126 0.098 -1:29:28 2:23:39
Social Equality 0.023 0.036 -20:2:22 11:57:1 0.029 0.055 -11:24:2 10:31:11 -0.021 0.108 -3:8:11 5:17:3 -0.033 0.224 -5:39:12 7:1:50
Popularity -0.005 0.014 -4:36:57 8:2:13 -0.018 0.039 -7:55:21 11:16:1 0.027 0.076 -17:27:41 11:49:26 0.057 0.196 -15:11:5 12:25:44
Experience 0.002 0.026 -7:29:39 9:10:52 0.002 0.051 -2:50:42 4:57:25 -0.009 0.063 -4:53:30 7:4:45 0.002 0.098 -7:43:28 13:9:0
People of int. -0.002 0.025 -9:27:48 11:37:45 0.012 0.06 -16:36:24 11:24:14 0.044 0.044 -20:7:4 12:32:44 0.006 0.119 -3:12:50 4:5:31

Table M.25: Pearson average correlations for holiday and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Public Holidays 0.017 0.043 0.031 0.035 -0.03 0.051 -0.034 0.085 0.044 0.039 -0.02 0.047 0.021 0.023 -0.038 0.024
School Holidays -0.145 0.112 -0.09 0.102 0.052 0.018 0.066 0.232 -0.028 0.076 0.006 0.032 0.029 0.1 0.037 0.115
Const. Ind. holidays -0.018 0.169 -0.128 0.089 -0.028 0.113 0.053 0.187 0.145 0.115 -0.051 0.093 0.032 0.069 0.026 0.186
weekend -0.036 0.069 -0.045 0.183 0.054 0.048 0.126 0.13 -0.069 0.118 0.061 0.112 0.019 0.178 0.023 0.109

Table M.26: Spearman average correlations for holiday and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Public Holidays 0.022 0.05 0.014 0.015 -0.028 0.052 -0.025 0.085 0.051 0.03 -0.021 0.047 0.032 0.03 -0.041 0.026
School Holidays -0.088 0.108 -0.11 0.108 0.018 0.065 0.035 0.178 0.007 0.085 0.032 0.051 0.025 0.109 0.032 0.133
Const. Ind. holidays 0.001 0.143 -0.126 0.048 -0.044 0.139 0.054 0.143 0.137 0.115 -0.05 0.138 0.006 0.114 0.043 0.216
weekend -0.037 0.069 -0.039 0.202 0.073 0.031 0.123 0.136 -0.078 0.13 0.045 0.146 -0.008 0.185 0.02 0.117

Table M.27: Cross average correlations for holiday and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Public Holidays 0.016 0.07 15:10:42 10:19:45 0.031 0.053 7:22:46 3:10:19 -0.03 0.051 0:38:24 0:38:24 -0.03 0.091 4:38:28 5:41:20
School Holidays M -0.152 0.108 15:30:43 16:18:8 -0.089 0.103 15:18:16 15:18:16 0.074 0.004 18:13:4 12:36:34 0.053 0.238 16:57:16 13:26:37
Const. Ind. holidays -0.016 0.169 5:7:46 8:54:51 -0.135 0.082 17:42:58 17:42:58 -0.028 0.113 0:0:0 0:0:0 0.047 0.202 27:21:39 13:43:8
weekend -0.07 0.07 22:49:13 8:24:30 -0.032 0.196 6:17:48 6:17:48 0.079 0.023 17:42:58 17:42:58 0.106 0.18 20:14:43 11:41:24

Table M.28: Cross average correlations for holiday and people related variables (continued)

Variables Social Equality Popularity Experience People of int.
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Public Holidays 0.036 0.076 12:13:0 12:43:50 0.024 0.099 14:31:18 9:45:10 -0.011 0.071 20:27:20 10:53:30 -0.025 0.041 4:58:18 9:56:36
School Holidays -0.03 0.083 18:59:50 15:51:48 0.016 0.039 20:56:24 16:57:31 0.03 0.105 9:41:17 8:38:46 0.036 0.119 15:20:16 14:25:25
Const. Ind. holidays M 0.162 0.106 17:18:41 15:19:28 -0.054 0.096 18:31:38 12:45:20 0.027 0.075 16:53:31 15:0:8 0.022 0.188 12:36:15 15:26:24
weekend -0.105 0.12 21:51:20 13:50:56 0.102 0.118 15:59:34 13:20:48 0.043 0.179 10:48:3 12:28:40 0.03 0.126 16:27:53 15:32:2

Table M.29: Cross average correlations for holiday and people related variables

Variables Public Holidays School Holidays Const. Ind. holidays weekend
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Degree centr. 0.024 0.052 -7:56:24 11:25:58 -0.144 0.114 -2:21:34 2:55:2 -0.003 0.182 -16:23:56 15:37:19 -0.09 0.091 -16:35:2 14:13:10
Betweenness centr. -0.047 0.066 -22:11:31 2:53:51 -0.09 0.102 -0:0:0 0:0:0 -0.128 0.089 -0:0:0 0:0:0 -0.045 0.183 -0:0:0 0:0:0
Closeness centr. -0.03 0.051 -0:38:24 0:38:24 0.052 0.018 -2:48:14 2:48:14 -0.026 0.116 -17:42:58 17:42:58 0.074 0.042 -8:15:41 2:14:25
Inclusiveness -0.032 0.088 -6:49:54 13:25:31 0.07 0.235 -7:15:30 13:53:8 0.061 0.189 -7:59:0 13:9:53 0.112 0.158 -13:36:53 13:51:8
Social Equality 0.044 0.039 -0:3:17 0:6:35 -0.026 0.079 -2:1:38 4:3:17 0.149 0.118 -9:55:41 13:53:20 -0.108 0.112 -13:38:56 12:13:51
Popularity 0.026 0.071 -14:22:8 13:21:53 0.004 0.037 -15:41:3 15:25:55 -0.048 0.096 -7:21:9 10:13:44 0.095 0.122 -25:33:1 10:3:57
Experience 0.044 0.022 -14:23:0 11:22:18 0.04 0.104 -19:54:15 13:31:17 0.042 0.072 -13:28:4 14:36:12 0.022 0.197 -17:1:47 14:8:28
People of int. 0.003 0.069 -11:9:18 13:57:9 0.029 0.119 -13:20:53 13:42:48 0.024 0.187 -7:39:58 13:11:12 0.051 0.114 -16:50:33 13:24:25

Appendix N

Average correlations real world and online variables (part 2)

Table N.1: Pearson average correlations for festival and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Dance Valley 0.037 0.023 -0.103 0.139 0.032 0.046 -0.021 0.035 0.037 0.032 0.025 0.042 0.031 0.086 0.015 0.042
Sensation -0.021 0.032 0.109 0.0 -0.065 0.0 -0.011 0.012 -0.027 0.065 0.018 0.034 -0.025 0.044 0.022 0.051
Pukkelpop -0.018 0.152 0.061 0.069 0.102 0.133 -0.013 0.16 -0.044 0.171 -0.032 0.046 -0.007 0.075 0.06 0.114
Pinkpop 0.02 0.046 -0.072 0.023 -0.024 0.036 0.096 0.134 -0.033 0.06 -0.021 0.028 -0.049 0.073 0.008 0.044
Lowlands 0.006 0.128 0.057 0.064 0.086 0.102 -0.001 0.135 -0.023 0.163 -0.033 0.06 0.014 0.078 0.039 0.089

Table N.2: Spearman average correlations for festival and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Dance Valley 0.054 0.037 -0.105 0.154 0.035 0.041 -0.018 0.029 0.04 0.024 0.047 0.04 0.046 0.115 0.005 0.036
Sensation -0.029 0.043 0.128 0.0 -0.065 0.0 -0.008 0.01 -0.032 0.077 -0.003 0.012 -0.025 0.05 0.016 0.042
Pukkelpop -0.041 0.123 0.062 0.066 0.071 0.112 -0.006 0.16 -0.058 0.164 -0.02 0.089 0.007 0.082 0.057 0.115
Pinkpop 0.025 0.058 -0.085 0.027 -0.025 0.04 0.106 0.15 -0.031 0.061 -0.058 0.031 -0.068 0.106 0.005 0.043
Lowlands -0.012 0.106 0.064 0.071 0.071 0.092 -0.002 0.13 -0.039 0.148 -0.031 0.103 0.019 0.092 0.034 0.091

Table N.3: Cross average correlations for festival and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Dance Valley 0.073 0.047 12:25:18 10:6:33 M -0.154 0.094 13:18:52 2:45:45 0.071 0.032 28:58:34 4:23:28 -0.044 0.076 19:45:41 11:16:17
Sensation -0.03 0.046 22:14:59 8:23:35 0.119 0.0 9:6:16 0:0:0 -0.065 0.0 0:0:0 0:0:0 -0.027 0.01 23:34:28 13:6:24
Pukkelpop -0.037 0.162 18:47:16 13:15:26 0.026 0.105 13:10:8 12:28:4 0.083 0.152 13:51:49 13:51:49 -0.027 0.176 16:35:20 9:14:23
Pinkpop 0.036 0.058 6:21:42 8:22:30 -0.021 0.074 11:14:15 11:14:15 -0.064 0.005 12:27:4 6:50:34 0.108 0.138 15:47:20 13:29:39
Lowlands -0.019 0.15 16:18:47 10:46:24 0.04 0.095 16:50:6 11:55:35 0.09 0.127 32:5:51 1:16:11 -0.023 0.152 17:2:4 14:28:9

Table N.4: Cross average correlations for festival and people related variables (continued)

Variables Social Equality Popularity Experience People of int.
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Dance Valley 0.059 0.022 20:54:34 12:46:42 0.038 0.045 21:21:34 7:3:32 0.059 0.074 16:0:54 13:28:43 0.06 0.035 19:48:29 13:16:50
Sensation -0.026 0.069 19:10:40 9:37:59 0.041 0.038 17:35:32 11:40:9 -0.025 0.045 15:3:20 10:32:8 0.028 0.049 3:56:13 6:49:9
Pukkelpop -0.068 0.177 19:22:33 14:11:39 -0.032 0.047 8:2:27 13:9:24 0.012 0.079 15:43:26 14:51:17 0.064 0.118 6:1:42 7:5:52
Pinkpop -0.011 0.097 15:22:29 15:13:19 -0.013 0.035 15:11:1 15:0:7 -0.063 0.074 21:3:52 15:36:1 0.031 0.056 8:29:37 11:45:48
Lowlands -0.035 0.176 20:52:34 13:58:22 -0.036 0.062 6:11:43 5:32:33 0.031 0.08 12:42:21 12:16:29 0.06 0.119 7:5:13 8:42:39

Table N.5: Cross average correlations for festival and people related variables

Variables Dance Valley Sensation Pukkelpop
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Degree centr. 0.048 0.014 -3:53:32 3:5:51 -0.008 0.04 -9:9:17 9:26:12 -0.018 0.152 -0:0:0 0:0:0
Betweenness centr. -0.096 0.147 -1:24:8 1:24:8 0.114 0.0 -13:17:27 0:0:0 0.078 0.051 -13:20:27 13:20:27
Closeness centr. 0.032 0.046 -0:0:0 0:0:0 -0.065 0.0 -0:0:0 0:0:0 0.102 0.133 -0:0:0 0:0:0
Inclusiveness -0.008 0.047 -5:36:45 5:14:14 0.0 0.019 -16:47:9 16:50:14 -0.012 0.16 -0:44:7 1:28:15
Social Equality 0.054 0.031 -13:15:57 11:17:1 -0.015 0.072 -24:35:35 14:19:51 -0.027 0.193 -12:3:29 14:46:3
Popularity 0.03 0.049 -9:25:58 10:59:25 0.025 0.031 -5:3:24 5:28:41 -0.017 0.064 -12:31:12 11:21:7
Experience 0.034 0.086 -8:35:41 11:29:55 -0.013 0.051 -19:24:27 12:35:47 0.024 0.098 -17:35:25 14:48:2
People of int. 0.023 0.043 -6:47:19 9:54:43 0.069 0.091 -15:22:11 12:6:58 0.068 0.125 -4:59:31 9:21:24

Table N.6: Cross average correlations for festival and people related variables (continued)

Variables Pinkpop Lowlands
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Degree centr. 0.03 0.05 -9:0:19 9:39:46 0.008 0.129 -7:9:23 11:35:20
Betweenness centr. -0.018 0.076 -5:49:38 5:49:38 0.071 0.05 -9:38:49 9:38:49
Closeness centr. -0.019 0.041 -2:26:5 2:26:5 0.086 0.102 -0:0:0 0:0:0
Inclusiveness 0.12 0.148 -14:10:18 12:57:38 -0.001 0.135 -0:0:0 0:0:0
Social Equality -0.006 0.087 -10:10:59 13:27:38 -0.018 0.164 -8:52:26 9:31:55
Popularity -0.002 0.05 -4:48:32 8:51:41 -0.017 0.072 -4:10:41 6:28:8
Experience -0.041 0.082 -4:55:29 9:50:58 0.038 0.089 -13:0:33 13:28:54
People of int. 0.069 0.052 -13:5:2 12:50:55 0.053 0.101 -7:54:43 11:16:12

Table N.7: Pearson average correlations for context and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
News -0.005 0.038 -0.035 0.072 0.046 0.102 0.009 0.1 0.051 0.079 -0.057 0.052 -0.006 0.072 -0.046 0.043
Sport 0.023 0.035 -0.019 0.013 -0.021 0.028 0.005 0.047 0.009 0.031 -0.019 0.011 -0.001 0.035 0.026 0.108
Festivals -0.071 0.099 0.134 0.059 M 0.177 0.054 0.03 0.152 M -0.177 0.13 0.063 0.154 -0.062 0.016 0.014 0.087

Table N.8: Spearman average correlations for context and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness Social Equality Popularity Experience People of int.
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
News 0.003 0.038 -0.046 0.108 0.038 0.09 0.012 0.101 0.052 0.075 -0.051 0.068 -0.003 0.075 -0.05 0.04
Sport 0.025 0.046 -0.005 0.016 -0.019 0.03 0.0 0.05 0.017 0.033 0.003 0.025 0.02 0.04 0.009 0.077
Festivals -0.079 0.14 0.057 0.146 0.148 0.043 0.108 0.204 M -0.198 0.11 0.073 0.11 -0.081 0.1 0.055 0.077

Table N.9: Cross average correlations for context and people related variables

Variables Degree centr. Betweenness centr. Closeness centr. Inclusiveness
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
News 0.028 0.061 19:31:54 15:58:8 0.041 0.151 23:6:17 7:43:22 0.046 0.102 0:0:0 0:0:0 0.026 0.102 6:54:26 13:29:49
Sport 0.077 0.07 20:23:1 13:14:7 0.006 0.067 21:44:44 12:18:36 -0.061 0.01 7:13:46 4:25:29 0.003 0.074 13:18:45 11:37:11
Festivals -0.113 0.128 25:32:5 8:17:34 0.134 0.059 0:0:0 0:0:0 M 0.177 0.054 0:0:0 0:0:0 0.038 0.176 20:43:32 14:17:38

Table N.10: Cross average correlations for context and people related variables (continued)

Variables Social Equality Popularity Experience People of int.
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
News 0.076 0.119 20:35:44 16:4:6 -0.076 0.064 19:35:51 16:4:32 0.0 0.083 3:42:24 5:26:1 -0.048 0.044 14:11:50 14:39:55
Sport 0.031 0.058 16:45:52 11:25:19 0.037 0.055 20:14:18 9:20:22 0.014 0.051 18:31:49 10:37:48 0.03 0.142 8:53:39 9:47:48
Festivals M -0.198 0.14 15:29:57 12:18:2 0.094 0.172 31:2:49 4:32:34 -0.074 0.018 17:29:33 9:31:4 0.024 0.093 12:29:37 12:53:12

Table N.11: Cross average correlations for context and people related variables

Variables News Sport Festivals
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Degree centr. 0.029 0.061 -12:21:41 14:53:57 0.064 0.058 -21:16:41 13:41:48 -0.074 0.098 -7:49:13 10:24:24
Betweenness centr. -0.033 0.078 -12:27:59 11:3:51 0.066 0.003 -28:28:4 2:49:51 M 0.154 0.049 -16:40:17 8:57:54
Closeness centr. 0.058 0.127 -13:41:41 10:53:24 0.02 0.068 -9:6:39 9:6:39 M 0.213 0.026 -22:59:31 12:26:25
Inclusiveness 0.02 0.123 -11:13:5 12:57:10 -0.025 0.051 -7:9:3 12:49:9 0.038 0.152 -7:9:35 11:58:55
Social Equality 0.066 0.083 -10:19:23 11:42:47 0.037 0.045 -15:57:23 11:54:8 M -0.181 0.131 -1:54:39 3:45:44
Popularity -0.048 0.069 -13:44:50 12:9:35 0.028 0.045 -17:10:59 15:3:44 0.084 0.15 -17:10:45 15:4:58
Experience -0.003 0.078 -3:15:31 4:1:22 0.019 0.047 -10:26:53 10:3:7 -0.067 0.019 -1:59:49 2:26:50
People of int. -0.046 0.046 -4:29:52 7:17:23 0.042 0.116 -2:1:34 2:37:17 0.033 0.093 -20:21:32 11:23:3

Table N.12: Pearson average correlations for weather and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Rain 0.006 0.092 0.004 0.051 -0.044 0.108 -0.022 0.041 -0.093 0.067 -0.016 0.171
Sun 0.009 0.096 -0.026 0.075 -0.073 0.169 0.128 0.072 0.017 0.037 -0.015 0.111
Temperature 0.011 0.069 -0.067 0.129 -0.006 0.14 0.07 0.16 M 0.158 0.118 0.154 0.101
Max Temp -0.017 0.035 -0.067 0.185 0.098 0.118 -0.023 0.157 M 0.17 0.153 0.135 0.172
Fog 0.033 0.078 0.008 0.015 -0.011 0.024 -0.016 0.024 -0.022 0.025 -0.031 0.024
Thunderstorm -0.006 0.018 0.03 0.042 0.026 0.055 0.031 0.019 -0.049 0.021 -0.058 0.015
Wind 0.017 0.092 0.014 0.097 -0.035 0.151 0.026 0.04 0.002 0.113 -0.042 0.181
Moon -0.031 0.091 0.007 0.088 0.051 0.084 0.003 0.129 -0.004 0.186 -0.038 0.204

Table N.13: Spearman average correlations for weather and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Rain 0.009 0.102 0.004 0.049 -0.05 0.105 -0.021 0.06 -0.075 0.05 -0.04 0.15
Sun 0.029 0.08 -0.026 0.076 -0.063 0.161 0.147 0.058 0.014 0.033 0.042 0.074
Temperature 0.055 0.036 -0.064 0.1 0.002 0.133 0.11 0.134 0.142 0.113 M 0.236 0.127
Max Temp -0.012 0.111 -0.057 0.158 0.1 0.136 0.021 0.109 M 0.177 0.166 M 0.195 0.174
Fog 0.002 0.03 0.007 0.014 -0.014 0.021 -0.015 0.017 -0.022 0.025 -0.042 0.028
Thunderstorm 0.018 0.04 0.033 0.033 0.029 0.053 0.039 0.019 -0.051 0.021 -0.063 0.016
Wind 0.016 0.082 0.008 0.092 -0.04 0.16 0.048 0.046 -0.012 0.118 -0.021 0.136
Moon 0.0 0.135 0.011 0.09 0.055 0.091 -0.008 0.119 -0.023 0.181 -0.066 0.353

Table N.14: Cross average correlations for weather and content related variables

Variables Originality Emotionality Sentiment
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Rain 0.015 0.121 12:23:34 10:45:31 0.004 0.094 17:22:58 12:43:33 -0.056 0.117 6:8:1 5:0:20
Sun 0.015 0.099 7:30:53 13:12:8 0.004 0.091 15:42:58 13:7:37 -0.067 0.174 1:31:32 2:42:24
Temperature 0.023 0.076 6:7:9 7:16:23 -0.069 0.129 0:39:37 0:56:1 -0.002 0.142 2:18:14 4:36:29
Max Temp -0.009 0.039 4:18:28 4:30:2 -0.067 0.185 0:0:0 0:0:0 0.102 0.123 3:51:47 7:29:59
Fog 0.047 0.075 14:39:1 15:1:5 0.026 0.025 21:56:48 8:11:7 -0.025 0.034 13:43:7 9:36:9
Thunderstorm -0.009 0.025 19:27:17 12:53:2 0.039 0.068 11:40:11 6:43:25 0.018 0.06 5:33:41 9:48:56
Wind 0.03 0.104 7:15:20 11:58:24 0.031 0.106 10:58:32 11:40:23 -0.019 0.169 4:6:52 8:13:44
Moon -0.032 0.092 2:2:59 4:5:59 0.003 0.095 14:36:19 12:59:20 0.052 0.087 18:3:1 14:49:36

Table N.15: Cross average correlations for weather and content related variables (continued)

Variables Newsworthiness Readability Intensity
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Rain -0.026 0.049 11:55:18 13:22:34 -0.118 0.061 11:45:17 10:16:53 -0.017 0.174 11:16:57 12:29:30
Sun M 0.154 0.057 7:40:22 10:30:44 0.004 0.076 14:42:59 13:50:22 -0.016 0.116 5:16:54 7:31:43
Temperature 0.088 0.17 1:45:43 2:14:22 M 0.163 0.118 7:6:9 13:49:26 0.155 0.102 3:12:55 6:25:51
Max Temp -0.019 0.16 2:0:15 4:0:31 M 0.174 0.155 1:41:14 3:22:28 0.134 0.174 11:12:3 14:30:3
Fog -0.004 0.04 8:50:37 7:22:2 -0.052 0.05 15:14:47 11:39:46 -0.032 0.026 30:3:1 6:23:38
Thunderstorm 0.031 0.04 18:30:23 11:16:13 -0.046 0.044 15:41:25 12:58:44 -0.063 0.016 19:52:8 11:41:19
Wind 0.036 0.059 4:43:48 3:13:12 0.003 0.114 0:42:56 1:25:52 -0.041 0.183 4:55:1 9:50:2
Moon 0.006 0.143 8:51:30 10:35:33 -0.025 0.206 12:51:10 15:46:29 -0.043 0.208 13:15:7 16:17:44

Table N.16: Cross average correlations for weather and content related variables

Variables Rain Sun Temperature Max Temp
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Originality 0.014 0.095 -7:19:26 14:3:42 -0.001 0.11 -11:22:33 10:37:13 -0.005 0.079 -18:6:0 14:44:34 -0.024 0.063 -19:41:43 16:7:6
Emotionality -0.009 0.071 -7:33:54 9:16:28 0.008 0.094 -13:12:29 12:25:25 -0.052 0.156 -6:26:45 12:12:10 -0.064 0.198 -10:38:15 12:21:57
Sentiment -0.033 0.114 -5:14:7 4:51:11 -0.085 0.195 -8:13:3 11:54:29 -0.015 0.152 -6:25:42 12:12:36 0.107 0.111 -2:32:14 5:4:29
Newsworthiness -0.017 0.065 -10:48:38 12:26:42 0.115 0.094 -4:2:58 8:5:57 0.068 0.164 -0:50:29 1:40:58 -0.036 0.159 -4:38:47 9:17:35
Readability -0.101 0.055 -2:23:44 4:47:28 -0.001 0.044 -6:6:32 9:56:49 0.145 0.139 -4:0:54 8:1:49 M 0.161 0.168 -3:49:12 6:19:53
Intensity -0.006 0.185 -13:25:21 16:26:26 -0.003 0.118 -13:46:54 16:52:58 M 0.159 0.095 -8:20:53 13:4:25 0.138 0.179 -21:8:21 14:13:42

Table N.17: Cross average correlations for weather and content related variables (continued)

Variables Fog Thunderstorm Wind Moon
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Originality 0.064 0.075 -9:17:27 13:8:54 -0.013 0.029 -8:14:30 8:17:50 0.024 0.097 -14:0:43 16:52:18 -0.039 0.154 -28:25:20 9:24:48
Emotionality 0.036 0.023 -15:41:26 12:43:47 0.026 0.046 -6:29:2 10:47:14 0.037 0.107 -13:6:41 12:23:13 0.008 0.106 -13:14:30 13:3:47
Sentiment -0.007 0.026 -6:38:35 13:17:11 0.03 0.056 -3:39:56 3:57:5 -0.035 0.151 -0:0:0 0:0:0 0.065 0.105 -12:32:5 9:43:45
Newsworthiness -0.008 0.034 -2:3:1 4:6:3 0.043 0.014 -8:19:39 7:48:0 0.029 0.051 -3:3:29 4:32:38 0.009 0.147 -13:27:0 16:30:45
Readability -0.012 0.035 -5:52:12 8:21:30 -0.039 0.038 -4:16:59 8:33:59 -0.015 0.119 -6:20:53 12:14:41 0.002 0.2 -7:39:1 13:56:2
Intensity -0.032 0.026 -7:54:25 11:18:48 -0.061 0.019 -11:16:39 13:54:44 -0.024 0.189 -19:56:50 16:20:40 -0.055 0.213 -26:41:52 13:25:14

Table N.18: Pearson average correlations for holiday and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Public Holidays 0.007 0.039 -0.015 0.041 -0.041 0.073 0.023 0.078 -0.033 0.04 -0.057 0.034
School Holidays 0.027 0.174 -0.045 0.148 0.021 0.125 0.086 0.081 M 0.228 0.192 M 0.246 0.23
Const. Ind. holidays 0.027 0.146 0.053 0.111 -0.058 0.158 0.005 0.092 0.099 0.215 -0.013 0.194
weekend -0.027 0.106 M -0.159 0.069 0.173 0.109 0.0 0.179 M 0.179 0.068 N 0.39 0.246

Table N.19: Spearman average correlations for holiday and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Public Holidays 0.014 0.054 -0.018 0.049 -0.04 0.068 -0.008 0.033 -0.027 0.028 -0.084 0.056
School Holidays 0.041 0.122 -0.04 0.109 0.001 0.104 0.017 0.04 M 0.202 0.204 M 0.241 0.359
Const. Ind. holidays 0.06 0.083 0.05 0.095 -0.112 0.123 -0.029 0.122 0.1 0.221 0.058 0.296
weekend -0.103 0.138 M -0.152 0.049 0.172 0.099 0.011 0.184 M 0.19 0.062 N 0.372 0.214

Table N.20: Cross average correlations for holiday and content related variables

Variables Originality Emotionality Sentiment
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Public Holidays 0.01 0.056 15:30:7 12:18:12 0.012 0.053 8:10:21 10:26:56 -0.028 0.09 14:20:47 15:22:27
School Holidays 0.035 0.18 11:43:34 9:37:1 -0.049 0.152 4:34:10 8:29:49 0.024 0.133 15:40:40 14:0:38
Const. Ind. holidays 0.009 0.157 21:3:19 12:4:3 0.059 0.113 23:36:31 12:23:58 -0.066 0.168 14:19:14 14:34:27
weekend -0.006 0.131 12:33:12 14:11:36 M -0.19 0.041 11:39:27 14:13:45 0.181 0.118 4:58:33 7:19:21

Table N.21: Cross average correlations for holiday and content related variables (continued)

Variables Newsworthiness Readability Intensity
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Public Holidays 0.004 0.098 20:2:34 11:59:11 -0.04 0.052 15:28:9 13:6:56 -0.059 0.034 17:14:34 13:22:34
School Holidays 0.099 0.09 18:28:41 10:36:32 M 0.229 0.196 14:1:4 14:55:48 M 0.245 0.231 12:50:46 15:52:21
Const. Ind. holidays 0.003 0.115 20:15:15 11:59:45 0.077 0.232 24:9:59 13:45:18 -0.017 0.198 20:50:22 17:1:28
weekend 0.017 0.2 18:36:52 12:43:50 M 0.218 0.076 17:11:49 14:32:3 N 0.392 0.249 13:39:43 15:58:46

Table N.22: Cross average correlations for holiday and content related variables

Variables Public Holidays School Holidays Const. Ind. holidays weekend
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Originality 0.024 0.066 -15:28:14 14:33:19 0.03 0.176 -9:2:59 8:10:59 0.04 0.144 -13:46:54 16:52:58 -0.037 0.116 -6:9:55 12:19:51
Emotionality 0.005 0.06 -7:54:24 13:7:42 -0.051 0.149 -4:52:23 7:27:9 0.052 0.111 -0:34:28 0:49:13 M -0.185 0.06 -15:15:45 7:49:44
Sentiment -0.037 0.077 -7:49:59 13:11:26 0.036 0.139 -7:5:11 14:10:23 -0.058 0.158 -0:14:20 0:28:41 M 0.179 0.109 -3:36:52 5:19:57
Newsworthiness 0.048 0.083 -5:54:30 7:14:14 0.111 0.081 -16:17:58 14:7:52 0.006 0.093 -3:46:18 7:32:36 0.019 0.192 -5:49:24 8:9:20
Readability -0.031 0.041 -3:51:4 6:31:54 M 0.228 0.193 -5:17:6 8:56:31 0.1 0.216 -7:0:8 14:0:17 M 0.198 0.064 -16:21:1 13:57:28
Intensity -0.058 0.034 -10:52:24 11:24:11 M 0.246 0.23 -7:0:8 14:0:17 -0.013 0.197 -28:56:23 5:55:51 N 0.395 0.245 -9:59:25 12:58:12

Table N.23: Pearson average correlations for festival and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
Dance Valley -0.025 0.033 -0.031 0.035 0.046 0.093 0.03 0.077 0.008 0.025 0.057 0.178
Sensation 0.002 0.013 -0.028 0.041 0.048 0.065 -0.049 0.054 0.042 0.089 0.019 0.055
Pukkelpop 0.058 0.208 -0.071 0.076 -0.012 0.078 0.038 0.092 0.056 0.194 0.122 0.219
Pinkpop 0.022 0.135 -0.027 0.106 -0.004 0.02 -0.094 0.013 0.003 0.064 0.093 0.274
Lowlands 0.056 0.147 -0.075 0.063 0.026 0.069 0.025 0.114 0.029 0.132 0.12 0.231

Table N.24: Spearman average correlations for festival and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
Dance Valley -0.006 0.036 -0.025 0.029 0.06 0.118 0.032 0.076 0.006 0.029 0.054 0.185
Sensation 0.01 0.047 -0.036 0.055 0.042 0.054 -0.063 0.076 0.047 0.097 0.032 0.093
Pukkelpop -0.05 0.16 -0.074 0.083 -0.011 0.084 0.04 0.09 0.071 0.204 0.137 0.254
Pinkpop -0.023 0.06 -0.038 0.106 0.001 0.029 -0.091 0.015 0.001 0.073 0.087 0.31
Lowlands -0.03 0.067 -0.084 0.069 0.032 0.087 0.033 0.121 0.045 0.148 0.128 0.253

Table N.25: Cross average correlations for festival and content related variables

Variables Originality Emotionality Sentiment
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Dance Valley 0.035 0.099 13:39:0 11:29:1 -0.037 0.05 17:40:32 12:47:29 0.076 0.091 16:14:31 8:57:6
Sensation 0.019 0.045 14:49:34 13:45:7 -0.017 0.053 15:26:19 8:11:42 0.048 0.066 6:37:51 6:41:47
Pukkelpop 0.071 0.23 17:1:24 11:17:41 -0.07 0.082 9:40:53 9:43:4 0.001 0.097 13:3:32 15:3:42
Pinkpop 0.012 0.141 11:25:1 11:36:48 0.001 0.127 8:33:54 8:6:34 -0.03 0.038 20:59:25 10:34:33
Lowlands 0.088 0.237 21:2:40 11:41:56 -0.077 0.073 14:54:5 15:51:59 0.05 0.076 14:14:43 10:38:38

Table N.26: Cross average correlations for festival and content related variables (continued)

Variables Newsworthiness Readability Intensity
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Dance Valley 0.066 0.078 21:31:32 7:51:27 -0.007 0.071 19:56:15 13:10:41 0.057 0.178 8:39:48 12:20:4
Sensation -0.053 0.051 14:39:13 14:48:41 0.033 0.094 13:14:41 9:6:22 0.022 0.061 17:9:58 11:27:6
Pukkelpop 0.076 0.126 21:29:32 11:32:45 0.047 0.201 10:32:8 8:58:45 0.143 0.246 23:18:8 12:6:58
Pinkpop -0.105 0.015 7:5:20 4:59:50 0.012 0.069 8:46:38 12:40:50 0.092 0.274 19:20:10 12:49:34
Lowlands 0.075 0.136 21:27:57 14:50:52 0.017 0.143 18:15:55 10:28:36 0.138 0.248 25:56:40 13:6:19

Table N.27: Cross average correlations for festival and content related variables

Variables Dance Valley Sensation Pukkelpop
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Originality -0.009 0.049 -8:50:29 8:50:51 -0.006 0.026 -23:31:54 13:40:26 0.054 0.213 -8:36:45 11:31:13
Emotionality -0.016 0.052 -9:15:26 7:49:10 -0.014 0.05 -18:9:43 11:1:17 -0.073 0.076 -1:26:4 2:52:8
Sentiment 0.103 0.119 -15:50:50 10:46:32 0.054 0.062 -9:47:26 13:57:34 -0.011 0.079 -3:13:34 5:23:23
Newsworthiness 0.063 0.065 -17:1:33 10:29:5 -0.029 0.069 -4:20:48 4:41:50 0.059 0.084 -9:43:40 12:31:23
Readability 0.026 0.027 -15:32:40 15:1:36 0.05 0.089 -21:10:55 14:19:23 0.063 0.201 -13:46:54 16:52:58
Intensity 0.063 0.19 -6:9:55 12:19:51 0.019 0.055 -0:23:18 0:40:22 0.121 0.219 -3:54:38 7:49:16

Table N.28: Cross average correlations for festival and content related variables (continued)

Variables Pinkpop Lowlands
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
Originality 0.044 0.154 -10:38:48 14:4:45 0.039 0.156 -10:37:55 14:4:16
Emotionality -0.007 0.128 -10:52:23 14:0:29 -0.075 0.072 -11:16:14 13:54:18
Sentiment 0.034 0.042 -11:40:29 12:21:19 0.039 0.077 -17:6:44 14:38:20
Newsworthiness -0.094 0.013 -0:0:0 0:0:0 0.056 0.105 -9:13:51 12:8:43
Readability 0.003 0.065 -0:44:7 1:28:15 0.032 0.135 -6:59:6 13:58:13
Intensity 0.099 0.285 -6:10:54 12:13:36 0.12 0.231 -0:0:0 0:0:0

Table N.29: Pearson average correlations for context and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ µ(ρP) σ
News -0.042 0.044 -0.003 0.077 -0.06 0.041 -0.008 0.04 -0.077 0.089 -0.086 0.113
Sport -0.009 0.031 0.041 0.058 -0.009 0.046 -0.082 0.051 -0.043 0.025 -0.079 0.01
Festivals -0.003 0.135 -0.103 0.134 0.131 0.134 -0.075 0.07 M 0.17 0.184 M 0.232 0.268

Table N.30: Spearman average correlations for context and content related variables

Variables Originality Emotionality Sentiment Newsworthiness Readability Intensity
µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ µ(ρS) σ
News -0.037 0.05 0.003 0.076 -0.068 0.048 -0.003 0.038 -0.079 0.087 -0.11 0.172
Sport -0.02 0.062 0.04 0.059 -0.009 0.053 -0.074 0.05 -0.051 0.025 -0.107 0.034
Festivals -0.01 0.168 -0.146 0.098 0.138 0.12 -0.043 0.088 0.232 0.14 N 0.442 0.207

Table N.31: Cross average correlations for context and content related variables

Variables Originality Emotionality Sentiment
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
News -0.05 0.043 5:29:58 7:31:37 0.015 0.091 9:56:57 8:38:22 -0.084 0.07 17:4:11 14:38:3
Sport 0.07 0.113 17:4:50 8:34:26 0.109 0.085 18:51:57 12:5:38 -0.01 0.07 24:8:30 12:36:52
Festivals -0.012 0.138 8:1:5 11:31:26 -0.097 0.153 13:9:5 10:53:56 0.138 0.134 12:9:57 14:55:50

Table N.32: Cross average correlations for context and content related variables (continued)

Variables Newsworthiness Readability Intensity
µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ µ(ρX) σ µ(lag) σ
News 0.035 0.086 18:58:36 14:51:12 -0.108 0.13 18:58:55 11:32:36 -0.087 0.114 11:51:38 13:12:57
Sport -0.103 0.042 13:23:5 9:59:20 -0.09 0.043 18:14:31 10:26:37 -0.082 0.01 18:25:54 11:31:31
Festivals -0.085 0.08 17:24:18 13:50:13 M 0.192 0.181 16:1:44 14:33:25 M 0.23 0.27 6:9:55 12:19:51

Table N.33: Cross average correlations for context and content related variables

Variables       News                                  Sport                                 Festivals
                µ(ρX)    σ      µ(lag)     σ          µ(ρX)    σ      µ(lag)     σ          ρX       σ      µ(lag)     σ
Originality     -0.026   0.062  -17:9:46   13:36:28   -0.033   0.009  -8:21:59   12:5:4     -0.006   0.139  -4:29:53   8:59:46
Emotionality    0.006    0.083  -9:5:2     13:18:59   0.039    0.061  -2:46:20   5:32:40    -0.106   0.133  -3:6:49    6:13:39
Sentiment       -0.06    0.041  -0:0:0     0:0:0      0.007    0.056  -9:5:54    11:12:35   0.134    0.133  -3:27:38   4:58:14
Newsworthiness  -0.001   0.051  -3:43:45   6:23:4     -0.077   0.059  -7:5:11    14:10:23   -0.075   0.07   -0:0:0     0:0:0
Readability     -0.098   0.114  -9:26:11   13:46:17   -0.065   0.065  -1:56:5    3:52:11    0.148    0.229  -12:56:14  14:32:9
Intensity       -0.088   0.117  -20:18:4   13:33:10   -0.081   0.01   -4:42:52   6:6:40     M 0.235  0.269  -28:16:42  6:23:17

Appendix O

Overview correlations real and online variables for individual datasets

Table O.1: Correlation between online and real world variables for Dance Valley

Dist Event - Res | Dist. Event - User | Distance Users | Degree centr. | Betweenness centr. | Closeness centr. | Inclusiveness | Social Equality | Popularity | Experience | People of int. | Originality | Emotionality | Sentiment | Newsworthiness | Readability | Intensity

Public Holidays 0.065 - -0.063 - - - -0.072 - 0.099 - - - - - 0.054 -0.067 - 0.025

School Holidays N -0.415 0.085 M -0.228 M -0.338 M -0.192 0.079 N 0.453 0 - M 0.183 M 0.235 M -0.312 M 0.227 M 0.202 0.112 N 0.401 N 0.437 M 0.229

Const. Ind. holidays M -0.346 M 0.193 M -0.152 M -0.322 M -0.217 -0.142 N 0.411 0.124 -0.08 0.074 N 0.397 M -0.203 M 0.255 0.071 -0.141 N 0.502 M 0.334 M 0.233

weekend N -0.394 0.106 N -0.419 -0.134 M -0.228 0.117 M 0.254 0.061 M 0.298 N 0.391 -0.146 M -0.164 M -0.255 M 0.235 M 0.327 M 0.291 N 0.647 M 0.263

Rain 0.141 0.084 0.104 -0.134 -0.054 -0.12 - -0.112 -0.091 -0.109 0.131 M 0.238 M 0.17 M -0.194 -0.084 M -0.204 M -0.237 0.13

Sun 0 -0.068 - 0.139 M 0.241 M 0.189 -0.138 0.058 M 0.271 M 0.237 -0.09 0 M -0.172 N -0.446 M 0.228 0.1 0.137 0.148

Temperature -0.124 -0.062 0.086 -0.129 0.112 0.143 0.112 0 - 0.11 M 0.159 -0.067 M 0.238 M -0.199 -0.064 M 0.338 M 0.163 0.124

Max Temp M -0.213 -0.082 0.068 M -0.252 M -0.152 - M 0.297 -0.067 M -0.156 - M 0.209 -0.075 M 0.313 0.107 M -0.214 N 0.399 M 0.189 M 0.164

Fog 0.085 - 0.054 - - - - -0.056 0 - - M 0.188 - - -0.062 - - 0.026

Thunderstorm 0.065 - 0.088 - 0.056 0.145 - - 0 - - 0 0.144 - - -0.056 -0.065 0.036

Wind 0.061 0.077 0.058 0.085 0 0.126 0 -0.093 0.121 -0.064 0 0.148 M -0.157 M -0.308 - -0.086 M 0.189 0.092

Moon M 0.285 M -0.262 M 0.289 M 0.296 M 0.199 M 0.279 0.105 0.101 M -0.203 M -0.171 -0.07 M -0.222 0.143 0.069 M -0.273 0.109 M -0.17 M 0.191

Dance Valley M -0.293 0.127 M -0.322 0.065 M -0.247 0.103 -0.057 - 0.103 M 0.2 0.082 -0.091 -0.119 M 0.331 M 0.191 - N 0.442 M 0.163

Sensation - - - - - - - - - - - - - - - - - 0.0

Pukkelpop 0.129 0 0.108 -0.074 -0.079 -0.069 0.067 -0.089 - - - - 0.066 0.077 0.056 -0.144 0 0.056

Pinkpop 0.129 0 0.077 - 0.058 -0.059 0.076 -0.085 0 0 - M 0.336 - 0 -0.076 0.053 - 0.056

Lowlands 0.085 - 0.071 - -0.056 - 0 -0.057 - - - - - 0.055 - -0.099 0 0.025

News M 0.164 M -0.33 M 0.19 -0.069 M 0.192 M 0.185 - -0.132 0.062 -0.084 -0.086 - 0.116 -0.123 0.098 M -0.315 M -0.155 0.135

Sport M 0.254 - 0.137 0.091 0.069 0.088 - -0.083 - - - M 0.27 - -0.074 M -0.17 0 -0.082 0.078

Festivals 0 -0.11 0.133 -0.115 0.105 M 0.239 0 M -0.231 -0.132 -0.089 -0.143 -0.059 M 0.185 0.148 -0.085 M -0.249 -0.133 0.127

M 0.162 0.079 0.132 0.112 0.113 0.104 0.102 0.067 0.081 0.086 0.087 0.119 0.128 0.132 0.112 M 0.171 M 0.169

Table O.2: Correlation between online and real world variables for Sensation

Dist Event - Res | Dist. Event - User | Distance Users | Degree centr. | Betweenness centr. | Closeness centr. | Inclusiveness | Social Equality | Popularity | Experience | People of int. | Originality | Emotionality | Sentiment | Newsworthiness | Readability | Intensity

Public Holidays 0.063 0.083 0.049 0.087 -0.114 -0.081 0.097 -0.1 -0.092 -0.13 -0.061 0.138 -0.087 -0.094 M 0.174 - -0.079 0.09

School Holidays M -0.172 M -0.163 -0.141 M -0.156 - 0.07 -0.053 -0.1 0.053 0.063 - M 0.195 M -0.171 M 0.186 0.144 M 0.277 M 0.315 0.133

Const. Ind. holidays M 0.17 M -0.218 0.07 M 0.209 -0.053 0.09 0.118 N 0.382 M -0.224 0.149 -0.086 -0.106 -0.05 M -0.174 0.106 -0.149 M -0.185 0.149

weekend M -0.261 0.147 M -0.178 M -0.247 M 0.164 0.056 -0.102 M -0.288 M 0.163 -0.128 M 0.201 0.139 M -0.186 M 0.34 -0.133 M 0.327 N 0.524 M 0.211

Rain M 0.248 -0.071 M 0.195 M 0.249 0.113 0.075 - M 0.214 M -0.167 0.101 -0.111 0.063 0.048 M -0.191 0.057 M -0.175 M -0.172 0.132

Sun -0.056 0.042 -0.063 0.04 -0.056 -0.07 M -0.186 -0.077 M 0.254 M 0.243 -0.109 M -0.175 0.101 0.114 0.075 -0.057 -0.135 0.109

Temperature -0.134 -0.064 -0.11 -0.146 0.084 0.067 M -0.221 M -0.174 M 0.285 M 0.173 0.094 -0.065 -0.083 M 0.244 M -0.176 M 0.181 M 0.325 M 0.155

Max Temp M -0.161 -0.102 -0.115 -0.15 0.1 0.136 M -0.187 M -0.151 M 0.226 0.05 0.104 0 M -0.175 M 0.28 M -0.175 M 0.221 N 0.408 M 0.161

Fog M 0.195 - M 0.23 0.051 0.045 -0.042 0.043 - - - - 0 0.074 -0.046 - -0.072 0 0.047

Thunderstorm 0.068 -0.089 0 0.097 0.044 0.035 0.051 0.094 -0.052 0.038 - - -0.039 -0.037 0.047 -0.063 -0.037 0.047

Wind M 0.218 - M 0.252 0.068 -0.131 -0.084 -0.037 0.116 -0.074 0.052 0 -0.075 M 0.172 -0.082 0.093 M -0.199 M -0.337 0.117

Moon M -0.301 M 0.173 M -0.211 M -0.323 0 - M -0.216 N -0.448 N 0.396 0.065 M 0.151 0.075 M -0.152 M 0.257 - M 0.336 N 0.366 M 0.204

Dance Valley 0.044 - - 0.069 -0.06 0.039 -0.142 0.08 0.043 - 0.123 0 0.059 - - 0.053 -0.034 0.044

Sensation -0.065 - -0.044 -0.105 0.119 -0.065 - -0.142 0.091 -0.1 M 0.227 0.098 -0.101 M 0.161 -0.141 M 0.203 0.127 0.105

Pukkelpop 0.069 0.076 0.04 M 0.262 0.131 M 0.235 M -0.188 M 0.273 -0.12 M 0.16 -0.045 N 0.504 -0.085 -0.096 M 0.222 -0.09 -0.058 M 0.156

Pinkpop 0.046 0.049 - -0.067 -0.094 -0.069 0.082 0.072 -0.065 -0.085 0.132 -0.035 0.146 0.043 -0.11 -0.081 -0.038 0.071

Lowlands 0 0.056 - M 0.263 0.135 M 0.218 M -0.189 M 0.256 -0.125 M 0.171 -0.042 N 0.555 -0.081 -0.096 M 0.184 -0.11 -0.053 0.149

News M 0.158 0.102 0.078 0.09 -0.11 -0.068 M 0.191 M 0.196 M -0.202 0.119 -0.093 -0.074 -0.069 M -0.167 M 0.153 M -0.228 -0.121 0.131

Sport M 0.355 0.048 N 0.45 M 0.156 0.073 -0.051 -0.092 0.052 0.045 0.094 M 0.314 0.055 M 0.222 -0.087 -0.056 -0.047 -0.093 0.135

Festivals M -0.319 M -0.189 M -0.222 M -0.336 M 0.202 M 0.188 M -0.194 N -0.461 N 0.393 -0.062 0.137 0.135 M -0.209 N 0.379 M -0.156 N 0.457 N 0.659 M 0.276

M 0.155 0.084 0.122 M 0.159 0.091 0.087 0.119 M 0.184 M 0.154 0.099 0.102 0.124 0.115 M 0.154 0.11 M 0.166 M 0.203

Table O.3: Correlation between online and real world variables for Pukkelpop

Dist Event - Res | Dist. Event - User | Distance Users | Degree centr. | Betweenness centr. | Closeness centr. | Inclusiveness | Social Equality | Popularity | Experience | People of int. | Originality | Emotionality | Sentiment | Newsworthiness | Readability | Intensity

Public Holidays 0.067 0.118 0.128 0.113 - - 0.048 0.081 0 0.068 0.11 -0.023 0.065 0.096 -0.076 -0.101 -0.036 0.066

School Holidays -0.107 M -0.193 -0.145 -0.094 - - -0.095 -0.049 0.061 M -0.156 -0.081 0.14 M -0.204 M -0.158 M 0.241 M 0.249 M 0.338 0.136

Const. Ind. holidays 0.066 0.052 0.112 0.131 - - 0.045 0.102 0.032 -0.044 -0.081 M 0.215 - M -0.258 -0.054 -0.043 0.08 0.077

weekend M -0.16 -0.083 -0.122 -0.081 - - M -0.151 M -0.176 0.141 0.081 0.045 -0.129 M -0.194 M 0.236 M 0.179 0.138 -0.051 0.116

Rain -0.093 -0.063 -0.074 0 - - 0.059 0.061 -0.077 0.093 -0.101 -0.051 -0.071 0.109 0.067 -0.094 -0.043 0.062

Sun -0.063 0.081 0.054 0.075 - - -0.084 -0.133 0.04 0.059 0.082 0.13 0.056 0.085 0.13 -0.097 0.071 0.073

Temperature 0.026 -0.068 0.126 -0.038 - - M -0.167 M -0.23 0.105 - 0.112 0.146 -0.145 -0.094 M 0.268 M 0.213 0.131 0.11

Max Temp 0.031 -0.078 M 0.17 0.049 - - -0.112 M -0.227 M 0.186 0.048 0.084 -0.121 M -0.176 -0.059 M 0.208 M 0.228 -0.146 0.113

Fog 0.023 0.035 0.037 - - - -0.026 0.03 - 0.048 0.021 - 0.042 0.042 -0.025 -0.042 0 0.022

Thunderstorm 0.045 0.043 0.076 0.108 - - 0.098 0.044 -0.044 0.083 0.085 -0.049 0.077 -0.062 0.067 -0.057 -0.093 0.061

Wind -0.135 0.058 -0.02 -0.087 - - -0.074 M -0.191 0.068 0.054 0.066 0.046 0.053 M 0.192 0.097 M 0.152 0.063 0.08

Moon 0.025 0.034 0.145 0.026 - - -0.069 -0.11 0.094 M 0.165 0.065 -0.118 -0.105 -0.064 0.063 -0.091 -0.145 0.078

Dance Valley 0.055 0.119 0.047 M 0.162 - - 0.088 0.088 0 0.041 0.044 M 0.177 0.033 0.096 -0.045 -0.126 -0.035 0.068

Sensation 0.022 0.034 0.022 0.02 - - - - 0.063 - - - - - -0.022 - - 0.011

Pukkelpop 0.079 M -0.175 -0.06 M -0.207 - - M -0.204 M -0.287 0.028 -0.136 M 0.251 M -0.184 M -0.188 -0.136 M 0.221 N 0.406 M 0.32 M 0.169

Pinkpop 0.049 0 0.093 0.111 - - 0.075 0.081 0.081 0.069 0.045 -0.03 0.135 0.067 -0.104 0.04 -0.045 0.06

Lowlands -0.102 M -0.156 -0.116 -0.145 - - -0.125 M -0.215 0.056 -0.109 M 0.158 -0.103 M -0.179 0.101 M 0.28 M 0.21 M 0.237 0.135

News 0.032 0.107 0.14 0.108 - - 0.069 0.132 -0.055 0.1 -0.033 -0.049 0.095 0.042 -0.058 -0.102 -0.125 0.073

Sport 0.056 0.136 -0.026 0.126 - - 0.079 0.062 0.05 0.073 0.115 -0.034 0.115 0.055 -0.129 -0.079 -0.063 0.071

Festivals -0.065 -0.052 M -0.153 -0.035 - - -0.08 M -0.159 0.129 -0.07 0.041 M -0.196 M -0.195 - 0.064 M 0.233 0.045 0.089

0.065 0.084 0.093 0.086 0.0 0.0 0.087 0.123 0.066 0.075 0.081 0.097 0.106 0.098 0.12 0.135 0.103

Table O.4: Correlation between online and real world variables for Pinkpop

Dist Event - Res | Dist. Event - User | Distance Users | Degree centr. | Betweenness centr. | Closeness centr. | Inclusiveness | Social Equality | Popularity | Experience | People of int. | Originality | Emotionality | Sentiment | Newsworthiness | Readability | Intensity

Public Holidays -0.052 0.062 0.04 -0.047 - - M -0.158 0.124 0.065 0.038 -0.072 -0.053 0.073 M -0.152 -0.076 0.043 -0.115 0.069

School Holidays -0.057 0.025 -0.071 - - - M -0.222 0.1 -0.042 0.052 -0.089 0.107 -0.068 0.054 0.111 -0.146 M -0.209 0.079

Const. Ind. holidays -0.027 -0.046 -0.054 - - - -0.141 0.084 -0.037 0.06 -0.055 M 0.158 -0.043 M 0.191 M 0.17 -0.063 -0.117 0.073

weekend 0.134 -0.093 -0.064 0.051 - - M 0.355 -0.044 0.061 M -0.165 0.025 -0.094 M -0.247 - M -0.191 M 0.175 N 0.547 0.132

Rain 0.078 -0.04 -0.025 0.084 - - 0.091 0.057 0.076 -0.059 0.023 -0.066 -0.112 0.032 -0.062 -0.056 M 0.227 0.064

Sun -0.045 -0.027 0.081 -0.104 - - -0.092 -0.123 0.06 0.044 M 0.155 -0.041 0.092 -0.086 0.126 0.081 M -0.168 0.078

Temperature 0.051 -0.05 0.064 -0.026 - - 0.051 M -0.165 0.048 -0.036 0.136 0.048 M -0.223 0.067 M 0.233 -0.084 0.028 0.077

Max Temp 0.089 -0.058 -0.022 0.083 - - 0.13 -0.044 0.037 -0.077 0.038 0.036 M -0.234 M 0.186 0.099 -0.109 M 0.16 0.082

Fog 0.079 0.019 M 0.17 0.032 - - -0.033 0.031 0.036 0.022 - 0.101 - -0.033 0.046 -0.046 0 0.038

Thunderstorm -0.021 0.041 -0.028 0.043 - - 0.021 -0.047 0.071 -0.037 -0.047 0.018 -0.027 0.1 0.048 -0.091 -0.055 0.041

Wind 0.036 -0.052 0.039 0.046 - - 0.049 0.095 -0.051 0.118 0.043 -0.097 0.129 0.056 -0.059 0.041 -0.135 0.062

Moon 0.027 0.117 -0.034 -0.033 - - M -0.316 M 0.196 0.038 0.055 M -0.222 M 0.202 0.068 0.055 M 0.192 M -0.289 M -0.237 0.122

Dance Valley -0.016 -0.035 -0.023 0.024 - - -0.029 0.039 0 0.046 0.02 0.121 0.016 0.081 0.111 0.065 0 0.037

Sensation 0 0.023 0.022 -0.018 - - -0.021 0.031 0 0.02 0.022 0 0.019 0.026 0.04 -0.036 -0.014 0.017

Pukkelpop 0.059 -0.076 M 0.281 -0.024 - - -0.075 0.055 0.068 0.049 -0.025 0.033 -0.047 0.098 0.044 -0.078 -0.041 0.062

Pinkpop 0.121 -0.034 0.04 0.061 - - N 0.405 M -0.166 0.029 M -0.189 0.126 -0.131 M -0.224 -0.065 -0.121 0.101 N 0.669 0.146

Lowlands 0.084 -0.071 M 0.278 0.023 - - -0.085 0.055 0.064 0.065 -0.028 -0.024 -0.052 0.083 0.073 -0.088 -0.046 0.066

News 0.054 -0.044 0.029 - - - 0.136 0.053 -0.045 -0.063 0.032 -0.106 -0.118 -0.096 -0.071 0.094 0.142 0.064

Sport -0.037 0.043 -0.059 -0.046 - - -0.057 0.087 0.09 0.034 -0.051 -0.042 0.099 0.045 -0.07 M -0.19 -0.082 0.061

Festivals 0.056 -0.123 0.073 0.046 - - M 0.324 -0.07 0.062 -0.101 0.073 -0.119 M -0.212 0.132 M -0.156 0.133 M 0.348 0.119

0.056 0.054 0.075 0.04 0.0 0.0 0.139 0.083 0.049 0.066 0.064 0.08 0.105 0.082 0.105 0.1 M 0.167

Table O.5: Correlation between online and real world variables for Lowlands

Dist Event - Res | Dist. Event - User | Distance Users | Degree centr. | Betweenness centr. | Closeness centr. | Inclusiveness | Social Equality | Popularity | Experience | People of int. | Originality | Emotionality | Sentiment | Newsworthiness | Readability | Intensity

Public Holidays M 0.261 0.06 0.066 0.048 - - -0.071 0.056 M 0.194 0.075 0.052 0.053 0.051 0.048 0.109 -0.07 -0.042 0.074

School Holidays -0.044 -0.148 0.101 M -0.168 - - M 0.202 -0.132 0.038 0.046 0.087 0.047 -0.052 -0.096 -0.022 N 0.371 M 0.348 0.112

Const. Ind. holidays -0.022 -0.017 0 -0.041 - - -0.132 0.129 0.055 -0.063 -0.072 0.048 0.111 M -0.159 -0.063 0.146 M -0.191 0.073

weekend 0.081 -0.114 0.082 -0.068 - - 0.104 -0.112 -0.084 0.03 M 0.158 M 0.167 -0.12 0.139 -0.096 M 0.16 M 0.322 0.108

Rain -0.074 -0.048 -0.055 0.054 - - 0.074 -0.072 0.068 0.071 0.051 -0.093 -0.073 0.029 -0.088 -0.059 M 0.191 0.065

Sun -0.062 0.062 0.067 -0.126 - - -0.077 -0.13 0.057 -0.108 0.122 0.119 0.032 -0.061 M 0.212 -0.015 0.075 0.078

Temperature 0.049 -0.044 0.103 M -0.2 - - 0.024 M -0.164 0.047 -0.085 M 0.17 -0.064 -0.06 -0.08 M 0.163 0.097 M 0.151 0.088

Max Temp 0.024 -0.03 0.063 -0.04 - - -0.08 - -0.049 -0.098 M 0.155 0.031 -0.049 0.042 -0.081 0.086 0.079 0.053

Fog 0.081 0.053 0.06 0.07 - - -0.056 0.068 -0.03 -0.055 -0.046 0.056 0.04 -0.05 0.053 -0.128 -0.082 0.055

Thunderstorm 0.041 -0.047 0.038 0.079 - - -0.041 0.084 0.059 -0.054 0.085 -0.033 0.041 0.063 0.055 0.039 -0.074 0.049

Wind -0.048 0.031 0.056 -0.101 - - -0.072 -0.099 0.087 -0.097 0.107 M 0.16 0.048 0.048 0.049 0.022 0.106 0.067

Moon 0.045 0.043 -0.046 0.085 - - -0.074 0.092 -0.039 -0.026 0.066 -0.131 0.061 0.017 0.106 M -0.156 -0.108 0.064

Dance Valley -0.022 0.058 -0.015 0.057 - - -0.081 0.073 0.072 0.039 0.042 -0.021 -0.03 -0.048 0.07 -0.047 -0.045 0.042

Sensation 0.014 0.023 0.018 -0.014 - - -0.042 0.031 0.014 -0.018 0.016 - 0.04 0.025 -0.029 0.013 -0.016 0.018

Pukkelpop -0.063 -0.101 0.09 -0.141 - - M 0.269 M -0.218 -0.043 0.062 M 0.206 0.016 -0.1 0.064 -0.072 M 0.167 N 0.542 0.127

Pinkpop -0.031 0.074 -0.061 0.065 - - -0.043 0.106 -0.026 0.036 0.105 0.042 0.045 0.079 -0.111 -0.056 -0.061 0.055

Lowlands -0.041 -0.11 0.09 -0.144 - - M 0.241 M -0.211 -0.079 0.062 M 0.266 -0.037 -0.125 0.125 -0.059 M 0.187 N 0.584 0.139

News 0.045 0.092 -0.084 0.091 - - -0.138 M 0.163 -0.05 -0.06 -0.061 -0.063 0.057 -0.076 0.068 -0.113 M -0.18 0.079

Sport 0.087 0.069 0.06 0.099 - - -0.112 0.075 0.117 -0.043 -0.048 0.102 0.146 0.089 -0.091 M -0.159 -0.089 0.081

Festivals 0.029 -0.07 0.091 -0.124 - - 0.107 -0.091 0.047 -0.049 0.063 M 0.164 -0.073 0.066 -0.092 M 0.241 M 0.252 0.092

0.058 0.065 0.062 0.091 0.0 0.0 0.102 0.105 0.063 0.059 0.099 0.072 0.068 0.07 0.084 0.116 M 0.177