Manifestation of Real World Social Events on Twitter
Radboud University
Master Thesis Computer Science

Author: M. Van de Voort
Supervisor: dr. S. Verberne
Second reader: prof. dr. T.M. Heskes

August 13, 2014

ABSTRACT

Situations in which many people come together can result in very interesting and positive, lively social events. Unfortunately, large events with many people involved also carry risks, ranging from small injuries to death. It would be useful if social media could be used to monitor these events, and to predict unwanted situations and casualties. To know in advance how an event will develop based on information on social media, a good understanding of the relation between the development of real world events and their manifestation online is needed. In this work, we study the relationship between real world events and their online manifestation. We create a model of both the real world event and its online manifestation. We compare these two models using data about five real world social events and Twitter data about them. We determine the correlation between the online and real world model. We find several weak to moderate correlations between online and real world characteristics of events. The intensity, readability and sentiment of the tweets are examples of variables in the online model that show a correlation with the real world, while weekends, school holidays and the position of the moon are examples of real world variables that manifest themselves on Twitter.

Contents

1 Introduction
  1.1 Research questions
  1.2 Methodology
2 Related Work
  2.1 Event detection in social media
    2.1.1 Twitter
    2.1.2 Event detection in social media other than Twitter
  2.2 Process of event detection
    2.2.1 Data pre-processing
    2.2.2 New event detection
    2.2.3 Event tracking and known event detection
    2.2.4 Event prediction
    2.2.5 Event summarizing
3 Definition of events
  3.1 Time and duration
  3.2 Place
  3.3 People involved
  3.4 Associated events
  3.5 Content
4 Model of online and real-world social events
  4.1 Model of real-world events
    4.1.1 Time
    4.1.2 Place
    4.1.3 People involved
    4.1.4 Associated events
    4.1.5 Content
    4.1.6 Context
  4.2 Model of online manifestation of events
    4.2.1 Notation
    4.2.2 Time and duration
    4.2.3 Place
    4.2.4 People
    4.2.5 Associated events
    4.2.6 Content
  4.3 Comparison between models
5 Dataset selection and preprocessing
  5.1 Data selection and preprocessing
  5.2 Tweets
  5.3 The relations between users within the datasets
6 Implementation of online and real world variables
  6.1 Real world features
    6.1.1 Associated events
    6.1.2 Content
    6.1.3 Contextual events
  6.2 Online features
    6.2.1 Time
    6.2.2 Place
    6.2.3 People
    6.2.4 Content
  6.3 Comparison between the models
7 Results and analysis
  7.1 Relations between real world variables
  7.2 Relations between online variables
    7.2.1 Place related variables
    7.2.2 People related variables
    7.2.3 Content related variables
    7.2.4 Place and people related variables
    7.2.5 Place and content related variables
    7.2.6 People and content related variables
  7.3 Relations between online and real world variables
    7.3.1 Correlations between online place and real world variables
    7.3.2 Correlations between the relations between people online and the real world variables
    7.3.3 Correlations between the tweet content and the real world variables
  7.4 Discussion
8 Conclusion
  8.1 Future work
Bibliography
List of Figures
List of Tables
Appendices
A Weather information
B Holidays
C Related events
D Content events
  D.1 Program Dance Valley
  D.2 Program Sensation
    D.2.1 Program Pinkpop
    D.2.2 Program Pukkelpop
    D.2.3 Program Pukkelpop
    D.2.4 Program Pukkelpop
    D.2.5 Program Lowlands
E Real World Variables
F Online place related variables
G Online people related variables
H Online content related variables
I Correlations real world variables
J Correlations online variables
K Average correlations online variables
L Correlations online and real world variables
M Average correlations real world and online variables (part 1)
N Average correlations real world and online variables (part 2)
O Overview correlations real and online variables for individual datasets

Chapter 1
Introduction

Situations in which many people come together can result in very interesting and positive, lively social events. Unfortunately, large events with many people involved also carry risks, ranging from small injuries to death [40, 59]. In recent years we have seen, for example, a collapsed tent at a pop festival¹, a birthday party that got out of hand², a Loveparade that became so crowded that people got hurt and even died³, New Year's Eve parties that ended in riots⁴, and soccer matches with violent supporters⁵. In all of these cases many people were involved in or witnessed the event. The events started as a positive happening and ended in negative sentiment, with casualties, chaos, or even vandalism, fights and deaths.

Knowing how events develop might help decrease these risks, or might enable early intervention and prevent escalation. To monitor an event closely, we can use information that is available on social media: people who are involved in the event often publish all sorts of information on these media, including information about the event. Currently an average of 5,700 tweets per second is produced by Twitter users [76].
This data contains information about personal activities, social interactions, public opinion, news, developments in science and arts, regional information about weather, traffic, social activity and much more. Many people have the possibility to share information “on the road”. In the third quarter of 2013, 72% of people in the Netherlands had a smartphone⁶. In the last year, 94% of people used the internet and 56% used mobile internet⁷. This allows people not only to share information on social media after the fact, but to share it “as it happens”.

To use the information people publish online during an event to predict future developments and to accurately anticipate unwanted situations, it is necessary to first understand how the online developments relate to the developments that happen in the real world.

In this research we investigate the relation between the online world and the real world, within the scope of music festivals. We first study medium to large scale music events without irregularities, to get a better understanding of how the real world and the online world relate to each other in a normal situation.
We choose these events because we expect that many young people visit them, and we assume that these young people often use social media, providing us with enough information on social media about the event. In our research we limit our scope to Twitter.

We hope our work will contribute to a better understanding of how messages online relate to occurrences in the real world. We think our results can contribute to the development of new techniques, or the improvement of existing techniques, that predict developments in events involving large crowds, in order to prevent disturbances and to allow detection of irregularities at an early stage.

¹ http://nos.nl/video/265456-tenten-omgewaaid-op-pukkelpop.html
² http://www.volkskrant.nl/vk/nl/2686/Binnenland/article/detail/3326464/2012/10/04/Project-X-Haren-Niet-zo-janken-gewoon-feesten.dhtml
³ http://www.volkskrant.nl/vk/nl/2664/Nieuws/archief/article/detail/1011559/2010/07/26/19-doden-en-342-gewonden-door-paniek-op-dancefeest.dhtml
⁴ http://nos.nl/artikel/592166-100-mensen-gearresteerd-in-veen.html
⁵ http://www.telegraaf.nl/feed/22365935/__Ook_rellen_bij_Euroborg__.html
⁶ http://www.telecompaper.com/news/dutch-smartphone-penetration-hits-72-in-q3–973995
⁷ http://www.cbs.nl/nl-NL/menu/themas/dossiers/eu/publicaties/archief/2013/2013-3851-wm.htm

1.1 Research questions

In this research, we look into large social events in the real world and compare them with what happens online in social media. Our assumption is that the development of the online manifestation of a real world event in social media is strongly related to the development of the event in the real world. Our research question is:

How do developments of social events in the real world relate to messages about these events on Twitter?

We divide this question into the following sub-questions:

1. Which characteristics of events can be used to describe the social events offline and their online manifestation?
2. How do the online and real world characteristics of events relate to each other? Which online characteristics represent real world developments of events most distinctively?
3. How can the relation between events and their online manifestation be used to predict the developments of events based on online information from social media?

1.2 Methodology

We investigate the relation between the development of an event in the real world and the manifestation of this event in messages on Twitter in four steps:

1.