Crisees: Real-Time Monitoring of Social Media Streams to Support Crisis Event Management
David Maxwell
0800660
[email protected]

School of Computing Science
College of Science and Engineering
Sir Alwyn Williams Building
University of Glasgow
G12 8QQ

March 23, 2012

Abstract

Social media streams provide access to unprecedented amounts of information describing events as they unfold [17]. Tapping into these real-time sources can provide the authorities and agencies dealing with emergencies and crises with valuable information, helping to improve their situational awareness of these events. Such events include the recent Strathclyde University fire and the English riots of 2011.

Spinsanti and Ostermann analysed how Twitter¹ could provide useful information about European Forest Fires [34]. In their analysis, they found that the locations of tweets regarding outbreaks of fire were closely correlated with the officially recorded locations of each fire. Using social media in this way has led to the idea that citizens can act as “social sensors” [14]. Whilst using such social sensors can provide valuable operational intelligence at ground level and in real-time, there are numerous problems that need to be resolved in order to manage these new sources of information effectively. These include: collecting and processing data in real-time, filtering and aggregating the content, assessing the integrity of the material, identifying cogent information, and finally both visualising and conveying the information [34].

To tackle these issues, this project introduces Crisees, a prototype application which can collect, filter, analyse and index content from multiple social media streams in near real-time. Using a novel framework, this information is then made available through the Crisees API to a decoupled Django²-based web application, which visualises the collected data in terms of sentiment and geographical content.
This thesis discusses the background, design and implementation of Crisees, as well as describing an in-depth evaluation of the developed product, partially undertaken by Strathclyde Police. In addition, the thesis covers the Strathclyde University fire of February 2012, which Crisees was used to monitor, demonstrating the potential of this prototype.

¹ http://www.twitter.com
² http://www.djangoproject.com

I would like to take this opportunity to express my deepest gratitude to Dr. Leif Azzopardi³, Professor Sarah Oates⁴, Professor Chris Johnson⁵ and Mr. Stefan Raue⁶. Without their advice and guidance throughout my honours year, this project would have seemed nigh-on impossible. In addition, I would like to thank Dr. Karen Renaud⁷, the reader for this project. Finally, I would like to thank Mr. Stewart Borthwick⁸ from Strathclyde Police for agreeing to evaluate the implemented prototype. Thank you for all your help and guidance!

³ RCUK Research Fellow, School of Computing Science, University of Glasgow.
⁴ Professor of Political Communication, School of Social and Political Sciences, University of Glasgow.
⁵ Professor of Computing Science, School of Computing Science, University of Glasgow.
⁶ Research Assistant, School of Computing Science, University of Glasgow.
⁷ Senior Lecturer, School of Computing Science, University of Glasgow.
⁸ Regional Resilience Advisor, Strathclyde Emergencies Co-ordination Group.

Education Use Consent

I hereby give my permission for this project to be shown to other University of Glasgow students and to be distributed in an electronic format. Please note that you are under no obligation to sign this declaration, but doing so would help future students.

Name:
Signature:

Contents

I Introduction, Background and Requirements

1 Introduction
  1.1 Motivation and Context
  1.2 Potential End-Users and Geographical Context
    1.2.1 Potential End-Users
    1.2.2 Geographical Context
  1.3 Research Objectives
  1.4 Summary of the Contributions
  1.5 Remaining Document Outline

2 Background and Related Work
  2.1 What is Social Media?
    2.1.1 Social Media and Web 2.0
    2.1.2 The Classification of Social Media
  2.2 Social Media Regarding Crisis Management
    2.2.1 The Overarching Issues
  2.3 Related Work
    2.3.1 Spinsanti and Ostermann: Retrieving VGI Data for Forest Fires
    2.3.2 Haiti 2010 Earthquake: The Ushahidi Project
    2.3.3 O’Connor et al.: Aggregation of Posts
    2.3.4 Årup Nielsen: Sentiment Analysis in Microblogs
  2.4 Abstract Prototype Design
  2.5 Summary

3 Pilot Analysis Studies
  3.1 Sauchiehall Street Fire
  3.2 Hurricane Bawbag
  3.3 Strathclyde University Fire
  3.4 Glasgow City Centre Restaurant Siege
  3.5 Observations

4 Requirements Specification
  4.1 Background
  4.2 Functional Requirements
    4.2.1 Backend Server/Information Retrieval
    4.2.2 Middleware
    4.2.3 Interface
  4.3 Non-Functional Requirements
    4.3.1 Product Requirements
    4.3.2 Organisational Requirements
    4.3.3 Ethical Requirements

II Crisees: A Detailed Summary

5 Architecture and Implementation Overview
  5.1 General Architecture Overview
  5.2 Technology Choices
    5.2.1 The Backend
    5.2.2 The API
    5.2.3 The Web Interface
  5.3 Implementation Directory Structure

6 The Backend
  6.1 Conceptual Overview
  6.2 Subcomponent Implementation
    6.2.1 Passing Information
  6.3 Data Management
    6.3.1 Relational Database Schema
    6.3.2 Social Media Indexing Schema
  6.4 Manager Subcomponents
    6.4.1 Analysis Manager
    6.4.2 Sourcing Manager
  6.5 The Collector Subcomponent
    6.5.1 Chosen Social Media APIs
    6.5.2 Collector Classes
  6.6 Filter and Analysis Pipelines
  6.7 The Indexer Subcomponent
    6.7.1 Social Media Indexing Schema
    6.7.2 Saving the Information

7 Filtering and Analysis Pipeline Components
  7.1 Filtering Pipeline Components
    7.1.1 Query-based Filter
    7.1.2 Date Filter
  7.2 Analysis Pipeline Components
    7.2.1 Sentiment Analysis
    7.2.2 Geographical Analysis

8 The Crisees API
  8.1 Technologies Selected
  8.2 Design Considerations
    8.2.1 API Architecture
    8.2.2 Chosen SaaS Approach
    8.2.3 URI Design
    8.2.4 Response Format
  8.3 Producing a Stream

9 The Web Interface
  9.1 Requirements Summarisation
  9.2 Interface Design
    9.2.1 Adding an Event
    9.2.2 Viewing Events
  9.3 Implementation Aspects
    9.3.1 Implementation Structure
    9.3.2 API Interaction
    9.3.3 Mapping Library
    9.3.4 Use of AJAX

III Evaluation and Discussion

10 Evaluation
  10.1 Iteration 1
    10.1.1 Evaluation Structure
    10.1.2 Think Aloud Tasks
    10.1.3 Results
  10.2 Impromptu Evaluation
    10.2.1 Query Term/Hashtag Development
    10.2.2 Geographical Analysis
    10.2.3 Resulting Impression
  10.3 Iteration 2
    10.3.1 Evaluation Structure
    10.3.2 Interface and Questionnaire Feedback
    10.3.3 Sentiment Analysis Component Evaluation
  10.4 Iteration 3
    10.4.1 Evaluation Structure
    10.4.2 Feedback
  10.5 Summary