WITNESSES OF EVENTS IN SOCIAL MEDIA

Marie Truelove

Submitted in total fulfilment of the requirements of the degree of Doctor of Philosophy

February 2018

Department of Infrastructure Engineering
The University of Melbourne

Marie Truelove: Witnesses of Events in Social Media, © February 2018

ABSTRACT

Social networks now rank amongst the world's most popular websites. Academics and industry alike recognise the opportunities provided by the vast quantities of user-generated content. Opportunistically harvesting information to derive event intelligence is now actively sought by numerous applications, including emergency management and journalism, and pursued by research interests including crisis informatics and new event detection. However, these opportunities come with numerous recognised challenges. Micro-blogging streams are characterised as noisy, and the information relevant to any task is typically only a fraction of that posted. Additionally, the possibility of ambiguous, misleading, or fake content about events erodes trust in the information that is retrieved. Consequently, this research pursues micro-blogs that are witness accounts of events, and the micro-bloggers who post them.

A literature review of social media research related to event witnessing identifies that witness accounts are sought by numerous domains. Their existence can be used to confirm an event's occurrence, and the information they contain can improve situation awareness during an emergency and provide credibility to breaking news stories. However, the review also identifies varying definitions of witnessing concepts, and gaps in knowledge and solutions. Lacking are fundamental definitions of witnessing evidence and counter-evidence that distinguish inferences by observation, experience, and proximity, in multi-modal forms. Current location inference approaches for micro-blogs and micro-bloggers are resolved at scales inadequate to infer human observation or experience.
Additionally, lags in research progress are identified. For example, research on micro-blog image category classification lags behind research on micro-blog text classification, and exploration of a micro-blogger's posting history for contextual understanding lags behind analysis of individual micro-blogs.

The hypothesis of this research is that micro-bloggers who are witnesses of events can be identified by evidence contained in their micro-blogs. To test this hypothesis, an incremental experimental approach is adopted, comprising three stages, each building on the foundation achieved by the previous one. Each stage balances the pragmatic requirement of automation with the in-depth understanding gained by detailed human analysis of case study events with varying characteristics. The first stage seeks to identify and define inferential evidence of event witnessing in micro-blogs, including witness accounts. The second stage demonstrates the automatic extraction of witnessing evidence and counter-evidence from micro-blog content. The final stage demonstrates the combination of extracted evidence and counter-evidence to identify micro-bloggers who are likely witnesses, and tests the consistency and certainty of this identification.

Experimental outcomes include advanced original models of text, image and geotag evidence that support inferences of witnessing, and of counter-evidence to test the status of potential witnesses. The use of counter-evidence to identify conflict in a micro-blogger's posting history represents a new approach to information assessment for this purpose, supporting the interrogation of evidence for many applications. Combining evidence from geotags, images, and text across a micro-blogger's complete posting history is demonstrated to identify more evidence and more potential witnesses than baseline methods that consider individual geotag or text content only.
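The table of contents lists Dempster-Shafer theory (DST) under testing witnessing evidence, and the abstract describes combining evidence with counter-evidence while detecting conflict. As a minimal sketch only, assuming a two-hypothesis frame {witness, non-witness} and made-up mass values for three hypothetical items from one posting history (a geotag, a text report and a piece of counter-evidence), Dempster's rule of combination can fuse the items and report the conflict mass K at each step. The function names and numbers are illustrative assumptions, not the thesis's formulation.

```python
from itertools import product

# Frame of discernment: a micro-blogger either witnessed the event or did not.
W, N = "witness", "non-witness"
THETA = frozenset({W, N})


def combine(m1, m2):
    """Dempster's rule: combine two mass functions; return (masses, conflict K)."""
    joint = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            joint[overlap] = joint.get(overlap, 0.0) + mass_a * mass_b
        else:
            # Empty intersection: the two items contradict each other.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the evidence cannot be combined")
    # Renormalise so the non-conflicting mass sums to 1.
    return {s: v / (1.0 - conflict) for s, v in joint.items()}, conflict


def combine_all(mass_functions):
    """Fold Dempster's rule over a posting history, recording conflict per step."""
    combined, conflicts = mass_functions[0], []
    for m in mass_functions[1:]:
        combined, k = combine(combined, m)
        conflicts.append(k)
    return combined, conflicts


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass that does not contradict the hypothesis."""
    return sum(v for s, v in m.items() if s & hypothesis)


# Hypothetical mass assignments; unassigned mass goes to THETA (ignorance).
geotag = {frozenset({W}): 0.6, THETA: 0.4}   # geotag near the event
text = {frozenset({W}): 0.5, THETA: 0.5}     # first-person text report
counter = {frozenset({N}): 0.7, THETA: 0.3}  # counter-evidence, e.g. elsewhere at the time

combined, conflicts = combine_all([geotag, text, counter])
print("conflict per step:", conflicts)
print("belief:", belief(combined, frozenset({W})))
print("plausibility:", plausibility(combined, frozenset({W})))
```

With these invented values the geotag and text items do not conflict, while the counter-evidence produces a conflict of about 0.56, leaving a belief in witnessing of about 0.55 and a plausibility of about 0.68. A high K is one simple way to flag contradictory evidence in a posting history.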
Combining evidence across sources and across a micro-blogger's posting history contributes to alleviating the sparsity of relevant information. A demonstration of automatic evidence extraction includes a new application of image category classification by the bag-of-words procedure. Experiments use case studies from a range of event types, which contributes towards the generalisation of models. The knowledge gained enables the introduction of a framework of processes for identifying potential witnesses of events by the evidence they post to social media.

DECLARATION

This is to certify that:
1. this thesis comprises only my original work towards the degree of Doctor of Philosophy,
2. due acknowledgment has been made in the text to all other material used,
3. the thesis is fewer than 100,000 words in length exclusive of tables, maps, bibliographies and appendices.

Australia, February 2018
Marie Truelove

PUBLICATIONS

This thesis is based on works published during my PhD candidature. Its contents, including ideas, algorithms and figures, have appeared in the following publications:

Journal articles

Truelove M, Vasardani M, Winter S. Towards credibility of micro-blogs: characterising witness accounts. GeoJournal. 2015; 80(3):339-359.
Truelove M, Khoshelham K, McLean S, Winter S, Vasardani M. Identifying Witness Accounts from Social Media Using Imagery. ISPRS International Journal of Geo-Information. 2017; 6(4).
Truelove M, Vasardani M, Winter S. Testing the event witnessing status of micro-bloggers from evidence in their micro-blogs. PLoS ONE. 2017; 12(12):e0189378.

Peer-reviewed conference articles

Truelove M, Vasardani M, Winter S. Testing a model of witness accounts in social media. In: Proceedings of the 8th Workshop on Geographic Information Retrieval. Dallas, USA: ACM; 2014.
Truelove M, Vasardani M, Winter S. Introducing a framework for automatically differentiating Witness Accounts of Events from Social Media. In: Proceedings Research@Locate ’16. Melbourne, Australia; 2016. p. 13-18.
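The abstract names a new application of image category classification by the bag-of-words (BoW) procedure, also the subject of the 2017 ISPRS International Journal of Geo-Information article listed above. The following is a minimal sketch of the generic BoW idea under stated assumptions: local image descriptors are already extracted (synthetic 8-dimensional vectors stand in for real ones), the visual-word codebook is learned with plain k-means using deterministic farthest-point initialisation, and a nearest-centroid classifier stands in for whatever classifier the thesis actually uses. The category labels and all helper names are hypothetical.

```python
import numpy as np


def sq_dists(x, centres):
    """Squared Euclidean distance from every row of x to every centre."""
    return ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)


def kmeans(descriptors, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [descriptors[0]]
    for _ in range(1, k):
        # Next centre: the descriptor farthest from all centres chosen so far.
        nearest = sq_dists(descriptors, np.array(centres)).min(axis=1)
        centres.append(descriptors[nearest.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(iters):
        assignment = sq_dists(descriptors, centres).argmin(axis=1)
        for j in range(k):
            members = descriptors[assignment == j]
            if len(members):  # keep the previous centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def encode(image_descriptors, codebook):
    """BoW vector: L1-normalised histogram of each descriptor's nearest visual word."""
    words = sq_dists(image_descriptors, codebook).argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


def fit_centroids(histograms, labels):
    """Mean BoW histogram per category (a simple stand-in classifier)."""
    labels = np.array(labels)
    classes = sorted(set(labels.tolist()))
    return classes, np.stack([histograms[labels == c].mean(axis=0) for c in classes])


def predict(hist, classes, centroids):
    """Assign the category whose mean histogram is closest."""
    return classes[int(((centroids - hist) ** 2).sum(axis=1).argmin())]


# Hypothetical synthetic data: each "image" is a set of 30 local descriptors.
rng = np.random.default_rng(42)
witness_centre, other_centre = np.full(8, 1.5), np.full(8, -1.5)


def synthetic_image(centre, n=30):
    return rng.normal(centre, 0.3, size=(n, centre.size))


images = [synthetic_image(witness_centre) for _ in range(10)]
images += [synthetic_image(other_centre) for _ in range(10)]
labels = ["witness"] * 10 + ["other"] * 10

codebook = kmeans(np.vstack(images), k=4)
hists = np.stack([encode(im, codebook) for im in images])
classes, centroids = fit_centroids(hists, labels)
print(predict(encode(synthetic_image(witness_centre), codebook), classes, centroids))
```

The codebook size k plays the role of the "number of words" whose effect the table of contents lists as an experiment; in practice the descriptors would come from a real feature extractor and the classifier would be tuned rather than fixed.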
ACKNOWLEDGEMENTS

I would like to thank Stephan Winter and Maria Vasardani for their supervision throughout my candidature. Their support and direction ensured that my research progressed and improved with each passing year.

I would also like to thank my advisory committee and research group for their contributions at different times throughout my candidature. Too many colleagues assisted my work in some way to list individually, but in particular I would like to thank Kourosh Khoshelham, Tim Baldwin, Michael Rigby and Junchul Kim.

Finally, I would like to thank my family: Patrick, Aidan and Sophie, who have provided unwavering encouragement; my parents Bruce and Julie and in-laws Ann and Terry, for their practical support; and my siblings and their partners.

CONTENTS

1 Introduction
  1.1 Background and problem statement
  1.2 Hypothesis and research questions
  1.3 Research objectives and approach
  1.4 Research scope
  1.5 Significance of the study
  1.6 Structure
2 Literature review
  2.1 Motivations
    2.1.1 Direct interest in social media event witnessing
    2.1.2 Seeking credibility and relevance
    2.1.3 Location inference continues to challenge
  2.2 Witnessing concepts
    2.2.1 Spatial
    2.2.2 Topics and themes
    2.2.3 Information lifecycle
    2.2.4 Source
    2.2.5 Quality
    2.2.6 Event
    2.2.7 Temporal
    2.2.8 Summary
  2.3 Methods for extracting witness accounts
    2.3.1 Twitter case studies
    2.3.2 Caution when using social media data
    2.3.3 Manual human annotation
    2.3.4 Micro-blog categorisation
    2.3.5 Image category classification
    2.3.6 Summary
  2.4 Testing witnessing evidence
    2.4.1 Fusion
    2.4.2 An introduction to DST and related research implementations
3 Characterising witness accounts
  3.1 Introduction
  3.2 Defining witness accounts
    3.2.1 Influence regions
    3.2.2 Place descriptions
    3.2.3 Corroboration
  3.3 Experiments
    3.3.1 Event description
    3.3.2 Data collection
    3.3.3 Pre-processing to establish the OIO dataset
    3.3.4 Coding of characteristics
  3.4 Results
    3.4.1 Summary
    3.4.2 Witness accounts
    3.4.3 Impact accounts
    3.4.4 Coding process evaluation
    3.4.5 Place descriptions
    3.4.6 Influence regions
    3.4.7 Corroboration
  3.5 Discussion
  3.6 Summary
4 Testing a model of witness accounts
  4.1 Introduction
  4.2 Experiments
  4.3 Results
    4.3.1 Corroboration with meta-data locations
    4.3.2 Shark sighting
    4.3.3 Music concert
    4.3.4 Protest
    4.3.5 Cyclone
  4.4 Discussion
  4.5 Summary
5 Categorising images that are witness accounts
  5.1 Introduction
  5.2 Experiments
    5.2.1 Event descriptions
    5.2.2 Data collection
    5.2.3 Training data creation
    5.2.4 Posting behaviour: temporal and spatial analysis
    5.2.5 Image category classification by BoW
  5.3 Results
    5.3.1 Datasets
    5.3.2 Training data categorisation
    5.3.3 Posting behaviour
    5.3.4 Baseline classifier performance
    5.3.5 Effect of number of words
    5.3.6 Comparison of different classifiers
    5.3.7 Learning curve analysis
    5.3.8 Transfer learning
    5.3.9 Misclassification analysis
  5.4 Discussion
  5.5 Summary
6 A framework for witnessing evidence
  6.1 Introduction
  6.2 Defining evidence and counter-evidence of event witnessing
  6.3 Method
    6.3.1 Case study event
    6.3.2 Filter
    6.3.3 Extract
  6.4 Results
    6.4.1 Manually annotated dataset summary
    6.4.2 Analysis of text annotation
    6.4.3 Characteristics of OTG and NOTG text evidence for a broadcast sporting event
    6.4.4 Summary evidence categorisations for micro-blogs and micro-bloggers
    6.4.5 Conflict in the training datasets