ANALYSIS of the SPATIOTEMPORAL DISTRIBUTION of GEOLOCATED TWEETS By
Total Page:16
File Type:pdf, Size:1020Kb
TRUTHS AND RUMORS ON TWITTER: ANALYSIS OF THE SPATIOTEMPORAL DISTRIBUTION OF GEOLOCATED TWEETS by John J Naylor A Thesis Submitted to the Graduate Faculty of George Mason University in Partial Fulfillment of The Requirements for the Degree of Master of Science Geoinformatics and Geospatial Intelligence Committee: ___________________________________ Dr. Arie Croitoru, Thesis Director ___________________________________ Dr. Anthony Stefanidis, Committee Member ___________________________________ Dr. Andrew Crooks, Committee Member ___________________________________ Dr. Anthony Stefanidis, Department Chairperson ___________________________________ Dr. Donna M. Fox, Associate Dean, Office of Student Affairs & Special Programs, College of Science ___________________________________ Dr. Peggy Agouris, Dean, College of Science Date: _____________________________ Spring Semester 2017 George Mason University Fairfax, VA Truths and Rumors on Twitter: Analysis of the Spatiotemporal Distribution of Geolocated Tweets A Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at George Mason University by John J Naylor Bachelor of Science Brigham Young University, 2013 Director: Arie Croitoru, Professor Department of Geoinformatics and Geospatial Intelligence Spring Semester 2017 George Mason University Fairfax, VA Copyright 2017 John J Naylor All Rights Reserved ii DEDICATION This is dedicated to my wife Abbey, my son Jack and my daughter Grace. iii ACKNOWLEDGEMENTS I would like to first thank Dr. Arie Croitoru who for the past years has been a source of invaluable help and friendship. This project would not be what it is today without his extensive expertise. I also would like to thank my committee members Dr. Anthony Stefanidis and Dr. Andrew Crooks. Also a thanks to Dr. Andreas Züfle for helping to answer several clustering questions and allowing me to use an image from his lecture slides. My friend Chris Braniff also helped immensely in proofreading my final thesis and gave great feedback and criticism. Thank you to the GEOINT department at GMU for providing the Boston Bombing dataset used in this research project. I’d finally like to thank General Dynamics who helped encourage me to finish my thesis by allowing me time flexibility and management support in furthering my education. iv TABLE OF CONTENTS Page List of Tables .................................................................................................................... vii List of Figures .................................................................................................................. viii List of Equations ................................................................................................................. x List of Abbreviations and Symbols.................................................................................... xi Abstract ............................................................................................................................. xii 1. INTRODUCTION ....................................................................................................... 1 1.1 Twitter and Rumors .............................................................................................. 1 1.2 Motivating Examples ........................................................................................... 4 1.2.1 The Associated Press Twitter Hack .............................................................. 4 1.2.2 The Boston Bombing .................................................................................... 6 1.2.3 An Emerging Need ....................................................................................... 8 1.3 Overview of What Follows .................................................................................. 9 2. PRIOR WORK....................................................................................................... 10 2.1 Twitter as a News Outlet .................................................................................... 10 2.2 Credibility on Twitter ......................................................................................... 13 2.3 Spatial Propagation of Tweets............................................................................ 14 2.4 Twitter Use after Major Events .......................................................................... 17 3. RESEARCH OBJECTIVE .................................................................................... 19 4. DATA .................................................................................................................... 21 4.1 Data Collection ................................................................................................... 21 4.2 Data Preparation ................................................................................................. 22 4.3 Mapping the Tweets .......................................................................................... 32 5. RESULTS .............................................................................................................. 39 5.1 Time/Distance Graphs ........................................................................................ 39 5.2 Logistic Regression ............................................................................................ 46 5.3 Clustering .......................................................................................................... 51 5.4 Jaccard Coefficient ............................................................................................. 63 6. DISCUSSION ........................................................................................................ 72 6.1 Time/Distance Graphs ........................................................................................ 72 v 6.2 Logistic Regression ............................................................................................ 75 6.3 Clustering/Jaccard Coefficient ........................................................................... 75 7. CONCLUSION ...................................................................................................... 78 7.1 Future Work ....................................................................................................... 81 References ......................................................................................................................... 83 vi LIST OF TABLES Table Page Table 1 Gupta et al.'s top tweets of the Boston Bombing (Gupta et al. 2013). ................. 23 Table 2 The Ten Tweet Groups. ....................................................................................... 25 Table 3 Each tweet group and the number of clusters. ..................................................... 53 Table 4 The Jaccard Coefficient calculated for each cluster pair, 2 parts. ....................... 67 Table 5 Inconsistency Table ............................................................................................. 71 vii LIST OF FIGURES Figure Page Figure 1 The tweet sent out from the AP's hacked Twitter account (AP, 2013). ............... 5 Figure 2 FBI released photos asking for information of these two men (FBI, 2013). ........ 8 Figure 3 The first tweet in the collection with its metadata minus geolocation. .............. 21 Figure 4 A tweet that is geolocated to the user's precise coordinates. .............................. 22 Figure 5 Timeline of tweets in the Blood group. .............................................................. 27 Figure 6 Timeline of tweets in the Cell group. ................................................................. 27 Figure 7 Timeline of tweets in the Doctors group. ........................................................... 28 Figure 8 Timeline of tweets in the Dollar RT group. ....................................................... 28 Figure 9 Timeline of tweets in the Globe group. .............................................................. 29 Figure 10 Timeline of tweets in the More Bombs group. ................................................. 29 Figure 11 Timeline of tweets in the Pressure group. ........................................................ 30 Figure 12 Timeline of tweets in the Sandy Boy group. .................................................... 30 Figure 13 Timeline of tweets in the Sandy Girl group. .................................................... 31 Figure 14 Timeline of tweets in the Video group. ............................................................ 31 Figure 15 Heat map for the tweets in the Blood group. .................................................... 34 Figure 16 Heat map for the tweets in the Cell group. ....................................................... 34 Figure 17 Heat map for the tweets in the Doctors group. ................................................. 35 Figure 18 Heat map for the tweets in the Dollar RT group. ............................................. 35 Figure 19 Heat map for the tweets in the Globe group. .................................................... 36 Figure 20 Heat map for the tweets in the More Bombs group. ......................................... 36 Figure 21 Heat map for the tweets in the Pressure group. ................................................ 37 Figure 22 Heat map for the tweets in the Sandy Boy group. ............................................ 37 Figure 23 Heat map for the tweets in the Sandy Girl group. ............................................ 38 Figure