Analyzing Social Media Data to Enrich Human-Centric Information
ANALYZING SOCIAL MEDIA DATA TO ENRICH HUMAN-CENTRIC INFORMATION FOR NATURAL DISASTER MANAGEMENT

A Dissertation Submitted to Kent State University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by
Zheye Wang
2018

© Copyright
All rights reserved
Except for previously published materials

Dissertation written by
Zheye Wang
B.S., Shandong Normal University, China, 2011
M.S., University of Chinese Academy of Sciences, China, 2014
Ph.D., Kent State University, 2018

Approved by
Dr. Xinyue Ye, Chair, Doctoral Dissertation Committee
Dr. Scott Sheridan, Members, Doctoral Dissertation Committee
Dr. Jay Lee
Dr. Ye Zhao

Accepted by
Dr. Scott Sheridan, Chair, Department of Geography
Dr. James L. Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
  1.1 Background
  1.2 Web 2.0, VGI, and natural disaster management
  1.3 Social media analytics for natural disaster management
  1.4 Dissertation synopsis
CHAPTER 2 LITERATURE REVIEW
  2.1 Introduction
  2.2 Four dimensions
    2.2.1 Space
    2.2.2 Time
    2.2.3 Content
    2.2.4 Network
  2.3 Focusing on social media information
  2.4 Fusing social media data with authoritative data
    2.4.1 Fusing with remote-sensing data
    2.4.2 Fusing with census data
  2.5 Conclusion
CHAPTER 3 ANALYZING WILDFIRE TWITTER ACTIVITIES: SPACE, TIME, CONTENT, AND NETWORK
  3.1 Introduction
  3.2 Data and methodology
    3.2.1 Data
    3.2.2 Methods
  3.3 Spatial and temporal analysis of wildfire Twitter activities
  3.4 Topics and network
  3.5 Conclusion
CHAPTER 4 SPACE, TIME, AND SITUATIONAL AWARENESS IN NATURAL HAZARDS: A CASE STUDY OF HURRICANE SANDY WITH SOCIAL MEDIA DATA
  4.1 Introduction
  4.2 Data and methodology
    4.2.1 Hurricane Sandy tweets in New York City
    4.2.2 Cleaning and classifying Hurricane Sandy tweets
    4.2.3 Location quotient: detecting area-specific topic
    4.2.4 Markov transition probability matrix: measuring temporal transition of area-specific topic
  4.3 Results
    4.3.1 Data description
    4.3.2 Top frequent terms
    4.3.3 Spatial visualization of area-specific topic
    4.3.4 Temporal transition of area-specific topics
  4.4 Discussion
CHAPTER 5 CONCLUSION
  5.1 Summary
  5.2 Limitations
  5.3 Beyond natural disaster management
References

LIST OF FIGURES

Figure 1. Combinations of four dimensions in social media data
Figure 2. A summary of papers that focus on analyzing one dimension of social media data
Figure 3. A summary of papers where multiple dimensions are involved
Figure 4. Temporal evolution of wildfire-related tweets with keywords of ‘fire’ and ‘wildfire’
Figure 5. Temporal evolutions of tweets with keywords including (a) Bernardo (b) San Marcos
Figure 6. Spatial distribution of geo-tagged ‘fire’ and ‘wildfire’ tweets
Figure 7. Dual kernel density estimation of geo-tagged tweets on Bernardo fire
Figure 8. Dual kernel density estimation of geo-tagged tweets on Cocos fire
Figure 9. Spatial distribution of population in San Diego County
Figure 10. Term frequency plot
Figure 11. Indegree cumulative distribution of the retweet network
Figure 12. Outdegree cumulative distribution of the retweet network
Figure 13. The major part of the retweet network
Figure 14. Kernel density map of Sandy tweets (arrow and scale are in the lower left corner)
Figure 15. The spatial distribution of area-specific topics for total Sandy tweets
Figure 16. The spatial distribution of area-specific topics in the Before group
Figure 17. The spatial distribution of area-specific topics in the During group
Figure 18. The spatial distribution of area-specific topics in the After group

LIST OF TABLES

Table 1. Combinations of four dimensions and corresponding articles
Table 2. Combinations of four dimensions and data analysis tasks
Table 3. Strengths and Limitations of remote sensing, social media, and census