ANALYZING SOCIAL MEDIA DATA TO ENRICH HUMAN-CENTRIC INFORMATION

FOR NATURAL DISASTER MANAGEMENT

A Dissertation Submitted

to Kent State University in Partial

Fulfillment of the Requirements for the

Degree of Doctor of Philosophy

by

Zheye Wang

2018

© Copyright

All rights reserved

Except for previously published materials

Dissertation written by

Zheye Wang

B.S., Shandong Normal University, China, 2011

M.S., University of Chinese Academy of Sciences, China, 2014

Ph.D., Kent State University, 2018

Approved by

Dr. Xinyue Ye, Chair, Doctoral Dissertation Committee

Dr. Scott Sheridan, Member, Doctoral Dissertation Committee

Dr. Jay Lee, Member, Doctoral Dissertation Committee

Dr. Ye Zhao, Member, Doctoral Dissertation Committee

Accepted by

Dr. Scott Sheridan, Chair, Department of Geography

Dr. James L. Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS

TABLE OF CONTENTS ...... iii
LIST OF FIGURES ...... v
LIST OF TABLES ...... vi
ACKNOWLEDGEMENTS ...... vii
CHAPTER 1 INTRODUCTION ...... 1
  1.1 Background ...... 1
  1.2 Web 2.0, VGI, and natural disaster management ...... 4
  1.3 Social media analytics for natural disaster management ...... 6
  1.4 Dissertation synopsis ...... 7
CHAPTER 2 LITERATURE REVIEW ...... 10
  2.1 Introduction ...... 10
  2.2 Four dimensions ...... 12
    2.2.1 Space ...... 12
    2.2.2 Time ...... 14
    2.2.3 Content ...... 15
    2.2.4 Network ...... 19
  2.3 Focusing on social media information ...... 19
  2.4 Fusing social media data with authoritative data ...... 30
    2.4.1 Fusing with remote-sensing data ...... 31
    2.4.2 Fusing with census data ...... 32
  2.5 Conclusion ...... 34
CHAPTER 3 ANALYZING WILDFIRE TWITTER ACTIVITIES: SPACE, TIME, CONTENT, AND NETWORK ...... 37
  3.1 Introduction ...... 37
  3.2 Data and methodology ...... 41
    3.2.1 Data ...... 41
    3.2.2 Methods ...... 43
  3.3 Spatial and temporal analysis of wildfire Twitter activities ...... 45
  3.4 Topics and network ...... 54
  3.5 Conclusion ...... 60
CHAPTER 4 SPACE, TIME, AND SITUATIONAL AWARENESS IN NATURAL HAZARDS: A CASE STUDY OF HURRICANE SANDY WITH SOCIAL MEDIA DATA ...... 62
  4.1 Introduction ...... 62
  4.2 Data and methodology ...... 65
    4.2.1 Hurricane Sandy tweets in New York City ...... 65
    4.2.2 Cleaning and classifying Hurricane Sandy tweets ...... 66
    4.2.3 Location quotient: detecting area-specific topic ...... 68
    4.2.4 Markov transition probability matrix: measuring temporal transition of area-specific topic ...... 69
  4.3 Results ...... 71
    4.3.1 Data description ...... 71
    4.3.2 Top frequent terms ...... 73
    4.3.3 Spatial visualization of area-specific topic ...... 78
    4.3.4 Temporal transition of area-specific topics ...... 84
  4.4 Discussion ...... 88
CHAPTER 5 CONCLUSION ...... 90
  5.1 Summary ...... 90
  5.2 Limitations ...... 92
  5.3 Beyond natural disaster management ...... 95
References ...... 96


LIST OF FIGURES

Figure 1. Combinations of four dimensions in social media data ...... 22

Figure 2. A summary of papers that focus on analyzing one dimension of social media data ...... 24

Figure 3. A summary of papers where multiple dimensions are involved ...... 25

Figure 4. Temporal evolution of wildfire-related tweets with keywords of ‘fire’ and ‘wildfire’ ...... 46

Figure 5. Temporal evolutions of tweets with keywords including (a) Bernardo (b) San Marcos ...... 48

Figure 6. Spatial distribution of geo-tagged ‘fire’ and ‘wildfire’ tweets ...... 49

Figure 7. Dual kernel density estimation of geo-tagged tweets on Bernardo fire ...... 50

Figure 8. Dual kernel density estimation of geo-tagged tweets on ...... 52

Figure 9. Spatial distribution of population in San Diego County ...... 53

Figure 10. Term frequency plot ...... 55

Figure 11. Indegree cumulative distribution of the retweet network ...... 57

Figure 12. Outdegree cumulative distribution of the retweet network ...... 58

Figure 13. The major part of the retweet network ...... 59

Figure 14. Kernel density map of Sandy tweets (arrow and scale are in the lower left corner) ...... 67

Figure 15. The spatial distribution of area-specific topics for total Sandy tweets ...... 80

Figure 16. The spatial distribution of area-specific topics in the Before group ...... 81

Figure 17. The spatial distribution of area-specific topics in the During group ...... 82

Figure 18. The spatial distribution of area-specific topics in the After group ...... 83


LIST OF TABLES

Table 1. Combinations of four dimensions and corresponding articles ...... 26

Table 2. Combinations of four dimensions and data analysis tasks ...... 28

Table 3. Strengths and limitations of remote sensing, social media, and census data ...... 34

Table 4. Data summary for the collected tweets ...... 43

Table 5. Overview of the major wildfires in May, 2014 ...... 47

Table 6. Term clusters in wildfire tweets ...... 56

Table 7. The classification schema of Hurricane Sandy tweets ...... 70

Table 8. Classification results of Sandy tweets ...... 72

Table 9. Top frequent terms under four topics of the total Sandy tweets ...... 74

Table 10. Top frequent terms under four topics of the before Sandy tweets ...... 76

Table 11. Top frequent terms under five topics of the during Sandy tweets ...... 77

Table 12. Top frequent terms for four topics of the after Sandy tweets ...... 78

Table 13. The transition probability matrix for area-specific topics from Before to During ...... 85

Table 14. The transition probability matrix for area-specific topics from During to After ...... 86

Table 15. The transition probability matrix for area-specific topics from Before to After ...... 87


ACKNOWLEDGEMENTS

Completing this dissertation marks a milestone in my life. However, it could never have been achieved without the support of many individuals. First and foremost, I owe the biggest debt of gratitude to my family for encouraging me throughout the whole process. My parents and my brother have always been by my side, and I cannot thank them enough.

I take great pleasure in acknowledging my gratitude to my exceptional committee, composed of Dr. Xinyue Ye, Dr. Jay Lee, Dr. Scott Sheridan, and Dr. Ye Zhao, for providing insightful comments and constructive feedback on this dissertation research. I could not have asked for a better committee in my research endeavors. Dr. Xinyue Ye and Dr. Jay Lee have been especially influential during my stay at Kent State University. I am grateful to my doctoral advisor, Dr. Xinyue Ye, who greatly inspired me in developing the ideas that are fundamental to this dissertation. I would also like to thank Dr. Jay Lee for being a great mentor and collaborator. Dr. Lee is always patient and generous in sharing valuable knowledge and expertise, which has broadened my perspective on GIScience and space-time analysis.

I also take pride in acknowledging my dear friends in the Department of Geography at Kent State University. They have been so helpful and cooperative in offering their support at all times throughout my doctoral study.


CHAPTER 1

INTRODUCTION

1.1 Background

The UN International Strategy for Disaster Reduction identifies two major origins of hazards: natural hazards and technological hazards (UN/ISDR, 2002). Natural hazards are naturally occurring phenomena and can be categorized into the following groups:

1. Geophysical hazards such as landslides, earthquakes, volcanic eruptions, and tsunamis.

2. Hydrological hazards including floods and avalanches.

3. Climatological hazards such as heat waves, storms, wildfires, and droughts.

4. Biological hazards involving epidemics and insects.

Technological hazards are events caused by reluctant use of technologies and can be classified into three groups:

1. Industrial hazards such as release of hazardous substances and collapses of industrial infrastructures.

2. Transport hazards involving land, sea, and air transportation systems.

3. Miscellaneous hazards such as collapse of residential buildings, fires, and explosions.

A hazard alone is not a disaster. A natural hazard is just a naturally occurring event if it does not exert negative impacts on humans. In other words, when we refer to a natural disaster, we emphasize its negative effects on populations and communities. For example, a hurricane moving in the open sea is not a disaster, but when it makes landfall in a populous area the consequences can be disastrous.

Over the past several decades, the frequency and intensity of natural disasters have dramatically increased, causing a large number of injuries and deaths and extensive property damage (Cutter and Emrich, 2005; Klomp, 2016; O’Brien et al., 2006). The Great Wenchuan earthquake occurred on May 12, 2008. With a magnitude of 8.0, a hypocenter at 10–20 km depth, and an epicenter in Wenchuan County, the earthquake devastated China’s Sichuan Province. Nearly 70,000 people died, hundreds of thousands were injured, and more than 1.5 million people lost their homes (Balz and Liao, 2010). In 2017, the United States had a hyperactive and catastrophic Atlantic hurricane season, one of the costliest seasons on record. In just two months, the three most destructive hurricanes of the season (i.e., Harvey, Irma, and Maria) made landfall in the continental United States. Based on the reports issued by the National Hurricane Center, we summarize the damage and deaths caused by these three hurricanes in the United States:

1. Hurricane Harvey caused 68 direct deaths and $125 billion in damage (https://www.nhc.noaa.gov/data/tcr/AL092017_Harvey.pdf).

2. Hurricane Irma was responsible for 92 deaths, hundreds injured, and $50 billion in wind and water damage (https://www.nhc.noaa.gov/data/tcr/AL112017_Irma.pdf).

3. Hurricane Matthew resulted in 52 deaths and $10 billion in wind and water damage (https://www.nhc.noaa.gov/data/tcr/AL142016_Matthew.pdf).

These natural disasters have imposed great challenges on natural disaster management. As humans cannot stop natural disasters from happening, the major objective of natural disaster management is to reduce losses from disasters, assist victims, and achieve recovery. Natural disaster management consists of four major phases: mitigation, preparedness, response, and recovery (Yu, Yang, and Li, 2018). These four phases form a continuous process, i.e., the natural disaster management cycle, as shown in Figure 1. The mitigation and preparedness phases both center upon management improvements for expected disasters. However, management activities in these two phases are different. Mitigation activities aim to lessen the impact of natural disasters, while management efforts in the preparedness phase aim to improve the capability of responding to natural disasters. During a natural disaster, responses are required to search for and rescue survivors, offer immediate assistance, assess damage, and so on. The post-disaster recovery stage involves actions taken to restore people’s lives and reconstruct infrastructure.

[Figure 1: cycle diagram of the four phases. Mitigation: lessen the impact of natural disasters. Preparedness: improve the capability of responding to natural disasters. Response: search and rescue life, offer immediate assistance, assess damage, and others. Recovery: restore people’s lives and reconstruct infrastructure.]

Figure 1. Natural disaster management cycle (modified from Alexander (2002))


1.2 Web 2.0, VGI, and natural disaster management

Human beings are now creating data at an unprecedented speed. Today, we are living in a world where every citizen has become a data contributor and is creating datasets in various formats such as photos, maps, audio, videos, and texts. Notably, a large proportion of these datasets are produced via new information and communication technologies (ICTs), among which Web 2.0 is a prominent technology empowering users to not only consume content but also produce and contribute new content (Darwish and Lakhtaria, 2011). User-generated content (UGC) created through such Web 2.0 technologies as social media sites (e.g., Facebook, Twitter, and Flickr), Wikipedia, and YouTube captures citizens’ digital footprints and offers a great opportunity for understanding human dynamics.

User-generated Web content has also become a new source of geographic information (Elwood, Goodchild, and Sui, 2012). For example, most social media sites enable users to tag geolocations on what they post (e.g., photos, videos, and texts), and these geotagged posts can be accessed via an application programming interface (API). This emerging source of geographic information is defined by Goodchild (2007) as volunteered geographic information (VGI). In contrast to conventional geographic information, voluntary efforts are emphasized in the production of VGI.

Traditional geographic information has long been utilized in natural disaster management. Ground-based observation networks such as NOAA’s Meteorological Assimilation Data Ingest System (MADIS) deploy a large number of weather stations to constantly record weather observations such as precipitation, wind, and temperature. Remote sensing imagery has been used to assist natural disaster management due to its capability of providing information for areas with sparse ground observations (Tralli et al., 2005; Gillespie et al., 2007; Joyce et al., 2009). One such example is the application of Landsat Thematic Mapper (TM) imagery and synthetic aperture radar (SAR) data in flood extent estimation and flood volume calculation (Sanyal and Lu, 2004; Rakwatin et al., 2013). Remote sensing imagery and ground-based observation networks mainly capture the geophysical features of a natural disaster and are limited in providing information on population features. Natural disasters have been analyzed under a “socio-political ecology of disasters” framework, because “disasters do not affect members of society equally” (Fothergill and Peek, 2004). In other words, people’s vulnerability to natural disasters, i.e., their social vulnerability, varies with their demographic and socioeconomic characteristics. A good knowledge of social vulnerability helps natural disaster managers to identify places requiring support in preparation for and recovery from disasters. Census data, as an important source of demographic and socioeconomic information, have been used to conduct social vulnerability analysis for natural disaster management (Cutter, Boruff, and Shirley, 2003; Flanagan et al., 2011). Because census data are aggregated, questionnaires have also been used to collect individual-level demographic and socioeconomic information.

Recent years have seen a growing number of studies using VGI to improve natural disaster management. One such example is the use of OpenStreetMap (OSM) in the 2010 Haiti earthquake. As one of the poorest countries in the world, Haiti lacked high-quality geo-coded information (e.g., detailed roadmaps and locations) when the disaster occurred, which imposed great challenges on relief and aid efforts. In this context, OSM users from all over the world voluntarily produced geographic data such as streets, buildings, and other locations of interest based on remote sensing imagery (Zook et al., 2010). Disaster management tasks greatly benefited from these geographic data.


1.3 Social media analytics for natural disaster management

Social media outlets, such as Twitter, Instagram, and Facebook, have evolved beyond platforms for sharing people’s personal lives toward data sources for leveraging “the public’s collective intelligence” to deal with emergency events (Wang, Ye, and Tsou 2016). In particular, human-centric information related to people’s perceptions, responses, and behaviors in a natural disaster context can be extracted from social media and analyzed to assist natural disaster management (Wang and Ye 2018).

In natural disaster management, social media has been applied to “strengthen situational awareness and improve emergency response” (Steiger et al., 2015). From a citizen’s perspective, ordinary social media users can be alerted to authoritative situational announcements posted by natural disaster management agencies by following their official accounts. From an organizational perspective, disaster response organizations can leverage social media as a platform to communicate with the public in disaster situations and potentially solicit on-the-ground information using the public as information sources (Latonero and Shklovski, 2011). Social media could be a useful information source for all phases of natural disaster management.

Looking through the literature on the applications of social media data in natural disaster management, several major directions can be identified: (1) Event detection: social media has proven to be efficient in detecting disaster outbreaks and disseminating notifications to the public (Sakaki et al., 2010); (2) Rapid assessment of disaster damage: there is a positive relationship between disaster damage and disaster-related social media activities (Guan and Chen 2014, Kryvasheyeu et al. 2016); (3) Situational awareness: information on people’s perceptions, responses, and behaviors can be collected and analyzed to better understand what is happening in a disaster context (Vieweg et al. 2010). Whichever direction a study contributes to, the essential question is how to mine social media data for useful information to enable better policy and decision making in the context of natural disasters.

Although various data-mining efforts have been made, they are rather fragmented and may not fully utilize social media data to gain useful information for assisting natural disaster management (Wang and Ye 2018). Existing studies lack a framework to explore data analysis tasks that have been attempted as well as to highlight potential fruitful areas for further research (Granell and Ostermann 2016, Miyazaki et al. 2015, Haworth and Bruce 2015, Imran et al. 2015, Steiger et al. 2015, Klonner et al. 2016). In particular, it is still unclear how to develop social media analytics to better mine social media data to enrich useful information for managing natural disasters.

1.4 Dissertation synopsis

This dissertation is organized as follows:

Chapter 1 is the introduction of the study.

Chapter 2 provides a systematic literature review on how social media data analyses have been attempted and highlights potential fruitful areas for future research. In spite of a large variety of metadata fields in social media data, four dimensions (i.e., space, time, content, and network) have been given particular attention for mining useful information to gain situational awareness and improve disaster response. This chapter reviews how existing studies analyze these four dimensions, summarizes common techniques for mining these dimensions, and then suggests some methods accordingly. More importantly, a framework is proposed to categorize the existing studies into 15 classes and facilitate the generation of data analysis tasks. This chapter serves as guidance for the next two chapters, which are two case studies focusing on how to better mine social media data (the four dimensions, more specifically) to enrich human-centric information for natural disaster management.

Chapter 3 presents a case study to show how the space, time, content, and network dimensions in social media data can be analyzed to provide useful information to enhance situational awareness in a natural disaster context. After retrieving and cleaning the wildfire-related Twitter activities in San Diego County, the four dimensions were extracted and separately analyzed with kernel density estimation, histograms, latent Dirichlet allocation (LDA), and social network analysis. This chapter is one of the earlier attempts to identify the four dimensions in social media data and indicate possible methods to analyze them.

Chapter 4 provides another case study that demonstrates how to simultaneously analyze multiple dimensions in social media data to gain useful information. Due to computational constraints, this chapter does not go so far as to simultaneously analyze all four dimensions but chooses to synthesize the space, time, and content dimensions in Twitter activities related to Hurricane Sandy in New York City. In this chapter, Sandy tweets were first manually classified into six topics: Caution and Advice, Affected People, Infrastructure/Utilities, Needs and Donations, Weather and Environment, and Other. Following this, a location quotient was applied to detect the area-specific topic, defined as a topic that has a higher concentration than other topics in a specific area as compared to the entire region. Finally, a Markov transition probability matrix was implemented to investigate how the spatial concentration of topics changes before, during, and after a disaster. This chapter provides an insightful tool to bring situational awareness into a space-time context and enables disaster managers to learn the dynamics of social responses in the entire process of a natural disaster.
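The two steps summarized for Chapter 4 can be illustrated with a minimal sketch. It assumes the textbook location quotient (a topic's share of tweets in an area divided by its share in the whole region), takes the topic with the highest quotient as the area-specific topic, and estimates transition probabilities as row-normalized counts of areas moving from one topic to another. The function names, area labels, and sample data are hypothetical and are not taken from the dissertation.

```python
from collections import Counter, defaultdict

def location_quotients(tweets):
    """tweets: list of (area, topic) pairs. Returns {area: {topic: LQ}}."""
    area_topic = Counter(tweets)
    area_total = Counter(area for area, _ in tweets)
    topic_total = Counter(topic for _, topic in tweets)
    n = len(tweets)
    lq = defaultdict(dict)
    for (area, topic), count in area_topic.items():
        local_share = count / area_total[area]   # topic share within the area
        regional_share = topic_total[topic] / n  # topic share in the whole region
        lq[area][topic] = local_share / regional_share
    return dict(lq)

def area_specific_topic(lq_by_area):
    """Assumption: the area-specific topic is the one with the highest LQ."""
    return {area: max(scores, key=scores.get) for area, scores in lq_by_area.items()}

def transition_matrix(earlier, later):
    """earlier/later: {area: topic}. Returns row-normalized transition shares."""
    counts = defaultdict(Counter)
    for area in earlier.keys() & later.keys():
        counts[earlier[area]][later[area]] += 1
    return {
        src: {dst: c / sum(row.values()) for dst, c in row.items()}
        for src, row in counts.items()
    }

# Hypothetical sample: two areas, two topics.
sample = [("A", "Weather"), ("A", "Weather"), ("A", "Needs"),
          ("B", "Needs"), ("B", "Needs"), ("B", "Weather")]
lq = location_quotients(sample)
topics = area_specific_topic(lq)
matrix = transition_matrix(
    {"A": "Weather", "B": "Needs", "C": "Weather"},
    {"A": "Needs", "B": "Needs", "C": "Weather"},
)
```

A quotient above 1 means the topic is more concentrated in the area than in the region as a whole; each row of the transition matrix sums to 1.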


Chapter 5 concludes the dissertation, indicates its limitations, and provides potential avenues for future investigation.


CHAPTER 2

LITERATURE REVIEW¹

2.1 Introduction

Over the past several decades, the frequency and intensity of natural disasters have dramatically increased, causing a large number of injuries and deaths and extensive property damage (Cutter and Emrich 2005, Klomp 2016). This has imposed great challenges on natural disaster management (Klonner et al. 2016, Kryvasheyeu et al. 2016). To reduce the impact of disasters on humanity, various management tasks during all disaster phases, i.e. mitigation, preparedness, response and recovery, have a growing need for human-centric information. In recent years, due to their capability of capturing human activities, social sensing techniques based on various big data sources such as social media data and movement data have been gaining increasing attention from geographic information scientists and domain scientists (Goodchild 2007, Liu et al. 2015, Wang et al. 2016, 2016, Zhao et al. 2016).

In natural disaster management, social media has been applied to ‘strengthen situational awareness and improve emergency response’ (Steiger et al. 2015). From a citizen’s perspective, through following official natural disaster management agencies on social media, ordinary social media users can be alerted to authoritative situational announcements. From an organizational perspective, disaster response organizations can leverage social media as a platform to communicate with the public in disaster situations and potentially solicit on-the-ground information using the public as information sources (Latonero and Shklovski 2011). Social media could be a useful information source for all phases of natural disaster management.

1 This chapter is based on Wang, Z., & Ye, X. (2018). Social media analytics for natural disaster management. International Journal of Geographical Information Science, 32(1), 49-72.

However, most studies, with several exceptions, e.g. Haworth et al. (2015) and Yan et al. (2017), have focused on disaster response instead of other phases (Haworth and Bruce 2015, Klonner et al. 2016). This is probably because fewer social media activities are reported before and long after a disaster than during the disaster. This data sparsity problem in phases like preparedness, mitigation and recovery may cause unreliable analytical results. Therefore, future work is needed to overcome this limitation, and effort needs to be directed toward gaining more useful information for all phases of disaster management through mining social media data.

Social media data are multi-dimensional. For example, each tweet collected via the Twitter API (application programming interface) contains multiple metadata fields such as user ID, timestamp (i.e., the time when the tweet was posted), text (i.e., the text message tweeted by a user), coordinates, retweet (i.e., whether a tweet is retweeted from others), and so forth. In practice, existing studies mainly focus on four dimensions: space, time, content and network. Still using Twitter as an example, these four dimensions correspond to the Twitter metadata fields of coordinates/place/location, timestamp, text, and retweet, respectively. Notably, besides retweet, other relationships such as reply, mention, and friends/followers can also be utilized to formulate networks (Lai et al., 2015).
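The mapping from metadata fields to dimensions can be sketched as follows. The sketch assumes a tweet has already been parsed into a dictionary shaped like a Twitter REST API v1.1 status object (fields `coordinates`, `place`, `user`, `created_at`, `text`, `retweeted_status`); the function name and the sample tweet are hypothetical.

```python
def four_dimensions(tweet):
    """Split one tweet dict into space, time, content, and network parts."""
    coords = tweet.get("coordinates") or {}
    user = tweet.get("user") or {}
    retweeted = tweet.get("retweeted_status")
    return {
        "space": {
            # GeoJSON point order is [longitude, latitude]
            "point": coords.get("coordinates"),
            "place": (tweet.get("place") or {}).get("full_name"),
            "profile_location": user.get("location"),
        },
        "time": tweet.get("created_at"),
        "content": tweet.get("text"),
        "network": {
            "user": user.get("id_str"),
            # Edge from this user to the author of the retweeted message, if any
            "retweet_of": (retweeted.get("user") or {}).get("id_str") if retweeted else None,
        },
    }

# Hypothetical tweet for illustration only.
sample_tweet = {
    "created_at": "Wed May 14 18:05:00 +0000 2014",
    "text": "RT @someone: Fire spreading near San Marcos",
    "coordinates": {"type": "Point", "coordinates": [-117.17, 33.14]},
    "place": {"full_name": "San Marcos, CA"},
    "user": {"id_str": "111", "location": "San Diego"},
    "retweeted_status": {"user": {"id_str": "222"}},
}
dims = four_dimensions(sample_tweet)
```

Reply and mention relationships could be added to the network part in the same way if those fields are collected.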

Granell and Ostermann (2016), Miyazaki et al. (2015), Haworth and Bruce (2015), Imran et al. (2015), Steiger et al. (2015), and Klonner et al. (2016) have reviewed studies on applications of social media and volunteered geographic information (VGI) in disaster management. Compared with these existing reviews, this chapter provides a unique way to generalize the characteristics of studies related to natural disasters based on the dimensions of social media data. From a methodological perspective, we summarize common techniques for mining these dimensions, and suggest some methods accordingly. We also propose a novel classification schema that is different from existing ones to categorize relevant studies and suggest data analysis tasks. Furthermore, we point out research opportunities and challenges in fusing social media data with authoritative datasets, i.e. census data and remote-sensing data. Notably, although social media has been used in managing man-made crises such as terrorist attacks, the current study focuses on disasters caused by natural processes of the earth (e.g. floods, earthquakes and hurricanes). We acknowledge that it might be limiting to emphasize natural disasters alone, but this narrow scope is motivated by an attempt to deal more explicitly with nature and society relations and thus contribute to an important subfield of geography, i.e. human–environment geography.

The remainder of this chapter is structured as follows. Section 2.2 provides a literature review on how existing studies analytically and methodologically explore social media data in natural disaster contexts. Section 2.3 includes a schema for categorizing existing studies and generating data analysis tasks. Section 2.4 suggests some research opportunities and gaps in linking social media data with authoritative data, i.e. census data and remote-sensing data. We make our conclusion in Section 2.5.

2.2 Four dimensions

Four dimensions in social media data, namely space, time, content and network, have attracted particular attention from researchers.

2.2.1 Space

Spatial information in social media data is critical for natural disaster management. Disaster mapping is an important tool for disaster managers to learn where things are happening. It has been acknowledged that spatial information in social media data could be used in disaster mapping to enable disaster managers to better identify risks and assess damages (Huang et al., 2015; Kryvasheyeu et al., 2016). There are mainly two types of spatial information in social media data: exact coordinates (i.e., longitudes and latitudes) and toponyms (e.g., a city name) (Huang et al., 2014; Huang and Wong, 2016). Exact coordinates can be obtained if the built-in global positioning systems (GPS) in users’ devices are turned on. Toponyms could come from profile locations or be inferred from the content of social media messages.

A typical way of using this spatial information is to map people’s responses to a disaster. In most cases, this is done by simply plotting social media messages with geo-coordinates on a map (see Avvenuti et al., 2014; Blanford et al., 2014; Gupta et al., 2013 for examples). Using this method, Avvenuti et al. (2014) display the spatial distribution of earthquake-related Twitter messages, and Gupta et al. (2013) visualize the spatial distribution of Hurricane Sandy tweets on a world map. However, this simple visualization method has very limited capability in detecting spatial patterns. Because of this, natural disaster researchers have started to pay attention to well-established methods, such as kernel density estimation (KDE), that have been widely used in other fields to deal with social media data (Li et al., 2013; Tsou et al., 2013; Spitzberg et al., 2013; Widener and Li, 2014; Han et al., 2015). Guan and Chen (2013) implement KDE to detect spatial clusters of Twitter activities related to Hurricane Sandy. Wang et al. (2015) apply a density-based clustering method to Weibo (a Chinese social media site) messages pertaining to the 2012 Beijing rainstorm to identify spatial hotspots. Notably, a problem may arise when using KDE to deal with social media data: the spatial pattern of social media activities is often a reflection of population distribution, meaning that areas with a larger population tend to report more social media messages. To solve this problem, Wang, Ye, and Tsou (2016) adopted a method called Dual KDE to exclude the population impact and identify hot spots of wildfire-related Twitter messages. However, spatial analytical tools are still far from being fully exploited to analyze geocoded social media data for natural disasters. It is also important to note that the spatial information that can be retrieved from social media is quite limited. Taking Twitter as an example, geotagged tweets account for only a tiny percentage of all tweets (Dredze et al., 2013), which may undermine the results of a spatial analysis. More importantly, compared with the accurate and widely used geo-coordinates and toponyms, place names that have vague boundaries but are widely used by social media users have rarely been explored in disaster contexts (Jones et al., 2008). For example, “downtown” and “city center” are often used to represent the core area of a city (Hollenstein and Purves, 2011). These “vernacular place names”, as termed by Hollenstein and Purves (2011), could be utilized to enrich the geographic information available for natural disaster management.
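The population effect and its correction can be illustrated with a small sketch. The exact Dual KDE formulation used in the cited study is not given here, so the sketch assumes one common form: the ratio of a tweet density surface to a population density surface, each estimated with a Gaussian kernel. All names, coordinates, and counts are hypothetical.

```python
import math

def kde(points, weights, x, y, bandwidth):
    """Weighted Gaussian kernel density at (x, y), normalized to integrate to 1."""
    total = 0.0
    for (px, py), w in zip(points, weights):
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += w * math.exp(-d2 / (2 * bandwidth ** 2))
    return total / (2 * math.pi * bandwidth ** 2 * sum(weights))

def dual_kde_ratio(tweet_pts, pop_pts, pop_counts, x, y, bandwidth, eps=1e-12):
    """Assumed Dual KDE form: tweet density divided by population density."""
    tweet_density = kde(tweet_pts, [1.0] * len(tweet_pts), x, y, bandwidth)
    pop_density = kde(pop_pts, pop_counts, x, y, bandwidth)
    return tweet_density / (pop_density + eps)  # eps guards against empty areas

# Hypothetical setting: a town of 900 people at (0, 0), a village of 100 at (10, 0).
pop_pts = [(0.0, 0.0), (10.0, 0.0)]
pop_counts = [900, 100]
tweet_pts = [(0.0, 0.0)] * 9 + [(10.0, 0.0)] * 5

raw_town = kde(tweet_pts, [1.0] * len(tweet_pts), 0.0, 0.0, 1.0)
raw_village = kde(tweet_pts, [1.0] * len(tweet_pts), 10.0, 0.0, 1.0)
dual_town = dual_kde_ratio(tweet_pts, pop_pts, pop_counts, 0.0, 0.0, 1.0)
dual_village = dual_kde_ratio(tweet_pts, pop_pts, pop_counts, 10.0, 0.0, 1.0)
```

In this toy setting the raw tweet density is higher in the town, but after dividing by population density the village stands out, which is the kind of hot spot a single KDE would miss.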

2.2.2 Time

Natural disaster management is often time-critical and thus requires timely information collection and analysis. Event detection is important for natural disaster management, as early detection of disasters could result in early response and better mitigation of damages. It has been acknowledged that the real-time nature of social media streams could help detect the outbreak of disasters in a timely manner (Sakaki et al., 2010).

Every social media message comes with a high-resolution timestamp. A typical way of using this temporal information in natural disaster management is to analyze how people’s responses to disasters change over time. To achieve this, messages are often aggregated into fixed time intervals. For example, if the time interval is 1 hour, then all messages posted within the same hour are aggregated together. In doing so, a frequency distribution is obtained to show the change of related social media activities over time. Existing studies have shown that social media activities evolve concurrently with the corresponding disasters. Blanford et al. (2014) note that the frequency of tornado tweets increases until the tornado’s touchdown and then decreases. Qu et al. (2011) find a similar temporal trend: Weibo messages peak immediately after the earthquake and then drop gradually. However, these traditional methods are unable to distinguish the rich patterns in time series data. Therefore, Wang et al. (2015) employ time series decomposition to separate the underlying patterns into three components: the overall trend, the cyclical variation, and the causal fluctuation.
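The hourly aggregation described above can be sketched in a few lines. The sketch assumes timestamps have already been converted to ISO 8601 strings; the function name and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Count messages per hour. timestamps: iterable of ISO 8601 strings."""
    buckets = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        # Truncate to the start of the hour so all messages in that hour share a key
        buckets[t.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(buckets.items()))

# Hypothetical timestamps for illustration only.
sample_times = [
    "2014-05-14T10:05:00",
    "2014-05-14T10:40:12",
    "2014-05-14T11:01:00",
]
counts = hourly_counts(sample_times)
```

Changing the truncation step (for example, to whole days) gives a coarser frequency distribution from the same data.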

2.2.3 Content

According to Endsley (1995), situational awareness is “the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future.” Situational awareness in natural disaster management could be enhanced by social media data, especially the content dimension. As pointed out by Huang et al. (2015), “humanitarian Assistance and Disaster Relief (HA/DR) responders can gain valuable insights and situational awareness by monitoring social media-based feeds from which tactical, actionable data can be mined from content”. As people’s conversational content on social media varies in terms of topics and emotions, a data reduction process is often required to classify social media messages into distinct categories. Unstructured social media texts should be converted to a structured form, such as a term-document matrix or unigram features, before being passed to a classification algorithm (Zhao, 2012). We discuss two types of classification: topic-based classification and sentiment-based classification.
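As an illustration of the structured form mentioned above, the following sketch builds a small term-document matrix with terms as rows and messages as columns. The tokenizer is a deliberately simple assumption (lowercase alphabetic tokens), and the function name and sample messages are hypothetical.

```python
import re
from collections import Counter

def term_document_matrix(texts):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in text j."""
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    counts = [Counter(doc) for doc in tokenized]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

# Hypothetical messages for illustration only.
messages = ["Fire near San Marcos", "fire fire evacuate now"]
vocab, matrix = term_document_matrix(messages)
```

Real pipelines usually add steps such as stop-word removal and stemming before counting; they are omitted here to keep the sketch short.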

Topic-based classification


Topic-based classification focuses on mining what people talk about in natural disaster situations. Depending on whether predefined classes are used, there are two main types of classification methods: supervised classification and unsupervised classification.

Supervised classification

When using this method, the categories are predefined by analysts. Analysts train the classifier with a sufficient number of labeled social media texts and then apply it to label the remaining texts. Although various schemes have been used to classify disaster texts, we illustrate them with two types. Some studies build their classification scheme on the kind of information provided in social media content (Imran et al., 2015). A four-category scheme including situation update (e.g., factual information about situations around impact area), opinion expression (e.g., criticizing rescue efforts by government agencies), emotional support (e.g., expressing anxiety or other feelings), and calling for action (e.g., requesting help) is adopted by Qu et al. (2011) to classify earthquake-related messages from a Chinese social media platform. Imran et al. (2013a) and Imran et al. (2013b) use a five-category scheme to classify disaster-related Twitter messages into caution and advice (e.g., conveying disaster warnings), casualties and damage (e.g., reporting people injured), donations (e.g., asking for goods or services), people (e.g., reporting people missing), and information sources (e.g., including photos or videos). Other studies design their classification schemes based on disaster phases. For example, Huang et al. (2015) propose a fine-grained classification scheme to categorize Hurricane Sandy tweets into four major classes (preparedness, response, impact, and recovery) and 47 sub-classes. In terms of training methods, multiple techniques can be used to train classifiers, including naïve Bayes, support vector machine (SVM), and logistic regression (Huang and Xiao, 2015; Imran et al., 2015).
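As an illustration of the train-then-apply workflow, the sketch below implements a small multinomial naïve Bayes classifier with add-one smoothing. It is not the implementation used in any of the cited studies; the four training messages are invented, and the two labels are borrowed from the phase-based scheme mentioned above only as examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(labeled_texts):
    """Collect per-class document counts and word counts from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_texts:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def classify(model, text):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training messages for illustration only.
training = [
    ("stock up on water and batteries before the storm", "preparedness"),
    ("buying flashlights and canned food tonight", "preparedness"),
    ("trees down and power out on our street", "impact"),
    ("basement flooded and power lines down", "impact"),
]
model = train_naive_bayes(training)
label = classify(model, "power out and street flooded")
print(label)
```

In practice the training set would contain hundreds or thousands of annotated messages, and the classifier would be evaluated on held-out data before being applied to the full collection.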


Unsupervised classification

When there are no predefined classes, the classification of social media messages becomes unsupervised. Clustering is a widely used technique to perform unsupervised classification. The purpose of clustering is to form clusters in such a way that words within a cluster are more likely to reflect the same topic than those that belong to other clusters. Besides clustering algorithms such as k-means clustering and hierarchical clustering, Latent Dirichlet Allocation (LDA) is a popular topic modeling method which “allows a word to simultaneously belong to several clusters with varying degrees” (Imran et al., 2015). Using LDA, Kireyev et al. (2009) detect several prominent topics including (tsunami, disaster, relief, earthquake), (me, you), (happy, feel), (dead, bodies, missing, victims), (Australia, Indonesia), (Internet, web, online), and (aid, help, money, relief) in 2009 Indonesia earthquake tweets. Wang et al. (2015) identify three significant topics (weather, disaster information, and loss and influence) in Weibo messages related to the 2012 Beijing rainstorm.
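A minimal sketch of unsupervised grouping is given below. It uses plain k-means on term-count vectors of messages, not LDA, and it makes two simplifying assumptions that should be stated clearly: centroids are initialized with the first k vectors so the result is deterministic, and the four vectors are invented counts over a hypothetical vocabulary (flood, water, power, outage).

```python
def kmeans(vectors, k, iterations=20):
    """Plain k-means with squared Euclidean distance; returns a cluster index per vector."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k vectors
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the algorithm has converged
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Invented term counts per message over the vocabulary (flood, water, power, outage).
vectors = [
    [2, 1, 0, 0],
    [0, 0, 1, 2],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
]
clusters = kmeans(vectors, k=2)
print(clusters)
```

Unlike LDA, this hard assignment places every message in exactly one cluster, which is the limitation the quoted remark about words belonging to several clusters refers to.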

Both unsupervised classification and supervised classification have advantages and disadvantages. Unsupervised classification, LDA for example, can automatically generate summaries of topics, and thus may be useful in situations where prior knowledge about the topic distribution of the input disaster-related social media messages is lacking. However, it often yields topics that are hard to interpret and offers no obvious way to incorporate predefined classes into its learning procedure. In contrast, supervised classification can meet the need of disaster responders to specify their own categorization schemes. Nevertheless, it often requires analysts to manually label enough sample data with the given scheme, which is a time-consuming process and may not be appropriate for rapid decision support. To combine the strengths of both while offsetting their weaknesses, we suggest that, for each type of natural disaster (e.g., hurricane), a widely accepted classification scheme should be developed and a classifier trained on historically collected social media data. In this way, when the next disaster strikes, this well-trained classifier could be used directly to categorize newly collected social media data.

Sentiment-based classification

The textual content of social media also reflects people’s sentiments. Sentiment classification is a special task in text classification, which aims to categorize given texts based on the sentiment they convey, such as positive, neutral, or negative (Pang et al., 2002). Many sentiment classification algorithms, such as SentiStrength (http://sentistrength.wlv.ac.uk), are lexicon-based, meaning that a lexicon of words labeled as positive or negative is used to determine the sentiment of a text. In this sense, a straightforward way of doing sentiment classification is to measure how frequently the words of a specific lexicon occur in a text. For example, Shook and Turner (2016), without specifying the lexicon used, calculate the ratio of positive to negative words in winter storm-related tweets so as to capture the temporal change of people’s emotions. A more sophisticated way is to use a machine learning method to train a classifier on sample tweets labeled by annotators, using multiple features extracted from the tweets. These features usually include the results of lexicon-based sentiment analysis algorithms. In order to classify people’s sentiments during Hurricane Sandy, Caragea et al. (2014) use sentiment strength based on the SentiStrength algorithm, along with other features such as emoticons, Internet acronyms, and unigrams, to train a classifier on labeled sample tweets. Although many sentiment classification methods are well established, few social media studies on natural disaster management have applied them.
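The lexicon-based approach can be sketched as follows. The two word sets are a tiny invented lexicon used only for illustration; they are not taken from SentiStrength or from any cited study, and the tweets are made up.

```python
import re

# Tiny invented lexicon; a real study would use an established resource.
POSITIVE = {"safe", "thankful", "relief", "hope", "great"}
NEGATIVE = {"scared", "damage", "terrible", "lost", "afraid"}


def sentiment_counts(text):
    """Return (number of positive words, number of negative words) in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return pos, neg


def label_sentiment(text):
    pos, neg = sentiment_counts(text)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


# Invented tweets for illustration only.
tweets = [
    "So thankful everyone is safe after the storm",
    "Terrible damage on our block, we lost power",
    "Snow is falling again",
]
labels = [label_sentiment(t) for t in tweets]
total_pos = sum(sentiment_counts(t)[0] for t in tweets)
total_neg = sum(sentiment_counts(t)[1] for t in tweets)
print(labels, total_pos, total_neg)
```

Computing `total_pos` and `total_neg` per time interval, rather than over the whole sample, would give the kind of positive-to-negative ratio over time described above.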


2.2.4 Network

One of the most important tasks for natural disaster management is to spread authoritative announcements and situational updates to the community. This requires disaster managers to better understand the social network structure through which disaster-related information is disseminated. The emergence of online networks provides great opportunities to investigate the information exchange behaviors of various agents (e.g., ordinary users, authoritative agencies, and news media) in natural disaster situations. Many studies have revealed a hierarchical structure in these networks. Cheong and Cheong (2011) conduct a social network analysis on tweets related to the 2011 Australian floods and find that the dominant users in propagating disaster-related information are “local authorities (mainly the Queensland Police Services), political personalities (Queensland Premier, Prime Minister, Opposition Leader, Member of Parliament), social media volunteers, traditional media reporters, and people from not-for-profit, humanitarian, and community associations”. Kogan et al. (2015) indicate that local government authorities and the media were the most important nodes in spreading useful information during Hurricane Sandy in 2012. Social network analysis has shown its strength in analyzing the components, phases, and characteristics of the information diffusion process in disasters. Specifically, the results of social network analysis can be visualized to facilitate reasoning (Cheong and Cheong, 2011; Chatfield and Brajawidagda, 2012; Lu and Brelsford, 2014), and various metrics such as betweenness centrality, closeness centrality, and PageRank can be used to detect network patterns in a quantitative manner (Starbird and Palen, 2010; Chatfield et al., 2013).
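To show how such metrics identify dominant accounts, the sketch below computes in-degree and PageRank on a toy retweet network, where an edge points from the retweeting user to the original author. The account names and edges are hypothetical, and the PageRank routine is a basic power iteration written for illustration, with a conventional damping factor of 0.85 assumed.

```python
from collections import Counter


def pagerank(edges, damping=0.85, iterations=50):
    """Basic power-iteration PageRank on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical retweet edges: (retweeting user, original author).
edges = [
    ("user_a", "police_account"),
    ("user_b", "police_account"),
    ("user_c", "police_account"),
    ("user_d", "news_account"),
    ("news_account", "police_account"),
]
in_degree = Counter(dst for _, dst in edges)
rank = pagerank(edges)
top = max(rank, key=rank.get)
print(in_degree.most_common(2), top)
```

In this toy network the authority account is both the most retweeted and the highest ranked, mirroring the hierarchical structure reported in the studies above.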

2.3 Focusing on social media information

Few studies have incorporated all four dimensions in their analyses; most of them analyze no more than three dimensions (De Albuquerque, 2015; Huang and Xiao, 2015; Imran et al., 2013a; Imran et al., 2013b; Vieweg et al., 2010; Zhu et al., 2011). When multiple dimensions are involved, some researchers analyze them separately, whilst others try to examine their interactive dynamics. For example, one study may separately analyze the spatial and temporal dimensions by presenting the temporal component as a histogram of tweets over time and the spatial component as a kernel density map of tweets, while another may show the simultaneous evolution of tweets across space and over time with a space-time kernel density map.

A classification schema is proposed in this chapter to generalize the characteristics of studies, identify research gaps, and derive data analysis tasks. Similar efforts have been made for the space-time-distributional features of economic datasets (Rey and Ye 2010, Ye and Rey 2013) in order to comprehensively quantify the changes and the level of hidden variation in regional economic development datasets across scales and dimensions. By incorporating content and network, the schema suggested in the current study moves beyond conventional socioeconomic spatiotemporal datasets, which focus on macro dynamics, towards finer-scale social media studies that integrate physical and virtual spaces.

Four dimensions have 15 possible combinations ($C_4^1 + C_4^2 + C_4^3 + C_4^4 = 4 + 6 + 4 + 1 = 15$), which are illustrated in Figure 2. The upper left, upper right, lower left, and lower right parts of Figure 2 graphically show the $C_4^1$, $C_4^2$, $C_4^3$, and $C_4^4$ combinations, respectively. Except for the upper left part of Figure 2, where the 4 dimensions represent 4 combinations, each colored line connecting dimensions in the other three parts of Figure 2 represents one combination. All the combinations are listed in the first column of Table 1, along with the corresponding studies in the second column. We use these combinations to classify the collected studies into 15 classes. It is important to note that when dimensions are joined together using “∩” (e.g., Space ∩ Time), those dimensions are simultaneously examined in the corresponding studies (e.g., identifying space-time hot spots of earthquake-related tweets).
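The count of 15 combinations can be checked by direct enumeration. The short sketch below lists every non-empty subset of the four dimensions; the variable names are illustrative only.

```python
from itertools import combinations

DIMENSIONS = ["Space", "Time", "Content", "Network"]

# Number of combinations for subset sizes 1 to 4.
sizes = [len(list(combinations(DIMENSIONS, k))) for k in range(1, 5)]

# Every non-empty combination, joined with the intersection symbol.
combos = [" ∩ ".join(c) for k in range(1, 5) for c in combinations(DIMENSIONS, k)]

print(sizes, len(combos))
for combo in combos:
    print(combo)
```

The enumeration order (single dimensions, then pairs, triples, and the full set) is the same order in which the combinations appear in Table 1.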

With reference to Granell and Ostermann (2016), we first define the criteria for filtering articles, as follows:

1. Written in English.

2. Explicitly stating that they deal with natural disasters and use social media as their data source.

3. Empirical studies using quantitative methods.

4. Published in scientific journals, conferences, book chapters or workshops with full text being accessible.

Based on the criteria, three steps were performed to search and select articles:

1. Apply the criteria to the references listed in Granell and Ostermann (2016), Miyazaki et al. (2015), Haworth and Bruce (2015), Imran et al. (2015), Steiger et al. (2015), and Klonner et al. (2016) to obtain an initial set of papers (44, in total).

2. For the newly added articles to the collection, retrieve both their reference papers and the papers that cite them, and then filter the articles with the criteria.

3. Repeat step (2) until no new articles could be obtained.
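The three steps above amount to an iterative expansion that stops when no new eligible article is found. The sketch below expresses that loop under stated assumptions: `neighbors` stands for a lookup of an article's references and citing papers, `meets_criteria` stands for the four filtering criteria, and the toy link structure is invented. In the actual review both functions correspond to manual work.

```python
def snowball(seed_papers, neighbors, meets_criteria):
    """Expand a collection through references and citing papers until it stops growing."""
    collection = {p for p in seed_papers if meets_criteria(p)}
    frontier = set(collection)
    while frontier:
        candidates = set()
        for paper in frontier:
            candidates.update(neighbors(paper))
        # Only newly found papers that pass the criteria are added and expanded further.
        frontier = {p for p in candidates if p not in collection and meets_criteria(p)}
        collection |= frontier
    return collection


# Invented link structure: paper -> papers it references or is cited by.
LINKS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["E"], "D": [], "E": ["F"], "F": []}
ELIGIBLE = {"A", "B", "D", "E", "F"}  # "C" fails the criteria in this toy example

selected = snowball(["A"], lambda p: LINKS.get(p, []), lambda p: p in ELIGIBLE)
print(sorted(selected))
```

Because articles that fail the criteria are not expanded, papers reachable only through them ("E" and "F" here) are never examined, which is one source of the uncertainty inherent in this kind of search.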

We finished the above process on January 2nd, 2017 and obtained a final collection of 94 papers. In these papers, various natural disasters such as earthquake, tornado, wildfire, hurricane, flooding, and tsunami have been analyzed using data from major social media sites such as Twitter, Flickr, Facebook, and Weibo. The authors of this chapter worked together to classify every article in our collection based on the schema. For each article, we first identified the dimensions involved in its analysis, then decided its combinations by investigating which dimensions are separately analyzed and which are simultaneously analyzed, and finally assigned the article to categories based on the combinations. Please note that one article could be assigned to different categories because it may involve multiple combinations of dimensions. For a large majority of articles, the authors could easily reach agreement on which categories they should be classified into. Nonetheless, we acknowledge that the coding process is not without uncertainty, and readers and some of the authors of the classified articles may assign these articles differently.

Figure 2. Combinations of four dimensions in social media data


As shown in Table 1, these 94 papers were categorized into 15 classes based on the combination of dimensions. The first four rows in Table 1 list papers in which dimensions are separately analyzed, and we observe that 31 of the 94 papers include just one dimension in their analyses. Figure 3 is a summary of these 31 papers according to the dimension they analyze. As seen from Figure 3, most studies choose to analyze the space or content dimension when only one dimension is involved. This reveals that researchers tend to study disaster mapping or situational awareness for natural disaster management when they focus on one dimension of social media data. The remaining eleven rows in Table 1 display papers in which dimensions are simultaneously analyzed, and we observe that 63 of the 94 papers involve multiple dimensions in their analyses. Figure 4 is a summary of these 63 papers according to the dimensions they analyze. As seen from Figure 4, among studies where multiple dimensions are involved, nearly half (48%) choose to analyze dimensions both separately and simultaneously. This is termed composite analysis since it consists of both simultaneous analysis and separate analysis. We take Wang et al. (2015) as an example to demonstrate how this type of analysis is performed. In terms of separate analysis, Wang et al. (2015) analyze the content dimension by classifying rainstorm-related Weibo messages into several topics with associated word frequency distributions, examine the time dimension by checking how the number of Weibo messages changes over time, and explore the spatial dimension by detecting spatial clusters of geo-tagged Weibo messages. In terms of simultaneous analysis, they combine space and content (denoted by Space ∩ Content in Table 1) to compare the spatial clustering of Weibo messages under different topics. They also integrate time and content (denoted by Time ∩ Content in Table 1) to compare the temporal trend of Weibo messages under different topics. Researchers usually start by analyzing some dimensions separately and then analyze them simultaneously to gain richer information.

Figure 3. A summary of papers that focus on analyzing one dimension of social media data (pie chart: Content 55%, Space 32%, Time 10%, Network 3%)

Given the four dimensions in social media data, separate analysis could provide limited information while simultaneous analysis of their combinations could increase the likelihood of gaining more insights. This has implications for natural disaster management, namely that both separate analysis and simultaneous analysis should be conducted to increase the information richness for disaster managers and thus better support the decision-making process in disaster management. For instance, disaster managers could learn some general information, such as where people’s responses to the disaster are intensive, from a single spatial analysis, and what damage the disaster has caused from a single content analysis. Although this general information is equally important, a simultaneous analysis of space and content could enable disaster managers to gain detailed information such as the impact areas or locations of the damage (Huang et al., 2015).


Figure 4. A summary of papers where multiple dimensions are involved (pie chart: Composite 48%, Separate 27%, Simultaneous 25%)


Table 1. Combinations of four dimensions and corresponding articles

Combination of dimensions References

Space Avvenuti et al. (2014), Cameron et al. (2012), Cervone et al. (2016), Chatfield and Brajawidagda (2012), Chatfield et al. (2013), Crooks et al. (2013), De Longueville et al. (2009), Earle (2010), Earle et al. (2012), Eilander et al. (2016), Fuchs et al. (2013), Gao and Liu (2015), Guan and Chen (2014), Gupta et al. (2013), Guy et al. (2010), Hara (2015), Huang and Cervone (2016), Huang et al. (2015), Hultquist et al. (2015), Kent and Capello Jr (2013), Kryvasheyeu et al. (2016), Landwehr et al. (2016), Liang et al. (2013), McClendon and Robinson (2012), Panteras et al. (2015), Sakaki et al. (2010), Schnebele and Cervone (2013), Schnebele et al. (2014a), Schnebele et al. (2014b), Shelton et al. (2014), Sun et al. (2016), Triglav-Čekada and Radovan (2013), Wang et al. (2015), Wang, Ye, and Tsou (2016), Xiao et al. (2015), Yin et al. (2012), Zielinski et al. (2013)

Time Avvenuti et al. (2014), Avvenuti et al. (2016), Cameron et al. (2012), Chatfield and Brajawidagda (2012), Chatfield and Brajawidagda (2013), Chatfield and Brajawidagda (2014), Chatfield et al. (2013), Crooks et al. (2013), De Longueville et al. (2009), De Longueville et al. (2010), Earle et al. (2012), Eilander et al. (2016), Fuchs et al. (2013), Guan and Chen (2014), Guy et al. (2010), Huang and Cervone (2016), Hughes and Palen (2009), Imran et al. (2014b), Jongman et al. (2015), Kryvasheyeu et al. (2015), Kryvasheyeu et al. (2016), Lachlan et al. (2016), Landwehr et al. (2016), MacEachren et al. (2011), Mendoza et al. (2010), Middleton et al. (2014), Oh et al. (2010), Panteras et al. (2015), Power et al. (2014), Preis et al. (2013), Qu et al. (2011), Sakaki et al. (2010), Schade et al. (2013), Terpstra et al. (2012), Wang et al. (2015), Wang, Ye, and Tsou (2016), Yin et al. (2012)

Content Ashktorab et al. (2014), Avvenuti et al. (2014), Cameron et al. (2012), Caragea et al. (2011), Castillo et al. (2013), Chatfield and Brajawidagda (2013), Chatfield and Brajawidagda (2014), Chowdhury et al. (2013), De Longueville et al.(2009), Gelernter and Mushegian (2011), Gupta et al. (2013), Hara (2015), Huang and Xiao (2015), Hughes and Palen (2009), Imran et al.(2014a), Imran et al.(2014b), Imran et al.(2013a), Imran et al.(2013b), Kireyev et al. (2009), Kongthon et al.(2012), Lachlan et al. (2014a), Lachlan et al.(2014b), Lachlan et al.(2016), Lingad et al.(2013), Liu et al.(2008), Mendoza et al. (2010), Olteanu et al. (2015), Oh et al. (2010), Panteras et al.(2015), Qu et al.(2011), Saharia (2015), Schulz et al.(2013), Starbird and Palen (2010), Truelove et al.(2015), Verma et al.(2011), Vieweg et al.(2014), Vieweg et al.(2010), Wang et al. (2015), Wang, Ye, and Tsou (2016), Yin et al.(2012), Zielinski et al.(2013)

Network Chatfield and Brajawidagda (2012), Chatfield and Brajawidagda (2014), Chatfield et al. (2013), Cheong and Cheong (2011), Gupta et al. (2013), Mendoza et al. (2010), Sakaki et al. (2010), Wang, Ye, and Tsou (2016)


Space∩Time Blanford et al. (2014), Crooks et al. (2013), De Longueville et al. (2010), Fuchs et al. (2013), Gao and Liu (2015), Guy et al. (2010), Jongman et al. (2015), Liang et al. (2013), Mandel et al. (2012), Schade et al. (2013), Shook and Turner (2016), Terpstra et al. (2012)

Space∩Content De Albuquerque et al. (2015), Fohringer et al. (2015), Hara (2015), Huang and Cervone (2016), Huang et al. (2015), Huang and Xiao (2015), Hultquist et al. (2015), Kryvasheyeu et al. (2016), MacEachren et al. (2011), Middleton et al. (2014), Musaev et al. (2014), Robinson et al. (2013), Shanley et al. (2013), Shelton et al. (2014), Truelove et al. (2015), Pohl et al. (2012), Wang et al. (2015)

Space∩Network N/A

Time∩Content Caragea et al. (2014), Chatfield and Brajawidagda (2014), Gupta et al. (2013), Huang and Xiao (2015), Kongthon et al. (2012), Kryvasheyeu et al. (2015), Olteanu et al. (2015), Oh et al. (2010), Panteras et al. (2015), Qu et al. (2011), Shanley et al. (2013), Shook and Turner (2016), Vieweg et al. (2010), Wang et al. (2015)

Time∩Network Castillo et al. (2013), Kryvasheyeu et al. (2015), Lu and Brelsford (2014), Mendoza et al. (2010)

Content∩Network Gupta et al. (2013), Kogan et al. (2015), Qu et al. (2011)

Space∩Time∩Content Bakillah et al. (2015), Caragea et al. (2014), Hara (2015), Kryvasheyeu et al. (2016), Mandel et al. (2012), Spinsanti and Ostermann (2013)

Space∩Time∩Network Kogan et al. (2015), Kryvasheyeu et al. (2015)

Space∩Content∩Network N/A

Time∩Content∩Network Gupta et al. (2013), Zhu et al. (2011)

Space∩Time∩Content∩Network Kryvasheyeu et al. (2015)


Table 2. Combinations of four dimensions and data analysis tasks

Combination of dimensions Data analysis tasks

Space Where is the hot spot of people’s responses to a disaster? For example, are the impact areas the hot spots of disaster-related social media activities?

Time How do people’s responses change with the evolution of a disaster (before, during and after)? For example, when do disaster-related social media activities reach peak in the process of a disaster?

Content How do people’s responses vary according to their posted content? For example, how many social media feeds report power outage in a disaster?

Network Who are the important players in spreading disaster-related information on social media in a disaster? For example, how many reposted messages are originally from emergency management agencies?

Space∩Time How do people’s responses to a disaster vary across space and over time? For example, do people’s social media activities from the impact area form a significant hot spot immediately after being struck by a disaster?

Space∩Content How do people’s conversational topics related to a disaster on social media vary across space? For example, do people proximate to the impact area have more on-topic messages than distant people do?

Space∩Network What is the spatial manifestation of the network structure in a disaster? For example, who are the local opinion leaders in disseminating disaster-related information for a given place?

Time∩Content How do people’s conversational topics vary with the evolution of a disaster? For example, do people change their topics from preparedness (e.g., survival kits and food stock) to impact (e.g., damage and casualty)?

Time∩Network What is the temporal manifestation of the network structure in a disaster? For example, is the same set of opinion leaders dominant in all phases of a disaster?

Content∩Network Which topic goes viral in a disaster situation? For example, how do rumor messages spread across the social network?

Space∩Time∩Content What is the space-time pattern of people’s topics in a disaster? For example, where is the hot spot of transportation-related social media activities when a disaster unfolds?

Space∩Time∩Network What is the space-time manifestation of the network structure in a disaster? For example, is the same set of local opinion leaders dominant in all phases of a disaster for a given place?

Space∩Content∩Network How does geographical space characterize the diffusion of social media messages under a certain topic? For example, what is the spatial extent of the spreading of rumor messages in a disaster?

Time∩Content∩Network What are the temporal dynamics of the diffusion of social media messages under a certain topic? For example, how long do rumor messages keep spreading?

Space∩Time∩Content∩Network How do space and time jointly characterize the diffusion of social media messages under a certain topic? For example, what is the space-time extent of the diffusion of rumor messages in a disaster?


Due to some computational constraints, it is challenging to analyze many dimensions simultaneously. As a result, if we compare the first four rows with the remaining eleven rows in Table 1, we can observe that articles containing simultaneous analyses are fewer than those involving separate analyses. Moreover, Table 1 shows that no references correspond to Space∩Network and Space∩Content∩Network, which suggests that space and network are difficult to analyze simultaneously. For example, it is difficult to build a spatial retweet network, because retweets are usually not geo-tagged (Gupta et al., 2013). However, there are two exceptions in our collection of papers (Kogan et al., 2015 and Kryvasheyeu et al., 2015), which have overcome these constraints and successfully combined space with network (see Table 1). Kogan et al. (2015) first glean tweets related to Hurricane Sandy with a bounding box representing the impact area; then, users who have posted geotagged tweets within the impact area are identified as geographically vulnerable users; finally, all the retweets posted by geographically vulnerable users are selected to build social networks. In this way, they compare the authors of original tweets and the retweet authors (i.e., geographically vulnerable users) and find an overlap between them, indicating that “the geographically vulnerable are more likely to propagate tweets from other geographically vulnerable users” (Kogan et al., 2015).
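The three-step procedure attributed to Kogan et al. (2015) can be paraphrased in code as follows. This is only a schematic reading of the description above, not their implementation: the record fields (`user`, `coords`, `retweet_of`), the bounding box values, and the five sample records are all hypothetical.

```python
def in_bbox(lon, lat, bbox):
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def vulnerable_retweet_edges(tweets, bbox):
    """Identify users who tweeted inside the box, then collect their retweets as edges."""
    # Users with at least one geotagged tweet inside the bounding box.
    vulnerable = {
        t["user"] for t in tweets
        if t.get("coords") is not None and in_bbox(*t["coords"], bbox)
    }
    # Retweets posted by those users become (retweeter, original author) edges.
    edges = [
        (t["user"], t["retweet_of"]) for t in tweets
        if t["user"] in vulnerable and t.get("retweet_of")
    ]
    return vulnerable, edges


# Hypothetical bounding box (min_lon, min_lat, max_lon, max_lat) and records.
BBOX = (-74.3, 40.4, -73.6, 41.0)
tweets = [
    {"user": "u1", "coords": (-74.0, 40.7), "retweet_of": None},
    {"user": "u1", "coords": None, "retweet_of": "u2"},
    {"user": "u2", "coords": (-73.9, 40.8), "retweet_of": None},
    {"user": "u3", "coords": (-87.6, 41.9), "retweet_of": "u1"},
    {"user": "u3", "coords": None, "retweet_of": "u2"},
]
vulnerable, edges = vulnerable_retweet_edges(tweets, BBOX)
share = sum(author in vulnerable for _, author in edges) / len(edges)
print(sorted(vulnerable), edges, share)
```

The key idea is that location is attached to users rather than to individual retweets, which sidesteps the problem that retweets are rarely geo-tagged. The `share` value is the fraction of retweets whose original author is also a vulnerable user, the kind of overlap the quoted finding describes.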

Kryvasheyeu et al. (2015) first identify the impact area of Hurricane Sandy; then they use profile locations as surrogates for non-georeferenced tweets to increase the number of geo-located tweets; finally, they build a spatial-social network based on friends/followers relationships. In this way, they find that users with strong network centrality tend to be aware of Hurricane Sandy earlier, and that users within the hurricane-affected area show an awareness advantage when compared with those outside the impact area (Kryvasheyeu et al., 2015). In view of this, future efforts could be devoted to overcoming these computational obstacles, enabling more simultaneous analyses of dimensions and richer information for natural disaster management.

Aside from serving as a taxonomy, these combinations could also guide researchers in deriving data analysis tasks. The examples listed in Table 2 are merely illustrative and by no means exhaustive. A major heuristic purpose of Table 2 is to encourage researchers to reflect on how to fully mine the dimensions to gain useful information and raise new research questions.

2.4 Fusing social media data with authoritative data

While social media, as one of the most commonly used big data sources, often provides high-volume data, it could lack richness or quality of information if used as a standalone data source (Crampton et al. 2013). Therefore, researchers have proposed to complement or synthesize social media data with traditional or authoritative data (Goodchild and Glennon 2010, Crampton et al. 2013, Kwan 2016). Although there are some such studies in natural disaster management, current efforts are far from sufficient and require further improvement.

In these studies, three major data sets, namely surveillance data, remote-sensing imagery, and census data, are fused with social media data to add informational richness to analyses (Kent and Capello 2013, Schnebele and Cervone 2013, Schnebele et al. 2014a, Xiao et al. 2015, Cervone et al. 2016, Sun et al. 2016). Although observations recorded by surveillance systems or monitoring networks (e.g. river gauge data) are equally important, remote-sensing imagery and census data have been increasingly integrated with social media data in natural disaster management (Kent and Capello 2013, Schnebele and Cervone 2013, Schnebele et al. 2014a, Xiao et al. 2015, Cervone et al. 2016). As such, we place particular emphasis on reviewing the opportunities and challenges of combining social media data with remote-sensing imagery and census data.


2.4.1 Fusing with remote-sensing data

Remote sensing has been widely used to assist disaster management in recent years due to its capability of providing information for poorly accessible areas or areas with sparse ground measurements (Tralli et al. 2005, Gillespie et al. 2007, Joyce et al. 2009). One such example is the application of Landsat Thematic Mapper (TM) imagery and synthetic aperture radar (SAR) data for flood extent estimation and flood volume calculation to support flood management (Sanyal and Lu 2004, Rakwatin et al. 2013). However, remote sensing has its own constraints. First, not all remote-sensing data are freely available. Second, vegetation and cloud cover may compromise the quality of some remote-sensing imagery. Third, lengthy revisit times prevent remote sensing from providing continuous observations with ideal temporal resolution. Social media data have limitations of their own: quality and reliability issues, their unstructured nature, the digital divide, and privacy are among the most frequently raised concerns. Notably, the digital divide implies that social media users represent only a certain subset of the whole population, and groups that are vulnerable in disasters, such as children and elderly people, may be underrepresented. Processing social media data may also expose personal lives, locations, and activities, causing privacy issues (Elwood and Leszczynski 2011, Sui and Goodchild 2011, Zook et al. 2015a, Zook et al. 2015b). In spite of these limitations, social media data, which are available virtually in real time and freely accessible, could be complementary to remote-sensing data. We summarize the strengths and limitations of social media data, remote-sensing data and census data in Table 3.

As seen from Table 3, social media data and remote-sensing data are highly complementary and have great potential to be fused together to make use of the strengths of both while offsetting their weaknesses. This has already been confirmed by some existing studies, including Schnebele and Cervone (2013), Schnebele et al. (2014a) and Cervone et al. (2016). As pointed out by Schnebele and Cervone (2013), social media data can verify the presence of water in a specific area and augment the flood hazard map created from multispectral Landsat ETM+ images by altering water contour lines and reclassifying water pixels. Schnebele et al. (2014a) show that the estimation of flood extent and the identification of affected roads during a flood disaster can be improved by fusing social media data with SAR imagery and other data. Cervone et al. (2016) reveal that Twitter can be used to quickly task the collection of remote-sensing imagery and rapidly assess disaster damage. However, studies that fuse social media data with remote-sensing data are still rare and some research gaps need to be filled. Examples of these gaps include, but are not limited to:

(1) Flooding has been given the most attention by researchers, while studies on other natural disasters are lacking.

(2) Studies should go beyond the spatial dimension and incorporate more dimensions of social media data to refine the fusion. For example, the space–time hot spots of a ‘transportation’ topic will be more refined to task the collection of remote-sensing data for transportation damage assessment in flooding.

(3) Showing all the data on one map is not a good fusion. Studies should go beyond this simple overlay of social media data and remote-sensing imagery by truly fusing these heterogeneous data with different spatial and temporal resolutions.

2.4.2 Fusing with census data

Natural disasters have been analyzed under a ‘sociopolitical ecology of disasters’ framework, because ‘disasters do not affect members of society equally’ (Fothergill and Peek 2004). In other words, people’s vulnerability to natural disasters varies with their demographic and socioeconomic characteristics. Although social media can capture people’s varied responses to disasters, it provides limited or inaccurate information about their demographic and socioeconomic characteristics such as income and education. In this sense, research questions such as how disaster responses vary among social classes can hardly be addressed using social media as the single data source. Additionally, demographic and socioeconomic characteristics have been shown to be important factors in shaping risk perceptions (Botzen et al. 2009). Therefore, when analyzing the variations of people’s responses to disasters, demographic and socioeconomic data can provide more explanatory power. They also provide an opportunity for emergency responders to learn the effects of inequality among different socioeconomic groups in disaster situations and to design differentiated interventions. The census, as an important source of demographic and socioeconomic data, could in spite of some limitations provide quality information to complement social media data, as shown in Table 3. Kent and Capello (2013) reveal that demographic features (e.g. age under 18, population residing in rental property and the population density) at the census block level could explain the spatial variation of people’s responses to wildfires on Instagram, Twitter, Flickr and Picasa. Similarly, Xiao et al. (2015) find that people’s responses to Hurricane Sandy, represented by tweet frequency at the census tract level, are related to socioeconomic and demographic factors such as age, gender and income.

There are also research gaps in the current literature. Examples of these gaps include, but are not limited to:

(1) Existing studies have simply aggregated the geo-referenced social media data to census area units and related them to demographic and socioeconomic variables at the same level. Refined indicators should be developed to go beyond this spatial aggregation and exclude the influence of population and geo-technical factors. One preliminary indicator may be the ratio of disaster-related social media feeds to the general feeds.

(2) The social vulnerability (Cutter et al. 2003) may be a good framework for choosing demographic and socioeconomic variables from census data.

(3) Studies should go beyond the spatial dimension and incorporate more dimensions of social media data. For example, risk perceptions could be analyzed via a content analysis of social media feeds, and then an examination of how they differ across demographic and socioeconomic groups could follow.
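The preliminary indicator mentioned in item (1) can be illustrated with a minimal sketch. It assumes that each post has already been assigned to a census unit and flagged as disaster-related or not; the unit identifiers, the flag, and the function name are hypothetical and serve only as an illustration.

```python
from collections import Counter

def disaster_feed_ratio(posts):
    """Return {unit_id: disaster-related posts / all posts} per census unit.

    `posts` is an iterable of (unit_id, is_disaster_related) pairs; both
    fields are hypothetical names used only for this illustration.
    """
    total = Counter()
    related = Counter()
    for unit_id, is_related in posts:
        total[unit_id] += 1
        if is_related:
            related[unit_id] += 1
    # Only units with at least one post appear in `total`, so the
    # division below never divides by zero.
    return {unit: related[unit] / total[unit] for unit in total}

# Toy input: tract_A has 1 disaster-related post out of 4, tract_B has 2 out of 2.
posts = [
    ("tract_A", True), ("tract_A", False), ("tract_A", False), ("tract_A", False),
    ("tract_B", True), ("tract_B", True),
]
ratios = disaster_feed_ratio(posts)
```

Because the denominator is the unit's own general feed volume, a unit with many users does not automatically score higher than a sparsely populated one.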

Table 3. Strengths and limitations of remote sensing, social media, and census data

| | Remote sensing | Social media | Census data |
|---|---|---|---|
| Strengths | Providing information for poorly accessible areas or areas with sparse ground measurements; capturing physical characteristics; data are captured remotely with little risk of lives | Real-time data; freely accessible; recording human activities | Little privacy concern; reflecting demographic and socioeconomic features; most are freely accessible |
| Limitations | Not all data are freely accessible; could be influenced by cloud and vegetation cover; lengthy revisit time | Quality and reliability problems; unstructured nature; digital divide; privacy; limited spatial information | Long time interval (released yearly or longer); aggregate data at areal units (e.g., block, block group, and census tract) |

2.5 Conclusion

As natural disasters become more frequent and severe, their management increasingly needs human-centric information to facilitate better decision-making and reduce human and property losses. In this context, social media, which records a large amount of human activity, has attracted increasing attention from researchers. Reviewing a set of recently published papers, we propose a framework to classify existing studies and facilitate the generation of data analysis tasks. We also suggest a fusion of social media data with remote-sensing data and census data to gain more useful information for natural disaster management.

This chapter is not without limitations. First, as we mainly focus on four dimensions, other dimensions might have been overlooked. Hence, we encourage others to update our framework by incorporating more dimensions. Second, the content analysis covers only text, while other content such as pictures and videos posted by social media users can also be informative for natural disaster management. Third, aside from social media data, informative user-generated content and VGI can also be found in OpenStreetMap, smartphone apps and other sources (Ye et al. 2016). Fourth, the classification results may differ slightly depending on who carries out the classification of articles. Thus, future efforts are needed to better deal with this uncertainty. Fifth, the current classification schema is based on combinations of dimensions, but another framework, in which studies are categorized by the disaster management phases they most closely correspond to, could also be useful. As a next step, we will shed more light on the development of social media analytics to satisfy the diverse needs of different disaster management phases. Finally, we emphasize the analyses of social media data but rarely mention how these analyses could be conducted more efficiently.

Due to its capability of collecting, storing and processing massive, unstructured and real-time datasets (Wang 2010, Wang et al. 2013, Huang et al. 2015), CyberGIS could provide a scalable solution for analyzing social media data in natural disaster contexts. According to Wang (2013), CyberGIS ‘represents a new-generation GIS based on the synthesis of advanced cyberinfrastructure, geographic information science, and spatial analysis and modeling’. Therefore, to gain high-performance computing power for natural disaster management, the analyses of the four dimensions in social media data, as well as the fusion of social media data with census data and remote-sensing imagery, could be embedded in CyberGIS. In spite of these limitations and drawbacks, our work goes beyond a simple review of existing studies and has great potential for guiding future efforts in the domain of natural disaster management.

Specifically, this study shows that simultaneous analyses of multiple dimensions of social media data are rare in current literature. Therefore, it might be helpful for future studies to put more emphasis on overcoming the computational constraints to analyze more dimensions simultaneously. Generally, other things being equal, with more data being analyzed, more information richness could be gained. We find that data fusion of social media data with authoritative datasets (especially census data and remote-sensing imagery) is far from being mature. The spatial dimension in social media data has been given particular attention, while other dimensions have not been fully exploited in data fusion. Additionally, the fusion of social media data with census data and remote sensing imagery is mainly about simple overlay and aggregation. To solve this problem, Linked Data and Semantic Web technologies that are capable of integrating various and heterogeneous data sources could serve as a good starting point toward a more meaningful data fusion (Purves et al. 2007, Goodwin et al. 2008, Janowicz et al. 2012, Grütter et al. 2017). Aside from disaster management, we believe that the general structure of our framework could be extended to other fields such as politics and public health to serve as guidance for better data mining.


CHAPTER 3

ANALYZING WILDFIRE TWITTER ACTIVITIES: SPACE, TIME, CONTENT, AND NETWORK [2]

3.1 Introduction

As more and more fire-prone areas have been urbanized, people’s livelihoods in the western U.S. have been severely affected by increasingly frequent wildfires (Pyne, 2004; Collins, 2008; Collins and Bolin, 2009). In October 2003, the largest fire in California history caused huge damage to San Diego County (Goldstein, 2008). In May 2014, San Diego County witnessed several destructive wildfires within one month (see Table 5).

The increasing wildfire activity, with its associated risks for nature and society, has attracted attention from researchers as well as emergency managers (Rodrigues and Riva 2014). To better understand the occurrence and spread patterns of wildfires, domain scientists have approached the problem from various perspectives, including wildfire exposure modeling (Ager et al., 2014a; Ager et al., 2014b; Thompson et al., 2015; Youssouf et al., 2014), wildfire risk assessment (Chuvieco et al., 2010; Chuvieco et al., 2012; Martínez et al., 2009; Padilla and Vega-Garcia, 2011; Rodrigues et al., 2014), wildfire and the wildland-urban interface (WUI) (Herrero-Corral et al., 2012; Massada et al., 2009; Schulte and Miller, 2010), and wildfire-climate interactions (Gillett et al., 2004; Liu et al., 2014; Westerling et al., 2006), among others. To deal with the risks posed by wildfires at the early stages and reduce the increased costs, wildfire management agencies have incorporated various wildfire detection systems, e.g., the general public, lookout towers, terrestrial mobile brigades, and aerial reconnaissance (Rego et al., 2013). The Wildland Fire Decision Support System (WFDSS) has also been developed to provide advanced tools for burn probability modeling and exposure analysis, thereby improving real-time wildfire suppression decision making (Calkin et al., 2011).

[2] This chapter is based on Wang, Z., Ye, X., & Tsou, M. H. (2016). Spatial, temporal, and content analysis of Twitter for wildfire hazards. Natural Hazards, 83(1), 523-540.

Although many studies and practices have addressed wildfire issues, most of them did not take a human-centric perspective and omitted wildfire-related human behaviors (Slavkovikj et al., 2014). This might be due to a lack of available data and the fact that collecting survey data often requires a large amount of time and budget. As an emerging human-centric sensing technology, social media outlets such as Twitter, Facebook, or LinkedIn have gone beyond platforms for sharing people’s personal lives and have become a data source for examining people’s behavioral patterns (Srivastava, Abdelzaher, and Szymanski, 2012; Tsou and Leitner, 2013; Tsou et al., 2013; Young, 2014). Consequently, an increasing number of studies have started to characterize the way people become aware of, respond to, and recover from disasters using social media data. According to Endsley (1995), situational awareness is “the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future”. Disaster-relevant messages communicated by social media users have been used to analyze how people improve their situational awareness through the information contributed by others (Vieweg et al., 2010; De Albuquerque et al., 2015).

Space and time are strongly related to situational awareness in emergency events. Existing studies have revealed that social media users who are geographically proximate to the events are more likely to produce useful information for improving situational awareness. De Albuquerque et al. (2015) carry out a spatial analysis and find a strong spatial relationship between locational proximity to floods and the usefulness of the messages for crisis management. By analyzing the earthquake-related tweets in Japan, Acar and Muraki (2011) find that “people in directly affected areas tend to tweet about their unsafe and uncertain situation while people in remote areas post messages to let their followers know that they are safe”. In addition, the temporal evolution of emergency events and the corresponding Twitter activities have been shown to be somewhat concurrent. Guan and Chen (2014) find that the ratio of tweets associated with Hurricane Sandy to general tweets increased gradually before the disaster, peaked when the hurricane made landfall, and then gradually decreased. Huang and Xiao (2015) indicate that messages posted by Twitter users varied with the temporal process of a disaster and thus could provide useful information for improving situational awareness at different stages of a disaster, i.e., preparedness, response, impact, and recovery.

Besides analyzing spatial and temporal characteristics of disaster-related social media data, some studies focus on mining the actual content of social media messages to improve knowledge about disaster situations. This is usually carried out through a data reduction process such as classification, as these user-generated messages are extremely varied and some of them are not informative or relevant. Qu et al. (2012) divide the earthquake-related microblog messages that carry valuable information for improving situational awareness into four categories, i.e., situation update, opinion expression, emotional support, and calling for action. Cameron et al. (2012) develop a platform for emergency situation awareness, which can detect emergent incidents and classify tweets as interesting or not. Imran et al. (2013a) and Imran et al. (2013b) utilize machine learning methods to extract informative Twitter messages that can augment situational awareness and classify them into “fine-grained” classes, i.e., caution and advice, casualties and damage, donations, people, information sources, and other. Imran et al. (2014a) further design an Artificial Intelligence for Disaster Response (AIDR) platform to automatically classify emergency-related Twitter messages into a set of user-defined situational awareness categories in a timely manner.

According to the two-step flow of communication theory, there are “gatekeepers” who filter and interpret information using their own perceptions before passing it on to the public (Xu et al. 2013). On social media, these “gatekeepers” are usually elite users or opinion leaders from whom the general public acquires information. In disaster situations, people may also tend to obtain situational updates and gain situational awareness from the informative messages shared by opinion leaders. However, to our knowledge, few studies have used social media data to investigate who the opinion leaders are and what roles they play in the information exchange network related to disasters. Exceptions include works by Cheong and Cheong (2011), Kogan et al. (2015), and Starbird and Palen (2010). Using social network analysis, Cheong and Cheong (2011) find that local authorities, traditional media reporters, etc. were important players in spreading situational information during the 2010-2011 Australian floods. Kogan et al. (2015) indicate that local government authorities and the media were the most important nodes in the retweet network during Hurricane Sandy in 2012. Starbird and Palen (2010) observe a similar phenomenon: “users are more likely to retweet information originally distributed through Twitter accounts run by media, especially the local media, and traditional service organizations”.

Space, time, content, and network are important attributes of emergency-related social media data and should be fully used to gain insights into emergency situational awareness. This chapter presents the findings from examining the spatial and temporal variations of wildfire-related tweets, from our attempt to characterize wildfires by the discussion topics in the collected tweets, and from investigating the role of opinion leaders in people’s acquisition of wildfire-related information. In the following sections, we first introduce our data and related methodology. We then present the findings and their implications. Finally, we discuss what the findings suggest and what future work on this topic could pursue.

3.2 Data and methodology

3.2.1 Data

We used the Twitter search API (https://search.twitter.com/) to collect wildfire-related tweets. Our collection process had two phases. In the first phase, any tweet that contained either of the two keywords, ‘fire’ and ‘wildfire’, was collected to generate a holistic picture of online “wildfires”. Those tweets demonstrate that people usually tweeted about wildfires along with the places where the wildfires occurred. The examples listed below provide some such instances:

“Flames jumping San Dieguito Rd, Camino del Sur, Rancho Bernardo evacuations in order.”

“The fires are right behind my parents’ house in San Marcos, Please pray.”

In the second phase, to solicit more information about specific wildfires, tweets associated with those wildfires were gleaned using the names of the places where they occurred as keywords. We randomly selected two keywords, San Marcos and Bernardo, from a list of places (see Table 5). As the tweets collected using toponyms as keywords may contain noise that has nothing to do with wildfire, we filtered out this noise by checking whether ‘fire’ or ‘wildfire’ also appears in the collected tweets. In summary, the first phase focused on collecting general tweets related to wildfires, while the second phase centered upon the tweets pertinent to specific wildfires. Tweets collected in the first phase could be used in the analysis of all dimensions (i.e., space, time, content, and network), whereas tweets gleaned in the second phase are of particular importance for spatial analysis, as only by identifying the accurate ignition locations of specific wildfires can we investigate the influence of geography (distance) on the spatial distribution of people’s responses.
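The two-phase keyword logic described above can be sketched as follows. This is only an illustration applied to tweets that are already in memory; it does not reproduce calls to the Twitter search API, and the case-insensitive substring matching rule and the function names are assumptions rather than the exact procedure used in the study.

```python
GENERAL_KEYWORDS = ("fire", "wildfire")

def mentions_fire(text):
    """True if the tweet text contains 'fire' or 'wildfire' (case-insensitive substring)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in GENERAL_KEYWORDS)

def phase_one(tweets):
    """Phase 1: keep every tweet that mentions 'fire' or 'wildfire'."""
    return [t for t in tweets if mentions_fire(t)]

def phase_two(tweets, place):
    """Phase 2: keep tweets that mention the place name AND a fire keyword,
    which removes toponym matches unrelated to wildfire."""
    place = place.lower()
    return [t for t in tweets if place in t.lower() and mentions_fire(t)]

tweets = [
    "The fires are right behind my parents' house in San Marcos, Please pray.",
    "Brunch in San Marcos was great today",
    "Wildfire smoke visible from the coast",
    "Rancho Bernardo evacuations in order, fire jumping the road",
]
general = phase_one(tweets)
san_marcos = phase_two(tweets, "San Marcos")
bernardo = phase_two(tweets, "Bernardo")
```

In this toy input, the second tweet matches the toponym but is discarded in phase two because it contains neither fire keyword.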

Temporally, our study period spans from May 13, 2014, when the first wildfire occurred, to May 22, 2014, when most of the destructive wildfires were 100% contained. Spatially, a radius of 40 miles was set to specify a circular area (centered at downtown) to cover the majority of San Diego County.
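The temporal and spatial bounds of the study can be expressed as a simple filter. The sketch below is illustrative only: the centre coordinates are an assumed approximate location for downtown San Diego (the text does not give them), and the great-circle (haversine) distance is one reasonable way to apply a 40-mile radius, not necessarily the one used by the search API.

```python
import math
from datetime import date

EARTH_RADIUS_MILES = 3958.8
# Assumption: approximate coordinates of downtown San Diego (not given in the text).
CENTER_LAT, CENTER_LON = 32.7157, -117.1611
RADIUS_MILES = 40.0
START, END = date(2014, 5, 13), date(2014, 5, 22)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def in_study_area(lat, lon, day):
    """True if a geotagged tweet falls inside both the study period and the 40-mile circle."""
    in_window = START <= day <= END
    in_circle = haversine_miles(CENTER_LAT, CENTER_LON, lat, lon) <= RADIUS_MILES
    return in_window and in_circle

# A point near the Bernardo fire ignition location (33.003, -117.133) on May 14.
near_bernardo = in_study_area(33.003, -117.133, date(2014, 5, 14))
```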

There are several metadata fields in every tweet. The spatial analysis is possible only with tweets that have geographic information (described in the metadata field ‘coordinates’). Fewer than 5 percent of the downloaded tweets have geo-coordinates. We collected 1334 geotagged tweets in the ‘fire’ and ‘wildfire’ pool. In terms of specific wildfires, after filtering out the noise, we retained 106 geotagged tweets with ‘Bernardo’ as the keyword and 149 with ‘San Marcos’ as the keyword (see Table 4 for a data summary). The other tweets do not have coordinates because the devices used to post them did not have their built-in global positioning systems turned on.

Twitter users could either write their own words or re-post another user’s tweet (i.e., retweet) to generate a text. We focused on the texts of ‘wildfire’ tweets (including original tweets and retweets) to identify people’s conversational topics on Twitter during wildfire hazards. After that, we built a retweet network using ‘wildfire’ retweets to learn who the opinion leaders were and what roles they played.


Table 4. Data summary for the collected tweets

| Keyword | Total tweets | Tweets with geo-coordinates |
|---|---|---|
| Fire | 32410 | 3.9% |
| Wildfire | 2084 | 3.2% |
| Bernardo | 2049 | 5.2% |
| San Marcos | 5002 | 3.0% |

3.2.2 Methods

Several specific methods were adopted in our study: kernel density estimation (KDE) to analyze the spatial pattern of wildfire-related tweets; text mining to identify conversational topics; and social network analysis to detect the opinion leaders in wildfire hazards.

KDE is generally used to detect hot spots in spatial point data. Here, this technique was used to create a smoothed map of the wildfire-related tweets. KDE takes the coordinates of tweets as input and outputs a raster map in which each cell is assigned a value representing the intensity level (Han et al., 2015). Concentrations of high-intensity cells are hot spots. To deal with the impact of population, a dual kernel density estimation (Dual KDE) was implemented to map the spatial patterns of tweets associated with two specific wildfires (i.e., the Bernardo fire and the San Marcos fire). The population data were obtained at the census block level. To convert the areal data to point data, the population value of each census block was assigned to its centroid. After that, KDE was used to generate the population map in raster format. The formula of Dual KDE is given as:

Dual KDE Map = Each Cell Value of Tweets Map / Each Cell Value of Population Map
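A minimal sketch of this cell-by-cell ratio is given below. It assumes a Gaussian kernel, planar (projected) coordinates, and a single bandwidth for both surfaces; none of these choices is stated in the text, and the function names are illustrative. Cells with essentially no population density are returned as None rather than dividing by zero, which is also an assumption of this sketch.

```python
import math

def gaussian_kde_grid(points, weights, xs, ys, bandwidth):
    """Weighted Gaussian kernel density at each grid cell centre; result is indexed [row y][column x]."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    grid = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for (px, py), w in zip(points, weights):
                d2 = (x - px) ** 2 + (y - py) ** 2
                total += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        grid.append(row)
    return grid

def dual_kde(tweet_grid, population_grid, min_population=1e-9):
    """Cell-by-cell ratio of tweet density to population density (None where population is ~0)."""
    return [
        [t / p if p > min_population else None for t, p in zip(t_row, p_row)]
        for t_row, p_row in zip(tweet_grid, population_grid)
    ]

# Toy data: many tweets in a populous centre (4, 4), few tweets near a sparsely populated fire (1, 1).
xs = ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
tweets = [(1.0, 1.0)] * 2 + [(4.0, 4.0)] * 5
centroids = [(1.0, 1.0), (4.0, 4.0)]        # census block centroids
populations = [100.0, 1000.0]               # population assigned to each centroid

tweet_map = gaussian_kde_grid(tweets, [1.0] * len(tweets), xs, ys, bandwidth=1.0)
population_map = gaussian_kde_grid(centroids, populations, xs, ys, bandwidth=1.0)
ratio_map = dual_kde(tweet_map, population_map)
```

In this toy example the raw tweet density is highest at the populous cell, while the ratio surface is highest at the cell near the fire, which is the effect the population normalization is meant to produce.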

We also analyzed the content and network dimensions, which included text mining to identify important terms and term clusters in wildfire-related tweets and social network analysis to detect users’ structure and behavior in the wildfire retweet network. The text mining of wildfire-related tweets was conducted using the ‘tm’ package in R 3.1.2 (Feinerer, Hornik, and Meyer 2008; Feinerer and Hornik 2014). Since raw tweets are usually unstructured and noisy, they need to be cleaned before term frequencies are calculated and terms are clustered. Following Ghosh and Guha (2013), we cleaned the raw tweets by removing URLs and stop words, converting a word’s different morphological variants to the word’s base form, and so on. In this process, some commonplace but uninformative words such as California, San Diego, county, and news were also removed. Notably, since our tweets contained many toponyms made up of more than one word (e.g., 4S Ranch), we combined those words so that each toponym was represented by one word, to avoid double counting. After cleaning the raw tweets, we obtained a term-document matrix, where a row stands for a term and a column for a tweet (Zhao, 2012). We then calculated the frequency of terms to check their variation in importance. To identify the conversational topics related to wildfire, we used the k-means clustering method to group terms into clusters. With this method, terms that frequently appeared in the same documents were grouped into one cluster, and terms grouped into one cluster were more likely to be seen in the same document.
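The cleaning, term-document matrix, and term clustering steps can be illustrated in a few lines. The study itself used the ‘tm’ package in R; the sketch below is a plain Python stand-in with an assumed stop-word list, no stemming step, and a basic k-means seeded with two hand-picked terms so that the result is deterministic. All names and word lists are illustrative.

```python
import re
from collections import Counter

# Illustrative word lists; the chapter's actual stop-word list is not given.
STOP_WORDS = {"the", "a", "in", "of", "to", "you", "california", "county", "news", "sandiego"}
# Multi-word toponyms are merged into single tokens to avoid double counting.
TOPONYMS = {"san diego": "sandiego", "4s ranch": "4sranch"}

def clean(tweet):
    """Lowercase, strip URLs, merge toponyms, tokenize, and drop stop words."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    for phrase, token in TOPONYMS.items():
        text = text.replace(phrase, token)
    return [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOP_WORDS]

def term_document_matrix(docs):
    """Rows are terms, columns are tweets, cells are term counts."""
    terms = sorted({t for doc in docs for t in doc})
    return {term: [doc.count(term) for doc in docs] for term in terms}

def kmeans(vectors, centroids, iterations=10):
    """Plain Lloyd's algorithm; returns one cluster index per vector."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centroids[k])))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

tweets = [
    "Evacuate homes near 4S Ranch now http://example.com/x",
    "Homes evacuate in 4S Ranch",
    "Thank you firefighters",
    "Thank the firefighters in San Diego County",
]
docs = [clean(t) for t in tweets]
tdm = term_document_matrix(docs)
frequency = Counter({term: sum(row) for term, row in tdm.items()})

terms = list(tdm)
# Deterministic start: seed the two clusters with the rows of two chosen terms.
labels = kmeans([tdm[t] for t in terms], [tdm["evacuate"], tdm["thank"]])
clusters = {}
for term, label in zip(terms, labels):
    clusters.setdefault(label, []).append(term)
```

Because each term is represented by its row of the term-document matrix, terms that occur in the same tweets have similar vectors and end up in the same cluster.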

We used retweets to perform social network analysis. Retweet (abbreviated as RT) is a function provided by Twitter that allows users to repost content that has been posted by others. We can directly identify retweets, because the text of a retweet always starts with “RT@Username”. In the retweet network, the nodes are users who retweet other users’ messages, as well as users who are retweeted by others. Another R package, ‘igraph’ (Csardi and Nepusz 2006), was used to conduct the social network analysis.
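Identifying retweets from the leading “RT@Username” pattern can be sketched with a regular expression. The snippet below is an illustration, not the procedure implemented in R: the tuple layout of the input and the function name are assumptions, and the pattern tolerates an optional space between “RT” and “@”.

```python
import re

# Matches a tweet text that starts with "RT", optional whitespace, then "@username".
RT_PATTERN = re.compile(r"^RT\s*@(\w+)")

def retweet_pairs(tweets):
    """Return (retweeting_user, retweeted_user) pairs for tweets that are retweets.

    `tweets` is an iterable of (author, text) tuples; this layout is assumed
    for illustration only.
    """
    pairs = []
    for author, text in tweets:
        match = RT_PATTERN.match(text)
        if match:
            pairs.append((author, match.group(1)))
    return pairs

tweets = [
    ("alice", "RT @10news: Evacuations ordered in 4S Ranch area"),
    ("bob", "RT@KPBSnews Bernardo fire update"),
    ("carol", "Stay safe everyone, the fire is close"),
]
pairs = retweet_pairs(tweets)
```

Each pair links one retweeting user to one retweeted user, which is the raw material for the retweet network.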


3.3 Spatial and temporal analysis of wildfire Twitter activities

In this section, we analyze the spatial and temporal relationship between social media activities and wildfire disruptions from two perspectives. First, we checked the temporal evolution of wildfire tweets and compared it with the wildfires’ temporal information (i.e., time of outbreak and time of 100% containment) collected from an authoritative source. Second, we examined whether the impact areas were clusters of wildfire tweets.

Table 5 presents basic spatiotemporal information on the major wildfires that occurred in our study period. CAL FIRE provided longitudes and latitudes for only some of the fires, so the geo-coordinates of the other fires were inferred from their locations (see the fourth column in Table 5). The information in Table 5 provided a basis for our spatial and temporal analysis. This table shows that six of the nine wildfires broke out on May 14, which could explain why May 14 had a sudden increase in wildfire tweets (as shown by Figure 5). A temporally concurrent evolution of a wildfire and its related tweets could also be observed from Figures 6(a) and 6(b). More specifically, the Bernardo fire and the San Marcos fire both had their tweets peak on the day after the outbreak day. This one-day time lag is probably because information takes time to diffuse on Twitter.
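The comparison between a fire's outbreak day and the peak day of its tweets amounts to counting tweets per day and measuring the offset. The sketch below uses made-up daily counts purely for illustration (they are not the study's data), works with calendar dates only, and, if several days tie for the peak, takes the first one encountered.

```python
from collections import Counter
from datetime import date

def daily_counts(tweet_dates):
    """Number of tweets posted on each calendar day."""
    return Counter(tweet_dates)

def peak_lag_days(tweet_dates, outbreak_day):
    """Days between the outbreak day and the day with the most tweets."""
    counts = daily_counts(tweet_dates)
    peak_day = max(counts, key=counts.get)
    return (peak_day - outbreak_day).days

# Hypothetical tweet dates for a fire that broke out on May 13, 2014.
tweet_dates = (
    [date(2014, 5, 13)] * 3 + [date(2014, 5, 14)] * 8 + [date(2014, 5, 15)] * 4
)
counts = daily_counts(tweet_dates)
lag = peak_lag_days(tweet_dates, date(2014, 5, 13))
```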

Figure 5. Temporal evolution of wildfire-related tweets with keywords of ‘fire’ and ‘wildfire’ (x-axis: date; y-axis: number of tweets, 0 to 14,000)


Table 5. Overview of the major wildfires in May, 2014

| Major wildfires | Time of outbreak (UTC) | Time of 100% contained (UTC) | Location | Long/Lat | Acre |
|---|---|---|---|---|---|
| Bernardo Fire | May 13, 11:00 | May 17, 20:14 | Off Nighthawk Lane, southwest of Rancho Bernardo | -117.133/33.003 | 1,548 acres |
| Tomahawk Fire | May 14, 9:45 | May 19, 9:20 | Traveled from Naval Weapons Station, Fallbrook to Camp Pendleton | -117.285/33.353 | 5,367 acres |
| Poinsettia Fire (Carlsbad fire) | May 14, 10:30 | May 17, 12:00 | Off Poinsettia Ln & Alicante Rd in Carlsbad | -117.278/33.112 | 600 acres |
| | May 14, 13:00 | May 15, 18:30 | Off Old Hwy 395 and I-15 in the Deer Springs area | -117.162/33.312 | 380 acres |
| | May 14, 12:12 | May 19, 9:20 | North River Road and College Blvd., Oceanside | | 105 acres |
| Cocos Fire (San Marcos Fire) | May 14, 16:00 | May 22, 18:15 | Village Drive and Twin Oaks Road, San Marcos | -117.160/33.114 | 1,995 acres |
| Freeway Fire | May 14, 17:43 | May 20, 11:30 | Naval Weapons Station, Fallbrook | -117.260/33.370 | 56 acres |
| Pulgas Fire | May 15, 14:45 | May 21, 17:00 | Off Interstate 5 at Las Pulgas Rd, north of Oceanside | -117.463/33.303 | 14,416 acres |
| San Mateo Fire | May 16, 11:24 | May 20, 23:30 | In the Talega area of Marine Corps Base Camp Pendleton | -117.300/33.286 | 1,457 acres |

Sources: compiled from http://www.fire.ca.gov/.

Figure 6. Temporal evolutions of tweets with keywords including (a) Bernardo (b) San Marcos (x-axis: date; y-axis: number of tweets, 0 to 1,200 in panel a and 0 to 3,000 in panel b)

Figure 7. Spatial distribution of geo-tagged ‘fire’ and ‘wildfire’ tweets

Figure 8. Dual kernel density estimation of geo-tagged tweets on Bernardo fire

The spatial information (i.e., ignition locations) associated with the wildfires was used to identify the impact areas. Figure 7 shows that the downtown area was the largest hot spot in terms of the number of ‘fire’ and ‘wildfire’ tweets. This might be because, although San Diego is far away from the ignition locations, a large population could still generate numerous Twitter activities. Additionally, the digital divide between downtown and other areas in San Diego County might also explain this, as people in urbanized areas have more access to information and communication technologies (ICTs) than people in other areas do. Although non-spatial factors (e.g., population and digital divide) could explain the spatial pattern of wildfire tweets to some extent, geography still matters. To filter out the influence of population, dual kernel density estimation was performed to detect the clusters of tweets related to the Bernardo fire and the Cocos fire (see Figure 8 and Figure 9, respectively). As shown by Figure 8 and Figure 9, the downtown area became a low-value cluster, whereas clusters with values higher than medium were close to the wildfires’ ignition locations. Given that the number of geo-tagged tweets on the Bernardo fire and the Cocos fire was small, we included a population map of San Diego County (Figure 10) to rule out the impact of the “small number problem”. As seen from Figure 10, areas around Bernardo are also heavily populated. In fact, Bernardo (Rancho Bernardo) is one of the populous areas in San Diego County (https://www.sandiego.gov). As such, our finding is consistent with those of previous studies. For example, De Albuquerque et al. (2015) show that Twitter messages that are geographically close to flooded areas are more likely to be related to floods. Crooks et al. (2013) find that the majority of earthquake-related tweets originated from within the impact area. Our research provides new evidence on the relationship between geography and emergency-related social media activities.

Figure 9. Dual kernel density estimation of geo-tagged tweets on Cocos fire

Figure 10. Spatial distribution of population in San Diego County

3.4 Topics and network

Figure 11 shows the top 10 most frequent words. If a term appears frequently in tweets, it is regarded as important. Hence, as shown by Figure 11, the most important term was “evacuate”. This is consistent with what we expected, since the most urgent thing in wildfire situations is to evacuate, and people try to inform as many others as possible to evacuate. Among the “evacuate” tweets, a large part pertained to the evacuation of homes, resulting in a high frequency of “home”. For instance, someone tweeted “Check @KPBSnews for updates on Bernardo fire in San Diego County. 700 acres burned, 20,000 homes being evacuated” and another one posted a similar message saying, “California wildfire prompts evacuation of 20,000 homes”. “Home” was also jointly tweeted with “burn” and “Carlsbad” when some Twitter users reported the wildfire damage in Carlsbad, such as “#CALFIRE official says wildfire has burned at least 30 homes in #Carlsbad. Homes all is same neighborhood. #PoinsettiaFire #CBS8”. Similarly, “acre” was also used to indicate damage caused by wildfires, as in “California's wildfire season has ravaged nearly 10,000 acres”. When people were not sure about the exact location of a wildfire, they used “place name + area” to indicate a fuzzy place impacted by the wildfire. This is evidenced by tweets such as “evacuations ordered in 4S Ranch area due to brush fire”. To explicitly label their tweets as situational updates, users added “now” or “update” to them. Two examples are shown here: “Now: San Diego County says evacuation orders over for all 20,000 homes in wildfire danger #BernardoFire” and “UPDATE: #BernardoFire is now 700 acres and 5 percent contained”. Similar to “Carlsbad”, “Bernardo” was tweeted because of the outbreak of a wildfire there. School kids expressed their emotions when they learned that their “school” would be closed. This is exemplified by “Due to the wildfire in Carlsbad. School is cancelled tomorrow. This is exciting. I know I’m not the only one”.

Figure 11. Term frequency plot (terms shown: evacuate, home, burn, carlsbad, acre, area, now, update, bernardo, school; x-axis: frequency, 0 to 160)

Table 6 shows the seven clusters; within each cluster, only the top three terms are displayed. The number of clusters was chosen so that as many distinct topics as possible could be obtained. As observed from Table 6, these clusters represented different topics. Specifically, cluster 1 denoted the topic related to people’s thankfulness to firefighters; the topic revealed by cluster 2 was about the burned homes in Carlsbad; wind was a keyword in cluster 3, as it fanned the wildfire in the Carlsbad area; a topic relevant to the containment percentage and impacted acres of the Carlsbad wildfire was disclosed by cluster 4; cluster 5 represented the topic associated with the evacuation caused by a burning wildfire in 4S Ranch; cluster 6 was a topic on damage reports, as it revealed the number of acres burned and the wildfire containment percentage; and the last cluster was related to the evacuation of homes in Bernardo.

These clusters revealed the main topics in the wildfire-related conversations on Twitter. People tweeted about wildfires together with the places where they occurred, as seen from clusters 2, 3, 4, 5 and 7. This reflects Twitter users’ geographical awareness during wildfire events. People also communicated situational updates related to wildfire damage on Twitter, as seen from clusters 2, 4, and 6. As shown by clusters 5 and 7, evacuations caused by wildfires were mentioned in tweets, indicating that users were concerned about how to respond to wildfires. Unlike the other clusters, cluster 1 showed people’s appreciation for firefighters.

Table 6. Term clusters in wildfire tweets

| Number | Term clusters |
|---|---|
| Cluster 1 | know; thank; firefight |
| Cluster 2 | home; Carlsbad; burn |
| Cluster 3 | wind; Carlsbad; area |
| Cluster 4 | Carlsbad; contain; acre |
| Cluster 5 | burn; evacuate; 4S Ranch |
| Cluster 6 | acre; burn; contain |
| Cluster 7 | evacuate; home; Bernardo |

Social network analysis was implemented based on the retweet relationship. In the retweet network, if user A retweets a message posted by user B, an edge runs from a node representing user B to another node representing user A, indicating that information has diffused from B to A.

After building the network based on the retweet relationship, we calculated the indegree and outdegree for each node. The indegree of node A is the number of times that user A has been retweeted by other users. The outdegree of node A is the total number of times that user A has retweeted other users. Figure 12 shows the cumulative indegree distribution of the retweet network, from which we observe that more than 85% of nodes had no users retweet their messages. Furthermore, according to Figure 13, upwards of 90% of users retweeted only one user or none. The indegree and outdegree results revealed a polarized structure in the retweet network. That is, a small number of dominant users acted as hubs in the information exchange network during wildfire hazards. The major part of the retweet network was visualized to show its polarized structure.
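The degree calculation described above can be illustrated with a minimal sketch. It assumes that the retweet data are available as (retweeting user, retweeted user) pairs and that every retweet event is counted once; the function names and the toy accounts are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def degree_counts(retweets):
    """Count degrees from (retweeter, retweeted_user) pairs.

    Indegree  = number of times a user was retweeted by others.
    Outdegree = number of times a user retweeted others.
    """
    indegree = Counter()
    outdegree = Counter()
    users = set()
    for retweeter, retweeted_user in retweets:
        users.update((retweeter, retweeted_user))
        indegree[retweeted_user] += 1
        outdegree[retweeter] += 1
    return users, indegree, outdegree


def cumulative_share(users, degree, threshold):
    """Percentage of users whose degree is less than or equal to threshold."""
    if not users:
        return 0.0
    hits = sum(1 for user in users if degree[user] <= threshold)
    return 100.0 * hits / len(users)


# Hypothetical toy data: two hub accounts retweeted by four ordinary users.
retweets = [
    ("u1", "news"), ("u2", "news"), ("u3", "news"),
    ("u3", "gov"), ("u4", "gov"),
]
users, indeg, outdeg = degree_counts(retweets)
share_never_retweeted = cumulative_share(users, indeg, 0)
share_retweeting_at_most_once = cumulative_share(users, outdeg, 1)
```

Evaluating `cumulative_share` over a range of thresholds gives the kind of cumulative distribution curve plotted for the indegree and outdegree of the retweet network.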

[Plot: cumulative percentage of users (y-axis, 0 to 100) against indegree (x-axis, 0 to 350)]

Figure 12. Indegree cumulative distribution of the retweet network


[Plot: cumulative percentage of users (y-axis, 0 to 100) against outdegree (x-axis, 0 to 6)]

Figure 13. Outdegree cumulative distribution of the retweet network


As can be seen from Figure 14, node size is proportional to the number of times a user was retweeted by others. The nodes @10news, @KPBSnews, and @nbcsandiego are Twitter accounts owned by three local news media outlets in San Diego. Some local government accounts were also retweeted by numerous users. For example, @SanDiegoCounty is an official Twitter account of the County of San Diego, @NWSSanDiego belongs to the National Weather Service office in San Diego, and @ReadySanDiego to the Office of Emergency Services in San Diego. This suggests that people were inclined to acquire reliable information from either government or local news media during wildfire hazards. This has implications for emergency management, since social media could serve as a useful information propagation tool for emergency responders to improve the public’s situational awareness.

Figure 14. The major part of the retweet network


3.5 Conclusion

Social media data are increasingly used for enhancing situational awareness and assisting disaster management. We analyzed wildfire-related Twitter activities in terms of their inherent attributes (i.e., space, time, content, and network) to gain insights into the usefulness of social media data in revealing situational awareness.

First, we investigated the spatial and temporal patterns of wildfire-related tweets. Our analysis confirmed a temporally concurrent evolution of wildfires and wildfire-related Twitter activities. A spatial coupling between wildfire disruptions and related Twitter activities was also observed. As such, social media data can characterize a disaster across space and over time, and can thus provide knowledge about disaster situations.

Second, people’s conversations on social media varied highly in terms of their subjects. Mining topics can reduce data chaos and extract useful information to enhance situational awareness and accelerate disaster response. We find that people’s geographical awareness was strong during emergency events, and people were also interested in communicating disaster-related news, disaster response information, and potential impacts or damages.

Third, opinion leaders played an important role in the wildfire retweet network. We find that some elite users, such as local authorities and traditional media reporters, were dominant in the retweet network, which is consistent with the findings of previous studies. This polarized structure of the retweet network has both advantages and disadvantages. On the one hand, situational announcements from authoritative sources are accurate and objective. On the other hand, eyewitness reports might not be able to attract sufficient attention.


There are drawbacks in our research that should be addressed in future work on this topic. First, although the search range covered the majority of San Diego County, some places where wildfires occurred (e.g., Carlsbad) were not included. Second, the 1% sample limitation raises the question of whether the sampled data are a valid representation of the overall wildfire Twitter activities. Third, the social network in our research is based only on the retweet relationship, while other types of network, such as a follower network based on “who follows whom”, could be used in future studies. Fourth, the social network analysis centered on the investigation of opinion leaders in wildfire situations and thus overlooked the information diffusion process, including its components, phases, and characteristics. Finally, the four dimensions (i.e., space, time, content, and network) were analyzed separately; analyzing them simultaneously might provide new insights into disaster management.


CHAPTER 4

SPACE, TIME, AND SITUATIONAL AWARENESS IN NATURAL HAZARDS: A CASE STUDY OF HURRICANE SANDY WITH SOCIAL MEDIA DATA³

4.1 Introduction

Social media outlets, such as Twitter, Instagram, and Facebook, have evolved beyond platforms for sharing people’s personal lives toward data sources for leveraging the public’s collective intelligence to deal with emergency events (Wang et al. 2015a, Wang, Ye, and Tsou 2016, Yates and Paquette 2011). In particular, human-centric information related to people’s perceptions, responses, and behaviors in a natural disaster context can be extracted from social media and analyzed to assist natural disaster management (Wang and Ye 2018). The literature on the applications of social media data in natural disaster management has identified several major directions: (1) event detection: social media has proven to be efficient in detecting disaster outbreaks and disseminating notifications to the public (Sakaki et al. 2010); (2) rapid assessment of disaster damage: there is a positive relationship between disaster damage and disaster-related social media activities (Guan and Chen 2014, Kryvasheyeu et al. 2016); (3) situational awareness: social media could enable disaster managers to know what is going on in disaster situations (Vieweg et al. 2010). As pointed out by Vieweg et al. (2010, p. 1), situational awareness “describes the idealized state of understanding what is happening in an event with many actors and other moving parts, especially with respect to the needs of command and control operations”. Disaster managers need actionable information associated with disaster situations in order to make sense of the disaster and to facilitate decision-making, policy formulation, and response implementation. The public need authoritative instructions and situational updates as well as a good knowledge of how other people prepare for, respond to, and recover from natural disasters. Therefore, situational awareness is of particular importance for natural disaster management.

³ This chapter is based on Zheye Wang & Xinyue Ye (2018). Space, time, and situational awareness in natural hazards: a case study of Hurricane Sandy with social media data, Cartography and Geographic Information Science, DOI: 10.1080/15230406.2018.1483740.

Situational awareness needs to be geographically grounded (MacEachren et al. 2011, Shook and Turner 2016). Decision-making, policy formulation, and response implementation in natural disaster situations require disaster managers to go beyond knowing what is happening by also being informed of where something is happening, i.e., geographic situational awareness (Huang and Xiao 2015; MacEachren et al. 2011, De Albuquerque et al. 2015). Existing studies have utilized spatial analytical methods such as K-means clustering and kernel density estimation (KDE) to generate spatially relevant information for gaining situational awareness and improving disaster response (Wang, Ye, and Tsou 2016). However, one limitation of these studies is that they focus on analyzing the geospatial information in general social media activities related to a disaster while overlooking more detailed social responses to it (Guan and Chen 2014, Wang, Ye, and Tsou 2016). In fact, these general disaster-relevant messages often contain content reflecting distinct social responses, such as damage reports, situational announcements, and help requests, that can be extracted with an in-depth analysis (e.g., a topic classification of social media data) (Huang and Xiao 2015). Combining analysis of geospatial information and content in social media data is gaining more attention and has been increasingly practiced in empirical studies (Huang and Xiao 2015, Kryvasheyeu et al. 2016, Hultquist et al. 2015, Resch et al. 2017). The goal of this chapter is to advance methods for integrating geospatial and content information in social media by capturing the spatial concentration and specialization of social responses to Hurricane Sandy on Twitter. More specifically, location quotient (LQ) is introduced to detect area-specific topics, where an area-specific topic is defined as a topic that has a higher concentration than other topics in a specific area as compared to the entire region. LQ can provide additional insights into this field by moving beyond the widely adopted spatial point pattern analysis and simple mapping in the current literature, yet its application to natural disaster studies and social media data has not been fully explored.

Social responses to a natural disaster can change over time. Existing studies have shown a temporally concurrent evolution between disasters and corresponding social media activities (Blandford et al. 2014, Qu et al. 2011). That is, disaster-related social media activities increase until the disaster unfolds and then decrease. However, these studies also focus on the general disaster-relevant messages instead of analyzing how social responses change over time. Moreover, they rarely synthesize the geospatial information, time stamp, and message content in social media data to explore the space-time dynamics of social responses (Wang and Ye 2018). Hence, the other goal of this chapter is to integrate the Markov transition probability matrix with the location quotient (LQ) to investigate how the spatial concentration of topics changes before, during, and after a disaster. The Markov transition probability matrix is introduced here to bring a temporal perspective into geographic situational awareness and enable disaster managers to gain better knowledge of the geographic process of social responses.

In summary, this chapter aims to bring situational awareness into a space-time context and thereby expand it to a dynamic understanding of what is happening across geographic space in natural disasters. The remainder of this chapter is organized as follows. Section 4.2 presents the data and methodology. Section 4.3 discusses the findings and their implications. Section 4.4 concludes the chapter.

4.2 Data and methodology

4.2.1 Hurricane Sandy tweets in New York City

As one of the most destructive cyclones in the United States since 1900 (Blake et al. 2013), Hurricane Sandy was used as a case study in this chapter. Making landfall on October 29, 2012 in New Jersey, Hurricane Sandy caused a vast amount of damage to the northeastern states. It was estimated that the total loss caused by Hurricane Sandy reached $50 billion and that 72 deaths were associated with the storm in the mid-Atlantic and northeastern United States (Blake et al. 2013).

Hurricane Sandy formed on October 22, 2012 and dissipated on November 2, 2012. All geotagged tweets posted during this period were collected by Wang et al. (2015) using the Twitter Firehose API. A set of bounding boxes was also specified by Wang et al. (2015) to ensure that all geotagged tweets came from the affected areas, including Washington DC, Connecticut, Delaware, Massachusetts, Maryland, New Jersey, New York, North Carolina, Ohio, Pennsylvania, Rhode Island, South Carolina, Virginia, and West Virginia. In an attempt to open this dataset to the public, Wang et al. (2015) made the IDs of all the collected tweets available on GitHub. Given the tweet IDs, a Python program was executed to retrieve all the Sandy-related tweets via the Twitter Search API. In total, 83,006 geotagged Sandy tweets were retrieved. Figure 15 shows the spatial distribution of Sandy-related tweets with a kernel density map. As seen in Figure 15, New York City, one of the most severely impacted areas, has the highest concentration of Twitter activity related to Hurricane Sandy. Therefore, the study area is restricted to New York City, where 20,427 geotagged Sandy tweets were posted. It is interesting to note that some Twitter posts highlight the usefulness of social media in natural disaster situations. Some examples are shown as follows:

1. NYGovCuomo Twitter feed is a must-follow. They are giving great advice on getting through the specifics of #Sandy

2. Seriously, we are so blessed to have technology like social media available during emergencies like these. #sandy #frankenstorm

3. What's amazing is how Twitter and Facebook are more current and up to date with events on #Sandy then the actual news on tv...#media #fail

4. So glad we have #twitter and #FaceBook to keep #InContact with #FriendsAndFamily right now! #BeSafe #HurricaneSandy

5. Twitter is our only way to find out what's happening in the rest of the city. Until our phones die #Sandy

Figure 15. Kernel density map of Sandy tweets (arrow and scale are in the lower left corner)

4.2.2 Cleaning and classifying Hurricane Sandy tweets

The collected geotagged tweets were too general to capture detailed social responses to Hurricane Sandy, thus requiring a topic classification based on the textual content of the data. The authors of this chapter first worked together to manually identify and remove non-informative tweets, reducing the dataset to 8,972 geotagged Sandy tweets. That is, 43.9% of the geotagged Sandy tweets from NYC were annotated as informative, which is lower than the ratio of 60% in Imran et al. (2013). Following Imran et al. (2013), non-informative/off-topic tweets are those that are mainly about personal sentiment and do not contain situational awareness information (Imran et al. 2014). Then, the remaining 8,972 informative tweets were classified into six categories/topics: Caution and Advice, Affected People, Infrastructure/Utilities, Needs and Donations, Weather and Environment, and Other. This classification schema is a slight modification of that of Imran et al. (2013), adding a new topic: Weather and Environment. This modification is motivated by the observation that many eyewitness reports were related to weather conditions and the physical environment. Although there exist many other types of classification schemas (see Imran et al., 2015 for a survey), we argue that the one proposed by Imran et al. (2013) is more useful in revealing information related to situational awareness and is better suited for our case study (i.e., Hurricane Sandy). Note that the authors conducted the classification independently and discussed cases of disagreement in order to reach consensus on the classification of every case. A similar classification procedure has been practiced by De Albuquerque et al. (2015). Nonetheless, we acknowledge that the coding process was not without uncertainty and can be improved in future investigation. The classification schema is shown in Table 7, where the left column lists the topics and the right column presents descriptions for each topic along with some exemplary tweets. After the classification, we further processed these Twitter messages by converting words to lower case, deleting stop words, punctuation, URLs, and numbers, and stemming words.
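The text-cleaning steps listed in Section 4.2.2 (lower-casing, removal of URLs, numbers, punctuation, and stop words, followed by stemming) can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a simplistic suffix stripper standing in for whatever stemmer was actually used, which the text does not name.

```python
import re
import string

# Assumption: a very small illustrative stop-word list, not the one used in the study.
STOP_WORDS = {"a", "an", "and", "are", "be", "down", "in", "is", "of", "on", "the", "to", "up"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_tweet(text):
    """Return the list of cleaned, stemmed tokens for one tweet."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"\d+", " ", text)           # remove numbers
    text = text.translate(PUNCT_TO_SPACE)      # remove punctuation
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return [crude_stem(tok) for tok in tokens]


example = clean_tweet(
    "Subways are shut down! #hurricanesandy @Sunset Park, Brooklyn "
    "http://instagr.am/p/RXrwc3p5Rw/"
)
# example == ['subway', 'shut', 'hurricanesandy', 'sunset', 'park', 'brooklyn']
```

URLs are removed before punctuation so that a link is dropped as a whole rather than split into fragments.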

4.2.3 Location quotient: detecting area-specific topic

Location quotient (LQ) has traditionally been used in regional economics and economic geography to measure industrial or employment specialization. However, as a useful spatial analytical method, LQ can be exploited to detect the spatial concentration of geographic phenomena or events of interest to other domains. For example, Andresen (2007) adopts LQ to measure the specialization of three types of criminal activities (automotive theft, break and enter, and violent crimes) in Vancouver. In the present research, LQ is introduced to measure the spatial concentration of disaster-related conversational topics on social media. Location quotient (LQ) is the ratio of the percentage of a particular topic in a census tract of New York City to the percentage of that same topic in New York City as a whole. Its formula is shown as follows:

$$LQ_i^k = \frac{X_i^k}{\sum_{i=1}^{n} X_i^k} \Big/ \frac{Y_i^k}{\sum_{i=1}^{n} Y_i^k}$$

where $X_i^k$ is the number of tweets under topic $k$ in census tract $i$, $Y_i^k$ is the amount of all Sandy tweets in census tract $i$, and $n$ is the count of census tracts in New York City. If the value of $LQ_i^k$ is greater than one, it represents that census tract $i$ has a higher concentration of tweets under topic $k$ than the city average; and, the larger the value of $LQ_i^k$, the more concentrated tweets under topic $k$ are in census tract $i$. Additionally, for a given census tract, the topic with the highest LQ value larger than one is the most concentrated one. Therefore, we select the topic with the highest LQ value as the area-specific topic.
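As an illustration of the location quotient described above, the following minimal sketch computes LQ values from per-tract topic counts and picks the area-specific topic of each tract. The input layout, the function names, and the toy counts are assumptions made for this example; topic labels A and C are placeholders.

```python
def location_quotients(topic_counts):
    """Compute LQ for every tract and topic.

    topic_counts maps tract -> {topic: number of tweets}.
    LQ = (tract's share of the topic's tweets) / (tract's share of all tweets).
    """
    tract_totals = {tract: sum(c.values()) for tract, c in topic_counts.items()}
    city_total = sum(tract_totals.values())
    topic_totals = {}
    for counts in topic_counts.values():
        for topic, n in counts.items():
            topic_totals[topic] = topic_totals.get(topic, 0) + n

    lq = {}
    for tract, counts in topic_counts.items():
        lq[tract] = {}
        if tract_totals[tract] == 0:
            continue  # a tract without tweets has no defined LQ
        share_of_all = tract_totals[tract] / city_total
        for topic, topic_total in topic_totals.items():
            if topic_total == 0:
                continue
            share_of_topic = counts.get(topic, 0) / topic_total
            lq[tract][topic] = share_of_topic / share_of_all
    return lq


def area_specific_topic(lq_for_tract):
    """Topic with the highest LQ, provided that value exceeds one; else None."""
    if not lq_for_tract:
        return None
    topic, value = max(lq_for_tract.items(), key=lambda item: item[1])
    return topic if value > 1 else None


# Hypothetical toy counts for two tracts and two topics.
toy_counts = {"T1": {"A": 8, "C": 2}, "T2": {"A": 2, "C": 8}}
toy_lq = location_quotients(toy_counts)
toy_topics = {tract: area_specific_topic(values) for tract, values in toy_lq.items()}
```

In the toy data, topic A accounts for 80% of the tweets in tract T1 but only 50% city-wide, so its LQ in T1 is 1.6 and A becomes the area-specific topic of T1. Ties between topics are broken arbitrarily in this sketch.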

4.2.4 Markov transition probability matrix: measuring temporal transition of area-specific topic

Markov transition probability matrix is introduced to measure how area-specific topics change before, during and after Hurricane Sandy. Markov transition probability matrix is calculated using the equation below:

$$P_{kl} = \frac{n_{kl}}{n_k}$$

where $P_{kl}$ is the transition probability of area-specific topic $k$ in stage $t$ to topic $l$ in stage $t+1$, $n_{kl}$ is the count of census tracts moving from area-specific topic $k$ in stage $t$ to topic $l$ in stage $t+1$, and $n_k$ is the number of census tracts with area-specific topic $k$ in stage $t$.
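The transition probabilities defined above can be computed with a short sketch such as the one below. It assumes that the area-specific topic of each census tract is available for two consecutive stages as dictionaries keyed by tract, and that tracts lacking a topic in the later stage are skipped; both the data layout and that handling of missing tracts are assumptions of this example, and the tract and topic labels are hypothetical.

```python
from collections import Counter


def transition_probabilities(topics_t, topics_t1):
    """Return {(k, l): P_kl} from tract -> topic mappings for stage t and t+1."""
    pair_counts = Counter()    # n_kl: tracts moving from topic k to topic l
    origin_counts = Counter()  # n_k: tracts with topic k in stage t
    for tract, k in topics_t.items():
        if tract not in topics_t1:
            continue  # assumption: ignore tracts without a topic in stage t+1
        pair_counts[(k, topics_t1[tract])] += 1
        origin_counts[k] += 1
    return {
        (k, l): n_kl / origin_counts[k]
        for (k, l), n_kl in pair_counts.items()
    }


# Hypothetical toy example with four tracts and topics labeled A and C.
before = {"T1": "A", "T2": "A", "T3": "A", "T4": "C"}
during = {"T1": "C", "T2": "C", "T3": "A", "T4": "C"}
p = transition_probabilities(before, during)
```

Each row of the resulting matrix (all entries sharing the same origin topic k) sums to one, which is a quick sanity check when applying the calculation to real data.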


Table 7. The classification schema of Hurricane Sandy tweets

| Category | Description and examples |
|---|---|
| Caution and Advice | Tweets referring to warnings, preparation, advice, and tips (Imran et al., 2013; Imran et al., 2015). Examples: (1) ‘@NYCMayorsOffice: Mayor: The peak surge will hit areas along Long Island Sound between 10pm and 2am Tuesday. #Sandy” Everyone be safe.’ (2) ‘Stocking up on candles. #hurricanesandy #Hurricane #hurricanesandy2012’ |
| Affected People | Tweets referring to people trapped, injured, missing, and killed (Imran et al., 2013; Imran et al., 2015). Examples: (1) ‘Due to Hurricane Sandy, I am trapped in New York until Wednesday with @Have_aNiceDavis #bummer’ (2) ‘@CNN: There have been 5 confirmed deaths due to superstorm Sandy in New York’ |
| Infrastructure/Utilities | Tweets referring to infrastructure damage, services closure, built environment, and collapsed structure (Imran et al., 2013; Imran et al., 2015). Examples: (1) ‘@lheron: Massive power outage has turned NYC into city of migrants, from downtown to uptown: http://on.wsj.com/Uhicz8 #Sandy” #mylife’ (2) ‘Subways are shut down! #hurricanesandy @Sunset Park, Brooklyn http://instagr.am/p/RXrwc3p5Rw/’ (3) ‘Crane on 57 street collapsed. So dangerous! #NYC #Hurricane #Sandy’ (4) ‘Kiehl's and Starbucks are closed. #UpperWestSideProblems #sandy’ |
| Needs and Donations | Tweets referring to donations, volunteering, relief, and fundraising (Imran et al., 2013; Imran et al., 2015). Examples: (1) ‘Donating blood w @wandadetroit - our sexiest #ladydate ever! #sandy’ (2) ‘RT @PE_Feeds: #SandyVolunteer / UPDATE: NYC volunteer opportunities for #Sandy cleanup efforts’ |
| Weather and Environment | Tweets referring to weather conditions and environment. Examples: (1) ‘The pressure I'm measuring here at @thepodhotel has now dropped through the 97kPa. Winds picking up. #hurricanesandy’ (2) ‘These winds are pretty strong #Sandy’ |
| Other | Tweets not referring to any of the previous categories (Imran et al., 2013; Imran et al., 2015). Examples: (1) ‘Reports of looting follow Hurricane #Sandy. 15 people have been charged in Queens via @NYMag’ (2) ‘@billmaher Should/can Election day be postponed for the benefit of Americans still struggling with the aftermath of Sandy?’ |


4.3 Results

4.3.1 Data description

We first report the classification results of the total 8,972 Sandy tweets. As shown in Table 8, half of the total Sandy tweets (50.4%) were about Infrastructure/Utilities, whilst the second most communicated topic was Caution and Advice (21.2%). Based on posting time, we split the tweets into three groups: Before, During, and After. The Before group contains Sandy tweets posted before 10/28/2012, the During group has Sandy tweets posted on 10/29/2012 and 10/30/2012, and the After group includes Sandy tweets posted after 10/31/2012. Our separation is different from Kogan et al. (2015), where 10/27/2012-10/31/2012 was defined as the During stage of Hurricane Sandy. Their study region covers multiple eastern states while our work focuses on a single city, which, to some extent, explains their adoption of a longer time slice for the During stage. Meanwhile, after consulting with natural disaster experts, we based our separation mainly upon the time of Sandy’s landfall (late 10/29/2012), when it started to intensively affect New York City. We acknowledge that there might be other ways to separate the data, but none is exempt from uncertainty. The Before, During, and After groups have 1,664, 5,504, and 1,804 Sandy tweets, respectively. For each group, the distribution of conversational topics was obtained (Table 8). Before Hurricane Sandy hit New York City, the majority of Sandy tweets (63%) were about Caution & Advice, while this percentage dropped to 15.1% during the disaster and became much smaller in the After group (0.9%). In contrast, the percentage of Sandy tweets classified as Infrastructure/Utilities increased across the Before, During, and After groups, from 22.1% to 51.2% and 74.1%, respectively. Moreover, the share of the Weather & Environment topic started at 6.3% before Sandy, peaked at 22.9% in the During group, and then declined to 1.7% after the disaster. It can also be observed from Table 8 that the share of the Affected People topic changed little over the course of Hurricane Sandy.


Table 8. Classification results of Sandy tweets

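| Topics | Total (number) | Total (percent) | Before (number) | Before (percent) | During (number) | During (percent) | After (number) | After (percent) |
|---|---|---|---|---|---|---|---|---|
| Caution & advice (A) | 1898 | 21.2% | 1048 | 63% | 833 | 15.1% | 17 | 0.9% |
| Affected people (B) | 582 | 6.5% | 117 | 7% | 386 | 7% | 79 | 4.4% |
| Infrastructure/utilities (C) | 4523 | 50.4% | 368 | 22.1% | 2819 | 51.2% | 1336 | 74.1% |
| Needs & donations (D) | 414 | 4.6% | 8 | 0.5% | 129 | 2.3% | 277 | 15.4% |
| Weather & environment (E) | 1395 | 15.5% | 105 | 6.3% | 1259 | 22.9% | 31 | 1.7% |
| Other (F) | 160 | 1.8% | 18 | 1.1% | 78 | 1.4% | 64 | 3.5% |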
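The percentages in Table 8 follow directly from the tweet counts. As a small worked check, the sketch below recomputes the per-stage totals and topic shares from the counts reported in Table 8; the variable and function names are illustrative only.

```python
# Tweet counts per topic and stage, copied from Table 8.
COUNTS = {
    "Caution & advice": {"Before": 1048, "During": 833, "After": 17},
    "Affected people": {"Before": 117, "During": 386, "After": 79},
    "Infrastructure/utilities": {"Before": 368, "During": 2819, "After": 1336},
    "Needs & donations": {"Before": 8, "During": 129, "After": 277},
    "Weather & environment": {"Before": 105, "During": 1259, "After": 31},
    "Other": {"Before": 18, "During": 78, "After": 64},
}


def stage_totals(counts):
    """Total number of tweets in each stage, summed over all topics."""
    totals = {}
    for by_stage in counts.values():
        for stage, n in by_stage.items():
            totals[stage] = totals.get(stage, 0) + n
    return totals


def topic_shares(counts):
    """Percentage of each stage's tweets that fall under each topic."""
    totals = stage_totals(counts)
    return {
        topic: {stage: 100.0 * n / totals[stage] for stage, n in by_stage.items()}
        for topic, by_stage in counts.items()
    }


totals = stage_totals(COUNTS)
shares = topic_shares(COUNTS)
```

For instance, 1048 of the 1664 Before tweets fall under Caution & advice, which rounds to the 63% reported in the table.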


4.3.2 Top frequent terms

To reveal more details about social responses to the disaster, we summarized the most frequent terms for each topic in the Before, During, and After groups. Due to space limitations, we report only the top 15 terms for each topic. Table 9 shows the most frequent terms under each topic in the total Sandy tweets. Regarding the Caution and Advice topic, there were tweets to raise people’s awareness of the disaster and warn them to stay safe, messages about preparation efforts like stocking up on water, food, and other supplies, as well as eyewitness reports such as lines outside grocery stores. The frequent terms of the Affected People topic demonstrate that Hurricane Sandy impacted people in terms of human mobility (‘stuck’, ‘trap’), emotional status (‘bore’ and ‘scare’), and casualty (‘kill’ and ‘die’). As shown in the Infrastructure/Utilities column, infrastructure (‘tree’, ‘subway’), utilities (‘power’, ‘water’, ‘light’, ‘dark’), and services (‘close’, ‘open’) were widely discussed on Twitter during Hurricane Sandy. The Needs & Donations column in Table 9 shows that people posted tweets calling for help, volunteers, and donations, expressing their needs, and highlighting relief efforts by the Red Cross and NYPD. Under the topic of Weather & Environment, people communicated meteorologic (‘wind’, ‘rain’), hydrologic (‘Hudson’, ‘water’, ‘flood’), and cognitive (‘calm’, ‘empty’) information on Twitter during Hurricane Sandy.
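A term-frequency ranking of this kind can be produced with a few lines of Python. The sketch below assumes that each tweet has already been cleaned and tokenized, and that every occurrence of a term is counted; the text does not state whether occurrences or tweets were counted, and the toy tokens are hypothetical.

```python
from collections import Counter


def top_terms(token_lists, n=15):
    """Return the n most frequent terms across a collection of tokenized tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)


# Hypothetical tokenized tweets for one topic.
toy_tweets = [
    ["power", "tree"],
    ["power", "dark"],
    ["power", "tree", "subway"],
]
toy_top = top_terms(toy_tweets, n=2)
# toy_top == [('power', 3), ('tree', 2)]
```

Running the function separately on the tweets of each topic and each stage would yield one ranked list per table column.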


Table 9. Top frequent terms under five topics of the total Sandy tweets

| Caution & Advice | Affected People | Infrastructure/Utilities | Needs & Donations | Weather & Environment |
|---|---|---|---|---|
| Prepare (236) | Stuck (73) | Power (877) | Help (102) | Wind (348) |
| Safe (175) | Scare (48) | Tree (376) | Need (91) | Water (131) |
| Stay (129) | Die (39) | Close (314) | Victim (45) | Rain (120) |
| Evacuate (120) | Bore (38) | Manhattan (308) | People (44) | Park (113) |
| Stock (117) | Day (38) | Park (282) | Donate (42) | Flood (110) |
| Come (98) | Time (34) | Street (269) | Volunteer (37) | Street (109) |
| Food (95) | Home (31) | Light (256) | Redcross (34) | Calm (87) |
| Water (87) | Thank (29) | Brooklyn (207) | Come (34) | East (84) |
| Zone (77) | People (27) | Thank (200) | Text (31) | Manhattan (78) |
| Line (77) | Kill (25) | Open (200) | Charge (31) | Brooklyn (67) |
| Everyone (65) | Make (21) | Dark (187) | Relief (28) | Come (55) |
| People (65) | Hotel (20) | Subway (182) | Food (28) | Start (52) |
| Supply (62) | Trap (18) | Day (171) | Can (26) | Empty (52) |
| Hit (60) | Work (18) | Work (168) | Work (25) | Window (49) |
| Store (56) | Come (18) | Water (165) | NYPD (25) | Hudson (45) |


Table 8 shows that only eight tweets were classified into the Needs & Donations topic in the Before group. Therefore, we focus on analyzing the term frequency for the other four topics (Table 10). The distribution of frequent terms under the Caution & Advice topic in the Before group is similar to that in the total Sandy tweets, as people tweeted most about their preparedness for Hurricane Sandy, e.g., stocking up on supplies (‘food’, ‘water’). It is interesting to note from Table 10 that wine was also on the list of people’s hurricane provisions. A comparison of Table 9 and Table 10 shows that the Affected People topic has some consistency in terms of the word frequency distribution between the total Sandy tweets and the Before group. In both cases, people tweeted about human mobility (‘stuck’), emotional status (‘bore’, ‘scare’), and casualty (‘kill’). This comparison also shows that, since power issues and fallen trees had not taken place before the disaster, the Infrastructure/Utilities topic in the Before group was more about closures (‘close’, ‘shut’, ‘school’, ‘subway’, ‘train’, and ‘MTA’) and cancellations (‘class’, ‘flight’), which were announced to the public ahead of time. As already noted, ‘Manhattan’ and ‘Brooklyn’ were frequently tweeted when people communicated Weather & Environment as well as Infrastructure/Utilities over the entire course of Hurricane Sandy (Table 9). This is likely because people tend to report geolocations when tweeting eyewitness reports, which are often associated with Infrastructure/Utilities as well as Weather & Environment. Notably, ‘Manhattan’ and ‘Brooklyn’ do not appear in the term frequency list of the Infrastructure/Utilities topic in the Before group, since these tweets are mainly ahead-of-time notifications from authorities rather than eyewitness reports. Meteorologic (‘wind’, ‘rain’, ‘sky’), hydrologic (‘water’, ‘river’), and cognitive (‘calm’) reports are also covered by the Weather and Environment topic in the Before group, which is similar to the total Sandy tweets.


Table 10. Top frequent terms under four topics of the before Sandy tweets

| Caution & Advice | Affected People | Infrastructure/Utilities | Weather & Environment |
|---|---|---|---|
| Prepare (179) | Scare (12) | Close (86) | Calm (36) |
| Stock (94) | Stuck (11) | School (62) | Wind (33) |
| Get (91) | Day (11) | Cancel (51) | Rain (14) |
| Ready (85) | Home (9) | Tomorrow (50) | Start (7) |
| Food (74) | Get (9) | Subway (41) | Park (7) |
| Evacuate (73) | Come (9) | Monday (35) | Come (7) |
| Line (69) | Time (8) | Work (32) | Weather (6) |
| Water (60) | Make (7) | Thank (31) | Water (6) |
| Come (55) | Freak (7) | Tonight (27) | Pick (6) |
| Safe (49) | Thank (6) | Shut (27) | Brooklyn (6) |
| Zone (47) | Everyone (6) | MTA (27) | River (5) |
| Store (45) | Today (5) | Flight (27) | Look (5) |
| Grocery (40) | Plan (5) | Service (26) | Dark (5) |
| Wine (38) | Like (5) | Class (26) | Sky (4) |
| Supply (38) | Kill (4) | Train (25) | Make (4) |


In spite of some minor differences, the total Sandy tweets and the During group (Table 11) have an overall similarity in terms of the word frequency distribution. However, there exist non-negligible distinctions between the Before group and the During group. First, ‘power’ and ‘tree’, which were rarely tweeted before Sandy, become the most frequent terms under the topic of Infrastructure/Utilities in the During group. Also during the disaster, people communicated their concern about the dangling construction ‘crane’ or posted eyewitness reports of its collapse on Twitter. Second, as compared to the Before group, more Needs & Donations tweets calling for and reporting relief efforts emerge in the During group.

Table 11. Top frequent terms under five topics of the during Sandy tweets

| Caution & Advice | Affected People | Infrastructure/Utilities | Needs & Donations | Weather & Environment |
|---|---|---|---|---|
| Safe (124) | Stuck (55) | Power (681) | Help (31) | Wind (315) |
| Stay (98) | Die (34) | Tree (314) | NYPD (20) | Water (125) |
| Ready (68) | Bore (34) | Close (193) | Need (17) | Flood (110) |
| Get (56) | Scare (32) | Street (186) | Work (15) | River (108) |
| Prepare (56) | Get (30) | Light (186) | Thank (15) | Street (104) |
| Evacuate (46) | Time (19) | Park (185) | FDNY (12) | Rain (104) |
| Come (43) | Thank (18) | Manhattan (167) | Volunteer (11) | Park (100) |
| Everyone (40) | Day (18) | Brooklyn (138) | Park (11) | Get (83) |
| Emergency (33) | Kill (17) | Open (134) | Brooklyn (10) | East (78) |
| Hunker (31) | Hotel (17) | Flood (128) | Respond (9) | Manhattan (75) |
| Zone (30) | Home (17) | Water (112) | Can (9) | Like (60) |
| People (30) | People (15) | Thank (109) | Street (8) | Brooklyn (54) |
| Hit (30) | Make (13) | Build (109) | First (8) | Look (52) |
| Home (27) | Family (13) | Crane (99) | People (6) | Empty (52) |
| Time (26) | Wait (12) | Get (98) | Bridge (6) | Calm (51) |

As New York City entered the recovery stage, the number of Caution & Advice tweets in the After group becomes very small (Table 8) and the word frequency of this topic is evenly distributed. Therefore, we focus on the term frequency distribution for the other four topics (Table 12). Notable changes can be detected for the Affected People, Infrastructure/Utilities, and Weather & Environment topics. Deaths in Staten Island were highlighted under the Affected People topic in the After group. Gas problems were added to the power issues in the aftermath of Hurricane Sandy, as shown in the list of frequent terms of the Infrastructure/Utilities topic (Table 12). Additionally, the weather and environmental conditions improved (‘blue’, ‘sky’, ‘sun’, ‘sunshine’, ‘sunrise’) after the disaster.

Table 12. Top frequent terms for four topics of the after Sandy tweets

| Affected People | Infrastructure/Utilities | Needs & Donations | Weather & Environment |
|---|---|---|---|
| People (9) | Power (185) | Help (71) | Sky (8) |
| Day (9) | Manhattan (133) | Need (70) | Brooklyn (7) |
| Time (7) | Dark (100) | Victim (42) | Blue (7) |
| Stuck (7) | Park (90) | People (38) | Park (6) |
| Staten (6) | Day (87) | Donate (38) | Sun (5) |
| Island (6) | Street (79) | Come (32) | Cloud (4) |
| Death (6) | Blackout (79) | Redcross (31) | Sunshine (3) |
| Thank (5) | Gas (78) | Charge (31) | Light (3) |
| Sad (5) | Back (76) | Text (30) | Halloween (3) |
| Night (5) | Get (74) | Volunteer (26) | Final (3) |
| Like (5) | Subway (70) | Food (26) | East (3) |
| Home (5) | Line (70) | Marathon (25) | Aftermath (3) |
| Halloween (5) | Light (68) | Relief (24) | Weather (2) |
| Kill (4) | Brooklyn (68) | Thx (23) | Sunrise (2) |
| Trap (4) | Traffic (66) | Resource (23) | Skyline (2) |

4.3.3 Spatial visualization of area-specific topic

We map the area-specific topics at the census tract scale for the total Sandy tweets, the Before group, the During group, and the After group in Figures 15, 16, 17, and 18, respectively. It is difficult to summarize the spatial characteristics of area-specific topics for the total Sandy tweets from Figure 15. However, the spatial pattern becomes clear for the Before group (Figure 16), the During group (Figure 17), and the After group (Figure 18). The Caution & Advice topic plays a dominant role in the geographic distribution of area-specific topics in the Before group, as the majority of census tracts in New York City are colored red. This could be explained by the fact that people focused on tweeting cautions and warnings as well as their preparations for the upcoming disaster. A comparison of Figure 16 and Figure 17 reveals that the prevalent topic shifted to Infrastructure/Utilities during the disaster, as eyewitness reports regarding hurricane impacts such as power outages and service closures became the major messages communicated on Twitter. This spatial pattern becomes more easily identifiable in the aftermath of Hurricane Sandy (Figure 18). As noted in the term frequency analysis above, Manhattan and Brooklyn were the two places most frequently mentioned on Twitter in the context of Hurricane Sandy. This is reflected in the four maps of area-specific topics, where Manhattan and Brooklyn (especially the northern part) are significant clusters of Sandy Twitter messages.

Moreover, the spatial pattern of area-specific topics in Manhattan and Brooklyn is congruent with the general pattern for New York City as a whole, as the prevalent topic shifted from Caution & Advice before Sandy to Infrastructure/Utilities during and after the disaster. Notably, the census tract where John F. Kennedy International Airport (JFK) is located has Infrastructure/Utilities as its prevalent topic in the total Sandy tweets, the Before group, and the During group, because people tweeted about delayed and cancelled flights. Yet, with the restoration of JFK after Sandy, tweets there turned more to information on Needs & Donations.
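The maps discussed above depend on assigning one area-specific topic to each census tract. The location quotient itself is defined in Section 4.2.3; the sketch below assumes its standard form (a topic's share of tweets in a tract divided by that topic's citywide share) and further assumes that the area-specific topic is the one with the highest LQ. The function names, the input layout, and the "No Tweets" fallback label are illustrative choices, not the code used in this study.

```python
from collections import defaultdict


def location_quotients(counts):
    """Compute LQ per tract and topic.

    counts: {tract_id: {topic: number_of_tweets}} (hypothetical layout).
    Returns {tract_id: {topic: LQ}}; tracts without tweets get an empty dict.
    """
    # Citywide tweet totals per topic and overall.
    topic_totals = defaultdict(int)
    for topic_counts in counts.values():
        for topic, n in topic_counts.items():
            topic_totals[topic] += n
    grand_total = sum(topic_totals.values())

    lq = {}
    for tract, topic_counts in counts.items():
        tract_total = sum(topic_counts.values())
        lq[tract] = {}
        for topic, n in topic_counts.items():
            if tract_total == 0 or topic_totals[topic] == 0:
                continue  # LQ is undefined without tweets
            local_share = n / tract_total
            citywide_share = topic_totals[topic] / grand_total
            lq[tract][topic] = local_share / citywide_share
    return lq


def area_specific_topic(lq_for_tract, default="No Tweets"):
    """Assumed rule: pick the topic with the highest LQ in the tract."""
    if not lq_for_tract:
        return default
    return max(lq_for_tract, key=lq_for_tract.get)


# Toy counts, invented purely for illustration.
toy_counts = {
    "tract_A": {"Caution & Advice": 8, "Infrastructure/Utilities": 2},
    "tract_B": {"Caution & Advice": 2, "Infrastructure/Utilities": 8},
    "tract_C": {},
}
toy_lq = location_quotients(toy_counts)
toy_topics = {t: area_specific_topic(v) for t, v in toy_lq.items()}
```

An LQ above 1 means the topic is over-represented in the tract relative to the city as a whole, which is why the maximum is a natural candidate for the tract's label.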

Figure 16. The spatial distribution of area-specific topics for total Sandy tweets

Figure 17. The spatial distribution of area-specific topics in the Before group

Figure 18. The spatial distribution of area-specific topics in the During group

Figure 19. The spatial distribution of area-specific topics in the After group

4.3.4 Temporal transition of area-specific topics

We report the transition probability matrices to demonstrate how the area-specific topics transitioned across the three disaster phases, i.e., before, during, and after. The first seven rows in Tables 13, 14, and 15 are the transition probability matrices, where each element represents the transition probability between area-specific topics from Before to During, During to After, and Before to After, respectively. The Total row in each of Tables 13–15 records the total transition probability into each topic. For each row in Tables 13–15, we highlighted the largest value in bold, the second largest with an underline, and the third largest in italics. A glimpse of Figures 16–18 reveals that, in all three groups, there are a large number of census tracts from which no georeferenced Sandy tweets were reported. Therefore, census tracts have a very high probability of transitioning into the “No Tweets” status, which is consistent with the fact that the largest total probability value is always in the “No Tweets” column. Notably, the “Infrastructure/Utilities” column records the second largest total probability value in all three scenarios, suggesting that census tracts were more likely to transition into the Infrastructure/Utilities topic upon reaching the during and after stages. The “Weather & Environment” column has the third largest total transition probability in Table 13, indicating that places also shifted their attention to Weather & Environment in the Before to During scenario. The third largest total transition probability values in Tables 14 and 15 are both found in the “Needs & Donations” column, implying that the Needs & Donations topic became more important for many places in New York City at the recovery stage of Hurricane Sandy. More specifics can be obtained from the transition probability matrices: the “Infrastructure/Utilities” and “Weather & Environment” columns in Table 13, as well as the “Infrastructure/Utilities” and “Needs & Donations” columns in both Table 14 and Table 15, concentrate the majority of transitions from their previous stages.
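To make the structure of Tables 13–15 concrete, the following minimal sketch builds a row-normalized transition probability matrix from the area-specific topic of each census tract in two phases. It assumes each tract carries exactly one label per phase and that the Total row is the share of all tracts ending in each topic, a reading consistent with Tables 14 and 15 sharing the same Total row. The function name, the input layout, and the toy labels are hypothetical.

```python
from collections import Counter

STATES = [
    "Caution & Advice", "Affected People", "Infrastructure/Utilities",
    "Needs & Donations", "Weather & Environment", "Other", "No Tweets",
]


def transition_matrix(origin, destination, states=STATES):
    """Row-normalized transition probabilities between two phases.

    origin, destination: {tract_id: state} with identical tract ids.
    Returns {origin_state: {destination_state: probability}} plus a
    "Total" row holding the share of all tracts ending in each state.
    """
    if not origin:
        raise ValueError("at least one census tract is required")
    pair_counts = Counter((origin[t], destination[t]) for t in origin)
    row_totals = Counter(origin.values())

    matrix = {}
    for s in states:
        if row_totals[s] == 0:
            continue  # no tract started in this state, so the row is undefined
        matrix[s] = {d: pair_counts[(s, d)] / row_totals[s] for d in states}

    n_tracts = len(origin)
    matrix["Total"] = {
        d: sum(pair_counts[(s, d)] for s in states) / n_tracts for d in states
    }
    return matrix


# Toy labels for four tracts, invented purely for illustration.
before = {1: "Caution & Advice", 2: "Caution & Advice", 3: "No Tweets", 4: "No Tweets"}
during = {1: "Infrastructure/Utilities", 2: "No Tweets", 3: "No Tweets", 4: "No Tweets"}
toy_matrix = transition_matrix(before, during)
```

Each origin row sums to one, so a row can be read as the distribution of destinations for tracts that started in that topic.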

Table 13. The transition probability matrix for area-specific topics from Before to During

| Before (rows) / During (columns) | Caution & Advice | Affected People | Infrastructure/Utilities | Needs & Donations | Weather & Environment | Other | No Tweets |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Caution & Advice | 17.73% | 12.49% | 16.90% | 5.69% | 19.97% | 4.02% | 23.20% |
| Affected People | 18.63% | 7.64% | 18.42% | 20.67% | 17.38% | 5.70% | 11.56% |
| Infrastructure/Utilities | 13.36% | 12.83% | 22.41% | 10.42% | 16.69% | 4.12% | 20.16% |
| Needs & Donations | 8.97% | 20.23% | 0.00% | 6.95% | 29.72% | 0.00% | 34.14% |
| Weather & Environment | 11.81% | 17.39% | 25.06% | 13.26% | 17.95% | 4.01% | 10.52% |
| Other | 12.02% | 11.12% | 6.37% | 29.07% | 13.15% | 25.67% | 2.61% |
| No Tweets | 8.11% | 7.35% | 16.09% | 1.75% | 7.85% | 1.46% | 57.38% |
| Total | 10.59% | 9.05% | 17.05% | 4.58% | 11.30% | 2.51% | 44.92% |

Note: the largest, second largest, and third largest values in each row were highlighted in bold, with underline, and in italic, respectively.

Table 14. The transition probability matrix for area-specific topics from During to After

| During (rows) / After (columns) | Caution & Advice | Affected People | Infrastructure/Utilities | Needs & Donations | Weather & Environment | Other | No Tweets |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Caution & Advice | 1.03% | 2.49% | 27.51% | 13.04% | 2.22% | 7.64% | 46.06% |
| Affected People | 0.94% | 9.79% | 29.93% | 8.55% | 1.75% | 2.05% | 46.98% |
| Infrastructure/Utilities | 1.02% | 5.84% | 26.10% | 6.66% | 1.54% | 0.81% | 58.02% |
| Needs & Donations | 4.81% | 4.45% | 39.41% | 10.17% | 5.96% | 4.35% | 30.86% |
| Weather & Environment | 1.22% | 3.01% | 29.00% | 9.70% | 3.60% | 1.66% | 51.83% |
| Other | 0.00% | 7.64% | 37.99% | 5.93% | 0.00% | 12.19% | 36.24% |
| No Tweets | 0.03% | 1.21% | 9.76% | 3.07% | 0.17% | 0.72% | 85.05% |
| Total | 0.74% | 3.42% | 20.49% | 6.38% | 1.41% | 2.15% | 65.41% |

Note: the largest, second largest, and third largest values in each row were highlighted in bold, with underline, and in italic, respectively.

Table 15. The transition probability matrix for area-specific topics from Before to After

| Before (rows) / After (columns) | Caution & Advice | Affected People | Infrastructure/Utilities | Needs & Donations | Weather & Environment | Other | No Tweets |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Caution & Advice | 1.60% | 5.52% | 29.76% | 10.30% | 1.45% | 3.21% | 48.17% |
| Affected People | 1.94% | 6.55% | 36.58% | 18.28% | 5.58% | 6.14% | 24.93% |
| Infrastructure/Utilities | 2.68% | 6.22% | 28.58% | 9.88% | 5.40% | 5.46% | 41.78% |
| Needs & Donations | 6.95% | 29.72% | 0.00% | 0.00% | 20.23% | 8.97% | 34.14% |
| Weather & Environment | 0.00% | 6.25% | 43.40% | 14.74% | 1.78% | 3.13% | 30.71% |
| Other | 6.26% | 0.00% | 29.54% | 1.07% | 13.15% | 29.38% | 20.62% |
| No Tweets | 0.21% | 2.14% | 15.03% | 3.88% | 0.42% | 0.89% | 77.44% |
| Total | 0.74% | 3.42% | 20.49% | 6.38% | 1.41% | 2.15% | 65.41% |

Note: the largest, second largest, and third largest values in each row were highlighted in bold, with underline, and in italic, respectively.

4.4 Discussion

We propose a novel approach for area-based detection of topical hotspots of social media conversations in a natural disaster context. Moving beyond the spatial analytical methods used in most existing studies, this approach uses the location quotient (LQ) and the Markov transition probability matrix to integrate the space, time, and content dimensions of social media data and to enable a space-time analysis of detailed social responses to a natural disaster. We contribute to the literature by bringing situational awareness into a space-time context, thereby expanding it to a dynamic understanding of what is happening across geographic space over the whole course of a natural disaster. The case study based on Hurricane Sandy tweets in New York City shows how the spatial pattern of area-specific topics changed as Hurricane Sandy evolved. Area-specific topics mainly transitioned to Infrastructure/Utilities and Weather & Environment in the Before to During scenario. In contrast, upon reaching the recovery stage of Hurricane Sandy (During to After and Before to After), Needs & Donations started to play a role, while Infrastructure/Utilities still attracted significant transitions. Our approach enables disaster responders to compare what people in different areas are most concerned about and to identify how these concerns change over the course of a natural disaster. Although our classification is human-annotated and time-consuming, it provides a useful reference for rapid topic categorization of social media data related to similar natural disasters. For example, when another hurricane comes, the classifier developed in this study can be modified and applied with a Labelled Latent Dirichlet Allocation (LLDA) method to categorize the newly generated social media messages. Hence, when this space-time approach is employed jointly with fast topic modeling, it offers more potential to facilitate efficient policy and decision making and rapid response in mitigating the damage caused by natural disasters. Furthermore, although we focus on natural disasters, this approach can be applied to social media activities related to other phenomena and events.

CHAPTER 5

CONCLUSION

5.1 Summary

Over the past several decades, the frequency and intensity of natural disasters have increased dramatically, causing great damage to human society. To reduce the impact of disasters on humanity, social media data have been increasingly analyzed to provide useful human-centric information for better accomplishing various management tasks during all disaster phases, i.e., mitigation, preparedness, response, and recovery. Four dimensions of social media data, namely space, time, content, and network, have received particular attention in the literature. However, clear guidance is lacking on how to better analyze these dimensions to facilitate decision making and disaster response.

A framework has been developed in Chapter 2 to systematically evaluate the four dimensions in social media data. This framework addresses the following questions: (1) which combinations of dimensions have been implemented more (or less) frequently in existing studies? (2) what research questions and data analysis tasks could be raised based on the combinations of these dimensions? (3) how can the synthesis of social media data with remote sensing imagery and census data be improved? In that chapter, we first review how existing studies analyze the four dimensions in social media data, summarize common techniques for mining these dimensions, and then suggest some methods accordingly. We then use the developed framework to categorize the gathered articles into 15 classes (space, time, content, network, space ∩ time, space ∩ content, space ∩ network, time ∩ content, time ∩ network, content ∩ network, space ∩ time ∩ content, space ∩ time ∩ network, space ∩ content ∩ network, time ∩ content ∩ network, space ∩ time ∩ content ∩ network) and facilitate the generation of data analysis tasks. We find that (1) a large part of existing studies involve multiple dimensions of social media data in their analyses, (2) there are both separate analyses for each dimension and simultaneous analyses for multiple dimensions, and (3) there are fewer simultaneous analyses as the number of dimensions increases. Remote sensing imagery and census data can complement social media data and should be fused with them to provide additional informational richness for natural disaster management. More specifically, studies should go beyond the spatial dimension and incorporate more dimensions of social media data to refine the fusion. Identifying the gaps in the framework can lead to the testing of a number of hypotheses and the development of new social media analytics toward better decision and policy making in disaster situations. Additionally, the general structure of this framework can be extended to other fields such as politics and public health to serve as guidance for better data mining. Following this analytical framework, two case studies have been conducted.
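The 15 classes mentioned above are the non-empty combinations of the four dimensions (2^4 - 1 = 15). As a minimal sketch, they can be enumerated and used to label an article by the dimensions it analyzes; the function names and the labeling rule are illustrative assumptions rather than the procedure used in Chapter 2.

```python
from itertools import combinations

DIMENSIONS = ("space", "time", "content", "network")


def dimension_classes(dimensions=DIMENSIONS):
    """List every non-empty combination of dimensions as a label."""
    classes = []
    for size in range(1, len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            classes.append(" ∩ ".join(combo))
    return classes


def classify(article_dimensions, dimensions=DIMENSIONS):
    """Map the set of dimensions an article analyzes to its class label.

    The canonical order of DIMENSIONS is kept so that equal sets always
    yield the same label.
    """
    used = set(article_dimensions)
    unknown = used - set(dimensions)
    if not used or unknown:
        raise ValueError("expected a non-empty subset of the four dimensions")
    return " ∩ ".join(d for d in dimensions if d in used)


all_classes = dimension_classes()
```

Counting how many reviewed articles fall into each label would then show directly which combinations are under-studied.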

In Chapter 3, the first case study analyzes wildfire-related Twitter activities in terms of their attributes pertinent to space, time, content, and network to gain insights into the usefulness of social media data in revealing situational awareness. Increasing wildfire activity, with its associated risks for nature and society, has attracted attention from researchers as well as emergency managers. Although many studies and practices have dealt with wildfire issues, most of them do not take a human-centric perspective and omit wildfire-related human behaviors. This case study separately analyzes the four dimensions with kernel density estimation, temporal histograms, latent Dirichlet allocation (LDA), and social network analysis to gain more insights into social responses to a set of wildfire hazards in San Diego County, California. Findings show that social media data can characterize wildfires across space and over time, and are thus able to provide useful information on disaster situations. People have strong geographical awareness during wildfire hazards and are interested in communicating situational updates related to wildfire damage (e.g., containment percentage and burned acres), wildfire response (e.g., evacuation), and gratitude to firefighters. News media and local authorities are opinion leaders and play a dominant role in the wildfire retweet network.

Chapter 4 presents the second case study. As indicated in Chapter 2, more simultaneous analyses of dimensions are required to gain richer information for natural disaster management. Various methods have been developed to investigate the geospatial information, temporal component, and message content in disaster-related social media data to enrich human-centric information for situational awareness. However, few studies have simultaneously analyzed these three dimensions (i.e., space, time, and content). In an attempt to bring a space-time perspective into situational awareness, we develop a novel approach to integrate the space, time, and content dimensions of social media data and enable a space-time analysis of detailed social responses to a natural disaster. Using the Markov transition probability matrix and the location quotient (LQ), we analyze the Hurricane Sandy tweets in New York City and explore how people’s conversational topics change across space and over time. Our approach offers the potential to facilitate efficient policy and decision making and rapid response in mitigating the damage caused by natural disasters.

5.2 Limitations

This dissertation focuses on using social media as a stand-alone data source. However, social media data are not without limitations. Their unstructured nature, the digital divide, and privacy are among the most commonly raised concerns. Social media messages consist of unstructured texts, photos, and videos, and are more difficult to analyze with traditional computer programs than highly organized structured data. Unstructured social media data should be converted into a structured format before being imported into the analysis process. For example, in a topic modeling process, social media messages are usually converted into a structured term frequency-inverse document frequency (TF-IDF) matrix so that methods such as latent Dirichlet allocation (LDA) and machine learning algorithms can process them.

However, the conversion of unstructured data to a structured format may not preserve all information in the original dataset. For instance, TF-IDF does not capture the semantics, context, and positions of words. Therefore, researchers should devote more attention to the cleaning and preprocessing of social media data to improve the development of social media analytics for natural disaster management.
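The loss of word positions can be illustrated with a small sketch. It assumes whitespace tokenization and one common TF-IDF variant (relative term frequency multiplied by log(N/df)); other weighting schemes exist, and the function name and example messages are invented for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return (vocabulary, matrix) with one TF-IDF row per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(tokens) if tokens else 0.0
            idf = math.log(n_docs / doc_freq[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


# Two messages with the same words in a different order, plus a third one.
messages = ["power back no flood", "flood back no power", "need food donate"]
vocab, rows = tfidf_matrix(messages)
same_vector = rows[0] == rows[1]  # word order is not represented
```

The first two messages receive identical rows because the matrix stores only how often each term occurs, not where it occurs.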

The digital divide has two implications. First, although modern information and communication technologies (ICTs) have become widespread, the penetration and accessibility of these technologies vary from place to place (Spitzberg 2014). In other words, there exists a digital divide in which some geographic places, such as cities, have denser and wider communication networks than others. This geotechnical feature may influence people’s engagement in social media activities, including those related to natural disasters. Second, social media users represent only a certain subset of the whole population, and groups that are vulnerable in disasters, such as children and elderly people, may be underrepresented. In this regard, how the digital divide affects the use of social media in natural disaster management merits further investigation.

Recently, privacy concerns about social media have been rising. The processing of social media data may expose personal lives, locations, and activities, causing privacy issues (Elwood and Leszczynski 2011, Sui and Goodchild 2011, Zook et al. 2015a, Zook et al. 2015b). Although there have been few violations of privacy in the academic use of social media data, efforts are warranted to preserve as much useful information as possible for research while protecting privacy.

Beyond the aforementioned limitations inherent in social media data, this dissertation also has drawbacks in its application of social media data. During natural disasters, management agencies need timely human-centric information to gain situational awareness and facilitate disaster response. However, the analytical work in this dissertation was mostly conducted after the disaster. Providing timely information for natural disaster management requires that all analytical tasks be implemented in a real-time or near real-time manner. As such, to enable a rapid decision support system for natural disaster management, the analysis of the space, time, content, and network dimensions, as well as their combinations, should keep pace with the evolution of natural disasters. In this sense, future efforts should be devoted to improving the scalability of social media analytics in order to better support rapid decision making in natural disaster situations. Although the analytical framework based on the four dimensions in social media data is capable of capturing research gaps, it offers little guidance for proposing research questions that arise only from the field of natural disaster management. More specifically, the four dimensions identified in Chapter 2 are not restricted to disaster-related messages but exist widely in most social media activities. Therefore, future work should shed more light on how to extract dimensions in social media data that are specific to natural disaster management.

In spite of these limitations, social media is still a valuable data source for retrieving information related to human perceptions, responses, and behaviors.

5.3 Beyond natural disaster management

Beyond natural disaster management, this dissertation can pave a path for domain scientists from other fields such as public health, business, and politics to gain more insights into the patterns, processes, and mechanisms of the social media activities of interest to them. The general structure of the framework can be extended to other fields such as politics and public health to serve as guidance for better data mining. More specifically, the finding of this dissertation that there are fewer simultaneous analyses as the number of dimensions increases applies not only to natural disaster research but also to studies in other fields where social media data are used. Therefore, it might be helpful for researchers to put more emphasis on overcoming the computational constraints on analyzing more dimensions simultaneously.

Cyberspace is not isolated from physical space. Human dynamics in virtual space are driven and shaped by real-world events and phenomena. In this dissertation, the space-time analysis of social media activities related to natural disasters is linked with the evolution of those disasters. We also point out that the fusion of social media data with remote sensing imagery and census data would be helpful in revealing the driving forces behind space-time patterns of social responses to natural disasters. These efforts not only offer real-world driving forces for explaining the patterns of human dynamics in virtual space but also advance our understanding of the connection between virtual and real spaces.

References

Acar, A., & Muraki, Y. (2011). Twitter for crisis communication: lessons learned from Japan’s tsunami disaster. International Journal of Web Based Communities, 7 (3), 392–402.

Ager, A. A., Day, M. A., Finney, M. A., Vance-Borland, K., & Vaillant, N. M. (2014a). Analyzing the transmission of wildfire exposure on a fire-prone landscape in Oregon, USA. Forest Ecology and Management, 334, 377-390.

Ager, A. A., Day, M. A., McHugh, C. W., Short, K., Gilbertson-Day, J., Finney, M. A., & Calkin, D. E. (2014b). Wildfire exposure and fuel management on western US national forests. Journal of environmental management, 145, 54-70.

Alexander, D. E. (2002). Principles of emergency planning and management. Oxford, UK: Oxford University Press. ISBN 9780195218381.

Andresen, M. A. (2007). Location quotients, ambient populations, and the spatial analysis of crime in Vancouver, Canada. Environment and Planning A, 39(10), 2423-2444.

Ashktorab, Z., Brown, C., Nandi, M., & Culotta, A. (2014). Tweedr: Mining twitter to inform disaster response. Proc. of ISCRAM.

Avvenuti, M., Cresci, S., Marchetti, A., Meletti, C., & Tesconi, M. (2014, August). Ears (earthquake alert and report system): a real time decision support system for earthquake crisis management. In Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1749-1758). ACM.

Avvenuti, M., Cimino, M. G., Cresci, S., Marchetti, A., & Tesconi, M. (2016). A framework for detecting unfolding emergencies using humans as sensors. SpringerPlus, 5(1), 43.

Bakillah, M., Li, R. Y., & Liang, S. H. (2015). Geo-located community detection in Twitter with enhanced fast-greedy optimization of modularity: the case study of typhoon Haiyan. International Journal of Geographical Information Science, 29(2), 258-279.

Blake, E. S., Kimberlain, T. B., Berg, R. J., Cangialosi, J. P., & Beven II, J. L. (2013). Tropical cyclone report: Hurricane Sandy. National Hurricane Center, 12, 1-10.

Blanford, J. I., Bernhardt, J., Savelyev, A., Wong-Parodi, G., Carleton, A. M., Titley, D. W., & MacEachren, A. M. (2014). Tweeting and tornadoes. Proc. of ISCRAM.

Botzen, W. J. W., Aerts, J. C. J. H., & Van Den Bergh, J. C. J. M. (2009). Dependence of flood risk perceptions on socioeconomic and objective risk factors. Water Resources Research, 45(10).

Calkin, D. E., Thompson, M. P., Finney, M. A., & Hyde, K. D. (2011). A real-time risk assessment tool supporting wildland fire decisionmaking. Journal of Forestry, 109(5), 274-280.

Cameron, M. A., Power, R., Robinson, B., & Yin, J. (2012, April). Emergency situation awareness from twitter for crisis management. In Proceedings of the 21st international conference companion on World Wide Web (pp. 695-698). ACM.

Caragea, C., McNeese, N., Jaiswal, A., Traylor, G., Kim, H. W., Mitra, P., ... & Yen, J. (2011, May). Classifying text messages for the haiti earthquake. In Proceedings of the 8th international conference on information systems for crisis response and management (ISCRAM2011).

Caragea, C., Squicciarini, A., Stehle, S., Neppalli, K., & Tapia, A. (2014). Mapping moods: geo-mapped sentiment analysis during hurricane Sandy. Proc. of ISCRAM.

Castillo, C., Mendoza, M., & Poblete, B. (2013). Predicting information credibility in time-sensitive social media. Internet Research, 23(5), 560-588.

Cervone, G., Sava, E., Huang, Q., Schnebele, E., Harrison, J., & Waters, N. (2016). Using Twitter for tasking remote-sensing data collection and damage assessment: 2013 Boulder flood case study. International Journal of Remote Sensing, 37(1), 100-124.

Chatfield, A. T. & Brajawidagda, U. (2012). Twitter tsunami early warning network: a social network analysis of Twitter information flows. In J. W. Lamp (Eds.), ACIS 2012 : Location, location, location : Proceedings of the 23rd Australasian Conference on Information Systems 2012 (pp. 1-10). Australia: Deakin University.

Chatfield, A. T., & Brajawidagda, U. (2013, January). Twitter early tsunami warning system: A case study in Indonesia's Natural Disaster Management. In System Sciences (HICSS), 2013 46th Hawaii International Conference on (pp. 2050-2060). IEEE.

Chatfield, A. T., & Brajawidagda, U. (2014, January). Crowdsourcing hazardous weather reports from citizens via twittersphere under the short warning lead times of EF5 intensity tornado conditions. In System Sciences (HICSS), 2014 47th Hawaii International Conference on (pp. 2231-2241). IEEE.

Chatfield, A. T., Scholl, H. J. J., & Brajawidagda, U. (2013). Tsunami early warnings via Twitter in government: Net-savvy citizens' co-production of time-critical public information services. Government information quarterly, 30(4), 377-386.

Cheong, F., & Cheong, C. (2011). Social media data mining: A social network analysis of tweets during the Australian 2010-2011 floods. In 15th Pacific Asia Conference on Information Systems (PACIS) (pp. 1-16). Queensland University of Technology.

Chowdhury, S. R., Imran, M., Asghar, M. R., Amer-Yahia, S., & Castillo, C. (2013). Tweet4act: Using incident-specific profiles for classifying crisis-related messages. In 10th International ISCRAM Conference.

Chuvieco, E., Aguado, I., Yebra, M., Nieto, H., Salas, J., Martín, M. P., et al. (2010). Development of a framework for fire risk assessment using remote sensing and geographic information system technologies. Ecological Modelling, 221(1), 46-58.

Chuvieco, E., Aguado, I., Jurdao, S., Pettinari, M. L., Yebra, M., Salas, J., et al. (2012). Integrating geospatial information into fire risk assessment. International Journal of Wildland Fire, 23(5), 606-619.

Collins, T. W. (2008). What influences hazard mitigation? Household decision making about wildfire risks in Arizona's White Mountains. The Professional Geographer, 60(4), 508-526.

Collins, T. W., & Bolin, B. (2009). Situating hazard vulnerability: people’s negotiations with wildfire environments in the US Southwest. Environmental Management, 44(3), 441-455.

Crampton, J. W., Graham, M., Poorthuis, A., Shelton, T., Stephens, M., Wilson, M. W., & Zook, M. (2013). Beyond the geotag: situating ‘big data’ and leveraging the potential of the geoweb. Cartography and Geographic Information Science, 40(2), 130-139.

Crooks, A., Croitoru, A., Stefanidis, A., & Radzikowski, J. (2013). #Earthquake: Twitter as a distributed sensor system. Transactions in GIS, 17(1), 124-147.

Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5), 1-9.

Cutter, S. L., Boruff, B. J., & Shirley, W. L. (2003). Social vulnerability to environmental hazards. Social science quarterly, 84(2), 242-261.

Cutter, S. L., & Emrich, C. (2005). Are natural hazards and disaster losses in the US increasing?. EOS, Transactions American Geophysical Union, 86(41), 381-389.

De Albuquerque, J. P., Herfort, B., Brenning, A., & Zipf, A. (2015). A geographic approach for combining social media and authoritative data towards identifying useful information for disaster management. International Journal of Geographical Information Science, 29(4), 667-689.

De Longueville, B., Luraschi, G., Smits, P., Peedell, S., & Groeve, T. D. (2010). Citizens as sensors for natural hazards: A VGI integration workflow. Geomatica, 64, 41-59.

De Longueville, B., Smith, R. S., & Luraschi, G. (2009, November). Omg, from here, i can see the flames!: a use case of mining location based social networks to acquire spatio-temporal data on forest fires. In Proceedings of the 2009 international workshop on location based social networks (pp. 73-80). ACM.

Dredze, M., Paul, M. J., Bergsma, S., & Tran, H. (2013, June). Carmen: A twitter geolocation system with applications to public health. In Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence.

Earle, P. (2010). Earthquake twitter. Nature Geoscience, 3(4), 221-222.

Earle, P. S., Bowden, D. C., & Guy, M. (2012). Twitter earthquake detection: earthquake monitoring in a social world. Annals of Geophysics, 54(6), 708-715.

Eilander, D., Trambauer, P., Wagemaker, J., & van Loenen, A. (2016). Harvesting social media for generation of near real-time flood maps. Procedia Engineering, 154, 176-183.

Elwood, S., & Leszczynski, A. (2011). Privacy, reconsidered: New representations, data practices, and the geoweb. Geoforum, 42(1), 6-15.

Endsley, M. R. (1995). Toward a theory of situation awareness in dynamic systems. Human Factors: The Journal of the Human Factors and Ergonomics Society, 37(1), 32-64.

Feinerer, I., & Hornik, K. (2014). tm: Text Mining Package. R package version 0.6.

Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. Journal of Statistical Software, 25(5), 1-54.

Flanagan, B. E., Gregory, E. W., Hallisey, E. J., Heitgerd, J. L., & Lewis, B. (2011). A social vulnerability index for disaster management. Journal of homeland security and emergency management, 8(1).

Fohringer, J., Dransch, D., Kreibich, H., & Schröter, K. (2015). Social media as an information source for rapid flood inundation mapping. Natural Hazards and Earth System Sciences, 15(12), 2725-2738.

Fothergill, A., & Peek, L. A. (2004). Poverty and disasters in the United States: A review of recent sociological findings. Natural hazards, 32(1), 89-110.

Fuchs, G., Andrienko, N., Andrienko, G., Bothe, S., & Stange, H. (2013, November). Tracing the German centennial flood in the stream of tweets: first lessons learned. In Proceedings of the second ACM SIGSPATIAL international workshop on crowdsourced and volunteered geographic information (pp. 31-38). ACM.

Gao, C., & Liu, J. (2015). Uncovering spatiotemporal characteristics of human online behaviors during extreme events. PloS one, 10(10), e0138673.

Gelernter, J., & Mushegian, N. (2011). Geo‐parsing Messages from Microtext. Transactions in GIS, 15(6), 753-773.

Ghosh, D., & Guha, R. (2013). What are we ‘tweeting’ about obesity? Mapping tweets with topic modeling and Geographic Information System. Cartography and Geographic Information Science, 40(2), 90-102.

Gillespie, T. W., Chu, J., Frankenberg, E., & Thomas, D. (2007). Assessment and prediction of natural hazards from satellite imagery. Progress in Physical Geography, 31(5), 459-470.

Gillett, N. P., Weaver, A. J., Zwiers, F. W., & Flannigan, M. D. (2004). Detecting the effect of climate change on Canadian forest fires. Geophysical Research Letters, 31(18).

Goodchild, M. F. (2007). Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4), 211-221.

Goodchild, M. F., & Glennon, J. A. (2010). Crowdsourcing geographic information for disaster response: a research frontier. International Journal of Digital Earth, 3(3), 231-241.

Goodwin, J., Dolbear, C., & Hart, G. (2008). Geographical linked data: The administrative geography of Great Britain on the semantic web. Transactions in GIS, 12(s1), 19-30.

Goldstein, B. E. (2008). Skunkworks in the embers of the Cedar Fire: enhancing resilience in the aftermath of disaster. Human Ecology, 36(1), 15-28.

Granell, C., & Ostermann, F. O. (2016). Beyond data collection: Objectives and methods of research using VGI and geo-social media for disaster management. Computers, Environment and Urban Systems, 59, 231-243.

Grütter, R., Purves, R. S., & Wotruba, L. (2017). Evaluating Topological Queries in Linked Data Using DBpedia and GeoNames in Switzerland and Scotland. Transactions in GIS, 21(1), 114-133.

Guan, X., & Chen, C. (2014). Using social media data to understand and assess disasters. Natural hazards, 74(2), 837-850.

Gupta, A., Lamba, H., Kumaraguru, P., & Joshi, A. (2013, May). Faking sandy: characterizing and identifying fake images on twitter during hurricane sandy. In Proceedings of the 22nd international conference on World Wide Web companion (pp. 729-736). International World Wide Web Conferences Steering Committee.

Guy, M., Earle, P., Ostrum, C., Gruchalla, K., & Horvath, S. (2010). Integration and dissemination of citizen reported and seismically derived earthquake information via social network technologies. In Advances in intelligent data analysis IX (pp. 42-53). Springer Berlin Heidelberg.

Han, S. Y., Tsou, M. H., & Clarke, K. C. (2015). Do Global Cities Enable Global Views? Using Twitter to Quantify the Level of Geographical Awareness of US Cities. PloS one, 10(7), e0132464.

Hara, Y. (2015). Behaviour analysis using tweet data and geo-tag data in a natural disaster. Transportation Research Procedia, 11, 399-412.

Haworth, B., & Bruce, E. (2015). A review of volunteered geographic information for disaster management. Geography Compass, 9(5), 237-250.

Haworth, B., Bruce, E., & Middleton, P. (2015). Emerging technologies for risk reduction: Assessing the potential use of social media and VGI for increasing community engagement. Australian Journal of Emergency Management, 30(3), 36.

Herrero-Corral, G., Jappiot, M., Bouillon, C., & Long-Fournel, M. (2012). Application of a geographical assessment method for the characterization of wildland–urban interfaces in the context of wildfire prevention: A case study in western Madrid. Applied Geography, 35(1), 60-70.

Hollenstein, L., & Purves, R. (2012). Exploring place through user-generated content: Using Flickr tags to describe city cores. Journal of Spatial Information Science, 2010(1), 21-48.

Huang, Q., Cao, G., & Wang, C. (2014, November). From where do tweets originate?: a GIS approach for user location inference. In Proceedings of the 7th ACM SIGSPATIAL International Workshop on Location-Based Social Networks (pp. 1-8). ACM.

Huang, Q., & Cervone, G. (2016). Usage of Social Media and Cloud Computing During Natural Hazards. In T. C. Vance, N. Merati, C. Yang, & M. Yuan (Eds.), Cloud Computing in Ocean and Atmospheric Sciences (pp. 297-324).

Huang, Q., Cervone, G., Jing, D., & Chang, C. (2015, November). DisasterMapper: A CyberGIS framework for disaster management using social media data. In Proceedings of the 4th International ACM SIGSPATIAL Workshop on Analytics for Big Geospatial Data (pp. 1-6). ACM.

Huang, Q., & Xiao, Y. (2015). Geographic situational awareness: mining tweets for disaster preparedness, emergency response, impact, and recovery. ISPRS International Journal of Geo-Information, 4(3), 1549-1568.

Huang, Q., & Wong, D. W. (2016). Activity patterns, socioeconomic status and urban spatial structure: what can social media data tell us?. International Journal of Geographical Information Science, 30(9), 1873-1898.

Hughes, A. L., & Palen, L. (2009). Twitter adoption and use in mass convergence and emergency events. International Journal of Emergency Management, 6(3-4), 248-260.

Hultquist, C., Simpson, M., Cervone, G., & Huang, Q. (2015, November). Using nightlight remote sensing imagery and Twitter data to study power outages. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on the Use of GIS in Emergency Management (p. 6). ACM.

Imran, M., Elbassuoni, S., Castillo, C., Diaz, F., & Meier, P. (2013a). Practical extraction of disaster-relevant information from social media. In Proceedings of the 22nd international conference on World Wide Web companion (pp. 1021-1024). International World Wide Web Conferences Steering Committee.

Imran, M., Elbassuoni, S. M., Castillo, C., Diaz, F., & Meier, P. (2013b). Extracting information nuggets from disaster-related messages in social media. Proc. of ISCRAM, Baden-Baden, Germany.

Imran, M., Castillo, C., Lucas, J., Meier, P., & Vieweg, S. (2014a). AIDR: Artificial intelligence for disaster response. In Proceedings of the companion publication of the 23rd international conference on World wide web companion (pp. 159-162). International World Wide Web Conferences Steering Committee.

Imran, M., Castillo, C., Lucas, J., Patrick, M., & Rogstadius, J. (2014b). Coordinating human and machine intelligence to classify microblog communications in crises. Proc. of ISCRAM.

Imran, M., Castillo, C., Diaz, F., & Vieweg, S. (2015). Processing social media messages in mass emergency: a survey. ACM Computing Surveys (CSUR), 47(4), 67.

Janowicz, K., Scheider, S., Pehle, T., & Hart, G. (2012). Geospatial semantics and linked spatiotemporal data–Past, present, and future. Semantic Web, 3(4), 321-332.

Jones, C. B., Purves, R. S., Clough, P. D., & Joho, H. (2008). Modelling vague places with knowledge from the Web. International Journal of Geographical Information Science, 22(10), 1045-1065.

Jongman, B., Wagemaker, J., Romero, B. R., & de Perez, E. C. (2015). Early flood detection for rapid humanitarian response: harnessing near real-time satellite and Twitter signals. ISPRS International Journal of Geo-Information, 4(4), 2246-2266.

Joyce, K. E., Belliss, S. E., Samsonov, S. V., McNeill, S. J., & Glassey, P. J. (2009). A review of the status of satellite remote sensing and image processing techniques for mapping natural hazards and disasters. Progress in Physical Geography.

Kent, J. D., & Capello Jr, H. T. (2013). Spatial patterns and demographic indicators of effective social media content during the Horsethief Canyon Fire of 2012. Cartography and Geographic Information Science, 40(2), 78-89.

Kireyev, K., Palen, L., & Anderson, K. (2009, December). Applications of topics models to analysis of disaster-related twitter data. In NIPS Workshop on Applications for Topic Models: Text and Beyond (Vol. 1). Canada: Whistler.

Klomp, J. (2016). Economic development and natural disasters: A satellite data analysis. Global Environmental Change, 36, 67-88.

Klonner, C., Marx, S., Usón, T., Porto de Albuquerque, J., & Höfle, B. (2016). Volunteered geographic information in natural hazard analysis: a systematic literature review of current approaches with a focus on preparedness and mitigation. ISPRS International Journal of Geo-Information, 5(7), 103.

Kogan, M., Palen, L., & Anderson, K. M. (2015, February). Think Local, Retweet Global: Retweeting by the Geographically-Vulnerable during Hurricane Sandy. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 981-993). ACM.

Kongthon, A., Haruechaiyasak, C., Pailai, J., & Kongyoung, S. (2012, July). The role of Twitter during a natural disaster: Case study of 2011 Thai Flood. In Technology Management for Emerging Technologies (PICMET), 2012 Proceedings of PICMET'12: (pp. 2227-2232). IEEE.

Kryvasheyeu, Y., Chen, H., Moro, E., Van Hentenryck, P., & Cebrian, M. (2015). Performance of social network sensors during hurricane Sandy. PLoS one, 10(2), e0117288.

Kryvasheyeu, Y., Chen, H., Obradovich, N., Moro, E., Van Hentenryck, P., Fowler, J., & Cebrian, M. (2016). Rapid assessment of disaster damage using social media activity. Science Advances, 2(3), e1500779.

Kwan, M. P. (2016). Algorithmic Geographies: Big Data, Algorithmic Uncertainty, and the Production of Geographic Knowledge. Annals of the American Association of Geographers, 106(2), 274-282.

Lachlan, K. A., Spence, P. R., & Lin, X. (2014a). Expressions of risk awareness and concern through Twitter: on the utility of using the medium as an indication of audience needs. Computers in Human Behavior, 35, 554-559.

Lachlan, K. A., Spence, P. R., Lin, X., Najarian, K. M., & Del Greco, M. (2014b). Twitter use during a weather event: comparing content associated with localized and nonlocalized hashtags. Communication Studies, 65(5), 519-534.

Lachlan, K. A., Spence, P. R., Lin, X., Najarian, K., & Del Greco, M. (2016). Social media and crisis management: CERC, search strategies, and Twitter content. Computers in Human Behavior, 54, 647-652.

Lai, C., She, B., & Ye, X. (2015). Unpacking the Network Processes and Outcomes of Online and Offline Humanitarian Collaboration. Communication Research. doi: 10.1177/0093650215616862

Landwehr, P. M., Wei, W., Kowalchuck, M., & Carley, K. M. (2016). Using tweets to support disaster planning, warning and response. Safety science, 90, 33-47.

Latonero, M., & Shklovski, I. (2011). Emergency management, Twitter, & Social Media Evangelism. International Journal of Information Systems for Crisis Response and Management, 3(4), 67-86.

Li, L., Goodchild, M. F., & Xu, B. (2013). Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr. Cartography and Geographic Information Science, 40(2), 61-77.

Liang, Y., Caverlee, J., & Mander, J. (2013, May). Text vs. images: on the viability of social media to assess earthquake damage. In Proceedings of the 22nd International Conference on World Wide Web (pp. 1003-1006). ACM.

Lingad, J., Karimi, S., & Yin, J. (2013, May). Location extraction from disaster-related microblogs. In Proceedings of the 22nd International Conference on World Wide Web (pp. 1017-1020). ACM.

Liu, Y., Goodrick, S., & Heilman, W. (2014). Wildland fire emissions, carbon, and climate: Wildfire–climate interactions. Forest Ecology and Management, 317, 80-96.

Liu, S. B., Palen, L., Sutton, J., Hughes, A. L., & Vieweg, S. (2008). In search of the bigger picture: The emergent role of on-line photo sharing in times of disaster. In Proceedings of the information systems for crisis response and management conference (ISCRAM).

Liu, Y., Liu, X., Gao, S., Gong, L., Kang, C., Zhi, Y., ... & Shi, L. (2015). Social sensing: a new approach to understanding our socioeconomic environments. Annals of the Association of American Geographers, 105(3), 512-530.

Lu, X., & Brelsford, C. (2014). Network structure and community evolution on twitter: human behavior change in response to the 2011 Japanese earthquake and tsunami. Scientific reports, 4.

MacEachren, A. M., Jaiswal, A., Robinson, A. C., Pezanowski, S., Savelyev, A., Mitra, P., ... & Blanford, J. (2011, October). SensePlace2: GeoTwitter analytics support for situational awareness. In Visual Analytics Science and Technology (VAST), 2011 IEEE Conference on (pp. 181-190). IEEE.

Martínez, J., Vega-Garcia, C., & Chuvieco, E. (2009). Human-caused wildfire risk rating for prevention planning in Spain. Journal of Environmental Management, 90(2), 1241-1252.

Mandel, B., Culotta, A., Boulahanis, J., Stark, D., Lewis, B., & Rodrigue, J. (2012, June). A demographic analysis of online sentiment during hurricane irene. In Proceedings of the Second Workshop on Language in Social Media (pp. 27-36). Association for Computational Linguistics.

Massada, A. B., Radeloff, V. C., Stewart, S. I., & Hawbaker, T. J. (2009). Wildfire risk in the wildland–urban interface: a simulation study in northwestern Wisconsin. Forest Ecology and Management, 258(9), 1990-1999.

McClendon, S., & Robinson, A. C. (2013). Leveraging geospatially-oriented social media communications in disaster response. International Journal of Information Systems for Crisis Response and Management (IJISCRAM), 5(1), 22-40.

Mendoza, M., Poblete, B., & Castillo, C. (2010, July). Twitter Under Crisis: Can we trust what we RT?. In Proceedings of the first workshop on social media analytics (pp. 71-79). ACM.

Middleton, S. E., Middleton, L., & Modafferi, S. (2014). Real-time crisis mapping of natural disasters using social media. IEEE Intelligent Systems, 29(2), 9-17.

Miyazaki, H., Nagai, M., & Shibasaki, R. (2015). Reviews of Geospatial Information Technology and Collaborative Data Delivery for Disaster Risk Management. ISPRS International Journal of Geo-Information, 4(4), 1936-1964.

Musaev, A., Wang, D., & Pu, C. (2014). LITMUS: Landslide detection by integrating multiple sources. In 11th International Conference Information Systems for Crisis Response and Management (ISCRAM).

Oh, O., Kwon, K. H., & Rao, H. R. (2010, August). An Exploration of Social Media in Extreme Events: Rumor Theory and Twitter during the Haiti Earthquake 2010. In ICIS (p. 231).

Olteanu, A., Vieweg, S., & Castillo, C. (2015, February). What to expect when the unexpected happens: Social media communications across crises. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 994-1009). ACM.

Padilla, M., & Vega-García, C. (2011). On the comparative importance of fire danger rating indices and their integration with spatial and temporal variables for predicting daily human-caused fire occurrences in Spain. International Journal of Wildland Fire, 20(1), 46-58.

Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79-86). Association for Computational Linguistics.

Panteras, G., Wise, S., Lu, X., Croitoru, A., Crooks, A., & Stefanidis, A. (2015). Triangulating social multimedia content for event localization using Flickr and Twitter. Transactions in GIS, 19(5), 694-715.

Pohl, D., Bouchachia, A., & Hellwagner, H. (2012, December). Automatic identification of crisis-related sub-events using clustering. In Machine Learning and Applications (ICMLA), 2012 11th International Conference on (Vol. 2, pp. 333-338). IEEE.

Power, R., Robinson, B., Colton, J., & Cameron, M. (2014). Emergency situation awareness: Twitter case studies. In Information Systems for Crisis Response and Management in Mediterranean Countries (pp. 218-231). Springer International Publishing.

Preis, T., Moat, H. S., Bishop, S. R., Treleaven, P., & Stanley, H. E. (2013). Quantifying the digital traces of Hurricane Sandy on Flickr. Scientific reports, 3.

Purves, R. S., Clough, P., Jones, C. B., Arampatzis, A., Bucher, B., Finch, D., ... & Yang, B. (2007). The design and implementation of SPIRIT: a spatially aware search engine for information retrieval on the Internet. International journal of geographical information science, 21(7), 717-745.

Pyne, S. J. (2004). Pyromancy: Reading stories in the flames. Conservation Biology, 18(4), 874-877.

Qu, Y., Huang, C., Zhang, P., & Zhang, J. (2011, March). Microblogging after a major disaster in China: a case study of the 2010 Yushu earthquake. In Proceedings of the ACM 2011 conference on Computer supported cooperative work (pp. 25-34). ACM.

Rakwatin, P., Sansena, T., Marjang, N., & Rungsipanich, A. (2013). Using multi-temporal remote-sensing data to estimate 2011 flood area and volume over Chao Phraya River basin, Thailand. Remote Sensing Letters, 4(3), 243-250.

Rego, F., Catry, F. X., Montiel, C., & Karlsson, O. (2013). Influence of territorial variables on the performance of wildfire detection systems in the Iberian Peninsula. Forest Policy and Economics, 29, 26-35.

Resch, B., Usländer, F., & Havas, C. (2017). Combining machine-learning topic models and spatiotemporal analysis of social media data for disaster footprint and damage assessment. Cartography and Geographic Information Science, 1-15.

Rey, S. and Ye, X. (2010) Comparative Spatial Dynamics of Regional Systems. In A. Páez, J. Le Gallo, R. Buliung, and S. Dall’Erba (eds.) Progress in Spatial Analysis: Theory, Computation, and Thematic Applications. 441-464. Springer.

Robinson, B., Power, R., & Cameron, M. (2013, May). A sensitive twitter earthquake detector. In Proceedings of the 22nd International Conference on World Wide Web (pp. 999-1002). ACM.

Rodrigues, M., & de la Riva, J. (2014). An insight into machine-learning algorithms to model human-caused wildfire occurrence. Environmental Modelling & Software, 57, 192-201.

Rodrigues, M., de la Riva, J., & Fotheringham, S. (2014). Modeling the spatial variation of the explanatory factors of human-caused wildfires in Spain using geographically weighted logistic regression. Applied Geography, 48, 52-63.

Saharia, N. (2015, October). Detecting emotion from short messages on Nepal earthquake. In Speech Technology and Human-Computer Dialogue (SpeD), 2015 International Conference on (pp. 1-5). IEEE.

Sakaki, T., Okazaki, M., & Matsuo, Y. (2010, April). Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web (pp. 851-860). ACM.

Sanyal, J., & Lu, X. X. (2004). Application of remote sensing in flood management with special reference to monsoon Asia: a review. Natural Hazards, 33(2), 283-301.

Schade, S., Díaz, L., Ostermann, F., Spinsanti, L., Luraschi, G., Cox, S., ... & De Longueville, B. (2013). Citizen-based sensing of crisis events: sensor web enablement for volunteered geographic information. Applied Geomatics, 5(1), 3-18.

Schnebele, E., & Cervone, G. (2013). Improving remote sensing flood assessment using volunteered geographical data. Natural Hazards and Earth System Sciences, 13(3), 669.

Schnebele, E., & Waters, N. (2014b). Road assessment after flood events using non-authoritative data. Natural Hazards and Earth System Sciences, 14(4), 1007.

Schnebele, E., Cervone, G., Kumar, S., & Waters, N. (2014a). Real time estimation of the Calgary floods using limited remote sensing data. Water, 6(2), 381-398.

Schulz, A., Thanh, T., Paulheim, H., & Schweizer, I. (2013). A fine-grained sentiment analysis approach for detecting crisis related microposts. ISCRAM 2013.

Schulte, S., & Miller, K. A. (2010). Wildfire risk and climate change: the influence on homeowner mitigation behavior in the wildland–urban interface. Society and Natural Resources, 23(5), 417-435.

Shanley, L. A., Burns, R., Bastian, Z., & Robson, E. S. (2013). Tweeting up a storm: The promise and perils of crisis mapping. Photogrammetric engineering and remote sensing, 79(10), 865-879.

Shelton, T., Poorthuis, A., Graham, M., & Zook, M. (2014). Mapping the data shadows of Hurricane Sandy: Uncovering the sociospatial dimensions of ‘big data’. Geoforum, 52, 167-179.

Shook, E., & Turner, V. K. (2016). The socio-environmental data explorer (SEDE): a social media–enhanced decision support system to explore risk perception to hazard events. Cartography and Geographic Information Science, 1-15.

Spinsanti, L., & Ostermann, F. (2013). Automated geographic context analysis for volunteered information. Applied Geography, 43, 36-44.

Spitzberg, B. H. (2014). Toward a model of meme diffusion (M3D). Communication Theory, 24(3), 311-339.

Spitzberg, B. H., Tsou, M. H., Gupta, D. K., An, L., Gawron, J. M., & Lusher, D. (2013). The map is not which territory?: Speculating on the geo-spatial diffusion of ideas in the Arab Spring of 2011. Studies in Media and Communication, 1(1), 101-115.

Srivastava, M., Abdelzaher, T., & Szymanski, B. (2012). Human-centric sensing. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 370(1958), 176-197.

Starbird, K., & Palen, L. (2010). Pass it on?: Retweeting in mass emergency (pp. 1-10). International Community on Information Systems for Crisis Response and Management.

Stefanidis, A., Crooks, A., & Radzikowski, J. (2013). Harvesting ambient geospatial information from social media feeds. GeoJournal, 78(2), 319-338.

Steiger, E., Albuquerque, J. P., & Zipf, A. (2015). An advanced systematic literature review on spatiotemporal analyses of Twitter data. Transactions in GIS, 19(6), 809-834.

Sui, D., & Goodchild, M. (2011). The convergence of GIS and social media: challenges for GIScience. International Journal of Geographical Information Science, 25(11), 1737-1748.

Sun, D., Li, S., Zheng, W., Croitoru, A., Stefanidis, A., & Goldberg, M. (2016). Mapping floods due to Hurricane Sandy using NPP VIIRS and ATMS data and geotagged Flickr imagery. International Journal of Digital Earth, 9(5), 427-441.

Terpstra, T., de Vries, A., Stronkman, R., & Paradies, G. L. (2012). Towards a realtime Twitter analysis during crises for operational crisis management (pp. 1-9). Simon Fraser University.

Thompson, M. P., Haas, J. R., Gilbertson-Day, J. W., Scott, J. H., Langowski, P., Bowne, E., & Calkin, D. E. (2015). Development and application of a geospatial wildfire exposure and risk calculation tool. Environmental Modelling & Software, 63, 61-72.

Tralli, D. M., Blom, R. G., Zlotnicki, V., Donnellan, A., & Evans, D. L. (2005). Satellite remote sensing of earthquake, volcano, flood, landslide and coastal inundation hazards. ISPRS Journal of Photogrammetry and Remote Sensing, 59(4), 185-198.

Triglav-Čekada, M., & Radovan, D. (2013). Using volunteered geographical information to map the November 2012 floods in Slovenia. Natural Hazards and Earth System Sciences, 13(11), 2753-2762.

Truelove, M., Vasardani, M., & Winter, S. (2015). Towards credibility of micro-blogs: characterising witness accounts. GeoJournal, 80(3), 339-359.

Tsou, M. H., Yang, J. A., Lusher, D., Han, S., Spitzberg, B., Gawron, J. M., ... & An, L. (2013). Mapping social activities and concepts with social media (Twitter) and web search engines (Yahoo and Bing): a case study in 2012 US Presidential Election. Cartography and Geographic Information Science, 40(4), 337-348.

Tsou, M. H., & Leitner, M. (2013). Visualization of social media: seeing a mirage or a message?. Cartography and Geographic Information Science, 40(2), 55-60.

UN/ISDR (2002). Living with risk: a global review of disaster reduction initiatives. United Nations Office for Disaster Risk Reduction (UNISDR). (https://www.unisdr.org/files/657_lwr1.pdf).

Verma, S., Vieweg, S., Corvey, W. J., Palen, L., Martin, J. H., Palmer, M., ... & Anderson, K. M. (2011, July). Natural Language Processing to the Rescue? Extracting "Situational Awareness" Tweets During Mass Emergency. In ICWSM.

Vieweg, S., Castillo, C., & Imran, M. (2014). Integrating social media communications into the rapid assessment of sudden onset disasters. In Social Informatics (pp. 444-461). Springer International Publishing.

Vieweg, S., Hughes, A. L., Starbird, K., & Palen, L. (2010, April). Microblogging during two natural hazards events: what twitter may contribute to situational awareness? In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1079-1088). ACM.

Wang, H., Hovy, E. H., & Dredze, M. (2015, January). The Hurricane Sandy Twitter Corpus. In AAAI Workshop: WWW and Public Health Intelligence. http://www.aaai.org/ocs/index.php/WS/AAAIW15/paper/download/10079/10258

Wang, S. (2010). A CyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Annals of the Association of American Geographers, 100(3), 535-557.

Wang, S. (2013). CyberGIS: blueprint for integrated and scalable geospatial software ecosystems. International Journal of Geographical Information Science, 27(11), 2119-2121.

Wang, S., Anselin, L., Bhaduri, B., Crosby, C., Goodchild, M. F., Liu, Y., & Nyerges, T. L. (2013). CyberGIS software: a synthetic review and integration roadmap. International Journal of Geographical Information Science, 27(11), 2122-2145.

Wang, Y., Wang, T., Ye, X., Zhu, J., & Lee, J. (2016). Using social media for emergency response and urban sustainability: a case study of the 2012 Beijing rainstorm. Sustainability, 8(1), 25.

Wang, Z., Ye, X., & Tsou, M. H. (2016). Spatial, temporal, and content analysis of Twitter for wildfire hazards. Natural Hazards, 83(1), 523-540.

Wang, Z., & Ye, X. (2018). Social media analytics for natural disaster management. International Journal of Geographical Information Science, 32(1), 49-72.

Westerling, A. L., Hidalgo, H. G., Cayan, D. R., & Swetnam, T. W. (2006). Warming and earlier spring increase western US forest wildfire activity. Science, 313(5789), 940-943.

Widener, M. J., & Li, W. (2014). Using geolocated Twitter data to monitor the prevalence of healthy and unhealthy food references across the US. Applied Geography, 54, 189-197.

Xiao, Y., Huang, Q., & Wu, K. (2015). Understanding social media data for disaster management. Natural Hazards, 79(3), 1663-1679.

Xu, P., Wu, Y., Wei, E., Peng, T. Q., Liu, S., Zhu, J. J., & Qu, H. (2013). Visual analysis of topic competition on social media. Visualization and Computer Graphics, IEEE Transactions on, 19(12), 2012-2021.

Yan, Y., Eckle, M., Kuo, C. L., Herfort, B., Fan, H., & Zipf, A. (2017). Monitoring and Assessing Post-Disaster Tourism Recovery Using Geotagged Social Media Data. ISPRS International Journal of Geo-Information, 6(5), 144.

Ye, X., Huang, Q., & Li, W. (2016). Integrating Big Social Data, Computing, and Modeling for Spatial Social Science. Cartography and Geographic Information Science, 43(5), 377-378.

Ye, X., & Rey, S. J. (2013). A Framework for Exploratory Space-Time Analysis of Economic Data. Annals of Regional Science, 50(1), 315-339.

Yin, J., Lampert, A., Cameron, M., Robinson, B., & Power, R. (2012). Using social media to enhance emergency situation awareness. IEEE Intelligent Systems, 27(6), 52-59.

Youssouf, H., Liousse, C., Roblou, L., Assamoi, E. M., Salonen, R. O., Maesano, C., et al. (2014). Quantifying wildfires exposure for investigating health-related effects. Atmospheric Environment, 97, 239-251.

Young, S. D. (2014). Behavioral insights on big data: using social media for predicting biomedical outcomes. Trends in microbiology, 22(11), 601-602.

Yu, M., Yang, C., & Li, Y. (2018). Big data in natural disaster management: A review. Geosciences, 8, 165. DOI: 10.3390/geosciences8050165

Zhao, P., Qin, K., Ye, X., Wang, Y., & Chen, Y. (2016). A Trajectory Clustering Approach Based on Decision Graph and Data Field for Detecting Hotspots. International Journal of Geographical Information Science. DOI: 10.1080/13658816.2016.1213845

Zhao, Y. (2012). R and data mining: Examples and case studies. Academic Press.

Zhu, J., Xiong, F., Piao, D., Liu, Y., & Zhang, Y. (2011, October). Statistically modeling the effectiveness of disaster information in social media. In Global Humanitarian Technology Conference (GHTC), 2011 IEEE (pp. 431-436). IEEE.

Zielinski, A., Middleton, S. E., Tokarchuk, L. N., & Wang, X. (2013). Social media text mining and network analysis for decision support in natural crisis management. Proc. ISCRAM. Baden-Baden, Germany, 840-845.

Zook, M., Graham, M., & Boulton, A. (2015). Crowd-sourced augmented realities: Social media and the power of digital representation. In Mediated geographies and geographies of media (pp. 223-240). Springer Netherlands.

Zook, M., Kraak, M. J., & Ahas, R. (2015). Geographies of mobility: applications of location-based data. International Journal of Geographical Information Science, 29(11), 1935-1940.
