A dataset of Ya’an Earthquake based on social media

Tian Chuanzhao1,2, Li Guoqing2*, Yang Tengfei1,2, Li Zhenyu3

1. University of Chinese Academy of Sciences, Beijing 100049, P. R. China; 2. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, P. R. China; 3. Shandong University of Science and Technology, Qingdao 266590, P. R. China

* Email: [email protected]

China Scientific Data, Vol.3, No.2, 2018
ARTICLE DOI: 10.11922/csdata.2018.0004.en

DATA DOI: 10.11922/sciencedb.560
SUBJECT CATEGORY: Social sciences
RECEIVED: February 2, 2018
RELEASED: March 14, 2018
PUBLISHED: June 22, 2018

Abstract: The Ya’an Earthquake occurred on April 20, 2013 (Beijing time). Its epicenter was located in Lushan County, Ya’an City, Sichuan Province, and its magnitude was 7.0. As of 14:30, April 24, the earthquake had left 196 people dead, 21 missing and 11470 injured. With the development of information and communication technologies, microblogs show great potential for supporting emergency response, as they provide an easily accessible platform on which disaster information can be assembled and rapidly disseminated to a large audience. In view of this, we built a dataset of the Ya’an Earthquake based on Sina-Weibo microblogs posted within Sichuan Province during the 7 days after its occurrence. Sina-Weibo, a platform for information sharing and exchange, entertainment, leisure and life services, was launched in August 2009. It provides a platform where the public can communicate, express their feelings, offer suggestions, and so on, which makes it essential for earthquake data search, query and publishing.

Keywords: Ya’an Earthquake; Sina-Weibo; Sichuan Province; data mining

Dataset Profile

Chinese Title: 雅安地震灾情的社交媒体数据集
English Title: A dataset of Ya’an Earthquake based on social media
Data authors: Tian Chuanzhao, Li Guoqing, Yang Tengfei, Li Zhenyu
Corresponding data author: Li Guoqing
Time range: April 20 – 26, 2013
Geographical scope: Sichuan Province
Data volume: 51418 records (about 5MB)
Data format: .xls
Data service system:
Source of funding: National Key Research and Development Program of China (2016YFE0122600); International Partnership Program of Chinese Academy of Sciences (131C11KYSB20160061)
Dataset composition: The dataset consists of two parts: (1) “Data.rar” contains 21 tables of Sina-Weibo text data, and each table corresponds to a region; (2) “Classification sample.rar” is a sample subset illustrating the classification of the text data in “Data.rar”.

1. Introduction

Ya’an Earthquake [1]: According to the China Earthquake Networks Center, the Ya’an Earthquake occurred at 8:02, April 20, 2013 (Beijing time). The epicenter was located in Lushan County, Ya’an City (30.3N, 103.0E), at a depth of 13 km, and the earthquake had a magnitude of 7.0. As of 10:00, April 24, 2013, 4045 aftershocks had occurred, among which 103 were above magnitude 3, with the largest being 5.7. An area of 12500 km² around the epicenter was affected, involving 1.52 million people. According to the China Earthquake Administration, the earthquake had left 196 people dead, 21 missing and 11470 injured as of 14:30, April 24. Figure 1 shows the location of the earthquake.

Figure 1 Location of Ya’an Earthquake

Sina-Weibo [2], an information sharing and exchange platform that provides entertainment, leisure, and other life services for the public, was launched in August 2009. By the end of March 2013, Sina-Weibo had 536 million registered users, an annual increase of 6.6%, and its daily active users had grown to 49.8 million, up 7.8% from the end of 2012. Sina-Weibo provides timely updates about earthquake disasters. It is a platform where users are free to search and query, where government bodies can post up-to-date information about security and rescue, where the public can express feelings such as blessing, sadness, anger and anxiety, and where users can propose actions for the government to take. Figure 2 shows some earthquake information on Sina-Weibo.

Figure 2 Earthquake information obtained from Sina-Weibo

There is growing evidence [8–11] that the public looks for disaster information most intensively during a certain period after a disaster occurs, irrespective of the sources [3,6]. As citizens can both access and post disaster information on open social platforms, such information constitutes a key part of an effective response to a major disaster. Research on this topic began earlier abroad than in China. Glaser [4] analyzed Twitter data during the 2007 California Wildfires. Vieweg et al. [5] studied Twitter data for the 2009 Red River Floods and the 2009 Oklahoma Grassfires. These studies suggest that Twitter had already become an effective channel for real-time updates. In China, scholars have also studied the application of microblogs in formulating disaster response. Qu et al. [6] analyzed people’s responses to the 2008 Sichuan Earthquake based on Tianya Forum data, and Qu et al. [7] analyzed people’s responses to the 2010 Yushu Earthquake based on Sina-Weibo microblogs.

2. Data collection and processing

2.1 Overview

Using “Ya’an Earthquake” as the keyword, we searched Sina-Weibo for text data posted within Sichuan Province during April 20 – 26, 2013. Each data record included the microblog content, time created, number of forwards, number of likes, number of comments and other information. We crawled the data city by city, covering all 21 cities of Sichuan Province. Because Sina-Weibo returns at most 1000 records per search, we also had to choose a time interval for each crawl. The volume of data peaks within the 72 hours after an earthquake, a period known as the golden relief time, so during this period we collected the Sina-Weibo data posted from every city of Sichuan Province at hourly intervals. For other periods in which messages were posted in particularly large quantities, we crawled data every few hours; for periods in which the volume was small, we crawled data every few days. The data collected at each time interval was then stored in the corresponding data table.

We analyzed various counts of the 51418 earthquake-related messages collected within the week after the earthquake: the number of messages posted each day (blue line in Figure 3), the number of messages forwarded (red line), the number of messages commented on (green line) and the number of messages liked (orange line).
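
The choice of crawl intervals under the 1000-records-per-search limit can be illustrated with a small sketch. It is not the crawler used for this dataset: instead of the fixed hourly, few-hour and few-day schedule described above, it halves a time range until each window is expected to stay under the cap. The callback `count_posts`, the one-hour minimum window and the toy volume model (600 posts per hour during the first 72 hours, 20 per hour afterwards) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

SEARCH_CAP = 1000  # per-search record limit stated in the text
Window = Tuple[datetime, datetime]


def plan_windows(start: datetime, end: datetime,
                 count_posts: Callable[[datetime, datetime], int],
                 min_span: timedelta = timedelta(hours=1)) -> List[Window]:
    """Halve [start, end) until each window is under the cap or min_span long."""
    if count_posts(start, end) < SEARCH_CAP or end - start <= min_span:
        return [(start, end)]
    mid = start + (end - start) / 2
    return (plan_windows(start, mid, count_posts, min_span)
            + plan_windows(mid, end, count_posts, min_span))


# Toy volume model with made-up rates (NOT measured from the dataset).
QUAKE = datetime(2013, 4, 20, 8, 2)      # occurrence time given in the text
PEAK_END = QUAKE + timedelta(hours=72)   # end of the "golden relief time"


def toy_count(start: datetime, end: datetime) -> int:
    """Hypothetical stand-in for a search that reports how many posts match."""
    def hours(a: datetime, b: datetime) -> float:
        return max((b - a).total_seconds(), 0.0) / 3600
    peak = hours(start, min(end, PEAK_END))
    tail = hours(max(start, PEAK_END), end)
    return int(600 * peak + 20 * tail)


windows = plan_windows(QUAKE, QUAKE + timedelta(days=7), toy_count)
print(len(windows), "search windows; shortest:", min(b - a for a, b in windows))
```

With this toy model the windows are short during the first 72 hours and much longer afterwards, which mirrors the denser sampling the text describes for the peak period.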

Figure 3 Counts of Ya’an Earthquake-related messages (series: All, ForwardCount, CommentCount, LikeCount; x-axis: 04/20/2013 to 04/26/2013; y-axis: 0 to 30000)
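
The daily counts plotted in Figure 3 could be reproduced along the following lines. This is a minimal sketch under stated assumptions: records are plain dictionaries keyed by the field names of the sample data entry (Time, ForwardCount, CommentCount, LikeCount), timestamps use the `YYYY-MM-DD HH:MM` form seen in that entry, and the forward, comment and like series are taken to be per-day sums of those fields. Only the first sample record comes from the paper; the other two are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

COUNT_FIELDS = ("ForwardCount", "CommentCount", "LikeCount")


def daily_counts(records):
    """Return {date: {"All": n, "ForwardCount": ..., ...}} sorted by date."""
    totals = defaultdict(lambda: {"All": 0, **{f: 0 for f in COUNT_FIELDS}})
    for rec in records:
        day = datetime.strptime(rec["Time"], "%Y-%m-%d %H:%M").date()
        row = totals[day]
        row["All"] += 1                      # one more message posted that day
        for field in COUNT_FIELDS:
            row[field] += int(rec[field])    # assumed: sum of per-message counts
    return dict(sorted(totals.items()))


sample = [
    # Values of the sample data entry shown in the paper.
    {"Time": "2013-04-21 18:38", "ForwardCount": 10, "CommentCount": 5, "LikeCount": 1},
    # Hypothetical records, for illustration only.
    {"Time": "2013-04-21 09:15", "ForwardCount": 3, "CommentCount": 0, "LikeCount": 2},
    {"Time": "2013-04-20 08:30", "ForwardCount": 120, "CommentCount": 40, "LikeCount": 15},
]

daily = daily_counts(sample)
for day, row in daily.items():
    print(day, row)
```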

2.2 Data classification

We asked what types of messages people posted on Sina-Weibo in response to the earthquake. To answer this question, we randomly sampled 200 microblog messages for analysis. We identified six categories of content: emotion-related, opinion-related, action-related, situation updates, general information and others. Table 1 summarizes the categories.

Table 1 Classification of Sina-Weibo messages
Category               Description
Emotion-related        Expressing personal feelings such as blessing, sadness, anger, anxiety, etc.
Opinion-related        Criticizing or providing suggestions to the public, the government or rescue agencies.
Action-related         Requesting help, looking for missing people, or proposing relief actions or relief coordination.
Situation Updates      Updating factual information about the earthquake.
General Information    Any other earthquake relief-related information.
Others                 Other earthquake-related information.

We applied the categories to the sampled Sina-Weibo messages (Figure 4). Of these, 42% were emotion-related, 21% action-related, 14% situation updates, 8% general information, 4% opinion-related, and 11% other messages on the earthquake.

Figure 4 Sample data classification (pie chart labels: Emotion-related 42%; Action-related 21%; Situation Updates 14%; Others 11%; Opinion-related 3%; General …)
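
Category shares like those reported for the sample can be computed from a list of manual labels. In the sketch below, the per-category counts are illustrative only: they are back-calculated from the percentages given in the text under the assumption that all 200 sampled messages were labelled, and they are not taken from “Classification sample.rar”.

```python
from collections import Counter


def category_shares(labels):
    """Return {category: share in whole percent}, largest category first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: round(100 * n / total) for cat, n in counts.most_common()}


# Hypothetical label counts chosen to match the percentages in the text.
assumed_counts = {
    "Emotion-related": 84,
    "Action-related": 42,
    "Situation Updates": 28,
    "Others": 22,
    "General Information": 16,
    "Opinion-related": 8,
}
labels = [cat for cat, n in assumed_counts.items() for _ in range(n)]

shares = category_shares(labels)
for cat, pct in shares.items():
    print(f"{cat}: {pct}%")
```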

3. Sample description

The data retrieved from Sina-Weibo was stored in 21 tables, each corresponding to a city. Each data entry records the ID, content, location, time, forwardCount, commentCount, likeCount, keyword, province and city of the microblog posted. Table 2 shows a sample data entry.

Table 2 Sample data entry
Field Name      Description
ID              2231
Content         #Earthquake Live# 7.0 Ya’an Earthquake of Lushan: As of 18:00, April 21, there were 1642 aftershocks, including 78 of magnitude 3.0 and above: 4 of magnitude 5.0 and above, 18 of magnitude 4.0 – 4.9, and 56 of magnitude 3.0 – 3.9. The largest occurred at 5.45 pm, April 21 at Lushan. The 5.4-magnitude aftershock occurred at the junction of the two peaks.
Location        –
Time            2013-04-21 18:38
ForwardCount    10
CommentCount    5
LikeCount       1
Keyword         Ya’an Earthquake
Province        Sichuan
City            Chengdu
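
For downstream analysis, a data entry can be given an explicit type. The following sketch assumes that each row has already been read from the .xls tables into a dictionary keyed by the field names above (the reading step is omitted), that timestamps follow the `YYYY-MM-DD HH:MM` form of the sample, and that a dash in the Location field means the location is missing. The class name `WeiboRecord` and the function `parse_row` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class WeiboRecord:
    id: int
    content: str
    location: Optional[str]   # None when the source field is empty or a dash
    time: datetime
    forward_count: int
    comment_count: int
    like_count: int
    keyword: str
    province: str
    city: str


def parse_row(row: dict) -> WeiboRecord:
    """Convert one table row (field name -> cell text) into a typed record."""
    location = str(row["Location"]).strip()
    return WeiboRecord(
        id=int(row["ID"]),
        content=str(row["Content"]),
        location=None if location in ("", "-", "–") else location,
        time=datetime.strptime(str(row["Time"]), "%Y-%m-%d %H:%M"),
        forward_count=int(row["ForwardCount"]),
        comment_count=int(row["CommentCount"]),
        like_count=int(row["LikeCount"]),
        keyword=str(row["Keyword"]),
        province=str(row["Province"]),
        city=str(row["City"]),
    )


# The sample data entry, with the content abbreviated.
row = {
    "ID": "2231",
    "Content": "#Earthquake Live# 7.0 Ya'an Earthquake of Lushan: ...",
    "Location": "–",
    "Time": "2013-04-21 18:38",
    "ForwardCount": "10",
    "CommentCount": "5",
    "LikeCount": "1",
    "Keyword": "Ya'an Earthquake",
    "Province": "Sichuan",
    "City": "Chengdu",
}
rec = parse_row(row)
print(rec.id, rec.time.isoformat(), rec.city, rec.forward_count)
```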

4. Quality control and assessment

When the body of a retrieved message had been removed from Sina-Weibo, we removed the corresponding entry from our dataset. Entries without time information were also removed during quality control. In addition, entries that contained only hyperlinks or no valuable information were removed. An example is shown below: “# Ya’an earthquake in Sichuan # # microblogging topic details: web links, # Ya’an earthquake in Sichuan # Details: web links, # Ya’an 7 earthquake # # microblogging topic details: web link, # Ya’an 7 earthquake # Details: Web links, # Ya’an 7 earthquake # #, Ya’an earthquake microblogging reported safe # #.”
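
Two of the rules above, dropping entries without time information and dropping entries that contain nothing but hyperlinks, lend themselves to a simple automated check. The sketch below is an approximation, not the procedure used for this dataset: the two regular expressions and the criterion “some letter or digit remains after removing URLs and #topic# tags” are assumptions, and judging whether a message is valuable would still require manual review.

```python
import re

URL_RE = re.compile(r"https?://\S+")   # assumed hyperlink pattern
TOPIC_RE = re.compile(r"#[^#]*#")      # assumed Weibo-style #topic# tag


def has_content(text: str) -> bool:
    """True if any letter or digit remains after removing URLs and topic tags."""
    stripped = TOPIC_RE.sub("", URL_RE.sub("", text))
    return any(ch.isalnum() for ch in stripped)


def passes_quality_control(rec: dict) -> bool:
    """Keep an entry only if it has a timestamp and some remaining text."""
    if not (rec.get("Time") or "").strip():
        return False
    return has_content(rec.get("Content") or "")


# Hypothetical entries, for illustration only.
entries = [
    {"Time": "2013-04-21 18:38", "Content": "#Earthquake Live# 1642 aftershocks recorded"},
    {"Time": "", "Content": "Road to Lushan is blocked"},
    {"Time": "2013-04-20 09:00", "Content": "#Ya'an earthquake# http://t.cn/abc"},
]
kept = [e for e in entries if passes_quality_control(e)]
print(len(kept), "of", len(entries), "entries kept")
```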

5. Value and significance

As time goes by, some of the messages retrieved have been deleted by their authors, so some valuable messages posted at the time can no longer be accessed on the platform. As the only dataset that collects information about the 2013 Ya’an Earthquake, this dataset provides an essential Sina-Weibo resource for studying social media responses to the earthquake at that time. Sina-Weibo provides a platform through which the public can communicate with others. With the development of the Internet in recent years, the number of mobile application users has surged, and people are increasingly concerned with hot news and events. Sina-Weibo, as a major Chinese microblogging platform, plays a crucial role in the search for and dissemination of such information, especially in the event of an earthquake. The dataset can be used by researchers to study which types of information are most readily forwarded, commented on and liked, how information is disseminated, how message content can be categorized, and so on.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (2016YFE0122600). We thank Dr. Pang Lushen from the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences for his suggestions on the collection of this dataset. We also thank Li Zhenyu from Shandong University of Science and Technology for his support with data processing.

References

1. Ya’an Earthquake, available at: .
2. Sina-Weibo, available at: .
3. Sutton J, Palen L & Shklovski I. Backchannels on the front lines: Emergent use of social media in the 2007 Southern California Fires, Proceedings of the Information Systems for Crisis Response and Management Conference (ISCRAM 2008), Washington, DC, 2008.
4. Glaser M. California wildfire coverage by local media, blogs, Twitter, maps and more. PBS MediaShift. Available at: .
5. Vieweg S, Hughes AL, Starbird K et al. Microblogging during two natural hazards events: What Twitter may contribute to situational awareness, Proc. CHI (2010): 1079 – 1088.
6. Qu Y, Wu PF & Wang X. Online community response to major disaster: A study of Tianya Forum in the 2008 Sichuan Earthquake, Proc. HICSS, 2009.
7. Qu Y, Huang C, Zhang P et al. Microblogging after a major disaster in China: A case study of the 2010 Yushu Earthquake, Proc. CSCW, 2011.
8. Li J, He Z, Plaza J et al. Social media: New perspectives to improve remote sensing for emergency response, Proceedings of the IEEE 105 (2017): 1900 – 1912.
9. Reuter C, Hughes AL & Kaufhold MA. Social media in crisis management: An evaluation and analysis of crisis informatics research, International Journal of Human–Computer Interaction 34 (2018): 280 – 294.
10. Williams BD, Valero JN & Kim K. Social media, trust, and disaster: Does trust in public and nonprofit organizations explain social media use during a disaster? Quality & Quantity 52 (2018): 537 – 550.
11. Park HW. YouTubers’ networking activities during the 2016 South Korea earthquake, Quality & Quantity 52 (2018): 1057 – 1068.

Data citation

1. Tian C, Li G, Yang T et al. A dataset of Ya’an Earthquake based on social media. Science Data Bank. DOI: 10.11922/sciencedb.560

Authors and contributions

Tian Chuanzhao, PhD; research area: disaster data mining. Contribution: social media data collection and analysis, writing.

Li Guoqing, PhD, Professor; research area: geospatial data infrastructure, remote sensing, big data. Contribution: advice on dataset design and data check, writing.

Yang Tengfei, PhD; research area: natural language processing, disaster information mining. Contribution: motivation of the research, writing.

Li Zhenyu, MSc; research area: data mining. Contribution: data processing.

How to cite this article: Tian C, Li G, Yang T et al. A dataset of Ya’an Earthquake based on social media. China Scientific Data 3 (2018), DOI: 10.11922/csdata.2018.0004.en
