Peter M. Landwehr Thesis Proposal Labeling, Analyzing, and Simulating Tweets Produced in Disasters Peter Maciunas Landwehr Computation, Organizations, and Society Institute for Software Resource School of Computer Science Carnegie Mellon University
[email protected] September 30, 2013 Abstract Relief organizations use social media to stay abreast of victims’ needs during disaster and to help in planning relief efforts. Organizations hire dedicated teams to find posts that contain substantive, actionable information. Simulation and training exercises play important roles in preparing for disaster; tools like HAZUS-MH and FlamMap help to predict the physical damages and costs that will result when hurricanes and wildfires strike. Yet few if any disaster simulations incorporate the ways that a disaster can cause an outpouring of comment and useful information on the web. In disasters such as the 2010 Haiti Earthquake, tweets and SMS messages have been translated and coded using crowdsourcing platforms to divide up labor. Machine learning has been used to parse tweets for keywords and then intuit earthquake epicenters from physical position. Researchers have developed several different coding schemes for tweets to identify different types of useful content. While unskilled coders can apply some schemes, others require nuanced, subjective judgments. I propose to address the problems posed by the dearth of training simulations and the difficulties of applying salient codings to tweets by developing a novel simulation of social media in disaster, experimenting with using crowdsourcing to apply a set of sophisticated labels, and using the coded to data to train a machine learning classifier. The new simulation will produce sequences of “pseudo-tweets” that consist of content labels and a collection of common features, such as hashtags and retweet indicators, as output.