Analysis of Public Opinion During Anthrax Scares, The
Total Page:16
File Type:pdf, Size:1020Kb
ANTHRAX EVENT DETECTION USING TWITTER: ANALYSIS OF PUBLIC OPINION DURING ANTHRAX SCARES, THE MUELLER INVESTIGATION, AND NORTH KOREAN THREATS A dissertation submitted in partial fulfillment of the Requirements for the degree of Doctor of Philosophy By MICHELE E MILLER MS, Wright State University, 2016 BS, University of Indianapolis, 2014 2020 Wright State University WRIGHT STATE UNIVERSITY GRADUATE SCHOOL December 9th, 2020 I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY Michele E Miller ENTITLED Anthrax Event Detection: Analysis of Public Opinion Using Twitter During Anthrax Scares, The Mueller Investigation, and North Korean Threats BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy. __________________________ William Romine, Ph.D Dissertation Director __________________________ Don Cipollini Director, Environmental Sciences Ph.D. Program __________________________ Barry Milligan, Ph.D. Interim Dean of the Graduate School Committee on Final Examination: ________________________________ Terry Oroszi, Ph.D. ________________________________ Nancy Bigley, Ph.D. ________________________________ Dawn Wooley, Ph.D. ________________________________ Cassie Barlow, Ph.D. Abstract Miller, Michele E. Ph.D. Environmental Sciences Ph.D. Program, Wright State University, 2020. Anthrax Event Detection: Analysis of Public Opinion Using Twitter During Anthrax Scares, The Mueller Investigation, and North Korean Threats. When people allow fear to drive their decision making, they often make decisions that do more harm than good. Examples of this include stocking up on ciprofloxacin, flooding doctors’ offices and buying black market antibiotics after the anthrax attacks of 2001. Therefore, it is important to be able to address what people are saying when another anthrax attack occurs. Supervised and unsupervised machine learning methodologies can be utilized to detect an event, classify the tweets by event, and to determine the main topics of discussion. Over the period of data collection, twenty events were detected. Three of these events concerned North Korean Threats, six discussed The Mueller Investigation, and three concerned Anthrax Scares. Other events included natural outbreaks in hippos and cattle, a conspiracy theory about Matt DeHart, an article on how long anthrax remains in the soil, wishing someone had anthrax, and tweets from those affected by the anthrax attacks. Parts of speech, unigrams, hashtags, URL’s and at- mentions were all important for classifying tweets. These methods can be used on other social media sources and can detect other terrorism events. The Mueller Investigation demonstrated that people do not forget past failings of the Federal Bureau of Investigation (FBI) and continue to distrust them because of these failings. Anthrax scares indicate people use past scares to determine how to react to current scares. North Korean threats showed that people are fearful of new threats but stopped talking about them quickly after the story broke. iii Table of Contents Introduction ......................................................................................................................... 1 Affect Heuristic of Fear ....................................................................................................2 Event Detection ................................................................................................................3 Terrorism and WMDs ......................................................................................................3 Biological Terrorism ........................................................................................................5 Bacillus anthracis .............................................................................................................6 Social Media Demographics ............................................................................................8 Features of Twitter .........................................................................................................10 Introduction to Specific Aims ........................................................................................... 11 Review of Literature ......................................................................................................... 13 Events Leading Up to Data Collection ...........................................................................13 Definitions and Types of Terrorism ...............................................................................14 Chemical agents .........................................................................................................15 Radiological agents ....................................................................................................15 Nuclear agents ............................................................................................................16 Explosives ..................................................................................................................17 Biological agents ........................................................................................................18 Event Detection ..............................................................................................................20 Terrorism ....................................................................................................................21 Anthrax ......................................................................................................................23 iv Methods............................................................................................................................. 28 Data Collection ...............................................................................................................29 Features for Classification ..............................................................................................30 Twitter Specific Features ...........................................................................................30 Preprocessing .............................................................................................................31 General features .........................................................................................................31 N-grams ......................................................................................................................32 Determination of Relevancy ...........................................................................................33 Inter-rater Reliability for Annotators and Classifiers ................................................34 Machine Learning Algorithms ...................................................................................35 Evaluation of Classifier Performance ........................................................................39 Addressing Specific Aim 1: How Do the Topics Discussed Each Month Between 9/25/2017 And 8/15/2018 Vary? ....................................................................................41 Latent Dirichlet allocation .........................................................................................41 Finding the Best Number of Topics ...........................................................................43 Addressing Specific Aim 2: How Does The Number Of Parts Of Speech, Words And Phrases Used, Hashtags, URLs, Retweets, And At-Mentions Vary Between Anthrax Event-Related Tweets And Tweets Not Concerning Anthrax Events? ..........................44 T-Test Between Means and Cohen’s D to Determine Important Features For Event Classification ..............................................................................................................45 Addressing Specific Aim 3: How Do The Topics That Emerge For Anthrax Scares, v The Mueller Investigation, and North Korean Threats Compare ...................................46 Results ............................................................................................................................... 48 Relevancy Classification ................................................................................................48 Feature Frequency ......................................................................................................48 Initial Event Detection ...............................................................................................50 Relevancy Gold Standard ..........................................................................................50 Addressing Specific Aim 1: How Do the Topics Discussed Each Month Between 9/25/2017 And 8/15/2018 Vary? ....................................................................................55 Analysis by Month .....................................................................................................56 Addressing Specific Aim 2: How Does The Number Of Parts Of Speech, Words And Phrases Used, Hashtags, URLs, Retweets, And At-Mentions Vary Between Anthrax Event-Related Tweets And Tweets Not Concerning Anthrax Events? ..........................85 Addressing Specific Aim 3: How Do The Topics That Emerge For Anthrax Scares, The Mueller Investigation, and North Korean Threats Compare ...................................90 Discussion ....................................................................................................................... 103 Anthrax Topics Over Time ..........................................................................................103