Sentiment Analysis of Twitter Data Evan L
Total Page:16
File Type:pdf, Size:1020Kb
Air Force Institute of Technology AFIT Scholar Theses and Dissertations Student Graduate Works 3-1-2018 Sentiment Analysis of Twitter Data Evan L. Munson Follow this and additional works at: https://scholar.afit.edu/etd Part of the Computer and Systems Architecture Commons Recommended Citation Munson, Evan L., "Sentiment Analysis of Twitter Data" (2018). Theses and Dissertations. 1853. https://scholar.afit.edu/etd/1853 This Thesis is brought to you for free and open access by the Student Graduate Works at AFIT Scholar. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of AFIT Scholar. For more information, please contact [email protected]. Sentiment Analysis of Twitter Data THESIS Evan L. Munson, Captain, USA AFIT-ENS-MS-18-M-148 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force Base, Ohio DISTRIBUTION STATEMENT A APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. The views expressed in this document are those of the author and do not reflect the official policy or position of the United States Air Force, the United States Army, the United States Department of Defense or the United States Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States. AFIT-ENS-MS-18-M-148 SENTIMENT ANALYSIS OF TWITTER DATA THESIS Presented to the Faculty Department of Operational Sciences Graduate School of Engineering and Management Air Force Institute of Technology Air University Air Education and Training Command in Partial Fulfillment of the Requirements for the Degree of Master of Science in Operations Research Evan L. Munson, B.S., M.S. Captain, USA 22 March 2018 DISTRIBUTION STATEMENT A APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. AFIT-ENS-MS-18-M-148 SENTIMENT ANALYSIS OF TWITTER DATA THESIS Evan L. Munson, B.S., M.S. Captain, USA Committee Membership: LTC C. M. Smith, Ph.D. Chair Dr. B. C. Boehmke Member AFIT-ENS-MS-18-M-148 Abstract The rapid expansion and acceptance of social media has opened doors into users opinions and perceptions that were never as accessible as they are with today's preva- lence of mobile technology. Harvested data, analyzed for opinions and sentiment can provide powerful insight into a population. This research utilizes Twitter data due to its widespread global use, in order to examine the sentiment associated with tweets. An approach utilizing Twitter #hashtags and Latent Dirichlet Allocation topic mod- eling were utilized to differentiate between tweet topics. A lexicographical dictionary was then utilized to classify sentiment. This method provides a framework for an ana- lyst to ingest Twitter data, conduct an analysis and provide insight into the sentiment contained within the data. iv AFIT-ENS-MS-18-M-148 To my amazing Wife and Family, who supported me through this entire endeavor. I love and appreciate you more than I can ever express in words To my God, who through him all things possible. -ELM v Acknowledgements I would like to express my gratitude to my faculty research advisor, LTC Christo- pher Smith, Ph.D., for all of his guidance, patience and assistance. I would like to thank all of my peers and friends that have assisted me through the coursework and analysis needed for this analysis. Evan L. Munson vi Table of Contents Page Abstract . iv Acknowledgements . vi List of Figures . .x List of Tables . xii I. Introduction . .1 1.1 Background . .1 1.2 Problem Statement . .3 1.3 Approach . .3 1.4 Research Objectives . .3 1.5 Assumptions . .4 1.6 Summary . .5 II. Literature Review . .6 2.1 Overview . .6 2.1.1 Validation . .8 2.2 Social Media . 10 2.2.1 Availability . 11 2.2.2 Social Media Benefits . 11 2.2.3 Social Media Challenges . 12 2.2.4 Approaches to Utilize Social Media . 16 2.3 Text Mining . 17 2.3.1 Definition . 17 2.3.2 Application . 17 2.4 Sentiment Analysis . 17 2.4.1 Definition . 17 2.4.2 Application . 18 2.4.3 Challenges . 19 2.4.4 Feature Selection . 20 2.4.5 Lexicographical . 20 2.5 Languages . 21 2.5.1 Internet Languages . 21 2.5.2 WordNet . 22 2.5.3 SentiWordNet . 23 2.5.4 NRC. 24 2.5.5 Bing . 26 2.5.6 AFINN . 27 vii Page 2.6 Topic Modeling . 28 2.6.1 Latent Dirichlet Allocation . 28 2.6.2 Number of Topics . 29 III. Methodology . 33 3.1 Overview . 33 3.2 Twitter API . 33 3.3 Cleaning, Tidying, and Exploration . 35 3.3.1 Cleaning . 35 3.3.2 Tidying . 37 3.3.3 Exploration . 37 3.4 Topic Modeling . 39 3.5 Sentiment Scoring . 40 3.6 Visualizations . 42 3.7 Lexicon Comparison . 43 IV. Analysis . 46 4.1 Analysis Datasets . 46 4.2 North Korea . 47 4.2.1 Data Exploration . 48 4.2.2 #Hashtag Sentiment Analysis . 52 4.2.3 Topic Analysis Sentiment Analysis . 57 4.3 Protest . 63 4.3.1 Data Exploration . 63 4.3.2 #Hashtag Sentiment Analysis . 67 4.3.3 Topic Analysis Sentiment Analysis . 73 4.4 Polling Comparison . 77 V. Conclusions and Future Research . 80 5.1 Conclusion . 80 5.1.1.