Ericka Menchen-Trevino
Ericka Menchen-Trevino 7/24/2020

© COPYRIGHT by Kurt Wirth July 23, 2020 ALL RIGHTS RESERVED

PREDICTING CHANGES IN PUBLIC OPINION WITH TWITTER: WHAT SOCIAL MEDIA DATA CAN AND CAN'T TELL US ABOUT OPINION FORMATION

by Kurt Wirth

ABSTRACT

With the advent of social media data, some researchers have claimed they have the potential to revolutionize the measurement of public opinion. Others have pointed to non-generalizable methods and other concerns to suggest that the role of social media data in the field is limited. Likewise, researchers remain split as to whether automated social media accounts, or bots, have the ability to influence conversations larger than those with their direct audiences. This dissertation examines the relationship between public opinion as measured by random sample surveys, Twitter sentiment, and Twitter bot activity. Analyzing Twitter data on two topics, the president and the economy, as well as daily public polling data, this dissertation offers evidence that changes in Twitter sentiment of the president predict changes in public approval of the president fourteen days later. Likewise, it shows that changes in Twitter bot sentiment of both the president and the economy predict changes in overall Twitter sentiment on those topics between one and two days later. The methods also reveal a previously undiscovered phenomenon by which Twitter sentiment on a topic moves counter to polling approval of the topic at a seven-day interval. This dissertation also discusses the theoretical implications of various methods of calculating social media sentiment. Most importantly, its methods were pre-registered so as to maximize the generalizability of its findings and avoid data cherry-picking or overfitting.

ACKNOWLEDGEMENTS

This dissertation is the culmination of years of mentorship by experts in their respective fields.
Most importantly, I could not have reached this lifelong goal if not for good people - people who are generous with their time, attention, and advice. No journey is easy and while mine hit its obstacles, some were there to once again stand me on my feet.

I'd first like to acknowledge Dr. Ericka Menchen-Trevino for her unsurpassed patience and generosity. Her willingness to support me and provide advice throughout the writing of this dissertation, my time at American University, and in the job market afterward proves her capability as a professional and, even further, her quality as a human being. I will forever be grateful for Ericka's friendship, hard-learned lessons, and dedication.

Likewise, Dr. W. Joseph Campbell spent a year truly investing in my work and encouraging me to push myself further. His willingness to treat me as a colleague helped me to believe in my research skills more strongly and, as a result, develop them further. Joe is generous, knowledgeable, and courteous and I consider myself lucky to have had him as a mentor.

To round out my committee, Dr. Ryan T. Moore has again and again dropped everything to build my programming and analysis skills. If it weren't for his genius and unshakable calm, this dissertation and my skill set would be a far cry from where they are. Likewise, Dr. David Karpf was among the first of my mentors to take a vested interest in my development by giving a random student his time at a local coffee shop. Not only have I grown as a researcher from his advice, I've grown as a person.

I'd also like to thank Beth Chunn of Rasmussen for so generously providing poll data for academic use. Her support of academic research should be a standard for all data collectors in the future. Appreciation to my American University cohort for their unfaltering emotional support, my parents and parents-in-law for their love, and the American University department staff for their quick and passionate work.
Lastly and most importantly, thanks and love to my husband Daniel Wild for his toleration and patience through four strenuous years of my graduate school education.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
   What Is Public Opinion and How Do We Measure It?
   The Origins of the Poll
   Random Sample Surveys in the Twentieth Century
   An Un-Narrowing Of Public Opinion Methods?
   Social Media as a Public Opinion Data Source
   The Strengths of Social Media Data
   The Challenges Of Using Social Media Data
   Predicting Elections, Mirroring, and Predicting Public Polls
   Inauthentic Communication and Public Opinion
   Summary

2. LITERATURE REVIEW
   Random Sample Surveys As The Public Opinion Gold Standard
   Found Data vs. Created Data
   Topic Coverage vs. Population Coverage
   Social Media as Public Opinion Data
   Predicting Elections
   Mirroring Public Polling
   Predicting Changes in Public Polling
   Power To The People?
   Bot Manipulation
   Methodology In Question
   Topics
   Measuring Twitter Sentiment
   Research Questions

3. METHODS
   Data Collection
   Analysis
   Conversation-Level Measurement
   Individual-Level Measurement
   Measuring Bot Influence
   Final Results

4. RESULTS AND DISCUSSION
   Measuring the Relationship Between Twitter Sentiment and Polling Data
   Per-Topic Results
   Effect Direction Over Time
   Time Lags
   Overview
   Measuring the Relationship Between Twitter Sentiment and Bot Sentiment
   Sentiment Measurement Techniques Compared
   Summary

5. CONCLUSION
   Predicting Polls
   Bot Influence
   The Three Step Flow
   Future Research
   Contributions

APPENDIX

A. RQ1 RESULTS VIA INDIVIDUAL-BASED SENTIMENT MEASUREMENT
B. RQ1 ROBUSTNESS CHECK
C. RQ2 RESULTS VIA INDIVIDUAL-BASED SENTIMENT MEASUREMENT
D. ABSOLUTE VALUES VS. CHANGE

LIST OF TABLES

Table

1. Corpus-based sentiment vs. poll regression coefficient estimate across lag times (standard error in parentheses)
2. Verified user sentiment vs. poll regression coefficient estimate across lag times (standard error in parentheses)
3. Corpus-based sentiment vs. bots regression coefficient estimate across lag times between bot sentiment and Twitter sentiment (standard error in parentheses)
4. Residual standard error by sentiment calculation method, topic, and research question, with prediction improvement when calculating sentiment by corpus rather than individual
5. Residual standard error improvement when using corpus-based vs. individual-based calculation
6. RQ1 results with individual data (standard error in parentheses)
7. RQ1 results with individual but no verified data (standard error in parentheses)
8. Individual-based sentiment vs. bots regression coefficient estimate across lag times between bot sentiment and Twitter sentiment (standard error in parentheses)
9. Individual-based sentiment without verified users vs. bots regression coefficient estimate across lag times between bot sentiment and Twitter sentiment (standard error in parentheses)

LIST OF FIGURES

Figure

1. Rune Karlsen's updated two-step flow featuring opinion leaders (OL) and passive individuals (P) (Karlsen, 2015).
2. Two-step flow including conversation among opinion leaders.
3. Data collection timeline
4. Conversation-based measurement process
5. Individual-based measurement process
6. Economic confidence as measured by polls and economic sentiment as measured by Twitter sentiment over time.
7. Presidential approval as measured by polls and Twitter sentiment over time.
8. Three-day rolling averages with zero lag of poll data and Twitter sentiment over time for presidential approval.
9. Three-day rolling averages with zero lag of poll data and Twitter sentiment over time for economic confidence.
10. Three-day rolling averages of presidential polling changes and predicted changes in presidential polling approval using fourteen-day and seven-day lagged Twitter sentiment changes.
11. Three-day rolling averages with zero lag of overall Twitter sentiment change and Twitter bot sentiment change over time for the presidential approval topic.
12. Three-day rolling averages with zero lag of overall Twitter sentiment change and Twitter bot sentiment change over time for the economic confidence topic.
13. Three-day rolling averages of Twitter sentiment and predicted change in Twitter sentiment when using only two-day lagged bot sentiment as a predictor.
14. Three-day rolling averages of overall Twitter sentiment change and Twitter bot sentiment change over time.
15. Absolute presidential approval (left Y axis) compared to presidential approval change (right Y axis).