
Ericka Menchen-Trevino

7/24/2020

© COPYRIGHT

by

Kurt Wirth

July 23, 2020

ALL RIGHTS RESERVED

PREDICTING CHANGES IN PUBLIC OPINION WITH TWITTER: WHAT SOCIAL MEDIA DATA CAN AND CAN’T TELL US ABOUT OPINION FORMATION

by

Kurt Wirth

ABSTRACT

With the advent of social media data, some researchers have claimed that such data have the potential to revolutionize the measurement of public opinion. Others have pointed to non-generalizable methods and other concerns to suggest that the role of social media data in the field is limited. Likewise, researchers remain split as to whether automated social media accounts, or bots, have the ability to influence conversations larger than those with their direct audiences. This dissertation examines the relationship between public opinion as measured by random sample surveys, Twitter sentiment, and Twitter bot activity. Analyzing Twitter data on two topics, the president and the economy, as well as daily public polling data, this dissertation offers evidence that changes in Twitter sentiment of the president predict changes in public approval of the president fourteen days later. Likewise, it shows that changes in Twitter bot sentiment of both the president and the economy predict changes in overall Twitter sentiment on those topics between one and two days later. The methods also reveal a previously undiscovered phenomenon by which Twitter sentiment on a topic moves counter to polling approval of the topic at a seven-day interval. This dissertation also discusses the theoretical implications of various methods of calculating social media sentiment. Most importantly, its methods were pre-registered so as to maximize the generalizability of its findings and avoid data cherry-picking or overfitting.

ACKNOWLEDGEMENTS

This dissertation is the culmination of years of mentorship by experts in their respective fields.

Most importantly, I could not have reached this lifelong goal if not for good people - people who are generous with their time, attention, and advice. No journey is easy and while mine hit its obstacles, some were there to once again stand me on my feet.

I’d first like to acknowledge Dr. Ericka Menchen-Trevino for her unsurpassed patience and generosity. Her willingness to support me and provide advice throughout the writing of this dissertation, my time at American University, and in the job market afterward proves her capability as a professional and, even further, her quality as a human being. I will forever be grateful for Ericka’s friendship, hard-learned lessons, and dedication. Likewise, Dr. W. Joseph Campbell spent a year truly investing in my work and encouraging me to push myself further. His willingness to treat me as a colleague helped me to believe in my research skills more strongly and, as a result, develop them further. Joe is generous, knowledgeable, and courteous and I consider myself lucky to have had him as a mentor.

To round out my committee, Dr. Ryan T. Moore has again and again dropped everything to build my programming and analysis skills. If it weren’t for his genius and unshakable calm, this dissertation and my skill set would be a far cry from where they are. Likewise, Dr. David Karpf was among the first of my mentors to take a vested interest in my development by giving a random student his time at a local coffee shop. Not only have I grown as a researcher from his advice, I’ve grown as a person.

I’d also like to thank Beth Chunn of Rasmussen for so generously providing poll data for academic use. Her support of academic research should be a standard for all data collectors in the future. Appreciation to my American University cohort for their unfaltering emotional support, my parents and parents-in-law for their love, and the American University department staff for their quick and passionate work.

Lastly and most importantly, thanks and love to my husband Daniel Wild for his toleration and patience through four strenuous years of my graduate school education.

TABLE OF CONTENTS

ABSTRACT ...... ii

ACKNOWLEDGEMENTS ...... iii

LIST OF TABLES ...... vii

LIST OF FIGURES ...... viii

CHAPTER

1. INTRODUCTION ...... 1

What Is Public Opinion and How Do We Measure It? ...... 5

The Origins of the Poll ...... 6

Random Sample Surveys in the Twentieth Century ...... 7

An Un-Narrowing Of Public Opinion Methods? ...... 8

Social Media as a Public Opinion Data Source ...... 9

The Strengths of Social Media Data ...... 9

The Challenges Of Using Social Media Data ...... 10

Predicting Elections, Mirroring, and Predicting Public Polls ...... 10

Inauthentic Communication and Public Opinion ...... 12

Summary ...... 13

2. LITERATURE REVIEW ...... 15

Random Sample Surveys As The Public Opinion Gold Standard ...... 15

Found Data vs. Created Data ...... 16

Topic Coverage vs. Population Coverage ...... 17

Social Media as Public Opinion Data ...... 18

Predicting Elections ...... 19


Mirroring Public Polling ...... 22

Predicting Changes in Public Polling ...... 23

Power To The People? ...... 24

Bot Manipulation ...... 25

Methodology In Question ...... 27

Topics ...... 28

Measuring Twitter Sentiment ...... 28

Research Questions ...... 30

3. METHODS ...... 33

Data Collection ...... 33

Analysis ...... 37

Conversation-Level Measurement ...... 37

Individual-Level Measurement ...... 39

Measuring Bot Influence ...... 40

Final Results ...... 41

4. RESULTS AND DISCUSSION ...... 45

Measuring the Relationship Between Twitter Sentiment and Polling Data ...... 45

Per-Topic Results ...... 48

Effect Direction Over Time ...... 50

Time Lags ...... 51

Overview ...... 53

Measuring the Relationship Between Twitter Sentiment and Bot Sentiment ...... 54

Sentiment Measurement Techniques Compared ...... 56

Summary ...... 60

5. CONCLUSION ...... 61

Predicting Polls ...... 62

Bot Influence ...... 63

The Three Step Flow ...... 64

Future Research ...... 65

Contributions ...... 67

APPENDIX

A. RQ1 RESULTS VIA INDIVIDUAL-BASED SENTIMENT MEASUREMENT ...... 78

B. RQ1 ROBUSTNESS CHECK ...... 79

C. RQ2 RESULTS VIA INDIVIDUAL-BASED SENTIMENT MEASUREMENT ...... 80

D. ABSOLUTE VALUES VS. CHANGE ...... 81

LIST OF TABLES

Table Page

1. Corpus-based sentiment vs. poll regression coefficient estimate across lag times (standard error in parentheses) ...... 46

2. Verified user sentiment vs. poll regression coefficient estimate across lag times (standard error in parentheses) ...... 48

3. Corpus-based sentiment vs. bots regression coefficient estimate across lag times between bot sentiment and Twitter sentiment (standard error in parentheses) ...... 55

4. Residual standard error by sentiment calculation method, topic, and research question, with prediction improvement when calculating sentiment by corpus rather than individual ...... 59

5. Residual standard error improvement when using corpus-based vs. individual-based calculation ...... 60

6. RQ1 results with individual data (standard error in parentheses) ...... 78

7. RQ1 results with individual but no verified data (standard error in parentheses) . . . . . 78

8. Individual-based sentiment vs. bots regression coefficient estimate across lag times between bot sentiment and Twitter sentiment (standard error in parentheses) ...... 80

9. Individual-based sentiment without verified users vs. bots regression coefficient estimate across lag times between bot sentiment and Twitter sentiment (standard error in parentheses) 80

LIST OF FIGURES

Figure Page

1. Rune Karlsen’s updated two-step flow featuring opinion leaders (OL) and passive individuals (P) (Karlsen, 2015)...... 30

2. Two-step flow including conversation among opinion leaders...... 31

3. Data collection timeline ...... 34

4. Conversation-based measurement process ...... 38

5. Individual-based measurement process ...... 40

6. Economic confidence as measured by polls and economic sentiment as measured by Twitter sentiment over time...... 47

7. Presidential approval as measured by polls and Twitter sentiment over time...... 47

8. Three-day rolling averages with zero lag of poll data and Twitter sentiment over time for presidential approval...... 52

9. Three-day rolling averages with zero lag of poll data and Twitter sentiment over time for economic confidence...... 53

10. Three-day rolling averages of presidential polling changes and predicted changes in presidential polling approval using fourteen-day and seven-day lagged Twitter sentiment changes. 53

11. Three-day rolling averages with zero lag of overall Twitter sentiment change and Twitter bot sentiment change over time for the presidential approval topic...... 56

12. Three-day rolling averages with zero lag of overall Twitter sentiment change and Twitter bot sentiment change over time for the economic confidence topic...... 56

13. Three-day rolling averages of Twitter sentiment and predicted change in Twitter sentiment when using only two-day lagged bot sentiment as a predictor...... 57

14. Three-day rolling averages of overall Twitter sentiment change and Twitter bot sentiment change over time...... 57

15. Absolute presidential approval (left Y axis) compared to presidential approval change (right Y axis)...... 81


16. Absolute economic confidence (left Y axis) compared to economic confidence change (right Y axis)...... 82

CHAPTER 1

INTRODUCTION

Research suggests users of Twitter, a social media service featuring real-time, limited-character posts, may act as opinion leaders in non-Twitter spaces (Karlsen, 2015). As Twitter data seem to be able to predict changes in public polling (Jensen and Anstead, 2013; Ceron et al., 2014; Jungherr et al., 2017), Twitter users may be at the forefront of public opinion. As journalists are more likely to engage with one another than with the general public on Twitter (Molyneux and Mourao, 2019), changes in the information environment within Twitter may influence coverage in other media (Harder et al., 2017). However, research into how Twitter mediates the construction and evolution of public opinion has suffered from cherry-picking of data and results (applying only data or results that show desired outcomes) (Tjong Kim Sang and Bos, 2012) as well as from overfitting (applying methods to a specific and non-generalizable data set) and otherwise non-generalizable methods (Tumasjan et al., 2010; Fu and Chan, 2013; Beauchamp, 2017). Because of the lack of generalizability in the field, some researchers have argued that social media data have little to no role in the measurement and understanding of public opinion (Taylor, 2013; Jungherr et al., 2012; Gayo-Avello, 2012). Better understanding the distinct role that conversations on Twitter play in the formation and evolution of public opinion could illuminate how influence moves through society. This dissertation is designed to investigate this question through rigorous and pre-registered (Nosek et al., 2015) methods that should remain generalizable for later studies.

In the modern day, opinion leaders may gather messages from across the informational landscape, of which Twitter is only a small part, and then choose how to amplify those messages to their respective audiences. This process would resemble the classic two-step flow introduced by Lazarsfeld and Katz in the mid-twentieth century (Lazarsfeld et al., 1944). If opinion leaders continue to play as important a role as they did at the time of the birth of the two-step model, Twitter would prove a useful tool by which to see early signs of shifts in public opinion.

Alternatively, some opinion leaders may use new, interactive communication technologies like Twitter in order to form their opinions among other opinion leaders, rather than the previous model of opinion self-creation on the part of the opinion leader, before amplifying them to their respective audiences. If evidence of this process were found, the two-step flow would need to expand to include a preliminary step wherein opinion leaders engage in dialogue and establish their respective opinions. Twitter, rather than simply being a loudspeaker for opinion leader perspectives, would instead be seen as an important research tool for better understanding the formation of public opinion through opinion leader discourse.

Twitter, however, likely isn’t representative of opinion leaders. Not all of society’s opinion leaders have Twitter accounts and even fewer post to Twitter about political topics. The reverse, that not all Twitter users who post about political topics are opinion leaders in off-Twitter situations, is also likely true. Measuring only the behaviors of those deemed by Twitter itself as important individuals, identified by the so-called verified status, may provide a distilled perspective on what relationship, if any, opinion leaders within the Twitter environment have with overall public opinion.

Lastly, Twitter users may be more likely to follow and engage with news and current issues but, counter to Karlsen’s research, they may not have a significant role in the opinion formation of their peers in off-Twitter spaces. In this case, Twitter users would be seen as a non-representative sample of American public opinion who remain as influenced by media messages as their peers. Near-zero lag between changes in Twitter sentiment and changes in public opinion would be detected, bolstering the one-step flow theory proposed by Bennett and Manheim (Bennett and Manheim, 2006) that suggests the erosion of opinion leadership due to direct access to individual targeting by message producers.

What role social media data can have in the field of public opinion remains in question, however, as random sample public polling has gradually become a mainstay method in the field since the 1936 U.S. presidential election (Allport, 1937). The random sample survey method, in which surveys are administered at regular intervals to small, randomly selected samples of a national population (e.g., the United States), produces probability samples. On Twitter, a global platform where demographic and geographic information is scarce and national random samples thus impossible, other methods have emerged to measure opinions. Researchers must analyze the sentiment of Twitter conversations by collecting non-probability samples. The applicability of Twitter sentiment data to the measurement of public opinion is thus limited by generalizability issues, including that only those who tweet about a topic are measured and that the sentiment of a tweet is not necessarily reflective of the author’s opinion of the tweet’s topic.
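To illustrate why even a small probability sample is so informative, recall the textbook margin of error for a simple random sample (included here only as an illustration, not as part of this dissertation's methods): for a sample of size $n$ estimating a proportion $p$, the 95% margin of error is approximately

$$\mathrm{MOE} \approx 1.96\sqrt{\frac{p(1-p)}{n}},$$

so a poll of roughly 1,000 respondents carries a margin of error of about plus or minus three percentage points at $p = 0.5$. No comparable guarantee attaches to a non-probability Twitter sample, however large it is.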

Measuring social media sentiment is itself a complex task. After gathering tweets on a topic, one could measure the sentiment of that topic by pooling all of the tweet text together as one so-called corpus and then measuring the corpus’s sentiment. Alternatively, one could separate the tweets by each author and measure the sentiment per individual. The researcher would then calculate the topic’s sentiment by averaging the individuals’ sentiment.
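To make the distinction concrete, the following minimal Python sketch scores the same set of tweets both ways. The tweets, the toy word-score lexicon, and the helper names are hypothetical illustrations, not the dissertation's actual sentiment tool; the point is only that corpus-based scoring implicitly weights prolific authors more heavily than individual-based averaging does.

```python
from collections import defaultdict
from statistics import mean

# Toy lexicon for illustration only; a real analysis would use a full
# sentiment lexicon or tool rather than these made-up word scores.
LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

def score_text(text):
    """Mean lexicon score of the scored words in a text (0 if none match)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return mean(hits) if hits else 0.0

# Hypothetical tweets, each tagged with its author.
tweets = [
    {"author": "a", "text": "The economy is doing great"},
    {"author": "a", "text": "Job numbers look good"},
    {"author": "b", "text": "This economy is terrible"},
]

# Corpus-based: pool every tweet into one document and score it once.
corpus_sentiment = score_text(" ".join(t["text"] for t in tweets))

# Individual-based: score each author's pooled tweets, then average across authors.
by_author = defaultdict(list)
for t in tweets:
    by_author[t["author"]].append(t["text"])
individual_sentiment = mean(score_text(" ".join(texts)) for texts in by_author.values())

print(corpus_sentiment, individual_sentiment)  # positive vs. negative: the prolific author dominates the corpus figure
```

Which of the two figures better tracks public polling is the comparison the later chapters use to adjudicate between the two-step and three-step interpretations discussed below.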

How Twitter sentiment is measured should shed light on how robust the two-step model remains in the age of social media. If a stronger correlation is found between public polling data and Twitter data where sentiment is categorized by individuals rather than entire corpuses, the results would suggest the two-step flow remains relevant despite recent technological and media use changes. Opinion leaders would be seen to individually parse media messages off-Twitter and bring them to the Twitter space to share with others. If instead a stronger correlation is found between public polling data and Twitter data where sentiment is categorized by entire corpuses of tweets rather than by individual user, the findings would suggest a third step is needed in the model wherein opinion leaders engage in discourse and settle on their own opinions among one another before disseminating their respective views to their networks. Previous research analyzing social media sentiment within the field of public opinion has measured sentiment as representing an entire corpus of tweets and thereby implied the existence of a third step within the two-step flow (Ceron et al., 2014; Jensen and Anstead, 2013; Jungherr et al., 2017; O’Connor et al., 2010; Tumasjan et al., 2010). If the assumption of previous research is correct, and a three-step flow of communication more accurately describes the modern-day flow of influence, any ability to influence the conversational third step would deserve special attention.

Automated accounts, or bots, which can network with other bots to influence conversations on Twitter, are perhaps the chief means by which actors may seek to influence the corpus. Bots have been shown to be able to shape other users’ perceptions of media coverage (Bessi and Ferrara, 2016) and change their audience’s attitudes and behaviors (Aiello et al., 2012). If bots are able to influence online conversation wherein opinion leaders are continually forming their opinions, bot creators could have the ability to, at least in part, influence public opinion as a whole.

Understanding the flow of influence online between Twitter users and Twitter bots and how it correlates with public opinion is the goal of this dissertation. By measuring Twitter sentiment and comparing it to public polling, insight is gained as to how useful Twitter data are to public opinion research. While the field has reason to be skeptical of previous findings of correlations between social media data and election predictions or absolute measures of public polling, dedicated research is needed to explore further evidence that changes in Twitter sentiment seem to predict changes in public polling (Jungherr et al., 2017; Ceron et al., 2014; Jensen and Anstead, 2013; Barbera, 2016). If these findings are confirmed by research designed specifically to investigate the phenomenon, exploring which model of influence applies more accurately to the findings will contribute to researchers’ understanding of what role Twitter plays within the information influence sphere. Finally, applying computational methods to differentiate automated accounts from non-automated accounts on Twitter and exploring how their behaviors differ will help inform future researchers as to the potential of social media automation to influence public opinion.

What Is Public Opinion and How Do We Measure It?

In the first issue of the journal Public Opinion Quarterly, psychologist Dr. Floyd Allport provides a guidebook for the field, spending much of his time describing what is not public opinion and ultimately settling on this definition:

The term public opinion is given its meaning with reference to a multi-individual situation in which individuals are expressing themselves, or can be called upon to express themselves, as favoring or supporting (or else disfavoring and opposing) some definite condition, person, or proposal of widespread importance, in such a proportion of number, intensity, and constancy, as to give rise to the probability of affecting action, directly or indirectly, toward the object concerned. (Allport, 1937, p. 23)

To Allport, public opinion as a concept is constructed by the individual makeup of larger public belief. Thus, the field of public opinion research is one that seeks to understand and predict societal changes, whether they be attitudes or behaviors, through the measurement of individuals’ personal opin- ions. Among those whom this field interests are marketers hoping to maximize profits, politicians seeking election, policy makers hoping to make informed decisions, and researchers informing those policies. How public opinion is measured, however, has narrowed since the introduction of poll-based survey methods and recently resurfaced as a matter of interest for researchers.

Until the fateful election of 1936, when a Roosevelt landslide stunned those relying on outdated election prediction methods (Crossley, 1937), how the government and media measured public opinion remained in flux. Some, like the British government in the early 1930s, deployed expensive armies of canvassers to administer surveys to as many households as possible across the country (Nicolson, 1937). Others, like the Literary Digest in 1936 and before, issued mail-in ballots to huge volumes of individuals and tallied all responses (Crossley, 1937). Sampling the measured population randomly was not seen as necessary, and time-based changes in opinion were considered unimportant. Innovations in polling methods by George Gallup (Gallup and Rae, 1940), Archibald Crossley, and other researchers eventually showed the superiority of representative sampling, while later innovations effectively narrowed the definition of public opinion, for many, to that which is measured by random sample survey responses collected at regular intervals. While there have been detractors from the belief that polling represents public opinion, with some even arguing that this belief erodes popular democracy (Hogan, 1997), it remains a heavily relied-upon measure of public opinion in both popular culture and academia.

The Origins of the Poll

The field of public opinion was rocked by the 1936 re-election of incumbent president Franklin D. Roosevelt. Measuring the voting intentions of the public at large had, until the 1930s, been left to politicians themselves or to magazines and newspapers to make predictions based on the best methods known to them at the time (Lang, 1933). Indeed, the Literary Digest had built a reputation, according to Lang and Robinson, of producing the most accurate electoral predictions to date. The Literary Digest had historically relied on a series of mail-based polls sent to all of its readership, measuring only the opinions of those who chose to take the effort to return their questionnaire (Crossley, 1937). Also included were other Literary Digest customers, including telephone subscribers, automobile registrants, club members, and others (Lusinchi, 2012). The magazine accumulated all of the votes received over the course of months and published predictions based on these data periodically. Leading up to the 1936 election, the publication issued a forecast: Governor Alf Landon would defeat Roosevelt, 370 electoral votes to 161.

By this time, however, George Gallup and Archibald Crossley, among others, were experimenting with new methods of measuring voting intentions. The researchers joined Fortune Magazine, among others, in disagreeing with the Digest’s prediction of a Landon landslide (Crossley, 1937). After Roosevelt’s sweeping 523-8 electoral college victory, one of the most lopsided in history, electoral prediction methods came under greater scrutiny, which led to threats from Congress to investigate the Digest (Crossley, 1937).

Gallup and Crossley discovered that attempting to make their samples representative of the voting population, increasing how often measurements were taken, and predicting election outcomes from data collected soon before the election resulted in far more accurate predictions. These techniques also allowed for smaller sample sizes, making polling logistically easier. Rather than polling only a convenient subset of the population (better-educated and wealthier readers, in the Digest’s case), Gallup and Crossley interviewed smaller and more representative samples of individuals through in-person surveys. Attempting to represent the population rather than gather huge volumes of data proved more efficient, as polls could be conducted every two weeks rather than over the course of months, and improved polling accuracy. Researchers quickly discovered how to improve the measurement of public opinion, including the advent of random samples in later decades, and began to apply those improvements in domains outside of election prediction (Robinson, 1937). As a result of the spectacular failure of the Literary Digest, measuring public opinion came to be defined by polls administered to a random sample of a population on a semi-regular basis.

Random Sample Surveys in the Twentieth Century

In modern times, public opinion as measured by random sample surveys is crucial for shaping policy (Burstein, 2003), maintaining an efficient economy by tracking and predicting consumer behaviors (Noussair et al., 2001), and understanding election results (Kennedy et al., 2017). In the United States alone, organizations like Nielsen, Gallup (founded by George Gallup, mentioned previously), Pew, and Harris Interactive feed data directly into news streams, social media feeds, and stock markets. Random sample survey polling results have become a crucial part of the average citizen’s information landscape.

Random sample survey polling, though, is facing a crossroads. Having long since abandoned the expensive task of flying interviewers across the country to poll randomly selected citizens in person, pollsters now rely on telephones as their primary contact method. This delivery system, in an age of waning landline telephone usage and increased use of other communication technologies, is growing outdated (Richter, 2019). Aggravating this concern, decreasing landline usage may not represent the population equally; the decrease may correlate with education level, age, or income, leading to inaccurate data. While modern-day polling methods have extended to cell phones (Brick et al., 2007) and the internet (Couper and Miller, 2008), those contact methods introduce their own unique set of concerns, such as nonresponse bias, representativeness, and inaccurate participant responses (Keeter et al., 2017; Watson et al., 2015; Zhang et al., 2017; Conrad et al., 2017).

Some critics have argued that random sample surveys are growing less effective and accurate. Media outlets criticized pollsters in 2016 when many poll-based predictions of the U.S. presidential election pointed to the wrong winner (Cohn, 2017), though others argued that the polls performed reasonably well despite some errors in swing states (Kennedy et al., 2018). Researchers have made the case that the field of public opinion research is primed for a methods overhaul and is destined for a major shift toward the use of more digital trace data (Savage and Burrows, 2007; Ceron et al., 2014; Slade, 2016). Others have expressed skepticism, pointing out that random sample surveys have been the most reliable, most accurate form of public opinion measurement since the introduction of the method (Tufekci, 2014; Couper, 2013), that the 2016 U.S. election outcome illustrated that probabilities cannot be equated with guarantees (Kennedy et al., 2017), and that national-level polling can only predict the popular vote and cannot compensate for errors in state-based polling. In either case, random sample survey administrators face a growing litany of challenges.

Perhaps due to difficulties gathering polling data from cell phone users, decreasing use of landline telephones, increased data privacy concerns, or other factors, nonresponse rates are rising despite efforts to incentivize participation, introducing greater biases that must be accounted for (Brick and Williams, 2013). According to Brick and Williams, current methods that attempt to negate the effect of growing nonresponse bias are falling behind, leading to less accurate measurements.

Funding for increasingly expensive regular-interval polling reflects the challenges the field is facing. As response rates drop and confidence in the future of random sample surveys wanes, organizations struggle to finance their polling efforts, and the frequency of polling by each organization decreases as a result (Massey and Tourangeau, 2013). In his 2012 presidential address to the American Association for Public Opinion Research, Pew Director of Survey Research Scott Keeter predicted that funding for random sample surveys is likely to level off or begin decreasing in the future (Keeter, 2012). A later president of the association, David Dutwin, referenced Keeter’s speech in arguing for more investment in the field of public opinion (Dutwin, 2019).

The measurement of public opinion, understood as a shared public perspective, may not rely on random sample surveys as much as some expect. Susan Herbst described in 1998 how opinion leaders like legislative staffers, politicians, and journalists often eschew quantitative measurement of public opinion in favor of peer-to-peer or activist-led communication (Herbst, 1998). Worse still, public polling has at least in some instances proven to be oversimplified, giving decision-makers who do use polls as guidance misleading impressions of public opinion (Hogan and Smith, 1991; Hogan, 1985). Likewise, members of the media often rely on sources outside of quantitative research, including social media itself (McGregor, 2019).

Others claim public polling through random sample surveys, despite its inherent weaknesses, remains the leader in public opinion measurement and will remain so. In his book In Defense Of Public Opinion Polling, Kenneth Warren points out the size of the industry built around public polling and argues that polling services have the resources to address challenges like nonresponse bias and plummeting use of landline telephones (Warren, 2001). Whether new data sources can address these problems has been an emerging topic in public opinion research, but many have established theoretical and technical arguments as to why this may be impossible (Baker et al., 2013; Couper, 2013; Taylor, 2013), suggesting instead that research explore possible ways to strengthen random sample survey techniques.

An Un-Narrowing Of Public Opinion Methods?

Though public opinion polling methods have been refined since the early 1930s (anonymity was found to improve accuracy (Dalkey, 1969) and telephones made survey delivery cheaper with similar accuracy (Rogers, 1976), for example), the principles of random sample survey methods have remained largely intact since their refinement in the mid-twentieth century. While history, beginning with predictions of the 1936 presidential election, has shown the strength of random sample survey methods in the field of public opinion, changes in communication technologies and their use have motivated researchers in the field to question whether other sources of data might address the weaknesses now emerging in those methods.

While random sample surveys can offer relatively reliable probabilistic insights as to the winner of elections, the method is now applied to a litany of topics ranging from the economy to views on issues such as global warming, gun control, and authority. In these areas, where random sample polling cannot easily be measured against a tangible ground truth and can prove misleading or oversimplify complex issues (Hogan, 1997), there exists space for other methods to provide insight into public opinion.

Social Media as a Public Opinion Data Source

Among the candidates explored as alternatives, or complements, to phone-based random sample survey methods in public opinion research are social media data. Social media use is widespread, and the accessible nature of the data produced on some platforms might provide an alternative source of information on public opinion. In investigating the potential for social media data to have a role in the measurement of public opinion, however, close attention must be given to weighing the strengths of such methods against their potential weaknesses.

The Strengths of Social Media Data

While many social media platforms do not allow access to their data, others provide Application Programming Interface (API) access to developers and researchers. These digital connections enable researchers to gather available public data at large scales for little or no cost. The sheer cost difference between social media data collection, particularly with regard to Twitter data, and random sample survey methods has inspired researchers to investigate social media data’s potential to revolutionize public opinion research.
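As a minimal illustration of API-based collection (a sketch only: it assumes Twitter's v2 recent-search endpoint and a valid bearer token, and is not the collection pipeline used in this dissertation), a few lines of Python can retrieve recent tweets matching a keyword query:

```python
import requests

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"  # Twitter API v2 recent search
BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder credential, not a real token

def fetch_recent_tweets(query, max_results=100):
    """Return one page of recent tweets matching `query`."""
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    params = {
        "query": query,                      # keyword and filter syntax
        "max_results": max_results,          # 10-100 per request
        "tweet.fields": "created_at,author_id",
    }
    resp = requests.get(SEARCH_URL, headers=headers, params=params)
    resp.raise_for_status()
    return resp.json().get("data", [])

if __name__ == "__main__":
    for tweet in fetch_recent_tweets("economy lang:en -is:retweet"):
        print(tweet["created_at"], tweet["text"][:80])
```

Even a sketch like this makes the cost contrast plain: once credentials are in hand, each additional day of collected data costs essentially nothing beyond storage, whereas each additional day of survey fielding carries real interviewing costs.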

Likewise, as the data are often not proprietary to a single organization and are accessible by all, access to robust volumes of historical data allows for more in-depth analysis on longer timescales than is often possible with random sample survey methods. Studies like that of Scott Golder and Michael Macy, wherein changes in Twitter user behavior are studied over large time scales, show the potential for longitudinal research using social media data (Golder and Macy, 2011). Twitter is not representative of social media as a whole, as the term social media is an umbrella covering a score of digital platforms with varying ways in which users can interact. Twitter, however, is among the most real-time social media platforms and produces freely accessible data at scale on a large breadth of topics.

As Gallup and other opinion researchers recognized in the 1930s and 1940s, timely and random sampling of public opinion and of voters’ preferences are important elements in the reliability of survey data. Because social media data are produced constantly and can be collected at nearly any frequency, some researchers naturally suspect these data might hold great value in measuring and predicting public opinion. In contrast, polling even once daily is a significant challenge using random sample survey methods.

The Challenges Of Using Social Media Data

Questions remain as to whether social media data have a place at the public opinion table. Most importantly, it is theoretically impossible to obtain a representative sample of the overall population via social media data, for three reasons. First, not all of the national population uses social media, and the demographics of those who do likely differ from other populations in significant ways (Diaz et al., 2014). Second, on currently available social media platforms, information on an audience’s demographics is often impossible to obtain, compounding the concern that the measured audience is not representative of the overall population. Last, measuring opinion on a given topic by measuring the sentiment of posts captures not all social media users but only those moved to post about the topic. These weaknesses could render social media data in their current form unusable within public opinion research (Baker et al., 2013), though some argue that measuring the opinion of opinion leaders on Twitter, specifically, may be enough to represent public opinion (Karlsen, 2015).

Other challenges face researchers who investigate the potential for social media data to complement or replace current public opinion methods. Communication norms evolve rapidly on social media: trends like "subtweeting" (posting about a specific person in a vague way so as to hide the motivation of the post) and screen-capturing text in lieu of directly sharing posts make accurate measurement of social media data difficult (Tufekci, 2014). Likewise, as social media data are immense and complex, small changes in analysis can have large impacts on research outcomes (Jungherr et al., 2012). As with surveys, social media posts may not represent genuine beliefs (Schwartz and Halegoua, 2015), and nuances of language make identifying support for specific causes difficult when analyzing social media data.

Predicting Elections, Mirroring, and Predicting Public Polls

Public opinion polling, while now used to guide policy, direct government funds, and serve other purposes, owes its origins to the science of predicting elections. A number of studies have used social media data to build mathematical models to predict a specific election (Tumasjan et al., 2010; Tjong Kim Sang and Bos, 2012; Ceron et al., 2014). Each model, though, relies entirely on the training set of data it was built upon, a problem known as overfitting, and as a result has not proven to be applicable to other election prediction data sets (Gayo-Avello, 2012). Furthermore, small decisions in the collection and preprocessing of those data have been shown to have significant effects on a model’s election outcome prediction (Jungherr et al., 2012). Thus, to date, no method exists to reliably predict elections with social media data across time, topic, and data set.

If social media data cannot, with current methods, predict elections, can they instead approximate public polling, thereby lessening our reliance on techniques like phone-based and other types of surveys? The evidence does not support this. Much like election prediction research using social media data, some of these studies suffer from overfitting and non-reproducible methods (O’Connor et al., 2010; Beauchamp, 2017). Many others have found weak or no correlation between absolute measures of public opinion produced by public polling and social media data (Ceron et al., 2014; Annett and Kondrak, 2008; Pasek et al., 2019). The consensus seems, then, to point away from social media data as a stand-in for phone-based and internet-based random sample surveys within public opinion measurement, though it may have the potential to represent public opinion in its own right.

However, another consensus is forming within the field of public opinion research using social media. After O’Connor and his colleagues introduced the concept of a so-called lag variable (O’Connor et al., 2010), where a gap of time is recognized between the social media and public polling data sets, a number of studies have applied the concept and found strong evidence that social media data are predictive of changes in public polling data (Jungherr et al., 2017; Ceron et al., 2014; Jensen and Anstead, 2013; Barbera, 2016). These studies, however, were not designed to explore the concept of changes in social media sentiment predicting changes in public opinion. The phenomenon deserves to be explored more thoroughly with dedicated research questions and methods designed specifically to address it.

If Twitter sentiment directly influences overall public opinion, there must necessarily be an amount of time between Twitter sentiment shifts and shifts in public opinion. This amount of time is known as a lag time, and measuring it accurately will allow for greater understanding of the relationship between Twitter users and public opinion as measured by random-sample surveys. Detecting no lag time, where the two variables move in conjunction with one another, would rule out the possibility that Twitter sentiment is influencing public opinion.

In my dissertation, I will address these questions by investigating whether changes in sentiment on social media can predict changes in public opinion as measured by public polling and what, if any, lag time applies. I will do this by measuring regression coefficients and lag times between social media sentiment and public polling on two topics, to help ensure reproducibility, and by using two methods of sentiment calculation to explore the theoretical implications of the findings. The results will contribute new methods for approaching questions regarding the applicability of social media data to public opinion research. The studies will also offer a new look at how opinion leaders interact with Twitter and what role Twitter may play in the construction and evolution of public opinion. Finally, by pre-registering (Nosek et al., 2015) these methods, this dissertation will contribute reproducible methods for later research.
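The core of that analysis can be sketched in a few lines of Python. The snippet below is purely illustrative: the series are synthetic, the poll series is constructed to echo Twitter sentiment fourteen days later, and a plain ordinary-least-squares slope stands in for the dissertation's registered model; only the general shape of the procedure (regress day-to-day poll changes on day-to-day sentiment changes at several candidate lags) is the point.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for daily Twitter sentiment and daily poll approval;
# the poll series is built to echo sentiment fourteen days later, plus noise.
days = 200
twitter = np.cumsum(rng.normal(size=days))  # random-walk "sentiment"
polls = np.concatenate([rng.normal(size=14), twitter[:-14]]) + rng.normal(scale=0.5, size=days)

d_twitter = np.diff(twitter)  # day-to-day changes, as in a change-on-change design
d_polls = np.diff(polls)

def lagged_slope(x_change, y_change, lag):
    """OLS slope of y_change[t] regressed on x_change[t - lag]."""
    x = x_change[:len(x_change) - lag]
    y = y_change[lag:]
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

for lag in (0, 7, 14, 21):
    print(f"lag {lag:2d} days: slope = {lagged_slope(d_twitter, d_polls, lag):+.3f}")
# With these synthetic series, the slope peaks near the built-in 14-day lag.
```

The coefficient tables in Chapter 4 report exactly this kind of per-lag estimate, with standard errors, for the real Twitter and polling series.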

Inauthentic Communication and Public Opinion

Online conversations can be subject to interventions by automated accounts, or bots. Because bots need no sleep, they can exploit any influence they gain within online conversations far more efficiently than human actors can. When grouped into networks called armies, bots can influence both the perceptions of reality held by their direct human audiences and media coverage of conversational topics (Bessi and Ferrara, 2016). Strategies like hashtag saturation and spreading misinformation can grant influence to bot creators hoping to manipulate online discourse (Kollanyi et al., 2016). These efforts have been used to alter perceptions of healthcare (Jin et al., 2014) and disseminate propaganda that strengthens incumbent regimes (Woolley, 2016). In both Venezuela (Forelle et al., 2015) and the Philippines (Etter, 2017), bots on Twitter have served as a tactic of information warfare. Bot influence has become a regular media topic, as fears have grown that bots are adding to social pressures (Hao, 2020).

In a preliminary study, bots were found to author approximately 6.5% of Twitter conversations (Wirth et al., 2019), though that proportion can vary dramatically depending on topic and timeframe. In that small study, my colleagues and I found that most bots in our sample were profit-seeking, or spam, bots. Political conversations, however, were found to contain a higher proportion of bots, which could disrupt deliberative discourse.

Experiments have shown that bots manipulate the flow of conversations such that they affect the behaviors and attitudes of individuals and, as a result, the flow of information online (Aiello et al., 2012). Aiello and his colleagues found that as bots built apparent relevance through audience size and interactions with other accounts, they were able to manipulate the ways humans interacted with one another on the platform. Their study also found bots were able to trigger emotional responses that exacerbated polarization tendencies in individuals. As a growing volume of subsequent research produces similar findings, research has begun into which specific bot behaviors and human traits may predict the largest effects (Wald et al., 2013; Everett et al., 2016; Hjouji et al., 2018).

By nature, research into whether the individual-level effects of bots scale up to larger societal discourse is challenging. Codifying and measuring societal-level changes is a monumental task, though some researchers have attempted to do so in order to measure the impact of bots at a larger scale. Philip Howard and his colleagues have shown that bots have the capability, and bot creators the motivation, to interfere with political communication online (Howard et al., 2018). A number of research groups have pointed to "small" effects, measured by the percentage of conversation traffic estimated to be bot-driven (Forelle et al., 2015; Howard and Kollanyi, 2016; Kollanyi et al., 2016). Whether their findings showed that fewer than 10 percent of tweets about Venezuelan politicians are bot-authored (Forelle et al., 2015), that a small percentage of bots can author a vast majority of bot-authored posts (Howard and Kollanyi, 2016), or that a majority of posts about political topics are usually human-authored (Kollanyi et al., 2016), previous research has fallen into a habit of minimizing potential bot effects on information ecosystems. In fact, it may be that small volumes of bots mass-producing false engagement can have significant, far-reaching effects on public opinion. Whether bots, as a small percentage of social media users, can alter online discourse or public opinion as a whole remains an open question.

In a second layer of my dissertation, I will explore whether changes in the sentiment of bot-authored communication on Twitter correlate with changes in the sentiment of social media discourse on a specific topic and what, if any, lag time applies. This will contribute an empirical perspective on the larger impacts of bot armies and, when coupled with my exploration of a potential correlation between changes in social media sentiment and public opinion, allow researchers to gain a more well-rounded view of what relationship, if any, bots have with public opinion.

Summary

Twitter is an interesting topic of study with regard to how influence flows through society for several reasons. As Rune Karlsen discovered (Karlsen, 2015), and other studies suggest (Jensen and Anstead, 2013; Ceron et al., 2014; Jungherr et al., 2017), Twitter users are more likely than non-Twitter users to influence opinions in off-Twitter environments. This phenomenon, coupled with apparent correlation between shifts in the sentiment of Twitter users and public opinion polls (Barbera, 2016; Jensen and Anstead, 2013; Ceron et al., 2014; Jungherr et al., 2017), suggests Twitter plays a substantive role in the dissemination of influence through society.

Furthermore, a process known as intermedia agenda setting, whereby agendas set on one media platform by well-funded producers influence agendas on other platforms (Danielian and Reese, 1989), can cause Twitter discussions to influence media coverage off of Twitter (Harder et al., 2017), with some possible limitations (Groshek and Clough Groshek, 2013). This process can affect not just media coverage but also politicians’ talking points and platform stances (Conway et al., 2015). If opinion leaders are engaging one another in conversation on Twitter before settling on their own perspectives and sharing those perspectives with their respective audiences, intermedia agenda setting provides a theoretical mechanism by which influence may flow from Twitter conversations to society at large. Though Twitter may not be a representative sample of larger populations like the United States or the world, and Twitter users who post about political topics may not represent Twitter users, the platform may play a crucial role in the formation and dissemination of opinions for both opinion leaders and the overall public.

By applying modern and multi-approach methods, this dissertation will examine implied assumptions made by many researchers within the field of public opinion that opinion leaders are engaging with one another on social media to establish their beliefs before disseminating them out to their respective audiences. These results will help to draw conclusions about the application of the two-step model in today’s information ecosystem and whether social media platforms have changed the way in which influence moves through society. In investigating this, Twitter’s role in the dissemination of societal influence will be analyzed, as will the ability of fake social media accounts, or bots, to influence Twitter discourse and public conversations in off-Twitter environments. These results will help members of the media as well as researchers understand the nature of bot influence at scale and in what ways bot armies may have the ability to skew public opinion, potentially leading to new ways to counter these effects in future research. Perhaps most importantly, this dissertation pre-registers its methods (Nosek et al., 2015) in an effort to avoid cherry-picking and overfitting data as well as to provide reproducible methods that will add generalizable findings to research in the field. By setting an open-science example for the field of social media research within public opinion, this dissertation will encourage future researchers to use data in ways that empower the accumulation of meaningful long-term knowledge.

CHAPTER 2

LITERATURE REVIEW

Random sample surveys became the gold standard for the measurement of public opinion in the 20th century, but recent innovations in communication technology like social media have called into question whether new methods may complement or supplant the old. In question is whether representing the entire population in a data set, as random sample surveys do, is necessary for predicting elections and measuring public opinion. After reviewing this debate, this chapter applies the two-step model of communication to the field’s research to show that some researchers have implicitly argued for the addition of a third step wherein opinion leaders settle on their own opinions on Twitter. Finally, this literature review will point to the dangers of bot influence in a model that includes this third step in public opinion formation.

Random Sample Surveys As The Public Opinion Gold Standard

The field of public opinion, approaching a century in age, was founded on the principles of representative survey research. In response to an era of belief that social reality was constructed by power structures like politicians, newspapers, and the government itself (Lang, 1933), citizens and the private sector began applying statistical methods to seek a truer understanding of how the public at large felt about specific topics. While originally the field focused on the prediction of election outcomes (Crossley, 1937), perhaps sparked by the Literary Digest’s famously incorrect prediction of a Landon landslide defeat of incumbent Franklin D. Roosevelt, researchers quickly began to apply sampling methods to other questions (Robinson, 1937). Recent technological innovations, namely the advent of always-on many-to-many communication enabled by social media, have called into question the decades-old assumption that random-sample surveys are the most efficient and effective method by which public opinion can be measured.


Found Data vs. Created Data

In exploring whether social media data can complement, or even replace, public polling data gathered via random sample surveys, a discussion is first warranted regarding an important difference between the two types of data. Much like experimental and other first-hand research data, survey data can be described as "made" (Taylor, 2013). That is, the data that answer the researchers’ questions did not exist before those researchers created them in some way, whether by subjecting individuals to an experiment, interviewing individuals, or other methods meant to create data. These data are created with specific research questions and pre-planned variables in mind. As data become more widely accessible in both volume and cost, some researchers remain interested in what has been described as "found" data, or data that exist outside of research and whose analysis does not alter the already existing information ecosystem from which they are extracted.

As is the case in many fields, tension is growing in public opinion research as researchers debate the role of found data within academic inquiry. Some have argued that the flood of open-access information available to researchers will render made data archaic, or at least useless in some situations (Mayer-Schonberger and Cukier, 2014; Savage and Burrows, 2007). Proponents of social media data, one form of found data, argue that while understanding and successfully manipulating these data to answer research questions may take some time, doing so will be worth the effort considering the potential for improvements in research cost and scale. Matthew Salganik, for example, sees the flood of "digital traces", or digital found data, as a progression away from expensive and rare studies of human behavior and toward better understanding (Salganik, 2019). Others, though, argue that because the demographics of social media users cannot be known, and social media users are not representative of a larger population that includes social media non-users (Tufekci, 2014), the potential of found data is limited (Gayo-Avello, 2012; Baker et al., 2013). These researchers often point similarly to the potential of found data to complement or otherwise support made data but argue that made data, like random sample survey polling, are inherently of greater value and relevancy to the field (Taylor, 2013).

Measuring public opinion may not require random samples at all. Through a process known as intermedia agenda setting (Danielian and Reese, 1989), the high concentration of opinion leaders on Twitter (Karlsen, 2015) has resulted in Twitter’s prominent role in shaping political campaigns (Conway et al., 2015) and off-Twitter media coverage (Groshek and Clough Groshek, 2013). Members of the media in particular have been found to hold influence in setting the national agenda (Harder et al., 2017). Potentially intensifying Twitter’s role in influencing public opinion, media members on Twitter are significantly more likely to interact with one another on the platform than with members of the public, creating the potential for an opinion leader echo chamber (Molyneux and Mourao, 2019). Through this tendency for influence of societal narratives to flow between media platforms, and the concentration of opinion leaders on Twitter, it may be possible for Twitter data to stand alone in its ability to reflect public opinion.

Investigating this debate and analyzing the successes and failures of researchers attempting to align social media data with public polling data leads to the need to define what, exactly, these researchers hope to learn. It seems clear that both found and made data have their strengths and weaknesses; to define those, we must first get to the root of how each type of data can claim to represent public opinion.

Topic Coverage vs. Population Coverage

The question of whether social media data can be used to complement or replace random sample surveys in the measurement of public opinion has largely revolved around the question of representation. Michael Schober and his colleagues argue, however, that the root of this debate seems to center around how "topic coverage", or the point at which a researcher can be confident opinions on a topic have been fully represented in a sample, is achieved (Schober et al., 2016). For many decades, they claim, public opinion research has achieved topic coverage via "population coverage", or posing a topic to a random sample of the population. By addressing a topic with a representative sample of the overall population, random sample surveys approximate public opinion as a whole and make better than chance predictions of election outcomes. With the advent of big data, in large part powered by social media, some researchers (Schober et al., 2016) believe population coverage may not be the only way to achieve topic coverage.

Social media users are not representative of the larger population (Baker et al., 2013), but evidence has emerged that users of Twitter (the platform most heavily researched due to its data being more accessible to researchers) might often be seen as opinion leaders in off-Twitter contexts (Barbera, 2016; Fu and Chan, 2013; Karlsen, 2015). In Lazarsfeld and Katz’s "two-step flow of communication", opinion leaders amplify media narratives to mass society (Katz et al., 1955). In Barbera’s research, while his method found little connection between online sentiment and public polls, changes in public opinion polling were predicted by changes in sentiment online, suggesting the two-step flow theory may explain the relationship between social media data and public opinion. In this way, then, topic coverage may be acquired through the collection of discussions among opinion leaders without achieving population coverage of a population that includes Twitter non-users and non-tweeters of the measured topic. Naturally, this can lead to a circular argument in which the onus is put on the researcher to prove population coverage of the sub-population of "opinion leaders", which proves difficult due to the lack of demographic data available from Twitter data collection, and perhaps more difficult still for the overall population of opinion leaders.

Alternatively, it may be that the concentration of opinion leaders on Twitter creates an intermedia agenda setting (Danielian and Reese, 1989) effect strong enough to influence overall societal discussion by way of shaping off-Twitter media agendas.

The very nature of social media and its use may create topic coverage inherently, at least in some cases (Ampofo et al., 2011). As social media often takes the form of a many-to-many platform, where any poster can be seen and responded to by any other poster, it could encourage deliberation on matters of societal import. Twitter trends like #BlackLivesMatter, #MAGA, and #MeToo have grown from social media, with users debating their meanings and nuances, into fully developed societal narratives.

Likewise, societal conversations on topics like civil rights, government, and international relations are taken up by social media users where conversations can take place on an ongoing basis due to social media’s always-on nature. It could be, then, that the collaborative nature of social media may produce topic coverage, where public opinion on a topic could be accurately measured through social media data.

If this is the case, then despite researchers’ access to all of the data rather than just a sample of them (Mayer-Schonberger and Cukier, 2014), more widely-discussed topics would produce more accurate measurements (Ceron et al., 2014). As discussion of a topic increases, topic coverage on social media may more accurately reflect overall public discourse.

The pressure to discover whether there is a connection between social media data and public opinion is mounting due to correlations between the two found by studies over the past decade or so (Tumasjan et al., 2010; O’Connor et al., 2010; Fu and Chan, 2013; Tjong Kim Sang and Bos, 2012). As research continues to apply varying methods and finds positive connections, it grows increasingly likely that there is a relationship between the two. While the relationship is likely not direct and linear, it is the task of researchers in this field moving forward to discover and explain it so that the measurement of public opinion can be made increasingly meaningful and accurate.

Social Media as Public Opinion Data

Social media platforms, and thus the relevance of their data, evolve quickly when compared to data gathered via telephone, experiments, or interviews. As innovation spurs rapid change in technologies, forms of communication adapt and create a constant flux of pressures on social media companies. The nature of communication, whether it be what sorts of topics are spoken about or on which platforms they are discussed, may evolve too quickly to warrant research using social media data (Huberty, 2013).

Huberty found that self-learning algorithms trained to predict one election failed to predict an election just two years later. As scientific inquiry relies on reproducibility so that findings may be confirmed as more than a fluke, Huberty’s study raises the question of whether social media data hold any value for public opinion researchers, at least when predicting elections.

Huberty’s method may leave some leeway, however, for those believing that social media data could improve method accuracy within the field of public opinion research. The researcher’s model was built upon an assumption that posts mentioning a politician are signs of support (Huberty, 2013), which clearly should not be so. For example, if a Republican-leaning individual tweeted “I think Hillary Clinton is dishonest”, his or her tweet would have been counted as a show of support for Clinton despite its expression of disapproval. There is some evidence that this phenomenon occurred en masse in the 2016 election, where Clinton and her supporters were more likely to post about Trump than Trump supporters were to post about Clinton (Darwish et al., 2017). Huberty’s method also assigns words as signs of support for either an incumbent or “salient political issues”, assuming a dichotomy between the two as the only choice for voters. This structure ignores the challenger completely, as he argues that “reasonable model[s] should start from the assumption of incumbent re-election” (Huberty, 2013, p. 3).

Ultimately, as a number of researchers have found predictive power within social media data, those researching whether social media data can stand in for public opinion polls are tasked with exploring

findings that point both toward and away from this potential. As replication lags behind original research across a number of communication science fields (Open Science Collaboration, 2015), we must investigate the similarities and differences in approaches in these studies as well as potential changes to methods that may increase accuracy (Al Baghal et al., 2019). We must also limit unintended biases in data collection practices, normalizing them across disciplines where possible, and compensate for the inherent difficulties in measuring data from platforms that are constantly “drifting” (Salganik, 2019, p. 33) in population, usage, and structure. In any effort to explore questions relating to social media data’s relevance to public opinion, academics must continue to build upon previous research and take into account the findings of others when presenting their own results.

Predicting Elections

In 2010, two foundational papers investigating the predictive power of Twitter data for elections were published, one by Andranik Tumasjan and his colleagues and another by Brendan O’Connor and his.

With Twitter in its infancy, these research teams directly addressed potential drawbacks of a long-standing and well-trusted method in random sample surveys and proposed in their own ways how social media data may have a role within the field.

After gathering a corpus of tweets containing the names of the six parties represented in the German parliament along with certain prominent politicians, Tumasjan’s team compared Twitter conversational volume to election results and found promising results (Tumasjan et al., 2010). Though the findings fell a bit short of the accuracy provided by random sample surveys studying the same election, the researchers claimed the study showed social media data had potential in the field of election prediction. The study analyzed only a sample of approximately 100,000 posts, a small sample in today’s terms, and the gathering took place only about a month prior to the election, limiting the study’s scope but perhaps increasing its accuracy.

Later, Erik Tjong Kim Sang and Johan Bos sought to build upon Tumasjan’s method by introducing normalization strategies. After collecting 700 million tweets, representing roughly 37 percent of all Dutch tweets during their collection time frame, the researchers sought to predict how many seats in the Dutch parliament each party would receive in the upcoming election (Tjong Kim Sang and Bos, 2012). Following

Tumasjan’s method, the team performed a one-mention-is-one-vote count method on its corpus of tweets.

The two then normalized their data by eliminating posts with multiple parties in the text and including only the first tweet mentioning a party from each user. This post-processing proved unhelpful, degrading rather than improving their accuracy. In another approach, Sang and Bos hand-coded a sample of 1,333 tweets by sentiment, classifying each into negative and non-negative categories. To address demographic representation, poll-dependent weighting was also used to help normalize data. Through this post- processing, the group was able to bring their accuracy to within 1.7% of the election’s outcome. Though the researchers were eventually able to find a data processing method that approached real-world election results, using polling data to influence how social media data is analyzed creates a circular argument and limits the research’s impact on the larger question of whether social media data on their own are predictive as compared to random sample surveys.

Reproducibility ensures research is relevant to future understanding or predictions. Understanding a fact in isolation holds little scientific interest, as its relevance within real-world contexts is unknown.

For this reason, public opinion research has struggled to understand the role of social media data within its field. As discussed within, a number of studies have found some correlation between the two data sources but few have explored the creation of a model of social media data’s relevance to public opinion.

Without a model to explain the relationship, if one could even exist for a concept as ephemeral and ever-changing as social media, public opinion research will struggle to take seriously the findings of those who use social media as their only data source.

In an effort to treat one of the field’s first successful findings as a model, Daniel Gayo-Avello and his colleagues applied the work of Tumasjan (Tumasjan et al., 2010) to elections in the United States. Whereas Tumasjan’s work had produced accuracy nearing that of polling data, Gayo-Avello’s group, following a nearly identical method, found accuracy rates three times worse than polling organizations when using sentiment analysis and around 16 times worse when following a pure volume-based method (Gayo-Avello et al., 2011). These results were no better than random chance and seemed to suggest that, at the very least, Tumasjan’s approach cannot serve as a reliable model for social media data use as an election predictor. The researchers, however, leave room for the possibility of creating such a model using more advanced methodologies like machine learning. As a result, Gayo-Avello later argued that in its current form, social media data could not be used to predict elections and provided an extensive list of recommendations for researchers exploring this question (Gayo-Avello, 2012). Among those, he recommends making actual forward-looking predictions rather than using data to speculate that predictions could have been made, establishing a “golden truth” against which data should be measured, and improving data quality by eliminating noise like bot-created posts.

Likewise, Andreas Jungherr and his colleagues were highly critical of Tumasjan’s work. In pointing out that Tumasjan’s study lacked reproducibility because of missing information, Jungherr et al. go on to show that adding another party to the analysis of German parliament elections significantly harms the accuracy of the data (Jungherr et al., 2012). Tumasjan’s work, then, is used as a case study for how positive findings in this field have relied on a patchwork of undefined and otherwise unreplicable methods built on convenient or haphazard decisions. Unlike Gayo-Avello and Jungherr, who focused mostly on reproducing Tumasjan’s work, Mark Huberty went a step further by producing a predictive algorithm and testing the same method on an election two years later. He also found a lack of consistency between elections, suggesting the nature of social media use and language itself to be too dynamic to support the creation of a reliable method that uses social media data to predict election outcomes.

The struggle for public opinion researchers in establishing a reliable method to compare social media data with election outcomes led to a wave of dissent within the academic community. Some argue against using social media data in any form (Taylor, 2013), while others more conservatively explain that doing so brings a multitude of liabilities that render it less useful than public polling in the current research environment (Smith, 2013; Baker et al., 2013; Couper, 2013; Diaz et al., 2014). Social media are, relatively speaking, still very young, and many researchers in the field conclude that the data they create are potentially enlightening but that the fields of inquiry which use those data must be self-aware about their weaknesses and biases when making claims (Tufekci, 2014). In the case of election prediction, current methods are too unreliable to be useful in practice.

Mirroring Public Polling

Another path of research within the field of public opinion research using social media data has evolved as researchers explore whether data gathered from digital platforms may either complement or even eventually replace public opinion polling. In asking this question, scientists ask to what degree the peculiarities of social media data and its analysis can mirror the peculiarities of random sample survey-based public opinion polling methods. This question was perhaps first presented by Brendan O’Connor’s team’s 2010 study. Using a robust method including a corpus of over a billion tweets spanning two years and dozens of public opinion polls, the researchers found correlations between social media sentiment and public opinion polling as high as 80% (O’Connor et al., 2010). After searching on keywords and retrieving their data, O’Connor’s team used a subjectivity lexicon to sort posts with positive and negative phrases. Each post could be positive, negative, neutral, or both positive and negative. Simply subtracting the volume of negative messages from the volume of positive ones provided an approval score that was compared to polling data. The weakness of this method, of course, is that it relies heavily on lexicons that may be out of date and unable to comprehend nuanced language such as sarcasm. Still, the team’s results are perhaps the most promising to date, and their method introduced an important “lag” variable by which changes in public opinion polls can be seen to take a certain amount of time to reflect changes detected in online sentiment. The concept of a lag variable is now used commonly when researching the relationship between social media and public opinion.
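To make this concrete, a daily approval-style score can be computed from lexicon-labeled tweets in a few lines of R. This is a minimal sketch rather than O’Connor’s code; the data frame labeled_tweets and its columns day and label are illustrative assumptions, and both the difference and ratio variants described above are shown.

    # Minimal sketch: daily approval-style scores from lexicon-labeled tweets.
    # labeled_tweets (columns: day, label) is an assumed, illustrative input.
    library(dplyr)

    daily_score <- labeled_tweets %>%
      group_by(day) %>%
      summarise(
        pos = sum(label == "positive"),
        neg = sum(label == "negative"),
        diff_score  = pos - neg,           # positive volume minus negative volume
        ratio_score = pos / pmax(neg, 1)   # ratio variant, guarding against zero
      )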

Soon after O’Connor’s study, Andrea Ceron and colleagues began gathering Twitter data to measure whether it could reflect Italian leaders’ popularity as measured by public opinion polling (Ceron et al., 2014). Measuring the correlation over a span of roughly ten months and over 100,000 tweets, the team found that raw values of support do not seem to be highly correlated across social media data and public polling data, though they aligned more closely for the most popular leaders.

Ceron’s study, though, found evidence that changes in support over time were correlated between the two types of data.

Around the same time, King-wa Fu and Chee-hon Chan applied machine learning to the question, gathering data not only from Twitter but also from forums and blogs. Using machine learning, over 66,000 government-related posts were classified into negative and non-negative categories. Human-trained, or “supervised”, machine learning sentiment software has been found to be 70 to 80% accurate (Annett and Kondrak, 2008). In applying this method, Fu and Chan found high confidence in a weak correlation between online sentiment and polling data and found that online sentiment leads public opinion polling by eight to 15 days. Interestingly, the researchers suggest that, in Lazarsfeld’s two-step model, if those leading online conversations act as opinion leaders, perhaps both social media data and polling data reflect attitudes toward media coverage of topics rather than a causal relationship between the two, with polling data necessarily lagged due to the nature of its collection. By comparing poll data as representative only of opinions on the day it was collected, particularly useful with daily polling, to daily samples collected from Twitter, any lag due to polling companies’ publishing schedules could be accounted for. Later studies have similarly found little to no evidence that social media data can mirror public polling data (Pasek et al., 2019). Another study claims to have found evidence of a connection between the two by training a model on data specific to a given timeframe and election, which offers no assurance of utility in future timeframes or elections (Beauchamp, 2017). Beauchamp’s reliance on models trained to his own data set fundamentally biases his research in a way more commonly seen among those seeking to match social media data with election results.

It may be theoretically impossible for social media data to mirror public polling. In 1953, Kurt Lang and Gladys Engel Lang presented the concept of message refraction, a model wherein messages are perceived and interpreted through the lens of societal norms when traveling between media (Lang and Lang, 1953). In the present day, this includes social media’s role in refracting messages from print and television media, which could create an information environment flooded with reinterpretations as messages flow between media. In this model, messages on social media and the unique way individuals who use them perceive those messages would be only a subset of overall public opinion on any given topic. This could render comparisons of public opinion to social media data moot.

Predicting Changes in Public Polling

Evidence is mounting, however, that social media data are predictive of changes in public opinion, if not of the absolute values measured through random sample surveys. Touched on by O’Connor in 2010 (O’Connor et al., 2010), and later further explored by others (Ceron et al., 2014; Jensen and Anstead, 2013), shifts in social media sentiment seem to mirror and precede those of public opinion polls. Pablo Barbera’s working paper has tremendous potential on this topic. Barbera likewise fails to find a correlation between social media data and public opinion polling but finds strong evidence that changes in public opinion can be predicted by changes in social media data (Barbera, 2016). Trained on over a billion tweets analyzed by week, his machine learning algorithm is compared to presidential approval polling, and his findings offer strong evidence for the claim that social media users, or at least Twitter users, serve as opinion leaders.

Barbera also introduces a method for estimating the demographic information of Twitter users using geo-location information attached to tweets and demographic data of those locations provided by the government (Barbera, 2016). If robust and reliable, this method has the potential to revolutionize work in this field, addressing the representation problem presented by the anonymity of social media users. Using a system of weights to balance data for the demographic make-up of their location, known as “post-stratification”, researchers may be able to use this method to create more representative samples of opinions from social media data. It should be noted, however, that recent changes by Twitter may soon render this technique inaccessible (Porter, 2019).
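As a rough illustration of post-stratification, and only under assumed inputs rather than Barbera’s implementation, suppose user_sentiment holds one row per user with a sentiment score and a demographic cell inferred from geo-location, and census holds each cell’s population share. Weights are the ratio of population share to sample share, and the estimate is a weighted mean.

    # Post-stratification sketch; user_sentiment (columns: cell, sentiment) and
    # census (columns: cell, pop_share) are assumed, illustrative inputs.
    library(dplyr)

    weights <- user_sentiment %>%
      count(cell, name = "n") %>%
      mutate(sample_share = n / sum(n)) %>%
      left_join(census, by = "cell") %>%
      mutate(w = pop_share / sample_share)

    estimate <- user_sentiment %>%
      left_join(weights, by = "cell") %>%
      summarise(poststratified_sentiment = weighted.mean(sentiment, w, na.rm = TRUE))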

Further research is needed to explore whether changes in the sentiment of social media users on a topic are correlated with changes in public opinion on the same topic. While a number of studies have found evidence of a correlation between the two types of data with regard to changes rather than absolute values, research dedicated to the question may reveal, if a correlation exists, a more definitive lag between the two measures and which of them moves first. If research continues to find strong evidence that changes in social media sentiment predict changes in public sentiment as measured by public opinion polling, this would have significant theoretical implications for the two-step model.

Power To The People?

It seems, then, that predicting elections is not yet a viable option with data collected from social media, and while changes in public opinion seem to be predicted by changes in social media post sentiment, no studies have yet been dedicated to researching that phenomenon. Until this point, this study has referred to social media sentiment in a general way and has left its definition amorphous. Defining it, however, leads to interesting theoretical implications.

How to define social media sentiment is largely determined by method decisions. After tweets are collected, the content of the tweets may be aggregated and treated like a single corpus, as in the method used by O’Connor’s team in 2010 (O’Connor et al., 2010). In his study, O’Connor left the day-to-day samples untouched, including no implementation of part-of-speech tagging, stemming, or other preprocessing procedures, and used a lexicon tagger to sort tweets on a positive-negative spectrum. The day’s sentiment, then, was simply the volume of positive messages divided by the volume of negative messages. Other studies have likewise followed the conversation-based approach (Ceron et al., 2014; Barbera, 2016). This approach assumes the conversation itself has some effect on its participants, such that it produces, on average, a net opinion that is then eventually reflected nationally in offline polling data.

In this model, Lazarsfeld’s two-step flow might be seen to require an additional third step, where opinion leaders discuss among themselves relevant topics and media coverage to eventually arrive at an opinion landscape that is then disseminated to wider audiences. Conversation-based measurement of social media sentiment also suggests the potential for significant influence by accounts that are unrestricted by time and attention span. Automated accounts, or bots, might influence public opinion as a whole via their influence within Twitter conversations if the conversational model proves most accurate.

Social media sentiment can also be defined by individuals rather than a conversation. In this model, heretofore seemingly unused in research, data for each segment (day, week, month, etc.) could be grouped by individual. The sentiment of each set of individual data could then be averaged to produce an aggregate sentiment. The sentiment score for that day, then, would average the sentiment for all users rather than tweets, showing what the average conversational participant felt about the topic during that time period. If this model proved to more accurately predict changes in public polling data, the findings would be seen as further evidence of the two-step flow and Twitter would be seen as a sounding board for opinion leaders that can be measured to detect early changes in public opinion.

How one measures the lag times between influencer and influenced can also reveal important details about the role of opinion leaders online. If conversation- or individual-based measurements of changes in social media sentiment predict changes in public polling in a predictable way on a reliable timeline, either the two-step or the three-step model would appear to be strengthened. However, if changes in social media data correlate most strongly with public polling data at zero lag, a model like Bennett and Manheim’s one-step flow (Bennett and Manheim, 2006), wherein opinion leaders have little influence on their audiences compared to major media sources, would be strengthened. Zero lag time between changes in Twitter sentiment and public opinion would suggest that users of the service were simply using Twitter to share opinions they had established previously. In this model, despite Twitter users being more likely to be offline opinion leaders, the weight of their perspectives would not be measurable in overall public opinion, perhaps due to message creators’ direct access to individuals thanks to recent advances in communication technology.

Bot Manipulation

Automated social media accounts vary in their degree of automation. While some accounts may be created and maintained by a human-designed process, a model in which humans have a small but non-zero influence on each account, others may be run entirely by humans at times and rely on predetermined instructions at others. Defining the term “bot”, then, is challenging and can have serious consequences for claims about the proportion of bot accounts versus non-bot accounts on a platform, the behavior of bots, and the effect of bots on human social media users.

When building BotOMeter (previously called BotOrNot) (Davis et al., 2016), a tool to identify the likelihood that a Twitter account is automated, Davis and his colleagues met this challenge by introducing a set of scores that separates the chance an account is fully automated from the chance it is at least partially automated. Using the most stringent “completely automated” standard, a recent limited study showed that roughly 6-6.5% of tweets are authored by bots (Wirth et al., 2019). For the purpose of this study, accounts will be considered bots when BotOMeter assigns them a score higher than the threshold found to be most effective by a recent Pew Research study (Gramlich, 2018). Defining bots as having fully automated processes that operate, after creation, without human interference increases the confidence with which claims about bot behavior can be made.

On an individual level, bots seem to have the ability to influence attitudes. Research has shown that bots, particularly “social bots”, or bots that mimic humans in an effort to mislead human users, are seen by social media users as comparable to humans in credibility and communication skill (Edwards et al., 2014). When attempting to influence individuals or conversations, bots are especially effective when posting material their audiences disagree with (Everett et al., 2016) and when they have established larger audiences (Wald et al., 2013). An early landmark study in bot effects research by Luca Aiello and his colleagues found that bots are able to embed themselves in networks of conversations and users, gaining credibility and eventually triggering emotional responses from human users within those networks (Aiello et al., 2012).

Geopolitical and larger-scale societal disruption also seems to be a goal of bot creators. In a recent study, Matthew Hindman and Vlad Barash found bots prolifically sharing misleading information in an effort to influence national and international politics (Hindman and Barash, 2018). Automated sharing of misleading information as an act of control, or computational propaganda, has also been found to be a primary behavior of many bots by Samuel Woolley (Woolley, 2016) and Emilio Ferrara (Ferrara et al., 2016), among other researchers. Misinformation campaigns, driven at least in part by bots, have been detected in the United States (Bessi and Ferrara, 2016), Venezuela (Forelle et al., 2015), the Philippines (Etter, 2017), and Britain (Howard and Kollanyi, 2016).

Detecting efforts to influence politics, however, is not the same as detecting that these efforts were successful. Measuring the influence of a comparatively small entity like a social media bot, or even something a bit larger like a network of coordinated bots, on a much larger and more ambiguous concept like “society” or “politics” is understandably very difficult. The current direction of public opinion research may provide just the solution. If research can explore potential effects of bots on discussions within social media, such as in the work of Aiello and his colleagues (Aiello et al., 2012), while also investigating the nature of potential relationships between social media conversations and public opinion as a whole, the effect of bots and interconnected bot armies on wider society may become more clear.

Methodology In Question

Research exploring the relationship between social media data, public polling, and elections has, perhaps understandably, focused on approval of parties or politicians and the outcome of elections (Beauchamp, 2017; Ceron et al., 2014; Gayo-Avello, 2012; Jungherr et al., 2012). If findings in this inquiry are to be meaningful, particularly when claiming positive correlations, the methods by which this research is conducted must be resilient across time and topic. Though some studies have claimed success matching social media data to polling data or election outcomes (Beauchamp, 2017; Tumasjan et al., 2010; Ceron et al., 2014), their methods have not provided reproducible outcomes. And as research has shown that predicting elections is not viable with social media data in the field’s current state, only previous studies that gathered data with the intention of comparing it to public polling data are used as a basis for this dissertation. In engaging the social media data debate within public opinion research, testing methods across multiple topics increases confidence that the results may benefit future researchers.

While a variety of methods are being used to pursue the question of whether social media data can reflect or predict public polling, there appears to be a subconscious debate raging, the outcome of which should have important theoretical implications. Given that a number of studies seem to point toward Twitter users serving the role of opinion leader within their social circles (Fu and Chan, 2013; Ceron et al., 2014; Barbera, 2016; Karlsen, 2015), better understanding how these users interact with the messages they encounter on social media will shed light on whether the age of social media has brought about changes that require an update to Lazarsfeld’s two-step theory (Lazarsfeld and Merton, 1948).

Whether the classical two-step model should be updated for the Information Age is a popular topic in communication science. In 2006, Lance Bennett and Jarol Manheim published a landmark paper suggesting that Lazarsfeld and Katz’s model had been transformed by a fragmented mass media and direct line of communication between message senders and large audiences, resulting in a one-step model and largely bypassing opinion leaders (Bennett and Manheim, 2006). Following the mass adoption of social media, subsequent studies have found strong evidence of the two-step model’s applicability on social media platforms (Choi, 2015; Wu et al., 2011).

Topics

Gathering valid social media data for comparison to public polling data is no simple feat. In collecting social media data, one may gather a large sample of random tweets, rather than filtering for the desired topic initially, and use automated textual analysis techniques to sort data into various interests or topics such as in Barbera’s 2016 study (Barbera, 2016). This method is technically demanding, though, and potentially inefficient. Regardless of whether random tweets are gathered before tweets on the desired topic are identified, the process by which they are identified might have substantial effects on the outcome

(Jungherr et al., 2012).

If the aim of a researcher, as is the case for the author of this dissertation, is to test the validity of previous approaches to the question of whether changes in the sentiment of social media users predict changes in national sentiment as measured by public polling, it is perhaps most wise to choose methods that reflect the strongest, most inclusive, or most-used topics in research on the question to date. As a number of studies have used polls measuring approval of the government, as an entity, or of governmental figures across the world (Ceron et al., 2014; Fu and Chan, 2013), and presidential approval has been measured by reputable polling services on a daily basis for decades, this topic seems to be a natural fit.

In O’Connor’s early but comprehensive work on the question of social media sentiment reflecting polling data, he explored presidential approval and found social media data correlated with polling when time lag was taken into account (O’Connor et al., 2010). Likewise, other studies have included economic outlook, as this is another long-standing topic among pollsters (Cody et al., 2016; O’Connor et al., 2010).

These two topics meet the requirements of this study given their previous usage, breadth, and inclusiveness. Like O’Connor’s study, this research will feature two topics so as to decrease the possibility of topic bias in its results. Daily data allows for smaller and more efficient data sets as well as more granular temporal data with which to detect and measure changes.

Measuring Twitter Sentiment

Once social media data have been collected, sentiment analysis of their relationship with public polling invites an even greater variety of method-based decisions. Should the text of each tweet be processed in its entirety, or should common words that serve only as noise to a sentiment analysis algorithm, like “a” and “and”, be removed first? Perhaps most importantly, should individuals receive a single sentiment, in effect measuring the average user sentiment on a given day, or should all tweets be treated equally, in effect measuring the average conversational sentiment on a given day? Answering this question - whether sentiment is defined by the average of sentiments across users or across tweets - greatly affects the theoretical perspective the study’s outcome offers on researchers’ understanding of the flow of influence between media and society.

In the mid-20th century, Paul Lazarsfeld and Robert Merton coined the “two-step flow” of communication (Lazarsfeld and Merton, 1948). The two-step model proved foundational for communication research by shifting research away from an assumption that media have a powerful and direct effect on their audience, known as the “transmission model” and often attributed to Harold Lasswell (Lasswell, 1938), and instead introduced the idea of opinion leaders who actively choose which media messages to amplify to their respective networks. The concept influenced Lazarsfeld’s work in the following decade on how voters decide to support a candidate and was refined and further described by Elihu Katz, Lazarsfeld, and others (Katz et al., 1955). The two-step flow, in effect, acknowledges that media messages influence the public but provides for representatives of the public to be in ultimate control of public opinion.

As the opinion leader concept has taken root, so too have technology and the meaning of the term “media” evolved. In Lasswell’s book, the media were seen as merely a communicative extension of the will of the government, particularly in Nazi Germany. In Lazarsfeld’s time, media were largely dominated by powerful conglomerates in the cable news, radio, and newspaper domains. By the time Lance Bennett and Jarol Manheim released one of the best-known updates to the two-step model, the internet had firmly taken root in households across the world while the radio and newspaper industries were waning (Bennett and Manheim, 2006). Their “one-step flow” argues that the internet allows media senders to target individuals with messages customized to their own identities, thereby circumventing the need for opinion leaders as described in the original two-step flow model. This theory harkens back to Lasswell’s transmission model, portraying public opinion as largely influenced by the decisions of media senders. Later studies, incorporating data gathered after the advent of social media, have challenged Bennett and Manheim’s one-step flow theory by showing opinion leaders still serve their function within digital media environments (Weeks et al., 2017; Stansberry, 2012; Choi, 2015). Adding to the body of research confirming the role of opinion leaders on social media platforms, a 2015 study by Rune Karlsen showed that, at least in political contexts, opinion leaders on social media are more likely to be opinion leaders in offline contexts (Karlsen, 2015).

The body of research exploring the relationship between social media data and public polling data seems to suggest a strong likelihood that changes in sentiment on social media predict changes in national sentiment as reflected by public polling, and this suggests Twitter users may serve as opinion leaders in larger society. If social media sentiment is measured on an individual basis (Barbera, 2016), then, the method suggests agreement with Karlsen’s update of the two-step model. An interpretation of Karlsen’s modern two-step flow of influence is presented in Figure 1.

Figure 1. Rune Karlsen’s updated two-step flow featuring opinion leaders (OL) and passive individuals (P) (Karlsen, 2015).

If sentiment is instead measured from an entire corpus of tweets (O’Connor et al., 2010; Fu and Chan, 2013; Ceron et al., 2014), the method suggests there is a discursive step between opinion leaders and their respective networks, taking place at least in part on Twitter, wherein the opinion leaders’ interpretations of media messages are forged and crystallized before dissemination. This flow of influence may be presented as in Figure 2.

Figure 2. Two-step flow including conversation among opinion leaders.

By comparing these two methods in the same study, this dissertation hopes to contribute to understanding of how influence flows between media and society. If correlations are found between changes in social media sentiment and changes in public polling, the method that is found to have a stronger correlation could reveal much about the two-step model’s strength in the social media age and whether a third step, a discourse between opinion leaders on Twitter, might exist.

Research Questions

The field of public opinion research is approaching a watershed moment. As random sample survey methodologies struggle with non-response and with establishing platforms through which to deliver surveys cost-effectively, research into the potential of social media data seems to have begun settling on some of its strengths and weaknesses. Though it is growing less likely that a single method could consistently align social media data with polling data in absolute terms, it instead seems likely that shifts in online sentiment are predictive of shifts in public opinion polling and, thus, of public sentiment as a whole. Studies suggesting the predictive power of social media sentiment have merely found this phenomenon while pursuing other research questions, and the finding warrants further exploration.

If previous findings hold and a correlation is found between social media sentiment on a topic and public polling data on the same topic, with social media sentiment changing first, further exploration into the ways social media sentiment is formed and could be influenced is an important scientific pursuit.

Even if no connection is found between the two variables, understanding online sentiment and how it can be manipulated is a worthy cause, as a number of studies suggest social media posters may serve as opinion leaders in online and offline situations (Fu and Chan, 2013; Ceron et al., 2014; Barbera, 2016;

Choi, 2015; Wu et al., 2011; Karlsen, 2015). As inauthentic communication in the form of automated accounts, or bots, has been shown to affect and disrupt online conversation (Aiello et al., 2012; Edwards et al., 2014; Hjouji et al., 2018), research investigating the relationship of bots and human sentiment in online conversations is needed.

As previous research suggests Twitter users may, at least in part, be composed of opinion leaders that may help shape public opinion, understanding the process by which they settle on their perspectives will help reveal much about the two-step model in the age of social media. If opinion leaders’ sentiment on social media is predictive of changes in public opinion as measured by random sample surveys, how their sentiment is measured could help either strengthen the two-step model or suggest possible updates that are needed due to technological changes.

This dissertation’s research questions will be the following:

1. Do shifts in social media sentiment on a topic predict shifts in public opinion polling on the same topic?

2. Do shifts in bot sentiment on social media predict shifts in non-bot sentiment on the same platform?

3. Does the topic or sentiment measurement technique (individual vs. conversation-based calculation) chosen affect the results of RQ1 or RQ2?

CHAPTER 3

METHODS

Focusing on whether changes in social media data sentiment predict changes in public opinion polling as measured by random sample surveys is a better-informed research direction, given previous findings, than comparing absolute values of the two variables. To do so, previously established best practices will be followed and analysis will focus on variation patterns in the data sets. During the analysis phase of this research, Twitter sentiment will be measured in two distinct ways, thereby exploring the application of the two-step flow in the age of social media, whether previous attempts to update it can be strengthened, or whether a new model might be more accurate. Following this analysis, the collected data will be broken out by bots and non-bots, as defined by BotOMeter (Davis et al., 2016), and the relationship between the two categories will be explored.

Data Collection

Two sources of data will be required: polling data and social media data. For polling data, Rasmussen’s daily presidential approval and Econometric survey data will be accessed. Rasmussen delivers surveys via automated phone calls supplemented by a web-based survey tool. Presidential approval surveys are well known, with a long history, while also focusing on a topic of sufficient social media conversation volume to be measured effectively. Rasmussen ranked around the middle of the pack among major pollsters in the 2008 election (Panagopoulos, 2009) and was among the worst performers between 2017 and 2019 (Silver, 2019), showing a Republican bias. However, Rasmussen is used in this study for two important reasons: first, Gallup discontinued its daily polling in 2019, leaving Rasmussen as the only major pollster producing daily poll results; and second, the nature of this study relies only on consistency of polling methods rather than accuracy. Additionally, the pollster was among the closest predictors of the 2016 U.S. presidential election popular vote margin, having predicted a 2-point Clinton advantage with the actual Clinton advantage calculated at 2.1 percent. Since only changes in measured values, rather than absolute values, will be compared, Rasmussen’s potential biases will not interfere with this study’s accuracy.

Figure 3. Data collection timeline

As Twitter currently offers the best combination of user base size and data availability, and recent studies have suggested its users may act as opinion leaders in the offline space (Barbera, 2016; Ceron et al., 2014; Karlsen, 2015; Weeks et al., 2017), the platform appears to be an effective way to measure the flow of online influence on public opinion. As social media consist of a multitude of platforms, each with its own unique use cases, user base volume, and level of researcher access to its data, representing social media data is a difficult task for any researcher. For this reason, this dissertation will throughout collect Twitter data as a stand-in for social media data. This is, in many ways, inaccurate. Twitter likely has a different demographic and different forms of use than any other social media platform. While Twitter is used more heavily by those wishing to connect to others (Chen) and has been a focal point for many social movements (Wang et al.), suggesting it serves as a platform for societal dialogue, others have shown that discourse on Twitter falls well short of an egalitarian conversational space (Wu et al., 2011). Still, as this study primarily focuses on opinion leaders and whether Twitter plays a role in the formation of their opinions, the platform serves as the source of social media data for this dissertation. The timeline of data collection can be seen in Figure 3.

As shown in Figure 3, tweets, or posts from Twitter, mentioning “Trump”, the current American president, will be collected. As this study is measuring a potential delay in opinion change between Twitter and public polling of up to two weeks, tweets will be collected for 30 days (the timeframe of public polling analysis) plus two weeks. Tweets mentioning “Trump” over the span of 44 days will be randomly sampled in five-minute-per-day increments and grouped by day. Collecting tweets that include the president’s name is a precedent set by previous research measuring Twitter users’ opinions about him (Pasek et al., 2019; Cody et al., 2016; O’Connor et al., 2010). Some research has found that simple keyword collection may introduce enough noise to affect a study’s outcome and suggests that training a machine learning algorithm on the keyword in order to sort a corpus, known as distant supervision, may produce more accurate results (Marchetti-Bowick and Chambers, 2012). While this approach seems promising and deserves future exploration, it remains relatively untested in the field of public opinion, and this dissertation will rely on keyword search. This study will use Rasmussen’s daily presidential approval tracking poll, which is collected Sunday through Thursday, to quantify public opinion of the president. Tweets from Fridays and Saturdays will not be used for this study in an effort to keep the two data sets, tweets and poll results, as closely aligned as possible.
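As a sketch of the daily sampling step, the rtweet package is one client that can read from the streaming API for a fixed window. It is shown only as an illustration under assumptions (rtweet as the client, credentials already configured), not as the collection code used for this dissertation.

    # Illustrative five-minute streaming sample of tweets mentioning "Trump";
    # rtweet is an assumed client choice and API credentials are assumed configured.
    library(rtweet)

    trump_sample <- stream_tweets(q = "Trump", timeout = 300)  # 300 seconds = 5 minutes
    trump_sample$day <- as.Date(trump_sample$created_at)       # used downstream to group by day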

Similarly to tweets about the president, tweets regarding the economy over the same span will be randomly sampled in five-minute-per-day increments and grouped by day. Whereas the president is a singular entity and easy to filter tweets by, a broader concept like the economy is more difficult to define. When using an intentionally noisy data set to distantly supervise a machine learning algorithm, Marchetti-Bowick and Chambers used simple keyword selection (“job”) to filter tweets about the economy (Marchetti-Bowick and Chambers, 2012). O’Connor and his team likewise used simple keyword selection to collect tweets about the economy, choosing “economy”, “job”, and “jobs” (O’Connor et al., 2010). These methods, particularly in the case of O’Connor’s study, risk missing relevant conversations and introducing researcher bias, however. In this dissertation, a method of keyword selection called WordNet will be used (Princeton, 2019). WordNet is a synonym-based lexicon with chains of related concepts grouped under other concepts in a tree-like pattern. This study will begin with O’Connor’s three terms (O’Connor et al., 2010) and, using WordNet’s dictionary, a keyword set semantically similar to those terms will also be included to create an initial corpus. The complete set of terms chosen using WordNet and associated definitions can be seen below. Each umbrella term’s hyponyms, or nested but semantically similar terms, are listed, and each listed term will be used as a keyword when collecting tweets regardless of its place in the semantic order.

• Economy (noun; the system of production and distribution and consumption)

• Job (noun; the principal activity in your life that you do to earn money)

– Synonyms

∗ Occupation (noun)

∗ Business (noun)

∗ Line Of Work (noun)

– Hyponyms

∗ Career (noun; the particular occupation for which you are trained)

∗ Employment (noun; the occupation for which you are paid)

WordNet software will then be used on the initial corpus to narrow the population of tweets to only those using the selected terms in ways semantically similar to modern-day colloquial use of the term “economy”. Likewise, the software will be used to disambiguate the person “Trump” from the verb “to trump”. By using WordNet, this dissertation should be less likely to fall victim to researcher bias and should include more relevant conversations than simple researcher-chosen keyword filtering.
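For readers unfamiliar with WordNet, the sketch below shows roughly how synonyms and hyponyms of a seed term could be pulled with the CRAN wordnet package. It is an illustration under assumptions (a local WordNet dictionary installed and found by initDict(); the filter and pointer arguments may need adjusting), not the keyword-expansion code used here.

    # Rough sketch of seed-term expansion with the CRAN wordnet package.
    # Assumes a local WordNet dictionary is installed and located by initDict().
    library(wordnet)
    initDict()

    synonyms("job", "NOUN")                      # direct synonyms of a seed term

    filter <- getTermFilter("ExactMatchFilter", "job", TRUE)
    term   <- getIndexTerms("NOUN", 1, filter)[[1]]
    synset <- getSynsets(term)[[1]]
    hypos  <- getRelatedSynsets(synset, "~")     # "~" is WordNet's hyponym pointer
    sapply(hypos, getWord)                       # candidate collection keywords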

Measuring opinions on a discrete topic, like the U.S. president, and on a comparatively amorphous subject, like the economy, will likely result in differing outcomes. Exaggerating the difference between the two topics, the corpuses will be treated without regard to the geographic location of the user, as narrowing the corpuses to only those accounts that share geographic data would unnecessarily limit the generalizability of the study. Measuring opinions of English-speaking Twitter users all over the world will likely sharpen the differences in detected sentiment scores for Trump and the economy. A vast majority of English-speaking Twitter users are located in the United States (Clement, 2019) and, while not entirely accurate, comparing Twitter data to United States-based public polling should provide insight into the proposed research questions and has ample precedent for both topics (O’Connor et al., 2010; Ceron et al., 2014; Jensen and Anstead, 2013). Likewise, differences in bot activity are expected between the two topics, as attempts to influence approval of a sitting president may be more common than attempts to influence opinions of the economy.

Polling data to which social media data will be compared must also be accessed. Thirty sets of polls will be accessed, one for each day (Sunday through Thursday) available in the time period, and grouped by day. All times will be based on Eastern Standard Time (EST). O’Connor compared social media data to “consumer confidence” as reflected, in part, by Gallup’s daily “Economic Confidence Index” (O’Connor et al., 2010). As these data are no longer available, Rasmussen’s Econometric data will be used as a close approximation.

Tweets will be collected non-stop over the course of a 30-day period using Twitter’s streaming application programming interface (API) (Twitter, 2019). Twitter’s streaming API gathers tweets from a start point until it is turned off, as opposed to its search API, which gathers tweets from a specific timeframe retroactively. Twitter’s streaming API allows the collection of nearly all posts, while its search API offers access only to pre-culled posts that exclude the lowest quality content (Twitter, 2019). Research has verified Twitter’s claim, with the caveat that rate limits can significantly alter study outcomes and should be considered when designing methods using Twitter data (Tromble et al., 2017). As the second research question requires all data, and as some low-quality content is likely attributable to bot accounts, the streaming API will be used for this study.

All tweets collected Sunday through Thursday will be used for this study, including simple repostings, known as retweets, and repostings with personal comments added, known as quotes. Retweets represent a substantial percentage of overall Twitter activity and thus will be included in conversation-level analysis. While retweets contain only content created by the original poster, for individual-level analysis, this dissertation will count retweets identically to posts, as the retweeter is amplifying the message of the original poster. Quotes, which include both the referenced tweet and the poster’s original content, will be treated as simple tweets and the referenced tweet will not be included in analysis, as whether the individual agrees or disagrees with the quoted tweet relies upon the user’s original content.

Analysis

Analysis will differ so that two models of influence, the two-step model originally proposed by Lazarsfeld and Katz (Katz et al., 1955) and a three-step model wherein opinion leaders turn to social media to establish their opinions with one another, may be tested. Variations in pre-processing and sentiment calculation will distinguish the two analysis processes.

Conversation-Level Measurement

The data will be processed to determine Twitter sentiment and public opinion for each day of the 30-day timeframe. For the Twitter data, each topic will be treated as a separate data set and will be processed using the “sentimentr” package published on the Comprehensive R Archive Network (CRAN) (Rinker, 2019). This package is the most robust and most-used dictionary-based sentiment-calculating software available on CRAN, the most respected archive of R software (R Core Team), though future robustness checks using other sentiment detection software could strengthen these studies’ findings. The package pre-processes text by breaking corpuses (in this case, tweets) into individual sentences and removing most punctuation. Each word in each sentence is then compared to a sentiment dictionary (Jockers, 2019) and assigned a sentiment value on a plus-minus scale. Importantly, sentimentr increases accuracy by compensating for “valence shifters” (e.g., “negators” and “amplifiers”), or words meant to shift the text’s sentiment in some way (e.g., “I seriously do not like pie”, where “seriously” is an amplifier and “not” is a negator). By considering “context clusters”, the package combines the sentiment scores for the words in a sentence to estimate the sentiment of each entire sentence. Using this software package, the average sentiment of each sentence in a tweet will be recorded as the tweet’s sentiment score. The day’s sentiment will then be calculated by averaging the sentiment of all tweets on that day. This process is illustrated in Figure 4.

Figure 4. Conversation-based measurement process
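A minimal sketch of this conversation-level step with sentimentr follows; the tweets data frame and its text and day columns are illustrative assumptions rather than this dissertation’s actual objects.

    # Conversation-level daily sentiment with sentimentr (Rinker, 2019).
    # tweets (columns: text, day) is an assumed, illustrative data frame.
    library(sentimentr)
    library(dplyr)

    scores <- sentiment_by(get_sentences(tweets$text))  # one average score per tweet
    tweets$sentiment <- scores$ave_sentiment

    daily_conversation <- tweets %>%
      group_by(day) %>%
      summarise(conversation_sentiment = mean(sentiment, na.rm = TRUE))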

Sentiment is not identical to opinion. Comparing the sentiment of tweets to public polling data, which measures opinions, is not without concern. How one feels about a subject may not match the opinion one states when asked. This study will attempt to compare the two, however, given that the concepts are seen to be, if not identical, heavily intertwined. A number of studies in the field of public opinion have used sentiment as a stand-in for opinion on Twitter (Ceron et al., 2014; O’Connor et al., 2010; Barbera, 2016), and this study, as it measures changes in Twitter sentiment and compares them to changes in public polling rather than comparing absolute values, should be less susceptible to possible inaccuracy introduced by the subtle differences between the two.

To compare Twitter sentiment to public opinion data, daily Twitter sentiment, economic confidence, and presidential approval will each receive a score for each day in the period. For Twitter sentiment, as the sentimentr package (Rinker, 2019) assigns scores between -1 and 1, each day will receive a score between those two values. For economic confidence, Rasmussen assigns a score between -100 and 100 for each day’s results. For presidential approval, a 0 to 100 scale will be used. These scores will then be compared.

As detailed in O’Connor’s study and used by Barbera in his (Barbera, 2016; O’Connor et al., 2010), averaging scores across multiple days (known as a “rolling average” or “temporal smoothing”) smooths outliers and potentially produces more accurate results. As this dissertation’s data time scale is more microscopic than Barbera’s or O’Connor’s, a three-day rolling average (the smallest used in either study) will be produced for social media sentiment scores. As Rasmussen likewise uses a three-day rolling average, this will allow the two data sets to be compared on terms as similar as possible.
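A three-day trailing average can be computed with, for example, zoo::rollmean; this is a small illustration on an assumed vector of daily scores, not this dissertation’s analysis code.

    # Three-day rolling average of daily sentiment, mirroring Rasmussen's smoothing
    # window. daily_sentiment is an assumed numeric vector with one value per day.
    library(zoo)

    smoothed <- rollmean(daily_sentiment, k = 3, align = "right", fill = NA)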

O’Connor introduced the concept of “lag” into social media measurement within public opinion research, showing that social media data may have the potential to predict changes in public polling data. Considering that this dissertation explores a more compact timeframe than others in this field, O’Connor’s lag times of several weeks between social media sentiment and public polling results are untenable. Instead, this dissertation will analyze lag times of zero, four days (as used in Jungherr’s 2017 study (Jungherr et al., 2017)), one week, and two weeks. Each day of each data set will be given a change score, measured as the current day’s absolute score minus the previous day’s absolute score, and a linear regression (with various lag times applied) will be run to compare these scores. Finally, to avoid overfitting the data, or choosing only results that fit a desired outcome, the methods and associated lag times that will be explored will be preregistered before the survey data are collected (Nosek et al., 2015).
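The change-score construction and a single lag comparison might look like the sketch below; merged_daily and its poll and twitter columns are illustrative assumptions, and the seven-day value stands in for whichever preregistered lag is being tested.

    # Sketch of change scores and one lagged comparison; merged_daily is an assumed
    # data frame with one row per day and columns day, poll, and twitter.
    library(dplyr)

    lag_days <- 7   # one of the preregistered lags: 0, 4, 7, or 14 days

    changes <- merged_daily %>%
      arrange(day) %>%
      mutate(poll_change        = poll - lag(poll),
             twitter_change     = twitter - lag(twitter),
             twitter_change_lag = lag(twitter_change, lag_days))

    fit <- lm(poll_change ~ twitter_change_lag, data = changes)
    summary(fit)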

Individual-Level Measurement

To measure whether evidence of the two-step flow can be detected without the need for a conversational third step, pre-processing to separate out individuals will be undertaken before a second round of data analysis. Each day’s corpus will first be separated by author and, in cases where an author appears more than once in a day’s corpus, the sentiment of their tweets will be averaged in order to acquire a sentiment score for each person for each day. The day’s sentiment will then be calculated as the average of these individual scores. This process can be seen in Figure 5.
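A minimal sketch of this individual-level aggregation, again with an assumed tweets data frame carrying text, author, and day columns, follows.

    # Individual-level daily sentiment: average each author's tweets within a day,
    # then average the author-level scores to obtain the day's sentiment.
    library(sentimentr)
    library(dplyr)

    by_author_day <- with(tweets, sentiment_by(text, by = list(author, day)))

    daily_individual <- by_author_day %>%
      group_by(day) %>%
      summarise(individual_sentiment = mean(ave_sentiment, na.rm = TRUE))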

Figure 5. Individual-based measurement process

A strong argument exists for removing “verified” accounts, or accounts labeled by Twitter as belonging to individuals or organizations of importance to society, from the individual-based corpuses, as Barbera does in his study (2016). Removing these accounts may reduce the potential confound of opinion leaders within the Twitter information ecosystem biasing measurement, as studies have shown that the stratification of influence within the platform (Wu et al., 2011; Choi, 2015) is measurable. Along with various lag times, this study will examine the effect on the results when verified accounts are included in and excluded from analysis. The individual-level sentiment scores will then be compared to public polling data through the same process as described above for conversation-level analysis.

Measuring Bot Influence

To address whether shifts in bot account sentiment are correlated with shifts in overall Twitter sentiment about a topic, a step of analysis will occur before sentiment measurement wherein each corpus is processed to identify automated accounts. Each daily corpus will be divided into “bots” and “non-bots”. For this study, bots will be identified using software known as “botscan” (Wirth et al., 2019) to apply BotOMeter (formerly known as BotOrNot) (Davis et al., 2016) to entire corpuses of tweets. Developed by a team of five researchers, BotOMeter is the most respected and most cited bot detection software available to researchers. Despite this, and despite extensive training of BotOMeter’s algorithms, no current-day bot detection software, nor any in the foreseeable future, will be perfectly reliable, and all suffer from the lack of a so-called ground truth. While algorithms can predict with increasing accuracy which accounts are automated and which are not, lacking definite knowledge of whether the prediction is correct, the previously mentioned ground truth, the results of bot detection software must be taken with at least a small grain of salt. For future research including bot detection, robustness checks using other bot detection software would further strengthen confidence in the studies’ findings. Given, however, the potential importance of issues related to automated communication on social media platforms, using systems like BotOMeter to research the behavior of bots on platforms like Twitter remains valuable.

In botscan's results, bots will be defined as any account scoring higher on the "cap.universal" variable than a benchmark (0.43) found most effective by Pew researchers (Gramlich, 2018). Future research using other thresholds should be conducted to further increase confidence in the software's outcomes. The "cap.universal" variable, "cap" standing for "Complete Automation Probability", assigns higher scores to accounts more likely to be completely automated without regard to language. This is the most stringent benchmark offered by BotOMeter results and allows a researcher to identify only the accounts the BotOMeter algorithm is most confident are bots. To create a subset of bots from each day's corpus, each day's corpus will be analyzed using botscan and the tweets corresponding to identified bots used for further analysis.
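A minimal sketch of that subsetting step, assuming a data frame scores with one row per account and a cap.universal column of Complete Automation Probability values, plus a tweets data frame with an author column; these object names are hypothetical and stand in for whatever botscan returns.

```r
# Accounts above the 0.43 Complete Automation Probability benchmark
# (Gramlich, 2018) are treated as bots; all others as non-bots.
bot_accounts <- scores$author[scores$cap.universal > 0.43]

bot_tweets    <- tweets[tweets$author %in% bot_accounts, ]
nonbot_tweets <- tweets[!(tweets$author %in% bot_accounts), ]
```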

Applying sentimentr (Rinker, 2019) as previously described, the sentiment of each day’s bots and non-bots will be calculated separately and noted. Analysis will follow processes laid out above, though rather than comparing changes in the bot data subset with public opinion polls, they will be compared to changes in non-bot individual and conversational sentiment scores. This analysis will use Jungherr’s

+4 to -4 daily lag scale to measure lag times in potential bot influence (Jungherr et al., 2017). This range, rather than the previously mentioned zero, four-day, one-week, and two-week range, allows for the measurement of directional influence between bots and overall conversation. If bots are simply reacting to Twitter conversation, their sentiment would lag behind overall conversational sentiment, producing a negative lag (as represented by bot sentiment timing minus overall sentiment). The smaller range allows for more nuanced measurement of a tighter timescale, as influence on Twitter should flow more quickly than in off-Twitter interpersonal environments.
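As an illustration of the per-day sentiment calculation with sentimentr, the sketch below assumes the bot and non-bot subsets are data frames with text and day columns (placeholder names); sentiment_by() returns an ave_sentiment value per grouping, which can serve as the daily corpus-level score.

```r
library(sentimentr)

# Daily corpus-level sentiment, computed separately for bots and non-bots.
daily_bot_sentiment    <- with(bot_tweets,    sentiment_by(get_sentences(text), list(day)))
daily_nonbot_sentiment <- with(nonbot_tweets, sentiment_by(get_sentences(text), list(day)))

head(daily_bot_sentiment$ave_sentiment)
```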

Bot tweets are included in the calculation of overall sentiment for each day as they are part of the overall informational ecosystem but will be contrasted against individual-level data. Bots are not included in the individual-level analysis because it is designed only to capture the sentiment of potential offline opinion leaders. By separating and comparing them, this study will explore whether bots may have a role in shaping online discussions between offline opinion leaders.

Final Results

The result of this study’s analysis will be a set of six tables comparing sentiment scores for non-bot individuals and conversation to polling for each topic and bot sentiment scores. Each cell will contain a regression coefficient of change scores between two variables. A linear regression model will be run in

R with either changes in public polling or changes in non-bot social media sentiment as the dependent variable and either changes in social media sentiment or changes in social media bot sentiment as the independent variable. This formula can be understood as the following, where β0 represents the intercept, β1 represents previous data associated with each measured time lag, and β2 represents Twitter sentiment:

Today's Opinion Polling_i = β0 + β1(Yesterday's Polling)_i + β2(Twitter Sentiment)_i + ε_i
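A minimal R sketch of this model, assuming a data frame d in which the change in today's polling, the change in yesterday's polling, and the (appropriately lagged) change in Twitter sentiment have already been aligned by date; the column names are illustrative.

```r
# Mirrors the formula above: an intercept, a coefficient on yesterday's
# polling change, and a coefficient on lagged Twitter sentiment change.
fit <- lm(poll_today ~ poll_yesterday + twitter_sentiment, data = d)
summary(fit)$coefficients
```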

Applying linear regression allows for the analysis of effects from time lag and Twitter sentiment

(or bot sentiment, in the case of comparison to non-bot sentiment) simultaneously while increasing the interpretability of the effect scale. The resulting tables for Research Questions 1 and 3 will include two rows separated by topic, Economy and President. The observation for each cell will be the appropriate regression coefficient for the given lag times: zero lag, four day lag, one-week lag, and two-week lag.

In the final results tables, regression coefficients will be displayed reflecting the relationship between changes in absolute values of public polling data and Twitter sentiment with various lag times applied.

This table will relay the strength of the correlation between changes in public polling and changes in

Twitter sentiment, answering RQ1. Where the strongest correlation is found will also highlight whether there is little to no lag time between the two sets of data, strengthening the one-step flow theory (Bennett and Manheim, 2006), or whether a lag is detected, suggesting that either the two-step flow (Lazarsfeld et al., 1944) or a three-step flow alternative more accurately describes the data. To help determine whether the data are more applicable to the two-step flow or a three-step model, this chart will be duplicated, one displaying conversation-based analysis and another displaying individual-based analysis.

To further explore the effect of removing verified accounts on the analysis, the individual-based chart will again be duplicated, one displaying data with verified users included and another displaying data with verified users removed. In total, these charts will answer RQ3 while illuminating the implications of removing verified users from individual-based Twitter sentiment analysis on public opinion research using social media data.

To recap, Research Questions 1 and 3 will be answered by a total of three charts illustrated by the

final results table:

• Comparing polling data to conversation-based Twitter sentiment

• Comparing polling data to individual-based Twitter sentiment

– Verified users included

– Verified users excluded

The table showing the results of RQ2 will also display regression coefficients reflecting the relationship between changes in absolute values of Twitter bot sentiment of each measured topic and non-bot Twitter sentiment (TS) of the same topics with lag times ranging from -4 days (non-bot sentiment predicts bot sentiment by four days) to +4 days (bot sentiment predicts non-bot sentiment by four days) applied. Analyzing which lag time correlation is highest will highlight whether bots are simply reacting to non-bot Twitter conversation or bots are influencing non-bot Twitter conversation. The strength of the correlations will illustrate the amount of influence one has over the other. To help determine whether bots are more likely to influence individuals or conversation as a whole, this chart will be duplicated, one displaying conversation-based analysis and another displaying individual-based analysis.

Each cell will contain a regression coefficient of change scores between two variables. A linear regression model will be run in R with changes in non-bot social media sentiment as the dependent variable and changes in social media bot sentiment as the independent variable. This formula can be understood as the following, where β0 represents the intercept, β1 represents previous data associated with each measured time lag, and β2 represents Twitter bot sentiment:

Today's Non-Bot Sentiment_i = β0 + β1(Yesterday's Non-Bot Sentiment)_i + β2(Twitter Bot Sentiment)_i + ε_i
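The same model can be fit across the full -4 to +4 range of lags; a hedged sketch follows, assuming equal-length daily change-score vectors nonbot_change and bot_change aligned on the same dates (placeholder names, not the study's code).

```r
# Coefficient on lagged bot sentiment for each lag in -4..+4 days;
# positive lags mean bot sentiment precedes non-bot sentiment.
run_lag <- function(lag) {
  n   <- length(nonbot_change)
  idx <- max(2, lag + 1):min(n, n + lag)   # keep all lagged indices in range
  d <- data.frame(
    nonbot_today     = nonbot_change[idx],
    nonbot_yesterday = nonbot_change[idx - 1],
    bot_lagged       = bot_change[idx - lag]
  )
  coef(lm(nonbot_today ~ nonbot_yesterday + bot_lagged, data = d))["bot_lagged"]
}
sapply(setNames(-4:4, -4:4), run_lag)
```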

To further explore the effect of removing verified accounts on the analysis, the individual-based chart will again be duplicated, one displaying data with verified users included and another displaying data with verified users removed. In total, these charts will answer RQ2 while illuminating the implications of removing verified users from individual-based Twitter sentiment analysis on public opinion research using social media data.

To recap, Research Question 2 will be answered by a total of three charts:

• Comparing Twitter bot sentiment to conversation-based non-bot Twitter sentiment

• Comparing Twitter bot sentiment to individual-based non-bot Twitter sentiment

– Verified users included

– Verified users excluded

In summary, Twitter sentiment data on two topics (the U.S. economy and the U.S. president) will be compared to public polling data on the one hand and Twitter bot sentiment on the other. Previous studies in public opinion research have used conversation-based methods to calculate Twitter sentiment, implying a Twitter-based third step in Lazarsfeld and Katz's two-step flow (Katz et al., 1955), and this dissertation will compare this method to individual-based analysis to explore whether this choice in method most accurately describes opinion formation in the age of social media. By measuring lag times to include the lack of any lag between public opinion and Twitter sentiment, this dissertation will also explore the veracity of Bennett and Manheim's one-step flow (Bennett and Manheim, 2006).

This dissertation will explore opinion formation and the flow of influence in the modern-day information ecosystem. By testing a number of methods and analysis techniques, it will contribute to clarification and understanding of Twitter data's role in public opinion research, how Twitter data can best be understood with regard to its place in the larger information ecosystem, and the level of influence

Twitter bots may have in conversations within Twitter and public opinion at large.

CHAPTER 4

RESULTS AND DISCUSSION

This dissertation's methods were designed to answer three primary questions and explore other data that address those questions. Primarily, this research probes the relationship between social media data, public opinion, and automated social media accounts known as bots. Lastly, it examines whether the way social media sentiment is measured might alter the outcome of research designs and what implications that may have for theoretical approaches to these questions moving forward. It conceptualizes Twitter users as opinion leaders who help disseminate information from media to offline audiences (Karlsen, 2015). To strengthen confidence in the outcomes of these studies, the methods were pre-registered (Nosek et al., 2015).¹

With regard to the first two questions, the pre-registered concept of lag times applies. If one data set predicts another, the second must by extension follow the first. The amount of time between the original data and the predicted effect is known as the lag time. Understanding by how much time one precedes the other should lead to more complete understanding of any detected phenomena.

The following sections will be organized by research question and will review the result of each analysis as well as discuss how those results fit into the dialogue of other research. The most important results and associated visuals will be shown in-text while those remaining can be found in the referenced appendices.

Measuring the Relationship Between Twitter Sentiment and Polling Data

Since the advent of relatively easy-to-access online data from social media over the last decade, these data have tempted researchers. Among them are public opinion researchers who want to know whether social media data might be used to measure public opinion much as random sample polls have been used for decades. By measuring the sentiment, or the amount of positive and negative language, of the social media data, researchers have attempted to use this new data source to predict elections in the way polls are able to do.

¹ Available at https://osf.io/m423h


Table 1. Corpus-based sentiment vs. poll regression coefficient estimate across lag times (standard error in parentheses)

            Zero Lag Poll     Four Day Lag      One Week Lag        Two Week Lag
            (n = 30)          (n = 26)          (n = 23)            (n = 16)
Economy     8.544 (22.335)    -1.247 (21.229)   -2.026 (16.457)     21.214 (15.257)
President   9.823 (18.515)    36.259α (19.014)  -59.953** (15.945)  24.945* (35.276)

α p < 0.1   * p < .05   ** p < .01

Thus far, attempts that have been successful in doing so have relied on non-generalizable methods (Tjong Kim Sang and Bos, 2012; Tumasjan et al., 2010), and later attempts to apply those methods to predict future elections have failed (Gayo-Avello et al., 2011; Jungherr et al.,

2012). Likewise, it seems simply replacing polls with social media data as a stand-in is not possible with current-day methods (Ceron et al., 2014; Fu and Chan, 2013; Pasek et al., 2019). A number of studies have, while researching those questions, detected the possibility that changes in social media sentiment predict changes in public polling (Ceron et al., 2014; Jensen and Anstead, 2013; O’Connor et al., 2010;

Barbera, 2016).

The method exploring this dissertation's first research question was designed as the first to explicitly investigate this phenomenon. It explores the question using two separate topics with pre-registered methods (Nosek et al., 2015) so as to avoid the problems of overfitting and cherry-picking. Finally, if there is a relationship, the method is designed to discover what time span, or lag time, between a change in Twitter sentiment and a change in public polling is most predictive. The study's raw values of polls and Twitter sentiment for both topics can be seen in Figures 6 and 7.


Figure 6. Economic confidence as measured by polls and economic sentiment as measured by Twitter sentiment over time.


Figure 7. Presidential approval as measured by polls and Twitter sentiment over time.

The results of the data’s analysis show that unlike election prediction, changes in social media sentiment data can in some cases and for some topics be used to predict changes in public polling.

Substantial differences in the predictiveness of the data sets representing the two topics were found, as seen in Table 1.

Table 2. Verified user sentiment vs. poll regression coefficient estimate across lag times (standard error in parentheses)

            Zero Lag Poll     Four Day Lag      One Week Lag      Two Week Lag
            (n = 30)          (n = 26)          (n = 23)          (n = 16)
Economy     6.269 (11.968)    21.177α (13.221)  -14.971 (11.119)  3.048 (12.753)
President   0.588 (4.342)     -7.148α (3.971)   -1.687 (4.519)    0.569 (5.566)

α p < 0.1

The results of Table 1 use corpus-based data, while similar results were found using other pre-registered sentiment-measurement methods; those can be found in Appendix A. Each cell is filled by the regression coefficient estimate, showing how predictive a change in sentiment is of a change in public polling for the respective lag period, and the associated standard error. As seen in Figures 6 and

7, there does not appear to be any standard or predictable fluctuation over time in either variable that might account for the predictive nature of Twitter sentiment for polling results.

A post-analysis that was not pre-registered was completed to further explore the results seen in Table 1 and whether verified Twitter users represent a distilled version of opinion leadership in off-Twitter situations as compared to the overall Twitter population. In this post-analysis, verified users were isolated from each sample and regression analyses were run for each time period identically to the original research question's method. Those results were inconclusive, likely due to a small sample size, and can be found in Table 2. Further research is needed into the relationship between verified Twitter users and the rest of the users on the platform and what implications this might have on any potential relationship between Twitter users and public opinion at large.

Per-Topic Results

The results of Table 1 suggest that, at least for the presidential topic, changes in social media sentiment can predict changes in public polling fourteen days later. For the presidential topic, the fourteen-day lag results reached statistical significance for a positive coefficient, the four-day lag results approached significance, and the one-week lag test produced interesting results covered in the Effect

Direction Over Time subsection. However, no result from the economic topic approached statistical significance.

The non-preregistered difference between topics could suggest that the ability to predict polling data with social media sentiment is limited only to topics as clear-cut as presidential approval and might not apply to other, less well-defined, topics. When searching Twitter for tweets about the president, a single term can be used to capture the topic while more decentralized topics like the economy, requiring a number of terms to capture, may be too difficult to measure reliably on Twitter. Researchers in other

fields have likewise found more centralized, focused topics to be more predictive (Azar and Lo, 2016). When predicting the stock market, researchers often avoid decentralized terms like "economy" and instead use a stand-in like a single company listed on the stock market (Pagolu et al., 2016).

It is perhaps most probable that despite adding rigor to keyword selection in previous research

(Marchetti-Bowick and Chambers, 2012; O'Connor et al., 2010), fully and accurately gathering data on a topic as general as the economy remains an imperfect task. As the general nature of the economy data set, with its negative turn in the first seven days before a positive turn at the fourteen-day mark, seems to reflect the presidential data set despite not reaching statistical significance, the overall findings suggest that additional work is needed in future research to narrow and improve social media data on decentralized topics like the economy.

Effect Direction Over Time

When detecting relationships between data, those relationships might be positive or negative. In positive relationships, when the independent variable moves in one direction, the dependent variable moves in the same direction. This dissertation’s pre-registration only accounted for a positive relationship between the data for this study. In Table 1, a positive relationship would be shown as a positive number and a negative relationship shown as a negative number. While the general direction of the point estimates is similar across both topics, starting positive and turning negative before turning positive again, little can be made of the economy data as it does not reach statistical significance. Only the 14-day lag approaches acceptable levels for the economy topic, suggesting some ability to predict but needing confirmation in future studies.

The data show that, for the presidential topic, when Twitter sentiment data moves in one direction, a movement in overall public approval in the same direction is expected around 14 days later. This seems to confirm initial findings from other research (O’Connor et al., 2010; Ceron et al., 2014; Jensen and

Anstead, 2013) and point to Twitter playing a role in the formation and evolution of public opinion.

Additionally, the data show that when Twitter sentiment data moves in one direction, a comparatively strong movement in overall public approval in the opposite direction is predicted about seven days later.

Approaching two weeks after the initial one-week swing in Twitter sentiment, this non-preregistered movement of polling approval in the opposite direction, or reversing, subsides and the data once again show a positive relationship.

The non-preregistered reversal in the predictive relationship between changes in social media sentiment and changes in polling approval is likely not due to chance or to artifacts in the data itself. When potential outliers are removed from the data set, the trend line remains negative for the seven-day lag period.² Additionally, the time-series nature of the data sets and the trend being roughly mirrored in both topic data sets suggest random chance is not the cause of the phenomenon, though the specific time frame in which it was captured is a potential confound. Though studies like that of O'Connor's team (O'Connor et al., 2010) have not previously detected this reversing effect, this is likely due to this research design's uniquely granular daily measurement of polling approval. Using preregistered daily polling data, this study is able to better match the time scales of social media and seems to have detected a phenomenon that deserves more investigation and explanation.

The results using presidential data are quite clear, as the 14-day lag data is significant and the non-preregistered seven-day lag reversing nearly achieves a p value of .001, while the economic results again are less clear. Still, both sets of data follow the same trend line and suggest a previously undetected and not pre-registered phenomenon is taking place. This predictive-ability sign reversing implies that as offline events take place, an increase in positive sentiment about that topic on Twitter predicts a near-term increase in polling approval of that topic, followed by a sharp downturn in polling approval of the topic, followed later by a final significant upturn in polling approval of the topic.

² This analysis can be found in Appendix B.

A potential, but not pre-registered, explanation for the reversing is that after an initial response to an outside event by both public polling and Twitter sentiment in the four-day window, America's increasing political polarization (Jaenicke, 2002; Abramowitz and Webster, 2016) and the differences in how political extremes use Twitter (Faris et al., 2017; Narayanan et al., 2018) lead to a substantial and sudden backlash to the initial event on the platform, whereas public opinion continues moving in the original direction of the public response. As the Twitter backlash fades due to a conversational timeline that is shorter than that of other media (Cody et al., 2015; Nugroho et al., 2017), sentiment on the platform once again suddenly changes and better reflects the overall response to the event, resulting in an overall change in sentiment that predicts a long-term shift in public polling as a result of the event. In Cody's study, the Twitter conversational half-life, or point at which half of the conversation is complete, is roughly one day for offline events and ranges from a week to several weeks for ongoing conversations. This seems to sync well with this study's non-preregistered finding that the reversing effect peaks around the seven-day mark. Ultimately, this is merely speculation and, while the seven-day reversing effect warrants future exploration and confirmation, the study provides evidence that for centralized topics, changes in Twitter sentiment predict changes in poll approval fourteen days later.

Further research is needed to explore timelines longer than fourteen days to confirm whether the positive relationship continues at longer time scales.

Time Lags

A compelling observation about these results is that the most predictive time scale coincides with the reversal in the prediction's direction. By seven days from an initial one-point change in Twitter sentiment of the president, a 60-point change in polling approval is predicted. However, these two data sets (Twitter sentiment and polling approval) are measured on different scales. Twitter sentiment is measured on a

-1 to +1 scale whereas polling approval of the president is measured on a 0-100 scale. A one-point shift in Twitter sentiment, then, would represent half of the largest shift possible in polling approval - an unrealistic occurrence. In absolute terms, the average change in rolling three-day approval of the president (measured on a -1 to +1 scale) in this data set was 0.005 in either direction.


Figure 8. Three-day rolling averages with zero lag of poll data and Twitter sentiment over time for presidential approval.

Using this scale, an average daily shift in Twitter presidential sentiment predicted a 0.3-point shift (on a 0-100 scale) in polling approval of the president in the opposite direction seven days later. Given another week, that same

0.005-point shift in Twitter sentiment predicts a 0.12-point shift in polling approval of the president in the same direction. To see absolute measures of both topics compared with change in each, see Appendix

D.
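As a rough check, these in-text figures are consistent with multiplying the average daily sentiment change by the presidential coefficients in Table 1 (an assumption about how the numbers were derived):

0.005 × (−59.953) ≈ −0.30 polling points at the seven-day lag
0.005 × 24.945 ≈ +0.12 polling points at the two-week lag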

Figure 8 and Figure 9 show how these two variables, sentiment change and polling change, are mapped over time in the data set for the Trump and economy topics, respectively. These charts show how the values of Twitter sentiment and polling of each topic relate to one another with no lag. Figure 10 shows the outcome when using fourteen-day and seven-day lag results to predict poll values for presidential approval. The blue line of Figure 10, representing actual shifts in polling approval, and the figure's yellow line, representing predicted shifts using 14-day lagged sentiment data, follow along closely and show the predictive nature of that data using pre-registered methods. The figure's red line, representing shifts predicted using seven-day lagged sentiment data, appears inverted at points of major shifts and illustrates the not pre-registered reversing effect described in the Effect Direction Over Time section.


Figure 9. Three-day rolling averages with zero lag of poll data and Twitter sentiment over time for economic confidence.


Figure 10. Three-day rolling averages of presidential polling changes and predicted changes in presidential polling approval using fourteen-day and seven-day lagged Twitter sentiment changes.

Overview

These results provide evidence that, in at least centralized and focused conversations like those about the president, researchers have the ability to predict changes in public polling with changes in social media sentiment. The predictive nature of Twitter data for polling data is not linear nor even in one direction and, instead, fluctuates over time before settling some time around or after the 14-day mark.

Measuring the Relationship Between Twitter Sentiment and Bot Sentiment

A growing consensus of research into automated social media accounts, or bots, shows that bots designed to mimic humans can alter the attitudes and behaviors of their audiences (Edwards et al., 2014;

Everett et al., 2016; Wald et al., 2013; Aiello et al., 2012). The path to understanding whether bots can influence national or international conversation, however, is less clear. Some bots are designed to disrupt discourse, particularly around politically-charged topics (Hindman and Barash, 2018; Woolley, 2016;

Ferrara et al., 2016; Bessi and Ferrara, 2016). Quantifying whether these types of bots have influence larger than their direct audiences is challenging, and previous studies have struggled to translate findings from experimental studies of individuals into conclusions about their societal-level influence (Kollanyi et al., 2016;

Forelle et al., 2015).

The pre-registered (Nosek et al., 2015) method investigating this dissertation’s second research question was designed to introduce a new approach to investigating the effect of bots on a scale larger than that of interpersonal interaction. Specifically, this method explores whether changes in the sentiment of bots on Twitter predict changes in overall Twitter sentiment on the same topic or vice versa. Like

RQ1, this method includes two topics to increase confidence in the results and explores what lag times, if any, apply.

The results suggest that changes in bot sentiment are predictive of changes in overall Twitter sentiment on a given topic. The data approach or achieve statistical significance for the economy topic, depending on sentiment measurement technique, and achieve significance at a positive two-day lag for the presidential topic. This shows that bot sentiment changes regarding the president precede changes in overall sentiment of the president in the same direction by one to two days.

Corpus-based results, rounded to three decimal places, can be seen in Table 3. Similar results using individual-based methods of calculating social media sentiment can be found in Appendix C. Each cell is

filled by the regression coefficient estimate, showing how predictive a change in bot sentiment is of a change in overall Twitter sentiment for the respective lag period, and the associated standard error.

Unlike the relationship between polling data and Twitter sentiment, the relationship between changes in bot sentiment and changes in Twitter sentiment appears to be in a pre-registered, positive-only direction. This means as Twitter bot sentiment of a topic moves in one direction, overall Twitter sentiment on the topic is expected to move in the same direction afterward. The scale of this table is more easily interpretable than that of Table 1, as the two variables are measured on the same -1 to +1

Table 3. Corpus-based sentiment vs. bots regression coefficient estimate across lag times between bot sentiment and Twitter sentiment (standard error in parentheses)

               -4 Days   -3        -2        -1        0         +1        +2        +3        +4
Bots (Econ.)   0.096     0.100     -0.231    -0.103    0.234α    0.247α    0.017     -0.114    -0.109
               (0.177)   (0.175)   (0.163)   (0.150)   (0.131)   (0.130)   (0.145)   (0.145)   (0.151)
Bots (Pres.)   -0.019    -0.085    -0.017    -0.003    0.099     -0.035    0.156*    -0.108    -0.055
               (0.064)   (0.061)   (0.061)   (0.068)   (0.066)   (0.068)   (0.069)   (0.074)   (0.075)

α p < 0.1   * p < .05

scale. For example, when bot sentiment of the president shifts by one point, these results predict a 0.156 point shift in the same direction by overall Twitter sentiment - a change of about 15 percent as much as the original shift in bot sentiment.

Figures 11 and 12 show how these two variables, overall Twitter sentiment change and bot sentiment change, are mapped over time in the data set for the Trump and economy topics, respectively. Figure 13 shows the actual change in Twitter sentiment of the president over time versus predicted changes calculated using the

findings of this study.

One example of this phenomenon features Trump’s State of the Union address on February 4th,

2020. After hitting a low for the period, the change in bot sentiment of the president increased before hitting a peak on February 7th. After a small rise immediately following the speech, the change in overall Twitter sentiment of the president lagged before jumping to a peak for the period a few days following the spike in the bot sentiment change. This can be seen in Figure 14.

The strength of the ability of bot sentiment to predict overall Twitter sentiment seems to bolster previous research that points to the potential for bots to influence discourse on Twitter (Bessi and Ferrara, 2016; Howard and Kollanyi, 2016; Howard et al., 2018; Kollanyi et al., 2016). If bot accounts were somehow reacting more quickly than overall Twitter to offline events, one might expect the lag times to be less substantial than one or two days. In contrast, if groups of these accounts are linked into so-called armies (Bessi and Ferrara, 2016) and used to provide consistent pressure in a single direction by exaggerating and amplifying select offline events, one would expect to see the type of results produced by this study. In either case, additional research is warranted.


Figure 11. Three-day rolling averages with zero lag of overall Twitter sentiment change and Twitter bot sentiment change over time for the presidential approval topic.


Figure 12. Three-day rolling averages with zero lag of overall Twitter sentiment change and Twitter bot sentiment change over time for the economic confidence topic.

Sentiment Measurement Techniques Compared

The pre-registered (Nosek et al., 2015) method exploring this dissertation's third research question builds on the overwhelming majority of research investigating the predictive ability of social media data


Figure 13. Three-day rolling averages of Twitter sentiment and predicted change in Twitter sentiment when using only two-day lagged bot sentiment as a predictor.

Figure 14. Three-day rolling averages of overall Twitter sentiment change and Twitter bot sentiment change over time.

in the field of public opinion using corpus-based measurement of social media sentiment (O'Connor et al., 2010; Fu and Chan, 2013; Ceron et al., 2014). From a two-step model perspective (Lazarsfeld et al., 1944), opinion leaders should consume media and formulate their own opinions based on that media before disseminating their newly-formed perspectives to wider audiences. By measuring Twitter sentiment as the average sentiment of all posts, rather than as the average sentiment of each user, these researchers imply that Twitter users are influencing public opinion and that there is a third step in the model wherein opinion leaders form their opinions on Twitter among one another before sharing them more broadly.

This study examines whether corpus-based sentiment measurement is the most predictive way to measure social media sentiment or whether individual-based measurement is more predictive, and it draws theoretical implications for future research from that comparison. Taken on average, the results suggest a small but meaningful improvement in predictive capabilities when using corpus-based measurement of Twitter sentiment rather than individual-based measurement, bolstering the methodological choices of previous researchers in the field and suggesting Twitter may play a measurable role in the opinion formation of Twitter users (and that those users may act as opinion leaders in off-Twitter contexts). The pre-registered difference found between corpus-based and individual-based predictive ability points to the possibility that bot accounts influence the overall Twitter corpus and that a third step in the traditional two-step flow exists wherein opinion leaders converse among one another in order to settle on their own perspectives.

Results for both presidential and economic topics can be seen in Table 4. The first three columns in the table are filled with the average residual standard error (RSE) across all lag periods, showing the estimated average error of a given prediction using the data; smaller numbers represent smaller errors and, thus, more desirable outcomes. The final column is filled with the percent improvement when using corpus-based data to predict rather than individual-based data; higher percentages represent more desirable outcomes.
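A brief sketch of how such a table could be assembled, assuming corpus_models and individual_models are lists of fitted lm objects, one per lag period (hypothetical names); summary()$sigma gives a model's residual standard error, and the percent improvement is computed here relative to the corpus-based RSE, an assumption that appears consistent with the values in Table 4.

```r
# Average residual standard error across lag-specific models, then the
# percent improvement of corpus-based over individual-based measurement.
mean_rse <- function(models) mean(sapply(models, function(m) summary(m)$sigma))

rse_corpus      <- mean_rse(corpus_models)
rse_individual  <- mean_rse(individual_models)
improvement_pct <- (rse_individual - rse_corpus) / rse_corpus * 100
```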

Improvement between corpus-based measurement and individual-based measurement was far more pronounced for questions of bots versus individuals on Twitter (a 4.0% average improvement) than for questions of polling versus Twitter (a 0.3% average improvement). When Twitter users consume conversation on the platform, they encounter both human-created and automation-created content, yet

Table 4 shows data created from pre-registered methods that exclude bot activity from individual-level sentiment calculations.

To further explore the difference in corpus-based measurement improvement across research questions beyond the preregistered methods, the method investigating bots versus individuals on Twitter was rerun using data that include bots in the individual-level calculations, in an effort to isolate any degradation effect bot activity may have had on the predictiveness of the individual-level data. Improvements in RSE by topic and

Table 4. Residual standard error by sentiment calculation method, topic, and research question, with prediction improvement when calculating sentiment by corpus rather than individual

                      Corpus RSE   Individual RSE   Individual (No Verified) RSE   Corpus-Based Improvement
Presidential Topic
  RQ1                 0.605575     0.607450         0.607500                       0.310%
  RQ2                 0.006564     0.006827         0.006851                       4.007%
Economic Topic
  RQ1                 0.983075     0.983575         0.984075                       0.051%
  RQ2                 0.010183     0.010538         0.010696                       3.486%

research question can be found in Table 5.

The data show worse predictiveness, or an increase of the difference between individual-based RSE and corpus-based RSE, in three of four quadrants. These results suggest bots may be most effective when interacting directly with individuals at scale rather than posting tweets themselves. This is speculative, however, and warrants further research.

In each case, eliminating verified users slightly worsens the predictive capacity of the data set, suggesting that Barbera's innovative 2016 study (Barbera, 2016) may have been better served by avoiding this preprocessing step. While verified users are more likely to have larger followings and thus be placed more centrally within networks, removing them seems to inhibit prediction, if only by a small amount.

This dissertation explores whether Twitter users are opinion leaders in offline situations, as suggested by findings from previous research (Karlsen, 2015; Fu and Chan, 2013; Ceron et al., 2014; Barbera, 2016).

Though the findings of this exploration are limited due to a small sample size, further research in this area could explore why eliminating verified users worsens the predictive capacity of data sets and whether it has implications for the role of opinion leaders within the Twitter environment. Specifically, isolating opinion leaders on Twitter and comparing the data they create with data pulled from overall conversations could allow for better understanding of the flow of influence on Twitter.

Table 5. Residual standard error improvement when using corpus-based vs. individual-based calculation

                      Bots Included   Bots Excluded
Presidential Topic
  RQ1                 0.087%          0.310%
  RQ2                 3.778%          4.007%
Economic Topic
  RQ1                 0.061%          0.051%
  RQ2                 1.031%          3.486%

Summary

The results of this dissertation's pre-registered first research question show an ability for changes in the sentiment of Twitter users to predict changes in public polling regarding the president fourteen days later. Additionally, a change in the opposite direction (on a favorable/unfavorable scale) in a seven-day time frame is also seen before the prediction produces the pre-registered positive correlation near the 14-day mark. Those results also suggest that predicting polling data for less focused, diffuse topics like the economy proves more difficult than for centralized, single-term topics like the president.

The second research question's results show that overall Twitter sentiment can be predicted by bot sentiment on the same topic in a one-to-two-day time frame, and the results of the final research question point to the potential for a third step in the modern-day two-step flow (Lazarsfeld et al., 1944) wherein Twitter users settle on their opinions online before disseminating them more broadly. In a non-preregistered exploration, the results of this study also suggest Twitter bots are more influential when interacting directly with human users than when contributing to overall discourse on the platform.

CHAPTER 5

CONCLUSION

Established as the most quantitatively sound way to measure public opinion since at least the mid-1930s, public polling is facing increased pressures. George Gallup, Archibald Crossley and others introduced the idea of sampling a population representatively and these techniques evolved in the following decades toward random sample polling. Recent technological innovations, in particular cell phones and the internet, have challenged the method as communication media change but also provided those using it with increased opportunities.

This dissertation explores just one of those opportunities - using social media to understand public opinion. In particular, it shows that while previous research suggests current methods render election prediction impossible using social media data, predicting changes in polling is one way in which social media data could likely be used to complement greater overall understanding of public opinion. Likewise, the dissertation shows a similar connection between Twitter bot behavior and Twitter users and offers methods and findings that point to measurable influence of bot behavior on public opinion. Lastly, it explores the theoretical implications of differences in how Twitter sentiment is measured within the methods proposed.

The results of this dissertation place social media squarely within the conversation about the crossroads random sample polling faces. As the field of public opinion seeks ever more refined techniques to control for non-response bias, representation accuracy due to evolving communication platforms, and other challenges, social media also appears to have the potential to provide researchers insights regarding public opinion. Specifically, Twitter seems to have a predictive relationship with public opinion as measured by polls, suggesting opinion leaders on the platform can influence one another’s opinions before they are disseminated to the public.

Naturally, not all who post to Twitter are opinion leaders in non-Twitter spaces. Likewise, not all opinion leaders post to Twitter. Even given these allowances, the results of this dissertation reinforce

previous research regarding Twitter users as opinion leaders. Given, in particular, that changes in Twitter sentiment of the president are found to be predictive of changes in his polling approval, the findings of this dissertation bolster those of Rune Karlsen that suggest those who post to Twitter are more likely than average to function as opinion leaders in off-Twitter situations (Karlsen, 2015).

With some of those who post to Twitter seen as opinion leaders in off-Twitter spaces, it becomes perhaps easier to see how influence may in some cases originate on Twitter and disseminate widely enough to influence public opinion as a whole. After opinion leaders settle on their respective opinions among themselves on Twitter, their conversations and opinions may then expand beyond the platform via a process known as intermedia agenda setting (Danielian and Reese, 1989). With the predictive nature of Twitter data for public opinion change detected, the research of Raymond Harder and his colleagues shows a possible mechanism by which this influence may take place (Harder et al., 2017). While the

flow of influence from the media to opinion leaders on Twitter and ultimately to the wider public via intermedia agenda setting was not directly measured by this dissertation, its results imply that this perspective cannot be ruled out.

Predicting Polls

Data cherry-picking and overfitting, or applying models to predict only the outcome to which the data is associated, have been found among studies attempting to predict election outcomes with social media data (Tjong Kim Sang and Bos, 2012; Tumasjan et al., 2010). When rigorous attempts are made to apply methods used to predict one election to other elections, they consistently fail to do so (Gayo-

Avello et al., 2011; Jungherr et al., 2012). Though data chosen after an election and only applicable to that election can be used to build predictive models, they have not been able to extend their usefulness outside of the election upon which they were built. Thus, while predicting elections with social media data may one day prove possible via methods more advanced than those that exist currently, it should for now be considered impossible.

Reproducibility likewise dogs researchers who hope to mirror public polling with social media data

(Beauchamp, 2017). While O’Connor’s team in 2010 came the closest to producing a methodologically sound study (O’Connor et al., 2010), it suffered from the lack of accessibility to more modern method choices and future research attempting to build on their ideas failed to show any correlation between absolute measures of Twitter sentiment and public opinion as measured by random sample surveys

(Ceron et al., 2014; Fu and Chan, 2013; Pasek et al., 2019).

However, a series of studies found evidence that changes in Twitter sentiment might predict changes in public polling. By building off of methods proposed by previous research, this dissertation is the first study to investigate this question directly. Its results suggest that studies like that of Barbera (Barbera,

2016) and Ceron (Ceron et al., 2014) found authentic trends and that Twitter data has a time-lagged relationship with public polling. By using pre-registered (Nosek et al., 2015) methods to avoid cherry-picking or overfitting, this dissertation offers a rigorous quantitative look at the relationship between

Twitter data and public polling. It finds evidence of a relationship between Twitter data and public polling data in centralized topics measured by a single term like "Trump", while more general topics like the "economy" did not reach significance. Specifically, changes in Twitter sentiment of the president were predictive of changes in polling approval of the president 14 days later. In finding evidence of a relationship between Twitter data and public polling data, the results of this study show Twitter may play a role in the formation and/or evolution of public opinion.

The exact role Twitter may play in the establishment of public opinion, however, is still an open question. This dissertation's first study finds that, for the presidential topic, changes in Twitter sentiment are predictive of changes in polling approval 14 days later. It also finds, however, that a change in Twitter sentiment of the president in one direction predicts changes in public opinion in the opposite direction seven days later. This so-called reversing effect has not been found previously and was therefore not pre-registered. Given that the change-based nature of this study is uncharted and that the daily granularity of the data is unique, it is perhaps unsurprising that the results produced an outcome that was not pre-registered. Though this could be detecting political polarization and the differing ways the political extremes use Twitter (Faris et al., 2017; Narayanan et al., 2018), the results certainly suggest they are detecting a rift between the nature of Twitter user opinion and overall American public opinion within political conversations. The reversal of prediction direction could be a logical outcome given that the study measures only those actively engaging in political discourse.

Bot Influence

The body of previous research suggests social media bot accounts can influence individuals on a social level, with certain variables exaggerating their manipulation (Edwards et al., 2014; Everett et al.,

2016; Wald et al., 2013; Aiello et al., 2012). While the goal of political bots, or bots engaged in political conversations on social media, seems to be to disrupt discourse on a broad level (Hindman and Barash,

2018; Woolley, 2016; Ferrara et al., 2016; Bessi and Ferrara, 2016), questions remain about the degree to which interpersonal effects extend further than the individual (Kollanyi et al., 2016; Forelle et al., 2015).

While studies like Forelle's attribute "small" effects to detected bot activity, quantitative measurement in bot research of effect scale on wider systems is lacking. If Twitter users are more likely to be offline opinion leaders (Karlsen, 2015) and the results of previous research suggest the possibility of Twitter users acting as opinion leaders in other contexts (Barbera, 2016; Fu and Chan, 2013), the possibility of automated influence centralization on Twitter is one deserving academic attention.

This dissertation applies two sets of methods to establish that bot activity has a relationship with overall Twitter discourse. The ability of changes in bot sentiment to predict changes in overall sentiment on the same topic peaks in a one-to-two-day window, though excluding bots from the calculation of individual-based sentiment increases its predictive power. These findings provide perspective not only on the time scale of bot influence but also suggest bots are most influential when interacting with individuals directly. While more research is needed to explore the scale of this effect as compared to other influences on public opinion, the results of this dissertation show bot influence is important to further investigate.

Given consistent efforts by bots to shape wider conversation over long enough time scales, it may be possible for these automated accounts to influence the evolution of public opinion as a whole. These

findings add rigor and new methodological approaches to the measurement of society-level bot influence and suggest the field is one worthy of future study.

The Three Step Flow

Much research has been done to validate Lazarsfeld and Katz’s original two-step flow (Lazarsfeld and Merton, 1948) and apply it to modern-day communication technologies (Bennett and Manheim,

2006; Weeks et al., 2017; Stansberry, 2012; Choi, 2015). Exploring the concept of opinion leaders and the dissemination of opinions throughout society naturally has implications for the formation and evolution of public opinion. Understanding how the flow of influence through communication systems, as information from media passes through the opinion leader gateway onto the public at large, changes in response to new communication technologies is a crucial part of better understanding the field of public opinion as a whole.

This dissertation reveals and tests an important latent assumption by previous research in the field of social media's relationship to public opinion. Most studies have assumed that an unmeasured third step exists wherein opinion leaders turn to their peers on Twitter to debate and establish their own opinions before disseminating those opinions to wider offline audiences. These studies make this assumption by measuring Twitter sentiment as a collection of posts, or corpus-based measurement, rather than measuring the sentiment of individuals, which would better fit the traditional two-step model. This dissertation finds data using corpus-based measurement is more predictive of polling data than data using individual-based measurement, suggesting the latent assumption by previous studies is correct and that a third step in the two-step flow exists wherein opinion leaders influence the perspectives of one another.

In finding that, rather than polls and Twitter sentiment reacting at the same time to offline events, changes in Twitter sentiment predict changes in polling and thus there is a period of time (known as a lag) between the two, this dissertation refutes Bennett and Manheim’s one-step model (Bennett and Manheim,

2006). In a one-step model, one would expect a direct and immediate response to outside stimulus on social media and in polling simultaneously, thus producing zero lag. Instead, this dissertation finds that changes in Twitter sentiment predict changes in public polling by between seven and fourteen days.

In this way, this dissertation provides further evidence that an opinion leader-based model remains an accurate way to view the transference of influence in modern society.

Establishing that the assumption these researchers were making holds true, though the size of the effect seems small and requires more investigation, adds to the modern interpretation of the two-step

flow and the role of Twitter in the formation of public opinion. The results of this dissertation further strengthen the concept of opinion leaders serving as information gateways, specifically contradicting

Bennett and Manheim’s one-step model (Bennett and Manheim, 2006), and imply Twitter serves at least some role in the formation of opinion leader perspectives.

Future Research

The attempt to understand how public opinion formation and evolution are altered by digital communication technologies is comparatively recent, and these processes are difficult to measure with absolute certainty.

The structure of social media platforms and the way they are used by consumers are constantly evolving, creating a moving target for scientists interested in how they shape the ways in which their users relate to one another and the offline world. Even more challenging is the study of bot behavior and influence, as the strategies of bot creators likely evolve in response not only to platform countermeasures but also to what the public knows about them. What is clear from the body of research in both fields, however, is that there are processes taking place with enormous potential to disrupt the free flow of information

(Howard and Kollanyi, 2016; Bessi and Ferrara, 2016). The purpose of this research is to defend open discourse by exploring the role of social media in the flow of societal influence and bringing to light ways in which bad actors might gain influence over it. To do this, researchers must stay especially agile, constantly seeking to ask creative questions and offering creative methods to do so. This dissertation is an attempt to do both those things.

A common challenge to researchers of social media is that social media users do not represent populations that include social media non-users (Tufekci, 2014), which can limit the representativeness of research using such data (Gayo-Avello, 2012; Baker et al., 2013). This dissertation self-consciously accepts the possibility that Twitter is an online gathering place for opinion leaders, based on both direct and indirect findings of previous research (Karlsen, 2015; Conway et al., 2015; Groshek and

Clough Groshek, 2013; Molyneux and Mourao, 2019). However, it is difficult to know whether opinion leaders on Twitter are representative of all opinion leaders, leaving the generalization of these studies reliant on the argument that Twitter opinion leaders are either particularly influential in the formation of public opinion or that there is a close enough correlation between on-Twitter and off-Twitter opinion leaders that the measurable difference is negligible. Given the results of this dissertation, there seems to be a compelling argument for further research into the nature of Twitter users and their ability to influence national public opinion.

Likewise, by measuring tweets, this study necessarily ignores Twitter users who do not tweet. As

Twitter roughly reflects the 80-20 rule, where 20 percent of users are responsible for 80 percent of tweets

(LaForme, 2019), measuring tweets is not even a reliable way to measure a representative sample of

Twitter users. While further exploration of so-called lurkers, or Twitter users who do not post tweets, could examine their role in the opinion formation of offline individuals, this dissertation considers those who engage in conversations to be more likely to influence others.

This dissertation was designed to synthesize previous research in the field of social media data's role in public opinion. Using a pre-registered set of methods (Nosek et al., 2015), the studies' results encourage confidence that Twitter plays a substantial and measurable role in the formation of public opinion. In the first study, changes in sentiment of a topic on Twitter are found to be very predictive (far surpassing statistical significance) of changes in public polling of the same topic. In the second study, changes in the sentiment of bot accounts on Twitter with regard to a topic were found to be significantly predictive of changes in overall Twitter sentiment of the same topic. Because of the predictive relationship found between the behavior of Twitter users and offline polling, in particular, this dissertation suggests Twitter plays a measurable role in the construction and evolution of public opinion. The size of this effect, however, is difficult to comprehend without context about other effect sizes. This dissertation provides clear evidence of Twitter's having a role in public opinion formation, but additional research may provide better perspective on what these findings mean as to how influential Twitter is in the larger public opinion ecosystem.

Adding to this discussion were the first study's pre-registered (Nosek et al., 2015) results showing that while changes in Twitter sentiment are predictive of changes in polling approval, the nature of that prediction is not linear. In fact, a negative change in Twitter sentiment of the president, for example, predicts an uptick in presidential approval as measured by polls around seven days later, showing a negative relationship between the two variables. A negative relationship between the two variables is not a possibility that was included in these methods' pre-registration. However, that same negative change in Twitter sentiment of the president significantly predicts a similar and pre-registered downturn in presidential approval as measured by polls around 14 days later, showing a positive relationship between the two variables. Because of the unique granularity of this study, being based on daily poll results, this so-called reversing effect had not previously been detected, and it raises questions about both the similarities and differences between American polling respondents and Twitter users. Additional research is needed to investigate and explain this reversing phenomenon, which should further understanding of the role of

Twitter users as opinion leaders and how large a role Twitter has in the construction and/or dissemination of public opinion.
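To make the lag structure concrete, the sketch below illustrates, in R, how a change-on-change regression can be scanned across lag lengths. It is a minimal illustration only: the data file and column names (daily_series.csv, twitter_sentiment, poll_approval) are hypothetical placeholders, and the single-predictor model shown is an assumption rather than the dissertation's actual pipeline.

daily <- read.csv("daily_series.csv")   # hypothetical file with columns: date, twitter_sentiment, poll_approval

sentiment_change <- diff(daily$twitter_sentiment)   # day-to-day change in Twitter sentiment
approval_change  <- diff(daily$poll_approval)       # day-to-day change in polled approval
n <- length(approval_change)

# Regress each day's approval change on the sentiment change observed lag_days earlier,
# reporting the coefficient at each candidate lag (here, the 7- and 14-day lags discussed above).
for (lag_days in c(7, 14)) {
  fit <- lm(approval_change[(lag_days + 1):n] ~ sentiment_change[1:(n - lag_days)])
  cat(sprintf("%d-day lag coefficient: %.3f\n", lag_days, coef(fit)[2]))
}

Under this framing, a negative coefficient at the seven-day lag alongside a positive coefficient at the fourteen-day lag would reproduce the reversing pattern described above.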

The results of this dissertation's final study also admit a range of interpretations. In all sets of data analyzed, corpus-based measurement was the most predictive method of measuring Twitter sentiment. This suggests that the two-step, opinion-leader-based model holds but that Twitter adds another step in which opinion leaders establish opinions among themselves, strengthening previous findings that, for example, movements on Twitter can influence media coverage both on and off Twitter (Casas et al., 2016). However, the size of the difference varied between RQ1 and RQ2. The data became more predictive when bots were removed, suggesting that the relatively small advantage of corpus-based measurement becomes even smaller when bot activity is excluded from the calculations. While the dissertation's pre-registered results point to a reliable effect in which corpus-based measurement is more predictive than individual-based measures, more research is needed to determine whether this apparently small difference carries a substantial theoretical implication. Likewise, the difference in predictiveness detected in non-preregistered post-analysis when removing bot activity versus including it suggests a potential method for exploring the mechanisms of bot influence in the future.
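As a rough illustration of the distinction between the two measurement strategies, the sketch below scores tweets with the syuzhet package cited in the references. The input file and column names are hypothetical, and the dissertation's actual implementation may differ.

library(syuzhet)                                              # dictionary-based sentiment scoring

tweets <- read.csv("tweets.csv", stringsAsFactors = FALSE)    # hypothetical columns: date, text

# Individual-based measurement: score each tweet separately, then average the scores per day.
tweets$score <- get_sentiment(tweets$text, method = "syuzhet")
individual_daily <- aggregate(score ~ date, data = tweets, FUN = mean)

# Corpus-based measurement: pool each day's tweets into one document, then score the pooled text.
corpus_daily <- aggregate(text ~ date, data = tweets,
                          FUN = function(x) paste(x, collapse = " "))
corpus_daily$score <- get_sentiment(corpus_daily$text, method = "syuzhet")

The two approaches can diverge when a small number of highly active accounts dominate a day's text, which is one way the corpus-based result can carry a different theoretical meaning than a per-user or per-tweet average.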

Contributions

The results of this dissertation complement one another in worrying ways. Twitter, given the results of two methods, appears to play a role in the formation and evolution of public opinion. Within Twitter, bot activity appears to shape the larger discourse, at least within centralized political conversations. This suggests that bad actors can, and perhaps do, influence public opinion through automated social media accounts. Preventing the influence of bad actors on public discourse through social media will require further understanding, empowered by a continuously evolving and innovating body of research.

This dissertation shows that Twitter plays a meaningful role in the formation and evolution of public opinion and should stand as a compelling argument for future research into the exact nature of that role. It examines, in a methodologically rigorous way, a question that has vexed public opinion researchers over the previous decade, demonstrating that there is a relationship between social media and public opinion and arguing for future research into the exact nature of that relationship. Drawing on interpretations of findings from previous studies (Conway et al., 2015; Groshek and Clough Groshek, 2013; Molyneux and Mourao, 2019) as well as explicit findings from another (Karlsen, 2015), this dissertation's results suggest that at least some Twitter users act as opinion leaders in offline scenarios. Methodologically, it shows that the relationship between Twitter sentiment and public polling is change-based and that corpus-based measurement of Twitter sentiment should be the standard for future research. The results also argue for accounting for, and further investigating, the possibility of negative relationships, and for using a wide range of time scales so that the appropriate lag times in the variables' predictions of one another are captured and better understood.

Likewise, it presents a new methodological approach for measuring bot influence at scale and, in doing so, offers evidence that bots may be capable of influencing public opinion at large.

This set of studies contributes, in both topic and method, to the fields of social media's role in public opinion, societal-scale bot influence, and the measurement of Twitter sentiment, among others. The dissertation shows the potential of applying Botometer (Davis et al., 2016) at scale and, by analyzing bots as a group within specific conversations, provides a robust framework for measuring the influence of bot activity on overall public opinion. Likewise, it adds to the field of bot research knowledge about potential avenues by which bots may have the capability to affect society and argues for future research into that question.
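A minimal sketch of that bot-group framing follows, assuming each tweet has already been assigned a sentiment score and a bot probability (for example, from Botometer). The file name, column names, and the 0.5 cutoff are illustrative assumptions rather than the dissertation's actual parameters.

tweets <- read.csv("scored_tweets.csv")     # hypothetical columns: date, score, bot_probability

# Daily mean sentiment for likely-bot accounts and for the conversation as a whole
# (assumes both series cover the same consecutive days).
bot_daily <- aggregate(score ~ date, data = tweets[tweets$bot_probability >= 0.5, ], FUN = mean)
all_daily <- aggregate(score ~ date, data = tweets, FUN = mean)

bot_change <- diff(bot_daily$score)
all_change <- diff(all_daily$score)
n <- length(all_change)

# Does yesterday's change in bot-group sentiment predict today's change in overall sentiment?
summary(lm(all_change[2:n] ~ bot_change[1:(n - 1)]))

Treating the bot accounts as a single group in this way is what allows the question to be asked at the level of the whole conversation rather than account by account.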

Finally, the studies in this dissertation advocate for embracing the principles of open science, particularly in the fields of public opinion research, social media, and big data exploration. Much previous research investigating the relationship of social media to public opinion suffers from data overfitting and cherry-picking, rendering overall findings in the field inconsistent and unreliable. By pre-registering these studies (Nosek et al., 2015) and publishing all data and the software used to gather that data¹, this dissertation produces reliable methods, data, and results that can be replicated, further explored, and further developed. Specifically, the provided repository includes open-source data and software that can be used to replicate this study as advances are made in sentiment and bot-detection applications. More generally, it establishes a baseline approach to an open-science research design in the field and encourages future designs that are pre-registered, open-source, and generalizable.

¹Available at https://github.com/kurtawirth/dissertation

REFERENCES

Abramowitz, Alan I. and Steven Webster (2016). “The rise of negative partisanship and the nationalization of U.S. elections in the 21st century.” Electoral Studies 41, 12–22.

Aiello, Luca Maria, Martina Deplano, Rossano Schifanella, and Giancarlo Ruffo (2012). “People are strange when you’re a stranger.” In Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media.

Al Baghal, Tarek, Luke Sloan, Curtis Jessop, Matthew L. Williams, and Pete Burnap (2019). “Linking Twitter and survey data: The impact of survey mode and demographics on consent rates across three U.K. studies.” Social Science Computer Review.

Allport, Floyd H. (1937). “Toward a science of public opinion.” The Public Opinion Quarterly 1(1), 7–23.

Ampofo, Lawrence, Nick Anstead, and Ben O’Loughlin (2011). “Trust, confidence, and credibility.” Information, Communication & Society 14(6), 850–871.

Annett, Michelle and Grzegorz Kondrak (2008). “A comparison of sentiment analysis techniques: Polarizing movie blogs.” In Sabine Bergler, ed., Advances in Artificial Intelligence, Volume 5032, pp. 25–35. Springer Berlin Heidelberg.

Azar, Pablo D. and Andrew W. Lo (2016). “The wisdom of Twitter crowds: Predicting stock market reactions to FOMC meetings via Twitter feeds.” The Journal of Portfolio Management 42(5), 123–134.

Baker, Reg, J. Michael Brick, Nancy A. Bates, Mike Battaglia, Mick P. Couper, Jill A. Dever, Krista J. Gile, and Roger Tourangeau (2013). “Summary report of the AAPOR task force on non-probability sampling.” Journal of Survey Statistics and Methodology 1(2), 90–143.

Barbera, Pablo (2016). “Less is more? How demographic sample weights can improve public opinion estimates based on Twitter data.” pp. 37. NYU.

Beauchamp, Nicholas (2017). “Predicting and interpolating state-level polls using Twitter textual data.” American Journal of Political Science 61(2), 490–503.

Bennett, W. Lance and Jarol B. Manheim (2006). “The one-step flow of communication.” The ANNALS of the American Academy of Political and Social Science 608(1), 213–232.

Bessi, Alessandro and Emilio Ferrara (2016). “Social bots distort the 2016 U.S. presidential election online discussion.” First Monday 21(11).

Brick, J. Michael, Pat D. Brick, Sarah Dipko, Stanley Presser, Clyde Tucker, and Yangyang Yuan (2007). “Cell phone survey feasibility in the U.S.: Sampling and calling cell numbers versus landline numbers.” Public Opinion Quarterly 71(1), 23–39.


Brick, J. Michael and Douglas Williams (2013). “Explaining rising nonresponse rates in cross-sectional surveys.” The ANNALS of the American Academy of Political and Social Science 645(1), 36–59.

Burstein, Paul (2003). “The impact of public opinion on public policy: A review and an agenda.” Political Research Quarterly 56(1), 29–40.

Casas, Andreu, Ferran Davesa, and Mariluz Congosto (2016). “Media coverage of a “connective” action: The interaction between the 15-M movement and the mass media.” Revista Española de Investigaciones Sociológicas (REIS) 155(1), 73–118.

Ceron, Andrea, Luigi Curini, Stefano M Iacus, and Giuseppe Porro (2014). “Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens’ political preferences with an application to Italy and France.” New Media & Society 16(2), 340–358.

Chen, Gina Masullo. “Tweet this: A uses and gratifications perspective on how active Twitter use gratifies a need to connect with others.” Computers in Human Behavior 27(2), 755–762.

Choi, Sujin (2015). “The two-step flow of communication in Twitter-based public forums.” Social Science Computer Review 33(6), 696–711.

Clement, J. (2019). “Countries with most Twitter users 2019.” URL: https://bit.ly/2UAtCY7, accessed 2019-10-23.

Cody, Emily M., Andrew J. Reagan, Peter Sheridan Dodds, and Christopher M. Danforth (2016). “Public opinion polling with Twitter.” arXiv:1608.02024 [physics].

Cody, Emily M., Andrew J. Reagan, Lewis Mitchell, Peter Sheridan Dodds, and Christopher M. Danforth (2015). “Climate change sentiment on Twitter: An unsolicited public opinion poll.” PLoS ONE 10(8).

Cohn, Nate (2017). “After a tough 2016, many pollsters haven’t changed anything.” URL: https://nyti.ms/3gXApVf, accessed 2019-10-21.

Conrad, Frederick, Roger Tourangeau, Mick Couper, and Chan Zhang (2017). “Reducing speeding in web surveys by providing immediate feedback.” Survey Research Methods 11(1), 45–61.

Conway, Bethany A., Kate Kenski, and Di Wang (2015). “The rise of Twitter in the political campaign: Searching for intermedia agenda-setting effects in the presidential primary.” Journal of Computer-Mediated Communication 20(4), 363–380.

Couper, Mick P. (2013). “Is the sky falling? New technology, changing media, and the future of surveys.” Survey Research Methods 7(3), 145–156.

Couper, Mick P. and Peter V. Miller (2008). “Web survey methods.” Public Opinion Quarterly 72(5), 831–835.

Crossley, Archibald M. (1937). “Straw polls in 1936.” The Public Opinion Quarterly 1(1), 24–35.

Dalkey, Norman C. (1969). “The delphi method: An experimental study of group opinion.” Technical Report RM-5888-PR, Rand Corporation, Santa Monica, CA.

Danielian, Lucig and Stephen Reese (1989). “A closer look at inter-media influences on the agenda-setting process.” In Communication Campaigns About Drugs: Government, Media, and the Public, pp. 47–66. Lawrence Erlbaum Associates, Inc.

Darwish, Kareem, Walid Magdy, and Tahar Zanouda (2017). “Trump vs. Hillary: What went viral during the 2016 U.S. presidential election.” arXiv:1707.03375 [cs].

Davis, Clayton, Onur Varol, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer (2016). “BotOrNot: A system to evaluate social bots.” In Proc. 25th Intl. Conf. Companion on World Wide Web, pp. 273–274.

Diaz, Fernando, Michael Gamon, Jake Hofman, Emre Kiciman, and David Rothschild (2014). “Online and social media data as a flawed continuous panel survey.” Technical report, Microsoft, Redmond, WA.

Dutwin, David (2019). “The need for public opinion research advocacy.” URL: https://www.aapor.org/About-Us/History/Presidential-Addresses/2019-Presidential-Address.aspx, accessed 2020-06-28.

Edwards, Chad, Autumn Edwards, Patric R. Spence, and Ashleigh K. Shelton (2014). “Is that a bot running the social media feed? Testing the differences in perceptions of communication quality for a human agent and a bot agent on Twitter.” Computers in Human Behavior 33, 372–376.

Etter, Lauren (2017). “Rodrigo Duterte turned Facebook into a weapon, with a little help from Facebook.” URL: https://www.bloomberg.com/news/features/2017-12-07/how-rodrigo-duterte-turned-facebook-into-a-weapon-with-a-little-help-from-facebook, accessed 2018-03-25.

Everett, Richard M., Jason R. C. Nurse, and Arnau Erola (2016). “The anatomy of online deception: What makes automated text convincing?” In Proceedings of the 31st Annual ACM Symposium on Applied Computing, SAC ’16, pp. 1115–1120. ACM.

Faris, Robert, Hal Roberts, Bruce Etling, Nikki Bourassa, Ethan Zuckerman, and Yochai Benkler (2017). “Partisanship, propaganda, and disinformation: Online media and the 2016 U.S. presidential election.” Berkman Klein Center Research Publication (6).

Ferrara, Emilio, Onur Varol, Clayton Davis, Filippo Menczer, and Alessandro Flammini (2016). “The rise of social bots.” Communications of the ACM 59(7), 96–104.

Forelle, Michelle C, Philip N. Howard, Andres Monroy-Hernandez, and Saiph Savage (2015). “Political bots and the manipulation of public opinion in Venezuela.” SSRN Electronic Journal.

Fu, King-wa and Chee-hon Chan (2013). “Analyzing online sentiment to predict telephone poll results.” CyberPsychology, Behavior & Social Networking 16(9), 702–707.

Gallup, G. and S. F. Rae (1940). The pulse of democracy: The public-opinion poll and how it works. Simon & Schuster.

Gayo-Avello, D. (2012). “No, you cannot predict elections with Twitter.” IEEE Internet Computing 16(6), 91–94.

Gayo-Avello, Daniel, Panagiotis Takis Metaxas, and Eni Mustafaraj (2011). “Limits of electoral predictions using Twitter.” In Fifth International AAAI Conference on Weblogs and Social Media.

Golder, Scott A. and Michael W. Macy (2011). “Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures.” Science 333(6051), 1878–1881.

Gramlich, John (2018). “How we identified bots on Twitter.” Technical report, Pew Research, Washington, D.C.

Groshek, Jacob and Megan Clough Groshek (2013). “Agenda trending: Reciprocity and the predictive capacity of social network sites in intermedia agenda setting across issues over time.” Social Science Research Network.

Hao, Karen (2020). “Nearly half of Twitter accounts pushing to reopen America may be bots.” URL: https://bit.ly/2Z8ZwfB, accessed 2020-06-25.

Harder, Raymond A., Julie Sevenans, and Peter Van Aelst (2017). “Intermedia agenda setting in the social media age: How traditional players dominate the news agenda in election times.” The International Journal of Press/Politics 22(3), 275–293.

Herbst, Susan (1998). Reading Public Opinion: How Political Actors View the Democratic Process. University of Chicago Press.

Hindman, Matthew and Vlad Barash (2018). “Disinformation, ’fake news’, and influence campaigns on Twitter.” Technical report, Knight Foundation, Miami, FL.

Hjouji, Zakaria el, D. Scott Hunter, Nicolas Guenon des Mesnards, and Tauhid Zaman (2018). “The impact of bots on opinions in social networks.” arXiv:1810.12398 [physics, stat].

Hogan, J. Michael (1985). “Public opinion and American foreign policy: The case of illusory support for the Panama Canal treaties.” Quarterly Journal of Speech 71(3), 302–317.

Hogan, J. Michael (1997). “George Gallup and the rhetoric of scientific democracy.” Communication Monographs 64(2), 161–179.

Hogan, J. Michael and Ted J. Smith (1991). “Polling on the issues, public opinion and the nuclear freeze.” Public Opinion Quarterly 55(4), 534–569.

Howard, Philip N. and Bence Kollanyi (2016). “Bots, #StrongerIn, and #brexit: Computational propaganda during the U.K.-E.U. referendum.” arXiv:1606.06356 [physics].

Howard, Philip N., Samuel C. Woolley, and Ryan Calo (2018). “Algorithms, bots, and political communication in the U.S. 2016 election: The challenge of automated political communication for election law and administration.” Journal of Information Technology & Politics 15(2), 81–93.

Huberty, Mark Edward (2013). “Multi-cycle forecasting of congressional elections with social media.” In Proceedings of the 2nd workshop on Politics, elections and data - PLEAD ’13, pp. 23–30. ACM Press.

Jaenicke, Douglas W. (2002). “Abortion and partisanship in the U.S. congress, 1976–2000: Increasing partisan cohesion and differentiation.” Journal of American Studies 36(1), 1–22.

Jensen, Michael J. and Nick Anstead (2013). “Psephological investigations: Tweets, votes, and unknown unknowns in the Republican nomination process.” Policy & Internet 5(2), 161–182.

Jin, Fang, Wei Wang, Liang Zhao, Edward Dougherty, Yang Cao, Chang-Tien Lu, and Naren Ramakrishnan (2014). “Misinformation propagation in the age of Twitter.” Computer 47(12), 90–94.

Jockers, Matthew L. (2019). “syuzhet.” URL: https://github.com/mjockers/syuzhet, accessed 2019-09-26.

Jungherr, Andreas, Pascal Jurgens, and Harald Schoen (2012). “Why the Pirate party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, A., Sprenger, T. O., Sander, P. G., and Welpe, I. M. “Predicting elections with Twitter: What 140 characters reveal about political sentiment”.” Social Science Computer Review 30(2), 229–234.

Jungherr, Andreas, Harald Schoen, Oliver Posegga, and Pascal Jurgens (2017). “Digital trace data in the study of public opinion: An indicator of attention toward politics rather than political support.” Social Science Computer Review 35(3), 336–356.

Karlsen, Rune (2015). “Followers are opinion leaders: The role of people in the flow of political communication on and beyond social networking sites.” European Journal of Communication, 1–18.

Katz, Elihu, Paul F. Lazarsfeld, and Elmo Roper (1955). Personal Influence: The Part Played by People in the Flow of Mass Communications. Routledge.

Keeter, Scott (2012). “Presidential address: Survey research, its new frontiers, and democracy.” Public Opinion Quarterly 76(3), 600–608.

Keeter, Scott, Nick Hatley, Courtney Kennedy, and Arnold Lau (2017). “What low response rates mean for telephone surveys.” Pew Research, 1–39.

Kennedy, Courtney, Mark Blumenthal, Scott Clement, Joshua D Clinton, Claire Durand, Charles Franklin, Kyley McGeeney, Lee Miringoff, Kristen Olson, Douglas Rivers, et al. (2018). “An evaluation of the 2016 election polls in the United States.” Public Opinion Quarterly 82(1), 1–33.

Kennedy, Ryan, Stefan Wojcik, and David Lazer (2017). “Improving election prediction internationally.” Science 355(6324), 515–520.

Kollanyi, Bence, Philip N. Howard, and Samuel C. Woolley (2016). “Bots and automation over Twitter during the first U.S. presidential debate.” COMPROP Data Memo 2016(1).

LaForme, Ren (2019). “10 percent of Twitter users create 80 percent of tweets, study finds.” Technical report, Poynter, St. Petersburg, FL.

Lang, Kurt and Gladys Engel Lang (1953). “The unique perspective of television and its effect: A pilot study.” American Sociological Review 18(1), 3–12.

Lang, Richard O. (1933). “Review of straw votes: A study of political prediction.” Journal of the American Statistical Association 28(184), 472–474.

Lasswell, Harold D. (1938). Propaganda Technique In The World War. Martino Fine Books.

Lazarsfeld, Paul Felix, Bernard Berelson, and Hazel Gaudet (1944). The People’s Choice: How the Voter Makes Up His Mind in a Presidential Campaign. Columbia University Press.

Lazarsfeld, Paul F and Robert K Merton (1948). Mass Communication, Popular Taste, and Organized Social Action. New York Harper and Brothers.

Lusinchi, Dominic (2012). ““President” Landon and the 1936 Literary Digest poll: Were automobile and telephone owners to blame?” Social Science History 36(1), 23–54.

Marchetti-Bowick, Micol and Nathanael Chambers (2012). “Learning for microblogs with distant supervision: Political forecasting with Twitter.” In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’12, pp. 603–612. Association for Computational Linguistics.

Massey, Douglas S. and Roger Tourangeau (2013). “Introduction: New challenges to social measurement.” Annals of the American Academy of Political and Social Science 645, 6–22.

Mayer-Schonberger, Viktor and Kenneth Cukier (2014). Big Data: A Revolution That Will Transform How We Live, Work, and Think. Mariner Books, Houghton Mifflin Harcourt.

McGregor, Shannon C (2019). “Social media as public opinion: How journalists use social media to represent public opinion.” Journalism 20(8), 1070–1086.

Molyneux, Logan and Rachel R. Mourao (2019). “Political journalists’ normalization of Twitter: Interaction and new affordances.” Journalism Studies 20(2), 248–266.

Narayanan, Vidya, Vlad Barash, John Kelly, Bence Kollanyi, Lisa-Maria Neudert, and Philip N. Howard (2018). “Polarization, partisanship and junk news consumption over social media in the U.S.” arXiv:1803.01845 [cs].

Nicolson, Harold (1937). “British public opinion and foreign policy.” The Public Opinion Quarterly 1(1), 53–63.

Nosek, B. A., G. Alter, G. C. Banks, D. Borsboom, S. D. Bowman, S. J. Breckler, S. Buck, C. D. Chambers, G. Chin, G. Christensen, M. Contestabile, A. Dafoe, E. Eich, J. Freese, R. Glennerster, D. Goroff, D. P. Green, B. Hesse, M. Humphreys, J. Ishiyama, D. Karlan, A. Kraut, A. Lupia, P. Mabry, T. Madon, N. Malhotra, E. Mayo-Wilson, M. McNutt, E. Miguel, E. Levy Paluck, U. Simonsohn, C. Soderberg, B. A. Spellman, J. Turitto, G. VandenBos, S. Vazire, E. J. Wagenmakers, R. Wilson, and T. Yarkoni (2015). “Promoting an open research culture.” Science 348(6242), 1422–1425.

Noussair, C., S. Robin, and B. Ruffieux (2001). “Genetically modified organisms in the food supply: Public opinion vs. consumer behavior.” Purdue University, Department of Economics.

Nugroho, Robertus, Weiliang Zhao, Jian Yang, Cecile Paris, and Surya Nepal (2017). “Using time-sensitive interactions to improve topic derivation in Twitter.” World Wide Web 20(1), 61–87.

O’Connor, Brendan, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith (2010). “From tweets to polls: Linking text sentiment to public opinion time series.” In Proceedings of the International AAAI Conference on Weblogs and Social Media (ICWSM), Volume 11, pp. 122–129.

Open Science Collaboration (2015). “Estimating the reproducibility of psychological science.” Science 349(6251), aac4716.

Pagolu, Venkata Sasank, Kamal Nayan Reddy Challa, Ganapati Panda, and Babita Majhi (2016). “Sentiment analysis of Twitter data for predicting stock market movements.” arXiv:1610.09225 [cs].

Panagopoulos, Costas (2009). “Polls and elections: Preelection poll accuracy in the 2008 general elections.” Presidential Studies Quarterly 39(4), 896–907.

Pasek, Josh, Colleen A. McClain, Frank Newport, and Stephanie Marken (2019). “Who’s tweeting about the president? What big survey data can tell us about digital traces?” Social Science Computer Review.

Porter, Jon (2019). “Twitter removes support for precise geotagging because no one uses it.” URL: https://bit.ly/2zbh7KS, accessed 2019-06-25.

Princeton University (2019). “About WordNet.” URL: https://bit.ly/37bIkJP, accessed 2019-09-10.

Richter, Felix (2019). “Infographic: Landline phones are a dying breed.” URL: https://www.statista.com/chart/2072/landline-phones-in-the-united-states/, accessed 2019-06-13.

Rinker, Tyler (2019). “sentimentr.” URL: https://github.com/trinker/sentimentr, accessed 2019-09-26.

Robinson, Claude E. (1937). “Recent developments in the straw-poll field.” The Public Opinion Quarterly 1(3), 45–56.

Rogers, Theresa F. (1976). “Interviews by telephone and in person: Quality of responses and field performance.” Public Opinion Quarterly 40(1), 51–65.

Salganik, Matthew (2019). Bit by Bit: Social Research in the Digital Age. Princeton University Press.

Savage, Mike and Roger Burrows (2007). “The coming crisis of empirical sociology.” Sociology 41(5), 885–899.

Schober, Michael F., Josh Pasek, Lauren Guggenheim, Cliff Lampe, and Frederick G. Conrad (2016). “Social media analyses for social measurement.” Public Opinion Quarterly 80(1), 180–211.

Schwartz, Raz and Germaine R Halegoua (2015). “The spatial self: Location-based identity performance on social media.” New Media & Society 17(10), 1643–1660.

Silver, Nate (2019). “The state of the polls, 2019.” URL: https://fivethirtyeight.com/features/the-state-of-the-polls-2019/, accessed 2020-01-07.

Slade, Stephanie (2016). “Why polls don’t work.” URL: https://reason.com/2016/01/14/why-polls-dont-work/, accessed 2020-06-28.

Smith, Tom W. (2013). “Survey-research paradigms old and new.” International Journal of Public Opinion Research 25(2), 218–229.

Stansberry, Kathleen (2012). “One-step, two-step, or multi-step flow: The role of influencers in information processing and dissemination in online, interest-based publics.” University of Oregon.

Taylor, Sean J. (2013). “Real scientists make their own data.” URL: https://seanjtaylor.com/post/41463778912/real-scientists-make-their-own-data, accessed 2019-04-17.

Team, R Core. “R: A language and environment for statistical computing.” URL: http://www.R-project.org/, accessed 2019-04-15.

Tjong Kim Sang, Erik and Johan Bos (2012). “Predicting the 2011 Dutch senate election results with Twitter.” In Proceedings of the Workshop on Semantic Analysis in Social Media, pp. 53–60. Association for Computational Linguistics.

Tromble, Rebekah, Andreas Storz, and Daniela Stockmann (2017). “We don’t know what we don’t know: When and how the use of Twitter’s public APIs biases scientific inference.” SSRN Electronic Journal.

Tufekci, Zeynep (2014). “Big questions for social media big data: Representativeness, validity and other methodological pitfalls.” arXiv:1403.7400 [physics].

Tumasjan, Andranik, Timm O. Sprenger, Philipp G. Sandner, and Isabell M. Welpe (2010). “Predicting elections with Twitter: What 140 characters reveal about political sentiment.” In Fourth International AAAI Conference on Weblogs and Social Media.

Twitter. “Standard search API.” URL: https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html, accessed 2018-11-28.

Twitter (2019). “Streaming API.” URL: https://developer.twitter.com/en/docs/tweets/filter-realtime/api-reference/post-statuses-filter.html, accessed 2019-10-10.

Wald, Randall, T. M. Khoshgoftaar, A. Napolitano, and C. Sumner (2013). “Predicting susceptibility to social bots on Twitter.” In 2013 IEEE 14th International Conference on Information Reuse & Integration, pp. 6–13.

Wang, Cheng-Jun, Pian-Pian Wang, and Jonathan J.H. Zhu. “Discussing Occupy Wall Street on Twitter: Longitudinal network analysis of equality, emotion, and stability of public discussion.” Cyberpsychology, Behavior and Social Networking 16(9), 679–685.

Warren, Kenneth F. (2001). In Defense Of Public Opinion Polling. Routledge.

Watson, Brendan R., Rodrigo Zamith, Sarah Cavanah, and Seth C. Lewis (2015). “Are demographics adequate controls for cell-phone-only coverage bias in mass communication research?” Journalism & Mass Communication Quarterly 92(3), 723–743.

Weeks, Brian E., Alberto Ardevol-Abreu, and Homero Gil de Zuniga (2017). “Online influence? Social media use, opinion leadership, and political persuasion.” International Journal of Public Opinion Research 29(2), 214–239.

Wirth, Kurt, Ericka Menchen-Trevino, and Ryan T. Moore (2019). “Bots by topic: Exploring differences in bot activity by conversation topic.” In Proceedings of the 10th International Conference on Social Media and Society, SMSociety ’19, pp. 77–82. ACM.

Woolley, Samuel C. (2016). “Automating power: Social bot interference in global politics.” First Monday 21(4).

Wu, Shaomei, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts (2011). “Who says what to whom on Twitter.” In Proceedings of the 20th International Conference on World Wide Web - WWW ’11, pp. 705. ACM Press.

Zhang, XiaoChi, Lars Kuchinke, Marcella L. Woud, Julia Velten, and Jürgen Margraf (2017). “Survey method matters: Online/offline questionnaires and face-to-face or telephone interviews differ.” Computers in Human Behavior 71, 172–180.

APPENDIX A

RQ1 RESULTS VIA INDIVIDUAL-BASED SENTIMENT MEASUREMENT

Table 6. RQ1 results with individual data (standard error in parentheses)

              Zero Lag Poll      Four Day Lag       One Week Lag         Two Week Lag
              (n = 30)           (n = 26)           (n = 23)             (n = 16)
Economy       9.795 (21.657)     -1.382 (20.646)    -2.214 (16.049)      20.118 (14.988)
President     11.064 (17.684)    36.187 (18.234)    -55.716** (15.569)   19.803 (32.251)

** p < .01

Table 7. RQ1 results with individual but no verified data (standard error in parentheses)

              Zero Lag Poll      Four Day Lag       One Week Lag         Two Week Lag
              (n = 30)           (n = 26)           (n = 23)             (n = 16)
Economy       9.091 (21.327)     -2.002 (20.316)    -1.673 (15.804)      19.616 (14.743)
President     10.634 (17.618)    36.981 (17.995)    -54.953** (15.554)   19.220 (32.161)

** p < .01

APPENDIX B

RQ1 ROBUSTNESS CHECK

The regression model was plotted to identify outliers using a Cook's distance plot. Those outliers were temporarily removed from the data set and another linear regression was run. The estimate returned was -52.098 (compared to the original data's -59.963), showing that the result was not sensitive to outliers and that the relationship remains negative when they are excluded.
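A sketch of that check in R follows, reusing the change-series variable names from the earlier sketch; the names are placeholders, and the Cook's distance cutoff shown is a common rule of thumb rather than necessarily the one used here.

fit <- lm(approval_change ~ sentiment_change)    # the original change-on-change model

cooks <- cooks.distance(fit)                     # influence of each observation on the fit
outliers <- which(cooks > 4 / length(cooks))     # rule-of-thumb cutoff for high influence

# Refit without the flagged observations and compare the coefficient to the full-data estimate.
if (length(outliers) > 0) {
  fit_trimmed <- lm(approval_change[-outliers] ~ sentiment_change[-outliers])
} else {
  fit_trimmed <- fit                             # nothing flagged; keep the original fit
}
summary(fit_trimmed)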

APPENDIX C

RQ2 RESULTS VIA INDIVIDUAL-BASED SENTIMENT MEASUREMENT

Table 8. Individual-based sentiment vs. bots regression coefficient estimate across lag times between bot sentiment and Twitter sentiment (standard error in parentheses)

              -4 Days    -3         -2         -1         0          +1         +2         +3          +4
Bots (Econ.)  0.094      0.127      -0.246     -0.107     0.207      0.269α     0.021      -0.110      -0.119
              (0.183)    (0.180)    (0.168)    (0.155)    (0.137)    (0.133)    (0.155)    (-0.103)    (-0.052)
Bots (Pres.)  -0.024     -0.069     -0.010     0.006      0.085      -0.040     0.155*     -0.103      -0.052
              (0.066)    (0.064)    (0.064)    (0.071)    (0.070)    (0.070)    (0.072)    (0.077)     (0.077)

α p < 0.1   * p < .05

Table 9. Individual-based sentiment without verified users vs. bots regression coefficient estimate across lag times between bot sentiment and Twitter sentiment (standard error in parentheses)

              -4 Days    -3         -2         -1         0          +1         +2         +3         +4
Bots (Econ.)  -0.094     0.128      -0.254     -0.110     0.210      0.279*     0.017      -0.116     -0.131
              (0.186)    (0.183)    (0.171)    (0.157)    (0.139)    (0.135)    (0.153)    (0.154)    (0.159)
Bots (Pres.)  -0.024     -0.075     -0.014     0.007      0.091      -0.036     0.154*     -0.109     -0.056
              (0.067)    (0.064)    (0.064)    (0.071)    (0.070)    (0.070)    (0.072)    (0.077)    (0.077)

* p < .05

APPENDIX D

ABSOLUTE VALUES VS. CHANGE

[Figure: two overlaid daily series from 1/26/2020 to 2/23/2020: Presidential Approval (left Y axis, roughly 40 to 52) and Presidential Approval Change (right Y axis, roughly -2 to 2).]

Figure 15. Absolute presidential approval (left Y axis) compared to presidential approval change (right Y axis).


[Figure: two overlaid daily series from 1/26/2020 to 2/23/2020: Economic Confidence (left Y axis, roughly 140 to 160) and Economic Confidence Change (right Y axis, roughly -3 to 3).]

Figure 16. Absolute economic confidence (left Y axis) compared to economic confidence change (right Y axis).