
“IT DOESN’T MATTER NOW WHO’S RIGHT AND WHO’S NOT:” A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON SOCIAL MEDIA

by

Braeden Bowen

Honors Thesis submitted to the Department of Computer Science of Wittenberg University in partial fulfillment of the requirements for Wittenberg University Honors, April 2021

On April 18, 2019, Special Counsel Robert Mueller III released a 448-page report on Russian influence on the 2016 United States presidential election [32]. In the report, Mueller and his team detailed a vast network of false accounts acting in a coordinated, concerted campaign to influence the outcome of the election and sow systemic distrust in Western democracy. Helmed by the Russian Internet Research Agency (IRA), a state-sponsored organization dedicated to operating the account network, the campaign engaged in "information warfare" to undermine the United States democratic process.

Russia's campaign of influence on the 2016 U.S. election is emblematic of a new breed of warfare designed to achieve long-term foreign policy goals by preying on inherent social vulnerabilities that are amplified by the novelty and anonymity of social media [13]. To this end, state actors can weaponize automated accounts controlled through software [55] to exert influence through the dissemination of a narrative or the production of inorganic support for a person, issue, or event [13].

Research Questions This study asks six core questions about bots, bot activity, and their influence online:

RQ 1: What are bots?
RQ 2: Why do bots work?
RQ 3: When have bot campaigns been executed?
RQ 4: How do bots work?
RQ 5: What do bots do?
RQ 6: How can bots be modeled?

Hypotheses With respect to RQ 6, I will propose BotWise, a model designed to distill average behavior on the social media platform Twitter from a set of real users and compare that data against novel input. Regarding this model, I have three central hypotheses:

H 1: Real users and bots exhibit distinct behavioral patterns on Twitter.
H 2: The behavior of accounts can be modeled based on account data and activity.
H 3: Novel bots can be detected using these models by calculating the difference between modeled behavior and novel behavior.
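
As a rough sketch of the idea behind H 2 and H 3 (not the BotWise implementation itself), average behavior can be modeled as per-feature means and standard deviations over a set of real accounts, and a novel account can then be scored by its distance from that model. The feature names, sample values, and threshold below are hypothetical:

# Illustrative sketch only: the features, sample values, and threshold are
# hypothetical, not the actual BotWise feature set.
from statistics import mean, stdev

# Hypothetical behavioral features extracted per account.
real_accounts = [
    {"tweets_per_day": 4.2, "retweet_ratio": 0.35, "reply_delay_min": 42.0},
    {"tweets_per_day": 6.1, "retweet_ratio": 0.28, "reply_delay_min": 55.0},
    {"tweets_per_day": 3.3, "retweet_ratio": 0.41, "reply_delay_min": 61.0},
]

def build_model(accounts):
    # Model "average" behavior as a per-feature mean and standard deviation (H 2).
    return {f: (mean(a[f] for a in accounts), stdev(a[f] for a in accounts))
            for f in accounts[0]}

def deviation_score(model, account):
    # Mean absolute z-score of a novel account against the modeled behavior (H 3).
    scores = [abs(account[f] - mu) / sigma
              for f, (mu, sigma) in model.items() if sigma > 0]
    return sum(scores) / len(scores)

model = build_model(real_accounts)
novel = {"tweets_per_day": 180.0, "retweet_ratio": 0.97, "reply_delay_min": 0.4}
print("likely bot" if deviation_score(model, novel) > 3.0 else "likely human")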

Bots Automated accounts on social media are not inherently malicious. Originally, software robots, or "bots," were used to post content automatically on a set schedule. Since then, bots have evolved significantly, and can now be used for a variety of innocuous purposes, including marketing, distribution of information, automatic responding, news aggregation, or just for highlighting and reposting interesting content [13].

No matter their purpose, bots are built entirely from human-written code. As a result, every action and decision they are made capable of replicating must be preprogrammed and decided by the account's owner. But because they are largely self-reliant after creation, bots can generate massive amounts of content and data very quickly.

Many limited-use bots make it abundantly clear that they are inhuman actors. Some bots, called social bots, though, attempt to subvert real users by emulating human behavior as closely as possible, creating a mirage of imitation [13]. These accounts may attempt to build a credible persona as a real person in order to avoid detection, sometimes going as far as being partially controlled by a human and partially controlled by software [54]. The more sophisticated the bot, the more effectively it can shroud itself and blend into the landscape of real users online.

Not all social bots are designed benevolently. Malicious bots, those designed with an exploitative or abusive purpose in mind, can also be built from the same framework that creates legitimate social bots. These bad actors are created with the intention of exploiting and manipulating information by infiltrating a population of real, unsuspecting users [13].

If a malicious actor like Russia's Internet Research Agency were invested in creating a large-scale disinformation campaign with bots, a single account would be woefully insufficient to produce meaningful results. Malicious bots can be coordinated with extreme scalability to feign the existence of a unified populace or movement, or to inject disinformation or polarization into an existing community of users [13], [30]. These networks, called "troll factories," "farms," or "botnets," can more effectively enact an influence campaign [9] and are often hired by partisan groups or weaponized by states to underscore or amplify a political narrative.

Social Media Usage In large part, the effectiveness of bots depends on users' willingness to engage with social media. Luckily for bots, social media usage in the U.S. has skyrocketed since the medium's inception in the early 2000's. In 2005, as the Internet began to edge into American life as a mainstay of communication, a mere 5% of Americans reported using social media [40], which was then just a burgeoning new form of online interconnectedness. Just a decade and a half later, almost 75% of Americans found themselves utilizing YouTube, Facebook, Snapchat, Instagram, or Twitter. In a similar study, 90% of Americans 18-29, the youngest age range surveyed, reported activity on social media [39]. In 2020, across the globe, over 3.8 billion people, nearly 49% of the world's population, held a presence on social media [23]. In April 2020 alone, Facebook reported that more than 3 billion of those people had used its products [36].

The success of bots also relies on users' willingness to utilize social media not just as a platform for social connections, but as an information source. Again, the landscape is ripe for influence: in January 2021, more than half (53%) of U.S. adults reported reading news from social media and over two-thirds (68%) reported reading news from news websites [45]. In a 2018 Pew study, over half of Facebook users reported getting their news exclusively from Facebook [14]. In large part, this access to information is free, open, and unrestricted, a novel method for the dissemination of information.

Generally, social media has made the transmission of information easier and faster than ever before [22]. Information that once spread slowly by word of mouth now spreads instantaneously through increasingly massive networks, bringing worldwide communication delays to nearly zero. Platforms like Facebook and Twitter have been marketed by proponents of democracy as a means of increasing democratic participation, free speech, and political engagement [49]. In theory, Sunstein [47] says, social media as a vehicle of self-governance should bolster democratic information sharing. In practice, though, the proliferation of "fake news," disinformation, and polarization has threatened cooperative political participation [47]. While social media was intended to decentralize and popularize democracy and free speech [49], the advent of these new platforms has inadvertently decreased the authority of institutions and the power of public officials to influence the public agenda [27] by subdividing groups of people into unconnected spheres of information.

Social Vulnerabilities Raw code and widespread social media usage alone are not sufficient to usurp an electoral process or disseminate a nationwide disinformation campaign. To successfully avoid detection, spread a narrative, and eventually "hijack" a consumer of social media, bots must work to exploit a number of inherent social vulnerabilities that, while largely predating social media, may be exacerbated by the platforms' novelty and opportunity for relative anonymity [44]. Even the techniques for social exploitation are not new: methods of social self-insertion often mirror traditional methods of exploitation for software and hardware [54].

The primary social vulnerability that bot campaigns may exploit is division. By subdividing large groups of people and herding them into like-minded circles of users inside of which only belief-affirming information flows, campaigns can decentralize political and social discourse, reinforce beliefs, polarize groups, and, eventually, pit groups against one another, even when screens are off [31].

Participatory Media Publicly and commercially, interconnectedness, not disconnectedness, is the animus of social media platforms like Facebook, whose public aim is to connect disparate people and give open access to information [58].

In practice, though, this interconnectedness largely revolves around a user's chosen groups, not the platform's entire user base. A participant in social media is given a number of choices: what platforms to join, who to connect with, who to follow, and what to see. Platforms like Facebook and Twitter revolve around sharing information with users' personal connections and associated groups: a tweet is sent out to all of a user's followers, and a Facebook status update can be seen by anyone within a user's chosen group of "friends." Users can post text, pictures, GIFs, videos, and links to outside sources, including other social media sites. Users also have the ability to restrict who can see the content they post, from anyone on the entire platform to no one at all.

Users choose what content to participate in and interact with and choose which groups to include themselves in. This choice is the first building block of division: while participation in self-selected groups online provides users with a sense of community and belonging [5], it also builds an individual group identity [20] that may leave users open to manipulation of their social vulnerabilities.

Social Media Algorithms

Because so many people use social media, companies have an ever-increasing opportunity to generate massive profits through advertisement revenue. Potential advertisers, then, want to buy advertisement space on the platforms that provide the most eyes on their products [24].

In order to drive profits and increase the visibility of advertisements on their platforms [24], though, social media companies began to compete to create increasingly intricate algorithms designed to keep users on their platforms for longer periods of time [31], increasing both the number of tweets a user saw and the number of advertisements they would see [24].

Traditionally, the content a user saw on their "timeline" or "feed," the front page of the platform that showed a user's chosen content, was displayed chronologically, from newest to oldest. Modern social media algorithms, though, are designed to maximize user engagement by sorting content from most interesting to least interesting [14], rather than simply newest to oldest (although relevancy and recency are still factors).

The most prominent method of sorting content to maximize engagement is a ranking algorithm. On their own, ranking algorithms are designed to prioritize a most likely solution to a given problem. On social media, they are designed to predict and prioritize content that a user is most likely to interact with, thus extending time spent on the platform [24].

Ranking algorithms require a large amount of intricate user data to make acute decisions. To amass this information, platforms like Twitter and Facebook collect "engagement" data [24], including how often a user sees a certain kind of post, how long they look at it, whether they click the photos, videos, or links included in the post, and whether they like, repost, share, or otherwise engage with the content. Interacting with a post repeatedly or at length is seen as positive engagement, which provides a subtle cue to the ranking algorithm that a user may be more interested in that kind of content. Even without any kind of interaction, the length of time spent on a post is enough to trigger a reaction by the algorithm.

When a user opens Twitter, the algorithm pools all the content posted by users they follow and scores each based on the previously collected engagement data [24]. Higher-ranking posts, the content a user is most likely to engage with, are ordered first, and lower-ranking posts are ordered last. Some algorithms may cascade high-engagement posts, saving some for later in the timeline in the hope of further extending time spent consuming content. Meanwhile, advertisements are sprinkled into the ranking, placed in optimized spaces to fully maximize the likelihood that a user sees and engages with them.
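
To make the pattern described above concrete, the sketch below pools posts from followed accounts, scores each against stored engagement data, orders them, and inserts advertisements at fixed slots. The weights, field names, and ad interval are hypothetical illustrations, not Twitter's actual algorithm:

# Illustrative sketch of engagement-based ranking; the weights, fields, and
# ad interval are hypothetical, not Twitter's actual algorithm.

def engagement_score(post, engagement_history):
    # Predicted interest: prior engagement with this topic plus a small recency bonus.
    topic_affinity = engagement_history.get(post["topic"], 0.0)
    recency_bonus = 1.0 / (1.0 + post["age_hours"])
    return 0.8 * topic_affinity + 0.2 * recency_bonus

def build_timeline(posts, engagement_history, ads, ad_interval=3):
    # Pool followed accounts' posts, order by predicted engagement, sprinkle in ads.
    ranked = sorted(posts, key=lambda p: engagement_score(p, engagement_history),
                    reverse=True)
    timeline = []
    for i, post in enumerate(ranked, start=1):
        timeline.append(post)
        if i % ad_interval == 0 and ads:
            timeline.append(ads.pop(0))   # hypothetical fixed ad slot
    return timeline

history = {"politics": 0.9, "sports": 0.2}   # learned from clicks, dwell time, likes
posts = [{"topic": "sports", "age_hours": 1},
         {"topic": "politics", "age_hours": 12},
         {"topic": "politics", "age_hours": 5}]
print(build_timeline(posts, history, ads=[{"topic": "ad", "age_hours": 0}]))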

Social media algorithms are not designed to damage information consumption [49] or facilitate bot campaigns, but users' ability to selectively build their own personalized profile accidentally leaves them vulnerable to a social attack. Consider this scenario: a user follows only conservative pundits on Twitter (e.g., "@TuckerCarlson," "@JudgeJeanine") and interacts only with conservative video posts. If one such pundit inadvertently reposts a video posted by a bot which contains false information, the user is more likely to see that video, and thus absorb that false information, than someone who follows mixed or liberal sources. The algorithm does not consider whether the information is true, only whether a user is likely to want to interact with it.

Viral Spread

Beyond a user's timeline, many platforms also utilize algorithms to aggregate worldwide user activity data into a public "trending" page that collects and repackages a slice of popular or "viral" topics of conversation on the platform.

Virality online is not necessarily random, though. Guadagno et al. [18] found that videos could be targeted at specific users, who themselves could be primed to share content more rapidly and more consistently. Content that forged emotional connections, especially those that evoked positive responses, was shared more often than not.

Algorithms are demonstrably effective at keeping users engaged for longer periods of time. Their subversive and covert nature also makes their manipulation less obvious, a fact that bots are also effective at exploiting. Algorithms have become the standard means of populating a user's feed, from timelines to trending pages to advertisements [24].

Filter Bubbles The primary consequence of algorithmic subdivision of users is what Pariser [35] calls a "filter bubble." By design, social media users adhere to familiar, comfortable spheres that do not challenge their preconceived beliefs and ideas. If, for instance, a liberal user only follows and interacts with liberal accounts, their ranking algorithm will only have liberal content to draw from when creating a tailored feed; as a result, their ranked information will consist of only the most agreeable liberal content available. Even if that user follows some conservative accounts but does not interact with them, those accounts will receive a low ranking and will be less likely to be seen at all.

This is a filter bubble: a lack of variability in the content that algorithms feed to users [35]. In an effort to maximize advertisement revenue, algorithms incidentally seal off a user's access to diverse information, limiting them to the bubble that they created for themselves by interacting with the content that they prefer.

As users follow other users and engage with content that is already familiar to them, the filter bubble inadvertently surrounds them with content that they already know and agree with [35]. Psychologically, Pariser says, content feeds tailored to exclusively agreeable content overinflate users' confidence in their own beliefs and upset the traditional cognitive balance between acquiring new ideas and reinforcing old ones.

While delivering on the promise of an individually personalized feed, ranking algorithms also serve to amplify the principle of confirmation bias, the tendency to accept unverified content as correct if it agrees with previously held beliefs [35]. In this way, filter bubbles act as feedback loops: as users increasingly surround themselves with content that appeals to their existing understanding, the filter bubble of agreeable content becomes denser and more concentrated.

Filter bubbles are the first step towards efficacy for a bot campaign. To be adequately noticed, and thus to disseminate a narrative, a bot needs to be able to work its way into the algorithm pathway that will lead to first contact with a relevant user's filter bubble feed.

Echo Chambers

Bots may also exploit unity as much as division. Mandiberg and Davidson [28] theorized that users' preexisting beliefs, which are folded into the ranking algorithm process, could drive the filter bubble process to a more extreme level, one that may break through individual algorithmic boundaries.

Filter bubbles operate on an individual level— each user's feed is algorithmically tailored to their individual likes and interests. One of the core elements of social media usage, though, is social interaction: users are presented with the choice to follow, unfollow, and even block whomever they please. Given a path of least resistance, users may be accidentally goaded by their filter bubbles into creating for themselves an "ideological cocoon" [16] [47]. Not only are users more likely to read information they agree with inside of their own filter bubble, Karlova and Fisher [22] found, but they are also more likely to share information, news, and content within their established groups if they agree with it, if it interests them, or if they think it would interest someone else within their circle.

Gillani et al. [16] posited that homophily may be to blame for the creation of these "cocoons." Homophily, a real-world phenomenon wherein people instinctively associate with like-minded groups, has found a natural home in filter bubbles online [56]. Unlike real-world public engagement, though, participatory media platforms are just that: participatory. Because humans have a homophilic tendency to select and interact with content that they approve of, they can choose not to participate in or be a part of groups that they do not identify with [28].

On social media, interactions with others are purely voluntary. Users are able to choose whether or not to follow other users. They are able to choose which groups they join and which kinds of content they interact with. They can even completely remove content that does not conform to their chosen culture by blocking it. Beyond being more likely to see information they agree with, social media users are also more likely to share it; if the platform's algorithm takes sharing into consideration when ranking, this may further strengthen a user's filter bubble [1]. This style of behavior is called "selective exposure," and it can quickly eliminate involuntary participation from a user's social media experience [4].

This combination of factors creates what Geschke, Lorenz, and Holtz [15] describe as a "triple filter bubble." Building on Pariser's [35] definition of the algorithmic filter bubble, they propose a three-factor system of filtration: individual, social, and technological filters. Each filter feeds into the others: individuals make their own decisions about which content they willingly consume; in combination, groups of like-minded users make similar choices; and algorithms, knowing only what a user wants to see more of, deliver more individually engaging content.

The triple filter bubble cycle has the effect of partitioning the information landscape between groups of like beliefs. A new user to Twitter, a Democrat, may choose to follow only Democratic-leaning accounts (e.g., "@TheDemocrats," "@SpeakerPelosi," "@MSNBC"), but the information sphere they reside in will ostensibly present overwhelmingly pro-Democratic content that rarely provides an ideological challenge to the Democratic beliefs the user had before joining Twitter.

When like-minded users' filter bubbles overlap, they cooperatively create an echo chamber, into and out of which counter-cultural media and information are unlikely to cross [15]. Echo chambers represent a more concentrated pool of information than a single user's filter bubble or participatory groups: information, shared by users to others on timelines, in reposts, and in direct messages, rarely escapes to the broader platform or into other filter bubbles [18].

Echo chambers can quickly lead to the spread of misinformation [4], even through viral online content called "memes," which are designed to be satirical or humorous in nature. Importantly, Guadagno et al. [18] found that the source of social media content had no impact on a user's decision to share it; only the emotional response it elicited impacted decision-making. An appeal to emotion in an echo chamber can further strengthen the walls of the echo chamber, especially if the appeal has no other grounds of legitimacy.

Group Polarization Echo chambers, Sunstein argues, have an unusual collateral effect: group polarization [47]. When people of similar opinions, those confined within a shared echo chamber, discuss an issue, individuals' positions will not remain unchanged, be moderated, or be curtailed by discussion: they will be extremified [48]. Group polarization also operates outside of the confines of social media, in family groups, ethnic groups, and work groups. Inside of an online echo chamber, though, where the saliency of contrary information is low and the saliency of belief-affirming, emotionally reactive information is high, polarization may be magnified [47]. The content that users settled in an echo chamber produce tends to follow the same pattern of extremization [48].

Traditionally, the counterstrategy for group polarization has been to expose users to information that is inherently contrary to their held belief (e.g., the exemplary Democratic user should be made to have tweets from "@GOP" integrated into their timeline) [47]. The solution may not be so simple, though: recent research suggests that users exposed to contrary information or arguments that counter the authenticity of a supportive source tend to harden their support for their existing argument rather than reevaluate authenticity [31], [16]. Thus, both contrastive and supportive information, when inserted into an echo chamber, can increase polarization within the group.

Extreme opposing content presents another problem for polarization: virality. Content that breaches social norms generates shock value and strong emotions, making it more likely to be circulated than average content [18]. Compounding the visibility of extremity, social media algorithms that categorize and publicize "viral" popular trends utilize that content to maximize engagement. An internal Facebook marketing presentation described the situation bluntly: "Our algorithms exploit the human brain's attraction to divisiveness," one slide said [36].

Because filter bubbles and echo chambers limit the extent to which groups cross-pollinate beliefs and ideas, only extreme and unrepresentative beliefs tend to break through and receive inter-group exposure [31]. This condition, the "if it's outrageous, it's contagious" principle, may provide an answer as to why contrary information on social media tends to push users further away from consensus.

Intention may introduce another complication. Yardi's [56] review of several studies on digital group polarization found that people tended to go online not to agree, but to argue. This may also indicate that users are predisposed to rejecting contrary information, allowing them to fall back on their preexisting beliefs.

Cultivation Echo chambers, filter bubbles, and group polarization are the central social vulnerabilities that bots are able to exploit to deliver a payload of disinformation, but even these may be subject to a much older model of information: cultivation theory.

First theorized in the 1970's to explain television's ability to define issues that viewers believed were important, cultivation theory has more recently been reapplied to the burgeoning social media model. The theory states that by selecting specific issues to discuss, the media "set the agenda" for what issues viewers consider important or pervasive [34].

Just like television, social media may play a role in shaping our perceptions of reality [31]. On social media, though, agenda-setting is self-selective: ranking algorithms and filter bubbles, rather than television producers, frame users' understanding of the world through the information they consume. The information that users choose to post is also self-selective: images or stories may represent only a sliver of reality or may not represent reality at all [34].

Authenticity Malicious bots' manipulation of social vulnerabilities is predicated on a constant and ongoing appeal to the users whom they hope to target by tailoring their identity and content to appeal to those targets. All of this shared content, though, requires authenticity: the perception by average users that a piece of content is legitimate [30]. In a traditional news environment, content with a well-recognized name (e.g., CNN, Wall Street Journal) or expertise on a topic (e.g., U.S. Department of State) often carries authenticity just by nature of being associated with that well-known label. In contrast, content online can acquire authenticity by being posted by an average, or visibly average, user: the more organic, "bottom-up" content is shared or rewarded with interactions, the more authenticity it generates. Content that is false but appeals to a user's filter bubble [35] and appears authentic is more likely to be spread [30].

Authenticity can be achieved at multiple levels of interaction with a piece of content: a tweet itself, the account's profile, the account's recent tweets, the account's recent replies, and the timing and similarity between each of the former. Low-level authenticity analysis requires the least amount of available information, while high-level authenticity checks require most or all available information.

Authenticity may also be tracked over time: users that see a single account repeatedly, view its profile, or regularly interact with its content build long-term reinforcement of the account's perceived legitimacy. Having the hallmarks of a legitimate account, including posting on a variety of topics, can help increase a bot's authenticity.

Authenticity is generated through the passage of deception checks, the process of looking for cues about an account's authenticity or lack thereof. These checks can occur at each layer of authenticity and can help uncover unusual behavior indicative of a bot [58]. Bots that do not have multiple layers of authenticity are less effective at preying on social vulnerabilities and thus less effective at achieving their intended goal.
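
As a rough illustration of how such layered checks might look programmatically, the sketch below starts with the cheapest profile-level cues and then examines recent activity and timing. The specific cues and thresholds are hypothetical heuristics, not a validated detector:

# Illustrative layered deception checks; the cues and thresholds are
# hypothetical heuristics, not a validated detector.
def profile_check(profile):
    # Low-level layer: cheapest cues, using only the account's profile.
    flags = 0
    if not profile["description"]:
        flags += 1                      # empty or vague self-description
    if profile["default_avatar"]:
        flags += 1                      # stock graphic instead of a personal photo
    if not profile["location"]:
        flags += 1                      # missing or vague location
    return flags

def activity_check(recent_tweets):
    # Higher-level layer: needs recent tweets and their timing.
    flags = 0
    gaps = [t["seconds_after_prev"] for t in recent_tweets[1:]]
    if gaps and max(gaps) - min(gaps) < 2:
        flags += 1                      # eerily regular posting cadence
    if len({t["text"] for t in recent_tweets}) < len(recent_tweets):
        flags += 1                      # duplicated content across tweets
    return flags

def authenticity_flags(profile, recent_tweets):
    # More flags across more layers means weaker authenticity.
    return profile_check(profile) + activity_check(recent_tweets)

account = {"description": "", "default_avatar": True, "location": ""}
tweets = [{"text": "example", "seconds_after_prev": 0},
          {"text": "example", "seconds_after_prev": 60},
          {"text": "example", "seconds_after_prev": 61}]
print(authenticity_flags(account, tweets))   # 5 flags for this toy account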


All content and all users on a platform are subject to deception checks, but even cues for deception checks are subject to social vulnerabilities [22]. Content that appeals to a preexisting bias may be able to bypass checks and deliver the desired narrative payload.

Authority If a user has authenticity, they have authority: the perception that, based on authenticity, a user is acceptable and that their information is accurate. Authority can come from either the top or the bottom of an information landscape. "Top-down" authority is derived from a single figure with high authenticity, like official sources, popular figures, or verified users (e.g., "@POTUS," "@VP," "@BarackObama"). "Bottom-up" authority is derived from a wide number of users who may have less authenticity, but who have a consensus, expose a user to a concept repeatedly, or are swept up in some "viral" content (e.g., the sum of the accounts a user is following).

Authority and authenticity are not universal: different users have different standards of authenticity, perform varying degrees of deception checks, and agree variably on the legitimacy of a source. Authority and authenticity are often conflated with agreeability by users: the degree to which they ideologically agree with a piece of content may dictate its reliability. While this is not a legitimate method for deception checks, bots can effectively prey on users by exploiting their social vulnerabilities and predilection for agreeable information to disseminate a desired narrative [22]. Once authenticity and authority have been established, though, a user is likely to accept the information that they consume from these sources, regardless of its actual veracity.

Disinformation Riding the wave of algorithmic sorting and exploiting social vulnerabilities, malicious bots can weaponize authenticity and authority to distribute their primary payloads: damaging information.

As the efficacy of filter bubbles shows, consumers of social media are actively fed information that agrees with their social circles' previously held beliefs, enhancing the effect of the filter bubble feedback loop [2]. In fact, the desire for emotionally and ideologically appealing, low-effort, biased information is so strong that social media users are predisposed to accepting false information as correct if it seems to fit within their filter bubble [2] and if it has perceived authority.

Accounts online can spread three types of information through filter bubbles and echo chambers. Information in its base form is intentionally true and accurate, while misinformation is unintentionally inaccurate, misleading, or false. Critically, though, disinformation, the form of information favored by malicious bots, is more deliberate. Bennett and Livingston [2] define disinformation as "intentional falsehoods spread as news stories or simulated documentary formats to advance political goals." Simply put, disinformation is strategic misinformation [38]. Fig. 1 depicts such a strategy: by mixing partially true and thoroughly false statements in one image, the lines between information and disinformation are easily blurred.


Sparkes-Vian [46] argues that while the democratic nature of online social connectivity should foster inherent counters to inaccurate or intentionally false information, appeals to authenticity and subversive tactics for deception online supersede corrective failsafes and allow disinformation to roost. Disinformation can be so effective at weaponizing biases that it can be spread through filter bubbles in the same manner as factual information [26].

Consensus on the mechanisms of disinformation's efficacy has yet to be reached. Some research has found that deception checks may be "hijacked" by existing biases [38], but this conclusion is undercut by the fact that rationalization of legitimate contrary information spurs increased polarization [31]. Pennycook [38], meanwhile, has concluded that such reasoning does not occur at all in the social media setting: "cognitive laziness," a lack of engagement of critical thinking skills while idly scrolling through social media, may disengage critical reasoning entirely. Social media users that consistently implemented deception checks and reflective reasoning, Pennycook found, were more likely to discern disinformation from information. Even when reading politically supportive or contrastive information, users who performed effective deception checks were better at rooting out misinformation.

Memes "Memes," shared jokes or images that evolve based on an agreed-upon format, offer an easy vehicle for disinformation to spread without necessarily needing to generate the same authenticity cues as fake news [46]. Like disinformation, memes often appeal to emotion or preexisting biases, propagating quickly through filter bubbles. Memes are also easily re-formatted and re-posted with an account-by-account augmentable meaning. According to Sparkes-Vian [46], memes are shared either by "copying a product," where the identical likeness of a piece of content is shared repeatedly, or "copying by instruction," where a base format is agreed upon and variations of the format are shared individually with varying meanings and techniques. Fig. 2 depicts a "copy by instruction" format with disinformation.

Doxing Another less common tactic for spreading disinformation is "doxing," whereby private information or potentially compromising material is published on sites like WikiLeaks or Предатель ("traitor") and redistributed on social media [30]. Doxing, also used as a tactic of the American alt-right on message boards like 4chan and 8chan, has most visibly been used to leak emails from the Democratic National Committee (DNC) in June 2016 and French presidential candidate Emmanuel Macron in May 2017 [29].

Fake News One of the most-discussed mediums for disinformation is "fake news," news stories built on or stylized by disinformation. While the term entered the national lexicon during the 2016 U.S. presidential election, misrepresentative news articles are not a new problem; since the early 2000's, online fake news has been designed specifically to distort a shared reality [57].

Fake news may proliferate along the same channels as disinformation, and as with other forms of disinformation, fake news, inflammatory articles, and conspiracy theories inserted into an echo chamber may increase group polarization [58]. Fake news articles and links may bolster a malicious bot's efforts to self-insert into a filter bubble, especially if headlines offer support for extant beliefs [57].

While disinformation in its raw form often carries little authenticity, disinformation stylized as legitimate news may build authority, even if the source represents a false claim. If a fake website, like that in Fig. 3, can pass low-level deception checks, it can bolster the legitimacy of a claim, thus boosting the authority of a narrative. Of course, fake news can contribute to disinformation by simply being seen and interpreted as legitimate.

Russia's Internet Research Agency effectively weaponized fake news to galvanize its readers away from more credible sources during the 2016 U.S. presidential election. By investing in the bottom-up narrative that the mainstream media was actually "fake news" and that alternative sources were the only legitimate way to understand the world, Russia was able to label the media, users, and platforms attempting to correct false information as suppressors of a real narrative. A similar waning trust in traditional media and political principles allowed the IRA to engage in the complete fabrication of events of its own design [26].

Prior Exposure Fake news is also able to spread effectively by preying on a more subtle social vulnerability: prior exposure. Pennycook, Cannon, and Rand [37] found that simply seeing a statement multiple times increased readers' likelihood of recalling the statement as accurate later. Even statements that were officially marked as false were more likely to be recalled as true later. Similarly, blatantly partisan and implausible headlines, like that of Fig. 3, are more likely to be recalled as true if users are repeatedly exposed to them online. Just a single prior exposure to fake news of any type was enough to increase the likelihood of later misidentification.

Prior exposure to fake news creates the "Illusory Truth Effect:" since repetition increases the cognitive ease with which information is processed, repetition can be incorrectly used to infer accuracy [37].

The illusory truth problem supports Pennycook's [38] cognitive laziness hypothesis: because humans "lazy scroll" on social media and passively consume information, repetition is an easy way to mentally cut corners for processing information. As a result, though, false information that comes across a user's social media feed repeatedly is more likely to be believed, even if it is demonstrably false.

Propaganda Damaging information need not be false at all, though. Propaganda, another form of information, may be true or false, but consistently pushes a political narrative and discourages other viewpoints [49]. Traditional propaganda, like that generated by propaganda factories during the 20th century [25], follows the top-down authority model, being created by a state or organization seeking to influence public opinion. Propaganda on social media, however, may follow the bottom-up authority model: being generated not by a top-down media organization or state, but dispersed laterally by average users [26]. Organic, or seemingly organic, propaganda is more effective than identical, state-generated efforts [46].

One of the central benefits of social media for propagandists is the accessibility of re-distributors: retweets, reposts, and tracked interactions may bolster the visibility of a narrative simply because real users interacted with it. Fig. 4 depicts a piece of top-down propaganda that attempts to utilize factual information to appeal to a lateral group of re-distributors.

Just like disinformation, propaganda can spread rapidly online when introduced to directly target a filter bubble [49]. Unlike traditional disinformation, though, propaganda that preys on social vulnerabilities is not designed for a reader to believe in its premise, but to radicalize doubt in truth altogether [7].

Bottom-up propaganda can be mimicked inorganically by top-down actors: distributors of disinformation and propaganda engage in "camouflage" to disseminate seemingly legitimate content through online circles [26]. To effectively shroud propaganda as being organic, bottom-up content, distributors must build a visible perception of authority.


Cognitive Hacking Exploiting social vulnerabilities, weaponizing algorithms, and deploying disinformation and propaganda all serve the ultimate aim for malicious bots: manipulation. Linvill, Boatwright, Grant, and Warren [26] suggested that these methods can be used to engage in "cognitive hacking," exploiting an individual's predisposed social vulnerabilities. While consumers of social media content should judge content by checking for cues that may decrease authenticity and credibility [22], content that both appeals to a preconceived viewpoint and appears authentic is able to bypass deception checks [30].

The traditional media tactic of agenda setting argues that media coverage influences public perceptions of which issues are salient. In contrast, Linvill and Warren [27] suggest that public agenda building, the behavioral responses to social movements online, can be influenced by disinformation and propaganda. Mass media altering issue salience in mass audiences (agenda setting) is not as effective as mass audiences generating issue salience collectively (agenda building).

State-sponsored efforts to influence the public agenda are less effective than naturally generated public discussion, and as such efforts to alter the public's agenda building are more effective when they are generated either by citizens themselves or by users that appear to be citizens [27]. To this end, malign actors utilize bots in social media environments to disseminate their narrative.

Hacking in Practice Karlova and Fisher [22] argue that malicious actors can exploit social vulnerabilities to disseminate disinformation and propaganda. Governments and organizations that engage in the manipulation of information, currency, and political narratives online and on social media have been labeled by the European Commission as "hybrid threats:" states capable of engaging in both traditional and so-called "non-linear" warfare waged entirely with information [2].

One such state is Russia, whose modern disinformation campaigns rose from the ashes of the Soviet Union's extensive propaganda campaigns throughout the 20th century [25].

Case Study: The Internet Research Agency The centerpiece of Russian social media disinformation campaigns since at least 2013 has been the St. Petersburg-based Internet Research Agency [7], a state-sponsored troll factory dedicated to creating and managing bots [27].

The IRA's reach was deep: measuring IRA-generated tweet data and Facebook advertisement log data, up to one in 40,000 internet users were exposed to IRA content per day from 2015 to 2017 [21].

The IRA's efforts were first exposed after a massive, multi-year coordinated effort to influence the outcome of the 2016 United States presidential election [32]. Their goal was not to pursue a particular definition of truth and policy, but to prevent social media users from being able to trust authorities, to encourage them to question what they were told, and to make truth indistinguishable from disinformation [7].


To those ends, the IRA utilized bots in a variety of multi-national campaigns to amplify a range of viewpoints and orientations to decrease coordination in both liberal and conservative camps [27]. Early on, Russian-operated accounts inserted themselves into natural political discourse on Twitter, Facebook, and Instagram to disseminate sensational, misleading, or even outright false information [26]. They worked to "delegitimize knowledge" not at the top levels of public discourse, but at the ground level of interpersonal communication online.

To create a sense of authenticity and bottom-up authority [30], IRA accounts on Twitter, Facebook, and Instagram built identities as legitimate citizens and organizations with a spectrum of political affiliations, from deep partisan bias to no affiliation at all [27]. Many accounts acted in concert, generating a fluid machine process, which Linvill and Warren [26] liken to a modern propaganda factory.

To overcome, or perhaps exploit, social media filter bubbles, the IRA generally operated a wide variety of accounts, including pro-left, pro-right, and seemingly non-partisan news organizations [26]. Increasing authenticity in these political circles meant posting overtly political content and relevant "camouflage" that signals to a user that the account is, in fact, operated by a legitimate citizen.

Case Study: 2014 Ukrainian Protests In one of Russia's earliest bot operations, state actors conducted disinformation operations in nearby Ukraine beginning in 2014 [30]. Protest movements in response to Russian-sympathetic Ukrainian president Viktor Yanukovych flourished on social media, but both Russian and Ukrainian authorities were able to disrupt those movements by inserting disinformation into the social media platforms on which protestors were actively planning.

The Ukrainian social media protests occurred just a few years after the Arab Spring protests in which Twitter and Facebook played vital roles in organization, dissemination of information, and free expression. The speed at which information was passed regarding the Ukrainian protests online heightened insecurity at the upper levels of the Russian and Ukrainian governments [30].

Russian and Ukrainian state actors posing as protestors inserted disinformation into social circles. A 2013 photo of a Syrian war victim was used to claim that Ukrainian soldiers had attacked a young boy in Ukraine [30]. Screenshots from the notoriously violent Belarussian war film "The Brest Fortress" were used to show a little girl crying over the body of her mother. While both images were demonstrably false, the content was still used both to dissuade protestors from publicly joining the effort and to destabilize Ukrainian citizen coordination. Because it came from outwardly "regular" or legitimate sources and thus carried both authenticity and bottom-up authority, the false content carried inherently high credibility with protestors [30].

Ukraine was one of Russia's earliest forays into social media manipulation, and a new step away from direct intimidation of opponents [30]. Its experiments in limited online social circles showed state actors that citizens can actively participate in the creation and dissemination of disinformation and propaganda [30] without the state having to play a previously overt role [25].


Case Study: 2016 U.S. Presidential Election The genesis of the national conversation about bot campaigns was the 2016 U.S. presidential election, where a coordinated effort by the IRA first sought to destabilize the U.S. political system and erode trust in the political process [32].

The IRA's goal in the election was not to directly support a Trump presidency, but to sow discord, foster antagonism, spread distrust in authorities, and amplify extremist viewpoints [27]. Their methods, though, seemed to favor a Trump presidency overall [26].

On May 17, 2017, former FBI director Robert Mueller was appointed to head a special counsel investigation into Russian interference in the election. While much of the 448-page report, which was released on April 18, 2019, remains redacted, publicly available information in the report details the IRA's reliance on advertisements, Facebook groups, and Twitter trolls to spread disinformation [32].

In sum, the IRA employed nearly 3,900 false Twitter accounts to produce, amplify, and insert disinformation, propaganda, fake news, and divisive content into preexisting American social media circles [26]. The organization utilized a combination of bot techniques and preyed on several social vulnerabilities to sow discord on Twitter, operating accounts with explicitly partisan leanings ("@TEN_GOP") and accounts with bottom-up authenticity ("@Pamela_Moore13," "@jenn_abrams") [32]. IRA Twitter content was viewed and interacted with over 414 million times on the platform between 2015 and 2017 [14].

On Facebook, the IRA focused on acquiring the support of existing American users with partisan groups ("Being Patriotic," "Secured Borders," "Tea Party News") and social justice groups ("Blacktivist," "Black Matters," "United Muslims of America") [32]. The IRA also purchased approximately $100,000 worth of algorithmically targeted Facebook advertisements promoting IRA-operated groups and pro-Trump, anti-Clinton messaging. IRA Facebook content reached a total of 126 million American users [14].

Hall [19] found that American citizens online were unable to differentiate between false 2016 election content and real content on either platform. Even if users were able to distinguish between the two, false information and propaganda often still affected user opinion.

Case Study: 2017 French Presidential Election Just five months after its successful influence campaign in the U.S., the IRA set its sights on the French presidential election. Evidence collected by Ferrara [12] suggested that troll factories attributed to the Kremlin assembled a coordinated campaign against centrist candidate Emmanuel Macron and his party, "En Marche," in France's 2017 presidential election. 4chan.org, a mostly American far-right message board, fostered the initial discussion of leaking or even manufacturing documents to incriminate the Macron campaign. On May 5, 2017, just two days before the presidential election, a coordinated doxing effort released "MacronLeaks" to the public via well-known leak aggregator WikiLeaks, mirroring a similar Russian effort against the DNC in the United States the previous year [11].


Russian disinformation groups seized on the initial MacronLeaks dump, coordinating another bottom-up assault via social media [12]. They dispatched bots which largely amplified existing narratives, re-tweeting tweets pertaining to the dump or sending tweets which included "#MacronLeaks," "#MacronGate," or just "#Macron." The bots also generated tweets aimed at Marine Le Pen, a far-right candidate and Macron's electoral opponent.

Beginning on April 30, 2017, the bot campaign ramped up quickly, generating up to 300 tweets per minute at its peak, and continued through Election Day, May 7, before fading out the following day. When the MacronLeaks dump occurred on May 5, bot-generated content regarding Macron and the election began to increase. At its peak, bot-generated content competed with or matched human-generated posts, suggesting that the MacronLeaks campaign generated substantial public attention. Additionally, increases in bot traffic and content on Twitter tended to slightly precede corresponding increases in human content, suggesting that bots were able to "cognitively hack" human conversations and generate new discussion topics, particularly regarding controversial issues [12].

The bot campaign in France, though, has been largely regarded as a failure: the program, despite generating substantial online discussion, had little success at swaying French voters [12]. It has been suggested that because the majority of bot-generated tweets were in English, not French, the language selection both snubbed native French participation and engaged the American anti-Macron user base, contributing to the volume of tweets about the MacronLeaks dump.

Case Study: 2020 U.S. Presidential Election After the successful execution of the bot operation during the 2016 election, Russia turned its sights on the 2020 election, which it again hoped to mold in favor of its foreign policy interests. The Mueller investigation, though, had thoroughly exposed the IRA's playbook by 2018, and social media platforms had already begun reacting to pervasive misinformation with more stringent restrictions [53], necessitating a new strategy for influence. Reorganizing under the less stigmatized moniker "Lakhta Internet Research" (LIR), which first operated during the 2018 U.S. midterm elections, the IRA focused its mission fully on amplifying existing domestic narratives and issues [33].

In April 2020, a cooperative investigation between CNN and Clemson University behavioral science researchers Darren Linvill and Patrick Warren uncovered one of the LIR's new tactics: outsourcing [52]. A pair of proxy troll farms, located in Ghana and Nigeria, produced content specifically targeting Black Americans, working to inflame racial tensions and heighten awareness of police brutality. In just eight months, LIR-owned accounts operated out of these troll farms garnered almost 345,000 followers across Twitter, Instagram, and Facebook.

The exposure of the Ghanaian and Nigerian proxy troll farms did not deter Moscow or the LIR. Just four months later, on August 7, 2020, National Counterintelligence and Security Center director William Evanina released a statement outlining the Intelligence Community's confidence in a sustained, pervasive social media influence operation to sway voters and increase social discord [10]. Simultaneously, a Carnegie Mellon University study found that 82% of the most influential accounts on social media distributing information about the COVID-19 pandemic were bots [58].

But on November 2, 2020, just one day before Election Day, researchers at the Foreign Policy Research Institute (FPRI) noted that it was state news channels and official state sources, not swarms of bots and trolls, that were pushing divisive and anti-democratic narratives about the election, particularly utilizing phrases like "rigged" and "civil war" [3]. A separate analysis by the FPRI on the same day came to the same conclusion: state-sponsored networks, personalities, and figures seemed to be doing the heavy lifting of disinformation distribution, especially on the issue of mail-in voting, which had received a similar stamp of "fraudulence" from state sources [51]. After the election, it appeared Moscow's efforts to bolster President Trump's reelection had failed.

In a post-election autopsy report on election interference, the National Intelligence Council (NIC) dissected intelligence on the LIR's tactics, successful or otherwise, during the election cycle [33]. Rather than focus on generating bottom-up authority with an army of bots online, the Kremlin seemingly took an approach of top-down authority instead, using existing social media personas, unreliable news websites, and visible U.S. figures and politicians to deliver its divisive and sometimes outright false messaging. Its comparatively minor bot campaign, centered around the LIR, served mostly to amplify U.S. media coverage of the messaging it had pushed through its other mediums, and tended to ride the coattails of established personas online, including, the report hints, Rudy Giuliani, President Trump's personal lawyer, and President Trump himself.

The report also noted, though, that in the month leading up to the election, Moscow opted to shift its tactics again, beginning to discredit an incoming Biden administration and the results of the election rather than continue to support what it viewed as a shrinking possibility for a Trump reelection. After the election, it doubled down on the "rigging" narrative, raising questions about the validity of Biden's election, exclusively to foster further distrust of the system [33].

Watts [53] argues that the efficacy of bots, trolls, and foreign influence, though, paled in comparison to the wide-reaching success of domestic disinformation. Top-down authority was more effective in the 2020 U.S. presidential election, Watts argues, because one of Moscow's most effective amplifiers of disinformation was already in the White House. Indeed, the NIC found that the operation largely rested on existing, popular figures to disseminate the narratives Moscow had chosen [33]. Meanwhile, filter bubbles, echo chambers, group polarization, and preexisting domestic attitudes culminated in the violent January 6, 2021 raid of the U.S. Capitol by Trump supporters who cited the very claims of fraudulence that the Kremlin's disinformation campaign had produced.

If Russia's efforts in the 2016 U.S. presidential election laid the groundwork for the social diffusion of discord, the 2020 election was a successful trial run. Despite its candidate's loss in the election, Moscow's influence operation was able to convince a significant portion of the U.S. population that their election system was illegitimate and broken enough to warrant an insurrection against it [53]. This succinctly meets one of Moscow's long-term foreign policy goals of destabilizing the Western-led Liberal International Order.

Bot Tactics In these cases, malicious bots were successful because they effectively bred bottom-up authority and consensus on the narratives they delivered. To exploit social vulnerabilities and "hack" users in these manners, though, bots must utilize a number of tactics to create their authentic personas and bolster their authenticity and thus authority.

Bypassing Controls In order to operate a successful network of bots, the accounts must first be created and designed. When a traditional user creates their account, they are prompted to enter their name, email, phone number, and date of birth. They must also confirm their legitimacy by recalling a numeric code sent to their email address or phone number. Occasionally, users may also be asked to solve a CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart"), a test designed to differentiate artificial and human users which often involves mouse movement tracking or picture recognition.

Platforms like Twitter have gradually increased the difficulty of creating new accounts to dissuade troll factory creators from producing hordes of accounts en masse. Similarly, Google, which offers simple email account creation tools, increasingly requires cross-authentication to create new accounts. Additionally, both platforms may track the number of new account creations on a given IP address, blocking the address after a certain number of creations in a given period.

On Twitter, developer accounts with tightly controlled Application Programming Interface (API) access are required to own and control bots. Rates of connection to the API, both for reading and writing, are limited to prevent abuse and massive data collection.
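
For legitimate, research-style data collection under these limits, reads are typically made through a registered developer account and a client that respects the rate limits. A minimal sketch, assuming the third-party tweepy library and placeholder credentials:

# Minimal sketch of rate-limit-respecting data collection through a Twitter
# developer account, using the tweepy library; credentials are placeholders.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# wait_on_rate_limit pauses automatically when the API's read limits are hit
# rather than failing or hammering the endpoint.
api = tweepy.API(auth, wait_on_rate_limit=True)

# Read a sample of a public account's recent tweets within the allowed rate.
for tweet in api.user_timeline(screen_name="example_user", count=200):
    print(tweet.created_at, tweet.text[:80])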

A single bot or small group of accounts, though, is largely insufficient to produce the desired illusion of bottom-up authority required to conduct an effective campaign of influence. As restrictions become more stringent, the cybersecurity arms race has encouraged illicit troll factories to craft new tools to work around these restrictions and maximize their output for the creation of a bot army.

Rather than apply for a Twitter developer account and work under the confines of rate limits, HTML parsers in programming languages like Python allow troll factories to bypass the API entirely by reading, interpreting, and hijacking the Twitter Web App interface, which is not rate limited. As a result, many bots may display a disproportionate usage of the Twitter Web App.

Similarly, programs and scripts designed to bypass proxy limiting, phone verification, and CAPTCHA checks are freely available and accessible online. Tools like the PVA Account Creator combine these tactics into one functioning application, allowing users to farm accounts with little resistance or repudiation from Twitter or Google.

Obfuscation Once an account is created and Twitter's restrictions have been bypassed, the more difficult task of maintaining secrecy and effective operation begins. If a bot wants to remain online and avoid detection by real users employing deception checks, it must engage in a number of obfuscation tactics to increase its authority and blend in with real users.

The most glaring flaw of the bottom-up propaganda model, of which bots are a facet, is that creating the individual authenticity required to support bottom-up propaganda is much more challenging than supporting a state-powered, top-down propaganda model: creating authenticity for thousands of accounts is more difficult than creating authority for one. The primary solution to this problem is, ostensibly, to generate assembly-line authenticity and authority.

Evasion Because bots are fabricated and do not have lives that may help authenticate a social media profile, false accounts often rely on a degree of anonymity to survive [8]. This is evasion: generating authenticity by making a user's profile seem innocuous.

Real users often have profile photos of themselves, a background photo to match, and personal details in their description. They may also include links to outside sites and their location. Bots, however, do not have any real personally identifiable information (PII), so the information that builds a profile must be taken from other places. Vague descriptions, generic graphics as profile pictures, and vague locations may indicate bot behavior [8]. Usernames (e.g., "@JamesSt38454788," "@MarshallTuck") also provide clues: algorithmically generated usernames (e.g., "@kaywell52163396") generally include strings of numbers or words that do not relate to the bot's display name (e.g., "puppy12345 / @monkakashi"). Bot accounts created in batches may randomly select from a list of seemingly legitimate display names while picking outlandish or completely unrelated usernames [58], or a display name and a profile picture that are similarly unrelated.
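These cues lend themselves to simple heuristics. The sketch below is a hypothetical illustration and not part of BotWise: the `looks_generated` helper and its thresholds are my own assumptions, flagging handles that end in a long digit string or that share no letters with the display name.

```python
import re

def looks_generated(username: str, display_name: str) -> bool:
    """Heuristic check (illustrative only) for algorithmically generated handles.

    Flags handles that end in a long digit string or share no alphabetic
    overlap with the display name. Thresholds are arbitrary assumptions.
    """
    handle = username.lstrip("@").lower()
    trailing_digits = re.search(r"\d{5,}$", handle)           # e.g., "kaywell52163396"
    handle_letters = set(re.sub(r"[^a-z]", "", handle))
    name_letters = set(re.sub(r"[^a-z]", "", display_name.lower()))
    no_overlap = bool(name_letters) and not (handle_letters & name_letters)
    return bool(trailing_digits) or no_overlap

print(looks_generated("@kaywell52163396", "kaywell"))    # True (digit string)
print(looks_generated("@monkakashi", "puppy12345"))      # True (no letter overlap)
print(looks_generated("@MarshallTuck", "Marshall Tuck")) # False
```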

When users perform implicit deception checks, profile photos, display names, and other pieces of PII provide easy fodder for cueing a user into an account's legitimacy. The relative anonymity of social media, though, makes evasion much easier. As users scroll through their feed, each tweet they see is marked by the name and profile photo of an author, but that information is not inherently accurate for any user, let alone bots.

Camouflage Building authenticity through user profiles is only part of the equation for obfuscation: authenticity in content is also required [8]. Camouflage is the process of posting and reposting innocuous, average content to fill a user's post history with a variety of topics and sources. Not only does this tactic broaden the visibility of a malicious bot's payload, but it also bolsters the long-term authenticity of a bot by deflecting more critical deception checks that reach beyond basic account information into user posts and reposts.

The Internet Research Agency's accounts regularly utilized this technique to avoid detection. Many accounts spent most of their posts camouflaging themselves with content that linked them to a particular social identity rather than an overtly and obviously political one [7]. Only around 20% of posts were used to advance an overtly political narrative; the other 80% were reserved for camouflage. On Facebook, the IRA created pages tailored to special interest groups like the Black Lives Matter movement and Christian fundamentalists, building networks of real, extremist users that would naturally bolster and disseminate disinformation, decreasing the necessity for inorganic camouflage [14].

Diminishing Returns Bots can operate effectively by establishing themselves as real, trustworthy identities. In order to craft an effective disguise, though, bot owners must make distinct compromises on the thoroughness of their tactics of evasion and camouflage. Critically, the more human a bot appears online, the less effective it is at disseminating a narrative rapidly and widely. Put differently, the more camouflaged a bot is, the less it can perform its intended duties, and the less camouflaged a bot is, the more it risks easy detection.

Bot behavior, then, represents a trade-off between effectiveness and restraint; even very flimsy and easily detectable bots make critical decisions about account details. The commonly identifiable traits that bots exhibit may provide a foothold for the identification of false accounts and may be used in the implementation of individual-level deception checks.

Spamming One of the most obvious markers of a bot is a high activity rate: posting dozens or even hundreds of times per day, especially in rapid succession, is implausible behavior for a real user [8]. Because they are operated entirely within software, though, bots may tweet more often than is humanly possible [58]. This method of rapid-fire tweeting, called spamming, is effective for increasing the chances that a narrative is seen by real users, especially if it is targeted. Spamming can be a dangerous tactic, though, as it makes high-level deception checks more likely to find malicious activity, especially if spammed tweets are identical or nearly identical.
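Activity rate is straightforward to quantify from tweet timestamps. The following is a rough sketch, not BotWise's own code, assuming timestamps have already been parsed from the API's "created_at" field into datetime objects.

```python
from datetime import datetime, timedelta

def tweets_per_day(timestamps):
    """Average daily posting rate over the span of the supplied timestamps."""
    if len(timestamps) < 2:
        return float(len(timestamps))
    span = max(timestamps) - min(timestamps)
    days = max(span / timedelta(days=1), 1e-9)
    return len(timestamps) / days

def min_gap_seconds(timestamps):
    """Shortest interval between consecutive tweets, in seconds."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return min(gaps) if gaps else None

# 50 tweets spaced 20 seconds apart: a burst no human is likely to sustain.
burst = [datetime(2021, 4, 5, 12, 0) + timedelta(seconds=20 * i) for i in range(50)]
print(round(tweets_per_day(burst)))  # ~4400 tweets per day
print(min_gap_seconds(burst))        # 20.0
```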

Hijacking Creating original content is much harder for bots than copying others', so amplifying existing messages is often the most common type of activity for a bot [8]. The tweets page of a bot will likely consist of reposts and identical replications of article headlines, verbatim quotes, or commonly replicated messages.

Similarly, bots in a network may be directed to interact with specific content: accounts that regularly repost the same content in similar orders, or that post identical or similar original content, may be part of a coordinated effort.
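Repetition of this kind can be surfaced with very simple text normalization. The sketch below is illustrative only; the normalization rules (lowercasing, stripping URLs and whitespace) are assumptions and are far cruder than a production duplicate detector.

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    """Collapse case, URLs, and whitespace so near-identical tweets compare equal."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def duplicate_ratio(tweets):
    """Fraction of an account's tweets whose normalized text repeats."""
    counts = Counter(normalize(t) for t in tweets)
    dupes = sum(c for c in counts.values() if c > 1)
    return dupes / len(tweets) if tweets else 0.0

def shared_content(account_a, account_b):
    """Normalized tweet texts posted by both accounts (a possible coordination cue)."""
    return {normalize(t) for t in account_a} & {normalize(t) for t in account_b}

a = ["Read this! https://t.co/abc", "read this!", "Good morning"]
b = ["READ THIS!  https://t.co/xyz", "unrelated post"]
print(duplicate_ratio(a))    # 0.666..., two of three tweets repeat
print(shared_content(a, b))  # {'read this!'}
```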

In order to expand their reach, bots may regularly utilize popular hashtags or viral trends [27]. While making their content more likely to be seen and their accounts more likely to be followed, tweets that "hijack" popular content from real users also serve as a method of camouflage and bolster long-term authenticity. Bots may also manipulate the group polarization vulnerability by portraying both sides of a political argument, poorly representing the argument of one side and downplaying the extremity of the other: both sides, having consumed the conversation, walk away more convinced, both of the legitimacy of their own side and of the incompetency of the other [31].

Narrative Switching Bots are written and operated with a sole purpose in mind. That purpose, though, may shift over time, especially if a bot is controlled as a product. Narrative switching is the process by which a bot transitions from one topic to another. For bots intended to spread disinformation, this process often involves posting innocuous camouflage content at first and slowly transitioning to more insidious content over time [7]. Like camouflage content, narrative switching is intended to promote an account's perceived authenticity and improve social identity building, both in the short and long term.

Bots designed to promote products, meanwhile, may engage in narrative switching by promoting wildly different products concurrently, or by displaying extreme differences between their promotions from month to month as their use is sold and transferred. Bots like these may be bought and sold or leased for a time.

Many Internet Research Agency accounts engaged in both kinds of narrative switching. Often, bots would manage dual identities, posting both overtly pro-Democratic and pro-Republican content at seemingly random intervals [7]. These narratives were often inconsistent, as accounts were repurposed or relocated due to IRA operational needs.

Like spamming, this method of narrative switching is particularly dangerous for a bot owner to use: a high-level deception check should detect the extreme differences between promoted content.

Previous Analyses Prior work on bot detection models has been largely divided into three categories: detection based on social network information, detection based on human intelligence and crowdsourcing, and machine learning methods designed to distill quantifiable differences between bots and real users [13].

Many models have been based on supervised machine learning techniques utilizing a random forest classifier, an ensemble of decision trees each built over a randomly selected subset of account characteristics [41], [55], [54], [43], [6].

Twitter's own internal bot detection systems, though, are predicated on holistic behavior analysis rather than content [42], [50]. Twitter itself suspends, in its own words, "millions" of accounts per month, especially if they are producing significant loads of spam. Twitter is reluctant, though, to base its own bot analysis on models of human behavior, content, and engagement, primarily so as to avoid banning or blocking real users whose behavior happened to match that of a bot.

Methodology While not the complete solution to combatting bot campaigns, filter bubbles, and disinformation on social media, a reliable method for the detection of bots may be a first step. With the common characteristics, traditional tactics, and previous analyses of bot accounts in mind, I propose BotWise, a model designed to capture and recognize the behavioral, linguistic, and data disparities between real users and bots on the social media platform Twitter. BotWise utilizes algorithmic and machine learning methods to distill discrete differences between the behavior of bot accounts and human accounts. By defining an "acceptable norm" of human behavior on the platform, accounts with behavioral patterns outside of the norm may be flagged as potentially inhuman.

BotWise inputs a single Twitter username (e.g., "@POTUS," "@nytimes") and outputs a binary classification: "LIKELY BOT" or "NOT LIKELY BOT." The completed and functioning model is presently available at https://github.com/BraedenLB/BotWise.


Datasets This model primarily uses a subset of Yang, Varol, Hui, and Menczer's [55] "midterm-18" dataset of 8,092 English-language human accounts. A 42,446-account subset of bot accounts also included in the dataset was not used, as nearly every account had already been suspended by Twitter and the associated archived JSON data was insufficient for the model's comparatively high parameter count. Due to rate limits on Twitter's API for user streams, BotWise trains on 800 accounts by default, but this number is flexible given expanded Twitter API rate limits.

An additional 100-account set of bot accounts was created by manually labeling artificial accounts that interacted with the "@POTUS" and "@WhiteHouse" Twitter accounts from April 5-6, 2021, based on previously defined criteria for bot identification on the platform.

Features and Data Points BotWise utilizes the Twitter API for Developers to access relevant account and tweet metadata from a target user. To stay within the API rate limit of 1,500 tweets per 900 seconds, the model waits 960 seconds between intervals of 1,500 tweets. For this reason, training on large numbers of accounts concurrently may take significantly longer than the actual computation time (0.996 seconds per account on average).
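As an illustration of how such a waiting period might be scheduled (a sketch under assumptions, not the published BotWise code; `fetch_tweets` is a placeholder for whatever API client is in use, and the constants mirror the figures quoted above):

```python
import time

RATE_WINDOW_TWEETS = 1500   # tweets allowed per rate-limit window (per the text)
WAIT_SECONDS = 960          # pause used between windows

def collect(usernames, fetch_tweets, per_account=25):
    """Pull up to `per_account` tweets per username, pausing whenever the
    running total would cross the 1,500-tweet window.

    `fetch_tweets(username, limit)` is assumed to return a list of tweet objects.
    """
    collected, pulled_in_window = {}, 0
    for name in usernames:
        if pulled_in_window + per_account > RATE_WINDOW_TWEETS:
            time.sleep(WAIT_SECONDS)      # wait out the rate-limit window
            pulled_in_window = 0
        tweets = fetch_tweets(name, per_account)
        collected[name] = tweets
        pulled_in_window += len(tweets)
    return collected
```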

12 raw metadata points are collected from a user's account information, and 11 raw metadata points are collected from each of up to 25 tweets per account. Because only one account in the 800-account human training set (0.125%) had zero tweets, accounts with zero tweets were excluded from the training set. This has the effect of disproportionately categorizing accounts with no tweets as bots.

Processing and Normalization The 12 account points and 11 tweet points are further processed through normalization, which removes extra spaces, line breaks, and other textual features that could obstruct further processing, including the formatting of the JSON objects returned by the API. This process insulates BotWise from significant processing errors but has the effect of expunging any potential nuance in the content's grammar and stylization.

An additional layer of normalization performs a filtration process, storing the number and position of each instance of a filtered data point. Data is stored using a modified n-gram data structure (Fig. 5) that, rather than tracking nearby letters or symbols as does a traditional n-gram, tracks the overall presence of a word, letter, or symbol in a phrase (e.g., the number of "." in a tweet and its position in the corpus). This structure can account for any number of instances of a given symbol and any number of positions in a text.
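A minimal sketch of that structure, assuming a token is any single character or word of interest (the function name and dictionary layout are my own illustration, not the model's actual implementation):

```python
from collections import defaultdict

def presence_grams(tokens):
    """Map each token to its count and the positions where it occurs,
    mirroring the modified n-gram idea (presence + position, with no
    neighboring-token context).
    """
    grams = defaultdict(lambda: {"count": 0, "positions": []})
    for i, tok in enumerate(tokens):
        grams[tok]["count"] += 1
        grams[tok]["positions"].append(i)
    return dict(grams)

tweet = list("Wow... really?!")          # character-level example
print(presence_grams(tweet)["."])        # {'count': 3, 'positions': [3, 4, 5]}
```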

Figure 5. [Modified n-gram data structure]

Normalization filters include punctuation, capital letters, numbers, emoji, and a set of 153 "stop words," the most commonly used words on Twitter [17]. The stop words filter, though, only includes English-language words, making the model ineffective for any foreign-language account.

To accommodate this problem, the model rejects all tweets not labeled by the Twitter API as English (e.g., "можно просто весь день гулять по Невскому"), but opts to mark undetermined tweets (e.g., "@NBCNews �") as English under the presumption that they primarily include Twitter-native elements like tagged users and hashtags, which are still classifiable by the model even if they are not in English (e.g., "@FIFAcom #fútbol #Barcelona").
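In practice this amounts to a one-line filter on the API's language label. A small sketch under the assumption that each tweet object carries a "lang" field:

```python
def keep_tweet(tweet) -> bool:
    """Keep tweets the API labels English ("en") or undetermined ("und");
    reject everything else."""
    return tweet.get("lang", "und") in {"en", "und"}

tweets = [
    {"lang": "en",  "text": "Good morning"},
    {"lang": "und", "text": "@FIFAcom #fútbol #Barcelona"},
    {"lang": "ru",  "text": "можно просто весь день гулять по Невскому"},
]
print([t["text"] for t in tweets if keep_tweet(t)])
# ['Good morning', '@FIFAcom #fútbol #Barcelona']
```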

The normalization process repeats for each element of the raw API data: the username, display name, user description, and user tweets. The type, number, and position of each n-gram entry are compiled into separate Tweet and Account objects for easier data management. Other Boolean data points, including "HAS LOCATION" or "HAS PINNED TWEET" are also considered at this stage.

Aggregation After basic data processing has completed, the aggregation process transforms normalized data into a package readable by a comparator, further reducing the complexity of the data. During this process, ratio calculations and average activity rates are computed. All data points are stored in condensable dictionaries of n-grams (Fig. 6).

Figure 6. [Condensed dictionaries of n-grams]

Once all aggregate calculations are complete, all n-grams are split into their component parts (lists of type, number, and position) to create a total of 90 data points (49 account points, 28 tweet points, and 13 aggregate points) (Fig. 7) and compiled into a single payload. Boolean, text, integer, and decimal data points are concurrently packaged into the payload, which is then output into a log file for accessible manual review.

Once logging is complete, 30 data points, those containing raw text, characters, and symbols, are removed from the payload, as they are not necessary for topological comparison between accounts. The remaining 60 points (66.7% of the full payload) are archived and reserved as a comparison payload. Boolean data points are archived as either "0" or "1."

Figure 7. [The 90 data points in the comparison payload]

Deviation With only numerical data points remaining, a comparator can distill more discrete statistical information about the behavior of an account. For each set x of the 60 points of comparison, the average value μ of the set and the standard deviation σ of the set are calculated (Eq. 1).

Equation 1. [Mean μ and standard deviation σ of each comparison set]
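The original equation figure is not reproduced in this text. From the surrounding description, Eq. 1 is presumably the standard mean and standard deviation over the n values in each comparison set; the following is a reconstruction (whether the population or sample form of σ is used is an assumption):

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^{2}}
```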

Training Training BotWise relies on a two-factor method: training and maxing. The first factor, training, determines the average and standard deviation of the whole training set. Of the 800-account subset of human accounts implemented for training the model, 700 are reserved for training the model and 100 are reserved for maxing the model. For each account in the training subset, the normalization-aggregation-deviation algorithm will determine an average μ and a standard deviation σ for each set of 60 points. The absolute value of all standard deviation values is considered to avoid error calculations with negative values. The average value μ0 of the averages and the standard deviation σ0 of all standard deviations for each set are stored in list S, a list of tuples of [μ0, σ0] (Fig. 8). List S is then output to a JSON file "base_model" for simplified recall and implementation by the model.
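As a simplified, illustrative reading of this step (not the published BotWise code): each account's comparison payload is treated as a flat list of 60 numbers, and a [μ0, σ0] pair is stored per comparison point. The file name mirrors the "base_model" output described above; everything else is an assumption.

```python
import json
import statistics

def train(training_payloads):
    """Store a [mu0, sigma0] pair for each comparison point across accounts.

    `training_payloads` is assumed to be a list of per-account comparison
    payloads, each a flat list of 60 numbers.
    """
    base_model = []
    for point_values in zip(*training_payloads):       # iterate per comparison point
        mu0 = statistics.mean(point_values)
        sigma0 = abs(statistics.pstdev(point_values))   # absolute value, as described
        base_model.append([mu0, sigma0])
    with open("base_model.json", "w") as fh:
        json.dump(base_model, fh)
    return base_model
```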

Figure 8. [List S of (μ0, σ0) tuples]

Maxing The remaining 100 training accounts are then utilized in the maxing process, the second factor of the training method. Maxing determines the behavior of the most extreme human accounts and creates a "cutoff" above which aggregated account data is considered to be extremely unusual, or where little overlap between human behavior and bot behavior should occur. For this model, the maxing threshold has been set to the 95th percentile of extremity: given a standard bell curve, this extremity should only consist of 4.56% of the dataset.

BotWise considers three ranges r of standard deviations: 1.5 standard deviations (81.5% of the expected dataset), 1.75 standard deviations (88.5% of the dataset), and 2.0 standard deviations (95% of the dataset).

The 100-account subset is subjected to an identical normalization, aggregation, and deviation process, yielding an average μ and a standard deviation σ for each of the 60 points of comparison in each account. Utilizing the μ0 and σ0 created by the training process, a z-score Z, the number of standard deviations from the training set, is calculated for each account at each range r (Eq. 2). The absolute value of all z-scores is considered to avoid error calculations with negative values. If the Z of a set is less than the acceptable range, it is zeroed. This prevents the large number of average z-scores from artificially inflating or deflating the representation of the more extreme z-scores that surpass the acceptable range.


Each Z is stored in a list P containing all z-scores for an account at the given range. Each account has three lists P, one at r = 1.5, one at r = 1.75, and one at r = 2.0.

Equation 2. [Z-score and b-value calculation]
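The equation figure is likewise not reproduced here. Based on the surrounding definitions, Eq. 2 presumably computes the z-score of a comparison value against the training statistics, with the b-value taken as the average of the range-filtered z-scores in P; the notation below is a reconstruction:

```latex
Z = \frac{\left|\, x - \mu_0 \,\right|}{\sigma_0},
\qquad
B = \frac{1}{\lvert P \rvert}\sum_{Z \in P} Z
```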

For each list P for each account, a b-value B, the average of the values in P, is calculated (Eq. 2). Once the three b-values have been calculated for all 100 accounts in the subset, the lists P are ordered and the b-value B representing the 95th percentile is stored in dictionary M, with keys corresponding to the r associated with each B (Fig. 9). The dictionary M is output to a JSON file "base_max" for simplified recall and implementation by the model.
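Continuing the simplified reading above (the helper names, percentile indexing, and file handling are assumptions, not the model's actual code), the maxing step might look like this:

```python
import json

RANGES = (1.5, 1.75, 2.0)

def maxing(maxing_payloads, base_model, percentile=0.95):
    """Derive a per-range cutoff M[r] from the most extreme maxing accounts.

    `maxing_payloads` is a list of 60-number payloads (one per maxing account);
    `base_model` is the list of [mu0, sigma0] pairs produced during training.
    """
    cutoffs = {}
    for r in RANGES:
        b_values = []
        for payload in maxing_payloads:
            z_scores = []
            for value, (mu0, sigma0) in zip(payload, base_model):
                z = abs(value - mu0) / sigma0 if sigma0 else 0.0
                z_scores.append(z if z >= r else 0.0)   # zero sub-threshold scores
            b_values.append(sum(z_scores) / len(z_scores))
        b_values.sort()
        cutoffs[r] = b_values[int(percentile * (len(b_values) - 1))]
    with open("base_max.json", "w") as fh:
        json.dump(cutoffs, fh)
    return cutoffs
```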

Figure 9. [Dictionary M of 95th-percentile b-values keyed by range]

Comparison For all new accounts under comparison, the model reads both the "base_model" and "base_max" JSON files to determine the acceptable average, standard deviation, and maximum for each range. The account is then subjected to the normalization-aggregation-deviation process, for which an average μ and a standard deviation σ are calculated. Z-scores Z and b-values B are then calculated for each range r. If the B of the new account at r is larger than the maximum value M at r, then the account fails at r (Fig. 10). If an account fails at more than one range, it receives the binary classification "LIKELY BOT." Otherwise, it receives the binary classification "NOT LIKELY BOT."
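Putting the pieces together, the comparison step described above can be sketched as follows (again a simplified illustration under the same assumptions, not the repository's code):

```python
import json

RANGES = (1.5, 1.75, 2.0)

def classify(payload):
    """Return "LIKELY BOT" when the account's b-value exceeds the stored cutoff
    at more than one range; otherwise "NOT LIKELY BOT"."""
    with open("base_model.json") as fh:
        base_model = json.load(fh)
    with open("base_max.json") as fh:
        cutoffs = {float(k): v for k, v in json.load(fh).items()}

    failures = 0
    for r in RANGES:
        z_scores = []
        for value, (mu0, sigma0) in zip(payload, base_model):
            z = abs(value - mu0) / sigma0 if sigma0 else 0.0
            z_scores.append(z if z >= r else 0.0)
        b_value = sum(z_scores) / len(z_scores)
        if b_value > cutoffs[r]:
            failures += 1
    return "LIKELY BOT" if failures > 1 else "NOT LIKELY BOT"
```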

Figure 10. [Range-by-range comparison against the maximum values]

Testing To test the efficacy of the BotWise model at correctly identifying human accounts, a 100-account subset of the midterm-18 dataset [55] was used. Each account was subjected to the full normalization-aggregation-deviation-comparison process. If an account received an incorrect binary classification of "LIKELY BOT," its username (e.g., "@kristinpw") was returned by the script.

To test the efficacy of the BotWise model at correctly identifying bot accounts, a 100-account set was manually labeled from a list of accounts that followed or interacted with the "@POTUS" or "@WhiteHouse" Twitter accounts. Each account was subjected to the full normalization-aggregation-deviation-comparison process. If an account received the incorrect binary classification of "NOT LIKELY BOT," it was archived for manual review (e.g., "@susan41404031").


Results The output of the algorithm supported the hypothesis that user behavior can be effectively modeled based on account data and activity: the topological data of each training account could be distilled into the 90 points of data generated by the normalization-aggregation process.

The results of the testing supported the hypothesis that novel bots could be distinguished from real users by comparing account behavior: BotWise correctly identified 93% of the human accounts and correctly identified 84% of manually labeled bot accounts.

Of the seven real users that were incorrectly identified as bots, two failed at two ranges and five failed at all three ranges. Of all incorrectly failing accounts, all seven accounts failed at range 1.5, six accounts failed at range 1.75, and six accounts failed at range 2.0.

Of the 16 bots that were incorrectly identified as real users, two passed at two ranges and 14 passed at all three ranges. Of all incorrectly passing accounts, 14 passed at range 1.5 and all 16 passed at ranges 1.75 and 2.0.

Discussion Given that the 95th-percentile cutoff should have flagged only the most extreme 4.56% of the data, a 7% misidentification rate was 53.5% larger than predicted. This is likely because the distribution of testing-account behavior did not perfectly fit the bell curve model that the maxing process assumed.

Of the real users that were incorrectly identified as bots, several accounts ("@GeekPride5," "@kristinpw") primarily engaged in retweeting and had no original tweets to be read by the model, rendering as many as 30 points of comparison null. This method is effective for finding accounts, often bots, that have no tweets at all, but less effective for a well-populated account with nearly no original tweets.

Other accounts ("@PascoDevService," "@22sday") appear to be shared accounts or representatives of groups or organizations, and thus do not exhibit the behavior of a single individual user. Accounts like these generally include an image or video in each tweet and regularly utilize similar hashtags to promote a brand.

Still other accounts ("@22sday," "@oldskoolking87") simply exhibited extremely unusual behavior, including repeatedly tweeting the same tweet at other, real users or using as many as 12 hashtags in a single tweet. Meanwhile, a few accounts ("@Adrian_LAEvents," "@SereneSleeps") had no manually discernible features that might have triggered the incorrect classification.

Of the bots that were incorrectly identified as real users, most used effective camouflage tactics to avoid detection. A few accounts ("@Girth_Daddy") showed significant variation in the usage of punctuation from tweet to tweet, but the number of retweets the account reposted diluted the number of original tweets the algorithm was able to analyze.

Unexpectedly, an effective way of tricking the algorithm seemed to be posting a single tweet with a picture and just a few words. Both "@ObsinaanNidarba" and "@Akki93147433" were incorrectly identified as human accounts, but each had just one tweet with a picture. Neither user had a description, but "@Akki93147433" had a profile photo: the same photo in the account's only tweet. "@kiddie_drey" utilized a nearly identical tactic, posting two concurrent photos with only hashtags and emoji.

Additionally, accounts ("@Cheng33575929," "@kaywell52163396") that could consistently produce rudimentary replies to other tweets were able to bypass detection.

Other accounts utilized words and phrases written by humans to populate their tweets. "@jAlmz5," "@ArizonaCjrj," and "@JamesSt38454788" avoided detection by tweeting links to articles alongside the exact title of the article in the body of the tweet (Fig. 11). These articles were often from unverified or "fake news" sources like the Epoch Times, the New York Post, and the Gateway Pundit, though links from more reputable sources like Politico and ABC News occasionally appeared.

Figure 11. [Example tweets reproducing article headlines verbatim]

Perhaps more effectively, some accounts avoided detection by offering diverse commentary on popular links. On April 10, 2021, "@ArizonaCjrj" tweeted, "@FBI America knows what the FBI & DOJ are all about. You ONLY go after Republicans & let the criminals like Clinton, Clinton Foundation,Biden, Obama, .... get away w/ a plethora of crimes.There’s no justice in America; you’ve destroyed our country." Meanwhile, on October 13, 2020, "@JamesSt38454788" responded to a tweet, "Don't stop there , watch the film , we need a new Quarterback . Effective Immediately !!"

Commentary tweets like those from "@ArizonaCjrj" and "@JamesSt38454788," both of which were incorrectly identified as real users, exposed a flaw in the model's method of normalization: by removing formatting irregularities in tweets, the model failed to consider the potential for algorithmically generated messages to overuse or underuse spaces.

Similarly, some bots incorrectly identified as real users escaped recognition of unusual behavior because line breaks were removed from consideration by the normalization process. Several accounts utilized an interconnected network of bots to simulate a group identity of online friends. "@Gottabfishin," "@Girth_Daddy," "@susan41404031," "BFes56," and "@StarrRinko" regularly participated in follow and retweet chains, lists of other bots that reposted similar, usually patriotic, pro-Trump material. The lists, though, included a line break after every tagged user, leading to unusually long tweets that would not necessarily be considered strange otherwise (Fig. 12). Many lists also included images and descriptive text, which further diverted the attention of the model.

Figure 12. [Example follow and retweet chain tweet with a line break after each tagged user]

Considerations The model's testing results uncovered a number of potential flaws inherent to its design. First, the normalization of spacing, punctuation, and line breaks crippled the model's ability to recognize unusual usage. While abnormal behavior for words and letters was easily visible, consideration of the unusual punctuation behavior that aided in the manual labeling of bot data was entirely absent. Future implementations should either track the usage of space as it pertains to punctuation or implement, model-wide, a more traditional n-gram data structure that carries information about the characters or tokens immediately before or after a given token.
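For reference, a traditional character-level n-gram (here a bigram) keeps exactly the adjacency information that the presence-only structure discards; a minimal, hypothetical sketch:

```python
from collections import Counter

def char_bigrams(text: str) -> Counter:
    """Count adjacent character pairs, so runs like '!!' or ' .' that a
    presence-only structure flattens remain visible."""
    return Counter(zip(text, text[1:]))

print(char_bigrams("Wait ... what?!!")[(".", ".")])  # 2
print(char_bigrams("Wait ... what?!!")[("!", "!")])  # 1
```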

A second critical error of the model was its cursory treatment of retweets. If the Twitter API categorized a tweet as a retweet, the normalization process instantly rejected it, opting only to tally a running sum of retweets rather than to collect metadata on them. Future implementations should address this in two steps: collecting metadata on retweets and requiring that a minimum number of original user tweets be processed when a high volume of retweets exists.

A third design consideration was the lack of weights on individual points of comparison. While all points of comparison are treated with equal legitimacy and importance by the model, some points have an unwarranted, disproportionate impact on binary classification solely because a disproportionate number of training accounts happened to share a particular characteristic. Future implementations should consider a system of weights for individual points of comparison to magnify or minimize their relevancy.

Similarly, the model had to sacrifice a larger training set (i.e., more accounts for training and maxing) in exchange for a greater number of points of comparison per account. Although the bounds of user data are largely limited by the Twitter API's rate limits, future implementations should consider increasing the number of training and maxing accounts used to build the model.

Future Implementations In the 2020 U.S. presidential election and onward, the genesis of sweeping disinformation campaigns on social media is increasingly not on Twitter but on video-based platforms like TikTok and YouTube [53]. Future modeling of human and bot behavior should be implemented on video-heavy platforms by utilizing multimodal artificial intelligence to analyze video, audio, and textual content simultaneously. This implementation, though, represents a significantly more complex challenge than the textual and behavioral cues available on Twitter.

Conclusion Whether incidentally or not, Russia's Internet Research Agency provided clues to the aspirations of bot owners years before its greatest success story unfolded. On September 16, 2014, "@OhMyGodKoval," a bot, tweeted, "Is this the real life? It doesn't matter now who's right and who's not." Utilizing deceptive tactics to avoid detection and exploiting inherent social vulnerabilities, networks of bots have capably and effectively destabilized the worldwide social media landscape, disseminating disinformation and propaganda into users' algorithmically personalized feeds and rendering them incapable of discerning the legitimacy of a claim if it aligns with their own beliefs.

BotWise, a model designed to recognize bots on the social media platform Twitter, was effectively able to identify bots while minimizing false positives of real users. While it is not the complete solution to combatting rampant disinformation campaigns online, an adequate method for bot detection is a step towards re-defining who is right and who is not.


Works Cited

1. Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A., & Bonneau, R. (2015). Tweeting From Left to Right: Is Online Political Communication More than an Echo Chamber? Psychological Science, 26(10), 1531-1542.

2. Bennett, W. L., & Livingston, S. (2018). The disinformation order: Disruptive communication and the decline of democratic institutions. European Journal of Communication, 33(2), 122-139.

3. Chernaskey, R. (2020, Nov. 2). Foreign Influence and the 2020 Election: What We've Seen and What's to Come. Foreign Policy Research Institute.

4. Choi, D., Chun, S., Oh, H., Han, J., & Kwon, T. (2020). Rumor Propagation is Amplified by Echo Chambers in Social Media. Scientific Reports, 10(310).

5. Croucher, S. M. (2011). Social Networking and Cultural Adaptation: A Theoretical Model. Journal of International and Intercultural Communication, 4(4), 259-264.

6. Davis, C. A., Varol, O., Ferrara, E., Flammini, A., & Menczer, F. (2016). BotOrNot: A System to Evaluate Social Bots. WWW16.

7. Dawson, A., & Innes, M. (2019). How Russia's Internet Research Agency Built its Disinformation Campaign. The Political Quarterly, 90(2).

8. DFR Lab. (2017, Aug. 28). #BotSpot: Twelve Ways to Spot a Bot. Medium.

9. Director of National Intelligence (2017). Assessing Russian Activities and Intentions in Recent US Elections. Intelligence Community Assessment.

10. Evanina, W. (2020). Statement by NCSC Director William Evanina: Election Threat Update for the American Public.

11. Faris, R., Roberts, H., Etling, B., Bourassa, N., Zuckerman, E., & Benkler, Y. (2017). Partisanship, Propaganda, and Disinformation: Online Media and the 2016 U.S. Presidential Election. Berkman Klein Center for Internet & Society, 2017(6).

12. Ferrara, E. (2017). Disinformation and Social Bot Operations in the Run Up to the 2017 French Presidential Election. University of Southern California Information Sciences Institute.

13. Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2017). The Rise of Social Bots. National Science Foundation.

14. Forrest, C. Y. (2018). Russia's Disinformation Campaign: The New Cold War. Communications Lawyer, 33(3), 2-5.

15. Geschke, D., Lorenz, J., & Holtz, P. (2019). The triple-filter bubble: Using agent-based modelling to test a meta-theoretical framework for the emergence of filter bubbles and echo chambers. British Journal of Social Psychology, 58(1), 129-149.


16. Gillani, N., Yuan, A., Saveski, M., Vosoughi, S., & Roy, D. (2018). Me, My Echo Chamber, and I: Introspection on Social Media Polarization. WWW18.

17. Grossman, L. (2009, June 8). The 500 Most Frequently Used Words on Twitter. Time.

18. Guadagno, R. E., Rempala, D. M., Murphy, S., & Okdie, B. M. (2013). What makes a video go viral? An analysis of emotional contagion and Internet memes. Computers in Human Behavior, 29(2013), 2312-2319.

19. Hall, H. K. (2017). The new voice of America: Countering Foreign Propaganda and Disinformation Act. First Amendment Studies, 51(2), 49-61.

20. Huntington, H. E. (2013). Subversive Memes: Internet Memes as a Form of Visual Rhetoric. Selected Papers of Internet Research, 14(2013).

21. Im, J., Chadrasekharan, E., Sargent, J., Lighthammer, P., Denby, T., Bhargava, A., Hemphill, L., Jurgens, D., & Gilbert, E. (2019). Still Out There: Modeling and Identifying Russian Troll Accounts on Twitter. Association for the Advancement of Artificial Intelligence.

22. Karlova, N. A., & Fisher, K. E. (2012). “Plz RT”: A Social Diffusion Model of Misinformation and Disinformation for Understanding Human Information Behaviour. ISIC2012.

23. Kemp, S. (2020). Digital 2020: 3.8 Billion People Use Social Media. We Are Social.

24. Kim, S. A. (2017). Social Media Algorithms: Why You See What You See. Georgetown Technology Review, 147(2017).

25. Klausen, J. Z. (2016). Giving Attention to Conduct on Social Media: Discursive Mechanisms of Attention Structures in Mediating Governance-at-a-Distance in Today’s Russia. International Journal of Communication, 10(2016), 571-588.

26. Linvill, D. L., Boatwright, B. C., Grant, W. J., & Warren, P. L. (2019). “THE RUSSIANS ARE HACKING MY BRAIN!” investigating Russia's internet research agency twitter tactics during the 2016 United States presidential campaign. Computers in Human Behavior, 99(2019), 292- 300.

27. Linvill, D. L. (2020). Troll Factories: Manufacturing Specialized Disinformation on Twitter. Political Communication, 37(4).

28. Mandiberg, M. (2012). The Social Media Reader. NYU Press.

29. Marwick, A., & Lewis, R. (2017). Media Manipulation and Disinformation Online. Data & Society.

30. Mejias, U. A., & Vokuev, N. E. (2017). Disinformation and the media: the case of Russia and Ukraine. Media, Culture & Society, 39(7), 1027-1042.

31. Mims, C. (2020, Oct. 19). Why Social Media Is So Good at Polarizing Us. The Wall Street Journal.


32. Mueller, R. III. (2019). Report On The Investigation Into Russian Interference In The 2016 Presidential Election. U.S. Department of Justice.

33. National Intelligence Council. (2021). Foreign Threats to the 2020 US Federal Elections. Intelligence Community Assessment.

34. Nevzat, R. (2018). Revising Cultivation Theory for Social Media. The Asian Conference on Social Media, Communication & Film.

35. Pariser, E. (2011). The Filter Bubble: How the New Personalized Web is Changing What We Read and How We Think. Penguin Random House.

36. Parks, M. (2020, May 27). Social Media Usage Is At An All-Time High. That Could Mean A Nightmare For Democracy. NPR.

37. Pennycook, G., Cannon, T. D., & Rand, D. G. (2018). Prior Exposure Increases the Perceived Accuracy of Fake News. Journal of Experimental Psychology: General, 147(12), 1865-1880.

38. Pennycook, G., & Rand, D. G. (2019, Jan. 19). Why Do People Fall for Fake News? The New York Times.

39. Perrin, A. (2015, Oct. 8). Social Media Usage: 2005-2015. Pew Research Center.

40. Pew Research Center. (2021, April 7). Social Media Fact Sheet. Pew Research Center: Internet & Technology.

41. Rossi, S., Rossi, M., Upretti, B. R., & Liu, Y. (2020). Detecting Political Bots on Twitter during the 2019 Finnish Parliamentary Election. Hawaii International Conference on System Sciences, 53(2020).

42. Roth, Y., & Pickles, N. (2020, May 18). Bot or not? The facts about platform manipulation on Twitter. Twitter Blog.

43. Sayyadiharikandeh, M., Varol, O., Yang, K. C., & Flammini, A. (2020). Detection of Novel Social Bots by Ensembles of Specialized Classifiers. ACM International Conference on Information and Knowledge Management 29(2020), 2725-2732.

44. Shao, C., Ciampaglia, G. L., Varol, O., Yang., K. C., Flammini, A., & Menczer, F. (2018). The spread of low-credibility content by social bots. Nature Communications, 9(4787).

45. Shearer, E. (2021, Jan. 12). More than eight-in-ten Americans get news from digital devices. FactTank.

46. Sparkes-Vian, C. (2019). Digital Propaganda: The Tyranny of Ignorance. Critical Sociology, 45(3), 393-409.

47. Sunstein, C. R. (2018, Jan. 22). Cass R. Sunstein: is Social Media Good or Bad for Democracy? Facebook Newsroom.

48. Sunstein, C. R. (1999). The Law of Group Polarization. John M. Olin Law & Economics Working Paper, 91(2).

49. Tucker, J. A., Guess, A., Barbera, P., Vaccari, C., Siegel, A., Sanovich, S., Stukal, D., & Nyhan, B. (2018). Social Media, Political Polarization, and Political Disinformation: A Review of the Scientific Literature. William & Flora Hewlett Foundation.

50. Twitter. (2019). Retrospective Review: Twitter, Inc. and the 2018 Midterm Elections in the United States. Twitter Blog.

51. Venet, M. (2020, Nov. 2). Foreign State-Sponsored Narratives on Mail-in Voting and Social Media Reach. Foreign Policy Research Institute.

52. Ward, C., Polglase, K., Shukla, S., Mezzofiore, & Lister, T. (2020, April 11). Russian election meddling is back— via Ghana and Nigeria— and in your feeds. CNN.

53. Watts, C., & Chernaskey, R. (2021, Feb. 18). Russia Tried Again, Iran Antagonized, and Didn’t Show: Insights and Lessons Learned on Foreign Influence in Election 2020. Foreign Policy Research Institute.

54. Yang, K.C., Varol, O., Davis, C. A., Ferrara, E., Flammini, A., & Menczer, F. (2019). Arming the public with artificial intelligence to counter social bots. Human Behavior & Emerging Technology.

55. Yang, K. C., Varol, O., Hui, P. H., & Menczer, F. (2019). Scalable and Generalizable Social Bot Detection through Data Selection. AAAI Conference on Artificial Intelligence.

56. Yardi, S., & Boyd, D. (2010). Dynamic Debates: An Analysis of Group Polarization over Time on Twitter. Bulletin of Science, Technology & Society, 30(5).

57. Yerlikaya, T. (2020). Social Media and Fake News in the Post-Truth Era: The Manipulation of Politics in the Election Process. Insight Turkey, 22(2), 177-196.

58. Young, V. A. (2020, May 27). Nearly Half of the Twitter Accounts Discussing 'Reopening America' May Be Bots. Carnegie Mellon University News.


Additional Works

Blank, S. (2018). Moscow's Competitive Strategy. American Foreign Policy Council.

Bot Repository. Indiana University. Bot Repository (iu.edu)

Bradshaw, S., & Howard, P. N. (2019). The Global Disinformation Order: 2019 Global Inventory of Organized Social Media Manipulation. Project on Computational Propaganda.

Committee on Foreign Relations of the United States Senate. (2018). Putin's Asymmetric Assault on Democracy in Russia and Europe: Implications for U.S. National Security. Minority Staff Report of the One Hundred Fifteenth Congress.

Golovchenko, Y., Buntain, C., Eady, G., Brown, M. A., & Tucker, J. A. (2020). Cross-Platform State Propaganda: Russian Trolls on Twitter and YouTube during the 2016 U.S. Presidential Election. The International Journal of Press/Politics, 25(3), 357-389.

Grubbs, D., & Mandi, M. (2020, May 6). Understanding Political Twitter. Towards Data Science.

Mahnken, T. G., Babbage, R., & Yoshihara, T. (2018). Countering Comprehensive Coercion: Competitive Strategies Against Authoritarian Political Warfare. Center for Strategic and Budgetary Assessments.

Milan, S. (2015). When Algorithms Shape Collective Action: Social Media and the Dynamics of Cloud Protesting. Social Media + Society, 2015(2), 1-10.

Rauchfleisch, A., & Kaiser, J. (2020). The False positive problem of automatic bot detection in social science research. PLoS ONE, 15(10).

Svetoka, S. (2016). Social Media as a Tool of Hybrid Warfare. NATO Strategic Communications Centre of Excellence.