CAN UNCLASSIFIED

Towards a Deception Detection Framework for Social Media

Bruce Forrester DRDC – Valcartier Research Centre

Friederike Von Franqué, Von Franqué Consulting

25th ICCRTS Virtual event, 2–6 and 9–13 November 2020

Topic 1: C2 in the Information Age Paper number: 091

Date of Publication from Ext Publisher: December 2020

The body of this CAN UNCLASSIFIED document does not contain the required security banners according to DND security standards. However, it must be treated as CAN UNCLASSIFIED and protected appropriately based on the terms and conditions specified on the covering page.

Defence Research and Development External Literature (P) DRDC-RDDC-2020-P228 December 2020


IMPORTANT INFORMATIVE STATEMENTS

This document was reviewed for Controlled Goods by Defence Research and Development Canada using the Schedule to the Defence Production Act.

Disclaimer: This document is not published by the Editorial Office of Defence Research and Development Canada, an agency of the Department of National Defence of Canada but is to be catalogued in the Canadian Defence Information System (CANDIS), the national repository for Defence S&T documents. Her Majesty the Queen in Right of Canada (Department of National Defence) makes no representations or warranties, expressed or implied, of any kind whatsoever, and assumes no liability for the accuracy, reliability, completeness, currency or usefulness of any information, product, process or material included in this document. Nothing in this document should be interpreted as an endorsement for the specific use of any tool, technique or process examined in it. Any reliance on, or use of, any information, product, process or material included in this document is at the sole risk of the person so using it or relying on it. Canada does not assume any liability in respect of any damages or losses arising out of or in connection with the use of, or reliance on, any information, product, process or material included in this document.

Template in use: EO Publishing App for CR-EL Eng 2019-01-03-v1.dotm

© Her Majesty the Queen in Right of Canada (Department of National Defence), 2020 © Sa Majesté la Reine en droit du Canada (Ministère de la Défense nationale), 2020

25th ICCRTS – 3-5 November, 2020, “The Future of Command and Control”

Towards a Deception Detection Framework for Social Media

Paper number: 091 Topic 1: C2 in the Information Age

Bruce Forrester Defence R&D Canada – Valcartier 2459 Pie-XI North Quebec, QC, G3J 1X5 Tel.: (418) 844-4000 #4943 [email protected]

Friederike Von Franqué Von Franqué Consulting 0176-83076104 [email protected]

Abstract

The democratization of communication media has had significant consequences for military command and control. Never before have adversaries had such free and direct access to our local populations, allowing for influence and deception. In fact, social media platforms can help target messages to the exact demographic desired, while keeping attribution hidden. Commanders have been reluctant to embrace the new communication technologies and have been left playing catch-up. Meanwhile, our opponents have infiltrated our communication spaces and ‘recruited’ thousands of followers who spread their messages unwittingly. This paper presents a new research framework for deception that will help overcome the issues of attribution of message originators. Concentrating on uncovering narratives, methods and intent rather than individuals alleviates many ethical problems of social media analytics within western societies. The framework will help to guide research on deception detection and increase assessment confidence for intelligence analysts and public affairs staff involved in the ongoing information environment clashes of power and politics.

1. Introduction

In warfare, the consequences of deception can be severe and decisive. Not surprisingly, deceiving and influencing the enemy, or one’s own population for that matter, is not new. It has existed for thousands of years; as Sun Tzu stated in The Art of War, “the supreme art of war is to subdue the enemy without fighting”. Operational and tactical commanders use forms of deception on a regular basis to keep the actual or future adversary misinformed, or just guessing, about military resources, processes or future plans. Influence campaigns designed to sow division, garner support, or simply create chaos in adversary countries have similarly been used before. However, this aspect of the art of war is easier than ever due to the affordances of cyberspace and the democratization of information and communications via social media. A recent example is the suspected involvement of Russia during the 2016 US presidential election. It seems clear that outside forces were at play and trying to influence voters. For example, Timberg [1] states:

“There is no way to know whether the Russian campaign proved decisive in electing Trump, but researchers portray it as part of a broadly effective strategy of sowing distrust in U.S. democracy and its leaders. The tactics included penetrating the computers of election officials in several states and releasing troves of hacked emails that embarrassed Clinton in the final months of her campaign.”

Russian cyber and influence activities have been well documented [2-4] in Ukraine during the annexation of Crimea, which was accomplished with almost no fighting. In fact, Berzin [3] states that “the Russian view of modern warfare is based on the idea that the main battle-space is the mind, and as a result, new-generation wars are to be dominated by information and …” (p.5). Such influence activities indeed point to a new focus of warfare: one conducted in cyberspace, that exploits the use of deception and influence, and that broadcasts messages using social media. Within the cyber domain, Russia’s tool kit includes the “weaponization of information, culture and money” [5] in an effort to inject disinformation and confusion and to proliferate falsehoods. This is being accomplished by using all available information channels such as TV news channels (e.g. RT, Sputnik), newspapers, YouTube channels (e.g. RT), blogs and websites [6], as well as state-sponsored trolls [7, 8] who are ever-present on many social media outlets. Most of these means are combined and intermingled to create repetition of alternative narratives in many places, effectively strengthening the perceived authenticity of the message.

One of the most important channels for disseminating influential information operations is social media. Its reach has become expansive. Social media provides access to literally billions of people at a very granular level, where even regionally targeted news can create viral explosions of a scale that has real effect in the physical world. An example is Pizzagate, the case of Edgar Welch, who took his AR-15 rifle to the Comet Ping Pong pizzeria in Washington to save children from ritualistic child abuse [9]: a convoluted conspiracy theory that originated and spread via social media but ended up with Welch actually going to the pizzeria with a rifle.

Different deception techniques use social media as a platform or have been newly developed for this environment. Established deception techniques such as lies, exaggeration, omission, bluffs, white lies, etc., meet newly available digital deception and propaganda techniques such as deep fakes, phishing and trolling, and each can lead to unique manifestations within social media. It is not easy to spot these deception techniques; on the contrary, it is easy to become overwhelmed and muddled in one’s approach to detecting deception. Even fact-checking websites (e.g. Google Fact Check) that are designed to help people differentiate fact from fiction are being faked [10].

To help researchers and operators deal with this complexity, a comprehensive detection framework is required. This paper presents an empirically based framework for deception detection in social media. The framework described will allow for directed research, the production of indicators, and the development of algorithms. Single pieces of information are not usually considered sufficient to make solid recommendations and decisions. Once the framework is validated, it will allow triangulation between its parts in order to increase confidence in any declaration of deception detection, thus improving a commander’s situational awareness and decision-making capability.

1.1 What is Deception?

Many areas of expertise have developed definitions of deception and its close relatives: lies, fraud, trickery, delusion, misguidance, and misdirection. In military deception operations, the objectives are often designed to make the opponent believe some falsehood about military strength or to obfuscate future actions. A famous example was Operation Fortitude, embedded into the planning of the invasion of Normandy in 1944: for a non-existent army, stage designers built dummy tanks made of rubber and dummy airfields with wooden airplanes. Officers engaged in radio communication, ordered food supplies and discussed fictitious attack plans [11].

While there is no precise overlap in the definitions of deception from domain to domain, the definitions are close to each other and there are at least four characteristics that are common to most [12, 13]:

a. The intent is deliberate;

b. The intent is to mislead;

c. Targets are unaware; and

d. The aim is to transfer a false belief to another person.

The goal of a deception campaign is to get the target or population to do, or not do, something that the deceiver wants, thus giving the deceiver a measure of control over the targets’ actions or general behaviour. The aim can be to confuse, delay, or waste the resources of an opponent, to put the receiver at a disadvantage, or to hide one’s own purpose. Deception can occur to discredit and divide. Deception is an art form and as such will never remain static but will continue to evolve and change. Social media are enablers for innovation in the area of deception.

1.2 Why social media for deception?

The Merriam-Webster definition states that: “social media are forms of electronic communication (such as websites for social networking and microblogging) through which users create online communities to share information, ideas, personal messages, and other content (such as videos)” [14].

Technically speaking, social media services are (currently) web 2.0 internet-based applications with user-generated content, generally stored in a private database and shared online. Individuals, groups and institutions create user-specific profiles for a site or application designed and maintained by the respective social media service. These services facilitate the development of social networks online by connecting a profile with those of other individuals and/or groups.

In line with all the other digital channels, core characteristics of social media are: potential global reach, unlimited access and operating time, and the speed of content sharing, i.e., communication. A unique characteristic, however, is the attribution of content at first sight. Social media is all about individualized and personal communication: every “act of communication” on these platforms comes from a personal account and can be directly attributed to that person. Since almost no platform uses ID verification, any person on any platform can just as well be an online-only presence, a so-called “fake account”. Plus, the content provided and associated with the digital persona (such as age, race or sexual orientation) can be untrue. Even if the digital persona represents a real person, and even if this person has provided personal information as truthfully as possible, the online profile will nevertheless be incomplete, misleading or exaggerate certain personal features; it is deceptive as a rule. At the same time, online accounts are not limited, and one person can have multiple accounts representing the very same person. On social platforms, only the crafted image of a person is communicating with the crafted image of another person, whether or not it is the digital representative of a real person. If we talk about deception in social media, we should therefore differentiate between the online persona and the content shared by this persona. Some other characteristics make social media a virtual goldmine for deception activities:

a. User-generated content: Social media are not only platforms for consuming narratives and information generated by traditional media sources. Ordinary users are encouraged to create their own content (text, photos, and videos that can contain facts, opinions, emotions, lies, links to other content, etc.). This content does not have to follow the editorial and fact-checking duties of professional journalists [15]. Therefore, the tactical objective of a deception technique would be for the deceiver to become included within a target online group, influencing the content as a source or posing as a user to become the source.

b. Content mix: Personal information and political information are consumed via social media by a large portion of the world’s population. It has been reported [16] that this is where 62% of people get their news. In spring 2020, during the Covid-19 pandemic, it was found that people with low levels of formal education were more likely to get their news from social media and messaging applications rather than news organizations [17]. This means that well-designed deception on social media, i.e., deception that is not obviously questionable, can have a huge reach and is likely to be believed by many.

c. Virtual echo chambers: People tend to connect with people who share similar values and opinions (homophily). Users of social media often share their opinions and beliefs, which are instantly available to all of their social network. That transparency can have a polarizing effect on the networks of users. Over time, many users purge others (online friends) who hold views radically opposed to their own. An online purge a) is easier to execute, b) bears less risk of social alienation because the group of like-minded people can more easily replace the purged contact than offline relationships can, and c) streamlines internal online-group discussions because members know about purging and rarely disagree. Taken together, this has led to the formation of online echo chambers where users’ views are reflected and confirmed over and over due to the lack of diversity [1]. These types of groups tend to have a skewed world view. However, the mass existence of echo chambers may be overstated. Dubois et al. [18] show that the belief that people encounter only information that confirms their existing political views is blown out of proportion. In fact, most people already have media habits that help them avoid echo chambers. The democratic problem with these supposed echo chambers and filter bubbles is that people are empowered to avoid politics if they want. This means they will be less aware of their political system, less informed and in turn less likely to vote; all bad signs for a healthy democracy.

d. Target group: Other studies found, on the contrary, that social media users are more politically engaged than non-users of social media [19]. Therefore, influencing these users by deception on social media may have a more significant impact on politics.

e. High trust levels: There tends to be a high level of trust between users [19]. Users do not only trust their known friends; trusted friends of friends tend to get included as well [20]. The higher the level of trust, the higher the impact of deceptive information. Personal trust can even override objective argumentation. A study by Lewandowski [21] revealed that even when a story was subsequently refuted, people often continued to believe the disproven information. Hence, it can be extremely difficult to educate the public about these mistaken beliefs [22]. At the same time, social media nourishes individual follower groups, and individuals with high reputation levels will have high trust levels and thus become very influential information providers in social media.

f. Consumption habits: Social media users apply a low standard when judging the validity and trustworthiness of what they read online [23]. Often just the headlines are read when skimming one’s Facebook or Twitter feeds. Advertisers and news media tend to use sensational and eye-catching headlines, which do not always represent the related article, to get attention.

g. Psychological reasons: While misinformation is often crowd-corrected, the correction takes time and can have a much smaller amplitude [16], meaning that the misinformation often persists in the minds of users. While this is not specific to social media, the sheer volume of posts to which a user is exposed tends to amplify this effect. The speed at which information is available for mass consumption is very fast. Many sources do not seek the truth before posting. Virality is more powerful than veracity.

Hence, the spread of alternative narratives is exponentially abetted by social media users who generally have high trust, far reach (think Hollywood stars), exist in an echo chamber, and do little fact checking. Some technical specifics have to be highlighted since, as they are exploited by deceivers, they are the reason for new ways of deception design:

a. Deep messaging: The many forms of information and links possible between sources available online far exceed traditional broadcast means. They may give the illusion of being substantially informed even though nobody is checking the background information. This is common in shady advertising, where services and products can have many linked websites with contrived 5-star reviews giving the impression of excellence.

b. Scale of access: Social media is widely available through hundreds of platforms and can be instantly accessed through one’s ever-present mobile phone. This also allows messaging to previously hard-to-reach populations.

c. Affordance: The digital nature allows for many new ways to deceive. It is possible to manipulate the metadata surrounding digital information, to steal online identities and manipulate messages, and to manipulate message traffic volume through the use of bots (automated software scripts), even to manipulate network ratings in order to cause messages to trend or go viral.

Not only have the message delivery means changed, there are additional psychological phenomena that have been enhanced by the online environment: first impressions (the anchoring effect), people’s ability to differentiate between truth and fake information, information overload, and repetition [24] are examples.

The “old Kremlin/USSR” was concerned with the truth; today’s Russian propaganda no longer believes in the truth. Even when the USSR told lies, it took care to prove that what it was doing was the ‘truth’. Now no one even tries proving the ‘truth’; you can just say anything. Trump changed the famous chant “Build the Wall” to “Finish the Wall” despite no new wall (or only a very minimal amount) having been built. This makes it very convenient for questionable news agencies and trolls as they pump out their alternative narratives or opinions on an event as fast as possible. This means that they often provide the first impression from which all following judgements are made. Add in the quick memes that pop up (mostly satire), the references to many different websites, and the fast-paced way in which social media is consumed, and it is no wonder that deception in this form is so easily spread and believed.

2. Deception detection models (Literature review)

Social media and deception were defined above as a medium and technique. It is important to find out more about existing deception detection models and their potential for being applied to this medium. Research studies that have looked at various methods and techniques to detect different types of deception in social media are reviewed below.

Noble & Hempson-Jones advocate a dual-category framework (messenger and message) for building a typology to explain misinformation in social media [25]. Within the messenger category, one looks for indicators of false identity that are used to infiltrate targets by building up trust. This can be achieved through friending other users or by using attractive imagery with credible text. The message category concentrates on false content, both text and multimedia. As interpretation of the message is based on the reader, “intentional message misinformation usually requires creation of a strong narrative; in order to gain the attention of audiences and achieve impact” [25]. If the message is not read, there is no impact. There seem to be two categories of interest for Noble/Hempson-Jones, indicators and impact, as shown in Table 1.

Table 1: Two categories of interest for Noble/Hempson-Jones

Messenger
- Indicators for deception: false identity indicators
- Techniques to boost impact: boost trust through closeness (friends of friends); boost interest (attractive imagery)

Message
- Indicators for deception: false content
- Techniques to boost impact: boost trust through credibility; strong narrative

There has been research on what can be termed linguistic approaches, or stylometry, to detecting deception and fraud [26, 27]. It would also be interesting to know whether certain topics, meaningful recurring phrases or short narratives could be identified linguistically and grouped to offer a set of narrative identifications. Hancock [26] looked at digital deception, comparing face-to-face, telephone, instant messaging, and text-based computer-mediated communications as medium types. His model was based on two aspects: the communicator’s identity (identity-based digital deception) and the actual message (message-based digital deception), shown in Table 2. In identity-based deception, the person’s or organization’s identity is false or manipulated in order to look like it comes from someone or somewhere else. Hancock describes some possible manipulations such as trolling (someone posing as a legitimate user who posts inflammatory content to elicit conflict in the group), category deception (where users post as a different gender or race), and identity concealment (using a pseudonym to conceal identity). In message-based deception, it is the content that has been falsified. With this type of deception, the deceiver can exploit media richness, synchronicity, recordability and language use. This take on deception will fit into several of the framework categories described in section 3 of this paper.

Table 2: Two categories of Hancock’s model

Messenger indicators
- false-flag ID (user posing as someone else, thus “stealing” that person’s trust level)
- partly false ID
- concealed ID
- not-even-human ID/machine

Message indicators
- richness of media
- synchronicity
- recordability
- language use

Rubin [28] constructed an understanding of the contextual domains in which deception, Table 3, occurs by using a grounded approach from blogs. The blogs were actual stories where the authors described their own lies or what others lied about to them, as opposed to actual lies in context. In her findings, politics and personal relations were the most prevalent domains followed by medicine and health. This research was interested in laying the groundwork for an ontology to support automatic deception detection. The types of deception discovered from most to least were: “mis/dis-informing, scheming, lying (deliberately prevaricating), misleading, misrepresenting, cheating, (mostly) showing frustration about deception, white-lying, name-calling, plagiarizing, and ‘bs’-ing” (p.8). This research concentrated on content only.

Table 3: Conceptual domains of deception from Rubin’s grounded model

Message indicators
- scheming
- lying
- misleading
- misrepresenting
- cheating
- showing frustration about deception
- white-lying
- name-calling
- plagiarizing
- ‘bs’-ing

Afroz, Brennan, & Greenstadt [29] concentrated on detecting imitation and obfuscation in writing. By comparing several different models they were able to detect obfuscation with a high accuracy rate (using the Writeprints feature set and a Support Vector Machine (SVM)) and found that function words were the best indicators of deceptive writing (Table 4). The Writeprints feature set included lexical, syntactic and content-specific features. A training corpus is required to train an SVM, which would limit effectiveness when applied to social media types that have thousands of different authors. However, in established topic areas, where authorship is more stable, these techniques could be applied for the analysis of both content and authorship.

Table 4: Function words used to imitate or obfuscate documents [29]

Top function words and features for imitated and obfuscated documents include: whats, alot, atop, near, lately, up, wanna, theres, underneath, thousand, anymore, ours, beside, shall, she, thats, herself, cuz, beneath, like, havent, he, till, lots, her, tons, onto, anyway, soon, plus, other, maybe, frequency of comma, frequency of dot, personal pronoun.
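The classifier-based approach of Afroz et al. can be illustrated with a minimal sketch: a linear SVM trained on function-word frequencies. This is not the Writeprints feature set; the function-word vocabulary, the placeholder corpus and the labels below are illustrative assumptions only.

```python
# Minimal sketch: flag stylistically deceptive (imitated/obfuscated) writing
# with an SVM over function-word counts. Vocabulary, corpus and labels are
# placeholders, not the Writeprints feature set used by Afroz et al.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it",
                  "he", "she", "her", "his", "was", "like", "up", "near"]

docs = ["text of a post written in the author's normal style ...",
        "text of a post where the author is obfuscating their style ..."]
labels = [0, 1]  # 0 = normal authorship, 1 = imitated/obfuscated

# Restricting the bag-of-words to function words keeps the focus on style
# rather than topic, in the spirit of stylometric analysis.
model = make_pipeline(CountVectorizer(vocabulary=FUNCTION_WORDS),
                      SVC(kernel="linear"))
model.fit(docs, labels)
print(model.predict(["another post whose authorship is in question"]))
```

In practice such a model would need a substantial labelled corpus which, as noted above, limits its use on social media streams with thousands of authors.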

Tsikerdekis & Zeadally [12] concentrated on the motivations and techniques used, as well as their effect on targets, when exploring deception. They looked at factors (Table 5) associated with the deceiver, the social media service, the deceptive act, and the potential victim. This is a fairly comprehensive model but is only applied to individuals acting in their own self-interest.

Table 5: Factors in determining level of difficulty in achieving online deception

Messenger (deceiver factors)
- expectations
- goals
- motivations
- his/her relation to the target

Message (deceptive act)
- time constraints (for deception and for detection)
- number of targets
- type of deceptive act

Medium (SM service)
- prevalence of deception in the system
- level of perceived security
- assurance and trust mechanisms
- media richness

Receiver (potential victim)
- ability to detect deception
- target's degree of suspicion

Conroy, Rubin, & Chen [30] proposed a two-pronged typology for determining veracity (with the intent of finding fake news) that includes linguistic approaches and network approaches. The linguistic approaches look at word usage that provides cues of deception. These approaches vary from a simple “bag of words” to deep syntax and semantic analysis, as well as rhetorical structure and discourse analysis, and finally the use of classifiers. The network approaches look at linked data and social network behaviour. Conroy et al. ultimately recommend a hybrid approach that combines the advantages of the above approaches. Both the linguistic and network approaches are process-heavy and rely on machine learning to compensate for the fact that humans are generally bad at detecting lies in text; we are not much better at determining a lie than if we just flipped a coin [31]. While Conroy et al. provide a good overview of methods, their scope is limited to fake news detection. Detecting deception in social media at large requires an expanded framework. However, the methods described will fit into the framework proposed in this paper.

Table 6: Conroy, Rubin, & Chen [30] two-pronged typology for determining veracity

Messenger indicators
- network behaviour
- linked data

Message indicators
- bag of words
- deep syntax analysis
- semantic analysis
- rhetorical structure
- use of classifiers
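A minimal sketch of the hybrid approach recommended by Conroy et al. is to fuse a message-side score with a network-side score. Both scoring functions below are naive placeholders (a keyword cue count and a known-bot share), not their method; the cue list, account data and weights are assumptions.

```python
# Minimal sketch: fuse a linguistic (message) score with a network (messenger)
# score into one veracity-risk score. Cue list and bot list are placeholders.
def linguistic_score(text):
    cues = ["shocking", "unbelievable", "they don't want you to know"]
    hits = sum(text.lower().count(cue) for cue in cues)
    return min(1.0, hits / len(cues))

def network_score(sharing_accounts, known_bot_accounts):
    if not sharing_accounts:
        return 0.0
    bots = sum(1 for account in sharing_accounts if account in known_bot_accounts)
    return bots / len(sharing_accounts)

def hybrid_score(text, sharing_accounts, known_bot_accounts, w_text=0.5, w_net=0.5):
    return (w_text * linguistic_score(text)
            + w_net * network_score(sharing_accounts, known_bot_accounts))

print(hybrid_score("Shocking! They don't want you to know this.",
                   ["@a1", "@b2", "@c3"], {"@a1", "@b2"}))
```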

There are many psychological techniques that deceptive content publishers employ to deceive users. In advance-fee fraud (where large sums of money are promised after bank account information is provided) the content is full of persuasive cues. The most frequent include attraction/excitement, authority, politeness and urgency [32]. When deceptive content is not based on fraud, “intentional message misinformation usually requires creation of a strong narrative; in order to gain the attention of audiences and achieve impact” [25].

Hancock, Curry, Goorha, & Woodworth [33] looked at how linguistic behaviour changes in computer-mediated communications (CMC) between lying and truth-telling. Significantly for automated deception detection, “the data suggest that, overall, when liars were lying to their partners, they produced more words, used fewer first-person singular pronouns but more third-person pronouns, and used more terms that described the senses (e.g., ‘see’, ‘hear’, ‘feel’) than when they were telling the truth” (p.16). They also studied the effects on both the liar and the target and found changes to both. For instance, despite being unaware of the lying behaviour, targets asked more questions when being lied to than during truthful conversations. Hence the targets of deception should also be considered in detection. There can be many different linguistic and syntactic clues to deception. Zhou et al. [34] found that liars were more informal, more expressive and made more typographic errors than those who were telling the truth. Their research found that linguistic features such as “quantity, informality, expressivity, affect, uncertainty, nonimmediacy, diversity, specificity, and complexity – are all potential relevant discriminators” (p. 104) for detecting deception.
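The cues reported by Hancock et al. lend themselves to simple feature extraction. The following is a minimal sketch only; the word lists are illustrative, not the validated lexicons used in that research.

```python
# Minimal sketch of the cues reported by Hancock et al.: liars produced more
# words, fewer first-person singular pronouns, more third-person pronouns, and
# more sense terms. The word lists below are illustrative assumptions.
import re

FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
THIRD_PERSON = {"he", "she", "him", "her", "his", "hers", "they", "them", "their"}
SENSE_TERMS = {"see", "saw", "hear", "heard", "feel", "felt", "look", "sound", "touch"}

def linguistic_cues(text):
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return {
        "word_count": len(words),
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / total,
        "third_person_rate": sum(w in THIRD_PERSON for w in words) / total,
        "sense_term_rate": sum(w in SENSE_TERMS for w in words) / total,
    }

print(linguistic_cues("She said she saw him there, and they all heard it."))
```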

Fake news and propaganda are similar in that both are deliberate efforts to influence a population. Fake news is a broader term than propaganda; the latter is usually associated with a political cause whereas the former is used “to gain financially or politically, often with sensationalist, exaggerated, or patently false headlines that grab attention” [40]. Pomerantsev and Weiss [5] use the following categories when describing the Kremlin’s tools and techniques for its weaponization of information: shatter communication; demoralize the enemy; and take out the command structure. However, for this research we need to look in a more granular fashion. Lazitski [35] describes media endarkenment techniques that have been used to influence audiences in Russia and the United States: misinformation; censorship; omission; spinning and twisting; construction of a false reality; intimidation; entertainment; simplification; and lowering/marginalizing of content’s quality. Wardle [36] uses a similar set of characteristics, based on an increasing scale of the intent to deceive. It is therefore a good framework to use when thinking about fake news, and it can be used for propaganda as well:

a. satire or parody (no intention to cause harm but has potential to fool);

b. false connection (when headlines, visuals or captions don't support the content);

c. misleading content (misleading use of information to frame an issue or an individual);

d. false context (when genuine content is shared with false contextual information);

e. imposter content (when genuine sources are impersonated with false, made-up sources);

f. manipulated content (when genuine information or imagery is manipulated to deceive, as with a “doctored” photo); and

g. fabricated content (new content that is 100% false, designed to deceive and do harm) [36].

Finally, Parsons and Calic state: “online deception research demonstrates the complexity of the field, as there are many types of deception and motivations to deceive, but no consistent and overarching taxonomies. Social media adds another level of complexity, as only a few studies have examined this context, and the context itself is so varied” [13]. It is generally agreed that one approach will always have shortcomings and that use of multiple techniques is required. Different types of deception (misinformation, propaganda, lies etc.) can have some similar indicators but often require different methods for detection.

3. Deception Detection Framework

This framework, in large part, was derived empirically from a working knowledge of social media. There are seven proposed components to the framework:

a. Originator

b. Form of post

c. Message content

d. Characteristics of the medium

e. Means of transmission

f. How messages propagate

g. Target audience
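As described in the next paragraph, the framework envisions triangulation across these components: positive indicators in three or more of them signal a high probability of deception. A minimal sketch of that scoring logic, using hypothetical boolean indicator flags that would come from separate detectors, might look like this:

```python
# Minimal sketch of triangulation across the seven framework components:
# three or more positive indicators => flag as probable deception.
from dataclasses import dataclass, fields

@dataclass
class ComponentIndicators:
    originator: bool = False
    form_of_post: bool = False
    message_content: bool = False
    medium_characteristics: bool = False
    means_of_transmission: bool = False
    propagation: bool = False
    target_audience: bool = False

def probable_deception(indicators, threshold=3):
    positives = sum(getattr(indicators, f.name) for f in fields(indicators))
    return positives >= threshold

example = ComponentIndicators(originator=True, message_content=True, propagation=True)
print(probable_deception(example))  # True: three components show positive indicators
```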

When looking for deception within SM, it is envisioned that finding positive indicators in three or more of the components will signal a high probability of deception through triangulation. Details of each of the components are described in the following sections. This framework has been developed to guide research into indicators, methods, and techniques. Greater detail is provided for the framework components where research has already occurred. Certain components are light on detail and will require a concerted research effort.

3.1 Originator

Knowing the originator of the deception is a key aspect of understanding the overall deception. Literally anyone with Internet access can participate in and contribute to social media. On most social media platforms, people can have many accounts and usernames. Accounts could be used for personal, professional, or other reasons. More often than not, usernames are not obvious, and platforms do not require forms of authentication to ensure the veracity of users. Users could be normal users, journalists or news agencies, people representing an organization, personas, bots, or anything in between. Table 7 provides a non-exhaustive list of types of originators, what types of actions they perform, what their motivations could be, and possible indicators that could be used for classification and detection.

Authenticating the originator of a SM post is not necessarily as easy as one might imagine. There has been a lot of research [7, 29, 37-39] around determining originators, yet it remains very hard to assign clear attribution when someone wants to hide who they are. Of course, not all originators are trying to deceive. In fact, a method to detect those that are trying to deceive might be to first identify and eliminate the trustworthy and clearly genuine accounts. However, deception can come from any of the originator types identified below.

Table 7: Typical originators, what and why they post, and possible indicators for detection.

State News Actor
- What they do and why they post: works for a government or state-sponsored news agency/news outlet (e.g. RT or Sputnik News); usually motivated to post by power, money, favour, patriotism, need or outside pressure, belief, and/or job description.
- Indicators for classification and detection: official handles, press releases, photos and news reports (coming from official sources); sharing of long-form content (blogs, reports, etc.); metadata such as geolocation, date, time of posting, server-ID/IMEI, correct language.

Hacker
- What they do and why they post: generally speaking, no political or business connection, but can work freelance or part-time for governments or business; usually anonymous (however, some are quite easy to ID); more cyber than SM focused; likes to fool people or loves the challenge or the attention; messed-up moral compass (but not necessarily); motivated by money, social prestige (to be cool in the eyes of their community), patriotism, or wanting to “win” the challenge; uses deception to gain entry into restricted sites or servers; uses misinformation to steal; can as well be working on hardening critical infrastructure or as a CIO.
- Indicators: attends hacker conferences; uses a special list of tools (GitHub is the site to check; source from code); @anon123 is a typical handle (letters followed by numbers); might steal someone else’s SM account; could use SM to report their hacks and achievements; can deploy botnets.

Activist
- What they do and why they post: generally belongs to a non-governmental organization or activist group (Anonymous is an example); works for a cause or strong belief (e.g. the anti-dog-meat campaign launched against the Korea Olympics).
- Indicators: petitions; crowdfunding for a cause; sign-ups for demonstrations; promotion of demonstrations; websites related to the cause; eco-tourism techniques; satire; podcasts, blogs and self-publishing of long-form content; always in the top (first to be read) comments.

Agents (Trolls, bots, click farms)
- What they do and why they post: act as agents of the state or of a third party; leave messages on others’ posts to incite opinion- or emotion-based reactions (troll); post deliberate messages to incite negative reactions from others (troll); can be paid.
- Indicators: flame wars; retweet campaigns; overused stereotypes on the topic narrative; Twitter handles/pages often use patriotic flags of a country or state; badly mixed use of language (by bots or second-language clickers); click farms.

Exploited ordinary users
- What they do and why they post: normal Jacks and Jills who are exploited; act out of ignorance but with a strong belief in the message; identity politics; perceive the content as real; could behave troll-like; are victims of the loss of authoritative sources (more precisely, of the mass increase of non-authoritative sources prevalent online today).
- Indicators: real people and profiles; identify strongly with an issue or topic; will often share more than produce original content; are often already part of a propaganda share campaign.

Lobbyists
- What they do and why they post: work for political or business interests; paid.
- Indicators: mobilize online resources to help prove their points; propagate networks and resources that reinforce their cause; could use fake accounts.

Non-state actors
- What they do and why they post: terrorist organisations, ideological groups, insurgent groups; act for a cause and belief.
- Indicators: organizing for sympathy, financing and influence; recruiting and training; operational coordination; external communications (e.g. beheading videos); video sharing and posting of (cherry-picked and out-of-context) propaganda.

Commercial entities
- What they do and why they post: marketing; companies; to sell stuff.
- Indicators: flood the discussion with fake tweets, long-form content, videos, etc.; click farms; astroturfing.

Conspiring companies
- What they do and why they post: politically motivated companies; paid or earn money through marketing; could have the means to profit after the campaign.
- Indicators: flood the narrative in order to change people’s minds or to try to change the national narrative.

3.1.1 Ideas on methods and techniques

While it might seem easy to determine who the originator or author of a post is, attribution, due to deception, is actually a very difficult challenge. In fact, the framework has in large part been developed in order to increase the confidence with which an association can be made to a specific originator. Understanding the types of potential originators, as described above, is essential. Understanding their motivations and techniques is key to uncovering originators who are trying to deceive.

We are not concerned with all types of deception. For instance, authors who misrepresent themselves in order to appear cooler or more desirable are likely not of interest. Of course, the vast majority of originators are good, law-abiding people. We are really concerned when an originator’s intent is to harm our operations, troops, or society, or whatever else is defined by one’s organizational mandate. Association becomes very difficult when similar content is being authored by both a country’s citizens and foreign-state agents. For example, a common influence technique is for agents to join discussion groups and slowly steer the conversations toward more extreme and radical views with the aim of dividing. There will be ‘normal citizens’ within these groups who hold or will accept these radical views. However, they are not necessarily using deception in order to convince others to adopt these views; they actually believe and can advocate these views, albeit having been deceptively influenced.

There has been research [39-42] on characterising an originator through the use of metadata associated with the account. Data fields such as the username, location, date joined, and the profile picture are a few examples. Does the username resemble a real name or is it just a string of random letters and numbers? To confound matters, there now exist social media platforms that automatically assign a username containing a string of numbers. Does the picture match the profile description? Sometimes stolen pictures of very good-looking people are used to entice others to follow or to like. Photos can be verified using Google or other image databases. Who is following the user? Fake accounts such as bots (programmed accounts made to look like humans) often have nonsensical hexadecimal random-number account names, no photos, and nonhuman characteristics such as rapid posting, posting at odd hours or posting at a rate greater than humanly possible. Botnets (groups of bots with a controlling “MotherBot”, designed to game the algorithms that determine SM platform metrics such as what is trending) usually follow each other and can sometimes be discovered by looking at follower-to-followee ratios on Twitter.
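These metadata cues can be combined into a simple heuristic score. The following is a minimal sketch only; the thresholds and weights are illustrative assumptions, not validated values.

```python
# Minimal sketch of account-metadata heuristics: random-looking username,
# missing photo, superhuman posting rate, skewed follower-to-followee ratio.
# All thresholds and weights are illustrative assumptions.
def bot_likelihood(username, has_photo, posts_per_day, followers, followees):
    score = 0.0
    digit_share = sum(ch.isdigit() for ch in username) / max(len(username), 1)
    if digit_share > 0.4:                      # e.g. "anon48291027"
        score += 0.25
    if not has_photo:
        score += 0.25
    if posts_per_day > 100:                    # faster than plausibly human
        score += 0.25
    if followees > 0 and followers / followees < 0.01:  # follows many, followed by few
        score += 0.25
    return score

print(bot_likelihood("anon48291027", has_photo=False,
                     posts_per_day=400, followers=3, followees=2000))  # -> 1.0
```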

The motivation and intent of the originator are important. For instance, a similar message could be composed by two different actors: a state actor trying to increase division within a target country would likely use deception to hide their identity, whereas an ordinary user in that “target” country would not try to hide their identity and would genuinely believe in what they are writing about.

3.2 Form of post

A post can take many forms, the majority being regular posts by regular users wanting, for example, to share an opinion or a picture, or to show support for a cause. However, those wishing to deceive have developed special forms of posts that are designed to trick readers and change their actions or opinions. We have all received the email from a rich Nigerian royal family member who needs an intermediary to help transfer millions of dollars out of the country; we need only supply our banking information. These scams have been adapted to social media and have been expanded for political exploitation. The form-of-post component is about the method or technique of the post, whereas the semantics of the messaging belong to the message content component.

Marketing agencies are experts at creating forms of messaging that convince us to take certain actions or that create desires leading to purchases. In this same vein, people who wish to deceive have also created certain forms that trick users. The most difficult to detect are the foreign state actors. These agents will take the time to create a persona, over time posting banal messages, perhaps including local images (local to where they would like to be thought to be from) and commenting on local issues, sports, and politics. These personas are then used to become part of the community that the foreign state is trying to influence. Once ‘activated’, the form of the posts slowly becomes more and more radical or divisive. An early version of social media deception was deployed in China. Chen et al. [37] examined the “internet water army” or “50-cent army” of paid posters in China. These writers were paid for posting comments for some hidden purpose and were usually paid based on the number of posts. Chen et al. found that these paid posters have some special behavioural patterns that allow detection through statistical analysis. These patterns included the percentage of replies, the average interval time between posts, the number of days the user remains active, etc. They also found that user IDs were often shared, so one could detect the use of the same ID in different geographical locations within a very short time period, or that large numbers of IDs were created in a short time. Lastly, Chen et al. found that, to save time, paid posters often copied posts and just slightly changed them, hence leading to detection through semantic analysis. The goal was to make the messages look organic, as if they came from individuals sharing, liking and engaging.
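The behavioural statistics described by Chen et al. are straightforward to compute from an account’s post history. A minimal sketch, in which the post format, the similarity measure and the 0.9 near-duplicate threshold are illustrative assumptions:

```python
# Minimal sketch of paid-poster statistics: share of replies, average interval
# between posts, days active, and near-duplicate content pairs.
from datetime import datetime
from difflib import SequenceMatcher

def poster_statistics(posts):
    """posts: list of dicts with 'time' (datetime), 'is_reply' (bool), 'text' (str)."""
    times = sorted(p["time"] for p in posts)
    intervals = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    texts = [p["text"] for p in posts]
    near_dupes = sum(
        1
        for i in range(len(texts))
        for j in range(i + 1, len(texts))
        if SequenceMatcher(None, texts[i], texts[j]).ratio() > 0.9
    )
    return {
        "reply_share": sum(p["is_reply"] for p in posts) / len(posts),
        "avg_interval_s": sum(intervals) / len(intervals) if intervals else None,
        "days_active": (times[-1] - times[0]).days + 1,
        "near_duplicate_pairs": near_dupes,
    }

posts = [
    {"time": datetime(2020, 5, 1, 9, 0), "is_reply": True, "text": "Great point, totally agree!"},
    {"time": datetime(2020, 5, 1, 9, 2), "is_reply": True, "text": "Great point, I totally agree!"},
    {"time": datetime(2020, 5, 1, 9, 5), "is_reply": False, "text": "Everyone should read this article."},
]
print(poster_statistics(posts))
```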

An even more intrusive form of deception is called social engineering. In this method, originators must already know certain details about the target, which they use to build trust. Social engineering is a psychological manipulation process of deceiving people into giving away information or performing an action [32]. An example is phishing emails. These messages often use persuasive cues such as authority, urgency, fear and politeness to increase compliance. While this is not a social media form per se, its characteristics may be used as “message” indicators in social media, such as calls to action that use similar stylistic characteristics.

Deceptive political content is frequently conveyed through the use of memes. Political memes are a subtle form of deception built on half-truths; they are often satirical and slowly erode confidence in the subject of the meme. Granted, messages in which the originator intends the receiver to detect that the content is false (irony, jokes, etc.) are technically forms of deception but should not be considered as such [27].

The above discussion covered several forms of deception used in social media, but new forms are being developed constantly.

3.3 Message content

Content, and how it is interpreted, is entirely dependent on context. There is huge variance in what people post on social media. Literally any topic, hobby, sport, political view, religion, or sexual content, regardless of how obscure, can be found on some social media platform. It is very likely that we could find lies, misinformation, and deceit in most of these areas. All sorts of people lie or report things that they may believe are true but which are actually false. People also believe in things that have no scientific basis as truth (e.g. religion). Further, depending on worldview and political leaning, what one believes to be true could be considered false by others; the latter is often cited as propaganda. We need to be able to differentiate between harmless lies or misinformation and deception. Recall that we characterised deception as a deliberate intent to mislead, where the targets are unaware and where the aim is to transfer a false belief to another person. So, while someone using a photo of themselves from 20 years ago on a dating site would count as deception based on this definition, it might or might not be of interest here. The nature of the content that is of interest will depend on the goals of the deception detection.

Deception in content can be expressed in many different media such as text, image, audio, video, or any combination. Often the medium depends on the form of the message. For instance, a small industry has evolved based on fake reviews on travel sites and Amazon. These deceivers want to appear as ‘normal’ as possible: just regular folk who are voicing an opinion or providing a review. These messages normally contain just text and sometimes photos, with content that is directly related to the product. “The systems that create fraudulent reviews are a complicated web of subreddits, invite-only Slack channels, private Discord servers, and closed Facebook groups, but the incentives are simple: Being a five-star product is crucial to selling inventory at scale in Amazon’s intensely competitive marketplace - so crucial that merchants are willing to pay thousands of people to review their products positively” [38]. The same is true for political influence [7, 8, 37].

As discussed in the section on form of post, many publishers of deceptive content are betting on the fact that most users quickly scroll and just skim through their social media “headlines” and forward, retweet, or like ones that they find interesting or shocking without any verification of facts. From this quick scan of the headline or first sentence of the message, the target (reader) forms an opinion. This opinion can be a very small shift in viewpoint, which then allows for the next very small shift in opinion that, over time, moves increasingly towards the place the originator intends. Deceptive content is often very close to the truth, with just small changes made in favour of the deceptive narrative. Further, publishers of deceptive content will post to users who are already susceptible to the content. For example, politically right-leaning users will only see deceptive content that is also right-leaning and in line with their worldview; the same is true for politically left-leaning users. As the deceptive content is already close to the users’ views, they will tend to believe it.

In the case of state actors producing content, a set of messages based on an overall narrative is used. For example, Russian propaganda has four distinct features [22]: it is high-volume and multichannel; it is rapid, continuous and repetitive; it lacks commitment to objective reality; and it lacks commitment to consistency. The first two features are often achieved through the use of bots or paid posters. The latter two features must be detected within the content of a tweet or in a “salvo of tweets”, a much more difficult indicator of propaganda or deception. According to a NATO study [43], Russia had four strategic (read political) goals for its propaganda effort against Ukraine:

a. to promote Russia as a crucial player in the polycentric world in the process of international peace;

b. to claim Russia’s superiority over the US;

c. to prevent Ukraine’s transformation into being part of the external border of NATO and the European Union; and

d. to soften and, in the nearest future, achieve the lifting of the sanctions regime against Russia.

While this study only looked at TV news stations, it would be very likely that the goals apply to the overall media/propaganda strategy. In fact, similar goals were used by the trolls [7]. Further, Russia has always been concerned with maintaining a land barrier between itself and Europe [43], hence we can assume that any country neighbouring Russia would have similar influence goals applied to them.

Some deception uses content farming, producing long- and short-form content for the highest bidder, deployed around the internet. This content gives the illusion that it is being produced organically by many different people and not by individuals who “work for the company”. The content looks like authentic, non-biased content. It is reworked so that it is not perceived as the company line but is produced in such a way as to be attractive and convincing to “Joe reader”, often using active-voice writing and authentic-sounding language.

Repurposed image or video content is frequently used. Images and videos can be altered or sometimes just used ‘as is’, with accompanying textual content that suits the originator’s message. In this case, reverse image search engines can help detect whether images have been copied from past events.
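One way to approximate such a check locally is perceptual hashing against an archive of previously seen imagery. The sketch below assumes the third-party Pillow and ImageHash packages; the file paths and the distance threshold are placeholders.

```python
# Minimal sketch of detecting repurposed imagery with perceptual hashing.
# Requires the third-party Pillow and ImageHash packages; paths are placeholders.
from PIL import Image
import imagehash

def is_repurposed(candidate_path, archive_paths, max_distance=8):
    candidate = imagehash.phash(Image.open(candidate_path))
    for path in archive_paths:
        # Hash difference is a Hamming distance; small values mean near-identical images.
        if candidate - imagehash.phash(Image.open(path)) <= max_distance:
            return True
    return False

# Example call (placeholder files):
# is_repurposed("new_post.jpg", ["archive/2014_event.jpg", "archive/2016_rally.jpg"])
```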

Fake news reports are a form that repurposes previous images or news reports and twists the message. It exploits the ‘real’ nature of the picture or story, which is often distantly remembered by people who do not actually remember the details, adding a sense of truth to the fake report. Often eye-catching headlines are used that are controversial, or lies that are in tune with a particular worldview (reinforcing someone’s views), which then get quickly shared without the sharer necessarily reading the actual item. Another form used is called rogueing, where fake news reports are made to look like real news reports, be it from newsprint or internet news (e.g. torytube.ca). This form undermines the reader by making the site as close to the real source as possible, leading the viewer to trust the fake source.

In more advanced influence or deception operations, message content is specific to the target audience. It can be hard to determine what the objectives are, but if the messages are obviously biased towards a particular worldview or help to promote a certain country, looking at the narratives the messages support is important.

3.4 Characteristics of the Medium

Social media comes in many forms and has many different characteristics that can affect the potential effectiveness of deception, since the deceiver has to put more or less effort into becoming a “credible” online persona. It can be real-time, such as Twitter, or in slow time, such as a blog. Communication can be synchronous (chat) or asynchronous (Facebook). It can have high or low social presence, which is a measure of how much users are able to project themselves into the medium both socially and affectively. There is also media richness, which ranges from simple text to full-feature avatars. Some sites require that a user build a profile with high self-disclosure, whereas others just require an anonymous username. Further, the purpose of the social media platform itself often defines how open users are to sharing personal information. As Facebook is used by most to keep in touch with family and friends, there is a high propensity for sharing one’s life details, whereas a wiki is focused on the content; sharing of personal information does not usually happen.

Secondly, the ease and impact of deception vary based on the real or perceived security characteristics of social media sites. Sites with elevated levels of security, such as Snapchat (which eliminates messages after a period of time), provide users a greater sense of security. Users tend to settle on a level of risk they are comfortable with, and any differences in the security of a site will then result in changes to their behaviour to maintain a similar level of risk [13]. Twitter offers no privacy of messages or identity (although one is not forced to properly identify oneself to open an account). Users also tend to change their behaviour based on the amount of deception perceived to be on a specific site. When deception is common, people tend to become more suspicious, which means the ease of deception may be reduced [44]. Similarly, aspects associated with the design of sites can have an impact on the level of perceived security. When users perceive that the security of a site is high, they may be more relaxed, which may increase the frequency of deceptive practices on those sites [44]. Social media platforms have recently made progress in improving security through authentication and added privacy settings.

Thirdly, specifics of the medium affect deception. Examples of medium characteristics are shown in Figure 3 for common types of platforms. However, platforms continue to add popular features that were once the ‘secret sauce’ of competitors, so the lines are blurred and platforms today tend to be polyvalent. The medium does affect deception. For instance, text-based, asynchronous media, such as blogs, allow time for deception, as deceivers can more “easily maintain the lie” over this type of slow-moving medium. A blog is non-real-time, with a low social presence, but generally has high self-disclosure. A blogger wants to be identified with their writings, which might seem contradictory for a deceiver (think conspiracy theory). Blogs are often linked to from a tweet and tend to be used to delve deeper into the referenced subject. So a tweet is used as a billboard, to attract attention through a catchy phrase or image, which then links to the ‘background theory’ found in the blog. This background takes the reader through a slow and methodical reasoning process to try to achieve buy-in.

Figure 3 Self-disclosure vs richness of the media for various forms of social media.

The level of perceived deception on a site means that users will already be wary of the possibility of deception. The level of perceived security of a site affects users’ level of trust amongst members. Stricter user sign-up procedures lead to a higher sense of security and trust, and hence a higher potential for successful deception [45].

3.5 Means of Transmission

The means of transmission of electronic communications can be manipulated at any point between the sender and receiver. The packets of information can be re-routed and modified. Tsikerdekis [45] notes that this type of manipulation is costly, as it requires greater technical skill than other forms of deception; however, there is a high likelihood of success. A fairly common technique is web-page redirection to a false page, called page-jacking (close replication of the name or look of a website, aimed at capturing usernames and passwords).
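A simple indicator of page-jacking style deception is a domain that closely resembles, but does not match, a trusted one. The sketch below uses plain string similarity; the trusted list and the 0.8 threshold are illustrative assumptions.

```python
# Minimal sketch of flagging lookalike domains of the kind used in page-jacking:
# a domain that is very similar, but not identical, to a trusted one is suspect.
from difflib import SequenceMatcher

TRUSTED_DOMAINS = ["facebook.com", "twitter.com", "canada.ca"]

def lookalike_of(domain, threshold=0.8):
    for trusted in TRUSTED_DOMAINS:
        if domain == trusted:
            return None  # exact match, nothing suspicious
        if SequenceMatcher(None, domain, trusted).ratio() >= threshold:
            return trusted  # close misspelling or character swap
    return None

print(lookalike_of("faceb00k.com"))  # -> "facebook.com"
```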

Much research remains to be conducted for this part of the deception framework. Given its nature, such research will likely be classified and conducted with the aid of signals intelligence officers.

3.6 How messages propagate

Social networks are formed for many reasons, such as keeping in touch with school friends and family, maintaining professional ties, sharing and learning about a hobby, or supporting one’s political interests. In the early years of social media (circa 2007-10), messages propagated naturally, or organically, from person to person based on interest. Algorithms were soon added to help users ‘see’ what was of interest to others, i.e. what was trending. However, some users quickly learned how to game the algorithms, for example with hashtag campaigns (very famously done by the Justin Bieber fan base) but also through the use of software scripts called bots. These bots are able to share messages at superhuman speed and thus elevate messages to the top of the trending list. Using many bots in unison led to even greater ways to manipulate the propagation of messages on social media platforms. A botnet, a group of bots acting together, can promote, suppress, bury, or corrupt messages using various techniques. These botnets are often manipulated to try to emulate organic propagation.
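One observable signature of such coordination is many distinct accounts posting near-identical text within a short window. A minimal sketch, in which the input format, the five-minute window and the three-account threshold are assumptions for illustration:

```python
# Minimal sketch of spotting botnet-style coordination: several distinct
# accounts posting near-identical text within a short time window.
from collections import defaultdict
from datetime import datetime, timedelta

def coordinated_bursts(posts, window=timedelta(minutes=5), min_accounts=3):
    """posts: list of dicts with 'account', 'time' (datetime), 'text' (str)."""
    by_text = defaultdict(list)
    for p in posts:
        normalized = " ".join(p["text"].lower().split())  # collapse case/whitespace
        by_text[normalized].append(p)
    bursts = []
    for text, group in by_text.items():
        group.sort(key=lambda p: p["time"])
        accounts = {p["account"] for p in group}
        if len(accounts) >= min_accounts and group[-1]["time"] - group[0]["time"] <= window:
            bursts.append((text, sorted(accounts)))
    return bursts

posts = [
    {"account": "@a", "time": datetime(2020, 11, 3, 10, 0), "text": "Share this NOW"},
    {"account": "@b", "time": datetime(2020, 11, 3, 10, 1), "text": "share this now"},
    {"account": "@c", "time": datetime(2020, 11, 3, 10, 2), "text": "Share this now"},
]
print(coordinated_bursts(posts))
```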

Lavigne [39] has assembled a partial list of known techniques that could be carried out using bots or coordinated groups of users:

a. Astroturfing fakes grassroots activity. A targeted comment expressing one point of view is followed by a series of comments that push it down and bury it, creating the impression of genuine, large-scale activity around the topic. This manufactured volume can take the form of many comments that all lead to the same link, or of the same handles appearing across different topic forums and discussions that lead back to the same long-form content.

b. Paid influencers (tweet - news - retweet) are originators who are in the movement because they are paid. A provocateur agent, or group of posters, can work for both sides. An information agitator is essentially doing influencer marketing (e.g., the Kardashians: they show products, are shown opening products, report on their use of products, etc.), and it is hard to determine what is and is not being paid for. The paid influencer is, strictly speaking, an actor; their content technique is to slip created or paid messages into the daily stream of personal information.

c. Fake blogs are used to repeat the same or similar format and narrative. They reuse chunks of text that are identical or very similar while the apparent author changes (a short text-similarity sketch that flags such reuse follows this list). The goal is to rank higher in Google search results and thus gain greater exposure. These fake blogs promote something (often a product, but possibly a narrative) and can also generate Google AdWords revenue. Cookies attached to these blogs can then be used to track visitors and determine their interests, so that they can later be served targeted ads. This technique helps to identify 'useful idiots' who share the content thinking it is real; these useful idiots are then exploited to lend a sense of genuine interest to the cause.

d. Paid trolls are originators who use aggressive and emotional content designed to disrupt a narrative. Examples include: flame wars (conversations in which people simply insult one another); the use of hate speech; and the insertion of a set of totally irrelevant messages into a topic, hijacking the conversation and enraging the participants to the point that they no longer discuss the original topic or leave the topic group altogether. Russian trolls will often work in teams and use 'good cop, bad cop' techniques in a ploy for authenticity.
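The text-similarity sketch referenced in item (c) is shown below (Python; the shingle size, threshold, and toy corpus are assumptions). It flags pairs of posts that reuse near-identical chunks of text under different author identities, one simple indicator of the fake-blog technique.

# Minimal sketch referenced from item (c): detect near-duplicate "fake blog"
# posts by comparing word-shingle sets with Jaccard similarity. Shingle size,
# threshold and corpus are illustrative assumptions, not a validated tool.
from itertools import combinations

def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word shingles for a post (lower-cased, light cleanup)."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicates(posts: dict[str, str], threshold: float = 0.6):
    """Yield pairs of post ids whose shingle overlap exceeds the threshold."""
    sets = {pid: shingles(body) for pid, body in posts.items()}
    for (p1, s1), (p2, s2) in combinations(sets.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            yield p1, p2, round(score, 2)

# Usage with a hypothetical corpus keyed by blog/post id
corpus = {
    "blog_a/post1": "The same promotional narrative copied across many blogs with minor edits",
    "blog_b/post7": "The same promotional narrative copied across many blogs with small edits",
}
print(list(near_duplicates(corpus)))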

Russia employs professional trolls on a 24-hour cycle whose job is to ensure pro-Russia messages and comments are spread via forums and social media [7]. These messages are produced in many languages, for both Russian and foreign audiences.

Many social media platforms earn revenue from advertising. This has led platforms to keep very detailed demographic information about their users so that targeted advertising is possible. Platforms also record what users do, what sites they visit, what interests they have, what they like, and so on. These 'multiple touch points' over time are key to selling products and to influencing through political and commercial advertising. When a user engages with fake news or shows an interest in a political viewpoint, it is noted in their data file and leads to banner ads or targeted messages delivered directly to their feed. This targeting from within social media can also follow the user onto Google searches or advertising on other websites, which is why you see ads in your Facebook feed for the new product you just searched for.

Another way messages are deceptively boosted is through click farming, in which a room full of people is paid to click all day on certain messages or advertisements. This leads the social media algorithms to overvalue that content and consequently push it into more people's feeds. Similarly, one can buy a group of 'fake friends' to make a user seem more important than they actually are, and hence make their messages seem more important.

New deceptive schemes are constantly being created with the aim of getting one's message seen or pushing down an opponent's message. Social network analysis tools can be used to analyze how messages propagate through the network and to uncover indications of deception.
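As one hedged example of what such social network analysis could look like, the sketch below (Python, assuming the networkx library and a hypothetical list of retweet records) builds a propagation graph and surfaces originators whose fan-out arrives almost instantaneously, a pattern more consistent with botnet amplification than with organic sharing.

# Minimal sketch (assumes networkx and hypothetical retweet records): build a
# propagation graph and flag originators whose large fan-out arrives within
# seconds, a crude coordinated-amplification indicator. Thresholds are assumptions.
import networkx as nx

def propagation_graph(retweets: list[dict]) -> nx.DiGraph:
    """retweets: [{'source': original_author, 'target': retweeter, 'delay_s': seconds}]"""
    g = nx.DiGraph()
    for r in retweets:
        g.add_edge(r["source"], r["target"], delay_s=r["delay_s"])
    return g

def amplification_hubs(g: nx.DiGraph, min_fanout: int = 50, fast_share: float = 0.8):
    """Flag originators with large fan-out where most retweets arrive almost instantly."""
    flagged = []
    for node in g.nodes:
        edges = list(g.out_edges(node, data=True))
        if len(edges) < min_fanout:
            continue
        fast = sum(1 for _, _, d in edges if d["delay_s"] < 60)
        if fast / len(edges) >= fast_share:
            flagged.append((node, len(edges), round(fast / len(edges), 2)))
    return flagged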

3.7 Target Audience

Who or what is a target? The simple answer is that anyone or anything can be the target of deception; it depends on who is of concern. The question then becomes how one identifies the target (demographics, worldview, geolocation, associations, loyalties, etc.).

Some targets may be influencers of the actual targets of interest. News organizations could be targets, thus helping to spread the message. Some targets will be 'useful idiots': people who spread deception unwittingly, believing it to be true. Targets could also be chosen based on their level of commitment. For example, merely reading a message indicates very low commitment, retweeting indicates more, retweeting with a comment or using an '@mention' or reply indicates a higher level still, and creating a complementary message of one's own indicates the highest level of commitment (a simple scoring sketch follows below).
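One way to operationalize this commitment ladder is sketched below (Python; the numeric weights are illustrative assumptions, not values proposed in this paper), so that audience members can be ranked by how actively they engage with a suspect message.

# Minimal sketch: the commitment ladder above expressed as illustrative weights
# (the numbers are assumptions), used to rank audience members by engagement.
COMMITMENT_WEIGHTS = {
    "read": 1,                      # very low commitment
    "retweet": 2,
    "retweet_comment": 3,
    "mention_or_reply": 3,
    "original_supporting_post": 4,  # highest commitment
}

def commitment_score(actions: list[str]) -> int:
    """Sum the weights of a user's observed engagement actions."""
    return sum(COMMITMENT_WEIGHTS.get(a, 0) for a in actions)

# Usage with hypothetical observed actions for one account
print(commitment_score(["read", "retweet", "mention_or_reply"]))  # -> 6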

While some users are savvy and able to detect deception to a certain extent, many can be blinded by biases or strong beliefs that block objective reflection. Users who are technologically literate have a higher chance of detecting deception attacks [45]. Identifying vulnerable audiences based on message content is a valid method, but much research remains to link existing knowledge of psychological biases to deception within social media.

4. Conclusion and Future Research

Research on deception has been conducted in many different fields, but it lacks coherence when applied to social media. Military intelligence, like many other organizations, requires a certain level of confidence when identifying deception. The research framework presented above provides greater clarity on how differing aspects of messages can be deceptive. For operators, triangulating several characteristics of social media will lead to higher confidence in declaring a message or originator deceptive. However, a framework is just that: it still requires fleshing out with tools and methods.

As this paper presents a framework, research needs to be conducted to produce the models, indicators, and analytics that can actually detect deception. Current research in this area uses neural networks, bidirectional transformers, deep learning and machine learning, as well as natural language processing. These analytical methods of detection normally require training data. Deception is an art form; it will never remain static but will continue to evolve and change. Hence, models and algorithms will continually require tweaking and validation.
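As a hedged illustration of the kind of supervised approach such research employs, the baseline below (Python, using scikit-learn; the two toy examples and labels are placeholders, and any real classifier would need large, carefully curated, regularly refreshed training corpora) learns to separate 'deceptive' from 'benign' text from labelled examples.

# Minimal baseline sketch (not the authors' system): a supervised text classifier
# of the kind that needs the labelled training data discussed above. The examples
# and labels are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Breaking: secret document proves the election was stolen, share before it is deleted",
    "City council meets Tuesday to discuss the new bike lane proposal",
]
labels = [1, 0]  # 1 = deceptive, 0 = benign (placeholder labels)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

print(model.predict(["Leaked memo they do not want you to see, retweet now"]))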

A promising avenue of research is detecting deception through the understanding of narratives, the content element of this framework. However, current social media analytic tools are not capable of narrative search, so research on the use of narratives in detecting deception is required.

5. References

[1] Timberg, C., Russian propaganda effort helped spread 'fake news' during election, experts say, in The Washington Post, Business section, 2016.
[2] Jolicoeur, P. and A. Seaboyer, The Evolution of Russian Cyber Influence Activity: A Comparison of Russian Cyber Ops in Georgia (2008) and Ukraine (2014), DND, Editor, 2014: Royal Military College of Canada.
[3] Berzins, J., Russia's new generation warfare in Ukraine: Implications for Latvian defense policy, E. C.f.S.a.S. Research, Editor, 2014, National Defence Academy of Latvia. p. 15.
[4] Zhdanova, M. and D. Orlova, Computational Propaganda in Ukraine: Caught Between External Threats and Internal Challenges, in Computational Propaganda Research Project, S. Woolley and P.N. Howard, Editors, 2017, University of Oxford: UK.
[5] Pomerantsev, P. and M. Weiss, The menace of unreality: How the Kremlin weaponizes information, culture and money, in A special report by The Interpreter, a project of the Institute for Modern Russia, 2014.
[6] PropOrNot, Is It Propaganda Or Not? Your Friendly Neighborhood Propaganda Identification Service, Since 2016!, 2016 [cited 26 April 2017]; Available from: http://www.propornot.com/p/the-list.html.
[7] Volchek, D. and D. Sindelar, One professional Russian troll tells all, 2015 [cited May 2017]; Available from: https://toinformistoinfluence.com/2015/05/31/one-professional-russian-troll-tells-all/.
[8] Wikipedia. . 2019, 10 February 2019 [cited 21 February 2019].
[9] LaFrance, A., The Prophecies of Q: American conspiracy theories are entering a dangerous new phase, in The Atlantic, 2020: Boston, MA.
[10] Funke, D., This website impersonated a fact-checking outlet to publish fake news stories, in Poynter, 2019, The Poynter Institute.
[11] Donovan, L.-C.M.J., Strategic Deception: Operation Fortitude, 2014: Lucknow Books.
[12] Tsikerdekis, M. and S. Zeadally, Online deception in social media. Commun. ACM, 2014. 57(9): p. 72-80.
[13] Parsons, K. and D. Calic, Understanding online deception: A review, in Misinformation in Social Media, 2015, TTCP: TTCP Technical Report.
[14] Social media, in Merriam-Webster, 2019, Merriam-Webster.com.
[15] Starbird, K., Examining the Alternative Media Ecosystem Through the Production of Alternative Narratives of Mass Shooting Events on Twitter, in ICWSM 2017, 2017, Association for the Advancement of Artificial Intelligence.
[16] Starbird, K., J. Maddock, M. Orand, P. Achterman, and R. Mason, Rumours, false flags, and digital vigilantes: misinformation on Twitter after the 2013 Boston marathon bombing, in iConference 2014, 2014.
[17] Nielsen, R.K., R. Fletcher, N. Newman, J.S. Brennen, and P.N. Howard, Navigating the 'Infodemic': How people in six countries access and rate news and information about Coronavirus. Misinformation, Science, and Media, 2020, April.
[18] Dubois, E. and G. Blank, The echo chamber is overstated: the moderating effect of political interest and diverse media. Information, Communication & Society, 2018. 21(5): p. 729-745.
[19] Hampton, K., L.S. Goulet, L. Rainie, and K. Purcell, Social networking sites and our lives, Pew Research Center, Editor, 2011.
[20] Moldoveanu, M. and J. Baum, "I think you think I think you're lying": The interactive epistemology of trust in social networks. Management Science, 2011. 57(2): p. 393-412.
[21] Lewandowsky, S., U.K.H. Ecker, C.M. Seifert, N. Schwarz, and J. Cook, Misinformation and Its Correction: Continued Influence and Successful Debiasing. Psychological Science in the Public Interest, 2012. 13(3): p. 106-131.
[22] Kuklinski, J.H., P.J. Quirk, J. Jerit, D. Schwieder, and R.F. Rich, Misinformation and the currency of democratic citizenship. Journal of Politics, 2000. 62(3): p. 790-816.
[23] Moturu, S. and H. Liu, Quantifying the trustworthiness of social media content. Distributed and Parallel Databases, 2011. 29: p. 239-260.
[24] Paul, C. and M. Matthews, The Russian "Firehose of Falsehood" Propaganda Model, 2016, RAND Corporation.
[25] Noble, J. and J. Hempson-Jones, Types of Misinformation in Social Media, in Misinformation in Social Media, TTCP, Editor, 2015, TTCP Technical Report. p. 6-14.
[26] Hancock, J.T., Digital deception: Why, when and how people lie online, in Oxford Handbook of Internet Psychology, A.N. Joinson, et al., Editors, 2007, Oxford University Press: Oxford, UK. p. 289-301.
[27] Hancock, J.T., L. Curry, S. Goorha, and M. Woodworth, On Lying and Being Lied To: A Linguistic Analysis of Deception in Computer-Mediated Communication. Discourse Processes, 2008. 45(1): p. 1-23.
[28] Rubin, V.L., On deception and deception detection: Content analysis of computer-mediated stated beliefs. Proceedings of the American Society for Information Science and Technology, 2010. 47(1): p. 1-10.
[29] Afroz, S., M. Brennan, and R. Greenstadt, Detecting Hoaxes, Frauds, and Deception in Writing Style Online, in 2012 IEEE Symposium on Security and Privacy, 2012.
[30] Conroy, N.J., V.L. Rubin, and Y. Chen, Automatic deception detection: Methods for finding fake news. Proceedings of the Association for Information Science and Technology, 2015. 52(1): p. 1-4.
[31] Bond, C.F. and B.M. DePaulo, Accuracy of Deception Judgments. Personality and Social Psychology Review, 2006. 10(3): p. 214-234.
[32] Atkins, B. and W. Huang, A Study of Social Engineering in Online Frauds. Open Journal of Social Sciences, 2013. 1(03): p. 23-32.
[33] Hancock, J.T., J. Thom-Santelli, and T. Ritchie, Deception and design: the impact of communication technology on lying behavior, in ACM Conference on Human Factors in Computing Systems, 2004, New York.
[34] Zhou, L., J.K. Burgoon, J.F. Nunamaker Jr., and D. Twitchell, Automating Linguistics-based Cues for Detecting Deception in Text-based Asynchronous Computer-Mediated Communications. Group Decision and Negotiation, 2004. 13: p. 81-106.
[35] Lazitski, O., Media Endarkenment: A Comparative Analysis of 2012 Election Coverage in the United States and Russia, 2013 [cited 10 May 2017]; Available from: http://www.mediaendarkenment.com/my-publications.html.
[36] Wardle, C., Fake news. It's complicated, 2017 [cited 22 April 2017]; Available from: firstdraftnews.com.
[37] Chen, C., K. Wu, V. Srinivasan, and X. Zhang, Battling the Internet Water Army: Detection of Hidden Paid Posters. arXiv preprint arXiv:1111.4297, 2011.
[38] Doerr, B., M. Fouz, and T. Friedrich, Why rumors spread so quickly in social networks. Communications of the ACM, 2012. 55(6): p. 70-75.
[39] Lavigne, V., Social Bots - Literature Review, 2015, Defence Research and Development Canada: Ottawa.
[40] Gorwa, R. and D. Guilbeault, Unpacking the Social Media Bot: A Typology to Guide Research and Policy. Policy & Internet, 2018.
[41] Rao, D. and D. Yarowsky, Detecting Latent User Properties in Social Media, 2009.
[42] Yang, Z., J. Guo, K. Cai, J. Tang, J. Li, L. Zhang, and Z. Su, Understanding retweeting behaviors in social networks, in Proceedings of the 19th ACM International Conference on Information and Knowledge Management, 2010, ACM: Toronto, ON, Canada. p. 1633-1636.
[43] NATO, The dynamics of Russia's information activities against Ukraine during the Syria campaign, 2016: NATO Strategic Communications COE, Riga.
[44] Donath, J.S., Identity and deception in the virtual community. Communities in Cyberspace, 1998: p. 29-59.
[45] Tsikerdekis, M. and S. Zeadally, Online deception in social media. Library and Information Science Faculty Publications, 2014. 2014(9).

DOCUMENT CONTROL DATA
*Security markings for the title, authors, abstract and keywords must be entered when the document is sensitive

1. ORIGINATOR (Name and address of the organization preparing the document. A DRDC Centre sponsoring a contractor's report, or tasking agency, is entered in Section 8.)

Institute for Defence Analysis, IDA Headquarters, IDA Systems and Analyses Center, 4850 Mark Center Drive, Alexandria, VA 22311-1882

2a. SECURITY MARKING (Overall security marking of the document including special supplemental markings if applicable.)

CAN UNCLASSIFIED

2b. CONTROLLED GOODS

NON-CONTROLLED GOODS
DMC A

3. TITLE (The document title and sub-title as indicated on the title page.)

Towards a Deception Detection Framework for Social Media

4. AUTHORS (Last name, followed by initials – ranks, titles, etc., not to be used)

Forrester, B.; Franqué, F.

5. DATE OF PUBLICATION (Month and year of publication of document.)

December 2020

6a. NO. OF PAGES (Total pages, including Annexes, excluding DCD, covering and verso pages.)

21

6b. NO. OF REFS (Total references cited.)

45

7. DOCUMENT CATEGORY (e.g., Scientific Report, Contract Report, Scientific Letter.)

External Literature (P)

8. SPONSORING CENTRE (The name and address of the department project office or laboratory sponsoring the research and development.)

DRDC – Valcartier Research Centre Defence Research and Development Canada 2459 route de la Bravoure Québec (Québec) G3J 1X5 Canada

9a. PROJECT OR GRANT NO. (If appropriate, the applicable research and development project or grant number under which the document was written. Please specify whether project or grant.)

05cc - Influence Activities in support of Joint Targeting

9b. CONTRACT NO. (If appropriate, the applicable number under which the document was written.)

10a. DRDC PUBLICATION NUMBER (The official document number by which the document is identified by the originating activity. This number must be unique to this document.)

DRDC-RDDC-2020-P228

10b. OTHER DOCUMENT NO(s). (Any other numbers which may be assigned this document either by the originator or by the sponsor.)

11a. FUTURE DISTRIBUTION WITHIN CANADA (Approval for further dissemination of the document. Security classification must also be considered.)

Public release

11b. FUTURE DISTRIBUTION OUTSIDE CANADA (Approval for further dissemination of the document. Security classification must also be considered.)

12. KEYWORDS, DESCRIPTORS or IDENTIFIERS (Use semi-colon as a delimiter.)

Social media; Deception detection

13. ABSTRACT/RÉSUMÉ (When available in the document, the French version of the abstract must be included here.)

The democratization of communication media has had significant consequences for military command and control. Never before have adversaries had such free and direct access to our local populations allowing for influence through propaganda, disinformation and deception. In fact, social media platforms can help target messages to the exact demographic desired, while keeping attribution hidden. Commanders have been reluctant to embrace the new communication technologies which has resulted in playing catch up. Meanwhile, our opponents have infiltrated our communication spaces and ‘recruited’ thousands of followers who spread their messages unwittingly. This paper will present a new research framework for deception that will help overcome the issues of attribution of message originators. Concentrating on uncovering narratives, methods and intent rather than individuals alleviates many ethical problems of social media analytics within western societies. The framework will help to guide research on deception detection and increase assessment confidence for intelligence analysts and public affairs involved in the ongoing information environment clashes of power and politics.