Received November 23, 2020, accepted December 14, 2020, date of publication December 24, 2020, date of current version January 5, 2021.

Digital Object Identifier 10.1109/ACCESS.2020.3047337

Online Social Deception and Its Countermeasures: A Survey

ZHEN GUO 1, JIN-HEE CHO 1, (Senior Member, IEEE), ING-RAY CHEN 1, (Member, IEEE), SRIJAN SENGUPTA 2, MICHIN HONG 3, AND TANUSHREE MITRA 4
1 Department of Computer Science, Virginia Tech, Falls Church, VA 22043, USA
2 Statistics, North Carolina State University, Raleigh, NC 27695, USA
3 School of Social Work, Indiana University, Indianapolis, IN 46202, USA
4 Information School, University of Washington, Seattle, WA 98195, USA
Corresponding author: Zhen Guo ([email protected])

ABSTRACT We are living in an era when online communication over social network services (SNSs) has become an indispensable part of people’s everyday lives. As a consequence, online social deception (OSD) in SNSs has emerged as a serious threat in cyberspace, particularly for users vulnerable to such cyberattacks. Cyber attackers have exploited the sophisticated features of SNSs to carry out harmful OSD activities, such as financial fraud, privacy threats, or sexual/labor exploitation. Therefore, it is critical to understand OSD and develop effective countermeasures against OSD for building trustworthy SNSs. In this paper, we conduct an extensive survey, covering 1) the multidisciplinary concept of social deception; 2) types of OSD attacks and their unique characteristics compared to other social network attacks and cybercrimes; 3) comprehensive defense mechanisms embracing prevention, detection, and response (or mitigation) against OSD attacks along with their pros and cons; 4) datasets/metrics used for validation and verification; and 5) legal and ethical concerns related to OSD research. Based on this survey, we provide insights into the effectiveness of countermeasures and the lessons learned from the existing literature. We conclude our survey with in-depth discussions on the limitations of the state-of-the-art and suggest future research directions in OSD research.

INDEX TERMS Online social deception, cyberattacks, security, defense, prevention, detection, and response, social media, online social networks.

I. INTRODUCTION

A. MOTIVATION
Social media and social network services (SNSs) have become an indispensable part of people’s everyday lives. In 2020, approximately 82% of Americans reported using social media [93]. This significant surge is due to the various benefits that users enjoy, such as easy communication with others, engagement in civic and political activities, searching for jobs, marketing, and/or sharing information or emotional support. Even with these significant benefits, many people have ambivalent feelings about social media due to privacy concerns and/or deceptive activities aiming to harm normal, legitimate users [153]. The proliferation of highly advanced social media technologies has been exploited by perpetrators as a convenient tool for deceiving users [7]. The widespread damage due to online social deception (OSD) attacks has increased significantly in recent times, with about 25% of people experiencing some type of social deception, such as identity theft or fraud, in 2018 [156]. The serious consequences have led to such OSD attacks being defined as cybercrimes [139] since the early 2000’s. The advanced features of SNS technologies have further facilitated the significant increase of serious, sophisticated cybercrimes, beyond simple phishing, such as human trafficking, online consumer fraud, identity cloning, hacking, child pornography, or online stalking [192]. Therefore, we need to deeply understand OSD and think of how to develop effective countermeasures against OSD for building a trustworthy cyberspace.

Although there have been several papers surveying online social network (OSN) attacks [2], [54], [58], [92], [98], [137], [154], [216], [218], the existing surveys are limited to discussing detection mechanisms using various artificial intelligence (AI) techniques, including machine learning, deep learning, or text mining. They did not really embrace a wide spectrum of defense against OSN attacks, such as prevention, detection, and response (or mitigation). Further, there has been a lack of discussion on deception, which is exploited as the starting point of most OSN attacks.


B. RESEARCH GOAL & QUESTIONS
To fill the gap discussed above, this work aims to deliver a comprehensive, systematic survey for researchers to efficiently and effectively grasp the large volume of state-of-the-art literature on OSD attacks and their countermeasures in terms of three aspects of defense: prevention, detection, and response (or mitigation). To this aim, the scope of our survey focuses on answering the following research questions:
RQ1: How is OSD affected by the fundamental concepts and characteristics of social deception which have been studied in multidisciplinary domains?
RQ2: What are new attack types based on the recent trends of OSD attacks observed in real online worlds, and how are they related to common social network attacks, cybercrimes, and security breaches from cybersecurity perspectives?
RQ3: How can the cues of social deception and/or susceptibility traits to OSD affect the strategies of attackers and defenders in OSNs?
RQ4: What kinds of defense mechanisms and/or methodologies need to be explored to develop better defense tools combating OSD attacks?
RQ5: What are the key limitations of existing validation and verification methodologies in terms of datasets and metrics?
RQ6: What are the key concerns associated with ethical issues in conducting OSD research?

C. COMPARISON WITH EXISTING SURVEY PAPERS
As social deception leverages OSNs as platforms, there have been several survey papers [2], [54], [58], [92], [98], [137], [154], [216], [218] discussing social network attacks. Fire et al. [54] mainly discussed social network threats targeted at young children in terms of phishing, spamming, fake identity, profile cloning attacks, cyberbullying, and cyber-grooming. Rathore et al. [154] surveyed social network attacks with a special emphasis on multimedia security and privacy. Since fake news is an emerging deception attack in OSNs, a recent effort by Kumar and Shah [98] discussed the details of fake news detection methods. Although the existing works stated above [54], [98], [154] proposed mechanisms to mitigate specific social deception threats, they focused on discussing prevention methods and practical security suggestions. An interesting observation is that no work has discussed ethical issues in developing techniques to deal with OSN threats/attacks. Besides, we observed a lack of understanding of the pros and cons of each detection or mitigation technique that combats online social deception attacks.

Rathore et al. [154] conducted a comprehensive survey on social network security. They classified social network security threats into three categories, namely multimedia content threats, traditional threats, and social threats, with 21 types of threats/attacks. The authors mainly discussed multimedia content threats, along with their definitions, impact, and security response methods, including detection methods for each type of threat. They also compared various security attacks in terms of the nature of the attack (attack source), attack difficulty, risk to data privacy/integrity, and attack impact on users. In the end, they proposed a framework to measure and optimize the security of SNSs.

Novak and Li [137] focused on OSN security and data privacy issues. They discussed how to protect user data from attacks through research in social network inference (e.g., user attributes, location hubs, and link prediction) and in anonymizing social network data. Gao et al. [58] discussed four types of social network attacks, including privacy breaches, viral marketing attacks, and network structural attacks. The authors compared various attacks, including information leaks, de-anonymizing, phishing, Sybil, malware, and spamming, and discussed countermeasure defense mechanisms against them.

Fire et al. [54] discussed key OSN threats and solutions against them. The authors outlined OSN threats with an additional focus on attacks against children and teenagers, covering 5 classic threats, 9 modern threats, combination threats, and 3 threats targeting children. The defense solutions were techniques provided by OSN operators, commercial companies, and academic researchers, and the protection ability of various solutions was discussed. In the end, they provided recommendations for OSN users to protect their security and privacy when using social networks.

Kayes and Iamnitchi [92] reviewed the taxonomies of privacy and security attacks and their solutions in OSNs. The authors categorized the attacks based on OSNs’ stakeholders (users and their OSNs) and the entities (i.e., humans, computer programs, or organizations) performing the attacks. They discussed attacks on users’ information and how to counter leakages and linkages. However, the attacks discussed as social deception are common social network attacks, such as Sybil attacks, compromised accounts, and/or spam. The defense techniques to mitigate each attack type were discussed as ways to detect and resist those attacks.

Kumar and Shah [98] discussed the characteristics and detection of false information on the Web and social media, with two knowledge-based types: opinion-based methods with ground truth (e.g., fake reviews), and fact-based methods without ground truth (e.g., hoaxes and rumors). They described how false information can perform successful deception attacks, its impact on the speed of false information propagation, and the characteristics of each type. Based on these specific characteristics, the authors discussed the detection algorithms for each type utilizing different features and propagation models in terms of the analysis of classification, key actors, impacts, features, and measurements. In addition, they discussed the detection algorithms for opinion-based and fact-based detection mechanisms, respectively.


TABLE 1. Comparison of the key contributions of our survey paper and other existing survey papers.

Wu [216] summarized misinformation in social media, focusing more on unintentionally spread misinformation, such as memes, spam, rumors, and fake news. The survey discussed information diffusion models and network structure, misinformation detection and spreader detection, misinformation intervention, and detailed evaluation datasets and metrics. The diffusion models covered are the SIR (Susceptible-Infected-Recovered/Removed), Tipping Point, Independent Cascade, and Linear Threshold models (a minimal sketch of the Independent Cascade process is given at the end of this subsection). In the diffusion process, user types can be categorized as forceful individuals [2], which refer to users not affected upon belief exchange. Wu and Liu [218] described detecting crowdturfing in social media. The authors summarized the history of astroturfing campaigns and crowdturfing. The methods to investigate crowdturfing are mining and profiling social media users as attackers and modeling information diffusion in social media. Finally, crowdturfing detection can be performed with content-based, behavior-based, and diffusion-based approaches in the state-of-the-art research. However, this work [218] limited its scope only to crowdturfing. Hence, we did not include it in TABLE 1 for the comparison of our survey paper with other counterpart survey papers.

Tsikerdekis and Zeadally [187] analyzed the motivations and techniques of online deception in social media platforms. They categorized social media by the extent of media richness and self-disclosure. Due to the user connection and content sharing nature of social media, online deception techniques can involve multiple roles, such as content, sender, and communication channel. They also provided an insightful discussion of the challenges in prevention and detection of online deception. However, this work did not discuss the attack behaviors considered in our paper.

Based on the existing survey papers [2], [54], [58], [92], [98], [137], [154], [216], [218], we found that there is no comprehensive survey paper on online social deception, which sits between OSN threats and cybercrimes. The most related works discussed above focused on security and privacy issues and their solutions in OSNs. Most previous studies analyzed various types of OSN threats and provided detection methods for specific types of security threats. However, they usually discussed traditional types of security issues, which only partially overlap with our definition of social deception threats. We intend to cover more types of OSD threats and provide a full range of solutions using a wide spectrum of defense strategies, including prevention, detection, and response (or mitigation). To clarify the contributions of our survey paper, we demonstrate the key differences in scope and surveyed techniques between our survey paper and the existing OSN security and/or attack papers in TABLE 1. We list the key contributions of our survey paper compared to existing survey papers in the following section.
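To make the diffusion models mentioned above more concrete, the following is a minimal sketch of the Independent Cascade process on a toy directed graph. The graph, seed set, and activation probability are hypothetical illustrations for exposition only, not parameters taken from any surveyed work.

```python
import random

def independent_cascade(graph, seeds, activation_prob=0.1, rng=random.Random(42)):
    """One run of the Independent Cascade model.

    graph: dict mapping a user to the users who see its posts.
    seeds: initial spreaders of a piece of (mis)information.
    A newly activated user gets a single chance to activate each inactive
    neighbor, succeeding with probability `activation_prob`.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for user in frontier:
            for neighbor in graph.get(user, []):
                if neighbor not in active and rng.random() < activation_prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active

# Hypothetical toy follower network.
toy_graph = {"A": ["C", "D"], "B": ["D"], "C": ["E"], "D": ["E", "F"], "E": [], "F": []}
print(sorted(independent_cascade(toy_graph, seeds={"A"}, activation_prob=0.5)))
```

Averaging the activated set over many runs estimates the expected spread of a piece of misinformation, which is how such cascade models are typically used in misinformation-diffusion studies.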


D. KEY CONTRIBUTIONS
We made the following key contributions in this paper:
• To understand the fundamental meaning of social deception and its key characteristics, we comprehensively surveyed the multidisciplinary concepts and key properties of social deception. No previous survey paper has addressed all these concepts together to understand the fundamental meanings of social deception.
• We provided a comprehensive set of OSD attacks by following the key properties of social deception (see Section II-D). In particular, we discussed the relationships between social network attacks, OSD attacks, and cybercrimes by describing the relationships between them, the major attacks in each category, and the attack goals of OSD in terms of the loss of security goals.
• We provided an overview of social deception cues which have been studied in multidisciplinary domains, including individual, cultural, linguistic, physiological, psychological, and technological deception cues. This literature survey on deception cues is helpful for obtaining useful insights for developing better defense tools in terms of prevention, detection, and response against OSD attacks.
• To provide a more comprehensive understanding of a system-level defense framework against OSD attacks, we extensively surveyed the three types of defense mechanisms, including prevention, detection, and response (or mitigation), which are summarized in TABLES 5–7.
• We provided the pros and cons of major defense approaches to combat OSD attacks and the overall trends of the state-of-the-art OSD defense techniques. This helps a reader easily identify relevant defense techniques in a given context to conduct research in this area.
• We identified the common datasets and metrics that have been used to validate the performance of defense mechanisms combating OSD attacks. From this comprehensive survey on datasets and metrics, we also provided useful research directions to enhance the validation and verification methods, which have not been discussed in other existing counterpart survey papers.
• We also comprehensively discussed key findings, insights and lessons learned, limitations, and future research directions based on the extensive survey conducted in this work.

E. PAPER STRUCTURE
The rest of this paper is structured as follows:
• In Section II, we surveyed the multidisciplinary concepts of ‘deception’ along with the goals of deception. In addition, we compared different types of deception in the spectrum of deception in terms of intent and detectability. Further, we discussed the key properties of deception.
• In Section III, we discussed various types of OSD attacks in terms of false information, luring and phishing, fake identity, crowdturfing, and human targeted attacks. Following the major OSD types, the comparisons between social network attacks, social deception attacks, and cybercrimes are discussed. We also discussed the security breaches by OSD attacks based on the traditional CIA (confidentiality, integrity, and availability) security goals.
• In Section IV, we addressed various cues of social deception, in terms of individual, cultural, linguistic, physiological, psychological, and technological social deception cues. In addition, we discussed the relationships between offline and online social deception cues, mainly identifying their commonalities and differences.
• In Section V, we discussed five different types of key factors that affect susceptibility to online social deception, including demographic, personality, cultural, social and economic, and network structure feature-based factors.
• In Section VI, we surveyed two existing prevention mechanisms against OSD attacks, namely, data-driven analysis and social honeypots. Although social honeypots are used for both intrusion prevention and intrusion detection, we include them under the intrusion prevention mechanisms to preserve their original design purpose as a proactive intrusion prevention mechanism.
• In Section VII, we comprehensively surveyed three existing detection mechanisms against OSD attacks, namely, user profile-based, message content-based, and network structure feature-based. Each class of detection mechanisms is discussed in terms of attack type, key methods, features, and datasets used.
• In Section VIII, we discussed several existing approaches of response mechanisms to detected OSD attacks in terms of mitigation or recovery from OSD attacks.
• In Section IX, we discussed the datasets and metrics used for the validation and verification of defense mechanisms against OSD attacks.
• In Section X, since OSD research involves humans and their behaviors, we discussed ethical issues associated with conducting OSD research.
• In Section XI, based on the comprehensive survey conducted on OSD attacks and their countermeasures, we provided insights and lessons learned along with the limitations of the state-of-the-art OSD research.
• In Section XII, we provided concluding remarks and discussed future research directions in this area.

II. CONCEPTS AND CHARACTERISTICS OF DECEPTION
The concept of deception is highly multidisciplinary and has been studied in various domains. In this section, we discuss the root definitions of deception and the fundamental properties of deception which have been applied in launching OSD attacks on OSN platforms.


A. MULTIDISCIPLINARY CONCEPT OF DECEPTION
Let us start by looking at the dictionary definition of deception [37]. Deception is defined as: ‘‘To cause to believe what is false.’’ However, this definition is too broad, and many deception researchers have raised doubts about it. In the literature, the concept of deception has been discussed from different perspectives under different disciplines. We briefly discuss how each discipline has studied deception in the following sections.

1) PHILOSOPHY
In Philosophy, intentional and unintentional (by mistake) deception has been discussed, such as ‘inadvertent or mistaken deceiving’ [19]. However, the common concept of deception is mostly agreed to be ‘misleading a belief,’ whether inadvertently or mistakenly [59], [160]. The core aspect of deception in Philosophy lies in an intentional act that misleads an entity into a false belief.

2) BEHAVIORAL SCIENCE
Behavioral scientists (we consider biologists, ecologists, neuroscientists, and medical scientists as ‘behavioral scientists’ in this work) investigated the concept of deception and its process in the behaviors of animals or humans. Two main concepts of deception are: (1) functional deception, in which an individual’s behavior (i.e., a signal) misleads the actions of others; and (2) intentional deception, in which intentional states, such as beliefs and/or desires, guide an individual’s behavior, leading to the misrepresentation of belief states [73], [104], [176].

3) PSYCHOLOGY
Psychologists defined deception as a behavior providing information to mislead subjects in some direction [3] or an explicit misrepresentation of a fact aiming to mislead subjects [81], [134]. Major psychological deception studies focused on identifying cues to committing a crime [63], psychological symptoms of self-deception [20], [75], individual differences and/or cues to deception [157], and verbal or non-verbal communication cues [235].

4) SOCIOLOGY
Sociological deception research has mainly studied the effect of deception in various social contexts, on both positive and negative aspects [123], or deception as a relational or marketing strategy [150].

5) PUBLIC RELATIONS
In this domain, the concept of self-deception has been studied as a strategic solution to resolve internal or external crises [168]. The external role of self-deception is described as a way to avoid disastrous impact on an organization [143] by attributing a problem (or guilt) to an individual or victim.

6) COMMUNICATIONS OR LINGUISTICS
In this domain, deception research often aimed to identify either verbal or non-verbal indicators of deceptive communications. Interpersonal deception theory (IDT) views deception as an interactive process between senders and receivers, exchanging non-verbal and verbal behaviors and interpreting their communicative meanings. IDT further explains that deceivers strategically manage their verbal communications to successfully deceive receivers [15], [16]. Experimental studies showed that deceivers produced more words, fewer self-oriented words (e.g., I, me, my), and more sense-based words (e.g., seeing, touching) than truth-tellers [72].

7) COMMAND AND CONTROL
In the military domain, deception refers to any planned maneuvers undertaken to reveal false information and hide the truth from an enemy, with the purpose of misleading the enemy and enticing the enemy to undertake the wrong operations [29], [124], [210]. Military deception involves a large number of individuals or organizations as both deceivers and victims and takes place over a long time period [29].

8) COMPUTING AND ENGINEERING
Deceptive behaviors have been popularly exhibited by cyber attackers in various forms, such as phishing, social engineering attacks, fraud advertisements, stealthy attacks, and so forth [74], [154]. In addition, as the threat of phishing emails increases, an individual online user’s susceptibility to phishing attacks has been studied in terms of demographics [114], [141], [171] or personality traits [30], [48], [70], [71], [128], [148], [149]. We discuss the details of susceptibility to OSD attacks in Section V. In addition, a number of detection mechanisms against OSD attacks have been developed in the literature. We discuss them in more detail in Section VII as well.

For easy grasping of the key multidisciplinary concepts of deception, we summarize the key deception concepts under different disciplines in FIGURE 1.

B. TYPES OF DECEPTION
Although deception can be intentional or unintentional, we focus on intentional deception in this work, which is more related to an attacker’s intent. Intentional deception consists of deception with malicious intent and deception with non-malicious intent for a deceiver’s interest [47].
The goals of malicious deception include:
• Financial benefit: Many deceptive behaviors have the purpose of obtaining a monetary benefit. Financial benefit is a common reason for an individual’s online deceptive behavior. For example, a spammer can be paid for clicking advertisements by attracting online traffic to specific sites [133]. Malicious users spread phishing links to collect credentials from victims [194].
• Manipulation of public opinions: In social media, social and political bots play a role in influencing public opinions [57]. Malicious bots spread spam and phishing links. Politicians and governments worldwide have been using such bots to manipulate public opinions.
• Cooperative deception: Cooperation is a strategy of balancing costs and benefits and maintaining stakeholder relationships in the deception or cooperation interactions with opponents [183], often used in public relations.
• Parasitism [168]: This refers to ‘false framing of responsibility,’ which can be easily used as a strategy to solve complicated issues without introducing long-term investigations that may cause structural changes.
The goals of non-malicious deception are commonly discussed as follows:
• Privacy protection: Deception can be used as a defense for privacy protection at the organization level or the individual level. This is also called defensive deception. There are a few methods for individual-level privacy protection in cyberspace. Some privacy techniques add noise to a user’s data for protection against attackers [151] because the data can be modified before being published.
• Self-presentation: People use fake presentation to present themselves with certain roles or intents [164]. Self-presentation is an activity to impress others, for both liars and truth tellers. Self-presentation is one way of understanding nonverbal communication [35] and can be used as a prediction cue to deception [35].
• Self-deception: This is to hide true information reflecting the conscious mind unconsciously [183], with the two main benefits of not being detected easily and reducing immediate cognitive costs.
In TABLE 2, we summarize which types of social deception are malicious or not and how they are associated with a breach of security goals.


TABLE 2. Goal, intent, and security breach according to a different type of social deception.


FIGURE 1. The key multidisciplinary concepts of deception.

C. TAXONOMIES AND SPECTRUM OF DECEPTION
This section discusses the related concepts and spectrum of deception. Deception can be defined and explained by a set of related terminologies whose concepts should be defined and compared. Deception exists in our daily life in both verbal and nonverbal forms. Deception ranges over a wide spectrum with varying intent and detectability (i.e., the extent to which deception can be detected).

1) KEY TAXONOMIES OF DECEPTION
In this section, we discuss a set of terminologies related to deception. The most common concepts are defined in the dictionary and discussed in the cybersecurity literature [20], [35], [37], [158], [168].
• Deceivee [158]: The victim of a deception.
• Deceiver [158]: The perpetrator of a deception.
• Susceptibility [37]: Likelihood of being deceived.
• Exploitation [37]: The use of resources and benefit from them (e.g., damage to systems) by attackers.
• Self-deception [20]: A conscious false belief held with a conflicting unconscious true belief.
• Trust [37]: Reliance on the confidentiality and integrity of other sources and with confidence. Earning high trust from a deceivee can be easily exploited by a deceiver.
• Lying [35], [158]: Deliberate verbal deception. People often lie in pursuit of material gain, personal convenience, or escaping from punishment.
• White lying [168]: A lighthearted type of deception within normal standards.
• Belief [37]: A truth in somebody’s mind; a truth basis.
• Misbelief [37]: A misplaced belief (i.e., mistakenly believing in false information).
• Perception [37]: The state of being aware of something through the senses.


2) SPECTRUM OF DECEPTION
In daily life and social networks, deception spans a spectrum of verbal and non-verbal behaviors. This section lists a few of the various deceptions based on [45], [158], [173].
• White lies [158]: Harmless lies to avoid hurting others’ feelings and smooth relationships.
• Humorous lies [173]: Jokes that are obvious lies, such as practical jokes.
• Altruistic lies [158]: Good lies for protecting others, such as for preventing children from worrying.
• Defensive lies [158]: Lies to protect the deceiver, such as lies to get rid of repeated telemarketers.
• Aggressive lies [158]: Lies to deceive others for the benefit of the deceivers.
• Pathological lies [158]: Lies by a deceiver with a psychological disorder.
• Nonverbal minimization [45]: Understating an important case in nonverbal camouflage.
• Nonverbal exaggeration [45]: Overstating an important case to hide others.
• Nonverbal neutralization [45]: Intentionally hiding normal emotions when inquired about emotional things.
• Nonverbal substitution [158]: Intentionally changing a sensitive concept to a less sensitive one.
• Self-deception [158]: Pushing of a reality into the subconsciousness.

FIGURE 2. The spectrum of deception based on the extent of detectability of deception (x-axis); and the extent of good/bad intent of deception and no intent (y-axis).

FIGURE 2 represents the spectrum of deception from the lowest detectability to the highest detectability and from the lowest bad intent (good intent) to no intent and to the highest bad intent. In general, deceptions with lower detectability come more with good intent, such as altruistic lies and white lies. Nonverbal deception is usually associated with bad intent and can be detected by professionals. Those behaviors can also be used as cues to detect lies. Deceptions with neutral intent can also be easily detected. These concepts can be applicable to detecting malicious behaviors in online social networks, as many offline human behaviors are also easily observed in online user behaviors.

D. PROPERTIES OF DECEPTION
Via the in-depth literature review, we observe the following unique key properties of deception:
• Misleading one’s belief: Regardless of intent, deception can mislead one toward a belief which is actually false. Since deception as an action induces confusion or false information, false beliefs may be formed regardless of its intent or outcome.
• Impact by deception: Confusion or misbelief introduced by deception brings an outcome which can be negative or positive based on its original intent or its proper execution. However, when deception with a certain intent is not properly executed as planned or is used mistakenly, the outcome as its impact may not be predictable, resulting in high uncertainty (e.g., an uncertain outcome). Hence, if deception is intended, it should be planned with multiple scenarios to lower the risk introduced by deception from a deceiver’s perspective.
• Success only by a deceivee’s cooperation: For deception to be successful, a deceivee should be deceived by the deception. Even if deception is performed, if the deceivee detects the deception, no effect is introduced.
• Action as a strategy: Deception can be used as a strategy to deal with situations with conflicts. The aim of intentional deception is to mislead a target entity’s belief and make the target choose a suboptimal (or poor) action that can be beneficial for the deceiver.
• Signals as deception cues: When deception is used, even if it can be very subtle, there exist some signals. Well-known deception strategies are to increase uncertainty (e.g., no signal increases uncertainty) or mislead one’s belief (e.g., a false signal leads to false beliefs). Although both deception techniques aim to make a deceivee choose a wrong decision, if deception by misleading with a false signal is detected, this provides more information about a deceiver to a deceivee than providing no signal.
Investigating the key properties of deception is critical in developing defense mechanisms to combat OSD attacks, as these properties distinguish deception-based attacks from other common OSN attacks.
In this paper, we also discuss a variety of cues and susceptibility traits of social deception behaviors across online and offline platforms. Thanks to the fast advances of social media and OSN technologies, many offline deception characteristics tend to be easily observed even in online deception behaviors. However, due to the limited real-time interactions for feeling people’s presence in online platforms with the current state-of-the-art SNSs and social media technologies, some physiological or psychological cues may not be applicable in detecting online social deception. In addition, upon the detection of the deception, a deceiver can easily get out of the online situation while a deceivee can easily lose track of the deceiver. We now look into the various types of OSD behaviors currently studied in the literature.


TABLE 3. Classification of online social deception attacks.

FIGURE 3. The number of works that studied different types of online social deception attacks based on five classes of online social deception. All surveyed works are summarized in TABLES 5–7.

III. TYPES OF ONLINE SOCIAL DECEPTION ATTACKS
Various types of OSD attacks have been discussed in the literature. In this section, we first classify the various types of OSD attacks into five classes based on the key intent of each attack class. In addition, since existing similar studies have used ‘online social network attacks’ and ‘cybercrime’ to discuss OSD, we discuss our view on how they are distinguished from and related to each other. All the OSD types are summarized in TABLE 3, and the corresponding work count for each OSD type is illustrated in FIGURE 3. Lastly, we discuss how OSD attacks breach the security goals of the CIA triad and safety, with the aim of alerting readers to how serious OSD can be as a societal problem.

A. FALSE INFORMATION
False information on the web and social media can be classified as misinformation and disinformation. Misinformation can be considered as ‘deception without intent,’ which mistakenly misleads people’s beliefs due to the false information propagated. Disinformation can be categorized as ‘deception with intent,’ aiming to mislead people’s beliefs. False information can also be categorized as opinion-based vs. fact-based. Opinion-based false information is propagated without ground truth. On the other hand, fact-based false information can mislead people’s beliefs due to the fraud from ground truth, such as hoaxes and fake news in social media [86].


Although no formally accepted terminologies exist to distinguish different kinds of false information, we follow Jiang and Wilson [86]’s two criteria, which are veracity and intentionality [172], to discuss false information as below:
• Fake News: Fake news caused by serious fabrications or large-scale hoaxes [159] has spread widely via OSNs since the beginning of the 2016 US presidential election cycle. Flintham et al. [55] reported that two thirds of survey respondents accessed news via Facebook. Facebook and Twitter have banned thousands of pages identified as major culprits of generating and promoting misinformation [86]. Fact-checking of news articles from different sources has become a common means to determine the veracity of social media posts. Vosoughi et al. [200] found that fake news spreads faster than truthful news. The time lag between fake news and fact-checking by fact-checking websites is 10-20 hours [170].
• Rumors: Vosoughi et al. [199] defined a rumor as an unverified assertion that starts from one or more sources and spreads over time from one user to another in a network. A rumor can be validated as true or false via real-time verification in Twitter or remain unresolved.
• Information Manipulation: One of the causes of information manipulation is opportunistic disinformation [34]. This means false information is deliberately and often covertly spread (e.g., planting a rumor) in order to influence public opinions or obscure the truth. Malicious users propagate opportunistic disinformation mainly for financial interest or political purposes.
• Deceptive Online Comments or Fake Reviews: Malicious users write fake reviews, opinions, or comments in social media to mislead other users. Usually, fake reviews are classified as opinion-based false information [98]. Social bots are often used for automatically generating fake reviews [224].

B. LURING
Luring has been commonly used as one of the popular deception strategies. The most common luring techniques in online worlds include:
• Spamming: Social media platform users can receive unsolicited messages (spam) that range from advertising to phishing messages [154]. Malicious users usually send spam messages in bulk to influence many legitimate users.
• Phishing: Online phishing attacks, such as phishing webpages or phishing emails, are one type of cybercrime that can lure users into revealing sensitive or credential information and steal private or financial information through social engineering attacks [40] or other fraudulent, illegal activities [1]. These malicious activities can cause severe economic losses and threaten the credibility and financial security of OSN users.

C. FAKE IDENTITY
Attacks using fake identity have their basis in social deception and include:
• Fake Profile: In OSNs, attackers create a huge number of fake identities for their own benefit, which is also called a Sybil attack. For example, attackers can leak out other users’ personal information, such as e-mail and physical addresses, date of birth, and employment data. Identity theft can take financial interests as well as access photographs of the friends of the victims [69].
• Profile Cloning: Attackers can secretly create a duplicate of an existing user profile on the same or different social media platforms. Since the cloned profile resembles the current profile, attackers can utilize the friend relationship to deceive and send friend requests to the contacts of the cloned user. By constructing a trust relationship with a potential victim user, the attacker can steal sensitive data from the user’s friends. Profile cloning has exposed severe societal threats because attackers can commit more serious cybercrimes, such as cyberbullying, cyberstalking, and blackmail, which can introduce physical threats to potential victims [154].
• Compromised Accounts: Legitimate user accounts can be hacked and compromised by attackers [44]. Unlike Sybil accounts, compromised accounts are originally maintained by real users with normal social network usage history and have established social connections with other legitimate users.

D. CROWDTURFING
Malicious, paid human workers can perform malicious behaviors to achieve their employer’s goal. This is called crowdturfing. For example, participants of an astroturfing campaign are organized by crowdsourcing systems [205]. Crowdturfing gathers crowdturfing workers and spreads fake information to mislead people’s beliefs and/or public opinions in social media. Crowdturfing activities in social media exploit social networking platforms (e.g., instant message groups, microblogs, blogs, or online forums) as the main information channel of the campaign [218]. Crowdturfing in social media usually involves spreading malicious URLs, forming astroturf campaigns, and manipulating public opinions. It is usually challenging to detect crowdturfing accounts because their social media accounts are mixed with normal posts as a camouflage.

Chinese crowdsourcing sites [205] and Western sites [110] have been studied for the analysis of crowdturfing in campaigns. Three classes of crowdturfers (i.e., professional users, casual users, and middlemen) are identified in Twitter networks. In addition, their profiles, activities, and linguistic characteristics have been analyzed to detect crowdturfing workers [109]. Machine learning (ML)-based crowdturfing detection mechanisms have been considered in Wang et al. [206]; a minimal illustration of such a feature-based classifier is sketched below. Two common types of adversarial attacks against such detectors are evasion attacks (i.e., attacks changing behavioral features) and poisoning attacks (i.e., administrators polluting training data) [206].
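The surveyed ML-based crowdturfing detectors are, at their core, classifiers over profile, activity, and linguistic statistics. The snippet below is a minimal sketch of that idea using scikit-learn; the feature set, synthetic data, and labels are illustrative assumptions rather than the actual features or datasets of the cited works.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500  # accounts per class (synthetic)

# Hypothetical per-account features: posts per day, follower/following ratio,
# fraction of posts containing URLs, mean pairwise similarity of post texts.
legit = np.column_stack([rng.normal(3, 1, n), rng.normal(1.0, 0.3, n),
                         rng.uniform(0.0, 0.3, n), rng.uniform(0.0, 0.4, n)])
turf = np.column_stack([rng.normal(40, 10, n), rng.normal(0.2, 0.1, n),
                        rng.uniform(0.4, 0.9, n), rng.uniform(0.5, 1.0, n)])
X = np.vstack([legit, turf])
y = np.array([0] * n + [1] * n)  # 1 = crowdturfing worker (synthetic label)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te),
                            target_names=["legitimate", "crowdturfing"]))
```

In this framing, an evasion attack corresponds to a worker deliberately moving features such as posting rate toward the legitimate region at test time, while a poisoning attack corresponds to corrupting the training rows themselves.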



E. HUMAN TARGETED ATTACKS
Advanced online platforms have provided efficient tools for human targeted criminals to achieve their goals. These cybercriminals start their crime by establishing trust relationships with potential victims. Since this implies that human targeted attacks start from social deception [76], we included the human targeted attacks as one of the OSD classes considered in this survey. The common human targeted OSD attacks include:
• Human Trafficking: Offline, traditional human trafficking means traffickers kidnap victims (mostly women and children) for trading, with the purpose of labor exploitation and sex trafficking [51]. Cyber-trafficking means that traffickers leverage cyber platforms to efficiently traffic a great number of victims by using advertising services across geographic boundaries [66], [105].
• Cyberbullying: In this attack, an attacker commits deliberate and repetitive online harassment of someone, especially adolescents [154]. Cyberbullying causes serious fear and harm to the victims through online platforms involving deception, public humiliation, malice, and unwanted contact [43].
• Cybergrooming: In this attack, adult criminals attempt to establish trust relationships with potential victims, mostly female children, using online social media platforms. Their intent is to have improper sexual relationships with them or produce child pornography products [154], [226].
• Cyberstalking: Malicious users can exploit legitimate users’ online information and harass them by stalking [154]. Without proper security protection of private information, individual users can expose their private information (e.g., phone number, home address, work location, etc.) on social media platforms without awareness.

F. RELATIONSHIPS BETWEEN ONLINE SOCIAL DECEPTION, SOCIAL NETWORK ATTACKS, AND CYBERCRIMES
Social network attacks, including traditional threats, social threats, and multimedia content threats, are the general security threats concerned in the literature [154]. These security and privacy threats include all the detrimental activities with malicious intent. Social deception is part of social network attacks, as shown in FIGURE 4, because social deception attacks can only be successful when the victims are deceived, from the attacker’s perspective.

Four types of social network attacks are considered OSD attacks: unsolicited fake information attacks, identity attacks, crowdturfing, and human targeted attacks. The specific types of attacks were described earlier in this section. Some OSD attacks, such as personal and confidential information leaks or identity theft, have been treated as cybercrimes [139] since the early 2000’s. The advanced features of social network service technologies further facilitated the significant increase of serious, sophisticated cybercrimes, such as human trafficking, online consumer fraud, identity cloning, hacking, child pornography, and/or online stalking [192].

FIGURE 4. The relationships between OSN attacks, social deception, and cybercrime.

FIGURE 4 illustrates the relationships between OSN attacks, OSD attacks, and cybercrimes. Although cybercrimes are considered the most serious cyberattacks, we can observe that many attacks overlap with each other. OSD attacks overlap with either OSN attacks or cybercrimes or both. Cybercrimes such as consumer fraud, cryptojacking, enterprise ransomware, supply chain attacks, and malicious email attacks [179] fall into a separate group because these attacks spread over the Internet, which is much broader than OSN platforms. There are no explicit guidelines on whether certain OSN attacks or threats are illegal, or whether threats are illegal but their impact may not be direct. For example, when a user’s data privacy (or integrity) is breached but no actual loss is found, it is hard to predict whether there are future security concerns.

Although cybercriminals have caused serious adverse effects on society and individuals, 44% of the victims reported to the police [62]. Victims’ reporting is a beneficial practice to increase the awareness of the communities to defend against potential cybercrimes. Victims may report not only to the police, but also to the corporation in an active dialogue environment, or share the victim stories with families and close friends [62]. Cybercriminal profiling is highly challenging compared to profiling traditional criminals because cybercriminals can easily leave the platforms. However, it is very beneficial to identify common characteristics of cybercriminals [139], which is useful for their early detection. Profiling can follow the procedure in the Behavioral Evidence Analysis [190]. Since most cybercrime victims are corporations and/or their customers, corporations can predict the potential insider criminals more intelligently with the help of cybercriminal profiling [139].


TABLE 4. Impact of online social deception attacks on loss of security goals and safety.

G. EFFECT OF ONLINE SOCIAL DECEPTION ATTACKS ON SECURITY GOALS AND SAFETY
The CIA triad security goals play a major role in the information security practice. With the growth of socio-technical security issues, the original CIA triad has been expanded with more specialized aspects, such as authentication and non-repudiation [122]. However, these still have limitations in covering the wider organizational and social aspects of security beyond systems and data [163]. OSN security has three levels of security goals: network-level, account-level, and message-level. Achieving the CIA security goals can contribute to all social network security levels. In addition to the three security goals, we also add another goal, which is safety. A person and other non-information based assets also need to be protected in the cyber security practice [197]. For example, cyberbullying can cause direct physical harm to a victim even if there is no loss of information confidentiality, integrity, or availability [197]. Therefore, we included human safety as a non-information security goal. For readers’ convenience, we summarize how OSD attacks can breach security goals and safety in TABLE 4.

IV. CUES OF SOCIAL DECEPTION
In this section, we discuss various cues of social deception offline and online, so that we can investigate how offline deception cues can be applicable to online deception. In addition, we aim to deliver insights on how the estimates of those deception cues can provide the key predictors for detecting online social deception.

A. INDIVIDUAL DECEPTION CUES
Riggio and Friedman [157] studied correlations between individual types and behavioral patterns and found that individuals vary systematically in displaying certain behavioral cues; individual traits (e.g., dominance, a social skills measure) are correlated with facial animation behavior. Certain types of individuals can control the display of cues to increase the likelihood of deception. Kraut and Poe [97] found that occupational status and age were the top predictors of social deception.

B. CULTURAL DECEPTION CUES
Lewis and George [111] showed that individuals from collectivistic cultures were more apt to employ deception in business negotiation than those from individualistic cultures. Heine [75] discussed self-enhancement in Western people, where self-enhancement refers to a motivation to make a person feel positive about himself/herself with high self-esteem [167]. Bond et al. [14] showed that, in lying settings, Jordanians displayed more behavioral cues than Americans in terms of eye contact and filled pauses.

C. LINGUISTIC DECEPTION CUES
Linguistic or communicative cues exhibiting deception in communications have been studied. Linguistic profiles are studied in deceptive communication, the choice and use of languages, and linguistic patterns in deceptive messages [15], [16]. Example linguistic deception cues include the use of a larger word quantity [72], [132], third-person pronoun use [182], the use of emotion words, and markers of cognitive complexity (i.e., lying requires a less complex cognitive process) [152].

D. PHYSIOLOGICAL DECEPTION CUES
Physiological or behavioral cues are the emotions that liars express while deceiving, because they are indicators of guilt [35]. In the studies of behavioral cues to deception [35] and physiological cues to identifying deception [201], liars may show at least one of emotions, content complexity, and attempted control phenomena. Examples of behavioral cues include fewer blinks or decreased hand and finger movement due to increased cognitive load [201], [202], [204], higher-pitched voices and faster speech [35], or displacement activities (e.g., high anxiety or conscious deception) [184].

E. PSYCHOLOGICAL DECEPTION CUES
Psychological or cognitive cues include nonverbal anxiety responses that are consciously revealed in intentional deception [94]. Mitchell [124] described the mental process of deception from a social cognitive perspective based on children’s verbal deception and nonverbal deception in sports. Knapp et al. [94] used controlled lab settings to determine the characteristics of intentional deception with verbal and nonverbal cues. Example psychological cues include increased cognitive load [183], [201], [202], nervousness [35], [183], [201], or controlled behavior [183], [201]. Trivers [183] emphasized nervousness, control, and cognitive load as three key deception cues. In addition, other anxiety responses are discussed [94]. Deceivers tend to exhibit cognitive cues, such as more uncertainty, vagueness, nervousness, reticence, dependence, and/or unpleasantness as a negative effect.

F. TECHNOLOGICAL DECEPTION CUES
Ferrara et al. [52] discussed the impact and detection of social bots, which are the outcome of abusing new technologies.


Social bots with malicious intent have caused several levels of damage to society. Early bots automatically posted content and could be spotted by the cue of a high volume of content generation. Several social honeypot approaches attracted social bot followers with carefully designed bots and analyzed the technological cues of social bots. However, sophisticated social bots are becoming more intelligent and tend to mimic human-like behaviors, making it hard to detect them. The advanced detection strategy leverages the technological cues from the social graph structure, such as densely connected communities, and behavioral patterns. The proposed behavioral signature contains classes of features including network, user, friends, timing, content, and sentiment [52].

G. RELATIONSHIPS BETWEEN DECEPTION CUES OF OFFLINE AND ONLINE PLATFORMS
Via the in-depth survey of deception cues, we identified the commonalities and differences between online and offline deceptive behaviors as below.

1) COMMONALITIES BETWEEN ONLINE AND OFFLINE DECEPTIVE BEHAVIORS
Deception usually spreads via communication between deceivers and deceivees. Online media platforms support chat-based or synchronous communications similar to traditional face-to-face chatting or interviews [187]. Interpersonal deception theory [16] discusses several verbal and non-verbal deception cues for traditional offline communications. Most of the verbal deception cues (e.g., linguistic cues) are relevant to both offline and online deception [36]. Messages and posts are the main source of online information, so the linguistic cues are the most useful cues for online deception [230]. These days, online platforms also provide face-to-face chatting. Although it is limited to some extent, some physiological cues and/or body movement can be captured.

2) DIFFERENCES BETWEEN ONLINE AND OFFLINE DECEPTIVE BEHAVIORS
Although face-to-face social media platforms make people feel much closer to each other by delivering body movement and facial expressions, some physiological cues or subtle behavioral changes may not be captured as in face-to-face interactions [187]. In addition, typing behavior (e.g., response time and the number of edits) in online chatting has been studied as a cue of online deception [36], which is not often observed in offline interactions (a small sketch of turning such linguistic and typing cues into features is given at the end of this section). In addition, online behaviors are known to differ from offline behaviors in their motivations and attitudes [33].
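As a concrete illustration of how the verbal cues from Section IV-C and the typing cues noted above can be turned into features for online deception detection, the following is a minimal sketch. The word lists, the specific feature set, and the example message are illustrative assumptions, not those used by the cited studies.

```python
import re

SELF_WORDS = {"i", "me", "my", "mine", "myself"}  # self-oriented words
SENSE_WORDS = {"see", "saw", "seeing", "hear", "heard", "touch", "touched", "feel", "felt"}

def deception_cue_features(message: str, response_time_s: float, num_edits: int) -> dict:
    """Extract simple deception-cue features from one chat message.

    Deceivers were reported to produce more words, fewer self-oriented words,
    and more sense-based words than truth-tellers; typing behavior such as
    response time and number of edits has also been studied as an online cue.
    """
    tokens = re.findall(r"[a-z']+", message.lower())
    n = max(len(tokens), 1)
    return {
        "word_count": len(tokens),
        "self_word_rate": sum(t in SELF_WORDS for t in tokens) / n,
        "sense_word_rate": sum(t in SENSE_WORDS for t in tokens) / n,
        "response_time_s": response_time_s,
        "num_edits": num_edits,
    }

print(deception_cue_features("I saw him there, you can trust me on this", 8.2, 3))
```

Feature vectors of this kind are typically fed into a standard classifier by the message content-based detectors surveyed later in this paper.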

V. SUSCEPTIBILITIES TO ONLINE SOCIAL DECEPTION
Attackers aim to achieve their attack goals as efficiently as possible with minimum cost. To this end, the attackers may target people who are highly susceptible to OSD attacks. In this section, we discuss various types of susceptibility traits to OSD attacks in order to help researchers develop protection tools for susceptible users in OSNs.

A. INDIVIDUAL OR SOCIETY-BASED SUSCEPTIBLE FACTORS
Demographic factors have been studied to investigate susceptibility to OSD attacks. Young age groups between 18 and 25 are known to be more susceptible to phishing than other age groups [171]. Young children were also identified as key potential victims of cybergrooming [12], [214]. Women are found to be more susceptible to phishing than men [171]. In particular, older women were found to be the most vulnerable population to phishing [114], [141]. People’s risk perception capabilities and knowledge about risk are shown to be key factors in preventing online deception [64], [118], [215], [220].

Personality traits have been studied to investigate their impact on susceptibility to scams or phishing attacks [30], [48], [70], [71], [128], [148], [149] using the Big Five personality traits model [189]. However, due to sample bias and the lack of subjects covering a wide range of personality traits, the findings are not generalizable. In order to overcome the issues of limited sampling, Cho et al. [25] developed a mathematical model based on Stochastic Petri Nets to investigate the effect of user personality traits on phishing susceptibility. Ding et al. [39] classified phishing emails in terms of their corresponding target victims based on personality traits. Albladi and Weir [6] also studied a user’s susceptibility to social engineering attacks by proposing a user-centric framework considering socio-psychological, habitual, socio-emotional, and perceptual user attributes.

Cultural factors have been studied as factors that influence susceptibility to OSD attacks. A well-known classification of cultural values is Hofstede’s two cultural dimensions [77]: individualism vs. collectivism. In an individualistic culture, individuals are loosely tied to one another, and a sense of ‘I’ and an individual’s ‘privacy’ are valued. On the other hand, in a collectivistic culture, individuals are tightly connected, emphasizing ‘we-ness’ and ‘belonging’ to each other. Since culture has been studied as a key factor impacting trust in a society, where trust affects deceptive behavior, existing studies have also looked at how culture influences deception.

Social and economic factors have also been studied as factors affecting susceptibility to OSD attacks. Vulnerable status on the socio-economic ladder in the offline world seems to be transferable to the online world. For example, low education and/or income may influence the level of knowledge and awareness about online social deception (or phishing) or related threats [90], [181]. However, there is a lack of empirical evidence to support the relationships between individual characteristics related to social and economic status [90].

B. ONLINE ACTIVITY-BASED SUSCEPTIBLE FACTORS
Wagner et al. [203] found that a user’s out-degree is a key network feature social bots can target in selecting their victims, since a higher out-degree in OSNs means a user has more friends. Susceptible users tend to be more active (e.g., retweet, mention, follow, or reply) in the Twitter network and interact with more users, but their communication is mainly for conversational purposes rather than informational purposes.

VOLUME 9, 2021 1781 Z. Guo et al.: Online Social Deception and Its Countermeasures: A Survey

retweet, mention, follow or reply) in the Twitter network and users’ consensus to ensure the trustworthiness of posts; interact with more users, but their communication is mainly (ii) identify a malicious user from the transaction record; for conversational purpose rather than informational purpose. and (iii) delete false information posts with a penalty Susceptible users tend to use more social words and show applied to the fake news attackers. In general, the mali- more affection. Similarly, in Facebook, susceptible users tend cious attackers are the normal users but normal users to more engage in posting activities with less restrictive pri- do not have write access to the blockchain. Only the vacy settings, naturally resulting in higher vulnerability to pri- information source from a group of publishers or a group vacy threats [70]. Social isolation (loneliness) and risk-taking of a social network is allowed to commit transactions to online behaviors are the indirect factors of vulnerable people, the blockchain. such as victims of cybergrooming [211], [213]. Albladi and • Phishing Prevention: Florêncio and Herley [56] pro- Weir [6] analyzed various user characteristics, such as a posed a low-delay phishing prevention method where level of involvement, for vulnerability of social engineering a client reports the reuse activities of user password in attacks. unknown websites and a server makes decisions and Engagement in social media is one of the most prominent updates the blocked list. Gupta and Pieprzyk [68] pro- attributes contributing to high susceptibility to social decep- posed a defense model to classify web-pages on a col- tion. Habitual use of social media measured by the size of laborative platform PhishTank. This defense model uses social network and time spent in social media increases the a plug-in method into a browser to check blacklisting likelihood of being victims for social attacks in OSNs [196]. and blocking lists. Highly active social network users can be more favorable • Identity Theft Prevention: Tsikerdekis [186] discussed targets for attackers as they have more exposures to social a proactive approach of identity deception prevention media and accomplish their attacks through the active users’ using social network data. Data in common contribution networks [6]. More use of social media is significantly asso- networks are used to establish a community’s behavioral ciated with a higher level of risks for sexual exploitation [12], profile. Malicious accounts can be barred before joining [214] and cyberbullying [41]. a community based on the deviation of user behaviors It is critical to look into what individual, cultural, network, from the community’s profile. or interaction traits introduce high susceptibilities to OSD • Cyberbullying Prevention: Dinakar et al. [38] proposed attacks because protecting highly susceptible users first can a dashboard reflective user interface in social network be the key to prevent the OSD attacks. However, there has platforms for both cyberbullying attackers and victims. been little work that developed protection tools for suscepti- The reflective user interface integrated notifications, ble users with high priority in the literature. action delay, and interactive education. Their user study revealed that the in-context dynamic help in the user VI. PREVENTION MECHANISMS OF ONLINE SOCIAL interface is effective for the end-users. 
DECEPTION Pros and Cons: Preventing OSD attacks needs assessment In this section, as proactive defense mechanisms, we dis- of users or information in order to determine whether to cuss two types of OSD prevention mechanisms: Data-driven allow the user or information can stay or be propagated in prevention mechanisms and social honeypots. The sur- a given OSN. However, the so-called trust assessment is not veyed OSD prevention research works are listed in clear. The key merit of the prevention mechanisms should TABLE 5. be how quickly false information or malicious users are detected. Otherwise, it is not distinguishable from detection A. DATA-DRIVEN PREVENTION MECHANISMS mechanisms. In addition, the effectiveness of the prevention Prevention mechanisms against OSD attacks have been little mechanisms is still measured by detection accuracy. There explored. We discuss several types of data-driven prevention should be more useful metrics that can capture the nature mechanisms that have been commonly used to deal with OSD of proactiveness of the prevention mechanisms. In addition, attacks as follows: no real-world implementation using the prevention mecha- nisms is considered, which limits applicability of the preven- • Fake News Prevention: Saad et al. [161] proposed a tion mechanisms as well. blockchain-based system to fight against fake news by recording a transaction in blockchain when posting a news article and applying authentication consensus of B. SOCIAL HONEYPOTS the record. The result was measured by an authentication Recently, the concept of good bots has appeared by creating indicator along with the post. In this design, when a social network avatars to identify malicious activities by user saw a post, the authentication indicator associated highly intelligent, sophisticated attacks, such as advanced with the post was shown as the status of verification: persistent attacks (APTs) [195]. Honeypots technology is not successful, failed or pending. This mechanism addressed new and has been popularly used in communication networks the following services for preventing fake news spread as a defensive deception to proactively deal with attackers by in the OSN: (i) Determine the authenticity of the news by luring them to honeypots for preventing them from accessing


TABLE 5. Online social deception prevention mechanisms.

The existing approaches using social honeypots have mainly focused on detecting social spammers, social bots [234], or malware [107], [108], [145]–[147], [177], [208] as a passive monitoring tool. These works use some profiles of attackers to detect them based on the features collected from the social honeypots placed as fake SNS accounts (e.g., Facebook or Twitter accounts).

Although the original purpose of social honeypots was to proactively prevent attackers from accessing system/network resources, they have been used as a complement to detect various OSN attacks. However, the original purpose of social honeypots lies in a proactive intrusion prevention mechanism. In addition, although social honeypots can be used as a detection tool for OSN or OSD attacks, their goal is early detection or mitigation based on a proactive defense in nature. Hence, we include social honeypots as prevention mechanisms against OSD attacks.

For social honeypots to be used as detection mechanisms, they are defined as information resources that monitor a spammer's behaviors and log their information (e.g., their profiles and contents in social networking communities) [107]. This early study detected deceptive spam profiles in MySpace and Twitter by social honeypot deployment. Based on the spammers they attracted, an SVM spam classifier was trained to distinguish spammers from legitimate users. An ML-based classifier was also developed to identify unknown spammers with high precision in two social network communities. Lee et al. [108] detected content polluters in Twitter by designing Twitter-based social honeypots. The 60 social honeypot accounts followed other social honeypot accounts and posted four types of tweets to each other. They grouped the harvested users into nine clusters via the Expectation-Maximization (EM) algorithm. They classified content polluters with Random Forest and improved the results by standard boosting and bagging and by different feature group combinations.
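To make the honeypot-to-classifier pipeline above concrete, the following minimal sketch (with entirely hypothetical feature names and synthetic data, not taken from any surveyed study) shows how features harvested by honeypot accounts could be fed to a supervised spam classifier in the spirit of the social honeypot work described above.

```python
# Minimal sketch (assumption-laden): classify honeypot-harvested accounts as
# spammers vs. legitimate users from simple profile/behavior features.
# Feature definitions and data are hypothetical, not from the surveyed papers.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)

def make_account(spammer):
    # features: [followers/following ratio, posts per day, URL ratio, account age in days]
    if spammer:
        return [rng.uniform(0.0, 0.3), rng.uniform(20, 100), rng.uniform(0.5, 1.0), rng.uniform(1, 60)]
    return [rng.uniform(0.5, 3.0), rng.uniform(0, 10), rng.uniform(0.0, 0.3), rng.uniform(100, 3000)]

X = np.array([make_account(i % 2 == 0) for i in range(400)])
y = np.array([1 if i % 2 == 0 else 0 for i in range(400)])  # 1 = spammer

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), target_names=["legitimate", "spammer"]))
```

In practice, the labels would come from accounts that interacted with deployed honeypot profiles rather than from synthetic draws, and the feature set would be far richer.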


Haddadi and Hui [69] focused on privacy and fake profiles by characterizing fake profiles and reducing the threat of identity theft. They set up social honeypots using fake identities of celebrities and ordinary people and analyzed the different behaviors (e.g., the number of friends, friend requests, and public/private messages) between those fake accounts. Stringhini et al. [177] studied 900 honey-profiles to detect spammers in three social network communities (MySpace, Facebook, and Twitter), where their honey-profiles had geographic networks. They collected activity data for a long time (i.e., one year). In addition, this work identified both spam profiles and spam campaigns based on the shared URLs.

Virvilis et al. [195] described the common characteristics of APT attackers and malicious insiders and discussed multiple deception techniques for early detection of sophisticated attackers. They created social network avatars for the attack preparation (information gathering) phase, along with fake DNS records and HTML comments. Zhu et al. [234] showed the analysis and simulation of infiltrating a social honeybot defense into botnets of social networks. The framework SODEXO (SOcial network Deception and EXploitation) had three components: HD, HE, and PAS. The HD deployed a moderate number of honeybots in the social network. The HE modeled the dynamics and utility optimization of honeybots and the botmaster with a Stackelberg game model. The results showed that a small number of honeybots could significantly decrease the infected population (i.e., a botnet) in a large social network.

Paradise et al. [145], [146] simulated defense account monitoring against attack strategies in OSNs. The attackers sent friend requests to some community members chosen by different attacker strategies. In addition, the attackers may have full knowledge of the defense strategies. The defender chose a set of accounts to monitor based on various criteria. They analyzed the acceptance rate, hit rate, number of friends before a hit, and monitoring cost between combinations of attackers and defenders. The result showed that, under sophisticated attackers with full knowledge of the defense strategies, defense using PageRank and the most connected profiles achieved the best detection with minimum cost. Paradise et al. [147] targeted detecting attackers in the reconnaissance stage of APT. The artificial social honeypot profiles were assimilated into an organizational social network (Xing and LinkedIn) and received the friend requests sent to organization employees. The authors analyzed the attacker profiles collected in the social honeypot.

Badri Satya et al. [9] collected so-called 'fake likers' on Facebook, who are paid workers propagating fake likes, using linkage and honeypot pages. The authors extracted four types of profile and behavior features and trained classifiers to detect the fake likers. The temporal features were cost-efficient compared to the previous research. They also evaluated the robustness of their work by modifying features using an individual attack model and a coordinated attack model. De Cristofaro et al. [31] studied paid 'like' fraud on Facebook, linking the campaigns to honeypot pages to collect data. They analyzed the page advertising and promotion activities. Nisrine et al. [135] discovered malicious profiles by social honeypots and used both a feature-based strategy and a honeypot feature-based strategy to collect data. Combining honeypot features can increase the ML accuracy and recall, compared to the scheme with traditional features only.

Zhu [232] defined ''active honeypots'' as active Twitter accounts which capture more than 10 new spammers every day, similar to spammer network hubs. They extracted 1,814 such accounts from the Twitter space and studied the properties and identification of active honeypots. Yang et al. [222] deployed passive social honeypots to capture spammers' preferences by designing social honeypots with various behaviors. The design considered tweet behavior (i.e., tweet frequency, tweet keywords, and tweet topics), following behaviors toward famous people's accounts, and application installation. They analyzed which type of social honeypot has the highest capture rate and designed advanced social honeypots based on their results. They demonstrated that the advanced honeypots can capture spammers 26 times faster than normal social honeypots.

Pros and Cons: Social honeypots can be highly effective, particularly when they are well deployed to attract targeted attackers. However, so far, the existing studies discussed above did not consider key, unique characteristics of vulnerable victim profiles when developing social honeypots. The effectiveness of existing social honeypots is evaluated based on intrusion detection accuracy rather than the coverage of attack types or the main attack types attracted to the social honeypots. Since an individual honeypot did not target a particular attack, it is not clear from the existing approaches what types of attackers are more attracted to certain characteristics of the social honeypots. In addition, developing social honeypots with fake accounts may introduce ethical issues because the use of the social honeypots itself is based on deceiving all other users as well.

VII. DETECTION MECHANISMS OF ONLINE SOCIAL DECEPTION
Most existing defense mechanisms against OSD attacks focus on detecting those attacks. We discuss those detection mechanisms based on three types: user profile-based, message content-based, and network feature-based.

A. USER PROFILE-BASED DECEPTION DETECTION MECHANISMS
Most profile cloning studies utilized the user profiles [91], [95], [169]. To identify cloned profiles, they calculated profile similarities using various methods based on user profile attributes. Kontaxis et al. [95] proposed three components to detect profile cloning: an information distiller, a profile hunter, and a profile verifier.


TABLE 6. Data-driven deception detection mechanisms.

The profile verifier component calculated the profile similarity score between the social profiles under test and the user's original profile. Both the information fields and the profile pictures contributed to estimating the profile similarity. Kamhoua et al. [91] detected user profiles across multiple OSNs using a supervised learning classifier. The method consists of three steps: profile information collection from a friend request, friend list identity verification, and the report of possible colluders. The binary classifier was based on both profile attribute similarity and friend list similarity. Shan et al. [169] simulated profile cloning attacks by snowball sampling and an iteration attack and then detected the attackers with a detector called 'CloneSpotter.' The context-free detection algorithm includes the profile information and friendship connections. The input features include recently used IPs, a friend list, and the profile and its similarity. A cloned profile was determined by the same IP prefix being used and the similarity exceeding a certain threshold.

User profile features and user behavior/activity features were extracted to detect malicious accounts [9], [17], [28], [113], [147], [175], [207] in Sybil attacks, fake reviews, or spamming attacks.
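As a rough illustration of the profile-similarity scoring used by the clone-detection studies above, the following sketch combines attribute overlap with friend-list Jaccard similarity; the attribute set, the weights, and the 0.6 threshold are our own assumptions, not values reported in the cited papers.

```python
# Minimal sketch (assumptions marked): score how similar a suspicious profile is
# to a user's original profile using attribute overlap plus friend-list Jaccard,
# then flag a potential clone above an illustrative threshold.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def clone_score(original, candidate, w_attr=0.6, w_friends=0.4):
    attrs = ["name", "location", "education", "employer", "photo_hash"]  # hypothetical attribute set
    matched = sum(1 for k in attrs if original.get(k) and original.get(k) == candidate.get(k))
    attr_sim = matched / len(attrs)
    friend_sim = jaccard(original.get("friends", []), candidate.get("friends", []))
    return w_attr * attr_sim + w_friends * friend_sim

original = {"name": "alice", "location": "oslo", "education": "uni-x", "employer": "acme",
            "photo_hash": "abc123", "friends": ["bob", "carol", "dan", "erin"]}
candidate = {"name": "alice", "location": "oslo", "education": "uni-x", "employer": "other",
             "photo_hash": "abc123", "friends": ["bob", "carol", "mallory"]}

score = clone_score(original, candidate)
print(f"similarity = {score:.2f}", "-> possible clone" if score > 0.6 else "-> likely benign")
```

A real detector would also weigh which attributes are most discriminative (e.g., a reused profile picture hash) and combine the score with network signals such as shared IP prefixes, as in the studies above.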


TABLE 6. (Continued.) Data-driven deception detection mechanisms.

Badri Satya et al. [9] studied feature engineering from the accounts of 'fake likers.' They considered profile features, such as the length of the user introduction, the longevity of an account, and the number of friends. Social activities represent a unique attribute observed in OSNs and comprise the behavior features of an account, such as sending friend requests, posting, retweeting, liking/disliking, and social attention [9]. More specific features under each activity category can be further extracted, such as the acceptance of a friend request from [147] and the average time interval between posts from [175]. Wang et al. [207] investigated several behavioral signatures of the output of crowdturfing campaigns and tasks. Cao and Caverlee [17] studied behavioral features to detect spam URLs in OSNs. They used fifteen click- and posting-based features in Random Forest classifiers and evaluated the top six features.

Cresci et al. [28] proposed a novel DNA-inspired social fingerprinting approach of behavioral modeling to detect spambot accounts. Twitter account behaviors were encoded as a string of behavioral units (e.g., tweet, reply, and retweet). This new model can deal with the new type of spambots which can be easily missed by most traditional tools.


Social fingerprinting sequences are characterized by the LCS curve. Spambots are related to high LCS values since they share suspiciously long behavioral patterns. The LCS curve from the behavioral model is used to detect more sophisticated types of crowdsourcing spammers.

User profiles and activities are the key features to detect OSD attacks (e.g., advanced spammers or crowdturfing), along with other content-based and graph-based features [82], [107]–[109], [199], [206]. We will discuss those hybrid detection examples in Section VII-D.

Pros and Cons: User profile information provides specific activity features and behaviors about each user. However, some profile information is private; thus, collecting private information is itself a violation of a user's privacy rights. In addition, even if the information itself is open to the public, how to use the information should be agreed upon with the owner of the information. Since each user enters his/her own profile information, a malicious user can easily enter fake information to make the self-presentation look attractive, which is a form of self-deception. Besides, collecting profile and behavioral data incurs high cost and/or time under the privacy protection of the social media platforms.

B. MESSAGE CONTENT-BASED DECEPTION DETECTION MECHANISMS
In TABLE 6, we showed that the majority of social deception detection approaches have used content-based features because the text of user posts and reviews can be easily collected and analyzed using existing linguistic models. The proliferation of social media and/or network applications has made numerous types of raw and advanced content features available. Topic modeling and sentiment-based features have been popularly utilized for the linguistic analysis of deceptive messages.

1) TOPIC MODELING-BASED DETECTION
Most of the work developed topic distributions by using Latent Dirichlet Allocation (LDA) [106], [115], [175], [178], [217]. If each user's posts are collected as a document, LDA generates the topic probability distribution of the user's document. Liu et al. [115] extended the topic features to two new features: a GOSS feature indicates a user's interests in specific topics compared to other users, while a LOSS feature indicates a user's interests in various topics. By adding those two topic-based features to classifiers, the averaged F1-score shows better performance. Swe and Myo [178] built a keyword ''blacklist'' to detect fake accounts by extracting topics from LDA and keywords from TF-IDF (term frequency-inverse document frequency) algorithms. The blacklist contributed 500 fake words. The number and ratio of fake words and a few other content-based features were extracted for their classifier. The result using the ''blacklist'' showed better accuracy than the traditional spam word list by reducing the false positive rate. Wu et al. [217] extracted the topic distribution over 18 topics for each message following the official Weibo topic categories. The probability of the 18 topics was used as one feature vector for the SVM classifier.

The LDA algorithm has been enhanced to detect cybercriminal accounts and spam. Lau et al. [106] developed a weakly supervised cybercriminal network mining method supported by a probabilistic generative model and a novel context-sensitive Gibbs sampling algorithm (CSLDA). The algorithm can extract semantically rich representations of latent concepts to predict transactional and collaborative relationships (e.g., a cybercriminal indicator) in publicly accessible messages posted on social media. Song et al. [175] used Labeled LDA (L-LDA) to indicate the probability of co-occurrence. The latent topics were normalized into topic-based features, which have distinct properties from TF-IDF-generated word-based features.

Golbeck et al. [61] detected two types of false article stories, fake news and satire, by themes and word vectors. They defined themes with a new codebook of 7 theme types, such as conspiracy theory and hyperbolic criticism. Multiple themes can be labelled on an article as a theme coding. The proposed classifier worked better for articles under certain types of themes.

Pros and Cons: The topic features can be easily obtained. However, there may be unique network features distinguishing attackers from normal users. That is, content-only features may not be able to capture features of dynamic interactions with other users, such as likes, friend acceptance, or the frequency of leaving comments or sharing. In addition, topic models are highly sensitive to datasets and may perform differently in detection accuracy depending on the dataset.
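A minimal sketch of the topic-feature pipeline described above (LDA topic distributions fed to an SVM) follows; the toy posts, labels, and the choice of five topics are illustrative assumptions rather than settings from the surveyed work.

```python
# Illustrative only: derive per-account topic distributions with LDA and feed
# them to an SVM, in the spirit of the topic-feature detectors discussed above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

user_docs = [  # each entry: all posts of one account concatenated (hypothetical)
    "win free prize click link now claim reward",
    "free followers click here limited offer",
    "great dinner with friends tonight lovely weather",
    "reading a good book and enjoying coffee this morning",
]
labels = [1, 1, 0, 0]  # 1 = deceptive/spam-like account, 0 = normal (toy labels)

model = make_pipeline(
    CountVectorizer(),                                          # bag-of-words counts
    LatentDirichletAllocation(n_components=5, random_state=0),  # topic distribution per document
    LinearSVC(),                                                # classifier over topic features
)
model.fit(user_docs, labels)
print(model.predict(["click the link to win a free reward"]))
```

With real data, the topic distributions would typically be concatenated with other content features (e.g., TF-IDF vectors or blacklist counts) before classification, as several of the studies above do.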


2) FEATURE-BASED DECEPTION DETECTION
TABLE 6 lists the feature sets used by the papers surveyed in this work. The commonly used features include raw features, such as word vectors, word embeddings, links, and URLs [119]. Advanced features include deep content features, statistics, LIWC features, and other metadata, such as location, source, or time [193]. Most ML-based models use supervised learning. Among the supervised models, random forest, SVM, Naïve Bayes, logistic regression, and k-nearest neighbors are the most favored classifiers for detection. Neural network models, such as Recurrent Neural Networks [224] and Convolutional Neural Networks with Long Short-Term Memory (CNN-LSTM) [223], are used for textual features. Temporal models, such as DTW and HMM [49], [199], are discussed in rumor detection. Boosting-based ensemble models are implemented for spammer detection [82], [223]. A few studies used semi-supervised models [82], [166] when a labeled dataset was not available.

Everett et al. [49] studied the veracity of automated online reviews provided by regular users. They used text generated by a second-order Markov chain model. The key findings include: (i) negative crowd-opinion reviews are more believable to humans; (ii) light-hearted topics are easier to deceive with than factual topics; and (iii) automated text on adult content is the most deceptive. Yao et al. [224] investigated attacks with fake Yelp restaurant reviews generated by an RNN model and an LSTM model. The model considers the reviews themselves only, without metadata such as reviewers. Similarity features, structural features, syntactic features, semantic features, and LIWC features were used in an SVM to compare the character-level distribution. They found that information loss was incurred in the process of generating fake reviews from RNN models, so the generated reviews can be detected against real reviews. Song et al. [174] detected crowdturfing targets and retweets from crowdturfing websites and black-market sites.

Pros and Cons: Feature-based models generate high accuracy and low false positive rates. The raw content features are easily obtainable, although the extraction of sophisticated features incurs high cost. However, the temporal pattern of messages influences the detection performance. The semantic analysis methods may ignore hidden messages and background knowledge and require tuning many input parameters, which leads to high complexity and labor intensity.

3) SENTIMENT-BASED DECEPTION DETECTION
The sentiment of social media messages serves as an extra feature of message contents. Sentiment captures emotional involvement, such as like, agree, or negation, calculated by lexicon analysis [13], [38], [79], [86], [198]. Jiang and Wilson [86] introduced a novel emotional and topical lexicon, the so-called ComLex. The authors analyzed the linguistic signals in user comments regarding misinformation and fact-checking. Specifically, they discussed the signals from user comments on misinformation posts, the veracity of social media posts, and fact-checking effects. There are signals for a positive fact-checking effect as well as signals (e.g., increased swear word usage) indicating potential ''backfire'' effects [138], where attempts to intervene against misinformation may only entrench the original false belief.

Sentiment features are often used along with TF-IDF word vectors. Supervised classifiers in current research utilize sentiment analysis to improve prediction. Bhatt et al. [13] detected fake news stances from neural embeddings, n-gram TF vectors, and the sentiment difference between the news headline-body TF vector pair. Dinakar et al. [38] proposed a sentiment analysis to predict bullying, aiming at discovering the goals and emotions behind the contents. Note that the Ortony lexicon [144] maintains a list of positive and negative words describing affect. Only the lexicon of negative words was added to the feature list to detect bully-related rude comments.

Pros and Cons: Sentiment analysis includes more emotional and background information, in addition to the explicit content, which can increase the prediction accuracy compared to semantic-only methods. However, the use of sentiment analysis cannot fully leverage the linguistic information in the contents when the lexicon is domain-specific. In addition, more elaborate dimensions of emotions or sentiments should be considered in order to capture fake information and its intent.

C. NETWORK STRUCTURE FEATURE-BASED DETECTION
Several general network features were extracted in supervised learning methods, such as topology, node in-degree and out-degree, edge weight, and clustering coefficient [100], [155], [199]. Wu et al. [216] summarized false information spreader detection based on network structures. Ratkiewicz et al. [155] built the Truthy system to enable the detection of astroturfing on Twitter. The proposed Truthy system extracted a whole set of basic network features for each meme and sent those features, together with a meme mood obtained by sentiment analysis, to the supervised learning toolkit. Kumar et al. [100] developed four feature sets, including network features, to identify hoaxes in Wikipedia. The network features measure the relation between the references of an article in the Wikipedia hyperlink network. The performance of the feature sets was evaluated in a random forest classifier.

In the following sections, we discuss algorithms and supervised learning methods specifically designed for the network structure, such as propagation-based models, graph optimization algorithms, and graph anomaly detection algorithms. TABLE 7 lists all the surveyed works under Section VII-C.

1) EPIDEMIC MODELS
An epidemic model is a direct way to model and simulate the diffusion of disease [131]. Since the spread of disease in a certain population is similar to the propagation of false information in social media communities, epidemic models have often been modified to quantify the extent of false information propagation [87]. The epidemic models are agent-based, where an individual node is modeled as an agent. Different types of agents are characterized by distinct states and behaviors, such as the agents Susceptible (S), Infectious (I), and Recovered (R) in the traditional SIR (Susceptible, Infectious, and Recovered) model [129] of false information propagation. In OSNs, agents in the SIR model represent a group of users in each state as follows: (i) Susceptible (S): users who have not yet received the information (e.g., rumor posts or fake news) but are susceptible to receiving and believing it; (ii) Infectious (I): users who received the information and can actively spread it; and (iii) Recovered (R): users who received the information and refuse to spread it [227].

The state transitions are S to I with infection rate β, and I to R with recovery rate γ, as depicted in FIGURE 5a. The current false information propagation research has two tracks employing the epidemic models: (i) adding more links and parameters to the traditional SIR model; or (ii) building the SEIZ model (Susceptible, Exposed, Infected, and Skeptic–Z; discussed below) to fit the OSN data.
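The SIR dynamics just described can be illustrated with a small agent-based simulation; the random graph, the infection rate β, and the recovery rate γ below are arbitrary illustrative choices, not parameters from any surveyed study.

```python
# Minimal agent-based sketch of SIR-style rumor diffusion on a random graph.
# Graph size, beta, and gamma are illustrative assumptions only.
import random
import networkx as nx

def simulate_sir(graph, beta=0.05, gamma=0.1, seeds=3, steps=50, rng=random.Random(7)):
    state = {n: "S" for n in graph}                 # Susceptible / Infectious / Recovered
    for n in rng.sample(list(graph), seeds):
        state[n] = "I"                              # initial rumor spreaders
    history = []
    for _ in range(steps):
        nxt = dict(state)
        for node, s in state.items():
            if s == "I":
                for nb in graph.neighbors(node):    # try to infect susceptible neighbors
                    if state[nb] == "S" and rng.random() < beta:
                        nxt[nb] = "I"
                if rng.random() < gamma:            # spreader turns into a stifler/recovered
                    nxt[node] = "R"
        state = nxt
        history.append({s: sum(1 for v in state.values() if v == s) for s in "SIR"})
    return history

G = nx.erdos_renyi_graph(n=500, p=0.02, seed=7)
print("final S/I/R counts:", simulate_sir(G)[-1])
```

Plotting the per-step S/I/R counts in `history` gives the agent density curves that the epidemic-model studies below compare against real diffusion traces.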


TABLE 7. Network structure-driven deception detection mechanisms.


FIGURE 5. Three types of agent-based epidemic models. The solid-line arrows are transitions from one state to another with given probabilities. The dotted-line arrows are transitions that may not exist at all times. (a) SIR model: β is the infection rate, γ is the recovery rate, and ξ is the rate from Recovered to Susceptible. (b) SIHR model: α is the stifling rate, β is the refusing rate, γ is the spreading rate, δ is the forgetting rate, η is the wakened remembering rate, and ξ is the spontaneous remembering rate. (c) SEIZ model: β is the infection rate, ε is the self-adoption rate, φ is the contact rate, and ξ is the skeptic rate. The details of p and l and the whole model are explained in [87].

a: SIR MODEL WITH VARIATIONS
Many variants of the basic SIR model have been proposed in the current false information propagation research. Zhao et al. [227] added forgetting mechanisms to the SIR model for rumor spreading, so that a spreader (I) can be converted into a stifler (R). Stiflers are defined similarly to the Recovered state. They used the population size of R to measure the impact of a rumor. They found that a forgetting mechanism can help reduce rumor influence and that the rumor saturation threshold can be influenced by the average degree of nodes in the network. Another Hibernator state (i.e., users who refuse to spread the rumor just because they forgot it) was added to the SIHR (Susceptible, Infectious, Hibernator, and Recovered) model [228] to measure the forgetting rate α and a remembering mechanism η. The new remembering mechanism was shown to delay the rumor termination time and reduce the rumor's maximum influence. The direct link from S to R was added by [228] and extended by [229]. The update was that all users in state S were finally converted to either the I or R state if they had the chance to be exposed to spreaders (I). FIGURE 5a and FIGURE 5b describe the SIR and SIHR models, respectively.

Cho et al. [26] extended the basic SIR model by replacing the transition between states with a decision based on the agent's belief on the extent of uncertainty in the agent's opinion. The Subjective Logic opinion model is used to model an agent's opinion composition and update based on the extent of uncertainty. The three states in the SIR model are defined based on the degree of each dimension of an opinion, which is defined by belief, disbelief, and uncertainty. The opinion update involved the interaction similarity between two agents, a conflict measure between belief and disbelief, and opinion decay upon no interactions between agents. Based on the degree of uncertainty in a given opinion, an agent's opinion can move from any state to any other state. This work investigated the effect of misinformation and disinformation in terms of how well false information can be effectively mitigated by propagating countering (true) information through a well-selected set of true informers.

The evolutionary SIR model simulation has been used to model decision strategies in fake news attacks [96]. The state transitions in the SIR model were replaced by the decision model of the Iterated Prisoner's Dilemma (IPD). The deception strategies can modify the prior knowledge of the agents by either adding uncertainty or changing false perceptions. In their extensive simulation experiments, only a small population of fake news attackers could initiate the spread, but the fitness of attackers was sensitive to the cost of deception.

b: SEIZ MODEL WITH VARIATIONS
Jin et al. [87] captured the diffusion of false and true news with the SEIZ epidemic model. Instead of considering the Recovered state, they modeled a state of users who have heard of the rumor but do not spread it (Skeptic, Z) and influenced users (E) who post the rumor with an exposure delay. The SEIZ model accurately captured the diffusion patterns in real news and rumor events and was evaluated to be better than the simple SIS (Susceptible, Infectious, and Susceptible) model. They also proposed a ratio R_SI, the transition rates entering E from S over the transition rates exiting E to I, to differentiate rumor and real news event data. Isea and Lonngren [83] extended the SEIZ model by considering a forgetting rate of rumor posts. The forgetting rate is defined as the probability that a user forgets the rumors across all the states. FIGURE 5c shows the key components of the SEIZ model and its process with the states and the rates from one state to another.

c: PROS AND CONS
Epidemic models provide a direct and straightforward mathematical model for the diffusion dynamics of false information. The agent density plot over time is a good way of observing the differences between the simulation and real values. However, simulation tests face a common issue in that the population size is unknown (yet assumed stable) and the initial variable values are unknown. If the population size is as large as a real social media network, the computational cost cannot be ignored. In addition, in the SIR model, the state change is controlled by probability; this autonomous behavior ignores a user's intention and belief. To complement this, there have been some efforts [26], [96] focusing on modeling and evaluating the effect of subjective, uncertain opinions and trust of agents and the role of more agents in false information diffusion.


2) CREDIBILITY-BASED MODELS
In OSNs, one of the detection mechanisms for false information attackers, Sybil accounts, or spammers is modeling a credibility score over the network [88], [89], [225]. Existing works used various ways to represent credibility scores, such as reputation scores, trust scores, and belief scores. Credibility in OSNs can be modeled by two methods: classification-based and credibility propagation. A classification-based approach uses supervised learning algorithms [130]. On the other hand, the credibility propagation approach constructs a network to propagate credibility scores among users, tweet contents, events, and activities [88]. Based on the credibility scores, ranking algorithms over users and posts can be applied, such as PageRank [5], [24], [60], [225].

Negm et al. [130] used 5Ws (i.e., who, what, when, where, and why) credibility to distinguish credible news; RSS (Rich Site Summary) files from news agencies were used to extract publication dates, headlines, contents, and locations, which were fed into different algorithms to calculate the credibility of a news agency. The compared algorithms include TF-IDF, TF-IDF with location, Latent Semantic Indexing (LSI), and TF with LSI and log entropy. They concluded that TF-IDF and TF-IDF with location performed the best in calculating credibility. More recently, Norambuena et al. [136] leveraged 5W1H extraction and news summarization techniques to propose the Inverted Pyramid Score (IPS) to distinguish structural differences between breaking and non-breaking news, with the long-term goal of contrasting the reporting styles of mainstream and non-mainstream fake outlets.

Jin et al. [88] introduced a credibility propagation network for news content composed of three layers: message, sub-event, and event. The event layer covers the main event of the news, the sub-event layer relates other events to the main event, and the message layer holds the content of the news article. A graph optimization problem is formulated to calculate the credibility in this hierarchical network. All the layers are content-based and have direct relations with the credibility of the news. Jin et al. [89] further proposed a verification method for credibility in a propagation model by using a topic modeling technique. Mitra and Gilbert [125] constructed the CREDBANK corpus by tracking tweets, topics, events, and associated in-situ human credibility judgements to systematically study the credibility of social media events tracked in real time. They later leveraged this corpus to construct language and temporal models for credibility assessment [126], [127]. By identifying theoretically grounded linguistic dimensions, the authors presented a parsimonious model that maps language cues to perceived levels of credibility. For example, hedge words and positive emotion words were associated with lower credibility. Additionally, by examining the temporal dynamics of the event reportages, they found that the amount of continued collective attention given to an event contained useful information about its associated level of credibility [126].

Akoglu et al. [4] proposed the so-called OddBall algorithm to detect anomalous behavior like malicious posts and fake donations. They studied the sub-graph (egonet) of a target node with its neighbors. They analyzed various scoring and ranking methods using feature patterns in density, weights, principal eigenvalues, and ranks and compared their performance in different network topologies.

Kumar et al. [102] detected fake reviewers in user-to-item rating networks. They developed a new trust system to rank users, products, and ratings by fairness, goodness, and reliability, respectively. The intrinsic scores are calculated by combining network and behavior properties. Users rated with low reliability are more likely to be fake reviewers [102]. Akoglu et al. [5] developed the so-called FraudEagle algorithm to spot fraudsters as well as fake reviews in online review platforms. There are two steps in the FraudEagle algorithm: scoring users and reviews, and grouping the analyzed results. For each review, only its sentiment (positive or negative) is analyzed to assign the belief score. The grouping step reviews top-ranked users in a subgraph by clustering and merging more evidence to reveal fraudsters.

Ghosh et al. [60] developed the CollusionRank algorithm for detecting link farming-type spammer attacks. Influence scores were given to the users and web pages. By decreasing the influence scores of the users connected to spammers, the follow-back behavior of social capitalists was discouraged. Yu et al. [225] developed the SybilLimit ranking algorithm for detecting Sybil attacks. A Sybil node was identified by calculating the node's trust score. Chirita et al. [24] developed the MailRank algorithm for detecting Sybil attacks in the email network. A sender is assessed by a global and a personalized reputation score.

Pros and Cons: Credibility models can be applied at different stages and levels based on contents, user behaviors, and posts/comments in highly heterogeneous networks. In addition, a credibility model based on network features is agnostic to platforms and languages because the model only needs network features. However, how to accurately evaluate initial credibility values is not a trivial problem. Considering credibility at multiple levels makes the computation more complex and expensive, so it may not be preferred. Further, credibility may be subjective and cannot be ported across platforms and/or networks. Lastly, a credibility model may not be able to detect sudden changes caused by instances that are not easily observable, thus impacting the accuracy of the credibility score assessment.
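The fairness/goodness style of iterative trust scoring mentioned above can be sketched as below; this is a heavily simplified toy loosely inspired by Kumar et al. [102], not the published algorithm, and all constants are illustrative.

```python
# Simplified, illustrative fairness/goodness iteration over a user->product rating
# graph (loosely inspired by Kumar et al. [102]; NOT the published algorithm).
# Ratings are assumed to lie in [-1, 1]; all constants are arbitrary.
ratings = {  # (user, product) -> rating in [-1, 1]
    ("u1", "p1"): 1.0, ("u1", "p2"): 0.8,
    ("u2", "p1"): 0.9, ("u2", "p2"): 0.7,
    ("u3", "p1"): -1.0, ("u3", "p2"): -1.0,   # u3 rates against the consensus
}
users = {u for u, _ in ratings}
products = {p for _, p in ratings}
fairness = {u: 1.0 for u in users}            # how trustworthy a rater is
goodness = {p: 0.0 for p in products}         # consensus quality of a product

for _ in range(20):  # simple fixed-point iteration
    for p in products:
        rs = [(fairness[u], r) for (u, q), r in ratings.items() if q == p]
        goodness[p] = sum(f * r for f, r in rs) / max(sum(f for f, _ in rs), 1e-9)
    for u in users:
        errs = [abs(r - goodness[p]) for (v, p), r in ratings.items() if v == u]
        fairness[u] = 1.0 - sum(errs) / (2 * len(errs))   # max deviation is 2 on [-1, 1]

print({u: round(f, 2) for u, f in fairness.items()})      # u3 should end up least fair
```

The intuition matches the credibility-propagation idea above: a rater who consistently disagrees with fairness-weighted consensus loses weight, and low-trust raters are candidate fake reviewers.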


3) CASCADES FEATURES-BASED MODELS
Information network propagation patterns can be represented by a cascading structure depicting the flow of OSD information that users travelled through, posted, tweeted, and retweeted over time. The cascading structure has two forms: hop-based cascades and time-based cascades [231]. The cascades features can be grouped into two approaches: (i) calculating the similarity of cascades between true and false information; and (ii) representing cascades using informative representations and features in a supervised learning model.

a: CASCADES SIMILARITY
Cascade similarity is computed between fake news and true news. A graph kernel [231] has been used as a common strategy for computing the cascade similarity. Wu et al. [217] proposed a fake news detection method using a hybrid kernel function. This graph kernel function calculates the similarity between different propagation trees. They also discussed a Radial Basis Function (RBF) kernel which calculates the distance between two vectors of traditional and semantic features. The sentiment and doubt scores of user posts need to be verified for fake news. Ma et al. [116] proposed a top-down tree structure using RNNs for false information detection. The RNN learns the representation from tweet content, such as embedding various indicative signals hidden in the structure, to improve rumor identification.

b: CASCADES REPRESENTATION
Cascade representation pursues informative representations as features to distinguish fake news from true news. For example, the number of nodes is a feature in a non-automated way. Alternatively, cascade representations can be fit by deep learning models [219]. Wu and Liu [219] used an LSTM-RNN to model the propagation cascades of a message. This work combines the propagation pathways with user embeddings, which forms a heterogeneous network. A message is represented by the sequence of its spreaders. A modularity maximization algorithm is used to cluster nodes with embedding vectors. Ma et al. [117] proposed propagation trees using a Propagation Tree Kernel (PTK) for rumor detection. It can explore the suggested feature space when calculating the similarity between two objects.

c: PROS AND CONS
Similarity-based approaches consider the roles of users in propagating false information. Computing the similarity between two cascades may require high computational complexity [231]. Representation-based methods automatically represent the news to be verified; however, the depth of cascades may challenge such methods as it is equal to the depth of the neural network. All the approaches only provided experimental data to show their effectiveness, which may not properly reflect real-world settings. Preparing training data is a time-consuming process and is often computationally expensive.

4) GAME THEORETIC MODELS
This category explores deception and defense through reward and penalty models of OSD attacks. In game theory, the actions and decisions of the players are mainly based on the reward and penalty of their previous activities and the other players' actions [180].

Kopp et al. [96] discussed a game theoretic false information propagation model as a deception model that simulates the propagation of fake news in OSNs. They used three types of game theory: Greenberg's deception model [65], Li and Cruz's deception model [112], and hypergame theory [11]. Greenberg's deception model investigated the effect of deception on players' payoffs [65]. Kopp et al. [96] mapped false information to Greenberg's false signal model. Li and Cruz [112] used passive and active deception strategies by introducing noise and randomization, respectively, to increase uncertainty. Kopp et al. [96] used the deception game in [112] to consistently monitor constraints and conditions that affect game strategies. Bennett and Dando [11] used hypergame theory to model a deception game where players have subjective perceptions and understandings of a complicated game. Kopp et al. [96] also used [11] to consider players' subjective beliefs, which may introduce uncertainty as well. Kopp et al. [96] proposed an information theoretic model showing that attackers' deceptive behavior can be significantly mitigated when the cost of deception is fairly expensive.

Pros and Cons: Game theoretic approaches to modeling OSD attacks add an extra dimension over the other conventional network structure-based approaches above by considering the cost and benefit of performing a deceptive behavior by users in OSNs. Game theoretic deception detection is a promising approach that reflects human behaviors aiming to take an optimal action based on the expected outcome. However, game theoretic approaches have rarely been adopted in modeling and analyzing online social deceptive behaviors, compared to data-driven deception detection approaches. For this reason, the effectiveness of game theoretic deception detection approaches has not been fully investigated in the literature. In addition, aligned with a conventional drawback of using game theory, a large number of deceptive actions may introduce high solution complexity. Uncertain, subjective beliefs of users should be carefully considered in terms of modeling incomplete and/or imperfect information in game theory.
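The cost-of-deception intuition above can be illustrated with a toy payoff computation: when deception is costly and detection is likely, staying honest becomes the attacker's best response. All payoff numbers and the detection probability are invented for illustration and are not from the cited game-theoretic models.

```python
# Toy illustration of the cost-of-deception argument. An attacker chooses
# "deceive" or "stay honest"; the defender detects deception with probability q.
# All numbers are invented for illustration only.
def attacker_payoff(action, q, gain=10.0, penalty=8.0, cost_of_deception=3.0):
    if action == "honest":
        return 0.0                                  # no gain, no risk
    # expected value of deceiving: gain if undetected, penalty if detected, minus effort
    return (1 - q) * gain - q * penalty - cost_of_deception

q = 0.4  # assumed detection probability
for cost in (1.0, 3.0, 6.0, 9.0):
    deceive = attacker_payoff("deceive", q, cost_of_deception=cost)
    best = "deceive" if deceive > 0 else "stay honest"
    print(f"deception cost={cost:>4}: expected payoff={deceive:5.1f} -> best response: {best}")
```

Raising either the detection probability or the cost of deception pushes the attacker's best response toward honesty, which is the qualitative result reported by Kopp et al. [96].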


5) BLOCKCHAIN-BASED MODELS
Huckle and White [80] developed a tool called Provenator to prove the origin of media. Provenator is based on a blockchain storing provenance metadata so that users can trust the authenticity of the metadata. Provenator can be used to validate news for news outlets like CNN and BBC, where information and news are sometimes gathered from independent sources. However, since Provenator uses a blockchain and cryptography, a small difference, such as a one-pixel difference between two images, can make the result vastly different, leading to numerous false alarms and human interventions for validation, which is labor-intensive. McEvily et al. [121] proposed a social media platform called Steem (i.e., a database) based on blockchain technology for building a community reward system. The reward system relies on users for consensus voting, reading content, and commenting.

Pros and Cons: The original design of blockchains has security benefits in terms of provenance, integrity, and immutability. The blockchain system is a heterogeneous network that incorporates other stakeholders to detect and control OSD activities. In addition, it is resilient against OSD attacks. Managing the large ledger size in a blockchain is an issue as shared information in social media and news outlets grows exponentially. Since both flagging accuracy and consensus verification rely on the contribution of crowd signals, the system may break when too many users are malicious. For example, if a large volume of attackers contributes to the crowd activities and even controls the system, a user cannot get access to write transactions. In addition, the authorized party may be compromised by advanced attackers.

6) OTHER NETWORK OPTIMIZATION MODELS
Several graph optimization algorithms were proposed for graph anomaly detection and community detection problems. Hu et al. [78] developed a matrix factorization-based algorithm to detect social spammers on Twitter. Their framework utilized both content information and the network information of an adjacency matrix and solved a non-smooth convex optimization problem. Several approaches have been taken to detect link farming attacks via network structure-based algorithms. Araujo et al. [8] detected temporal communities in cell networks and computer-traffic networks based on tensor analysis. Jiang et al. [85] detected behavior patterns in OSNs where the spectral subspaces had different patterns and different lockstep behaviors. In addition, Jiang et al. [84] identified synchronized behaviors of spammers. Kumar et al. [99] considered trolling as a social deception activity. They proposed a decluttering algorithm to break a network into smaller networks on which the detection algorithm could be run. Kumar et al. [101] considered sockpuppets as an OSD attack where users create multiple identities to manipulate a discussion. They found that sockpuppets can be distinguished from normal users by having more clustered egonets.

Pros and Cons: Graph-based features are more available compared to user profiles and/or user interaction features, without violating privacy. In addition, graph-based algorithms can be agnostic to any dataset, with high applicability to diverse platforms. However, collecting graph-based features, such as centrality measures, and solving graph optimization problems often incur high computational overhead. This hinders their applicability to platforms that require real-time or lightweight detection for streaming data.

D. HYBRID DETECTION
Since ML/DL-based models can take an abundant number of features, one can train on a hybrid feature set combining user profile, message content, and network features to detect OSD attacks. Unlike several existing survey papers which discussed only individual feature categories [98], [218], our discussion focuses on dealing with OSD attacks using hybrid features [82], [107]–[109], [199], [206].

Lee et al. [109] detected crowdturfers among Twitter users. A total of 92 features were divided into 4 groups: user demographics, user friendship networks, user activity (behavior-based features), and user content similarity, including linguistic features from the LIWC dictionary. Vosoughi et al. [199] developed a tool called Rumor Gauge for automatically verifying rumors and predicting their veracity before they are verified by trusted channels. Since rumors are temporal, time-series features are extracted as the rumor spreads. A total of 17 features (e.g., linguistics, users involved, and propagation dynamics) were studied. They found that the fraction of low-to-high diffusion in the diffusion graph is the most predictive feature of the veracity of rumors. The time-series features are processed by DTW and HMM models, but DTW assumes all the time-series are independent and assigns equal weight to all 17 features. The experiment evaluated the performance of Rumor Gauge in terms of the accuracy of veracity prediction, the contribution of each individual feature, the contribution of three groups of features, and accuracy as a function of latency.

Pros and Cons: Hybrid detection takes advantage of hybrid feature sets and can improve the accuracy of detecting rumors, spammers, and crowdturfing. A drawback of the hybrid detection approach is expensive feature engineering and acquisition. Furthermore, the training process is time-consuming, with complexity increasing as the feature size increases.

VIII. RESPONSE MECHANISMS TO ONLINE SOCIAL DECEPTION
In this section, we survey existing mitigation or recovery mechanisms applied after OSD attacks are detected, along with early detection mechanisms of OSD attacks [38], [56], [216]. Florêncio and Herley [56] developed a mitigation strategy to deal with compromised accounts by detecting password reuse events and timely reporting them to financial institutions. The aftermath actions were to take down identified phishing sites, restore the compromised accounts, and rescue users from bad decisions.

Dinakar et al. [38] took a mitigation action to counter cyberbullying with two steps: (i) early detection; and (ii) reflective user interfaces that popped up notices and suggestions on user behaviors. Most efforts made to mitigate OSD attacks in OSNs have mainly focused on reducing the effect of false information propagation. Wu et al. [216] summarized two misinformation intervention methods: (i) detecting and preventing misinformation from spreading at an early stage; and (ii) developing a competing campaign to fight against misinformation. To limit the spread of fake news, a sample of fake news with maximal utility was identified in [185]. Within a certain constraint, this sample of fake news kept the largest number of users away from fake news posts. Their algorithm was robust against a high number of spammers. Huckle and White [80] also made an effort to mitigate fake news spread based on validity proofs of digital media data, such as a picture in the fake news. Blockchain technology was used to prove the origins of digital media data; however, this method cannot prove the authenticity of the whole news article.


FIGURE 6. The types of datasets and the frequency of their use under the five online social deception studies. The datasets are collected from all the approaches for the prevention, detection, and mitigation of OSD attacks in TABLES 5–7.

TABLE 8. Classification used for the defense mechanisms to deal with online social deception attacks in this survey.

FIGURE 7. The types of datasets and the frequency of their use based on two types of approaches: data-driven OSD detection techniques shown in TABLE 6 and network structure-based OSD detection techniques shown in TABLE 7.

Kumar and Shah [98] summarized misinformation mitigation by modeling true and false information. From the four existing approaches, the authors concluded that these algorithms are effective in detecting the spread of rumors and that their simulations could suggest rumor mitigation strategies. Okada et al. [140] studied rumor diffusion with an SIR-extended information diffusion model and developed a mitigation mechanism that asks highly influential users to spread correction diffusion. The authors examined how a false rumor diffuses and converges when help and/or correct information is given and how fast the convergence appears.

Pros and Cons: Mitigation and recovery mechanisms rely heavily on early detection. Simulation models of spreading true information can mitigate the negative influence. However, most studies are based on simulation models, are limited in their use of real-world datasets, or have not been validated by implementation on real-world platforms. Although it is highly challenging for a developed model to be deployed on real platforms, there should be more effort to use empirical, real datasets for the validation of the developed recovery models. Recovery in OSNs is more difficult than in offline social networks because the relationships can be easily dropped. Only one work [56] designed a system for account restoration. More research effort should be made to effectively carry out the aftermath actions upon early detection.

TABLE 8 summarizes the classification of OSD defense mechanisms, including prevention, detection, and mitigation/response, discussed in Sections VI–VIII. Existing works mostly focused on the detection of the OSD attacks we classified in Section III. Less attention has been paid to prevention and mitigation, where the main focuses include false information, luring, and identity theft. There are still open questions on how to build trustworthy cyberspace against human-targeted attacks, especially for protecting children.

IX. VALIDATION & VERIFICATION
A. DATASETS
We summarized all the datasets used in existing OSD prevention and detection approaches in TABLES 5–7. Most datasets are from various social media platforms, including Twitter, Sina Weibo, Facebook, YouTube, and Reddit. FIGURE 6 demonstrates the frequency distribution of each data source for the five types of OSD attacks considered in this work. Twitter, Weibo, and Facebook platforms are used along with synthetic datasets and datasets from all other sources. Twitter is the most frequently used data source, probably because of the user-friendly API for public users to download tweets in a certain time period. Datasets for false information attacks (e.g., rumors, fake news, and fake reviews) and luring attacks (e.g., spamming and phishing) draw the most attention from researchers. This demonstrates the diversity of the sources of datasets used in the literature.

FIGURE 7 illustrates the dataset platform distribution for two types of OSD attack detection approaches, namely, data-driven detection and network structure-based detection.


IX. VALIDATION & VERIFICATION

A. DATASETS
We summarized all the datasets used in existing OSD prevention and detection approaches in TABLES 5–7. Most datasets are from various social media platforms, including Twitter, Sina Weibo, Facebook, YouTube, and Reddit. FIGURE 6 shows the frequency distribution of each data source for the five types of OSD attacks considered in this work. Twitter, Weibo, and Facebook are used along with synthetic datasets and datasets from all other sources. Twitter is the most frequently used data source, probably because its user-friendly API allows public users to download tweets within a certain time period. Datasets for false information attacks (e.g., rumors, fake news, and fake reviews) and luring attacks (e.g., spamming and phishing) draw the most attention from researchers. This demonstrates the diversity of the sources of datasets used in the literature.

FIGURE 7 illustrates the dataset platform distribution for two types of OSD attack detection approaches, namely, data-driven detection and network structure-based detection. FIGURE 7 shows the dataset distribution in data-driven approaches (see the left part of the figure) summarized in TABLE 6. Twitter datasets are broadly used in all types of OSD attack detection mechanisms, such as spambot, malicious account, fake account, compromised account, rumor, and crowdturfing detection. Other data sources include LinkedIn, YouTube, online forums such as Reddit, blacklisting websites, fact-checking websites, crowdturfing worker sites, and PhishTank, depending on the type of OSD attack. Several benchmark datasets are frequently used, such as the social honeypot dataset [108], in which the authors collected a large number of spammer accounts by using social honeypots deployed in the Twitter network for seven months.

FIGURE 7 also shows the dataset distribution used in network structure-based detection (see the right side of the figure) in TABLE 7. Twitter, Weibo, and Facebook are the top three individual data sources. The others include fact-checking websites, an app store database, online forums, and rating platforms. The datasets for network structure-based approaches can be divided into those used in simulation research and those used in detection research. Synthetic datasets are more frequently used in simulation models, such as epidemic models and/or credibility/ranking-based models.

Based on our survey of the datasets used in OSD research, as shown in FIGURE 7, most existing approaches rely on the analysis of static datasets. Although it is not easy to deploy a defense mechanism on a dynamic, real platform, agent-based models, where the agent's behavior is modeled based on real datasets, can provide better insights into how the defense mechanisms work under dynamic environments.

B. METRICS
Most data-driven approaches have used metrics to estimate the detection accuracy of OSD attacks. The following metrics have been considered in the literature (a short computational sketch of the most common ones is given after the list):
• Confusion Matrix [10], [17], [21], [28], [40], [49], [50], [61], [78], [88], [89], [91], [95], [102], [107], [108], [113], [115], [117], [135], [162], [166], [174], [175], [177], [178], [186], [199], [206], [222]: The confusion matrix is made of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). They are the basic components of other accuracy metrics, such as precision and recall.
• Precision [10], [17], [21], [28], [40], [50], [61], [78], [82], [88], [89], [91], [102], [107], [113], [115], [135], [162], [166], [175], [186], [193], [217], [219], [224]: This metric estimates the true positives over all positives detected, including true positives and false positives:

  Precision = TP / (TP + FP)    (1)

• Recall [17], [21], [28], [40], [50], [61], [78], [82], [88], [89], [91], [107], [113], [115], [135], [162], [175], [186], [217], [219], [224]: This metric captures the true positives over the actual positives, including true positives and false negatives:

  Recall = TP / (TP + FN)    (2)

• F1 Score or Measure [17], [21], [28], [38], [40], [50], [61], [78], [82], [88], [89], [107]–[109], [113], [162], [186], [217], [219]: This metric is an indicator of detection accuracy based on both precision and recall:

  F1 = (2 × Precision × Recall) / (Precision + Recall)    (3)

• Accuracy [9], [10], [13], [28], [38], [40], [49], [60], [82], [84], [86], [88], [89], [107]–[109], [116], [117], [135], [166], [175], [186], [199], [217], [219], [223]: This metric measures correct detection of both true positives and true negatives. However, when the dataset is imbalanced, such as far more true positives than true negatives or vice versa, this metric may mislead:

  Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)

  There is also a weighted accuracy score [13] with different weights on labels. Accuracy can also be used to evaluate the contribution of each feature or feature set [82], [166], [199], [217].
• False Positive Rate (FPR) [9], [21], [49], [109], [162], [174], [178], [186], [206], [223]: This metric measures misdetection in terms of false alarms among the actual negatives:

  FPR = FP / (FP + TN)    (5)

• False Negative Rate (FNR) [9], [10], [109], [206], [223]: This metric captures how many positives are missed:

  FNR = FN / (TP + FN)    (6)

• Specificity [10], [21], [28], [162], [174]: This metric measures the extent of correctly detecting negatives over the actual number of negatives:

  Specificity = TN / (TN + FP) = 1 − FPR    (7)

• Weighted Cost (Wcost) [223]: In phishing detection, since the ratio of legitimate websites to phishing websites is high, a legitimate website misclassified as a phishing one (FPR) has more severe effects than the reverse (FNR). The weighted cost is used to balance the performance of FPR and FNR:

  Wcost = FNR + λ × FPR, λ > 1,    (8)

  where λ is the weight of FPR. A higher value of λ means a larger influence of the FPR value.


• Receiver Operating Characteristic (ROC) Curve [10], [82], [106], [174], [175], [206]: The ROC curve plots the classifier's true positive rate (TPR) against its FPR at various detection thresholds. This curve is used to measure and compare the stability of several classifier models.
• Area Under the Curve (AUC) [10], [17], [22], [61], [82], [102], [106], [108], [109], [162], [174]: AUC is calculated as the area under the ROC curve. It measures the probability that a classifier correctly identifies a true-positive data point. Since AUC is insensitive to imbalance between classes, it can be better than Accuracy for evaluating imbalanced datasets. AUC is another metric of classifier stability and classification quality under different settings.
• Discounted Cumulative Gain (DCG) [147]: DCG measures the effectiveness of an algorithm, as an alternative measure to AUC. A higher DCG is indicative of an early identification of suspicious cases:

  DCG = r[1] + Σ_{i=2}^{n} r[i] / log₂ i,    (9)

  where r[i] is 1 if the i-th friend request was defined as suspicious or 0 if the i-th friend request was defined as legitimate, and n is the number of total incoming requests that require further investigation [147].
• Matthews Correlation Coefficient (MCC) [28], [61], [162], [186]: MCC measures the correlation between the predicted class and the real class of users. This metric is considered an unbiased version of the F1-measure:

  MCC = (TP × TN − FP × FN) / √((TP + FN)(TP + FP)(TN + FP)(TN + FN)),    (10)

  where MCC ≈ 1 means high prediction accuracy, MCC ≈ 0 means the prediction is no better than random guessing, and MCC ≈ −1 means that the prediction is in disagreement with the real class.
• Cohen's Kappa Value (κ) [38]: This metric is a measure of reliability for two classifiers or raters, which accounts for agreement occurring by chance. Cohen's Kappa Value is used when Accuracy alone is insufficient to evaluate model reliability [38]. Cohen's Kappa is calculated as:

  κ = (Po − Pe) / (1 − Pe),    (11)

  where Po is the observed agreement in classification, the same as Accuracy, and Pe is the hypothetical probability of agreement by chance. A high Cohen's Kappa Value (0.8 ≤ κ ≤ 1) indicates good reliability [18].
• Mean Absolute Error (MAE) [87], [216]: Many detection algorithms for OSD attacks use MAE to estimate their detection accuracy. In addition, this metric is used to measure the simulation fitting error of an epidemic model by calculating the absolute values of the errors at each time point:

  MAE = (1 / |U|) Σ_{i∈U} |p_i − l_i|,    (12)

  where U is a user set, p_i is a prediction result, l_i is a true label, and i is a data index.
• 2-norm Error [87]: This measures the simulation fitting error of an epidemic model as one of the performance measures of model fitting and optimization. A good model would reduce this error through iterations:

  2-norm Error = ‖I(t) − Tweets(t)‖₂ / ‖Tweets(t)‖₂,    (13)

  where I(t) is the number of users (agent I) that spread the rumor tweet at time t, and Tweets(t) is the number of tweets at time t from the real data.
• Mean Fraction of Recovered Agents Per Time Unit (R) [26]: This is a specific case of the statistics-and-plot metric. Instead of plotting the count of each agent type at each time point, the average fraction of recovered agents during the total session time T is calculated:

  R = (Σ_{t=1}^{T} R(t)) / T,    (14)

  where R(t) is the number of agents recovered from false information (i.e., not believing in false information) and T is the total simulation time.
• Spearman's Rank Correlation Coefficient (ρ) [49], [86]: This metric measures the rank correlation between the predicted labels and the ground truth:

  ρ = 1 − (6 Σ_i d_i²) / (n(n² − 1)),    (15)

  where the n ranks are distinct integers and d_i is the difference between the two ranks of an element. ρ ranges over [−1, 1] as a real number, where 0 refers to a random guess while 1 indicates positive correlation [212].
• Label Ranking Average Precision (LRAP) [86]: This measures the ability to give a more accurate prediction for each post message, with a perfect prediction scoring 1:

  LRAP = (1/n) Σ_{i=0}^{n−1} (1 / ‖y_i‖₀) Σ_{j: y_ij = 1} |L_ij| / rank_ij,    (16)

  where n is the number of data points, y_i is the vector of ground truth labels of the i-th data point, ‖·‖₀ is the number of non-zero elements in a vector, y_ij is the binary value of the j-th label in the ground truth vector y_i, |L_ij| is the number of positive labels ranked at or above the predicted rank of label j for data point i, and rank_ij is the rank of the predicted label (p_ij) in the predicted label vector (p_i) for a given i [165].
• Label Ranking Loss (LRL) [86]: This metric estimates the number of times that irrelevant labels are ranked higher than relevant labels. Due to its complex description, interested readers can refer to [188] for more details.
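As a concrete illustration of the most commonly used metrics above, the following is a minimal sketch that computes several of them (Eqs. (1)–(8), (10), (11), and (15)) for a toy binary detection task and cross-checks a few values against library implementations. The toy labels, the λ value, and the use of scikit-learn/SciPy are illustrative assumptions, not part of any surveyed system.

```python
# Minimal sketch: computing confusion-matrix-based metrics for a toy binary
# deception-detection task. Illustrative only.
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             matthews_corrcoef, cohen_kappa_score, roc_auc_score)
from scipy.stats import spearmanr

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]      # ground-truth labels (1 = deceptive)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]      # hard predictions of a classifier
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.95, 0.3, 0.25, 0.6]  # scores for AUC

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)                          # Eq. (1)
recall = tp / (tp + fn)                             # Eq. (2)
f1 = 2 * precision * recall / (precision + recall)  # Eq. (3)
accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (4)
fpr = fp / (fp + tn)                                # Eq. (5)
fnr = fn / (tp + fn)                                # Eq. (6)
specificity = tn / (tn + fp)                        # Eq. (7), equals 1 - FPR
lam = 2.0                                           # illustrative weight, lambda > 1
w_cost = fnr + lam * fpr                            # Eq. (8)
mcc = (tp * tn - fp * fn) / (
    ((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn)) ** 0.5)   # Eq. (10)

# Cross-check the hand-rolled values against library implementations.
assert abs(precision - precision_score(y_true, y_pred)) < 1e-9
assert abs(recall - recall_score(y_true, y_pred)) < 1e-9
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-9
assert abs(mcc - matthews_corrcoef(y_true, y_pred)) < 1e-9

kappa = cohen_kappa_score(y_true, y_pred)   # Eq. (11)
auc = roc_auc_score(y_true, y_score)        # area under the ROC curve
rho, _ = spearmanr(y_true, y_score)         # Eq. (15), rank correlation

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} "
      f"accuracy={accuracy:.2f} fpr={fpr:.2f} fnr={fnr:.2f} "
      f"wcost={w_cost:.2f} mcc={mcc:.2f} kappa={kappa:.2f} "
      f"auc={auc:.2f} rho={rho:.2f}")
```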


FIGURE 8. Counts of research works in TABLES 5–7 by metric.

FIGURE 8 illustrates the frequency of each metric used in the existing approaches surveyed in this work. Since most of the current studies develop OSD attack detection mechanisms, the majority of the metrics are related to measuring detection accuracy. Among all the detection metrics, Precision, Recall, F1 score, and Accuracy are the most popular metrics used in the existing works. FPR, FNR, Specificity, ROC, and AUC are also obtained based on the Confusion Matrix. They are used to compare the performance of multiple classifiers. However, the algorithmic complexity of defense algorithms is rarely considered.

X. ETHICAL ISSUES OF SOCIAL DECEPTION
Ethical issues in social deception research have been discussed as follows:
• Privacy issues may be raised when conducting social deception research in terms of setting up social honeypots and fake profiles, collecting data from those accounts, and capturing user behaviors (e.g., making friends and posting texts). Elovici et al. [46] strongly recommended sharing datasets publicly. This allows other researchers to avoid the unnecessary procedures associated with ethical issues that are often encountered in the process of data collection. If many public datasets for research are available, new researchers can reduce the need to crawl their own datasets. In addition, if the OSN provider has an advanced way of anonymization, researchers can follow those standards to protect user identities when handling the collected data. The authors also discussed a coordinated emergency response team (CERT) to handle vulnerability disclosures arising from new research results [46], in terms of strictly anonymizing users' identities and handling findings with great care.
• Since social honeypot research involves human subjects-based experiments, it should be regulated by institutional review board (IRB) approval [42], particularly in terms of the privacy issues that may be raised in personal data analysis, stakeholder analysis, and human deception analysis. However, many ethical issues remain undiscussed [67], [107], [233].
• Several online social deception studies have discussed legal and ethical issues [31], [147], [222]. However, their discussions are limited to the claim that if malicious activities do not directly involve normal, legitimate users, their design is safe for normal users. There may be indirect influences of social honeypots on normal users, such as normal users approaching the social honeypots.
• Although one community is seriously concerned about the ethical issues related to privacy in conducting social deception research, the other community takes the position of advocating online social deception research in terms of safeguarding society and vulnerable people. Hence, their perspective is that there are neither unethical nor illegal issues associated with conducting online social deception research [120].
• Some researchers claim that creating fake accounts as social honeypots is only for detecting spammers, not for taking benefits from normal users or buying compromised accounts [191], [232]. However, it is not clear whether social honeypots using fake accounts introduce no harm to normal users.
• To prevent risks from using crowdsourcing methods, some guidelines on controls and protections against unethical behaviors, such as privacy violation, have been discussed [142]. The system design and research procedures should include how to prevent sensitive data sharing and how to enforce users' security education and training.
• For misinformation propagation experiments, some researchers claim that since misinformation itself (e.g., fake news) is public information, it does not require any informed consent [32]. However, spreading public misinformation can amplify its influence in OSNs, which can still manipulate public opinion.

The ethical issues associated with conducting online social deception research have been hotly debated because this issue touches conflicting aspects of fundamental values, namely privacy vs. safety. In the current state of OSD research, there are many obscure aspects of conducting human subject research on online platforms. Since human users are a key part of OSNs and the key entities to be protected in OSNs, there should be very specific guidelines and regulations that facilitate researchers in safely solving OSD problems within legal boundaries. Otherwise, although solving OSD problems is highly critical to ensuring the public good and safety of our society, the extra hassle derived from ethical issues may significantly hinder researchers from tackling OSD research.

XI. DISCUSSIONS: INSIGHTS & LIMITATIONS
Based on the extensive survey conducted, we identify the following insights:


• Deception domains and intent: Deception is defined across multidisciplinary domains with varying intent and detectability in type and extent. Although social deception frequently carries a negative connotation of low integrity and maliciousness, not all socially deceptive behaviors have bad intent. Rather, social deception can play a defensive social role for self-protection or self-presentation.
• OSD type category: Like OSN attacks and cybercrimes, OSD can be defined by deceptive intent. However, unlike OSN attacks or cybercrimes, a unique aspect of OSD is that OSD is only possible when a deceivee cooperates with a deceiver. Hence, training and education of deceivees is highly critical for preventing OSD attacks.
• Importance of social deception cues: Traditional offline deception cues and vulnerabilities come from several domains: individual, cultural, linguistic, physiological, and psychological. The cues and vulnerabilities of OSD vary compared to face-to-face communication. For serious OSD attacks that mainly belong to cybercrimes, such as human targeted attacks (e.g., human trafficking, cyberbullying, cyberstalking, or cybergrooming), if OSD cues are effectively captured, there is a much higher chance to prevent and detect OSD attacks than in offline social deception, because far fewer real-time interactions trigger far fewer risky situations from the safety perspective.
• Ethical design considerations of social honeypots: A social honeypot is one of the most broadly studied OSD prevention/detection mechanisms. Social honeypots are deployed to passively collect attackers' account profiles. However, since social honeypots deal with human users, careful legal and ethical considerations are needed in their design features. To this aim, more specific, clear guidelines and regulations should be made available to researchers.
• OSD detection mechanisms: The three dominant OSD detection approaches surveyed in this work are user-profile-based, message content-based, and network structure-based. Each has pros and cons in different scenarios. In particular, if a detection mechanism uses only network structure features to detect OSD attacks, it preserves user privacy better, but it needs lightweight algorithms to efficiently calculate expensive network features, such as centrality values that require knowledge of the entire network topology and high computation cost to estimate. To maximize the synergy of all three approaches, hybrid approaches incorporating all of them are promising.
• Metrics for performance evaluation: As the majority of OSD defense mechanisms are explored to effectively detect OSD attacks, most works have used accuracy metrics to measure the performance of their proposed work. A few of the metrics are based on correlations and ranks, which are mainly used to identify key signals for detecting OSD attacks.

We also found the following limitations of the existing OSD detection approaches:
• Lack of systematic, comprehensive defense strategies to combat OSD attacks: Fighting against OSD attacks requires systematic, comprehensive, and active defense strategies covering prevention, detection, and mitigation/response. However, existing approaches have heavily explored detection strategies rather than prevention or mitigation strategies. In addition, some approaches embrace multiple roles with a single mechanism. For example, most current OSD mitigation approaches are based on the results of early detection. Further, since a social honeypot collects attacker profiles, the analysis of social honeypots is used to design classifiers for both prevention and detection.
• Lack of experiments with real-time, dynamic datasets: Current prevention and detection methods are based on simulation and/or real datasets, but only a few studies discussed effective training and detection using streaming data, such as from the Twitter API. In addition, the high computational and time complexity of real-time detection remains an open issue.
• Insufficient proactive defense: The inherent role of a social honeypot is to proactively find targeted attackers (i.e., a particular type of attacker). This allows a system to identify targeted OSD attackers and proactively take actions to prevent vulnerable users from being victimized by them. Although honeypots are used in communication networks as a proactive intrusion prevention mechanism, social honeypots are used passively in OSNs due to potential legal and ethical issues. Without clarifying legal/ethical design guidelines and regulations, the function and exploitation of social honeypots cannot be fully leveraged or improved further to deal with highly intelligent attackers. In particular, to deal with real human-based OSD attacks, such as crowdturfing by paid workers who conduct social deception activities, more active social honeypot designs should be allowed while preserving normal users' privacy and ethical rights.
• High complexity of features and models: We substantially surveyed the features for data-driven detection methods in Sections VII-A and VII-B and the network/epidemic models for network structure feature-based methods in Section VII-C. The complexity of extracting and evaluating features and of optimizing the models grows fast with the size of the datasets. How to reduce the solution complexity and improve solution efficiency for OSD detection is still an open issue.
• Lack of qualitative analysis for cues of OSD attacks: Most OSD defense mechanisms have focused on dealing with attacks by machines (or bots). However, for more serious OSD attacks (i.e., human targeted attacks), appropriate cues should first be carefully identified through qualitative analysis based on multidisciplinary research efforts with behavioral scientists.


XII. CONCLUSION & FUTURE WORK
In this section, we discuss the key findings from this survey to answer the research questions raised in Section I-B as follows:

RQ1: How is OSD affected by the fundamental concepts and characteristics of social deception that have been studied in multidisciplinary domains?

Answer: The fundamental meanings and intent of social deception are commonly present in both offline and online social deception, as we find surprisingly common trends and characteristics in socially deceptive behaviors. The common goal is 'misleading a potential deceivee for the benefit of a deceiver' by increasing the deceivee's misbelief or confusion. On both online and offline platforms, social deception is successful only when the deceivee cooperates with the actions taken by the deceiver. Due to the unique characteristics of an online environment, such as fewer real-time/face-to-face interactions without physical presence, both deceivees and deceivers can take advantage of them in terms of defense (i.e., prevention, detection, and response/mitigation) and attack (e.g., anonymous attacks or easily running away if something goes wrong).

RQ2: What are new attack types based on the recent trends of OSD attacks observed in real online worlds, and how are they related to common social network attacks, cybercrimes, and security breaches from cybersecurity perspectives?

Answer: More serious human targeted attacks (e.g., human trafficking, cyberstalking, cybergrooming, or cyberbullying) have emerged as new OSD attack types. Their seriousness has grown as online deception often leads to offline crimes, which have indeed become a major concern of cybercrimes. While human targeted attacks have become a more serious social issue, there is a lack of cyber laws to respond to this serious social deception attack, which easily leads to cybercrimes. Human targeted attacks also bring up the discussion of security breaches of a person and of non-information assets. In this sense, human safety needs to be protected against the new types of OSD attacks.

RQ3: How can the cues of social deception and/or susceptibility traits to OSD affect the strategies of attackers and defenders in OSNs?

Answer: Many cues and susceptibility traits of offline social deception behaviors are present in online social deception behaviors. Examples include the intentionality of social deception, its cues from linguistic, cultural, and/or technological contexts, and various susceptibility factors including demographic, cultural, and/or network structure feature-based traits. Moreover, due to limited real-time interactions conveying people's presence on online platforms, some cues, such as physiological and/or psychological cues, may be missed although they can be highly useful for detecting social deception. However, as more advanced features of online platform-based interactions emerge, more physiological/psychological cues can be captured to improve deception detection (e.g., heart beats can be fed back to a detection mechanism).

RQ4: What kinds of defense mechanisms and/or methodologies need to be explored to develop better defense tools combating OSD attacks?

Answer: Most defense mechanisms to combat OSD attacks only focused on detection, particularly in terms of data-driven approaches using machine/deep learning techniques. Prevention mechanisms are substantially limited and have often been considered along with detection mechanisms (e.g., social honeypots or data-driven approaches). Response mechanisms after the detection of OSD are even less explored than prevention mechanisms.

RQ5: What are the key limitations of existing validation and verification methodologies in terms of datasets and metrics?

Answer: Popular datasets used in existing OSD research are from Twitter, Sina Weibo, and Facebook, along with synthetic datasets collected from simulation, as shown in FIGURES 6 and 7. In particular, to study human targeted attacks, there is a lack of available datasets because online human targeted deception data are based on individual chats or dyadic interactions. In addition, most metrics measure the detection accuracy of OSD attacks, which is natural to observe as most defense mechanisms mainly focus on detection. Hence, there is a lack of efficiency metrics that can capture the cost or complexity of the proposed defense techniques against OSD attacks.

RQ6: What are the key concerns associated with ethical issues in conducting OSD research?

Answer: OSD research inherently involves human users and may introduce ethical issues. However, to conduct meaningful experiments, some real testbed-based validation/verification should be conducted to obtain high confidence in the developed technologies under realistic settings. Yet, when deploying defense techniques on a real testbed (e.g., Facebook or Twitter), the defense process may encounter inevitable deception towards normal, legitimate users. In addition, privacy is a big concern in cybersecurity, and there is an inherent trade-off between preserving users' privacy and improving the quality of defense tools against OSD attacks (i.e., privacy vs. safety). To investigate serious OSD attacks, such as human targeted attacks, most interactions are peer-to-peer, such as dyadic conversations/chats, which are mostly unavailable. As a result, there is a lack of real datasets for studying highly serious human targeted attacks, such as human trafficking, cyberstalking, or cybergrooming attacks. In addition, there is a lack of systematic legal and/or ethical guidelines and regulations on how to proceed with OSD research involving human users in real testbed settings.

We suggest the following future research directions in online social deception and its countermeasure research:


• Multidimensional research approaches to solve online social deception: Although various concepts, properties, and cues of social deception have been studied in diverse disciplines, the multidisciplinary nature of social deception has not been appropriately considered in developing defense mechanisms against OSD attacks. In particular, deceivers and deceivees are both humans interacting via online platforms. Without understanding the way deceivers and deceivees communicate and/or interact with each other, it is hard to detect deception easily. Deception can be easily deployed on top of firm trust relationships. In order to distinguish deception from truthfulness, an in-depth understanding of deception based on multidisciplinary research effort is a must for developing effective defense mechanisms against OSD attacks.
• Distinction of benign deception from malicious deception: In the cybersecurity domain, deception refers to a deceptive action with malicious intent. However, in a social network, many users may use OSD to promote self-presentation/protection for privacy protection. Therefore, if OSD is treated as a form of attack, it can result in a high false positive rate (i.e., detecting benign users as malicious users). In order to prevent this, we need to develop deception-specific online defense tools that can differentiate benign deception from malicious deception.
• Culture-aware defense against OSD attacks: Based on our survey, different cultural deception cues have been observed [14], [75], [111], [167]. Since deception cues are sensitive to cultural characteristics, culture-aware defense mechanisms should be developed to effectively deal with OSD attacks by considering the unique cultural characteristics of a social network.
• Detectability-aware and intent-aware defense against OSD attacks: As discussed in FIGURE 2, the spectrum of deception can span a wide range of detectability and intent. Intelligent OSD attackers may establish trust relationships with potential victims and exploit the established trust to deceive the victims. This is especially observed in human targeted attacks, such as human trafficking or cybergrooming, which are categorized as serious cybercrimes [226]. Hence, we need to develop detectability-aware and intent-aware cues against highly subtle, hard-to-detect OSD attacks.
• Security protection of adolescent online users in multiple roles: Adolescents have high vulnerability to OSD attacks, as discussed in Section V. Deceptions such as cyberbullying have exposed severe social, behavioral, and security issues involving adolescents. Educational and habitual guidelines, parental control, and/or security guard tools cannot fully protect potential deceivees or victims. Social media platforms need to enhance their OSD prevention mechanisms, especially for young users, by identifying their vulnerability factors for more proactive protection.
• Dynamic, updated defense mechanisms to obfuscate highly advanced attackers: Recent studies showed that OSD attackers can build advanced social bots by analyzing the current detection models and fooling the existing models by leveraging adversarial machine learning (AML) techniques [103]. One countermeasure is to collect new datasets and retrain the classifiers. However, it is challenging to keep updating the models with additional datasets. In addition, the cost of repeatedly training the classifiers with the whole dataset is particularly high. Hence, we need to develop lightweight ML algorithms. Another countermeasure can be identifying unknown deception features based on linguistic, behavioral, and technological cues.
• Defense against human attackers vs. social bots: A human attacker is another type of advanced attacker, where a real human is behind the social network platform performing OSD attacks. They can bypass detection because the conversation comes from real humans or the accounts mimic normal users. There also exist crowdturfing workers who spread deceptive information in social media and get paid. More research is needed to investigate how to detect and differentiate social bots from human attackers.
• Measurement of physiological and/or psychological cues to develop better prevention techniques against OSD attacks: Due to the unique characteristics of online platforms, some critical deception cues are missing and must first be identified, such as physiological and/or psychological cues. Measuring those cues can be critical for improving prevention and early detection against OSD attacks.
• Extra effort for developing prevention and response mechanisms to defend against OSD attacks: In terms of the techniques used across all defense mechanisms, while machine/deep learning approaches are popularly used, game theoretic and/or network structure feature-based approaches still need to be further explored to produce more mature approaches. They have extra merit over data-driven approaches in that a game theoretic approach can predict an attacker's next move. For prevention, although early detection as an OSD prevention strategy is receiving high attention with a growing amount of recent work to fight against OSD attacks, there should be more prevention mechanisms that can provide more proactive defense, such as identifying potential attacks even before the attacks occur. Response/mitigation after OSD detection, such as mitigation after false information has spread or recovery after OSD attacks are launched, is little explored in the literature and calls for more effort to investigate effective mechanisms that minimize the risk and aftermath effect after OSD detection.
• Effective deception cues-based approach to combat OSD attacks without violating user privacy: Due to the lack of effective deception cues/datasets, it is difficult to conduct OSD research to defend against serious human targeted OSD attacks for validation and verification.


A future direction is to develop techniques to capture clear deception cues without violating user privacy.
• Integrated defense needed to prevent, detect, and mitigate false information propagation: As discussed in Section III-A, false information embraces fake news, unverified rumors, manipulated information, and deceptive online comments or fake reviews. False (or unverified or forged) information is mostly propagated with undesirable intent to influence public opinion. Although a rich volume of defense mechanisms has been developed to detect fake news, fake reviews, or fake comments, the adverse impact of propagated fake news has not been significantly mitigated. A more holistic approach is critically needed that integrates the defense mechanisms for prevention, early detection, and fast mitigation of false information.
• More efficiency metrics to expedite the defense process: Efficiency metrics for measuring the algorithmic complexity of defense techniques have not been sufficiently used in existing approaches. More meaningful complexity/efficiency metrics should be considered in order to expedite the speed of prevention, detection, and recovery as a defense against OSD.
• Systematic legal and/or ethical guidelines for conducting meaningful OSD research: Since humans are the key factors in solving the problems associated with OSD attacks, the research community and government need to provide clear guidelines on conducting OSD research without violating user privacy. In communication networks, the research community appears to have reached some accord about using defensive deception techniques to defend against cyberattacks by emphasizing their benefits. However, for cybersecurity research on OSN platforms that likely involves human subjects, there is little research, let alone a consensus, on what methodologies are allowed and what level of user privacy must be preserved before achieving the goal of defense effectiveness.

REFERENCES
[1] H. Abutair, A. Belghith, and S. Alahmadi, ''CBR-PDS: A case-based reasoning phishing detection system,'' J. Ambient Intell. Hum. Comput., vol. 10, no. 7, pp. 2593–2606, Jul. 2019.
[2] D. Acemoglu, A. Ozdaglar, and A. ParandehGheibi, ''Spread of (MIS) information in social networks,'' Games Econ. Behav., vol. 70, no. 2, pp. 194–227, 2010.
[3] J. Adair, T. Dushenko, and R. Lindsay, ''Ethical regulation and their impact on research practice,'' Ethical Regulation Impact Res. Pract., vol. 40, no. 1, pp. 59–72, 1985.
[4] L. Akoglu, M. McGlohon, and C. Faloutsos, ''Oddball: Spotting anomalies in weighted graphs,'' in Proc. Pacific–Asia Conf. Knowl. Discovery Data Mining. Berlin, Germany: Springer, 2010, pp. 410–421.
[5] L. Akoglu, R. Chandy, and C. Faloutsos, ''Opinion fraud detection in online reviews by network effects,'' in Proc. 7th Int. AAAI Conf. Weblogs Social Media, 2013, pp. 2–11.
[6] S. Albladi and G. Weir, ''User characteristics that influence judgment of social engineering attacks in social networks,'' Hum.-Centric Comput. Inf. Sci., vol. 8, no. 1, p. 5, 2018.
[7] J. Anderson and J. Cho, ''Software defined network based virtual machine placement in cloud systems,'' in Proc. IEEE Mil. Commun. Conf. (MILCOM), Oct. 2017, pp. 876–881.
[8] M. Araujo, S. Papadimitriou, S. Günnemann, C. Faloutsos, P. Basu, A. Swami, E. E. Papalexakis, and D. Koutra, ''COM2: Fast automatic discovery of temporal ('comet') communities,'' in Proc. Pacific–Asia Conf. Knowl. Discovery Data Mining. Springer, 2014, pp. 271–283.
[9] P. R. Badri Satya, K. Lee, D. Lee, T. Tran, and J. J. Zhang, ''Uncovering fake likers in online social networks,'' in Proc. 25th ACM Int. Conf. Inf. Knowl. Manage., 2016, pp. 2365–2370.
[10] S. Barbon, R. A. Igawa, and B. B. Zarpelão, ''Authorship verification applied to detection of compromised accounts on online social networks,'' Multimedia Tools Appl., vol. 76, no. 3, pp. 3213–3233, 2017.
[11] P. G. Bennett and M. R. Dando, ''Complex strategic analysis: A hypergame study of the fall of France,'' J. Oper. Res. Soc., vol. 30, no. 1, pp. 23–32, Jan. 1979.
[12] I. R. Berson, M. J. Berson, and J. M. Ferron, ''Emerging risks of violence in the digital age,'' J. School Violence, vol. 1, no. 2, pp. 51–71, Mar. 2002.
[13] G. Bhatt, A. Sharma, S. Sharma, A. Nagpal, B. Raman, and A. Mittal, ''Combining neural, statistical and external features for fake news stance identification,'' in Proc. Web Conf. Companion, Int. World Wide Web Conf. Steering Committee, 2018, pp. 1353–1357.
[14] C. F. Bond, A. Omar, A. Mahmoud, and R. N. Bonser, ''Lie detection across cultures,'' J. Nonverbal Behav., vol. 14, no. 3, pp. 189–204, Sep. 1990.
[15] D. B. Buller, J. K. Burgoon, A. Buslig, and J. Roiger, ''Testing interpersonal deception theory: The language of interpersonal deception,'' Commun. Theory, vol. 6, no. 3, pp. 268–289, Aug. 1996.
[16] D. Buller and J. Burgoon, ''Interpersonal deception theory,'' Commun. Theory, vol. 6, no. 3, pp. 203–242, Aug. 1996.
[17] C. Cao and J. Caverlee, ''Detecting spam URLs in social media via behavioral analysis,'' in Proc. Eur. Conf. Inf. Retr. Springer, 2015, pp. 703–714.
[18] J. Carletta, ''Assessing agreement on classification tasks: The kappa statistic,'' Comput. Linguistics, vol. 22, no. 2, pp. 249–254, 1996.
[19] T. L. Carson, Lying and Deception: Theory and Practice. London, U.K.: Oxford Univ. Press, 2010.
[20] Z. Chance and M. I. Norton, ''The what and why of self-deception,'' Current Opinion Psychol., vol. 6, pp. 104–107, Dec. 2015.
[21] C. Chen, J. Zhang, Y. Xie, Y. Xiang, W. Zhou, M. M. Hassan, A. AlElaiwi, and M. Alrubaian, ''A performance evaluation of machine learning-based streaming spam tweets detection,'' IEEE Trans. Comput. Social Syst., vol. 2, no. 3, pp. 65–76, Sep. 2015.
[22] H. Chen, J. Liu, Y. Lv, M. H. Li, M. Liu, and Q. Zheng, ''Semi-supervised clue fusion for spammer detection in Sina Weibo,'' Inf. Fusion, vol. 44, pp. 22–32, Nov. 2018.
[23] T. Chen, W. Liu, Q. Fang, J. Guo, and D.-Z. Du, ''Minimizing misinformation profit in social networks,'' IEEE Trans. Comput. Social Syst., vol. 6, no. 6, pp. 1206–1218, Dec. 2019.
[24] P.-A. Chirita, J. Diederich, and W. Nejdl, ''Mailrank: Using ranking for spam detection,'' in Proc. 14th ACM Int. Conf. Inf. Knowl. Manage., 2005, pp. 373–380.
[25] J.-H. Cho, H. Cam, and A. Oltramari, ''Effect of personality traits on trust and risk to phishing vulnerability: Modeling and analysis,'' in Proc. IEEE Int. Multi-Disciplinary Conf. Cognit. Methods Situation Awareness Decis. Support (CogSIMA), Mar. 2016, pp. 7–13.
[26] J.-H. Cho, S. Rager, J. O'Donovan, S. Adali, and B. D. Horne, ''Uncertainty-based false information propagation in social networks,'' ACM Trans. Social Comput., vol. 2, no. 2, pp. 1–34, Oct. 2019.
[27] F. Cohen, ''The use of deception techniques: Honeypots and decoys,'' Handbook Inf. Secur., vol. 3, no. 1, pp. 646–655, 2006.
[28] S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, and M. Tesconi, ''Social fingerprinting: Detection of spambot groups through DNA-inspired behavioral modeling,'' IEEE Trans. Dependable Secure Comput., vol. 15, no. 4, pp. 561–576, Aug. 2018.
[29] D. C. Daniel and K. L. Herbig, Strategic Military Deception: Pergamon Policy Studies on Security Affairs. Amsterdam, The Netherlands: Elsevier, 2013.
[30] A. Darwish, A. E. Zarka, and F. Aloul, ''Towards understanding phishing victims' profile,'' in Proc. Int. Conf. Comput. Syst. Ind. Inform., 2012, pp. 1–5.
[31] E. De Cristofaro, A. Friedman, G. Jourjon, M. A. Kaafar, and M. Z. Shafiq, ''Paying for likes: Understanding Facebook like fraud using honeypots,'' in Proc. Conf. Internet Meas. Conf., 2014, pp. 129–136.
[32] M. Del Vicario, A. Bessi, F. Zollo, F. Petroni, A. Scala, G. Caldarelli, H. E. Stanley, and W. Quattrociocchi, ''The spreading of misinformation online,'' Proc. Nat. Acad. Sci. USA, vol. 113, no. 3, pp. 554–559, 2016.


[33] K. J. Denker, J. Manning, K. B. Heuett, and M. E. Summers, ‘‘Twitter [57] M. Forelle, P. Howard, A. Monroy-Hernández, and S. Savage, ‘‘Politi- in the classroom: Modeling online communication attitudes and stu- cal bots and the manipulation of public opinion in Venezuela,’’ 2015, dent motivations to connect,’’ Comput. Hum. Behav., vol. 79, pp. 1–8, arXiv:1507.07109. [Online]. Available: http://arxiv.org/abs/1507.07109 Feb. 2018. [58] H. Gao, J. Hu, T. Huang, J. Wang, and Y. Chen, ‘‘Security issues in [34] Department of Homeland Security. (2018). Countering False Information online social networks,’’ IEEE Internet Comput., vol. 15, no. 4, pp. 56–63, on Social Media in Disasters and Emergencies. [Online]. Available: Jul./Aug. 2011. https://www.dhs.gov/sites/default/files/publications/SMWG_Countering- [59] B. Gert, Morality: Its Nature and Justification, 6th ed. London, U.K.: False-Info-Social-Media-Disasters-Emergencies_Mar2018-508.pdf Oxford Univ. Press, 2005. [35] B. M. DePaulo, J. J. Lindsay, B. E. Malone, L. Muhlenbruck, K. Charlton, [60] S. Ghosh, B. Viswanath, F. Kooti, N. K. Sharma, G. Korlam, and H. Cooper, ‘‘Cues to deception,’’ Psychol. Bull., vol. 129, no. 1, F. Benevenuto, N. Ganguly, and K. P. Gummadi, ‘‘Understanding and pp. 74–118, 2003. combating link farming in the Twitter social network,’’ in Proc. 21st Int. [36] D. C. Derrick, T. O. Meservy, J. L. Jenkins, J. K. Burgoon, and Conf. World Wide Web, 2012, pp. 61–70. J. F. Nunamaker, ‘‘Detecting deceptive chat-based communication using [61] J. Golbeck et al., ‘‘Fake news vs satire: A dataset and analysis,’’ in Proc. typing behavior and message cues,’’ ACM Trans. Manage. Inf. Syst., 10th ACM Conf. Web Sci., 2018, pp. 17–21. vol. 4, no. 2, pp. 1–21, Aug. 2013. [62] W. Goucher, ‘‘Being a cybercrime victim,’’ Comput. Fraud Secur., [37] Definition of ‘Deception’, Oxford English Dictionary, London, U.K., vol. 2010, no. 10, pp. 16–18, Oct. 2010. 1989. [63] P. A. Granhag and M. Hartwig, ‘‘A new theoretical perspective on [38] K. Dinakar, B. Jones, C. Havasi, H. Lieberman, and R. Picard, ‘‘Common deception detection: On the psychology of instrumental mind-reading,’’ sense reasoning for detection, prevention, and mitigation of cyberbully- Psychol., Crime Law, vol. 14, no. 3, pp. 189–200, Jun. 2008. ing,’’ ACM Trans. Interact. Intell. Syst., vol. 2, no. 3, pp. 1–30, Sep. 2012. [64] S. Grazioli and S. L. Jarvenpaa, ‘‘Perils of : An empiri- [39] K. Ding, N. Pantic, Y. Lu, S. Manna, and M. I. Husain, ‘‘Towards cal investigation of deception and trust with experienced Internet con- building a word similarity dictionary for personality bias classification of sumers,’’ IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 30, no. 4, phishing email contents,’’ in Proc. IEEE 9th Int. Conf. Semantic Comput., pp. 395–410, Jul. 2000. Feb. 2015, pp. 252–259. [65] I. Greenberg, ‘‘The role of deception in decision theory,’’ J. Conflict [40] Y.Ding, N. Luktarhan, K. Li, and W. Slamu, ‘‘A keyword-based combina- Resolution, vol. 26, no. 1, pp. 139–156, 1982. tion approach for detecting phishing Webpages,’’ Comput. Secur., vol. 84, [66] V. Greiman and C. Bain, ‘‘The emergence of cyber activity as a gateway pp. 256–275, Jul. 2019. to human trafficking,’’ J. Inf. Warfare, vol. 12, no. 2, pp. 41–49, 2013. [41] M. Diomidous, K. Chardalias, A. Magita, P. Koutonias, [67] C. Grier, K. Thomas, V. Paxson, and M. Zhang, ‘‘@Spam: The under- P. Panagiotopoulou, and J. Mantas, ‘‘Social and psychological effects of ground on 140 characters or less,’’ in Proc. 
17th ACM Conf. Comput. the Internet use,’’ Acta Inf. Medica, vol. 24, no. 1, pp. 66–68, 2016. Commun. Secur., 2010, pp. 27–37. [42] D. Dittrich, ‘‘The ethics of social honeypots,’’ Res. Ethics, vol. 11, no. 4, [68] G. Gupta and J. Pieprzyk, ‘‘Socio-technological phishing prevention,’’ pp. 192–210, Dec. 2015. Inf. Secur. Tech. Rep., vol. 16, no. 2, pp. 67–73, May 2011. [43] A. N. Doane, S. Ehlke, and M. L. Kelley, ‘‘Bystanders against cyberbul- [69] H. Haddadi and P. Hui, ‘‘To add or not to add: Privacy and social lying: A video program for college students,’’ Int. J. Bullying Prevention, honeypots,’’ in Proc. IEEE Int. Conf. Commun. Workshops, May 2010, vol. 2, no. 1, pp. 41–52, Mar. 2020. pp. 1–5. [44] M. Egele, G. Stringhini, C. Kruegel, and G. Vigna, ‘‘COMPA: Detect- [70] T. Halevi, J. Lewis, and N. Memon, ‘‘A pilot study of cyber security and ing compromised accounts on social networks,’’ in Proc. NDSS, 2013, privacy related behavior and personality traits,’’ in Proc. 22nd Int. Conf. pp. 1–17. World Wide Web, 2013, pp. 737–744. [45] P. Ekman, Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage. New York, NY, USA: Norton, 2009. [71] T. Halevi, N. Memon, and O. Nov, ‘‘Spear-phishing in the wild: A real- world study of personality, phishing self-efficacy and vulnerability to [46] Y.Elovici, M. Fire, A. Herzberg, and H. Shulman, ‘‘Ethical considerations spear phishing attacks,’’ Dept. Comput. Sci. Eng., NYU Polytech. School when employing fake identities in online social networks for research,’’ Eng., New York, NY, USA, Tech. Rep., 2015, doi: 10.2139/ssrn.2544742. Sci. Eng. Ethics, vol. 20, no. 4, pp. 1027–1043, Dec. 2014. [47] E. E. Englehardt and D. Evans, ‘‘Lies, deception, and public relations,’’ [72] J. Hancock, L. E. Curry, S. Goorha, and M. Woodworth, ‘‘On lying and Public Relations Rev., vol. 20, no. 3, pp. 249–266, 1994. being lied to: A linguistic analysis of deception in computer-mediated communication,’’ Discourse Process., vol. 45, pp. 1–23, Jan. 2008. [48] F. Enos, S. Benus, R. L. Cautin, M. Graciarena, J. Hirschberg, and E. Shriberg, ‘‘Personality factors in human deception detection: Compar- [73] M. D. Hauser, ‘‘Minding the behavior of deception,’’ in Machiavel- ing human to machine performance,’’ in Proc. 9th Int. Conf. Spoken Lang. lian Intelligence II: Extensions and Evaluations. Cambridge, U.K.: Process., 2006, pp. 813–816. Cambridge Univ. Press, 1997. [49] R. M. Everett, J. R. C. Nurse, and A. Erola, ‘‘The anatomy of online [74] R. Heartfield and G. Loukas, ‘‘A taxonomy of attacks and a survey deception: What makes automated text convincing?’’ in Proc. 31st Annu. of defence mechanisms for semantic social engineering attacks,’’ ACM ACM Symp. Appl. Comput. (SAC), 2016, pp. 1115–1120. Comput. Surv., vol. 48, no. 3, pp. 1–39, Feb. 2016. [50] A. Ebrahimi Fard, M. Mohammadi, Y.Chen, and B. Van de Walle, ‘‘Com- [75] S. J. Heine, ‘‘Evolutionary explanations need to account for cultural putational rumor detection without non-rumor: A one-class classification variation,’’ Behav. Brain Sci., vol. 34, no. 1, pp. 26–27, Feb. 2011. approach,’’ IEEE Trans. Comput. Social Syst., vol. 6, no. 5, pp. 830–846, [76] M. Hernández-Álvarez, ‘‘Detection of possible human trafficking in Oct. 2019. Twitter,’’ in Proc. Int. Conf. Inf. Syst. Softw. Technol. (ICI2ST), 2019, [51] D. A. Feingold, ‘‘Human trafficking,’’ Foreign Policy, vol. 32, no. 150, pp. 187–191. pp. 26–30, Sep. 2005. [77] G. Hofstede, ‘‘Dimensionalizing cultures: The Hofstede model in con- [52] E. Ferrara, O. 
Varol, C. Davis, F. Menczer, and A. Flammini, ‘‘The rise text,’’ Online Readings Psychol. Culture, vol. 2, no. 1, p. 8, 2011. of social bots,’’ Commun. ACM, vol. 59, no. 7, pp. 96–104, Jun. 2016. [78] X. Hu, J. Tang, Y. Zhang, and H. Liu, ‘‘Social spammer detection [53] W. Ferreira and A. Vlachos, ‘‘Emergent: A novel data-set for stance in microblogging,’’ in Proc. 23rd Int. Joint Conf. Artif. Intell., 2013, classification,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. pp. 2633–2639. Linguistics, Hum. Lang. Technol., 2016, pp. 1163–1168. [79] X. Hu, J. Tang, H. Gao, and H. Liu, ‘‘Social spammer detection with [54] M. Fire, R. Goldschmidt, and Y. Elovici, ‘‘Online social networks: sentiment information,’’ in Proc. IEEE Int. Conf. Data Mining, Dec. 2014, Threats and solutions,’’ IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 180–189. pp. 2019–2036, 4th Quart., 2014. [80] S. Huckle and M. White, ‘‘Fake news: A technological approach to [55] M. Flintham, C. Karner, K. Bachour, H. Creswick, N. Gupta, and proving the origins of content, using blockchains,’’ Big Data, vol. 5, no. 4, S. Moran, ‘‘Falling for fake news: Investigating the consumption of news pp. 356–371, Dec. 2017. via social media,’’ in Proc. CHI Conf. Hum. Factors Comput. Syst., 2018, [81] R. Hyman, ‘‘The psychology of deception,’’ Annu. Rev. Psychol., vol. 40, pp. 1–10. no. 1, pp. 133–154, 1989. [56] D. Florêncio and C. Herley, ‘‘Evaluating a trial deployment of password [82] I. Inuwa-Dutse, M. Liptrott, and I. Korkontzelos, ‘‘Detection of spam- re-use for phishing prevention,’’ in Proc. Anti-Phishing Work. Groups 2nd posting accounts on Twitter,’’ Neurocomputing, vol. 315, pp. 496–511, Annu. eCrime Researchers Summit, 2007, pp. 26–36. Nov. 2018.


ZHEN GUO received the M.S. degree in biological sciences and the M.S. degree in computer science from Fordham University, New York City, in 2013 and 2016, respectively. He is currently pursuing the Ph.D. degree in computer science with the Virginia Polytechnic Institute and State University, Falls Church, VA, USA. His research interests include online social deception and social capital-based friending decision networks.

JIN-HEE CHO (Senior Member, IEEE) received the M.S. and Ph.D. degrees in computer science from Virginia Tech, in 2004 and 2008, respectively. She has been an Associate Professor with the Department of Computer Science, Virginia Tech, since 2018. Prior to joining Virginia Tech, she worked as a Computer Scientist with the U.S. Army Research Laboratory (USARL), Adelphi, MD, USA, beginning in 2009. She has published over 100 peer-reviewed technical articles in leading journals and conferences in the areas of trust management, cybersecurity, metrics and measurements, network performance analysis, resource allocation, agent-based modeling, uncertainty reasoning and analysis, information fusion/credibility, and social network analysis. She is also a member of ACM. She received best paper awards at IEEE TrustCom 2009, BRIMS 2013, IEEE GLOBECOM 2017, and IEEE CogSIMA 2018, as well as the 2017 ARL Publication Award. She is a winner of the 2015 IEEE Communications Society William R. Bennett Prize in the Field of Communications Networking. In 2016, she was selected for the 2013 Presidential Early Career Award for Scientists and Engineers (PECASE).

ING-RAY CHEN (Member, IEEE) received the B.S. degree from National Taiwan University, and the M.S. and Ph.D. degrees in computer science from the University of Houston. He is currently a Professor with the Department of Computer Science, Virginia Tech. His research interests include trust and security, network and service management, and reliability and performance analysis of mobile wireless networks and cyber-physical systems. He was a recipient of the IEEE Communications Society William R. Bennett Prize in Communications Networking and the U.S. Army Research Laboratory (ARL) Publication Award. He also serves as an Associate Editor for the IEEE TRANSACTIONS ON SERVICES COMPUTING, the IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, and The Computer Journal.

SRIJAN SENGUPTA received the B.Stat. and M.Stat. degrees from the Indian Statistical Institute, in 2007 and 2009, respectively, and the Ph.D. degree in statistics from the University of Illinois at Urbana-Champaign, in 2016. He is currently an Assistant Professor with the Department of Statistics, North Carolina State University. Before joining North Carolina State University, he worked as an Assistant Professor of statistics with Virginia Tech, from 2016 to 2020, and as a Risk Management Actuary, from 2009 to 2011. His research interests include statistical network analysis, bootstrap and related resampling/subsampling methods, and machine learning. He is a member of ASA, ICSA, and IISA. His awards include the Norton Prize for outstanding Ph.D. thesis from the University of Illinois at Urbana-Champaign, the Birla Sun Life Academic Excellence Award from the Institute of Actuaries of India, and the IMS New Researcher Travel Award.

MICHIN HONG received the B.S. and M.S.W. degrees from Ewha Womans University, South Korea, and the Ph.D. degree from the University of Maryland. She is currently an Associate Professor with the Indiana University School of Social Work. Her research interests include social determinants affecting ethnic/racial disparities in health and access to health care. Recently, she has expanded her research to explore individuals' vulnerability in the online world.

TANUSHREE MITRA received the M.S. degree in computer science from Texas A&M University, in 2011, and the Ph.D. degree in computer science from the Georgia Institute of Technology, in 2017. She is currently an Assistant Professor with the Information School, University of Washington. From 2017 to 2020, she was an Assistant Professor with the Computer Science Department, Virginia Tech. She studies and builds large-scale social computing systems to understand and counter problematic information online. Her work employs a range of interdisciplinary methods from the fields of human-computer interaction, data mining, machine learning, and natural language processing. She received best paper honorable mention awards from ACM CHI 2015 and ACM CSCW 2020, the Virginia Tech College of Engineering's Outstanding New Assistant Professor Award in 2020, and the Georgia Tech GVU Center's Foley Scholarship for research innovation and potential impact in 2015.
