Social Turing Tests: Crowdsourcing Sybil Detection

Gang Wang, Manish Mohanlal, Christo Wilson, Xiao Wang‡, Miriam Metzger†, Haitao Zheng and Ben Y. Zhao
Department of Computer Science, U. C. Santa Barbara, CA USA
†Department of Communications, U. C. Santa Barbara, CA USA
‡Renren Inc., Beijing, China

arXiv:1205.3856v2 [cs.SI] 7 Dec 2012

Abstract

As popular tools for spreading spam and malware, Sybils (or fake accounts) pose a serious threat to online communities such as Online Social Networks (OSNs). Today, sophisticated attackers are creating realistic Sybils that effectively befriend legitimate users, rendering most automated Sybil detection techniques ineffective. In this paper, we explore the feasibility of a crowdsourced Sybil detection system for OSNs. We conduct a large user study on the ability of humans to detect today's Sybil accounts, using a large corpus of ground-truth Sybil accounts from the Facebook and Renren networks. We analyze detection accuracy by both "experts" and "turkers" under a variety of conditions, and find that while turkers vary significantly in their effectiveness, experts consistently produce near-optimal results. We use these results to drive the design of a multi-tier crowdsourcing Sybil detection system. Using our user study data, we show that this system is scalable, and can be highly effective either as a standalone system or as a complementary technique to current tools.

1 Introduction

The rapid growth of Sybil accounts is threatening the stability and security of online communities, particularly online social networks (OSNs). Sybil accounts represent fake identities that are often controlled by a small number of real users, and are increasingly used in coordinated campaigns to spread spam and malware [6, 30]. In fact, measurement studies have detected hundreds of thousands of Sybil accounts in different OSNs around the world [3, 31]. Recently, Facebook revealed that up to 83 million of its users may be fake1, up significantly from 54 million earlier2.

The research community has produced a substantial number of techniques for automated detection of Sybils [4, 32, 33]. However, with the exception of SybilRank [3], few have been successfully deployed. The majority of these techniques rely on the assumption that Sybil accounts have difficulty friending legitimate users, and thus tend to form their own communities, making them visible to community detection techniques applied to the social graph [29].

Unfortunately, the success of these detection schemes is likely to decrease over time as Sybils adopt more sophisticated strategies to ensnare legitimate users. First, early user studies on OSNs such as Facebook show that users are often careless about whose friendship requests they accept [2]. Second, despite the discovery of Sybil communities in Tuenti [3], not all Sybils band together to form connected components. For example, a recent study of half a million Sybils on the Renren network [14] showed that Sybils rarely created links to other Sybils, and instead intentionally tried to infiltrate communities of legitimate users [31]. Thus, these Sybils rarely connect to each other, and do not form communities. Finally, there is evidence that creators of Sybil accounts are using advanced techniques to create more realistic profiles, either by copying profile data from existing accounts, or by recruiting real users to customize them [30]. Malicious parties are willing to pay for these authentic-looking accounts to better befriend real users.

These observations motivate us to search for a new approach to detecting Sybil accounts. Our insight is that while attackers are creating more "human" Sybil accounts, fooling intelligent users, i.e. passing a "social Turing test," is still a very difficult task. Careful users can apply intuition to detect even small inconsistencies or discrepancies in the details of a user profile. Most online communities already
have mechanisms for users to "flag" questionable users or content, and social networks often employ specialists dedicated to identifying malicious content and users [3]. While these mechanisms are ad hoc and costly, our goal is to explore a scalable and systematic approach of applying human effort, i.e. crowdsourcing, as a tool to detect Sybil accounts.

Designing a successful crowdsourced Sybil detection system requires that we first answer fundamental questions on issues of accuracy, cost, and scale. One key question is: how accurate are users at detecting fake accounts? More specifically, how is accuracy impacted by factors such as the user's experience with social networks, user motivation, fatigue, and language and cultural barriers? Second, how much would it cost to crowdsource authenticity checks for all suspicious profiles? Finally, how can we design a crowdsourced Sybil detection system that scales to millions of profiles?

In this paper, we describe the results of a large user study into the feasibility of crowdsourced Sybil detection. We gather ground-truth data on Sybil accounts from three social network populations: Renren [14], the largest social network in China; Facebook-US, with profiles of English-speaking users; and Facebook-India, with profiles of users who reside in India. The security team at Renren Inc. provided us with Renren Sybil account data, and we obtained Facebook (US and India) Sybil accounts by crawling highly suspicious profiles weeks before they were banned by Facebook. Using this data, we perform user studies analyzing the effectiveness of Sybil detection by three user populations: motivated and experienced "experts"; crowdsourced workers from China, the US, and India; and a group of UCSB undergraduates from the Department of Communications.

Our study makes three key contributions. First, we analyze detection accuracy across different datasets, as well as the impact of different factors such as demographics, survey fatigue, and OSN experience. We found that well-motivated experts and undergraduate students produced exceptionally good detection rates with near-zero false positives. Not surprisingly, crowdsourced workers missed more Sybil accounts, but still produced near-zero false positives. We observe that as testers examine more and more suspicious profiles, the time spent examining each profile decreases. However, experts maintained their accuracy over time, while crowdworkers made more mistakes with additional profiles. Second, we performed detailed analysis on individual testers and account profiles. We found that while it was easy to identify a subset of consistently accurate testers, there were very few "chameleon profiles" that were undetectable by all test groups. Finally, we propose a scalable crowdsourced Sybil detection system based on our results, and use trace-driven data to show that it achieves both accuracy and scalability with reasonable costs.

By all measures, Sybil identities and fake accounts are growing rapidly on today's OSNs. Attackers continue to innovate and find better ways of mass-producing fake profiles, and detection systems must keep up both in terms of accuracy and scale. This work is the first to propose crowdsourcing Sybil detection, and our user study results are extremely positive. We hope this will pave the way towards testing and deployment of crowdsourced Sybil detection systems by large social networks.

1 http://www.bbc.com/news/technology-19093078
2 http://www.bbc.com/news/technology-18813237

2 Background and Motivation

Our goal is to motivate and design a crowdsourced Sybil detection system for OSNs. First, we briefly introduce the concept of crowdsourcing and define key terms. Next, we review the current state of social Sybil detection and highlight ongoing challenges in this area. Finally, we introduce our proposal for crowdsourced Sybil detection and enumerate the key challenges to our approach.

2.1 Crowdsourcing

Crowdsourcing is a process where work is outsourced to an undefined group of people. The web greatly simplifies the task of gathering virtual groups of workers, as demonstrated by successful projects such as Wikipedia. Crowdsourcing works for any job that can be decomposed into short, simple tasks, and brings significant benefits to tasks not easily performed by automated algorithms or systems. First, by harnessing small amounts of work from many people, no individual is overburdened. Second, the group of workers can change dynamically, which alleviates the need for a dedicated workforce. Third, workers can be recruited quickly and on demand, enabling elasticity. Finally, and most importantly, by leveraging human intelligence, crowdsourcing can solve problems that automated techniques cannot.

In recent years, crowdsourcing websites have emerged that allow anyone to post small jobs to the web and have them solved by crowdworkers for a small fee. The pioneer in the area is Amazon's Mechanical Turk, or MTurk for short. On MTurk, anyone can post jobs called Human Intelligence Tasks, or HITs. Crowdworkers on MTurk, or turkers, complete HITs and collect the associated fees. Today, there are around 100,000 HITs available on MTurk at any time, with 90% paying ≤$0.10 each [11, 24]. There are over 400,000 registered turkers on MTurk, with 56% from the US and 36% from India [24].

Social networks have started to leverage crowdsourcing to augment their workforce. For example, Facebook crowdsources content moderation tasks, including filtering pornographic and violent pictures and videos [10]. However, to date we know of no OSN that crowdsources the identification of fake accounts. Instead, OSNs like Facebook and Tuenti maintain dedicated, in-house staff for this purpose [3, 10].

Unfortunately, attackers have also begun to leverage crowdsourcing. Two recent studies have uncovered crowdsourcing websites where malicious users pay crowdworkers to create Sybil accounts on OSNs and generate spam [21, 30]. These Sybils are particularly dangerous because they are created and managed by real human beings, and thus ap-

three reasons: first, humans can make overall judgments about OSN profiles that are too complex for automated algorithms.
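The cost question raised in the introduction, together with the typical MTurk fees cited above, lends itself to a rough back-of-the-envelope sketch. The snippet below is a minimal illustration, not the system proposed in the paper: it assumes each suspicious profile is reviewed by a few independent turkers at a fee near the common $0.10-per-HIT price point, with per-profile verdicts combined by simple majority vote. The profile IDs, the three-votes-per-profile setup, and all function names are hypothetical.

```python
# Hedged sketch: majority-vote aggregation of crowdworker verdicts plus a
# simple screening-cost estimate. Numbers and names are illustrative only.
from collections import Counter

def majority_verdict(votes):
    """Return the most common label ('sybil' or 'legit') among worker votes."""
    return Counter(votes).most_common(1)[0][0]

def screening_cost_cents(num_profiles, votes_per_profile, fee_cents_per_hit=10):
    """Total fee, in cents, to collect the requested votes as paid HITs."""
    return num_profiles * votes_per_profile * fee_cents_per_hit

# Three independent turker votes per suspicious profile (hypothetical data).
votes = {
    "profile_42": ["sybil", "sybil", "legit"],
    "profile_43": ["legit", "legit", "legit"],
}
verdicts = {pid: majority_verdict(v) for pid, v in votes.items()}
print(verdicts)  # {'profile_42': 'sybil', 'profile_43': 'legit'}

# Triple-checking 1,000 suspicious profiles at $0.10 per HIT:
print(screening_cost_cents(1000, 3))  # 30000 cents, i.e. $300
```

Collecting more votes per profile raises cost linearly but makes the majority verdict more robust to individual turker mistakes, which is the basic trade-off any crowdsourced detection design must balance.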