Sybil in Online Social Networks (Osns)
Total Page:16
File Type:pdf, Size:1020Kb
Sybil In Online Social Networks (OSNs) 1 Sybil (sɪbəl): fake identities controlled by attackers Friendship is a pre-cursor to other malicious activities Does not include benign fakes (secondary accounts) Research has identified malicious Sybils on OSNs Twitter [CCS 2010] Facebook [IMC 2010] Renren [IMC 2011], Tuenti [NSDI 2012] Real-world Impact of Sybil (Twitter) 2 900K 4,000 new July 21st followers/day800K Followers Followers 700K 100,000 new Jul-4 Jul-8 Jul-12 Jul-16 Jul-20followers Jul-24 Jul-28 in 1Aug-1 day Russian political protests on Twitter (2011) 25,000 Sybils sent 440,000 tweets Drown out the genuine tweets from protesters Security Threats of Sybil (Facebook) 3 Large Sybil population on Facebook August 2012: 83 million (8.7%) Sybils are used to: Share or Send Spam Malicious URL Theft of user’s personal information Fake like and click fraud 50 likes per dollar Community-based Sybil Detectors 4 Prior work on Sybil detectors SybilGuard [SIGCOMM’06], SybilLimit [Oakland '08], SybilInfer [NDSS’09] Key assumption: Sybils form tight-knit communities Sybils have difficulty “friending” normal users? Do Sybils Form Sybil Communities? 5 Measurement study on Sybils in the wild [IMC’11] Study Sybils in Renren (Chinese Facebook) Ground-truth data on 560K Sybils collected over 3 years Sybil components: sub-graphs of connected Sybils 10000 1000 100 Edges To To Edges 10 Sybil Normal Users 1components are internally sparse Not amenable1 10 to community100 1000 detection 10000 New Sybil detectionEdges Between system Sybils is needed Detect Sybils without Graphs 6 Anecdotal evidence that people can spot Sybil profiles 75% of friend requests from Sybils are rejected Human intuition detects even slight inconsistencies in Sybil profiles Idea: build a crowdsourced Sybil detector Focus on user profiles Leverage human intelligence and intuition Open Questions How accurate are users? What factors affect detection accuracy? How can we make crowdsourced Sybil detection cost effective? 7 Outline . Introduction . User Study Feasibility Experiment Accuracy Analysis Details in Factors Impacting User Accuracy Paper . Scalable Sybil Detection System . Conclusion User Study Setup* 8 User study with 2 groups of testers on 3 datasets 2 groups of users Experts – Our friends (CS professors and graduate students) Turkers – Crowdworkers from online crowdsourcing systems 3 ground-truth datasets of full user profiles Renren – given to us by Renren Inc. Facebook US and India – crawled Sybils profiles – banned profiles by Facebook Data collection Legitimate profiles – 2-hops from our own profiles details *IRB Approved Real or fake? Why? Classifying Navigation Buttons Profiles Browsing Profiles Screenshot of Profile (Links Cannot be Clicked) 9 Experiment Overview 10 Dataset # of Profiles Test Group # of Profile Sybil Legit. Testers per Tester Chinese Expert 24 100 Renren 100 100 Chinese Turker 418 10 Facebook US Expert 40 50 32 50 US US Turker 299 12 Facebook India Expert 20 100 50 49 India India Turker 342 12 More Profiles per Experts Individual Tester Accuracy 11 100 Much Lower 80 Accuracy Turker •60 Experts prove that humans can be accurate Expert • Turkers needExcellent! extra help… CDF (%) 40 80% of experts have >80% accuracy! 20 0 0 10 20 30 40 50 60 70 80 90 100 Accuracy Per Teser (%) Wisdom of the Crowd 12 Is wisdom of the crowd enough? Majority voting Treat each classification by each tester as a vote Majority vote determines final decision of the crowd Results after majority voting (20 votes) • False Both positive Experts and rates Turkers are have excellent almost zero false positives • What Turker’s can false be negativesdone to areimprove still high turker accuracy? US (19%), India (50%), China (60%) Eliminating Inaccurate Turkers 13 100 DramaticChina 80 ImprovementIndia US 60 Removing inaccurate turkers can effectively reduce40 false negatives! 20 Majority Vote False Negative False (%) Vote Majority 0 0 10 20 30 40 50 60 70 Turker Accuracy Threshold (%) 14 Outline . Introduction . User Study . Scalable Sybil Detection System System Design Trace-driven Simulation . Conclusion A Practical Sybil Detection System 15 1. Scalability Must scale to millions of users Details in High accuracy with low costs Paper 2. Preserve user privacy when giving data to turkers 100 80 Key insight to designing our system 60 • Accuracy in turker population highly skewed 40 CDF (%) CDF 20 • Only 10% turkers > 90% accurate 0 0 10 30 50 60 80 90 20 40 70 100 Accuracy (%) System Architecture 16 Maximize Utility of CrowdsourcingHigh Accuracy Layer Turkers Rejected! OSN Employees Very Accurate Turkers Turker Selection Accurate Turkers Sybils All Turkers • Continuous Quality Control Heuristics• Locate Malicious Workers Social Network User Reports Suspicious Profiles Flag Suspicious Users Trace Driven Simulations 17 Simulation on 2000 profiles Very Accurate Error rates drawn from survey data Turkers Calibrate 4 parameters to: Minimize false positives & false negatives Accurate Turkers Minimize votes per profile (minimize cost) Results (DetailsResults++ in Paper) • Average 68 votes per profile • <1%<0.1% false false positives positives • <1%<0.1% false false negatives negatives Estimating Cost 18 Estimated cost in a real-world social networks: Tuenti 12,000 profiles to verify daily 14 full-time employees Cost Annual with salary malicious 30,000 turkers EUR* (~$20 per hour) $2240 per day • 25% of turkers are malicous •Crowdsourced $504 per day Sybil Detection 20sec/profile, 8 hour day 50 turkers Facebook wage ($1 per hour) $400 per day Augment existing automated systems *http://www.glassdoor.com/Salary/Tuenti-Salaries-E245751.htm Conclusion 19 Designed a crowdsourced Sybil detection system False positives and negatives <1% Resistant to infiltration by malicious workers Low cost Currently exploring prototypes in real-world OSNs 20 Questions? Thank you! Ground-truth Data Collection (Legit.) 21 Facebook Crawl Random 50 US Selection 50 IN 8 Seeds Seeds 100 Legitimate Profiles from Lab 1-hop friends 50 US 86k 2-hop friends 50 India Ground-truth Data Collection (Sybil) 22 573 Confirmed Publicly Facebook Crawl Sybils Available Image Profile Pictures Confirmed Sybils Banned by FacebookIf >90% of picturesDo on notweb consider Facebook links Google Search By Image Suspicious Profiles Users Suspicious Profiles Dataset Preserving User Privacy 23 Showing profiles to crowdworkers raises privacy issues Solution: reveal profile information in context ! Crowdsourced Evaluation ! Friend-Only Public Profile Crowdsourced Profile Information Evaluation Information Friends Survey Fatigue 24 US Experts US Turkers 100 100 100 100 80 80 80 80 60 60 60 60 40 40 40 40 Accuracy (%) Accuracy (%) Time per Profile (s) Time per Profile 20 20 Time per Profile (s) (s) Time per Profile 20 20 0 0 0 0 0 10 20 30 40 Accuracy 0 2 4 6 8 10 Profile Order Time Profile Order No fatigueAll testers speed up overFatigue time matters Wisdom of the Crowd 25 Treat each classification by each tester as a vote Majority vote determines final decision Almost Zero ExpertsFalse False Dataset FalseTest GroupPositives PerformPositives Okay Negatives Chinese Expert 0% 3% • RenrenFalse positive rates are excellent • Turkers needChinese extra Turker helpTurkers against Miss0% false negatives63% Facebook US Expert Lots of Sybils0% 10% • WhatUS can beUS done Turker to improve2% accuracy? 19% Facebook India Expert 0% 16% India India Turker 0% 50% Sybil Profile DifficultyExperts perform well on 26 most difficult Sybils 100 90 80 70 60 • Some50 Sybils are more stealthy Turker 40 Expert • Experts30 catch more tough Sybils than turkers 20 Really difficult 10 profiles Average Accuracy (%) per Sybil Average 0 0 5 10 15 20 25 30 35 Sybil Profiles Ordered By Turker Accuracy How Many Votes Do You Need? 27 100 • Only need a fewFalse votes Negatives 80 • False positives reduce quickly 60 • Fewer votes = less cost China India 40 Error (%) Rate False Positives 20 US 0 2 4 6 8 10 12 14 16 18 20 22 24 Votes per Profile Individual Tester Accuracy 28 100 Chinese Turker Much Lower 80 Accuracy Chinese Expert 60 • Experts prove that humans can be accurate Excellent! CDF CDF (%) 40 • Turkers80% need of extra experts help… have >90% accuracy! 20 0 0 10 20 30 40 50 60 70 80 90 100 Accuracy per Tester (%) .