Errors, Misunderstandings, and Attacks: Analyzing the Crowdsourcing Process of Ad-blocking Systems
Mshabab Alrizah, Sencun Zhu, Xinyu Xing, Gang Wang
Outline
• Background
• Objectives
• Methodology
• Datasets
• Analysis: FP & FN errors
• Analysis: Evasions
• Conclusion
Ad-Blocking System
Crowdsourcing and Ad-blocking Systems
Previous Work Studied …
• Different problems or complementary solutions.
• Economic ramifications of the ad-blocking systems
• Specific cases of ad blocking.
  • e.g., tracker blocking, anti-adblocking
• Relationships among Internet users, ad publishers, and ad blockers
Yet…
• There remains a lack of deep understanding of:
– Filter list effectiveness
– The crowdsourcing functionality and contribution
– The potential pitfalls and security vulnerabilities
Objectives
Provide an in-depth study of the dynamic changes of the filter list to answer the following key questions.
― Q1: How prevalent are the errors of missing real advertisements (false negative (FN) errors) and the errors of blocking legitimate content (false positive (FP) errors)?
― Q2: What are the primary sources of FP errors?
― Q3: How effective is crowdsourcing in detecting and mitigating FP and FN errors?
― Q4: How robust is the filter list against evasion attacks?
Objectives
― Q1: (prevalence of FN and FP errors)?
― Q2: (primary sources of FP errors)?
― Q3: (crowdsourcing effectiveness)?
― Q4: (robustness of the filter list)?
Methodology
Datasets Collecting and Cleaning
• Collect and track dynamic changes of the filter list (EasyList). (Dataset D1)
– Collecting 117,683 versions of EasyList (2009 to 2018).
– Cleaning and extracting those versions created to correct FP and FN errors.
• Extract filter rules added or removed and build a record for each rule.
– Each record contains information about the rule (e.g., time of creation, deletion, EasyList versions).
• Collect posts of FP and FN errors in the EasyList forum. (Dataset D2)
– 23,240 topics with at least one report.
• Extract the reports from the posts and build a record of each report.
– The report record contains information such as the contributor profile, the webpage that has the error, EasyList editor responses….
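The rule-extraction step can be illustrated with a minimal sketch. It assumes each EasyList version is available as plain text, that lines starting with `!` are comments, and that a line starting with `[` is the list header; the function names, the record fields, and the sample rules are hypothetical and are not taken from the study.

```python
def parse_rules(text):
    """Return the set of filter rules in one list version, skipping comments."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        # Assumption: '!' starts a comment and '[' starts the list header line.
        if not line or line.startswith("!") or line.startswith("["):
            continue
        rules.add(line)
    return rules


def diff_versions(old_text, new_text, timestamp):
    """Build one record per rule added or removed between two versions."""
    old_rules, new_rules = parse_rules(old_text), parse_rules(new_text)
    records = []
    for rule in sorted(new_rules - old_rules):
        records.append({"rule": rule, "action": "added", "time": timestamp})
    for rule in sorted(old_rules - new_rules):
        records.append({"rule": rule, "action": "removed", "time": timestamp})
    return records


# Hypothetical sample versions (not real EasyList content).
old = "[Adblock Plus 2.0]\n! Title: EasyList\n||ads.example.com^\n##.ad-banner\n"
new = "[Adblock Plus 2.0]\n! Title: EasyList\n||ads.example.com^\n||track.example.net^\n"
records = diff_versions(old, new, "2018-12-07")
for record in records:
    print(record)
```

Applying this to every consecutive pair of versions would give each rule a creation time and, if it was later removed, a deletion time.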
Q1: Error Prevalence
• To answer the question we need to know:
1. Types of the errors
2. Websites with the errors
Problem: Many reports do not have evidence of correction.
Solution: Link reports with EasyList filter records.
[Diagram: filter records (Dataset D1) and report records (Dataset D2) are linked to produce error records (Dataset D2A)]
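One possible way to link a report to a filter change is sketched below. The slides do not give the linking criteria, so the criteria here are assumptions made for illustration only: a report is linked when a rule change mentions the reported site's domain within a fixed number of days after the report. The field names, the 7-day window, and the sample data are all hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import urlparse


def link_reports(reports, rule_changes, window_days=7):
    """Pair each report with rule changes that mention its domain soon after it."""
    linked = []
    for report in reports:
        domain = urlparse(report["url"]).hostname or ""
        if domain.startswith("www."):
            domain = domain[4:]
        deadline = report["date"] + timedelta(days=window_days)
        for change in rule_changes:
            in_window = report["date"] <= change["date"] <= deadline
            # Assumed criterion: same domain and a change shortly after the report.
            if domain and domain in change["rule"] and in_window:
                linked.append((report["id"], change["rule"]))
    return linked


# Hypothetical records (not from the study's datasets).
reports = [
    {"id": 1, "url": "https://www.news.example/article", "date": date(2018, 1, 10)},
    {"id": 2, "url": "https://shop.example/", "date": date(2018, 2, 1)},
]
rule_changes = [
    {"rule": "@@||news.example^$script", "date": date(2018, 1, 12)},
    {"rule": "||shop.example/ads/", "date": date(2018, 3, 15)},
]
linked = link_reports(reports, rule_changes)
print(linked)
```

A narrow window keeps fewer reports but makes each link more trustworthy, which trades scale for accuracy.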
Q2: Primary Sources of FP Errors
• Required knowledge:
– The web page that has the FP error(s).
– The element impacted.
– The filter that caused the error.
– The EasyList versions created to fix the error(s).
• Reproduced FPs using:
– Old EasyList version
– Chrome extension
– Webpage
– Controller and checker
[Diagram labels: Dataset D2A, FP error record, Dataset D2B]
Q3: Crowdsourcing Effectiveness
• Extracting from Dataset D2 the crowdsourcing behaviors:
– Reports
– Type of report
– Reporter profile
– EasyList editor response
– EasyList editor profile
– Time of correction
– Reason of rejection
– …….
Q4: Robustness of Filter-List
• Extract from Dataset D1 the EasyList's behavior:
– Reasons of adding rules.
– Syntax of rules.
– Ad servers' domains.
– Change of ad element attribute.
– ….
• Extract from Dataset D2 the websites' behaviors:
– Reasons of FN errors.
– Responses of EasyList community.
– …
• Study the reaction of ad networks.
– Historical traffic information of the ad servers.
– …
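Extracting the rule syntax and the ad server domain from a filter rule could look like the sketch below. It relies on the standard Adblock Plus filter conventions (`@@` marks an exception rule, `##` an element-hiding rule, `#@#` an element-hiding exception, and `||` anchors a pattern at a domain name). The function name and the sample rules are hypothetical, and the classification is deliberately simplified.

```python
import re


def classify_rule(rule):
    """Return (kind, domain) for one Adblock Plus style filter rule."""
    if "#@#" in rule:
        return ("element-hiding exception", None)
    if "##" in rule:
        return ("element hiding", None)
    kind = "exception" if rule.startswith("@@") else "blocking"
    # "||" anchors the pattern at a domain name; capture up to the first separator.
    match = re.match(r"^(?:@@)?\|\|([A-Za-z0-9.-]+)", rule)
    return (kind, match.group(1) if match else None)


# Hypothetical sample rules (not real EasyList entries).
samples = [
    "||ads.example.com^$third-party",
    "@@||cdn.example.net/player.js",
    "example.org##.ad-banner",
    "/banner/ads/*",
]
for rule in samples:
    print(rule, "->", classify_rule(rule))
```

Grouping the extracted domains by the time their rules were added is one way to see how often ad servers move to new domains.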
Datasets

Dataset D1
Dataset | # | Note
Cleaned EasyList Versions | 55,607 | From November 30, 2009, to December 7, 2018
Added Filter Rules | 534,020 | In order to correct FP and FN errors
Removed Filter Rules | 448,479 | In order to correct FP and FN errors

Dataset D2
Dataset | # | Note
Reports of FN errors | 17,968 | From November 30, 2009, to December 7, 2018
Reports of FP errors | 5,272 | From November 30, 2009, to December 7, 2018

Dataset D2A: Linking EasyList Filter Rules with True Reports
Dataset | # | Note
FP and FN Reports linked with EasyList changes | 5,284 | From November 30, 2009, to December 7, 2018

Dataset D2B: Reproducing FPs
Dataset | # | Note
True Instances of FP errors | 570 | 2,203 webpages studied.

Ad-servers: traffic information
Dataset | # | Note
Historical traffic information of the ad servers | 567,293 | Traffic information of 6,903 ad server domains over 4 years.
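As a quick consistency check on the figures above, the short calculation below relates the dataset sizes to one another. All numbers come from the slides; the variable names are illustrative, and treating the 23,240 forum topics as the total number of reports is an assumption.

```python
# Figures taken from the dataset tables and the methodology slide.
total_versions = 117_683    # EasyList versions collected
cleaned_versions = 55_607   # versions kept after cleaning (Dataset D1)
fn_reports = 17_968         # reports of FN errors (Dataset D2)
fp_reports = 5_272          # reports of FP errors (Dataset D2)
linked_reports = 5_284      # reports linked with EasyList changes (Dataset D2A)

all_reports = fn_reports + fp_reports
cleaned_share = cleaned_versions / total_versions
linked_share = linked_reports / all_reports
fn_share = fn_reports / all_reports

print(all_reports)  # 23240, matching the number of forum topics with a report
print(f"cleaned versions kept: {cleaned_share:.1%}")
print(f"reports linked to a filter change: {linked_share:.1%}")
print(f"FN share of all reports: {fn_share:.1%}")
```

The FN and FP report counts sum to the 23,240 topics mentioned in the methodology, and fewer than a quarter of the reports end up linked to an EasyList change.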
Q1 Analysis: Error Prevalence
[Figure: Websites with FN and FP errors]
Q2 Analysis: Sources of FP Errors
[Figure: The responsibility (the source of the error): ad blocker's fault versus web designer's fault, on a 0% to 90% axis. Diagram labels: web designer creates non-ad content; ad blocker creates filter rule; block request; hide element; times t1, t2, t3.]

Q3 Analysis: Crowdsourcing Effectiveness
Title | # Reports FP | # Reports FN | Avg. of days FP | Avg. of days FN | SD FP | SD FN
Anonymous | 530 | 853 | 2.37 | 1.80 | 6.88 | 7.38
New Member | 371 | 307 | 3.94 | 9.31 | 8.77 | 21.09
Senior Member | 160 | 749 | 2.31 | 6.42 | 5.35 | 17.48
Developer | 83 | 99 | 1.80 | 16.30 | 5.52 | 31.08
Other Lists Editor | 105 | 603 | 1.65 | 2.65 | 3.86 | 11.02
Veteran | 255 | 751 | 1.95 | 5.34 | 5.17 | 14.31
Editor | 80 | 338 | 0.58 | 0.52 | 1.49 | 2.98
Total | 1,584 | 3,700 | 2.09 | 6.05 | 5.29 | 15.05
FP and FN error reports submitted by different categories of users
[Figure: 30% False Positive Reports, 70% False Negative Reports]

Contributions by Different Types of Users
FN Reports: Anonymous 23%, New Member 8%, Senior Member 20%, Developer 3%, Other Lists Editor 17%, Veteran 20%, Editor 9%
FP Reports: Anonymous 34%, New Member 23%, Senior Member 10%, Developer 5%, Other Lists Editor 7%, Veteran 16%, Editor 5%

Contributions of Different Types of Users
User class | P-value of X-squared | Pearson correlation
Anonymous | 7.17E-24 | -0.05275
New Member | 9.16E-06 | -0.158028
Senior Member | 1.03E-07 | 0.09673657
Developer | 0.030464 | 0.061129
Other Lists Editor | 0.1235264 | 0.0363365
Veteran | 0.000166 | 0.1041299
Editor | 0.0611805 | 0.067597
• For the Anonymous, New Member, Senior Member, and Veteran classes, the error type and website popularity are dependent.
• Anonymous and New Members contributed more to correcting FP errors than FN errors for lower-rank websites.
• Expert members tend toward the opposite.
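The dependence between error type and website popularity reported above is the kind of result a Pearson chi-squared test of independence produces. The sketch below shows that test on a 2x2 table. The counts are hypothetical and not from the study, the two-bucket split of website rank is an assumption, and no continuity correction is applied, so it may differ from the exact procedure the authors used.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With one degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts for one user class (not from the study):
# rows are error type (FP, FN); columns are website rank (higher, lower).
table = [[40, 60], [70, 30]]
statistic, p_value = chi_squared_2x2(table)
print(f"X-squared = {statistic:.4f}, p-value = {p_value:.2e}")
```

A small p-value means the mix of FP and FN reports differs between the two rank buckets; the sign of the Pearson correlation then indicates in which direction.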
Delay in Reporting FP Errors
[Figure: Delay of reporting FP errors]

Q4 Analysis: Robustness of Filter-list against Evasion Attacks
15 different evasion attacks:
• More-Studied Attacks (4),
• Less-Studied Attacks (3), and
• Nonstudied Attacks (8).

More-Studied Attacks (attacks and our findings)
• WebSockets. Since 2016, EasyList had blocked:
– 291 websites.
– 137 ad servers.
• Anti-ad Blocker. Reaction:
– Restricting content on the sites (paywalls, blocking the websites).
– Redirecting the users to different websites or content.
• Randomization of Ad Attributes and URLs.
– 15 websites using randomization.
– Facebook appeared most frequently.
• Factoring Acceptable Ads List Sitekeys.
– Our datasets do not show any instance of this attack.

Less-Studied Attacks (attacks and our findings)
• Changing Ad-Server Domains.
– 52% of the ad servers' traffic activities disappeared in three days.
– 84% of these ad servers (the 52%) were blocked shortly after they were used.
– Ad servers with long life: 61% were significantly influenced by the blocking.
– The EasyList community ran code to monitor the changes of ad servers (limited).
• Changing Ad-Element Attributes.
– EasyList did not have the capability to automatically trace the changes of ad elements.
– EasyList detected these changes manually; in 553 instances the filters were changed in response to this type of evasion.
• Changing the Path of Ad Source.
– 644 websites changed their ad URLs' paths.

Nonstudied Attacks
1. Exploiting Obsolete Whitelist Filters.
2. Using Generic Exception Rules (Whitelist Filters).
3. Exploiting False Positive Errors.
4. First-Party Content and Inline Script.
5. ISP Injecting Ads.
6. Background Redirection.
7. Exploiting WebRTC.
8. CSS Background Image Hack.
Notes:
• Domains in the whitelist filters were not monitored by EasyList.
• EasyList counters the anti-ad blocker or solves FP errors by GER. So ..?

Limitations and Future Work
• Limitations:
– The dataset covers history dating back to 2009. We could not find any data before November 2009.
– A conservative approach was used to link the reported errors to the EasyList updates.
  • Trade-off between the scale of the data and the accuracy of the analysis.
– The Internet Archive data was limited.
• Future work
– Crowdsourcing mechanisms.
– Dynamic analysis.
– And more…

Conclusions
• An in-depth measurement study to reveal
― Q1: Prevalence of FP and FN errors
― Q2: Primary sources of FP errors
― Q3: Effectiveness of crowdsourcing in detecting and mitigating FP and FN errors
― Q4: Robustness of filter-list against evasion attacks
• Our findings are expected to help shed light on any future work to evolve ad blocking and/or to optimize crowdsourcing mechanisms.

Mshabab Alrizah [email protected]