
Errors, Misunderstandings, and Attacks: Analyzing the Crowdsourcing Process of Ad-blocking Systems

Mshabab Alrizah, Sencun Zhu, Xinyu Xing, Gang Wang

Outline
• Background
• Objectives
• Methodology
• Datasets
• Analysis: FP & FN errors
• Analysis: Evasions
• Conclusion

Ad-Blocking System
[Figure: Ad-Blocking System]


Crowdsourcing and Ad-blocking Systems
[Figure: Crowdsourcing and Ad-blocking Systems]

Previous Work
Previous work studied:
• Different problems or complementary solutions.
• Economic ramifications of ad-blocking systems.
• Specific cases of ad blocking, e.g., tracker blocking and anti-ad blocking.
• Relationships among Internet users, ad publishers, and ad blockers.

Yet…
• There remains a lack of deep understanding of:
– Filter-list effectiveness
– The crowdsourcing functionality and contribution
– The potential pitfalls and security vulnerabilities

Objectives
Provide an in-depth study of the dynamic changes of the filter list to answer the following key questions:
― Q1: How prevalent are the errors of missing real advertisements (false negative (FN) errors) and the errors of blocking legitimate content (false positive (FP) errors)?
― Q2: What are the primary sources of FP errors?
― Q3: How effective is crowdsourcing in detecting and mitigating FP and FN errors?
― Q4: How robust is the filter list against evasion attacks?

― Q1: Prevalence of FN and FP errors
― Q2: Primary sources of FP errors
― Q3: Crowdsourcing effectiveness
― Q4: Robustness of the filter list

Methodology: Collecting and Cleaning

Dataset D1
• Collect and track the dynamic changes of the filter list (EasyList).
– Collected 117,683 versions of EasyList (2009 to 2018).
– Cleaned the versions and extracted those created to correct FP and FN errors.
• Extract the filter rules added or removed and build a record for each rule.
– Each record contains information about the rule (e.g., time of creation, time of deletion, EasyList versions).

Dataset D2
• Collect posts about FP and FN errors in the EasyList forum.
– 23,240 topics with at least one report.
• Extract the reports from the posts and build a record for each report.
– The report record contains information such as the contributor profile, the webpage that has the error, the EasyList editor responses, …
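Extracting the rules added or removed between consecutive EasyList versions is essentially a set difference between snapshots. The sketch below assumes each version is available as a `(version_id, timestamp, lines)` tuple in chronological order; `RuleRecord`, `parse_rules`, and `build_records` are hypothetical names, and the record fields are illustrative rather than the study's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class RuleRecord:
    """Hypothetical per-rule record; field names are illustrative, not the study's schema."""
    rule: str
    created_in: str                      # version id in which the rule first appeared
    created_at: str                      # timestamp of that version
    removed_in: Optional[str] = None     # version id in which the rule disappeared
    removed_at: Optional[str] = None
    versions: List[str] = field(default_factory=list)  # every version that contained the rule


def parse_rules(lines: Iterable[str]) -> Set[str]:
    """Keep filter rules only: drop blank lines, '!' comments, and the '[Adblock Plus ...]' header."""
    rules = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!") or line.startswith("["):
            continue
        rules.add(line)
    return rules


def build_records(versions: Iterable[Tuple[str, str, Iterable[str]]]) -> List[RuleRecord]:
    """Diff consecutive snapshots (oldest first) and return one record per rule lifetime."""
    live: Dict[str, RuleRecord] = {}   # records of rules present in the current snapshot
    history: List[RuleRecord] = []     # all records, including rules removed later
    previous: Set[str] = set()
    for version_id, timestamp, lines in versions:
        current = parse_rules(lines)
        for rule in sorted(current - previous):      # rules added in this version
            record = RuleRecord(rule, version_id, timestamp)
            live[rule] = record
            history.append(record)
        for rule in sorted(previous - current):      # rules removed in this version
            record = live.pop(rule)
            record.removed_in, record.removed_at = version_id, timestamp
        for rule in current:
            live[rule].versions.append(version_id)
        previous = current
    return history
```

Note that every rule in the first snapshot is recorded as "added" in that snapshot, and a rule that is removed and later re-added gets a fresh record for its second lifetime.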

Q1: Error Prevalence
• To answer this question, we need to know:
1. The types of the errors
2. The websites with the errors

Problem: Many reports do not have evidence of correction.
Solution: Link the reports with EasyList filter records.
[Figure: report records (Dataset D2) are linked with filter records (Dataset D1) to produce error records (Dataset D2A)]
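The slides do not spell out the linking procedure; the limitations slide only says it was conservative. The sketch below shows one hypothetical strict heuristic: a report is linked to a filter change only if the changed rule mentions the reported domain and the change lands within a short window after the report. The `Report` and `RuleChange` types, the 7-day default window, and the substring test are all assumptions; a generic rule such as `##.promo` would never be linked this way.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class Report(NamedTuple):
    """A forum report (hypothetical fields)."""
    report_id: int
    domain: str            # website named in the report
    reported_on: date


class RuleChange(NamedTuple):
    """A filter rule added or removed in some EasyList version (hypothetical fields)."""
    rule: str
    action: str            # "added" or "removed"
    changed_on: date


def link_report(report: Report, changes: List[RuleChange], window_days: int = 7) -> List[RuleChange]:
    """Return the rule changes that plausibly fix `report`.

    Hypothetical, deliberately strict heuristic: the change must mention the reported
    domain (plain substring test) and land within `window_days` after the report.
    Reports with no match stay unlinked rather than being guessed.
    """
    deadline = report.reported_on + timedelta(days=window_days)
    return [
        change
        for change in changes
        if report.reported_on <= change.changed_on <= deadline
        and report.domain in change.rule
    ]
```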

Q2: Primary Sources of FP Errors
• Required knowledge:
– The web page that has the FP error(s).
– The element impacted.
– The filter that caused the error.
– The EasyList versions created to fix the error(s).
• Reproduced FPs using an old EasyList version, a Chrome extension, and a controller and checker.
[Figure: FP reproduction pipeline: FP error records (Dataset D2A) and webpages, old EasyList version, Chrome extension, controller and checker, producing Dataset D2B]
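The checker's core test can be stated as: an FP is reproduced when the element is hidden under the EasyList version in use before the fix and visible under the version that fixed it. The pure-Python stand-in below replaces the browser and Chrome extension with a hypothetical simplification: element-hiding rules of the form `domains##.class` or `##.class`, and an element described only by its CSS classes. `hidden_classes`, `is_hidden`, and `reproduces_fp` are illustrative names.

```python
from typing import Iterable, Set


def hidden_classes(rules: Iterable[str], domain: str) -> Set[str]:
    """CSS classes hidden on `domain` by simple element-hiding rules.

    Only 'domains##.class' and generic '##.class' rules are understood; negated
    domains (~example.com), exceptions (#@#), and complex selectors are not supported.
    """
    hidden = set()
    for rule in rules:
        if "##." not in rule:
            continue                        # network rules, exceptions, other selectors
        scope, selector = rule.split("##", 1)
        if not selector.startswith("."):
            continue
        domains = [d for d in scope.split(",") if d]
        # A rule without domains is generic; otherwise it covers the listed domains and subdomains.
        applies = not scope or any(domain == d or domain.endswith("." + d) for d in domains)
        if applies:
            hidden.add(selector[1:])
    return hidden


def is_hidden(rules: Iterable[str], domain: str, element_classes: Iterable[str]) -> bool:
    """True if any of the element's classes is hidden on this domain."""
    return bool(hidden_classes(rules, domain) & set(element_classes))


def reproduces_fp(old_rules, fixed_rules, domain, element_classes) -> bool:
    """An FP is reproduced when the element is hidden before the fix and visible after it."""
    return is_hidden(old_rules, domain, element_classes) and not is_hidden(
        fixed_rules, domain, element_classes
    )
```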

Q3: Crowdsourcing Effectiveness
• Extract the crowdsourcing behaviors from Dataset D2:
– Reports
– Type of report
– Reporter profile
– EasyList editor response
– EasyList editor profile
– Time of correction
– Reason for rejection
– …

Q4: Robustness of the Filter List
• Extract EasyList's behavior from Dataset D1:
– Reasons for adding rules
– Syntax of rules
– Ad servers' domains
– Changes of ad-element attributes
– …
• Extract the websites' behaviors from Dataset D2:
– Reasons for FN errors
– Responses of the EasyList community
– …
• Study the reaction of ad networks:
– Historical traffic information of the ad servers
– …
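Studying the syntax of rules starts with bucketing each filter line by type. The sketch below uses the Adblock Plus filter syntax that EasyList is written in (`!` comments, `##` element hiding, `#@#` element-hiding exceptions, `@@` network exceptions); the category names and the `classify_rule` / `summarize` helpers are hypothetical, and the classification is deliberately coarse.

```python
from collections import Counter
from typing import Iterable


def classify_rule(line: str) -> str:
    """Coarse category of an Adblock Plus-style filter line (simplified)."""
    line = line.strip()
    if not line:
        return "blank"
    if line.startswith("!"):
        return "comment"
    if line.startswith("[") and line.endswith("]"):
        return "header"                    # e.g. "[Adblock Plus 2.0]"
    if "#@#" in line:
        return "element-hiding exception"
    if "##" in line or "#?#" in line:
        return "element hiding"
    if "#$#" in line:
        return "snippet"
    if line.startswith("@@"):
        return "network exception"
    return "network block"


def summarize(lines: Iterable[str]) -> Counter:
    """Count rule categories, e.g. to compare two EasyList snapshots."""
    return Counter(classify_rule(line) for line in lines)
```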

Datasets

Dataset D1
Dataset | # | Note
Cleaned EasyList versions | 55,607 | From November 30, 2009, to December 7, 2018
Added filter rules | 534,020 | To correct FP and FN errors
Removed filter rules | 448,479 | To correct FP and FN errors

Dataset D2
Dataset | # | Note
Reports of FN errors | 17,968 | From November 30, 2009, to December 7, 2018
Reports of FP errors | 5,272 | From November 30, 2009, to December 7, 2018

Dataset D2A: Linking EasyList Filter Rules with True Reports
Dataset | # | Note
FP and FN reports linked with EasyList changes | 5,284 | From November 30, 2009, to December 7, 2018

Dataset D2B: Reproducing FPs
Dataset | # | Note
True instances of FP errors | 570 | 2,203 webpages studied

Ad Servers: Traffic Information
Dataset | # | Note
Historical traffic information of the ad servers | 567,293 | Traffic information of 6,903 ad-server domains over 4 years

Q1 Analysis: Error Prevalence
[Figure: Websites with FN and FP errors]

Q2 Analysis: Sources of FP Errors
[Figure: timeline in which the web designer creates non-ad content and the ad blocker creates a filter rule (times t1, t2); bar chart (0% to 90%) of the responsibility for FP errors (the source of the error), web designer's fault vs. ad blocker's fault, for Block Request and Hide Element rules]
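The figure contrasts when the web designer created the non-ad content with when the ad blocker created the filter rule. One plausible reading of that timeline, assumed here rather than stated on the slide, is that whoever acted last introduced the conflict. `responsible_party` is a hypothetical helper, and the tie-breaking choice is arbitrary.

```python
from datetime import date


def responsible_party(content_created: date, rule_created: date) -> str:
    """Hypothetical attribution rule: the party that acted last introduced the FP.

    - Rule existed first, content came later: the web designer reused a name or
      path the filter list already targeted ("web designer's fault").
    - Content existed first, rule came later: the new rule was too broad
      ("ad blocker's fault"). Ties are attributed to the ad blocker here.
    """
    if rule_created < content_created:
        return "web designer"
    return "ad blocker"
```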

Q3 Analysis: Crowdsourcing Effectiveness

FP and FN error reports submitted by different categories of users
Title | # Reports (FP) | # Reports (FN) | Avg. days (FP) | Avg. days (FN) | SD (FP) | SD (FN)
Anonymous | 530 | 853 | 2.37 | 1.80 | 6.88 | 7.38
New Member | 371 | 307 | 3.94 | 9.31 | 8.77 | 21.09
Senior Member | 160 | 749 | 2.31 | 6.42 | 5.35 | 17.48
Developer | 83 | 99 | 1.80 | 16.30 | 5.52 | 31.08
Other Lists Editor | 105 | 603 | 1.65 | 2.65 | 3.86 | 11.02
Veteran | 255 | 751 | 1.95 | 5.34 | 5.17 | 14.31
Editor | 80 | 338 | 0.58 | 0.52 | 1.49 | 2.98
Total | 1,584 | 3,700 | 2.09 | 6.05 | 5.29 | 15.05

[Figure: 30% of the reports are false positive reports and 70% are false negative reports]

Contributions by Different Types of Users
[Figure: pie charts of the FN reports and the FP reports contributed by each user type (Anonymous, New Member, Senior Member, Developer, Other Lists Editor, Veteran, Editor)]
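The per-category shares behind these pie charts follow directly from the report counts in the Q3 table. The sketch below recomputes them; the counts are copied from that table, while `REPORTS` and `report_shares` are illustrative names.

```python
from typing import Dict, Tuple

# (FP reports, FN reports) per user category, copied from the Q3 table.
REPORTS: Dict[str, Tuple[int, int]] = {
    "Anonymous": (530, 853),
    "New Member": (371, 307),
    "Senior Member": (160, 749),
    "Developer": (83, 99),
    "Other Lists Editor": (105, 603),
    "Veteran": (255, 751),
    "Editor": (80, 338),
}


def report_shares(reports: Dict[str, Tuple[int, int]]):
    """Return FP/FN totals, the FP share of all reports, and each category's share (in %)."""
    total_fp = sum(fp for fp, _ in reports.values())
    total_fn = sum(fn for _, fn in reports.values())
    fp_share = 100 * total_fp / (total_fp + total_fn)
    per_category = {
        name: (100 * fp / total_fp, 100 * fn / total_fn)
        for name, (fp, fn) in reports.items()
    }
    return total_fp, total_fn, fp_share, per_category


total_fp, total_fn, fp_share, per_category = report_shares(REPORTS)
for name, (fp_pct, fn_pct) in per_category.items():
    print(f"{name:<20} FP {fp_pct:5.1f}%  FN {fn_pct:5.1f}%")
print(f"FP share of all {total_fp + total_fn} reports: {fp_share:.1f}%")
```

The totals reproduce the table's 1,584 FP and 3,700 FN reports, and the FP share rounds to the 30% shown in the Q3 pie chart.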

Contributions of Different Types of Users

Type of user | P-value (χ² test) | Pearson correlation
Anonymous | 7.17E-24 | -0.05275
New Member | 9.16E-06 | -0.158028
Senior Member | 1.03E-07 | 0.09673657
Developer | 0.030464 | 0.061129
Other Lists Editor | 0.1235264 | 0.0363365
Veteran | 0.000166 | 0.1041299
Editor | 0.0611805 | 0.067597

• For the Anonymous, New Member, Senior Member, and Veteran classes, the error type and website popularity are dependent.
• Anonymous users and New Members contributed more to correcting FP errors than FN errors on lower-ranked websites.
• Expert members tend toward the opposite.
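The table pairs a χ² test of independence with a Pearson correlation, but the slide does not give the popularity buckets used. The pure-Python sketch below computes both statistics on a hypothetical contingency table (rows: FP and FN reports; columns: illustrative website-rank buckets); the counts are invented for illustration only. For the p-value, the upper tail of the χ² distribution, e.g. `scipy.stats.chi2.sf(stat, dof)`, would be used.

```python
import math
from typing import List, Sequence, Tuple


def chi_squared(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is positive (otherwise an expected count is 0).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts for one user class: rows = FP / FN reports,
# columns = three illustrative website-rank buckets (most to least popular).
toy = [[40, 25, 10], [20, 30, 45]]
stat, dof = chi_squared(toy)
print(f"chi2 = {stat:.2f} with {dof} degrees of freedom")
```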

Delay in Reporting FP Errors
[Figure: Delay of reporting FP errors]

Q4 Analysis: Robustness of the Filter List against Evasion Attacks
15 different evasion attacks:
• More-studied attacks (4)
• Less-studied attacks (3)
• Nonstudied attacks (8)

More-Studied Attacks
Attack: Our findings
• WebSockets: Since 2016, EasyList had blocked:
– 291 websites
– 137 ad servers
• Anti-ad blocker: Reactions:
– Restricting content on the sites (paywalls, blocking the websites)
– Redirecting the users to different websites or content
• Randomization of ad attributes and URLs:
– 15 websites using randomization
– Facebook appeared most frequently
• Factoring Acceptable Ads List sitekeys: Our datasets do not show any instance of this attack.

Less-Studied Attacks
Attack: Our findings
• Changing ad-server domains:
– For 52% of the ad servers, traffic activity disappeared within three days.
– 84% of these ad servers were blocked shortly after they were used.
– Among ad servers with a long life, 61% were significantly influenced by the blocking.
– The EasyList community ran code to monitor the changes of ad servers, but only to a limited extent.
• Changing ad-element attributes:
– EasyList did not have the capability to automatically trace the changes of ad elements.
– In 553 instances, detected manually by EasyList, the filters were changed in response to this type of evasion.
• Changing the path of the ad source:
– 644 websites changed their ad URLs' paths.
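The "disappeared within three days" finding is a lifetime computation over each ad-server domain's traffic history. The sketch below assumes, hypothetically, that the history is available as a mapping from domain to the dates on which traffic was observed; lifetime is then last-seen minus first-seen, and the inclusive three-day threshold is an assumption. `lifetimes` and `short_lived_fraction` are illustrative names.

```python
from datetime import date
from typing import Dict, Iterable


def lifetimes(activity: Dict[str, Iterable[date]]) -> Dict[str, int]:
    """Days between the first and last observed traffic for each ad-server domain."""
    spans = {}
    for domain, days in activity.items():
        ordered = sorted(days)
        if ordered:
            spans[domain] = (ordered[-1] - ordered[0]).days
    return spans


def short_lived_fraction(activity: Dict[str, Iterable[date]], max_days: int = 3) -> float:
    """Fraction of domains whose traffic disappeared within `max_days` (inclusive, assumed)."""
    spans = lifetimes(activity)
    if not spans:
        return 0.0
    return sum(1 for span in spans.values() if span <= max_days) / len(spans)
```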

Nonstudied Attacks
1. Exploiting obsolete whitelist filters.
2. Using generic exception rules (whitelist filters). Domains in the whitelist filters were not monitored by EasyList.
3. Exploiting false positive errors.
4. First-party content and inline scripts.
5. ISPs injecting ads.
6. Background redirection.
7. Exploiting WebRTC.
8. CSS background-image hack.

EasyList counters anti-ad blockers or solves FP errors by using generic exception rules (GERs). So…?
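Why generic exception rules open an evasion path follows from how Adblock Plus-style matching works: in EasyList's syntax, `||` anchors a domain, `^` matches a separator, `*` is a wildcard, and an `@@` exception overrides any blocking rule that matches the same request. The simplified matcher below ignores `$` options (such as `$domain=` or `$third-party`) and element-hiding rules; `filter_to_regex` and `is_blocked` are hypothetical helpers, not an ad blocker's real API.

```python
import re


def filter_to_regex(pattern: str):
    """Translate a simplified Adblock Plus network filter (options already removed) into a regex."""
    regex = ""
    if pattern.startswith("||"):
        # Domain anchor: the host itself or any subdomain of it.
        regex = r"^[a-z][a-z0-9+.-]*://([^/?#]*\.)?"
        pattern = pattern[2:]
    elif pattern.startswith("|"):
        regex = "^"
        pattern = pattern[1:]
    end_anchor = pattern.endswith("|")
    if end_anchor:
        pattern = pattern[:-1]
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "^":
            regex += r"(?:[^\w.%-]|$)"   # separator: not a letter, digit, '_', '-', '.', or '%'
        else:
            regex += re.escape(ch)
    if end_anchor:
        regex += "$"
    return re.compile(regex, re.IGNORECASE)


def is_blocked(url: str, rules) -> bool:
    """A request is blocked if some blocking rule matches and no '@@' exception does."""
    blocked = allowed = False
    for rule in rules:
        if rule.startswith("!") or "##" in rule or "#@#" in rule:
            continue                        # comments and element-hiding rules are skipped
        is_exception = rule.startswith("@@")
        body = rule[2:] if is_exception else rule
        body = body.split("$", 1)[0]        # options like $third-party are ignored here
        if filter_to_regex(body).search(url):
            if is_exception:
                allowed = True
            else:
                blocked = True
    return blocked and not allowed
```

With a generic exception such as `@@||cdn.example^`, any ad served from `cdn.example` passes even when a path rule like `/banner/*` would otherwise block it, which is the gap an attacker can exploit.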

Limitations and Future Work
• Limitations:
– The dataset covers history dating back to 2009; we could not find any data before November 2009.
– A conservative approach was used to link the reported errors to the EasyList updates.
• Trade-off between the scale of the data and the accuracy of the analysis.
– The Internet Archive data was limited.
• Future work:
– Crowdsourcing mechanisms.
– Dynamic analysis.
– And more…

Conclusions
• An in-depth measurement study to reveal:
― Q1: Prevalence of FP and FN errors
― Q2: Primary sources of FP errors
― Q3: Effectiveness of crowdsourcing in detecting and mitigating FP and FN errors
― Q4: Robustness of the filter list against evasion attacks
• Our findings are expected to shed light on future work to evolve ad blocking and/or to optimize crowdsourcing mechanisms.

Mshabab Alrizah