CNAME Cloaking-Based Tracking on the Web: Characterization, Detection, and Protection Ha Dao Johan Mazel Kensuke Fukuda
Total Page:16
File Type:pdf, Size:1020Kb
1 CNAME Cloaking-based Tracking on the Web: Characterization, Detection, and Protection Ha Dao Johan Mazel Kensuke Fukuda Abstract—Third-party tracking on the web has been used third-party tracking. Many privacy protections take blacklist for collecting and correlating user’s browsing behavior. Due approaches to detect third-party trackers [5]–[8]. Some works to the increasing use of ad-blocking and third-party tracking identify tracking requests using cookies [9], or fingerprinting protections, tracking providers introduced a new technique called CNAME cloaking. It misleads web browsers into believing that [10], [11]. Other studies intend to detect third-party tracker a request for a subdomain of the visited website originates from automatically using machine learning [12]–[14]. These are this particular website, while this subdomain uses a CNAME to effective against third-party tracking. resolve to a tracking-related third-party domain. This technique However, web tracking is becoming more and more intri- thus circumvents the third-party targeting privacy protections. cate. One of the emerging techniques is the use of Canonical The goals of this paper are to characterize, detect, and protect the end-user against CNAME cloaking based tracking. Firstly, Name Record or Alias (CNAME) record in Domain Name we characterize CNAME cloaking-based tracking by crawling System (DNS) to hide usual tracking domains that are blocked top pages of the Alexa Top 300,000 sites and analyzing the by browser filter lists and extensions. For instance, web- usage of CNAME cloaking with CNAME blocklist, including site example.com embeds a first-party a.example.com, which websites and tracking providers using this technique to track points to a tracking provider tracker.com via the CNAME users’ activities. We also point out that browsers and privacy protection extensions are largely ineffective to deal with CNAME x.tracker.com. For instance, website example.com embeds a cloaking-based tracking except for Firefox with a developer’s first-party request by subdomain a.example.com, which points version of the uBlock Origin extension. Secondly, we propose a to a tracking provider tracker.com via a CNAME x.tracker.com. supervised machine learning-based approach to detect CNAME Because this request is in the first-party context, countermea- cloaking-based tracking without the on-demand DNS lookup. sures that aim to block third-party tracking are effectively We show that the proposed approach outperforms well-known tracking filter lists. Finally, to circumvent the lack of DNS circumvented. API in Chrome-based browsers, we design and implement a There are some existing methods to detect CNAME prototype of the supervised machine learning-based browser cloaking-based tracking. Some network-based blocking meth- extension to detect and filter out CNAME cloaking tracking, ods work at the DNS level, such as NextDNS [15], AdGuard called CNAMETracking Uncloaker. Our evaluation shows that DNS [16], Pi-hole [17] that use to get rid of online tracking. CNAMETracking Uncloaker is able to filter out CNAME cloaking- based tracking requests without performance degradation when Furthermore, EasyPrivacy [18], AdGuard tracking protection compared with the vanilla setting on the Chrome browser. [19], and other filter lists manually add new first-party sub- domains which are fronts for CNAME cloaking to these Index Terms—Privacy, CNAME cloaking-based tracking, Third-party tracking, Machine learning techniques, Counter- blocklists. However, this approach will dramatically increase measure, Browser extension the size of the blocklists and these subdomains need to be updated frequently. Besides that, uBlock Origin since version 1.24.1b0 performs a DNS lookup of the hostname loading a I. INTRODUCTION resource to determine if the underlying subdomain is related Web tracking is becoming more and more ubiquitous, thus to CNAME cloaking or not. Nevertheless, only Firefox allows this brings an increase in privacy concerns from Internet users. uBlock Origin to block CNAME cloaking because the other In a TRUSTe study, 92% of British Internet users concern their browsers do not support DNS resolution API [8]. online privacy [3]. A website (first-party domain) has many In this paper, we provide a first in-depth analysis of links to other resources (third-party domains). Some third- CNAME cloaking-based tracking, propose a supervised ma- party domains are used for user tracking (third-party tracking) chine learning-based method for the detection, and implement to provide functionalities, such as advertising and analytics on CNAMETracking Uncloaker browser extension as a counter- the web [4]. For instance, with third-party tracking, advertise- measure. The main contributions of the paper are as follows. ments on a website can be customized based on end-users’ (1) We first characterize CNAME cloaking-based tracking in visits to other websites, which can be frightful for privacy- Alexa Top 300K sites. We detect 1,739 websites (0.58%) sensitive users. There are some existing approaches to detect containing CNAME cloaking-based tracking in Alexa 300K sites as of January 2020 by matching with CNAME tracking H. Dao is with Graduate University for Advanced Studies (SOKENDAI), filter lists (x IV-C); Those websites are spread across many Tokyo, Japan. email: [email protected]. J. Mazel is with French National Cybersecurity Agency (ANSSI), France. countries and categories. They use 24 tracking providers in email: [email protected] total, and the most common one is Adobe (x IV-D); By K. Fukuda is with National Institute of Informatics and Graduate University analyzing longitudinal snapshot crawled data of Alexa Top for Advanced Studies (SOKENDAI), Tokyo, Japan. email: [email protected]. Manuscript received September, 30, 2020; Revised January, 26, 2021. 100K sites (x IV-E), we show that the usage of CNAME 0This paper is an extended version of work published in Refs. [1], [2]. cloaking-based tracking steadily increases from 2016 to 2020; 2 We then conduct further experiments to investigate the impact B. Term definitions of giving consent to CNAME cloaking sites and confirm that We first define some terms we use throughout this paper: there are no significant differences compared to the usage of 1) Subdomain: In the DNS hierarchy, a subdomain is any this phenomena before the user consent is obtained (x IV-F) domain, which is underneath a main domain. Subdomains are as a new contribution. We also evaluate the detection ability used to organize or divide contents of a website into specific of such tracking for major browsers and extensions (x V). sections. For example, a.example.com and b.example.com are (2) Next, we propose the supervised machine learning-based subdomains of domain example.com. method to detect CNAME cloaking-based tracking without the 2) CNAME cloaking-based tracking: The usage of DNS on-demand DNS lookup (x VI); Through the comprehensive CNAME records coupled with Content Delivery Network analysis, we demonstrate the effectiveness of our method. (3) (CDN) is increasingly commonplace to improve website load As a new contribution beyond Refs. [1], [2], we design and times, reduce bandwidth costs, and increase content availabil- implement a prototype browser extension of the supervised ity and redundancy. machine learning approach to protect the end-user against CNAME has also been used for user tracking. Tracking CNAME cloaking-based tracking, named CNAMETracking providers ask their clients to delegate a subdomain for data Uncloaker (x VII). The current best counter-measure strongly collection and tracking and link it to an external server using depends on realtime name resolution (only supported by a CNAME DNS record [27]. This technique, called CNAME Firefox browser), but our extension intends to distinguish cloaking-based tracking, uses CNAME to disguise requests requests using CNAME cloaking-based tracking in Chrome- to a third-party tracker as first-party ones. We also define an based browsers. Our experiment shows that the performance HTTP request by this subdomain is a request linked to CNAME overhead is acceptable when compared with the vanilla setting cloaking-related tracking. on the Chrome browser. Name Type Value II. BACKGROUND AND TERM DEFINITIONS DNS server example.com A 192.168.0.1 A. Background a.example.com CNAME x.tracker.com 1) Third-party web tracking: Privacy leakage occurs x.tracker.com A 172.16.0.1 through communications with trackers. Third-party web track- (2) ing refers to the practice by which an entity (the tracker), other 172.16.0.1 tracker.com (tracker) 172.16.0.1 a.example.com than the website directly visited by the user, identifies and (1) a.example.com/traker.js collects information about web users. example.com (3) From the view of website administrators, user tracking is (192.168.0.1) Browser content + cookie useful for a variety of purposes such as behavioral advertising (4) or website analytics. On the user’s side, the larger number of Fig. 1. The process of the browser connecting to tracking provider by browsing profiles, the greater loss of privacy. CNAME cloaking-based tracking 2) Privacy protection techniques: Several privacy protec- tion techniques have been designed to protect end-users from Figure 1 shows the process of the browser connecting third-party tracking, including network-based blocking, exten- to a third-party tracking server by CNAME cloaking-based sions, and browser itself. tracking to setup third-party cookies in the first-party context: Network-based blocking methods use address-based black- 1) An end-user types the URL of website example.com lists in order to block access to certain domains (DNS block- (192.168.0.1) into his/her browser and presses return. ing) and modify web traffic (interception proxies), which work This website embeds a subdomain a.example.com independently of the underlying application or browser [20]. 2) The browser looks up a.example.com on the DNS server Some anti-tracking extensions work effectively to detect and finds an IP address 172.16.0.1 of tracking provider third-party tracking, such as Ghostery [21], Disconnect [7], tracker.com.