User Tracking in the Post-Cookie Era: How Websites Bypass GDPR Consent to Track Users Please Cite Our WWW’2021 Paper, Doi: 10.1145/3442381.3450056
Total Page:16
File Type:pdf, Size:1020Kb
User Tracking in the Post-cookie Era: How Websites Bypass GDPR Consent to Track Users Please cite our WWW’2021 paper, doi: 10.1145/3442381.3450056 Emmanouil Papadogiannakis Panagiotis Papadopoulos FORTH/University of Crete, Greece Telefonica Research, Spain Nicolas Kourtellis Evangelos P. Markatos Telefonica Research, Spain FORTH/University of Crete, Greece ABSTRACT Indeed, users may be given the choice to accept all cookies, to During the past few years, mostly as a result of the GDPR and the accept some cookies, or even to reject all cookies. The choice is CCPA, websites have started to present users with cookie consent entirely up to the user, and the correct implementation of this choice banners. These banners are web forms where the users can state their is the responsibility of the website. Although this sounds completely preference and declare which cookies they would like to accept, if legal and fully straightforward, deviations have been reported in such option exists. Although requesting consent before storing any literature [1–5]. For example, some websites claim that some cookies identifiable information is a good start towards respecting theuser are absolutely necessary for their operation (e.g., for the page to be privacy, yet previous research has shown that websites do not always delivered) or due to legitimate interest (e.g., to improve the product), respect user choices. Furthermore, considering the ever decreasing and can not be rejected by the users. Thus, users cannot really choose reliance of trackers on cookies and actions browser vendors take by to reject all cookies: these necessary cookies cannot be rejected. Past blocking or restricting third-party cookies, we anticipate a world research studies have noticed some discrepancies between what where stateless tracking emerges, either because trackers or websites the users type and what is registered in the website. For example, do not use cookies, or because users simply refuse to accept any. the users may provide a negative response (i.e., reject all cookies), In this paper, we explore whether websites use more persistent and but the cookie banners may register a positive one (i.e., accept all sophisticated forms of tracking in order to track users who said they cookies), or the cookie banners may register a positive response even do not want cookies. Such forms of tracking include first-party ID before the users had the opportunity to provide any choice [4]. leaking, ID synchronization, and browser fingerprinting. Our results All these previous studies focus on cookies and compliance of suggest that websites do use such modern forms of tracking even cookie processing with the GDPR. In this paper, we set out to explore before users had the opportunity to register their choice with respect a slightly different question: to cookies. To add insult to injury, when users choose to raise their If a user does not provide consent, or chooses to reject voice and reject all cookies, user tracking only intensifies. As a all cookies, do websites use other forms of tracking to result, users’ choices play very little role with respect to tracking: track this user? If so, what are these forms of tracking, we measured that more than 75% of tracking activities happened and what is the extent of this tracking? before users had the opportunity to make a selection in the cookie Considering the (i) ever less reliance of third-party trackers on consent banner, or when users chose to reject all cookies. non-permanent, erasable state-like cookies [6] and (ii) recent ad- vances of browser vendors against third-party cookies [7, 8], it is KEYWORDS apparent that the need for identifying how websites treat user con- User Tracking, GDPR, User Consent, Web fingerprinting sent in case of stateless (cookie-less) tracking is more than timely and urgent. We address this need and try to fill this exact gap in our 1 INTRODUCTION understanding, by being the first to investigate what is the GDPR arXiv:2102.08779v1 [cs.CY] 17 Feb 2021 Over the past few years, we have seen an increasing concern about compliance across the Web in the case where websites and trackers user data protection with respect to the data of European users. This do not use cookies, or users do not accept cookies. was probably the result of the General Data Protection Regulation Sadly, our results suggest that even when users reject all cookies, (GDPR) which was adopted in April 2016 and came into force websites do use other forms of tracking to track users, and process in May 2018. The main difference of this regulation compared to personal data, in violation of GDPR. Such forms of tracking include previous legislation is that it includes significant fines for companies first-party ID leaking, ID synchronization and browser fingerprint- 1 which collect users data without the users’ consent or some other ing. First-party ID leaking and ID synchronization are used to pass legal basis. Such fines can reach up to 20 million euros, orupto an identifier (such as a cookie) as an “argument” in an HTTP request 4% of the annual worldwide turnover of the preceding financial to a website - different from the website that planted this ID in the year, whichever is greater. As a result, several companies, and their first place. In fact, according to past studies9 [ –11], Web entities associated websites, have started asking their visitors and users for 1One might think that ID synchronization is a form of tracking using cookies. This is their consent, before collecting (and processing) their data. not really true: although ID synchronization does use (values stored in) cookies, passing Such a consent has been usually collected via cookie banners, such values around is done in an unorthodox manner, completely different from the way cookies are used. which ask users for consent and may give some choices as well. may share IDs they have assigned to users and help third-parties re- (1) New Alias identify users or create universal IDs. Browser fingerprinting [12, 13] uses elaborate approaches to uniquely identify a user through char- acteristics of her device - characteristics which can be easily found by a website. Such characteristics may include screen resolution and rendering characteristics, browser fonts and installed plugins etc. [14–17]. Combining several of these characteristics can provide (2) New Alias a large enough number of entropy bits to uniquely identify a user. Although these cases of user identification are considered “per- sonal data processing” according to GDPR and ePrivacy [18] regula- tions, and must be visible to users, they often do not appear in request forms of consent managers deployed by modern websites. In this study, we highlight exactly that: the lack of transparency and user consent when it comes to websites that deploy user identification (3) Sync of Aliases techniques like ID synchronization and browser fingerprinting. The contributions of this work are as follows: • We propose a fully automated method for detecting browser &publisher= website3.com fingerprinting on websites using the Chromium Profiler. advertiser.com?syncID=user123 • We crawl close to one million websites and record how they track users using sophisticated forms of tracking (such as first- party ID leaking, ID synchronization and browser fingerprinting) as a function of users’ choices. Figure 1: Example of an ID synchronization operation. Two en- • We find that: (1) More than 75% of leaks happen despite the tities match the IDs they have assigned to the same user. fact that users have chosen to reject all cookies; (2) Websites embedded with ID synchronizing third-parties force browsers to engage in several ID synchronizations (3.51 per ID, on average) bid in ad auctions [27], ad-retargeting [28] and cross-device track- even before users had a chance to accept or deny consent; (3) ing [29]. For the different Web entities (e.g., publishers, analytics, Popular websites are more likely to disregard users’ consent data brokers, advertisers, etc.) to perform such transactions, all of choices and engage in first-party ID leaking and ID synchroniza- the different assigned aliases (i.e., IDs) that each entity has assigned tion; (4) Browsers leak more information when users choose to to the same user, need to be linked (i.e., synced) together. This would reject all cookies than when they choose to take no action at all; reveal that the user that the entity A knows as userABC is the same (5) Our analysis of tracking per country code reveals significant user that entity B knows as user123. discrepancies across EU countries. Figure 1 illustrates an example of how this ID synchroniza- • Our methodology can be transformed into an auditing tool for tion takes place. Assume a user browsing website1.com regulators, stakeholders and privacy-policy makers, for verifying and website2.com, in which there are third-parties like compliance with GDPR and users’ privacy rights. tracker.com and advertiser.com, respectively. Conse- quently, these two third-parties have the chance to assign an 2 BACKGROUND alias to the user and re-identify them in the future. From now In the world of Web, cookies are used to store identifying infor- on, tracker.com knows the user with the ID user123, and mation for a given user. However, recent policies and regulations advertiser.com knows the same user with the ID userABC. from browser vendors and government bodies [19–21] try to con- Next, assume that the user lands on website3.com, which in- trol the exposure of this identifying information to third-parties cludes some JavaScript code from tracker.com making the and for how long. These policies restrain the ad and tracking in- browser issue a GET request to tracker.com (step 1), who dustry that relies on re-identifying a user for long periods to serve responds back with a REDIRECT request (step 2), instructing more targeted ads.