What a Tangled Web We Weave: Understanding the Interconnectedness of the Third Party Cookie Ecosystem
Total Page:16
File Type:pdf, Size:1020Kb
What a Tangled Web We Weave: Understanding the Interconnectedness of the Third Party Cookie Ecosystem Xuehui Hu Nishanth Sastry King’s College London King’s College London, UK London, UK University of Surrey, UK [email protected] [email protected] ABSTRACT KEYWORDS When users browse to a so-called “First Party” website, other third Container Identity, Third Party Cookie parties are able to place cookies on the users’ browsers. Although ACM Reference Format: this practice can enable some important use cases, in practice, these Xuehui Hu and Nishanth Sastry. 2020. What a Tangled Web We Weave: third party cookies also allow trackers to identify that a user has Understanding the Interconnectedness of the Third Party Cookie Ecosys- visited two or more first parties which both share the second party. tem. In 12th ACM Conference on Web Science (WebSci ’20), July 6–10,2020, This simple feature been used to bootstrap an extensive tracking Southampton, United Kingdom. ACM, New York, NY, USA, 10 pages. https: ecosystem that can severely compromise user privacy. //doi.org/10.1145/3394231.3397897 In this paper, we develop a metric called “tangle factor” that measures how a set of first party websites may be interconnected 1 INTRODUCTION or tangled with each other based on the common third parties used. It is common knowledge that cookies are placed in a user’s browser Our insight is that the interconnectedness can be calculated as the when they visit a website. The websites that the users visit are chromatic number of a graph where the first party sites are the called first party websites (FPs). Although some of these cookies nodes, and edges are induced based on shared third parties. are placed by FP domains, many are placed by third party affiliates We use this technique to measure the interconnectedness of (TPs) of the first party sites, for reasons such as advertising, orana- the browsing patterns of over 100 users in 25 different countries, lytics. Examples include advertising networks such as adnxs.com, through a Chrome browser plugin which we have deployed. The amazon-adsystem.com and doubleclick.net, analytics platforms users of our plugin consist of a small carefully selected set of 15 such as google-analytics.com, or social media trackers such as test users in UK and China, and 1000+ in-the-wild users, of whom Facebook and Twitter. 124 have shared data with us. We show that different countries Such TPs have proved to be an extremely powerful method for ag- have different levels of interconnectedness, for example China has gregating users’ browser histories – for example, a TP that appears a lower tangle factor than the UK. We also show that when visiting on both https://www.bbc.com, and https://www.nytimes.com the same sets of websites from China, the tangle factor is smaller, is able to infer that a user visited both sites, and therefore can infer due to blocking of major operators like Google and Facebook. that the user might be someone who is interested in news and We show that selectively removing the largest trackers is a very current affairs. An important motivation for such detailed profiling effective way of decreasing the interconnectedness of third party of users is that it can lead to more “targeted” advertisements, which websites. We then consider blocking practices employed by privacy- are much more profitable. For instance, recent research shows that conscious users (such as ad blockers) as well as those enabled by right-leaning hyperpartisan websites in the US use more sophis- default by Chrome and Firefox, and compare their effectiveness ticated and more intensive tracking techniques and are therefore using the tangle factor metric we have defined. Our results help able to command higher prices than left-leaning websites [3]. quantify for the first time the extent to which one ad blocker is As advertisers’ demand for detailed profiles continues to grow, so more effective than others, and how Firefox defaults also greatly does the sophistication of third party tracking technologies. Users help decrease third party tracking compared to Chrome. and many browsers have responded with new technologies to safe- guard user privacy. Indeed, this has led to an “arms race”, with users CCS CONCEPTS installing ad blockers such as AdBlock Plus[14], uBlock Origin[18] • Security and privacy ! Web application security; Social net- and Ghostery[9], and certain websites (e.g., memeburn.com, english- work security and privacy. forum.ch) responding with anti ad-blocking technology[31, 37, 46] that refuse to deliver content unless users unblock ad blockers whilst visiting their sites. Permission to make digital or hard copies of all or part of this work for personal or An important addition to the arsenal of technologies for pre- classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation venting tracking of users is the notion of multi-account containers, on the first page. Copyrights for components of this work owned by others than the often referred to simply as “containers”. Pioneered by Firefox [33], author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or containers are a way to separate different sets of cookies from each republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. other. Containers were initially intended for providing “contextual WebSci ’20, July 6–10, 2020, Southampton, United Kingdom identities”1, i.e., to create different user identities depending on © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-7989-2/20/07...$15.00 1https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Work_ https://doi.org/10.1145/3394231.3397897 with_contextual_identities WebSci ’20, July 6–10, 2020, Southampton, United Kingdom Xuehui Hu and Nishanth Sastry the context of operation. For example, containers allow a user to cleanly separate their work and personal identities on the same browser. Different containers can also be used to login to thesame sites with multiple user ids (for example, users having several email accounts from the same provider can be simultaneously logged in to each of them in different containers). In this sense, containers are similar to other browser extensions such as Multifox [28] or CookieSwap [13] (or the similar Swap my Cookies extension on Chrome [15]). Figure 1: Overlapping first parties which share a third party Firefox containers essentially create a different user profile within tracker must be placed in separate containers, thus the red each container, providing a different database of cookies and stor- and green sites must be separated. The blue website does not 2 age for each . Thus, each container identity is kept separate from share any third parties with either red or green and can be the others, and information such as third party cookies are not placed together with either of them in a container. shared across containers. Firefox suggests [33] that this can be used to also achieve additional privacy, for instance by placing a user’s shopping websites into a separate container from financial websites We calculate the tangle factor by modelling the set of FPs as such as banks and credit cards. nodes of a graph, drawing edges between two FPs when they share Although containers offer a clean way to separate third (and first) one or more TPs. We call this the first party interconnection graph parties from each other by creating different cookie stores, they are (FPIG). Two FPs in the FPIG that share an edge (i.e., share one or only effective if interfering parties are manually placed in separate more TPs) must be therefore placed in separate containers in order containers. Recently, Firefox has come up with add-ons which au- to prevent tracking by the shared TPs. If we assign a colour to each tomatically, or by default, provide protection for one of the most container, and label FPs in the FPIG with the colour of the container prevalent trackers – Facebook. This add-on creates a specialised they are placed in, it is easy to see that the vertex chromatic number container [36] for Facebook. Once installed, it opens Facebook in of the FPIG, i.e., the number of colours needed for nodes or vertices its own container, and the user is logged out of Facebook in other of the FPIG such that neighbouring vertices which share an edge containers, thus automatically preventing Facebook Pixel [35] and are coloured differently, gives the minimum number of containers other tracking by Facebook of users who are logged in. needed to effectively separate that set of FPs. We call this the tangle While a solution for Facebook’s tracking is indeed important as factor of a given set of first party websites. it is widely used on many websites, it is not sufficient, as it does not We apply the Tangle Factor metric to three different sets of offer protection against other third parties. Furthermore, Facebook first party websites. First, we look at the Alexak top- websites for is blocked by the Government in countries such as China, and thus different values of k, and for two different countries, UK and China. 3 is not a major third party in that country [20]. Thus, solutions are Second, we leverage an ongoing user study which developed a needed that work for other important third parties, as well as for Chrome plugin and collected anonymised browsing histories for other country and cultural contexts.