(Synced) Cookie Monster Breached My Encrypted VPN Session
Total Page:16
File Type:pdf, Size:1020Kb
Exclusive: How the (synced) Cookie Monster breached my encrypted VPN session Panagiotis Papadopoulos Nicolas Kourtellis Evangelos P. Markatos FORTH-ICS, Greece Telefonica Research, Spain FORTH-ICS, Greece [email protected] [email protected] [email protected] ABSTRACT worse, last year, the USA House of Representatives voted to reverse In recent years, and after the Snowden revelations, there has been a FCC’s regulations that prevented Internet Service Providers (ISPs) significant movement in the web from organizations, policymakers from selling users web-browsing data without their explicit con- and individuals to enhance the privacy awareness among users. sent [14]. Apart from the profit-oriented data collection, ad-related As a consequence, more and more publishers support TLS in their tracking techniques have been also exploited by agencies (as proven websites, and vendors provide privacy and anonymity tools, such after the Snowden revelations [36]) to increase the effectiveness of as secure VPNs or Tor onions, to cover the need of users for privacy- their mass surveillance mechanisms. preserving web browsing. But is the sporadic appliance of such tools This intensive data collection and use beyond the control of the enough to provide privacy? end-users, in some cases has been characterized as pervasive or even In this paper, we describe two privacy-breaching threats against intruding, thus, raising significant privacy concerns [22, 36]. These users accessing the Internet over a secure VPN. The breaches are privacy concerns have made Internet users more privacy-aware [4], made possible through Cookie Synchronization, nowadays widely leading browser vendors to adopt more privacy preserving features used by third parties for advertisement and tracking purposes. The in their products (e.g., anti-tracker solutions [6, 10], integrated generated privacy leaks can be used by a snooping entity such as VPN channels [28], etc.). Therefore, it is easy to see why a rapidly an ISP, to re-identify a user in the web and reveal their browsing increasing number of users communicate with content providers history even when users are hidden behind a VPN. By probing the (i.e., websites) through secure HTTPS connections [13]. The use of top 12K Alexa sites, we find that 1 out of 13 websites expose their TLS authenticates the remote content provider and guarantees the users to these privacy leaks. integrity and confidentiality of any sensitive information that is transmitted from and to the web server. In this way, no third party CCS CONCEPTS located between the two ends can monitor the content the user browses and, thus, what their interests and preferences are. • Security and privacy → Pseudonymity, anonymity and un- Additionally, there are users who add another layer of protection traceability; Web protocol security; • Information systems by redirecting all of their traffic through a Virtual Private Network → Online advertising; (VPN) connection (or Tor-like onion chains) [26, 32]. By using KEYWORDS such secure tunneling, users can hide their actual IP address (and consequently their geolocation), thus preserving their anonymity. Cookie Synchronization, TLS browsing session, User Tracking, Web Thereby, users can prevent not only the ISP from monitoring what Privacy, Anonymity they browse, but also the web server itself from learning where the ACM Reference Format: user is located both in the network, and geographically. Panagiotis Papadopoulos, Nicolas Kourtellis, and Evangelos P. Markatos. In literature, there have been several attacks recorded against 2018. Exclusive: How the (synced) Cookie Monster breached my encrypted both HTTPS and VPN-secured traffic. Regarding HTTPS,16 in[ ] the VPN session. In EuroSec’18: 11th European Workshop on Systems Security authors present a classifier with which an eavesdropper can conduct , April 23–26, 2018, Porto, Portugal. ACM, New York, NY, USA, 6 pages. traffic fingerprinting to estimate what the user browses basedon https://doi.org/10.1145/3193111.3193117 specific traffic characteristics such as the size of the web page.In[5], authors present impersonation attacks against TLS by exploiting 1 INTRODUCTION combinations of RSA and DH key exchange, session resumption, In recent years, the rise of targeted advertising and the high rev- and renegotiation. In addition, there are specific vulnerabilities enues it produces (2:7× more revenue and 2× more efficient than found in commercial VPN, such as DNS hijacking and IPv6 leakage non-targeted [2]) has lured data brokers and trackers to collect presented in [33], or the WebRTC bug [34]. Although the above increasingly more user data [17], which can process to produce au- approaches depict the susceptibility of the protocols, they either dience segments and sell them to advertisers [18]. To make matters provide results of specific accuracy, or they mostly rely on particular vulnerabilities of the deployed commercial VPN service. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed In this paper, we prove that, even in the case of a perfectly secured for profit or commercial advantage and that copies bear this notice and the full citation VPN service, the preservation of the users privacy is still not a given. on the first page. Copyrights for third-party components of this work must be honored. Specifically, we show that the so called Cookie Synchronization For all other uses, contact the owner/author(s). EuroSec’18, April 23–26, 2018, Porto, Portugal mechanism that different advertisers increasingly use1 [ , 27] to sync © 2018 Copyright held by the owner/author(s). their cookies for the same users, can compromise the anonymity ACM ISBN 978-1-4503-5652-7/18/04. of the users under a broad range of circumstances. And this can https://doi.org/10.1145/3193111.3193117 happen even when these users are connected to the Internet with a secure VPN. Specifically, in Cookie Synchronization, tokens se- curely stored in TLS cookies, which are able to uniquely identify a user (userID), are being shared with third parties allowing them to re-identify the user across different websites, even if the user hides Snooping ISP Browsing their real IP address. When this Cookie Synchronization procedure sessions example.com takes place over plain HTTP, it is not only the syncing entities (1) that learn the userID of the user, but also a snooping entity such (2) as an internet service provider (ISP), which can follow the user’s (3) (4) browsing habits just by keeping track of the synced cookies. In particular, the contributions of this paper are the following: - We present in full detail the above attack, and show how a last mile observer can snoop parts of a user’s browsing history HTTPS HTTP-CSync through Cookie Synchronization. tracker1.com tracker2.com - We build a weblog analyzer able to parse HTTP(S) requests, detect Cookie Synchronizations and check for possible brows- ing history and ID spilling from secure TLS sessions. Figure 1: High level overview of the TLS session leak. - As a proof of concept, we perform active measurements through A privacy-aware user (1) visits a webpage (example.com) our secure VPN, by probing the top 12K Alexa sites, and ex- over TLS and VPN. (2) It sends tracking information to plore the feasibility and spread of these attacks in the wild. tracker1.com, and receives its cookie over TLS. (3,4) It By fetching the landing pages, we collect a dataset of 440K takes only a HTTP-based Cookie Synchronization (among HTTP(S) requests, and by using it as input to our analyzer, we tracker1.com and tracker2.com) in order to spill user unique show that 1 out of 13 of the most popular websites worldwide identifiers and visited website. Then, a snooping ISPcan expose their users to these two privacy-breaching attacks. re-identify the user just by monitoring the synced cookies, even if their real IP address is hidden. 1.1 Threat model 2.1 What is Cookie Synchronization? In this paper, we assume a curious monitoring entity (e.g., an ISP) which is interested in collecting user data (such as location and Cookie Synchronization, as first presented by Olejnik et al. in[27], browsing patterns or interests), that can afterwards sell to any- is a technique used by third-party domains to bypass the same- one interested (i.e, data management platforms, advertisers or data origin policy [37] and match the different pseudonymous user IDs brokers [9]). Of course, this curious ISP does not want to jeop- they have assigned to the same user. These user IDs are stored in ardize its own users-clients, therefore it infers user data only by cookies on the user-side, and the syncing procedure takes place by passively monitoring the network traffic it routes. In the early years piggybacking the user ID in the URL of a request from the third of HTTPS, which had a small portion of the Internet traffic, this party A included in the visited website W , to another collaborating passive monitoring was a trivial process. However, lately, more than party B. Party A may have its own iframe embedded in website half of the Internet [13] uses encryption and TLS to prevent prying W , or just a single 1 × 1 pixel image (web beacon [23]), which is eyes from accessing the contents of the secure communication. enough to receive a GET image request and redirect the browser to 0 In addition to the popular use of HTTPS from many websites, B s domain with its user ID piggybacked. there are privacy-aware users who tunnel their HTTPS web traffic Cookie Synchronization provides the potential for more effective through secure VPN, in order to not only preserve the privacy of the and persistent user tracking, allowing the tracking entities (i) to content they browse, but also preserve their anonymity.