Protecting Users from Third-Party Web Tracking with TrackingFree Browser
I Do Not Know What You Visited Last Summer: Protecting Users from Third-party Web Tracking with TrackingFree Browser

Xiang Pan (Northwestern University)
Yinzhi Cao (Columbia University)
Yan Chen (Northwestern University)

Abstract—Stateful third-party web tracking has drawn the attention of public media given its popularity among top Alexa web sites. A tracking server can associate a unique identifier from the client side with the private information contained in the referer header of the request to the tracking server, thus recording the client's behavior. Faced with this significant problem, existing works either disable setting tracking identifiers or blacklist third-party requests to certain servers. However, neither of them can completely block stateful web tracking.

In this paper, we propose TrackingFree, the first anti-tracking browser that works by mitigating unique identifiers. Instead of disabling those unique identifiers, we isolate them into different browser principals so that the identifiers still exist but are not unique among different web sites. By doing this, we fundamentally cut off the tracking chain for third-party web tracking. Our evaluation shows that TrackingFree can invalidate all the 647 trackers found in Alexa Top 500 web sites, and we formally verified with Alloy that in the TrackingFree browser, a single tracker can correlate a user's activities on at most three web sites.

I. INTRODUCTION

Stateful third-party web tracking, the practice by which third-party web sites collect private information about web users, has been adopted by more than 90% of Alexa Top 500 web sites [34]. To track a web user, a third-party tracking¹ site first needs to identify the user by a unique string stored in client-side state. Then, the tracking site associates the identifier of the user with the private information, such as the first-party web site domain name, contained in the referer header of the third-party request.

¹ In this paper, unless otherwise stated, third-party tracking refers to stateful third-party tracking.

Faced with the significance of third-party tracking in the wild, researchers have proposed solutions targeting the two steps of third-party tracking: mitigating the unique identifier, e.g., by disabling third-party cookies, or cutting off requests that carry private information, e.g., by blacklisting known tracking servers. However, neither of the approaches can completely protect users from third-party tracking: in addition to browser cookies, a tracker can store a user's unique identifier in many other places in client-side state, such as Flash files [4] [26] [17] and browser caches [26] [4] [17]; meanwhile, blacklist tools depend heavily on the records in their database, and a tracking company can always adopt new domains to track users.

To address the shortcomings of existing anti-tracking approaches, a third-party tracking defense system should achieve the following goals:

• Complete blocking. The system can completely block all existing stateful third-party tracking techniques, such as those that track by browser cookies, caches, HTML5 localStorage and Flash files.
• High function preservation. While blocking third-party tracking, the system should be compatible with existing web sites and web services.
• Low performance overhead. The system should incur affordable overhead compared with that of normal browsing.

In this paper, we propose TrackingFree, the first anti-tracking browser that can completely protect users from stateful third-party tracking practice. Instead of disabling places that store third-party unique identifiers, such as browser cookies and Flash files, TrackingFree automatically partitions client-side state into multiple isolation units (a.k.a., browser principals) so that the identifiers still exist but are not unique any more. Therefore, third-party tracking web sites cannot correlate a user's requests sent from different principals with those identifiers. By comparison, all existing multi-principal browsers [5], [8], [10], [15], [28], [38] aim to protect users from memory attacks and cannot defend against tracking practices.

To summarize, we make the following contributions:

• Anti-tracking Content Allocation Mechanism. To obtain the completeness of anti-tracking capability, TrackingFree partitions client-side state into different browser principals through a novel content allocation mechanism. TrackingFree first allocates initial server-side contents based on the registered domain names, and then allocates derivative server-side contents, such as user-navigated windows/frames and pop-up windows, based on a dynamically generated in-degree-bounded graph with its nodes as browser principals.
• Privacy-preserving Communication. TrackingFree is the first multi-principal browser with isolated client-side state that enables communication among different principals. The [...]

Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author (for reproduction of an entire paper only), and the author's employer if the paper was prepared within the scope of employment.
NDSS '15, 8-11 February 2015, San Diego, CA, USA
Copyright 2015 Internet Society, ISBN 1-891562-38-X
http://dx.doi.org/10.14722/ndss.2015.23163

[...] on the kernel, has isolated persistent storage so the third-party contents in different principals cannot share the same identifiers. We adopt a profile-based isolation mechanism [9] [31] because of its completeness in isolating client-side state and its minimal overhead. Profiles also isolate user preferences, so we propose preference configure to synchronize all user-initiated preferences among all principals. See Section III-D for details.

In addition to the isolation mechanism, another key factor for TrackingFree is the content allocation mechanism. In TrackingFree, the principal manager handles how to allocate contents from the server into different browser principals. Specifically, it dynamically determines how to put different frames into different principals based on user activities, frame properties and principal organization. Principal organization is maintained by the principal backend as a directed graph with maximum in-degree set to two. We claim that TrackingFree's content allocation mechanism can strike a good balance between privacy and compatibility. We will discuss it in detail in Section III-B.

Principal communication also plays an important role in the browser's privacy and compatibility. It can be classified into two categories: explicit communication (e.g., postMessage) and implicit communication (e.g., history information sharing). In TrackingFree, the message policy enforcer and the public history manager handle all principal communication: the former restricts the range of explicit communication for privacy-preserving purposes and the latter provides a secure history sharing channel. We will discuss principal communication in Section III-C.

TrackingFree also gives users the flexibility of controlling the balance between anti-tracking capability and user experience. Specifically, the two components of the domain data manager can decrease the number of principals and share specified domains' sessions among multiple principals, while still achieving the expected privacy. The domain data manager will be discussed in Section III-E.

B. Content Allocation

TrackingFree's content allocation mechanism is composed of two parts: initial content allocation and derivative content allocation. Initial content allocation handles top frames [...]

[...] to store identifiers [35] [4]. We treat plugin objects like other HTML objects: putting them in the same principal as their embedding frames to cut off the identifier sharing channels provided by plugin objects.

2) Derivative Content Allocation: Once a frame is placed in a principal, TrackingFree needs to decide how to allocate its child frames. There are two steps in allocating those child frames. First, TrackingFree decides whether those child frames should stay in the same principal as their parent frames. A short answer is that TrackingFree keeps all non-user-triggered child frames in the principals that their parent frames reside in, and moves all user-triggered cross-site frames to other principals. This process is defined as principal switch. Each frame can be switched at most once in its creation phase. Second, for those frames that need to be moved out of the current principal, TrackingFree selects an existing principal or creates a new one to render them. This process is defined as principal selection.

Principal Switch. There are two intuitive yet extreme principal switch algorithms: keeping all child frames in the current principal (no switch) and making a switch for every child frame. However, keeping all child frames in the same principal allows trackers to collect the user's browsing history; making a switch all [...]

Algorithm 1 Principal Switch Determination Algorithm
Input:
    target : Frame                ▷ target frame
    source : Frame                ▷ source frame
    domain : Domain               ▷ the domain of the principal
    isUserTriggered : Boolean
Output:
    isCrossSiteReq ← (not equal(target.domain, domain))
    isMainFrameNav ← (source.isMainFrame() and isNavigation(source, target))
    if isMainFrameNav & isCrossSiteReq then
        return switch-principal
    else if isCrossSiteReq & isUserTriggered then
        return switch-principal
    else
        return non-switch
    end if
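The principal switch decision of Algorithm 1 can be sketched in Python as follows. This is only an illustrative sketch, not TrackingFree's implementation: the `Frame` dataclass, the `Decision` enum and the `principal_switch` function are hypothetical names, and the algorithm's `isNavigation(source, target)` predicate is assumed to be supplied by the caller as a boolean.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SWITCH_PRINCIPAL = "switch-principal"
    NON_SWITCH = "non-switch"


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for a browser frame (not TrackingFree's real type)."""
    domain: str                  # registered domain of the frame's content
    is_main_frame: bool = False  # True for a top-level (main) frame


def principal_switch(target: Frame, source: Frame, principal_domain: str,
                     is_user_triggered: bool, is_navigation: bool) -> Decision:
    """Mirror the branches of Algorithm 1.

    `is_navigation` stands in for isNavigation(source, target); how the
    browser computes it is outside this sketch (an assumption).
    """
    is_cross_site_req = target.domain != principal_domain
    is_main_frame_nav = source.is_main_frame and is_navigation

    if is_main_frame_nav and is_cross_site_req:
        return Decision.SWITCH_PRINCIPAL
    if is_cross_site_req and is_user_triggered:
        return Decision.SWITCH_PRINCIPAL
    return Decision.NON_SWITCH


# Example: a third-party frame embedded by the page itself (not user-triggered,
# not a main-frame navigation) stays in the current principal.
main = Frame("news.example", is_main_frame=True)
embedded = Frame("tracker.example")
print(principal_switch(embedded, main, "news.example", False, False).value)
# prints "non-switch"
```

In this sketch a switch happens only for cross-site requests, and only when they are either main-frame navigations or user-triggered. That matches the short answer in the text: non-user-triggered child frames stay with their parent's principal.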