Feasibility and Real-World Implications of Web Browser History Detection
Total Page:16
File Type:pdf, Size:1020Kb
Feasibility and Real-World Implications of Web Browser History Detection Artur Janc Lukasz Olejnik [email protected] [email protected] ABSTRACT known popular websites, including Web 2.0 applications[7]. Browser history detection through the Cascading Style Sheets Initial academic work and all known demonstrations focused visited pseudoclass has long been known to the academic se- on gathering information about the domain names of known curity community and browser vendors, but has been largely sites and potentially applying it for inferring user-related dismissed as an issue of marginal impact. information. In this paper we present several crucial real-world consid- More recently, CSS-based history detection started to be- erations of CSS-based history detection to assess the feasi- come applied as a powerful component of privacy research, bility of conducting such attacks in the wild. We analyze including work to determine the amount of user-specific in- Web browser behavior and detectability of content returned formation obtainable by ad networks[9] and as part of a via various protocols and HTTP response codes. We develop scheme for deanonymizing social network users[14]. an algorithm for efficient examination of large link sets and However, there has been a notable lack of work exam- evaluate its performance in modern browsers. Compared to ining several crucial aspects of history detection, including existing methods our approach is up to 6 times faster, and is the types of browser-supported protocols and resource types able to detect as many as 30,000 links per second in recent which can be detected, performance considerations, and the browsers on modern consumer-grade hardware. number of users affected by such attacks. Perhaps because We present a web-based system capable of effectively de- of the lack of such large-scale studies no Web browser vendor has yet implemented any kind of protection against history tecting clients' browsing histories and categorizing detected 1 information. We analyze and discuss real-world results ob- detection attacks . tained from 271,576 Internet users. Our results indicate that In this paper, we provide a detailed examination of CSS- at least 76% of Internet users are vulnerable to history de- based history detection techniques and their impact on the tection; for a test of most popular Internet websites we were privacy of Internet users. We provide an overview of exist- able to detect, on average, 62 visited locations. We also ing work, and discuss basic cross-browser implementations demonstrate the potential for detecting private data such of history detection using JavaScript as well as a CSS-only as zipcodes or search queries typed into online forms. Our technique. We evaluate the detectability of resources based results confirm the feasibility of conducting attacks on user on the browser-supported protocols used to retrieve them, privacy using CSS-based history detection and demonstrate analyze the effect of loading content in frames and iframes, that such attacks are realizable with minimal resources. as well as review the impact of HTTP redirects and other response codes. We also demonstrate an optimized algorithm for detect- 1. INTRODUCTION ing visited links and its JavaScript implementation. We pro- Web browsers function as generic platforms for applica- vide detailed performance measurements and compare it to tion delivery and provide various usability enhancements known methods. Our approach is, up to six times faster and performance optimizations, many of which have impli- than known methods, and allows for the examination of up cations for user privacy. One of the earliest such usability to 30,000 links per second on modern hardware. We also improvements was the ability to style links to Web pages vis- provide the first performance analysis of the CSS-only tech- ited by the user differently from unvisited links, introduced nique, demonstrating its value as an efficient, though often by the original version of the CSS standard[2] and quickly neglected, alternative to the scripting approach. adopted by all major Web browsers. This mechanism was Based on our work on a real-world testing system[8], we quickly demonstrated to allow malicious Web authors to de- provide an overview of the design of an efficient history tect which links a client had[1]. detection application capable of providing categorized test Since then, a body of academic work was created on the of various website classes, and realizable with minimal re- topic, describing history detection methods[13] and discussing sources. We discuss approaches for the selection of links the potential to detect visited websites to aid in phishing[16, to be detected, and outline our implemented solution based 21]. Countermeasures against such attacks were proposed, on primary links (as site entry points), secondary resources, including client-side approaches through browser extensions[15] and enumeration elements. and server-side solutions on a per-application basis[17], but such methods have not been adopted by browser vendors or 1Since writing the original draft of this article, we have Web application developers. Simultaneously, several demon- learned of plans to mitigate against such attacks in an up- stration sites have been created to show the ability to detect coming release of the Mozilla Firefox browser[12]. Finally, we analyze history detection results obtained from <style > 271,576 users. We demonstrate that a large majority (76.1%) #foo:visited {background: url(/?yes-foo);} of Internet users are vulnerable to history detection attacks. #bar:link {background: url(/?no-bar);} We analyze the average number of primary and secondary </style > links found for a basic test, and provide examples of privacy- <a id="foo" href="http://foo.org"></a> sensitive data gathered by our system. <a id="bar" href="http://bar.biz"></a> Our results indicate that properly prepared history detec- tion attacks have significant malicious potential and can be targeted against the vast majority of Internet users. Figure 1: Basic CSS Implementation. 2. BACKGROUND attempt to discover any intra-domain resources or attach The CSS visited pseudoclass has been applied to links vis- semantic information to detected links. ited by client browsers since the introduction of the CSS1 standard in an early stage of the Web, in 1996[2]. The fea- ture of applying different styles to "known" links quickly be- 3. ANALYSIS came accepted by users and was recommended by various In order to fully evaluate the implications of CSS-based usability experts[4]. history detection, it is necessary to understand how and The ability to use the visited pseudoclass for detecting when Web browsers apply styles to visited links. In this Web users' browsing history was first reported to browser section we analyze various browser behaviors related to vis- vendors as early as the year 2000[11, 1]. Since then, the ited links, describe an efficient algorithm for link detection technique has been independently rediscovered and disclosed and evaluate its performance in several major browsers2. several times[16], and has become widely known among Web browser developers and the security and Web standards com- 3.1 Basic Implementation munities. In fact, Chapter 5.11.2 of the CSS 2.1 standard[3], CSS-based history detection works by allowing an attacker a W3C recommendation since 1998, discusses the potential to determine if a particular URL has been visited by a for history detection using the :visited pseudoclass, and ex- client's browser through applying CSS styles distinguishing plicitly allows conforming User Agents to omit this function- between visited and unvisited links. The entire state of the ality for privacy reasons, without jeopardizing their compli- client's history cannot be directly retrieved; to glean his- ance with the standard. tory information, an attacker must supply the client with While initial discussions of CSS-based history detection a list of URLs to check and infer which links exist in the were mostly conducted in on-line forums, the issue was also client's history by examining the computed CSS values on disclosed to the academic community and discussed in the the client-side. As noted in [11], there are two basic tech- work of Felden et al. in conjuction with cache-based history niques for performing such detection. sniffing[13]. The CSS-only method, as shown in Figure 1, allows an As a response, Jakobsson and Stamm discussed potential attacker's server to learn which URLs victim's browser con- methods for implementing server-side per-application pro- siders to be visited by issuing HTTP requests for background tection measures[17]; such techniques would have to be im- images on elements linking to visited URLs. A similar, but plemented by every Web-based application and are thus an less known technique is to use the link CSS pseudoclass, extremely unlikely solution to the problem. A viable client- which only applies if the link specified as the element's href side solution was a proposed modification to the algorithm attribute has not been visited; the techniques are comple- for deciding which links are to be considered visited as de- mentary. scribed in [15] and implemented in the SafeHistory exten- This basic CSS functionality can also be accessed within sion[6] for Mozilla Firefox. Unfortunately, no such protec- JavaScript by dynamically querying the style of a link (<a>) tion measures were implemented for other Web browsers, element to detect if a particular CSS style has been applied, and the SafeHistory plugin is not available for more recent as shown in Figure 2. Any valid CSS property can be used Firefox versions. to differentiate between visited and unvisited links. The Other academic work in the area included a scheme for in- scripting approach allows for more flexibility on part of the troducing voluntary privacy-oriented restrictions to the ap- attacker, as it enables fine-grained control over the execution plication of history detection[20].