
Computer Networks 54 (2010) 2182–2198


An automatic HTTP cookie management system

Chuan Yue, Mengjun Xie, Haining Wang *

Department of Computer Science, The College of William and Mary, Williamsburg, VA 23187

Article history: Received 2 September 2009; Received in revised form 25 February 2010; Accepted 15 March 2010; Available online 20 March 2010.

Responsible Editor: Neuman de Souza

Keywords: Web HTTP cookie; Security; Privacy; Web browsing

Abstract: HTTP cookies have been widely used for maintaining session states, personalizing, authenticating, and tracking behaviors. Despite their importance and usefulness, cookies have raised public concerns on the Internet because they can be exploited by third-parties to track user behaviors and build user profiles. In addition, stolen cookies may also incur severe security problems. However, current Web browsers lack secure and convenient mechanisms for cookie management. A cookie management scheme, which is easy to use and has minimal privacy risk, is in great demand; but designing such a scheme is a challenge. In this paper, we conduct a large scale HTTP cookie measurement and introduce CookiePicker, a system that can automatically validate the usefulness of cookies from a Web site and set the cookie usage permission on behalf of users. CookiePicker helps users achieve the maximum benefit brought by cookies, while minimizing the possible privacy and security risks. We implement CookiePicker as an extension to Firefox, and obtain promising results in the experiments.

(c) 2010 Elsevier B.V. All rights reserved.

1. Introduction

HTTP cookies, also known as Web cookies or just cookies, are small parcels of text sent by a server to a web browser and then sent back unchanged by the browser if it accesses that server again [1]. Cookies were originally designed to carry information between servers and browsers so that a stateful session can be maintained within the stateless HTTP protocol. For example, web sites use cookies to keep track of a user's shopping basket. Cookies make web applications much easier to write, and thereby have gained a wide range of usage since their debut in 1995. In addition to maintaining session states, cookies have also been widely used for personalizing, authenticating, and tracking user behaviors.

Despite their importance and usefulness, cookies have been of major concern for privacy. As pointed out by Kristol in [2], the ability to monitor browsing habits, and possibly to associate what you have looked at with who you are, is the heart of the privacy concern that cookies raise. For example, a lawsuit alleged that DoubleClick Inc. used cookies to collect web users' personal information without their consent [3]. Moreover, vulnerabilities of web applications or web browsers can be exploited by attackers to steal cookies directly, leading to severe security and privacy problems [4-7].

As the general public has become more aware of cookie privacy issues, a few privacy options have been introduced into Web browsers to allow users to define detailed policies for cookie usage either before or during visiting a Web site. However, these privacy options are far from enough for users to fully utilize the convenience brought by cookies while limiting the possible privacy and security risks. What makes it even worse is that most users do not have a good understanding of cookies and often misuse or ignore these privacy options [8].

Using cookies can be both beneficial and harmful. The ideal cookie-usage decision for a user is to enable and store useful cookies, but disable and delete harmful cookies. It has long been a challenge to design effective cookie management schemes that can help users make the ideal cookie-usage decision. On one hand, determining whether

* Corresponding author. Tel.: +1 757 221 3457. E-mail addresses: [email protected] (C. Yue), [email protected] (M. Xie), [email protected] (H. Wang).

some cookies are harmful is almost impossible, because very few web sites inform users how they use cookies. The Platform for Privacy Preferences Project (P3P) [9] enables web sites to express their privacy practices, but its usage is too low to be a practical solution. On the other hand, determining whether some cookies are useful is possible, because a user can perceive inconvenience or differences if some useful cookies are disabled. For instance, if some cookies are disabled, online shopping may be blocked or preference setting cannot take effect. However, current web browsers only provide a method, which asks questions and prompts options to users, for making a decision on each incoming cookie. Such a method is costly [10] and very inconvenient to users.

In this paper, we first conduct a large scale cookie measurement for investigating the current cookie usage on various web sites. Our major measurement findings show that the pervasive usage of persistent cookies and their very long lifetimes clearly highlight the demand for removing useless persistent cookies to reduce the potential privacy and security risks. Then we present CookiePicker, a system that automatically makes cookie usage decisions on behalf of a web user. CookiePicker enhances the cookie management for a web site by using two processes: a training process to mark cookie usefulness and a tuning process to recover possible errors. CookiePicker uses two complementary algorithms to effectively detect HTML page difference online, and we believe that these two algorithms have the potential to be used by other online tools and applications. Based on the two HTML page difference detection algorithms, CookiePicker identifies those cookies that cause perceivable changes on a web page as useful, while simply classifying the rest as useless. Subsequently, CookiePicker enables useful cookies but disables useless cookies. All the tasks are performed without user involvement or even notice.

Although it is debatable whether defining useful cookies as those that lead to perceivable changes in retrieved web pages is the best choice, so far this definition is the most reasonable measure at the browser side and it is also used in [11]. The reasons mainly lie in that very few web sites tell users the intention of their cookie usage, P3P usage is still very low, and many web sites use cookies indiscriminately [12].

We implement CookiePicker as a Firefox web browser extension, and validate its efficacy through live experiments over a variety of web sites. Our experimental results demonstrate the distinct features of CookiePicker, including (1) fully automatic decision making, (2) high accuracy on decision making, and (3) very low running overhead.

The remainder of this paper is structured as follows. Section 2 presents the background of cookies. Section 3 shows our cookie measurement results. Section 4 describes the design of CookiePicker. Section 5 details the two HTML page difference detection algorithms used by CookiePicker. Section 6 presents the implementation of CookiePicker and its performance evaluation. Section 7 discusses the potential evasions against CookiePicker as well as some concerns about using CookiePicker. Section 8 surveys related work, and finally, Section 9 concludes the paper.

2. Background

HTTP cookies allow an HTTP-based service to create stateful sessions that persist across multiple HTTP transactions [13]. When a server receives an HTTP request from a client, the server may include one or more Set-Cookie headers in its response to the client. The client interprets the Set-Cookie response headers and accepts those cookies that do not violate its privacy and security rules. Later on, when the client sends a new request to the original server, it will use the Cookie header to carry the cookies with the request [14].

In the Set-Cookie response header, each cookie begins with a NAME=VALUE pair, followed by zero or more semicolon-separated attribute-value pairs. The NAME=VALUE pair contains the state information that a server attempts to store at the client side. The optional attributes Domain and Path specify the destination domain and the targeted URL path for a cookie. The optional attribute Max-Age determines the lifetime of a cookie, and a client should discard the cookie after its lifetime expires.

In general, there are two different ways to classify cookies. Based on the origin and destination, cookies can be classified into first-party cookies, which are created by the web site we are currently visiting; and third-party cookies, which are created by a web site other than the one we are currently visiting. Based on lifetime, cookies can be classified into session cookies, which have zero lifetime and are stored in memory and deleted after the close of the web browser; and persistent cookies, which have non-zero lifetime and are stored on a hard disk until they expire or are deleted by a user. A recent extensive investigation of the use of first-party, third-party, session, and persistent cookies was carried out by Tappenden and Miller [15].

Third-party cookies bring almost no benefit to web users and have long been recognized as a major threat to user privacy since 1996 [2]. Therefore, almost all the popular web browsers, such as Microsoft Internet Explorer and Mozilla Firefox, provide users with the privacy options to disable third-party cookies. Although disabling third-party cookies is a very good start to address privacy concerns, it only limits the profiling of users from third-party cookies [2], but cannot prevent the profiling of users from first-party cookies.

First-party cookies can be either session cookies or persistent cookies. First-party session cookies are widely used for maintaining session states, and pose relatively low privacy or security threats to users due to their short lifetime. Therefore, it is quite reasonable for a user to enable first-party session cookies.

First-party persistent cookies, however, are double-edged swords. As we will show in Section 3, first-party persistent cookies could stay on a user's disk for a few years if not removed. Some cookies perform useful roles such as setting preferences. Some cookies, however, provide no benefit but pose serious privacy and security risks to users. For instance, first-party persistent cookies can be used to track the user activity over time by the original web site, proxies, or even third-party services. Moreover, first-party persistent cookies could be stolen or manipulated by three

Fig. 1. Internet Explorer 7 advanced privacy settings.

kinds of long-standing attacks: (1) cross-site scripting (XSS) attacks that exploit web applications' vulnerabilities [16,6,17], (2) attacks that exploit web browser vulnerabilities [18,19,7], and (3) attacks launched by various malware and browser hijackers, such as malicious browser extensions, that could directly steal and control cookies [20-22]. These attacks can easily bypass the same origin policy [23] of modern web browsers, which protects cookies stored by one origin from being accessed by a different origin. As an example, cookie-stealing XSS vulnerabilities were recently found even in Google's hosting services [24,25]; and flaws in Internet Explorer 7 and Firefox could enable attackers to steal and manipulate cookies stored on a PC's hard drive [26].

Disabling third-party cookies (both session and persistent) and enabling first-party session cookies have been supported by most web browsers. The hardest problem in cookie management is how to handle first-party persistent cookies. Currently web browsers only have very limited functions such as automatically accepting or blocking all first-party persistent cookies, or prompting each of them to let users manually make decisions. Fig. 1 shows the advanced privacy settings of Internet Explorer 7, in which the functions provided to control first-party persistent cookies are very cumbersome and impractical to use. Therefore, the focus of this paper is on first-party persistent cookies and how to automatically manage the usage of first-party persistent cookies on behalf of a user.(1) Instead of directly addressing XSS and various web browser vulnerabilities, CookiePicker reduces cookie privacy and security risks by removing useless first-party cookies from a user's hard disk. Here we assume that the hosting web site is legitimate, since it is worthless to protect the cookies of a malicious site.

3. HTTP cookie measurement

To comprehensively understand the usage of HTTP cookies over the Internet, we conduct a large scale measurement study in December 2009. We choose over five thousand web sites from directory.google.com, an online web site directory containing millions of web sites categorized by topic. Then, we use a customized web page retrieval tool, wget [27], to identify those web sites that set cookies at the client side. We only consider first-party cookies, including both session cookies and persistent cookies, in this measurement study.

3.1. Web site selection and crawling

Since there are numerous web sites on the Internet, we select the web sites for the measurement study based on the following two requirements: diversity and representativeness. As with [28], the selection pool we use is directory.google.com, in which web sites are organized into hierarchical categories and listed in Google page-rank order indicating the relevance to the classified category. There are fifteen top-level categories in directory.google.com. Our selection covers thirteen of them, excluding the categories "world" and "regional". These two are avoided due to the existence of many non-English web sites. Since a top-level category has multiple sub-categories which also have further sub-categories, each top-level category may consist of hundreds of thousands of web sites. Thus, we only choose the web sites listed in the top-level categories and their immediate sub-categories. To select representative web sites, we filter out those web sites with page-rank less than 0.25.(2) There may exist duplicate web sites inside a category or between two categories. We first remove the duplicate intra-category web

(1) The design of CookiePicker and its decision algorithms can be easily applied to other kinds of cookies as well if needed.
(2) Page-rank value is between 0 and 1. The bigger, the more relevant.
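The core of the crawl boils down to recording Set-Cookie response headers and classifying each cookie by lifetime, as described in Section 2. The following is a minimal sketch using Python's standard library; the header values are illustrative examples, not data from the study:

```python
from http.cookies import SimpleCookie

def classify_set_cookie(header_value):
    """Parse one Set-Cookie header value and label each cookie as a
    session cookie (no Max-Age/Expires) or a persistent cookie."""
    jar = SimpleCookie()
    jar.load(header_value)
    labels = {}
    for name, morsel in jar.items():
        # A cookie is persistent only if it carries an explicit lifetime.
        persistent = bool(morsel["max-age"]) or bool(morsel["expires"])
        labels[name] = "persistent" if persistent else "session"
    return labels

# Illustrative headers in the NAME=VALUE; attribute=value form.
print(classify_set_cookie("SID=abc123; Path=/; Max-Age=31536000"))
# {'SID': 'persistent'}
print(classify_set_cookie("cart=empty; Path=/shop"))
# {'cart': 'session'}
```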

Table 1. Statistics of the selected web sites and their usage of HTTP cookies.

Category         Web sites    Cookies       Session only   Persistent only   Both
Arts                 342      113 (33%)           55              21           37
Business             602      265 (44%)          181              28           56
Computers            512      190 (37%)          108              33           49
Games                151       51 (34%)           19              12           20
Health               303      148 (49%)           82              17           49
Home                 324      157 (48%)           83              23           51
News                 483      176 (36%)           86              51           39
Recreation           351      173 (49%)           86              36           51
Reference            308      134 (44%)           69              17           48
Science              537      215 (40%)          118              48           49
Shopping             461      290 (63%)           93              67          130
Society              214       78 (36%)           35              17           26
Sports               673      287 (43%)          152              51           84
Multi-category       132       64 (48%)           23              24           17
Total              5,393    2,341 (43%)        1,190             445          706
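As a quick arithmetic check on Table 1, in every row the session-only, persistent-only, and both counts sum to the cookie-setting count, and each percentage is that count over the row's site total. For a few sample rows:

```python
# Sample rows from Table 1: (category, total sites, sites setting
# cookies, session only, persistent only, both session and persistent).
rows = [
    ("Arts",     342,  113,   55,  21,  37),
    ("Shopping", 461,  290,   93,  67, 130),
    ("Total",   5393, 2341, 1190, 445, 706),
]
for name, sites, with_cookies, sess_only, pers_only, both in rows:
    # The three disjoint groups cover all cookie-setting sites in the row.
    assert sess_only + pers_only + both == with_cookies
    print(f"{name}: {with_cookies / sites:.0%} of sites set cookies")
# Arts: 33% of sites set cookies
# Shopping: 63% of sites set cookies
# Total: 43% of sites set cookies
```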

sites and then move all the duplicate inter-category web sites into a new category, "Multi-category". After filtering and removing, the total number of unique web sites chosen for our study is 5,393. The number of web sites in each category is listed in Table 1, where the second column lists the total number of web sites in the corresponding category and the third column shows the number and percentage of the web sites that set cookies at the client side. The fourth, fifth, and sixth columns of Table 1 show the number of the web sites that set only session cookies, only persistent cookies, and both session and persistent cookies, respectively.

We instruct the customized wget to crawl the selected web sites and save the corresponding cookies carried in the HTTP response headers. To simulate a user surfing the web, we turn on the recursive option in wget so that wget can recursively retrieve web pages. The maximum depth of the recursion level is set to two. We instruct wget to only save first-party cookies. To avoid the crawling being blocked or running too long, we use a random wait time varied from zero to two seconds between consecutive retrievals, and limit the crawling time on a single web site to within six minutes.

Fig. 2. CDFs of the number of all cookies, the number of session cookies, and the number of persistent cookies per web site, respectively.

3.2. Measurement results

After web crawling, we find that 2,341 (43%) web sites in total have set at least one first-party cookie at the client side. The average number of retrieved unique pages per web site for these 2,341 web sites is 538. The percentage of web sites setting cookies in each category varies from 33% to 63%, as shown in Table 1. These numbers are conservative since wget cannot obtain the cookies set by the web pages that require a user login or carried by the client-side JavaScript. Even so, the results clearly show the pervasive cookie usage among various types of web sites.

For those web sites that use cookies, we compute the CDFs (Cumulative Distribution Functions) of the number of cookies, the number of session cookies, and the number of persistent cookies per web site, respectively, and draw them in Fig. 2. Note that the X-axis is in logarithmic scale. We are interested in these numbers because web browsers usually set limits on the total number of cookies and the number of cookies per domain, in order to save memory and disk space on a user's computer. For example, the maximum default number of cookies per domain and the maximum number of total cookies are set to 50 and 1,000 respectively in Firefox. From Fig. 2, we can see that about 99% of the web sites set less than 10 cookies at the client side and none of them sets more than 40 cookies. This indicates that all web sites under study follow the basic rule that a web site should not set too many cookies on a client.

Web browsers usually also set a limit on the size of a cookie, which is mainly determined by the length of the cookie value. For example, Firefox limits the length of a cookie NAME/VALUE pair up to 4,096 bytes. Fig. 3 depicts the CDFs of cookie size in bytes for all cookies, session cookies, and persistent cookies, respectively. Around 94% of cookies are less than 100 bytes and none of them is greater than 4,096 bytes.

If cookies are enabled while browsing a web site, the path of the URL will determine which cookies are transmitted together with the HTTP request to the server. If the Path attribute of a cookie matches the prefix of the URL path, the cookie previously set will be sent back. We examine the


Fig. 3. CDFs of cookie size in bytes for all cookies, session cookies, and persistent cookies, respectively.

Fig. 5. CDF of persistent cookie lifetime.
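The Path rule described above can be approximated by a simple prefix test. This sketch is ours and is simplified: the actual cookie path-match (later standardized in RFC 6265) additionally requires the match to end at a "/" boundary, which we ignore here:

```python
def cookie_applies(cookie_path, request_path):
    """Simplified path match: a stored cookie accompanies a request
    only when its Path attribute is a prefix of the request URL path."""
    return request_path.startswith(cookie_path)

# A cookie whose Path is the root "/" matches every request to the
# site, which is why near-universal root paths mean those cookies
# ride on all requests to their sites.
assert cookie_applies("/", "/any/page.html")
assert cookie_applies("/shop", "/shop/cart")
assert not cookie_applies("/shop", "/news/today")
```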

Path depth for cookies, session cookies, and persistent cookies, respectively, and draw their CDFs in Fig. 4. Surprisingly, nearly 90% of cookies have the root "/" as their Path attribute values, which implies that these cookies will be sent back to the original web sites for any URL requests to those sites. Although some cookies may need to be sent back to the web servers for any URL on the servers, the abnormally high percentage suggests that the Path attributes of many cookies may not be appropriately set. As a consequence of this inappropriate setting, many requests carry cookies that are functionally unnecessary: the cookies neither affect responses nor make a difference to server states. Wide-spread indiscriminate usage of cookies not only impedes many optimizations such as the content delivery optimizations studied in [12], but also increases the risk of cookie stealing.

Fig. 4. CDFs of cookie PATH depth for all cookies, session cookies, and persistent cookies, respectively.

We are especially interested in the lifetime of persistent cookies, which can be calculated from the cookie attribute Max-Age specified in the Set-Cookie HTTP response header. We compute the lifetime for all persistent cookies and draw its CDF in Fig. 5 (the X-axis is log-scaled). There are a few hikes in the curve, among which the lifetimes corresponding to one year (365 days) and 30 years are most evident. Clearly, over 60% of persistent cookies are set to expire after one year or longer. We also find that seven web sites set their cookies to expire in the year 9999, nearly eight thousand years from now! The mean and median lifetimes of persistent cookies are 38 years and one year, respectively.

In summary, our major measurement findings show that cookies have been widely used by web sites. Although the simple cookie collection tool we use cannot retrieve the web pages that require login and cannot acquire the cookies set by JavaScript, we still find that about 43% of the total selected web sites use either session cookies, persistent cookies, or both. Moreover, around 21% of the total selected web sites use persistent cookies, and the majority of persistent cookies have lifetimes longer than one year. Therefore, the pervasive usage of persistent cookies and their very long lifetimes clearly highlights the demand for removing those useless persistent cookies to reduce potential privacy risks.

We did a similar measurement study in December 2006 [29]. The procedure and tool for selecting and crawling web sites used in that study are the same as in our current study. Although we cannot directly compare the results of the two studies due to the differences of web sites and crawling paths, we find that the findings mentioned above hold in the 2006 study. For example, in the 2006 study, about 37% of the total (5,211) selected web sites use cookies and about 18% of the total selected web sites use persistent cookies. Similarly, more than 60% of persistent cookies have lifetimes longer than one year in the 2006 study.

Recently Tappenden and Miller [15] did an extensive study on cookie deployment and usage. Some of their study results have similar implications as ours. For example, their study shows that 67.4% (66,031) of the 98,006 web sites use cookies and 83.8% (55,130) of those 66,031 sites use first-party cookies. In addition, they observed that approximately 50.4% of all persistent cookies surveyed are set to live for over one year. The discrepancy in numbers between their study and ours can be attributed to the differences in the sample set and study methods. For example, they chose web sites from Alexa while we chose web sites from the Google directory; they instrumented the Firefox browser while we instrumented wget.

4. CookiePicker design

The design goal of CookiePicker is to effectively identify the useful cookies of a web site, and then disable the return of those useless cookies back to the web site in the subsequent requests and finally remove them. A web page is automatically retrieved twice by enabling and disabling some cookies. If there are obvious differences between the two retrieved results, we classify the cookies as useful; otherwise, we classify them as useless. CookiePicker enhances the cookie management for a web site by two processes: forward cookie usefulness marking and backward error recovery. We define these two processes and detail the design of CookiePicker in the following.

Definition 1. FORward Cookie Usefulness Marking (FORCUM) is a training process, in which CookiePicker determines cookie usefulness and marks certain cookies as useful for a web site.

Definition 2. Backward error recovery is a tuning process, in which wrong decisions made by CookiePicker in the FORCUM process may be adjusted automatically or manually for a web site.

4.1. Regular and hidden requests

A typical web page consists of a container page that is an HTML text file, and a set of associated objects such as stylesheets, embedded images, scripts, and so on. When a user browses a web page, the HTTP request for the container page is first sent to the web server. Then, after receiving the corresponding HTTP response for the container page, the web browser analyzes the container page and issues a series of HTTP requests to the web server for downloading the objects associated with the container page. The HTTP requests and responses associated with a single web page view are depicted by the solid lines (1) and (2) in Fig. 6, respectively.

Fig. 6. HTTP requests/responses in a single web page view.

Web page contents coming with the HTTP responses are passed into the web browser layout engine for processing. Fig. 7 depicts the data flow inside Gecko [30], one of the most popular web browser layout engines, which is used in all Mozilla-branded software and its derivatives. When a web page comes into Gecko, its container page is first parsed and built into a tree. The structure of the tree follows the W3C Document Object Model (DOM); thus the tree is known as the DOM tree. Next, the data from the DOM tree are put into abstract boxes called frames by combining the information from stylesheets. Finally, these frames and the associated web objects are rendered and displayed on a user's screen.

In order to identify the cookie usefulness for a web page, CookiePicker compares two versions of the same web page: the first version is retrieved with cookies enabled and the second version is retrieved with cookies disabled. The first version is readily available to CookiePicker in the user's regular web browsing window. CookiePicker only needs to retrieve the second version of the container page. Similar to Doppelganger [11], CookiePicker utilizes the ever increasing client side spare bandwidth and computing power to run the second version. However, unlike Doppelganger, CookiePicker neither maintains a fork window nor mirrors the whole user session. CookiePicker only retrieves the second version of the container page by sending a single hidden HTTP request. As shown in Fig. 6, line (3) is the extra hidden HTTP request sent by CookiePicker for the second version of the container page, and line (4) represents the corresponding HTTP response. In the rest of the paper, we simply refer to the requests and responses represented by the solid lines (1) and (2) of Fig. 6 as regular requests and responses, and refer to the extra request and response represented by the dashed lines (3) and (4) of Fig. 6 as the hidden request and response.

4.2. Forward cookie usefulness marking

As shown in Fig. 8, the FORCUM process consists of five steps: regular request recording, hidden request sending, DOM tree extraction, cookie usefulness identification, and cookie record marking.

When visiting a web page, a user issues regular requests and then receives regular responses. At the first step, CookiePicker identifies the regular request for the container page and saves a copy of its URI and header information. CookiePicker needs to filter out the temporary redirection or replacement pages and locate the real initial container document page.

At the second step, CookiePicker takes advantage of the user's think time [31] to retrieve the second copy of the container page, without causing any delay to the user's regular browsing. Specifically, right after all the regular responses are received and the web page is rendered on the screen for display, CookiePicker issues the single hidden request for the second copy of the container page. In the hidden request, CookiePicker uses the same URI as the one saved in the first step. It only modifies the "Cookie" field of the request header by removing a group of cookies, whose usefulness will be tested. The hidden request can be transmitted in an asynchronous mode so that it will not block any regular browsing functions. Then, upon the arrival of the hidden response, an event handler will be

Fig. 7. Data flow inside Gecko.
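In the second FORCUM step above, the hidden request reuses the saved URI and differs from the regular request only in its Cookie header, from which the cookie group under test is removed. A sketch of that header rewrite, with a made-up header string and cookie names for illustration:

```python
def strip_cookie_group(cookie_header, names_under_test):
    """Rebuild a Cookie request header with the cookies whose
    usefulness is being tested removed, keeping the other pairs."""
    kept = [pair for pair in cookie_header.split("; ")
            if pair.split("=", 1)[0] not in names_under_test]
    return "; ".join(kept)

regular = "SID=abc; theme=dark; tracker=xyz"
hidden = strip_cookie_group(regular, {"tracker"})
print(hidden)  # SID=abc; theme=dark
```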

Fig. 8. The five-step FORCUM process.

triggered to process it. Note that the hidden request is only used to retrieve the container page, and the received hidden response will not trigger any further requests for downloading the associated objects. Retrieving the container page only induces very low overhead to CookiePicker.

At the third step, CookiePicker extracts the two DOM trees from the two versions of the container page: one for the regular response and the other for the hidden response. We call these two DOM trees the regular DOM tree and the hidden DOM tree, respectively. The regular DOM tree has already been parsed by the web browser layout engine and is ready for use by CookiePicker. The hidden DOM tree, however, needs to be built by CookiePicker; and CookiePicker should build the hidden DOM tree using the same HTML parser as the web browser. This is because in practice HTML pages are often malformed. Using the same HTML parser guarantees that the malformed HTML pages are treated the same as before, while the DOM tree is being constructed.

At the fourth step, CookiePicker identifies cookie usefulness by comparing the differences between the two versions of the container page, whose information is well represented in the two DOM trees. To make a right cookie usefulness decision, CookiePicker uses two complementary algorithms, considering both the internal structural difference and the external visual content difference between the two versions. Only when obvious structural difference and visual content difference are detected will CookiePicker decide that the corresponding cookies that are disabled in the hidden request are useful. The two algorithms will be detailed in Section 5.

At the fifth step, CookiePicker will mark the cookies that are classified as useful in the web browser's cookie jar. An extra field "useful" is introduced to each cookie record. At the beginning of the FORCUM process, a false value is assigned to the "useful" field of each cookie. In addition, any newly-emerged cookies set by a web site are also assigned false values in their "useful" fields. During the FORCUM process, the value of the field "useful" can only be changed in one direction, that is, from "false" to "true" if some cookies are classified as useful. Later on, when the values of the "useful" field for the existing cookies are relatively stable for the web site, those cookies that still have "false" values in their "useful" fields will be treated as useless and will no longer be transmitted to the corresponding web site. Then, the FORCUM process can be turned off for a while; it will be turned on automatically if CookiePicker finds new cookies appearing in the HTTP responses, or manually by a user if the user wants to continue the training process.

4.3. Backward error recovery

In general, CookiePicker could make two kinds of errors in the FORCUM process. The first kind of error is that useless cookies are misclassified as useful, thereby being continuously sent out to a web site. The second kind of error is that useful cookies are never identified by CookiePicker during the training process, thereby being blocked from a web site.

The first kind of error is solely due to the inaccuracy of CookiePicker in usefulness identification. Such an error will not cause any immediate trouble to a user, but leaves useless cookies increasing privacy risks. CookiePicker is required to make such errors as few as possible so that a user's privacy risk is lowered. CookiePicker meets this requirement via accurate decision algorithms.

The second kind of error is caused by either a wrong usefulness decision or the fact that some cookies are only useful to certain web pages that have not yet been visited during the FORCUM process. This kind of error will cause inconvenience to a user and must be fixed by marking the corresponding cookies as useful. CookiePicker attempts to achieve a very low rate on this kind of error, so that it does not cause any inconvenience to users. This requirement is achieved by two means. On one hand, for those visited pages, the decision algorithms of CookiePicker attempt to make sure that each useful persistent cookie can be identified and marked as useful. On the other hand, since CookiePicker is designed with very low running cost, a longer running period (or periodically running) of the FORCUM process is affordable, and thus the training accuracy can be further improved.

CookiePicker provides a simple recovery button for backward error recovery in the tuning process. In case a user notices some malfunctions or some strange behaviors on a web page, the cookies disabled by CookiePicker in this particular web page view can be re-marked as useful via a simple button click. Note that once the cookie set of a web site becomes stable after the training and tuning processes, those disabled useless cookies will be removed from the web browser's cookie jar. CookiePicker also provides a user interface, allowing a user to view those useless cookies and confirm the deletion action. We will introduce this interface in Section 6.

5. HTML page difference detection

in trees T and T', respectively. Tree edit distance is defined as the minimum cost sequence of edit operations to transform T into T' [33]. The three edit operations used in the transformation are: inserting a node into a tree, deleting a node from a tree, and changing one node of a tree into another node. Disregarding the order in which the edit operations are applied, the transformation from T to T' can be described by a mapping. The formal definition of a mapping [33] is as follows:

Definition 3. A mapping from T to T' is defined by a triple (M, T, T'), where M is any set of pairs of integers (i, j) satisfying:

1. 1 <= i <= |T|, 1 <= j <= |T'|;
2. For any two pairs (i1, j1) and (i2, j2) in M,
   (a) i1 = i2 iff j1 = j2;
   (b) i1 < i2 iff j1 < j2;
   (c) T[i1] is an ancestor (descendant) of T[i2] iff T'[j1] is an ancestor (descendant) of T'[j2].

In this section, we present two complementary mecha- Intuitively, condition (2a) ensures that each node of nisms for online detecting the HTML web page differences both trees appears at most once in the mapping, and con- between the enabled and disabled cookie usages. In the dition (2b) and (2c) ensure that the structural order is pre- first mechanism, we propose a restricted version of Simple served in the mapping. Tree Matching algorithm [32] to detect the HTML docu- The algorithm presented by Tai [33] solves the tree edit 0 2 0 2 ment structure difference. In the second mechanism, we distance problem in time OðjTjjT jjDj jD j Þ, where jDj 0 propose a context-aware visual content extraction algo- and jD j are the maximum depths, respectively, of T and 0 rithm to detect the HTML page visual content difference. T . Zhang et al. [34] further improved the result via a sim- We call these two mechanisms as Restricted Simple Tree ple fast dynamic programming algorithm in time 0 0 0 0 Matching (RSTM) and Context-aware Visual Content OðjTjjT jminfjDj; jLjg minfjD j; jL jgÞ, where jLj and jL j 0 Extraction (CVCE), respectively. Intuitively, RSTM focuses are the numbers of leaves in T and T , respectively. on detecting the internal HTML document structure differ- Since the solution of the generic tree edit distance prob- ence, while CVCE focuses on detecting the external visual lem has high time complexity, researchers have investi- content difference perceived by a user. In the following, gated the constrained versions of the problem. By we present these two mechanisms and explain how they imposing conditions on the three edit operations men- are complementarily used. tioned above, a few different tree distance measures have been proposed and studied in the literature: alignment dis- tance [35], isolated subtree distance [36], top-down distance 5.1. Restricted simple tree matching [37,32], and bottom-up distance [38]. 
The description and comparison of these algorithms are beyond the scope of As mentioned in Section 4, in a user’s web browser, the this paper, see [39] and [38] for details. content of an HTML web page is naturally parsed into a DOM tree before it is rendered on the screen for display. 5.1.2. Top-down distance Therefore, we resort to the classical measure of tree edit Because RSTM belongs to the top-down distance ap- distance introduced by Tai [33] to quantify the difference proach, we review the definition of top-down distance between two HTML web pages. Since the DOM tree parsed and explain why we choose this measure for our study. from the HTML web page is rooted (document node is the only root), labeled (each node has node name), and ordered Definition 4. A mapping ðM; T; T0Þ from T to T0, is top-down (the left-to-right order of sibling nodes is significant), we if it satisfies the condition that for all i; j such that T½i and only consider rooted labeled ordered tree. In the following, T0½j are not the roots, respectively, of T and T0: we will first review the tree edit distance problem; then we will explain why we choose top-down distance and de- if pair ði; jÞ2M thenðparentðiÞ; parentðjÞÞ 2 M: tail the RSTM algorithm; and finally we will use Jaccard The top-down distance problem was introduced by Sel- similarity coefficient to define the similarity metric of a kow [37].In[32], Yang presented a OðjTjjT0jÞ time-com- normalized DOM tree. plexity top-down dynamic programming algorithm, which is named as the Simple Tree Matching (STM) algo- 5.1.1. Tree edit distance rithm. As we mentioned earlier, our goal is to effectively For two rooted labeled ordered trees T and T0, let jTj and detect noticeable HTML web page difference between the jT0j denote the numbers of nodes in trees T and T0, and let enabled and disabled cookie usages. 
The measure of top- T½i and T0½j denote the ith and jth preorder traversal nodes down distance captures the key structure difference be- 2190 C. Yue et al. / Computer Networks 54 (2010) 2182–2198 tween DOM trees in an accurate and efficient manner, and trieved twice in a short time, there may exist some fits well to our requirement. In fact, top-down distance has differences between the retrieved contents. For example, been successfully used in a few web-related projects. For if we refresh Yahoo home page twice in a short time, we example, Zhai and Liu [40] used it for extracting structured can often see some different advertisements. For Cookie- data from web pages; and Reis et al. [41] applied it for Picker, such dynamics on a web page are just noises and automatic web news extraction. In contrast, bottom-up should be differentiated from the web page changes distance [38], which although can be more efficient in time caused by the enabled and disabled cookie usages. The complexity ðOðjTjþjT0jÞÞ, falls short of being an accurate advertisement replacements on a web page use different metric [42] and may produce a far from optimal result data items (e.g., images or texts) but they often stay at [43] for HTML DOM tree comparison, in which most of the same location of a web page’s DOM tree. Data items the differences come from the leaf nodes. are mainly represented by lower level nodes of a DOM tree [44]. In contrast, the web page changes caused by en- abling/disabling cookies may introduce structural dissimi- 5.1.3. Restricted simple tree matching larities at the upper level of a DOM tree, especially when Based on the original STM algorithm [32], Fig. 9 illus- the theme of the page is changed. By using the new param- trates RSTM, our restricted version of STM algorithm. Other eter level, the RSTM algorithm restricts the top-down com- than lines 4 to 8 and one new parameter level, our RSTM parison between the two trees to a certain maximum level. 
algorithm is similar to the original STM algorithm. Like Therefore, equipped with the parameter level, RSTM not the original STM algorithm, we first compare the roots of only captures the key structure dissimilarity between two trees A and B. If their roots contain different symbols, DOM trees, but also reduces leaf-level noises. then A and B do not match at all. If their roots contain same The second reason of introducing the new parameter le- 0 symbols, we use dynamic programming to recursively vel is that the OðjTjjT jÞ time complexity of STM is still too compute the number of pairs in a maximum matching be- expensive to use online. Even with C++ implementation, tween trees A and B. Fig. 10 gives two trees, in which each STM will spend more than one second in difference detec- node is represented as a circle with a single letter inside. tion for some large web pages. However, as shown in Sec- According to the preorder traversal, the fourteen nodes in tion 6, the cost of the RSTM algorithm is low enough for tree A are named from N1toN14, and the eight nodes in online detection. tree B are named from N15 to N22. The final result re- The newly-added conditions at line 5 of the RSTM algo- turned by STM algorithm or RSTM algorithm is the number rithm restrict that the mapping counts only if the com- of matching pairs for a maximum matching. For example, pared nodes are not leaves and have visual effects. More STM algorithm will return ‘‘7” for the two trees in Fig. 10, specifically, all the comment nodes are excluded in that and the seven matching pairs are fN1; N15g; fN2; N16g; they have no visual effect on the displayed web page. fN6; N18g; fN7; N19g; fN5; N17g; fN11; N20g, and fN12; Script nodes are also ignored because normally they do N22g. not contain any visual elements either. Text content nodes, There are two reasons why a new parameter level is although very important, are also excluded due to the fact introduced in RSTM. 
First, some web pages are very dy- that they are leaf nodes (i.e., having no more structural namic. From the same web site, even if a web page is re- information). Instead, text content will be analyzed in our Context-aware Visual Content Extraction (CVCE) mechanism.
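The restricted matching idea can be sketched in a few lines. The following is a simplified Python sketch, assuming a tree node modeled as (label, [children]); Fig. 9 is the authoritative pseudocode, and this version omits RSTM's visibility filtering (comment, script, and text nodes are not excluded here):

```python
# Sketch of Simple Tree Matching restricted to the upper `level` levels,
# in the spirit of RSTM. Simplification: no filtering of comment/script/
# text nodes. A node is (label, [children]).

def rstm(a, b, level):
    """Number of matching pairs between trees a and b, comparing only
    the upper `level` levels (the roots sit at level 1)."""
    if level == 0 or a[0] != b[0]:
        return 0
    ka, kb = a[1], b[1]
    m, n = len(ka), len(kb)
    # dynamic programming over the two ordered child forests,
    # as in the original Simple Tree Matching algorithm
    w = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w[i][j] = max(w[i - 1][j], w[i][j - 1],
                          w[i - 1][j - 1] + rstm(ka[i - 1], kb[j - 1], level - 1))
    return w[m][n] + 1  # +1 for the matched roots

a = ('html', [('body', [('p', []), ('div', [])])])
b = ('html', [('body', [('p', [])])])
print(rstm(a, b, 3))  # 3: html, body, and p match
```

With level set large enough this degenerates to plain STM; choosing a small level is precisely what suppresses leaf-level noise.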

5.1.4. Normalized top-down distance metric

Since the return result of RSTM (or STM) is the number of matching pairs in a maximum matching, we define the normalized DOM tree similarity metric in Formula (2), based on the Jaccard similarity coefficient given in Formula (1).

J(A, B) = |A ∩ B| / |A ∪ B|,    (1)

NTreeSim(A, B, l) = RSTM(A, B, l) / (N(A, l) + N(B, l) - RSTM(A, B, l)).    (2)

The Jaccard similarity coefficient J(A, B) is defined as the ratio between the size of the intersection and the size of the union of two sets. In the definition of our normalized DOM tree similarity metric NTreeSim(A, B, l), RSTM(A, B, l) is the number of matched pairs returned by calling RSTM on trees A and B over the upper l levels. N(A, l) and N(B, l) are the numbers of non-leaf visible nodes in the upper l levels of trees A and B, respectively. Actually N(A, l) = RSTM(A, A, l) and N(B, l) = RSTM(B, B, l), but N(A, l) and N(B, l) can be computed in O(n) time by simply preorder traversing the upper l levels of trees A and B, respectively.

Fig. 9. The Restricted Simple Tree Matching Algorithm.

Fig. 10. (a) Tree A. (b) Tree B.
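As a worked instance of Formula (2), take the Fig. 10 example, where STM returns 7 matched pairs for trees of 14 and 8 nodes. Plugging the raw node counts into the Jaccard-style ratio is illustrative only, since the real metric counts just the non-leaf visible nodes in the upper l levels:

```python
# Worked instance of Formula (2) with the Fig. 10 numbers: 7 matched
# pairs, trees of 14 and 8 nodes. Raw node counts are used here for
# illustration; N(A,l) and N(B,l) actually count only non-leaf visible
# nodes in the upper l levels.

def ntree_sim(matched, n_a, n_b):
    # Jaccard ratio: matched pairs over the union of counted nodes
    return matched / (n_a + n_b - matched)

print(round(ntree_sim(7, 14, 8), 3))  # 0.467
```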

5.2. Context-aware visual content extraction

The visual contents on a web page can generally be classified into two groups: text contents and image contents. Text contents are often displayed as headings, paragraphs, lists, table items, and so on. Image contents are often embedded in a web page in the form of icons, buttons, backgrounds, flashes, video clips, and so on. Our second mechanism mainly uses text contents, instead of image contents, to detect the visual content difference perceived by users. Two reasons motivate this choice. First, text contents provide the most important information on web pages; HTML mainly describes the structure of text-based information in a document, while image contents often serve as supplements to text contents [45]. In practice, users can block the loading of various images and browse web pages in text-only mode. Second, the similarity between images cannot be trivially compared, while text contents can be extracted and compared easily, as shown below.

On a web page, each text content exists in a specific context. In the DOM tree, the text content is a leaf node and its context is the path from the root to this leaf node. For two web pages, by extracting and comparing their context-aware text contents that are essential to users, we can effectively detect the noticeable HTML web page difference between the enabled and disabled cookie usages. Fig. 11 depicts the recursive algorithm to extract the text content.

Fig. 11. The Text Content Extraction Algorithm.

The contentExtract algorithm traverses the whole DOM tree in preorder in O(n) time. During the preorder traversal, each non-noise text node is associated with its context, resulting in a context-content string; the context-content string is then added into set S. The final return result is set S, which includes all the context-content strings. Note that in lines 2 to 4, only non-noise text nodes qualify for addition to set S. Similar to [46], scripts, styles, obvious advertisement text, date and time strings, and option text in dropdown lists (such as a country list or a language list) are regarded as noise. Text nodes that contain no alphanumeric characters are also treated as noise. All these checks guarantee that we can extract a relatively concise context-content string set from the DOM tree.

Assume S1 and S2 are two context-content string sets extracted from two DOM trees A and B, respectively. To compare the difference between S1 and S2, again based on the Jaccard similarity coefficient, we define the normalized context-content string set similarity metric in Formula (3):

NTextSim(S1, S2) = (|S1 ∩ S2| + s) / |S1 ∪ S2|.    (3)

Formula (3) is a variation [47] of the original Jaccard similarity coefficient. The extra term s in the numerator stands for the number of those context-content strings in S1 and S2 that are not exactly the same but share the same context prefix. Intuitively, between two sets S1 and S2, Formula (3) disregards the difference caused by text content replacement occurring in the same context; it only considers the difference caused by text content appearing in each set's unique context. This minor modification is especially helpful in reducing the noise caused by advertisement text content and other dynamically changing text contents.

5.3. Making decision

As discussed above, to accurately identify useful cookies, CookiePicker has to differentiate the HTML web page differences caused by web page dynamics from those caused by disabling cookies. Assume that tree A is parsed from a web page retrieved with cookies enabled and tree B is parsed from the same web page with cookies disabled. CookiePicker examines these two trees using both algorithms presented above. If the return results of NTreeSim and NTextSim are both less than two tunable thresholds, Thresh1 and Thresh2, respectively, CookiePicker decides that the difference is due to cookie usage. Fig. 12 depicts the final decision algorithm.

Fig. 12. CookiePicker Decision Algorithm.

Note that these two thresholds are internal to CookiePicker, so a regular user does not need to know them. In our experiments (Section 6.2), we set the values of both thresholds to 0.85, and we found that no backward error recovery was needed. We would like to recommend this as a reference value for CookiePicker. However, it is possible to further tune these two thresholds. For example, one approach is to self-adaptively adjust the thresholds based on the number or frequency of a user's backward error recovery actions; the bottom line is that backward error recovery should not cause too much inconvenience to a user. Another approach is to use site-specific thresholds so that the granularity of accuracy can be refined to the site level. In addition, it is also possible to allow users to share fine-tuned site-specific thresholds.

6. System evaluation

In this section, we first briefly describe the implementation of CookiePicker, and then we validate its efficacy through two sets of live experiments.

6.1. Implementation

We implemented CookiePicker as a Firefox extension. Being one of the most popular web browsers, Firefox is very extensible and allows programmers to add new features or modify existing features. Our CookiePicker extension is implemented in about 200 lines of XML user interface definition code, 1,600 lines of JavaScript code, and 600 lines of C++ code. The JavaScript code is used for HTTP request/response monitoring and processing, as well as cookie record management. The HTML page difference detection algorithms are implemented in C++, because a JavaScript version runs very slowly. The C++ code is compiled into a shared library in the form of an XPCOM (Cross-Platform Component Object Model) component, which is accessible to JavaScript code. CookiePicker is a pure Firefox extension and does not make any change to Firefox's source code.

We omit other details and only describe two key interfaces in CookiePicker's implementation: the user interface and the XPCOM component interface. Fig. 13 shows CookiePicker's main user interface. We ported the code of a popular Firefox extension, Cookie Culler [48], and merged it into CookiePicker. Cookie Culler allows a user to access cookie records and manually delete those cookies the user does not need to keep. By integrating this interface, CookiePicker provides a user with the capability to easily view the decisions made by CookiePicker and double-check those useless cookies before they are finally removed from a browser's cookie jar.
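To recap how the text-based pieces fit together, below is a toy Python analogue of what the extension implements in C++ and JavaScript: context-content extraction in the spirit of Fig. 11, Formula (3), and the threshold decision of Fig. 12. All function names and the example pages are illustrative assumptions, not CookiePicker's actual code:

```python
# Toy analogue of Section 5's text-based detection: context-content
# extraction (cf. Fig. 11), Formula (3), and the decision rule of
# Fig. 12. Names and example pages are illustrative only.

def extract(node, path=""):
    """Collect 'context:content' strings from a (tag, payload) tree,
    where payload is a text string (leaf) or a list of child nodes."""
    tag, payload = node
    ctx = path + "/" + tag
    if isinstance(payload, str):  # text leaf; drop non-alphanumeric noise
        return {ctx + ":" + payload} if any(c.isalnum() for c in payload) else set()
    strings = set()
    for child in payload:
        strings |= extract(child, ctx)
    return strings

def ntext_sim(s1, s2):
    """Formula (3): (|S1 & S2| + s) / |S1 | S2|, where s approximates the
    count of strings whose content changed under a shared context prefix."""
    context = lambda cc: cc.split(":", 1)[0]
    shared_ctx = {context(c) for c in s1 - s2} & {context(c) for c in s2 - s1}
    return (len(s1 & s2) + len(shared_ctx)) / len(s1 | s2)

THRESH2 = 0.85  # reference threshold from Section 5.3

# regular page (cookies enabled) vs. hidden page (cookies disabled):
# the personalized history vanishes, while one ad is merely swapped
regular = ('html', [('body', [('p', 'Hello Alice'),
                              ('ul', [('li', 'Viewed shoes'),
                                      ('li', 'Viewed books')]),
                              ('div', 'Ad 1')])])
hidden = ('html', [('body', [('p', 'Hello guest'), ('div', 'Ad 2')])])

score = ntext_sim(extract(regular), extract(hidden))
print(round(score, 3))  # 0.333
print(score < THRESH2)  # True -> difference attributed to cookie usage
```

Note how the same-context ad swap contributes to the s term rather than to the raw set difference, which is exactly the noise-discounting behavior Formula (3) is designed for.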

Fig. 13. CookiePicker user interface.

As shown in Fig. 13, if a user wants, the user can know that these three first-party persistent cookies from ebay.com have been automatically marked as useless and will be deleted by CookiePicker. The user can also make corrections to this result if necessary.

In the XPCOM component interface, two functions are defined as follows; they correspond to CookiePicker's two HTML page difference detection algorithms, respectively:

interface ICookiePickerComponent : nsISupports {
  float nTreeSim(in nsIDOMNode rNode, in nsIDOMNode hNode, in long l);
  float nTextSim(in nsIDOMNode rNode, in nsIDOMNode hNode);
}

These two functions are implemented in C++ and can be accessed by JavaScript code via the ICookiePickerComponent component. For nTreeSim, its three input parameters match exactly those in Fig. 12. For nTextSim, its definition here differs slightly from that in Fig. 12, because the DOM trees are passed in directly and the corresponding context-content string sets are extracted internally.

6.2. Evaluation

We installed CookiePicker on a Firefox version 1.5.0.8 web browser and designed two sets of experiments to validate the effectiveness of CookiePicker in identifying useful first-party persistent cookies. (Porting CookiePicker to recent Firefox versions is quite feasible because CookiePicker uses the standard XPCOM mechanism of Firefox.) The first set of experiments measures the overall effectiveness of CookiePicker and its running time in a generic environment, while the second set focuses on web sites whose persistent cookies are useful, and examines the identification accuracy of CookiePicker on useful persistent cookies. For all the experiments, the regular browsing window enables the use of persistent cookies, while the hidden request disables the use of persistent cookies by filtering them out of the HTTP request header. The two thresholds used in the CookiePicker decision algorithm are both set to 0.85, i.e., Thresh1 = Thresh2 = 0.85. The parameter l for the NTreeSim algorithm is set to 5, i.e., the top five levels of the DOM tree, starting from the body HTML node, are compared by the NTreeSim algorithm.

6.2.1. First set of experiments

From each of the 15 categories listed in Table 1, we randomly choose two web sites that use persistent cookies; in total there are 30 web sites in the first set of experiments. As listed in the first column of Table 2, these 30 web sites are represented as S1 to S30 for privacy reasons. Inside each web site, we first visit over 25 web pages to stabilize its persistent cookies and the "useful" values of the persistent cookies, i.e., until no more persistent cookies of the web site are marked as "useful" by CookiePicker. Then, we count the number of persistent cookies set by the web site and the number of persistent cookies marked as useful by CookiePicker. These two numbers are shown in the second and third columns of Table 2, respectively. Among the 30 web sites, the persistent cookies from five web sites (S1, S6, S10, S16, S27) are marked as "useful" by CookiePicker, and the persistent cookies from the remaining 25 web sites are identified as "useless". In other words, CookiePicker indicates that we can disable the persistent cookies on about 83.3% (25 out of 30) of the tested web sites. To further validate this result, we check the uselessness of the persistent cookies for those 25 web sites through careful manual verification. We find that blocking the persistent cookies of those 25 web sites does not cause any problem to a user. Therefore, none of the classified "useless" persistent cookies is useful, and no backward error recovery is needed.

For the five web sites that have some persistent cookies marked as "useful", we verify the real usefulness of these cookies by blocking their use and then comparing the disabled version with a regular browsing window over 25 web pages in each web site. The result is shown in the fourth column of Table 2. We observe that three cookies from two web sites (S6, S16) are indeed useful. However, for the other three web sites (S1, S10, S27), their persistent cookies are useless but are wrongly marked as "useful" by CookiePicker. This is mainly due to the conservative threshold setting. Currently the values of both thresholds are set to 0.85, i.e., Thresh1 = Thresh2 = 0.85. The rationale behind the conservative threshold setting is that we prefer to have all useful persistent cookies correctly identified, even at the cost of some useless cookies being misclassified as "useful"; thus, the number of backward error recoveries is minimized.

In Table 2, the fifth and sixth columns show the average running time of the detection algorithms and the entire duration of CookiePicker, respectively. The running time of the page difference detection is very short, with an average of 14.6 ms over the 30 web sites. This detection time is roughly the extra CPU computation time introduced by running CookiePicker. The extra memory consumption is also negligible because CookiePicker does not have a large memory requirement. The average identification duration is 2,683.3 ms, which is reasonably short considering that the average think time of a user is about 10 s [31]. Note that web sites S4, S17, and S28 have abnormally high identification durations of about 10 s, which is mainly caused by slow responses from these web sites.

6.2.2. Second set of experiments

Since only two web sites in the first set of experiments have useful persistent cookies, we attempt to further examine whether CookiePicker can correctly identify each useful persistent cookie in a second set of experiments. Because no list exists of web sites whose persistent cookies are really useful to users, we have to locate such web sites manually. Again, we randomly choose 200 web sites that use persistent cookies from the 15 categories listed in Table 1. Note that the 30 web sites chosen in the first set of experiments are not included in these 200 web sites. We manually scrutinize these 200 web sites,

Table 2. Online testing results for thirty web sites (S1 to S30).

Web site   Persistent   Marked useful   Real useful   Detection time (ms)   CookiePicker duration (ms)
S1             2            2               0               8.3                  1,821.6
S2             4            0               0               9.3                  5,020.2
S3             5            0               0              14.8                  1,427.5
S4             4            0               0              36.1                  9,066.2
S5             4            0               0               5.4                    698.9
S6             2            2               2               5.7                  1,437.5
S7             1            0               0              17.0                  3,373.2
S8             3            0               0               7.4                  2,624.4
S9             1            0               0              13.2                  1,415.4
S10            1            1               0               5.7                  1,141.2
S11            2            0               0               2.7                    941.3
S12            4            0               0              21.7                  2,309.9
S13            1            0               0               8.0                    614.9
S14            9            0               0              11.9                  1,122.4
S15            2            0               0               8.5                    948.0
S16           25            1               1               5.8                    455.9
S17            4            0               0               7.5                 11,426.3
S18            1            0               0              23.1                  4,056.9
S19            3            0               0              18.0                  3,860.5
S20            6            0               0               8.9                  3,841.6
S21            3            0               0              14.4                    936.1
S22            1            0               0              13.1                    993.3
S23            4            0               0              28.8                  2,430.1
S24            1            0               0              23.6                  2,381.1
S25            3            0               0              30.7                    550.1
S26            1            0               0               5.03                   611.6
S27            1            1               0               8.7                    597.5
S28            1            0               0              10.7                 10,104.1
S29            2            0               0               7.7                  1,387.1
S30            2            0               0              57.6                  2,905.6
Total        103            7               3               –                      –
Average        –            –               –              14.6                  2,683.3
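As a quick sanity check on the transcription above, the per-site detection times in Table 2 do average to the reported 14.6 ms:

```python
# Consistency check for Table 2: the thirty per-site detection times
# average to the reported 14.6 ms (values transcribed from the table).
detection_ms = [8.3, 9.3, 14.8, 36.1, 5.4, 5.7, 17.0, 7.4, 13.2, 5.7,
                2.7, 21.7, 8.0, 11.9, 8.5, 5.8, 7.5, 23.1, 18.0, 8.9,
                14.4, 13.1, 28.8, 23.6, 30.7, 5.03, 8.7, 10.7, 7.7, 57.6]
print(round(sum(detection_ms) / len(detection_ms), 1))  # 14.6
```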

and finally find six web sites whose persistent cookies are really useful to users, i.e., without cookies, users would encounter some problems. Because the manual scrutiny is tedious, we could not afford more effort to locate more such web sites. The six web sites are listed in the first column of Table 3 and represented as P1 to P6 for privacy reasons.

In Table 3, the second column shows the number of cookies marked as "useful" by CookiePicker and the third column shows the number of real useful cookies found via manual verification. We observe that for the six web sites, all of their useful persistent cookies are marked as "useful" by CookiePicker. This result indicates that CookiePicker seldom misses a real useful cookie. On the other hand, for web sites P5 and P6, some useless persistent cookies are also marked as "useful" because they are sent out in the same regular request as the real useful cookies.

The fourth and fifth columns show the similarity scores computed by NTreeSim(A, B, 5) and NTextSim(S1, S2), respectively, on the web pages where persistent cookies are useful. These similarity scores are far below 0.85, which is the current value used for the two thresholds Thresh1 and Thresh2 in Fig. 12. The usage of these useful persistent cookies on each web site is given in the sixth column. Web sites P1, P4, and P6 use persistent cookies for a user's preference settings. Web sites P3 and P5 use persistent cookies to properly create and sign up a new user. Web site P2 uses persistent cookies in a unique way: each user's persistent cookie corresponds to a specific sub-directory on the web server, and the sub-directory stores the user's recent query results. Thus, if the user visits the web site again with the persistent cookie, recent query results can be reused to improve query performance.

In summary, the above two sets of experiments show that by conservatively setting Thresh1 and Thresh2 to 0.85, CookiePicker can safely disable and remove persistent

Table 3. Online testing results for six web sites (P1 to P6) that have useful persistent cookies.

Web site Marked useful Real useful NTreeSim ðA; B; 5Þ NTextSim ðS1; S2Þ Usage P1 1 1 0.311 0.609 Preference P2 1 1 0.459 0.765 Performance P3 1 1 0.667 0.623 Sign Up P4 1 1 0.250 0.158 Preference P5 9 1 0.226 0.253 Sign Up P6 5 2 0.593 0.719 Preference Average – – 0.418 0.521 – C. Yue et al. / Computer Networks 54 (2010) 2182–2198 2195 cookies from about 83.3% of web sites (25 out of the 30 activities, and attackers who want to steal cookies. As sta- web sites that we intensively tested). Meanwhile, all the ted in Section 2, we assume that the hosting web site is useful persistent cookies are correctly identified by Cookie- legitimate, since it is pointless to provide cookie security Picker and no backward error recovery is needed for all the and privacy services for a malicious web site. For legiti- 8 web sites (S6, S16, P1, P2, P3, P4, P5, P6) that have useful mate web sites, if some operators strongly insist to use persistent cookies. Misclassification happens only in 10% first-party persistent cookies for tracking long-term user web sites (3 out of 30), on which useless persistent cookies behaviors, they can evade CookiePicker by detecting the are wrongly identified as useful. hidden HTTP request and manipulating the hidden HTTP response using the evasion techniques mentioned above. However, we argue that many web site operators will not 7. Discussions pay the effort and time to do so, either because of the lack of interest to track long-term user behaviors in the first In CookiePicker, useful cookies are defined as those that place, or because of inaccuracy in cookie-based user behav- can cause perceivable changes on a web page, and as we ior tracking, which has long been recognized [50]. These discussed before this definition is probably the most rea- web site operators are either not aware of the possible pri- sonable measure right now at the browser side. 
We assume that this measure is known to anyone who wants to evade CookiePicker. In this section, we first identify possible evasion techniques, and then we analyze potential evasion sources and explain why those evasion techniques are not a serious concern to CookiePicker. Finally, we discuss some concerns about using CookiePicker.

7.1. Possible evasion techniques

Since CookiePicker makes its decision based on HTML page difference detection, we identify the following four possible techniques that could be used to evade CookiePicker:

- Random advertising: a web site can serve more random advertisements across different retrievals of a web page in order to reduce the accuracy of CookiePicker's difference detection algorithms.
- Dynamic content rewriting: a web site can identify CookiePicker's hidden request and use JavaScript to dynamically rewrite web page contents at the browser side.
- Dramatic structure changing: a web site can identify CookiePicker's hidden request and intentionally generate structurally different response pages (even with very similar visual content).
- Cookie root path abusing: a web site can set the Path attribute of all its cookies to the root "/" so that all its cookies, useful or useless, are sent back with every web request.

For the random advertising evasion technique, as shown in the previous experimental results, CookiePicker's two difference detection algorithms indeed filter out the effects of advertisements and web page dynamics very well. Moreover, other advertisement-removing techniques such as those used in [49] can be integrated into CookiePicker to further improve the detection accuracy. The other three kinds of evasion techniques can be employed by a web site operator. However, as we discuss below, they do not pose a serious threat to the usage of CookiePicker.

7.2. Evasion sources

The evasion of CookiePicker will most likely come from two sources: web site operators who want to track user […] privacy and security risks of stolen first-party persistent cookies, or simply not willing to pay the cost to renovate their systems. Our CookiePicker is a pure client-side solution that especially aims to protect the users of these web sites.

For third-party attackers, unless they compromise a legitimate web site, it is very difficult for them to use any of the above evasion techniques to manipulate the web pages sent back to a user's browser and circumvent CookiePicker. Therefore, CookiePicker can effectively identify and remove useless cookies stored by most legitimate web sites on a user's hard disk, and prevent them from being stolen by malicious attacks such as cross-site scripting attacks.

7.3. Concerns about using CookiePicker

One concern is that CookiePicker might reject third-party cookies in the case of HTTP redirection (see Section 5.1.2 of [15]). However, this concern is invalid because CookiePicker does not block (or even touch) third-party cookies by itself; decisions regarding third-party cookies are made completely by the web browser based on the user's preference settings. Similarly, whether a cookie is a first-party cookie of a web site is also decided by the web browser, and CookiePicker makes its decision based on the browser's decision. Therefore, it does not matter whether a web site uses a CDN (Content Delivery Network) or not.

Another concern is that CookiePicker may not work well on web pages that use Ajax (Asynchronous JavaScript and XML). This concern is reasonable because Ajax requests may use cookies and may also change web page contents. Currently, CookiePicker only sends hidden requests to retrieve container pages, as described in Section 4. But it is possible to extend CookiePicker to evaluate the impact of cookies on Ajax requests, for example, by asking CookiePicker to send out two Ajax requests (one with cookies and the other without cookies) at the same time and then compare the response messages.
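The cookie root path abusing technique in Section 7.1 hinges on the cookie Path attribute. As an illustration (not CookiePicker code), a simplified sketch of the browser's path-matching rule shows why setting Path to the root "/" forces every cookie back on every request:

```python
# Simplified sketch of browser cookie path matching (in the spirit of the
# "path-match" rule in the HTTP state management RFCs); illustrative only.
def path_matches(cookie_path: str, request_path: str) -> bool:
    """Return True if a cookie with this Path attribute is sent on the request."""
    if cookie_path == request_path:
        return True
    if request_path.startswith(cookie_path):
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False

# A tightly scoped cookie is only returned for requests under its path...
assert path_matches("/account", "/account/settings")
assert not path_matches("/account", "/news")

# ...while a cookie whose Path is set to the root "/" is returned on every
# request, useful or not, which is exactly the abuse described above.
for request_path in ("/", "/news", "/account/settings"):
    assert path_matches("/", request_path)
```

A site that scopes each cookie tightly (e.g., Path=/account) lets a per-page usefulness test see cookies only where they are used, whereas root-scoped cookies ride along with every page retrieval.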

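The fetch-twice-and-compare idea behind CookiePicker's hidden request, and its proposed extension to Ajax requests in Section 7.3, can be sketched in a few lines. The sketch below is illustrative Python, not CookiePicker code: a plain string comparison stands in for the paper's two HTML difference detection algorithms, and fake_fetch is a hypothetical stand-in for the browser's request machinery.

```python
from typing import Callable

def cookie_effect(fetch: Callable[[bool], str]) -> bool:
    """Issue the same request twice, once with cookies enabled and once
    without, and report whether cookies perceivably change the response."""
    return fetch(True) != fetch(False)

# Hypothetical stand-in for the browser's request machinery: a page whose
# greeting depends on a session cookie.
def fake_fetch(with_cookies: bool) -> str:
    return "<p>Welcome back, Alice</p>" if with_cookies else "<p>Welcome, guest</p>"

assert cookie_effect(fake_fetch)                     # cookies change the page
assert not cookie_effect(lambda _: "<p>static</p>")  # cookies are useless here
```

In the real system the two responses would be compared with the structure-aware difference detection algorithms, so random advertisements and other page dynamics do not trigger false positives.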
8. Related work

RFC 2109 [14] specifies the way of using cookies to create a stateful session with HTTP requests and responses. It is also the first document that raised the general public's awareness of cookie privacy problems. RFC 2965 [51] follows RFC 2109 by introducing two new headers, the Cookie2 request header and the Set-Cookie2 response header. However, these two new headers are not supported by popular web browsers such as Internet Explorer and Firefox. RFC 2964 [13] focuses on the privacy and security of using HTTP cookies, and identifies the specific usages of cookies that are either not recommended by the IETF or believed to be harmful. Fu's study [4] suggests that authenticators should be placed in cookies only with great care, and in particular that persistent cookies should not be used to store authenticators.

Cookies can not only be retrieved and stored through the headers of HTTP requests and responses, but can also be read and written by client-side JavaScript. The same origin policy [23], introduced in Netscape Navigator 2.0, prevents cookies and JavaScript in different domains from interfering with each other. The successful fulfillment of the same origin policy on cookies and JavaScript further motivated the enforcement of this policy on browser cache and visited links [52]. Recently, in order to mitigate cross-site scripting attacks, Internet Explorer also allows a cookie to be marked as "HttpOnly" in the Set-Cookie response header, indicating that the cookie is "non-scriptable" and should not be revealed to client applications [53].

Modern web browsers provide users with refined cookie privacy options: a user can define detailed cookie policies for web sites either before or during visiting these sites. Commercial cookie management software such as Cookie Crusher [54] and Cookie Pal [55] mainly relies on pop-ups to notify users of incoming cookies. However, the studies in [8] show that such cookie privacy options and cookie management policies fail to be used in practice, mainly for two reasons: (1) these options are very confusing and cumbersome, and (2) most users have no good understanding of the advantages and disadvantages of using cookies. A few Firefox extensions such as Cookie Culler [48] and Permit Cookies [56], although convenient to use, are just very simple add-ons that let users easily access privacy preference settings or view cookies. The Acumen system [57] can inform a user how many other users accept certain cookies; however, it does not protect the privacy of the user itself, and many users' decisions could be wrong, resulting in negative references. Another interesting system is the Privoxy web proxy [49]. It provides advanced filtering capabilities for protecting privacy, modifying web page data, managing cookies, controlling access, and removing advertisements, banners, pop-ups and other obnoxious Internet junk. However, Privoxy is more useful for sophisticated users who have the ability to fine-tune their installation.

Recently, the most noticeable research work in cookie management is Doppelganger [11], a system for creating and enforcing fine-grained privacy-preserving cookie policies. Doppelganger leverages client-side parallelism and uses a twin window to mirror a user's web session. If any difference is detected, Doppelganger will ask the user to compare the main window and the fork window, and then make a cookie policy decision. Although taking a big step towards automatic cookie management, Doppelganger still has a few obvious drawbacks. First, it heavily relies on the user's comparison between the main window and the fork window to make a decision. Second, the cost of its parallel mirroring mechanism is very high, because not only must every user action be mirrored, but every HTTP request must also be sent to the web server in duplicate. Third, due to the high cost, a user may not be patient enough to allow a long training period, so the policy decision accuracy cannot be guaranteed. Last but not least, Doppelganger only achieves web-site-level cookie policy making. In contrast, our CookiePicker works fully automatically without user involvement or even notice. It has very low overhead, and hence can be trained over a long period on a user's web browser to achieve high accuracy. CookiePicker achieves cookie-group-level policy making, meaning that usefulness is identified for the group of cookies used in a web page view.

9. Conclusions

In this paper, we have conducted a large scale cookie measurement, which highlights the demand for effective cookie management. We have then presented a system, called CookiePicker, that automatically manages cookie usage settings on behalf of a user. Only one additional HTTP request for the container page of a web site, the hidden request, is generated for CookiePicker to identify the usefulness of a cookie set. CookiePicker uses two complementary algorithms to accurately detect the HTML page differences caused by enabling and disabling cookies. CookiePicker classifies those cookies that cause perceivable changes on a web page as useful, and disables the rest as useless. We have implemented CookiePicker as an extension to Firefox and evaluated its efficacy through live experiments over various web sites. By automatically managing the usage of cookies, CookiePicker helps a user strike an appropriate balance between ease of use and privacy risks.

Acknowledgments

We thank the anonymous reviewers for their careful and insightful comments. This work was partially supported by NSF Grant CNS-091537 and ONR Grant N00014-09-1-0746.

References

[1] HTTP Cookie.
[2] D.M. Kristol, HTTP cookies: standards, privacy, and politics, ACM Trans. Inter. Tech. 1 (2) (2001) 151–198.
[3] S. Chapman, G. Dhillon, Privacy and the internet: the case of DoubleClick, Inc., 2002.
[4] K. Fu, E. Sit, K. Smith, N. Feamster, Dos and don'ts of client authentication on the web, in: Proceedings of the 10th USENIX Security Symposium, Washington, DC, 2001.
[5] M. Jakobsson, S. Stamm, Invasive browser sniffing and countermeasures, in: Proceedings of the WWW'06, 2006, pp. 523–532.
[6] P. Vogt, F. Nentwich, N. Jovanovic, E. Kirda, C. Kruegel, G. Vigna, Cross site scripting prevention with dynamic data tainting and static analysis, in: Proceedings of the NDSS'07, 2007.
[7] Y.-M. Wang, D. Beck, X. Jiang, R. Roussev, C. Verbowski, S. Chen, S.T. King, Automated web patrol with Strider HoneyMonkeys: finding web sites that exploit browser vulnerabilities, in: Proceedings of the NDSS'06, 2006.
[8] V. Ha, K. Inkpen, F.A. Shaar, L. Hdeib, An examination of user perception and misconception of internet cookies, in: CHI'06 Extended Abstracts on Human Factors in Computing Systems, Montreal, Canada, 2006, pp. 833–838.
[9] Platform for Privacy Preferences (P3P) Project.
[10] L.I. Millett, B. Friedman, E. Felten, Cookies and web browser design: toward realizing informed consent online, in: Proceedings of the CHI'01, 2001, pp. 46–52.
[11] U. Shankar, C. Karlof, Doppelganger: better browser privacy without the bother, in: Proceedings of the ACM CCS'06, Alexandria, VA, 2006.
[12] L. Bent, M. Rabinovich, G.M. Voelker, Z. Xiao, Characterization of a large web site population with implications for content delivery, in: Proceedings of the WWW'04, 2004, pp. 522–533.
[13] K. Moore, N. Freed, Use of HTTP state management, RFC 2964 (2000).
[14] D. Kristol, L. Montulli, HTTP state management mechanism, RFC 2109 (1997).
[15] A.F. Tappenden, J. Miller, Cookies: a deployment study and the testing implications, ACM Trans. Web 3 (3) (2009) 1–49.
[16] E. Kirda, C. Kruegel, G. Vigna, N. Jovanovic, Noxes: a client-side solution for mitigating cross-site scripting attacks, in: Proceedings of the 2006 ACM Symposium on Applied Computing (SAC'06), ACM Press, 2006, pp. 330–337.
[17] CERT Advisory CA-2000-02, Malicious HTML tags embedded in client web requests, 2000.
[18] S. Chen, J. Meseguer, R. Sasse, H.J. Wang, Y.-M. Wang, A systematic approach to uncover security flaws in GUI logic, in: Proceedings of the 2007 IEEE Symposium on Security and Privacy (S&P'07), IEEE Computer Society, 2007.
[19] C. Reis, J. Dunagan, H.J. Wang, O. Dubrovsky, S. Esmeir, BrowserShield: vulnerability-driven filtering of dynamic HTML, in: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06), USENIX Association, 2006, pp. 61–74.
[20] E. Kirda, C. Kruegel, G. Banks, G. Vigna, R.A. Kemmerer, Behavior-based spyware detection, in: Proceedings of the 15th USENIX Security Symposium, 2006, pp. 273–288.
[21] M.T. Louw, J.S. Lim, V. Venkatakrishnan, Extensible web browser security, in: Proceedings of the Fourth GI International Conference on Detection of Intrusions & Malware, and Vulnerability Assessment (DIMVA'07), 2007.
[22] S. Saroiu, S.D. Gribble, H.M. Levy, Measurement and analysis of spyware in a university environment, in: Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI'04), USENIX Association, 2004, pp. 141–153.
[23] Same origin policy.
[24] Google plugs cookie-theft data leak, 2005.
[25] Google slams the door on XSS flaw 'Stop cookie thief!', 2007.
[26] Flaws in IE7 and Firefox raise alarm, 2007.
[27] GNU Wget – GNU Project – Free Software Foundation (FSF).
[28] A. Moshchuk, T. Bragin, S.D. Gribble, H.M. Levy, A crawler-based study of spyware in the web, in: Proceedings of the NDSS'06, 2006.
[29] C. Yue, M. Xie, H. Wang, Automatic cookie usage setting with CookiePicker, in: Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), IEEE Computer Society, 2007, pp. 460–470.
[30] Data flow inside Gecko.
[31] B.A. Mah, An empirical model of HTTP network traffic, in: Proceedings of the INFOCOM'97, 1997, pp. 592–600.
[32] W. Yang, Identifying syntactic differences between two programs, Softw. Pract. Exper. 21 (7) (1991) 739–755.
[33] K.-C. Tai, The tree-to-tree correction problem, J. ACM 26 (3) (1979) 422–433.
[34] K. Zhang, D. Shasha, Simple fast algorithms for the editing distance between trees and related problems, SIAM J. Comput. 18 (6) (1989) 1245–1262.
[35] T. Jiang, L. Wang, K. Zhang, Alignment of trees – an alternative to tree edit, Theor. Comput. Sci. 143 (1) (1995) 137–148.
[36] E. Tanaka, K. Tanaka, The tree-to-tree editing problem, Int. J. Pattern Recognit. Artif. Intell. 2 (2) (1988) 221–240.
[37] S.M. Selkow, The tree-to-tree editing problem, Inf. Process. Lett. 6 (6) (1977) 184–186.
[38] G. Valiente, An efficient bottom-up distance between trees, in: Proceedings of the SPIRE'01, 2001, pp. 212–219.
[39] P. Bille, A survey on tree edit distance and related problems, Theor. Comput. Sci. 337 (1–3) (2005) 217–239.
[40] Y. Zhai, B. Liu, Web data extraction based on partial tree alignment, in: Proceedings of the WWW'05, 2005, pp. 76–85.
[41] D.C. Reis, P.B. Golgher, A.S. Silva, A.F. Laender, Automatic web news extraction using tree edit distance, in: Proceedings of the WWW'04, 2004, pp. 502–511.
[42] A. Torsello, D. Hidovic-Rowe, Polynomial-time metrics for attributed trees, IEEE Trans. Pattern Anal. Mach. Intell. 27 (7) (2005) 1087–1099.
[43] R. Al-Ekram, A. Adma, O. Baysal, diffX: an algorithm to detect changes in multi-version documents, in: Proceedings of the CASCON'05, 2005, pp. 1–11.
[44] Y. Zhai, B. Liu, Structured data extraction from the web based on partial tree alignment, IEEE Trans. Knowledge Data Eng. 18 (12) (2006) 1614–1628.
[45] HTML.
[46] S. Gupta, G. Kaiser, D. Neistadt, P. Grimm, DOM-based content extraction of HTML documents, in: Proceedings of the WWW'03, 2003, pp. 207–214.
[47] S. Joshi, N. Agrawal, R. Krishnapuram, S. Negi, A bag of paths model for measuring structural similarity in web documents, in: Proceedings of the KDD'03, 2003.
[48] Cookie Culler.
[49] Privoxy – Home Page.
[50] Accurate web site visitor measurement crippled by cookie blocking and deletion, JupiterResearch finds, 2005.
[51] D. Kristol, L. Montulli, HTTP state management mechanism, RFC 2965 (2000).
[52] C. Jackson, A. Bortz, D. Boneh, J.C. Mitchell, Protecting browser state from web privacy attacks, in: Proceedings of the WWW'06, 2006, pp. 737–744.
[53] Mitigating cross-site scripting with HTTP-only cookies.
[54] Cookie Crusher.
[55] Cookie Pal.
[56] Permit Cookies.
[57] J. Goecks, E.D. Mynatt, Social approaches to end-user privacy management, in: Security and Usability: Designing Secure Systems That People Can Use, 2005.

Chuan Yue received his B.S. and M.S. degrees in computer science from Xidian University and then worked as a Member of Technical Staff at Bell Labs China, Lucent Technologies for four years, mainly on the development of a web-based Service Management System for Intelligent Network. He is now with the Department of Computer Science at The College of William and Mary. His research interests include Web-based Systems, Computer and Information Security, Human-Computer Interaction, and Collaborative Computing, with current work focusing on web browsing security and collaborative browsing.

Mengjun Xie received his Ph.D. in Computer Science at the College of William and Mary in 2009. His research focuses on securing network systems, especially Internet-based message systems, and improving the performance of network systems. His research interests include network security, information security, network systems, and operating systems.

Haining Wang is an Associate Professor of Computer Science at the College of William and Mary, Williamsburg, VA. He received his Ph.D. in Computer Science and Engineering from the University of Michigan at Ann Arbor in 2003. His research interests lie in the area of networking, security and distributed computing. He is particularly interested in network security and network QoS (Quality of Service).