Online Tracking, Targeted Advertising and User Privacy - The Technical Part
Privacy and Web 2.0 Seminar, Summer Term 2011

Simon Sprankel
sprankel[at]rbg.informatik.tu-darmstadt.de
September 30th, 2011

This paper gives a short overview of online tracking, Online Targeted Advertising (OTA) and the related privacy issues. Balachander Krishnamurthy rightly states that "there are at least three different angles through which one could approach the problem of reducing privacy leakage: technical, legislative and economic" [1]. This paper focuses on the technical part, whereas Nadine Trüschler's paper mainly covers the legislative and economic part. Since she also gives a more detailed introduction to the topic of OTA, it is advisable to read her paper first.

1 Introduction

A short analysis of network traffic suffices to see that the tracking of users across multiple websites is a current topic and raises various privacy concerns. For instance, studiVZ, one of the largest German Online Social Networks (OSNs), communicates one's profile ID to a big advertising company, Tradedoubler. Another example is a Facebook app called Kickmania, which communicates one's profile ID to another advertising company [2]. Hence, these companies are able to identify the user via their public profile at the OSN. Data that makes a person identifiable is called Personally Identifiable Information (PII). There are numerous other cases in which web applications pass PII to advertising companies. With the growing use of online social networks, the problem has become even worse. It has been shown that "it is possible for third-parties to link PII, which is leaked via OSNs, with user actions both within OSN sites and elsewhere on non-OSN sites" [2]. Thus, it is worthwhile to look at the ways users are tracked and the ways this can be prevented.

2 Web Tracking Technologies

Web tracking means that a user is tracked across various websites by, e.g.,
advertising companies. If an advertising company is able to track a user, it can create an interest profile of that user. This process is often referred to as profiling. With an interest profile, it is possible to show targeted advertisements to the user. This is also called behavioural targeting. In general, these advertisements are more interesting to the user, so that they will click a targeted advertisement more often than a random one. Studies claim that without targeted advertising, advertising effectiveness decreases by around 65 percent [3]. Hence, advertising companies have a huge interest in web tracking. Nevertheless, the user's privacy must not be forgotten.

There are numerous types of web tracking technologies. I will focus on cookies, web bugs and fingerprinting. Other possibilities include browser extensions, complex JavaScript code (e.g. as used by Google Analytics), modified browsers and deep packet inspection.

2.1 Cookies

Cookies are arbitrary strings that belong to a website and are stored on the user's machine. Their intent is to "facilitate a browser-server stateful interaction, in a stateless protocol" [4]. Each time a user contacts a website, the server may send cookies to the client, which are then stored by the user's browser. If the user contacts the same website again, the stored cookies are added to the request sent to the server. It is thus possible to store user-specific data, e.g. shopping cart information or language preferences, in cookies.

As one can see, cookies may be used for quite reasonable purposes. However, since websites include more and more third-party content, not only cookies of the requested website but also many third-party cookies are set. The user often does not expect to communicate with a company other than the one the requested website belongs to, so these third-party cookies raise several privacy problems. They are often set by advertising companies.
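The cookie round trip described above can be illustrated with a minimal sketch. It is not taken from any real ad network: the class ToyAdServer, the cookie name uid and the site names news.example and shop.example are hypothetical, and the browser is reduced to passing the received cookie string back unchanged.

```python
import uuid
from http.cookies import SimpleCookie


class ToyAdServer:
    """Hypothetical third-party server that recognises browsers by a cookie."""

    def __init__(self):
        # Maps each cookie ID to the list of sites on which an ad was loaded.
        self.profiles = {}

    def serve_ad(self, cookie_header, embedding_site):
        """Handle one ad request; returns (uid, Set-Cookie value or None)."""
        jar = SimpleCookie(cookie_header or "")
        if "uid" in jar:
            # Returning browser: the stored cookie came back with the request.
            uid, set_cookie = jar["uid"].value, None
        else:
            # First contact: invent an ID and ask the browser to store it.
            uid = uuid.uuid4().hex
            fresh = SimpleCookie()
            fresh["uid"] = uid
            set_cookie = fresh["uid"].OutputString()
        self.profiles.setdefault(uid, []).append(embedding_site)
        return uid, set_cookie


ads = ToyAdServer()
# First page embeds the ad: no cookie yet, so the server sets one.
uid, set_cookie = ads.serve_ad(None, "news.example")
# A different page embeds the same ad: the browser sends the cookie back.
same_uid, no_new_cookie = ads.serve_ad(set_cookie, "shop.example")
```

After the second request, the server's profile for this ID contains both embedding sites, although the user never visited the ad server directly.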
The advertising companies place advertisements on many websites and, as a result, are able to track the user across these websites.

Technically, there are different types of cookies. I will introduce HTTP cookies, Flash cookies and evercookies.

HTTP cookies are the commonly known cookies stored in the user's browser. They are set by the Set-Cookie header described in RFC 6265 [5] and are easy to delete via the menu of all current browsers.

Flash cookies (also called Local Shared Objects, LSOs) are somewhat more advanced cookies, since they are more difficult to delete. They can be set from a Flash object embedded in a website. They are not stored in a folder controlled by the browser, but in a folder configured by Adobe. In the Linux distribution Ubuntu, the folder is /home/USER/.macromedia/Flash_Player/#SharedObjects. Since Flash 10.3, Adobe provides the so-called ClearSiteData API, with which it is possible to delete Flash cookies. The current versions of Internet Explorer, Firefox, Opera and Google Chrome already support the deletion of Flash cookies. Only Apple's Safari lacks such a feature out of the box. Just a year ago, before Adobe released the ClearSiteData API, it was only possible to delete Flash cookies manually or on a specific Adobe website.

Evercookie¹ is a JavaScript API enabling websites to create cookies that are nearly impossible to delete. Currently, evercookie combines 13 different cookie storage mechanisms, such as HTTP cookies, Flash cookies, Silverlight and HTML5 storage functions, history and cache techniques, and others. If some of the data, such as the HTTP and Flash cookies, is removed, it can easily be restored because it is saved redundantly. Techniques to make cookies permanent, often referred to as cookie respawning, have been used in the wild by, e.g., KISSmetrics². Ever since researchers at U.C. Berkeley discovered this [11], the topic has been in the press.
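The redundancy behind cookie respawning can be sketched in a few lines. This is only an illustration of the principle, not evercookie's actual code: the class RedundantStore and the store names are hypothetical, and plain dictionaries stand in for the real storage mechanisms.

```python
class RedundantStore:
    """Toy model of an ID that is saved in several independent stores."""

    def __init__(self, store_names):
        # One dictionary per storage mechanism (stand-ins for real storage).
        self.stores = {name: {} for name in store_names}

    def write(self, key, value):
        """Save the value in every store."""
        for store in self.stores.values():
            store[key] = value

    def read(self, key):
        """Return the value if any copy survived, and restore the others."""
        survivors = [s[key] for s in self.stores.values() if key in s]
        if not survivors:
            return None
        value = survivors[0]
        self.write(key, value)  # respawn the deleted copies
        return value


tracker = RedundantStore(["http_cookie", "flash_lso", "html5_storage"])
tracker.write("uid", "abc123")
# The user deletes HTTP and Flash cookies, but one copy remains.
tracker.stores["http_cookie"].clear()
tracker.stores["flash_lso"].clear()
restored = tracker.read("uid")
```

As long as a single copy survives, the next read restores all deleted copies, which is why the user has to clear every storage mechanism at once.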
Now, the first lawsuits against cookie respawning methods have been filed³, so the end of these techniques may be near.

2.2 Web Bugs

Web bugs, also called web beacons, tracking bugs or page tags, are "1x1-pixel pieces of code that allow advertisers to track customers remotely" [3]. In general, they are invisible to the user. Their most common form is the tracking pixel, a 1x1 image often included from a third-party site. As an example, Google uses them to track conversions⁴. Here is example code for Google's tracking pixel:

    <img height="1" width="1" border="0" src="http://www.googleadservices.com/pagead/conversion/1234567890/?value=100&label=Purchase&script=0">

With this pixel, a conversion ID (1234567890), the total amount of the order (100) and the type of the conversion (Purchase) are transferred to Google. Of course, it is possible to transfer much more data in these tracking pixels.

2.3 Fingerprinting - Panopticlick

A web tracking technology that is nearly impossible to block is fingerprinting. A fingerprint is a summary of software and hardware settings. Fingerprints often identify a user uniquely, so they may be used as a global identifier. In combination with the user's IP address, a fingerprint could also be used to regenerate cookies after they have been deleted. As a last usage scenario, fingerprints in combination with the IP address could serve as an identifier if cookies are completely disabled. Fingerprinting is rather hard to detect because it does not leave any traces on the user's system. This is also the reason why a fingerprint cannot be deleted or prevented, except by a configuration change that makes the system less unique.

¹ http://samy.pl/evercookie/
² http://www.kissmetrics.com/
³ http://www.wired.com/epicenter/2011/08/tracking-lawsuit/
⁴ If a user clicks on an advertisement and directly achieves a goal specified by the website operator, e.g. placing an order, this is called a conversion.
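The idea of a fingerprint as a summary of software and hardware settings (Section 2.3) can be sketched as follows. The attribute names and values are made up for illustration, and hashing a canonical JSON form with SHA-256 is an assumption of this sketch, not necessarily how any real tracker or research project computes fingerprints.

```python
import hashlib
import json


def fingerprint(attributes):
    """Condense a dictionary of browser settings into one identifier.

    The attributes are serialised in a canonical order so that the same
    settings always produce the same fingerprint.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical settings that a website could read from a browser.
browser = {
    "user_agent": "ExampleBrowser/1.0 (Linux)",
    "screen": "1280x800x24",
    "timezone_offset": -120,
    "fonts": ["DejaVu Sans", "Liberation Serif"],
}
fp_first_visit = fingerprint(browser)
fp_second_visit = fingerprint(dict(reversed(list(browser.items()))))

# A configuration change (here: another screen size) yields a new fingerprint.
fp_changed = fingerprint({**browser, "screen": "1920x1080x24"})
```

Nothing is stored on the user's machine: the identifier is recomputed on every visit, which is why deleting cookies does not help against it.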
Panopticlick⁵ is a research project of the Electronic Frontier Foundation studying fingerprinting based on browser uniqueness. In this project, Peter Eckersley found that (in his sample of 470,161 participants) "83.6% of the browsers seen had an instantaneously unique fingerprint, and a further 5.3% had an anonymity set of size 2. Among visiting browsers that had either Adobe Flash or a Java Virtual Machine enabled, 94.2% exhibited instantaneously unique fingerprints and a further 4.8% had fingerprints that were seen exactly twice" [6]. These numbers show that fingerprinting is a worrying approach with the potential to become the major tracking technique.

3 Solutions To Prevent Tracking

As we have seen, there are many ways to track users, and a lot of websites use one or more of them. This raises the question of how one can prevent tracking. The most common solutions, which are already partly implemented, namely cookie blocking, domain blocking and Do Not Track, are described in this section.

3.1 Disable Cookies / Cookie Blocking

A simple solution to prevent tracking through cookies is to block them [7]. One can do this generally for all sites or site by site. Blocking them site by site is a huge effort: who wants to decide whether to allow cookies for every new page one visits? This solution is indeed effective at preventing tracking through cookies. However, there are many other, more sophisticated ways to track users than cookies. Additionally, many sites require users to enable cookies and deny access if cookies are disabled. One more disadvantage is that auto-login functions, also known as "remember me" functions, do not work without cookies. This solution is supported by every current web browser.

3.2 Domain Blocking

Another common solution is to block any connection to websites of advertising companies [7].
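A minimal sketch of such a check, assuming the blocker keeps a simple set of advertising domains, is given below. The blocklist entries ads.example and tracker.example and the function name is_blocked are hypothetical; real blockers use far larger lists and richer matching rules.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist of advertising domains.
BLOCKLIST = {"ads.example", "tracker.example"}


def is_blocked(url, blocklist=BLOCKLIST):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Test the host itself and every parent domain, e.g. for
    # cdn.ads.example: cdn.ads.example, ads.example, example.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


requests_to_check = [
    "http://cdn.ads.example/pixel.gif",  # subdomain of a blocked domain
    "http://news.example/article",       # not on the list
    "http://notads.example/",            # similar name, but a different domain
]
decisions = [is_blocked(u) for u in requests_to_check]
```

Matching on whole domain labels rather than on plain string suffixes avoids blocking unrelated hosts whose names merely end in the same characters.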
