Research and Development in E-Business through Service-Oriented Solutions

Katalin Tarnay University of Pannonia, Hungary & Budapest University of Technology and Economics, Hungary

Sandor Imre Budapest University of Technology and Economics, Hungary

Lai Xu Bournemouth University, UK

A volume in the Advances in E-Business Research (AEBR) Book Series Managing Director: Lindsay Johnston Editorial Director: Joel Gamon Production Manager: Jennifer Yoder Publishing Systems Analyst: Adrienne Freeland Development Editor: Myla Merkel Assistant Acquisitions Editor: Kayla Wolfe Typesetter: Christina Henning Cover Design: Jason Mull

Published in the United States of America by Business Science Reference (an imprint of IGI Global) 701 E. Chocolate Avenue Hershey PA 17033 Tel: 717-533-8845 Fax: 717-533-8661 E-mail: [email protected] Web site: http://www.igi-global.com

Copyright © 2013 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data Research and development in e-business through service-oriented solutions / Katalin Tarnay, Sandor Imre and Lai Xu, editors. pages cm Includes bibliographical references and index. Summary: “This book highlights the main concepts of e-business as well as the advanced methods, technologies, and aspects that focus on technical support for professors, students, researchers, developers, and other industryexperts in order to provide a vast amount of specialized knowledge sources for promoting e-business”--Provided by publisher. ISBN 978-1-4666-4181-5 (hardcover) -- ISBN 978-1-4666-4182-2 (ebook) -- ISBN 978-1-4666-4183-9 (print & perpetual access) 1. Service-oriented architecture (Computer science) 2. Data mining. 3. Electronic commerce. I. Tarnay, Katalin. II. Imre, S?ndor. III. Xu, Lai, 1970- TK5105.5828.R47 2013 658.8’72--dc23

2013009512

This book is published in the IGI Global book series Advances in E-Business Research (AEBR) Book Series (ISSN: 1935- 2700; eISSN: 1935-2719)

British Cataloguing in Publication Data A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher. 134

Chapter 7 Tracking and Fingerprinting in E-Business: New Storageless Technologies and Countermeasures

Károly Boda Gábor György Gulyás Budapest University of Technology and Budapest University of Technology and Economics, Hungary Economics, Hungary

Ádám Máté Földes Sándor Imre Budapest University of Technology and Budapest University of Technology and Economics, Hungary Economics, Hungary

ABSTRACT Online user tracking is a widely used marketing tool in e-business, even though it is often neglected in the related literature. In this chapter, the authors provide an exhaustive survey of tracking-related identification techniques, which are often applied against the will and preferences of the users of the Web, and therefore violate their privacy one way or another. After discussing the motivations behind the information-collecting activities targeting Web users (i.e., profiling), and the nature of the information that can be collected by various means, the authors enumerate the most important techniques of the three main groups of tracking, namely storage-based tracking, history stealing, and fingerprinting. The focus of the chapter is on the last, as this is the field where both the techniques intended to protect users and the current legislation are lagging behind the state-of-the-art technology; nevertheless, the authors also discuss conceivable defenses, and provide a taxonomy of tracking techniques, which, to the authors’ knowledge, is the first of its kind in the literature. At the end of the chapter, the authors attempt to draw the attention of the research community of this field to new tracking methods.

DOI: 10.4018/978-1-4666-4181-5.ch007

Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited. Tracking and Fingerprinting in E-Business

INTRODUCTION as pursuing behavioral advertising or monetiz- ing user profiles by other means. According to Many Website operators involved in end-user the report of IAB Internet Advertising Revenue oriented e-business have an interest in monetizing Report (IAB, 2012), the advertising revenues set their user base (e.g., by realizing as many clicks a new record at $8.4 billion in Q1 2012, clearly on their advertisements as possible). One way of showing the significance and the growth of the achieving this goal is collecting diverse informa- advertising industry. tion about the user (or profiling her), the vehicles In two recent studies, Goldfarb & Tucker of which are technologies that can be used for (January and May, 2011) concluded that targeted identifying returning visitors, and those that infer advertising can successfully influence individuals sensitive information (such as browsing history) in favor of buying a product, and targeted advertis- that users are not necessarily willing to disclose ing has a significant positive (economic) impact (e.g., purchase preferences). This controversial on advertising. These claims are consistent with method may be a necessity for better serving the the ever-increasing presence of Web tracking needs of online customers, but it often goes beyond techniques (the most used tools for profiling) and user demands, and is used for business purposes the revenues the industry reached in 2012, presum- against the will of the clients. ably with a continuously increasing proportion It is not surprising that certain users are of behavioral advertising. By summarizing the concerned about pervasive profiling. In fact, measurements of Krishnamurthy & Wills between this problem has been discussed from so many 2006 and 2012 (Krishnamurthy & Wills, 2006; viewpoints in academia that not only effective 2009; Krishnamurthy, 2010), Mayer & Mitchell technological, but also more or less effective leg- (2012) highlighted that the coverage on top sites islative countermeasures have been implemented of large tracking companies increased during these in order to mitigate the privacy risks arising from years, and so did the number of trackers per page. the proliferation of a subset of profiling-related Today, researchers estimate that there are track- technologies. That said, this chapter is inspired ers capable of monitoring more than one fifth of by the rest (primarily by fingerprinting attacks), user activity while browsing online (Roesner et for which neither defensive technology nor the al., 2012). law seems to be able to keep up with the pace However, as can be expected, tracking for of development dictated by pro-profiling actors, targeted advertising is not favored by users. A leaving users of the Web powerless against and Harris Interactive poll (Krane, 2008), a TRUSTe vulnerable to them. survey (2009), and a nationally representative telephone survey in the USA conducted by Turow et al. (2009) uniformly confirmed that the majority PROFILING AND USER of individuals (around 60% in all cases) found it PRIVACY ON THE WEB uncomfortable when they faced advertisements on Websites adjusted to their preferences by pre- There are several motives of profiling users on viously observing their online activities. A more the Web. For instance, an enormous number of recent online survey conducted by McDonald & Web services can be accessed for free; however, Cranor (2010) also confirmed this result with contrary to how it looks (or is communicated), 55% of respondents rejecting targeted advertising; free access often comes with a greater sacrifice another recent survey conducted by TRUSTe in of user privacy, as many of these companies gain partnership with Harris Interactive (2011) reported revenue from profiling-related activities, such a higher rate of respondents, namely 85%, who

135 Tracking and Fingerprinting in E-Business

would not consent to being tracked for targeted fingerprints all types of devices (Oracle, 2011). advertising. Somewhat in contrast with user attitude towards Nevertheless, targeted advertising is not the commercial tracking, a survey of TRUSTe & only goal of tracking and profiling; there are several Harris Interactive (2011) shows that 42% of re- other commercial use cases. Besides monetizing spondents would consent to tracking to enhance profiles in trades (i.e., exchanging large databases security and to detect frauds. It must be noted that of pseudonymous profiles), they can be utilized such identification can be used for quite different to provide additional user interface features; for and questionable purposes, such as identification example, recommendations on news or Web shop for censorship or surveillance, too. items, customized user interfaces, or to facilitate price discrimination (dynamic pricing, i.e., adjust- Who Needs Users to be Identified? ing price tags for customer price sensitivity for profit maximization of the vendor). There are several actors in the online era arguing The case of Amazon.com–namely when the against or in favor of identification. Normally, company offered the same DVD at different prices the only participant arguing against is the user for different customers–is a frequently cited ex- herself, for whom identification is an undesired ample of dynamic pricing (Davis, 2000). After feature, up to the point when it brings benefits (it customers discovered pricing differences while unlocks extra functionality or the user gains extra discussing their experiences in online forums, credit). On the other hand, there are numerous Amazon had to remove this feature and apologize other participants having their own objectives and for their experiment. Amazon claimed that prices goals, who want users to be identified regardless were merely random, instead of being tailored for of their privacy preferences. customer profiles. As it has become clear so far, advertising and Web users relate a bit ambivalently to dynamic marketing companies, and other parties pursuing pricing. Although it is a known and unpopular related activities are the most prominent actors feature among them (Cranor, 2003), it becomes (e.g., search engine optimization, retargeting, rather popular when it is communicated in the analytics services). Profiles can also be used in form of discounts: according to the survey of web shops for product recommendation systems, McDonald & Cranor (2010), 80% of respondents or for assisting pricing strategies, but arbitrary would consent to tracking to receive discounts service providers may use profiles and tracking tailored for their purchase interests, suggesting to enhance the functionality of their services. that price discrimination is more acceptable if it Among many others, social networks, content is communicated as a reward or bonus rather than providers and distributors, service platforms, a way of invading privacy. hosting services belong to this category. Finally, we mention security, where tracking Data collectors and traders monetize pro- (and profiling) is applied in order to identify po- files directly. Identification also plays a strong tential malicious users and prevent intrusions. In role in censorship and surveillance, and–as dis- practice, this is pursued by many security-sensitive cussed previously–also in security; for example, companies, such as banks or credit card compa- identification can help prevent click frauds by nies. For instance, ThreatMetrix™ Cybercrime limiting the number of paid clicks per client per Defender Platform uses device identification to advertisement (Schmücker, 2011). On the other identify clients when they detect fraudulent access hand, malicious parties, such as identity thieves, to prevent further attacks (ThreatMetrix, 2012); phishers, and other kinds of online stalkers may similarly, the Oracle Adaptive Access Manager also find profiles useful.

136 Tracking and Fingerprinting in E-Business

Table 1. Classification of profiling sources The Process of Profiling

Implicit data Explicit data In the early years of the Web, tracking was the First party Services of information superpowers only way to profile users. Attributes learned by Third party Tracking Public data sources tracking are called implicit data, since these are not expressed by the user, but deduced from her actions. In the early 2000s, with the spread of information superpower and tracking users. N.b. , social networks, and so forth, and the social that the case of Facebook may be a bit more content contribution to the services of the Web complex, as according to an earlier version of 2.0 era, another source of information became their privacy policy, the social service reserved available, called explicit data, referring to self- the right to crawl public data sources in order to expressed information, collected by other means extend their users’ profiles (Gulyás, 2009). (e.g., using web crawlers) for extending profiles. Development of search engines catalyzed this The Three Steps of Profiling process by supporting the access to public sources for explicit data, to the point where the spread of The implications of tracking become visible after information became real time (e.g., Shankland, understanding the context of identification and 2010). Instant accessibility and the emergence of related economic activities (i.e., how profiling is a variety of “Web and content archiving” services performed). Identification itself does no harm to (e.g., Fitzpatrick, 2012), had a negative effect on user privacy; the question is how identification is user privacy, as it simply eliminated the choice used. Implication of tracking techniques discussed of revocation. later in this chapter only differ in the extent of Classification of Information Sources identification (i.e., more generic identification implies larger profiles that has a wider range of use): while some techniques identify the user In their work, Gulyás et al. (2012) categorize only in the same browser (e.g., tracking cookies, profiling sources into three categories (complex CSS-based history stealing attacks) or the same profiling techniques may use multiple of these): device (e.g., cross-browser fingerprinting, Flash services of information superpowers (large com- PIEs), others achieve device-independent personal panies having a large service portfolio capable of identification (e.g., real-world identification via capturing a significant fragment of user activity), history stealing, biometric fingerprinting). public data sources, and tracking. Here, we further The scheme of the process of personal data refine their classification by categorizing sources collection is depicted on Figure 1. The goal of regarding the data types and the way users typi- the process is to create fine-grained user profiles cally interact with these sources. (It must be added to be used or sold. In the first step, profilers use that exceptional, but less relevant cases do exist; the previously-discussed information sources for for example, first party trackers can theoretically collecting information, the data from which is be realized, but the advertising business is based processed, combined and further refined in the rather on third-party trackers.) Our classification second step. In step three, data can be monetized is shown in Table 1. by several means. According to this classification, a social net- During the collection process, numerous types working site with a social widget in widespread of data are logged to profiles–we just highlight a use over the internet–such as Facebook’s Like few examples. Simply put, trackers can analyze button–is classified as both using sources of an

137 Tracking and Fingerprinting in E-Business

Figure 1. The process of collecting, processing and using different types of data

the visited Websites from different perspectives, Jang et al. also mention another tracker company, such as regarding their content (for contextual ClickTale, who create an aggregate heatmap of advertising), their meaning (for semantic advertis- mouse movements. According to ClickTale’s web- ing), or the attitude of the creator towards the site1, in addition to heatmaps, they record mouse content itself (for sentiment analysis)–information clicks, scroll reach, keystrokes, and offer reports that is rather static. That said, behavior profiles based on the analysis of the data. Besides, Jang et can also include clickstream information (i.e., al. note that many sites in their experiments use how the user navigates through sites, to determine their own tracking techniques instead of using where she is off the track the advertiser planned, external services. or to predict future behavior, e.g., to determine Service providers, and especially information the willingness to order an item online)Van den superpowers, are in an even better position, as they Poel & Buckinx, 2005)). There are trackers that can collect metainformation of the habits of their operate link shortening services and provide users. For instance, Google can be aware of their content sharing widgets in order to observe social users’ acquaintances (via Gmail and Google+), sharing connections (Regalado, 2012). In our daily routine (with Google Calendar), interests (via opinion, this trend will gain momentum, and ), and if Google Search is set as the advertisers will extend their view to collect even home page of a browser, Google can estimate when more metadata on daily routine, (social) behavior the computer is first turned on, and how long it is and connections. operated. By utilizing geolocalization techniques, even home and work location can be determined, What Do they Collect? which, when coupled together, can be used for unique identification (Golle & Partridge, 2009). However, trackers also reach towards a lower metalevel. Jang et al. (2010) report several cases of Privacy Implications of Identification mouse, keyboard, and clipboard tracking. Within the Alexa top 1,300 Websites, they found such When a user is identified and tracked, her activi- tracking in 115 cases, where the tracking happened ties are linked together in her profile, meaning without any visual feedback to the user. Of these a greater loss of user privacy for many reasons. sites, 7 used the behavior tracking service of tynt. For example, a profiled user can be influenced com, which reported mouseover and copy events, in her buying decisions, or her search results can and also transferred the copied text to tynt.com. be biased according to business objectives. A

138 Tracking and Fingerprinting in E-Business

profiled user is tied to her past actions, which can browsers, such as Tor and JondoFox, which are be used for abuses, denigration, or may simply reaching for such high aims (Perry et al., 2011). cause uncomfortable situations. As an example, let us imagine a father looking Penetration of User Tracking for gifts for his daughter’s birthday. After spend- Techniques ing a few hours of searching online and visiting some related sites, he leaves his laptop on his desk. We differentiate between within- and cross-site Later, his daughter arrives and uses his laptop to trackers only; however, further refinement of check out a website. We could imagine how sur- classification is possible regarding the state-of- prised she would be to see the specific gift-related the-art tracking landscape (Roesner et al., 2012). advertisements along the site. Clearly, the father In the case of the first type of trackers, the track- would have liked to separate his gift-searching ing cookie is owned by the visited site, and–since activities from regular ones. the tracker is never visited directly–the cookie is The goal would be to separate activities con- intentionally leaked for tracking (which is referred ducted on different sites, or–more formally–to to as cookie handover). Cross-site trackers own reach the unobservability of the actions of the the cookie themselves, enabling tracking user user (See Figure 2). Regarding a specific site visit, activities over site boundaries (the cookie can unobservability means anonymity towards this be set from a first- or third-party position, too). site (i.e., the visitor cannot be identified within However, both cooperation between trackers the group of visitors), plus undetectability of the and combination of these types have been reported visit for both other sites and other Web actors recently (Roesner et al., 2012). Cooperating track- (Pfitzmann & Hansen, 2010). At this time, these ers share tracking information–such as identifiers goals cannot be accomplished by using default Web and location information upon a visit–which means browsing software (not even in private browsing that a tracker that is not present can learn about it. mode; Aggarwal et al., 2010); some success, how- For instance, Roesner et al. mention the example ever, can be achieved by using anonymous Web of admeld.com leaking its tracking cookie and the

Figure 2. The scheme of undetectable web browsing: a request is unobservable for all actors, and anonymous towards communication partners (i.e., websites). Ideally, this means no actors (users and Websites) can decide whether a user made a request or not, and no Websites can specify which user made the request it received (responses are proxied through the anonymous Web browsing service).

139 Tracking and Fingerprinting in E-Business

top level page URL to turn.com. Cooperating track- Penetration of social widgets is also worth men- ers can seem to be misleadingly privacy-friendly, tioning. In May 2011, the presence of Facebook, as for cooperating parties a single detector that Google, and Twitter widgets was estimated at 33%, invokes others later or sends feedback on detours 25%, and 20% respectively among top 1,000 sites can have a larger implication on privacy than (Efrati, 2011), but surprisingly, their presence it seems. Combination of tracker types implies seems to stagnate. Roesner et al. measured the that a within-site tracker gains cross-site tracker presence of both Facebook and Google at around capability: the within-site tracker is embedded 30% among the Alexa top 500 sites, and Twitter into the detector code of the cross-site tracker (the at 18.6%. In another recent work, Kontaxis et al. cookie is owned by the cross-site tracker). In this (2012) estimate the presence of the Facebook case, the cross-site tracker learns environmental Like button at 35% among the top 10,000 sites information of the user via its tracker asset, but (in June 2012). However, widgets also provide identifies the user via the within-site tracker. income for another group of services, namely Empirical measurements on tracking tech- companies hosting embeddable widget collections niques indicate significant penetration. Gomez (e.g., AddThis), who fund their services from et al. (2009) analyzed 393,829 unique domains selling data obtained by tracking visitors (Mayer (collected by the users of the Ghostery Firefox & Mitchell, 2012). plugin) of which 88.4% used Google Analytics (a As revealed in this section, tracking is widely within-site tracker), and top 50 Websites contained adopted for tracking users. It is not realistic to at least one tracker, but some had as many as 100. assume that these techniques are going to van- Ayenson et al. found Google Analytics to be pres- ish in the near future; rather, on the contrary, ent at 97 of the QuantCast.com top 100 sites in we expect them to spread further and to develop 2011. In their recent work, Roesner et al. (2012) new forms to walk one step ahead of consumers. also confirmed significant tracker presence while In the next chapters, we review currently known analyzing the Alexa top 500, as they found 7,264 and emerging techniques. We discuss storage- instances of 524 trackers present on these sites. based tracking methods first, and “storageless” According to their work, “tracker morbidity” is techniques–which are expected to dominate in depicted as the top 20 trackers being present in near future–afterwards. 26-297 sites in the top 500. Trackers benefit from being visited as first- party sites as they are allowed to set cookies BASIS OF TRACKING: even when third-party cookies are blocked in the REGULAR TECHNIQUES browser agent. Although cross-site trackers are usually not visited directly, there are two excep- In the early years of the Internet, users were tions. First, social networking sites are visited identifiable uniquely by their IPv4 addresses, but voluntarily for their services, and as confirmed IP-based tracking soon became obsolete due to by the results of Roesner et al., some sites force the widespread use of network address translation visitors to view them from a first-party position (NAT) and dynamic IP addresses, as in both cases (e.g., by opening a popup window or applying a multiple clients may use the same address. Despite redirection chain). Clearly, both are used to cir- its inaccuracy as an identifier, the IP address is still cumvent third-party cookie blocking settings of the usable when augmented with other information browser agent, as in the case of insightexpressai. (e.g., the user agent string), and can still play a com, mentioned by Roesner et al. part in tracking as part of a unique identifier (Yen et al., 2012). In addition, the IPv4 addresses are

140 Tracking and Fingerprinting in E-Business

widely used for visitor localization. Free databases the same principles with smaller modifications exist for pairing addresses with country-city loca- (e.g., using scripts or frames). tions, such as (IpToCountry, 2012), and research Problems related to cookies can be observed indicates that even a finer-grained localization is with other storage-based mechanisms, too. As we possible going down to street level (Wang et al., have mentioned previously, some trackers leak 2011). The role of the IP addresses may change cookies intentionally, but malicious third parties in the future as IPv6 is expected to spread, since may also try to steal cookies, for instance by having its addresses are practically capable of covering their scripts included into the site content, which all existing and future devices, thereby eliminat- seems to be common: Yue & Wang (2009) found ing the need for dynamic or translated addresses. that 66.4% of sites embedded external scripts in As IP addresses were not reliable anymore, their experiment. In addition, user-contributed Websites and trackers started to store unique identi- content can also contain cookie-stealing malicious fiers on the visitors’ computers, in the storage of scripts (e.g., cross-site scripting). Cookies are their browsers, first in the cookie database. After a sent along with the request, and then, if they are while, an increasingly relevant proportion of users being transmitted without protection, they can be started to delete cookies to opt out of tracking, subject to traffic sniffing3 or active sidejacking4. and coevolution has characterized web privacy Fortunately, in contrast to cookies, most stored since then. When trackers develop new tracking objects are not transmitted automatically (e.g., techniques, related protection mechanisms follow. Flash cookies and HTML5 storages), and therefore In this section, we review the classic storage-based not affected by these threats. techniques. As a relevant fraction of users soon started deleting cookies, trackers moved to using rich Storage-Based Techniques internet application storages, such as the Flash (Local Shared Objects) and SilverLight (Isolated Cookies have some associated basic protection, Storage) storages. Both plugins are widespread such as the same-origin policy: if a Website sets (respectively 95.78%, 69.33% according to a cookie, only the same site can read it later, al- Stat Owl (2012), and their storages share some lowing the identification of returning visitors, characteristics with cookies, with additional ad- but not cross-site tracking. However, Website vantages (for trackers). Being isolated from the operators soon realized that if they installed browser, these plugins allow the use of the same Web bugs2 on multiple sites, they could easily identifier in multiple browser agents in parallel extend their reach. Web bugs are small, usually (See Figure 3), and default expiration dates also 1x1 pixel-sized invisible images, and based on a favor tracking–perhaps larger cookie sizes, too. simple mechanism. When the user visits a site, For Flash, cookie size starts from 100 KB (for and the content is loaded, the Web bug is down- requests having a larger size, the user is asked for loaded from the third-party site (i.e., the tracker), permission), and permanent expiration is set by who has a chance to read and write cookies, and default (Local Shared Object, 2012). In the first can detect the first-party site address through the years, lower awareness surrounded these tech- referrer HTTP header. Therefore, identification is niques, but nowadays users are more aware, and possible, and the web bug owner can determine there are standardized ways to control (and clear) particular attributes of the visitor in addition to their storages such as the NPAPI ClearSiteData the IP address and browser agent information. (Chandna, 2011). Subsequent tracking techniques usually leverage The first instance of announcing Flash cookies being used for tracking, were the Persistent Iden-

141 Tracking and Fingerprinting in E-Business

Figure 3. Cross-browser tracking with Flash storage. After contacting the first site (1) a Flash based Web bug is downloaded from the tracker (2). After the Web bug is loaded it sets (or reads if it exists) the identifier (3) and sends it back to the tracker directly (4), augmented with profiling information (e.g., subject of site1.com). When visiting another site with another browser, a similar process is executed (5-8), which uses the same storage.

tifier Elements (PIEs for short), which were an- percent of the Alexa top 500 stored identifiers in nounced in 2005 (Unitied Virtualities, 2005). the HTML5 LocalStorage. However, they report Later, Flash cookies were reported to be used for two cases where LocalStorage was used instead of “cookie respawning” (regenerating) cookies to cookies, and a case where cookies were respawned circumvent cookie deletion and user preferences, from LocalStorage. and were found to be present on 54 of the top 100 Different layers of the browser cache are also sites (Soltani et al., 2009). Recent follow-up re- exploited for storing identifiers. A cache control search shows that Flash cookies are still present mechanism, e-tags (or entity-tags) is used as on 37 of the top 100 sites (Ayenson et al., 2011), backup for cookies, for instance, Ayenson et al. but a lower rate was confirmed by Roesner et al. (2011) mention kissmetrics.com, and Wramner (2012), as in their experiment only 35 of 524 (2011) describes the use of e-tags in the tracking trackers used Flash, and it was used as a backup mechanisms of TradeDoubler. In their intended only in 9 cases. The latter authors highlight the use, e-tags are used to determine if the online particular case of sodahead.com, where the Flash document has changed compared to the local backup cookie was even encoded. cache, similarly to fingerprinting (hashing) docu- Ayenson et al. (2011) additionally report that ments for comparison. As a disadvantage (from trackers are also moving to HTML5 storages, as the viewpoint of a tracker), e-tags are harder to they found that 17 of the top 100 sites used them. handle, since reading and writing them requires The HTML5 storages allow larger storage (5 MB), exact URL matches. On the other hand, e-tags and similarly to Flash permanent expiration is set cannot be blocked with cookies, and they are as default. Roesner et al. (2012) find that only one even available in private browsing mode. Another

142 Tracking and Fingerprinting in E-Business

control field, the last modified timestamp can also Further Privacy Issues and be exploited similarly to e-tags, simply by giving Techniques Applied a unique date to each visitor. Cached content can also be exploited for storing The techniques discussed before can be applied identifiers. The evercookie (2010) also uses this to RSS (or ) channels; this is referred to as method: it draws the identifier on some pixels of RSS tracking. In his recent thesis, Danis (2011) an image, and then makes the browser cache it. analyzed the possibilities of applying regular In order to read the image, it is loaded onto an techniques to RSS channels, and found several HTML5 canvas, where images can be managed on flaws in the implementations in browsers and a pixel-level basis. Other content caches can also channel reader software. In our opinion, one of his be used to store identifiers, such as variables in Ja- most important finding is that by using internal vaScript, or entity properties in CSS. Operational frames (represented by the