Web Tracking – a Literature Review on the State of Research

Proceedings of the 51st Hawaii International Conference on System Sciences j 2018 Web Tracking – A Literature Review on the State of Research Tatiana Ermakova Benedict Bender University of Potsdam University of Potsdam [email protected] [email protected] Benjamin Fabian Kerstin Klimek HfT Leipzig University of Potsdam [email protected] [email protected] Abstract (Fabian et al., 2015; Bender et al., 2016). Hence, driven by a variety of enabling techniques (Besson et Web tracking seems to become ubiquitous in online al., 2014; Sanchez-Rola et al., 2016), web tracking has business and leads to increased privacy concerns of become ubiquitous on the Web (Roesner et al., 2012), users. This paper provides an overview over the across websites and even across devices (Brookman et current state of the art of web-tracking research, al., 2017). Besides targeted advertising (Sanchez-Rola aiming to reveal the relevance and methodologies of et al., 2016; Parra-Arnau, 2017), web tracking can be this research area and creates a foundation for future employed for personalization (Sanchez-Rola et al., work. In particular, this study addresses the following 2016; Mayer & Mitchell, 2012; Roesner et al., 2012), research questions: What methods are followed? What advanced web site analytics, social network integration results have been achieved so far? What are potential (Mayer & Mitchell, 2012; Roesner et al., 2012), and future research areas? For these goals, a structured website development (Fourie & Bothma, 2007). literature review based upon an established For online users, especially mature, well-off and methodological framework is conducted. The identified educated individuals, who constitute the most preferred articles are investigated with respect to the applied target group of web tracking (Peacock, 2015), the web research methodologies and the aspects of web tracking practices also imply higher privacy losses tracking they emphasize. (Mayer & Mitchell, 2012; Roesner et al., 2012) and risks including price discrimination, government surveillance, and identity theft (Bujlow et al., 2015, 1. Introduction 2017). For instance, Narayanan & Shmatikov (2009) could correctly identify over one third of users given Compared to digital advertising, where advertisers their social patterns on Twitter and Flickr. and publishers sign private deals, programmatic Despite the popularity of web tracking within advertising automates the purchase of digital ad- commercial and research communities (Libert, 2015; inventory, in a common case by means of real-time Hamed et al., 2013; Acar et al., 2014; Han et al., 2012; auctioning and bidding (Stange & Funk, 2015). Roesner et al., 2012; Schelter & Kunegis, 2016a, Programmatic advertising attracts an increasing 2016b; Englehardt & Narayanan, 2016; Gomer et al., number of marketers and advertisers (O'Connell, 2014) 2013), earlier works mainly present single aspects of by allowing them to reach their target audiences within the topic. the “right” context and, hence, generate higher returns As a literature review on the state of research on on their brand campaigns (Fernandez-Tapia, 2016). web tracking, this paper aims to reveal the relevance Online users’ browsing behavior is seen as a and methods of this research field (vom Brocke et al., worthwhile source for building their detailed profiles 2009) and creates a foundation for further research (Mitchell, 2012; Falahrastegar et al., 2016), being of (Baker, 2000). In particular, the focus is placed on the high relevance to improve the above outlined following research questions: (1) What methods are commercial activities (Roesner et al., 2012). followed? (2) What results have been achieved so far? Against this background, online users are (3) What are potential future research areas? increasingly tracked in real time and across multiple websites (Gomer et al., 2013, Falahrastegar et al., 2. Method 2014), although presumably with different levels of intensity (Ermakova et al., 2017), and even by emails URI: http://hdl.handle.net/10125/50485 Page 4732 ISBN: 978-0-9981331-1-9 (CC BY-NC-ND 4.0) For this literature review, we follow a five-step With respect to the history of the topic, web approach by Herz et al. (2010), which requires review tracking was considered to be part of research on scope definition, topic conceptualization, literature information seeking before the emergence of Web 2.0 search, analysis and synthesis, and research agenda. (Taylor & Pentina, 2017). It was related to transaction- log analysis and can be dated back to the mid-1960s 2.1. Definition of the review scope (Fourie & Bothma, 2007). In the literature review by Jansen & Pooch (2001), the focus was placed on web tracking for monitoring the use of databases, CD-ROM As recommended by vom Brocke et al. (2009), we software and library catalogues. apply Cooper’s (1988) taxonomy for review scope This understanding of web tracking changed around definition. Specifically, we concentrate on research the year 2006 (Fourie & Bothma, 2007). Since then, it outcomes, methods, and applications, and aim to reveal refers to a set of techniques for websites to construct central issues and integrate findings. We base our work user profiles (Besson et al., 2014; Sanchez-Rola et al., on a representative source sample, combine conceptual 2016). Web tracking is nowadays also understood as a and methodological formats to organize the review and widespread Internet technique that collects user data present the results from a neutral perspective, for purposes of online advertisement, user addressing general scholars and public. authentication, content personalization, advanced PRIVACY TECHNOLOGY COMMERCIAL Anti-Tracking Software, Developers & Services Markets & Real-Time Bidding for Tracking, Advertising, Personal Data Location A Websites Tracking Data Aggregators Providers Website User Provider 1 Internet Service Provider A Other Mobile Tracking Data Website Consumers Provider 2 Advertising Programmatic Companies Advertisement Location B Internet Service Providers Provider B Website Provider 3 User Figure 1: Overview and conceptualization of Web Tracking website analytics, social network integration, and 2.2. Conceptualization of the topic website development (Sanchez-Rola et al., 2016; After the review scope has been defined, the Mayer & Mitchell, 2012; Roesner et al., 2012; Fourie research area is conceptualized to show what is known & Bothma, 2007). For these goals, web tracking allows about the topic (Torraco, 2005). For this, we base on an third-party or first-party websites to keep track of informally collected set of starting literature gathered users’ browsing behavior, including browsing during earlier work, or recommended by literature configuration and history (Sanchez-Rola et al., 2016). repositories such as ResearchGate based on previous A high-level overview and conceptualization of in research interests. The main literature review, in web tracking, including the major stakeholders contrast, focuses on documented and formal search involved, is given in Figure 1. A user accesses websites strategies for verification and repeatability. from a local device through an Internet Service Provider (ISP). Websites and ISPs may include Page 4733 tracking technology, either in-house or provided by Relevant - 2 - 159 22 1 15 by third parties that provide tracking services for multiple keyword sites (Pugliese, 2015), which enables cross-site Relevant 58 4 9 74 9 0 27 by tracking and data aggregation of individual browsing abstract habits and interests. If the user switches to a different Total 45 1 7 23 6 0 4 device or moves to another location, cross-device tracking (Brookman et al., 2017) and mobile tracking Table 1. Literature by database can be applied. Tracking data is often used for targeted advertising The articles were further checked for relevance (Roesner et al., 2012). This has created background based on their abstracts (see “Relevant by abstracts”) markets for programmatic advertising, including real- and duplicates (see “Total”). Out of the sample of 86 time bidding for available advertising slots on the articles, 58 could not be retrieved or were considered websites that are displayed to the user. Large-scale data inappropriate after a thorough examination and, hence, aggregators and other data consumers are also were eliminated. As a result of backward / forward interested to gather tracking and browsing data to searches (Webster & Watson, 2002, Herz et al., 2010; enrich data profiles on individual web users. This vom Brocke et al., 2009), only three new articles were creates major challenges for the protection of personal found. Finally, we obtained a total of 31 relevant privacy. Anti-tracking software and services aim at articles for in-depth analysis. reducing the privacy exposure to tracking mechanisms and infrastructure. In summary, we identify three main aspects of web 2.4. Literature analysis and synthesis tracking research: technology, privacy, and commerce. In addition, we investigate what kind of research In the next step, the collected literature was methodologies are applied in this field. analyzed and synthesized. Firstly, the articles were investigated with respect to the research methodologies 2.3. Literature search they apply. For these purposes, Wilde & Hess’ (2007) consolidated spectrum of research methodologies in The databases selected for literature acquisition information systems (IS) was adopted (Table 2). included Google Scholar,

Web Tracking – a Literature Review on the State of Research

The Web Never Forgets: Persistent Tracking Mechanisms in the Wild

The Internet and Web Tracking

Improving Transparency Into Online Targeted Advertising

Cookie Swap Party: Abusing First-Party Cookies for Web Tracking

Analysis and Detection of Spying Browser Extensions

Track the Planet: a Web-Scale Analysis of How Online Behavioral Advertising Violates Social Norms

Online Tracking, Targeted Advertising and User Privacy - the Technical Part Privacy and Web 2.0 Seminar Summer Term 2011

Web Tracking: Mechanisms, Implications, and Defenses Tomasz Bujlow, Member, IEEE, Valentín Carela-Español, Josep Solé-Pareta, and Pere Barlet-Ros

An Analytical Study of Web Tracking: in a Nutshell

A Comparative Measurement Study of Web Tracking on Mobile and Desktop Environments

Web Tracking and Current Countermeasures

Google Data Collection —NEW—