Tracking of the Finnish Web Users
John Bailey
Department of Information Systems Science
Hanken School of Economics
Helsinki 2019

HANKEN SCHOOL OF ECONOMICS
Department: Information Systems Science
Type of work: Master's Thesis
Author and student number: John Bailey 021090
Date: May 6th 2019
Title of thesis: Tracking of the Finnish Web Users

Abstract:
The development of the World Wide Web has always been a tightrope walk between the commercial needs of companies and the privacy rights of web users. Understanding the behaviors and preferences of web users provides the online advertising ecosystem with valuable targeting data. The mechanism facilitating this data collection is called online tracking, in which small snippets of code enable the surveillance of users across the websites they visit. User acceptance of tracking and targeted advertising involves some interesting paradoxes, and global data breaches and misuses of data have attracted growing media coverage and public outcry. Online tracking prevalence has been researched from many perspectives, including longitudinal and geographical ones, but very little is known about who is tracking and profiling Finnish web users. Because research has shown geographical differences in tracking behavior and prevalence, the aim of this thesis was to study the online tracking landscape from the perspective of a Finnish web user. As a proxy for Finnish web usage, this thesis used the 500 websites most frequently visited by Finnish users according to the website ranking service Alexa. The sites were observed using the online tracking measurement tool Tracker Tracker, which documented the online trackers found on these websites. The resulting list of trackers was then enriched with organizational ownership data from the Disconnect dataset. The measurement found 466 unique trackers from 408 organizations on 410 of the 500 websites.
The core findings of this thesis supported contemporary research, with Google holding an overwhelming lead in tracker coverage: mostly through Google Analytics and Doubleclick, its trackers reached a combined 75 % of the websites. The second most prevalent tracking organization was Facebook, reaching 46 % of the websites. Beyond the top two organizations, competition was much tighter, and the ranking ended in a long tail of organizations. There were also notable differences between Finnish and non-Finnish websites, indicating some level of geographical preference in publishers' choices of advertising platforms and analytical tools.

Keywords: online tracking, cookies, tracking prevalence, third-party tracking, targeted advertising, privacy, Finnish web users

SVENSKA HANDELSHÖGSKOLAN
Department: Information Systems Science
Type of work: Master's Thesis
Author and student number: John Bailey 021090
Date: 6.5.2019
Title of thesis: Tracking of the Finnish Web Users

Abstract:
Development on the web has always been a difficult balancing act between the commercial needs of companies and web users' right to data protection. Good insights into users' behavior and preferences have proven valuable for targeted advertising and personalized websites. These insights are built on data collected through online tracking, in which short code snippets can enable the monitoring of users across multiple websites. Users' opinions and behavior regarding online tracking and targeted advertising do not always align, but global data breaches and other misuses have recently been highlighted in the media and have provoked strong public criticism. The extent of online tracking has been studied from several perspectives, including longitudinal and geographical ones, but understanding of by whom and how Finnish users are tracked remains limited. Since earlier research has shown geographical differences in online tracking, the thesis aimed to fill this knowledge gap by examining online tracking from the perspective of Finnish web users. The thesis used Alexa's top 500 websites visited by Finnish users to represent Finnish web usage. The sites' tracking requests were analyzed and documented with the measurement tool Tracker Tracker, and the results were enriched with data from Disconnect, which linked the observed tracking scripts to the organizations behind them. The measurement found 466 unique tracking scripts from 408 organizations on 410 of the 500 websites. The results of the thesis supported earlier research, with Google in a dominant lead: its tracking scripts, headed by Google Analytics and Doubleclick, were observed on 75 % of the websites. Facebook's tracking tools were the second most common, at 46 %. The rest of the ranking was noticeably more even and eventually formed a long tail. There were also noticeable differences between Finnish and non-Finnish sites, suggesting a geographical preference for certain advertising platforms and analytical tools on the Finnish websites.

Keywords: online tracking, tracking requests, cookies, third-party tracking, targeted advertising, data protection, Finnish web users

CONTENTS
1 INTRODUCTION
1.1 Aim of the study
1.2 Research questions
1.3 Delimitations
1.4 Key concepts
1.5 Structure of the thesis
2 EXISTING RESEARCH
2.1 What is online tracking?
2.2 Online tracking methods
2.3 Online tracking justification
2.4 Online tracking prevalence
2.4.1 Longitudinal research
2.4.2 Regional research
2.5 Tracking measurement tools
2.6 Privacy
2.6.1 Data collection
2.6.2 Data processing
3 RESEARCH METHODS
3.1 Research design
3.2 Input data gathering
3.2.1 Alexa
3.2.2 Disconnect
3.3 Data collection and measurement tool
3.3.1 Tracker Tracker
3.3.2 Ghostery
3.4 Data gathering considerations
3.5 Data gathering process
3.6 Data categorization
3.7 Data interpretation choices
3.8 Quality of the study
4 RESULTS AND ANALYSIS
4.1 Data description
4.2 Tracker prevalence
4.3 The Finnish digital footprint
4.4 Tracking in Finland
4.5 Other findings
5 DISCUSSION
5.1 Results and implications
5.2 Limitations