In-Depth Technical and Legal Analysis of Web Tracking on Health Related
Total Page:16
File Type:pdf, Size:1020Kb
In-depth technical and legal analysis of Web tracking on health related websites with Ernie extension Vera Wesselkamp, Imane Fouad, Cristiana Santos, Nataliia Bielova, Arnaud Legout To cite this version: Vera Wesselkamp, Imane Fouad, Cristiana Santos, Nataliia Bielova, Arnaud Legout. In-depth tech- nical and legal analysis of Web tracking on health related websites with Ernie extension. 2021. hal- 03241333 HAL Id: hal-03241333 https://hal.archives-ouvertes.fr/hal-03241333 Preprint submitted on 31 May 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Proceedings on Privacy Enhancing Technologies ..; .. (..):1–18 Vera Wesselkamp*, Imane Fouad*, Cristiana Santos, Nataliia Bielova, and Arnaud Legout In-depth technical and legal analysis of Web tracking on health related websites with Ernie extension Abstract: Searching for doctors online has become an in- 1 Introduction creasingly common practice among Web users. However, when health websites owned by doctors and hospitals in- Health data is known to be one of the most sensitive tegrate third-party trackers, they expose their potential types of data, and massive health data leaks is recog- patients’ medical secrets to third parties, thereby vio- nized to be of particularly high severity to the users’ pri- lating the GDPR which only allows the processing of vacy, according to the French Data Protection Authority sensitive health data with the explicit consent of a user. (CNIL) [22]. Searching for doctors online has become an While previous works detected sophisticated forms of increasingly common practice among Web users since cookie syncing at scale, no tool exists as of today that telemedicine peaked in 2020 during the global Covid- would allow owners of health websites detecting com- 19 pandemic [51]. However, the mere visit to a doctor’s plex tracking practices and ensure legal compliance. In website can reveal a lot about its visitor: one can infer this paper, we develop Ernie - a browser extension that which diseases a visitor has or is interested in. When- visualises six tracking and complex cookie syncing state ever health websites integrate third-party trackers, they of the art techniques. We report on the analysis with expose their potential patients’ medical secrets to third Ernie on 176 websites of medical doctors and hospi- parties1. When providing services or monitoring user’s tals that users would visit when searching for doctors in behaviour in the EU, health related websites integrat- France and Germany. At least one form of tracking or ing third-party trackers are in breach with the General cookie syncing occurs on 64% websites before interact- Data Protection Regulation (GDPR) [43] because pro- ing with the consent banner, and 76% of these websites cessing of sensitive health data (derived from a visit to a fail to comply with the GDPR requirements on a valid website) is generally forbidden, unless allowed by several explicit consent. Furthermore, an in-depth analysis of exceptions therein considered (Article 9(2) GDPR). case study websites allowed us to provide comprehen- In the last decade, research has been focused on sive general explanations of why tracking is embedded: quantifying the prevalence of tracking based on cook- for example, in all 45 webpages, where doctors include a ies or lists of known tracking domains [15–17, 32, 33, Google map to help locating their office, tracking occurs 54, 56, 57, 71, 72], while several recent studies de- due to the Google’s cookie already present in the user’s tected sophisticated forms of cookie syncing and ID browser which is attached to a request that fetched the sharing [42, 66, 67]. These studies were performed with Google map useful content. customized large-scale crawlers and hard to replicate Keywords: online tracking, cookie syncing, browser ex- for non-experts. Moreover, quantitative studies measure tension, GDPR, explicit consent, health data the prevalence of various tracking techniques, but rarely explain the reason why tracking is included. This ques- DOI Editor to enter DOI Received ..; revised ..; accepted ... tion is particularly important for health related websites that, differently from commercial websites, do not have an incentive to include targeted advertisement. As a result, owners of health related websites, such as doctors and hospitals, have the urgent need to be able to detect tracking and advanced cookie synchroni- *Co-first Author: Vera Wesselkamp: Inria - Institut Na- tional de Recherche en Informatique et en Automatique 1 According to the French Code of Public Health [23, Article *Co-first Author: Imane Fouad: Inria - Institut National L1110-4], medical secret covers “all information the person com- de Recherche en Informatique et en Automatique ing to the knowledge of the professional, of any member of the Cristiana Santos: Utrecht University [Utrecht] staff of these establishments, services or organizations and of any Nataliia Bielova, Arnaud Legout : Inria - Institut Na- other person in relation, by virtue of his activities, with these tional de Recherche en Informatique et en Automatique establishments or organizations. It applies to all professionals working in the health care system”. Article title 2 sation techniques on their website in order to determine and legal qualitative analysis of one case study website whether the included third parties may be leaking their for each type of potential violation. This analysis helped patients’ health data. While some browser extensions us (1) to uncover the mechanisms used by trackers that visualise known tracking third parties or third party circumvent Firefox’s Enhanced Tracking Protection [40] cookies [28, 31, 44, 61, 64], no browser extension exists used in our experiments; and (2) to identify the reasons as of today that is able to visualise sophisticated forms why tracking is included in otherwise unsolicited health of cookie synchronisation and sharing of user’s identi- websites. This approach demonstrates the usefullness of fiers [42, 66, 67] across third parties. Therefore, own- Ernie browser extension that is a first prototype of an ers of health-related websites are in a difficult position extension that can be further used by non-expert users2. where it is close to impossible to determine tracking and In summary, we make the following contributions: complex cookie syncing included in their websites. (1) We propose the first browser extension Moreover, since processing health data is forbidden Ernie3 that visualizes complex cookie syncing by the GDPR, health website owners can only rely on and ID sharing tracking techniques. Ernie de- one exception and implement a specific type of con- tects 6 categories of such tracking behaviors – Basic sent mechanism, called explicit consent, to make such tracking, basic tracking initiated by another tracker, processing lawful for all third parties included in the first to third party cookie syncing, third to third website. However, even for a basic consent to be legally party cookie syncing, third party cookie forwarding, valid, it has to comply with at least 22 different fine- and third party analytics– following to the state-of- grained requirements [75]. While general websites im- the-art methodology from Fouad et al. [42]. plement cookie banners to comply with the legal re- (2) We perform a legal and technical analysis of quirement of consent, recent works made evident that consent collection on 176 health related web- in practice websites often do not contain any cookie ban- sites and identify practices potentially violat- ners, or contain banners that do not respect the user’s ing the GDPR and the ePrivacy directive. We choice [63, 65, 66, 74]. Therefore, doctors and hospi- found that 64% of the websites track users before tals need to ensure that if their health websites contain any interaction with the banner. Moreover, 76% of tracking or any form of sophisticated cookie syncing, a these websites fail to comply with the legal require- valid and explicit consent must be collected before any ments for a valid explicit consent: out of 176 stud- of such activities are included. ied websites, 46% do not display a cookie banner, In this paper, we perform the first in-depth quali- and 75% thereof still contain tracking, thus violating tative study of third-party tracking, including complex the explicit consent legal requirement; 26% of the cookie syncing and ID sharing techniques on health re- websites provide a cookie banner without a reject lated websites, that are mostly owned by doctors and button, and 86% of these websites include track- hospitals in two EU countries: France and Germany. We ing, hence violating the requirement to give users designed a new Firefox browser extension called Ernie the possibility to reject tracking. Moreover, we show that performs a state-of-the-art detection and visual- that the user choice is not respected on health re- izes sophisticated forms of tracking and ID sharing on a lated websites: 33 (19%) websites still contain track- visited website, based on 6 different categories of third- ing after cookie rejection. party tracking from Fouad et al. [42]. (3) We analyse in depth 5 case study websites, Instead of relying on categorisation services [74, 76], one per each type of tracking and legal vio- we carefully selected 176 websites that Web users would lation, to provide a comprehensive explana- find whenever searching for particular doctors in 2 ma- tion of why tracking is happening on health jor French and German cities, and manually visited related websites. Such in depth analysis helped them with Ernie extension.