The Representativeness of Automated Web Crawls as a Surrogate for Human Browsing

David Zeber (Mozilla), Sarah Bird (Mozilla), Camila Oliveira (Mozilla), Walter Rudametkin (Univ. Lille / Inria), Ilana Segall (Mozilla), Fredrik Wollsén (Mozilla), Martin Lopatka (Mozilla)

To cite this version: David Zeber, Sarah Bird, Camila Oliveira, Walter Rudametkin, Ilana Segall, et al. The Representativeness of Automated Web Crawls as a Surrogate for Human Browsing. The Web Conference, Apr 2020, Taipei, Taiwan. 10.1145/3366423.3380104. hal-02456195

HAL Id: hal-02456195, https://hal.inria.fr/hal-02456195, submitted on 27 Jan 2020
KEYWORDS
Web Crawling, Tracking, Online Privacy, Browser Fingerprinting, World Wide Web

ABSTRACT
Large-scale Web crawls have emerged as the state of the art for studying characteristics of the Web. In particular, they are a core tool for online tracking research.