
Privacy Challenges in Online Targeted Advertising

Minh-Dung Tran

To cite this version:

Minh-Dung Tran. Privacy Challenges in Online Targeted Advertising. Computers and Society [cs.CY]. Université de Grenoble, 2014. English. NNT: 2014GRENM053. tel-01555362

HAL Id: tel-01555362 https://tel.archives-ouvertes.fr/tel-01555362 Submitted on 4 Jul 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

THESIS

To obtain the degree of DOCTOR OF THE UNIVERSITÉ DE GRENOBLE
Speciality: Computer Science

Ministerial decree: 7 August 2006

Presented by Minh-Dung Tran

Thesis supervised by Dr. Claude Castelluccia and co-supervised by Dr. Mohamed-Ali Kaafar, prepared at INRIA Rhône-Alpes, Privatics team, and within the École Doctorale Mathématiques, Sciences et Technologies de l'Information, Informatique

Privacy Challenges in Online Targeted Advertising.

Thesis publicly defended on 13 November 2014, before a jury composed of:

Pr. Martin Heusse, Grenoble INP - Ensimag, President
Dr. Paul Francis, Max Planck Institute for Software Systems, Reviewer
Dr. Marc-Olivier Killijian, CNRS, Reviewer
Dr. Vincent Toubiana, CNIL, Examiner
Dr. Claude Castelluccia, Inria, Thesis supervisor
Dr. Mohamed-Ali Kaafar, Inria, Thesis co-supervisor

Abstract

In modern online advertising, advertisers tend to track Internet users' activities and use these tracking data to personalize ads. Even though this practice - known as targeted advertising - brings economic benefits to advertising companies, it raises serious concerns about potential abuses of users' sensitive data. While such privacy violations, if performed by trackers, are subject to regulation by laws and audits by privacy watchdogs, the consequences of data leakage from these trackers to other entities are much more difficult to detect and control. Protecting user privacy is not easy, since preventing tracking undermines the benefits of targeted advertising and consequently impedes the growth of free content and services on the Internet, which are mainly fostered by advertising revenue. While short-term measures, such as detecting and fixing privacy leakages in current systems, are necessary, there also needs to be a long-term approach, such as a privacy-by-design ad model, to protect user privacy by prevention rather than cure. In the first part of this thesis, we study several vulnerabilities in current advertising systems that leak user data from advertising companies to external entities. First, since targeted ads are personalized to each user, we present an attack exploiting these ads on the fly to infer the user private information that has been used to select them. Second, we investigate common ad exchange protocols, which allow companies to cooperate in serving ads to users, and show that advertising companies are leaking user private information, such as web browsing histories, to multiple parties participating in the protocols. These web browsing histories are given to these entities at surprisingly low prices, reflecting the fact that user privacy is extremely undervalued by the advertising industry. In the second part of the thesis, we propose a privacy-by-design targeted advertising model which allows personalizing ads to users without the necessity of tracking. This model is specifically aimed at two newly emerging ad technologies - retargeting advertising and ad exchange. We show that this model provides strong protection for user privacy while still ensuring ad targeting performance and being practically deployable.

I dedicate this thesis to my wife, Minh Trang, and my little son, Minh Anh.

Acknowledgments

First and foremost, I would like to thank my family for their unconditional love and support. They have always been beside me to share and to help me overcome the most difficult moments of this journey. Second, I would like to express my sincere appreciation to my advisor, Claude Castelluccia, who gave me the opportunity to work on this interesting topic, for his great guidance and help during my PhD. I would also like to thank my co-advisor, Mohamed-Ali Kaafar, for his help and advice, especially at the beginning of my PhD life. I would like to thank Gergely Acs for the invaluable discussions we had, which helped me significantly advance in my work. I am very thankful to all my colleagues at INRIA in general, and in the Privatics team in particular, for their collaboration and for making my time at INRIA enjoyable. I extend many thanks also to all my friends for their support and the great moments we have shared in life. I would like to express my gratitude to my doctoral committee for their helpful comments and discussions. Finally, I gratefully acknowledge the financial support of my studies by the French Ministry of National Education.

Contents

1 Introduction
1.1 Privacy Challenges in Targeted Advertising
1.2 Identifying Trackers
1.3 Privacy Leaks
1.4 Privacy-Enhancing Initiatives
1.4.1 Regulation and Self-Regulation
1.4.2 Blocking
1.4.3 Privacy Preserving Targeted Advertising
1.5 Contributions
1.6 Organization

2 Background: Online Tracking and Privacy
2.1 Tracking Technologies
2.1.1 Collection Techniques
2.1.2 Identifiers
2.1.3 Behavioral Information
2.2 What They Know
2.3 Identifiability
2.3.1 Pseudonymous Identity Can Be Linked with Real Identity
2.3.2 Data De-anonymization
2.4 Why We Should Care about Online Privacy
2.4.1 Counter Arguments
2.4.2 Potential Risks of Privacy Violation
2.5 Conclusion

3 Related Work
3.1 Tracking the Trackers
3.1.1 Prevalence of Tracking
3.1.2 Tracking Techniques
3.2 Tracking Protection Techniques
3.3 Privacy Leakage
3.4 Privacy-Preserving Targeted Advertising
3.5 Economics of Privacy
3.5.1 Value of User Privacy
3.5.2 User Privacy as A Commodity

I Privacy Leaks in Targeted Advertising

4 Privacy Leaks in Targeted Ads Delivery
4.1 Motivation
4.2 Targeted Advertising: The Case of Google
4.3 Reconstructing User Profiles from Targeted Ads
4.3.1 Building Blocks
4.3.2 Extracting Targeted Ads
4.3.3 User-Profile Reconstruction
4.4 Evaluation
4.4.1 Experiment Setup
4.4.2 Evaluation Methodology
4.4.3 Result Analysis
4.5 Discussion
4.6 Summary

5 Privacy Leaks in Ad Exchange
5.1 Introduction
5.2 Background information
5.2.1 Cookie Matching
5.2.2 Real-Time Bidding
5.2.3 The Economics of Real-Time Bidding
5.3 Cookie Matching and RTB Detection
5.3.1 Request Hierarchy Detection
5.3.2 Cookie Matching Detection
5.3.3 Real-Time Bidding Detection
5.4 Cookie Matching and RTB Analysis
5.4.1 The RTBAnalyser Plugin
5.4.2 Dataset
5.4.3 Cookie Matching Privacy Analysis
5.4.4 Real-Time Bidding Privacy Analysis
5.5 Value of User Privacy
5.5.1 Considerations
5.5.2 Methodology
5.5.3 Dataset
5.5.4 Experiment Description
5.5.5 Results
5.5.6 Real Profile Analysis
5.6 Discussion
5.6.1 Data Exchange between Companies
5.6.2 Privacy-Preserving Targeted Advertising
5.6.3 The Economics of Private Data
5.7 Summary

II Privacy-Enhancing Solutions

6 A Practical Solution: Retargeting Without Tracking
6.1 Introduction
6.1.1 Context and Motivation
6.1.2 Our Proposal
6.2 Background: Retargeting and Privacy
6.2.1 Retargeting Mechanism
6.2.2 Privacy Concerns
6.3 Goals and Assumptions
6.3.1 Goals
6.3.2 Security Assumption
6.4 System Overview
6.5 System Details
6.5.1 Product Score Evaluation
6.5.2 Product Ranking
6.5.3 ......
6.5.4 Other Features
6.6 Privacy Analysis
6.6.1 Retargeter
6.6.2 Ad Exchange
6.6.3 Advertiser
6.6.4 User
6.7 Implementation and Evaluation
6.7.1 Implementation
6.7.2 Evaluation
6.8 Discussion
6.8.1 Compatibility
6.8.2 Scoring Algorithm
6.8.3 Gathering Statistics
6.9 Summary

7 Conclusion and Future Directions

List of Figures

4.1 An Example of a Preferences Page
4.2 Filtering targeted ads and inferring user interests
4.3 Filtering targeted ads
4.4 Experiment setup
4.5 Profile creation and reconstruction
4.6 Illustration of Precision and Recall
4.7 Precision, Recall and F-Measure with the "Same category", "Same parent" and "Same Root" comparison methods (from left to right respectively) used in both filtering and evaluation processes (hotspot scenario with X = 30)
4.8 Precision, Recall and F-Measure with the "Same category", "Same parent" and "Same Root" comparison methods (from left to right respectively) used in both filtering and evaluation processes (workplace scenario with X = 30)
4.9 Reconstructed Profile

5.1 Cookie matching protocol
5.2 Ad Exchange model
5.3 Winning price notification in Real-Time Bidding
5.4 Google's winning price format
5.5 Cookie matching frequency
5.6 Information leakage in Real-Time Bidding
5.7 Real-Time Bidding frequency
5.8 CCDF of the percentage of user's history that bidders learned through RTB
5.9 Monetary flows in advertising systems. The communication we monitored is indicated by (*). Source: [1].
5.10 Experiment for artificial profile analysis

6.1 System overview

List of Tables

4.1 Ad page categorization example
4.2 Profile reconstruction example
4.3 Profile size statistics
4.4 Reconstructing Google profiles performance in Hotspot scenario (X = 30 and Y = 10)
4.5 Reconstructing Google profiles performance in Hotspot scenario (X = 30 and Y = 15)
4.6 Reconstructing Google profiles performance in Workplace scenario (X = 30 and Y = 10)
4.7 Reconstructing Google profiles performance in Workplace scenario (X = 30 and Y = 15)

5.1 Google’s Cookie Matching URLs ...... 61 5.2 Clear-text price URL patterns ...... 64 5.3 Top pairs of domains executing cookie matching the most ...... 67 5.4 Top trackers ...... 68 5.5 Potential percentage of profile tracked after combination. Averages and ith quantiles...... 69 5.6 Artificial profile analysis. Prices in CPM...... 76 5.7 Real profile analysis. Prices in CPM...... 78 5.8 Real profile examples. Prices in CPM...... 79

6.1 Computational overhead at retargeter (per day)
6.2 Bandwidth overhead at ADX and retargeter (per day)

Chapter 1

Introduction

In targeted advertising, companies track users' online activities with the aim of personalizing ads. Even though this practice brings economic benefits, it raises significant concerns about potential privacy violations. This thesis aims to help companies and privacy advocates find ways to enhance user privacy in targeted advertising. Firstly, it exposes some vulnerabilities in current ad systems which potentially leak user private data from advertising companies to external parties, resulting in a loss of control over this data. Secondly, it proposes a technical design of a privacy-preserving targeted advertising system which provides strong protection for user privacy while retaining the current business and system models. In this first chapter, we highlight the main privacy challenges in targeted advertising which motivate our work. We also give background information about trackers and their roles in the advertising system. We then describe the possible privacy leaks related to these trackers, in order to position the threats that we expose. Subsequently, we analyze the major privacy-enhancing initiatives proposed to date and, based on this analysis, explain our choice of privacy-preserving targeted advertising. Finally, we summarize our contributions and present the thesis organization.

1.1 Privacy Challenges in Targeted Advertising

Online advertising brings substantial revenue to Internet companies and is key to the development of free content and services on the Internet. Consequently, increasingly sophisticated methods have been developed to improve the efficiency of advertising. Online targeted advertising (or behavioral advertising, or interest-based advertising) involves tracking Internet users' online activities (e.g., reading habits, searches or purchases) across websites and over time, inferring their interests and characteristics (e.g., age or gender), and personalizing ads to them based on these data. Since its introduction in the 1990s [2][3], targeted advertising has been evolving quickly and now accounts for a significant part of online advertising. The practice has been acclaimed by the industry as an efficient tool [4][5][6].

Theoretically, the benefits of targeted advertising are manifold. Personalized ads likely increase ad views and users' purchases or engagement, which consequently increases advertisers' revenue. In addition, publishers (i.e., content providers which sell ad spaces on their websites to advertisers) also get higher revenue, as advertisers tend to pay more for ads which yield higher performance. In return, higher revenue motivates publishers to provide better (free) content or services to users. It can be argued that users also benefit from useful and relevant ads which fit their interests.

Unfortunately, targeted advertising is often coupled with privacy concerns, as advertising companies increasingly track and analyze user activities, which might contain very sensitive information. These data, if misused by trackers or leaked to other entities (e.g., the government), may cause physical or mental harm to users. Possible harms include embarrassment, price and service discrimination, or termination of employment. On the other hand, privacy concerns have consequences for the advertising industry. They erode user trust in Internet companies and, as a result, users may turn to anti-tracking or anti-advertising tools, which is negative for the advertising business. Additionally, despite the tremendous amount of time and effort spent dealing with privacy watchdogs and lobbying privacy policy makers, the advertising industry still suffers from adverse regulations and law enforcement actions against its tracking behaviors [7][8][9]. Given the explosion of privacy concerns in recent years, the sustainability of targeted advertising is only possible if this practice conforms to privacy requirements.

The dilemma is that any attempt to enhance user privacy by limiting the user data collected by companies may also limit targeting performance and consequently reduce advertising revenue, which is unwanted from the economic perspective. Addressing privacy problems therefore poses multidisciplinary challenges from both technical and economic points of view. In the following, we highlight two main technical challenges:

1. Detecting privacy leaks in targeted advertising: Laws and regulations increasingly empower privacy watchdogs with audit and law enforcement rights to prevent trackers from abusing users' sensitive data. Unfortunately, data held by these trackers may intentionally or accidentally be leaked to other entities through their business practices. This data leakage may significantly enlarge the window of risk to user privacy and obstruct the enforcement of accountability. Since ad technologies are evolving quickly and becoming increasingly sophisticated, constant effort is needed to study and remedy privacy flaws in these technologies.

2. Privacy-preserving targeted advertising: While detecting (and fixing) privacy problems in targeted advertising is necessary, it can only be considered a short-term approach. Enhancing user privacy needs a privacy-by-design long-term solution to

protect user privacy by prevention rather than cure. A possible direction is to design a targeted advertising system which does not rely on tracking.

1.2 Identifying Trackers

The core entities of an advertising system are advertisers and publishers. Advertisers, e.g. hotels.com, want to promote their products to users by showing ads to them on the websites they visit. Publishers, e.g. nytimes.com, develop websites providing content or services to users and sell ad spaces on their sites to advertisers. Since advertising has been evolving with millions of publishers and advertisers, it is difficult for them to buy or sell ads directly from one another. Consequently, there are many other companies that play intermediary roles between advertisers and publishers. The two most common types of these companies are ad networks and ad exchanges.

– Ad Network: An ad network, e.g. Google Adsense, is a company that collects ads from advertisers in order to display them to users on publishers' pages. The role of the ad network is to maximize advertising efficiency for advertisers and advertising revenue for publishers. In targeted advertising, ad networks track users across websites and build user profiles. When a user visits a publisher page, the ad network selects the ads that best fit the user profile to display to the user. For example, a Google user profile might contain the user's interests, location, age and gender; a user who is interested in sport is more likely to receive sport-related ads than others.

– Ad Exchange: The Ad Exchange (ADX) is a company that allows ad spaces on publishers' pages to be traded in real-time auctions. When a user visits a publisher's page, the information about the ad space and the user is made available to a number of registered ad buyers. These buyers determine in real time whether they want to take the ad space, at what price, and with what message. These bids go through an auction, and the winner has the right to serve its ad to the user. In the ad exchange model, there are two other entities that help publishers and advertisers participate in the auctions: the Demand-Side Platform (DSP), which represents advertisers, and the Supply-Side Platform (SSP), which represents publishers. The goal of these entities is to help their customers optimize their strategies: DSPs help advertisers buy the most suitable ad spaces at reasonable prices, while SSPs ensure that publishers' ad spaces are sold at the most competitive prices. An ad network can play either role, DSP or SSP, in an ad exchange, depending on whether it has remnant ads or remnant ad-space inventory to make available through the exchange.

Besides, an SSP can also play the role of an ad exchange: it manages the ad spaces of its customers (publishers) and makes them available through its own exchange. ADXs, DSPs and SSPs extensively track users and use their profiles to optimize their strategies. For example, DSPs leverage user profiles to make their buying decisions in the ad exchange; ADXs use user profiles to classify users and only send bid requests to appropriate buyers; SSPs use user profiles to determine the minimum selling prices at auction.
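To make the flow concrete, the sketch below shows, as simplified JavaScript objects, the kind of data an exchange might forward to each bidder and a bidder's reply. The field names and values are purely illustrative assumptions, not the actual protocol messages (these are analyzed in Chapter 5); the point is that every bidder contacted learns the visited page together with a matched user identifier.

```javascript
// Hypothetical, simplified bid request an exchange could send to a DSP when a
// user loads a publisher page. Field names are illustrative, not a real spec.
const bidRequest = {
  auctionId: "a1b2c3",
  adSlot: { page: "http://www.example-news.com/article?id=42", size: "300x250" },
  user: {
    exchangeUserId: "ADX-73620194",  // the exchange's pseudonymous id
    buyerUserId: "DSP-55802211"      // id previously synced via cookie matching
  }
};

// The DSP consults the profile it holds for buyerUserId and answers with a bid.
const bidResponse = {
  auctionId: "a1b2c3",
  priceCpm: 0.45,                    // offered price per thousand impressions
  adMarkup: "<img src='http://advertiser.example/banner.png'>"
};

// The exchange compares all responses and lets the highest bidder serve its ad.
```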

An important party in targeted advertising systems is the data broker, e.g. BlueKai. Data brokers collect user information from various sources, online and offline, build user profiles and sell these profiles to interested entities, e.g. an ad network or ad exchange.

Terminology. In the scope of this thesis, we use the term ad broker for all entities operating between advertisers and publishers, be they Ad Network, Ad Exchange, SSP or DSP; these parties share the same set of characteristics and behaviors: they maintain user profiles by tracking or by buying data from data brokers, and they mediate the connection among publishers, advertisers and users.

Identifying trackers. Most privacy concerns are related to ad brokers and data brokers, which centrally collect user data across many websites on the Internet, often without user knowledge or consent. They are operated by companies which users might have never heard of, have no relationship with, and "would not choose to trust with their most private thoughts and reading habits" [10].

1.3 Privacy Leaks

In response to public concerns about user privacy, most trackers claim that they use the tracking data solely for advertising or for improving their content and services. In its privacy policy [11], Google states that "We use the information we collect from all of our services to provide, maintain, protect and improve them ... to offer you tailored content – like giving you more relevant search results and ads.", and that "We will ask for your consent before using information for a purpose other than those that are set out in this Privacy Policy." In fact, these companies are subject to legal punishment if they violate these written policies, for example under the U.S. Federal Trade Commission Act, which prohibits "unfair" or "deceptive" commercial practices [9]. However, even under the most optimistic assumption that these companies can be trusted not to abuse tracking data to harm users, and that the law can regulate such malicious actions, there are various possibilities that the data maintained by these companies can be intentionally or accidentally leaked to other entities, resulting in a loss of control over these data. In the following, we summarize the main types of these potential privacy leaks.

Security of data storage is not sufficient: When users' data are stored on the trackers' servers, they might be vulnerable to attacks on the storage if adequate security measures are not taken. For example, a recent intelligence leak showed that the U.S. National Security Agency (NSA), in one of its surveillance programs, tapped into the private links that connect Google and Yahoo data centers around the world and secretly collected user data flowing through these links; these data were transmitted in the clear [12]. The companies concerned are also among the most prevalent trackers on the Internet.

Trackers sell data to other entities: Ad brokers may not want to sell user data. As observed by Harper [13], "most websites and ad networks do not sell information about their users ... If an ad network sold personal and contact info, it would undercut its advertising business and its own profitability." On the contrary, the business of data brokers is to collect and sell user data to interested parties. These brokers have little power to control the use of these data once they have been sold to their customers. For example, data sold by BlueKai, a prominent data broker, to its customers are subject to the customers' data retention policies, which are different from BlueKai's [14]. The list of companies buying data from a data broker, and their specific purposes, are normally unknown to users [15].

Ad brokers accidentally expose user data through their advertising services: Ad brokers communicate with many other parties, including advertisers (e.g., advertisers submit ads, specify user targeting criteria and view ad reports), other ad brokers (e.g. in the ad exchange model), and users (e.g. when delivering ads). Since the data exchanged in these communications are often related to user personal information, they can potentially be exploited to infer such information. For example, advertisers can manipulate the user targeting criteria to target a specific user and use the ad reports to learn information about the victim [16]. In the first part of this thesis, we study some vulnerabilities in current advertising systems that may accidentally leak user data from ad brokers to other entities.

1.4 Privacy-Enhancing Initiatives

As privacy concerns are widespread, multiple initiatives to enhance user privacy have been put forward. In this section, we discuss these initiatives and their limits, and based on this discussion explain our choice of a privacy-by-design approach.

1.4.1 Regulation and Self-Regulation

Privacy concerns have long drawn significant attention from legislative bodies [17][18]. For over a decade, the U.S. Federal Trade Commission (FTC) has been issuing multiple reports and urging legislation to enhance user online privacy [19][20]. Although several acts have been proposed that give the FTC the right to bring law enforcement actions against tracking companies, these rights are narrowly restricted to "unfair" and "deceptive" actions, i.e., when companies do not keep their promises to users. "As of May 1, 2011, the FTC has brought 32 legal actions against organizations that have violated consumers' privacy rights, or misled them by failing to maintain security for sensitive consumer information." [9]. With one of its core objectives being to "fend off adverse legislation and regulation", a group of advertising companies formed the Interactive Advertising Bureau (IAB) [21], which then joined other similar organizations in the Digital Advertising Alliance (DAA). The DAA has published a set of self-regulatory principles for its member companies, with guidelines on enhancing user choices, knowledge and transparency. The two major DAA privacy initiatives are the advertising opt-out page [22] and the AdChoices icon [23]. The opt-out page allows users to opt out from targeted advertising by companies participating in the program. The AdChoices icon is expected to be displayed at a corner of targeted ads to notify users that the ads are tailored to their interests. By clicking on the icon, users can choose to opt out from targeted ads from the respective ad provider. According to privacy advocates, the language of these self-regulations is "loose enough" and they contain exceptions that allow the concerning practices to continue [24][25]. As with other cases of industry privacy self-regulation [26], there is little enforcement against those who do not follow the DAA's principles [27]. In addition, the online opt-out tool, while challenging for users to understand and configure [28], merely allows users to stop receiving targeted ads; the tracking and the use of tracking data for other purposes are unaffected. The AdChoices icons are not recognized by users and are, in fact, rarely displayed [24]. In order to facilitate user choice about tracking, privacy advocates proposed Do-Not-Track (DNT). With DNT, users simply configure their web browsers to add a special HTTP header, DNT:1, to each request to notify trackers that they do not want to be tracked (DNT:0 means the user agrees to be tracked); users do not have to visit a special webpage to turn it on, as in the case of online opt-out tools. Companies which honor DNT must respect the user choice expressed through DNT headers. To date, DNT has been implemented in most browsers, yet merely 20 companies have reportedly honored the signal; no technical audit has been performed to check their compliance [29]. A dedicated W3C group was organized with the aim of standardizing DNT [30]. However, progress is slow and the group has not yet reached a consensus, due to three contested aspects [31][32][33]:

– Default setting: While some believe that '1' should be the default value of DNT, the industry rejects this idea, claiming that the choice should be explicitly made by users and that DNT should be unset by default. However, an unset DNT header means that companies can still perform tracking, as in the case where DNT is set to '0'.

– Definition of DNT: The industry wants to interpret the DNT signal as "do not target", i.e. advertising companies stop delivering targeted ads, while privacy advocates want it to mean "do not collect information".

– Architecture of negotiation: How do sites get users who configure DNT:1 to opt in and to remain opted-in?

These conflicts are unlikely to be solved in the near future.
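As a small illustration of the mechanism (the DNT header itself, e.g. DNT: 1, is attached by the browser to every outgoing request), the following sketch shows how a page script can read the visitor's preference; it is a minimal example, not a description of any standardized compliance behavior.

```javascript
// Minimal sketch: reading the user's Do-Not-Track preference from a page script.
// navigator.doNotTrack is "1" when DNT is enabled, "0" when the user explicitly
// allows tracking, and null or "unspecified" when no choice has been made.
function dntStatus() {
  const dnt = navigator.doNotTrack || window.doNotTrack;
  if (dnt === "1") return "do-not-track";
  if (dnt === "0") return "tracking-allowed";
  return "unset";   // the contested default: most trackers treat this as consent
}

console.log("DNT preference:", dntStatus());
```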

1.4.2 Blocking

Tracking requires that web browsers send user information in the form of HTTP requests to trackers. Blocking tools identify the trackers' domains or the patterns of tracking requests and then block these requests from being sent to trackers. These tools are typically installed as browser extensions [34][35][36], or come as built-in features of web browsers [37]. The block list is often configured by the company providing the tool, or users can optionally select different lists from other third parties. Most of these tools offer users the ability to selectively block/unblock each tracker in the list. Since blocking tools prevent the browser from communicating with blocked sites, they can be fairly effective in preventing tracking. However, the solution is only realistic for advanced users, since these tools contain serious usability flaws, as revealed by a usability study [28]. These flaws are mostly due to default settings that are inappropriate for protecting user privacy, lack of communication and feedback, confusing interfaces, and negative effects on the functioning of websites. In addition, users cannot distinguish between trackers and therefore cannot make meaningful choices about which trackers to block. Finally, since the list of tracking companies and technologies changes constantly, it is difficult and time-consuming for tool providers to maintain block lists.

1.4.3 Privacy Preserving Targeted Advertising

The privacy initiatives presented above have been framed around the question of tracking or not tracking and of how the choice should be given to users (e.g. by using opt-out tools, DNT or blocking tools). However, tracking puts user privacy at risk, while anti-tracking prevents targeted advertising and consequently undermines advertising revenue. Giving choices to users does not help avoid the zero-sum game between protecting privacy and enhancing economics: if privacy is enhanced, then advertising revenue is inevitably compromised. Not surprisingly, the advertising industry has been reluctant to adopt these privacy-enhancing initiatives, even though it has faced increasing public and legal pressure to improve user privacy. On the other hand, a reduction of advertising revenue impedes the growth of the Internet; Internet users may end up paying higher prices or receiving lower-quality content. Privacy Preserving Targeted Advertising (PPTA) initiatives aim to reconcile the two conceptually conflicting goals of privacy and economics by proposing technical system designs that allow targeted advertising without relying on tracking users. There have been several research proposals in this direction [38][39][40][41]. Their common idea is to keep users' data on their devices and shift the ad-user matching process to users' terminals or trusted devices, with the aim of preserving the confidentiality of user data. While other ongoing efforts are progressing slowly because they do not address the crux of the problem - the conflict between privacy and the benefits of targeted advertising - PPTA appears to be a promising approach. PPTA seeks to accommodate all privacy and targeting requirements in a positive-sum, win-win manner, rather than following a zero-sum approach which likely requires impractical trade-offs. Since privacy is integrated as a built-in feature in these systems, the major challenge of these approaches is to fit the diversified demands and business models of targeted advertising without introducing significant additional cost (e.g. infrastructure and network cost) or imposing significant latency in the operation of advertising systems.

1.5 Contributions

The potential privacy leaks and privacy-enhancing initiatives given above draw a landscape of the privacy challenges in targeted advertising. We now explain where our contributions stand in reaching the goal of enhancing user privacy in targeted advertising. Our first contribution is to expose some privacy leaks in current advertising systems that potentially leak user private data from ad brokers to other entities; these leaks fall into the third category of leakage described in Section 1.3. Different from the work of Korolova [16], which investigated leaks of user personal information when ad brokers interact with advertisers, we study the information leaks when these entities interact with users (when delivering targeted ads) and with other ad brokers in ad exchange. We prove that, since targeted ads are tailored to each individual user, they can potentially expose user personal information. These ads, if not well protected in transmission, could be a potential source for any eavesdropper to collect and infer a lot of user information. In addition, we investigate common ad exchange protocols and show that companies do not actually treat user privacy as a priority when designing such protocols, as user private information can easily be leaked to the many parties taking part in the process.

Existing and potential privacy threats, including the privacy leaks presented in this thesis, motivate the need for a privacy-by-design approach. The second contribution of this thesis is to propose a technical design of a privacy-preserving targeted advertising system. This design is aimed at a newly emerged advertising technology, retargeting advertising (which targets users' exact intentions or online actions), and at the ad exchange model. Retargeting and ad exchange protocols were not addressed in previous works [38][40][39][41]. In summary, our contributions include:

1. Privacy leak in targeted ads delivery: We present a novel attack that infers user personal information from ads. The attack consists of a filtering technique to filter targeted ads, and an inference technique to infer user private information. We show that any entity who has access to a small number of ads can retrieve a significant part of the user profile built by the ad network. The proposed attack is an example of the privacy risks related to fine-grained targeting technologies, which potentially arise not only in targeted advertising but also in any other personalized content or service.

2. Privacy leak in ad system: We conduct a privacy analysis of common ad exchange protocols. The results show that these protocols are prevalent and enable the dissemination of user private information among the various participating companies, some of which can significantly increase their tracking capabilities compared with their own tracking mechanisms. In addition, we show that the companies involved get access to this private user information at monetary prices surprisingly lower than what users might expect, showing that user privacy is extremely undervalued by the advertising industry.

3. A privacy-preserving retargeting ad system: We propose a practical solution to enable retargeting ads without the necessity of tracking. In our approach, user data are stored locally, transparent to users and under their control. We distinguish our approach by using homomorphic encryption to protect not only user data from ad brokers, but also the ad brokers' ad algorithms from users. Moreover, our approach is novel in that it is the first privacy-preserving solution for retargeting advertising and ad exchange.

1.6 Organization

The rest of the thesis is organized as follows.

• Chapter 2 provides background knowledge about tracking and the privacy problems that can result from it. This chapter highlights why online privacy is important.

• Chapter 3 summarizes previous work related to this thesis.

• Part I, including Chapters 4 and 5, focuses on privacy leaks in current advertising systems which potentially expose user private data from advertising companies to external parties.

– Chapter 4 presents an attack that exploits targeted ads on the fly to infer user private information. It provides quantitative results through experiments with real users in the case of Google.
– Chapter 5 studies privacy leakages in ad exchange protocols and provides a quantification of these leakages. It then presents an evaluation of user privacy by analyzing how advertisers value user private data in ad exchange auctions.

• Part II, which contains Chapter 6, proposes a privacy-enhancing solution.

– Chapter 6 presents our proposal of a retargeting system without tracking. This chapter describes the desired goals and security assumptions, presents the technical system design, gives a security analysis, and provides an evaluation of the additional cost.

• Chapter 7 concludes the thesis and discusses possible extensions to our work and other future research directions.

Chapter 2

Background: Online Tracking and Privacy

In this chapter, we describe tracking from a technical perspective. We then elaborate on what trackers can possibly collect by tracking users, and what information they can potentially infer. Trackers often argue that profiling data is anonymous and cannot be associated with any specific person. They also promote the argument that ordinary people have "nothing to hide" unless they are doing something wrong. We analyze these arguments and show that they are either incorrect or focus only on a narrow meaning of privacy. We also discuss why privacy is important and why people should care about their private lives on the Internet.

2.1 Tracking Technologies

From a technical perspective, it is useful to consider three parameters in a tracking action. First, an identifier, e.g. an IP address or a unique number that trackers generate and assign to a user, a device or a web browser. The identifier helps trackers associate the multiple pieces of information they receive with a user, a device or a web browser. Second, a piece of behavioral information, e.g. a visited site's url, which is related to the user's online activities. Trackers use this behavioral information to build user profiles associated with an identifier. Third, a collection technique that trackers use to retrieve identifiers and behavioral information and send them to their servers. A tracking action happens when a tracker, using a collection technique, collects an identifier along with one or more pieces of behavioral information. A typical example is a tracker, e.g. DoubleClick, putting its JavaScript tracking code on a website, e.g. hotels.com, to retrieve the user's cookie (i.e. a unique number generated for each user browser) and the website url, and then sending this information to its server. By performing this action on many websites, the tracker obtains a list of visited urls assigned to the user's cookie, based on which it can infer user information such as age, gender and interests. In the following, we clarify these technical details of tracking by elaborating on the possible collection techniques, identifiers and behavioral information.

2.1.1 Collection Techniques

Web page owners embed their own or third-party tracking scripts on their websites. When a page is loaded into the user's browser, the tracking scripts retrieve information related to the visit (Information retrieval) and send it to the remote server (Information transmission).

• Information retrieval: When a tracking request is sent to a tracker, some information about the user is included by default by the web browser in the request header, such as the browser information, the IP address and the url of the visited page. In case trackers need more information, such as the user's screen resolution, fonts or mouse behaviors, a common solution is to use JavaScript to retrieve such information and then encode it into the tracking requests' urls or cookies (cookies are discussed in more detail in the next section).

• Information transmission: The retrieved information is encoded in HTTP request headers, request urls or cookies. It is generally sent to remote servers in the form of tracking requests, which are initiated either through AJAX (e.g. by using XMLHttpRequest in JavaScript) or through HTML components that load remote resources (e.g. iframes, images, flash objects, etc.). A tracking code can include these HTML components, JavaScript, or both. JavaScript code can issue requests to trackers by using AJAX or by dynamically creating such HTML components.

To exemplify the process, we consider a typical scenario: the JavaScript code in a tracking script dynamically generates HTML code to display an image, e.g. <img src="http://tracker.com/[...]" width="1" height="1">, on the hosting page, e.g. example.com. This HTML img element loads the image from the url encoded in its src property, namely http://tracker.com/[...]. Upon receiving this request, the tracker records the user's visit to the page example.com. It responds with a one-pixel image which is then displayed on the current page, but is too small to be visible to the user.
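A minimal sketch of such a tracking script is given below; the tracker domain and parameter names are illustrative assumptions, but the pattern (dynamically creating a one-pixel image whose URL carries the collected data) is the one described above.

```javascript
// Sketch of a third-party tracking script embedded on example.com.
// It encodes the visited page and a few extra attributes into the URL of a
// dynamically created 1x1 image; the browser attaches the tracker's cookie
// to the resulting request automatically.
(function () {
  var img = document.createElement("img");
  img.width = 1;
  img.height = 1;
  img.src = "http://tracker.com/pixel" +
            "?page=" + encodeURIComponent(document.location.href) +
            "&screen=" + screen.width + "x" + screen.height;
  document.body.appendChild(img);
})();
```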

2.1.2 Identifiers

2.1.2.1 IP Address

Each device connected to the Internet is assigned an IP address, and subsequently uses this address to communicate with other entities on the Internet, e.g. to send and receive messages. The IP address is by default included in any HTTP request, and naturally can be

used by trackers to identify a device. IP tracking, however, is not widely used, for the following reasons. First, the IP address is in many cases dynamic and is not always bound to the same device. For example, an Internet Service Provider (ISP) may dynamically assign a different IP address each time a device connects to the Internet, or a user with the same laptop changes IP address when he moves from home to work. In addition, an IP address might be shared by many people in a workplace or a public place. Second, the IP address is considered personal information [42], and its collection should therefore be restricted. Even though an IP address does not always directly correlate with a given person, in a significant percentage of cases it can be used to identify a household. Combined with the information that users normally give to their ISP, such as name and address, the IP address can then be used to identify a user.

2.1.2.2 Pseudonymous Identity

To overcome the legal and technical barriers of IP tracking, most trackers use pseudonymous identities (ids) to track users. These ids are randomly generated numbers which are unique per user device or web browser. They are sent by trackers to users the first time they interact with these trackers, and stored on the user's device or in the web browser. Whenever users interact with these trackers, the pseudonymous ids are attached to the tracking requests to help the trackers identify them. In the following, we describe the different storage mechanisms that can hold pseudonymous ids on the client side.

Cookies: The idea of using cookies was first introduced by Lou Montulli in 1994, when he was working for Netscape Communications. A cookie is a small piece of data sent from a website and stored in the user's browser while the user is browsing this website. A cookie normally contains a name/value pair (e.g. "id=123") along with the domain to which it belongs, its expiration date and other information. A cookie is included by default in any request to the domain from which it originated. For example, a cookie whose domain is doubleclick.net will be attached to any request to this domain. Cookies are in fact the most common method of encoding user pseudonymous ids. These ids are typically encoded in the name/value pair contained in the cookie. For example, DoubleClick assigns a pseudonymous id to each user browser and encodes the id in its tracking cookie as "id=[browser id]". Typically, a tracking cookie expires two years after the last communication with the user. A cookie is stored inside a browser and is only accessible to JavaScript code from the same domain as the cookie.
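The server side of this mechanism can be sketched as follows (a simplified Node.js example; the id format, cookie attributes and port are assumptions for illustration): the tracker either reuses the id echoed back by the browser or generates a fresh one, and re-sends it with a long expiration date.

```javascript
// Sketch of cookie-based identification from the tracker's side (Node.js).
const http = require("http");
const crypto = require("crypto");

http.createServer((req, res) => {
  // Reuse the pseudonymous id if the browser already has one, else create it.
  const cookies = req.headers.cookie || "";
  const match = cookies.match(/(?:^|;\s*)id=([^;]+)/);
  const id = match ? match[1] : crypto.randomBytes(8).toString("hex");

  // Typical tracking cookies expire about two years after the last contact.
  res.setHeader("Set-Cookie", "id=" + id + "; Max-Age=" + 2 * 365 * 24 * 3600 + "; Path=/");

  // At this point the tracker would log (id, referring page) into its profiles.
  res.end();
}).listen(8080);
```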

Super cookies: The most well-known form of super cookies is Flash cookies, or Flash Shared Objects (FSO), which are set by the Flash plug-in commonly installed inside a browser. When users load web pages containing Flash components, these components can save Flash cookies to user devices and then use them the same way tracking codes do with regular cookies. However, unlike regular cookies, Flash cookies are stored outside web browsers and can operate across browsers. Moreover, Flash cookies never expire and, since they are not stored inside browsers, they are not detectable by browser privacy mechanisms or by most privacy tools installed as browser extensions. Consequently, companies can rely on data stored in Flash cookies to re-spawn normal cookies when they are deleted by users. Soltani et al. [43] showed that Flash cookies were commonly used by popular websites, some of which stored the same user data in both regular and Flash cookies, with the aim of re-spawning cookies when necessary.

ETags: The ETag, or entity tag, is part of the HTTP protocol and enables web cache validation in web browsers. The ETag is an optional HTTP header field used by content providers to store a version number for each version of a page's content. The user's web browser stores the ETag of the latest version it receives, and communicates the current ETag to the server when it wants to update content from the same address. The server compares the ETag in the client's request with that of the latest server-side version. If the client's version is outdated, it sends a new version along with the new ETag. Otherwise, the server sends a simple response to notify the client that the local version is up-to-date and can be used. In order to use ETags for tracking users, a tracker simply sets all ETags related to its domain to the same value, namely the user's pseudonymous id. Since the ETag is included in every request, it can be used like a cookie containing a tracking id.
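A sketch of this trick from the tracker's side is shown below (Node.js, illustrative only): the ETag field carries the user's id instead of a content version, and the browser faithfully echoes it back in the If-None-Match header whenever it revalidates the cached resource.

```javascript
// Sketch of ETag-based tracking (Node.js). Illustrative, not production code.
const http = require("http");
const crypto = require("crypto");

http.createServer((req, res) => {
  // A returning browser echoes the cached "version", i.e. the tracking id.
  const previousId = req.headers["if-none-match"];
  const id = previousId || crypto.randomBytes(8).toString("hex");
  console.log("seen user", id, "coming from", req.headers.referer || "unknown page");

  if (previousId) {
    res.statusCode = 304;            // keep the cached copy (and its ETag) alive
    res.end();
  } else {
    res.setHeader("ETag", id);       // stored by the browser with the resource
    res.end("tracking resource");
  }
}).listen(8081);
```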

HTML5 Local Storage: HTML5 Local Storage is a client-side storage mechanism that allows websites to store data on the local device. It can be used by trackers to store and retrieve user ids for tracking. Data stored in this storage is persistent, like a Flash cookie. However, unlike Flash cookies (which require the Flash plug-in), HTML5 Local Storage does not require any plug-in and can therefore be used as a universal tracking mechanism.
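A sketch of the client-side usage: unlike a cookie, a value kept in Local Storage is not attached to requests automatically, so the tracking script reads it and appends it to its own requests (the key name and tracker URL below are illustrative).

```javascript
// Sketch: persisting a tracking id in HTML5 Local Storage and sending it along.
let id = localStorage.getItem("trackingId");
if (id === null) {
  id = Math.random().toString(36).slice(2);   // naive id generation, for the sketch
  localStorage.setItem("trackingId", id);
}
new Image().src = "http://tracker.com/pixel?id=" + id +
                  "&page=" + encodeURIComponent(location.href);
```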

Others: Many other types of storage, such as Silverlight isolated storage or HTML5 Session Storage, are available and can be used for storing tracking ids [27][44]. Kamkar [44] introduced Evercookie, a JavaScript API which enables tracking ids to be written to and read from at least 10 storage places of different types (including all those mentioned above) on a user's device, with the possibility of re-spawning data from one storage place to the others, making it difficult even for experts to delete the tracking ids [45].

2.1.2.3 Browser Fingerprints

Information about a browser or a device, for example the user agent, language, fonts, screen resolution, operating system, list of plug-ins, etc., if taken together, can be used to uniquely or nearly uniquely identify a browser or a device. For example, Eckersley [46] shows that, based on this information, a browser is unique among at least 286,777 other browsers, and that, among his sample of around 470,000 browsers, 94.2% of those with Flash or Java are unique. This study also shows that even though fingerprinting information can change rapidly, a simple heuristic is sufficient to associate a new fingerprint with the old one, with 99.1% accuracy. Fingerprinting information can be obtained passively. For example, information about the operating system, user agent and language is included by default (by the web browser) in every HTTP request header. Other information, such as the time zone, display settings, installed fonts and installed plug-ins, can be actively retrieved by JavaScript code. Passive fingerprinting is problematic since it does not leave any trace on the client terminal, and therefore cannot be detected. Active fingerprinting, however, can be detected by examining JavaScript code for suspicious behaviors, such as font-probing actions [47][48].
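The sketch below illustrates the active variant: a few attributes are read with standard JavaScript APIs and condensed into a single value. Real fingerprinters combine many more signals (installed fonts, canvas rendering, plug-in lists); the hash used here is only a toy.

```javascript
// Toy sketch of active browser fingerprinting.
function simpleFingerprint() {
  const attrs = [
    navigator.userAgent,
    navigator.language,
    screen.width + "x" + screen.height + "x" + screen.colorDepth,
    new Date().getTimezoneOffset(),   // time zone
    navigator.platform                // operating system family
  ].join("||");

  // Tiny non-cryptographic hash, just to turn the attribute string into an id.
  let h = 0;
  for (let i = 0; i < attrs.length; i++) {
    h = (h * 31 + attrs.charCodeAt(i)) | 0;
  }
  return (h >>> 0).toString(16);
}

console.log("fingerprint:", simpleFingerprint());
```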

2.1.3 Behavioral Information

Web history: The primary piece of data being collected online is the web url. When a user visits a website which contains third-party tracking code and the code initiates a request to a third-party server, the currently visited web url is included by default in the Referer field of the request's HTTP header. It can also be retrieved by JavaScript, and then processed and transmitted with the request. As shown by a study [49], DoubleClick, for example, can observe on average 39% and, in the maximal case, up to 61% of a user's web history. The Wall Street Journal found that, when a user visits a popular website, dozens of third parties are aware of the visit [50].

User exact intention: For the purpose of retargeting (retargeting is the technology used by advertisers to re-advertise to users the exact products they previously showed interest in on their websites; it is described in more detail in Chapter 6), advertisers tend to track not only the websites users visited, but also their exact intentions (e.g. to buy a specific product) on these websites. For example, if a user browses for a hotel on hotels.com, the id of the hotel will be transmitted to the tracker. In this case, the tracker cooperates with the website owner to retrieve the hotel corresponding to this id, and therefore knows exactly what the user is interested in, e.g. a hotel in Barcelona, Spain. Even though this information can in some cases also be inferred from the url of the website, retargeting trackers retrieve it with absolute precision. In addition, they can track, for example, a user's purchase intents even in his private zone (e.g. pages which require logging in).

Detailed behaviors: Many companies - examples include Mouseflow (mouseflow.com), ClickTale (www.clicktale.com) and Lucky Orange (www.luckyorange.com) - advertise technologies to track user mouse movements, clicks, scrolls, keystrokes and form fills on web pages and to construct detailed heat maps of these behaviors. Furthermore, a recent study [51] shows that mouse movements can be strongly correlated with eye movements. Mouse behavior tracking can therefore be used to infer user attention (eye gaze) and build user attention-flow patterns on web pages.

2.2 What They Know

The fact that users' activities on the Internet can expose a lot of information about them is not hypothetical or opaque, but real. Tracking data is publicly used by some ad brokers to infer user characteristics such as age, gender, language and interests [52][53]. For example, Google maintains a set of around 1000 categories which cover every aspect of life, such as entertainment, shopping, law, health, finance, etc. Many of them are quite sensitive and detailed, such as "/Finance/Credit & Lending/Loans" or "/People & Society/Family & Relationships/Family/Parenting/Adoption". BlueKai can sort users into 30,000 highly detailed market segments like "light spenders" and "safety-net seniors", according to the New York Times [54]. In general, the company can estimate users' average net worth, political views, interests, spending habits, and more [54][55]. In the New York Times article [54], the author created two different profiles in two different browsers (running on the same computer) by browsing travel and shopping sites, searching for flights, cars and jewelry, and visiting political websites, and then observed the profiles created by BlueKai. What he found is surprisingly detailed and intrusive: one profile is "someone who makes between $60,000 and $74,999 a year, lives in Portland, Me., is interested in luxury cars, celebrities and TV, may have bought a cruise ticket, is an ideal candidate to take out a mortgage and a 'midscale thrift spender'", while the other is "someone who lives in Los Angeles, Long Beach or Santa Ana, runs a large company with more than 5,001 employees and cares about advertising and marketing". According to what is publicly advertised to its customers in the BlueKai Audience Data Marketplace [55], BlueKai possesses data about more than 160 million users who intend to buy a particular product or service in the near term, more than 103 million users who are likely to be interested in a topic or to fall within a lifestyle category, and dozens of millions of users of other types, characterized by business careers, past purchases, age, education level, household

income or presence of children. In addition, the list of about 200 health-related Preference Data segments available on BlueKai's Audience Data Marketplace can be found at [56]. Acxiom, one of the largest data brokers, has, on average, 1,500 pieces of information on more than 200 million Americans [57]. By collecting the products that users purchase, companies can characterize their purchasing habits and, by observing changing patterns in these habits, anticipate important events in their lives, such as pregnancy, divorce or graduation [58]. In a recent report, after investigating 9 major data brokers [20], the FTC notes that these entities are leveraging a growing number of sources to compile ever more specialized customer segments, including some which are very sensitive, such as those related to ethnicity, income, religion, political leanings, age and health conditions. The segments found in the study can be quite specific and intrusive, such as "Urban Scramble", "Mobile Mixers", or health-related topics or conditions such as pregnancy, diabetes and high cholesterol. The report also notes that these data brokers frequently share data with each other, and remarks on the trend of associating online and offline data, as in the case of Datalogix. These details are only the tip of the iceberg, as they are what is publicly published by trackers, excluding what they consider "sensitive information". In some specific cases, the collected data can be far more worrying. For example, Sparapani [57] found that "A Connecticut data broker called 'Statlistics' advertises lists of gay and lesbian adults and 'Response Solutions' – people suffering from bipolar disorder.", that "'Paramount Lists' operates out of this building in Erie, Pa., and offers lists of people with alcohol, sexual and gambling addictions and people desperate to get out of debt.", and that "A Chicago company, 'Exact Data,' is brokering the names of people who had a sexually transmitted disease, as well as lists of people who have purchased adult material and sex toys."

2.3 Identifiability

Since trackers mostly use pseudonymous identifiers that are associated with a web browser, not a person, they often argue that tracking data is anonymous. However, there are quite a few ways in which the data collected by these companies can technically be associated with real users. For example, users often provide their real identities to websites that they trust. These websites, however, can intentionally or unintentionally leak this information to third-party trackers, so that these trackers can map users' pseudonymous identities to their real identities. In addition, the data they collect can be de-anonymized given some prior knowledge about users, which can be collected from public databases. It is sufficient for a tracker to link a pseudonymous identity with a real identity just once to internally de-anonymize the user. Even if users manage to delete a pseudonymous identity (e.g., by deleting cookies), the new pseudonymous identity assigned to them can be linked with the old one in various ways, e.g. by observing the IP address. In the following, we elaborate on the various technical ways in which tracking data can be associated with a user's identity, based on a taxonomy proposed by Narayanan [59].

2.3.1 Pseudonymous Identity Can Be Linked with Real Identity

The third party is also a first party: Many companies that track users on the web, such as Google or Facebook, also have a first-party relationship with users. For example, Facebook provides online social network services that require users to provide their real names in order to use these services. Moreover, Facebook puts its widgets (e.g. the "Like" button) across websites to track users. As the cookies sent from these widgets are the same as those sent to Facebook when users connect to Facebook as a first party, Facebook can link these cookies with user identities such as their real names. Similar examples can be found in the case of Google, Yahoo and many others.

Leakage of identifiers from first party to third party: Several research studies [60][61][62] found that first-party sites, both social and non-social, are actually leaking user-identifiable information, such as social network ids, usernames, email addresses or zip codes, to third-party trackers. For example, some first-party websites include the user's name or email address in page URLs or page titles when the user logs in. As this data is collected by trackers, users' personally identifiable information can be retrieved by parsing it. Since user cookies are included in tracking requests, it is trivial for trackers to map these cookies to users' personally identifiable information.

A first party sells the user's identity: Some first-party companies make a business of collecting users' personal identities and selling them to third parties. For example, eBay was a prime data provider for matching services that match users' offline data to their online cookies. Since eBay was a marketplace operating online, it assigned cookies to users. At the same time, it required users to register with their real names and email addresses before they could place orders on its marketplace. Consequently, with user cookies linked to their identities, eBay could provide services to other companies to sync users' offline data with online cookies. eBay shut down the service in 2011 [63], but plenty of other online companies that require user registration could have the same data, for example large e-commerce or hotel booking sites.

2.3.2 Data De-anonymization

User profiling data in a pseudonymous database can be de-anonymized by matching it with a public, identified dataset. We illustrate this process through a simple example. Assume that, on a certain day, a user visits websites belonging to his hometown newspaper, his university and his company, and that there is a high probability that he is the only one who visited all these websites on that day. Consequently, anyone who knows about these visits can reasonably identify him in an anonymous web history containing these websites for that day. As users' web histories have been shown to be quite unique [64], such an identification technique is likely feasible provided that an adequate amount of identified data is disclosed. Since users increasingly participate and leave public footprints on many websites, e.g. by posting likes and comments, this information can be collected and used to construct a public identified dataset about them. Even though users might use different usernames on different websites, in a significant proportion of cases, usernames belonging to the same user have been shown to be strongly related and linkable [65]. The feasibility of data de-anonymization can be seen in a study [66] in which Narayanan and Shmatikov showed how to de-anonymize the published Netflix database, which contains anonymized movie ratings of 500,000 Netflix subscribers, by matching it with the non-anonymous movie rating database of the Internet Movie Database (IMDb) website. The authors showed that anyone who knows a little bit about a subscriber can identify his record if it is present in the dataset. This conclusion is based on striking statistics from the study: with 8 movie ratings and their dates, 99% of records can be uniquely identified in the dataset; for 68% of the records, even two ratings and dates are sufficient.

2.4 Why We Should Care about Online Privacy

2.4.1 Counter Arguments

Tracking is anonymous: The era in which people were anonymous on the Internet is arguably over. In the previous section, we showed that re-identification techniques are getting increasingly sophisticated, e.g. correlating users' online data with offline data, and are actually employed by certain companies. As users engage more with the Internet in their daily lives and consequently leave more and more footprints in the online world, the possibility that they can be identified by companies keeps growing. Nowadays, it is safest for everyone not to expect that their identity is protected when browsing the Internet.

Nothing to Hide: In addition to the illusion of anonymity, a common argument which pervades privacy discussions is that ordinary people have "nothing to hide" and that, if they do commit bad or illegal acts, they do not have the right to keep them private.

In Britain, for example, millions of public surveillance cameras have been installed by the government in order to monitor its citizens. One of the program's slogans is "If you've got nothing to hide, you've got nothing to fear." Companies also advance this argument to justify their activities. As stated by Eric Schmidt, CEO of Google, "If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place". Various forms of the nothing-to-hide argument appear on websites and forums. They are even supported by many users who declare that "As long as you're not doing anything wrong, you have nothing to worry about". This argument is considered by Schneier as "the most common retort against privacy advocates" [67]. The legal scholar Geoffrey Stone [68] refers to it as an "all-too-common refrain."

Solove [69] refutes this argument, saying that it focuses only on a narrow concept of privacy, viewing privacy as a form of concealment of bad things or secrecy. He argues that a privacy violation does not necessarily mean exposing users' deepest secrets. Instead, the very fact that our data are collected, analyzed and used without our consent, or even knowledge, is a privacy problem in itself. In fact, privacy is an inherent human right and a basic human need [67]. Everyone has a right to keep something, including the most awkward things, private from others. Privacy, as articulated by Warren and Brandeis [70], is fundamentally the "right to be let alone". Ironically, with the nothing-to-hide argument, trackers impose a default assumption that people have "nothing to hide", and require them to prove that privacy invasion causes serious problems in order to earn the right to protect their privacy. The problem is not just that someone has something that he might consider wrong, but that we all have something that someone, in some context, will consider wrong. For example, the fact that one's grandfather is Jewish (or Sunni, or a communist) is not particularly discreditable, but in other times or circles such information could expose him to serious discrimination or even threaten his life. Despite serious objections against the nothing-to-hide argument, many people still think that privacy problems are only abstract concerns. In the next section, we show that privacy violations can potentially, and actually do, cause tangible negative impacts on people's lives from multiple perspectives, rather than simply provoking feelings of unease.

2.4.2 Potential Risks of Privacy Violation

Solove [69] argues that "Privacy is often threatened not by a single egregious act but by the slow accretion of a series of relatively minor acts." Viseu et al. [71] explain that privacy is an abstract concept and that people only become concerned once their privacy is gone. They compare privacy loss to "global warming": people know that global warming is negative, but "the immediate gains of driving the car to work or putting on hairspray outweigh the often invisible losses of polluting the environment." In the following, we discuss several such negative impacts of privacy problems, which can be expected to grow as more and more user data are made available.

2.4.2.1 The risks of abusing user data

Given the various possibilities of privacy leaks discussed in Section 1.3, which are studied in more detail in Chapters 4 and 5, user data is not only collected and used by trackers, but can also be leaked to various other entities, including governments or employers. Collected information, when abused by these entities, can potentially cause physical or mental harm to users. Possible risks include government surveillance and control, service discrimination, price discrimination and bad effects on employment opportunities.

• Government surveillance and control: While government surveillance serves the purpose of detecting and fighting crime and terrorism, it can also misclassify innocent people as being involved in such behaviors. When people are suspected, for example of engaging in a criminal act, they may face deeper government penetration into their private lives, or even be subject to prohibitions, such as being denied boarding a flight. They do nothing wrong, but the government may still cause harm to them. Moreover, many people are not comfortable with government scrutiny, not because of wrongdoing but for political reasons. Government surveillance might therefore impede the work of, for example, human rights workers or journalists, by preventing ordinary people from cooperating with them. In general, such surveillance could inhibit lawful activities such as free speech, free association, and "other rights essential for democracy" [69].

• Service discrimination: Tracking and profiling might reveal that a person has, or potentially has, a certain disease. Such a person might be denied insurance by an insurance company that knows this information. More generally, user profiles can be used not only for marketing purposes but also, for example, for "risk assessment" by financial companies or insurers. For instance, a category like "Biker Enthusiasts" could be used to offer discounts on motorcycles to a consumer, but could also be used by an insurance provider as a sign of risky behavior [20].

• Price discrimination: Sellers and marketers discovered long ago that differential pricing is more beneficial to them than offering the same prices to all users. The barriers to doing so - including computational resource constraints and the lack of user data - have increasingly been overcome by advanced technologies. Increasingly, the prices offered to users can differ significantly based on what they have done in the past. For example, sellers tend to estimate, for each user, his willingness to pay and the likelihood that he will switch to other providers, and accordingly offer a suitable price [72].

• Bad effects on employment opportunities: Many cases have been reported in which people got fired because their employers were not happy with what they had posted on social network pages [73], even when these posts were set to private [74]. Besides these widely known cases, there are various ways in which the revelation of users' private information or behaviors, such as looking for a new job, can cause them harm. Even when such disclosures do not directly lead to a job loss, they may impede promotion opportunities or cause discrimination. In addition, employers increasingly tend to collect information about job candidates before job interviews. Exposing private information may then lead to discrimination among candidates, for example based on their religion, ethnicity, political views, medical conditions and so on.

A common problem in all these practices is that what companies or governments know about users might be partial, which yields distorted pictures of them. The results of their processing can therefore be incorrect, leading to incorrect decisions. An innocent person might be suspected by the government, a customer might be charged more than others even though he is not wealthier than them, or a person might be denied recruitment because of some incorrect knowledge about him. The trouble is that, when users fall into these situations, they often do not know on which data these decisions were based or where the data came from, and therefore have no chance to correct errors or complain. Various harms can thus arise from errors, non-transparency and unaccountability.

2.4.2.2 The risks of personalization

Companies are using tracking data to personalize not only ads but also various other content and services. For example, search engines show people search results relevant to their previous queries and clicks, Facebook composes users' news feeds based on what they liked or commented on, and Netflix recommends films to users based on those they have watched and their ratings. The idea of relevance can be illustrated by a quote from Mark Zuckerberg: "A squirrel dying in your front yard may be more relevant to your interests right now than people dying in Africa." [75] By analyzing user data, companies increasingly provide users with what they think is relevant to them, expecting to increase user engagement and ad views. These personalization services can be problematic if they only provide users with content based on their established views and regular reading habits, instead of content that could challenge and broaden their knowledge. Since personalization is an increasing trend, users might get trapped in a narrowly restricted online universe, or in other words a filter bubble [76][77], where what they clicked on in the past determines what they see next. As Pariser puts it, "you can get stuck in a static, ever-narrowing version of yourself — an endless you-loop."

2.5 Conclusion

Tracking and the related privacy risks might seem vague to many people. In this chapter, we gave a concrete and comprehensive view of these practices. We detailed how tracking takes place and what trackers can possibly know about Internet users. We then analyzed the possible consequences for people if they lose their online privacy.

An understanding of why tracking is problematic can serve many purposes. First, it helps users build the right attitude towards tracking and therefore consciously take measures to protect their privacy and actively contribute to the debates around this topic. Second, it urges companies to pay more attention to user privacy in their business and to adopt privacy-enhancing techniques whenever possible, instead of leveraging the pretext that there is technically no privacy problem. Third, it puts pressure on legislative bodies to work out legislation to regulate tracking behaviors.

Although not discussed in this chapter, we acknowledge that tracking does bring benefits. For example, government surveillance to fight terrorism is important and necessary, collecting users' behaviors might help serve them more appropriate and useful content, and revenue from targeted advertising could foster innovation and free content on the Internet. Our point is that, instead of arguing that privacy problems do not exist (e.g. tracking is anonymous, nothing to hide, etc.), a more acceptable argument is that they do exist, but that the benefits from tracking might, in the short term and in some respects, outweigh some privacy loss. Only when a consensus about the harms and benefits of tracking is established can the process of enhancing online privacy be accelerated.

One possible way for companies to reduce privacy concerns is to enhance transparency in their data collection and use. Even though tracking might serve useful purposes, users need to know exactly what is collected about them, what information companies infer from the collected data, and how they use these data in their business practices. Furthermore, users should be given choices to correct errors or remove sensitive pieces of information that they are not willing to share. In addition, data resulting from tracking should be open to external technical audits to check their conformity with privacy protection standards (e.g. differential privacy, k-anonymity and l-diversity). As a matter of fact, privacy concerns could be significantly reduced if tracking companies showed that they actually have "nothing to hide".

Chapter 3

Related Work

Given the significant concerns about privacy problems in targeted advertising, there has been a considerable body of research studying this topic from different perspectives. Naturally, many researchers try to track the trackers and provide an understanding of their tracking technologies and the prevalence of their tracking behaviors. Simultaneously, scores of tracking protection techniques have been developed with the aim of providing users with more transparency about tracking and options to prevent unwanted tracking practices. Some privacy researchers focus on existing or potential privacy leakages resulting from the use of users' private data, urging the companies involved to make privacy a priority in their business. On the other hand, multiple solutions to strike a balance between protecting user privacy and enhancing the economics of targeting have been put forward. From the technical angle, several designs have been proposed to enable targeting services without tracking users. From the economics angle, some privacy researchers have tried to evaluate the monetary value of user privacy, while others have proposed market mechanisms in which user privacy can be traded as a commodity, providing monetary compensation to users who allow their data to be used by companies.

In this chapter, we summarize these previous works with the aim of providing a comprehensive view of the state of the art in the field and positioning our contributions.

3.1 Tracking the Trackers

Knowledge about trackers' behaviors can enhance users' awareness of how, and to what extent, their data are collected and, at the same time, support policy makers in making meaningful legislative decisions. Some comprehensive surveys about online tracking and privacy can be found in [27][78].

3.1.1 Prevalence of Tracking

Krishnamurthy et al. [79] found that increasing aggregation of user data has been performed by a steadily decreasing number of entities over a long period. According to results from the longitudinal experiments in this study, the penetration of the top-10 third-party servers across a large set of popular websites grew from 40% in October 2005 to 70% in September 2008. Roesner et al. [49] showed that several trackers can capture more than 20% of a user's browsing behavior. Specifically, they point out that DoubleClick can track on average 39%, and at most 66%, of the pages visited by a user. Facebook and Google can track users across an average of 23% and 21% of a user's web browsing history, respectively (45% and 61% in the maximum cases). The Wall Street Journal (WSJ) analyzed the 50 most popular U.S. websites and built an "exposure index" to determine the extent to which each site lets trackers track its visitors. The results showed that many of them expose user visits to at least 50 trackers; the highest number was recorded for dictionary.com, with 234 trackers [50].

Online social network tracking raises significant concerns, as these companies often have first-party relationships with users and can therefore easily correlate tracking data with users' real identities (Section 2.3). As shown in [80], online social network tracking is present on up to 22% of the top Alexa sites; the tracking takes the form of embedded social network content such as Facebook "Like", Google+ "Share" or Twitter "Re-Tweet" buttons. The authors emphasize that this tracking does not require users to log in to their social network accounts or take actions (e.g. click on the buttons), and affects even users who do not have accounts. Another WSJ examination of nearly 1,000 top websites revealed that 75% of them included code from social networks, such as Facebook's "Like" or Twitter's "Tweet" buttons [81]. Another measurement study, covering the prevalence of tracking items such as cookies and web bugs, was presented in [82].

3.1.2 Tracking Techniques

Roesner et al. [49] proposed a taxonomy that classifies trackers by the techniques used to track users. The authors identified 5 types of tracking behaviors, depending on how trackers interact with user browsers:

1. Third-party trackers, e.g. analytics services, track users within sites (tracking identifiers are different for different sites);

2. Third-party trackers use the same identifiers in third-party storage (e.g. cookies) to track users across sites;

3. Cross-site third-party trackers force users to visit their domains directly (e.g. via popups or redirects), thereby putting themselves in the position of first-party entities;

4. Third-party trackers rely on other trackers to leak tracking identifiers to them in order to track users across sites;

5. Cross-site trackers have their own websites or services which users may visit or use directly.

In addition, browser security principles can be breached by trackers with the aim of collecting user information. [83] exposed privacy-violating information flows in JavaScript applications that help third parties steal user cookies, sniff user browsing history and collect user behaviors such as mouse clicks and movements. [84] showed that the "same-origin" principle, which prevents web components from different domains from interacting with each other, is not guaranteed, and that companies can violate this principle and leverage caching information to share persistent identifiers and snoop on a visitor's activities at other sites. Moreover, by observing how visited links are rendered differently by browsers, third parties can query the user's history database.

Tracking by regular cookies is widely known and publicly used by trackers, and can therefore easily be prevented, even with browsers' default mechanisms (such as blocking or frequently deleting third-party cookies). Tracking by supercookies or other persistent storage mechanisms is much more worrying, since the data in these cases is persistent and hard to detect. Soltani et al. [43] found that Flash cookies are a popular mechanism for storing data on the top 100 sites ranked by QuantCast; some of these sites were using Flash cookies to recreate HTTP cookies deleted by users. A follow-up study investigating tracking techniques using ETags and HTML5 storage mechanisms appeared in [85].

In addition, fingerprinting is a new trend in tracking and has attracted much research work. Eckersley [46] studied the uniqueness of web browsers, which might allow trackers to fingerprint users by their browsers. The author observed that, among browsers that support Flash or Java, an average browser carried at least 18.8 bits of identifying information, and 94.2% of them were unique in the studied samples. In addition, [86] found that the HTTP user-agent string alone, or in combination with the IP prefix, can accurately identify up to 80% of client hosts, and that the technique can be effectively used to link user cookies from different sessions (up to 88% of user cookies) even if users manage to clear cookies (e.g. by using private browsing mode). [48] revealed the techniques used by commercial fingerprinting service providers. [47] quantified the prevalence of fingerprinting on the Web by examining suspected font-probing actions.

3.2 Tracking Protection Techniques

A large set of privacy-enhancing tools [35][34][87][36] has been developed to provide users with transparency and control over tracking. Most commonly, these tools apply block lists which contain the patterns of tracking URLs to be blocked. The block lists are built from in-house experiments and/or feedback from the community. In some tools, users can choose a block list from any third-party provider, and flexibly block/unblock each tracker in the list.

Since blocking prevents all communication with trackers, it may also disable useful functionality. For example, Facebook embeds "Like" buttons on many websites to allow users to express their appreciation of the websites' contents. However, the requests sent from these buttons (even when users do not click them) are perceived as tracking requests, since they notify Facebook about the websites users visit. ShareMeNot [49] is designed to prevent this kind of tracking while still preserving the functionality of the embedded content. Its approach is to strip cookies from the requests sent by these components, but to allow these cookies when users actually click on the button. The tool is applicable to a wide range of embedded online social network content, such as the "Retweet" button from Twitter or the "+1" button from Google Plus.

There have been several other approaches to prevent tracking. For instance, NoScript [88] blocks JavaScript, Flash and other plug-ins, only allowing those from trusted domains to be executed. BetterPrivacy [89] helps users view and delete Flash and other supercookies in their browsers. Jackson et al. [84] developed a browser extension to enforce the same-origin policy in browsers and avoid information leakage between different domains. However, the authors argue that, even with a perfect same-origin policy and third-party cookie blocking, cooperating sites can still share user information with each other unless web browsers maintain no long-term user state (e.g. long-term cookies).

Instead of blocking trackers, TrackMeNot [90] obfuscates user profiles by issuing randomized queries to search engines. Similarly, in addition to its traditional blocking function, DoNotTrackMe [87] masks users' personal information, such as email addresses, phone numbers and credit card numbers, when users are required to provide them to websites they interact with. By using DoNotTrackMe, users can still receive emails or phone calls without having to provide their actual email addresses or phone numbers, and can therefore stop receiving them whenever they want.

Some browser extensions [91][92] aim to prevent fingerprinting by spoofing user-agent strings in HTTP requests. However, [48] shows that these extensions are not effective; interestingly, they may even be used as an additional fingerprinting feature. [47] also analyzed vulnerabilities in anti-fingerprinting approaches such as the Tor Browser [93] and Firegloves [94].

Given the popularity of tools for preventing tracking, Leon et al. [28] studied the most common ones and found that these tools contain serious usability flaws that make them challenging for ordinary users to understand and use. These flaws include default settings inappropriate for protecting user privacy, a lack of communication and feedback, confusing interfaces and negative effects on the functioning of website features.

3.3 Privacy Leakage

Leakage from first party to third party: First-party sites, with which users directly interact and to which they provide their information in order to use their services, may leak this information to third-party trackers. Krishnamurthy et al. [60] show that Online Social Networks (OSNs) and their applications leak users' PII, such as user ids, email addresses or zip codes, to third-party aggregators, and that it is possible to link this PII with user actions both within OSN sites and on non-OSN sites. In another study [61], Krishnamurthy et al. showed that non-social first-party sites also frequently leak user information to third-party aggregators. Concretely, 55% of the studied sites directly leak pieces of private information, a figure that increases to 75% if the leakage of site user ids is included. In addition, sensitive search strings on a significant percentage of health-care and travel websites are often leaked to trackers. The Wall Street Journal tested the top sites in the U.S., including those related to sensitive subjects, and found that many of them share users' personal details, including email, name, username, age/birth year, zip code and others, with third-party companies [62]. Interestingly, the Journal's own website, wsj.com, was also leaking users' email, name and age to scores of third-party companies. In response, wsj.com said that most of these data were transmitted in error.

Leakage from ad brokers to advertisers: Ad brokers often build intermediary layers to allow advertisers to personalize ads to users without accessing user data. However, Korolova [16] showed that, in the case of the Facebook ad broker, this is not enough to protect users' private data from being leaked to advertisers. Specifically, she presented a set of attacks that allow advertisers to take advantage of the micro-targeting capabilities offered by the Facebook advertising platform to learn users' private data. Facebook's ad system allows advertisers to target users based on a set of user profile features such as age, gender, workplace, interests and location, whether these are set public or private by the users. An attacker can simply choose a set of public information which likely uniquely identifies a user (e.g. living in France, working at Inria, studied at a university in Vietnam), configure his ad to be shown only to users having this information in addition to a piece of private information he wants to guess (e.g. age from 25 to 35), and then observe the ad report. If this ad is viewed by one user, then the guessed information is likely correct (the victim's age is actually between 25 and 35). If the victim clicks on the ad, the attacker can learn even more information about him beyond his Facebook profile; for example, a click on a dating advertisement may reveal that the user is actively looking for a relationship. Facebook tried to fix this problem by requiring that the number of users targeted by an ad be larger than a minimum number. However, Korolova argued that this is not adequate since, for example, the attacker can create fake accounts to satisfy the minimum number of targeted users.

3.4 Privacy-Preserving Targeted Advertising

As early as 2001, Juels [38] proposed the idea of storing user profiles locally and presented several technical schemes for ad brokers to distribute personalized ads to users without knowing their profiles. These schemes represent different trade-offs between privacy and resource cost, ranging from simple approaches, such as ad servers distributing all ads to users who then locally select those most relevant to their profiles, to much more sophisticated ones based on Private Information Retrieval (PIR) and mix networks. In the latter schemes, ad requests are encrypted with users' public keys and then perturbed and aggregated by mix networks, so that ad servers can deliver ads to users without knowing which ad was actually requested by which user. Juels focused only on ad delivery schemes and did not consider other features of an ad system such as fraud defense, billing and accounting.

Guha et al. proposed Privad [39], a complete privacy-preserving targeted advertising model. In their design, a client software builds the user profile locally, and a dealer acts as an anonymizing proxy mediating the communication between users and ad brokers, while the communication itself is protected from the dealer with the ad brokers' public keys. Privad uses a publish-subscribe mechanism for ad delivery: the ad broker transmits ads to users according to their subscribed coarse-grained information (e.g., a generic interest category such as "sport"), the ads are cached on the user's device, and the local software selects the ads that best match the user profile when encountering an ad box. Since the client in Privad is implemented as an untrusted black box, a reference monitor is needed to gauge the traffic between the client and the network. Ad auctions in Privad can be performed at the client, at the server or at a trusted third party [95].

Adnostic [40] similarly profiles users locally but uses a different approach for ad rendering. In Adnostic, the broker transmits a set of ads based solely on the page that a user is currently visiting, and the local software selects the ads that best match the user profile to display to the user. Ad view reports are encrypted so that the ad broker does not know which ad is actually displayed to the user, while a homomorphic encryption scheme allows the broker to aggregate these reports and correctly charge advertisers. Since Adnostic does not require any trusted third party and allows ad brokers to see users' visited pages as they do in today's systems, it provides weaker privacy protection than Privad, which aims to technically protect every piece of user information.

ObliviAd [41] shares the idea of storing users' profiles on their devices, but shifts the ad brokers' current algorithms into secure co-processors (SCs). SCs are provided by a trusted third party (e.g. IBM) and are tamper-resistant against any external entity, including ad brokers. To request ads, the user sends his profile, in the form of keywords, to the SC through a secure connection (e.g., TLS). The SC subsequently selects and serves ads from the broker's database using these keywords. ObliviAd implements a PIR scheme over an ORAM structure to hide all the database access patterns of the SC from the broker. Each ad is associated with an encrypted token containing billing information which can be decrypted only by the SC. The ad broker collects these tokens when ads are displayed, periodically sends them to the SC and gets back aggregated results which can be used to charge advertisers.

Hardt et al.
[96] formalized a framework for personalized ad delivery taking into account three parameters - user privacy, communication complexity and ad relevance - and proposed algorithms to optimize ad relevance under constraints on user privacy and communication complexity. In addition, the authors developed a privacy-preserving protocol for aggregators (e.g. an ad broker) to gather user statistics (e.g. ad click-through rates) in a differentially-private manner, which remains efficient even when a large proportion of participants dynamically change or become malicious.

RePriv [97] proposes a client-side framework that lets content providers inject their miners (in the form of embedded code) to run on the user's terminal. These miners collect user information according to the specific purposes of each provider, and the provided content is then customized accordingly. RePriv is aimed at content providers in general, which are not necessarily related to advertising. However, this approach can be integrated into privacy-preserving targeted advertising schemes that opt to store user profiles locally, for example for building and maintaining local user profiles.

3.5 Economics of Privacy

3.5.1 Value of User Privacy

Understanding the value of privacy is of great importance for companies, privacy advocates and policy makers in finding solutions to privacy problems. In order to evaluate the cost of privacy, Hann et al. [98] considered four dimensions of privacy concerns - collection, error, secondary use and improper access - and showed that websites need to offer substantial monetary incentives to overcome these concerns. For example, protection against the secondary use of personal information is worth between $40 and $50.

Many studies estimated the value that users attach to their personal data through sealed-bid auctions in which users sell their data to interested parties. For example, Danezis et al. [99] found that the bids users made for their location data to be used by third-party companies had a median of around 10 pounds. The median bid increased to 20 pounds when users were informed that their data might be used for commercial purposes. A larger study (over 1200 people from five EU countries) [100] reaffirmed a median value between 10 and 20 pounds. Carrascal et al. [101] let users value their private data in real time through pop-up questions during their web browsing. Their results show that, on average, users value the disclosure of their presence on a website at €7.

In a different approach, Beresford et al. [102] studied people's unwillingness to pay for privacy. In their experiments, participants were given the choice of buying a CD from one of two stores which were identical except that one required more sensitive personal data, such as monthly income or date of birth, than the other. With a one-euro discount, almost all participants chose to buy from the vendor requiring more sensitive data. Surprisingly, when the two vendors offered the same price, the buyers bought from both shops equally. In a quick survey after the experiments, most participants claimed that they had a strong interest in protecting their privacy.

The two common concepts for understanding how users value their privacy are Willingness To Pay - WTP (the monetary amount users are willing to pay to protect their privacy) and Willingness To Accept - WTA (the compensation that users are willing to accept for their privacy loss). Krasnova et al. [103] found that, on average, a user would be ready to pay between €14.14 and €17.24 a year if social network providers refrained from using his or her demographic information for personalized advertising. Another experiment [104] indicated that users would pay €9.45 on average to migrate their Facebook profile information to Google Plus in case Facebook shut down its social network service and deleted all data. Acquisti et al. [105] studied users' WTA and WTP through user choices between anonymous and traceable gift cards with different values, and suggested that the way privacy choices are framed may affect users' decisions. Gideon et al. [106] found that when privacy information is made readily available (e.g., displayed alongside websites on a search engine), users may be willing to pay a premium for increased privacy protection (e.g., buying products at higher prices from more privacy-friendly websites). Even though WTA and WTP differ for different types of personal data, there are wide gaps between them - WTA tends to be higher in all cases [107].

In fact, the value that users attach to their private information might vary depending on various conditions.
For example, users tend to put a low value on personal information such as age and weight if the values are typical or if they convey a 'positive' image of the user (e.g. normal weight) [108]. [109] suggests an impact of demographic characteristics; for example, people with lower personal income are likely to be less concerned about privacy than those with higher income. In addition, users' valuation of privacy can differ by country and gender (e.g., women are possibly more privacy-sensitive than men) [100].

Although most studies in the literature estimated the value of user privacy from the user perspective, this approach has certain limits. First, users often lack enough information to make their privacy decisions, and even with sufficient information, they tend to trade off privacy for short-term benefits [109]. Moreover, the emotional aspect of privacy decisions makes it difficult to evaluate user privacy; for instance, users' answers to privacy surveys are significantly affected by the wording of the survey questions [110]. Finally, due to the limited size of the experimental samples, surveys conducted with different groups of users might lead to contradictory results. For example, [99] suggests that users who travel more often value their location data more than others, while [100] found that travel frequency does not affect users' valuation of their location privacy.

The limits of these approaches can be complemented by studies from the advertiser perspective. Advertising companies actually make monetary profits from using users' personal data, and can therefore have a practical view of the value of these data. The work presented in the Financial Times [111] provided an analysis of industry pricing data from a range of sources in the US. The authors showed that general personal information, such as age, gender and location, is worth a mere $0.0005. A person with a specific intent, e.g. buying a car, is worth more, at about $0.0021. However, the data sources and methodology of the study are not published.

3.5.2 User Privacy as A Commodity

Instead of preventing trackers from collecting user data, some researchers have proposed the idea of a privacy market in which these companies pay users for the use of their data. The argument in these proposals is that economic incentives will increase users' adoption of and engagement with these models. Ideas about a privacy market were discussed at a high level in [112]. Later on, Ghosh et al. [113] proposed a formal framework for aggregators to buy private data from multiple users at auction. The authors proposed auction mechanisms in which sellers (users) are encouraged to report the true valuation of their privacy, and designed optimized algorithms for data analysts to reach their goals, for example maximizing accuracy within a limited budget or minimizing payments while achieving a given accuracy goal.

Riederer et al. [114] proposed the concept of "Transactional Privacy", in which users sell information about their browsing activities to aggregators on a personal information market. Each time a user visits a website, he chooses whether or not to sell information about this visit on the market through a trusted third party. Given the user's consent to sell his information, the trusted third party makes it available at auctions where aggregators can bid to gain access to it. In this model, aggregators receive raw data (e.g. web page URLs), and can therefore flexibly process these data according to their specific purposes. The goal of this model is to keep user data under users' control, incentivize them to share data and provide maximum utility to data aggregators.

Part I

Privacy Leaks in Targeted Advertising


Chapter 4

Privacy Leaks in Targeted Ads Delivery

In this chapter, we show that targeted ads, which are often sent in the clear, expose users' private data to any entity that has access to a small portion of these ads. More specifically, we show that an adversary who has access to a user's targeted ads can retrieve a large part of his interest profile. This constitutes a privacy breach because, as illustrated in Section 4.2, interest profiles often contain private and sensitive information. Specifically, we describe an attack that allows any entity with access to a user's targeted ads to infer that user's interests, recovering a significant part of his interest profile. Our experiments with the Google Display Network [115] demonstrate that, by analyzing a small number of targeted ads, an adversary can correctly infer a user's Google interest categories with a probability as high as 79% and retrieve as much as 58% of his Google Ads profile. The attack described in this chapter is practical and easy to perform, since it only requires the adversary to eavesdrop on a network for a short period of time and collect a limited number of served ads.

4.1 Motivation

This work was largely motivated by Cory Doctorow's short story "Scroogled", which starts as follows [116]:

Greg landed at San Francisco International Airport at 8 p.m... The officer stared at his screen, tapping...
- "Tell me about your hobbies. Are you into model rocketry?"
- "What?"
- "Model rocketry."
- "No," Greg said, "No, I'm not."
- "You see, I ask because I see a heavy spike in ads for rocketry supplies showing up alongside your search results and Google mail."
- "You're looking at my searches and e-mail?"

- "Sir, calm down, please. No, I am not looking at your searches,... That would be unconstitutional. We see only the ads that show up when you read your mail and do your searching. I have a brochure explaining it ..."

The main goal of this chapter is to study whether such a scenario would be possible today, and whether one can infer a user's interests from his targeted ads. More specifically, we aim at quantifying how much of a user's interest profile is exposed by his targeted ads. However, as opposed to the above story, we do not consider ads that show up when a user reads his email or uses a search engine; these ads are often contextual, i.e. targeted to email contents or search queries. Instead, we consider targeted ads that are served on websites when a user is browsing the web.

The crux of the problem is that, even if some websites use secure connections such as SSL (Secure Sockets Layer), ads are almost always served in the clear. For example, Google currently does not provide any option to serve ads over SSL1 [117]. We acknowledge that in some scenarios the adversary can recover a user's profile directly from the websites he visits, i.e. without considering targeted ads. However, we show in this chapter that targeted ads can often improve the accuracy of the recovered profiles and reduce the recovery time. Furthermore, in some circumstances, the victim has different browsing behaviors depending on his environment. For example, a user at work mostly visits websites related to his professional activity, while at home he visits websites related to his personal interests. We show in this chapter that an adversary, such as an employer, who can eavesdrop on the victim's computer or network while at work can infer information about his "private" and personal interest profile. In other words, targeted ads constitute a covert channel that can leak private information.

Although there are various targeted advertising networks today, this work focuses on the Google advertising system, which is "the most prevalent tracker" according to a survey by The Wall Street Journal [118]. However, our methodology is general enough to be extended to other ad networks. The issue of generality is discussed in Section 4.3.1.1.

4.2 Targeted Advertising: The Case of Google

The Google Display Network is a network of websites (also called publishers) that serve Google ads. Google receives ads from advertisers and then selects the appropriate publishers using various criteria such as relevant content, bid price and revenue.

In the Google targeted advertising model, Google Display Network sites are also used to track users as they browse the Internet. Each time a user visits a website that contains Google ads, i.e. a website that belongs to the Google Display Network, he sends his DoubleClick2 cookie to Google, along with information about the visited website.

1We verified this feature by browsing through several https websites (e.g. https://www.nytimes.com/).

As a result, Google collects all the sites within the Google Display Network that have been visited by a user and builds an interest profile from them. A Google profile is defined as a set of categories and sub-categories (see Figure 4.1). For example, if a user visits a football site several times, Google may assign him the category Sport, or more specifically the subcategory Sport → Football. In addition, a Google profile may include location information and some demographic data such as the gender and age of the user. These profiles are then used to target ads to users. A user can access and modify his Google Ads Preferences on the webpage http://www.google.com/ads/preferences [119]. Furthermore, a user can opt out of Google targeted advertising if he no longer wants to receive targeted ads.

Figure 4.1: An Example of a Google Ads Preferences Page.

Figure 4.1 displays an example of a Google user profile that contains potentially private and sensitive information. For example, the "Job listing" category indicates that the user is probably looking for a job. The user would probably want to keep this information secret, in particular from his current employer. Furthermore, the category "Dating & Personals" indicates that the user is currently actively looking for a relationship, and the subcategories "Baby names" and "Adoption" indicate that he has recently been visiting websites related to baby adoption.

In its privacy policy, Google states that "We take appropriate security measures to protect against unauthorized access to or unauthorized alteration, disclosure or destruction of data", and that "We restrict access to personal information to Google employees, contractors and agents who need to know that information in order to process it on our behalf" [120]. Nevertheless, in this chapter we show that a portion of users' personal profiles can be leaked through targeted ads. Even if Google does not consider users' interests as "personal information", this data, which relates to users' online activities, can be very private from a user's perspective.

2In order to keep track of users visiting the Google Display Network, Google uses the DoubleClick cookie issued from the doubleclick.net domain, which belongs to Google.

4.3 Reconstructing User Profiles from Targeted Ads

As targeted ads are personalized to each user based on his profile, they can reveal a lot of information about the user's interests. This section describes how an adversary who has access to a user's ads can derive part of his interests from them.


Figure 4.2: Filtering targeted ads and inferring user interests.

As shown in Figure 4.2, our approach is composed of two main phases. The first phase collects all the ads served to a target user and filters them to retain only targeted ones. In the second phase, targeted ads are classified into categories and a profile is reconstructed. These two phases are detailed in the rest of this section. However, we first present two building blocks used by both phases: an algorithm used to categorize web pages, and a set of methods to compare categories.

4.3.1 Building Blocks

4.3.1.1 Web Page Categorization

The web page categorization tool is an algorithm that derives interest categories from a web page. This algorithm relies on the tool used by Google to generate Google Ads Preferences pages. As described in the previous section, each time a user visits a web page, Google derives some interests in the form of categories and updates the user's Google Ads Preferences page accordingly.

Ad Url | Categories
http://www.elasticsteel.net?ID=156 | Beauty & Fitness → Fitness
http://www.livecarhire.com | Travel → Car Rental & Taxi Services
http://www.terracebeachresort.ca | Travel → Hotels & Accommodations; Travel → Tourist Destinations → Beaches & Islands
http://www.sanibelbayfronthouse.com | Arts & Entertainment → Entertainment Industry → Film & TV Industry → Film & TV Production; Real Estate → Timeshares & Vacation Properties; Travel → Tourist Destinations → Zoos-Aquariums-Preserves
http://www.siestakeyaccommodation.com | Real Estate → Timeshares & Vacation Properties; Travel

Table 4.1: Ad page categorization example

We use this tool to build our page categorization algorithm. Given a webpage W, our algorithm operates as follows:

1. W is visited and the resulting DoubleClick cookie is saved.

2. A request is made to the Google Ads Preferences page with the previously saved cookie. Note that Google does not always update the Google Ads Preferences page upon a single web page visit; usually, a web page has to be visited multiple times before the Google Ads Preferences page is updated. Furthermore, we noticed that a user's Ads Preferences are updated only after a delay ranging between 1 and 2 minutes. Therefore, this step is repeated 5 times (a heuristic value), once every 2 minutes.

3. The Google Ads Preferences page is parsed and the corresponding categories are retrieved.
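For illustration, this procedure can be sketched in Python as follows. It is a minimal sketch rather than the implementation used in our experiments; the helpers visit_and_get_cookie, fetch_page and parse_categories (visiting a page while recording the DoubleClick cookie, fetching a URL with that cookie, and extracting the category list from the Ads Preferences HTML) are hypothetical placeholders.

import time

ADS_PREFERENCES_URL = "http://www.google.com/ads/preferences"

def categorize_page(url, visit_and_get_cookie, fetch_page, parse_categories,
                    attempts=5, delay_seconds=120):
    # Step 1: visit W and save the DoubleClick cookie set during the visit.
    cookie = visit_and_get_cookie(url)
    categories = set()
    # Steps 2-3: poll the Ads Preferences page, since the profile is only
    # updated after a 1-2 minute delay and possibly after repeated visits.
    for _ in range(attempts):
        html = fetch_page(ADS_PREFERENCES_URL, cookie=cookie)
        categories |= set(parse_categories(html))
        time.sleep(delay_seconds)
    return categories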

To evaluate the performance of our approach, we scraped 5000 ads and 2000 sites from the Google Display Network, classified them with the tool, and reviewed the results. We found that almost all of these pages could be categorized by Google (more than 90%). We also manually reviewed the categorization results and observed that, although there are some irrelevant categories, the categorization generally reflects the content of each page. Table 4.1 presents several examples of ad page categorization.

It should be noted that relying on Google does not reduce the generality of our method. There exist many efficient techniques to categorize the content of web pages; for example, [40] uses cosine similarity. This method is very efficient since it relies on social/crowd data (folksonomy) which is continuously updated, and it is appropriate for fine-grained categorization. We implemented this method and compared its performance with the Google-based categorization approach described above. The obtained results were quite similar, with more than 60% of the categories overlapping. We therefore believe that our work can be extended to other ad networks, such as Yahoo! or Microsoft, either by applying their own categorization, or simply by using an existing web page categorization technique. Note that Yahoo! and Microsoft also build behavior-based user interest profiles and, similarly to Google, personalize ads to users according to their interests [121][53].

4.3.1.2 Category Comparison Methods

Many of the filtering and evaluation algorithms presented in this chapter need to compare categories. We use three methods for this purpose: “Same category”, “Same parent” and “Same root”:

1. Same category: Two categories are considered equivalent in the “Same category” method if they match exactly.

2. Same parent: Two categories are considered equivalent in the “Same parent” method if they have the same parent category. For example, the two categories “Arts & Entertainment → Entertainment Industry → Film & TV Industry → Film & TV Awards" and “Arts & Entertainment → Entertainment Industry → Film & TV Industry → Film & TV Production" have the same parent category “Film & TV Industry", so they are considered equivalent to each other in the “Same parent" method.

3. Same root: Two categories with the same root category are considered equivalent in the "Same root" method. For example, the two categories "Arts & Entertainment → Entertainment Industry → Recording Industry → Music Awards" and "Arts & Entertainment → Movies → Action & Adventure Films → Superhero Films" have the same root category "Arts & Entertainment" and are therefore equivalent to each other in the "Same root" method. Obviously, if two categories are equivalent in the "Same parent" method, they are also equivalent in the "Same root" method.
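As an illustration, the three comparison methods could be implemented as follows, assuming each category is represented as a string whose levels are separated by "→" (as in Table 4.1). This sketch reflects one reasonable reading of the definitions above rather than code used in our experiments.

def path(category):
    # Split "Arts & Entertainment → Movies → Superhero Films" into its levels.
    return [level.strip() for level in category.split("→")]

def same_category(a, b):
    return path(a) == path(b)

def same_parent(a, b):
    # Equivalent if both categories have a parent and these parents are identical.
    pa, pb = path(a), path(b)
    return len(pa) > 1 and len(pb) > 1 and pa[:-1] == pb[:-1]

def same_root(a, b):
    # Equivalent if the top-level (root) category is identical.
    return path(a)[0] == path(b)[0]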

4.3.2 Extracting Targeted Ads

Ads provided by Google are either location-based, content-based (hereafter called contextual, i.e. related to the visited page's content), generic, or profile-based (hereafter called targeted, i.e. customized to the user's profile). In this chapter, we only consider targeted ads. We therefore need to filter out location-based, content-based and generic ads (see Figure 4.3).


Figure 4.3: Filtering targeted ads.

Reconstructed profile
Beauty & Fitness → Fitness
Travel
Travel → Car Rental & Taxi Services
Travel → Hotels & Accommodations
Travel → Tourist Destinations → Beaches & Islands
Arts & Entertainment → Entertainment Industry → Film & TV Industry → Film & TV Production
Real Estate → Timeshares & Vacation Properties

Table 4.2: Profile reconstruction example

We conducted all experiments with users from the same location; as a result, the location-based filter is not used (and therefore not presented here). Furthermore, we consider that an ad is contextual if it shares at least one category with its displaying page (the page on which the ad is delivered). To filter out contextual ads, we therefore categorize, using the categorization technique described in Section 4.3.1.1, each ad and its displaying page. If at least one category is in common, the ad is classified as contextual. To filter out generic (i.e. not customized) ads, we create a number of non-overlapping user profiles (i.e. profiles without any categories in common) and perform 10 requests to the tested pages3. Ads that are served independently of the requesting profile are then deemed generic and filtered out.
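To make the two filters concrete, a simplified sketch of this step is given below. The data representation and the interpretation of "served independently of the requesting profile" (approximated here as ads seen by every probe profile) are illustrative assumptions, not the exact implementation used in the experiments.

def is_contextual(ad_categories, page_categories):
    # An ad is contextual if it shares at least one category with the
    # page on which it was displayed.
    return bool(set(ad_categories) & set(page_categories))

def generic_ads(ads_per_profile):
    # ads_per_profile maps each non-overlapping probe profile to the set of
    # ad URLs it received; ads served to all probe profiles are deemed generic.
    profiles = list(ads_per_profile.values())
    if not profiles:
        return set()
    generic = set(profiles[0])
    for ads in profiles[1:]:
        generic &= set(ads)
    return generic

def extract_targeted(observed_ads, page_categories, categorize, generic):
    # Keep ads that are neither generic nor contextual; `categorize` is the
    # page categorization function of Section 4.3.1.1.
    targeted = []
    for ad_url in observed_ads:
        if ad_url in generic:
            continue
        if is_contextual(categorize(ad_url), page_categories):
            continue
        targeted.append(ad_url)
    return targeted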

4.3.3 User-Profile Reconstruction

Given the targeted ads from the previous step, there are possibly many approaches to infer user information. In our work, we aim at reconstructing the Google-assigned interest categories which are presented as user profiles. In order to reconstruct a user profile, we categorize all of his targeted ads using our Google-based web page categorization tool. The reconstructed profile is then the set of resulting Google categories. For example, considering the ads provided in Table 4.1, the reconstructed profile looks as shown in Table 4.2.
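A minimal sketch of this reconstruction step is given below; it reuses the hypothetical categorize(url) helper and simply takes the union of the categories of all targeted ads.

def reconstruct_profile(targeted_ads, categorize) -> set:
    # The reconstructed profile is the union of the categories of all targeted ads.
    profile = set()
    for ad_url in targeted_ads:
        profile |= categorize(ad_url)
    return profile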

³Ten requests are considered enough to obtain a sufficient ad collection while remaining robust to ad churn [122].

4.4 Evaluation

In this section, we evaluate the performance of our profile reconstructing technique.

4.4.1 Experiment Setup

Figure 4.4 illustrates the setup of our experiments. Our experiments are composed of two main phases:

[Figure 4.4 overview: the training websites w1, w2, ..., wX are browsed to build the Google profile Pt, retrieved from the Google Ads Preferences page; the visited websites s1, s2, ... feed the ad collecting, ad filtering and profile reconstruction steps, which produce the reconstructed profile Pr that is then compared with Pt.]

Figure 4.4: Experiment setup

Profile creation: In this phase, we create a set of profiles corresponding to different web users. Each of these profiles, that we call targeted profiles, Pt, is obtained by visiting several websites from a user’s real web-history (i.e. list of websites that the user has visited). We refer to these websites as training sites. Each of them is visited 15 times to make sure it really affects profiles. We then retrieve the generated Google profile from the Google Ads Preferences page (this phase corresponds to the lower part of figure 4.4).

Profile reconstruction: In this phase, for each targeted profile Pt created as described above, we visit another set of websites, hereafter referred to as visited websites. As opposed to the training sites, each visited site is only visited once. The ads are then collected, filtered, and the profile reconstructed as described in Section 4.3. We refer to the set of profiles we obtain as reconstructed profiles, Pr (this phase corresponds to the upper part of figure 4.4).

4.4.2 Evaluation Methodology

Dataset: Our target web-history data comes from a set of 40 volunteers who provided the list of websites they visited over two days. The first X websites in each history were used as the training sites to create Pt. The following Y websites were used to build the reconstructed profiles, Pr, as shown in Figure 4.5.

In the presented experiments, X was set to 30 and different values of Y were used. The average number of root categories and categories in a targeted profile from X websites is displayed in Table 4.3.

         # of root categories    # of categories
X = 30          6.64                 18.06

Table 4.3: Profile size statistics

[Figure 4.5 overview: browsing the X training sites yields the targeted profile Pt (profile creation); browsing the Y visited sites yields the reconstructed profile Pr (profile reconstruction).]

Figure 4.5: Profile creation and reconstruction.

Performance evaluation metrics: To evaluate the results, we compare each reconstructed profile with the corresponding original one, using the “Same category”, “Same parent” and “Same root” methods described in Section 4.3.1.2. We evaluate the performance of our technique by computing the average Precision, Recall and F-Measure values over all reconstructed profiles. Precision is the fraction of rebuilt categories that are correct, while Recall is the fraction of original categories that are correctly rebuilt. F-Measure is the harmonic mean of Precision and Recall, defined as F = 2 · Precision · Recall / (Precision + Recall). In other words, if we denote by Pr,c the categories of the reconstructed profile Pr that are correct, and by Pr,i the categories of Pr that are incorrect, then Precision = |Pr,c| / |Pr| = |Pr,c| / (|Pr,c| + |Pr,i|) and Recall = |Pr,c| / |Pt| (see Figure 4.6).
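For concreteness, the following minimal sketch computes these metrics for one reconstructed profile, with category equivalence given by any of the comparison methods of Section 4.3.1.2 (e.g. the same_root function sketched earlier); the function name is ours.

def precision_recall_f(reconstructed: set, original: set, equivalent) -> tuple:
    # 'equivalent' is one of the comparison methods, e.g. same_root.
    correct = {c for c in reconstructed
               if any(equivalent(c, t) for t in original)}            # Pr,c
    precision = len(correct) / len(reconstructed) if reconstructed else 0.0
    recall = len(correct) / len(original) if original else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f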

[Figure 4.6: the reconstructed profile Pr is split into its correct part Pr,c, which overlaps the original profile Pt, and its incorrect part Pr,i.]

Figure 4.6: Illustration of Precision and Recall.

Adversary strategies: In order to evaluate the performance gain obtained by using targeted ads as opposed to only using visited websites, we consider the following three strategies:

- the adversary only uses visited websites (“Sites only”).

- the adversary only uses targeted ads (“Ads only”).

- the adversary uses both visited websites and targeted ads (“Ads & Sites”).

Tested scenarios: Finally, we consider two different scenarios, corresponding to two different browsing behaviors:

1. HotSpot Scenario: This scenario corresponds to the case where the victim connects to an open network and browses the Internet according to his interests. In this scenario, the X training sites and the Y visited sites are related to each other, i.e. generated from the same interest profile. The goal of this scenario is to show that targeted ads can be used to boost the accuracy of profile reconstruction.

2. Workplace Scenario: This scenario corresponds to the case where the victim changes his browsing behavior during the reconstruction phase. In other words, the profiles used to generate the training sites and the visited sites are significantly different. This situation happens, for example, when the victim moves from his home to his work environment. The goal of this scenario is to study how much of the home profile leaks from the targeted ads shown at work.

In the following, we describe, for the workplace scenario, how we select the visited websites so that they are largely separated from a user's interests. We first randomly select a set of Google root categories, namely “Autos & Vehicles”, “Law & Government”, “Books & Literature”, “Beauty & Fitness”, “Jobs & Education” and “Business & Industrial”. For each of these categories, we then retrieve 500 websites using the Google Adwords Placement Tool [123], which helps advertisers select websites on which to publish their ads. For each user, we then take all of his root categories and select a root category C that does not belong to them. The user's visited sites are then randomly selected from the 500 websites corresponding to category C, as sketched below. For example, if a profile contains the 4 root categories “Law & Government”, “Books & Literature”, “Beauty & Fitness” and “Jobs & Education”, then one of the remaining categories, “Autos & Vehicles” or “Business & Industrial”, is chosen for the visited websites. We verified that none of our test profiles contains all six visited categories. Note that a website classified in a Google category by the Google Adwords Placement Tool may be assigned another category by Google Ads Preferences. For instance, Google may assign a website W to category “Arts & Entertainment”; however, when categorizing this website through Google Ads Preferences, the result may include, in addition to “Arts & Entertainment”, another root category, say “Books & Literature”. Therefore, we cannot completely guarantee that the visited websites are totally separated from the training ones.
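The sketch below illustrates this selection procedure; the data structures (a set of the user's root categories and a mapping from each of the six pre-selected root categories to its roughly 500 candidate websites) are assumptions made for illustration.

import random

def select_visited_sites(user_root_categories: set, sites_per_category: dict, y: int) -> list:
    # sites_per_category maps each of the six pre-selected root categories to
    # the ~500 candidate websites obtained for it from the Placement Tool.
    candidates = [c for c in sites_per_category if c not in user_root_categories]
    chosen = random.choice(candidates)      # a root category the user does not have
    return random.sample(sites_per_category[chosen], y)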

4.4.3 Result Analysis

Tables 4.4, 4.5, 4.6 and 4.7 present the Precision, Recall and F-Measure values (in percent) achieved with (X = 30, Y = 10) and (X = 30, Y = 15) for the hotspot and workplace scenarios respectively. The rows of these tables specify the category comparison method used to filter out contextual ads⁴. The same comparison method is also used to evaluate the results, i.e. to compare the reconstructed profiles with the original ones⁵. We remind the reader that these comparison methods are described in Section 4.3.1.2. The columns of the tables specify the three cases of profile reconstruction: “Sites only”, “Ads only” and “Ads & Sites”. The tables show that the ads-based information leakage is significant, with precision values ranging from 73 to 82% when reconstructed profiles are evaluated at the level of root categories using ads alone. For example, in case (X = 30, Y = 15) of the workplace scenario, with “Ads only” and the “Same root” comparison method (used for both the filtering and evaluation processes), we achieve Precision, Recall and F-Measure of more than 79%, 58% and 67% respectively (Table 4.7). The targeted ads we observed account on average for approximately 30% of all collected ads in each case. We note that the “Same category” row generally shows lower precision and recall values than the “Same parent” and “Same root” rows.

              Av. # of    Sites only           Ads only             Ads & Sites
              targ. ads   Prec./Recall/F       Prec./Recall/F       Prec./Recall/F
Same cat.       14.29     19.66/7.6/10.96      18.04/7.06/10.15     18.3/14/15.86
Same parent     10.94     58.25/29/38.72       53.67/19.38/28.48    55.98/42.29/48.18
Same root        9.24     79.26/51.44/62.39    73.08/30.06/42.6     79.6/68.33/73.54

Table 4.4: Reconstructing Google profiles performance in Hotspot scenario (X = 30 and Y = 10)

              Av. # of    Sites only           Ads only             Ads & Sites
              targ. ads   Prec./Recall/F       Prec./Recall/F       Prec./Recall/F
Same cat.       21.53     19.67/10.28/13.50    15.71/8.47/11.01     17.07/17.66/17.36
Same parent     16.67     54.46/34.44/42.2     51.26/23.54/32.26    52.73/50.16/51.41
Same root       14.4      75.57/61.13/67.59    82.24/40.3/54.09     78.5/80.52/79.5

Table 4.5: Reconstructing Google profiles performance in Hotspot scenario (X = 30 and Y = 15)

              Av. # of    Sites only           Ads only             Ads & Sites
              targ. ads   Prec./Recall/F       Prec./Recall/F       Prec./Recall/F
Same cat.       19.23     2.92/1.05/1.54       14.73/10.28/12.11    11.84/10.94/11.37
Same parent     13.6      9.09/3.99/5.55       46.31/30.31/36.64    34.39/31.56/32.91
Same root       11.2      12.65/6.43/8.53      78.07/53.96/63.81    56.49/55.94/56.21

Table 4.6: Reconstructing Google profiles performance in Workplace scenario (X = 30 and Y = 10)

              Av. # of    Sites only           Ads only             Ads & Sites
              targ. ads   Prec./Recall/F       Prec./Recall/F       Prec./Recall/F
Same cat.       28.11     2.99/1.31/1.82       13.44/11.95/12.65    10.89/12.62/11.69
Same parent     20.3      9.13/5.06/6.51       44.95/33.8/38.59     32.75/35.45/34.05
Same root       17.13     14/8.61/10.66        79.37/58.12/67.10    55.85/60.1/57.9

Table 4.7: Reconstructing Google profiles performance in Workplace scenario (X = 30 and Y = 15)

⁴For example, the row “Same parent” displays results when ads are considered contextual if they share the same parent category with the pages that display them.
⁵For evaluation, e.g. the “Same parent” method deems two categories identical if they share the same parent category.

Figures 4.7 and 4.8 display the variation of Precision, Recall and F-Measure when varying the number Y of visited websites for each targeted profile, for the different comparison methods. We observe that, for a given profile (i.e. when X, and therefore |Pt|, are fixed), the recall increases noticeably with Y, the number of visited websites, while the precision remains steady. This shows that the number of correctly reconstructed categories, i.e. |Pr,c|, increases with Y. This result is expected since, as Y increases, the number of collected ads also increases and thus more information is available. However, for a given X, the precision is not notably affected by Y, which means that the number of incorrectly reconstructed categories, i.e. |Pr,i|, also increases with Y.

In the hotspot scenario, the visited websites are largely relevant to the training websites; therefore, reconstructing profiles from “Sites only” achieves results as good as, if not better than, those obtained from “Ads only” (see Figure 4.7). However, when we combine both sites and ads in the reconstruction process, we get nearly the same Precision while increasing Recall remarkably (almost the sum of the two cases “Sites only” and “Ads only”). In this scenario, ads are very useful to boost performance since they allow the recovery of a larger part of the targeted profiles.

In the workplace scenario, the visited websites are considerably separated from the training websites. Therefore, reconstructing profiles from “Sites only” leads to very poor results, whereas the “Ads only” technique achieves significantly better results (see Figure 4.8). By combining sites and ads, we slightly increase the Recall while reducing the Precision. In this scenario, we observe that ads do indeed constitute a “hidden” channel that leaks private information about users.

While the performance of our method when evaluating the recovery of “root” and “parent” categories is notably high, we acknowledge that it can only recover a small proportion of a user’s actual categories (Precision varies between 10 and 18% when using the “Same category” method for evaluation, and Recall ranges from 10 to 17%).

Figure 4.7: Precision, Recall and F-Measure with the “Same category”, “Same parent” and “Same root” comparison methods (from left to right respectively) used in both the filtering and evaluation processes (hotspot scenario with X = 30).

Figure 4.8: Precision, Recall and F-Measure with the “Same category”, “Same parent” and “Same root” comparison methods (from left to right respectively) used in both the filtering and evaluation processes (workplace scenario with X = 30).

We believe there are several explanations for this result. First, Google might not be using the user's actual categories to deliver ads, and might instead use the root or parent categories for a broader coverage of users' interests. Furthermore, our ad classification method is probably not optimal and could be improved in many ways. In particular, we took a simple and conservative approach in the filtering step by ignoring location-based ads. In addition, we did not consider remarketing ads which, as discussed in Section 4.5, may contain additional information about the user's recent online activities. However, recovering even only 10 to 15% of a user's interest categories can constitute a severe privacy breach. To illustrate this statement, we ran our technique on the profile shown in Figure 4.1 and, using targeted ads only, recovered the profile shown in Figure 4.9. Among the recovered categories, “Online Communities → Dating & Personals” may constitute a private piece of information that a user might not be willing to share.

Figure 4.9: Reconstructed Profile.

4.5 Discussion

Countermeasures. To protect against this information leakage, the easiest solutions today are to simply opt out of targeted advertising, frequently delete cookies or use ad-blocking software. Initiatives such as the NAI (Network Advertising Initiative) [124], DNT (Do Not Track) [125] or TPLs (Tracking Protection Lists) [37], which aim to provide users with tools to restrict tracking and/or behavioral advertising, could also mitigate the identified privacy threat. However, these solutions often prevent advertising companies from targeting ads or even from serving ads to users. Several countermeasures could be used to keep targeting ads to users while mitigating the information leakage identified in this chapter. In particular, there are ongoing efforts to design and deploy privacy-preserving ad systems (e.g. Privad [39] and Adnostic [40]) whose main principle is to select ads locally. These solutions make the eavesdropping and filtering of targeted ads, and therefore our inference attack, much more difficult. Another possible solution would be to send all ad requests and responses (containing DoubleClick cookies and ad content) over secure channels (SSL). However, we believe that this solution needs deeper analysis from the research community and the advertising industry, since it might be too costly, not practical or altogether hard to deploy.

Stealing ads preferences via an active attack. The attack presented in this chapter is passive, i.e. completely transparent to the victim and to the ad providers. We note that a user's preferences can also be stolen by a simple active attack. In fact, if an adversary is able to steal the victim's DoubleClick cookie, it can connect to his Google Ads Preferences page and retrieve his preferences. We examined the top 100 commercial websites from Alexa and found that at least 71% of them exchange the DoubleClick cookie in the clear with remote servers (a sketch of such a check is given at the end of this section); stealing a DoubleClick cookie is therefore quite easy. We implemented and tested this cookie hijacking attack, and were always able to retrieve the victim's Ads Preferences page with a simple request to Google Ads servers. This attack is simple; however, as opposed to our scheme, it is active and intrusive.

Retargeting ads. This chapter did not consider “retargeting ads”⁶, which advertise the services or products of a site that a user has visited. Consider a user who is looking for a hotel in Vigo, Spain and performs some searches on the site www.hotels.com. It is very likely that he will consequently receive frequent ads for hotels in Vigo while browsing the Internet. Retargeting ads do not only target a particular user's interests, but specifically aim to match an exact intention or previous online action. Retargeting ads therefore leak much more information about the user: in our example, they will not only leak that the user is interested in traveling, but also his actual destination, i.e. Vigo, Spain. Note that retargeting ads are served independently of Google Ads Preferences profiles: a user will receive retargeting ads even if he empties his ads preferences profile. The only way to stop receiving retargeting ads is to clear the cookies or to completely opt out of targeted ads.
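As a rough illustration of the cleartext-cookie check mentioned above, the following sketch scans a browsing session exported as a HAR file for requests that carry a DoubleClick “id” cookie over plain HTTP. The cookie name, the domain test and the use of HAR files are simplifying assumptions; the measurement described above also considered cookies exchanged with other remote servers, which this simplified check does not capture.

import json

def cleartext_doubleclick_requests(har_path: str) -> list:
    # Return plain-HTTP requests to DoubleClick that carry an "id" cookie.
    leaks = []
    with open(har_path) as f:
        entries = json.load(f)["log"]["entries"]
    for entry in entries:
        request = entry["request"]
        if not request["url"].startswith("http://"):      # unencrypted only
            continue
        cookie_header = next((h["value"] for h in request["headers"]
                              if h["name"].lower() == "cookie"), "")
        if "doubleclick" in request["url"] and "id=" in cookie_header:
            leaks.append(request["url"])
    return leaks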

4.6 Summary

In this chapter, we showed that targeted ads contain valuable information that allows accurate reconstruction of users' interest profiles. We presented a methodology to categorize and filter targeted ads, which are in turn used to infer users' profiles. Based on both real users' web histories and synthetic user profiles, we showed that our technique achieves a high accuracy in predicting the general topics of users' interests. Additionally, we demonstrated that, using only a limited number of collected targeted ads, an adversary can capture on average more than half of a targeted profile. The algorithms described in this chapter are simple and probably not optimal; we believe they could be improved in many ways.

Many people claim that the main issue in online behavioral advertising is not ad personalization itself, which allows users to receive useful ads, but rather the fact that it requires tracking users' activities. In this chapter, we showed that ad personalization can also be harmful to users' privacy and does actually leak sensitive information such as users' profiles. We also note that this information leakage is not specific to online behavioral advertising, but in fact exists in any personalized content (news, searches, recommendations, etc.). As the web is moving toward personalization of services almost everywhere, special attention should be paid to these privacy threats.

⁶More details about retargeting ads are discussed in Chapter 6.

Chapter 5

Privacy Leaks in Ad Exchange

Real-Time Bidding (RTB) and Cookie Matching (CM), the two commonly used protocols in ad exchanges, are transforming the advertising landscape into an extremely dynamic market and make targeted advertising considerably more permissive. The emergence of these technologies allows companies to exchange user data as a product and therefore raises important concerns from a privacy perspective. In this chapter, we perform a privacy analysis of CM and RTB and quantify the leakage of users' browsing histories due to these mechanisms. We study this problem on a corpus of users' Web histories, and show that, using these technologies, certain companies can significantly improve their tracking and profiling capabilities. We detect 41 companies serving ads via RTB and over 125 using Cookie Matching. We show that 91% of users in our dataset were affected by CM and that, in certain cases, 27% of a user's history could be leaked to third-party companies through RTB.

We expose a design characteristic of RTB systems that allows observing the prices which advertisers pay for serving ads to Web users. We leverage this feature to study how user profiles are valued by advertisers. Through our experiments, we confirm that users with a known history are valued higher than newcomers, that some user profiles are more valuable than others, and that users' intents, such as looking for a commercial product, are sold at higher prices than users' browsing histories. In addition, we show that there is a huge gap between users' perception of the value of their personal information and its actual value on the market. A recent study by Carrascal et al. showed that, on average, users value the disclosure of their presence on a Web site at EUR 7. We show that elements of a user's browsing history are routinely being sold off for less than $0.0005.

5.1 Introduction

Real-Time Bidding (RTB) [126] is a novel paradigm for serving ads with the aim of bringing more liquidity to the online advertising market. When a user visits a Web site which displays advertisements (ads) through RTB, the ad request is sent to an Ad Exchange, which subsequently broadcasts it, along with user data, to ad buyers and holds an auction. These buyers bid in this auction and the winning party is allowed to serve ads to the user. The underlying technology for exchanging users' identification data between Ad Exchanges and buyers is Cookie Matching, which allows two different domains to match their cookies for the same user.

Although RTB and Cookie Matching are acclaimed by the advertising industry, their privacy implications are not adequately understood. Cookie Matching enables linking the profiles of a single user in the databases of two independent companies and is an integral part of RTB. In RTB, Ad Exchanges leverage Cookie Matching to broadcast user data to ad buyers. In other words, users' data become a product that is auctioned in real time in the online advertising market. RTB-based spending is growing rapidly and is expected to account for more than 25% of total display advertising in the US by 2015, up from 10% in 2011. By 2015, the majority of indirect display ad sales revenue will be traded using RTB in the United States and the most developed European markets [127]. RTB and Cookie Matching are becoming increasingly widespread in the online advertising industry, yet, to the best of our knowledge, there have been few academic studies of their privacy implications. In this chapter, we conduct an empirical study of these technologies and analyze how they impact users' privacy. We believe that it is important for users, researchers and privacy advocates to understand this privacy implication in detail.

While estimating the value of users' private information is an interesting problem [128, 105], evaluating it is subtle and not obvious. Several recent research studies established results from the users' perspective [101]. Users, however, often do not have a developed sense of privacy. We approach this problem from the advertisers' perspective, based on a market principle: users' private data are worth as much as someone is willing to pay for them. By leveraging a design feature of RTB systems, we are able to observe the prices that advertisers pay for an ad impression after winning an auction. We use these prices to conduct a detailed analysis of the value of users' private data, with a focus on users' Web browsing histories. In summary, the main findings in this chapter include:

• We quantify the impact of Cookie Matching (CM) and Real-Time Bidding (RTB) on users' privacy. We show that CM happens very frequently and is performed by a large number of companies; some of them execute Cookie Matching for a significant proportion of the studied user profiles (up to 91% of the 100 profiles we studied in our experiments). Our analysis of RTB shows that Ad Exchanges (e.g. DoubleClick) broadcast user-visited sites to a considerable number of bidders in real time; some of the bidders can learn up to 27% of a user's history through this mechanism.

• We provide an analysis of the value of users' private data from the advertisers' perspective, based on the prices they pay for serving ads to users. We confirm that when a user's Web history is already known to advertisers, they are willing to pay a higher price than for new users. We also show that users' intents, such as browsing a commercial product, are valued higher than their general histories, i.e. browsing sites not related to specific products. Finally, we highlight a huge gap between users' perception of the value of their personal information and its actual value on the market.

5.2 Background information

5.2.1 Cookie Matching

Cookie Matching (CM), an integral part of Real-Time Bidding, is a mechanism allowing two separate parties to synchronize their users' cookies [129]. For example, an Ad Exchange and a Bidder (ad buyer) normally assign their own distinct cookies to the same user. After an execution of the Cookie Matching protocol, one or both of them have these cookies mapped to each other. Some Ad Exchanges, notably DoubleClick, create and use a unique user id (e.g. a one-way hash of the cookie) instead of the cookie itself, with the aim of protecting the actual cookie content from being revealed to the Cookie Matching partners. Nevertheless, we detected that many others send clear-text cookies for matching.

[Figure 5.1 overview: the Ad Exchange (cm.adexchange.com, user cookie/id “aaa”) sends the user's browser (1) a script or redirect instruction to load "http://cm.bidder.com/cmservice?ExchangeUserId=aaa"; the browser issues (2) this request to the Bidder (cm.bidder.com), carrying the Bidder's own cookie "UID=bbb"; the Bidder then (3) matches cookie/id "aaa" with "bbb".]

Figure 5.1: Cookie matching protocol

Figure 5.1 shows the main phase of Cookie Matching. The Ad Exchange typically sends a script or a redirect instruction that makes the user's browser load a URL provided by the Bidder, with the Ad Exchange's user cookie/id as a parameter. Upon receiving this request, the Bidder obtains the Ad Exchange's cookie/id and matches it with its own cookie. In some cases, this process happens in the reverse direction, which results in the cookie matching being performed on the Ad Exchange's side. Cookie matching is also known under different names, such as cookie syncing or pixel matching; in this chapter, we use “Cookie Matching” for all such actions of cookie synchronization between two separate entities.
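A minimal sketch of the bidder-side step (3) of Figure 5.1 is given below: the browser fetches the bidder's cookie-matching URL carrying the exchange's user id as a parameter, and the bidder records the mapping to its own cookie. The URL and parameter name follow the figure; everything else is illustrative.

from urllib.parse import urlparse, parse_qs

match_table = {}   # exchange user id -> bidder cookie

def handle_cm_request(url: str, bidder_cookie: str) -> None:
    # The exchange's id arrives in the query string; the bidder's own cookie
    # comes with the request (here passed as an argument for simplicity).
    exchange_id = parse_qs(urlparse(url).query).get("ExchangeUserId", [None])[0]
    if exchange_id:
        match_table[exchange_id] = bidder_cookie

handle_cm_request("http://cm.bidder.com/cmservice?ExchangeUserId=aaa", "bbb")
# match_table now maps "aaa" -> "bbb": the two cookies are matched.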

5.2.2 Real-Time Bidding

Real-Time Bidding (RTB) [126] allows advertisers to buy online advertisement spaces in real time through Ad Exchanges. Here we discuss the mechanism of DoubleClick's Ad Exchange [130], which is likely the most representative. Other Ad Exchanges, for example Pulse Point [131], employ similar approaches. The OpenRTB initiative [132], which aims at standardizing RTB, provides a similar description.

[Figure 5.2: on the buyers' side, advertisers are represented by bidders; on the sellers' side sit the publishers; the Ad Exchange connects bidders to publishers in the middle.]

Figure 5.2: Ad Exchange model

The four main entities taking part in Real-Time Bidding include: 1) Publishers (e.g. nytimes.com), which own the Web sites displaying ads; 2) Ad Exchanges (e.g. DoubleClick), which enable ad transactions between ad sellers (the publishers) and ad buyers (the bidders) based on auctions held in real time; 3) Bidders (e.g. Criteo¹), which are large advertising agencies representing small and medium advertisers to bid in RTB auctions in order to win ad spaces; and finally 4) Advertisers (e.g. hotels.com), which want to advertise and sell their products or services. Each time an ad is displayed on a Web site visited by a user, we call this event an ad impression.

The RTB mechanism works as follows. When a user visits a publisher's Web site belonging to an Ad Exchange's advertising network, an HTTP request is sent to the Ad Exchange. The Ad Exchange subsequently sends bid requests for this ad impression to its bidders. Bidders then analyze the impression and submit their bid responses, which include the prices they are willing to pay and their ad snippets. The bids are submitted to an online auction, and the Ad Exchange serves the winner's ad snippet on the user's visited Web site. After a successful transaction, the Ad Exchange charges the winning bidder and pays the publisher after subtracting a commission. The whole process generally takes less than 100 ms.

Bid requests sent from Ad Exchanges to bidders typically contain information such as the Ad Exchange's user cookie (or user id) and the user's visiting context, including the URL of the Web site being visited, the categories of the site, the first three bytes of the user's IP address², various information concerning the user's browser, and others [134, 135]. Upon receiving a bid request, the bidder finds its own user cookie from the Ad Exchange's cookie thanks to Cookie Matching, provided that this protocol has been executed previously. It then determines the bid price based on the user profile it possesses and the user context provided by the Ad Exchange. Bidders can also bid on new users about whom they do not possess any prior information. When a bidder wins the auction, it has the right not only to serve ads, but also to initiate a Cookie Matching with the Ad Exchange. An online Ad Exchange works similarly to a stock exchange [136], only trading in audiences for online ads. This mechanism helps publishers sell their ad spaces at the most competitive price, while allowing bidders to flexibly adjust their buying strategy in real time.

¹http://www.criteo.com
²Note that some companies, such as Pulse Point, actually send full IP addresses [133].
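The following minimal sketch mimics this flow for a single impression; it is an illustration of the mechanism described above, not any exchange's actual protocol, and all names and prices are assumptions. Real exchanges often charge the second-highest bid (Vickrey auctions, see Section 5.2.3), which is omitted here for brevity.

class Bidder:
    def __init__(self, name: str, known_users: set):
        self.name = name
        self.known_users = known_users   # users already profiled via Cookie Matching

    def bid(self, request: dict) -> float:
        # Illustrative strategy: bid more for users whose history is known.
        return 0.50 if request["user_id"] in self.known_users else 0.10

    def ad_snippet(self, request: dict) -> str:
        return "<ad from %s>" % self.name

def run_auction(bidders: list, exchange_user_id: str, context: dict):
    # The exchange broadcasts the bid request (user id + visiting context),
    # collects the bids and serves the highest bidder's ad snippet.
    bid_request = {"user_id": exchange_user_id, "context": context}
    price, winner = max(((b.bid(bid_request), b) for b in bidders), key=lambda x: x[0])
    return winner.ad_snippet(bid_request), price

snippet, paid = run_auction([Bidder("bidderA", {"user42"}), Bidder("bidderB", set())],
                            "user42", {"url": "http://news.example.com", "ip_prefix": "1.2.3"})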

5.2.3 The Economics of Real-Time Bidding

The payment model used in Real-Time Bidding is Cost-per-mille impressions (CPM) [137], which means that every transaction through an Ad Exchange is on a pay-per-impression basis. However, some advertisers might prefer the Cost-per-click (CPC) model [138], as its performance is more easily measurable than CPM's. As a result, a hybrid model exists in which real-time bidders (e.g. Criteo) buy ad impressions from Ad Exchanges and sell ad clicks to advertisers. In this model, the bidders are expected to bid high enough in the Ad Exchange to win the auction, while ensuring an adequate click probability to earn a margin. Click probability depends largely on how well the ad content matches user profiles and/or visiting contexts.

In this work, we aim to analyze how bidders evaluate users' personal data on behalf of advertisers. We therefore focus on analyzing the strategy from the advertiser's perspective. Advertisers' purposes normally include: 1) inviting users to their Web sites to buy a product or use a service, and/or 2) improving awareness. In both cases, the common goal is to reach potential customers. As most Ad Exchanges encourage truthful bidding, for example through the use of Vickrey auctions [139], the best strategy for advertisers is expected to be bidding in accordance with the true value they can expect to get from the user.
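The arithmetic behind this hybrid model can be sketched as follows: a bidder that is paid per click by the advertiser converts the expected value of an impression into a CPM bid using an estimated click probability. The margin parameter and the example numbers are purely illustrative.

def cpm_bid(click_probability: float, value_per_click: float, margin: float = 0.2) -> float:
    # Expected revenue of one impression, scaled to a price per 1000 impressions,
    # minus the bidder's margin.
    expected_value = click_probability * value_per_click
    return 1000 * expected_value * (1 - margin)

# e.g. a 0.2% click probability and a EUR 0.50 payment per click yield a bid
# of roughly EUR 0.80 CPM with a 20% margin.
print(cpm_bid(0.002, 0.50))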

5.3 Cookie Matching and RTB Detection

In this section, we describe the discovery techniques that we employed. First, we introduce the request hierarchy detection technique, which serves as a basis for the others. Second, we present our technique to detect Cookie Matching. Then we describe the Real-Time Bidding detection technique, which is based on the discovery of winning prices.

5.3.1 Request Hierarchy Detection

We describe our technique to detect all causal relations between HTTP requests. The requests often originate from Web sites' HTML tags, including

Web Analytics