Analyzing the Impact of Digital Advertising on User Privacy
UNIVERSITY OF CRETE
DEPARTMENT OF COMPUTER SCIENCE
FACULTY OF SCIENCES AND ENGINEERING

Analyzing the Impact of Digital Advertising on User Privacy

Panagiotis Papadopoulos

PhD Dissertation
Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Heraklion, September 2018

UNIVERSITY OF CRETE
DEPARTMENT OF COMPUTER SCIENCE

Analyzing the Impact of Digital Advertising on User Privacy

PhD Dissertation Submitted by Panagiotis Papadopoulos in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

APPROVED BY:
Supervisor: Prof. Evangelos P. Markatos, University of Crete, Greece
Committee Member: Dr. Sotiris Ioannidis, FORTH, Greece
Committee Member: Asst. Prof. Xenofontas Dimitropoulos, University of Crete, Greece
Committee Member: Dr. Nicolas Kourtellis, Telefonica Research, Spain
Committee Member: Asst. Prof. Michalis Polychronakis, Stony Brook University, USA
Committee Member: Dr. Nikolaos Laoutaris, Data Transparency Lab, Spain
Committee Member: Asst. Prof. Alexandros Kapravelos, North Carolina State University, USA
Department Chairman:

Heraklion, September 2018

Acknowledgments

During this amazing trip called PhD, I was fortunate to have some amazing people supporting me. First of all, I am grateful to my supervisor Prof. Evangelos Markatos for shaping the researcher I am today and for teaching me a unique research mentality. This thesis would not have been possible without his advice and our countless talks. I learned so many things at your side that I will always be grateful for the influence you have had on my life. I am also grateful to my supervisor at Telefonica Research, Dr. Nicolas Kourtellis, for his continuous support and passion for work. It was always a pleasure working with you.

I consider myself very fortunate to be surrounded by brilliant minds, and therefore I need to express my appreciation to Sotiris Ioannidis, Nikolaos Laoutaris, Elias Athanasopoulos, Michalis Polychronakis, Kostas Magoutis and Xenofontas Dimitropoulos for their advice, mentoring and all the creative brainstorming time we spent together. I also want to thank every single co-author I had all these years: Elias Papadopoulos, Ilias Leontiadis, Giorgos Vasiliadis, Panagiotis Ilia, Antonis Papadogiannakis, Alexandros Kapravelos, Eirini Degkleri, Michalis Diamantaris, Michalis Pachilakis, Antonios Chariton, Giorgos Christou, Thanasis Petsas, Alexandros Kornilakis, Manolis Karabinakis. We worked together, we got rejected together, we resubmitted together. I really enjoyed working with you guys.

After spending a bit more than 7 years in the Distributed Systems (DCS) lab, I feel like I have found a second family there. I met and worked alongside great people in this lab and I want to express my deepest gratitude to Antonis Krithinakis, Dimitris Deyannis, Panagiotis Garefalakis, Nikos Tsikoudis, Demetris Antoniades, Iason Polakis, Christos Papachristos, Lazaros Koromilas, Eva Papadogiannaki, Despoina Antonakaki, Meltini Christodoulaki, Antonis Papaioannou, Giorgos Tsirantonakis, Michalis Athanasakis, Evangelos Ladakis, Stamatis Volanis, Zacharias Tzermias, Evangelos Dimitriadis, Laertis Loutsis, Manolis Stamatogiannakis and all other past and present members of DCS for preserving the balance between computers and real life, making the lab a fun place all these years.

I want to express my sincere thanks to Nikos Skotis, Aris Koutsouras, Giorgos Oikonomou, Dimitris Patelis, Marina Aeraki, Stelios Ninidakis, Stephen Tripodianakis, Nikos Patsiouras, Dimitris Chasapis, Dimitris Ioannidis, Bruno Cardoso, Minos Katevas and all other friends for their support, love, tolerance, and all the great moments we have shared.

Finally, I would never have been able to reach this point without the unconditional love and support of my parents Manolis and Evangelia, and my sister Kelina. Thank you for bearing with me and always being by my side through my studies and life.

Abstract

Digital advertising is a multi-billion dollar business that has the power to fuel the entire free Internet. In recent years, it has been progressively moving towards a programmatic model in which ads are matched to the actual interests of individuals, collected as they browse the web. The advertiser pays a monetary cost to buy ad space in a publisher's medium (e.g., a website), thus delivering its digital advertisement along with the publisher's content to the visitor's display. Unlike traditional advertisements in media such as newspapers, TV or radio, in the digital world the end-users also pay a cost for the advertisement delivery. Whilst the cost on the advertiser's side is clearly monetary, on the end-user's side it includes both quantifiable costs, such as network requests and transferred bytes, and qualitative costs, such as privacy loss to the ad ecosystem. Indeed, as advertisements become more and more personalized to match the users' interests and become as effective as possible, more personal information about the visiting users is needed. Motivated by that, tracking companies deploy sophisticated user-tracking mechanisms that retrieve any piece of information that can reveal the user's interests and preferences. Such information may include current and historical geolocations, installed apps, browsing histories, and so forth. All this information is used to form rich user profiles and large audience segments that can be shared with or sold to anyone interested (e.g., advertisers, data brokers, data management platforms, etc.) beyond the control of the users.

To conduct such data markets, and before performing any background merges of user databases, different entities synchronize the different userIDs they have set for the same users. This way they reduce the number of different "aliases" under which they know a user, which increases their capability of re-identifying users who erase their browser state (i.e., cookies) or even browse through a VPN to preserve their privacy. Despite the continuous growth of digital advertising and its impact on our everyday lives, we know little about the flow of information within the participating companies and the interconnections between them. Motivated by that, in this dissertation we aim to enhance the transparency of this large ecosystem and investigate the bidirectional effect between user privacy and programmatic ad-buying. In particular, we explore the impact of personalized advertising on the users' privacy and anonymity, given the elaborate user tracking and personal data collection techniques that are deployed. We experimentally measure the user information leaks that occur while using websites and mobile apps. Based on the insight gained from these experiments, we design countermeasures to mitigate the privacy loss.

In the opposite direction, we study how these collected user data affect the pricing dynamics of programmatic ad-auctions and how much advertisers pay to reach a user. Then, we compare the costs imposed by digital advertising on both users and advertisers for the very same delivered ad traffic. These costs include network overhead, temperature, energy consumption, and loss of privacy. Finally, in an attempt to investigate privacy-preserving alternatives for web monetization that can be completely detached from any personal data requirement, we perform a detailed analysis of the profitability and the user-side overheads of the emerging technology of web cryptomining.

Supervisor: Professor Evangelos P. Markatos

Περίληψη (Summary)

Digital advertising is a multi-billion dollar business that has the power to fuel the entire free Internet. In recent years, it has been progressively moving towards a programmatic model in which ads are matched to the actual interests of individuals, which are collected as they browse the Internet. The advertiser pays a monetary cost to buy ad space in a publisher's digital medium (e.g., a website), thus delivering its digital ad within the publisher's content, where it ends up on the visitor's screen. Unlike traditional advertisements in media such as newspapers, television or radio, in the digital world the end-users also pay a cost to receive an ad. While the cost on the advertiser's side is clearly monetary, for the end-user it includes both quantitative, directly measurable costs (such as HTTP requests and transferred bytes) and qualitative ones, such as the loss of privacy within the digital advertising ecosystem. Indeed, as ads become more and more personalized in order to match the users' interests and be as effective as possible, more personal information about the visitors is needed. As a result, tracking companies deploy sophisticated user-tracking mechanisms that retrieve any piece of information that can reveal the user's interests and preferences. This information may include current and historical geolocations, installed applications, browsing history, and so on. All this information is used to create rich user profiles that can be shared with or sold to any interested party (e.g., advertisers, data brokers, data management platforms, etc.) beyond the control of the users themselves.

To conduct such data markets, and before merging any databases, different entities