Analyzing the Privacy and Societal Challenges Stemming from the Rise of Personal Genomic Testing
UNIVERSITY COLLEGE LONDON

DOCTORAL THESIS

Analyzing the Privacy and Societal Challenges Stemming from the Rise of Personal Genomic Testing

Author: Alexandros Mittos
Supervisor: Prof. Emiliano De Cristofaro

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy
in the
Information Security Group
Department of Computer Science

December 15, 2020

Declaration of Authorship

I, Alexandros Mittos, declare that this thesis titled, “Analyzing the Privacy and Societal Challenges Stemming from the Rise of Personal Genomic Testing” and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:
Date:

“Somewhere, something incredible is waiting to be known.”
Carl Sagan

Abstract

Progress in genomics is enabling researchers to better understand the role of the genome in our health and well-being, stimulating hope for more effective and cost-efficient healthcare. At the same time, the rapid cost drop of genome sequencing has enabled the emergence of a booming market for direct-to-consumer (DTC) genetic testing. Nowadays, companies like 23andMe and AncestryDNA provide affordable health, genealogy, and ancestry reports, and have already tested tens of millions of customers.
However, while this technology has the potential to transform society by improving people’s lives, it also harbors dangers, as it raises important privacy and societal concerns. In this thesis, we shed light on these issues using a mixed-methods approach. We start by conducting a technical investigation of the limitations of privacy-enhancing technologies used for testing, storing, and sharing genomic data. We rely on a structured methodology to contextualize and critically analyze the current state of the art, and we identify and discuss ten open problems faced by the community. We then focus on the societal aspects of DTC genetic testing by conducting two large-scale analyses of the genetic testing discourse on both mainstream and fringe social networks, specifically Twitter, Reddit, and 4chan. Our analyses show that DTC genetic testing is a popular topic of discussion on all platforms. However, these discussions often include highly toxic language expressed through hateful and racist comments and openly antisemitic rhetoric, often conveyed through memes. Overall, our findings highlight that the rise in popularity of this new technology is accompanied by several societal implications that are unlikely to be addressed by a single research field and instead require a multi-disciplinary approach.

Impact Statement

The research in this thesis revolves around the societal impact of personal genomic testing. It investigates whether the privacy of those who contribute their data to genetic testing services and public initiatives can be adequately protected in both the short and the long term, how personal genomic testing is reflected online, and whether it is purposely misused. Thus, this thesis impacts various entities, such as researchers, policy makers, companies, and individuals.
First, we demonstrate that current genome privacy tools developed for testing, storing, and sharing genomic data are unable to adequately protect the privacy of users in either the short or the long term. Our evaluation shows that the overwhelming majority of proposed techniques aiming to scale up to large genomic datasets need to opt for weaker security guarantees or weaker models. Furthermore, one serious challenge stems from the lack of long-term security protection, which is difficult to address because available cryptographic tools are not suitable for this goal. We believe that our work will inspire researchers to find alternative and/or creative ways of addressing some of the non-trivial open problems we identify in this thesis.

The results of this thesis also have real-world implications for the DTC genetic testing industry and policy makers. Specifically, we show that DTC genetic testing is being used by fringe groups online to ingrain and empower genetics-based prejudice and discrimination, and even to call for genocide. Considering that platforms like Facebook and Twitter have begun to be held accountable when their services enable harmful behavior [229], we believe that our work will motivate policy makers to address the DTC genetic testing industry and legislate so that these companies consider the potential abuse of their services and attempt to find ways of minimizing this behavior.

Finally, it is our hope that this thesis will raise awareness among the general public about the potential privacy and societal implications of DTC genetic testing.

Acknowledgements

I would like to express my deep appreciation to all the people who have contributed to this thesis and supported me throughout this process.

First and foremost, I would like to thank my supervisor, Emiliano De Cristofaro. Both his advice and his actions have been a source of inspiration.
I would also like to thank him for pushing me to become the best I can be, for teaching me what is important, and for being available when I needed him. Furthermore, I have been extremely lucky to collaborate with and learn from a few great researchers, namely Prof. Bradley Malin, Prof. Jeremy Blackburn, and Dr. Savvas Zannettou.

I would like to thank everyone in the Information Security Research Group at UCL for creating a fun and stimulating environment. I would also like to thank the people involved in the Privacy & Us group, both experienced researchers and Ph.D. students. Your sincere passion for research has always been contagious, and every group meeting inspired me to work harder.

This thesis would not have been possible without the unequivocal support of my family, who have never stopped believing in me. Last but not least, I would like to thank Zoe for never giving up on me, for reminding me what is important, and for her unconditional love and support despite my “eccentric” personality.

* This research was supported by the European Union’s Horizon 2020 Research and Innovation program under the Marie Skłodowska-Curie “Privacy & Us” project (GA No. 675730).

Contents

Declaration of Authorship
Abstract
Acknowledgements
1 Introduction
  1.1 Research Questions and Contributions
  1.2 Implications
  1.3 Thesis Outline
  1.4 Collaboration
2 Background
  2.1 Genomics Primer
    2.1.1 Terminologies
    2.1.2 Contexts
    2.1.3 Attacks Against Genome Privacy
    2.1.4 Legal Aspects
  2.2 Social Platform Analysis
    2.2.1 Social Platforms
    2.2.2 Tools
3 Related Work
  3.1 Security & Privacy Challenges of Genetic Testing
  3.2 Genetic Testing & Society
  3.3 Health in Social Networks
  3.4 Quantitative Studies on Twitter, Reddit, and 4chan
  3.5 Online Hate
  3.6 Discussion
4 Reconciling Privacy with Progress in Genomics: A PETs Perspective
  4.1 Introduction
  4.2 Methodology
    4.2.1 Sampling Relevant Work
    4.2.2 Systematization Criteria
  4.3 Representative Papers
    4.3.1 Personal Genomic Testing
    4.3.2 Genetic Relatedness
    4.3.3 Access and Storage Control
    4.3.4 Genomic Data Sharing
    4.3.5 Outsourcing
    4.3.6 Statistical Research
  4.4 Systematic Analysis
    4.4.1 The Issue of Long-Term Security
    4.4.2 Security Limitations
    4.4.3 The Cost of Protecting Privacy
    4.4.4 Real-Life Deployment
  4.5 Experts’ Opinions
  4.6 Discussion
5 Analyzing Genetic Testing Discourse on the Web Through the Lens of Twitter
  5.1 Introduction
  5.2 Dataset
  5.3 Analyzing the Genetic Testing Discourse
    5.3.1 General Characterization
    5.3.2 What Are The Tweets About?
    5.3.3 Who Tweets About Genetic Testing?
    5.3.4 What Do Users Tweet About Otherwise?
  5.4 Case