A Systematic Adversarial Threat Analysis of Contact Tracing Apps
Total Page:16
File Type:pdf, Size:1020Kb
Western University Scholarship@Western Electronic Thesis and Dissertation Repository 12-18-2020 10:45 AM Protecting Health Data in a Pandemic: A Systematic Adversarial Threat Analysis of Contact Tracing Apps Leah Krehling, The University of Western Ontario Supervisor: Essex, Aleksander, The University of Western Ontario A thesis submitted in partial fulfillment of the equirr ements for the Master of Engineering Science degree in Electrical and Computer Engineering © Leah Krehling 2020 Follow this and additional works at: https://ir.lib.uwo.ca/etd Part of the Other Electrical and Computer Engineering Commons Recommended Citation Krehling, Leah, "Protecting Health Data in a Pandemic: A Systematic Adversarial Threat Analysis of Contact Tracing Apps" (2020). Electronic Thesis and Dissertation Repository. 7586. https://ir.lib.uwo.ca/etd/7586 This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of Scholarship@Western. For more information, please contact [email protected]. Abstract In this thesis centralized, decentralized, Bluetooth, and GPS based applications of digital contact tracing were reviewed and assessed. Using privacy principles created by a contingent of security and privacy experts from across Canada, a metric of assessing an application’s privacy was created. An attack tree was built to assess the security of the contact tracing applications. Eighteen attacks were theorized against contact tracing applications currently in use. An application’s vulnerability to the attacks was measured using a scoring system developed for this purpose. The results of the security scores were used to create a metric for assessing the security of contact tracing systems. Five contact tracing applications were assessed using developed privacy and security met- rics. The results of this assessment are that for privacy and security a centralized Bluetooth model with added location functionality scored low. While in privacy a decentralized Blue- tooth model scored high. In security, the centralized GPS model scored high, while having only a fair level of privacy. Keywords: De-identification, Health Data, Contact Tracing, COVID-19, Privacy, Security i Summary for Lay Audience The digital world is growing larger every day. Everything we do that involves a computer or the internet generates data points. These data points are collected and stored. Together all of this data forms a picture of your life. With it models that can predict your behaviour can be created. There is a lot of power sitting and waiting to be used. The power of this data can be used for good. To create models that can diagnose illness or determine the best treatment plan for an individual. It could also be used to harm people. The same medical record that together with others could create a treatment breakthrough for a mental disorder could be used to discriminate against someone with that disorder, losing them their job. Health data is private for good reasons. Determining the best practices of keeping health data private provides those tasked to do so with the tools they need. The first section of this thesis is a review of the state of health data privacy. This leads to an overview of the best practices of the field. Then the specific health data problem of contact tracing is tackled. Both the privacy and security of contact tracing applications is important. The privacy of the application was mea- sured using privacy principles created by a contingent of security and privacy experts from across Canada. From these privacy principles, a metric for assessing an application’s privacy was created. To assess the security of the contact tracing applications eighteen attacks were theorized. These attacks were then applied to the systems that the application’s use. An application’s vulnerability to the attacks was measured using a scoring system developed for this purpose. The results of the security scores were used to create a metric for assessing the security of contact tracing systems. Five contact tracing applications were assessed using developed privacy and security met- rics. The results of this assessment are that for privacy and security one out of the five was ranked as low, one was ranked as high, and three were ranked as medium. This means that of the five applications scrutinized four out of five have privacy or security concerns. As these applications are intended to be used across the globe by everyone for the safety of the populace these concerns are important to address. ii Acknowledgements I would like to thank the efforts of my advisor Dr. Aleksander Essex. He is an excellent mentor and it is only through his guidance that I was able to learn so much in the last two years. I am incredibly grateful for the support he has provided. Thank you for every insight, pep talk, and for responding to the email of a fourth year undergraduate student in need of an advisor. I also would like to thank Elena Novas and Zeev Glauberzon. Their support, insight, and efforts are greatly appreciated. Thank you for answering all of the many questions I had about data privacy and your unfailing patience while teaching me. To the rest of my friends and family thank you for your support, your conversation and your impeccable ability to keep me laughing. Finally I would like to acknowledge my dog, Naga for all of the work she did as emotional support. Without her this may have been completed but it would have taken a lot longer. iii Contents Abstract i Summary for Lay Audience ii Acknowlegements iii List of Figures ix List of Tablesx 1 Introduction1 1.1 Motivations .................................... 2 1.2 Contributions ................................... 3 1.3 Organization of Thesis .............................. 4 I Systematic Review of Health Data Privacy6 2 Background of Privacy and Health Data7 2.1 De-identification Methods............................. 8 2.2 The Threat of Re-Identification.......................... 11 2.2.1 Identity Disclosure ............................ 11 2.2.2 Attribute Disclosure............................ 12 2.2.3 Internal Attacks.............................. 12 2.2.4 External Attacks ............................. 12 2.3 De-Identification Measures............................ 13 2.3.1 k-anonymity................................ 13 2.3.2 Differential Privacy............................ 14 2.4 Tools of Re-Identification............................. 14 iv 2.4.1 Linkage Attacks.............................. 14 2.4.2 Machine Learning Models ........................ 15 2.4.3 Markov Chains.............................. 15 2.4.4 Reversing Masking............................ 15 3 Review of Health Data De-identification Attacks 16 3.1 De-Identification Review Methodology...................... 16 3.1.1 Search Method .............................. 16 3.1.2 Inclusion/Exclusion Criteria ....................... 17 3.1.3 Data Abstraction.............................. 17 3.2 Results From the De-identification Attack Review................ 20 3.2.1 Notable Observations........................... 20 3.2.2 Health Data................................ 21 3.2.3 Geolocation Data............................. 24 3.2.4 Differential Privacy............................. 25 3.2.5 Breaking Masking ............................ 25 3.2.6 External Data Availability ........................ 26 3.3 Remarks on De-identification Attack Review................... 27 4 Observations from Review of De-identification Attacks 28 4.1 Best Practices for All Data Types......................... 28 4.1.1 Suppression................................ 28 4.1.2 Generalization............................... 29 4.1.3 Masking.................................. 30 4.2 Best Practices for Demographic Data....................... 33 4.2.1 Suppression................................ 34 4.2.2 Generalization............................... 34 4.3 Best Practices for Health Data .......................... 36 4.3.1 Suppression................................ 41 4.3.2 Generalization............................... 43 4.3.3 Perturbation................................ 44 4.3.4 Aggregation................................ 45 4.3.5 Access Control .............................. 45 4.4 Best Practices for Geolocation Data ....................... 47 v 4.4.1 Suppression................................ 48 4.4.2 Generalization............................... 51 4.4.3 Perturbation................................ 57 4.4.4 Aggregation................................ 58 4.5 Best Practices for Browsing History ....................... 58 4.5.1 Suppression................................ 58 4.6 Best Practices for Call Records.......................... 59 4.6.1 Suppression................................ 59 4.6.2 Generalization............................... 60 4.7 Best Practices for Social Networks........................ 60 4.7.1 Suppression................................ 60 4.7.2 Perturbation................................ 61 4.7.3 Aggregation................................ 61 4.8 Best Practices for Billing Information ...................... 62 4.8.1 Suppression................................ 62 4.8.2 Perturbation................................ 62 4.9 Conclusions from the De-identification Attack Review ............. 63 II Security and Privacy of Patient Contact Tracing 64 5 Background on Contact Tracing 65 5.1 History of Contact Tracing ............................ 65 5.1.1 Privacy Concerns of Traditional