Pseudonymization in Biomedical Research
Total Page:16
File Type:pdf, Size:1020Kb
Technische Universität München (TUM) Fakultät für Informatik Institut für medizinische Statistik und Epidemiologie Lehrstuhl für medizinische Informatik Pseudonymization in Biomedical Research Ronald Rene Lautenschläger, M.Sc. Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzende(r): Prof. Dr. Florian Matthes Prüfer der Dissertation: 1. Prof. Dr. Klaus A. Kuhn 2. Prof. Dr. Claudia Eckert Die Dissertation wurde am 29.08.2018 bei der Technischen Universität München eingereicht und durch die Fakultät für Informatik am 29.01.2019 angenommen. Acknowledgement Acknowledgement Firstly, I would like to express my sincere gratitude to my advisor Prof. Klaus A. Kuhn for helping me finding my research issue and also for the continuous support of my Ph.D study, for his patience and his guidance. Besides my first advisor, I would like to thank my second advisor Prof. Claudia Eckert for her insightful comments, encouragement and the feedback provided. My sincere thanks also goes to Dr. Fabian Prasser and Dr. Florian Kohlmayer, from whom my studies have greatly benefited. Without their precious support it would not have been possible to conduct this research. The fruitful discussions, meticulous comments and suggestions of Dr. Prasser, Dr. Kohlmayer and Prof. Kuhn were illuminating and crucial for this work. I owe my deepest gratitude to my family and my friends who encouraged me during my research and my whole life. Last but not the least, I would like to thank my co-workers of our institute for their support and friendly assistance in all matters. R. Lautenschläger - Pseudonymization in Biomedical Research (2018) 2 / 134 Abstract Abstract Background: Biomedical research requires collecting and analyzing sensitive multidimensional datasets. In recent years, the domain has shifted towards collaborative and multidisciplinary approaches. Simultaneously, increasing public awareness of privacy threats has led to high social and political pressure to prevent misuse of personal data. Several national and international data protection laws and regulations demand the separation of data that may identify patients from biomedical data used for research purposes. The process, in which a confidential link between both types of data is maintained, is commonly known as pseudonymization. Besides traditional security and privacy measures, pseudonymization provides additional protection for sensitive personal health information. Objective: Legal requirements and regulatory recommendations cannot function as a blue- print for implementing pseudonymization. Existing concepts for pseudonymization often do not sufficiently cover details on implementation. As a result, most pseudonymization solutions have been developed empirically, lacking a systematic methodology regarding a risk and threat analysis as well as technical design and implementation. The aim of this work is to provide the groundwork for such a methodology and, in addition, to provide a basis for the comparison of existing solutions. Finally, this work aims at simplifying future developments, by describing core requirements for a reference architecture and implementation options for generic solutions. Methods: In this thesis, we examine existing pseudonymization concepts and extract basic requirements for managing pseudonymized data and define a reference architecture fulfilling these requirements. In addition, we describe security and privacy requirements for identified or known threats. Secondly, we systematically model the design space of countermeasures for enforcing these requirements. Results: We identified core requirements and developed a generic solution that supports the collection and management of distributed pseudonymized data. We showed how different combinations of countermeasures result in different pseudonymization architectures with different security and privacy characteristics. Next, we presented a risk and threat analysis for our approach and show which countermeasures were used to guard against identified threats. Our approach has been successfully used to implement a R. Lautenschläger - Pseudonymization in Biomedical Research (2018) 3 / 134 Abstract national and an international rare disease network where data about more than 1400 individuals has been collected to date. Conclusions: This work provides a first step towards a model for pseudonymity in biomedical research. The formal definition of requirements, especially for security and privacy as described in this thesis should become a fundamental basis for designing secure and privacy-preserving research data management systems. R. Lautenschläger - Pseudonymization in Biomedical Research (2018) 4 / 134 Contents Contents Acknowledgement ...................................................................................... 2 Abstract ...................................................................................................... 3 Contents ...................................................................................................... 5 List of Figures .............................................................................................. 8 List of Tables ............................................................................................. 10 1 Introduction ........................................................................................... 12 1.1 Background ................................................................................................................ 12 1.1.1 Pseudonymization in Biomedical Research ........................................................ 14 1.1.2 Disease Networks ................................................................................................ 16 1.2 Problem Statement .................................................................................................... 17 1.3 Contributions ............................................................................................................. 18 2 Requirements on Pseudonymized Data Management ............................ 20 2.1 Methods ..................................................................................................................... 21 2.2 Results ........................................................................................................................ 22 2.2.1 Data Entry Requirements .................................................................................... 23 2.2.2 Data Management Requirements ....................................................................... 23 2.2.3 Physical Security Requirements .......................................................................... 25 2.2.4 Logical Security Requirements ............................................................................ 26 2.2.5 Business Continuity Requirements ..................................................................... 28 2.2.6 Software Development Requirements ............................................................... 29 2.2.7 Database Management System Requirements .................................................. 29 2.2.8 Export and Reporting Requirements .................................................................. 29 2.3 Discussion................................................................................................................... 31 R. Lautenschläger - Pseudonymization in Biomedical Research (2018) 5 / 134 Contents 2.3.1 Principal Results .................................................................................................. 31 2.3.2 Comparison with Prior Work ............................................................................... 33 2.3.3 Limitations ........................................................................................................... 34 2.3.4 Conclusions.......................................................................................................... 34 3 A Reference Architecture for Pseudonymized Data Management ........... 35 3.1 Methods ..................................................................................................................... 37 3.2 Results ........................................................................................................................ 38 3.2.1 Architectural Options .......................................................................................... 38 3.2.2 Evaluation ............................................................................................................ 47 3.3 Discussion................................................................................................................... 64 3.3.1 Principal Results .................................................................................................. 64 3.3.2 Comparison with Prior Work ............................................................................... 66 3.3.3 Limitations ........................................................................................................... 69 3.3.4 Conclusions.......................................................................................................... 70 4 Risk and Threat Analysis for the Reference Architecture ......................... 73 4.1 Background ................................................................................................................ 73 4.1.1 Breaches of Privacy and Confidentiality.............................................................