City University of New York (CUNY)
CUNY Academic Works
Dissertations, Theses, and Capstone Projects

9-2018

Private-Key Fully Homomorphic Encryption for Private Classification of Medical Data

Alexander N. Wood
The Graduate Center, City University of New York


More information about this work at: https://academicworks.cuny.edu/gc_etds/2888
Discover additional works at: https://academicworks.cuny.edu

This work is made publicly available by the City University of New York (CUNY). Contact: [email protected]

Private-Key Fully Homomorphic Encryption for Private Classification of Medical Data

by

Alexander Nicolas Wood

A dissertation submitted to the Graduate Faculty in Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy, The City University of New York

2018

© 2018
Alexander Nicolas Wood
All Rights Reserved

Private-Key Fully Homomorphic Encryption for Private Classification of Medical Data
by
Alexander Nicolas Wood

This manuscript has been read and accepted by the Graduate Faculty in Computer Science in satisfaction of the dissertation requirement for the degree of Doctor of Philosophy.

Date        Delaram Kahrobaei
Chair of Examining Committee

Date        Robert M. Haralick
Executive Officer

Supervisory Committee:
Robert M. Haralick
Delaram Kahrobaei
Ali Mostashari
Kayvan Najarian
Vladimir Shpilrain

The City University of New York

Abstract

Private-Key Fully Homomorphic Encryption for Private Classification of Medical Data
by Alexander Nicolas Wood

Advisor: Professor Delaram Kahrobaei

A wealth of medical data is inaccessible to researchers and clinicians due to privacy restrictions such as HIPAA. Clinicians would benefit from access to predictive models for diagnosis, such as classification of tumors as malignant or benign, without compromising patients' privacy. In addition, the medical institutions and companies who own these medical information systems wish to keep their models private when used by outside parties. Fully homomorphic encryption (FHE) enables practical polynomial computation over encrypted data. This dissertation begins with coverage of speed and security improvements to existing private-key fully homomorphic encryption methods. Next this dissertation presents a protocol for third-party private search using private-key FHE. Finally, fully homomorphic protocols for polynomial machine learning algorithms are presented using privacy-preserving Naive Bayes and Decision Tree classifiers. These protocols allow clients to privately classify their data points without direct access to the learned model. Experiments using these classifiers are run using publicly available medical data sets. These protocols are applied to the task of privacy-preserving classification of real-world medical data. Results show that private-key fully homomorphic encryption is able to provide fast and accurate results for privacy-preserving medical classification.

Acknowledgments

This work would not have been possible without the financial support of the Office of Naval Research, the CUNY Computer Science Fellowship, the CUNY Mathematics Fellowship, the CUNY Summer Fellowship, and the Hunter College University Fellowship. I am indebted to Dr. Delaram Kahrobaei, Professor of Computer Science at The Graduate Center, CUNY, and the chairperson of my committee. Her guidance and expertise were fundamental to my success and taught me the importance of bridging theory and practice. I am forever grateful for the research opportunities she offered and the mentoring she provided. Furthermore, I would like to thank each of the members of the committee for their invaluable feedback and unwavering support. Professor Kayvan Najarian has been a dedicated mentor throughout the research process. I am indebted to him for his confidence in my ability as a researcher as well as for hosting me at the University of Michigan. I am grateful and indebted to Professor Vladimir Shpilrain for generously sharing his mathematical insights and expertise. Furthermore, I would like to thank Professor Robert Haralick for providing crucial feedback and comments, which led to a great deal of improvements in my final dissertation. I am grateful for the support of Dr. Ali Mostashari, an industry collaborator and committee member whose generous advice helped shape my research. Moreover, I would like to thank Jonathan Gryak for his advice and support through the years and his generous feedback which led to improvements in my dissertation. Last but not least, I would like to thank my family and friends who have offered invaluable support during the pursuit of this project. I would like to thank Professor Katia Perea, whose kind and patient friendship made this work possible. Finally, I would like to thank George and Roberta Tabb for always believing in me, for their unconditional love and support, and for their assistance in proofing and editing.

Contents

1 Introduction 1
1.1 Contribution ...... 3

2 Terminology 5
2.1 Machine Learning ...... 5
2.1.1 Performance Measures ...... 6
2.2 Privacy-Preserving Classification ...... 8
2.2.1 Model ...... 9
2.3 Parallelization via SIMD ...... 10
2.4 Fully Homomorphic Encryption ...... 11
2.4.1 Private-Key Fully Homomorphic Encryption ...... 12
2.4.2 Leveled, Somewhat, and Partially Homomorphic Encryption ...... 14
2.4.3 Notation ...... 14

3 Background 15
3.1 Differential Privacy ...... 16
3.2 Privacy-preserving Classification ...... 17
3.2.1 ML Confidential ...... 17
3.2.2 The Simple Encrypted Arithmetic Library (SEAL) ...... 20
3.2.3 SEAL for Classification via Neural Networks ...... 21
3.2.4 Other Cryptographic Methods ...... 22
3.3 Fully Homomorphic Encryption Schemes ...... 25
3.4 The First FHE Schemes ...... 26
3.4.1 Gentry's Fully Homomorphic Encryption ...... 26
3.5 Second-Generation FHE and Beyond ...... 29
3.5.1 The Learning With Errors Problem ...... 30
3.5.2 The Brakerski-Gentry-Vaikuntanathan (BGV) Scheme ...... 32
3.5.3 HElib ...... 34
3.5.4 Yet Another Somewhat Homomorphic Encryption Scheme (YASHE) ...... 34
3.5.5 Fan-Vercauteren (FV) Encryption ...... 36
3.5.6 Third-Generation Public-Key FHE: Recent Developments ...... 38
3.6 Gribov-Kahrobaei-Shpilrain (GKS) Encryption ...... 38
3.7 Encoding Data for Fully Homomorphic Computation ...... 40
3.7.1 Fully Homomorphic Encoding in SEAL ...... 41
3.7.2 Fully Homomorphic Encoding in GKS ...... 43
3.8 Conclusions ...... 43

4 Implementation of the Gribov-Kahrobaei-Shpilrain (GKS) Scheme 45
4.1 Fully Homomorphic Encoding of Real-World Values in GKS ...... 46
4.2 Parallelization via SIMD ...... 47
4.3 Generating the Triangular Basis Transformation ...... 52
4.3.1 Algorithms for Triangular Basis Implementation ...... 53
4.4 Homomorphic Evaluation ...... 58
4.5 Computational Complexity ...... 60
4.6 Generating Encryptions of Zero ...... 60
4.7 Experimental Performance ...... 61
4.7.1 Performance Comparison ...... 64
4.8 Conclusion ...... 66

5 Privacy-Preserving Naive Bayes Classification 67
5.1 Naive Bayes Classification ...... 69
5.2 Proposed Method for Fully Homomorphic Naive Bayes Classification ...... 71
5.2.1 Privacy-Preserving Argmax Protocol ...... 73
5.3 Security ...... 77
5.4 Implementation ...... 78
5.5 Evaluation ...... 78
5.6 Discussion ...... 80
5.6.1 Computational Bottlenecks ...... 81
5.6.2 Comparison to Other Classifiers ...... 82
5.7 Conclusions ...... 83

6 Third Party Private Search via Fully Homomorphic Encryption 84
6.1 Introduction ...... 84
6.2 Model ...... 87
6.3 Building Blocks ...... 89
6.3.1 Secure Modular Reduction (SMR) ...... 89
6.4 Third Party Private Search (TPPS) Protocol ...... 90
6.5 TPPS Over Encrypted Data ...... 91
6.6 Correctness ...... 93
6.7 Security ...... 93
6.8 Implementation ...... 94
6.8.1 Secure Modular Reduction ...... 94
6.8.2 Fully Homomorphic Encryption ...... 96
6.9 Minimization of Prediction Error ...... 97
6.10 Large Database Extension ...... 99
6.10.1 Unencrypted Model ...... 99
6.10.2 Encrypted Model ...... 101
6.11 Evaluation ...... 101
6.12 Discussion ...... 103
6.12.1 Computational Bottlenecks ...... 103
6.12.2 Comparison ...... 104
6.13 Conclusions ...... 106

7 Privacy-Preserving Decision Tree Classification 108
7.1 Introduction ...... 108
7.2 Classification and Regression Tree (CART) ...... 109
7.2.1 Growing Classification Trees ...... 110
7.2.2 Classification of New Data Points ...... 113
7.3 Methodologies ...... 113
7.3.1 Randomizing Trees ...... 113
7.3.2 Complete Tree Randomization ...... 115
7.3.3 Oblivious Transfer ...... 116
7.4 Tree Representation and Decision Tree Classification ...... 118
7.5 Private Decision Tree Classification Protocol ...... 120
7.5.1 Private Node Evaluation ...... 122
7.6 Security ...... 123
7.7 Implementation ...... 124
7.7.1 Tree Randomization ...... 124
7.7.2 Oblivious Transfer ...... 125
7.8 Evaluation ...... 126
7.9 Discussion ...... 127
7.9.1 Computational Bottlenecks ...... 127
7.9.2 Comparison ...... 128
7.10 Conclusion ...... 129

8 Conclusions 130
8.1 Future Work ...... 131

Appendices 133
A Third Party Private Search Parameters ...... 134

Bibliography 138

Autobiographical Statement 144

List of Tables

4.1 Average implementation times for the GKS cryptosystem in seconds ...... 61
4.2 Average implementation times of existing FHE schemes in seconds ...... 64

5.1 Naive Bayes Classification Results ...... 79
5.2 Mean Classification Time, In Seconds ...... 80
5.3 Comparison to other machine learning classifiers, with the highest score for each metric in red ...... 83

6.1 Execution Time, Private Search Extension ...... 102

7.1 Decision Tree Classification Results ...... 127
7.2 Decision Tree Classification Time ...... 127

A.1 Fallout for n = 10 ...... 136
A.2 Fallout for n = 10, continued ...... 137

List of Figures

2.1 PPC Between the Client and Model Owner ...... 8
2.2 PPC with Computation on Server ...... 9
2.3 The SIMD Paradigm ...... 10
2.4 Homomorphic addition (top) and multiplication (bottom) ...... 13

4.1 SIMD Implementation in the GKS scheme ...... 48
4.2 Key Generation ...... 63
4.3 Encryption and Decryption ...... 63
4.4 Homomorphic Addition ...... 63
4.5 Homomorphic Multiplication ...... 63
4.6 Average Computation Time in GKS ...... 63

5.1 Private Naive Bayes Classification ...... 70

6.1 Third-Party Private Search ...... 88
6.2 Third-Party Private Search via a Server ...... 88
6.3 Secure Modular Reduction ...... 89
6.4 The observed fallout for n = 10 ...... 98
6.5 Execution Time, Private Search Extension ...... 102

7.1 Binary data in $\mathbb{R}^2$ ...... 109


7.2 A two-dimensional region under a series of binary splits ...... 111
7.3 A binary tree corresponding to the binary splits in Figure 7.2 ...... 111
7.4 Data within a 2-dimensional region under a series of binary splits ...... 111
7.5 A binary tree corresponding to a majority vote within the regions in Figure 7.4 ...... 111
7.6 The CART method ...... 111
7.7 The completion (right) of a binary tree (left) ...... 115
7.8 A binary tree (left) and the tree negated on the first node (right) ...... 116
7.9 1-of-2 Oblivious Transfer ...... 117
7.10 Binary Tree Node Assignment ...... 119

Chapter 1

Introduction

The fields of machine learning and cryptography rose to prominence and experienced rapid development in the past few decades. Cryptography, in particular, revolutionized the world, from Whitfield Diffie and Martin Hellman's asymmetric encryption scheme in 1976 to the current prevalence of online shopping. Machine learning was spearheaded in the 1950s with Alan Turing's "Turing Test," Arthur Samuel's checkers program, and Frank Rosenblatt's perceptron. Now, machine learning permeates our lives via automatic online recommendations, self-driving cars, and more. Machine learning techniques applied to medical data have led to great leaps forward in the medical field, with applications in personalized treatment, disease diagnosis, radiology, and more [13, 24]. Despite this, the two subdisciplines have remained relatively separate.

As postulated by Rivest, the fields of cryptography and machine learning at first appear to be opposites; cryptography, on the one hand, seeks to hide information, while machine learning looks to discover information [35, 58]. However antithetical it may appear at first, machine learning and cryptography are bound to be intertwined. The more easily data is available, the greater the need for privacy, and data is gathered faster than ever before.

The need for privacy has a direct application to medical data. Information storage costs continue to decrease while personal medical applications, such as genome sequencing, are increasingly accessible. Websites such as 23andme, LifeNome, and Ancestry.com provide the first wave of commercially available personalized genetic analysis. A breach in the privacy of this data could leave a subject particularly vulnerable, as we are uniquely identified by our genetic code. As technology advances, so do the insights gained from genetic analysis, and as time goes on the risk involved with sharing your genetic data could increase. The implications of this data being shared in a publicly identifiable way could have an unforeseen negative impact upon a person's life. Strict privacy guidelines should be applied to genetic information.

Furthermore, considered collectively, these patients' records represent a wealth of data that has already been collected by various research institutions and hospitals. The ability to use this information without compromising the privacy of these patients would impact the field of computational medicine. In particular, training classification models on a single-source data set can lead to over-fitting. This yields a learned model with excellent results on the testing data but unpredictable results when applied to new data [51]. Medical researchers could use private classification methods to verify that their model can generalize to an external database.

In addition, privacy is a growing concern in medical applications as more assisted decision making and diagnosis systems become commercially available. Owners of these systems do not wish to share their models. Similarly, hospitals and clinicians are unwilling to share their patients' data due to privacy restrictions such as The Health Insurance Portability and Accountability Act of 1996 (HIPAA) [47]. Clinicians would benefit from access to private classification protocols which allow them to access these diagnosis systems without having to reveal patient information. Thus, there are two opposing forces at work: the desire to analyze all available data in order to increase the knowledge and sophistication of machine learning techniques, as well as the need for privacy and control over what information we share.

The field of fully homomorphic encryption (FHE) seeks to bridge this gap. Theoretically, a fully homomorphic encryption scheme allows for computation of arbitrary functions over encrypted data without first performing decryption. Current research into encrypted computation over medical data employs fully homomorphic public-key cryptosystems, which enable secure communication between multiple parties over insecure channels via asymmetric key distribution. Private-key cryptosystems require prior knowledge of the encryption/decryption key(s). In other words, if multiple parties wish to perform decryption in a private-key setting, they must first exchange keys over a secure channel. While this is considered a disadvantage of private-key cryptosystems when the goal is purely communication, these cryptosystems are suitable for medical applications. Due to HIPAA constraints and the personal nature of genomic and medical data, it makes sense that those who hold medical data would not, in fact, want anyone besides themselves to have the option of encrypting the data [36].

1.1 Contribution

This dissertation addresses the use of private-key fully homomorphic encryption for the design of efficient private classification algorithms in medical applications. Security of these classification algorithms, simply put, corresponds to a two-party protocol between a Client and a Model Owner. The Client should learn no unnecessary information at the end of the protocol about the model owned by the Model Owner, and the Model Owner should learn no information about the Client's input. "Unnecessary information" is a vague term which is clarified in the formal security discussions. Put abstractly, this qualification references the fact that privacy-preserving classification necessitates the sharing of some information about the Model Owner's model – for instance, the final classification of the Client's data within that model. Beyond their classification, Clients should learn no unnecessary information about a Model Owner's model. Some privacy-preserving classification protocols may take place with an intermediary Server between the Client and Model Owner. In this case, the Server should only perform computation, and should learn no unnecessary information about the Client's data point(s) or the Model Owner's model(s).

The research in this dissertation provides privacy-preserving classification algorithms using private-key fully homomorphic encryption for Naive Bayes and decision tree classifiers. Implementation of these algorithms requires the construction of additional protocols. In particular, this dissertation presents algorithms for the private computation of the argmax function, a third-party private search protocol, and algorithms for efficiently implementing private-key FHE. Experimental results on real-world medical data sets show that these classifiers are able to provide fast and accurate classification results. Specifically, information gathered from breast tumor biopsies [45] is classified as malignant or benign using the proposed privacy-preserving protocols. Experimental results on the efficiency of the third-party private search protocol are also presented.

The organization of this dissertation is as follows. Terminology is defined in Chapter 2. Chapter 3 provides an overview of the history of fully homomorphic encryption and privacy-preserving classification as well as current state-of-the-art techniques. Chapter 4 proposes methodology for the implementation of private-key fully homomorphic encryption, such as encoding methods and parallelization techniques. Chapter 5 presents a privacy-preserving Naive Bayes protocol using private-key FHE as well as experimental results. Chapter 6 presents a third-party private search protocol, and Chapter 7 presents a privacy-preserving decision tree classifier as well as experimental results. Chapter 8 concludes the thesis by summarizing the contributions.

Chapter 2

Terminology

2.1 Machine Learning

Machine learning broadly seeks to learn new information from a given data set [39]. A data set has some features which are used to predict some quantitative or categorical outcomes. A supervised learning problem in machine learning operates by using a set of training data to draw conclusions about the relationship between features of the data and outcomes. These conclusions are used to create a learner, which predicts the outcome of any new data points.

When selecting parameters for a supervised learning problem, it is important to keep in mind the bias-variance tradeoff. While the goal of machine learning is to model the specifics of the training data in a method that generalizes well to other data, it is often not possible to do both simultaneously. As the complexity of the model is decreased, the variance tends to decrease while the bias increases. This means that the learned model may be simpler and less prone to overfitting, but it may underfit the training data. On the other hand, a more complicated model tends to have increased variance with decreased bias. This means that there is a higher risk of overfitting the model on training data. Thus, with any supervised learning problem it is important to seek a balance between bias and variance.

Classification is the task of predicting a qualitative output value, often called a class, from a set of input values. These input values, the independent variables, may be discrete or continuous. A discrete variable takes its value from a discrete, or countable, set. A continuous variable can take on an infinite number of possible values [39]. In computational medicine we often deal with discrete class values. A common application in computational medicine is binary classification, where there are two categories in which the data points reside, represented as $\{0, 1\}$ or $\{-1, 1\}$, often called targets. Binary classification is used for disease prediction, for example, "has cancer" or "does not have cancer."

Feature selection is a pre-processing method which identifies the most relevant features of a data set. By restricting learning to these useful features and eliminating irrelevant or redundant features, classification models' performance may be increased [37]. Because implementation of fully homomorphic encryption and other privacy-preserving measures can lead to a large increase in classification time, it is important that learning is carried out on only this relevant data.

2.1.1 Performance Measures

The performance of binary classification algorithms is evaluated using a variety of performance measures. Let TP, TN, FP, and FN denote True Positive, True Negative, False Positive, and False Negative, respectively. Let P and N denote the total number of positive and negative data points in the testing set. One measure called accuracy is calculated as

$$\mathrm{Accuracy} = \frac{TP + TN}{P + N}$$

and yields the proportion of data points that were correctly classified. Other performance measures include sensitivity, precision, specificity, and negative predictive value (NPV). The first two are given by

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad \mathrm{Precision} = \frac{TP}{TP + FP}.$$

Sensitivity provides a measure of what proportion of positive cases were classified correctly as positive, while precision provides a measure of what proportion of cases classified as positive were positive in reality. Sensitivity is especially important in medical applications as it is critical to correctly identify all true positive cases [52]. Specificity and NPV are also known as inverse recall and inverse precision, as they provide similar information for negative classifications. Specificity describes the proportion of negative cases that were classified as negative and NPV describes the proportion of cases classified as negative that were negative in reality. They are computed as

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad \mathrm{NPV} = \frac{TN}{TN + FN}.$$

The F1 score provides the harmonic mean of precision and sensitivity and is given by

$$F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}.$$

Other performance measures occasionally seen include the false positive rate (FPR), also known as fallout, and the false negative rate (FNR). These measures are calculated as

$$\mathrm{Fallout} = \frac{FP}{FP + TN} \qquad \mathrm{FNR} = \frac{FN}{TP + FN}.$$

Figure 2.1: PPC Between the Client and Model Owner

These calculate the proportion of negative values which are wrongly classified as positives and the proportion of positive values which are wrongly classified as negatives.
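As a concrete companion to the formulas above, the short Python sketch below (added here, not part of the original text) computes each measure from raw confusion-matrix counts; the function name and example counts are illustrative only.

```python
# Illustrative sketch: the performance measures defined above, computed
# from confusion-matrix counts. Example values are made up.

def performance_measures(tp, tn, fp, fn):
    p, n = tp + fn, tn + fp                  # total positives and negatives
    return {
        "accuracy":    (tp + tn) / (p + n),
        "sensitivity": tp / (tp + fn),       # true positive rate (recall)
        "precision":   tp / (tp + fp),
        "specificity": tn / (tn + fp),       # inverse recall
        "npv":         tn / (tn + fn),       # inverse precision
        "f1":          2 * tp / (2 * tp + fp + fn),
        "fallout":     fp / (fp + tn),       # false positive rate
        "fnr":         fn / (tp + fn),       # false negative rate
    }

# Example: 90 true positives, 50 true negatives, 5 false positives,
# 10 false negatives.
print(performance_measures(tp=90, tn=50, fp=5, fn=10))
```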

2.2 Privacy-Preserving Classification

We wish to specifically look at machine learning applications in the medical field. Privacy restrictions such as the Health Insurance Portability and Accountability Act (HIPAA) necessitate the development of private classification algorithms. Privacy-preserving classification describes the collection of efficient methods for privately performing the classification stage of a machine learning algorithm [6], while privacy-preserving data-mining describes the task of training a model entirely over encrypted data [1]. Privacy-preserving classification is the focus of this work. Introductory background on privacy-preserving data-mining is available in the literature review in Chapter 3.

During privacy-preserving classification, Clients classify their data vectors using a model owned by the Model Owner. Each party would like to keep their information private – Clients do not want the Model Owner to learn any partial information about their data vectors, and the Model Owner does not want the Clients to learn any unnecessary information about her model. For example, the Model Owner could be an institution that used its own collected data to create a model. The Clients could then be clinicians who use the privacy-preserving classification in order to assist with the treatment of their patients. The Model Owner may or may not wish to delegate computation to a cloud service provider, which we will call the Server. Figure 2.1 shows the outline of the protocol carried out between the Client and the Model Owner. Figure 2.2 shows the protocol as carried out between the Client, Model Owner, and an intermediary Server.

Figure 2.2: PPC with Computation on Server

2.2.1 Model

The discussion that follows is based on the notation and presentations given by Hastie [39] and Bost et al. [6]. Throughout this work, notation is as follows: the Client has data in the form of $d$-dimensional vectors $X = (X_1, X_2, \ldots, X_d)$. $X$ is called a feature vector, while each $X_i$ in $X$ is called a feature. The Client wishes to classify his data using a classification function $f$, known only by the Model Owner, into a set of discrete classes $\mathcal{G}$. Observed classes will be denoted $G$ whereas predicted classes are denoted with a hat, $\hat{G}$. A large amount of training data is used to construct a classifier, say $N$ inputs, each written as a feature vector-class pair $(X, G_i)$.

Figure 2.3: The SIMD Paradigm

2.3 Parallelization via SIMD

The Single-Instruction Multiple-Data (SIMD) paradigm is a class of parallel computers. SIMD allows for computation of multiple values under a single instruction. Figure 2.3 shows the general concept, where multiple inputs are encoded within a single vector. A single instruction operates on this vector, and decoding the output vector yields the output of the instruction on each of the individual inputs. A common example is that of image processing, where a filter is applied to every pixel in an image [18].
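The following minimal Python sketch (an illustration added here, with plain NumPy arrays standing in for ciphertext "slots") shows the batching idea: several independent inputs are packed into one vector and transformed by a single vectorized instruction.

```python
import numpy as np

# Minimal SIMD illustration: five independent values packed into one
# vector; one vectorized instruction computes all five results at once.
inputs = np.array([3, 1, 4, 1, 5])
weights = np.array([2, 2, 2, 2, 2])

packed_result = inputs * weights + 1   # one instruction, five computations
print(packed_result)                   # [ 7  3  9  3 11] -> decode slot-wise
```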

2.4 Fully Homomorphic Encryption

One major approach to the task of privacy-preserving classification uses fully homomorphic encryption (FHE). Homomorphic encryption was first conceptualized in 1978 by Rivest, Adleman, and Dertouzos, envisioned originally as a 'privacy transformation' which would enable computation of functions on encrypted data without first having to decrypt the information [57]. This concept is known today as a homomorphic encryption scheme, informally defined as allowing for computation of functions over encrypted data. A homomorphic encryption scheme is called fully homomorphic if it allows for computation of arbitrary functions over encrypted data. The algorithms that comprise a public-key homomorphic encryption scheme $\mathcal{E}$ are defined as follows:

I. $(pk, sk) = \mathrm{KeyGen}_{\mathcal{E}}(n)$, the key generation algorithm, which distributes public and private keys $pk$ and $sk$ (respectively) to all necessary parties given some security parameter $n$.

II. $c = \mathrm{Encrypt}_{\mathcal{E}}(m, pk)$, the encryption algorithm, which takes as input $pk$ as well as a message $m$. The output is a ciphertext $c$.

III. $m = \mathrm{Decrypt}_{\mathcal{E}}(c, sk)$, the decryption algorithm, which uses the private key(s) $sk$ to recover the plaintext $m$ given a ciphertext $c$.

As $\mathcal{E}$ is a homomorphic public-key encryption system, it is able to carry out computations over some set of circuits $\mathcal{C}$ by utilizing an additional algorithm:

IV. $c = \mathrm{Evaluate}_{\mathcal{E}}(c_1, c_2, \ldots, c_t, C, pk)$, an algorithm to perform computation over encrypted data. The input to this function is a collection of ciphertexts $c_1, c_2, \ldots, c_t$, a circuit $C \in \mathcal{C}$, and the public key(s) from the key generation algorithm.

Formally, the encryption scheme $\mathcal{E}$ is called homomorphic on $\mathcal{C}$ if it is correct on $\mathcal{C}$ and the decryption algorithm can be expressed as a circuit of size $\mathrm{poly}(n)$ [27]. The scheme is called fully homomorphic if $\mathcal{C}$ is the set of arbitrary circuits. Because a Boolean circuit can describe arbitrary computations, a scheme only needs to be homomorphic over addition and multiplication to be described as fully homomorphic [50]. A scheme $\mathcal{E}$ is called additively homomorphic if

$$\mathrm{Encrypt}_{\mathcal{E}}(x + y) = \mathrm{Encrypt}_{\mathcal{E}}(x) \oplus \mathrm{Encrypt}_{\mathcal{E}}(y)$$

for some operation $\oplus$ in the ciphertext space. Similarly, a scheme is called multiplicatively homomorphic if

$$\mathrm{Encrypt}_{\mathcal{E}}(x \cdot y) = \mathrm{Encrypt}_{\mathcal{E}}(x) \otimes \mathrm{Encrypt}_{\mathcal{E}}(y)$$

for an operation $\otimes$ in the ciphertext space. This functionality is shown in Figure 2.4. Two plaintexts that are first encrypted, added (or multiplied), and then decrypted yield the same result as adding (or multiplying) over the original plaintexts. While computation of arbitrary functions is theoretically possible using addition and multiplication as described above, this in itself is not sufficient for practical computation of arbitrary functions. Any fully homomorphic encryption scheme that is suited for practical use will only be able to compute polynomial functions and polynomial approximations of functions, known as polynomial machine learning [35].

Figure 2.4: Homomorphic addition (top) and multiplication (bottom)
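As a toy illustration of the multiplicative identity (an example added here, not a scheme from this dissertation), textbook unpadded RSA satisfies it: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. The tiny parameters below are insecure and purely for demonstration.

```python
# Toy multiplicatively homomorphic scheme: textbook (unpadded) RSA with
# tiny insecure parameters. Enc(x) * Enc(y) mod N decrypts to x * y.

p, q = 61, 53
N, e, d = p * q, 17, 413        # e*d = 1 mod lcm(p-1, q-1) = 780

def encrypt(m): return pow(m, e, N)
def decrypt(c): return pow(c, d, N)

x, y = 7, 12
cx, cy = encrypt(x), encrypt(y)
assert decrypt(cx * cy % N) == (x * y) % N   # Enc(x) ⊗ Enc(y) = Enc(x·y)
```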

2.4.1 Private-Key Fully Homomorphic Encryption

The majority of previous work focuses on public-key fully homomorphic encryption. This work implements private-key fully homomorphic encryption. A private-key cryptosystem

generates only one key, the secret key, during the KeyGen algorithm. This private key is used for both encryption of plaintexts and decryption of ciphertexts. A private-key homomorphic encryption scheme $\mathcal{E}$ over a set of circuits $\mathcal{C}$ is defined via the following algorithms:

I. $sk = \mathrm{KeyGen}_{\mathcal{E}}(n)$, the key generation algorithm, which distributes a private key $sk$ to the necessary party given some security parameter $n$.

II. $c = \mathrm{Encrypt}_{\mathcal{E}}(m, sk)$, the encryption algorithm, which takes as input $sk$ as well as a message $m$. The output is a ciphertext $c$.

III. $m = \mathrm{Decrypt}_{\mathcal{E}}(c, sk)$, the decryption algorithm, which uses the private key $sk$ to recover the plaintext $m$ given a ciphertext $c$.

IV. $c = \mathrm{Evaluate}_{\mathcal{E}}(c_1, c_2, \ldots, c_t, C)$, an algorithm to perform computation over encrypted data. The input to this function is a collection of ciphertexts $c_1, c_2, \ldots, c_t$ and a circuit $C \in \mathcal{C}$.

Encryption and decryption can only be carried out by keyholder(s), while evaluation can be performed by any party possessing a ciphertext. As before, a private-key homomorphic encryption scheme is called fully homomorphic if $\mathcal{C}$ is the set of arbitrary circuits, and arbitrary computation can be reduced via Boolean circuits to homomorphic addition and multiplication.
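The four algorithms above can be summarized as an interface; the Python sketch below is a hypothetical rendering added here for clarity and does not correspond to any particular library's API. Note that, unlike the public-key definition, Evaluate takes no key at all.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Sequence

# Hedged sketch of the private-key homomorphic interface defined above.
class PrivateKeyHE(ABC):
    @abstractmethod
    def keygen(self, n: int) -> Any: ...            # sk = KeyGen(n)

    @abstractmethod
    def encrypt(self, m: Any, sk: Any) -> Any: ...  # c = Encrypt(m, sk)

    @abstractmethod
    def decrypt(self, c: Any, sk: Any) -> Any: ...  # m = Decrypt(c, sk)

    @abstractmethod
    def evaluate(self, cs: Sequence[Any],
                 circuit: Callable[..., Any]) -> Any: ...  # no key needed
```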

2.4.2 Leveled, Somewhat, and Partially Homomorphic Encryption

A scheme that is homomorphic over one operation is called partially homomorphic. A partially homomorphic scheme can be additively homomorphic or multiplicatively homomorphic. Partially homomorphic encryption schemes have existed for some time, such as the multiplicatively homomorphic ElGamal scheme [23] and the additively homomorphic Paillier scheme [48].

Schemes that can perform homomorphic addition and multiplication over encrypted data up to some computational limit are called somewhat homomorphic encryption (SHE) schemes. These schemes contain noise terms, which grow exponentially during homomorphic addition and multiplication operations, and correct decryption is not possible once these noise terms exceed a noise bound. The first FHE schemes were constructed from SHE schemes using a technique called bootstrapping, which manages this noise, discussed in detail in Chapter 3.

A leveled homomorphic encryption (LHE) scheme is a scheme in which noise growth is polynomial in the homomorphic multiplication operation. Therefore, these schemes can be implemented to perform FHE up to some pre-specified depth without bootstrapping. Specific LHE schemes are discussed in Chapter 3.

2.4.3 Notation

For the remainder of this work, let $\llbracket x \rrbracket_{\mathcal{S}}$ denote the encryption of a plaintext $x$ within an encryption scheme $\mathcal{S}$. For brevity of notation, the subscript is omitted for values encrypted under the Gribov-Kahrobaei-Shpilrain (GKS) scheme. The fully homomorphic addition and multiplication operations are denoted by $\llbracket x + y \rrbracket = \llbracket x \rrbracket \oplus \llbracket y \rrbracket$ and $\llbracket x \cdot y \rrbracket = \llbracket x \rrbracket \otimes \llbracket y \rrbracket$, respectively. Let $a \xleftarrow{\$} A$ denote the selection of a value $a$ from a set $A$ uniformly at random.

Chapter 3

Background

This chapter provides an overview of current methods in the fields of privacy-preserving classification, machine learning, and computational medicine. These methods include differential privacy, fully homomorphic encryption, and various non-fully homomorphic encryption methods.

Leveled homomorphic encryption schemes have found some success with classification algorithms. The Yet Another Somewhat Homomorphic Encryption scheme (YASHE) was used in an application of neural networks to encrypted data called CryptoNets [33], and has been implemented in the Simple Encrypted Arithmetic Library (SEAL) as well, with bioinformatics computation in mind [21]. ML Confidential used leveled homomorphic encryption (LHE) to run classification using Linear Means and Fisher's Linear Discriminant classifiers [35].

Some private classification methods do not implement homomorphic encryption. Differential privacy is one non-cryptographic approach that has been implemented, although it lacks the utility of FHE schemes [33]. Non-fully homomorphic cryptographic methods have been used to apply Naive Bayes and decision tree classification [5]. Vaidya et al. describe protocols for constructing support vector machine (SVM) models using horizontally, vertically, or arbitrarily partitioned data while maintaining the privacy of their data without FHE [63].

This chapter begins with a discussion of differential privacy and privacy-preserving classification techniques. The chapter concludes with an overview of the history of public-key and private-key fully homomorphic encryption methods as well as the current techniques.

3.1 Differential Privacy

Differential privacy is a non-cryptographic approach to private data mining that has seen some success with training various classification algorithms [22]. Differential privacy is a method of security that applies to databases. Intuitively, if there are two databases that differ on only one row, a query satisfies differential privacy if there is a very high probability that the query will produce the same result regardless of which database is queried. This method’s security depends upon having a large database. This method loses its utility when the goal is to classify a single data point [33]. One example of differential privacy for data analysis is given by Wang, Mohammed, and Chen, who present a method for distributing genetic data that satisfies differential privacy [66]. The authors provide the following formal definition of differential privacy.

Definition 3.1.1. Let $\mathcal{A}$ be a randomized algorithm and let $D$ and $D'$ be two databases such that $|D \,\Delta\, D'| \leq 1$.

In other words, there is at most one row in $D'$ that does not appear in $D$, or vice versa. Say that $\mathcal{A}$ is $\epsilon$-differentially private if for all possible anonymized data sets $\hat{D}$,

$$\Pr[\mathcal{A}(D) = \hat{D}] \leq e^{\epsilon} \times \Pr[\mathcal{A}(D') = \hat{D}].$$

In sum, one effectively cannot tell from looking at the output of $\mathcal{A}$ on $D$ and on $D'$ whether or not one specific line of data was included in the data set. This is often achieved by adding in random noise. The results of a query are disguised by giving an answer that is not quite exact, but is "close enough" for the intended analyses.

The goal of differential privacy, while similar to that of encryption, is different in key ways. The goal of encrypting information is to hide it completely. With differential privacy, the goal is not to hide data but rather to anonymize it. Furthermore, in the setting explored in this work, the goal is to hide not only the database from the user, but also to hide the query from the database holder. Differential privacy only seeks to perform the former function.
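A standard way to realize the noisy-answer idea is the Laplace mechanism; the sketch below (an added illustration, not taken from [66]) releases a counting query with Laplace noise of scale $1/\epsilon$, which satisfies $\epsilon$-differential privacy because adding or removing one row changes the true count by at most one.

```python
import numpy as np

rng = np.random.default_rng()

def noisy_count(database, predicate, epsilon):
    # Counting queries have sensitivity 1, so Laplace noise of scale
    # 1/epsilon yields epsilon-differential privacy.
    true_count = sum(1 for row in database if predicate(row))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

db = [{"age": 34, "cancer": True}, {"age": 51, "cancer": False},
      {"age": 29, "cancer": True}]
print(noisy_count(db, lambda r: r["cancer"], epsilon=0.5))
```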

3.2 Privacy-preserving Classification

3.2.1 ML Confidential

Graepel, Lauter, and Naehrig suggest a framework for private training and classification via machine learning that they title ML Confidential [35]. This protocol is proposed for both the training and classification phases of machine learning and is carried out entirely over encrypted data using an LHE or SHE scheme. The authors describe a protocol that operates over a class of machine learning algorithms that they designate polynomial learning. A proof-of-concept is given for the classification phase of machine learning.

ML Confidential operates between three parties. There is the Data Owner, who holds the data to be processed, as well as the Content Provider, which uploads data to the Cloud Service Provider on the Data Owner's behalf. The Data Owner wishes to perform both the training phase, ML.Train, and classification phase, ML.Classify, of some machine learning algorithm on the cloud without giving the cloud access to their data. The protocol may be either private key or public key. Furthermore, the homomorphic encryption scheme used may be either a fully homomorphic, somewhat homomorphic, or leveled homomorphic encryption scheme. The algorithms provided by the protocol are

• HE.Keygen for generation of (public or private) keys,

• HE.Enc for (somewhat/leveled/fully) homomorphic encryption,

• HE.Dec, the corresponding decryption algorithm,

• HE.Eval for homomorphic computation, which utilizes HE.Add for homomorphic addition and HE.Mult for homomorphic multiplication.

Note that the specifics regarding the functionality of these functions depend upon whether the scheme is private or public key as well as the method of homomorphic encryption utilized. In particular, the authors provide examples of a leveled homomorphic encryption scheme in which HE.Eval computes polynomial functions of a bounded degree.

ML Confidential Protocol, Private Key Version

• Key Generation: The Data Owner runs HE.Keygen to generate a private key sk, securely stored locally, and shares this key with the Content Provider.

• Encryption & Upload, Training Data: For all training vectors x, the Content Provider sends HE.Enc(sk, x) to the Cloud Service Provider.

• Training: The algorithm HE.Eval runs the training phase ML.Train on the encrypted training vectors. This computes an encrypted Learned Model that is stored by the Cloud and available to the Data Owner.

• Classification: Next, a previously unused vector x is encrypted and HE.Enc(sk, x) is sent to the cloud, which carries out ML.Classify and returns the encrypted classification to the data owner.

• Verification: The Data Owner tests the encrypted Learned Model probabilistically by sending encryptions of test vectors to the Cloud and verifying that they are returned with the correct classifications.

For the public key version, only the first algorithm is significantly modified:

ML Confidential Protocol, Public Key Version

• Key Generation: The Data Owner runs HE.Keygen to generate a private key sk, securely stored locally, and a public key pk. It publishes the public key pk.

The algorithm HE.Encrypt uses the public key, pk, while HE.Decrypt can only be carried out by the Data Owner using the secret key, sk. This protocol allows a diverse range of sources to provide data while all computation takes place on the cloud. Its security model is designed for a cloud that is honest-but-curious, meaning it will look at the available data but will not deviate from the set protocol. The authors point out this is a reasonable assumption for any commercial cloud service, as once the Cloud's reputation is damaged it will not be able to acquire new clients, and hence it has strong motivation to behave honestly. The authors describe their verification step as a naive version of Proof-of-Storage protocols. The Data Owner must store and test enough samples to either determine the test error of the Cloud or determine the location of any accidental error, but has no reason to suspect that the Cloud is purposefully manipulating their data in any way. Furthermore, the Cloud gains access to a certain amount of information during this protocol. The Cloud must learn the number of vectors trained upon, the number of vectors tested, the number of vectors in each class, and an upper bound on the number of entries in each class. Next, the authors define a polynomial learning algorithm as follows:

Definition 3.2.1. Let $A : (\mathbb{R}^n \times \mathcal{Y})^m \times \mathbb{R}^n \to \mathcal{Y}$ be a learning algorithm that takes a training sample in $(\mathbb{R}^n \times \mathcal{Y})^m$ and a test input $x \in \mathbb{R}^n$ and returns a prediction $y \in \mathcal{Y}$. Call the learning algorithm $D$-polynomial if the function $A$ is polynomial of degree at most $D$ in all of its arguments.

The authors describe a leveled homomorphic encryption scheme based on the Brakerski-Gentry-Vaikuntanathan (BGV) scheme [11], discussed in detail in Section 3.5.2. Its security is based on the Ring Learning With Errors (RLWE) problem, which provides strong hardness guarantees [46]. The scheme involves a noise term that grows during homomorphic operations. This scheme can only compute $D$-polynomial functions. Any other function results in noise growth that obstructs decryption.

This scheme was used to perform binary classification with inputs in $\mathbb{R}^n$. The authors test linearizations of the linear means classifier and Fisher's linear discriminant classifier on publicly available breast cancer data using a public-key SHE scheme based on ring-LWE. Because the cryptosystem described is unable to perform any computations that are not $D$-polynomial, this method cannot perform many common machine learning algorithms, including the perceptron, support vector machines, $k$-nearest neighbors, decision trees, exact logistic regression, and more.
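To make the $D$-polynomial restriction concrete, the sketch below (added here; function and variable names are illustrative, not code from ML Confidential) shows that the linear means classifier's decision value $w \cdot x + b$ is a degree-one polynomial in the test input, which is exactly why a scheme limited to low-degree polynomials can evaluate it over an encrypted $x$.

```python
import numpy as np

def train_linear_means(X, y):
    # Classify to the nearer class mean; the decision boundary reduces to
    # sign(w . x + b), a 1-polynomial in the test input x.
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    w = mu1 - mu0
    b = (mu0 @ mu0 - mu1 @ mu1) / 2.0
    return w, b

def score(w, b, x):
    return w @ x + b          # the only computation needed at test time

X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]])
y = np.array([0, 0, 1, 1])
w, b = train_linear_means(X, y)
print(score(w, b, np.array([0.8, 0.9])) > 0)   # True -> class 1
```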

3.2.2 The Simple Encrypted Arithmetic Library (SEAL)

The Simple Encrypted Arithmetic Library (SEAL) was developed by researchers in the Cryptography Research Group at Microsoft Research [21]. It is a homomorphic encryption library that was made specifically with bioinformatics research in mind. SEAL uses LHE with parameters chosen to perform a predetermined number of computations. Initial implementations of SEAL used a variant of the YASHE scheme described in [5]. The most recent version [43] implements the "FullRNS" variant [4] of the Fan-Vercauteren somewhat homomorphic scheme [25].

The authors describe methods in which their software can be used for various biomedical applications. Possible applications mentioned are computing minor allele frequencies, $\chi^2$-statistics, and tests for association of a genotype with disease, among others.

Because leveled homomorphic encryption can only evaluate computations of bounded depth, it can reduce the cost of computation to evaluate an approximation in place of a more costly function. An example provided by the authors is that of logistic regression. The authors approximate the logistic regression function using polynomial approximations, and hence are able to use this approximation with SEAL.
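The sketch below (an added illustration; the degree and fitting interval are arbitrary choices of this example, not SEAL's) shows the substitution: fit a low-degree polynomial to the logistic function on a bounded interval and evaluate the polynomial in its place.

```python
import numpy as np

# Fit a cubic polynomial to the sigmoid on [-4, 4]; a leveled HE scheme
# restricted to low-degree polynomials can then evaluate the cubic.
z = np.linspace(-4, 4, 200)
sigmoid = 1.0 / (1.0 + np.exp(-z))

coeffs = np.polyfit(z, sigmoid, deg=3)   # least-squares cubic fit
poly = np.poly1d(coeffs)

print(poly(1.0), 1.0 / (1.0 + np.exp(-1.0)))  # approximation vs. truth
print(np.max(np.abs(poly(z) - sigmoid)))      # worst-case error on [-4, 4]
```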

3.2.3 SEAL for Classification via Neural Networks

Dowlin et al. provide a methodology they term CryptoNets in order to carry out private classification over neural networks. They use the YASHE leveled homomorphic encryption scheme [5], implemented via SEAL [21], to realize their protocol. A neural network consists of layers containing nodes that compute functions over values fed from the previous layer. These functions are often not polynomial functions, making the direct application of homomorphic encryption infeasible. Therefore, the authors describe a class of polynomial functions that can be implemented at each layer during the classification stage instead. Specifically, the authors include the sigmoid function in their network, which computes

$$z \mapsto \frac{1}{1 + \exp(-z)}$$

for a value $z$ from a node in the previous layer. The sigmoid function is used in the final layer of the authors' training network. During the testing stage, this step was removed from the network altogether, as it is a monotone function and hence does not affect the final prediction once training is complete. The authors replace the more commonly used rectified linear activation function $z \mapsto \max\{0, z\}$ with the square activation function $z \mapsto z^2$ in both the training and testing stages.

Max pooling is another common layer seen in neural networks, which computes the maximum value over a subset of components from the previous layer. The authors replace the max pooling layer in the classification stage with a polynomial approximation by simply taking the sum of the components instead. As a result, the output in this layer is scaled by some factor, and this scaling propagates to subsequent layers. The authors use this scaled mean-pool function in their network in both the training and testing stages instead of max pooling layers.

The authors train a sample network on images of 60,000 handwritten digits, then test on the 10,000 remaining images. They achieve an accuracy rate of 99%.
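The toy NumPy fragment below (added for illustration; shapes and values are arbitrary) contrasts the standard layers with the polynomial substitutes just described.

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0, -0.5])

relu = np.maximum(0, x)          # standard activation: z -> max{0, z}
square = x ** 2                  # CryptoNets substitute: z -> z^2

window = x.reshape(2, 2)
max_pool = window.max(axis=1)    # standard pooling
sum_pool = window.sum(axis=1)    # polynomial substitute: scaled mean-pool
                                 # (sum = mean * window size; the scaling
                                 # simply propagates to later layers)
print(square, sum_pool)
```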

3.2.4 Other Cryptographic Methods

Bost et al. construct protocols for privacy-preserving classification using hyperplane decision, Naive Bayes, and decision tree classifiers [6]. The protocols are constructed using two additively homomorphic encryption schemes and one leveled homomorphic encryption scheme, HElib [59]. The additively homomorphic schemes are Goldwasser and Micali's Quadratic Residuosity (QR) cryptosystem [34], as well as the Paillier cryptosystem [48]. Let $pk_{QR}, sk_{QR}$ denote a public and secret key pair in the QR cryptosystem and $pk_P, sk_P$ denote the same in Paillier's system. Let the brackets $\llbracket a \rrbracket_P$ and $\llbracket a \rrbracket_{QR}$ denote the encryption of a value $a$ under Paillier and QR, respectively. The security of these cryptosystems provides semantic security for the authors' protocols, and an honest-but-curious adversary model is used.

This subsection discusses the privacy-preserving Naive Bayes protocol presented by the authors. In order to build their privacy-preserving classification protocol, the authors first describe efficient protocols to perform comparison and argmax operations over encrypted data. They compose these operations within their schemes to construct their encrypted machine learning protocols.

The protocols for comparison and argmax are designed by the authors to work for additively homomorphic public-key cryptosystems and do not apply directly to this work. Therefore, this section discusses the authors' private Naive Bayes algorithm in full, and reserves discussion of comparison and argmax for Chapter 5.

Comparison

Consider two data holders, A and B, who wish to compare their data, as well as the classifiers, a client C and server S. The authors describe several separate cases in which a comparison protocol can be carried out. In the first scenario, A and B wish to privately compare their values a and b, respectively.

The user B randomly selects a masking bit $c$ and sends its encryption under QR, $\llbracket c \rrbracket_{QR}$, to A. The authors construct a scheme that uses a garbled circuit combined with oblivious transfer to allow A to compute $(a < b) \oplus c$. Because A also knows the value $\llbracket c \rrbracket_{QR}$, A can use the additively homomorphic property of QR to compute $\llbracket a < b \rrbracket_{QR}$.

The next cases discussed by the authors involve comparison on encrypted inputs. User A has two encrypted inputs $\llbracket a \rrbracket$ and $\llbracket b \rrbracket$ that she would like to compare, and user B has the decryption key. The users construct a method using a modification of the protocol designed by Veugen [64]. The last case of comparison the authors consider again has user A with two encrypted inputs. This one proceeds as the last case, but reversed – thus B is left with the result of the comparison while A is the one who holds the (encrypted) data.

argmax

The authors consider private computation of the argmax function, which returns the argument (position) of the maximum value in a vector. User A has access to values $\llbracket a_1 \rrbracket, \ldots, \llbracket a_k \rrbracket$, encrypted under user B's secret key. A wants B to learn the index of the largest of the $k$ values without learning any other information, including other information on the ordering of the values.

User A begins by randomizing the order of the $k$ elements using a random permutation $\pi$

and sets the maximum index value equal to $a_{\pi(1)}$. The idea is to iterate through each value in the permutation, comparing $\llbracket a_{\pi(1)} \rrbracket$ to $\llbracket a_{\pi(2)} \rrbracket$ and setting the maximum value equal to the index of whichever is larger. Note that this is, in fact, a sequence of $k$ comparisons. The authors begin each comparison by running the previously described privacy-preserving comparison protocol. This alone is not sufficient for the privacy goal, because user A will learn the ordering of the permuted inputs. Therefore, B must implement another randomization step, which the authors call Refresh. The Refresh procedure is carried out by the Paillier system's method for randomization of ciphertexts. This is still not sufficient, because user B holds the secret key for the Paillier system.

Therefore, instead of B performing Refresh on $\llbracket a_{\pi(i)} \rrbracket$, the authors use the additively homomorphic property of the Paillier scheme. User A sends B the value $\llbracket a_i' \rrbracket$, which is $\llbracket a_{\pi(i)} \rrbracket$ perturbed by A adding random noise. User B then performs the Refresh procedure on $\llbracket a_i' \rrbracket$.
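The plaintext mock below (added here; the encryption, the pairwise comparison subprotocol, and the Refresh/noise steps are all elided) shows the skeleton of this argmax protocol: permute, compare pairwise, and un-permute only the winning index.

```python
import random

def private_argmax_mock(values):
    # Plaintext stand-in for the permute-and-compare structure above.
    k = len(values)
    pi = list(range(k))
    random.shuffle(pi)                        # A's random permutation
    best = 0                                  # index into the permuted order
    for i in range(1, k):                     # chain of pairwise comparisons,
        if values[pi[i]] > values[pi[best]]:  # done obliviously (comparison
            best = i                          # protocol + Refresh) for real
    return pi[best]                           # un-permute the final index

print(private_argmax_mock([0.1, 0.7, 0.2]))   # -> 1
```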

Private Naive Bayes

The authors implement the above protocols to carry out private Naive Bayes classification. Let $\mathcal{G}$ denote the set of classes and assume there is a finite number of values each attribute can take. Let $P$ denote the vector of class probabilities, where $P_i = \Pr(G_i)$, and let $T$ denote the collection of tables given by $T_{i,j}(X) = \Pr(X = X_i \mid G = G_i)$. Assume each vector $X$ has $d$ features, and that there are $c$ possible classes. The authors' private Naive Bayes protocol runs as follows.

1: The service provider encrypts the tables P and T using Paillier.

2: The server sends the encrypted tables to the client. CHAPTER 3. BACKGROUND 25

3: The client computes $\llbracket p_i \rrbracket = \llbracket P_i \rrbracket \prod_{j=1}^{d} \llbracket T_{i,j}(x_j) \rrbracket$ for $i = 1, \ldots, c$.

4: The client uses the server to compute $i = \operatorname{argmax}_i p_i$.

5: Client outputs i.

Note that Step 3 of the above protocol requires only that the encryption scheme be multiplicatively homomorphic. It is Step 4 that would ultimately require an additively homomorphic scheme. The authors in [6] use multiple encryption methods, combined with an algorithm for changing the encryption scheme, in order to compute the argmax.
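For reference, the quantity computed in Steps 3 and 4 looks as follows in the clear (an added plaintext sketch with made-up toy tables; in the actual protocol these operations are performed over ciphertexts, and Bost et al. typically work with log-probabilities so that the products become homomorphic additions under Paillier).

```python
import numpy as np

P = np.array([0.6, 0.4])                # class priors, c = 2
T = np.array([                          # T[i][j][v] = Pr(X_j = v | G_i)
    [[0.9, 0.1], [0.7, 0.3]],           # class 0, d = 2 binary features
    [[0.2, 0.8], [0.4, 0.6]],           # class 1
])

x = [1, 0]                              # the Client's feature vector
scores = [P[i] * np.prod([T[i][j][x[j]] for j in range(len(x))])
          for i in range(len(P))]
print(int(np.argmax(scores)))           # the classification the Client learns
```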

Private Polynomial Decision Tree

The authors describe a protocol by which a user can classify her data point using a binary decision tree without giving away any information about her data, while also not learning any information about the path her data point took on the tree. To achieve this, the authors use the polynomial representation of the decision tree as their learned model and describe an algorithm, which uses both the QR scheme and a public-key FHE scheme, to classify a data point based on this representation. Specifically, the authors used the FHE scheme in HElib [59] to run their protocol.

3.3 Fully Homomorphic Encryption Schemes

The study of fully homomorphic encryption schemes has proceeded in three phases, described by Peikert in A Decade of Lattice Cryptography as follows [50]. The first phase consisted of the first publications of groundbreaking, yet impractical, FHE schemes, beginning with Gentry's seminal thesis [27]. From there, the second wave of FHE began with a rapid series of efficiency improvements, including bootstrapping and the development of SHE and LHE schemes. The third generation of FHE began around 2013, when practical improvements simplified and raised the efficiency of FHE schemes [2, 9, 32]. The following sections provide an overview of the major breakthroughs throughout these three waves as well as the security assumptions underlying the schemes.

3.4 The First FHE Schemes

Gentry's thesis described the first fully homomorphic encryption scheme [27]. While Gentry's seminal work was not a practical scheme in terms of applications, it laid the foundations for a wealth of future work in public-key FHE. Subsequent works follow his method of first constructing an SHE scheme and then bootstrapping this scheme into an FHE scheme. Improvements on this original framework occurred rapidly in the years following this first publication. While improvements upon Gentry's initial scheme provide some schemes that are conceptually simpler, one common thread among public-key FHE schemes is the time and difficulty it takes to describe them. Therefore, this chapter provides an overview of the concepts of these public-key FHE schemes and leaves the details to the referenced papers.

Gentry's first FHE scheme is built around a mathematical construct called a lattice. A lattice is a discrete subgroup of $\mathbb{R}^n$ [50]. More specifically, let $\mathcal{L}$ be a subset of $\mathbb{R}^n$, and let $0$ denote the zero vector in $\mathbb{R}^n$. Say that $\mathcal{L}$ is a subgroup of $\mathbb{R}^n$ if $0 \in \mathcal{L}$, and $-x \in \mathcal{L}$ and $x + y \in \mathcal{L}$ for every $x, y \in \mathcal{L}$. This subgroup is called discrete if for every element $x \in \mathcal{L}$ there exists some neighborhood $N_x \subset \mathbb{R}^n$ such that $N_x \cap \mathcal{L} = \{x\}$. For instance, the integer vectors $\mathbb{Z}^n$ form a discrete subgroup of $\mathbb{R}^n$, hence $\mathbb{Z}^n$ forms an $n$-lattice known as the integer lattice.

3.4.1 Gentry’s Fully Homomorphic Encryption

Gentry's seminal work proceeded in three main steps. First, a somewhat homomorphic encryption scheme was constructed that is able to compute only a limited class of functions homomorphically. From here, a bootstrapping method was developed in order to enable arbitrary computation based on the limited class of functions in the previous step. Finally, bootstrapping is applied to the somewhat homomorphic encryption scheme in order to make it fully homomorphic by "squash(ing) the decryption circuits" [27].

What follows is an overview of Gentry's original scheme, which omits some of the finer details and proofs. Gentry's concept of "bootstrapping" is described first, followed by an overview of his original somewhat homomorphic scheme and the bootstrapping procedure applied to it.

Consider a ring $R = \mathbb{Z}[x]/(f(x))$ for monic degree-$n$ polynomials $f(x) \in \mathbb{Z}[x]$. This means $f(x)$ is of the form

$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$$

with coefficients $a_i \in \mathbb{Z}$. Gentry describes that elements $v \in R$ can be thought of as (coefficient) vectors $v \in \mathbb{Z}^n$, and the ideal generated by $v$ yields the ideal lattice $(v)$ generated by $\{v \times x^i \bmod f(x) : i \in [0, n-1]\}$.
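As an added illustration of this construction, the sketch below computes such a generating set (often called the rotation basis) of the ideal lattice $(v)$ for the common choice $f(x) = x^n + 1$; that choice of $f$ is an assumption made here for simplicity.

```python
import numpy as np

def rotation_basis(v):
    # Basis of the ideal lattice (v) in Z[x]/(x^n + 1): the coefficient
    # vectors of v * x^i mod x^n + 1 for i = 0, ..., n-1.
    n = len(v)
    basis, row = [], list(v)
    for _ in range(n):
        basis.append(list(row))
        # multiply by x mod x^n + 1: shift right, negate the wrapped term
        row = [-row[-1]] + row[:-1]
    return np.array(basis)

print(rotation_basis([1, 2, 0, 3]))   # basis of the ideal lattice (v), n = 4
```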

Let $I$ denote an ideal of $R$ and $B_I$ a basis of $I$. The KeyGen algorithm runs a sub-algorithm

$$\left(B_J^{pk}, B_J^{sk}\right) = \mathrm{IdealGen}(R, B_I)$$

where $B_J^{pk}$ and $B_J^{sk}$ are bases of an ideal $J$ such that $I + J = R$.

Furthermore, let $\mathrm{Samp}(x, B_I, R, B_J)$ represent an algorithm that takes samples from $x + I$.

Gentry shows that this coset has a unique representative with respect to $B_I$ that can be computed efficiently, meaning the value $x \bmod B_I$ is unique. This holds for any $x \in R$ and any ideal of $R$.

With this in mind, the encryption scheme $\mathcal{E}'$ runs as follows, where $R = \mathbb{Z}[x]/(f(x))$ and ideals correspond to lattices as above.

I. $(pk, sk) = \mathrm{KeyGen}_{\mathcal{E}'}(R, B_I)$, where $pk = (R, B_I, B_J^{pk}, \mathrm{Samp})$ and $sk = (pk, B_J^{sk})$.

II. $c = \mathrm{Encrypt}_{\mathcal{E}'}(pk, m)$, the encryption algorithm. This algorithm encrypts a plaintext $m$ by computing

$$c' = \mathrm{Samp}(m, B_I, R, B_J^{sk})$$

and returning $c = c' \bmod B_J^{pk}$.

III. $m = \mathrm{Decrypt}_{\mathcal{E}'}(sk, c)$, the decryption algorithm, which computes

$$m = (c \bmod B_J^{sk}) \bmod B_I$$

and returns $m$.

IV. $c = \mathrm{Evaluate}_{\mathcal{E}'}(pk, C, c_1, c_2)$ for a circuit $C$ and ciphertexts $c_1$ and $c_2$. This algorithm evaluates $C$ over $c_1$ and $c_2$ using the sub-algorithms

$$\mathrm{Add}(pk, c_1, c_2) = c_1 + c_2 \bmod B_J^{pk}$$
$$\mathrm{Mult}(pk, c_1, c_2) = c_1 \times c_2 \bmod B_J^{pk}$$

The class of circuits $\mathcal{C}_{\mathcal{E}'}$ that can be computed is also described by Gentry. Let $X_{Enc}$ and $X_{Dec}$ be subsets of $\mathbb{Z}^n$, where $X_{Enc}$ is the set of all samples from the coset $x + I$ and $X_{Dec} = R \bmod B_J^{sk}$. Let $B(r)$ denote the ball of radius $r$ in $R$. Then, there are values

$$r_{Enc} = \min\{r : X_{Enc} \subset B(r)\}$$
$$r_{Dec} = \max\{r : X_{Dec} \supset B(r)\}$$

t and permitted circuits are ones where an input in B(rEnc) yields an output in B(rDec). Gentry shows that this constitutes the set of circuits with depth at most

log log rDec log log n R rEnc − Mult( ) · CHAPTER 3. BACKGROUND 29

for some constant factor $n_{Mult}(R)$, where $\|u \times v\| \le n_{Mult}(R) \cdot \|u\| \cdot \|v\|$. Gentry concludes by showing that increasing the depth of the circuits that can be homomorphically evaluated requires minimizing $n_{Mult}(R)$ and $r_{Enc}$ and maximizing $r_{Dec}$.

A series of complex bootstrapping operations is carried out by Gentry in order to bring the scheme above up to a FHE scheme. Broadly, he lowers the complexity of the

decryption circuit of $\mathcal{E}^0$ by reducing the size of $r_{Dec}$ and alters the decryption algorithm to run simplified computations. Furthermore, he reduces the work the decryption algorithm must carry out by adding preprocessing steps to the encryption algorithm. This original scheme is impractical in multiple ways. First of all, it is conceptually dense and requires a large breadth of knowledge of advanced mathematics to understand. Beyond this, and more importantly, the computation time required to implement the fully homomorphic properties of this scheme is highly impractical.

3.5 Second-Generation FHE and Beyond

A number of papers were published in the years immediately following Gentry's original publication. Van Dijk, Gentry, Halevi, and Vaikuntanathan published a scheme in 2010 that avoided the average-case assumptions of lattice-based cryptography and was instead based simply on modular arithmetic [20]. As above, they started with a somewhat homomorphic scheme and applied bootstrapping to achieve a fully homomorphic scheme. The security of this scheme was based on the hardness of the approximate-GCD problem, in which one must find an integer $p$ given a set of randomly chosen integers that lie "close" to some multiple of $p$. Further work based on the approximate-GCD problem was carried out by Coron et al., who reduced the public key size of the scheme of Van Dijk et al. [17]. In another paper, Smart and Vercauteren describe a somewhat homomorphic encryption

scheme which supports SIMD (Single-Instruction Multiple-Data) operations, thus enabling a level of parallelization in FHE [60]. As before, bootstrapping methods were used to raise this scheme from SHE to FHE. Their scheme is lattice-based: the security of the scheme is based on a variant of the Bounded Distance Decoding Problem (BDDP), and the security of their bootstrapping procedure is based on the Sparse Subset Sum Problem (SSSP). BDDP seeks to find the closest lattice vector to a given vector (not necessarily in the lattice), and SSSP seeks to find a sparse subset of a larger set $A$ whose sum is equal to a given number $s$ modulo a given $N$. A number of other papers contribute to this "second generation" of public-key FHE that follows Gentry's SHE-bootstrap-FHE blueprint [7, 9, 8, 10, 17, 28, 60].

Two papers published by Zvika Brakerski and Vinod Vaikuntanathan in 2011, "Fully Homomorphic Encryption from ring-LWE and Security for Key Dependent Messages" [9] and "Efficient Fully Homomorphic Encryption from (Standard) LWE" [8], were of particular impact. These papers use the SSSP as well as LWE and ring-LWE. The results in these papers were improved in a joint work with Gentry [11]. This paper, "Fully Homomorphic Encryption Without Bootstrapping," removed the need for expensive bootstrapping procedures entirely. Although bootstrapping is available as an optimization procedure in this scheme, it is not necessary to implement it to achieve FHE.

3.5.1 The Learning With Errors Problem

A major shift seen in this second generation of schemes is the use of the learning with errors (LWE) and ring learning with errors (ring-LWE) problems as cryptographic foundations. The LWE problem was introduced by Regev [56] and is seen as ideal for cryptographic applications, as its average-case hardness matches its worst-case hardness up to a polynomial factor. Learning with errors asks us to find a vector $s \in \mathbb{Z}_q^n$ given $m$ samples $(a_i, b_i) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$, where $a_i$ is uniformly random in $\mathbb{Z}_q^n$, $e_i$ is an error term taken from some error distribution, and
$$b_i = \langle s, a_i \rangle + e_i \pmod{q}.$$

Abstractly, the goal is to recover the vector $s$ when given a set of approximate linear equations over $s$. The fact that these equations are approximate rather than exact is what leads to the presumed hardness of this problem; as noted above, this hardness holds even in the average case, an ideal situation for cryptography.

Ring-LWE is a ring-based variant of LWE introduced in [46]. Ring-LWE takes $m$ samples

$(a_i, b_i)$ from $R_q \times R_q$ for a ring $R$ of degree $n$ over $\mathbb{Z}$, where $R_q = R/qR$. Again, there is an error term $e$ taken from an error distribution $\chi$ over $R$ (typically an embedded discrete Gaussian). The ring-LWE problem asks us to distinguish whether the given $m$ samples were taken from the uniform distribution or from the ring-LWE distribution, which contains samples of the form
$$(a, b) = (a,\; s \cdot a + e \pmod{q})$$
for an unknown term $s \in R_q$. The ring-LWE problem was shown to be at least as hard as quantumly solving the shortest vector problem (SVP) [46]. SVP is a hard problem on ideal lattices whose best known solution takes exponential time, even in the quantum setting [50]. Brakerski, Gentry, and Vaikuntanathan define a further problem, the general learning with errors problem (GLWE), as a generalized and combined version of LWE and ring-LWE.

In this construction, the ring is given by $R = \mathbb{Z}[x]/(x^d + 1)$, where $d$ is a power of 2. LWE is recovered as the sub-case of GLWE in which $d = 1$.
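As a concrete illustration of the plain LWE distribution described above, the following minimal C++ sketch generates a single LWE sample for toy parameters. The parameter sizes and the rounded-Gaussian error are illustrative assumptions only, far too small for any real security.

#include <cmath>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

int main() {
    const int64_t q = 97;          // toy modulus (illustrative only)
    const std::size_t n = 8;       // toy dimension
    std::mt19937_64 rng(42);
    std::uniform_int_distribution<int64_t> uniform_q(0, q - 1);
    std::normal_distribution<double> gauss(0.0, 1.5);  // toy error distribution

    // Secret vector s, sampled uniformly here for simplicity.
    std::vector<int64_t> s(n);
    for (auto& si : s) si = uniform_q(rng);

    // One LWE sample: a uniform in Z_q^n, b = <s, a> + e (mod q).
    std::vector<int64_t> a(n);
    for (auto& ai : a) ai = uniform_q(rng);
    int64_t e = std::llround(gauss(rng));
    int64_t b = 0;
    for (std::size_t i = 0; i < n; ++i) b = (b + s[i] * a[i]) % q;
    b = ((b + e) % q + q) % q;  // reduce, handling a possibly negative error

    std::cout << "b = " << b << std::endl;
    return 0;
}

Recovering s from many such pairs (a, b) is exactly the search-LWE problem; the error term e is what makes the system of equations only approximate.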

3.5.2 The Brakerski-Gentry-Vaikuntanathan (BGV) Scheme

The Brakerski-Gentry-Vaikuntanathan (BGV) scheme is a ring-based scheme based on the assumed infeasibility of the GLWE problem [11]. It uses either the ring $R = \mathbb{Z}$ or $R = \mathbb{Z}[x]/(x^d + 1)$, where $d$ is a power of 2, as a platform. The BGV scheme proved to be a major step forward from Gentry's original scheme. It provided a reduction in noise growth via a technique called modulus switching, smaller key sizes, and the introduction of SIMD parallelization to FHE. This section provides an overview of the algorithms in the authors' basic scheme in the ring-LWE case, leaving out some of the more technical mathematical details.

I. $(params, pk, sk) = \mathrm{KeyGen}(1^\lambda, 2^L)$, the key generation algorithm over security parameters $\lambda$ and $L$. This key generation algorithm contains two sub-algorithms, $\{params\} = \mathrm{Setup}(1^\lambda, 1^L)$ and $(sk, pk) = \mathrm{KeyGen}(\{params\})$. Defining these algorithms in detail is very technical, and details on parameter and key construction can be found in the original paper [11]. It suffices to note that the private key consists of $L$ vectors $s_1, \ldots, s_L \in R_q^{n+1}$ and the public key consists of $L$ matrices $A_1, \ldots, A_L \in R_q^{N \times (n+1)}$.

II. $c = \mathrm{Encrypt}(params, pk, m)$, the encryption algorithm, which takes as input the security parameters, public key $pk$, and a message $m \in R_2$. To encrypt, take a random sample $r \leftarrow R_2^N$. The ciphertext $c \in R_q^{n+1}$ is given by
$$c = (m, 0, \ldots, 0) + A_j^T r$$
for some $j$ such that $1 \le j \le L$.

III. $m = \mathrm{Decrypt}(params, sk, c)$, the decryption algorithm, which outputs
$$m = \left[ \left[ \langle c, s_j \rangle \right]_{q_j} \right]_2,$$
assuming $c$ is encrypted under public key $A_j$. The notation above represents modular reduction into the interval $(-q_j/2, q_j/2]$ followed by reduction modulo 2.

IV. $c_4 = \mathrm{Add}(pk, c_1, c_2)$, the homomorphic addition algorithm, which takes as input the public key along with two ciphertexts encrypted under the same secret key $s_j$. (The Refresh algorithm below allows key switching for ciphertexts not encrypted under the same $s_j$.) Compute homomorphic addition as
$$c_3 = c_1 + c_2 \pmod{q_j},$$
where $q_j \in params$. The ciphertext $c_3$ is associated with the private key $s_j' = s_j \otimes s_j$, the tensor of $s_j$ with itself. Output $c_4 = \mathrm{Refresh}(c_3, params)$.

V. $c_4 = \mathrm{Mult}(pk, c_1, c_2)$, the homomorphic multiplication operation, which again takes as input ciphertexts encrypted under the same decryption key $s_j$. Compute $c_3$ as the vector of coefficients of the linear equation given by
$$\langle c_1, x \rangle \cdot \langle c_2, x \rangle.$$
Output $c_4 = \mathrm{Refresh}(c_3, params)$.

VI. $\mathrm{Refresh}(c, params)$, an algorithm to perform key switching. The details of this algorithm are left to [11]; the procedure involves a modulus-switching procedure and a key-switching procedure to change a ciphertext from being encrypted under $s_j'$ to encryption under $s_{j-1}$ with modulus $q_{j-1}$.

The Refresh procedure must be carried out during multiplication in order to perform noise reduction. The noise added during addition is small enough that this procedure is unnecessary there. Observe furthermore that bootstrapping is not required in order for this scheme to be fully homomorphic. A bootstrapping procedure is provided by the authors, however, as a performance optimization.
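To see concretely why homomorphic addition behaves as expected, write the standard BGV-style decryption invariant $\langle c_i, s_j \rangle \equiv m_i + 2e_i \pmod{q_j}$ for small noise terms $e_i$ (a simplification of the scheme above, stated here only for illustration). Then, by linearity of the inner product,
$$\langle c_1 + c_2, s_j \rangle = \langle c_1, s_j \rangle + \langle c_2, s_j \rangle \equiv (m_1 + m_2) + 2(e_1 + e_2) \pmod{q_j},$$
so as long as the combined noise remains below the decryption bound, reduction modulo 2 recovers $m_1 + m_2 \pmod 2$.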

3.5.3 HElib

Shai Halevi, a researcher at IBM, wrote a software library called HElib, which implements a version of the BGV scheme in C++ and is publicly available on GitHub [59]. HElib implements a number of optimizations on the BGV scheme in order to decrease the runtime of its algorithms [38]. Specifically, HElib uses the single instruction multiple data (SIMD) paradigm to perform parallel computations and reduce runtimes. HElib also implements optimizations published by Gentry, Halevi, and Smart [30, 29, 31] as well as ciphertext packing techniques published by Smart and Vercauteren [60].

3.5.4 Yet Another Somewhat Homomorphic Encryption Scheme (YASHE)

Another popular somewhat homomorphic encryption scheme introduced around this time was YASHE, or "yet another somewhat homomorphic encryption scheme" [5]. While the scheme still requires key-switching methods, it does not utilize any modulus-switching methods such as those seen in the BGV scheme; the authors call this property scale-invariance. Furthermore, YASHE uses a smaller ciphertext size than BGV, as ciphertexts are given by just one ring element (as opposed to multiple ring elements in BGV). YASHE is secure under the ring-LWE assumption.

An overview of the scheme is provided. The ring $R$ is given by $R = \mathbb{Z}[x]/(\Phi_d(x))$, where $\Phi_d(x)$ is the $d$th cyclotomic polynomial. The $d$th cyclotomic polynomial is the unique irreducible polynomial in $\mathbb{Z}[x]$ that divides $x^d - 1$ but does not divide $x^\ell - 1$ for any $\ell < d$, and is of degree $n = \varphi(d)$, where $\varphi$ denotes Euler's totient function. Also assume there is some error distribution $\chi$ as well as parameters $q$ and $t$ with $1 < t < q$.
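As a quick worked instance of this definition: when $d$ is a power of two, $\Phi_d(x) = x^{d/2} + 1$; for example, $\Phi_8(x) = x^4 + 1$, which is of degree $\varphi(8) = 4$. This power-of-two case is the one most commonly used in implementations.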

I. $(pk, sk, evk) = \mathrm{KeyGen}(\lambda)$, the key generation algorithm over a security parameter $\lambda$. Run a sub-algorithm to generate the parameters $d$, $q$, $t$ and distributions $\chi$ and $\chi'$. Given samples $f', g \leftarrow \chi'$, set $f = [tf' + 1]_q$ and verify that $f$ is invertible modulo $q$. (If not, repeat.) The private key is $f$ and the public key is given by $h = [tgf^{-1}]_q$. A further evaluation key $\gamma$ is computed for use in the KeySwitch algorithm. Output $(pk, sk, evk) = (h, f, \gamma)$.

II. $c = \mathrm{Encrypt}(pk, m)$, which encrypts a message $m$ as a ciphertext $c \in R$. Let $[m]_t$ be a representative of $m + tR$ and let $s, e \leftarrow \chi$. Output
$$c = \left[ \lfloor q/t \rfloor \cdot [m]_t + e + hs \right]_q.$$

III. $m = \mathrm{Decrypt}(sk, c)$, which outputs
$$m = \left[ \left\lfloor \frac{t}{q} \cdot [fc]_q \right\rceil \right]_t.$$

IV. $c_3 = \mathrm{Add}(c_1, c_2)$, which computes addition as
$$c_3 = [c_1 + c_2]_q.$$

V. $c_3 = \mathrm{Mult}(c_1, c_2, evk)$, which computes multiplication as
$$\tilde{c} = \left[ \left\lfloor \frac{t}{q} \cdot c_1 c_2 \right\rceil \right]_q,$$
where the product is computed using a function $P$ which embeds $c_1 \in R$ into $R^\ell$ for a parameter $\ell$. The value $\tilde{c}$ is an encryption under $f^2$. The output is given by $c_3 = \mathrm{KeySwitch}(\tilde{c}, evk)$, which transforms the ciphertext $\tilde{c}$ into a ciphertext $c_3$ that can be decrypted under the original secret key, $f$.

This leveled homomorphic scheme is brought up to a fully homomorphic scheme via bootstrapping procedures as described by Gentry in his original publication [26]. As with the other public-key FHE schemes, homomorphic computation in YASHE results in noise that grows with each operation and imposes a limit on the number of computations that may be carried out before the noise grows too large.

3.5.5 Fan-Vercauteren (FV) Encryption

Fan and Vercauteren introduced a scheme commonly referred to as the FV scheme [25]. This scheme is a version of Brakerski's scheme [7] based on the ring-LWE problem rather than LWE. The bootstrapping procedure introduced by the authors uses modulus switching in order to create a simpler method.

In this scheme the plaintext ring is given by $R_t = \mathbb{Z}_t[x]/(\varphi(x))$ for an integer $t > 1$ and a monic, irreducible polynomial $\varphi \in \mathbb{Z}[x]$ of degree $d$. Let $R$ denote the ring $\mathbb{Z}[x]/(\varphi(x))$.

I. $(pk, sk) = \mathrm{KeyGen}(\lambda)$, the key generation algorithm with security parameter $\lambda$. Let $\chi$ be a distribution on $\mathbb{Z}[x]/(\varphi(x))$, described by the authors. The public key $pk$ is given by
$$(f_0, f_1) = \left( [-(a \cdot g + e)]_q,\; a \right),$$
where $a \leftarrow R_q$, $e, g \leftarrow \chi$, and $[x]_q$ denotes reduction of all coefficients of $x \in R$ modulo $q$. Output $(pk, sk) = ((f_0, f_1), g)$.

II. $c = (c_0, c_1) = \mathrm{Encrypt}(pk, m)$, which encrypts a message $m \in R_t$ as the ciphertext $c \in R_q^2$ given by
$$(c_0, c_1) = \left( \left[ f_0 \cdot u + e_1 + \left\lfloor \frac{q}{t} \right\rfloor \cdot m \right]_q,\; [f_1 \cdot u + e_2]_q \right),$$
where $u, e_1, e_2 \leftarrow \chi$.

III. $m = \mathrm{Decrypt}(sk, c)$, which decrypts by computing
$$m = \left[ \left\lfloor \frac{t \cdot [c_0 + c_1 \cdot s]_q}{q} \right\rceil \right]_t,$$
where $s = sk$ is the secret key.

IV. $e = \mathrm{Add}(c, d)$ adds ciphertexts $c = (c_0, c_1)$ and $d = (d_0, d_1)$ to obtain the ciphertext $e = (e_0, e_1)$ via the operation
$$(e_0, e_1) = \left( [c_0 + d_0]_q,\; [c_1 + d_1]_q \right).$$

V. $e = \mathrm{Mult}(c, d)$ multiplies ciphertexts $c = (c_0, c_1)$ and $d = (d_0, d_1)$ to obtain the ciphertext $e = (e_0, e_1)$. First, compute
$$\bar{e}_0 = \left[ \left\lfloor \frac{t \cdot (c_0 \cdot d_0)}{q} \right\rceil \right]_q, \qquad \bar{e}_1 = \left[ \left\lfloor \frac{t \cdot (c_0 \cdot d_1 + c_1 \cdot d_0)}{q} \right\rceil \right]_q, \qquad \bar{e}_2 = \left[ \left\lfloor \frac{t \cdot (c_1 \cdot d_1)}{q} \right\rceil \right]_q.$$
Next, there are two variants on the final computation of multiplication, the details of which are highly technical and left to the original paper [25]. The first method minimizes error due to noise accumulation, while the second method is similar to modulus-switching methods and reduces the time and space used during computation.

This leveled fully homomorphic encryption scheme is brought up to a FHE scheme via bootstrapping procedures.

3.5.6 Third-Generation Public-Key FHE: Recent Developments

Late 2012 and early 2013 marked the switchover to what Peikert terms the third generation of FHE, characterized by a shift in the internal methods used within the schemes [50]. Gentry, Sahai, and Waters published an FHE scheme in 2013 based on LWE that introduced the approximate eigenvector method for multiplication [32]. This method reduces homomorphic multiplication in most cases to matrix multiplication, a much more efficient procedure. Gentry, Halevi, and Smart revisited bootstrapping procedures to reduce the computational bottleneck involved in these algorithms [30].

3.6 Gribov-Kahrobaei-Shpilrain (GKS) Encryption

Alexey Gribov, Delaram Kahrobaei, and Vladimir Shpilrain presented a novel approach to fully homomorphic encryption called the Gribov-Kahrobaei-Shpilrain (GKS) cryptosystem [36]. This ring-based scheme avoids much of the computational overhead required for fully homomorphic public-key encryption. The authors define the platform ring $S_n$ as

$$S_n = \left\langle x_1, x_2, \ldots, x_n \;\middle|\; p \cdot 1 = 0,\; x_i^2 = x_i,\; x_i x_j = x_j x_i \text{ for all } i, j \right\rangle. \tag{3.1}$$

This ring contains a super-exponential number of idempotent elements, i.e., elements $g \in S_n$ such that $g^2 = g$. Some idempotent elements are given by
$$e_F := \prod_{i \in F} x_i \cdot \prod_{j \notin F} (1 - x_j)$$
for any set of indexes $F \in \mathcal{P}(\{1, 2, \ldots, n\})$. This construction yields $2^n$ idempotent elements that are pairwise orthogonal, meaning $e_F e_G = 0$ whenever $F \neq G$, and that represent a linear basis of $S_n$ over $\mathbb{Z}_p$.
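As a small worked example of this basis, computed directly from the definition above, take $n = 2$:
$$e_{\{1,2\}} = x_1 x_2, \quad e_{\{1\}} = x_1(1 - x_2), \quad e_{\{2\}} = (1 - x_1)x_2, \quad e_{\emptyset} = (1 - x_1)(1 - x_2).$$
Using $x_i^2 = x_i$, one checks orthogonality, e.g., $e_{\{1\}} e_{\{2\}} = x_1(1 - x_1) \cdot x_2(1 - x_2) = 0$, and idempotency, e.g., $e_{\{1\}}^2 = x_1(1 - 2x_2 + x_2^2) = x_1(1 - x_2) = e_{\{1\}}$, since $x_2^2 - 2x_2 = -x_2$.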

The scheme is ring-based and involves a private plaintext ring $P$ and a ciphertext ring $C$, where $P$ is a retract of $C$, i.e., $P \subset C$ and at the same time $P$ is a factor ring of $C$. Both $P$ and $C$ are rings of the form of Equation 3.1. Let $I$ be an ideal of $C$ such that $C/I$ is isomorphic to $P$. The data owner, $D$, generates $C$ and $I$ during the key generation algorithm. Parameters $n$, $r$, and $p$ are chosen such that $p$ is a large prime and $r > n$, and let $P := S_n$. The ciphertext

ring $C = S_r$ is randomly generated from $P = S_n$ by adjoining generators, giving $\{x_i\}_{i=1}^{r}$ with $r > n$. In other words, the ring $C = S_r$ expands upon the generators of $P = S_n$. The data owner then randomly chooses an ideal $I$ by randomly selecting elements

$$x_m - w_m(x_1, x_2, \ldots, x_{m-1})$$
for $m = n+1, \ldots, r$, where $w_m(x_1, x_2, \ldots, x_{m-1})$ represents an idempotent element of $S_{m-1}$. Observe that $C/I$ is isomorphic to $P$ by construction. The data owner next rewrites $C$ in

terms of its orthogonal basis $\{e_i\}_{i=1}^{2^r}$ and applies a random permutation $\pi$ to the orthogonal set of generators. $D$ concludes the key generation algorithm with a final transformation between this permuted basis, given by

$$C = \left\langle e_1, e_2, \ldots, e_{2^r} \;\middle|\; p \cdot 1 = 0,\; e_i^2 = e_i,\; e_i e_j = 0 \text{ for all } i \neq j \right\rangle,$$
and a triangular basis. The triangular basis as described by the authors applies an invertible linear transformation of the form
$$t_j = \sum_{i \in F_j} e_i,$$
where $F_j$ denotes a set of indices. These index sets satisfy the property that $F_j \subseteq F_k$ whenever $j < k$. As the elements in this basis are not orthogonal, homomorphic multiplication does

not occur component-wise. Rather, it follows the property that $t_j t_k = t_j$ whenever $j \leq k$. A plaintext element $x \in P$ is encrypted as

$$\mathrm{Encrypt}(x) = x + E_0, \quad \text{where} \quad E_0 = \sum_{j=n+1}^{r} \left( x_j - w_j(x_1, \ldots, x_{j-1}) \right) \cdot h_j(x_1, \ldots, x_r)$$

for random $h_j \in C$. This plaintext element is then converted to the published triangular basis. To decrypt, the data owner simply converts the ciphertext back to the basis $\{x_i\}_{i=1}^{r}$, then replaces $x_j$ with $w_j(x_1, \ldots, x_{j-1})$ for $j = n+1, \ldots, r$. The authors show that this scheme is secure against a ciphertext-only attack. This scheme has the advantage of not accumulating noise terms during computation. Thus, the only limit on computation in the GKS scheme lies in the value of the prime modulus $p$ in the definition of the ring $S$.

3.7 Encoding Data for Fully Homomorphic Computation

Fully homomorphic computation can only be completely utilized over data encoded in a fully homomorphic manner. Various types of data such as strings, integers, and floats must be encoded such that computation over these values preserves the fully homomorphic properties of the scheme. For instance, the ASCII encoding for 0 is 48, but 48 + 48 = 96, the ASCII code for the grave accent. Encoding via ASCII will not suffice when encrypted addition is required.

In addition to a fully homomorphic encoding method, it is important that the method chosen is efficient in terms of memory usage. While the precise methods of encoding depend on the structure of the encryption scheme, often what is required is an encoding with integer values. The fully homomorphic encoding methods used in the FHE software SEAL are described below. The fully homomorphic encoding methods for GKS encryption are discussed in Chapter 4.

3.7.1 Fully Homomorphic Encoding in SEAL

SEAL, or the Simple Encrypted Arithmetic Library, is a software library for homomorphic encryption developed by Microsoft. A previous version of SEAL implemented a variant of the YASHE scheme developed by researchers at Microsoft's Cryptography Research Group [21]. The most recent version of SEAL [43] implements the "FullRNS" variant [4] of the Fan-Vercauteren somewhat homomorphic scheme [25].

Encoding Integers

Plaintexts and ciphertexts in the YASHE implementation of SEAL are polynomials in the rings $R_t = \mathbb{Z}_t[x]/(x^n + 1)$ and $R_q = \mathbb{Z}_q[x]/(x^n + 1)$, respectively. The authors provide three methods for encoding integer values.

The first method is the least memory efficient, but the most straightforward. A plaintext value $y \in \mathbb{Z}$ is encoded as the constant polynomial $p(x) = y$. This encoding method requires the moduli $t$ and $q$ to be extremely large to avoid overflow. The second method they present

is to encode a value $y \in \mathbb{Z}$ using its binary representation $y = \sum_i b_i 2^i$ for bits $b_i$, encoding the plaintext polynomial
$$p(x) = \sum_i b_i x^i.$$
Decoding is performed by evaluating $p(2)$. The authors explain that this method may be

applied to base-$b$ encodings for larger integers $b$, with higher base values resulting in shorter polynomials with larger coefficients.
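To make the binary method concrete, here is a minimal C++ sketch of the encode/decode round trip. Polynomials are represented simply as coefficient vectors; this illustrates the encoding itself and is not drawn from any SEAL API.

#include <cstdint>
#include <iostream>
#include <vector>

// Encode y as the coefficient vector of p(x) = sum_i b_i x^i, b_i the bits of y.
std::vector<int> encode_binary(uint64_t y) {
    std::vector<int> coeffs;
    while (y > 0) {
        coeffs.push_back(static_cast<int>(y & 1));
        y >>= 1;
    }
    return coeffs;
}

// Decode by evaluating p(2) (Horner's rule from the highest coefficient down).
uint64_t decode_binary(const std::vector<int>& coeffs) {
    uint64_t y = 0;
    for (std::size_t i = coeffs.size(); i-- > 0; )
        y = 2 * y + coeffs[i];
    return y;
}

int main() {
    auto p = encode_binary(13);                   // 13 = 1101 in binary -> p(x) = x^3 + x^2 + 1
    std::cout << decode_binary(p) << std::endl;   // prints 13
    return 0;
}

Adding two such encodings coefficient-wise corresponds to adding the underlying integers, which is why this layout is compatible with homomorphic addition.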

The last encoding method described is based on the Chinese Remainder Theorem (CRT). A plaintext is encoded multiple times under co-prime moduli $t_1, \ldots, t_k$. The CRT allows for decoding by combining these individual plaintexts under the single modulus $\prod_i t_i$. While this encoding method requires more space, it does allow for much smaller moduli.
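A small numerical illustration of the CRT decoding (a standard computation, not specific to SEAL): take co-prime moduli $t_1 = 5$ and $t_2 = 7$ and plaintext $y = 23$. The two encodings are $23 \bmod 5 = 3$ and $23 \bmod 7 = 2$, and the CRT recombines the pair $(3, 2)$ into the unique value $23$ modulo $5 \cdot 7 = 35$.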

Encoding Reals

Real values can be encoded in the YASHE version of SEAL by scaling them to integer values and performing scaling after every use of homomorphic multiplication. However, this often results in prohibitively large integers that must be encoded. Instead, the authors suggest encoding floating-point values again via binary representation. A floating-point value $y$ can be represented as $y_+ + y_-$, where $y_+ = \sum_{i=0}^{B} b_i 2^i$ and $y_- = \sum_{j=1}^{C} c_j 2^{-j}$ for bits $b_i$ and $c_j$. The plaintext $y$ may then be encoded as
$$\sum_{i \le B} b_i x^i - \sum_{j} c_j x^{n-j}.$$

This method requires reserving the coefficients for the integer and fractional parts of $y$ before encoding.
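As a quick illustration of this layout, applying the formula above directly: $y = 5.25$ has integer part $y_+ = 101_2$ (so $b_0 = 1$ and $b_2 = 1$) and fractional part $y_- = 0.01_2$ (so $c_2 = 1$), and therefore encodes as the polynomial $x^2 + 1 - x^{n-2}$.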

Plaintext Packing

The authors suggest a method called plaintext packing to decrease the number of computations that must be performed. This method involves encoding multiple small messages as one large message in order to reduce overall run time, using what is called the single instruction multiple data (SIMD) paradigm.

The method for plaintext packing in YASHE SEAL involves using co-prime polynomials $Q_1, \ldots, Q_k$ such that
$$x^n + 1 = \prod_{i=1}^{k} Q_i(x) \pmod{t}.$$
There is an isomorphism between the ring $R_t$ and the product of the factor rings, given by
$$\frac{\mathbb{Z}_t[x]}{(x^n + 1)} \cong \prod_{i=1}^{k} \frac{\mathbb{Z}_t[x]}{(Q_i(x))}.$$
Multiple values can be encoded at once as the constant coefficient of each of the $k$ factors. The above is a formulation of the Chinese Remainder Theorem over rings. The most recent version of SEAL [43], based on the FV encryption scheme, similarly uses the Chinese Remainder Theorem in order to provide SIMD operations, which the authors call "batching."

3.7.2 Fully homomorphic encoding in GKS

Gribov, Kahrobaei, and Shpilrain provide a method for encoding integer values within the platform ring of the GKS cryptosystem [36]. A fully homomorphic embedding of $\mathbb{Z}_p$ into the platform ring $S_n$ is chosen. This corresponds to letting the element $1 \in \mathbb{Z}_p$ be mapped to any idempotent in $S_n$. Methods for encoding real numbers as well as implementing SIMD batching within the GKS cryptosystem are presented in this work. Chapter 4 discusses methods for these encodings, as well as various other implementation methods for the GKS cryptosystem.

3.8 Conclusions

Current methods in privacy-preserving classification span non-FHE encryption methods, FHE methods, and differential privacy. Fully homomorphic encryption developed quickly from its first presentation by Gentry to the more efficient forms available today. Popular schemes such as BGV [11] and FV [25] have been implemented for machine learning tasks using HElib [59] and SEAL [43], respectively. The GKS scheme [36] is a conceptually simple scheme without the noise seen in other FHE schemes. Methods for the efficient implementation of GKS encryption for real-world classification tasks will be proposed throughout the remainder of this dissertation.

Chapter 4

Implementation of the Gribov-Kahrobaei-Shpilrain (GKS) Scheme

There are a number of implementation issues that must be addressed when performing fully homomorphic encryption over real-world data. The encoding method used for integers, floats, and rationals needs to result in fully homomorphic computations within the ring. In addition to a fully homomorphic encoding method, it is important that the method chosen is efficient in terms of memory usage. While the precise methods of encoding depend on the structure of the encryption scheme, often what is required is an encoding with integer values. The GKS scheme requires all elements to be embedded as ring elements with coefficients in $\mathbb{Z}_p$. This ring structure can be taken advantage of in order to implement single instruction, multiple data (SIMD) instructions to maximize the amount of computation carried out in a single instruction.

This chapter outlines parameter selection for applications using the GKS scheme as well as implementation algorithms, fully homomorphic encoding methods, and how to implement


SIMD instructions within GKS in order to maximize the efficiency of the memory used by the program. This chapter concludes by providing data on the speed of the described algorithms in a C++ implementation.

4.1 Fully Homomorphic Encoding of Real-World Values in GKS

Because the GKS encryption scheme is based on ring elements with coefficients in the field $\mathbb{Z}_p$, any encoding used will have to be based on integer values. Gribov, Kahrobaei, and Shpilrain provide a straightforward method of implementing a fully homomorphic encoding [36]. Any fully homomorphic embedding of $\mathbb{Z}_p$ into the ring $S_n$ will suffice. In particular, it is sufficient to map the element $1$ in $\mathbb{Z}_p$ to any idempotent element of $S_n$.

Encoding Reals

Real-valued variables must also be encoded as elements of the plaintext ring. This method should encode floating-point values as integer values, be fully homomorphic, avoid overflow in $\mathbb{Z}_p$, and not lose any accuracy compared to the unencrypted operations. To achieve this, scale floating-point values to integer values with a fixed level of precision.

Encode with $n$ digits of precision using an encoding function $\mathrm{Encode}(x) := \lfloor x \cdot 10^n \rfloor$. For instance, if $n = 3$, encode $x = 0.29128$ as
$$\mathrm{Encode}(x) = \lfloor 0.29128 \cdot 10^3 \rfloor = 291.$$

Extend this to an encoding in the ring $S_n$ by mapping $x$ to a fixed idempotent value in $S_n$. Decode simply by retrieving the coefficient of the fixed idempotent value and multiplying by $10^{-n}$.

This certainly yields an additively homomorphic embedding of a floating-point value in the ring. Maintaining a multiplicatively homomorphic property requires keeping track of the depth of each element. Consider first an example: the floating-point value 0.291 multiplied by 0.895 yields 0.260445. However, the precision-3 encoded value 291 multiplied by 895 yields 260445, which the naive decoding (multiplying by $10^{-3}$) maps to 260.445 rather than 0.260445. Hence, to achieve a fully homomorphic encoding, keep track of the depth of each encoded value and decode accordingly. The depth of an encoded value is initialized to $d = 1$, and each time multiplication is performed, its depth is incremented. To decode, multiply the coefficient by $10^{-dn}$. Note furthermore that only elements which share a depth may be multiplied. To increase the depth of an element, simply multiply by the scalar $10^n$. Proper tracking of the depth of elements during implementation is required in order to perform homomorphic multiplication.

Furthermore, the prime modulus $p$ must be selected properly to avoid overflow over the modulus during computation. This problem is not unique to the GKS cryptosystem, as all of the FHE schemes surveyed encode floating points as integer values in the range $(-p/2, p/2]$. The prime modulus must be chosen during parameter selection to be large enough to avoid overflow during computation.
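A minimal C++ sketch of this fixed-point bookkeeping follows. Plain integers stand in for ring coefficients, and the names and struct layout are illustrative, not taken from the GKS implementation.

#include <cmath>
#include <cstdint>
#include <iostream>

// Fixed-point value with n = 3 digits of precision and an explicit depth.
struct Encoded {
    int64_t value;  // floor(x * 10^(d*n)) for depth d
    int depth;      // number of 10^n scale factors carried
};

const int N_DIGITS = 3;
const int64_t SCALE = 1000;  // 10^N_DIGITS

Encoded encode(double x) { return { (int64_t)std::floor(x * SCALE), 1 }; }

double decode(const Encoded& e) {
    return e.value / std::pow(10.0, (double)(e.depth * N_DIGITS));
}

// Scale factors multiply, so depths add; for two depth-1 inputs this
// matches the depth increment described in the text above.
Encoded mul(const Encoded& a, const Encoded& b) {
    return { a.value * b.value, a.depth + b.depth };
}

int main() {
    Encoded a = encode(0.291), b = encode(0.895);
    std::cout << decode(mul(a, b)) << std::endl;  // prints 0.260445
    return 0;
}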

4.2 Parallelization via SIMD

SIMD, introduced in Chapter 2.3, is implemented in the GKS scheme by encoding multiple plaintext values within a single plaintext ring element, i.e., by mapping each to a unique idempotent element in the ring basis. Note, however, that not every selection of idempotent elements results in a fully homomorphic encoding. Consider the plaintext ring
$$S_n = \left\langle x_1, x_2, \ldots, x_n \;\middle|\; p \cdot 1 = 0,\; x_i^2 = x_i,\; x_i x_j = x_j x_i \text{ for all } i, j \right\rangle.$$

Figure 4.1: SIMD Implementation in the GKS scheme

Two values are encoded simultaneously in a ring element by mapping each to a fixed idempotent element, e.g., $x_1$ and $x_1 x_n$. Then,
$$(a_1 x_1 + b_1 x_1 x_n) + (a_2 x_1 + b_2 x_1 x_n) = (a_1 + a_2) x_1 + (b_1 + b_2) x_1 x_n,$$
and addition is homomorphic over each element. However, multiplication via this encoding is not homomorphic component-wise, as
$$(a_1 x_1 + b_1 x_1 x_n)(a_2 x_1 + b_2 x_1 x_n) = a_1 a_2 x_1 + (a_1 b_2 + b_1 a_2 + b_1 b_2) x_1 x_n \tag{4.1}$$

when a true SIMD operation would yield $a_1 a_2 x_1 + b_1 b_2 x_1 x_n$. Therefore, implementing SIMD requires a method of encoding multiple values within a single element that is multiplicatively homomorphic not just over the ring elements, but also over the encoded coefficients. In order to implement SIMD, the properties of the ring elements are used to describe a multiple embedding. With that in mind as the ultimate goal, consider the following claim.

Proposition 4.2.1. Let $I, J \subseteq \{1, 2, \ldots, n\}$ be two subsets of indexes, and consider the two idempotent elements in $S_n$ generated by these subsets, defined as
$$x_I = \prod_{i \in I} x_i \quad \text{and} \quad x_J = \prod_{j \in J} x_j.$$

Encoding two elements within a vector as coefficients of $x_I$ and $x_J$ will yield a fully homomorphic encoding if $I \cap J \neq I$ and $I \cap J \neq J$.

Proof. Assume $I \cap J \neq I$ and $I \cap J \neq J$. Then,
$$(a_1 x_I + b_1 x_J)(a_2 x_I + b_2 x_J) = a_1 a_2 x_I + b_1 b_2 x_J + (a_1 b_2 + a_2 b_1) x_I x_J.$$
Since $I \cap J \neq I$ and $I \cap J \neq J$, then $x_I x_J \neq x_I$ and $x_I x_J \neq x_J$. Therefore, decoding the above product yields $a_1 a_2$ and $b_1 b_2$, the correct products, and $x_I x_J$ carries a value which may be ignored. Inductively, this extends to any collection $I_1, I_2, \ldots, I_k \subseteq \{1, 2, \ldots, n\}$ satisfying $I_i \cap I_j \neq I_i$ and $I_i \cap I_j \neq I_j$ for any $1 \le i, j \le k$. $\square$

While there are many subsets of elements that satisfy this criterion, a straightforward

approach is to select distinct subsets $I_1, \ldots, I_k$ such that $|I_1| = |I_2| = \cdots = |I_k|$. If $|I_j| = s$ for all $1 \le j \le k$, then the number of SIMD slots available via this method is maximized whenever $\binom{n}{s}$ achieves its maximum over $s$. This corresponds to the maximum value of a binomial coefficient and occurs when $s = \lfloor n/2 \rfloor$ or $s = \lceil n/2 \rceil$.

Proposition 4.2.2. Assume that $\binom{n}{\lfloor n/2 \rfloor}$ integer values are simultaneously encoded in the GKS scheme by mapping each element to the coefficient of the ring element generated by a set of indexes $I$ satisfying $|I| = \lfloor n/2 \rfloor$. This embedding remains fully homomorphic when random values are encoded as the coefficients of elements generated by index sets $J$ where $|J| > |I|$, but fails to be fully homomorphic if there are any nonzero coefficients for elements with index sets where $|J| < |I|$.

Proof. For the former case, consider two sets of indexes $I$ and $J$ where $|I| = \lfloor n/2 \rfloor$ and $|J| > |I|$. Then, the product $x_I x_J = x_K$ has an index set $K$ where $|K| > |I|$. This value will have no impact on the values of the coefficients of the encoded elements.

An example of the latter case is available in Equation 4.1. More generally, say there is a nonzero coefficient for an element $x_J$ with index set $|J| < |I|$. Let $x_I$ be a word where $|I| = \lfloor n/2 \rfloor$ and $J \subset I$, which must exist for some set of indexes as $|J| < |I|$. Then,

$$(a_1 x_I + b_1 x_J)(a_2 x_I + b_2 x_J) = a_1 a_2 x_I + b_1 b_2 x_J + (a_1 b_2 + a_2 b_1) x_I x_J = (a_1 a_2 + a_1 b_2 + a_2 b_1) x_I + b_1 b_2 x_J,$$
since $x_I x_J = x_I$. $\square$

Example 4.2.3. When the plaintext ring is generated by $n = 5$ elements in the GKS scheme, the described encoding method allows for $\binom{5}{2} = 10$ elements to be simultaneously encoded in SIMD slots as coefficients of the elements $x_1 x_2$, $x_1 x_3$, $x_1 x_4$, $x_1 x_5$, $x_2 x_3$, $x_2 x_4$, $x_2 x_5$, $x_3 x_4$, $x_3 x_5$, and $x_4 x_5$. Furthermore, the elements generated by 3, 4, and 5 generators may be assigned random coefficients during encoding.
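A small C++ sketch that enumerates these slot index sets for arbitrary n is given below; it is a generic combination enumerator, and for n = 5 it reproduces exactly the ten pairs of Example 4.2.3.

#include <iostream>
#include <vector>

// Print all subsets of {1, ..., n} of size k, i.e., the index sets
// usable as SIMD slots in this encoding when k = floor(n / 2).
void enumerate(int n, int k, int start, std::vector<int>& cur) {
    if ((int)cur.size() == k) {
        for (int i : cur) std::cout << "x" << i;
        std::cout << "\n";
        return;
    }
    for (int i = start; i <= n; ++i) {
        cur.push_back(i);
        enumerate(n, k, i + 1, cur);
        cur.pop_back();
    }
}

int main() {
    int n = 5;
    std::vector<int> cur;
    enumerate(n, n / 2, 1, cur);  // prints x1x2, x1x3, ..., x4x5
    return 0;
}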

SIMD via Orthogonal Elements

A second fully homomorphic encoding utilizing SIMD slots is attained using the orthogonal representation of ring elements. Recall that ring elements are initially embedded in the standard basis in the plaintext ring
$$S_n = \left\langle x_1, x_2, \ldots, x_n \;\middle|\; p \cdot 1 = 0,\; x_i^2 = x_i,\; x_i x_j = x_j x_i \text{ for all } i, j \right\rangle,$$

which is extended to the ring $S_r$ for $r > n$ via the method described in Chapter 3. Elements are then converted to a representation in the orthogonal basis
$$S_r = \left\langle e_1, e_2, \ldots, e_{2^r} \;\middle|\; p \cdot 1 = 0,\; e_i^2 = e_i,\; e_i e_j = 0 \text{ for all } i \neq j \right\rangle$$
via the change of basis
$$e_F := \prod_{i \in F} x_i \cdot \prod_{j \notin F} (1 - x_j)$$
for all sets of indexes $F \subset \{1, 2, \ldots, n, \ldots, r\}$.

A set $A = \{a_i\}$ of $2^n$ integers can be simultaneously encoded in the GKS scheme using this transformation over the original plaintext ring. Let $a_F$ denote the element of $A$ with index $i = \sum_{f \in F} 2^{f-1}$. In other words, use the binary representation of the selection of indexes $F$ in order to perform a one-to-one mapping of the elements of $A$. This transformation is given by
$$\mathrm{Encode}(A) = \sum_{F \subseteq \{1, 2, \ldots, n\}} a_F \left( \prod_{i \in F} x_i \cdot \prod_{j \notin F} (1 - x_j) \right).$$

Values are decoded by inverting the orthogonal transformation in $S_n$. This method requires more effort to implement encoding but allows for $2^n$ elements to be encoded simultaneously. With the settings recommended by Gribov, Kahrobaei, and Shpilrain, a total of $2^5 = 32$ values may be encoded in one ciphertext via this method. Note that when implementing encryptions via this encoding, restrictions on an acceptable encryption of zero must be put in place so that no coefficient in the encryption of zero equals 0.
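The subset-to-index bookkeeping above is simply a binary encoding. A minimal C++ sketch makes this explicit: a bitmask b represents the subset F, with bit f-1 set exactly when f is in F, so the slot index is b itself. This is purely illustrative.

#include <iostream>

int main() {
    int n = 3;
    // Each subset F of {1, ..., n} is a bitmask b; its slot index is
    // i = sum over f in F of 2^(f-1), which is exactly b itself.
    for (int b = 0; b < (1 << n); ++b) {
        std::cout << "F = {";
        for (int f = 1; f <= n; ++f)
            if (b & (1 << (f - 1))) std::cout << " " << f;
        std::cout << " }  ->  slot " << b << "\n";
    }
    return 0;
}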

4.3 Generating the Triangular Basis Transformation

Encryption in the GKS scheme, discussed in Chapter 3, begins with elements expressed in terms of the standard basis in the ring

$$C = \left\langle x_1, x_2, \ldots, x_r \;\middle|\; p \cdot 1 = 0,\; x_i^2 = x_i,\; x_i x_j = x_j x_i \text{ for all } i, j \right\rangle. \tag{4.2}$$

Elements are then converted to the orthogonal basis via the transformation given by
$$e_F := \prod_{i \in F} x_i \cdot \prod_{j \notin F} (1 - x_j)$$

for each set of indexes $F$. Assign each orthogonal element $e_F$ a numerical index by setting $e_F = e_i$ where $i = \sum_{f \in F} 2^{f-1}$. As above, this corresponds simply to converting the binary representation of the choice of index subset to decimal. Lastly, the ciphertext is transformed into the triangular basis, which satisfies

$$t_j = \sum_{i \in F_j} e_i$$
for a set of indexes $F_j$, where $F_j \subseteq F_k$ whenever $j \leq k$, for $j$ and $k$ from 1 to $2^r$.

Implementation of GKS encryption requires a method for randomly generating a triangular basis satisfying the above conditions. The only truly invertible linear transformation of the above form is given by
$$t_m = \sum_{k=1}^{m} e_k$$
for $1 \leq m \leq 2^r$.

In order to introduce randomness to this transformation, randomly repeat $m$ of the rows.

The final, public ciphertext ring is given by
$$\tilde{C} = \left\langle t_1, t_2, \ldots, t_{2^r + m} \;\middle|\; p \cdot 1 = 0,\; e_i^2 = e_i,\; e_i e_j = 0 \text{ for all } i \neq j \right\rangle.$$

Note that in the published basis, homomorphic addition occurs component-wise, and the homomorphic multiplication operation is given by $t_i t_j = t_i$ whenever $i \leq j$.

Elements in the orthogonal basis are not uniquely represented by elements in the triangular basis. For example, if $t_1 = e_1$, $t_2 = e_1$, $t_3 = e_1 + e_2$, and $t_4 = e_1 + e_2$, then the element $e_2$ in the orthogonal basis can be written in the triangular basis in four ways: $t_3 - t_1$, $t_3 - t_2$, $t_4 - t_1$, or $t_4 - t_2$. When converting from the orthogonal to the triangular basis, the user should randomly choose which representation to use during each conversion. While this transformation is not invertible in the traditional sense, the key holder can convert back to the orthogonal basis during decryption easily by replacing each $t_i$ with its corresponding orthogonal basis form and combining like terms.

4.3.1 Algorithms for Triangular Basis Implementation

The triangular basis should be generated during the KeyGen phase of GKS encryption. The proposed method for creating the triangular basis requires randomly repeating some number of rows, say $m$ rows. The number of rows repeated can be given as a parameter during key generation or can be randomized during key generation. In either case, the number of repeated rows will be published along with the ciphertext ring.

Below, the necessary algorithms for conversion between the orthogonal and triangular bases are outlined. Algorithm 1 describes the method for computing the change of basis to the triangular basis from the orthogonal basis. Algorithm 2 describes the method for generating the change of basis for the orthogonal basis from the triangular basis. Because this is not a linear transformation, Algorithm 3 describes the process for converting an element from the orthogonal basis to the triangular basis, and is followed by a discussion of the method for converting an element from the triangular basis to the orthogonal basis.

Algorithm 1 The triangular basis in terms of the orthogonal basis

Input: The orthogonal basis $e_1, e_2, \ldots, e_{2^r}$ and a set of $m$ indexes $F \subseteq \{1, 2, \ldots, 2^r\}$.
Output: The triangular basis $t_1, t_2, \ldots, t_{2^r + m}$ in terms of the orthogonal basis.

1: $k = 1$. ▷ To index over the triangular basis elements.
2: for $i = 1$ to $2^r$ do
3:   $t_k = \sum_{j=1}^{i} e_j$.
4:   $k \mathrel{+}= 1$.
5:   if $i \in F$ then ▷ Repeat rows listed in $F$.
6:     $t_k = t_{k-1}$.
7:     $k \mathrel{+}= 1$.
8:   end if
9: end for

Recall that the triangular basis, without repetition, is given in terms of the orthogonal basis by $t_j = \sum_{k=1}^{j} e_k$. As this is a linear transformation, its inverse is given by $e_1 = t_1$ and
$$e_j = t_j - t_{j-1}$$
for all $2 \leq j \leq 2^r + m$. If row $j$ is repeated, then $e_j$ can be represented as
$$e_j = t_j - t_{j-1} \quad \text{or} \quad e_j = t_{j+1} - t_{j-1},$$

and if $t_{j-1}$ is a repetition of the row $t_{j-2}$, then $e_j$ can additionally be represented by
$$e_j = t_j - t_{j-2} \quad \text{or} \quad e_j = t_{j+1} - t_{j-2}.$$

Each orthogonal element can be represented in one, two, or four ways in terms of the triangular basis, depending on the indexes of repeated rows. Because the transformation in Algorithm 1 is not invertible, Algorithm 2 operates by providing four possible representations $e_j = e_{j,1} = e_{j,2} = e_{j,3} = e_{j,4}$ of each orthogonal generator in terms of triangular generators, some of which may be duplicates.

Example 4.3.1. Consider the set of orthogonal generators $\{e_1, e_2, e_3, e_4\}$. Let $F = \{2, 3\}$ be the set of indexes used to generate the repeated rows in the triangular basis. Then, the triangular basis is generated by Algorithm 1 as
$$t_1 = e_1, \quad t_2 = e_1 + e_2, \quad t_3 = e_1 + e_2, \quad t_4 = e_1 + e_2 + e_3, \quad t_5 = e_1 + e_2 + e_3, \quad t_6 = e_1 + e_2 + e_3 + e_4.$$

Algorithm 2 The orthogonal basis in terms of the triangular basis

Input: The triangular basis $t_1, t_2, \ldots, t_{2^r + m}$ and the set of $m$ indexes $F \subseteq \{1, 2, \ldots, 2^r\}$.
Output: The orthogonal basis $e_{1,1}, e_{1,2}, e_{1,3}, e_{1,4}, \ldots, e_{2^r,1}, e_{2^r,2}, e_{2^r,3}, e_{2^r,4}$ in terms of the triangular basis, where $e_{i,j} = e_i$.

1: $k = 1$. ▷ To index over the triangular basis elements.
2: for $i = 1$ to $2^r$ do
3:   if $i \in F$ then ▷ If the currently indexed row is repeated.
4:     $e_{i,1} = t_k$, $e_{i,2} = t_k$, $e_{i,3} = t_{k+1}$, $e_{i,4} = t_{k+1}$.
5:   else
6:     $e_{i,1} = t_k$, $e_{i,2} = t_k$, $e_{i,3} = t_k$, $e_{i,4} = t_k$.
7:   end if
8:   if $i > 1$ then
9:     if $i - 1 \in F$ then ▷ If the previously indexed row is repeated.
10:      $e_{i,1} \mathrel{-}= t_{k-1}$, $e_{i,2} \mathrel{-}= t_{k-2}$, $e_{i,3} \mathrel{-}= t_{k-1}$, $e_{i,4} \mathrel{-}= t_{k-2}$.
11:    else
12:      $e_{i,1} \mathrel{-}= t_{k-1}$, $e_{i,2} \mathrel{-}= t_{k-1}$, $e_{i,3} \mathrel{-}= t_{k-1}$, $e_{i,4} \mathrel{-}= t_{k-1}$.
13:    end if
14:  end if
15:  if $i \in F$ then $k \mathrel{+}= 2$ else $k \mathrel{+}= 1$.
16: end for

Example 4.3.2. The orthogonal basis is generated from the triangular basis of Example 4.3.1 via Algorithm 2 as

$$\begin{aligned}
e_{1,1} &= t_1, & e_{1,2} &= t_1, & e_{1,3} &= t_1, & e_{1,4} &= t_1, \\
e_{2,1} &= t_2 - t_1, & e_{2,2} &= t_2 - t_1, & e_{2,3} &= t_3 - t_1, & e_{2,4} &= t_3 - t_1, \\
e_{3,1} &= t_4 - t_3, & e_{3,2} &= t_4 - t_2, & e_{3,3} &= t_5 - t_3, & e_{3,4} &= t_5 - t_2, \\
e_{4,1} &= t_6 - t_5, & e_{4,2} &= t_6 - t_4, & e_{4,3} &= t_6 - t_5, & e_{4,4} &= t_6 - t_4.
\end{aligned}$$

In Algorithm 3, a ring element in the orthogonal basis is converted to the triangular basis by randomly selecting, for each orthogonal element $e_i$, one of its four possible representations $e_{i,1}$, $e_{i,2}$, $e_{i,3}$, or $e_{i,4}$ in the triangular basis.

Algorithm 3 Converting an element from the orthogonal basis to the triangular basis

Input: A ring element in the orthogonal basis, $a = a_1 e_1 + a_2 e_2 + \cdots + a_{2^r} e_{2^r}$, as well as the representations $e_{i,j}$ of the orthogonal basis in terms of the triangular basis from Algorithm 2.
Output: The corresponding word in the triangular basis.

1: for $i$ from 1 to $2^r$ do
2:   Select $e_i = e_{i,j}$ for random $j \in \{1, 2, 3, 4\}$.
3:   Replace $e_i$ with $e_{i,j}$ in $a$.
4: end for
5: Simplify by combining like terms.

The algorithm for conversion from the triangular basis to the orthogonal basis is straightforward and proceeds by replacing each triangular element $t_i$ with its representation in the orthogonal basis as generated in Algorithm 1.

4.4 Homomorphic Evaluation

Assume there are $r$ generators in the ciphertext ring. Then, there are $2^r$ elements of the ring in the orthogonal basis. After the triangular transformation, with repetition of $m$ rows, the size of the published basis is given by $\hat{r} = 2^r + m$.

Assume there are two ring elements $a, b \in C$ to be added, where $a = a_1 t_1 + \cdots + a_{\hat{r}} t_{\hat{r}}$ and $b = b_1 t_1 + \cdots + b_{\hat{r}} t_{\hat{r}}$. Addition occurs component-wise and is carried out as in Algorithm 4.

Algorithm 4 Homomorphic Addition in the Triangular Basis

Input: $a = a_1 t_1 + \cdots + a_{\hat{r}} t_{\hat{r}}$, $b = b_1 t_1 + \cdots + b_{\hat{r}} t_{\hat{r}}$, $p$.
Output: $c = a + b = c_1 t_1 + \cdots + c_{\hat{r}} t_{\hat{r}}$.

1: for $i = 1$ to $\hat{r}$ do

2:   $c_i = a_i + b_i \pmod{p}$

3: end for

Next, the values $a, b \in C$ are to be multiplied. To compute this product, use the identity $t_i t_j = t_i$ whenever $i \leq j$. The product can be expressed in terms of the triangular basis as follows:
$$a \cdot b = \left( \sum_{i=1}^{\hat{r}} a_i t_i \right) \left( \sum_{i=1}^{\hat{r}} b_i t_i \right) = \sum_{i=1}^{\hat{r}} \left( a_i b_i + \sum_{j=i+1}^{\hat{r}} (a_i b_j + a_j b_i) \right) t_i.$$
To show correctness, start by expanding the product term by term.

$$\begin{aligned}
\left( \sum_{i=1}^{\hat{r}} a_i t_i \right) \left( \sum_{i=1}^{\hat{r}} b_i t_i \right)
&= a_1 b_1 t_1 + a_1 b_2 t_1 + a_1 b_3 t_1 + \cdots + a_1 b_{\hat{r}} t_1 \\
&\quad + a_2 b_1 t_1 + a_2 b_2 t_2 + a_2 b_3 t_2 + \cdots + a_2 b_{\hat{r}} t_2 \\
&\quad + \cdots \\
&\quad + a_{\hat{r}} b_1 t_1 + a_{\hat{r}} b_2 t_2 + a_{\hat{r}} b_3 t_3 + \cdots + a_{\hat{r}} b_{\hat{r}} t_{\hat{r}},
\end{aligned}$$
using $a_i t_i \cdot b_j t_j = a_i b_j t_{\min(i,j)}$ term by term.

This product may be rewritten by grouping over $t_i$; the coefficient terms may then be reordered for a new expression of the product as a sum. Grouping terms over $t_i$ yields
$$\begin{aligned}
&= \left( a_1 b_1 + \sum_{j=2}^{\hat{r}} (a_1 b_j + a_j b_1) \right) t_1 + \left( a_2 b_2 + \sum_{j=3}^{\hat{r}} (a_2 b_j + a_j b_2) \right) t_2 + \cdots + a_{\hat{r}} b_{\hat{r}}\, t_{\hat{r}} \\
&= \sum_{i=1}^{\hat{r}} \left( a_i b_i + \sum_{j=i+1}^{\hat{r}} (a_i b_j + a_j b_i) \right) t_i.
\end{aligned}$$

With this expression of the product, one may implement multiplication via Algorithm 5.

Algorithm 5 Homomorphic Multiplication in the Triangular Basis

Input: $a = a_1 t_1 + \cdots + a_{\hat{r}} t_{\hat{r}}$, $b = b_1 t_1 + \cdots + b_{\hat{r}} t_{\hat{r}}$, $p$.
Output: $c = a \cdot b = c_1 t_1 + \cdots + c_{\hat{r}} t_{\hat{r}}$.

1: for $i = 1$ to $\hat{r}$ do
2:   $c_i = a_i \cdot b_i$
3:   for $j = i + 1$ to $\hat{r}$ do
4:     $c_i \mathrel{+}= a_i \cdot b_j + a_j \cdot b_i \pmod{p}$
5:   end for
6: end for
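A compact C++ sketch of Algorithms 4 and 5 over plain 64-bit coefficients follows. The GKS implementation described in this chapter uses GNU MP for large moduli; fixed-width integers are used here only to keep the illustration self-contained, so coefficients are assumed small enough not to overflow.

#include <cstdint>
#include <vector>

// Algorithm 4: component-wise addition of triangular-basis coefficient vectors mod p.
std::vector<int64_t> add(const std::vector<int64_t>& a,
                         const std::vector<int64_t>& b, int64_t p) {
    std::vector<int64_t> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        c[i] = (a[i] + b[i]) % p;
    return c;
}

// Algorithm 5: multiplication using the identity t_i * t_j = t_i for i <= j.
std::vector<int64_t> mul(const std::vector<int64_t>& a,
                         const std::vector<int64_t>& b, int64_t p) {
    std::size_t rhat = a.size();
    std::vector<int64_t> c(rhat);
    for (std::size_t i = 0; i < rhat; ++i) {
        c[i] = (a[i] * b[i]) % p;
        for (std::size_t j = i + 1; j < rhat; ++j)
            c[i] = (c[i] + a[i] * b[j] + a[j] * b[i]) % p;
    }
    return c;
}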

4.5 Computational Complexity

The complexity of the algorithms used for the implementation of GKS will have an effect on its performance speed. Algorithm 4 carries out addition component-wise in $\hat{r}$ steps, where $\hat{r}$ is the number of generators in the published ciphertext ring. Therefore, addition occurs with a complexity of $\Theta(\hat{r})$.

Multiplication, seen in Algorithm 5, occurs in a nested loop. Line 2 of this algorithm is a step of complexity $O(1)$ which occurs in the outer loop. The inner loop contains one step of complexity $O(1)$ and executes $\hat{r} - i$ times during iteration $i$ of the outer loop. Therefore, the complexity of Algorithm 5 is derived by

$$\sum_{i=1}^{\hat{r}} \left( 1 + (\hat{r} - i) \right) = \frac{\hat{r}(\hat{r} + 1)}{2}$$
and yields a complexity of $O(\hat{r}^2)$.

4.6 Generating Encryptions of Zero

Parameters for the GKS scheme must be selected in order to avoid overflow over the prime modulus $p$ during computation. With this in mind, a second parameter $q \in \mathbb{N}$ must be introduced, where $q \ll p$. During generation of random noise within the encryption algorithm, the random values chosen as coefficients of ring elements should be selected to lie in the range $[0, q)$. The precise size of the parameter $q$ is application-dependent and should be selected so that overflow over the prime modulus $p$ does not occur during homomorphic multiplication.

                   KeyGen   Encrypt  Decrypt  Add      Multiply
 32-bit modulus    1.3186   0.0327   0.0585   0.0001   0.0229
 64-bit modulus    1.3157   0.0337   0.0616   0.0001   0.0299
 128-bit modulus   1.3905   0.0411   0.0665   0.0001   0.0311
 256-bit modulus   1.5719   0.0448   0.0738   0.0001   0.0346
 512-bit modulus   1.6333   0.0450   0.0758   0.0002   0.0361
 1024-bit modulus  1.6373   0.0455   0.0763   0.0002   0.0402

Table 4.1: Average implementation times for the GKS cryptosystem in seconds

4.7 Experimental Performance

Experiments were run to determine average computation speed via GKS using the default parameters combined with the implementation recommendations in this chapter. Operations were carried out over randomly generated values. The times reported in Table 4.1 were generated by repeating the protocol 1,000 times and calculating the average run time. Values for random noise generation were uniformly selected from the range $[0, 2^{n-1})$ for each $n$-bit prime. This is higher than what will generally need to be used, and therefore provides a safe over-estimate of the needed computation time. Tests were run on a MacBook Pro with a 2.3 GHz processor and 16.0 GB memory using C++ and the GNU MP Bignum Library [62] for arbitrary-precision arithmetic.

Table 4.1 shows the experimental results for these algorithms. Figures 4.2 through 4.6 show the bit size of the prime modulus on the horizontal axis and time, in seconds, on the vertical axis. Figure 4.2 shows that key generation is the most costly step, and larger prime modulus values correspond to larger key generation times. However, even with a 1024-bit prime modulus, the average key generation time is below 1.7 seconds. Homomorphic addition, shown in Figure 4.4, is a very fast operation regardless of prime modulus size and should not represent a bottleneck during computation. Figure 4.5 shows homomorphic multiplication, which takes longer than homomorphic addition, with an average computation time of between 0.02 and 0.04 seconds. When many multiplication operations are needed, this could pose a bottleneck to computation.

[Figures 4.2 through 4.5: plots of average computation time in seconds (vertical axis) against prime modulus size in bits (horizontal axis) for each operation.]

Figure 4.2: Key Generation. Figure 4.3: Encryption and Decryption. Figure 4.4: Homomorphic Addition. Figure 4.5: Homomorphic Multiplication.

Figure 4.6: Average Computation Time in GKS

Figure 4.3 shows the encryption and decryption times for plaintext and ciphertext values, respectively. The plaintext values in these experiments were randomly generated by populating the coefficients in the plaintext ring with values in the range $[0, 2^{n-1})$. Encryption is a more computationally intensive process, as it involves randomization steps and generation of random noise. Decryption consists of a series of substitutions using the private key, and the time required remains relatively constant with varying modulus sizes. While encryption and decryption of a value take longer than addition or multiplication, they should not pose a bottleneck within the proposed setting. Specifically, the algorithms proposed and implemented in this document require a database to be encrypted and stored once prior to classification. Decryption is called only a small, fixed number of times during these protocols.

4.7.1 Performance Comparison

ML Confidential [35] implements a leveled homomorphic encryption scheme using a modified version of a scheme presented by Brakerski [7]. With a 128-bit prime modulus, the authors report computation times as seen in Table 4.2.

 Scheme                 KeyGen  Encrypt  Decrypt  Add    Multiply
 ML Confidential [35]   0.279   0.659    0.055    0.001  0.853
 SHIELD [41]            0.27    0.383    0.3      0.006  0.372
 HElib [59, 41]         85.3    0.59     0.39     0.002  3.6

Table 4.2: Average implementation times of existing FHE schemes in seconds

While key generation runs faster in ML Confidential than in the implementation of GKS-FHE in Table 4.1, all other algorithms run more quickly in GKS-FHE. In particular, multiplication takes place in roughly 0.03 seconds in GKS-FHE with a 128-bit prime modulus (Table 4.1) compared to 0.853 seconds in ML Confidential, and addition takes place in 0.0001 seconds. Key generation is typically run only once per protocol, while addition and multiplication may need to be run many times.

Halevi and Shoup's HElib [59], associated with IBM Research, implements the BGV leveled fully homomorphic encryption scheme [11] with the Smart-Vercauteren SIMD ciphertext packing techniques [60] and additional Gentry-Halevi-Smart optimizations [31]. The authors provide preliminary multiplication times ranging from 25.7 seconds to 473 seconds based on varying sizes of key generation parameters [38]. A March 2018 update to the project's GitHub repository claims forthcoming speedups, which make implementation 15 to 75 times

faster.

The implementation times for HElib displayed in Table 4.2 are from HElib speed experiments carried out by Khedr et al. in 2015 [41]. The authors in this work also propose an improvement on the Gentry-Sahai-Waters (GSW) scheme [32] within a framework they title Secure Homomorphic Implementation of Encrypted Data-Classifiers (SHIELD). The authors' reported implementation speed data for this scheme is also presented in Table 4.2. These experiments were run on a CPU with 4 cores, 8 threads, and 32 GB of memory.

Compared to both of these schemes, GKS encryption performs encryption, decryption, homomorphic addition, and homomorphic multiplication significantly faster. Key generation in SHIELD is faster than in GKS, but significantly slower in HElib. As mentioned before, key generation is performed only once, and fast homomorphic addition and multiplication times are key to avoiding computational bottlenecks in real-world applications.

The most recent version of Microsoft's Simple Encrypted Arithmetic Library (SEAL) [43] implements the "FullRNS" variant [4] of the Fan-Vercauteren somewhat homomorphic scheme [25]. Microsoft reports that encoding and encryption of 4096 gray-scale images of 25 × 25 pixels within a single ciphertext occurs in an average of 44.5 seconds, and decryption of these ciphertexts occurs in 3 seconds [33]. This setting is not directly comparable to the experiments on GKS-FHE outlined in Table 4.1. Another application of SEAL reports computation time in SEAL as 0.002 milliseconds for homomorphic addition and 1.514 milliseconds for homomorphic multiplication [3]. Again, this scenario is not directly comparable to the results provided in Table 4.1, as their experimental setup used Microsoft Azure's powerful data centers. Specifically, they ran their tests on Azure's H16 instances with 16-core 3.6 GHz processors and 112 GB of memory.

The utility of GKS can be seen in the computational complexity of the multiplication operation. Homomorphic multiplication in GKS corresponds to polynomial multiplication.

Other ring-based schemes that include noise growth, such as FV [25] as implemented in SEAL [43], require both a multiplication step and a "relinearization" step. While the multiplication step in schemes of this form may at best correspond to polynomial multiplication, the relinearization step adds additional complexity to the homomorphic multiplication operation.

4.8 Conclusion

All in all, the GKS fully homomorphic encryption scheme can be implemented efficiently for operations over real-world data. Computation speeds for key generation, encryption, decryption, addition, and multiplication are competitive with state-of-the-art implementations of public-key FHE. The lack of noise during homomorphic operations in the GKS scheme marks an advantage over leveled and somewhat homomorphic schemes. Furthermore, SIMD may be implemented in GKS for parallelization of computation. Up to $2^n$ distinct values may be packed within a single GKS plaintext in a ring with $n$ generators while preserving the homomorphic addition and multiplication operations. Real and floating-point values can be encoded within the scheme.

Chapter 5

Privacy-Preserving Naive Bayes Classification

A well-known machine learning classifier is Naive Bayes, a model which classifies a new data point using probabilities computed during a training phase. Naive Bayes earns its "naive" title due to the strong independence assumptions it makes between the features of the data. This assumption yields a transparent and understandable method for classification which often outperforms more sophisticated methods [39]. Similarly, when diagnosing a patient, clinicians try to define conditionally independent attributes which could be indicative of a disease [40]. Naive Bayes has outperformed more sophisticated methods and state-of-the-art diagnosis systems on 5 out of 8 real-life medical data sets, including localization of a primary tumor, prediction of recurrence of breast cancer, and rheumatological diagnosis [42], as well as in applications predicting heart disease [40].

Naive Bayes is based on Bayes' Theorem. Given a set of classes $G$ to which a sample $X$ could be assigned, Bayes' Theorem states that

$$\Pr(G = G_i \mid X) = \frac{\Pr(X \mid G = G_i) \Pr(G = G_i)}{\Pr(X)}.$$


In this formula, $\Pr(G = G_i \mid X)$ represents the posterior probability of a class given an attribute, whereas $\Pr(X \mid G = G_i)$ is the likelihood of an attribute given a class. Thus the posterior probability is computed using prior knowledge and observed data. The Naive Bayes model assumes that the features of the data points are conditionally independent within each class. For a class $G = G_i$ and a feature space of dimension $d$ with

$X = (X_1, \ldots, X_d)$, the Naive Bayes model assumes that
$$P(X \mid G = G_i) = \prod_{j=1}^{d} P(X_j \mid G = G_i). \tag{5.1}$$

This assumption does not often hold in the real world, hence the "naive" title. Despite this, Naive Bayes is able to provide fast results which are often better than those of more sophisticated methods when the dimension of the feature space is large. In these situations, assuming that the features of the space are independent can help avoid clumsy density estimations. The Naive Bayes classifier does not consider correlation among features, thus lowering the variance while increasing the bias and hence avoiding overfitting on the training set. Due to this, it has become a benchmark used by the community in data analysis.

5.1 Naive Bayes Classification

The classification of a vector $X$ is given simply by taking the maximum posterior probability over all classes and applying Bayes' Theorem:
$$\begin{aligned}
\hat{G} &= \operatorname*{argmax}_{G_i \in G} \Pr(G = G_i \mid X) \\
&= \operatorname*{argmax}_{G_i \in G} \frac{\Pr(X \mid G = G_i) \Pr(G = G_i)}{\Pr(X)} \\
&= \operatorname*{argmax}_{G_i \in G} \Pr(X \mid G = G_i) \Pr(G = G_i) \\
&= \operatorname*{argmax}_{G_i \in G} \Pr(G = G_i) \prod_{j=1}^{d} \Pr(X_j \mid G = G_i).
\end{aligned}$$

Note that the denominator term $\Pr(X)$ can be removed because $X$ is fixed, while the last step follows from the independence assumption in Equation 5.1.

Commonly, in medical applications each attribute $X_j$ of a vector $X$ can take on only a discrete set of values. Then, all of the information needed to classify an arbitrary data point can be succinctly represented in a collection of vectors and matrices. This representation will enable us to collect all data needed in order to perform a classification, and the ease with which all this data can be represented will be of great use in the protocol which follows. Let $P = (P_i)$ be a vector where
$$P_i = \Pr(G = G_i),$$
and let $T$ be a matrix where the $(i, j)$th entry is
$$T_{i,j} = \Pr(X_j \mid G = G_i).$$

Given tables P and T, any new data point can be classified by looking up the probability for each attribute given each class and computing the argmax of the resulting values.
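To make the table representation concrete, the following is a minimal Python sketch of plaintext classification from the tables P and T; the dissertation's experiments use C++, and the dictionary-based likelihood lookup and toy model values here are illustrative assumptions.

```python
def classify(x, P, T):
    """Classify a discrete feature vector x with a Naive Bayes model.

    P[i]     -- prior probability Pr(G = G_i)
    T[i][j]  -- lookup table mapping a value v of feature j to the
                likelihood Pr(X_j = v | G = G_i)
    Returns the index of the class maximizing the posterior
    (Pr(X) is constant over classes and may be dropped).
    """
    scores = []
    for i in range(len(P)):
        score = P[i]
        for j, v in enumerate(x):
            score *= T[i][j][v]          # multiply in Pr(X_j = v | G_i)
        scores.append(score)
    return scores.index(max(scores))

# Toy model: 2 classes, 2 features, feature values in {1, 2}.
P = [0.6, 0.4]
T = [
    [{1: 0.9, 2: 0.1}, {1: 0.7, 2: 0.3}],   # likelihoods for class 0
    [{1: 0.2, 2: 0.8}, {1: 0.4, 2: 0.6}],   # likelihoods for class 1
]
print(classify((2, 2), P, T))                # prints 1
```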

The Model Owner creates a learned model using Naive Bayes.
The Model Owner encrypts her learned model using a fully homomorphic private-key encryption scheme.
The Model Owner publishes her encrypted Naive Bayes learned model.
The Client calculates his (encrypted) class probabilities using his private data point.
The Client determines his final classification using a privacy-preserving argmax protocol.

Figure 5.1: Private Naive Bayes Classification

Bost et al. implement private Naive Bayes classification between a Client and a Service Provider [6]. The Client has a single data vector X which he would like to keep private. The Service Provider owns a private model w which she would like to keep private. Given a classification algorithm C, the Client should be able to learn C(X, w), the classification of his vector X using the model w, without learning any partial information about the model or giving away any information about his input.

Let $pk_Q, sk_Q$ denote a public and secret key pair in the QR scheme and $pk_P, sk_P$ denote the same in Paillier. Let double square brackets $\llbracket a \rrbracket_P$ denote the encryption of a value $a$ under Paillier. The security of these schemes provides semantic security for the authors' protocols, and an honest-but-curious adversary model is used. Assume each vector X has d features, each of which can take on a finite number of possible values, and that there are c possible classes. The authors' protocol runs as follows:

1: The Service Provider encrypts the tables P and T using Paillier.

2: The Service Provider sends the encrypted tables to the Client.

3: The Client computes $\llbracket p_i \rrbracket_P = \llbracket P_i \rrbracket_P \prod_{j=1}^{d} \llbracket T_{i,j}(X_j) \rrbracket_P$ for $i = 1, \dots, c$.

4: The Client uses the Service Provider to compute $i = \operatorname{argmax}_i p_i$.

5: Client outputs i.

Note that Step 3 of the above protocol requires an encryption scheme which is multiplicatively homomorphic. To get around this requirement, Bost et al. implement the model using the logarithm of the probability distributions, where $p_i = \log(\Pr(G = G_i \mid X))$. Therefore, the posterior probabilities in Step 3 are computed using Paillier's additively homomorphic property. Computing the argmax in Step 4 requires an additively homomorphic scheme. Bost et al. use Paillier and QR, combined with an algorithm for changing the encryption scheme, in order to compute the argmax. In the next section, we describe our adapted version of the protocol, which performs classification using a single encryption scheme.

5.2 Proposed Method for Fully Homomorphic Naive Bayes Classification

The protocol presented below varies in several important ways from the previous protocol. Use of a fully homomorphic scheme adds flexibility to the computation by allowing the posterior probabilities to be computed using homomorphic multiplication, or, in the logarithmic model, using homomorphic addition. Furthermore, the presented model assumes direct communication between the Model Owner and the Client; there is no trusted server acting as intermediary between the parties. The protocol could easily be adapted to include a Server to carry out computations, if desired.

The protocol is designed as follows. Assume that a Client wishes to classify his vector X, which contains d features, based on a learned model w owned by the Model Owner. The group of classes $\mathcal{G}$ contains c distinct classes, $G_1, \dots, G_c$. During this protocol the Model Owner should learn no unnecessary information about the input provided by the Client, and the Client should learn nothing but the predicted class index of X. There is a certain amount of information which the Client must know in order to carry out the protocol: namely, that the data vector X has d features and that there are exactly c classes. However, he should not be able to deduce any information about the conditional class probabilities associated with the d features or the c class probabilities.

Because the Model Owner has already computed a learned model, she prepares two tables. First, the Model Owner prepares the table P, represented as a column vector of length c where $P_i = \Pr(G = G_i)$, the prior probability of class $G_i$. Next she prepares a table T, where entry $T_{ij}$ represents $\Pr(X_j \mid G = G_i)$. The protocol is given in Algorithm 6.

Algorithm 6 Fully Homomorphic Private Naive Bayes Classifier

Client Input: A data point X.
Model Owner Input: A learned Naive Bayes model with tables P and T of prior class probabilities and likelihoods, respectively.
Client Output: Classification of the data point X under the Model Owner's model.

1: The Model Owner prepares the tables P and T and sends their encryptions, $\llbracket P \rrbracket$ and $\llbracket T \rrbracket$, to the Client.

2: For each class $G_i$, for $i$ from 1 to $c$, the Client is able to compute

$$\begin{aligned}
\llbracket \Pr(G_i \mid X) \rrbracket &= \llbracket \Pr(G_i) \rrbracket \cdot \prod_{j=1}^{d} \llbracket \Pr(X_j \mid G_i) \rrbracket \\
&= \llbracket P_i \rrbracket \cdot \prod_{j=1}^{d} \llbracket T_{ij} \rrbracket \\
&= \Big\llbracket\, P_i \cdot \prod_{j=1}^{d} T_{ij} \,\Big\rrbracket.
\end{aligned}$$

3: The Client computes

$$i = \operatorname*{argmax}_{1 \le i \le c} \llbracket \Pr(G_i \mid X) \rrbracket$$

using the private argmax protocol described below in Algorithm 8.

If the model was designed using logarithms of the probabilities, then the multiplication operations in Step 2 should be replaced with addition. There are two parties to consider when discussing the security of this protocol: the privacy

of the Client's information as well as the privacy of the Model Owner's learned model. The privacy of the learned model is derived entirely from the security of the encryption scheme used. Discussion of the privacy of the Client's information, however, requires knowledge of the argmax protocol called in Step 3. This argmax protocol is outlined in Section 5.2.1, and the overall security of the protocol is discussed in Section 5.3.

5.2.1 Privacy-Preserving Argmax Protocol

Attempts at the argmax protocol that contain security leaks are first described in order to build intuition for working up to a secure framework. Denote $\mathcal{E}_i = \llbracket \Pr(G_i \mid X) \rrbracket$, and denote the set of all $\mathcal{E}_i$ as $\mathcal{E}$.

First, suppose the Client sends the set $\mathcal{E}$ of encrypted probability values to the Model Owner. Then, the Model Owner decrypts each value and sends the Client the index of the highest value using asymmetric encryption. While the Client has learned nothing about the model, in this scenario the Model Owner learns not only which class the Client's data point belongs to but also the exact probabilities for each class.

Suppose instead that the Client performs a permutation $\pi$ on the class probabilities, then sends $\pi(\mathcal{E})$ to the Model Owner. Then, the Model Owner decrypts the values, determines which is largest, and sends that index to the Client, who reverses the permutation to determine his class. It is not necessary to hide the value of the index sent to the Client from any eavesdroppers in this scenario, as the permutation $\pi$ randomizes the indexes. However, while this method prevents the Model Owner from trivially determining which probability is associated with each class, it does not prevent her from learning the class probabilities themselves.

Another naive attempt at argmax is considered before presenting the secure protocol. Algorithm 7 breaks down the computation of argmax into a series of comparisons.

Algorithm 7 A naive attempt at private argmax

Client Input: A set of indexes $I = \{1, 2, \dots, c\}$, a family $\mathcal{F}$ of additively homomorphic monotone functions, and a set of c values encrypted under the Model Owner's private key.
Client Output: The argmax of the input values, i.e., the index of the largest of the c encrypted input values.

1: Set $I = \{1, 2, \dots, c\}$.
2: while $|I| > 1$ do
3: The Client computes a random permutation $\pi$ on $I$.
4: The Client computes $\mathcal{E}^* = \mathcal{E}_{\pi(1)} - \mathcal{E}_{\pi(2)}$. Because the encryption scheme is additively homomorphic,

$$\mathcal{E}_{\pi(1)} - \mathcal{E}_{\pi(2)} = \llbracket P_{\pi(1)} - P_{\pi(2)} \rrbracket.$$

5: The Client sends $\mathcal{E}^*$ to the Model Owner.
6: The Model Owner decrypts $\mathcal{E}^*$ and recovers

$$p = P_{\pi(1)} - P_{\pi(2)} = \Pr(G_{\pi(1)} \mid X) - \Pr(G_{\pi(2)} \mid X).$$

If this value is negative, the Model Owner sends the bit b = 0 to the Client; otherwise she sends b = 1.
7: If b = 0, the Client removes $\pi(1)$ from $I$. Otherwise he removes $\pi(2)$.
8: end while
9: The Client returns $I$.

Because the value computed by the Client is random, the bit 0 or 1 will appear random to any observer. However, the Model Owner recovers some partial information about the Client's data. Namely, she recovers a collection of $c - 1$ values representing the differences between pairs of the posterior probabilities for different $i \in \{1, 2, \dots, c\}$. While the permutation keeps the Model Owner from knowing which pairs of elements correspond to which values, it is not outside the realm of possibility that she could carry out an attack on the Client's private data using this information.

The following problem remains: the Client needs to compare two values encrypted under the Model Owner's private key. Previous approaches to this problem focus on public-key encryption, where the Client can mask his value with random noise encrypted under the Model Owner's public key. However, this approach will not work with private-key encryption because there is no way for the Client to encrypt random noise.

The proposed method for comparison in the private-key setting uses the fully homomorphic properties of the encryption scheme. Using the homomorphic properties of encryption, a family $\mathcal{F}$ of additively homomorphic, monotone functions that commute with encryption is constructed. These functions are used to randomize the values sent to the Model Owner. Say that a function $f: \mathbb{R} \to \mathbb{R}$ commutes with encryption if

$$f(\llbracket m \rrbracket) = \llbracket f(m) \rrbracket.$$

Furthermore, the additively homomorphic property of $f$ guarantees that $f(m + n) = f(m) + f(n)$. The outline of the protocol that uses additively homomorphic encryption and a monotone function to return the argmax is given in Algorithm 8. Possible families of additively homomorphic monotone functions that can be used with the GKS encryption system are then discussed in Section 5.4.
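As a concrete instance, anticipating the family used in Section 5.4, take $f(m) = km$ for a random positive integer $k$. Additivity of $f$ gives

$$f(P_{\pi(1)}) - f(P_{\pi(2)}) = k P_{\pi(1)} - k P_{\pi(2)} = k\left(P_{\pi(1)} - P_{\pi(2)}\right),$$

so the value the Model Owner decrypts has the same sign as the difference of the posterior probabilities, while the unknown random factor $k$ hides the magnitude of that difference.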

Algorithm 8 Privacy-preserving argmax

Client Input: A set of indexes $I = \{1, 2, \dots, c\}$, a family $\mathcal{F}$ of additively homomorphic monotone functions, and a set of c values encrypted under the Model Owner's private key.
Client Output: The argmax of the input values, i.e., the index of the largest of the c encrypted input values.

1: while $|I| > 1$ do
2: The Client computes a random permutation $\pi$ on $I$.
3: The Client randomly chooses $f \leftarrow \mathcal{F}$ and computes the values $f(\mathcal{E}_{\pi(1)})$ and $f(\mathcal{E}_{\pi(2)})$.
4: The Client uses the additive homomorphic properties of the encryption scheme and of $f$, as well as the commutativity of $f$ with encryption, to evaluate

$$\begin{aligned}
\mathcal{E}^* &= f(\mathcal{E}_{\pi(1)}) - f(\mathcal{E}_{\pi(2)}) \\
&= \llbracket f(P_{\pi(1)}) \rrbracket - \llbracket f(P_{\pi(2)}) \rrbracket \\
&= \llbracket f(P_{\pi(1)} - P_{\pi(2)}) \rrbracket.
\end{aligned}$$

5: The Client sends $\mathcal{E}^*$ to the Model Owner.
6: The Model Owner decrypts $\mathcal{E}^*$ and recovers $f(P_{\pi(1)} - P_{\pi(2)})$. If this value is negative, the Model Owner sends the bit b = 0 to the Client; otherwise she sends b = 1.
7: If b = 0, the Client removes $\pi(1)$ from $I$. Otherwise the Client removes $\pi(2)$.
8: end while
9: The Client returns $I$.

During this protocol, the Model Owner collects $c - 1$ values representing the result of a monotone function applied to the difference between random pairs of the posterior probabilities. The application of an unknown monotone function to this difference prevents the Model Owner from learning partial information about the Client's values.
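The control flow of Algorithm 8 can be traced in a short Python sketch. The Ct class below is a stand-in for a real additively homomorphic private-key scheme such as GKS (its payload is stored in the clear purely so the example runs), and model_owner_compare plays the Model Owner's decrypt-and-compare role; both are illustrative assumptions rather than the dissertation's implementation.

```python
import random

class Ct:
    """Stand-in ciphertext for an additively homomorphic scheme."""
    def __init__(self, m):
        self._m = m
    def __sub__(self, other):            # [[a]] - [[b]] = [[a - b]]
        return Ct(self._m - other._m)
    def __rmul__(self, k):               # k*[[a]] = [[k*a]], i.e. f(m) = km
        return Ct(k * self._m)

def model_owner_compare(ct):
    """Model Owner decrypts f(P_a - P_b) and reveals only its sign bit."""
    return 0 if ct._m < 0 else 1

def private_argmax(cts):
    """Client side of Algorithm 8: index of the largest encrypted value."""
    I = list(range(len(cts)))
    while len(I) > 1:
        random.shuffle(I)                      # random permutation pi
        a, b = I[0], I[1]
        k = random.randrange(1, 2**20)         # f(m) = k*m: monotone, additive
        masked = k * cts[a] - k * cts[b]       # = [[f(P_a - P_b)]]
        bit = model_owner_compare(masked)      # one round of communication
        I.remove(a if bit == 0 else b)         # discard the smaller index
    return I[0]

probs = [0.10, 0.55, 0.35]
assert private_argmax([Ct(p) for p in probs]) == 1
```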

5.3 Security

The proposed protocol is secure in the honest-but-curious setting. In this setting, all parties are assumed to take part in the protocol honestly. However, they will use any information they can obtain during an honest execution of the protocol in order to attempt to deduce further information. In this model, parties should not be able to learn any information beyond their protocol output. During protocol execution, the Client only has access to encrypted values. The private argmax protocol allows the Client to determine which encrypted value is the largest out of a set of values, but does not give him access to the actual values. Therefore, the security of the Model Owner’s data is reliant upon the security of the fully homomorphic encryption scheme that is implemented. The GKS scheme is secure against ciphertext-only attack [36]. The only information the Model Owner receives during protocol execution is obtained during the private argmax protocol. The Model Owner receives the value

$$f(P_{\pi(1)} - P_{\pi(2)})$$

during each elimination round of the argmax protocol. Because of the permutation $\pi$, she does not know which posterior probabilities are being compared, or, in the case of binary classification, in what order they are being compared. Furthermore, the function $f$, selected from a family $\mathcal{F}$ of additively homomorphic, monotone functions that commute with encryption, hides the exact value of the difference between the randomized posterior probabilities. Therefore, at the end of protocol execution, the Model Owner is unable to determine any ordering on the posterior probabilities the Client has computed, and cannot determine his final classification.

5.4 Implementation

The proposed protocol was run using the GKS encryption scheme, where coefficients in $\mathbb{Z}_p$ were taken from $(-p/2, p/2]$ and correspond to positive and negative integers, and

$$\mathcal{F} = \{ f : \mathbb{R} \to \mathbb{R} : f(m) = km \}$$

for a randomly chosen integer $k$ of up to 20 bits. The implementation used a 198-bit prime $p$. Experiments were implemented in C++ on a MacBook Pro running OS X El Capitan, with a 2.3 GHz Intel Core i7 and 16 GB of memory. The GNU Multiple Precision Library (GMP) [62] was used to allow for integer storage above the built-in data type limits in C++. An implementation of the Naive Bayes algorithm created a learned model, which was encrypted using the GKS encryption scheme. Then both encrypted and unencrypted classification of data points not included in the training set was carried out. Section 5.5 describes the results of these experiments.

5.5 Evaluation

Data from the UCI Machine Learning Repository was used to test the performance of the protocols [45]. Specifically, the Breast Cancer Wisconsin (Original) Data Set was used, which contains 683 complete data points, each containing an ID along with 9 attributes and a binary classification.

             Accuracy  Sensitivity  Specificity  Precision  NPV      F1-score
Mean         0.96007   0.93398      0.97414      0.95299    0.96546  0.94224
Stand. Dev.  0.02077   0.04861      0.02439      0.04255    0.02469  0.03026

Table 5.1: Naive Bayes Classification Results

The data gives measurements taken from fine-needle aspirate (FNA) biopsies of benign and malignant breast tumors. The nine attributes are clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. Each attribute was measured by a clinician on a scale of 1 to 10 at the time it was collected, with lower values corresponding to those expected in a benign case and higher values corresponding to those expected in a malignant case. Previous research has found that while each measurement holds clinical significance in diagnosing a breast tumor as benign or as malignant, a single attribute is not enough to distinguish between the two cases [67].

A Naive Bayes algorithm was implemented to create a learned model, which was then encrypted using the GKS encryption scheme. Experiments evaluated the performance of the model using 10 by 10-fold cross validation. Furthermore, because the data set is unbalanced, with a higher proportion of benign tumors, random oversampling of malignant tumors was implemented while training the models. A positive case denotes a case where the tumor is diagnosed as malignant, and a negative case refers to a tumor which is benign.

Additive smoothing on the values in the tables of class and prior probabilities was implemented during both encrypted and unencrypted testing to prevent error in the case of zero or near-zero probabilities. Specifically, each probability was increased by 0.1, and any value which was greater than or equal to 1 after smoothing was reset to 0.999. The size of the ciphertext ring in the experiments was $2^8 + 50 = 306$, and the prime modulus $p$ was 198 bits. Classification was tested on data points not included in the training set in both encrypted and unencrypted formats.

             Mean Time
Unencrypted  0.00001
Encrypted    0.40298

Table 5.2: Mean Classification Time, In Seconds

Tables 5.1 and 5.2 show a comparison of the average performance of the encrypted and unencrypted experiments under 10 by 10-fold cross validation. Results include classification time for a single data point in seconds, accuracy, sensitivity, specificity, precision, negative predictive value (NPV), and the F1 score. The model in the experiments was trained with five decimal points of precision, and the unencrypted experimental results were obtained with five decimal points of precision. In the encrypted experiments all values were initially encrypted with three decimal points of precision. No change occurred in the reported statistics due to this change in decimal precision.

In the case of breast cancer specifically, it is crucial to positively identify all malignant tumors for timely medical intervention. The results yield high precision and high sensitivity, meaning there is a low rate of false positives and a low rate of false negatives. Specificity describes the proportion of negative cases which were classified as negative, and NPV describes the proportion of cases classified as negative which were negative in reality. The results show both high specificity and high NPV, pointing to a high rate of true negatives and a low rate of false negatives. The experiments also yielded a high F1 score, which takes into account both the precision and the recall of the classifier.

5.6 Discussion

The proposed method performs classification in an average of 0.40298 seconds, compared to 0.479 seconds per classification in the experiments of Bost et al. [6]. Private classification methods for more sophisticated classifiers are even more time consuming. Wu et al. perform private decision tree classification on a tree with 12 decision nodes in approximately 0.545 seconds [68]. Bost et al. report average decision tree classification times on a 4-node tree of approximately 2.085 seconds [6]. Rahulamathavan et al. perform private SVM classification on the same breast cancer data set in an average of 7.71 seconds [54]. The time increase between encrypted and unencrypted computation is expected and occurs in all current fully homomorphic encryption methods. For the example provided above, where a single user wishes to classify their data, classification in under half a second is within a reasonable time range for medical applications.

5.6.1 Computational Bottlenecks

The computational bottleneck within fully homomorphic encryption schemes occurs during homomorphic multiplication operations. Chapter 4 provides results which show that the GKS scheme is relatively efficient at performing fully homomorphic multiplication operations. The amount of time a clinician may expect this protocol to take will depend upon the number of classes and the number of features within the data set being analyzed, as this determines the number of homomorphic multiplication operations which must be performed. Consider a data set with d features and c classes. For each data point classification, a total of (d + 1) homomorphic multiplication operations must be performed in order to calculate the posterior probability, calculated as

$$\llbracket \Pr(G_i \mid X) \rrbracket = \llbracket \Pr(G_i) \rrbracket \cdot \prod_{j=1}^{d} \llbracket \Pr(X_j \mid G_i) \rrbracket$$

for each of the c classes. This results in a total of $c(d + 1)$ multiplication operations per classification. For the Wisconsin data set used above, with d = 9 features and c = 2 classes, this amounts to $2 \cdot 10 = 20$ homomorphic multiplications per classification.

Communication can also pose a bottleneck to computation, depending on the speed of the connection between the Client and Model Owner. The argmax protocol presented requires that the Model Owner communicate with the Client a total of $c - 1$ times, and that the Client communicate with the Model Owner a total of $c - 1$ times. In the common case of binary classification, this is only one communication from Model Owner to Client and vice versa.

5.6.2 Comparison to Other Classifiers

Naive Bayes often outperforms more sophisticated classification methods for medical diagnosis. Examples include prediction of heart disease [40] and rheumatological diagnosis [42]. Classification on the Wisconsin breast cancer data set using decision tree, support vector machine (SVM), k-nearest neighbors, and logistic regression classifiers was implemented in Python 3 under 10-fold cross validation for comparison of classifier performance [39]. Decision trees were limited to a maximum depth of 5, a minimum of 3 samples per split, and a minimum of 3 samples per leaf. Gini impurity was used to measure the quality of the split during training. Privacy-preserving decision tree classification is discussed in Chapter 7. The support vector machine was implemented with a linear kernel. The k-nearest neighbors classifier was implemented using the ball tree algorithm with 5 neighbors, a leaf size of 30, and uniformly weighted points, using the Euclidean distance measure. The logistic regression model was built using the LIBLINEAR algorithm for optimization of the learning function [44]. Performance of the classifiers is given in Table 5.3, with the highest score for each metric highlighted in red. Naive Bayes provided the best performance for the negative predictive value. While other classifiers outperform Naive Bayes on the other metrics, Naive Bayes remains competitive, especially when the relative speed of privacy-preserving Naive Bayes is considered.

                      Accuracy  Sensitivity  Specificity  Precision  NPV
Naive Bayes           0.96003   0.93389      0.97410      0.95100    0.96476
Decision Tree         0.96628   0.96171      0.97500      0.98699    0.93487
Linear SVM            0.97069   0.96843      0.97500      0.98651    0.94500
k-Nearest Neighbors   0.97065   0.96838      0.97482      0.98646    0.94544
Logistic regression   0.96920   0.97298      0.96232      0.98017    0.95276

Table 5.3: Comparison to other machine learning classifiers, with the highest score for each metric in red
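The comparison setup can be sketched as follows, assuming scikit-learn (the text specifies Python 3 but not the library); parameters not stated above are left at their defaults, and X, y stand for the 683 x 9 Wisconsin feature matrix and its binary labels.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

classifiers = {
    "Decision Tree": DecisionTreeClassifier(
        criterion="gini", max_depth=5,
        min_samples_split=3, min_samples_leaf=3),
    "Linear SVM": SVC(kernel="linear"),
    "k-Nearest Neighbors": KNeighborsClassifier(
        n_neighbors=5, algorithm="ball_tree", leaf_size=30,
        weights="uniform", metric="euclidean"),
    "Logistic regression": LogisticRegression(solver="liblinear"),
}

def compare(X, y):
    """Report mean 10-fold cross-validated accuracy for each classifier."""
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=10)
        print(f"{name}: {scores.mean():.5f}")
```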

5.7 Conclusions

This first experiment suggests that private-key fully homomorphic encryption can classify medical data efficiently. Similar techniques could enable researchers and clinicians to utilize private medical data and models which they cannot access in the clear. Hospitals and companies with trained assisted-diagnosis systems could use this technology to provide access to their models without giving away their parameters, and doctors could use this software without revealing their patients' information. Medical researchers can use these methods to determine whether their models are overfit to their own data sets. In the next chapter, a protocol for third-party private search is introduced. This protocol is implemented in Chapter 7 for classification via decision trees, further showing the utility of private-key FHE for classifying medical data.

Chapter 6

Third Party Private Search via Fully Homomorphic Encryption

6.1 Introduction

Private search describes a variety of multi-party protocols which involve the querying of encrypted or otherwise privatized data. Private search models follow multiple paradigms. Data outsourcing describes the model where a data owner stores their database on a server in encrypted form and wants to search it later themselves. This model does not seek to hide the database contents from the querying party. In the data sharing model, also known as Privacy-Preserving Sharing of Sensitive Information (PPSSI) [19], Secure Anonymous Database Search (SADS) [55], and Private Information Retrieval (PIR) [15], a data owner would like to allow a client to search their database without revealing their data. This model is less common and seeks to hide the database contents from the querying party [49]. Different contexts require different query outcomes. For instance, a private search protocol can return the count of the number of times a value appears in a database, content within the database related to the query, or a Boolean denoting the presence of a value in the


database. The security guarantee needed will vary based on the participants in the scenario and the desired outcome of the protocol.

Private search has a variety of real-world applications. The data outsourcing model can be used by law enforcement for cyber security, allowing an institution to store all sensitive data in an encrypted form while allowing investigators to query the information [14]. If this institution is reluctant to divulge the contents of their search, the data sharing model would enable an investigator to perform a private query on an encrypted database. Similarly, private search has applications in the realm of medical data, where legal restrictions such as HIPAA combined with the sensitive, identifying nature of the data make data privacy a priority. For example, private search would enable a medical institution to let an outside researcher query the medical information database for the presence of some feature in order to determine whether they would like to proceed with the time-consuming process of applying to an institutional review board (IRB) for access to that data. Private search also has application in private medical information system protocols. An application of private search within a decision tree classifier is provided in the next chapter, Chapter 7.

Raykova et al. implement what they call exact keyword match in the secure anonymous database search (SADS) setting. Their research focuses on searching for keywords within documents. Therefore, their method has a strong focus on feature extraction and natural language processing in order to remove exact match requirements such as capitalization restrictions. The method requires the setup of a Bloom filter for each document in the database, which is built from the encryptions of all words in the document. These Bloom filters are then used in conjunction with encryption methods in order to perform private keyword search within the documents. The authors propose a variety of decision function and feature extraction combinations, each of which has varying success at locating all documents containing a phrase and adaptable false positive and false negative thresholds. Feature extraction time and query time are not discussed explicitly; query time is described

as efficient.

Pappas et al. implement querying of protected data with partially trusted parties. Their work focuses on secure document retrieval, and in the process implements an improvement on the SADS system discussed in the previous paragraph. Their model sacrifices some privacy for increased efficiency; specifically, the authors allow partially trusted parties. Angel et al. use fully homomorphic encryption in the private information retrieval model in order to allow a Client to download a file from a Server without the Server knowing which file they downloaded [3].

This chapter describes a method for performing the data sharing model of third party private search with a binary outcome denoting the presence of an exact match. In this scenario, a Database Owner has a private database which a Client wishes to privately query. Two scenarios are examined. In the first scenario, a third party private search protocol takes place between a Client and a Data Owner. This model has the advantage of performing more quickly, and the disadvantage of requiring the Data Owner to store a cleartext database. The second scenario takes place between a Client and a Server which stores a Data Owner's encrypted database. While this protocol is less time-efficient, it has the sometimes necessary advantage of storing the data only in encrypted form. The desired outcome in both models is a Boolean value denoting the presence (or lack of presence) of the queried value within the database.

With higher security guarantees, a tradeoff between security and efficiency is often unavoidable [55]. Security relaxations are appropriate in some settings, such as the secure keyword search within a database of documents discussed in the papers referenced above, but these relaxations may not be appropriate when working with medical data due to the high level of privacy required. The protocol proposed in this chapter does not implement any relaxation of security requirements.

Stronger security requirements lead to higher running times than methods with relaxed security requirements. However, experiments show that this scheme can be implemented in an efficient manner for many applications. Security requirements are kept strong due to potential applications in computational medicine, such as the work in Chapter 7.

6.2 Model

In the unencrypted setting, the Database Owner does not learn the Client's value. This setting is seen in Figure 6.1. A Client privately queries a database held by the Database Owner, who privately provides a query response. The Client does not learn anything about the contents of the database other than whether or not his value is contained within it.

The encrypted setting is the same as above, with the additional restriction that the Server does not learn the querying party's value or any unnecessary information about the Data Owner's information. This setting is seen in Figure 6.2. The Server can learn the number of values within the database by observing the number of ciphertexts. The Data Owner can hide the number of values in their database by randomly including additional values; instead of learning the exact number of values in the database, the Server then learns only the maximum number of values which may be contained in the database.

Figure 6.1: Third-Party Private Search

Figure 6.2: Third-Party Private Search via a Server

Figure 6.3: Secure Modular Reduction

6.3 Building Blocks

The presented third-party private search protocol implements a number of cryptographic techniques in order to carry out its operations; in particular, the TPPS protocol implements fully homomorphic encryption and secure modular reduction during execution. As FHE was detailed in Chapter 2, this section begins by describing secure modular reduction and finishes by defining the proposed TPPS protocol.

6.3.1 Secure Modular Reduction (SMR)

Secure modular reduction describes a class of algorithms which allow for secure evaluation of modular reduction between two parties, one of whom possesses the modulus while the other possesses an input value to be reduced. The security requirements vary based upon the application. In the proposed protocol, secure modular reduction is implemented with both a private modulus and a private input value. Figure 6.3 illustrates the SMR protocol implemented within the proposed TPPS.

Specifically, the Client provides a modulus M and the Server provides an input X. The Client receives X (mod M) without learning the value of X or revealing the value of M to the Server.

6.4 Third Party Private Search (TPPS) Protocol

The first version of the proposed protocol is implemented between the Client and the Database Owner directly. Let D be a database with entries of at most $q_1$ bits. Let $q_2$ be an integer such that $q_2 > q_1$. The specifics of parameter selection for $q_1$ and $q_2$ are discussed further in Section 6.9.

Algorithm 9 Third Party Private Search

Client Input: A $q_1$-bit integer x.
Database Owner Input: The database D, with entries of size at most $q_1$ bits each.
Client Output: A Boolean b, where b = 1 if $x \in D$.

1: The Client randomly selects a $q_2$-bit integer r and sends $\bar{x} = x + r$ to the Database Owner.

2: The Database Owner computes $y = \prod_{d \in D} (\bar{x} - d)$.

3: The Client and Database Owner perform Secure Modular Reduction to return

$$\bar{y} = y \pmod{r}$$

to the Client. If $\bar{y} = 0$, output 1; otherwise output 0.

Correct evaluation occurs with some probability depending on the parameters $q_1$ and $q_2$. The correctness of the above protocol is discussed in Section 6.6, and suggestions for parameter selection are presented in Section 6.9.
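The arithmetic of Algorithm 9 can be simulated end-to-end in a single process; in the Python sketch below the Secure Modular Reduction step is collapsed into a local reduction modulo r, and the bit lengths are illustrative choices rather than recommended parameters.

```python
import random
from functools import reduce

Q1, Q2 = 16, 48          # q1 (entry size) and q2 (mask size), q2 > q1

def membership_query(x, db):
    """Simulate Algorithm 9: returns 1 if x appears to be in db."""
    r = random.randrange(2**(Q2 - 1), 2**Q2)   # q2-bit one-time mask
    x_bar = x + r                              # value sent to Database Owner
    y = reduce(lambda acc, d: acc * (x_bar - d), db, 1)
    return 1 if y % r == 0 else 0              # SMR would return y mod r

db = [random.getrandbits(Q1) for _ in range(10)]
assert membership_query(db[3], db) == 1        # members always return 1;
                                               # non-members rarely do (Sec. 6.9)
```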

6.5 TPPS Over Encrypted Data

With the rise of cloud computing it is increasingly common for databases to be stored on a Server. With sensitive data such as medical data, the values will first be encrypted, then stored. By using fully homomorphic encryption, the Server can both store and perform operations over sensitive data. The next version of the proposed protocol is implemented between the Client, the Database Owner, and a Server who handles the bulk of the computational and storage tasks.

Algorithm 10 Third Party Private Search Over Encrypted Database

Client Input: A $q_1$-bit integer x.
Server Input: The encrypted database $\llbracket D \rrbracket$, encrypted under a fully homomorphic encryption scheme.
Database Owner Input: The private encryption/decryption key, k, for the database D.
Client Output: A Boolean b, where b = 1 if $x \in D$.

1: The Client randomly selects a $q_2$-bit integer r and sends $\bar{x} = x + r$ to the Database Owner.
2: The Database Owner encrypts $\bar{x}$ under her private key and sends $\llbracket \bar{x} \rrbracket$ to the Server.
3: The Server computes

$$\llbracket y \rrbracket = \prod_{d \in D} (\llbracket \bar{x} \rrbracket - \llbracket d \rrbracket) = \Big\llbracket \prod_{d \in D} (\bar{x} - d) \Big\rrbracket$$

and sends $\llbracket y \rrbracket$ to the Database Owner.
4: The Database Owner decrypts $\llbracket y \rrbracket$ to obtain y.
5: The Client and Database Owner perform Secure Modular Reduction to return

$$\bar{y} = y \pmod{r}$$

to the Client. If $\bar{y} = 0$, output 1; otherwise output 0.

6.6 Correctness

In the above protocol, if a value is present in the database, then 1 is guaranteed to be returned. This is because the product

$$y = \prod_{d \in D} (\bar{x} - d) = \prod_{d \in D} (x + r - d)$$

contains the term $x + r - x = r$ for some $d = x \in D$, and thus y equals $rP$ for some integer value P. Because of this, $rP \pmod{r}$ will always yield 0, so the protocol outputs 1 and the Client receives a true positive.

On the other hand, it is quite possible for a false positive to be returned: if $x \notin D$, the product y may still be divisible by r. While requiring r to be a prime could resolve this issue, it would introduce an unacceptable security flaw. The Database Owner could simply find the closest prime numbers to the value sent by the Client in order to determine a small set of candidates for the Client's original value. Therefore, r is allowed to be uniformly random, and in Section 6.9 we provide simulation results and suggestions for parameter sizes to minimize the likelihood of a false positive occurring.

6.7 Security

Security for the proposed protocol is examined in the honest-but-curious model. Because the goal of the protocol is data sharing, it is necessary that some information will be leaked during execution of the protocol. Therefore, we discuss security of the protocol as well as what restrictions must be put in place to avoid too much information being shared. How much information is "too much" will depend on the setting. In the proposed setting, the information which is shared with the Client and the Server is the size of the Database. The

Data Owner should place a limit on the number of queries a single Client is able to execute based on the maximum number of values contained in the database she wishes for the Client to be able to receive. On the other hand, the Data Owner does not learn what values the Client queries for within the database, but does learn the number of queries made by the Client. The security of the Client’s original value x is protected by the one-time pad r. Because

r is a uniformly random $q_2$-bit value, the value $x + r$ will appear random as long as $x + r$ is still $q_2$ bits. As $q_2 \gg q_1$, overflow is a rare scenario that can easily be checked for and avoided during computation. In Algorithm 10, the Server maintains access to the encrypted database. The security of the database therefore depends on the security of the encryption implemented. The suggested scheme, the GKS scheme [36], is secure against a ciphertext-only attack and hence satisfies the security requirements of the protocol. In both protocols, the Client never has direct access to the database in any form. All the Client receives is the value $\bar{y}$ at the end of the protocol. The privacy of this value depends on the security of the Secure Modular Reduction protocol. The SMR protocol suggested in Section 6.8 is secure in the honest-but-curious model.

6.8 Implementation

Implementation requires selection of SMR and FHE protocols that satisfy the necessary security requirements. Below we provide an overview of the selected protocols.

6.8.1 Secure Modular Reduction

The selected SMR protocol [65] implements the Paillier cryptosystem [48], a partially homomorphic encryption scheme. Let $\llbracket c \rrbracket_P$ denote the encryption of a value c under Paillier. This scheme satisfies the properties that

- $\llbracket c_1 \rrbracket_P \cdot \llbracket c_2 \rrbracket_P = \llbracket c_1 + c_2 \rrbracket_P$

- $\llbracket c \rrbracket_P^{\,a} = \llbracket ac \rrbracket_P$

for all $c_1$, $c_2$, and a. In this protocol the Client possesses the private modulus b while the Server has the private value a. The Client receives $a \pmod{b}$ without learning the value of a or revealing the value of b to the Server.

Algorithm 11 Secure Modular Reduction [65]

Client Input: An integer b.
Server Input: An integer a.
Client Output: $a \bmod b$.

1: The Client generates the public key, secret key pair for Paillier encryption and shares the public key with the Server.
2: The Client sends $\llbracket b \rrbracket_P$ to the Server.
3: The Server chooses random integers $r_d$ of $(\log_2 N - 1 - \log_2 a)$ bits and $r_m$ of $\log_2 a$ bits, and computes $\llbracket r \rrbracket_P = \llbracket b \rrbracket_P^{\,r_d} \cdot \llbracket r_m \rrbracket_P = \llbracket r_d b + r_m \rrbracket_P$. It then sends

$$\llbracket z \rrbracket_P = \llbracket a \rrbracket_P \cdot \llbracket r \rrbracket_P = \llbracket a + r \rrbracket_P$$

to the Client.
4: The Client computes $z_b = z \pmod{b}$ and sends $\llbracket z_b \rrbracket_P$ to the Server.
5: The Server computes

$$\llbracket a_b \rrbracket_P = \llbracket z_b \rrbracket_P \cdot \llbracket r_m \rrbracket_P^{-1} = \llbracket z_b - r_m \rrbracket_P$$

and sends it to the Client.
6: The Client decrypts to retrieve $a_b = a \bmod b$.
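The blinding arithmetic of Algorithm 11 can be traced with plain integers, replacing every Paillier operation by cleartext arithmetic on the values that would be exchanged. The final local reduction modulo b is an assumption added here so the recovered value is exactly a mod b; the exact handling of this step follows [65].

```python
import random

def smr_walkthrough(a, b, N_bits=1024):
    """Plain-integer trace of the values exchanged in Algorithm 11."""
    a_bits = a.bit_length()
    r_d = random.getrandbits(N_bits - 1 - a_bits)   # Server's blinding terms
    r_m = random.getrandbits(a_bits)
    z = a + (r_d * b + r_m)      # Server -> Client: [[z]] = [[a]]*[[r_d*b + r_m]]
    z_b = z % b                  # Client reduces under his private modulus b
    masked = z_b - r_m           # Server strips r_m inside the encryption
    return masked % b            # Client decrypts; congruent to a mod b

assert smr_walkthrough(123456789, 1013) == 123456789 % 1013
```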

6.8.2 Fully Homomorphic Encryption

The FHE scheme implemented within the protocol is the private-key GKS scheme. For more information on this scheme, see Chapter 2.

6.9 Minimization of Prediction Error

In order to estimate the prediction error of the proposed protocol, a series of Monte Carlo experiments was implemented. The Monte Carlo method is a general term describing the use of repeated random sampling in order to solve a problem that may or may not be deterministic. These experiments repeatedly computed the product

$$y = \prod_{d \in D} (x + r - d)$$

for randomly generated databases D, random $q_2$-bit integers r, and random $q_1$-bit integers x. Observe that this corresponds to the product computed by the Database Owner in Algorithms 9 and 10. In these protocols, if $y \pmod{r} = 0$, then the protocol outputs 1, telling the Client that his value is contained in the database. The Monte Carlo experiments return 1 if $y \pmod{r} = 0$ and return 0 otherwise. These responses are counted as true positives, true negatives, false positives, or false negatives based on whether or not $x \in D$. For instance, if $x \notin D$ but $y \pmod{r} = 0$, the response is counted as a false positive.

Each experiment consisted of 200,000 queries given a fixed $q_1$ and $q_2$ for a database containing 10 values. A method for extension to larger databases is presented in Section 6.10. Let $\mathcal{Q}_1$ denote the set of all $q_1$-bit numbers. In each iteration of the experiment two values are randomly generated: a $q_1$-bit number, b, and a $q_2$-bit value, r. The first 100,000 iterations randomly generate the values $a_1, \dots, a_n$ from the set $\mathcal{Q}_1 \setminus \{b\}$, the set of all $q_1$-bit numbers excluding b. The second 100,000 iterations assign $a_1 = b$ and randomly generate $a_2, a_3, \dots, a_n$ from $\mathcal{Q}_1$. These restrictions are imposed in order to ensure that exactly half of the data should yield a positive classification and half a negative classification, for an evenly distributed data set. The experiment was run for values $2 \le q_1 \le 17$, $3 < q_2 \le 50$, and n = 10. This fixed size for n was chosen with the use of the large database extension method of Section 6.10 in mind.

[Figure 6.4 shows nine panels of observed fallout, one for each of $q_1$ = 2, 4, 6, 8, 10, 12, 14, 16, 18; in each panel the x-axis gives the size of $q_2$ (up to 50) and the y-axis gives the fallout.]

Figure 6.4: The observed fallout for n = 10.

The full table of results from the Monte Carlo experiments is available in Appendix A.

Figure 6.4 shows a series of results of this experiment for a selection of values of $q_1$. The x-axis on each plot represents the value of $q_2$, while the y-axis denotes the resulting fallout from the Monte Carlo experiments. Recall that fallout is defined as

$$\text{fallout} = \frac{\#\,\text{false positives}}{\#\,\text{false positives} + \#\,\text{true negatives}}.$$

The proposed method has no false negatives, meaning its true positive rate is 100%. In order to avoid false positives, the value of the fallout must be minimized.
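A condensed version of this experiment fits in a few lines of Python. The sketch below estimates only the fallout: every query is for a value excluded from the database, so each $y \equiv 0 \pmod{r}$ event is a false positive.

```python
import random
from functools import reduce

def estimate_fallout(q1, q2, n=10, trials=100_000):
    """Monte Carlo estimate of the false positive rate for given q1, q2."""
    fp = 0
    for _ in range(trials):
        b = random.getrandbits(q1)                 # queried value, not in db
        r = random.randrange(2**(q2 - 1), 2**q2)   # q2-bit mask
        db = [d for d in (random.getrandbits(q1) for _ in range(n)) if d != b]
        y = reduce(lambda acc, d: acc * (b + r - d), db, 1)
        fp += (y % r == 0)
    return fp / trials                             # = FP / (FP + TN)
```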

There are two methods of approaching the selection of $q_2$ given $q_1$. The first method is to take a large value for $q_2$; however, depending on computational constraints, this could lead to expensive operations. Therefore, a smaller value may be taken for $q_2$ and the protocol repeated the requisite number of times in order to bring the fallout below the desired threshold; since false positives for independently chosen values of r are roughly independent, t repetitions drive a single-run fallout of p down to approximately $p^t$.

6.10 Large Database Extension

When a database is large, the value of the products computed in Algorithms 9 and 10 could become prohibitively large. The following extension protocol is proposed for databases with a large number of entries.

6.10.1 Unencrypted Model

In the unencrypted model, a large database with n elements is handled by splitting it into k distinct sub-databases, each containing up to m elements from the original database.

Algorithm 12 Third-Party Private Search on Large Database

Client Input: A value x.
Database Owner Input: A database D with n entries, an integer parameter m.
Client Output: A Boolean b, where b = 1 if $x \in D$.

1: The Database Owner randomly shuffles her database and splits it into k distinct sub-databases $D_1$ through $D_k$, each containing (up to) m entries.
2: The Client randomly selects a $q_2$-bit integer r and sends $\bar{x} = x + r$ to the Database Owner.
3: for i from 1 to k do
4: The Database Owner computes $y_i = \prod_{d \in D_i} (\bar{x} - d)$.
5: end for
6: for i from 1 to k do
7: The Client and Database Owner perform Secure Modular Reduction to return $\bar{y}_i = y_i \pmod{r}$ to the Client. If $\bar{y}_i = 0$, set $b_i = 1$; otherwise set $b_i = 0$.
8: end for
9: The Client outputs $b = \sum_{i=1}^{k} b_i$.

Note that the first for loop, in lines 3–5, contains instructions carried out only by the Database Owner. The second for loop, in lines 6–8, calls a sub-protocol which requires communication between the Database Owner and the Client. The computation carried out up to that second for loop is analogous to the computation carried out in the original protocol, albeit with smaller integer values due to the smaller number of multiplication operations performed. The extra time required by this protocol will occur during the communication required for Secure Modular Reduction. However, implementing the protocol in this model allows for implementation with large databases, a clear advantage.
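A minimal Python sketch of Algorithm 12 follows, with Secure Modular Reduction again collapsed into a local reduction modulo r; the chunk size m and the bit length of r are illustrative.

```python
import random
from functools import reduce

def tpps_large(x, db, m, q2=48):
    """Sketch of Algorithm 12: one membership test per sub-database."""
    r = random.randrange(2**(q2 - 1), 2**q2)
    x_bar = x + r                                  # sent to the Database Owner
    shuffled = random.sample(db, len(db))          # shuffled copy of db
    chunks = [shuffled[i:i + m] for i in range(0, len(shuffled), m)]
    bits = [int(reduce(lambda acc, d: acc * (x_bar - d), c, 1) % r == 0)
            for c in chunks]                       # one SMR per sub-database
    return sum(bits)                               # b > 0 indicates membership
```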

6.10.2 Encrypted Model

In the encrypted setting, where private search is performed over an encrypted database, the method of model extension is almost the same as in the unencrypted case. However, the structure of the encrypted data can be used advantageously via Single-Instruction Multiple Data (SIMD) instructions. Specifically, under the default parameters of the GKS encryption scheme with n = 5, a maximum of $2^n = 32$ values may be simultaneously encoded and encrypted in a single ciphertext. Assume a Database Owner has a database containing n elements. During encryption of the database, the Database Owner encrypts the maximum number of values within a single data point that SIMD allows. Say there are $n_0$ resulting ciphertexts. The Server stores these $n_0$ values in ciphertext form. During execution of the extended protocol, the Server randomly splits these $n_0$ values into subsets via the extension method above.

The number of calls to the Secure Modular Reduction protocol is not reduced, and SMR must be performed for each sub-database as well as each SIMD slot. The described method nonetheless provides a great boost in computation speed, as homomorphic multiplication is an expensive operation: with the default parameters of the GKS scheme containing 32 slots, the number of homomorphic multiplication operations required is divided by 32.

6.11 Evaluation

Tests were performed to determine the performance of the proposed protocol given varying sizes of databases in both the encrypted and unencrypted setting. Experiments were run on a MacBook Pro with a 2.3 GHz processor and 16 GB memory. Table 6.1 shows the execution time of the private search extension protocol. Results are given for a wide range of database sizes, with the time in seconds for execution in the encrypted and the unencrypted settings.

Dataset Size       160    320    640    960    1280    1600
Time (s) Unenc.    0.016  0.031  0.063  0.098  0.128   0.170
Time (s) Enc.      0.290  0.454  0.821  1.203  1.645   1.958

Dataset Size       2560   5120   7680   10240  12800   15360
Time (s) Unenc.    0.248  0.789  0.994  1.023  1.268   1.554
Time (s) Enc.      3.184  6.150  9.332  12.400 15.401  18.629

Table 6.1: Execution Time, Private Search Extension

[Figure 6.5 plots execution time in seconds against the size of the database for the private search extension, in both the encrypted and unencrypted settings.]

Figure 6.5: Execution Time, Private Search Extension

Figure 6.5 provides a visual comparison of encrypted versus unencrypted computation times. As expected, the time taken per protocol execution is linear with respect to the size of the database. Unencrypted queries completed in under half a second for small database sizes, and in under two seconds for databases with 15,360 elements. Computation in the encrypted model was more computationally intensive, completing in under 1 second for small database sizes and under 20 seconds for large databases. All experiments resulted in a 100% true positive rate and a 100% true negative rate, meaning there were no false positives or false negatives. This is of great importance for medical applications, where a false positive can result in a misdiagnosis and a false negative could result in a crucially missed diagnosis.

6.12 Discussion

6.12.1 Computational Bottlenecks

Computation time of Algorithm 12 is determined by a number of factors. In particular, major computational bottlenecks will be homomorphic multiplication operations and communication costs.

Consider a database D with n entries in the unencrypted model. Say this database is split into a collection of smaller databases, each with m entries. Without loss of generality, consider databases where the size n is a multiple of m, as this will provide an upper bound on computation. This results in a total of $k = n/m$ sub-databases. In the first for loop, in lines 3–5, the Database Owner computes the product

$$y_i = \prod_{d \in D_i} (\bar{x} - d),$$

where each product $y_i$ is computed via m multiplication operations. Therefore, the number of multiplications carried out is bounded by $k \cdot m = (n/m) \cdot m = n$.

The second for loop, in lines 6–8, carries out Secure Modular Reduction k times. During Secure Modular Reduction, seen in Algorithm 11, the Client communicates with the Server 2 times and the Server communicates with the Client 2 times. This would imply that a total of $4k$ communications are required during this for loop. However, communication cost can be dramatically reduced by batching the steps along communication boundaries. In particular, a series of k Secure Modular Reductions can be carried out in only 4 total communications between the Client and Server. The Client first generates a single public key, private key pair for Paillier, then encrypts all k values which are to be reduced under the public key. The Client sends all k of these encrypted values to the Server in one message. The Server then performs Step 3 of Algorithm 11 on all values received, and sends all k resulting values to the Client in one message. Steps 4 and 5 proceed similarly, where the Client and Server perform all k computations locally and send the k results simultaneously. This method does not reduce the local computation required by the Client and the Server. It does, however, greatly reduce communication costs. All together, TPPS over a large database in the unencrypted model requires at most n multiplication operations and 5 communications between the Database Owner and the Client.

In the encrypted model, further optimization is possible. In particular, consider an implementation of SIMD where $\ell$ database elements may be encoded simultaneously. Then, a sub-database containing m ciphertexts will in fact contain as many as $m \cdot \ell$ database elements. Without loss of generality, assume there is a database D of size n, where n is divisible by $m \cdot \ell$, in order to provide an upper bound on multiplication costs. Then, splitting D into sub-databases each containing m ciphertexts will result in

$$k = \frac{n}{m \ell}$$

sub-databases. Therefore, homomorphic multiplication is carried out only $k \cdot m = n/\ell$ times. Communication costs can be minimized in the encrypted case in the same way they were minimized in the unencrypted case. This method requires one communication between the Database Owner and the Server, and one communication between the Server and the Database Owner. TPPS over a large database in the encrypted model therefore requires $n/\ell$ homomorphic multiplication operations and 7 communications between the Database Owner, Client, and Server.

6.12.2 Comparison

Other research that has focused on private document retrieval is not directly comparable. Reported times include costly preprocessing of text document databases as well as file trans-

fer time costs.

Raykova et al. provide results on their secure anonymous database search protocol, which returns a list of documents containing a queried keyword, by listing the number of document matches found for a variety of query and aggregate search function configurations [55]. This model presents a variable number of false negatives based upon the aggregate search function configuration implemented during construction of the documents' Bloom filters.

Private information retrieval carried out by Angel et al. is evaluated using Microsoft Azure's powerful data centers [3]. The PIR servers are equipped with 16-core 3.6 GHz processors with 112 GB of memory, and the Client's servers are equipped with 16-core 2.4 GHz processors and 32 GB of memory. The majority of computation occurred on the PIR server, and the goal of the protocol is to return to the Client the documents matching the Client's query.

Pappas et al. also perform experiments on a private database retrieval protocol, where the Client seeks to download files containing a keyword [49]. They perform queries on a data set containing 5,000 keywords. They report the initial query response time, where the Client receives the set of document IDs containing a queried keyword, as well as the time for the entire document retrieval protocol. The initial query returning document IDs containing a keyword completes in under 50 milliseconds for a database containing 50,000 keywords. The protocol achieves a hit ratio of approximately 90% for retrieval of document IDs containing the queried keyword. However, these results do not apply to the goal of database membership queries explored in this chapter, as a high occurrence of false positives or false negatives is unacceptable.

Khedr et al. perform what they call secure multiple keyword search [41]. This setting is similar to the setting explored in this chapter. Specifically, the authors explore the scenario where a Client wishes to query a text file for the presence of a keyword. The authors provide timing results for a partially secure database search, where the Client's query is not hidden but the database itself is hidden, and later for a fully secure database search, where both the Client's query and the database values are hidden, using their proposed FHE scheme as well as IBM's HElib [59]. The authors run their experiments on a GPU with 2,048 CUDA cores and 4 GB of memory. Files containing 140 words are queried in approximately 10 seconds using the authors' proposed scheme as a platform within their multiple keyword search protocol, while queries running via HElib as a platform take over 1,000 seconds to query a file of the same size. The results given in Table 6.1 show significant improvement over this performance.

6.13 Conclusions

This chapter provides results on a third party private search protocol which performs private membership queries between a Client and Database Owner, with or without an intermediary Server storing the database in encrypted form. Results show that large databases can be queried quickly and accurately using the proposed method. Future work could focus on technical improvements leading to faster performance or extension to a SADS scheme with a high hit ratio.

This functionality has applications in a variety of fields. A potential application in the medical field is to allow researchers to privately query a database owned by another institution. In this scenario, the researchers could query the database to determine if it contains information of interest to them without the institution learning their query. Email could be monitored for spam keywords without compromising the privacy of a user's emails. Further applications lie in the field of law enforcement, where confidential data could be stored protected in encrypted format while maintaining the utility to allow investigators to query the data. For instance, a list of known offenders could be queried for the presence of a suspect by an investigator who cannot directly access the names of people on the list, all without the investigator compromising the privacy of the suspect.

In the next chapter, the presented third party private search protocol is used for classification of real medical data via decision tree models.

Chapter 7

Privacy-Preserving Decision Tree Classification

7.1 Introduction

Binary decision trees are a method of classification that can be represented in a simple diagram by interior decision nodes and terminal leaf nodes. The leaf nodes at the bottom of the tree provide the final classification for a data point. Due to this representation as a tree structure, decision trees are easy to interpret and understand. This ease of interpretability is one clear advantage of using decision trees. In fact, this method is favored among scientists in the medical community as it is easy to visualize and it "mimics the way a doctor thinks" by "stratify[ing] the population into strata of high and low outcome, on the basis of patient characteristics" [39]. Due to its medical utility, a privacy-preserving classification protocol using Classification and Regression Trees (CART) is presented in this chapter, and experiments on a real-world medical data set are performed efficiently.

The primary contribution in this chapter is the construction of a privacy-preserving decision tree classifier. This protocol uses a variety of cryptographic primitives in order to

construct private decision tree classification. It utilizes the third party private search protocol, fully homomorphic encryption, secure modular reduction, and a primitive called oblivious transfer. This chapter begins with a discussion of the background and methods required for these primitives and related work in the field in Sections 7.2 and 7.3. Sections 7.4 and 7.5 present the proposed protocol and discuss its security. Sections 7.6, 7.8, and 7.9 discuss implementation of the protocol and classification results on a real-world medical data set.

7.2 Classification and Regression Tree (CART)

The Classification and Regression Tree method, or CART, is a tree-based implementation of supervised learning for classification and regression [12]. Like Naive Bayes, this method has the advantage of being both conceptually simple and robust. A tree-based method operates by choosing a sequence of binary splits to apply to the data. Figure 7.1 shows a collection of toy data points with binary classifications in a two-dimensional subspace of $\mathbb{R}^2$ on axes $X_1$ and $X_2$. The splits partition the space into regions, and each region is then assigned a corresponding class based on a majority vote within the region. In Figure 7.6, sub-figure 7.2 shows the region after a sequence of binary splits into 6 final regions. Sub-figure 7.3 gives the binary tree which represents these splits. Sub-figure 7.4 shows the sub-regions with the toy data points, and sub-figure 7.5 shows the classification tree resulting from a majority vote of the classes of points within each sub-region.

[Figure 7.1: Binary data in $\mathbb{R}^2$.]

While this example is for data points in $\mathbb{R}^2$, the concept applies to $\mathbb{Z}^2$ or to categorical-valued feature vectors with more than two classes. While it is difficult to visualize these binary partitions in $M$-dimensional space, the binary tree representation of the partitions remains straightforward to draw in any number of dimensions.

7.2.1 Growing Classification Trees

The exposition below follows that of Hastie [39] and uses data with d features from $\mathbb{R}^d$ which lie in a discrete set of c classes $G = \{G_1, \ldots, G_c\}$. The method can be easily extended to integer-valued and categorical data points. Consider p input data points, each of the form $(X, G_i)$ where $X = (X_1, \ldots, X_d) \in \mathbb{R}^d$. Training determines a sequence of binary partitions that minimize the amount of error present in the final classification contained in each leaf. A greedy algorithm is implemented in order to split the input space via binary partitions in an efficient manner. This greedy algorithm minimizes the classification error in each region at each step. For the first split, a dimension, or splitting variable, j, and a split point, s, are chosen in order to minimize the classification error in each of the two resulting regions based on a majority vote. Formally, the two regions are defined by

$$R(j, s) = \{X : X_j \le s\}$$
$$R'(j, s) = \{X : X_j > s\}$$

and the goal is to minimize the classification error over the variables j and s. Let $\mathrm{Err}(R(j, s))$ and $\mathrm{Err}(R'(j, s))$ represent the measure of node impurity (e.g., misclassification error, Gini index) in $R(j, s)$ and $R'(j, s)$, respectively. The equation

$$\min_{j,s}\left[\mathrm{Err}(R(j, s)) + \mathrm{Err}(R'(j, s))\right]$$

is minimized by testing all potential values for the splitting point s. This algorithm can be carried out efficiently [39]. The process is then repeated iteratively on each sub-region. Determining when to stop involves finding a balance between a tree that is too large and overfits the data and a tree that is too small and underfits the data. One method is to grow the tree until each region contains at most $\eta$ data vectors, then apply pruning methods such as cost-complexity pruning to shrink the tree. In this case, $\eta$ is called the minimum node size.

[Figure 7.2: A two-dimensional region under a series of binary splits.]
[Figure 7.3: A binary tree corresponding to the binary splits in Figure 7.2.]
[Figure 7.4: Data within a two-dimensional region under a series of binary splits.]
[Figure 7.5: A binary tree corresponding to a majority vote within the regions in Figure 7.4.]
[Figure 7.6: The CART method.]

Pruning is carried out as follows. First, an initial tree $T_0$ is grown until each region contains at most $\eta$ data vectors. Say there are K leaf nodes determining regions $R_k \subset \mathbb{R}^d$ for $1 \le k \le K$.

Let T denote a subtree $T \subseteq T_0$ and define

$$N_k = \#\{X \in R_k\},$$

the number of data vectors in $R_k$, and

$$p_{k\ell} = \frac{1}{N_k} \sum_{X \in R_k} I(G_i = \ell),$$

the proportion of class $\ell$ vectors in region $R_k$. There are several potential measures of node impurity:

• The classification error, given by $1 - p_{k\ell}$ for the majority class $\ell$ in $R_k$.

• The Gini impurity, given by $\sum_{\ell=1}^{c} p_{k\ell}(1 - p_{k\ell})$.

• The entropy, given by $-\sum_{\ell=1}^{c} p_{k\ell} \log p_{k\ell}$.

Gini impurity and entropy are most often used as measures of node impurity during the tree-growing stage, while the misclassification error is most often used during the pruning phase [39]. Denote the chosen measure of node impurity as $Q_k(T)$. The cost complexity criterion

$$C_\alpha(T) = \sum_{k=1}^{|T|} N_k Q_k(T) + \alpha |T|$$

is minimized for a tuning parameter $\alpha \ge 0$. When $\alpha = 0$ the tree $T = T_0$, and $T$ becomes smaller as $\alpha$ becomes larger. Each value of $\alpha$ determines the unique smallest subtree $T$ that minimizes the cost $C_\alpha(T)$. This optimization problem has a global solution [39], and the final model is given by the tree resulting from minimizing the cost.
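As an illustration of this training-and-pruning procedure, the following sketch uses scikit-learn's CART implementation and its cost-complexity pruning parameter, with the publicly available Wisconsin breast cancer data [45, 67] as a stand-in data set; this is illustrative only and is not the implementation used in this work.

    # Sketch: CART with cost-complexity pruning via scikit-learn
    # (illustrative only; not the implementation in this dissertation).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)       # public stand-in data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Grow the initial tree T0 and compute the pruning path: the candidate
    # alpha values and the total leaf impurity of the subtree each selects.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

    for alpha in path.ccp_alphas[::5]:               # sample a few alphas
        tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
        tree.fit(X_tr, y_tr)                         # fit, pruned at this alpha
        print(f"alpha={alpha:.5f}  leaves={tree.get_n_leaves()}  "
              f"test accuracy={tree.score(X_te, y_te):.3f}")

Each candidate alpha on the pruning path selects a strictly smaller subtree, trading training fit for simplicity exactly as described above.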

7.2.2 Classification of New Data Points

Consider a data point X with an unknown class. Classifying this data point using a trained CART model consists of applying the condition at each node of the decision tree to X and following the branches sequentially until a leaf node is reached. The class contained in this leaf node is the class value assigned to X. The precise classification method implemented in the presented protocols is given in Algorithm 13.

7.3 Methodologies

Moving from classification of new data points in the clear to private classification requires hiding the tree structure from the Client and hiding the Client's input data from the Model Owner. One approach converts a decision tree into its polynomial form [6, 61]. Another approach converts a decision tree into a complete binary tree and performs a randomization procedure. These two approaches are discussed below, after which oblivious transfer is introduced, as it is a necessary step within the proposed private decision tree classification protocol.

7.3.1 Randomizing Trees

Methods which first convert a tree into a polynomial form run in the public-key setting between a Client, Server, and Model Owner. Consider a decision tree with n decision nodes. Each of these nodes has a binary output. Denote the binary output of each node for some input as $b_1, b_2, \ldots, b_n$. Let $c_1, c_2, \ldots, c_n$ denote the corresponding leaf nodes, each containing a binary classification value. A recursive procedure allows for efficient computation of a polynomial $P(b_1, b_2, \ldots, b_n, c_1, c_2, \ldots, c_n)$ such that the output of P corresponds to the class of the input value on which the decision nodes were evaluated [6]. The methods of Bost et al. [6] allow private classification to occur under this model using two encryption schemes, the Quadratic Residuosity (QR) scheme of Goldwasser and Micali [34] and a public-key FHE scheme. For private classification, the Client holds the FHE private key and the Model Owner holds the QR key. The Client computes the values $[b_i]_{QR}$ for i from 1 to n using a privacy-preserving protocol, encrypts these (already encrypted) values via FHE, and sends them to the Model Owner. The Model Owner then evaluates the polynomial P, encrypted under QR and then FHE. The authors then provide a method for the Client to receive the output of this function without revealing the output to the Model Owner. These protocols all generally follow the broad steps below, a modification of the work of Bost et al. [6]; a worked example of the polynomial form follows the steps.

1: The Client publishes a public key for some public-key encryption scheme.

2: The Model Owner encrypts the polynomial form P of a decision tree T and stores this encryption on the Server.

3: The Client and Model Owner perform a series of privacy-preserving protocols in order to determine the binary output of the tree on each node. In this setting, privacy-preserving means that the Client does not directly learn the evaluation of his data point on each node, and the Model Owner does not learn the Client’s data point.

4: The Server evaluates the polynomial over the data point using fully homomorphic encryption [41, 61] or some other method [6].

5: The Client decrypts the output to receive his classification.
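To illustrate the polynomial form, consider (as a sketch of the general recursion, not the exact construction of [6]) a depth-two tree whose root has binary output $b_1$, whose left child has output $b_2$ over leaves with classes $c_1, c_2$, and whose right child has output $b_3$ over leaves with classes $c_3, c_4$. The recursion yields

$$P = (1 - b_1)\left[(1 - b_2)\,c_1 + b_2\,c_2\right] + b_1\left[(1 - b_3)\,c_3 + b_3\,c_4\right],$$

which evaluates to exactly the class stored in the leaf selected by the decision bits, and whose degree grows only linearly in the tree depth.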

Khedr et al. perform private decision tree classification using this polynomial representation paradigm [41]. Their protocol implements classification via the polynomial form of a binary decision tree and uses public-key fully homomorphic encryption for classification in place of the methods of Bost et al. [6]. Sun et al. present another private decision tree methodology based on polynomial forms of trees, also using public-key fully homomorphic encryption [61]. In the private-key setting, both parties are unable to encrypt their data under a public key, and another method is necessary.

[Figure 7.7: The completion (right) of a binary tree (left).]

7.3.2 Complete Tree Randomization

The methodology of complete binary tree randomization presented by Wu et al. [68] does not require a polynomial representation of a tree. In this model, "dummy" nodes are introduced into a tree in order to impose a uniform depth upon its structure. The dummy nodes contain random evaluations with binary output, where each response leads to the same outcome. When privatization of the tree structure is not a goal, this structure is redundant. However, with a complete binary tree, any data point requires the same number of evaluations to classify, regardless of the outcomes on individual nodes. This is an important aspect of the Model Owner ultimately hiding the tree structure from the Client. An example of the completion of a binary tree is shown in Figure 7.7. The yellow decision nodes contain random binary evaluation functions, and the leaf nodes contain the class corresponding to the assigned class in the original tree on the left.

[Figure 7.8: A binary tree (left) and the tree negated on the first node (right).]

The authors describe a tree randomization procedure in which a complete tree is hidden by randomly permuting the nodes of a tree T to obtain an equivalent tree $T'$ [68]. An equivalent tree is a tree with the same depth and the same classification output for every data point, but with a different classification path within the tree. An equivalent tree is created by randomly flipping the outcome of the binary decision function at each node. Their algorithm proceeds as follows, with input a tree T and output a randomized tree $T'$. The tree T contains n decision nodes $t_i$.

1: Initialize $T' = T$.

2: Randomly choose $s = (s_1, \ldots, s_n) \in \{0, 1\}^n$.

3: For i from 1 to n, if $s_i = 1$ then negate the decision function on the node $t_i$ of the tree $T'$ and swap the subtrees originating at its left and right child nodes.

4: Re-order the node indexes and output $T'$.

An example of a binary tree which has been negated on the first node is given in Figure 7.8.

7.3.3 Oblivious Transfer

Originally introduced by Rabin in 1981, oblivious transfer was first described as an RSA-based cryptographic primitive which allowed a Receiver to obtain a message from a Sender with probability 0.5, without the Sender knowing whether or not the value was received [53]. In the years since, 1-of-2 oblivious transfer has developed into a two-party cryptographic primitive which allows a receiving party to obtain exactly one value out of two values sent by the sending party, without revealing her choice to the sender. Figure 7.9 provides a visualization of the usual model. The Sender and Receiver perform a key generation algorithm which includes the Receiver's choice of index. The Sender then encrypts the two values based on these keys and sends them to the Receiver. Decryption reveals the value associated with the index selected during the key generation algorithm, while decrypting the other value yields an output which appears random to the Receiver, effectively masking its true value from her.

[Figure 7.9: 1-of-2 Oblivious Transfer]

In a 1-of-n oblivious transfer protocol, the Sender has values $b_1, b_2, \ldots, b_n$ and the Receiver has chosen an index i. The Sender would like to share $b_i$ with the Receiver without revealing any of the other values to her, and the Receiver wishes to hide her index from the Sender. The protocol presented for third-party private search implements a 1-of-n oblivious transfer protocol.

7.4 Tree Representation and Decision Tree Classification

A complete binary decision tree can be represented in the following manner. A complete binary decision tree D is represented as a collection of k nodes, $D = \{d_1, d_2, \ldots, d_k\}$. A tree with depth m has $2^m - 1$ nodes. For a tree with depth m, nodes with indexes from 1 to $2^{m-1} - 1$ are decision nodes, and nodes with indexes from $2^{m-1}$ to $2^m - 1$ are leaf nodes. Child nodes are determined by the binary representation of the node's index. The root node is assigned the index 1, which corresponds to the integer value 1. Its child nodes are given by 10 and 11, corresponding to nodes 2 and 3, respectively. Node 10 is the child of node 1 when the outcome on that node is negative, or 0; node 11 is the child when the outcome is positive, or 1. Figure 7.10 provides an example of a tree with depth m = 4. The value inside each node represents the node index in binary, and the value on each arrow represents the binary decision outcome on that node. To move from a decision node to its child node, evaluate the node on the input data point and concatenate the decision to the binary representation of the node index; convert back to decimal to retrieve the node index number. To move from a child node to its parent node, remove the last bit from the binary representation of the index. For example, the parent node of 1011 is 101.

[Figure 7.10: Binary Tree Node Assignment]

These values can also be represented in base 10. Let I be the index of a node. Then, the index of its parent node is given by

$$\mathrm{Parent}(I) = \lfloor I/2 \rfloor.$$

The indexes of its child nodes are given by

$$\mathrm{LeftChild}(I) = 2I$$
$$\mathrm{RightChild}(I) = 2I + 1.$$
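These relations are simply the standard array layout of a complete binary tree; a minimal sketch (hypothetical helper names):

    # Complete-binary-tree index arithmetic (1-indexed, as in the text);
    # hypothetical helper names, illustration only.
    def parent(i):      return i // 2          # drop the last bit of i
    def left_child(i):  return 2 * i           # append a 0 bit
    def right_child(i): return 2 * i + 1       # append a 1 bit

    assert parent(0b1011) == 0b101             # the parent of 1011 is 101
    assert (left_child(5), right_child(5)) == (10, 11)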

The leaf node associated with a data point may be reached using this binary representation via the following algorithm.

Algorithm 13 Decision Tree Classification

Input: A decision tree T in complete binary tree form with n nodes and depth m, and a data point X to be classified via T. Let $b_I$ denote the binary evaluation output of X on the tree node with index I.
Output: The index of the leaf node containing the classification of X.

1: I = 1 ▷ Initialize the node index to 1.

2: J = None ▷ Initialize an empty output J.

3: for i from 1 to m do

4: $J = J \| b_I$ ▷ The operator $\|$ denotes concatenation.

5: $I = 2I + b_I$

6: end for

7: Output J.
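A direct transcription of Algorithm 13 might look as follows; this is a sketch with hypothetical names, in which evaluate stands in for the per-node decision function (computed privately via Algorithm 15 in the proposed protocol), and the loop runs once per decision level above the leaves.

    # Sketch of Algorithm 13: classify x by walking a complete depth-m tree
    # stored in a 1-indexed array `nodes` (nodes[0] unused); hypothetical names.
    def classify_leaf(nodes, m, x, evaluate):
        I = 1                            # start at the root, index 1
        J = ""                           # binary record of the decisions taken
        for _ in range(m - 1):           # one decision per level above the leaves
            b = evaluate(nodes[I], x)    # binary output b_I of node I on x
            J += str(b)                  # concatenate the decision bit
            I = 2 * I + b                # descend to child 2I (b=0) or 2I+1 (b=1)
        return I, J                      # leaf index and the path that reached it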

7.5 Private Decision Tree Classification Protocol

Consider a Client, $\mathcal{C}$, a Data Model Owner, $\mathcal{D}$, and a Server, $\mathcal{S}$. First, consider a general overview without privacy-preserving measures. The Data Model Owner trains a binary decision tree, T, under the CART algorithm. This tree is presented as a complete tree of depth n consisting of $2^n - 1$ nodes with index labeling as outlined above. Some of these nodes may be dummy nodes. Denote each decision node by $t_i$ with corresponding decision function $f_i$ for i from 1 to $2^{n-1} - 1$. The leaf nodes are denoted by $t_i$ with class assignment $c_i$ for i from $2^{n-1}$ to $2^n - 1$. The Model Owner stores this tree on the Server. The Client has a data point with d features,

$$X = (X_1, X_2, \ldots, X_d).$$

The Client wishes to classify his data point using the Model Owner's tree. The Client computes the binary decision function $f_i$ on each decision node $t_i$ in order to determine his tree path, then retrieves the class assignment $c_i$ which corresponds to that path. Each step in this procedure must be randomized. First, the tree T must be replaced with an equivalent randomized tree $T'$. Furthermore, the Client needs a privacy-preserving method of determining his path in the randomized tree. Note that the Client must query every node in the tree; otherwise, the Server can determine the path he followed based on his queries, and a colluding Server and Model Owner could then determine his classification. The proposed protocol implements third-party private search in order to perform classification. Recall that in medical applications it is common for features to take on only a discrete set of values; if a value is some continuous measurement, it can be quantized to take on a discrete set of values. This representation of a feature as a discrete set of possible values is used in order to implement third-party private search for node evaluation. Let y be some feature which can take on $\ell$ discrete values, assigned the numbers 1 to $\ell$. Say a decision function splits this feature at k: all values greater than or equal to k result in True, while all values less than k result in False. Evaluation of this decision function may be reduced to set membership. The decision function can be represented by the set $\{k, k+1, \ldots, \ell\}$, and a True output occurs for all values which occur within the set. Once the Client has determined the index of his classification on the randomized tree, he retrieves his final classification by implementing an oblivious transfer protocol with the Model Owner. Algorithm 14 carries out this private decision tree classification evaluation.
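As a concrete sketch of this reduction to set membership (hypothetical names, illustration only):

    # Sketch: a threshold decision "feature value >= k" over a feature taking
    # the discrete values 1..ell is the membership set {k, k+1, ..., ell}.
    def decision_set(k, ell):
        return set(range(k, ell + 1))

    D = decision_set(k=4, ell=10)        # {4, 5, ..., 10}
    assert (7 in D) is True              # 7 >= 4, so the decision outputs True
    assert (2 in D) is False             # 2 <  4, so the decision outputs False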

Algorithm 14 Private Decision Tree Classification

Client Input: An n-tuple of $q_1$-bit integers $x = (x_1, x_2, \ldots, x_n)$.
Model Owner Input: The trained decision tree D.
Client Output: The assigned class of x based on the decision tree D.

1: The Model Owner performs $D' = $ RandomizeTree(D). ▷ See Algorithm 16.

2: Initialize variable y = '1'.

3: for Node i in $D'$ do

4: Compute $b_i = $ Node(x, D, i) ▷ See Algorithm 15.

5: end for

6: Perform Oblivious Transfer to reveal the classification value at the tree node determined in the loop above.

The method for performing private node evaluation using third-party private search is outlined in Algorithm 15.

7.5.1 Private Node Evaluation

The Client must compute the binary output of the decision function on each node for his input data point. In order to carry this out, the Client implements a version of the Third Party Private Search (TPPS) protocol described in Algorithm 9 in Chapter 6. While in the previous chapter the Client performed TPPS within a data set for a single value, the Client in this scenario has d values corresponding to the d features used within the model. An oblivious transfer protocol is implemented within the TPPS framework in order to mask which feature the Model Owner evaluates during execution. This method of oblivious transfer with TPPS is described in Algorithm 15 below.

Algorithm 15 Private Node Evaluation Protocol

Client Input: An n-tuple of $q_1$-bit integers $x = (x_1, x_2, \ldots, x_n)$.
Model Owner Input: The index of the feature at the node, I, and the database for that node, D.

Client Output: 1 if $x_I \in D$, 0 if $x_I \notin D$.

1: The Client randomly selects a $\log_2(q_2)$-bit integer, r, according to the parameters of the TPPS protocol.

2: The Client computes

$$\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n) = (x_1 + r, x_2 + r, \ldots, x_n + r).$$

3: The Client and the Model Owner perform an OT extension protocol in order to transfer the value $\bar{x}_I$ from the Client to the Model Owner, where $I \in \{1, 2, \ldots, n\}$ is the index of the decision variable on the node, without revealing I to the Client.

4: The Client and Model Owner perform a Third-Party Private Search protocol to determine whether $x_I \in D$. If $x_I \in D$, the Client outputs 1; otherwise, the Client outputs 0.
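In the clear, the Client's blinding step and the divisibility-based membership test underlying TPPS can be sketched as follows; names are hypothetical, and the product test mirrors Algorithm 18 in Appendix A. In the actual protocol these values are exchanged under encryption.

    import secrets

    # Unencrypted sketch of the blinding step of Algorithm 15 and the
    # product-based membership test of TPPS (cf. Algorithm 18, Appendix A).
    def blind(xs, bits):
        r = secrets.randbits(bits) | (1 << (bits - 1))   # full bits-bit pad r
        return r, [x + r for x in xs]                    # x_i -> x_i + r

    def member(x_blinded, r, database):
        # The product of (x + r - d_i) is divisible by r whenever x equals
        # some d_i; false positives occur at the rate analyzed in Appendix A.
        p = 1
        for d in database:
            p *= x_blinded - d
        return p % r == 0

    r, xs = blind([3, 7, 12], bits=40)
    print(member(xs[1], r, [5, 7, 9]))                   # True: 7 is in D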

The private decision tree protocol in Algorithm 14 carries out Algorithm 15 to determine the binary output on each node. This binary output is used for the final classification of the Client’s data point.

7.6 Security

The desired security is that the Client learns nothing about the Model Owner's learned model, the Model Owner learns no information about the Client's data point, and the Server learns no information about either party's private data.

7.7 Implementation

The following methods were used in order to implement the Private Decision Tree Evaluation protocol.

7.7.1 Tree Randomization

Tree randomization is implemented via Algorithm 16 below; a simplified Python sketch follows the algorithm. The algorithm proceeds by first generating an n-tuple of random bits, $b_i$, denoting whether each node's decision will be reversed or not. If a node is reversed, the subtrees stemming from its child nodes are swapped. This differs from the protocol described in Section 7.3.2 in several small ways. Instead of copying the tree, randomizing, and then re-indexing the output tree, the algorithm below computes a permutation $\pi$ on the indexes of the nodes in T such that $T' = \pi(T)$ is a randomized version of T. Because of this difference in approach, the same n-tuple of random bits would result in different randomized tree outputs under the two algorithms. However, both ultimately result in an efficiently computed randomized tree.

Algorithm 16 Tree Randomization (adapted from [68])

Input: A decision tree T in complete binary tree form with n nodes and depth m.
Output: A randomized decision tree $T'$ in complete binary tree form.

1: Initialize $\pi$ as the identity permutation on n elements.

2: Generate an n-tuple of random bits, $b = \{b_i\}$.

3: for i from 1 to n do

4: if $b_i = 1$ then

5: $d = \lfloor \log_2(i) \rfloor$

6: for j from 1 to $m - d - 1$ do

7: for k from 0 to $2^{j-1} - 1$ do

8: Switch the value of $\pi(i \cdot 2^j + k)$ with the value of $\pi(i \cdot 2^j + 2^{j-1} + k)$.

9: end for

10: end for

11: end if

12: end for

13: for i from 1 to n do ▷ Create the randomized tree via the permutation $\pi$.

14: $T'(i) = T(\pi(i))$.

15: end for
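The following is a simplified recursive sketch of the same equivalent-tree randomization on the array representation of a complete tree; it illustrates the idea behind Algorithm 16 rather than transcribing it line by line.

    import random

    # Simplified sketch of equivalent-tree randomization on a complete tree
    # stored as a 1-indexed array T (T[0] unused); an illustration of the idea
    # behind Algorithm 16, not a line-by-line transcription.
    def swap_subtrees(T, a, b):
        # a and b are sibling node indexes; swap the two subtrees level by level.
        width = 1
        while a < len(T):
            for k in range(width):
                T[a + k], T[b + k] = T[b + k], T[a + k]
            a, b, width = 2 * a, 2 * b, 2 * width

    def randomize(T, i=1):
        left = 2 * i
        if left >= len(T):                    # node i is a leaf
            return
        if random.getrandbits(1):             # s_i = 1: negate and swap
            T[i] = ("NOT", T[i])              # mark the decision as negated
            swap_subtrees(T, left, left + 1)
        randomize(T, left)
        randomize(T, left + 1)

    T = [None] + list("ABCDEFG")              # a complete depth-3 toy tree
    randomize(T)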

7.7.2 Oblivious Transfer

A protocol, simply called "The Simplest Protocol for Oblivious Transfer," is implemented in our experiments [16]. This is a group-based 1-of-n oblivious transfer protocol, given below as Algorithm 17; a toy numeric walk-through follows the algorithm. In this setting, Alice is the Receiver of Figure 7.9 and Bob is the Sender.

Algorithm 17 1-of-n Oblivious Transfer [16]

Sender Input: Values $M_1, M_2, \ldots, M_n$.
Receiver Input: An index $I \in \{1, 2, \ldots, n\}$.
Receiver Output: The value $M_I$.

1: The Sender and Receiver share randomly generated public keys p (prime) and $g \in \mathbb{Z}_p$ and agree upon a hash function H.

2: The Sender selects $b \xleftarrow{\$} \mathbb{Z}_p$ and sends $B = g^b$ to the Receiver.

3: The Receiver selects $a \xleftarrow{\$} \mathbb{Z}_p$ and sends $A = B^I g^a = g^{bI + a}$ to the Sender for her choice of $I \in \{1, 2, \ldots, n\}$.

4: The Sender computes the keys $k_i = (A / B^i)^b = g^{(I - i)b^2 + ab}$ for $i = 1$ to $n$.

5: The Receiver computes the key $k = B^a = g^{ab}$.

6: The Sender sends $e_i = M_i \oplus H(k_i)$ to the Receiver for $i = 1$ to $n$.

7: The Receiver computes $M_I = e_I \oplus H(k)$.
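A toy numeric walk-through of Algorithm 17 over $\mathbb{Z}_p^*$, with parameters far too small to be secure and a truncated hash; a real deployment would use the authors' implementation over an elliptic-curve group.

    import hashlib, secrets

    # Toy walk-through of Algorithm 17 (insecure parameters; illustration
    # only). Requires Python 3.8+ for pow(x, -1, p).
    p, g = 2_147_483_647, 5                  # toy public prime and base

    def H(x):                                # hash a group element to a pad
        return int.from_bytes(hashlib.sha256(str(x).encode()).digest()[:16], "big")

    msgs = [11, 22, 33, 44]                  # Sender's values M_1..M_n
    I = 3                                    # Receiver's choice of index

    b = secrets.randbelow(p - 2) + 1         # Sender's secret
    B = pow(g, b, p)
    a = secrets.randbelow(p - 2) + 1         # Receiver's secret
    A = (pow(B, I, p) * pow(g, a, p)) % p    # A = B^I * g^a

    # Sender: k_i = (A / B^i)^b; only k_I equals the Receiver's key g^(ab).
    keys = [pow(A * pow(pow(B, i, p), -1, p) % p, b, p)
            for i in range(1, len(msgs) + 1)]
    e = [m ^ H(k) for m, k in zip(msgs, keys)]   # e_i = M_i XOR H(k_i)

    k = pow(B, a, p)                         # Receiver: k = B^a = g^(ab)
    assert e[I - 1] ^ H(k) == msgs[I - 1]    # recovers exactly M_I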

7.8 Evaluation

Decision trees were trained using Python 3 with 10-fold cross-validation. Random oversampling of the positive class was implemented during training, as positive examples were under-represented in the overall data set. Trees were limited to a maximum depth of 5, a minimum of 3 samples per split, and a minimum of 3 samples per leaf. Gini impurity was used to measure the quality of splits during training. The protocol was implemented in C++ on a Windows 7 machine with a 3.40 GHz processor and 32.0 GB of memory, using the GNU MP Bignum Library [62] for arbitrary-precision arithmetic, Chou and Orlandi's "Simplest OT" protocol [16], Veugen's secure modular reduction protocol [65], and the GKS private-key FHE scheme [36].
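The stated training configuration corresponds roughly to the following scikit-learn sketch; imbalanced-learn's RandomOverSampler stands in for the oversampling step, and the public Wisconsin breast cancer data [45, 67] stands in for the data set.

    # Sketch of the stated training configuration (scikit-learn and
    # imbalanced-learn), with public stand-in data; illustrative only.
    import numpy as np
    from imblearn.over_sampling import RandomOverSampler
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import StratifiedKFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    scores = []
    for tr, te in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
        # Oversample the minority class within the training fold only.
        X_tr, y_tr = RandomOverSampler(random_state=0).fit_resample(X[tr], y[tr])
        tree = DecisionTreeClassifier(criterion="gini", max_depth=5,
                                      min_samples_split=3, min_samples_leaf=3)
        tree.fit(X_tr, y_tr)
        scores.append(tree.score(X[te], y[te]))
    print(f"mean accuracy over 10 folds: {np.mean(scores):.5f}")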

             Accuracy   Sensitivity   Specificity   Precision   NPV       F1-score
Mean         0.96628    0.96172       0.97500       0.98699     0.93487   0.97367
Stand. Dev.  0.02192    0.03010       0.05270       0.02700     0.04875   0.01695

Table 7.1: Decision Tree Classification Results

                           Time (s)
Unencrypted, Not Private   0.00001
Encrypted                  0.91251

Table 7.2: Decision Tree Classification Time

Training and testing took place under 10-fold cross-validation. The results are given in Table 7.1 and Table 7.2. The results in Table 7.1 show that the decision tree classifier outperforms the Naive Bayes classifier of Chapter 5 on this data set. The results in Table 7.2 show, however, that this classifier takes slightly longer than the Naive Bayes classifier. Despite this, classification of an individual data point is still carried out in less than one second on average on a tree with 32 decision nodes. These results show that the proposed protocol can be fast and accurate for private classification within medical applications.

7.9 Discussion

7.9.1 Computational Bottlenecks

The primary bottleneck during computation is the cost of homomorphic multiplication. During Algorithm 14, homomorphic multiplication is performed during the evaluation of each node. In particular, if k is the maximum number of values any feature of a data point X may take, then the number of homomorphic operations performed during each node evaluation is bounded above by k. A tree with N nodes therefore requires at most $N \cdot k$ homomorphic multiplication operations. This bottleneck could be avoided by implementing the protocol in the unencrypted third-party private search model discussed in Chapter 6.

A bottleneck also occurs due to communication costs. During each node evaluation a number of communications are performed. During Oblivious Transfer, the Client communicates with the Model Owner twice and the Model Owner communicates with the Client once. In addition to the communication cost of Oblivious Transfer, private node evaluation in Algorithm 15 requires 6 additional communications between the Client, Server, and Model Owner in the encrypted model, and 4 communications in the unencrypted model. Therefore, on a tree with N nodes, communication must be performed $9N$ times in the encrypted model and $6N$ times in the unencrypted model.

To speed up performance, it is important that the Model Owner perform pre-processing on the data. Properly pre-processed data could result in a smaller tree, and therefore a faster classification time. In particular, feature selection should be implemented on the data in order to reduce dimensionality before training a learned model. Feature selection is a powerful method to apply during model construction to improve the model's performance by identifying only the most relevant features within the data [39]. Feature selection can be carried out in a variety of ways depending on the data set, and covers a wide range of algorithms including filters, wrapper methods, and embedded methods [37].

7.9.2 Comparison

Khedr et al. state that their classifier should perform decision tree classification on a tree with four nodes in approximately 3.477 milliseconds [41]. The figure is extrapolated from the multiplication running time of their protocol; no tests were implemented. A full implementation of their protocol on a tree of comparable size would be necessary in order to provide a true comparison of results. Wu et al. perform private decision tree classification on a tree with 12 decision nodes in approximately 0.545 seconds [68]. Bost et al. perform decision tree classification using encryption methods which are not fully homomorphic [6]. They report average running times on a 4-node tree of 1.579 seconds for the Client and 0.798 seconds for the Server, and on a 6-node tree of 2.297 seconds for the Client and 1.723 seconds for the Server. The results in Table 7.2 show a speedup over these times.

7.10 Conclusion

Third-party private search can be implemented in conjunction with various cryptographic methods in order to perform efficient, private classification using classification trees. Future work could extend this method to implement the large-database variant of TPPS given in Algorithm 12, as well as an implementation of random forests.

Chapter 8

Conclusions

Machine learning and cryptography can be combined in order to implement a variety of multi-party computational tasks such as third-party private search and private classification. These methods can be efficient for computation over real-world medical data. In particular, private-key fully homomorphic encryption can be efficiently implemented for classification tasks.

The implementation of Gribov-Kahrobaei-Shpilrain (GKS) encryption for private-key fully homomorphic encryption was presented with multiple speed improvements. Single-Instruction Multiple Data (SIMD) parallelization was implemented in GKS in order to allow for computation over multiple values at one time via multiple encodings within a single ciphertext. In total, up to 2n elements may be encoded in a single plaintext via a fully homomorphic embedding for a plaintext ring with n generators. Furthermore, an algorithm for generating the required parameters and change-of-basis transformations for implementation of the GKS cryptosystem was described. Experimental performance results show that the GKS cryptosystem can be efficiently implemented in C++ using an arbitrary-precision arithmetic library.

Private Naive Bayes classification via private-key FHE was outlined and implemented over real-world medical data. Classification of real-world medical data via private-key FHE using Naive Bayes was carried out in less than half a second. Furthermore, private decision tree classification was carried out over an encrypted model in under one second.

A number of other protocols with potential use in future applications were developed within this document. A privacy-preserving argmax protocol was outlined that enables classification using a Naive Bayes model via private-key FHE. A third-party private search algorithm was outlined that allows a Client to privately and efficiently perform a membership query of a database without being given direct access to the database.

8.1 Future Work

The techniques outlined in this work could be utilized on their own or in conjunction with similar techniques in order to build tools that allow clinicians to privately classify their patients' data using models which they cannot access in the clear. Proper pre-processing techniques could create stronger and more efficiently computed models. Furthermore, medical researchers may use these models in order to determine whether their models are over-fit to their own data sets.

Further classification models of interest are Support Vector Machines (SVM) and deep learning methods. Privacy-preserving classification methods implementing private-key fully homomorphic encryption could be developed for these models and implemented in the clinical setting along with the models described in this work. Furthermore, various privacy-preserving bioinformatics techniques could be explored, and the privacy-preserving decision tree classification could be extended to random forests.

Applications of interest for further study outside of the medical setting include personal security as well as national security settings. Personal security protocols using fully homomorphic encryption could include spam filters for e-mail and private data mining of individual online behavior. Law enforcement could implement a version of third-party private search in order to allow officials to query a database stored in encrypted format without access to that database or the decryption key.

Appendices

Appendix A

Third Party Private Search Parameters

The Third Party Private Search protocol proposed in Chapter 6 includes a prediction error in the form of false positives. Recall that the protocol implements three parameters: n, the number of elements in the input database; q1, the maximum bit size of values in the database; and q2, the bit size of the one-time pad implemented by the Client. The values in the tables below display the fallout, given by

$$\mathrm{fallout} = \frac{\#\text{false positives}}{\#\text{false positives} + \#\text{true negatives}}.$$

This measure is chosen because, during the protocol, no false negatives will occur. It is necessary that parameters be chosen to avoid the case of false positives. Experiments were run using the Monte Carlo method with various inputs for all three parameters. During each test, 100,000 experiments were performed. Algorithm 18 shows the pseudocode for the experiments; a Python transcription follows the pseudocode below. In summary, for each experiment a database with n random elements of at most $q_1$ bits was generated for the Database Owner, and the Client's value was set to another $q_1$-bit integer. The Client's value was chosen randomly under the constraint that it does not appear in the database. Then, a random $q_2$-bit integer was chosen as the Client's one-time pad and the third-party private search protocol was performed. The experiment was implemented via Python 3.6.

Algorithm 18 Fallout Error Estimation

Input: Parameters n, $q_1$, and $q_2$.
Output: A boolean value: 1 denoting a false positive or 0 denoting a true negative.

1: Populate a random database $D = \{d_i\}_{i=1}^{n}$ with n random integers between 1 and $2^{q_1}$.

2: Pick a random value c between 1 and $2^{q_1}$ such that $c \notin D$.

3: Pick a random $q_2$-bit value r.

4: Compute the product
$$p = \prod_{i=1}^{n} (c + r - d_i).$$

5: if $r \mid p$ then

6: return 1.

7: else

8: return 0

9: end if
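A Python transcription of Algorithm 18, suitable for reproducing the Monte Carlo estimates below (a sketch; forcing the top bit of r makes it an exactly $q_2$-bit pad):

    import random

    # Sketch of Algorithm 18: one Monte Carlo trial of the fallout estimate.
    def fallout_trial(n, q1, q2):
        D = [random.randint(1, 2**q1) for _ in range(n)]   # random database
        c = random.randint(1, 2**q1)
        while c in D:                                       # ensure c is not in D
            c = random.randint(1, 2**q1)
        r = random.getrandbits(q2) | (1 << (q2 - 1))        # q2-bit one-time pad
        p = 1
        for d in D:
            p *= c + r - d                                  # p = prod(c + r - d_i)
        return int(p % r == 0)                              # 1 = false positive

    trials = 100_000
    rate = sum(fallout_trial(10, 5, 12) for _ in range(trials)) / trials
    print(f"estimated fallout: {rate:.5f}")                 # cf. Table A.1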

Tables A.1 and A.2 show the resulting fallout for input parameters n, $q_1$, and $q_2$.

q2     q1 = 2   q1 = 3   q1 = 4   q1 = 5    q1 = 6   q1 = 7   q1 = 8   q1 = 9
3      0.41460  0.72118  0.86253  0.89877   0.91328  0.92093  0.92449  0.92553
4      0.22381  0.52138  0.70157  0.78353   0.81255  0.82663  0.83065  0.83624
5      0.10395  0.313    0.46225  0.56652   0.62425  0.65258  0.66602  0.67098
6      0.03768  0.172    0.30843  0.40693   0.47366  0.50564  0.52955  0.53678
7      0.01547  0.09118  0.18744  0.27385   0.3303   0.37508  0.39686  0.40954
8      0.00596  0.04517  0.11069  0.17915   0.22918  0.26606  0.28726  0.30213
9      0.00183  0.02085  0.05927  0.10907   0.14979  0.18012  0.20118  0.20953
10     0.00067  0.00968  0.03355  0.06601   0.0947   0.11892  0.13754  0.14909
11     0.00014  0.00405  0.01725  0.03872   0.05983  0.07888  0.09183  0.10067
12     0        0.00186  0.00957  0.02123   0.03717  0.04983  0.06008  0.06735
13     0.00001  0.0007   0.00426  0.01153   0.02259  0.03105  0.03971  0.04398
14     0        0.00025  0.00209  0.00603   0.01226  0.01853  0.02425  0.02889
15     0        0.0001   0.00102  0.00387   0.00722  0.01186  0.01613  0.01871
16     0        0.00005  0.0005   0.00157   0.00391  0.00711  0.00965  0.01156
17     0        0.00001  0.00015  0.00085   0.00212  0.00351  0.0058   0.0072
18     0        0.00001  0.0001   0.00054   0.00119  0.00223  0.00276  0.0044
19     0        0        0.00001  0.0002    0.00068  0.00115  0.00189  0.00268
20     0        0        0.00001  0.000011  0.00039  0.00075  0.001    0.00153
21     0        0        0        0.00004   0.0002   0.00026  0.00053  0.00091
22     0        0        0        0.00003   0.00006  0.00012  0.00043  0.00052
23     0        0        0        0.00001   0.00003  0.00013  0.00016  0.00025
24     0        0        0        0         0.00005  0.00008  0.00007  0.00019
25     0        0        0        0         0        0.00004  0.00005  0.00006
26     0        0        0        0         0        0        0.00003  0.00005
27     0        0        0        0         0.00001  0.00001  0        0.00004
28     0        0        0        0         0        0        0.00001  0.00003
29     0        0        0        0         0        0        0        0.00001
30     0        0        0        0         0        0        0        0.00001
31-50  0 in every column

Table A.1: Fallout for n = 10

q2     q1 = 10  q1 = 11  q1 = 12  q1 = 13  q1 = 14  q1 = 15  q1 = 16  q1 = 17
1      0.99951  0.99944  0.99954  0.9994   0.99951  0.99953  0.99954  0.99937
2      0.9917   0.99218  0.99232  0.99229  0.99204  0.99184  0.99193  0.99179
3      0.92762  0.92759  0.92747  0.9266   0.92638  0.92845  0.928    0.92831
4      0.83613  0.83924  0.8383   0.84121  0.84102  0.83971  0.83775  0.83847
5      0.67544  0.67629  0.68143  0.67705  0.67762  0.67795  0.67924  0.67743
6      0.54115  0.54548  0.54469  0.54435  0.54454  0.54413  0.54639  0.54424
7      0.41004  0.41331  0.41516  0.41628  0.41832  0.41945  0.41608  0.41557
8      0.30585  0.31239  0.31338  0.31307  0.31209  0.31094  0.31166  0.31321
9      0.22153  0.22481  0.2257   0.22458  0.22748  0.23065  0.22765  0.22604
10     0.15357  0.15952  0.15982  0.16164  0.16195  0.16376  0.16297  0.16129
11     0.10642  0.1118   0.11059  0.11336  0.11524  0.11457  0.11459  0.11353
12     0.07145  0.07646  0.07802  0.08019  0.07843  0.07957  0.07935  0.08035
13     0.04811  0.05156  0.05182  0.05294  0.0539   0.05299  0.05415  0.05276
14     0.03175  0.03241  0.03556  0.03594  0.03569  0.03709  0.0358   0.03773
15     0.0207   0.02135  0.02363  0.0235   0.02402  0.0238   0.02505  0.02495
16     0.01285  0.01476  0.0148   0.0154   0.01509  0.01694  0.01602  0.01656
17     0.00844  0.00913  0.00959  0.00961  0.01069  0.01059  0.01075  0.01047
18     0.00523  0.00526  0.00611  0.00564  0.0066   0.00681  0.00654  0.00719
19     0.00286  0.00339  0.00357  0.00408  0.00414  0.00444  0.00423  0.0041
20     0.00168  0.00214  0.00243  0.00235  0.00297  0.00287  0.00267  0.00268
21     0.00095  0.00105  0.00157  0.00183  0.00145  0.00151  0.0019   0.00175
22     0.00074  0.0008   0.00097  0.00092  0.0009   0.00094  0.00103  0.00119
23     0.00039  0.0003   0.00055  0.00059  0.00055  0.00065  0.00066  0.0006
24     0.00021  0.00026  0.00036  0.00032  0.0004   0.00039  0.0004   0.00052
25     0.00011  0.00017  0.00023  0.00016  0.0002   0.00022  0.0003   0.00021
26     0.00002  0.00009  0.00007  0.00012  0.0001   0.00015  0.0001   0.00015
27     0.00003  0.00002  0.00008  0.00003  0.00006  0.0001   0.0001   0.00011
28     0.00002  0.00004  0.00004  0.00001  0.00003  0.00005  0.00002  0.00004
29     0.00001  0.00002  0.00002  0.00001  0.00005  0.00003  0.00006  0.00007
30     0        0.00001  0.00002  0.00002  0.00002  0.00001  0.00003  0.00001
31     0        0.00001  0.00001  0        0        0.00001  0.00001  0.00002
32     0        0        0        0        0        0.00001  0        0.00001
33     0        0.00001  0        0        0.00001  0.00002  0        0
34-50  0 in every column

Table A.2: Fallout for n = 10, continued

Bibliography

[1] Rakesh Agrawal and Ramakrishnan Srikant. 'Privacy-preserving Data Mining'. In: SIGMOD Rec. 29.2 (May 2000), pp. 439–450.
[2] Jacob Alperin-Sheriff and Chris Peikert. 'Faster Bootstrapping with Polynomial Error'. In: Advances in Cryptology – CRYPTO 2014. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Aug. 2014, pp. 297–314.
[3] Sebastian Angel, Hao Chen, Kim Laine, and Srinath T. V. Setty. 'PIR with Compressed Queries and Amortized Query Processing'. In: 2018 IEEE Symposium on Security and Privacy (SP) (2018), pp. 962–979.
[4] Jean-Claude Bajard, Julien Eynard, M. Anwar Hasan, and Vincent Zucca. 'A Full RNS Variant of FV Like Somewhat Homomorphic Encryption Schemes'. In: Selected Areas in Cryptography – SAC 2016. Ed. by Roberto Avanzi and Howard Heys. Cham: Springer International Publishing, 2017, pp. 423–442.
[5] Joppe W. Bos, Kristin Lauter, Jake Loftus, and Michael Naehrig. 'Improved Security for a Ring-Based Fully Homomorphic Encryption Scheme'. In: Cryptography and Coding: 14th IMA International Conference, IMACC 2013, Oxford, UK, December 17-19, 2013. Proceedings. Ed. by Martijn Stam. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 45–64.
[6] Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. 'Machine Learning Classification over Encrypted Data'. In: Symposium on Network and Distributed System Security (NDSS). 2015.
[7] Zvika Brakerski. 'Fully Homomorphic Encryption without Modulus Switching from Classical GapSVP'. In: Advances in Cryptology – CRYPTO 2012. Ed. by Reihaneh Safavi-Naini and Ran Canetti. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 868–886. isbn: 978-3-642-32009-5.
[8] Zvika Brakerski and Vinod Vaikuntanathan. 'Efficient Fully Homomorphic Encryption from (Standard) LWE'. In: Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science. FOCS '11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 97–106.


[9] Zvika Brakerski and Vinod Vaikuntanathan. 'Fully Homomorphic Encryption from ring-LWE and Security for Key Dependent Messages'. In: Proceedings of the 31st Annual Conference on Advances in Cryptology. CRYPTO '11. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 505–524.
[10] Zvika Brakerski and Vinod Vaikuntanathan. 'Lattice-based FHE As Secure As PKE'. In: Proceedings of the 5th Conference on Innovations in Theoretical Computer Science. ITCS '14. Princeton, New Jersey, USA: ACM, 2014, pp. 1–12.
[11] Zvika Brakerski, Vinod Vaikuntanathan, and Craig Gentry. 'Fully homomorphic encryption without bootstrapping'. In: Innovations in Theoretical Computer Science. 2012.
[12] Leo Breiman, Jerome Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Statistics/Probability Series. Belmont, California, U.S.A.: Wadsworth Publishing Company, 1984.
[13] Rahul C. Deo. 'Machine Learning in Medicine'. In: Circulation 132.20 (Nov. 2015), pp. 1920–1930.
[14] Jan Camenisch, Markulf Kohlweiss, Alfredo Rial, and Caroline Sheedy. 'Blind and Anonymous Identity-Based Encryption and Authorised Private Searches on Public Key Encrypted Data'. In: Public Key Cryptography – PKC 2009. Ed. by Stanisław Jarecki and Gene Tsudik. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 196–214.
[15] Benny Chor, Eyal Kushilevitz, Oded Goldreich, and Madhu Sudan. 'Private Information Retrieval'. In: J. ACM 45.6 (Nov. 1998), pp. 965–981.
[16] Tung Chou and Claudio Orlandi. 'The Simplest Protocol for Oblivious Transfer'. In: Progress in Cryptology – LATINCRYPT 2015. Ed. by Kristin Lauter and Francisco Rodríguez-Henríquez. Cham: Springer International Publishing, 2015, pp. 40–58.
[17] Jean-Sébastien Coron, Avradip Mandal, David Naccache, and Mehdi Tibouchi. 'Fully Homomorphic Encryption over the Integers with Shorter Public Keys'. In: Proceedings of the 31st Annual Conference on Advances in Cryptology. CRYPTO '11. Santa Barbara, CA: Springer-Verlag, 2011, pp. 487–504.
[18] R. Cypher and J. L. C. Sanz. 'SIMD architectures and algorithms for image processing and computer vision'. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 37.12 (1989), pp. 2158–2174.
[19] Emiliano De Cristofaro, Yanbin Lu, and Gene Tsudik. 'Efficient Techniques for Privacy-Preserving Sharing of Sensitive Information'. In: Trust and Trustworthy Computing. Ed. by Jonathan M. McCune, Boris Balacheff, Adrian Perrig, Ahmad-Reza Sadeghi, Angela Sasse, and Yolanta Beres. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 239–253.
[20] Marten van Dijk, Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan. 'Fully Homomorphic Encryption over the Integers'. In: Advances in Cryptology – EUROCRYPT 2010. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, May 2010, pp. 24–43.

[21] Nathan Dowlin, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. Manual for Using Homomorphic Encryption for Bioinformatics. Microsoft Research, 2015. url: https://www.microsoft.com/en-us/research/publication/manual-for-using-homomorphic-encryption-for-bioinformatics/.
[22] Cynthia Dwork. 'Differential Privacy'. In: Automata, Languages and Programming: 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II. Ed. by Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 1–12.
[23] T. ElGamal. 'A public key cryptosystem and a signature scheme based on discrete logarithms'. In: IEEE Transactions on Information Theory 31.4 (1985), pp. 469–472.
[24] Daniel Faggella. 7 Applications of Machine Learning in Pharma and Medicine. July 2018. url: https://www.techemergence.com/machine-learning-in-pharma-medicine/.
[25] Junfeng Fan and Frederik Vercauteren. Somewhat Practical Fully Homomorphic Encryption. Cryptology ePrint Archive, Report 2012/144. http://eprint.iacr.org/2012/144. 2012.
[26] Craig Gentry. 'Computing Arbitrary Functions of Encrypted Data'. In: Commun. ACM 53.3 (Mar. 2010), pp. 97–105.
[27] Craig Gentry. 'Fully Homomorphic Encryption Using Ideal Lattices'. In: Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing. STOC '09. ACM, 2009, pp. 169–178.
[28] Craig Gentry and Shai Halevi. Fully Homomorphic Encryption without Squashing Using Depth-3 Arithmetic Circuits. Cryptology ePrint Archive, Report 2011/279. http://eprint.iacr.org/2011/279. 2011.
[29] Craig Gentry, Shai Halevi, and Nigel P. Smart. 'Better Bootstrapping in Fully Homomorphic Encryption'. In: Public Key Cryptography – PKC 2012: 15th International Conference on Practice and Theory in Public Key Cryptography, Darmstadt, Germany, May 21-23, 2012. Proceedings. Ed. by Marc Fischlin, Johannes Buchmann, and Mark Manulis. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 1–16.
[30] Craig Gentry, Shai Halevi, and Nigel P. Smart. 'Fully Homomorphic Encryption with Polylog Overhead'. In: Advances in Cryptology – EUROCRYPT 2012. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Apr. 2012, pp. 465–482.
[31] Craig Gentry, Shai Halevi, and Nigel P. Smart. 'Homomorphic Evaluation of the AES Circuit'. In: Advances in Cryptology – CRYPTO 2012. Ed. by Reihaneh Safavi-Naini and Ran Canetti. Springer Berlin Heidelberg, 2012, pp. 850–867.
[32] Craig Gentry, Amit Sahai, and Brent Waters. 'Homomorphic Encryption from Learning with Errors: Conceptually-Simpler, Asymptotically-Faster, Attribute-Based'. In: CRYPTO. Springer, 2013, pp. 75–92.

[33] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. 'CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy'. In: PMLR. June 2016, pp. 201–210.
[34] Shafi Goldwasser and Silvio Micali. 'Probabilistic Encryption & How to Play Mental Poker Keeping Secret All Partial Information'. In: Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing. STOC '82. New York, NY, USA: ACM, 1982, pp. 365–377.
[35] Thore Graepel, Kristin Lauter, and Michael Naehrig. 'ML Confidential: Machine Learning on Encrypted Data'. In: Information Security and Cryptology – ICISC 2012: 15th International Conference, Seoul, Korea, November 28-30, 2012, Revised Selected Papers. Ed. by Taekyoung Kwon, Mun-Kyu Lee, and Daesung Kwon. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 1–21.
[36] Alexey Gribov, Delaram Kahrobaei, and Vladimir Shpilrain. 'Private-Key Fully Homomorphic Encryption in Rings'. In: Groups, Complexity, Cryptology 10.1 (2018), pp. 17–27.
[37] Isabelle Guyon and André Elisseeff. 'An Introduction to Variable and Feature Selection'. In: The Journal of Machine Learning Research 3 (Mar. 2003), pp. 1157–1182.
[38] Shai Halevi and Victor Shoup. Algorithms in HElib. Cryptology ePrint Archive, Report 2014/106. http://eprint.iacr.org/2014/106. 2014.
[39] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Second Edition. Springer Series in Statistics. New York, NY, USA: Springer New York Inc., 2009.
[40] Soni Jyoti, Ansari Ujma, Sharma Dipesh, and Soni Sunita. 'Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction'. In: International Journal of Computer Applications 17.8 (Mar. 31, 2011), pp. 43–48.
[41] A. Khedr, G. Gulak, and V. Vaikuntanathan. 'SHIELD: Scalable Homomorphic Implementation of Encrypted Data-Classifiers'. In: IEEE Transactions on Computers 65.9 (2016), pp. 2848–2858.
[42] Igor Kononenko. 'Machine learning for medical diagnosis: history, state of the art and perspective'. In: Artificial Intelligence in Medicine 23.1 (Aug. 2001), pp. 89–109. issn: 0933-3657.
[43] Kim Laine. Simple Encrypted Arithmetic Library. Version 2.3.1. Microsoft. 2017.
[44] LIBLINEAR – A Library for Large Linear Classification. url: https://www.csie.ntu.edu.tw/~cjlin/liblinear.
[45] M. Lichman. UCI Machine Learning Repository. 2013. url: http://archive.ics.uci.edu/ml.
[46] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. 'On Ideal Lattices and Learning with Errors over Rings'. In: J. ACM 60.6 (Nov. 2013), pp. 1–35.

[47] Centers for Medicare & Medicaid Services. The Health Insurance Portability and Accountability Act of 1996 (HIPAA). Online at http://www.cms.hhs.gov/hipaa/. 1996.
[48] Pascal Paillier. 'Public-Key Cryptosystems Based on Composite Degree Residuosity Classes'. In: Advances in Cryptology – EUROCRYPT '99. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, May 1999, pp. 223–238.
[49] Vasilis Pappas, Mariana Raykova, Binh Vo, Steven M. Bellovin, and Tal Malkin. 'Private search in the real world'. In: Proceedings of the 27th Annual Computer Security Applications Conference. ACSAC '11. ACM Press, 2011, pp. 83–92.
[50] Chris Peikert. 'A Decade of Lattice Cryptography'. In: Foundations and Trends in Theoretical Computer Science 10.4 (Mar. 2016), pp. 283–424.
[51] Huy Nguyen Anh Pham and Evangelos Triantaphyllou. 'The Impact of Overfitting and Overgeneralization on the Classification Accuracy in Data Mining'. In: Soft Computing for Knowledge Discovery and Data Mining. Ed. by Oded Maimon and Lior Rokach. Boston, MA: Springer US, 2008, pp. 391–431.
[52] D. Powers. 'Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation'. In: Journal of Machine Learning Technologies 2 (2011), pp. 37–63.
[53] Michael O. Rabin. How to exchange secrets with oblivious transfer. Tech. rep. TR-81. Aiken Computation Lab, Harvard University, 1981.
[54] Y. Rahulamathavan, R. C. Phan, S. Veluru, K. Cumanan, and M. Rajarajan. 'Privacy-Preserving Multi-Class Support Vector Machine for Outsourcing the Data Classification in Cloud'. In: IEEE Transactions on Dependable and Secure Computing 11.5 (Sept. 2014), pp. 467–479.
[55] M. Raykova, A. Cui, B. Vo, B. Liu, T. Malkin, S. M. Bellovin, and S. J. Stolfo. 'Usable, Secure, Private Search'. In: IEEE Security & Privacy 10.5 (2012), pp. 53–60.
[56] Oded Regev. 'On Lattices, Learning with Errors, Random Linear Codes, and Cryptography'. In: Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing. STOC '05. Baltimore, MD, USA: ACM, 2005, pp. 84–93.
[57] Ronald Rivest, Len Adleman, and Michael Dertouzos. 'On Data Banks and Privacy Homomorphisms'. In: Foundations of Secure Computation 4.11 (1978), pp. 165–179.
[58] Ronald L. Rivest. 'Cryptography and machine learning'. In: Advances in Cryptology – ASIACRYPT '91. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Nov. 1991, pp. 427–439.
[59] S. Halevi. HElib: An Implementation of homomorphic encryption. 2013. url: https://github.com/shaih/HElib.
[60] Nigel P. Smart and Frederik Vercauteren. 'Fully Homomorphic SIMD Operations'. In: Des. Codes Cryptography 71.1 (Apr. 2014), pp. 57–81.

[61] X. Sun, P. Zhang, J. K. Liu, J. Yu, and W. Xie. 'Private machine learning classification based on fully homomorphic encryption'. In: IEEE Transactions on Emerging Topics in Computing (2018). Early access.
[62] The GNU MP Bignum Library. url: https://gmplib.org/.
[63] Jaideep Vaidya, Hwanjo Yu, and Xiaoqian Jiang. 'Privacy-preserving SVM classification'. In: Knowledge and Information Systems 14.2 (2008), pp. 161–178.
[64] Thijs Veugen. Comparing encrypted data. 2011. url: http://msp.ewi.tudelft.nl/sites/default/files/Comparingencrypteddata.pdf.
[65] Thijs Veugen. 'Encrypted Integer Division and Secure Comparison'. In: Int. J. Appl. Cryptol. 3.2 (June 2014), pp. 166–180.
[66] Shuang Wang, Noman Mohammed, and Rui Chen. 'Differentially private genome data dissemination through top-down specialization'. In: BMC Medical Informatics and Decision Making 14.1 (2014).
[67] W. H. Wolberg and O. L. Mangasarian. 'Multisurface method of pattern separation for medical diagnosis applied to breast cytology'. In: Proceedings of the National Academy of Sciences of the United States of America 87.23 (Dec. 1990), pp. 9193–9196.
[68] David J. Wu, Tony Feng, Michael Naehrig, and Kristin Lauter. 'Privately Evaluating Decision Trees and Random Forests'. In: Proceedings on Privacy Enhancing Technologies 2016.4 (2016), pp. 335–355.

Autobiographical Statement

I was born in Dallas, Texas, and grew up in Dallas, Texas, and The Woodlands, Texas. After completing my primary and secondary education, I attended DePaul University in Chicago, Illinois. I received numerous awards and scholarships and served as Vice President of the Math Club. I graduated summa cum laude with a Bachelor of Arts in Pure Mathematics in 2012. My interest in research was sparked at this time during undergraduate mathematics research projects: one in combinatorics at DePaul University, and another in computational number theory at the Rose-Hulman Institute of Technology's Mathematics Research Experience for Undergraduates program.

I continued with graduate education in Pure Mathematics at DePaul University and at The Graduate Center, The City University of New York (CUNY). I passed graduate-level qualifying examinations in Analysis and in Algebra at DePaul University, as well as in Algebra and in Topology at The Graduate Center, CUNY. In 2015, I received a Master of Science with distinction in Pure Mathematics from DePaul University.

I matriculated into the Computer Science Ph.D. program at The Graduate Center, CUNY, in 2016, and received a Master of Philosophy in Computer Science in 2017. During this time I served as Chairperson of the Computer Science Students' Association and was the elected student representative on the Executive, Curriculum, and Elections Committees. I also served as the Computer Science Student Representative within the Doctoral Student Council and the Graduate Council, and co-organized the weekly Graduate Center Crypto-Math Seminar.

My graduate research started in computational group theory and complexity theory within cryptography. To receive my Master of Philosophy in Computer Science, I presented a survey on the generic-case complexity of the conjugacy search problem in platform groups for non-commutative cryptosystems. I continued research in cryptography, where I researched fully homomorphic encryption. I was motivated then to apply my work to a problem with real-world implications, and my research turned to the intersections between cryptography, machine learning, and medical classification. I performed my doctoral research on utilizing fully homomorphic encryption for classification algorithms over medical data.

I will continue this research as a Postdoctoral Researcher at the Biomedical and Clinical Informatics Lab in the Department of Computational Medicine and Bioinformatics at the University of Michigan beginning October 2018. I will be involved in cryptographic research as well as the development of image processing techniques applied to segmentation, classification, and diagnosis using medical data.

Concurrent with my research, I pursued opportunities to teach mathematics and computer science at the undergraduate level. I have taught courses at New York University, New York City College of Technology, and John Jay College of Criminal Justice. Courses I taught include Python for Engineering, Cryptography and Cryptanalysis, and Calculus. Teaching allowed me to grow as a scholar and a mentor and provided an invaluable and rewarding experience.

In addition to my academic work, I co-organized the first LGBTQ panel at an official Star Trek convention during Star Trek Mission New York in 2016, titled "Queer Trekkers: Gene Roddenberry's Galactic Utopias." I am inspired in my future work to continue to seek real-world applications which could make a positive impact on people's lives. I look forward to the challenges and opportunities available in my future career.