Secure Computation Systems for Confidential Data Analysis

by Rishabh Poddar

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor Raluca Ada Popa, Chair
Professor Ion Stoica
Professor Sylvia Ratnasamy
Professor Deirdre Mulligan

Fall 2020

Copyright 2020 by Rishabh Poddar

Abstract

A large number of services today are built around processing data that is collected from or shared by customers. While such services are typically able to protect the data when it is in transit or in storage using standard encryption protocols, they are unable to extend this protection to the data when it is being processed, making it vulnerable to breaches. This not only threatens data confidentiality in existing services but also prevents customers from using such services at all for sensitive workloads, because they are unwilling or unable to share their data owing to privacy concerns, regulatory hurdles, or business competition.

Existing solutions to this problem are unable to meet the requirements of advanced data analysis applications. Systems that are efficient do not provide strong enough security guarantees, and approaches with stronger security are often not efficient. To address this problem, the work in this dissertation develops new systems and protocols for securely computing on encrypted data that attempt to bridge the gap between security and efficiency. We distill design principles based on the properties of the two primary approaches for secure computation: advanced cryptographic protocols and trusted execution environments.
Informed by these principles, we design novel cryptographic protocols and algorithms with strong and provable security guarantees, and show how to use them to build systems that are both secure and efficient.

To my family.

Contents

Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Approaches for Secure Computation
1.3 Building Systems using Secure Computation
1.4 Impact and Adoption
1.5 Dissertation Roadmap

2 Building Secure and Practical Data Systems
2.1 Trusted Execution Environments
2.2 Cryptographic Approaches
2.3 Challenges and Design Strategy

3 Database Queries on Encrypted Data
3.1 Introduction
3.2 Overview
3.3 Encryption Building Blocks
3.4 ArxRange & Order-based Queries
3.5 ArxEq & Equality Queries
3.6 ArxAgg & Aggregation Queries
3.7 ArxJoin & Join Queries
3.8 Arx’s Planner
3.9 Security Analysis
3.10 Evaluation
3.11 Limitations and Future Work
3.12 Related Work
3.13 Summary

4 Collaborative SQL Analytics on Encrypted Data
4.1 Introduction
4.2 Senate’s API
4.3 Threat Model and Security Guarantees
4.4 Senate’s MPC Decomposition Protocol
4.5 Senate’s Circuit Primitives
4.6 Decomposable Circuits for SQL Operators
4.7 Query Execution
4.8 Evaluation
4.9 Limitations and Discussion
4.10 Related work
4.11 Summary

5 Analyzing Encrypted Network Traffic
5.1 Introduction
5.2 Model and Threat Model
5.3 SafeBricks: End-to-end Architecture
5.4 Background
5.5 SafeBricks: Framework Design
5.6 SafeBricks: NF Isolation, Least Privilege
5.7 SafeBricks: System Bootstrap Protocol
5.8 Security Guarantees
5.9 Evaluation
5.10 Limitations and Future Work
5.11 Related Work
5.12 Summary

6 Encrypted Video Analytics and Machine Learning
6.1 Introduction
6.2 Background and Motivation
6.3 Threat Model and Security Guarantees
6.4 A Privacy-Preserving MLaaS Framework
6.5 Designing Oblivious Vision Modules
6.6 Oblivious Video Decoding
6.7 Oblivious Image Processing
6.8 Evaluation
6.9 Discussion
6.10 Related Work
6.11 Summary

7 Collaborative Machine Learning on Encrypted Data
7.1 Introduction
7.2 Overview
7.3 Threat Model and Security Guarantees
7.4 System Design
7.5 Data-oblivious training and inference
7.6 Implementation
7.7 Evaluation
7.8 Conclusion

8 Conclusion
8.1 Future Directions

Bibliography

A Joins over Multisets in Senate

B Invertibility of SQL Operators in Senate

C Security Proofs and Pseudocode for Visor
C.1 Oblivious video decoding
C.2 Oblivious image processing

D Impact of Video Encoder Padding on Visor
D.1 Inter-prediction for interframes

List of Figures

1.1 Classification of the systems we built, by computation scenario and the secure computation approach used by the system.
3.1 Arx’s architecture: Shaded boxes depict components introduced by Arx. Locks indicate that sensitive data at the component is encrypted.
3.2 ArxRange example. Enc is encryption with BASE.
3.3 Search token tree.
3.4 ArxEq read throughput with increasing no. of duplicates.
3.5 ArxEq write throughput with increasing no. of duplicates.
3.6 YCSB throughput for different workloads.
3.7 ArxRange latency of reads and writes.
3.8 ArxRange throughput, with and without caching.
3.9 ShareLaTeX performance with Arx’s client proxy on varying cores
3.10 ShareLaTeX performance with increasing no. of client threads
4.1 Overview of Senate’s workflow.
4.2 Query execution in the baseline (monolithic MPC) vs. Senate (decomposed MPC). σ represents a filtering operation, and ⋈ is a join. Green boxes with locks denote MPC operations; white boxes denote plaintext computation. X represents additional verification operations added by Senate.
4.3 Query execution in Senate. Colored keys and locks indicate which parties are involved in which MPC circuits.
4.4 Performance of m-SI in LAN.
4.5 Performance of m-Sort in LAN.
4.6 Performance of m-SU in LAN.
4.7 Resource consumption of building blocks (16 parties).
4.8 Building blocks in WAN.
4.9 Query 1 with 16 parties.
4.10 Query 2 with 16 parties.
4.11 Query 3 with 16 parties.
4.12 Effect of query splitting on runtime.
4.13 Network usage.
4.14 Queries in WAN.
4.15 Senate’s performance on TPC-H queries.
4.16 Accuracy of cost model.
4.17 Semi-honest baselines.
5.1 Model for outsourced NFs.
5.2 End-to-end system architecture.
5.3 SafeBricks framework: White boxes denote existing NetBricks components, light