Cryptϵ: Crypto-Assisted Differential Privacy on Untrusted Servers

Cryptϵ: Crypto-Assisted Differential Privacy on Untrusted Servers Amrita Roy Chowdhury Chenghong Wang Xi He University of Wisconsin-Madison Duke University University of Waterloo [email protected] [email protected] [email protected] Ashwin Machanavajjhala Somesh Jha Duke University University of Wisconsin-Madison [email protected] [email protected] ABSTRACT optimizations leveraging the fact that the output is noisy. Differential privacy (DP) is currently the de-facto standard We demonstrate Cryptϵ’s practical feasibility with extensive for achieving privacy in data analysis, which is typically im- empirical evaluations on real world datasets. plemented either in the “central” or “local” model. The local ACM Reference Format: model has been more popular for commercial deployments Amrita Roy Chowdhury, Chenghong Wang, Xi He, Ashwin Machanava- as it does not require a trusted data collector. This increased jjhala, and Somesh Jha. 2020. Cryptϵ: Crypto-Assisted Differen- privacy, however, comes at the cost of utility and algorithmic tial Privacy on Untrusted Servers. In 2020 ACM SIGMOD Inter- expressibility as compared to the central model. national Conference on Management of Data (SIGMOD’20), June In this work, we propose, Cryptϵ, a system and program- 14–19, 2020, Portland, OR, USA. ACM, New York, NY, USA, 24 pages. ming framework that (1) achieves the accuracy guarantees https://doi.org/10.1145/3318464.3380596 and algorithmic expressibility of the central model (2) without any trusted data collector like in the local model. Cryptϵ 1 INTRODUCTION achieves the “best of both worlds” by employing two non- Differential privacy (DP) is a rigorous privacy definition that colluding untrusted servers that run DP programs on en- is currently the gold standard for data privacy. It is typically crypted data from the data owners. In theory, straightforward implemented in one of two models – centralized differential implementations of DP programs using off-the-shelf secure privacy (CDP) and local differential privacy (LDP). In CDP, multi-party computation tools can achieve the above goal. data from individuals are collected and stored in the clear However, in practice, they are beset with many challenges in a trusted centralized data curator which then executes like poor performance and tricky security proofs. To this end, DP programs on the sensitive data and releases outputs to Cryptϵ allows data analysts to author logical DP programs an untrusted data analyst. In LDP, there is no trusted data that are automatically translated to secure protocols that curator. Rather, each individual perturbs his/her own data work on encrypted data. These protocols ensure that the using a (local) DP algorithm. The data analyst uses these untrusted servers learn nothing more than the noisy outputs, noisy data to infer aggregate statistics of the datasets. In thereby guaranteeing DP (for computationally bounded ad- practice, CDP’s assumption of a trusted server is ill-suited versaries) for all Cryptϵ programs. Cryptϵ supports a rich for many applications as it constitutes a single point of fail- class of DP programs that can be expressed via a small set ure for data breaches, and saddles the trusted curator with of transformation and measurement operators followed by arXiv:1902.07756v5 [cs.CR] 10 Mar 2020 legal and ethical obligations to uphold data privacy. Hence, arbitrary post-processing. Further, we propose performance recent commercial deployments of DP [42, 51] have preferred LDP over CDP. However, LDP’s attractive privacy properties Permission to make digital or hard copies of all or part of this work for comes at a cost. Under the CDP model, the expected additive personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear error for a aggregate count over a dataset of size n is at most this notice and the full citation on the first page. Copyrights for components Θ¹1/ϵº top achieve ϵ-DP. In contrast, under the LDP model, at of this work owned by others than ACM must be honored. Abstracting with least Ω¹ n/ϵº additive expected error must be incurred by credit is permitted. To copy otherwise, or republish, to post on servers or to any ϵ-DP program [16, 28, 36], owing to the randomness of redistribute to lists, requires prior specific permission and/or a fee. Request each data owner. The LDP model in fact imposes additional permissions from [email protected]. penalties on the algorithmic expressibility; the power of LDP SIGMOD ’20, June 14–19, 2020, Portland, OR, USA © 2020 Association for Computing Machinery. is equivalent to that of the statistical query model [66] and ACM ISBN 978-1-4503-6735-6/20/06...$15.00 there exists an exponential separation between the accuracy https://doi.org/10.1145/3318464.3380596 and sample complexity of LDP and CDP algorithms [64]. In this paper, we strive to bridge the gap between LDP Data Owners Analytics Server ˜ Data Analyst D<latexit sha1_base64="v3Y0RkvdhbRMvYyOhK2FEjLB1TI=">AAACAXicbVDLSsNAFJ3UV62vqBvBzWARXJVEBF0WdeGygq2FJoTJZNIOncyEmYlQQtz4K25cKOLWv3Dn3zhps9DWA8MczrmXe+8JU0aVdpxvq7a0vLK6Vl9vbGxube/Yu3s9JTKJSRcLJmQ/RIowyklXU81IP5UEJSEj9+H4qvTvH4hUVPA7PUmJn6AhpzHFSBspsA+8ULBITRLz5Z6mLCL5dRG4RWA3nZYzBVwkbkWaoEInsL+8SOAsIVxjhpQauE6q/RxJTTEjRcPLFEkRHqMhGRjKUUKUn08vKOCxUSIYC2ke13Cq/u7IUaLKJU1lgvRIzXul+J83yHR84eeUp5kmHM8GxRmDWsAyDhhRSbBmE0MQltTsCvEISYS1Ca1hQnDnT14kvdOW67Tc27Nm+7KKow4OwRE4AS44B21wAzqgCzB4BM/gFbxZT9aL9W59zEprVtWzD/7A+vwBPXaXYA==</latexit> 1 3.Program Executor Crypt� and CDP. We propose, Cryptϵ, a system and a programming 2.Aggregator Program . ˜ framework for executing DP programs that: . <latexit sha1_base64="DzF3TBl0yOsNreO2IIUvqlLJDv8=">AAACCXicbVDLSsNAFJ3UV62vqEs3g0VwVRIRdFnUhcsK9gFNKJPJbTt0MgkzE6GEbN34K25cKOLWP3Dn3zhps9DWC8MczrmXe+4JEs6Udpxvq7Kyura+Ud2sbW3v7O7Z+wcdFaeSQpvGPJa9gCjgTEBbM82hl0ggUcChG0yuC737AFKxWNzraQJ+REaCDRkl2lADG3ua8RAyL4h5qKaR+TIvInpMCc9u8jwf2HWn4cwKLwO3BHVUVmtgf3lhTNMIhKacKNV3nUT7GZGaUQ55zUsVJIROyAj6BgoSgfKz2SU5PjFMiIexNE9oPGN/T2QkUoVL01mYVItaQf6n9VM9vPQzJpJUg6DzRcOUYx3jIhYcMglU86kBhEpmvGI6JpJQbcKrmRDcxZOXQees4ToN9+683rwq46iiI3SMTpGLLlAT3aIWaiOKHtEzekVv1pP1Yr1bH/PWilXOHKI/ZX3+AD5Gm04=</latexit> . D Program3.Program translateD Executor to • underlying secure never stores or computes on sensitive data in the clear computation protocols Differentially Private Output • achieves the accuracy guarantees and algorithmic express- Data Collection Phase Program Execution Phase ibility of the CDP model 1.Key 4.Privacy 5. Data Crypt� Manager Engine Decryption Program sk Key✏<latexit sha1_base64="i/UdiyKbp2jVEah3BP9sL5RnkFY=">AAACAHicbVDLSsNAFJ3UV62vqAsXbgaL4KokIuiy1I3LCvYBTSiTyU07dDIJMxOhhGz8FTcuFHHrZ7jzb5y2WWjrgWEO59zLvfcEKWdKO863VVlb39jcqm7Xdnb39g/sw6OuSjJJoUMTnsh+QBRwJqCjmebQTyWQOODQCya3M7/3CFKxRDzoaQp+TEaCRYwSbaShfeIFCQ/VNDZf7kGqGDdyqxjadafhzIFXiVuSOirRHtpfXpjQLAahKSdKDVwn1X5OpGaUQ1HzMgUpoRMygoGhgsSg/Hx+QIHPjRLiKJHmCY3n6u+OnMRqtqOpjIkeq2VvJv7nDTId3fg5E2mmQdDFoCjjWCd4lgYOmQSq+dQQQiUzu2I6JpJQbTKrmRDc5ZNXSfey4ToN9/6q3myVcVTRKTpDF8hF16iJ7lAbdRBFBXpGr+jNerJerHfrY1FascqeY/QH1ucPqXyXEw==</latexit> ManagerB Cryptϵ employs a pair of untrusted but non-colluding servers pk – Analytics Server (AS) and Cryptographic Service Provider Cryptographic Service ProviDer (CSP). The AS executes DP programs (like the data curator in CDP) but on encrypted data records. The CSP initializes Figure 1: Cryptϵ System and manages the cryptographic primitives, and collaborates with the AS to generate the program outputs. Under the as- order of magnitude. A novel contribution of this work is a sumption that the AS and the CSP are semi-honest and do DP indexing optimization that leverages the fact that noisy not collude (a common assumption in cryptographic systems intermediate statistics about the data can be revealed. [45, 46, 48, 67, 80, 83, 84]), Cryptϵ ensures ϵ-DP guarantee for • Practical for Real World Usage: For the same tasks, its programs via two cryptographic primitives – linear homo- Cryptϵ programs achieve accuracy comparable to CDP morphic encryption (LHE) and garbled circuits. One caveat and 50× more than LDP for a dataset of size ≈ 30K. Cryptϵ here is that due to the usage of cryptographic primitives, the runs within 3:6 hours for a large class of programs on a DP guarantee obtained in Cryptϵ is that of computational dataset with 1 million rows and 4 attributes. differential privacy or SIM-CDP [79] (details in Section 7). • Generalized Multiplication Using LHE: Our implemen- Cryptϵ provides a data analyst with a programming frame- tation uses an efficient way for performing n-way multipli- work to author logical DP programs just like in CDP. Like cations using LHE which maybe of independent interest. in prior work [40, 77, 106], access to the sensitive data is restricted via a set of predefined transformations operators 2 CRYPTϵ OVERVIEW (inspired by relational algebra) and DP measurement operators (Laplace mechanism and Noisy-Max [38]). Thus, any 2.1 System Architecture program that can be expressed as a composition of the above Figure 1 shows Cryptϵ’s system architecture. Cryptϵ has operators automatically satisfies ϵ-DP (in the CDP model) two servers: Analytics server (AS) and Cryptographic Ser- giving the analyst a proof of privacy for free. Cryptϵ pro- vice Provider (CSP). At the very outset, the CSP records grams support constructs like looping, conditionals, and can the total privacy budget, ϵB (provided by the data owners), arbitrarily post-process outputs of measurement operators. and generates

Load more