XONN: XNOR-based Oblivious Deep Neural Network Inference

M. Sadegh Riazi (UC San Diego), Mohammad Samragh (UC San Diego), Hao Chen (Microsoft Research), Kim Laine (Microsoft Research), Kristin Lauter (Microsoft Research), Farinaz Koushanfar (UC San Diego)

Abstract

Advancements in deep learning enable cloud servers to provide inference-as-a-service for clients. In this scenario, clients send their raw data to the server to run the deep learning model and send back the results. One standing challenge in this setting is to ensure the privacy of the clients' sensitive data. Oblivious inference is the task of running the neural network on the client's input without disclosing the input or the result to the server. This paper introduces XONN (pronounced /zʌn/), a novel end-to-end framework based on Yao's Garbled Circuits (GC) protocol, that provides a paradigm shift in the conceptual and practical realization of oblivious inference. In XONN, the costly matrix-multiplication operations of the deep learning model are replaced with XNOR operations that are essentially free in GC. We further provide a novel algorithm that customizes the neural network such that the runtime of the GC protocol is minimized without sacrificing the inference accuracy.

We design a user-friendly high-level API for XONN, allowing expression of the deep learning model architecture in an unprecedented level of abstraction. We further provide a compiler to translate the model description from high-level Python (i.e., Keras) to that of XONN. Extensive proof-of-concept evaluation on various neural network architectures demonstrates that XONN outperforms prior art such as Gazelle (USENIX Security'18) by up to 7×, MiniONN (ACM CCS'17) by 93×, and SecureML (IEEE S&P'17) by 37×. State-of-the-art frameworks require one round of interaction between the client and the server for each layer of the neural network, whereas XONN requires a constant number of rounds of interaction for any number of layers in the model. XONN is the first to perform oblivious inference on Fitnet architectures with up to 21 layers, suggesting a new level of scalability compared with the state of the art. Moreover, we evaluate XONN on four datasets to perform privacy-preserving medical diagnosis. The datasets include breast cancer, diabetes, liver disease, and Malaria.

1 Introduction

The advent of big data and striking recent progress in artificial intelligence are fueling the impending industrial automation revolution. In particular, Deep Learning (DL), a method based on learning Deep Neural Networks (DNNs), is demonstrating a breakthrough in accuracy. DL models outperform human cognition in a number of critical tasks such as speech and visual recognition, natural language processing, and medical data analysis. Given DL's superior performance, several technology companies are now developing or already providing DL as a service. They train their DL models on a large amount of (often) proprietary data on their own servers; then, an inference API is provided to the users who can send their data to the server and receive the analysis results on their queries. The notable shortcoming of this remote inference service is that the inputs are revealed to the cloud server, breaching the privacy of sensitive user data.

Consider a DL model used in a medical task in which a health service provider withholds the prediction model. Patients submit their plaintext medical information to the server, which then uses the sensitive data to provide a medical diagnosis based on inference obtained from its proprietary model. A naive solution to ensure patient privacy is to allow the patients to receive the DL model and run it on their own trusted platform. However, this solution is not practical in real-world scenarios because: (i) The DL model is considered an essential component of the service provider's intellectual property (IP). Companies invest a significant amount of resources and funding to gather the massive datasets and train the DL models; hence, it is important to service providers not to reveal the DL model to ensure their profitability and competitive advantage. (ii) The DL model is known to reveal information about the underlying data used for training [1]. In the case of medical data, this reveals sensitive information about other patients, violating HIPAA and similar patient health privacy regulations.

Oblivious inference is the task of running the DL model on the client's input without disclosing the input or the result to the server itself. Several solutions for oblivious inference have been proposed that utilize one or more cryptographic tools such as Homomorphic Encryption (HE) [2, 3], Garbled Circuits (GC) [4], the Goldreich-Micali-Wigderson (GMW) protocol [5], and Secret Sharing (SS). Each of these cryptographic tools offers its own characteristics and trade-offs. For example, one major drawback of HE is its computational complexity. HE has two main variants: Fully Homomorphic Encryption (FHE) [2] and Partially Homomorphic Encryption (PHE) [3, 6]. FHE allows computation on encrypted data but is computationally very expensive. PHE has less overhead but only supports a subset of functions or depth-bounded arithmetic circuits. The computational complexity drastically increases with the circuit's depth. Moreover, non-linear functionalities such as the ReLU activation function in DL cannot be supported.

GC, on the other hand, can support an arbitrary functionality while requiring only a constant round of interactions regardless of the depth of the computation. However, it has a high communication cost and a significant overhead for multiplication. More precisely, performing multiplication in GC has quadratic computation and communication complexity with respect to the bit-length of the input operands.
It is well-known that the complexity of the contemporary DL methodologies is dominated by matrix-vector multiplications. GMW needs less communication than GC but requires many rounds of interactions between the two parties. A standalone SS-based scheme provides a computationally inexpensive multiplication yet requires three or more independent (non-colluding) computing servers, which is a strong assumption. Mixed-protocol solutions have been proposed with the aim of utilizing the best characteristics of each of these protocols [7, 8, 9, 10]. They require secure conversion of secrets from one protocol to another in the middle of execution. Nevertheless, it has been shown that the cost of secret conversion is paid off in these hybrid solutions. Roughly speaking, the number of interactions between server and client (i.e., round complexity) in existing hybrid solutions is linear with respect to the depth of the DL model. Since depth is a major contributor to the deep learning accuracy [11], scalability of the mixed-protocol solutions with respect to the number of layers remains an unsolved issue for more complex, many-layer networks.

This paper introduces XONN, a novel end-to-end framework which provides a paradigm shift in the conceptual and practical realization of privacy-preserving inference on deep neural networks. The existing work has largely focused on the development of customized security protocols while using conventional fixed-point deep learning algorithms. XONN, for the first time, suggests leveraging the concept of Binary Neural Networks (BNNs) in conjunction with the GC protocol. In BNNs, the weights and activations are restricted to binary (i.e., ±1) values, substituting the costly multiplications with simple XNOR operations during the inference phase. The XNOR operation is known to be free in the GC protocol [12]; therefore, performing oblivious inference on BNNs using GC results in the removal of costly multiplications. Using our approach, we show that oblivious inference on the standard DL benchmarks can be performed with minimal, if any, decrease in the prediction accuracy.

We emphasize that an effective solution for oblivious inference should take into account the deep learning algorithms and optimization methods that can tailor the DL model for the security protocol. Current DL models are designed to run on CPU/GPU platforms where many multiplications can be performed with high throughput, whereas bit-level operations are very inefficient. In the GC protocol, however, bit-level operations are inexpensive, but multiplications are rather costly. As such, we propose to train deep neural networks that involve many bit-level operations but no multiplications in the inference phase; using the idea of learning binary networks, we achieve an average of 21× reduction in the number of gates for the GC protocol.

We perform extensive evaluations on different datasets. Compared to the Gazelle [10] (the prior best solution) and MiniONN [9] frameworks, we achieve 7× and 93× lower inference latency, respectively. XONN outperforms DeepSecure [13] (the prior best GC-based framework) by 60× and CryptoNets [14], an HE-based framework, by 1859×. Moreover, our solution requires a constant round of interactions between the client and the server, which has a significant effect on the performance of oblivious inference in Internet settings. We highlight our contributions as follows:

• Introduction of XONN, the first framework for privacy-preserving DNN inference with a constant round complexity that does not need expensive matrix multiplications. Our solution is the first that can be scalably adapted to ensure security against malicious adversaries.

• Proposing a novel conditional addition protocol based on Oblivious Transfer (OT) [15], which optimizes the costly computations for the network's input layer. Our protocol is 6× faster than GC and can be of independent interest. We also devise a novel network trimming algorithm to remove neurons from DNNs that minimally contribute to the inference accuracy, further reducing the GC complexity.

• Designing a high-level API to readily automate fast adaptation of XONN, such that users only input a high-level description of the neural network. We further facilitate the usage of our framework by designing a compiler that translates the network description from Keras to XONN.

• Proof-of-concept implementation of XONN and evaluation on various standard deep learning benchmarks. To demonstrate the scalability of XONN, we perform oblivious inference on neural networks with as many as 21 layers for the first time in the oblivious inference literature.

2 Preliminaries
Throughout this paper, scalars are represented as lowercase letters (x ∈ R), vectors are represented as bold lowercase letters (x ∈ R^n), matrices are denoted as capital letters (X ∈ R^(m×n)), and tensors of more than 2 ways are shown using bold capital letters (X ∈ R^(m×n×k)). Brackets denote element selection and the colon symbol stands for all elements; W[i,:] represents all values in the i-th row of W.

2.1 Deep Neural Networks

The computational flow of a deep neural network is composed of multiple computational layers. The input to each layer is either a vector (i.e., x ∈ R^n) or a tensor (i.e., X ∈ R^(m×n×k)). The output of each layer serves as the input of the next layer. The input of the first layer is the raw data and the output of the last layer represents the network's prediction on the given data (i.e., the inference result). In an image classification task, for instance, the raw image serves as the input to the first layer and the output of the last layer is a vector whose elements represent the probability that the image belongs to each category. Below we describe the functionality of neural network layers.

Linear Layers: Linear operations in neural networks are performed in Fully-Connected (FC) and Convolution (CONV) layers. The vector dot product (VDP) between two vectors x ∈ R^n and w ∈ R^n is defined as follows:

VDP(x, w) = ∑_{i=1}^{n} w[i] · x[i].   (1)

Both CONV and FC layers repeat VDP computation to generate outputs as we describe next. A fully connected layer takes a vector x ∈ R^n and generates the output y ∈ R^m using a linear transformation:

y = W · x + b,   (2)

where W ∈ R^(m×n) is the weight matrix and b ∈ R^m is a bias vector. More precisely, the i-th output element is computed as y[i] = VDP(W[i,:], x) + b[i].
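As a concrete illustration of Equations (1) and (2), the following Python sketch (ours, with toy values; not part of the XONN codebase) computes an FC layer as repeated VDP calls:

```python
import numpy as np

def vdp(x, w):
    # Vector dot product, Equation (1): sum of w[i] * x[i].
    return sum(w[i] * x[i] for i in range(len(x)))

def fc_layer(W, x, b):
    # Fully-connected layer, Equation (2): y[i] = VDP(W[i,:], x) + b[i].
    return np.array([vdp(x, W[i, :]) + b[i] for i in range(W.shape[0])])

W = np.array([[1., -2., 3.], [0., 1., -1.]])  # weight matrix (m x n)
x = np.array([2., 1., 0.])                    # input vector (n)
b = np.array([0.5, -0.5])                     # bias vector (m)
assert np.allclose(fc_layer(W, x, b), W @ x + b)
```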
A convolution layer is another form of linear transformation that operates on images. The input of a CONV layer is represented as multiple rectangular channels (2D images) of the same size: X ∈ R^(h1×h2×c), where h1 and h2 are the dimensions of the image and c is the number of channels. The CONV layer maps the input image into an output image Y ∈ R^(h1′×h2′×f). A CONV layer consists of a weight tensor W ∈ R^(k×k×c×f) and a bias vector b ∈ R^f. The i-th output channel in a CONV layer is computed by sliding the kernel W[:,:,:,i] ∈ R^(k×k×c) over the input, computing the dot product between the kernel and the windowed input, and adding the bias term b[i] to the result.

Non-linear Activations: The output of linear transformations (i.e., CONV and FC) is usually fed to an activation layer, which applies an element-wise non-linear transformation to the vector/tensor and generates an output with the same dimensionality. In this paper, we particularly utilize the Binary Activation (BA) function for hidden layers. BA maps the input operand to its sign value (i.e., +1 or −1).

Batch Normalization: A batch normalization (BN) layer is typically applied to the output of linear layers to normalize the results. If a BN layer is applied to the output of a CONV layer, it multiplies all of the i-th channel's elements by a scalar γ[i] and adds a bias term β[i] to the resulting channel. If BN is applied to the output of an FC layer, it multiplies the i-th element of the vector by a scalar γ[i] and adds a bias term β[i] to the result.

Pooling: Pooling layers operate on image channels outputted by the CONV layers. A pooling layer slides a window on the image channels and aggregates the elements within the window into a single output element. Max-pooling and Average-pooling are two of the most common pooling operations in neural networks. Typically, pooling layers reduce the image size but do not affect the number of channels.

2.2 Secret Sharing

A secret can be securely shared among two or multiple parties using Secret Sharing (SS) schemes. An SS scheme guarantees that each share does not reveal any information about the secret. The secret can be reconstructed using all (or a subset) of the shares. In XONN, we use additive secret sharing in which a secret S is shared among two parties by sampling a random number S1 ∈R Z_(2^b) (integers modulo 2^b) as the first share and creating the second share as S2 = S − S1 mod 2^b, where b is the number of bits needed to describe the secret. While none of the shares reveal any information about the secret S, they can be used to reconstruct the secret as S = S1 + S2 mod 2^b. Suppose that two secrets S^(1) and S^(2) are shared among two parties where party-1 has S1^(1) and S1^(2) and party-2 has S2^(1) and S2^(2). Party-i can create a share of the sum of the two secrets as Si^(1) + Si^(2) mod 2^b without communicating with the other party. This can be generalized to an arbitrary (more than two) number of secrets as well. We utilize additive secret sharing in our Oblivious Conditional Addition (OCA) protocol (Section 3.3).
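A minimal Python sketch of the additive sharing described above (illustrative only; the parameters are ours) shows both the sharing itself and the local, communication-free addition of shares:

```python
import secrets

B = 32
MOD = 1 << B

def share(secret):
    # Additive secret sharing in the ring Z_{2^B}.
    s1 = secrets.randbelow(MOD)      # random first share
    s2 = (secret - s1) % MOD         # second share: S - S1 mod 2^B
    return s1, s2

def reconstruct(s1, s2):
    return (s1 + s2) % MOD

# Each share alone is uniformly random; together they reconstruct S.
s1, s2 = share(42)
assert reconstruct(s1, s2) == 42

# Shares of a sum are computed locally, without any communication.
a1, a2 = share(25)
b1, b2 = share(17)
assert reconstruct((a1 + b1) % MOD, (a2 + b2) % MOD) == 25 + 17
```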

2.3 Oblivious Transfer

One of the most crucial building blocks of secure computation protocols, e.g., GC, is the Oblivious Transfer (OT) protocol [15]. In OT, two parties are involved: a sender and a receiver. The sender holds n different messages m_j, j = 1...n, with a specific bit-length, and the receiver holds an index (ind) of a message that she wants to receive. At the end of the protocol, the receiver gets m_ind with no additional knowledge about the other messages, and the sender learns nothing about the selection index. In GC, 1-out-of-2 OT is used where n = 2, in which case the selection index is only one bit. The initial realizations of OT required costly public-key encryptions for each run of the protocol. However, the OT Extension technique [16, 17, 18] enables performing OT using more efficient symmetric-key encryption in conjunction with a fixed number of base OTs that need public-key encryption. OT is used both in the OCA protocol as well as the Garbled Circuits protocol, which we discuss next.

2.4 Garbled Circuits

Yao's Garbled Circuits [4], or GC in short, is one of the generic two-party secure computation protocols. In GC, the result of an arbitrary function f(.) on inputs from two parties can be computed without revealing each party's input to the other. Before executing the protocol, function f(.) has to be described as a Boolean circuit with two-input gates.

GC has three main phases: garbling, transferring data, and evaluation. In the first phase, only one party, the Garbler, is involved. The Garbler starts by assigning two randomly generated l-bit binary strings to each wire in the circuit. These binary strings are called labels and they represent semantic values 0 and 1. Let us denote the label of wire w corresponding to the semantic value x as L_x^w. For each gate in the circuit, the Garbler creates a four-row garbled table as follows. Each label of the output wire is encrypted using the input labels according to the truth table of the gate. For example, consider an AND gate with input wires a and b and output wire c. The last row of the garbled table is the encryption of L_1^c using labels L_1^a and L_1^b.

Once the garbling process is finished, the Garbler sends all of the garbled tables to the Evaluator. Moreover, he sends the correct labels that correspond to the input wires that represent his inputs to the circuit. For example, if wire w* is the first input bit of the Garbler and his input is 0, he sends L_0^(w*). The Evaluator acquires the labels corresponding to her input through 1-out-of-2 OT, where the Garbler is the sender with two labels as his messages and the Evaluator's selection bit is her input for that wire. Having all of the garbled tables and labels of input wires, the Evaluator can start decrypting the garbled tables one by one until reaching the final output bits. She then learns the plaintext result at the end of the GC protocol based on the output labels and their relationships to the semantic values that are received from the Garbler.
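To make the garbling phase concrete, the toy Python sketch below garbles a single AND gate. It is a deliberately simplified illustration of the idea (hash-based one-time pads instead of a proper garbling scheme, no row shuffling or point-and-permute, and no security guarantees), not XONN's actual implementation:

```python
import os, hashlib

def enc(key_a, key_b, payload):
    # Toy "encryption": XOR the payload with a hash of the two input labels.
    pad = hashlib.sha256(key_a + key_b).digest()[:16]
    return bytes(p ^ q for p, q in zip(pad, payload))

# Garbler: two random 128-bit labels per wire, for semantic values 0 and 1.
labels = {w: (os.urandom(16), os.urandom(16)) for w in "abc"}

# Garbled table for c = a AND b: each row encrypts the correct output
# label under the corresponding pair of input labels.
table = [enc(labels["a"][x], labels["b"][y], labels["c"][x & y])
         for x in (0, 1) for y in (0, 1)]

# Evaluator: holds exactly one label per input wire (here a=1, b=1) and
# recovers only the matching output label (the XOR pad is symmetric).
out_label = enc(labels["a"][1], labels["b"][1], table[3])
assert out_label == labels["c"][1]
```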
3 The XONN Framework

In this section, we explain how neural networks can be trained such that they incur a minimal cost during the oblivious inference. The most computationally intensive operation in a neural network is matrix multiplication. In GC, each multiplication has a quadratic computation and communication cost with respect to the input bit-length. This is the major source of inefficiency in prior work [13]. We overcome this limitation by changing the learning process such that the trained neural network's weights become binary. As a result, costly multiplication operations are replaced with XNOR gates, which are essentially free in GC. We describe the training process in Section 3.1. In Section 3.2, we explain the operations and their corresponding Boolean circuit designs that enable a very fast oblivious inference. In Section 4, we elaborate on the XONN implementation.

3.1 Customized Network Binarization

Numerical optimization algorithms minimize a specific cost function associated with neural networks. It is well-known that neural network training is a non-convex optimization, meaning that there exist many locally-optimum parameter configurations that result in similar inference accuracies. Among these parameter settings, there exist solutions where both neural network parameters and activation units are restricted to take binary values (i.e., either +1 or −1); these solutions are known as Binary Neural Networks (BNNs) [19].

One major shortcoming of BNNs is their (often) low inference accuracy. In the machine learning community, several methods have been proposed to modify BNN functionality for accuracy enhancement [20, 21, 22]. These methods are devised for plaintext execution of BNNs and are not efficient for oblivious inference with GC. We emphasize that, when modifying BNNs for accuracy enhancement, one should also take into account the implications in the corresponding GC circuit. With this in mind, we propose to modify the number of channels and neurons in CONV and FC layers, respectively. Increasing the number of channels/neurons leads to a higher accuracy but it also increases the complexity of the corresponding GC circuit. As a result, XONN provides a trade-off between the accuracy and the communication/runtime of the oblivious inference. This trade-off enables cloud servers to customize the complexity of the GC protocol to optimally match the computation and communication requirements of the clients. To customize the BNN, XONN configures the per-layer number of neurons in two steps:

• Linear Scaling: Prior to training, we scale the number of channels/neurons in all BNN layers with the same factor (s), e.g., s = 2. Then, we train the scaled BNN architecture.

• Network Trimming: Once the (uniformly) scaled network is trained, a post-processing algorithm removes redundant channels/neurons from each hidden layer to reduce the GC cost while maintaining the inference accuracy.

Figure 1 illustrates the BNN customization method for an example baseline network with four hidden layers.

Figure 1: Illustration of BNN customization. The bars represent the number of neurons in each hidden layer.

Network trimming (pruning) consists of two steps, namely, Feature Ranking and Iterative Pruning, which we describe next.

Feature Ranking: In order to perform network trimming, one needs to sort the channels/neurons of each layer based on their contribution to the inference accuracy. In conventional neural networks, simple ranking methods sort features based on the absolute value of the neurons/channels [23]. In BNNs, however, the weights/features are either +1 or −1 and the absolute value is not informative. To overcome this issue, we utilize a first-order Taylor approximation of neural networks and sort the features based on the magnitude of the gradient values [24]. Intuitively, the gradient with respect to a certain feature determines its importance; a high (absolute) gradient indicates that removing the neuron has a destructive effect on the inference accuracy. Inspired by this notion, we develop the feature ranking method described in Algorithm 1.

Algorithm 1 XONN Channel Sorting for CONV Layers

Inputs: Trained BNN with loss function L, CONV layer l with output Y of shape h1 × h2 × f, subsampled validation data and labels dataV = {(X1, z1), ..., (Xs, zs)}
Output: Indices of the sorted channels: {i0, ..., if}

1: G ← zeros(h1 × h2 × f) ◃ define gradient tensor
2: for i = 1, ..., s do
3:   L ← loss(BNN(Xi), zi) ◃ evaluate loss function
4:   ∇Y L ← ∂L/∂Y ◃ compute gradient w.r.t. layer output
5:   G ← G + abs(∇Y L) ◃ accumulate elementwise absolute values
6: end for
7: for j = 1, ..., f do
8:   g[j] ← sum(G[:,:, j]) ◃ sum of absolute gradients per channel
9: end for
10: {i0, ..., if} ← sort(g) ◃ sort channel indices by importance
11: return {i0, ..., if}

Iterative Pruning: We devise a step-by-step algorithm for model pruning which is summarized in Algorithm 2. At each step, the algorithm selects one of the BNN layers l* and removes the first p* features with the lowest importance (line 17). The selected layer l* and the number of pruned neurons p* maximize the following reward (line 15):

reward(l, p) = (c_curr − c_next) / e^(a_curr − a_next),   (3)

where c_curr and c_next are the GC complexity of the BNN before and after pruning, whereas a_curr and a_next denote the corresponding validation accuracies. The numerator of this reward encourages a higher reduction in the GC cost while the denominator penalizes the accuracy loss. Once the layer is pruned, the BNN is fine-tuned to recover the accuracy (line 18). The pruning process stops once the accuracy drops below a pre-defined threshold.
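Before turning to Algorithm 2, a compact sketch of the channel ranking in Algorithm 1 is shown below (illustrative Python; the per-sample gradients are assumed to be precomputed by any autograd framework):

```python
import numpy as np

def rank_channels(grads):
    # grads: list of s arrays of shape (h1, h2, f), each holding dL/dY
    # for one validation sample (lines 2-6 of Algorithm 1).
    G = sum(np.abs(g) for g in grads)     # accumulate |dL/dY|
    g = G.sum(axis=(0, 1))                # per-channel sum (lines 7-9)
    return np.argsort(g)                  # least important channels first

# Toy example: 3 validation samples, 4x4 feature maps, 5 channels.
rng = np.random.default_rng(1)
grads = [rng.standard_normal((4, 4, 5)) for _ in range(3)]
print(rank_channels(grads))
```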

Algorithm 2 XONN Iterative BNN Pruning

Inputs: Trained BNN with n overall CONV and FC layers, minimum accuracy threshold θ, number of pruning trials per layer t, subsampled validation data and labels dataV, training data and labels dataT
Output: BNN with pruned layers

1: p ← zeros(n − 1) ◃ current number of pruned neurons/channels per layer
2: a_curr ← Accuracy(BNN, dataV | p) ◃ current BNN validation accuracy
3: c_curr ← Cost(BNN | p) ◃ current GC cost
4: while a_curr > θ do ◃ repeat until accuracy drops below θ
5:   for l = 1, ..., n − 1 do ◃ search over all layers
6:     inds ← Rank(BNN, l, dataV) ◃ rank features via Algorithm 1
7:     f ← Number of neurons/channels ◃ number of output neurons/channels
8:     for p = p[l], p[l] + f/t, ..., f do ◃ search over possible pruning rates
9:       BNN_next ← Prune(BNN, l, p, inds) ◃ prune p features with lowest ranks from the l-th layer
10:      a_next ← Accuracy(BNN_next, dataV | p[1], ..., p[l] = p, ..., p[n − 1]) ◃ validation accuracy if pruned
11:      c_next ← Cost(BNN_next | p[1], ..., p[l] = p, ..., p[n − 1]) ◃ GC cost if pruned
12:      reward(l, p) ← (c_curr − c_next) / e^(a_curr − a_next) ◃ compute reward given that p features are pruned from layer l
13:    end for
14:  end for
15:  {l*, p*} ← argmax over (l, p) of reward(l, p) ◃ select layer l* and pruning rate p* that maximize the reward
16:  p[l*] ← p* ◃ update the number of pruned features in vector p
17:  BNN ← Prune(BNN, l*, p*, inds) ◃ prune p* features with lowest ranks from the l*-th layer
18:  BNN ← Fine-tune(BNN, dataT) ◃ fine-tune the pruned model using training data to recover accuracy
19:  a_curr ← Accuracy(BNN, dataV | p) ◃ update current BNN validation accuracy
20:  c_curr ← Cost(BNN | p) ◃ update current GC cost
21: end while
22: return BNN
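The selection on lines 8-15 can be paraphrased in a few lines of Python (a sketch with hypothetical cost/accuracy values, not the actual search loop):

```python
import math

def reward(c_curr, c_next, a_curr, a_next):
    # Equation (3): reward GC-cost reduction, penalize accuracy loss.
    return (c_curr - c_next) / math.exp(a_curr - a_next)

c_curr, a_curr = 100.0, 0.976     # current GC cost and validation accuracy
# Candidate prunings: (GC cost after pruning, validation accuracy after).
candidates = [(90.0, 0.975), (70.0, 0.968), (50.0, 0.940)]
best = max(candidates, key=lambda c: reward(c_curr, c[0], a_curr, c[1]))
print(best)  # the candidate that maximizes Equation (3)
```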

3.2 Oblivious Inference

BNNs are trained such that the weights and activations are binarized, i.e., they can only have two possible values: +1 or −1. This property allows BNN layers to be rendered using a simplified arithmetic. In this section, we describe the functionality of the different layer types in BNNs and their Boolean circuit translations. Below, we explain each layer type.

Binary Linear Layer: Most of the computational complexity of neural networks is due to the linear operations in CONV and FC layers. As we discuss in Section 2.1, linear operations are realized using vector dot product (VDP). In BNNs, VDP operations can be implemented using simplified circuits. We categorize the VDP operations of this work into two classes: (i) Integer-VDP where only one of the vectors is binarized and the other has integer elements, and (ii) Binary-VDP where both vectors have binary (±1) values.

Integer-VDP: For the first layer of the neural network, the server has no control over the input data, which is not necessarily binarized. The server can only train binary weights and use them for oblivious inference. Consider an input vector x ∈ R^n with integer (possibly fixed-point) elements and a weight vector w ∈ {−1, 1}^n with binary values. Since the elements of the binary vector can only take +1 or −1, the Integer-VDP can be rendered using additions and subtractions. In particular, the binary weights can be used in a selection circuit that decides whether the pertinent integer input should be added to or subtracted from the VDP result.

Binary-VDP: Consider a dot product between two binary vectors x ∈ {−1, +1}^n and w ∈ {−1, +1}^n. If we encode each element with one bit (i.e., −1 → 0 and +1 → 1), we obtain binary vectors xb ∈ {0, 1}^n and wb ∈ {0, 1}^n. It has been shown that the dot product of x and w can be efficiently computed using an XnorPopcount operation [19]. Figure 2 depicts the equivalence of VDP(x, w) and XnorPopcount(xb, wb) for a VDP between 4-dimensional vectors. First, element-wise XNOR operations are performed between the two binary encodings. Next, the number of set bits p is counted, and the output is computed as 2p − n.

Figure 2: Equivalence of Binary-VDP and XnorPopcount.

Binary Activation Function: A Binary Activation (BA) function takes input x and maps it to y = Sign(x), where Sign(·) outputs either +1 or −1 based on the sign of its input. This functionality can simply be implemented by extracting the most significant bit of x.

Binary Batch Normalization: In BNNs, it is often useful to normalize feature x using a Batch Normalization (BN) layer before applying the binary activation function. More specifically, a BN layer followed by a BA is equivalent to:

y = Sign(γ · x + β) = Sign(x + β/γ),

since γ is a positive value. The combination of the two layers (BN+BA) is realized by a comparison between x and −β/γ.

Binary Max-Pooling: Assuming the inputs to the max-pooling layers are binarized, taking the maximum in a window is equivalent to performing logical OR over the binary encodings, as depicted in Figure 3. Note that average-pooling layers are usually not used in BNNs since the average of multiple binary elements is no longer a binary value.

Figure 3: The equivalence between Max-Pooling and Boolean-OR operations in BNNs.

Figure 4 demonstrates the Boolean circuit for Binary-VDP followed by BN and BA. The number of non-XOR gates for Binary-VDP is equal to the number of gates required to render the tree-adder structure in Figure 4. Similarly, Figure 5 shows the Integer-VDP counterpart. In the first level of the tree-adder of Integer-VDP (Figure 5), the binary weights determine whether the integer input should be added to or subtracted from the final result within the "Select" circuit. The next levels of the tree-adder compute the result of the Integer-VDP using "Adder" blocks. The combination of BN and BA is implemented using a single comparator. Compared to Binary-VDP, Integer-VDP has a high garbling cost which is linear with respect to the number of bits. To mitigate this problem, we propose an alternative solution based on Oblivious Transfer (OT) in Section 3.3.
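The three binary operators above can be checked against their plaintext counterparts with a short Python sketch (ours, for illustration only):

```python
import numpy as np

def xnor_popcount(xb, wb):
    # Binary-VDP: with -1 -> 0 and +1 -> 1, dot(x, w) = 2p - n where
    # p is the popcount of XNOR(xb, wb).
    n = len(xb)
    p = int((1 - (xb ^ wb)).sum())       # XNOR, then count set bits
    return 2 * p - n

def bn_ba(x, gamma, beta):
    # BN followed by BA reduces to a comparison with -beta/gamma
    # (gamma is positive): Sign(gamma*x + beta) = Sign(x + beta/gamma).
    return 1 if x >= -beta / gamma else -1

def binary_maxpool(window_bits):
    # Max over {-1,+1} values encoded as bits is a logical OR.
    return int(np.bitwise_or.reduce(window_bits))

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=8)
w = rng.choice([-1, 1], size=8)
xb, wb = (x + 1) // 2, (w + 1) // 2
assert xnor_popcount(xb, wb) == int(np.dot(x, w))
assert bn_ba(-0.2, gamma=2.0, beta=1.0) == +1    # -0.2 >= -0.5
assert binary_maxpool(np.array([0, 1, 0, 0])) == 1
```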

Figure 4: Circuit for Binary-VDP followed by comparison for batch normalization (BN) and binary activation (BA).

Figure 5: Circuit for Integer-VDP followed by comparison for batch normalization (BN) and binary activation (BA).

3.3 Oblivious Conditional Addition Protocol

In XONN, all of the activation values as well as the neural network weights are binary. However, the input to the neural network is provided by the user and is not necessarily binary. The first layer of a typical neural network comprises either an FC or a CONV layer, both of which are evaluated using oblivious Integer-VDP. On the one side, the user provides her input as non-binary (integer) values. On the other side, the network parameters are binary values representing −1 and 1. We now demonstrate how Integer-VDP can be described as an OT problem. Let us denote the user's input as a vector v1 of n (b-bit) integers. The server holds a vector of n binary values denoted by v2. The result of the Integer-VDP is a number "y" that can be described with

b′ = ⌈log2(n · (2^b − 1))⌉

bits. Figure 6 summarizes the steps in the OCA protocol. The first step is to bit-extend v1 from b-bit to b′-bit. In other words, if v1 is a vector of signed integer/fixed-point numbers, the most significant bit should be repeated (b′ − b)-many times; otherwise, it has to be zero-padded for the most significant bits. We denote the bit-extended vector by v1*. The second step is to create the two's complement vector of v1*, called v̄1*. The client also creates a vector of n (b′-bit) randomly generated numbers, denoted as r. She computes the element-wise vector subtractions v1* − r mod 2^b′ and v̄1* − r mod 2^b′. These two vectors are n-many pairs of messages that will be used as input to n-many 1-out-of-two OTs. More precisely, v1* − r mod 2^b′ is the list of first messages and v̄1* − r mod 2^b′ is the list of second messages. The server's list of selection bits is v2. After the n-many OTs are finished, the server has a list of n transferred numbers called vt, where

vt[i] = v1*[i] − r[i] mod 2^b′ if v2[i] = 0, and vt[i] = v̄1*[i] − r[i] mod 2^b′ if v2[i] = 1, for i = 1, ..., n.

Finally, the client computes y1 = ∑_{i=1}^{n} r[i] mod 2^b′ and the server computes y2 = ∑_{i=1}^{n} vt[i] mod 2^b′. By OT's definition, the receiver (server) gets only one of the two messages from the sender. That is, based on each selection bit (a binary weight), the receiver gets an additive share of either the sender's number or its two's complement. Upon adding all of the received numbers, the receiver computes an additive share of the Integer-VDP result. Now, even though the sender does not know which messages were selected by the receiver, she can add all of the randomly generated numbers r[i], which is equal to the other additive share of the Integer-VDP result. Since all numbers are described in the two's complement format, subtractions are equivalent to the addition of the two's complement values, which are created by the sender at the beginning of OCA. Moreover, it is possible that as we accumulate the values, the bit-length of the final Integer-VDP result grows accordingly. This is supported due to the bit-extension process at the beginning of the protocol. In other words, all additions are performed in a larger ring such that the result does not overflow.

Note that all numbers belong to the ring Z_(2^b′) and, by definition, a ring is closed under addition; therefore, y1 and y2 are true additive shares of y = y1 + y2 mod 2^b′. We described the OCA protocol for one Integer-VDP computation. As we outlined in Section 3.2, all linear operations in the first layer of the DL model (either FC or CONV) can be formulated as a series of Integer-VDPs.
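The arithmetic of the protocol can be checked with a plain (insecure) Python simulation in which both parties' values are visible in one process; the ring size follows the bit-extension argument above, with the extra safety bit and all names being our choices:

```python
import numpy as np

def oca_simulation(v1, v2, b=8, seed=0):
    # v1: client's n b-bit integers; v2: server's selection bits
    # (0 encodes weight +1, 1 encodes weight -1).
    n = len(v1)
    bp = b + int(np.ceil(np.log2(n))) + 1   # enlarged ring Z_{2^b'}
    mod = 1 << bp
    rng = np.random.default_rng(seed)

    # Client (sender): mask the bit-extended inputs and their negations.
    r = rng.integers(0, mod, size=n)
    m1 = (v1 - r) % mod          # first messages  (selection bit 0)
    m2 = (-v1 - r) % mod         # two's complement messages (bit 1)

    # Server (receiver): one message per OT; r itself is never seen.
    vt = np.where(v2 == 0, m1, m2)

    y1 = int(r.sum() % mod)      # client's additive share
    y2 = int(vt.sum() % mod)     # server's additive share
    return (y1 + y2) % mod       # ADD layer: reconstruct y

v1 = np.array([3, -2, 7, 1])
v2 = np.array([0, 1, 1, 0])      # weights (+1, -1, -1, +1)
mod = 1 << (8 + 2 + 1)
assert oca_simulation(v1, v2) == (3 + 2 - 7 + 1) % mod
```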

Sender (client):
(1) Bit-extends all elements of v1 and creates v1*
(2) Creates the two's complement of v1*: v̄1*
(3) Creates a random vector r of the same size as v1*
(4) Creates the list of first messages as m1 = v1* − r mod 2^b′
(5) Creates the list of second messages as m2 = v̄1* − r mod 2^b′
Sender & Receiver:
(6) Parties engage in Oblivious Transfer (OT): the Sender puts m1 and m2 as message vectors and the Receiver puts the v2 vector as selection bits
Receiver (server):
(7) Gets vector vt where vt[i] = v1*[i] − r[i] mod 2^b′ (if v2[i] = 0) or vt[i] = v̄1*[i] − r[i] mod 2^b′ (if v2[i] = 1)
Sender:
(8) Computes her additive share of the VDP result as y1 = ∑_{i=1}^{n} r[i] mod 2^b′
Receiver:
(9) Computes his additive share of the VDP result as y2 = ∑_{i=1}^{n} vt[i] mod 2^b′

Figure 6: Oblivious Conditional Addition (OCA) protocol.

In traditional OT, public-key encryption is needed for each OT invocation, which can be computationally expensive. Thanks to the Oblivious Transfer Extension technique [16, 17, 18], one can perform many OTs using symmetric-key encryption and only a fixed number of public-key operations.

Required Modification to the Next Layer. So far, we have shown how to perform Integer-VDP using OT. However, we need to add an "addition" layer to reconstruct the true value of y from its additive shares before further processing it. The overhead of this layer, as well as the OT computations, are discussed next. Note that OCA is used only for the first layer and it does not change the overall constant round complexity of XONN since it is performed only once regardless of the number of layers in the DL model.

Comparison to Integer-VDP in GC. Table 1 shows the computation and communication costs for two approaches: (i) computing the first layer in GC and (ii) utilizing OCA. OCA removes the GC cost of the first layer in XONN. However, it adds the overhead of a set of OTs and the GC costs associated with the new ADD layer.

Table 1: Computation and communication cost of OCA.

                     GC                      OCA
Costs                {Sender, Receiver}      OT            ADD Layer
Comp. (AES ops)      (n + 1) · b · {2, 4}    n · {1, 2}    b′ · {2, 4}
Comm. (bit)          (n + 1) · b · 2 · 128   n · b′        b′ · 2 · 128

3.4 Security of XONN

We consider the Honest-but-Curious (HbC) adversary model consistent with all of the state-of-the-art solutions for oblivious inference [7, 8, 9, 10, 13, 25]. In this model, neither of the involved parties is trusted but they are assumed to follow the protocol. Both server and client cannot infer any information about the other party's input from the entire protocol transcript. XONN relies solely on the GC and OT protocols, both of which are proven to be secure in the HbC adversary model in [26] and [15], respectively. Utilizing binary neural networks does not affect the GC and OT protocols in any way. More precisely, we have changed the function f(.) that is evaluated in GC such that it is more efficient for the GC protocol: drastically reducing the number of AND gates and using XOR gates instead. Our novel Oblivious Conditional Addition (OCA) protocol (Section 3.3) is also based on the OT protocol. The sender creates a list of message pairs and puts them as input to the OT protocol. Each message is an additive share of the sender's private data from which the secret data cannot be reconstructed. The receiver puts a list of selection bits as input to the OT. By OT's definition, the receiver learns nothing about the unselected messages and the sender does not learn the selection bits.

During the past few years, several attacks have been proposed that extract some information about the DL model by querying the server many times [1, 27, 28]. It has been shown that some of these attacks can be effective in the black-box setting where the client only receives the prediction results and does not have access to the model. Therefore, considering the definition of oblivious inference, these types of attacks are out of the scope of oblivious inference frameworks. However, in Appendix B, we show how these attacks can be thwarted by adding a simple layer at the end of the neural network which adds a negligible overhead.

Security Against Malicious Adversaries. The HbC adversary model is the standard security model in the literature. However, there are more powerful security models such as security against covert and malicious adversaries. In the malicious security model, the adversary (either the client or the server) can deviate from the protocol at any time with the goal of learning more about the input of the other party. One of the main distinctions between XONN and the state-of-the-art solutions is that XONN can be automatically adapted to the malicious security model using cut-and-choose techniques [29, 30, 31]. These methods take a GC protocol in HbC and readily extend it to the malicious security model. This modification increases the overhead but enables a higher security level. To the best of our knowledge, there is no practical solution to extend the customized mixed-protocol frameworks [7, 9, 10, 25] to the malicious security model. Our GC-based solution is more efficient compared to the mixed-protocol solutions and can be upgraded to malicious security at the same time.

4 The XONN Implementation

In this section, we elaborate on the garbling/evaluation implementation of XONN. All of the optimizations and techniques proposed in this section do not change the security or correctness in any way and only enable the framework's scalability for large network architectures.

We design a new GC framework with the following design principles in mind: (i) Efficiency: XONN is designed to have a minimal data movement and low cache-miss rate. (ii) Scalability: oblivious inference inevitably requires significantly higher memory usage compared to plaintext evaluation of neural networks. High memory usage is one critical shortcoming of state-of-the-art secure computation frameworks. As we show in our experimental results, XONN is designed to scale for very deep neural networks that have higher accuracy compared to networks considered in prior art. (iii) Modularity: our framework enables users to create Boolean descriptions of different layers separately. This allows the hardware synthesis tool to generate more optimized circuits, as we discuss in Section 4.1. (iv) Ease-of-use: XONN provides a very simple API that requires few lines of neural network description. Moreover, we have created a compiler that takes a Keras description and automatically creates the network description for the XONN API.

XONN is written in C++ and supports all major GC optimizations proposed previously. Since the introduction of GC, many optimizations have been proposed to reduce the computation and communication complexity of this protocol. Bellare et al. [32] have provided a way to perform garbling using efficient fixed-key AES encryption. Our implementation benefits from this optimization by using Intel AES-NI instructions. The row-reduction technique [33] reduces the number of rows in the garbled tables from four to three. The Half-Gates technique [34] further reduces the number of rows in the garbled tables from three to two. One of the most influential optimizations for the GC protocol is the free-XOR technique [12], which makes XOR, XNOR, and NOT almost free of cost. Our implementation of Oblivious Transfer (OT) is based on libOTe [35].

4.1 Modular Circuit Synthesis and Garbling

In XONN, each layer is described as multiple invocations of a base circuit. For instance, linear layers (CONV and FC) are described by a VDP circuit. MaxPool is described by an OR circuit where the number of inputs is the window size of the MaxPool layer. BA/BN layers are described using a comparison (CMP) circuit. The memory footprint is significantly reduced in this approach: we only create and store the base circuits. As a result, the connection between two invocations of two different base circuits is handled at the software level.

We create the Boolean circuits using the TinyGarble [36] hardware synthesis approach. TinyGarble's technology libraries are optimized for GC and produce circuits that have a low number of non-XOR gates. Note that the Boolean circuit description of contemporary neural networks comprises between millions and billions of Boolean gates, whereas synthesis tools cannot support circuits of this size. However, due to XONN's modular design, one can synthesize each base circuit separately. Thus, the bottleneck transfers from the synthesis tool's maximum number of gates to the system's memory. As such, XONN effectively scales for any neural network complexity regardless of the limitations of the synthesis tool, as long as enough memory (i.e., RAM) is available. Later in this section, we discuss how to increase the scalability by dynamically managing the allocated memory.

Figure 7: XONN modular and pipelined garbling engine.

Pipelined GC Engine. In XONN, computation and communication are pipelined. For instance, consider a CONV layer followed by an activation layer.
We garble/evaluate these layers by multiple invocations of the VDP and CMP circuits (one invocation per output neuron) as illustrated in Figure 7. Upon finishing the garbling process of layer L − 1, the Garbler starts garbling the L-th layer and creates the random labels for the output wires of layer L. He also needs to create the random labels associated with his input (i.e., the weight parameters) to layer L. Given a set of input and output labels, the Garbler generates the garbled tables and sends them to the Evaluator as soon as each one is ready. He also sends one of the two input labels for his input bits. At the same time, the Evaluator has computed the output labels of the (L − 1)-th layer. She receives the garbled tables as well as the Garbler's selected input labels, decrypts the tables, and stores the output labels of layer L.

Dynamic Memory Management. We design the framework such that the allocated memory for the labels is released as soon as it is no longer needed, reducing the memory usage significantly. For example, without our dynamic memory management, the Garbler had to allocate 10.41GB for the labels and garbled tables for the entire garbling of the BC1 network (see Section 7 for the network description). In contrast, in our framework, the size of the memory allocation never exceeds 2GB and is less than 0.5GB for most of the layers.

4.2 Application Programming Interface (API)

XONN provides a simplified and easy-to-use API for oblivious inference. The framework accepts a high-level description of the network, parameters of each layer, and the input structure. It automatically computes the number of invocations and the interconnection between all of the base circuits. Figure 8 shows the complete network description that a user needs to write for a sample network architecture (the BM3 architecture, see Section 7). All of the required circuits are automatically generated using the TinyGarble [36] synthesis libraries. It is worth mentioning that for the task of oblivious inference, our API is much simpler compared to the recent high-level EzPC framework [25]. For example, the required lines of code to describe the BM1, BM2, and BM3 network architectures (see Section 7) in EzPC are 78, 88, and 154, respectively. In contrast, they can be described with only 6, 6, and 10 lines of code in our framework.

1 INPUT 28 1 8
2 CONV 5 16 1 0 OCA
3 ACT
4 MAXPOOL 2
5 CONV 5 16 1 0
6 ACT
7 MAXPOOL 2
8 FC 100
9 ACT
10 FC 10

Description:
INPUT #input_feature #channels #bit-length
CONV #filter_size #filters #stride #Pad #OCA (optional)
MAXPOOL #window_size
FC #output_neurons

Figure 8: Sample snippet code in XONN.

Keras to XONN Translation. To further facilitate the adaptation of XONN, a compiler is created to translate the description of the neural network in Keras [37] to the XONN format. The compiler creates the .xonn file and puts the network parameters into the required format (HEX string) to be read by the framework during the execution of the GC protocol. All of the parameter adjustments are also automatically performed by the compiler.
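For reference, an approximate Keras counterpart of the Figure 8 description is sketched below. This is our illustration, not the paper's code: the binary activation is replaced by a tanh placeholder (standard Keras has no built-in sign activation with a straight-through estimator), and the actual BNN training setup is not reproduced.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Keras counterpart of the Figure 8 description; a compiler pass would
# then emit the .xonn file and the HEX-encoded parameters.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),                   # INPUT 28 1 8
    layers.Conv2D(16, 5, strides=1, padding="valid"),  # CONV 5 16 1 0 (OCA)
    layers.Activation("tanh"),                         # stand-in for binary ACT
    layers.MaxPooling2D(2),                            # MAXPOOL 2
    layers.Conv2D(16, 5, strides=1, padding="valid"),  # CONV 5 16 1 0
    layers.Activation("tanh"),
    layers.MaxPooling2D(2),                            # MAXPOOL 2
    layers.Flatten(),
    layers.Dense(100),                                 # FC 100
    layers.Activation("tanh"),
    layers.Dense(10),                                  # FC 10
])
model.summary()
```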
5 Related Work

CryptoNets [14] is one of the early solutions that suggested the adaptation of Leveled Homomorphic Encryption (LHE) to perform oblivious inference. LHE is a variant of Partially HE that enables evaluation of depth-bounded arithmetic circuits. DeepSecure [13] is a privacy-preserving DL framework that relies on the GC protocol. CryptoDL [38] improves upon CryptoNets [14] and proposes a more efficient approximation of the non-linear functions using low-degree polynomials. Their solution is based on LHE and uses mean-pooling as a replacement for the max-pooling layer. Chou et al. propose to utilize the sparsity within the DL model to accelerate the inference [39].

SecureML [8] is a privacy-preserving machine learning framework based on homomorphic encryption, GC, and secret sharing. SecureML also uses customized activation functions and supports privacy-preserving training in addition to inference. Two non-colluding servers are used to train the DL model where each client XOR-shares her input and sends the shares to both servers. MiniONN [9] is a mixed-protocol framework for oblivious inference. The underlying cryptographic protocols are HE, GC, and secret sharing.

Chameleon [7] is a more recent mixed-protocol framework for machine learning, i.e., Support Vector Machines (SVMs) as well as DNNs. The authors propose to perform low-depth non-linear functions using the Goldreich-Micali-Wigderson (GMW) protocol [5], high-depth functions by the GC protocol, and linear operations using additive secret sharing. Moreover, they propose to use correlated randomness to more efficiently compute linear operations. EzPC [25] is a secure computation framework that enables users to write high-level programs and translates them to a protocol-based description of both Boolean and Arithmetic circuits. The back-end cryptographic engine is based on the ABY framework.

Shokri and Shmatikov [40] proposed a solution for privacy-preserving collaborative deep learning where the training data is distributed among many parties. Their approach, which is based on differential privacy, enables clients to train their local model on their own training data and update the central model's parameters held by a central server. However, it has been shown that a malicious client can learn significant information about the other clients' private data [41]. Google [42] has recently introduced a new approach for securely aggregating the parameter updates from multiple users. However, none of these approaches [40, 42] study the oblivious inference problem. An overview of related frameworks is provided in [43, 44].

Frameworks such as ABY3 [45] and SecureNN [46] have different computation models and rely on three (or four) parties during the oblivious inference. In contrast, XONN does not require an additional server for the computation. In the E2DM framework [47], the model owner can encrypt and outsource the model to an untrusted server to perform oblivious inference. Concurrently and independently of ours, in TAPAS [48], Sanyal et al. study the binarization of neural networks in the context of oblivious inference. They report an inference latency of 147 seconds on the MNIST dataset with 98.6% prediction accuracy using a custom CNN architecture. However, as we show in Section 7 (BM3 benchmark), XONN outperforms TAPAS by close to three orders of magnitude.

Gazelle [10] is the previously most efficient oblivious inference framework. It is a mixed-protocol approach based on additive HE and GC. In Gazelle, convolution operations are performed using the packing property of HE. In this approach, many numbers are packed inside a single ciphertext for faster convolutions. In Section 6, we briefly discuss one of the essential requirements that the Gazelle protocol has to satisfy in order to be secure, namely, circuit privacy.

High-Level Comparison. In contrast to prior work, we propose a DL-secure computation co-design approach. To the best of our knowledge, DeepSecure [13] is the only solution that preprocesses the data and network before the secure computation protocol. However, this preprocessing step is unrelated to the underlying cryptographic protocol and compacts the network and data. Moreover, in this mode, some information about the network parameters and structure of the data is revealed. Compared to mixed-protocol solutions, XONN not only provides a more efficient solution but also maintains a constant round complexity regardless of the number of layers in the neural network model. It has been shown that round complexity is one of the important criteria in designing secure computation protocols [49] since the performance can be significantly reduced in Internet settings where the network latency is high. Another important advantage of our solution is the ability to upgrade to security against malicious adversaries using cut-and-choose techniques [29, 30, 31]. As we show in Section 7, XONN outperforms all previous solutions in inference latency. Table 2 summarizes a high-level comparison between state-of-the-art oblivious inference frameworks.

Table 2: High-level comparison of oblivious inference frameworks. "C"onstant round complexity. "D"eep learning/secure computation co-design. "I"ndependence of secondary server. "U"pgradeable to malicious security using standard solutions. "S"upporting any non-linear layer.

Framework           Crypto. Protocol   C  D  I  U  S
CryptoNets [14]     HE                 ✓  ✗  ✓  ✗  ✗
DeepSecure [13]     GC                 ✓  ✓  ✓  ✓  ✓
SecureML [8]        HE, GC, SS         ✗  ✗  ✗  ✗  ✗
MiniONN [9]         HE, GC, SS         ✗  ✗  ✓  ✗  ✓
Chameleon [7]       GC, GMW, SS        ✗  ✗  ✗  ✗  ✓
EzPC [25]           GC, SS             ✗  ✗  ✓  ✗  ✓
Gazelle [10]        HE, GC, SS         ✗  ✗  ✓  ✗  ✓
XONN (This work)    GC, SS             ✓  ✓  ✓  ✓  ✓

6 Circuit Privacy

In Gazelle [10], for each linear layer, the protocol starts with a vector m that is secret-shared between the client (m1) and the server (m2), where m = m1 + m2.
The protocol outputs the secret shares of the vector m′ = A · m, where A is a matrix known to the server but not to the client. The protocol has the following procedure: (i) The client generates a pair (pk, sk) of public and secret keys of an additive homomorphic encryption scheme HE. (ii) The client sends HE.Enc_pk(m1) to the server. The server adds its share (m2) to the ciphertext and recovers an encryption of m: HE.Enc_pk(m). (iii) The server homomorphically evaluates the multiplication with A and obtains an encryption of m′. (iv) The server secret-shares m′ by sampling a random vector r and returns the ciphertext c = HE.Enc_pk(m′ − r) to the client. The client can decrypt c using the private key sk and obtain m′ − r.

Gazelle uses the Brakerski-Fan-Vercauteren (BFV) scheme [50, 51]. However, the vanilla BFV scheme does not provide circuit privacy. At a high level, the circuit privacy requirement states that the ciphertext c should not reveal any information about the private inputs to the client (i.e., A and r) other than the underlying plaintext A · m − r. Otherwise, some information is leaked. Gazelle proposes two methods to provide circuit privacy that are not incorporated in their implementation. Hence, we need to scale up their performance numbers for a fair comparison.

The first method is to let the client and server engage in a two-party secure decryption protocol, where the input of the client is sk and the input of the server is c. However, this method adds communication and needs extra rounds of interaction. A more widely used approach is noise flooding. Roughly speaking, the server adds a large noise term to c before returning it to the client. The noise is big enough to drown any extra information contained in the ciphertext, and still small enough that the ciphertext still decrypts to the same plaintext.
Table 2 enough to so that it still decrypts to the same plaintext. summarizes a high-level comparison between state-of-the- For the concrete instantiation of Gazelle, one needs to art oblivious inference frameworks. triple the size of ciphertext modulus q from 60 bits to 180 bits, and increase the ring dimension n from 2048 to 8192. Table 2: High-Level Comparison of oblivious inference The (amortized) complexity of homomorphic operations in frameworks. “C”onstant round complexity. “D”eep learn- the BFV scheme is approximately O(logn logq), with the ing/secure computation co-design. “I”ndependence of sec- exception that some operations run in O(logq) amortized ondary server. “U”pgradeable to malicious security using time. Therefore, adding noise flooding would result in a standard solutions. “S”upporting any non-linear layer. 3-3.6 times slow down for the HE component of Gazelle. To give some concrete examples, we consider two networks Framework Crypto. Protocol C D I U S used for benchmarking in Gazelle: MNIST-D and CIFAR-10 networks. For the MNIST-D network, homomorphic encryp- CryptoNets [14] HE ✓✗✓✗✗ DeepSecure [13] GC ✓✓✓✓✓ tion takes 55% and 22% in online and total time, respec- SecureML [8] HE, GC, SS ✗✗✗✗✗ tively. For CIFAR-10, the corresponding figures are 35%, MiniONN [9] HE, GC, SS ✗✗✓✗✓ and 10%1. Therefore, we estimate that the total time for Chameleon [7] GC, GMW, SS ✗✗✗✗✓ MNIST-D will grow from 0.81s to 1.16-1.27s (network BM3 EzPC [25] GC, SS ✗✗✓✗✓ in this paper). In the case of CIFAR-10 network, the total Gazelle [10] HE, GC, SS ✗✗✓✗✓ time will grow from 12.9s to 15.48-16.25s. XONN (This work) GC, SS ✓✓✓✓✓ 1these percentage numbers are obtained through private communica- tion with the authors. 7 Experimental Results

We evaluate XONN on the MNIST and CIFAR10 datasets, which are two popular classification benchmarks used in prior work. In addition, we provide four healthcare datasets to illustrate the applicability of XONN in real-world scenarios. For training XONN, we use Keras [37] with the Tensorflow backend [52]. The source code of XONN is compiled with GCC 5.5.0 using the O3 optimization. All Boolean circuits are synthesized using Synopsys Design Compiler 2015. Evaluations are performed on (Ubuntu 16.04 LTS) machines with an Intel Core i7-7700k and 32GB of RAM. The experimental setup is comparable to (but has less computational power than) the prior art [10]. Consistent with prior frameworks, we evaluate the benchmarks in the LAN setting.

Figure 9: Effect of scaling factor on (a) accuracy and (b) inference runtime of MNIST networks. No pruning was applied in this evaluation.

7.1 Evaluation on MNIST

There are mainly three network architectures that prior works have implemented for the MNIST dataset. We convert these reference networks into their binary counterparts and train them using the standard BNN training algorithm [19]. Table 3 summarizes the architectures for the MNIST dataset.

Table 3: Summary of the trained binary network architectures evaluated on the MNIST dataset. Detailed descriptions are available in Appendix A.2, Table 13.

Arch. | Previous Papers | Description
BM1 | SecureML [8], MiniONN [9] | 3 FC
BM2 | CryptoNets [14], MiniONN [9], DeepSecure [13], Chameleon [7] | 1 CONV, 2 FC
BM3 | MiniONN [9], EzPC [25] | 2 CONV, 2 MP, 2 FC
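As a reference point for how such binary counterparts can be trained, the following Keras sketch implements the BM1 topology (with s = 1) using weight and activation binarization with a straight-through estimator, in the spirit of [19]. The BinaryDense layer and binarize helper are our illustrative names, not XONN's API; the optimizer keeps real-valued shadow weights, and only the forward pass sees values in {-1, +1}.

    # Minimal BNN training sketch (BM1 topology with s = 1).
    import tensorflow as tf

    @tf.custom_gradient
    def binarize(x):
        def grad(dy):                    # straight-through estimator:
            return dy * tf.cast(tf.abs(x) <= 1.0, x.dtype)
        return tf.sign(x), grad          # forward pass uses {-1, 0, +1} signs

    class BinaryDense(tf.keras.layers.Layer):
        def __init__(self, units):
            super().__init__()
            self.units = units
        def build(self, input_shape):
            self.w = self.add_weight(shape=(int(input_shape[-1]), self.units),
                                     initializer="glorot_uniform")
        def call(self, x):
            return tf.matmul(x, binarize(self.w))   # binarized weights

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        BinaryDense(128), tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Lambda(binarize),           # binary activation (BA)
        BinaryDense(128), tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Lambda(binarize),
        BinaryDense(10), tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Softmax(),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # (x_train, y_train), _ = tf.keras.datasets.mnist.load_data(); train as usual.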

Analysis of Network Scaling: Recall that the classification accuracy of XONN is controlled by scaling the number of neurons in all layers (Section 3.1). Figure 9a depicts the inference accuracy for different scaling factors (more details in Table 11 in Appendix A.2). As we increase the scaling factor, the accuracy of the network increases. This accuracy improvement comes at the cost of a higher computational complexity of the (scaled) network. As a result, increasing the scaling factor leads to a higher runtime. Figure 9b depicts the runtime of the different BNN architectures as a function of the scaling factor s. Note that the runtime grows (almost) quadratically with the scaling factor, due to the quadratic increase in the number of Popcount operations in the neural network (see BM3). However, for the BM1 and BM2 networks, the overall runtime is dominated by the constant initialization cost of the OT protocol (∼70 milliseconds).

Figure 9: Effect of scaling factor on (a) accuracy and (b) inference runtime of MNIST networks. No pruning was applied in this evaluation.

GC Cost and the Effect of OCA: The communication cost of GC is the key contributor to the overall runtime of XONN. Here, we analyze the effect of the scaling factor on the total message size. Figure 10 shows the communication cost of GC for the BM1 and BM2 network architectures. As can be seen, the message size increases with the scaling factor. We also observe that the OCA protocol drastically reduces the message size. This is due to the fact that the first layer of the BM1 and BM2 models accounts for a large portion of the overall computation; hence, improving the first layer with OCA has a drastic effect on the overall communication.

Figure 10: Effect of OCA on the communication of the BM1 (left) and BM2 (right) networks for different scaling factors. No pruning was applied in this evaluation.

Comparison to Prior Art: We emphasize that, unlike previous work, the accuracy of XONN can be customized by tuning the scaling factor (s). Furthermore, our channel/neuron pruning step (Algorithm 2) can reduce the GC cost in a post-processing phase. To provide a fair comparison between XONN and prior art, we choose a proper scaling factor and trim the pertinent scaled BNN such that the resulting BNN achieves the same accuracy as the previous work. Table 4 compares XONN with previous work in terms of accuracy, latency, and communication cost (a.k.a. message size). The last column shows the scaling factor (s) used to increase the width of the hidden layers of the BNN; the scaled network is further trimmed using Algorithm 2.

In XONN, the runtime of the oblivious transfer is at least ∼0.07 seconds for initiating the protocol and then grows linearly with the size of the garbled tables. As a result, in very small architectures such as BM1, our solution is slightly slower than previous works, since this constant runtime dominates the total runtime. However, for the BM3 network, which has higher complexity than BM1 and BM2, XONN achieves a more prominent advantage over prior art. In summary, our solution achieves up to 7.7× faster inference (average of 3.4×) compared to Gazelle [10]. Compared to MiniONN [9], XONN has up to 62× lower latency (average of 26×); see Table 4. Compared to EzPC [25], our framework is 34× faster. XONN achieves 37.5×, 1859×, 60.4×, and 14× better latency compared to SecureML [8], CryptoNets [14], DeepSecure [13], and Chameleon [7], respectively.
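The quadratic growth noted above follows because scaling every layer by s multiplies both the number of neurons and each neuron's fan-in by s, so the number of XNOR-Popcount operations grows with s². The following minimal plaintext sketch shows the XNOR-Popcount dot product itself, i.e., the operation that XONN evaluates inside GC (this is an illustration of the arithmetic, not the garbled implementation):

    # Binary dot product as XNOR + popcount (plaintext illustration).
    def bin_dot(x_bits: int, w_bits: int, n: int) -> int:
        """x_bits, w_bits pack n values in {-1,+1} as bits (1 -> +1, 0 -> -1)."""
        same = ~(x_bits ^ w_bits) & ((1 << n) - 1)   # XNOR: 1 where signs agree
        p = bin(same).count("1")                     # popcount
        return 2 * p - n                             # sum of the n products

    # 4-element check (LSB first): x = [+1,-1,+1,+1], w = [+1,+1,-1,+1]
    # products are [+1,-1,-1,+1], so the dot product is 0.
    assert bin_dot(0b1101, 0b1011, 4) == 0

Inside GC, the XNOR is free [12], so the garbled-circuit cost of each neuron is essentially the cost of the Popcount adder tree over its fan-in.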

Table 4: Comparison of XONN with the state-of-the-art for the MNIST network architectures.

Arch. | Framework | Runtime (s) | Comm. (MB) | Acc. (%) | s
BM1 | SecureML | 4.88 | - | 93.1 | -
BM1 | MiniONN | 1.04 | 15.8 | 97.6 | -
BM1 | EzPC | 0.7 | 76 | 97.6 | -
BM1 | Gazelle | 0.09 | 0.5 | 97.6 | -
BM1 | XONN | 0.13 | 4.29 | 97.6 | 1.75
BM2 | CryptoNets | 297.5 | 372.2 | 98.95 | -
BM2 | DeepSecure | 9.67 | 791 | 98.95 | -
BM2 | MiniONN | 1.28 | 47.6 | 98.95 | -
BM2 | Chameleon | 2.24 | 10.5 | 99.0 | -
BM2 | EzPC | 0.6 | 70 | 99.0 | -
BM2 | Gazelle | 0.29 | 8.0 | 99.0 | -
BM2 | XONN | 0.16 | 38.28 | 98.64 | 4.00
BM3 | MiniONN | 9.32 | 657.5 | 99.0 | -
BM3 | EzPC | 5.1 | 501 | 99.0 | -
BM3 | Gazelle | 1.16 | 70 | 99.0 | -
BM3 | XONN | 0.15 | 32.13 | 99.0 | 2.00

7.2 Evaluation on CIFAR-10

In Table 5, we summarize the network architectures that we use for the CIFAR-10 dataset. In this table, BC1 is the binarized version of the architecture proposed by MiniONN. To evaluate the scalability of our framework to larger networks, we also binarize the Fitnet [53] architectures, which are denoted as BC2-BC5. We also evaluate XONN on the popular VGG16 network architecture [54] (BC6). Detailed architecture descriptions are available in Appendix A.2, Table 13.

Table 5: Summary of the trained binary network architectures evaluated on the CIFAR-10 dataset.

Arch. | Previous Papers | Description
BC1 | MiniONN [9], Chameleon [7], EzPC [25], Gazelle [10] | 7 CONV, 2 MP, 1 FC
BC2 | Fitnet [53] | 9 CONV, 3 MP, 1 FC
BC3 | Fitnet [53] | 9 CONV, 3 MP, 1 FC
BC4 | Fitnet [53] | 11 CONV, 3 MP, 1 FC
BC5 | Fitnet [53] | 17 CONV, 3 MP, 1 FC
BC6 | VGG16 [54] | 13 CONV, 5 MP, 3 FC

Analysis of Network Scaling: Similar to the analysis on the MNIST dataset, the accuracy of our binary models for CIFAR-10 can be tuned based on the scaling factor that determines the number of neurons in each layer. Figure 11a depicts the accuracy of the BNNs for different scaling factors. As can be seen, increasing the scaling factor enhances the classification accuracy of the BNN. The runtime also increases with the scaling factor, as shown in Figure 11b (more details in Table 12, Appendix A.2).

Figure 11: (a) Effect of scaling factor on accuracy for CIFAR-10 networks. (b) Effect of scaling factor on runtime. No pruning was applied in this evaluation.

Comparison to Prior Art: We scale the BC2 network with a factor of s = 3 and then prune it using Algorithm 2. Details of the pruning steps are available in Table 10 in Appendix A.1. The resulting network is compared against prior art in Table 6. As can be seen, our solution achieves 2.7×, 45.8×, 9.1×, and 93.1× lower latency compared to Gazelle, EzPC, Chameleon, and MiniONN, respectively.

Table 6: Comparison of XONN with prior art on CIFAR-10.

Framework | Runtime (s) | Comm. (MB) | Acc. (%) | s
MiniONN | 544 | 9272 | 81.61 | -
Chameleon | 52.67 | 2650 | 81.61 | -
EzPC | 265.6 | 40683 | 81.61 | -
Gazelle | 15.48 | 1236 | 81.61 | -
XONN | 5.79 | 2599 | 81.85 | 3.00

7.3 Evaluation on Medical Datasets

One of the most important applications of oblivious inference is medical data analysis. Recent advances in deep learning greatly benefit many complex diagnosis tasks that require exhaustive manual inspection by human experts [55, 56, 57, 58]. To showcase the applicability of oblivious inference in real-world medical applications, we provide several benchmarks on publicly available healthcare datasets, summarized in Table 7. We split the datasets into training and validation portions as indicated in the last two columns of Table 7. All datasets except Malaria Infection are normalized to have zero mean and a standard deviation of 1 per feature, and the normalized datasets are quantized to 3 decimal digits. The images of the Malaria Infection dataset are resized to 32×32 pixels. Detailed architectures are available in Appendix A.2, Table 13. We report the validation accuracy along with inference time and message size in Table 8.

Table 7: Summary of medical application benchmarks.

Task | Arch. | Description | # Samples (Tr.) | # Samples (Val.)
Breast Cancer [59] | BH1 | 3 FC | 453 | 113
Diabetes [60] | BH2 | 3 FC | 615 | 153
Liver Disease [61] | BH3 | 3 FC | 467 | 116
Malaria Infection [62] | BH4 | 2 CONV, 2 MP, 2 FC | 24804 | 2756

Table 8: Runtime, communication cost (Comm.), and accuracy (Acc.) for medical benchmarks.

Arch. | Runtime (ms) | Comm. (MB) | Acc. (%)
BH1 | 82 | 0.35 | 97.35
BH2 | 75 | 0.16 | 80.39
BH3 | 81 | 0.3 | 80.17
BH4 | 482 | 120.75 | 95.03
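A minimal sketch of the tabular preprocessing described above, i.e., per-feature standardization followed by quantization to 3 decimal digits. We assume the statistics are computed on the training split; the function names are ours.

    # Per-feature standardization + 3-decimal quantization (illustrative).
    import numpy as np

    def preprocess(x_train: np.ndarray, x_val: np.ndarray):
        mean = x_train.mean(axis=0)
        std = x_train.std(axis=0) + 1e-12          # guard constant features
        norm = lambda x: np.round((x - mean) / std, 3)   # quantize to 3 digits
        return norm(x_train), norm(x_val)

    # The Malaria Infection images are instead resized to 32x32, e.g. with
    # tf.image.resize(img, (32, 32)), before training BH4.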
8 Conclusion

We introduce XONN, a novel framework to automatically train and use deep neural networks for the task of oblivious inference. XONN utilizes Yao's Garbled Circuits (GC) protocol and relies on binarizing the DL models in order to translate costly matrix multiplications into XNOR operations that are free in the GC protocol. Compared to Gazelle [10], the prior best solution, XONN achieves up to 7× lower latency. Moreover, in contrast to Gazelle, which requires one round of interaction for each layer, our solution needs a constant number of interaction rounds regardless of the number of layers. Maintaining constant round complexity is an important requirement in Internet settings, as a typical network latency can significantly degrade the performance of oblivious inference. Moreover, since our solution relies on the GC protocol, it can provide much stronger security guarantees, such as security against malicious adversaries using standard cut-and-choose protocols. XONN's high-level API enables clients to utilize the framework with a minimal number of lines of code. To further facilitate the adoption of our framework, we design a compiler to translate the neural network description in Keras format to that of XONN.

Acknowledgements. We would like to thank the anonymous reviewers for their insightful comments.

References

[1] Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security, 2016.
[2] Zvika Brakerski and Vinod Vaikuntanathan. Efficient fully homomorphic encryption from (standard) LWE. SIAM Journal on Computing, 43(2):831-871, 2014.
[3] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. (Leveled) fully homomorphic encryption without bootstrapping. ACM Transactions on Computation Theory (TOCT), 6(3):13, 2014.
[4] Andrew Yao. How to generate and exchange secrets. In FOCS, 1986.
[5] Oded Goldreich, Silvio Micali, and Avi Wigderson. How to play any mental game. In Proceedings of the nineteenth annual ACM symposium on Theory of computing, pages 218-229. ACM, 1987.
[6] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 223-238. Springer, 1999.
[7] M. Sadegh Riazi, Christian Weinert, Oleksandr Tkachenko, Ebrahim M. Songhori, Thomas Schneider, and Farinaz Koushanfar. Chameleon: A hybrid secure computation framework for machine learning applications. In ASIACCS, 2018.
[8] Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. In IEEE S&P, 2017.
[9] Jian Liu, Mika Juuti, Yao Lu, and N. Asokan. Oblivious neural network predictions via MiniONN transformations. In ACM CCS, 2017.
[10] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. GAZELLE: A low latency framework for secure neural network inference. In USENIX Security, 2018.
[11] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, et al. Going deeper with convolutions. In CVPR, 2015.
[12] Vladimir Kolesnikov and Thomas Schneider. Improved garbled circuit: Free XOR gates and applications. In ICALP, 2008.
[13] Bita Darvish Rouhani, M. Sadegh Riazi, and Farinaz Koushanfar. DeepSecure: Scalable provably-secure deep learning. In DAC, 2018.
[14] Nathan Dowlin, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In ICML, 2016.
[15] Michael O. Rabin. How to exchange secrets with oblivious transfer. IACR Cryptology ePrint Archive, 2005:187, 2005.
[16] Yuval Ishai, Joe Kilian, Kobbi Nissim, and Erez Petrank. Extending oblivious transfers efficiently. In Annual International Cryptology Conference, pages 145-161. Springer, 2003.
[17] Donald Beaver. Correlated pseudorandomness and the complexity of private computations. In STOC, 1996.
[18] Gilad Asharov, Yehuda Lindell, Thomas Schneider, and Michael Zohner. More efficient oblivious transfer and extensions for faster secure computation. In ACM CCS, 2013.
[19] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830, 2016.
[20] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision, pages 525-542. Springer, 2016.
[21] Mohammad Ghasemzadeh, Mohammad Samragh, and Farinaz Koushanfar. ReBNet: Residual binarized neural network. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 57-64. IEEE, 2018.
[22] Xiaofan Lin, Cong Zhao, and Wei Pan. Towards accurate binary convolutional neural network. In Advances in Neural Information Processing Systems, pages 345-353, 2017.
[23] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, pages 1135-1143, 2015.
[24] Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440, 2016.
[25] Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma, and Shardul Tripathi. EzPC: Programmable, efficient, and scalable secure two-party computation. IACR Cryptology ePrint Archive, 2017/1109, 2017.
[26] Yehuda Lindell and Benny Pinkas. A proof of security of Yao's protocol for two-party computation. Journal of Cryptology, 22(2):161-188, 2009.
[27] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In ACM CCS. ACM, 2015.
[28] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In IEEE S&P, 2017.
[29] Yehuda Lindell and Benny Pinkas. Secure two-party computation via cut-and-choose oblivious transfer. Journal of Cryptology, 25(4):680-722, 2012.
[30] Yan Huang, Jonathan Katz, and David Evans. Efficient secure two-party computation using symmetric cut-and-choose. In Advances in Cryptology - CRYPTO 2013, pages 18-35. Springer, 2013.
[31] Yehuda Lindell. Fast cut-and-choose-based protocols for malicious and covert adversaries. Journal of Cryptology, 29(2):456-490, 2016.
[32] Mihir Bellare, Viet Tung Hoang, Sriram Keelveedhi, and Phillip Rogaway. Efficient garbling from a fixed-key blockcipher. In IEEE S&P, 2013.
[33] Moni Naor, Benny Pinkas, and Reuban Sumner. Privacy preserving auctions and mechanism design. In ACM Conference on Electronic Commerce, 1999.
[34] Samee Zahur, Mike Rosulek, and David Evans. Two halves make a whole. In EUROCRYPT, 2015.
[35] Peter Rindal. libOTe: an efficient, portable, and easy to use Oblivious Transfer Library. https://github.com/osu-crypto/libOTe.
[36] Ebrahim M. Songhori, Siam U. Hussain, Ahmad-Reza Sadeghi, Thomas Schneider, and Farinaz Koushanfar. TinyGarble: Highly compressed and scalable sequential garbled circuits. In IEEE S&P, 2015.
[37] François Chollet et al. Keras. https://keras.io, 2015.
[38] Ehsan Hesamifard, Hassan Takabi, Mehdi Ghasemi, and Rebecca N. Wright. Privacy-preserving machine learning as a service. Proceedings on Privacy Enhancing Technologies, 2018(3):123-142, 2018.
[39] Edward Chou, Josh Beal, Daniel Levy, Serena Yeung, Albert Haque, and Li Fei-Fei. Faster CryptoNets: Leveraging sparsity for real-world encrypted inference. arXiv preprint arXiv:1811.09953, 2018.
[40] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In ACM CCS, 2015.
[41] Briland Hitaj, Giuseppe Ateniese, and Fernando Pérez-Cruz. Deep models under the GAN: information leakage from collaborative deep learning. In ACM CCS, 2017.
[42] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In ACM CCS, 2017.
[43] M. Sadegh Riazi, Bita Darvish Rouhani, and Farinaz Koushanfar. Deep learning on private data. IEEE Security and Privacy (S&P) Magazine, 2019.
[44] M. Sadegh Riazi and Farinaz Koushanfar. Privacy-preserving deep learning and inference. In Proceedings of the International Conference on Computer-Aided Design, page 18. ACM, 2018.
[45] Payman Mohassel and Peter Rindal. ABY3: a mixed protocol framework for machine learning. In ACM CCS, 2018.
[46] Sameer Wagh, Divya Gupta, and Nishanth Chandran. SecureNN: Efficient and private neural network training, 2018.
[47] Xiaoqian Jiang, Miran Kim, Kristin Lauter, and Yongsoo Song. Secure outsourced matrix computation and application to neural networks. In ACM CCS, 2018.
[48] Amartya Sanyal, Matt Kusner, Adria Gascon, and Varun Kanade. TAPAS: Tricks to accelerate (encrypted) prediction as a service. In International Conference on Machine Learning, pages 4497-4506, 2018.
[49] Aner Ben-Efraim, Yehuda Lindell, and Eran Omri. Optimizing semi-honest secure multiparty computation for the internet. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 578-590. ACM, 2016.
[50] Zvika Brakerski. Fully homomorphic encryption without modulus switching from classical GapSVP. In Advances in Cryptology - CRYPTO 2012, pages 868-886. Springer, 2012.
[51] Junfeng Fan and Frederik Vercauteren. Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive, 2012:144, 2012.
[52] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: A system for large-scale machine learning. In Operating Systems Design and Implementation (OSDI), 2016.
[53] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014.
[54] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[55] Andre Esteva, Alexandre Robicquet, Bharath Ramsundar, Volodymyr Kuleshov, Mark DePristo, Katherine Chou, Claire Cui, Greg Corrado, Sebastian Thrun, and Jeff Dean. A guide to deep learning in healthcare. Nature Medicine, 25(1):24, 2019.
[56] Andre Esteva, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115, 2017.
[57] Babak Alipanahi, Andrew Delong, Matthew T. Weirauch, and Brendan J. Frey. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology, 33(8):831, 2015.
[58] Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M. Dai, Nissan Hajaj, Michaela Hardt, Peter J. Liu, Xiaobing Liu, Jake Marcus, Mimi Sun, et al. Scalable and accurate deep learning with electronic health records. npj Digital Medicine, 1(1):18, 2018.
[59] Breast Cancer Wisconsin, accessed on 01/20/2019. https://www.kaggle.com/uciml/breast-cancer-wisconsin-data.
[60] Pima Indians Diabetes, accessed on 01/20/2019. https://www.kaggle.com/uciml/pima-indians-diabetes-database.
[61] Indian Liver Patient Records, accessed on 01/20/2019. https://www.kaggle.com/uciml/indian-liver-patient-records.
[62] Malaria Cell Images, accessed on 01/20/2019. https://www.kaggle.com/iarunava/cell-images-for-detecting-malaria.

A Experimental Details

A.1 Network Trimming Examples

Tables 9 and 10 summarize the trimming steps for the MNIST and CIFAR-10 benchmarks, respectively.

Table 9: Trimming MNIST architectures.

Network | Property | initial | step 1 | step 2 | step 3 | Change
BM1 (s=1.75) | Acc. (%) | 97.63 | 97.59 | 97.28 | 97.02 | -0.61%
BM1 (s=1.75) | Comm. (MB) | 4.95 | 4.29 | 3.81 | 3.32 | 1.49× less
BM1 (s=1.75) | Lat. (ms) | 158 | 131 | 114 | 102 | 1.54× faster
BM2 (s=4) | Acc. (%) | 98.64 | 98.44 | 98.37 | 98.13 | -0.51%
BM2 (s=4) | Comm. (MB) | 38.28 | 28.63 | 24.33 | 15.76 | 2.42× less
BM2 (s=4) | Lat. (ms) | 158 | 144 | 134 | 104 | 1.51× faster
BM3 (s=2) | Acc. (%) | 99.22 | 99.11 | 98.96 | 99.00 | -0.22%
BM3 (s=2) | Comm. (MB) | 56.08 | 42.51 | 37.34 | 32.13 | 1.75× less
BM3 (s=2) | Lat. (ms) | 190 | 165 | 157 | 146 | 1.3× faster
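For intuition only, the following generic sketch shows the shape of one channel-trimming step. It is not XONN's Algorithm 2, whose ranking criterion and retraining schedule are given in the main text; the importance proxy and function names here are ours.

    # Generic illustration of one channel-trimming step (NOT Algorithm 2).
    import numpy as np

    def trim_step(w: np.ndarray, k: int) -> np.ndarray:
        """w: (in_ch, out_ch) weights; drop the k least-important out channels."""
        importance = np.abs(w).sum(axis=0)    # proxy score per output channel
        keep = np.argsort(importance)[k:]     # keep the strongest channels
        return w[:, np.sort(keep)]

    w = np.random.randn(64, 32)
    w_small = trim_step(w, k=8)               # 32 -> 24 output channels
    assert w_small.shape == (64, 24)
    # After each such step the smaller network is fine-tuned, and accuracy,
    # communication, and latency are re-measured; that is what steps 1-3
    # in Tables 9 and 10 report.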

Table 10: Trimming the BC2 network for CIFAR-10.

Property | initial | step 1 | step 2 | step 3 | Change
Acc. (%) | 82.40 | 82.39 | 82.41 | 81.85 | -0.55%
Comm. (GB) | 3.38 | 3.05 | 2.76 | 2.60 | 1.30× less
Lat. (s) | 7.59 | 6.87 | 6.23 | 5.79 | 1.31× faster

A.2 Accuracy, Runtime, and Communication

Runtime and communication reports are available in Table 11 and Table 12 for the MNIST and CIFAR-10 benchmarks, respectively. The corresponding neural network architectures are provided in Table 13. Entries corresponding to a communication of more than 40GB are estimated using numerical runtime models.

Table 11: Accuracy (Acc.), communication (Comm.), and latency (Lat.) for the MNIST dataset. Channel/neuron trimming is not applied.

Arch. | s | Acc. (%) | Comm. (MB) | Lat. (s)
BM1 | 1 | 97.10 | 2.57 | 0.12
BM1 | 1.5 | 97.56 | 4.09 | 0.13
BM1 | 2 | 97.82 | 5.87 | 0.13
BM1 | 3 | 98.10 | 10.22 | 0.14
BM1 | 4 | 98.34 | 15.62 | 0.15
BM2 | 1 | 97.25 | 2.90 | 0.10
BM2 | 1.5 | 97.93 | 5.55 | 0.12
BM2 | 2 | 98.28 | 10.09 | 0.14
BM2 | 3 | 98.56 | 21.90 | 0.18
BM2 | 4 | 98.64 | 38.30 | 0.23
BM3 | 1 | 98.54 | 17.59 | 0.17
BM3 | 1.5 | 98.93 | 36.72 | 0.22
BM3 | 2 | 99.13 | 62.77 | 0.3
BM3 | 3 | 99.26 | 135.88 | 0.52
BM3 | 4 | 99.35 | 236.78 | 0.81

Table 12: Accuracy (Acc.), communication (Comm.), and latency (Lat.) for the CIFAR-10 dataset. Channel/neuron trimming is not applied.

Arch. | s | Acc. | Comm. (GB) | Lat. (s)
BC1 | 1 | 0.72 | 1.26 | 3.96
BC1 | 1.5 | 0.77 | 2.82 | 8.59
BC1 | 2 | 0.80 | 4.98 | 15.07
BC1 | 3 | 0.83 | 11.15 | 33.49
BC2 | 1 | 0.67 | 0.39 | 1.37
BC2 | 1.5 | 0.73 | 0.86 | 2.78
BC2 | 2 | 0.78 | 1.53 | 4.75
BC2 | 3 | 0.82 | 3.40 | 10.35
BC3 | 1 | 0.77 | 1.35 | 4.23
BC3 | 1.5 | 0.81 | 3.00 | 9.17
BC3 | 2 | 0.83 | 5.32 | 16.09
BC3 | 3 | 0.86 | 11.89 | 35.77
BC4 | 1 | 0.82 | 4.66 | 14.12
BC4 | 1.5 | 0.85 | 10.41 | 31.33
BC4 | 2 | 0.87 | 18.45 | 55.38
BC4 | 3 | 0.88 | 41.37 | 123.94
BC5 | 1 | 0.81 | 5.54 | 16.78
BC5 | 1.5 | 0.85 | 12.40 | 37.29
BC5 | 2 | 0.86 | 21.98 | 65.94
BC5 | 3 | 0.88 | 49.30 | 147.66
BC6 | 1 | 0.67 | 0.65 | 2.15
BC6 | 1.5 | 0.74 | 1.46 | 4.55
BC6 | 2 | 0.78 | 2.58 | 7.91
BC6 | 3 | 0.80 | 5.77 | 17.44

Table 13: Evaluated network architectures.

BM1
1 FC [input: 784, output: 128s] + BN + BA
2 FC [input: 128s, output: 128s] + BN + BA
3 FC [input: 128s, output: 10] + BN + Softmax

BM2
1 CONV [input: 28×28×1, window: 5×5, stride: 2, kernels: 5s, output: 12×12×5s] + BN + BA
2 FC [input: 720s, output: 100s] + BN + BA
3 FC [input: 100s, output: 10] + BN + Softmax

BM3
1 CONV [input: 28×28×1, window: 5×5, stride: 1, kernels: 16s, output: 24×24×16s] + BN + BA
2 MP [input: 24×24×16s, window: 2×2, output: 12×12×16s]
3 CONV [input: 12×12×16s, window: 5×5, stride: 1, kernels: 16s, output: 8×8×16s] + BN + BA
4 MP [input: 8×8×16s, window: 2×2, output: 4×4×16s]
5 FC [input: 256s, output: 100s] + BN + BA
6 FC [input: 100s, output: 10] + BN + Softmax

BC1
1 CONV [input: 32×32×3, window: 3×3, stride: 1, kernels: 64s, output: 30×30×64s] + BN + BA
2 CONV [input: 30×30×64s, window: 3×3, stride: 1, kernels: 64s, output: 28×28×64s] + BN + BA
3 MP [input: 28×28×64s, window: 2×2, output: 14×14×64s]
4 CONV [input: 14×14×64s, window: 3×3, stride: 1, kernels: 64s, output: 12×12×64s] + BN + BA
5 CONV [input: 12×12×64s, window: 3×3, stride: 1, kernels: 64s, output: 10×10×64s] + BN + BA
6 MP [input: 10×10×64s, window: 2×2, output: 5×5×64s]
7 CONV [input: 5×5×64s, window: 3×3, stride: 1, kernels: 64s, output: 3×3×64s] + BN + BA
8 CONV [input: 3×3×64s, window: 1×1, stride: 1, kernels: 64s, output: 3×3×64s] + BN + BA
9 CONV [input: 3×3×64s, window: 1×1, stride: 1, kernels: 16s, output: 3×3×16s] + BN + BA
10 FC [input: 144s, output: 10] + BN + Softmax

BC2
1 CONV [input: 32×32×3, window: 3×3, stride: 1, kernels: 16s, output: 32×32×16s] + BN + BA
2 CONV [input: 32×32×16s, window: 3×3, stride: 1, kernels: 16s, output: 32×32×16s] + BN + BA
3 CONV [input: 32×32×16s, window: 3×3, stride: 1, kernels: 16s, output: 32×32×16s] + BN + BA
4 MP [input: 32×32×16s, window: 2×2, output: 16×16×16s]
5 CONV [input: 16×16×16s, window: 3×3, stride: 1, kernels: 32s, output: 16×16×32s] + BN + BA
6 CONV [input: 16×16×32s, window: 3×3, stride: 1, kernels: 32s, output: 16×16×32s] + BN + BA
7 CONV [input: 16×16×32s, window: 3×3, stride: 1, kernels: 32s, output: 16×16×32s] + BN + BA
8 MP [input: 16×16×32s, window: 2×2, output: 8×8×32s]
9 CONV [input: 8×8×32s, window: 3×3, stride: 1, kernels: 48s, output: 6×6×48s] + BN + BA
10 CONV [input: 6×6×48s, window: 3×3, stride: 1, kernels: 48s, output: 4×4×48s] + BN + BA
11 CONV [input: 4×4×48s, window: 3×3, stride: 1, kernels: 64s, output: 2×2×64s] + BN + BA
12 MP [input: 2×2×64s, window: 2×2, output: 1×1×64s]
13 FC [input: 64s, output: 10] + BN + Softmax

BC3
1 CONV [input: 32×32×3, window: 3×3, stride: 1, kernels: 16s, output: 32×32×16s] + BN + BA
2 CONV [input: 32×32×16s, window: 3×3, stride: 1, kernels: 32s, output: 32×32×32s] + BN + BA
3 CONV [input: 32×32×32s, window: 3×3, stride: 1, kernels: 32s, output: 32×32×32s] + BN + BA
4 MP [input: 32×32×32s, window: 2×2, output: 16×16×32s]
5 CONV [input: 16×16×32s, window: 3×3, stride: 1, kernels: 48s, output: 16×16×48s] + BN + BA
6 CONV [input: 16×16×48s, window: 3×3, stride: 1, kernels: 64s, output: 16×16×64s] + BN + BA
7 CONV [input: 16×16×64s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
8 MP [input: 16×16×80s, window: 2×2, output: 8×8×80s]
9 CONV [input: 8×8×80s, window: 3×3, stride: 1, kernels: 96s, output: 6×6×96s] + BN + BA
10 CONV [input: 6×6×96s, window: 3×3, stride: 1, kernels: 96s, output: 4×4×96s] + BN + BA
11 CONV [input: 4×4×96s, window: 3×3, stride: 1, kernels: 128s, output: 2×2×128s] + BN + BA
12 MP [input: 2×2×128s, window: 2×2, output: 1×1×128s]
13 FC [input: 128s, output: 10] + BN + Softmax

BC4
1 CONV [input: 32×32×3, window: 3×3, stride: 1, kernels: 32s, output: 32×32×32s] + BN + BA
2 CONV [input: 32×32×32s, window: 3×3, stride: 1, kernels: 48s, output: 32×32×48s] + BN + BA
3 CONV [input: 32×32×48s, window: 3×3, stride: 1, kernels: 64s, output: 32×32×64s] + BN + BA
4 CONV [input: 32×32×64s, window: 3×3, stride: 1, kernels: 64s, output: 32×32×64s] + BN + BA
5 MP [input: 32×32×64s, window: 2×2, output: 16×16×64s]
6 CONV [input: 16×16×64s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
7 CONV [input: 16×16×80s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
8 CONV [input: 16×16×80s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
9 CONV [input: 16×16×80s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
10 MP [input: 16×16×80s, window: 2×2, output: 8×8×80s]
11 CONV [input: 8×8×80s, window: 3×3, stride: 1, kernels: 128s, output: 6×6×128s] + BN + BA
12 CONV [input: 6×6×128s, window: 3×3, stride: 1, kernels: 128s, output: 4×4×128s] + BN + BA
13 CONV [input: 4×4×128s, window: 3×3, stride: 1, kernels: 128s, output: 2×2×128s] + BN + BA
14 MP [input: 2×2×128s, window: 2×2, output: 1×1×128s]
15 FC [input: 128s, output: 10] + BN + Softmax

BC5
1 CONV [input: 32×32×3, window: 3×3, stride: 1, kernels: 32s, output: 32×32×32s] + BN + BA
2 CONV [input: 32×32×32s, window: 3×3, stride: 1, kernels: 32s, output: 32×32×32s] + BN + BA
3 CONV [input: 32×32×32s, window: 3×3, stride: 1, kernels: 32s, output: 32×32×32s] + BN + BA
4 CONV [input: 32×32×32s, window: 3×3, stride: 1, kernels: 48s, output: 32×32×48s] + BN + BA
5 CONV [input: 32×32×48s, window: 3×3, stride: 1, kernels: 48s, output: 32×32×48s] + BN + BA
6 MP [input: 32×32×48s, window: 2×2, output: 16×16×48s]
7 CONV [input: 16×16×48s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
8 CONV [input: 16×16×80s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
9 CONV [input: 16×16×80s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
10 CONV [input: 16×16×80s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
11 CONV [input: 16×16×80s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
12 CONV [input: 16×16×80s, window: 3×3, stride: 1, kernels: 80s, output: 16×16×80s] + BN + BA
13 MP [input: 16×16×80s, window: 2×2, output: 8×8×80s]
14 CONV [input: 8×8×80s, window: 3×3, stride: 1, kernels: 128s, output: 8×8×128s] + BN + BA
15 CONV [input: 8×8×128s, window: 3×3, stride: 1, kernels: 128s, output: 8×8×128s] + BN + BA
16 CONV [input: 8×8×128s, window: 3×3, stride: 1, kernels: 128s, output: 8×8×128s] + BN + BA
17 CONV [input: 8×8×128s, window: 3×3, stride: 1, kernels: 128s, output: 6×6×128s] + BN + BA
18 CONV [input: 6×6×128s, window: 3×3, stride: 1, kernels: 128s, output: 4×4×128s] + BN + BA
19 CONV [input: 4×4×128s, window: 3×3, stride: 1, kernels: 128s, output: 2×2×128s] + BN + BA
20 MP [input: 2×2×128s, window: 2×2, output: 1×1×128s]
21 FC [input: 128s, output: 10] + BN + Softmax

BC6
1 CONV [input: 32×32×3, window: 3×3, stride: 1, kernels: 16s, output: 32×32×16s] + BN + BA
2 CONV [input: 32×32×16s, window: 3×3, stride: 1, kernels: 16s, output: 32×32×16s] + BN + BA
3 MP [input: 32×32×16s, window: 2×2, output: 16×16×16s]
4 CONV [input: 16×16×16s, window: 3×3, stride: 1, kernels: 32s, output: 16×16×32s] + BN + BA
5 CONV [input: 16×16×32s, window: 3×3, stride: 1, kernels: 32s, output: 16×16×32s] + BN + BA
6 MP [input: 16×16×32s, window: 2×2, output: 8×8×32s]
7 CONV [input: 8×8×32s, window: 3×3, stride: 1, kernels: 64s, output: 8×8×64s] + BN + BA
8 CONV [input: 8×8×64s, window: 3×3, stride: 1, kernels: 64s, output: 8×8×64s] + BN + BA
9 CONV [input: 8×8×64s, window: 3×3, stride: 1, kernels: 64s, output: 8×8×64s] + BN + BA
10 MP [input: 8×8×64s, window: 2×2, output: 4×4×64s]
11 CONV [input: 4×4×64s, window: 3×3, stride: 1, kernels: 128s, output: 4×4×128s] + BN + BA
12 CONV [input: 4×4×128s, window: 3×3, stride: 1, kernels: 128s, output: 4×4×128s] + BN + BA
13 CONV [input: 4×4×128s, window: 3×3, stride: 1, kernels: 128s, output: 4×4×128s] + BN + BA
14 MP [input: 4×4×128s, window: 2×2, output: 2×2×128s]
15 CONV [input: 2×2×128s, window: 3×3, stride: 1, kernels: 128s, output: 2×2×128s] + BN + BA
16 CONV [input: 2×2×128s, window: 3×3, stride: 1, kernels: 128s, output: 2×2×128s] + BN + BA
17 CONV [input: 2×2×128s, window: 3×3, stride: 1, kernels: 128s, output: 2×2×128s] + BN + BA
18 MP [input: 2×2×128s, window: 2×2, output: 1×1×128s]
19 FC [input: 128s, output: 512s] + BN + BA
20 FC [input: 512s, output: 512s] + BN + BA
21 FC [input: 512s, output: 10] + BN + Softmax

BH1
1 FC [input: 30, output: 16] + BN + BA
2 FC [input: 16, output: 16] + BN + BA
3 FC [input: 16, output: 2] + BN + Softmax

BH2
1 FC [input: 8, output: 20] + BN + BA
2 FC [input: 20, output: 20] + BN + BA
3 FC [input: 20, output: 2] + BN + Softmax

BH3
1 FC [input: 10, output: 32] + BN + BA
2 FC [input: 32, output: 32] + BN + BA
3 FC [input: 32, output: 2] + BN + Softmax

BH4
1 CONV [input: 32×32×3, window: 5×5, stride: 1, kernels: 36, output: 28×28×36] + BN + BA
2 MP [input: 28×28×36, window: 2×2, output: 14×14×36]
3 CONV [input: 14×14×36, window: 5×5, stride: 1, kernels: 36, output: 10×10×36] + BN + BA
4 MP [input: 10×10×36, window: 2×2, output: 5×5×36]
5 FC [input: 900, output: 72] + BN + BA
6 FC [input: 72, output: 2] + BN + Softmax

B Attacks on Deep Neural Networks
In this section, we review three of the most important attacks against deep neural networks that are relevant in the context of oblivious inference [1, 27, 28]. In all three, a client-server model is considered where the client is the adversary and attempts to learn more about the model held by the server. The client sends many inputs and receives the corresponding inference results; he then analyzes the results to infer more information about either the network parameters or the data that was used in the training phase of the model. We briefly review each attack and illustrate a simple defense mechanism with negligible overhead, based on the suggestions provided in these works.

Model Inversion Attack [27]. In the black-box access model of this attack (which fits the computational model of this work), an adversarial client attempts to learn a prototypical sample of one of the classes. The client iteratively creates an input that maximizes the confidence score corresponding to the target class. Regardless of the specific training process, the attacker can learn significant information by querying the model many times.

Model Extraction Attack [1]. In this type of attack, the adversary's goal is to estimate the parameters of the machine learning model held by the server. For example, in a logistic regression model with n parameters, the model can be extracted by querying the server n times and, upon receiving the confidence values, solving a system of n equations. Model extraction can diminish the pay-per-prediction business model of technology companies. Moreover, it can be used as a pre-step towards the model inversion attack.

Membership Inference Attack [28]. The objective of this attack is to identify whether a given input has been used in the training phase of the model. This attack raises certain privacy concerns. The idea behind it is that neural networks usually perform better on the data they were trained on; therefore, two inputs that belong to the same class, one used in the training phase and one not, will show noticeable differences in the confidence values. This behavior is called overfitting. The attack can be mitigated using regularization techniques that reduce the dependency of the DL model on any single training sample. However, overfitting is not the only contributor to this information leakage.

Defense Mechanisms. In the prior state-of-the-art oblivious inference solution [9], it has been suggested to limit the number of queries from a specific client in order to bound the information leakage. In practice, however, an attacker can impersonate many different clients and circumvent this defense. Note that all three attacks rely on the fact that, along with the inference result, the server provides the confidence vector that specifies how likely the client's input is to belong to each class. Therefore, as suggested by prior work [1, 27, 28], it is recommended to augment a filter layer that (i) rounds the confidence scores or (ii) selects the index of the class that has the highest confidence score.

1. Rounding the confidence values: Rounding simply means omitting one (or more) of the Least Significant Bits (LSBs) of all of the numbers in the last layer. This operation is in fact free in GC, since it means the Garbler simply avoids providing the output mapping for those LSBs.

2. Reporting the class label: This operation is equivalent to computing argmax on the last layer. For a vector of size c where each number is represented with b bits, argmax translates to c·(2b+1) non-XOR (AND) gates (see the sketch at the end of this appendix). For example, in a typical architecture for the MNIST (e.g., BM3) or CIFAR-10 (e.g., BC1) dataset, the overhead is 1.68E-2% and 1.36E-4%, respectively.

Note that the two aforementioned defense mechanisms can be added to any framework that supports non-linear functionalities [7, 9, 13]. However, we emphasize that in mixed-protocol solutions another round of communication is usually needed to support the filter layer, whereas in XONN the filter layer does not increase the number of rounds and has negligible overhead compared to the overall protocol.
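As referenced in item 2 above, the c·(2b+1) count can be evaluated mechanically; the small helper below is ours, and the bit-width b = 16 is an assumed example, not a parameter taken from the paper.

    # Evaluate the non-XOR (AND) gate count c*(2b+1) of the argmax filter layer.
    def argmax_and_gates(c: int, b: int) -> int:
        return c * (2 * b + 1)

    # Example: c = 10 classes, assumed b = 16-bit scores.
    print(argmax_and_gates(10, 16))   # -> 330 AND gates, a few hundred gates,
                                      #    negligible next to the full network,
                                      #    matching the percentages quoted above.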