
SOTERIA: In Search of Efficient Neural Networks for Private Inference

Anshul Aggarwal, Trevor E. Carlson, Reza Shokri, Shruti Tople
National University of Singapore (NUS), Microsoft Research
{anshul, trevor, reza}@comp.nus.edu.sg, [email protected]

Abstract

ML-as-a-service is gaining popularity, where a cloud server hosts a trained model and offers a prediction (inference) service to users. In this setting, our objective is to protect the confidentiality of both the users' input queries and the model parameters at the server, with modest computation and communication overhead. Prior solutions primarily propose fine-tuning cryptographic methods to make them efficient for known, fixed model architectures. The drawback of this line of approach is that the model itself is never designed to operate with existing efficient cryptographic computations. We observe that the network architecture, internal functions, and parameters of a model, which are all chosen during training, significantly influence the computation and communication overhead of a cryptographic method during inference. Based on this observation, we propose SOTERIA, a training method to construct model architectures that are by design efficient for private inference. We use neural architecture search algorithms with the dual objective of optimizing the accuracy of the model and the overhead of using cryptographic primitives for secure inference. Given the flexibility of modifying a model during training, we find accurate models that are also efficient for private computation. We select garbled circuits as our underlying cryptographic primitive, due to their expressiveness and efficiency, but this approach can be extended to hybrid multi-party computation settings. We empirically evaluate SOTERIA on the MNIST and CIFAR10 datasets to compare with prior work. Our results confirm that SOTERIA is indeed effective in balancing performance and accuracy.

1 Introduction

Machine learning models are susceptible to several security and privacy attacks throughout their training and inference pipelines. Defending against each of these threats requires different types of security mechanisms. The most important requirement is that the sensitive input data as well as the trained model parameters remain confidential at all times. In this paper, we focus on private computation of inference over deep neural networks, which is the setting of machine-learning-as-a-service. Consider a server that provides a machine learning service (e.g., classification), and a client who needs to use the service for an inference on her data record. The server is not willing to share the proprietary machine learning model underpinning her service with any client. The clients are also unwilling to share their sensitive private data with the server. We consider an honest-but-curious threat model. In addition, we assume the two parties do not trust, nor include, any third entity in the protocol. In this setting, our first objective is to design a secure protocol that protects the confidentiality of client data as well as the prediction results against the server who runs the computation. The second objective is to preserve the confidentiality of the model parameters with respect to the client. We emphasize that protection against indirect inference attacks that aim at reconstructing model parameters [41] or training data [39] by exploiting model predictions is not our goal.

A number of techniques provide data confidentiality while computing and thereby allow private computation. These techniques include computation on trusted processors such as Intel SGX [22, 31], and computation on encrypted data, using homomorphic encryption, garbled circuits, secret sharing, and hybrid cryptographic approaches that jointly optimize the efficiency of private inference on neural networks [3, 6, 16, 45, 46]. To provide private inference with minimal performance overhead and accuracy loss, the dominant line of research involves adapting cryptographic functions to (an approximation of) a given fixed model [7, 24, 28-30, 35-37]. However, the alternative approach of searching for or designing a network architecture for a given set of efficient and known cryptographic primitives is unexplored in the literature.

Our Contributions. In this work, we approach the problem of privacy-preserving inference from a novel perspective. Instead of modifying cryptographic schemes to support neural network computations, we advocate modifying the training algorithms for efficient cryptographic primitives. Research has shown that training algorithms for deep learning are inherently flexible with respect to their neural network architecture: different network configurations can achieve a similar level of prediction accuracy. We exploit this fact about deep learning algorithms and investigate the problem of optimizing deep learning algorithms to ensure efficient private computation.

To this end, we present SOTERIA, an approach for constructing deep neural networks optimized for performance, accuracy and confidentiality. Among all the available cryptographic primitives, we use garbled circuits as our main building block to address the confidentiality concern in the design of SOTERIA. Garbled circuits are known to be efficient and allow generation of constant-depth circuits even for non-linear functions, which is not possible with other primitives such as GMW or FHE. We show that neural network algorithms can be optimized to efficiently execute garbled circuits while achieving high accuracy. We observe that the efficiency of evaluating an inference circuit depends on two key factors: the model parameters and the network structure. With this observation, we design a regularized architecture search algorithm to construct neural networks. SOTERIA selects the optimal parameter sparsity and network structure with the objective of guaranteeing acceptable performance on garbled circuits and high model accuracy.

Table 1: Properties of secure computation cryptographic primitives: partially and fully homomorphic encryption schemes (PHE, FHE), the Goldreich-Micali-Wigderson protocol (GMW), arithmetic secret sharing (SS), and Yao's garbled circuit (GC).

Property | PHE | FHE | SS | GMW | GC
Expressiveness | no | yes | yes | yes | yes
Efficiency | yes | no | yes | yes | yes
Communication (one-time setup) | yes | yes | no | no | yes

2 Selecting the Cryptographic Primitive

In designing SOTERIA, we make several design choices with the goal of achieving efficiency. The most important among them is the selection of the underlying cryptographic primitive to ensure the privacy of data. Several cryptographic primitives, such as partially homomorphic encryption schemes (PHE) and fully homomorphic encryption schemes (FHE), the Goldreich-Micali-Wigderson protocol (GMW), arithmetic secret sharing (SS), and Yao's garbled circuit (GC), have been proposed to enable two-party secure computation. Each of these primitives performs differently with respect to factors such as efficiency, functionality, and required resources. PHE schemes allow either addition or multiplication operations, but not both, on encrypted data [13, 32]. In contrast, FHE schemes enable both addition and multiplication on encrypted data [6, 15, 16, 43], but incur a huge performance overhead. SS involves distributing secret shares among non-trusting parties such that any operation can be computed on the encrypted data without revealing the individual inputs of each party [3]. GMW [17] and GC [45] allow designing boolean circuits and evaluating them between a client and a server. The differences between these schemes can make it difficult to decide which primitive is the best fit for designing a privacy-preserving system for a particular application. Therefore, we first outline the desirable properties specifically for private neural network inference and then compare these primitives with respect to these properties (see Table 1). We select a cryptographic scheme that satisfies all our requirements.

Expressiveness. This property ensures that the cryptographic primitive supports encrypted computation for a variety of operations. With the goal of enabling private computation for neural networks, we examine the types of computation required in deep learning algorithms. Neural network algorithms are composed of linear and non-linear operations. Linear operations include the computation required in the execution of fully-connected and convolution layers. Non-linear operations include activation functions such as Tanh, Sigmoid and ReLU. Research in deep learning is at its peak, with a plethora of new models being proposed by the community to improve accuracy on various tasks. Hence, we desire that the underlying primitive be expressive with respect to any new operations used in the future as well. PHE schemes offer limited operations (either addition or multiplication) on encrypted data. This limits their usage in applications that demand expressive functionality, such as neural network algorithms. Alternative approaches such as the FHE, SS, GMW and GC protocols allow arbitrary operations.

Computation Efficiency. Efficiency is one of the key factors when designing a client-server application such as a neural network inference service on the cloud. FHE techniques have been shown to incur orders of magnitude of overhead for the computation of higher-degree polynomials or non-linear functions. Existing approaches using FHE schemes have restricted its use to computing only linear functions. However, most neural network architectures, such as CNNs, have each linear layer followed by a non-linear layer. To handle non-linear operations, previous solutions either approximate them with linear functions or switch to cryptographic primitives that support non-linearity [12, 24, 29, 30]. Approximating non-linear functions such as ReLU highly impacts the accuracy of the model. Switching between schemes introduces additional computation cost which is directly proportional to the network size. In comparison to FHE, research has shown that the SS, GMW and GC schemes provide constructions with reasonable computation overhead for both linear and non-linear operations.

Communication Overhead. The communication costs incurred for private computation also contribute to the decision of selecting our cryptographic primitive, as the network should not become a bottleneck in the execution of private machine learning as a service. We expect the client and server to interact only once, during the setup phase, and at the end of the execution to receive the output. We aim to remain backward compatible with the existing cloud service setting, where the client does not need to be online at all times between the request and the response. In contrast to this property, the GMW scheme requires communication rounds proportional to the depth of the circuit: to evaluate every layer with an AND gate, the client and server have to exchange secrets, forcing the client to be online throughout the execution. Similarly, constructing non-linear bitwise functions with arithmetic secret shares requires communication rounds logarithmic in the number of bits of the input. This makes these schemes almost infeasible in a cloud setting with a high-latency network. Unlike these primitives, Yao's garbled circuits combined with recent optimizations require an exchange of data only once at the beginning of the protocol.

We select GC as our underlying cryptographic primitive in SOTERIA, as it satisfies all the desired properties for designing private inference for cloud service applications.

3 Garbled Circuits for Efficient Neural Networks

We investigate the problem of performing private inference on neural networks. Let W be the model parameters stored on the server, x be the client's input, and y be the expected output, and let f be the inference function to be computed. Given this, we want to compute f(x; W) -> y. We aim for the following main goals:

• Confidentiality: The solution should preserve the confidentiality of the model parameters W from the users, and that of x and y from the server. We assume an honest-but-curious threat model.

• Accuracy: The drop in accuracy of the privately computed inference function should be negligible compared to the accuracy of the model on plaintext data.

• Performance: The private computation should demonstrate acceptable performance (runtime and communication) overhead.

Garbled circuits. The GC protocol allows the construction of any function as a boolean circuit, with a one-time setup cost of data exchange [46]. In our setting, the client is the garbler and the server is the evaluator. In the setup phase, the client first transforms the function into a boolean circuit with two-input gates. The function (model architecture) and the circuit are known to both parties, but its parameters and input are private. The client then garbles the circuit. This process involves creating a garbled computation table (GCT), which is an encrypted version of the truth table for the boolean circuit. The entries of this table are randomly permuted, so that the order does not leak information. The client then shares the garbled circuit and its encrypted inputs to the circuit (binary values representing x) with the server. In the next phase, the parties perform an oblivious transfer (OT) protocol [34], so that the server obtains the encryption of its inputs to the circuit (binary values representing W) without leaking information about its parameters to the client. Then, the server evaluates the circuit and obtains the output value, which is encrypted. The server transfers the output to the client, which can match the encrypted values to their plaintexts and obtain f(x; W).

Performance. In GC, the communication and computation overhead is directly dependent on the number of AND and OR gates in the boolean circuit. Prior research has proposed several techniques that make it free for GC to execute XOR, XNOR and NOT gates [25]. Given this prior work, the communication overhead of the GC protocol for a given circuit is proportional to its security parameter and the number of non-XOR gates in the circuit. The total runtime for evaluating a circuit is the sum of the time required during the online (evaluation) and offline (garbling and oblivious transfer) computation.

Efficient neural networks. In SOTERIA, we leverage the above-mentioned properties of GC to design an optimized neural network algorithm. Neural network algorithms are known to be flexible with respect to their architectures, i.e., multiple models with different configurations can achieve a similar level of accuracy. We take advantage of this observation and propose designing neural network architectures that help optimize the performance of executing inference with garbled circuits. The number of gates in a circuit corresponding to a neural network depends on its activation functions and the size of its parameter vector.

Neural networks have been shown to exhibit relatively high accuracy for various tasks even with low-precision parameters. Binary neural networks (BNNs) [21] are designed with the lowest possible size for each parameter, i.e., one bit to represent {-1, +1} values. Using BNNs naturally aligns with our selected cryptographic primitive, because each wire in a garbled circuit represents a 1-bit value (with -1 in the model represented by 0 in the circuit). Binarizing the model parameters further allows us to heavily use the free XOR, XNOR and NOT gates in garbled circuits, thus minimizing the computation and communication overhead of private inference. This has recently been shown in the performance evaluation of garbled circuits on binary neural networks [35].

In neural networks, linear functions such as those used in the convolutional or fully connected layers form an important part of the network. These functions involve dot-product vector multiplications. Instead of using multiplications, a dot product can be computed very efficiently using XNOR-popcount: x · w = 2 × bitcount(xnor(x, w)) − N, where N = |x|. In binary neural networks, the output of activation functions is also binary. However, the output of XNOR-popcount is not a binary number; thus, following BNN algorithms, one compares it with 0: positive numbers are converted to 1 and negative numbers to 0.

We can also compute some non-linear functions, such as maxpool, very efficiently in BNNs. Max-pooling is a simple operation that returns the maximum value from a vector, which in the case of neural networks is usually a one-dimensional representation of a 2D max-pooling window. In binary neural networks, maxpool simply needs to return 1 if there is a 1 in the vector. This is achieved by a logical OR operation over the elements of the vector.

To achieve a learning capacity for binary neural networks similar to full-precision models, we need to scale up the number of model parameters. We can increase the number of kernels in a convolution layer and the number of nodes in a fully connected layer by a given scaling factor. This technique has been used in prior work [35] and enables learning more accurate models, however at the cost of increasing the number of computations in the network.
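To make these evaluation rules concrete, the following sketch shows the XNOR-popcount dot product, the sign-based binarization of its output, and maxpool as a logical OR, on {0, 1}-encoded values (with 0 standing for -1, as above). This is our own NumPy illustration of the plain arithmetic, not code from SOTERIA or from a garbled-circuit library; the function names and the small test vectors are ours.

```python
import numpy as np

def xnor_popcount_dot(x_bits: np.ndarray, w_bits: np.ndarray) -> int:
    """Dot product of +/-1 vectors encoded as bits (0 -> -1, 1 -> +1):
    x . w = 2 * popcount(xnor(x, w)) - N, with N = |x|."""
    n = x_bits.size
    xnor = np.logical_not(np.logical_xor(x_bits, w_bits))
    return 2 * int(np.count_nonzero(xnor)) - n

def binarize(value: int) -> int:
    """BNN activation rule: positive values map to 1, the rest to 0 (i.e., -1)."""
    return 1 if value > 0 else 0

def maxpool_binary(window_bits: np.ndarray) -> int:
    """Max over a binary window is a logical OR: return 1 iff any bit is 1."""
    return int(np.any(window_bits))

# Small check against the plain +/-1 arithmetic.
x = np.array([1, 0, 1, 1])   # encodes [+1, -1, +1, +1]
w = np.array([1, 1, 0, 1])   # encodes [+1, +1, -1, +1]
assert xnor_popcount_dot(x, w) == sum((2*a - 1) * (2*b - 1) for a, b in zip(x, w))
print(xnor_popcount_dot(x, w), binarize(xnor_popcount_dot(x, w)))
```

In the garbled-circuit setting, the XNOR structure is exactly what the free-gate optimizations exploit, leaving mainly the popcount accumulation and the final comparison as non-XOR gates.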
4 SOTERIA

All the techniques discussed in Section 3 can help in reducing the overhead of the garbled circuit protocol on a neural network. However, besides the size of the model parameters, which is reduced in binary neural networks, the model size and its structure also play significant roles in determining the number of non-XOR gates in the garbled circuit of a neural network. For example, the configuration of the convolutional layers directly affects the overhead of garbled circuits on neural networks. Besides, not all model parameters are of the same value for the machine learning task, and models with the same structure but larger sparsity can result in similar accuracy, yet significantly lower overhead for private computation.

In this paper, we design SOTERIA to automatically learn the model architecture and its connections so as to optimize the cost of private inference in addition to optimizing accuracy. This is a different approach than simply fine-tuning or compressing a model, as we aim at including the cost of private computation as part of the objective of architecture learning and parameter learning of the model. To this end, we build SOTERIA on top of two well-established classes of machine learning algorithms to search for models that balance accuracy and performance: neural architecture search algorithms, and ternary neural network algorithms.

4.1 Neural architecture search for constructing efficient models for private inference

Architecture search algorithms for neural networks are designed to replace the manual design of complex deep models. The objective is to learn the model structure for which we hope to obtain high accuracy when trained on the training set. A number of such algorithms have been designed recently. NAS [14] is one of the first neural architecture search algorithms. It comprises three components: a search space, which is the domain of architectures over which the search is executed; a search strategy, which defines how the search space is to be explored; and a performance estimator, to estimate the performance of a particular discovered architecture on unseen data. Multiple techniques have been proposed to minimize the computation cost of the search process by tweaking the search strategy and the performance estimator, such as ENAS and DARTS [27, 33, 47]. DARTS is a differentiable neural architecture search algorithm, which is orders of magnitude faster than other search algorithms [27]. DARTS automatically constructs the model architecture by stacking a number of cells. Each cell is a directed acyclic graph, where each node is a neural operation (e.g., convolution with different dimensions, maxpool, identity). The architecture search algorithm learns the optimal construction of cells that would maximize the accuracy of the model on some validation set. During the search, we use stochastic gradient descent to continuously update the probability of using different candidate operations for each connection in the internal graph of a cell. These probabilities reflect the usefulness of each operation for different positions in the cell. Let α_o^(i,j) be the fitting score associated with operation o to connect nodes i and j in the directed acyclic graph inside the cell. The probability of choosing a particular operation to connect node i to node j is computed as a softmax of the score α_o^(i,j) over all possible operations.

In SOTERIA, we modify the computation of the α_o^(i,j) scores over candidate operations. In order to include the cost of private inference, we penalize each operation proportionally to its computation and communication overhead. Let γ(o) be the penalty, or cost function, for an operation o. The penalty factor could be the normalized runtime and communication cost of an operation, which can be computed empirically on garbled operations. In our experiments, we compute the penalty factor for different operations in Table 6. We update the fitting score α_o^(i,j) by replacing it with α_o^(i,j)(1 − λγ(o)), where λ is our regularization term. Larger values of λ result in a search that prefers efficient models over accurate models.

By regularizing the architecture search algorithm, we effectively guide the algorithm to identify a configuration of cells which optimizes both model accuracy and the performance of private inference. This enables fine-tuning the model, before it is trained, to be efficient on our cryptographic primitives.
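The following sketch illustrates the penalized scoring step described above: each candidate operation's fitting score α_o^(i,j) is damped by (1 − λγ(o)) before the softmax that defines the selection probabilities. In the actual search, the α scores are trained with stochastic gradient descent; here they are fixed, made-up values for a single edge (i, j), and the γ values follow Table 6. This is an illustrative sketch, not SOTERIA's implementation.

```python
import math

# Penalty factors gamma(o), as described in the text (cf. Table 6);
# the alpha scores below are made-up values for one edge (i, j) of a cell.
GAMMA = {"conv5x5": 1.00, "conv3x3": 0.41, "maxpool2x2": 0.04, "identity": 0.00}

def operation_probabilities(alpha: dict, lam: float) -> dict:
    """Softmax over the penalized fitting scores alpha_o * (1 - lam * gamma(o))."""
    penalized = {o: a * (1.0 - lam * GAMMA[o]) for o, a in alpha.items()}
    z = sum(math.exp(v) for v in penalized.values())
    return {o: math.exp(v) / z for o, v in penalized.items()}

alpha_ij = {"conv5x5": 1.2, "conv3x3": 1.0, "maxpool2x2": 0.3, "identity": 0.1}
print(operation_probabilities(alpha_ij, lam=0.0))   # accuracy-only scores
print(operation_probabilities(alpha_ij, lam=0.6))   # expensive ops are demoted
```

With λ = 0 the probabilities depend only on the fitting scores; as λ grows, the expensive 5 × 5 convolution is increasingly demoted relative to cheaper operations.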

As we balance the trade-off between accuracy and performance, SOTERIA can construct models which by design satisfy the requirements of our system. In our experiments, we evaluate the performance of models under different values of λ, and show how this factor can be used to balance the different costs of confidentiality for neural networks.

4.2 Ternary (Sparse Binary) Neural Networks

For building a system that enables efficient private inference, we prefer to reduce the number of parameters in the network. One approach is to train a model and then compress it; however, that might not result in the best construction of the model as far as model accuracy is concerned. Besides, to be aligned with our approach of constructing model architectures, we would prefer to learn model structures which are sparse. One well-established machine learning technique is to learn a model with ternary values (-1, 0, +1). This effectively means that some of the network connections (parameters) are removed (those with value 0). Ternary neural networks try to minimize the distance between the full-precision model parameters and their ternary values [26].

In building models for SOTERIA, we train models with ternary parameters and binary activation functions. This enables us to still use the techniques for binary neural networks discussed in Section 3, however on a smaller circuit (due to the model's sparsity). We incorporate ternary neural networks into our regularized architecture search algorithm to find cells containing only ternary convolution and max-pooling layers that operate on binary inputs and ternary parameters.

The procedure for converting full-precision model parameters to ternary values during training involves comparing the model parameters with a threshold in the forward pass of the gradient descent algorithm. We follow the established algorithms in this domain [26]. For the parameters W_i in an operation, we convert a parameter to +1 when it is larger than a threshold ∆, set it to −1 if it is smaller than −∆, and set it to 0 otherwise. The threshold is computed as ∆ = (0.7/n) ∑_{i=1}^{n} |W_i|. The outputs of the functions are binarized similarly, by comparing them with 0: positive values are converted to +1, and negative values are converted to −1. All these transformations happen during model training, so the final model is optimal given the ternary restrictions. Besides, the ternary training algorithm finds the optimal level of sparsity for the model that does not conflict with its accuracy.
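A minimal sketch of this forward-pass quantization is given below. It only shows the thresholding rule and the binarization of outputs, and omits how gradients are propagated through these non-differentiable steps during training. The code is our own illustration following the rule above, not SOTERIA's training code.

```python
import numpy as np

def ternarize(weights: np.ndarray) -> np.ndarray:
    """Ternary quantization used in the forward pass:
    delta = 0.7/n * sum(|W_i|); W_i -> +1 if W_i > delta,
    -1 if W_i < -delta, and 0 otherwise."""
    delta = 0.7 * np.abs(weights).mean()
    return np.where(weights > delta, 1.0,
                    np.where(weights < -delta, -1.0, 0.0))

def binarize_activations(values: np.ndarray) -> np.ndarray:
    """Function outputs are binarized by comparing with 0."""
    return np.where(values > 0, 1.0, -1.0)

w = np.array([0.8, -0.05, 0.02, -0.6, 0.3])
print(ternarize(w))   # -> [ 1.  0.  0. -1.  1.]
```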
5 Empirical Evaluation

We evaluate the efficiency of our method in two ways. We show how using ternary neural networks on fixed model architectures, as used in prior work, can reduce the overhead of secure inference on neural networks. This is due to the sparsity of such models. We also present the performance of SOTERIA architectures, in which model complexity is optimized along with model accuracy.

5.1 Experimental Setup

We evaluate our work on the MNIST and CIFAR10 image classification datasets, as they have been extensively used in the literature to evaluate the performance of cryptographically secure neural network schemes. We run our experiments on an AWS c5.2xlarge instance, running Ubuntu 18.04 LTS on an Intel Xeon 8124M at 3.0 GHz.

We use PyTorch 1.3 [1], a Python-based deep learning framework, to implement our architecture search algorithm and train the ternary models. We use Synopsys Design Compiler [2], version L-2016.03-SP5-2, to synthesize SystemVerilog code into the gate-level netlist. Our synthesis uses the TinyGarble gate library infrastructure.¹

We execute the garbled circuit protocol on the boolean circuit generated as described in the previous sections. We compute the number of non-XOR gates in the generated boolean circuit netlist as a measure of its complexity. We measure the exact performance of SOTERIA as its runtime during the offline and online phases of the protocol, and its communication cost.

We present the experimental setup of prior work and SOTERIA, including their CPU specifications and links to available software code, in Table 2. We also present the details of all neural network architectures which we evaluate in this paper in Table 8 in Appendix A.

¹ We use TinyGarble at commit 21ecca7cb75b33fd7508771fd35f03657dd44e5e on the master branch of https://github.com/esonghori/TinyGarble.

5.2 Ternary Neural Networks

As discussed in Section 4.2, we use ternary neural networks (TNNs) instead of binary networks as they provide significant performance gains with GC without any post-processing (i.e., the model is trained to be sparse). We perform two small experiments to illustrate the benefit of the sparsity (fraction of parameters with weight 0) of ternary models. Further, we analyze the effect of the scale of the network on the tradeoff between model accuracy and the performance of private inference.

Sparsity. Figure 1 shows the number of non-XOR gates for a toy example: a ternary neural network with a 4-kernel 3 × 3 convolution operation taking an input of size 32 × 32 × 3 with padding of 1. We randomly set a fraction of parameters to zero to manually control the sparsity of the model. A BNN model is equivalent to the case where the sparsity is 0. We observe that as the sparsity increases, the number of non-XOR gates decreases by almost the same factor. This can result in reducing both the communication overhead and the inference runtime, as we will see when training large models.
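The sparsity sweep of Figure 1 can be emulated with a few lines of code: a fraction of the ±1 parameters is zeroed out at random before the circuit is generated. The sketch below only reproduces this parameter-masking step (the gate counting itself happens on the synthesized netlist and is not shown); the array shapes match the toy convolution described above, but the code is an illustration, not the measurement pipeline used for Figure 1.

```python
import numpy as np

def randomly_sparsify(weights: np.ndarray, sparsity: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Zero out a random fraction `sparsity` of the parameters."""
    mask = rng.random(weights.shape) >= sparsity   # keep with prob. 1 - sparsity
    return weights * mask

rng = np.random.default_rng(0)
# 4 kernels of size 3x3 over 3 input channels, with +/-1 values,
# matching the toy example of Figure 1 (input 32x32x3, padding 1).
kernels = np.where(rng.random((4, 3, 3, 3)) < 0.5, -1.0, 1.0)
for s in (0.0, 0.1, 0.2, 0.3):
    w = randomly_sparsify(kernels, s, rng)
    print(f"target sparsity {s:.1f} -> realized {np.mean(w == 0):.2f}")
```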

Table 2: Overview of the existing private inference methods, including the cryptographic schemes used, precision of neural networks supported, number of parties involved, evaluation setup configuration, and availability of code.

Prior Work | Cryptographic Scheme | Model Precision | Parties | CPU (Single Thread)^f | CPU Mark | Relative CPU Mark | Code
MiniONN [28] | Additively HE, GC | Full | 2 | Server: Intel Core i5, 4 × 3.30 GHz cores; Client: Intel Core i5, 4 × 3.20 GHz cores | 1,686-2,300 | 0.61-0.83 | Available^a
EzPC [7] | GC, Additive SS | Full | 2 | Intel Xeon E5-2673 v3, 2.40 GHz | 1,723 | 0.62 | Available^b
DeepSecure [37] | GC | 16-bit fixed-point | 2 | Intel Core i7-2600, 3.40 GHz | 1,737 | 0.63 | -
SecureML [30] | Linear HE, GC | Full | 2 | AWS c4.8xlarge (Intel Xeon E5-2666 v3, 2.90 GHz) | 1,918 | 0.69 | -
Gazelle [24] | Additively HE, GC | Full | 2 | AWS c4.xlarge (Intel Xeon E5-2666 v3, 2.90 GHz) | 1,918 | 0.69 | -
Delphi [29] | Additively HE, GC | Full | 2 | AWS c5.2xlarge (Intel Xeon 8000 series, 3.0 GHz) | 2,082 | 0.75 | Available^c
SOTERIA | GC | Ternary | 2 | AWS c5.2xlarge (Intel Xeon 8124M, 3.0 GHz) | 2,082 | 0.75 | Available^d
Chameleon [36] | GC, GMW, Additive SS | Full | 3^e | Intel Core i7-4790, 3.60 GHz | 2,266 | 0.82 | -
XONN [35] | GC | Binary | 2 | Intel Core i7-7700k, 4.5 GHz | 2,777 | 1.0 | -

^a MiniONN: https://github.com/SSGAalto/minionn
^b EzPC: https://github.com/mpc-msri/EzPC
^c Delphi: https://github.com/mc2-project/delphi
^d SOTERIA: https://github.com/privacytrustlab/soteria_private_nn_inference
^e Chameleon only uses the third party in the pre-processing stage.
^f CPU Mark: The configurations are listed with their single-threaded CPU Mark scores as reported by cpubenchmark.net/singleThread.html. These single-thread benchmarks test processors on a variety of tasks, from floating point operations to string sorting and data compression (https://www.cpubenchmark.net/cpu_test_info.html), to provide an estimate of the capabilities of a processor. As microarchitectural optimizations vary from processor to processor, frequency alone cannot be used as a performance metric. The absolute scores and relative scores (compared to the highest scoring CPU in the table) for the CPUs used in the evaluation of related work are reported.

Note that while training a TNN, we cannot control the sparsity of the network. In Table 4, we show the result of training binary and ternary models on the MNIST dataset with model architecture m3. The table reports the GC costs of the components of the network for both BNNs and TNNs. We can observe that the ternary model has a significant sparsity (about 0.3). This results in constructing smaller circuits for the model, which reduces the inference cost of using the GC protocol over ternary neural networks. This is reflected in the smaller number of non-XOR gates in the ternary circuits constructed for the convolution and fully connected operations. The costs for the maxpool operation remain the same, as it does not have any learnable parameters.

Scale. As discussed in Section 3, we need to scale the network operations to achieve a higher capacity for binary and ternary models and obtain better accuracies. Figure 2 demonstrates the impact of the scale of the network on the accuracy of the model and its GC runtime for private inference. Although inference time increases linearly with the scaling factor, accuracy improves only up to a certain extent (around a scaling factor of 1) and then becomes almost constant. This indicates that we can select a sweet spot for the scaling factor, thereby optimizing for both accuracy and performance.
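As a rough illustration of this width scaling, the sketch below multiplies the number of convolution kernels and fully-connected nodes of a layer list by a scaling factor. The layer-dictionary format and the m1-like example stack are hypothetical (the actual m1-m6 architectures are listed in Table 8 in Appendix A), and keeping the output layer at its original size is our assumption; this is not the code used to produce Table 3.

```python
def scale_architecture(layers: list, factor: float) -> list:
    """Widen a BNN/TNN architecture: multiply the number of convolution
    kernels and fully-connected nodes by `factor` (rounded), leaving
    pooling layers and the final output layer untouched."""
    scaled = []
    for layer in layers:
        layer = dict(layer)
        if layer["type"] == "conv":
            layer["kernels"] = max(1, round(layer["kernels"] * factor))
        elif layer["type"] == "fc" and not layer.get("is_output", False):
            layer["nodes"] = max(1, round(layer["nodes"] * factor))
        scaled.append(layer)
    return scaled

# A hypothetical m1-like stack, just to show the effect of the scaling factor.
m1_like = [{"type": "conv", "kernels": 16, "window": 5},
           {"type": "maxpool", "window": 2},
           {"type": "fc", "nodes": 100},
           {"type": "fc", "nodes": 10, "is_output": True}]
print(scale_architecture(m1_like, 3.0))
```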

[Figure 1: plot of the number of non-XOR gates (×10^3) versus sparsity (fraction of 0-weight parameters).] Figure 1: Sparsity of a model versus its circuit complexity. We measure sparsity as the fraction of model parameters with 0 weight. We quantify circuit complexity as the number of non-XOR gates. The numbers are computed on a ternary neural network with a 3 × 3 convolution operation with 4 kernels and a 32 × 32 × 3 input. For this experiment, we assign 0 weights to a random set of parameters to get different levels of sparsity and the corresponding number of non-XOR gates.

[Figure 2: plot of test accuracy and inference runtime (ms) versus scaling factor.] Figure 2: Test accuracy versus inference runtime for a ternary neural network trained on a fixed MNIST (m1) architecture, at various scales. See Appendix A for a description of the model.

Table 3: Number of parameters and corresponding sparsity and trained-model test accuracy for the MNIST (m1) architecture with various levels of scaling factor. The table reports the statistics of the experiment in Figure 2.

Scaling Factor | Total no. of Parameters | No. of 0-weights | Sparsity | Accuracy
0.25 | 26,432 | 5,946 | 0.22 | 0.9063
0.50 | 54,912 | 13,058 | 0.24 | 0.9413
0.75 | 85,440 | 18,703 | 0.22 | 0.9544
1.00 | 118,016 | 25,317 | 0.21 | 0.9579
1.50 | 189,312 | 43,428 | 0.23 | 0.9618
2.00 | 268,800 | 58,040 | 0.22 | 0.9658
2.50 | 356,480 | 83,451 | 0.23 | 0.9676
3.00 | 452,352 | 107,343 | 0.24 | 0.9712

Table 3 shows how the scaling factor affects the number of parameters in the network. As the scaling factor in TNNs increases, the accuracy increases, however with diminishing returns after a certain limit. The inference cost of the circuit also increases, as is evident from the growth of runtime with the scaling factor. Note that the sparsity is about 0.24 at scaling factor 3 for the ternary neural network, which means that the effective size of the model (and hence its performance cost) remains comparable to a binary neural network without any scaling, albeit with better accuracy. As a reference, the test accuracy of a BNN model with the same architecture is 0.9514.

Comparison with prior work. Table 5 reports the results of our experiments compared to prior work. In this subsection, we present the outcome of basic SOTERIA on fixed model architectures used in the literature, thus only discussing the effect of the sparsity of ternary neural networks on the tradeoff between accuracy and performance costs. We use three different architectures for each of the two datasets, which have been used in existing work: m1-m3 are used with the MNIST dataset, while m4-m6 are used with the CIFAR10 dataset. See Appendix A for the descriptions of the model architectures. For a fair comparison, we use the same scaling factors for our networks as used by XONN [35], which is the only other comparable work with quantized (binary) weights and inputs.

We observe that for the MNIST dataset, the basic TNN SOTERIA models (m1-m3) provide better runtime and communication performance on average than prior work, with a maximum drop in accuracy of only 0.016 (for model m3). This shows that SOTERIA is useful in designing custom models that provide optimal performance guarantees while retaining high prediction accuracy. For the CIFAR10 dataset, we observe that for the models used in prior work (m4 to m6), our basic TNN models exhibit a slightly higher drop in accuracy of 0.08, but provide a computation and communication gain, on average, compared to prior work. Overall, our results show that SOTERIA provides a flexible approach for training private models given constraints on the performance and accuracy of the model.

Table 4: Performance of private inference on garbled circuits for models with binary versus ternary parameters. We show the independent offline and online runtimes, communication costs, and number of non-XOR gates for three different types of operations. The operations are taken from the MNIST (m3) network, trained in both binary (BNN) and ternary (TNN) configurations using a scaling factor of 1.

Operation | Type | Offline runtime (ms) BNN / TNN | Online runtime (ms) BNN / TNN | Total runtime (ms) BNN / TNN | Communication size (KB) BNN / TNN | Non-XOR gates BNN / TNN | TNN Sparsity
Convolution | Input: 12 × 12 × 16, Padding: 0, Window: 5 × 5, Kernels: 16 | 39.16 / 28.27 | 52.87 / 38.17 | 92.03 / 66.43 | 2,572 / 1,876 | 589,824 / 425,558 | 0.27
Maxpool | Input: 8 × 8 × 16, Window: 2 × 2 | 0.35 | 0.57 | 0.92 | 34 | 768 | N.A. (identical for BNN and TNN)
Fully Connected | Input: 100, Nodes: 10 | 0.49 / 0.34 | 0.83 / 0.60 | 1.32 / 0.94 | 68 / 47 | 3,150 / 2,246 | 0.35

5.3 Architecture Search

We present the details of our empirical analysis of SOTERIA. In particular, we evaluate the cost function we use in the architecture search algorithm, the effect of the performance regularization during the search, and the effect of the model size on the tradeoff between accuracy and inference runtime, and we compare SOTERIA with prior work.

Implementation. To handle the conversion of the model into a digital circuit supported by TinyGarble, we first build a representation of the model and its parameters in SystemVerilog, and then synthesize and optimize our circuit (using Synopsys Design Compiler) to use circuit elements supported by TinyGarble. In this first step, we designed a collection of parameterized components (notably dot product and maxpool) to use as building blocks in our architecture search algorithm. Each component is flexibly designed to efficiently accept inputs and outputs of arbitrary size, and is composed to form the complete model. In a general setting, hardware-level code is typically straightforward to generate. However, to enable TNNs with SOTERIA, we have to dynamically define the sparsity of the modules depending on the results of model training and architecture search. Along with the parameter data, we define the sparsity information, which is used during generate phases in SystemVerilog to build the sparse network in hardware (taking advantage of the 0-valued parameters of the model). Altogether, this allows us to build and evaluate the models constructed by SOTERIA.

Cost function for regularized architecture search. As presented in Section 4, our algorithm searches for models that are not only accurate but also efficient with respect to the costs of using garbled circuits on the model architecture. For this, we modify the score value that the DARTS architecture search algorithm gives to each operation (e.g., maxpool, or convolution with different dimensions) with a regularized penalty factor proportional to the performance cost of the operation. Table 6 presents the communication and runtime cost of each operation that we use in our algorithm. The penalty factor is computed as the average of the relative communication cost and relative runtime cost of each operation, with respect to the most costly operation (CONV5 × 5). We use this penalty factor in the experiments.
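The penalty column of Table 6 follows directly from this rule. The sketch below recomputes it from the runtime and communication columns; it is a small consistency check written as an illustration, not part of the SOTERIA implementation.

```python
# Reproducing the penalty factors of Table 6 from the measured per-operation
# costs: the penalty is the average of the runtime and communication costs,
# each taken relative to the most expensive operation (CONV5x5).
COSTS = {                     # (runtime in ms, communication in KB), from Table 6
    "CONV5x5":    (55.40, 7942),
    "CONV3x3":    (23.10, 3190),
    "MAXPOOL2x2": (3.23,   145),
    "IDENTITY":   (0.00,     0),
}

def penalty_factors(costs: dict, reference: str = "CONV5x5") -> dict:
    ref_rt, ref_comm = costs[reference]
    return {op: round(0.5 * (rt / ref_rt + comm / ref_comm), 2)
            for op, (rt, comm) in costs.items()}

print(penalty_factors(COSTS))
# {'CONV5x5': 1.0, 'CONV3x3': 0.41, 'MAXPOOL2x2': 0.04, 'IDENTITY': 0.0}
```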

Balancing accuracy and inference costs. For the architecture search in SOTERIA, we balance accuracy and inference cost over the GC protocol using a regularization factor λ. With λ = 1 the importance of the penalty factor is maximal, and λ = 0 represents the case where we ignore the performance cost. We execute the search process with different values of λ. We constrain the search to finding a 3-cell architecture for the CIFAR10 dataset and a 1-cell architecture for the MNIST dataset, with each cell having 4 operations in sequence. We run the architecture search algorithm for 100 epochs, and subsequently train the obtained architectures for 200 epochs.

Figure 3 presents the trade-off between the test accuracy of the optimal architectures and their inference runtime. Table 7 provides statistics on the number of model parameters and the model sparsity for different values of λ for the MNIST dataset. As λ increases, cheaper operations that have fewer trainable parameters are chosen by the search process, which improves the inference runtime at the expense of model accuracy. As we observe, λ = 0.6 can provide a reasonable balance between accuracy and the inference cost. This is where the search algorithm identifies cheaper operations that collectively can result in accuracy equivalent to what can be achieved using more expensive operations. It is very important to note that selecting λ depends on how much cost or accuracy drop we can tolerate in a given setting. Thus, SOTERIA enables adapting private inference to the specific requirements and limitations of a system.

Table 5: Performance analysis of existing secure schemes for private neural network inference. We compare SOTERIA constructed on fixed model architectures with ternary parameters, as well as optimal architectures constructed by SOTERIA, with the prior work. We provide the descriptions of the model architectures in Appendix A. We use the same scaling factor for SOTERIA and XONN for fixed model architectures (1.75 for m1, 4.0 for m2, 2.0 for m3, 2.0 for m4, 3.0 for m5, 2.0 for m6), and use 3.0 for MNIST (SOTERIA) and CIFAR10 (SOTERIA) for the models constructed by our architecture search algorithm.

Model | Secure Scheme | Offline Runtime (s) | Online Runtime (s) | Total Runtime (s) | Communication (MB) | Test Accuracy
MNIST (m1) | SecureML [30] | 4.7 | 0.18 | 4.88 | -^d | 0.931
MNIST (m1) | MiniONN [28] | 0.9 | 0.14 | 1.04 | 15.8 | 0.976
MNIST (m1) | EzPC [7] | - | -^c | 0.7 | 76 | 0.976
MNIST (m1) | Gazelle [24] | 0 | 0.03 | 0.03 | 0.5 | 0.976
MNIST (m1) | XONN [35] | - | -^c | 0.13^b | 4.29 | 0.976^a (0.9591)
MNIST (m1) | SOTERIA (TNN) | 0.04 | 0.03 | 0.07 | 3.72 | 0.9642
MNIST (m2) | DeepSecure [37] | 7.69 | 1.98 | 9.67 | 791 | 0.9895
MNIST (m2) | MiniONN | 0.88 | 0.4 | 1.28 | 47.6 | 0.9895
MNIST (m2) | EzPC | - | -^c | 0.6 | 70 | 0.990
MNIST (m2) | Gazelle | 0.15 | 0.05 | 0.20 | 8.0 | 0.990
MNIST (m2) | XONN | - | -^c | 0.16^b | 38.28 | 0.9864^a (0.9718)
MNIST (m2) | SOTERIA (TNN) | 0.08 | 0.06 | 0.14 | 30.68 | 0.9733
MNIST (m3) | MiniONN | 3.58 | 5.74 | 9.32 | 657.5 | 0.990
MNIST (m3) | EzPC | - | -^c | 5.1 | 501 | 0.990
MNIST (m3) | Gazelle | 0.48 | 0.33 | 0.81 | 70 | 0.990
MNIST (m3) | XONN | - | -^c | 0.15^b | 32.13 | 0.990^a (0.9672)
MNIST (m3) | SOTERIA (TNN) | 0.08 | 0.07 | 0.15 | 26.04 | 0.9740
MNIST (SOTERIA) | SOTERIA (λ = 0.6) | 0.09 | 0.08 | 0.17 | 36.24 | 0.9883
MNIST (SOTERIA) | SOTERIA (λ = 0) | 0.016 | 0.018 | 0.034 | 95.18 | 0.9811
CIFAR10 (m4) | XONN | - | -^c | 15.07^b | 4980 | 0.80^a (0.7197)
CIFAR10 (m4) | SOTERIA (TNN) | 8.56 | 6.14 | 14.70 | 936.1 | 0.7314
CIFAR10 (m5) | MiniONN | 472 | 72 | 544 | 9272 | 0.8161
CIFAR10 (m5) | EzPC | - | -^c | 265.6 | 40683 | 0.8161
CIFAR10 (m5) | Gazelle | 9.34 | 3.56 | 12.9 | 1236 | 0.8161
CIFAR10 (m5) | Delphi^e | 45 | 1 | 46 | 200 | 0.85
CIFAR10 (m5) | XONN | - | -^c | 5.79^b | 2599 | 0.8185^a (0.7266)
CIFAR10 (m5) | SOTERIA (TNN) | 3.48 | 2.95 | 6.43 | 461.3 | 0.7252
CIFAR10 (m6) | XONN | - | -^c | 16.09^b | 5320 | 0.83^a (0.7341)
CIFAR10 (m6) | SOTERIA (TNN) | 9.01 | 6.63 | 15.64 | 982.7 | 0.7396
CIFAR10 (SOTERIA) | SOTERIA (λ = 0.6) | 3.69 | 3.26 | 6.95 | 497.6 | 0.7384
CIFAR10 (SOTERIA) | SOTERIA (λ = 0) | 4.01 | 3.53 | 7.54 | 561.2 | 0.7211

^a We could not reproduce the test accuracies for XONN. We report the results that we obtained on the same model architectures with the same settings as in the respective paper, in parentheses.
^b XONN's runtime reported by the authors is measured on a high-performance Intel processor, which is faster than the ones used by all other methods.
^c The breakdown of runtime cost into offline and online runtime is not reported by the authors.
^d The communication cost is not reported by the authors.
^e The runtime, communication cost and test accuracy are visual estimates from the graphs reported by the authors.

Table 6: Runtime and communication cost of each operation based on its garbled circuit inference. We also calculate the performance penalty factor (which we use in our regularized architecture search algorithm) as the average of the relative costs of each unit with respect to the most expensive operation.

Operation | Runtime (ms) | Comm. (KB) | Penalty factor
CONV5 × 5 | 55.40 | 7942 | 1.00
CONV3 × 3 | 23.10 | 3190 | 0.41
MAXPOOL2 × 2 | 3.23 | 145 | 0.04
IDENTITY | 0.00 | 0.00 | 0.00

Table 7: Number of parameters and corresponding sparsity and test accuracy for different levels of λ for the SOTERIA architecture search over the MNIST dataset with a 1-cell architecture having 4 sequential operations. The setting is the same as the one illustrated in Figure 3(b). The scaling factor is 3. For larger λ, the algorithm penalizes constructing large models.

λ | Total no. of Parameters | No. of 0-weights | Sparsity | Accuracy
1.0 | 133,032 | 41,212 | 0.31 | 0.8892
0.8 | 729,768 | 200,540 | 0.27 | 0.9721
0.6 | 2,904,168 | 1,080,217 | 0.37 | 0.9883
0.4 | 2,904,168 | 1,080,217 | 0.37 | 0.9883
0.2 | 2,941,032 | 895,034 | 0.30 | 0.9887
0.0 | 11,466,600 | 5,116,030 | 0.45 | 0.9811

[Figure 3: two scatter plots of test accuracy versus inference runtime (s), with points labeled by λ in {0, 0.2, 0.4, 0.6, 0.8, 1.0}; panel (a) CIFAR10, panel (b) MNIST.] Figure 3: Inference runtime versus test accuracy of a garbled ternary model, constructed with the SOTERIA architecture search algorithm, for various values of the circuit cost regularization λ. We obtain the architecture for a neural network with (a) 3 cells for the CIFAR10 dataset, and (b) 1 cell for the MNIST dataset. All experiments use a scaling factor of 3.

Finding the optimal depth for the model (number of cells). The number of cells in the final architecture is a manually set hyperparameter that practically defines the depth of the model architecture. We perform an experiment on the CIFAR10 dataset, using cells with 4 operations in sequence and λ = 0.6. We run the search process for 100 epochs, and train each resulting architecture for 200 epochs. Figure 4 presents the results of executing the search with different numbers of cells, illustrating the model accuracies along with their inference runtimes.

We can observe that for a single-cell architecture, the accuracy levels are low, as the network cannot process the features required to perform a generalizable classification. The test accuracy peaks for the 3-cell architecture, suggesting that it has enough operations to process the required features of the inputs. In parallel, we can see that as the number of cells increases, the runtime also increases.

Comparison with prior work. Table 5 reports the results for SOTERIA-trained models that are optimized for both accuracy and efficiency. For the MNIST dataset, we observe that our model with λ = 0 gives the best model compared to prior work, balancing runtime performance (0.034 s) and an accuracy of 0.9811. Increasing λ to 0.6 increases the accuracy by a small amount. Similarly, for the CIFAR10 dataset, we observe that a SOTERIA-trained model with λ = 0.6 gives better accuracy than prior work, compared to the reproduced numbers reported in parentheses in Table 5. In addition, SOTERIA models provide acceptable runtime performance and communication overhead that outperforms the results from prior work. Our evaluation on the MNIST and CIFAR10 datasets confirms that SOTERIA is effective in training models that are customized to perform well for both performance and accuracy.

[Figure 4: plot of test accuracy and inference runtime (s) versus model depth (number of cells, 1 to 4).] Figure 4: Impact of the model's depth (the number of cells in the SOTERIA architecture search algorithm) on the test accuracy and inference runtime of a garbled ternary model on CIFAR10. The regularization term λ is 0.6. The scaling factor is 3.

6 Related Work

There have been several approaches that introduce new techniques for secure machine learning, or build upon existing techniques by trying to optimize bottlenecks.

Homomorphic Encryption. In CryptoNets [12, 44], the authors modify the neural network operations by using the square function as an activation and average pooling instead of maxpool, reducing the non-linear functions to low-degree polynomials in order to control the noise. Similar approaches of using homomorphic encryption on data and optimizing the machine learning operations to limit the noise have been explored extensively [4, 5, 18].

Hesamifard et al. [20] explore using homomorphically encrypted data for training neural networks. CryptoDL [19] explores various activation functions with low polynomial degree that work well with homomorphically encrypted data, and proposes an activation using the derivative of the ReLU function. However, using homomorphic encryption adds an additional computational overhead, and most of the non-linear activations cannot be computed effectively, which results in a degradation in the reliability of the deep learning systems.

Secure Multiparty Computation. Secure multiparty computation requires a very low computation overhead but requires extensive communication between the parties. It has been used for several machine learning operations. DeepSecure [37] uses only GC to compute all the operations in the neural network. It relies on pre-processing of the data by reducing its dimensions to improve performance, and is implemented on the TinyGarble [40] library. Chameleon [36] uses a combination of arithmetic sharing, garbled circuits and boolean sharing to compute the neural networks for secure inference. It relies on a third-party server to perform computation in the offline phase, resulting in better performance than the previous work.

XONN [35] leverages binary neural networks with GC. Binarization dramatically reduces the inference latency of the network compared to other frameworks that utilize full-precision weights and inputs, as it converts matrix multiplications into simple XNOR-popcounts. XONN also uses the TinyGarble library to implement the boolean circuits for GC. Prio [8] uses a secret-sharing-based protocol [38] to compute aggregate statistics over private data from multiple sources. It deploys a secret-shared non-interactive zero-knowledge proof mechanism to verify whether the data sent by clients is well-formed, and then decodes summed encodings of the clients' data to generate the aggregate statistic. The authors extend the application of Prio to foundational machine learning techniques such as least squares regression.

Hybrid Schemes. A judicious combination of homomorphic encryption and multiparty computation protocols has been shown to give additional benefits in terms of runtime and communication costs. Gazelle [24] uses lattice-based packed additive homomorphic encryption to compute dot products and convolutions, but relies on garbled circuits for implementing non-linear operations like maxpool and ReLU. It reduces the overall bandwidth by packing ciphertexts and re-encrypting to refresh the noise budget. Delphi [29] builds upon this work and uses architecture search to select optimal positions for replacing the expensive ReLU activation function with a quadratic approximation, with minimal loss in accuracy.

MiniONN [28] pre-computes multiplication triplets using homomorphic encryption for the GMW protocol, followed by the SPDZ [9, 10] protocol. The multiplication triplets are exchanged securely using additive homomorphic encryption such as Paillier or DGK. SecureML [30] uses garbled circuits and additive homomorphic encryption to speed up some neural network operations. However, the conversion costs between homomorphic encryption and garbled circuits are expensive. Most of the previous work has relied heavily on optimizing complex cryptographic operations to work well with the neural networks; we show instead that it is possible to optimize the neural network itself to obtain efficient privacy-preserving neural network architectures.

Trusted Computing. Some research uses trusted processors, assuming that the underlying hardware is trustworthy, and outsources all the machine learning computations to the trusted hardware. Chiron [22] is a training system for privacy-preserving machine learning as a service which conceals the training data from the operator. It uses Intel Software Guard Extensions (SGX), runs the standard ML training in an enclave, and confines it in a Ryoan sandbox [23] to prevent it from leaking the training data outside the enclave.

Ohrimenko et al. [31] propose a solution for secure multi-party ML by using trusted Intel SGX-enabled processors and oblivious protocols between client and server where the inputs and outputs are blinded. However, the memory of enclaves is limited, and it is difficult to process memory- and computationally-intensive operations like matrix multiplication in the enclaves with parallelism. To address this, Slalom [42] provides a methodology to outsource the matrix multiplication to a faster untrusted processor and verify the computation.

7 Conclusions

We introduce SOTERIA, a system that takes advantage of the power of neural architecture search algorithms in order to design model architectures which jointly optimize accuracy and efficiency for private inference. We use garbled circuits (GC) as our cryptographic primitive, due to their flexibility. However, other secure multi-party computation schemes can also be used to enrich the set of secure operations that could be chosen by the architecture search algorithm. Instead of model post-processing, we also enable the stochastic gradient descent algorithm to train a sparse model (setting some parameters to 0), hence further improving the efficiency of the trained model. To this end, we train ternary neural networks, which have been shown to have significant potential in learning reasonably accurate models. We construct optimal architectures that balance accuracy and inference efficiency on GC on the MNIST and CIFAR10 datasets. As opposed to prior work that builds cryptographic schemes around given fixed models, SOTERIA provides a flexible solution that can be adapted to the accuracy and performance requirements of any given system, and enables trading off between these requirements.

Reproducibility

The code for our work is available at https://github.com/privacytrustlab/soteria_private_nn_inference.

References

[1] PyTorch 1.3. https://pytorch.org/.

[2] Synopsys Design Compiler, version L-2016.03-SP5-2. https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/dc-ultra.html.

[3] Donald Beaver. Efficient multiparty protocols using circuit randomization. In Joan Feigenbaum, editor, Advances in Cryptology – CRYPTO '91, pages 420-432, Berlin, Heidelberg, 1992. Springer Berlin Heidelberg.

[4] Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. Machine learning classification over encrypted data. Cryptology ePrint Archive, Report 2014/331, 2014. https://eprint.iacr.org/2014/331.

[5] Florian Bourse, Michele Minelli, Matthias Minihold, and Pascal Paillier. Fast homomorphic evaluation of deep discretized neural networks. Cryptology ePrint Archive, Report 2017/1114, 2017. https://eprint.iacr.org/2017/1114.

[6] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. Fully homomorphic encryption without bootstrapping. Cryptology ePrint Archive, Report 2011/277, 2011. https://eprint.iacr.org/2011/277.

[7] Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma, and Shardul Tripathi. EzPC: Programmable, efficient, and scalable secure two-party computation for machine learning. In IEEE European Symposium on Security and Privacy, February 2019.

[8] Henry Corrigan-Gibbs and Dan Boneh. Prio: Private, robust, and scalable computation of aggregate statistics. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 259-282, Boston, MA, March 2017. USENIX Association.

[9] I. Damgard, V. Pastro, N. P. Smart, and S. Zakarias. Multiparty computation from somewhat homomorphic encryption. Cryptology ePrint Archive, Report 2011/535, 2011. https://eprint.iacr.org/2011/535.

[10] Ivan Damgard, Marcel Keller, Enrique Larraia, Valerio Pastro, Peter Scholl, and Nigel P. Smart. Practical covertly secure MPC for dishonest majority, or: Breaking the SPDZ limits. Cryptology ePrint Archive, Report 2012/642, 2012. https://eprint.iacr.org/2012/642.

[11] Daniel Demmler, Thomas Schneider, and Michael Zohner. ABY - A framework for efficient mixed-protocol secure two-party computation. In 22nd Annual Network and Distributed System Security Symposium, NDSS 2015, San Diego, California, USA, February 8-11, 2015. The Internet Society, 2015.

[12] Nathan Dowlin, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on Machine Learning, ICML'16, pages 201-210. JMLR.org, 2016.

[13] Taher ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. In George Robert Blakley and David Chaum, editors, Advances in Cryptology, pages 10-18, Berlin, Heidelberg, 1985. Springer Berlin Heidelberg.

[14] Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Neural architecture search: A survey, 2018.

[15] Craig Gentry. A Fully Homomorphic Encryption Scheme. PhD thesis, Stanford, CA, USA, 2009. AAI3382729, Advisor: D. Boneh.

[16] Craig Gentry. Fully homomorphic encryption using ideal lattices. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC '09, pages 169-178, New York, NY, USA, 2009. Association for Computing Machinery.

[17] O. Goldreich, S. Micali, and A. Wigderson. How to play ANY mental game. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, STOC '87, pages 218-229, New York, NY, USA, 1987. ACM.

[18] Thore Graepel, Kristin Lauter, and Michael Naehrig. ML Confidential: Machine learning on encrypted data. Cryptology ePrint Archive, Report 2012/323, 2012. https://eprint.iacr.org/2012/323.

[19] Ehsan Hesamifard, Hassan Takabi, and Mehdi Ghasemi. CryptoDL: Deep neural networks over encrypted data, 2017.

[20] Ehsan Hesamifard, Hassan Takabi, Mehdi Ghasemi, and Catherine Jones. Privacy-preserving machine learning in cloud. In Proceedings of the 2017 Cloud Computing Security Workshop, CCSW '17, pages 39-43, New York, NY, USA, 2017. Association for Computing Machinery.

[21] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4107-4115. Curran Associates, Inc., 2016.

[22] Tyler Hunt, Congzheng Song, Reza Shokri, Vitaly Shmatikov, and Emmett Witchel. Chiron: Privacy-preserving machine learning as a service. arXiv preprint arXiv:1803.05961, 2018.

[23] Tyler Hunt, Zhiting Zhu, Yuanzhong Xu, Simon Peter, and Emmett Witchel. Ryoan: A distributed sandbox for untrusted computation on secret data. ACM Trans. Comput. Syst., 35(4), December 2018.

[24] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. GAZELLE: A low latency framework for secure neural network inference. In Proceedings of the 27th USENIX Conference on Security Symposium, SEC'18, pages 1651-1668, Berkeley, CA, USA, 2018. USENIX Association.

[25] Vladimir Kolesnikov and Thomas Schneider. Improved garbled circuit: Free XOR gates and applications. In International Colloquium on Automata, Languages, and Programming, pages 486-498. Springer, 2008.

[26] Fengfu Li, Bo Zhang, and Bin Liu. Ternary weight networks. arXiv preprint arXiv:1605.04711, 2016.

[27] Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations, 2019.

[28] Jian Liu, Mika Juuti, Yao Lu, and N. Asokan. Oblivious neural network predictions via MiniONN transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS '17, pages 619-631, New York, NY, USA, 2017. ACM.

[29] Pratyush Mishra, Ryan Lehmkuhl, Akshayaram Srinivasan, Wenting Zheng, and Raluca Ada Popa. Delphi: A cryptographic inference service for neural networks. Cryptology ePrint Archive, Report 2020/050, 2020. https://eprint.iacr.org/2020/050.

[30] P. Mohassel and Y. Zhang. SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pages 19-38, May 2017.

[31] Olga Ohrimenko, Felix Schuster, Cédric Fournet, Aastha Mehta, Sebastian Nowozin, Kapil Vaswani, and Manuel Costa. Oblivious multi-party machine learning on trusted processors. In USENIX Security Symposium, 2016.

[32] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In Jacques Stern, editor, Advances in Cryptology – EUROCRYPT '99, pages 223-238, Berlin, Heidelberg, 1999. Springer Berlin Heidelberg.

13 [33] Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, and [43] Marten van Dijk, Craig Gentry, Shai Halevi, and Vinod Jeff Dean. Efficient neural architecture search via pa- Vaikuntanathan. Fully homomorphic encryption over rameters sharing. In Jennifer Dy and Andreas Krause, the integers. In Henri Gilbert, editor, Advances in Cryp- editors, Proceedings of the 35th International Confer- tology – EUROCRYPT 2010, pages 24–43, Berlin, Hei- ence on Machine Learning, volume 80 of Proceed- delberg, 2010. Springer Berlin Heidelberg. ings of Machine Learning Research, pages 4095–4104, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. [44] Pengtao Xie, Misha Bilenko, Tom Finley, Ran Gilad- PMLR. Bachrach, Kristin E. Lauter, and Michael Naehrig. Crypto-Nets: Neural networks over encrypted data. [34] Michael O. Rabin. How to exchange secrets with oblivi- CoRR, abs/1412.6181, 2014. ous transfer. Technical Report TR-81, Aiken Computa- [45] Andrew C. Yao. Protocols for secure computations. In tion Lab, Harvard University, 1981. Proceedings of the 23rd Annual Symposium on Founda- tions of Computer Science, SFCS ’82, page 160–164, [35] M. Sadegh Riazi, Mohammad Samragh, Hao Chen, Kim USA, 1982. IEEE Computer Society. Laine, Kristin Lauter, and Farinaz Koushanfar. XONN: XNOR-based oblivious deep neural network inference. [46] Andrew C. Yao. How to generate and exchange secrets. In 28th USENIX Security Symposium (USENIX Security In 27th Annual Symposium on Foundations of Computer 19), pages 1501–1518, Santa Clara, CA, August 2019. Science (SCFS 1986), pages 162–167, Oct 1986. USENIX Association. [47] Barret Zoph and Quoc V. Le. Neural architecture search [36] M. Sadegh Riazi, Christian Weinert, Oleksandr with reinforcement learning. CoRR, abs/1611.01578, Tkachenko, Ebrahim M. Songhori, Thomas Schneider, 2016. and Farinaz Koushanfar. Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, ASIACCS ’18, pages 707–721, New York, NY, USA, 2018. ACM.

[37] B. D. Rouhani, M. S. Riazi, and F. Koushanfar. DeepSe- cure: Scalable provably-secure deep learning. In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), June 2018.

[38] Adi Shamir. How to share a secret. Commun. ACM, 22(11):612–613, Nov 1979.

[39] Reza Shokri, Marco Stronati, Congzheng Song, and Vi- taly Shmatikov. Membership inference attacks against machine learning models. In Security and Privacy (SP), 2017 IEEE Symposium on, 2017.

[40] E. M. Songhori, S. U. Hussain, A. Sadeghi, T. Schneider, and F. Koushanfar. TinyGarble: Highly Compressed and Scalable Sequential Garbled Circuits. In 2015 IEEE Symposium on Security and Privacy, pages 411–428, May 2015.

[41] Florian Tramèr, Fan Zhang, Ari Juels, Michael K Re- iter, and Thomas Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security, 2016.

[42] Florian Tramèr and Dan Boneh. Slalom: Fast, verifi- able and private execution of neural networks in trusted hardware, 2018.

A Specifications of the Model Architectures

Table 8: Model architectures used in our experiments. Models m1–m6 are taken from prior work on the MNIST and CIFAR10 datasets; the SOTERIA models were constructed with our regularized architecture search algorithm.

Each architecture is listed layer by layer. For CONV layers, the entry gives the kernel size and, in parentheses, the number of kernels; for FC layers, the parenthesized number is the number of nodes.

MNIST (m1): FC (128) → FC (128) → FC (10)

MNIST (m2): CONV 5 × 5 (5) → FC (100) → FC (10)

MNIST (m3): CONV 5 × 5 (16) → MAXPOOL 2 × 2 → CONV 5 × 5 (16) → MAXPOOL 2 × 2 → FC (100) → FC (10)

MNIST (SOTERIA, λ = 0; 1 cell, 4 operations per cell): CONV 5 × 5 (16) → CONV 5 × 5 (16) → CONV 5 × 5 (16) → CONV 5 × 5 (16) → FC (100) → FC (10)

MNIST (SOTERIA, λ = 0.6; 1 cell, 4 operations per cell): CONV 3 × 3 (16) → CONV 3 × 3 (16) → MAXPOOL 2 × 2 → CONV 5 × 5 (16) → FC (100) → FC (10)

CIFAR10 (m4): CONV 3 × 3 (64) → CONV 3 × 3 (64) → MAXPOOL 2 × 2 → CONV 3 × 3 (64) → CONV 3 × 3 (64) → MAXPOOL 2 × 2 → CONV 3 × 3 (64) → CONV 1 × 1 (64) → CONV 1 × 1 (16) → FC (10)

CIFAR10 (m5): CONV 3 × 3 (16) → CONV 3 × 3 (16) → CONV 3 × 3 (16) → MAXPOOL 2 × 2 → CONV 3 × 3 (32) → CONV 3 × 3 (32) → CONV 3 × 3 (32) → MAXPOOL 2 × 2 → CONV 3 × 3 (48) → CONV 3 × 3 (48) → CONV 3 × 3 (64) → MAXPOOL 2 × 2 → FC (10)

CIFAR10 (m6): CONV 3 × 3 (16) → CONV 3 × 3 (32) → CONV 3 × 3 (32) → MAXPOOL 2 × 2 → CONV 3 × 3 (48) → CONV 3 × 3 (64) → CONV 3 × 3 (80) → MAXPOOL 2 × 2 → CONV 3 × 3 (96) → CONV 3 × 3 (96) → CONV 3 × 3 (128) → MAXPOOL 2 × 2 → FC (10)

CIFAR10 (SOTERIA, λ = 0; 3 cells, 4 operations per cell): CONV 5 × 5 (16) → MAXPOOL 2 × 2 → CONV 5 × 5 (16) → CONV 5 × 5 (16) → CONV 5 × 5 (32) → MAXPOOL 2 × 2 → CONV 5 × 5 (32) → CONV 5 × 5 (32) → CONV 5 × 5 (64) → MAXPOOL 2 × 2 → CONV 5 × 5 (64) → CONV 5 × 5 (64) → FC (10)

CIFAR10 (SOTERIA, λ = 0.6; 3 cells, 4 operations per cell): CONV 3 × 3 (16) → CONV 3 × 3 (16) → CONV 3 × 3 (16) → MAXPOOL 2 × 2 → CONV 3 × 3 (32) → CONV 3 × 3 (32) → CONV 3 × 3 (32) → MAXPOOL 2 × 2 → CONV 3 × 3 (64) → CONV 3 × 3 (64) → CONV 3 × 3 (64) → MAXPOOL 2 × 2 → FC (10)
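As a concrete illustration, the following is a minimal PyTorch-style sketch of the MNIST (SOTERIA, λ = 0.6) row of Table 8. The framework, padding, activation function, and flatten dimension are our assumptions for the sketch; the released code may differ, and in particular SOTERIA trains ternary parameters, which this sketch omits.

# Minimal sketch (assumed PyTorch, 'same' padding, ReLU activations) of the
# MNIST (SOTERIA, lambda = 0.6) architecture from Table 8.
import torch
import torch.nn as nn

class SoteriaMNIST(nn.Module):
    def __init__(self, act=nn.ReLU):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1: CONV 3x3, 16 kernels
            act(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),  # 2: CONV 3x3, 16 kernels
            act(),
            nn.MaxPool2d(2),                              # 3: MAXPOOL 2x2 -> 14x14
            nn.Conv2d(16, 16, kernel_size=5, padding=2),  # 4: CONV 5x5, 16 kernels
            act(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 14 * 14, 100),                 # 5: FC, 100 nodes
            act(),
            nn.Linear(100, 10),                           # 6: FC, 10 nodes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

if __name__ == "__main__":
    logits = SoteriaMNIST()(torch.randn(1, 1, 28, 28))
    print(logits.shape)  # torch.Size([1, 10])

The small channel counts and early pooling in this architecture are consistent with the efficiency-regularized (λ = 0.6) search, which penalizes operations that are expensive to evaluate under garbled circuits.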

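The conclusion above notes that SOTERIA trains ternary neural networks so that some parameters are driven to exactly zero. The sketch below shows one common way to achieve this, in the style of ternary weight networks [26] with a straight-through estimator; it is an illustration under these assumptions, not necessarily the exact training procedure used in SOTERIA.

# Sketch of ternary weight quantization with a straight-through estimator,
# following the thresholding rule of Li et al. [26]. Zeroed weights can be
# pruned from the garbled circuit, reducing the number of gates.
import torch

class TernarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        delta = 0.7 * w.abs().mean()                  # threshold from [26]
        mask = (w.abs() > delta).float()              # weights kept (non-zero)
        alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
        return alpha * torch.sign(w) * mask           # values in {-alpha, 0, +alpha}

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass gradients to the latent weights.
        return grad_output

def ternarize(w):
    return TernarizeSTE.apply(w)

if __name__ == "__main__":
    w = torch.randn(16, 16, 3, 3, requires_grad=True)
    wq = ternarize(w)
    wq.sum().backward()                               # gradients flow to latent w
    print((wq == 0).float().mean().item())            # fraction of zeroed weights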