
Vectorized Secure Evaluation of Decision Forests

Raghav Malik, Vidush Singhal, Benjamin Gottfried, Milind Kulkarni
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA

arXiv:2104.09583v1 [cs.CR] 19 Apr 2021

Abstract

As the demand for machine learning–based inference increases in tandem with concerns about privacy, there is a growing recognition of the need for secure machine learning, in which secret models can be used to classify private data without the model or data being leaked. Fully Homomorphic Encryption (FHE) allows arbitrary computation to be done over encrypted data, providing an attractive approach to providing such secure inference. While such computation is often orders of magnitude slower than its plaintext counterpart, the ability of FHE cryptosystems to do ciphertext packing—that is, encrypting an entire vector of plaintexts such that operations are evaluated elementwise on the vector—helps ameliorate this overhead, effectively creating a SIMD architecture where computation can be vectorized for more efficient evaluation. Most recent research in this area has targeted regular, easily vectorizable neural network models. Applying similar techniques to irregular ML models such as decision forests remains unexplored, due to their complex, hard-to-vectorize structures.

In this paper we present COPSE, the first system that exploits ciphertext packing to perform decision-forest inference. COPSE consists of a staging compiler that automatically restructures and compiles decision forest models down to a new set of vectorizable primitives for secure inference. We find that COPSE's compiled models outperform the state of the art across a range of decision forest models, often by more than an order of magnitude, while still scaling well.

Keywords: Homomorphic Encryption, Decision Forests, Vectorization

1 Introduction

In recent years, there has been substantial interest in secure machine learning: applications of machine learning where the "owners" of a model, input data, or even the computational resources may not be the same entity, and hence may not want to reveal information to one another. Settings where these applications are important include banks sharing financial data while complying with financial regulations, hospitals sharing patient data while adhering to HIPAA, and users offloading sensitive computation to cloud providers.

There are many ways of implementing secure machine learning, with different tradeoffs of efficiency and privacy. One popular approach, thanks to its generality, is based on fully homomorphic encryption (FHE) [11]. FHE is a cryptosystem that allows performing homomorphic addition and multiplication over asymmetrically encrypted ciphertexts, such that when encryptions of integers are homomorphically added, the resulting ciphertext is the encryption of their sum, and similarly, when the ciphertexts are homomorphically multiplied, the result is an encryption of their product. In other words, the inputs to an addition or multiplication can be encrypted, the operation can be carried out over the encrypted data, and the decrypted result will be the same as if the operation were carried out on plaintext.

FHE is attractive because it is complete: we can structure arbitrarily complex calculations as an arithmetic circuit consisting of additions and multiplications, and an entity can carry out these calculations entirely over encrypted data without ever being trusted to see the actual data. Unfortunately, homomorphic operations over ciphertexts tend to be orders of magnitude slower than their plaintext counterparts, and this problem only gets worse as the ciphertext size increases, which can happen due to a higher multiplicative depth in the arithmetic circuit or a larger security parameter (meaning more-secure encryption). As a result, most real-world applications tend to produce FHE circuits that are impractically slow to execute.

The ability of FHE cryptosystems to do ciphertext packing somewhat mitigates this problem.
Ciphertext packing refers to encrypting a vector of integers into a single ciphertext, so that operations over that ciphertext correspond to the same operations elementwise over the vector [5]. If a circuit can be expressed as computing over such packed ciphertexts, the total number of homomorphic operations decreases and it often scales better to larger inputs, both of which result in a more efficient circuit for the same application. The challenge, of course, is vectorizing arbitrary computations in this way.

Much recent work in this space has focused on securely evaluating neural network–based models. Neural nets are an attractive target for FHE because the core computations of neural nets are additions and multiplications, and in dense, feed-forward neural nets, those computations are already naturally vectorized. Recent work developed approaches that compile simple neural net specifications to optimized and vectorized FHE implementations [10].

However, neural nets are not the only type of machine learning model that can benefit from the advantages of secure computation. For many applications and data sets, especially those over categorical data, decision forests are better suited to solving the classification problem than neural nets.

Unfortunately, decision forests are inherently trickier to map to vectorized FHE than neural nets. The comparisons performed at each branch in a decision tree (e.g., "is x greater than 3?") are harder to express using the basic addition and multiplication primitives of FHE, especially if the party providing the comparison (x > 3?) is different than the party providing the feature (x).
Moreover, traditional evaluation of decision trees is sequential: "executing" a decision tree involves walking along a single path in the tree corresponding to a sequence of decisions that evaluate to true.

Recently, researchers have shown how to express the computations of a decision tree as a boolean polynomial [1, 3]. These approaches parallelize decision forests (a set of decision trees) by evaluating the polynomials of each tree independently. Nevertheless, these approaches still have limited scalability, as they evaluate each decision within a single tree sequentially, and do not exploit the ciphertext-packing, SIMD capabilities of FHE.

This paper shows how a compiler can restructure decision forest evaluation to more completely parallelize it and exploit the SIMD capabilities of FHE, providing scalable, parallel, secure evaluation of decision forests.

1.1 COPSE: Secure Evaluation of Decision Forests

The primitives that a cryptosystem like FHE provides can be thought of as an instruction set with semantics that guarantee noninterference; that is, no sensitive information can be leaked through publicly measurable outputs. One key aspect of FHE's noninterference guarantee is that it disallows branching on secret data, instead requiring that all computations be expressed as combinatorial circuits that must be fully evaluated regardless of the input. In particular, it guarantees resistance to timing side-channel attacks, in which an attacker learns some useful information about a system (such as the sequence of decisions taken at each branch in a tree) by measuring execution time or path length on various inputs.

We propose a system called COPSE that leverages these semantics to relax the control flow dependences in traditional decision forest programs, allowing us to restructure the inherently sequential process of decision tree evaluation into one that maps directly into existing vectorized, efficient FHE primitives. The vectorized evaluation strategy we present here is in contrast with the traditional polynomial-based strategy presented by Aloufi et al. [1], which we discuss in more detail in Section 2.3.3.

The restructured computation consists of four stages: a comparison step in which all the decision nodes are evaluated (in parallel), a reshaping step in which decisions are shuffled into a canonical order, a level processing step where all decisions at a particular depth of the tree are evaluated, and an aggregation step in which the results from each depth are combined into a final classification.

COPSE consists of two parts: a compiler and a runtime. The compiler translates a trained decision forest model into a C++ program containing a vector encoding the tree thresholds, and matrices that encode the branching shape. The generated C++ links against the COPSE runtime, which loads the model and provides functions to encrypt it, encrypt feature vectors, and classify encrypted feature vectors using encrypted models. The runtime uses HElib [12], which provides a low-level interface for encrypting and decrypting, and homomorphically adding and multiplying, ciphertext vectors, as well as providing basic parallelism capabilities through NTL (Number Theory Library) [20].

1.2 Summary of contributions

This paper makes the following contributions:

• A vectorizing compiler that translates decision forest models into efficient FHE operations.
• An analysis of the complexity of the generated FHE programs, showing that our approach produces low-depth FHE circuits (allowing them to be evaluated with relatively low overhead) that efficiently pack computations (allowing them to be vectorized effectively).
• A runtime environment built on HElib that encrypts compiled models and executes secure inference queries.

We generate several synthetic microbenchmarks and show that while there is a linear relationship between level processing time and the number of decisions in the model, some of the work (such as the comparison step) is done only once and takes a constant amount of time regardless of the model size. We also train our own models on several open source ML datasets and show that not only does our compiler generate FHE circuits that perform classification several times faster than previous work, COPSE can also easily scale up to much larger models by exploiting the parallelism we get "for free" from this restructuring.
While there are FHE schemes that allow for three distinct parties (the server, the model owner, and the data owner), either by using multi-key constructions or by using some protocol to compute a secret key shared between the data and model owners, neither option is currently implemented in HElib (as of version 1.1.0). Therefore, we test our benchmarks on scenarios in which there are only two real parties: either the model and data owners are the same party offloading computation, or the server also owns the model and allows the data owner to use it for inference.

2 Background

2.1 Decision Forests

Figure 1. Example decision tree

A decision tree is a classification model that assigns a class label to a vector of features by sequentially comparing the features against various thresholds. Figure 1 shows an example of a single decision tree. Inference over a decision tree is recursive. Leaf nodes of the tree correspond to class labels, whereas each interior "branch" node specifies a feature and a threshold. That feature from the vector is compared against the threshold, and depending on the result of the comparison either the left or right child of the tree is evaluated. For instance, the tree in Figure 1 uses x and y as its features and assigns class labels L0–L5. Assuming the left branch is taken when the decision is false and the right branch is taken when true, the tree assigns the input feature (x, y) = (0, 5) to the class label L4. A decision forest model consists of several decision trees over the same feature set in parallel. Inference over a decision forest usually consists of obtaining a class label from each individual tree, and then combining the labels in some way (either by averaging, or choosing the label selected by the plurality of trees, or some other domain-specific combining function).
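To make the recursive inference procedure concrete, here is a minimal plaintext C++ sketch of a decision tree evaluator. The node layout and the convention that the right child is taken when the comparison is true follow the description above; the struct and function names are our own illustrative choices, not COPSE's.

```cpp
#include <memory>
#include <string>
#include <vector>

// A leaf carries a class label; a branch compares one feature
// against a threshold.
struct Node {
    bool is_leaf = false;
    std::string label;            // valid when is_leaf
    int feature = 0;              // index into the feature vector
    long threshold = 0;           // decision: feats[feature] < threshold
    std::unique_ptr<Node> left;   // taken when the decision is false
    std::unique_ptr<Node> right;  // taken when the decision is true
};

// Sequential inference walks a single root-to-leaf path, so each
// step depends on the previous one -- the dependence COPSE removes.
std::string classify(const Node& n, const std::vector<long>& feats) {
    if (n.is_leaf) return n.label;
    bool decision = feats[n.feature] < n.threshold;
    return classify(decision ? *n.right : *n.left, feats);
}
```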

2.1.1 Vectorizing. Making inferences over a decision forest is an inherently sequential process. Each branch must be evaluated to determine which of its children to evaluate next; these serial dependences make it difficult to vectorize the evaluation of a single tree. Such dependences do not exist across the individual trees of a forest; however, the nonuniform branching and depth of each tree makes the trees unsuitable for directly packing into vectors for evaluation [19].

2.2 Fully Homomorphic Encryption

Fully Homomorphic Encryption, or FHE, is an asymmetric encryption scheme which allows computation to be carried out over ciphertexts that are encrypted under the same public key without needing to know the private decryption key¹ [11]. In other words, an evaluator can perform computation on encrypted data to produce an encrypted version of the result without ever seeing the plaintext data. Decrypting this encrypted result will yield the same result as if the entire computation was performed in plaintext. This makes FHE an attractive option for users to offload computation to servers without worrying that the computational server will get access to sensitive data.

¹Note that fully homomorphic encryption is distinguished from partially homomorphic encryption in that the latter only supports either addition or multiplication, while the former supports both.

2.2.1 Limitations. One major limitation of FHE cryptosystems is the speed. Homomorphic additions and multiplications over ciphertexts are orders of magnitude slower than equivalent operations over plaintext [10]. This makes it difficult to directly translate large computations into an arithmetic circuit to be evaluated in FHE, because such circuits often take prohibitively long to execute.

Another limitation we have to deal with in FHE systems is that every multiplication introduces some noise into the ciphertext which, when sufficiently accumulated, makes it impossible to decrypt. Hence, a homomorphic circuit needs to be designed with multiplicative depth—the maximum number of multiplications in a dependence chain—in mind to ensure that the computation result can be recovered. This is a problem partially solved by "bootstrapping", which involves homomorphically reencrypting a ciphertext to remove the noise [11]. Bootstrapping is not a perfect solution, however, as it is an expensive operation that takes a lot of time. We can attempt to support a higher multiplicative depth in the circuit before bootstrapping is necessary by increasing the security parameter²; however, this results in larger ciphertexts which are slower and more expensive to compute over. Thus, when expressing any reasonably-sized computation in FHE, we have to optimize against having a high multiplicative depth.

²The security parameter is a parameter of the encryption scheme that specifies how many "bits of security" we get by encrypting a ciphertext. In general, increasing this parameter makes the encryption stronger (harder to break, a higher maximum multiplicative depth) at the cost of making computation more expensive.
2.2.2 Advantages. FHE schemes create opportunities for securely computing functions of secret inputs between distrusting parties, as well as for offloading the processing of secure data to an untrusted server. This is useful in a machine learning domain, as it is easy to imagine use cases where one party has trained a model, and they wish to allow others to make inferences over it without revealing the structure or details of the model itself (in the case of decision forests, without revealing the thresholds). Alternatively, we see this being applicable in a setting where one party has sensitive data they wish to classify using a third-party model. In the most general scenario, and perhaps the one with widest applicability, we have one party with sensitive data and another with a sensitive model, both of which can be offloaded to an untrusted third-party server for inference without revealing details about the data or the model to the other party.

The feature of FHE schemes that we will make the most use of is the notion of ciphertext packing [4]. This refers to encrypting an entire vector of plaintext values into a single ciphertext in such a way that homomorphic additions and multiplications over the ciphertext correspond to elementwise additions and multiplications over the plaintext vector. This effectively gives us a SIMD (single instruction multiple data) architecture to target, where the vector widths are much larger than they typically are for physical SIMD architectures. This means that if we are careful about our data representation, we can leverage the ability to pack data into vectors in order to mitigate the high runtime costs of working with encrypted data.
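To picture the packed-ciphertext model, the following plaintext stand-in treats one "ciphertext" as a vector of slots; a single homomorphic Add or Multiply then acts on every slot at once. This is only a sketch of the SIMD semantics, not HElib's actual API.

```cpp
#include <cstddef>
#include <vector>

// Plaintext stand-in for a packed ciphertext: one value per slot.
using Packed = std::vector<long>;

// One homomorphic addition == elementwise addition over all slots.
Packed add(const Packed& a, const Packed& b) {
    Packed out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}

// One homomorphic multiplication == elementwise multiplication.
Packed mul(const Packed& a, const Packed& b) {
    Packed out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
    return out;
}
```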
2.2.3 Non-interference. Fundamental to the secure computation is non-interference. The contents of private data should not "leak" and produce outputs (data or behavior) observable by other parties. One vector of leakage is through conditional execution: if the result of a branch is dependent on private data, an observer may gain information by observing the resulting execution of the program (e.g., if different paths through the program take different amounts of time). A common method to prevent this leakage, enforced by approaches such as FHE, is branchless programming. Rather than evaluating a conditional and taking a branch, both paths of a conditional expression are evaluated, and some homomorphic computation is performed to produce the desired result (for instance, multiplying the return values of each path by a boolean ciphertext to select the right result).

This execution strategy seems like it presents a problem for decision forests. The standard sequential algorithm for evaluating a decision tree inherently evaluates a number of conditional branches to select the final label. The FHE evaluation strategy effectively requires us to "pad out" the execution by evaluating every branch of the tree, and only selecting the final label at the end. The key insight of our paper is that this seeming limitation affords an opportunity: parallel evaluation.

2.3 Related Work

Related work in this area falls broadly into three categories: making the inference process over decision forest models secure, vectorizing the evaluation of such models, and vectorizing secure inference over general machine learning models.

2.3.1 Securely evaluating decision trees. The two main approaches to securely performing inference on decision forest models are oblivious transfer (OT) based methods, such as the one found in Wu et al. [21], and constructing polynomial representations of the trees, as seen in Aloufi et al. [1] and Bost et al. [3]. The OT approach involves using rounds of oblivious transfer to allow the client to interactively select a path through the forest without revealing details about this path to the evaluator. The polynomial-based methods represent each tree as a boolean polynomial, where each decision node is a variable, each label node corresponds to a term in the polynomial, and the boolean decisions multiplied together in each term encode the path from the root of the tree to the label.

Wu et al. [21] use additive homomorphism to interactively compute each decision result between the server and the client. The client then decrypts these decision results and uses them as the input to a round of oblivious transfer (OT) with the server, which results in the client learning only the final class label, and the server not learning anything about the decision results. To hide the tree structure from the client, the server first adds dummy nodes to the tree and then randomly permutes the branches. This evaluation protocol relies on the model being available in plaintext to the server, which is a restriction we overcome by providing a way to represent the model as a series of ciphertexts, allowing its structure to be hidden from both the client and the server.

Bost et al. [3] and Aloufi et al. [1] structure the tree as a vector of boolean polynomials in the comparison results, each returning a single bit of the class label. For example, for a decision tree with a single branch $d_0$, and $L_0$ and $L_1$ as the true and false labels respectively, the $i$th polynomial would look like $p_i(d_0) = d_0 L_0^i + (1 - d_0) L_1^i$ (where $L_j^i$ denotes the $i$th bit of the $j$th label). The multiplications in each term are evaluated recursively in pairs to give a multiplicative depth that is logarithmic in the order of the polynomial instead of linear. Since the polynomials for each bit are over the same decision results, they are packed into SIMD slots so that each SIMD operation works over all the bits in parallel. There is no SIMD capability beyond this, as every decision node in the tree is still evaluated sequentially. This is different from our technique, which exploits SIMD parallelism between unrelated sets of decision nodes. Since the number of decision nodes in a forest is roughly exponential in the number of bits in the class labels, we expect our technique to scale better to larger models.
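For the single-branch example above, the polynomial encoding can be sketched in plaintext as follows; over encrypted data, each bit position would occupy one SIMD slot so that all bits of the label are selected in parallel. The function name and types are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Plaintext sketch of p_i(d0) = d0*L0[i] + (1 - d0)*L1[i]: selects the
// bits of the true label L0 when d0 = 1, and of the false label L1
// when d0 = 0.
std::vector<int> selectLabelBits(int d0,
                                 const std::vector<int>& L0,
                                 const std::vector<int>& L1) {
    std::vector<int> bits(L0.size());
    for (std::size_t i = 0; i < L0.size(); ++i)
        bits[i] = d0 * L0[i] + (1 - d0) * L1[i];
    return bits;
}
```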
2.3.2 Evaluating vectorized decision forests. Some work has been done in the area of vectorizing the evaluation of decision forests. Ren et al. [19] lay out each tree regularly in contiguous memory to turn the control dependence of each branch into a data dependence. They propose a protocol to evaluate a vector of decision nodes and produce a new vector of the results. Since there are no control dependences anymore, each tree in the forest can be packed into a single SIMD slot, allowing for vectorized evaluation of the entire forest. While this method results in evaluating the entire forest top-down without much extra work, it does not directly work for our case. At each step, all the current nodes are evaluated and produce the index in contiguous memory of the next node to evaluate. This requires random access to the memory where the nodes of the tree are stored, which is not possible to implement efficiently in an FHE setting.

2.3.3 Vectorized inference. Dathathri et al. [10] propose CHET, a compiler for homomorphic tensor programs. CHET analyzes input tensor programs such as neural network inference, and determines an optimal set of encryption parameters, as well as the most efficient data layout for easily vectorizing the computation. Although CHET improves performance with regular data structures like tensors, it is not built to deal with fundamentally irregular programs like decision forests. The COPSE compiler handles this irregularity by severing the dependences within each tree and producing easily vectorized structures.



Figure 2. High-level COPSE system workflow. Yellow components and data are Maurice's responsibility. Red components are Sally's. Green components are Diane's. Shaded boxes represent encrypted data.

3 Overview

This section gives a high-level overview of the vectorizable decision forest evaluation algorithm before diving into the details. We will use the decision tree in Figure 1 as a running example to illustrate these steps.

3.1 The Players

The algorithm deals with three abstract entities: the model owner (Maurice), the data owner (Diane), and the server that performs the computation (Sally). These could correspond to different physical parties who want to conceal information from each other, or multiple entities could map to the same physical party (for example, the same party could own the model and the server). Section 7 discusses the security implications of different configurations. The three entities each own different components of the system, and work together to evaluate a decision forest for a set of features.

1. Maurice owns a decision forest model for which the features and labels are public, but he wants to keep the shapes of the trees and their threshold values secret.
2. Diane owns several feature vectors that she wants to classify using the model owned by Maurice.
3. Sally owns no data but possesses computational power, and allows Maurice and Diane to offload their computations to her.

3.2 The Workflow

Figure 2 shows a high-level overview of the COPSE workflow. The COPSE system consists of two main components: a compiler used by Maurice, and a runtime used by Sally. Once Maurice has trained a decision forest model, he uses the COPSE compiler to generate an encrypted and vectorized representation that he can send to Sally, who can then accept inference queries and use the COPSE evaluation algorithm (summarized next) to make classifications. When Diane wants to make a query, she first encrypts her features and then sends them to Sally, who uses the COPSE runtime to classify them against Maurice's encrypted model. Sally then sends the encrypted classification result back to Diane, who can then decrypt it and send additional inference queries to the model if she wishes.

3.3 The Evaluation Algorithm

The multi-party evaluation algorithm proceeds in the following steps:

Step 0: Features. First, Maurice reveals the maximum multiplicity of any feature in a tree of the model to Sally, who then reveals it to Diane to enable the latter to set up an inference query. In the example in Figure 1, this is 3, which corresponds to the feature y, as it shows up in d0, d2, and d4, whereas x only shows up in d1 and d3 (and hence has a multiplicity of 2). Diane then replicates each feature in her feature vector a number of times equal to this maximum multiplicity (for instance, yielding [x, x, x, y, y, y]), encrypts the replicated vector, and sends it to Sally.

Step 1: Comparison. Maurice uses the COPSE compiler to construct a vector containing all the thresholds in the tree, grouping together thresholds from decision nodes that use the same feature. This threshold vector is padded with a sentinel value S, as shown in Figure 3a, for when a feature has less than the maximum multiplicity, so that the threshold vector, representing the decision nodes in the forest, and Diane's feature vector are in one-to-one correspondence. Once this threshold vector is constructed, Maurice sends it to Sally.
Figure 3. Illustration of the vectorized comparison step: (a) padded threshold vector from the annotated decision tree; (b) comparing the replicated feature vector to the padded threshold vector.

To perform inference, Sally pairwise compares the threshold vector to Diane's encrypted feature vector to get the results of every threshold comparison, as illustrated in Figure 3b. Each element in the decision result vector is either a sentinel or corresponds to one of the decision nodes in the original tree. Note that this thresholding happens in a single parallel step regardless of the number of branches in the tree, as both Diane's features and the decision tree thresholds are packed into vectors.
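A plaintext sketch of this step for the running example: the replicated feature vector (e.g., [x, x, x, y, y, y]) is compared slotwise against the padded threshold vector in one pass. In the real system both vectors are packed ciphertexts and the comparison is the SecComp primitive described in Section 4; the plain less-than below merely stands in for it, and the "feature < threshold" direction is our arbitrary choice.

```cpp
#include <cstddef>
#include <vector>

// One parallel pass evaluates every decision node at once. Slots that
// hold a padding sentinel produce meaningless bits; the reshuffling
// step later drops them.
std::vector<int> compareStep(const std::vector<long>& feats,       // replicated features
                             const std::vector<long>& thresholds)  // padded thresholds
{
    std::vector<int> decisions(feats.size());
    for (std::size_t i = 0; i < feats.size(); ++i)
        decisions[i] = feats[i] < thresholds[i] ? 1 : 0;
    return decisions;
}
```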

Step 2: Reordering. Once Sally produces the decision vector, as in Figure 3b, she reorders all the branch decisions so they correspond to a pre-order walk of the tree, and removes the sentinel values.

Step 3: Level Processing. Now that the decision results are in a canonical order, each level (counting up from the leaves) can be processed separately. At each level of the tree, Sally constructs a binary mask encoding, for each label, whether it is downstream of the true or false branch at that level. Figure 4 shows how Sally processes levels 1 and 2. The nodes colored in blue are the nodes at that level, while the labels colored in green correspond to a 1 in the mask and are downstream of the "false" branch, and those in red correspond to a 0 in the mask and are downstream of the "true" branch. Note that d4 is treated as part of both levels 1 and 2. This is because labels L4 and L5 are shallower than the other labels; hence, d4 is treated as though it exists at its own level as well as all lower levels.

Figure 4. Each level is processed individually: (a) processing at level 1; (b) processing at level 2.

Consider level 1. Labels L0, L2, and L4 correspond to the 'false' branch for these decisions, and labels L1, L3, and L5 correspond to the 'true' branch. The decision results corresponding to that level are picked out of the vector from Step 2 and XOR'ed with the mask to yield a boolean vector encoding whether each label could possibly be the classification result given the decision results at that level (in other words, for the ith label to be possible, the ith entry in the XOR'ed vector must be true).

Step 4: Accumulation. Finally, once these vectors are collected for each level, Sally simply multiplies them all together to get a final label mask. As illustrated in Figure 5, an entry in this final vector is true if all the corresponding entries from the XOR'ed level vectors were true; this can only be true when the corresponding label is, in fact, the result returned by the decision tree. Note that the return value of the evaluation algorithm is not a single label but rather an N-hot bitvector with a single bit turned on for each tree. Section 4 discusses the reasoning behind this design choice, and Section 7.1 addresses the privacy implications.

Figure 5. Level vectors are multiplied to yield the final result

A key point to notice about this algorithm is its inherent parallelizability, as each level can be processed entirely independently of the others. Another advantage is that all the computation at a given level can be packed into vectorized operations, as discussed in Section 2.2.2, which exposes even more algorithm-level parallelism. Finally, since all these steps are performed on encrypted data using FHE, nothing about the model is revealed to Diane or Sally aside from the maximum multiplicity, and nothing about Diane's feature vector is revealed to anybody. Section 4 formalizes and describes in greater detail the exact primitives used to carry out this algorithm, and Section 7 informally discusses how configuring the FHE primitives in different ways yields different security properties.

4 Vectorizable Evaluation Algorithm

4.1 Preliminaries

4.1.1 Definitions and Important Properties.

Decision Forest: Consider a decision forest model M consisting of trees T1, ..., TN. Each tree Ti consists of a set of branches Bi (the interior nodes) and a sequence of labels Li (the leaves). The labels in the sequence do not necessarily have to be unique.
We index all the branches in a tree by enumerating them in preorder; this indexing can be easily extended to the entire forest by not starting the count over for each new tree. The labels of the forest are similarly (separately) indexed. All the data of a decision forest except for its branching structure is encoded in three vectors: x, f, and t. Let x = (x1, ..., xn) be the set of features. Then f is the vector encoding which feature is compared against at each branch, and t is the threshold at each branch. For instance, if the branch B7 has the condition x3 < 100, then f7 = 3 and t7 = 100.

Properties of nodes: Each node in the tree has a level, which is the number of branches on the longest path from the node to a label (including itself; the level of a label node is 0); a downstream set, which is the set of all labels reachable from this node; and a width, which is the size of the downstream set. An important consequence of these definitions is that, given a level d and a label Li, there is a unique branch node Bj at level d that has Li in its downstream set. To see why this is the case, consider two distinct nodes Bj and Bk that both contain Li in their downstream set. One of the two must be an ancestor of the other, since each node has a unique parent; thus, they cannot have the same level.

Properties of models: We define the multiplicity κi of a feature xi to be the total number of times it appears in the model (in other words, κi is the number of times i appears in the vector f). In the example tree in Figure 1, κx = 2 and κy = 3. The maximum multiplicity K of a forest is the maximum multiplicity over all its features (for the example tree, K = 3).

The branching b of a model is the total number of branch nodes it has; this is equivalent to the sum of the multiplicities of each feature, which in the example is b = κx + κy = 2 + 3 = 5. The quantized branching q is the product of K and the total number of features; in other words, it is the branching if every feature had maximum multiplicity. In the example, since K = 3 and there are two features, q = 6.

4.1.2 Data Representation and Key Kernels.

Representing Non-integral Values. Rather than try to securely perform bit operations on floating point numbers, we instead represent decision thresholds as fixed-point values with the precision p known at compile time. A vector of k fixed-point values with precision p is represented with p bitvectors each of length k, with vector i holding the ith bit of each element of the original vector. This peculiar "transposed" representation makes vectorizing computations easier later, allowing us to treat each bit independently while still performing comparisons in parallel.

Integer Comparison. We use the SecComp algorithm described by Aloufi et al. [1]. Each "bit" of the values being compared is actually a bitvector packed as described above. The SecComp algorithm compares two equal-length bitstrings x and y lexicographically.

Matrix Representation. Matrices are represented as vectors of generalized diagonals. The $i$th generalized diagonal $d_i$ of an $m \times n$ matrix $A$ is a vector defined as follows:

$$d_i = (A_{0,i},\ A_{1,i+1},\ \ldots,\ A_{n-i,n},\ A_{n-i+1,0},\ \ldots,\ A_{m,(m+i) \bmod n})$$

Intuitively, this is the diagonal with an offset of i columns, wrapping around to the first column when necessary. For an m × n matrix there are always n generalized diagonals, each of which has length m.

Matrix Multiplication. The diagonal representation described above makes matrix/vector multiplication easier. To multiply an m × n matrix M by an n × 1 vector v, we use the algorithm described by Halevi and Shoup [13]. The ith diagonal of M is multiplied component-wise by the vector v rotated i slots. When m ≠ n, the widths of these two vectors will not be the same. If m > n, v is cyclically extended (e.g., [x, y, z] becomes [x, y, z, x, y, z, ...]). If n > m, v is truncated after rotating. The vectors resulting from each such product are summed. This has the advantage of having a constant multiplicative depth of 1 regardless of the size of the matrix or vector.
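Here is a plaintext sketch of the diagonal method for the square case m = n (the rectangular cases follow the extend/truncate rules just described). Diagonal i is multiplied slotwise against the vector rotated by i slots, and the partial products are summed, so only a single multiplication lies on any dependence chain.

```cpp
#include <cstddef>
#include <vector>

// Plaintext sketch of Halevi-Shoup matrix/vector multiplication for an
// n x n matrix stored as its generalized diagonals:
//   diags[i][r] = A[r][(r + i) % n]
std::vector<long> diagMatVec(const std::vector<std::vector<long>>& diags,
                             const std::vector<long>& v) {
    const std::size_t n = v.size();
    std::vector<long> out(n, 0);
    for (std::size_t i = 0; i < n; ++i)        // one diagonal per iteration
        for (std::size_t r = 0; r < n; ++r)    // slotwise product with v rotated i slots
            out[r] += diags[i][r] * v[(r + i) % n];
    // Every product involves exactly one multiplication, and products are
    // only ever summed, so the multiplicative depth is 1 for any size.
    return out;
}
```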
COPSE chooses 푏 휅 휅 for each feature, which in the example is = 푥 + 푦 = 2+3 = the former approach as one point in the tradeoff space be- 푞 퐾 5. The quantized branching is the product of the and the tween efficiency and privacy, as discussed in Section 7.2. total number of features; in other words, it is the branching if every feature had maximum multiplicity. In the example, 4.2 Algorithmic Primitives 퐾 = 푞 = since 3 and there are two features, 6. 4.2.1 Padded Threshold Vector. To carry out the com- parisons in parallel, all the decision thresholds in the forest 4.1.2 Data Representation and Key Kernels. need to be packed into a single vector that is in one-to-one Representing Non-integral Values. Rather than try to correspondence with Diane’s feature vector. This packed securely perform bit operations on floating point numbers, threshold vector is actually a sequence of 푝 bitvectors packed we instead represent decision thresholds as fixed-point val- according to the description in Section 4.1.2, where 푝 is the 푡ℎ ues with the precision 푝 known at compile-time. A vector of chosen fixedpoint precision (the 푖 bitvector contains the 푡ℎ 푘 fixed-point values with precision 푝 is represented with 푝 푖 bit of each threshold). To prevent Diane from learning bitvectors each of length 푘, with vector 푖 holding the 푖푡ℎ bit the exact structure of the decisions (i.e. which feature is of each element of the original vector. This peculiar “trans- thresholded against at each node of the forest), we group posed” representation makes vectorizing computations eas- the thresholds in the vector by the feature they correspond 푥 푥 ier later, allowing us to treat each bit independently while to (so all the 1’s go at the beginning, followed by the 2’s, still performing comparisons in parallel. and so on). Revealing some information about how many times each feature is in the forest (in other words, 휅푖 ) is, of Integer Comparison. We use the SecComp algorithm de- course, unavoidable. We limit the scope of this information scribed by Aloufi et. al [1]. Each “bit” of the values being com- leak by only revealing the maximum multiplicity 퐾 of all the pared is actually a bitvector packed as described above. The features; for any feature with fewer than 퐾 occurences, the 7 threshold vector is padded with some sentinel value 푆 until 4.3 Algorithm 퐾 its effective multiplicity is . Our implementation chooses The actual inference algorithm is implemented as vector- 푆 = 0, but the exact value does not matter as the results from ized computations using these structures. The overall flow comparisons against a sentinel are removed later anyway. is shown in Algorithm 1. First, the SecComp [1] primitive is applied to Diane’s feature vector (Feats) and the padded 4.2.2 Reshuffling Matrix. Once a boolean vector is pro- threshold vector from Maurice’s model (Thresh), which pro- duced containing the decision result for each node of the duces a boolean vector of decision results and sentinels. This forest, it must be rearranged to correspond to the order of vector is multiplied by the reshuffling matrix (Reshuf) using the branch enumeration. This also means removing the slots to produce a new boolean vector whose decision results cor- in the vector resulting from comparing against one of the respond exactly to the branches of the forest in a preorder sentinels used to pad the thresholds. In order to encode this enumeration. 
For each level of the forest, the reshuffled vec- reshuffling and sentinel removal, we construct a binary ma- tor is multiplied by the matrix for that level (Lvls) and then trix R that, when multiplied by the decision result vector, added to the mask for that level (Masks). Finally, every such produces a new vector with the results sorted correctly. The vector is multiplied together to produce a single vector with matrix R has a 1 in row 푖 and column 푗 if the 푗푡ℎ element of a slot for each leaf node in the forest (Labels). This vector is the padded threshold vector corresponds to the 푖푡ℎ branch sent back to Diane for decryption. By expressing the entire of the decision tree. This means that there is exactly one of algorithm in terms of these vector operations and matrix these in every row of R, and at most one in every column, multiplications, we are able to exploit a great degree of par- with the empty columns corresponding to the indices of the allelism, and effectively scale the secure inference process to sentinel values. larger models.

4.3 Algorithm

The actual inference algorithm is implemented as vectorized computations using these structures. The overall flow is shown in Algorithm 1. First, the SecComp [1] primitive is applied to Diane's feature vector (Feats) and the padded threshold vector from Maurice's model (Thresh), which produces a boolean vector of decision results and sentinels. This vector is multiplied by the reshuffling matrix (Reshuf) using MatMul to produce a new boolean vector whose decision results correspond exactly to the branches of the forest in a preorder enumeration. For each level of the forest, the reshuffled vector is multiplied by the matrix for that level (Lvls) and then added to the mask for that level (Masks). Finally, every such vector is multiplied together to produce a single vector with a slot for each leaf node in the forest (Labels). This vector is sent back to Diane for decryption. By expressing the entire algorithm in terms of these vector operations and matrix multiplications, we are able to exploit a great degree of parallelism, and effectively scale the secure inference process to larger models.

Algorithm 1: Algorithm for vectorized inference
    Input: Maurice: Thresh, Reshuf, Lvls, Masks
    Input: Diane: Feats
    Decisions ← SecComp(Thresh, Feats);
    Branches ← MatMul(Reshuf, Decisions);
    LvlResults ← ∅;
    forall i ← 1 to NumLevels do
        LvlDecisions ← MatMul(Lvls[i], Branches);
        LvlResults[i] ← LvlDecisions ⊕ Masks[i];
    Labels ← MultAll(LvlResults);
    Output: Labels

Our algorithm uses Aloufi et al.'s SecComp [1] and Halevi and Shoup's MatMul [13] as subroutines. The ability to express most of the computation in terms of MatMul is the key to the algorithm's vectorizability, since the MatMul routine is itself a set of parallel vector operations with constant multiplicative depth. Performing the computation for each level of the tree at once and then combining them all at the end lets us have a multiplication circuit that is only logarithmic in the forest depth, instead of the naive approach with linear depth. Section 6 discusses the complexity and multiplicative depth of both the primitives and the algorithm as a whole in more detail.
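Putting the primitives together, the following plaintext analogue of Algorithm 1 mirrors its structure with boolean vectors and 0/1 matrices. The SecComp output is taken as given; the names follow the algorithm, and everything else is an illustrative sketch rather than the COPSE runtime's actual code.

```cpp
#include <cstddef>
#include <vector>

using Bits = std::vector<int>;
using BitMatrix = std::vector<Bits>;  // row-major 0/1 matrix

// Plain 0/1 matrix-vector product over GF(2), standing in for MatMul.
Bits matMul(const BitMatrix& M, const Bits& v) {
    Bits out(M.size(), 0);
    for (std::size_t r = 0; r < M.size(); ++r)
        for (std::size_t c = 0; c < v.size(); ++c)
            out[r] ^= M[r][c] & v[c];
    return out;
}

// Plaintext analogue of Algorithm 1; 'decisions' stands for the output
// of SecComp(Thresh, Feats).
Bits inference(const Bits& decisions, const BitMatrix& reshuf,
               const std::vector<BitMatrix>& lvls,
               const std::vector<Bits>& masks) {
    Bits branches = matMul(reshuf, decisions);      // canonical preorder order
    Bits labels(masks[0].size(), 1);
    for (std::size_t l = 0; l < lvls.size(); ++l) {
        Bits lvl = matMul(lvls[l], branches);       // branch bit above each label
        for (std::size_t i = 0; i < labels.size(); ++i)
            labels[i] &= lvl[i] ^ masks[l][i];      // XOR mask, then accumulate
    }
    return labels;  // N-hot bitvector: one set bit per tree's chosen label
}
```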
5 Compiler and Runtime

While the evaluation algorithm described above is effective at vectorizing the inference of a decision forest, it is not the most natural way in which such models are usually expressed. A compiler can solve this problem by taking a more natural representation of a trained model and automatically generating a program that creates these vectorizable structures and performs the inference algorithm. In this section, we discuss the implementation details of such a compiler.

Input Representation. The input to the compiler is a serialized trained decision forest model. The format consists of a line defining the label names as strings, followed by a line for each tree in the forest. Each leaf node outputs the index of the label it corresponds to. For every branch node, the serialized output contains the index of its feature, the threshold value it is compared to, and the serializations of its left and right subtrees, respectively.

Compiler Architecture. COPSE is a staging metacompilation framework. The input to the first stage is a serialized decision forest model. The COPSE compiler translates this to a C++ program that uses the vectorizable data structures described in Section 4.2, specialized to the given model, and invokes the algorithmic primitives provided by the COPSE runtime. The generated C++ program is then compiled and linked against the COPSE runtime library to produce a binary which can be executed to perform secure inference queries.

Structuring COPSE as a staging compiler allows us to specialize the generated C++ code by (1) choosing an appropriate set of encryption parameters for the model being compiled and (2) selecting optimal implementations for the algorithmic primitives given the FHE protocol and implementation used by the runtime. In our sensitivity analysis in Section 8, we performed a sweep over the possible encryption parameters and found that for the models we were compiling, a single set dominated all the others. Since COPSE is currently targeted only to use the BGV implementation in HElib, a single set of optimal implementations is used for all the primitives. However, if COPSE were to use a different protocol and backend (for instance, SEAL and CKKS), these choices could matter and the staging compiler could appropriately tune the parameters and implementations.

COPSE Runtime. The runtime has datatypes that represent both plaintext and ciphertext vectors and matrices, as well as the parties playing the roles of model owner (Maurice), data owner (Diane), and evaluator (Sally). It also exposes primitives to encrypt and decrypt models and feature vectors, and to securely execute an inference query given an encrypted model and encrypted feature vector. The programmer can use these datatypes to encode their application logic, and then link against the generated C++ code to produce a binary that securely performs decision forest inference.

We use the HElib library [12] with the BGV protocol [6] as our framework for homomorphic encryption. This library provides low-level primitives for encrypting and decrypting plaintexts and ciphertexts, homomorphically adding and multiplying ciphertexts, and generating public/secret key pairs.³ HElib also supports ciphertext packing, which gives us the vectorizing capabilities we need.

³The latest version of the COPSE compiler and runtime are available at https://bitbucket.org/plcl/copse/

6 Complexity Analysis

This section characterizes the complexity of COPSE. This complexity is parameterized on various parameters of the decision forest model: the number of branches b, the total number of levels d, the fixed-point precision p, and the quantized width q. (Definitions of these parameters can be found in Section 4.1.1.) The complexity of FHE circuits is characterized by two elements: (1) the number of each kind of primitive FHE operation and (2) the multiplicative depth of the FHE circuit. The former captures the "work" needed to execute the circuit. The latter, characterized by the longest dependence chain of multiplications in the circuit, determines the encryption parameters needed to evaluate the circuit accurately (higher multiplicative depth requires more expensive encryption, or bootstrapping).

The FHE operations used to express the amount of work are: (1) Encrypt, which produces a single ciphertext from a plaintext bitvector; (2) Rotate, which rotates all the entries in a vector by a constant number of slots; (3) Add, which computes the XOR of two encrypted bitvectors; (4) Multiply, which computes the AND of two encrypted bitvectors; and (5) Constant Add, which computes the XOR of an encrypted bitvector with a plaintext one. The Multiply operation incurs a multiplicative depth of 1, and all the rest incur a multiplicative depth of 0.
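Read as a tiny instruction set, these five operations can be sketched as the following interface over a packed bitvector. These are plaintext stand-ins with slotwise XOR/AND semantics, not HElib's actual API.

```cpp
#include <cstddef>
#include <vector>

struct Ciphertext { std::vector<int> slots; };  // stand-in for a packed ciphertext

Ciphertext encrypt(const std::vector<int>& bits) { return {bits}; }

Ciphertext rotate(const Ciphertext& c, std::size_t k) {            // depth 0
    std::size_t n = c.slots.size();
    Ciphertext out{std::vector<int>(n)};
    for (std::size_t i = 0; i < n; ++i) out.slots[(i + k) % n] = c.slots[i];
    return out;
}

Ciphertext add(const Ciphertext& a, const Ciphertext& b) {         // XOR, depth 0
    Ciphertext out = a;
    for (std::size_t i = 0; i < out.slots.size(); ++i) out.slots[i] ^= b.slots[i];
    return out;
}

Ciphertext multiply(const Ciphertext& a, const Ciphertext& b) {    // AND, depth 1
    Ciphertext out = a;
    for (std::size_t i = 0; i < out.slots.size(); ++i) out.slots[i] &= b.slots[i];
    return out;
}

Ciphertext constantAdd(const Ciphertext& a, const std::vector<int>& p) {  // XOR w/ plaintext
    Ciphertext out = a;
    for (std::size_t i = 0; i < out.slots.size(); ++i) out.slots[i] ^= p[i];
    return out;
}
```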
Table 1 characterizes the steps of the COPSE algorithm in terms of the number of FHE operations and their multiplicative depth, as well as the cost of encrypting the data and models (which do not factor into multiplicative depth, as they are separate from the circuit). Table 2 shows the overall cost of COPSE, including combining the multiplicative depths of the individual steps according to their dependences in the overall circuit. Note that the cost of processing a single level is incurred d times, but the level processing steps occur in parallel in the FHE circuit, so altogether the level processing only contributes 1 to the multiplicative depth.
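As a worked instance of the totals in Table 2 (illustrative numbers, not one of the paper's benchmark configurations): for a model with fixed-point precision p = 16 and d = 4 levels, with logarithms base 2,

```latex
\mathrm{depth} = 2\log p + \log d + 2 = 2\log 16 + \log 4 + 2 = 8 + 2 + 2 = 12
```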

Table 1. Operation counts and multiplicative depth for COPSE

(a) Complexity for Secure Comparison

    Operation     | Number of Ops
    Add           | 4p − 2
    Constant Add  | p
    Multiply      | p log p + 3p − 2
    Multiplicative depth: 2 log p + 1

(b) Complexity for processing a single level (repeats d times)

    Operation | Number of Ops
    Rotate    | b
    Add       | b + 1
    Multiply  | b
    Multiplicative depth: 1

(c) Complexity for accumulating results from all levels

    Operation | Number of Ops
    Multiply  | 2d − 2
    Multiplicative depth: log d

(d) Complexity for encrypting model

    Operation | Number of Ops
    Encrypt   | p + q + d(b + 1)

(e) Complexity for encrypting data

    Operation | Number of Ops
    Encrypt   | 1

Table 2. Total Evaluation Complexity

    Operation     | Number of Ops
    Encrypt       | 1 + p + q + d(b + 1)
    Rotate        | q + db
    Add           | 4p − 2 + q + d(b + 1)
    Constant Add  | p
    Multiply      | p log p + 3p + q + db + 2d − 4
    Multiplicative depth: 2 log p + log d + 2

7 Security Properties

This section describes the various security properties of decision forest programs built using COPSE. Section 7.1 discusses information leakage between the parties, while Section 7.2 discusses the privacy implications of different design decisions in COPSE.

7.1 Information Leakage

COPSE has three notional parties: the model owner Maurice, the data owner Diane, and the server Sally. Maurice owns τ, L, R, m, q, b, and K, while Diane owns the feature vector f. Sally owns nothing.

Two Physical Parties. FHE is inherently a two-party protocol, so although the secure inference problem has three notional parties, our system focuses on the cases where there are only two physical parties (i.e., two of the notional parties are actually the same person). There are three scenarios:

1. Where M = D; for instance, if the model and data are owned by the same party, which offloads the inference to an untrusted server. This is the standard "computation offloading" model used by most FHE applications [8–10].
2. Where M = S; if the model is stored on some server which allows clients to send encrypted data for classification.
3. Where D = S; if the model is trained and sent directly to a client for inference, but the client must be prevented from reverse-engineering the model.

In Table 3 we describe what data is explicitly revealed or implicitly leaked to each party. When M = D, obviously neither party can leak information to the other. However, because matrices are encrypted as a vector of ciphertexts with one per column (diagonal), S learns the number of columns in each matrix. This translates to learning the number of branches b from each level matrix L, and learning the quantized width q from the reshuffling matrix R. Furthermore, since the level masks and matrices are stored separately, S also learns the maximum depth of the forest.

When S = M, neither S nor M can leak information to the other. However, M must explicitly send the value of K to D to get feature vectors with the right padding. When the inference result is sent back, D also learns b + 1, as it is the length of the final inference vector.

When S = D, once again neither S nor D leak information to each other. However, this time not only does M reveal K and b to both S and D the same way as in case (2), but q is also leaked through the widths of the matrices, as well as d.

Three Parties. When S, M, and D are separate physical parties that do not collude, M necessarily leaks to S the values of b, q, and d, as well as revealing K. S then reveals K to D, and M leaks b to D. Even though M and D use the same key pair, because neither colludes with S, neither ever gets access to the other's ciphertexts, and privacy between the two is therefore preserved. However, if one of the parties does collude with S, they gain access to the other party's ciphertexts, which can then be easily decrypted. Thus, in the case where there is collusion between M or D and S, everything is leaked. Table 4 summarizes these results.

Since it is difficult to convince both M and D that the other is not colluding with S, we see that attempting to run this protocol with three physical parties using single-key FHE is unreasonable. There has been a lot of prior work on multikey FHE schemes [7, 14] and threshold FHE, which uses secret sharing to extend single-key FHE to work in a multiparty setting [2]. These schemes act as "wrappers" that construct a new, joint key pair for FHE (in this case shared by D and M), and hence can be applied directly to COPSE at the cost of introducing additional rounds of communication and additional encryption/decryption steps.

Table 3. Data revealed to each notional party in two-party configurations

    Scenario  | Revealed to S | Revealed to M | Revealed to D
    S, M = D  | q, b, d       | ∅             | ∅
    S = M, D  | ∅             | ∅             | K, b
    S = D, M  | q, b, K, d    | ∅             | q, b, K

Table 4. Data revealed to each party in three-party configurations

    Scenario                   | Revealed to S | Revealed to M | Revealed to D
    S, M, D, no collusion      | q, b, d, K    | ∅             | K, b
    S, M, D, S colludes with M | everything    | everything    | K, b
    S, M, D, S colludes with D | everything    | ∅             | everything

7.2 Security implications of COPSE design

The design of COPSE admits different points in the design space that trade off security and performance. Here, we discuss the implications of the design points that we chose.

7.2.1 Feature padding. Choosing to have Diane replicate and pad her feature vector is a tradeoff we make between the performance and security of COPSE. To avoid requiring Diane to do any preprocessing beyond replicating her feature vector, we would need to explicitly reveal the multiplicity of each feature used in the model. By requiring the feature vector to be padded, only the maximum feature multiplicity of the model must be explicitly revealed.

We could even avoid revealing the exact maximum multiplicity, and instead only reveal an upper bound, simply by adding several extra sentinel values to each feature in the threshold vector. The performance overhead of this would be minimal, except a slightly more expensive matrix multiply to remove the extra sentinel values (the size of this overhead scales with how loose the given upper bound is).

To avoid leaking any multiplicity information, we could also relax the requirement that Diane replicate her features at all, instead accepting a vector that lists each feature once, and requiring that the server carry out the necessary replication directly on the ciphertext vector. While this does prevent Diane from learning anything about feature multiplicities in the model, it has the effect of replacing several (cheap) plaintext replication operations with their equivalent ciphertext ones, which are much more expensive.
7.2.2 Returning classification bitvectors. Returning the bitvector of classification results rather than accumulating them to return a single label leaks some information about the structure of the model to Diane.

First, it requires a "codebook" (i.e., a map from each position in the bitvector to the label it represents) to be revealed to Diane. This reveals the order of the labels in the constituent trees of the forest (though not the "boundaries" between the trees). It is possible to avoid leaking the order of the label nodes by first having the server generate a random permutation to apply to the decision result bitvector (via a plaintext matrix/ciphertext vector multiplication), then applying the same permutation to the codebook.

Shuffling the codebook still reveals information about the model structure; in particular, it leaks how many leaf nodes correspond to each label. For instance, knowing whether a particular label is output by most of the leaves in the forest versus only being output by a single leaf potentially reveals something about how "likely" that label is to be chosen. This can also be avoided if the server pads both the codebook and the classification result bitvector with random extra labels before returning both of them; this step can be folded into the shuffling step as well, so it has minimal extra cost.

COPSE currently assumes that the codebook is already known to Diane, and performs neither shuffling nor padding.

Another data leak is the label result chosen by each tree. In other words, Diane learns that, for instance, two trees chose X and three chose Y, rather than simply learning that the final classification is Y. This is an unavoidable consequence of COPSE's design of returning a bitvector of results rather than performing the reduction server-side. The effect of this design is to put the burden of accumulating all the chosen labels into a single classification on Diane. Doing so necessarily requires revealing all the chosen labels.

Accumulation could be done by Sally to avoid this leak, at the cost of expensive ciphertext operations, including potentially multiple interactive rounds to change the plaintext modulus and count up the occurrences of each label. Thus, while this design is somewhat more secure, it adds communication complexity in addition to computational complexity.

8 Evaluation

We evaluate COPSE in several ways. First, we evaluate how well COPSE performs against the prior state of the art in secure decision forest inference, Aloufi et al. [1]. This evaluation looks at sequential and parallel performance, and focuses on the classic, offloading-focused privacy model where the model and data are owned by one party, and the server is another party (see Section 7). Second, we consider COPSE's ability to handle different party configurations, in particular, when the server and model are owned by one party, and the data by another. Finally, we use microbenchmarks to understand COPSE's sensitivity to different aspects of the models: depth, number of branches, and feature precision.
All experiments were performed on a 32-core, 2.7 GHz Intel Xeon E5-4650 server with 192 GB of RAM. Each core has 256 KB of L2 cache, and each set of 8 cores shares a 20 MB last-level cache.

8.2 COPSE Performance

Our first evaluation focuses on the sequential and parallel performance of COPSE. Our baseline for COPSE is the state-of-the-art approach for performing secure decision forest evaluation in FHE, by Aloufi et al. [1]. Because Aloufi et al.'s implementation was not available, we implemented their algorithms ourselves. We made our best effort to optimize our reimplementation, including introducing parallelism with Intel's Threading Building Blocks [18] (indeed, our implementation appears to scale better than Aloufi et al.'s reported scalability). Crucially, both our baseline and COPSE use the same FHE library, and the same implementation of SecComp, which was introduced by Aloufi et al. [1].

We evaluated both implementations on two primary criteria: how quickly the compiled models could execute inference queries, and how effectively COPSE was able to take advantage of parallelism to scale to larger models. For each model, we performed 27 inference queries, in both single-threaded and multithreaded mode. We report the median running time across these queries (confidence intervals in all cases were negligible).

Figure 6 shows the relative speedups over prior work for each model compiled using COPSE. We see a substantial speedup over the baseline, ranging from 5× to over 7×, with a geometric mean of close to 6×.

Multithreading. Next, we ran inference queries on the models with multithreading enabled. For all queries, we ran the systems using 32 threads. While the individual queries were multithreaded, they were still executed sequentially one after another.

Figure 7 shows the multithreaded speedup of COPSE over single-threaded COPSE. Note that in this study, we evaluated even larger models for our real-world datasets, as COPSE is able to evaluate these in a reasonable amount of time while our baseline is not. We see that while parallel speedup for the microbenchmarks is relatively modest (around 2.5×), parallel speedup for the real-world models is much better (almost 5×), as we might expect: the real-world models are larger, and present more parallel work.

Figure 8 compares COPSE's parallel performance to our baseline. We note that it appears that COPSE scales worse than the baseline (if they scaled equally well, the speedups in Figure 8 would match the speedups in Figure 6). The source of this seeming shortcoming is subtle. COPSE's design makes heavy use of ciphertext packing to amortize the overheads of FHE computation. This ciphertext packing essentially consumes some of the parallel work in the form of "vectorized" operations, even in the single-threaded case, leaving the COPSE implementations with less parallelism to exploit using "normal" parallelization techniques. In contrast, all of these parallelism opportunities can only be exploited by multithreading in the baseline (and recall that our baseline implementation appears to scale better than the original implementation). We can observe this effect by noting that the gap in scaling is smaller for the larger, real-world models (income5 and soccer5): there is more parallelism to start with, so even after ciphertext packing, there is more parallelism for COPSE to exploit.
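To illustrate the flavor of this parallelization, the sketch below dispatches independent homomorphic operations with TBB's parallel_for. The Level and LevelResult types are illustrative stand-ins for the packed matrices and ciphertexts, not the actual types used by either implementation:

    #include <tbb/parallel_for.h>
    #include <cstddef>
    #include <vector>

    // Stand-in types: a real implementation would hold HElib plaintext
    // matrices and ciphertexts here.
    struct Level { /* per-level plaintext matrix */ };
    struct LevelResult { /* packed result ciphertext */ };

    LevelResult process_level(const Level& level) {
        // Placeholder for a plaintext-matrix/ciphertext-vector product.
        return LevelResult{};
    }

    // Independent per-level (or per-tree) operations can be dispatched with
    // a single parallel loop; our experiments used 32 worker threads.
    std::vector<LevelResult> process_all(const std::vector<Level>& levels) {
        std::vector<LevelResult> results(levels.size());
        tbb::parallel_for(std::size_t(0), levels.size(), [&](std::size_t i) {
            results[i] = process_level(levels[i]);
        });
        return results;
    }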

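Similarly, the per-model measurement methodology above (27 queries, median reported in milliseconds) amounts to a small harness along the following lines; run_query is a placeholder for a complete inference query:

    #include <algorithm>
    #include <chrono>
    #include <functional>
    #include <vector>

    // Run `trials` inference queries and return the median latency in
    // milliseconds.
    double median_latency_ms(const std::function<void()>& run_query,
                             int trials = 27) {
        std::vector<double> samples;
        for (int i = 0; i < trials; ++i) {
            auto start = std::chrono::steady_clock::now();
            run_query();
            auto stop = std::chrono::steady_clock::now();
            samples.push_back(
                std::chrono::duration<double, std::milli>(stop - start).count());
        }
        std::nth_element(samples.begin(),
                         samples.begin() + samples.size() / 2, samples.end());
        return samples[samples.size() / 2];
    }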
Figure 6. Speedup of COPSE-compiled models over our implementation of Aloufi et al. [1] when both are single-threaded. The number on top of each bar is the median running time (in milliseconds) for that model using COPSE.

Figure 9. Speedup of inference queries executed on plaintext models (when Maurice = Sally) compared to encrypted models (when Diane = Maurice). The number on top of each bar shows the median inference run-time (in milliseconds) on the plaintext models.

Figure 7. Speedup that COPSE-compiled models experience when multithreaded instead of single-threaded. The number on top of each bar is the median run-time (in milliseconds) for multithreaded inference.

Table 6. Microbenchmark specifications

    Model name   Max. depth   Precision   # of trees   # of branches
    depth4       4            8           2            15
    depth5       5            8           2            15
    depth6       6            8           2            15
    width55      5            8           2            10
    width78      5            8           2            15
    width677     5            8           3            20
    prec8        5            8           2            15
    prec16       5            16          2            15

Figure 8. Speedup of COPSE-compiled models over our implementation of Aloufi et al. [1] when both are multithreaded. The number on top of each bar is the median run-time (in milliseconds) for multithreaded COPSE.

8.3 Different Party Setups

As discussed in Section 7, the two-party setup used by COPSE admits different configurations for the identities of these parties. We would expect to see speedup if Maurice and Sally are the same party (so the models can be represented in plaintext) compared to when Maurice and Diane are the same party (so the model has to be encrypted). Figure 9 shows the speedup of inference when using plaintext models versus encrypted models. As expected, we see that plaintext models result in substantial speedups of roughly 1.4×.

8.4 Evaluation on Microbenchmarks

To better understand the different components of COPSE, we used eight randomly-generated forests with different properties: feature precision, maximum levels, number of trees, and number of branches. The details on the size of each forest can be found in Table 6. Every forest had 2 features and 3 distinct labels. Figure 10 shows the median running time for each model, broken down by the time each step took (comparison, reshaping, processing levels, and aggregating). To facilitate comparison, each sub-figure compares models that are similar except for a particular parameter under test.

Figure 10. Run time of microbenchmarks: (a) run time vs. max depth; (b) run time vs. branches; (c) run time vs. precision.

Effects of depth. Figure 10a shows models that differ in terms of tree depth. Comparison and reshaping times are largely unaffected by the maximum forest depth, whereas the total level processing time increases approximately linearly. Aggregation time is logarithmic in depth, but it is negligibly small compared to the rest of the evaluation. This makes sense because at each depth level there is approximately an equal amount of work to be done.

Effects of branching. Figure 10b shows models that differ in terms of number of branches. While the comparison time is unaffected by branching, both the reshaping time and the total level processing time are. The relationship between branching and reshaping is close to linear, as reshaping actually depends linearly on the quantized branching. By contrast, the total level processing time is directly proportional to the number of branches. This is because the matrix at each level has a number of columns equal to the number of branches. Thus for a model with twice as many branches, the corresponding matrices will be twice as wide and take twice as long to process without multithreading.

Effects of precision. Finally, Figure 10c shows models that differ in terms of feature precision. Reshaping, level processing, and aggregation are, as expected, unaffected by changing the model precision. However, as suggested by the complexity analysis in Section 6, comparison time increases super-linearly with precision. This superlinear relationship is evident from the data in Figure 10c.

9 Conclusion

In this paper we presented COPSE, a staging compiler and runtime for decision forest models that takes advantage of the noninterference semantics of fully-homomorphic encryption to relax control-flow dependences and map these models to vectorizable, FHE-based primitives. COPSE represents the first approach to exploit ciphertext packing to perform secure decision forest inference.

We showed that COPSE can automatically stage decision-forest models, including large-scale ones trained on real-world datasets, into efficient, specialized C++ programs implemented using our novel primitives. We further showed that these implementations significantly outperform state-of-the-art secure inference algorithms, while still scaling well (especially for larger models).

This paper thus represents the next step in building efficient, secure inference procedures for machine-learned models. Future work includes implementing COPSE's primitives not in terms of low-level FHE libraries like HELib but instead in terms of higher-level FHE-based intermediate languages, like EVA [9], allowing for further tuning and optimization.

Acknowledgments

The authors would like to thank Hemanta Maji for discussions of this problem that inspired the design of COPSE. The authors appreciate the feedback from the anonymous reviewers from OOPSLA 2021 and PLDI 2021 that improved the paper. We would especially like to thank our shepherd, Madan Musuvathi, for his efforts.

This work was partially supported by NSF Grants CCF-1919197 and CCF-1725672. This work was also partially supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), contract #2019-19020700004. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

References

[1] Asma Aloufi, Peizhao Hu, Harry W. H. Wong, and Sherman S. M. Chow. 2019. Blindfolded Evaluation of Random Forests with Multi-Key Homomorphic Encryption. IACR Cryptology ePrint Archive 2019 (2019), 819.
[2] Gilad Asharov, Abhishek Jain, and Daniel Wichs. 2011. Multiparty Computation with Low Communication, Computation and Interaction via Threshold FHE. Cryptology ePrint Archive, Report 2011/613.
[3] Raphael Bost, Raluca A. Popa, Stephen Tu, and Shafi Goldwasser. 2014. Machine Learning Classification over Encrypted Data. IACR Cryptology ePrint Archive 2014 (2014), 331.
[4] Zvika Brakerski, Craig Gentry, and Shai Halevi. 2012. Packed Ciphertexts in LWE-based Homomorphic Encryption.
[5] Zvika Brakerski, Craig Gentry, and Shai Halevi. 2013. Packed Ciphertexts in LWE-based Homomorphic Encryption. In International Workshop on Public Key Cryptography. Springer, 1–13.
[6] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. 2014. (Leveled) Fully Homomorphic Encryption without Bootstrapping. ACM Trans. Comput. Theory 6, 3, Article 13 (July 2014), 36 pages. https://doi.org/10.1145/2633600
[7] Long Chen, Zhenfeng Zhang, and Xueqing Wang. 2017. Batched Multi-hop Multi-key FHE from Ring-LWE with Compact Ciphertext Extension. In TCC.
[8] Eric Crockett, Chris Peikert, and Chad Sharp. 2018. ALCHEMY: A Language and Compiler for Homomorphic Encryption Made EasY. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (Toronto, Canada) (CCS '18). Association for Computing Machinery, New York, NY, USA, 1020–1037. https://doi.org/10.1145/3243734.3243828
[9] Roshan Dathathri, Blagovesta Kostova, Olli Saarikivi, Wei Dai, Kim Laine, and Madanlal Musuvathi. 2019. EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation. ArXiv abs/1912.11951 (2019).
[10] Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, and Todd Mytkowicz. 2019. CHET: An Optimizing Compiler for Fully-Homomorphic Neural-Network Inferencing. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). Association for Computing Machinery, New York, NY, USA, 142–156. https://doi.org/10.1145/3314221.3314628
[11] Craig Gentry. 2009. A Fully Homomorphic Encryption Scheme. Ph.D. Dissertation. Stanford, CA, USA. Advisor(s) Boneh, Dan.
[12] Shai Halevi and Victor Shoup. 2012. Design and Implementation of a Homomorphic-Encryption Library.
[13] Shai Halevi and Victor Shoup. 2014. Algorithms in HElib. Cryptology ePrint Archive, Report 2014/106.
[14] N. Li, T. Zhou, X. Yang, Y. Han, W. Liu, and G. Tu. 2019. Efficient Multi-Key FHE With Short Extended Ciphertexts and Directed Decryption Protocol. IEEE Access 7 (2019), 56724–56732.
[15] MLData. 1996 (accessed 2020). Census Income. https://www.mldata.io/dataset-details/census_income/.
[16] MLData. 2017 (accessed 2020). Soccer International History. https://www.mldata.io/dataset-details/soccer_international_history/.
[17] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[18] James Reinders. 2007. Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O'Reilly. http://www.oreilly.com/catalog/9780596514808/index.html
[19] Bin Ren, Tomi Poutanen, Todd Mytkowicz, Wolfram Schulte, Gagan Agrawal, and James R. Larus. 2013. SIMD Parallelization of Applications That Traverse Irregular Data Structures. In Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO '13). IEEE Computer Society, USA, 1–10. https://doi.org/10.1109/CGO.2013.6494989
[20] Victor Shoup. [n.d.]. NTL: A Library for doing Number Theory. https://www.shoup.net/ntl.
[21] David J. Wu, Tony Feng, Michael Naehrig, and Kristin Lauter. 2016. Privately Evaluating Decision Trees and Random Forests. Proceedings on Privacy Enhancing Technologies 2016, 4 (2016), 335–355.
