Accelerating Large Garbled Circuits on an FPGA-enabled Cloud

Miriam Leeser, Mehmet Gungor, Kai Huang, and Stratis Ioannidis
Dept. of ECE, Northeastern University, Boston, MA
[email protected] [email protected] [email protected] [email protected]

Abstract—Garbled Circuits (GC) is a technique for ensuring the privacy of inputs from users and is particularly well suited for FPGA implementations in the cloud, where data analytics is frequently run. Secure Function Evaluation, such as that enabled by GC, is orders of magnitude slower than processing in the clear. We present our best implementation of GC on Amazon Web Services (AWS), which implements garbling on Amazon's FPGA-enabled F1 instances. In this paper we present the largest problems garbled to date on FPGA instances, including problems that are represented by over four million gates. Our implementation speeds up garbling 20 times over software across a range of different circuit sizes.

Index Terms—privacy, garbled circuits, high performance computing, FPGAs

I. INTRODUCTION

Privacy has become an increasing concern among users of systems and services. Secure Function Evaluation (SFE) is an approach that ensures users' data privacy. However, such assurances require a significant increase in processing requirements and latency. The two main techniques used for SFE are homomorphic encryption and garbled circuits (GC). This research focuses on accelerating GC, which is an excellent match for FPGAs in the data center. In particular, this research targets FPGAs in the data center and processing large problems.

Previous approaches to accelerating GC with FPGAs have focused on embedded systems and small problems, especially those where all intermediate data fits in on-chip memory [1], [2]. Our recent research accelerates FPGAs in the cloud, specifically with AWS F1 instances [3]. In a recent paper [4], we describe our first results accelerating GC in the cloud. In this paper, we discuss improvements to those initial results and how we handle larger circuits.

One of the distinguishing features of this research is the use of large data sets that do not entirely fit in on-chip memory and that require random access. To improve on our previously published research, we introduce a hybrid design that makes use of both on-chip and off-chip memory. We use the on-chip memory as a managed cache that keeps as much data as possible close to the processing for acceleration. The memory management technique presented is also applicable to other big data problems that make use of FPGAs.

In this paper we describe approaches and experiments that improve on previous implementations in order to obtain greater speedup and to support larger examples. Specifically, the main contribution of this paper is to use the memory available on an FPGA more efficiently in order to support large data sets on FPGA instances and improve the speedup obtained from running applications on FPGAs in the cloud.

The rest of this paper is organized as follows. Section II provides background on garbled circuits and AWS, as well as presenting related work. In Section III we discuss our implementation of the garbler and highlight the improvements presented here over previously published results. Section IV presents experimental results. Finally, we conclude and present plans for future work.

This research is funded by NSF under award SaTC 1717213.

II. BACKGROUND

A. Garbled Circuits

Our research accelerates Secure Function Evaluation (SFE), specifically Garbled Circuits (GC), using FPGAs. In this model there are two or more users with data which they wish to keep private, and a function to be evaluated over that data. All parties know the function being evaluated and learn the outcome of the evaluation, but users do not reveal their data. The threat model we follow is "honest but curious," where an adversary follows the protocol as specified but tries to learn as much as possible. A canonical problem exemplifying SFE is the "Millionaires' Problem": two millionaires wish to know who is worth more without revealing their personal worth to each other.

Garbled circuits were initially introduced by Yao [5] for two users and have been extended to multiple users. They rely on cryptographic primitives. In the variant we study here (adapted from [6], [7]), Yao's protocol runs between (a) a set of private input owners, (b) an Evaluator, who wishes to evaluate a function over the private inputs, and (c) a third party called the Garbler, which facilitates and enables the secure computation.

Garbled Circuits work for any problem that can be expressed as a Boolean circuit. In our and many other implementations, this function is represented as a circuit made up of AND and

Fig. 1. Yao's Protocol Phases of Operation. (Phase I: the Garbler garbles the circuit for f. Phase II: the garbled circuit and keys are transmitted to the Evaluator, which obtains the keys for the private inputs x1, x2, ..., xn via proxy oblivious transfer. Phase III: the Evaluator evaluates and learns f(x1, x2, ..., xn).)

XOR gates.¹ The Evaluator wishes to evaluate a function f, represented as a Boolean circuit of AND and XOR gates, over private user inputs x_1, x_2, ..., x_n. We break the problem into three phases, as shown in Fig. 1. In Phase I, the Garbler "garbles" each gate of the circuit, outputting (a) a "garbled circuit," namely, the garbled representation of every gate in the circuit representing f, and (b) a set of keys, each corresponding to a possible value in the string representing the inputs x_1, ..., x_n. These values are shared with the Evaluator. In Phase II, through proxy oblivious transfer [8], the Evaluator learns the keys corresponding to the true user inputs. In the final phase, the Evaluator uses the keys as input to the garbled circuit to evaluate the circuit, un-garbling the gates. At the conclusion of Phase III, the Evaluator learns f(x_1, ..., x_n).

¹Recall that AND and XOR gates form a complete basis for Boolean circuits.

1) Garbling Phase: A function to be evaluated is represented as a Boolean circuit consisting of AND and XOR gates. In the garbling phase, each of these gates is garbled as described in this section. Each gate is associated with three wires: two input wires and one output wire. At the beginning of the garbling phase, the Garbler associates two random strings, k^0_{w_i} and k^1_{w_i}, with each wire w_i in the circuit. Intuitively, each k^b_{w_i} is an encoding of the bit-value b ∈ {0, 1} that the wire w_i can take.

We describe here how to garble an AND gate. The same principles can be applied to garble an XOR gate, using its respective truth table. We note, however, that in practice XOR gates are handled via the Free XOR optimization [9], discussed in Section II-A3. A garbled AND gate is shown in Fig. 2. For each AND gate g, with input wires (w_i, w_j) and output wire w_k, the Garbler computes the following four ciphertexts, one for each pair of values b_i, b_j ∈ {0, 1}:

  Enc_{(k^{b_i}_{w_i}, k^{b_j}_{w_j}, g)}(k^{g(b_i,b_j)}_{w_k}) = SHA(k^{b_i}_{w_i} ∥ k^{b_j}_{w_j} ∥ g) ⊕ k^{g(b_i,b_j)}_{w_k}    (1)

Here SHA represents the hash function, ∥ indicates concatenation, g is an identifier for the gate, and ⊕ is the XOR operation. Note that each value k on a wire is implemented with 80 bits in our implementation. The "garbled" gate is then represented by a random permutation of these four ciphertexts.

  b_i | b_j | g(b_i, b_j) | Garbled value
   0  |  0  |      0      | Enc_{(k^0_{w_i}, k^0_{w_j}, g)}(k^0_{w_k})
   0  |  1  |      0      | Enc_{(k^0_{w_i}, k^1_{w_j}, g)}(k^0_{w_k})
   1  |  0  |      0      | Enc_{(k^1_{w_i}, k^0_{w_j}, g)}(k^0_{w_k})
   1  |  1  |      1      | Enc_{(k^1_{w_i}, k^1_{w_j}, g)}(k^1_{w_k})

Fig. 2. A Garbled AND Gate

Observe that, given the pair of keys (k^0_{w_i}, k^1_{w_j}), it is possible to successfully recover the key k^0_{w_k} by decrypting c = Enc_{(k^0_{w_i}, k^1_{w_j}, g)}(k^0_{w_k}) through:²

  Dec_{(k^0_{w_i}, k^1_{w_j}, g)}(c) = SHA(k^0_{w_i} ∥ k^1_{w_j} ∥ g) ⊕ c    (2)

²Note that the above encryption scheme is symmetric, as Enc and Dec are the same function.

On the other hand, the other output wire key, namely k^1_{w_k}, cannot be recovered. More generally, knowledge of (a) the ciphertexts and (b) the keys (k^{b_i}_{w_i}, k^{b_j}_{w_j}) for some inputs b_i and b_j yields only the value of the key k^{g(b_i,b_j)}_{w_k}; no other input or output keys of gate g can be recovered.

Any Boolean circuit can be garbled in this manner, by first representing it with ANDs and XORs, and then garbling each such gate.

Our implementation makes use of SHA cores for encryption and decryption. Although SHA has been found to be vulnerable in general, since we re-encrypt for every new input value, this implementation is secure.

We compare our results, both values and run times, to FlexSC [10], which also makes use of SHA. We plan to update the cores to AES. As the core speed and area are not currently a bottleneck in our design, we do not expect this to affect the results presented here.

2) Evaluation Phase: The output of the garbling process is (a) the garbled gates, each comprising a random permutation of the four ciphertexts representing the gate, and (b) the keys (k^0_{w_i}, k^1_{w_i}) for every wire w_i in the circuit. At the conclusion of the first phase, the Garbler sends this information for all garbled gates to the Evaluator. It also provides the correspondence between the garbled values and the real bit-values for the circuit-output wires (the outcome of the computation): if w_k is a circuit-output wire, the pairs (k^0_{w_k}, 0) and (k^1_{w_k}, 1) are given to the Evaluator. To transfer the garbled values of the input wires, the Garbler engages in a proxy oblivious transfer with the Evaluator and the users, so that the Evaluator obliviously obtains the garbled-circuit input value keys k^b_{w_i} corresponding to the actual bit b of each input wire w_i.

Having the garbled inputs, the Evaluator can "evaluate" each gate, by decrypting each ciphertext of a gate in the first layer of the circuit by applying equation (2): only one of these decryptions will succeed,³ revealing the key corresponding to the output of this gate. Each output key revealed can subsequently be used to evaluate any gate that uses it as an input. Using the table mapping these keys to bits, the Evaluator can learn the final output.

³This can be detected, e.g., by appending a prefix of zeros to each key k^b_{w_k}, and checking if this prefix is present upon decryption.

3) Optimization: Several improvements over the original Yao's protocol have been proposed that lead to both computational and communication cost reductions. These include the point-and-permute [11], row reduction [12], and Free-XOR [9] optimizations, all of which we implement in our design. Free-XOR in particular significantly reduces the computational cost of garbled XOR gates: XOR gates do not need to be encrypted and decrypted, as the XOR output wire key is computed through an XOR of the corresponding input keys. In addition, the Free-XOR optimization fully eliminates communication between the Garbler and the Evaluator for XOR gates: no ciphertexts need to be communicated for these gates. Our implementation takes advantage of all of these optimizations; as a result, the circuit for computing garbled AND gates differs slightly from the garbling algorithm outlined above.

B. AWS

Amazon Web Services (AWS) provides cloud computing, data storage and other data-relevant services for enterprise development, personal use and academic research. Specifically, AWS offers F1 instances that use FPGAs from Xilinx to enable delivery of custom hardware acceleration [3]. We use the f1.2xlarge with Virtex Ultrascale+ ECVU9p FPGAs. The FPGA board includes 4x16 GB external DDR memory.

AWS provides several different ways to program their FPGAs. We use the AWS-FPGA Hardware Development Kit (HDK), which provides development support and runtime libraries for their FPGA instances. The SDK supports OpenCL development; however, since we are developing an overlay architecture, it is not a good match for our design.

The AWS-FPGA hardware infrastructure connects the FPGA board, which includes external DDR memory, to the host motherboard through the PCIe bus. The AXI interconnect in the hardware design enables data movement between host memory, FPGA on-chip memory and external DDR memory on the FPGA board. Additionally, the software runtime library provides API interfaces to transfer chunks of data to DDR memory and interfaces to access on-chip memory in the FPGA.

C. Related Work

Acceleration of garbled circuits on FPGAs is an active area of research. TinyGarble [13] uses techniques from hardware design to implement GCs as sequential circuits and then optimizes these designs. The circuits can be optimized to reduce the non-XOR operations using traditional high-level synthesis tools. The resulting designs are customized for each problem; thus for each new problem a new bitstream must be generated, which is not practical for large designs in a data center setting. In contrast, we implement as many garbled AND gates as we can keep busy at the same time, and implement garbled circuits directly on top of an efficient overlay, which eliminates the need to recompile the hardware for every new user problem. In MAXelerator [14] the authors implement efficient garbling of matrix multiplication in FPGAs. With RedCrypt [15], the same authors use their design to garble larger examples. Their approach differs from ours in that the FPGA is used as a streaming accelerator and intermediate results are not stored in memory on the FPGA. Hence, their design requires very high bandwidth for host-to-FPGA communication; the results presented do not include the time required for this communication. Also, they generate random values locally on the FPGA, while we generate them on the host. There are two advantages to using the host for this. One is the ability to compare software and hardware results for validation. The second is that the random input strings must be sent to the evaluator, so the host requires this information in any case.

In previous work, we used an FPGA with a local host to accelerate general garbled circuit problems and demonstrated orders of magnitude improvement over the software version [1], [16], [17]. The acceleration is achieved via an FPGA overlay architecture, a hybrid memory hierarchy, efficient data manipulation and fine grained communication patterns between the host and FPGA. This previous work targets one Stratix V FPGA. It only implements the garbler and not the evaluator.
There are several FPGA projects that target AWS F1 instances. In [18], the CAOS framework is extended to integrate with SDAccel running on AWS instances to improve performance. In [19], the authors developed an FPGA-based ultrasonic propagation imaging system to process real-time ultrasonic signals. Firesim [20] builds a cycle-exact simulation platform on large-scale clusters integrated with FPGA accelerators to simulate the behavior of data movement among CPUs, caches, DRAM and network switches.

III. IMPROVEMENTS TO SUPPORT GC IN THE DATA CENTER

In a recent paper we presented initial experiments using AWS F1 instances to accelerate garbled circuits [4]. That paper looks at the entire system of garbling, transferring data and evaluating a function: in our system, the garbler is implemented on an F1 instance while the evaluator is implemented on an independent node in software. In this paper we focus on the garbler implementation and discuss improvements to the base design, specifically for supporting large circuits with large memory footprints.

Our design makes use of a coarse grained overlay architecture, as shown in Fig. 3. This architecture implements a sea of garbling AND and garbling XOR gates on the FPGA fabric. For a particular set of experiments the number of implemented gates is fixed; we discuss this in the results section of this paper. Individual problems are mapped onto the implemented architecture. Note that the design supports the use of both BRAM and DDR memory, an important feature for this paper. This is an overlay architecture because the implementation is fixed and many different user problems can be mapped to it. Thus the FPGA does not need to be redesigned for each problem instance. Because of its coarse grained nature, there is very little overhead incurred.

Fig. 3. Overlay Architecture implemented on FPGA. (gAND and gXOR cores with a workload dispatcher and data controller, connected to BRAM and DDR memory on the FPGA board and to the host CPU and main memory over PCIe.)

Fig. 4 gives an overview of the software and hardware used to garble a problem instance. We use FlexSC [10] as a front end to generate the circuit netlist, which is shared between the garbler and evaluator. We do preprocessing on the host to generate layers (garbling is done in breadth-first order) and to assign wires to memory locations on the FPGA. The hardware design on the FPGA is generated once for a set of experiments. We also use FlexSC to validate our designs.

Fig. 4. Workflow of our design

The flow of the garbler works as follows. Two types of data are generated on the host PC and communicated to the FPGA over PCIe: keys that correspond to the inputs, and addresses that correspond to wire locations in the garbled circuit. Intermediate keys are generated on the FPGA. These are required for the garbling process, but do not need to be communicated to the evaluator. The evaluator does require the garble tables; one such table is generated for each garbled AND gate in the circuit.

The circuits we are interested in garbling can grow to well over a million gates. In the design reported in [4], all intermediate keys are stored in DDR memory, which is off chip and has a latency of between 30 and 55 clock cycles for each access. Note that memory accesses of these intermediate keys are random, so we cannot take advantage of burst mode to reduce latency. In the hybrid design reported in this paper, we make use of both off-chip DDR memory and on-chip Block RAM (BRAM). Accessing BRAM on the FPGA is 30 times or more faster than accessing DDR memory. In the hybrid design, initial keys are stored in DDR memory, while intermediate results are stored in BRAM as long as they fit. Using preprocessing on the host, we keep track of the lifetimes of intermediate values: when we preprocess the netlist, we compute a "reference count" for each wire, and we decrement the count during execution. When the count reaches zero, the intermediate value is no longer needed and we set a flag to show that the BRAM location can be overwritten and reused. All our results for circuits whose intermediate values do not completely fit into memory reuse BRAM locations. In our previous results, writing intermediate results to DDR took approximately 30% of the total runtime. This has been significantly reduced in the results reported here.

We use DDR for the initial input keys, which are generated on the host and sent directly to DDR. As noted above, these input keys are also needed by the evaluator, so it makes sense to generate them on the host. We also use DDR for the garble tables generated, which likewise need to be transferred to the evaluator. Intermediate values generated during garbling are not transmitted. We use BRAM when these values fit, and DDR when the amount of BRAM on the FPGA is not sufficient. On AWS hardware, a DDR read takes slightly more than 50 cycles, while a DDR write takes more than 30 clock cycles. In most cases, since accesses are random, the efficiencies of burst mode access cannot be exploited. In comparison, a BRAM read can be achieved in 1 clock cycle.

Accessing off-chip memory in random order can incur long latencies. We minimize memory accesses by storing intermediate values generated during garbling in BRAM.
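The lifetime-based reuse of BRAM locations described above can be modeled in a few lines of Python. This is an illustrative host-side model with assumed names (HybridStore, write, read), not our actual preprocessing code:

```python
# Illustrative model of the hybrid-memory policy: each wire gets a
# reference count during preprocessing; BRAM slots are assigned greedily
# and reclaimed when the value they hold is read for the last time;
# overflow falls back to DDR.
class HybridStore:
    def __init__(self, bram_slots, ref_counts):
        self.free = list(range(bram_slots))  # free BRAM locations
        self.slot_of = {}                    # wire -> BRAM slot
        self.refs = dict(ref_counts)         # wire -> remaining readers
        self.in_ddr = set()                  # wires spilled to DDR

    def write(self, wire):
        """Place a freshly garbled intermediate key."""
        if self.free:
            self.slot_of[wire] = self.free.pop()
        else:
            self.in_ddr.add(wire)            # BRAM full: fall back to DDR

    def read(self, wire):
        """Consume one use; reclaim the BRAM slot at the last use."""
        self.refs[wire] -= 1
        if self.refs[wire] == 0 and wire in self.slot_of:
            self.free.append(self.slot_of.pop(wire))  # slot reusable
```

For example, with two BRAM slots and wires a (read twice), b and c (read once each): after writing a and b both slots are occupied, but the single read of b frees its slot, so c is again placed in BRAM rather than DDR.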
BRAM is organized as 10,000 512-bit wide locations. We continue to investigate different organizations of BRAM to maximize its use. We assign values to BRAM using a greedy algorithm, and continue to store data in BRAM as long as locations are available. We reuse locations when an intermediate value is no longer needed. If all BRAM locations are currently occupied, the data is stored to DDR. Others are investigating algorithms for random access of BRAM [21]; we plan to investigate this approach further.

We generate the FPGA design from Python, which allows us to easily change the number of Garbled AND (gAND) and Garbled XOR (gXOR) cores implemented in hardware. We show results for two different designs: four gAND and four gXOR cores, and eight gAND and eight gXOR cores. Note that these values can easily be changed and that the number of AND cores does not need to be the same as the number of XOR cores.

IV. RESULTS

A. Experimental Setup

We run all our designs on one Amazon F1 instance with attached processors. We use the f1.2xlarge with Virtex Ultrascale+ ECVU9p FPGAs. The FPGA board includes 4x16 GB external DDR memory. DDR memory, as well as registers on the FPGA, can be directly written by the host using the AWS HDK and SDK frameworks.

Input values and address locations are stored in FPGA registers by the host. The output of garbling is the garble tables, which are stored in DDR memory by the FPGA and transferred back to the host. Times are measured between setting up the registers for all gates and writing all garble tables for all gates in the netlist to DDR. They therefore represent end-to-end run times for FPGA processing of garbling the circuit.

B. Experimental Results

In this paper we focus on accelerating the time for garbling, which is longer than the time required for evaluation [4]. We show run times for FlexSC garbling circuits in software, run times on the FPGA using only DDR memory to store intermediate results, and the hybrid design introduced in this paper that uses both BRAM and DDR memory. The circuit examples are all different sizes of matrix-matrix multiply (labeled MM). Matrix multiplication is useful to see how results scale for different sizes of problem; our approach is general and can handle any problem that can be represented as a Boolean circuit. We vary the size of the matrices being multiplied and the number of bits used to represent data in order to generate different problem sizes. The problems presented here range from 15,000 to over four million gates.

In Table I, we compare run times across these problem sizes using DDR memory only and the hybrid design that uses both DDR and BRAM memory. We also show results comparing two different overlays, one with four garbled AND and four garbled XOR circuits and the other with eight garbled AND and eight garbled XOR circuits. Note that an arbitrary number of garbled gates can be generated for a specific overlay, but once it has been designed the number of gates is fixed. Also, we chose the same number of gAND and gXOR gates in these experiments, but this is not required.

Results in Table I show a consistent improvement from using BRAM compared to the designs that use DDR alone. The speedup is greatest for the smaller designs (5 × 5 matrix multiply), where all intermediate results can be stored in Block RAM. Even in cases where both types of memory are required, we consistently get about a 25% increase in performance by using local memory. Table I also shows that the overlay containing 8 gAND and 8 gXOR gates consistently outperforms the overlay with fewer cores instantiated, by about an additional 10%. The number of cores and the organization of the memory hierarchy are linked. We continue to experiment with different numbers of garbled AND and garbled XOR gates, as well as different organizations of BRAM, to determine the optimal architecture.

Figure 5 summarizes the different FPGA versions as a function of the size of the circuit (garbler time in ms vs. total gates, from 15,500 up to 4,080,000 gates, for the DDR-only 4+4 design, the hybrid 4+4 design, and the hybrid 8+8 design). The hybrid memory design consistently improves over the DDR-only design, and the increased number of cores provides the best results across all sizes of circuit.

Fig. 5. Time vs total number of gates on the FPGA

Table II shows the results for the same examples, comparing software run times with the best performing design from Table I, i.e., the hybrid memory design with 8 gAND and 8 gXOR cores. The speedup across all circuit sizes is fairly consistent at about 20 times. The speedup diminishes somewhat as the circuit gets larger; this is likely because the percentage of DDR fetches increases for larger circuit sizes. We are investigating the optimal number of cores as well as the best way to assign intermediate values to BRAM vs. DDR.

TABLE I
GARBLER TIMING, DDR VS HYBRID MEMORY DESIGN. ALL UNITS ARE MS.

Problem          | Total gates | 4 AND 4 XOR DDR | 4 AND 4 XOR hybrid | Speedup | 8 AND 8 XOR | Speedup (over 4 and 4)
4bit 5 × 5 MM    | 15500       | 45.48           | 29.47              | 1.54    | 26.42       | 1.12
8bit 5 × 5 MM    | 63000       | 184.23          | 111.74             | 1.65    | 96.61       | 1.16
4bit 10 × 10 MM  | 126000      | 368.22          | 283.86             | 1.30    | 242.55      | 1.17
8bit 10 × 10 MM  | 508000      | 1487.21         | 1180.49            | 1.26    | 1067.35     | 1.11
12bit 10 × 10 MM | 1146000     | 3234.93         | 2570.84            | 1.26    | 2356.41     | 1.09
16bit 10 × 10 MM | 2040000     | 5636.27         | 4606.83            | 1.22    | 4185.36     | 1.10
4bit 20 × 20 MM  | 1016000     | 3153.26         | 2571.50            | 1.23    | 2346.86     | 1.10
8bit 20 × 20 MM  | 4080000     | 12638.08        | 10226.60           | 1.24    | 9378.26     | 1.09
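The speedup columns in Table I are simple ratios of the measured times: the DDR-only time over the hybrid time for the 4+4 overlay, and the 4+4 hybrid time over the 8+8 hybrid time. A quick check against the first few rows:

```python
# Speedup columns of Table I as ratios of the measured garbling times (ms):
# (DDR-only, 4+4 hybrid, 8+8 hybrid) per problem.
rows = {
    "4bit 5x5 MM":   (45.48, 29.47, 26.42),
    "8bit 5x5 MM":   (184.23, 111.74, 96.61),
    "4bit 10x10 MM": (368.22, 283.86, 242.55),
}
for name, (ddr, hyb4, hyb8) in rows.items():
    # e.g. 45.48 / 29.47 ≈ 1.54 and 29.47 / 26.42 ≈ 1.12, matching Table I
    print(f"{name}: hybrid speedup {ddr / hyb4:.2f}, 8+8 over 4+4 {hyb4 / hyb8:.2f}")
```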

TABLE II
SOFTWARE VS BEST FPGA IMPLEMENTATION. ALL UNITS ARE MS.

Problem          | Software  | 8 AND 8 XOR | Speedup
4bit 5 × 5 MM    | 659.08    | 26.42       | 24.95
8bit 5 × 5 MM    | 2684.03   | 96.61       | 27.78
4bit 10 × 10 MM  | 5391.43   | 242.55      | 22.23
8bit 10 × 10 MM  | 22031.15  | 1067.35     | 20.7
12bit 10 × 10 MM | 49906.86  | 2356.41     | 21.18
16bit 10 × 10 MM | 89392.44  | 4185.36     | 21.35
4bit 20 × 20 MM  | 44466.74  | 2346.86     | 18.95
8bit 20 × 20 MM  | 179168.64 | 9378.26     | 19.10

We will continue to investigate how to handle larger and larger circuits as we accelerate GC in the data center.

The next optimization we plan to target is to improve memory usage by using a separate DDR bank and memory channel for writing back the garble tables. This design will benefit from pipelining these two parallel data channels. In the current hybrid design, the state machine needs extra states to write back the garbling table, which slows down processing. We will also continue to investigate techniques for BRAM usage and increasing the number of logic gates, as well as the use of URAM in these designs.

Many problems that target FPGAs in the data center involve large data sets that may be accessed in random order. Reorganizing or reordering the data layout for better temporal and spatial locality, and making reuse more efficient, is essential to processing memory-bound problems. The lessons learned from these experiments go beyond the particular problem of secure function evaluation and can be applied to other such problems. Getting data to the processing units efficiently is a continuing challenge in such systems.

V. CONCLUSIONS AND FUTURE WORK

We have presented the garbling of very large circuits on FPGAs in the data center. Our results, including problem sizes of over four million gates, show significant improvement in latency over recently published results. This improvement is due to the use of BRAM to store intermediate results. The techniques developed apply to many problems over large datasets that may not access the data sequentially, and thus must use the BRAM as a user-managed cache.

In the future, we plan to investigate how to use the memory hierarchy on an FPGA even more efficiently to support larger problems. We will investigate the optimal organization and management of BRAM, as well as the best number of garbled cores to instantiate. We plan to garble even larger problems using multiple FPGA-enabled AWS nodes. We also plan to consider how to efficiently use the multiple different types of memory available on FPGAs, including UltraRAM, which was introduced in Xilinx Ultrascale+ FPGAs. The memory management solutions will apply to problems with large datasets that are not accessed sequentially and that wish to exploit FPGAs in the data center.

ACKNOWLEDGEMENTS

We would like to thank Xin Fang for his contributions to this research. This research was supported by NSF grant CNS-1717213, and in part by a grant from Amazon Web Services, which supported our experimentation.

REFERENCES

[1] X. Fang, S. Ioannidis, and M. Leeser, "Secure function evaluation using an FPGA overlay architecture," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2017, pp. 257–266.
[2] S. U. Hussain and F. Koushanfar, "FASE: FPGA acceleration of secure function evaluation," in 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2019, pp. 280–288.
[3] Amazon, "Amazon EC2 F1 instances," 2017. [Online]. Available: https://aws.amazon.com/ec2/instance-types/f1/
[4] M. Gungor, K. Huang, X. Fang, S. Ioannidis, and M. Leeser, "Garbled circuits in the cloud using FPGA enabled nodes," in 2019 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2019, pp. 1–6.
[5] A. Yao, "How to generate and exchange secrets," in IEEE Symposium on Foundations of Computer Science (FOCS), 1986.
[6] M. Naor, B. Pinkas, and R. Sumner, "Privacy preserving auctions and mechanism design," in 1st ACM Conference on Electronic Commerce, 1999.
[7] V. Nikolaenko, S. Ioannidis, U. Weinsberg, M. Joye, N. Taft, and D. Boneh, "Privacy-preserving matrix factorization," in ACM CCS, 2013.
[8] M. Naor and B. Pinkas, "Efficient oblivious transfer protocols," in Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2001, pp. 448–457.
[9] V. Kolesnikov and T. Schneider, "Improved garbled circuit: Free XOR gates and applications," in ICALP, 2008.
[10] X. Wang and K. Nayak, "FlexSC," 2014. [Online]. Available: https://github.com/wangxiao1254/FlexSC
[11] D. Beaver, S. Micali, and P. Rogaway, "The round complexity of secure protocols," in Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing. ACM, 1990, pp. 503–513.
[12] B. Pinkas, T. Schneider, N. P. Smart, and S. C. Williams, "Secure two-party computation is practical," in ASIACRYPT, 2009.
[13] E. M. Songhori, S. U. Hussain, A.-R. Sadeghi, T. Schneider, and F. Koushanfar, "TinyGarble: Highly compressed and scalable sequential garbled circuits," in IEEE S&P, 2015.
[14] S. U. Hussain, B. D. Rouhani, M. Ghasemzadeh, and F. Koushanfar, "MAXelerator: FPGA accelerator for privacy preserving multiply-accumulate (MAC) on cloud servers," in Proceedings of the 55th Annual Design Automation Conference. ACM, 2018, p. 33.
[15] B. D. Rouhani, S. U. Hussain, K. Lauter, and F. Koushanfar, "ReDCrypt: Real-time privacy-preserving deep learning inference in clouds using FPGAs," ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 11, no. 3, p. 21, 2018.
[16] X. Fang, "Privacy preserving computations accelerated using FPGA overlays," Ph.D. dissertation, Northeastern University, 2017.
[17] X. Fang, S. Ioannidis, and M. Leeser, "Garbled circuits for preserving privacy in the datacenter," in International Workshop on Heterogeneous High-performance Reconfigurable Computing, 2016.
[18] L. Di Tucci, M. Rabozzi, L. Stornaiuolo, and M. D. Santambrogio, "The role of CAD frameworks in heterogeneous FPGA-based cloud systems," in 2017 IEEE International Conference on Computer Design (ICCD). IEEE, 2017, pp. 423–426.
[19] S. H. Abbas, J.-R. Lee, and Z. Kim, "FPGA-based design and implementation of data acquisition and real-time processing for laser ultrasound propagation," International Journal of Aeronautical and Space Sciences, vol. 17, no. 4, pp. 467–475, 2016.
[20] S. Karandikar, H. Mao, D. Kim, D. Biancolin, A. Amid, D. Lee, N. Pemberton, E. Amaro, C. Schmidt, A. Chopra et al., "FireSim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud," in Proceedings of the 45th Annual International Symposium on Computer Architecture. IEEE Press, 2018, pp. 29–42.
[21] M. Asiatici and P. Ienne, "Stop crying over your cache miss rate: Handling efficiently thousands of outstanding misses in FPGAs," in Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2019, pp. 310–319.