Efficient Multi-Key Homomorphic Encryption with Packed Ciphertexts with Application to Oblivious Neural Network Inference

Hao Chen1, Wei Dai1, Miran Kim2, and Yongsoo Song1

1 Research, Redmond, USA 2 UTHealth, Houston, USA {haoche, wei.dai, yongsoo.song}@microsoft.com [email protected]

Abstract. Homomorphic Encryption (HE) is a cryptosystem which supports computation on en- crypted data. L´opez-Alt et al. (STOC 2012) proposed a generalized notion of HE, called Multi-Key Homomorphic Encryption (MKHE), which is capable of performing arithmetic operations on ci- phertexts encrypted under different keys. In this paper, we present multi-key variants of two HE schemes with packed ciphertexts. We present new relinearization algorithms which are simpler and faster than previous method by Chen et al. (TCC 2017). We then generalize the bootstrapping techniques for HE to obtain multi-key fully homomorphic encryption schemes. We provide a proof-of-concept implementation of both MKHE schemes using Microsoft SEAL. For example, when the dimension of base ring is 8192, homomor- phic multiplication between multi-key BFV (resp. CKKS) ciphertexts associated with four parties followed by a relinearization takes about 116 (resp. 67) milliseconds. Our MKHE schemes have a wide range of applications in secure computation between multiple data providers. As a benchmark, we homomorphically classify an image using a pre-trained neural network model, where input data and model are encrypted under different keys. Our implementation takes about 1.8 seconds to evaluate one convolutional layer followed by two fully connected layers on an encrypted image from the MNIST dataset.

Keywords: Multi-key homomorphic encryption · Packed ciphertext · Ring learning with errors · Neural networks.

1 Introduction

As large amount of data are being generated and used for driving novel scientific discoveries, the effective and responsible utilization of large data remains to be a big challenge. This issue might be alleviated by outsourcing to public cloud service providers with intensive computing resources. However, there still remains a problem in privacy and security of outsourcing data and analysis. In the past few years, significant progresses have been made on cryptographic techniques for secure computation. Among the techniques for secure computation, Multi-Party Computation (MPC) and Homomorphic Encryption (HE) have received increasing attention due to technical breakthroughs. The history of MPC dates back three decades ago [50, 5], and since then it has been intensively studied in the theory community. In this approach, two or more parties participate in an interactive protocol to compute a function on their private inputs, where only the output of the function is revealed to the parties. Recent years witnessed a large body of works on improving the practical efficiency of MPC, and state-of-the-art protocols have achieved orders of magnitude improvements on performance (see e.g. [20, 49, 36]). However, these protocols are still inherently inefficient in terms of communication complexity: the number of bits that the parties need to exchange during the protocol is proportional to the product between the complexity of the function and the number of parties. Therefore, the high communication complexity remains the main bottleneck of MPC protocols. Moreover, the aforementioned MPC protocols may not be desirable for cloud-based applications, as all the parties involved need to perform local computation proportional to the complexity of the function. 2 H. Chen et al.

However, in practical use-cases, we cannot expect the data providers to either perform large amount of work or stay online during the entire protocol execution. Another model was proposed where the data owners secret-share their data with a small number of independent servers, who perform an MPC to generate the computation result [23, 44]. These protocols have good performance and they moved the burden from the data providers to the servers, but their privacy guarantees rely on the assumption that the servers do not collude. HE refers to a cryptosystem that allows computing on encrypted data without decrypting them, thus enabling securely outsourcing computation in an untrusted cloud. There have been significant technical advances on HE after Gentry’s first construction [24]. For example, one can encrypt multiple plaintext values into a single packed ciphertext, and use the single instruction multiple data (SIMD) techniques to perform operations on these values in parallel [48, 26]. Hence, HE schemes with packing techniques [7, 6, 22, 16] have good amortized complexity per plaintext value, and they have been applied to privacy- preserving big data analysis [39, 11, 37]. However, traditional HE schemes only allow computation on ciphertexts decryptable under the same secret key. Therefore, HE does not naturally support secure computation applications involving multiple data providers, each providing its own secret key.

Fig. 1. High-level overview of the application to oblivious neural network inference.

L´opez-Alt et al. [43] proposed a Multi-Key Homomorphic Encryption (MKHE) scheme, which is a cryptographic primitive supporting arithmetic operations on ciphertexts which are not necessarily decryptable to the same secret key. In addition to solving the aforementioned issues of HE, MKHE can be also used to design round-efficient MPC protocols with minimal communication cost [45]. In addition, an MPC protocol from MKHE satisfies the on-the-fly MPC [43] property, where the circuit to be evaluated can be dynamically decided after the data providers upload their encrypted data. Despite its versatility, MKHE has been seldom used in practice. Early studies [19, 45, 46] used a multi- key variant of the GSW scheme [28]. These constructions have large ciphertexts and their performance does not scale well with the number of parties. Previous works [8, 10] proposed MKHE schemes with short ciphertexts, with the caveat that one ciphertext encrypts only a single bit. The only existing MKHE scheme with packed ciphertexts [13, 41] is a multi-key variant of the BGV scheme [7]. Note that all the above studies were purely abstract with no implementation given, and it remains an open problem whether an MKHE scheme with support for SIMD operations can be practical.

1.1 Our Contributions

We design multi-key variants of the BFV [6, 22] and CKKS [16] schemes. We propose a new method for generating a relinearization key which is simpler and faster compared to previous technique in [13]. Furthermore, we adapt the state-of-the-art bootstrapping algorithms for these schemes [12, 14, 9] to the multi-key scenario to build Multi-Key Fully Homomorphic Encryptions with packed ciphertexts. Finally, we give a proof-of-concept implementation of our multi-key schemes using Microsoft SEAL [47] and Efficient MKHE with Packed Ciphertexts 3 present experimental results. To the best of our knowledge, this is the first practical implementation of MKHE schemes that support packed ciphertexts. We also present the first viable application of MKHE that securely evaluates a pre-trained convolu- tional neural network (CNN) model. We build an efficient protocol where a cloud server provides on-line prediction service to a data owner using a classifier from a model provider, while protecting the privacy of both data and model using MKHE. Our scheme with support for the multi-key operations makes it possible to achieve this at a low end-to-end latency, and near-optimal cost for the data and model providers, as shown in Fig. 1. The server can store numerous ciphertexts encrypted under different keys, but the computational cost of a certain task depends only on the number of parties related to the circuit. We note that our solution has the advantage over single-key HE since the ML model providers do not need to send the unencrypted model to the server.

1.2 Overview of Our Construction

n Let = Z[X]/(X +1) be the cyclotomic ring with a power-of-two dimension n, and si ∈ R be the secret of the i-th party. The starting point of the construction of a ring-based MKHE scheme is the requirement that the resulting scheme should be able to handle homomorphic computations on ciphertexts under independently generated secret keys. A ciphertext of our MKHE scheme associated to k different parties k+1 is of the form ct = (c0, c1, . . . , ck) ∈ Rq for a modulus q, which is decryptable by the concatenated secret sk = (1, s1, . . . , sk). In other words, its phase µ = hct, ski (mod q) is a randomized encoding of a plaintext message m corresponding to the base scheme. Homomorphic multiplication of BFV or CKKS consists of two steps: tensor product and relineariza- tion. The tensor product of two input ciphertexts satisfies hct1 ⊗ct2, sk⊗ski = hct1, ski· hct2, ski, so it is a valid encryption under the tensor squared secret sk ⊗ sk. In the relinearization step, we aim to transform (k+1)2 the extended ciphertext ct = ct1 ⊗ ct2 ∈ Rq into a canonical ciphertext encrypting the same message under sk. This step can be understood as a key-switching process which requires a special encryption of sk ⊗ sk. We note that sk ⊗ sk contains entries sisj which depend on secrets of two distinct parties. Hence a relinearization key corresponding to non-linear entries cannot be generated by a single party, different from the traditional HE schemes. We propose an RLWE-based cryptosystem to achieve this functionality. It looks similar to the ring variant of GSW [28, 21] but our scheme supports some operations between ciphertexts under different keys. Let g ∈ Zd be an integral vector, called the gadget vector. This scheme assumes the Common d Reference String (CRS) model so all parties share a random polynomial vector a ∈ Rq . Each party d×3 i generates a special encryption of secret si by itself, which is a matrix Di = [di,0|di,1|di,2] ∈ Rq satisfying di,0 + si · di,1 ≈ ri · g (mod q) and di,2 ≈ ri · a + si · g (mod q) where ri is a small polynomial sampled from the key distribution. It is published as the evaluation key of the i-th party. We present two relinearization methods with different advantages. For each pair 1 ≤ i, j ≤ k, the first method combines the i-th evaluation key Di with the j-th public key bj ≈ −sj · a (mod q) to generate d×3 Ki,j ∈ Rq such that Ki,j · (1, si, sj) ≈ sisj · g (mod q). That is, Ki,j can be used to relinearize one 0 0 0 0 0 0 0 entry ci,j of an extended ciphertext into a triple (c0, ci, cj) such that c0 + cisi + cjsj ≈ ci,jsisj (mod q). This method can be viewed as a variant of the previous GSW ciphertext extension proposed in [45]. In particular, each row of Ki,j consists of three polynomials in Rq (compared to O(k) dimension of previous 2 works [13, 41]), so that the bit size of a shared relinearization key {Ki,j}1≤i,j≤k is O(dk · n log q) and the complexity of key generation is O(d2k2) polynomial operations modulo q (see Section 3 for details). The 2 relinearization algorithm repeats O(k ) key-switching operations from sisj to (1, si, sj), so its complexity 2 is O(dk ) operations in Rq. We note that Ki,j can be pre-computed before multi-key operations, and a generated key can be reused for any computation related to the parties i and j. Our second approach directly linearizes each of the entries of an extended ciphertext by multiplying the j-th public key bj and i-th evaluation key Di in a recursive way. The first solution should generate and store a shared relinearization key {Ki,j}1≤i≤k, so its space and time complexity grow quadratically on k. However, the second algorithm allows us to keep only the individual evaluation keys which is linear on k. Furthermore, it significantly reduces the variance of additional noise from relinearization, so that we 4 H. Chen et al.

Space Time Scheme Type Complexity Type Complexity EvalKey O˜(k3n) EvalKey Gen O˜(k3n) LZY+19 [41] Ciphertext O˜(kn) Mult O˜(k3n) EvalKey O˜(k2n2) EvalKey Gen O˜(k2n2) CCS19 [10] Ciphertext O˜(kn) NAND O˜(k2n2) EvalKey O˜(kn) EvalKey Gen − This work Ciphertext O˜(kn) Mult O˜(k2n) Table 1. Memory (bit-size) and computational costs (number of scalar operations) of MKHE schemes. k denotes the number of parties and n is the dimension of the (R)LWE assumption. EvalKey denotes the evaluation (or bootstrapping) keys.

can use a smaller parameter while keeping the same functionality. Finally, we adapt the modulus raising technique [27, 38] to the second approach to reduce the noise growth even more. As an orthogonal issue, the bootstrapping of packed MKHE schemes has not been studied in the literature. We generalize the existing bootstrapping methods for HE schemes [25, 32, 12, 14, 9] to the multi-key setting. The main issue of generalization is that the pipeline of bootstrapping includes some ad- vanced functionalities such as slot permutation. We resolve this issue and provide all necessary operations by applying the multi-key-switching technique in [10] to Galois automorphism. Finally, we apply the state-of-art optimization techniques for implementing HE schemes [4, 30, 15] to our MKHE schemes for performance improvement. For example, we implement full Residue Number System (RNS) variants of MKHE schemes and use an RNS-friendly decomposition method [4, 30, 15] for relinearization, thereby avoiding expensive high-precision arithmetic. 1.3 Related Works L´opez-Alt et al. [43] firstly proposed an MKHE scheme based on NTRU. After that, Clear and Mc- Goldrick [19] suggested a multi-key variant of GSW together with ciphertext extension technique to design an MKHE scheme and it was simplified by Mukherjee and Wichs [45]. Peikert and Shiehian [46] developed two multi-hop MKHE schemes based on the same multi-key GSW scheme. However, these schemes could encrypt only a single bit in a huge extended GSW ciphertext. Brakerski and Perlman [8] suggested an MKHE scheme with short (LWE-based) ciphertexts, however, its asymptotic/concrete efficiency has not been presented clearly. Chen, Chillotti and Song [10] proposed an improved scheme by applying the framework of TFHE [17] with the first implementation of MKHE primitive. However, this scheme does not support the packing technique, thereby resulting in a large expansion rate similar to TFHE. The most relevant studies are by Chen et al. [13] and Li et al. [41]. They designed multi-key variants of BGV [7] by generating a relinearization key based on the multi-key GSW scheme. However, it consists of 2 O(k ) key-switching keys from sisj to the ordinary key each of which has O(k) components. In addition, they did not provide any implementation result or analysis about concrete performance. Our work is an extension of these researches in the sense that our relinearization method and other optimization techniques can be applied to BGV as well. We also stress that the performance of previous batch MKHE schemes can be improved by observing the sparsity of evaluation keys, but this point was not pointed out in the manuscripts. In Table 1, we provide the performance of the recent MKHE schemes; in the context of our work, the second method is taken into account for comparison.

2 Background

2.1 Notation All logarithms are in base two unless otherwise indicated. We denote vectors in bold, e.g. a, and matrices in upper-case bold, e.g. A. We denote by hu, vi the usual dot product of two vectors u, v. For a real Efficient MKHE with Packed Ciphertexts 5 number r, bre denotes the nearest integer to r, rounding upwards in case of a tie. We use x ← D to denote the sampling x according to distribution D. For a finite set S, U(S) denotes the uniform distribution on S. We let λ denote the security parameter throughout the paper: all known valid attacks against the cryptographic scheme under scope should take Ω(2λ) bit operations.

2.2 Multi-Key Homomorphic Encryption

A multi-key homomorphic encryption is a cryptosystem which allows us to evaluate an arithmetic circuit on ciphertexts, possibly encrypted under different keys. Let M be the message space with arithmetic structure. An MKHE scheme MKHE consists of five PPT algorithms (Setup, KeyGen, Enc, Dec, Eval). We assume that each participating party has a reference (index) to its public and secret keys. A multi-key ciphertext implicitly contains an ordered set T = {id1, . . . , idk} of associated references. For example, a fresh ciphertext ct ← MKHE.Enc(µ; pkid) corresponds to a single-element set T = {id} but the size of reference set gets larger as the computation between ciphertexts from different parties progresses. • Setup: pp ← MKHE.Setup(1λ). Takes the security parameter as an input and returns the public param- eterization. We assume that all the other algorithms implicitly take pp as an input . • Key Generation: (sk, pk) ← MKHE.KeyGen(pp). Outputs a pair of secret and public keys. • Encryption: ct ← MKHE.Enc(µ; pk). Encrypts a plaintext µ ∈ M and outputs a ciphertext ct ∈ {0, 1}∗.

• Decryption: µ ← MKHE.Dec(ct; {skid}id∈T ). Given a ciphertext ct with the corresponding sequence of secret keys, outputs a plaintext µ. • Homomorphic evaluation:

ct ← MKHE.Eval(C, (ct1,..., ct`), {pkid}id∈T ).

Given a circuit C, a tuple of multi-key ciphertexts (ct1,..., ct`) and the corresponding set of public keys {pkid}id∈T , outputs a ciphertext ct. Its reference set is the union T = T1 ∪ · · · ∪ T` of reference sets Tj of the input ciphertexts ctj for 1 ≤ j ≤ `.

Semantic Security. For any two messages µ0, µ1 ∈ M, the distributions {MKHE.Enc(µi; pk)} for i = 0, 1 should be computationally indistinguishable where pp ← MKHE.Setup(1λ) and (sk, pk) ← MKHE.KeyGen(pp). Correctness and Compactness. An MKHE scheme is compact if the size of a ciphertext relevant to k parties is bounded by poly(λ, k) for a fixed polynomial poly(·, ·).

For 1 ≤ j ≤ `, let ctj be a ciphertext (with reference set Tj) such that MKHE.Dec(ctj, {skid}id∈Tj ) = µj. ` Let C : M → M be a circuit and ct ← MKHE.Eval(C, (ct1,..., ct`), {pkid}id∈T ) for T = T1 ∪ · · · ∪ T`. Then,

MKHE.Dec(ct, {skid}id∈T ) = C(µ1, . . . , µ`) (1) with an overwhelming probability. The equality of (1) can be substituted by approximate equality similar to the CKKS scheme for approximate arithmetic [16].

2.3 Ring Learning with Errors

Throughout the paper, we assume that n is a power-of-two integer and R = Z[X]/(Xn + 1). We write Rq = R/(q · R) for the residue ring of R modulo an integer q. The Ring Learning with Errors (RLWE) assumption with parameter (n, q, χ, ψ) is that given any polynomial number of samples of the form 2 (ai, bi = s·ai +ei) ∈ Rq, where ai is uniformly random in Rq, s is chosen from the key distribution χ over Rq, and ei is drawn from the error distribution ψ over R, the bi’s are computationally indistinguishable from uniformly random elements of Rq. 6 H. Chen et al.

2.4 Gadget Decomposition

d −1 Let g = (gi) ∈ Z be a gadget vector and q an integer. The gadget decomposition, denoted by g , is a d d function from Rq to R which transforms an element a ∈ Rq into a vector u = (u0, . . . , ud−1) ∈ R of Pd−1 small polynomials such that a = i=0 gi · ui (mod q). The gadget decomposition technique is widely used in the construction of HE schemes. For exam- ple, homomorphic evaluation of a nonlinear circuit is based on the key-switching technique and most of HE schemes exploit various gadget decomposition method to control the noise growth. There have been suggested in the literature various decomposition methods such as bit decomposition [6, 7], base decom- position [21, 17] and RNS-based decomposition [4, 30]. Our implementation exploits an RNS-friendly decomposition for the efficiency.

3 Relinearizing Multi-key Ciphertexts

This section provides a high-level description of our MKHE schemes and explain how to perform the relinearization procedures which are core operations in homomorphic arithmetic.

3.1 Overview of HEs with Packed Ciphertexts

In recent years, there have been remarkable advances in the performance of HE schemes. For example, the ciphertext packing technique allows us to encrypt multiple data in a single ciphertext and perform parallel homomorphic operations in a SIMD manner. Currently the batch HE schemes such as BGV [7], BFV [6, 22] and CKKS [16] are the best-performing schemes in terms of amortized size and timing per plaintext slot. They adapt some DFT-like algorithms to transform a vector of plaintext values into an element of a cyclotomic ring. Let sk = (1, s) for the secret s ∈ R. A canonical RLWE-based ciphertext is of the form ct = (c0, c1) ∈ 2 Rq such that the inner product µ = hct, ski (mod q), called the phase, is a randomized encoding of a plaintext m. For example, the phase of a BFV ciphertext has the form of µ = (q/t)·m+e for the plaintext modulus t while the phase µ = m + e of CKKS is an approximate value of the plaintext. For homomorphic computation, we basically perform arithmetic operations between the phases of given ciphertexts. In particular, homomorphic multiplication of RLWE ciphertexts consists of two steps: tensor product and relinearization. For input ciphertexts ct1 and ct2, we first compute their tensor product and return the extended ciphertext ct = ct1 ⊗ct2 that satisfies hct, sk⊗ski = hct1, ski·hct2, ski. Since sk⊗sk contains the nonlinear entry s2, it requires to perform the relinearization procedure which transforms the extended ciphertext to a canonical ciphertext encrypting the same message. Roughly speaking, we publish a relinerization key which is some kind of ciphertext encrypting s2 under sk and run the key-switching algorithm for this conversion. In the multi-key case, a ciphertext related to k different parties is of the form ct = (c0, c1, . . . , ck) ∈ k+1 Rq which is decryptable by the concatenated secret sk = (1, s1, . . . , sk), i.e., its phase is computed Pk by µ = hct, ski = c0 + i=1 ci · si. If we follow the same pipeline for homomorphic operation as in the single-key setting, the tensor product step returns an extended ciphertext corresponding sk ⊗ sk. Hence, we need to generate a relinearization key which consists of multiple ciphertexts encrypting the entries si · sj of sk ⊗ sk. Different from the classical HE schemes, it requires some additional computations since the term si · sj depends on two secret keys which are independently generated by different parties. In the following, we will explain how to efficiently generate a relinearization key for multi-key homomorphic multiplication.

3.2 Basic Scheme

In this section, we present a ring-based scheme which will be used to generate some public material for relinearization. Efficient MKHE with Packed Ciphertexts 7

• Setup(1λ): For a given security parameter λ, set the RLWE dimension n, ciphertext modulus q, key d distribution χ and error distribution ψ over R. Generate a random vector a ← U(Rq ). Return the public parameter pp = (n, q, χ, ψ, a).

• KeyGen(pp): Sample the secret key s ← χ. Sample an error vector e ← ψd and set the public key as d b = −s · a + e (mod q) in Rq .

d×3 • UniEnc(µ; s): For an input plaintext µ ∈ R, generate a ciphertext D = [d0|d1|d2] ∈ Rq as follows:

1. Sample r ← χ. d d 2. Sample d1 ← U(Rq ) and e1 ← ψ , and set d0 = −s · d1 + e1 + r · g (mod q). d 3. Sample e2 ← ψ and set d2 = r · a + e2 + µ · g (mod q).

d The public parameter pp contains a randomly generated vector a ∈ Rq , so we are assuming the common reference string model. All parties should take the same public parameter as an input of the key- generation algorithm to support multi-key homomorphic arithmetic. We note that the same assumption was made in all the previous work on MKHE. The uni-encryption algorithm is a symmetric encryption which can encrypt a single ring element. d An uni-encrypted ciphertext D = [d0|d1|d2] ← UniEnc(µ; s) consists of three vectors in Rq so is (3/4) 2d×2 times as large as an ordinary RGSW ciphertext in Rq . For an uni-encrypted ciphertext D, the first two columns [d0|d1] can be viewed as an encryption of r under the secret s while [d2| − a] forms an encryption of µ under secret r.

Security. The uni-encryption scheme is IND-CPA secure under the RLWE assumption of parameter (n, q, χ, ψ). We prove it by showing that the distribution

{(a, b, D): pp = (n, χ, ψ, a) ← Setup(1λ), (s, b) ← KeyGen(pp), D ← UniEnc(µ; s)}

d d d×3 is computationally indistinguishable from the uniform distribution over Rq × Rq × Rq for an arbitrary µ ∈ R. We refer the reader to Appendix B for the proof of this result.

3.3 Relinearization

We revisit the relinearization procedure on extended ciphertexts and present two solutions with different k+1 advantages. We recall that the tensor product ct = ct1 ⊗ ct2 of two multi-key ciphertexts cti ∈ Rq encrypted under the concatenated secret sk = (1, s1, . . . , sk) can be viewed as a ciphertext corresponding to the tensor squared secret sk⊗sk. Note that sk⊗sk contains some nonlinear entries si ·sj related to two different parties. Therefore, the computing server should be able to transform the extended ciphertext (k+1)×(k+1) ct ∈ Rq into a canonical ciphertext by linearization of the non-linear entries si · sj. Our relinearization methods require the same public material (evaluation key) that is generated by individual parties as follows:

• EvkGen(s): Given the secret s ∈ R, return D ← UniEnc(s; s).

To be precise, each party i generates its own secret, public, and evaluation keys by running the algorithms (si, bi) ← KeyGen(pp) and Di ← EvkGen(si), then publishes the pair (bi, Di). For the rest of this section, we present two relinearization algorithms and explain their pros and cons. We make an additional circular security assumption since the evaluation key is an uni-encryption of secret s encrypted by itself. However, we stress that our assumption is no stronger than the same assumption in HE schemes [27, 22, 17, 16] requiring either bootstrapping or relinearization of ciphertexts. 8 H. Chen et al.

Algorithm 1 Relinearization method 1

Input: ct = (ci,j)0≤i,j≤k, rlk = {Ki,j}1≤i,j≤k. 0 0 k+1 Output: ct = (ci)0≤i≤k ∈ Rq . 0 1: c0 ← c0,0 2: for 1 ≤ i ≤ k do 0 3: ci ← c0,i + ci,0 (mod q) 4: end for 5: for 1 ≤ i, j ≤ k do 0 0 0 0 0 0 −1 6: (c0, ci, cj) ← (c0, ci, cj) + g (ci,j) · Ki,j (mod q) 7: end for

3.3.1 First Method This solution includes a pre-processing step which generates a shared relinearization key corresponding to the set of involved parties. A shared relinearization key consists of encryptions of si · sj for all pairs 1 ≤ i, j ≤ k. Then, we can linearize an extended ciphertext by applying a standard key-switching technique. This approach is similar to a method proposed in previous work [13, 41] which also generates a shared evaluation key. However, each element of our shared relinearization key is computed from the public information of at most two parties so consists of three vectors, while previous method based on the multi-key GSW scheme has O(k) dimensional entries. d×3 • Convert(Di, bj): It takes as the input a pair of an uni-encryption Di = [di,0|di,1|di,2] ∈ Rq and d a public key bj ∈ Rq generated by (possibly different) parties i and j. Let ki,j,0 and ki,j,1 be the d −1 −1 vectors in Rq such that ki,j,0[`] = hg (bj[`]), di,0i and ki,j,1[`] = hg (bj[`]), di,1i for 1 ≤ ` ≤ d, d×d −1 d i.e., [ki,j,0|ki,j,1] = Mj · [di,0|di,1] where Mj ∈ R is the matrix whose `-th row is g (bj[`]) ∈ R . Let d×3 ki,j,2 = di,2 and return the ciphertext Ki,j = [ki,j,0|ki,j,1|ki,j,2] ∈ Rq .

   −1        g (bj[1])    .        ki,j,0 ki,j,1  =  .  · di,0 di,1  , ki,j,2  = di,2  .           −1 g (bj[d])

• Relin(ct; {(Di, bi)}1≤i≤k): Given an extended ciphertext ct = (ci,j)0≤i,j≤k and k pairs of evalua- 0 k+1 tion/public keys {(Di, bi)}1≤i≤k, generate a ciphertext ct ∈ Rq as follows:

1. Compute Ki,j ← Convert(Di, bj) for all 1 ≤ i, j ≤ k and set the relinearization key as rlk = {Ki,j}1≤i,j≤k. 2. Run Alg. 1 to relinearize ct.

We note that the first step (generation of rlk) can be pre-computed on public information {(Di, bi)}1≤i≤k without taking a ciphertext as the input.

Correctness. We first claim that, if Di is an uni-encryption of µi ∈ R encrypted by the i-th party and bj is the public key of the j-th party, then the output Ki,j ← Convert(Di, bj) of the conversion algorithm is an encryption of µisj with respect to the secret (1, si, sj), i.e., ki,j,0 + si · ki,j,1 + sj · ki,j,2 ≈ µisj · g (mod q). It is derived from the following formulas:

ki,j,0 + si · ki,j,1 = Mj · (d0 + si · d1) ≈ Mj · rig = ribj (mod q),

sj · ki,j,2 = sj · d2 ≈ risj · a + µsj · g ≈ −r · bj + µisj · g (mod q).

Note that Mj, sj and ri should be small to hold the approximate equalities. We estimate the size of noise in Appendix C. Efficient MKHE with Packed Ciphertexts 9

Algorithm 2 Relinearization method 2

Input: ct = (ci,j)0≤i,j≤k, {(Di = [di,0|di,1|di,2], bi)}1≤i≤k. 0 0 k+1 Output: ct = (ci)0≤i≤k ∈ Rq . 0 1: c0 ← c0,0 2: for 1 ≤ i ≤ k do 0 3: ci ← c0,i + ci,0 (mod q) 4: end for 5: for 1 ≤ i, j ≤ k do 0 −1 6: ci,j ← hg (ci,j), bji (mod q) 0 0 0 0 −1 0 7: (c0, ci) ← (c0, ci) + g (ci,j) · [di,0|di,1] (mod q) 0 0 −1 8: cj ← cj + hg (ci,j), di,2i (mod q) 9: end for

We now show the correctness of our algorithm. Since the evaluation key Di of the i-th party is an 0 uni-encryption of µi = si, we obtain that Ki,j · (1, si, sj) ≈ sisj · g (mod q). From the definition of ct , we get

k 0 0 X 0 hct , ski = c0 + ci · si i=1 k k X X −1 = c0,0 + (c0,i + ci,0)si + g (ci,j) · Ki,j · (1, si, sj) (mod q) i=1 i,j=1 k k X X ≈ c0,0 + (c0,i + ci,0)si + ci,j · sisj = hct, sk ⊗ ski (mod q). i=1 i,j=1

3.3.2 Second Method

Our second solution does not generate a shared relinearization key different from the previous one. Instead, it directly linearizes each entry ci,j of an extended ciphertext ct = (ci,j)0≤i,j≤k by multiplying it to bj and Di in a recursive way.

• Relin(ct; {(Di, bi)}1≤i≤k): For given extended ciphertext ct = (ci,j)0≤i,j≤k and k pairs of evalua- 0 k+1 tion/public keys {(Di, bi)}1≤i≤k, generate a ciphertext ct ∈ Rq as described in Alg. 2.

We will analyze and compare two relinearization methods in the following section. In short, the second method has advantages in storage and noise growth while the first method could be faster if a shared evaluation key is used repeatedly to relinearize multiple ciphertexts corresponding to the same set of parties. We first show the correctness of the second method.

0 −1 Correctness. At each iteration of the second for-loop in Alg. 2, we compute ci,j = hg (ci,j), bji, then −1 0 −1 0 0 0 add g (ci,j) · [di,0|di,1] and hg (ci,j), di,2i to (c0, ci) and cj, respectively. We note that

−1 0 0 g (ci,j) · [di,0|di,1] · (1, si) ≈ ri · ci,j (mod q), and

−1 −1 0 hg (ci,j), di,2i · sj ≈ hg (ci,j), −ri · bj + sisj · gi = −ri · ci,j + ci,j · sisj (mod q). 10 H. Chen et al.

From the definition of ct0, we get

k 0 0 X 0 hct , ski = c0 + ci · si i=1 k k k X X −1 0 X −1 = c0,0 + (c0,i + ci,0)si + g (ci,j) · [di,0|di,1] · (1, si) + hg (ci,j), di,2i · sj (mod q) i=1 i,j=1 i,j=1 k k X X ≈ c0,0 + (c0,i + ci,0)si + ci,j · sisj = hct, sk ⊗ ski (mod q). i=1 i,j=1

3.3.3 Performance of Relinearization Algorithms Suppose that there are k different parties involved in a multi-key computation. For relinearizing an (k+1)2 extended ciphertext ct = (ci,j)0≤i,j≤k ∈ Rq , both of our relinearization methods repeat some com- putations on each ci,j to switch its corresponding secret si · sj into (1, si, sj). So we will focus on a single step (i, j) of each solution to compare their performance. In our first method, a computing party generates a shared relinearization key Ki,j and uses it to linearize an input extended ciphertext. The generation of Ki,j includes a multiplication between d×d and 2 −1 d×2 matrices so its complexity is 2d polynomial multiplications. However, the computation of g (ci,j)· Ki,j in Step 6 of Alg. 1 requires only 3d polynomial multiplications. Meanwhile, the second method does not have any pre-processing but a single iteration of Alg. 2 requires 4d polynomial multiplications. As a result, the first method can be up to (4/3) times faster when one performs multiple homomorphic arithmetic on the same set (or its subset) of parties using a pre-computed shared relinearization key, however, the required storage grows quadratically on k compared to the linear memory of the second method. The second method also has an advantage in noise management, which we will discuss below together with modulus raising technique.

3.3.4 Special Modulus Technique Noise growth is the main factor determining the parameter size and thereby overall performance of a cryptosystem. In general, we can use a large decomposition degree d to reduce the size of a decomposed vector g−1(·) as well as key-switching error, but this naive method causes performance degradation. In addition, the benefit of this trade-off between noise growth and computational complexity gets smaller and smaller as d increases. Therefore, this method is not the best option when we should have a small noise. The special modulus (a.k.a. modulus raising) technique proposed in [27, 38] is one attractive solution to address this noise problem with a smaller overhead. Roughly speaking, it raises the ciphertext modulus from q to pq for an integer p called special modulus, and then computes the key-switching procedure over Rpq followed by modulus reduction back to q. The main advantage of this method is that a key-switching error is decreased by a factor of about p due to the modulus reduction. We apply this technique to our relinearization and encryption algorithms. In particular, a special modulus variant of relinearization requires two sequential modulus switching operations (see Appendix A for details). (k+1)2 We recall that for an extended ciphertext ct ∈ Rq , the goal of relinearization is to generate a 0 k+1 0 ciphertext ct ∈ Rq such that hct , ski = hct, sk⊗ski+elin for some error elin, which should be minimized for efficiency. We refer the reader to Appendix C which provides a noise analysis based on the variance of polynomial coefficients, but we present a concise summary in this section. −1 Let u be a uniform random variable over Rq. We consider its decomposition g (u) and denote by Vg the average of variances of its coefficients. We respectively estimate the variance of a relinearization error from our first and second methods:

2 2 2 2 2 2 2 2 V1 ≈ k n σ · d Vg ,V2 ≈ k n σ · dVg. Efficient MKHE with Packed Ciphertexts 11

In addition, the special modulus variant of the second method achieves a smaller noise whose variance is 1 V 0 = p−2 · V + (k2 + k)n. 2 2 24 Compared to the first method, our second solution has significant advantages in practice because we may use an efficient decomposition method with a small d while obtaining the same level of noise growth. Furthermore, its modulus raising variant obtains an even smaller error variance which is not 0 nearly affected by the size of decomposition since V2 is dominated by the second term (rounding error) when we introduce a special modulus p which can cancel out the term V2.

4 Two MKHE Schemes with Packed Ciphertexts

In this section, we present multi-key variants of the BFV [6, 22] and CKKS [16] schemes. They share the following setup and key generation phases but have different algorithms for message encoding and homomorphic operations. • MKHE.Setup(1λ): Run Setup(1λ) and return the parameter pp.

• MKHE.KeyGen(pp): Each party i generates secret, public and evaluation keys by (si, bi) ← KeyGen(pp) and Di ← EvkGen(si), respectively. Encryption, decryption and homomorphic arithmetic of our MKHE schemes are described in the next subsections. We have a common pre-processing when performing a homomorphic operation between ki+1 ciphertexts. For given ciphertexts cti ∈ Rq , we denote k ≥ max{k1, k2} the number of parties involved in either ct1 or ct2. We rearrange the entries of cti and pad zeros in the empty entries to generate some ∗ ciphertexts cti sharing the same secret sk = (1, s1, . . . , sk). To be precise, a ciphertext cti = (c0, c1, . . . , cki ) ki corresponding to the tuple of parties (id1, . . . , idk ) ∈ {1, 2, . . . , k} is converted into the ciphertext i ( ∗ ∗ ∗ ∗ k+1 ∗ ∗ cj if i = idj for some 1 ≤ j ≤ ki; ct = (c , c , . . . , c ) ∈ R which is defined as c = c0 and c = i 0 1 k q 0 i 0 otherwise, for 1 ≤ i ≤ k. We remark that

hct , (1, s , . . . , s )i = hct∗, (1, s , . . . , s )i. i id1 idki 1 k For simplicity, we will assume that this pre-processing is always done before homomorphic arithmetic so that two input ciphertexts are related to the same set of k parties. Security and Correctness. We recall that BFV and CKKS are IND-CPA secure under the RLWE assumption of parameter (n, q, χ, ψ). Our MKHE schemes have exactly the same single-key encryption algorithms so that their security relies on the hardness of the same RLWE problem (see Appendix B for details). We will briefly show the correctness of our schemes in this section but the rigorous proofs with noise analysis will be provided in Appendix C.

4.1 Multi-Key BFV The BFV scheme [6, 22] is a scale-invariant HE which supports exact computation on a discrete space with a finite characteristic. We denote by t the plaintext modulus and ∆ = bq/te be the scaling factor of the BFV scheme. The native plaintext space is the set of cyclotomic polynomials Rt, but a plaintext is decoded to a tuple of finite field elements via a ring isomorphism from Rt depending on the relation of t and n [48].

• MK-BFV.Enc(m; b, a): This is the standard BFV encryption which takes a polynomial m ∈ Rt as the 2 input. Let a = a[0] and b = b[0]. Sample v ← χ and e0, e1 ← ψ. Return the ciphertext ct = (c0, c1) ∈ Rq where c0 = v · b + ∆ · m + e0 (mod q) and c1 = v · a + e1 (mod q). k+1 • MK-BFV.Dec(ct; s1, . . . , sk): Let ct = (c0, c1, . . . , ck) ∈ Rq be a ciphertext associated to k parties and   s1, . . . , sk be their secret keys. Set sk = (1, s1, . . . , sk) and compute (t/q) · hct, ski (mod t). 12 H. Chen et al.

k+1 0 • MK-BFV.Add(ct1, ct2): Given two ciphertexts cti ∈ Rq , return the ciphertext ct = ct1 + ct2 (mod q). k+1 • MK-BFV.Mult(ct1, ct2; {(Di, bi)}1≤i≤k): Given two ciphertexts cti ∈ Rq , compute ct = b(t/q) · (ct1 ⊗ ct2)e (k+1)2 0 (mod q) ∈ Rq and return the ciphertext ct ← Relin(ct; {(Di, bi)}1≤i≤k). The correctness of our scheme is obtained from the properties of the basic BFV and relinearization k+1 algorithm. A multi-key BFV encryption of m ∈ Rt is a vector ct = (c0, c1, . . . , ck) ∈ Rq such that hct, ski ≈ ∆ · m (mod q) for the secret sk = (1, s1, . . . , sk). So the decryption algorithm can recover m correctly. If ct1 and ct2 are encryptions of m1 and m2 with respect to the secret sk = (1, s1, . . . , sk), then their (scaled) tensor product ct = b(t/q) · (ct1 ⊗ ct2)e (mod q) satisfies hct, sk ⊗ ski ≈ ∆ · m1m2 (mod q) similar to the ordinary BFV scheme. The output ct0 ← Relin(ct; rlk) holds hct0, ski ≈ hct, sk ⊗ ski ≈ ∆ · m1m2 (mod q).

4.2 Multi-Key CKKS The CKKS scheme [16] is a leveled HE scheme with support for approximate fixed-point arithmetic. We QL assume q = i=0 pi for some integers pi to have a chain of ciphertext moduli q0 < q1 < ··· < qL for Q` q` = i=0 pi. The native plaintext is a small polynomial m ∈ R, but one can pack at most (n/2) complex numbers in a single polynomial via DFT. In addition to the basic arithmetic operations, it supports the rescaling algorithm to control the magnitude of encrypted message. For homomorphic operations between ciphertexts at different levels, it requires to transform a high-level ciphertext to have the same level as the other. • MK-CKKS.Enc(m; b, a): Let m ∈ R be an input plaintext and let a = a[0] and b = b[0]. Sample v ← χ and 2 e0, e1 ← ψ. Return the ciphertext ct = (c0, c1) ∈ Rq where c0 = v · b + m + e0 (mod q) and c1 = v · a + e1 (mod q). • . (ct; s , . . . , s ): Let ct = (c , c , . . . , c ) ∈ Rk+1 be a ciphertext at level ` associated to k MK-CKKS Dec 1 k 0 1 k q` parties and s1, . . . , sk be their secret keys. Set sk = (1, s1, . . . , sk) and return hct, ski (mod q`). • . (ct , ct ): Given two ciphertexts ct ∈ Rk+1 at level `, return the ciphertext ct0 = ct + ct MK-CKKS Add 1 2 i q` 1 2 (mod q`). • . (ct , ct ; {(D , b )} ): Given two ciphertexts ct ∈ Rk+1 at level `, compute ct = MK-CKKS Mult 1 2 i i 1≤i≤k i q` 2 ct ⊗ ct (mod q ) ∈ R(k+1) and return the ciphertext ct0 ← (ct; {(D , b )} ) ∈ Rk+1. The 1 2 ` q` Relin i i 1≤i≤k q` relinearization algorithm is defined over modulus q = qL, but we compute the same algorithm modulo q` for level-` ciphertexts. • . (ct): Given a ciphertext ct = (c , c , . . . , c ) ∈ Rk+1 at level `, compute c0 = p−1 · c  MK-CKKS Rescale 0 1 k q` i ` i for 0 ≤ i ≤ k and return the ciphertext ct0 = (c0 , c0 , . . . , c0 ) ∈ Rk+1 . 0 1 k q`−1

A level-` multi-key encryption of a plaintext m with respect to the secret sk = (1, s1, . . . , sk) is a vector ct = (c , c , . . . , c ) ∈ Rk+1 satisfying hct, ski ≈ m (mod q ). For basic homomorphic operation, 0 1 k q` ` we take as input level-` encryptions of m1 and m2. Then, homomorphic addition (resp. multiplication) 0 0 returns a ciphertext ct such that [hct , ski]q` is approximately equal to m1 + m2 (resp. m1m2). Finally, we show that for a level-` encryption ct of m, the rescaling algorithm returns a ciphertext ct0 at level −1 0 −1 (` − 1) encrypting p` · m from the equation [hct , ski]q`−1 ≈ p` · [hct, ski]q` .

4.3 Distributed Decryption In the classical definition of MKHE primitive, all the secrets of the parties involved are required to decrypt a multi-key ciphertext. In practice, however, it is not reasonable to assume that there is a party holding multiple secret keys. Instead, we can ‘imagine a protocol between several key owners to jointly decrypt a ciphertext. The decryption algorithms of our schemes are (approximate) linear combinations of secrets with known coefficients, and there have been proposed some secure methods for this task. We introduce Efficient MKHE with Packed Ciphertexts 13 one simple solution based on the noise flooding technique, but any secure solution achieving the same functionality can be used. The distributed decryption consists of two algorithms: partial decryption and merge. In the first phase, each party i receives the i-th entry of a ciphertext and decrypts it with a noise. We set the noise distribution φ which has a larger variance than the standard error distribution ψ of basic scheme. Then, we merge partially decrypted results with c0 to recover the message.

• MKHE.PartDec(ci, si): Given a polynomial ci and a secret si, sample an error ei ← φ and return µi = ci · si + ei (mod q). Pk • MK-BFV.Merge(c0, {µi}1≤i≤k): Compute µ = c0 + i=1 µi (mod q) and return m = b(t/q) · µe. Pk • MK-CKKS.Merge(c0, {µi}1≤i≤k): Compute and return µ = c0 + i=1 µi (mod q).

For a multi-key ciphertext ct = (c0, . . . , ck), both multi-key BFV and CKKS schemes compute µ = Pk Pk c0 + i=1 µi = hct, ski+ i=1 ei ≈ hct, ski (mod q) in the merge phase. Then, BFV extracts the plaintext by cancelling the scaling factor (q/t).

5 Bootstrapping for two MKHE schemes

There have been several studies on the bootstrapping procedures of the standard (single-key) ring-based HE schemes [25, 32, 12, 14, 9]. Previous work had different goals and solutions depending on the under- lying schemes but they are basically following the Gentry’s technique [24] – homomorphic evaluation of the decryption circuit. In particular, the BFV and CKKS schemes have a very similar pipeline for boot- strapping which consists of four steps: (1) Modulus Raise, (2) Coeff to Slot, (3) Extraction and (4) Slot to Coeff. The second and last steps are specific linear transformations, which require rotation operations on encrypted vectors. For the rest of this section, we first explain how to perform the rotation operation on multi-key cipher- texts based on the evaluation of Galois automorphisms. Then, we revisit the bootstrapping procedures for BFV and CKKS to generalize the existing solutions to our MKHE schemes.

5.1 Homomorphic Evaluation of Galois Automorphisms

The Galois group Gal(Q[X]/(Xn + 1)) of a cyclotomic field consists of the transformation X 7→ Xj for ∗ j ∈ Z2n. We recall that BFV (resp. CKKS) uses the DFT on Rt (resp. R) to pack multiple plaintext values into a single polynomial. As noted in [26], these automorphisms provide special functionalities on packed ciphertext such as rotation of plaintext slots. The evaluation of an automorphism can be done based on the key-switching technique. In some j more details, let τj : a(X) 7→ a(X ) be an element of the Galois group. Given an encryption ct = k+1 (c0, c1, . . . , ck) ∈ Rq of m, we denote by τj(ct) = (τj(c0), . . . , τj(ck)) the ciphertext obtained by taking τj to the entries of ct. Then τj(ct) is a valid encryption of τj(m) corresponding the secret key τj(sk). We then perform the key-switching procedure from τj(sk) back to sk, so as to generate a new ciphertext encrypting the same message under the original secret key sk. In the following, we present two algorithms for the evaluation of the Galois element. The first algorithm generates an evaluation key for the Galois automorphism τj. The second algorithm gathers the evaluation keys of multiple parties and evaluates τj on a multi-key ciphertext using the multi-key-switching technique proposed in [10]. d 0 d • MKHE.GkGen(j; s): Generate a random vector h1 ← U(Rq ) and an error vector e ← ψ . For an RLWE 0 secret s ∈ R, compute h0 = −s · h1 + e + τj(s) · g (mod q). Return the Galois evaluation key as d×2 gk = [h0|h1] ∈ Rq .

• MKHE.EvalGal(ct; {gki}1≤i≤k): Let gki = [hi,0|hi,1] be the Galois evaluation key of the i-th party for 1 ≤ k+1 0 0 0 i ≤ k. Given a ciphertext ct = (c0, . . . , ck) ∈ Rq , compute and return the ciphertext ct = (c0, . . . , ck) 14 H. Chen et al. by

k 0 X −1 c0 = τj(c0) + hg (τj(ci)), hi,0i (mod q), and i=1 0 −1 ci = hg (τj(ci)), hi,1i (mod q) for 1 ≤ i ≤ k.

In the context of CKKS, all the computations are carried out over modulus q = q` for level-` ciphertext. We now show the correctness of our algorithms. 0 0 0 Correctness. From the definition, the output ciphertext ct = (c0, . . . , ck) ← MKHE.EvalGal(ct; {gki}1≤i≤k) holds

k k 0 0 X 0 X −1 −1 hct , ski = c0 + ci · si = τj(c0) + hg (τj(ci)), hi,0i + hg (τj(ci)), hi,1i · si (mod q) i=1 i=1 k X −1  ≈ τj(c0) + hg (τj(ci)), τj(si) · gi = τj(ct), τj(sk) = τj hct, ski (mod q), i=1 as desired. In other words, if the input ciphertext has the phase µ(X) = hct, ski (mod q), then the phase j of the output ciphertext is approximately equal to τj(µ(X)) = µ(X ). Besides the rotation of plaintext slots, we can evaluate the Frobenius endomorphism X 7→ Xt on BFV ciphertexts using the same technique. In the case of CKKS, the map X 7→ X−1 corresponds to the complex conjugation over plaintext slots. Any linear transformation can be represented as a linear combination of shifted plaintext vectors. We note that previous HE optimization techniques [25, 32, 9] for linear transformations can be directly applied to our MKHE schemes.

5.2 Bootstrapping for Multi-Key BFV Chen and Han [12] described a bootstrapping procedure for the single-key BFV scheme, which follows the paradigm of [32], done for BGV scheme. The bootstrapping procedure in [12] takes as input a ciphertext with an arbitrary noise and outputs another ciphertext with a low noise encrypting the same plaintext. Below we present a multi-key variant of [12].

1. The previous work [12] published an encryption of the secret key by itself to raise the modulus. However, we observe that this step can be done by multiplying a constant without extra information. Suppose that the input ciphertext ct encrypts a message m with a plaintext modulus t, i.e., hct, ski = q 0 t m + e (mod q) for some error e. Then we perform a modulus-switching down to a divisor q of q, 0 q0 0 0 0 resulting in hct , ski = t m + e (mod q ), then multiply the ciphertext with q/q and get a ciphertext 00 00 q  q0 0 ct whose phase is hct , ski = q0 · t m + e (mod q). This is a trivial (noise free) encryption of q0 0 0 µ = t m + e with plaintext modulus q and ciphertext modulus q. 2. It computes a homomorphic linear transform which produces multiple ciphertexts holding the coeffi- cients µi ∈ Zt of µ in their plaintext slots. We note that this step can be done using the additions, scalar multiplications and multi-key rotations. 3. We homomorphically evaluate a polynomial, called lower digits removal [12], on the multi-key cipher- texts obtained in previous step. It removes the noise e0 and leaves the coefficients of m in plaintext slots. 4. The final step is another linear transformation which inverts the second step and outputs an encryp- tion of m.

q 00 As a consequence, the output ciphertext has the phase t m + e (mod q) for an error which is smaller than the initial noise e. Efficient MKHE with Packed Ciphertexts 15

5.3 Bootstrapping for Multi-Key CKKS Cheon et al. [14] presented a bootstrapping procedure for the single-key CKKS scheme and its performance was improved in the follow-up research [9]. The bootstrapping procedure of CKKS aims to refresh a low- level ciphertext and return an encryption of the (almost) same messages in a larger ciphertext modulus. We describe its multi-key version as follows.

1. The first step takes a lowest-level ciphertext ct as an input. Let µ = hct, ski (mod q0). Then hct, ski = q0 · I + µ for a small I ∈ R, so ct can be considered as an encryption of t = q0 · I + µ in the largest ciphertext modulus qL. 2. We apply a homomorphic linear transformation to compute one or two ciphertexts encrypting the coefficients of t(X) in their plaintext slots. This step requires multi-key rotation and conjugation described in Section 5.1. 3. We evaluate a polynomial which approximates the reduction modular q0 function. It removes the I part of t and leaves coefficients of µ in the slots. 4. Finally, we apply the inverse linear transformation of the second step to pack all the coefficients of µ back into a ciphertext. The output ciphertext ct0 encrypts the same plaintext µ in a higher level than the input ciphertext 0 ct, i.e., hct , ski ≈ µ (mod q`) for some 0 < ` < L.

6 Implementation

We provide a proof-of-concept implementation to show the performance of our MKHE schemes. Our source code is developed in C++ with Microsoft SEAL version 3.2.0 [47] which includes BFV and CKKS implementations. We summarize our optimization techniques, recommended parameter sets, and some ex- perimental results in this section. Finally, we apply the multi-key CKKS scheme to evaluate an encrypted neural network model on encrypted data and report the experimental result to classify handwritten im- ages on the MNIST dataset [40]. All experiments are performed on a ThinkPad P1 laptop: Intel Xeon E-2176M @ 4.00 GHz single-threaded with 32 GB memory, compiled with GNU C++ 7.3.0 (-O2). In our implementation, we set the secret distribution χ as the uniform distribution over the set of polynomials in R whose coefficients are in {0, ±1}. Each coefficient of an error e ← ψ is drawn according to the discrete Gaussian distribution centered at zero with standard deviation σ = 3.2.

6.1 Optimization Techniques 6.1.1 Basic Optimizations. In the relinearization process, we first compute the tensor product of two ciphertexts which corresponds to the tensor squared secret sk ⊗ sk. It has duplicated entries at (i, j) and (j, i), so we can reduce its 2 1 dimension from (k + 1) down to 2 k(k + 1). Both the size of the relinearization key and complexity of the 2 algorithm are almost halved. Furthermore, each of the diagonal entries si of sk ⊗ sk depends on a single 2 party, so we can include a key-switching key for si in the generation of an evaluation key. It increases the size of evaluation keys but reduces the complexity and noise of relinearization.

6.1.2 RNS and NTT

Our schemes are designed on the ring structure Rq, so we need to optimize the basic polynomial arithmetic. QL There is a well-known technique to use an RNS by taking a ciphertext modulus q = i=0 pi which is a QL product of coprime integers. Based on the ring isomorphism Rq → i=0 Rpi , a 7→ (a (mod pi))0≤i≤L, we achieve asymptotic/practical improvements in polynomial arithmetic over Rq. In particular, it has been studied how to design full-RNS variants of BFV and CKKS [4, 30, 15], which do not require any RNS conversions. In addition, each of base prime can be chosen properly so that there exists a (2n)-th root of unity modulo pi. It allows us to exploit an efficient Number Theoretic Transformation (NTT) modulo pi. Our implementation adapts these techniques to improve the speed of polynomial arithmetic. 16 H. Chen et al.

6.1.3 Gadget Decomposition.

As mentioned before, the gadget decomposition has a major effect on the performance of homomor- P phic arithmetic. Bajard et al. [4] observed that the formula a = gi · [a]p (mod q) where gi =   i i Q −1 Q  j6=i pj · j6=i pj can be used to build an RNS-friendly decomposition a 7→ ([a]pi )i with the pi gadget vector g = (gi)i. We adapt this decomposition method and take an advantage of an RNS-based implementation by storing ciphertexts in the RNS form. In [4, 30], the authors further combined this method with the classical digit decomposition method to provide a more fine-grained control of the trade-off between complexity and noise growth. However, we realize that this hybrid method increases the decomposition degree (and thereby space and computational complexity) several times, and the special modulus technique described in Section 3.3.4 provides a much better trade-off. Therefore, the digit decomposition is not used in our implementation.

6.2 Micro-benchmarks for MKHE Schemes

Table 2 illustrates the selected parameter sets used in experiments. They are default parameter sets in Microsoft SEAL which provide at least 128-bit of security level according to LWE-estimator [3] and HE security standard [2]. Generation time and size of secret keys, and execution time of encryption are the 1 same as those in single-key BFV and CKKS. Decryption and ciphertext addition take 2 (k + 1) times longer than the ordinary HE schemes. We remark that generation time and size of public and evaluation keys {(bi, Di)}1≤i≤k do not depend on the number of parties or the scheme because the generation can be executed in a synchronous way.

Parameter Public Key Evaluation Key 0 ID n dlog qe dlog pie # pis Size Gen. Size Gen. I 213 218 49–60 4 0.75 MB 3 ms 2.25 MB 8 ms II 214 438 53–60 8 7 MB 24 ms 21 MB 59 ms III 215 881 54–60 16 60 MB 195 ms 180 MB 470 ms

Table 2. Proposed parameter sets. log q and log pi denote the bit lengths of the largest RLWE modulus and 0 individual RNS primes, respectively. # pis denotes the number of RNS primes. Public keys’ and evaluations keys’ generation times and sizes are those of each party. ms = 10−3 sec.

In our experiments, a homomorphic multiplication is always followed by a relinearization procedure. BFV requires more NTTs than CKKS overall to perform these operations, and is therefore slower, which is confirmed by the timing results. The execution times of multiplications in both MKHE schemes are asymptotically quadratic in the number of parties as discussed in Section 3.3. In practice, they are better than quadratic, as reported in Table 3. This is because both multiplication and relinearization include a notable portion of compu- tation that is linear in the number of parties. The execution times of homomorphic evaluation of Galois automorphisms are almost linear on the number of the parties as described in Section 5.1. For the single-party scenario in Table 3, we measured performance of our modified Microsoft SEAL [47] with a special modulus. It is infeasible to fairly compare the ordinary BFV and CKKS with their multi- key variants because the performance of a scheme can be analyzed from various perspectives: space/time complexity, noise growth, functionality, etc. It is provided merely as a reference point, for a more portable estimation of MKHE on different processors. Efficient MKHE with Packed Ciphertexts 17

Mult + Relin EvalGal ID #Parties BFV CKKS BFV CKKS 1 20 ms 8 ms 3 ms 4 ms 2 44 ms 22 ms 7 ms 8 ms I 4 116 ms 67 ms 14 ms 16 ms 8 365 ms 229 ms 28 ms 31 ms 1 110 ms 59 ms 22 ms 24 ms 2 257 ms 165 ms 47 ms 49 ms II 4 717 ms 521 ms 88 ms 95 ms 8 2, 350 ms 1, 845 ms 176 ms 193 ms 1 675 ms 465 ms 170 ms 172 ms 2 1, 715 ms 1, 364 ms 333 ms 359 ms III 4 5, 025 ms 4, 287 ms 646 ms 711 ms 8 17, 450 ms 15, 159 ms 1, 332 ms 1, 413 ms Table 3. Execution time that depends on the number of parties. Multiplication is always followed by a relin- earization in MKHE. ms = 10−3 sec.

6.3 Application to Oblivious Neural Network Inference Jiang et al. [34] proposed a novel framework to test encrypted neural networks on encrypted data in the single-key scenario. We consider the same service paradigm but in a multi-key setting: the data and trained model are encrypted under different keys.

6.3.1 Homomorphic Evaluation of CNN We present an efficient strategy to evaluate CNN prediction model on the MNIST dataset. Each image is a 28×28 pixel array and will be labeled with 10 possible digits after an arbitrary number of hidden layers. We assume that a neural network is trained with the plaintext dataset in the clear. Table 4 describes our neural network topology which uses one convolution layer and two fully-connected (FC) layers with square activation function. The final step is to apply the softmax activation function for a purpose of probabilistic classification, so it is enough to obtain an index of maximum values of outputs in a prediction phase. Our objective is to predict a single image in an efficient way, thereby achieving a low latency. In Appendix D, we describe the detailed algorithms for encryption and evaluation.

Layer Description Convolution Input image 28 × 28, window size 4 × 4, stride (2, 2), number of output channels 5 1st square Squaring each of the 845 inputs FC-1 Fully connecting with 845 inputs and 64 outputs 2nd square Squaring each of the 64 inputs FC-2 Fully connecting with 64 inputs and 10 outputs Table 4. Description of our CNN to the MNIST dataset.

The Convolutional Layer. As noted in [35], strided convolution can be decomposed into a sum of simple convolutions (i.e., the stride parameter = 1). From our choice of the parameters, each of such simple convolutions takes as inputs 14 × 14 images and 2 × 2 filters. This representation allows more SIMD parallelism, since we can pack all the inputs into a single ciphertext and perform four simple 18 H. Chen et al. convolutions in parallel. Once this is done, we can accumulate the results across plaintext slots using rotate-and-sum operations in [31]. Moreover, we can pack multiple channels in a single ciphertext as in [35, Section VI.D], yielding in a fully-packed ciphertext of the convolution result. The First Square Layer. This step applies the square activation function to all the encrypted output of the convolutional layer in an SIMD manner.

The FC-1 Layer. In general, an FC layer with ni inputs and no outputs can be computed as a matrix- vector multiplication. Let W and v be the no × ni weight matrix and ni-length vector, respectively. We assume that ni and no are smaller than the number of plaintext slots, and no is much lower than ni in the context of FC layers. Halevi and Shoup [31] presented the diagonal encoding method which puts a square matrix in diagonal order, multiplies each of them with a rotation of the input vector, and then accumulates all the output vectors to obtain the result. Juvekar et al. [35] extended the method to multiply a vector by a rectangular matrix. If the input vector is encrypted in a single ciphertext, the total complexity is no homomorphic multiplications, (no − 1) rotations of the input ciphertext of v, and log(ni/no) rotations for rotate-and-sum algorithm. We extend their ideas to split the original matrix W into smaller sized blocks and perform computation on the sub-matrices as shown in Fig. 2. Suppose that the vector v is split into ` many sub-strings with the same length. For simplicity, we consider the first ` rows of W. We first apply the diagonal method to arrange the 1 × (ni/`) sized sub-matrices of W in a way that intermediate numbers are aligned in the same position across multiple slots after homomorphic multiplications. To be precise, the encryptions of diagonal components are multiplied with ` rotations of the encrypted vector and all these encryptions are added together similar to the diagonal method. Then, the output ciphertext represents (ni/`)-sized ` chunks, each containing partial sums of ` entries of ni inputs. Finally, we can accumulate these using a rotate-and-sum algorithm with log(ni/`) rotations. As a consequence, the output ciphertext encrypts the first ` many entries of Wv. We repeat this procedure for each ` many rows of W, resulting in (no/`) ciphertexts.

Fig. 2. Our matrix-vector multiplication algorithm (` = 2).

When ni is significantly smaller than the number of plaintext slots ns, the performance can be im- proved by packing multiple copies of the input vector into a single ciphertext and performing (ns/ni) aforementioned operations in parallel. The computational cost is (no · ni)/ns homomorphic multiplica- tions, (` − 1) rotations of the input ciphertext of v, and (no · ni)/(ns · `) · log(ni/`) rotations. We provide additional details in Appendix D.2. As a result, our method provides a trade-off between rotations on the same input ciphertext (which can benefit from the hoisting optimization of [33]) and rotations on distinct ciphertexts (which cannot benefit from hoisting). As described in Fig. 2, all slots except the ones corresponding to the result components may reveal information about partial sums. We therefore multiply the output ciphertexts by a constant zero-one plaintext vector to remove the information. Efficient MKHE with Packed Ciphertexts 19

The Second Square Layer. This step applies the square activation function to all the output nodes of the first FC layer. The FC-2 Layer. This step performs a multiplication with small sized weight matrix U and vector v. As discussed in [31], it can be considered as the linear combination of U’s columns using coefficients from v. Suppose that the column vectors are encrypted in a single ciphertext in such a way that they are aligned with the encrypted vector. We first repeatedly rotate the encryption of the vector to generate a single ciphertext with no copies of each entry. Then, we apply pure SIMD multiplication to multiply each column vector by the corresponding scalar of the vector in parallel. Finally, we aggregate all the resulting columns over the slots to generate the final output.

6.3.2 Performance Evaluation We evaluated our framework to classify encrypted handwritten images of the MNIST dataset. We used the library keras [18] with Tensorflow [1] to train the CNN model from 60,000 images of the dataset. We employ the special modulus variant of the multi-key CKKS scheme to achieve efficiency of approximate computation. Each layer of the network has a depth of one homomorphic multiplication (except the first FC layer requiring one more depth for multiplicative masking), so it requires 6 levels for the evaluation of CNN. We chose the parameter Set-II from Table 2 so as to cope with such levels of computations. The data owner chooses one among 10,000 test images from MNIST dataset, normalizes it by dividing by the maximum value 255, and encrypts it into a single ciphertext using the public key, which takes 1.75 MB of space. Meanwhile, the model provider generates a relatively large number of ciphertexts for the trained model: four for the multiple channels, eight for the weight matrix of the FC-1 layer, and one of each for the other weight or bias. Therefore, the total size of the output ciphertexts is 18.5 MB and it takes roughly 7 times longer to encrypt the trained model than an image, but it is an one-time process before data outsourcing and so it is a negligible overhead. After the evaluation, the cloud server outputs a single multi-key ciphertext encrypting the prediction result with respect to the extended secret key of the data and model owner. Table 5 presents timing result for the evaluation of CNN. It takes about 1.8 seconds to classify an encrypted image from the encrypted trained model. Our parameter guarantees at least 32-bit precision after the decimal point. That is, the infinity norm distance between encrypted evaluation and plain computation is bounded by 2−32. Therefore, we had enough space to use the noise flooding technique for decryption. In terms of the accuracy, it achieves about 98.4% on the test set which is the same as the one obtained from the evaluation in the clear.

6.3.3 Comparison with Previous Works In Table 6, we compare our benchmark result with the state-of-the-art frameworks for oblivious neural network inference: CryptoNets [29], MiniONN [42], [35], and E2DM [34]. The first column in- dicates the framework and the second column denotes the cryptographic primitives used for preserving privacy. The last columns give running time for image classification as well as amortized time per instance if applicable. Among the aforementioned solutions for private neural network prediction, E2DM relies on a third- party authority holding a secret key of HE, since the data and model are under the same secret key. CryptoNets has good amortized complexity, but it has a high latency for a single prediction. MiniONN and Gazelle have good latency, but they require both parties to be online during the protocol execution, and at least one party performs local work proportional to the complexity of the function being evaluated. Also, the number of rounds for MiniONN and Gazelle scales with the number of layers in the neural network. On the other hand, our solution has a constant number of rounds. Moreover, our solution allows the parties to outsource homomorphic evaluation to an untrusted server (e.g. a VM in the cloud with large computing power), so both parties only need to pay en- cryption/decryption cost, and the communication cost only scales with the input/model sizes, but not the complexity of the network itself. This feature is made possible since our scheme supports multi-key operations. Note that the server is only assumed to be semi-honest: we do not require non-collusion as- sumptions since even if the server colludes with one party, they cannot learn the other party’s private 20 H. Chen et al.

Stage Runtime Data owner Image encryption 31 ms Model provider Model encryption 236 ms Convolutional layer 705 ms 1st square layer 143 ms Cloud FC-1 layer 739 ms server 2nd square layer 75 ms FC-2 layer 135 ms Total evaluation 1, 797 ms Table 5. Performance breakdown for evaluating an encrypted neural network on encrypted MNIST data, where the two encryptions are under different secret keys.

Runtime Framework Methodology Latency Amortized CryptoNets HE 570 s 0.07 s MiniONN HE, MPC 1.28 s - Gazelle HE, MPC 0.03 s - E2DM HE 28.59 s 0.45 s Ours MKHE 1.80 s - Table 6. MNIST benchmarks of privacy-preserving neural network frameworks.

inputs due to the IND-CPA security of MKHE. Therefore, we believe that our work presents an inter- esting point in the design space of oblivious machine learning inference. Last but not least, MiniONN and Gazelle have communication cost scaling linearly with the number of activation nodes in the neural network due to the use of garbled circuits. In our case, the size of the ciphertexts and public/evaluation keys only depends on the number of layers and the number of input nodes of the neural network. Hence our solution is asymptocially more communication size-efficient.

7 Conclusion

In this paper, we presented practical multi-key variants of the BFV and CKKS schemes and their boot- strapping methods. We provided the first experimental results of MKHE with packed ciphertexts by implementing our schemes. The main technical contribution is to propose new relinearization algorithms achieving better performance compared to prior works [13, 41]. Finally, we showed that our scheme can be applied to secure on-line prediction services by evaluating an encrypted classifier on an encrypted data under two different keys. We implemented our protocol on convolutional neural networks trained on the MNIST dataset and showed that it can achieve a low end-to-end latency by leveraging the optimized homomorphic convolutions and homomorphic matrix-vector multiplications.

References

1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: Tensorflow: Large-scale machine learning on heterogeneous distributed systems (2015), https://www.tensorflow.org 2. Albrecht, M., Chase, M., Chen, H., Ding, J., Goldwasser, S., Gorbunov, S., Halevi, S., Hoffstein, J., Laine, K., Lauter, K., Lokam, S., Micciancio, D., Moody, D., Morrison, T., Sahai, A., Vaikuntanathan, V.: Homomorphic encryption security standard. Tech. rep., HomomorphicEncryption.org, Toronto, Canada (November 2018) 3. Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. Journal of Mathematical Cryptology 9(3), 169–203 (2015) Efficient MKHE with Packed Ciphertexts 21

4. Bajard, J.C., Eynard, J., Hasan, M.A., Zucca, V.: A full RNS variant of FV like somewhat homomorphic encryption schemes. In: International Conference on Selected Areas in Cryptography. pp. 423–442. Springer (2016) 5. Ben-Or, M., Goldwasser, S., Wigderson, A.: Completeness theorems for non-cryptographic fault-tolerant dis- tributed computation. In: Proceedings of the twentieth annual ACM symposium on Theory of computing. pp. 1–10. ACM (1988) 6. Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical GapSVP. In: Safavi- Naini, R., Canetti, R. (eds.) CRYPTO 2012, Lecture Notes in Computer Science, vol. 7417, pp. 868–886. Springer (2012) 7. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrap- ping. In: Proc. of ITCS. pp. 309–325. ACM (2012) 8. Brakerski, Z., Perlman, R.: Lattice-based fully dynamic multi-key FHE with short ciphertexts. In: Annual Cryptology Conference. pp. 190–213. Springer (2016) 9. Chen, H., Chillotti, I., Song, Y.: Improved bootstrapping for approximate homomorphic encryption. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. pp. 34–54. Springer (2019) 10. Chen, H., Chillotti, I., Song, Y.: Multi-key homomophic encryption from TFHE. Cryptology ePrint Archive, Report 2019/116 (2019), https://eprint.iacr.org/2019/116 11. Chen, H., Gilad-Bachrach, R., Han, K., Huang, Z., Jalali, A., Laine, K., Lauter, K.: Logistic regression over encrypted data from fully homomorphic encryption. BMC medical genomics 11(4), 81 (2018) 12. Chen, H., Han, K.: Homomorphic lower digits removal and improved FHE bootstrapping. In: Annual In- ternational Conference on the Theory and Applications of Cryptographic Techniques. pp. 315–337. Springer (2018) 13. Chen, L., Zhang, Z., Wang, X.: Batched multi-hop multi-key FHE from Ring-LWE with compact ciphertext extension. In: Theory of Cryptography Conference. pp. 597–627. Springer (2017) 14. Cheon, J.H., Han, K., Kim, A., Kim, M., Song, Y.: Bootstrapping for approximate homomorphic encryption. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. pp. 360– 384. Springer (2018) 15. Cheon, J.H., Han, K., Kim, A., Kim, M., Song, Y.: A full RNS variant of approximate homomorphic encryp- tion. In: International Conference on Selected Areas in Cryptography. Springer (2018) 16. Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approximate numbers. In: International Conference on the Theory and Application of Cryptology and Information Security. pp. 409–437. Springer (2017) 17. Chillotti, I., Gama, N., Georgieva, M., Izabachene, M.: Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds. In: Advances in Cryptology – ASIACRYPT 2016. pp. 3–33. Springer (2016) 18. Chollet, F., et al.: Keras (2015), https://github.com/keras-team/keras 19. Clear, M., McGoldrick, C.: Multi-identity and multi-key leveled FHE from learning with errors. In: Annual Cryptology Conference. pp. 630–656. Springer (2015) 20. Damg˚ard,I., Keller, M., Larraia, E., Pastro, V., Scholl, P., Smart, N.P.: Practical covertly secure MPC for dishonest majority–or: breaking the SPDZ limits. In: European Symposium on Research in Computer Security. pp. 1–18. Springer (2013) 21. Ducas, L., Micciancio, D.: FHEW: Bootstrapping homomorphic encryption in less than a second. In: Advances in Cryptology–EUROCRYPT 2015, pp. 617–640. Springer (2015) 22. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive, Report 2012/144 (2012), https://eprint.iacr.org/2012/144 23. Gasc´on,A., Schoppmann, P., Balle, B., Raykova, M., Doerner, J., Zahur, S., Evans, D.: Privacy-preserving dis- tributed linear regression on high-dimensional data. Proceedings on Privacy Enhancing Technologies 2017(4), 345–364 (2017) 24. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: Proceedings of the Forty- first Annual ACM Symposium on Theory of Computing. pp. 169–178. STOC ’09, ACM (2009). https://doi.org/10.1145/1536414.1536440, http://doi.acm.org/10.1145/1536414.1536440 25. Gentry, C., Halevi, S., Smart, N.P.: Better bootstrapping in fully homomorphic encryption. In: Public Key Cryptography–PKC 2012, pp. 1–16. Springer (2012) 26. Gentry, C., Halevi, S., Smart, N.P.: Fully homomorphic encryption with polylog overhead. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012, Lecture Notes in Computer Science, vol. 7237, pp. 465–482. Springer (2012) 27. Gentry, C., Halevi, S., Smart, N.P.: Homomorphic evaluation of the AES circuit. In: Safavi-Naini, R., Canetti, R. (eds.) Advances in Cryptology - CRYPTO 2012. Lecture Notes in Computer Science, vol. 7417, pp. 850–867. Springer (2012) 22 H. Chen et al.

28. Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In: Advances in Cryptology–CRYPTO 2013, pp. 75–92. Springer (2013) 29. Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning. pp. 201–210 (2016) 30. Halevi, S., Polyakov, Y., Shoup, V.: An improved RNS variant of the BFV homomorphic encryption scheme. In: Cryptographers Track at the RSA Conference. pp. 83–105. Springer (2019) 31. Halevi, S., Shoup, V.: Algorithms in HElib. In: Advances in Cryptology–CRYPTO 2014. pp. 554–571. Springer (2014) 32. Halevi, S., Shoup, V.: Bootstrapping for HElib. In: Advances in Cryptology–EUROCRYPT 2015, pp. 641–670. Springer (2015) 33. Halevi, S., Shoup, V.: Faster homomorphic linear transformations in HElib. In: Annual International Cryp- tology Conference. pp. 93–120. Springer (2018) 34. Jiang, X., Kim, M., Lauter, K., Song, Y.: Secure outsourced matrix computation and application to neural networks. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. pp. 1209–1222. ACM (2018) 35. Juvekar, C., Vaikuntanathan, V., Chandrakasan, A.: GAZELLE: A low latency framework for secure neu- ral network inference. In: 27th USENIX Security Symposium (USENIX Security 18). USENIX Association, Baltimore, MD (2018) 36. Keller, M., Pastro, V., Rotaru, D.: Overdrive: making SPDZ great again. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. pp. 158–189. Springer (2018) 37. Kim, A., Song, Y., Kim, M., Lee, K., Cheon, J.H.: Logistic regression model training based on the approximate homomorphic encryption. BMC medical genomics 11(4), 83 (2018) 38. Kim, M., Song, Y., Li, B., Micciancio, D.: Semi-parallel logistic regression for GWAS on encrypted data. Cryptology ePrint Archive, Report 2019/294 (2019), https://eprint.iacr.org/2019/294 39. Kim, M., Song, Y., Wang, S., Xia, Y., Jiang, X.: Secure logistic regression based on homomorphic encryption: Design and evaluation. JMIR medical informatics 6(2) (2018) 40. LeCun, Y.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/ (1998) 41. Li, N., Zhou, T., Yang, X., Han, Y., Liu, W., Tu, G.: Efficient multi-key fhe with short extended ciphertexts and directed decryption protocol. IEEE Access 7, 56724–56732 (2019) 42. Liu, J., Juuti, M., Lu, Y., Asokan, N.: Oblivious neural network predictions via minionn transformations. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. pp. 619–631. ACM (2017) 43. L´opez-Alt, A., Tromer, E., Vaikuntanathan, V.: On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Proceedings of the forty-fourth annual ACM symposium on Theory of computing. pp. 1219–1234. ACM (2012) 44. Mohassel, P., Zhang, Y.: SecureML: A system for scalable privacy-preserving machine learning. In: 2017 38th IEEE Symposium on Security and Privacy (SP). pp. 19–38. IEEE (2017) 45. Mukherjee, P., Wichs, D.: Two round multiparty computation via multi-key FHE. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. pp. 735–763. Springer (2016) 46. Peikert, C., Shiehian, S.: Multi-key FHE from LWE, revisited. In: Theory of Cryptography Conference. pp. 217–238. Springer (2016) 47. Microsoft SEAL (release 3.2). https://github.com/Microsoft/SEAL (Feb 2019), , Red- mond, WA. 48. Smart, N.P., Vercauteren, F.: Fully homomorphic SIMD operations. Designs, codes and cryptography 71(1), 57–81 (2014) 49. Wang, X., Ranellucci, S., Katz, J.: Global-scale secure multiparty computation. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. pp. 39–56. ACM (2017) 50. Yao, A.C.C.: How to generate and exchange secrets. In: Foundations of Computer Science, 1986., 27th Annual Symposium on. pp. 162–167. IEEE (1986)

A Special Modulus Variants of Our MKHE Schemes

• MKHE.Setup(1λ): Given a security parameter λ, set the RLWE dimension n, ciphertext modulus q, special d modulus p, key distribution χ and error distribution ψ over R. Generate a random vector a ← U(Rpq). Return the public parameter pp = (n, p, q, χ, ψ, a). d×3 • UniEnc(µ; s): For an input plaintext µ ∈ R, generate a ciphertext D = [d0|d1|d2] ∈ Rpq as follows: Efficient MKHE with Packed Ciphertexts 23

Algorithm 3 Relinearization method with modulus raising

Input: ct = (ci,j)0≤i,j≤k, {(Di = [di,0|di,1|di,2], bi)}1≤i≤k. 0 0 k+1 Output: ct = (ci)0≤i≤k ∈ Rq . 00 1: (ci )0≤i≤k ← 0 . Create a temporary vector modulo pq 2: for 1 ≤ i, j ≤ k do 00 −1 3: ci,j ← hg (ci,j), bji (mod pq) 0  −1 00  4: ci,j ← p · ci,j . It is a polynomial modulo q 00 00 00 00 −1 0 5: (c0 , ci ) ← (c0 , ci ) + g (ci,j) · [di,0|di,1] (mod pq) 00 00 −1 6: cj ← cj + hg (ci,j), di,2i (mod pq) 7: end for 0  −1 00 8: c0 ← c0,0 + p · c0 (mod q) 9: for 1 ≤ i ≤ k do 0  −1 00 10: ci ← c0,i + ci,0 + p · ci (mod q) 11: end for

1. Sample r ← χ. d d 2. Sample d1 ← U(Rpq) and e1 ← ψ , and set d0 = −s · d1 + e1 + pr · g (mod pq). d d 3. Sample e2 ← ψ and set d2 = r · a + e2 + pµ · g (mod pq) in Rpq.

d • MKHE.KeyGen(pp): Each party i samples the secret key si ← χ, an error vector ei ← ψ and sets the public key as bi = −si · a + ei (mod pq). Set the evaluation key Di ← UniEnc(si; si).

(k+1)2 • Relin(ct; {(Di, bi)}1≤i≤k): Given an extended ciphertext ct = (ci,j)0≤i,j≤k ∈ Rq and k pairs of d×3 d k 0 k+1 evaluation and public keys {(Di, bi)}1≤i≤k ∈ (Rpq ×Rpq) , generate a ciphertext ct ∈ Rq as described in Alg. 3.

• MK-CKKS.Enc(m; bi, a): Let m ∈ R be an input plaintext and let a = a[0] and bi = bi[0] be the first entries of the common reference string and public key of the i-th party. Sample v ← χ and e0, e1 ← ψ.  −1  2 Return the ciphertext ct = (m, 0) + p · (c0, c1) ∈ Rq where (c0, c1) = v · (b, a) + (e0, e1) (mod pq). • MK-CKKS.Dec, Add, Mult, Rescale: These algorithms are the same as the ones described in Section 4.2.

d 0 d • MKHE.GkGen(j; s): Generate a random vector h1 ← U(Rpq) and an error vector e ← ψ . Compute 0 d×2 h0 = −s · h1 + e + τj(s) · g (mod pq). Return the Galois evaluation key as gk = [h0|h1] ∈ Rpq .

• MKHE.EvalGal(ct; {gki}1≤i≤k): Let gki = [hi,0|hi,1] be the Galois evaluation key of the i-th party for k+1 1 ≤ i ≤ k. Given a ciphertext ct = (c0, . . . , ck) ∈ Rq , compute

k 00 X −1 c0 = hg (τj(ci)), hi,0i (mod pq) and i=1 00 −1 ci = hg (τj(ci)), hi,1i (mod pq) for 1 ≤ i ≤ k,

0 0 k+1 0  −1 00 0 then return the ciphertext ct = (c0, . . . , ck) ∈ Rq where c0 = τj(c0) + p · c0 (mod q) and ci =  −1 00 p · ci (mod q) for 1 ≤ i ≤ k.

B Security Proofs

B.1 Basic Scheme (Uni-encryption)

Let (n, q, χ, ψ) be the RLWE parameter generated in the setup phase. Given a plaintext µ ∈ R, we define d×5 the distribution D0 = {(a, b, d0, d1, d2)} over Rq as follows: 24 H. Chen et al.

d (1) a ← U(Rq ), s ← χ, e ← ψ, and b = −s · a + e (mod q), d (2) d1 ← U(Rq ), r ← χ, e1 ← ψ, and d0 = −s · d1 + e1 + r · g (mod q), d (3) e2 ← ψ and d2 = r · a + e2 + µ · g (mod q), which represents the tuple of CRS, public key, and uni-encryption of µ. d×5 First, we consider the distribution D1 over Rq which is obtained from D0 by modifying its definition (1) and (2) into

d d (1)’ a ← U(Rq ) and b ← U(Rq ), d d (2)’ d1 ← U(Rq ) and d0 ← U(Rq ).

Since (3) is independent from the choice of s ← χ, the distributions D0 and D1 are computationally indistinguishable if the RLWE problem (of secret s) with parameter (n, q, χ, ψ) is hard. In the next step, we change the definition (3) of D1 into d (3)’ d2 ← U(Rq ), and use the same RLWE assumption (of secret r) to show that it is computationally indistinguishable from D1. d×5 Finally, U(Rq ) is independent from the given plaintext µ, so we conclude that the uni-encryption scheme is IND-CPA secure.

B.2 MKHE Schemes As mentioned earlier, our MKHE schemes have single-key encryption algorithms. A ciphertext is generated by adding an encoded plaintext to a random encryption of zero. Hence, it suffices to show that a random encryption of zero is computationally indistinguishable from a uniform random variable. 4 We consider the random variable (a, b, c0, c1) over Rq defined by a ← U(Rq), b = −s · a + e (mod q) for s ← χ and e ← ψ, and (c0, c1) = v·(b, a)+(e0, e1) (mod q) for v ← χ and e0, e1 ← ψ. It represents the tuple of a CRS, a public key, and an encryption of zero. First, we change the definition of b as b ← U(Rq). It is computationally indistinguishable by the RLWE assumption with parameter (n, q, χ, ψ) (of secret s). Then, (b, c0) and (a, c1) can be viewed as two RLWE samples of secret v which are computationally 2 indistinguishable from the uniform random variable U(Rq) under the same RLWE assumption.

B.3 Special Modulus Variants We set the ciphertext modulus as pq to design the special modulus variants of uni-encryption and relin- earization algorithms. Therefore, we can show that special modulus variants of our MKHEs are IND-CPA secure under the RLWE assumption of parameter (n, pq, χ, ψ) by simply replacing q’s in the previous se- curity proofs by pq.

C Noise Analysis

We provide an average-case noise estimation on the variances of polynomial coefficients. In the following, we make a heuristic assumption that the coefficients of each polynomial behave like independent zero- mean random variables of the same variance. We denote by Var(a) = Var(ai) the variance of its coefficients P i for given random variable a = i ai · X ∈ R over R. Hence, the product c = a · b of two polynomials will have the variance of Var(c) = n · Var(a) · Var(b). More generally, for a vector a ∈ Rd of random variables, 1 Pd we define Var(a) = d i=1 Var(a[i]). −1 Specifically, we let Vg = Var(g (a)) of a uniform random variable a over Rq to estimate the size of gadget decomposition. Recall that our implementation exploits the RNS-friendly decomposition Rq → Q i Rpi , a 7→ ([a]pi )i for distinct word-size primes of the same bit-size so that d = dlog q/ log pie and 1 Pd 2 k+1 Vg ≈ 12d i=1 pi . Finally, we assume that every ciphertext ct ∈ Rq behaves as if it is a uniform k+1 random variable over Rq . Efficient MKHE with Packed Ciphertexts 25

C.1 Relinearization We first specify some distributions for detailed analysis. We set the key distribution χ and the distribution ψ as the uniform distribution over the set of binary polynomials and the Gaussian distribution of variance σ2, respectively.

Method 1. We first analyze the conversion algorithm Ki,j ← Convert(Di, bj). Let Di = [di,0|di,1|di,2] be an uni-encryption of µi ∈ R encrypted by the secret si and (sj, bj) a pair of the secret and public keys of the j-th party, i.e., bj = −sj · a + ej (mod q), di,0 = −si · di,1 + ei,1 + ri · g (mod q), and di,2 = ri · a + ei,2 + µi · g (mod q) for fresh errors ej, ei,1 and ei,2. We observe that

ki,j,0 + si · ki,j,1 = Mj · (di,0 + si · di,1) = Mjei,1 + ribj (mod q),

sj · ki,j,2 = risj · a + sj · ei,2 + µisj · g (mod q), and consequently ki,j,0 + si · ki,j,1 + sj · ki,j,2 = (Mjei,1 + riej + sjei,2) + µisjg (mod q). Therefore, the d noise Mjei,1 + rie + sjei,2 ∈ R of the output ciphertext has the variance of

2 Vconv = nσ · (d · Vg + 1) .

0 We now consider an extended ciphertext ct = (ci,j)0≤i,j≤k and the output ct ← Relin(ct; {(Di, bi)}1≤i≤k) of the relinearization procedure. As shown in Section 3.3.1, it satisfies that

k k k 0 0 X 0 X X −1 hct , ski = c0 + ci · si = c0,0 + (c0,i + ci,0)si + g (ci,j) · Ki,j · (1, si, sj) (mod q) i=1 i=1 i,j=1 k X −1 = hct, sk ⊗ ski + hg (ci,j), ei,ji (mod q), i,j=1 where ei,j = Ki,j · (1, si, sj) − sisj · g (mod q) denotes the error of Ki,j. Hence, the relinearization error Pk −1 elin = i,j=1hg (ci,j), ei,ji has the variance of

2 2 2 2 2 2 Vlin = k · nd · Vg · Vconv ≈ k · n d σ · Vg .

This variance can be reduced to about half by eliminating the duplicated entries of sk⊗sk as explained in 2 Section 6.1. We also note that the factor k in the formula can be reduced down to k1k2 if ct = ct1 ⊗ ct2 is the tensor product of two sparse ciphertexts cti corresponding to ki ≤ k secrets. 0 Method 2. Suppose that ct = (ci,j)0≤i,j≤k is an extended ciphertext and let ct ← Relin(ct; {(Di, bi)}1≤i≤k) be the output of the relinearization procedure. As noted in Section 3.3.2, we have that

−1 0 0 −1 0 g (ci,j) · [di,0|di,1] · (1, si) = ri · ci,j + hg (ci,j), ei,1i (mod q), and

−1 −1 hg (ci,j), di,2i · sj = hg (ci,j), risj · a + sj · ei,2 + sisj · gi −1 0 = hg (ci,j), ri · (−bj + ej) + sj · ei,2 + sisj · gi = −ri · ci,j + ci,j · sisj + ei,j (mod q).

−1 2 2 where ei,j = hg (ci,j), ri · ej + sj · ei,2i (mod q). The variance of ei,j is n d · σ · Vg. From the equation

k 0 X −1 0  hct , ski = hct, sk ⊗ ski + hg (ci,j), ei,1i + ei,j (mod q), i,j=1 the variance of a relinearization error elin is obtained by

2 2 2 2 2 2 Vlin = k · (n + n)d · σ · Vg ≈ k · n d · σ · Vg. 26 H. Chen et al.

−1 0 Special Modulus Variant of Method 2. In lines 3 ∼ 6 of Alg. 3, we add g (ci,j) · [di,0|di,1] and −1 0 00 00 hg (ci,j), di,2i to the temporary ciphertext. We first note that p · ci,j = ci,j − [ci,j]p (mod pq) and

−1 0 0 −1 0 g (ci,j) · (di,0 + si · di,1) = rici,j + hg (ci,j), ei,1i (mod pq) −1 00 −1 0 = hg (ci,j), ribji − ri · [ci,j]p + hg (ci,j), ei,1i (mod pq). Meanwhile, the other term satisfies that

−1 −1 hg (ci,j), sj · di,2i = hg (ci,j), −ribj + riej + psisjg + sjei,2i (mod pq) −1 0 −1 =pci,j · sisj − hg (ci,j), ribji + hg (ci,j), riej + sjei,2i (mod pq).

00 Consequently, the phase of the temporary ciphertext (ci )0≤i≤k is increased by pci,j · sisj + ei,j for the error

00 −1 0 −1 ei,j = −ri · [ci,j]p + hg (ci,j), ei,1i + hg (ci,j), riej + sjei,2i,

2 2 1 2 whose variance is Var(ei,j) = (n + n)d · Vg · σ + 24 np . We repeat it for all 1 ≤ i, j ≤ k, so the temporary ciphertext will satisfy

k k 00 X 00 X c0 + ci · si = e + p · ci,j · sisj (mod pq) i=1 i,j=1

Pk 2 for the error e = i,j=1 ei,j of varianc Var(e) = k · Var(ei,j). After reducing its modulus down to q, we 0 0 get the output ciphertext ct = (ci)0≤i≤k whose phase is

0 −1 hct , ski = hct, sk ⊗ ski + p · e + erd

−1 00 1 1 with an additional rounding error erd = −p · h[(ci )0≤i≤k]p, ski of variance 24 kn + 12 . The final relin- −1 earization error elin = p · e + erd has the variance 1 1 1 1 V = p−2 · k2(n2 + n)dσ2 · V + k2n + kn + ≈ p−2 · k2n2d · σ2 · V + (k2 + k)n. lin g 24 24 12 g 24

C.2 Multi-Key BFV

2 Encryption. Let ct = (c0, c1) ∈ Rq be an encryption of m ∈ Rt generated by the randomness r ← χ and e0, e1 ← ψ. Then we have

c0 + c1 · s = ∆ · m + r · (b + a · s) + (e0 + e1 · s) = ∆ · m + (r · e + e0 + e1 · s) (mod q), where e = e[0] is a noise of public key. Therefore, the encryption noise eenc = r · e + e0 + e1 · s has the variance of 2 2 Venc = σ · (1 + n) ≈ σ n.

Multiplication. Suppose that cti is an encryption of mi for i = 1, 2, i.e., hcti, ski = q · Ii + ∆ · mi + ei j 1 m 1 1  1 for some Ii and ei in R. The variance of Ii = q hcti, ski is computed by Var(Ii) ≈ 12 1 + 2 kn ≈ 24 kn 1 1 k+1 since q · cti behaves like a uniform random variable over q · Rq whose variance is approximately equal 1 to 12 . The tensor product of the input ciphertexts satisfies

hct1 ⊗ ct2, sk ⊗ ski = hct1, ski · hct2, ski 2 =∆ · m1m2 + q · (I1e2 + I2e1) + ∆ · (m1e2 + m2e1) + e1e2 (mod q · ∆), Efficient MKHE with Packed Ciphertexts 27 and consequently the ciphertext ct = b(t/q) · ct1 ⊗ ct2e has the phase

−1 hct, sk ⊗ ski = ∆ · m1m2 + (t · (I1e2 + I2e1) + (m1e2 + m2e1) + ∆ · e1e2 + erd) for the rounding error erd = h(t/q) · ct1 ⊗ ct2 − ct, sk ⊗ ski. Therefore, the multiplication error

−1 emul = t · (I1e2 + I2e1) + (m1e2 + m2e1) + ∆ · e1e2 + erd is dominated by the first term t · (I1e2 + I2e1) whose variance is 1 V = nt2 · (Var(I ) · Var(e ) + Var(I ) · Var(e )) ≈ kn2t2 · (Var(e ) + Var(e )). mul 1 2 2 1 24 1 2

C.3 Multi-Key CKKS

Encryption. The CKKS scheme has the same encryption error as BFV. 0 0 Multiplication. For i = 1, 2, let µi = hcti, ski be the phase of an input ciphertext cti. Then ct = ct1 ⊗ct2 0 0 0 has the phase hct, sk⊗ski = hct1, ski·hct2, ski = µ1 ·µ2 (mod q). Therefore, the output ct ← Relin(ct; rlk) 0 satisfies that hct , ski = µ1µ2 + elin for a relinearization error elin of variance Vlin. As explained above, this variance can be reduced down if cti has some zero entries. Rescaling. Let ct0 ← . (ct) = p−1 · ct ∈ Rk+1 for ct ∈ Rk+1. Then, we have MK-CKKS Rescale ` q`−1 q`

0 −1 hct , ski = p` · [hct, ski]q` + erd (mod q`−1)

 −1  −1 for the rescaling error erd = p` · ct − p` · ct, sk from rounding. Note that each component of  −1  −1 −1 1 p` · ct − p` · ct behaves like a uniform random variable on p` · Rp` whose variance is 12 . Therefore, the rescaling error has the variance of

1  1  1 V = 1 + kn ≈ kn. res 12 2 24

D Homomorphic Evaluation of CNN

For a d1 × d matrix A1 and a d2 × d matrix A2,(A1; A2) denotes the (d1 + d2) × d matrix obtained by concatenating two matrices in a vertical direction. If two matrices A1 and A2 have the same number of rows, (A1|A2) denotes a matrix formed by horizontal concatenation. As defined in [34], there is a row ordering encoding map to transform a vector of dimension n = d2 into a matrix in Rd×d. For a vector n d×d a = (ak)0≤k

D.1 Encryption of Data and Trained Model

28×28 Alg. 4 takes as the input an image X = (xi,j) ∈ R and the public key bD of the image provider. Alg. 5 takes the multiple channels of the convolutional layer from the trained model as inputs and generates their encryptions using the public key of the model provider bM . Alg. 6 takes as the inputs the weights and biases of the FC layers and outputs their encryptions using the public key. 28 H. Chen et al.

Algorithm 4 Encryption of an image 28×28 Input: Image X ∈ R , public key of the image provider bD. Output: ct.X. 1: for 0 ≤ i, j < 2 do 0 14×14 2: Xi,j ← (x2k+i,2l+j)0≤k,l≤13 ∈ R 0 0 70×14 3: Xi,j ← (Xi,j; ... ; Xi,j) ∈ R ¯ 10 1024 4: vi,j ← PadZeros(vec(Xi,j), 2 − 70 · 14) ∈ R 5: end for 4096 6: v ← (v0,0|v0,1|v1,0|v1,1) ∈ R 7: v ← (v|v) ∈ R8192 8: ct.X ← MK-CKKS.Enc(v; bD, a)

Algorithm 5 Encryption of multiple channels (k) (k) 4×4 Input: {Y = (yi,j ) ∈ R }0≤k<5, public key of the model provider bM . Output: {ct.Yl}0≤l<4. 1: for 0 ≤ i, j < 4 do 2: vi,j ← ∅ . Null string 3: for 0 ≤ k < 5 do (k) (k) 13×13 4: Yk,i,j ← (yi,j , . . . , yi,j ) ∈ R 0 14×14 5: Yk,i,j ← PadZeros(Yk,i,j, (1, 1), (R,B)) ∈ R 0 6: vi,j ← (vi,j|vec(Yk,i,j)) 7: end for 10 1024 8: vi,j ← PadZeros(vi,j, 2 − 5 · 14 · 14) ∈ R 9: end for 10: for 0 ≤ i, j < 2 do 4096 11: v2i+j ← (v2i,2j|v2i,2j+1|v2i+1,2j|v2i+1,2j+1) ∈ R 8192 12: v2i+j ← (v2i+j|v2i+j) ∈ R 13: ct.Y2i+j ← MK-CKKS.Enc(v2i+j; bM , a) 14: end for

D.2 Homomorphic Evaluation of CNN

We start with a useful aggregation operation across plaintext slots from the literature [31, 38]. This algorithm is referred as AllSum, which is parameterized by integers ϕ and α. See Algorithm 7 for an implementation. We denote by MKHE.Rot(ct; k) a multi-key rotation which transforms an input ciphertext ct into a new ciphertext that encrypts a shifted plaintext vector by k of the original message. Given ` a ciphertext ct representing a plaintext vector m = (m0, . . . , m`−1) ∈ R for ` = α · ϕ, the AllSum Pα−1 Pα−1 algorithm outputs a ciphertext ctout encrypting α copies of the vector ( j=0 mj·ϕ, j=0 mj·ϕ+1,..., Pα−1 ϕ j=0 m(j+1)·ϕ−1) ∈ R . This can be implemented using rotations and additions.

Matrix-Vector Multiplication. Assume that ni is smaller than the number of plaintext slots ns and the input no × ni matrix W is split into 1 × (ni/`) sized sub-matrices, denoted by wi,j for 0 ≤ i < no and 0 ≤ j < `. Then, we can pack (` · ns)/ni many different sub-matrices into a single ciphertext, and nc = (ns/ni) copies of the input vector v in a single ciphertext. For example, consider the first (` · nc) rows of W. We encode nc many the first diagonal vectors of the matrix into a plaintext vector, i.e., (w0,0|w1,1| ... |w`−1,`−1)|(w`,0|w`+1,1| ... |w2`−1,`−1)|

 ns ... |(w`·(nc−1),0|w`·(nc−1)+1,1| ... |w`·(nc−1)+`−1,`−1) ∈ R .

As such, each extended diagonal vectors are encrypted in a single ciphertext and these ` many ciphertexts are multiplied with ` rotations of the encrypted vector v. Then we add them together similar to the original Efficient MKHE with Packed Ciphertexts 29

Algorithm 6 Encryption of weights & biases of FC layers 64×845 (1) 64 10×64 (2) 10 Input:W ∈ R , z ∈ R , U ∈ R , z ∈ R , public key of the model provider bM . Output: {ct.Wk}0≤k<8, ct.z1, ct.U, ct.z2. [Encryption of weight of FC-1] 1: W0 ← ∅ 2: for 0 ≤ i < 64 do 3: w ← ∅ 4: for 0 ≤ j < 5 do 13×13 5: Wj ← mat(Wi,132·j:132·(j+1)) ∈ R 0 14×14 6: Wj ← PadZeros(Wj, (1, 1), (R,B)) ∈ R 0 7: w ← (w|vec(Wj)) 8: end for 9: w ← PadZeros(w, 210 − 5 · 14 · 14) ∈ R1024 10: W0 ← (W0; w) 11: end for 12: for 0 ≤ k < 8 do 13: v ← ∅ 14: for 0 ≤ i < 64 do  0  15: v ← v | Wi,27·((i+k)%8):27·((i+k+1)%8) 16: end for 17: ct.Wk ← MK-CKKS.Enc(v; bM , a) 18: end for [Encryptions of bias of FC-1/weight & bias of FC-2] 19: v1 ← ∅, u ← ∅, v2 ← ∅ 20: for 0 ≤ i < 64 do (1) 7 (1) (1) 21: v1 ← (v1|PadZeros(zi , 2 − 1)) . zi : i-th entry of z t 7 22: u ← (u|PadZeros(Ui, 2 − 10)) . Ui : i-th column of U (2) 7 23: v2 ← (v2|PadZeros(z , 2 − 10)) 24: end for 25: ct.z1 ← MK-CKKS.Enc(v1; bM , a) 26: ct.U ← MK-CKKS.Enc(u; bM , a) 27: ct.z2 ← MK-CKKS.Enc(v2; bM , a)

diagonal method and the output ciphertext represents (ni/`)-sized (` · nc) chunks, each of which contains partial sums of ` entries of ni inputs. Finally, we can accumulate these using a rotate-and-sum algorithm no with log(ni/`) rotations, yielding in a ciphertext that contains the first (` · nc) entries of Wv ∈ R . We repeat these procedures no/(`·nc) times. In the end, we get no/(`·nc) ciphertexts, each containing (`·nc) entries of the result. The computational cost is `·(no/(`·nc)) = (no ·ni)/ns homomorphic multiplications, (` − 1) rotations of the encrypted vector v, and (no/(` · nc)) · log(ni/`) rotation on distinct ciphertexts. Evaluation Strategy of CNN. Alg. 8 provides an explicit description of homomorphic evaluation of CNN. We denote by SMult(ct, u) the multiplication of ct with a scalar u. For simplicity, we omit the evaluation/public keys {(Di, bi)}1≤i≤2 in homomorphic multiplication process. 30 H. Chen et al.

Algorithm 7 AllSum(ct, ψ, α) Input: ct, input ciphertext, the unit initial amount by which the ciphertext shifts ψ, the number of summands α > 1 Output: ctout 1: ctout ← ct 2: for i = 0, 1,..., log α − 1 do i 3: ctout ← MKHE.Add(ctout, MKHE.Rot(ctout; ψ · 2 )) 4: end for

Algorithm 8 Homomorphic evaluation of CNN

Input: ct.X, {ct.Yl}0≤l<4, {ct.Wk}0≤k<8, ct.z1, ct.U, ct.z2. Output: ctout. [Convolutional layer] 1: ct0 ← MK-CKKS.Mult(ct.X, ct.Y0) . Simple convolutions 2: for 1 ≤ l < 4 do 3: ct.Xl ← MK-CKKS.Rot(ct.X; 14i + j) . i = bl/2c, j = (l%2) 4: ct ← MK-CKKS.Mult(ct.Xl, ct.Yl) 5: ct0 ← MK-CKKS.Add(ct0, ct) 6: end for 7: ct0 ← MK-CKKS.Rescale(ct0) 8: ct0 ← AllSum(ct0, 1024, 4) . Sum over multiple channels [1st square layer] 9: ct1 ← MK-CKKS.Rescale(MK-CKKS.Mult(ct0, ct0)) [FC-1 layer] 10: ct2 ← MK-CKKS.Mult(ct1, ct.W0) 11: for 1 ≤ l < 8 do 7 12: ct ← MK-CKKS.Rot(ct1; 2 · l) 13: ct ← MK-CKKS.Mult(ct, ct.Wl) 14: ct2 ← MK-CKKS.Add(ct2, ct) 15: end for 16: ct2 ← MK-CKKS.Rescale(ct2) 17: ct2 ← AllSum(ct2, 1, 64) 18: u ← PadZeros((1), 127) ∈ R128 19: u ← (u|u| ... |u) ∈ R8192 20: ct2 ← MK-CKKS.SMult(ct2, u) . Multiplicative masking 21: ct2 ← MK-CKKS.Rescale(ct2) 22: ct2 ← MK-CKKS.Add(ct2, ct.z1) [2nd square layer] 23: ct3 ← MK-CKKS.Rescale(MK-CKKS.Mult(ct2, ct2)) [FC-2 layer] 24: ct4 ← AllSum(ct3, −1, 16) 25: ct4 ← MK-CKKS.Rescale(MK-CKKS.Mult(ct4, ct.U)) 26: ct4 ← AllSum(ct4, 128, 64) 27: ctout ← MK-CKKS.Add(ct4, ct.z2)