
Institute of Computing, University of Campinas

Specific Qualifying Examination

Cryptographic Engineering of Privacy-preserving Algorithms

Pedro Geraldo M. R. Alves

Supervised by

Prof. Dr. Diego de Freitas Aranha

November 2019

Abstract

As extensive data gathering and outsourced computing become more common every day, it is necessary to tighten security requirements. The growing value of user information transforms datasets into targets for malicious entities, which may exploit different kinds of system vulnerabilities to acquire them. Moreover, since much of today’s computation takes place in the cloud, data also becomes vulnerable to malicious system administrators and third parties with access to the infrastructure itself.

In this work we intend to explore different solutions for privacy-preserving data management and computing. Our focus is on the development of strategies for efficient implementation and applicability in the field. As starting points, we shall continue previous efforts on the acceleration of functional encryption schemes (such as homomorphic encryption schemes) using GPGPU parallelism and on applying them to human face recognition in the cloud, an otherwise notoriously privacy-intrusive computing task. Furthermore, effort should be expended on the development of a searchable encryption framework capable of maintaining the usability of the dataset while keeping it secret. As a secondary objective we target non-encryption-based solutions, such as differential privacy.

Contents

1 Introduction

2 Mathematical methods for the efficient implementation of polynomial-based arithmetic

2.1 Residue number system

2.2 Polynomial multiplication

2.3 Discrete Fourier Transform

2.4 Fast Fourier Transform

2.5 Number-Theoretic Transform

3 Related work

3.1 Functional encryption

3.2 Searchable encryption

3.3 Differential Privacy

4 Research proposal

4.1 Performance improvement of functional cryptosystems

4.2 Privacy-preserving framework for human face recognition

5 Method and schedule

5.1 Methodology

5.2 Technical team

5.3 Schedule

6 Previous work

7 Conclusion

1 Introduction

In an era in which data gathering for advertising and behavior forecasting is increasingly valued, information is gold. Thus, protecting data from breaches is as important as being capable of processing it and extracting useful information from it. Accomplishing this task means looking after both the business itself and its users.

From mobile to scientific computing, the industry increasingly embraces cloud services and takes advantage of their potential to improve availability and reduce operational costs [HMF+08, DLNW13]. The possibility of outsourcing the installation, maintenance and scalability of servers, added to competitive prices, makes this service highly attractive [Buy09, VPB09]. However, the cloud cannot be blindly trusted. Malicious parties may acquire full access to the servers and consequently to sensitive data. Among the threats are external entities exploiting vulnerabilities, intrusive governments requesting information, competitors seeking unfair advantages, and even possibly malicious system administrators. The data owner has no real control over the processing hardware and therefore cannot guarantee the secrecy of data [XX13]. The risk of confidentiality breaches caused by inadequate and insecure use of cloud computing is real and tangible.

How to efficiently collect and compute user data without undermining their privacy is an open problem. As remarked by Narayanan and Felten, “data privacy is a hard problem” [NF14]. Even when data holders choose the most conservative practice and never share data, system breaches may happen.

The Breach Level Index provides distressing statistics about data leakage since 2013. According to its most recent report, almost 2 billion records were breached in the first half of 2017 alone, almost 7 thousand per minute. Of these, only 4.6% were encrypted. That is, most of the data was in plaintext, including emails, vehicle registration details, payment information, and medical records [Gem17].

Most of those breaches occurred through accidental loss or data inadvertently left exposed. However, malicious parties are far from negligible. Examples include governmental surveillance programs such as PRISM, which forced cloud companies to provide user data [GM13, Web14]; the Ashley Madison case, in which personal data of millions of users was leaked, revealing sensitive information, with consequences that include suicides [Tho15, Lam16]; and the Yahoo data breach, possibly the largest known, which affected about 3 billion accounts and revealed unencrypted email addresses, telephone numbers, and security questions and answers [BBC16].

These occurrences lead to the disturbing feeling that, despite all efforts, the risk of data deanonymization grows worryingly with the amount of data made available [Swe00, Gol06]. Hence, a seemingly obvious strategy to avoid this issue is to simply stop any kind of data collection.

History has proven that the task of collecting and storing data from third parties should be treated as risky. The chance of compromising user privacy by accident is too high, possibly with extreme consequences. For this reason, the concept of security by renouncing knowledge has attracted adherents, such as the search engine DuckDuckGo, which states in a blog post that “the only truly anonymised data is no data” and, because of that, claims to forgo the right to store its users’ data [Duc17, Sch16].

A more financially realistic alternative for dealing with this issue is not to give up knowledge completely, but to reduce the number of entities with full access to data. This can be done by keeping data encrypted during its whole lifespan (transportation, storage, and processing), so that it stays secret to the application and the cloud, and by hiding it within a large set of similar data from several users. Thus, a new security fence is set, tying data secrecy to formal guarantees.

This document is structured as follows. We first briefly review common directions present in the literature for privacy-preserving computing in Section 3. Section 4 consolidates and describes the topics we shall pursue during the research. Section 5.1 analyzes the methodology. Section 5.3 presents the expected schedule for this work and, lastly, Section 6 presents some of the most important results obtained during the student’s previous works.

2 Mathematical methods for the efficient implementation of polynomial-based arithmetic

As discussed in Section 3.1, current SHE and LHE cryptosystems suffer from performance issues due to their roots in polynomial arithmetic and the need for large parameters to achieve proper security levels. Therefore, an efficient formulation of the arithmetic, together with convenient mathematical methods, is essential to obtain a high-performance implementation on the target platform. In the following we mention some of the main tools we have been working on to boost the performance of such schemes.

2.1 Residue number system

Multiprecision integer arithmetic can be computationally expensive because there is no hardware support for it, forcing the task to be executed in software. The Residue Number System (RNS) applies the Chinese Remainder Theorem (CRT) to map large integers to a set of arbitrarily smaller ones, called residues [DPS96]. This allows manipulating the polynomial coefficients through simpler, natively supported arithmetic.

This work applies RNS as a tool to build the intended faster arithmetic. Definitions 1 and 2 provide the formulations for applying the direct and inverse CRT.

Definition 1 (Direct CRT function). Let $x$ be a polynomial in $R_q$ and $\{p_0, p_1, \dots, p_{\ell-1}\}$ a set of $\ell$ primes. We have

\[ \mathrm{CRT}(x) = \{x \bmod p_0,\; x \bmod p_1,\; \dots,\; x \bmod p_{\ell-1}\}. \]

Definition 2 (Inverse CRT function). Let $x$ be a set of $\ell$ residues, $\{p_0, p_1, \dots, p_{\ell-1}\}$ the corresponding set of primes and $M = \prod_{i=0}^{\ell-1} p_i$. Then,

\[ \mathrm{ICRT}(x) = \sum_{i=0}^{\ell-1} \frac{M}{p_i} \cdot \left[ \left(\frac{M}{p_i}\right)^{-1} \cdot x_i \bmod p_i \right] \bmod M. \]

To guarantee the correctness of these functions, it is required that the product of all primes, $M$, be larger than the largest possible coefficient of a polynomial represented in the CRT domain, even after an addition or multiplication operation. This way, we need

\[ M = \prod_{i=0}^{\ell-1} p_i > n \cdot q^2, \]

for polynomials of degree up to $n$ and coefficients lying in $\mathbb{Z}_q$. Addition and multiplication in the CRT domain work by applying the operation coefficient-wise, as described in Definition 3.

Definition 3 (Addition/multiplication in the CRT domain). Let $x, y$ be integers in $\mathbb{Z}_q$, $X = \mathrm{CRT}(x)$ and $Y = \mathrm{CRT}(y)$ their corresponding residues computed with $\ell$ primes, and $\odot$ an operation, either addition or multiplication. Then,

\[ X \odot Y = \{(X_0 \odot Y_0),\, (X_1 \odot Y_1),\, \dots,\, (X_{\ell-1} \odot Y_{\ell-1})\}. \]
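As a minimal sketch of Definitions 1 to 3, assuming plain integers instead of polynomial coefficients, the direct and inverse CRT can be exercised as follows. The function names and the three small primes are illustrative choices of ours, not part of any implementation described in this document; each residue-wise operation is reduced modulo its own prime.

```python
from math import prod

def crt(x, primes):
    """Direct CRT (Definition 1): map an integer to its residues."""
    return [x % p for p in primes]

def icrt(residues, primes):
    """Inverse CRT (Definition 2): rebuild the integer in [0, M)."""
    M = prod(primes)
    total = 0
    for x_i, p_i in zip(residues, primes):
        M_i = M // p_i
        inv = pow(M_i, -1, p_i)  # modular inverse of M/p_i (Python 3.8+)
        total += M_i * ((inv * x_i) % p_i)
    return total % M

def rns_op(X, Y, primes, op):
    """Definition 3: apply op residue-wise, reducing modulo each prime."""
    return [op(a, b) % p for a, b, p in zip(X, Y, primes)]

# Illustrative parameters: M = 251 * 257 * 263 must exceed every result.
primes = [251, 257, 263]
x, y = 1234, 5678
X, Y = crt(x, primes), crt(y, primes)
s = icrt(rns_op(X, Y, primes, lambda a, b: a + b), primes)
m = icrt(rns_op(X, Y, primes, lambda a, b: a * b), primes)
```

Here `s` and `m` recover the sum and the product of `x` and `y` because both values stay below $M$; a result at or above $M$ would wrap around, which is why the bound on $M$ above matters.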

2.2 Polynomial multiplication

Let $p$ and $q$ be two polynomials such that

\[ p(x) = a_0 + a_1 x + \dots + a_{N-1} x^{N-1}, \]

\[ q(x) = b_0 + b_1 x + \dots + b_{N-1} x^{N-1}. \]

Their product is defined as

\[ p(x) \cdot q(x) = c(x) = c_0 + c_1 x + \dots + c_{2N-2} x^{2N-2}, \tag{1} \]

where

\[ c_i = \sum_{\max\{0,\, i-(N-1)\} \le k \le \min\{i,\, N-1\}} a_k b_{i-k}. \tag{2} \]

If these expressions are computed directly, the number of operations needed to obtain the coefficients $c_i$ is $\Theta(N^2)$, which means that performance is seriously affected as the operands grow. In the context of cryptosystems based on the RLWE problem, as observed by Lindner and Peikert, security is strongly related to the degree of the polynomial ring [LP11]. Specifically for YASHE, Lepoint and Naehrig propose parameters with $N$ varying between $2^{11}$ and $2^{16}$, providing a security level equivalent to $\lambda = 128$ bits [FV12]. Hence, an efficient implementation of polynomial multiplication for operands of large degree is vital for the performance of these cryptosystems.
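The direct evaluation of Equations 1 and 2 can be sketched as below. This is only an illustration of the quadratic baseline, with a function name of our choosing; the summation bounds are exactly those of Equation 2.

```python
def poly_mul_schoolbook(a, b):
    """Multiply two polynomials given as coefficient lists of equal length N.

    Returns the 2N - 1 coefficients c_0, ..., c_{2N-2} of Equation 1.
    """
    n = len(a)
    assert len(b) == n, "both operands must have N coefficients"
    c = [0] * (2 * n - 1)
    for i in range(2 * n - 1):
        lo = max(0, i - (n - 1))
        hi = min(i, n - 1)
        for k in range(lo, hi + 1):  # bounds of the sum in Equation 2
            c[i] += a[k] * b[i - k]
    return c

# (1 + 2x + 3x^2) * (4 + 5x + 6x^2)
product = poly_mul_schoolbook([1, 2, 3], [4, 5, 6])
```

The two nested loops perform $N^2$ coefficient multiplications in total, which is the cost that the transforms discussed next try to avoid.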

The Discrete Fourier Transform (DFT) provides a domain where polynomial multiplication is simplified to a coefficient-wise operation. Thus, this operation is computed in linear time when the operands lie in the transform domain. Applying the transform, however, requires an elegant strategy for its total cost to remain sub-quadratic.

Two variants of the DFT were analyzed for this work. The first is the Fast Fourier Transform (FFT), which applies the transform to the operands through a divide-and-conquer algorithm with cost $\Theta(N \log N)$. The second, the Number-Theoretic Transform (NTT), is an alternative to the FFT. While the DFT and FFT are built using the complex field, the NTT uses a finite field. Either way, both variants enable one to execute a polynomial multiplication with complexity $O(N \log N)$, asymptotically lower than that of the naive algorithm.

Definition 4 (Primitive N-th roots of unity). Let $p$ be a prime and $N$ an integer such that $N \ge 2$. We define $\omega_N \in \mathbb{Z}_p$ as a primitive $N$-th root of unity if and only if:

• $\omega_N^N \equiv 1 \pmod p$,

• $\omega_N^x \not\equiv 1 \pmod p$ for every $x$ with $1 \le x < N$.

2.3 Discrete Fourier Transform

The DFT is a bijective function between $\mathbb{R}^N$ and $\mathbb{C}^N$ [Har78]. It is defined as the multiplication of a Vandermonde matrix $V_N$, generated by the primitive $N$-th root of unity $\omega_N = \exp\left(\frac{2\pi i}{N}\right)$, where $i \in \mathbb{C}$ and $i^2 = -1$, by a vector $a$, as described in Equation 3 for a vector of polynomial coefficients $a$.

\[
\mathrm{DFT}(a) = V_N \cdot a =
\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & \omega_N & \omega_N^{2} & \cdots & \omega_N^{N-1} \\
1 & \omega_N^{2} & \omega_N^{4} & \cdots & \omega_N^{2(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega_N^{N-1} & \omega_N^{2(N-1)} & \cdots & \omega_N^{(N-1)^2}
\end{pmatrix}
\cdot
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{N-1} \end{pmatrix}
=
\begin{pmatrix} \hat{a}_0 \\ \hat{a}_1 \\ \hat{a}_2 \\ \vdots \\ \hat{a}_{N-1} \end{pmatrix}
= \hat{a}
\tag{3}
\]

It is easy to see that this formulation of the DFT is equivalent to evaluating the polynomial with coefficients $a$ at $\omega_N^i$, for $i \in \{0, \dots, N-1\}$, which explains the quadratic cost of the algorithm.

2.4 Fast Fourier Transform

The FFT is a more efficient formulation of the DFT [CCF+67]. It replaces the matrix-vector multiplication with an elegant divide-and-conquer algorithm that enables one to apply this transform with complexity $\Theta(N \log N)$. This provides higher speed and smaller errors, since it requires fewer floating-point operations.
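To make the relation between the two formulations concrete, the sketch below implements the direct matrix-vector DFT and a textbook recursive radix-2 FFT, both with the root $\omega_N = \exp(2\pi i / N)$ used in this section. It assumes that the input length is a power of two; the function names are ours and the code is not tuned for performance.

```python
import cmath

def dft(a):
    """Direct DFT: evaluate the polynomial at every power of omega_N."""
    n = len(a)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(a[k] * w ** (i * k) for k in range(n)) for i in range(n)]

def fft(a):
    """Recursive radix-2 FFT; len(a) is assumed to be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])  # transform of the even-indexed coefficients
    odd = fft(a[1::2])   # transform of the odd-indexed coefficients
    out = [0j] * n
    for i in range(n // 2):
        t = cmath.exp(2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + t
        out[i + n // 2] = even[i] - t
    return out

def poly_mul_fft(a, b):
    """Integer polynomial product via pointwise multiplication."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    fa = fft(a + [0] * (n - len(a)))
    fb = fft(b + [0] * (n - len(b)))
    fc = [x * y for x, y in zip(fa, fb)]
    # Inverse transform through conjugation, then scaling by 1/n.
    inv = [v.conjugate() / n for v in fft([x.conjugate() for x in fc])]
    return [round(v.real) for v in inv[: len(a) + len(b) - 1]]
```

The final rounding step is needed because the intermediate values are floating-point complex numbers, which hints at the precision concern raised for RLWE-based schemes in the next subsection.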

2.5 Number-Theoretic Transform

Despite the FFT being a well-known algorithm widely used in the literature, it has an undesirable incompatibility with RLWE-based cryptosystems. As mentioned above, the DFT and FFT use a complex field element to build the Vandermonde matrix, which affects all of their arithmetic. This might incur precision errors due to the floating-point arithmetic and the conversions needed to implement a cryptosystem that uses finite field arithmetic. The NTT follows a different strategy [LN16]. Instead of generating a primitive root $\omega_N$ from a complex number, the NTT uses a finite field element, as shown in Lemma 1.

Lemma 1. Fix an integer $N$ and a prime $p$ of the form $p = k \cdot N + 1$, for some integer $k$. Then $\omega_N = r^k \bmod p$ is a primitive $N$-th root of unity, where $r$ is a primitive root of $p$ [Arn97].

Proof. Let $N$ be an integer greater than or equal to 2, $p$ a prime such that there exists a positive integer $k$ that satisfies $p = k \cdot N + 1$, and $r$ a primitive root modulo $p$. Moreover, fix $\omega_N \equiv r^k \bmod p$. Then $\omega_N^N \equiv r^{kN} \equiv r^{p-1} \equiv 1 \bmod p$. Furthermore, $\omega_N^m \not\equiv 1 \bmod p$ if $0 < m < N$, since $\omega_N^m \equiv r^{km}$ with $0 < km < p - 1$ and $r$ is a primitive root of $p$. Hence, $\omega_N$ is a primitive $N$-th root of unity in $\mathbb{Z}_p$.
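Lemma 1 translates almost directly into code. The sketch below, with helper names of our choosing, finds a primitive root of $p$ by brute force, derives $\omega_N = r^k \bmod p$, and uses it in a transform over $\mathbb{Z}_p$. For brevity the transform is the direct quadratic one (the Vandermonde product of Equation 3 with $\omega_N \in \mathbb{Z}_p$); it is an assumption of this example, not a description of an optimized NTT.

```python
def prime_factors(n):
    """Set of prime factors of n, by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def primitive_root(p):
    """Smallest primitive root modulo a prime p (brute force)."""
    factors = prime_factors(p - 1)
    for r in range(2, p):
        if all(pow(r, (p - 1) // f, p) != 1 for f in factors):
            return r
    raise ValueError("no primitive root found")

def nth_root_of_unity(N, p):
    """Lemma 1: omega_N = r^k mod p for a prime p = k * N + 1."""
    assert (p - 1) % N == 0, "p must have the form k * N + 1"
    return pow(primitive_root(p), (p - 1) // N, p)

def ntt(a, omega, p):
    """Direct transform over Z_p (quadratic cost, for illustration)."""
    N = len(a)
    return [sum(a[k] * pow(omega, i * k, p) for k in range(N)) % p
            for i in range(N)]

def intt(A, omega, p):
    """Inverse transform: use omega^-1 and scale by N^-1 modulo p."""
    inv_N = pow(len(A), -1, p)
    return [(inv_N * v) % p for v in ntt(A, pow(omega, -1, p), p)]

def poly_mul_ntt(a, b, p):
    """Polynomial product, valid while the true coefficients are < p."""
    N = 1
    while N < len(a) + len(b) - 1:
        N *= 2
    omega = nth_root_of_unity(N, p)
    fa = ntt(a + [0] * (N - len(a)), omega, p)
    fb = ntt(b + [0] * (N - len(b)), omega, p)
    fc = [(x * y) % p for x, y in zip(fa, fb)]
    return intt(fc, omega, p)[: len(a) + len(b) - 1]
```

For instance, with $N = 8$ and $p = 17 = 2 \cdot 8 + 1$, the smallest primitive root is $r = 3$ and the lemma yields $\omega_8 = 3^2 = 9$. All arithmetic is exact integer arithmetic, so no rounding step is required.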

3 Related work

This work is motivated by the need to raise security standards when manipulating data, above all in the cloud, where third-party hardware is in use and thus cannot be considered a secure environment. Different approaches can be found in the literature aiming to reduce the attack surface in this context. In this section we discuss some of the main directions, based on property-preserving encryption or on statistical methods.

3.1 Functional encryption

A frequently undervalued security requirement for cloud computing is the protection of data not only in storage but also during processing. Functional encryption schemes are natural candidates to fulfill this need, since they enable one to operate over ciphertexts without decrypting them and, this way, without exposing the plaintext itself.

These schemes are characterized by a function that interacts with ciphertexts in a useful way, for instance revealing a specific piece of information but nothing else. For example, consider a monitoring system with cameras scattered throughout a city. Such a structure enables the administrator to acquire sensitive data, such as the routine, location, and affiliation of possibly millions of people in real time. Through a human face recognition algorithm one is able to locate a target and related information. Thus, this scenario demands carefully built systems that protect the secrecy of data and guarantee that only those with legal authorization are capable of accessing this information. While traditional cryptography cannot help in such a scenario, functional encryption provides the capability of searching an encrypted dataset and revealing only the entries regarding the target.

In the following we present some important classes of functional encryption schemes.

Homomorphic Encryption. Homomorphic encryption (HE) schemes allow the evaluation of general functions over encrypted data to obtain, after decryption, a result equivalent to the computation applied over plaintexts. Such functions are composed of additions or multiplications, as shown in Definition 5.

Definition 5. Let $E$ be an encryption function and $D$ the corresponding decryption function. Moreover, let $m_1$ and $m_2$ be plaintexts. The pair $(E, D)$ forms an encryption scheme with the homomorphic property for some operator $\odot$ if and only if the following holds:

\[ E(m_1) \circ E(m_2) \equiv E(m_1 \odot m_2) \]

The operation $\circ$ in the ciphertext domain is equivalent to $\odot$ in the plaintext domain.
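Definition 5 can be illustrated with a toy version of the Paillier cryptosystem, in which multiplying ciphertexts corresponds to adding plaintexts. The sketch below assumes the common simplification $g = n + 1$, uses tiny hard-coded primes, and draws randomness from Python's non-cryptographic `random` module; it is meant only to show the homomorphic property and offers no security.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because g = n + 1 is assumed
    return n, (n, lam, mu)

def encrypt(n, m):
    """E(m) = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)  # not a cryptographic RNG
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    """D(c) = L(c^lambda mod n^2) * mu mod n, with L(u) = (u - 1) / n."""
    n, lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

n, priv = keygen(17, 19)
c1, c2 = encrypt(n, 20), encrypt(n, 22)
c_sum = (c1 * c2) % (n * n)   # the operation in the ciphertext domain
recovered = decrypt(priv, c_sum)  # equals 20 + 22 modulo n
```

In the notation of Definition 5, the ciphertext-domain operation is multiplication modulo $n^2$ and the plaintext-domain operation is addition modulo $n$.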

Homomorphic encryption is proposed as a solution for privacy-preserving computation in the cloud because computation can be outsourced without the cloud (or an adversary) learning any information about the plaintexts beyond what is necessary for homomorphic evaluation. Homomorphic cryptosystems can be classified according to the supported operations and their limitations:

Partially homomorphic encryption (PHE) schemes support Definition 5 for either unlimited addition or unlimited multiplication operations. PHE schemes have long been known, such as the Paillier and ElGamal schemes [Pai99, ElG85]. These are well known by the community and are characterized by their good performance.

Fully homomorphic encryption (FHE) schemes support Definition 5 for both addition and multiplication operations, also without restrictions. Currently these cryptosystems are built through SHE and LHE schemes.

Somewhat homomorphic encryption (SHE) schemes are similar to FHE but with an upper bound on the number of homomorphic operations that can be applied to a ciphertext without breaking decryption capabilities. Gentry was the first to demonstrate how to build an FHE scheme from SHE through a technique called bootstrapping. It reduces the noise accumulated after a sequence of homomorphic operations by applying the decryption procedure homomorphically, obtaining a new ciphertext that encrypts the same value as before but has lower noise [Gen10].

Leveled fully homomorphic encryption (LHE) is a subclass of SHE. Brakerski, Gentry and Vaikuntanathan [BGV12] define these cryptosystems as those capable of evaluating arbitrary functions of a certain complexity on ciphertexts, without the need for a bootstrapping operation, as long as the encryption parameters have been properly customized. This way, they generate ciphertexts that support not an unlimited quantity of homomorphic operations but only the amount required for a certain application. We have seen several proposals following the same approach as a basis for new SHE cryptosystems, such as NTRU [HPS98, SS13], BGV [BGV11], YASHE [BLLN13], FV [LN14], LTV [LATV13a], GSW [GSW13] and others [CS16, DSS16].

Performance is an obstacle to the application of SHE and LHE schemes in the real world, as in machine learning algorithms [BK15, PA17, jLKS16]. These cryptosystems have roots in polynomial arithmetic inherited from the mathematical problems used in their construction, like Learning With Errors (LWE) [Reg05], Ring-Learning With Errors (RLWE) [LPR10] and the Decision Small Polynomial Ratio (DSPR) [LATV13b]. Furthermore, to obtain appropriate security levels, high-degree polynomial rings are needed, which causes performance issues and incompatibility with resource-constrained devices, such as those typical of the IoT.

This became even worse after the work of Albrecht et al., which revealed an attack on DSPR-based schemes that affects YASHE and LTV, which at the time presented the best performance in their class [ABD16, Kud16]. As a result, techniques to speed up and optimize the arithmetic are a hot research topic [DS15, DGBL+15, AA15, LN14, PNPM15].

PHE cryptosystems have been known for decades [RAD78]. However, the most common data processing applications, such as those arising from statistics, machine learning or genomics processing, frequently require support for both addition and multiplication operations simultaneously. Therefore, these schemes are not suitable for general computation. While FHE performance still presents a major challenge, SHE and LHE schemes have been useful for solving computational problems of moderate complexity.

In terms of security, homomorphic cryptosystems achieve at most the IND-CCA1 level, which means that the scheme is not secure against an attacker with arbitrary access to a decryption oracle [BDPR98, LMSV12]. This is a natural consequence of the design requirements, since these cryptosystems allow any entity to manipulate ciphertexts. Most proposals, however, reach only IND-CPA¹ and stay secure against attackers without any access to a decryption oracle.

Order-Revealing Encryption. Order-revealing encryption (ORE) schemes are characterized by having, in addition to the usual set of cryptographic functions like keygen and encrypt, a function capable of comparing ciphertexts and returning the order of the underlying plaintexts.

Definition 6 (ORE). Let $E$ be an encryption function, $C$ be a comparison function, and $m_1$ and $m_2$ be plaintexts from the message space. The pair $(E, C)$ is defined as an encryption scheme with the order-revealing property if:

\[
C(E(m_1), E(m_2)) =
\begin{cases}
\text{lower}, & \text{if } m_1 < m_2, \\
\text{equal}, & \text{if } m_1 = m_2, \\
\text{greater}, & \text{otherwise.}
\end{cases}
\]

A specialization of ORE is order-preserving encryption (OPE), which implements such a function as a simple numerical comparison [BLR+15].
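The interface of Definition 6 can be sketched with a toy bitwise construction loosely modeled on the PRF-based scheme of Chenette et al. [CLWW16]. The details below are assumptions of this example rather than a faithful reproduction of that work: HMAC-SHA256 reduced modulo 3 plays the role of the pseudo-random function, messages are 16-bit integers, and all names are ours. The sketch is deterministic and leaks the position of the first differing bit, so it is illustrative only.

```python
import hashlib
import hmac

N_BITS = 16  # illustrative message length

def _prf(key, i, prefix_bits):
    """Pseudo-random value in {0, 1, 2} derived from position and prefix."""
    msg = f"{i}|{prefix_bits}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 3

def ore_encrypt(key, m, n_bits=N_BITS):
    """Encrypt bit by bit: u_i = PRF(key, i, preceding bits) + b_i mod 3."""
    assert 0 <= m < 2 ** n_bits
    bits = format(m, f"0{n_bits}b")
    return [(_prf(key, i, bits[:i]) + int(bits[i])) % 3
            for i in range(n_bits)]

def ore_compare(ct1, ct2):
    """Comparison function C of Definition 6, using ciphertexts only."""
    for u, v in zip(ct1, ct2):
        if u != v:
            # At the first difference the PRF values coincide, so the
            # ciphertext symbols differ exactly by the plaintext bits.
            return "lower" if v == (u + 1) % 3 else "greater"
    return "equal"

key = b"illustrative-key"
outcome = ore_compare(ore_encrypt(key, 5), ore_encrypt(key, 9))
```

Note that `ore_compare` takes no key: anyone holding two ciphertexts can order them, which is precisely the leakage that the attacks discussed in this section exploit.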

Differently from OPE, ORE is not inherently deterministic [KS12]. For example, Chenette et al. propose an ORE scheme that applies a pseudo-random function over an OPE scheme, while Lewi and Wu propose an ORE scheme built entirely upon symmetric primitives, capable of limiting the use of the comparison function and reducing the leakage inherent to this routine [CLWW16, LW16]. Their solution works by defining ciphertexts composed of pairs $(ct_L, ct_R)$. To compare ciphertexts $ct_A$ and $ct_B$, it requires $ct_{A_L}$ and $ct_{B_R}$. This way, the data owner can store only one side of those pairs in a remote database, being certain that no one will be able to make comparisons between those elements.

¹ A cryptosystem is said to achieve IND-CPA security if no adversary is able to recognize a ciphertext created from a randomly chosen message from a known two-element message set.

Any scheme that reveals the numerical order of plaintexts through ciphertexts is vulnerable to inference attacks and frequency analysis, such as those described by Naveed et al. [NKW15]. Although ORE does not completely rule out such attacks, it offers stronger defenses than OPE. It is important to understand, besides the applicability, how vulnerable ORE/OPE schemes can be. The works of Boneh et al., Durak et al., Chenette et al., Lewi and Wu, and Cash et al. investigate and propose strategies to mitigate this vulnerability [BLR+15, DDC16, CLWW16, LW16, CLOZ16].

Identity-Based Encryption. Identity-based encryption (IBE) is a special class of public-key schemes in which a public key is set as an arbitrary string, such as an e-mail address or a device identifier [PLL16, GS02]. Usually this is done through a trusted third party, called the key generation center (KGC), capable of generating a secret key relative to the identity. There are three approaches for constructing IBE schemes: bilinear maps [BF01, Gen06], the quadratic residuosity assumption [Coc01, BGH07] and lattices [CHKP10, ABB10, GPV08].

Going beyond the convenience of having an arbitrary public key, IBE may be used in completely different contexts.

3.2 Searchable encryption

Suppose a scenario where Alice stores a set of documents with an untrusted entity, Bob. She would like to keep this data encrypted because, as defined, Bob cannot be trusted. Alice would also like to occasionally retrieve a subset of documents according to a predicate without revealing any sensitive information to Bob. Thus, sharing the decryption key is not an option. The problem lies in the fact that communication between Alice and Bob may (and probably will) be constrained. Hence, a naive solution consisting of Bob sending all documents to Alice and letting her decrypt and select whatever she wants may not be feasible. Alice must then implement some mechanism to protect her encrypted data so that Bob will be able to identify the desired documents without knowing their contents or the selection criteria [SWPP00].

The problem of using standard encryption in Alice’s database is that it eliminates the capability to select records or evaluate arbitrary functions without the cryptographic keys. This reduces the system to a mere storage service, however complex and large. In the context of cloud computing, it discards several advantages offered by the service. Searchable encryption enables the cloud to manipulate encrypted data on behalf of a client without learning information about it. Hence, it solves both of the aforementioned problems, keeping confidentiality with regard to the cloud while retaining some of its interesting features.
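The scenario of Alice and Bob can be sketched with a deliberately simple keyword index. In this hypothetical example, which does not correspond to any of the systems discussed in this document, Alice derives opaque search tokens with HMAC-SHA256 and Bob stores only a map from tokens to document identifiers; encryption of the documents themselves is omitted. Such a scheme still leaks which tokens repeat and which documents match, so it should be read as an illustration of the workflow, not as a secure design.

```python
import hashlib
import hmac
from collections import defaultdict

def token(key, keyword):
    """Client side: derive a deterministic, opaque search token."""
    return hmac.new(key, keyword.lower().encode(), hashlib.sha256).hexdigest()

def build_index(key, documents):
    """Client side: map each keyword token to the documents containing it.

    documents: dict from document id to a list of keywords.
    """
    index = defaultdict(set)
    for doc_id, keywords in documents.items():
        for word in keywords:
            index[token(key, word)].add(doc_id)
    return dict(index)

def server_search(index, search_token):
    """Server side: Bob matches a token without learning the keyword."""
    return sorted(index.get(search_token, set()))

key = b"alice-secret-key"  # never leaves Alice's side
documents = {
    "d1": ["cloud", "privacy"],
    "d2": ["privacy"],
    "d3": ["gpu"],
}
index = build_index(key, documents)  # uploaded to Bob
hits = server_search(index, token(key, "privacy"))
```

Bob returns only the identifiers of matching documents, so Alice avoids downloading and decrypting the whole collection, which is the communication constraint raised above.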

Several works investigate database designs for maintaining full-time-encrypted but also functional databases [CJJ+13, AA16, MRAA18]. In this context, functional encryption schemes are ubiquitous. CryptDB, Arx and Seabed are state-of-the-art implementations that provide encrypted search functionalities to standard databases, such as MySQL and MongoDB [PRZB11, PBP16, PBC+16a]. All three prefer to provide a database-agnostic proxy responsible for encrypting queries and processing the output, applying different cryptosystems according to the application needs for data. As shown by Bosch’s survey on the subject, this is still an open problem because the trade-off between strong security and performance penalties is still not appropriate for real applications [BHJP14].

The development of efficient and secure solutions for the management of datasets depends on awareness of the threats we intend to mitigate. To this end, this work follows Grubbs’ definitions of adversaries for a database [GRS17].

Active attacker. The worst-case scenario is when the attacker acquires full control over the server, being capable of performing arbitrary operations. Thus, the attacker is not committed to following any protocol.

Snapshot attacker. The adversary obtains a snapshot of the dataset containing the primary data and indexes but no information about issued queries and how they access the encrypted data.

Persistent passive attacker. Another possibility is a scenario in which the attacker cannot interfere with the server functionality but can observe all of its operations. We consider not only attackers that inspect issued queries in real time, but also those that are able to recover them later. As demonstrated by Grubbs, the data contained in a real-world database goes far beyond the primary dataset (names, addresses, . . . ). It also includes logs, caches, and auxiliary tables (such as MySQL’s diagnostic tables) used, for instance, to guarantee ACID² and enable the server to undo incomplete queries after a power failure. It is very likely that an attacker competent enough to subvert the security protocols of the system will also be capable of recovering these secondary datasets.

The idea of a snapshot attacker is very popular among solutions and researchers intending to develop encrypted databases. Nevertheless, it underestimates the attacker and the many side attacks a motivated adversary can execute. As Rogaway remarks, we cannot make the mistake of reducing the adversary to the lazy and abstract Bob; we must remember that it can go far beyond that and take the form of a military-industrial-surveillance program with a budget of billions and the capability to surpass the obvious [Rog15].

The management of a dataset is performed by a database management system (DBMS). It is composed of several layers responsible for coordinating read and write operations, guaranteeing data consistency and integrity, and controlling user access. The engineering of such a system is a complex task and requires smart optimizations to be able to store data, process queries and return the outcome with minimum latency and good scalability.

For this reason, searchable encryption solutions are usually implemented not inside but on top of these systems, as a middleware that translates encrypted queries to the DBMS without revealing plaintext data and decrypts the outcome. This strategy carries the decades of optimizations incorporated in current DBMSs over to encrypted data. It is important to state that, ideally, security features should be designed in conjunction with the underlying database. Long-term solutions are expected to assimilate those strategies internally in the DBMS core.

² Refers to a set of desirable properties for a database. Acronym for “Atomicity, Consistency, Isolation, Durability”.

We can mention three state-of-the-art implementations of databases that offer full-time encryption: CryptDB, Arx, and Seabed.

CryptDB is a software layer that provides capabilities to store data in a remote SQL database and query over it without revealing sensitive information to the DBMS. It introduces a proxy layer responsible for encrypting and adjusting queries to the database and decrypting the outcome [PRZB11].

Arx is a database system implemented on top of MongoDB [PBP16]. It targets much stronger security properties and claims to protect the database with the same level of security as regular AES-based encryption³, achieving IND-CPA security. This is a direct consequence of the almost exclusive use of AES to construct selection operators, even for range queries, which brings not only strong security but also good performance due to the efficiency of symmetric primitives, sometimes even benefiting from hardware implementations. The authors report a performance overhead of approximately 10% when Arx is used to replace the database of ShareLaTeX.

Seabed was developed by Papadimitriou et al. and aims at Business Intelligence (BI) applications interested in keeping data secure in the cloud [PBC+16b]. Like CryptDB and Arx, Seabed consists of a client-side query translator (to SQL), a query planner, and a proxy that connects to an Apache Spark instance [SS15b]. Its main foundations are two new cryptographic constructions, additively symmetric homomorphic encryption (ASHE) and Splayed ASHE (SPLASHE). The former is used to replace Paillier as the additively homomorphic encryption scheme; the authors state that their construction is up to three orders of magnitude faster. The latter is used to protect the database against inference attacks [NKW15].

3.3 Differential Privacy

Crowd computing is characterized by massive data collection, frequently through daily-use devices like smartphones and smartwatches. This data can be used for different objectives, such as improving search results, predicting buying options, weather forecasting, health monitoring, and so on. However, the ability to learn from users’ data raises several privacy flags. Thus, to keep users interested in sharing their data, developers must implement procedures to mitigate damage to user privacy.

³ The Advanced Encryption Standard (AES) is a well-established symmetric cipher enabling high-performance implementations in hardware and software [DR99].

The concept of differential privacy has its roots in statistical theory. Using client-side noise addition in the collection of users’ data, one is able to guarantee anonymity through a strong mathematical base. This way, an observer consulting a database is capable of learning common information about a community but nothing about individuals [Dwo08, Dwo06].
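Client-side noise addition can be illustrated with randomized response, a classical mechanism that we use here only as an example; it is not claimed to be the technique adopted in this work. Each user reports a private bit truthfully with probability `p_truth` and flips it otherwise, which satisfies local differential privacy with $\varepsilon = \ln\left(p_{truth} / (1 - p_{truth})\right)$. The collector can still estimate the fraction of ones in the population. Names and parameters are illustrative.

```python
import random

def randomize(bit, p_truth, rng):
    """Client side: report the true bit with probability p_truth."""
    return bit if rng.random() < p_truth else 1 - bit

def estimate_fraction(reports, p_truth):
    """Collector side: unbiased estimate of the fraction of ones.

    The expected observed rate is (1 - p_truth) + f * (2 * p_truth - 1),
    where f is the true fraction, so we invert that relation.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

rng = random.Random(2019)               # fixed seed, reproducibility only
true_bits = [1] * 6000 + [0] * 14000    # 30% of users hold the attribute
p_truth = 0.75                          # epsilon = ln(3)
reports = [randomize(b, p_truth, rng) for b in true_bits]
estimate = estimate_fraction(reports, p_truth)
```

The estimate approaches the true fraction as the number of users grows, while any single report remains plausibly deniable, matching the intuition that the observer learns about the community but not about individuals.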

Among the techniques present in the literature, Xiao et al. apply wavelet transforms to data before adding noise to it, Andrés et al. apply a generalization of differential privacy to build a formal notion of privacy for location-based systems [ABCP13], and other works address the private analysis of networks modeled as graphs [KNRS13]. Deep learning is another field that takes advantage of differential privacy, as demonstrated by recent works [SS15a, ACG+16, JM15].

4 Research proposal

This project aims to develop techniques to efficiently implement secure privacy-preserving computing solutions, making them viable for real-world applications. Our objectives are to propose methods to speed up functional cryptosystems, and frameworks that apply them to build complex applications capable of computing over encrypted data. A complementary objective is to investigate non-encryption-based methods, such as differential privacy.

4.1 Performance improvement of functional cryptosystems

Current LHE schemes have roots in polynomial-based mathematical problems [LPR10, Reg05, LATV13b]. These require large parameters to achieve industry-standard security levels, which implies serious performance issues. We intend to continue previous efforts on pushing computing over encrypted data as a feasible solution for protecting privacy in real-world tasks when implemented on GPGPUs. This shall be made possible by exploring the following research directions:

• Improve algorithmic aspects and software implementation strategies for homomorphic encryption schemes, allowing performance improvements and better utilization of the resources available on GPGPUs. Suggested approaches include exploring variants of these cryptosystems that are friendlier to parallelism, such as those proposed by Bajard et al. and Halevi et al., which adapt the schemes to operate in the CRT domain [LN16, BMP05, HPS18, BEHZ16].

• Explore new constructions of FHE, such as the proposal by Doröz et al. of a new family of cryptosystems based on the decisional finite field isomorphism problem [DHP+17].

• Develop more efficient plaintext encodings that allow simultaneous batched operations, resulting in smaller parameter sizes and fewer operations [GC14].

These research problems address challenging fundamental aspects of homomorphic encryption in both theoretical and applied research, focusing on the most important obstacles faced by our previous results when deployed in realistic settings. They impact the core of homomorphic encryption research and have the potential to narrow the gap between current research and the most relevant applications, substantially shortening the time until privacy-preserving solutions based on homomorphic encryption become viable in production systems.
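The appeal of the CRT domain mentioned in the first research direction can be seen in a small sketch of residue number system (RNS) arithmetic. This is an illustration under assumed parameters, not the construction of the cited works: the three 31-bit moduli and the helper names `to_rns`, `rns_add`, `rns_mul`, and `from_rns` are hypothetical choices. A large integer is stored as independent word-sized residues, so one multiplication becomes several small modular multiplications that can run in parallel, and the result is recovered with the Chinese Remainder Theorem.

```python
from math import prod

# Illustrative pairwise-coprime 31-bit moduli (an assumption, not values from the source).
MODULI = [2**31 - 1, 2**31 - 19, 2**31 - 61]
M = prod(MODULI)  # results are exact as long as they stay below M


def to_rns(x):
    """Split an integer into one small residue per modulus."""
    return [x % m for m in MODULI]


def rns_add(a, b):
    """Add residue by residue."""
    return [(x + y) % m for x, y, m in zip(a, b, MODULI)]


def rns_mul(a, b):
    """Multiply residue by residue; each channel is independent of the others."""
    return [(x * y) % m for x, y, m in zip(a, b, MODULI)]


def from_rns(residues):
    """Reconstruct the integer modulo M with the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = M // m
        total += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return total % M


a = 123456789012345
b = 987654321098
product = from_rns(rns_mul(to_rns(a), to_rns(b)))
sum_ab = from_rns(rns_add(to_rns(a), to_rns(b)))
```

Each residue fits in a 32-bit word, so every channel product fits in 64 bits; this is the property that lets native instructions stand in for multiprecision arithmetic. Operations that do not map directly onto residues, such as division and rounding, are the ones that RNS-oriented variants of the schemes have to rework.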

4.2 Privacy-preserving framework for human face recognition

Evaluating functional encryption schemes in real-world contexts is as important as obtaining efficient implementations of them. We demonstrated this in a previous award-winning work, which proposed a framework for building searchable encrypted databases [AA16, MRAA18].

This direction shall be followed in different contexts. Our first target is to develop secure human face recognition through Eigenfaces, a machine learning technique proposed by Turk and Pentland that enables automated face recognition in real time [TP91]. It works by computing the eigenvectors of the covariance matrix of a set of training face images. These can be thought of as features that together characterize the variation between face images. The following steps compose this stage of the research:

• Analyze the security requirements related to human face recognition.

• Study alternative instantiations of homomorphic encryption capable of offering the security guarantees we need with smaller parameters, and tailored to the floating-point arithmetic required by Eigenfaces. The works of Cheon et al. and Costache et al. give a good starting point by providing an RLWE-based cryptosystem with better encoding capabilities for approximate arithmetic [JHCS16, CS16].

• Improve a previous privacy-preserving algorithm for extracting eigenvectors (also called Principal Component Analysis – PCA) as the main computing step of human face recognition [PA16]. The improved algorithm may be of independent interest, due to its wide applicability in the field of machine learning. Because Eigenfaces and polynomial arithmetic can be highly parallelized, we expect substantial performance gains from the designed algorithms and resulting implementations.
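The eigenvector extraction at the heart of Eigenfaces can be sketched in the clear, without any encryption. The example below is only an illustration: the tiny three-pixel "images", the helper names, and the use of power iteration to find the dominant eigenvector are assumptions made for brevity, not the algorithm of [PA16] or [TP91]. It builds the covariance matrix of the training samples and repeats a matrix-vector product until the direction of largest variance emerges.

```python
import math


def covariance_matrix(samples):
    """Sample covariance of row vectors (one row per training image)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vector):
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration: repeated multiplication converges to the top eigenvector."""
    v = normalize([1.0] * len(matrix))
    for _ in range(iterations):
        v = normalize(mat_vec(matrix, v))
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))  # Rayleigh quotient
    return eigenvalue, v


# Hypothetical training set: five "images" of three pixels each.
faces = [
    [2.0, 4.1, 1.0],
    [1.0, 2.0, 1.1],
    [3.0, 5.9, 0.9],
    [4.0, 8.1, 1.0],
    [0.0, -0.1, 1.0],
]
cov = covariance_matrix(faces)
eigenvalue, v = dominant_eigenpair(cov)
```

The loop consists of multiplications and additions followed by a normalization. The multiply-and-add part is the kind of arithmetic that homomorphic schemes typically support directly, whereas the normalization (a division and a square root) is the sort of step that needs adaptation in the encrypted domain.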

5 Method and schedule

This section describes the characteristics of the research we intend to carry out.

5.1 Methodology

To achieve the aforementioned objectives, the methodology we chose consists of the following steps:

• Identify good-quality libraries, frameworks, programming languages, and compilers to be used in the software development. Examples include SEAL, a well-known Microsoft Research implementation of FV, and Clusion, which provides implementations of several searchable symmetric encryption schemes [DGBL+15, KM16].

• Develop code for missing functionality and improve the state of the art. So far we have invested effort in the development of SPOG, a GPGPU-based implementation of FV, which we expect to surpass SEAL's performance by a wide margin, as discussed in Section 6.

• Describe how and in which way users could benefit from privacy-preserving computing.

• Identify open problems and relevant applications that can make use of privacy-preserving techniques. So far we have worked on searchable encryption and human face recognition, as discussed in Section 6. As alternatives we can mention text analysis of microblogging posts and content-based recommendations [?, ?, ?].

• Study the chosen applications and propose algorithm modifications that make them better suited to the available tools. As observed by Pereira and Aranha, the constraints of homomorphic encryption schemes may make stock versions of certain algorithms infeasible to implement. In their work on secure machine learning, they acknowledge that the principal component analysis (PCA) algorithm is quite problematic in the encrypted domain, mainly because of the high cost of computing orthogonal transformations. Hence, the authors develop an alternative and more HE-friendly instantiation of PCA [PA17].

• Implement proofs of concept of the chosen applications with the proposed algorithm modifications and state-of-the-art methods for privacy-preserving computing, focusing on speed. This task shall follow the Test-Driven Development (TDD) method, which we expect to yield good-quality, scalable code [?]. At the end, different metrics shall be collected from the result to understand the drawbacks of an HE implementation and to propose goals for future work capable of addressing them.

• Publish results in scientific papers and release all relevant and useful code developed during this work under a free software license.

5.2 Technical team

• Project advisor: Diego de Freitas Aranha

– Title: Doctor in Computer Science

– Position: Assistant Professor - MS3 - Institute of Computing, UNICAMP

– Dedication: Full time

– Short resume: Holds a BSc (2005) in Computer Science from the University of Brasilia, and an MSc (2007) and a PhD (2011) in Computer Science from the University of Campinas. He was a visiting PhD student at the University of Waterloo, Canada, for one year and an Adjunct Professor in the Department of Computer Science at the University of Brasilia for two years. He currently works as an Assistant Professor in the Institute of Computing at the University of Campinas. His professional experience is in Cryptography and Computer Security, with a special interest in the efficient implementation of cryptographic algorithms and the security of real-world systems. He coordinated the first team of independent researchers to detect and exploit vulnerabilities in the software of the Brazilian voting machine during controlled tests organized by the electoral authority. He received the Google Latin America Research Award for research on privacy and was named one of the Innovators Under 35 Brazil for his work in electronic voting.

• Student: Pedro Geraldo M. R. Alves

– Title: Master in Computer Science

– Dedication: Full time

– Short resume: Holds a Bachelor's degree in Applied Mathematics (2013) and an MSc in Computer Science (2016), and is currently a PhD student in Computer Science at the University of Campinas. His research focuses on information security, specifically cryptography and homomorphic encryption, and on high-performance computing. He has strong experience with parallel programming using CUDA and with common CPU parallel libraries such as PThreads, OpenMP, and MPI. He has also worked on the development of specialized CUDA software for geophysics and biology.

5.3 Schedule

This project's development shall follow the schedule below:

• Admission – February/2016: Admission to the graduate program of the Institute of Computing.

• Compulsory courses – February/2016 to December/2017: Completion of the credits required by the graduate program. The graduate committee requires 24 course credits to be completed within at most two years. The candidate has already taken courses in Computer Architecture, Cryptographic Algorithms, Parallel Computing, and Databases.

• Qualification exams – March/2018: Qualifying exams must be passed by the fifth semester after admission to the graduate program.

• Project development – February/2016 to February/2020: The project has been under development since admission to the program and shall continue until the last semester. We expect to publish intermediate results in journals or conferences.

• Sandwich Doctorate – 2018: The candidate shall pursue an exchange program for a period of six to twelve months. The destination is still to be defined and depends on obtaining an appropriate scholarship.

• Thesis writing – February/2016 to February/2020: The thesis writing process occurs concurrently with the research and will be intensified in the last semester.

• Thesis defense – February/2020: The thesis shall be defended, presenting the results obtained and concluding the work.

6 Previous work

The student's Master's work consisted of developing a CUDA implementation of YASHE, a promising SHE scheme that offered the best performance and strong security guarantees in its category at the time [AA16, BLLN13]. Named cuYASHE and released to the community under a GPLv3 license [AA15], this implementation employed different mathematical methods to reduce the computational complexity of the arithmetic and take advantage of the parallel power of GPGPUs, such as the Chinese Remainder Theorem to replace expensive multiprecision arithmetic with simpler native instructions, and the Fast Fourier Transform to reduce the complexity of multiplication. Performance improved considerably compared with state-of-the-art implementations on desktop processors and FPGAs, with speedups ranging from 6 to 32 times for polynomial multiplication, a performance-critical operation for evaluating any function over encrypted data. It was demonstrated that GPGPUs are an adequate technology for bootstrapping privacy-preserving computing services.
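The complexity reduction for polynomial multiplication can be illustrated with a short sketch. Note that it swaps in the Number-Theoretic Transform, the exact modular counterpart of the FFT, so that the example stays in integer arithmetic; it is not cuYASHE's code. The prime 998244353, its primitive root 3, and the function names are illustrative assumptions, and the reduction modulo the ring polynomial used by actual schemes is omitted.

```python
P = 998244353  # NTT-friendly prime, 119 * 2^23 + 1 (illustrative choice)
G = 3          # a primitive root modulo P


def ntt(a, root):
    """Recursive radix-2 transform; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], root * root % P)
    odd = ntt(a[1::2], root * root % P)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P
        out[k] = (even[k] + t) % P           # butterfly: upper half
        out[k + n // 2] = (even[k] - t) % P  # butterfly: lower half
        w = w * root % P
    return out


def poly_mul(f, g):
    """Multiply two coefficient lists modulo P in O(n log n) operations."""
    size = 1
    while size < len(f) + len(g) - 1:
        size *= 2
    root = pow(G, (P - 1) // size, P)  # primitive size-th root of unity
    fa = ntt(f + [0] * (size - len(f)), root)
    ga = ntt(g + [0] * (size - len(g)), root)
    pointwise = [x * y % P for x, y in zip(fa, ga)]
    inverse = ntt(pointwise, pow(root, P - 2, P))  # inverse transform
    inv_size = pow(size, P - 2, P)
    return [c * inv_size % P for c in inverse][: len(f) + len(g) - 1]


def schoolbook_mul(f, g):
    """Quadratic-time reference used to check the transform-based product."""
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] = (out[i + j] + x * y) % P
    return out
```

For instance, `poly_mul([1, 2, 3], [4, 5])` returns `[4, 13, 22, 15]`, the coefficients of (1 + 2x + 3x^2)(4 + 5x). The butterflies inside each level of the transform are independent of one another, which is what makes this method a good fit for GPGPUs.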

Following this direction, the first part of the student's doctoral research led to the development of the library Secure Processing over GPGPUs (SPOG). It builds on the effort invested in cuYASHE by combining the lessons learned with the work of Bajard et al., which replaces expensive operations, such as divisions, with others that are friendlier to the Chinese Remainder Theorem context [BEHZ16]. Moreover, on account of the attack presented by Albrecht et al., which harms DSPR-based schemes such as YASHE, SPOG replaces YASHE with FV, an alternative proposal by Fan and Vercauteren that does not perform as well as YASHE but depends only on RLWE and is therefore not affected by that attack [ABD16, FV12]. This implementation strategy is a trend in current HE implementations [DGBL+15].

Our intent with SPOG is to create a base for more complex implementations of real-world applications. Thus, human face recognition has been chosen as the first application to be implemented over SPOG. Human face recognition is an important technique for solving several problems in security, such as recognizing persons of interest in surveillance multimedia or authenticating users for access control purposes. Popular approaches are based on machine learning, such as Eigenfaces, which was proposed by Turk and Pentland and is based on PCA. It is composed of a training phase with legitimate data for feature extraction and a testing phase to verify whether unseen samples match the previously observed data [TP91].

A second branch of the student's work revisited the classical problem of searching an encrypted dataset. In an award-winning work, the student proposed a framework for searching encrypted databases that preserves data secrecy in an untrusted environment while retaining search and update capabilities [AA16]. It employs order-revealing encryption to provide selection with time complexity in Θ(log n), and homomorphic encryption to enable computation over ciphertexts. When compared to the current state of the art, the approach provides higher security and flexibility. A proof-of-concept implementation on top of MongoDB is offered and presents an 11-fold overhead for a selection query over an encrypted database.
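The logarithmic selection cost follows from the fact that order-revealing encryption lets the server compare ciphertexts, so it can run a binary search over a sorted index without decrypting anything. The sketch below shows only that control flow. `ToyORE` is a hypothetical and completely insecure stand-in (it hides values behind an XOR with a fixed key, and its comparison function embeds that key), and it does not reproduce the framework of [AA16] or any real ORE construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ciphertext:
    blob: int  # treated as opaque by the server


class ToyORE:
    """Insecure placeholder that only mimics the interface of an ORE scheme."""

    def __init__(self, key):
        self._key = key

    def encrypt(self, value):
        return Ciphertext(value ^ self._key)

    def comparator(self):
        """Return a function that reveals only the order of two ciphertexts."""
        key = self._key

        def compare(a, b):
            x, y = a.blob ^ key, b.blob ^ key
            return (x > y) - (x < y)

        return compare


def select(index, query, compare):
    """Server-side binary search over ciphertexts sorted by plaintext order."""
    lo, hi = 0, len(index) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        order = compare(index[mid], query)
        if order == 0:
            return mid, comparisons
        if order < 0:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, comparisons


ore = ToyORE(key=0x5A5A5A5A)
compare = ore.comparator()
plaintexts = list(range(0, 2048, 2))  # 1024 sorted values (hypothetical data)
index = [ore.encrypt(value) for value in plaintexts]

position, used = select(index, ore.encrypt(1000), compare)
missing, used_missing = select(index, ore.encrypt(1001), compare)
```

With 1024 indexed records, both the successful and the unsuccessful query finish within 11 comparisons, in line with the Θ(log n) bound. The price of this efficiency is that the server learns the relative order of the records.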

An extended version of this article was recently published in the Journal of Internet Services and Applications (JISA) [MRAA18]. The main novelty of this work was the improvement of the proof-of-concept implementation, which allowed us to reformulate the presented experiment. We chose the Netflix Grand Prize as a case study and discuss in depth the privacy problem caused by deanonymisation through inference attacks on the publicly released dataset. We explore the benefits of our framework in avoiding such leakage while enabling competitors to develop good-quality predictors. Moreover, we implement fundamental queries for the prize's winning solution over an encrypted database following our framework and measure the performance penalty of such a technique.

7 Conclusion

The era of massive data gathering has arrived together with financially attractive cloud services. Hence, privacy-preserving solutions must be developed to provide proper secrecy guarantees while preserving the ability of authorized personnel to obtain useful information from users' data. Such solutions depend on structured encryption and statistical methods, which are frequently known for their poor performance. Research on improving the performance of such strategies is therefore critical for the future of Internet applications.

This document discussed the trend towards privacy-preserving computing that drives our efforts during the student's doctorate. The relevant bibliography was revisited. The objective of the project was established as the development of efficient implementations of privacy-preserving algorithms for real-world applications. Lastly, the methodology and an appropriate schedule for the next years of research were presented.

References

[AA15] Pedro G. M. R. Alves and Diego F. Aranha. cuYASHE.

https://github.com/pdroalves/cuYASHE, 2015.

[AA16] Pedro G. M. R. Alves and Diego F. Aranha. A framework for searching encrypted

databases. In Anais do XVI Simpósio Brasileiro em Segurança da Informação e

de Sistemas Computacionais (SBSeg 2016), 2016.

[ABB10] Shweta Agrawal, Dan Boneh, and Xavier Boyen. Advances in Cryptology –

CRYPTO 2010: 30th Annual Cryptology Conference, Santa Barbara, CA, USA,

August 15-19, 2010. Proceedings, chapter Lattice Basis Delegation in Fixed Di-

mension and Shorter-Ciphertext Hierarchical IBE, pages 98–115. Springer Berlin

Heidelberg, Berlin, Heidelberg, 2010.

[ABCP13] Miguel E. Andrés, Nicolás E. Bordenabe, Konstantinos Chatzikokolakis, and

Catuscia Palamidessi. Geo-indistinguishability: Differential privacy for location-

based systems. In Proceedings of the 2013 ACM SIGSAC Conference on Com-

puter & Communications Security, CCS ’13, pages 901–914, New York, NY,

USA, 2013. ACM.

[ABD16] Martin Albrecht, Shi Bai, and Léo Ducas. A Subfield Lattice Attack on Over-

stretched NTRU Assumptions, pages 153–178. Springer Berlin Heidelberg, Berlin,

Heidelberg, 2016.

[ACG+16] Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov,

Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceed-

ings of the 2016 ACM SIGSAC Conference on Computer and Communications

Security, pages 308–318. ACM, 2016.

[Arn97] Jörg Arndt. Remarks on FFT algorithms. 1997.

[BBC16] BBC News. Yahoo ’state’ hackers stole data from 500 million users, 2016. Last

accessed: 23/09/2016.

[BDPR98] Mihir Bellare, Anand Desai, David Pointcheval, and Phillip Rogaway. Advances

in Cryptology — CRYPTO ’98: 18th Annual International Cryptology Confer-

ence Santa Barbara, California, USA August 23–27, 1998 Proceedings, chapter

Relations among notions of security for public-key encryption schemes, pages

26–45. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998.

[BEHZ16] Jean-Claude Bajard, Julien Eynard, Anwar Hasan, and Vincent Zucca. A full

RNS variant of FV like somewhat homomorphic encryption schemes. Cryptology

ePrint Archive, Report 2016/510, 2016. https://eprint.iacr.org/2016/510.

[BF01] Dan Boneh and Matt Franklin. Advances in Cryptology — CRYPTO 2001: 21st

Annual International Cryptology Conference, Santa Barbara, California, USA,

August 19–23, 2001 Proceedings, chapter Identity-Based Encryption from the

Weil Pairing, pages 213–229. Springer Berlin Heidelberg, Berlin, Heidelberg,

2001.

[BGH07] Dan Boneh, Craig Gentry, and Michael Hamburg. Space-efficient identity

based encryption without pairings. In Foundations of Computer Science, 2007.

FOCS’07. 48th Annual IEEE Symposium on, pages 647–657. IEEE, 2007.

[BGV11] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. Fully Homomor-

phic Encryption without Bootstrapping. Cryptology ePrint Archive, Report

2011/277, 2011. http://eprint.iacr.org/.

[BGV12] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. (Leveled) Fully Ho-

momorphic Encryption Without Bootstrapping. In Proceedings of the 3rd Inno-

vations in Theoretical Computer Science Conference, ITCS ’12, pages 309–325,

New York, NY, USA, 2012. ACM.

[BHJP14] Christoph Bösch, Pieter Hartel, Willem Jonker, and Andreas Peter. A survey of

provably secure searchable encryption. ACM Comput. Surv., 47(2):18:1–18:51,

August 2014.

[BK15] Adil Bouti and Jörg Keller. Towards practical homomorphic encryption in cloud

computing. In Proceedings of the 2015 IEEE 4th Symposium on Network Cloud

Computing and Applications, NCCA ’15, pages 67–74, Washington, DC, USA,

2015. IEEE Computer Society.

[BLLN13] Joppe W. Bos, Kristin Lauter, Jake Loftus, and Michael Naehrig. Improved Security for a Ring-Based Fully Homomorphic Encryption Scheme. In Martijn Stam, editor, Cryptography and Coding, volume 8308 of Lecture Notes in Computer Science, pages 45–64. Springer Berlin Heidelberg, 2013.

[BLR+15] Dan Boneh, Kevin Lewi, Mariana Raykova, Amit Sahai, Mark Zhandry, and

Joe Zimmerman. Semantically Secure Order-Revealing Encryption: Multi-input

Functional Encryption Without Obfuscation, pages 563–594. Springer Berlin Hei-

delberg, Berlin, Heidelberg, 2015.

[BMP05] Jean-Claude Bajard, Nicolas Meloni, and Thomas Plantard. Efficient RNS

bases for Cryptography. IMACS World Congress: Scientific Computation, Ap-

plied Mathematics and Simulation, 2005.

[Buy09] Rajkumar Buyya. Market-Oriented Cloud Computing: Vision, Hype, and Re-

ality of Delivering Computing As the 5th Utility. In Proceedings of the 2009

9th IEEE/ACM International Symposium on Cluster Computing and the Grid,

CCGRID ’09, pages 1–, Washington, DC, USA, 2009. IEEE Computer Society.

[CCF+67] William T. Cochran, James W. Cooley, David L. Favin, Howard D. Helms, Reginald A. Kaenel, William W. Lang, George C. Maling Jr., David E. Nelson, Charles M. Rader, and Peter D. Welch. What is the fast Fourier transform? IEEE Transactions on Audio and Electroacoustics, 15:45–55, 1967.

[CHKP10] David Cash, Dennis Hofheinz, Eike Kiltz, and Chris Peikert. Advances in Cryptol-

ogy – EUROCRYPT 2010: 29th Annual International Conference on the Theory

and Applications of Cryptographic Techniques, French Riviera, May 30 – June

3, 2010. Proceedings, chapter Bonsai Trees, or How to Delegate a Lattice Basis,

pages 523–552. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.

[CJJ+13] David Cash, Stanislaw Jarecki, Charanjit Jutla, Hugo Krawczyk, Marcel-Cătălin

Roşu, and Michael Steiner. Highly-Scalable Searchable Symmetric Encryption

with Support for Boolean Queries, pages 353–373. Springer Berlin Heidelberg,

Berlin, Heidelberg, 2013.

[CLOZ16] David Cash, Feng-Hao Liu, Adam O’Neill, and Cong Zhang. Reducing the leak-

age in practical order-revealing encryption. Cryptology ePrint Archive, Report

2016/661, 2016. http://eprint.iacr.org/2016/661.

[CLWW16] Nathan Chenette, Kevin Lewi, Stephen A. Weis, and David J. Wu. Practical

order-revealing encryption with limited leakage. In FSE, 2016.

[Coc01] Clifford Cocks. Cryptography and Coding: 8th IMA International Conference

Cirencester, UK, December 17–19, 2001 Proceedings, chapter An Identity Based

Encryption Scheme Based on Quadratic Residues, pages 360–363. Springer Berlin

Heidelberg, Berlin, Heidelberg, 2001.

[CS16] Ana Costache and Nigel P. Smart. Which Ring Based Somewhat Homomorphic

Encryption Scheme is Best?, pages 325–340. Springer International Publishing,

Cham, 2016.

[DDC16] F. Betül Durak, Thomas M. DuBuisson, and David Cash. What else is revealed

by order-revealing encryption? In Proceedings of the 2016 ACM SIGSAC Con-

ference on Computer and Communications Security, CCS ’16, pages 1155–1166,

New York, NY, USA, 2016. ACM.

[DGBL+15] Nathan Dowlin, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael

Naehrig, and John Wernsing. Manual for Using Homomorphic Encryption for

Bioinformatics, 2015.

[DHP+17] Yarkın Doröz, Jeffrey Hoffstein, Jill Pipher, Joseph H. Silverman, Berk Sunar,

William Whyte, and Zhenfei Zhang. Fully homomorphic encryption from the

finite field isomorphism problem. Cryptology ePrint Archive, Report 2017/548,

2017. https://eprint.iacr.org/2017/548.

[DLNW13] Hoang T. Dinh, Chonho Lee, Dusit Niyato, and Ping Wang. A survey of mobile

cloud computing: architecture, applications, and approaches. Wireless Commu-

nications and Mobile Computing, 13(18):1587–1611, 2013.

[DPS96] Cunsheng Ding, Dingyi Pei, and Arto Salomaa. Chinese remainder theorem:

applications in computing, coding, cryptography. World Scientific, 1996.

[DR99] Joan Daemen and Vincent Rijmen. AES Proposal: Rijndael, 1999.

[DS15] Wei Dai and Berk Sunar. cuHE: A Homomorphic Encryption Accelerator Library.

Cryptology ePrint Archive, Report 2015/818, 2015. http://eprint.iacr.org/.

[DSS16] Yfke Dulek, Christian Schaffner, and Florian Speelman. Quantum Homomorphic

Encryption for Polynomial-Sized Circuits, pages 3–32. Springer Berlin Heidel-

berg, Berlin, Heidelberg, 2016.

[Duc17] DuckDuckGo. Privacy Mythbusting #3: Anonymized data is safe, right? (Er,

no.). https://spreadprivacy.com/dataanonymization-e1e2b3105f3c, 2017. Accessed 24 July 2017.

[Dwo06] Cynthia Dwork. Differential privacy. In 33rd International Colloquium on Au-

tomata, Languages and Programming, part II (ICALP 2006), volume 4052, pages

1–12, Venice, Italy, July 2006. Springer Verlag.

[Dwo08] Cynthia Dwork. Differential Privacy: A Survey of Results, pages 1–19. Springer

Berlin Heidelberg, Berlin, Heidelberg, 2008.

[ElG85] Taher ElGamal. A Public Key Cryptosystem and a Signature Scheme Based

on Discrete Logarithms. In George Robert Blakley and David Chaum, editors,

Advances in Cryptology, volume 196 of Lecture Notes in Computer Science, pages

10–18. Springer Berlin Heidelberg, 1985.

[FV12] Junfeng Fan and Frederik Vercauteren. Somewhat practical fully homomorphic

encryption. IACR Cryptology ePrint Archive, 2012:144, 2012. informal publica-

tion.

[GC14] Matthias Geihs and Daniel Cabarcas. Efficient Integer Encoding for Homomor-

phic Encryption via Ring Isomorphisms. In LATINCRYPT, volume 8895 of

Lecture Notes in Computer Science, pages 48–63. Springer, 2014.

[Gem17] Gemalto. Poor internal security practices take a toll, findings from the first half of

2017. http://breachlevelindex.com/assets/Breach-Level-Index-Report-H1-2017-

Gemalto.pdf, USA, 2017.

[Gen06] Craig Gentry. Practical Identity-Based Encryption Without Random Oracles,

pages 445–464. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.

[Gen10] Craig Gentry. Computing Arbitrary Functions of Encrypted Data. Commun.

ACM, 53(3):97–105, March 2010.

[GM13] Glenn Greenwald and Ewen MacAskill. NSA Prism program taps in to user data

of Apple, Google and others, 2013.

[Gol06] Philippe Golle. Revisiting the uniqueness of simple demographics in the us popu-

lation. In Proceedings of the 5th ACM Workshop on Privacy in Electronic Society,

WPES ’06, pages 77–80, New York, NY, USA, 2006. ACM.

[GPV08] Craig Gentry, Chris Peikert, and Vinod Vaikuntanathan. Trapdoors for hard

lattices and new cryptographic constructions. In Proceedings of the Fortieth

Annual ACM Symposium on Theory of Computing, STOC ’08, pages 197–206,

New York, NY, USA, 2008. ACM.

[GRS17] Paul Grubbs, Thomas Ristenpart, and Vitaly Shmatikov. Why your encrypted database is not secure. Cryptology ePrint Archive, Report 2017/468, 2017. http://eprint.iacr.org/2017/468.

[GS02] Craig Gentry and Alice Silverberg. Hierarchical ID-Based Cryptography, pages

548–566. Springer Berlin Heidelberg, Berlin, Heidelberg, 2002.

[GSW13] Craig Gentry, Amit Sahai, and Brent Waters. Homomorphic Encryption from

Learning with Errors: Conceptually-Simpler, Asymptotically-Faster, Attribute-

Based, pages 75–92. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[Har78] Fredric J Harris. On the use of windows for harmonic analysis with the discrete

fourier transform. Proceedings of the IEEE, 66(1):51–83, 1978.

[HMF+08] C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and

J. Good. On the Use of Cloud Computing for Scientific Workflows. In eScience,

2008. eScience ’08. IEEE Fourth International Conference on, pages 640–645,

Dec 2008.

[HPS98] Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman. NTRU: A ring-based

public key cryptosystem. In Joe P. Buhler, editor, Algorithmic Number Theory,

volume 1423 of Lecture Notes in Computer Science, pages 267–288. Springer

Berlin Heidelberg, 1998.

[HPS18] Shai Halevi, Yuriy Polyakov, and Victor Shoup. An improved RNS variant of

the BFV homomorphic encryption scheme. Cryptology ePrint Archive, Report

2018/117, 2018. https://eprint.iacr.org/2018/117.

[JHCS16] Jung Hee Cheon, Andrey Kim, Miran Kim, and Yongsoo Song. Homomorphic

encryption for arithmetic of approximate numbers. Cryptology ePrint Archive,

Report 2016/421, 2016. http://eprint.iacr.org/2016/421.

[jLKS16] Wen-jie Lu, Shohei Kawasaki, and Jun Sakuma. Using fully homomorphic encryption for statistical analysis of categorical, ordinal and numerical data. Cryptology ePrint Archive, Report 2016/1163, 2016. http://eprint.iacr.org/2016/1163.

[JM15] MI Jordan and TM Mitchell. Machine learning: Trends, perspectives, and

prospects. Science, 349(6245):255–260, 2015.

[KM16] Seny Kamara and Tarik Moataz. The clusion library. https://github.com/

encryptedsystems/Clusion, 2016. Last accessed: 01/25/2018.

[KNRS13] Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam

Smith. Analyzing Graphs with Node Differential Privacy, pages 457–476. Springer

Berlin Heidelberg, Berlin, Heidelberg, 2013.

[KS12] Vladimir Kolesnikov and Abdullatif Shikfa. On the limits of privacy provided by

Order-Preserving Encryption. Bell Labs Technical Journal, 2012.

[Kud16] Momonari Kudo. Attacks against search poly-lwe. Cryptology ePrint Archive,

Report 2016/1153, 2016. http://eprint.iacr.org/2016/1153.

[Lam16] Tom Lamont. Life after the Ashley Madison affair, 2016. Last accessed:

01/13/2018.

[LATV13a] Adriana Lopez-Alt, Eran Tromer, and Vinod Vaikuntanathan. On-the-Fly Mul-

tiparty Computation on the Cloud via Multikey Fully Homomorphic Encryption.

Cryptology ePrint Archive, Report 2013/094, 2013. http://eprint.iacr.org/.

[LATV13b] Adriana Lopez-Alt, Eran Tromer, and Vinod Vaikuntanathan. On-the-Fly Mul-

tiparty Computation on the Cloud via Multikey Fully Homomorphic Encryption.

Cryptology ePrint Archive, Report 2013/094, 2013.

[LMSV12] Jake Loftus, Alexander May, Nigel P. Smart, and Frederik Vercauteren. On

CCA-Secure Somewhat Homomorphic Encryption. In Proceedings of the 18th

International Conference on Selected Areas in Cryptography, SAC’11, pages 55–

72, Berlin, Heidelberg, 2012. Springer-Verlag.

[LN14] Tancrède Lepoint and Michael Naehrig. A Comparison of the Homomorphic Encryption Schemes FV and YASHE. In David Pointcheval and Damien Vergnaud, editors, Progress in Cryptology – AFRICACRYPT 2014, volume 8469 of Lecture Notes in Computer Science, pages 318–335. Springer International Publishing, 2014.

[LN16] Patrick Longa and Michael Naehrig. Speeding up the Number Theoretic Trans-

form for Faster Ideal Lattice-Based Cryptography, pages 124–139. Springer In-

ternational Publishing, Cham, 2016.

[LP11] Richard Lindner and Chris Peikert. Topics in Cryptology – CT-RSA 2011: The

Cryptographers’ Track at the RSA Conference 2011, San Francisco, CA, USA,

February 14-18, 2011. Proceedings, chapter Better Key Sizes (and Attacks) for

LWE-Based Encryption, pages 319–339. Springer Berlin Heidelberg, Berlin, Hei-

delberg, 2011.

[LPR10] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. Advances in Cryptology

– EUROCRYPT 2010: 29th Annual International Conference on the Theory

and Applications of Cryptographic Techniques, French Riviera, May 30 – June

3, 2010. Proceedings, chapter On Ideal Lattices and Learning with Errors over

Rings, pages 1–23. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.

[LW16] Kevin Lewi and David J. Wu. Order-revealing encryption: New constructions,

applications, and lower bounds. Cryptology ePrint Archive, Report 2016/612,

2016.

[MRAA18] Pedro G. M. R. Alves and Diego F. Aranha. A framework for searching encrypted

databases. Journal of Internet Services and Applications, 9(1):1, Jan 2018.

[NF14] Arvind Narayanan and Edward W Felten. No silver bullet: De-identification still

doesn’t work. White Paper, 2014.

[NKW15] Muhammad Naveed, Seny Kamara, and Charles V. Wright. Inference attacks on property-preserving encrypted databases. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pages 644–655, New York, NY, USA, 2015. ACM.

[PA16] Hilder V. L. Pereira and Diego F. Aranha. Principal component analysis over

encrypted data using homomorphic encryption. Workshop HEAT Homomorphic

Encryption Applications and Technology, 2016. https://wheat2016.lip6.fr.

[PA17] Hilder V. L. Pereira and Diego F. Aranha. Non-interactive privacy-preserving

k-NN classifier. In 3rd International Conference on Information Systems Security

and Privacy (ICISSP 2017), 2017.

[Pai99] Pascal Paillier. Public-Key Cryptosystems Based on Composite Degree Residu-

osity Classes. In Jacques Stern, editor, Advances in Cryptology — EUROCRYPT

’99, volume 1592 of Lecture Notes in Computer Science, pages 223–238. Springer

Berlin Heidelberg, 1999.

[PBC+16a] Antonis Papadimitriou, Ranjita Bhagwan, Nishanth Chandran, Ramachandran

Ramjee, Andreas Haeberlen, Harmeet Singh, Abhishek Modi, and Saikrishna

Badrinarayanan. Big data analytics over encrypted datasets with seabed. In

Proceedings of the 12th USENIX Conference on Operating Systems Design and

Implementation, OSDI’16, pages 587–602, Berkeley, CA, USA, 2016. USENIX

Association.

[PBC+16b] Antonis Papadimitriou, Ranjita Bhagwan, Nishanth Chandran, Ramachandran

Ramjee, Andreas Haeberlen, Harmeet Singh, Abhishek Modi, and Saikrishna

Badrinarayanan. Big data analytics over encrypted datasets with seabed. In

Proceedings of the 12th USENIX Conference on Operating Systems Design and

Implementation, OSDI’16, pages 587–602, Berkeley, CA, USA, 2016. USENIX

Association.

[PBP16] Rishabh Poddar, Tobias Boelter, and Raluca Ada Popa. Arx: A strongly en-

crypted database system. Cryptology ePrint Archive, Report 2016/591, 2016.

[PLL16] Jong Hwan Park, Kwangsu Lee, and Dong Hoon Lee. Efficient identity-based

encryption and public-key signature from trapdoor subgroups. Cryptology ePrint

Archive, Report 2016/500, 2016.

[PNPM15] Thomas Pöppelmann, Michael Naehrig, Andrew Putnam, and Adrian Macias.

Accelerating homomorphic evaluation on reconfigurable hardware. In Lecture

Notes in Computer Science (including subseries Lecture Notes in Artificial In-

telligence and Lecture Notes in Bioinformatics), volume 9293, pages 143–163,

2015.

[PRZB11] Raluca Ada Popa, Catherine M. S. Redfield, Nickolai Zeldovich, and Hari

Balakrishnan. CryptDB: Protecting confidentiality with encrypted query processing. In

Proceedings of the Twenty-Third ACM Symposium on Operating Systems

Principles, SOSP ’11, pages 85–100, New York, NY, USA, 2011. ACM.

[RAD78] R. L. Rivest, L. Adleman, and M. L. Dertouzos. On Data Banks and Privacy

Homomorphisms. Foundations of Secure Computation, Academic Press, pages

169–179, 1978.

[Reg05] Oded Regev. On lattices, learning with errors, random linear codes, and

cryptography. In Proceedings of the Thirty-seventh Annual ACM Symposium on Theory

of Computing, STOC ’05, pages 84–93, New York, NY, USA, 2005. ACM.

[Rog15] Phillip Rogaway. The moral character of cryptographic work. IACR Cryptology

ePrint Archive, page 1162, 2015.

[Sch16] Bruce Schneier. Data is a toxic asset, 2016.

[SS13] Damien Stehlé and Ron Steinfeld. Making NTRUEncrypt and NTRUSign as

Secure as Standard Worst-Case Problems over Ideal Lattices. Cryptology ePrint

Archive, Report 2013/004, 2013. http://eprint.iacr.org/.

[SS15a] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In

Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications

Security, CCS ’15, pages 1310–1321, New York, NY, USA, 2015. ACM.

[SS15b] Abdul Ghaffar Shoro and Tariq Rahim Soomro. Big data analysis: Apache Spark

perspective. Global Journal of Computer Science and Technology, 15(1), 2015.

[Swe00] Latanya Sweeney. Simple Demographics Often Identify People Uniquely. 2000.

[SWPP00] Dawn Xiaodong Song, David Wagner, and Adrian Perrig. Practical

techniques for searches on encrypted data. In Proceedings of the 2000 IEEE Symposium on

Security and Privacy (S&P 2000), pages 44–55, 2000.

[Tho15] Simon Thomsen. Extramarital affair website Ashley Madison has been hacked

and attackers are threatening to leak data online, 2015. Last accessed:

05/25/2016.

[TP91] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In

Proceedings. 1991 IEEE Computer Society Conference on Computer Vision and Pattern

Recognition, pages 586–591, Jun 1991.

[VPB09] C. Vecchiola, S. Pandey, and R. Buyya. High-Performance Cloud Computing: A

View of Scientific Applications. In Pervasive Systems, Algorithms, and Networks

(ISPAN), 2009 10th International Symposium on, pages 4–16, Dec 2009.

[Web14] Harrison Weber. How the NSA & FBI made Facebook the perfect mass

surveillance tool, 2014. Venture Beat. Published on 05/15/2014.

[XX13] Zhifeng Xiao and Yang Xiao. Security and Privacy in Cloud Computing. IEEE

Communications Surveys & Tutorials, 15(2):843–859, Second Quarter 2013.
