Institute of Computing University of Campinas
Specific Qualifying Examination (Exame de Qualificação Específico)
Cryptographic Engineering of Privacy-preserving Algorithms
Pedro Geraldo M. R. Alves
Supervised by
Prof. Dr. Diego de Freitas Aranha
November 2019

Abstract
As extensive data gathering and outsourced computing become more common every day, security requirements must be tightened. The growing value of user information transforms datasets into targets for malicious entities, which may exploit different kinds of system vulnerabilities to acquire them. Moreover, since much of today's computation takes place in the cloud, data also becomes vulnerable to malicious system administrators and third parties with access to the infrastructure itself.
In this work we intend to explore different solutions for privacy-preserving data management and computing. Our focus is on the development of strategies for efficient implementation and applicability in the field. As starting points, we shall continue previous efforts on the acceleration of functional encryption schemes (such as homomorphic encryption schemes) using GPGPU parallelism, and apply them to human face recognition in the cloud, an otherwise notoriously privacy-intrusive computing task. Furthermore, effort should be expended on the development of a searchable encryption framework capable of maintaining the usability of the dataset while keeping it secret. As a secondary objective, we target non-cryptographic solutions, such as differential privacy.

Contents
1 Introduction
2 Mathematical methods for the efficient implementation of polynomial-based arithmetic
2.1 Residue number system
2.2 Polynomial multiplication
2.3 Discrete Fourier Transform
2.4 Fast Fourier Transform
2.5 Number-Theoretic Transform
3 Related work
3.1 Functional encryption
3.2 Searchable encryption
3.3 Differential Privacy
4 Research proposal
4.1 Performance improvement of functional cryptosystems
4.2 Privacy-preserving framework for human face recognition
5 Method and schedule
5.1 Methodology
5.2 Technical team
5.3 Schedule
6 Previous work
7 Conclusion
1 Introduction
In an era in which data gathering for advertising and behavior forecasting is increasingly valued, information is gold. Thus, protecting data from breaches is as important as being able to process it and extract useful information from it. Accomplishing this task means looking after both the business itself and its users.
From mobile to scientific computing, the industry increasingly embraces cloud services and takes advantage of their potential to improve availability and reduce operational costs [HMF+08, DLNW13]. The possibility of outsourcing the installation, maintenance and scalability of servers, combined with competitive prices, makes this service highly attractive [Buy09, VPB09]. However, the cloud cannot be blindly trusted. Malicious parties may acquire full access to the servers and consequently to sensitive data. The threats include external entities exploiting vulnerabilities, intrusive governments requesting information, competitors seeking unfair advantages, and even possibly malicious system administrators. The data owner has no real control over the processing hardware and therefore cannot guarantee the secrecy of data [XX13]. The risk of confidentiality breaches caused by inadequate and insecure use of cloud computing is real and tangible.
How to efficiently collect and compute on user data without undermining user privacy is an open problem. As remarked by Narayanan and Felten, “data privacy is a hard problem” [NF14]. Even when data holders choose the most conservative practice and never share data, system breaches may happen.
The Breach Level Index provides distressing statistics about data leakage since 2013. According to its latest report, almost 2 billion records were breached in the first half of 2017 alone, almost 7 thousand per minute. Of these, only 4.6% were encrypted. That is, most of them were in plaintext, including emails, vehicle registration details, payment information, and medical records [Gem17].
Most of those breaches occurred through accidental loss or data inadvertently left exposed. However, malicious parties are far from negligible. Examples include governmental surveillance programs such as PRISM, which forced cloud companies to provide user data [GM13, Web14]; the Ashley Madison case, in which the personal data of millions of users was leaked, revealing sensitive information, with consequences that included suicides [Tho15, Lam16]; and the Yahoo data breach, possibly the largest known, which affected about 3 billion accounts and revealed unencrypted email addresses, telephone numbers, and security questions and answers [BBC16].
These occurrences leave us with the disturbing feeling that, despite all efforts, the risk of data deanonymization grows worryingly with the amount of data made available [Swe00, Gol06]. Hence, a seemingly obvious strategy to avoid such issues is to simply stop any kind of data collection.
History has shown that collecting and storing data from third parties should be treated as risky. The chance of compromising user privacy by accident is too high, possibly with extreme consequences. For this reason, the concept of security by renouncing knowledge has attracted adherents, such as the search engine DuckDuckGo, which states in a blog post that “the only truly anonymised data is no data” and therefore claims to forgo the right to store its users’ data [Duc17, Sch16].
A more financially realistic alternative is not to give up knowledge completely, but to reduce the number of entities with full access to data. This can be done by keeping data encrypted during its whole lifespan (transportation, storage, and processing), so that it stays secret from the application and the cloud, and by hiding it within a large set of similar data from several users. Thus, a new security fence is set, tying data secrecy to formal guarantees.
This document is structured as follows. We first briefly review common directions present in the literature for privacy-preserving computing in Section 3. Section 4 consolidates and describes the topics we shall pursue during the research. Section 5.1 discusses the methodology. Section 5.3 presents the expected schedule for this work and, lastly, Section 6 presents some of the most important results obtained in the student’s previous work.
2 Mathematical methods for the efficient implementation of polynomial-based arithmetic
As discussed in Section 3.1, current SHE and LHE cryptosystems suffer from performance issues due to their roots in polynomial arithmetic and the need for large parameters to achieve proper security levels. Therefore, an efficient formulation of the arithmetic, together with convenient mathematical methods, is essential to obtain a high-performance implementation on the target platform. Below we describe some of the main tools we have been working on to boost the performance of such schemes.
2.1 Residue number system
Multiprecision integer arithmetic can be computationally expensive because there is no hardware support for it, forcing the task to be executed in software. The Residue Number System (RNS) applies the Chinese Remainder Theorem (CRT) to map large integers to a set of arbitrarily smaller ones, called residues [DPS96]. This allows the polynomial coefficients to be manipulated through simpler, natively supported arithmetic.
This work applied RNS as a tool to build the intended faster arithmetic. Definitions 1 and 2 provide the formulations of the direct and inverse CRT.
Definition 1 (Direct CRT function). Let $x$ be a polynomial in $R_q$ and $\{p_0, p_1, \ldots, p_{\ell-1}\}$ a set of $\ell$ primes. We have
$$\mathrm{CRT}(x) = \{x \bmod p_0,\ x \bmod p_1,\ \ldots,\ x \bmod p_{\ell-1}\}.$$

Definition 2 (Inverse CRT function). Let $x$ be a set of $\ell$ residues, $\{p_0, p_1, \ldots, p_{\ell-1}\}$ the corresponding set of primes and $M = \prod_{i=0}^{\ell-1} p_i$. Then,
$$\mathrm{ICRT}(x) = \sum_{i=0}^{\ell-1} \frac{M}{p_i} \cdot \left[ \left( \frac{M}{p_i} \right)^{-1} x_i \bmod p_i \right] \bmod M.$$
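Definitions 1 and 2 can be illustrated with a minimal Python sketch operating on a single integer coefficient. The function names `crt` and `icrt`, the tiny primes and the test value are assumptions chosen for readability; they are not the parameters or the code used in this work, where the same maps would be applied to every coefficient of a polynomial.

```python
from math import prod

def crt(x, primes):
    """Direct CRT (Definition 1): residues of x modulo each prime."""
    return [x % p for p in primes]

def icrt(residues, primes):
    """Inverse CRT (Definition 2): recover x mod M, with M the product of the primes."""
    M = prod(primes)
    total = 0
    for x_i, p_i in zip(residues, primes):
        M_i = M // p_i                # M / p_i
        inv = pow(M_i, -1, p_i)       # (M / p_i)^(-1) mod p_i (needs Python 3.8+)
        total += M_i * ((inv * x_i) % p_i)
    return total % M

# Illustrative parameters only: M = 13 * 17 * 19 = 4199.
primes = [13, 17, 19]
x = 1234
residues = crt(x, primes)
assert icrt(residues, primes) == x    # round trip holds because x < M
```

The round trip only recovers the original value when it is smaller than the product of the primes.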
To guarantee the correctness of these functions, the product of all primes, $M$, must be larger than the largest possible coefficient of a polynomial represented in the CRT domain, even after an addition or multiplication operation. Thus, we need
$$M = \prod_{i=0}^{\ell-1} p_i > n \cdot q^2,$$
for polynomials of degree up to $n$ and coefficients lying in $\mathbb{Z}_q$. Addition and multiplication in the CRT domain work by applying the operation coefficient-wise, as described in Definition 3.
Definition 3 (Addition/multiplication in the CRT domain). Let $x, y$ be integers in $\mathbb{Z}_q$, $X = \mathrm{CRT}(x)$ and $Y = \mathrm{CRT}(y)$ their corresponding residues computed with $\ell$ primes, and $\odot$ an operation, either addition or multiplication. Then,
$$X \odot Y = \{(X_0 \odot Y_0), (X_1 \odot Y_1), \ldots, (X_{\ell-1} \odot Y_{\ell-1})\}.$$
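As a hedged illustration of Definition 3, the sketch below adds and multiplies two integers entirely in the CRT domain and checks the result against ordinary integer arithmetic. The helper names (`rns_add`, `rns_mul`) and the small primes are hypothetical, and reducing each component modulo its prime is an assumption made explicit here, since the definition leaves it implicit.

```python
from math import prod

def crt(x, primes):
    """Direct CRT: residues of x modulo each prime."""
    return [x % p for p in primes]

def icrt(residues, primes):
    """Inverse CRT: recover x mod M, with M the product of the primes."""
    M = prod(primes)
    return sum((M // p) * ((pow(M // p, -1, p) * r) % p)
               for r, p in zip(residues, primes)) % M

def rns_add(X, Y, primes):
    """Residue-wise addition, each component reduced modulo its own prime."""
    return [(a + b) % p for a, b, p in zip(X, Y, primes)]

def rns_mul(X, Y, primes):
    """Residue-wise multiplication, each component reduced modulo its own prime."""
    return [(a * b) % p for a, b, p in zip(X, Y, primes)]

# Illustrative parameters only: M = 4199, and x * y = 3015 stays below M.
primes = [13, 17, 19]
x, y = 45, 67
X, Y = crt(x, primes), crt(y, primes)
assert icrt(rns_add(X, Y, primes), primes) == x + y
assert icrt(rns_mul(X, Y, primes), primes) == x * y
```

Each component only involves values below its prime, so the arithmetic fits in native machine words whenever the primes do, and the components are independent of each other.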
2.2 Polynomial multiplication
Let $p$ and $q$ be two polynomials such that
$$p(x) = a_0 + a_1 x + \ldots + a_{N-1} x^{N-1},$$
$$q(x) = b_0 + b_1 x + \ldots + b_{N-1} x^{N-1}.$$
Their product is defined as
$$p(x) \cdot q(x) = c(x) = c_0 + c_1 x + \ldots + c_{2N-2} x^{2N-2}, \quad (1)$$
where
$$c_i = \sum_{\max\{0,\, i-(N-1)\} \le k \le \min\{i,\, N-1\}} a_k b_{i-k}. \quad (2)$$
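Equations (1) and (2) translate directly into the following Python sketch, given here only as an illustration. The function name `poly_mul` and the representation of a polynomial as a list of $N$ coefficients in increasing degree order are assumptions.

```python
def poly_mul(a, b):
    """Direct product of two polynomials with N coefficients each, following Eq. (2)."""
    N = len(a)
    if len(b) != N:
        raise ValueError("both operands must have N coefficients")
    c = [0] * (2 * N - 1)                  # coefficients c_0 .. c_{2N-2}
    for i in range(2 * N - 1):
        lo = max(0, i - (N - 1))           # lower summation bound in Eq. (2)
        hi = min(i, N - 1)                 # upper summation bound in Eq. (2)
        for k in range(lo, hi + 1):
            c[i] += a[k] * b[i - k]
    return c

# (1 + 2x + 3x^2) * (4 + 5x + 6x^2) = 4 + 13x + 28x^2 + 27x^3 + 18x^4
assert poly_mul([1, 2, 3], [4, 5, 6]) == [4, 13, 28, 27, 18]
```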
If these expressions are computed directly, the number of operations needed to obtain