
Privacy‑preserving user profile matching in social networks

Yi, Xun; Bertino, Elisa; Rao, Fang‑Yu; Lam, Kwok‑Yan; Nepal, Surya; Bouguettaya, Athman

2019

Yi, X., Bertino, E., Rao, F.‑Y., Lam, K.‑Y., Nepal, S., & Bouguettaya, A. (2019). Privacy‑preserving user profile matching in social networks. IEEE Transactions on Knowledge and Data Engineering, 32(8), 1572‑1585. doi:10.1109/TKDE.2019.2912748 https://hdl.handle.net/10356/143529 https://doi.org/10.1109/TKDE.2019.2912748

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TKDE.2019.2912748.

Privacy-Preserving User Profile Matching in Social Networks

Xun Yi, Elisa Bertino, Fang-Yu Rao, Kwok-Yan Lam, Surya Nepal and Athman Bouguettaya

Abstract—In this paper, we consider a scenario where a user queries a user profile database, maintained by a social networking service provider, to identify users whose profiles are similar to the profile specified by the querying user. A typical example of this application is online dating. Recently, an online dating site, Ashley Madison, was hacked, which resulted in the disclosure of a large number of dating user profiles. This data breach has urged researchers to explore practical privacy protection for user profiles in a social network. In this paper, we propose a privacy-preserving solution for profile matching in social networks by using multiple servers. Our solution is built on homomorphic encryption and allows a user to find out matching users with the help of multiple servers without revealing to anyone the query and the queried user profiles in the clear. Our solution achieves user profile privacy and user query privacy as long as at least one of the multiple servers is honest. Our experiments demonstrate that our solution is practical.

Keywords—User profile matching, data privacy protection, ElGamal encryption, Paillier encryption, homomorphic encryption

X. Yi is with the School of Computer Science and Software Engineering, RMIT University, Melbourne, VIC 3001, Australia. E. Bertino and F. Y. Rao are with the Department of Computer Science and Cyber Center, Purdue University, West Lafayette, IN 47907. K. Y. Lam is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798. S. Nepal is with the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Armidale, NSW 2350, Australia. A. Bouguettaya is with the School of Information Technologies, The University of Sydney, NSW 2006, Australia. Manuscript received 2 August 2017.

1 INTRODUCTION

Matching two or more users with related interests is an important and general problem, applicable to a wide range of scenarios including job hunting, friend finding, and dating services. Existing on-line matching services require participants to trust a third party server with their preferences. The matching server thus has full knowledge of the users' preferences, which raises privacy issues, as the server may leak (either intentionally or accidentally) users' profiles.

When signing up for an online matching service, a user creates a "profile" that others can browse. The user may be asked to reveal age, sex, education, profession, number of children, religion, geographic location, sexual proclivities, drinking behavior, hobbies, income, ethnicity, drug use, home and work addresses, and favorite places. Even after an account is canceled, most online matching sites may retain such information.

Users' personal information may be re-disclosed not only to prospective matches, but also to advertisers and, ultimately, to data aggregators who use the data for purposes unrelated to online matching and without customer consent. In addition, there are risks such as scammers, sexual predators, and reputational damage that come along with using online matching services.

Many online matching sites take shortcuts with respect to safeguarding the privacy and security of their customers. Often, they use counterintuitive "privacy" settings, and their data management systems have serious security flaws.

In July 2015, "The Impact Team" group stole user data from Ashley Madison, a commercial website billed as enabling extramarital affairs. The group then threatened to release users' names and personally identifying information if Ashley Madison was not immediately shut down. On 18 and 20 August 2015, the group leaked more than 25 gigabytes of company data, including user details. Because of the site's policy of not deleting users' personal information, including real names, home addresses, search history and credit card transaction records, many users feared being publicly shamed. On 24 August 2015, Toronto police announced that two unconfirmed suicides had been linked to that data breach.

Such a data breach has raised growing concerns amongst users on the dangers of giving out too much personal information. Users of these services also need to be aware of data theft. A main challenge is thus how to protect the privacy of user profiles in social networks. So far, the best solution is through encryption, i.e., users encrypt their profiles before uploading them onto social networks. However, when user profiles are encrypted, it is challenging to match the users with similar profiles.

In this paper, we consider a scenario where a user queries a user profile database, maintained by a social networking service provider, to find out the users whose profiles are similar to the profile specified by the querying user. A typical example of this application is online dating. We give a privacy-preserving solution for user profile matching in social networks by using multiple servers.

Our basic idea can be summarized as follows. Before uploading his/her profile to a social network, each user encrypts the profile by a homomorphic encryption scheme with a common encryption key. Therefore, even if the user profile database falls into the hands of a hacker, the hacker can only get the encrypted data. When a user wishes to find people

in the social network, the user encrypts his/her preferred user profile and a dissimilarity threshold and submits the query to the social networking service provider. Based on the query, multiple servers, which secretly share the decryption key, compare the preferred user profile with each record in the database. If the dissimilarity is less than the threshold, the matching user's contact information is returned to the querying user.

Our main contributions include:

1) We formally define the user profile matching model, the user profile privacy and the user query privacy.
2) We give a solution for privacy-preserving user profile matching for a single dissimilarity threshold and then extend it for multiple dissimilarity thresholds.
3) We perform security analysis on our protocols. If at least one of the multiple servers is honest, our protocols achieve user profile privacy and user query privacy.
4) We conduct extensive experiments on a real dataset to evaluate the performance of our proposed protocols under different parameter settings. Experiments show that our solutions are practical and efficient.

This paper extends our previous work [32] as follows.

1) In our previous work, we use a variant ElGamal encryption scheme [33] as the underlying homomorphic encryption scheme, which assumes the two prime factors of the modulus are public parameters. Rao [22] has found a security flaw in the encryption scheme, that is, an attacker may decrypt the ciphertexts without the decryption key. In this paper, we fix the security flaw by keeping the factorization of the modulus secret.
2) In our previous work, the user profile data is shared by all matching servers and therefore each matching server is required to maintain a user profile database. In this paper, we keep the user profile data in the social service provider only and therefore each matching server does not need to maintain any user profile database.
3) In our previous work, a query user can specify only one dissimilarity threshold for user matching. In this paper, we allow the query user to specify multiple dissimilarity thresholds for user matching.
4) In our previous work, user profile matching is based on numerical attributes only. In this paper, we extend our solution to user profile matching with categorical attributes.
5) In this paper, we enhance the security and performance of the private sharing and multiplication algorithms in our previous work. In addition, we provide two new collaborative algorithms for encryption and decryption.

The rest of the paper is organized as follows: Section 2 surveys related work. Section 3 introduces preliminaries. Section 4 describes our model for privacy-preserving user profile matching. Sections 5 and 6 present our solutions. Sections 7 and 8 present the security and performance analysis. The last section concludes the paper.

2 RELATED WORK

In 2003, Agrawal et al. [1] gave a solution for the private two-party set-intersection problem, where if one party P1 inputs X = {x1, x2, ···, xn} and the other party P2 inputs Y = {y1, y2, ···, yn}, one party learns X ∩ Y and nothing else and the other party learns nothing. Their solution is based on commutative encryption with the property Ek1(Ek2(x)) = Ek2(Ek1(x)), where k1 and k2 are known to P1 and P2, respectively. The idea is: P1 (P2) encrypts its inputs X (Y) with its key k1 (k2), denoted as Ek1(X) (Ek2(Y)), and they exchange them. Next, P1 sends a pair (Ek2(Y), Ek1(Ek2(Y))) to P2, which computes Ek2(Ek1(X)), compares it with (Ek2(Y), Ek1(Ek2(Y))), and decrypts Ek2(y) if Ek2(Ek1(x)) = Ek1(Ek2(y)). Then Vaidya et al. [28] extended such a solution to the n-party setting. Arb et al. [2] applied this idea to detect friend-of-friend in MSN.

In 2004, Freedman et al. [9] gave a solution for the private two-party set-intersection problem based on polynomial evaluation. The idea is: P1 defines a polynomial P(y) = (x1 − y)(x2 − y)···(xn − y) and sends to P2 homomorphic encryptions of the coefficients of this polynomial. P2 uses the homomorphic properties of the encryption system to evaluate the polynomial at each of his inputs, multiplies each result by a fresh random number r to get an intermediate result, and adds to it an encryption of the value of his input, i.e., P2 computes E(rP(y) + y). Therefore, for each of the elements in the intersection of the two parties' inputs, the result of this computation is the value of the corresponding element, whereas for all other values the result is random. In 2005, Kissner and Song [19] gave an improved solution that enables set-intersection, cardinality set-intersection, and over-threshold set-union operations on multisets. Later, Sang et al. [23], Ye et al. [31] and Dachman-Soled et al. [7] further improved and extended these solutions. In 2007, Li and Wu [17] proposed an unconditionally secure protocol for multi-party set intersection. Their idea is similar to Kissner and Song [19], while the inputs are shared among all parties using secret sharing [25], and computations are executed on those shares. In 2010, Narayanan et al. [20] proposed an improved solution. All these solutions are based on polynomial evaluation.

In 2008, Hazay and Lindell [13] proposed a solution for the private two-party set-intersection problem, where one party P1 with a set X inputs the key k to a pseudorandom function F and another party P2 with a set Y inputs the elements of its set. At the end, P2 holds the set {Fk(y) : y ∈ Y} while P1 has learned nothing. Then, P1 just needs to locally compute the set {Fk(x) : x ∈ X} and send it to P2. By comparing which elements appear in both sets, P2 can learn the intersection (but nothing more). Their solution needs to compute a large number of exponentiations. In 2009, Jarecki and Liu [16] proposed a solution to improve the efficiency. In 2010, Cristofaro and Tsudik [6] proposed a solution based on blind RSA signatures [5]. The server computes (H(xj))^d for the client obliviously, where the xj's are the client's inputs and d is the server's one-time RSA signing key. Their solution is more efficient than previous solutions. All these solutions are based on pseudorandom functions.
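As a rough illustration of the pseudorandom-function approach just described, the sketch below uses HMAC-SHA256 as a stand-in for Fk and simply compares the PRF outputs of the two sets; the oblivious evaluation step that hides the key k from P2 in [13] is omitted, so this shows only the comparison logic, not the full protocol.

```python
import hmac
import hashlib

def prf(key: bytes, element: str) -> bytes:
    # Stand-in pseudorandom function F_k(.); in the real protocol P2 obtains
    # F_k(y) for its elements obliviously, without learning the key k.
    return hmac.new(key, element.encode("utf-8"), hashlib.sha256).digest()

def intersect(p1_set, p2_set, key: bytes):
    # P1 locally computes {F_k(x) : x in X} and sends it; P2, holding
    # {F_k(y) : y in Y}, compares the tags to learn the intersection only.
    p1_tags = {prf(key, x) for x in p1_set}
    return {y for y in p2_set if prf(key, y) in p1_tags}

if __name__ == "__main__":
    k = b"prf key held by P1"
    print(intersect({"alice", "bob", "carol"}, {"bob", "dave"}, k))  # {'bob'}
```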

In 2010, Yang et al. [29] proposed the E-SmallTalker scheme for matching people's interests before initiating a small-talk. E-SmallTalker considers n potential communication users U1, U2, ···, Un, each having a set of interests Ai = {a_{i,1}, a_{i,2}, ···, a_{i,ni}}. It allows each user Ui to discover the identical interests in both Ai and Aj (1 ≤ i, j ≤ n, i ≠ j) and the corresponding user Uj if there are identical interests between Ai and Aj. E-SmallTalker is based on the Bloom filter [3]. An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions with a uniformly random distribution. To add an element, one feeds it to each of the k hash functions to get k array positions and sets the bits at all these positions to 1. To query for an element (test whether it is in the set), one feeds it to each of the k hash functions to get k array positions. If any of the bits at these positions is 0, the element is definitely not in the set; if it were, then all the bits would have been set to 1 when it was inserted. If all are 1, the element is in the set with certain probability. In 2011, Li et al. [18] proposed the FindU scheme for profile matching in mobile social networks. FindU is based on private set-intersection techniques. However, FindU is more efficient because it uses secure multi-party computation based on polynomial secret sharing [25].

In 2011, Huang et al. [15] proposed a privacy-preserving solution for biometric matching, where the server holds a database {(vi, pi)}_{i=1}^{n} (vi denotes the biometric data corresponding to some identity profile pi), and the client who holds a biometric reading v' wants to learn the identity pj for which vj is the closest match to v' with respect to some metric (e.g., Euclidean distance), assuming this match is within some distance threshold δ. The idea is to use Yao's garbled circuit technique [30] to perform the matching. Each gate of a circuit is associated with four ciphertexts (a garbled table) by one party. The collection of garbled tables (the garbled circuit) is sent to another party, who uses information obtained by oblivious transfer to learn the output of the function on the parties' inputs. This work is most related to our work. The major difference is that we assume the database is encrypted while they assume the database to be in cleartext.

In 2012, Shahandashti et al. [24] proposed a private fingerprint matching protocol that compares two fingerprints. The protocol enables two parties, each holding a private fingerprint, to find out if their fingerprints belong to the same individual. The main building block of their construction is the aided computation. Consider a polynomial P(x) = Σ_k a_k x^k and a homomorphic encryption algorithm E for which Alice knows the decryption key. It is known that given {E(a_k)}_{k=1}^{n} and x, the homomorphic property of E enables Bob to compute E(P(x)). Aided computation, on the other hand, enables Bob to compute E(P(x)) given {a_k}_{k=1}^{n} and E(x). A simplified description of their protocol is as follows. Let {p_i}_{i=1}^{n} and {p'_j}_{j=1}^{m} denote Alice's and Bob's fingerprint minutiae, respectively. They use the properties of homomorphic encryption schemes to privately calculate and compare the Euclidean distance and angular difference for each pair of minutiae (p_i, p'_j) against given thresholds. Their protocol enables Bob to calculate E(z_ij) for each pair (p_i, p'_j), such that z_ij is zero if the two minutiae match and non-zero otherwise. Then, using the aided computation idea, Bob calculates E(R(z_ij)), where R is a polynomial that maps zero to one and non-zero values to zero. Let σ = Σ R(z_ij), where σ equals the total number of minutia matchings. Using the homomorphic property of the encryption scheme, E(σ) can be calculated by Bob. Finally, the homomorphic property can be used to finalize the protocol and let Alice find out if σ is greater than or equal to a threshold τ or not. This work is also related to our work. The difference is that we assume the stored user data is encrypted while they assume two parties, each holding a private fingerprint in cleartext.

Most recently, Sun et al. [27] proposed privacy-preserving distance-aware encoding mechanisms to compare numerical values in the anonymous space. This work is also related to our work. The difference is that we encrypt the original data while they embed the original data into an anonymous space.

Our work is also closely related to homomorphic encryption, a form of encryption that allows computation on ciphertexts, generating an encrypted result which, when decrypted, matches the result of the operations as if they had been performed on the plaintext. The purpose of homomorphic encryption is to allow computation on encrypted data. Typical homomorphic encryption schemes include the ElGamal [8], Goldwasser-Micali [11] and Paillier [21] schemes, which support either addition or multiplication on encrypted data, and the Boneh-Goh-Nissim [4] scheme, which supports arbitrary additions and one multiplication (followed by arbitrary additions) on encrypted data. Fully homomorphic encryption schemes, e.g., the Gentry scheme [10], support arbitrary computation on encrypted data, but have not become practical so far.

3 PRELIMINARIES

3.1 ElGamal Encryption

Our protocol is based on the ElGamal encryption scheme, which is introduced in this section. The ElGamal encryption scheme consists of key generation, encryption, and decryption algorithms as follows.

• Key Generation: On input a security parameter k, it publishes a multiplicative cyclic group G of prime order q with a generator g, such that the discrete logarithm problem over the group G is hard. Then it chooses a private key x randomly from Zq* = {1, 2, ···, q − 1} and computes a public key y = g^x. The private key x is kept secret while the public key y can be known to everyone.
• Encryption: On inputs a message m ∈ G and the public key y, it chooses an integer r randomly from Zq* and outputs a ciphertext C = E(m) = (A, B), where A = g^r and B = m · y^r.
• Decryption: On inputs a ciphertext (A, B) and the private key x, it outputs the plaintext m = D(C) = B/A^x.
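A minimal Python sketch of the textbook scheme above, over a tiny hard-coded subgroup so the arithmetic is easy to follow; a real deployment would use a cryptographically large group, e.g., the 1,024-bit/160-bit parameters reported in Section 8.

```python
import random

# Toy parameters: 23 = 2*11 + 1 is a safe prime, so the squares modulo 23 form
# a subgroup G of prime order q = 11.  These values are illustrative only.
P = 23   # modulus
Q = 11   # prime order of the subgroup
G = 4    # generator of the order-11 subgroup (4 = 2^2 mod 23)

def keygen():
    x = random.randrange(1, Q)           # private key x in {1, ..., q-1}
    return x, pow(G, x, P)               # public key y = g^x

def encrypt(m, y):
    # m must be a group element (a power of G modulo P)
    r = random.randrange(1, Q)
    return pow(G, r, P), (m * pow(y, r, P)) % P   # (A, B) = (g^r, m*y^r)

def decrypt(A, B, x):
    return (B * pow(A, Q - x, P)) % P    # B / A^x, inverse via A^(q-x)

if __name__ == "__main__":
    x, y = keygen()
    m = pow(G, 7, P)                     # encode the message as g^7
    A, B = encrypt(m, y)
    assert decrypt(A, B, x) == m
```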

3.2 Conditional Gate

A conditional gate [26] allows n (≥ 2) parties to multiply two encrypted values a and b without disclosing a and b, as long as a is restricted to a two-valued domain. It can be realized by the distributed ElGamal cryptosystem. Assume the same setting as the distributed ElGamal cryptosystem [32]. Let E(g^a), E(g^b) denote ElGamal encryptions, with a ∈ {−1, 1} and b ∈ Zq. The n servers jointly compute W = E(g^{ab}) using the conditional gate as follows (the interested reader is referred to [26] for the detailed description of the protocol).

• Let U_0 = E(g^a) and V_0 = E(g^b).
• For i = 1, 2, ···, n, the server Si takes U_{i−1} and V_{i−1} as input and outputs U_i = RE(U_{i−1}^{r_i}) and V_i = RE(V_{i−1}^{r_i}), where r_i is randomly chosen from {−1, 1} and RE stands for re-encryption.
• The n servers jointly decrypt U_n to obtain g^{a · r_1 r_2 ··· r_n}. Because a, r_i ∈ {−1, 1}, it is easy to know a' = a · r_1 r_2 ··· r_n ∈ {−1, 1}.
• Let W = V_n^{a'}.

The conditional gate correctly computes E(g^{ab}) because

V_n^{a'} = E(g^{b · r_1 r_2 ··· r_n})^{a · r_1 r_2 ··· r_n} = E(g^{ab · (r_1 r_2 ··· r_n)^2}) = E(g^{ab}).

Based on the conditional gate, the n servers can jointly compute E(g^{ab}) given E(g^a) and E(g^b) for a, b ∈ {0, 1} as follows:

E(g^{ab}) = E(g^{((2a−1)b + b) · 2^{−1}}) = (E(g^{(2a−1)b}) · E(g^b))^{2^{−1}}.

Note that 2a − 1 ∈ {−1, 1} when a ∈ {0, 1}, and E(g^{(2a−1)b}) can be computed with the conditional gate as above.

Furthermore, given encrypted bit representations E(x) = (E(g^{x_{m−1}}), ···, E(g^{x_1}), E(g^{x_0})) and E(y) = (E(g^{y_{m−1}}), ···, E(g^{y_1}), E(g^{y_0})) of two numbers x and y, the n servers can jointly compute encrypted bits of x + y, given by E(z) = E(x + y) = (E(g^{c_{m−1}}), E(g^{z_{m−1}}), ···, E(g^{z_1}), E(g^{z_0})), as follows:

E(g^{c_i}) = E(g^{x_i y_i}) · E(g^{x_i c_{i−1}}) · E(g^{y_i c_{i−1}}) · E(g^{−2 x_i y_i c_{i−1}}),
E(g^{z_i}) = E(g^{x_i}) · E(g^{y_i}) · E(g^{c_{i−1}}) · E(g^{−2 c_i}),

for i = 0, 1, ···, m − 1, where c_{−1} = 0. We denote this encrypted addition by E(x + y) = E(x) ⊞ E(y).

Yao's garbled circuit [30] allows two parties, each having a private input bit, to compute the product of their private inputs. However, it is restricted to two parties only, whereas the conditional gate allows more than two parties to compute the product of their private inputs.

4 MODEL FOR PRIVACY-PRESERVING USER PROFILE MATCHING

4.1 Our Model

Our model considers a social networking service environment with users and servers. Our model with one user, one database (DB) server and n matching servers is illustrated in Fig. 1 (the figure shows a user sending a query to, and receiving a response from, the social networking service provider, whose DB server is connected to the n matching servers).

Fig. 1: Our Model for Privacy-Preserving User Profile Matching

In our model, all users store their profiles in the DB server in the service provider. User profile attributes are either sensitive or insensitive. We consider protection of sensitive attributes only. In addition, user profile attributes are either numeric (e.g., income) or categorical (e.g., address). We consider numeric attributes only in this paper except in Section 6.2.

4.2 Dissimilarity Definitions

Definition 1 (Dissimilarity Definition for Single Attribute). For any numerical attribute A, the dissimilarity between two users U and U' with A = a and A = a', respectively, is defined as d_A(U, U') = (a − a')^2, where a, a' are integers.

Definition 2 (Dissimilarity Definition for Multiple Attributes). Assume that A1, A2, ···, Am are m numerical attributes for user profile; the dissimilarity between two users U and U' is defined as d(U, U') = Σ_{i=1}^{m} d_{Ai}(U, U').

For a user who queries the user profile database, different attributes may have different impacts on the dissimilarity. Some attributes may be more important for the dissimilarity than other attributes. We introduce the concept of weight to measure the importance of an attribute to the dissimilarity. The weight for an attribute is an integer greater than or equal to 0. It is specified by a user who queries the database.

Definition 3 (Weighted Dissimilarity Definition for Multiple Attributes). Assume that A1, A2, ···, Am are m numerical attributes for user profile and the weight of attribute Ai is wi. The weighted dissimilarity between two users U and U' is defined as d(U, U') = Σ_{i=1}^{m} wi · d_{Ai}(U, U').

4.3 Security Definitions

To protect the user profile privacy, the ElGamal encryption is used. At first, each matching server Si generates its ElGamal public/private key pair (pki, ski). The common public key of the social networking service is set to PK = Π_{i=1}^{n} pki.

According to the common public key PK, a user encrypts each attribute of his profile along with his contact information and sends the encrypted profile to the DB server, which stores them in a user profile database. The encrypted profile can be decrypted only by the cooperation of the n matching servers.

To find users with similar profiles, a user specifies a profile, encrypts it and submits it along with a dissimilarity threshold to the social networking service provider. The n matching servers cooperate to find the matching users from the user profile database according to the user profile and the dissimilarity threshold and then return the contact information of the matching users with dissimilarity less than the threshold to the querying user.

Considering m attributes A1, A2, ···, Am for user profile, we formally define the above process with a user profile matching protocol, composed of four algorithms as follows.

(1) Profile Generation (PG): Takes as input the common public key PK of the social networking service and the profile a1, a2, ···, am of the user; (the user) outputs an encrypted profile P, denoted as P = PG(PK, a1, a2, ···, am). The encrypted profile P is submitted to the social networking service provider and stored in the user profile database DB.
(2) Query Generation (QG): Takes as input the common public key PK of the social networking service, the profile a1, a2, ···, am specified by the user, the corresponding weights w1, w2, ···, wm, and the dissimilarity threshold δ; (the user) outputs a query Q, denoted as Q = QG(PK, a1, w1, a2, w2, ···, am, wm, δ), which is submitted to the social networking service provider.
(3) Response Generation (RG): Takes as input the query Q, the user profile database DB, and the private keys sk1, sk2, ···, skn; (the n matching servers) output a response R, denoted as R = RG(Q, DB, sk1, sk2, ···, skn). The response R, containing the contact information of the first t matching users in DB, is then returned to the user, where t is either fixed or specified by the user in Q.
(4) Response Retrieval (RR): Takes as input the response R; (the user) outputs the first t matching users in DB.

A user profile matching protocol is correct if the response R lists the contact information of the first t' matching users with dissimilarity less than the threshold δ, where t' = min(t, T) and T denotes the total number of matching users in DB.

Now, we formally define user profile privacy with a game as follows. Given the common public key PK, consider the following game between an adversary A and a challenger C. The game consists of the following steps:

(1) The adversary A chooses two user profiles (a01, a02, ···, a0m) and (a11, a12, ···, a1m) and sends them to the challenger C.
(2) The challenger C chooses a random bit b ∈ {0, 1}, and executes the Profile Generation (PG) to obtain an encrypted profile Pb = PG(PK, ab1, ab2, ···, abm), and then sends Pb back to the adversary A.
(3) The adversary A can experiment with the code of Pb in an arbitrary non-black-box way, and finally outputs a bit b' ∈ {0, 1}.

The adversary wins the game if b' = b and loses otherwise. We define the adversary A's advantage in this game to be Adv_A(k) = |Pr(b' = b) − 1/2|, where k is the security parameter.

Definition 4 (User Profile Privacy Definition). In a user profile matching protocol, the user has profile privacy if for any probabilistic polynomial time (PPT) adversary A, we have that Adv_A(k) is a negligible function, where the probability is taken over the coin-tosses of the challenger and the adversary.

User profile privacy ensures that the attacker cannot determine the profile of the user even if some matching servers are compromised by the attacker.

Next, we formally define user query privacy with a game as follows. Given the common public key PK, consider the following game between an adversary A and a challenger C. The game consists of the following steps:

(1) The adversary A chooses two user profiles (a01, a02, ···, a0m) with corresponding weights (w01, w02, ···, w0m) and a dissimilarity threshold δ0, and (a11, a12, ···, a1m) with corresponding weights (w11, w12, ···, w1m) and another dissimilarity threshold δ1, and sends them to the challenger C.
(2) The challenger C chooses a random bit b ∈ {0, 1}, and executes the Query Generation (QG) to obtain a query Qb = QG(PK, ab1, wb1, ab2, wb2, ···, abm, wbm, δb), and then sends Qb back to the adversary A.
(3) The adversary A can experiment with the code of Qb in an arbitrary non-black-box way, and finally outputs a bit b' ∈ {0, 1}.

The adversary wins the game if b' = b and loses otherwise. We define the adversary A's advantage in this game to be Adv_A(k) = |Pr(b' = b) − 1/2|, where k is the security parameter.

Definition 5 (User Query Privacy Definition). In a user profile matching protocol, the user has user query privacy if for any probabilistic polynomial time (PPT) adversary A, we have that Adv_A(k) is a negligible function, where the probability is taken over the coin-tosses of the challenger and the adversary.

User query privacy ensures that the attacker cannot determine the query of the user even if some matching servers are compromised by the attacker.

5 USER PROFILE MATCHING PROTOCOL

5.1 Initialization

We will use the ElGamal encryption scheme [8] and choose the public parameters on the basis of the Paillier encryption scheme [21].

On input a security parameter k, the system generates two large distinct primes p and q, computes N = pq and g = (1 + N)^p · r^N (mod N^2) for a randomly chosen integer r ∈ {2, 3, ···, N − 1} such that (g − 1)/N is not an integer, then publishes (N, g) and erases p and q from the memory. Note that (N, g) can be privately generated by multiple parties as described in [14] such that no one knows p and q if at least one party can be trusted, or by the Intel Software Guard Extensions (Intel SGX). Anyone can check if (g − 1)/N is an integer.

Based on the public parameters (N, g), each matching server Si chooses a private key SKi randomly from Zρ* = {1, 2, ···, ρ − 1}, where ρ is a large integer less than N, and computes and publishes a public key PKi = g^{SKi} (mod N^2) and a zero-knowledge proof of knowledge of SKi. Let the common public key of the social networking service be PK = Π_{i=1}^{n} PKi (mod N^2).

According to the common public key PK and the public parameters (N, g), each user Ui encrypts his profile (ai1, ai2, ···, aim) as follows:

E(g1^{aij}) = (g^{rij} (mod N^2), g1^{aij} · PK^{rij} (mod N^2)),

where rij is randomly chosen from Zρ* and g1 = N + 1. We assume that |aij| ≪ N.
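A compact Python sketch of this initialization and of the user-side encryption E(g1^{aij}), with tiny illustrative primes standing in for the 512-bit primes used in the experiments; the joint decryption at the end only checks the arithmetic and requires every server's key, mirroring the fact that no proper subset of the matching servers can decrypt a stored profile.

```python
import math
import random

# Toy parameters; the paper uses 512-bit primes p, q generated so that nobody
# learns the factorization.  Here they are tiny and visible for illustration.
p, q = 1009, 1013
N = p * q
N2 = N * N
g1 = N + 1                  # g1 = 1 + N, so g1^a = 1 + a*N (mod N^2) for a < N
rho = 2 ** 16               # stand-in for the large integer rho < N

# Choose g = (1+N)^p * r^N mod N^2 with (g - 1)/N not an integer.
while True:
    r = random.randrange(2, N)
    if math.gcd(r, N) != 1:
        continue
    g = (pow(g1, p, N2) * pow(r, N, N2)) % N2
    if (g - 1) % N != 0:
        break

n_servers = 3
SK = [random.randrange(1, rho) for _ in range(n_servers)]    # one key per server
PK_parts = [pow(g, sk, N2) for sk in SK]                     # published PK_i
PK = math.prod(PK_parts) % N2                                # common public key

def encrypt_attribute(a, pk):
    """User-side encryption E(g1^a) = (g^r, g1^a * PK^r) mod N^2."""
    r = random.randrange(1, rho)
    return pow(g, r, N2), (pow(g1, a, N2) * pow(pk, r, N2)) % N2

def joint_decrypt(A, B, keys):
    """All servers together can strip PK^r; any proper subset cannot."""
    shared = math.prod(pow(A, sk, N2) for sk in keys) % N2
    plain_group = (B * pow(shared, -1, N2)) % N2             # = g1^a mod N^2
    return (plain_group - 1) // N                            # recover a < N

A, B = encrypt_attribute(42, PK)       # e.g., attribute value a_ij = 42
assert joint_decrypt(A, B, SK) == 42
```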

In the same way, the user Ui also encrypts Theorem 1 (Private Sharing Correctness). In Algorithm 1, 2 2 2 Pn (ai1, ai2, ··· , aim). In addition, the user Ui encrypts we have k=1 xk = x. his contact information CIi, e.g., email address or mobile r x r Proof. Assume A = g and B = g1 PK for a random integer phone number, where we assume that the size of the binary r, we have representation of CIi is less than the size of N, i.e., CIi < N. n−1 Then the user Ui submits 2 2 Y xk SKk −1 SKn CIi ai1 ai1 aim aim B (g1 A ) /A E(g1 ), E(g1 ), E(g1 ), ··· , E(g1 ), E(g1 ) k=1 to the social networking service provider, which stores them n Pn−1 in the user profile database DB. x− xk Y = g k=1 PKr/ gSKkr The database DB containing l users’ profiles is a set 1 a2 k=1 CIi aij ij Pn−1 n−1 DB = {E(g1 ), [E(g1 ), E(g1 )]1≤j≤m|1 ≤ i ≤ l}. x− xk P k=1 x− k=1 xk Since the user profiles are encrypted by the common public = g1 = (1 + N) key PK, nobody knows the user profiles unless the n matching n−1 X 2 servers in the social network collude. But we assume that at = 1 + (x − xk)N (mod N ) least one out of the n servers is trusted not to collude with k=1 other servers to attack the user profiles. Therefore, the privacy n−1 n−1 of the user profiles can be protected. Y xk SKk −1 SKn 2 X [B (g1 A ) /A (mod N ) − 1]/N = x − xk 5.2 Private User Profile Matching k=1 k=1 Pn−1 When a user U wishes to find matching users from the social i.e., xn = x − k=1 xk. The theorem is proved. 4 network, he specifies his preferred profile attributes A1 = After the executions of Algorithm 1 for aj and wj (j = a1,A2 = a2, ··· ,Am = am, and the corresponding weights 1, 2, ··· , m), respectively, the server Sk (k = 1, 2, ··· , n) will w1, w2, ··· , wm, and the dissimilarity threshold δ. We assume privately hold a(k) such that Pn a(k) = a , and w(k) such `w j k=1 j j j that all wj are integers between 0 and 2 −1 and δ is an integer Pn (k) `δ that w = wj. between 0 and 2 −1, where `w and `δ are the bit-length of wj’s k=1 j and δ, respectively. In addition, based on the public parameters Private Computation of Dissimilarity. Assume that the server Pn (N, g), the user U randomly chooses the private key SKU from Sk (k = 1, 2, ··· , n) privately holds xj such that k=1 xj = ∗ SK 2 y y Zρ and computes the public key PKU = g (mod N ). x, given the encryption of g1 , i.e., E(g1 ), the n matching xy Then the user U generates a query servers can cooperate to privately compute E(g1 ) by running a1 w1 am wm δ Q = {E(g1 ), E(g1 ), ··· , E(g1 ), E(g1 ), E(g1),PKU } Algorithm 2. and submits Q to the social networking service provider. After receiving the query Q from the user U, the n matching Algorithm 2 Private Multiplication (PM) (n servers) servers cooperate to find matching users from the database DB y Input: S1 : x1,S2 : x2, ··· ,Sn : xn (private), E(g1 ), N, g, P K in four steps as follows. (public) x xy Private Sharing of Query. Given an encryption of g1 , i.e., Output: E(g ). x 1: for k = 1 to n { E(g1 ), assume that the absolute value of x is smaller than L, y xk xky `x+` 2: Sk computes and re-encrypts E(g1 ) = E(g1 ) where L = 2  N, `x is the bit-length of x, and ` is a x y 3: S E(g k ) sufficiently large statistic security parameter to mask x, e.g., k broadcasts 1 to other servers. 4: S computes (A, B) = E(gx1y)E(gx2y) ··· E(gxny) the values of attributes, weights, and thresholds, etc., the n k 1 1 1 5: } matching servers can cooperate to privately share x by running 6: return (A, B) Algorithm 1.

Algorithm 1 Private Sharing (PS) (n Servers) Theorem 2 (Private Multiplication Correctness). In Algo- xy x rithm 2, we have (A, B) = E(g1 ). Input: E(g1 ), N, g, P K (public), S1 : SK1,S2 : SK2, ··· , Sn,SKn (private) Proof. Because Output: S : x ··· S : x Pn x = x 1 1, , n n such that k=1 i . x1y x2y x y x (A, B) = E(g )E(g ) ··· E(g n ) 1: let (A, B) = E(g1 ) 1 1 2: for k = 1 to n − 1 { (x1+x2+···+xn)y xy = E(g1 ) = E(g1 ) 3: The server Sk randomly chooses an integer xk from (−L, L), xk SKk −1 2 computes (g1 A ) (mod N ) and sends it to the server Sn. The theorem is proved. 4 4: } n 5: S computes Now the servers cooperate to determine the dis- n similarity between the user profile and the given record n−1 2 2 2 ai1 a a a aim aim Y x SK −1 SK 2 i1 i2 i2 x = [B (g k A k ) /A n (mod N ) − 1]/N (E(g1 ), E(g ), E(g ), E(g ) ··· , E(g1 ), E(g1 )) of a n 1 user U from the database DB. k=1 i Because the server Sk (k = 1, 2, ··· , n) has privately 6: return S1 : x1, S2 : x2, ··· , Sn : xn (k) Pn (k) held aj such that k=1 aj = aj, the n servers can 7

(aj −2aij )aj cooperate privately to compute E(g1 ) with Algorithm Theorem 3 (Private Comparison Correctness). In Algorithm a2 2 (aj −2aij )aj ij (aj −aij ) 3, we have sλ−1 = 0 if d(U, Ui) ≤ δ and 1 otherwise. 2 and then E(g1 )E(g1 ) = E(g1 ), where j = 1.2, ··· , m. Proof. According to the binary addition described in Section Next, because the server Sk (k = 1, 2, ··· , n) has privately 3.2, we have (sλ−1 ··· s1s0) is the two’s complement of d = (k) Pn (k) δ − d(U, U ). Based on the properties of two’s complement, we held wj such that k=1 wj = wj (j = 1.2, ··· , m), each i server Sk can compute and broadcast have sλ−1 = 0 if d ≥ 0 (i.e., d(U, Ui) ≤ δ) and 1 otherwise. The theorem is proved. 4 m (k) 2 m (k) 2 w (a −a ) P w (a −a ) sλ−1 Y j j ij j=1 j j ij Private Generation of Response. After obtaining e(g ), E(g1 ) = E(g1 ) 2 j=1 the n servers cooperate to decrypt it by running Algorithm 4.

At last, the n servers can compute Algorithm 4 Collaborative Decryption (CD) (n servers) n α Pm (k) 2 Pn Pm (k) 2 w (aj −aij ) w (aj −aij ) Input: S1 : sk1,S2 : sk2, ··· ,Sn : skn (private), e(g2 ) where Y j=1 j k=1 j=1 j 0 E(g1 ) = E(g1 ) α ∈ {0, 1}, q , g2, G, pk (public) k=1 Output: α. α 1: let (A0,B0) = E(g ) Pm w (a −a )2 j=1 j j ij d(U,Ui) 2: for k = 1 to n − 1 { = E(g1 ) = E(g1 ). ∗ 3: Sk randomly chooses rk from Zρ, computes δ Given the encryption of the threshold, i.e., E(g1), the n rk skk rk Ak = A ,Bk = (Bk−1/A ) , δ d(U,Ui) δ−d(U,Ui) k−1 k−1 servers can compute E(g1)/E(g1 ) = E(g1 ) and then privately share d = δ − d(U, Ui) by running Algorithm and sends (Ak,Bk) to the server Sk+1. 4: } 1. After execution of Algorithm 1, assume that the server Sk n skn P 5: The server Sn computes Z = Bn−1/An−1. (k = 1, 2, ··· , n) privately holds dj such that k=1 dj = d, where |d | < L for j = 1, 2, ··· , n − 1, |d | < nL and 6: if Z = 1 { return 0 } j n 7: else { return 1 } |d|  L. Private Comparison with Threshold. For the purpose of security and efficiency, we will use the original ElGamal Theorem 4 (Collaborative Decryption Correctness). In Al- encryption in this step. The n servers cooperate to choose gorithm 4, the output is α. 0 a multiplicative cyclic group G of prime order q with a r0 r0 Proof. When α = 0, let (A0,B0) = (g2 , pk ), we have generator g2, such that discrete logarithm problem over the group G is hard. Then each server Sk chooses a private key r1 r0r1 ∗ 0 A1 = A0 = g2 skk randomly from Zq0 = {1, 2, ··· , q − 1} and computes a n n skk sk r Y r r sk r Y r r public key pkk = g2 . The common public key is defined as B = (B /A 1 ) 1 = (( pk ) 0 /g 0 1 ) 1 = ( pk ) 0 1 Qn 1 0 0 l 2 l pk = k=1 pki. l=1 l=2 Let λ = dlog2 nLe + 2, assume the two’s complement of dk (k) (k) (k) (k) (k) 1 ≤ k ≤ n − 1 is dk = bλ−1bλ−2 ··· b1 b0 where bλ−1 = 0 if dk ≥ 0 and By induction hypothesis, we assume that for , (k) bλ−1 = 1 otherwise according to [12]. rk r0r1···rk Ak = A = g2 To compare d(U, Ui) and δ, the n servers cooperate to run k−1 Algorithm 3, where e stands for the original ElGamal encryption n sk1 rk Y r0r1···rk with the public key pk. Bk = (Bk−1/Ak−1) = ( pkl) l=k+1

Algorithm 3 Private Comparison (PC) (n servers) Therefore, 0 Input: S1 : d1,S2 : d2, ··· ,Sn : dn (private), q , g2, G, pk (public) skn r0r1···rn−1 r0r1···rn−1skn Output: e(1) if d(U, Ui) ≤ δ and e(g2) otherwise. Z = Bn−1/An−1 = pkn /g2 = 1, sλ−1 s1 s0 1: let S = (e(g2 ), ··· , e(g2 ), e(g2 )) where sj = 0 for all j. 2: for k = 1 to n { r1r2···rn−1 and the output is 0. When α = 1, we have Z = g2 6= 1 3: Sk encrypts dk to get and the output is 1. The theorem is proved. 4 (k) (k) (k) b b b λ−1 1 0 If the output of Algorithm 4 is 0, the n servers cooperate e(dk) = (e(g2 ), ··· , e(g2 ), e(g2 )) to encrypt CIi with the public key PKU of the query user U 4: n m m servers cooperate to compute by running Algorithm 5, where E(g1 ,PK) and E(g1 ,PKU ) m sλ−1 s1 s0 stand for encryptions described in the initialisation of g1 under S = S  e(dk) = (e(g2 ), ··· , e(g2 ), e(g2 )) the public keys PK and PKU , respectively. as described in Section 3.2, where sj is either 0 or 1. Theorem 5 (Collaborative Encryption Correctness). In Al- 5: } m1+m2+···+mn sλ−1 gorithm 5, (A, B) is an ElGamal encryption of g1 6: return e(g2 ) with the public key PKU . 8

Algorithm 5 Collaborative Encryption (CE) (n servers) the total number of matching users from DB is less than t, all

Input: S1 : SK1,S2 : SK2, ··· ,Sn : SKn (private), matching users are returned to the querying user. m E(g1 ,PK), N, g, P KU (public) m Output: E(g1 ,PKU ). m XTENDED ROFILE ATCHING ROTOCOL 1: given E(g1 ,PK), the n servers cooperate to run Algorithm 1 to 6 E P M P privately share m and then each server Sk privately holds mk Pn 6.1 User Profile Matching for Multiple Thresholds such that k=1 mk = m. 2: for k = 1 to n { In the user profile matching protocol described in Section 5, ∗ 3: Sk randomly chooses rk from Zρ, computes the query user only specify single threshold δ only. In some A = grk (mod N 2),B = gmk PKrk (mod N 2) user matching queries, the user may wish to specify different k k 1 U thresholds for different groups of attributes, respectively. For and sends (Ak,Bk) to the social network service provider. example, the query user may wish to find people with age 4: } between 20 and 30 and height between 170cm and 180cm. 5: The service provider computes In this section, we extend the user profile matching protocol n n for a single threshold to allow the query user to specify multiple Y 2 Y 2 A = Ak(mod N ),B = Bk(mod N ) thresholds. Let us consider two thresholds at first. k=1 k=1 Assume that the query user U chooses two groups of 6: return (A, B) attributes, one group consists of attributes A1,A2, ··· ,Am and 0 0 0 another group contains attributes A1,A2, ··· ,Am0 and wishes

to find a matching user Ui such that dA1,A2,··· ,Am (Ui,U) = Pm 2 wj(aij − aj) ≤ δ, dA0 ,A0 ,··· ,A0 (Ui,U) = j=1 1 2 m0 Proof. According to Algorithm 5, we have 0 Pm 0 0 0 2 0 0 n n j=1 wj(aij − aj) ≤ δ , where aij and aij are the value 0 Y Y rk r1+r2+···+rn of the attributes Aj and Aj, respectively, for the user Ui, and A = Ak = g = g 0 0 0 aj, wj, δ, aj, wj, δ are specified by U in the query. k=1 k=1 0 0 a1 am a1 am n n First of all, the user U encrypts g1 , ··· , g1 , g1 , ··· , g1 , w0 w0 0 Y Y mk rk m1+···+mn r1+···+rn w1 wm 1 m δ δ B = Bk = g1 PKU = g1 PKU g1 , ··· , g1 , g1 , ··· , g1 , g1, g1 with the ElGamal encryp- k=1 k=1 tion scheme and PK as described in Section 5.1. In addition, the user U randomly chooses the private key SKU and Therefore, the theorem is proved. 4 SKU computes the public key PKU = g . Then the user CIi After receiving E(g1 ,PKU ) from the service provider, the generates a query a1 w1 am wm δ query user U runs Algorithm 6 to obtain CIi. Q = {A1, E(g1 ), E(g1 ), ··· ,Am, E(g1 ), E(g1 ), E(g1), 0 0 0 0 0 0 a1 w1 0 am wm δ A1, E(g1 ), E(g1 ), ··· ,Am0 , E(g1 ), E(g1 ), E(g1 ),PKU } Algorithm 6 Decryption (User U) and submits Q to the service provider. y Given Q, the n matching servers cooperate Input: E(g1 ,PKU ), N, g, P KU (public), U : SKU (private) Output: y. to run Algorithm 1 repeatedly to privately share y 0 0 0 0 1: let (A, B) = E(g1 ,PKU ) a1, w1, ··· , am, wm, a1, w1, ··· , am0 , wm0 . 2: The query user U computes For each user Ui in the database, the n matching servers cooperate to determine whether U meets the thresholds x = [B/ASKU (mod N 2) − 1]/N i specified by the query user U in three steps as follows. 3: return x Step 1: For attributes A1,A2, ··· ,Am, the n matching servers δ−dA1,··· ,Am (U,Ui) collaborate to compute E(g1 ) as described in Theorem 6 (Decryption Correctness). In Algorithm 6, we Section 5.2. Then, the n servers cooperate to privately share have x = y(mod N). δ−dA1,··· ,Am (U, Ui) without revealing it by running Algorithm

r y r 1 and to determine if δ − dA1,··· ,Am (U, Ui) ≥ 0 by running Proof. Assume that A = g and B = g1 PKU , we have α Algorithm 3. Let the output of Algorithm 3 be e(g2 ), where SKU 2 x = [B/A (mod N ) − 1]/N α = 0 if δ − dA1,··· ,Am (U, Ui) ≥ 0 and 1 otherwise.
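The tests in these steps come down to the sign test of Theorem 3: the servers additively share d = δ − d(U, Ui), add the two's complements of the shares bit by bit on encrypted bits (Algorithm 3), and reveal only the sign bit s_{λ−1}. Below is a plaintext Python illustration of that logic, with made-up share and parameter values; the real protocol performs the same bit-level addition on ElGamal-encrypted bits and never reveals d.

```python
import random

def share(value, n, bound):
    """Split `value` into n additive shares, each in roughly (-bound, bound)."""
    parts = [random.randrange(-bound + 1, bound) for _ in range(n - 1)]
    return parts + [value - sum(parts)]

def sign_bit_of_sum(shares, lam):
    """Plaintext analogue of Algorithm 3: add the shares' two's-complement
    representations modulo 2^lam and return the most significant bit
    s_{lam-1}, which is 0 iff the underlying sum is non-negative."""
    total = sum(shares) % (1 << lam)          # wrap-around (two's-complement) sum
    return (total >> (lam - 1)) & 1

if __name__ == "__main__":
    L = 2 ** 20                               # masking bound, 2^(l_x + l)
    n = 3                                     # number of matching servers
    lam = (n * L).bit_length() + 2            # lambda = ceil(log2(nL)) + 2
    for delta, dist in [(10, 7), (10, 10), (10, 12)]:
        d = delta - dist                      # match iff d >= 0
        s = sign_bit_of_sum(share(d, n, L), lam)
        print(delta, dist, "match" if s == 0 else "no match")
```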

y r r SKU 2 0 0 0 = [g PK /(g ) (mod N ) − 1]/N Step 2: For attributes A1,A2, ··· ,A 0 , the n matching servers 1 U 0 m y 2 δ −dA0 ,··· ,A0 (U,Ui) = ((1 + N) (mod N ) − 1)/N 1 m0 collaborate to compute E(g1 ) as described in = (1 + y(mod N)N − 1)/N = y(mod N) Section 5.2. Then, the n servers cooperate to privately share δ0− d 0 0 (U, U ) without revealing it by running Algorithm A1,··· ,A 0 i The theorem is proved. 4 m 0 1 and to determine if δ − dA0 ,··· ,A0 (U, Ui) ≥ 0 by running 1 m0 0 Because CIi < N, the output of Algorithm 6 is CIi if the α CI Algorithm 3. Let the output of Algorithm 3 be e(g2 ), where input is E(g i ,PK). 0 0 1 α = 0 if δ − dA0 ,··· ,A0 (U, Ui) ≥ 0 and 1 otherwise. The above process is repeated until t matching users are 1 m0 α α0 found from DB and returned to the querying user. In case that Step 3: After obtaining e(g2 ) and e(g2 ), the n servers 9

a1 am cooperate to compute Q = {E(g1 ), w1 = 2, ··· , E(g1 ), wm = 2, δ = 1,PKU ). 0 0 0 −1 0 0 −1 Like our protocol described in Section 5.2, the n match- αα ((2α−1)α +α )2 (2α−1)α α 2 2 e(g2 ) = e(g2 ) = (e(g2 )e(g2 )) . (aj −aij ) ing servers can cooperate to compute E(g1 ) for (2α−1)α0 j = 1, 2, ··· , m on the basis of Q and the profile Note that e(g2 ) can be computed with a conditional of a user Ui in the database. Then the n servers can gate as described in Section 3.2. Then the n servers cooperate 2 Pm 2 2(aj −aij ) Qm (aj −aij ) wj j=1 to compute compute j=1 E(g1 ) = E(g1 ) and (1−α)(1−α0) 0 0 Pm 2 αα α α 1− 2(aj −aij ) e(g2 ) = e(g2)e(g2 )/(e(g2 )e(g2 )), E(g j=1 ) . (1−α)(1−α0) 1 Pm and decrypt e(g ) by running Algorithm 4. If the Next, the n servers cooperate to share d = 1 − 2(aj − 2 0 j=1 (1−α)(1−α ) 0 2 output is 1 (i.e., e(g2 ) = e(g2) and thus α = α = 0), aij) by running Algorithm 1 and then determine if d ≥ 0 by the n servers cooperate to encrypt CIi by running Algo- running Algorithm 3. Assume that the output of Algorithm 3 α Pm 2 rithm 5 and the social networking service provider returns is e(g2 ). If α = 0, then d = 1 − j=1 2(aj − aij) ≥ 0 and CIi E(g1 ,PKU ) as the response to the query user U. After therefore aij = aj for j = 1, 2, ··· , m. CIi α receiving E(g1 ,PKU ) from the service provider, the query The output of Algorithm 3, i.e., e(g2 ), may be incorporated user U runs Algorithm 6 to obtain CIi. with other dissimilarity thresholds for numerical attributes as described in Section 6.1 to determine if the profile of the user The above process is repeated until t matching users are U meets the searching criteria. found and returned to the querying user. In case that the total i number of matching users is less than t, all matching users are returned to the querying user. 7 SECURITY ANALYSIS 7.1 Security of Our Underlying Encryption Scheme Now we consider multiple thresholds for user matching in general. The query user chooses several groups of attributes, To compute the distance privately, we use the ElGamal specifies a dissimilarity threshold for each group of attributes encryption scheme [8], but we choose the public parameters and logical relations (including NOT, AND, OR) for the on the basis of the Paillier encryption scheme [21]. The public thresholds, generates a query Q accordingly and sends it to the parameters are (N, g), where N is a product of two larger p N 2 social service provider. primes q and p and g = (1 + N) r (mod N ) for a randomly Given the query Q and the profile of a user Ui, the n chosen integer r ∈ {2, 3, ··· ,N − 1}. It is not hard to see matching servers cooperate to perform Step 1 for each group g(p−1)(q−1) 6= 1(mod N 2), but gq(p−1)(q−1) = 1(mod N 2). α Thus the order g is more than q. In addition, because of attributes and each threshold to obtain e(g2 ), where α = 0 if the dissimilarity threshold is satisfied and 1 otherwise. Based (g − 1)/N is not an integer, there is no integer a such that on the logical relations specified by the query user U and g = (1 + N)a(mod N 2). α α0 The security of the original ElGamal encryption scheme is e(g2 ), e(g2 ), ··· obtained in Step 1 (Step 2), the n matching servers cooperate to evaluate the logical conditions like Step 3 built on the decisional Diffie-Hellman (DDH) problem: given a b c with conditional gates described in Section 3.2. g , g , g , determine if c = ab. 
If the DDH problem is hard, When the output of Algorithm 5 meets the logical condition, the ElGamal encryption scheme is semantically secure. p N x the n matching servers cooperate to compute E(gCIi ,PK ) Because g = (1 + N) r , we have g = (1 + 1 U px Nx and the social network provider returns it as the response to N) r , which is a Paillier encryption of px(mod N). x the querying user U and then the user U decrypts the response If p and q are given, one can decrypt g to obtain px. So given g, ga, gb, gc in a DDH problem, one can obtain to obtain CIi like Step 3. p, pa(mod N), pb(mod N), pc(mod N), by which it is easy to see if c = ab. Because p and q are assumed to be public in 6.2 User Profile Matching for Categorical Attributes [33], the variant ElGamal encryption scheme has been shown For numerical attributes, we define dissimilarities between to be insecure [22]. two users as in Section 4. These dissimilarity definitions are In this paper, we assume that (N, g) can be generated so not applicable to categorical attributes, e.g., user nationality, that no one knows p and q, e.g., by the approach in [14] or by user hobbies, and etc. However, for some special queries, e.g., the Intel Software Guard Extensions (Intel SGX). Furthermore, finding users with the same nationality, we can re-use our user there is no integer a such that g = (1 + N)a(mod N 2) and profile matching protocol as follows. it is well-known that the factorization of a large integer N Assume that A1,A2, ··· ,Am are categorical attributes. is a hard problem. Therefore, under the assumption that no U In the initialization, each user i maps the value of each one knows p and q, the DDH problem over the group G = categorical attribute Aj to the data aij of a fixed size by {gα(mod N 2)|α ∈ } is hard. 2 2 ZN a hash function H, encrypts the ai1, ai1, ··· , aim, aim as In the computation of the distance, our encryption scheme is described in Section 5.1, and then sends the ciphertexts 2 2 actually a combination of the ElGamal and Paillier encryption CIi ai1 ai1 aim aim E(g1 ), E(g1 ), E(g1 ), ··· , E(g1 ), E(g1 ) to the social schemes. Like the Paillier scheme, from an encryption (A = r 2 x r 2 networking service provider, which stores them in the user g (mod N ),B = g1 PK (mod N )) of x, one can obtain N x N rN rN 2 profile database DB. B = (g1 ) PK = PK (mod N ). However, given N ∗ When a user wishes to find people such that aij = aj X , one cannot distinguish X with a random integer R ∈ ZN 2 . (j = 1, 2, ··· , m), he sends to the service provider a query Otherwise, the Paillier scheme is not semantically secure. 10
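The step B^N = PK^{rN} (mod N^2) in the argument above rests on the identity (1 + N)^N ≡ 1 (mod N^2). A short numerical check with toy values (all numbers below are made up for illustration):

```python
# Raising the second ciphertext component B = g1^x * PK^r to the power N
# erases the message part, because g1^N = (1 + N)^N ≡ 1 (mod N^2).
p, q = 1009, 1013
N = p * q
N2 = N * N
assert pow(N + 1, N, N2) == 1

x, r_exp = 42, 12345
u = 7                                   # stand-in for PK^r; any unit works here
B = (pow(N + 1, x, N2) * pow(u, r_exp, N2)) % N2
assert pow(B, N, N2) == pow(u, r_exp * N, N2)   # B^N carries no trace of x
```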

In summary, under the assumption that no one knows p and bit-length of x and ` is a sufficiently large statistical security q and the Paillier scheme is semantically secure, the ElGamal parameter. Therefore, the adversary still cannot distinguish x scheme described in the initialization is semantically secure. from any other value. In addition, the adversary cannot retrieve SK1 x1 SK1 −1 A from (g1 A ) in view of the existence of x1. Thus SK1 7.2 User Profile Privacy the adversary cannot use A to decrypt any other ciphertext (A, B) in case that the adversary decoys the first server to In Section 4, we define user profile privacy with a game output ASK1 . between an adversary A and a challenger C. We assume In the private multiplication algorithm (Algorithm 2), there that the adversary A has compromised n − 1 matching is no decryption of any query component and thus it does not servers S2,S3, ··· ,Sn and known their private keys SKk for help the adversary win the game. k = 2, 3, ··· , n. In the private comparison algorithm (Algorithm 3), the The adversary A chooses two user profiles encryption scheme used to encrypt 0 or 1 only is different (a01, a02, ··· , a0m) and (a11, a12, ··· , a1m) and sends from the encryption scheme used to encrypt user profiles and them to the challenger C. The challenger C chooses a random queries. There is no decryption of any query component in the bit b ∈ {0, 1}, and executes the Profile Generation (PG) to algorithm and it does not help the adversary win the game. obtain an encrypted profile Pb = PG(P K, ab1, ab2, ··· , abm), In the collaborative decryption algorithm (Algorithm 4), there and then sends Pb back to the adversary A. is only one decryption. The algorithm is designed so that the According to the initialization described in Section 5.1, decryption result is either 0 or 1 and it is hard for the adversary a a2 a a2 b1 b1 bm bm sk1 sk1 rk Pb = {E(g1 ), E(g1 ), ··· , E(g1 ), E(g1 )}. to retrieve A0 from (B0/A0 ) , where rk is randomly For simplicity, we do not include the contact information chosen by the first server. Thus this algorithm also does not here. It can be considered as an attribute. help the adversary win the game. x r 2 x r 2 Given E(g1 ) = (g (mod N ), g1 PK (mod N )), the In the collaborative encryption algorithm (Algorithm 5), the x r r SK2+···+SKn x r adversary can compute g1 PK /(g ) = g1 PK1 . private sharing algorithm (Algorithm 1) is used to share the Without knowing the private key SK1 of the first server, the contact information CIi of the user Ui at first. As we have adversary cannot distinguish x with any other value because the shown, Algorithm 1 does not help the adversary win the game. ElGamal encryption scheme is semantically secure. Therefore, After that, the first server encrypts CI(1) with the public key b i the adversary advantage in guessing is negligible. According PKU of the query user U. Without knowing the private key to Definition 4, our protocols (including the extended protocols) SK , the adversary cannot get CI(1) and hence is not able to have user profile privacy. U i determine CIi. Thus the adversary cannot distinguish CIi in CIi E(g1 ) from any other value and cannot make use of CIi to 7.3 User Query Privacy guess b. In Section 4, we also define user query privacy with a game In summary, our user profile matching protocol is composed between an adversary A and a challenger C. We also assume of Algorithms 1-6. Algorithm 6 is executed by the query user. 
that the adversary A has compromised n − 1 matching servers Algorithms 1-5 are cooperatively executed by the n matching n − 1 S2,S3, ··· ,Sn and known their private keys SKk, skk for servers. Even if the adversary compromises servers, he k = 2, 3, ··· , n. In addition, we assume that the adversary is still cannot distinguish the two queries in the game. Therefore, different from the adversary in the user profile privacy game. the adversary’s advantage in guessing b is negligible. According The adversary A chooses two user profiles to Definition 5, our protocols (including the extended protocols) (a01, a02, ··· , a0m) with corresponding weights have user query privacy. (w01, w02, ··· , w0m) and dissimilarity threshold δ0, and (a11, a12, ··· , a1m) with corresponding weights 8 PERFORMANCE ANALYSIS (w11, w12, ··· , w1m) and dissimilarity threshold δ1, and sends 8.1 Theoretic Performance Analysis them to the challenger C. The challenger C chooses a random bit We need two instances of ElGamal encryption for our proto- b ∈ {0, 1}, and executes the Query Generation (QG) to obtain cols. The first instance, described in Section 5.1, is used in a query Qb = QG(P K, ab1, wb1, ab2, wb2, ··· , abm, wbm, δb), Algorithms 1, 2, 5, and 6, since in these four algorithms the and then sends Qb back to the adversary A. value of the exponent of the generator in the cyclic group has According to our protocol, to be retrieved by a decrypting party. The second instance is ab1 wb1 abm wbm δb the original ElGamal encryption scheme, used in Algorithms 3 Qb = {E(g ), E(g ), ··· , E(g ), E(g ), E(g ),PKU } 1 1 1 1 1 and 4 to encrypt a bit. SKU where PKU = g is randomly generated. We construct the first instance of ElGamal encryption by x As the analysis on user profile privacy, given E(g1 ), the creating an RSA modulus N, which is the product of two adversary cannot distinguish x with any other value without large primes p and q with bit-length equal to 512-bit each. The p N 2 knowing SK1. generator g is chosen to be (1+N) r modulo N , which has x ∗ In the private sharing algorithm, given E(g1 ) and an order more than q under ZN 2 . As for the second instance, x1 SK1 −1 (g1 A ) from the first server, the adversary can obtain we use the original ElGamal encryption scheme [8] based on x−x1 0 0 0 0 g1 and then x − x1. However, x1 is sufficiently large, i.e., an order q subgroup modulo p , where p and q are prime `x+` around 2 , to hide all possible values of x, where `x is the numbers of 1,024-bit and 160-bit, respectively. The generator 11

TABLE 1: Computation Complexities per Server Distance Private Response TABLE 2: Domain Ranges of Attributes Protocols Computation Comparison Generation Attribute Type Lower Bound Upper Bound 11m Basic Protocol [32] (Exp.) 1 (Exp.) 3 (Exp.) age Numerical 0 100 18m O(λ) Privacy-Enhanced Protocol [32] (Exp.) (exp.) 3 (Exp.) hours-per-week Numerical 0 100 8m O(λ) New Proposed Protocol (Exp.) (exp.) 5 (Exp.) capital-gain Numerical 0 100,000 capital-loss Numerical 0 5,000 fnlwgt Numerical 0 1,500,000 ∗ g2 for this instance is chosen to be an element in Zp0 with q0 order equal to . The first instance allows a decrypting party TABLE 3: Parameters in the Experiments to compute the discrete logarithm to the base of g1 modulo Parameter Meaning Values 2 N but this comes at a cost of a higher decryption complexity |D| Dataset Size 1k 3k 5k 2 since N is a 2,048-bit integer. `0 Max Bit-Length of Attribute Values 5 The most important functionality we provide in our protocol m Number of Attributes 1 3 5 |w| Max Bit-Length of Weight Values 3 is query processing, which in turn consists of three sub- ` Statistical Security Parameter 20 30 40 protocols. An encrypted weighted and squared Euclidean n Number of Matching Servers 2 4 6 distance between the given record (query) and a profile is jointly computed by those n matching servers in the first sub- protocol, whose output is then fed to the second sub-protocol profile data owners could first scale each of their attribute to determine whether or not there is a match. If both match, a values to a predefined range, e.g., [0, 2`0 − 1] that is publicly response, i.e., the encrypted contact information of the matching known. We adopt this approach in our experiments. user, is generated and returned to the query user in the third In our experiments, we use several parameters summarized sub-protocol. Therefore, in the following performance analysis, in Table 3. The default parameters are shown in boldface. In we focus on the performance analysis of the efficiency of our the following subsections, we focus on the empirical analysis proposed protocols in distance computation, private comparison of the efficiency of our proposed protocols. and response generation. We list the computation complexities The prototype of our system is implemented in C/C++, and of each server in the three sub-protocols for the basic protocol the experiments were carried out on a computer equipped with [32], the privacy-enhanced protocol [32] and the proposed new an Intel i7-4770HQ CPU with 16 GB RAM running on Ubuntu protocol in Table 1, where the user profile contains m attributes, 64-bit 14.04.1 . We use the GNU MP library3 and Exp. and exp. denote the modular exponentiation in the first to implement the ElGamal encryption scheme instances. instance and the second instance of the ElGamal encryption 1) Distance Computation scheme, respectively, and λ denotes the max bit-length of the distance shares. In the first part of query processing, to enable n matching Note that both the basic and privacy-enhanced protocols servers to privately determine whether or not there is a match δ−d(U,Ui) [30] require each matching server to share the user profiles in in the next sub-protocol, a ciphertext E(g1 ) should be the initialization and maintain a user database, while the new computed and shared among the n matching servers, where Pm 2 protocol keeps the user profiles in the service provider only. d(U, Ui) = j=1 wj(aj − aij) . 
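As a plaintext illustration of the quantity being protected here: the dissimilarity below is the weighted squared Euclidean distance of Definition 3, and a record matches when it does not exceed the threshold δ. In the protocols this value only ever appears inside E(g1^{δ − d(U,Ui)}) and is never revealed; the attribute values and weights below are made up for illustration.

```python
def weighted_distance(query, profile, weights):
    """d(U, U_i) = sum_j w_j * (a_j - a_ij)^2, computed here in the clear."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, query, profile))

# In the protocol the servers assemble E(g1^(delta - d(U, U_i))) by multiplying
# per-attribute ciphertexts and then learn only the sign of delta - d(U, U_i).
query   = [35, 40, 3]      # preferred attribute values a_j
profile = [33, 45, 3]      # stored attribute values a_ij
weights = [2, 1, 4]        # attribute weights w_j
delta   = 40               # dissimilarity threshold
d = weighted_distance(query, profile, weights)
print(d, "match" if d <= delta else "no match")   # 33 -> match
```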
The basic protocol [32] does not hide the contact information First of all, we note that ` and n are the most decisive of users, the weights of the query, and the difference between parameters regarding the security of our proposed solutions the distance and threshold. The privacy-enhanced protocol [32] and hence we would like to study their effects on the efficiency hides them, but its private sharing algorithm requires each of the protocols. In Figure 2(a) and Figure 2(b), we provide server to encrypt the share, broadcast the encrypted share, and the evaluation time needed for secure distance computation verify the commitments from other servers and then collaborate when varying (`, m) and (n, |D|), respectively. We include the to decrypt a value. In the new protocol, the private sharing results from the Basic protocol, the Privacy-Enhanced protocol algorithm only needs each server to send one decryption part and the New Proposed protocol in these figures and use B, PE to the last server. and NP to denote them, respectively. From both Figure 2(a) and Figure 2(b), it is clear that the time for distance computation grows linearly in both the number of 8.2 Experimental Performance Analysis attributes (m) and the number of records (|D|) in the database, Now, we have conducted extensive experiments on a real which is as expected because the computation of the Euclidean dataset2 to evaluate the performance of our proposed protocols distance is performed in a componentwise manner and the under different parameter settings. We removed the records computation of the Euclidean distance has to be done for each with missing values and randomly sample 5,000 records from data record given a query. this dataset. We then choose age, hours-per-week, capital-gain, On the other hand, it can be seen that the protocol NP is the capital-loss, and fnlwgt as the attributes involved in the profile most efficient protocol among those three being compared under matching. Table 2 gives the original domain ranges of these the same configuration of (`, m) or (n, |D|). There are several attributes. As we can see from Table 2, the ranges for the reasons. First, recall that in [32], the secret key SKk of the attributes differ a lot. In order to ease profile matching, the server Sk used for the private sharing and private multiplication
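To make the quantity the servers evaluate under encryption concrete, here is a plaintext sketch of the attribute scaling and of the weighted squared Euclidean distance d(U, U_i) defined above. The domain ranges follow Table 2 and ℓ_0 = 5 follows Table 3; the linear scaling function, the weights, the threshold and the attribute values are made-up examples (the paper does not prescribe a particular scaling function), and in the actual protocols each term of this sum is computed homomorphically so that no server sees a_j, a_{ij} or w_j in the clear.

```python
# Plaintext sketch of d(U, Ui) = sum_j w_j * (a_j - a_ij)^2 on attributes
# scaled into the public range [0, 2^L0 - 1]; illustrative values only.
L0 = 5                                     # max bit-length of scaled attribute values
RANGES = {"age": (0, 100), "hours-per-week": (0, 100),
          "capital-gain": (0, 100_000), "capital-loss": (0, 5_000),
          "fnlwgt": (0, 1_500_000)}

def scale(value, lo, hi):
    """Map a raw attribute value linearly into [0, 2^L0 - 1]."""
    return round((value - lo) * (2 ** L0 - 1) / (hi - lo))

def distance(query_attrs, profile_attrs, weights):
    return sum(w * (a - b) ** 2 for a, b, w in zip(query_attrs, profile_attrs, weights))

query   = [scale(35, *RANGES["age"]), scale(40, *RANGES["hours-per-week"])]
profile = [scale(29, *RANGES["age"]), scale(38, *RANGES["hours-per-week"])]
weights = [3, 1]                           # |w| = 3 bits, as in Table 3
delta   = 20                               # example dissimilarity threshold
print(distance(query, profile, weights) <= delta)   # True => the profiles match
```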

On the other hand, it can be seen that the protocol NP is the most efficient among the three protocols being compared under the same configuration of (ℓ, m) or (n, |D|). There are several reasons. First, recall that in [32], the secret key SK_k of the server S_k used for the private sharing and private multiplication algorithms was chosen from a much larger domain {1, ..., q−1}, where q is a 512-bit prime number, while in the protocol NP each matching server S_k randomly selects a secret key SK_k from {1, ..., ρ−1}, where ρ = 2^160. Second, the efficiency of the private sharing algorithm in the protocol NP is improved. Specifically, in the private sharing algorithm of the PE protocol there are two heavy operations, i.e., one ElGamal encryption for E(g_1^{x_k}) and one exponentiation for C_1^{SK_k}; the former requires the computation of g^r and PK^r for a 512-bit random integer r, so in total at least 3 exponentiations modulo N^2 are needed. In the new private sharing algorithm, however, each server only needs to compute g_1^{x_k} A^{SK_k} (mod N^2) and send its inverse to the last server. Third, fewer invocations of the private multiplication algorithm are required in the protocol NP. To be precise, in the protocols B and PE, E(g_1^{a_j^2}) and E(g_1^{a_j a_{ij}}) are derived separately via two invocations of private multiplication, whereas in the new protocol only one private multiplication, on input E(g_1^{a_j − 2a_{ij}}) and the shares of a_j, is required to compute E(g_1^{(a_j − 2a_{ij}) a_j}). The protocol B is more efficient than the PE protocol in that w_j is publicly known in the former, so no further invocations involving expensive exponentiations are needed to produce E(g_1^{δ − d(U,U_i)}), whereas one additional private sharing to share w_j and a subsequent call to private multiplication to produce E(g_1^{w_j (a_j − a_{ij})^2}) are necessary in the latter. Lastly, unlike the protocols B and PE, our new protocol reduces the overhead of re-encryption in the private multiplication protocol: each server S_k performs only one re-encryption, on its own share of Σ_{j=1}^{m} w_j^{(k)} (a_j − a_{ij})^2, where w_j^{(k)} denotes the share of w_j possessed by the server S_k, saving up to 2(m − 1) exponentiations modulo N^2 per query. Overall, the protocol NP saves at least 82% of the computational overhead of the PE protocol.

Fig. 2: Evaluation Time for Distance Computation. (a) Varying ℓ and m (|D| = 3k, ℓ_0 = 5, |w| = 3, n = 2); (b) Varying n and |D| (ℓ_0 = 5, m = 3, |w| = 3, ℓ = 30).

Furthermore, in Figure 2(a) and Figure 2(b), we observe that varying ℓ and n has very limited effects on the elapsed time of each of the protocols being compared. This is as expected. First, recall that when the statistical security parameter ℓ increases, a random share x_k possessed by the server S_k has to be enlarged as well, to ensure that the share retrieved by the last server S_n looks like a random integer chosen from a much larger domain than the domain in which x, the integer to be shared, resides. In this regard, when the private multiplication algorithm is invoked later, the computational overhead also becomes higher, in that each component of E(g_1^y) has to be raised to a larger exponent x_k. However, because the bit-length of each share x_k in our experiments is much smaller (from 35 to 58 bits under the possible parameter configurations) than that of the random integer r used in encrypting g_1^{x_k}, increasing ℓ only slightly increases the time for computing E(g_1^y)^{x_k}. In addition, we note that in both the private sharing and the private multiplication algorithms used in each protocol under consideration, the server S_k does not have to wait for the other matching servers to compute g_1^{x_k} A^{SK_k} (mod N^2) or a re-encryption of E(g_1^{x_k y}), respectively. These expensive operations are thus highly parallelizable, and the elapsed time does not depend much on the number of matching servers.
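The private sharing step referred to above can be illustrated with a small self-contained sketch. This is only a toy rendering of the idea under explicit assumptions: a tiny prime-order group stands in for the scheme's Z*_{N^2} instance, the joint public key is taken to be g^{SK_1+...+SK_n}, the last server recovers its share by brute force, and the statistical blinding of shares by 2^{ℓ_x+ℓ} discussed above is omitted; it is not the paper's Algorithm 1 itself.

```python
# Toy sketch: turn E(g^x) = (A, B) = (g^r, g^x * PK^r) into additive shares of x.
# Servers 1..n-1 pick a random share x_k and send (g^{x_k} * A^{SK_k})^{-1} to the
# last server, which strips its own key part and recovers its share
# x_n = x - sum(x_k) (mod q in this toy) from the remaining exponent.
import random

p, q = 2039, 1019                 # p = 2q + 1, both prime
g = pow(2, 2, p)                  # generator of the order-q subgroup
n = 3
SK = [random.randrange(1, q) for _ in range(n)]
PK = pow(g, sum(SK) % q, p)       # assumed joint public key g^{SK_1 + ... + SK_n}

def enc(x):
    r = random.randrange(1, q)
    return (pow(g, r, p), pow(g, x, p) * pow(PK, r, p) % p)

x = 123                           # the secret exponent to be shared
A, B = enc(x)

shares = [random.randrange(0, q) for _ in range(n - 1)]          # x_1 .. x_{n-1}
inv = [pow(pow(g, shares[k], p) * pow(A, SK[k], p) % p, p - 2, p)
       for k in range(n - 1)]     # each server sends (g^{x_k} A^{SK_k})^{-1}

acc = B * pow(A, q - SK[n - 1], p) % p       # last server removes its own A^{SK_n}
for v in inv:
    acc = acc * v % p                         # ... and applies the received inverses
x_n = next(e for e in range(q) if pow(g, e, p) == acc)   # dlog of g^{x - sum x_k}
assert (x_n + sum(shares)) % q == x % q
```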

2) Private Comparison

In this subsection, we report the time for determining whether there is a match between the given query and a profile. In the protocol B, the difference between d(U, U_i) and δ is allowed to be revealed. Hence, only one distributive decryption of E(g_1^{δ − d(U,U_i)}) is needed, which is much cheaper than the cost in the case where this difference has to be hidden, i.e., in the protocols PE and NP. We thus study the effects of varying (m, ℓ) and (n, |D|) on the evaluation time when only the single bit indicating the matching result is allowed to be revealed. The results are reported in Figures 3(a) and 3(b), respectively. Notice that for the private comparison, both protocols PE and NP use the same private adder, which in turn is based on the same construction of the conditional gate, and therefore we do not distinguish between them in these two figures.

Fig. 3: Evaluation Time for Private Comparison. (a) Varying m and ℓ (|D| = 3k, ℓ_0 = 5, |w| = 3, n = 2); (b) Varying n and |D| (ℓ_0 = 5, m = 3, |w| = 3, ℓ = 30).

As for the distance computation, Figure 3(b) shows that the evaluation time grows linearly in the database size |D|. However, unlike before, when ℓ, m, and n increase we observe a noticeable increase in evaluation time, and the increase is much higher for larger values of ℓ and n. The reason is the cryptographic primitive used to implement the secure comparison, namely the multiplication protocol based on the conditional gate.

To see this, recall that the weighted squared distance is Σ_{j=1}^{m} w_j (a_j − a_{ij})^2, which is upper-bounded by Σ_{j=1}^{m} w_j (2^{ℓ_0} − 1)^2. Let ℓ' be the minimum number of bits required to represent this maximum distance; it can be seen that ℓ' = ⌊log_2((2^{ℓ_0} − 1)^2 (Σ_{j=1}^{m} w_j)) + 1⌋. Hence, the private sharing (Algorithm 1) is invoked with the parameter L = 2^{ℓ'+ℓ} to share the distance, which is upper-bounded by 2^{ℓ'} − 1. Using a similar argument, it can be concluded that the minimum number of bits needed to represent a distance share owned by the server S_n is ℓ'' = ⌊log_2((2^{ℓ'} − 1) + (n − 1)(2^{ℓ'+ℓ} − 1)) + 1⌋. The minimum number of bits needed to represent S_n's distance share thus grows linearly in (ℓ' + ℓ). Since the time complexity of the private addition increases linearly in the bit-length of its two input numbers, the evaluation time for private comparison grows linearly in ℓ, which is consistent with our experimental results in Figure 3(a). Increasing the number of attributes (m), however, does not have a linear effect on the execution time: the evaluation time only grows logarithmically in m, which can be confirmed from the formula for ℓ' above.

In the evaluation of the conditional gate, we assume that the server S_i has to wait for its subsequent servers to finish the computation of U_n and V_n before proceeding to the next stage of the conditional gate, implying that the evaluation time increases at least linearly in the number of matching servers involved in the conditional gate, which in turn is used to implement the private addition. This explains the linear increase in n seen in Figure 3(b). However, if we allow the n servers to process n user profiles in parallel by interleaving the order of query execution for different user profiles, the evaluation time will be reduced significantly, because the server S_i can compute U'_i and V'_i for another user profile in the database while waiting for U_n and V_n to be generated by its following servers.
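The bit-length analysis above is easy to check numerically. The sketch below evaluates ℓ' and ℓ'' for the parameter ranges of Table 3 (ℓ_0 = 5, |w| = 3, so each w_j ≤ 7) and confirms that ℓ'' grows linearly in ℓ and with n but only logarithmically in m; the worst-case weight vectors used here are illustrative and need not match the weights used in the experiments.

```python
from math import floor, log2

def l_prime(l0, weights):
    """Minimum bits for the maximum distance sum_j w_j * (2^l0 - 1)^2."""
    return floor(log2((2 ** l0 - 1) ** 2 * sum(weights)) + 1)

def l_dblprime(l0, weights, l, n):
    """Minimum bits for S_n's distance share when shares are drawn up to 2^(l'+l)."""
    lp = l_prime(l0, weights)
    return floor(log2((2 ** lp - 1) + (n - 1) * (2 ** (lp + l) - 1)) + 1)

l0 = 5
for m in (1, 3, 5):                       # grows only logarithmically in m
    print("m =", m, "l' =", l_prime(l0, [7] * m))
for l in (20, 30, 40):                    # grows linearly in the security parameter
    print("l =", l, "l'' =", l_dblprime(l0, [7] * 3, l, n=2))
for n in (2, 4, 6):                       # and slowly with the number of servers
    print("n =", n, "l'' =", l_dblprime(l0, [7] * 3, l=30, n=n))
```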
8.3 Comparison with Shahandashti et al.'s Protocol

Finally, we compare the performance of our protocol with Shahandashti et al.'s [24], where Alice and Bob, each holding a private fingerprint, find out whether their fingerprints belong to the same individual.

In Shahandashti et al.'s protocol, Alice has as private input a set of minutiae F = {p_1, p_2, ..., p_n}, where p_i = (t_i, x_i, y_i, θ_i) for i = 1 to n, and Bob has as private input a set of minutiae F' = {p'_1, p'_2, ..., p'_m}, where p'_j = (t'_j, x'_j, y'_j, θ'_j) for j = 1 to m. Alice's private output of the protocol is the binary predicate indicating whether or not at least τ of her minutiae match those of Bob, where two minutiae are defined to match if their types t_i and t'_j are the same, their locations are closer than a threshold Euclidean distance d_E, i.e., sqrt((x_i − x'_j)^2 + (y_i − y'_j)^2) ≤ d_E, and their orientations are within an angular difference d_a, i.e., min(|θ_i − θ'_j|, 2π − |θ_i − θ'_j|) ≤ d_a.

For comparison, in our protocol for multiple thresholds, we assume that there are two matching servers, which are given the encryptions of F and F' and output whether two minutiae match or not. Let us denote the set of all possible minutia types by T, and the set of all possible Euclidean distances (resp. angular differences) between two arbitrary points (resp. orientations) by D_E (resp. D_a). The computation and communication complexities of our two-party protocol and of Shahandashti et al.'s are roughly compared in Table 4, where ℓ is the statistical security parameter in our protocol, κ is the number of matchings, γ is the ciphertext size, and |S| denotes the number of elements in the set S.

TABLE 4: Performance Comparison with Shahandashti et al.'s
Protocols | Computation Complexity | Communication Complexity
Our two-party protocol | 3O(κℓ) (exp.) | 3O(κℓ)γ
Shahandashti et al.'s protocol | (O(κ|T|) + O(κ|D_E|) + O(κ|D_a|)) (exp.) | (O(κ|T|) + O(κ|D_E|) + O(κ|D_a|))γ

Usually, ℓ = 30 is sufficient in our protocol. If |T| = 10, |D_E| = 1,000, |D_a| = 10 and κ = 1, Shahandashti et al.'s protocol needs to compute about 1,020 encryptions and exchange about 1,020 ciphertexts, whereas our protocol needs to compute about 90 encryptions and exchange about 90 ciphertexts. Shahandashti et al.'s protocol is more efficient than ours only when |T| + |D_E| + |D_a| is less than 90, in other words, when the number of possible fingerprints is less than 27,000.

9 CONCLUSION

In this paper, we proposed a new solution for privacy-preserving user profile matching based on homomorphic encryption and multiple servers. Our solution allows a user to find the matching users with the help of multiple servers without revealing the query and the user profiles. Security analyses have shown that the new protocol achieves user profile privacy and user query privacy. The experimental results have shown that the new protocol is practical and feasible. Our future work is to improve the performance of computing conditional gates by parallel computation.

10 ACKNOWLEDGMENTS

This work was supported by the Australian Research Council Discovery Project (DP190102835), the National Research Foundation, Prime Minister's Office, Singapore under its Strategic Capability Research Centres Funding Initiative, and a Data61 Research Collaboration Project.

REFERENCES

[1] R. Agrawal, A. Evfimievski, and R. Srikant, Information sharing across private databases, in SIGMOD 2003, pp. 86-97.
[2] M. von Arb, M. Bader, M. Kuhn, and R. Wattenhofer, Veneta: Serverless friend-of-friend detection in mobile social networking, in IEEE WIMOB 2008, pp. 184-189.
[3] B. H. Bloom, Space/time trade-offs in hash coding with allowable errors, Communications of the ACM 13 (7): 422-426, 1970.
[4] D. Boneh, E. J. Goh, and K. Nissim, Evaluating 2-DNF formulas on ciphertexts, in TCC 2006, pp. 325-341.
[5] D. Chaum, Blind signatures for untraceable payments, in Crypto 1982, pp. 199-203.
[6] E. D. Cristofaro and G. Tsudik, Practical private set intersection protocols with linear complexity, in Financial Cryptography and Data Security 2010.
[7] D. Dachman-Soled, T. Malkin, M. Raykova, and M. Yung, Efficient robust private set intersection, in ACNS 2009, pp. 125-142.

[8] T. ElGamal, A public-key cryptosystem and a signature scheme based on discrete logarithms, IEEE Transactions on Information Theory 31 (4): 469-472, 1985.
[9] M. Freedman, K. Nissim, and B. Pinkas, Efficient private matching and set intersection, in EUROCRYPT 2004, pp. 1-19.
[10] C. Gentry, Fully homomorphic encryption using ideal lattices, in STOC 2009, pp. 169-178.
[11] S. Goldwasser and S. Micali, Probabilistic encryption and how to play mental poker keeping secret all partial information, in Proc. 14th Symposium on Theory of Computing, 1982, pp. 365-377.
[12] D. M. Harris and S. L. Harris, Digital Design and Computer Architecture, Morgan Kaufmann Publishers, 2007.
[13] C. Hazay and Y. Lindell, Efficient protocols for set intersection and pattern matching with security against malicious and covert adversaries, in TCC 2008, pp. 155-175.
[14] C. Hazay, G. L. Mikkelsen, T. Rabin, T. Toft, and A. A. Nicolosi, Efficient RSA key generation and threshold Paillier in the two-party setting, in CT-RSA 2012, pp. 313-331.
[15] Y. Huang, L. Malka, D. Evans, and J. Katz, Efficient privacy-preserving biometric identification, in NDSS 2011.
[16] S. Jarecki and X. Liu, Efficient oblivious pseudorandom function with applications to adaptive OT and secure computation of set intersection, in TCC 2009, pp. 577-594.
[17] R. Li and C. Wu, An unconditionally secure protocol for multi-party set intersection, in ACNS 2007, pp. 226-236.
[18] M. Li, N. Cao, S. Yu, and W. Lou, FindU: Privacy-preserving personal profile matching in mobile social networks, in IEEE INFOCOM 2010, pp. 1-9.
[19] L. Kissner and D. Song, Privacy-preserving set operations, in CRYPTO 2005, pp. 241-257.
[20] G. S. Narayanan, T. Aishwarya, A. Agrawal, A. Patra, A. Choudhary, and C. P. Rangan, Multi party distributed private matching, set disjointness and cardinality of set intersection with information theoretic security, in CANS 2009, pp. 21-40.
[21] P. Paillier, Public-key cryptosystems based on composite degree residuosity classes, in Proc. 17th Int. Conf. Theory Appl. Cryptograph. Techn., 1999, pp. 223-238.
[22] F. Y. Rao, On the security of a variant of ElGamal encryption scheme, IEEE Transactions on Dependable and Secure Computing, 2017.
[23] Y. Sang, H. Shen, and N. Xiong, Efficient protocols for privacy preserving matching against distributed datasets, in ICICS 2006, pp. 210-227.
[24] S. F. Shahandashti, R. Safavi-Naini, and P. Ogunbona, Private fingerprint matching, in ACISP 2012, pp. 426-433.
[25] A. Shamir, How to share a secret, Communications of the ACM 22 (11): 612-613, 1979.
[26] B. Schoenmakers and P. Tuyls, Practical two-party computation based on the conditional gate, in ASIACRYPT 2004, pp. 119-136.
[27] L. Sun, L. Zhang, and X. Ye, Randomized bit vector: Privacy-preserving encoding mechanism, in CIKM 2018, pp. 1263-1272.
[28] J. Vaidya and C. Clifton, Secure set intersection cardinality with application to association rule mining, J. Comput. Secur. 13(4): 593-622, 2005.
[29] Z. Yang, B. Zhang, J. Dai, A. Champion, D. Xuan, and D. Li, E-SmallTalker: A distributed mobile system for social networking in physical proximity, in IEEE ICDCS 2010.
[30] A. C. Yao, Protocols for secure computations, in SFCS 1982.
[31] Q. Ye, H. Wang, and J. Pieprzyk, Distributed private matching and set operations, in ISPEC 2008, pp. 347-360.
[32] X. Yi, E. Bertino, F. Y. Rao, and A. Bouguettaya, Practical privacy-preserving user profile matching in social networks, in ICDE 2016, pp. 373-384.
[33] X. Yi, A. Bouguettaya, D. Georgakopoulos, and A. Song, Privacy protection for wireless medical sensor data, IEEE Transactions on Dependable and Secure Computing, 13(3): 369-380, 2015.

Xun Yi is a Professor of Computer Science and Software Engineering in the School of Science, RMIT University, Australia. His research interests include applied cryptography, computer and network security, mobile and wireless communication security, and data privacy protection. He has published more than 200 research papers in international journals and conference proceedings. He has served as a program committee member for more than 30 international conferences.

Elisa Bertino is a professor of computer science at Purdue University. Her main research interests include security, privacy, digital identity management systems, database systems, distributed systems, and multimedia systems. She is currently serving as Editor-in-Chief of IEEE Transactions on Dependable and Secure Computing. She received the 2002 IEEE Computer Society Technical Achievement Award for outstanding contributions to database systems and security.

Fang-Yu Rao is a Ph.D. candidate in Computer Science at Purdue University. He received his bachelor's and master's degrees in Computer Science and Engineering from National Sun Yat-sen University. His research interests are in data privacy, information security, and applied cryptography, with an emphasis on privacy-preserving data analytics.

Kwok Yan Lam is a Professor with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He received his B.Sc. (First Class Honours) from the University of London in 1987 and his Ph.D. from the University of Cambridge in 1990. In 1997, he founded PrivyLink International Ltd, a spin-off company of the National University of Singapore specializing in e-security technologies for homeland security and financial systems.

Surya Nepal is a Principal Research Scientist at CSIRO Data61, Australia. His main research interests include the development and implementation of technologies in the area of distributed systems and social networks, with a specific focus on security, privacy, and trust. At CSIRO, he has undertaken research on social networks, security, privacy and trust in collaborative environments, cloud systems, and big data.

Athman Bouguettaya is a Professor and Head of the School of Information Technology at The University of Sydney, Australia. He received his Ph.D. in Computer Science from the University of Colorado at Boulder (USA) in 1992. He is a founding member and past President of the Service Science Society. He has published more than 200 books, book chapters, and articles in journals and conferences in the areas of databases and service computing.