
Quantum Inf Process (2015) 14:4201–4210 DOI 10.1007/s11128-015-1091-0

Enhancing user privacy in SARG04-based private database query protocols

Fang Yu1 · Daowen Qiu2,3 · Haozhen Situ4 · Xiaoming Wang1 · Shun Long1

Received: 24 June 2015 / Accepted: 4 August 2015 / Published online: 15 August 2015 © Springer Science+Business Media New York 2015

Abstract The well-known SARG04 protocol can be used in a private query application to generate an oblivious key. Using this key, the user can retrieve one out of N items from a database without revealing which one he/she is interested in. However, the existing SARG04-based private query protocols are vulnerable to attacks in which the database sends faked data, since in its canonical form the SARG04 protocol provides no means for one party to defend against attacks from the other. Because such attacks can cause a significant loss of user privacy, this paper proposes a variant of the SARG04 protocol with new mechanisms designed to help the user protect his/her privacy in private query applications. In the protocol, it is the user who starts the session with the database, trying to learn a raw key from it in an oblivious way. An honesty test is used to detect a cheating database that has transmitted faked data. The whole private query protocol has O(N) communication complexity for conveying at least N encrypted items. Compared with the existing SARG04-based protocols, it is more efficient in communication for per-bit learning.

B Daowen Qiu [email protected] Fang Yu [email protected] Haozhen Situ [email protected]

1 Department of Computer Science, Jinan University, Guangzhou 510632, China
2 Department of Computer Science, Sun Yat-sen University, Guangzhou 510006, China
3 The Guangdong Key Laboratory of Information Security Technology, Sun Yat-sen University, Guangzhou 510006, China
4 College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China

Keywords Quantum private database query · User privacy · SARG04-based

1 Introduction

In a private database query (PDQ) application, the user tries to retrieve one item from a database without revealing which item he/she is interested in. This is modeled in the classical scheme of symmetrically private information retrieval (SPIR) [1] as the user querying the database A for one of its N items, say A_j (generally assumed to be a bit, j ∈ [N]), while keeping private not only the value of j (user privacy) but also all A_k with k ≠ j (data privacy). For such a query task, advantages in communication efficiency can be obtained by the use of quantum resources [2–4]. There is an exponential reduction in communication complexity with respect to the best classical SPIR protocols proposed so far [5–7]. However, these protocols compromise security for communication efficiency. For example, a cheating database may learn j with probability one half or more, simply by applying a projective measurement on the system [2–4]. Even worse, the cheat may never be detected, because the canonical forms of some protocols provide no means for the user to detect it [3,4].

Other quantum SPIR (QSPIR) protocols are more concerned with security than with communication efficiency [8–11]. Among them, [9–11] construct QSPIR protocols based on a quantum key distribution (QKD) protocol. QKD is widely thought to offer unconditional security in communication between two parties. By encrypting the items with the key generated by the QKD protocol, better bipartite privacy is obtained. Another advantage of this approach is that research on QKD implementations is progressing very fast, and many QKD protocols have been realized in practical settings, which will also benefit the realization of QSPIR applications. For these reasons, finding better QSPIR solutions based on existing QKD schemes deserves more attention in future research.

The PDQ protocol [9] is the first QKD-based protocol constructed from the well-known SARG04 QKD scheme [12].
As a bit-learning module, the SARG04 protocol runs first, during which the user tries to learn part of a raw key of the database without revealing which part he/she has learnt. Then, the key is post-processed so that ideally a single bit remains known to the user. Finally, the oblivious key is used to encrypt the item values, exactly one of which can be decrypted by the user. In the bit-learning process of the PDQ protocol, the privacy of both parties is guaranteed by the security of the SARG04 protocol.

In a QKD application, the two parties cooperate as partners to share a key and only need to defend against a third party. The SARG04 scheme is secure against outside attacks in such an application. In a private query application, however, the two parties are adversaries, each trying to gain as much information as possible about the other. So the security of the scheme must be re-evaluated with respect to dishonest parties in such a scenario. The communication of the SARG04 protocol is one way: the user can only passively accept messages from the database. This makes it convenient for the database to invade user privacy but difficult for the user to defend against attacks from the database in a private query application. In fact, the database can gain a large amount of information about user privacy in the bit-learning process of the PDQ protocol via easy methods such as data forging. Moreover, from the forged data the user would make wrong derivations, which might produce wrong answers to his/her queries in the subsequent process. To avoid producing wrong answers and leaking significant information in the query, the bit-learning mechanism of the scheme must be revised to help the user detect such cheats.

Yang's protocol [11] was recently proposed based on a variation of the SARG04 protocol.
It reduces the cost in quantum bits by using classical bits and therefore has an advantage over the PDQ protocol in communication. In its bit-learning process, the database always sends the same state to the user, who chooses which message to transmit to the database next. So the communication is two way. But because of a deficiency in the mechanism, the database can still forge the signal, and user privacy can be invaded even more seriously, even though the user can decide what messages to deliver to his/her adversary.

In this paper, we propose a variant of the SARG04 protocol which can be used in a SPIR application to enhance user privacy. The bit-learning mechanism is designed to protect the user from attacks using faked data in such an application. In the proposed protocol, it is the user who starts the session with the database by sending a state of his/her own choice. A dishonest database may use faked data to invade user privacy, but an honesty test is performed subsequently to reveal its dishonest activities. Meanwhile, since cheating might produce a wrong answer to his/her query, the user would not cheat in such a scheme even though he/she could gain some information by doing so. Furthermore, the protocol improves communication efficiency by reducing the usage of both quantum and classical bits.

The rest of the paper begins with a brief introduction to the PDQ protocol and Yang's protocol in Sect. 2, where security breaches of user privacy are stated. In Sect. 3, we describe our SPIR scheme in detail, with a table presenting the data in its bit-learning process. This is followed by an assessment of its communication performance in comparison with the existing SARG04-based protocols. The attacks from the adversaries, especially from the database, are addressed, and the security of the protocol against such attacks is analyzed in Sect. 4. Finally, in Sect. 5, we conclude with a summary of our results.

2 Related works

In this section, we review two main works: the PDQ protocol [9] and Yang's protocol [11]. The former inherits the SARG04 protocol directly as its bit-learning module, and the latter uses a variant of it. Below we briefly describe their bit-learning mechanisms and present the data in tables. Suppose that the database initially owns a raw key, which is a bit string, and tries to commit one of the bits b_i to the user in an oblivious way in the process.


2.1 The bit-learning process of the PDQ protocol

In the beginning of the bit-learning process of the PDQ protocol, the database sends the user one of the four states |↑⟩, |↓⟩, |→⟩, |←⟩ of a single-qubit space, chosen at random. Both |←⟩ and |→⟩ (the ↔ basis) represent a bit 1, and both |↑⟩ and |↓⟩ (the ↕ basis) represent a bit 0, where

$$|\rightarrow\rangle,\ |\leftarrow\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\rangle \pm |\downarrow\rangle\right).$$

Suppose that |↑⟩ was sent. Then a pair of states, consisting of the delivered real state |↑⟩ and one mask state in the other basis, say |→⟩, is announced by the database (see row 5 in Table 1). On receiving the state, the user measures it in one of the two bases chosen at random. Based on the outcome of his/her measurement along with the announcement of the database, the user tries to derive the value of b_i. He/she can only derive b_i as "unknown" (represented as "X" in the tables) unless (with a probability of 1/4) he/she has chosen the ↔ basis and measured |←⟩. In that case, he/she infers the conclusive result that |↑⟩ was received and b_i is 0. By symmetry, in every other case (1, 2, 3, 4, 6, 7, 8 in Table 1), the probability of the user having a conclusive result is 1/4, too. Therefore, when the channel is noiseless and the parties follow the protocol honestly, the user will eventually learn 1/4 of the bits of the raw key.

In the protocol, a dishonest database may send a faked state other than the one announced in order to bias the result of the user. For instance, it sends

$$|\psi_1\rangle = \cos\frac{13\pi}{8}\,|\uparrow\rangle + \sin\frac{13\pi}{8}\,|\downarrow\rangle,$$

in which case the user measures |↓⟩ (in the ↕ basis) or |←⟩ (in the ↔ basis) with a probability of approximately 85.36 %. Then, as long as the database announces the pair {|↑⟩, |→⟩}, the probability of the user having a conclusive result is raised significantly (see cases 1 and 5 in Table 1). Conversely, if the database sent

$$|\psi_2\rangle = \cos\frac{\pi}{8}\,|\uparrow\rangle + \sin\frac{\pi}{8}\,|\downarrow\rangle$$

while announcing the same pair, this probability would be reduced to 14.64 %. Therefore, the database can bias the conclusiveness of the bit and hence obtain information about the bit locations where the user has conclusive results. User privacy is therefore invaded.

More seriously, wrong bit values can be produced. In the above instances, since both |←⟩ and |↓⟩ can be measured with nonzero probability, the user can derive distinct values of b_i depending on the outcome of his/her measurement (b_i = 1 for |↓⟩ and b_i = 0 for |←⟩, respectively), given that the announced pair is {|↑⟩, |→⟩}.
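The forging attack above can be checked numerically. The following is a minimal sketch, not part of the original protocol description: it assumes ideal projective measurements, a user who picks each basis with probability 1/2, and the convention that |→⟩ and |←⟩ are the plus and minus superpositions of |↑⟩ and |↓⟩. The function names are hypothetical.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)

def outcome_probabilities(theta):
    """Measurement probabilities for the state cos(theta)|up> + sin(theta)|down>."""
    a, b = math.cos(theta), math.sin(theta)
    return {
        "up": a * a,
        "down": b * b,
        "right": (SQRT_HALF * (a + b)) ** 2,  # |right> = (|up> + |down>)/sqrt(2)
        "left": (SQRT_HALF * (a - b)) ** 2,   # |left>  = (|up> - |down>)/sqrt(2)
    }

def conclusive_probability(theta):
    """Chance of a conclusive result when the pair {|up>, |right>} is announced.

    Assumption: the user picks each basis with probability 1/2. Only |down>
    (which rules out |up>) and |left> (which rules out |right>) are conclusive.
    """
    p = outcome_probabilities(theta)
    return 0.5 * p["down"] + 0.5 * p["left"]

honest = conclusive_probability(0.0)               # |up> really sent
raised = conclusive_probability(13 * math.pi / 8)  # forged state with angle 13*pi/8
lowered = conclusive_probability(math.pi / 8)      # forged state with angle pi/8
print(f"honest {honest:.4f}, raised {raised:.4f}, lowered {lowered:.4f}")
```

Under these assumptions the three values come out near 1/4, 85.36 % and 14.64 %, matching the figures quoted in the text.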

Table 1 Data for bit learning in the PDQ protocol

Case  State delivered   Pair announced    Outcome state of the user
      by the database   by the database   and his/her derivation
1     |→⟩               |→⟩, |↑⟩          |→⟩, |↑⟩: X;  |↓⟩: b_i = 1
2     |→⟩               |→⟩, |↓⟩          |→⟩, |↓⟩: X;  |↑⟩: b_i = 1
3     |←⟩               |←⟩, |↑⟩          |←⟩, |↑⟩: X;  |↓⟩: b_i = 1
4     |←⟩               |←⟩, |↓⟩          |←⟩, |↓⟩: X;  |↑⟩: b_i = 1
5     |↑⟩               |↑⟩, |→⟩          |↑⟩, |→⟩: X;  |←⟩: b_i = 0
6     |↑⟩               |↑⟩, |←⟩          |↑⟩, |←⟩: X;  |→⟩: b_i = 0
7     |↓⟩               |↓⟩, |→⟩          |↓⟩, |→⟩: X;  |←⟩: b_i = 0
8     |↓⟩               |↓⟩, |←⟩          |↓⟩, |←⟩: X;  |→⟩: b_i = 0


Table 2 Data for bit learning in Yang's protocol

Case  State initially sent  State sent back  Measurement outcome of the database (announced bit)
      by the database       by the user      and the user's derivation
1     |→⟩                   |↑⟩              |↑⟩, |→⟩ (0): X;  |←⟩ (1): b_i = 1
2     |→⟩                   |↓⟩              |↓⟩, |←⟩ (1): X;  |→⟩ (0): b_i = 1
3     |→⟩                   |→⟩              |↑⟩, |→⟩ (0): X;  |↓⟩ (1): b_i = 0

2.2 The bit-learning process of Yang's protocol

The bit-learning module of Yang's protocol is a variant of the SARG04 protocol which reduces the cost of quantum bit resources in the transmission of the raw key. In the process, the database only needs to send the same state |→⟩ initially. On receiving the state, the user either reflects it back directly or measures it in the ↕ basis and sends back the outcome state. Based on the value of b_i, the database measures the returned state in either the ↔ basis (b_i = 1) or the ↕ basis (b_i = 0) and announces 0 or 1, corresponding to the outcome states |↑⟩, |→⟩ and |↓⟩, |←⟩, respectively. At the end, the user can derive the value of b_i based on the announcement of the database and the state he/she has sent back, as shown in Table 2.

Compared to the PDQ protocol, it is easier for the database to bias the conclusiveness of the bit in this protocol, simply by sending the faked state |↑⟩ instead of |→⟩. For such a state, whatever operation he/she applies to it, the user necessarily holds |↑⟩ afterwards. Then, in the following step, he/she sends |↑⟩ to the database, while believing that |→⟩ is reflected back if he/she chose to leave the received state untouched. The data are the same as those of cases 1 and 3 in Table 2. Observe that in both cases, the user interprets the result as conclusive when 1 is announced. Therefore, as long as the database sends |↑⟩ and announces the bit 1, the user derives a conclusive result (b_i = 1 if he/she chose to measure the state and b_i = 0 if he/she just reflected it back). Although it cannot know the value derived by the user, the database can manipulate the conclusiveness of the bit arbitrarily. That means it can obtain the exact locations of the bits known to the user and hence can learn the exact value of j in the subsequent process. This is even worse than in the PDQ protocol, where a dishonest database can only learn partial information about j via data forging. In addition, the user is totally unaware of the cheat, which inevitably ruins his/her privacy.
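The attack on Yang's protocol described above can be traced with a small lookup. This is only an illustrative sketch: the derivation rule is read off the conclusive entries of Table 2, and the names `DERIVATION`, `derive` and `attacked_round` are hypothetical.

```python
# User's derivation rule in Yang's protocol, read off Table 2:
# (state the user believes he/she sent back, announced bit) -> derived b_i.
DERIVATION = {
    ("up", 1): 1,
    ("down", 0): 1,
    ("right", 1): 0,
}

def derive(believed_state, announced_bit):
    """Return the derived bit, or "X" when the result is inconclusive."""
    return DERIVATION.get((believed_state, announced_bit), "X")

def attacked_round(user_measures):
    """One round in which the database forges |up> and announces 1.

    If the user measures in the up/down basis, the outcome is certainly |up>.
    If the user reflects the state, he/she believes |right> was returned.
    """
    believed = "up" if user_measures else "right"
    return derive(believed, announced_bit=1)

results = {choice: attacked_round(choice) for choice in (True, False)}
print(results)  # {True: 1, False: 0}
```

Both choices of the user end in a conclusive value, which is why the database learns that the position is conclusive without learning the derived value itself.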

3 The proposed private query protocol

In this section, we give our solution to the SPIR problem, constructed from a variation of the SARG04 protocol which prevents the database from stealing information about j via data forging and hence helps the user protect his/her privacy in a SPIR application. The whole protocol consists of three modules. The bit-learning module, which is a variation of the SARG04 protocol, enables the user to learn a portion of a raw key owned by the database without revealing which bits of the raw key he/she has learnt. The second module performs a post-processing function, which creates an N-bit key K from the raw key, with (ideally) one bit left known to the user. The last module encrypts/decrypts data items between the parties by using the key.

Table 3 Data for bit learning in the proposed protocol

Case  State initially sent  Measurement outcome of the database (announced bit)
      by the user           and the user's derivation
1     |↑⟩                   |↑⟩, |→⟩ (0): X;  |←⟩ (1): b_i = 1
2     |↓⟩                   |↓⟩, |←⟩ (1): X;  |→⟩ (0): b_i = 1
3     |→⟩                   |→⟩, |↑⟩ (0): X;  |↓⟩ (1): b_i = 0
4     |←⟩                   |←⟩, |↓⟩ (1): X;  |↑⟩ (0): b_i = 0

Suppose that the database owns a raw key b_1 b_2 ... b_{kN} at the beginning of the protocol, with the integer k as a security parameter. The user tries to learn the bit b_i (i ∈ [1, kN]) in an oblivious way at the ith round of the communication. The scheme is given as follows, and the data involved in the bit-learning process are presented in Table 3:

1. Initially, i = 1. Steps (a), (b), (c) iterate over i from 1 to kN.
   (a) The user sends one of the four states |↑⟩, |↓⟩, |←⟩, |→⟩, chosen at random, to the database.
   (b) The database measures the state in the ↔ (↕) basis if b_i = 1 (0) and announces 0 (1) if the outcome state is either of |↑⟩, |→⟩ (|↓⟩, |←⟩).
   (c) The user derives b_i as

   $$
   b_i = \begin{cases}
   0, & \text{if } |\leftarrow\rangle \text{ was sent and 0 was announced, or } |\rightarrow\rangle \text{ was sent and 1 was announced,} \\
   1, & \text{if } |\downarrow\rangle \text{ was sent and 0 was announced, or } |\uparrow\rangle \text{ was sent and 1 was announced,} \\
   \text{unknown}, & \text{otherwise.}
   \end{cases}
   $$

2. The user randomly selects a portion of the locations where he/she has conclusive results and asks the database to announce the outcome states of its measurements. If he/she finds the database cheating (discussed in detail in Sect. 4.2), the user immediately terminates the protocol.

3. Now, b_1 b_2 ... b_{kN} becomes b_1 b_2 ... b_{k'N} (k' < k, and k' is not necessarily an integer), since some bits have been selected out for honesty checking in step 2. Then, both b_1 b_2 ... b_{k'N} and the string held by the user are cut into k' substrings, which are added bitwise to create a key K with roughly one bit known to the user.

4. If the user is left with no known bit after step 3, the protocol has to be restarted.

5. Supposing that he/she knows the ith bit K_i, the user announces the number s = j − i to the database.

6. The database bitwise adds K to the item string, shifted by s. The user decodes A_j by using i and K_i after he/she has received the encrypted item string from the database.
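The bit-learning rounds of step 1 can be simulated classically for honest parties. The sketch below is illustrative only: it assumes a noiseless channel and ideal projective measurements (an outcome is deterministic in the preparation basis and uniformly random in the other), and all names are hypothetical.

```python
import random

BASIS = {"up": 0, "down": 0, "right": 1, "left": 1}  # 0: up/down basis, 1: left/right basis
ANNOUNCE = {"up": 0, "right": 0, "down": 1, "left": 1}
# Step (c): (state sent, announced bit) -> conclusive value of b_i
DERIVE = {("left", 0): 0, ("right", 1): 0, ("down", 0): 1, ("up", 1): 1}

def measure(state, basis, rng):
    """Ideal projective measurement of one of the four states in the given basis."""
    if BASIS[state] == basis:
        return state                       # same basis: outcome is deterministic
    candidates = [s for s, b in BASIS.items() if b == basis]
    return rng.choice(candidates)          # other basis: 50/50 outcome

def bit_learning(raw_key, rng):
    """Run step 1 for every raw-key bit; None marks an inconclusive position."""
    learnt = []
    for b in raw_key:
        sent = rng.choice(list(BASIS))     # (a) user sends a random state
        outcome = measure(sent, b, rng)    # (b) database measures, basis set by b_i
        announced = ANNOUNCE[outcome]
        learnt.append(DERIVE.get((sent, announced)))  # (c) user derives b_i
    return learnt

rng = random.Random(2015)
raw_key = [rng.randrange(2) for _ in range(20000)]
learnt = bit_learning(raw_key, rng)
known = [(b, g) for b, g in zip(raw_key, learnt) if g is not None]
print(len(known) / len(raw_key))           # fraction of conclusive positions
print(all(b == g for b, g in known))       # conclusive bits agree with the raw key
```

In this idealized model the conclusive fraction should be close to 1/4, and every conclusive bit agrees with the database's raw key.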

Steps 3, 4, 5, 6 involve the functions of both the post-processing module and the encrypt–decrypt module; for details, refer to steps 5, 6, 7 of the protocol in [9]. In the post-processing step 3, more than one bit might be left known to the user. From the procedure, we can see that an honest user will get the right answer to his/her query from an honest database.

In order to complete a single query, the user needs to learn kN bits from the database in an oblivious way, with one qubit and one bit of information delivered altogether per round of bit learning. In the canonical PDQ protocol [9], k needs to increase logarithmically with N to ensure that the number of bits known to the user remains constant. So its communication complexity is O(N log N). By using the post-processing technique given in [13], the required communication cost is reduced to O(N). The technique can be applied to our scheme, which hence has O(N) communication complexity, the same as that of the PDQ protocol.

The proposed protocol is more efficient in communication than the existing SARG04-based protocols. Compared to the PDQ protocol, which transmits one qubit and two qubits of information per round of bit learning, three bits of information are saved according to the theory of quantum teleportation, which says that two classical bits of information are essential to transmit one qubit. Compared to Yang's protocol, which has to deliver two qubits and one bit of information for the same task, one qubit is saved since the database is no longer in charge of passing the qubit on to the user.
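Returning to steps 3 to 6, the post-processing and the encrypt–decrypt exchange can be sketched on a toy instance. The sketch rests on assumptions that the text does not spell out: the shift s = j − i is taken modulo N, leftover raw-key bits that do not fill a complete substring are ignored, and the helper names `fold_key` and `encrypt` are hypothetical.

```python
from functools import reduce

def fold_key(bits, n):
    """Cut a bit list into substrings of length n and add them bitwise (mod 2).

    Entries may be None (unknown to the user); a folded bit is known only
    if every contributing bit is known.
    """
    rows = [bits[r:r + n] for r in range(0, len(bits) - n + 1, n)]

    def add(x, y):
        return [None if a is None or b is None else a ^ b for a, b in zip(x, y)]

    return reduce(add, rows)

def encrypt(items, key, shift):
    """Database side: item m is masked with key bit (m - shift) mod N."""
    n = len(items)
    return [items[m] ^ key[(m - shift) % n] for m in range(n)]

# Toy run with N = 4 items and k' = 2 (values are illustrative only).
items = [1, 0, 1, 1]
raw_db = [0, 1, 1, 0, 1, 1, 0, 0]
raw_user = [None, 1, None, 0, None, 1, None, None]  # user's partial view

key_db = fold_key(raw_db, 4)       # [1, 0, 1, 0]
key_user = fold_key(raw_user, 4)   # [None, 0, None, None]

j = 2                               # index of the item the user wants
i = next(p for p, v in enumerate(key_user) if v is not None)
s = (j - i) % len(items)            # step 5: announced shift
cipher = encrypt(items, key_db, s)  # step 6: encrypted item string
print(cipher[j] ^ key_user[i] == items[j])  # True
```

The folding shows why the user's knowledge shrinks: a bit of K is known only when all k' contributing raw-key bits were conclusive.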

4 Security of the protocol

In this section, we analyze the security of our protocol from the aspect of privacy protection for both parties. Firstly, the scheme preserves the privacy of honest parties. User privacy is preserved because the database can infer no information about j from the uniformly distributed state it receives. Data privacy is preserved depending on the effectiveness of the post-processing routine [9,13]. Ideally, it reduces the number of bits known to the user to roughly one single bit.

On the other hand, either party may act illegally in order to gain extra information about the other party. In a SPIR scheme which uses an oblivious key to deliver data, the user may invade data privacy by trying to learn more bits of the key, and the database may invade user privacy by trying to get the locations of the bits known to the user. A scheme is considered cheat sensitive if it is capable of detecting such cheating activities. In the following two subsections, we analyze the security of our protocol with respect to both dishonest parties, especially dishonest databases.

4.1 How the protocol protects database privacy?

In the protocol, a dishonest user has various ways to invade data privacy. However, as long as he/she tries to gain some information (by learning the value of b_i with a probability higher than 1/4), the user would get wrong bit values, which would bring him/her wrong answers to his/her queries. In other words, the cheating activities of the user badly affect the correctness of the answers to his/her queries. We explain this in the following paragraphs.

Generally, to invade data privacy the user may send a state other than the mutually agreed ones, which has the general form cos θ|↑⟩ + sin θ|↓⟩ (θ ∈ [0, 2π)). For such a state, the database may measure |↑⟩, |↓⟩ (in the ↕ basis) or |→⟩, |←⟩ (in the ↔ basis) with the probabilities

$$p_\uparrow = \cos^2\theta,\quad p_\downarrow = \sin^2\theta,\quad p_\rightarrow = \tfrac{1}{2}(1+\sin 2\theta),\quad p_\leftarrow = \tfrac{1}{2}(1-\sin 2\theta),$$

respectively. In order to learn the value of b_i with a probability higher than 1/4, the user would like to bias the outcome of the measurement. The approach is to adjust the value of θ. For instance, he/she can set θ ∈ [π/8, π/2], which makes both p_→ ≥ p_↑ and p_↓ ≥ p_← true simultaneously. Then, if the bit 0 is announced, which implies that either |→⟩ or |↑⟩ was measured, the ↔ basis is inferred because |→⟩ is more likely to have been measured than |↑⟩. Similarly, if the bit 1 is announced, which implies that either |←⟩ or |↓⟩ was measured, the ↕ basis is inferred because |↓⟩ is more likely to have been measured than |←⟩. Next, the user derives b_i correspondingly. Note that the inference is nondeterministic, so it inevitably induces errors whenever the outcome state of the measurement is actually the other one. In the above instance, an error occurs when |↑⟩ (|←⟩) was measured in the case of 0 (1) being announced. A direct and simple calculation shows that p_↑ and p_← cannot be zero simultaneously. That is to say, for at least one announced bit value, errors are unavoidable for a large N.

Error bits could produce a key for the user that is inconsistent with that of the database in the subsequent post-processing step and bring him/her a wrong answer to his/her query in the final encryption–decryption step. This is certainly unacceptable to the user. In fact, an announcement of just one single bit cannot tell the user both the choice of basis and the outcome of the measurement, which takes at least two bits to convey. So, thanks to the impossibility of superluminal communication in quantum physics, whenever he/she tries to gain some information about the basis, the user loses some about the measurement outcome of the database. In other words, the more bit locations he/she takes as conclusive, the more bit values are wrong. Therefore, we can assume that in our protocol the user would not cheat to learn extra items at the cost of getting wrong answers to his/her queries.
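The trade-off faced by a cheating user can be made concrete with a short numerical scan. This is a sketch under an added assumption that b_i is uniformly random, so that each basis is used by the database with probability 1/2; the function names are hypothetical. The error rate is that of the guessing rule described above (guess the ↔ basis on announcement 0 and the ↕ basis on announcement 1).

```python
import math

def outcome_probabilities(theta):
    """p_up, p_down (up/down basis) and p_right, p_left (left/right basis)."""
    return {
        "up": math.cos(theta) ** 2,
        "down": math.sin(theta) ** 2,
        "right": 0.5 * (1 + math.sin(2 * theta)),
        "left": 0.5 * (1 - math.sin(2 * theta)),
    }

def guess_error(theta):
    """Error rate of the guessing rule for the state with angle theta.

    The guess is wrong when |up> (announced 0) or |left> (announced 1)
    was actually measured. Assumes b_i is uniformly random.
    """
    p = outcome_probabilities(theta)
    return 0.5 * p["up"] + 0.5 * p["left"]

# Scan theta over [pi/8, pi/2] on a uniform grid.
thetas = [math.pi / 8 + t * (3 * math.pi / 8) / 1000 for t in range(1001)]
best = min(thetas, key=guess_error)
print(f"smallest error {guess_error(best):.4f} at theta = {best / math.pi:.3f} pi")
```

Under this assumption the error rate simplifies to (2 + cos 2θ − sin 2θ)/4, which never drops below (2 − √2)/4, so a user who treats every position as conclusive accepts a nonzero fraction of wrong bits.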

4.2 How the protocol protects user privacy?

In the beginning of this section, we stated that the database may invade user privacy by trying to get the locations of the bits known to the user. One approach is to discriminate the received state directly. However, this is useless, as the four candidates are uniformly distributed and states from different bases overlap with probability one half. Another approach that may bring the database information about the bit locations known to the user is to alter the announcements so as to bias the conclusiveness of the bits. Table 4 illustrates how this proceeds when the database has measured the state |←⟩. As can be seen from Table 3, there are three possible original states on which the database could measure |←⟩. They are |↑⟩, |↓⟩, |←⟩, which are enumerated as cases 1, 2, 3 in Table 4. It can also be seen that, given that |←⟩ was measured, the probabilities of them having been sent are 1/4, 1/4, 1/2, respectively.

Table 4 Data in the cheat when |←⟩ was measured

Case  Original state  Probability  User's derivation of b_i  User's derivation of b_i
                                   when 1 was announced      when 0 was announced
1     |↑⟩             1/4          1                         X
2     |↓⟩             1/4          X                         1
3     |←⟩             1/2          X                         0

If the database announces the bit 1 honestly in the subsequent step, the result is interpreted as conclusive with probability 1/4 (case 1 in Table 4). However, if it alters the announced bit to 0 instead, the probability is raised to 3/4 (cases 2 and 3 in Table 4). That means the database can bias the conclusiveness of the result by choosing to announce the right bit 1 or the reversed bit 0 when it has measured |←⟩. This strategy applies to the other outcomes by symmetry. So the database could invade user privacy in such an easy manner as long as it ran no risk of being detected.

Our protocol, however, provides the user with means to resist such attacks using faked data. In step 2 of the proposed protocol, the user randomly selects a portion of the locations where he/she has conclusive results and asks the database to announce the outcome states of its measurements. In the above instance, the database would announce |←⟩ if it were honest. However, if it has announced the reversed bit 0, it must now select one of |→⟩, |↑⟩, which are represented by 0, to announce. The database evades detection if it happens to make the right choice (|→⟩ for the original state |↓⟩, or |↑⟩ for the original state |←⟩). But whenever it makes the wrong choice, the database is caught cheating. This occurs inevitably for a sufficient number of sample bits. Therefore, the database takes a high risk of being detected if it cheats. User privacy is thus well preserved in the protocol via the honesty checking.
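The numbers in this argument can be reproduced with exact fractions. The sketch below is illustrative and goes slightly beyond the text: the escape probability is our own estimate, assuming that the database claims a fixed outcome state at every tested position and that positions are independent. All names are hypothetical.

```python
from fractions import Fraction

# Posterior over the state the user sent, given that the database measured
# |left> in the left/right basis (values from Table 4).
POSTERIOR = {"up": Fraction(1, 4), "down": Fraction(1, 4), "left": Fraction(1, 2)}
# Step (c): (state sent, announced bit) -> conclusive value of b_i
DERIVE = {("left", 0): 0, ("right", 1): 0, ("down", 0): 1, ("up", 1): 1}
# Outcome state the user infers from a conclusive (state sent, announced bit) pair
INFERRED_OUTCOME = {("left", 0): "up", ("right", 1): "down",
                    ("down", 0): "right", ("up", 1): "left"}

def conclusive_probability(announced):
    """Chance that the user takes the position as conclusive."""
    return sum(p for s, p in POSTERIOR.items() if (s, announced) in DERIVE)

def escape_probability(claimed_outcome, announced=0):
    """Chance that one tested, cheated position passes the honesty test."""
    conclusive = conclusive_probability(announced)
    passing = sum(p for s, p in POSTERIOR.items()
                  if INFERRED_OUTCOME.get((s, announced)) == claimed_outcome)
    return passing / conclusive

honest = conclusive_probability(1)     # 1/4
cheating = conclusive_probability(0)   # 3/4
best_escape = max(escape_probability(c) for c in ("up", "right"))  # 2/3
print(honest, cheating, best_escape, float(best_escape ** 20))
```

Under these assumptions the best fixed claim passes a single tested position with probability 2/3, so the chance of passing 20 tested cheated positions is already well below one in a thousand.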

5 Conclusions

In this paper, we have proposed a quantum PDQ protocol which uses a variant of the well-known SARG04 protocol to generate a raw oblivious key. It tries to eliminate the potential security risk of existing SARG04-based PDQ protocols by enhancing the privacy of both parties. In this new protocol, the database is no longer in charge of preparing the initial workspace in the bit-learning process. The user decides what message to send to the database at the beginning and derives the bit value from the response of the database at the end. If he/she has a conclusive derivation, the user can selectively ask the database to undergo a test of its honesty and identifies it as dishonest if it does not pass the test. Therefore, the protocol is cheat sensitive against dishonest databases. In addition, since a cheating user might sacrifice the correctness of the answer to his/her query, it can be assumed that the user would not cheat to gain extra items in our protocol.

By using the post-processing technique proposed in [13], the protocol has O(N) communication complexity, which is of the same order as that of most classical SPIR and QSPIR protocols. Moreover, it is more efficient in communication than the existing SARG04-based protocols: per round of bit learning, it saves three bits of information compared to the PDQ protocol [9] and one qubit compared to Yang's protocol [11].

Acknowledgments This work was supported in part by the National Natural Science Foundation (Grant Nos. 61272058, 61070164, 61272415, 61472452), the Natural Science Foundation of Guangdong Province of China (Grant Nos. 2014A030310265, S2012010008767), and the Science and Technology Planning Project of Guangdong Province of China (Grant No. 2013B010401015).

References

1. Gertner, Y., Ishai, Y., Kushilevitz, E., Malkin, T.: Protecting data privacy in private information retrieval schemes. J. Comput. Syst. Sci. 60(3), 592–629 (2000). Earlier version in STOC 98
2. Giovannetti, V., Lloyd, S., Maccone, L.: Quantum private queries. Phys. Rev. Lett. 100, 230502 (2008)
3. Olejnik, L.: Secure quantum private information retrieval using phase-encoded queries. Phys. Rev. A 84, 022313 (2011)
4. Yu, F., Qiu, D.W.: Coding-based quantum private database query using entanglement. Quantum Inf. Comput. 14(1&2), 0091–0106 (2014)
5. Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private information retrieval. In: Proceedings of the 36th IEEE Symposium on Foundations of Computer Science, pp. 41–50 (1995). Also in Journal of the ACM, vol. 45 (1998)
6. Beimel, A., Ishai, Y., Kushilevitz, E., Raymond, J.: Breaking the O(n^{1/(2k−1)}) barrier for information-theoretic private information retrieval. In: Proceedings of 43rd IEEE FOCS, pp. 261–270 (2002)
7. Ambainis, A.: Upper bound on the communication complexity of private information retrieval. In: 24th ICALP, LNCS 1256, pp. 401–407 (1997)
8. Hogg, T., Zhang, L.: Private database queries using quantum states with limited coherence times. Int. J. Quantum Inf. 7(02), 459–474 (2009)
9. Jakobi, M., et al.: Practical private database queries based on a quantum-key-distribution protocol. Phys. Rev. A 83, 022301 (2011)
10. Zhang, J.L., Guo, F.Z., Gao, F., Liu, B., Wen, Q.Y.: Private database queries based on counterfactual quantum key distribution. Phys. Rev. A 88(2), 022334 (2013)
11. Yang, Y.G., Zhang, M.O., Yang, R.: Private database queries using one quantum state. Quantum Inf. Process. (2014). doi:10.1007/s11128-014-0902-z
12. Scarani, V., Acin, A., Ribordy, G., et al.: Quantum cryptography protocols robust against photon number splitting attacks for weak laser pulse implementations. Phys. Rev. Lett. 92(5), 057901 (2004)
13. Rao, M.V.P., Jakobi, M.: Towards communication-efficient quantum oblivious key distribution. Phys. Rev. A 87(1), 012331 (2013)
