Cryptanalysis of An Encrypted in SIGMOD ’14 Xinle Cao Jian Liu∗ Zhejiang University Zhejiang University Hangzhou, China Hangzhou, China [email protected] [email protected] Hao Lu Kui Ren Zhejiang University Zhejiang University Hangzhou, China Hangzhou, China [email protected] [email protected]

ABSTRACT provides elasticity to DO: they can scale their service consumption Encrypted database is an innovative technology proposed to solve up or down according to their real requirements. the data confdentiality issue in cloud-based DB systems. It allows a On the other hand, this service paradigm brings data confden- data owner to encrypt its database before uploading it to the service tiality issue to DO, as the database stored in SP as well as the queries provider; and it allows the service provider to execute SQL queries sent by DO may contain sensitive information. A hacker can exploit over the encrypted data. Most of existing encrypted (e.g., software vulnerabilities to break into the server and snoop on the CryptDB in SOSP ’11) do not support data interoperability: unable data [2] and curious administrators of SP can steal data they are to process complex queries that require piping the output of one interested in [1]. One approach to prevent this potential informa- operation to another. tion leakage is to encrypt sensitive data before storing it at SP, e.g., To the best of our knowledge, SDB (SIGMOD ’14) is the only en- Depot [18], SUNDR [17] and SPORC [9]. To process queries, the crypted database that achieves data interoperability. Unfortunately, encrypted data has to be shipped back to DO and processed locally. we found SDB is not secure! In this paper, we revisit the security of Unfortunately, for some operations, this approach incurs a huge communication overhead that is in the order of the database size, SDB and propose a ciphertext-only attack named co-prime attack. It successfully attacks the common operations supported by SDB, hence it is even worse than storing the database locally. To this end, Popa et al. [22] propose the frst including addition, comparison, sum, equi-join and group-by. We encrypted database evaluate our attack in three real-world benchmarks. For columns called CryptDB allowing SP to execute SQL queries directly over encrypted data. The core idea of CryptDB is to encrypt each data that support addition and comparison, we recover 84.9% − 99.9% item in one or more : diferent onions enable diferent kinds plaintexts. For columns that support sum, equi-join and group-by, onions we recover 100% plaintexts. of operations; within each onion, an item is dressed in layers of Besides, we provide potential countermeasures that can prevent increasingly stronger encryption. For example, it uses homomorphic (HE) [21] for addition in one onion, and uses the attacks against sum, equi-join, group-by and addition. It is still encryption order- (OPE) [3, 6] for comparison in another onion. an open problem to prevent the attack against comparison. preserving encryption However, CryptDB is unable to process complex queries that require piping the output of one operation (onion) to another (i.e., data PVLDB Reference Format: ). For example, the selection clause “ Xinle Cao, Jian Liu, Hao Lu, and Kui Ren. Cryptanalysis of An Encrypted interoperability quantity × unit- > $10, 000” that requires both multiplication and comparison Database in SIGMOD ’14. PVLDB, 14(10): 1743 - 1755, 2021. price at the same time cannot be processed by CryptDB. doi:10.14778/3467861.3467865 SDB in SIGMOD ’14. To achieve data interoperability, in SIGMOD 1 INTRODUCTION ’14, Wong et al. presented an encrypted database named SDB [26]. It uses a special kind of multiplicatively homomorphic encryption Database-as-a-service (DBaaS) is a prevalent cloud-service para- scheme to encrypt each data item: when ciphertexts under diferent digm allowing a data owner ( ) to outsource its database to a DO keys being multiplied with each other, it generates a new key: service provider (SP) that possesses high-performance machines �(�3, �1 × �2) ← �(�1, �1) × �(�2, �2) and sophisticated database software. DO can query the database as if it was stored locally. SP thus provides storage, computation and Besides, it is additively homomorphic only for ciphertexts under administration services to DO. Most importantly, this paradigm the same key: �(�, �1 + �2) ← �(�, �1) + �(�, �2) ∗Jian Liu is the corresponding author. When a database is stored, all data items are encrypted under dif- This work is licensed under the Creative Commons BY-NC-ND 4.0 International ferent keys. When DO wants to add two columns, SDB provides a License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of operation, which enables (with the assistance of ) this license. For any use beyond those covered by this license, obtain permission by KeyUpdate SP DO emailing [email protected]. Copyright is held by the owner/author(s). Publication rights to update the items in the same row of these two columns to be licensed to the VLDB Endowment. under the same key, so that addition can be done. The KeyUpdate Proceedings of the VLDB Endowment, Vol. 14, No. 10 ISSN 2150-8097. doi:10.14778/3467861.3467865 operation also allows SP and DO to decrypt a whole column with constant communication overhead.

1743 To do comparison, e.g., �1 − �2 > 0, SP (with the assistance of a column for 4 times, we can recover at least 90% plaintexts DO) frst updates �(�1, �1) and �(�2, �2) to be under the same key of this column. �3, and computes �(�3, �1 − �2). Then, SP computes We summarize our contribution as follows: �(�5,�(�1 − �2)) ← �(�3, �1 − �2) × �(�4,�), (1) We revisit SDB (SIGMOD ’14) and make four observations, where �(�4,�) was previously uploaded to SP by DO, and � is a which incur serious information leakage but was not men- small random number that will not change the sign of (�1 − �2). In tioned in their paper (Section 2). the end, SP and DO decrypt the whole column using KeyUpdate so (2) We propose a ciphertext-only attack (named co-prime attack) that SP can return the rows that satisfy �(�1 − �2) > 0. against the addition, sum, comparison, equi-join and group-by To sum a column of � items �(�1, �1), ..., �(��, ��), SP (with the operations in SDB (Section 4). 1 1 assistance of DO) updates these encrypted items to�− �1, ...,�− ��, (3) We validate our attack on three public benchmarks, UCI � 1 Credit Card Clients, TPC-C and TPC-H (Section 5). The ex- where � is a new random number. Then, SP returns �− Í �� to �=1 perimental results are summarized in Table 1. DO. The equi-join and group-by operations can be realized in the (4) We provide potential countermeasures that can prevent the same way as sum. attacks against sum, equi-join, group-by and addition (Sec- Our contribution. In this paper, we revisit SDB and make the tion 6). It is still an open problem to prevent the attacks following observations: against comparison. (1) It cannot encrypt 0s; (2) The ciphertexts for addition are deterministic: �(�, �1) = Table 1: Summary of our experimental results. The recovery �(�, �2) if �1 = �2; rates are for the columns that are executed with the opera- (3) If the ciphertexts �(�1, �1), ..., �(�� , �� ) have been updated tions. More details are in Section 5. to be under the same key �0, then these ciphertexts can be converted back to be under any key within {�0, �1, ..., �� }. Operation Benchmark Recovery (%) Requirements (4) The decryption procedure in comparison not only applies to the diference (i.e., �1 − �2), but also applies to the items Credit 97.3 1 addition query Addition being compared (i.e., �1 and �2). TPC-C 84.9 4 update queries Based on these observations, we present a ciphertext-only attack Credit 99.8 against fve operations in SDB: addition, sum, comparison, equi-join Comparison TPC-C 99.8 10 range queries and group-by. We name it co-prime attack, because its success rate highly depends on the theorem [16, 20]: TPC-H 99.9 Credit 100 given � random positive integers in Z� , the probability for them to 1 � 1 , Sum TPC-C 100 1 sum query be co-prime is � (�) + ( |� | ) TPC-H 100 +∞ Í 1 where � refers to the Riemann � -function and � (�) = �� . This Credit 100 �=1 probability is close to 92.4% when � = 4, and close to 99.9% when Equi-Join TPC-C 100 1 equi-join query � = 10. TPC-H 100 Below, we briefy explain the co-prime attack against addition, Credit 100 sum, equi-join, group-by and comparison: Group-by TPC-C 100 1 group-by query To attack , calculates � �, � : � �, � • Addition. addition SP ( 1) ( 2) = TPC-H 100 �1 : �2; if �1 and �2 are co-prime, then �1 = �1 and �2 = �2. With more items being added together, the co-prime proba- bility of these elements increases. Our experimental results show that we can recover at least 90% plaintexts with 7 2 SDB REVISIT columns added together (cf. Table 6). In this section, we revisit the security of SDB. We frst provide • In the operation, Sum, equi-join and group-by. sum SP technical details of SDB and summarize them into interfaces, so that gets �−1� , ...,�−1� . Similar to , can get ratio 1 � addition SP we can use these interfaces to explain our attack. Then, we make among all elements in this column. In our experiments, with some observations, which result in serious information leakage but the number of rows being more than 100, we can recover all was not mentioned in their paper. plaintexts of this column with almost 100% probability. This attack applies to equi-join and group-by operations as well. 2.1 SDB in detail • Comparison. Recall that after comparison between �1 and SDB was proposed in SIGMOD ’14, by Wong et al. [26], to address �2, �(�1 −�2) is revealed to SP. Based on the 4-th observation the data interoperability issue in encrypted databases. It mainly aforementioned, both ��1 and ��2 will be revealed to SP as well. If �1 has been compared for twice, ��1 and �′�1 will be consists of the following operations: revealed. If � and �′ are co-prime, �1 will be revealed. In our Setup. DO generates two big random prime numbers �1 and �2; experiments, if the comparison operation has been applied to calculates � = �1�2 and � (� ) = (�1 − 1)(�2 − 1); and samples

1744 a generator � ∈ Z� for � (� ) order group. For each column, DO Comparison. The comparison operation frequently appears in generates a column key �� = ⟨�, �⟩, where � ∈ Z� and � ∈ Z� (� ) ; range queries with a selection clause. Given two encrypted columns and for each row, DO generates a row key � ∈ Z� (� ) . Then, the �1 and �2, the object is to compare the items (�1 and �2) in each encryption key for each data item is � = ⟨⟨�, �⟩, �⟩. row and return the rows that meet the condition. Notice that the � (�,�, �1, �2,...) ← Setup(1 ) comparison results should be visible to SP so that it can decide which rows to return. takes the encryption key � �, � , � and a Encryption. DO = ⟨⟨ ⟩ ⟩ To compare two values � � , � and � � , � ) in the same row, data item �, computes the ciphertext ( 1 1) ( 2 2) SP (with the assistance of DO) frst updates them to be under the �� −1 � := (�� ) · � mod � same key �3 and computes �(�3, �1 − �2). Then, SP computes . �(�5,�(�1 − �2)) ← �(�3, �1 − �2) × �(�4,�), � ← �(�, �) where �(�4,�) was previously uploaded to SP by DO, and � is a small random number that will not change the sign of (�1 − �2). In Multiplication. To multiply two columns, SP can directly multiply the two data items in the same row: the end, SP (with the assistance of DO) updates the column key of �(�5,�(�1 −�2)) to ⟨1, 0⟩, which gives �(�1 −�2) to SP. Then, SP can ��3 −1 ��1 −1 ��2 −1 (�3� ) �1�2 := (�1� ) �1 ·(�2� ) �2 mod � . decide the truth value of the comparison by comparing �(�1 − �2) with � . Notice that �3 = ⟨⟨�3, �3⟩ , �⟩ is the new-generated encryption key 2 for the product , where mod and �1�2 �3 = �1 ·�2 � �3 = �1 + �2 �(�1 − �2) ← ���(�(�1, �1), �(�2, �2)) mod � (� ). The operation is used to combine rows from � � , � � � � , � � � , � Equi-Join. equi-join ( 3 1 2) ← ( 1 1) × ( 2 2) two or more tables, through a common column between them. To Key update. The KeyUpdate operation allows SP (with the assis- join two encrypted tables �1 and �2 in SDB, SP (with the assistance tance of DO) to update the column key of a ciphertext to a new one. of DO) updates the column keys of the corresponding columns Suppose DO wants to update the encryption key �1 = ⟨⟨�1, �1⟩ , �⟩ to �� = ⟨�, 0⟩. Then, the ciphertexts in these two columns are of a ciphertext � � , � to � � , � , � ; and there is a previ- −1 −1 −1 ′ −1 ′ ( 1 ) 2 = ⟨⟨ 2 2⟩ ⟩ � �1,� �2,... and� �1,� �2,.... It is clear that the same items ously uploaded column of ciphertexts of 1 which includes a cipher- result in the same ciphertexts so that SP can join the tables. text encrypted under �3 = ⟨⟨�3, �3⟩ , �⟩. �3 ← ����-����(�1,�2) There are two phases in KeyUpdate: The operation groups rows that have the same (1) In the frst phase, DO computes Group-by. group-by values into summary rows. To group-by items in one column, SP � �−1 � � mod � � = 3 ( 2 − 1) ( ) (with the assistance of DO) updates the column key to �� = ⟨�, 0⟩. −1 −1 � −1 Then, the ciphertexts in this column become: � �1, ...,� ��. As � = �1� � mod �, 3 2 a result, the same items result in the same ciphertexts so that SP and includes them in the query � being sent to SP. can do group-by. (2) In the second phase, SP computes ������ �����-�� � � , � , ..., � � , � � ← ( ( 1 1) ( � �)) �(�2, �) = � · �(�1, �)· �(�3, 1) mod � . Notice that � and � can be applied to the whole column of 2.2 Observations �(�1, �). Intuitively, SDB is a special kind of multiplicatively homomorphic Notice that all KeyUpdate operations are issued by DO, SP can encryption scheme: when ciphertexts under diferent keys being only passively execute these operations and never ask for additional multiplied with each other, it generates a new key; and it supports KeyUpdate operations. key update. However, it uses some tricks to support addition and comparison, which introduces some security faws. We make the �(�2, �) ← KeyUpdate(�(�1, �), �) following observations on their security faws. Addition. To add two columns, SP frst (with the assistance of DO) updates the column keys to the same one using KeyUpdate. Then, Observation 1. SDB cannot encrypt 0s. Specifcally, in SDB, the �� 1 the two items in the same row would be: �1 = (�3� 3 )− · �1 and ciphertext c = 0 if the plaintext v = 0. �� 1 �2 = (�3� 3 )− · �2. SP can add them directly: Proof. It is obvious that if � 0, then ��3 −1 ��3 −1 ��3 −1 = (�3� ) (�1 + �2) := (�3� ) �1 + (�3� ) �2 mod � . � = (���� )−1 · � mod � = 0. �(�3, �1 + �2) ← �(�1, �1) + �(�2, �2) For the opposite direction, if � 0 but � 0, then ���� −1 � Sum. To sum all items in one column, SP (with the assistance of = ≠ ( ) · a multiple of � � � , where � and � are two big primes. DO) updates the column key to �� = ⟨�, 0⟩. Then, the ciphertexts in must be = 1 2 1 2 � We prove this will never happen. −1 −1 −1 Í this column become: � �1, ...,� ��. SP simply returns � �� Notice that � is a plaintext smaller than any of � and � . Then, � 1 1 2 = �� 1 �� 1 mod � . if (�� )− · � is a multiple of � = �1�2, (�� )− must be a mul- � tiple of � . However, this is not true as (���� )−1 ∈ Z . Therefore, �−1 Í � ��� � � , � , ..., � � , � � � ← ( ( 1 1) ( � �)) ���� −1 � will never be a multiple of � , hence � must be 0. □ �=1 ( ) ·

1745 Observation 2. The ciphertexts for addition are deterministic: Clearly, these features we observed lead to serious information �(�, �1) = �(�, �2) if �1 = �2; leakage. We will explain how we exploit these leakage to develop our attacks in Section 4. Furthermore, our attacks are independent Proof. It is obvious that if �1 = �2, then of the size of the security parameters, as these observations hold �� −1 �� −1 for any size of the security parameters. (�� ) �1 mod � = (�� ) �2 mod �

For the opposite direction, if �(�, �1) = �(�, �2), then 3 ADVERSARIAL MODEL �� −1 We only assume is an honest-but-curious adversary, i.e., it will (�� ) (�1 − �2) ≡ 0 mod � SP not deviate from the protocol specifcation, but will attempt to learn Based on Observation 1, we have �1 − �2 = 0, then �1 = �2. □ all possible information from legitimately received messages (or intermediate results). We assume the adversary has access to the Observation 3. If �(�1, �1), ..., �(�� , �� ) in the same row but dif- encrypted database but cannot issue queries; the adversary has no ferent columns have been updated to be under the same key �0 via auxiliary information (e.g., application details, public statistics or KeyUpdate, then these ciphertexts can be converted back to be under prior versions) about the database; and known-plaintext attack is any key within {�0, �1, ..., �� }. not available to the adversary. In another words, ciphertext-only attack is the only option for the adversary. We remark that this Proof. As assumed in this observation, there exists a query �� adversarial model is much weaker than what is typically assumed in that can update �(��, �� ) to be under �0. attack-related literature for encrypted databases (cf. Section 8). We further emphasis that this adversarial model captures all threats that �(�0, �� ) ← KeyUpdate(�(��, �� ), �� ) database customers are typically worried about: internal threats As we describe in section 2.1, the update process is computed as: like curious database administrators or employees, and external ′ � threats like individual hackers or organized crime. �(�0, �� ) = �� · �(��, �� )· �(� , 1) � mod � We introduce new notations as needed. A summary of notations It is clear that this equation is reversible: appears in Table 2.

−1 ′ �� −1 �(��, �� ) = �� · �(�0, �� )·(�(� , 1) ) mod � Table 2: Summary of notations. To prove that �(��, �� ) can be converted to any �(��, �� ) with (� ≠ �), we only need to prove that �(�0, �� ) can be converted to Notation Description �(��, �� ) with (� ≠ �). This can be achieved by: SP service provider � � , � �−1 � � , � � � ′, 1 � � −1 mod � . ( � � ) = � · ( 0 � )·( ( ) ) DO data owner □ � plaintext space � ciphertext space Since � and � can be applied to the whole column, the following � � � a query statement also holds. � value of a data item Remark 1. If columns �1, ...,�� (each encrypted under a column � a column of values ��� ) ��0 key can be converted to be under the same column key via � ciphertext KeyUpdate, then these columns can be converted back to any column � encryption key: � �, � , � key within {��0, ��1, ..., ��� }. = ⟨⟨ ⟩ ⟩ � () encryption function Observation 4. The decryption procedure in comparison not only � prime applies to the diference, but also applies to the items being compared. Specifcally, if DO issues a query � number of co-prime integers � number of rows �(�1 − �2) ← ���(�(�1, �1), �(�2, �2)), �� column key then SP can get both ��1 and ��2. � row key

Proof. Recall that the comparison operation is executed as fol- lowing steps: 4 CO-PRIME ATTACK AGAINST SDB (1) �(�3, �1) ← KeyUpdate(�(�1, �1), �); In this section, we explain how we exploit the observations we (2) �(�3, �2) ← KeyUpdate(�(�2, �2), �); made in Section 2.2 to develop our attacks against SDB. (3) �(�3, �1 − �2) = �(�3, �1) − �(�3, �2); (4) �(�5,�(�1 − �2)) ← �(�3, �1 − �2) × �(�4,�); 4.1 Co-prime probability (5) �(�1 − �2) ← KeyUpdate(�(�5,�(�1 − �2)), �) We frst present the following theorem (proved in [16, 20]) about In Step 4, SP can use �(�3, �1) to substitute �(�3, �1 − �2), so that it co-prime probability, which is considered to be the theoretical foun- can get ��1 in the end. Similarly, it can get ��2. □ dation for our attacks.

1746 Theorem 1. Given � positive integers that are chosen uniformly Figure 1 and Figure 2 show the co-prime probability of random at random from Z� , the probability that they are co-prime is: numbers in fnite felds are high. For instance, the probability for 1 1 4 randomly chosen integers being co-prime is ∼ 92.4%; the proba- + � ( ), bility for 10 randomly chosen integers being co-prime is 99.9%. � (�) |�| ∼ The success rate of our attack highly depends on this co-prime +∞ Í 1 probability, hence we name our attack . It utilizes where � refers to the Riemann � -function and � (�) = �� . co-prime attack �=1 either of the following information that is revealed by SDB: Notice that � 1 can be ignored when � is large, then the • ( |� | ) | | Ratio among diferent plaintexts (revealed by addition co-prime probability becomes 1 . Figure 1 shows 1 for � ran- or sum/equi-join/group). Once the ratio among � plain- � (�) � (�) texts is revealed, all these � plaintexts will be revealed if dom integers. For instance, the probability for 4 randomly chosen they are co-prime. We assume, w.h.p. these � plaintexts are integers being co-prime is nearly 92.4%; the probability for 10 ran- co-prime (our experimental results confrm this assumption). domly chosen integers being co-prime is close to 99.9%. To estimate • the error introduced by ignoring � ( 1 ), we set � to 224 and 280 Products of the same plaintext with diferent coef- |� | . The coefcients are ran- 10 cients (revealed by comparison) respectively, pick 10 sets of � random numbers, and calculate the dom numbers used to mask the plaintext, thereby, w.h.p. they proportion of sets whose numbers are co-prime. Figure 2 shows are co-prime. Then, the greatest common divisor of these the diference between 1 and the experimentally calculated co- � (�) products is exactly the plaintext. prime probabilities. In the rest of this section, we explain how we exploit the legiti- mate queries to cause the above information leakage. 1 4.2 Co-prime attack against addition Algorithm 1 shows the co-prime attack against the opera- 0.9 addition tion in SDB. It takes � ciphertexts in the same row together with a query calculating the sum of the corresponding plaintexts, and 0.8 it recovers all these � plaintexts. Notice that a query typically operates on the whole columns, hence we can recover all ci-

oximate probability � 0.7 phertexts in the whole columns by running Algorithm 1 for each row separately. Appr As we described in Section 2.1, to answer a query � that adds � 0.6 items in a row, SP (with the assistance of DO) executes the following 2 4 6 8 10 two steps: Number of integers (�) (1) Receiving respective � from DO, SP executes the second phase of KeyUpdate to convert all � ciphertexts to be under the same key: Figure 1: Approximate probability for diferent number of integers being co-prime. �(�0, �� ) ← KeyUpdate(�(��, �� ), �) � ∈ [�]. (2) With all ciphertexts under the same key, SP can execute addition directly: ·10−5 6 � � 24 Õ Õ 2 �(�0, �� ) ← �(�0, �� ). 80 2 �=1 �=1 4 After the frst step above (line 1-3), SP computes the ratio between �1 and other plaintexts (line 4-12). This is possible because after KeyUpdate converts ciphertexts to be under the same key, the ci- ence phertexts become both deterministic and additively homomorphic: 2

Difer • deterministic: if � = � ′ then �(�0, �) = �(�0, � ′); • additively homomorphic: ��(�0, �) = �(�0,��), which can be considered as adding � ciphertexts together. 0 With these two properties, SP can compute the ratio between �1 and any � , if they are non-zero (cf. Observation 1), by fnding � (�) 2 4 6 8 10 � 1 and �� s.t., Number of integers (�) (�) ���(�0, �1) = �1 �(�0, �� ). (�) D (�) E Figure 2: Diference between the experimentally calculated Then �1 : �� = � : �� . can fnd � ,�� by trying each pair in 1 1 SP 1 probabilities and � (�) for diferent number of integers. 2 2 Z� , which introduces � (� ) computational complexity and � (1)

1747 �1�� �� := . Algorithm 1: Co-prime attack against addition (for 1 row) (�) �1 Input: {�(�1, �1), ..., �(��, �� )}; a query � with an operation Recall that the co-prime probability for � , ..., � is high when � is �1 + ... + �� 1 � large. Even for � = 4, the co-prime probability is still ∼ 92.4%. Output: �1, ..., �� Notice that Algorithm 1 only considers a single query that adds 1 for � = 1 → � do update to the same key all � columns. Next, we show that SP can still recover the plaintexts 2 �(�0, �� ) ← KeyUpdate(�(��, �� ), �) with the assistance even if these � columns are queried separately as long as the queries of DO are connected. 3 end Definition 1 (Connected qeries). Suppose query � involves � 2 � � � 4 for = → do compute the ratio between 1 and � addition among a set of columns {�1,�2,...}, and query � ′ involves � (�) 1 � ′ ′ ′ 5 for 1 = → do addition among another set of columns {�1 ,�2 ,...}. Then, � and � � 1 � ′ ′ 6 for � = → do are connected if {�1,�2,...} ∩ {�1 ,�2 ,...} ≠ ∅. We say a set of queries (�) 7 if (���(�0, �1) mod � ) = (�1 �(�0, �� ) mod � ) are connected if any two of them can be connected with each other. then Observation 5. If a set of queries are connected, the columns line 4 8 go to touched by these queries can be converted to be under any column 9 end key of these columns. 10 end Proof. We frst prove that this statement holds for two con- 11 end ′ ′ ′ nected queries � and � : {�1,�2,...} and {�1 ,�2 ,...} are touched by 12 end ′ � and � separately. Suppose there is a column �� ∈ {�1,�2,...} ∩ (2) (�) ′ ′ 13 �1 ← SmallestCommonMultiple(�1 , ...,�1 ) {�1 ,�2 ,...} and its column key is ��� . According to Remark 1, 14 for � = 2 → � do columns in {�1,�2,...} can be converted to be under the same col- �� umn key �� �� ,... ; Similarly, columns in � ′,� ′,... can be 15 �� := �1 · � � ∈ { 1 } { 1 2 } � ( ) ′ 1 converted to be under ��� ∈ {��1,...} as well. As {�1,�2,...} ∪ 16 end ′ ′ {�1 ,�2 ,...} can be converted to be under the same key ��� , based on Remark 1 again, columns in � ,� ,... � ′,� ′,... can be 17 return: �1, ..., �� { 1 2 } ∪ { 1 2 } ′ ′ converted to be under any key in {��1, ��2,...} ∪ {��1, ��2,...}. This can be easily extended to multiple connected queries. □ space complexity. We have a way to reduce � (�2) computational complexity to � (�), but � (�) space complexity is required. More specifcally, we maintain a key-value store; in the frst loop, we (�) (�) put each (�1 · �(�0, �� ),�1 ) into the store; in the second loop, (�) whenever we fnd �� · �(�0, �1) is in the store, we know �� and �1 are the values we want. When some of the �s are negative, to fnd the ratio between �1 and �� , we run line 5-11 for two times: in the frst time, we assume Figure 3: �1 and �2 are connected queries; �2 and �3 are con- they are with the same sign and run it as before; if it fails, in the nected queries. Then, the columns touched by �1, �2 and �3 second time, we assume they are with opposite sign and multiply can be converted to be under the same column key. 1 −1 to �(�0, �� ), and then run as before . D (2) E D (�) E Figure 3 visualizes Observation 5. After fnding all ratios: �1 ,�2 , ..., �1 ,�� , SP can merge D 2 � E Observation 5 implies that a set of connected queries, each of them by calculating the smallest common multiple of � ( ), ...,� ( ) : 1 1 which adds a set of columns, is equal to a single query that adds all (2) (�) these columns. Then, Algorithm 1 can be applied directly. �1 ← SmallestCommonMultiple(�1 , ...,�1 ). Then, the ratio among �1, ..., �� is: 4.3 Co-prime attacks against sum, equi-join �1�2 �1�� and group-by �1 : : ... : (2) (�) The co-prime attack against the operation in SDB is shown �1 �1 sum in Algorithm 2. It takes a column of ciphertexts together with a If �1, ..., �� are co-prime, SP can recover all of them: sum query calculating the sum of this column, and outputs the �1 := �1 corresponding plaintexts. Recall that, Algorithm 1 requires the +∞ �1�2 query to touch � columns to reach a successful rate of Í 1 −1. �2 := ( �� ) (2) �=1 �1 +∞ Í 1 −1 ... The successful rate for Algorithm 2 is ( �� ) , where � is the �=1 1This strategy can be applied to all attacks we present in this paper. number of rows; and typically � is at least in the order of millions.

1748 Therefore, via this attack, SP can always recover the plaintexts of a Algorithm 3: Co-prime attack against comparison (for 1 row) column with ∼ 100% probability. Input: �(�0, �); {�(�1,�1), ..., �(��,�� }); � queries {�1, ..., �� }, each with an operation comparing � and other values

Algorithm 2: Co-prime attack against sum (for 1 column) Output: � 1 Input: a column of encrypted items �(�1, �1), ..., �(��, ��); a sum 1 for � = → � do update to a given key ′ query � for this column 2 �(�� , �) ← KeyUpdate(�(�0, �), �� ) with the assistance of DO �1, ..., �� Output: 3 end 1 for � = 1 → � do update to the same key � 1 � � � −1 4 for = → do compute the product of and � 2 � �� ← KeyUpdate(�(��, �� ), �) with the assistance of ′′ ′ 5 �(�� ,���) ← �(�� , �) × �(��,�� ) DO ′′ 6 ��� ← KeyUpdate(�(� , ���), �� ) with the assistance of 3 end � DO 4 for � = 2 → � do compute the ratio between �1 and �� 7 end (�) � � �, ...,� � 5 for �1 = 1 → � do 8 ← GreatestCommondivisor( 1 � ) 6 for �� = 1 → � do 9 return: � −1 (�) −1 7 if (��� �1 mod � ) = (�1 � �� mod � ) then 8 go to line 4 9 end Recall that to compare � with �� , SP executes 10 end ′ �� (� − �� ) ← ���(�(�0, �), �(�� , �� )), 11 end and compares � � � with � . According to Observation 4, 12 end � ( − � ) 2 SP not only gets �� (� − �� ), but also gets ��� (line 4 -6). With � queries � � (2), ...,� (�) 13 1 ← SmallestCommonMultiple( 1 1 ) {�1, ..., �� } for �, SP gets � products {�1�, ...,���}. Notice that 14 for � = 2 → � do {�1, ...,�� } are random numbers, hence w.h.p. they are co-prime �� 15 �� := �1 · when � is large. Even for � 4, the co-prime probability is still � (�) = 1 nearly 92.4%. Then, SP can recover � by computing the greatest 16 end common divisor of {�1, ...,�� } (line 8): 17 return: �1, ..., �� � = GreatestCommonDivisor(�1�, ...,���) The situation becomes even worse when we consider connected Even worse, for other columns that have not been touched by a queries. Recall that the columns touched by a set of connected sum query, SP can still recover them with ∼ 100% probability, as queries can be converted to be under any column key of these long as they are“connected” with a column touched by a sum query. columns (c.f. Observation 5). Then, these � queries are no longer More specifcally, suppose a set of columns are touched by a set of required to be issued for the same column; they can be issued connected queries (cf. Defnition 1), based on Observation 5, these for diferent columns as long as these columns are “connected”. columns can be converted to be under the same column key. If a Furthermore, the plaintexts of all these columns will be recovered. sum query is issued for any of these columns, Algorithm 2 can be Recall that the connected queries are defned for queries involv- applied to all these columns. Moreover, the co-prime attack against ing addition operations. We remark that if a comparison query is sum applies to equi-join and group-by as well. issued for two columns (then it involves addition between these However, co-prime attack against sum may fail to attack group- two columns), it can also be connected with other queries. by and equi-join if items in the operated columns are so large (e.g., the items are in string type) that we cannot get ratios between them 4.5 Summary by trying each pair in Z2 . However, since ciphertexts are converted � We have presented three attacks against fve diferent operations 1 1 to �− �1, ...,�− ��, the same plaintexts have the same ciphertexts, in SDB. Now, we compare them and give a summary. which reveals plaintexts frequency. Therefore, and group-by equi- In the attacks against addition, we assume data items are under join in SDB can be attacked by frequency analysis [19]. This attack uniform distribution in plaintext space. With this assumption, we is beyond the scope of this paper. only need four columns to be added together to recover plaintexts with at least 90% probability. 4.4 Co-prime attack against comparison In the attacks against sum (also equi-join and group-by), we only Algorithm 3 describes the co-prime attack against the comparison need to assume there are enough rows in a database so that elements operation in SDB. In this attack, if an encrypted item has been in the same column are co-prime with nearly 100% probability, compared for � times (no matter compared with the encrypted which always happens in real-world. With this weak assumption, item in the same row, or with a constant), SP can recover it with a we can recover all plaintexts in a column in nearly 100% probability. +∞ Besides, with more columns whose ciphertexts have been converted probability of nearly Í 1 −1. ( �� ) to be under the same column key, we can recover more columns. �=1

1749 In the attacks against comparison, we do not need any assumption Table 4: Statistical information about the datasets we use. 0 about the plaintexts. We only need four comparison operations on (%) are calculated with original datasets. Other information any column of the columns whose ciphertexts can be converted to are all calculated with datasets removed 0s. be under the same column key.

Benchmark 0 (%) Neg (%) Min Max Ave 5 EXPERIMENTS In this section, we evaluate our co-prime attack by conducting the Credit 10.1 3.2 -10 136468 4948.7 following benchmarks: TPC-C 2.4 0 0 104900 11487.6 • Credit. This is the “default of credit card clients” dataset TPC-H 15.3 1.3 -339603 1684259 29657.2 taken from the UCI machine learning repository2. It consists 30000 rows and 25 columns, containing banking information of credit card clients in Taiwan from April to September. It 5.1 Experimental settings did not specify queries, so we generate queries with addition, We encrypt the databases using SDB and perform co-prime attacks comparison and sum operations by ourselves. against their queries. To be consistent with [26], we set � and � to • TPC-C. This is an on-line transaction processing benchmark 1 2 simulating a complete environment for terminal operators to be 512 bits. Besides, to show that our attacks work under diferent execute transactions against a database3. We pick 4 columns parameter settings, we conduct experiments with larger security from its database: C_BALANCE, OL_O_ID, S_QUANTITY parameters and give results in Figure 4. and OL_AMOUNT. In the query stream of TPC-C, there are As 0s can be easily attacked (cf. Observation 1), we remove all rows that contain 0s. Moreover, larger databases (more rows update queries with addition operation querying C_ BAL- and more columns) result in better attacking performance, for the ANCE; there are queries with comparison operation querying following reasons: OL_O_ID and S_QUANTITY; and there are queries with sum operation querying OL_AMOUNT. CryptDB [22] uses this • For the attacks against sum/equi-join/group-by, more rows benchmark to evaluate its performance. result in higher co-prime probability, and hence higher re- • TPC-H. This is a decision support benchmark measuring covery rate. multiple aspects of the capability of a database system to • For the attack against addition, more columns being added process queries4. We pick four columns from its database: result in higher co-prime probability, and hence higher re- L_QUANTITY, L_DISCOUNT, PS_AVAILQTY, and L_ EX- covery rate; but independent of the number of rows. TENDEDPRICE. In the query stream of TPC-H, there are • For the attack against comparison, more times a column being queries with comparison operation querying PS_AVAILQTY compared result in higher co-prime probability, and hence and L_QUANTITY; and there are queries with sum oper- higher recovery rate. ation querying L_QUANTITY, L_DISCOUNT and L_ EX- To this end, we only consider small databases: we randomly pick TENDEDPRICE. SDB [26] uses this benchmark to evaluate 1000 rows and at most 12 columns from each dataset to conduct its performance. experiments. Our experimental results in Table 6 and Figure 6 The target columns and query operations in TPC benchmarks are validate the above statement. In Table 6, the recovery rate is at most summarized in Table 3. Besides, we give some necessary statistical 85.7% with 4 columns being added and it is at least 95% with 10 information about our target columns in Table 4. columns being added. In Figure 6, the recovery rate is about 92% with 4 range queries and it is almost 100% with 10 range queries. Table 3: Columns and operations in TPC benchmarks. All experiments were conducted on a Mac laptop with Intel Core i5 processor and 8GB Memory running macOS High Sierra (v10.13.6). In all of our experiments, we repeat the process 10 times. Attributes Benchmark Operation C_BALANCE TPC-C ��� 5.2 Attacking time OL_O_ID TPC-C > (<) We frst measure the time usage of the attack against addition with diferent plaintext spaces and parameter settings (with ran- S_QUANTITY TPC-C > (<) domly generated data). The results are shown in Figure 4: our attack OL_AMOUNT TPC-C ��� against addition works with diferent plaintexts smaller than 220 PS_AVAILQTY TPC-H > (<) and can succeed under diferent parameter settings. Since the attack L_QUANTITY TPC-H > (<) & ��� against comparison is infuenced little by larger parameter settings L_DISCOUNT TPC-H ��� (as it only iterates � times) and attack against sum/equi-join/group- is similar to attack against addition, we do not repeat these L_EXTENDEDPRICE TPC-H ��� by experiments here. Attacking a plaintext space of 220 requires 30 minutes on our

2 laptop. This time usage will increase linearly as the plaintext space https://archive.ics.uci.edu/ml/datasets.php 24 3http://www.tpc.org/tpcc/ grows (e.g., attacking a plaintext space of 2 requires 8 hours), 4http://www.tpc.org/tpch/ thereby attacking a larger plaintext space requires more dedicated

1750 512 bits Table 6: Co-prime attack against addition in the “Credit” 2,000 1024 bits benchmark. Col indicates how many columns have been 2048 bits added together. Min, Max, Average and Stddev separately in- 1,500 dicate the minimum, maximum, average and standard devi- ation of the proportion of recovered plaintexts in our exper- iments. Cost indicates the time usage of the attack.

usage (s) 1,000

(%) (%) (%) (%) (s) Time # Col Min Max Average Stddev Cost 500 2 34.0 58.3 47.2 8.5 111.6 3 60.4 79.5 67.1 6.5 229.8 0 4 57.4 85.7 76.8 8.6 291.9 5 81.8 89.4 84.0 2.4 264.9 10 12 14 16 18 20 6 77.2 94.0 88.5 5.0 478.5 � Plaintext Space � (2 ) 7 90.5 94.9 92.8 1.5 645.3 8 90.9 96.2 94.1 1.6 792.0 Figure 4: Time usage of co-prime attack against addition 9 93.0 96.3 94.8 0.9 781.2 with diferent plaintext spaces and diferent parameter set- 10 95.0 97.2 96.2 0.7 999.9 tings. 11 96.2 97.3 97.0 0.4 1090.0 12 97.3 97.3 97.3 \ 1229.1 machines. However, we argue that 224 is large enough for the com- mon databases. To demonstrate this, we survey datasets from difer- probability. However, there are still 76.8% plaintexts being recovered ent felds including business, economics, education and other felds when the number of columns is 4; and it reaches 97.3% when the in Kaggle5. In each feld, we pick the most popular 240 datasets and number of columns is 12. calculate the proportion of columns whose items are no more than The size of plaintext space in “credit” is 0, 220 (most plaintexts 20 22 24 2 , 2 and 2 respectively (cf. Table 5). are no more than 220; only 5 elements are over 220), hence we can try each pair in Z� to fnd the ratio among � items, which takes 220 Table 5: The proportion of columns whose items are no more 111.6-1229.1 seconds (cf. Table 6). However, the values in the “credit” 20 22 24 than 2 , 2 and 2 respectively. benchmark are typically small so that most time we do not need to go over the whole plaintext space. 20 22 24 We also evaluate the performance of our co-prime attack against Field < 2 (%) < 2 (%) < 2 (%) addition via the TPC-C benchmark. The addition operations of Business 94.9 95.8 96.9 TPC-C are included in the update queries: Economics 83.0 85.3 86.7 UPDATE customer SET c_balance = c_balance + ?. Education 91.3 94.8 98.3 where ? is a number generated in TPC-C query stream. Figure 5 Health 92.0 94.8 97.5 shows the proportion of recovered plaintexts with diferent number Arts and Entertainment 91.1 93.5 95.2 of update queries.

5.3 Against addition 0.8 In the “credit” database, there are 12 columns that support addition. To evaluate the performance of our co-prime attack against addition, 0.6 we generate queries such as: � � = Í �� . SELECT �=1 0.4 We randomly pick � = 2, ..., 12 columns from the database and execute the above query. oportion recovered Pr For example, if we pick 5 columns from the database, the addition 0.2 query would be performed among these 5 columns; we repeat this experiment ten times, and each time we re-select another 5 columns (if we pick 12 columns, then we have no other option but add these 2 4 6 8 10 12 columns once). We report max, min, avg and std of the proportion Number of update queries of recovered plaintexts in Table 6. Notice that the data values are not randomly distributed as we assumed in Section 4.2. Therefore, Figure 5: Proportion of recovered plaintexts of co-prime at- the proportion of recovered plaintexts is lower than the theoretical tack against addition with update queries in TPC-C.

5https://www.kaggle.com/

1751 5.4 Against sum 6 COUNTERMEASURES We evaluate our co-prime attack against sum operation using all Recall that the attacks against addition and sum/equi-join/group-by three benchmarks. In each benchmark, we select all columns that require SP to iterate the plaintext space to fnd the ratio. The basic can be queried with a sum operation and try to recover these idea of our countermeasures is to enlarge the plaintext space by columns with our co-prime attack. Here is an example of a sum multiplying a random value to each data item, so that SP can no query in TPC-C: longer iterate it. Data items in the same row should be multiplied with the value for two reasons: (1) additions can still be done, SELECT SUM(ol_amount) FROM order_line WHERE ol_o_id = ? same and (2) does not need to store random values for all items in the AND ol_d_id = ? AND ol_w_id = ? DO database. However, this is not enough to prevent the attack against If we run our attack for all rows, we can certainly recover 100% addition, as SP can still iterate the original plaintext space to fnd plaintexts. So, we randomly choose 2 − � rows to run our attack. the ratio. To this end, we also add noise to the new plaintext. The We run 10 times for each column in TPC-C and TPC-H, and record encryption scheme becomes: the minimum and maximum number that makes our attack success. encrypts a data item � as: For Credit, we run once for each column. Encryption. DO � := (���� )−1 ·(�� + �) mod �,

Table 7: Co-prime attack against sum in all benchmarks. where 0 ≪ � ≪ � and � ≪ �� + � ≪ � . The column key is ⟨�, �⟩ Columns indicates how many columns that can be queried and the row key is ⟨�, �⟩. DO does not store �. with sum. Recovered indicates how many columns we recov- Multiplication: Given two ciphertexts �(�1, ��1+�1) and �(�2, ��2+ ered. Min and Max separately indicate the minimum and the �2) in the same row, SP computes: maximum number of rows to make our attack success. 2 �(�3, � �1�2+(��1�2+��2�1+�1�2)) ← �(�1, ��1+�1)×�(�2, ��2+�2). 2 Notice that � must be large enough so that ��1�2+��2�1+�1�2 ≪ � , Benchmark Columns Recovered Min Max and (��1�2 + ��2�1 + �1�2) is considered as the new “�”. Credit 12 12 2 5 KeyUpdate: The KeyUpdate function is the same as before. TPC-C 1 1 2 6 Addition. Given two ciphertexts �(�1, ��1 +�1) and �(�2, ��2 +�2) TPC-H 3 3 2 5 in the same row, DO frst convert them to be under the same key �3 then adds them together:

�(�3, �(�1 + �2) + (�1 + �2)) ← �(�1, ��1 + �1) + �(�2, ��2 + �2). The co-prime attacks against equi-join and group-by are identical as that against . They can also recover 100% plaintexts as long � �� 1 sum Decryption. Given a ciphertext � = (� � + �)·(�� )− , DO as the number of rows is enough. So, we omit these experiments. decrypts it as follows: � · ���� mod �  � ← 5.5 Against comparison �� Recall that the success rate of our co-prime attack against compari- where � is the number of times a ciphertext has been multiplied. son is independent of the data distribution; but only depends on the � 6 Decryption can succeed because � is much smaller than � so that co-prime probability of the random masks {�1, ...,� } . Then, for � it will be canceled by the “rounding” operation. each benchmark we only need to evaluate one column; the attack Notice that this new encryption scheme can only support a performance on other columns should be similar. In the “credit” limited number of additions and multiplications: might benchmark, we select one column and generate 2-10 range queries addition make 0 � � no longer hold and might make for this column. In the TPC-C and TPC-H benchmarks, we select ≪ ≪ multiplication � �� � � no longer hold. Furthermore, this scheme cannot one column from each benchmark, and pick the frst 2-10 range ≪ + ≪ securely support comparisons, as the attack against queries regarding this column from the query stream. Here is an comparison does not require to iterate the plaintext space (cf. Algorithm 3). example of a range query in TPC-C: SP We leave it as a future work to refne this encryption scheme. SELECT count(*) FROM stock WHERE s_w_id = ? AND s_i_id = ? AND s_quantity < ? 7 DISCUSSION Figure 6 shows the proportion of recovered plaintexts of co- 7.1 Lesson learned prime attack against comparison with diferent number of range SDB is in fact a multiplicatively-homomorphic encryption scheme queries. In all these three benchmarks, more than 90% plaintexts of the form: (���� )−1 ·�, where ⟨⟨�, �⟩ , �⟩ can be considered as the are recovered with 4 range queries to the same column; and almost randomness, because for any two diferent ciphertexts, they either 100% plaintexts are recovered with 10 range queries to the same have diferent ⟨�, �⟩ or diferent �. To add some ciphertexts, one has column. So the range queries in SDB is fragile under co-prime attack to make them with the same ⟨⟨�, �⟩ , �⟩, otherwise, addition cannot against comparison. be done. Then, SDB can be considered as an encrypted database, where the ciphertexts are encrypted under the same key. We prove 6To keep consistent with SDB, each of them has 80 bits. the following theorem for such kind of encrypted databases:

1752 1 1 1

0.9 0.9 0.9

0.8 0.8 0.8

0.7 0.7 0.7 oportion recovered oportion recovered oportion recovered OL_O_ID PS_AVAILQTY Pr Pr Pr 0.6 0 6 0.6 S_QUANTITY . L_QUANTITY 2 4 6 8 10 2 4 6 8 10 2 4 6 8 10 Number of range queries Number of range queries Number of range queries

Bank database (selected one column) TPC-C (two columns) TPC-H (two columns)

Figure 6: Proportion of recovered plaintexts of co-prime attack against comparison with diferent number of range queries.

Theorem 2. In an encrypted database where the ciphertexts are an encryption scheme [8] that has been widely used in outsourced encrypted under the same key, if a Leakage function �(·) can be ap- computation [4, 10, 12, 25]. It works as follows: plied to a single column of ciphertexts with sublinear communication, • Setup: then it can be applied to all ciphertexts. {(� ′,�), (� ′, �)} ← ���(1�) Proof. We defne the leakage function �(·) as: (� ′,�) are public while (� ′, �) are private. � ′ should have � (�) ← �(�), many small divisors and there should be many elements less than � ′ that are co-prime with it. � ′ is a divisor of � ′ and where � is the ciphertext, � is the plaintext, and � � is the leaked ( ) � Z is co-prime with � ′. information. Consider the following two cases: ∈ � ′ • Encryption: (1) The encryption is deterministic. Then, it is clear that �(·) can be applied to all other ciphertexts, as they are encrypted � ← ���(�, �,�, � ′, � ′) under the same key. divides � into a �-tuple � (1), � (2), ..., � (�) Z� such that (2) The encryption is non-deterministic, which means that each ( ) ∈ � ′ � Í� � (�) mod � ′; the ciphertext is ciphertext has diferent randomness. Suppose there are � = �=1 ciphertexts in a column. Let � denote the information sent � = (� (1) · �, � (2) · � 2, ..., � (�) · �� ) mod � ′ from DO to SP. As the communication is sublinear, � can be customized for at most � (� < �) ciphertexts in this • Decryption: column. Recall that �(·) can be applied to the whole column, ′ ′ which means � is applicable to the rest (� − �) ciphertexts � ← ���(�, �, � , � ) as well. That means the randomness in the ciphertexts has the plaintext is calculated as: no efect on �. Then, � can be applied to the ciphertexts in other columns. (� (1), ..., � (�) ) = (� (1)� −1, ..., � (�)� −� ) mod � ′ □ � Õ � = � (�) mod � ′ In SDB, the leakage function for addition is (�0�1, ..., �0�� ), where �� 1 �=1 �0 = (�� )− . SP has to iterate the plaintext space to fnd ratio and recover �� . We can prevent the co-prime attack by enlarging • Addition: the plaintext space (cf. Section 6). However, the leakage function for �(�1 + �2) ← ���(�1, �2) comparison is (�0�, ...,���) and SP can easily fnd �. In summary, we point out the following two research directions for preventing � (�) (�) computes �(�1 + �2) ( ) = � + � mod � ′ the co-prime attack: 1 2 • Multiplication: • Encrypt diferent columns using diferent keys. Then, we � � � ���� � , � need to explore how to support interoperability. ( 1 · 2) ← ( 1 2) • Try to minimize the leakage function! (�) Í (�) (�) ′ computes �(�1 · �2) = �1 �2 mod � �+�=�+1 7.2 Attacking other schemes Due to co-prime probability (cf. Section 4.1), this scheme can We remark that our co-prime attack is not specifc to SDB. Instead, be attacked under known-plaintext attack, which was introduced it can also be applied to other encryption schemes having similar in [7, 24]. Suppose there are � + 2 known pairs (��, �(�� )), the favors with SDB. For example, our co-prime attack can be applied to adversary recovers (� ′, �) by constructing the following equations:

1753 returned) or communication volume (how many elements are re- (1) (�) turned) is leaked. However, it assumes the query distribution is �1 � ··· �  1 1  −1 2  � � (1) � (�)   � −1  0 known and it requires at least � (� log �) (� is the number of  2 2 ··· 2         .  0 mod � ′ (1) distinct plaintext values) range queries. [15] improves � �2 log �  . . .   .  = 0 ( ) . . . . 0    −�  to � log � + � (�) and presents an approximate reconstruction  (1) (�)   �   ��+1 � ··· �    attack which only needs the access pattern leakage of � � queries.  �+1 �+1  ( ) It can recover plaintext values in a dense dataset within a constant Denote this coefcient matrix as �1. As the equation has a nontrivial 1 � � 1 ratio of error. solution (−1, � − , ..., � − ) mod � ′ ∈ � + , we have ��� (�1) ≡ 0 �′ Kornaropoulos et al. [14] present a reconstruction attack that mod � ′, which means ��� (�1) is a multiple of � ′. With � + 2 succeeds without any knowledge about the query or data distribu- known pairs, the adversary can construct �+2 �+2 determinants: �+1 = tion. Based on the search-pattern leakage and a technique named (�1, �2, ..., � ). Suppose there are � determinants (�1, .., � )(2 ≤ �+2 � support size estimation, they use range queries and �-nearest- � � 2 satisfy ��� � 0. As ��� (�1) , ..., ��� (�� ) are co-prime ≤ + ) ( � ) ≠ ( �′ �′ ) neighbor( ) queries to reconstruct plaintext values under a ′ k-NN w.h.p, � can be recovered: variety of skewed query distributions. However, it still requires ′ queries to be under some fxed distributions. For example, queries � ← ���(��� (�1),��� (�2), ...,��� (� )) � should touch all plaintext values, if all queries only touch plaintexts With � ′, the adversary can get � ′ that satisfes � ≡ � ′ mod � ′ by from [0, �/2] (only plaintexts in this interval are returned), then solving Equation 1. Although the original scheme decrypts cipher- plaintexts in [�/2, �] cannot be recovered. texts with �, � ′ can be used for decryption as well. Comparison. We compare these attacks with our co-prime attack in the context of attacking SDB. Recall that the sum, equi-join and 8 RELATED WORK group-by operations in SDB leak plaintext frequency in the same Most encrypted databases (e.g., CryptDB [22], MONOMI [23]) lever- column. So attacks based on plaintext frequency [5, 11, 19] can be age property-preserving encryption (PPE) to support comparisons. applied to SDB, However, auxiliary information about the plaintext Deterministic encryption (DET) encrypts the same plaintexts to the frequency is required. Attacks based on range queries and k-NN same ciphertexts, so that equal operation, join operation and group- queries [13–15] can be applied to SDB, because the comparison by operation can be done. Order preserving encryption (OPE) [3, 6] operation in SDB leaks access patterns and volume information. preserves the order information of plaintexts, so that range queries However, at least � log(�) + � (�) queries are required for full and comparison operations can be processed. PPE-based encrypted reconstruction and � (�) queries are required for approximate databases like CryptDB are vulnerable to many attacks since sen- reconstruction in [15]. sitive information are leaked from ciphertexts: OPE leaks order Compared with these attacks, our co-prime attack can attack information of plaintexts and DET leaks which plaintexts are equal. more operations in SDB and needs no more than 10 queries to Many attacks have been proposed, we compare some of them with recover all plaintexts. Besides, our assumption is much weaker: we only assume plaintexts are under randomly uniform distribution our Co-prime attack. (our attack still works when this assumption is not satisfed). Leakage from ciphertexts. PPE inherently leaks information about plaintext such as frequency and order, which has been ex- ploited extensively by various of attacks. Naveed [19] attack DET- 9 CONCLUSION and OPE-encrypted databases based on frequency analysis and In this paper, we revisit the security of SDB and make four ob- sorting. They estimate plaintext frequency and order in advance servations, which incur serious information leakage but was not so that they can compare frequency and order to fnd a map from mentioned in their paper. We exploit these observations and pro- ciphertexts to plaintexts. Cash et al. [11] adapt the attack proce- pose a ciphertext-only attack named co-prime attack. We show how dure as a min-weight non-crossing bipartite matching, attacking a to use it to attack the addition, sum, comparison, join and group- widely used OPE scheme [6]. Compared with [19], the non-crossing by operations in SDB. We evaluate our attack in three real-world attack performs better when plaintext data are drawn from large benchmarks. For columns that support addition and comparison, domains. Bindschaedler et al. [5] present a new inference tech- we recover 84.9% − 99.9% plaintexts. For columns that support sum, nique called against property-revealing encryp- multinomial attack equi-join and group-by, we recover 100% plaintexts. Nevertheless, tion(PRE) schemes. Considering correlation between columns, they we still think the data interoperability is an important issue, and construct the attack based on Bayesian inference problem in multi we admit that there are a bunch of smart designs in SDB. In future dimensions. Besides, with the correlation and plaintexts gotten, they work, we will attempt to fx the security faws of SDB and propose can infer columns protected by semantically secure encryption or an encrypted database that supports data interoperability. redaction with machine learning and record linkage methods. Leakage from queries. When DO issues a query, what SP returns may leak information about plaintexts. Such attack usually need ACKNOWLEDGMENTS plenty of queries to recover plaintexts and most of them assume The work was supported in part by Zhejiang Key R&D Plans (Grant queries are under known distribution. Kellaris et al. [13] develop a No. 2021C01116, 2019C03133), National Natural Science Foundation generic reconstruction attack on any encrypted database support- of China (Grant No. 62002319, U20A20222) as well as a grant from ing range queries where either access pattern (which elements are China Zheshang Bank.

1754 REFERENCES [14] E. M. Kornaropoulos, C. Papamanthou, and R. Tamassia. 2020. The State of the [1] [n.d.]. GCreep: Google engineer stalked teens, spied on chats. Gawker. http: Uniform: Attacks on Encrypted Databases Beyond the Uniform Query Distribu- //gawker.com/5637234/. Accessed in December 2020. tion. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, [2] [n.d.]. National Vulnerability Database. CVE statistics. https://nvd.nist.gov/ Los Alamitos, CA, USA, 1223–1240. https://doi.org/10.1109/SP40000.2020.00029 vuln/search. Accessed in December 2020. [15] M. Lacharité, B. Minaud, and K. G. Paterson. 2018. Improved Reconstruction [3] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu. 2004. Attacks on Encrypted Data Using Range Query Leakage. In 2018 IEEE Symposium . 297–314. https://doi.org/10.1109/SP.2018.00002 Order Preserving Encryption for Numeric Data. In Proceedings of the 2004 ACM on Security and Privacy (SP) [16] Derrick Norman Lehmer. 1900. Asymptotic Evaluation of Certain Totient Sums. SIGMOD International Conference on Management of Data (, France) (SIGMOD 22, 4 (oct 1900), 293. https://doi.org/10.2307/ ’04). Association for Computing Machinery, New York, NY, USA, 563–574. https: American Journal of Mathematics //doi.org/10.1145/1007568.1007632 2369728 [4] A. Alabdulatif, Heshan Kumarage, I. Khalil, and X. Yi. 2017. Privacy-preserving [17] Jinyuan Li, Maxwell Krohn, David Mazières, and Dennis Shasha. 2004. Secure Untrusted Data Repository (SUNDR). In anomaly detection in cloud with lightweight homomorphic encryption. J. Comput. Proceedings of the 6th Conference on (San Syst. Sci. 90 (2017), 28–45. Symposium on Operating Systems Design amp; Implementation - Volume 6 [5] Vincent Bindschaedler, Paul Grubbs, David Cash, Thomas Ristenpart, and Vitaly Francisco, CA) (OSDI’04). USENIX Association, USA, 9. [18] Prince Mahajan, Srinath Setty, Sangmin Lee, Allen Clement, Lorenzo Alvisi, Shmatikov. 2018. The Tao of Inference in Privacy-Protected Databases. Proc. VLDB Mike Dahlin, and Michael Walfsh. 2011. Depot: Cloud Storage with Minimal Endow. 11, 11 (July 2018), 1715–1728. https://doi.org/10.14778/3236187.3236217 [6] Alexandra Boldyreva, Nathan Chenette, Younho Lee, and Adam O’Neill. 2009. Trust. ACM Trans. Comput. Syst. 29, 4, Article 12 (Dec. 2011), 38 pages. https: //doi.org/10.1145/2063509.2063512 Order-Preserving Symmetric Encryption. In Advances in Cryptology - EURO- [19] Muhammad Naveed, Seny Kamara, and Charles V. Wright. 2015. Inference CRYPT 2009, Antoine Joux (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 224–241. Attacks on Property-Preserving Encrypted Databases. In Proceedings of the 22nd [7] Jung Cheon and Hyun Nam. 2003. A Cryptanalysis of the Original Domingo- ACM SIGSAC Conference on Computer and Communications Security (, Colorado, USA) . Association for Computing Machinery, New York, NY, Ferrer’s Algebraic Privacy Homomophism. IACR Cryptology ePrint Archive 2003 (CCS ’15) (01 2003), 221. USA, 644–655. https://doi.org/10.1145/2810103.2813651 [8] Josep Domingo-Ferrer. 2002. A Provably Secure Additive and Multiplicative [20] J.E Nymann. 1972. On the probability that k positive integers are relatively prime. 4, 5 (1972), 469–473. https://doi.org/10.1016/0022- Privacy Homomorphism*. In Information Security, Agnes Hui Chan and Virgil Journal of Number Theory Gligor (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 471–483. 314X(72)90038-8 [9] Ariel J. Feldman, William P. Zeller, Michael J. Freedman, and Edward W. Felten. [21] Pascal Paillier. 1999. Public-Key Cryptosystems Based on Composite Degree Residuosity Classes. In , Jacques Stern 2010. SPORC: Group Collaboration using Untrusted Cloud Resources. In 9th Advances in Cryptology — EUROCRYPT ’99 (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 223–238. USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010, [22] Raluca Ada Popa, Catherine M. S. Redfeld, Nickolai Zeldovich, and Hari Bal- , Remzi H. Arpaci-Dusseau October 4-6, 2010, , BC, Canada, Proceedings akrishnan. 2011. CryptDB: Protecting Confdentiality with Encrypted Query and Brad Chen (Eds.). USENIX Association, 337–350. http://www.usenix.org/ Processing. In events/osdi10/tech/full_papers/Feldman.pdf Proceedings of the Twenty-Third ACM Symposium on Operating (Cascais, Portugal) . Association for Computing Ma- [10] J. Girao, D. Westhof, and M. Schneider. 2005. CDA: concealed data aggregation Systems Principles (SOSP ’11) chinery, New York, NY, USA, 85–100. https://doi.org/10.1145/2043556.2043566 for reverse multicast trafc in wireless sensor networks. In IEEE International [23] Stephen Tu, M. Frans Kaashoek, Samuel Madden, and Nickolai Zeldovich. 2013. , Vol. 5. 3044–3049 Vol. 5. Conference on Communications, 2005. ICC 2005. 2005 Processing Analytical Queries over Encrypted Data. 6, 5 https://doi.org/10.1109/ICC.2005.1494953 Proc. VLDB Endow. (March 2013), 289–300. https://doi.org/10.14778/2535573.2488336 [11] P. Grubbs, K. Sekniqi, V. Bindschaedler, M. Naveed, and T. Ristenpart. 2017. [24] David Wagner. 2003. Cryptanalysis of an Algebraic Privacy Homomorphism. Leakage-Abuse Attacks against Order-Revealing Encryption. In 2017 IEEE Sym- In , Colin Boyd and Wenbo Mao (Eds.). Springer Berlin . 655–672. https://doi.org/10.1109/SP.2017.44 Information Security posium on Security and Privacy (SP) Heidelberg, Berlin, Heidelberg, 234–239. [12] H. Hu, J. Xu, C. Ren, and B. Choi. 2011. Processing private queries over untrusted [25] D. Westhof, J. Girao, and M. Acharya. 2006. Concealed Data Aggregation for data cloud through privacy homomorphism. In 2011 IEEE 27th International Reverse Multicast Trafc in Sensor Networks: Encryption, Key Distribution, . 601–612. https://doi.org/10.1109/ICDE.2011. Conference on Data Engineering and Routing Adaptation. 5, 10 (2006), 5767862 IEEE Transactions on Mobile Computing 1417–1431. https://doi.org/10.1109/TMC.2006.144 [13] Georgios Kellaris, George Kollios, Kobbi Nissim, and Adam O’Neill. 2016. Generic [26] Wai Kit Wong, Ben Kao, David Wai Lok Cheung, Rongbin Li, and Siu Ming Yiu. Attacks on Secure Outsourced Databases. In Proceedings of the 2016 ACM SIGSAC 2014. Secure Query Processing with Data Interoperability in a Cloud Database (Vienna, Austria) Conference on Computer and Communications Security (CCS Environment. In . Association for Computing Machinery, New York, NY, USA, 1329–1340. Proceedings of the 2014 ACM SIGMOD International Conference ’16) (Snowbird, Utah, USA) . Association for https://doi.org/10.1145/2976749.2978386 on Management of Data (SIGMOD ’14) Computing Machinery, New York, NY, USA, 1395–1406. https://doi.org/10.1145/ 2588555.2588572

1755