Using Constraint Programming to Solve a Cryptanalytic Problem∗
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) Using Constraint Programming to Solve a Cryptanalytic Problem∗ David Gerault(1), Marine Minier(2), and Christine Solnon(3) (1) Universite´ Clermont Auvergne, LIMOS, France (2) Universite´ de Lorraine, LORIA, UMR 7503, F-54506, France (3) Universite´ de Lyon, INSA-Lyon, F-69621, LIRIS, CNRS UMR5205, France [email protected], [email protected], [email protected] Abstract graph traversal approach [Fouque et al., 2013], and a Branch & Bound approach [Biryukov and Nikolic, 2010]. The ap- We describe Constraint Programming (CP) models proach of [Fouque et al., 2013] requires about 60 GB of mem- to solve a cryptanalytic problem: the chosen key ory when the key has 128 bits, and it has not been extended differential attack against the standard block cipher to larger keys. The approach of [Biryukov and Nikolic, 2010] AES. We show that CP solvers are able to solve only takes several megabytes of memory, but it requires sev- these problems quicker than dedicated cryptanaly- eral days of computation when the key has 128 bits, and sev- sis tools, and we prove that a solution claimed to eral weeks when the key has 192 bits. be optimal in two recent cryptanalysis papers is not During the process of designing new ciphers, this search optimal by providing a better solution. generally needs to be performed several times, so it is desir- able that it can be done rather quickly. Another point that 1 Introduction should not be neglected is the time needed to design and im- plement these approaches: To ensure that the computation is Since 2001, AES (Advanced Encryption Standard) is the en- completed within a “reasonable” amount of time, it is neces- cryption standard for block ciphers [FIPS 197, 2001]. It sary to reduce the branching by introducing clever reasoning. guarantees communication confidentiality by using a secret Of course, this hard task is also likely to introduce bugs, and key K to encode an original plaintext X into a ciphertext checking the correctness or the optimality of the computed AESK (X), in such a way that the ciphertext can further solutions may not be so easy. Finally, reproducibility may be decoded into the original one using the same key, i.e., also be an issue. Other researchers may want to adapt these −1 X = AESK (AESK (X)). Cryptanalysis aims at testing algorithms to other problems, with some common features whether confidentiality is actually guaranteed. In particular, but also some differences, and this may again be very diffi- differential cryptanalysis [Biham and Shamir, 1991] evalu- cult and time-consuming. ates whether it is possible to find the key within a reasonable In [Gerault et al., 2016], we proposed to use Constraint number of trials by considering plaintext pairs (X; X0) and Programming (CP) to solve this problem. When using CP studying the propagation of the initial difference X ⊕ X0 be- to solve a problem, one simply has to model the problem by tween X and X0 while going through the ciphering process means of constraints. Then, this model is solved by generic (where ⊕ is the xor operator). Today, differential cryptanaly- solvers usually based on a Branch & Propagate approach: sis is public knowledge, and block ciphers such as AES have The search space is explored by building a search tree, and proven bounds against differential attacks. Hence, [Biham, constraints are propagated at each tree node in order to prune 1993] proposed a new type of attack called related-key attack branches. CP opens new perspectives for this kind of crypt- that allows an attacker to inject differences not only between analysis problems. First, it is very competitive with dedicated the plaintexts X and X0 but also between the keys K and K0 approaches: When the key has 128 bits, Chuffed [Chu and (even if the secret key K stays unknown from the attacker). Stuckey, 2014] is able to find optimal solutions in less than To mount related-key attacks, the cryptanalist must find one hour. Second, the CP model is easier to check or re-use optimal related-key differentials, i.e., a plaintext difference than a full program that not only describes the problem to δX and a key difference δK that maximize the probabil- solve, but also how to solve it. Actually, CP allowed us to ity that, for randomly chosen plaintext X and key K, the prove that a solution claimed to be optimal in [Biryukov and difference X ⊕ δX in the input plaintexts becomes the dif- Nikolic, 2010; Fouque et al., 2013] is not optimal by provid- 0 ference AESK (X) ⊕ AESK⊕δK (X ) in the output cipher- ing a better solution. texts. Finding the optimal related-key differentials for AES is a highly combinatorial problem that hardly scales. Two 2 Problem Statement main approaches have been proposed to solve this problem: a AES block cipher. AES ciphers blocks of length n = 128 ∗This research was conducted with the support of the FEDER bits, and each block is a 4 × 4 matrix of bytes. The length of program of 2014-2020 and the region council of Auvergne keys is l 2 f128; 192; 256g. In this paper, we only consider 4844 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) Operations applied at each round i 2 [0; r − 1]: Key K = K0 SKi[j][3] = (4×4 bytes) Subkey Ki+1 KS S(Ki[j][3]) KS ARK S SR MC i = r SB ARK i = 0 i < r Plaintext X Xi SXi = S(Xi) SR(SXi) Yi = MC(SR(SXi)) Xr Ciphertext (4×4 bytes) SXr = AESK (X) Figure 1: AES ciphering process. Each 4 × 4 array represents a group of 16 bytes. Before the first round, ARK is applied on X and K to obtain X0. Then, for each round i 2 [0; r − 1], S is applied on Xi to obtain SXi, SR and MC are applied on SXi to obtain Yi, KS is applied on Ki to obtain Ki+1 (and during KS, S is applied on Ki[j][3] to obtain SKi[j][3], 8j 2 [0; 3]), and ARK is applied on Ki+1 and Yi to obtain Xi+1. The ciphertext SXr is obtained by applying SB on Xr. keys of length l = 128, and keys are 4 × 4 matrices of bytes. Optimal related-key differentials. Let us note δB = Given a 4 × 4 matrix of bytes M, we note M[j][k] the byte at B ⊕ B0 the difference between two bytes B and B0, and row j 2 [0; 3] and column k 2 [0; 3]. diffBytes = fδB j B 2 Bytesg the set of all differential AES is an iterative process, and we note Xi the ciphertext bytes. at the beginning of round i 2 [0; r]. Each round is composed Mounting attacks requires finding a related-key differential of the following operations, as displayed in Fig. 1: characteristic, i.e. a plaintext difference δX = X ⊕ X0 and a key difference δK = K ⊕ K0, such that δX becomes δSX • SubBytes S. S is a non-linear permutation which r after r rounds with a probability as hight as possible. This can is applied on each byte of X separately, i.e., 8j; k 2 i be achieved by tracking the propagation of the initial differ- [0; 3], X [j][k] is replaced by S(X [j][k]), according to i i ences through the cipher. Once such an optimal differential a lookup table. We note SX = S(X ). i i characteristic is found, the cryptanalist asks for the encryp- • ShiftRows SR. SR is a linear mapping that rotates on tion of messages satisfying the input difference. When this the left by 1 byte position (resp. 2 and 3 byte positions) difference propagates as expected, he can infer informations the second row (resp. third and fourth rows) of SXi. leading to a recovery of the secret key K with less workload than exhaustive search. The difficulty of recovering K de- • MixColumns MC. MC is a linear mapping that mul- creases as the probability of the used differential characteris- tiplies each column of SR(SX ) by a 4 × 4 fixed matrix i tic increases. The AES operators SR; MC; ARK; and KS chosen for its good properties of diffusion [Daemen and are linear, i.e., they propagate differences in a deterministic Rijmen, 2002]. In particular, it has the Maximum Dis- way (with probability 1). However, the S operator is not lin- tance Separable (MDS) property: For each column, the ear: Given a byte difference B ⊕ B0 = δB, the probability total number of bytes which are different from 0, before that δB becomes S(B) ⊕ S(B0) = δSB is P r(δB ! δSB). and after applying MC, is either equal to 0 or strictly It is equal to 1 if δB = 0 (i.e., B = B0). However, if δB 6= 0, greater than 4. We note Yi = MC(SR(SXi)). 2 4 then P r(δB ! δSB) 2 f 256 ; 256 g. • KeySchedule KS. The subkey at round 0 is the ini- The probability that δX becomes δSXr is equal to the tial key, i.e., K0 = K. For each round i 2 [0; r − 1], product of all P r(δB ! δSB) such that δB (resp. δSB) the subkey Ki+1 is generated from Ki by applying KS. is a byte difference before (resp. after) passing through 0 0 It first replaces each byte Ki[j][3] of the last column by the S operator when ciphering X with K and X with K , S(Ki[j][3]) (where S is the SubBytes operator), and i.e., the product of all P r(δXi[j][k] ! δSXi[j][k]) and all we note SKi[j][3] = S(Ki[j][3]).