
Improved Davies-Murphy’s Attack on DES Revisited

Yi Lu1⋆ and Yvo Desmedt2⋆⋆

1 National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences, Beijing, China 2 The University of Texas at Dallas, USA & University College London, UK

Abstract. DES is a famous 64-bit block cipher with balanced Feistel structure. It consists of 16 rounds. The key has 56 bits and the round key has 48 bits. Two major techniques (namely, linear cryptanalysis and differential cryptanalysis) were notably developed and successfully applied to the full 16-round DES in the early 1990’s. Davies-Murphy’s attack can be seen as a special linear attack, which was developed before the invention of linear cryptanalysis. It was improved by Biham and Biryukov and most recently by Jacques and Muller. In this paper, we revisit the recent improved Davies-Murphy’s attack by Jacques and Muller from an algorithmic point of view. Based on Matsui’s algorithm 2, we give an improved attack algorithm. Our improved attack algorithm works in time $O(2^{41})$ with memory $O(2^{33})$. In contrast, Jacques-Muller’s attack takes time $O(2^{43})$ and memory $O(2^{35})$. It seems that our results of the time and memory complexities are optimal, due to the use of Walsh transform. Meanwhile, we generalize and further improve the results of the improved Matsui’s algorithm 2 for the case that the subkeys are XORed into the round function.

Keywords: DES, block cipher, Davies-Murphy’s attack, linear cryptanalysis, Matsui’s algorithm 2, Walsh transform

1 Introduction

DES is one of the most famous block ciphers [13]. It has been studied for 30 years and is still the subject of advanced cryptanalysis research today. The two major cryptanalysis techniques are linear cryptanalysis and differential cryptanalysis. Both proved successful against the full 16-round DES ([2,12]). In addition, another cryptanalysis technique (i.e., algebraic attacks) has emerged in the last decade, which works on 6 rounds of DES [5].

⋆ Supported by the National Science and Technology Major Project under Grant No. 2012ZX01039-004, and the National Natural Science Foundation of China under Grant No. 61170072. Part of this work was done while funded by British Telecommunications under Grant No. ML858284/CT506918.
⋆⋆ Part of this work was done while funded by EPSRC EP/C538285/1 and by BT, as BT Chair of Information Security.

Davies-Murphy’s attack [6] can be seen as a special linear attack, which was developed in the 1980’s before the invention of linear cryptanalysis (cf. [4]). It was improved by Biham and Biryukov [1] and most recently by Jacques and Muller [9]. For a review of Davies-Murphy’s attacks we refer to [9], and we refer to [4] for strengthening DES against Davies-Murphy’s attacks.

In this paper, we revisit the improved Davies-Murphy’s attack [9] from an algorithmic point of view. Based on Matsui’s algorithm 2 [3,7], we give an improved attack algorithm. In summary, our improved attack algorithm works in time $O(2^{41})$ with memory $O(2^{33})$. In contrast, the attack [9] takes time $O(2^{43})$ and memory $O(2^{35})$. Due to the use of Walsh transform$^3$, it seems that our results of the time and memory complexities are optimal. Meanwhile, our results generalize and improve the results of the improved Matsui’s algorithm 2 in [3,7] for the case that the subkeys are XORed into the round function.

2 Related Works

In Fig. 1, we let the 32-bit $L_0, R_0$ be the left and right halves of the plaintext. Let $L_{16}, R_{16}$ be the left and right halves of the ciphertext. Similarly, $L_i, R_i$ denote the left and right halves of the DES output at Round $i$. By convention, the initial and final permutations of DES are ignored. The 48-bit subkey used for Round $i$ is denoted by $K_i$ (omitted in Fig. 1). Due to lack of space, we omit the detailed description of DES (cf. [13]). Let $\alpha$ = 0xa100c21 in hexadecimal representation$^4$. Let the 32-bit $A_i$ be the output of the DES round function $f$ at Round $i$. Define the bias (also called imbalance [8]) of a binary random variable $X$ by $|\Pr(X = 0) - \Pr(X = 1)|$. Recall that, due to Davies and Murphy [6], the bit $\alpha \cdot A_1 \oplus \beta \cdot K_1 = \alpha \cdot (L_0 \oplus L_1) \oplus \beta \cdot K_1$ has bias $2^{-3.4}$, with the subkey’s mask $\beta$ = 0xf0$^5$. As DES consists of 16 rounds, this one-round characteristic is iterated 8 times in the original Davies-Murphy’s attack [6]. That makes a total bias of $2^{-3.4 \times 8} = 2^{-27.2}$ by the Piling-up Lemma [12]. Later, Biham and Biryukov [1] proposed to use the technique of partial decryption to work with 15-round DES instead. Thus, the one-round characteristic is iterated 7 times. It makes an enlarged total bias of $2^{-3.4 \times 7} = 2^{-23.8}$. Recently, with the trick of chosen-plaintext strategy, Jacques and Muller [9] showed that partial decryption actually allows working with a further reduced 13-round DES. It thus makes the increased bias of $2^{-3.4 \times 6} = 2^{-20.4}$.

3 Note that Walsh transform and Fourier transform have been useful tools to aid linear cryptanalysis, e.g., [10, 11].
4 Throughout the paper, we always let bit 0 be the least significant bit.
5 The subkey’s mask $\beta$ corresponds to the highest 2 bits of the subkey’s 6-bit input to S-box S8 and the lowest 2 bits of the subkey’s 6-bit input to S-box S7.
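The bias figures quoted above follow from multiplying the one-round imbalance once per iteration. The following is a minimal sketch of that arithmetic, assuming independent rounds as in the Piling-up Lemma; the names `ONE_ROUND_LOG2_BIAS`, `total_log2_bias` and `xor_imbalance` are illustrative and do not come from the paper.

```python
# Illustrative sketch: names below are hypothetical, not taken from the paper.
ONE_ROUND_LOG2_BIAS = -3.4  # log2 of the one-round bias quoted from [6]


def total_log2_bias(iterations: int) -> float:
    """log2 of the combined bias when the one-round characteristic is iterated."""
    return ONE_ROUND_LOG2_BIAS * iterations


def xor_imbalance(e1: float, e2: float) -> float:
    """Imbalance |Pr(X=0) - Pr(X=1)| of X1 xor X2 for independent bits.

    Assumes Pr(Xj = 0) = (1 + ej) / 2; the result equals e1 * e2.
    """
    p1, p2 = (1 + e1) / 2, (1 + e2) / 2
    p_zero = p1 * p2 + (1 - p1) * (1 - p2)
    return abs(2 * p_zero - 1)


for label, n in [("original attack [6]", 8),
                 ("15-round variant [1]", 7),
                 ("13-round variant [9]", 6)]:
    print(f"{label}: {n} iterations, bias 2^{total_log2_bias(n):.1f}")
```

With the imbalance definition used here, no extra power-of-two factor appears in the product, which is why the exponents simply add.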

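Fig. 1. The untwisted view of DES. (Diagram: the halves $L_0, R_0$ enter at the top, the round function $f$ is applied in 16 successive rounds with outputs $A_1, A_2, \ldots, A_{16}$, and the halves $L_{16}, R_{16}$ leave at the bottom.)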

3 Our Improved Algorithm for Jacques-Muller’s Core Attack

As $\alpha \cdot A_i$ equals the XOR of the four output bits of S-box S7 and the output bits 0, 2, 3 of S-box S8, let the Boolean function $g(\ell_{16} \oplus k_{16})$ compute the bit $\alpha \cdot A_{16}$. Here, the 12-bit $\ell_{16}, k_{16}$ denote the inputs to S-boxes S7, S8 at Round 16, and $\ell_{16}$ is obtained by bit expansion from 10 bits of $L_{16}$.

On the other hand, we let the 24-bit $r_0, k_1$ denote the inputs to S-boxes S5 to S8 at Round 1, where $r_0$ is obtained by bit expansion from 18 bits of $R_0$. Let the 12-bit $\ell_0, k_2$ denote the inputs to both S-boxes S7, S8 at Round 2, where $\ell_0$ is obtained by bit expansion from 10 bits of $L_0$. We let the Boolean function

$$h_{k_1,k_2}(\ell_0, r_0) = h(\ell_0 \oplus k_2 \oplus h'(r_0 \oplus k_1)) \tag{1}$$

compute the bit $\alpha \cdot A_2$. Here, $h'(r_0 \oplus k_1)$ maps 12 bits to 12 bits, 5 bits$^6$ of which can be determined from $k_1, r_0$, and we define this function by $h''(r_0 \oplus k_1)$. Further, the trick of chosen-plaintext strategy [9] ensures that the remaining 7 bits of the 12-bit output of $h'(r_0 \oplus k_1)$ are always fixed, which was considered an intermediate variable of 7 bits (denoted by $x$ herein$^7$) in [9]. For our convenience, we let

$$h'(r_0 \oplus k_1) = P(h''(r_0 \oplus k_1) \,\|\, x), \tag{2}$$

where $\|$ denotes string concatenation and $P$ is the bit permutation function. Given the plaintext and ciphertext pair $(L_0, R_0, L_{16}, R_{16})$ (and we use the superscript $i$ to denote each sample), define$^8$ the binary function

$$F^{x}_{k_1,k_2,k_{16}}(L_0, L_{16}, R_0, R_{16}) = (-1)^{\alpha \cdot (R_0 \oplus R_{16}) \,\oplus\, g(\ell_{16} \oplus k_{16}) \,\oplus\, h_{k_1,k_2}(\ell_0, r_0)}. \tag{3}$$

Clearly, with the correct $x, k_1, k_2, k_{16}$, the right-hand side of (3) is equal to $(-1)^{\alpha \cdot (A_4 \oplus A_6 \oplus A_8 \oplus A_{10} \oplus A_{12} \oplus A_{14})}$. Jacques-Muller’s core attack idea [9] is shown in Algorithm 1, which aims at partial key-recovery of $x, k_1, k_2, k_{16}$. As direct computation of Algorithm 1 is impractical, [9] proposed techniques to decompose it into several steps: at each step, by guessing a few key bits, some intermediate information can be derived which allows getting rid of the old precomputation table. The optimization techniques [9, Sect. 3.4] solve Algorithm 1 with total time $O(2^{43})$ and table size $O(2^{35})$. It is worth noting that if $x, k_1, k_2, k_{16}$ (of 7 + 24 + 12 + 12 = 55 bits in total) were linearly independent, we could apply the improved Matsui’s algorithm 2 in [7, Sect. 4] to solve Algorithm 1 with time $O(3 \times 55 \times 2^{55})$ and memory $O(2^{55})$. Nonetheless, from the description of DES [13], $x, k_1, k_2, k_{16}$ are not linearly independent and [7] is not applicable. Inspired by the improved Matsui’s algorithm 2 [3,7], based on the use of Walsh transform, we now give another algorithm for Jacques-Muller’s core attack [9] in order to compute Algorithm 1 with reduced time and memory.

6 Because they are bit expansion from 4 bits of S5 to S8 outputs (i.e., output bit 1 of S5, output bit 2 of S6, output bit 3 of S7, output bit 2 of S8) at Round 1.
7 Note that the 7-bit $x$ actually is bit expansion from 6 unknown bits.
8 Note that $k_1, k_2, k_{16}, \ell_0, r_0, \ell_{16}$ simply are the bit selection functions of $K_1, K_2, K_{16}, L_0, R_0, L_{16}$ with reduced bit length respectively.

Algorithm 1 The core partial key-recovery idea of Jacques-Muller’s attack [9]
  for all $x, k_1, k_2, k_{16}$ do
    $u^{x}_{k_1,k_2,k_{16}} \leftarrow 0$
    compute $u^{x}_{k_1,k_2,k_{16}} = \sum_i F^{x}_{k_1,k_2,k_{16}}(L_0^i, L_{16}^i, R_0^i, R_{16}^i)$, with $F$ defined in (3)
  end for
  output the largest $u^{x}_{k_1,k_2,k_{16}}$ with $x, k_1, k_2, k_{16}$

First, we define two sets $E_0 = \{i : \alpha \cdot (R_0^i \oplus R_{16}^i) = 0\}$ and $E_1 = \{i : \alpha \cdot (R_0^i \oplus R_{16}^i) = 1\}$. So, we have $E_0 \cup E_1 = \{i\}$. For the set $E = E_0$, we let $G_1(\ell \oplus k_{16}) = (-1)^{g(\ell \oplus k_{16})}$ and $G_2(\ell) = \sum_{i \in E} \mathbf{1}_{\ell_{16}^i = \ell}$, where $\mathbf{1}$ is the indicator function. Meanwhile, by (1), (2), we know that the XOR (denoted by $v$) of $x$ and 7 bits out of $k_2$ becomes a new key material. The linear transformation (denoted by the 12-bit $k_2'$) of $x, k_2$ consists of the remaining 5 bits of $k_2$ and the 7-bit $v$, which allows us to rewrite the right-hand side of (1) as $H(\ell_0 \oplus k_2', r_0 \oplus k_1) = h(\ell_0 \oplus k_2' \oplus P(h''(r_0 \oplus k_1) \,\|\, 0))$, where $0$ is the all-zero vector of 7 bits. Let $H_1(\ell_0 \oplus k_2', r_0 \oplus k_1) = (-1)^{H(\ell_0 \oplus k_2', r_0 \oplus k_1)}$ and $H_2(b, c) = \sum_{i \in E} \mathbf{1}_{\ell_0^i = b,\, r_0^i = c}$. We compute

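$$\mu^{E_0}_{k_1,k_2',k_{16}} = \sum_{\ell,b,c} G_1(\ell \oplus k_{16}) \times G_2(\ell) \times H_1(b \oplus k_2', c \oplus k_1) \times H_2(b, c) = (G_1 \otimes G_2(k_{16})) \cdot (H_1 \otimes H_2(k_2', k_1))$$

Here, $\otimes$ denotes convolution. Next, for the other set $E = E_1$, we repeat the above computation to obtain $\mu^{E_1}_{k_1,k_2',k_{16}}$. We can easily check that we have

$$u^{x}_{k_1,k_2,k_{16}} = \mu^{E_0}_{k_1,k_2',k_{16}} - \mu^{E_1}_{k_1,k_2',k_{16}}. \tag{4}$$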
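To make the role of the convolution concrete, here is a toy sketch of a single factor of the form $G_1 \otimes G_2$ together with the $E_0$/$E_1$ sign split. It uses a random 6-bit Boolean function in place of $g$ and random samples in place of real DES data, so it only illustrates the counting trick, not the attack itself. All names (`fwt`, `xor_convolve`, `direct_convolve`, `counter`) are assumptions made for this example.

```python
import random


def fwt(values):
    """Unnormalised fast Walsh transform: out[x] = sum_y (-1)^(x.y) * values[y]."""
    a = list(values)
    n = len(a)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a


def xor_convolve(f1, f2):
    """(f1 (*) f2)(k) = sum_l f1(l ^ k) * f2(l), using three Walsh transforms."""
    n = len(f1)
    product = [p * q for p, q in zip(fwt(f1), fwt(f2))]
    return [v // n for v in fwt(product)]  # exact: every entry is a multiple of n


def direct_convolve(f1, f2):
    """Same quantity by brute force, for cross-checking on small inputs."""
    n = len(f1)
    return [sum(f1[l ^ k] * f2[l] for l in range(n)) for k in range(n)]


random.seed(1)
BITS = 6                      # toy width; the paper's tables are far larger
SIZE = 1 << BITS
g = [random.getrandbits(1) for _ in range(SIZE)]   # stand-in for g
G1 = [1 - 2 * bit for bit in g]                    # (-1)^g
# Each toy sample is (l, e): l plays the role of ell_16, e the bit that
# decides membership in E0 (e = 0) or E1 (e = 1).
samples = [(random.randrange(SIZE), random.getrandbits(1)) for _ in range(500)]


def counter(tag):
    """G2 for one set: how many samples with the given tag take each value l."""
    table = [0] * SIZE
    for l, e in samples:
        if e == tag:
            table[l] += 1
    return table


mu0 = xor_convolve(G1, counter(0))
mu1 = xor_convolve(G1, counter(1))
u_fast = [a - b for a, b in zip(mu0, mu1)]
# Brute-force correlation over all samples and all key guesses.
u_slow = [sum((-1) ** (e ^ g[l ^ k]) for l, e in samples) for k in range(SIZE)]
print(u_fast == u_slow)
```

The brute-force loop touches every sample for every key guess, whereas the transform-based version touches the samples once to build the counters and then works only on tables of size $2^{6}$.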

As $x, k_2$ are combined into $k_2'$ as aforementioned, we denote $u^{x}_{k_1,k_2,k_{16}}$ by $u'_{k_1,k_2',k_{16}}$ from now on.

We now discuss how to find the largest $u'$ with the corresponding $k_1, k_2', k_{16}$. First, we consider the simplified case when $k_1, k_2', k_{16}$ (of total 24 + 12 + 12 = 48 bits) are linearly independent. For each set of $E_0, E_1$, computing$^9$ and storing the tables of $G_1 \otimes G_2$ and $H_1 \otimes H_2$ needs time $O(3 \times 12 \times 2^{12})$ and $O(3 \times 36 \times 2^{36})$ respectively. Then, for all 48-bit $k_1, k_2', k_{16}$, computing $u'_{k_1,k_2',k_{16}}$ by (4) and finding the largest needs time $O(2^{48})$. The overall time cost is $O(2^{48})$ and the memory cost is dominated by $O(2 \times 2^{36})$ in order to store two tables of $H_1 \otimes H_2$. In comparison, using the results of [7], we would need more time $O(3 \times 48 \times 2^{48})$ and higher memory $O(2^{48})$.

9 Note that convolution can be computed by three Fast Walsh Transforms.

From the description of DES [13], the 24-bit $k_1$ and the 12-bit $k_2'$ make a total of 33 bits rather than 24 + 12 = 36 bits; $k_1, k_2', k_{16}$ make a total of 35 bits rather than 48 bits. Thus, we propose to proceed as follows. For each set of $E_0, E_1$, computing and storing the table of $G_1 \otimes G_2$ needs the same time $O(3 \times 12 \times 2^{12})$ as before, as only $k_{16}$ is involved. With regard to $H_1 \otimes H_2$ for $k_1, k_2'$, we do not need the complete table of results for all 36 bits. We are only interested in the results over $GF(2)^{33}$. We use the techniques of linear transformation below to solve our problem with time $O(3 \times 33 \times 2^{33})$ and memory $O(2^{33})$, for each set of $E_0, E_1$.

Useful Techniques on Walsh Transforms: For a real function $F$ over $GF(2)^L$, recall that the Walsh transform of $F$ is defined as $\hat{F}(x) = \sum_{x'} (-1)^{x \cdot x'} F(x')$, for all $x \in GF(2)^L$. Note that the order of the bit positions does not affect the results of Walsh transforms. That is, we denote $x$ by $x_0, x_1, \ldots, x_{L-2}, x_{L-1}$, and given a fixed bit permutation $p$ over $\{0, 1, 2, \ldots, L-1\}$, we let $y = x_{p(0)}, x_{p(1)}, \ldots, x_{p(L-2)}, x_{p(L-1)}$ (which is a special bijection of $x$). We define another real function $F'(y) = F(x)$ for all $y \in GF(2)^L$. It is easy to see that $\hat{F}(x) = \hat{F'}(y)$ for all $x \in GF(2)^L$. In the case of DES, we have $L = 36$. Denote our target real function by $F$ with the input $x = x_0, x_1, \ldots, x_{34}, x_{35}$. Due to the dependency of the key bits in DES, we can use the above property to arrange the bit order of the input (without affecting the results of Walsh transforms) by a bit permutation $p$ over $\{0, 1, 2, \ldots, 35\}$ such that the three redundant bit positions are placed at bits 33, 34, 35.
This way, we are interested only in the results over all $y_0, y_1, \ldots, y_{34}, y_{35}$ with $y_{33} = y_{34} = y_{35} = 0$, where $y = y_0, y_1, \ldots, y_{34}, y_{35} = x_{p(0)}, x_{p(1)}, \ldots, x_{p(34)}, x_{p(35)}$ and $F'(y) = F(x)$ for all $y \in GF(2)^{36}$. Now, we let $f(a) = \sum_{y : a = (y \gg 3)} F'(y)$ for all $a \in GF(2)^{33}$, where $\gg$ denotes the (non-cyclic) bit shift (to the right) operation. We now show that $\hat{f}(a) = \hat{F'}(a \ll 3)$ for all 33-bit $a$, where $\ll$ denotes the (non-cyclic) bit shift (to the left) operation. To prove this, we check that we have

$$\hat{f}(a) = \sum_{a' \in GF(2)^{33}} (-1)^{a \cdot a'} \sum_{y' \in GF(2)^{36} : (y' \gg 3) = a'} F'(y') = \sum_{a' \in GF(2)^{33}} \; \sum_{y' \in GF(2)^{36} : (y' \gg 3) = a'} (-1)^{a \cdot (y' \gg 3)} F'(y').$$

As $(a \ll 3) \cdot y' \equiv a \cdot (y' \gg 3)$ holds for all 33-bit $a$ and 36-bit $y'$, we finally have $\hat{f}(a) = \sum_{y' \in GF(2)^{36}} (-1)^{(a \ll 3) \cdot y'} F'(y')$, which completes our proof.

Consequently, we have shown that computing $\hat{f}(a)$ for all 33-bit $a$ is equivalent to computing $\hat{F'}(y)$ for all 33-bit $y_0, y_1, \ldots, y_{32}$ (with $y_{33} = y_{34} = y_{35} = 0$). This can be done with time $O(33 \times 2^{33})$ and memory $O(2^{33})$. After that, to find the largest $u'$, an exhaustive search over all 35-bit $k_1, k_2', k_{16}$ will do with time $O(2^{35})$. We give our algorithm of the partial key-recovery attack in Algorithm 2. The total memory cost is dominated by $O(2^{33})$ for computing and storing $H_1 \otimes H_2$, and the time is dominated by $O(2 \times 3 \times 33 \times 2^{33})$ to compute $H_1 \otimes H_2$ for the two sets $E_0, E_1$, i.e., $O(2^{40.6})$. Consequently, our results improve Jacques-Muller’s attack [9], which needs time $O(2^{43})$ and memory $O(2^{35})$.

Note that this partial key-recovery attack recovers $k_1, k_2', k_{16}$, which contain 28 bits of the 56-bit key and the 7-bit $v$. Then the 7-bit $x$ can be deduced from $v$ and the recovered key. When the data amount is not sufficiently large and the correct key is not ranked No. 1, it is clear that our attack algorithm can obtain the top $n$ candidates with the same time complexity. After that, as discussed in [9], for each of the $n$ candidates, the remaining 56 − 28 = 28 bits of the key (containing only 28 − 7 = 21 unknown bits due to the recovery of the 7-bit $x$) can be found by exhaustive search using one pair of plaintext and ciphertext. Finally, for the suggested complete attack [9], which recovers the key with $O(2^{45})$ chosen plaintexts, time $O(2^{43})$ and memory $O(2^{35})$, our algorithm works in time $O(2^{41})$ with memory $O(2^{33})$ given the same amount of data.
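The folding identity $\hat{f}(a) = \hat{F'}(a \ll 3)$ used above can be checked numerically on a small instance. The sketch below assumes an 8-bit table with 3 redundant low-order positions (instead of 36 and 3), fills it with arbitrary integers, and compares the Walsh transform of the folded 5-bit table against the matching entries of the full transform. The helper names `fwt` and `fold_low_bits` are illustrative only.

```python
import random


def fwt(values):
    """Unnormalised fast Walsh transform: out[x] = sum_y (-1)^(x.y) * values[y]."""
    a = list(values)
    n = len(a)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a


def fold_low_bits(table, drop):
    """f(a) = sum of table[y] over all y with (y >> drop) == a."""
    folded = [0] * (len(table) >> drop)
    for y, value in enumerate(table):
        folded[y >> drop] += value
    return folded


random.seed(7)
L, DROP = 8, 3                                    # toy sizes, not the paper's 36 and 3
F_prime = [random.randint(-5, 5) for _ in range(1 << L)]

full = fwt(F_prime)                               # transform over all 2^L points
small = fwt(fold_low_bits(F_prime, DROP))         # transform over 2^(L-DROP) points

identity_holds = all(small[a] == full[a << DROP] for a in range(1 << (L - DROP)))
print(identity_holds)
```

Only the folded table has to be held in memory for the transform, which is the source of the drop from a $2^{36}$-entry table to a $2^{33}$-entry table in the DES setting.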

Algorithm 2 Our improved algorithm for Jacques-Muller’s core partial key-recovery attack [9]
  for each set of $E_0, E_1$ do
    compute and store $G_1 \otimes G_2$ and $H_1 \otimes H_2$
  end for
  for all $k_1, k_2', k_{16}$ do
    compute $u'_{k_1,k_2',k_{16}}$ by (4)
  end for
  output the largest $u'$ with $k_1, k_2', k_{16}$

4 Further Discussions

Following our attack algorithm (Algorithm 2), we see that our results actually generalize and improve the recent improved Matsui’s algorithm 2 in [3,7] when the subkeys are XORed into the round function (e.g., DES).

When the subkeys used for partial decryption are linearly independent and total $\ell$ bits, our results show that the time complexity is $\max(2^{\ell}, 3\ell_1 \cdot 2^{\ell_1}, 3\ell_2 \cdot 2^{\ell_2})$, where $\ell_1, \ell_2$ denote the total key bits to decrypt top down and bottom up respectively, and $\ell = \ell_1 + \ell_2$. We have the memory cost $\max(2^{\ell_1}, 2^{\ell_2})$. Note that the results of [3,7] would need time $O(3\ell \cdot 2^{\ell})$ and memory $O(2^{\ell})$, regardless of the values of $\ell_1, \ell_2$. As another example, for the attack on the 22-round block cipher SMS4 [7, Sect. 4.2] with time cost $2^{115.9}$ and memory $2^{112}$, here we have $\ell_1 = \ell_2 = 56$, $\ell = 112$. Accordingly, our results would need improved time $\max(2^{112}, 3 \times 56 \times 2^{56}) = 2^{112}$ computations, i.e., $2^{112}/22 = 2^{107.5}$ 22-round computations, with greatly decreased memory $O(2^{56})$.

The case where the subkeys used for partial decryption are linearly dependent, as we have studied above, was not considered in [3,7]. Let the subkeys involved consist of $\ell$ independent bits, and let $\ell_1, \ell_2$ denote the independent key bits to decrypt top down and bottom up respectively. Note that either $\ell = \ell_1 + \ell_2$ or $\ell \neq \ell_1 + \ell_2$ is possible now. We then have exactly the same results as above, which has been demonstrated to improve Jacques-Muller’s attack algorithm with $\ell = 35$, $\ell_1 = 33$, $\ell_2 = 12$.
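The complexity expressions in this section are easy to evaluate in base-2 logarithms. The sketch below does so for the DES and SMS4 parameters quoted above; the function names are illustrative, and the extra factor 2 for DES (one pass per set $E_0, E_1$) is taken from Section 3 rather than from the general formula.

```python
import math


def log2_time(l: int, l1: int, l2: int) -> float:
    """log2 of max(2^l, 3*l1*2^l1, 3*l2*2^l2)."""
    return max(l, math.log2(3 * l1) + l1, math.log2(3 * l2) + l2)


def log2_memory(l1: int, l2: int) -> int:
    """log2 of max(2^l1, 2^l2)."""
    return max(l1, l2)


def log2_time_prior(l: int) -> float:
    """log2 of 3*l*2^l, the cost attributed in the text to [3,7]."""
    return math.log2(3 * l) + l


# DES core attack: l = 35, l1 = 33, l2 = 12, computed once per set E0, E1.
des_time = log2_time(35, 33, 12) + 1
print(f"DES: time 2^{des_time:.1f}, memory 2^{log2_memory(33, 12)}")

# 22-round SMS4: l1 = l2 = 56, l = 112; divide by 22 to count 22-round units.
rounds = math.log2(22)
sms4_new = log2_time(112, 56, 56) - rounds
sms4_old = log2_time_prior(112) - rounds
print(f"SMS4: time 2^{sms4_new:.1f} (prior 2^{sms4_old:.1f}), "
      f"memory 2^{log2_memory(56, 56)}")
```

For DES the Walsh-transform term dominates, while for SMS4 the final $2^{\ell}$ search dominates, which is why only the memory drops sharply in the second case.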

5 Conclusion

In this paper, we revisit the improved Davies-Murphy’s attack [9] on DES from an algorithmic point of view. Our improved attack algorithm works in time $O(2^{41})$ with memory $O(2^{33})$. In contrast, the attack [9] takes time $O(2^{43})$ and memory $O(2^{35})$. Further, it seems that our results of the time and memory complexities are optimal, due to the use of Walsh transform. Meanwhile, our results generalize and improve the recent improved Matsui’s algorithm 2 in [3,7] for the case that the subkeys are XORed into the round function.

References

1. E. Biham, A. Biryukov, An improvement of Davies’ attack on DES, EUROCRYPT 1994, LNCS vol. 950, pp. 461-467, 1995.
2. E. Biham, A. Shamir, Differential cryptanalysis of the full 16-round DES, CRYPTO 1992, LNCS vol. 740, pp. 487-496, 1992.
3. B. Collard, F.-X. Standaert, J.-J. Quisquater, Improving the time complexity of Matsui’s linear cryptanalysis, ICISC 2007, LNCS vol. 4817, pp. 77-88, 2007.
4. N. T. Courtois, G. Castagnos, L. Goubin, What do DES S-boxes say to each other?, IACR eprint, available at http://eprint.iacr.org/2003/184, 2003.
5. N. T. Courtois, G. V. Bard, Algebraic cryptanalysis of the data encryption standard, IACR eprint, available at http://eprint.iacr.org/2006/402, 2006.
6. D. Davies, S. Murphy, Pairs and triplets of DES S-boxes, Journal of Cryptology, vol. 8, no. 1, pp. 1-25, 1995.
7. J. Etrog, M. J. B. Robshaw, The cryptanalysis of reduced-round SMS4, SAC 2008, LNCS vol. 5381, pp. 51-65, 2009.
8. C. Harpes, J. L. Massey, Partitioning cryptanalysis, FSE 1997, LNCS vol. 1267, pp. 13-27, 1997.
9. S. Jacques, F. Muller, New improvements of Davies-Murphy cryptanalysis, ASIACRYPT 2005, LNCS vol. 3788, pp. 425-442, 2005.
10. Y. Lu, Y. Desmedt, Bias analysis of a certain problem with applications to E0 and Shannon cipher, ICISC 2010, LNCS vol. 6829, pp. 16-28, 2011.
11. Y. Lu, H. Wang, S. Ling, Cryptanalysis of Rabbit, ISC 2008, LNCS vol. 5222, pp. 204-214, 2008.
12. M. Matsui, Linear cryptanalysis method for DES cipher, EUROCRYPT 1993, LNCS vol. 765, pp. 386-397, 1994.
13. A. J. Menezes, P. C. van Oorschot, S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1996.
