
Related-Key Boomerang and Rectangle Attacks⋆,⋆⋆


Jongsung Kim1, Seokhie Hong2, Bart Preneel3, Eli Biham4, Orr Dunkelman3,5,7,⋆⋆⋆, and Nathan Keller6,7,†

1 Division of e-Business, Kyungnam University, 449, Wolyeong-dong, Masan, Kyungnam, 631-701, Korea, [email protected]
2 Center for Information Security Technologies (CIST), Korea University, Anam Dong, Sungbuk Gu, Seoul, Korea, [email protected]
3 Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT/SCD-COSIC, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium, [email protected]
4 Computer Science Department, Technion, Haifa 32000, Israel, [email protected]
5 École Normale Supérieure, Département d'Informatique, 45 rue d'Ulm, 75230 Paris, France.
6 Einstein Institute of Mathematics, Hebrew University, Jerusalem 91904, Israel
7 Faculty of Mathematics and Computer Science, Weizmann Institute of Science, P.O. Box 26, Rehovot 76100, Israel, {orr.dunkelman,nathan.keller}@weizmann.ac.il

Abstract. This paper introduces the related-key boomerang and the related-key rectangle attacks. These new attacks can expand the cryptanalytic toolbox, and can be applied to many block ciphers. The main advantage of these new attacks is the ability to exploit the related-key model twice. Hence, even ciphers which were considered resistant to either boomerang or related-key differential attacks may be broken using the new techniques. In this paper we present a rigorous treatment of the related-key boomerang and the related-key rectangle distinguishers. Following this treatment, we devise optimal distinguishing algorithms using the LLR (Logarithmic Likelihood Ratio) statistics. We then analyze the success probability under reasonable independence assumptions, and verify the computation experimentally by implementing an actual attack on a 6-round variant of KASUMI. The paper ends with a demonstration of the strength of our new proposed techniques with attacks on 10-round AES-192 and the full KASUMI.

Keywords: Related-Key Boomerang Attack, Related-Key Rectangle Attack, AES, KASUMI.

⋆ This paper is partially based on the papers [9, 10, 27] which appeared at EUROCRYPT 2005, ASIACRYPT 2005 and FSE 2007, respectively.
⋆⋆ This paper was submitted to a journal in October 2009.
⋆⋆⋆ The fifth author was partially supported by the Clore scholarship programme and in part by the Concerted Research Action (GOA) Ambiorics 2005/11 of the Flemish Government and by the IAP Programme P6/26 BCRYPT of the Belgian State (Belgian Science Policy).
† The sixth author was partially supported by the Adams Fellowship Program of the Israel Academy of Sciences and Humanities.

1 Introduction

Related-key differentials are an extension of differentials, where the adversary is allowed control over the key difference along with control over the plaintext/ciphertext differences [23]. The additional control gives the adversary the possibility to cancel differences that enter the nonlinear parts of the cipher, and as a result, the probability of the differential is increased.

While the use of related-key differentials in differential attacks has been studied for more than a decade, the idea of using related-key differentials in more complex attacks has not been studied as extensively. Although techniques like related-key impossible differential [22] and related-key differential-linear cryptanalysis [11] were used to attack specific ciphers, no systematic analysis was suggested.

In this paper we examine the applicability of related-key differentials in the boomerang and the rectangle attacks. We show that it is possible to change the differentials into related-key differentials, and allow the adversary to enjoy the related-key advantage twice, using two related-key differentials at the expense of using four keys for the attack.⁸

1.1 The Related-Key Model

The related-key model was introduced in [4, 28] and deals with attack scenarios where the adversary is given access to encryption under multiple unknown keys, such that the relation between them is known to (or even chosen by) the adversary. While this model might seem too strong, it has practical implications. Among other issues, a block cipher which is not secure against related-key attacks might fail as a hash function (for example, a related-key attack on the block cipher TEA [40], used as a hash function in Microsoft's Xbox architecture, was used to hack the system).

Related-key attacks were intensively studied in the last decade, both from the theoretical [2] and the practical [11, 12, 19, 22, 43] points of view. Immunity to related-key attacks is considered one of the security goals in the design of modern block ciphers [16].

⁸ We note that in some cases the number of keys remains two.

1.2 The Boomerang and the Rectangle Attacks

The boomerang attack [38] is a differential-based attack that uses two short differentials (of few rounds each) rather than one long differential (of many rounds). In an adaptive chosen plaintext and ciphertext process, the adversary constructs boomerang quartets by exploiting the short differentials. The attack was later transformed into a chosen plaintext variant named the amplified boomerang attack [25] (and then renamed the rectangle attack [7]). The transformation is done by a birthday-paradox argument, which leads to a higher data complexity, but still allows the use of two short differentials.

All of these attacks treat the distinguished part of the cipher $E$ as a decomposition into two sub-ciphers, $E = E_1 \circ E_0$, where in each of these two sub-ciphers some (relatively) high-probability differential exists. If the probability of the differential of $E_0$ is $p$ and the probability of the differential of $E_1$ is $q$, then the data complexity of the corresponding boomerang distinguisher is $O((pq)^{-2})$ adaptively chosen plaintexts and ciphertexts, and that of the rectangle distinguisher is $O(2^{n/2} \cdot (pq)^{-1})$ chosen plaintexts, where $n$ is the block size.

In the more complex variants of these attacks the use of multiple differentials is supported as long as they share the input difference in the differentials for $E_0$ (the first rounds) and share the output difference in the differentials for $E_1$ (the last rounds). This improvement reduces the data complexity of both attacks significantly.

1.3 Our Contributions

We consider the same conceptual division. Let the cipher $E$ be a concatenation of two sub-ciphers, i.e., $E = E_1 \circ E_0$. Furthermore, assume that there exist high-probability related-key differentials in $E_0$ and in $E_1$ (not necessarily with the same key difference). We show that in this scenario it is possible to apply related-key boomerang and rectangle attacks. The basic related-key boomerang and rectangle distinguishers are summarized in the following theorem:

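Theorem 1. Let $E = E_1 \circ E_0 : \{0,1\}^n \to \{0,1\}^n$ be a block cipher. Consider encryption with $E$ under a secret key $K$ and related keys whose differences with $K$ are chosen by the adversary. Let

\[ \hat{p} = \max_{\alpha \neq 0,\, \Delta K_0} \sqrt{\sum_{\beta} \left( \Pr_{P}\left[ E_{0,K}(P) \oplus E_{0,K \oplus \Delta K_0}(P \oplus \alpha) = \beta \right] \right)^2}, \]

\[ \hat{q} = \max_{\delta \neq 0,\, \Delta K_1} \sqrt{\sum_{\gamma} \left( \Pr_{C}\left[ E_{1,K}^{-1}(C) \oplus E_{1,K \oplus \Delta K_1}^{-1}(C \oplus \delta) = \gamma \right] \right)^2} = \max_{\delta \neq 0,\, \Delta K_1} \sqrt{\sum_{\gamma} \left( \Pr_{X}\left[ E_{1,K}(X) \oplus E_{1,K \oplus \Delta K_1}(X \oplus \gamma) = \delta \right] \right)^2}, \]

where $E_{0,K}(P)$ denotes the partial encryption of $P$ through $E_0$ under the key $K$ and $E_{1,K}^{-1}(C)$ denotes the partial decryption of $C$ through $E_1$ under the key $K$. Under independence assumptions between the differentials, for $c > 0$, given either

– $4c/(\hat{p}\hat{q})^2$ unique adaptively chosen plaintexts and ciphertexts, or
– $\sqrt{c} \cdot 2^{n/2+2}/(\hat{p}\hat{q})$ unique chosen plaintexts,

encrypted under four related keys of the form $K$, $K \oplus \Delta K_0$, $K \oplus \Delta K_1$, $K \oplus \Delta K_0 \oplus \Delta K_1$,⁹ it is possible to distinguish $E$ from a random permutation. The probability of success of the distinguisher is approximately $1 - e^{-c}/2$ (when $\hat{p}\hat{q}$ is sufficiently high).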
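The data requirements and the success rate stated in Theorem 1 are easy to evaluate numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and all quantities are handled as base-2 logarithms so that block sizes such as $n = 128$ do not overflow floating-point arithmetic.

```python
import math

def boomerang_data_log2(c, log2_pq):
    """log2 of 4c / (p_hat*q_hat)^2, the adaptive chosen plaintext/ciphertext count."""
    return 2 + math.log2(c) - 2 * log2_pq

def rectangle_data_log2(c, log2_pq, n):
    """log2 of sqrt(c) * 2^(n/2+2) / (p_hat*q_hat), the chosen plaintext count."""
    return 0.5 * math.log2(c) + n / 2 + 2 - log2_pq

def success_probability(c):
    """Approximate success rate 1 - e^(-c)/2 from Theorem 1."""
    return 1 - math.exp(-c) / 2

if __name__ == "__main__":
    # Illustrative (assumed) parameters: n = 128, p_hat*q_hat = 2^-60, c = 4.
    print(boomerang_data_log2(4, -60))
    print(rectangle_data_log2(4, -60, 128))
    print(success_probability(4))
```

The parameters in the demonstration are arbitrary illustrative values chosen so that $\hat{p}\hat{q}$ exceeds $2^{-n/2}$; they do not correspond to any attack in this paper.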

We present a rigorous treatment of the related-key boomerang and rectangle distinguishers. We devise the optimal distinguishing algorithms using the LLR metric, and compute their success rate. Using this analysis, we prove Theorem 1, along with an easy lemma that allows calculating lower bounds for $\hat{p}$ and $\hat{q}$ in practical ciphers.

As in other statistical attacks on block ciphers, the calculation of the success probability of our attack is based on some randomness assumptions. We state explicitly the assumptions we use and discuss their validity in various cases. To verify the validity of these assumptions we carried out computer experiments for the related-key boomerang attack on 6-round KASUMI [36].

We note that the analysis presented in our paper is also the first rigorous analysis of the boomerang/rectangle techniques themselves. Although these techniques were used many times in attacks, a rigorous analysis of them was not performed before.

After the theoretical treatment, we consider several improvements of the related-key boomerang and rectangle attacks:

1. The Use of Structures of Keys: We use structures of keys to overcome a wider range of algorithms. In ciphers with a nonlinear key schedule, a given key difference may cause many subkey differences, thus interfering with the construction of related-key differentials. Structures of keys can be used to reduce the effect of this event on the differentials.
2. The Use of Other Relations between the Keys: While XOR relations are common and inherent to the majority of differential-based related-key attacks, in some cases there are more suitable key relations (either due to the environment of the attack or in order to get higher probabilities of the differentials). We show that the proposed attacks are applicable when the XOR relations between the keys are replaced with different kinds of relations and discuss which relations induce feasible attacks.

⁹ In some cases $\Delta K_0 = \Delta K_1$. In these cases, there are small changes in the analysis, most notably the use of only two related keys.

We then compare the new attacks with previously proposed related-key techniques. We explore the advantages of the new attacks, and show that in many cases the related-key boomerang and the related-key rectangle attacks are significantly more effective than other related-key techniques, even if in the single-key scenario the boomerang and the rectangle attacks are inferior to the non-related-key techniques.

Finally, out of the many ciphers to which related-key boomerang and rectangle attacks were applied (to mention a few, IDEA, MISTY1, SHACAL-1, SHACAL-2, and XTEA), we present two cases that demonstrate the strength and the wide applicability of the new attacks. We chose to concentrate on KASUMI and AES, as these two ciphers demonstrate the advantages of using two pairs of related keys to overcome complex round functions (KASUMI) and using structures of keys to bypass a nonlinear key schedule (AES).

An Attack on 10-round AES-192. The Advanced Encryption Standard (AES) [32] is a 128-bit block cipher with a variable key length (128, 192, and 256-bit keys are supported). Since its selection, AES gradually became one of the most widely used and analyzed block ciphers. The cipher has received a great deal of cryptanalytic attention, both during the AES process, and even more after its selection.

We present a related-key rectangle attack on 10-round AES-192 requiring $2^{119.2}$ chosen plaintexts, each encrypted under one of 64 related keys, and a time complexity of $2^{185.2}$ memory accesses. Our attack uses structures of 64 keys in order to overcome the nonlinearity of the AES key schedule.¹⁰

An Attack on the Full 8-Round KASUMI. KASUMI is an 8-round Feistel block cipher used in the confidentiality and the integrity algorithms of some 3GPP mobile communications. Since the 3GPP mobile communications are used by millions of customers, KASUMI is one of the most widely used block ciphers.

We start by presenting a simple related-key boomerang attack on 6-round KASUMI, which has practical data and time complexities. We then present a related-key rectangle attack on the full 8-round KASUMI requiring $2^{54.6}$ chosen plaintexts and $2^{73.6}$ encryptions.

The cases of AES and KASUMI show the advantage of the related-key boomerang/rectangle attack over the other related-key attacks. While the other techniques can attack the same number of rounds as the best single-key attacks (8 rounds for AES-192 and 6 rounds for KASUMI), the related-key boomerang/rectangle attacks can attack either two more rounds (10 rounds for AES-192 and the full 8-round KASUMI), or the same number of rounds with a significantly lower complexity. We summarize our results along with selected other results in Table 1.

¹⁰ We note that an independent related-key boomerang attack on 9-round AES-192 was presented recently in [20]. Also, related-key boomerang attacks on the full AES-192 and AES-256 were presented in [14] using a stronger model of related keys.

Table 1. Comparison of our Attacks with Selected Previous Results

Cipher        Attack                   Rounds   Keys   Data                        Time
KASUMI        Imp. Diff. [29]          6        1      $2^{55}$ CP                 $2^{100}$
(8 rounds)    RK Diff. [15]            6        2      $2^{18.6}$ RK-CP            $2^{113.6}$
              RK Boom. (Sect. 3.3)     6†       4      768 RK-ACPC                 1
              RK Rect. (Sect. 3.6)     8        4      $2^{54.6}$ RK-CP            $2^{73.6}$
              RK Rect. (Sect. 3.7)     8        4      $2^{38.6}$ RK-CP            $2^{104}$
AES-192       Partial Sums [19]        8        1      $2^{128} - 2^{119}$ CP      $2^{188}$
(12 rounds)   RK Diff.-Lin. [43]       8        2      $2^{118}$ RK-CP             $2^{165}$
              RK Imp. Diff. [42]       8        2      $2^{64.5}$ RK-CP            $2^{177}$
                                       8        2      $2^{88}$ RK-CP              $2^{153}$
                                       8        2      $2^{112}$ RK-CP             $2^{136}$
              RS Rectangle [14]        12       4      $2^{123}$ RS-CP             $2^{176}$
              RK Rect. (Sect. 4.4)     10       256    $2^{121.2}$ RK-CP           $2^{184.2}$ MA
              RK Rect. (Sect. 4.5)     10       64     $2^{119.2}$ RK-CP           $2^{185.2}$ MA

CP: chosen plaintexts, ACPC: adaptive chosen plaintexts and ciphertexts, RK: Related-Key, RS: Related-Subkey, †: , MA: memory accesses.
Time is measured in encryption units unless mentioned otherwise.

1.4 The Organization of the Paper

The paper is organized as follows: In Section 2 we present the related-key boomerang and rectangle attacks and discuss them theoretically. In Section 3 we apply the attacks to the full KASUMI. In Section 4 we apply the attacks to reduced-round AES-192. Finally, Section 5 summarizes the paper.

2 The Related-Key Boomerang and Rectangle Attacks

In this section we introduce the related-key boomerang and the related-key rectangle attacks. We start with a brief description of the boomerang and the rectangle attacks in the single-key model. We then introduce and rigorously analyze the related-key boomerang and rectangle attacks. Next, we examine the randomness assumptions used in the attacks. We conclude this section with several generalizations and comparisons of the newly proposed attacks.

2.1 Boomerang and Amplified Boomerang (Rectangle) Attacks

The main idea behind the boomerang attack [38] is to use two short differentials with high probabilities instead of one long differential with a low probability. We assume that a block cipher $E : \{0,1\}^n \times \{0,1\}^k \to \{0,1\}^n$ can be described as a cascade $E = E_1 \circ E_0$, such that for $E_0$ there exists a differential $\alpha \to \beta$ with probability $p$, and for $E_1$ there exists a differential $\gamma \to \delta$ with probability $q$.¹¹ The distinguisher is based on the following boomerang process:

1. Ask for the encryption of a pair of plaintexts $(P_1, P_2)$ such that $P_1 \oplus P_2 = \alpha$ and denote the corresponding ciphertexts by $(C_1, C_2)$.
2. Calculate $C_3 = C_1 \oplus \delta$ and $C_4 = C_2 \oplus \delta$, and ask for the decryption of the pair $(C_3, C_4)$. Denote the corresponding plaintexts by $(P_3, P_4)$.
3. Check whether $P_3 \oplus P_4 = \alpha$.

The boomerang attack uses the first differential ($\alpha \to \beta$) for $E_0$ with respect to the pairs $(P_1, P_2)$ and $(P_3, P_4)$, and the second differential ($\gamma \to \delta$) for $E_1$ with respect to the pairs $(C_1, C_3)$ and $(C_2, C_4)$.

For a random permutation the probability that the last condition is satisfied is $2^{-n}$, where $n$ is the block size. For $E$,¹² the probability that the pair $(P_1, P_2)$ is a right pair with respect to the first differential (i.e., the probability that the intermediate difference after $E_0$ equals $\beta$, as predicted by the differential) is $p$. The probability that both pairs $(C_1, C_3)$ and $(C_2, C_4)$ are right pairs with respect to the second differential is $q^2$. If all these are right pairs, then $E_1^{-1}(C_3) \oplus E_1^{-1}(C_4) = \beta = E_0(P_3) \oplus E_0(P_4)$. Thus, with probability $p$, $P_3 \oplus P_4 = \alpha$. Hence, the total probability of this quartet of plaintexts and ciphertexts to satisfy the condition $P_3 \oplus P_4 = \alpha$ is at least $(pq)^2$.

The attack can be mounted for all possible $\beta$'s and $\gamma$'s simultaneously (as long as $\beta \neq \gamma$). Therefore, a right quartet for $E$ is encountered with probability not less than $(\hat{p}\hat{q})^2$, where:

\[ \hat{p} = \sqrt{\sum_{\beta} \Pr{}^2[\alpha \to \beta]}, \quad \text{and} \quad \hat{q} = \sqrt{\sum_{\gamma} \Pr{}^2[\gamma \to \delta]}. \]

Using the boomerang process described above, the cipher $E$ can be distinguished from a random permutation given $O((\hat{p}\hat{q})^{-2})$ adaptively chosen plaintexts and ciphertexts, provided that $\hat{p}\hat{q} \gg 2^{-n/2}$. The complete analysis is given in [7, 8, 38]. We omit the analysis here since it is essentially included in the analysis of the related-key boomerang attack presented in Section 2.2.

¹¹ We note that in the attack, the differentials are used both in the forward (i.e., encryption), and in the backward (i.e., decryption) directions. As the considered differentials are not truncated differentials, the direction does not affect the probability of the differentials.
¹² For the analysis of $E$ we rely on some independence assumptions, addressed in Section 2.4.

As the boomerang distinguisher requires adaptively chosen plaintexts and ciphertexts, it cannot be combined with many of the standard techniques for using distinguishers in key recovery attacks. This led to the introduction of a chosen plaintext variant of the boomerang attack called the amplified boomerang attack [25], later renamed the rectangle attack [7]. The transformation of the boomerang attack into a chosen plaintext attack relies on standard birthday-paradox arguments. The key idea behind the transformation is to encrypt many plaintext pairs with input difference $\alpha$, and to look for quartets (i.e., pairs of pairs) that conform to the requirements of the boomerang process.

In the rectangle distinguisher, the adversary considers quartets of plaintexts of the form $((P_1, P_2 = P_1 \oplus \alpha), (P_3, P_4 = P_3 \oplus \alpha))$. A quartet is called a "right quartet" if the following conditions are satisfied:

1. $E_0(P_1) \oplus E_0(P_2) = \beta = E_0(P_3) \oplus E_0(P_4)$.
2. $E_0(P_1) \oplus E_0(P_3) = \gamma$ (which leads to $E_0(P_2) \oplus E_0(P_4) = \gamma$ if this condition holds along with the previous one).
3. $C_1 \oplus C_3 = \delta = C_2 \oplus C_4$.

The probability of a quartet to be a right quartet is a lower bound on the probability of the event

\[ C_1 \oplus C_3 = \delta = C_2 \oplus C_4. \tag{1} \]

The usual assumption is that each of the above conditions is independent of the rest, and hence the probability that a given quartet $((P_1, P_2), (P_3, P_4))$ is a right quartet is $p^2 \cdot 2^{-n-1} \cdot q^2$. Since for a random permutation the probability of Condition (1) is $2^{-2n}$, the rectangle process can be used to distinguish $E$ from a random permutation if $pq \gg 2^{-n/2}$ (as in the boomerang distinguisher).

The data complexity of the distinguisher is $O(2^{n/2}(pq)^{-1})$, which is much higher than the complexity of the boomerang distinguisher. The higher data complexity follows from the fact that the event $E_0(P_1) \oplus E_0(P_3) = \gamma$ occurs with a "random" probability of $2^{-n}$ (actually, this is the birthday-paradox argument used in the construction). The identification of right quartets is also more complicated than in the boomerang case, as instead of checking a condition on pairs, the adversary has to go over all the possible quartets. At the same time, the chosen plaintext nature allows using stronger key recovery techniques. An optimized method of finding the right rectangle quartets is presented in [8].

Like the boomerang attack, the rectangle attack can use all the possible $\beta$'s and $\gamma$'s simultaneously. This reduces the data complexity of the attack to $O(2^{n/2}(\hat{p}\hat{q})^{-1})$, where $\hat{p}$ and $\hat{q}$ are as defined above. The complete analysis of the rectangle attack is given in [7, 8].

2.2 The Related-Key Boomerang Attack

We now present the related-key boomerang distinguisher, and determine the conditions required for the distinguisher to succeed. Following a rigorous treatment we compute the optimal value of the threshold used in the distinguisher using the Logarithmic Likelihood Ratio method. Then we compute the success rate of the distinguisher using a Poisson approximation. In order to keep this section readable, we refrain from presenting a detailed analysis of the key-recovery attack algorithm. The reader is referred to [8] for a generic key-recovery attack algorithm exploiting the boomerang distinguisher (which is easily adapted to the related-key model), and to the specific attack algorithms presented in Sections 3 and 4.

First, we recall the definition of related-key differentials and introduce a shorthand used throughout this paper to denote them:

Definition 1. We say that a related-key differential $\alpha \to \beta$ with key difference $\Delta K$ holds for $E$ with probability $p$, if

\[ \Pr_{P,K}\left[ E_K(P) \oplus E_{K \oplus \Delta K}(P \oplus \alpha) = \beta \right] = p, \]

where $E_K(\cdot)$ denotes encryption through $E$ with the key $K$. For ease of exposition, we denote this event by $\Pr\left[\alpha \xrightarrow[\Delta K]{E} \beta\right] = p$. For the sake of simplicity, we shall denote the related-key differential by $\alpha \xrightarrow[\Delta K]{E} \beta$ or, when the cipher $E$ is implicit from the text, by $\alpha \xrightarrow[\Delta K]{} \beta$ (we alert the reader that this notation is not common).
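To make Definition 1 concrete, the probability of a related-key differential can be computed exhaustively for a very small cipher. The sketch below is an illustration only: the one-round cipher `toy_encrypt` and the 4-bit S-box are assumptions made for this example and are unrelated to the ciphers analyzed in this paper. It also shows the cancellation effect mentioned in the introduction: when $\alpha = \Delta K$, the difference entering the S-box is zero, so the differential $\alpha \to 0$ holds with probability 1.

```python
# An arbitrary 4-bit permutation used as the nonlinear layer (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def toy_encrypt(key, p):
    """Hypothetical one-round toy cipher on 4-bit blocks: E_K(P) = S[P xor K]."""
    return SBOX[p ^ key]

def rk_differential_probability(alpha, beta, delta_k):
    """Pr over all (P, K) that E_K(P) xor E_{K xor dK}(P xor alpha) equals beta."""
    hits = 0
    for key in range(16):
        for p in range(16):
            if toy_encrypt(key, p) ^ toy_encrypt(key ^ delta_k, p ^ alpha) == beta:
                hits += 1
    return hits / 256

if __name__ == "__main__":
    # The key difference cancels the plaintext difference before the S-box.
    print(rk_differential_probability(0x3, 0x0, 0x3))
    # Distribution of output differences for a non-cancelling key difference.
    print([rk_differential_probability(0x3, b, 0x5) for b in range(16)])
```

Because the probability is taken over both $P$ and $K$, the loop enumerates all 256 plaintext/key combinations; for a real cipher this would be replaced by sampling.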

In order to present the independence assumption used in the paper, we need another definition:

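Definition 2. For each related-key differential $\alpha \xrightarrow[\Delta K]{E} \beta$, we denote the set of right pairs with respect to the differential (for the given key $K$) by $G_K\left(\alpha \xrightarrow[\Delta K]{E} \beta\right)$. Formally, for a block cipher $E$ and a given key $K$,

\[ G_K\left(\alpha \xrightarrow[\Delta K]{E} \beta\right) = \left\{ P \;\middle|\; E_K(P) \oplus E_{K \oplus \Delta K}(P \oplus \alpha) = \beta \right\}. \]

Similarly, we define the set of good ciphertexts:

\[ G_K^{-1}\left(\alpha \xrightarrow[\Delta K]{E} \beta\right) = \left\{ E_K(P) \;\middle|\; P \in G_K\left(\alpha \xrightarrow[\Delta K]{E} \beta\right) \right\} = \left\{ C \;\middle|\; E_K^{-1}(C) \oplus E_{K \oplus \Delta K}^{-1}(C \oplus \beta) = \alpha \right\}. \]

Our independence assumption asserts that the sets of the form $G\left(\alpha \xrightarrow[\Delta K]{E} \beta\right)$ are independent, in the following sense: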

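Assumption 1. For the block cipher $E = E_1 \circ E_0$ under consideration, for any fixed key $K$, and for any set of differences $\alpha$, $\gamma_1$, $\delta$, $\Delta K_0$, and $\Delta K_1$, we assume that the event $X \in G_K\left(\gamma_1 \xrightarrow[\Delta K_1]{E_1} \delta\right)$ is independent of any combination of these three events:

1. $X \oplus \beta_1 \in G_{K \oplus \Delta K_0}\left(\gamma_2 \xrightarrow[\Delta K_1]{E_1} \delta\right)$ for all $\beta_1, \gamma_2$.
2. $X \in G_K^{-1}\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta_1\right)$ for all $\beta_1$.
3. $X \oplus \gamma_1 \in G_{K \oplus \Delta K_1}^{-1}\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta_2\right)$ for all $\beta_2$.

For example, our independence assumption asserts that

\[ \Pr\left[ X \in G_K\left(\gamma_1 \xrightarrow[\Delta K_1]{E_1} \delta\right) \;\middle|\; \left( X \oplus \beta_1 \in G_{K \oplus \Delta K_0}\left(\gamma_2 \xrightarrow[\Delta K_1]{E_1} \delta\right) \right) \wedge \left( X \in G_K^{-1}\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta_1\right) \right) \wedge \left( X \oplus \gamma_1 \in G_{K \oplus \Delta K_1}^{-1}\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta_2\right) \right) \right] = \Pr\left[ X \in G_K\left(\gamma_1 \xrightarrow[\Delta K_1]{E_1} \delta\right) \right]. \]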

This assumption is used implicitly in all the statements in the sequel. We discuss the assumption and its relation to the independence assumptions used in other techniques, such as differential and linear cryptanalysis, in Section 2.4.

The Related-Key Boomerang Distinguisher. Now we are ready to present the related-key boomerang distinguisher. Similarly to the boomerang attack, we treat the cipher $E$ as a cascade of sub-ciphers: $E = E_1 \circ E_0$. The distinguisher involves four different unknown (but related) keys: $K_a$, $K_b = K_a \oplus \Delta K_{ab}$, $K_c = K_a \oplus \Delta K_{ac}$, and $K_d = K_a \oplus \Delta K_{ab} \oplus \Delta K_{ac}$. For fixed values $\alpha$ and $\delta$, the attack algorithm is the following:

1. Choose $M$ plaintexts at random, and set a counter $C$ to zero. For each plaintext $P_a$, perform the following:
   (a) Compute $P_b = P_a \oplus \alpha$.
   (b) Ask for the ciphertexts $C_a = E_{K_a}(P_a)$ and $C_b = E_{K_b}(P_b)$.
   (c) Compute $C_c = C_a \oplus \delta$ and $C_d = C_b \oplus \delta$.
   (d) Ask for the plaintexts $P_c = E_{K_c}^{-1}(C_c)$ and $P_d = E_{K_d}^{-1}(C_d)$.
   (e) Check whether $P_c \oplus P_d = \alpha$. If yes, increase the value of the counter $C$ by 1.
2. If $C > \mathit{Threshold}$, output "The cipher $E$". Otherwise, output "Random Permutation".

The value of $\mathit{Threshold}$ will be specified later in this section. See Figure 1 for an outline of a right related-key boomerang quartet.

It is easy to see that for a random permutation, the probability that the condition $P_c \oplus P_d = \alpha$ is satisfied is $2^{-n}$. The probability that the condition is satisfied for $E$ is given in the following proposition:

Fig. 1. A Related-Key Boomerang Quartet

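[Figure 1: diagram of a quartet. The plaintext pairs $(P_a, P_b)$ and $(P_c, P_d)$ have difference $\alpha$; the intermediate pairs $(X_a, X_b)$ and $(X_c, X_d)$ have difference $\beta$, and $(X_a, X_c)$ and $(X_b, X_d)$ have difference $\gamma$; the ciphertext pairs $(C_a, C_c)$ and $(C_b, C_d)$ have difference $\delta$. The keys satisfy $K_a \oplus K_b = K_c \oplus K_d = \Delta K_{ab}$ and $K_a \oplus K_c = K_b \oplus K_d = \Delta K_{ac}$.]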

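Proposition 1. Consider a quartet $(P_a, P_b, P_c, P_d)$ constructed by the algorithm described above. We have

\[ \Pr[P_c \oplus P_d = \alpha] = \sum_{\beta_1 \oplus \beta_2 \oplus \gamma_1 \oplus \gamma_2 = 0} \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_1\right] \cdot \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_2\right] \cdot \Pr\left[\gamma_1 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right] \cdot \Pr\left[\gamma_2 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right]. \tag{2} \]

In particular,

\[ \Pr[P_c \oplus P_d = \alpha] \geq (\hat{p}\hat{q})^2, \tag{3} \]

where

\[ \hat{p} = \sqrt{\sum_{\beta'} \Pr{}^2\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta'\right]} \quad \text{and} \quad \hat{q} = \sqrt{\sum_{\gamma'} \Pr{}^2\left[\gamma' \xrightarrow[\Delta K_{ac}]{E_1} \delta\right]}. \]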

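Proof. Consider a quartet $(P_a, P_b, P_c, P_d)$ constructed by the algorithm. Denote the intermediate values $(E_0(P_a), E_0(P_b), E_0(P_c), E_0(P_d))$ (where the encryption is under the respective keys) by $(X_a, X_b, X_c, X_d)$, respectively. For all $\beta_1, \gamma_1, \gamma_2$, we say that the event $S_{\beta_1,\gamma_1,\gamma_2}$ occurs if the following conditions are satisfied:

\[ X_a \oplus X_b = \beta_1, \quad X_a \oplus X_c = \gamma_1, \quad X_b \oplus X_d = \gamma_2. \]

Since the events $\{S_{\beta_1,\gamma_1,\gamma_2}\}$ for different values of $(\beta_1, \gamma_1, \gamma_2)$ are disjoint and their union is the entire space, we have

\[ \Pr[P_c \oplus P_d = \alpha] = \sum_{\beta_1,\gamma_1,\gamma_2} \Pr\left[P_c \oplus P_d = \alpha \mid S_{\beta_1,\gamma_1,\gamma_2}\right] \cdot \Pr\left[S_{\beta_1,\gamma_1,\gamma_2}\right]. \tag{4} \]

If the event $S_{\beta_1,\gamma_1,\gamma_2}$ occurs, then

\[ X_c \oplus X_d = (X_c \oplus X_a) \oplus (X_a \oplus X_b) \oplus (X_b \oplus X_d) = \gamma_1 \oplus \beta_1 \oplus \gamma_2. \]

Hence, by the independence assumptions,

\[ \Pr\left[P_c \oplus P_d = \alpha \mid S_{\beta_1,\gamma_1,\gamma_2}\right] = \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_2\right], \tag{5} \]

where $\beta_2 = \gamma_1 \oplus \beta_1 \oplus \gamma_2$. Similarly, the three conditions forming the event $S_{\beta_1,\gamma_1,\gamma_2}$ are independent, and hence

\[ \Pr\left[S_{\beta_1,\gamma_1,\gamma_2}\right] = \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_1\right] \cdot \Pr\left[\gamma_1 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right] \cdot \Pr\left[\gamma_2 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right]. \tag{6} \]

Substituting Equations (5) and (6) into Equation (4) yields Equation (2). Finally, since all the terms are non-negative, restricting the sum to the terms with $\beta_1 = \beta_2$ and $\gamma_1 = \gamma_2$ gives

\[ \sum_{\beta_1 \oplus \beta_2 \oplus \gamma_1 \oplus \gamma_2 = 0} \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_1\right] \cdot \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_2\right] \cdot \Pr\left[\gamma_1 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right] \cdot \Pr\left[\gamma_2 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right] \]
\[ \geq \sum_{\beta_1 \oplus \beta_2 = 0,\; \gamma_1 \oplus \gamma_2 = 0} \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_1\right] \cdot \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_2\right] \cdot \Pr\left[\gamma_1 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right] \cdot \Pr\left[\gamma_2 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right] \]
\[ = \left( \sum_{\beta'} \Pr{}^2\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta'\right] \right) \left( \sum_{\gamma'} \Pr{}^2\left[\gamma' \xrightarrow[\Delta K_{ac}]{E_1} \delta\right] \right) = (\hat{p}\hat{q})^2, \]

and thus, Inequality (3) follows from Equation (2).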
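The relation between Equation (2) and the bound of Inequality (3) can be checked numerically on small synthetic distributions. The following is a sketch under stated assumptions: the lists `p0` and `q1` stand for the values $\Pr[\alpha \to \beta]$ and $\Pr[\gamma \to \delta]$ indexed by $\beta$ and $\gamma$, they are generated at random rather than taken from any cipher, and all function names are hypothetical. Exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
import random

def random_distribution(size, rng):
    """Random probability vector of the given length (synthetic data)."""
    weights = [rng.randrange(0, 10) for _ in range(size)]
    weights[rng.randrange(size)] += 1  # guarantee a non-zero total
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

def boomerang_probability(p0, q1):
    """Right-hand side of Equation (2); len(p0) must be a power of two."""
    size = len(p0)
    total = Fraction(0)
    for b1 in range(size):
        for b2 in range(size):
            for g1 in range(size):
                g2 = b1 ^ b2 ^ g1  # enforces b1 ^ b2 ^ g1 ^ g2 == 0
                total += p0[b1] * p0[b2] * q1[g1] * q1[g2]
    return total

def lower_bound(p0, q1):
    """(p_hat * q_hat)^2, the right-hand side of Inequality (3)."""
    return sum(p * p for p in p0) * sum(q * q for q in q1)

if __name__ == "__main__":
    rng = random.Random(0)
    p0 = random_distribution(8, rng)
    q1 = random_distribution(8, rng)
    print(boomerang_probability(p0, q1), ">=", lower_bound(p0, q1))
```

The two quantities coincide when only the terms with $\beta_1 = \beta_2$ and $\gamma_1 = \gamma_2$ contribute, for example when `p0` is concentrated on a single output difference.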

Proposition 1 shows that if $\hat{p}\hat{q} > 2^{-n/2}$, then the probability that the condition $P_c \oplus P_d = \alpha$ holds is higher for $E$ than for a random permutation, i.e., we expect more quartets in the case of $E$. We next compute the optimal choice of the value $\mathit{Threshold}$ used in the distinguisher.
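The counting step of the distinguisher, and the gap between $E$ and a random permutation, can be illustrated on a toy example. The sketch below is not the experiment reported in this paper: `toy_encrypt` is a deliberately linear 12-bit cipher invented for the illustration, so every related-key differential holds with probability 1 and every quartet returns, whereas for independent random permutations a quartet returns with probability about $2^{-12}$. The oracle functions receive the keys directly only for simplicity; in the attack model the adversary merely chooses the key differences.

```python
import random

N_BITS = 12                      # toy block and key size (assumption)
MASK = (1 << N_BITS) - 1

def rotl(x, r):
    return ((x << r) | (x >> (N_BITS - r))) & MASK

def rotr(x, r):
    return ((x >> r) | (x << (N_BITS - r))) & MASK

def toy_encrypt(key, p):
    """Hypothetical linear cipher E = E1 o E0 (illustration only)."""
    x = rotl(p ^ key, 3)                 # E0: key addition, rotation
    return rotl(x ^ rotl(key, 5), 7)     # E1: rotated-key addition, rotation

def toy_decrypt(key, c):
    x = rotr(c, 7) ^ rotl(key, 5)        # invert E1
    return rotr(x, 3) ^ key              # invert E0

class RandomPermutationOracle:
    """An independent random permutation for every key."""
    def __init__(self, rng):
        self.rng = rng
        self.tables = {}

    def _table(self, key):
        if key not in self.tables:
            perm = list(range(1 << N_BITS))
            self.rng.shuffle(perm)
            inv = [0] * len(perm)
            for x, y in enumerate(perm):
                inv[y] = x
            self.tables[key] = (perm, inv)
        return self.tables[key]

    def encrypt(self, key, p):
        return self._table(key)[0][p]

    def decrypt(self, key, c):
        return self._table(key)[1][c]

def rk_boomerang_count(enc, dec, ka, dk_ab, dk_ac, alpha, delta, m, rng):
    """Run steps 1(a)-(e) of the distinguisher m times and return the counter."""
    kb, kc, kd = ka ^ dk_ab, ka ^ dk_ac, ka ^ dk_ab ^ dk_ac
    count = 0
    for _ in range(m):
        pa = rng.randrange(1 << N_BITS)
        pb = pa ^ alpha
        ca, cb = enc(ka, pa), enc(kb, pb)
        cc, cd = ca ^ delta, cb ^ delta
        pc, pd = dec(kc, cc), dec(kd, cd)
        if pc ^ pd == alpha:
            count += 1
    return count

if __name__ == "__main__":
    rng = random.Random(1)
    args = (0x5A3, 0x111, 0x0F0, 0x801, 0x03C, 256, rng)
    oracle = RandomPermutationOracle(rng)
    print(rk_boomerang_count(toy_encrypt, toy_decrypt, *args))
    print(rk_boomerang_count(oracle.encrypt, oracle.decrypt, *args))
```

For a realistic cipher the counter for $E$ would be far below $M$, which is why the choice of the threshold matters.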

The Optimal Choice of Threshold. The optimal value of $\mathit{Threshold}$ can be found using the Likelihood Ratio test for the distributions representing $\Pr[P_c \oplus P_d = \alpha]$ for $E$ and for a random permutation. We use the following standard result:

Proposition 2 ([1], Proposition 1). Consider two distributions $D_0$ and $D_1$ assuming values in a finite set $Z$, and a sample $z^m$ of $m$ independent elements of $Z$ (represented as a vector in $Z^m$). The optimal test for deciding whether the sample is distributed according to $D_0$ or to $D_1$ is the test having acceptance region

\[ A_{D_0} = \{ z^m \in Z^m : LLR(z^m) \geq 0 \}, \]

where

\[ LLR(z^m) = \sum_{a \in Z} N(a \mid z^m) \log \frac{\Pr_{D_0}[a]}{\Pr_{D_1}[a]} \]

is the logarithmic likelihood ratio (with the convention that $\log(0/p) = -\infty$ and $\log(p/0) = \infty$), and where $N(a \mid z^m)$ is the number of times $a$ occurs in the sequence $z^m$.

Denote $p_0 = \Pr[P_c \oplus P_d = \alpha]$ (where $P_c$ and $P_d$ are constructed by the boomerang process). We apply Proposition 2, where $D_0$ and $D_1$ are the distributions representing $\Pr[P_c \oplus P_d = \alpha]$ for $E$ and for a random permutation, respectively. In this case, $Z = \{0, 1\}$, $m = M$, and both distributions represent Bernoulli random variables, where $D_0 = Ber(p_0)$ and $D_1 = Ber(2^{-n})$. Hence,

\[ LLR(z^M) = N(0 \mid z^M) \log \frac{1 - p_0}{1 - 2^{-n}} + N(1 \mid z^M) \log \frac{p_0}{2^{-n}}. \tag{7} \]

Since in our distinguisher the acceptance region of the test is $\{ z^M \in Z^M : N(1 \mid z^M) \geq \mathit{Threshold} \}$, the optimal value of $\mathit{Threshold}$ is $\min\{ k : f(k) \geq 0 \}$, where

\[ f(k) = (M - k) \log \frac{1 - p_0}{1 - 2^{-n}} + k \log \frac{p_0}{2^{-n}}. \]

A simple computation shows that the optimal value is

\[ \mathit{Threshold} = \left\lceil \frac{-\log \frac{1 - p_0}{1 - 2^{-n}}}{\log \frac{p_0 (1 - 2^{-n})}{(1 - p_0) 2^{-n}}} \cdot M \right\rceil. \tag{8} \]
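Equation (8) can be evaluated directly. The sketch below is an illustration with hypothetical function names; it assumes $2^{-n} < p_0 < 1$ and uses `math.log1p` so that the terms $\log(1 - p_0)$ and $\log(1 - 2^{-n})$ remain accurate when $p_0$ and $2^{-n}$ are tiny. The helper `f` is the function $f(k)$ defined above and can be used to confirm that the returned value is the smallest $k$ with $f(k) \geq 0$.

```python
import math

def log_ratios(p0, n):
    """Return (log((1-p0)/(1-2^-n)), log(p0/2^-n)), assuming 2^-n < p0 < 1."""
    log_a = math.log1p(-p0) - math.log1p(-(2.0 ** -n))
    log_b = math.log(p0) + n * math.log(2)
    return log_a, log_b

def f(k, m, p0, n):
    """The function f(k) whose sign determines the acceptance region."""
    log_a, log_b = log_ratios(p0, n)
    return (m - k) * log_a + k * log_b

def optimal_threshold(m, p0, n):
    """Threshold of Equation (8) for m quartets."""
    log_a, log_b = log_ratios(p0, n)
    # log(p0(1-2^-n) / ((1-p0)2^-n)) equals log_b - log_a.
    return math.ceil(-log_a * m / (log_b - log_a))

if __name__ == "__main__":
    n = 32
    p0 = 2.0 ** -28          # x = p0 / 2^-n = 16 (illustrative values)
    m = 4 / p0               # c = M * p0 = 4
    print(optimal_threshold(m, p0, n))
```

With these illustrative parameters ($x = 16$, $c = 4$) the computation agrees with the corresponding entry of Table 2.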

The Success Probability of the Distinguisher. We use the following standard definition of the success probability of a distinguisher (see, e.g., [1]):

Definition 3. Let $A$ be a distinguisher between distributions $D_0$ and $D_1$, such that for $j = 0, 1$, the statement $[A(D) = j]$ corresponds to "$D$ is distributed like $D_j$". The probability of success of $A$ is

\[ \Pr{}_s(A) = \frac{\Pr[A(D) = 0 \mid D = D_0] + \Pr[A(D) = 1 \mid D = D_1]}{2}. \]

Since the distinguisher counts the number of successes amongst $M$ trials, it actually distinguishes between the Binomial distributions $Bin(M, p_0)$ and $Bin(M, 2^{-n})$. Hence, given the value $\mathit{Threshold}$ (as computed in Equation 8), the success probability of the distinguisher is given by the formula:

\[ \Pr[\mathit{Success}] = \frac{1}{2} \Big[ \Pr[Bin(M, p_0) \geq \mathit{Threshold}] + \Pr[Bin(M, 2^{-n}) < \mathit{Threshold}] \Big] = \frac{1}{2} \left[ \sum_{k=\mathit{Threshold}}^{M} \binom{M}{k} p_0^k (1 - p_0)^{M-k} + \sum_{k=0}^{\mathit{Threshold}-1} \binom{M}{k} 2^{-nk} (1 - 2^{-n})^{M-k} \right]. \tag{9} \]

For a large value of $M$ (like the values usually used in attacks, as $M$ has to be at least $1/p_0$, and $p_0$ is in most cases very small), the Binomial distributions can be approximated by the Poisson distributions $Poi(p_0 M)$ and $Poi(2^{-n} M)$. Using this approximation, Equation (9) is simplified to:

\[ \Pr[\mathit{Success}] \approx \frac{1}{2} \left[ 1 - e^{-p_0 M} \sum_{k=0}^{\mathit{Threshold}-1} \frac{(p_0 M)^k}{k!} + e^{-2^{-n} M} \sum_{k=0}^{\mathit{Threshold}-1} \frac{(2^{-n} M)^k}{k!} \right]. \tag{10} \]

Denote $c = M p_0$, and $x = p_0 / 2^{-n}$. Equation (10) can be reformulated into:

\[ \Pr[\mathit{Success}] \approx \frac{1}{2} \left[ 1 - e^{-c} \cdot \left( \sum_{k=0}^{\mathit{Threshold}-1} \frac{c^k}{k!} \right) + e^{-c/x} \cdot \left( \sum_{k=0}^{\mathit{Threshold}-1} \frac{(c/x)^k}{k!} \right) \right]. \tag{11} \]

We note that in actual attacks, $c$ usually satisfies $1 \leq c \leq 100$, while the value $x$ varies significantly between different attacks. In Table 2, we give the optimal threshold and success rate for several common values of $c$ and $x$.

When $x$ tends to infinity, Equation (11) can be simplified, as $e^{-c/x}$ tends to 1. In other words, when $x \gg 1$, given $M = c \cdot p_0^{-1}$ quartets, a threshold of 1 is sufficient to achieve the following success rate:

\[ \Pr[\mathit{Success}] \approx \frac{1}{2} \left[ 1 - e^{-c} + 1 \right] = 1 - \frac{e^{-c}}{2}. \]

Table 2. Optimal Thresholds and Success Rates for Common Parameters

x       c = 1       c = 2       c = 3       c = 4       c = 6       c = 8       c = 16
2       1 (61.9%)   2 (66.5%)   Imp         Imp         Imp         Imp         Imp
4       1 (70.5%)   2 (75.2%)   2 (81.4%)   3 (84.1%)   Imp         Imp         Imp
10      1 (76.8%)   1 (84.2%)   2 (88.2%)   2 (92.3%)   3 (95.7%)   4 (97.4%)   Imp
16      1 (78.6%)   1 (87.4%)   2 (89.3%)   2 (94.1%)   3 (96.6%)   3 (98.6%)   6 (99.9%)
100     1 (81.1%)   1 (92.2%)   1 (96.0%)   1 (97.1%)   2 (99.0%)   2 (99.7%)   4 (99.99%)
200     1 (81.4%)   1 (92.7%)   1 (96.8%)   1 (98.1%)   2 (99.1%)   2 (99.8%)   4 (99.995%)
1000    1 (81.6%)   1 (93.1%)   1 (97.4%)   1 (98.9%)   1 (99.6%)   2 (99.8%)   3 (99.999%)
10000   1 (81.6%)   1 (93.2%)   1 (97.5%)   1 (99.1%)   1 (99.8%)   1 (99.9%)   2 (99.9998%)

The entry X (Y%) means that the optimal threshold is X and the success rate is Y.
Imp: it is impossible to gather the amount of data required in this case.
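The success rates in Table 2 follow from Equation (11) once the threshold is fixed. The short sketch below evaluates that equation; the function names are hypothetical, and the threshold is passed in explicitly rather than derived.

```python
import math

def poisson_below(mean, threshold):
    """Pr[Poi(mean) < threshold] = e^(-mean) * sum_{k < threshold} mean^k / k!."""
    return math.exp(-mean) * sum(mean ** k / math.factorial(k)
                                 for k in range(threshold))

def success_probability(c, x, threshold):
    """Equation (11) with c = M * p0 and x = p0 / 2^-n."""
    return 0.5 * (1 - poisson_below(c, threshold)
                  + poisson_below(c / x, threshold))

if __name__ == "__main__":
    print(round(100 * success_probability(1, 2, 1), 1))   # entry x = 2, c = 1
    print(round(100 * success_probability(2, 4, 2), 1))   # entry x = 4, c = 2
```

For example, with $x = 2$, $c = 1$ and a threshold of 1 the formula gives $\frac{1}{2}\left[1 - e^{-1} + e^{-1/2}\right] \approx 0.619$, in agreement with the first entry of the table.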

We note that while for attacks based on linear cryptanalysis the probability of success can be approximated using the Normal distribution (see, e.g., [1, 35]), in attacks based on differential cryptanalysis (like the attacks discussed in this paper) the Normal approximation may be inaccurate. The reason for the difference is that while in linear-based attacks the value of the measured random variable is big (close to $M/2$), in differential-based attacks the value is usually very small (e.g., $1 \leq \mathit{Threshold} \leq 10$). For such small values, the approximation of a random variable assuming only integer values by a Normal distribution is inaccurate, and hence approximation using a Poisson random variable is preferable.¹³

Practical Lower Bounds for $\hat{p}$ and $\hat{q}$. In practical attacks, the probability of the related-key boomerang distinguisher (given by Equation 2) cannot be computed. Moreover, even the computation of the lower bound given by Inequality (3) is infeasible in most cases. Instead, the adversary finds high-probability differential characteristics $\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta$ and $\gamma \xrightarrow[\Delta K_{ac}]{E_1} \delta$. Then, the adversary computes lower bounds for $\hat{p}$ and $\hat{q}$ by considering only part of the possible $\beta'$ and $\gamma'$ values. For example, she can take into consideration all the characteristics $\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta'$ that coincide with the characteristic $\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta$ in all the rounds except for the last one, and take all possible values in the output difference of the last round.

In certain cases, especially when a good differential cannot be found, the following simple proposition is useful as a generic lower bound for $\hat{p}$ and $\hat{q}$.

Proposition 3. Consider related-key differentials through $E_0$ with input difference $\alpha$ and key difference $\Delta K$. If there exist only $m$ differences $\beta'$ such that $\Pr\left[\alpha \xrightarrow[\Delta K]{E_0} \beta'\right] > 0$, then $\hat{p} \ge \sqrt{1/m}$. Moreover, equality holds if and only if all the $m$ differentials $(\alpha \xrightarrow[\Delta K]{E_0} \beta')$ with non-zero probability have probability $1/m$ each.

Proof. Recall that the Cauchy-Schwarz inequality asserts that for any two sequences $\{a_1, a_2, \ldots, a_m\}$ and $\{b_1, b_2, \ldots, b_m\}$ of non-negative numbers, we have

$$\sum_{i=1}^{m} a_i \cdot b_i \le \sqrt{\sum_{i=1}^{m} a_i^2} \cdot \sqrt{\sum_{i=1}^{m} b_i^2}.$$

Denote the probabilities of the differentials of the form $\alpha \xrightarrow[\Delta K]{E_0} \beta'$ by $p_1, p_2, \ldots, p_m$ (ignoring the differentials with zero probability). Clearly, we have

$$\sum_{i=1}^{m} p_i = 1, \quad \text{and} \quad \hat{p} = \sqrt{\sum_{i=1}^{m} p_i^2}.$$

We apply the Cauchy-Schwarz inequality for the sequences $\{p_1, p_2, \ldots, p_m\}$ and $\{1, 1, \ldots, 1\}$ and get

$$1 = \sum_{i=1}^{m} p_i \cdot 1 \le \sqrt{\sum_{i=1}^{m} p_i^2} \cdot \sqrt{\sum_{i=1}^{m} 1} = \hat{p}\sqrt{m},$$

and hence $\hat{p} \ge \sqrt{1/m}$, as asserted. Furthermore, since equality in the Cauchy-Schwarz inequality holds if and only if the sequences $\{a_i\}_{i=1}^{m}$ and $\{b_i\}_{i=1}^{m}$ are proportional (i.e., there exists $c$ such that $a_i = c \cdot b_i$ for all $i$), in our case equality holds if and only if all the $p_i$'s are equal.

13 In [35] the success probabilities of both a linear attack and a differential attack are approximated using the Normal distribution. The experiments presented in [35] show that the approximation is much more accurate in the case of linear cryptanalysis. It is possible that using a Poisson approximation would result in a better accuracy in the differential case, as explained above.

The generic lower bound given by Proposition 3 can be combined with a “good” differential for part of the rounds.

Proposition 4. Consider related-key differentials through $E_0$ with input difference $\alpha$ and key difference $\Delta K$. Assume that there exists a decomposition $E_0 = E_{01} \circ E_{00}$, and a difference $\alpha'$, such that:

1. $\Pr\left[\alpha \xrightarrow[\Delta K]{E_{00}} \alpha'\right] = p'$, and
2. There exist only $m$ differences $\beta'$ such that $\Pr\left[\alpha' \xrightarrow[\Delta K]{E_{01}} \beta'\right] > 0$.

Then $\hat{p} \ge p' \sqrt{1/m}$.

Proof. We compute a lower bound on $\hat{p}$ by considering only the characteristics $\alpha \xrightarrow[\Delta K]{E_0} \beta'$ for $E_0$ whose restriction to $E_{00}$ is $\alpha \xrightarrow[\Delta K]{E_{00}} \alpha'$. By the assumptions, there are only $m$ such differentials (ignoring differentials with probability zero), and the sum of their probabilities is $p'$. The assertion follows from the Cauchy-Schwarz inequality by the same argument as used in the proof of Proposition 3.
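As a small numerical illustration of Propositions 3 and 4, the following Python sketch computes $\hat{p}$ for a given list of differential probabilities and evaluates the two generic bounds. The helper names and the two example distributions are hypothetical; the values $p' = 1/4$ and $m = 2^{32}$ in the usage note are the ones that appear later for the 4-round KASUMI differential.

```python
import math

def p_hat(probabilities):
    """Square root of the sum of squared differential probabilities."""
    return math.sqrt(sum(p * p for p in probabilities))

def prop3_bound(m):
    """Generic lower bound of Proposition 3: sqrt(1/m)."""
    return math.sqrt(1.0 / m)

def prop4_bound_log2(log2_p_prime, log2_m):
    """log2 of the Proposition 4 bound p' * sqrt(1/m)."""
    return log2_p_prime - log2_m / 2.0

uniform = [0.25] * 4            # equality case of Proposition 3 (m = 4)
skewed = [0.7, 0.1, 0.1, 0.1]   # hypothetical non-uniform distribution
```

For the uniform list, `p_hat` meets `prop3_bound(4)` exactly, while the skewed list gives a strictly larger value. With $p' = 2^{-2}$ and $m = 2^{32}$, `prop4_bound_log2(-2, 32)` evaluates to -18, i.e., $\hat{p} \ge 2^{-18}$.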

Clearly, the same arguments apply also for the computation of $\hat{q}$. Propositions 3 and 4 are used in our attacks on KASUMI and AES-192, presented in Sections 3 and 4.

2.3 The Related-Key Rectangle Attack

The transformation of the related-key boomerang attack into the related-key rectangle attack is similar to the transformation of the boomerang attack to the rectangle attack in the single-key model. The related-key rectangle distinguisher involves four different unknown (but related) keys: $K_a$, $K_b = K_a \oplus \Delta K_{ab}$, $K_c = K_a \oplus \Delta K_{ac}$, and $K_d = K_a \oplus \Delta K_{ab} \oplus \Delta K_{ac}$. For fixed values $\alpha$ and $\delta$, the algorithm of the distinguisher is as follows:

1. Choose $M$ plaintexts $P_a$, and compute $P_b = P_a \oplus \alpha$. Ask for the ciphertexts $C_a = E_{K_a}(P_a)$ and $C_b = E_{K_b}(P_b)$.
2. Choose $M$ plaintexts $P_c$, and compute $P_d = P_c \oplus \alpha$. Ask for the ciphertexts $C_c = E_{K_c}(P_c)$ and $C_d = E_{K_d}(P_d)$.
3. Set a counter $C$ to zero.
4. For each of the $M^2$ choices for $(P_a, P_c)$ (and the corresponding $(P_b, P_d)$):
   (a) Check whether both conditions $C_a \oplus C_c = \delta$ and $C_b \oplus C_d = \delta$ are satisfied. If yes, increase the value of the counter $C$ by 1.
5. If $C >$ Threshold, output "The cipher E". Otherwise, output "Random Permutation".
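The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the four encryption oracles are passed in as plain callables, the function names are hypothetical, and the usage part plugs in a toy map $E_K(P) = P \oplus K$ that is not a secure cipher and serves only to exercise the counting logic.

```python
from itertools import product

def rectangle_count(enc_a, enc_b, enc_c, enc_d, alpha, delta,
                    plaintexts_a, plaintexts_c):
    """Count pairs (P_a, P_c) whose quartet satisfies both output conditions."""
    # Steps 1 and 2: encrypt (P, P xor alpha) under (K_a, K_b) and (K_c, K_d).
    pairs_ab = [(enc_a(p), enc_b(p ^ alpha)) for p in plaintexts_a]
    pairs_cd = [(enc_c(p), enc_d(p ^ alpha)) for p in plaintexts_c]
    # Steps 3 and 4: go over all M^2 combinations and count right quartets.
    counter = 0
    for (c_a, c_b), (c_c, c_d) in product(pairs_ab, pairs_cd):
        if c_a ^ c_c == delta and c_b ^ c_d == delta:
            counter += 1
    return counter

def rectangle_distinguisher(counter, threshold):
    """Step 5: compare the counter with the threshold."""
    return "The cipher E" if counter > threshold else "Random Permutation"

# Toy usage: E_K(P) = P xor K is NOT a real cipher; it only exercises the code.
def toy_enc(key):
    return lambda p: p ^ key

k_a, dk_ab, dk_ac = 0x0, 0x9, 0x3
keys = (k_a, k_a ^ dk_ab, k_a ^ dk_ac, k_a ^ dk_ab ^ dk_ac)
count = rectangle_count(*(toy_enc(k) for k in keys), alpha=0x4, delta=0x6,
                        plaintexts_a=[0, 1], plaintexts_c=[5, 7])
```

The quadratic loop mirrors Step 4 literally; a practical attack would filter candidate quartets more cleverly, which this sketch does not attempt.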

The value of Threshold will be specified later in this section. It is easy to see that for a random permutation, the probability that both conditions $C_a \oplus C_c = \delta$ and $C_b \oplus C_d = \delta$ are satisfied is $2^{-2n}$. The probability that the conditions are satisfied for $E$ is given in the following proposition:

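Proposition 5. Consider a quartet of plaintexts and their corresponding ciphertexts $((P_a, C_a), (P_b, C_b), (P_c, C_c), (P_d, C_d))$ constructed by the algorithm described above. We have

$$\Pr\left[(C_a \oplus C_c = \delta) \wedge (C_b \oplus C_d = \delta)\right] \approx 2^{-n} \sum_{\beta_1 \oplus \beta_2 \oplus \gamma_1 \oplus \gamma_2 = 0} \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_1\right] \cdot \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_2\right] \cdot \Pr\left[\gamma_1 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right] \cdot \Pr\left[\gamma_2 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right]. \tag{12}$$

In particular,

$$\Pr\left[(C_a \oplus C_c = \delta) \wedge (C_b \oplus C_d = \delta)\right] \ge 2^{-n} (\hat{p}\hat{q})^2, \tag{13}$$

where

$$\hat{p} = \sqrt{\sum_{\beta'} \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta'\right]^2}, \quad \text{and} \quad \hat{q} = \sqrt{\sum_{\gamma'} \Pr\left[\gamma' \xrightarrow[\Delta K_{ac}]{E_1} \delta\right]^2}.$$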

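Proof. The proof is similar to the proof of Proposition 1. Consider a quartet $((P_a, C_a), (P_b, C_b), (P_c, C_c), (P_d, C_d))$ constructed by the algorithm. Denote the intermediate values $(E_0(P_a), E_0(P_b), E_0(P_c), E_0(P_d))$ (where the encryption is under the respective keys) by $(X_a, X_b, X_c, X_d)$. For all $\beta_1, \beta_2, \gamma_1$, we say that the event $S_{\beta_1,\beta_2,\gamma_1}$ occurs if the following conditions are satisfied:

$$X_a \oplus X_b = \beta_1, \quad X_c \oplus X_d = \beta_2, \quad X_a \oplus X_c = \gamma_1.$$

Since the events $\{S_{\beta_1,\beta_2,\gamma_1}\}$ for different values of $(\beta_1, \beta_2, \gamma_1)$ are disjoint and their union is the entire space, we have

$$\Pr\left[(C_a \oplus C_c = \delta) \wedge (C_b \oplus C_d = \delta)\right] = \sum_{\beta_1,\beta_2,\gamma_1} \Pr\left[(C_a \oplus C_c = \delta) \wedge (C_b \oplus C_d = \delta) \,\middle|\, S_{\beta_1,\beta_2,\gamma_1}\right] \cdot \Pr\left[S_{\beta_1,\beta_2,\gamma_1}\right]. \tag{14}$$

If the event $S_{\beta_1,\beta_2,\gamma_1}$ occurs, then

$$X_b \oplus X_d = (X_b \oplus X_a) \oplus (X_a \oplus X_c) \oplus (X_c \oplus X_d) = \beta_1 \oplus \gamma_1 \oplus \beta_2.$$

Hence, by the independence assumption,

$$\Pr\left[(C_a \oplus C_c = \delta) \wedge (C_b \oplus C_d = \delta) \,\middle|\, S_{\beta_1,\beta_2,\gamma_1}\right] = \Pr\left[\gamma_1 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right] \cdot \Pr\left[\gamma_2 \xrightarrow[\Delta K_{ac}]{E_1} \delta\right], \tag{15}$$

where $\gamma_2 = \beta_1 \oplus \gamma_1 \oplus \beta_2$. Applying again the independence assumption, we have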

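$$\begin{aligned}
\Pr\left[S_{\beta_1,\beta_2,\gamma_1}\right] &= \Pr\left[X_a \in G^{-1}_{K_a}\left(\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_1\right) \,\middle|\, \left(X_c \in G^{-1}_{K_c}\left(\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_2\right)\right) \wedge \left(X_a \oplus X_c = \gamma_1\right)\right] \\
&\quad \cdot \Pr\left[X_c \in G^{-1}_{K_c}\left(\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_2\right) \,\middle|\, X_a \oplus X_c = \gamma_1\right] \cdot \Pr\left[X_a \oplus X_c = \gamma_1\right] \\
&= \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_1\right] \cdot \Pr\left[\alpha \xrightarrow[\Delta K_{ab}]{E_0} \beta_2\right] \cdot \Pr\left[X_a \oplus X_c = \gamma_1\right].
\end{aligned} \tag{16}$$

Since $P_a$ and $P_c$ are chosen independently, then

$$\Pr\left[X_a \oplus X_c = \gamma_1\right] \approx 2^{-n}. \tag{17}$$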

Note that for any fixed value of $P_a \oplus P_c$ and $\gamma_1$, this approximation is rather inaccurate. For an ideal cipher, it is expected that for a fraction $e^{-1/2}$ of the possible values of $\gamma_1$, we have $\Pr[X_a \oplus X_c = \gamma_1] = 0$, and for the other values, the probability is at least $2^{-n+1}$. However, when the probability is averaged over many different pairs $(P_a, P_c)$, the approximation becomes reasonable.

Substituting Equations (15), (16), and (17) into Equation (14) yields Equation (12). The proof of Equation (13) given Equation (12) is identical to the derivation of Equation (3) from Equation (2) in the proof of Proposition 1.

Proposition 5 shows that if $\hat{p}\hat{q} > 2^{-n/2}$, then the probability that the conditions $(C_a \oplus C_c = \delta)$ and $(C_b \oplus C_d = \delta)$ hold simultaneously is higher for $E$ than for a random permutation, and hence Step 5 of the distinguisher makes sense.

The optimal choice of Threshold and the computation of the success probability of the distinguisher given the probability

$$p_0 = \Pr\left[(C_a \oplus C_c = \delta) \wedge (C_b \oplus C_d = \delta)\right]$$

are very similar to the respective steps for the related-key boomerang distinguisher presented in Section 2.2, and hence are omitted here.

A key recovery attack based on the related-key rectangle distinguisher is more complicated than the respective related-key boomerang attack, due to the abundance of quartets the adversary has to examine. We do not describe the key-recovery algorithm here, and refer the reader to the algorithm of the rectangle key-recovery attack in [8], which can be easily adapted to the related-key model. We note that Table 2 can also be applied to the case of the rectangle attack, with different values for $p_0$, $c$ and $x$: for the related-key rectangle attack $p_0 = 2^{-n}(\hat{p}\hat{q})^2$, $x = p_0/2^{-2n}$, and $c$ is the number of expected quartets (i.e., given $M = \sqrt{c/p_0}$ pairs).

2.4 The Independence Assumptions

All statistical cryptanalytic techniques require various randomness assumptions. For example, the construction of differential characteristics uses the assumption that the cipher is a Markov cipher (see [6]), which implies that the characteristics of single rounds are independent of each other and can be combined. Linear cryptanalysis is based on Matsui's Piling-up Lemma [31], which essentially asserts that linear approximations of single rounds are independent. These randomness assumptions allow a rigorous treatment of the techniques, as well as better applicability (since the search for differentials and linear approximations can be done for each round separately). It is easy to construct artificial examples of ciphers that do not satisfy the randomness assumptions, which would result in failure of the differential or the linear attacks. However, based on many experimental results, it is reasonable to assume that most ciphers satisfy the randomness assumptions. Moreover, if some cipher does not satisfy these assumptions, then this non-randomness is probably exploitable in some other attack, e.g., an impossible differential attack. Nevertheless, it is important to verify the attacks experimentally whenever possible in order to ensure that the assumptions indeed hold in the specific case of interest.

The randomness assumption used in the related-key boomerang and rectangle attacks (i.e., Assumption 1) has two parts. The second part of the assumption, which essentially asserts that differentials of different parts of the cipher are independent, is similar to the standard assumption that the cipher is Markovian, which is used in differential cryptanalysis. However, the first part of Assumption 1 is relatively stronger than the assumptions used in differential cryptanalysis.
Differential cryptanalysis is based on the assumption that for any fixed key $K$ and any (related-key) differential $(\alpha \to \beta)$, the set $G_K(\alpha \to \beta)$ is distributed randomly and uniformly in the plaintext space.14 In the related-key boomerang and rectangle attacks, the assumption deals with the distribution of pairs of sets of the class $G_K\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta\right)$. We assume that any two pairs of such sets are independent, i.e., the events $X \in G_K\left(\gamma_1 \xrightarrow[\Delta K_1]{E_1} \delta\right)$ and $X \oplus \beta_1 \in G_{K \oplus \Delta K_0}\left(\gamma_2 \xrightarrow[\Delta K_1]{E_1} \delta\right)$ are independent, for any value of $\gamma_1, \gamma_2, \beta_1, \delta$, and $K$.

To show the problem that may exist in the independence assumptions, we give the following simple artificial example, which uses high-probability differentials. Assume that for given $K$, $\alpha$, and $\beta$, for which $MSB(\beta) = 0$ (i.e., the most significant bit of $\beta$ is 0), we have $G^{-1}_K\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta\right) = \{X \mid MSB(X) = 1\}$ and $G^{-1}_{K \oplus \Delta K_1}\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta\right) = \{X \mid MSB(X) = 0\}$ (in particular, it follows that $\Pr\left[\alpha \xrightarrow[\Delta K_0]{E_0} \beta\right] = 1/2$). Further assume that for some $\gamma$ such that $MSB(\gamma) = 0$ and for some $\delta$, $\Pr\left[\gamma \xrightarrow[\Delta K_1]{E_1} \delta\right] = 1/2$. By the independence assumptions, it is expected that the probability of a related-key boomerang distinguisher based on the differentials $\alpha \xrightarrow[\Delta K_0]{E_0} \beta$ and $\gamma \xrightarrow[\Delta K_1]{E_1} \delta$ is at least $(1/4)^2 = 1/16$. However, consider a right quartet with respect to this distinguisher and denote the intermediate encryption values by $(X_a, X_b, X_c, X_d)$. Since $X_a \in G^{-1}_K\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta\right)$, we have $MSB(X_a) = 1$, and thus, since $MSB(\gamma) = 0$, necessarily $MSB(X_c) = 1$. This implies that $X_c \notin G^{-1}_{K \oplus \Delta K_1}\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta\right)$, and thus the actual probability of the distinguisher is zero!15

This example demonstrates failure of the first part of Assumption 1 (independence inside the same sub-cipher). Similarly, the second part of the assumption fails if we assume that for some $K, \alpha, \beta, \gamma$ and $\delta$, we have $G^{-1}_K\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta\right) = \{X \mid MSB(X) = 1\}$ and $G_K\left(\gamma \xrightarrow[\Delta K_1]{E_1} \delta\right) = \{X \mid MSB(X) = 0\}$, since in this case $X_a$ cannot be an element of both $G^{-1}_K\left(\alpha \xrightarrow[\Delta K_0]{E_0} \beta\right)$ and $G_K\left(\gamma \xrightarrow[\Delta K_1]{E_1} \delta\right)$ simultaneously.

We note that in several specific cases, deviations from the prediction of the independence assumptions were detected in "real" ciphers. Such an example is the ladder switch described in [14], where a higher probability for the related-key boomerang distinguisher is obtained using dependencies.

Luckily, in the related-key boomerang and rectangle attacks, there are several mechanisms which may overcome dependence problems. The first is the fact that in the attack we count over many differentials (all $\beta_1, \beta_2, \gamma_1, \gamma_2$ such that $\beta_1 \oplus \beta_2 \oplus \gamma_1 \oplus \gamma_2 = 0$), which ensures that even if there is a problem in some combination of differentials, it is expected that other combinations still succeed. The second one is the fact that four different keys are used (in the case $\Delta K_0 \ne \Delta K_1$), and thus, even if there is a dependence between the differentials, it is slightly countered by the different keys.

14 There are cases in which this cannot be satisfied even in a regular cipher as shown in [17], where the behavior of differential characteristics with probability lower than $2^{-n}$ is shown to be dependent on the key. This is also the case for weak-key classes, i.e., classes of keys which behave significantly differently than random.
15 Actually, the probability of the distinguisher may be higher due to differentials of the form $(\alpha \xrightarrow[\Delta K_0]{E_0} \beta')$ for $\beta' \ne \beta$. However, if there are no high-probability differentials of this form, the probability of the distinguisher is still significantly lower than the predicted value 1/16.

Experimental Verification of the Randomness Assumption. As follows from the discussion above, in the related-key boomerang and rectangle attacks it is very important to verify the independence assumption practically in any specific case. Unfortunately, in many cases this is impossible due to the high complexity of the attack. Moreover, for the rectangle attack such verification is inherently impossible: the data complexity of the attack is lower-bounded by $2^{n/2}$ and is infeasible for any block cipher with a block size of 128 bits or more (e.g., AES). For the boomerang attack, it is sometimes possible to challenge the independence assumptions for a reduced-round variant of the attack, e.g., for a variant containing one or two rounds in each sub-cipher. However, this verification is not fully sufficient, since in the full attack the overall probability of the distinguisher is an average taken over many possible differentials, while in the reduced-round variant only a small subset of the differentials is considered. It is possible that while the reduced-round attack does not satisfy the independence assumption, the full attack does satisfy it, since the deviations from independence for different characteristics cancel each other.

As an example of the validity of Assumption 1, we experimentally verified the related-key boomerang distinguisher on 6-round KASUMI, presented in Section 3.3. The analysis predicts that 86.5% of the experiments contain at least one right quartet; in 10,000 experiments, 87% contained at least one such quartet, which supports the validity of the analysis for 6-round KASUMI. For more details, we refer the reader to Section 3.3.

2.5 Generalizations of the Related-Key Boomerang and Rectangle Attacks

In this section we briefly present two generalizations of the basic related-key boomerang and rectangle attacks.

Using Structures of Keys. The related-key differentials used in the attack are usually based on fixed subkey differences. If the key schedule of the attacked cipher is linear, such differences can be achieved by choosing the appropriate key difference. However, if the key schedule is nonlinear, a fixed key difference does not ensure fixed subkey differences. Instead, the adversary can apply differential cryptanalysis to the key schedule. By studying the differential properties of the key schedule, the adversary can find a key difference that leads to the required subkey differences with a relatively high probability. Then, the adversary can repeat the attack for many pairs of related keys and expect that in one of the pairs, the required subkey differences are satisfied and the basic related-key boomerang/rectangle attack can be applied.

Furthermore, we observe that the number of keys used in the attack can be reduced by using structures of keys. Instead of finding a single key difference leading to the required subkey differences with a high probability, the adversary can find many such key differences (possibly with lower probabilities). Then, the adversary can use structures of keys such that each structure contains many pairs of keys corresponding to different "key characteristics", and thus reduce the number of keys required for the attack.

A concrete example of this improvement can be found in the attack on AES-192 in Section 4. The improvement uses 127 key characteristics in parallel, and succeeds in reducing the number of keys required for the attack from 256 to 64.

Generalizing the Key Relation. While XOR relations are common and inherent to the majority of differential-based related-key attacks, in some cases other key relations are more suitable (either due to the environment of the attack or in order to obtain higher probabilities of the differentials). The related-key boomerang and rectangle attacks can be applied almost without a change when the XOR key relations are replaced by any relation satisfying a condition specified below.

Denote the relation between the keys $K$ and $K'$ by $R(K, K')$. We note that $R$ can be any relation which is symmetric and covers all keys. At the same time, we recall that the more complex the relation $R$ is, the more the applicability of the related-key attack may be affected. For example, in the basic related-key boomerang and rectangle attacks we can set $R(K, K') = K \oplus K'$.

The boomerang and rectangle attacks can be applied whenever the key relation satisfies the following condition:

$$\forall (K_a, K_b, K_c, K_d), \quad \bigl(R(K_a, K_c) = R(K_b, K_d)\bigr) \Rightarrow \bigl(R(K_a, K_b) = R(K_c, K_d)\bigr). \tag{18}$$

Condition (18) ensures that in each of the sub-ciphers, the same key relation is used in both differentials. For example, for XOR differences

$$(K_a \oplus K_c = K_b \oplus K_d) \Rightarrow (K_a \oplus K_b = K_c \oplus K_d),$$

and hence the condition holds.

Condition (18) is satisfied for a wide variety of key relations, including additive differences (e.g., $R(K, K') = (K - K') \bmod 2^n$) and rotations. On the other hand, the condition does not hold if the relation used in the first sub-cipher (i.e., between $(K_a, K_b)$ and $(K_c, K_d)$) and the relation used in the second sub-cipher (i.e., between $(K_a, K_c)$ and $(K_b, K_d)$) are of different classes (e.g., XOR differences in the first sub-cipher and modular differences in the second sub-cipher).

We note that the basic related-key boomerang and rectangle attacks can be extended to use different values $\alpha, \alpha'$ in the related-key differentials of $E_0$, and $\delta, \delta'$ in the related-key differentials of $E_1$. Similarly, the attack can use different key differences $\Delta K_0, \Delta K_0'$ and $\Delta K_1, \Delta K_1'$ in the differentials of $E_0$ and $E_1$, respectively. This allows extending Condition (18) to the following:

Proposition 6. The related-key boomerang attack can be applied with two key relations R1, R2, as long as for every quadruple (Ka,Kb,Kc,Kd) the relations R1(Ka,Kb), and R2(Ka,Kc), R2(Kb,Kd) imply the relation R1(Kc,Kd). The related-key rectangle attack can be applied if the relations R1(Ka,Kb), R1(Kc,Kd) and R2(Ka,Kc) imply the relation R2(Kb,Kd).

Finally, even if the condition of Proposition 6 is not satisfied, in some cases the attack can be still applied using structures of keys, as described earlier.
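Condition (18) and its mixed-relation failure case can be checked by brute force on a tiny key space. The Python sketch below is an illustration only: the function names are hypothetical, the key size is reduced to a few bits so that all quadruples can be enumerated, and rotations are not covered.

```python
from itertools import product

def xor_rel(k, k2, n):
    """XOR relation R(K, K') = K xor K'."""
    return k ^ k2

def add_rel(k, k2, n):
    """Additive relation R(K, K') = (K - K') mod 2^n."""
    return (k - k2) % (1 << n)

def condition_holds(rel_first, rel_second, n):
    """Brute-force check over all quadruples of n-bit keys.

    rel_first  relates (K_a, K_b) and (K_c, K_d)  (first sub-cipher),
    rel_second relates (K_a, K_c) and (K_b, K_d)  (second sub-cipher).
    """
    keys = range(1 << n)
    for k_a, k_b, k_c, k_d in product(keys, repeat=4):
        if rel_second(k_a, k_c, n) == rel_second(k_b, k_d, n):
            if rel_first(k_a, k_b, n) != rel_first(k_c, k_d, n):
                return False
    return True
```

With 3-bit keys, `condition_holds(xor_rel, xor_rel, 3)` and `condition_holds(add_rel, add_rel, 3)` both return `True`, while the mixed call `condition_holds(xor_rel, add_rel, 3)` returns `False`, matching the remark about relations of different classes.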

2.6 Comparison With Other Related-Key Attacks

For any new technique constructed as a combination of existing techniques, a natural question to ask is whether there are cases in which the combined attack is better than each of its components taken separately. In this section we briefly describe several important cases in which the related-key boomerang and rectangle attacks are expected to outperform each of their components. Concrete examples of the advantage of related-key boomerang and rectangle attacks over other attack techniques are given in Sections 3 and 4.

The main advantage of the related-key differential attacks over ordinary differential attacks is the ability of the adversary to use the subkey differences to cancel the plaintext difference in the input of one (or more) of the non-linear parts of the cipher. As a result, the adversary obtains one (or more) rounds in the differential that hold with probability 1, allowing the extension of the differential by one (or more) rounds.

In the related-key boomerang and rectangle attacks, the adversary can enjoy this advantage twice, once in each of the sub-ciphers. As a result, the overall distinguisher can be extended by two (and in some cases even more) rounds. This is a significant advantage of the related-key boomerang/rectangle attack over all other differential-based related-key attacks (e.g., the related-key differential attack, the related-key impossible differential attack and the related-key differential-linear attack), which can enjoy the advantage of the related-key model only once.

The advantage of gaining a single additional round (or two rounds) in the distinguisher is significant in ciphers in which the number of rounds is small and each round function is relatively strong. Hence, the gain of the related-key boomerang/rectangle attack is expected to be significant in ciphers like AES [32] and KASUMI [36].

Another property of the cipher required for the success of related-key boomerang and rectangle attacks is simplicity of the key schedule. The basic version of the attack is applicable only to ciphers with a linear key schedule, but using structures of keys the attack can be applied to ciphers with a nonlinear key schedule as well. However, if the key schedule of the cipher is complex enough and does not have "good" differential properties, then the number of keys required for the attack becomes infeasibly large.

Summarizing the discussion above, the related-key boomerang and rectangle attacks are expected to be successful if the attacked cipher has the following properties:

– A small number of relatively strong rounds.
– A relatively simple key schedule.

The class of ciphers satisfying these properties includes widely used ciphers such as AES [32], KASUMI [36], and IDEA [30]. These three ciphers can indeed be attacked efficiently using the related-key boomerang/rectangle attack technique.

3 Related-Key Boomerang/Rectangle Attacks on KASUMI

3.1 The KASUMI Block Cipher

KASUMI [36] is a 64-bit block cipher with 128-bit keys and a recursive Feistel structure, following its ancestor, MISTY1. The cipher has eight Feistel rounds, where each round is composed of two functions: the FO function, which is in itself a 3-round 32-bit Feistel construction, and the FL function, which mixes a 32-bit subkey with the data in a linear way. The order of the two functions depends on the round number: in the odd rounds the FL function is applied first, and in the even rounds the FO function is applied first.

The FO function also has a recursive structure: its F-function, called FI, is a four-round Feistel construction. The FI function uses two non-linear S-boxes S7 and S9 (where S7 is a 7-bit to 7-bit permutation and S9 is a 9-bit to 9-bit permutation), and accepts an additional 16-bit subkey that is mixed with the data. In total, a 96-bit subkey enters FO in each round: 48 subkey bits are used in the FI functions and 48 subkey bits are used in the key mixing stages.

The FL function accepts a 32-bit input and two 16-bit subkey words. One subkey word affects the data using the OR operation, while the second one affects the data using the AND operation. We outline the structure of KASUMI and its parts in Fig. 2.

[Figure 2: four panels showing the eight-round KASUMI structure (FL_i and FO_i with subkeys KL_i, KO_i, KI_i), the FO function, the FI function, and the FL function. Legend: ∩ bitwise AND, ∪ bitwise OR, <<< rotate left by one bit.]

Fig. 2. Outline of KASUMI

Table 3. KASUMI's Key Schedule Algorithm

Round   KLi,1     KLi,2   KOi,1     KOi,2     KOi,3      KIi,1   KIi,2   KIi,3
1       K1 ≪ 1    K3′     K2 ≪ 5    K6 ≪ 8    K7 ≪ 13    K5′     K4′     K8′
2       K2 ≪ 1    K4′     K3 ≪ 5    K7 ≪ 8    K8 ≪ 13    K6′     K5′     K1′
3       K3 ≪ 1    K5′     K4 ≪ 5    K8 ≪ 8    K1 ≪ 13    K7′     K6′     K2′
4       K4 ≪ 1    K6′     K5 ≪ 5    K1 ≪ 8    K2 ≪ 13    K8′     K7′     K3′
5       K5 ≪ 1    K7′     K6 ≪ 5    K2 ≪ 8    K3 ≪ 13    K1′     K8′     K4′
6       K6 ≪ 1    K8′     K7 ≪ 5    K3 ≪ 8    K4 ≪ 13    K2′     K1′     K5′
7       K7 ≪ 1    K1′     K8 ≪ 5    K4 ≪ 8    K5 ≪ 13    K3′     K2′     K6′
8       K8 ≪ 1    K2′     K1 ≪ 5    K5 ≪ 8    K6 ≪ 13    K4′     K3′     K7′

(X ≪ i): X rotated to the left by i bits

The key schedule of KASUMI is very simple and the subkeys are derived from the key linearly. The 128-bit key $K$ is divided into eight 16-bit words: $K_1, K_2, \ldots, K_8$. Each $K_i$ is used to compute $K_i' = K_i \oplus C_i$, where the $C_i$'s are fixed constants (we omit these from the paper as they have no effect on our results). We denote the bits of the subkeys by $K_i = (K_i^{15}, K_i^{14}, \ldots, K_i^{0})$, where $K_i^{15}$ is the most significant bit. In each round, eight words are used as the round subkey (up to some in-word rotations). Therefore, the 128-bit subkey of each round is linearly dependent on the secret key in a very simple way. We give the key schedule algorithm of KASUMI in Table 3.
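The key schedule of Table 3 is compact enough to write out. The Python sketch below is an illustration, not reference code: the function and dictionary key names are hypothetical, the key is passed as a list of the eight 16-bit words $K_1, \ldots, K_8$, and the constants $C_i$ (omitted in this paper) are taken from the 3GPP KASUMI specification.

```python
# Constants C_1..C_8 from the 3GPP KASUMI specification (not listed in this paper).
C = [0x0123, 0x4567, 0x89AB, 0xCDEF, 0xFEDC, 0xBA98, 0x7654, 0x3210]

def rol16(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (16 - r))) & 0xFFFF

def kasumi_subkeys(key_words):
    """Round subkeys of Table 3 from the eight 16-bit words K_1..K_8."""
    k = list(key_words)
    kp = [ki ^ ci for ki, ci in zip(k, C)]   # K'_i = K_i xor C_i
    rounds = []
    for i in range(8):                        # i = round number - 1
        rounds.append({
            "KL1": rol16(k[i], 1),
            "KL2": kp[(i + 2) % 8],
            "KO1": rol16(k[(i + 1) % 8], 5),
            "KO2": rol16(k[(i + 5) % 8], 8),
            "KO3": rol16(k[(i + 6) % 8], 13),
            "KI1": kp[(i + 4) % 8],
            "KI2": kp[(i + 3) % 8],
            "KI3": kp[(i + 7) % 8],
        })
    return rounds
```

Because every subkey word is a rotation or a constant XOR of a single key word, a key difference in one word maps to fixed subkey differences. For instance, flipping the least significant bit of $K_3$ gives difference 0001 in $KL_{1,2}$ and difference 0020 in $KO_{2,1}$, independently of the key value.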

3.2 Related-Key Differentials of KASUMI

In our attacks we use three related-key differentials: a 4-round differential for rounds 1–4, and 3-round differentials for rounds 4–6 and rounds 5–7. Note that the change in the order between FO and FL requires the use of two 3-round differentials.

A 4-Round Related-Key Differential for Rounds 1–4. Our attack on the full KASUMI uses a related-key differential of rounds 1–4 of KASUMI which is an extension by one round of the related-key differential presented in [15]. The input difference of this differential is $\alpha = (0_x, 0020\,0000_x)$, where the key difference is $\Delta K_{ab} = (0, 0, 1, 0, 0, 0, 0, 0)$, i.e., only the third key word has a non-zero difference $\Delta K_3 = 0001_x$. The first three rounds of the characteristic have probability 1/4, and due to the Feistel structure, the $\alpha$ difference can propagate to at most $2^{32}$ differences after round 4. Hence, by Proposition 4, we have

$$\hat{p} \ge \frac{1}{4} \cdot \sqrt{2^{-32}} = 2^{-18}.$$

We outline the differential in Figure 3.

[Figure 3: the four rounds with input difference $(0, 0020\,0000_x)$; round 1 holds with $p = 1/2$, round 2 with $p = 1$, round 3 with $p = 1/2$, and round 4 with $p \ge 2^{-32}$, giving output difference $(y, 0020\,0000_x)$.]

Fig. 3. 4-Round Related-Key Differential Characteristic of KASUMI

It was further observed in [15] that the probability of the differential can be increased by controlling two plaintext bits. If the adversary assigns one bit of the plaintexts to be one (thus fixing one bit of the output of the OR operation in FL1) and one bit of the plaintexts to be zero (thus fixing one bit of the output of the AND operation in FL1), then the probability of the differential described in [15] is increased to 1/2. As a result, for our 4-round differential we have $\hat{p} \ge 2^{-17}$.16

We note that it is possible to rotate all the words of the key difference $\Delta K_{ab}$ and of the differential by the same number of bits, without changing the probability of the differential. Hence, the above differential can be replaced by 15 equivalent differentials.
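The effect of fixing two plaintext bits can be illustrated on the FL function alone. The Python sketch below is an assumption-laden illustration: the FL definition follows the 3GPP KASUMI specification rather than this paper, the function names are hypothetical, and the choice of which two bits to fix (bit 15 of the left 16-bit word to zero, bit 0 of the right 16-bit word to one) is derived here from that definition for a subkey difference of 0001 in $KL_{1,2}$; it is not quoted from [15].

```python
def rol16(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (16 - r))) & 0xFFFF

def fl(x, kl1, kl2):
    """KASUMI FL function on a 32-bit input, per the 3GPP specification."""
    left, right = x >> 16, x & 0xFFFF
    right ^= rol16(left & kl1, 1)   # AND with KL_{i,1}, rotate, mix into right
    left ^= rol16(right | kl2, 1)   # OR with KL_{i,2}, rotate, mix into left
    return (left << 16) | right

def fl_difference_vanishes(x, kl1, kl2, dkl2=0x0001):
    """True if the subkey difference dkl2 in KL_{i,2} is absorbed by the OR."""
    return fl(x, kl1, kl2) == fl(x, kl1, kl2 ^ dkl2)

def constrain(x):
    """Clear bit 15 of the left word and set bit 0 of the right word."""
    return (x & ~(1 << 31) & 0xFFFFFFFF) | 0x1
```

For unconstrained inputs, `fl_difference_vanishes` is true for roughly half of the inputs, which corresponds to the factor 1/2 of round 1. For inputs passed through `constrain`, it is always true, so that factor disappears.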

3-Round Related-Key Differential for Rounds 5–7. The 3-round related-key differential used in rounds 5–7 is the 3-round differential of [15] shifted by four rounds and rotated by one position to the right. The key difference is $\Delta K_{ac} = (0, 0, 0, 0, 0, 0, 8000_x, 0)$, and the data differences are $\gamma = (0_x, 0010\,0000_x) \to (0_x, 0010\,0000_x) = \delta$. Since we use a single differential (and do not count over other possibilities), we have $\hat{q} = q = 1/4$. As before, it is possible to obtain 15 equivalent differentials by rotating the key difference in $\Delta K_7$ and rotating the data differences correspondingly.

16 Note that if the differential is used in the backward direction, the lower bound remains $2^{-18}$.

3-Round Related-Key Differential for Rounds 4–6. In rounds 4–6 we use conditional related-key differential characteristics [3], i.e., characteristics that depend on some unknown key bit.

Let $\delta_0 = (0010\,0000_x, 0_x)$, $\delta_1 = (0010\,0040_x, 0_x)$, and $\delta' = (0001\,0000_x, 0_x)$. If $K_5^4 = 0$ (i.e., the fifth least significant bit of $K_5$ equals zero), we use the two differentials $\delta_0 \to \delta_0$ and $\delta_0 \oplus \delta' \to \delta_0$. If $K_5^4 = 1$, we use the differentials $\delta_1 \to \delta_1$ and $\delta_1 \oplus \delta' \to \delta_1$. The key difference of all the characteristics is $\Delta K_{ac} = (0, 0, 0, 0, 0, 1, 0, 0)$. Each of the four characteristics has probability 1/4, if $K_5^4$ has the corresponding value.

For example, we describe the difference propagation in the backward direction of the characteristics $\delta_0 \to \delta_0$ and $\delta_0 \oplus \delta' \to \delta_0$. Consider a pair with ciphertext difference $\delta_0 = (0010\,0000_x, 0_x)$. In round 6 the zero difference is preserved with probability 1/2 (i.e., the key difference is cancelled with probability 1/2). In round 5, we need a difference of $0010\,0000_x$ after FL5, which is then cancelled with the key difference in $KO_{5,1}$. If $K_5^4 = 0$, then this is indeed the case with probability 1. In round 4, the zero difference is preserved by the FO4 function. As in round 6, it has probability 1/2 to be preserved also by FL4, and probability 1/2 to evolve into $\delta_0 \oplus \delta'$. Thus, the input difference of the differential characteristic is either $\delta_0$ or $\delta_0 \oplus \delta'$, with probability 1/4 each.

In the attack, we apply the distinguisher twice, once with each pair of characteristics, and expect that in one of the applications, both differentials hold with probability 1/4.17 For that application, we have

$$\hat{q} = \sqrt{(1/4)^2 + (1/4)^2} = 1/\sqrt{8}.$$

We note that the four conditional differential characteristics we use can be rotated along with the key difference, to produce 15 similar sets of differential characteristics with the same probabilities.

3.3 Related-Key Boomerang Distinguisher on 6-Round KASUMI

In this section we present a related-key boomerang distinguisher for 6-round KASUMI. The distinguisher we present applies to rounds 1–6 of KASUMI, but it can be easily adapted to rounds 2–7 or 3–8 as well. Let $E_0$ be rounds 1–3, and let $E_1$ be rounds 4–6. In $E_0$ we use the differential $\alpha = (0_x, 0020\,0000_x) \to (0_x, 0020\,0000_x)$ with key difference $\Delta K_{ab} = (0, 0, 1, 0, 0, 0, 0, 0)$. As shown in Section 3.2, the probability of the differential in the forward direction is 1/2 (after adding constraints on the plaintexts), and the probability in the backward direction is 1/4. In $E_1$ we use the two pairs of differentials $(\delta_0 \to \delta_0, \delta_0 \oplus \delta' \to \delta_0)$ and $(\delta_1 \to \delta_1, \delta_1 \oplus \delta' \to \delta_1)$, both with key difference $\Delta K_{ac} = (0, 0, 0, 0, 0, 1, 0, 0)$. As shown in Section 3.2, one of the pairs of differentials yields an overall probability of $\hat{q} = 1/\sqrt{8}$ (where the "successful" pair depends on the value of the key bit $K_5^4$).

17 We note that the knowledge of the "successful" pair of characteristics reveals the value of the key bit $K_5^4$.

The attack essentially performs two standard related-key boomerang distinguishers, one for each possible value of the key bit $K_5^4$. To reduce the data complexity of the attack, we share some of the chosen plaintexts between the two distinguishers. The attack algorithm requires four keys:

$$K_a; \quad K_b = K_a \oplus \Delta K_{ab}; \quad K_c = K_a \oplus \Delta K_{ac}; \quad K_d = K_b \oplus \Delta K_{ac}.$$

The algorithm of the distinguisher is as follows:

1. Choose $M$ pairs of plaintexts $(P_{a,i}, P_{b,i})$ (for $1 \le i \le M$) such that $P_{a,i} \oplus P_{b,i} = \alpha$. For each pair, ask for the encryption of $P_{a,i}$ and $P_{b,i}$ under the keys $K_a$ and $K_b$, respectively, and denote the corresponding ciphertexts by $C_{a,i}$ and $C_{b,i}$.
2. For $1 \le i \le M$, calculate $C_{c,i} = C_{a,i} \oplus \delta_0$ and $C_{d,i} = C_{b,i} \oplus \delta_0$. For all $i$, ask for the decryption of $C_{c,i}$ and $C_{d,i}$ under the keys $K_c$ and $K_d$, respectively, and denote the corresponding plaintexts by $P_{c,i}$ and $P_{d,i}$.
3. For $1 \le i \le M$, calculate $C_{e,i} = C_{a,i} \oplus \delta_1$ and $C_{f,i} = C_{b,i} \oplus \delta_1$. For all $i$, ask for the decryption of $C_{e,i}$ and $C_{f,i}$ under the keys $K_c$ and $K_d$, respectively, and denote the corresponding plaintexts by $P_{e,i}$ and $P_{f,i}$.
4. Check whether $P_{c,i} \oplus P_{d,i} = \alpha$ and count the number of such occurrences.
5. Check whether $P_{e,i} \oplus P_{f,i} = \alpha$ and count the number of such occurrences.
6. If one of the two counters from Steps 4 and 5 is greater than zero, then output "6-Round KASUMI". Otherwise, output "Not 6-Round KASUMI".

The total probability of the boomerang process of this distinguisher is18 $(1/2) \cdot (1/4) \cdot (1/\sqrt{8})^2 = 1/64$, either for quartets counted in Step 4 or for quartets counted in Step 5. Therefore, for $M = 128$ we expect to find two right quartets in Step 4 or in Step 5 (either for the quartets $(P_{a,i}, P_{b,i}, P_{c,i}, P_{d,i})$ or for the quartets $(P_{a,i}, P_{b,i}, P_{e,i}, P_{f,i})$). The right quartets can be detected effectively, as for a random cipher the probability of the event $P_{c,i} \oplus P_{d,i} = \alpha$ (or the event $P_{e,i} \oplus P_{f,i} = \alpha$) is $2^{-64}$.

We note that if two right quartets are expected, then the probability that none is found is about 14%, i.e., the success rate of the attack is about (86% + 100%)/2 = 93%. The data complexity is $3 \cdot 128 \cdot 2 = 768$ adaptively chosen plaintexts and ciphertexts, such that 256 chosen plaintexts are encrypted and 512 adaptively chosen ciphertexts are decrypted.
The time complexity of the attack is negligible.

We verified the distinguishing attack experimentally. We sampled 10,000 random keys, and ran the above distinguisher with M = 128. By the analysis presented above, we expected that in 86.5% of the experiments there would be at least one right quartet. Our experiments revealed that in 87% there was at least one such quartet. We outline in Table 4 the number of quartets suggested in each experiment and compare it with the expected number based on a Poisson distribution with a mean of 2. As can be seen from the table, the figures seem to be highly correlated.

18 Recall that the first differential has probability 1/2 for the pair (Pa, Pb) due to fixing the plaintexts correctly.

Table 4. The Number of Found Quartets in 10,000 Experiments

Quartets             0     1     2     3     4     5     6     7     8     9     10
Experiments          1302  2695  2692  1879  907   348   127   27    9     4     0
Poisson (mean = 2)   1353  2707  2707  1804  902   361   120   34    9     2     < 1
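The Poisson row of Table 4 and the 86.5% figure can be recomputed directly. The following is a minimal sketch, assuming only that the number of right quartets is Poisson distributed with mean M/64 = 2; the names poisson_pmf, expected_counts and p_at_least_one are illustrative and do not come from the paper.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a Poisson variable with the given mean equals k."""
    return math.exp(-mean) * mean**k / math.factorial(k)

MEAN = 128 / 64          # M = 128 pairs, boomerang probability 1/64
EXPERIMENTS = 10_000     # number of random keys sampled in the experiment

# Expected number of experiments that yield exactly k right quartets, k = 0..9.
expected_counts = [round(EXPERIMENTS * poisson_pmf(k, MEAN)) for k in range(10)]

# Probability that at least one right quartet is found.
p_at_least_one = 1.0 - poisson_pmf(0, MEAN)

print(expected_counts)
print(f"{p_at_least_one:.3f}")
```

The rounded counts agree with the "Poisson (mean = 2)" row for 0 to 9 quartets.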

As noted in [10], this distinguisher can be transformed into a key recovery attack. The key recovery attack has total data and time complexities of 2^13. The number of keys used in the attack is 34.
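The query flow of Steps 1, 2 and 4 of the distinguisher can be sketched as follows. This is only an illustration under loud assumptions: the cipher is a toy XOR "cipher" (not KASUMI), for which every related-key boomerang returns, and the key, difference and function names are hypothetical placeholders rather than values from the paper.

```python
import secrets

BLOCK_BITS = 64

def toy_encrypt(key: int, block: int) -> int:
    """Toy stand-in for the cipher: a plain XOR with the key (NOT KASUMI)."""
    return block ^ key

def toy_decrypt(key: int, block: int) -> int:
    """Inverse of toy_encrypt."""
    return block ^ key

def count_returning_quartets(enc, dec, k_a, dk_ab, dk_ac, alpha, delta, m):
    """Run the boomerang query flow for one ciphertext difference delta."""
    k_b = k_a ^ dk_ab          # Kb = Ka xor dKab
    k_c = k_a ^ dk_ac          # Kc = Ka xor dKac
    k_d = k_b ^ dk_ac          # Kd = Kb xor dKac
    hits = 0
    for _ in range(m):
        p_a = secrets.randbits(BLOCK_BITS)
        p_b = p_a ^ alpha                      # Step 1: plaintext pair
        c_a, c_b = enc(k_a, p_a), enc(k_b, p_b)
        p_c = dec(k_c, c_a ^ delta)            # Step 2: shifted ciphertexts
        p_d = dec(k_d, c_b ^ delta)
        hits += (p_c ^ p_d) == alpha           # Step 4: boomerang returned?
    return hits

# Placeholder values, chosen only to exercise the code.
hits = count_returning_quartets(toy_encrypt, toy_decrypt,
                                k_a=secrets.randbits(BLOCK_BITS),
                                dk_ab=0x1, dk_ac=0x8000,
                                alpha=0x00200000, delta=0x00100000, m=128)
print(hits)
```

For the linear toy cipher all 128 pairs return; for 6-round KASUMI the paper expects about two, and for a random cipher essentially none. Step 3 and Step 5 repeat the same flow with δ1 in place of δ0.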

3.4 The Basic Related-Key Rectangle Attack on the Full KASUMI

Our attack on the full KASUMI applies the related-key rectangle distinguisher in rounds 1–7 and retrieves subkey material in round 8. Let ∆Kab = (0, 0, 1, 0, 0, 0, 0, 0) and ∆Kac = (0, 0, 0, 0, 0, 8000x, 0, 0), and let Ka, Kb = Ka ⊕ ∆Kab, Kc = Ka ⊕ ∆Kac, and Kd = Kc ⊕ ∆Kab be the unknown related keys we want to retrieve. In rounds 1–4 we use the related-key differential presented in Section 3.2 that has an input difference α = (0x, 0020 0000x), a key difference ∆Kab and for which p̂ = 2^{-17} (see footnote 19). In rounds 5–7 we use the related-key differential presented in Section 3.2 that has an output difference δ = (0x, 0010 0000x), a key difference ∆Kac and for which q̂ = 2^{-2} (see footnote 20).

We start with N = 2^51 pairs of plaintexts encrypted under Ka and Kb, and the same number of plaintext pairs encrypted under Kc and Kd. This data set contains N^2 = 2^102 quartets, of which about N^2 · 2^{-64} · 2^{-34} · 2^{-4} = 2^102 · 2^{-102} = 1 are expected to be right rectangle quartets. In the attack we identify the candidate quartets out of all possible quartets, and then analyze them to retrieve the subkey of round 8.

Denote the 64-bit plaintext P by (PL, PR), where each 32-bit half is composed of two 16-bit halves, i.e., P = ((PLL, PLR), (PRL, PRR)). The attack algorithm is as follows:

1. Data Collection Phase:
   (a) Choose a structure of 2^51 pairs of plaintexts (Pa, Pb), where Pb = Pa ⊕ α, PaLL^0 = 0 (i.e., the least significant bit of PaLL is fixed to zero for all the plaintexts in the structure), and PaLR^1 = 1 (see footnote 21). For each pair, ask for the encryption of Pa and Pb under the keys Ka and Kb, respectively, and insert each pair of ciphertexts into a hash table indexed by the 64-bit value of (CaRL, CaRR, CbRL, CbRR).

19 In this rectangle attack the differentials are used only in the forward direction and hence p̂ is 2^{-17} rather than 2^{-18}, as shown in Section 3.2.
20 We note that we picked a slightly rotated version of δ0 to ensure maximal independence between the two sub-ciphers, thus validating the independence assumptions.
21 Fixing a bit in Pa also fixes the corresponding bit in its counterpart Pb, due to the difference α.

Table 5. Possible Values of KL8,2 and KL8,1

OR (KL8,2)
                        (Xbd′, Ybd′)
(Xac′, Yac′)    (0,0)    (0,1)    (1,0)    (1,1)
(0,0)           {0,1}      -        1        0
(0,1)             -        -        -        -
(1,0)             1        -        1        -
(1,1)             0        -        -        0

AND (KL8,1)
                        (Xbd′, Ybd′)
(Xac′, Yac′)    (0,0)    (0,1)    (1,0)    (1,1)
(0,0)           {0,1}      -        0        1
(0,1)             -        -        -        -
(1,0)             0        -        0        -
(1,1)             1        -        -        1

∗ The two bits of the differences are denoted by (input difference, output difference): (X1′, Y1′) for one pair and (X2′, Y2′) for the other pair. An entry "-" means that no key bit value is consistent with both pairs.

   (b) Choose a structure of 2^51 pairs of plaintexts (Pc, Pd), where Pd = Pc ⊕ α, PcLL^0 = 0, and PcLR^1 = 1. For each pair, ask for the encryption of Pc and Pd under the keys Kc and Kd, respectively. Then, access the hash table in the entry corresponding to the value (CcRL ⊕ 0010x, CcRR, CdRL ⊕ 0010x, CdRR). For each pair (Pa, Pb) found in this entry, apply Step 2 on the quartet (Pa, Pb, Pc, Pd).

In the first step described above, the (2^51)^2 = 2^102 possible quartets are filtered according to a condition on the 64 bits of difference which are known (due to the output difference δ), which leaves about 2^38 quartets to Step 2. In this step, we treat all the remaining quartets as right quartets. Under this assumption, we know not only the actual inputs to round 8, but also the output differences. We guess 32 bits of the key (KO8,1, KI8,1), and try to deduce KL8,2.

2. Analyzing Quartets:

   (a) For each remaining quartet (Ca, Cb, Cc, Cd) guess the 32-bit value of KO8,1 and KI8,1. For the two pairs (Ca, Cc) and (Cb, Cd) use the value of the guessed key to compute the input and output differences of the OR operation in the last round of both pairs. For each bit of this 16-bit OR operation of FL8, the possible values of the corresponding bit of KL8,2 are given in Table 5. On average (8/16)^16 = 2^{-16} values of KL8,2 are suggested by each quartet and guess of KO8,1 and KI8,1.
   (b) For each quartet and values of KO8,1, KI8,1 and KL8,2 suggested in Step 2(a), guess the 32-bit value of KO8,3 and KI8,3, and use this information to compute the input and output differences of the AND operation in both pairs. For each bit of the 16-bit AND operation of FL8, the possible values of the corresponding bit of KL8,1 are given in Table 5. On average (8/16)^16 = 2^{-16} values of KL8,1 are suggested by each quartet and guess of KO8,1, KI8,1, KO8,3, KI8,3, and the computed value of KL8,2.

3. Finding the Right Key: For each quartet and value of (KO8,1, KI8,1, KO8,3, KI8,3, KL8,1, KL8,2) suggested in Step 2, guess the remaining 32 bits of the key, and perform a trial encryption.

3.5 Analysis of the Attack

We first analyze Step 2(a), and show that given the input and output differences of the OR operation in the two pairs of the quartet, the expected number of suggestions for the key KL8,2 is 2^{-16}. Let us examine a difference in some bit j. For each pair, there are four combinations of input difference and output difference in this bit. Table 5 lists the values that the two pairs suggest for the respective key bit. In the table there are nine entries that contain no value, which means a contradiction. For example, a difference 0 can never lead to a difference 1 by any linear function. Another possible contradiction occurs when one pair suggests that the key bit is 0, while the second pair suggests that the key bit is 1. The total number of suggestions for the key bit is 8. Since the table has 16 entries, the average number of suggested values for the key bit is 1/2. In total, for the 16 bits there are (1/2)^16 = 2^{-16} key suggestions on average. A similar analysis can be applied to Step 2(b).

We note that the identification of suggested values (or of the found contradictions) can be done efficiently in a bit-sliced manner. Hence, we conclude that this step can be implemented efficiently (for each quartet and initial subkey guess).

Step 2 starts with 2^38 quartets. In Step 2(a), the 2^38 · 2^32 = 2^70 (quartet, subkey guess) tuples suggest 2^70 · 2^{-16} = 2^54 values for the 48 subkey bits (KO8,1, KI8,1, KL8,2). Similarly, after the additional guess of KO8,3 and KI8,3 in Step 2(b), we get 2^54 · 2^32 · 2^{-16} = 2^70 suggestions for the 96 subkey bits (KO8,1, KI8,1, KL8,2, KO8,3, KI8,3, KL8,1).

Step 3 goes over all 2^70 suggestions for the 96 key bits, and tries to complete the remaining 32 key bits by an exhaustive search. This can be performed easily due to the linear key schedule of KASUMI. The time complexity of this step is 2^102 trial encryptions. As the complexity of Step 3 is dominant, the total complexity of this attack is 2^102 trial encryptions.
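The counting argument behind Table 5 can be checked by brute force. The sketch below enumerates, for a single bit of the OR and AND operations, the key bit values consistent with the (input difference, output difference) of both pairs; the helper names key_bits and table5 are illustrative only.

```python
from itertools import product

def key_bits(op, dx, dy):
    """Key bits k for which input difference dx always yields output difference dy."""
    return {k for k in (0, 1)
            if all(op(x, k) ^ op(x ^ dx, k) == dy for x in (0, 1))}

def table5(op):
    """Map ((Xac', Yac'), (Xbd', Ybd')) to the key bits consistent with both pairs."""
    diffs = list(product((0, 1), repeat=2))
    return {(ac, bd): key_bits(op, *ac) & key_bits(op, *bd)
            for ac in diffs for bd in diffs}

def bit_or(x, k):
    return x | k

def bit_and(x, k):
    return x & k

for name, op in (("OR", bit_or), ("AND", bit_and)):
    table = table5(op)
    suggestions = sum(len(v) for v in table.values())
    empty = sum(1 for v in table.values() if not v)
    # 8 suggestions over 16 entries gives 1/2 per bit, hence (1/2)^16 for 16 bits.
    print(name, suggestions, empty, (suggestions / 16) ** 16)
```

Both operations give 8 suggestions and 9 contradictory entries, matching the average of 1/2 key value per bit used in the analysis.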

3.6 Improvements of the Attack

In this section we present several improvements of the attack that allow us to decrease its time complexity considerably.

Improvement of Step 3. Step 3 can be improved by using key ranking techniques. Taking 2^52.6 plaintexts encrypted under four different keys (i.e., three times the data as before), we expect nine right quartets. Instead of completing the missing key bits by an exhaustive key search, we count how many (quartet, subkey guess) tuples suggest each value of the 96 bits of KO8,1, KI8,1, KO8,3, KI8,3, KL8,1 and KL8,2. Only a few possible wrong key values are expected to get more than five suggestions. On the other hand, the right key has probability 88.4% to have at least this number of suggestions. Therefore, we identify which 96-bit values have more than five suggestions, and exhaustively search over the remaining bits of these cases. After this modification, the time complexity of Step 3 becomes negligible compared to that of Step 2(b). This reduces the time complexity of the attack to 2^86.2 full KASUMI encryptions, while increasing the data complexity to 2^54.6 related-key chosen plaintexts.
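The data figures and the 88.4% success probability can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that the number of right quartets scales with the square of the data and is Poisson distributed with mean 9; the variable names are illustrative.

```python
import math

def poisson_tail(at_least: int, mean: float) -> float:
    """Probability that a Poisson variable with the given mean is >= at_least."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k)
                for k in range(at_least))
    return 1.0 - below

DATA_FACTOR = 3                                   # three times the data as before
log2_per_key = 51 + math.log2(DATA_FACTOR)        # plaintexts under each of the 4 keys
log2_total = log2_per_key + math.log2(4)          # total related-key chosen plaintexts
right_quartets = DATA_FACTOR**2                   # quartets grow quadratically: 1 -> 9

# "More than five suggestions" means at least six right quartets.
p_right_key = poisson_tail(6, right_quartets)

print(f"{log2_per_key:.1f} {log2_total:.1f} {p_right_key:.3f}")
```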

First Improvement of Step 2(b). Another improvement of the attack is based on the observation that Step 2(b) can be implemented in two sub-steps. In the first one, we guess KO8,3 and the 9-bit subkey KI8,3,2, and find the value of only 9 bits of KL8,1. Hence, we generate 9 · 2^54 · 2^25 = 2^82.2 (quartet, subkey guess) tuples where the subkey guess is of 73 bits. As this improvement deals only with 9 bits of KL8,1, the expected number of remaining (quartet, subkey guess) values is 2^73.2. Then, in the second sub-step we guess the 7 bits of KI8,3,1 and find the value of the 7 remaining bits of KL8,1. The time complexity of the attack is now dominated by the first sub-step of Step 2(b), whose complexity is equivalent to about 2^79.2 KASUMI encryptions.

Second Improvement of Step 2(b). Our next improvement uses the fact that Step 2(b) depends only partially on Step 2(a). After Step 2(a) there are 2^54 tuples of the form (quartet, subkey guess), where the subkey guess is of 48 bits. However, Step 2(b) uses only 32 bits of the guessed subkey, namely, the value of KO8,1 and KI8,1. As mentioned earlier, a given quartet suggests about 2^16 values for the 48 bits of KO8,1, KI8,1, KL8,2. However, it suggests only 2^12.9 values for the 32 bits of KO8,1, KI8,1.

This observation is used to reduce the complexity of the attack: The purpose of Step 2(a) is now to find the list of about 2^12.9 values for KO8,1, KI8,1 that a quartet suggests, and then Step 2(b) finds the list of about 2^12.9 values for KO8,3, KI8,3. Only then, in Step 3, do we take into consideration the possible values of KL8,1 and KL8,2. This reduces the time complexity of the attack to 2^76.1 KASUMI encryptions.

Third Improvement of Step 2(b). Finally, we offer another improvement that is based on a more delicate attack procedure. We recall that for a random input/output difference to an S-box, there is on average one pair of actual values which fit these differences (see footnote 22). Denote the input and output of FL8 by (Y0, X0) and (Y1, X1), respectively. The relation between the input and the output of FL8 is given by

X1 = X0 ⊕ ((Y0 ∧ KL8,1) ≪ 1);    Y1 = Y0 ⊕ ((X1 ∨ KL8,2) ≪ 1).    (19)

Let (Y0′, X0′) and (Y1′, X1′) be the differences in the input and the output of FL8 for the considered pair. Note that the output difference is known to the adversary, and after the guess of KO8,1 and KI8,1, the adversary can also compute Y0′.
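Equation (19) translates directly into code. The following is a minimal sketch that transcribes the two relations on 16-bit halves; it has not been checked against official KASUMI test vectors, and the function names fl, fl_inverse and rol16 are chosen here for illustration.

```python
MASK16 = 0xFFFF

def rol16(value: int, shift: int = 1) -> int:
    """Rotate a 16-bit value to the left by `shift` bits."""
    return ((value << shift) | (value >> (16 - shift))) & MASK16

def fl(y0: int, x0: int, kl1: int, kl2: int):
    """Apply Equation (19): (Y0, X0) -> (Y1, X1) with subkeys KL8,1 and KL8,2."""
    x1 = x0 ^ rol16(y0 & kl1)      # AND with KL8,1, rotate, XOR into X
    y1 = y0 ^ rol16(x1 | kl2)      # OR with KL8,2, rotate, XOR into Y
    return y1, x1

def fl_inverse(y1: int, x1: int, kl1: int, kl2: int):
    """Undo fl by applying the two relations in reverse order."""
    y0 = y1 ^ rol16(x1 | kl2)
    x0 = x1 ^ rol16(y0 & kl1)
    return y0, x0

# With KL8,1 = 0 the AND contributes nothing, and with KL8,2 = 0xFFFF
# the OR output is all ones, so Y is simply complemented.
print([hex(v) for v in fl(0x1234, 0x5678, 0x0000, 0xFFFF)])
```

Because the OR operates on the known output half X1, the output difference together with Y0′ constrains the key bits one position at a time, which is what Steps 2(a) and 2(b) exploit.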

22 Actually, the number of such ordered pairs is always even. On the other hand, the probability that no pair satisfies the input/output difference constraint is at least 1/2.

In the modified variant of the attack, in Step 2(b) the adversary guesses only KO8,3, and not KI8,3. Like in the first improvement of Step 2(b), the step is divided into two sub-steps. The first sub-step collects suggestions for the 9-bit subkey KI8,3,1 and the second sub-step collects suggestions for the 7-bit subkey KI8,3,2.

The First Sub-Step: The knowledge of KO8,3 allows the adversary to compute the input difference to the second S9 S-box of the function FI8,3. Moreover, the output difference of that S-box is given by the 9 corresponding bits of X0′. During this sub-step, we abuse the notation and refer by X0′ and Y0′ to these 9 bits.

When Y0′ has 9 − i zeros, the AND operation with the subkey KL8,1 may affect the difference only in the i active bits, and hence can result in at most 2^i differences. According to Equation (19) each such difference Y0′ suggests at most 2^i differences in X0′. Each such X0′, combined with the input difference to the second S9 S-box of FI8,3, translates to one suggestion on average for the actual inputs to that S-box, that in turn translates to one candidate on average for the subkey KI8,3,1.

Hence, assuming that the differences Y0′ are distributed uniformly, the expected number of candidates a pair suggests for KI8,3,1 is

2^{-9} · Σ_{i=0}^{9} \binom{9}{i} · 2^i = 2^{-9} · Σ_{i=0}^{9} \binom{9}{i} · 2^i · 1^{9−i} = 2^{-9} · (1 + 2)^9 = 3^9 / 2^9 ≈ 2^5.3.

We note that this is also the average time complexity associated with this procedure (for the first pair). Then, for the analysis of the second pair, one can either repeat the procedure, and compute the intersection of the two lists, or just try the keys offered by the first procedure.

Of course, a more efficient approach is to start with the pair that is going to suggest fewer keys (i.e., the one for which Y0′ has more 0's). Repeating the analysis presented above, and taking into consideration this fact, it is expected that the pair with lower Hamming weight requires 2^4.2 trials on average to find the list of 2^4.2 candidate subkeys for KI8,3,1, and an equivalent time to challenge them for consistency with the second pair. At the end of the process, for a given quartet, guess of KO8,1, KI8,1 and KO8,3, about 2.88 candidates to KI8,3,1 are expected to remain. Of course, if no candidates remain, then it is possible to discard this (quartet, KO8,3, KI8,3) combination. We note that about 39.7% of the combinations are indeed discarded at this stage. The time complexity of this sub-step is 9 · 2^54 · 2^12.9 · 2 · 2^4.2 = 2^75.3 one-round encryptions (when combined with the previous improvements).

The Second Sub-Step: In this sub-step, for each of the remaining candidates, the adversary obtains suggestions for KI8,3,2, by examining the 7 remaining bits of X0′ and Y0′. This time, the running time for the pair with lower Hamming weight is expected to be 2^3.2 evaluations on average. Hence, this step takes 9 · 2^54 · (2.88 · 2^12.9) · 2 · 2^3.2 = 2^75.9 one-round encryptions. The expected number of suggested candidates for the subkey KI8,3,2 is 2.25.

After these two sub-steps, the adversary is left with 9 · 2^51.9 · 2^16 · 2.88 · 2.25 = 2^72.8 suggestions for combinations of (quartet, KO8,1, KI8,1, KO8,3, KI8,3).
At this point, the adversary can use the obtained information to get suggestions for the subkeys KL8,2 and KL8,1, along with an additional filtering. The resulting number of suggestions for (quartet, KO8,1, KI8,1, KO8,3, KI8,3, KL8,1, KL8,2) is 2^73.2, like in the basic attack (actually, the number in the basic attack is 2^70, but in the modified attack we examine 9 times more quartets). The remaining key bits can be retrieved efficiently, as was shown earlier. The time complexity of this attack is thus dominated by the analysis of the S-boxes S9 and S7 of the function FI8,3, which takes about 2^75.9 + 2^75.3 = 2^76.6 one-round encryptions, or a total of 2^73.6 full KASUMI encryptions.
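The binomial sum for the expected number of KI8,3,1 candidates and the final complexity figure can be checked numerically. This is a small sketch; the conversion from one-round to full encryptions assumes 8 rounds per KASUMI encryption, and the variable names are illustrative.

```python
import math

# Expected number of candidates per pair: 2^-9 * sum_i C(9, i) * 2^i = (3/2)^9.
binomial_sum = sum(math.comb(9, i) * 2**i for i in range(10))
expected_candidates = binomial_sum / 2**9
log2_candidates = math.log2(expected_candidates)

# Total work of the two sub-steps, in one-round encryptions (log2 scale).
log2_one_round = math.log2(2**75.9 + 2**75.3)

# Convert to full 8-round KASUMI encryptions.
log2_full = log2_one_round - math.log2(8)

print(f"{log2_candidates:.2f} {log2_one_round:.1f} {log2_full:.1f}")
```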

3.7 A Different 8-Round Rectangle Attack

It is also possible to apply a slightly different attack algorithm, which is based on a 6-round related-key rectangle distinguisher. The distinguisher is used in rounds 2–7, where the first related-key differential (used in rounds 2–4) is the differential of rounds 4–6 presented in Section 3.2 (shifted two rounds backwards), and the second related-key differential is used in rounds 5–7. The associated probabilities are p̂ = 1/√8 and q̂ = 1/4, respectively. Hence, given 2^72 quartets with the right input differences, we expect about 2^72 · (1/√8)^2 · (1/4)^2 · 2^{-64} = 2 right quartets.

The attack uses 16 structures, each composed of three sets of 2^32 plaintexts each. The structures are of the form P = {A, x} for a fixed A and all possible x's, which are encrypted under Ka and Kc, and P0 = {A ⊕ δ0, x} and P1 = {A ⊕ δ1, x}, each encrypted under Kb and Kd. For the sake of clarity, we describe the attack for K_2^4 = 0 (which means that the differentials in use are based on δ0). The other case follows immediately.

We search for quartets composed of ((Pa, Pb), (Pc, Pd)) with input difference δ0 after the first round (in the pairs (Pa, Pb) and (Pc, Pd)), and difference α before the last round (in the pairs (Pa, Pc) and (Pb, Pd)). To do so, for each pair of structures, the adversary finds all the candidate quartets (out of possible 2^128 quartets, only 2^64 satisfy the known ciphertext differences). As there are 16^2 = 2^8 pairs of structures, the adversary analyzes 2^72 quartets, in a manner very similar to the basic attack:

1. For each candidate quartet, guess KI8,1, KO8,1 and retrieve KL8,2 (similarly to Step 2(a)).
2. For each candidate (quartet, subkey) tuple, guess KI1,1 and use the known value of KO1,1 (which is known from KL8,2) to obtain candidate KL1,2.
3. For each candidate (quartet, subkey) tuple, guess KO1,3 and from the knowledge of KL1,1 (which is known from KO8,1), find candidates to KI1,3 (apply Step 2(b) in a slightly different order).
4. Exhaustively search over all remaining keys.

The first step of the attack takes 2^72 · 2^32 = 2^104 operations, and results in 2^72 · 2^32 · 2^{-16} = 2^88 tuples of (quartets, 48-bit key guess). In the second step, 16 more key bits are guessed, each resulting in a consistent suggestion for KL1,2 with probability 2^{-16}. Hence, this step takes 2^88 · 2^16 = 2^104 operations, and offers 2^88 · 2^16 · 2^{-16} = 2^88 tuples of (quartets, 80-bit key guess). In Step 3, we apply an analysis step which is similar to Step 2(b) (using the fact that we know the inputs to FI1,3 and its output difference), which allows finding a consistent suggestion for KI1,3 with probability 2^{-16} for each KO1,3. Hence, also this step takes 2^88 · 2^16 = 2^104 operations, and results in 2^88 · 2^16 · 2^{-16} = 2^88 tuples of the form (quartet, 112-bit subkey guess). At this point, exhaustive search takes 2^88 · 2^16 = 2^104 trial encryptions, which is what the adversary does.

We note that the attack has to be repeated twice, once with δ0 and once with δ1. However, when using δ0 we are assuming that K_3^4 = 0, which means that there is no need to guess its value, and by reversing the order of Steps 1 and 2, we reduce the total running time to 2^104 trial encryptions. The data complexity is unchanged, i.e., 2 · 16 · 3 · 2^32 = 2^38.6 chosen plaintexts in total.

4 Related-Key Rectangle Attacks on AES-192

4.1 The AES Block Cipher

AES encrypts data blocks of 128 bits with 128, 192 or 256-bit keys. According to the length of the keys, AES uses a different number of rounds Nr, i.e., Nr = 10, 12 and 14 when used with 128, 192 and 256-bit keys, respectively. The rounds are numbered 0, ..., Nr − 1. The round function of AES consists of the following four basic operations:

– SubBytes (SB) is a nonlinear byte-wise substitution that applies the same 8 × 8 S-box to every byte.
– ShiftRows (SR) is a cyclic shift of the i'th row by i bytes to the left.
– MixColumns (MC) is a matrix multiplication over a finite field applied to each column.
– AddRoundKey (ARK) is an exclusive-or with the round subkey.

Each round of AES applies the SB, SR, MC and ARK operations in that order. Before the first round, an additional ARK operation is performed (using the whitening key), and in the last round, the MC operation is omitted. For more details of the above four transformations, we refer the reader to [16].

AES uses different key scheduling algorithms according to the length of the key. As we deal only with AES-192 (i.e., AES with 192-bit keys), we describe the key schedule for AES-192: Let the supplied key be 6 words of 32 bits (W[0], W[1], ..., W[5]). To generate 13 subkeys of 128 bits (which compose 52 words of 32 bits), the following algorithm is used:

– For i = 6 till i = 51 do the following:
   • If i ≡ 0 mod 6, then W[i] = W[i − 6] ⊕ SB(RotByte(W[i − 1])) ⊕ Rcon[i/6],
   • else W[i] = W[i − 1] ⊕ W[i − 6],

where RotByte represents one byte rotation to the left and Rcon denotes an array of fixed constants.

The 128-bit block of AES is represented by a 4 × 4 byte matrix. Throughout the paper we treat the internal state as bytes ((0,1,2,3),(4,5,6,7),(8,9,10,11),(12,13,14,15)) (see Figure 4 for a graphical representation).
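The AES-192 key schedule described above can be sketched in a few lines. The code below is an illustration, not an optimized or hardened implementation: words are held as big-endian 32-bit integers, the S-box is derived from its algebraic definition (inversion in GF(2^8) followed by the affine map) rather than copied from a table, and all function names are chosen here. The sample key is the AES-192 example key of the AES standard.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254 (0 maps to 0)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0

def rotl8(value: int, shift: int) -> int:
    return ((value << shift) | (value >> (8 - shift))) & 0xFF

# SubBytes table: field inversion followed by the AES affine transformation.
SBOX = []
for byte in range(256):
    inv = gf_inv(byte)
    SBOX.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3)
                ^ rotl8(inv, 4) ^ 0x63)

def sub_word(word: int) -> int:
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")

def expand_key_192(key_words):
    """Expand 6 key words W[0..5] into the 52 words W[0..51]."""
    w = list(key_words)
    rcon = 1
    for i in range(6, 52):
        temp = w[i - 1]
        if i % 6 == 0:
            temp = ((temp << 8) | (temp >> 24)) & 0xFFFFFFFF   # RotByte
            temp = sub_word(temp) ^ (rcon << 24)               # SB and Rcon[i/6]
            rcon = gf_mul(rcon, 2)
        w.append(w[i - 6] ^ temp)
    return w

KEY = [0x8E73B0F7, 0xDA0E6452, 0xC810F32B, 0x809079E5, 0x62F8EAD2, 0x522C6B7B]
words = expand_key_192(KEY)
print(f"{words[6]:08x} {words[7]:08x}")
```

Only one word in six passes through the nonlinear SubBytes step; the other five are plain XORs, which is the source of the slow difference propagation exploited in the attack.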

Fig. 4. Byte coordinates of a 128-bit block of AES (Ri: Row i, Ci: Column i, Xi: Byte i)

       C0    C1    C2    C3
R0     X0    X4    X8    X12
R1     X1    X5    X9    X13
R2     X2    X6    X10   X14
R3     X3    X7    X11   X15

4.2 Preliminaries

We first define some notation which is used in our attacks on AES. Denote the first 10 rounds of AES-192 by E = Ef ◦ E1 ◦ E0 ◦ Eb, where Eb is round 0 including the whitening key addition step and excluding the key addition step of round 0, E0 is rounds 1–4 (starting from the AddRoundKey operation of round 0), E1 is rounds 5–8 and Ef is round 9. In our 10-round AES-192 attack, we use the related-key differential for E0 depicted in Fig. 5 and the related-key differential for E1 depicted in Fig. 6. We use these related-key differentials for constructing a related-key rectangle distinguisher for E1 ◦ E0, that allows us to recover some portion of the keys in Eb and Ef.

Let Ka, Kb, Kc, Kd be a quartet of keys satisfying the subkey differences required for the related-key differential (or that we conjecture that they satisfy these subkey differences). Then K^w_x is the whitening key derived from Kx and K^i_x is the ith round subkey derived from Kx. We use the notation Px to denote a plaintext encrypted under Kx. The intermediate value in the encryption of Px is denoted by I^i_x (the input to round i).

Besides the key differences ∆K^i_ab and ∆K^i_ac, we use ∆I^i_ab = I^i_a ⊕ I^i_b = I^i_c ⊕ I^i_d and ∆I^i_ac = I^i_a ⊕ I^i_c = I^i_b ⊕ I^i_d. Finally, HWb(X) denotes the Hamming weight in bytes of X, x denotes an 8-bit difference, y, z denote (not necessarily different) 8-bit differences such that x can evolve to y or z through the SubBytes operation, and ∗ denotes an unknown byte difference.

Fig. 5. The related-key differential for E0 (ARK of round 0 and rounds 1-4), and the preceding Eb (the whitening ARK and most of round 0). ∆Pab denotes a set of differences.

[Figure 5 diagram: the subkey differences ∆K^w_ab, ∆K^0_ab, ..., ∆K^4_ab are added between consecutive SB, SR, MC layers; the state differences are labelled ∆Pab, ∆I^1_ab, ..., ∆I^5_ab. The first round shown is the round before the differential.]

Fig. 6. The related-key differential for rounds 5-8 (E1) and the following round (Ef ).

[Figure 6 diagram: the subkey differences ∆Kac of rounds 5 to 9 are added between consecutive SB, SR, MC layers (the last layer is SB, SR only); the state differences are labelled ∆I^5_ac, ..., ∆I^10_ac. The last round shown is the round after the differential.]

4.3 8-Round Related-Key Rectangle Distinguisher

Our related-key differentials exploit the slow difference propagation of the key schedule of AES-192, that allows three consecutive rounds for which the Hamming weight in bytes of the key differences is 2, 0, 1, respectively, as shown in Figure 6.

The differential used for E0 is depicted in Figure 5. Its key difference ∆Kab equals x in bytes 1 of W[0] and W[2], and is zero in all the other bytes (see Figure 6). The input difference α equals x in bytes X9, X13 and zero in the rest of the bytes, such that it cancels with the subkey difference, and the input difference to round 1 becomes ∆I^1_ab = 0. Since there are at most 2^39 possible output differences, it follows from Proposition 3 that p̂ ≥ √(2^{-39}) = 2^{-19.5}.

The second differential, depicted in Figure 6, is a truncated differential. Its output difference set ∆I^9_ac consists of 127 possible output differences that share all but the first column. The first column of ∆I^9_ac can accept any of the values

B = {MC(j, 0, 0, 0) | j = SB(i) ⊕ SB(x ⊕ i), i = 0, 1, 2, ..., 255}.

As for the key difference, it appears that the required subkey differences (presented in Figure 6) cannot be assured by any fixed key difference. The subkey difference pattern requires some cancellation (in byte 11 of ∆K^3_ac), that occurs with probability 2^{-7}. Hence, this differential can be interpreted as a weak key class or a conditional differential. Fortunately, the relation can be assured with a small set of keys.

In order to compute q̂, we take all the possible input differences ∆I^5_ac that can lead after SubBytes, ShiftRows, and MixColumns operations to a state with difference x in bytes X8, X12 and zero difference in the rest of the bytes. Such a difference is then canceled with the key difference ∆K^5_ac and leads to zero difference ∆I^6_ac. There are 127^8 such input differences, one of them with probability (2^{-6})^8 = 2^{-48}, 8 · 126 of them with probability (2^{-6})^7 · 2^{-7} = 2^{-49}, and so forth until (126)^8 of them with probability (2^{-7})^8 = 2^{-56}. Summing over all of them yields q̂ ≥ 2^{-27.9}.

Therefore, the overall probability of the rectangle distinguisher (i.e., Pr[I^9_a ⊕ I^9_c, I^9_b ⊕ I^9_d ∈ ∆I^9_ac]) is

2^{-128} · (2^{-19.5})^2 · (2^{-27.9})^2 = 2^{-222.8}.

We note that since ∆I^9_ac consists of 127 differences, the probability that the condition [I^9_a ⊕ I^9_c, I^9_b ⊕ I^9_d ∈ ∆I^9_ac] holds for a random permutation is approximately 2^{-242}.
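The values of q̂, of the distinguisher probability and of the random-permutation probability can be recomputed from the counts given above. This sketch rests on two assumptions that are not restated in this section: that q̂ is the square root of the sum of squared differential probabilities (as is usual for rectangle distinguishers), and that each of the 8 active S-boxes contributes one input difference of probability 2^-6 and 126 of probability 2^-7, which is what the listed counts imply. The variable names are illustrative.

```python
import math

ACTIVE_SBOXES = 8

# Sum of squared probabilities over the 127 input differences of one S-box.
sbox_sum_sq = 1 * (2**-6) ** 2 + 126 * (2**-7) ** 2

# The sum over all 127^8 differences factorizes over the 8 S-boxes;
# q_hat is the square root of that sum (assumed definition).
log2_q_hat = 0.5 * ACTIVE_SBOXES * math.log2(sbox_sum_sq)

log2_p_hat = -19.5                         # bound taken from the text
log2_rectangle = -128 + 2 * log2_p_hat + 2 * log2_q_hat

# Random permutation: both 128-bit differences must fall in a set of 127 values.
log2_random = 2 * (math.log2(127) - 128)

print(f"{log2_q_hat:.1f} {log2_rectangle:.1f} {log2_random:.1f}")
```

The gap of roughly 2^19 between the two probabilities is what lets the distinguisher separate the cipher from a random permutation.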

4.4 Key Recovery Attack on 10-Round AES-192 with 256 Related Keys

Before we present the attack, we address two points arising from the related-key nature of the attack.

Table 6. The development of the subkey differences

[Table 6 body: byte-wise difference patterns of the subkeys ∆K^w_ab, ∆K^0_ab, ..., ∆K^9_ab (nonzero byte differences denoted x, a, b, c, d, e, f, g) and of the subkeys ∆K^w_ac, ∆K^0_ac, ..., ∆K^9_ac (nonzero byte differences denoted x, y, z). The byte positions are not recoverable from this text.]

Multiple Keys. As noted before, the attack requires a quartet of keys satisfying the subkey differences depicted in Figure 6. Since no single key difference can assure these subkey differences, the simplest solution is to repeat the attack for all 127 possible byte values y such that difference x can evolve to y through the SubBytes operation. For each such y, the attack is performed with the same keys Ka and Kb, and with different keys Kc and Kd that correspond to y. Thus, the total number of required related keys is 256 (a single pair (Ka, Kb), and 127 pairs (Kc, Kd)). Note that since we try all possible values for y, one of the quartets we consider necessarily satisfies the subkey differences.

The Amount of Guessed Subkey Material. The attack recovers bytes 1, 2, 6, 7, 8, 11, 12, 13 of the whitening key quartet (K^w_a, K^w_b, K^w_c, K^w_d), and bytes 0, 4, 6, 7, 8, 10, 12, 13 of the subkey quartet (K^9_a, K^9_b, K^9_c, K^9_d). While there are exactly (2^8)^8 = 2^64 possible values for the eight guessed bytes in the whitening keys, the number of possible values of the six bytes of K^9 is much higher than (2^8)^8 = 2^64. This occurs as the subkey difference ∆K^9_ab in four of these bytes is unknown to the adversary (these are bytes 0, 8, 10, 13, whose differences are denoted in Figure 6 by c, c, e, f, respectively). As a result, the adversary has to learn not only six bytes of K^9_a, but also the values c, e, f. Due to properties of the key schedule, given the value of e, there are only 127 possible values of f. Hence, the number of possible guesses in K^9 is (2^8)^8 · (2^8)^2 · 2^7 = 2^87.

The attack algorithm goes as follows:

Data Collection Phase (Steps 1–3):

1. Choose 2^49.2 structures S^1_a, S^2_a, ..., S^{2^49.2}_a of 2^64 plaintexts each, where in each structure the values of bytes 0, 3, 4, 5, 9, 10, 14, 15 are fixed and the remaining eight bytes assume all the possible values, and ask for their encryption under the key Ka.
2. Compute 2^49.2 structures S^1_b, S^2_b, ..., S^{2^49.2}_b of 2^64 plaintexts each by XORing byte 9 of all the plaintexts in S^1_a, S^2_a, ..., S^{2^49.2}_a with x. Ask for the encryption of these structures under the key Kb.
3. Guess a candidate for the difference y and compute ∆Kac. For the key difference ∆Kac, do the following:
   (a) Choose 2^49.2 structures S^1_c, S^2_c, ..., S^{2^49.2}_c of 2^64 plaintexts each, constructed similarly to S^j_a (possibly with different constant values), and ask for their encryption under the key Kc = Ka ⊕ ∆Kac.
   (b) Compute 2^49.2 structures S^1_d, S^2_d, ..., S^{2^49.2}_d of 2^64 plaintexts each by XORing byte 9 of all the plaintexts in S^1_c, S^2_c, ..., S^{2^49.2}_c with x. Ask for the encryption of these structures under the key Kd = Kb ⊕ ∆Kac.

Analyzing Eb and Finding Candidate Quartets (Step 4):

4. Guess the 64 bits of bytes 1, 2, 6, 7, 8, 11, 12, 13 of K^w_a and derive from them the corresponding bytes in K^w_b, K^w_c, K^w_d. For each such guess:
   (a) Partially encrypt all the plaintexts through the 8 active S-boxes of round 0, and find all pairs (Pa, Pb) with difference α (see footnote 23) just before the AddRoundKey of round 0 (where Px is encrypted under Kx). Denote the corresponding ciphertexts by (Ca, Cb), respectively. Each pair of structures (S^j_a, S^j_b) is expected to contain 2^64 such pairs (Ca, Cb), and hence the total number of pairs at this stage is 2^49.2 · 2^64 = 2^113.2.
   (b) Insert all pairs (Ca, Cb) into a hash table (indexed by bytes 1, 2, 3, 4, 5, 6, 9, 14 of each of the ciphertexts).
   (c) Similarly, find all pairs (Pc, Pd) with difference α just before the AddRoundKey of round 0 (Px is encrypted under Kx). Denote their corresponding ciphertexts by (Cc, Cd), respectively. As before, the expected number of pairs (Cc, Cd) at this stage is 2^113.2.
   (d) For every pair (Cc, Cd) check whether there exists a pair (Ca, Cb) such that Ca ⊕ Cc and Cb ⊕ Cd are zero in bytes 1, 2, 3, 5, 6, 9, 14, and x in byte 4. For each such quartet, check that the difference Ca ⊕ Cc is the same in bytes 11 and 15 (denoted by zac), and check the same for Cb ⊕ Cd (where the difference is denoted by zbd). Check that x input difference to the S-box may cause zac and zbd output differences (otherwise, discard the quartet). Starting with (2^113.2)^2 = 2^226.4 quartets, about 2^226.4 · 2^{-112} = 2^114.4 quartets satisfy the zero differences in bytes 1, 2, 3, 5, 6, 9, 14, of which 2^114.4 · 2^{-16} = 2^98.4 satisfy the x differences in byte 4, and amongst them 2^98.4 · (2^{-8})^2 · (127/256)^2 = 2^80.4 quartets remain and are further analyzed.

Analyzing Ef (Steps 5–8):

5. Using the difference in byte 12 of ∆I^10_ac and ∆I^10_bd, deduce the subkey suggested by the pairs (Ca, Cc) and (Cb, Cd) for byte 12 of K^9_a, K^9_b, K^9_c, K^9_d (due to the subkey differences, the value is the same for these four subkeys; see footnote 24). If the values disagree, discard the quartet. Of the 2^80.4 quartets entering Step 5, we expect 2^80.4 · 2^{-8} = 2^72.4 quartets to remain, each suggesting one value for subkey byte 12.
6. For each remaining quartet, consider the pairs (Ca, Cc) and (Cb, Cd) separately and use the differences zac, zbd to deduce the value of byte 4 in K^9_a, K^9_b, K^9_c, K^9_d (see footnote 25). Deduce the value of c from byte 4 of the difference K^9_a ⊕ K^9_b.
7. For each quartet, consider the pairs (Ca, Cc) and (Cb, Cd) separately, and for each of them use the difference in byte 8 of ∆(I^10) to deduce the values suggested by the pairs (Ca, Cc) and (Cb, Cd) for byte 8 of K^9_a, K^9_b, K^9_c, K^9_d. Use the value of byte 8 of the difference K^9_a ⊕ K^9_b to get a suggestion for c. If the suggestion disagrees with the value of c obtained for the quartet in Step 6, discard the quartet. Consistent values are expected with probability 2^{-8}, and about 2^72.4 · 2^{-8} = 2^64.4 quartets remain. Each of the remaining quartets suggests on average one value for bytes 4, 8, 12 of K^9_a, K^9_b, K^9_c, K^9_d, and for c.
8. For each remaining (quartet, subkey guess):
   (a) Guess byte 0 of K^9_a (that along with the knowledge of c is sufficient to compute byte 0 of K^9_b, K^9_c, K^9_d), and partially decrypt the two pairs to find the value of byte 0 in the input of round 9.
   (b) For each pair, use the knowledge of byte 0 in ∆I^9_ac (or ∆I^9_bd) to retrieve the entire difference ∆I^9_ac (or ∆I^9_bd, respectively). This is possible since the differences in the other bytes of the first column depend linearly on the difference in byte 0.
   (c) For each pair, use the input and output difference of the SubBytes operation in byte 3 of round 9 to find the value of byte 7 in K^9. If the pairs of the quartet disagree on that value, discard the (quartet, subkey guess). For each guess of byte 0, 2^{-8} of the quartets are expected to suggest consistent values, and hence on average, each quartet suggests a single value for bytes 0, 7 of K^9.
   (d) For each pair, use the input and output differences of the SubBytes operation in bytes 1, 2 of round 9 to find the value of bytes 10, 13 in K^9. Use the difference in bytes 10, 13 of K^9_a ⊕ K^9_b to retrieve the value of e, f. If e input difference to the S-box cannot cause f output difference, discard the (quartet, subkey guess).
   At this stage, each of the 2^64.4 remaining quartets suggests one value on average for bytes 0, 4, 7, 8, 10, 12, 13 of K^9_a, K^9_b, K^9_c, K^9_d, and for c, e, f, and half of them are expected to offer consistent suggestion for e and f.

23 The difference α is defined in Section 4.3.
24 Recall that a (∆IN, ∆OUT) pair for the SubBytes operation yields on average one suggestion for the actual inputs/outputs. In our case, the output difference is known, and we assume that the input difference is x. This gives us one suggestion on average for the actual outputs of the SubBytes operation, that in turn yields a single suggestion on average for byte 12 of the last subkey.
25 For each pair, we consider the generation of byte 11 in K^9 in the key schedule algorithm. At this stage, the input difference to the SubBytes operation in the key schedule algorithm is x, and the output difference is zac (or zbd for the second pair). This suggests one value on average for the input of the SubBytes operation, i.e., to byte 4 of K^9.

Finding the Right Key (Step 9):

9. If a subkey combination is suggested by six quartets or more, assume it is correct, and try to deduce the correct key using exhaustive search of the remaining bytes.
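The S-box conditions used in Steps 4(d), 5, 6 and 8 can be illustrated with a minimal Python sketch. It rebuilds the AES S-box from its algebraic definition, lists the output differences reachable from a given input difference, and enumerates key-byte suggestions for one ciphertext pair. The helper names (`possible_outputs`, `key_byte_suggestions`) are ours, and the sketch assumes a final round consisting of SubBytes followed by a key XOR on the byte in question; it is not the implementation used for the attack.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    """Inverse in GF(2^8), computed as a^254 (0 maps to 0)."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def rotl8(v, s):
    """Rotate an 8-bit value left by s positions."""
    return ((v << s) | (v >> (8 - s))) & 0xFF


# AES S-box: field inversion followed by the affine transformation.
SBOX = []
for a in range(256):
    b = gf_inv(a)
    SBOX.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)

INV_SBOX = [0] * 256
for a, s in enumerate(SBOX):
    INV_SBOX[s] = a


def possible_outputs(dx):
    """Output differences that input difference dx can cause through the S-box."""
    return {SBOX[v] ^ SBOX[v ^ dx] for v in range(256)}


def key_byte_suggestions(c1, c2, dx):
    """Key bytes k for which the S-box inputs of ciphertext bytes c1, c2 differ by dx.

    Assumes c = SBOX[input] ^ k for the byte considered (hypothetical last round).
    """
    return [k for k in range(256)
            if INV_SBOX[c1 ^ k] ^ INV_SBOX[c2 ^ k] == dx]


if __name__ == "__main__":
    dx = 0x1F  # arbitrary nonzero input difference chosen for illustration
    print(len(possible_outputs(dx)), "reachable output differences")
    total = sum(len(key_byte_suggestions(0x00, c2, dx)) for c2 in range(256))
    print(total / 256, "key-byte suggestions per pair on average")
```

Every nonzero input difference reaches 127 of the 256 output differences, which is the source of the $(127/256)^2$ factor in Step 4(d), and averaging over all ciphertext differences gives one key-byte suggestion per pair, in line with footnote 24.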

Analysis of the Attack In Steps 1,2,3 the adversary encrypts $2^{121.2}$ chosen plaintexts ($2^{113.2}$ plaintexts under each of the keys $K_a$, $K_b$, $K_c$, and $K_d$). Hence, the data complexity of this attack is about $2^{121.2}$ related-key chosen plaintexts. Step 4(a) may look as if it takes $2^{121.2} \cdot 2^{64} = 2^{185.2}$ partial encryptions. However, since the plaintext bytes that are fixed have no effect on the actual "pairing", it is sufficient to partially encrypt a set of $2^{64}$ values under $2^{64}$ subkeys (and repeat this for each value of $y$). Hence, the total time complexity of an optimized implementation of this step is $2^{64} \cdot 2^{64} \cdot 2^{7} = 2^{135}$ partial encryptions (which are about $2^{132}$ encryptions in total). Step 4(b) is composed of inserting the pairs found in Step 2(a) into tables. For each subkey guess there are $2^{113.2}$ pairs which are put into the table, and thus, this step takes $2^{64} \cdot 2^{113.2} \cdot 2^{7} = 2^{184.2}$ memory accesses. Step 4(c) is similar to Step 4(a), and takes the same time. The same is true for Step 4(d) with respect to Step 4(b). We note that Step 4(b) has to be performed only once and not $2^{7}$ times like the other steps (as it is independent of $\Delta K_{ac}$), and thus, Step 4 takes in total $2^{184.2}$ memory accesses. Steps 5–8 each analyze a small number of quartets in a relatively efficient manner. Finally, Step 9 exhaustively searches over $2^{128}$ keys for each key candidate. Hence, we conclude that the running time of the attack is dominated by Steps 4(c) and 4(d), and is approximately $2^{184.2}$ memory accesses.
We can calculate the success rate of the attack by using the Poisson distribution. At the end of Step 8(d), we expect $2^{63.4}$ (quartet, subkey guess) combinations to remain. Given that there are $2^{79}$ possible values at this point ($c, e, f$ suggest values for the related keys $K_b, K_d$), we expect a wrong subkey to be suggested by $2^{-15.6}$ quartets on average, while the right subkey is expected to be suggested by $2^{226.4} \cdot 2^{-222.8} = 2^{3.6} \approx 12$ quartets. Hence, the probability that any given wrong tuple of subkeys (of the $2^{79} \cdot 2^{64} = 2^{143}$ subkeys) is suggested more than five times is about $2^{-103.1}$. Therefore, we expect $2^{143} \cdot 2^{-103.1} = 2^{39.9}$ wrong subkeys to be analyzed in Step 9, which means that Step 9 has time complexity of $2^{39.9} \cdot 2^{128} = 2^{167.9}$ trial encryptions. The right subkey is expected to be suggested by more than five quartets with probability about 98.0%. When this is the case, the right key is found, and thus, the success rate of the attack is about 98.0%.
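The counts in the analysis above can be re-derived with a short Python sketch. It first accumulates the filtering exponents of Steps 4(d) to 8(d), then evaluates the two Poisson tails for the threshold of six quartets. All exponents are quoted from the text; the function names are ours, and the tail is summed term by term in the log domain (our choice, so that the very small wrong-key probability is not lost to rounding).

```python
import math

# log2 change in the number of quartets at each filter, as quoted in the text.
FILTERS = [
    ("quartets from 2^113.2 pairs on each side", 2 * 113.2),
    ("zero difference in 7 bytes, both pairs", -2 * 7 * 8),
    ("difference x in byte 4, both pairs", -2 * 8),
    ("bytes 11 and 15 agree, both pairs", -2 * 8),
    ("x -> z possible through the S-box, both pairs", 2 * math.log2(127 / 256)),
    ("Step 5: pairs agree on subkey byte 12", -8),
    ("Step 7: suggestion for c agrees with Step 6", -8),
    ("Step 8(a): guess byte 0 of the subkey", +8),
    ("Step 8(c): pairs agree on byte 7", -8),
    ("Step 8(d): e -> f possible for about half", -1),
]


def running_log2_counts(filters):
    """Return (label, log2 of the expected count) after each filter."""
    total = 0.0
    out = []
    for label, delta in filters:
        total += delta
        out.append((label, total))
    return out


def poisson_tail(lam, k_min, terms=200):
    """P[X >= k_min] for X ~ Poisson(lam), summed directly in the log domain."""
    return sum(
        math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        for k in range(k_min, k_min + terms)
    )


counts = running_log2_counts(FILTERS)

THRESHOLD = 6                           # "six quartets or more" in Step 9
lam_wrong = 2.0 ** (63.4 - 79)          # expected suggestions for a wrong subkey
lam_right = 2.0 ** (226.4 - 222.8)      # expected suggestions for the right subkey

p_wrong = poisson_tail(lam_wrong, THRESHOLD)
p_right = poisson_tail(lam_right, THRESHOLD)
log2_wrong_survivors = 143 + math.log2(p_wrong)
log2_step9_time = log2_wrong_survivors + 128

if __name__ == "__main__":
    for label, value in counts:
        print(f"{label:48s} 2^{value:.1f}")
    print(f"log2 P[wrong subkey passes] = {math.log2(p_wrong):.1f}")
    print(f"P[right subkey passes]      = {p_right:.3f}")
    print(f"log2 Step 9 time            = {log2_step9_time:.1f}")
```

The running totals agree with the figures in the text up to the rounding of $(127/256)^2$ to $2^{-2}$.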

4.5 Reducing the Number of Related Keys from 256 to 64

The number of related keys used in our attack can be reduced from 256 to 64 using key structures. The following 64 related keys are used in our optimized attack:

– 16 keys $K_a^i$ ($i = 0, 1, \ldots, 15$) such that $K_a^i \oplus K_a^j$ is zero in all bytes, besides bytes 3,8,11, and 12, and the difference in bytes 3,11 is fixed to some $w$ and the difference in bytes 8,12 is fixed to some $r$ (not necessarily $w \to r$).
– 16 keys $K_b^i$, each computed as $K_a^i \oplus \Delta K_{ab}$ for a specific value of $x$.
– 16 keys $K_c^i$ ($i = 0, 1, \ldots, 15$) such that $K_c^i \oplus K_c^j$ is zero in all bytes, besides bytes 3,8,11, and 12, and the difference in bytes 3,11 is fixed to some $w'$ and the difference in bytes 8,12 is fixed to some $r'$ (not necessarily $w' \to r'$).
– 16 keys $K_d^i$, each computed as $K_c^i \oplus \Delta K_{ab}$ for the same value of $x$ used to generate $K_b^i$.

Using these delicately chosen key relationships, we generate 256 key quartets $(K_a^i, K_b^i, K_c^j, K_d^j)$ of which one is expected to satisfy the subkey difference requirement of the attack (the attack can be applied where the value of $x$ in the first and the second differentials are replaced by $x_1$ and $x_2$, respectively).
As we generate the data set for each possible key only once, the data complexity of the attack is reduced to $2^{119.2}$ related-key chosen plaintexts. On the other hand, the attack is now to be run 256 times rather than 127 times, resulting in a total running time of $2^{185.2}$ memory accesses.
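The bookkeeping behind these figures can be sketched in Python. The sketch only enumerates key labels, not key values, and it takes the per-key data ($2^{113.2}$ plaintexts) and the $2^{184.2}$ running time for 127 runs from Section 4.4; the variable names are ours.

```python
import math
from itertools import product

KEYS_PER_FAMILY = 16          # 16 keys in each of the families a, b, c, d
LOG2_DATA_PER_KEY = 113.2     # chosen plaintexts encrypted under each key
LOG2_TIME_127_RUNS = 184.2    # memory accesses when the attack is run 127 times

# A quartet pairs (K_a^i, K_b^i) with (K_c^j, K_d^j) for every choice of i and j.
quartets = [
    (("a", i), ("b", i), ("c", j), ("d", j))
    for i, j in product(range(KEYS_PER_FAMILY), repeat=2)
]
distinct_keys = {key for quartet in quartets for key in quartet}

# Data is collected once per key and shared by all quartets that use the key.
log2_data = math.log2(len(distinct_keys)) + LOG2_DATA_PER_KEY

# The attack is run once per quartet instead of 127 times.
log2_time = LOG2_TIME_127_RUNS + math.log2(len(quartets) / 127)

if __name__ == "__main__":
    print(len(quartets), "key quartets from", len(distinct_keys), "keys")
    print(f"data: 2^{log2_data:.1f} chosen plaintexts")
    print(f"time: 2^{log2_time:.1f} memory accesses")
```

Sharing each key's data set among the 16 quartets that contain it is what brings the data complexity down from $2^{121.2}$ to $2^{119.2}$, at the price of roughly twice as many runs.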

5 Conclusions

In this paper we introduced the related-key boomerang and the related-key rectangle attacks. The attacks use weaknesses of the key schedule algorithms to achieve a significant advantage over other attack techniques. We presented a rigorous treatment of the new techniques, thus devising optimal distinguishers. Both the related-key boomerang attack and the related-key rectangle attack enjoy the use of key differences twice. Hence, in exchange for an attack model with four related keys, the adversary is able to attack a significantly larger number of rounds than in the standard single-key model or a standard related-key differential attack. Apart from the immediate attacks, another outcome of our results is a better understanding of the importance of well-designed key schedule algorithms for the security of block ciphers. While it is commonly believed that a linear key schedule (or one close to it) is of no security concern to a well-designed block cipher, the related-key boomerang and rectangle attacks, along with the concept of structures of keys (for nonlinear key schedule algorithms), show that this belief is dangerous and at times may be faulty.

References

1. Thomas Baignères, Pascal Junod, Serge Vaudenay, How Far Can We Go Beyond Linear Cryptanalysis, Advances in Cryptology, proceedings of ASIACRYPT 2004, Lecture Notes in Computer Science 3329, pp. 432–450, Springer-Verlag, 2004.
2. Mihir Bellare, Tadayoshi Kohno, A Theoretical Treatment of Related-Key Attacks: RKA-PRPs, RKA-PRFs, and Applications, Advances in Cryptology, proceedings of EUROCRYPT 2003, Lecture Notes in Computer Science 2656, pp. 491–506, Springer-Verlag, 2003.
3. Ishai Ben-Aroya, Eli Biham, Differential Cryptanalysis of Lucifer, Advances in Cryptology, proceedings of EUROCRYPT '93, Lecture Notes in Computer Science 773, pp. 187–199, Springer-Verlag, 1994.
4. Eli Biham, New Types of Cryptanalytic Attacks Using Related Keys, Journal of Cryptology, Vol. 7, No. 4, pp. 229–246, Springer-Verlag, 1994.
5. Eli Biham, Adi Shamir, Differential Cryptanalysis of DES-like Cryptosystems, Advances in Cryptology, proceedings of CRYPTO 1990, Lecture Notes in Computer Science 537, pp. 2–21, Springer-Verlag, 1990.
6. Eli Biham, Adi Shamir, Differential Cryptanalysis of the Data Encryption Standard, Springer-Verlag, 1993.
7. Eli Biham, Orr Dunkelman, Nathan Keller, The Rectangle Attack - Rectangling the Serpent, Advances in Cryptology, proceedings of EUROCRYPT 2001, Lecture Notes in Computer Science 2045, pp. 340–357, Springer-Verlag, 2001.
8. Eli Biham, Orr Dunkelman, Nathan Keller, New Results on Boomerang and Rectangle Attacks, proceedings of Fast Software Encryption 2002, Lecture Notes in Computer Science 2365, pp. 1–16, Springer-Verlag, 2002.
9. Eli Biham, Orr Dunkelman, Nathan Keller, Related-Key Boomerang and Rectangle Attacks, Advances in Cryptology, proceedings of EUROCRYPT 2005, Lecture Notes in Computer Science 3494, pp. 507–525, Springer-Verlag, 2005.
10. Eli Biham, Orr Dunkelman, Nathan Keller, A Related-Key Rectangle Attack on the Full KASUMI, Advances in Cryptology, proceedings of ASIACRYPT 2005, Lecture Notes in Computer Science 3788, pp. 443–461, Springer-Verlag, 2005.
11. Eli Biham, Orr Dunkelman, Nathan Keller, New Cryptanalytic Results on IDEA, Advances in Cryptology, proceedings of ASIACRYPT 2006, Lecture Notes in Computer Science 4284, pp. 412–427, Springer-Verlag, 2006.
12. Eli Biham, Orr Dunkelman, Nathan Keller, A Unified Framework for Related-Key Attacks, proceedings of Fast Software Encryption 2008, Lecture Notes in Computer Science 5086, pp. 73–96, Springer-Verlag, 2008.
13. Alex Biryukov, The Boomerang Attack on 5 and 6-Round AES, Proceedings of Advance Encryption Standard Fourth Workshop, Lecture Notes in Computer Science 3373, pp. 11–16, Springer-Verlag, 2005.
14. Alex Biryukov, Dmitry Khovratovich, Related-key Cryptanalysis of the Full AES-192 and AES-256, IACR ePrint report 2009/317, 2009. Available online at http://eprint.iacr.org/2009/317.
15. Mark Blunden, Adrian Escott, Related Key Attacks on Reduced Round KASUMI, proceedings of Fast Software Encryption 2001, Lecture Notes in Computer Science 2355, pp. 277–285, Springer-Verlag, 2002.
16. Joan Daemen, Vincent Rijmen, The design of Rijndael: AES - the Advanced Encryption Standard, Springer-Verlag, 2002.
17. Joan Daemen, Vincent Rijmen, Understanding Two-Round Differentials in AES, proceedings of Security and Cryptography for Networks 2006, Lecture Notes in Computer Science 4116, pp. 78–94, Springer-Verlag, 2006.
18. Hüseyin Demirci, Ali Aydin Selçuk, A Meet-in-the-Middle Attack on 8-Round AES, proceedings of Fast Software Encryption 2008, Lecture Notes in Computer Science 5086, pp. 116–126, Springer-Verlag, 2008.
19. Niels Ferguson, John Kelsey, Stefan Lucks, Bruce Schneier, Mike Stay, David Wagner, Doug Whiting, Improved Cryptanalysis of Rijndael, proceedings of Fast Software Encryption 2000, Lecture Notes in Computer Science 1978, pp. 213–230, Springer-Verlag, 2001.
20. Michael Gorski, Stefan Lucks, New Related-Key Boomerang Attacks on AES, proceedings of INDOCRYPT 2008, Lecture Notes in Computer Science 5365, pp. 266–278, Springer-Verlag, 2008.
21. Seokhie Hong, Jongsung Kim, Guil Kim, Sangjin Lee, Bart Preneel, Related-Key Rectangle Attacks on Reduced Versions of SHACAL-1 and AES-192, proceedings of Fast Software Encryption 2005, Lecture Notes in Computer Science 3557, pp. 368–383, Springer-Verlag, 2005.
22. Goce Jakimoski, Yvo Desmedt, Related-Key Differential Cryptanalysis of 192-bit Key AES Variants, proceedings of Selected Areas in Cryptography 2003, Lecture Notes in Computer Science 3006, pp. 208–221, Springer-Verlag, 2004.
23. John Kelsey, Bruce Schneier, David Wagner, Key Schedule Cryptanalysis of IDEA, G-DES, GOST, SAFER, and Triple-DES, Advances in Cryptology, proceedings of CRYPTO 1996, Lecture Notes in Computer Science 1109, pp. 237–251, Springer-Verlag, 1996.
24. John Kelsey, Bruce Schneier, David Wagner, Related-Key Cryptanalysis of 3-WAY, Biham-DES, CAST, DES-X, NewDES, RC2, and TEA, proceedings of Information and Communications Security 1997, Lecture Notes in Computer Science 1334, pp. 233–246, Springer-Verlag, 1997.
25. John Kelsey, Tadayoshi Kohno, Bruce Schneier, Amplified Boomerang Attacks Against Reduced-Round MARS and Serpent, proceedings of Fast Software Encryption 2001, Lecture Notes in Computer Science 1978, pp. 75–93, Springer-Verlag, 2002.
26. Jongsung Kim, Guil Kim, Seokhie Hong, Dowon Hong, The Related-Key Rectangle Attack - Application to SHACAL-1, proceedings of Australasian Conference on Information Security and Privacy 2004, Lecture Notes in Computer Science 3108, pp. 123–136, Springer-Verlag, 2004.
27. Jongsung Kim, Seokhie Hong, Bart Preneel, Related-Key Rectangle Attacks on Reduced AES-192 and AES-256, proceedings of FSE 2007, Lecture Notes in Computer Science 4593, pp. 225–241, Springer-Verlag, 2007.
28. Lars R. Knudsen, Cryptanalysis of LOKI91, proceedings of Auscrypt 1992, Lecture Notes in Computer Science 718, pp. 196–208, Springer-Verlag, 1993.
29. Ulrich Kühn, Cryptanalysis of Reduced-Round MISTY, Advances in Cryptology, proceedings of EUROCRYPT 2001, Lecture Notes in Computer Science 2045, pp. 325–339, Springer-Verlag, 2001.
30. X. Lai, J.L. Massey, S. Murphy, Markov Ciphers and Differential Cryptanalysis, Advances in Cryptology, proceedings of EUROCRYPT '91, Lecture Notes in Computer Science 547, pp. 17–38, Springer-Verlag, 1992.
31. Mitsuru Matsui, Linear Cryptanalysis Method for DES Cipher, Advances in Cryptology, proceedings of EUROCRYPT '93, Lecture Notes in Computer Science 765, pp. 386–397, Springer-Verlag, 1994.
32. National Institute of Standards and Technology, Advanced Encryption Standard, Federal Information Processing Standards Publications No. 197, 2001.
33. Kaisa Nyberg, Perfect Nonlinear S-boxes, Advances in Cryptology, proceedings of EUROCRYPT 1991, Lecture Notes in Computer Science 547, pp. 378–386, Springer-Verlag, 1991.
34. Kaisa Nyberg, Lars R. Knudsen, Provable Security Against Differential Cryptanalysis, Advances in Cryptology, proceedings of CRYPTO 1992, Lecture Notes in Computer Science 740, pp. 566–578, Springer-Verlag, 1993.
35. Ali Aydin Selçuk, On Probability of Success in Linear and Differential Cryptanalysis, Journal of Cryptology, Vol. 21, No. 1, pp. 131–147, Springer-Verlag, 2008.
36. 3rd Generation Partnership Project, Technical Specification Group Services and System Aspects, 3G Security, Specification of the 3GPP Confidentiality and Integrity Algorithms; Document 2: KASUMI Specification, V.3.1.1, 2001.
37. Gene Tsudik, Els Van Herreweghen, On simple and secure key distribution, Conference on Computer and Communications Security, Proceedings of the 1st ACM conference on Computer and communications security, pp. 49–57, ACM, 1993.
38. David Wagner, The Boomerang Attack, proceedings of Fast Software Encryption 1999, Lecture Notes in Computer Science 1636, pp. 156–170, Springer-Verlag, 1999.
39. Gaoli Wang, Nathan Keller, Orr Dunkelman, The Delicate Issues of Addition with Respect to XOR Differences, proceedings of Selected Areas in Cryptography 2007, Lecture Notes in Computer Science 4876, pp. 212–231, Springer-Verlag, 2007.
40. David J. Wheeler, Roger M. Needham, TEA, a Tiny Encryption Algorithm, proceedings of Fast Software Encryption 1994, Lecture Notes in Computer Science 1008, pp. 363–366, Springer-Verlag, 1995.
41. ZDNet, New Xbox security cracked by Linux fans, 2002. Available on-line at http://news.zdnet.co.uk/software/developer/0,39020387,2123851,00.htm.
42. Wentao Zhang, Wenling Wu, Lei Zhang, Dengguo Feng, Improved Related-Key Impossible Differential Attacks on Reduced-Round AES-192, proceedings of Selected Areas in Cryptography 2006, Lecture Notes in Computer Science 4356, pp. 15–27, Springer-Verlag, 2007.
43. Wentao Zhang, Lei Zhang, Wenling Wu, Dengguo Feng, Related-Key Differential-Linear Attacks on Reduced AES-192, proceedings of INDOCRYPT 2007, Lecture Notes in Computer Science 4859, pp. 73–85, Springer-Verlag, 2007.