Facoltà di Scienze Matematiche, Fisiche e Naturali

Graduation Thesis in Mathematics

Synthesis

Fast Algebraic in Finite Fields of Higher Order with the Cube Attack

Candidate: Marco Vargiu (406875)

Supervisor: Prof. Marco Pedicini

Academic year: 2011/2012

MSC AMS: 94A60; 12Y05; 13P10; 68W30; 68P30. KEYWORDS: Communication technology, Computational security, Efficient exhaustive search, Differential cryptanalysis.

Synthesis

Over the last decades, fast-evolving communication technologies have affected the daily life of practically every person on Earth. The communication infrastructure is a fundamental part of the global economy and will play an even more vital role in most aspects of global progress in the decades to come. The rapid development of computers, the electronic transmission of information, online financial transactions, and the growth of military and diplomatic communications have strongly stimulated the work of the cipher-design community.

Cryptography is the science of keeping secrets secret. Assume a sender, referred to as Alice, wants to send a message m to a receiver, referred to as Bob, over an insecure channel, for example a telephone line or a computer network. The message could be intercepted and read by an eavesdropper, see Figure 0.1. Even worse, the adversary, referred to as Eve, might be able to modify the message during transmission in such a way that the legitimate recipient Bob does not detect the manipulation.

Fig. 0.1. Alice, Bob and Eve scheme.

The message to be transmitted is called the plaintext; it can be text, numerical data or any other kind of information, and its structure is completely arbitrary. After encrypting the plaintext m, Alice obtains the resulting ciphertext c, which is then transmitted to Bob over the channel. Bob can turn the ciphertext back into the plaintext by decryption. These ideas are formalized in Chapter 1 with the following mathematical notation. For a more detailed description we refer to [DK07, Sti06, Jou09].

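Definition 1. A cryptosystem is a five-tuple $(\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ satisfying the following conditions:

i) $\mathcal{P}$ is a finite set of possible plaintexts;

ii) $\mathcal{C}$ is a finite set of possible ciphertexts;

iii) $\mathcal{K}$ is a finite set of possible keys;

iv) for each $K \in \mathcal{K}$ there is an encryption rule $e_K \in \mathcal{E}$ and a corresponding decryption rule $d_K \in \mathcal{D}$. Each $e_K : \mathcal{P} \to \mathcal{C}$ and $d_K : \mathcal{C} \to \mathcal{P}$ are functions such that $d_K(e_K(x)) = x$ for every plaintext element $x \in \mathcal{P}$.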
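As a quick sanity check of Definition 1, the sketch below instantiates it with the textbook shift cipher (an illustrative choice, not one studied in this thesis): $\mathcal{P} = \mathcal{C} = \mathcal{K} = \mathbb{Z}_{26}$, $e_K(x) = x + K \bmod 26$ and $d_K(y) = y - K \bmod 26$. The function names are hypothetical; the code only verifies the defining condition $d_K(e_K(x)) = x$ for every key and plaintext.

```python
# Toy instance of Definition 1: the shift cipher over Z_26 (illustrative only).
MODULUS = 26
PLAINTEXTS = range(MODULUS)   # P (and C, since C = P here)
KEYS = range(MODULUS)         # K


def encrypt(key: int, x: int) -> int:
    """Encryption rule e_K : P -> C."""
    return (x + key) % MODULUS


def decrypt(key: int, y: int) -> int:
    """Decryption rule d_K : C -> P."""
    return (y - key) % MODULUS


def satisfies_definition_1() -> bool:
    """Check d_K(e_K(x)) == x for every key K and every plaintext x."""
    return all(decrypt(k, encrypt(k, x)) == x for k in KEYS for x in PLAINTEXTS)


if __name__ == "__main__":
    print(satisfies_definition_1())  # True
    print(encrypt(3, 7))             # 10
```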

Typically, one has to transmit a message consisting of a finite string of symbols taken from some finite alphabet. In almost every cryptographic algorithm this leads to arithmetic in finite mathematical structures, such as finite multiplicative groups, rings, and finite fields. Consider, for instance, the following widely used ciphers:

− RC4 has no algebraic structure of this kind: it is based on permutations of a set of 256 elements;

− RSA is based on modular integer arithmetic;

− AES is based on arithmetic in the finite field $\mathbb{F}_{2^8}$.
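To make the last item concrete, the sketch below multiplies two elements of $\mathbb{F}_{2^8}$ the way AES does: a byte is read as a polynomial of degree at most 7 over $\mathbb{F}_2$ (bit i is the coefficient of $x^i$), the two polynomials are multiplied, and the result is reduced modulo the sparse irreducible polynomial $x^8 + x^4 + x^3 + x + 1$ used by AES. `gf256_mul` is a hypothetical helper written for illustration only; it is not constant-time and is not meant for real implementations.

```python
AES_MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over F_2


def gf256_mul(a: int, b: int) -> int:
    """Multiply two elements of F_{2^8} given as bytes (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:            # this power of x appears in b: add (XOR) the current a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:        # degree 8 reached: reduce modulo the AES polynomial
            a ^= AES_MODULUS
    return result


if __name__ == "__main__":
    print(hex(gf256_mul(0x57, 0x83)))  # 0xc1
```

The reduction step only XORs a constant whenever the degree reaches 8, which is where a sparse modulus keeps the cost low.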

Finite fields can be seen as combining the operations of rings and of multiplicative groups: a multiplicative group has only one operation, and a ring does not in general have a multiplicative inverse for every nonzero element, whereas a finite field provides addition and subtraction, multiplication and division by nonzero elements, and both additive and multiplicative inverses. Arithmetic in finite fields $\mathbb{F}_q$, where $q = p^k$ and p is a prime, is used in many cryptographic algorithms, such as the Diffie-Hellman key exchange based on the discrete logarithm, elliptic curve cryptography, and so on. More interestingly, the elements of the extension field $\mathbb{F}_q$ can be represented as polynomials of degree at most $k - 1$ with coefficients in the base field $\mathbb{F}_p$. Arithmetic in the extension field $\mathbb{F}_q$ is therefore usually performed as ordinary polynomial arithmetic: the most important operation is polynomial multiplication followed by reduction modulo the irreducible polynomial of degree k that defines the extension field. Sparse irreducible polynomials are preferred because they make the reduction step of the multiplication in $\mathbb{F}_q$ cheaper. Chapter 2 describes these notions in detail, together with the notions of probability and computation theory used throughout this thesis.

These analogies between cryptography and algebraic tools have made the study of efficient algorithms for general extension field arithmetic a popular research area. Polynomials over finite fields are also of great practical use in cryptography: almost any cryptosystem can be described by polynomials over a finite field $\mathbb{F}_q$ which are, to some extent, tweakable (i.e., they can be manipulated by the attacker) and which contain secret and public variables (e.g., key bits for the former, plaintext or IV bits for the latter).

Eve's goal is to recover the plaintext, or part of it, from the ciphertext without knowing the secret key, to substitute parts of the original message, to forge digital signatures, or to find the secret key itself. Cryptanalysis is the science of studying such attacks against cryptographic schemes. Several attacks are possible, depending on the resources of the adversary; for example, Eve might operate a bank computer and see incoming ciphertexts together with the corresponding plaintexts, and vice versa. Attacks can therefore be classified as follows:

i) Ciphertext-only attack. Eve can only obtain ciphertexts.

ii) Known-plaintext attack. Eve can obtain plaintext-ciphertext pairs.

iii) Chosen-plaintext attack. Eve can obtain ciphertexts for plaintexts of her choice.

iv) Chosen-ciphertext attack. Eve can obtain plaintexts for ciphertexts of her choice.

v) Brute force attack. Eve tries all possible key values until the correct one is found.

vi) Algebraic attack. Algebraic cryptanalysis consists of two steps. First, Eve converts the cipher into a system of polynomial equations. Second, she solves the system and obtains from the solution the secret key of the cipher or part of the plaintext.

Among the many algebraic attacks, the Cube Attack was introduced in September 2008 by Itai Dinur and Adi Shamir in their paper “Cube attacks on tweakable black box polynomials” [DS09]; it is the main subject of study of this thesis (Chapter 3). The Cube Attack has similarities with a technique called “Algebraic IV Differential Attack” (AIDA), published in 2007 by Michael Vielhaber. Unlike AIDA, which was directed only at the analysis of stream ciphers, Cube Attacks can be applied to any cryptosystem that can be described by a random-looking polynomial of degree d in n + m variables:

$$p(v_1, \dots, v_m, x_1, \dots, x_n),$$

where

• $v_1, \dots, v_m$ are the public variables (e.g., plaintext or IV bits);

• $x_1, \dots, x_n$ are the secret variables, which contain the key bits.

Since we deal with dense polynomials of relatively high degree, their explicit representations are extremely complex, so we assume that they are provided only implicitly, as “black boxes” that can be queried. Cube Attacks therefore recover a secret key through queries to a black box polynomial with tweakable public variables (see Fig. 0.2), followed by solving a linear system of equations in the secret key variables. Moreover, the Cube Attack needs no knowledge of the internal structure of the cryptosystem to succeed.

Fig. 0.2. A black box, or enciphering function.

The attack consists of two phases. During the preprocessing phase, the attacker is allowed to set the values of all the variables $(v_1, \dots, v_m, x_1, \dots, x_n)$ and to use the black box to evaluate the corresponding output bit of p. This corresponds to the usual cryptanalytic setting in which the attacker can study the cryptosystem by running it with various keys and plaintexts. In this phase we want to find enough monomials $t_I = \prod_{i \in I} v_i$, made only of public variables, such that $p = t_I \cdot p_{S(I)} + q_I$, where no monomial of $q_I$ is divisible by $t_I$; $p_{S(I)}$ is called the superpoly of $t_I$ in p. The goal of the preprocessing phase is to find index sets I for which $p_{S(I)}$ is a linear, non-constant polynomial in the secret variables. Once enough linearly independent superpolys of this kind have been found, they form a linear system in $x_1, \dots, x_n$. During the online phase, the n secret variables are set to unknown values, and the attacker is allowed to set the m public variables $(v_1, \dots, v_m)$ to any desired values and to evaluate p on them. Summing the outputs of p over all $2^{|I|}$ assignments of the cube variables $\{v_i : i \in I\}$, with the remaining public variables fixed, yields the value of $p_{S(I)}$ for the unknown key; doing this for every selected cube gives the right-hand sides of the linear system, which is then solved for the key bits.

Chapter 3 also describes the generalization of the attack to the field $\mathbb{F}_q$, where $q = p^k$ for some prime p. This extension is very useful, because it allows Cube Attacks to be applied also to cryptosystems based on polynomials over finite fields of higher order. Finally, Section 3.6 presents a new variant of the Cube Attack called the Dynamic Cube Attack [DS10], published at the end of 2010. Dynamic Cube Attacks allow us to derive information on the secret key directly, without solving any algebraic equations. The drawback of the Dynamic Cube Attack, compared to the standard Cube Attack, is that it requires a more complex analysis of the internal structure of the cipher. The Cube Attack was tested on the stream cipher Trivium; the best result on Trivium is a Cube Attack on a reduced version with 767 initialization rounds instead of 1152. The Dynamic Cube Attack was tested on Grain-128: on the full version of the cipher it recovers the full 128-bit key faster than exhaustive search, but only when the key belongs to a large subset containing a fraction $2^{-10}$ of all possible keys.
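The two phases described above can be illustrated on a toy example. The sketch below is a hypothetical instance, not taken from [DS09] or from this thesis: a small polynomial with 4 public and 3 secret bits plays the role of the black box, `cube_sum` adds up p over a cube with the other public variables set to 0, `linear_superpoly` recovers a superpoly and keeps it only if it is affine (with an exhaustive test that is possible only for such a tiny key; a real attack would use probabilistic linearity tests), and the online phase solves the resulting linear system over $\mathbb{F}_2$. All names are illustrative.

```python
from itertools import product

N_PUBLIC, N_SECRET = 4, 3   # toy sizes: public v0..v3, secret x0..x2


def black_box(v, x):
    """Hypothetical toy polynomial p(v, x) over F_2 (^ is +, & is *)."""
    return (v[0] & v[1] & (x[0] ^ x[2] ^ 1)
            ^ v[1] & v[2] & (x[0] ^ x[1])
            ^ v[0] & v[2] & x[2]
            ^ v[0] & x[1] & x[2]
            ^ v[3] & x[0] & x[1] & x[2]
            ^ v[2] & x[0]
            ^ v[1])


def cube_sum(cube, key):
    """XOR of p over the 2^|I| assignments of the cube variables; other v_i are 0."""
    total = 0
    for bits in product((0, 1), repeat=len(cube)):
        v = [0] * N_PUBLIC
        for idx, bit in zip(cube, bits):
            v[idx] = bit
        total ^= black_box(v, key)
    return total


def linear_superpoly(cube):
    """Preprocessing (attacker chooses keys): (coeffs, const) if p_S(I) is affine, else None."""
    const = cube_sum(cube, [0] * N_SECRET)
    coeffs = [cube_sum(cube, [int(i == j) for i in range(N_SECRET)]) ^ const
              for j in range(N_SECRET)]
    # Exhaustive affinity test; only feasible because the toy key has 3 bits.
    for key in product((0, 1), repeat=N_SECRET):
        predicted = const
        for c, k in zip(coeffs, key):
            predicted ^= c & k
        if cube_sum(cube, list(key)) != predicted:
            return None
    return coeffs, const


def solve_gf2(rows, rhs):
    """Gauss-Jordan elimination over F_2; assumes full column rank."""
    m = [row[:] + [b] for row, b in zip(rows, rhs)]
    n = len(rows[0])
    for col in range(n):
        pivot = next((r for r in range(col, len(m)) if m[r][col]), None)
        if pivot is None:
            raise ValueError("linear system is not of full rank")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(len(m)):
            if r != col and m[r][col]:
                m[r] = [a ^ b for a, b in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def cube_attack(secret_key, cubes):
    """Preprocessing on chosen keys, then the online phase against secret_key."""
    equations = []
    for cube in cubes:
        sp = linear_superpoly(cube)
        if sp is not None and any(sp[0]):   # keep linear, non-constant superpolys
            equations.append((cube, sp))
    rows = [coeffs for _, (coeffs, _) in equations]
    # Online phase: only cube sums computed under the unknown key are used.
    rhs = [cube_sum(cube, secret_key) ^ const for cube, (_, const) in equations]
    return solve_gf2(rows, rhs)


if __name__ == "__main__":
    print(cube_attack([1, 0, 1], [(0, 1), (1, 2), (0, 2), (0,)]))  # [1, 0, 1]
```

In this toy polynomial the cube $\{v_0\}$ has the quadratic superpoly $x_1 x_2$, so it is discarded, while the other three cubes give the linear superpolys $x_0 + x_2 + 1$, $x_0 + x_1$ and $x_2$.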

It follows from Chapter 3 that the most expensive part of the attack is the preprocessing phase, whose cost depends on the number of evaluations of p required to obtain the linear system.

Note that the number of candidate monomials $t_I$, and the cost of summing over each of them, grow quickly with the degree of the starting polynomial: a cube I requires $2^{|I|}$ evaluations of p, and for a polynomial of degree d only cubes of size $d - 1$ are guaranteed to have superpolys of degree at most 1. For this reason most block and stream ciphers are immune to this specific attack, since their polynomial representations have too high a degree. This aspect is not negligible in modern cryptanalysis research: almost every cryptosystem of the latest generation has such an elaborate structure that the resulting polynomial is highly mixing and of high degree. To improve the efficiency of Cube Attacks, the number of evaluations of the polynomial derived from the cryptographic primitive must be reduced. To this end, we suggest a brute force technique suited to Cube Attacks.

The starting point is the paper “Fast Exhaustive Search for Polynomial Systems in $\mathbb{F}_2$” by Charles Bouillaguet et al. [BCC+10], and his PhD thesis [Bou11], where an intelligent brute force technique is described. A brute force attack, or exhaustive search, is the simplest way to recover the secret. This method does not exploit in any way how the cryptographic primitive is computed: it tries to break the cipher, represented by a polynomial $p(x_1, \dots, x_n)$, by evaluating p on every possible assignment of $(x_1, \dots, x_n)$. Its cost is therefore proportional to the number of candidate solutions (the n-tuples), which grows exponentially in n ($2^n$ over $\mathbb{F}_2$). The new technique enumerates the candidate solutions not in an arbitrary order but following a fixed procedure, which tries to minimize the number of times the system is evaluated and the number of arithmetic operations needed to obtain the values of the polynomial representing the enciphering function. It must be clear that, from the complexity point of view, the exponential bound cannot be changed.
What can be improved is the multiplicative coefficient in front of the exponential, which depends on the number of arithmetic operations required at each step. Since this technique tries to minimize this coefficient adaptively, depending on the mathematical properties of the polynomial, we call this approach intelligent brute force, or fast exhaustive search. The method is based on the idea that the value of the polynomial at a given point can be computed from values computed previously, and this incremental term is easy to express when a single bit changes between a point and the next one. In other words, values are computed using the derivatives of the function, and points are enumerated using Gray codes. A Gray code is a binary numeral system in which the representations of two successive values differ in only one bit or, equivalently, a permutation of $(\mathbb{F}_2)^n$ such that two consecutive vectors differ in only one bit. There are many Gray codes with various properties; we use the most standard one, the binary reflected Gray code:

Definition 2. $\mathrm{GrayCode}(i) = i \oplus (i \gg 1)$, where $\gg$ denotes right-shifting.

| i | GrayCode(i) |
|---|---|
| 0 | (0,0,0) |
| 1 | (0,0,1) |
| 2 | (0,1,1) |
| 3 | (0,1,0) |
| 4 | (1,1,0) |
| 5 | (1,1,1) |
| 6 | (1,0,1) |
| 7 | (1,0,0) |

Table 0.1. Gray Code representation of the first 8 numbers.
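Definition 2 translates directly into code. The short sketch below (helper names are illustrative) reproduces Table 0.1, writing bits most significant first as the table does, and its test checks the two properties stated above: the code is a permutation, and consecutive codewords differ in exactly one bit.

```python
def gray_code(i: int) -> int:
    """Binary reflected Gray code of Definition 2: i XOR (i >> 1)."""
    return i ^ (i >> 1)


def to_tuple(x: int, n: int) -> tuple:
    """The n bits of x, most significant first, as written in Table 0.1."""
    return tuple((x >> (n - 1 - k)) & 1 for k in range(n))


if __name__ == "__main__":
    for i in range(8):
        print(i, to_tuple(gray_code(i), 3))   # reproduces Table 0.1
```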

Consecutive points can therefore be obtained by flipping a single, easily located bit, thanks to the following proposition.

Proposition 3. For $i \in \mathbb{N}$, $\mathrm{GrayCode}(i + 1) = \mathrm{GrayCode}(i) \oplus e_{b_1(i+1)}$.

Writing $X_i = (x_1, \dots, x_n) \in \{0, 1\}^n$ for the i-th point, the points are generally enumerated as

$$X_{i+1} = X_i \oplus e_{b_1(i+1) + k_0},$$

where $k_0$ is a fixed index offset (a parameter of Algorithm 0.1) and $b_k(i)$ is the function that takes an integer i and returns the index of the k-th least significant bit of i that is set to 1, counting from the right-most bit; by definition, $b_k(i) = -1$ if the Hamming weight¹ of i is strictly less than k. For example, $b_k(0) = -1$, $b_1(1) = 0$, $b_1(2) = 1$ and $b_2(3) = 1$. To evaluate f at the updated point we can exploit the properties of derivatives.
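The function $b_k$ and Proposition 3 can be checked with a few lines of code. The sketch below assumes 0-based bit indices counted from the right (consistent with $b_1(1) = 0$) and an offset $k_0 = 0$; `enumerate_points` is a hypothetical helper that walks all of $(\mathbb{F}_2)^n$, encoded as n-bit integers, one bit flip at a time.

```python
def gray_code(i: int) -> int:
    """GrayCode(i) = i XOR (i >> 1), as in Definition 2."""
    return i ^ (i >> 1)


def b(k: int, i: int) -> int:
    """b_k(i): index (0 = right-most bit) of the k-th lowest set bit of i, or -1."""
    seen, pos = 0, 0
    while i:
        if i & 1:
            seen += 1
            if seen == k:
                return pos
        i >>= 1
        pos += 1
    return -1          # the Hamming weight of i is strictly less than k


def enumerate_points(n: int, x0: int = 0):
    """Visit every point of F_2^n with X_{i+1} = X_i XOR e_{b_1(i+1)} (offset k0 = 0)."""
    x = x0
    yield x
    for i in range(1, 1 << n):
        x ^= 1 << b(1, i)
        yield x
```

Starting from $X_0 = 0$ the walk reproduces the Gray code sequence of Table 0.1; starting from any other $X_0$ it still visits all $2^n$ points exactly once.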

Definition 4. Define the $\mathbb{F}_2$ derivative $\frac{\partial f}{\partial i}$ of a polynomial f with respect to its i-th variable as $\frac{\partial f}{\partial i} : X \mapsto f(X + e_i) + f(X)$. Then for a vector X we have

$$f(X + e_i) = f(X) + \frac{\partial f}{\partial i}(X), \qquad (0.1)$$

where $e_i$ represents the direction of the derivative.

If f has total degree d, then $\frac{\partial f}{\partial i}$ is a polynomial of degree at most $d - 1$. In particular, if f is quadratic, then $\frac{\partial f}{\partial i}$ is an affine function, and it is easy to isolate its constant part: $c_i = \frac{\partial f}{\partial i}(e_0) = f(e_i) + f(e_0)$, where $e_0 = (0, \dots, 0)$. Then the function $X \mapsto \frac{\partial f}{\partial i}(X) + c_i$ is by definition a linear form, and can be represented by a vector $D_i \in (\mathbb{F}_2)^n$. More precisely,

$$D_i[j] = f(e_i + e_j) + f(e_i) + f(e_j) + f(e_0), \qquad (0.2)$$

i.e., $D_i[j]$ is the coefficient of $x_j$ in $\frac{\partial f}{\partial i}$ (the coefficient of the monomial $x_i x_j$ in f for $j \neq i$, while $D_i[i] = 0$). In this way, using Algorithm 0.1, we can evaluate a quadratic f at all the points needed with fewer evaluations of f than standard brute force requires (for $n \ge 4$). Indeed, the brute force attack needs $2^n$ evaluations of f, whereas here we need:

1. one evaluation for $f(X_0)$;

2. one evaluation for $f(e_0)$;

3. n evaluations for $f(e_1), \dots, f(e_n)$;

4. $\binom{n}{2}$ evaluations for $f(e_i + e_j)$, for all $1 \le i < j \le n$.

¹ The Hamming weight HW(i) of a string i is the number of symbols that differ from the zero symbol of the alphabet used. In the most typical case, a string of bits, it is the number of 1s in the string.

So we need $2 + n + \binom{n}{2}$ evaluations of f in total.

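Remark 1. $2 + n + \binom{n}{2} < 2^n$ for every integer $n \ge 4$; for $n \le 3$ the left-hand side is at least $2^n$, so the saving only appears from $n = 4$ onwards.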
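A few lines of Python make the range of Remark 1 explicit by tabulating the evaluation count above against $2^n$; for $n = 1, 2, 3$ the count is 3, 5 and 8, which is not below 2, 4 and 8. The helper name is illustrative.

```python
from math import comb


def quadratic_setup_cost(n: int) -> int:
    """Evaluations of f listed above: f(X0), f(e0), n values f(e_i), C(n, 2) values f(e_i + e_j)."""
    return 2 + n + comb(n, 2)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, quadratic_setup_cost(n), 2 ** n)
```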

Algorithm 0.1 The differential algorithm (f quadratic).

    function Init(f, k0, X0)
        i ← 0
        X ← X0
        y ← f(X0)
        for k = 1 → n do
            for j = 1 → n do
                D_k[j] ← f(e_k + e_j) ⊕ f(e_k) ⊕ f(e_j) ⊕ f(e_0)   ▷ Eq. (0.2); D_k[j] = D_j[k]
            end for
            c_k ← f(e_0) ⊕ f(e_k)
        end for
        return State = (i, X, y, D, c)
    end function

    function Next(State)
        i ← i + 1
        k ← b_1(i) + k0                  ▷ the coordinate that flips at this step
        z ← DotProduct(D_k, X) ⊕ c_k      ▷ z = ∂f/∂k (X), Eq. (0.1)
        y ← y ⊕ z                         ▷ now y = f(X ⊕ e_k)
        X ← X ⊕ e_k
    end function
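The sketch below turns Algorithm 0.1 into runnable Python under a few stated assumptions: points of $(\mathbb{F}_2)^n$ are n-bit integers, variables are indexed from 0 (so the offset $k_0$ is 0), each $D_k$ is stored as a bit mask, and all function names are hypothetical. It checks that the incrementally updated values agree with direct evaluation and counts the calls to f, which should match the $2 + n + \binom{n}{2}$ evaluations listed above.

```python
import random


def parity(x: int) -> int:
    """XOR of the bits of x, i.e. an F_2 dot product once the operands are masked."""
    return bin(x).count("1") & 1


def fast_exhaustive_search(f, n, x0=0):
    """Yield (X, f(X)) for all X in F_2^n (n-bit integers), in Gray-code order from x0.

    Sketch of Algorithm 0.1 for a quadratic f, with 0-based indices (offset k0 = 0).
    f itself is called only 2 + n + C(n, 2) times.
    """
    f0 = f(0)                                   # f(e_0), e_0 = zero vector
    fe = [f(1 << k) for k in range(n)]          # f(e_k)
    c = [f0 ^ fe[k] for k in range(n)]          # c_k = f(e_0) + f(e_k)
    D = [0] * n                                 # D_k stored as an n-bit mask, Eq. (0.2)
    for k in range(n):
        for j in range(k):                      # D_k[j] = D_j[k]: one call per pair
            bit = f((1 << k) | (1 << j)) ^ fe[k] ^ fe[j] ^ f0
            D[k] |= bit << j
            D[j] |= bit << k
    x, y = x0, f(x0)
    yield x, y
    for i in range(1, 1 << n):
        k = (i & -i).bit_length() - 1           # b_1(i): position of the lowest set bit
        y ^= parity(D[k] & x) ^ c[k]            # y <- y + (df/dk)(X), Eq. (0.1)
        x ^= 1 << k                             # X <- X + e_k
        yield x, y


def random_quadratic(n, seed=0):
    """Hypothetical test input: a random Boolean polynomial of degree <= 2 in n variables."""
    rng = random.Random(seed)
    quad = [rng.getrandbits(n) for _ in range(n)]   # bit j of quad[k]: coefficient of x_k x_j
    lin, const = rng.getrandbits(n), rng.getrandbits(1)

    def f(x):
        value = const ^ parity(lin & x)
        for k in range(n):
            if (x >> k) & 1:
                value ^= parity(quad[k] & x)
        return value

    return f


if __name__ == "__main__":
    n = 10
    g = random_quadratic(n, seed=1)
    calls = []

    def counted(x):
        calls.append(x)
        return g(x)

    results = list(fast_exhaustive_search(counted, n))
    assert all(y == g(x) for x, y in results)   # agrees with direct evaluation
    print(len(results), len(calls))             # 1024 57
```

Each step costs one masked parity and two XORs instead of a full evaluation of f, which is exactly the multiplicative coefficient that the technique tries to reduce.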

Algorithm 0.1 can be generalized to handle polynomials of any degree.

For simplicity, let $j_i = b_1(i) + k_0$ and $a_i = e_{j_i}$ for every i. The enumeration of the points for the intelligent brute force is then

$$X_i = X_{i-1} \oplus e_{b_1(i) + k_0} = X_{i-1} \oplus a_i, \qquad i = 1, 2, \dots, 2^n - 1,$$

where $a_i$ is the unit increment vector of $X_i$. Therefore $f(X_i)$ becomes

$$f(X_i) = f(X_{i-1} \oplus a_i) = f(X_{i-1}) \oplus \Delta_{a_i} f(X_{i-1}).$$

More generally, for n successive updates $a_1, \dots, a_n$ we can prove the following result.

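Proposition 5.

$$f(X + a_1 + a_2 + \dots + a_n) = \sum_{k=0}^{n} \sum_{1 \le h_1 < \dots < h_k \le n} \Delta^{(k)}_{a_{h_1}, \dots, a_{h_k}} f(X), \qquad (0.3)$$

where $\Delta^{(k)}_{b_1, \dots, b_k} = \Delta_{b_1} \cdots \Delta_{b_k}$ is the derivative of order k (with $\Delta^{(0)} f = f$), and $\Delta_a f(X)$ denotes the standard derivative:

$$\Delta_a f(X) = f(X + a) - f(X). \qquad (0.4)$$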
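Identity (0.3) can be checked numerically. The sketch below restricts itself to $\mathbb{F}_2$, where the minus sign in (0.4) coincides with XOR, represents vectors of $(\mathbb{F}_2)^n$ as n-bit integers, and compares both sides of (0.3) for random Boolean functions, points and directions. It is a sanity check rather than a proof, and all helper names are illustrative.

```python
import random
from functools import reduce
from itertools import combinations
from operator import xor


def derivative(f, a):
    """Delta_a f : X -> f(X + a) - f(X); over F_2 both + and - are XOR."""
    return lambda x: f(x ^ a) ^ f(x)


def iterated(f, directions):
    """Delta^{(k)}_{b_1,...,b_k} f, differentiating once per direction (Delta^{(0)} f = f)."""
    for b in directions:
        f = derivative(f, b)
    return f


def rhs_of_0_3(f, x, a):
    """Right-hand side of (0.3): XOR over all index sets h_1 < ... < h_k."""
    total = 0
    for k in range(len(a) + 1):
        for hs in combinations(range(len(a)), k):
            total ^= iterated(f, [a[h] for h in hs])(x)
    return total


def check_0_3(n_bits=6, n_dirs=4, trials=20, seed=0):
    """Compare both sides of (0.3) for random f: F_2^n -> F_2, points X and directions a_h."""
    rng = random.Random(seed)
    for _ in range(trials):
        table = [rng.getrandbits(1) for _ in range(1 << n_bits)]
        f = table.__getitem__                  # f given by its truth table
        a = [rng.getrandbits(n_bits) for _ in range(n_dirs)]
        x = rng.getrandbits(n_bits)
        if f(x ^ reduce(xor, a, 0)) != rhs_of_0_3(f, x, a):
            return False
    return True


if __name__ == "__main__":
    print(check_0_3())  # True
```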

Proposition 5 is very useful, since for a polynomial of degree d the new algorithm allocates $\sum_{i=0}^{d-1} \binom{n}{i}$ bits of internal state and $\binom{n}{d}$ bits of constants.
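The two memory figures can be tabulated directly. The sketch below (the helper name is hypothetical) computes them for a few sizes and, in its test, checks numerically the inequality of Remark 2 for small n.

```python
from math import comb


def fes_memory_bits(n: int, d: int) -> tuple:
    """(internal-state bits, constant bits) for a degree-d polynomial in n variables."""
    state = sum(comb(n, i) for i in range(d))   # sum_{i=0}^{d-1} C(n, i)
    constants = comb(n, d)                      # C(n, d)
    return state, constants


if __name__ == "__main__":
    for n, d in [(32, 2), (32, 3), (64, 4)]:
        state, constants = fes_memory_bits(n, d)
        print(f"n={n} d={d}: {state} state bits, {constants} constant bits, 2^n={2 ** n}")
```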

Remark 2. For $1 \le d \le n$,

$$\sum_{i=0}^{d-1} \binom{n}{i} < 2^n.$$

This follows from Newton's binomial theorem,

$$(a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^i b^{n-i}.$$

When $a = b = 1$ we obtain

$$2^n = \sum_{i=0}^{n} \binom{n}{i},$$

and thus, since the omitted terms with $d \le i \le n$ are positive,

$$\sum_{i=0}^{d-1} \binom{n}{i} < 2^n = \sum_{i=0}^{n} \binom{n}{i}.$$

Therefore, for a function of degree d we again need fewer evaluations of f than brute force, which requires $2^n$ evaluations, as long as $d \le n$. Applied to cubes in the preprocessing phase of the Cube Attack, this technique reduces the number of evaluations of the polynomial derived from the cryptosystem. The applicability of this version of the Cube Attack still has to be investigated from the implementation point of view. An implementation of the Cube Attack that combines the higher-order description with this intelligent way of evaluating could be very useful against the recent ZUC cipher, currently proposed for inclusion in the emerging “4G” mobile communications standard LTE, developed by the 3rd Generation Partnership Project (3GPP). The ZUC proposal consists of:

• an encryption algorithm named 128-EEA3, and

• an integrity algorithm named 128-EIA3.

The algorithms ZUC, 128-EEA3 and 128-EIA3 were designed by the Data Assurance and Communication Security Research Center (DACAS) of the Chinese Academy of Sciences. They have passed the evaluation of an ETSI SAGE² task force and of two funded teams of academic experts. The basic building block of both 128-EEA3 and 128-EIA3 is the stream cipher ZUC, which is composed of three components with an internal state of 560 bits, initialized from a 128-bit cipher key K and a 128-bit initialization vector iv, and which outputs a keystream of 32-bit words. The execution of ZUC has two stages: the key initialization stage and the working stage. In the first stage, the key initialization of the LFSR (Linear Feedback Shift Register) is performed. In the second, the working stage, the LFSR does not receive any input, and every clock tick produces a 32-bit word of keystream. In the specification, the algorithm is divided into three logical layers:

• a 16-stage linear feedback shift register (LFSR) as the first layer;

• a bit reorganization (BR) as the middle layer;

• a nonlinear function F as the bottom layer.

The structure of ZUC is illustrated in Figure 0.3.

The LFSR has 16 registers $(s_0, s_1, \dots, s_{15})$ of 31 bits each. Each register takes values in $\{1, 2, \dots, 2^{31} - 1\}$ ($2^{31} - 1$ is a prime). In the key loading procedure, the 128-bit initial key k and the 128-bit initial vector iv are each divided into 16 bytes, $k = k_0 \| k_1 \| \dots \| k_{15}$ and $iv = iv_0 \| iv_1 \| \dots \| iv_{15}$, which are then loaded into the registers of the LFSR as

$$s_i = k_i \| d_i \| iv_i, \qquad 0 \le i \le 15,$$

where each $d_i$ is a 15-bit constant. The security margin of ZUC appears to be high and its design rationale clear; the ETSI Security Algorithms Group of Experts (SAGE) task force has no objection to 128-EEA3 and 128-EIA3 being included in the standards.

² The European Telecommunications Standards Institute (ETSI) is an independent, non-profit standardization organization for the telecommunications industry (equipment makers and network operators) in Europe, with worldwide projection. ETSI has successfully standardized the GSM cell phone system, the TETRA professional mobile radio system, and Short Range Device requirements, including LPD radio.

Fig. 0.3. ZUC keystream generator diagram. [Figure: LFSR modulo $2^{31} - 1$ with registers $s_{15}, \dots, s_0$ and feedback coefficients $2^{15}, 2^{17}, 2^{21}, 2^{20}, 1 + 2^8$; bit reorganization (BR) producing $X_0, X_1, X_2, X_3$ from 16:16 bit splits; FSM with registers R1, R2, S-box layer S, linear transforms L1, L2 and a 16-bit rotation; outputs W and Z.]
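The feedback coefficients shown in Fig. 0.3 correspond, according to the ZUC specification (not reproduced in this synthesis), to the working-mode update $s_{16} = 2^{15} s_{15} + 2^{17} s_{13} + 2^{21} s_{10} + 2^{20} s_4 + (1 + 2^8) s_0 \bmod (2^{31} - 1)$, with the value 0 represented by $2^{31} - 1$. Because $2^{31} - 1$ is a Mersenne prime, multiplying by $2^k$ modulo it is just a 31-bit rotation. The sketch below is an unofficial illustration: it covers only register loading and one feedback step, leaves out the bit reorganization and F, takes the constants $d_i$ as a parameter instead of listing them, and cross-checks the rotation trick against plain modular arithmetic.

```python
import random

P = (1 << 31) - 1   # the Mersenne prime 2^31 - 1


def mul_pow2(x: int, k: int) -> int:
    """x * 2^k mod (2^31 - 1) for 31-bit x: a 31-bit left rotation by k (0 < k < 31)."""
    return ((x << k) | (x >> (31 - k))) & P


def add_mod(a: int, b: int) -> int:
    """a + b mod (2^31 - 1) with an end-around carry (result in 1..P for nonzero input)."""
    c = a + b
    return (c & P) + (c >> 31)


def load_register(k_byte: int, d: int, iv_byte: int) -> int:
    """s_i = k_i || d_i || iv_i: 8 + 15 + 8 = 31 bits (d is the 15-bit constant d_i)."""
    return (k_byte << 23) | (d << 8) | iv_byte


def lfsr_feedback(s) -> int:
    """Working-mode feedback 2^15 s15 + 2^17 s13 + 2^21 s10 + 2^20 s4 + (1 + 2^8) s0."""
    v = add_mod(s[0], mul_pow2(s[0], 8))
    v = add_mod(v, mul_pow2(s[4], 20))
    v = add_mod(v, mul_pow2(s[10], 21))
    v = add_mod(v, mul_pow2(s[13], 17))
    v = add_mod(v, mul_pow2(s[15], 15))
    return P if v == 0 else v        # 0 is represented by 2^31 - 1


def lfsr_feedback_reference(s) -> int:
    """The same feedback with plain modular arithmetic, for cross-checking."""
    v = (2**15 * s[15] + 2**17 * s[13] + 2**21 * s[10]
         + 2**20 * s[4] + (1 + 2**8) * s[0]) % P
    return P if v == 0 else v


if __name__ == "__main__":
    rng = random.Random(1)
    state = [rng.randint(1, P) for _ in range(16)]
    print(lfsr_feedback(state) == lfsr_feedback_reference(state))  # True
```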



References

[BCC+10] Charles Bouillaguet, Hsieh-Chung Chen, Chen-Mou Cheng, Tung Chou, Ruben Niederhagen, Adi Shamir, and Bo-Yin Yang. Fast exhaustive search for polynomial systems in $\mathbb{F}_2$. In CHES, pages 203–218, 2010.

[Bou11] Charles Bouillaguet. Études d’hypothèses algorithmiques et attaques de primitives cryptographiques. PhD thesis, Université Paris Diderot, 2011.

[DK07] Hans Delfs and Helmut Knebl. Introduction to Cryptography: Principles and Applications. Information Security and Cryptography. Springer, Berlin, second edition, 2007.

[DS09] Itai Dinur and Adi Shamir. Cube attacks on tweakable black box polynomials. In Advances in Cryptology: EUROCRYPT 2009, volume 5479 of Lecture Notes in Comput. Sci., pages 278–299. Springer, Berlin, 2009.

[DS10] Itai Dinur and Adi Shamir. Breaking Grain-128 with dynamic cube attacks. Cryptology ePrint Archive, Report 2010/570, 2010. http://eprint.iacr.org/.

[Jou09] Antoine Joux. Algorithmic Cryptanalysis. Chapman & Hall/CRC Cryptography and Network Security. CRC Press, Boca Raton, FL, 2009.

[Sti06] Douglas R. Stinson. Cryptography: Theory and Practice. Discrete Mathematics and its Applications (Boca Raton). Chapman & Hall/CRC, Boca Raton, FL, third edition, 2006.