
Analysis of ARX Functions: Pseudo-linear Methods for Approximation, Differentials, and Evaluating Diffusion

Kerry A. McKay⋆ and Poorvi L. Vora⋆⋆

The George Washington University, Washington DC 20052, USA [email protected] and [email protected]

Abstract. This paper explores the approximation of addition mod 2^n by addition mod 2^w, where 1 ≤ w ≤ n, in ARX functions that use large words (e.g., 32-bit words or 64-bit words). Three main areas are explored. First, pseudo-linear approximations aim to approximate the bits of a w-bit window of the state after some rounds. Second, the methods used in these approximations are also used to construct truncated differentials. Third, branch number metrics for diffusion are examined for ARX functions with large words, and variants of the differential and linear branch number characteristics based on pseudo-linear methods are introduced. These variants are called effective differential branch number and effective linear branch number, respectively. Applications of these approximation, differential, and diffusion evaluation techniques are demonstrated on Threefish-256 and Threefish-512.

1 Introduction

Modern block ciphers are expected to be resilient to linear cryptanalysis [18], which relies on the existence of linear distinguishers: probabilistic linear relationships over Z2 among plaintext, ciphertext, and key bits. In order to protect against effective linear cryptanalysis, some block ciphers have employed a combination of addition modulo 2^n, rotation and exclusive-or (ARX), preventing the existence of linear relations over Z2 among single bits. In this paper, we present pseudo-linear cryptanalysis, which approximates strings of bits using addition in underlying structures other than Z2 and is hence a more effective analytical tool for ARX designs. We illustrate the use of pseudo-linear analysis by applying it to the Threefish block cipher of the Skein design [9], one of five finalists in the SHA-3 competition. The paper also presents metrics for the evaluation of diffusion in ARX designs, to predict vulnerability to pseudo-linear cryptanalysis. In keeping with the tradition of linear cryptanalysis and the terminology common in the cryptanalytic community, in this paper, the words “linear” and “linearity” will refer to linearity over Z2.

ARX designs are simple, efficient and easy to implement. Desktop computers easily support 32-bit words, and many newer architectures support 64-bit words, all of which leads to cheap and efficient processing. In addition, the use of addition modulo 2^n reduces the memory footprint that may otherwise be used for substitution box table lookups. Computing the non-linear function in memory is also a means of defense against memory inspection techniques that look for certain structures in memory, such as lookup tables, to identify the use of cryptography. ARX-based block ciphers can also be used to design secure hash functions; in fact, two of the five finalists in the SHA-3 competition were ARX-based. Distinguishers of the underlying block cipher could be used in hash function analysis to discover preimages and collisions. Given their flexibility and efficiency, it is expected that ARX designs will continue to be popular. However, the ARX approach has not been analyzed extensively, and any general framework for analyzing ARX functions will have impact on future cipher designs.

This paper presents an approach to analyzing and measuring the security of ARX-based block ciphers. Its focus is on the analytical approach, in contrast to the “breaking” of a single cipher.

⋆ Work supported in part by the National Science Foundation Scholarship for Service Program, grant DUE-0621334, and National Science Foundation grant CCF 0830576
⋆⋆ Work supported in part by National Science Foundation grant CCF 0830576

The key idea in the approximation is to examine a window (grouping of contiguous bits) of size w, 1 ≤ w ≤ n. First, addition modulo 2^n is approximated within the window by addition modulo 2^w. Second, this addition gives a perfect approximation if the carry into the window is estimated correctly. The probability of correctness of the approximations depends exclusively on the probability distribution of the carry, and this probability is independent of w; in fact, for uniformly distributed addends it is 1/2 + 2^{-i-1}, where i ≥ 0 is the position of the least significant bit in the window. Third, the probability of correctness for a random guess of the value of the window decreases exponentially with w; it is 1/2^w. Hence, the bias of the approximation (the difference between the probability of correctness of our approximation and that of a random guess) increases with w.

When w = 1, pseudo-linear cryptanalysis provides a correctness probability exactly that of linear cryptanalysis with a bias of 2^{-i-1}, which is small for large i (larger values of i imply the bit being approximated is more significant). It is hence correct to assume that ARX ciphers are not very vulnerable to traditional linear cryptanalysis. On the other hand, pseudo-linear cryptanalysis for large w and small i results in a correctness probability greater than 1/2 for a string of w bits, providing considerable advantage over guessing at random.

This paper presents pseudo-linear cryptanalysis and simple results about the approximations in terms of w. We show how approximations can be combined over a number of rounds. Because the estimate of the carries significantly affects the output, one may compensate for mispredicted carries by adding an offset to the approximated string and looking at an interval around the string. This paper also provides extensions of the branch number diffusion metric used in the design strategy of the Advanced Encryption Standard (AES) to better analyze ARX functions that use 32- and 64-bit word sizes. The application of pseudo-linear analysis and extended diffusion metrics is demonstrated on Threefish-512, the ARX block cipher found in the main submission of SHA-3 finalist Skein [9]. It is worth noting that pseudo-linear cryptanalysis presumes one of the weakest adversaries in the literature: the adversary has access to plaintext/ciphertext pairs, but has no power over the selection of plaintext, ciphertext, or keys, and does not assume any relationships between samples.

This paper is organized as follows. Section 2 presents related work. Section 3 presents pseudo-linear cryptanalysis and section 4 demonstrates links with differential cryptanalysis. Section 5 presents diffusion metrics and section 6 applies all results to the Threefish cipher of the Skein hash function entry to the SHA-3 competition. The conclusion is presented in section 7. Some of this work has been presented at the second SHA-3 conference [19] and a pre-reviewed version of that paper is available as a technical report [20]. In addition to the material in that paper, this paper also presents pseudo-linear results on Threefish-512, links between pseudo-linear and differential cryptanalysis with results on Threefish-512, and an exploration of how the branch number metric applies to ARX ciphers that perform operations on large words.

Notation: Throughout this paper, the following notation is used.

⊞      addition modulo 2^n, where n is the word size in bits
⊞_w    addition modulo 2^w, where 1 ≤ w ≤ n
⊕      exclusive-or
≪ r    r-bit left word rotation
≫ r    r-bit right word rotation
x_i^j  word i of state x at round j
x_i    the bit at position 0 ≤ i < n of word x

2 Related Work

Linear cryptanalysis consists of approximating a function using linear expressions. This method was created to attack DES, and as such, was designed for approximating functions using exclusive-or, or addition modulo 2. An approximation of the relationship among plaintext, ciphertext and key bits would be of the form p_1 ⊕ p_2 ⊕ ... ⊕ p_n ⊕ c_1 ⊕ ... ⊕ c_m = k_1 ⊕ ... ⊕ k_s, where each p_i is a bit of plaintext, each c_i is a bit of ciphertext, and each k_i is a bit of key. Results can be improved using multiple linear approximations to perform linear cryptanalysis [11]. In recent years, the concept of linear cryptanalysis has been generalized to encompass non-binary ciphers [2].

The use of addition mod 2^n in cryptographic algorithms has inspired several researchers to study the linear properties involved in integer addition. The carry function has been shown to be biased, based on position [29]. In addition to position, addition also has bias properties based on the number of inputs [23]. The probability distribution of the carry function in integer addition with an arbitrary number of inputs has been investigated [24]. The carry distribution of two neighboring bits during addition modulo 2^32 has also been studied [5]. Nyberg and Wallén have made considerable contributions in linear approximations of addition and understanding the carry function [23][26][27][28][22][21].

Differential cryptanalysis exploits relations between pairs of inputs, and often requires the adversary to have power over choosing the plaintext or choosing the ciphertext. That is, the adversary chooses two messages, M and M′, that are related by some known value, ∆, such that M′ = M ⊕ ∆. A generalization of a differential characteristic called a truncated differential characteristic [14] requires a difference to be satisfied in a subset of state bits, rather than the entire state. Higher order differentials [14] do not restrict differences to linear relationships. Higher order differential cryptanalysis extends analysis beyond the first order derivative [15]. More complex differential attacks, such as boomerang attacks, have been described [25]. Related-key boomerang attacks have been applied to Threefish, the block cipher in Skein’s [8] compression function [4][1]. Impossible differential attacks use knowledge of differences that can never happen to learn secret information. Rotational cryptanalysis [13] is a recent differential technique based on two properties that always hold, independent of the cipher:

Pr[(X ⊕ Y) ≫ r = (X ≫ r) ⊕ (Y ≫ r)] = 1
Pr[(X ⊞ Y) ≫ r = (X ≫ r) ⊞ (Y ≫ r)] = (1/4)(1 + 2^{r−n} + 2^{−r} + 2^{−n})

where X ≫ r is a right rotation of the n-bit word X by r positions and ⊞ is addition modulo 2^n. This attack requires a very strong adversary who has the ability to manipulate key and data pairs. As such, the meaning and applicability of this type of attack is not clear. If a block cipher is part of a hash function, where keys are derived from some portion of the state, this could be a valid attack. As an attack on a standalone block cipher, however, the model may not be as valid. A relationship between linear and differential cryptanalysis has been shown [3]. Linear and differential methods have been combined into a single attack, called differential-linear cryptanalysis [16].

The attack of [12] is a different sort of partitioning attack that exploits properties in ciphers based on addition and rotation. One key observation is that addition modulo 2^n, where n is the word length in bits, can be approximated by a smaller word size. A mod 3 attack was shown for RC5P, where the distribution of mod 3 values was distinguishable from a uniform distribution using a χ² test.

Branch number [6][7] is a metric that gives a lower bound on the diffusion of a transformation. It is part of the wide-trail strategy that was used to design the Rijndael block cipher, the winner of the Advanced Encryption Standard (AES) competition and now a widely adopted standard. The metric has been defined for linear and differential trails over any partition [6], as well as a specific type of partition used in AES [7].

3 Pseudo-Linear Cryptanalysis

Pseudo-linear cryptanalysis aims to overcome limitations of traditional linear cryptanalysis by approximating addition modulo 2^n with addition modulo 2^w, 1 ≤ w ≤ n. This differs from linearization (used by linear cryptanalysis), which approximates the addition with exclusive-or, i.e., addition modulo 2. More specifically, pseudo-linear approximations use addition modulo 2^w and exclusive-or over w-bit strings of contiguous bits (windows). Let part(x, s, e) represent bits of word x in the range [s, e), where 0 ≤ s < n (a window may wrap around the most significant end of the word).

Consider the scenario presented in figure 1, where the adversary desires to approximate the middle byte of word z = 0x68ACF0 in terms of addends x = 0x012345 and y = 0x6789AB. The approximation part(z, s, e) = part(x, s, e) ⊞_w part(y, s, e) is true because the carry into the window is predicted correctly. However, if x = 0x0123FF, the approximation would not be correct because of the carry into the middle byte.

  0x012345            0x002300
⊞ 0x6789AB     →    ⊞ 0x008900
  0x68ACF0            0x00AC00

Fig. 1. Approximation of a window of z = x ⊞ y, where x = 0x012345 and y = 0x6789AB.
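To make the window notation concrete, the following minimal Python sketch (ours, not the authors' code) implements part(x, s, e) for non-wrapping windows and reproduces the figure 1 example, including the failing case x = 0x0123FF:

def part(x, s, e):
    """Return bits [s, e) of word x as an integer (non-wrapping window)."""
    return (x >> s) & ((1 << (e - s)) - 1)

x, y = 0x012345, 0x6789AB
z = (x + y) & 0xFFFFFF                      # addition modulo 2^24
w = 8                                       # window [8, 16): the middle byte
approx = (part(x, 8, 16) + part(y, 8, 16)) % (1 << w)
print(hex(part(z, 8, 16)), hex(approx))     # 0xac 0xac: carry into the window is 0, as guessed

x2 = 0x0123FF                               # now the low byte generates a carry into the window
z2 = (x2 + y) & 0xFFFFFF
approx2 = (part(x2, 8, 16) + part(y, 8, 16)) % (1 << w)
print(hex(part(z2, 8, 16)), hex(approx2))   # 0xad 0xac: the approximation fails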

3.1 Addition Windows: Simple Analytical Results

Properties of addition, such as position-based bias, can be leveraged to increase the accuracy of carry prediction. A simple case to analyze is the addition of n-bit words that are sampled from a uniformly random distribution. Consider two n-bit words, x and y, selected uniformly at random, and a window size w with 1 ≤ w ≤ n.

Lemma 1. Let 0 < s ≤ n − w. Then Pr[part(x ⊞ y, s, s + w) = part(x, s, s + w) ⊞_w part(y, s, s + w)] > 1/2.

Proof. If there is no carry into bit s, then the equation is true. If there is a carry into bit s, then the equation is false. For the addition of two s-bit integers, there are 2^{2s−1} + 2^{s−1} of 2^{2s} operand combinations that will not produce a carry in the output. This is because for each value of x ∈ {0, ..., 2^s − 1}, there are 2^s − x possible values of y ∈ {0, ..., 2^s − 1} such that x + y does not produce a carry, giving Σ_{i=1}^{2^s} i = 2^{2s−1} + 2^{s−1} possible addend combinations that do not produce a carry. Therefore the probability that the carry into bit s is 0 is 1/2 + 2^{−s−1}, which agrees with the bias presented in [29].

Lemma 2. Pr[part(x ⊞ y, 0, w) = part(x, 0, w) ⊞_w part(y, 0, w)] = 1.

Proof. Because this is the least significant window, there are no carry bits into the sum. Therefore the approximation is always correct.

Corollary 1. Let 0 ≤ s < n. Then Pr[part(x ⊞ y, s, (s + w) mod n) = part(x, s, (s + w) mod n) ⊞_w part(y, s, (s + w) mod n)] > 1/2.

Proof. The main difference in this case is that the window may wrap around from the higher end of the n-bit word to the lower end. If the window does not wrap around, this is equivalent to Lemma 1. If it does, then there is one carry that is not propagated; namely the carry out of the n-bit sum. Split the window in two: part(x, s, n) ⊞_w part(y, s, n) and part(x, 0, (s + w) mod n) ⊞_w part(y, 0, (s + w) mod n). The first sum is true by Lemma 1 and the second is true by Lemma 2.

Corollary 2. Pr[part(x ⊟ y, s, (s + w) mod n) = part(x, s, (s + w) mod n) ⊟_w part(y, s, (s + w) mod n)] > 1/2.

Proof. Let x ⊟ y = z. Then y ⊞ z = x, and part(y ⊞ z, s, (s + w) mod n) = part(y, s, (s + w) mod n) ⊞_w part(z, s, (s + w) mod n) with probability greater than 1/2 by Corollary 1. Then we have part(x, s, (s + w) mod n) = part(y, s, (s + w) mod n) ⊞_w part(z, s, (s + w) mod n), which is the same as part(x, s, (s + w) mod n) ⊟_w part(y, s, (s + w) mod n) = part(z, s, (s + w) mod n), with probability greater than 1/2.

Lemma 3. Pr[part(x ⊕ y, s, (s + w) mod n) = part(x, s, (s + w) mod n) ⊕ part(y, s, (s + w) mod n)] = 1.

Proof. This follows directly from the bitwise nature of exclusive-or.
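The probability in Lemma 1 and Corollary 1 is easy to check empirically. The following Monte Carlo sketch (an illustration under the uniform-addend assumption, not the authors' experiment) estimates the success rate of the window approximation for n = 64, w = 8 and s = 3 and compares it with 1/2 + 2^{-s-1}:

import random

def part(x, s, e):
    return (x >> s) & ((1 << (e - s)) - 1)

n, w, s, trials = 64, 8, 3, 200_000
mask_n, mask_w = (1 << n) - 1, (1 << w) - 1
hits = 0
for _ in range(trials):
    x = random.getrandbits(n)
    y = random.getrandbits(n)
    z = (x + y) & mask_n
    approx = (part(x, s, s + w) + part(y, s, s + w)) & mask_w
    hits += approx == part(z, s, s + w)

print(hits / trials)             # observed success rate of the window approximation
print(0.5 + 2.0 ** (-s - 1))     # predicted 1/2 + 2^(-s-1) = 0.5625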

Using these equations, one can easily approximate windows derived from a single addition. However, this is not a realistic scenario for approximations over an iterated cipher. After the first approximated addition, the input distribution of all subsequent additions changes. In particular, the distribution is dependent on the input distribution of the operand bits preceding the target window.

To see how the distribution may change, consider the equation z = x ⊞ y. Suppose we are interested in approximating the window of z starting with the third least significant bit, and assume that x and y are selected at random from the uniform distribution. Since the carry into the addition window depends on the preceding bits, the distribution of all bits preceding the target window is important. These bits are also considered unknown if they are not a part of the approximation. Suppose there were no carry into the addition window. Ten of the sixteen possible input combinations produce this result. The possible values for (z_0, z_1) are found in table 1, along with the probability of occurrence and the operands that produce each of the values. The distribution is clearly no longer uniform, and is biased towards larger values.

Table 1. Possible values for bits of x, y, and z preceding the target window z(2) - no carry into addition window

z_1 z_0 | (x_1 x_0, y_1 y_0)                     | Probability
 0   0  | (00, 00)                               | 1/10
 0   1  | (00, 01), (01, 00)                     | 2/10
 1   0  | (00, 10), (01, 01), (10, 00)           | 3/10
 1   1  | (00, 11), (01, 10), (10, 01), (11, 00) | 4/10

Table 2. Possible values for bits of x, y, and z preceding the target window z(2) - carry into addition window

z_1 z_0 | (x_1 x_0, y_1 y_0)                     | Probability
 0   0  | (01, 11), (10, 10), (11, 01)           | 1/2
 0   1  | (10, 11), (11, 10)                     | 1/3
 1   0  | (11, 11)                               | 1/6

Now suppose that there was a carry into the addition window. Six operand combinations produce this result. The possible values for (z_0, z_1) are found in table 2, along with the probability of occurrence and operands that produce those values. Again, z_0 and z_1 do not belong to the uniform distribution. Unlike the previous case, the values are now biased towards smaller values.

This can be understood more generally as follows. If integer addends X and Y, both in [0, n − 1], are added, the distribution of the result is the discrete convolution of the distributions of X and Y, and it takes on values in [0, 2n − 2]. Further, if the distributions on X and Y are both uniform, then the probability of the sum increases linearly from its smallest value of 1/n^2 at the value 0 to a maximum at n − 1 and then decreases linearly to 1/n^2 at the maximum value of the sum, 2n − 2. This may be understood by noting that many combinations of values x and y of the addends X and Y will give a sum of n − 1, while only one pair of x and y gives each of the values 0 and 2n − 2. Thus, if one looks at the variable z, the remainder of the sum modulo n, the value n − 1 has the largest probability, and it only occurs when the carry is zero, i.e., when the sum is less than n.

Fig. 2. Carries and addend distribution
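Tables 1 and 2 can be reproduced by enumerating the sixteen combinations of the two low-order bits of x and y and conditioning on the carry out of those bits; a short sketch (ours):

from collections import defaultdict

no_carry = defaultdict(list)    # combinations with no carry out of bits 0-1
carry = defaultdict(list)       # combinations with a carry out of bits 0-1

for a in range(4):              # a = (x1 x0)
    for b in range(4):          # b = (y1 y0)
        total = a + b
        z_low = total & 0b11    # (z1 z0)
        (carry if total >= 4 else no_carry)[z_low].append((a, b))

for name, table in (("no carry into window (table 1)", no_carry),
                    ("carry into window (table 2)", carry)):
    count = sum(len(v) for v in table.values())
    print(name)
    for z_low in sorted(table):
        print(f"  z1z0 = {z_low:02b}   probability {len(table[z_low])}/{count}")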

3.2 Creating Approximations

Three aspects of the pseudo-linear approach are important.

Base Approximation The simplest approximation is formed by deriving the target window in terms of all the windows it depends on. All exclusive-or operations are preserved. Addition modulo 2^n operations are preserved as well, but the addends are incomplete. After each addition, bits outside the active windows are cleared (set to zero). For each unknown carry into a window, no carry is assumed.

Carry Patterns A carry pattern is a series of carry values, c_i ∈ {0, 1}, where each i denotes an approximated addition window that may have a carry into it. Stated differently, there is a c_i for every approximated addition window that does not start with the least significant bit of the word. One can construct multiple carry patterns that overlay the base approximation such that 1 is added to approximated addition window i if c_i = 1. Let C^j = (c_0 ... c_{m−1}) represent the carry pattern for m approximated addition windows. Then each base approximation overlaid with a distinct carry pattern, base + C^j, represents a distinct approximation. By applying several carry patterns to the same input/output pairs and obtaining several likely approximations, one can increase the probability of hitting upon the correct one, which in turn increases the bias. The best carry pattern for a particular set of inputs depends on the input distribution. For example, the carry pattern for low Hamming-weight inputs will have fewer carries than for high Hamming-weight inputs. Note that this knowledge could potentially be used to find the Hamming weight of keys.
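The following sketch illustrates carry patterns on a toy target, a window of z = (x ⊞ y) ⊞ u taken above the least significant bit; it is our simplified illustration, not a Threefish approximation. Each of the four patterns guesses the carries into the two addition windows, and every pattern is correct noticeably more often than the random-guess rate of 1/2^w:

import random

n, w, s = 64, 8, 5
mask_n, mask_w = (1 << n) - 1, (1 << w) - 1
win = lambda v: (v >> s) & mask_w            # the w-bit window starting at bit s

def approximate(x, y, u, pattern):
    # base approximation of the window of z = (x + y) + u, overlaid with a carry pattern:
    # pattern[i] is the guessed carry into approximated addition window i
    t_hat = (win(x) + win(y) + pattern[0]) & mask_w
    return (t_hat + win(u) + pattern[1]) & mask_w

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
hits = {p: 0 for p in patterns}
trials = 100_000
for _ in range(trials):
    x, y, u = (random.getrandbits(n) for _ in range(3))
    z = (((x + y) & mask_n) + u) & mask_n
    for p in patterns:
        hits[p] += approximate(x, y, u, p) == win(z)

for p in patterns:
    print(p, hits[p] / trials)               # each pattern beats the random-guess rate 1/2^w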

Offsets Another method of compensating for mispredicted carries is through the use of offsets. Recall that the central limit theorem says that the probability distribution function of the sum of many random variables tends towards the Gaussian function. We hence expect that the final values will be Gaussian distributed about a mean, and that, in fact, strings close to the mean will be more likely to be correct than those further away. While the expressions get complicated too quickly for a theoretical analysis, we see this is true in simulations, and it justifies the use of an integer offset ∈ {0, ..., 2^{w−1} − 1} which is added to and subtracted from an approximated window value, approx. If every valid value of offset is tried and the number of observed correct guesses is plotted, the result is a bell-shaped curve with a peak at offset = 0, like the one shown in figure 3. The horizontal line shows the probability of guessing correctly at random (in this case, 1/256).


Fig. 3. An example: observed probability of correctly guessing window of ARX target over several offsets. Plot created with window isolation (section 3.4), the base approximation presented in section 6.1, and w = 8. Further details in section 6.

Computing Bias The improvements outlined above increase the probability of the approximation being correct, but also increase the probability of guessing correctly at random. That is, a random guess with cp different carry patterns will be correct with probability cp/2^w instead of 1/2^w, since each pattern represents a different approximation. Therefore, we need to adjust the bias calculation when using multiple carry patterns and/or offsets. Let T be the number of times the guess is correct, and P be the number of plaintext-ciphertext pairs. Let cp be the number of patterns used, and let f be the offset. Then the bias is calculated according to equation 1.

bias = T/P − cp(2f + 1)/2^w    (1)
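Equation (1) translates directly into a small helper function; the names below are ours, and the example numbers are arbitrary:

def observed_bias(T, P, cp=1, f=0, w=8):
    """Bias of a pseudo-linear approximation, following equation (1).

    T  - number of correct guesses
    P  - number of plaintext/ciphertext pairs
    cp - number of carry patterns tried
    f  - offset, so that 2f + 1 window values are accepted
    w  - window size in bits
    """
    return T / P - cp * (2 * f + 1) / 2 ** w

# e.g. 60,000 hits over 1,000,000 pairs, one carry pattern, no offset, w = 8:
print(observed_bias(60_000, 1_000_000))      # about 0.0561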

3.3 Comparison to Linear Cryptanalysis

The main advantage of pseudo-linear cryptanalysis when compared with linear cryptanalysis is that it allows the adversary to create approximations that have higher bias than those that could be achieved by only considering approximations in the boolean vector space. Another significant advantage is that pseudo-linear analysis leads to flexible approximations. Once a base approximation has been established, it can easily be modified through the use of offsets and carry patterns to accommodate different scenarios. For example, a change in the input distribution does not require a change to the base approximation, only a change to the carry pattern and/or offsets. It is also trivial to change the window size, since only the position of the least significant bit of the window is important. The most significant bit can be computed dynamically based on the desired w.

There are disadvantages that come with this method as well. The most notable is that approximations do not use a function of the key bits (such as the parity of some bits in linear cryptanalysis) but use the key bits themselves. As a result, pseudo-linear cryptanalysis has a higher attack complexity than linear cryptanalysis. This is because the two operations, exclusive-or and addition modulo 2^w, do not distribute over one another. Because of this, one cannot combine all key bits in an approximation into a single function of the key. With linear cryptanalysis, on the other hand, the use of a single operation, exclusive-or, allows all key bits in an approximation to be combined into a single function of key bits. Pseudo-linear approximation requires the adversary to try all key bits used in the approximation. Even so, one can obtain attacks more efficient than brute force because pseudo-linear approximations enable us to reduce the number of key bits from those required by the cipher. One can approximate one of exclusive-or and addition modulo 2^w using the other, but this only works reasonably well when w is small, which is precisely when pseudo-linear approximations

provide the weakest assistance. The larger w is, the less accurate approximating addition modulo 2^w with exclusive-or (or vice versa) becomes.

3.4 Implementation Methods

There are two implementations of pseudo-linear approximations over ARX functions that are demonstrated here.

1. Each window stored in its own word (window isolation)
2. All windows for a word stored in the same word (word isolation)

The first is useful for developing approximations, but the latter is much better in practice. This is because the first case provides a way to isolate windows that simplifies the debugging process. However, when windows are adjacent or overlap, there is carry information that is not used due to the high level of isolation. It also becomes possible for a single bit to take on multiple values in multiple windows, which is clearly not correct and will reduce the bias. In the second method, bits may no longer take on multiple values, and the attacker can take advantage of the extra carry information to obtain stronger biases. The first approach also has another drawback. Because there is an addition performed for each addition window, the number of additions increases rapidly with each round. The second method uses at most twice the number of additions (1 for word addition, 1 for carry addition).

It is also essential that window position is preserved. When an addition is performed on windows that wrap around a word, it is important that the carry out of position n − 1 is not propagated. Both implementation methods respect this.

We stored windows in 64-bit words (regardless of isolation) in the following manner. The bits in a window were set to their correct value in the corresponding position, and all other bits set to 0. Let x(i) indicate the window beginning at position i in word x, and consider 64-bit words. With window isolation, x(8) = 0x2E would be represented as 00000000 00002E00, and all other windows would have their own distinct 64-bit word. With word isolation and two 16-bit windows, x(8) = 0x5678 and x(40) = 0x1234, the word would be represented as 00123400 00567800. When using word isolation, the inactive bits may all be set to zero after each addition. Alternatively, they may remain after all operations, and be used throughout the cipher. The latter may be difficult to implement correctly depending on the method of deciding whether or not to add 1 when a carry is guessed. The first approach has been found to be more beneficial overall, although there were drops in bias at certain w values.
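A short sketch (ours, not the authors' implementation) of how window values can be placed into 64-bit words for the two storage methods described above:

MASK64 = (1 << 64) - 1

def place_window(value, pos, w):
    """Store a w-bit window value with its least significant bit at position pos
    of a 64-bit word (wrapping past bit 63); all other bits are zero."""
    v = value & ((1 << w) - 1)
    return ((v << pos) | (v >> (64 - pos))) & MASK64

# Window isolation: each window lives in its own 64-bit word.
print(hex(place_window(0x2E, 8, 8)))                     # 0x2e00

# Word isolation: all windows of one word are stored together.
word = place_window(0x5678, 8, 16) | place_window(0x1234, 40, 16)
print(hex(word))                                         # 0x12340000567800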

4 Truncated Differentials

Pseudo-linear methods are directly applicable to truncated differential attacks. A typical differential attack relies on differential characteristics, which are sequences of input and output differences over one or multiple rounds. A differential trail can be created by chaining characteristics to cover more rounds. Pseudo-linear approximations can be used to discover differential trails, particularly truncated differential trails. Unlike a traditional differential which uses the entire state, a truncated differential works with a portion of the state. A common practice in differential calculation is to linearize the function (replace addition/subtraction modulo 2^n with exclusive-or). We propose to eliminate the linearization step and instead follow the differentials in windows.

Lemma 4. Let x and x′ be two n-bit words, x′ ⊕ x = δ, x > x′. Let w be the window size in bits. Then Pr[part(x ⊟ x′, s, (s + w) mod n) = part(x, s, (s + w) mod n) ⊟_w part(x′, s, (s + w) mod n)] > 1/2.

Proof. This follows directly from Corollary 2.

Figure 4 depicts a simple ARX function with a 1-bit difference in one of the words. The windows in the trace are highlighted in grey, with the difference labeled on top. In this example, the left word contains a 1-bit difference inside a window. There are no differences in the windows on the right. After one application of addition, rotation, and exclusive-or, there is a 1-bit difference in the left window and in the right window, with some probability. After another addition, there is still a 1-bit difference, with some probability. The differences can be traced from word to word throughout the function.

Fig. 4. Simple view of tracing differences

How a difference in the input affects each window in the trace depends on the location of the differences relative to the windows. In particular, a difference occurring in less significant bits than a window has less impact as the distance between the start of the window and the difference increases. This is because the probability of the difference introduced by the carry function propagating to the window decreases as the distance between the difference and the window increases.

Consider the example in figure 4, but allow the possibility that there are differences outside the window which are possibly unknown. If the input difference of the left word is 1111_2, the difference of the addition will propagate differently than if the input difference is 1000_2. If the window begins at the fourth least significant bit, then both result in a difference inside the window of 1, but the differences before the window produce different carry distributions, thereby affecting the difference propagation inside the window. If the window begins much farther away from other differences, then the impact will be smaller. For example, if the window began in position 32, and the difference outside the window only occurred in the least significant bit, then the probability of that difference affecting the window after one addition is much smaller because the probability of the carry function reaching that far is much smaller.
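The kind of tracing described above can be prototyped quickly. The sketch below (a toy, with an arbitrary rotation amount rather than a Threefish constant) pushes a 1-bit input difference through one addition-rotation-xor step and measures how often the output difference restricted to a low window is still a single bit:

import random

n, r = 64, 19                      # word size; r is an arbitrary rotation amount
mask = (1 << n) - 1
rotl = lambda v, s: ((v << s) | (v >> (n - s))) & mask

def mix(x, y):
    a = (x + y) & mask             # addition modulo 2^n
    b = rotl(y, r) ^ a             # rotate, then exclusive-or
    return a, b

w, trials, single_bit = 8, 100_000, 0
for _ in range(trials):
    x, y = random.getrandbits(n), random.getrandbits(n)
    a, b = mix(x, y)
    a2, b2 = mix(x ^ 1, y)         # 1-bit difference in the least significant bit of x
    da = (a ^ a2) & ((1 << w) - 1) # difference restricted to the low w-bit window
    single_bit += bin(da).count("1") == 1

print(single_bit / trials)         # fraction of traces where the window difference stays one bit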

4.1 Difference in Addition

We now investigate how different operators affect the trail. First, consider how differences affect the approximation of addition modulo 2^w.

Difference in one addend Let us start by considering two n-bit words, x and x′, x′ = x ⊕ 1, and a third word y. When z = x ⊞ y and z′ = x′ ⊞ y, there is a difference in the first addend, but there is no difference in the second addend. We can say something about the resulting difference z ⊕ z′.

The simplest claim we can make is that the last bit of z ⊕ z′ is the same as the last bit of x ⊕ x′. When x ⊕ x′ is even, this claim is straightforward. When x ⊕ x′ is odd, this may be deduced as follows. Without loss of generality, let x be even and x′ be odd, such that x ⊕ x′ is odd. If y_0 (the least significant bit of y) is 0, then z_0 = 0 and z′_0 = 1. If y_0 = 1, then z_0 = 1 and z′_0 = 0. In both cases, z_0 ⊕ z′_0 = 1.

More interesting to consider is how carries propagate. In particular, there are cases where strings of ones can be created or broken due to carries. To see this, let us examine z′. Suppose x ⊕ x′ = 1 and that y is selected from the uniform distribution at random. Then Pr[z′_0 = 0] = 1/2. If y_0 = 1, then carry_0 = 1 and z′_1 ≠ z_1. Similarly, if carry_1 = 1 then z′_2 ≠ z_2. This continues until carry_i = 0. All of the other bits are the same in both additions for bits i + 1, ..., n − 1, and therefore produce the same values going forward. This can be generalized for a difference starting in any bit position.

If the input difference has a run of multiple ones (e.g. 3, 7, F, but not 1), some of the ones may turn to zeros in the output difference. For example, let x = 1111_2 and x′ = 1100_2, giving x ⊕ x′ = 11_2. Let y = 1101_2.

  1111        1100
⊞ 1101      ⊞ 1101
  1100        1001

For x ⊞ y, carries result from positions 0, 1, and 2. For x′ ⊞ y, carries result from position 2 only. The resulting difference characteristic here is ∆ = 0011_2 → δ = 0101_2, and the string of ones is broken.

In both addends It may be the case that both addends have difference patterns. For example, let x′ = x ⊕ ∆, and y′ = y ⊕ Γ. Addition is performed as z = x ⊞ y and z′ = x′ ⊞ y′, and z ⊕ z′ = δ for some value δ. While the structure of the output difference is going to vary greatly by input differences, one can easily say something about the least significant bit of the difference pattern.

Consider some difference pattern 000XXX000 in ∆ and Γ, where X marks a possibly nonzero difference. Since there is no difference coming into the bits marked by X, the carry function into that region behaves the same way in both additions. Therefore we eliminate the zero regions from consideration and only consider XXX. One can say something about the least significant bit of XXX based on the least significant bit of the input difference patterns.

Lemma 5. If both inputs have an even (odd) difference pattern, then the sum will have an even difference pattern. If one input has an odd difference pattern and the other an even difference pattern, then the sum will have an odd difference pattern.

Proof. Let the function lsb(d) return the least significant bit of the difference window of d. If the difference in both input windows is even, lsb(x) = lsb(x′) and lsb(y) = lsb(y′). Hence,

lsb(z) ⊕ lsb(z′) = (lsb(x) ⊕ lsb(y)) ⊕ (lsb(x′) ⊕ lsb(y′)) = (lsb(x) ⊕ lsb(x′)) ⊕ (lsb(y) ⊕ lsb(y′)) = 0

Now suppose the difference in both input windows is odd, so lsb(x) ≠ lsb(x′) and lsb(y) ≠ lsb(y′). Hence,

lsb(z) ⊕ lsb(z′) = (lsb(x) ⊕ lsb(y)) ⊕ (lsb(x′) ⊕ lsb(y′)) = (lsb(x) ⊕ lsb(x′)) ⊕ (lsb(y) ⊕ lsb(y′)) = 1 ⊕ 1 = 0

Finally, suppose one addend has an odd difference pattern and the other an even difference pattern. Without loss of generality suppose lsb(x) = lsb(x′) and lsb(y) ≠ lsb(y′). Hence:

lsb(z) ⊕ lsb(z′) = (lsb(x) ⊕ lsb(y)) ⊕ (lsb(x′) ⊕ lsb(y′)) = (lsb(x) ⊕ lsb(x′)) ⊕ (lsb(y) ⊕ lsb(y′)) = 0 ⊕ 1 = 1
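Lemma 5 can be verified empirically by zeroing both difference patterns below a chosen position p and comparing bit p of the output difference with the parity the lemma predicts; the sketch below is ours:

import random

n = 64
mask = (1 << n) - 1
p = 7                                   # least significant position of the XXX region
trials = 100_000

for _ in range(trials):
    x, y = random.getrandbits(n), random.getrandbits(n)
    delta = random.getrandbits(n) & ~((1 << p) - 1)   # no difference below bit p
    gamma = random.getrandbits(n) & ~((1 << p) - 1)
    z  = (x + y) & mask
    z2 = ((x ^ delta) + (y ^ gamma)) & mask
    predicted = ((delta >> p) ^ (gamma >> p)) & 1     # parity of the pattern lsbs
    assert ((z ^ z2) >> p) & 1 == predicted

print("Lemma 5 held on", trials, "random samples")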

4.2 Difference in Exclusive-Or

When a difference occurs in only one of the operands, that difference is preserved by exclusive-or. Consider n-bit words a, a′ = a ⊕ ∆, and b, and let c = a ⊕ b and c′ = a′ ⊕ b. Then a′ ⊕ b = a ⊕ ∆ ⊕ b → c′ ⊕ c = ∆.

4.3 Differences Over Round Functions

The simplest cases to analyze are those with low Hamming weight differences. Consider x′ = x ⊕ 2^i, i ∈ {0, ..., n − 1}. The difference characteristics between rounds can be traced by following the effect of the carry. The input difference will affect the differential in varying ways depending on its position.

For example, cryptanalysts have placed the difference in the most significant bit of a word, since this minimizes the effect of the carry. This is because during an addition operation, no combination of addends will produce a carry out of that bit due to modular addition. In other words, addition in the most significant bit of a word behaves like exclusive-or in addition modulo 2^n. However, this practice becomes ineffective after some time, as the rotation will move the difference to a position where the carry will matter. That is, a difference that was outside the target and thus was ignored should eventually affect the target window as the trail covers more rounds, because the rotation will move the difference bits into a position that affects the window. Note that this may not be true when rotation amounts are poorly selected.
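The most-significant-bit observation can be confirmed with a few lines (a sketch, ours):

import random

n = 64
mask = (1 << n) - 1
msb = 1 << (n - 1)

for _ in range(100_000):
    x, y = random.getrandbits(n), random.getrandbits(n)
    # a difference in the top bit passes through addition modulo 2^n unchanged,
    # because the carry out of bit n-1 is discarded
    assert (((x ^ msb) + y) & mask) == (((x + y) & mask) ^ msb)

print("a most-significant-bit difference always survives addition modulo 2^n")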

4.4 Windows and Truncated Differentials

A differential trail, truncated or otherwise, typically covers r − 1 of r rounds. The adversary chooses inputs P and P ⊕ ∆, and receives their ciphertexts C and C′. Next, the adversary guesses portions of the key that are needed to compute the target window (where the target difference occurs), and backs up one round to obtain T and T′. If T ⊕ T′ = δ, where δ is the target difference, then a counter for those key bit values is incremented. At the end, the guessed key bits that produced T ⊕ T′ = δ at the expected rate are the most likely candidates.

Windowing is used differently for differentials than for pseudo-linear approximations. This is because higher accuracy is required when working with differentials. Specifically, target windows should be formed such that no subtraction is left to chance. The word should be covered from the least significant bit to the highest position across all relevant windows so that all borrows are known.

One strategy is to start with small windows, say w = 4. Once all the bits involved with that window size have been found, increase w. The window position does not change, but more of the unknowns are slowly added to the windows. This proceeds until all n bits of a key word are active and the entire word has been guessed, or the correct key is no longer distinguishable from incorrect keys. This can happen if all the new key bits guessed do not change the probability of resulting in the target difference. Note that the difference in a word is the difference over the entire n-bit word, where inactive bits are zero.

5 Evaluating ARX Functions

Diffusion in ARX is achieved primarily through rotation, and secondarily by the carry function of the addition. There is little work which describes how to construct a good round function, and how to determine how good an existing round function is. Branch number is a metric currently in use, and was used in the design strategy for the Advanced Encryption Standard (AES) [7].

5.1 Branch Numbers

The differential branch number and linear branch number metrics are measures of diffusion that apply to any mapping. Here, their use is restricted to the context of key-alternating round functions. Branch numbers provide lower bounds for the amount of diffusion in a round function, allowing simpler analysis of the weight (or probability) of a trail over several rounds. The metrics have been defined for any partition [6], as well as a specific type of partition [7].

In [7], branch numbers are defined as part of an m-bit partition of the state, consisting of bundles. Bundles are adjacent and do not overlap, such that the state is divided into n_b bundles and n_b·m is the size of the state in bits. A bundle is called active when it is part of a differential or linear trail; that is, when a difference or selection pattern is non-zero for that bundle in a differential or linear trail, respectively. The bundle weight of a state is defined as the number of active bundles in that state. The bundle weight of state a is written w_b(a).

Branch number metrics are defined differently for differential and linear trails [7]. As the number of rounds increases, the branch number becomes more complex and difficult to reason about. However, it has been shown that properties of round functions can be derived by looking at two-round trails, and viewing 2r rounds as a sequence of r two-round trails [7]. Therefore, only two-round trails will be discussed here. For input vectors a and b, the differential branch number for round transformation φ is defined as [7]:

B_d(φ) = min_{a, b ≠ a} {w_b(a ⊕ b) + w_b(φ(a) ⊕ φ(b))}    (2)

Generally speaking, this is the minimum bundle weight of the input difference to φ added to that of the output difference of φ. This is meaningful because it provides a lower bound on the propagation of the input difference pattern to the output of φ, and thus illustrates a lower bound on the diffusion of a single bundle or set of bundles.
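Equation (2) can be evaluated exhaustively for small toy mappings. The sketch below (ours) uses an 8-bit state split into two 4-bit bundles and a made-up ARX-like transformation phi; it is meant only to show the computation, not to model a real cipher:

def bundles(state, m=4, nb=2):
    """Split an (m*nb)-bit state into nb bundles of m bits each."""
    return [(state >> (m * i)) & ((1 << m) - 1) for i in range(nb)]

def bundle_weight(state):
    return sum(1 for b in bundles(state) if b != 0)

def phi(state):
    # toy ARX-like transformation on two 4-bit bundles (illustrative only)
    lo, hi = bundles(state)
    s = (lo + hi) & 0xF                    # addition modulo 2^4
    rot = ((hi << 1) | (hi >> 3)) & 0xF    # rotate the high bundle by one bit
    return ((rot ^ s) << 4) | s

best = None
for a in range(256):
    for b in range(256):
        if a == b:
            continue
        wt = bundle_weight(a ^ b) + bundle_weight(phi(a) ^ phi(b))
        best = wt if best is None else min(best, wt)

print("differential branch number of the toy phi:", best)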

Linear branch number is similar. For an input selection pattern α, output selection pattern β, correlation function C(·), and input x, the linear branch number of a round transformation φ is defined as [7]:

B_l(φ) = min_{α, β : C(α^T x, β^T φ(x)) ≠ 0} {w_b(α) + w_b(β)}    (3)

Generally speaking, this is the minimum of the sum of bundle weights for the inputs and outputs of φ over all selection masks, where the active input and output to the φ function are correlated.

Branch Numbers in ARX This metric works well for states that are comprised of a sufficient number of bundles. However, diffusion in ARX round functions comprised of few large bundles (e.g., four 32-bit or 64-bit words) may be difficult to accurately capture with this metric. To see why, consider the scenario in figure 5, where two round transformations have approximately the same amount of diffusion, but one round function is an SPN that operates on bytes (figure 5(a)) and the other is an ARX function using 32-bit words (figure 5(b)). In both round functions, the same bits are part of a linear or differential trail. Portions of the state (byte or word) that contain at least one bit of the trail are called active, and are colored grey in figure 5. When operations are performed in 32-bit words, each word covers four bytes. As a result, all of state 1 appears active in 5(b), whereas only 3/8 of state 1 is active in 5(a). Similarly, 5/8 of state 2 is active in figure 5(a), while again being completely active in figure 5(b). In this example, it appears that the diffusion is far better in 5(b), when it really is not. A state composed of few large words can produce branch numbers that are misleading.

Fig. 5. Trail in two transformations with approximately the same amount of diffusion: (a) 8-bit operations, (b) 32-bit operations

We propose two approaches to more accurately measure diffusion in this type of round function. The first proposal is based on the idea of breaking large words into smaller ones, and modeling addition as data dependent substitution boxes. The second approach leverages concepts of pseudo-linear analysis.

S-Box Representation Addition modulo 2^n can be represented as a series of smaller data-dependent substitution boxes, as shown in figure 6. There are two kinds of S-boxes:

S0: a_i = a_i + b_i
S1: a_i = a_i + b_i + 1

Each bundle may be thought of as a window. Starting with the least significant window, apply S0 to a0. Note that the result is also dependent on the bundle b0. Moving from the least significant bundle towards the most significant bundle, use S0(ai) when there is not a carry out of the previous bundle addition. When there is a carry out of the previous bundle addition, S1(ai) is used. This gives the same result as the original function.

Fig. 6. S-box representation of an ARX function

Fig. 7. Threefish mix function; rotation varies by state and position
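The decomposition of figure 6 can be checked with a short sketch (ours): chaining the data-dependent bundle S-boxes, where the carry out of one bundle selects S0 or S1 for the next, reproduces addition modulo 2^n exactly:

import random

def bundle_add(a, b, n=64, m=8):
    """Add two n-bit words by chaining m-bit bundle S-boxes, least significant bundle first."""
    result, carry = 0, 0
    for i in range(n // m):
        ai = (a >> (m * i)) & ((1 << m) - 1)
        bi = (b >> (m * i)) & ((1 << m) - 1)
        s = ai + bi + carry                # carry == 0 selects S0, carry == 1 selects S1
        result |= (s & ((1 << m) - 1)) << (m * i)
        carry = s >> m                     # carry out of this bundle chooses the next S-box
    return result

for _ in range(10_000):
    a, b = random.getrandbits(64), random.getrandbits(64)
    assert bundle_add(a, b) == ((a + b) & ((1 << 64) - 1))
print("bundle decomposition matches addition modulo 2^64")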

A bundle is not considered active in a differential trail unless there is a non-zero difference in the input to the corresponding S-box. That is, if there is a non-zero difference in the input of an S-box step, but one input is to S0 and the other is to S1, the bundle will not be considered active. The rationale for this is that branch number provides a lower bound, so only the minimum number of S-boxes affected is considered here.

Alternate Metric Pseudo-linear analysis lends itself to another possibility for application of the branch number metrics. If the restriction that bundles be non-overlapping adjacent portions of the state were removed, then bundles would strongly resemble windows in a pseudo-linear system. In fact, when this restriction is removed, pseudo-linear approximations are directly applicable to the bundles to trace dependencies and influence. It is this idea that motivates an alternate metric for diffusion in ARX rounds, called effective branch number. The very first step is to determine the proper window (bundle) size. In pseudo-linear cryptanalysis, the bounds on window size w are defined as 1 ≤ w ≤ n. The effective bundle weight of a state, written ewb(·), is the number of state bits covered by the active windows (bundles), divided by w and rounded up.

Example 1. Consider a round function and approximation with five active input windows, but all five windows start one bit off from one another. That is, we have windows starting in positions i, i + 1, i + 2, i + 3, and i + 4. The only bits that are not contained in multiple windows are the first bit of window x(i) and the last bit of window x(i + 4). The bit range covered by these windows is [i, ..., i + 4 + w). The number of state bits in this range is 4 + w. If w = 2, then the effective bundle weight is ⌈6/2⌉ = 3. If w = 4, then the effective bundle weight is ⌈8/4⌉ = 2. If w = 8, then the effective bundle weight is ⌈12/8⌉ = 2.

The definitions for linear and differential effective branch number are simple extensions of the branch number metrics from [7]. They only differ in that effective bundle weight is used in place of bundle weight. We denote the differential effective branch number as EB_d and the pseudo-linear effective branch number as EB_pl. Over two-round trails, the differential effective branch number of a transformation φ is derived by equation 4. Recall from equation 2 that a and b are input vectors to the first round, and φ(a) ⊕ φ(b) is the difference at the input to the second round.

EB_d(φ) = min_{a, b ≠ a, w} {ewb(a ⊕ b) + ewb(φ(a) ⊕ φ(b))}    (4)

Similarly, the pseudo-linear effective branch number of a transformation φ is derived by equation 5. The selection mask of the input to the first round is α, the selection mask at the input to the second round is β, and C(·) is a correlation function over the masks and an input vector, x. Note that the linear effective branch number is the pseudo-linear effective branch number where w = 1.

EB_pl(φ) = min_{α, β : C(α^T x, β^T φ(x)) ≠ 0, w} {ewb(α) + ewb(β)}    (5)

The matter of selecting an appropriate w has yet to be addressed. The minimum useful window size is that which allows a round function to be distinguished from a random permutation. This in turn depends on the number of additions required to compute a target window, denoted by n_⊞. If windows do not overlap, then the effective branch number should be equivalent to the branch number calculations of [7]. To satisfy this requirement, w should divide the number of bits in the state. Thus, for state width that is a power of 2, let w = 2^{⌊lg(n_⊞)⌋}. Note that only the additions occurring in the key-independent portion of a round function should be used to derive the appropriate w.
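The effective bundle weight of Example 1 can be computed mechanically: take the union of the bits covered by the active windows and divide by w, rounding up. A sketch (ours):

from math import ceil

def effective_bundle_weight(window_starts, w, n=64):
    """Number of w-bit bundles needed to cover all bits of the active windows."""
    covered = set()
    for s in window_starts:
        covered.update((s + k) % n for k in range(w))
    return ceil(len(covered) / w)

i = 10
starts = [i, i + 1, i + 2, i + 3, i + 4]          # five windows, each offset by one bit
for w in (2, 4, 8):
    print(w, effective_bundle_weight(starts, w))  # prints 3, 2, 2 as in Example 1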

6 Application: Threefish

Threefish [8] is the tweakable [17] block cipher inside the compression function of SHA-3 finalist Skein [10]. Two variants are discussed here: Threefish-256 and Threefish-512. The state and key are comprised of 64-bit words, where there are B bits in the state and the key of Threefish-B. The cipher consists of 72 rounds, where each round contains two main steps: mix and permute. The mix function, shown in figure 7, is an ARX function where addition is performed modulo 2^64 and the rotation depends on the mix inputs and round. Each mix is numbered 0, 1, 2, or 3, where each is computed as mix0(x0, x1), mix1(x2, x3), mix2(x4, x5), and mix3(x6, x7) in Threefish-512. Threefish-256 requires only mix0(x0, x1) and mix1(x2, x3). The rotation values for each mix in each round can be found in tables 3 and 4.

Table 3. Threefish-256 Rotation Constants (round 2)

Round mod 8 | Mix 0 | Mix 1
     0      |   5   |  56
     1      |  36   |  28
     2      |  13   |  46
     3      |  58   |  44
     4      |  26   |  20
     5      |  53   |  35
     6      |  11   |  42
     7      |  59   |  50

Table 4. Threefish-512 Rotation Constants (round 2)

Round mod 8 | Mix 0 | Mix 1 | Mix 2 | Mix 3
     0      |  46   |  36   |  19   |  37
     1      |  33   |  27   |  14   |  42
     2      |  17   |  49   |  36   |  39
     3      |  44   |   9   |  54   |  56
     4      |  39   |  30   |  34   |  24
     5      |  13   |  50   |  10   |  17
     6      |  25   |  29   |  39   |  43
     7      |   8   |  35   |  56   |  22

The permutation step of the round function is straightforward. Each 64-bit word is mapped to another position according to the mapping π given in tables 5 and 6. That is, x_i ← x_{π(i)}.

Table 5. Threefish-256 Permutation, π

i    | 0 1 2 3
π(i) | 0 3 2 1

Table 6. Threefish-512 Permutation, π

i    | 0 1 2 3 4 5 6 7
π(i) | 2 1 4 7 6 5 0 3
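For illustration, the following sketch implements one unkeyed Threefish-256 round (mix followed by permute) using the rotation constants of table 3 and the permutation of table 5. It is our transcription for experimentation, not the reference implementation, and it omits key injection:

MASK64 = (1 << 64) - 1
rotl = lambda v, r: ((v << r) | (v >> (64 - r))) & MASK64

# Rotation constants from table 3, indexed by round mod 8: (mix 0, mix 1)
ROT_256 = [(5, 56), (36, 28), (13, 46), (58, 44),
           (26, 20), (53, 35), (11, 42), (59, 50)]
PERM_256 = [0, 3, 2, 1]                     # table 5: x_i <- x_{pi(i)}

def mix(x0, x1, r):
    y0 = (x0 + x1) & MASK64                 # addition modulo 2^64
    y1 = rotl(x1, r) ^ y0                   # rotate, then exclusive-or
    return y0, y1

def round_256(state, d):
    """One unkeyed Threefish-256 round (no key injection), round index d."""
    r0, r1 = ROT_256[d % 8]
    x0, x1 = mix(state[0], state[1], r0)
    x2, x3 = mix(state[2], state[3], r1)
    mixed = [x0, x1, x2, x3]
    return [mixed[PERM_256[i]] for i in range(4)]

state = [0x0123456789ABCDEF, 0x1122334455667788,
         0xDEADBEEFCAFEBABE, 0x0F0F0F0F0F0F0F0F]
for d in range(4):                          # the four rounds between key injections
    state = round_256(state, d)
print([hex(v) for v in state])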

Every four rounds, a round key is injected via addition modulo 2^64. Threefish has a simple key schedule that is quick to compute and has a small footprint. The secret key is 256 bits for Threefish-256 and 512 bits for Threefish-512, organized as four and eight 64-bit words, respectively. Let N denote the number of 64-bit words in the key and state. The exclusive-or of all N words and a constant forms an additional key word, the parity word, giving N + 1 key words. In addition to the N + 1 key words, there are three tweak words. Two tweak words, t_0 and t_1, are given as input and the third tweak word, t_2, is the result of t_0 ⊕ t_1. In the hash function, these

words store information about what has been done so far (bits processed, level in the hash tree, etc.). No structure of the tweak words is assumed when the cipher is considered on its own. The Threefish key schedule is given by equations 6-9, where constant is a constant that was changed between the first [8] and final [10] submissions to NIST. For the purpose of this work, the value of the constant is not relevant (see footnote 1). Therefore, it is left as a variable to include all submitted versions of the algorithm. RK_i is round key i ∈ {0, ..., 18}, where RK_0 is the whitening key. Each round key is derived from the N + 1 key words (denoted k_j, j ∈ {0, ..., N}) and three tweak words (t_0, t_1, t_2).

k_N = constant ⊕ k_0 ⊕ k_1 ⊕ ... ⊕ k_{N−1}    (6)

t_2 = t_0 ⊕ t_1    (7)

Round key i for Threefish-256 is derived by

RK_i = k_{i mod 5} || (k_{(i+1) mod 5} + t_{i mod 3}) || (k_{(i+2) mod 5} + t_{(i+1) mod 3}) || (k_{(i+3) mod 5} + i)    (8)

Similarly, the key schedule for Threefish-512 is expressed by

RK_i = k_{i mod 9} || k_{(i+1) mod 9} || k_{(i+2) mod 9} || k_{(i+3) mod 9} || k_{(i+4) mod 9} || (k_{(i+5) mod 9} + t_{i mod 3}) || (k_{(i+6) mod 9} + t_{(i+1) mod 3}) || (k_{(i+7) mod 9} + i)    (9)
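Equations (6)-(8) translate into the following Threefish-256 round key sketch (ours); the constant is left as a parameter, as in the text:

MASK64 = (1 << 64) - 1

def threefish256_round_keys(key_words, t0, t1, constant, num_round_keys=19):
    """Round keys RK_0 .. RK_18 for Threefish-256; key_words holds k_0..k_3."""
    assert len(key_words) == 4
    k = list(key_words)
    parity = constant
    for ki in k:
        parity ^= ki
    k.append(parity)                         # k_4, equation (6)
    t = [t0, t1, t0 ^ t1]                    # tweak words, equation (7)

    round_keys = []
    for i in range(num_round_keys):          # RK_0 is the whitening key
        rk = [k[i % 5],
              (k[(i + 1) % 5] + t[i % 3]) & MASK64,
              (k[(i + 2) % 5] + t[(i + 1) % 3]) & MASK64,
              (k[(i + 3) % 5] + i) & MASK64]
        round_keys.append(rk)                # equation (8)
    return round_keys

# substitute the constant from the Skein submission in use; 0 is a placeholder
rks = threefish256_round_keys([1, 2, 3, 4], t0=5, t1=6, constant=0)
print([hex(word) for word in rks[0]])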

6.1 Pseudo-Linear Attacks on Threefish-256

Approximation Over Modified 8 Rounds The application of pseudo-linear attacks to a modified version of Threefish-256 is presented in this section. The modified version has a fixed tweak schedule of zero, is reduced to 8 rounds, and does not include the injection of a whitening key. This approximation covers the modified cipher from the key injection of round 4 to the key injection at round 8, inclusive.

Let x_i denote the i-th word of the plaintext and x_i^j the i-th word after round j. Let x_i^j(m) denote the window of word x_i^j with its least significant bit in position m. That is, x_i^j(m) = part(x_i^j, m, (m + w) mod 64). When a single word contains multiple active windows, a_0 ... a_m, let x_i^j(a_0, ..., a_m) be the same as x_i^j(a_0), ..., x_i^j(a_m). K_i(m) denotes the window starting at position m of the i-th word of the key schedule.

The approximation for x_0^8(0) is presented in figure 8. The active addition windows are listed to the upper left of each ⊞ symbol. rotl(r) represents a left rotation by r bits of a 64-bit string. Computations before the fourth round key addition can be performed by following the cipher operations, and are not included here. The number of key bits needed in this approximation depends on the window size. For example, w = 3 requires 58 key bits, w = 4 requires 75 key bits, and w = 5 requires 92 key bits.

Empirical results for bias were obtained with varying window sizes, using 20,000,000 pseudorandomly generated (data, key) pairs and a tweak schedule fixed to 0. This methodology is based on the one used in the original Skein submission to the SHA-3 competition [8], where differential biases were computed using 20,000,000 pseudorandom (data, key, tweak) tuples. As mentioned in section 3.4, there were two methods of implementation used: the less efficient and less accurate but easier to debug window isolation, and the more accurate and more efficient but more difficult to debug word isolation.

For simplicity, a carry pattern C^i is represented as an array C^i_{jk}, where j is the round number and k is the word number. In these experiments, if a word k of round j contained multiple windows, the same carry guess C^i_{jk} was applied to all of those windows.

1 This is in contrast to rotational cryptanalysis, which is a differential technique that relies on the same properties of the carry function as this work. The original constant of repeated bytes only had two values under all rotations, and was changed to prevent rotational cryptanalysis of Threefish.

Fig. 8. Approximation for x_0^8(0), without whitening

Fig. 9. Approximation for 12 rounds, meeting at x_0^10(0), without whitening

Fig. 10. Base approximation with window isolation, x_0^8(0)

Fig. 11. x_0^8(0) approximation with word isolation and different carry patterns (cp)

Figure 10 demonstrates the effect of offsets on an approximation, with both the correct and incorrect key values, where observed bias is computed according to equation 1. Offsets increase the probability of correctness for wrong values considerably because they also result in larger solution sets. On the other hand, correct values continue to have a higher bias on average. The same is true when multiple carry patterns are used, and the resulting solution set consists of all the solutions resulting from the carry patterns used. Figure 10 also shows that there is a maximum bias that can be achieved with window isolation, where the maximum probability of correctness is strictly smaller than 1.

Figure 11 presents some results obtained using word isolation and offset = 0. It demonstrates that word isolation provides better results, as expected, and that the correctness probability grows to 1 as w increases. This causes the bias to approach 1, since the probability of correctness for a random guess drops closer to zero as w increases. Finally, it shows how different carry patterns can affect the approximation. The two carry patterns used here were:

C^0_{jk} = 0,    C^1_{jk} = 0 if j is even, 1 if j is odd

When a large enough number of key bits is to be guessed, the bias values for different carry patterns converge (here, this happens at w = 12). If multiple carry patterns are used together, as they were here, the resulting bias may be smaller than the bias obtained by using a single, stronger carry pattern.

Approximation Over Modified 12 Rounds The application of pseudo-linear attacks to a modified version of Threefish-256 is presented in this section. The modified version has a fixed tweak schedule of zero, is reduced to 12 rounds, and does not include the injection of a whitening key. This approximation covers the modified cipher from the key injection of round 4 to the key injection at round 12, inclusive. Figure 12 contains results from the 12 round approximation with carry pattern C^1 and different offsets using word isolation. It shows that the correct key bits are indistinguishable for w < 5. With w = 5, 232 of 256 bits of key are involved in the approximation.

To demonstrate key recovery using pseudo-linear approximations, a smaller attack was performed with the 8 round approximation for x_0^8(0). Again, this gives an attack on 11 rounds of Threefish-256, since there is not another key injection until round 12.

Fig. 12. Bias Threefish-256 12 round approximation

Fig. 13. Observed bias for Threefish-512 8 round distinguisher

If w = 3, then the attack on 11 rounds requires 58 bits of key (as compared to 256 bits for a brute force attack) to be guessed. Without expending too much energy on computation, in order to observe the kinds of errors that might result, we attempted to determine 18 key bits assuming that the other 40 were known. Because the carry function is difficult to determine with incomplete knowledge of the operands, the correct value of the key is not always among the top few candidates. In these experiments, the goal was to eliminate 90% of the possible key values, with the correct value in the remaining 10%. We ran 500 independent key recovery attacks, each with a pseudorandomly generated key and 10,000 pseudorandomly generated plaintexts. The results are summarized in table 7, where we note when the correct key value has the maximum bias, as well as when the correct value is in the top 10% of key estimates ranked by bias.

Table 7. 500 attacks, 10,000 pairs

            | offset=0 | offset=1 | offset=2 | offset=3
correct     |   11%    |  12.2%   |  14.4%   |   7.2%
in top 10%  |  92.6%   |  97.4%   |  96.8%   |  92.2%

Key Recovery These approximations can be used for key recovery with known plaintext and ciphertext. Because there is no key injection in rounds 13 through 15, the output of round 15 can be decrypted to derive the output of round 12 correctly. The input up to the fourth round key injection can also be computed precisely because the modified cipher does not include the whitening key injection. In this scenario, the 12 round approximation could be used to determine some of the key bits from 15 rounds.

These results show that it is possible to eliminate many of the incorrect values while keeping the correct key bits with high probability, using only a portion of the key. While a larger value of the offset always helps in the task of approximating the output of the cipher, it does not always help in estimating key bits. This is because a larger offset is not necessarily more accurate; it simply encompasses a larger number of values. This will be true for incorrect key bits as well as correct key bits. For this reason, a larger offset does not always help to effectively distinguish among correct and incorrect key bits. This can be seen in table 7 when the offset increases from 2 to 3. At offset = 3, the number of values is so large that it becomes less effective in distinguishing the correct key bits.

Fig. 14. Observed bias for 12 round Threefish-512 distinguisher, random messages
Fig. 15. Observed bias for 12 round Threefish-512 distinguisher, low Hamming weight messages

6.2 Pseudo-Linear Attacks on Threefish-512

In the previous two attacks, a modified Threefish-256 was used in which the whitening step was eliminated. In the following approximations, the only modification made is the number of rounds; that is, the first key injection considered is the whitening key injection.

Approximation Over Modified 8 Rounds An 8-round approximation was constructed with a meet-in-the-middle approach. It meets at word 1 at the end of the fourth round. The addition (subtraction) windows for this distinguisher are presented in Appendix A, table 13. There are no subtraction windows in the fifth round because no subtractions are needed to obtain x_1^4(0). Empirical results for this distinguisher are presented in figure 13. These results were obtained using 20,000,000 randomly chosen (plaintext, key) pairs with the carry pattern where carries are assumed every other round. It is clear from the plot that eight rounds of Threefish-512 are distinguishable for w ≥ 5. The number of key bits required by this distinguisher is 310, 355, and 391 for window sizes w = 5, w = 6, and w = 7, respectively. Recall that for a distinguisher from Threefish-256, eight rounds with w = 5 (seen in the modified 12 round distinguisher, section 6.1) required 232 of the 256 bits, or about 90%. In this eight round distinguisher for Threefish-512, a window size of 5 requires only about 60% of the 512 key bits to be guessed. The principal reason for this is that with the increased state size, diffusion between words is slower. Specifically, it takes more applications of the permute function, and therefore more rounds, to mix all eight state words.
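For reference, the basic operation behind these addition (subtraction) windows can be sketched as follows; this is our minimal illustration, not code from the paper, and the example values are arbitrary. A w-bit window of an n-bit modular sum is computed from the corresponding operand windows together with a guessed carry into the window; the distinguisher's carry pattern specifies that guess round by round.

MASK64 = (1 << 64) - 1

def window(x, start, w):
    """Extract the w-bit window of x beginning at bit position `start`."""
    return (x >> start) & ((1 << w) - 1)

def approx_add_window(a, b, start, w, carry_guess):
    """Approximate the window [start, start+w) of (a + b) mod 2^64 using only
    the operand windows plus a guessed carry into bit `start`."""
    return (window(a, start, w) + window(b, start, w) + carry_guess) & ((1 << w) - 1)

# The approximation is exact exactly when the guessed carry matches the true
# carry into bit `start` of the full 64-bit addition.
a, b, start, w = 0x0123456789ABCDEF, 0xFEDCBA9876543210, 8, 5
true_carry = ((a & ((1 << start) - 1)) + (b & ((1 << start) - 1))) >> start
assert approx_add_window(a, b, start, w, true_carry) == window((a + b) & MASK64, start, w)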

12 Round Approximation A 12-round approximation can be obtained in a similar fashion to the 8 round one. In particular, one was constructed that meets in the middle at word 1 of the seventh round. The addition (subtraction) windows for this distinguisher are presented in Appendix A, table 14. The key bits that need to be guessed for w = 4 and w = 5 number 509 and 511, respectively; beyond that, all 512 bits need to be guessed for this distinguisher. Empirical results are presented in figures 14 and 15 for two different plaintext selection criteria. In figure 14, 20,000,000 (plaintext, key) pairs are chosen at random and the alternating carry pattern is used. In figure 15, the key is still chosen at random, but the plaintexts are chosen such that they have a Hamming weight of 1; for each randomly selected key, 512 plaintexts are used, and carries are assumed in rounds 6 and 8. Figure 14 shows that one cannot effectively distinguish the correct key from incorrect keys when w ≤ 5 and plaintexts are selected at random; when w > 5, all 512 bits need to be guessed. However, when the model changes to allow the adversary to choose only low Hamming weight messages, the results improve. Figure 15 shows that when the message has a Hamming weight of 1, the correct key can be distinguished from a random permutation with w = 4 and offset ≤ 6. It cannot be distinguished from a random permutation for w ≤ 3, so 509 bits is the minimum that must be guessed for this distinguisher.

6.3 Truncated Differentials for Threefish-512

In this section, we construct a truncated differential attack on four rounds of Threefish-512. This is a single-key chosen-plaintext attack with a pre-determined difference between pairs of plaintexts. The focus is on differences in window bits; differences in bits outside the window are ignored. Let the input be x'_0 = x_0 ⊕ 1, x'_i = x_i for i ∈ {1 . . . 7}, and the target window be x_3^3(55). The size of the window will begin small, such as w = 4, and then increase as the adversary gains more information. The target window location is not arbitrary. Recall from section 4.4 that we need to remove uncertainty introduced by the carry by working with the least significant window for all subtractions. Since there is a right rotation by 9 after the key subtraction, we have (0 − 9) mod 64 = 55, so the target window for x_3^3 starts at position 55. To see how the target window is affected by the input difference, it is helpful to observe how the difference propagates through the trail for x_3^3. The first three rounds of Threefish-512 are shown in figure 16. The trail is marked with thick arrows.

Fig. 16. Three rounds of Threefish-512

First Round The first round begins with the key whitening. The same key will be added to both plaintexts, x and x', which are equal except in the least significant bit of word 0. Words 1 through 7 have no difference before and after the key injection. In word 0, after key injection, the difference will consist of a string of n − t zeros followed by t ones, 1 ≤ t ≤ n. The reason for this is as follows (recall section 4.1). Without loss of generality, let x_0 be even and x'_0 be odd. Then x_0 ⊞ k_0 = x_0^* will not cause a carry out of position 0 regardless of the value of k_0. If the least significant bit of k_0 (denote this as k_0[0]) is 0, then x_0'^*[0] = 1. If k_0[0] = 1, then x_0'^*[0] = 0. Assuming that k is selected uniformly at random, Pr[x_0'^*[0] = 0] = 1/2. If k_0[0] = 1, then carry_0 = 1 in the computation of x_0'^*, and x_0'^*[1] differs from x_0^*[1]. This continues until carry_{t−1} = 0. Bits in positions t, . . . , n − 1 are identical in x_0^* and x_0'^*, and thus the run of ones ends after t bits. A short list of possible difference values is provided in table 8. The approximated probabilities were obtained empirically over 20,000,000 randomly selected (x, key) pairs, where x' was derived from x.

Word    Input difference (Δx_0)   Output difference (δ)   Pr[δ | Δx_0] (approx)
x_0^*              1                        1                    2^-1
                                            3                    2^-2
                                            7                    2^-3
                                            F                    2^-4
                                           1F                    2^-5

Table 8. Key whitening difference probabilities
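The behaviour summarized in table 8 is easy to reproduce; the following Monte Carlo sketch (ours; the trial count and seed are arbitrary) estimates Pr[δ | Δx_0 = 1] for the key whitening by sampling random x and k.

import random

def whitening_difference_distribution(trials=200_000, n=64, seed=1):
    """Estimate Pr[delta] for delta = ((x + k) mod 2^n) xor (((x ^ 1) + k) mod 2^n).
    A one-bit input difference becomes a run of t ones with probability
    close to 2^-t, as in table 8."""
    rng = random.Random(seed)
    mask = (1 << n) - 1
    counts = {}
    for _ in range(trials):
        x, k = rng.getrandbits(n), rng.getrandbits(n)
        delta = ((x + k) & mask) ^ (((x ^ 1) + k) & mask)
        counts[delta] = counts.get(delta, 0) + 1
    # Return the five smallest observed differences and their frequencies.
    return {d: c / trials for d, c in sorted(counts.items())[:5]}

# Expected, approximately: {0x1: 0.5, 0x3: 0.25, 0x7: 0.125, 0xF: 0.0625, 0x1F: 0.03125}
print(whitening_difference_distribution())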

The first key injection is followed by the mix and permute steps. Recall that Threefish-512 computes four mix functions in parallel, where each mix function consists of an addition, a word rotation, and an exclusive-or. After mix function i ∈ {0, 1, 2, 3} has been applied to words 2i and 2i + 1, the words of the state are permuted to facilitate diffusion. There is no difference between the words 1 of the two texts before the addition of the mix, but the words 0 differ by a string of ones. The rotation is performed on the words 1, and since there is no difference between them, there is still a zero difference after rotation. The final step of the mix is the exclusive-or of the rotated word 1 with the result of the addition modulo 2^n. As a result, the words 0 still differ by a string of ones after the mix, and this string of ones is also the difference between the words 1 after the mix, because the words 1 of the two texts differ only in that the same rotated value is exclusive-ored with the differing words 0. Permutation does not change the value of a difference within a word, only the word in which it appears. In particular, the string of ones forming the difference in word 0 after the mix step moves to word 6 after permutation, and the zero difference in word 2 moves to word 0 (see table 6). Word 1 maps to itself during this step, so at the end of round 1, the only non-zero difference bits are in words 1 and 6. Some of the higher probability characteristics are summed up in table 9. These probabilities are dependent on the input difference to the first round mix function; that is, they are conditional probabilities where the result depends on the difference after the whitening step.
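A compact sketch of one such mix (ours; the rotation constant below is arbitrary, whereas the real constants depend on round and position) illustrates why the exclusive-or step copies the word 0 difference into word 1 when only the left input differs:

import random

MASK64 = (1 << 64) - 1

def rotl64(x, r):
    """64-bit left rotation."""
    return ((x << r) | (x >> (64 - r))) & MASK64

def mix(x0, x1, r):
    """One Threefish-style mix: modular addition, rotation of the right
    word, exclusive-or of the rotated word with the sum."""
    y0 = (x0 + x1) & MASK64
    y1 = rotl64(x1, r) ^ y0
    return y0, y1

random.seed(0)
x0, x1 = random.getrandbits(64), random.getrandbits(64)
y0, y1 = mix(x0, x1, 19)
y0p, y1p = mix(x0 ^ 0x7, x1, 19)   # left inputs differ by a run of ones
# y1 = rotl64(x1, r) ^ y0 and the rotated right words are identical,
# so the xor difference in word 1 equals the xor difference in word 0.
assert (y0 ^ y0p) == (y1 ^ y1p)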

Second Round The incoming differences are in x_6^1 and x_1^1. x_6^1 will affect the output windows x_4^2, because word 6 will be permuted to word 4 after the addition, and x_3^2, because word 7 will be swapped with word 3. x_1^1 will affect x_6^2 as in the first round, because the zeroth word is permuted to the sixth position and the first word is not affected by the permutation, and it will also affect x_1^2. However, x_1^2, x_3^2, and x_4^2 are not in the dependency path for x_3^3, as can be seen by following the words x_3^3 is derived from, and can be ignored for this differential. This leaves only x_6^2.
x_6^2 (derived from x_0^1 and x_1^1) is interesting because the format of the words is different from those seen before. In previous rounds, the differences in the left words have been strings of ones, but neither of the texts, at this point in the cipher, itself contained the difference run in the corresponding word. For example, a difference of 3 would be achieved by having least significant bits 01₂ in one word and 10₂ in the other. In the input word x_1^1, one of the words contains the difference; that is, one word contains Δx_1^1 in the least significant bits, while the other contains zeros. For a difference of 3, this would be 11₂ in one word and 00₂ in the other. Section 4.1 showed how the run of ones in the difference pattern can break in this scenario. Because of this, differences that are not runs of ones are possible, such as 0x05 and 0x0B.

Third Round Only one word in the trail has a difference entering round 3: x_6^2, which affects x_3^3. Again, a difference in the right input creates more interesting transitions. The output difference probabilities for x_3^3 do not all contain perfect powers of 2 in the denominator. A subset of the possible transitions is shown in table 10.
We have shown the effect of the input difference throughout three rounds of a trail. This information can be used to construct a truncated differential attack on four rounds. The trail covers the first three rounds – from the key whitening to the output of the third round. This is reflected in figure 16, where thicker lines indicate the trail. To perform the attack, the adversary guesses the round key bits of the target window, in this case part of k_1 and part of k_8.

Word              Input difference (Δx_0^*)   Output difference (δ)   Pr[δ|Δ] (approx)
x_6^1 and x_1^1              1                         1                   2^-1
                                                       3                   2^-2
                                                       7                   2^-3
                                                       F                   2^-4
                                                      1F                   2^-5
                             3                         1                   2^-1
                                                       3                   2^-2
                                                       7                   2^-3
                                                       F                   2^-4
                                                      1F                   2^-5
                             7                         1                   2^-1
                                                       3                   2^-2
                                                       7                   2^-3
                                                       F                   2^-4
                                                      1F                   2^-5
                             F                         1                   2^-1
                                                       3                   2^-2
                                                       7                   2^-3
                                                       F                   2^-4
                                                      1F                   2^-5

Table 9. Round 1 difference probabilities

Word     Input difference (Δx_6^2)   Output difference (δ)   Pr[δ|Δ] (approx)
x_3^3               1                         1                   2^-1
                                              3                   2^-2
                                              7                   2^-3
                                              F                   2^-4
                                             1F                   2^-5
                    3                         1                   2^-1.32
                                              3                   2^-2
                                              5                   2^-4.32
                                              7                   2^-3
                                              D                   2^-5.32
                                              F                   2^-4
                                             1D                   2^-6.32
                                             1F                   2^-5
                    5                         3                   2^-2.32
                                              5                   2^-2
                                              7                   2^-3
                                              D                   2^-3
                                              F                   2^-4
                                             1B                   2^-6.32
                                             1D                   2^-4
                                             1F                   2^-5

Table 10. Round 3 difference probabilities

The adversary subtracts the guessed key from the ciphertext pairs, reverses the fourth round permute and mix, and calculates the difference x_3^3(55) ⊕ x_3'^3(55). If this difference equals the expected difference at a rate close to the expected probability, the guess is a candidate; if not, it cannot be correct and is eliminated as a candidate. The target difference will change as the window size changes. In particular, the target difference is zero when w < 10; as the window size increases, the target difference will change as more bits are included in the window. The adversary begins by taking a greedy approach and choosing the highest probability output at each step, given the appropriate input differential. Tables 8 to 10 show that 1 is the highest probability outcome at each step: that is, the most likely difference at each step is a difference only in the last bit of the string being considered. However, all paths that result in the target difference contribute to the actual probability. That is, one must calculate the total probability (equation 10) for each round to find the probability of output difference δ occurring given all possible input differences Δ_i.

Pr[δ] = Σ_i Pr[δ | Δ_i] · Pr[Δ_i]    (10)
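As a small worked sketch of equation 10 (ours; the numbers below are invented for illustration and are not taken from tables 8 to 10):

def total_probability(cond_probs, input_probs):
    """Equation 10: Pr[delta] = sum_i Pr[delta | Delta_i] * Pr[Delta_i].

    cond_probs  -- dict: input difference -> Pr[delta | Delta_i]
    input_probs -- dict: input difference -> Pr[Delta_i]
    """
    return sum(cond_probs.get(d, 0.0) * p for d, p in input_probs.items())

# Invented example: two input differences reaching the same output difference.
pr_delta = total_probability({0x1: 2**-1, 0x3: 2**-3}, {0x1: 2**-1, 0x3: 2**-2})
assert abs(pr_delta - (0.25 + 0.03125)) < 1e-12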

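The candidate-filtering loop described above might look as follows; this is our sketch, and the routine window_diff, which subtracts the guessed key words, reverses the fourth round permute and mix, and returns the target window difference, is cipher-specific and assumed here.

def filter_candidates(ct_pairs, candidates, window_diff,
                      expected_delta, expected_prob, tolerance=0.05):
    """Keep (k1, k8) guesses whose observed rate of the target difference is
    close to the rate predicted by the differential trail.

    ct_pairs       -- list of (c, c') ciphertext pairs for the chosen plaintext pairs
    candidates     -- iterable of (k1_guess, k8_guess) partial round-key guesses
    window_diff    -- assumed routine returning x_3^3(55) xor x_3'^3(55) for a guess
    expected_delta -- target difference in the window (0 for w < 10)
    expected_prob  -- probability of expected_delta predicted by the trail
    """
    survivors = []
    for k1, k8 in candidates:
        hits = sum(1 for c, cp in ct_pairs
                   if window_diff(c, cp, k1, k8) == expected_delta)
        rate = hits / len(ct_pairs)
        if abs(rate - expected_prob) <= tolerance:
            survivors.append((k1, k8, rate))
    return survivors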
The window size will start small and gradually increase until all of both k_1 and k_8 have been guessed. Let us start with w = 4. Then 8 bits of key need to be guessed for the target window. Since the target window starts at position 55 and 55 + 4 − 1 < 64, the target difference in the window is 0. Once the target window includes the least significant bit, the target difference will be 1. Empirical results are shown in table 11. Once the adversary knows 9 bits of k_1 and 9 bits of k_8, increasing w to include new bits will not help the adversary gain new information. This happens because the newly guessed key bits are positioned such that they do not affect the differential, and therefore will not be distinguishable from incorrect key bits. This could be remedied by using a longer difference pattern, such as 0x1F. Although this difference pattern occurs with lower probability, it allows the adversary to find more of the key.

 w   Δ_3   Probability with right key   Key bits right   Probability with wrong key
 4    0           1                            0                 0.562455
 8    0           1                            4                 0.781984
12    1           0.375138                     4                 0.256083
12    1           0.375138                     8                 0.307487
16    1           0.333555                     8                 0.237504
16    1           0.333555                     9                 0.238292
16    1           0.333555                    10                 0.333471

Table 11. Probabilities for Δx_3^3 = 1

In particular, the adversary can find up to 14 bits of each key word using this difference. Observed probabilities for this difference are given in table 12.

 w   Δ_3   Probability with right key   Key bits right   Probability with wrong key
 4    0           1                            0                 0.562455
 8    0           1                            4                 0.781897
12    7           0.250082                     8                 0.244532
16   1F           0.030764                    12                 0.026866
20   1F           0.030572                    16                 0.030526

Table 12. Probabilities for Δx_3^3 = 1F

6.4 Diffusion

In this section, the metrics discussed in section 5.1 are applied to Threefish-512 (version 2) [9] and compared. There are 512 bits, and thus 512 possible windows, in the state. For computing linear and pseudo-linear branch numbers, only the windows that start at the least significant bit of a word are considered here as active at the input to the second Round. Furthermore, only words with even index (0, 2, 4, 6) were used, due to the higher window dependency of odd-indexed words. The computation for differential branch numbers is done in a slightly different manner. Because the branch number is concerned with a lower bound on diffusion, minimum diffusion should be ensured to derive the bound as accurately as possible. To this end, a one-bit difference is used in even-indexed words. The Threefish round function does not exactly fit the definition of a key-alternating round function presented in [7], but it can easily be converted to one. First, the key injection is performed by addition modulo 2^n rather than exclusive-or. Since carry propagation is not considered to change activity, this does not impact the branch number or the variants proposed in this paper. The main concern is that instead of alternating one key-independent function with one key injection, Threefish performs four key-independent functions and then one key injection. Let round be the key-independent portion of a Threefish round function, and write Round = round ◦ round ◦ round ◦ round. The alternation of Round and key injection then follows the definition of a key-alternating block cipher.

Branch Number: Word View Consider a two-round trail that has a difference in the least significant bit of the first word, depicted in figure 17. The input bundle weight (state 0) is wb(a ⊕ b) = 1, since only one word is active. At the input to the second Round (state 1), all eight words are active, and therefore wb(Round(a) ⊕ Round(b)) = 8. It follows that Bd(Round) ≤ 9. Note that the upper bound on the branch number in this case, 9, is the maximum possible in the word view.
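A minimal sketch (ours) of how bundle weights can be counted for a given bundle size; the same function serves the word view here (64-bit bundles) and the S-box view below (byte bundles):

def bundle_weight(state_diff, bundle_bits, word_bits=64):
    """Number of active bundles in a state difference or selection pattern.

    state_diff  -- list of per-word values (integers), one per 64-bit word
    bundle_bits -- 64 for the word view, 8 for the byte (S-box) view
    """
    active = 0
    for word in state_diff:
        for start in range(0, word_bits, bundle_bits):
            if (word >> start) & ((1 << bundle_bits) - 1):
                active += 1
    return active

# One-bit difference in the first word of the eight-word state:
state0 = [1, 0, 0, 0, 0, 0, 0, 0]
assert bundle_weight(state0, 64) == 1   # wb(a xor b) = 1 in the word view
assert bundle_weight(state0, 8) == 1    # still one byte bundle in the S-box view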

Fig. 17. Differential trail for Threefish-512, round 2 submission

Fig. 18. Linear trail for Threefish-512, round 2 submission

Consider a linear approximation where one bit is active at the output of the first Round (state 1). Such an approximation is shown in figure 18. The 64-bit word encompassing that bit depends on all words at the input to the first Round (state 0). It follows that Bl(Round) ≤ 9.

Branch Number: S-box Representation Using the S-box representation of modular addition (section 5.1) for Threefish-512, a different set of linear and differential branch numbers can be derived. Consider the differential trail in figure 17, where the difference is located in the least significant bit of the first word. The S-box representation, where a byte is a bundle, is shown in figure 19. At the input to the first Round (state 0), one bundle is active, so wb(a ⊕ b) = 1. At the input to the second Round (state 1), 21 bundles are active. Thus the bound Bd(Round) ≤ 22. In this view, with a byte as the bundle, the upper bound for the differential branch number is 65. Compare this with the differential branch number obtained from the word view above: there, all of state 1 was active, whereas in the S-box representation the same difference pattern activates much less of the state. Using the S-box representation, it is apparent that the lower bound on diffusion is not as good as it appeared before. That is, a one-bit input difference in the S-box representation produced a differential branch number of at most 22, whereas the word view produced a differential branch number of at most 9, where 9 is the maximum possible for the word model. Calculating the differential branch number in the S-box representation reveals more information about the propagation of an input difference by showing the portions of the state where the difference does not propagate. Therefore, the S-box representation provides a more informative upper bound over the same difference pattern. Figure 20 depicts a linear trail for Threefish-512 using the S-box representation. At the input to the first Round (state 0), 26 bundles are active, giving wb(α) = 26. After one Round (state 1), only one bundle is active, so wb(β) = 1. This gives an upper bound on the linear branch number of Bl(Round) ≤ 27.

Fig. 19. Differential trail for Threefish-512, S-box representation

Fig. 20. Linear trail for Threefish-512, S-box representation

The pseudo-linear branch number can easily be derived from the linear approximation by increasing w. It was previously shown that this approximation can be distinguished from a random permutation when w ≥ 5. Increasing the window size does not alter the number of active bundles in β, the selection pattern at the input to the second Round, but it does change the number of active bundles at the input, α. Because a bundle is a byte of the state and the active 5-bit window of β starts at the least significant bit, only one bundle is active in β. On the other hand, the weight of the input selection pattern α increases, because some 5-bit windows do not align with bundles; a single 5-bit window can span, and therefore activate, two bundles. This results in an input bundle weight wb(α) = 32, giving Bpl(Round, 5) ≤ 33.
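The effect of window alignment on the bundle count can be sketched as follows (ours; the actual window start positions of the trail are not reproduced here):

def bundles_spanned(window_starts, w=5, bundle_bits=8, word_bits=64):
    """Count the byte bundles activated by a set of w-bit windows.

    window_starts -- list of (word_index, bit_offset) pairs giving where each
                     active window begins
    A window that crosses a byte boundary activates two bundles.
    """
    active = set()
    for word, start in window_starts:
        for bit in range(start, start + w):
            active.add((word, (bit % word_bits) // bundle_bits))
    return len(active)

# A 5-bit window starting at bit 6 spans bytes 0 and 1 of its word, activating
# two bundles; one starting at bit 0 activates only one.
assert bundles_spanned([(0, 6)]) == 2
assert bundles_spanned([(0, 0)]) == 1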

Effective Branch Number Let x be the input to the mix function and y be the output. Write x_l for the left input and x_r for the right input, and similarly for y. The right output y_r is obtained from (x_l ⊞ x_r) by the rotation and exclusive-or steps, and (x_l ⊞ x_r) can be written as y_l; therefore, we consider each mix function to contain only one addition. Consider the 8 round trail of section 6.2, where a single window is computed in the forward and backward directions. There are 11 additions (excluding key additions) in the trail over the first Round (rounds 1-4). In Round 2 (rounds 5-8), there are 7 subtractions in the trail. Since the number of subtractions in the second Round is less than the number of additions in the first Round, and the branch number gives a lower bound, we use the smaller count, 7. This yields ⌊lg(7)⌋ = 2 and a window size of w = 4.
Consider an input difference of 1 in the least significant bit of the first word, with the same differential trail as in figures 17 and 19. Then ewb(a ⊕ b) = ⌈1/4⌉ = 1. At the input to the second Round, 111 active bits are in the trail, giving an effective bundle weight ewb(Round(a) ⊕ Round(b)) = ⌈111/4⌉ = 28. This gives a bound on the effective differential branch number, EBd(Round) ≤ 29.
Consider the approximation of section 6.2. At the input to the first Round, 113 bits are part of the trail. This gives effective bundle weight ewb(α) = ⌈113/4⌉ = 29. There are four active bits (one window) at the output of the Round, giving ewb(β) = 1. Therefore, EBpl(Round) ≤ 30.
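The effective bundle weights above follow the simple calculation below (our sketch of the computation as used in this section):

import math

def effective_bundle_weight(active_bits, w):
    """Effective bundle weight: number of w-bit windows needed to cover the
    active bits of a pattern, i.e. ceil(active_bits / w)."""
    return math.ceil(active_bits / w)

# Numbers from the text, with w = 4: a single active bit gives weight 1,
# 111 active bits give 28, and 113 active bits give 29.
assert effective_bundle_weight(1, 4) == 1
assert effective_bundle_weight(111, 4) == 28
assert effective_bundle_weight(113, 4) == 29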

7 Conclusion

This paper has shown that pseudo-linear approximations may be applicable to round functions that are composed of addition modulo 2^n, exclusive-or, rotation, and word permutations. When permutation is performed on the word level, approximations may be constructed that apply to windows with size less than or equal to the word size. In particular, pseudo-linear approximations trace dependencies of w-bit target windows throughout a function, using addition modulo 2^w to approximate addition modulo 2^n. Reducing n-bit word addition to w-bit word addition allows the adversary to reduce the number of key bits that need to be guessed, and thus reduces overall attack complexity. The carry function will either add one to a window or not, depending on the sum of preceding bits. The carry function is independent of the window size. By increasing the size of the window, the expected number of times an approximation holds at random decreases, while the number of times the approximation holds for the cipher remains unchanged. Hence, the bias increases. It is important to note that, as with traditional linear cryptanalysis, high diffusion and a large enough number of rounds can be used to cause approximations over an entire cipher to become infeasible. In this situation, too many key bits would typically need to be guessed, and the search space would swiftly approach that of a brute force search.

This work has also shown that the approximations may be used to find truncated differentials. In addition, the branch number metric has been revisited to provide a more focused idea of diffusion in functions that use large words. It would be interesting to understand the generality of such attacks, and also the types of ciphers that are more and less resilient to these attacks. The following are among open questions that come from this work. First, a question arises that is similar to that in linear cryptanalysis, where one computes the parity of some key bits from related input and output bits. Is there a simple function of the internal key bits in the pseudo-linear approximations of several ARX rounds? If such a function were found, the adversary would need only to compute this function, rather than find the key bits themselves. Second, how can one improve carry pattern selection for different input distributions? Third, are there other properties of linearity that can be leveraged to improve bias? Fourth, are larger words more or less secure than smaller words?

References

1. Aumasson, J.-P., Calik, C., Meier, W., Ozen, O., Phan, R. C.-W., and Varici, K. Improved cryptanalysis of Skein. Cryptology ePrint Archive, Report 2009/438, 2009. Available at http://eprint.iacr.org/.
2. Baignères, T., Stern, J., and Vaudenay, S. Linear cryptanalysis of non binary ciphers. In Selected Areas in Cryptography (2007), C. M. Adams, A. Miri, and M. J. Wiener, Eds., vol. 4876 of LNCS, Springer, pp. 184–211.
3. Chabaud, F., and Vaudenay, S. Links between differential and linear cryptanalysis. In EUROCRYPT (1994), pp. 356–365.
4. Chen, J., and Jia, K. Improved related-key boomerang attacks on round-reduced Threefish-512. Cryptology ePrint Archive, Report 2009/526, 2009. Available at http://eprint.iacr.org/.
5. Cho, J. Y., and Pieprzyk, J. Multiple modular additions and crossword puzzle attack on NLSv2. In ISC (2007), J. A. Garay, A. K. Lenstra, M. Mambo, and R. Peralta, Eds., vol. 4779 of LNCS, Springer, pp. 230–248.
6. Daemen, J., and Rijmen, V. The wide trail design strategy. In Cryptography and Coding, 8th IMA International Conference, Cirencester, UK, December 17-19, 2001, Proceedings (2001), B. Honary, Ed., vol. 2260 of LNCS, Springer, pp. 222–238.
7. Daemen, J., and Rijmen, V. The Design of Rijndael. Springer, 2002, ch. 9, pp. 123–133.
8. Ferguson, N., Lucks, S., Schneier, B., Whiting, D., Bellare, M., Kohno, T., Callas, J., and Walker, J. The Skein hash function family (version 1.0), October 2008. Available at http://www.skein-hash.info/sites/default/files/skein.
9. Ferguson, N., Lucks, S., Schneier, B., Whiting, D., Bellare, M., Kohno, T., Callas, J., and Walker, J. The Skein hash function family (version 1.1), November 2008. Available at http://www.skein-hash.info/sites/default/files/skein1.1.pdf.
10. Ferguson, N., Lucks, S., Schneier, B., Whiting, D., Bellare, M., Kohno, T., Callas, J., and Walker, J. The Skein hash function family (version 1.3), October 2010. Available at http://www.skein-hash.info/sites/default/files/skein1.3.pdf.
11. Kaliski Jr., B. S., and Robshaw, M. J. B. Linear cryptanalysis using multiple approximations. In CRYPTO (1994), vol. 839 of LNCS, Springer, pp. 26–39.
12. Kelsey, J., Schneier, B., and Wagner, D. Mod n cryptanalysis, with applications against RC5P and M6. In FSE (1999), L. R. Knudsen, Ed., vol. 1636 of LNCS, Springer, pp. 139–155.
13. Khovratovich, D., and Nikolic, I. Rotational cryptanalysis of ARX. In FSE 2010 (2010), LNCS, Springer.
14. Knudsen, L. R. Truncated and higher order differentials. In FSE (1994), vol. 1008 of LNCS, Springer, pp. 196–211.
15. Lai, X., and Blahut, R. E. Higher order derivatives and differential cryptanalysis. In Proc. Symp. Communication, Coding and Cryptography (1994), Kluwer, pp. 227–233.
16. Langford, S. K., and Hellman, M. E. Differential-linear cryptanalysis. In CRYPTO (1994), vol. 839 of LNCS, Springer, pp. 17–25.
17. Liskov, M., Rivest, R. L., and Wagner, D. Tweakable block ciphers. In CRYPTO (2002), M. Yung, Ed., vol. 2442 of LNCS, Springer, pp. 31–46.
18. Matsui, M. Linear cryptanalysis method for DES cipher. In EUROCRYPT (1993), T. Helleseth, Ed., vol. 765 of LNCS, Springer, pp. 386–397.
19. McKay, K. A., and Vora, P. L. Pseudo-linear approximations for ARX ciphers with application to Threefish. In Second SHA-3 Candidate Conference (August 2010).

20. McKay, K. A., and Vora, P. L. Pseudo-linear approximations for ARX ciphers: with application to Threefish. Cryptology ePrint Archive, Report 2010/282, 2010. http://eprint.iacr.org/.
21. Nyberg, K. Linear approximation of block ciphers. In EUROCRYPT (1994), pp. 439–444.
22. Nyberg, K. Correlation theorems in cryptanalysis. Discrete Applied Mathematics 111, 1-2 (2001), 177–188.
23. Nyberg, K., and Wallén, J. Improved linear distinguishers for SNOW 2.0. In FSE (2006), M. J. B. Robshaw, Ed., vol. 4047 of LNCS, Springer, pp. 144–162.
24. Staffelbach, O., and Meier, W. Cryptographic significance of the carry for ciphers based on integer addition. In CRYPTO (1990), A. Menezes and S. A. Vanstone, Eds., vol. 537 of LNCS, Springer, pp. 601–614.
25. Wagner, D. The boomerang attack. In FSE (1999), L. R. Knudsen, Ed., vol. 1636 of LNCS, Springer, pp. 156–170.
26. Wallén, J. Linear approximations of addition modulo 2^n. In Fast Software Encryption 2003 (2003), vol. 2887 of LNCS, Springer-Verlag, pp. 261–273.
27. Wallén, J. Linear approximations of addition modulo 2^n. In FSE (2003), T. Johansson, Ed., vol. 2887 of LNCS, Springer, pp. 261–273.
28. Wallén, J. On the differential and linear properties of addition. Research Report A84, Helsinki University of Technology, Laboratory for Theoretical Computer Science, Espoo, Finland, December 2003.
29. Wu, H., and Preneel, B. Cryptanalysis of the stream cipher ABC v2. In Selected Areas in Cryptography (2006), E. Biham and A. M. Youssef, Eds., vol. 4356 of LNCS, Springer, pp. 56–66.

A Addition Masks for Approximations

Round   Mix 0                          Mix 1              Mix 2     Mix 3
1       0, 3, 14, 20, 31, 34, 47, 51   0, 3, 20, 22, 47   0, 20     0, 20
2       0, 3, 20, 47                   0                  0         0
3       0, 20                          0                  –         –
4       0                              –                  –         –
5       –                              –                  –         –
6       –                              –                  –         39
7       –                              –                  39, 56    52
8       21                             14, 31, 39, 56     31, 52    13

Table 13. Addition masks for 8 round approximation, mix functions

Round 1
  Mix 0: 0, 1, 3, 7, 8, 10, 12, 14, 15, 18, 20, 21, 23, 25, 26, 28, 31, 32, 34, 38, 39, 40, 43, 45, 46, 47, 51, 54, 55, 56, 59, 63
  Mix 1: 0, 3, 7, 8, 9, 10, 12, 15, 20, 22, 23, 25, 28, 30, 31, 32, 34, 37, 40, 45, 47, 51, 54, 55, 59
  Mix 2: 0, 1, 7, 8, 10, 12, 14, 16, 20, 24, 25, 26, 28, 31, 32, 37, 38, 40, 45, 50, 51, 55, 56, 60
  Mix 3: 0, 1, 4, 7, 8, 10, 12, 13, 16, 20, 24, 25, 28, 29, 31, 32, 37, 38, 40, 45, 49, 51, 53, 55, 56, 62
Round 2
  Mix 0: 0, 3, 7, 8, 12, 15, 20, 25, 28, 31, 32, 34, 40, 45, 47, 51, 54, 55, 59
  Mix 1: 0, 1, 7, 12, 16, 20, 25, 31, 32, 40, 45, 51, 55, 56
  Mix 2: 0, 8, 10, 12, 25, 28, 38, 40, 51
  Mix 3: 0, 8, 10, 12, 15, 23, 25, 40, 51
Round 3
  Mix 0: 0, 7, 12, 20, 25, 32, 45, 51
  Mix 1: 0, 8, 12, 25, 51
  Mix 2: 0, 10, 40, 51
  Mix 3: 0, 31, 40, 51, 55
Round 4
  Mix 0: 0, 12, 25, 51
  Mix 1: 0, 40, 51
  Mix 2: 0
  Mix 3: 0
Round 5
  Mix 0: 0, 51
  Mix 1: 0
  Mix 2: 0
  Mix 3: 0
Round 6
  Mix 0: 0
  Mix 1: 0
  Mix 2: –
  Mix 3: –
Round 7
  Mix 0: 0
  Mix 1: –
  Mix 2: –
  Mix 3: –
Round 8
  Mix 0: –
  Mix 1: –
  Mix 2: –
  Mix 3: 0
Round 9
  Mix 0: –
  Mix 1: –
  Mix 2: 0, 22
  Mix 3: –
Round 10
  Mix 0: 58
  Mix 1: 0, 19, 22, 41
  Mix 2: –
  Mix 3: –
Round 11
  Mix 0: 0, 4, 19, 22, 27, 41, 46, 49
  Mix 1: 33, 55
  Mix 2: 36
  Mix 3: 27, 58
Round 12
  Mix 0: 18, 21, 33, 40, 55
  Mix 1: 5, 8, 27, 36
  Mix 2: 2, 21, 24, 27, 33, 43, 58
  Mix 3: 0, 2, 4, 17, 19, 21, 22, 27, 36, 39, 41, 44, 46, 49, 58, 63

Table 14. Addition masks for 12 round approximation, mix functions
