SHA-1 and the Strict Avalanche Criterion

SHA-1 and the Strict Avalanche Criterion Yusuf Motara Barry Irwin Rhodes University Rhodes University Grahamstown 6140 Grahamstown 6140 SOUTH AFRICA SOUTH AFRICA Email: [email protected] Email: [email protected] Abstract—The Strict Avalanche Criterion (SAC) is a measure 3) An exploration of intermediate results of both confusion and diffusion, which are key properties of Section 2 of this paper examines related work and argues a cryptographic hash function. This work provides a working definition of the SAC, describes an experimental methodology that the SAC as proposed by Webster & Tavares [1] has that can be used to statistically evaluate whether a cryptographic been misunderstood in much of the contemporaneous critical hash meets the SAC, and uses this to investigate the degree to literature. Section 3 introduces salient points of a well-known which compression function of the SHA-1 hash meets the SAC. cryptographic hash (SHA-1) which is assumed to exhibit the The results (P < 0:01) are heartening: SHA-1 closely tracks SAC, and describes an experimental design to test its SAC- the SAC after the first 24 rounds, and demonstrates excellent properties of confusion and diffusion throughout. compliance. Section 4 presents experimental results, and some discussion follows. I. INTRODUCTION Many computer scientists know little about the inner work- II. RELATED WORK ings of cryptographic hashes, though they may know some- thing about their properties. One of these properties is the The original definition [1] of the SAC is: “avalanche effect”, by analogy with the idea of a small stone Consider X and Xi, two n-bit, binary plaintext causing a large avalanche of changes. The “avalanche effect” vectors, such that X and Xi differ only in bit i, explains how a small change in the input data can result in a 1 < i < n. Let large change in the output hash. However, many questions around the effect are unanswered. For example, how large Vi = Y ⊕ Yi is the effect? After how many “rounds” of a compression function can it be seen? Do all inputs result in such an where Y = f(X) , Yi = f(Xi) and f is the effect? Little experimental work has been done to answer these cryptographic transformation, under consideration. questions for any hash function, and this paper contributes If f is to meet the strict avalanche criterion, the experimental results that help in this regard. probability that each bit in Vi is equal to 1 should be m A boolean n-bit hash function H is the transform Z2 ! one half over the set of all possible plaintext vectors n Z2 . A cryptographic hash function attempts to obscure the X and Xi. This should be true for all values of i. relationship between the input and output of F , and the degree Forre´ [2] expresses this as: to which this is accomplished is directly related to the (second- Let x and x denote two n-bit vectors, such that )preimage resistance of the hash function. This implies that i x and x differ only in bit i, 1 ≤ i ≤ n. Zn two similar inputs should have very different outputs. i 2 denotes the n-dimensional vector space over 0,1. arXiv:1609.00616v1 [cs.CR] 2 Sep 2016 The Strict Avalanche Criterion (SAC) ([1], [2]) formalizes The function f(x) = z; z 2 f0; 1g fulfills the SAC this notion by measuring the amount of change introduced in if and only if the output by a small change in the input. It builds on the definition of completeness, which means that each bit of the X output depends on all the bits of the input, in a way that is f(x)⊕f(x ) = 2n−1; for all i with 1 ≤ i ≤ n: cryptographically relevant. Using the definition of H as above, i x2Zn an output H(x) = y is obtained for an input x. The initial 2 bit of x is now flipped, giving H(x0) = y0. This process is Similarly, Lloyd [3] understands the SAC as: repeated for x1::n, resulting in y1::n. The SAC is met when Let f : Zn 7! Zm be a cryptographic transfor- n 2 2 the Hamming distance between y and y0::n is, on average, 2 . mation. Then f satisfies the strict avalanche criterion There are three contributions that this paper makes to the if and only if existing body of research: 1) A definition of what the SAC is; X n−1 n−1 2) Experimental SAC results for a particular cryptographic f(x)⊕f(x⊕ci) = (2 ; :::; 2 ) for all i, 1 ≤ i ≤ n: n hash (SHA-1); x2Z2 where ⊕ denotes bitwise exclusive or and ci is seems to be more useful (and more in line with the original a vector of length n with a 1 in the ith position and definition) to understand how far a particular sample diverges 0 elsewhere. from the SAC. Therefore, this paper regards the SAC as a Other works ([4], [5], [6], [7]) follow in the same vein. continuum but takes Lloyd’s formulation as the definition of However, these definitions calculate the sum over all possible what it means to “meet” the SAC. inputs as leading to the fulfillment of the SAC, which is Preneel [4] suggests a generalisation of the SAC called the contrary to the original definition. The original definition propagation criterion (PC), defined as separates a baseline value from the avalanche vectors, and Let f be a Boolean function of n variables. Then states that the SAC holds true when “the probability that each f satisfies the propagation criterion of degree k, bit [in the avalanche vectors] is equal to 1 should be one half PC(k),(1 ≤ k ≤ n), if f^(x) changes with a over the set of all possible plaintext vectors” [1]. Therefore, probability of 1/2 whenever i (1 ≤ i ≤ k) bits of x n a better test of whether f : Z2 7! Z2 fulfills the SAC would are complemented. use a universal quantifier, It can be seen that the SAC is equivalent to PC(1). The same work defines an extended propagation criterion which regards the SAC as a continuum. Much of the subsequent 8x 2 n;P (f(x) = f(x )) = 0:5 Z2 i work ([8], [9], [10], [11], [12], [13]) in this area has more for all xi which differ from x in bit i; 1 ≤ i ≤ n closely examined the relationship between PC and nonlinearity A simple example clarifies the difference. Babbage [6] uses characteristics. Many of these extend the PC in interesting Lloyd’s [3] definition of the SAC and defines a SAC-compliant ways and examine ways of constructing functions which function: satisfy PC(n), but experimental research that targets existing n algorithms is scarce. Define f : Z2 7! Z2 by Although there are proven theoretical ways to construct a function which satisfies the SAC [7], there is no way (apart ( f(x1; :::; xn) = 0 if x1 = 0 from exhaustive testing) to verify that an existing function satisfies the SAC. By contrast, useful cryptographic properties f(x1; :::; xn) = x2 ⊕ ::: ⊕ xn if x1 = 1 such as non-degeneracy [14] or bentness [15] are verifiable The simplest function of this nature is f(x) = x0^x1. Then, without having to resort to exhaustive testing. However, the taking g(x) = f(x)⊕f(x⊕01) and h(x) = f(x)⊕f(x⊕10), SAC metric is no worse in this regard than the correlation immunity [16] and balance [17] metrics which also require x f(x) g(x) h(x) P (f(x) = f(xi)) exhaustive testing. 00 0 0 0 1.0 III. EXPERIMENTAL DESIGN 01 0 0 1 0.5 The SHA-1 hash [18] is a well-known cryptographic hash function which generates a 160-bit hash value. It is the 10 0 1 0 0.5 successor to the equally well-known MD5 cryptographic hash 11 1 1 1 1.0 function which generated a 128-bit hash value. SHA-1 was Sum: 2 2 designed by the National Security Agency of the United States of America and published in 1995 as National Institute Note that the sum of each of the third and fourth columns is of Standards and Technology (NIST) Federal Information n−1 2 , as predicted, and that this function fulfills the summed Processing Standard 180-1. definition of the SAC. However, the first and last rows do not fulfill the original definition of the SAC at all: the probability A. Hash details of change, given the baseline values 00 and 11, is 0.0 in The SHA-1 hash is constructed using the Merkle-Damgard˚ each case. It is therefore more reasonable to regard the paradigm ([19], [20]), which means that it consists of padding, row probability as important. This understanding is also in chunking, and compression stages. These stages are necessary accordance with the original text that defined the term. Under for the hash algorithm to be able to handle inputs which are this definition, x0 ^ x1 is not SAC-compliant. greater than 447 bits in length; however, they are unnecessary It is worth noting that the original definition, as per Webster to consider in an examination of the compression function & Tavares [1], is slightly ambiguous. They state that “the itself, since the strength of the Merkle-Damgard˚ paradigm is probability that each bit in Vi is equal to 1 should be one predicated on the characteristics of the compression function. half over the set of all possible plaintext vectors X and Xi”; This paper examines only the compression function itself, and however, they also state that “to satisfy the strict avalanche does not concern itself with padding, chunking, or Davies- criterion, every element must have a value close to one half ” Meyer strengthening [21].

Load more