Hash Function Balance and Its Impact on Birthday Attacks

An extended abstract of this paper appears in Advances in Cryptology { EUROCRYPT '04, Lecture Notes in Computer Science Vol. 3027, C. Cachin and J. Camenisch eds., Springer-Verlag, 2004. This is the full version. Hash Function Balance and its Impact on Birthday Attacks Mihir Bellare¤ Tadayoshi Kohnoy May 2004 Abstract Textbooks tell us that a birthday attack on a hash function h with range size r requires r1=2 trials (hash computations) to ¯nd a collision. But this is misleading, being true only if h is regular, meaning all points in the range have the same number of pre-images under h; if h is not regular, fewer trials may be required. But how much fewer? This paper addresses this question by introducing a measure of the \amount of regularity" of a hash function that we call its balance, and then providing estimates of the success-rate of the birthday attack as a function of the balance of the hash function being attacked. In particular, we will see that the number of trials to ¯nd a collision can be signi¯cantly less than r1=2 for hash functions of low balance. This leads us to examine popular design principles, such as the MD (Merkle-Damgºard) transform, from the point of view of balance preservation, and to mount experiments to determine the balance of popular hash functions. Keywords: Hash functions, birthday attacks, collision-resistance. ¤Dept. of Computer Science & Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA. E-Mail: [email protected]. URL: http://www-cse.ucsd.edu/users/mihir. Supported in part by NSF grants CCR-0098123, ANR-0129617 and CCR-0208842, and by an IBM Faculty Partnership Development Award. yDept. of Computer Science & Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA. E-mail: [email protected]. URL: http://www-cse.ucsd.edu/users/tkohno. Supported by a National Defense Science and Engineering Graduate Fellowship. 1 Contents 1 Introduction 2 2 Notation and Terminology 5 3 The attack and associated metrics 6 4 Approaches to the analysis of the birthday attack 7 5 The Balance Measure and its Properties 8 6 Balance-based Analysis of the Birthday attack 10 6.1 Bounds on Ch(q) . 10 6.2 Bounds on Qh(c) . 11 6.3 Proof of Theorem 6.1 . 12 6.4 Proof of Corollary 6.2 . 15 6.5 Proof of Theorem 6.3 . 15 6.6 Proof of Corollary 6.4 . 16 7 Special classes of hash functions 16 7.1 Regular functions . 16 7.2 Random functions . 17 7.3 Discussion . 18 7.4 Proof of Theorem 7.3 . 19 7.5 Proof of Proposition 7.4 . 19 8 Does the MD transform preserve balance? 19 9 Extensions and variations 22 9.1 Treating families of hash functions . 22 9.2 Variants of the attack . 24 10 The generalized birthday problem 24 10.1 Proof of Theorem 10.3 . 25 11 Experiments on SHA-1 26 References 27 A The expected number of trials to ¯nd a collision 28 A.1 Proof of Theorem A.1 . 29 2 1 Introduction Let h: D R be a function, where the domain D and range R are ¯nite sets. We say that h ! is a hash function if D > R . A collision for h is a pair x; y of distinct points in D for which j j j j h(x) = h(y). Cryptographers are interested in hash functions that are collision-resistant, meaning it is computationally infeasible to ¯nd a collision. (Such functions have numerous uses, for example for digital signatures.) The best known general collision-¯nding attack is the so-called birthday attack. In this paper, we assess the performance and success rate of this attack. The birthday attack. In a birthday attack, we pick points x1; : : : ; xq from D and compute yi = h(xi) for i = 1; : : : ; q. The attack is successful if there is a pair i; j such that xi; xj form a collision for h. We call q the number of trials. There are several variants of this attack which di®er in the way the points x1; : : : ; xq are chosen (cf. [9, 15, 18, 20]). The one we consider is that they are chosen independently at random from D. Picking random points from the domain may be prohibitive in the case of a hash function like SHA-1 [11] whose domain is the set of all strings of length at most 264. In such cases we would n 160 simply attack the function h = SHAn: 0; 1 0; 1 , the restriction of SHA-1 to inputs of 64 f g ! f g length n < 2 , for some suitable value of n, say n = 161. This works because a collision for SHAn is also a collision for SHA-1. This is to be understood henecforth in discussing attacks on hash functions like SHA-1. We let C (q) be the probability that the birthday attack on hash function h: D R succeeds h ! in ¯nding a collision in q trials. We are interested in how this function grows with q. Current beliefs. It is generally stated that q 1 C (q) ; (1) h ¼ 2 ¢ r µ ¶ where r = R is the size of the range of h and q O(pr). This implies that a collision is expected j j · in about r1=2 trials. In particular, it predicts that collisions in a hash function with output length m bits would take about 2m=2 trials to ¯nd. This estimate is the basis for the choice of output length m, which is typically made just large enough to make 2m=2 trials infeasible. How is Equation (1) obtained? The apparent reasoning is to view the range points y1; : : : ; yq computed in the attack as being uniformly and independently distributed in R. Then the standard birthday phenomenon says that the probability that there exist distinct i; j such that yi = yj is, q up to constant factors, 2 =r. However, this argument is actually not correct, because the point h(x), for x drawn at random ¡ ¢ from D, is not necessarily uniformly distributed in R. Rather, the probability that h(x) equals a particular range point y is h¡1(y) = D , where h¡1(y) is the set of all pre-images of y under h. So j j j j the range points computed in the attack are uniformly distributed over R if and only if h is regular, meaning every range point has the same number of pre-images under h.1 q If h is not regular, then Ch(q) is actually larger than 2 =r, meaning that the attack would ¯nd a collision in fewer than the expected r1=2 trials. But how much fewer? Is there cause for concern? ¡ ¢ Random functions. Before proceeding we note that there is another way that the perfomance of $ the birthday attack is often assessed, which is to consider the probability CD;R(q) that the birthday attack succeeds in q trials in the thought experiment where h is chosen at random from the set of all maps of D to R. One can show (cf. [7, 16] and Theorem 7.3) that C $ (q) q =r. D;R ¼ 2 Now one might argue that a given, \good" hash function h has \random behavior" and hence ¡ ¢ 1 We refer the reader to Section 4 for more details. 3 q Ch(q) is also about 2 =r for such a h. This argument is however not mathematically sound. There is no sound de¯nition of what it means to \have random behavior" and it is unclear a suitable ¡ ¢ one can be found. We end up not analyzing the given h, but rather analyzing an abstract and ideal object that ultimately has no connection to h, regardless of the design principles underlying h. Indeed, the analysis of the attack ignores the actual hash function entirely: whether it be MD5, SHA-1, RIPEMD-160 or some other function, there is no change in the analysis, for the latter looks at a random function.2 Our goal is to obtain results enabling is to assess Ch(q) for any given, speci¯c h, rather than one chosen at random. Still, the random hash function model is something to keep in mind and we will return to it, in particular contrasting results there with our own, at several points in this paper. This paper. To help answer questions such as those posed above, this paper begins by introducing a measure of the \amount of regularity" that we call the balance of a hash function. This is a real number between 0 and 1, with balance 1 indicating that the hash function is regular and balance 0 that it is a constant function, meaning as irregular as can be. We then provide quantitative estimates of the success-rate of the birthday attack as a function of the balance of the hash function being attacked. This yields a tool that has a variety of uses, and lends insight into various aspects of hash function design and parameter choices. For example, by analytically or experimentally estimating the balance of a particular hash function, we can tell how quickly the birthday attack on this hash function will succeed. Let us now look at all this in more detail. The balance measure. View the range R of hash function h: D R as consisting of r 2 ¡1 ! ¸ points R1; : : : ; Rr. For i = 1; : : : ; r we let h (Ri) be the pre-image of Ri under h, meaning the set of all x D such that h(x) = R , and let d = h¡1(R ) be the size of the pre-image of R under 2 i i j i j i h.

Load more