1 Collision Resistance of the JH Hash Function Jooyoung Lee and Deukjo Hong

Abstract—In this paper, we analyze collision resistance of the Merkle-Damgard˚ Transform: Let JH hash function in the ideal primitive model. The JH hash [∞ function is one of the five SHA-3 candidates accepted for the ∗ mi final round of evaluation. The JH hash function uses a mode of pad : {0, 1} → {0, 1} operation based on a permutation, while its security has been i=1 elusive even in the random permutation model. One can find a collision for the JH compression function only be an injective padding. With this padding scheme and a with two backward queries to the basing primitive. However, the predetermined constant IV ∈ {0, 1}2n, the Merkle-Damgard˚ security is significantly enhanced in iteration. For c ≤ n/2, we transform produces a variable-input-length function MD[F ]: prove that the JH hash function using an ideal n-bit permutation {0, 1}∗ → {0, 1}2n from a fixed-input-length function F : and producing c-bit outputs by truncation is collision resistant 2n m 2n ∗ c/2 {0, 1} × {0, 1} → {0, 1} . For M ∈ {0, 1} such that up to O(2 ) queries. |pad(M)| = lm, MD[F ](M) is computed as follows. Index Terms—hash function, collision resistance. Function MD[F ](M) I.INTRODUCTION u[0] ← IV Break pad(M) = M[1]|| ... ||M[l + 1] into m-bit blocks As many hash functions, including those most common in for i ← 1 to l + 1 do practical applications, have started to exhibit serious security u[i] ← F (u[i − 1],M[i]) weaknesses [2]–[9], the US National Institute for Standards return u[l + 1] and Technology (NIST) has opened a public competition to develop a new cryptographic hash function. Currently, the Collision Resistance: We review the definition of collision final candidates to replace SHA-2 has been announced, which resistance in the information-theoretic model. Given a function are BLAKE, Grøstl, JH, Keccak and . In this paper, H = H[P] and an information-theoretic adversary A both we analyze collision resistance for the JH hash function in with oracle access to an ideal primitive P, the collision the ideal primitive model. The JH compression function is resistance of H against A is estimated by the following illustrated in Fig. 1, where π is a certain permutation. The experiment. JH hash function is obtained by feeding the compression Experiment Expcol function to the Merkle-Damgard˚ transform [10]. The only A known result for the security of the JH hash function is A updates Q by making oracle queries to P 0 0 its indifferentiability from a random oracle guaranteed up to if ∃ M 6= M and u s.t. u = HQ(M) = HQ(M ) then 2n/6 query complexity [1]. This translates into the collision output 1 resistance of the JH hash function up to 2n/6 query complexity, else which is far from optimal. output 0 Even if π is a truly random function, one can find a collision This experiment records every query-response pair that A for the JH compression function only with two backward obtains by oracle queries into a query history Q. We write queries to the basing primitive. In this paper, however, we u = HQ(M) if Q contains all the query-response pairs show that the security is significantly enhanced in iteration. required to compute u = H(M). At the end of the experiment, For c ≤ n/2, we prove that the JH hash function using A would like to find two distinct evaluations yielding a an ideal n-bit permutation and producing c-bit outputs by collision. The collision-finding advantage of A is defined to truncation is collision resistant up to O(2c/2) queries. This be h i bound implies that the JH hash function provides the optimal Advcol(A) = Pr Expcol = 1 . collision resistance in the random permutation model. H A

II.PRELIMINARIES The probability is taken over the random choice of P and col A’s coins (if any). For q > 0, we define AdvH (q) as the maximum of Advcol(A) over all adversaries A making at General Notation: For two bitstrings x and y, x||y denotes H most q queries. the concatenation of x and y. Given x ∈ {0, 1}n for an n even integer n, xL and xR denote 2 -bit strings such that x = xL||xR. III.DESCRIPTIONOFTHE JHHASH FUNCTION J. Lee is with the Attached Institute of Electronics and Telecommunications Research Institute, Daejeon, Korea (e-mail: [email protected]). n D. Hong is with the Attached Institute of Electronics and Telecommunica- Let π be a permutation on {0, 1} for an even integer n. tions Research Institute, Daejeon, Korea (e-mail: [email protected]). Then the JH compression function F = F [π] is defined as 2 follows. birthday bound. The next step is to show that the probability of collision is small without the occurrence of Rcol . We begin n n/2 n q F : {0, 1} × {0, 1} −→ {0, 1} with the following proposition. (u, z) 7−→ v, Proposition 1: Without the occurrence of Rcoli, |Ui| ≤ i+1 for i = 0, . . . , q. where Proof: Note that U0 = {IV }. If |Ui| > i + 1 for some v = π (u ⊕ (z||0)) ⊕ (0||z). i = 1, . . . , q, then a certain query, say the j-th query, would produce two distinct orderly reachable nodes, say w and w0. The pictorial representation is given in Fig. 1. n c In this case, we have two paths For c ≤ n/2, let chopc : {0, 1} → {0, 1} be the function j1 j2 js−1 js that chops off the (n − c) leftmost bits of its input string, i.e., P1 : IV → u[1] → · · · → u[s − 1] → w n−c chopc(x) = x2 if x = x1||x2 for some x1 ∈ {0, 1} and c and x2 ∈ {0, 1} . Then the c-bit JH hash function is defined by 0 0 j0 0 j1 j2 t−1 jt 0 JHc = chopc ◦ MD[F ]. In the original submission, n = 1024 P2 : IV → v[1] → · · · → v[t − 1] → w and c ∈ {224, 256, 384, 512}. where the labels are strictly increasing and Since the padding is injective, we can simplify our collision j = j0 = j ≤ i. analysisS by assuming that the domain of the JH hash function s t is ∞ {0, 1}ni/2 (and ignore the padding scheme). In the i=1 Since w 6= w0 and j = j0 = j ≤ i, u[s − 1] and v[t − 1] are following section, we will prove collision resistance for the s t distinct orderly reachable nodes in U such that chop (u[s − JH hash function assuming π is an ideal random permutation. i c 1]) = chopc(v[t−1]). This contradicts the condition of ¬Rcoli. z n/2 Proposition 2: Suppose that an adversary A makes q queries to a random permutation π and its inverse π−1. For u v n/2 L n/2 L N = 2 and q < N, 㼩 q(q + 1) u v Pr [Rcol ] ≤ . R n/2 R q 2(N − 1) Proof: Since Fig. 1. JH compression function. Xq Pr [Rcolq] ≤ Pr [Rcoli ∧ ¬Rcoli−1] i=1 Xq IV. COLLISION RESISTANCE OF THE JHHASH FUNCTION ≤ Pr [Rcoli|¬Rcoli−1] , (1) Suppose that an information-theoretic adversary A adap- i=1 tively makes q forward or backward queries to an ideal random (where Rcol = ∅), we will focus on the estimation of i i 0 permutation π, and records a query history Q = {(x , y ) ∈ Pr [Rcol |¬Rcol ] for i = 1, . . . , q. Note that U contains n i i i i−1 i−1 {0, 1} : 1 ≤ i ≤ q}. Here π(x ) = y and A’s i-th query is at most i nodes without the occurrence of event Rcol by i −1 i i−1 either π(x ) or π (y ) for 1 ≤ i ≤ q. Proposition 1. We define a direct graph G on {0, 1}n where a direct edge ∗ ∗ Suppose that A makes a forward query π(xL||xR) = from u to v labeled i is added to G when the i-th query- (y ||y ). Since there are at most one orderly reachable node i i L R response pair (x , y ) determines an evaluation F [π](u, z) = v u ∈ U such that u = x∗ , the i-th query determines at i i−1 R R for some z ∈ {0, 1}n/2. We will denote such an edge by u → ∗ most one orderly reachable node v = (yL||(uL ⊕ xL ⊕ yR)). v. We note that each query π(x ||x ) = (y ||y ) generates ∗ L R L R The probability that uL ⊕ xL ⊕ yR = wR for some w ∈ Ui−1 n/2 2 edges from ((xL ⊕ z)||xR) to (yL||(yR ⊕ z)) where z ∈ is at most iN/(N 2 − q). When A makes a backward query {0, 1}n/2. −1 ∗ ∗ π (yL||yR) = (xL||xR), the probability that xR = wR for n 2 Definition 1: u ∈ {0, 1} is called an orderly reachable some w ∈ Ui−1 is also at most iN/(N − q). Therefore we node if there exists a direct path conclude that

i i it−1 i iN IV →1 u[1] →2 · · · → u[t − 1] →t u, Pr [Rcol |¬Rcol ] ≤ , i i−1 N 2 − q such that i1 < i2 < . . . < it−1 < it. By convention, IV is an and by (1), orderly reachable node. Xq For i = 1, . . . , q, let Ui be the set of orderly reachable nodes iN q(q + 1) Pr [Rcolq] ≤ ≤ . determined by the first i queries, and let Rcol be the event N 2 − q 2(N − 1) i i=1 that Ui contains a collision in the right-half bits. That is,

Rcoli : there exist u, v ∈ Ui such that u =6 v and uR = vR. Let Coll denote the event that A makes a collision of JHc. This event guarantees existence of two paths Now our security proof consists of two steps. The first step i1 i2 is−1 is is to prove that the probability of Rcolq is small up to the P1 : IV (= u[0]) → u[1] → · · · → u[s − 1] → w 3

and c) Event C3 ∧ ¬Rcolq: The probability that j1 j2 jt−1 jt 0 ∗ ∗ j P2 : IV (= v[0]) → v[1] → · · · → v[t − 1] → w uL ⊕ xL ⊕ yR = xR 0 ∗ ∗ 2 such that chopc(w) = chopc(w ). We can assume that this for some j < i is at most (i − 1)N/(N − q). collision is an earliest-possible one such that is 6= jt. If both w and w0 are orderly reachable nodes (with the above To summarize, we have paths) and i∗ = i > j (without loss of generality), then we "Ã ! # s t _3 Xq µ ¶ would have the following configuration. N iN Pr Ck ∧ ¬Rcolq ≤ + 1 + (i − 1) i∗ N 2 − q 2c 1) C1: u → w where u ∈ Ui∗−1 and chopc(w) = k=1 µ i¶=1 0 0 chop (w ) for some w ∈ U ∗ . N N q(q + 1) c i −1 = + 1 · · If one of w and w0 is not an orderly reachable node, assuming 2c N 2 − q 2 w is not an orderly reachable node without loss of generality, N q(q + 1) ∗ ≤ · . let i = iα be the first index in path P1 such that iα ≥ iα+1. N − 1 2c Then, u = u[α − 1] is an orderly reachable node in Ui∗−1. Starting from this node, we have one of the following two By Propositions 2 and 3, and inequality (2), we have the local configurations. ∗ ∗ following theorem. i 0 i 00 2) C2: u → u → u , where u ∈ Ui∗−1. Theorem 1: For the c-bit JH hash function JHc, ∗ i 0 j 00 ∗ 3) C3: u → u → u , where u ∈ Ui∗−1 and j < i . q(q + 1) Advcol (q) ≤ . To summarize, we have JHc 2c−1 " # _3 col Acknowledgements AdvJHc (A) = Pr [Coll] ≤ Pr Ck The authors would like to thank John Steinberger for "Ãk=1 ! # _3 valuable comments. ≤ Pr [Rcolq] + Pr Ck ∧ ¬Rcolq . (2) k=1 REFERENCES Proposition 3: Suppose that an adversary A makes q [1] R. Bhattacharyya, A. Mandal and M. Nandi. Security analysis of the queries to a random permutation π and its inverse π−1. For mode of JH hash function. FSE 2010, LNCS 6147, pp. 168–191, Springer, N = 2n/2 and q < N, Heidelberg (2010). "Ã ! # [2] E. Biham, R. Chen, A. Joux, P. Carribault, C. Lemuet and W. Jalby. _3 N q(q + 1) Collisions of SHA-0 and reduced SHA-1. Eurocrypt 2005, LNCS 3494, Pr Ck ∧ ¬Rcolq ≤ · . pp. 36–57, Springer-Verlag, 2005. N − 1 2c [3] C. De Canniere and C. Rechberger. Preimages for reduced SHA-0 and k=1 SHA-1. Crypto 2008, LNCS 5157, pp. 179–202, Springer-Verlag, 2008. Proof: Throughout the proof, we fix 1 ≤ i∗ ≤ q and [4] G. Leurent. MD4 is not one-way. FSE 2008, LNCS 5086, pp. 412–428, bound the probability that the i∗-th query completes any of Springer-Verlag, 2008. [5] F. Mendel, N. Pramstaller, C. Rechberger and V. Rijmen. Analysis of step- the configurations C1, C2 and C3 without the occurrence of reduced SHA-256. FSE 2006, LNCS 4047, pp. 126–143, Springer-Verlag, event Rcolq. 2006. First, we suppose that the i∗-th query π−1(y∗ ||y∗ ) = [6] Y. Sasaki and K. Aoki. Finding preimages in full MD5 faster than L R exhaustive search. Eurocrypt 2009, LNCS 5479, pp. 134–152, Springer- (xL||xR) is backward. In order to make any configuration Ck, Verlag, 2008. 0 0 (xL||xR) should be contained in Ui∗−1 for some xL. This [7] X. Wang, X. Lai, D. Feng, H. Chen and X. Yu. of the hash event occurs with probability at most i∗N/(N 2 − q) since functions MD4 and RIPEMD. Eurocrypt 2005, LNCS 3494, pp. 1–18, ∗ Springer-Verlag, 2005. |Ui∗−1| ≤ i without the occurrence of event Rcolq. [8] X. Wang, X. Lai and H. Yu. Finding collisions in the full SHA-1. ∗ ∗ ∗ Next, we suppose that the i -th query π(xL||xR) = Crypt0 2005, LNCS 3621, pp. 17–36, Springer-Verlag, 2005. [9] X. Wang and H. Yu. How to break MD5 and other hash functions. (yL||yR) is forward. This query determines at most one orderly ∗ ∗ ∗ Eurocrypt 2005, LNCS 3494, pp. 19–35, Springer-Verlag, 2005. reachable node u ∈ Ui∗−1 such that uR = xR, and hence a [10] H. Wu. The Hash Function JH. Submission to NIST, http://icsd.i2r.a- ∗ ∗ ∗ i star.edu.sg/staff/hongjun/jh/jh.pdf, 2008. unique node w = (yL||(uL ⊕ xL ⊕ yR)) such that u → w for some u ∈ Ui∗−1. a) Event C1 ∧ ¬Rcolq: The probability that ∗ ∗ 0 chopc(yL||(uL ⊕ xL ⊕ yR)) = chopc(w ) 0 n−c 2 for a fixed w ∈ Ui∗−1 is at most 2 /(N − q). Since ∗ ∗ |Ui∗−1| ≤ i , the probability that the i -th query completes C1 ∗ n−c 2 without the occurrence of event Rcolq is at most i 2 /(N − q). b) Event C2 ∧ ¬Rcolq: The probability that ∗ ∗ ∗ uL ⊕ xL ⊕ yR = xR is at most N/(N 2 − q).