HASH FUNCTIONS

Daniele Micciancio UCSD 1 Hashing

Hash functions like MD5, SHA1, SHA256, SHA512, SHA3, ... are amongst the most widely-used cryptographic primitives. Their primary purpose is collision-resistant data compression, but they have many other purposes and properties as well. A good is often treated like a magic wand ...

Daniele Micciancio UCSD 2 Collision resistance (CR)

n Definition: A collision for a function h : D → {0, 1} is a pair x1, x2 ∈ D of points such that h(x1) = h(x2) but x1 6= x2.

If |D| > 2n then the pigeonhole principle tells us that there must exist a collision for h.

Daniele Micciancio UCSD 3 Collision resistance (CR)

n Definition: A collision for a function h : D → {0, 1} is a pair x1, x2 ∈ D of points such that h(x1) = h(x2) but x1 6= x2.

If |D| > 2n then the pigeonhole principle tells us that there must exist a collision for h.

Daniele Micciancio UCSD 4 Collision resistance (CR)

n Definition: A collision for a function h : D → {0, 1} is a pair x1, x2 ∈ D of points such that h(x1) = h(x2) but x1 6= x2.

If |D| > 2n then the pigeonhole principle tells us that there must exist a collision for h.

We want that even though collisions exist, they are hard to find.

Daniele Micciancio UCSD 5 Collision-resistance of a function family

The formalism considers a family H : Keys(H) × D → R of functions, meaning for each K ∈ Keys(H) we have a map HK : D → R defined by HK (x) = H(K, x).

Game CRH procedure Finalize(x1, x2) procedure Initialize If (x1 = x2) then return false K ←$ Keys(H) If (x1 6∈ D or x2 6∈ D) then return false Return K Return (HK (x1) = HK (x2)) Let cr h A i AdvH (A) = Pr CRH ⇒ true .

Daniele Micciancio UCSD 6 Collision-resistance

Game CRH procedure Finalize(x1, x2) procedure Initialize If (x1 = x2) then return false K ←$ Keys(H) If (x1 6∈ D or x2 6∈ D) then return false Return K Return (HK (x1) = HK (x2)) The Return statement in Initialize means that the adversary A gets K as input. The K here is not secret!

Adversary A takes K and tries to output a collision x1, x2 for HK . A’s output is the input to Finalize, and the game returns true if the collision is valid.

Daniele Micciancio UCSD 7 Idea: Pick x1 = x1[1]x1[2] and x2 = x2[1]x2[2] so that

EK (x1[1]) ⊕ x1[2] = EK (x2[1]) ⊕ x2[2]

Example

Let E: {0, 1}k × {0, 1}n → {0, 1}n be a blockcipher. Let H: {0, 1}k × {0, 1}2n → {0, 1}n be defined by

Alg H(K, x[1]x[2])

y ← EK (EK (x[1]) ⊕ x[2]); Return y

Let’s show that H is not collision-resistant by giving an efficient adversary cr A such that AdvH (A) = 1.

Daniele Micciancio UCSD 8 Example

Let E: {0, 1}k × {0, 1}n → {0, 1}n be a blockcipher. Let H: {0, 1}k × {0, 1}2n → {0, 1}n be defined by

Alg H(K, x[1]x[2])

y ← EK (EK (x[1]) ⊕ x[2]); Return y

Let’s show that H is not collision-resistant by giving an efficient adversary cr A such that AdvH (A) = 1.

Idea: Pick x1 = x1[1]x1[2] and x2 = x2[1]x2[2] so that

EK (x1[1]) ⊕ x1[2] = EK (x2[1]) ⊕ x2[2]

Daniele Micciancio UCSD 9 Example

Alg H(K, x[1]x[2])

y ← EK (EK (x[1]) ⊕ x[2]); Return y

Idea: Pick x1 = x1[1]x1[2] and x2 = x2[1]x2[2] so that

EK (x1[1]) ⊕ x1[2] = EK (x2[1]) ⊕ x2[2] adversary A(K) n n n −1 x1 ← 0 1 ; x2[2] ← 0 ; x2[1] ← EK (EK (x1[1]) ⊕ x1[2] ⊕ x2[2]) return x1, x2

cr Then AdvH (A) = 1 and A is efficient, so H is not CR. Note how we used the fact that A knows K and the fact that E is a blockcipher!

Daniele Micciancio UCSD 10 Exercise

Let E: {0, 1}k × {0, 1}l → {0, 1}l be a blockcipher. Let D be the set of all strings whose length is a positive multiple of l. Define the hash function H: {0, 1}k × D → {0, 1}l as follows:

Alg H(K, M) M[1]M[2] ... M[n] ← M C[0] ← 0l For i = 1,..., n do B[i] ← E(K, C[i − 1] ⊕ M[i]); C[i] ← E(K, B[i] ⊕ M[i]) Return C[n]

Show that H is not CR by giving an efficient adversary A such that cr AdvH (A) = 1.

Daniele Micciancio UCSD 11 Keyless hash functions

We say that H: Keys(H) × D → R is keyless if Keys(H) = {ε} consists of just one key, the empty string.

In this case we write H(x) in place of H(ε, x) or Hε(x). Practical hash functions like MD5, SHA1, SHA256, SHA512, SHA3, ... are keyless.

Daniele Micciancio UCSD 12 SHA1

Alg SHA1(M) // |M| < 264 V ← SHF1( 5A827999 k 6ED9EBA1 k 8F1BBCDC k CA62C1D6 , M ) return V

Alg SHF1(K, M) // |K| = 128 and |M| < 264 y ← shapad(M) Parse y as M1 k M2 k · · · k Mn where |Mi | = 512 (1 ≤ i ≤ n) V ← 67452301 k EFCDAB89 k 98BADCFE k 10325476 k C3D2E1F0 for i = 1,..., n do V ← shf1(K, Mi k V ) return V

Alg shapad(M) // |M| < 264 d ← (447 − |M|) mod 512 Let ` be the 64-bit binary representation of |M| y ← M k 1 k 0d k ` // |y| is a multiple of 512 return y

Daniele Micciancio UCSD 13 shf1

Alg shf1(K, B k V ) // |K| = 128, |B| = 512 and |V | = 160

Parse B as W0 k W1 k · · · k W15 where |Wi | = 32 (0 ≤ i ≤ 15) Parse V as V0 k V1 k · · · k V4 where |Vi | = 32 (0 ≤ i ≤ 4) Parse K as K0 k K1 k K2 k K3 where |Ki | = 32 (0 ≤ i ≤ 3) 1 for t = 16 to 79 do Wt ← ROTL (Wt−3 ⊕ Wt−8 ⊕ Wt−14 ⊕ Wt−16) A ← V0 ; B ← V1 ; C ← V2 ; D ← V3 ; E ← V4 for t = 0 to 19 do Lt ← K0 ; Lt+20 ← K1 ; Lt+40 ← K2 ; Lt+60 ← K3 for t = 0 to 79 do if (0 ≤ t ≤ 19) then f ← (B ∧ C) ∨ ((¬B) ∧ D) if (20 ≤ t ≤ 39 OR 60 ≤ t ≤ 79) then f ← B ⊕ C ⊕ D if (40 ≤ t ≤ 59) then f ← (B ∧ C) ∨ (B ∧ D) ∨ (C ∧ D) 5 temp ← ROTL (A) + f + E + Wt + Lt E ← D ; D ← C ; C ← ROTL30(B); B ← A ; A ← temp V0 ← V0 +A ; V1 ← V1 +B ; V2 ← V2 +C ; V3 ← V3 +D ; V4 ← V4 +E V ← V0 k V1 k V2 k V3 k V4; return V Daniele Micciancio UCSD 14 SHA1 calculator

http://www.sha1-online.com/

Daniele Micciancio UCSD 15 Applications of hash functions

The killer-app is hashing before digitally signing, but we’ll discuss a few others: • primitive in cryptographic schemes • tool for security applications • tool for non-security applications

Daniele Micciancio UCSD 16 Password verification

• Client A has a password PW that is also held by server B • A authenticates itself by sending PW to B over a (TLS)

APW PW - BPW

Problem: The password will be found by an attacker who compromises the server.

Daniele Micciancio UCSD 17 Password verification

• Client A has a password PW and server stores PW = H(PW ). • A sends PW to B (over a secure channel) and B checks that H(PW ) = PW APW PW - BPW

Server compromise results in attacker getting PW which should not reveal PW as long as H is one-way, which we will see is a consequence of collision-resistance. But we will revisit this when we consider dictionary attacks!

Daniele Micciancio UCSD 18 File comparision

• A has a large file FA and B has a large file FB . For example, music collections.

• They want to know whether FA = FB

• A sends FA to B and B checks whether FA = FB

AFA FA - BFB

Problem: Transmission could be slow.

Daniele Micciancio UCSD 19 Compare-by-hash

• A has a large file FA and B has a large file FB and they want to know whether FA = FB

• A computes hA = H(FA) and sends it to B, and B checks whether hA = H(FB ). AFA hA - BFB

Collision-resistance of H guarantees that B does not accept if FA 6= FB !

Daniele Micciancio UCSD 20 Virus protection

An executable may be available at lots of sites S1, S2,..., SN . Which one can you trust?

• Provide a safe way to get the hash h = H(X ) of the correct executable X . • Download an executable from anywhere, and check hash.

Daniele Micciancio UCSD 21 Birthday attacks

Let H : {0, 1}k × D → {0, 1}n be a family of functions with |D| > 2n. The q-trial finds a collision with probability about

q2 . 2n+1 √ So a collision can be found in about q = 2n+1 ≈ 2n/2 trials.

Daniele Micciancio UCSD 22 Recall Birthday Problem

$ n for i = 1,..., q do yi ← {0, 1} if ∃i, j (i 6= j and yi = yj ) then COLL ← true

Pr [COLL] = C(2n, q) q2 ≈ 2n+1

Daniele Micciancio UCSD 23 Birthday attack

Let H : {0, 1}k × D → {0, 1}n.

adversary A(K) $ for i = 1,..., q do xi ← D ; yi ← HK (xi ) if ∃i, j (i 6= j and yi = yj and xi 6= xj ) then return xi , xj else return FAIL

Daniele Micciancio UCSD 24 Analysis of birthday attack

Let H : {0, 1}k × D → {0, 1}n.

adversary A(K) $ for i = 1,..., q do xi ← D ; yi ← HK (xi ) if ∃i, j (i 6= j and yi = yj ) then COLL ← true

We have dropped things that don’t much affect the advantage and focused on success probability. So we want to know what is Pr [COLL].

−1 Let HK (y) = {x ∈ D : HK (x) = y} and assume the size of this set is −1 n n |HK (y)| = |D|/2 for all y. Then we are throwing q balls into N = 2 bins, and Pr[COLL] = C(N, q) ≈ q2/N.

Daniele Micciancio UCSD 25 Birthday attack times

Function n TB MD4 128 264 MD5 128 264 SHA1 160 280 SHA2-256 256 2128 SHA2-512 512 2256 SHA3-256 256 2128 SHA3-512 512 2256

TB is the number of trials to find collisions via a birthday attack.

Daniele Micciancio UCSD 26 Compression functions

A compression function is a family h : {0, 1}k × {0, 1}b+n → {0, 1}n of hash functions whose inputs are of a fixed size b + n, where b is called the block size. E.g. b = 512 and n = 160, in which case

h : {0, 1}k × {0, 1}672 → {0, 1}160

x

hK v hK (x k v)

Daniele Micciancio UCSD 27 The MD transform

Design principle: To build a CR hash function

H : {0, 1}k × D → {0, 1}n

where D = {0, 1}≤264 : • First build a CR compression function h : {0, 1}k × {0, 1}b+n → {0, 1}n. • Appropriately iterate h to get H, using h to hash block-by-block.

Daniele Micciancio UCSD 28 MD setup

Assume for simplicity that |M| is a multiple of b. Let

• kMkb be the number of b-bit blocks in M, and write M = M[1] ... M[`] where ` = kMkb. • hii denote the b-bit binary representation of i ∈ {0,..., 2b − 1}. • D be the set of all strings of at most 2b − 1 blocks, so that b kMkb ∈ {0,..., 2 − 1} for any M ∈ D, and thus kMkb can be encoded as above.

Daniele Micciancio UCSD 29 MD transform

Given: Compression function h : {0, 1}k × {0, 1}b+n → {0, 1}n. Build: Hash function H : {0, 1}k × D → {0, 1}n.

Algorithm HK (M) n m ← kMkb ; M[m + 1] ← hmi ; V [0] ← 0 For i = 1,..., m + 1 do v[i] ← hK (M[i]||V [i − 1]) Return V [m + 1]

M[1] M[2] h2i

h h h 0n K K K HK (M)

Daniele Micciancio UCSD 30 MD preserves CR

Assume • h is CR • H is built from h using MD Then • H is CR too!

This means • No need to attack H! You won’t find a weakness in it unless h has one • H is guaranteed to be secure assuming h is.

For this reason, MD is the design used in many current hash functions. Newer hash functions use other iteration methods with analogous properties.

Daniele Micciancio UCSD 31 MD preserves CR

Theorem: Let h : {0, 1}k × {0, 1}b+n → {0, 1}n be a family of functions and let H : {0, 1}k × D → {0, 1}n be obtained from h via the MD transform. Given a cr-adversary AH we can build a cr-adversary Ah such that cr cr AdvH (AH ) ≤ Advh (Ah)

and the running time of Ah is that of AH plus the time for computing h on the outputs of AH .

cr h CR ⇒ Advh (Ah) small Implication: cr ⇒ AdvH (AH ) small ⇒ H CR

Daniele Micciancio UCSD 32 How Ah works

Let (M1, M2) be the HK -collision returned by AH . The Ah will trace the chains backwards to find an hk -collision.

Daniele Micciancio UCSD 33 Case 1: kM1kb 6= kM2kb

Let x1 = h2i||V1[2] and x2 = h1i||V2[1]. Then

• hK (x1) = hK (x2) because HK (M1) = HK (M2).

• But x1 6= x2 because h1i= 6 h2i.

Daniele Micciancio UCSD 34 Else // V1[2] = V2[2] x1 ← M1[2]||V1[1] ; x2 ← M2[2]||V2[1] If x1 6= x2 then return x1, x2 Else // V1[1] = V2[1] n n x1 ← M1[1]||0 ; x2 ← M2[1]||0 Return x1, x2

Case 2: kM1kb = kM2kb

x1 ← h2i||V1[2] ; x2 ← h2i||V2[2] If x1 6= x2 then return x1, x2

Daniele Micciancio UCSD 35 x1 ← M1[2]||V1[1] ; x2 ← M2[2]||V2[1] If x1 6= x2 then return x1, x2 Else // V1[1] = V2[1] n n x1 ← M1[1]||0 ; x2 ← M2[1]||0 Return x1, x2

Case 2: kM1kb = kM2kb

x1 ← h2i||V1[2] ; x2 ← h2i||V2[2] If x1 6= x2 then return x1, x2 Else // V1[2] = V2[2]

Daniele Micciancio UCSD 36 Else // V1[1] = V2[1] n n x1 ← M1[1]||0 ; x2 ← M2[1]||0 Return x1, x2

Case 2: kM1kb = kM2kb

x1 ← h2i||V1[2] ; x2 ← h2i||V2[2] If x1 6= x2 then return x1, x2 Else // V1[2] = V2[2] x1 ← M1[2]||V1[1] ; x2 ← M2[2]||V2[1] If x1 6= x2 then return x1, x2

Daniele Micciancio UCSD 37 n n x1 ← M1[1]||0 ; x2 ← M2[1]||0 Return x1, x2

Case 2: kM1kb = kM2kb

x1 ← h2i||V1[2] ; x2 ← h2i||V2[2] If x1 6= x2 then return x1, x2 Else // V1[2] = V2[2] x1 ← M1[2]||V1[1] ; x2 ← M2[2]||V2[1] If x1 6= x2 then return x1, x2 Else // V1[1] = V2[1]

Daniele Micciancio UCSD 38 Case 2: kM1kb = kM2kb

x1 ← h2i||V1[2] ; x2 ← h2i||V2[2] If x1 6= x2 then return x1, x2 Else // V1[2] = V2[2] x1 ← M1[2]||V1[1] ; x2 ← M2[2]||V2[1] If x1 6= x2 then return x1, x2 Else // V1[1] = V2[1] n n x1 ← M1[1]||0 ; x2 ← M2[1]||0 Return x1, x2 Daniele Micciancio UCSD 39 Exercise

Let h: K × {0, 1}2b → {0, 1}b be a compression function. Define H: K × {0, 1}4b → {0, 1}b as follows:

Alg H(K, M)

M1 k M2 ← M V1 ← h(K, M1); V2 ← h(K, M2) V ← h(K, V1 k V2) return V

Above, by M1 k M2 ← M, we mean that M1 is the first 2b bits of M and M2 is the rest, so that |M1| = |M2| = 2b. Show that if h is collision-resistant then so is H. Do this by stating and proving an analogue of the Theorem above.

Daniele Micciancio UCSD 40 We seek an adversary that outputs distinct x1kv1, x2kv2 satisfying

Ex1 (v1) = Ex2 (v2) .

Answer: NO, h is NOT collision-resistant, because the following adversary cr A has Advh (A) = 1: adversary A b b n −1 x1 ← 0 ; x2 ← 1 ; v1 ← 0 ; y ← Ex1 (v1); v2 ← Ex2 (y) Return x1kv1 , x2kv2

How are compression functions designed?

Let E : {0, 1}b × {0, 1}n → {0, 1}n be a . Let us define keyless compression function h : {0, 1}b+n → {0, 1}n by

h(xkv) = Ex (v) .

Question: Is H collision resistant?

Daniele Micciancio UCSD 41 Answer: NO, h is NOT collision-resistant, because the following adversary cr A has Advh (A) = 1: adversary A b b n −1 x1 ← 0 ; x2 ← 1 ; v1 ← 0 ; y ← Ex1 (v1); v2 ← Ex2 (y) Return x1kv1 , x2kv2

How are compression functions designed?

Let E : {0, 1}b × {0, 1}n → {0, 1}n be a block cipher. Let us define keyless compression function h : {0, 1}b+n → {0, 1}n by

h(xkv) = Ex (v) .

Question: Is H collision resistant?

We seek an adversary that outputs distinct x1kv1, x2kv2 satisfying

Ex1 (v1) = Ex2 (v2) .

Daniele Micciancio UCSD 42 How are compression functions designed?

Let E : {0, 1}b × {0, 1}n → {0, 1}n be a block cipher. Let us define keyless compression function h : {0, 1}b+n → {0, 1}n by

h(xkv) = Ex (v) .

Question: Is H collision resistant?

We seek an adversary that outputs distinct x1kv1, x2kv2 satisfying

Ex1 (v1) = Ex2 (v2) .

Answer: NO, h is NOT collision-resistant, because the following adversary cr A has Advh (A) = 1: adversary A b b n −1 x1 ← 0 ; x2 ← 1 ; v1 ← 0 ; y ← Ex1 (v1); v2 ← Ex2 (y) Return x1kv1 , x2kv2

Daniele Micciancio UCSD 43 How are compression functions designed?

Let E : {0, 1}b × {0, 1}n → {0, 1}n be a block cipher. Let us define keyless compression function h : {0, 1}b+n → {0, 1}n by

h(xkv) = Ex (v) ⊕ v .

The compression function of SHA1 is underlain in this way by a block cipher E : {0, 1}512 × {0, 1}160 → {0, 1}160. The compression function of SHA256 is underlain in this way by a block cipher E : {0, 1}512 × {0, 1}256 → {0, 1}256.

Daniele Micciancio UCSD 44 Cryptanalytic attacks

So far we have looked at attacks that do not attempt to exploit the structure of H. Can we do better than birthday if we do exploit the structure? Ideally not, but functions have fallen short!

Daniele Micciancio UCSD 45 Cryptanalytic attacks against hash functions

1993,1996 : collisions found in compression function of MD5 in 216 time [dBBo,Do]. But these did not yield collisions for MD5. 2004, 2005 : collisions found in RIPEMD, SHA0 [WaFeLaYu]. 2005, 2006: collisions can be found in MD5 in 1 minute [WaFeLaYu, LeWadW, Kl] 2015: collisions found in compression function sha1 of SHA1 [SKP]. But these do not yield collisions for SHA1. 2017 : collisions found for SHA1 in 263.1 time [SBKAM] https://shattered.io/. 2017: Google, Microsoft and Mozilla browsers stop accepting SHA1-based certificates.

Daniele Micciancio UCSD 46 SHA1 collision

Daniele Micciancio UCSD 47 SHA1 collision news

Daniele Micciancio UCSD 48 SHA2

The SHA256 and SHA512 hash functions are still viewed as secure, meaning the best known attack is the birthday attack. These are based on the MD paradigm we discussed above.

Daniele Micciancio UCSD 49 SHA3

National Institute for Standards and Technology (NIST) held a world-wide competition to develop a new hash function standard.

Contest webpage: http://csrc.nist.gov/groups/ST/hash/index.html

Requested parameters: • Design: Family of functions with 224, 256, 384, 512 bit output sizes • Security: CR, one-wayness, near-collision resistance, others... • Efficiency: as fast or faster than SHA2-256

Daniele Micciancio UCSD 50 SHA3

Submissions: 64 Round 1: 51 Round 2: 14: BLAKE, Blue Midnight Wish, CubeHash, ECHO, , Grostl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, . Finalists: 5: BLAKE, Grostl, JH, Keccak, Skein. SHA3: 1: Keccak

Daniele Micciancio UCSD 51 SHA3: The Sponge construction

f : {0, 1}r+c → {0, 1}r+c is a (public, invertible!) permutation. d is the number of output bits, and c = 2d. SHA3 does not use the MD paradigm used by SHA1 and SHA2. Shake(M, d)— Extendable-output function, returning any given number d of bits.

Daniele Micciancio UCSD 52