Hash functions
CS-466: Applied
Adam O’Neill
Adapted from http://cseweb.ucsd.edu/~mihir/cse107/

Setting the Stage

• Today we will study a second lower-level primitive, hash functions.
• Hash functions like MD5, SHA1, SHA256 are used pervasively.
• Their primary purpose is data compression, but they have many other uses and are often treated like a “magic wand” in protocol design.

Collision Resistance (CR)

Definition: A collision for a function $h : D \to \{0,1\}^n$ is a pair $x_1, x_2 \in D$ of points such that $h(x_1) = h(x_2)$ but $x_1 \neq x_2$.

If $|D| > 2^n$ then the pigeonhole principle tells us that there must exist a collision for $h$.

Mihir Bellare UCSD
We want that even though collisions exist, they are hard to find.

Collision-Resistance of a Function Family: The Game

The formalism considers a family $H : \mathrm{Keys}(H) \times D \to R$ of functions, meaning for each $K \in \mathrm{Keys}(H)$ we have a map $H_K : D \to R$ defined by $H_K(x) = H(K, x)$.

Game $\mathrm{CR}_H$

procedure Initialize
    $K \xleftarrow{\$} \mathrm{Keys}(H)$
    Return $K$

procedure Finalize($x_1, x_2$)
    If ($x_1 = x_2$) then return false
    If ($x_1 \notin D$ or $x_2 \notin D$) then return false
    Return ($H_K(x_1) = H_K(x_2)$)

Let $\mathbf{Adv}^{\mathrm{cr}}_H(A) = \Pr\left[\mathrm{CR}^A_H \Rightarrow \text{true}\right]$.
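The game above can be sketched in Python as follows. This is only an illustration under stated assumptions: the family `make_family` (SHA-256 truncated to a few bytes and prefixed with the key), the 16-byte key length, and the adversary `pigeonhole_adversary` are all hypothetical choices made for this sketch, and the domain D is taken to be all byte strings.

```python
import hashlib
import secrets

def make_family(n_bytes):
    """Hypothetical toy family: H_K(x) = first n_bytes of SHA-256(K || x)."""
    def H(K, x):
        return hashlib.sha256(K + x).digest()[:n_bytes]
    return H

def cr_game(H, adversary, key_len=16):
    """Run game CR_H once and report whether the adversary wins."""
    # Initialize: pick a random key and give it to the adversary.
    K = secrets.token_bytes(key_len)
    x1, x2 = adversary(K)
    # Finalize: equal points, or points outside D, never count as a win.
    if x1 == x2:
        return False
    if not (isinstance(x1, bytes) and isinstance(x2, bytes)):
        return False
    return H(K, x1) == H(K, x2)

def pigeonhole_adversary(H, n_bytes):
    """Adversary that hashes 256**n_bytes + 1 distinct points; two must collide."""
    def A(K):
        seen = {}
        for i in range(256 ** n_bytes + 1):
            x = i.to_bytes(4, "big")
            y = H(K, x)
            if y in seen:
                return seen[y], x
            seen[y] = x
        raise AssertionError("unreachable by the pigeonhole principle")
    return A

H = make_family(1)                      # 8-bit outputs, so collisions are easy
print(cr_game(H, pigeonhole_adversary(H, 1)))
```

With 8-bit outputs the pigeonhole adversary always wins, which is why the definition only asks that practical adversaries have small advantage, and why real output lengths are much larger.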

We say that $H$ is CR-secure (or just CR) if $\mathbf{Adv}^{\mathrm{cr}}_H(A)$ is small for any "practical" adversary $A$.

Example

Let $E : \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n$ be a blockcipher (e.g. let $E$ be AES, with $n = 128$).
Let $H : \{0,1\}^k \times \{0,1\}^{2n} \to \{0,1\}^n$ be defined by

Alg $H(K, x[1]x[2])$
    $y \leftarrow E_K(E_K(x[1]) \oplus x[2])$; Return $y$
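To get a feel for this construction, the sketch below implements it over a toy blockcipher and then produces two different inputs with the same output. The cipher `E` here is a hypothetical 4-round Feistel network built from SHA-256, standing in for AES purely for illustration. The adversary relies on one observation: if $E_K(a') \oplus b' = E_K(a) \oplus b$, then the outer call to $E_K$ sees the same input for both messages.

```python
import hashlib
import secrets

BLOCK = 16  # n = 128 bits, expressed in bytes

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def E(K, x):
    """Toy 4-round Feistel permutation on 16-byte blocks (illustrative, not AES)."""
    left, right = x[:8], x[8:]
    for rnd in range(4):
        f = hashlib.sha256(K + bytes([rnd]) + right).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def H(K, x):
    """H(K, x[1]x[2]) = E_K(E_K(x[1]) xor x[2])."""
    x1, x2 = x[:BLOCK], x[BLOCK:]
    return E(K, xor(E(K, x1), x2))

def adversary(K):
    a, b = bytes(BLOCK), bytes(BLOCK)       # first input: two all-zero blocks
    a2 = b"\x01" + bytes(BLOCK - 1)         # any first block different from a
    b2 = xor(xor(E(K, a2), E(K, a)), b)     # forces E_K(a2) xor b2 == E_K(a) xor b
    return a + b, a2 + b2

K = secrets.token_bytes(16)
m1, m2 = adversary(K)
print(m1 != m2, H(K, m1) == H(K, m2))
```

The adversary only evaluates $E_K$ in the forward direction, twice, and it succeeds for every key.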

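Claim: $H$ is not CR. Let’s show that $H$ is not collision-resistant by giving an efficient adversary $A$ such that $\mathbf{Adv}^{\mathrm{cr}}_H(A) = 1$.

We want unequal inputs $x_1[1]x_1[2] \neq x_2[1]x_2[2]$ such that

$E_K(E_K(x_1[1]) \oplus x_1[2]) = E_K(E_K(x_2[1]) \oplus x_2[2])$.

Since $E_K$ is a permutation, this holds exactly when $E_K(x_1[1]) \oplus x_1[2] = E_K(x_2[1]) \oplus x_2[2]$, that is, when

$x_2[2] = E_K(x_2[1]) \oplus E_K(x_1[1]) \oplus x_1[2]$.

So $A$ can pick $x_1[1]$, $x_1[2]$ and $x_2[1] \neq x_1[1]$ arbitrarily and solve for $x_2[2]$.

Keyless Hash Functions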

We say that $H : \mathrm{Keys}(H) \times D \to R$ is keyless if $\mathrm{Keys}(H) = \{\varepsilon\}$ consists of just one key, the empty string.

In this case we write $H(x)$ in place of $H(\varepsilon, x)$ or $H_\varepsilon(x)$.

Practical hash functions like MD5, SHA1, SHA256, SHA3, ... are keyless.
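As a small illustration of what "keyless" looks like in practice, the sketch below calls a few standard hash functions through Python's `hashlib`. The helper names `output_bits` and `digest_hex` are hypothetical, introduced only for this example. Note that each call takes a message and nothing else: there is no key argument.

```python
import hashlib

def output_bits(name):
    """Output length n, in bits, of the named hashlib function."""
    return hashlib.new(name).digest_size * 8

def digest_hex(name, message):
    """H(message) for the named keyless hash function, as a hex string."""
    return hashlib.new(name, message).hexdigest()

for name in ("sha1", "sha256", "sha3_256"):
    print(name, output_bits(name), digest_hex(name, b"abc"))
```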

Although a formal definition of CR doesn't make sense (or is not achievable) for keyless hash functions, we can still use them in reductions.

SHA1 (Secure Hash Algorithm)

Alg SHA1($M$)   // $|M| < 2^{64}$
    $V \leftarrow \mathrm{SHF1}(\texttt{5A827999} \,\|\, \texttt{6ED9EBA1} \,\|\, \texttt{8F1BBCDC} \,\|\, \texttt{CA62C1D6},\ M)$
    return $V$   // 160 bits

Alg SHF1($K, M$)   // $|K| = 128$ and $|M| < 2^{64}$
    $y \leftarrow \mathrm{shapad}(M)$
    Parse $y$ as $M_1 \| M_2 \| \cdots \| M_n$ where $|M_i| = 512$ $(1 \le i \le n)$
    $V \leftarrow \texttt{67452301} \,\|\, \texttt{EFCDAB89} \,\|\, \texttt{98BADCFE} \,\|\, \texttt{10325476} \,\|\, \texttt{C3D2E1F0}$
    for $i = 1, \ldots, n$ do $V \leftarrow \mathrm{shf1}(K, M_i \| V)$
    return $V$

Alg shapad($M$)   // $|M| < 2^{64}$
    $d \leftarrow (447 - |M|) \bmod 512$
    Let $\ell$ be the 64-bit binary representation of $|M|$
    $y \leftarrow M \| 1 \| 0^d \| \ell$   // $|y|$ is a multiple of 512
    return $y$

Underlying Compression Function shf1

Alg shf1($K, B \| V$)   // $|K| = 128$, $|B| = 512$ and $|V| = 160$
    Parse $B$ as $W_0 \| W_1 \| \cdots \| W_{15}$ where $|W_i| = 32$ $(0 \le i \le 15)$
    Parse $V$ as $V_0 \| V_1 \| \cdots \| V_4$ where $|V_i| = 32$ $(0 \le i \le 4)$
    Parse $K$ as $K_0 \| K_1 \| K_2 \| K_3$ where $|K_i| = 32$ $(0 \le i \le 3)$
    for $t = 16$ to $79$ do $W_t \leftarrow \mathrm{ROTL}^{1}(W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16})$
    $A \leftarrow V_0$; $B \leftarrow V_1$; $C \leftarrow V_2$; $D \leftarrow V_3$; $E \leftarrow V_4$
    for $t = 0$ to $19$ do $L_t \leftarrow K_0$; $L_{t+20} \leftarrow K_1$; $L_{t+40} \leftarrow K_2$; $L_{t+60} \leftarrow K_3$
    for $t = 0$ to $79$ do
        if $(0 \le t \le 19)$ then $f \leftarrow (B \wedge C) \vee ((\neg B) \wedge D)$
        if $(20 \le t \le 39$ OR $60 \le t \le 79)$ then $f \leftarrow B \oplus C \oplus D$
        if $(40 \le t \le 59)$ then $f \leftarrow (B \wedge C) \vee (B \wedge D) \vee (C \wedge D)$
        $temp \leftarrow \mathrm{ROTL}^{5}(A) + f + E + W_t + L_t$
        $E \leftarrow D$; $D \leftarrow C$; $C \leftarrow \mathrm{ROTL}^{30}(B)$; $B \leftarrow A$; $A \leftarrow temp$
    $V_0 \leftarrow V_0 + A$; $V_1 \leftarrow V_1 + B$; $V_2 \leftarrow V_2 + C$; $V_3 \leftarrow V_3 + D$; $V_4 \leftarrow V_4 + E$
    $V \leftarrow V_0 \| V_1 \| V_2 \| V_3 \| V_4$; return $V$
    // all additions are modulo $2^{32}$
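The four algorithms above (SHA1, SHF1, shapad, shf1) translate fairly directly into Python. The sketch below is a transcription for study, not a production implementation, and it assumes byte-aligned messages, so $|M|$ is always a multiple of 8 bits. The register names are lower-cased to avoid the clash between the block $B$ and the register $B$, and the helper `rotl` is a name chosen here. The last line checks the result against `hashlib.sha1`.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words

def rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK

def shapad(M):
    bitlen = 8 * len(M)
    d = (447 - bitlen) % 512
    # For byte strings, "1 || 0^d" is the byte 0x80 followed by (d - 7) / 8 zero bytes.
    return M + b"\x80" + bytes((d - 7) // 8) + struct.pack(">Q", bitlen)

def shf1(K, block, V):
    W = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        W.append(rotl(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1))
    a, b, c, d, e = V
    for t in range(80):
        if t < 20:
            f = (b & c) | (~b & d)
        elif 40 <= t < 60:
            f = (b & c) | (b & d) | (c & d)
        else:                      # rounds 20..39 and 60..79
            f = b ^ c ^ d
        temp = (rotl(a, 5) + f + e + W[t] + K[t // 20]) & MASK   # L_t = K[t // 20]
        a, b, c, d, e = temp, a, rotl(b, 30), c, d
    return tuple((v + w) & MASK for v, w in zip(V, (a, b, c, d, e)))

def SHF1(K, M):
    y = shapad(M)
    V = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    for i in range(0, len(y), 64):          # 512-bit blocks M_1, ..., M_n
        V = shf1(K, y[i:i + 64], V)
    return struct.pack(">5I", *V)

def SHA1(M):
    return SHF1((0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6), M)

assert SHA1(b"abc") == hashlib.sha1(b"abc").digest()
```

Viewing the four round constants as a key $K$ is what lets SHA1 be treated as one member of a keyed family SHF1, even though the deployed function fixes that key.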

Applications

• Hashing before digitally signing.
• Primitive in cryptographic protocols.
• Tool for security applications.
• Tool for non-security applications.
• Let’s see some examples…

Password Verification

• Consider a password file stored on a remote server and clients logging in over a secure channel.
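One way to picture this setting is a password file that stores hashes of passwords rather than the passwords themselves. The sketch below is a simplified illustration under that assumption: the function names (`enroll`, `verify`, `dictionary_attack`) and the choice of plain SHA-256 are made for this example only. Deployed systems typically add per-user salts and deliberately slow hash functions, which this sketch leaves out.

```python
import hashlib
import hmac

def enroll(password_file, user, password):
    """Store H(password) for the user instead of the password itself."""
    password_file[user] = hashlib.sha256(password.encode()).digest()

def verify(password_file, user, password):
    """Hash the submitted password and compare it with the stored hash."""
    stored = password_file.get(user)
    if stored is None:
        return False
    candidate = hashlib.sha256(password.encode()).digest()
    return hmac.compare_digest(stored, candidate)

def dictionary_attack(stored_hash, wordlist):
    """What an adversary holding the file can still do: hash guesses and compare."""
    for guess in wordlist:
        if hashlib.sha256(guess.encode()).digest() == stored_hash:
            return guess
    return None

password_file = {}
enroll(password_file, "alice", "hunter2")
print(verify(password_file, "alice", "hunter2"))
print(dictionary_attack(password_file["alice"], ["123456", "password", "hunter2"]))
```

An adversary who steals this file does not read passwords directly, but can still test candidate passwords against each stored hash.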

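[Figure: client and server]

Hashing the stored passwords makes an adversary who obtains the file work for each password: the scheme is now only vulnerable to a dictionary attack.

Compare-by-Hash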

• Suppose two parties each have a large file and want to know if they have the same file.
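A minimal sketch of compare-by-hash: each party computes a short digest of its file and the two digests are compared instead of the files. The helper `stream_digest` and the choice of SHA-256 are assumptions made for this example. In-memory streams stand in for the two large files so the example is self-contained.

```python
import hashlib
import io

def stream_digest(stream, chunk_size=1 << 20):
    """Hash a binary stream chunk by chunk, so large files need not fit in memory."""
    hasher = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.digest()

# Alice and Bob each hash their own copy and exchange only the 32-byte digests.
alice_file = io.BytesIO(b"large file contents " * 100000)
bob_file = io.BytesIO(b"large file contents " * 100000)
print(stream_digest(alice_file) == stream_digest(bob_file))
```

If the digests differ, the files certainly differ. If they match, the files are equal unless the two parties happen to hold a collision for the hash function.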

[Figure: Alice and Bob, comparing hashes of their files]

1. Useful even in non-security contexts.
2. Seems to provide some measure of security (but dictionary attack still applies).

Virus Protection

• Suppose you download an executable from somewhere on the Internet. How do you know it’s not a virus?

Birthday Attack

Let $H : D \to \{0,1\}^n$ be a hash function. Consider, for some integer parameter $q$:

Adversary $A$
    For $i = 1, \ldots, q$ do: $x_i \xleftarrow{\$} D$; $y_i \leftarrow H(x_i)$
    If $\exists\, i_1, i_2 \in [q]$ s.t. $y_{i_1} = y_{i_2} \wedge x_{i_1} \neq x_{i_2}$ then return $(x_{i_1}, x_{i_2})$
    Else return $\bot$

Analysis

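Assume that $H$ is regular, meaning that for every $y \in \{0,1\}^n$ we have $\Pr[H(x) = y] = 1/2^n$ when $x \xleftarrow{\$} D$. Then

$\mathbf{Adv}^{\mathrm{cr}}_H(A) = C(2^n, q) \geq 0.3 \cdot \dfrac{q(q-1)}{2^n}$,

where $C(N, q)$ is the probability of a collision when $q$ balls are thrown independently and uniformly at random into $N$ bins, and the bound holds for $1 \le q \le \sqrt{2N}$.

$\Rightarrow$ need $n$-bit output for $n/2$-bit security.

E.g. we need 160-bit output for 80-bit security. SHA1 has 160-bit output; SHA256 has 256-bit output.

Birthday attack times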

Function    Output length $n$    Time $T_B$ to find a collision by birthday attack
MD4         128                  $2^{64}$
MD5         128                  $2^{64}$
SHA1        160                  $2^{80}$
SHA2-256    256                  $2^{128}$
SHA2-512    512                  $2^{256}$
SHA3-256    256                  $2^{128}$
SHA3-512    512                  $2^{256}$

$T_B$ is the number of trials to find collisions via a birthday attack.
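The birthday attack is easy to try on a hash with a short output. The sketch below runs it against SHA-256 truncated to 24 bits, a toy target chosen here so that roughly $2^{12}$ trials suffice. The names `H` and `birthday_attack`, and the use of 16-byte random inputs, are assumptions of this example. It returns as soon as a repeated output is seen, which is equivalent to hashing all $q$ points first and then looking for a match.

```python
import hashlib
import secrets

def H(x, n_bytes=3):
    """Toy target: SHA-256 truncated to n_bytes (n = 24 bits by default)."""
    return hashlib.sha256(x).digest()[:n_bytes]

def birthday_attack(q, n_bytes=3):
    """Hash up to q random points and return two distinct ones that collide, or None."""
    seen = {}
    for _ in range(q):
        x = secrets.token_bytes(16)
        y = H(x, n_bytes)
        if y in seen and seen[y] != x:
            return seen[y], x
        seen[y] = x
    return None

collision = birthday_attack(2 ** 16)
if collision is not None:
    x1, x2 = collision
    print(x1.hex(), x2.hex(), H(x1).hex())
```

With $n = 24$ and $q = 2^{16}$ the attack fails only with negligible probability, whereas the same code aimed at the full 256-bit output would need on the order of $2^{128}$ trials.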

Cryptanalytic attacks against hash functions

Non-generic attacks (the birthday attack is generic):

When          Against    Time                    Who
1993, 1996    md5        $2^{16}$                [dBBo,Do]
2005          RIPEMD     $2^{18}$
2004          SHA0       $2^{51}$                [JoCaLeJa]
2005          SHA0       $2^{40}$                [WaFeLaYu]
2005          SHA1       $2^{69}$                [WaYiYu]
2012          SHA1       $2^{60}$ – $2^{65}$     [St]
2005, 2006    MD5        1 minute                [WaFeLaYu,LeWadW,Kl]

md5 is the compression function of MD5.
SHA0 is an earlier, weaker version of SHA1.

Compression Functions

A compression function is a family $h : \{0,1\}^k \times \{0,1\}^{b+n} \to \{0,1\}^n$ of hash functions whose inputs are of a fixed size $b + n$, where $b$ is called the block size.

E.g. for SHA1, $b = 512$ and $n = 160$, in which case $h : \{0,1\}^k \times \{0,1\}^{672} \to \{0,1\}^{160}$.

[Figure: $h_K$ takes inputs $x$ and $v$ and outputs $h_K(x \| v)$.]

The MD (Merkle-Damgård) Transform

Design principle: To build a CR

$H : \{0,1\}^k \times D \to \{0,1\}^n$, where $D = \{0,1\}^{\le 2^{64}}$:

• First build a CR compression function $h : \{0,1\}^k \times \{0,1\}^{b+n} \to \{0,1\}^n$.
• Appropriately iterate $h$ to get $H$, using $h$ to hash block-by-block.

MD Setup

Assume for simplicity that $|M|$ is a multiple of $b$. Let
• $\|M\|_b$ be the number of $b$-bit blocks in $M$, and write $M = M[1] \ldots M[\ell]$ where $\ell = \|M\|_b$.
• $\langle i \rangle$ denote the $b$-bit binary representation of $i \in \{0, \ldots, 2^b - 1\}$.
• $D$ be the set of all strings of at most $2^b - 1$ blocks, so that $\|M\|_b \in \{0, \ldots, 2^b - 1\}$ for any $M \in D$, and thus $\|M\|_b$ can be encoded as above.

The MD Transform

Given: Compression function $h : \{0,1\}^k \times \{0,1\}^{b+n} \to \{0,1\}^n$.
Build: Hash function $H : \{0,1\}^k \times D \to \{0,1\}^n$.

Algorithm $H_K(M)$
    $m \leftarrow \|M\|_b$; $M[m+1] \leftarrow \langle m \rangle$; $V[0] \leftarrow 0^n$
    For $i = 1, \ldots, m+1$ do $V[i] \leftarrow h_K(M[i] \| V[i-1])$
    Return $V[m+1]$

[Figure: for a two-block message, $h_K$ is applied in turn to $M[1]$, $M[2]$ and $\langle 2 \rangle$, starting from $0^n$ and ending with $H_K(M)$.]
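The algorithm above can be sketched in a few lines of Python. The compression function `h` used here (SHA-256 of key, block and chaining value, truncated to $n$ bits) and the sizes $b = 64$, $n = 128$ are hypothetical stand-ins chosen for this example. Only the function `md` corresponds to the transform itself.

```python
import hashlib

B_BYTES = 8     # block size b = 64 bits
N_BYTES = 16    # output size n = 128 bits

def h(K, xv):
    """Toy compression function h_K on inputs of exactly b + n bits (illustrative)."""
    assert len(xv) == B_BYTES + N_BYTES
    return hashlib.sha256(K + xv).digest()[:N_BYTES]

def md(K, M):
    """H_K(M): iterate h_K over the blocks of M, then over the length block <m>."""
    assert len(M) % B_BYTES == 0, "|M| is assumed to be a multiple of b"
    m = len(M) // B_BYTES
    blocks = [M[i * B_BYTES:(i + 1) * B_BYTES] for i in range(m)]
    blocks.append(m.to_bytes(B_BYTES, "big"))    # M[m+1] = <m>
    V = bytes(N_BYTES)                           # V[0] = 0^n
    for block in blocks:
        V = h(K, block + V)                      # V[i] = h_K(M[i] || V[i-1])
    return V

K = b"0123456789abcdef"
print(md(K, b"two blocks here!").hex())
```

Appending the block count $\langle m \rangle$ as a final block is what the collision-resistance proof uses when the two colliding messages have different lengths.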

MD preserves CR

• The nice property of the MD transform is that it preserves collision-resistance (CR).
• If we start with a CR fixed input-length compression function we end up with a CR hash function taking unbounded-length inputs (at least for practical purposes).
• There is no need to cryptanalyze the latter. The only way to break it is to break the compression function.

How $A_h$ works

Proof idea: Let $(M_1, M_2)$ be the $H_K$-collision returned by $A_H$. Then $A_h$ will trace the chains of computations of $h_K$ backwards to find an $h_K$-collision.

Case 1: $\|M_1\|_b \neq \|M_2\|_b$

For example, if $M_1$ has 2 blocks and $M_2$ has 1 block, let $x_1 = \langle 2 \rangle \| V_1[2]$ and $x_2 = \langle 1 \rangle \| V_2[1]$. Then
• We have $h_K(x_1) = h_K(x_2)$ because $H_K(M_1) = H_K(M_2)$.
• But $x_1 \neq x_2$ because $\langle 1 \rangle \neq \langle 2 \rangle$.
So there is a collision in $h_K$ in the last invocation in each chain.

Case 2: $\|M_1\|_b = \|M_2\|_b$

The two computations line up block for block. For example, with 2 blocks each:

$x_1 \leftarrow \langle 2 \rangle \| V_1[2]$; $x_2 \leftarrow \langle 2 \rangle \| V_2[2]$
If $x_1 \neq x_2$ then return $x_1, x_2$

Look at $V_1[2]$ and $V_2[2]$. If they are different, we have a collision in $h_K$. If they are the same, go back one invocation of $h_K$ and repeat.

Eventually we find a collision, since otherwise all the inputs to $h_K$ would agree and we would have $M_1 = M_2$.
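The backward tracing argument can be exercised on a deliberately weak instance. In the sketch below, the compression function `h` has only 16-bit outputs (a toy choice for this example, built from truncated SHA-256), so a collision for the MD hash can be found by brute force. `extract_h_collision` then plays the role of $A_h$. All function names are hypothetical.

```python
import hashlib

B, N = 2, 2   # toy sizes in bytes: b = 16-bit blocks, n = 16-bit outputs

def h(K, xv):
    return hashlib.sha256(K + xv).digest()[:N]

def chain(K, M):
    """Return the list of inputs fed to h_K while computing H_K(M), and H_K(M)."""
    m = len(M) // B
    blocks = [M[i * B:(i + 1) * B] for i in range(m)] + [m.to_bytes(B, "big")]
    V, inputs = bytes(N), []
    for block in blocks:
        inputs.append(block + V)
        V = h(K, block + V)
    return inputs, V

def find_md_collision(K):
    """Stand-in for A_H: brute-force a collision for H_K over 2- and 3-block messages."""
    seen, counter = {}, 0
    while True:
        M = counter.to_bytes(4 if counter % 2 else 6, "big")
        digest = chain(K, M)[1]
        if digest in seen:
            return seen[digest], M
        seen[digest] = M
        counter += 1

def extract_h_collision(K, M1, M2):
    """A_h: walk both chains backwards until the inputs to h_K differ."""
    inputs1, out1 = chain(K, M1)
    inputs2, out2 = chain(K, M2)
    assert M1 != M2 and out1 == out2
    for x1, x2 in zip(reversed(inputs1), reversed(inputs2)):
        if x1 != x2:
            return x1, x2      # outputs agree here because all later inputs agreed
    raise AssertionError("all inputs equal would force M1 == M2")

K = b"demo key"
M1, M2 = find_md_collision(K)
x1, x2 = extract_h_collision(K, M1, M2)
print(x1 != x2, h(K, x1) == h(K, x2))
```

When the two messages have different lengths, the loop stops at its first step, because the final inputs contain different length blocks. That is Case 1. Otherwise it walks back through aligned blocks as in Case 2.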

Compression from Blockcipher? (How are compression functions designed?)

Let $E : \{0,1\}^b \times \{0,1\}^n \to \{0,1\}^n$ be a block cipher with key length $b$ and block length $n$. Let us design a keyless compression function

$h : \{0,1\}^{b+n} \to \{0,1\}^n$ by $h(x \| v) = E_x(v)$.

Is $h$ collision resistant? No! We use the fact that $E$ is efficiently invertible (i.e. $E_x^{-1}$ is efficiently computable).

Adversary $A$
    Pick $x, v$ arbitrary; $y \leftarrow E_x(v)$
    Pick $x' \neq x$ arbitrary; $v' \leftarrow E_{x'}^{-1}(y)$
    return $(x \| v, x' \| v')$
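The adversary above can be run against a toy instance. In the sketch below, `E` and `E_inv` form a hypothetical 4-round Feistel cipher built from SHA-256, used only so that the example has an efficiently invertible block cipher to work with. The sizes (16-byte keys and blocks) are likewise choices made for this example.

```python
import hashlib

KEY_BYTES = 16    # b = 128 bits
BLOCK_BYTES = 16  # n = 128 bits

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def round_fn(key, rnd, half):
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]

def E(key, v):
    """Toy Feistel block cipher (illustrative only)."""
    left, right = v[:8], v[8:]
    for rnd in range(4):
        left, right = right, xor(left, round_fn(key, rnd, right))
    return left + right

def E_inv(key, y):
    """Inverse of E under the same key: undo the rounds in reverse order."""
    left, right = y[:8], y[8:]
    for rnd in reversed(range(4)):
        left, right = xor(right, round_fn(key, rnd, left)), left
    return left + right

def h(xv):
    """h(x || v) = E_x(v): the first b bits are used as the cipher key."""
    return E(xv[:KEY_BYTES], xv[KEY_BYTES:])

def adversary():
    x, v = bytes(KEY_BYTES), bytes(BLOCK_BYTES)   # arbitrary x, v
    y = E(x, v)
    x2 = b"\x01" + bytes(KEY_BYTES - 1)           # arbitrary x' != x
    v2 = E_inv(x2, y)                             # decrypt y under x'
    return x + v, x2 + v2

a, b = adversary()
print(a != b, h(a) == h(b))
```

The two inputs differ in their first component, so they are distinct whatever `E_inv` returns, and both map to `y` by construction.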

A Better Way (how SHA1's compression function works)

Let $E : \{0,1\}^b \times \{0,1\}^n \to \{0,1\}^n$ be a block cipher. A keyless compression function

$h : \{0,1\}^{b+n} \to \{0,1\}^n$

may be designed as

$h(x \| v) = E_x(v) \oplus v$   (Davies-Meyer)

The compression function of SHA1 is underlain in this way by a block cipher $E : \{0,1\}^{512} \times \{0,1\}^{160} \to \{0,1\}^{160}$.

Recall the non-generic attacks listed under "Cryptanalytic attacks against hash functions" above.

SHA3 Competition

Submissions: 64
Round 1: 51
Round 2: 14: BLAKE, Blue Midnight Wish, CubeHash, ECHO, Fugue, Grostl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, Skein.
Finalists: 5: BLAKE, Grostl, JH, Keccak, Skein.
SHA3: 1: Keccak

Keccak doesn't use MD. It uses "sponge" functions.

SHA3: The Sponge Construction

[Figure: the sponge construction; each input block has $r$ bits.]

$f : \{0,1\}^{r+c} \to \{0,1\}^{r+c}$ is a (public, invertible!) permutation.
$d$ is the number of output bits, and $c = 2d$.
SHA3 does not use the MD paradigm used by SHA1 and SHA2.
Shake($M, d$): Extendable-output function, returning any given number $d$ of bits.
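The parameters above can be related to the SHA3 and SHAKE functions exposed by Python's `hashlib`. The sketch below assumes the standard Keccak permutation width $r + c = 1600$ bits, which the slide does not state, and the helper names `sha3_parameters` and `shake` are introduced here for illustration. The extendable-output behaviour is demonstrated with `hashlib.shake_128`.

```python
import hashlib

PERMUTATION_BITS = 1600   # r + c for the Keccak permutation used by SHA3 (assumed here)

def sha3_parameters(d):
    """Return (r, c) in bits for a d-bit SHA3 output, using c = 2d."""
    c = 2 * d
    return PERMUTATION_BITS - c, c

def shake(M, d):
    """Extendable-output function: return d bits of output (d a multiple of 8)."""
    return hashlib.shake_128(M).digest(d // 8)

print(sha3_parameters(256))                 # rate and capacity for SHA3-256
print(hashlib.sha3_256().block_size * 8)    # hashlib's reported block size, in bits
print(shake(b"abc", 128).hex())
print(shake(b"abc", 256).hex())             # longer output from the same input
```

Asking SHAKE for more bits extends the shorter output rather than changing it, which is what makes it usable as an extendable-output function.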
