Hash Functions - June 2014

Hash functions

X.509 Annex D RIPEMD-160 MDC-2 SHA-256 SHA-3 MD2, MD4, MD5 SHA-512 Introduction to the SHA-1 Design and of This is an input to a crypto- Cryptographic Hash Functions graphic . The input is a very long string, that is reduced by the hash function to a string of fixed length. There are 1A3FD4128A198FB3CA345932 Bart Preneel additional security conditions: it h should be very hard to find an KU Leuven - COSIC input hashing to a given value (a [email protected] preimage) or to find two colliding inputs (a collision). Sibenik, June 2014 Insert presenter logo here on slide master 2

Applications Agenda

• short unique identifier to a string • Definitions – digital signatures – data authentication • Iterations (modes) • one-way function of a string • Compression functions – protection of passwords – micro-payments • Constructions • confirmation of knowledge/commitment • SHA-3 • pseudo-random string generation/ derivation • Conclusions • entropy extraction • construction of MAC algorithms, stream ciphers, block ciphers,…

2005: 800 uses of MD5 in Microsoft Windows 3 4

Security requirements (n-bit result) Preimage resistance

preimage 2nd preimage collision preimage • in a password file, one does not store – (username, password) x ?  ? ?  ? ? • but – (username,hash(password)) h h h h h h • this is sufficient to verify a password • an attacker with access to the password file has to find a preimage h(x) h(x) = h(x’) h(x) = h(x’) h(x) 2n 2n 2n/2 2n

5 6

1 Hash Functions - Bart Preneel June 2014

Second preimage resistance Collision resistance

2nd preimage • hacker Alice prepares two versions collision x of a software driver for the O/S company Bob Channel 1: high capacity and insecure x’ x  ? – x is correct code x  h(x) – x’ contains a backdoor that gives Alice access to the machine Channel 2: low capacity but secure (= authenticated – cannot be modified) • Alice submits x for inspection to Bob h h • if Bob is satisfied, he digitally signs h h h(x) with his private key • an attacker can modify x but not h(x) • Alice now distributes x’ to users of h(x) = h(x’) • he can only fool the recipient if he the O/S; these users verify the h(x) = h(x’) finds a second preimage of x signature with Bob’s public key n n/2 2 • this signature works for x and for x’, 2 since h(x) = h(x’) 7 8

Indifferentiability from a random oracle Pseudo-random function or PRO property [Maurer+04]

computationally indistinguishable from a random function variant of indistinguishability appropriate when distinguisher $ prf $ hK(.) f Advh = Pr [ K  K: A 1] - Pr [ f  RAND(m,n):A 1] has access to inner component (e.g. building block of a hash function) RAND(m,n): set of all functions from m-bit to n-bit strings  Simulator S,  distinguisher D, AdvPRO(H,S) is small

K h FIL f H VIL RO S (hash function) RO

? or ? This concept makes only ? or ? sense for a function with a D secret key D [Ristenpart-Shacham-Shrimpton’11] 9 [Demay-Gaz-Hirt-Maurer’13] 10

Brute force (2nd) preimage Brute force attacks in practice

• multiple target second preimage (1 out of many): • (2nd) preimage search t – if one can attack 2 simultaneous targets, the effort to find a single 40 preimage is 2n-t – n = 128: 14 B$ for 1 year if one can attack 2 targets in parallel • multiple target second preimage (many out of many): • parallel collision search: small memory using cycle finding algorithms (distinguished points) – time-memory trade-off with Θ(2n) precomputation and storage Θ(22n/3) time per (2nd) preimage: Θ(22n/3) – n = 128: 1 M$ for 5 hours (or 1 year on 60K PCs) [Hellman’80] – n = 160: 56 M$ for 1 year – need 256-bit result for long term security (30 years or more) • answer: randomize hash function with a parameter S (, key, spice,…)

11 12

2 Hash Functions - Bart Preneel June 2014

Quantum computers Properties in practice

• in principle exponential parallelism • collision resistance is not always necessary • inverting a one-way function: 2n reduced to 2n/2 • other properties are needed: [Grover’96] – PRF: pseudo-randomness if keyed (with secret key) • collision search: can we do better than 2n/2 ? – PRO: pseudo-random oracle property – near-collision resistance – 2n/3 computation + hardware [Brassard-Hoyer-Tapp’98] = 22n/3 – partial preimage resistance (most of input known) n/4 – [Bernstein’09] classical collision search requires 2 computation – multiplication freeness and hardware (= standard cost of 2n/2 ) • how to formalize these requirements and the relation between them?

13 14

How not to construct a hash function

• Divide the message into t blocks xi of n bits each

Message block 1: x1 Iteration  Message block 2: x2 (mode of compression function)  … 

Message block t: xt =

Hash value h(x)

15 15 16

Hash function: iterated structure Security relation between f and h

• iterating f can degrade its security – trivial example: 2nd preimage IV H1 H2 H3 f f f f g IV H H H 1 2 3 g x1 x2 x3 x4 f f f f

• split messages into blocks of fixed length and hash them x1 x2 x3 x4 block by block with a compression function f

• need padding at the end IV = H1 H2 H3 f f f g efficient and elegant…. but …

x2 x3 x4 17 18

3 Hash Functions - Bart Preneel June 2014

Security relation between f and h (2) Security relation between f and h (3)

• solution: Merkle-Damgård (MD) strengthening length extension: if one knows h(x), easy to compute h(x || y) without knowing x or IV – fix IV, use unambiguous padding and insert length at the end IV H H 1 2 H = h(x) f f f 3 • f is collision resistant  h is collision resistant x x x [Merkle’89-Damgård’89] 1 2 3 IV H H H H = h(x || y) nd nd 1 2 3 4 • f is ideally 2 preimage resistant ? h is ideally 2 f f f f preimage resistant [Lai-Massey’92] x1 x2 x3 y • PRO preservation  Col, Sec and Pre for ideal compression function solution: output transformation – but for narrow pipe bounds for Sec and Pre are at most 2n/2 rather than 2n IV H1 H2 H3 • many other results f f f f g

19 x1 x2 x3 x4 20

How (NOT) to strengthen a hash function? Attacks on MD-type iterations [Coppersmith’85][Joux’04]

• long message 2nd • answer: concatenation [Dean-Felten-Hu'99], [Kelsey-Schneier’05] – Sec security degrades lineary with number 2t of message blocks • h1 (n1-bit result) and h2 (n2-bit result) hashed: 2n-t+1 + t 2n/2+1 – appending the length does not help here!

• multi- and impact on concatenation [Joux’04] • intuition: the strength of g against h h collision/(2nd) preimage attacks is the 1 2 • herding attack [Kelsey-Kohno’06] product of the strength of h1 and h2 – reduces security of commitment using a hash function from 2n — if both are “independent” – on-line 2n-t + precomputation 2.2(n+t)/2 + storage 2t g(x) = h1(x) || h2(x) • but….

21 22

Multiple collisions  multi-collision Multi-collisions on iterated hash function (2)

Assume “ideal” hash function h with n-bit result IV H1 H2 H3 • Θ(2n/2) evaluations of h (or steps): 1 collision f f f f – h(x)=h(x’)

x , x’ x , x’ x , x’ x , x’ • Θ(r. 2n/2) steps: r2 collisions 1 1 2 2 3 3 4 4

– h(x1)=h(x1’) ; h(x2)=h(x2’) ; … ; h(xr2)=h(xr2’) • for IV: collision for block 1: x1, x’1 • for H : collision for block 2: x , x’ • Θ(22n/3) steps: a 3-collision 1 2 2 for H : collision for block 3: x , x’ – h(x)= h(x’)=h(x’’) • 2 3 3 • for H3: collision for block 4: x4, x’4 • Θ(2n(t-1)/t) steps: a t-fold collision (multi-collision) • now h(x1||x2||x3||x4) = h(x’1||x2||x3||x4) = h(x’1||x’2||x3||x4) = … – h(x1)= h(x2)= … =h(xt) = h(x’1||x’2||x’3||x’4) a 16-fold collision (time: 4 collisions)

23 24

4 Hash Functions - Bart Preneel June 2014

Multi-collisions [Coppersmith’85][Joux ’04] Multi-collisions [Coppersmith’85][ Joux ’04]

• finding multi-collisions for an iterated hash function is not consider h1 (n1-bit result) and h2 (n2-bit result), with n1  n2. much harder than finding a single collision (if the size of the concatenation of 2 iterated hash functions (g(x)= h (x) || h (x)) internal memory is n bits) 1 2 is as most as strong as the strongest of the two (even if both are independent) • algorithm R • generate R = 2n1/2-fold • cost of collision attack against g at most n2/2 n1/2 (n1 + n2)/2 multi-collision for h2 n1 . 2 + 2 << 2 • in R: search by brute • cost of (2nd) preimage attack against g at most force for h1 n1 . 2n2/2 + 2n1 + 2n2 << 2n1 + n2 • if either of the functions is weak, the attacks may work better • Time: n1. 2n2/2 + 2n1/2 h1 h2 << 2(n1 + n2)/2

g(x) = h1(x) || h2(x) 25 26

Improving MD iteration Improving MD iteration

salt + output transformation + counter + wide pipe • degradation with use: salting (family of functions, randomization)

salt salt salt salt salt – or should a salt be part of the input? • PRO: strong output transformation g IV H1 H2 H 3 – also solves length extension 2n 2n 2n 2n g n f f f f 2n • long message 2nd preimage: preclude fix points

– counter f fi [Biham-Dunkelman’07] |x| x x x x n/2 1 1 2 2 3 3 4 4 • multi-collisions, herding: avoid breakdown at 2 with larger internal memory: known as wide pipe security reductions well understood – e.g., extended MD4, RIPEMD, [Lucks’05] many more results on property preservation impact of theory limited 27 28

Tree structure: parallelism Permutation (π) based: sponge

x x x x h1 h2 [Damgård’89], [Pal-Sarkar’03], [Keccak team’13] 1 2 3 4 H1 x1 x2 x3 x4 x5 x6 x7 x8 0 r f f f f π π π π π π H2 0 … c f f absorb squeeze

if result has n bits, H1 has r bits (rate), H2 has c bits (capacity) and f the permutation π is “ideal” collisions min (2c/2 , 2n/2) 2nd preimage min (2c/2 , 2n) c n 29 preimage min (2 , 2 ) 30

5 Hash Functions - Bart Preneel June 2014

Modes: summary

• growing theory to reduce security properties of hash function to that of compression function (MD) or permutation (sponge) – preservation of large range of properties – relation between properties Compression functions • it is very nice to assume multiple properties of the compression function f, but unfortunately it is very hard to verify these • still no single comprehensive theory

31 32 32

Single block length: [Rabin’78] (EK) based: single block length

x1 x2 x3 x4 Davies-Meyer Miyaguchi-Preneel

H H H3 H4 IV E 1 E 2 E E Hi Hi H x i-1 E i E

H’2 H’’2

nd n/2 • Merkle’s meet-in-the-middle: (2 ) preimage in time 2 xi Hi-1 n/2 – select 2 values for (x1,x2) and compute forward H’2 n/2 – select 2 values for (x3,x4) and compute backward H’’2 • output length = block length m; rate 1; 1 per – by the birthday paradox expect a match and thus a (2nd) preimage • 12 secure compression functions (in ideal cipher model) • lower bounds: collision 2m/2, (2nd) preimage 2m • [Preneel+’93], [Black-Rogaway-Shrimpton’02], [Duo-Li’06], [Stam’09],…

33 34

Block cipher (EK) based: double block length Permutation (π) based (3n to 2n compression)

parazoa small permutation collision Open problems: JH Grøstl [Mennink12] • what is the best (3) collision/preimage security for xi n 2 block cipher calls? xi π1 • For optimal collision security: H1i-1 H1i what is the best preimage [Jetchev-Özen-Stam12] security for s block cipher H2i-1 π H2i (2) calls? (upper bounds are Hi 2n/3 known) Hi-1 MDC-4 π 5n/8 MDC-2 (4) 2 3n/5 (2)

n/2 trivial (1)

preimage 2n 35 n5n/4 3n/2 36

6 Hash Functions - Bart Preneel June 2014

Iteration modes and compression functions

• security of simple modes well understood • powerful tools available

• analysis of slightly more complex schemes very Hash function difficult constructions • MD versus sponge debate: – sponge is simpler

– should xi and Hi-1 be treated differently?

37 38 38

Hash function history 101 MDx-type hash function history

RSA DES MD4 Ext. MD4 1980 90 single block ad hoc Dedicated MD5 91 length schemes

HARDWARE MD2 MD4 H AVA L RIPEMD 92 1990 double MD5 block security SHA-1 length reduction SHA(-0) 93 for RIPEMD-160 factoring, 94 AES SHA-2 SHA-1 2000 small DLOG, RIPEMD-160 95 lattices Whirlpool

SOFTWARE permu- tations SHA-2 SHA-3 02 2010 SHA-3 39 12 40

MD5 [Rivest’91]: 4 rounds of 16 steps State updates in the MD4 family

H i-1

A0 B0 C0 D0 MD4 SHA/SHA-1 SHA-256 x0 f A1 B1 C1 D1 … x 15 f A16 B16 C16 D16 K + Σ0 Σ1 + K x g A17 B17 C17 D17 + f + p(0) x M i W A C + + + … J H W g << s + xp(15) A32 B32 C32 D32 K + A B C D i + xq(0) h 33 33 33 33 … AN BN CN DN EN FN GN HN

xq(15) h A48 B48 C48 D48

A B C D xr(0) j 49 49 49 49 Design principles copied in MD5, RIPEMD, HAVAL, SHA, … j SHA-1, SHA-256, ... xr(15) A64 B64 C64 D64 – All hash functions in use today + H i 41 42 Slide credit: C. Rechberger

7 Hash Functions - Bart Preneel June 2014

The complexity of collision attacks SHA-1 designed by NIST (NSA) in ‘94

log complexity brute force: 1 million PCs (1 year) or US$ 100,000 hardware (4 days) 2

90 [Wang+’05] 80 [Mendel+’08] [Manuel+’09] 70 MD4 [Wang+’04] [Stevens’12] 60 MD5 50 SHA-0 [Sugita+’06] [McDonald+’09] 40 SHA-1 30 Most attacks 20 Brute force unpublished/withdrawn 10 0

4 6 2 4 6 0 9 0 1 99 00 00 1992 1992 19 1 1998 2000 20 2 2 2008 20 43 prediction: collision for SHA-1 in the next 12 months 44

Rogue CA attack [Sotirov-Stevens-Appelbaum-Lenstra-Molnar-Osvik-de Weger ’08] Upgrades

• request user cert; by special • RIPEMD-160 is good replacement for SHA-1 collision this results in a fake CA Self-signed root key cert (need to predict serial

number + validity period) CA1 CA2 Rogue CA • upgrading algorithms is always hard

impact: rogue CA User1 User2 User x • TLS uses MD5 || SHA-1 to protect algorithm that can issue certs negotiation (up to v1.1) that are trusted by all browsers • upgrading negotiation algorithm is even harder: need to upgrade TLS 1.1 to TLS 1.2 •6 CAs have issued certificates signed with MD5 in 2008: — Rapid SSL, Free SSL (free trial certificates offered by RapidSSL), TC TrustCenter AG, RSA Data Security, Verisign.co.jp 45 46

SHA-2 [FIPS180,NIST‘02]

• SHA-224, SHA-256, SHA-384, SHA-512 – non-linear message expansion – 64/80 steps – SHA-384 and SHA-512: 64-bit architectures

• SHA-256 collisions: 31/64 steps 265.5 [Mendel+’13] 12x SHA-3 – free start collision: 52/64 steps (2 ) [Li+12] – non-randomness 47/64 steps (practical) [Biryukov+11][Mendel+11] (bits and bytes) • SHA-256 preimages: 45/64 steps (225x) [Khovratovitch’12]

• implementations today faster than anticipated

• adoption accelerated by other attacks on TLS – since 2013 deployment in TLS 1.2

47 48 48

8 Hash Functions - Bart Preneel June 2014

NIST AHS competition (SHA-3) The candidates

• SHA-3: 224, 256, 384, and 512-bit message digests • (similar to SHA-2) Call: 02/11/07 Deadline (64): 31/10/08 Round 1 (51): 09/12/08 80 Round 2 (14): 24/7/09 64 Final (5): 10/12/10 51 60 Selection: 02/10/12 40 20 14 5 1 0 Q4/08 Q3/09 Q4/10 Q4/12

round 1 round 2 final 49 50 Slide credit: Christophe De Cannière

Preliminary cryptanalysis End of Round 1 candidates

a

51 52 Slide credit: Christophe De Cannière Slide credit: Christophe De Cannière

Properties: bits and bytes Round 2 candidates [Watanabe’10]

a

53 54 Slide credit: Christophe De Cannière

9 Hash Functions - Bart Preneel June 2014

Reductions: 256-bit result Reductions: 512-bit result

pre sec coll. indiff. assumption pre sec coll. indiff. assumption Blake-256 256 256 128 128 E ideal Blake-512 512 512 256 256 E ideal Grøstl-256 256 256-L 128 128 π,ρ ideal Grøstl-512 512 512-L 256 256 π,ρ ideal JH-256 256 256 128 256 π ideal JH-512 256 256 256 256 π ideal Keccak-256 256 256 128 256 π ideal Keccak-512 512 512 256 512 π ideal -256 256 256 128 256 E ideal Skein-512 512 512 256 256 E ideal SHAKE-128 128 128 128 128 π ideal SHAKE-512 256 256 256 256 π ideal NIST 256 256-L 128 - NIST 512 512-L 256 -

55 56

Software performance Hardware: post-place & route results eBash [Bernstein-Lange] ASIC 130nm [Guo-Huang-Nazhandali-Schaumont’10]

Throughput logarithmic scale (Gbps) 20 slower SHA256 Blake Keccak BMW 16 CubeHash ECHO 12 Grostl Grøstl Hamsi JH 8 Keccak Luffa Blake Shabal SHAvite 4 JH SIMD Skein Skein 0 Area 0 40,000 80,000 120,000 160,000 200,000 (GateEqv) 57 58 factor 4 in cycles/byte Slide credit: Patrick Schaumont, Virginia Tech

Keccak Keccak: FIPS 202 (draft: 28 May 2014)

• append 2 extra bits for domain separation to allow – flexible output length (XOFs or eXtendable Output Functions) – tree structure (Sakura) allowed by additional encoding • 6 versions – SHA3-224: n=224; c = 448; r = 1152 (72%) – SHA3-256: n=256; c = 512; r = 1088 (68%) pad 01 – SHA3-384: n=384; c = 768; r = 832 (52%) – SHA3-512: n=512; c = 1024; r = 576 (36%) – SHAKE128: n=x; c = 256; r = 1344 (84%) pad 11 for XOF – SHAKE256: n=x; c = 512; r = 1088 (68%)

permutation: 25, 50, 100, 200, 400, 800, 1600 if result has n bits, H1 has r bits (rate), H2 has c bits (capacity) and c/2 n/2 nominal version: the permutation π is “ideal” collisions min (2 , 2 ) 2nd preimage min (2c/2 , 2n) • 5x5 array of 64 bits preimage min (2c , 2n) • 18 rounds of 5 steps 59 60

10 Hash Functions - Bart Preneel June 2014

Performance of hash functions [Bernstein-Lange] Hash functions: conclusions (cycles/byte) Intel Core 2 Quad Q9550; 4 x 2833MHz (2008)

2001 • SHA-1 would have needed 128-160 steps instead of 80 • 2004-2009 attacks: cryptographic meltdown but not dramatic for most applications • theory is developing for more robust iteration modes and extra features; still early for building blocks

• Nirwana: efficient hash functions with security reduction

(estimated) 61 62

11