<<

Overview

Hash Functions Lesson contents Definition and properties of cryptographic hash functions Gerardo Pelosi Design principles of hash functions Department of Electronics, Information and Bioengineering (DEIB) Design principles of compression functions Politecnico di Milano Common Hash functions gerardo.pelosi - at - polimi.it Message Authentication Codes

G. Pelosi, A. Barenghi (DEIB) Hash Functions 1 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 2 / 37

Hash Functions Hash Functions

Outline Sample integrity-check scenario Hash functions h(·) are the publicly known algorithms to provide Given message m, compute d = h(m) and store it in a safe place integrity To check for integrity, recompute h(m) and check against d Idea: compute a fingerprint of a ptx through a non-injective map the hash computation must be efficient, deterministic and practically If it matches, then m is still the same as before, otherwise unforgeable either m or d was tampered with or an error occurred (resp. was injected) over the communication The output size is constant (e.g., 160 bits) channel Common names for the output are: message digest, hash or Note: the computation of h(m) does not include any key! cryptographic

G. Pelosi, A. Barenghi (DEIB) Hash Functions 3 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 4 / 37 Hash Functions Security of Hash Functions

Unkeyed Hash Functions Formal Definition Consider h : M → D, an unkeyed hash function: the following A keyed hash function is a 4-uple (M, D, K, H), where: problems should be (computationally) impossible to solve: 1 M is a of input messages (could be unbounded) 1 Preimage Problem: given d∈D, find m∈M s.t. h(m)=d 2 D is a finite set of digests, with |M| ≥ |D| One-way property of h: you cannot reconstruct a valid m from d 3 K is a finite set of keys (the keyspace) 2 Second Preimage Problem: given m1∈M, d=h(m1), 4 H is a finite set of hash functions find m2∈M s.t. m16=m2, h(m2)=h(m1)=d Weak Collision Resistance property of h: you cannot find a message For each key k∈K, there is a hash function hk :M→D in H hashing to the same digest of an already known message A pair (m, d) is called a valid pair under key k, if hk (m) = d 3 Collision Problem: find m1, m2∈M s.t. m16=m2, h(m1)=h(m2) An unkeyed hash function is a hash family with a known fixed key k (Strong) Collision Resistance property of h: you cannot find two arbitrary messages with the same digest

G. Pelosi, A. Barenghi (DEIB) Hash Functions 5 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 6 / 37

Security of Hash Functions

Weak Collision Resistance is important because: A perfect hash: the Random Oracle If we could exhibit another message (or a small set of messages) with What does a “perfect hash function” (i.e., one-way, weak collision the same digest of a given input, the usefulness of the hash function resistant, and strong collision resistant) look like? could be at risk! Ideally, a perfect hash should be a deterministic function o(·) which returns a random string for every possible input string (Random Collision Resistance is important because: Oracle) if we could find collisions at our will, we would have the ability to Any call to the oracle with the same input m will yield o(m) build messages we can subsequently repudiate – as we would be o(m1) = o(m2) if and only if m1=m2 always able to show multiple plausible messages, of our choice, for a Probability of picking randomly a message with a specific digest d is 0 certain digest The message space is infinite ⇒ the digest space is infinite too! We need an infinite amount of memory to store all the mappings! Not practical, but useful to evaluate the security of real hash functions

G. Pelosi, A. Barenghi (DEIB) Hash Functions 7 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 8 / 37 Security of Hash Functions Security of Hash Functions

Practical 1st Preimage Problem - (black box analysis) A real hash function h(·) does not have an infinite digest space Practical 2nd Preimage Problem - (black box analysis) Trying to find a preimage for d ∈ D calling the hash function with a Let m∈M, d=h(m) be the msg-digest pair for which we want a 2nd 1 random input, I will obtain the preimage with probability |D| preimage (i.e., find m0∈M s.t. m06=m, h(m0)=h(m)=d)  q Probability of calling h(·) q times without success: 1 − 1 Pick q msg.s at random: m, m1, m2,...mq−1 s.t. m6=mi , |D| i∈{1,..., q−1} Probability of getting at least one valid preimage, mi , for d with q For each chosen message mi , compute h(mi ) calls is: If one of the h(m ) is d, return m ; otherwise fail  1 q i i Pr(mi s.t. d = h(mi )) = 1 − 1 − Again, the probability of getting at least one valid preimage of d is: |D| q−1 |D| q |D|→∞ − q   (1 − 1 )q=(1 + −1 ) |D| ≈ e |D| (notable limit), and 1 q − 1 |D| |D| Pr(∀ i mi 6= m, h(mi ) = h(m)) = 1 − 1 − ≈ q →0 |D| |D| − q |D| e |D| ≈ 1 − q (notable limit), thus if q is much smaller than |D| (i.e., q  |D|) we have: |D| q Pr(m s.t. d = h(m )) ≈ i i |D|

G. Pelosi, A. Barenghi (DEIB) Hash Functions 9 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 10 / 37

Security of Hash Functions Security of Hash Functions

Practical Collision Problem - (black box analysis) Finding the number of trials, q needed to get a collision a a − x The approach to find a collision in a hash function is From basic calculus (Taylor series), recall that 1 − x ≤ e Pick q messages at random mi , i ∈ {1,..., q} Consequentially, we can rewrite the previous result as follows: If any two digests, h(mi ) and h(mj ) with i 6= j (i, j ∈ {1,..., q}), are q q−1 equal then return (mi , mj ); otherwise, fail   Y i Y − i − 1 Pq−1 i − q(q−1) Pr(no coll.) = 1 − ≤ e |D| = e |D| i=0 = e 2|D| |D| q = 1, Pr(no collision) = 1 i=0 i=0  1  q = 2, Pr(no collision) = 1 · 1 − |D| We look for: Pr(no coll.) ≥ 50% ⇔ Pr(at least one coll.) ≤ 50%     q(q−1) 1 2 − 2|D| 1 q(q−1) q = 3, Pr(no collision) = 1 · 1 − |D| · 1 − |D| we find that: e ≥ 2 ⇒ 2|D| ≤ ln 2 ... 1 Solving for q we conclude that Pr(no collision) ≥ 2 if  1   2   q−1  1 q 1 p q picks, Pr(no collision) = 1 · 1 − |D| · 1 − |D| ··· 1 − |D| 0 < q ≤ 2 + 4 + 2|D| ln 2 ⇔ (roughly) 0 < q ≤ 1.1774 |D|

G. Pelosi, A. Barenghi (DEIB) Hash Functions 11 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 12 / 37 Security of Hash Functions Security of Hash Functions Birthday Paradox Summing up The Birthday paradox is the somewhat surprising answer to: How many people must I gather in a room in order to have a probability of 50% To obtain a sound cryptographic hash function we need to find an that any two of them share the same birthday? efficiently computable map h : M → D such that: It is computationally unfeasible to find a first/second preimage This is the probability to find at least one collision on the hash function in “less than” |D| calls to the hash function BirthdayOf(·) It is computationally unfeasible to find a collision√in less than p p 1 log |D| 1.17 |D| calls to the hash function (≈ |D| = 2log2 |D| = 2 2 2 if we do not consider leap years and assume√ that the birthdays are uniformly distributed, the answer is: ... ≈  1.1774 · 365  = 23 calls) Rule of thumb: take |D| ≥ 2160 (i.e., a hash with digest ≥160-bit) to Compare the above question with the following one: avoid a bruteforce approach to finding collisions (80-bit security) how many people must I gather in a room in order to have a probability >50% Note: Collision Res. ⇒ 2nd Preimage Res. ⇒ Preimage Res. that one of them has the same birthday as me? Note: Real hash functions have necessarily a very large number of This is the probability to find a 2nd preimage of the BirthdayOf(·) collisions. Their unforgeability comes from the fact that it is just function unfeasible to find collisions on purpose q−1 1 365 I need a lot more people: ... ≈ 365 ≥ 2 ⇔ q ≥ 1 + 2 , q ≥ 183

G. Pelosi, A. Barenghi (DEIB) Hash Functions 13 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 14 / 37

Use Scenarios for Hash functions Use Scenarios for Hash functions

Common Scenarios - 2 Efficient digital signatures: instead of signing the file f , sign h(f ) Common Scenarios - 1 Safe password storage: instead of storing a password p, store Integrity of files: hash functions may be used to check the integrity d = h(p). Even if an attacker retrieves d, he cannot find p of a network-transmitted file through having a very fast hash function may be an issue: it allows to quickly Downloading both the file f and d=h(f ) from the providing server test a set of guesses for p extracted from a dictionary Computing h(·) on the downloaded file and checking that it matches Commitment Schemes. Alice wants to commit to the choice of a with d value v, without revealing it instantly. She must be able to prove that Pseudo-unique file indexing: a unique index for a file f is obtained she committed to that value as h(f ) (e.g., video references on youtube, tiny-urls, Dropbox) Example: Alice and Bob want to perform a coin toss via e-mail: As collisions do not practically occur, Alice bets on a value, say, “heads”, and sends to Bob then h(f ) is a good unique index for f ha=h(cat(“heads”, ra)) Bob bets on a value say, “tails”, sends to Alice hb=h(cat(“tails”, rb)) The coin is tossed, and then Alice and Bob broadcast the values of ra, rb and the values (ha, hb) they committed to, allowing the commitment check

G. Pelosi, A. Barenghi (DEIB) Hash Functions 15 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 16 / 37 Common mistakes Hash Functions Design

Why employing error correction codes is not a good idea? How to design an hash function? A common mistake is to believe that an error correction code (e.g., The design of an infinite domain (unlimited message length) hash CRC-32) can be a good replacement for a cryptographic hash function is done in two phases: Despite providing integrity checking, their use is a critical mistake as 1 designing a finite domain compression function 2 designing a scheme to combine the compression function in order to it is trivial to find a preimage from a codeword! act on an infinite message space Encrypting the resulting code with a symmetric cipher may be a Willing to formalize the problem: stopgap solution, as long as the encryption is not xor linear: for simplicity, we consider bit strings as inputs/outputs a no stream cipher or CTR-like modes of operation should be used ! given a compression function h : M 7→ D, each msg is encoded by Bottom line: if you need a cryptographic hash function, do not use l + t bits, while each digest is encoded using only l bits: makeshift solutions, as they will break h : {0, 1}l+t 7→ {0, 1}l we want to construct a hash function with unlimited domain a As it was done, f.i., in WEP he : {0, 1}∗ 7→ {0, 1}s , for some integer s (digest size in bits)

G. Pelosi, A. Barenghi (DEIB) Hash Functions 17 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 18 / 37

Hash Function Design Hash Function Design

Digesting the blocks

From an unlimited string to a sequence of blocks Once we have the block sequence m1, m2,..., mr , we need to devise a scheme to feed them into h(·, ·) We want to compute he : {0, 1}∗ 7→ {0, 1}s with a sequence of applications of h : {0, 1}l+t 7→ {0, 1}l Let IV be some public initial value encoded with l-bit, then compute The input message should thus be preprocessed so that the length is a multiple of t z0 ← IV Use an injective padding function to obtain an input string with zi ← h (zi−1, m¯i ) , 1 ≤ i ≤ r length multiple of t (e.g.: concatenate enough zeros, or the size of the message followed by enough zeros) where the first l-bit parameter of h(·, ·) is called inner state, while the second t-bit parameter is the next message chunk Split the padded input string m into substrings with t bits each: After the last sub-string,m ¯r , has been processed, a further output m1, m2,..., mr transformation g : {0, 1}l → {0, 1}s can be applied (usually, it’s the identity map, provided l=s)

G. Pelosi, A. Barenghi (DEIB) Hash Functions 19 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 20 / 37 Hash Function Design Compression Function Design

Merkle-Damg˚ardConstruction Key points The design of compression functions h is younger than the one for block ciphers, thus it tries to reuse some of its key ideas The high level structure is the same of block ciphers: employ a round based structure with a small analyzable round primitive The inner state of a hash is initialized with constant values, and contains the digest after the execution The padded message to be hashed is expanded in a larger message, known as message schedule, typically through combining the words The vast majority of currently used hash functions are based on this via linear operations (XOR and shift/rotate) construction The purpose of the round in a hash function is to blend a part of the The most common attacks against this structure rely on appending a message schedule with the inner state (in a nontrivial way) chosen block to the original message to obtain a collision

G. Pelosi, A. Barenghi (DEIB) Hash Functions 21 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 22 / 37

Compression Function Design principles Secure Hash Algorithms

The most common hashing algorithms are: The collision resistance is achieved through mixing the message into MD4 (1990): 5127→128 bit digest, with 2 msg the state so that a random change in the message, triggers a bit flip 1 MD5 (1992): 5127→128 bit digest, arbitrary collision attack in every bit of the digest with probability as close as possible to 2 (this is commonly known as avalanche effect) appending one block The message mixing is usually performed through a nonlinear addition SHA-0 (1993): 5127→160 bit digest, NSA reported it as weak, (f.i. the addition modulo 232) during the round function proposed a corrected version SHA-1 The message addition may involve the whole state (similar to the SHA-1 (1995): 5127→160 bit digest, same as SHA-0 with an single bit SPN design of a cipher), or modify only a part of it and shuffle the rotation in the message schedule, 69 parts (similar to the Feistel Network design) collisions reported w/ ≈ 2 hash calls ( 2digest-bit-length/2 = 280) The number of rounds is higher than the block ciphers (numbers in (ref., X. Wang et al., Finding Collisions in the Full SHA-1, CRYPTO’05) the [64 − 80] range are not uncommon) as there is no secret key involved in the computation SHA-2 (2001): {512, 1024} 7→ {224, 256, 384, 512} bit digest, shares structure with SHA-1, different mixing functions

G. Pelosi, A. Barenghi (DEIB) Hash Functions 23 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 24 / 37 Compression Function Design Hash functions from block ciphers

SHA-1 Compression function It is possible to employ a common in place of the ad-hoc designed compression function h ABCDE To this end, the message should be padded to be a multiple of the Yellow boxes are length of the cipher block, and split accordingly in a sequence of bitwise rotations blocks x1,..., xr F W is a 32-bit word of t The proposed schemes maintain an inner state which is initialized to a <<< the message 5 constant known value H0 = IV Kt is a 32-bit W t The update rule of the inner state takes a block of the padded <<< round-dependent 30 message xi and computes Hi = f (Hi−1, xi ) K t constant word The final digest of the message is: H . F is a round dependent r+1 bitwise function Note: there are several ways to use the chosen block cipher primitive ABCDE as a compression function f

G. Pelosi, A. Barenghi (DEIB) Hash Functions 25 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 26 / 37

Hash functions from block ciphers Hash functions from block ciphers

DM (Davies-Meyer) Scheme MMO (Matyas-Meyer-Oseas) Scheme

Figure: f (xi , Hi−1) = EHi−1 (xi ) ⊕ Hi−1 Figure: f (xi , Hi−1) = Exi (Hi−1) ⊕ Hi−1

Solves the issue of generating collision via appending a block Avoids using the message as the key of the block cipher, uses the previous state as key

G. Pelosi, A. Barenghi (DEIB) Hash Functions 27 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 28 / 37 Hash functions from block ciphers Keyed Hash Functions Design

MP (Miyaguchi-Preneel) Scheme Keyed Hash Basics Incorporate a secret key into an unkeyed hash function by including the key as part of the message. If the employed hash function is constructed with the Merkle-Damg˚ardstructure, care is needed to choose where the key is added

H(m) = H(hm1, m2,..., mr i) = = h(... h(h(IV, m1), m2),..., mr ) |mi | = t-bit

Figure: f (xi , Hi−1) = EHi−1 (xi ) ⊕ xi ⊕ Hi−1 The key should be added neither as a prefix nor as a suffix of the message. Also adds the message to the output of the single compression function

G. Pelosi, A. Barenghi (DEIB) Hash Functions 29 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 30 / 37

Keyed Hash Functions Keyed Hash Functions

Key Prefix attack against the Merkle-Damg˚ardstructure Key Suffix attack against the Merkle-Damg˚ardstructure Given (m, d), with m being a (padded) message and d=H(k||m) Given (m, d), d=H(m||k), an attacker can re-use d as a digest for an attacker can provide a valid message-digest pair (m0, d0) with any message m0 colliding with the original one, i.e.: H(m)=H(m0) a message m0 of his choice without knowing the secret key k. If the length of the known message m is a multiple of t bits: m = m1||m2 ... ||mr , 0 Choosing an arbitrary message m , the attacker may forge a the attacker can consider the last block mr of the msg and the l-bit block y derived from keyed-digest d0 without knowing k as follows: the MD construction and can observe that: H(m)=h(y, mr ), with y=h( ...h(IV, m1), ... , mr−1), while H(m||k)=h(h(y, m ), k) d0 = h(d||m0) as r 0 0 If the attacker obtains a msg m colliding with m, that is: H(m )=H(m)=h(y, mr ) 0 0 h(d||m0) = h( H(k||m) || m0 ) due to MD= construction H(k||m||m0) The keyed digest of m would be: H(m ||k)=h(h(y||mr )||k)=d, thus (m0, d) is another valid message-digest pair!

(the symbol ‘||’ means concatenation) When m is shorter than t bits, the above collision attack cannot be mounted!... because 0 0 (m , d ) is another valid message-digest pair! part of the key k would become part of the first t-bit chunk in input to h(·, ·)

G. Pelosi, A. Barenghi (DEIB) Hash Functions 31 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 32 / 37 Message Authentication Codes Common Ways to Create MAC (1)

A common and widely standardised way to construct a MAC is the HMAC (keyed-Hash Message Authentication Code): A keyed hash function is often used as a message authentication code (MAC) The construction allows to build a MAC from any unkeyed hash function A MAC can be appended to a sequence of plaintext blocks Example based on SHA-2, with key size 512 bits: Used to prove to the receiver that the given plaintext originated from ipad=512 bits constant 0x363636..36 the rightful sender and was not tampered with opad=512 bits constant 0x5c5c5c..5c This is the original scenario that I gave at the beginning of lecture T = SHA-2((k ⊕ ipad)||m) HMACk (m) = SHA-2((k ⊕ opad)||T ) The scheme relies on the composition of two applications of a keyed hash function

G. Pelosi, A. Barenghi (DEIB) Hash Functions 33 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 34 / 37

Common Ways to Create MAC (2) Why design a MAC?

If design constraints do not allow to implement a separate compression function (e.g., SHA-2) to build a HMAC, a possibility is to reuse a block cipher as “compression” function. A first attempt was the CBC-MAC: Use a block cipher in CBC mode, keep only the last encrypted block as MAC Why do we use MACs instead of encrypting the digest of an Problem: if d is a valid MAC for m = cat(m1,..., mn), then d is a unkeyed hash function through employing a block cipher? valid MAC also for m = cat(m1,..., mn, IV ⊕ d ⊕ m1,..., mn) To solve the problem with CBC-MAC it is possible to employ a whitening step, which involves adding with an XOR the key to the whole input message, before applying the block cipher encryption A proper way to do so has been standardized as CMAC, and is currently accepted for the IpSec

G. Pelosi, A. Barenghi (DEIB) Hash Functions 35 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 36 / 37 Why design a MAC?

Why do we use MACs instead of encrypting the digest of an unkeyed hash function through employing a block cipher? Block size of hash primitives is larger than block cipher ones, thus the schemes based on a general compression function can exhibit a better collision resistance. The computation of a MAC should be more efficient than “hash and then encrypt”. Software and Hardware crypto-libraries usually includes hash functions because there is no country with a regulation that explicitly forbid the export/import of them USA applies export restrictions on the design or implementation of ciphers Russia forbids the import of any cryptographic solution for official or public use – The ISO standard “GOST R” was entirely designed in Russia to have a block cipher equivalent to either DES or AES

G. Pelosi, A. Barenghi (DEIB) Hash Functions 37 / 37