Hash Functions

Overview Hash Functions Lesson contents Definition and properties of cryptographic hash functions Gerardo Pelosi Design principles of hash functions Department of Electronics, Information and Bioengineering (DEIB) Design principles of compression functions Politecnico di Milano Common Hash functions gerardo.pelosi - at - polimi.it Message Authentication Codes G. Pelosi, A. Barenghi (DEIB) Hash Functions 1 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 2 / 37 Hash Functions Hash Functions Outline Sample integrity-check scenario Hash functions h(·) are the publicly known algorithms to provide data Given message m, compute d = h(m) and store it in a safe place integrity To check for integrity, recompute h(m) and check against d Idea: compute a fingerprint of a ptx through a non-injective map the hash computation must be efficient, deterministic and practically If it matches, then m is still the same as before, otherwise unforgeable either m or d was tampered with or an error occurred (resp. was injected) over the communication The output size is constant (e.g., 160 bits) channel Common names for the output are: message digest, hash or Note: the computation of h(m) does not include any key! cryptographic checksum G. Pelosi, A. Barenghi (DEIB) Hash Functions 3 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 4 / 37 Hash Functions Security of Hash Functions Unkeyed Hash Functions Formal Definition Consider h : M ! D, an unkeyed hash function: the following A keyed hash function is a 4-uple (M; D; K; H), where: problems should be (computationally) impossible to solve: 1 M is a set of input messages (could be unbounded) 1 Preimage Problem: given d2D, find m2M s.t. h(m)=d 2 D is a finite set of digests, with jMj ≥ jDj One-way property of h: you cannot reconstruct a valid m from d 3 K is a finite set of keys (the keyspace) 2 Second Preimage Problem: given m12M, d=h(m1), 4 H is a finite set of hash functions find m22M s.t. m16=m2, h(m2)=h(m1)=d Weak Collision Resistance property of h: you cannot find a message For each key k2K, there is a hash function hk :M!D in H hashing to the same digest of an already known message A pair (m; d) is called a valid pair under key k, if hk (m) = d 3 Collision Problem: find m1; m22M s.t. m16=m2, h(m1)=h(m2) An unkeyed hash function is a hash family with a known fixed key k (Strong) Collision Resistance property of h: you cannot find two arbitrary messages with the same digest G. Pelosi, A. Barenghi (DEIB) Hash Functions 5 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 6 / 37 Security of Hash Functions Weak Collision Resistance is important because: A perfect hash: the Random Oracle If we could exhibit another message (or a small set of messages) with What does a \perfect hash function" (i.e., one-way, weak collision the same digest of a given input, the usefulness of the hash function resistant, and strong collision resistant) look like? could be at risk! Ideally, a perfect hash should be a deterministic function o(·) which returns a random string for every possible input string (Random Collision Resistance is important because: Oracle) if we could find collisions at our will, we would have the ability to Any call to the oracle with the same input m will yield o(m) build messages we can subsequently repudiate { as we would be o(m1) = o(m2) if and only if m1=m2 always able to show multiple plausible messages, of our choice, for a Probability of picking randomly a message with a specific digest d is 0 certain digest The message space is infinite ) the digest space is infinite too! We need an infinite amount of memory to store all the mappings! Not practical, but useful to evaluate the security of real hash functions G. Pelosi, A. Barenghi (DEIB) Hash Functions 7 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 8 / 37 Security of Hash Functions Security of Hash Functions Practical 1st Preimage Problem - (black box analysis) A real hash function h(·) does not have an infinite digest space Practical 2nd Preimage Problem - (black box analysis) Trying to find a preimage for d 2 D calling the hash function with a Let m2M, d=h(m) be the msg-digest pair for which we want a 2nd 1 random input, I will obtain the preimage with probability jDj preimage (i.e., find m02M s.t. m06=m, h(m0)=h(m)=d) q Probability of calling h(·) q times without success: 1 − 1 Pick q msg.s at random: m; m1; m2;:::mq−1 s.t. m6=mi , jDj i2f1;:::; q−1g Probability of getting at least one valid preimage, mi , for d with q For each chosen message mi , compute h(mi ) calls is: If one of the h(m ) is d, return m ; otherwise fail 1 q i i Pr(mi s:t: d = h(mi )) = 1 − 1 − Again, the probability of getting at least one valid preimage of d is: jDj q−1 jDj q jDj!1 − q (1 − 1 )q=(1 + −1 ) jDj ≈ e jDj (notable limit), and 1 q − 1 jDj jDj Pr(8 i mi 6= m; h(mi ) = h(m)) = 1 − 1 − ≈ q !0 jDj jDj − q jDj e jDj ≈ 1 − q (notable limit), thus if q is much smaller than jDj (i.e., q jDj) we have: jDj q Pr(m s:t: d = h(m )) ≈ i i jDj G. Pelosi, A. Barenghi (DEIB) Hash Functions 9 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 10 / 37 Security of Hash Functions Security of Hash Functions Practical Collision Problem - (black box analysis) Finding the number of trials, q needed to get a collision a a − x The approach to find a collision in a hash function is From basic calculus (Taylor series), recall that 1 − x ≤ e Pick q messages at random mi , i 2 f1;:::; qg Consequentially, we can rewrite the previous result as follows: If any two digests, h(mi ) and h(mj ) with i 6= j (i; j 2 f1;:::; qg), are q q−1 equal then return (mi ; mj ); otherwise, fail Y i Y − i − 1 Pq−1 i − q(q−1) Pr(no coll.) = 1 − ≤ e jDj = e jDj i=0 = e 2jDj jDj q = 1, Pr(no collision) = 1 i=0 i=0 1 q = 2, Pr(no collision) = 1 · 1 − jDj We look for: Pr(no coll.) ≥ 50% , Pr(at least one coll.) ≤ 50% q(q−1) 1 2 − 2jDj 1 q(q−1) q = 3, Pr(no collision) = 1 · 1 − jDj · 1 − jDj we find that: e ≥ 2 ) 2jDj ≤ ln 2 ::: 1 Solving for q we conclude that Pr(no collision) ≥ 2 if 1 2 q−1 1 q 1 p q picks, Pr(no collision) = 1 · 1 − jDj · 1 − jDj ··· 1 − jDj 0 < q ≤ 2 + 4 + 2jDj ln 2 , (roughly) 0 < q ≤ 1:1774 jDj G. Pelosi, A. Barenghi (DEIB) Hash Functions 11 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 12 / 37 Security of Hash Functions Security of Hash Functions Birthday Paradox Summing up The Birthday paradox is the somewhat surprising answer to: How many people must I gather in a room in order to have a probability of 50% To obtain a sound cryptographic hash function we need to find an that any two of them share the same birthday? efficiently computable map h : M ! D such that: It is computationally unfeasible to find a first/second preimage This is the probability to find at least one collision on the hash function in \less than" jDj calls to the hash function BirthdayOf(·) It is computationally unfeasible to find a collisionpin less than p p 1 log jDj 1:17 jDj calls to the hash function (≈ jDj = 2log2 jDj = 2 2 2 if we do not consider leap years and assumep that the birthdays are uniformly distributed, the answer is: ::: ≈ 1:1774 · 365 = 23 calls) Rule of thumb: take jDj ≥ 2160 (i.e., a hash with digest ≥160-bit) to Compare the above question with the following one: avoid a bruteforce approach to finding collisions (80-bit security) how many people must I gather in a room in order to have a probability >50% Note: Collision Res. ) 2nd Preimage Res. ) Preimage Res. that one of them has the same birthday as me? Note: Real hash functions have necessarily a very large number of This is the probability to find a 2nd preimage of the BirthdayOf(·) collisions. Their unforgeability comes from the fact that it is just function unfeasible to find collisions on purpose q−1 1 365 I need a lot more people: ... ≈ 365 ≥ 2 , q ≥ 1 + 2 ; q ≥ 183 G. Pelosi, A. Barenghi (DEIB) Hash Functions 13 / 37 G. Pelosi, A. Barenghi (DEIB) Hash Functions 14 / 37 Use Scenarios for Hash functions Use Scenarios for Hash functions Common Scenarios - 2 Efficient digital signatures: instead of signing the file f , sign h(f ) Common Scenarios - 1 Safe password storage: instead of storing a password p, store Integrity of files: hash functions may be used to check the integrity d = h(p). Even if an attacker retrieves d, he cannot find p of a network-transmitted file through having a very fast hash function may be an issue: it allows to quickly Downloading both the file f and d=h(f ) from the providing server test a set of guesses for p extracted from a dictionary Computing h(·) on the downloaded file and checking that it matches Commitment Schemes.

Hash Functions

Hash Functions

Horner's Method: a Fast Method of Evaluating a Polynomial String Hash

Why Simple Hash Functions Work: Exploiting the Entropy in a Data Stream∗

CSC 344 – Algorithms and Complexity Why Search?

4 Hash Tables and Associative Arrays

SHA-3 and the Hash Function Keccak

Choosing Best Hashing Strategies and Hash Functions

DBMS Hashing

Hash Tables & Searching Algorithms

Introduction to Hash Table and Hash Function

PHP- Sha1 Function

Chapter 5 Hash Functions