<<

Chapter 5

Message Integrity

1 Outline • Code • and • SHA-1 and SHA-3 • HMAC

2 Message Authentication

• For passive attacks: • For active attacks: message authentication Message authentication can verify the authenticity and integrity of received messages, and also can verify the order and timeliness of messages.

The authentication mechanism needs to generate an authenticator/tag. Authenticator/tag: The value used to authenticate the message. The methods of generating the authenticator are: message authentication code (MAC) and hash function

3 Message authentication code

A fixed-length value used as an authenticator/tag generated from the message by a -controlled public function, or a cryptographic checksum.

Shared k

ktag message m k Alice: m Bob

Generate tag: Verify tag: tag  S(k, m) V(k, m, tag) = ‘yes’ ?

4 Remarks

If only Alice and the Bob know k, and the MAC of Bob is consistent with the received MAC. 1. Bob believes that the message sent by Alice has not been tampered. 2. Bob believes that Alice is not impersonating.

The MAC algorithm is similar to the encryption algorithm except that the MAC algorithm does not have to be reversible, and thereby it is less susceptible to attacks than the encryption algorithm.

MAC provides authentication without confidentiality. Examples: Protecting public binaries on disk. Protecting banner ads on web pages. 5 sender Remarks receiver compare

(a) Message authentication

compare

(b) Authenticity and confidentiality: Authentication on plaintext

compare

(c) Authenticity and confidentiality: Authentication on 6 Attacks on MAC

(1) The cost of the exhaustive search attack for MAC is greater than the exhaustive search for encryption algorithm with the same length key.

Given M||CK(M) =MAC, to find K (2) Some attack methods do not need to find the key used to generate the MAC. An attacker can fake a message for the given authentication tag.

7 Attack MAC (Find K)

Round 1: Given M1 and MAC1, where MAC1 = CK(M1), calculate MACi = CKi (M1) for all 2k possible keys to yield 2k-n possible keys.

Round 2: Given M2 and MAC2, where MAC2 = CK (M2), calculate MACi=CKi(M2) for 2k-n possible keys obtained in round 1, to obtain 2k-2×n possible keys. In this way, if k = αn, the above attack mode requires α rounds on average. For example, if the key length is 80 bits and the MAC length is 32 bits, then about 248 possible keys will be generated in round 1, 216 possible keys will be generated in round 2, and the correct K can be found in round 3.

8 Attack MAC (Fake Message)

Let M=(X1‖X2‖...‖Xm), where Xi (i=1,...,m) is 64-bit block,

(MXX ) 1 2  X m

CMK() E K   () M  The encryption algorithm is ECB-DES. Thus, the key size is 56 bits and the

MAC size is 64 bits. If the adversary gets M‖CK(M), then the adversary will need to do 256 using exhaustive search. However, the adversary can also hack the system in the following way:

X1 ,, X m1  Y1 ,,Ym1 Ym Y 1Y2 Ym1  (M ) X  Y m m M and M’ has the same MAC 伪造一新消息:Forge a new message

 9 M  Y1 Y2  Ym1 Ym Requirements of MAC

MAC should satisfy the following requirements, assuming that the adversary knows the MAC algorithm C but does not know the key K:

• If the adversary gets M and CK(M), constructing a new message M' that satisfies CK(M') = CK(M) is computationally infeasible.

• CK(M) is uniformly distributed in the following sense: Two messages M -n and M‘ are randomly selected, Pr[CK(M)=CK(M')]=2 where n is the size of MAC. • If M' is a permutation of M, i.e., M' = f (M), for example, f is to insert one -n or more bits, then Pr[CK(M) = CK(M')] = 2 .

10 Data Authentication Algorithm One of the most widely used message authentication codes.

• The algorithm is based on the DES algorithm with the CBC mode, • Its initial vector takes a zero vector.

• The data is divided into 64-bit blocks D1, D2, ..., DN. • If the last block is less than 64 bits, some 0s can be padded on the right.

11 Data Authentication Algorithm

O1 EDK ( 1 )

O2 EDK ( 2  O 1 )

O3 EDK ( 3  O 2 ) 

ON ED KN(  O N 1 )

The data authentication code is ON or the leftmost m bits of ON, where 16 ≤ m ≤ 64.

12 Outline • Message Authentication Code • Hash Function and Birthday Attack • SHA-1 and SHA-3 • HMAC

13 Definition

A hash family is a four-tuple I=(M, T, K, H), where the following conditions are satisfied: 1. M is a set of possible messages; 2. T is a finite set of possible message digests or authentication tags; 3. K, the key space, is a finite set of possible keys;

4. For each k K, there is a hash function Hk  H, each Hk: M → T. • Unkeyed Hash Function H: M → T

• Keyed hash function (Message Authentication Code) Hk: M → T

14 Hash Function

By a hash function, it means a map H: {0,1}*{0,1}n, n  N. hash function maps arbitrarily long strings to strings of fixed length.

Example The map that sends * b1∙∙∙bk in {0, 1} to

b1∙∙∙ bk is a hash function.

15 Preimage Preimage A hash function H: M → T and an element yT Find: m M such that H(m)=y Preimage-resistance: it is infeasible to find mM such that H(m)=y

Second Preimage A hash function H: M → T and an element mM Find: m’ M such that m’≠m and H(m’)=H(m) Second Preimage-resistance: It is infeasible to find m’ M such that m’≠m and H(m’)=H(m) 16 Let H: M T be a hash function ( |M| >> |T| )

A collision for H is a pair m0 , m1  M such that: H(m0) = H(m1) and m0  m1 A function H is collision resistant if for all (explicit) “eff” algs. A: AdvCR[A, H] = Pr[A outputs collision for H] is “neg”. Example: SHA-256 (outputs 256 bits) • Stronger than preimage resistance?

• Stronger than second preimage resistance? 17 Collision Resistance

A collision of H is a pair (m, m)M 2 for which m  m and H(m) = H(m). There are collisions of all hash functions and compression functions because they are not injective. The function H(m) is weak collision resistant if it is infeasible to compute a collision (m, m) for a given mM. The function H(m) is (strong) collision resistant if it is infeasible to compute any collision (m, m) of H.

18 Birthday Attack We describe a simple attack on hash functions H: *n called the birthday attack. It attacks the strong collision resistance of H. The attack is based on the birthday paradox.

19 Birthday Paradox Birthday paradox: Suppose a group of people are in a room. What is the probability that two of them have the same birthday?

Suppose there are n birthdays and that there are k people in the room. An k elementary event is a tuple (b1, …, bk)  {1, 2, …, n} . k The birthday of the ith person is bi, 1  i  k, so we have n elementary events. We assume that those elementary events are equally probable. Then the probability of an elementary events is 1/ nk. Let p be the probability that two people in the room have the same birthday. Then with probability q = 1 - p any two people have different birthdays.

20 Birthday Paradox

k Let E be the set of all vectors (g1, …, gk)  {1, 2, …, n} , whose entries are pairwise different. Then E models the Birthday paradox. Let |E| denote the number of element in E. Then k1 |E| = (n - 0) (n - 1) … (n -(k - 1)) =  i  0 ( n  i ) and q = (n - 0) (n - 1) … (n -(k - 1)) / nk 1 k1 k1 i =  i  0 ( n  i ) =  i  1 ( 1  .) nk n Since 1 + x  ex holds for all real numbers, therefore (e-x = 1-x+x2/2!-x3/3!+…) k1 k1 i /n  i / n -k(k-1)/2n q  i1 e = e i  1 = e .

21 Birthday Paradox Assume that q = 0.5, then we have q = 0.5  e-k(k-1)/2n  ln (2-1) = ln e-k(k-1)/2n  ln 2 = k(k-1)/2n  k(k-1) = 2n ln 2 (ln 2 = 0.693)  k2 - k - 2n ln 2 = 0  k = [1 + (1 + 8n ln 2)0.5)] / 2

If k  (1 + (1 + 8n ln 2)0.5) / 2, then q  0.5.

22 Birthday Paradox We describe a simple attack on hash functions H: *n (hence the result consists of ||n possible strings) called the birthday attack. It attacks the strong collision resistance of H. The attack is based on the birthday paradox. Assume that  is an alphabet. Then strings from * can be chosen such that the distribution on the corresponding hash values is the uniform distribution. If k strings in x * are chosen, where k  (1 + (1 + 8 ||n ln 2)0.5) / 2, then the probability of two hash values being equal exceeds 0.5. 23 Birthday Paradox Assume  = {0,1}, then k  (1 + (1 + 8 ∙ 2n ∙ ln 2)0.5) / 2 is sufficient to find two strings having the same hash value with probability greater than 0.5. By the table below, if we compute a little more than 2n/2 hash values, then the birthday attack will succeed with probability > 0.5. Today, n  128 or even n  160 is required to prevent the birthday attack. n (bit length of the hash values) 50 100 150 200

n/2 log2 k  n/2 (k  2 ) 25.24 50.24 75.24 100.24 24 Collision resistance: review

Let H: M T be a hash function ( |M| >> |T| )

A collision for H is a pair m0 , m1  M such that: H(m0) = H(m1) and m0  m1

Goal: to degign collision resistant (C.R.) hash functions

Step 1: given C.R. function for short messages, construct C.R. function for long messages 25 Merkle-Damgard iterated construction

m[0] m[1] m[2] m[3] ll PB

IV H(m) h h h h (fixed) H0 H1 H2 H3 H4

Given h: T × X T (compression function)

≤L we obtain H: X T . Hi - chaining variables PB:1000…0 padding ll msg block len If no space for PB add another block 64 bits 26 MD collision resistance Thm: if h is collision resistant then so is H. Proof: collision on H collision on h Suppose H(M) = H(M’). We build collision for h.

IV = H0 , H1 , … , Ht , Ht+1 = H(M)

IV = H0’ , H1’ , … , H’r, H’r+1 = H(M’)

h( Ht, Mt ll PB) = Ht+1 = H’r+1 = h(H’r, M’r ll PB’)

27 collision on H collision on h m1 m2 m3 m4 IV H(M) (fixed) h h h h

H0 H1 H2 H3 H4

m’1 m’2 m’3 m’4 IV H(M’) (fixed) h h h h

H0 H1 H2 H3 H4

Let |M| = |M’|, M≠M’, and H(M) = hi(mi, Hi-1) with m0 = IV Suppose H(M) = H(M’) . We build a collision on h.

Case 1: mi ≠ m’i or Hi-1 ≠ H’i-1. But since h(mi, Hi-1) = h(m’i, H’i-1) there is a collision in h and we are done. Else recurse 28 collision on H collision on h m1 m2 m3 m4 IV H(m) (fixed) h h h h H0 H1 H2 H3 H4

m’1 m’2 m’3 m’4 IV H(m) (fixed) h h h h H0 H1 H2 H3 H4

Let |M| = |M’|, M≠M’, and H(M) = hi(mi, Hi-1) with m0 = IV Suppose H(M) = H(M’) . We build a collision on h.

Case 2: mi = m’i and Hi-1 = H’i-1 for all i. But then M = M’, violating our assumption. 29 Hash Fnc. from a Step 2: construct compression function h: T × X T E: K× {0,1}n {0,1}n a block cipher. The Davies-Meyer compression function h(H, m) = E(m, H) H mi E Hi

Thm: Suppose E is an ideal cipher Best possible !! (collection of |K| random perms.). n/2 Finding a collision h(H,m)=h(H’,m’) takes O(2 ) evaluations of (E,D). 30 MD5 64 rounds of compression Rivest, 1991 function: Based on Davies-Meyer const. Very popular until 2010. 2004: First collision attacks 2008: Practical collision attack; SSL cert. with same MD5 hash. ~2010: Forged Microsoft MD5 certificates used in malware Preimage resistance: Mostly ok. Input: arbitrary message size Block: 512 bits Output: 128 bits. 31 Outline • Message Authentication Code • Hash Function and Birthday Attack • SHA-1 and SHA-3 • HMAC

32 SHA-1 Designed by NSA; based on Rivest’s MD4 & MD5 80 rounds of: designs SHA 1993; SHA-1 1995 160-bit output size 2005: Some flaws discovered. SHA-2: 256- and 512-bit extension; secure

Input: Any message less than 264 bits, divided into 512-bit blocks.

Output: 160-bit message digest.

33 SHA-1 Algorithm

34 1. Message padding Bits Padding: to pad the bits so that the message is 448 bits modulo 512, i.e., the message size after padding is a multiple of 512 minus 64, and the remaining 64 bits are used in step 2.

Step 1 is necessary, even if the message size has been met. For example, if the message length is 448 bits, the padding size is 512 bits, and the full size becomes 960 bits. The number of padding bits is larger than or equal to 1 and less than or equal to 512. The padding method is: the first bit is 1, and the others are 0.

35 2. Append Length Append Length: the left 64 bits is used to denote the message size before padding in a big-endian manner.

Big-endian stores the most significant byte in the low address byte. The opposite storage method is called the little-endian method.

The message size is a multiple of 512 (set to L times)

The message is divided into Y0, Y1,..., YL-1 Each block is 16 32-bit words The total number of words in the message is N=L×16 The message is represented in words as M[0,...,N-1].

36 3. MD Buffer Initialization

MD buffer initialization: Use a 160-bit buffer to store intermediate results and output hash values.

The buffer can be represented as five 32-bit long registers (A, B, C, D, E), each storing data in big-endian mode.

The initial values are A=67452301, B=EFCDAB89, C=98BADCFB, D=10325476, E=C3D2E1F0.

37 38 4. 80 Rounds

The output of the fourth round is added to the input CVq of the first 32 round to produce CVq+1, where the addition modulo 2 is to add the word in each register and each word of the CVq.

5. After the L blocks are compressed, the output of the last block is the 160-bit message digest.

39 The processing from step 3 to step 5 can be denoted as follows:

CV0=IV; CVq+1=SUM32(CVq, ABCDEq); MD=CVL

• IV: Initial value of ABCDE. th • ABCDEq: output of the q block after the last round of processing • L: the number of block for the message (including padding and length fields) • SUM32: is the addition modulo 232 of the corresponding word • MD: The final digest value.

40 • The compression function of SHA consists of 4 rounds of processing. Each round of processing consists of 20 iterations. The form of each iteration is

A, B, C, D, E: 5 words of the buffer t: number of iterations (0 ≤ t ≤ 79)

ft(B,C,D): basic logic functions used in the t-th iteration CLSs: left loop shift s bit Wt: a 32-bit word derived from the current 512-bit block Kt: constant for addition +: addition modulo 232.

41 One iteration

42 ABCDE,,,, ( E fBCDt (,,)  CLSA5 ()  W tt K ),, ACLS 30 (),, B CD Basic Logic function f

Input: 3 32-bit words Output: 1 32-bit word

Steps Function Operation

0  t  19 f1  ft (B,C, D) (B  C)  (B  D)

20  t  39 f2  ft (B,C, D) B  C  D 40  t  59 f3  ft (B,C, D) (B  C)  (B  D)  (C  D) 60  t  79 f4  ft (B,C, D) B  C  D

43 How to generate W0,W1,…,W15,W16,W17,…,W79 from the current 512-bit block

WCLSWt1( tttt 16  W  14 W  8 W  3 )

44 Constants for Addition

45 Security of SHA-1

• 2004, Rijmen published an attack which finds collisions with a computational effort of fewer than 280operations • 2004, Xiaoyun Wang reduce the complexity for finding a collision in SHA- 1 to 263 • 2008, Stéphane Manuel reported hash collisions with an estimated theoretical complexity of 251 to 257 • 2015, Marc Stevens published a freestart collision attack on SHA-1's compression function that requires only 257 SHA-1 evaluations • February 23, 2017, CWI and Google announced the SHAttered attack, in which they generated two different PDF files with the same SHA-1 hash in roughly 263.1 SHA-1 evaluations

46 SHA-3 Public competition by NIST, similar to AES: NIST’s request for proposals (2007) 64 submissions (2008) 14 semi-finalists (2009) 5 finalists (2010) Winner: Keccak (2012) Designed by Bertoni, Daemen, Peeters, Van Assche. Based on “sponge construction”, a completely different structure. 47 Specification of Keccak

48 Structure of Keccak

f: θ, ρ, π, χ, ι steps 49 Parameters of Keccak

50 Cube of Keccak

51 Parts of State Cube for Linear Transforms

52 The ᵡ function

53 Four SHA-3 Functions

54 Speed Comparisons

Algorithm Speed (MiByte/s.) AES-128 / CTR 198 MD5 335 SHA-1 192 SHA-256 139 SHA-3 ~ SHA-256

Crypto++ 5.6 benchmarks, 2.2 GHz AMD Opteron 8354 NIST expects SHA-2 to be used for the foreseeable future. SHA-3: A companion algorithm with a different structure and properties. 55 Outline • Message Authentication Code • Hash Function and Birthday Attack • SHA-1 and SHA-3 • HMAC

56 The Merkle-Damgard iterated construction

m[0] m[1] m[2] m[3] ll PB

IV (fixed) H(m) h h h h H0 H1 H2 H3 H4

Thm: h collision resistant implies H collision resistant Can we build a MAC out of H? 57 Attempt 1 Let H: X≤L T be a Merkle-Damgard hash, and: S(k,m) = H(k||m) is this secure? no! why?

m[0] m[1] m[2] m[3] ll PB

IV (fixed) H(m) h h h h H0 H1 H2 H3 H4

Given H( k ll m) can compute H( k ll m ll PB ll w ) for any w. 58 Standardized method: HMAC (Hash-MAC)

Most widely used MAC on the Internet.

H: hash function. example: SHA-256 ; output is 256 bits

Building a MAC out of a hash function:

HMAC: S( k, m ) = H( kopad , H( kipad ll m ) ) 59 HMAC

k ipad m[0] m[1] m[2] || PB

IV (fixed) h h h h h0 h1 h2 h3 h4

tag k⨁opad h h IV

PB: Padding Block 60 HMAC properties

HMAC is assumed to be a secure PRF Can be proven under certain PRF assumptions about h(.,.) Security bounds similar to NMAC Need q2/|T| to be negligible ( q << |T|½ )

In TLS: must support HMAC-SHA1-96

61 Thank You

62 Acknowledge

Dan Boneh, David Brumley, and Shaoquan Jiang for PowerPoint Slides and figures

63