The Compression Functions of SHA, MD2, MD4 and MD5 Are Not Affine

Mahesh V. Tripunitara and Samuel S. Wagstaff, Jr.
COAST Laboratory
Purdue University
West Lafayette, IN 47907-1398
ftripunit,[email protected]
COAST TR-98/01

Abstract

In this paper, we show that the compression functions of SHA, MD2, MD4 and MD5 are not affine transformations of their inputs.

1 Introduction

The Secure Hash Algorithm (SHA) [1, 5, 2] and the Message Digest algorithms MD2 [4], MD4 [6] and MD5 [7] are used to produce a message digest, a condensed form of a message, for use in digital signature and other applications. Figure 1 shows how these algorithms are used. Each step in each of the algorithms is a compression function (denoted by CF in the figure) which takes as inputs a portion of a message and the current value of a chaining variable. Each compression function alters the value of the chaining variable based on the input message, and the output from each compression function is this altered value of the chaining variable. The value of the chaining variable from the last application of the compression function is the output message digest. The initial value of the chaining variable for the first application of the compression function is fixed for each algorithm, and each subsequent application uses the output from the previous application as the input value of the chaining variable.

An important property for each such compression function is that its output not be an affine transformation of its inputs, which are a fixed-size portion of a message and the input value of the chaining variable. If the output from a compression function were an affine transformation of its inputs, it would be trivial to construct messages that produce a given message digest when subjected to the message digest algorithm, as we show in section 1.2. Such an algorithm would be useless in a security context.
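The chaining described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `iterate_cf` and `toy_cf` are hypothetical, and `toy_cf` is an arbitrary stand-in with no security value, not the compression function of any of the four algorithms.

```python
def iterate_cf(chunks, initial_value, cf):
    """Feed each message chunk through cf, threading the chaining variable."""
    chaining = initial_value
    for chunk in chunks:
        # The output of one application is the chaining input of the next.
        chaining = cf(chaining, chunk)
    # The value after the last application is the message digest.
    return chaining


def toy_cf(chaining, chunk):
    # Hypothetical stand-in (assumption): NOT a real compression function.
    return (chaining * 31 + sum(chunk)) % (1 << 16)


digest = iterate_cf([b"ab", b"c"], 0, toy_cf)
```

With no chunks at all, the sketch simply returns the initial value, which mirrors the role of the fixed default chaining value in the text.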
In this paper we show that the compression functions of the message digest algorithms SHA, MD2, MD4 and MD5, as described in [1], [4], [6] and [7] respectively, are indeed not affine.

1.1 Some Characteristics of the Algorithms

Each algorithm takes an arbitrary-length message as input (we use "length", "size" and "number of bits" to mean the same unless specifically noted otherwise) and produces a fixed-length message digest as output. The message digest is 160 bits for SHA and 128 bits for each of MD2, MD4 and MD5. Each algorithm pads the input message so as to make the number of bits in the message plus the pad a multiple of 512 in the case of SHA, MD4 and MD5, and a multiple of 128 in the case of MD2.

Portions of this work were supported by sponsors of the COAST Laboratory.

[Figure 1: Use of Message Digest Algorithms. The message plus pad is split into chunks of M bits; each chunk is fed in turn to CF together with the L-bit chaining variable, which starts at the default initial value and ends as the final digest. M = 512 bits for SHA, MD{4,5}; M = 128 bits for MD2; L = 160 bits for SHA; L = 128 bits for MD{2,4,5}.]

For SHA, MD4 and MD5, the padding consists of a 1 bit followed by enough 0's to make the number of bits of the message plus the pad so far congruent to 448 mod 512. At least 1 bit and at most 512 bits are thus added as pad. The length of the original message, represented as a 64 bit number, is then appended to make the entire length a multiple of 512.

In the case of MD2, padding of sufficient length to make the number of bits of the padded message a multiple of 128 is added. If i bytes are added as pad, each of those bytes has the value i. At least 1 byte and at most 16 bytes are appended as pad. A 16 byte (128 bit) checksum of the message plus pad is then also appended.

After padding, SHA, MD4 and MD5 break the padded message into 512 bit chunks and use those as inputs to the compression function, in turn, as indicated in figure 1. MD2 does the same, except that its compression function takes 128 bit chunks of the message as input.
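The padding rules above reduce to a little modular arithmetic. The following is a minimal sketch of the pad lengths only, assuming the hypothetical helper names `md_style_pad_bits` and `md2_pad_bytes`; it does not compute the MD2 checksum and does not encode the 64 bit length field.

```python
def md_style_pad_bits(message_bits):
    """Pad sizes for SHA, MD4 and MD5, in bits.

    Returns (pad_before_length, total_pad), where total_pad also counts
    the 64 bit length field appended at the end.
    """
    # A 1 bit plus enough 0 bits to reach 448 mod 512: between 1 and 512 bits.
    pad = (447 - message_bits) % 512 + 1
    return pad, pad + 64


def md2_pad_bytes(message_bytes):
    """MD2 pad for a message of the given byte length (checksum excluded)."""
    # i bytes of value i bring the length to a multiple of 16 bytes; 1 <= i <= 16.
    i = 16 - (message_bytes % 16)
    return bytes([i]) * i
```

For a message that is already 448 mod 512 bits long, the sketch adds a full 512 bits of pad before the length field, which matches the stated upper bound.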
Each compression function also takes as input the value of the chaining variable, which is the output from the previous application of the compression function. For the first application of the compression function, each algorithm uses a default initial value for the chaining variable.

1.2 An Attack on a Message Digest Algorithm That Uses an Affine Compression Function

In this section, we assume the existence of a message digest algorithm, $A$, that works like SHA, MD2, MD4 and MD5 in that it takes as input a message of some length, pads the message and breaks the padded message into equal-sized chunks to use as input to a compression function, $CF$. $CF$ also takes as input a chaining variable that is the output from the previous application of $CF$ (a default value for the first application). Also, we assume that $CF$ is affine in its inputs, that is, $CF$ can be represented as a pre-multiplication by a matrix of appropriate dimensions, bitwise XOR-ed with a vector of appropriate size. Then, given a message and its corresponding message digest, we construct a second message with the same message digest.

We assume that the reader is familiar with basic concepts of numerical linear algebra such as the rank of a matrix and Gaussian elimination. We refer the reader to [3] for a review of these and other related topics. Also, we adopt the following notation:

$F_2^{n,m}$: The set of $n \times m$ matrices each of whose elements are in $F_2$, the Galois field with the two elements 0 and 1. $m$ is omitted if it is 1, and $n$ and $m$ are omitted if they are both 1.
$\oplus$: The bitwise exclusive-or operator.
$|\;|$: A function that takes as argument a bit string or bit vector and returns its length.
$\cdot$: Used to concatenate bit strings.
$A$: The message digest algorithm with an affine compression function.
$M, M', M''$: Inputs to $A$ (messages of some length).
$D$: Output from $A$ (message digest).
$CF$: The compression function used in $A$.
$m, m_i$: Input message to $CF$, typically a portion of $M$, $M'$ or $M''$, written as a bit vector.
$v, v_i$: Input chaining variable to $CF$, written as a bit vector.
$w, w_i$: Output from $CF$, written as a bit vector. The output from the final application of $CF$ is the message digest.
$L_m$: The length of the input message to $CF$. Thus, typically, $L_m = |m_i|$, and $L_m = 512$ for SHA, MD4 and MD5 and $L_m = 128$ for MD2.
$L_v$: The length of the input chaining variable to, and thus the output from, $CF$. Thus, $L_v = |v_i| = |w_j|$ for any $i, j$. $L_v = 160$ for SHA and $L_v = 128$ for MD2, MD4 and MD5.
$p$: The padding added to the message. For SHA, MD4 and MD5, $65 \le |p| \le 576$ and for MD2, $136 \le |p| \le 256$.
$C$: The matrix corresponding to $CF$. Thus, $C \in F_2^{L_v, L_v+L_m}$.
$C_i$: The $i$-th column of $C$, where $1 \le i \le L_v + L_m$. Thus, $C_i \in F_2^{L_v}$ and $C = [C_1 \; C_2 \; \ldots \; C_{L_v+L_m}]$.
$b$: The constant vector corresponding to the affine transformation, $CF$. Thus, $b$ is the output on input 0 (the vector of all 0's), and $b \in F_2^{L_v}$.

Thus, $CF$, on inputs $v$, $m$ and output $w$, can now be expressed as:

$$w = \begin{bmatrix} C_1 & C_2 & \cdots & C_{L_v+L_m} \end{bmatrix} \begin{bmatrix} v \\ m \end{bmatrix} \oplus b \qquad (1)$$

Assume that some message $M$, on input to $A$, produces $D$ as message digest. We would like to substitute our favorite message, $M'$, for the original message, but have the new message also produce the same output $D$ on input to $A$. Of course, this is not possible in general. Therefore, starting from our favorite message $M'$, we construct $M''$, a slightly transformed version of $M'$, such that $M''$, on input to $A$, produces $D$ as message digest. Starting with $M'$, we carry out the attack as follows.

1. We split $M'$ into chunks of size $L_m$. Thus, we produce $m_1, m_2, \ldots, m_k$ such that $|m_i| = L_m$ for $i = 1, 2, \ldots, k-1$, $|m_k| \le L_m$ and $m_1 \cdot m_2 \cdots m_k = M'$.

2. We subject each of $m_1, m_2, \ldots, m_{k-1}$ to $CF$, in turn. At the first application, we use the default value for the input chaining variable, and at each subsequent application, we use the output from the previous application of $CF$. Application of $CF$ is carried out as indicated in (1).
The output from the last such application of $CF$ is $w_{k-1} = v_k$, which is the same as the input chaining variable to the next application of $CF$. Thus, we have now performed $k-1$ applications of $CF$.

3. If $A$ works like SHA, MD4 or MD5, the message is first padded with at least one bit so that it is of length 448 mod $L_m$, and then with the length of the message represented as a 64 bit number (of course, $L_m = 512$ for SHA, MD4 and MD5).
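The affine form (1) and steps 1 and 2 can be tried out on a small scale. The sketch below is an illustration under stated assumptions, not the construction used by any real algorithm: the matrix $C$ and vector $b$ are random, the sizes are tiny, and the helper names (`make_affine_cf`, `split_chunks`, `bits_to_int`, `chain_full_chunks`) are hypothetical. Bit vectors are packed into Python integers, with $v$ occupying the low $L_v$ bits of the stacked input.

```python
import random


def make_affine_cf(l_v, l_m, seed=0):
    """Return a toy affine CF: w = C [v; m] XOR b over GF(2).

    The random C and b are illustrative assumptions, not taken from any
    real algorithm. Each column C_i and b is stored as an l_v-bit int.
    """
    rng = random.Random(seed)
    columns = [rng.getrandbits(l_v) for _ in range(l_v + l_m)]
    b = rng.getrandbits(l_v)

    def cf(v, m):
        # Stack v on top of m; bit i of x selects column C_{i+1}.
        x = v | (m << l_v)
        w = b
        for i, column in enumerate(columns):
            if (x >> i) & 1:
                w ^= column  # addition in GF(2) is XOR
        return w

    return cf


def split_chunks(bits, l_m):
    """Step 1: split a list of bits into l_m-bit chunks (last may be shorter)."""
    return [bits[i:i + l_m] for i in range(0, len(bits), l_m)]


def bits_to_int(bits):
    """Pack a list of bits into an int, first bit in the lowest position."""
    return sum(bit << i for i, bit in enumerate(bits))


def chain_full_chunks(chunks, default_v, cf):
    """Step 2: apply cf to m_1 .. m_{k-1}; the result is w_{k-1} = v_k."""
    v = default_v
    for chunk in chunks[:-1]:
        v = cf(v, bits_to_int(chunk))
    return v


cf = make_affine_cf(l_v=8, l_m=16)
chunks = split_chunks([1, 0, 1] * 15, 16)  # 45 bits: chunks of 16, 16 and 13
v_k = chain_full_chunks(chunks, default_v=0, cf=cf)
```

Because the toy `cf` is affine, `cf(0, 0)` returns $b$, and XOR-ing two outputs cancels $b$ and leaves a purely linear function of the XOR of the inputs.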
