Hash Function Design Overview of the Basic Components in SHA-3 Competition
Daniel Joščák
[email protected]
S.ICZ a.s., Hvězdova 1689/2a, 140 00 Prague 4;
Faculty of Mathematics and Physics, Charles University, Prague

Abstract

In this article we give an overview of the basic building blocks used in the design of the new hash functions submitted to the SHA-3 competition. We briefly present the currently widely used hash functions MD5, SHA-1, SHA-2 and RIPEMD-160. At the end we consider several properties of the candidates and give examples of candidates in the SHA-3 competition.

Keywords: SHA-3 competition, hash functions.

1 Introduction

In 2004 a group of researchers led by Xiaoyun Wang (Shandong University, China) presented real collisions in MD5 and other hash functions at the rump session of the Crypto conference, and they explained the method in [10]. In 2006 the same group presented a collision attack on SHA-1 in [8], and since then a lot of progress has been made in collision-finding algorithms. Although there is no specific reason to believe that a practical attack on any of the SHA-2 family of hash functions is imminent, a successful collision attack on an algorithm in the SHA-2 family could have catastrophic effects for digital signatures.

In reaction to this situation the National Institute of Standards and Technology (NIST) created a public competition for a new hash algorithm standard, SHA-3 [1]. In addition to the obvious requirements on a hash function (i.e. collision resistance, first and second preimage resistance, …), NIST expects SHA-3 to have a security strength that is at least as good as that of the hash algorithms in the SHA-2 family, and expects this security strength to be achieved with significantly improved efficiency. NIST also desires that the SHA-3 hash functions be designed so that a possibly successful attack on the SHA-2 hash functions is unlikely to be applicable to SHA-3. The submission deadline for new designs was October 31, 2008.
51 algorithms were submitted to the competition. Many new ideas appeared in the submissions, but the candidates also share several common properties. We try to summarize the common building blocks that appeared and to categorize the submissions according to them. Information about NIST's organization of the SHA-3 competition, algorithm speed and the current state of attacks is taken from, and can be found at, the NIST web page [1] and the projects eBASH [5] and Hash ZOO [4]. A very good comparison and categorization of the candidates can be found in [7].

Security and Protection of Information 2009

2 Desired properties

In this section we briefly present definitions of the properties that good hash functions, and candidates for the SHA-3 algorithm in particular, must have.

Collision resistant: a hash function H is collision resistant if it is hard to find two distinct inputs that hash to the same output (that is, two distinct inputs m1 and m2 such that H(m1) = H(m2)). Every hash function with more inputs than outputs will necessarily have collisions. Consider the hash function SHA-256, which produces 256 bits of output from an arbitrarily large input. Since it must generate one of 2^256 outputs for each member of a much larger set of inputs, the pigeonhole principle guarantees that some inputs will hash to the same output. Collision resistance does not mean that no collisions exist, only that they are hard to find. The birthday paradox sets an upper bound on collision resistance: if a hash function produces N bits of output, an attacker can find a collision by performing only about 2^(N/2) hash operations, continuing until two outputs happen to match. If there is an easier method than this brute-force attack, it is considered a flaw in the hash function.

First preimage resistant: a hash function H is said to be first preimage resistant (sometimes only preimage resistant) if, given h, it is hard to find any m such that h = H(m).
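The birthday bound stated under collision resistance can be observed experimentally on a deliberately weakened hash. The following is a minimal sketch, not part of the original article: it assumes SHA-256 truncated to its leading n bits as the toy hash, and the names truncated_hash and find_collision are illustrative only.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bits: int) -> int:
    """Toy hash (assumption): the leading n_bits of SHA-256(data) as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)


def find_collision(n_bits: int):
    """Hash the counter values 0, 1, 2, ... until two of them share a toy hash.

    Returns (m1, m2, trials), where m1 != m2 collide and trials is the
    number of hash evaluations that were needed.
    """
    seen = {}  # maps toy hash value -> first message that produced it
    for i in count():
        message = i.to_bytes(8, "big")
        h = truncated_hash(message, n_bits)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message


if __name__ == "__main__":
    n_bits = 24
    m1, m2, trials = find_collision(n_bits)
    print(f"collision after {trials} hashes; 2^(n/2) = {2 ** (n_bits // 2)}")
```

For a 24-bit output the search typically ends after a number of hashes on the order of 2^12, far fewer than the 2^24 possible outputs; the pigeonhole principle guarantees that it ends after at most 2^24 + 1 hashes.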
Second preimage resistant: a hash function H is said to be second preimage resistant if, given an input m1, it is hard to find another input m2 (not equal to m1) such that H(m1) = H(m2).

A preimage attack differs from a collision attack in that a fixed hash value or message is attacked, and in its complexity. Ideally, a preimage attack on an n-bit hash function should require on the order of 2^n operations to be successful.

Resistant to length-extension attacks: given H(m) and the length of m, but not m itself, an attacker must not be able to calculate H(m || m') for a suitably chosen m', where || denotes concatenation.

Efficiency: computation of a hash function must be efficient, i.e. speed matters. Hash functions are widely deployed in many applications, and it is important to have fast implementations on different architectures. At the first SHA-3 conference, the NIST organizers announced that they would initially focus on Intel Architecture 32-bit (IA-32) and Advanced Micro Devices 64-bit (AMD64) platforms, but that performance on other platforms would not be overlooked. They asked whether the algorithms remain secure if submitters adjust the tunable parameters of their candidates to run as fast as SHA-256 and SHA-512 on IA-32 and AMD64. If not, the candidate's chances in the competition are lower. Memory requirements and code size are very important for implementations on embedded systems such as smart cards.

HMAC construction: the hash function must have at least one construction that supports HMAC (or an alternative MAC construction) as a pseudorandom function (PRF), i.e. it must be hard to distinguish HMAC_K based on H from a random function.

3 Current hash functions

We briefly describe the four best-known and most widely used hash algorithms to show the evolution of hash functions. All of the functions use the same message padding scheme (a “1” bit is appended, followed by zeroes and the length of the message, so that the padded message is a multiple of the block size of the compression function).
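The padding rule just described can be sketched in a few lines. This is an illustrative sketch rather than code from any of the standards: it assumes byte-aligned messages, a 64-byte block and a 64-bit length field, as in MD5, SHA-1 and SHA-256. The function name md_pad and its parameters are hypothetical. The byte order of the length field is the one detail that varies (big-endian in the SHA family, little-endian in MD5 and RIPEMD-160), so it is exposed as a flag; the 128-bit length field of SHA-384 and SHA-512 is not covered.

```python
import struct


def md_pad(message: bytes, block_size: int = 64, big_endian: bool = True) -> bytes:
    """Append a '1' bit, zero bits and the 64-bit message length in bits.

    Assumes a byte-aligned message, so the '1' bit plus seven zero bits
    is the single byte 0x80.
    """
    bit_length = (len(message) * 8) % (1 << 64)
    padded = message + b"\x80"
    # Zero-fill so that exactly 8 bytes remain free in the last block.
    padded += b"\x00" * ((block_size - 8 - len(padded)) % block_size)
    # The length field completes the final block.
    padded += struct.pack(">Q" if big_endian else "<Q", bit_length)
    return padded


if __name__ == "__main__":
    block = md_pad(b"abc")
    print(len(block), block.hex())
```

With these assumptions a 3-byte message is padded to a single 64-byte block, while a 56-byte message needs two blocks because the 0x80 byte and the length field no longer fit into the first one.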
All of the functions use the Merkle-Damgård construction built from a compression function, which is shown in Figure 1. All but RIPEMD-160 use the Davies-Meyer construction of a compression function from a block cipher. All of the functions use only very simple register instructions: the logical operators OR, AND and XOR in simple nonlinear functions, modular addition, shifts and rotations. The functions differ mainly (apart from the obvious lengths of the registers, message blocks and outputs) in the complexity of the message expansion function and the step function, which are parts of the compression function. The newer the function, the more complex its message expansion and step function.

Figure 1: Merkle-Damgård construction (the compression function f is applied in turn to the IV and the message blocks M1, M2, ..., Mn to produce the output).

3.1 MD5

MD5 was designed by Ron Rivest in 1991 as a successor to MD4. Its output is 128 bits long. The message expansion is very simple: the identity and permutations of the message-block registers. The step function is shown in Figure 2. The first cryptanalysis appeared in 1993 [6]. Real collisions have been known since 2004 [10]. It is no longer recommended to use this function for cryptographic purposes.

Figure 2: MD5 step function; F is a simple nonlinear function (taken from Wikipedia).

3.2 SHA-1

The specification was published by NIST in 1995 as the Secure Hash Standard, FIPS PUB 180-1. The output of the function is 160 bits long. It is the successor of SHA-0, which was withdrawn by the NSA shortly after its publication and superseded by the revised version. SHA-1 differs from SHA-0 only by a single bitwise rotation in the message schedule of its compression function; according to the NSA, this was done to correct a flaw in the original algorithm which reduced its cryptographic security. It is the most common hash function used today.

In 2006 a collision attack on SHA-1 was presented in [8].
No real collisions have been found to date, but the complexity of the attack is claimed to be roughly 2^61. It is not recommended to use this function for new applications.

Figure 3: SHA-1 step function; F is a simple nonlinear function (taken from Wikipedia).

3.3 SHA-2

SHA-2 is a family of four hash functions: SHA-224, SHA-256, SHA-384 and SHA-512. The algorithms were first published in the draft FIPS PUB 180-2 in 2001. The 384-bit and 512-bit versions use different constants, 64-bit registers and 1024-bit message blocks in their compression functions. Otherwise they are the same. The SHA-2 functions have the same construction properties as SHA-1, but no successful application of the previous attacks on SHA-1 or MD5 to them has been published. This is believed to be due to their more complex message expansion and step function. Users are now strongly encouraged to move to these functions.

Figure 4: SHA-2 step function; Ch, Ma, Σ0 and Σ1 are less trivial functions (taken from Wikipedia).

3.4 RIPEMD-160

RIPEMD-160 is a 160-bit cryptographic hash function designed by H. Dobbertin, A. Bosselaers and B. Preneel. It is intended to be used as a secure replacement for the 128-bit hash functions MD4 and MD5. The speed of the algorithm is similar to that of SHA-1, but the structure of the algorithm is different, as shown in Figure 5. It uses a balanced Feistel network known from the theory of block ciphers. There are no successful attacks known on RIPEMD-160, and the function is, together with the SHA-2 family, recommended by ETSI 102176-1.
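The output and message-block lengths quoted in Sections 3.1 to 3.3 can be checked against Python's standard hashlib module. This is only a convenience sketch under a few assumptions: Python 3.9 or later (for the usedforsecurity argument), and a hypothetical helper name sizes_in_bits. RIPEMD-160 is left out because its availability in hashlib depends on the underlying OpenSSL build.

```python
import hashlib


def sizes_in_bits(name: str) -> tuple[int, int]:
    """Return (output length, message-block length) in bits for a hashlib algorithm."""
    # usedforsecurity=False lets MD5 be constructed on restricted builds;
    # nothing is hashed here, only the object's size attributes are read.
    h = hashlib.new(name, usedforsecurity=False)
    return h.digest_size * 8, h.block_size * 8


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
        out_bits, block_bits = sizes_in_bits(name)
        print(f"{name:7s} output {out_bits:4d} bits, block {block_bits:5d} bits")
```

The values reported for SHA-384 and SHA-512 show the 1024-bit message blocks mentioned above, while the remaining functions use 512-bit blocks.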