SHA-1 collision found

Lukáš Miňo, Richard Bartuš

- What is a hash function
- MD5 (Message-Digest Algorithm)
- Briefly about MD5
- Vulnerability
- Applications
- Examples
- SHA (Secure Hash Algorithm)
- SHA-0
- SHA-1
- Cryptoanalysis
- SHA-1 collision
- SHA-2
- Cryptoanalysis
- Applications
- Example hashes
- SHA-1 algorithm
- SHA-2 algorithm
- SHA-256
- SHA-512
- Comparison between SHA-1 and SHA-2
- Literature

What is a hash function

A hash function is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital "fingerprint" of the data. The algorithm "chops and mixes" (for instance, substitutes or transposes) the data to create such fingerprints. The fingerprints are called hash sums, hash values, hash codes or simply hashes. (Note that hashes can also mean the hash functions.) Hash sums are commonly used as indices into hash tables or intermediate hash files. Cryptographic hash functions are used for various purposes in information security applications.
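Here is a minimal sketch using Python's standard `hashlib` module. SHA-256 and the helper names `fingerprint` and `bucket_index` are our own illustrative choices, not taken from the text. It shows that inputs of very different sizes all map to a fixed-size, reproducible fingerprint, and that this fingerprint can be reduced to a hash-table index:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash value of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def bucket_index(key: bytes, table_size: int) -> int:
    """Hypothetical helper: use the hash value as an index into a hash table."""
    return int(fingerprint(key), 16) % table_size

for data in (b"", b"a", b"The quick brown fox jumps over the lazy dog", b"x" * 1_000_000):
    # Every fingerprint has 64 hex digits (256 bits), whatever the input size.
    fp = fingerprint(data)
    print(f"{len(data):>9} bytes -> {fp[:16]}... ({len(fp)} hex digits)")

# The method is reproducible: the same data always yields the same fingerprint.
print(fingerprint(b"abc") == fingerprint(b"abc"))  # True
print(bucket_index(b"alice", 8))                   # an index between 0 and 7
```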

MD5 (Message-Digest Algorithm)

Briefly about MD5

In cryptography, MD5 (Message-Digest algorithm 5) is a widely used cryptographic hash function with a 128-bit hash value. As an Internet standard (RFC 1321), MD5 has been employed in a wide variety of security applications, and is also commonly used to check the integrity of files. An MD5 hash is typically expressed as a 32-character hexadecimal number. MD5 was designed by Ron Rivest in 1991 to replace an earlier hash function, MD4. In 1996, a flaw was found in the design of MD5. It was not a clearly fatal weakness, but cryptographers began recommending the use of other algorithms, such as SHA-1 (which has since been found vulnerable itself). In 2004, more serious flaws were discovered, making further use of the algorithm for security purposes questionable. In 2007, a group of researchers including Arjen Lenstra described how to create a pair of files that share the same MD5 checksum.

Vulnerability

Because MD5 makes only one pass over the data, if two prefixes with the same hash can be constructed, a common suffix can be added to both to make the colliding files more plausible. Current collision-finding techniques allow the preceding hash state to be specified arbitrarily, so a collision can be found for any desired prefix. That is, for any given string of characters X, two colliding files can be determined which both begin with X. All that is required to generate two colliding files is a template file with a 128-byte block of data, aligned on a 64-byte boundary, that can be changed freely by the collision-finding algorithm.

Recently, a number of projects have created MD5 "rainbow tables". These are easily accessible online and can be used to reverse many MD5 hashes into strings that collide with the original input, usually for the purpose of password cracking. However, if passwords are combined with a salt before the MD5 digest is generated, rainbow tables become much less useful. The use of MD5 in some websites' URLs means that Google can also sometimes serve as a limited tool for reverse lookup of MD5 hashes. This technique is also rendered ineffective by the use of a salt.
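A real MD5 collision pair is too long to reproduce reliably here. Instead, the following Python sketch uses a deliberately tiny, hypothetical Merkle–Damgård-style hash called `toy_hash`. It has a 16-bit state built from truncated SHA-256, and it is not MD5. The names and parameters (`toy_compress`, `BLOCK`, `IV`) are illustrative assumptions. The sketch shows why a collision in the state after a prefix survives any common suffix: brute force finds two one-block prefixes with the same state, and appending the same suffix to both keeps their digests equal.

```python
import hashlib

BLOCK = 4      # toy block size in bytes (hypothetical parameter)
IV = 0x1234    # toy 16-bit initial state (hypothetical parameter)

def toy_compress(state: int, block: bytes) -> int:
    """Toy compression function: first 16 bits of SHA-256(state || block)."""
    d = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(d[:2], "big")

def toy_hash(msg: bytes) -> int:
    """One pass over the padded message, like MD5 (padding depends only on length)."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(msg).to_bytes(BLOCK, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = toy_compress(state, padded[i:i + BLOCK])
    return state

def find_prefix_collision() -> tuple:
    """Brute-force two distinct one-block prefixes that leave the same state."""
    seen = {}
    for i in range(2**16 + 1):  # pigeonhole: a repeated state must occur by now
        block = i.to_bytes(BLOCK, "big")
        state = toy_compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
    raise RuntimeError("unreachable")

a, b = find_prefix_collision()
for suffix in (b"", b"pay $100", b"pay $1,000,000 to Mallory"):
    # Same state after the prefix + same suffix -> same final digest.
    print(suffix, toy_hash(a + suffix) == toy_hash(b + suffix))  # True for every suffix
```

In real MD5 the state is 128 bits, so the prefix collision needs the cryptanalytic shortcuts described above rather than this brute-force loop.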

Applications

MD5 digests have been widely used in the software world to provide some assurance that a transferred file has arrived intact. For example, file servers often provide a pre-computed MD5 checksum for their files, so that a user can compare the checksum of the downloaded file against it. Unix-based operating systems include MD5 sum utilities in their distribution packages, whereas Windows users use third-party applications. However, now that it is easy to generate MD5 collisions, the person who created a file can create a second file with the same checksum. This technique therefore cannot protect against some forms of malicious tampering. Also, in some cases the checksum cannot be trusted, for example if it was obtained over the same channel as the downloaded file. In that case MD5 can only provide error-checking functionality: it will recognize a corrupt or incomplete download, which becomes more likely when downloading larger files. MD5 is widely used to store passwords. To mitigate the vulnerabilities mentioned above, one can add a salt to the passwords before hashing them. Some implementations also apply the hash function many times in succession, a technique known as key strengthening.
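The following is a minimal Python sketch of salting plus iterated hashing. It swaps in PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`, a standardized key-strengthening function) rather than raw MD5. The helper names, the 16-byte salt and the 100,000-iteration count are illustrative assumptions, not recommendations from the text. Each password gets its own random salt, so two users with the same password store different values, and a single precomputed rainbow table no longer applies.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor (assumption)

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived key); a fresh random salt is drawn if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, stored_key)

salt1, key1 = hash_password("hunter2")
salt2, key2 = hash_password("hunter2")          # same password, another user
print(key1 != key2)                             # True: different salts, different stored values
print(verify_password("hunter2", salt1, key1))  # True
print(verify_password("hunter3", salt1, key1))  # False
```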

Examples

The 128-bit (16-byte) MD5 hashes (also termed message digests) are typically represented as a sequence of 32 hexadecimal digits. The following demonstrates a 43-byte ASCII input and the corresponding MD5 hash:

MD5("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6

Even a small change in the message will (with overwhelming probability) result in a completely different hash, due to the avalanche effect. For example, changing d to e:

MD5("The quick brown fox jumps over the lazy eog") = ffd93f16876049265fbaef4da268dd0e

The hash of the zero-length string is:

MD5("") = d41d8cd98f00b204e9800998ecf8427e
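The example values above can be checked with Python's `hashlib`; note that some FIPS-restricted Python builds disable MD5. The last lines count how many of the 128 output bits flip when d becomes e, which makes the avalanche effect concrete. The exact count depends on the two digests, so it is printed rather than stated:

```python
import hashlib

def md5_hex(text: str) -> str:
    """MD5 of the ASCII encoding of `text`, as 32 hexadecimal digits."""
    return hashlib.md5(text.encode("ascii")).hexdigest()

dog = "The quick brown fox jumps over the lazy dog"
eog = "The quick brown fox jumps over the lazy eog"

print(len(dog.encode("ascii")))  # 43
print(md5_hex(dog))              # 9e107d9d372bb6826bd81d3542a419d6
print(md5_hex(eog))
print(md5_hex(""))               # d41d8cd98f00b204e9800998ecf8427e

# Avalanche effect: XOR the two digests and count the differing bits.
flipped = bin(int(md5_hex(dog), 16) ^ int(md5_hex(eog), 16)).count("1")
print(f"{flipped} of 128 bits differ")
```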

SHA (Secure Hash Algorithm)

The SHA hash functions are five cryptographic hash functions designed by the National Security Agency (NSA) and published by NIST as a U.S. Federal Information Processing Standard. Hash algorithms compute a fixed-length digital representation (known as a message digest) of an input data sequence (the message) of any length. They are called “secure” when, in the words of the standard, “it is computationally infeasible to: 1. find a message that corresponds to a given message digest, or 2. find two different messages that produce the same message digest. Any change to a message will, with a very high probability, result in a different message digest.”

The five algorithms are denoted SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512. SHA-224, SHA-256, SHA-384, and SHA-512 are sometimes collectively referred to as SHA-2. SHA-1 produces a message digest that is 160 bits long. For the other four algorithms, the number in the name gives the bit length of the digest they produce.
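The digest lengths can be read directly from Python's `hashlib`, which exposes each algorithm's `digest_size` in bytes:

```python
import hashlib

# Digest length in bits for each of the five algorithms in the standard.
digest_bits = {name: hashlib.new(name).digest_size * 8
               for name in ("sha1", "sha224", "sha256", "sha384", "sha512")}
print(digest_bits)
# {'sha1': 160, 'sha224': 224, 'sha256': 256, 'sha384': 384, 'sha512': 512}
```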

The Secure Hash Algorithm (SHA), developed by NIST together with the NSA for use with the Digital Signature Standard (DSS), is specified within the Secure Hash Standard (SHS). SHA-1 was a revision to SHA, first proposed in 1994, that corrected an unpublished flaw in SHA. SHA is a cryptographic message digest algorithm similar to the MD4 family of hash functions developed by Rivest. It differs in that it adds an additional expansion operation and an extra round, and the whole transformation was designed to accommodate the DSS block size for efficiency. The Secure Hash Algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest. The digest is designed so that it should be computationally expensive to find a text which matches a given hash. For example, given the hash H(A) of document A, it is difficult to find a document B with the same hash, and even more difficult to arrange that document B says what you want it to say.

SHA-0

The original specification of the algorithm was published in 1993 as the Secure Hash Standard, FIPS PUB 180, by the US government standards agency NIST (National Institute of Standards and Technology). This version is now often referred to as SHA-0.

SHA-1

SHA-1 was published in 1995 in FIPS PUB 180-1.

When a message of any length < 2^64 bits is input, SHA-1 produces a 160-bit output called a message digest. The message digest can then, for example, be input to a signature algorithm which generates or verifies the signature for the message. Signing the message digest rather than the message often improves the efficiency of the process, because the message digest is usually much smaller than the message. The verifier of a digital signature must use the same hash algorithm as the creator of the signature. Any change to the message in transit will, with very high probability, result in a different message digest, and the signature will fail to verify. SHA-1 is called secure because it is computationally infeasible to find a message which corresponds to a given message digest, or to find two different messages which produce the same message digest.
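The hash-then-sign flow can be sketched in Python, with one big caveat. The standard library has no public-key signatures, so the signing step is only assumed here. The variable `signed_digest` simply stands in for the value a real signature algorithm (for example DSA) would protect. What the sketch does show is that a large message shrinks to a 20-byte SHA-1 digest, and that a verifier using the same hash algorithm detects a one-character change:

```python
import hashlib
import hmac

def sha1_digest(message: bytes) -> bytes:
    """20-byte SHA-1 message digest: the value a signature algorithm would sign."""
    return hashlib.sha1(message).digest()

def verify(message: bytes, signed_digest: bytes) -> bool:
    """The verifier recomputes the digest with the same algorithm and compares."""
    return hmac.compare_digest(sha1_digest(message), signed_digest)

message = b"Pay Alice 100 EUR. " * 10_000      # 190,000-byte message
signed_digest = sha1_digest(message)           # assumed to be signed by the sender
tampered = message.replace(b"100", b"900", 1)  # one character changed in transit

print(len(message), "->", len(signed_digest))  # 190000 -> 20
print(verify(message, signed_digest))          # True
print(verify(tampered, signed_digest))         # False
```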

SHA-1 produces a 160-bit hash. That is, every message hashes down to a 160-bit number. There are infinitely many messages but only 2^160 possible hash values. Infinitely many messages therefore hash to each possible value, so there are infinitely many possible collisions. But because the number of possible hashes is so large, the odds of two particular messages colliding by chance are negligibly small (one in 2^160). Because of the birthday paradox, however, if you hashed about 2^80 random messages you would expect to find one pair that hashed to the same value. That is the "brute force" way of finding collisions, and its cost depends solely on the length of the hash value. "Breaking" the hash function means being able to find collisions faster than that.
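The birthday effect can be observed at a scale that runs in seconds. This Python sketch truncates SHA-1 to a few bits; the truncation and the helper names are our own illustrative assumptions. It counts how many random messages are hashed before two collide. The counts vary from run to run but stay near 2^(bits/2), not 2^bits:

```python
import hashlib
import os

def truncated_sha1(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-1(data): a deliberately weakened toy hash."""
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - bits)

def birthday_collision(bits: int):
    """Hash random 16-byte messages until two distinct ones share a truncated value."""
    seen = {}
    tries = 0
    while True:
        msg = os.urandom(16)
        tries += 1
        h = truncated_sha1(msg, bits)
        if h in seen and seen[h] != msg:
            return seen[h], msg, tries
        seen[h] = msg

for bits in (16, 24, 32):
    m1, m2, tries = birthday_collision(bits)
    print(f"{bits}-bit hash: collision after {tries} messages "
          f"(2^{bits // 2} = {2 ** (bits // 2)}, 2^{bits} = {2 ** bits})")
```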

SHA-1 is employed in several widely used security applications and protocols, including TLS and SSL, PGP, SSH, S/MIME, and IPsec. It was considered to be the successor to MD5, an earlier, widely-used hash function.

Cryptoanalysis

For an ideal hash function, violating the first criterion listed above (finding a message that corresponds to a given message digest) can always be done using a brute-force search in 2^L evaluations, where L is the number of bits in the message digest. This is called a preimage attack. The second criterion is finding two different messages that produce the same message digest, known as a collision. Violating it requires only about 2^(L/2) evaluations using a birthday attack. For the latter reason, the strength of a hash function is usually compared to a symmetric cipher of half the message digest length. Thus SHA-1 was originally considered to have 80-bit strength. Cryptographers have produced collision pairs for SHA-0 and have found algorithms that should produce SHA-1 collisions in far fewer than the intended 2^80 evaluations.

In terms of practical security, a major concern about these new attacks is that they might pave the way to more efficient ones. Whether this is the case has yet to be seen, but a migration to stronger hashes is believed to be prudent. Some of the applications that use cryptographic hashes, such as password storage, are only minimally affected by a collision attack. Constructing a password that works for a given account requires a preimage attack. It also requires access to the hash of the original password (typically in the shadow file), which may or may not be trivial. Reversing password encryption (e.g. to obtain a password to try against a user's account elsewhere) is not made possible by the attacks.

In the case of document signing, an attacker could not simply fake a signature from an existing document. The attacker would have to produce a pair of documents, one innocuous and one damaging, and get the private key holder to sign the innocuous document. There are practical circumstances where this is possible.
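The gap between 2^L and 2^(L/2) follows from the standard birthday approximation. This assumes the hash behaves like a random function with 2^L equally likely outputs. After n evaluations:

```latex
P_{\text{coll}}(n) \approx 1 - \exp\!\left(-\frac{n^{2}}{2 \cdot 2^{L}}\right),
\qquad
P_{\text{coll}}(n) = \tfrac{1}{2}
\;\Longrightarrow\;
n \approx \sqrt{2 \ln 2}\; 2^{L/2} \approx 1.18 \cdot 2^{L/2}.
```

For SHA-1 (L = 160) this gives roughly 2^80 evaluations for a collision, against 2^160 for a preimage.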

SHA-1 collision

Bruce Schneier summarized the result on his blog: "This attack builds on previous attacks on SHA-0 and SHA-1, and is a major, major cryptanalytic result. It pretty much puts a bullet into SHA-1 as a hash function for digital signatures (although it doesn't affect applications such as HMAC where collisions aren't important)…"

Because SHA-1 generates a checksum of fixed length (160 bits), there must be many inputs with the same SHA-1 checksum; such a case is called a collision. A deliberate brute-force search for collisions chooses random inputs until two with the same SHA-1 checksum are found. A collision could make it possible to forge digital signatures or certificates. A brute-force collision attack on SHA-1 has a computational complexity of 2^80. Xiaoyun Wang and Hongbo Yu from Shandong University and Yiqun Lisa Yin from Princeton University found an attack that finds a collision with a computational complexity of 2^69. Any procedure with lower computational complexity than a brute-force attack is considered a break of the algorithm. The new attack is still very expensive to compute. Contrary to what was believed (or at least hoped), however, it is not beyond the reach of supercomputers. NIST originally planned to discard SHA-1 by 2010, and its newest instructions to government agencies propose replacing SHA-1 with SHA-256 or SHA-512. These algorithms differ from SHA-1 in the length of the resulting checksum in bits. The research team has been quietly circulating a paper describing its results:

- collisions in the full SHA-1 in 2^69 hash operations, much less than the brute-force attack of 2^80 operations based on the hash length;
- collisions in SHA-0 in 2^39 operations;
- collisions in 58-round SHA-1 in 2^33 operations.
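The size of the improvement in the first item is simple arithmetic on the two work factors:

```latex
\frac{2^{80}}{2^{69}} = 2^{11} = 2048,
\qquad
2^{69} \approx 5.9 \times 10^{20}.
```

The attack therefore needs roughly two thousand times less work than brute force, yet it still requires an enormous number of hash operations.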

SHA-2

NIST has published four additional hash functions in the SHA family, each with longer digests, collectively known as SHA-2. The individual variants are named after their digest lengths (in bits): SHA-224, SHA-256, SHA-384, and SHA-512. The latter three were first published in 2001 in the draft FIPS PUB 180-2, at which time review and comment were accepted. FIPS PUB 180-2, which also includes SHA-1, was released as an official standard in 2002. In February 2004, a change notice was published for FIPS PUB 180-2, specifying an additional variant, SHA-224, defined to match the key length of two-key Triple DES. These variants are patented in US patent 6829355.

SHA-256 and SHA-512 are novel hash functions computed with 32- and 64-bit words, respectively. They use different shift amounts and additive constants, but their structures are otherwise virtually identical, differing only in the number of rounds. SHA-224 and SHA-384 are simply truncated versions of the first two, computed with different initial values.
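Here is a quick Python check (using `hashlib`) of the last point. SHA-384 and SHA-224 start from different initial values. Their output is therefore not simply a prefix of the SHA-512 or SHA-256 digest of the same message, even though the lengths match a truncation:

```python
import hashlib

msg = b"The quick brown fox jumps over the lazy dog"

sha512 = hashlib.sha512(msg).digest()
sha384 = hashlib.sha384(msg).digest()
sha256 = hashlib.sha256(msg).digest()
sha224 = hashlib.sha224(msg).digest()

# Output lengths match a truncation to 384 and 224 bits...
print(len(sha384) * 8, len(sha224) * 8)  # 384 224
# ...but the different initial values give unrelated bits.
print(sha384 == sha512[:48])             # False
print(sha224 == sha256[:28])             # False
```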

These new hash functions have not received as much scrutiny by the public cryptographic community as SHA-1 has, and so their cryptographic security is not yet as well-established. Gilbert and Handschuh (2003) have studied the newer variants and found no weaknesses.

The security of SHA-1 has been somewhat compromised by cryptography researchers. No attacks have yet been reported on the SHA-2 variants, but they are algorithmically similar to SHA-1, so efforts are underway to develop improved alternative hashing algorithms. An open competition for a new SHA-3 function was formally announced in the Federal Register on November 2, 2007: "NIST is initiating an effort to develop one or more additional hash algorithms through a public competition, similar to the development process for the Advanced Encryption Standard (AES)." Submissions are due October 31, 2008, and the proclamation of a winner and publication of the new standard are scheduled to take place in 2012.

Cryptoanalysis

In terms of practical security, a major concern about these new attacks is that they might pave the way to more efficient ones. Whether this is the case has yet to be seen, but a migration to stronger hashes is believed to be prudent. Some of the applications that use cryptographic hashes, such as password storage, are only minimally affected by a collision attack. Constructing a password that works for a given account requires a preimage attack. It also requires access to the hash of the original password (typically in the shadow file), which may or may not be trivial. Reversing password encryption (e.g. to obtain a password to try against a user's account elsewhere) is not made possible by the attacks.

In the case of document signing, an attacker could not simply fake a signature from an existing document. The attacker would have to produce a pair of documents, one innocuous and one damaging, and get the private key holder to sign the innocuous document. There are practical circumstances where this is possible.

Applications

SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512 are required by law for use in certain U.S. Government applications, including use within other cryptographic algorithms and protocols, for the protection of sensitive unclassified information. FIPS PUB 180-1 also encouraged adoption and use of SHA-1 by private and commercial organisations.

A prime motivation for the publication of the Secure Hash Algorithm was the Digital Signature Standard, in which it is incorporated. The SHA hash functions have been used as the basis for the SHACAL block ciphers.

Example hashes

SHA-1 algorithm

The following is an example of SHA-1 digests. ASCII encoding is assumed for all messages. SHA1("The quick brown fox jumps over the lazy dog") = 2fd4e1c6 7a2d28fc ed849ee1 bb76e739 1b93eb12

Even a small change in the message will, with overwhelming probability, result in a completely different hash due to the avalanche effect. For example, changing dog to cog:

SHA1("The quick brown fox jumps over the lazy cog") = de9f2c7f d25e1b3a fad3e85a 0bd17d9b 100db4b3

The hash of the zero-length message is: SHA1("") = da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709
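These SHA-1 values can be reproduced with Python's `hashlib`. The helper below (our own naming) also groups the hex digest into 32-bit words as printed above, and counts how many of the 160 output bits change between dog and cog:

```python
import hashlib

def sha1_words(text: str) -> str:
    """SHA-1 of the ASCII text, as five space-separated 32-bit hex words."""
    h = hashlib.sha1(text.encode("ascii")).hexdigest()
    return " ".join(h[i:i + 8] for i in range(0, 40, 8))

dog = sha1_words("The quick brown fox jumps over the lazy dog")
cog = sha1_words("The quick brown fox jumps over the lazy cog")
print(dog)             # 2fd4e1c6 7a2d28fc ed849ee1 bb76e739 1b93eb12
print(cog)
print(sha1_words(""))  # da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709

# Avalanche effect: number of output bits that differ.
diff = int(dog.replace(" ", ""), 16) ^ int(cog.replace(" ", ""), 16)
print(bin(diff).count("1"), "of 160 bits differ")
```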

SHA-256

SHA256("The quick brown fox jumps over the lazy dog") = d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592

SHA-512

SHA512("The quick brown fox jumps over the lazy dog") = 07e547d9586f6a73f73fbac0435ed76951218fb7d0c8d788a309d785436bbb642e93a252a954f23912547d1e8a3b5ed6e1bfd7097821233fa0538f3db854fee6
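The same check works for the SHA-2 examples with Python's `hashlib`. SHA-512's output has twice as many hex digits as SHA-256's:

```python
import hashlib

msg = "The quick brown fox jumps over the lazy dog".encode("ascii")
d256 = hashlib.sha256(msg).hexdigest()
d512 = hashlib.sha512(msg).hexdigest()
print(d256)  # d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
print(d512)
print(len(d256), len(d512))  # 64 128 (hex digits, i.e. 256 and 512 bits)
```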

Comparison between SHA-1 and SHA-2

SHA-1 and SHA-256 have many features in common. Both can process messages with a maximum length of 2^64 - 1 bits, have a message block size of 512 bits, and have an internal structure based on processing 32-bit words. SHA-384 and SHA-512 have even more similarities. They process messages with a maximum length of 2^128 - 1 bits, have a message block size of 1024 bits, and have an internal structure based on processing 64-bit words. On top of that, the definition of SHA-384 is almost identical to the definition of SHA-512. The only differences are a different choice of the initialisation vector and a truncation of the final 512-bit result to 384 bits. All four functions have a very similar internal structure, and process each message block using multiple rounds. The number of rounds is the same for SHA-1, SHA-384, and SHA-512 (80), and 20% smaller in SHA-256 (64). The critical path in each round involves multioperand addition. SHA-1 requires two fewer operands per addition than the remaining three functions. The notation k+1 used in the table means that the number of operands to be added is k in all but the last round, and k+1 in the last round. Alternatively, the number of operands may be equal to k in all rounds, and an additional simplified round may be introduced for the remaining single addition.
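The block and digest sizes quoted above can be cross-checked against Python's `hashlib`, which reports both in bytes through `block_size` and `digest_size`. Round counts and word sizes are not exposed there, so this sketch covers only the two sizes:

```python
import hashlib

# (message block size, digest size) in bits, as reported by hashlib.
params = {}
for name in ("sha1", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    params[name] = (h.block_size * 8, h.digest_size * 8)
print(params)
# {'sha1': (512, 160), 'sha256': (512, 256), 'sha384': (1024, 384), 'sha512': (1024, 512)}
```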

The number of different constants is four in SHA-1, and equal to the number of rounds in all remaining functions. As a result, implementations of SHA-256, SHA-384, and SHA-512 must include a look-up table of constants K_t, where t = 0, ..., (number of rounds) - 1. SHA-1 is also the only function that contains an operation dependent on the round number t; in all remaining hash functions, all rounds perform exactly the same operations.

The following conclusions can be derived from this functional comparison. Hardware implementations of SHA-384 and SHA-512 have exactly the same performance, so only one of them needs to be implemented for the purpose of comparative analysis. Notice that the size of the message block is twice as large in SHA-512 as in SHA-1, the number of rounds is the same, and the critical path is only slightly longer in SHA-512. Because of this, SHA-512 (the strongest function) is likely to be significantly faster than SHA-1 (the weakest function), which would be a very positive result if true. The throughput of SHA-256 is likely to be in the same range as the throughput of SHA-1, and smaller than the throughput of SHA-512. Taking these estimates into account, we have decided to implement two of the investigated hash functions, SHA-1 and SHA-512. They lie at opposite ends of the spectrum in terms of both security and speed: SHA-1 is the weakest and slowest, and SHA-512 the strongest and fastest, of the four investigated hash functions.
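To make the structural points concrete, here is a compact pure-Python SHA-1, written for illustration from the published FIPS 180 description and checked against `hashlib`. It is not optimized and not meant for production. Inside the 80-round loop, the `if`/`elif` chain selects one of only four constants and a round function that changes with t. This is exactly what distinguishes SHA-1 from the SHA-2 functions, whose rounds all perform the same operations with a per-round constant taken from a table:

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # keep arithmetic within 32-bit words

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def sha1(message: bytes) -> str:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    bit_len = len(message) * 8
    # Padding: 0x80, zeros up to 56 mod 64, then the 64-bit message length.
    data = bytes(message) + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack(">Q", bit_len)

    for offset in range(0, len(data), 64):             # 512-bit blocks
        w = list(struct.unpack(">16I", data[offset:offset + 64]))
        for t in range(16, 80):                        # message expansion
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):
            # Round-dependent function f_t and one of only four constants K_t.
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + k + w[t]) & MASK
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]

    return "".join(f"{x:08x}" for x in h)

print(sha1(b"abc"))                                      # a9993e364706816aba3e25717850c26c9cd0d89d
print(sha1(b"abc") == hashlib.sha1(b"abc").hexdigest())  # True
```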

Literature

- http://www.rsa.com/rsalabs/node.asp?id=2927
- http://www.schneier.com/blog/archives/2005/02/sha1_broken.html
- http://boinc.iaik.tugraz.at/
- http://en.wikipedia.org/wiki/Sha1
- http://www.faqs.org/rfcs/rfc3174.html
- http://www.w3.org/PICS/DSig/SHA1_1_0.html
- http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html
- http://www.itnews.sk/buxus_dev/generate_page.php?page_id=3922
- http://www.east.isi.edu/~bschott/pubs/grembowski02comparative.
- http://www.packetizer.com/security/sha1/
- http://www.hashemall.com/
- http://en.wikipedia.org/wiki/MD5