
Recent Advances in Automatic Control, Information and Communications

Security Review of the SHA-1 and MD5 Cryptographic Hash Algorithms

ROMAN JAŠEK (1), LIBOR SARGA (2), RADEK BENDA (2)
(1) Department of Informatics and Artificial Intelligence
(2) Department of Statistics and Quantitative Methods
Tomas Bata University in Zlin
(1) Nad Stranemi 4511, 760 05 Zlin
(2) Mostni 5139, 760 01 Zlin
CZECH REPUBLIC

Abstract: - Database breaches and reverse engineering of hashes have accentuated the lackluster approach of administrators maintaining sensitive data. We therefore evaluate the most commonly used hashing algorithms, SHA-1 and MD5, as to their ability to withstand various threat scenarios. A comprehensive literature review is presented which supports the hypothesis that security considerations when implementing cryptographic hash functions are sidelined in favor of backward-compatible procedures with a provably lower level of resilience against deliberate attempts by unauthorized third parties to obtain sensitive data. We also describe a way to improve the current predicament and better protect confidential data using cryptographic salts.

Key-Words: security, hashing, algorithm, sha-1

ISBN: 978-960-474-316-2

1 Introduction
Sensitive data protection has been a focus of security researchers for a long time. While extensive academic coverage analyzing existing and proposed cryptographic hash algorithms exists, corporations and governments are slow to adopt proposed changes due to inertia, compatibility issues, and increased hardware requirements together with deployment costs. When the benefits of these changes are not clearly communicated, the necessity of keeping cryptographic systems up to date is deprioritized due to a lack of technical background.

Website frontends are frequently vulnerable to one or more techniques such as SQL injection, null injection, buffer overflow, directory traversal, and uncontrolled format strings. Relying solely on network perimeter security elements should not constitute a basis for leaving critical portions of data storage unencrypted. However, widely used data encryption schemes do not guarantee an adequate level of security.

To detect changes in databases counting millions of records, various mathematical fingerprinting techniques called cryptographic hash functions were devised. They provide a computationally efficient way to generate, store, and manipulate (compare, move, delete) such output data strings with marginal time requirements.

Definitions of sensitive data vary. Legal incentives exist: the Payment Card Industry Data Security Standard codifies proper handling and storage of financial data [1]. The European Union's Data Protection Directive 95/46/EC was enacted in 1995 [2], and in 2012 planning began on a major reform titled the General Data Protection Regulation to streamline protection and sharing of personally identifiable data of all member states' citizens. Unless noted, sensitive data will refer to any confidential electronic information a user willingly disclosed which may compromise their electronic identity or integrity if obtained by an unauthorized third party. To preclude such situations, data may be converted to a fixed-size output using a hash function.

As advances in chip design and transistor integration follow Moore's Law [3], it can be predicted that the computational resources available to any potentially malicious third party will increase over time. Periodic revisions must thus be made as to which cryptographic hash algorithms are sufficient to protect sensitive data, because parallelized brute-force and dictionary attacks allow attackers to enumerate billions of combinations per unit of time, rendering a hash scheme ineffective if implemented incorrectly.

The article describes two popular hashing algorithms currently in use: MD5 and SHA-1. While some of their variants have been proven computationally insecure, they are nevertheless still widely deployed as alternatives to comparably more secure schemes for compatibility or legacy reasons.

2 Cryptographic Hash Functions
A cryptographic hash function, commonly abbreviated as a hash, is a "…function, mathematical or otherwise, that takes a variable-length input string (called a pre-image) and converts it to a fixed-length (generally smaller) output string (called a hash value)" [4]. A hash may alternatively be called a checksum, although a checksum validates homogeneity and consistency of a data block while a hash serves a multitude of functions apart from integrity checking, such as authentication, watermarking, digital signature schemes, or MACs (Message Authentication Codes).

Important properties of any hash are fixed length in bits, irreversibility, and ease of computation. Once the textual input is processed into a digest, no operation is theoretically capable of producing the pre-image. However, as any hash is of fixed size, a brute-force attack can be mounted against it during which all candidate pre-images are converted into hashes and compared to the original fingerprint. If a match is made, the input constitutes either the original pre-image or a different one which hashes to the same value, a collision. The attack is extremely time- and resource-intensive and was considered impractical when cryptographic hash functions were devised.

Another feature is that a single-bit change in the pre-image changes, on average, about half of the bits in the hash, a phenomenon known as the avalanche effect [5]. The avalanche effect ensures that tampering with the data is revealed by a simple fingerprint check. Hashing is a lossy process; the information content of the input is not preserved. The functions are thus unusable as a storage option and serve only to verify the pre-image's validity through comparison.

Hashes have become a widely used means of validating any type of data irrespective of contents, the only requirement being a binary input form. Software libraries are usually provided by a database vendor out of the box with the option of purchasing additional packages. However, as most corporations nowadays limit expenditures on information technology, it is not reasonable to assume database management systems (DBMSs) will be enhanced in such a way. Therefore, only the encryption functionality included by default in many instances of DBMSs will be considered: MD5 and SHA-1.

2.1 MD5
The MD5 Message-Digest Algorithm is a 128-bit, 4-round function proposed by Ronald Rivest [6]. A successor to MD2 and MD4, it was designed as an industry standard and sanctioned by the Internet Engineering Task Force (IETF) to be a part of the Internet protocol suite. It is represented by a 32-character hexadecimal string.

As with every encryption scheme seeing widespread adoption, MD5 was heavily scrutinized by security researchers as well as academia. From the properties of any hash, it was known the function is vulnerable to hash collisions, in which the attacker searches for a pre-image that hashes to the same product. If found, such collisions may be exploited to impersonate a legitimate user and invalidate the authentication procedure. It was further shown that a colliding hash can be found after 15 minutes of computation on a supercomputer setup [7], which led to recommendations that MD5 not be used when generating digital certificates.

In 2004, the first practical attack on MD5, its predecessor MD4, and several other functions was successfully performed [8]. Further improvements were made based on these findings. In 2005, a pair of digital certificates compliant with the X.509 Public Key Infrastructure (PKI) standard was produced, proving the attack was feasible in real-world applications [9]. Two modifications of the flaw were demonstrated, allowing identical signatures to be produced on a single node, with the former capable of performing the operation within several hours using a consumer-grade notebook, the latter achieving the same goal within 60 seconds on an equivalent machine [10].

Research was also conducted which tested susceptibility to collision attacks in the PKI model. The resulting certificate corroborated that "[it] allows us to impersonate any website on the Internet, including banking and e-commerce sites secured using the HTTPS [Hypertext Transfer Protocol Secure]" [11].

In 2008, the United States Computer Emergency Readiness Team (US-CERT) announced that "[s]oftware developers, Certification Authorities, website owners, and users should avoid using the MD5 algorithm in any capacity. As previous research has demonstrated, it should be considered cryptographically broken and unsuitable for further use" [12]. In 2012, a new attack purportedly demonstrated MD5's susceptibility to single-block collisions, enabling the attacker to construct two distinct 64-byte messages with the same hash value [13]. The cryptographic community recommends migration to SHA-1.
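The digest lengths and the avalanche effect described in Section 2 can be observed with a minimal sketch. It assumes Python's standard hashlib module; the helper names hex_digest and differing_bits and the two sample inputs are hypothetical and chosen only for illustration.

```python
import hashlib


def hex_digest(algorithm: str, data: bytes) -> str:
    """Return the hexadecimal digest of data under the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical sample inputs differing in exactly one bit:
# the final byte is "d" (0x64) in one and "e" (0x65) in the other.
original = b"password"
modified = b"passwore"

for name in ("md5", "sha1"):
    first = hex_digest(name, original)
    second = hex_digest(name, modified)
    total_bits = len(first) * 4  # each hexadecimal character encodes 4 bits
    changed = differing_bits(first, second)
    print(f"{name}: {len(first)} hex characters, "
          f"{changed} of {total_bits} bits changed")
```

MD5 yields 32 hexadecimal characters (128 bits) and SHA-1 yields 40 (160 bits). The number of changed bits varies with the input but should lie near half of the digest length.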


2.2 SHA-1
The Secure Hash Algorithm 1 was designed by the United States National Security Agency (NSA) in 1995 as a successor to SHA-0 from 1993. With a 160-bit digest iterated for 80 rounds, it was used for protecting sensitive unclassified information as well as in Internet protocols such as Secure Sockets Layer (SSL) or Secure Shell (SSH) [14]. It is represented as a 40-character sequence.

Touted as MD5's replacement, SHA-1 saw an enormous rise in applications, which in turn led to its thorough examination by the cryptographic community based on previous research focused on SHA-0. First results appeared in 2005 when it was shown that a collision in full SHA-1 can be found with fewer computations than it would take to brute-force the hash product, the most time- and resource-intensive cryptographic process [15]. Further attempts were made to reduce the number of operations after which the collision is found.

A significant discovery was made in 2006 when a collision was found in a reduced SHA-1 with a complexity of 2^35, compared to the 2^80 operations required by a brute-force collision search [16]. Since then, several attempts have been made to extend the attack to full SHA-1 with mixed results. The latest breakthrough occurred in 2008 when a researcher, using his own method, estimated the theoretical number of computations for SHA-1 at 2^61 operations [17].

As the computational complexity of attacks on SHA-1 has been steadily decreasing, SHA-2 was devised as its direct successor. However, as both systems are based on identical algorithmic operations, it is expected that an optimized SHA-1 attack will be applicable to SHA-2 as well.

In 2012, a successor to SHA-1 and SHA-2 was selected by NIST (National Institute of Standards and Technology) after an open competition with the specific aim of choosing a function dissimilar to its predecessors. As published attacks on SHA-2 break 46 out of 64 rounds for SHA-256 [18] and an equivalent number of rounds for the 80-iteration SHA-512, the full system is expected to be targeted eventually, even though such attacks are currently computationally infeasible. SHA-3 utilizes functions with sponge construction [19], making it harder for the attacker to differentiate it from a random oracle, a theoretical construct in which any input is encrypted randomly in a black-box setting. An outside agent is unable to discern whether the output was produced by a random function or a genuine encryption algorithm if no other information (timing measurements, heat emissions, cycle counts) is known. As SHA-3 is independent of SHA-2, attack vectors known against SHA-2 are unusable against it.

Two theoretical vectors against SHA-3 were proposed: a zero-sum attack applicable to the 9-round reduced version with no effect on the function's security [20]; and an improved zero-sum distinguisher which applies to all 24 rounds and lowers the number of operations from 2^1579 to 2^1570 [21]. Both were published before the final version of SHA-3 was selected, and no practical cryptanalytic breakthroughs on the final implementation have been published as of yet.

3 Best practice: cryptographic salts
Regardless of the hash function, the current security best practice for storing sensitive data such as user credentials (logins, passwords) is to utilize randomized hashing. As the hash itself is deterministic (two identical strings produce identical outputs), additional probabilistically generated data need to be supplied and processed along with the input data stream.

NIST recommends "[t]he random value… [to] be a message-independent bit string of at least 80 bits, but no more than 1024 bits… [which] shall have sufficient randomness to meet the desired security strength…" [22]. A cryptographic salt should therefore be generated using a random number generator whose output meets randomness criteria such as the Linear Complexity, Approximate Entropy, Binary Matrix Rank, and Serial Tests [23]. A single value should not be used globally; instead, a per-user or per-application salt stored in a database separate from the hash products is recommended. The value is introduced to force the threat agent to generate a large set of candidate hashes corresponding to the string being reverse engineered for a fixed iteration count. "[T]he number of possible resulting [hashes] is approximately 2^sLen where sLen is the length of the salt in bits. Therefore, using a salt makes it difficult for the attacker to generate a table of resulting [hashes], for even a small subset of the most-likely passwords" [24].

Dictionary attacks, where permutations of commonly used terms are generated based on a set of rules, are largely mitigated, as are brute-force attacks. A breakthrough occurs when an attack vector is discovered that enables pre-image extraction after a lower number of operations (and thus a smaller time factor) than exhaustive search.


Cryptographic salts also make the time-memory tradeoff difficult to implement. First described in 1980 [25], the technique trades time dedicated to computing possible solutions during the attack for a precomputed lookup data array where a simple search algorithm can be applied to find the correct value. A threshold exists, though, above which table lookups become costly and ineffective. After several improvements, a new version was introduced in 2003 making use of non-merging rainbow chains, addressing the issue in the original proposal [26]. The technique achieved a 99.9% success rate when reverse engineering Microsoft Windows LM hashes with a lookup table the size of 1.4 GB. As the prices of storage media decrease per Moore's law, rainbow tables of tens of terabytes utilizing high-speed storage media such as SSDs (Solid-State Drives) will proliferate.

If the input to the hash function is concatenated with a salt prior to being reduced to a fixed-size output, the attacker is forced to precompute the lookup array for every possible salt value. Therefore, security depends on the uniqueness and length of the random value being appended or prepended to the (presumably) non-random input string. Salts, key strengthening, and key stretching make the time-memory tradeoff substantially more difficult to balance compared to a brute-force attack. Key stretching is discussed in Section 4. Key strengthening was devised in 1994 and splits the salt in two parts: public and secret [27]. While the public part is stored, the secret is securely deleted after first use and becomes unknown. When the user enters a password, the server must perform a brute-force search using the public part of the salt to determine the secret portion, increasing both per-user computational requirements and security. The attacker, however, must exhaustively search the whole hash space, i.e., both parts of the salt.

The salt or its part and the algorithm used to generate the output must be known server-side to allow comparison of the data to the stored value. No plaintext-formatted data should be stored at any point, only the fingerprints.

4 Discussion
In the paper, a security overview of the SHA-1 and MD5 systems has been presented. While MD5's phasing out has been somewhat slow, by the time the shift to SHA-1 is completed, it may be necessary to revise the system in favor of a more resilient one. To ensure such long-term resilience of the stored hashes against offline brute-force attempts, the security community urges first concatenating sensitive data with per-user (pseudo)random data and only then hashing the string. Computationally demanding hashing algorithms such as SHA-256 and SHA-512 produce more secure outputs; the processing overhead is, however, increased. Every entity dealing with sensitive data must therefore decide whether the benefit of security outweighs the disadvantage of higher computational demands, although computational resources are usually abundant with pervasive cloud infrastructure available on a pay-per-use basis.

An alternative proposed in 1996 is titled Hash-based Message Authentication Code (HMAC) and provides increased security for MACs by combining a hashing scheme with a cryptographic key [28]. The strength of the output is dependent on the hash function (SHA-1, MD5) and its bit size along with the parameters of the key. The theoretical attacks that exist do not subvert HMAC's security in any way.

Libraries properly implementing key stretching are freely available, among them scrypt and PBKDF2. Key stretching purposefully slows the process of computing the resulting function as much as possible by using longer salts, a higher number of iterations (each round is repeated an arbitrary number of times), and limiting parallelizability of data arrays stored in memory. Scrypt in particular hinders attempts at reverse engineering using rainbow tables as well as ASICs (Application-Specific Integrated Circuits), FPGAs (Field-Programmable Gate Arrays), and GPUs (Graphics Processing Units), dedicated hardware modules exhibiting high computational throughput for mathematical operations. Generating a large vector of pseudorandom bit strings in memory, it accesses the resulting structure in a pseudorandom fashion [29]. Each of the vector's elements is resource-intensive to compute and may be accessed on many occasions during the algorithm's run, precluding distribution of the workload to a cluster of nodes. The library is set to be standardized by the IETF.

5 Conclusions
Website administrators should be informed about advances in hash functions to ensure timely transitions to a well-known and proven scheme adequate for their data-storing infrastructures. Increasing the time factor involved via a per-user salt and a high number of iterations should in particular be considered a priority due to advances in hardware performance. MD5 has been proven insecure against several attacks under realistic assumptions and its use is therefore discouraged in favor of more resilient, key-stretching iterative hashing algorithms. Although no SHA-1 hash collisions have been produced so far, advances as per Moore's law, complexity, and future-proof modifiability should be taken into account when selecting suitable hashing functions for sensitive data.

6 Acknowledgments
The article was supported by the "Centre for Security, Information and Advanced Technologies (CEBIA-Tech)" project, registration number CZ.1.05/2.1.00/03.0089.

References:
[1] PCI Security Standards Council, Payment Card Industry Data Security Standard 2.0 (online), PCI Security Standards Council, 2010, Available at: https://www.pcisecuritystandards.org/security_standards/documents.php Accessed on: 2013-25-01.
[2] EU, Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data (online), EUR-Lex, 1995, Available at: http://eur-.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:en:HTML Accessed on: 2013-06-03.
[3] Moore, Gordon E, Cramming more components onto integrated circuits, Electronics, Vol. 38, No 8, 1965, pp. 4-8.
[4] Schneier, Bruce, Applied Cryptography, Second Edition: Protocols, Algorithms, and Source Code in C, Wiley, New Jersey, 1996.
[5] Feistel, Horst, Cryptography and Computer Privacy, Scientific American, May 1973, Volume 228, No 5, 1973, pp. 15-23.
[6] Rivest, Ronald, The MD5 Message Digest Algorithm (online), Internet Engineering Task Force, 1992, Available at: http://tools.ietf.org/html/rfc1321 Accessed on: 2013-25-01.
[7] Wang, Xiaoyun, and Yu, Hongbo, How to Break MD5 and Other Hash Functions, Lecture Notes in Computer Science, Vol. 3494, 2005, pp. 561-577.
[8] Wang, Xiaoyun, Feng, Dengguo, Lai, Xuejia, and Yu, Hongbo, Collision for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD (online), Cryptology ePrint Archive, Report 2004/199, 2004, Available at: http://eprint.iacr.org/2004/199 Accessed on: 2013-26-01.
[9] Lenstra, Arjen, Wang, Xiaoyun, and de Weger, Benne, Colliding X.509 Certificates (online), Cryptology ePrint Archive, Report 2005/067, 2005, Available at: http://eprint.iacr.org/2005/067 Accessed on: 2013-06-03. Accessed on: 2013-25-01.
[10] Klima, Vlastimil, Finding MD5 Collisions – a Toy for a Notebook (online), Cryptology ePrint Archive, Report 2005/075, 2006, Available at: http://eprint.iacr.org/2005/075 Accessed on: 2013-26-01.
[11] Sotirov, Alexander, Stevens, Marc, Appelbaum, Jacob, Lenstra, Arjen et al., MD5 considered harmful today (online), Technische Universiteit Eindhoven, 2008, Available at: http://www.win.tue.nl/hashclash/rogue-ca/ Accessed on: 2013-26-01.
[12] US-CERT, MD5 vulnerable to collision attacks (online), US-CERT, 2008, Available at: http://www.kb.cert.org/vuls/id/836068 Accessed on: 2013-26-01.
[13] Stevens, Marc, Single-block collision for MD5 (online), marc-stevens.nl, 2012, Available at: http://marc-stevens.nl/research/md5-1block-collision/ Accessed on: 2013-07-03.
[14] Eastlake, Donald E. 3rd, and Jones, Peter, US Secure Hash Algorithm 1 (SHA1) (online), Internet Engineering Task Force, 2001, Available at: tools.ietf.org/html/rfc3174 Accessed on: 2013-26-01.
[15] Wang, Xiaoyun, and Yu, Hongbo, How to Break MD5 and Other Hash Functions, Lecture Notes in Computer Science, Vol. 3494, 2005, pp. 561-577.
[16] De Cannière, Christophe, and Rechberger, Christian, Finding SHA-1 Characteristics: General Results and Applications, Lecture Notes in Computer Science, Vol. 4284, 2006, pp. 1-20.
[17] Stevens, Marc, hashclash - Framework for MD5 & SHA-1 Differential Path Construction and Chosen-Prefix Collisions for MD5 (online), Google Project Hosting, 2011, Available at: https://code.google.com/p/hashclash/ Accessed on: 2013-06-03.
[18] Lamberger, Mario, and Mendel, Florian, Higher-Order Differential Attack on Reduced SHA-256 (online), Cryptology ePrint Archive, Report 2011/037, 2011, Available at: http://eprint.iacr.org/2011/037 Accessed on: 2013-06-03.
[19] Bertoni, Guido, Daemen, Joan, Peeters, Michaël, and Van Assche, Gilles, Sponge Functions, Ecrypt Hash Workshop 2007, 2007.
[20] Aumasson, Jean-Philippe, and Meier, Willi, Zero-sum distinguishers for reduced Keccak-f and for the core functions of Luffa and Hamsi (online), Jean-Philippe Aumasson, 2009, Available at: https://131002.net/data/papers/AM09.pdf Accessed on: 2013-06-03.
[21] Ming, Duan, and Xuajia, Lia, Improved zero-sum distinguisher for full round Keccak-f permutation (online), Cryptology ePrint Archive, Report 2011/023, 2011, Available at: http://eprint.iacr.org/2011/023 Accessed on: 2013-26-03.
[22] Dang, Quynh, NIST Special Publication 800-106: Randomized Hashing for Digital Signatures (online), National Institute of Standards and Technology, 2009, Available at: http://csrc.nist.gov/publications/nistpubs/800-106/NIST-SP-800-106.pdf Accessed on: 2013-07-03.
[23] Rukhin, Andrew, Soto, Juan, Nechvatal, James, Smid, Miles et al., NIST Special Publication 800-22, Revision 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications (online), National Institute of Standards and Technology, 2010, Available at: http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf Accessed on: 2013-07-03.
[24] Turan, Meltem Sönmez, Barker, Elaine, Burr, William, and Chen, Lily, NIST Special Publication 800-132: Recommendation for Password-Based Key Derivation, Part 1: Storage Applications (online), National Institute of Standards and Technology, 2010, Available at: http://csrc.nist.gov/publications/nistpubs/800-132/nist-sp800-132.pdf Accessed on: 2013-07-03.
[25] Hellman, Martin, A Cryptanalytic Time-Memory Trade-Off, IEEE Transactions on Information Theory, Vol. 26, No 4, 1980, pp. 401-406.
[26] Oechslin, Philippe, Making a Faster Time-Memory Trade-Off, Proceedings 23rd Annual International Cryptology Conference (CRYPTO 2003), Santa Barbara, California, 2003, pp. 617-630.
[27] Manber, Udi, A Simple Scheme to Make Passwords Based on One-Way Functions Much Harder to Crack (online), University of Arizona, Department of Computer Science, 1994, Available at: http://webglimpse.net/pubs/TR94-34.pdf Accessed on: 2013-07-03.
[28] Bellare, Mihir, Canetti, Ran, and Krawczyk, Hugo, Keying hash functions for message authentication (online), University of California San Diego, Computer Science and Engineering, 1996, Available at: http://cseweb.ucsd.edu/~mihir/papers/kmd5.pdf Accessed on: 2013-07-03.
[29] Percival, Colin, Stronger Key Derivation via Sequential Memory-Hard Functions (online), BSDCan'09, Ottawa, Canada, 2009, Available at: http://www.bsdcan.org/2009/schedule/attachments/87_scrypt.pdf Accessed on: 2013-07-03.
