Analysis of SHA-3 Candidates CubeHash and Keccak
ALEXANDER ROS and CARL SIGURJONSSON
Bachelor of Science Thesis, Stockholm, Sweden 2010
Bachelor's Thesis in Computer Science (15 ECTS credits) at the School of Computer Science and Engineering, Royal Institute of Technology, year 2010
Supervisor at CSC was Cristian Bogdan
Examiner was Mads Dam
URL: www.csc.kth.se/utbildning/kandidatexjobb/datateknik/2010/ros_alexander_OCH_sigurjonsson_carl_K10067.pdf
Royal Institute of Technology, School of Computer Science and Communication, KTH CSC, 100 44 Stockholm
URL: www.kth.se/csc

Abstract
In 2005 the popular hash algorithm standard SHA-1 was revealed to be vulnerable to attacks much faster than brute force. Although this is a serious weakness in theory, the large number of computations required to mount such an attack meant that it posed no immediate threat in practice. Within a few years, however, given technological advances and improved cryptanalysis of these algorithms, this is unlikely to remain the case. Hence the need for a modern set of secure hash algorithms. To that end, NIST announced in 2007 a public competition to develop the best possible replacement, to be named SHA-3 by 2012. In this report we analyze two promising candidates, namely CubeHash and Keccak. A series of tests is conducted to measure their performance.

Summary
Analysis of the SHA-3 candidates CubeHash and Keccak
In 2005 it emerged that the popular hash standard SHA-1 contained weaknesses against attacks faster than a brute-force attack. This has serious consequences in theory, but because of the massive number of computations involved it has no practical application yet. Over the years, however, computers will develop to the point where this may become a problem. For that reason the demand for a more modern hash standard grew, and in 2007 NIST announced that it had launched a competition to find the best replacement, expected to be completed in 2012. In this report we analyze two promising candidates, CubeHash and Keccak, by carrying out a series of performance tests.

Contents
1 Introduction
  1.1 Brief description
  1.2 New hash standard
  1.3 Purpose of the report
2 Background
  2.1 Hash Functions
  2.2 Constructions
    2.2.1 Merkle-Damgård
    2.2.2 HAIFA
    2.2.3 Cryptographic sponges
3 Secure Hash Algorithm
  3.1 Overview of SHA-3 candidates
4 Keccak Algorithm
  4.1 Padding
  4.2 Overview
  4.3 Parameters
  4.4 Keccak-f
5 CubeHash Algorithm
  5.1 Parameters
  5.2 Overview
  5.3 Transformation
6 Tests
  6.1 Results
7 Discussion
  7.1 The tests
  7.2 Conclusion
  7.3 References

Chapter 1 Introduction
Hash functions have come to play a major role in modern cryptography and are widely deployed in various security protocols and applications. The most common uses include digital signatures, message authentication codes (MACs) and password protection schemes. Hash functions are also used for fast lookups in large data sets, but that is a different context from cryptographic hashing and is not covered in this report.

1.1 Brief description
A cryptographic hash function takes a message of arbitrary length as input and produces a value of fixed length as output. This value is usually called a message digest or simply a hash. Much like a signature on a document proves that it is authentic, the hash assures the integrity of the original message. Since even the slightest alteration of the message results, with overwhelming probability, in a different hash, the alteration becomes clear to anyone comparing the hashes.
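As a small illustration of the integrity check described above, the following sketch hashes two nearly identical messages with SHA-1 from Python's standard hashlib module and counts how many of the 160 digest bits differ. The example messages and the helper names sha1_hex and differing_bits are our own assumptions for illustration and do not come from the thesis.

```python
import hashlib


def sha1_hex(message: bytes) -> str:
    """Return the SHA-1 digest of message as 40 hexadecimal characters."""
    return hashlib.sha1(message).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the SHA-1 digests of a and b differ."""
    digest_a = hashlib.sha1(a).digest()
    digest_b = hashlib.sha1(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))


# Hypothetical messages that differ in a single character.
original = b"Transfer 100 SEK to Alice"
altered = b"Transfer 900 SEK to Alice"

print(sha1_hex(original))
print(sha1_hex(altered))
print(differing_bits(original, altered), "of 160 bits differ")
```

Anyone holding the digest of the original message can detect the alteration by recomputing the hash and comparing the two values.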
Generally, for a hash function to be considered secure it should resist the following three types of attack:
• Collision - Find any two distinct messages that produce the same hash.
• 1st preimage - Given a hash, find a message that produces it.
• 2nd preimage - Given a message, find another message that has the same hash.

1.2 New hash standard
Currently the most widely used hash algorithm is SHA-1 (Secure Hash Algorithm version 1). It was thought secure until 2005, when a team of Chinese cryptographers discovered a weakness that made it significantly easier to produce collisions [13]. It was recommended that SHA-2 be used instead. Although no attacks on SHA-2 are known, its structural similarities with SHA-1 make it a less than ideal long-term replacement. In 2007 the National Institute of Standards and Technology (NIST) proposed an open competition for the development of SHA-3, the next hash standard.

1.3 Purpose of the report
In this report we study two of the currently fourteen candidates in the SHA-3 competition. Our main focus is evaluating their performance, with the previous standards (SHA-1, SHA-2) as a point of reference. We begin with a brief look at the traditional construction models used to design hash functions, as well as some newer ones common among the SHA-3 candidates. In chapters 4 and 5 we analyse the hash algorithms of our chosen candidates and discuss their features. This is followed by the main benchmarking tests. Finally we present the results and our conclusions based on them.

Chapter 2 Background

2.1 Hash Functions
Hash functions process arbitrary messages and produce fixed-length outputs. Ideally each possible message would map to a distinct hash; the function would then be said to be collision free. In reality, however, hash functions are bound to have collisions, given the infinite domain and finite range. The probability of a collision occurring by chance depends on the hash size, but in most cases it is negligibly small.
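To put a rough number on "negligibly small", the chance of an accidental collision can be estimated with the standard birthday approximation p ≈ 1 - exp(-k(k-1)/2^(n+1)) for k messages and an n-bit hash. The sketch below assumes an idealised hash whose outputs are uniformly random; the function name collision_probability is our own and is not taken from the thesis.

```python
import math


def collision_probability(num_messages: int, hash_bits: int) -> float:
    """Approximate the probability that at least two of num_messages
    uniformly random hash_bits-bit hashes are equal (birthday approximation)."""
    pairs = num_messages * (num_messages - 1) / 2.0
    exponent = -pairs / (2.0 ** hash_bits)
    # -expm1(x) computes 1 - exp(x) without losing precision for tiny x.
    return -math.expm1(exponent)


# One billion messages hashed to 160 bits (the SHA-1 digest size).
print(collision_probability(10**9, 160))

# Roughly 2^(n/2) messages are needed before a collision becomes likely.
print(collision_probability(2**80, 160))
```

Under this assumption, a billion 160-bit hashes collide with a probability far below 10^-29, while about 2^80 hashes are needed before the probability reaches roughly 0.39.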
Furthermore, hash functions should be strictly one-way, i.e. it should not be feasible to find the original message from the hash. Nor, given a message, should it be feasible to find another message that produces the same hash. These are the main properties required of a hash function; they are defined more formally as follows.

Collision resistance
Function H is considered collision resistant if it is infeasible to find two distinct messages $m$ and $m'$ such that $H(m) = H(m')$.

Preimage resistance
Function H is considered preimage resistant if, given a hash $h$, it is infeasible to find a message $m$ such that $H(m) = h$.

2nd preimage resistance
Function H is considered 2nd preimage resistant if, given a message $m$, it is infeasible to find an $m'$ such that $m \neq m'$ and $H(m) = H(m')$.

The generic "brute force" collision attack has a complexity of $2^{n/2}$, where $n$ is the bit length of the hash. This is also known as a birthday attack due to the mathematical similarities to the birthday problem [5]. Preimage attacks pose a greater practical threat than collision attacks; a generic preimage attack requires $O(2^n)$ computations for an $n$-bit hash.

A common model used in cryptography is the random oracle model: a theoretical black box that returns a completely random output for each new input, yet returns the same result if called again with the same input. Although random oracles are impossible to implement in practice, they are of great help when designing secure hash functions and are used in many proofs.

2.2 Constructions
Since hash functions have an infinite domain and must process arbitrarily long inputs, they are generally constructed to divide the data into fixed-size blocks. The blocks are then either chained through a series of compression function calls or incorporated into an inner state, which is an internal data structure used to keep track of the intermediate hash data. In this section we describe some of the constructions common among the SHA-3 candidates.
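The first approach mentioned above, dividing the input into fixed-size blocks and chaining them through a compression function, can be sketched as follows. This is a toy illustration under our own assumptions: the 8-byte block size, the zero padding and the truncated SHA-256 used as a stand-in compression function are hypothetical choices, not part of any SHA-3 candidate.

```python
import hashlib

BLOCK_SIZE = 8  # bytes; hypothetical toy block size


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Zero-pad data to a multiple of block_size and split it into blocks."""
    padded = data + b"\x00" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def toy_compress(chaining_value: bytes, block: bytes) -> bytes:
    """Stand-in compression function: two BLOCK_SIZE-byte inputs, one
    BLOCK_SIZE-byte output (truncated SHA-256, for illustration only)."""
    return hashlib.sha256(chaining_value + block).digest()[:BLOCK_SIZE]


def chain(data: bytes, iv: bytes = bytes(BLOCK_SIZE)) -> bytes:
    """Feed each block, together with the previous output, into toy_compress."""
    state = iv
    for block in split_blocks(data):
        state = toy_compress(state, block)
    return state


print(chain(b"an arbitrarily long message").hex())
```

The naive zero padding used here is ambiguous: b"abc" and b"abc\x00" are padded to the same block and therefore produce the same output. Practical constructions pad in a way that rules this out.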
2.2.1 Merkle-Damgård
Merkle-Damgård (MD) is the construction model on which the majority of commonly used hash functions, such as MD5, SHA-1 and SHA-2, are based. It is named after Merkle and Damgård, who proved that if the compression function is collision resistant then so is the hash function built on it [20].

Figure 2.1. Merkle-Damgård construction

The figure above illustrates how the message is broken up into n fixed-size blocks of size b and passed to the underlying compression function f. The compression function expects two input values of size b and outputs a value of size b. An initialization vector (IV), which is predefined and depends on the implementation, is first passed to f along with the first message block. The resulting value is then chained to the next iteration of f along with the subsequent message block. This procedure is repeated until all blocks have been processed. Padding is applied in the last block by appending a single 1 followed by enough 0s to fill the block to size b. The length of the original message must be included, either in the padding space (i.e. replacing zeros) or in an extra block should it not fit. The finalisation step varies with the implementation, but it is usually designed to achieve a good avalanche effect, that is, a mixing of the hash bits so that even nearly identical input messages generate greatly differing hash outputs.

Merkle-Damgård strengthening
The length padding procedure is known as Merkle-Damgård strengthening and is important because without it the collision resistance proof mentioned above is invalid. To illustrate this point, say we have an MD-based hash function H with a compression function f. We are given two messages m0 and m1 with the effect that H(m1) = H(m0 || m1) and f(iv, m0) = iv.