Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using Fpgas

Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs Ekawat Homsirikamol Marcin Rogawski Kris Gaj George Mason University ehomsiri, mrogawsk, [email protected] Last revised: December 21, 2010 This work has been supported in part by NIST through the Recovery Act Measurement Science and Engineering Research Grant Program, under contract no. 60NANB10D004. Contents 1 Introduction and Motivation 2 2 Methodology 4 2.1 Choice of a Language, FPGA Devices, and Tools . 4 2.2 Performance Metrics for FPGAs . 4 2.3 Uniform Interface . 7 2.4 Assumptions and Simplifications . 9 2.5 Optimization Target . 9 2.6 Design Methodology . 10 3 Comprehensive Designs of SHA-3 Candidates 13 3.1 Notations and Symbols . 13 3.2 Basic Component Description . 15 3.2.1 Multiplication by 2 in the Galois Field GF(28) . 15 3.2.2 Multiplication by n in the Galois Field GF(28) . 15 3.2.3 AES . 16 3.3 BLAKE . 19 3.3.1 Block Diagram Description . 19 3.3.2 256 vs. 512 Variant Differences . 21 3.4 Blue Midnight Wish (BMW) . 24 3.4.1 Block Diagram Description . 24 3.4.2 256 vs. 512 Variant Differences . 24 3.5 CubeHash . 26 3.5.1 Block Diagram Description . 26 3.5.2 256 vs. 512 Variant Differences . 26 3.6 ECHO . 29 3.6.1 Block Diagram Description . 29 3.6.2 256 vs. 512 Variant Differences . 31 3.7 Fugue . 33 3.7.1 Block Diagram Description . 33 2 Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates 3 3.7.2 256 vs. 512 Variant Differences . 35 3.8 Groestl . 39 3.8.1 Block Diagram Description . 39 3.8.2 256 vs. 512 Variant Differences . 41 3.9 Hamsi . 43 3.9.1 Block Diagram Description . 43 3.9.2 256 vs. 512 Variant Differences . 45 3.10 JH . 48 3.10.1 Block Diagram Description . 48 3.10.2 256 vs. 512 Variant Differences . 48 3.11 Keccak . 53 3.11.1 Block Diagram Description . 53 3.11.2 256 vs. 512 Variant Differences . 57 3.12 Luffa . 58 3.12.1 Block Diagram Description . 58 3.12.2 256 vs. 512 Variant Differences . 58 3.13 SHA-2 . 65 3.13.1 Block Diagram Description . 65 3.13.2 256 vs. 512 Variant Differences . 65 3.14 Shabal . 67 3.14.1 Block Diagram Description . 67 3.14.2 256 vs 512 Variant Differences . 67 3.15 SHAvite-3 . 69 3.15.1 Block Diagram Description . 69 3.15.2 256 vs. 512 Variant Differences . 70 3.16 SIMD . 75 3.16.1 Block Diagram Description . 75 3.16.2 256 vs. 512 Variant Differences . 82 3.17 Skein . 86 3.17.1 Block Diagram Description . 86 3.17.2 256 vs. 512 Variant Differences . 87 4 Design Summary and Results 90 4.1 Design Summary . 90 4.2 Relative Performance of the 512 and 256-bit Variants of the SHA-3 Candidates 91 4.3 Results . 93 5 Results from Other Groups 108 5.1 Best Results from Other Groups . 108 5.2 Best Results . 109 4 E. Homsirikamol, M. Rogawski, and K. Gaj 6 Conclusions and Future Work 114 A Absolute Results for Ten FPGA Families 116 Abstract Performance in hardware has been demonstrated to be an important factor in the evaluation of candidates for cryptographic standards. Up to now, no consensus exists on how such an evaluation should be performed in order to make it fair, transparent, practical, and acceptable for the majority of the cryptographic community. In this report, we formulate a proposal for a fair and comprehensive evaluation methodology, and apply it to the comparison of hardware performance of 14 Round 2 SHA-3 candidates. The most important aspects of our methodology include the definition of clear performance metrics, the development of a uniform and practical interface, generation of multiple sets of results for several representative FPGA families from two major vendors, and the application of a simple procedure to convert multiple sets of results into a single ranking. The VHDL codes for 256 and 512-bit variants of all 14 SHA-3 Round 2 candidates and the old standard SHA-2 have been developed and thoroughly verified. These codes have been then used to evaluate the relative performance of all aforementioned algorithms using ten modern families of Field Programmable Gate Arrays (FPGAs) from two major vendors, Xilinx and Altera. All algorithms have been evaluated using four performance measures: the throughput to area ratio, throughput, area, and the execution time for short messages. Based on these results, the 14 Round 2 SHA-3 candidates have been divided into several groups depending on their overall performance in FPGAs. Chapter 1 Introduction and Motivation Starting from the Advanced Encryption Standard (AES) contest organized by NIST in 1997-2000 [1], open contests have become a method of choice for selecting cryptographic standards in the U.S. and over the world. The AES contest in the U.S. was followed by the NESSIE competition in Europe [2], CRYPTREC in Japan, and eSTREAM in Europe [3]. Four typical criteria taken into account in the evaluation of candidates are: security, performance in software, performance in hardware, and flexibility. While security is commonly recognized as the most important evaluation criterion, it is also a measure that is most difficult to evaluate and quantify, especially during a relatively short period of time reserved for the majority of contests. A typical outcome is that, after eliminating a fraction of candidates based on security flaws, a significant number of remaining candidates fail to demonstrate any easy to identify security weaknesses, and as a result are judged to have adequate security. Performance in software and hardware are next in line to clearly differentiate among the candidates for a cryptographic standard. Interestingly, the differences among the cryptographic algorithms in terms of hardware performance seem to be particularly large, and often serve as a tiebreaker when other criteria fail to identify a clear winner. For example, in the AES contest, the difference in hardware speed between the two fastest final candidates (Serpent and Rijndael) and the slowest one (Mars) was by a factor of seven [1][4]; in the eSTREAM competition the spread of results among the eight top candidates qualified to the final round was by a factor of 500 in terms of speed (Trivium x64 vs. Pomaranch), and by a factor of 30 in terms of area (Grain v1 vs. Edon80) [5][6]. At this point, the focus of the attention of the entire cryptographic community is on the SHA-3 contest for a new hash function standard, organized by NIST [7][8]. The contest is now in its second round, with 14 candidates remaining in the competition. The evaluation is scheduled to continue until the second quarter of 2012. In spite of the progress made during previous competitions, no clear and commonly 2 Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates 3 accepted methodology exists for comparing hardware performance of cryptographic algorithms [9]. The majority of the reported evaluations have been performed on an ad-hoc basis, and focused on one particular technology and one particular family of hardware devices. Other pitfalls included the lack of a uniform interface, performance metrics, and optimization criteria. These pitfalls are compounded by different skills of designers, using two different hardware description languages, and no clear way of compressing multiple results to a single ranking. In this paper, we address all the aforementioned issues, and propose a clear, fair, and comprehensive methodology for comparing hardware performance of SHA-3 candidates and any future algorithms competing to become a new cryptographic standard. Our methodology is based on the use of FPGA devices from various vendors. The advantages of using FPGAs for comparison include short development time, wide availability of tools, and a limited number of vendors dominating the market. The hardware evaluation of SHA-3 candidates started shortly after announcing the specifications and reference software implementations of 51 algorithms submitted to the contest [7][8][10]. The majority of initial comparisons were limited to less than five candidates, and their results have been published at [10]. The more comprehensive efforts became feasible only after NISTs announcement of 14 candidates qualified to the second round of the competition in July 2009. Since then, several comprehensive studies have been reported in the Cryptology ePrint Archive [11][12], at the CHES 2010 workshop [13][14], and during the Second SHA-3 Candi- date Conference [15][16][17][18]. This report is an extension of our earlier papers presented at CHES 2010 and the SHA-3 Candidate Conference [13][16]. To our best knowledge, this is the first report that presents the detailed block diagrams of all 14 Round 2 SHA-3 Can- didates, and a comprehensive set of results covering two major SHA-3 variants (SHA-3-256 and SHA-3-512) implemented using 10 FPGA families from two major vendors, Xilinx and Altera. Chapter 2 Methodology 2.1 Choice of a Language, FPGA Devices, and Tools Out of two major hardware description languages used in industry, VHDL and Verilog HDL, we choose VHDL. We believe that either of the two languages is perfectly suited for the implementation and comparison of SHA-3 candidates, as long as all candidates are described in the same language. Using two different languages to describe different candidates may introduce an undesired bias to the evaluation. FPGA devices from two major vendors, Xilinx and Altera, dominate the market with about 90% of the market share.

Load more