
arXiv:1804.06497v1 [cs.CR] 17 Apr 2018

Lightweight Hardware Architectures for Efficient Secure Hash Functions ECHO and Fugue

Mehran Mozaffari Kermani, Reza Azarderakhsh, Siavash Bayat-Sarmadi

Mehran Mozaffari Kermani is with Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, Email: [email protected].
Reza Azarderakhsh is with Department of ECE and Computer Science, Florida Atlantic University, Boca Raton, FL, Email: razarder-[email protected].
Siavash Bayat-Sarmadi is with Department of Computer Engineering, Sharif University of Technology, Tehran, Iran, Email: sbayat@sharif.edu.

Abstract: In cryptographic engineering, extensive attention has been devoted to ameliorating the performance and security of the algorithms within. Nonetheless, in the state-of-the-art, the approaches for increasing the reliability of the efficient hash functions ECHO and Fugue have not been presented to date. We propose efficient fault detection schemes by presenting closed formulations for the predicted signatures of different transformations in these algorithms. These signatures are derived to achieve low overhead for the specific transformations and can be tailored to include byte/word-wide predicted signatures. Through simulations, we show that the proposed fault detection schemes are highly capable of detecting natural hardware failures and are capable of deteriorating the effectiveness of malicious fault attacks. The proposed reliable hardware architectures are implemented on the application-specific integrated circuit (ASIC) platform using a 65-nm standard technology to benchmark their hardware and timing characteristics. The results of our simulations and implementations show very high error coverage with acceptable overhead for the proposed schemes.

I. INTRODUCTION

Cryptographic hash functions take arbitrary-length inputs and generate fixed-length outputs. The output of hash functions is then utilized to provide authentication and integrity for the transferred data. In this paper, due to the efficiency of the widely-utilized ECHO [1] and Fugue [2] (which has been improved to Fugue 2.0), and the fact that these are inspired by Advanced Encryption Standard (AES), we present their respective fault detection schemes. These AES-inspired hash functions (which have been part of the NIST competition) have received much attention in the literature. For instance, differential [3] and side-channel analysis attacks [4] for ECHO are presented. Moreover, much effort has been put into developing high-performance and efficient hardware implementations of these algorithms, see, for instance, [5], [6], and [7]. One important feature of some of these hash functions is that one can share resources between the AES and these hash algorithms, as discussed in [8]. Thus, low-complexity implementations are achieved.

Fault attacks pose serious threats to the implementations of the crypto-algorithms. Therefore, many fault detection schemes have been proposed to date for cryptographic and arithmetic entities, see, for instance, [9], [10], [11], [12], [13], [14], [15], [16], [17], and [18] for some examples. Nonetheless, to the best of our knowledge, the schemes for increasing the reliability of these algorithms have not been presented in the open literature. Effective fault detection schemes with minimal overhead on these algorithms are essential for achieving reliable hardware architectures.

The summary of our contributions is presented in the following.

• We have obtained new formulations for the predicted signatures of different transformations for hash algorithms, i.e., ECHO [1] and Fugue [2]. The presented closed formulations are used for proposing high-performance and effective fault detection schemes.
• Our simulation results show high fault detection capability for the proposed schemes for all the algorithms. This makes the proposed architectures reliable in practice.
• We have used ASIC implementations to benchmark the hardware and timing characteristics of the proposed schemes. The high efficiency of the proposed schemes makes the proposed architectures suitable for high-performance applications.

II. PRELIMINARIES

ECHO (presented by Benadjila et al. [1]) supports any hash output of length from 128 to 512 bits. It takes a message and a salt as input. Although the ECHO hash function algorithm can be used for any output length from 128 to 512 bits, the four outputs for the NIST competition were 224, 256, 384, and 512 bits. For the output size ($H_{size}$) $128 \le H_{size} \le 256$, the compression function is called Compress512. However, for $257 \le H_{size} \le 512$, ECHO uses the compression function called Compress1024, which is very similar to Compress512 [1]. More details are presented throughout the paper as needed.

In what follows, we explain the hash function Fugue (presented by IBM) [2]. Fugue-256 generates a 256-bit output for the message $M$, which is split into 32-bit blocks $m_i$, $1 \le i \le t$. The chaining value of Fugue-256 (denoted by $S$) is also split into 32-bit blocks denoted by $S_i$, $0 \le i \le 29$. The following sequence of transformations is used for updating (called one round $R$): TIX, ROR3, CMIX, SMIX, ROR3, CMIX, and SMIX. The sequence ROR3, CMIX, SMIX is called a sub-round. Therefore, a round $R$ consists of the TIX transformation followed by two sub-rounds [2]. More details are presented throughout the paper as needed.

Fig. 1. The ECHO algorithm for $128 \le H_{size} \le 256$ [1]. (Block diagram: a chain of Compress512 blocks for $i = 1, \ldots, t$; each block takes the counter $C_i$ and the 128-bit salt, holds a $16 \times 128$-bit state made of the chaining values $v_{i-1}^0, \ldots, v_{i-1}^3$ and the message blocks $m_i^0, \ldots, m_i^{11}$, and applies BIG.SubWords, BIG.ShiftRows, and BIG.MixColumns repeated ×8, followed by BIG.Final.)

III. THE PROPOSED FAULT DIAGNOSIS APPROACHES

In what follows, for each of the algorithms presented in this paper, we propose respective fault detection schemes.

A. ECHO

An overview of the ECHO algorithm for $128 \le H_{size} \le 256$ including the Compress512 functions is presented in Fig. 1. As seen in Fig. 1, each of the $t$ Compress512 functions gets the 128-bit salt, a $4 \times 4$ state of 128-bit entries, and the counter $C_i$, $1 \le i \le t$ (used to count the number of message bits being hashed). The first column of the state consists of four 128-bit values which construct the chaining variable of the previous Compress512, i.e., $V_{i-1} = (v_{i-1}^0, v_{i-1}^1, v_{i-1}^2, v_{i-1}^3)$, $1 \le i \le t$. The other three columns include the 128-bit blocks of the input message. Therefore, in total, there are $12 \times t$ 128-bit message blocks to be processed to give the output (see Fig. 1).

As seen in Fig. 1, each Compress512 consists of four different transformations, i.e., BIG.SubWords, BIG.ShiftRows, BIG.MixColumns, and BIG.Final. Each BIG.SubWords contains two AES rounds. The first transformation, SubBytes, which includes 16 S-boxes, is the only nonlinear AES transformation. In the AES S-box, the irreducible polynomial $M(x) = x^8 + x^4 + x^3 + x + 1$ is used to construct the binary field $GF(2^8)$. Let $X \in GF(2^8)$ and $Y \in GF(2^8)$ be the 8-bit input and output of each S-box, respectively. Then, the S-box consists of a multiplicative inversion, i.e., $X^{-1} \in GF(2^8)$, followed by an affine transformation to obtain $Y \in GF(2^8)$. Look-up tables (LUTs) and composite fields are used to implement the S-boxes; polynomial basis, normal basis, mixed basis, and redundant basis are among the approaches for the low-area composite-field variant [19], [20], [21], [22]. In general, with composite field realizations, a transformation matrix first transforms a field element in the binary field $GF(2^8)$ to the corresponding representation in the composite fields $GF(2^8)/GF(((2^2)^2)^2)$. Then, a multiplicative inversion consisting of composite field operations in the sub-field $GF((2^2)^2)$ is performed. Finally, through an inverse transformation matrix, the inverted output is obtained. There have been a number of research works on error detection of the S-boxes and, for the sake of brevity, we do not discuss them.

The next transformation used in BIG.SubWords of ECHO is ShiftRows, whose fault detection is straightforward and is done by rewiring. Moreover, for the two final linear transformations, i.e., MixColumns and AddRoundKey, the 32-bit error indication flag $E_c = \sum_{r=0}^{3} (in_{r,c} + k_{r,c} + out_{r,c})$, $0 \le c \le 3$, can be used. It is noted that $in_{r,c}$, $k_{r,c}$, and $out_{r,c}$ are the input to MixColumns, the round key, and the output of AddRoundKey, respectively. This error indication flag can be compressed so that an $n$-bit, $1 \le n \le 32$, error indication flag for these two transformations is achieved. Finally, after two rounds of the AES, the output of BIG.SubWords is derived.

Fault detection for the next transformation in ECHO, BIG.ShiftRows, is by permutation. As explained above, the last transformation in BIG.Round, i.e., BIG.MixColumns, is an expansion of MixColumns of the AES. Specifically, the output state of BIG.SubWords (input state of BIG.MixColumns) is arranged as a 4-row, 64-column matrix. Then, each $4 \times 4$ sub-matrix is multiplied by the fixed MixColumns matrix. Therefore, we obtain the error indication flags of the BIG.MixColumns (B.MC) transformation for the $j$ sub-matrices, $0 \le j \le 15$, as follows

$$E_c^j(B.MC) = \sum_{r=0}^{3} (in_{r,c} + out_{r,c}), \quad 4j \le c \le 4j + 3, \tag{1}$$

where in the sub-matrices, $in_{r,c}$ and $out_{r,c}$ are the input and the output of BIG.MixColumns, respectively, for which $0 \le r \le 3$ and $0 \le c \le 63$.

Finally, the BIG.Final transformation is performed as the last transformation in each Compress512 (see Fig. 1) of ECHO. This transformation includes modulo-2 addition of the input state of the Compress512 and the output state of the eighth BIG.MixColumns. We present the following lemma for obtaining the predicted parities of this transformation.

Lemma 1: Let $M_i^j$, $0 \le j \le 11$, be the 128-bit message blocks and $A_i^j$, $0 \le j \le 15$, be the 128-bit outputs of the eighth BIG.MixColumns of the $i$-th Compress512 in Fig. 1. In addition, let $v_{i-1}^j$, $0 \le j \le 3$, be the previous chaining values. Then, the predicted parities of $v_i^j$, $0 \le j \le 3$ (the current chaining values), after performing the BIG.Final transformation are obtained as

$$\hat{P}(v_i^j) = \sum_{j=0}^{3} P(v_{i-1}^j + A_i^{4j}) + \sum_{j=0}^{2} P(M_i^{4j}). \tag{2}$$

Proof. According to [1], we have $v_i^j = \sum_{j=0}^{3} v_{i-1}^j + \sum_{j=0}^{15} A_i^j + \sum_{j=0}^{11} M_i^j$. Therefore, for the predicted parity we reach $\hat{P}(v_i^j) = \sum_{j=0}^{3} P(v_{i-1}^j) + \sum_{j=0}^{15} P(A_i^j) + \sum_{j=0}^{11} P(M_i^j)$ and after rearranging, the proof is complete.

It is interesting to note that one can also obtain multiple parities for $v_i^j$ by applying the parity derivation function ($P$) to selected bits of the arguments $v_{i-1}^j + A_i^{4j}$ and $M_i^{4j}$.

B. Fugue

To propose a fault detection scheme for Fugue, we observe that the Fugue transformations can be divided into three types. The first type is the rotation transformations, i.e., ROR3, ROR14, and ROR15.
The second category contains the two linear transformations TIX and CMIX. Finally, the last one is the nonlinear transformation SMIX.

Each Fugue round has the following sequence: TIX, ROR3, CMIX, SMIX, ROR3, CMIX, and SMIX. First, we propose the following theorem for the first three transformations TIX, ROR3, and CMIX in the round sequence. Then, we propose the fault detection scheme for the nonlinear transformation SMIX.

Theorem 1: Let $\sigma_{S_i} = \sum_{i=0}^{29} S_i$ be the 32-bit result of modulo-2 additions of $S_i$, $0 \le i \le 29$ (called the word-wide signature). Then, the predicted word-wide signature of the transformation sequence TIX, ROR3, and CMIX ($\hat{\sigma}_{TRC}$) in the Fugue round is obtained as

$$\hat{\sigma}_{TRC} = \sigma_{S_i} + S_{24}. \tag{3}$$

Proof. For TIX, the following substitutions are performed: $S_{10} \leftarrow S_{10} + S_0$, $S_0 \leftarrow m_i$, $S_8 \leftarrow S_8 + m_i$, and $S_1 \leftarrow S_1 + S_{24}$. Therefore, we have $\hat{\sigma}_{TIX} = \sigma_{S_i} + S_{10} + S_{10} + S_0 + S_0 + m_i + S_8 + S_8 + m_i + S_1 + S_1 + S_{24} = \sigma_{S_i} + S_{24}$. The ROR3 transformation, which is just a rotation by three positions to the right, does not change $\hat{\sigma}_{TIX} = \sigma_{S_i} + S_{24}$. Moreover, for CMIX, we have $S_0 \leftarrow S_0 + S_4$, $S_1 \leftarrow S_1 + S_5$, $S_2 \leftarrow S_2 + S_6$, $S_{15} \leftarrow S_{15} + S_4$, $S_{16} \leftarrow S_{16} + S_5$, and $S_{17} \leftarrow S_{17} + S_6$. Consequently, we reach $\hat{\sigma}_{CMIX} = \sigma_{S_i} + S_0 + S_0 + S_4 + S_1 + S_1 + S_5 + S_2 + S_2 + S_6 + S_{15} + S_{15} + S_4 + S_{16} + S_{16} + S_5 + S_{17} + S_{17} + S_6 = \sigma_{S_i}$. Therefore, one reaches $\hat{\sigma}_{TRC} = \sigma_{S_i} + S_{24}$ and the proof is complete.

The nonlinear transformation SMIX in Fugue consists of two functions. The second one is the linear Super-Mix function. The Super-Mix function consists of multiplication of $S_0$-$S_3$ (as a 16-byte input vector) with the following $16 \times 16$ matrix $N$ with hexadecimal entries (shown in groups of four columns) to derive a 16-byte output

$$N = \begin{pmatrix}
1471 & 1000 & 1000 & 1000 \\
0100 & 1147 & 0100 & 0100 \\
0010 & 0010 & 7114 & 0010 \\
0001 & 0001 & 0001 & 4711 \\
0000 & 0471 & 1000 & 1000 \\
0100 & 0000 & 1047 & 0100 \\
0010 & 0010 & 0000 & 7104 \\
4710 & 0001 & 0001 & 0000 \\
0000 & 7000 & 6471 & 7000 \\
0700 & 0000 & 0700 & 1647 \\
7164 & 0070 & 0000 & 0070 \\
0007 & 4716 & 0007 & 0000 \\
0000 & 4000 & 4000 & 5471 \\
1547 & 0000 & 0400 & 0400 \\
0040 & 7154 & 0000 & 0040 \\
0004 & 0004 & 4715 & 0000
\end{pmatrix}. \tag{4}$$

We propose the following theorem for the predicted parity of the Super-Mix function.

Theorem 2: Let $I_i \in GF(2^8)$ and $O_i \in GF(2^8)$, $0 \le i \le 15$, be the 16-byte input and output of the Super-Mix function in Fugue, respectively. Then, the predicted parity for this function, i.e., $\hat{P}_{SM}$, is derived as follows (we note that parity is just an example and any other detecting codes can be utilized)

$$\hat{P}_{SM} = \{3\}_h (I_0 + I_5 + I_{10} + I_{15}), \tag{5}$$

where the multiplication is performed using the irreducible polynomial $M(x) = x^8 + x^4 + x^3 + x + 1$.

Proof. We add the elements of the columns of $N$ to reach the predicted parity $\hat{P}_{SM}$. It is interesting to note that adding the elements in all columns except those in columns 0, 5, 10, and 15 would result in zero. For instance, if one adds the elements in column 1 of $N$ (modulo-2), the result would be $\{4\}_h + \{1\}_h + \{1\}_h + \{7\}_h + \{7\}_h + \{1\}_h + \{5\}_h = 0$. For columns 0, 5, 10, and 15, the addition of elements results in $\{1\}_h + \{4\}_h + \{7\}_h + \{1\}_h = \{4\}_h + \{7\}_h = \{3\}_h$ and this completes the proof. We note that the multiplication with $\{3\}_h = \{2\}_h + \{1\}_h$ is derived by the addition of $I_0 + I_5 + I_{10} + I_{15}$ with $x(I_0 + I_5 + I_{10} + I_{15}) \bmod M(x)$.

IV. SIMULATION RESULTS AND ASIC IMPLEMENTATIONS

The proposed error detection architectures have been simulated after injecting faults. The proposed architectures have the capability of detecting both permanent and transient faults (this covers both natural and malicious faults). In this paper, we use the stuck-at error model. The objective in using this model is to cover the malicious errors injected by the attackers to break the algorithm (by injecting one or more incorrect bits) and to detect natural errors caused by bit flips. The stuck-at error forces one bit (for the single stuck-at error model) or multiple bits (for the multiple stuck-at error model) to be stuck at logic one or zero. This makes the result value independent of the error-free intended value.

In fault attacks, single error injection is the ideal case for gaining the maximum information. Nevertheless, due to technological constraints, a more realistic error model is to inject multiple errors. Therefore, for covering both natural errors and fault attacks, multiple errors need to be considered. The proposed diagnosis schemes in this paper are independent of the life-time of errors. Therefore, both permanent and transient stuck-at errors lead to the same error coverage. We also note that intelligent attackers do not get confined to just multiple stuck-at faults and thus the ability to detect single faults is important.

The fault model used to test the proposed architectures is created using external feedback linear-feedback shift registers (LFSRs) to generate pseudo-random fault vectors that can flip random bits in the output of the gates and at random intervals. For the architectures presented, we have injected up to 80,000 faults and recorded the number of errors. We have also used the redundant-basis S-boxes in composite field where applicable. Moreover, the false alarm ratios are derived. The error coverage in all the cases is more than 99% (and for the case of single stuck-at faults, 100% if we harden the error indication flag comparison units), with a relatively low ratio of false alarms, i.e., 0.1%-0.3% for the cases. As we inject more faults, the difference between the error detection results is, comparably, not high, showing the relatively high accuracy of the results.

Through ASIC and for the constructions of the algorithms in 256-bit form, we also present the performance and implementation metrics of the presented constructions. The benchmarking is performed for the error detection architectures using the TSMC 65-nm library and Synopsys Design Compiler (shown in Table I for area, frequency, throughput, and efficiency [throughput over GE]).
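The fault-injection experiment of this section can be mimicked in software for the Super-Mix signature of (5). The following Python sketch is only an illustration under stated assumptions: the paper does not give the LFSR polynomial or seed, so a common 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) and the seed `0xACE1` are assumed; faults are single bit flips on the 16 output bytes rather than on gate outputs; and the names `gf_mul`, `super_mix`, `predicted_signature`, `lfsr16`, and `run_campaign` are hypothetical. The matrix rows are copied from (4).

```python
# Rows of the 16 x 16 matrix N from (4), one hexadecimal digit per entry.
N_ROWS = [
    "1471100010001000", "0100114701000100", "0010001071140010", "0001000100014711",
    "0000047110001000", "0100000010470100", "0010001000007104", "4710000100010000",
    "0000700064717000", "0700000007001647", "7164007000000070", "0007471600070000",
    "0000400040005471", "1547000004000400", "0040715400000040", "0004000447150000",
]
N = [[int(digit, 16) for digit in row] for row in N_ROWS]


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo M(x) = x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def super_mix(inp):
    """Multiply the 16-byte input vector by N over GF(2^8)."""
    out = []
    for row in N:
        acc = 0
        for coeff, byte in zip(row, inp):
            if coeff:
                acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


def xor_all(values):
    """Modulo-2 sum of all bytes (the actual byte-wide signature)."""
    acc = 0
    for value in values:
        acc ^= value
    return acc


def predicted_signature(inp):
    """Predicted signature of (5): {3}_h * (I0 + I5 + I10 + I15)."""
    return gf_mul(3, inp[0] ^ inp[5] ^ inp[10] ^ inp[15])


def lfsr16(state):
    """One step of an assumed 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def run_campaign(trials, seed=0xACE1):
    """Inject one LFSR-chosen bit flip per trial; return the number detected."""
    lfsr = seed
    detected = 0
    for _trial in range(trials):
        inp = []
        for _byte in range(16):
            lfsr = lfsr16(lfsr)
            inp.append(lfsr & 0xFF)
        out = super_mix(inp)
        lfsr = lfsr16(lfsr)
        position = lfsr % 128                      # which of the 128 output bits
        out[position // 8] ^= 1 << (position % 8)  # inject the fault
        if xor_all(out) != predicted_signature(inp):
            detected += 1
    return detected


if __name__ == "__main__":
    print(run_campaign(1000))
```

A single flipped output bit always changes the XOR of the output bytes, so every injected fault in this sketch is flagged. Flipping the same bit position in an even number of output bytes would leave this one byte-wide signature unchanged, which is why multiple-fault coverage depends on the fault pattern and on how many signatures are used.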

TABLE I
BENCHMARK FOR THE PROPOSED ERROR DETECTION SCHEMES FOR THE HASH ALGORITHMS ON ASIC (65NM TSMC)

Algorithm         Block (bits)   Area [GE]        Frequency [MHz]   Throughput [Gbps]   Efficiency [Mbps/GE]
ECHO-256          1,536          145,912          389               6.48                44.40
Proposed scheme   1,536          187,098 (28%)    370 (4.9%)        6.18 (4.6%)         33.03 (25.6%)
Fugue-256         32             49,040           547               8.77                178.8
Proposed scheme   32             57,900 (18.1%)   519 (5.1%)        8.33 (5.1%)         141.1 (21.1%)
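The percentages in parentheses in Table I can be read as relative changes of the proposed scheme against the unprotected design. The short Python sketch below recomputes them from the printed figures; the dictionary layout and the function name `overhead_percent` are hypothetical, and since the inputs are the rounded values from the table, the last digit of a recomputed percentage may differ slightly from the printed one.

```python
# Printed values from Table I: (baseline, proposed scheme).
TABLE_I = {
    "ECHO-256": {
        "area_ge": (145912, 187098),
        "frequency_mhz": (389, 370),
        "throughput_gbps": (6.48, 6.18),
        "efficiency": (44.40, 33.03),
    },
    "Fugue-256": {
        "area_ge": (49040, 57900),
        "frequency_mhz": (547, 519),
        "throughput_gbps": (8.77, 8.33),
        "efficiency": (178.8, 141.1),
    },
}


def overhead_percent(baseline, proposed):
    """Relative change of the proposed scheme against the baseline, in percent."""
    return abs(proposed - baseline) / baseline * 100.0


if __name__ == "__main__":
    for algorithm, metrics in TABLE_I.items():
        for metric, (baseline, proposed) in metrics.items():
            change = overhead_percent(baseline, proposed)
            print(f"{algorithm:10s} {metric:16s} {change:5.1f}%")
```

Area grows while frequency, throughput, and efficiency drop, so the absolute value is used to report all four as overhead.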

We note that in Table I, in order to make the area results meaningful when switching technologies, we have also provided the NAND-gate equivalency (gate equivalents: GE). This is performed using the area of a NAND gate in the utilized TSMC 65-nm CMOS library, which is 1.41 µm². The results presented in Table I show acceptable overhead (degradation) for performance and implementation metrics. We also note that the utilized platform is merely for benchmark and we expect similar results on field-programmable gate arrays (FPGAs) or different ASIC libraries.

V. CONCLUSIONS

In this paper, we have proposed efficient fault detection schemes by presenting closed formulations for the predicted signatures of different transformations in the ECHO and Fugue hash algorithms. These signatures are derived to achieve low overhead for the specific transformations and can be tailored to include byte/word-wide predicted signatures. Through simulations, we have shown that the proposed fault detection schemes are highly capable of detecting natural hardware failures and are capable of deteriorating the effectiveness of malicious fault attacks. The proposed reliable hardware architectures have also been implemented on the ASIC platform using a 65-nm standard technology to benchmark their hardware and timing characteristics. The high efficiency of the proposed schemes makes the proposed reliable architectures suitable for high-performance applications.

REFERENCES

[1] R. Benadjila, O. Billet, H. Gilbert, G. Macario-Rat, T. Peyrin, M. Robshaw, and Y. Seurin, “ECHO,” available: http://crypto.rd.francetelecom.com/echo/, accessed March 2018.
[2] S. Halevi, W. E. Hall, and C. S. Jutla, “The hash function Fugue,” Cryptology ePrint Archive, IACR, https://eprint.iacr.org/2014/423.pdf, 2014, accessed March 2018.
[3] T. Peyrin, “Improved differential attacks for ECHO and Grøstl,” Cryptology ePrint Archive, Report 2010/223, 2010.
[4] O. Benoît and T. Peyrin, “Side-channel analysis of six SHA-3 candidates,” in Proc. CHES, 2010, pp. 140-157.
[5] J.-L. Beuchat, E. Okamoto, and T. Yamazaki, “A compact FPGA implementation of the SHA-3 candidate ECHO,” Cryptology ePrint Archive, Report 2010/364, 2010.
[6] K. Gaj, E. Homsirikamol, and M. Rogawski, “Fair and comprehensive methodology for comparing hardware performance of fourteen round two SHA-3 candidates using FPGAs,” in Proc. CHES, 2010, pp. 264-278.
[7] S. Tillich, M. Feldhofer, M. Kirschbaum, T. Plos, J.-M. Schmidt, and A. Szekely, “Uniform evaluation of hardware implementations of the round-two SHA-3 candidates,” The Second SHA-3 Candidate Conference, Aug. 2010.
[8] K. Järvinen, “Sharing resources between AES and the SHA-3 second round candidates Fugue and Grøstl,” The Second SHA-3 Candidate Conference, Aug. 2010.
[9] M. Mozaffari Kermani and A. Reyhani-Masoleh, “Concurrent Structure-Independent Fault Detection Schemes for the Advanced Encryption Standard,” IEEE Trans. Computers, vol. 59, no. 5, pp. 608-622, May 2010 (special issue on System Level Design of Reliable Architectures).
[10] M. Mozaffari Kermani and A. Reyhani-Masoleh, “A High-Performance Fault Diagnosis Approach for the AES SubBytes Utilizing Mixed Bases,” in Proc. IEEE Workshop Fault Diagnosis and Tolerance in Cryptography (FDTC), pp. 80-87, Nara, Japan, Sep. 2011.
[11] M. Mozaffari Kermani and A. Reyhani-Masoleh, “A Lightweight High-Performance Fault Detection Scheme for the Advanced Encryption Standard Using Composite Fields,” IEEE Trans. Very Large Scale Integrated (VLSI) Systems, vol. 19, no. 1, pp. 85-91, Jan. 2011.
[12] M. Mozaffari Kermani, V. Singh, and R. Azarderakhsh, “Reliable low-latency Viterbi algorithm architectures benchmarked on ASIC and FPGA,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 1, pp. 208-216, 2017.
[13] M. Mozaffari Kermani, R. Azarderakhsh, and A. Aghaie, “Fault detection architectures for post-quantum cryptographic stateless hash-based secure signatures benchmarked on ASIC,” ACM Trans. Embedded Computing Syst. (special issue on Embedded Device Forensics and Security: State of the Art Advances), vol. 16, no. 2, pp. 59:1-19, Dec. 2016.
[14] S. Patranabis, A. Chakraborty, D. Mukhopadhyay, and P. P. Chakrabarti, “Fault space transformation: A generic approach to counter differential fault analysis and differential fault intensity analysis on AES-like block ciphers,” IEEE Trans. Information Forensics and Security, vol. 12, no. 5, May 2017.
[15] M. Mozaffari Kermani, R. Azarderakhsh, C. Lee, and S. Bayat-Sarmadi, “Reliable concurrent error detection architectures for extended Euclidean-based division over $GF(2^m)$,” IEEE Trans. Very Large Scale Integrated (VLSI) Systems, vol. 22, no. 5, pp. 995-1003, May 2014.
[16] M. Mozaffari Kermani, R. Azarderakhsh, and A. Aghaie, “Reliable and error detection architectures of Pomaranch for false-alarm-sensitive cryptographic applications,” IEEE Trans. Very Large Scale Integrated (VLSI) Systems, vol. 23, no. 12, pp. 2804-2812, Dec. 2015.
[17] M. Mozaffari Kermani, K. Tian, R. Azarderakhsh, and S. Bayat-Sarmadi, “Fault-resilient lightweight cryptographic block ciphers for secure embedded systems,” IEEE Embedded Systems, vol. 6, no. 4, pp. 89-92, Dec. 2014.
[18] S. Bayat-Sarmadi, M. Mozaffari Kermani, and A. Reyhani-Masoleh, “Efficient and concurrent reliable realization of the secure cryptographic SHA-3 algorithm,” IEEE Trans. Computers-Aided Design Integr. Circuits Syst., vol. 33, no. 7, pp. 1105-1109, Jul. 2014.
[19] A. Hodjat and I. Verbauwhede, “Area-Throughput Trade-Offs for Fully Pipelined 30 to 70 Gbits/s AES Processors,” IEEE Trans. Computers, vol. 55, no. 4, pp. 366-372, April 2006.
[20] A. Satoh, S. Morioka, K. Takano, and S. Munetoh, “A compact Rijndael hardware architecture with S-Box optimization,” in Proc. ASIACRYPT, 2001, pp. 239-254.
[21] D. Canright, “A very compact S-Box for AES,” in Proc. CHES, 2005, pp. 441-455.
[22] Y. Nogami, K. Nekado, T. Toyota, N. Hongo, and Y. Morikawa, “Mixed bases for efficient inversion in $F_{((2^2)^2)^2}$ and conversion matrices of SubBytes of AES,” in Proc. CHES, 2010, pp. 234-247.