Volume 56, Number 1-2, 2015 35

Online Testing Schemes for the IDEA NXT Crypto-algorithm

Andreea Bozeşan, Flavius Opriţoiu
Computer Science Department, University Politehnica Timisoara, Romania

Abstract—This paper presents a hardware architecture for online self-test in the context of the IDEA NXT crypto-algorithm. After a careful analysis of the many techniques presented in the literature for increasing Built-In Self-Test (BIST) capabilities, we decided to focus on solutions based on parity prediction. We designed and implemented a parity-based error detection architecture for the Datapath and one for the Key Schedule mechanism of IDEA NXT. The proposed solution does not interfere with the algorithm's structure, as the functional and testing channels are kept separate. To the best of our knowledge, it is the first solution of this kind for the IDEA NXT crypto-algorithm. We evaluated the performance of the proposed test strategy with different redundancy levels and formulated recommendations for the concurrent detection strategy based on the experimental results. The error-detection rate of the architecture with regard to stuck-at faults was also measured.

Keywords—IDEA NXT, crypto-algorithm, LFSR, concurrent-testing, parity-based testing

1. INTRODUCTION

The cryptographic domain is continuously trying to find ways of strengthening the means for securing sensitive information. More and more attacks exposing the weaknesses of existing algorithms have been published in the past years. Most of the attacked algorithms operated with just simple key schedules and algebraic substitution boxes, so the structure of these crypto-algorithms constituted merely an aid for algebraic attacks. The new trend in cryptographic algorithms is IDEA NXT, a flexible and scalable family of symmetric encryption algorithms, which was theoretically proven to combine the speed of IDEA and the security of AES [2].

The IDEA NXT family is mainly made of two block ciphers (NXT64, NXT128) which essentially have the same algebraic structure but differ in text sizes, key lengths and number of rounds. The NXT crypto-algorithms can be generalized by implementing a general version with variable parameters.

2. MATHEMATICAL STRUCTURE OF IDEA NXT

The IDEA NXT algorithm takes an input text of 64 or 128 bits and a key of 128, 192 or 256 bits, depending on the chosen algorithm version. It is based on a Lai-Massey scheme combined with two orthomorphisms, and the round functions are Substitution-Permutation Networks based on the Feistel scheme [2]. The orthomorphism represents a single-round Feistel scheme which has the identity function as its round function. IDEA NXT consists of 'r-1' iterations of a round function called lmor64, where r is the number of rounds, followed by a slightly modified function called lmid64. The decryption process is very similar to the encryption one; the only difference is that lmio64 is used instead of lmor64 [3]. In this paper we will only refer to the 64-bit version of IDEA NXT.

The f32 function lies at the base of lmor64 and also constitutes the foundation of the entire encryption algorithm. It is composed of three parts: substitution, diffusion and round key addition, as can be observed from Fig. 1. The substitution part uses a substitution box (sbox), which is essentially a look-up table filled with predefined values. The diffusive part, mu32, is a linear multipermutation defined over the Galois field GF(2^8) [2].

The key is processed by a Key Scheduler module, which performs a four-layer encryption of its own before providing the obtained round key to the data encryption process itself. This algorithm is the very core of IDEA NXT and gives the algorithm its security strength [2].

The Key Scheduler's constituent parts are: padding, mixing, diversification and the non-linear part called NL64.
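The round structure described in this section can be illustrated with a small sketch. It is not the authors' implementation: `toy_f32` is a hypothetical placeholder standing in for the real f32, the orthomorphism is assumed to act on 16-bit halves as or(l || r) = r || (l XOR r), which is how the FOX specification [2] is usually read, and decryption is assumed to apply the round keys in reverse order. Under these assumptions the sketch shows that replacing lmor64 with lmio64 is enough to invert the cipher, whatever f32 computes.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def toy_f32(x: int, rk: int) -> int:
    """Hypothetical stand-in for f32 (NOT the real function): any 32-bit mapping works here."""
    return ((x * 0x9E3779B1) ^ (rk & MASK32) ^ (rk >> 32)) & MASK32


def ortho(x: int) -> int:
    """Assumed orthomorphism on a 32-bit word: or(l || r) = r || (l XOR r)."""
    l, r = x >> 16, x & MASK16
    return (r << 16) | (l ^ r)


def inverse_ortho(x: int) -> int:
    """Assumed inverse orthomorphism: io(l || r) = (l XOR r) || l."""
    l, r = x >> 16, x & MASK16
    return ((l ^ r) << 16) | l


def lmid64(xl: int, xr: int, rk: int):
    """Lai-Massey round without the orthomorphism."""
    t = toy_f32(xl ^ xr, rk)
    return xl ^ t, xr ^ t


def lmor64(xl: int, xr: int, rk: int):
    """Encryption round: Lai-Massey step followed by the orthomorphism on the left half."""
    yl, yr = lmid64(xl, xr, rk)
    return ortho(yl), yr


def lmio64(xl: int, xr: int, rk: int):
    """Decryption round: same step, followed by the inverse orthomorphism."""
    yl, yr = lmid64(xl, xr, rk)
    return inverse_ortho(yl), yr


def encrypt(xl: int, xr: int, round_keys: list):
    """r-1 lmor64 rounds followed by one lmid64 round."""
    for rk in round_keys[:-1]:
        xl, xr = lmor64(xl, xr, rk)
    return lmid64(xl, xr, round_keys[-1])


def decrypt(yl: int, yr: int, round_keys: list):
    """Same structure with lmio64 and the round keys taken in reverse order (assumption)."""
    reversed_keys = round_keys[::-1]
    for rk in reversed_keys[:-1]:
        yl, yr = lmio64(yl, yr, rk)
    return lmid64(yl, yr, reversed_keys[-1])


# Hypothetical 64-bit round keys and plaintext halves, chosen only for illustration.
keys = [0x0123456789ABCDEF, 0x0F1E2D3C4B5A6978, 0xDEADBEEFCAFEF00D, 0x1122334455667788]
plain = (0x01234567, 0x89ABCDEF)
cipher = encrypt(*plain, keys)
recovered = decrypt(*cipher, keys)
print(recovered == plain)
```

The round trip succeeds because lmid64 is its own inverse for a fixed round key (the XOR of the two halves, and therefore the value fed to f32, is unchanged by the step), so only the orthomorphism has to be undone explicitly.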

© 2015 – Mediamira Science Publisher. All rights reserved 36 ACTA ELECTROTEHNICA

The diffusive part (mu32/mu64) is a linear multipermutation defined over the Galois field GF(2^8) [2].

Fig. 1 The f64 function, main part of lmor64

The non-linear step is itself made of multiple parts: substitution (which uses 4 parallel sigma4 processes, each composed of 4 substitution boxes operating in parallel), diffusion (composed of four mu4 functions, each a linear (4,4) multipermutation defined on GF(2^8)), and mixing. The result is obtained from various combinations of XOR operations between the 4 parts obtained by splitting the input vector. The diversification part takes the key computed in the mixing part (ek bits long), the total number of rounds and the current round number, and modifies the key with the help of a 24-bit LFSR. The main component of diversification, the Linear Feedback Shift Register (LFSR), is used to generate pseudo-random numbers.

3. CONCURRENT ERROR DETECTION ARCHITECTURES

Once an encryption algorithm is used in the field, whether integrated in a chip or used as such, it has to be checked periodically for correct functioning. Many things can cause it to malfunction, from system faults to attacks on the algorithm by intruders. As has already been shown in various papers [3], [23], [24], there are many different types of attacks that can compromise the encryption process even in the case of a hardware implementation of a cryptographic algorithm. Attackers can inject faults into crypto-chips and cryptographic cores, which can lead to permanent faults by modifying the underlying semiconductor layer. In particular, we can mention linear, differential and fault attacks.

In order to check for errors and faults we can construct testing architectures which closely follow the algorithm's structure and verify correct functioning at every stage of the algorithm. There are two types of error detection principles: offline and concurrent testing. In the first case, testing is performed when the system is not running, whereas in the second case testing is done while the system is in operating mode. Concurrent checking schemes are designed to detect certain types of errors (e.g. single errors, double errors, unidirectional errors) or all, or a high percentage, of single stuck-at-0/1 faults, but in general a single fault can cause different types of errors to occur [17]. Therefore, the architecture must be designed to be as general as possible and to detect as many types of faults as possible.

So far, no verification mechanisms have been implemented for the IDEA NXT crypto-algorithm, neither offline nor concurrent, so we decided to increase the reliability of crypto-systems in which this powerful algorithm is used by creating a pair of concurrent, self-testing architectures, one for the Datapath and the other for the Key Scheduler.

The fault detection principle we used is the non-intrusive concurrent error detection mechanism from [16], based on the output's parity prediction. A parity check code is a code in which the parity of multiple circuit outputs, forming a parity group, is checked against a predicted parity bit for that group. The objective is to classify the outputs into a minimal number of groups, such that any single fault in the circuit will affect the parity of at most one output bit in every parity group. To ensure that the fault effect will be detected, no sharing is allowed between the cones of logic of output bits belonging to the same parity group. The two extreme cases for the number of parity groups are single-bit parity and duplication. In single-bit parity, all the output bits of the circuit form a single group and, consequently, no sharing between their cones of logic is allowed [18]. Duplication leaves the original circuit intact, yet incurs the cost of an additional copy of the circuit to predict the parity of each group, while the single-bit parity case is relatively inexpensive, as no redundancy is introduced. We chose the second, parity-based case for our parity-checking architecture.

Output parity prediction means adding a number of parity bits and checking at each stage whether the parity remains the same or not. Thus the architecture will correctly detect any odd number of errors affecting the result of the protected module, while remaining completely independent from the circuit under test (CUT). This type of error detection fits well with the notion of integrated circuits that are designed to be totally self-checking with respect to a set of faults, as we can verify each stage and component of a cryptographic algorithm in the proposed manner.

As mentioned before, we constructed two concurrent architectures, one for the Datapath and one for the Key Scheduler of IDEA NXT, so that the whole algorithm is checked for possible errors, not just part of it. The two error detection mechanisms will be described in detail in the following section.
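The detection property stated in this section (any odd number of bit errors in a parity group is caught, while an even number goes unnoticed) can be illustrated with a minimal sketch. The names `parity`, `flip_bits` and `parity_check_passes`, as well as the 32-bit test value, are illustrative assumptions and do not come from the paper's Verilog design.

```python
def parity(value: int) -> int:
    """Return 1 if the number of set bits is odd, else 0."""
    return bin(value).count("1") & 1


def flip_bits(word: int, positions: list) -> int:
    """Model bit errors by inverting the bits at the given positions."""
    for pos in positions:
        word ^= 1 << pos
    return word


def parity_check_passes(word: int, predicted_parity: int) -> bool:
    """Verifier: compare the observed parity with the predicted parity bit."""
    return parity(word) == predicted_parity


# Hypothetical 32-bit output of a protected module and its predicted parity bit.
golden = 0x3A5C91E7
predicted = parity(golden)

one_error = flip_bits(golden, [5])
two_errors = flip_bits(golden, [5, 17])
three_errors = flip_bits(golden, [5, 17, 30])

for label, word in [("no error", golden), ("1 error", one_error),
                    ("2 errors", two_errors), ("3 errors", three_errors)]:
    print(label, "-> check passes:", parity_check_passes(word, predicted))
```

The two-error case passes the check even though the word is corrupted, which is the usual limitation of a single parity bit and one motivation for evaluating higher redundancy levels.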

4. PROPOSED CONCURRENT TESTING ARCHITECTURES

4.1. Error detection mechanism for IDEA NXT's Datapath

The concurrent architecture for the Datapath is shown in Fig. 2. Both the IDEA NXT Datapath and the Key Scheduler unit are designed to incorporate parity prediction modules, which are intended to operate as decoupled as possible. However, the parity channel could not be made completely independent, as can be seen from the graphical representation given in Fig. 2.

In this paper we chose to operate on the 64-bit version of IDEA NXT, which has 64 bits of data and a key length of 128 bits. The concurrent testing scheme was constructed by adding two parity bits for the data, denoted xlp and xrp, each associated with 32 bits of data (denoted xl and xr in the scheme), and two bits for the 64-bit round keys used in the encryption process, denoted rk0p and rk1p.

As stated earlier, the IDEA NXT crypto-algorithm is in fact the lmor64 function iterated "r-1" times, followed by a slightly different function, lmid64 (the same as lmor64 but without the orthomorphism at the end). At many stages, the lmor64 function performs just basic XOR operations, which pose no problems in terms of parity prediction, as they leave the parity bits unchanged.

The problem appears in the case of more complex operations, like sigma4, which uses substitution boxes, and mu4, which uses complex operations in GF(2^8). Parity prediction for the sbox instances is a complex problem due to the operation's non-linearity. The same is true for the orthomorphism (or) used in the lmor64 function of the encryption, as well as for the inverse orthomorphism (io) used in the lmio64 function of the decryption process.

For these complex operations we created a series of parity prediction modules which compute the value of the parity bits after the operation's execution. A custom solution, tailored to the operation's specific implementation, needed to be built for each of the prediction modules and will be presented in the following paragraphs.

As can be seen in Fig. 2, we created a testing scheme which runs in parallel with the algorithm's structure. At the first point where the parity needs to be calculated, there is just an XOR between the left and right parity bits, and the output is further XOR-ed with the first round key parity bit (rk0p). The resulting bit is simply verified for parity (whether it is odd or not), as can be observed in Fig. 2.

The parity prediction of the sigma4 module introduces irregularity into the architecture because it is impossible to compute the transformation's output parity taking into account only the parity of its input. Sigma4 is composed of four s-boxes, each taking 8 bits of data as input, so calculating the parity of the operation reduces to calculating the parity of one substitution box and then summing up the parity outputs of all the s-boxes. A simple solution is to sum all the s-boxes' input parity bits, obtaining a single parity bit which is preserved by the transformation. This result must be verified for correct execution against the Verifier used for the first XOR operations described above. Moreover, after sigma4's execution, the parity bits must be recomputed from the current state in order for them to be reinserted into the parity channel.

Fig. 2 Parity-based testing architecture for IDEA NXT's Datapath

The prediction for mu4 also poses problems, as it is an irregular operation. The operation takes four 8-bit inputs, and its parity bit is obtained by XOR-ing the bits of its four 8-bit outputs. The scheme of the parity predictor we constructed is shown in Fig. 4.

The last parity prediction module was implemented for the orthomorphism. The irregularity of this operation is visible in its defining equation:

$$y_{(64)} = \mathrm{lmor64}\left(x_{l(32)} \,\|\, x_{r(32)}\right) = \mathrm{or}\left(x_{l(32)} \oplus \mathrm{f32}\left(x_{l(32)} \oplus x_{r(32)},\ rk_{(64)}\right)\right) \,\|\, \left(x_{r(32)} \oplus \mathrm{f32}\left(x_{l(32)} \oplus x_{r(32)},\ rk_{(64)}\right)\right) \qquad (1)$$

If there were no XOR operation, or if an XOR were applied symmetrically to the two input sides, the parity of the outputs would be straightforward. Because the XOR operation can change one part of the output result, a module for parity calculation was needed. If we denote by al and ar the inputs of the orthomorphism and by bl and br its outputs, as in Fig. 5, then the parity bit of this operation, denoted ap, can be calculated like this:

$$a_p = \mathrm{Parity}\left(\mathrm{or}(a_l, a_r)\right) = a_r \oplus (a_r \oplus a_l) = a_r \oplus a_l \oplus a_r = a_l \qquad (2)$$
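The two facts used in this subsection, that XOR operations keep the parity channel predictable and that the orthomorphism's output parity reduces to the parity of al as in equation (2), can be checked numerically. The sketch below is illustrative only: it assumes the orthomorphism maps (al, ar) to (ar, al XOR ar) on 16-bit halves and the inverse orthomorphism maps (al, ar) to (al XOR ar, al), which is consistent with equation (2) but is not spelled out in the text; all function names are hypothetical.

```python
import random


def parity(value: int) -> int:
    """Return 1 if the number of set bits is odd, else 0."""
    return bin(value).count("1") & 1


def ortho(al: int, ar: int):
    """Assumed orthomorphism: outputs (bl, br) = (ar, al XOR ar)."""
    return ar, al ^ ar


def inverse_ortho(al: int, ar: int):
    """Assumed inverse orthomorphism: outputs (al XOR ar, al)."""
    return al ^ ar, al


def predict_ortho_parity(al_parity: int, ar_parity: int) -> int:
    """Predictor for the orthomorphism: the output parity equals the parity of al."""
    return al_parity


def predict_inverse_ortho_parity(al_parity: int, ar_parity: int) -> int:
    """Predictor for the inverse orthomorphism: the output parity equals the parity of ar."""
    return ar_parity


def predictors_agree(trials: int = 1000, seed: int = 2015) -> bool:
    """Compare predicted and recomputed parities on random 16-bit halves."""
    rng = random.Random(seed)
    for _ in range(trials):
        al, ar = rng.getrandbits(16), rng.getrandbits(16)
        pl, pr = parity(al), parity(ar)
        # XOR is parity-linear: the parity of a XOR b is the XOR of the parities.
        if parity(al ^ ar) != pl ^ pr:
            return False
        bl, br = ortho(al, ar)
        if parity(bl) ^ parity(br) != predict_ortho_parity(pl, pr):
            return False
        cl, cr = inverse_ortho(al, ar)
        if parity(cl) ^ parity(cr) != predict_inverse_ortho_parity(pl, pr):
            return False
    return True


print(predictors_agree())
```

Note that both predictors need the parity of an individual half. With a single parity bit covering al and ar together, that information is not available from the parity channel alone, which matches the paper's remark that the orthomorphism needs a dedicated prediction module in that configuration.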

4.2. Error-detection mechanism for the Key Scheduler

IDEA NXT's Key Scheduler is mostly composed of the same operations as the Datapath, which is why the general testing architecture is very similar to the one we constructed for the Datapath, as can be seen by comparing Fig. 2 and Fig. 3. We again define two parity bits for the round key, rk0p and rk1p, but only one parity bit for the data, denoted xp, where xp = Σ xi (xl and xr being, as above, the input data for the Key Scheduler).

Fig. 3 Parity-based testing architecture for IDEA NXT's Key Scheduler

The first parity bit of the error-detection scheme is calculated by XOR-ing the two halves of the input data bytes and summing up the results for each round:

$$x_p = \Sigma a = \Sigma (x_l \oplus x_r) \qquad (3)$$

The output is further XOR-ed with rk0, so the new intermediate parity bit will be the sum of these results:

$$\Sigma b = \Sigma a \oplus \Sigma rk_0 = x_p \oplus rk_{0p} \qquad (4)$$

The result is passed through a Verify module to see whether it is even or odd.

For the Padding layer, we make an XOR between groups of four figures from the pad constant; the result is also a constant, the figure '3', whose parity bit is 0 (constants have their parity bit set to zero).

For the sigma4 and mu4 operations we can use the s-box and mu4 Parity Predictors which we already developed for the Datapath error-detection scheme. After a parity bit is calculated for each of them, we must check for correctness with a Verify module to see whether the parity bit is even or odd. Three verifiers are used in the scheme.

For the inversion operation, which takes place at the end of lmid64, the parity bit does not change its value.

An element which appears in the Key Scheduler but not in the Datapath is the LFSR. Calculating the parity for the sequence of generated pseudo-random numbers is not a trivial task, so a Parity Predictor needed to be constructed for it as well. The first idea was to extend the LFSR with one bit, which would be the parity bit, but at the last LFSR only 8 bits are used instead of 24, so it would not fit in the general scheme. Instead, we chose to use a 3-bit parity register and make an XOR between [...]

[...] the encryption process, the only difference in the error-detection scheme will be the parity predictor constructed for the inverse orthomorphism (which is used instead of the regular orthomorphism in the IDEA NXT algorithm), whose output is ar instead of al, as observed from equation (5):

$$a_p = \mathrm{Parity}\left(\mathrm{io}(a_l, a_r)\right) = a_l \oplus (a_r \oplus a_l) = a_l \oplus a_l \oplus a_r = a_r \qquad (5)$$

Fig. 4 Parity check scheme for mu4

5. EXPERIMENTAL RESULTS

The following paragraphs analyze the experimental results obtained by synthesizing the on-line test architecture we designed for the IDEA NXT64 algorithm. We implemented all architectures in Verilog (the parity-testing ones and the original IDEA NXT encryption-decryption algorithm) and synthesized them for the Xilinx Virtex 5 FPGA.

The metrics we took into consideration were the area occupied in hardware, the critical path and the throughput. It must also be mentioned that the architectures were compared to one another and to the base crypto-algorithm as, to the best of our knowledge, there is no other similar architecture in the literature to this day.

Apart from the parity-based error detection approach that associates 1 parity bit with each group of 4 bytes of the encryption process, we also evaluated the performance of the constructed architecture when using more redundancy bits.
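The redundancy levels evaluated in this section (one parity bit per group of 4 bytes, per 2 bytes, or per byte) can be sketched as follows. This is an illustrative model rather than the synthesized design: the function names and the test word are assumptions, and the XOR tree is assumed to be a balanced binary tree of 2-input gates, so that its height is the ceiling of log2 of the group size.

```python
def parity(value: int) -> int:
    """Return 1 if the number of set bits is odd, else 0."""
    return bin(value).count("1") & 1


def group_parities(word: int, groups: int, width: int = 32) -> list:
    """Split a word into equal groups (most significant first) and return one parity bit per group."""
    size = width // groups
    mask = (1 << size) - 1
    return [parity((word >> (width - size * (i + 1))) & mask) for i in range(groups)]


def xor_tree_height(inputs: int) -> int:
    """Height of a balanced tree of 2-input XOR gates reducing `inputs` bits to one."""
    return (inputs - 1).bit_length()


def detected(golden: int, faulty: int, groups: int) -> bool:
    """An error is detected when at least one group parity differs from the golden one."""
    return group_parities(golden, groups) != group_parities(faulty, groups)


golden = 0x12345678
faulty = golden ^ (1 << 0) ^ (1 << 8)  # two bit errors in two different bytes

for groups in (1, 2, 4):
    bits_per_group = 32 // groups
    print(groups, "parity bit(s):",
          "tree height", xor_tree_height(bits_per_group),
          "| double error detected:", detected(golden, faulty, groups))
```

With one or two parity bits the two flipped bits fall in the same group and cancel out, whereas one parity bit per byte exposes them. The same partitioning also shortens the verifier's XOR tree, which is one way to read the performance trend reported for the higher redundancy levels.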

TABLE I IDEA NXT64 parity-based architectures synthesis results

Fig. 5 IDEA NXT parity-based architectures synthesis results in terms of area

Fig. 6 IDEA NXT parity-based architectures synthesis results in terms of maximum frequency

More precisely, we investigated the effect of using 1 parity bit for each pair of 2 bytes, as well as 1 parity bit associated with each byte of the Datapath and Key Scheduler. Besides the increased error detection capability associated with higher redundancy levels, because of the particular aspects of the IDEA NXT64 algorithm and of the concurrent error detection architecture, the higher the redundancy level of the architecture, the faster it performs, at the expense of a larger design, as the experimental results reveal.

Fig. 7 IDEA NXT parity-based architectures synthesis results in terms of throughput

Fig. 8 IDEA NXT parity-based architectures synthesis results in terms of throughput

The architecture employing 1 bit of parity for each group of 4 bytes has the largest critical path compared with the solution employing 4 bits of parity for the same data size, as evident from Table I.

The degradation of the device's performance as the redundancy level is reduced is partly due to the complexity of the verifier modules and, for the case of 1 parity bit, to the parity prediction for the ortho module, depicted in Fig. 5-8. More precisely, for all implementations involving more than 1 parity bit associated with the 32 bits processed by ortho, as evident from equation (1), the parity of the module's output can be directly predicted based solely on the input's parity. Consequently, for the implementations using 2 and 4 bits of parity for the 32 bits processed by the ortho module, the final verifier unit checking the correctness of the parity bits is not required.

Fig. 9 Detection rate for stuck-at-0 defects of the same type injected into the parity-based on-line error-detection architecture built for IDEA NXT

Apart from this, the higher the number of parity bits, the smaller the height of the XOR tree used inside the verifier module and thus the faster the parity verification. More precisely, when using a single bit of parity for a group of 32 bits, the verifier unit contains an XOR tree of height 5, whereas when a parity bit is associated with a byte, the XOR tree has a height of only 3 logic levels. The latency associated with the verifier modules also explains the faster performance of the 4 parity bit design compared to the 2 parity bit solution.

As can be seen from Table I, the effect of increasing the redundancy level on the area of the design is consistent. With respect to the combined metric Throughput/Area, the 2 parity bit and 4 parity bit architectures have similar scores, both higher than the score of the single parity bit design. This is the reason why IDEA NXT64 is better verified concurrently for errors by employing either 2 or 4 parity bits associated with each group of 4 bytes processed by the algorithm.

It is well known that once a test architecture is designed, it must be validated. One technique which validates the dependability of an error-detection scheme is fault injection. For the time being, we considered it sufficient to verify the detection rate of the parity-based architecture with regard to the stuck-at fault model. The technique we used was to reduce the Verilog code to only primitive components and to add saboteurs between different components in order to alter the values of one or more input/output signals.

Fig. 9 shows the error-detection rate for the BIST error-detection schemes we have built for the IDEA crypto-algorithm in the cases where 2, 10, 100, 250 and 500 faults have been injected in the architectures. It can be observed that in all cases the fault detection rate is above 99% when more than 200 thousand simulations of the algorithm have been performed.

For the parity-based concurrent architectures we ran simulations for the cases where 1, 10, 100, 250, 500, 800 and 1000 stuck-at-0 defects were inserted into the design and verified the error-detection rate after all algorithm rounds were run. The results are shown in Fig. 9. The error-detection rate is best when 1 defect is injected into the concurrent architecture where 1 bit of parity is associated with 4 bytes of the data processed by all units of the algorithm, reaching 100%. The case where 1000 defects of the same type are injected into the algorithm has the lowest detection rate in all 3 types of parity-based testing architectures, but it is still over 90% in all cases.

6. CONCLUSIONS

In this paper we have addressed the problem of including error detection capabilities in a crypto-chip hardware implementation of the new family of crypto-algorithms, IDEA NXT, as little work has been done so far in the field of error-detection mechanisms for IDEA NXT.

We designed a parity-based testing architecture that is completely independent from the algorithm itself and works for all versions of the algorithm, independent of the key and text length. It mainly consists of defining a series of parity bits for the Datapath and the Key Scheduler and checking their value at every step of the algorithm, by verifying the outputs of the Parity Predictor modules (built for complex operations where the parity bit's value is not straightforward) against a simpler checker scheme: if the parity bit is odd, then an error has been introduced at that stage.

After constructing the design, we proceeded to verify and synthesize the error detection architecture in three different contexts, and demonstrated the efficiency of the proposed solution.

ACKNOWLEDGMENT

This work was partially supported by the strategic grant POSDRU/159/1.5/S/137070(2014) of the Ministry of National Education Protection, Romania, co-financed by the European Social Fund, Investing in People, within the Sectoral Operational Programme Human Resources Development 2007-2013.

REFERENCES

1. P. Junod, S. Vaudenay, "Perfect Diffusion Primitives for Block Ciphers," Selected Areas in Cryptography, Lecture Notes in Computer Science, H. Handschuh and M. A. Hasan, eds., pp. 84-99, Springer Berlin Heidelberg, 2005.
2. P. Junod, S. Vaudenay, "FOX Specifications version 1.2," 2005, pp. 5-40.
3–4. K. Chong Hee and J. J. Quisquater, "Faults, Injection Methods, and Fault Attacks," Design & Test of Computers, IEEE, vol. 24, no. 6, pp. 544-545, 2007.
5. A. Avizienis, J.-C. Laprie, B. Randell, "Basic Concepts and Taxonomy of Dependable and Secure Computing," IEEE Transactions on Dependable and Secure Computing, 2004.
6. S. Burton Kaliski Jr., Matthew J. B. Robshaw, "Linear Cryptanalysis Using Multiple Approximations," 1994.
7. C. Cachin, J. Camenisch, Y. Deswarte, J. Dobson, D. Horne, K. Kursawe, J.C. Laprie, J.C. Lebraud, D. Long, T. McCucheon, J. Muller, F. Petzold, B. Pfitzmann, D.

8. W. Meier, "On the Security of the IDEA block cipher," Advances in Cryptology, 1996.
9. J. S. Coron, L. Goubin, "On Boolean and Arithmetic Masking Against Differential Power Analysis," in Proceedings of Workshop on Cryptographic Hardware and Embedded Systems - CHES 2000, pp. 231-237, Springer-Verlag, 2000.
10. N. Courtois, J. Pieprzyk, "Cryptanalysis of block ciphers with overdefined systems of equations," in Advances in Cryptology - ASIACRYPT'02, volume 2501 of Lecture Notes in Computer Science, pp. 267-287, Springer-Verlag, 2002.
11. J. Daemen, R. Govaerts, J. Vandewalle, "Weak Keys of IDEA," Proc. of 3rd
12. J. Daemen, V. Rijmen, "The Rijndael Block Cipher - AES Proposal," 1997.
13. P. Kitsos, N. Sklavos, M.D. Galanis, O. Koufopavlou, "64-bit Block ciphers: hardware implementations and comparison analysis," VLSI Design Laboratory, Electrical and Computer Engineering Department, University of Patras, Greece, 2004.
14. Bruce Schneier, "Applied Cryptography. Protocols, Algorithms and Source Code in C," J. Wiley & Sons, New York, 1996.
15. Andreea Bozesan, Flavius Opritoiu, Mircea Vladutiu, "Hardware Implementation of the IDEA NXT Crypto-algorithm," SIITME 2013.
16. Sachin Dhingra, "Comparison of LFSR and CA for BIST."
17–18. T.R.N. Rao, E. Fujiwara, "Error-Control Coding for Computer Systems," Prentice-Hall International, 1989.
19. Steffen Tarnick, "Bounding Error Masking in Linear Output Space Compression Schemes."
20. Sobeeh Almukhaizim and Yiorgos Makris, "Fault Tolerant Design of Random Logic based on a Parity Check Code," Electrical Engineering Department, Yale University.
21. http://vcag.ecen.okstate.edu/projects/scells/
22. Flavius Opritoiu, Mircea Vladutiu, Mihai Udrescu, Lucian Prodan, "Round-Level Concurrent Error Detection Applied to Advanced Encryption Standard," Timisoara, 2013.
23–24. Security Requirements for Cryptographic Modules. Federal Information Processing Standards Publication 140-2, December 2002.
25. A. Moradi, O. Mischke, and C. Paar, "One Attack to Rule Them All: Collision versus 42 AES ASIC Cores," Computers, IEEE Transactions on, vol. 62, no. 9, pp. 1786-1798, 2013.
26–27. M.-K. Mehran, "Concurrent Structure-Independent Fault Detection Schemes for the Advanced Encryption Standard,"

Andreea Bozeşan
Researcher at the Computer Science Department, University Politehnica Timisoara, Romania, [email protected]

Flavius Opriţoiu
Lecturer at the Computer Science Department, University Politehnica Timisoara, Romania, [email protected]