CLEFIA Implementation with Full Key Expansion
2015 Euromicro Conference on Digital System Design
978-1-4673-8035-5/15 $31.00 © 2015 IEEE, DOI 10.1109/DSD.2015.55

João Carlos Bittencourt†‡, João Carlos Resende‡, Wagner Luiz de Oliveira† and Ricardo Chaves‡
†Polytechnic Institute, Federal University of Bahia – Bahia, Brazil
‡INESC-ID, IST, Universidade de Lisboa – Lisbon, Portugal

Abstract: In this paper, a compact and high-throughput hardware structure is proposed, allowing for the computation of the novel 128-bit CLEFIA encryption algorithm and its associated full key expansion. In the existing state of the art only the 128-bit key schedule is supported, given the needed modification to the CLEFIA Feistel network. This work shows that, with a small area cost and with no performance impact, full key expansion can be supported. This is achieved by using addressable shift registers, available in modern FPGAs, and adaptable scheduling, allowing the 4- and 8-branch CLEFIA Feistel networks to be computed within the same structure. The obtained experimental results suggest that throughputs above 1 Gbps can be achieved with a low area cost, while achieving efficiency metrics above those of the restricted state of the art.

Keywords: CLEFIA, Encryption, Cipher, Full key expansion, FPGA

I. INTRODUCTION

The market of embedded systems has experienced substantial growth in the last decades. Currently, the use of mobile and embedded systems already exceeds the use of personal computer systems. Likewise, the need for security and privacy services has also increased. Towards this, efficient and compact implementations of cryptographic primitives are needed. One such primitive is the CLEFIA symmetrical block cipher, proposed and developed by SONY Corporation [1]. This algorithm supports 128, 192, and 256-bit keys and provides improved cryptographic security through the use of Diffusion Switch Mechanisms and whitening keys, among others, in order to ensure immunity against differential and linear attacks [1].

Recent works on CLEFIA have highlighted its performance, particularly in hardware implementations for both ASIC and FPGA technologies. Many of these approaches strive for compact structures while maintaining high performance, leading to the optimization of the computational resources and the exploitation of possible parallelism between operations. However, given the need for an 8-branch Feistel network when computing the key expansion for 192 and 256-bit keys, most existing structures that provide key expansion only do so for 128-bit keys, using a 4-branch Feistel network [2], [3].

The main goal of the work herein presented is to show that a CLEFIA ciphering structure, capable of supporting the computation of both 4 and 8 branches of CLEFIA Feistel networks, can be designed within the same hardware architecture at very low added cost. To validate this, a fully functional compact hardware structure is proposed, supporting both the encryption computation of the CLEFIA algorithm and the respective key expansion for all key sizes.

II. CLEFIA 128-BIT BLOCKCIPHER

The CLEFIA cipher is a 128-bit symmetrical block ciphering algorithm supporting cipher key sizes of 128, 192, and 256 bits. This algorithm is based on the well known and commonly used Feistel network structure. As in most block ciphers, the input data is processed over several rounds, adding confusion and diffusion with the input key. In this particular algorithm the data and key are processed over 18, 22, or 26 rounds, depending on the key size. The round computation is exactly the same for each iteration.

A. Data Processing

The encryption process takes a 128-bit input data block $P = P_0 | P_1 | P_2 | P_3$, four 32-bit whitening keys $WK = WK_0 | WK_1 | WK_2 | WK_3$, and several 32-bit round keys $RK_i$ as data inputs. The resulting outputted ciphertext is a 128-bit cryptogram.

The first step of the encryption process is to XOR the second and fourth words of the plaintext ($P_1$ and $P_3$) with the first and second 32 bits of the original key ($WK_0$ and $WK_1$), performing the first key whitening procedure. After this operation the rounds are executed. Each round is computed by a 4-branch Feistel structure, defined by $GFN_{4,n}$, where $n$ is the number of rounds to be computed [1].

The round computation contains two parallel non-linear $F$ functions per round, where a copy of the first and third words, and two round keys, are their respective inputs. In the final round the second and fourth final words are XORed with the last two whitening keys.

Besides the round key addition, the $F_0$ and $F_1$ functions employ two different types of 8-bit S-Boxes ($S_0$ and $S_1$) and two distinct diffusion matrices ($M_0$ and $M_1$) [1].

B. Key Scheduling

Since each round uses two 32-bit round keys, a total of 36, 44, or 52 round keys (depending on the number of rounds) are needed, plus 4 additional whitening keys [1]. These round keys are obtained using the specified key schedule algorithm [1].

The whitening key ($WK$) generation is accomplished according to the key size. For a 128-bit input key, the four 32-bit whitening keys are obtained directly from the input key, by:

$$WK_0 | WK_1 | WK_2 | WK_3 \leftarrow K. \tag{1}$$

For the 192 or 256-bit input keys, the value is divided into two 128-bit blocks, $K_L$ and $K_R$, as shown by:

$$K_L \,\|\, K_R \leftarrow K_0 | K_1 | K_2 | K_3 \,\|\, K_4 | K_5 | \overline{K_0} | \overline{K_1} \;:\; K^{192} \tag{2}$$

$$K_L \,\|\, K_R \leftarrow K_0 | K_1 | K_2 | K_3 \,\|\, K_4 | K_5 | K_6 | K_7 \;:\; K^{256} \tag{3}$$

where the overline denotes the bitwise complement [1]. The corresponding whitening key is then computed by:

$$WK = K_L \oplus K_R. \tag{4}$$

The key expansion of a 128-bit key uses the same 4-branch $GFN$ network used for the CLEFIA main encryption process. The difference in the 128-bit key expansion is that the input data of the $GFN$ structure is now the input key itself, and the round keys are replaced with predefined constants [1].

When considering the key schedule for the 192 and 256-bit keys, the $GFN$ network becomes an 8-branch structure ($GFN_{8,n}$). In this case, the input value is a combination of $K = K_L \| K_R$ [1]. The 8-branch Feistel structure uses the same two non-linear $F$ functions, twice per round, and processes eight input words on each round.

Instead of a ciphered text, the output of the $GFN$ structure in the key expansion process is either a 128-bit block ($L$), for 128-bit input keys, or two 128-bit blocks ($L_L$ and $L_R$) for the remaining key sizes. After the $GFN$ computation is completed, the result ($L$, or $L_L$ and $L_R$) is expanded in an iterative way using a double swap ($\Sigma$) function, as:

$$L = \Sigma(L); \qquad L_L = \Sigma(L_L), \quad L_R = \Sigma(L_R) \tag{5}$$

The $\Sigma$ function swaps several bits of its 128-bit input and returns another equally sized output, specified by:

$$\Sigma(X) = X_{[7-63]} \,|\, X_{[121-127]} \,|\, X_{[0-6]} \,|\, X_{[64-120]} \tag{6}$$

With this, the 32-bit round keys are obtained by adding alternately the $L$, $K$, and $\Sigma(X)$ values with another predefined set of constants [1], [3].

III. PROPOSED ARCHITECTURE

The main goal of the work herein proposed is to design a compact structure capable of computing both the CLEFIA encryption and the key scheduling for all possible key sizes. As stated in Section II-B, the key expansion of 128-bit keys can be processed by the same $GFN_{4,n}$ structure used for encryption. On the other hand, for 192 and 256-bit keys a $GFN_{8,n}$ structure is required. Such a requirement is the main difficulty towards full key expansion support in compact CLEFIA hardware structures.

The first step towards supporting the expansion of all key sizes is the ability to compute a $GFN_{8,n}$ function. Towards this, the folded structure proposed in [4] is considered. This structure considers a T-Box based implementation within a 32-bit datapath, a design choice shown to result in compact and efficient structures, particularly when considering FPGA as the target technology [4], [5].

This section starts by describing the proposed structure for the $GFN_{4/8,n}$ computation, particularly considering the Xilinx VIRTEX FPGAs as the target technology. To conclude, the proposed key expansion module is also detailed.

A. GFN Blockcipher Structure

The CLEFIA encryption structure herein proposed is based on the work presented in [4]. However, the $GFN_{8,n}$ Feistel network imposes a larger datapath, due to the need to store and multiplex additional intermediate values. This storage and multiplexing can be performed by extra registers and wider multiplexers, resulting in higher area costs.

One of the main optimizations herein considered, in order to reduce the area cost of the proposed $GFN_{8,n}$ supporting structure, is related to the needed word swap. This particular chain of registers imposes a high cost. However, when considering the target technology, these individual registers can be replaced by addressable shift registers. This addressable shift register can be mapped into Look Up Tables (LUTs) operating in either SRL16 or SRL32 LUT mode. Each LUT is able to implement a 1-bit wide addressable shift register, capable of storing up to 16 or 32 bits. The full value storage and swapping operation can thus be implemented using 32 LUTs, as depicted in block 2 of Figure 1, with an additional register placed after the shift register in order to reduce the critical path.

Figure 1. Proposed CLEFIA $GFN_4$/$GFN_8$ structure.

The input of data into the structure can also be optimized, … bus and stored into a SRL16 LUT.
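
As a behavioural illustration of the addressable shift register described in Section III-A, the following Python sketch models a bank of 32 SRL16-style LUTs as a 16-deep, 32-bit-wide shift register with an address-selected read tap. The class name, the method names, and the read order used at the end are assumptions made for this illustration only; they are not taken from the proposed design or its scheduling.

```python
class AddressableShiftRegister:
    """Illustrative software model of an SRL-style addressable shift register.

    Hypothetical helper, not part of the proposed hardware: `width` parallel
    1-bit shift registers of `depth` stages are modelled as a list of words.
    """

    def __init__(self, depth=16, width=32):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.stages = [0] * depth  # stages[0] always holds the newest word

    def shift_in(self, word):
        # One enabled clock edge: every stored word moves one stage deeper
        # and the oldest word is discarded.
        self.stages = [word & self.mask] + self.stages[:-1]

    def read(self, addr):
        # The address selects which stage drives the output; no data moves.
        return self.stages[addr % self.depth]


srl = AddressableShiftRegister(depth=16, width=32)
words = [0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F]
for w in words:
    srl.shift_in(w)

# After four shifts the first word sits at address 3 and the last at address 0.
# Reading addresses 2, 1, 0, 3 returns the words rotated left by one position.
rotated = [srl.read(a) for a in (2, 1, 0, 3)]
```

In this model, choosing the read address is enough to reorder the stored words. The rotation by one word shown here matches the word permutation that the CLEFIA specification applies between rounds of the 4-branch network [1], which gives an intuition for why address-selectable taps can stand in for a chain of individual registers and a wide multiplexer.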