Design and Implementation of Unified Hardware for 128-Bit Block Ciphers ARIA and AES
Total Page:16
File Type:pdf, Size:1020Kb
Design and Implementation of Unified Hardware for 128-Bit Block Ciphers ARIA and AES Bonseok Koo, Gwonho Ryu, Taejoo Chang, and Sangjin Lee ABSTRACT⎯ARIA and the Advanced Encryption Standard AES, and we measure its performance using a 0.25 μm CMOS (AES) are next generation standard block cipher algorithms of standard cell library. Our processor is very suitable to implement Korea and the US, respectively. This letter presents an area- low-cost and area-constrained cryptographic modules that efficient unified hardware architecture of ARIA and AES. Both should support both algorithms. Note also that the Public Key algorithms have 128-bit substitution permutation network (SPN) structures, and their substitution and permutation layers Cryptography Standard (PKCS) #11 has recently begun to could be efficiently merged. Therefore, we propose a 128-bit support the mechanisms of ARIA as well as those of AES [3]. processor architecture with resource sharing, which is capable of processing ARIA and AES. This is the first architecture which supports both algorithms. Furthermore, it requires only II. Unified Substitution Layer 19,056 logic gates and encrypts data at 720 Mbps and 1,047 Mbps for ARIA and AES, respectively. The substitution layer of ARIA consists of two kinds of S-boxes, S1, S2, and their inverses. The S-box S1 is composed of Keywords⎯ARIA, AES, hardware architecture, resource multiplicative inversion over GF(28) and affine transformation, sharing. 247 while S2 is a combination of x and affine transformation. Both S-boxes are defined over the GF(28) with the irreducible I. Introduction polynomial m(x)=x8 +x4+x3+x+1, which is the same as that of 247 AES. Also, S1 is identical to the S-box of AES. The term x in The Advanced Encryption Standard (AES) is a block cipher -8 8 -1 which was adopted as an encryption standard by the US S2 is equivalent to x in GF(2 ) and can be represented as C· x government in November 2001 [1]. In Korea, ARIA was with an 8×8 binary matrix C. Consequently, the two S-boxes announced as a standard block cipher algorithm in December can be rearranged with multiplicative inverse as (1) and (2). 2004 [2]. The algorithm structures of ARIA and AES are Here, A, B, and C are 8×8 binary matrices, and a and b are 8×1 analogous. Both algorithms are 128-bit block ciphers with binary vectors: substitution permutation network (SPN) structures. Also, ARIA −1 Sx1 ()=⋅ Ax ⊕ a , uses two kinds of S-boxes, one of which is the same as the S-box Sx−1 ()=⋅⊕⋅ ( AxAa−111−− ) , (1) of AES. However, the permutation layer of ARIA uses a 16×16 1 involutional binary matrix, whereas AES uses two kinds of 4×4 247−− 8 1 Sx2 ()= Bx⋅⊕=⋅⊕=⋅⊕ b Bx b BCx b , matrices which involve multiplication over GF(28). Sx−1 ()=⋅⊕⋅ (( BCxBCb )−−−111 ( ) ) . (2) In this letter, using resource sharing techniques, we propose an 2 efficient unified hardware architecture incorporating ARIA and Therefore, we designed two kinds of shared S-boxes, SA, SU, which share a multiplicative inverter and merging affine Manuscript received Apr. 17, 2007; revised Sept. 3, 2007. -1 transformations, as depicted in Fig. 1. Here, SA executes S1 or S1 , Bonseok Koo (phone: + 82 42 870 2218, email: [email protected]), Gwonho Ryu (email: -1 -1 [email protected]), and Taejoo Chang (email: [email protected]) are with the Network & and SU selectively executes S1, S1 , S2 , or S2 . Also, δA and δU are 8 2 2 2 Communication Security Division, The Attached Institute of ETRI, Daejeon, Rep. of Korea. isomorphism functions from GF(2 ) to GF(((2 ) ) ), and AF1 and Sangjin Lee (email: [email protected]) is with the Center for Information Security Technologies, Korea University, Seoul, Rep. of Korea. AF2 denote the affine transformations of S1 and S2, respectively. 820 Bonseok Koo et al. ETRI Journal, Volume 29, Number 6, December 2007 and can be written as a 4×4 matrix multiplication [1]. Also, SA InvMixColumns can be arranged in a form which includes a -1 δA x-1 over δA • AF1 MixColumns matrix; thus, these two functions can be 2 2 2 -1 2:1 GF(((2 ) ) ) -1 2:1 AF1 • δA δA efficiently implemented with resource sharing [4]. On the other hand, the permutation function of ARIA called a diffusion matrix uses a 16×16 involutional binary matrix [2]. SU -1 δU δU • AF1 -1 As each row of the matrix has 7 elements of 1, it requires 768 -1 x over -1 AF1 • δU 2 2 2 δU • AF2 3:1 GF(((2 ) ) ) 3:1 (that is, 6×16×8) 2-input XOR gates to implement the diffusion -1 -1 AF2 • δU δU matrix in a general way. In order to efficiently design a merged permutation circuit, Fig. 1. Structures of the shared S-boxes SA and SU. we examine common 2-input XOR terms among these three matrices. For a 16-byte input (x0, x1,···, x14, x15), we find 1-byte sharing common terms αi and βj as Table 1. Performance of the shared S-boxes S and S . A U α = xx⊕ , i = 0,1, 2, ,15, ii4⎣⎦⎢⎥ii / 4 +(+ 1 mod 4) L Shared S-box (functions) Gate counts Delay (ns) (3) β jjj=⊕xx2−+− ( mod 2) 2( jj 1) ( mod 2) , j = 0,1, 2,L , 7. -1 SA (S1, S1 ) 273 6.17 -1 -1 Then, using common terms αi and βj, each 16-byte output of SU (S1, S1 , S2, S2 ) 373 6.63 MixColumns, (u0, u1,···, u14, u15) and of InvMixColumns, (v0, v1,···, v14, v15) can be rearranged as uxkk=⊕2,αα4⎢⎥kk / 4 +(++ 2 mod 4) ⊕ 4 ⎢⎥ kk / 4 +( 1 mod 4) ARIA ⎣⎦ ⎣⎦ (4) Unified substitution layer 8 8 8 8 8 8 8 8 vukk=⊕12ββ2⎢⎥kk / 4+++ ( mod 2) ⊕ 8 2 ⎢⎥ kk / 4 ( 1 mod 2) , S S S -1 S -1 S -1 S -1 S S ⎣⎦ ⎣⎦ 32 32 32 32 1 2 1 2 1 2 1 2 8 8 8 8 8 8 8 8 where k=0,···,15. UT UT UT UT AES Also, the 16-byte output (w0, w1,···, w14, w15) of the diffusion 8 8 8 8 8 8 8 8 32 32 32 32 -1 -1 -1 -1 S1 S1 S1 S1 S1 S1 S1 S1 matrix of ARIA can be arranged with the common terms as 8 8 8 8 8 8 8 8 follows: wx= ⊕⊕⊕β αα,, wx =⊕⊕⊕ βαα Fig. 2. Structure of the unified substitution layer. 0 3 2 8 13 1 2 3 8 15 wx2=⊕ 1β 2 ⊕αα 10 ⊕ 15,, wx 3 =⊕ 0 βαα 3 ⊕ 10 ⊕ 13 2 2 2 For a compact area, we use the composite field GF(((2 ) ) ) wx4501114541914=⊕⊕βα ⊕ α,, wx =⊕⊕⊕ βαα inverter and also combine the isomorphism functions and wx67=⊕⊕⊕β 0αα 9 1276,, wx =⊕⊕ βα 11112 ⊕ α -1 affine transformations as AF1 · δA. The performance of the wx810707= ⊕⊕⊕β αα,, wx 911605 = ⊕⊕⊕ βαα S-boxes is shown in Table 1, where a 0.25 μm CMOS standard wx10=⊕⊕⊕ 8β 7αα 2 5,, wx 11 =⊕⊕⊕ 9 βαα 6 2 7 cell library is used, and one-gate is equivalent to a 2-input w12 = xwx12516⊕⊕⊕β αα,, 1313436 = ⊕⊕⊕ βαα NAND gate. Although S supports four functions of S-boxes, it U wx=⊕⊕⊕β αα,. wx =⊕⊕⊕ βαα (5) uses only 373 gates and has a 6.63 ns delay. It is 37% larger 14 14 5 3 4 15 15 4 1 4 and shows a 7% delay as compared to SA. Because each wk contains three common terms, one βj and Using the shared S-boxes SA and SU, we designed a unified two different αi’s as in (5), we can save 384 (that is, 3×16×8) transformation (UT), which is equal to (SA, SU, SA, SU), and 2-input XOR gates. Table 2 shows the hardware size and delay finally propose the 128-bit unified substitution layer which time of permutation functions by sharing common terms. consists of four UTs, as depicted in Fig. 2. By switching the When ARIA diffusion is also implemented, there is about a selection signals of the multiplexers shown in Fig. 1, UT can 49% increase in size from AES functions and no degradation in -1 -1 -1 -1 operate as (S1, S2, S1 , S2 ) or (S1 , S2 , S1, S2) for ARIA and as delay, because vk makes the critical path. -1 four units of S1 or S1 for AES. Table 2. Performance of 16-byte permutation functions. III. Merging Permutation Functions Permutation functions no. of XORs Delay (ns) AES uses two permutation functions, MixColumns and MixCol.+InvMixCol. 780 1.64 InvMixColumns in encryption and decryption, respectively. MixCol.+InvMixCol.+ARIA Diff. 1,164 1.64 8 Each function involves addition and multiplication over GF(2 ) ETRI Journal, Volume 29, Number 6, December 2007 Bonseok Koo et al. 821 Round function Key scheduler Table 3. Comparison of performance with previous results. AES key scheduler Input data 32 128 Rcon[i] 128 Throughput S1 S1 S1 S1 Designs (μm) Gate counts Functions Clock cycles 4:1 (Mbps) Output 32 32 32 32 << data 128 Our proposed ARIA 16 720 Data register 19,056 128 (0.25) AES (e/d) 11/21 1,047/548 128 32 32 32 32 128 2:1 128 Our ARIA 32 32 32 32 15,496 ARIA 16 816 128 Input key (0.25) 128 128 128 2:1 [4] Key registers 12,454 AES 11 1,691 128 UT UT UT UT 128 128 128 (0.18) W0 W1 W2 W3 [5] Camellia 22 661 ShiftRows 128 128 128 128 128 14,918 128 128 (0.18) AES 31 469 2:1 4:1 4:1 128 128 128 >>19>>31<<61<<31 2:1 2:1 128 5:1 the smallest AES processor [4] with 128-bit unit architecture.