Unified Hardware Architecture for 128-Bit Block Ciphers AES and Camellia
Total Page:16
File Type:pdf, Size:1020Kb
Unified Hardware Architecture for 128-bit Block Ciphers AES and Camellia A. Satoh and S. Morioka Tokyo Research Laboratory IBM Japan Ltd. Contents Unified S-Box Unified Permutation Layer Unified Data Path Architecture ASIC Implementation Results Conclusion Unified S-Box AES S-Box Combinations of GF(28) inverter and affine transformations Inverter followed by affine transformation for encryption and inverter follows affine transformation for decryption SubBytes InvSubBytes x0 y0 x0 y0 x1 y1 x1 y1 x2 y2 x2 y2 x3 GF(28 ) y3 x3 GF(2 8 ) y3 x4 Inverter y4 x4 Inverter y4 x5 y5 x5 y5 x6 y6 x6 y6 x7 y7 x7 y7 Affine Transformation A Affine Transformation A-1 1 0 0 0 1 1 1 1 a 1 0 0 1 0 0 1 0 1 a ⊕ 1 0 0 1 1 0 0 0 1 1 1 a1 1 1 0 0 1 0 0 1 0 a1 ⊕ 1 1 1 1 0 0 0 1 1 a 0 0 1 0 0 1 0 0 1 a 2 2 1 1 1 1 0 0 0 1 a3 0 1 0 1 0 0 1 0 0 a3 ⊕ 1 1 1 1 1 0 0 0 a 0 4 0 1 0 1 0 0 1 0 a4 0 1 1 1 1 1 0 0 a 1 0 0 1 0 1 0 0 1 a ⊕ 1 5 5 0 0 1 1 1 1 1 0 1 a6 1 0 0 1 0 1 0 0 a6 ⊕ 1 0 0 0 1 1 1 1 1 a 0 7 0 1 0 0 1 0 1 0 a7 Camellia S-Box GF((24)2) inverter is placed between two affine transformations Four S-Boxes S1~S4 (I/O ordering is differed) are used Feistel-type cipher Camellia uses same S-Box in encryption and decryption s1 s2 x0 y0 8 8 x >>1 y x1 y1 s1 x2 y2 x3 GF((242 ) ) y3 s3 8 8 x4 Inverter y4 x s1 <<1 y x5 y5 x6 y6 s4 8 8 x7 y7 x <<1 s1 y Affine TransformationFH Affine Transformation 0 1 0 0 0 1 0 0 a ⊕ 1 0 1 0 0 1 1 0 0 a 0 0 0 1 0 0 0 0 0 1 0 a1 ⊕ 1 0 1 0 0 0 1 0 0 a1 1 0 0 1 0 1 0 0 1 a 0 0 0 1 0 0 1 0 a 1 2 2 0 0 1 0 0 0 0 1 a3 0 1 0 0 0 0 0 1 a3 0 ⊕ 0 0 0 1 0 0 1 0 a 4 0 0 1 0 0 0 1 0 a4 1 0 1 0 0 1 0 0 0 a5 ⊕ 1 1 0 0 0 0 0 0 1 a 1 5 1 0 0 0 0 0 0 1 a 6 1 0 0 0 1 0 0 0 a6 1 0 0 0 1 0 1 0 0 a ⊕ 1 7 0 0 1 0 0 1 0 0 a7 0 Unified S-Box (1) AES S-Boxes (4) Combine affine and isomorphism 0 -1% 0 (28 ) (28 ) δ 4 2 δ A GF A A-1 GF GF((2 ) ) Inverter Inverter Inverter A-1% δ 1 δ -1 1 SubBytes InvSubBytes (2) Share GF(28) inverter (5) Share common terms 0 A 0 0 0 GF(28 ) δ GF((24 )2) δ -1%A -1 Inverter -1% Inverter -1 A 1 1 A δ 1 δ 1 (3) Use GF((24)2) inverter (6) Merge Camellia S-Box 0 4 2 A 0 δ 0 δ -1%A 0 GF((2 ) ) GF((24 )2) δ δ -1 A-1% δ 1 δ -1 1 -1 Inverter Inverter A 1 1 FH2 2 Modifying affine transformations In order to share common terms, bit inverse operations on input are converted to the operations on output -1 δ 0 4 2 δ %A 0 -1 GF((2 ) ) A-1% δ 1 δ -1 1 A Inverter F 2 H 2 1 1 0 0 0 1 1 0 a 0 1 0 0 0 1 0 0 a ⊕ 1 A-1×δ 0 F 0 0 1 1 1 0 0 0 1 a1 ⊕ 1 1 0 0 0 0 0 1 0 a1 ⊕ 1 0 1 1 1 1 0 0 0 a ⊕ 1 0 0 1 0 1 0 0 1 a 2 2 1 1 1 1 0 1 1 1 a3 0 0 1 0 0 0 0 1 a3 1 0 1 0 0 0 0 0 a4 0 0 0 1 0 0 1 0 a4 0 0 1 0 1 0 1 0 a 0 1 0 0 1 0 0 0 a ⊕ 1 5 5 0 1 1 0 1 1 0 0 a6 ⊕ 1 1 0 0 0 0 0 0 1 a6 0 0 1 0 0 0 1 0 a7 ⊕ 1 0 0 0 1 0 1 0 0 a7 ⊕ 1 1 1 0 0 0 1 1 0a 0 0 1 0 0 0 1 0 0 a0 0 0 0 1 1 1 0 0 0 1a1 1 1 0 0 0 0 0 1 0 a1 1 0 1 1 1 1 0 0 0a 0 0 0 1 0 1 0 0 1 a 1 2 2 1 1 1 1 0 1 1 1a3 0 0 0 1 0 0 0 0 1 a3 1 ⊕ ⊕ 0 0 0 1 0 0 1 0 a 0 1 0 1 0 0 0 0 0a4 1 4 0 0 1 0 1 0 1 0a 0 0 1 0 0 1 0 0 0 a 1 5 5 0 1 1 0 1 1 0 0a6 0 1 0 0 0 0 0 0 1 a6 0 0 0 0 1 0 1 0 0 a 1 0 0 1 0 0 0 1 0a7 0 7 Common term sharing δ 0 δ -1%A 0 GF((24 )2) -1% δ 1 δ -1 1 A Inverter Merging 6 matrices FH2 2 into 2 matrices 1 0 1 0 0 0 0 0 a 1 0 0 0 0 1 1 0 a 0 0 0 1 0 1 0 1 1 0 0 a1 1 1 0 1 0 0 0 0 a1 1 1 1 0 1 0 0 1 0 a 1 0 0 0 1 1 1 0 a 1 XORs are reduced 2 2 0 1 1 1 0 0 0 0 a3 0 1 1 1 1 0 1 1 a3 0 -1 ⊕ 1 1 0 0δ 0 1 1 0 a 0 0 δ0 0 ×0A1 0 1 a 0 by 40% 4 4 0 1 0 1 0 0 1 0 a 0 1 0 1 1 0 0 1 a 0 5 5 0 0 0 0 1 0 1 0 a6 1 0 0 0 1 1 1 1 a6 1 1 1 0 1 1 1 0 1 a7 0 1 1 0 0 1 0 1 a7 1 δ 20 XORs 1 1 0 0 0 1 1 0a 0 0 0 1 0 0 1 0 0 a 0 0 -1 0 1 1 1 0 0 0 1a1 1 1 1 1 0 1 1 1 0 a1 δ 21 XORs 0 1 1 1 1 0 0 0a 0 1 0 1 0 0 1 0 0 a 2 2 -1 1 1 1 1 0 1 1 1a3 0 0 1 0 1 1 0 1 0 a3 A ×δ 22 XORs -1 ⊕ -1 A ×δ 1 0 1 δ1 0 0 1 0 a 1 0 1 0 0 0 0 0a4 1 4 -1 0 0 1 0 1 0 1 0 a 0 0 1 1 1 0 0 1 0 a5 Original δ ×A 21 XORs 5 1 0 1 1 0 0 0 0 0 1 1 0 1 1 0 0a6 0 a6 0 1 0 1 0 0 0 1 a F 9 XORs 0 0 1 0 0 0 1 0a7 0 7 0 1 0 0 0 1 0 0 a 0 0 1 0 0 1 1 0 0 a 0 H 9 XORs 0 0 1 0 0 0 0 0 1 0 a1 1 0 1 0 0 0 1 0 0 a1 1 0 0 1 0 1 0 0 1 a 1 0 0 0 1 0 0 1 0 a 1 Total 102 XORs 2 2 0 0 1 0 0 0 0 1 a3 1 0 1 0 0 0 0 0 1 a3 0 ⊕ ⊕ AES+ 0 0 0 1F 0 0 1 0 a 0 0 0 1 0H0 0 1 0 a 1 Merged 56~60 XORs 4 4 0 1 0 0 1 0 0 0 a 1 1 0 0 0 0 0 0 1 a 1 S1~S4 5 5 1 0 0 0 0 0 0 1 a6 0 1 0 0 0 1 0 0 0 a6 1 0 0 0 1 0 1 0 0 1 a7 0 0 1 0 0 1 0 0 a7 0 Performance of unified S-Box 1/2 size in comparison with discretely implemented S-Boxes Speed degraded by 20% due to selectors for component sharing ((24 )2) 280 gates -1% GF δ InvSubBytes A δ Inverter 3.65 ns (AES) ((24 )2) 280 gates Total δ GF -1% SubBytes Inverter δ A 3.56 ns 816 gates (AES) ((24 )2) 256 gates F GF H S1~S4 Inverter 3.45 ns (Camellia) δ 0 δ -1 %A 0 SubBytes+ GF((24 )2) 411~414 gates -1% δ 1 δ -1 1 InvSubBytes+ A Inverter S1~S4 FH2 2 4.29~4.65 ns (AES+Camellia) Unified Permutation Layer Merging MixColumns and InvMixColumns Break permutation layers of AES (MixColumns and InvMixColumns) into {01, 02, 04, 08} elements All elements of MixColumns are included in InvMixColumns MixColumns InvMixColumns z 02 03 01 01 x yy 0E0E 0B0B 0D0D 0909 xx 0 0 Same 0 0 00 z1 01 02 03 01 x1 yy11 0909 0E0E 0B0B 0D0D xx11 = ⋅⋅ == ⋅⋅ z 01 01 02 03 x yy 0D0D 0909 0E0E 0B0B xx 2 22 Matrices 22 22 z 03 01 01 02 x z3 03 01 01 02 x3 yy33 0B0B 0D0D 0909 0E0E xx33 00 01 01 01 x 02 02 00 00 x 00 01 01 01 x 02 02 00 00 x 0 0 0 0 01 00 01 01 x1 00 02 02 00 x1 01 00 01 01 x 00 02 02 00 x = ⋅ + ⋅ = ⋅ 1 + ⋅ 1 01 01 00 01 x 00 00 02 02 x 01 01 00 01 x 00 00 02 02 x 2 2 2 2 01 01 01 00 x3 02 00 00 02 x3 01 01 01 00 x3 02 00 00 02 x3 04 00 04 00 x 08 08 08 08 x 0 0 00 04 00 04 x 08 08 08 08 x + ⋅ 1 + ⋅ 1 04 00 04 00 x 08 08 08 08 x 2 2 00 04 00 04 x3 08 08 08 08 x3 Merging P-function and InvMixColumns Generate 64-bit matrix contains two InvMixColumns Factor common terms between P-function and 64-bit InvMixColumns P - function InvMixColumns w 0 1 0 1 1 0 1 1 1 x 0 y 0 E B D 9 x0 w1 1 1 0 1 1 0 1 1 x1 y1 9 E B D x1 0 w 2 1 1 1 0 1 1 0 1 x 2 y 2 D 9 E B x 2 w 3 0 1 1 1 1 1 1 0 x 3 y3 B D 9 E x3 = ⋅ = ⋅ w 1 1 0 0 0 1 1 1 x y E B D 9 x 4 4 4 4 9 E B D w 5 0 1 1 0 1 0 1 1 x 5 y5 x5 0 w 0 0 1 1 1 1 0 1 x D 9 E B 6 6 y 6 x6 w 1 0 0 1 1 1 1 0 x B D 9 E 7 7 y 7 x7 1 0 0 0 0 1 1 1 x x x x 0 0 0 1 1 0 E A E 8 0 0 1 1 1 0 1 0 1 1 0 1 0 0 x1 1 0 0 1 x1 8 E A E x1 1 0 1 1 x1 1 1 0 1 0 0 0 0 0 1 0 x 2 1 1 0 0 x 2 E 8 E A x 2 1 1 0 1 x2 1 1 1 0 0 0 0 1 x 3 0 1 1 0 x 3 A E 8 E x3 1 1 1 0 x3 = ⋅ + ⋅ = ⋅ + ⋅ 1 1 0 0 x 0 1 1 1 x E A E 8 x 0 1 1 1 x 4 4 4 4 1 0 1 1 8 E A E 1 0 1 1 0 1 1 0 x 5 x 5 x5 x5 0 0 0 0 0 1 1 x 1 1 0 1 x E 8 E A 1 1 0 1 0 6 6 x6 x6 1 0 0 1 x 1 1 1 0 x A E 8 E 1 1 1 0 7 7 x7 x7 Performance of unified permutation layer Number of XOR terms is reduced to 1/3 of the original matrices with two additional XOR-gates of delay x {08,04,00} {02,01,00} {01,00} {01,00} -element -element -element -element Matrices Matrices Matrices Matrices y z w InvMixColumns MixColumns P-function Original Matrices Shared MixCol InvMixCol P-funk Total Matrix XORs 304 880 288 1,482 476 Delay (gates) 3 5 3 5 7 Unified Data Path Architecture AES Algorithm SPN structure using 4 primitive functions takes 11 rounds Cipherihg Block a a a a 00 0102 03 k 00kkk01 02 03 b00bbb01 02 03 128-bit plain text 1-bit a a a a kkkk bbbb 10 1112 13 1011 12 13 = 1011 12 13 88 8 XOR a a a a kkkk bbbb 128 20 2122 23 2021 22 23 2021 22 23 AddRoundKey a30 a31a32 a 33 kkkk3031 32 33 bbbb3031 32 33 SubBytes a a a a b bbb ShiftRows 8-bit 00 0102 03 S-Box 0001 02 03 a a a a bbbbb MixColumns S-Box 10 11ij 13 1011 12ij 13 a a a a bbbb AddRoundKey 16 20 2122 23 2021 22 23 a30 a31a32 a 33 bbbb3031 32 33 a a a a no shift a a a a 32-bit 0001 02 03 0001 02 03 SubBytes a a a a left rotation by 1 a Rotation 10 1112 13 10 ShiftRows a a a a left rotation by 2 a a 4 2021.