S-Box Optimization for SM4 Algorithm
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the World Congress on Engineering and Computer Science 2017 Vol I WCECS 2017, October 25-27, 2017, San Francisco, USA S-box Optimization for SM4 Algorithm Yuan Zhu, Fang Zhou, Ning Wu, Yasir derivation process. In [4]-[5], S-boxes based on polynomial Abstract—This paper proposes a highly optimized S-box of basis and normal basis were introduced. Their optimizations SM4 algorithm for low-area and high-speed embedded for S-box focused on the hardware overhead at the cost of application. A novel methodology is adopted for S-box throughput. Therefore, when the SM4 algorithm is used for implementation based on Composite Field Arithmetic (CFA) and mixed basis. The optimization result shows that the S-box high-throughput and area-constrained devices, a short-delay based on mixed basis has shorter critical path than S-boxes and compact S-box is required for SM4 hardware based on normal basis and polynomial basis. Compared with implementation. previous works, the mixed basis based S-box proposed in this This study focuses on the optimization of S-box for SM4 paper can achieve the shortest critical path. Besides, the and the major contributions include: 2 2 operations over GF((2 ) ) and the constant matrix Coalescence design and implementation based on CFA multiplications are optimized by Delay-Aware Common Sub-expression Elimination (DACSE) algorithm. ASIC technology [6] and mixed basis [7]. 2 2 implementation using static 180 nm @ 1.8 V yield an area The MI over GF((2) ) and constant matrix reduction of 35.57% as compared to direct implementation. multiplications are optimized by DACSE algorithm [8]. Index Terms—SM4 algorithm, S-box, Composite Field II. BACKGROUND OF SM4 S-BOX ON CFA Arithmetic (CFA), mixed basis The algebraic expression is shown as (1) and properties of the S-box for SM4 algorithm has been analyzed in [9]. (1) I. INTRODUCTION S(x) I x A C A C 8 M4 algorithm is a group symmetric cipher algorithm where I is the MI over GF(2 ). A is the affine matrix and C is Sannounced by Chinese government in January 2006 and it row vector. x is the input of S-box and S(x) is the has been widely used in various fields of information security, corresponding output. The S-box function involves a 8 such as wireless local area network (WLAN), Wireless LAN pre-affine transformation, MI over GF(2 ) and a post-affine Authentication and Privacy Infrastructure (WAPI), storage transformation. A and C are shown in (2). device and the smart card system [1]-[2]. As the SM4 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 0 algorithm is mostly used in high-speed and resource- 1 constrained applications, it is very necessary to design and 0 1 1 1 1 0 0 1 0 implement short-delay and compact circuit of SM4. The 1 0 1 1 1 1 0 0 1 A , C (2) implementation of S-box is the most expensive part in terms 0 1 0 1 1 1 1 0 0 of the required hardware. Therefore, the short-delay and 0 0 1 0 0 1 1 1 0 compact S-box is the key component of the SM4 algorithm. 1 0 0 1 0 1 1 1 1 The design and optimization of SM4 S-box have been studied in detail. The S-box implemented with LUT achieves 1 1 0 0 1 0 1 1 1 high throughput but acquires large area. In [3], the Since direct calculation of the MI over GF(28) is a N-dimensional hypercube method was introduced to complicated and difficult task, we adopt the CFA to reduce construct S-box. Although the method reduced the area, it is the hardware complexity by mapping the MI over GF(28) into difficult to implement in hardware because of complex composite field GF(((22)2)2). In CFA technology, an isomorphic mapping matrix is 8 Manuscript received July 03, 2017; revised July 24, 2017. This work was demanded to map the input vector from the finite field GF(2 ) supported in part by the National Natural Science Foundation of China to the composite field GF(((22)2)2), and its inverse matrix is (61376025), the Fundamental Research Funds for the Central Universities required to revert the computing results to GF(28). So the (NS2017023) and the Natural Science Foundation of Jiangsu Province S-box based on CFA technique can be expressed as: (BK20160806). Y. Zhu is with College of Electronic and Information Engineering, S(x) T1 I x A C T A C (3) Nanjing University of Aeronautics and Astronautics, Nanjing, 211100, China (e-mail: zhuyuan93@ qq.com). F. Zhou is with the College of Electronic and Information Engineering, MI over X A× +C T× AT-1× +C Y Nanjing University of Aeronautics and Astronautics, Nanjing, 211100, GF(((22)2)2) China(e-mail: [email protected]). N. Wu is with the College of Electronic and Information Engineering, Fig. 1. Architecture of SM4 s-box using the CFA technique Nanjing University of Aeronautics and Astronautics, Nanjing, 211100, China. where T is the isomorphic mapping matrix and T-1 is inverse Yasir is with the College of Electronic and Information Engineering, matrix of T. Generally, matrix T-1 and matrix A are merged Nanjing University of Aeronautics and Astronautics, Nanjing, 211100, into a single matrix to reduce the hardware resources. The China. ISBN: 978-988-14047-5-6 WCECS 2017 ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online) Proceedings of the World Congress on Engineering and Computer Science 2017 Vol I WCECS 2017, October 25-27, 2017, San Francisco, USA 1 architecture of the S-box using the CFA technique is shown in A1 AA4 A 4 Fig. 1. 1 4 4 4 =al ah a h a l ah a l 8 (6) III. SUB-OPERATIONS OVER GF(2 ) IN MI 2 1 = al a h a l a h al a h a l In CFA technology, the MI over GF(2)8 is built iteratively from GF(2) by using the following irreducible polynomials: = bl b h B 4 2 2 Where A can be expressed as A=alβ+ahβ , al,ah∈GF(2 ). B 8 2 2 2 2 GF2 GF 2 : g v( v = ) 2 can be represented as B=bl+bhβ, bl, bh ∈GF(2 ). By using the 2 2 2 third irreducible polynomial in (4), the MI over GF((2 ) ) is 2 2 2 GF 2 GF2 : f (4) further decomposed into GF((2)2) as (7). 2 2 b a a a a a a a a GF2 GF2 : e = + +1 3 0 1 2 3 1 2 0 3 a a a a a a 0 1 2 0 2 3 The coefficients {v=(0100)2, λ=(10)2} are used in this b2 a 0 a 2 a 1 a 2 a 0 a 3 a 0a 1a 3 B (a a 4 )1 paper. l h a1a2 a 3 A. MI in Composite Domain b a a a a a a a a a 1 0 1 1 2 1 3 0 1 2 According to the first irreducible polynomial in (4), the MI b0a 0 a 0 a 2 a 1 a 2 a 1 a 3 a0 a1 a 3 over GF(28) is decomposed into GF(((22)2)2) as (5). (7) -1 2 2 A-1= AA 16 A 16 Where al=a0α+a1α ,ah=a2α+a3α , a0, a1, a2, a3∈GF(2). 1 Input: normal basis Output: polynomial basis 16 2 2 2 16 16 (5) = al ah al a h ah a l 2 2 al bl 1 2 ()2×α ()-1 =al a h a l a h v al a l a h 2 = b b B ah 2 l h bh 16 2 2 where A can be expressed as A=alγ+ahγ , al, ah ∈GF((2 ) ). Fig. 3. MI over GF((22)2) 2 2 B can be represented as B=bl+bhγ, bl, bh ∈GF((2 ) ). The 2 2 architecture of MI over GF(((22)2)2) is shown in Fig. 2. The MI over GF((2 ) ) is optimized by the DACSE algorithm, and the optimized result includes 13 XOR gates Input: normal basis Output: polynomial basis and 8 AND gates, which needs 51 equivalent gates, with a 4 P 4 al bl reduction of 44.26% in terms of the total area occupancy P NN P ()2×υ ()-1 compared with the direct implementation. PN 4 2 2 ah P 4 b) Constant Multiplied by Square over GF((2 ) ) bh The input and output of constant multiplied by square 2 2 2 Fig. 2. MI over GF(((2 ) ) ) operation are denoted by the polynomial basis {1, β} and 2 2 2 As shown in Fig. 2, the MI over GF(((2 ) ) ) includes two normal basis {β, β4}, respectively. Constant multiplied by additions, a MI, a square, a constant multiplication and three square is calculated as (8) and its architecture is mentioned in 2 2 multiplications. All of the operations are over GF((2 ) ). The Fig. 4. addition over GF((22)2) is defined as bitwise XOR gates. The A2v = 2() a a 2 square and constant multiplication over GF((22)2) can be l h 2 2 2 2 2 2 deduced from multiplication over GF((2 ) ), and they are ()al a h (8) usually joint into a single block to reduce the number of gates.