
New Subquadratic Algorithms for Constructing Lightweight Hadamard MDS Matrices (Full Version)

Tianshuo Cong^a, Ximing Fu^{b,c,*}, Xuting Zhou^e, Yuli Zou^d and Haining Fan^e

^a Institute for Advanced Study, Tsinghua University, Beijing, China
^b The Chinese University of Hong Kong, Shenzhen, Shenzhen, China
^c University of Science and Technology of China, Hefei, China
^d Alibaba Local Life Service Lab, Shanghai, China
^e Department of Computer Science and Technology, Tsinghua University, Beijing, China

ABSTRACT

Maximum Distance Separable (MDS) matrices play a crucial role in designing the diffusion layers of symmetric ciphers. In this paper we mainly discuss constructing lightweight Hadamard MDS matrices based on subquadratic multipliers over GF(2^4). We firstly propose subquadratic Hadamard matrix-vector product formulae (HMVP) and provide two new XOR count metrics. To the best of our knowledge, subquadratic multipliers have not been used to construct MDS matrices before. Furthermore, combined with the HMVP formulae we design a construction algorithm to find lightweight Hadamard MDS matrices under our XOR count metric. Applying our algorithms, we successfully find 4 × 4 and 8 × 8 involutory and non-involutory MDS matrices with the state-of-the-art fewest XOR counts. Experiment results show that our candidates save up to 40.63% and 10.34% XOR gates for 8 × 8 and 4 × 4 matrices over GF(2^4) respectively.

Keywords: Maximum Distance Separable (MDS) matrix; Lightweight; Hadamard matrix; Involution; Subquadratic matrix-vector product formulae

1. Introduction

With the rapid development and application of resource-restricted equipment like Radio Frequency Identification (RFID), lightweight ciphers have attracted great attention, such as the stream ciphers Trivium [13], Grain v1 [22] and MICKEY 2.0 [1], and the block ciphers LED [19], CLEFIA [36], etc. The confusion layer and the diffusion layer play crucial roles in designing symmetric ciphers in terms of security and efficiency. The branch number is the primary factor for the performance of diffusion layers. A Maximum Distance Separable (MDS) matrix has the biggest branch number and hence provides the best resistance to differential and linear attacks. As a consequence, lightweight MDS matrices have been applied in diffusion layers to provide better security and hardware performance: Shark [32] firstly utilizes an MDS matrix; AES [11] uses a 4 × 4 matrix over GF(2^8), which occupies a wealth of hardware area resources; the 4 × 4 matrix in LED is hardware-friendly. MDS matrices are also widely used as diffusion layers in other block ciphers such as CLEFIA, FOX [25], KHAZAD [2], ANUBIS [3], Square [10], Twofish [35] and Joltik [23], some stream ciphers [12] and hash functions [17, 18, 29].

The method of constructing MDS matrices is mainly considered from two perspectives: structure and involution. Two networks are widely used in block ciphers, substitution-permutation networks (SPNs) and the Feistel structure. A matrix with a lightweight inverse matrix could be used in SPNs, and hardware-friendly involutory MDS matrices could be used in Feistel structures because of the identical process of Feistel's encryption and decryption. The diffusion layers of AES, Grøstl and WHIRLPOOL use circulant matrices. The diffusion layer of KHAZAD uses a Hadamard matrix. In [9], the authors propose compact Cauchy matrices which have the fewest different entries, and they prove that all compact Cauchy matrices could be improved into self-inverse ones. In [33], the application of Toeplitz matrices in lightweight diffusion layers was discussed. The authors prove that Toeplitz matrices could not be both MDS and involutory, and they give a 4 × 4 involutory MDS matrix with a small number of XOR operations. It is proved in [37] that there are equivalent classes of various MDS matrices; for example, there are 30 equivalent classes of 8 × 8 Hadamard matrices. In [24], the authors propose a tool named LIGHTER to produce optimized implementations of small functions in lightweight cryptographic designs. They find 4 × 4 and 8 × 8 involutory and non-involutory MDS matrices over GF(2^4) and GF(2^8) with fewer XOR counts.

The GF(2^c) multiplication can be represented as a matrix-vector product, and MDS matrices usually appear in matrix-vector multiplication, so methods to design multipliers can also be adapted to design MDS matrices. A typical hardware evaluation of a given MDS matrix is the total number of 2-input XOR gates it costs, because an addition over GF(2) can be realized by a 2-input Exclusive OR (XOR) gate, and an addition over GF(2^c) can be realized by c 2-input XOR gates with one XOR gate delay. Minimizing the number of XOR gates needed in matrix-vector multiplication has always been a concern [28, 5]. Bit-parallel GF(2^c) multipliers can be classified into two categories: quadratic and subquadratic multipliers [15]. Quadratic multipliers are built on the straightforward computation and they have high space complexity [30, 31, 34, 21]. Subquadratic multiplication algorithms can be utilized to design low-space-complexity GF(2^c) multipliers for large c [4, 16, 6, 26]. In this work, we design lightweight MDS matrices using the subquadratic Toeplitz matrix-vector product (TMVP) formulae in [14].

∗Corresponding author
[email protected] (T. Cong); [email protected] (X. Fu); [email protected] (X. Zhou); [email protected] (Y. Zou); [email protected] (H. Fan)

1.1. Our Contributions

The goal of this work is to construct lightweight involutory and non-involutory MDS matrices by using subquadratic Hadamard matrix-vector product formulae. To the best of our knowledge, this approach has not been used to design MDS matrices with low XOR count before.

Considering lightweightness, we mainly consider MDS matrices over small fields. We construct four kinds of MDS matrices, namely involutory and non-involutory 4 × 4 and 8 × 8 Hadamard matrices over GF(2^4). We propose two new XOR count metrics and use the subquadratic algorithms to construct lightweight MDS matrices over GF(2^4) defined by three irreducible polynomials under our new metrics. A comparison with the best results of previous work is shown in Table 1. The known lower bounds are from [24] (FSE 2017), which are the benchmarks in our experiment. Inv. with a check mark means involutions.

Table 1
Summary table of the MDS matrices

Dimension | Inv. | XOR count | [24] | Comparison
4 × 4     | ✓    | 58        | 63   | 7.94%
8 × 8     | ✓    | 282       | 424  | 33.49%
4 × 4     |      | 52        | 58   | 10.34%
8 × 8     |      | 228       | 384  | 40.63%

The rest of this paper is organised as follows. Section 2 briefly introduces some basic concepts such as background knowledge of MDS matrices and subquadratic TMVP formulae. Section 3 begins by laying out the application of HMVP to Hadamard matrices, and introduces the searching process of involutions and non-involutions. Section 4 presents the experiment parameters and searching results of the algorithm; meanwhile we compare our MDS matrices with the previous results to show the efficiency. Finally, Section 5 concludes this paper.

2. Preliminaries

In this section, some basic concepts and the subquadratic HMVP formulae will be introduced. The elements of the Hadamard matrices in our work all belong to the finite field GF(2^c) generated by a degree-c irreducible polynomial p(x), which is denoted as GF(2^c)/p(x). We use hexadecimal form to denote p(x) and the elements in matrices.

2.1. Branch Number

The branch number is an index for inspecting the performance of diffusion layers. It could quantify the avalanche effect.

Definition 1 (Hamming weight). The hamming weight w_h(x) is the number of non-zero components of the vector x.

Definition 2 (Branch number). For a linear invertible mapping θ : [GF(2^m)]^n → [GF(2^m)]^n, the branch number is

    B(θ) = min_{x ≠ 0} (w_h(x) + w_h(θ(x))).    (1)

Definition 3 (Best diffusion). The diffusion is denoted as the best diffusion when the branch number

    B(θ) = n + 1.    (2)

If the mapping θ is a matrix M, then M is called an MDS matrix.

When the input x's hamming weight is equal to 1 and w_h(θ(x)) ≤ n, the diffusion matrix's maximum branch number is n + 1. Reference [11] gives a property to identify MDS matrices: a matrix M is MDS if and only if every square submatrix of M is nonsingular.

2.2. Hadamard Matrix

A Hadamard matrix is a specially structured matrix. Let H_{k,k} = (h_{i,j}) be a k × k Hadamard matrix, where k = 2^r, r = 1, 2, ⋯, (h_{i,j}) is the (i, j)-th element and 0 ≤ i, j ≤ k − 1; then h_{i,j} = h_{i⊕j} holds, where ⊕ denotes the bit-wise XOR and h_0, h_1, ⋯, h_{k−1} are the elements in the first row. Hence, the elements in H_{k,k} are determined by its first row, so the Hadamard matrix can also be denoted by had(h_0, ⋯, h_{k−1}). An example of a 4 × 4 Hadamard matrix had(h_0, h_1, h_2, h_3) is shown as

    H_{4,4} = [ h_0 h_1 h_2 h_3 ]
              [ h_1 h_0 h_3 h_2 ]
              [ h_2 h_3 h_0 h_1 ]
              [ h_3 h_2 h_1 h_0 ].

Definition 4 (Involutory Matrix [20]). A k × k square matrix H is called an involutory matrix if H^2 = I_{k,k}, where I_{k,k} is the k × k identity matrix.

The advantage of applying involutory MDS matrices is their free inverse implementation.

2.3. Subquadratic TMVP Formulae

A matrix is denoted as a Toeplitz matrix if the elements on each line parallel to the main diagonal are constant. An example of a 4 × 4 Toeplitz matrix is

    T_4 = [ t_0 t_1 t_2 t_3 ]
          [ t_4 t_0 t_1 t_2 ]
          [ t_5 t_4 t_0 t_1 ]
          [ t_6 t_5 t_4 t_0 ].

Definition 5 (Subquadratic TMVP formulae [14]). Let T be a k × k Toeplitz matrix and V be a column vector of dimension k. The Toeplitz matrix-vector product TV could be computed by

    TV = [ T_1 T_0 ] [ V_0 ] = [ P_0 + P_2 ],    (3)
         [ T_2 T_1 ] [ V_1 ]   [ P_1 + P_2 ]


where

    P_0 = (T_0 + T_1)V_1,
    P_1 = (T_1 + T_2)V_0,
    P_2 = T_1(V_0 + V_1).

Here T_0, T_1, T_2 are the sub-matrices of T with size (k/2) × (k/2), and V_0, V_1 are the column sub-vectors of V with size (k/2) × 1. P_0, P_1, P_2 are three TMVPs of dimension k/2. Straightforward computation of TV requires 4 TMVPs of size k/2, while (3) saves 1 TMVP (the shared P_2).

2.4. XOR Count

In this paper we use the number of XOR gates needed (XOR count) for implementing the MDS matrix-vector multiplication to measure the lightweightness of a given MDS matrix. The XOR count was first proposed in [27] and has been widely adopted to quantify the hardware implementation area cost [37, 24]. For a k × k Hadamard matrix H, there are 2 metrics of XOR count:

• The XOR count needed for directly computing the multiplication of H with a vector is called the direct-XOR-count, denoted by D(H). D(H) is an overestimation of the hardware implementation cost.

• For computing the matrix-vector product, some repeated computation (XOR gates) can be reused. The direct-XOR-count minus the reused XOR count is denoted as S(H) [24]. However, minimizing S(H) is shown to be NP-hard [7]. We find the 2-input repeated XOR gates by hand.

Consider the XOR count of a given k × k Hadamard matrix. Let X(h_i) denote the XOR count of the element h_i. If we directly calculate the matrix-vector multiplication, we will cost k^2 multiplications and k(k − 1) additions. Therefore the direct-XOR-count over GF(2^c)/p(x) could be calculated as

    D(H_{k,k}) = k × Σ_{i=0}^{k−1} X(h_i) + k × (k − 1) × c.

Take the matrix E = had(0x1,0x2,0x8,0xa) and the vector V = (v_0, v_1, v_2, v_3)^⊤ over GF(2^4)/0x13 as an example. Firstly, we should calculate the XOR count of each element in the matrix. Let v_k = Σ_{i=0}^{3} v_k^i x^i for 0 ≤ k ≤ 3. For element 0xa, 0xa ⋅ v_3 is equal to

    0xa ⋅ v_3 = (x^3 + x) × (v_3^3 x^3 + v_3^2 x^2 + v_3^1 x + v_3^0)
              = (v_3^0 v_3^2 v_3^3)x^3 + (v_3^1 v_3^2 v_3^3)x^2 + (v_3^0 v_3^1 v_3^2 v_3^3)x + (v_3^1 v_3^3).

As there is no AND operation, we omit the XOR mark ⊕ in the expressions without ambiguity; for example, v_3^0 v_3^2 here denotes v_3^0 ⊕ v_3^2. So X(0xa) = 2 + 2 + 3 + 1 = 8. The XOR counts of the other elements in the matrix over all three finite fields GF(2^4)/p(x) can be computed similarly, and they are shown in Table 2. We acquire X(h_0) = 0, X(h_1) = 1, X(h_2) = 3, X(h_3) = 8, and hence

    D(E) = 4 × (0 + 1 + 3 + 8) + 48 = 96.

Table 2
XOR counts of elements over GF(2^4)

Element | GF(2^4)/0x13 | GF(2^4)/0x1f | GF(2^4)/0x19
0x1     | 0            | 0            | 0
0x2     | 1            | 3            | 1
0x3     | 5            | 5            | 3
0x4     | 2            | 3            | 3
0x5     | 6            | 5            | 5
0x6     | 5            | 6            | 2
0x7     | 9            | 6            | 6
0x8     | 3            | 3            | 6
0x9     | 1            | 5            | 8
0xa     | 8            | 6            | 5
0xb     | 6            | 6            | 9
0xc     | 5            | 6            | 1
0xd     | 3            | 6            | 5
0xe     | 8            | 5            | 6
0xf     | 6            | 3            | 8

The expression of the straightforward matrix-vector product is shown in (4), where r_k = Σ_{i=0}^{3} r_k^i x^i for 0 ≤ k ≤ 3; it requires 96 XOR gates. Some 2-bit combinations are shared 2-input XOR gates, e.g., v_2^0 v_2^3 appears in both r_0^3 and r_1^3. There are 18 shared XOR gates, such that

    S(E) = 96 − 18 = 78.

    r_0 = [v_2^0 v_2^3 v_0^3 v_1^2 v_3^0 v_3^2 v_3^3]x^3 + [v_0^2 v_1^1 v_2^2 v_2^3 v_3^2 v_3^3 v_3^1]x^2
        + [v_1^0 v_1^3 v_2^1 v_2^2 v_3^1 v_3^2 v_3^0 v_0^1 v_3^3]x + [v_0^0 v_1^3 v_2^1 v_3^1 v_3^3],

    r_1 = [v_3^0 v_3^3 v_2^0 v_2^3 v_0^2 v_1^3 v_2^2]x^3 + [(v_2^2 v_2^3)(v_3^2 v_3^3) v_0^1 v_1^2 v_2^1]x^2
        + [v_1^1 v_2^0 (v_0^0 v_0^3)(v_2^1 v_2^2)(v_3^1 v_3^2) v_2^3]x + [v_2^3 v_3^1 v_0^3 v_1^0 v_2^1],

    r_2 = [v_0^0 v_0^3 v_1^2 v_1^3 v_1^0 v_2^3 v_3^2]x^3 + [(v_0^2 v_1^1)(v_1^2 v_1^3) v_0^3 v_2^2 v_3^1]x^2
        + [v_0^1 v_0^2 (v_1^0 v_1^3)(v_3^0 v_3^3) v_1^1 v_1^2 v_2^1]x + [v_1^3 v_3^3 v_1^1 v_2^0 v_0^1],

    r_3 = [v_0^0 v_0^3 v_0^2 v_1^0 v_2^2 v_1^3 v_3^3]x^3 + [v_0^1 v_0^2 v_0^3 v_1^2 v_1^3 v_2^1 v_3^2]x^2
        + [v_0^1 v_0^2 v_0^3 v_1^1 v_2^3 v_3^1 v_0^0 v_1^2 v_2^0]x + [v_0^3 v_1^1 v_0^1 v_2^3 v_3^0].    (4)
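The element XOR count X(h) above can be checked mechanically. The following short Python sketch is our own illustration, not code from the paper: multiplication by a constant h in GF(2^c)/p(x) is a GF(2)-linear map, so X(h) is the number of ones in its c × c bit matrix minus c, and the direct-XOR-count then follows from the formula above. It reproduces X(0xa) = 8 and D(E) = 96 over GF(2^4)/0x13.

def gf_mul(a, b, p=0x13, c=4):
    # carry-less multiply followed by reduction modulo the irreducible p(x)
    r = 0
    for i in range(c):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * c - 2, c - 1, -1):
        if (r >> i) & 1:
            r ^= p << (i - c)
    return r

def xor_count(h, p=0x13, c=4):
    # X(h): ones of the c x c multiplication-by-h bit matrix, minus c
    ones = sum(bin(gf_mul(h, 1 << j, p, c)).count("1") for j in range(c))
    return ones - c

print(xor_count(0xA))                                  # 8, as derived for 0xa above
E = [0x1, 0x2, 0x8, 0xA]                               # first row of E = had(0x1,0x2,0x8,0xa)
print(4 * sum(xor_count(h) for h in E) + 4 * 3 * 4)    # 96 = D(E)

Running it over the other polynomials 0x1f and 0x19 reproduces the remaining columns of Table 2.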

3. Construction Algorithms

In this section, we will introduce our construction algorithms in detail. We firstly introduce the subquadratic Hadamard matrix-vector product formulae and their application to 4 × 4 and 8 × 8 Hadamard matrices, followed by the construction algorithms of involutory and non-involutory MDS matrices. Involutory and non-involutory matrices are called involutions and non-involutions respectively for short.


3.1. Subquadratic HMVP Formulae

In this part, we show that Hadamard matrices are applicable to the TMVP formulae.

Lemma 1. A k × k Hadamard matrix H could be denoted as

    H = [ H_1 H_0 ]
        [ H_0 H_1 ],

where H_0 and H_1 are (k/2) × (k/2) Hadamard sub-matrices.

Proof 1. Denote

    H = [ H_0 H_1 ]
        [ H_2 H_3 ],

where H_0, H_1, H_2 and H_3 are (k/2) × (k/2) sub-matrices. Denote H_0 = (h_{i,j}), 0 ≤ i, j < k/2. Then H_1 = (h_{i,j+(k/2)}), H_2 = (h_{i+(k/2),j}), H_3 = (h_{i+(k/2),j+(k/2)}). The elements of H_0, H_1, H_2 and H_3 satisfy h_{i,j} = h_{i⊕j}, so they are Hadamard matrices. Meanwhile we have

    i ⊕ j = (i + (k/2)) ⊕ (j + (k/2)),
    i ⊕ (j + (k/2)) = (i + (k/2)) ⊕ j,

and as a result H_0 = H_3 and H_1 = H_2.

Lemma 2. Given a k × k Hadamard matrix H of the form

    H = [ H_1 H_0 ]
        [ H_0 H_1 ],

where H_0 and H_1 are (k/2) × (k/2) sub-matrices, then H_0 + H_1 is a Hadamard matrix.

Proof 2. Denote H_1 = (h_{i,j}), H_0 = (h_{i,q}), H_01 = H_0 + H_1 = had(θ_0, ⋯, θ_{k/2−1}) = (θ_{i,j}), where 0 ≤ i, j ≤ k/2 − 1 and q = j + k/2. We know that h_{i,j} = h_{i⊕j}, h_{i,q} = h_{i⊕q}, and θ_{i⊕j} = h_{i⊕j} + h_{i⊕q}, θ_{i,j} = h_{i,j} + h_{i,q}, so θ_{i⊕j} = θ_{i,j} and H_01 is a Hadamard matrix.

Definition 6 (Subquadratic HMVP Formulae). Based on Lemma 1 and Lemma 2, we can compute the subquadratic Hadamard matrix-vector product (HMVP) of H and V by

    HV = [ H_1 H_0 ] [ V_0 ] = [ P_0 + P_2 ],
         [ H_0 H_1 ] [ V_1 ]   [ P_1 + P_2 ]

where

    P_0 = (H_0 + H_1)V_1,
    P_1 = (H_1 + H_0)V_0,
    P_2 = H_1(V_0 + V_1).

Here, P_0, P_1 and P_2 are HMVPs of dimension k/2.

There are 2k − 1 independent elements in a Toeplitz matrix, which are in its first row and first column, while there are only k independent elements in a Hadamard matrix. Hence, constructing Hadamard matrices needs a smaller search space than Toeplitz matrices.

3.2. Application of HMVP to 4 × 4 Hadamard Matrices

We firstly apply the HMVP formulae to 4 × 4 Hadamard matrices. According to the structure of Hadamard matrices, H_{4,4} is in the form of

    H_{4,4} = had(h_0, h_1, h_2, h_3) = [ H_1 H_0 ]
                                        [ H_0 H_1 ],

where

    H_1 = [ h_0 h_1 ],   H_0 = [ h_2 h_3 ].
          [ h_1 h_0 ]          [ h_3 h_2 ]

The vector V is in the form of V = (V_0; V_1), where V_0 = (v_0, v_1)^⊤ and V_1 = (v_2, v_3)^⊤.

By applying the HMVP formulae, the three corresponding HMVPs could be computed by

    P_0 = (H_0 + H_1)V_1 = [ h_02 v_2 + h_13 v_3 ],
                           [ h_13 v_2 + h_02 v_3 ]

    P_1 = (H_1 + H_0)V_0 = [ h_02 v_0 + h_13 v_1 ],
                           [ h_13 v_0 + h_02 v_1 ]

    P_2 = H_1(V_0 + V_1) = [ h_0 v_02 + h_1 v_13 ].
                           [ h_1 v_02 + h_0 v_13 ]

Finally the matrix-vector product R_4 = H_{4,4}V could be computed by

    R_4 = [ r_0 ]   [ P_0 + P_2 ]   [ h_02 v_2 + h_13 v_3 + h_0 v_02 + h_1 v_13 ]
          [ r_1 ] = [ P_1 + P_2 ] = [ h_13 v_2 + h_02 v_3 + h_1 v_02 + h_0 v_13 ]
          [ r_2 ]                   [ h_02 v_0 + h_13 v_1 + h_0 v_02 + h_1 v_13 ]
          [ r_3 ]                   [ h_13 v_0 + h_02 v_1 + h_1 v_02 + h_0 v_13 ],

where h_ij denotes h_i + h_j and v_ij denotes v_i + v_j.

According to the above computation, when we directly compute the matrix-vector multiplication of a 4 × 4 matrix, 12 addition operations and 16 multiplication operations over the finite field are needed. In order to implement the multiplication of H_{4,4} with a vector, H_0 + H_1 can be computed in advance; V_0 + V_1 costs 2 addition operations. By applying the HMVP formulae, 10 addition operations and 12 multiplication operations are needed in R_4, so the total number of addition operations is 10 + 2 = 12. The 12 multiplication operations all appear in R_4. In summary, we economize 4 multiplication operations, which helps to reduce the space complexity in hardware implementation.
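As an illustration of the 4 × 4 HMVP above, the following Python sketch (our own, with the field GF(2^4)/0x13 assumed for concreteness; it is not the authors' implementation) computes R_4 from the three half-size products P_0, P_1, P_2 and checks it against the straightforward product.

def gf_mul(a, b, p=0x13, c=4):
    r = 0
    for i in range(c):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * c - 2, c - 1, -1):
        if (r >> i) & 1:
            r ^= p << (i - c)
    return r

def hmvp4(h, v):
    # H = had(h0,h1,h2,h3); P0 = (H0+H1)V1, P1 = (H1+H0)V0, P2 = H1(V0+V1)
    h0, h1, h2, h3 = h
    v0, v1, v2, v3 = v
    h02, h13 = h0 ^ h2, h1 ^ h3                    # first row of H0 + H1
    v02, v13 = v0 ^ v2, v1 ^ v3                    # V0 + V1
    p0 = (gf_mul(h02, v2) ^ gf_mul(h13, v3), gf_mul(h13, v2) ^ gf_mul(h02, v3))
    p1 = (gf_mul(h02, v0) ^ gf_mul(h13, v1), gf_mul(h13, v0) ^ gf_mul(h02, v1))
    p2 = (gf_mul(h0, v02) ^ gf_mul(h1, v13), gf_mul(h1, v02) ^ gf_mul(h0, v13))
    return [p0[0] ^ p2[0], p0[1] ^ p2[1], p1[0] ^ p2[0], p1[1] ^ p2[1]]

def direct4(h, v):
    # r_i = sum_j h_(i xor j) * v_j  (straightforward Hadamard matrix-vector product)
    return [gf_mul(h[i ^ 0], v[0]) ^ gf_mul(h[i ^ 1], v[1]) ^
            gf_mul(h[i ^ 2], v[2]) ^ gf_mul(h[i ^ 3], v[3]) for i in range(4)]

E, V = (0x1, 0x2, 0x8, 0xA), (0x3, 0x7, 0x9, 0xE)   # E from the example above, V an arbitrary test vector
assert hmvp4(E, V) == direct4(E, V)

The sketch uses 12 constant multiplications instead of 16, mirroring the operation count derived above.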


Here we give a new observation of HMVP on the XOR count and define two new metrics:

• The number of XOR gates required by directly applying the HMVP formulae is denoted as D_H(H).

• The number of XOR gates D_H(H) minus the repeated XOR gates is denoted as S_H(H).

From the expression of R_4 we can obtain the XOR count of a 4 × 4 Hadamard matrix by applying the HMVP formulae over GF(2^c):

    D_H(H_{4,4}) = 4(X(h_02) + X(h_13)) + 2(X(h_0) + X(h_1)) + 12c.    (5)

Look back at the example E = had(0x1,0x2,0x8,0xa). If we apply the HMVP formulae on E, h_02 = 0x1 ⊕ 0x8 = 0x9 and h_13 = 0x2 ⊕ 0xa = 0x8, so X(h_02) = 1 and X(h_13) = 3. We could compute

    D_H(E) = 4 × (1 + 3) + 2 × (0 + 1) + 12 × 4 = 66.

The implementation of the HMVP-based matrix-vector product is shown in Section A.1. There are 8 shared XOR gates in total, and hence

    S_H(E) = 66 − 8 = 58.

3.3. Application of HMVP to 8 × 8 Hadamard Matrices

In this section, we apply the HMVP formulae to 8 × 8 Hadamard matrices. According to Lemma 1, H_{8,8} is in the form of

    H_{8,8} = had(h_0, …, h_7) = [ H_1 H_0 ]
                                 [ H_0 H_1 ],

where H_1 = had(h_0, h_1, h_2, h_3) and H_0 = had(h_4, h_5, h_6, h_7). The vector V is in the form of V = (V_0; V_1), where V_0 = (v_0, v_1, v_2, v_3)^⊤ and V_1 = (v_4, v_5, v_6, v_7)^⊤.

By applying the HMVP formulae, the three corresponding HMVPs of H_{8,8}V, i.e., P_0, P_1 and P_2, need to be computed. Firstly, P_0 can be computed by

    P_0 = (H_0 + H_1)V_1 = [ h_04 h_15 h_26 h_37 ] [ v_4 ]
                           [ h_15 h_04 h_37 h_26 ] [ v_5 ]
                           [ h_26 h_37 h_04 h_15 ] [ v_6 ]
                           [ h_37 h_26 h_15 h_04 ] [ v_7 ].

According to Lemma 2, H_0 + H_1 is also a Hadamard matrix, and hence the HMVP formulae can be applied for computing the matrix-vector product P_0 = (H_0 + H_1)V_1. Rewrite H_0 + H_1 as

    H_0 + H_1 = [ H_1^{P_0} H_0^{P_0} ]
                [ H_0^{P_0} H_1^{P_0} ],

where H_0^{P_0} = had(h_26, h_37) and H_1^{P_0} = had(h_04, h_15). Let V^{P_0} be

    V^{P_0} = [ V_0^{P_0} ]
              [ V_1^{P_0} ],

where V_0^{P_0} = (v_4, v_5)^⊤ and V_1^{P_0} = (v_6, v_7)^⊤. Then P_0 can be computed by

    P_0 = [ H_1^{P_0} H_0^{P_0} ] [ V_0^{P_0} ].
          [ H_0^{P_0} H_1^{P_0} ] [ V_1^{P_0} ]

Denote the three corresponding HMVPs of P_0 as A_0, A_1, A_2, which can be computed according to (3), i.e.,

    A_0 = (H_0^{P_0} + H_1^{P_0})V_1^{P_0} = [ h_0246 v_6 + h_1357 v_7 ],
                                             [ h_1357 v_6 + h_0246 v_7 ]

    A_1 = (H_1^{P_0} + H_0^{P_0})V_0^{P_0} = [ h_0246 v_4 + h_1357 v_5 ],
                                             [ h_1357 v_4 + h_0246 v_5 ]

    A_2 = H_1^{P_0}(V_0^{P_0} + V_1^{P_0}) = [ h_04 v_46 + h_15 v_57 ].
                                             [ h_15 v_46 + h_04 v_57 ]

Then, P_0 can be computed accordingly by

    P_0 = [ A_0 + A_2 ] = [ h_0246 v_6 + h_1357 v_7 + h_04 v_46 + h_15 v_57 ]
          [ A_1 + A_2 ]   [ h_1357 v_6 + h_0246 v_7 + h_15 v_46 + h_04 v_57 ]
                          [ h_0246 v_4 + h_1357 v_5 + h_04 v_46 + h_15 v_57 ]
                          [ h_1357 v_4 + h_0246 v_5 + h_15 v_46 + h_04 v_57 ].

By a similar derivation process, B_0 = (H_0^{P_1} + H_1^{P_1})V_1^{P_1}, B_1 = (H_1^{P_1} + H_0^{P_1})V_0^{P_1}, B_2 = H_1^{P_1}(V_0^{P_1} + V_1^{P_1}), C_0 = (H_0^{P_2} + H_1^{P_2})V_1^{P_2}, C_1 = (H_1^{P_2} + H_0^{P_2})V_0^{P_2}, C_2 = H_1^{P_2}(V_0^{P_2} + V_1^{P_2}), and the other two HMVPs of H_{8,8}V, i.e., P_1 and P_2, are

    P_1 = [ B_0 + B_2 ] = [ h_0246 v_2 + h_1357 v_3 + h_04 v_02 + h_15 v_13 ]
          [ B_1 + B_2 ]   [ h_1357 v_2 + h_0246 v_3 + h_15 v_02 + h_04 v_13 ]
                          [ h_0246 v_0 + h_1357 v_1 + h_04 v_02 + h_15 v_13 ]
                          [ h_1357 v_0 + h_0246 v_1 + h_15 v_02 + h_04 v_13 ]

and

    P_2 = [ C_0 + C_2 ] = [ h_02 v_26 + h_13 v_37 + h_0 v_0246 + h_1 v_1357 ]
          [ C_1 + C_2 ]   [ h_13 v_26 + h_02 v_37 + h_1 v_0246 + h_0 v_1357 ]
                          [ h_02 v_04 + h_13 v_15 + h_0 v_0246 + h_1 v_1357 ]
                          [ h_13 v_04 + h_02 v_15 + h_1 v_0246 + h_0 v_1357 ].


Finally the matrix-vector product R_8 = H_{8,8}V is

    R_8 = [ P_0 + P_2 ] = [ A_0 + A_2 + C_0 + C_2 ]
          [ P_1 + P_2 ]   [ A_1 + A_2 + C_1 + C_2 ]
                          [ B_0 + B_2 + C_0 + C_2 ]
                          [ B_1 + B_2 + C_1 + C_2 ].

A direct calculation of the 8 × 8 HMVP needs 56 addition operations and 64 multiplication operations. By using the above HMVP formulae, R_8 consists of P_0 + P_2 and P_1 + P_2; P_0, P_1, P_2 each cost 10 addition operations, and P_2 comes for free. P_0, P_1, P_2 are 4 × 1 vectors, so P_0 + P_2 and P_1 + P_2 each cost 4 addition operations. 10 addition operations are needed for v_02, v_04, v_13, v_15, v_26, v_46, v_37, v_57, v_0246, v_1357. So the total number of addition operations is 10 × 3 + 4 × 2 + 10 = 48. Meanwhile, in R_8, P_0, P_1, P_2 each cost 12 multiplication operations, so R_8 needs 36 multiplication operations in total. In summary, we economize 8 addition operations and 28 multiplication operations.

Here, by counting the frequency of elements in R_8 over GF(2^c), we generalize

    D_H(H_{8,8}) = 8(X(h_0246) + X(h_1357)) + 2(X(h_0) + X(h_1)) + 4(X(h_04) + X(h_15) + X(h_02) + X(h_13)) + 48c.    (6)

3.4. Searching for Involutory MDS Matrices

If a Hadamard matrix H_{k,k} is an involutory MDS matrix over GF(2^c)/p(x), the following three constraints hold [8]:

• h_i ≠ h_j for i ≠ j, where 0 ≤ i, j ≤ k − 1;
• h_i ≠ 0, where 0 ≤ i ≤ k − 1;
• Σ_{i=0}^{k−1} h_i = 1.

We use these involutory constraints to reduce the search space of the Hadamard MDS matrices. In this part we introduce a bottom-up construction strategy for involutory MDS matrices. Combined with the HMVP formulae, we identify an exhaustive search strategy according to (5) and (6).

3.4.1. Construct 4 × 4 Involutory MDS Matrices

According to (5), which gives the XOR count of a 4 × 4 Hadamard matrix by applying HMVP, the XOR count can be minimized if X(h_02) + X(h_13) and X(h_0) + X(h_1) can be minimized. So we can combine lightweight elements into a group and assign these elements to h_02, h_0, h_1. Firstly, we compute the XOR count of each possible element and choose the s most lightweight elements as the candidate set S, so the search space over GF(2^c) is determined by s. As there is no zero element in a Hadamard matrix, s ≤ 2^c − 1. To construct a 4 × 4 involutory Hadamard matrix, we search h_0, h_1, h_02 from S. Due to the constraint h_0 + h_1 + h_2 + h_3 = 1 for involution, h_13 = h_02 + 1, and hence h_3 = h_13 + h_1 = h_02 + 1 + h_1. Combined with h_2 = h_02 + h_0, all elements of the Hadamard matrix can be determined. The whole construction algorithm is shown in Algorithm 1.

Algorithm 1 Construct 4 × 4 Involutory MDS Matrices.
Input: The finite field GF(2^c)/p(x), and the candidate set S with the corresponding size s.
Output: A series of 4 × 4 involutions over GF(2^c)/p(x).
1: for each h_0, h_1, h_02 from S do
2:   h_2 ← h_02 + h_0
3:   h_13 ← h_02 + 1
4:   h_3 ← h_13 + h_1
5:   H_{4,4} ← had(h_0, h_1, h_2, h_3)
6:   if H_{4,4} is an involutory MDS matrix then
7:     Save H_{4,4} and calculate D_H(H_{4,4}) and S_H(H_{4,4}).
8:   end if
9: end for

The complexity of Algorithm 1 is

    T_1 = O(s^3).

If we search all matrices over GF(2^4), the total space is 15^3 = 3,375 ≈ 2^{11.7} matrices. For the finite field GF(2^8)/p(x), there are 255 non-zero elements, so the search space is 255^3 = 16,581,375. The whole search space of 4 × 4 involutory matrices is within the computing ability of a single PC.

3.4.2. Construct 8 × 8 Involutory MDS Matrices

Continuing with the same method, according to (6), which gives the XOR count of the 8 × 8 HMVP, the XOR count can be minimized if X(h_0246) + X(h_1357), X(h_0) + X(h_1) and X(h_04) + X(h_15) + X(h_02) + X(h_13) can be minimized. Due to the constraint Σ_{i=0}^{7} h_i = 1, we have h_1357 = h_0246 + 1, so we should search only h_0, h_1, h_02, h_13, h_04, h_15, h_0246 from S, and all elements of H_{8,8} can be determined as h_2 = h_02 + h_0, h_3 = h_13 + h_1, h_4 = h_04 + h_0, h_5 = h_15 + h_1, h_6 = h_0246 + h_02 + h_4, h_7 = h_1357 + h_13 + h_5 = h_0246 + 1 + h_13 + h_5. The whole construction algorithm is shown in Algorithm 2. The complexity of Algorithm 2 is

    T_2 = O(s^7).

For the finite field GF(2^4), the whole search space is 15^7 = 170,859,375. For the finite field GF(2^8), the whole search space is 255^7 = 70,110,209,207,109,375. In order to reduce the search space, s can be smaller, i.e., S can be initialized with the top most lightweight elements, which will be detailed in Sec. 4.

3.5. Searching for Non-involutory MDS Matrices

For 4 × 4 non-involutions, we should search h_0, h_1, h_02, h_13 from S. Elements h_2 and h_3 can be determined as follows: h_2 = h_02 + h_0 and h_3 = h_13 + h_1. The whole construction algorithm for 4 × 4 non-involutory MDS matrices is similar to Algorithm 1. The complexity is O(s^4).
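The 4 × 4 search of Algorithm 1 is easy to prototype. The sketch below is our own illustration (not the authors' code), written for GF(2^4)/0x13: it enumerates h_0, h_1, h_02 over a candidate set, derives the remaining entries from the involution constraint, and keeps the matrices whose square submatrices are all nonsingular. For a Hadamard matrix over a field of characteristic 2, the constraint Σ h_i = 1 already forces H^2 = I, so only the MDS property has to be tested; Table 3 reports 1512 such involutions per field for the full search with s = 15.

from itertools import combinations, product

P, C = 0x13, 4

def gf_mul(a, b):
    r = 0
    for i in range(C):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * C - 2, C - 1, -1):
        if (r >> i) & 1:
            r ^= P << (i - C)
    return r

def gf_inv(a):
    return next(x for x in range(1, 1 << C) if gf_mul(a, x) == 1)

def det(m):
    # determinant over GF(2^4) by Gaussian elimination; 0 means singular
    m = [row[:] for row in m]
    n, d = len(m), 1
    for i in range(n):
        piv = next((r for r in range(i, n) if m[r][i]), None)
        if piv is None:
            return 0
        m[i], m[piv] = m[piv], m[i]
        d = gf_mul(d, m[i][i])
        inv = gf_inv(m[i][i])
        for r in range(i + 1, n):
            f = gf_mul(m[r][i], inv)
            m[r] = [m[r][j] ^ gf_mul(f, m[i][j]) for j in range(n)]
    return d

def is_mds(m):
    # MDS criterion of Section 2.1: every square submatrix is nonsingular
    n = len(m)
    return all(det([[m[r][c] for c in cols] for r in rows]) != 0
               for k in range(1, n + 1)
               for rows in combinations(range(n), k)
               for cols in combinations(range(n), k))

S = range(1, 16)                          # full candidate set, s = 15
involutions = []
for h0, h1, h02 in product(S, repeat=3):
    h13 = h02 ^ 1                         # involution constraint: h0 + h1 + h2 + h3 = 1
    h2, h3 = h02 ^ h0, h13 ^ h1
    hs = (h0, h1, h2, h3)
    if 0 in hs or len(set(hs)) < 4:
        continue
    H = [[hs[i ^ j] for j in range(4)] for i in range(4)]
    if is_mds(H):
        involutions.append(hs)
print(len(involutions))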


Algorithm 2 Construct 8 × 8 Involutory MDS Matrices.
Input: The finite field GF(2^c)/p(x), and the candidate set S with the corresponding size s.
Output: A series of 8 × 8 involutions over GF(2^c)/p(x).
1: for each h_0, h_1, h_02, h_04, h_13, h_15, h_0246 from S do
2:   h_2 ← h_02 + h_0
3:   h_3 ← h_13 + h_1
4:   h_4 ← h_04 + h_0
5:   h_5 ← h_15 + h_1
6:   h_6 ← h_0246 + h_0 + h_2 + h_4
7:   h_1357 ← h_0246 + 1
8:   h_7 ← h_1357 + h_1 + h_3 + h_5
9:   H_{8,8} ← had(h_0, h_1, h_2, h_3, h_4, h_5, h_6, h_7)
10:  if H_{8,8} is an involutory MDS matrix then
11:    Save H_{8,8} and calculate D_H(H_{8,8}) and S_H(H_{8,8}).
12:  end if
13: end for

For 8 × 8 non-involutions, we should search h_0, h_1, h_02, h_13, h_04, h_15, h_0246, h_1357 from S. All elements can be determined as h_2 = h_02 + h_0, h_3 = h_13 + h_1, h_4 = h_04 + h_0, h_5 = h_15 + h_1, h_6 = h_0246 + h_02 + h_4, h_7 = h_1357 + h_13 + h_5. The whole construction algorithm is similar to Algorithm 2. The complexity is O(s^8).

4. Experiment results

We construct MDS matrices over all three finite fields (GF(2^4)/0x13, GF(2^4)/0x1f, GF(2^4)/0x19).

According to the analysis in Sec. 3.4.1 and Sec. 3.5, the total search spaces for 4 × 4 involutory and non-involutory matrices are s^3 and s^4 respectively, so the full search needs 15^3 ≈ 2^{11.7} and 15^4 ≈ 2^{15.6} trials; we therefore set s = 15 for 4 × 4 matrices and search the full space.

For 8 × 8 matrices, the total search spaces are s^7 and s^8 for involutory and non-involutory matrices, according to Sec. 3.4.2 and Sec. 3.5 respectively. In order to reduce the search space, we fix h_0 = 1 and h_1 = 2, so that the search spaces for involutory and non-involutory matrices become s^5 and s^6 respectively. For s = 15, the search space is 15^5 ≈ 2^{19.53} and 15^6 ≈ 2^{23.44}, which is within the computing ability of a single PC. There are lightweight non-involutions, while there is no such involution with this setting. We fix s = 10 for constructing 8 × 8 involutions to select an appropriate value for h_0246. By heuristic attempts, we finally fix h_0246 = 0xa and go through h_0 and h_1. In order to further reduce the search space, the size s is reduced to 10, i.e., S contains the top 10 most lightweight elements. Since the values of h_0246 and h_1357 are specified, the search space for constructing 8 × 8 involutions is 10^6 ≈ 2^{19.93}.

All experimental parameters and results are shown in Table 3. Num. records the number of matrices found, and Max. and Min. are the maximum and minimum XOR count D_H(H) respectively. The detailed information of the matrices is as follows.

• 4 × 4 involutions: We find 1512 involutory MDS matrices, of which the smallest D_H(H) is 66. There are 8 MDS matrices with XOR count 66 by applying the HMVP formulae, of which 4 matrices are over GF(2^4)/0x13 and 4 matrices are over GF(2^4)/0x19. The matrices over GF(2^4)/0x13 are

    had(0x1,0x2,0x9,0xb), had(0x1,0x2,0x8,0xa), had(0x2,0x1,0xa,0x8), had(0x2,0x1,0xb,0x9).

  The matrices over GF(2^4)/0x19 are

    had(0x1,0xc,0x3,0xf), had(0x1,0xc,0x2,0xe), had(0xc,0x1,0xe,0x2), had(0xc,0x1,0xf,0x3).

  We choose had(0x1,0x2,0x8,0xa) over GF(2^4)/0x13 as the 4 × 4 involutory candidate I_4, and hence we can obtain S_H(I_4) = 58 according to the implementation in Section A.1.

• 4 × 4 non-involutions: We find 21168 non-involutory MDS matrices, of which the smallest D_H(H) is 56. There are 16 MDS matrices with D_H(H) = 56 by applying the HMVP formulae, of which 8 matrices are over GF(2^4)/0x13 and 8 matrices are over GF(2^4)/0x19. The matrices over GF(2^4)/0x13 are

    had(0x1,0x4,0x3,0x5), had(0x1,0x4,0x8,0x5), had(0x2,0x9,0x3,0xb), had(0x2,0x9,0xb,0x8),
    had(0x4,0x1,0x5,0x3), had(0x4,0x1,0x5,0x8), had(0x9,0x2,0x8,0xb), had(0x9,0x2,0xb,0x3).

  The matrices over GF(2^4)/0x19 are

    had(0x1,0x6,0x3,0x7), had(0x1,0x6,0xd,0x7), had(0x2,0xc,0x3,0xe), had(0x2,0xc,0xe,0xd),
    had(0x6,0x1,0x7,0x3), had(0x6,0x1,0x7,0xd), had(0xc,0x2,0xd,0xe), had(0xc,0x2,0xe,0x3).

  We choose had(0x1,0x4,0x3,0x5) over GF(2^4)/0x13 as the 4 × 4 non-involutory MDS candidate N_4, where we find 4 shared XOR gates, such that S_H(N_4) = 56 − 4 = 52. The detailed matrix-vector product is shown in Section A.2.

• 8 × 8 involutions: We find 22 8 × 8 involutions in total. There are 2 matrices over GF(2^4)/0x19 that cost 344 XOR gates, which is the smallest D_H(H) among them. They are:

    had(0xc,0x5,0xd,0x6,0x8,0x7,0x3,0xf), had(0xc,0x5,0x8,0x7,0xd,0x6,0x3,0xf).

  We choose had(0xc,0x5,0x8,0x7,0xd,0x6,0x3,0xf) as the 8 × 8 involutory candidate I_8. The corresponding matrix-vector product is shown in Section A.3. We find 62 shared XOR gates, such that S_H(I_8) = 344 − 62 = 282.


Table 3
Input parameters and experiment results

Type           | k | c | s  | GF(2^4)/0x13          | GF(2^4)/0x1f          | GF(2^4)/0x19
               |   |   |    | Num.   Max.  Min.     | Num.   Max.  Min.     | Num.   Max.  Min.
Involution     | 4 | 4 | 15 | 1512   138   66       | 1512   120   86       | 1512   138   66
Non-involution | 4 | 4 | 15 | 21168  148   56       | 21168  120   66       | 21168  148   56
Involution     | 8 | 4 | 10 | 4      496   362      | 12     390   376      | 6      376   344
Non-involution | 8 | 4 | 15 | 96     410   258      | 96     382   290      | 96     406   266

• 8 × 8 non-involutions: We find 96 8 × 8 non-involutions. 2 matrices over GF(2^4)/0x13 have the smallest XOR count D_H(H) = 258:

    had(0x1,0x2,0x5,0xe,0x7,0x3,0xa,0xd), had(0x1,0x2,0x7,0x3,0x5,0xe,0xa,0xd).

  We choose had(0x1,0x2,0x5,0xe,0x7,0x3,0xa,0xd) as the 8 × 8 non-involutory candidate N_8. The corresponding matrix-vector product is shown in Section A.4. We find 30 shared XOR gates, such that S_H(N_8) = 258 − 30 = 228.

We compare our results with previous work in Table 4. All the candidates we choose are more lightweight than the previous MDS matrices.

Table 4
Comparisons of involutory and non-involutory MDS matrices

Field        | k | Direct XOR count | Reduced XOR count | Inv. | Ref.
GF(2^4)/0x13 | 4 | 72               | 68                | ✓    | [23]
GF(2^4)/0x13 | 4 | 64               | 64                | ✓    | [33]
GF(2^4)/0x13 | 4 | 68               | 63                | ✓    | [24]
GF(2^4)/0x13 | 4 | 66               | 58                | ✓    | I_4
GF(2^4)/0x19 | 4 | 58               | 58                |      | [33]
GF(2^4)/0x13 | 4 | 58               | 58                |      | [24]
GF(2^4)/0x19 | 4 | 56               | 52                |      | N_4
GF(2^4)/0x13 | 8 | 512              | 424               | ✓    | [37],[24]
GF(2^4)/0x19 | 8 | 344              | 282               | ✓    | I_8
GF(2^4)/0x13 | 8 | 512              | 408               |      | [29]
GF(2^4)/0x13 | 8 | 448              | 392               |      | [29]
GF(2^4)/0x13 | 8 | 432              | 384               |      | [24]
GF(2^4)/0x13 | 8 | 258              | 228               |      | N_8
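The metric values reported for the chosen candidates can be re-derived from (5) and (6). The following Python sketch is our own check (not the authors' code); it reproduces D_H(I_4) = 66 over GF(2^4)/0x13 and D_H(I_8) = 344 over GF(2^4)/0x19.

def gf_mul(a, b, p, c=4):
    r = 0
    for i in range(c):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * c - 2, c - 1, -1):
        if (r >> i) & 1:
            r ^= p << (i - c)
    return r

def X(h, p, c=4):
    # element XOR count of Section 2.4
    return sum(bin(gf_mul(h, 1 << j, p, c)).count("1") for j in range(c)) - c

def DH4(h, p, c=4):
    # metric (5) for had(h0,h1,h2,h3)
    h0, h1, h2, h3 = h
    return 4 * (X(h0 ^ h2, p) + X(h1 ^ h3, p)) + 2 * (X(h0, p) + X(h1, p)) + 12 * c

def DH8(h, p, c=4):
    # metric (6) for had(h0,...,h7)
    h0, h1, h2, h3, h4, h5, h6, h7 = h
    return (8 * (X(h0 ^ h2 ^ h4 ^ h6, p) + X(h1 ^ h3 ^ h5 ^ h7, p))
            + 2 * (X(h0, p) + X(h1, p))
            + 4 * (X(h0 ^ h4, p) + X(h1 ^ h5, p) + X(h0 ^ h2, p) + X(h1 ^ h3, p))
            + 48 * c)

print(DH4((0x1, 0x2, 0x8, 0xA), 0x13))                        # 66 = D_H(I_4)
print(DH8((0xC, 0x5, 0x8, 0x7, 0xD, 0x6, 0x3, 0xF), 0x19))    # 344 = D_H(I_8)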

5. Conclusion

In this paper we propose efficient algorithms for constructing lightweight involutory and non-involutory MDS matrices. We find 4 × 4 and 8 × 8 lightweight MDS matrices with the fewest XOR counts over GF(2^4) known so far. In this paper, we focus on the construction of lightweight MDS Hadamard matrices over GF(2^4). Our algorithms can be extended to construct MDS matrices over bigger finite fields such as GF(2^8), which will be our future work.

Acknowledgement

This work was supported by the National Key Research and Development Program of China (No. 2018YFA0704701).

A. Matrix-vector product of the candidates

A.1. Matrix-vector product of I_4

    P_0 = [ (v_2^0 v_3^0 v_3^3)x^3 + (v_2^3 v_3^3 v_3^2)x^2 + (v_2^2 v_3^2 v_3^1)x + (v_2^1 v_3^1 v_2^0) ]
          [ (v_2^0 v_3^0 v_2^3)x^3 + (v_2^3 v_3^3 v_2^2)x^2 + (v_2^2 v_3^2 v_2^1)x + (v_2^1 v_3^1 v_3^0) ],

    P_1 = [ (v_0^0 v_1^0 v_1^3)x^3 + (v_0^3 v_1^3 v_1^2)x^2 + (v_0^2 v_1^2 v_1^1)x + (v_0^1 v_1^1 v_0^0) ]
          [ (v_0^0 v_1^0 v_0^3)x^3 + (v_0^3 v_1^3 v_0^2)x^2 + (v_0^2 v_1^2 v_0^1)x + (v_0^1 v_1^1 v_1^0) ],

    P_2 = [ (v_02^3 v_13^2)x^3 + (v_02^2 v_13^1)x^2 + (v_02^1 v_13^0 v_13^3)x + (v_02^0 v_13^3) ]
          [ (v_02^2 v_13^3)x^3 + (v_02^1 v_13^2)x^2 + (v_02^0 v_02^3 v_13^1)x + (v_02^3 v_13^0) ].

A.2. Matrix-vector product of N_4

    P_0 = [ (v_2^2 v_3^3)x^3 + (v_2^1 v_3^2)x^2 + (v_2^0 v_2^3 v_3^1)x + (v_2^3 v_3^0) ]
          [ (v_2^3 v_3^2)x^3 + (v_2^2 v_3^1)x^2 + (v_2^1 v_3^0 v_3^3)x + (v_2^0 v_3^3) ],

    P_1 = [ (v_0^2 v_1^3)x^3 + (v_0^1 v_1^2)x^2 + (v_0^0 v_0^3 v_1^1)x + (v_0^3 v_1^0) ]
          [ (v_0^3 v_1^2)x^3 + (v_0^2 v_1^1)x^2 + (v_0^1 v_1^0 v_1^3)x + (v_0^0 v_1^3) ],

    P_2 = [ (v_02^3 v_13^1)x^3 + (v_02^2 v_13^0 v_13^3)x^2 + (v_02^1 v_13^3 v_13^2)x + (v_02^0 v_13^2) ]
          [ (v_02^1 v_13^3)x^3 + (v_02^0 v_13^2 v_02^3)x^2 + (v_02^2 v_02^3 v_13^1)x + (v_02^2 v_13^0) ].

A.3. Matrix-vector product of I_8

    A_0 = [ (v_6^0 v_6^1 v_7^0 v_7^1 v_7^3)x^3 + (v_6^1 v_7^1 v_6^3 v_7^2 v_7^3)x^2
            + (v_6^0 v_7^0 v_6^2 v_6^3 v_7^3 v_7^1 v_7^2)x + (v_6^1 v_7^1 v_7^0 v_6^2 v_7^2) ]
          [ (v_6^0 v_7^0 v_6^3 v_6^1 v_7^1)x^3 + (v_6^1 v_7^1 v_6^2 v_6^3 v_7^3)x^2
            + (v_6^0 v_7^0 v_6^1 (v_6^2 v_7^2)(v_6^3 v_7^3))x + (v_6^1 v_7^1 v_6^0 v_6^2 v_7^2) ],


⎛ ⎞ ⎛ (푣0 푣1 푣2 )푥3 + (푣0 푣3 푣0 푣2 )푥2+ ⎞ 푣0푣1푣0푣1푣3 푥3 푣1푣3푣1푣2푣3 푥2 ⎜ 0246 1357 1357 0246 0246 1357 1357 ⎟ ⎜ ( 4 4 5 5 5) + ( 4 4 5 5 5) + ⎟ ⎜ 2 1 3 1 0 2 3 ⎟ ⎜ 0 2 3 0 1 2 3 1 0 1 2 2 ⎟ (푣 푣 푣 )푥 + (푣 푣 푣 푣 ) ⎜ (푣 푣 푣 푣 푣 푣 푣 )푥 + (푣 (푣 푣 )(푣 푣 )) ⎟ 퐶 ⎜ 0246 1357 1357 0246 1357 1357 1357 ⎟ . 퐴 4 4 4 5 5 5 5 4 5 5 4 5 , 2 = ⎜ ⎟ 1 = ⎜ ⎟ 푣1 푣2 푣0 푥3 푣0 푣2 푣0 푣3 푥2 푣0푣1푣3푣0푣1 푥3 푣1푣2푣3푣1푣3 푥2 ⎜ ( 0246 0246 1357) + ( 0246 0246 1357 1357) + ⎟ ⎜ ( 4 4 4 5 5) + ( 4 4 4 5 5) + ⎟ ⎜ ⎟ ⎜ 1 3 2 0 2 3 1 ⎟ ⎝ (푣 푣 푣 )푥 + (푣 푣 푣 푣 ) ⎠ ⎜ (푣0푣1푣0푣2푣3푣2푣3)푥 + (푣0푣1푣1푣2푣2) ⎟ 0246 0246 1357 0246 0246 0246 1357 ⎝ 4 4 5 4 4 5 5 4 4 5 4 5 ⎠ 푁 A.4. Matrix-vector product of 8

⎛ 푣3 푣2 푥3 푣2 푣1 푣2 푥2 푣1 푣0 푣1 푥 푣0 푣0 푣3 ⎞ ⎛ ⎞ ( ) +( ) +( ) +( ) (푣0푣2)푥3 + (푣3푣1)푥2 + (푣2푣0푣3)푥 + (푣0푣1푣3) 퐴 = ⎜ 46 57 46 57 57 46 57 57 46 57 57 ⎟ , ⎜ 6 7 6 7 6 7 7 6 6 7 ⎟ 2 ⎜ 2 3 3 1 2 2 2 0 1 1 0 0 3 ⎟ 퐴 = , ⎝ (푣 푣 )푥 +(푣 푣 푣 )푥 +(푣 푣 푣 )푥+(푣 푣 푣 ) ⎠ 0 ⎜ 푣2푣0 푥3 푣1푣3 푥2 푣0푣2푣3 푥 푣3푣1푣0 ⎟ 46 57 46 46 57 46 46 57 46 57 46 ⎝ ( 6 7) + ( 6 7) + ( 6 7 6) + ( 6 7 7) ⎠

⎛ 0 2 3 3 1 2 2 0 3 0 1 3 ⎞ ⎛ 푣0푣1푣0푣1푣3 푥3 푣1푣3푣1푣3푣2 푥2 ⎞ (푣 푣 )푥 + (푣 푣 )푥 + (푣 푣 푣 )푥 + (푣 푣 푣 ) ⎜ ( ) + ( ) + ⎟ 퐴 ⎜ 4 5 4 5 4 5 5 4 4 5 ⎟ , 2 2 3 3 3 2 2 3 3 3 1 = ⎜ ⎟ ⎜ (푣2푣0)푥3 + (푣1푣3)푥2 + (푣0푣2푣3)푥 + (푣3푣1푣0) ⎟ ⎜ (푣0푣2푣3푣0푣1푣3푣2)푥 + (푣1푣2푣0푣2푣1) ⎟ ⎝ 4 5 4 5 4 5 4 4 5 5 ⎠ 퐵 2 2 2 3 3 3 3 2 2 3 3 3 , 0 = ⎜ ⎟ ⎜ (푣0푣1푣3푣0푣1)푥3 + (푣1(푣2푣3)(푣1푣3))푥2+ ⎟ ⎜ 2 2 2 3 3 2 2 2 3 3 ⎟ ⎛ (푣1 푣2 푣3 )푥3 + (푣0 푣1 푣3 푣2 )푥2+ ⎞ ⎜ 46 46 57 46 46 46 57 ⎟ ⎜ ((푣0푣1)(푣2푣3)(푣0푣2)푣3)푥 + (푣0푣1푣2푣2푣1) ⎟ ⎝ 2 2 2 2 3 3 3 2 2 3 2 3 ⎠ ⎜ 푣0 푣2 푣1 푥 푣2 푣3 푣0 ⎟ ( 46 46 57) + ( 46 46 57) 퐴 = ⎜ ⎟ , 2 ⎜ (푣3 푣2 푣1 )푥3 + (푣2 푣3 푣0 푣1 )푥2+ ⎟ ⎜ 46 57 57 46 57 57 57 ⎟ ⎛ ⎞ (푣0푣1푣0푣1푣3)푥3 + (푣1푣3푣1푣3푣2)푥2+ ⎜ 푣1 푣2 푣0 푥 푣0 푣2 푣3 ⎟ ⎜ 0 0 1 1 1 0 0 1 1 1 ⎟ ⎝ ( 46 57 57) + ( 46 57 57) ⎠ ⎜ ⎟ ⎜ (푣0푣2푣3푣0푣1푣3푣2)푥 + (푣1푣2푣1푣0푣2) ⎟ 퐵 0 0 0 1 1 1 1 0 0 1 1 1 , 1 = ⎜ ⎟ ⎛ 0 2 3 3 1 2 2 0 3 0 1 3 ⎞ 0 1 3 0 1 3 2 3 1 1 3 2 (푣 푣 )푥 + (푣 푣 )푥 + (푣 푣 푣 )푥 + (푣 푣 푣 ) ⎜ (푣 푣 푣 푣 푣 )푥 + (푣 푣 푣 푣 푣 )푥 + ⎟ 퐵 ⎜ 2 3 2 3 2 3 3 2 2 3 ⎟ , ⎜ 0 0 0 1 1 0 0 0 1 1 ⎟ 0 = ⎜ (푣2푣0)푥3 + (푣1푣3)푥2 + (푣0푣2푣3)푥 + (푣3푣1푣0) ⎟ ⎜ ((푣0푣1)(푣2푣3)푣3푣0푣2)푥 + (푣0푣1푣2푣2푣1) ⎟ ⎝ 2 3 2 3 2 3 2 2 3 3 ⎠ ⎝ 0 0 0 0 1 1 1 0 0 1 0 1 ⎠

⎛ ⎞ (푣0푣2)푥3 + (푣3푣1)푥2 + (푣2푣0푣3)푥 + (푣0푣1푣3) ⎜ 0 1 0 1 0 1 1 0 0 1 ⎟ ⎛ 3 2 3 2 1 2 2 ⎞ 퐵 = , 푣 푣 푥 푣 푣 푣 푥 1 ⎜ 2 0 1 3 0 2 3 3 1 0 ⎟ ( 02 13) + ( 02 13 13) + 푣 푣 푥3 푣 푣 푥2 푣 푣 푣 푥 푣 푣 푣 ⎜ ⎟ ⎝ ( 0 1) + ( 0 1) + ( 0 1 0) + ( 0 1 1) ⎠ ⎜ 푣1 푣0 푣1 푥 푣0 푣0 푣3 ⎟ ( 02 13 13) + ( 02 13 13) 퐵 = ⎜ ⎟ , 2 ⎜ 푣2 푣3 푥3 푣1 푣2 푣2 푥2 ⎟ ( ) + ( ) + ⎛ (푣1 푣2 푣3 )푥3 + (푣0 푣1 푣3 푣2 )푥2+ ⎞ ⎜ 02 13 02 02 13 ⎟ ⎜ 02 02 13 02 02 02 13 ⎟ ⎜ ⎟ (푣0 푣1 푣1 )푥 + (푣0 푣0 푣3 ) ⎜ 푣0 푣2 푣1 푥 푣2 푣3 푣0 ⎟ ⎝ 02 02 13 02 13 02 ⎠ ( 02 02 13) + ( 02 02 13) 퐵 = ⎜ ⎟ , 2 ⎜ (푣3 푣1 푣2 )푥3 + (푣2 푣1 푣0 푣3 )푥2+ ⎟ ⎜ 02 13 13 02 13 13 13 ⎟ ⎛ ⎞ ⎜ 푣1 푣2 푣0 푥 푣0 푣2 푣3 ⎟ 푣1 푣2 푣3 푣2 푣3 푥3 푣0 푣1 푥2 ⎝ ( 02 13 13) + ( 02 13 13) ⎠ ⎜ ( 26 26 26 37 37) + ( 26 37) + ⎟ ⎜ ⎟ 푣3 푣0 푥 푣2 푣3 푣3 ⎜ ( 26 37) + ( 26 26 37) ⎟ 퐶 , ⎛ 1 0 1 3 3 0 3 0 2 2 ⎞ 0 = ⎜ ⎟ (푣 푣 푣 푣 )푥 + (푣 푣 푣 푣 )푥 + ⎜ (푣2 푣3 푣1 푣2 푣3 )푥3 + (푣1 푣0 )푥2+ ⎟ ⎜ 26 37 37 37 26 26 37 37 ⎟ ⎜ 26 26 37 37 37 26 37 ⎟ ⎜ 푣2 푣3 푣1 푣3 푥 푣2 푣1 푣2 ⎟ ( 26 26 37 37) + ( 26 37 37) ⎜ (푣0 푣3 )푥 + (푣3 푣2 푣3 ) ⎟ 퐶 = ⎜ ⎟ , ⎝ 26 37 26 37 37 ⎠ 0 ⎜ (푣0 푣3 푣1 푣1 )푥3 + (푣2 푣0 푣0 푣3 )푥2+ ⎟ ⎜ 26 26 26 37 26 26 37 37 ⎟ ⎜ ⎟ ⎝ (푣1 푣3 푣3 푣2 )푥 + (푣1 푣2 푣2 ) ⎠ ⎛ ⎞ 26 26 37 37 26 26 37 푣1 푣2 푣3 푣2 푣3 푥3 푣0 푣1 푥2 ⎜ ( 04 04 04 15 15) + ( 04 15) + ⎟ ⎜ ⎟ (푣3 푣0 )푥 + (푣2 푣3 푣3 ) ⎛ (푣1 푣0 푣1 푣3 )푥3 + (푣0 푣3 푣0 푣2 )푥2+ ⎞ ⎜ 04 15 04 04 15 ⎟ ⎜ 04 15 15 15 04 04 15 15 ⎟ 퐶 = ⎜ ⎟ , 1 푣2 푣3 푣1 푣2 푣3 푥3 푣1 푣0 푥2 ⎜ (푣2 푣3 푣1 푣3 )푥 + (푣2 푣1 푣2 ) ⎟ ⎜ ( ) + ( ) + ⎟ ⎜ 04 04 15 15 04 15 15 ⎟ ⎜ 04 04 15 15 15 04 15 ⎟ 퐶 = , 1 ⎜ (푣0 푣3 푣1 푣1 )푥3 + (푣2 푣0 푣0 푣3 )푥2+ ⎟ ⎜ (푣0 푣3 )푥 + (푣3 푣2 푣3 ) ⎟ 04 04 04 15 04 04 15 15 ⎝ 04 15 04 15 15 ⎠ ⎜ ⎟ ⎜ 푣1 푣3 푣3 푣2 푥 푣1 푣2 푣2 ⎟ ⎝ ( 04 04 15 15) + ( 04 04 15) ⎠


    C_2 = [ (v_0246^3 v_1357^2)x^3 + (v_0246^2 v_1357^1)x^2 + (v_0246^1 v_1357^0 v_1357^3)x + (v_0246^0 v_1357^3) ]
          [ (v_0246^2 v_1357^3)x^3 + (v_0246^1 v_1357^2)x^2 + (v_0246^0 v_0246^3 v_1357^1)x + (v_0246^3 v_1357^0) ].

References

[1] Steve Babbage and Matthew Dodd. The stream cipher MICKEY 2.0. ECRYPT Stream Cipher Project, 2006.
[2] P. S. L. M. Barreto and Vincent Rijmen. The Khazad legacy-level block cipher. Primitive submitted to NESSIE, 97, 2000.
[3] P. S. L. M. Barreto and Vincent Rijmen. The Anubis block cipher. Submission to the NESSIE Project, 2000.
[4] Gérard M. Baudet, Franco P. Preparata, and Jean E. Vuillemin. Area-time optimal VLSI circuits for convolution. IEEE Transactions on Computers, (7):684–688, 1983.
[5] Christof Beierle, Thorsten Kranz, and Gregor Leander. Lightweight multiplication in GF(2^n) with applications to MDS matrices. International Cryptology Conference, 2016:625–653, 2016.
[6] Daniel J. Bernstein. Multidigit multiplication for mathematicians. Advances in Applied Mathematics, pages 1–19, 2001.
[7] Joan Boyar, Philip Matthews, and René Peralta. On the shortest linear straight-line program for computing linear forms. In International Symposium on Mathematical Foundations of Computer Science, pages 168–179. Springer, 2008.
[8] Ting Cui and Chenhui Jin. Construction of involution Cauchy-Hadamard type MDS matrices. Journal of Electronics & Information Technology, 32:500–503, 2010.
[9] Ting Cui, Chenhui Jin, and Zhiyin Kong. On compact Cauchy matrices for substitution-permutation networks. IEEE Transactions on Computers, 64(7):2098–2102, 2014.
[10] Joan Daemen, Lars Knudsen, and Vincent Rijmen. The block cipher Square. In International Workshop on Fast Software Encryption, 1997.
[11] Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES - the Advanced Encryption Standard. Springer Science & Business Media, 2013.
[12] Watanabe Dai, Soichi Furuya, Hirotaka Yoshida, Kazuo Takaragi, and Bart Preneel. A new keystream generator MUGI. 2002.
[13] Christophe De Cannière. Trivium: A stream cipher construction inspired by block cipher design principles. In International Conference on Information Security, pages 171–186. Springer, 2006.
[14] Haining Fan and M. Anwar Hasan. A new approach to subquadratic space complexity parallel multipliers for extended binary fields. IEEE Transactions on Computers, 56(2):224–233, 2007.
[15] Haining Fan and M. Anwar Hasan. A survey of some recent bit-parallel GF(2^n) multipliers. Finite Fields and Their Applications, 32:5–43, 2015.
[16] Martin Fürer and Kurt Mehlhorn. AT^2-optimal Galois field multiplier for VLSI. In Aegean Workshop on Computing, pages 217–225. Springer, 1986.
[17] Praveen Gauravaram, Lars R. Knudsen, Krystian Matusiewicz, Florian Mendel, Christian Rechberger, Martin Schläffer, and Søren S. Thomsen. Grøstl - a SHA-3 candidate. In Dagstuhl Seminar Proceedings. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2009.
[18] Jian Guo, Thomas Peyrin, and Axel Poschmann. The PHOTON family of lightweight hash functions. In Annual Cryptology Conference, pages 222–239. Springer, 2011.
[19] Jian Guo, Thomas Peyrin, Axel Poschmann, and Matthew J. B. Robshaw. The LED block cipher. Cryptographic Hardware and Embedded Systems, 2012:326–341, 2011.
[20] Kishan Chand Gupta and Indranil Ghosh Ray. On constructions of circulant MDS matrices for lightweight cryptography. In International Conference on Information Security Practice and Experience, pages 564–576. Springer, 2014.
[21] Alper Halbutogullari and Çetin K. Koç. Mastrovito multiplier for general irreducible polynomials. IEEE Transactions on Computers, 49(5):503–518, 2000.
[22] Martin Hell, Thomas Johansson, and Willi Meier. Grain: a stream cipher for constrained environments. IJWMC, 2(1):86–93, 2007.
[23] Jérémy Jean, Ivica Nikolić, and Thomas Peyrin. Joltik v1.3. CAESAR Round 2, 2015.
[24] Jérémy Jean, Thomas Peyrin, Siang Meng Sim, and Jade Tourteaux. Optimizing implementations of lightweight building blocks. IACR Transactions on Symmetric Cryptology, pages 130–168, 2017.
[25] Pascal Junod and Serge Vaudenay. FOX: a new family of block ciphers. Lecture Notes in Computer Science, 3357:131–146, 2004.
[26] Anatolii Karatsuba. Multiplication of multidigit numbers on automata. In Soviet Physics Doklady, volume 7, pages 595–596, 1963.
[27] Khoongming Khoo, Thomas Peyrin, Axel Y. Poschmann, and Huihui Yap. FOAM: searching for hardware-optimal SPN structures and components with a fair comparison. In International Workshop on Cryptographic Hardware and Embedded Systems, pages 433–450. Springer, 2014.
[28] Lukas Kölsch. XOR-counts and lightweight multiplication with fixed elements in binary finite fields. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 285–312. Springer, 2019.
[29] Paulo S. L. M. Barreto and Vincent Rijmen. The Whirlpool hashing function. 2003.
[30] Christophe Negre. Quadrinomial modular arithmetic using modified polynomial basis. In International Conference on Information Technology: Coding and Computing (ITCC'05) - Volume II, volume 1, pages 550–555. IEEE, 2005.
[31] Arash Reyhani-Masoleh and M. Anwarul Hasan. A new construction of Massey-Omura parallel multiplier over GF(2^m). IEEE Transactions on Computers, 51(5):511–520, 2002.
[32] Vincent Rijmen, Joan Daemen, Bart Preneel, Antoon Bosselaers, and Erik De Win. The cipher SHARK. In International Workshop on Fast Software Encryption, 1996.
[33] Sumanta Sarkar and Habeeb Syed. Lightweight diffusion layer: Importance of Toeplitz matrices. IACR Transactions on Symmetric Cryptology, pages 95–113, 2016.
[34] Erkay Savaş, Alexandre F. Tenca, and Çetin K. Koç. A scalable and unified multiplier architecture for finite fields GF(p) and GF(2^m). In International Workshop on Cryptographic Hardware and Embedded Systems, pages 277–292. Springer, 2000.
[35] Bruce Schneier, John Kelsey, Doug Whiting, David Wagner, Chris Hall, and Niels Ferguson. The Twofish encryption algorithm: a 128-bit block cipher. 1999.
[36] Taizo Shirai, Kyoji Shibutani, Toru Akishita, Shiho Moriai, and Tetsu Iwata. The 128-bit blockcipher CLEFIA. In International Conference on Fast Software Encryption, 2007.
[37] Siang Meng Sim, Khoongming Khoo, Frédérique E. Oggier, and Thomas Peyrin. Lightweight MDS involution matrices. Fast Software Encryption, 2015:471–493, 2015.
