Fast Normal Basis Multiplication Using General Purpose Processors
Total Page:16
File Type:pdf, Size:1020Kb
IEEE TRANSACTIONS ON COMPUTERS, VOL. 52, NO. 11, NOVEMBER 2003 1379 Fast Normal Basis Multiplication Using General Purpose Processors Arash Reyhani-Masoleh, Member, IEEE, and M. Anwar Hasan, Senior Member, IEEE Abstract—For cryptographic applications, normal bases have received considerable attention, especially for hardware implementation. In this article, we consider fast software algorithms for normal basis multiplication over the extended binary field GFð2mÞ. We present a vector-level algorithm which essentially eliminates the bit-wise inner products needed in the conventional approach to the normal basis multiplication. We then present another algorithm which significantly reduces the dynamic instruction counts. Both algorithms utilize the full width of the data-path of the general purpose processor on which the software is to be executed. We also consider composite fields and present an algorithm which can provide further speed-ups and an added flexibility toward hardware-software codesign of processors for very large finite fields. Index Terms—Finite field multiplication, normal basis, software algorithms, ECDSA, composite fields. æ 1INTRODUCTION m HE extended binary finite field GFð2 Þ of degree m is to a number of practical considerations. Most importantly, Tused in important cryptographic operations, such as normal basis multiplication algorithms require inner pro- key exchange, signing, and verification. For today’s security ducts or matrix multiplications over the ground field GF(2). applications, the minimum values of m are considered to be Such computations are not directly supported by most of l60 in the elliptic curve cryptography and 1,024 in the today’s general purpose processors. These computations standard discrete log-based cryptography. Elliptic curve require bit-by-bit logical AND and XOR operations, which crypto-systems, which were proposed by Koblitz [15] and are not efficiently implemented using the instruction set Miller [23] independently, use relatively smaller field sizes, supported by the processors. Also, when a high-level but require a considerable amount of field arithmetic for programming language such as C is used, the cyclic shifts each group operation (i.e., addition of two points). In such needed for field squaring operations are not as efficient as crypto-systems, often the most complicated and expensive they are in hardware. module is the finite field arithmetic unit. As a result, it is In this paper, we consider algorithms for fast software important to develop suitable finite field arithmetic algo- normal basis (NB) multiplication on general purpose rithms and architectures that can meet the constraints of processors. We discuss how the conventional bit-level various implementation technologies, such as hardware algorithm for normal basis multiplication fails to utilize and software. the full data-path of the processor and makes its software For cryptographic applications, the most frequently used implementation inefficient. In view of this, a vector-level m GFð2 Þ arithmetic operations are addition and multiplica- normal basis multiplication algorithm is presented which tion. Compared to the former, the latter is a much more eliminates the matrix multiplication over GF(2) and sig- complicated and time-consuming operation. The complex- nificantly reduces the number of dynamic instructions. We ity of GFð2mÞ multiplication very much depends on how then derive another scheme for normal basis multiplication the field elements are represented. For hardware imple- to further improve the speed. We present implementation mentation of a multiplier, the use of normal bases has results of these schemes for the five fields recommended by received considerable attention and a number of hardware NIST for NB multiplication in ECDSA (elliptic curve digital architectures and implementations have been reported (see, signature algorithm) [25], i.e., m ¼ 163, 233, 283, 409, 571. for example, [1], [2], [19], [10], [35]). Unlike hardware, so far For example, it takes 99s and requires only 322 bytes of 163 software implementation of a GFð2mÞ multiplier using memory for a GFð2 Þ multiplication using NB on normal bases has not been that popular. This is mainly due Pentium III 533 MHz PC. We also consider normal basis multiplication over certain special classes of composite fields. We show that normal basis multipliers over such . A. Reyhani-Masoleh is with the Centre for Applied Cryptographic composite fields can provide an additional speed-up. For Research, Department of Combinatorics and Optimization, University of example, it requires 114s and 25 bytes of memory for Waterloo, Waterloo, Ontario, Canada N2L 3G1. 299 E-mail: [email protected]. multiplication over GFð2 Þ. Composite fields also offer a . M.A. Hasan is with the Department of Electrical and Computer great deal of flexibility toward hardware-software codesign Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L of very large finite field processors. 3G1. E-mail: [email protected]. The organization of this article is as follows: In Section 2, Manuscript received 26 July 2001; revised 16 Sept. 2002; accepted 24 Jan. the NB representation and its conventional multiplication 2003. For information on obtaining reprints of this article, please send e-mail to: algorithm are presented. This algorithm has been proposed [email protected], and reference IEEECS Log Number 114607. by NIST for NB multiplication in ECDSA [25] where all 0018-9340/03/$17.00 ß 2003 IEEE Published by the IEEE Computer Society 1380 IEEE TRANSACTIONS ON COMPUTERS, VOL. 52, NO. 11, NOVEMBER 2003 recommended values of m are prime. Then, in Section 3, we referred to [11] for an algorithm with t odd). The case of t even extend NIST’s algorithm to a vector-level multiplication is of particular interest for implementing high speed crypto- algorithm which uses the full width of the data-path of the systems based on Koblitz curves. Such curves with points processor. In Section 4, a more efficient algorithm for over GFð2mÞ exist for m ¼ 163, 233, 283, 409, 571, where NB multiplication is presented. In this section, this algo- normal bases have t even. Let A ¼ða0;a1; ÁÁÁ;amÀ1Þ and B ¼ m rithm is also compared with other algorithms. Then, we ðb0;b1; ÁÁÁ;bmÀ1Þ be the elements of GFð2 Þ, then the ith present an algorithm for multiplication over finite fields coordinate of the product C ¼ AB is computed as [13]: GFð2mÞ with composite values of m in Section 5. Finally, a XpÀ2 few concluding remarks are given in Section 6. ci ¼ aFðnþ1ÞþibFðpÀnÞþi; 0 i m À 1: ð4Þ n¼1 2PRELIMINARIES In (4), p ¼ tm þ 1 and 2.1 Normal Basis Representation Fð2iuj mod pÞ¼i; 0 i m À 1; 0 j<t; ð5Þ It is well-known that there exists a normal basis in the field GFð2mÞ over GFð2Þ for all positive integers m. By finding where u is an integer of order t mod p. In order to realize m 2 2mÀ1 an element 2 GFð2 Þ such that f; ; ÁÁÁ; g is a basis (4), the following algorithm can be used [11] where ci is of GFð2mÞ over GFð2Þ, any element A 2 GFð2mÞ can be computed in the inner loop of this algorithm. Note that, in represented as the following algorithm, A i (resp. A i) denotes the i-fold left (resp. right) cyclic shifts of the coordinates of A. mXÀ1 2i 2 2mÀ1 The algorithm uses the fact that the ði þ jÞth coordinate of A A ¼ ai ¼ a0 þ a1 þÁÁÁþamÀ1 ; i¼0 (i.e., aiþj) is equal to the jth coordinate of A i. It also requires the input sequence Fð1Þ;Fð2Þ; ÁÁÁ;Fðp À 1Þ to be where ai 2 GFð2Þ, 0 i m À 1, is the ith coordinate of A. precomputed using (5). In this paper, this normal basis representation of A will be written in short as A ¼ða0;a1; ÁÁÁ;amÀ1Þ. Now, consider the Algorithm 1. (Bit-Level NB Multiplication) following matrix: Input: A; B 2 GFð2mÞ;FðnÞ2½0;mÀ 1 for hi 1 n p À 1 i j mÀ1 M ¼ 2 þ2 ; ð1Þ Output: C ¼ AB i;j¼0 1. Initialize C ¼ðc0;c1; ÁÁÁ;cmÀ1Þ :¼ 0 whose entries belong to GFð2mÞ. Writing these entries with 2. For i ¼ 0 to m À 1 { respect to the NB, one obtains the following: 3. For n ¼ 1 to p À 2 { 4. ci :¼ ci þ aFðnþ1ÞbFðpÀnÞ 2 2mÀ1 M ¼ M0 þ M1 þÁÁÁþMmÀ1 ; ð2Þ 5. } 6. A 1;B 1 where Mis are m  m multiplication matrices whose entries belong to GFð2Þ. Following [23], we now give the definition 7. } of the complexity of an NB as follows. Software implementation of Algorithm 1 is not very efficient for the following reasons: First, in each execution of Definition 1. The numbers of 1s in all Mis are equal. Let us define this number by line 4, one coordinate of each of A and B are accessed. These accesses are such that their software implementation is CN ¼ HðMiÞ; 0 i m À 1; ð3Þ rather unsystematic and typically requires more than one instruction. Second, in line 4, the mod 2 multiplication of the which is known as the complexity of the NB. In (3), HðM Þ i coordinates, which is implemented by bit-level logical AND refers to the Hamming weight, i.e., the number of 1s, in M . i operation, is performed mðp À 2Þ times in total and the mod 2 addition, which is implemented by bit-level logical It is well-known that C 2m À 1 [23]. When N XOR operation, is performed 1 mðp À 2Þ times, on average, C ¼ 2m À 1, the NB is called an optimal normal basis 4 N assuming that A and B are two random inputs. In the (ONB). Two types of ONBs were constructed by Mullin et al. C programming language, these mod 2 multiplication and [23].