A Scalable System-On-A-Chip Architecture for Prime Number Validation
Total Page:16
File Type:pdf, Size:1020Kb
A SCALABLE SYSTEM-ON-A-CHIP ARCHITECTURE FOR PRIME NUMBER VALIDATION Ray C.C. Cheung and Ashley Brown Department of Computing, Imperial College London, United Kingdom Abstract This paper presents a scalable SoC architecture for prime number validation which targets reconfigurable hardware This paper presents a scalable SoC architecture for prime such as FPGAs. In particular, users are allowed to se- number validation which targets reconfigurable hardware. lect predefined scalable or non-scalable modular opera- The primality test is crucial for security systems, especially tors for their designs [4]. Our main contributions in- for most public-key schemes. The Rabin-Miller Strong clude: (1) Parallel designs for Montgomery modular arith- Pseudoprime Test has been mapped into hardware, which metic operations (Section 3). (2) A scalable design method makes use of a circuit for computing Montgomery modu- for mapping the Rabin-Miller Strong Pseudoprime Test lar exponentiation to further speed up the validation and to into hardware (Section 4). (3) An architecture of RAM- reduce the hardware cost. A design generator has been de- based Radix-2 Scalable Montgomery multiplier (Section veloped to generate a variety of scalable and non-scalable 4). (4) A design generator for producing hardware prime Montgomery multipliers based on user-defined parameters. number validators based on user-specified parameters (Sec- The performance and resource usage of our designs, im- tion 5). (5) Implementation of the proposed hardware ar- plemented in Xilinx reconfigurable devices, have been ex- chitectures in FPGAs, with an evaluation of its effective- plored using the embedded PowerPC processor and the soft ness compared with different size and speed tradeoffs (Sec- MicroBlaze processor. Our work demonstrates the flexibil- tion 6). (6) Evaluation of a SoC approach using customised ity and trade-offs in using reconfigurable platform for de- IP blocks with both hard [5] and soft processor (Section 7). veloping System-on-a-Chip applications. It is shown that, The rest of the paper is organized as follows. Section 2 de- for instance, a 1024-bit Primality test can be completed in scribes the related work. Section 3 presents a design flow less than a second, and a low cost XC3S2000 FPGA chip for mapping scalable prime number validators into hard- can accommodate a 32k-bit scalable primality test with 64 ware. Section 4 covers a scalable architecture for modu- parallel processing elements. lar multiplication. Section 5 presents a design generator GenDesign which produces architectures with different 1 Introduction design trade-offs, based on user-specified parameters. Sec- tion 6 provides experimental results of one single IP-block. Section 7 describes the SoC-based framework using em- For centuries, the problem of validating prime numbers, the bedded processors. Finally, Section 8 summarises our cur- Primality test, has posed a great challenge to both computer rent and future research. scientists and mathematicians [6]. The problem of identi- fying Prime and Composite numbers is known as one of the most important problems in arithmetic. Many security 2 Background and Related work applications also involve large numbers: while it is easy to Primality test is essential for prime number generation. multiply prime numbers to get a product, the reverse pro- Cryptography often uses large prime numbers. For in- cess of recovering the primes is much more difficult. stance, RSA public key algorithm requires 512-bit, 1,024- Field programmable gate arrays offer a rapid-prototyping bit for key generation [17]. Much research work has been platform for the verification of embedded systems.FPGAs done on primality test and generation, and most algorithms are also used as active components in security systems, are based on factorization [16]. In [2], the primality test has such as Firewall processors [12]. As the threat of attacks been proved to be solvable in polynomial time. However, appears to be increasing, many existing systems do not it requires a log operation which is expensive for hardware seem to be secure enough. One of the reasons is due to implementation; it also requires knowledge of the primal- the use of weak key generation. It has been recognised ity of preceding numbers, which is impractical for arbitrary that strong prime number generation is important, and the prime testing. Joye et al [10] has proposed an efficient prime validation is an intrinsic part of the generation. Here prime number generation scheme based on pseudo-random we define validation as the process of performing prime number generation using a smart-card implementation. Lu testing on a given number. et al [13] has further developed the RSA key generation for smart card using different prime number algorithms. Input: The number under primality test: p Input: X * Y (mod M), Output: S 1. S = 0 1. Random choose a number b in [1, p-1] Z 2. for i = 0 to n - 1 (q m) 2 (0) (0) (0) 2. Let p = 2 × +1 where m is an odd integer 3. (Ca,S ) = xi*Y + S 3. IF either 4. if S(0) = 1 then m 0 4. Case 1: b = 1 (mod p) or 5. (Cb,S(0)) = S(0) + S(0) 5. Case 2: there is an integer i in [0, q-1] i 6. for j = 1 to e m 2 (j) (j) (j) 6. such that b × = -1 (mod p) 7. (Ca,S ) = Ca + xi*Y + S (j) (j) (j) 7. RETURN "Inconclusive" 8. (Cb,S ) = Cb + M + S 8. ELSE (j 1) (j) (j 1) 9. S − = (S0 ,Sw −1::1) 9. RETURN "Composite" 10. end for − 11. else 12. for j = 1 to e (j) (j) (j) Figure 1: The Rabin-Miller Probabilistic Primality Test 13. (Ca,S ) = Ca + xi*Y + S (j 1) (j) (j 1) 14. S − = (S0 ,Sw −1::1) 15. end for − 16. end if There is a special class of very large prime numbers (e) which have the representation of 2n 1. They are called 17. s = 0 − 18. end for Mersenne prime numbers. The 40th Mersenne prime num- ber 224;036;583 1 has been discovered in May 2004. A − clustered workforce GIMPS [9] in the Internet is currently Figure 2: The Multiple Word Radix-2 Montgomery Multiplica- dedicated on locating the next Mersenne number. tion Algorithm 2.1 Rabin-Miller Primality Test cation has been implemented as a coprocessor in reconfig- urable RSA System-on-Chip building block in [8]. Further- In this paper, we have selected the Rabin-Miller probabilis- more, a high-radix design of scalable modular multiplier tic primality test algorithm [15] (Figure 1) as the core for has also been discussed. We observe that the scalable de- the primality test. The algorithm can quickly determine sign is particularly useful for very large precision such as the primality of the given large number with a control- the prime test. lably small probability of error [1]. It requires a num- ber of small primes for repeatedly testing the input num- ber. The probability of falsely identifying a composite 3 Design Flow as a prime decreases with every additional small prime used. This method enables the tradeoff between accuracy In this section we present the design flow for our scalable (more small primes) and efficiency (fewer small primes) prime number validator. The input to the system includes for different applications. As shown in the algorithm, user-specified constraints, such as the bit-width of the input “inconclusive” implies the number p maybe prime. prime number and the word-width which is required for The selection of number b is based on a set of small primes the scalable multiplier, while the output from the system is such as 2, 3, 5, ... With the use of first eight small primes, a synthesizeable hardware block that can be embedded in the 100% accuracy of primality test can be achieved for different cryptographic designs. This flexible design flow numbers up to 3:4 1010 [1]. facilitates the ability to upgrade existing security systems × with very small overhead. 2.2 Scalable Montgomery Algorithm The major features of the proposed validator include: (1) A variable compile-time prime number validator in which The fundamental operation of the RSA public key cryp- user can easily update the complexity of the system by tosystem is modular exponentiation, achieved by repeated controlling the prime width. (2) The variable input small modular multiplications. The Montgomery modular mul- prime numbers determine the accuracy and performance of tiplication [14] has been widely used to speed up the mul- the system. (3) The user-specified word-width controls the tiplication, squaring and exponentiation. There are many modular operation taken in the hardware. (4) Users are able different extended algorithms [11] and software implemen- to select different Montgomery architectures for hardware tation based on the Montgomery algorithm, and the high- implementation and thus enhance rapid prototyping. radix Montgomery modular exponentiation has been im- plemented in reconfigurable hardware [3]. The Rabin-Miller algorithm is first implemented using standard multiplication and exponentiation with a sequen- The reusability and scalability of the Montgomery multi- tial modulo-multiply. We observe that one of the bottle- plier have been investigated in [18, 19] such that the de- necks of the Rabin-Miller algorithm is the modular ex- sign is no longer restricted to a fixed precision. The in- ponentiation in testing the two valid checks in Section put operands are partitioned into multiple words. Figure 2 2. Montgomery algorithm is widely used for modu- shows the algorithmic description of the Radix-2 Mont- lar multiplication and exponentiation. It requires a con- gomery multiplication. The operand Y (multiplicand) is stant at the start of algorithm to facilitate conversion be- scanned word-by-word and the operand X (multiplier) is tween standard and Montgomery space.