Hardware Implementation of the Baillie-PSW Primality Test
Carla Purdy, Yasaswy Kasarabada, and George Purdy
Department of Electrical Engineering and Computing Systems
University of Cincinnati, Cincinnati, OH USA

Abstract – The need for large primes in major cryptographic algorithms has stirred interest in methods for prime generation. Recently, to improve confidence and security, prime number generation in hardware is being considered as an alternative to software. Due to time complexity and hardware implementation issues, probabilistic primality tests are generally preferred. The Baillie-PSW primality test is a strong probabilistic test; no Baillie-PSW pseudoprime is known. In this paper, we discuss different types of cryptographic algorithms and primality tests, and review hardware implementations of the Miller-Rabin and Lucas tests. We also present the implementation of a Verilog-based design of the Baillie-PSW test on an Altera Cyclone IV GX FPGA. To our knowledge, this is the first hardware implementation of this test. The implementation takes an odd random number as input and returns the next probable prime number as output. We analyze the results from our implementation and suggest methods to further improve our results in the future.

I. INTRODUCTION

Currently acceptable cryptographic systems are of two major types: symmetric key and public key. In symmetric-key cryptography, as the name suggests, the sender and receiver share the same key. It is usually implemented as a block or stream cipher; DES and AES are two examples. However, key management is a major issue. To overcome this obstacle, public-key cryptography was introduced in 1976 [1]. Two separate but mathematically related keys (the public key and the private key) are used. Given the public key, calculating the private key is computationally infeasible. The sender encrypts the data using the receiver's public key; the receiving party then uses their private key to decrypt the data. RSA, DSA and ElGamal are public-key cryptosystems.

NIST's SP 800-57, published in 2016 [2], mandates a key length of 2048 bits (with 112 bits of strength) for all data that needs to stay secure through 2030, when RSA is used. Using NIST's FIPS published in 2013 [3], the maximum length of primes that need to be used to generate a 2048-bit (semi-prime) key is 1024 bits, if probable primes are being used. Therefore, in our implementation we generate 1024-bit primes.

Many methods can be used to generate primes. A prime sieve is a simple way to generate all primes in a limited range; however, it is inefficient at finding a single prime in a large range. For this purpose, primality tests are generally used. There are three major types of primality tests: probabilistic, deterministic and heuristic. Probabilistic tests generally involve choosing a random number n from a sample space, followed by checking whether some equality involving n is true. If so, then, to a certain accuracy, n is probably prime. The Fermat test [4], the Miller-Rabin test [5] and the Solovay-Strassen test [6] are a few examples. On the other hand, deterministic tests determine with absolute certainty the primality of a number. However, these tests generally have high running times compared to probabilistic tests. Heuristic tests are a more “experimental” type of primality test, with low running times. These tests are generally unproven but work well in practice; Baillie-PSW [7] is an example of such a test. Due to its low running time and low rate of failure, we have chosen to implement the Baillie-PSW test in this work. Our work can be used in standalone embedded systems which need to generate prime numbers for security and cannot rely on software-based systems for generation. Our system can also be included as a prime-generation module in a large secure system.

II. BACKGROUND

A. Distribution of primes and their use in cryptography

The Prime Number Theorem [8] [9], which describes the asymptotic distribution of prime numbers, implies that primes become less common as the numbers get larger. Based on the calculations of Gauss, Legendre and other mathematicians, we can estimate the number of primes less than or equal to a real number N, π(N), as:

$$|\pi(N) - \mathrm{Li}(N)| < \frac{N^{1/2} \cdot \ln(N)}{8\pi} \qquad (1)$$

for all N > 2657, where $\mathrm{Li}(N) = \int_2^N \frac{1}{\ln(t)}\,dt$.

The security of many major public-key cryptosystems depends on the inability of an attacker to factor the public key to gain information about the private key. In RSA, this means factoring the semi-prime into its prime factors. In 2012, a major flaw in many RSA implementations was found. Researchers [10] noticed public keys being shared among unrelated parties. Many of the 1024-bit shared RSA moduli offered little to no security. Such “high-risk” keys could compromise the security of all concerned systems. To avoid such a scenario, large and unique primes must be used each time RSA keys are generated.

B. Primality tests

As discussed in Section I, deterministic primality tests have zero rate of failure, but very high running times. Probabilistic tests have low running times but higher rates of failure. The Baillie-PSW test has very high accuracy and a low running time, which makes it well suited as a practical primality test.

Algorithm 1 – Baillie-PSW primality test [7]
Input: N (generated by RNG module in software)
1. Perform trial division of N with primes less than a convenient limit. If N is divisible by any such prime, return composite and quit (performed by Divisibility module in software).
2. Perform a Miller-Rabin test (base 2) on N. If N is not a base-2 Miller-Rabin probable prime, declare N to be composite and quit (performed by Miller-Rabin module in hardware).
3. Perform a Lucas probable prime test on N. If N is not a Lucas probable prime, declare N to be composite and quit (performed by Lucas module in hardware).
4. If N hasn't been declared composite, declare N a probable prime (performed by the Baillie-PSW module in hardware).

C. Hardware Implementations

Currently, many primality testing algorithms are being implemented in hardware, on FPGAs and ASICs. Due to their flexibility, reconfigurable hardware like FPGAs and CPLDs are preferred. Several implementations of the Miller-Rabin test and a few implementations of the Lucas test exist. The fastest implementation of the Miller-Rabin test on a Xilinx FPGA [11] gives a 2.2x speedup over software. A Lucas test implementation on the same FPGA [12] was 30% slower but 3 times more energy efficient. The ASIC implementation [12] was 3.6 times faster and 400 times more energy efficient than the optimized software implementation. To our knowledge, this is the first hardware implementation of the Baillie-PSW test.

III. MODULE DESCRIPTIONS & IMPLEMENTATIONS

The implementation in this paper consists of two main modules with five additional sub-modules and a top-level module. The main modules, the Miller-Rabin and Lucas modules, perform steps 2 and 3 of Algorithm 1 respectively. The sub-modules perform step 1 and support the operation of the main modules. The sub-modules are the Modular Multiplication (MM) module, the Modular Exponentiation (ME) module, the Jacobi symbol calculator (JSC) module, the Divisibility module and the Random Number Generator (RNG) module. The last two sub-modules were implemented in software on a host computer; the rest of the sub-modules, the two main modules and the top-level module are hardware modules. We also discuss the role of the testbench.

Algorithm 3 – Lucas probable prime test [14]
Input: N > 3, an odd integer to be tested for primality
Output: Probable Prime if N passes; otherwise Composite
1. U_0 = 0, V_0 = 2.
2. Select D ∈ {5, −7, 9, −11, …} such that the Jacobi symbol J(D, N) = −1. Set P = 1, Q = (1 − D) / 4.
3. Calculate U_{N − J(D, N)}, i.e. U_{N+1}.
4. If U_{N+1} ≡ 0 mod N, RETURN Probable Prime; else RETURN Composite.

A. Miller-Rabin module

This module performs a base-2 Miller-Rabin test. Step 2 of Algorithm 2 uses the ME module. For step 4a, favoring speed over area, the MM module was used instead of the ME module.

B. Lucas module

This module performs the Lucas probable prime test. Step 2 of Algorithm 3 uses the JSC module. The values of U and V in Algorithm 3 are the corresponding Lucas sequences for the values of P and Q chosen in step 2. The recurrence relations used in this module are:

$$U_k = \frac{U_{k-1} + V_{k-1}}{2}, \qquad V_k = \frac{D \cdot U_{k-1} + V_{k-1}}{2} \qquad (2.1)$$

$$U_{2k} = U_k \cdot V_k, \qquad V_{2k} = \frac{V_k^2 + D \cdot U_k^2}{2} \qquad (2.2)$$

These relations were derived considering the values of P, Q and D in step 2. In addition to checking the congruence condition in step 4, the following congruence relation can be checked: $V_{N+1} \equiv 2Q \pmod{N}$. This step strengthens the test: N is more likely to be prime if both congruence conditions hold.

C. Modular Multiplication (MM) module

Both the Miller-Rabin module and the Lucas module perform many modular multiplications, which are sped up by using the Montgomery modular multiplication algorithm [15]. This algorithm adds an extra factor of 2^{-k} to the output. To counter this, both inputs X and Y are converted to Montgomery form prior to multiplication. This changes the factor in the output to 2^k. This factor is then removed by performing another round of the multiplication algorithm, with one input set to the output of the previous step and the other input set to 1; this introduces a 2^{-k} factor that cancels the 2^k factor.
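
The Montgomery round trip described above can be illustrated with a small software model. The sketch below is a bit-serial (radix-2) Python model and is not the Verilog design of this paper; the function names mont_mul and mod_mul are hypothetical, and it assumes that N is odd, that k is the bit length of N, and that both operands are smaller than N.

```python
def mont_mul(x, y, n, k):
    """Bit-serial (radix-2) Montgomery product: x * y * 2^(-k) mod n.

    Assumes n is odd, n < 2^k, and 0 <= x, y < n.
    """
    s = 0
    for i in range(k):
        if (x >> i) & 1:   # add y when bit i of x is set
            s += y
        if s & 1:          # add the odd modulus so that s becomes even
            s += n
        s >>= 1            # exact division by 2
    return s - n if s >= n else s


def mod_mul(x, y, n):
    """Compute x * y mod n using only Montgomery products (illustrative)."""
    k = n.bit_length()
    r2 = pow(2, 2 * k, n)            # 2^(2k) mod n, computed once per modulus
    x_m = mont_mul(x, r2, n, k)      # x * 2^k mod n (Montgomery form)
    y_m = mont_mul(y, r2, n, k)      # y * 2^k mod n (Montgomery form)
    p_m = mont_mul(x_m, y_m, n, k)   # x * y * 2^k mod n
    return mont_mul(p_m, 1, n, k)    # multiplying by 1 strips the 2^k factor
```

Each call to mont_mul uses only additions and one-bit shifts, which is why this form maps well to hardware. In a long chain of multiplications the conversion to and from Montgomery form is typically done once at the start and once at the end, rather than around every product as in mod_mul.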