
A Verilog Description and Efficient Hardware

Implementation of the Baillie-PSW Primality Test

A thesis submitted to the

Graduate School of the University of Cincinnati

in partial fulfillment of the requirements for the degree of

Master of Science

in the Department of Electrical Engineering and Computing Systems

of the College of Engineering and Applied Sciences

Yasaswy Kasarabada

Bachelor of Technology, Electronics and Communication Engineering

Visvesvaraya National Institute of Technology, 2013

Thesis Advisor and Committee Chair: Dr. Carla Purdy

July 2016

Abstract

Prime numbers have been an important topic of study among mathematicians. With the

increasing usage of large primes in major cryptographic algorithms like RSA and

Diffie-Hellman Key Exchange, the study and method of generation of primes have

gained much significance in information security. For a long period of time, generation

of primes in software was common practice. However, in recent years, citing

confidence and security, prime number generation in hardware is being considered as

an alternative. Considering time complexity and hardware implementation issues,

probabilistic primality tests are preferred over deterministic tests. The Baillie-PSW

primality test is a strong probabilistic primality test, as no known Baillie-PSW

pseudoprime exists.

In this thesis, we briefly discuss the different types of cryptographic algorithms and

primality tests. We also study a few hardware implementations of some of the primality

algorithms, namely the Miller-Rabin and Lucas tests. Our work concentrates on the implementation of a Verilog-based design of the Baillie-PSW primality test on an Altera

Cyclone IV GX FPGA. To our knowledge, this is the first hardware implementation of this primality test. The implementation takes in an odd random number as input and returns the next immediate probable prime number. Numbers that are 1024 bits wide are preferred for their use in modern cryptographic algorithms. The results from our implementation are analyzed and methods to improve the results are discussed.

Keywords: Baillie-PSW test, hardware implementation, Verilog, FPGA, Cyclone IV GX


Acknowledgements

I would like to thank everyone who helped me throughout the course of my graduate

degree at University of Cincinnati.

I am grateful to Dr. Carla Purdy for the opportunity to be a part of her research group, which helped me in furthering my research interests in the area of security and embedded systems. I offer my sincere thanks to her for her constant motivation and guidance during the course of my graduate studies. I would also like to thank

Dr. George Purdy and Dr. Wen-Ben Jone for spending their invaluable time in reviewing my work and serving on my defense committee.

I would like to express my deepest gratitude to my parents for their constant and unwavering patience and support in my professional endeavors, and for their good wishes which made this entire journey a wonderful experience. I would also like to thank all my lab-mates and friends for their helpful assistance and suggestions that helped me greatly in the design of my final implementation.

Table of Contents

1 Introduction

1.1 Types of cryptography

1.2 Primality algorithms

1.3 Outline

2 Background

2.1 Prime numbers and their distribution

2.2 Usage of primes in cryptography

2.3 Primality tests

2.4 Hardware implementations

3 Module Description and Implementation

3.1 Miller-Rabin module

3.2 Lucas module

3.3 Modular multiplication module

3.4 Modular exponentiation module

3.5 Jacobi symbol calculator module

3.6 Divisibility module

3.7 Random Number Generator (RNG) Module

3.8 RAM module

3.9 Baillie-PSW module

3.10 Testbench module

4 Results

4.1 Implementation

4.2 Synthesis and simulation

4.3 Analysis

5 Conclusions and Future Work

5.1 Conclusions

5.2 Future Work

Bibliography

Appendices

Appendix A – Hardware modules

Appendix B – Software modules

Appendix C – Cramer prime generation

Appendix D – Lucas sequences and Jacobi symbol

Appendix E – Tested numbers

List of Figures

Figure 1 Block diagram of the Baillie-PSW module

Figure 2 Block diagram of Combiner logic

Figure 3 Block diagram of the Miller-Rabin module

Figure 4 Block diagram of the Lucas module

Figure 5 Block diagram of the Jacobi symbol calculator module

Figure 6 Block diagram of the Divisibility module

Figure 7 State machine of data flow within the software module

Figure 8 Explanation of input signals used in Figure 7

Figure 9 Synchronicity between Baillie-PSW module and Divisibility module

Figure 10 Explanation of input signals used in Figure 9

Figure 11 Data flow within the Testbench module

Figure 12 Miller-Rabin synthesis results

Figure 13 Miller-Rabin simulation results

Figure 14 Lucas synthesis results

Figure 15 Lucas simulation results

Figure 16 Baillie-PSW synthesis results

Figure 17 Baillie-PSW Python implementation results

Figure 18 Baillie-PSW Java implementation results

Figure 19 (a) Global data flow (b) Data flow in Baillie-PSW module

Figure 20 Global flowchart


List of Tables

Table 1 List of some of the common primality tests

Table 2 Significance of output signals of Baillie-PSW module

Table 3 Comparison of the results of 3 tests for 3 different numbers

Table 4 Baillie-PSW system vs. Software implementations

Table 5 Sub-systems vs. Hardware implementations

Table 6 Value of Lucas sequences for small values of n

Table 7 Value of Jacobi symbol for small values of a and n


1 Introduction

Cryptography is the practice and study of procedures for secure communication in the presence of third parties. Prior to the modern age, cryptography was the process of

converting information from readable data to apparent nonsense. The technique used

to decode the "nonsense" back into readable data was known only to the intended

recipients, thereby preventing unsolicited persons from gaining access to the readable

data. Since World War II and the dawn of computers, the practices used in cryptography

(encryption/decryption) have become incredibly complex and its usage has become

more extensive.

1.1 Types of cryptography

In modern day cryptography, two major types of cryptography exist:

1. Symmetric-key cryptography: as the name suggests, in this type of

cryptography, both the sender and the receiver share the same key. This was the

only known type of encryption till June 1976 [1]. Most symmetric key ciphers are

implemented as either block ciphers or stream ciphers - data is encrypted in

blocks of plaintext or in individual characters. DES and AES are some such

algorithms.

2. Public-key cryptography: due to the disadvantage arising from key management

in symmetric key cryptography, public-key cryptography was introduced. In this

type of cryptography, two separate but mathematically related keys exist (the

public key and the private key). Both keys are constructed in a manner such that,

given the public key, the calculation of the private key is computationally

infeasible. As the name suggests, the public key is distributed freely. The sender

encrypts the data using the receiver's public key. The receiving party then

decrypts the data using the private key that only they possess. In case a third

party intercepts the encrypted data, they would not be able to decrypt it

because they do not possess the private key. RSA and DSA are some of the most

popular public-key cryptographic algorithms.

RSA is one of the earliest and most widely used public-key cryptosystems. The name of the algorithm is based on the initials of the last names of its creators: Ron Rivest,

Adi Shamir and Leonard Adleman. The process of key generation used in RSA is described in the following steps [2]:

1. Choose two distinct prime numbers p and q. They must be chosen at random and

must be of similar bit-length.

2. Compute n = pq. n is used as the modulus for both the public and private keys.

Its length, in bits, is known as the key length.

3. Compute φ (n) where φ is Euler's totient function. This value is kept private.

4. Choose an integer e such that 1 < e < φ (n) and gcd (e, φ (n)) = 1, i.e., e and

φ (n) are coprime. e is released as the public key exponent.

5. Determine d as $d \equiv e^{-1} \pmod{\varphi(n)}$, i.e., d is the modular multiplicative inverse

of e (modulo φ (n)). d is the private key exponent.
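The five steps above can be sketched in a few lines of Python. This is only a toy illustration: the function name `rsa_keygen`, the default exponent e = 17 and the tiny primes 61 and 53 are assumptions chosen for readability, and the primes are assumed to have been tested for primality already. Real keys use primes of about 1024 bits.

```python
from math import gcd


def rsa_keygen(p, q, e=17):
    """Toy RSA key generation following steps 1-5 (p and q assumed prime)."""
    if p == q:
        raise ValueError("p and q must be distinct")           # step 1
    n = p * q                                                   # step 2: modulus
    phi = (p - 1) * (q - 1)                                     # step 3: phi(n) for n = pq
    if not 1 < e < phi or gcd(e, phi) != 1:                     # step 4
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(e, phi(n)) = 1")
    d = pow(e, -1, phi)                                         # step 5 (Python 3.8+)
    return (n, e), (n, d)


# Illustrative values only: far too small to be secure.
public, private = rsa_keygen(61, 53)
message = 65
cipher = pow(message, public[1], public[0])       # encrypt with the public key
recovered = pow(cipher, private[1], private[0])   # decrypt with the private key
```

Encrypting with the public exponent and decrypting with the private exponent returns the original message, which is the property the key pair is constructed to provide.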

In RSA, the asymmetry of keys is based on the factoring problem. In practice it is difficult to factor the product of two large distinct prime numbers, also known as a semi-prime. Hence, large primes play a very important role in RSA.

According to NIST's Special Publication (SP) 800-57 [3], published in 2016, for all data that needs to stay secure through 2030, when using RSA as the encryption algorithm, a key-length of 2048 bits (with 112 bits of strength) must be used for encrypting the data.

This means that the bit-length of the semi-prime (n) must be 2048. Using NIST's Federal

Information Processing Standard [4], published in 2013, for a key-length of 2048 bits, the maximum length of primes used to generate the semi-prime must be 1024 bits, if probable primes are being used. Since our work mostly concentrates on probable primes, we consider 1024 bits to be the maximum length of prime which must be generated.

1.2 Primality algorithms

Generating prime numbers is a complex task as they seldom share properties with other numbers. Many kinds of prime number sieves could be used to generate primes. A prime sieve is a simple algorithm for finding primes that works by creating a list of all integers up to a desired limit and then eliminating all composite numbers until only the primes are left. This is a good method to find all primes in a limited range. However, this method is inefficient in finding a single prime in a large range. Therefore, to find a single large prime number, primality tests are generally used. Most sieves have time complexities that are linear [5] or sub-linear [6] in the size of the range, whereas primality tests have time complexities that are polynomial in log N, i.e., in the bit-length of the number being tested (see Table 1). Therefore, an easier and faster approach is to generate a true- or pseudo-random number and test the number for primality by passing it through one or a set of primality tests.
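To make the sieve idea concrete, the sketch below implements the sieve of Eratosthenes, one of the simplest prime sieves. The function name is an assumption, and this particular sieve is chosen only for illustration; it is not necessarily one of the sieves cited in [5] or [6].

```python
def sieve_of_eratosthenes(limit):
    """Return all primes <= limit by crossing out multiples of each prime."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Every multiple of p from p*p upward is composite.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


primes_up_to_30 = sieve_of_eratosthenes(30)
```

The memory and time needed grow with the limit itself, which is why this approach cannot be used to find a single 1024-bit prime.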

Three main types of primality tests exist – probabilistic, deterministic and heuristic.

1) Probabilistic - These provide provable bounds on the probability of being fooled by

a composite number. Probabilistic tests generally involve numbers chosen at random

from a sample space, followed by an equality which states to a certain accuracy

whether the number is a composite or a probable prime. Since these tests declare

a number to be composite with absolute certainty, these tests are sometimes also

called "compositeness tests". Fermat test [7], Miller-Rabin test [8] and Solovay-

Strassen test [9] are some of the popular probabilistic tests. The basic structure of

a probabilistic test, for a number n, is as follows:

a) Randomly pick a number a.

b) Compute a function f (a), specific to the chosen primality test. If f (a) does not

satisfy certain equalities (specific to the primality test), declare the number as

composite and exit; else continue to step c.

c) Repeat steps a – b; pick different values for a till desired accuracy is achieved.

d) After a desired number of iterations, if n has not been declared composite,

declare n to be a probable prime.

2) Deterministic - These tests prove with absolute certainty the primality of a number.

Due to this reason, the numbers which pass these tests are also called provable

primes. The Pocklington primality test [10], one of the earliest deterministic

primality tests, had a slow running time since it required the partial factorization

of n-1. The cyclotomy test [11] and the elliptic curve primality test [12] are some

fast deterministic primality tests. In 2004, three computer scientists from IIT Kanpur

published the first polynomial-time deterministic test, the AKS primality test [13].

3) Heuristic - Although these tests are unproven, they work well in practice. The

Fibonacci test is an example of such a test. John Selfridge [14] states that: if p is an

odd number and p ≡ ±2 (mod 5) then p is a prime if:

a) $2^{p-1} \equiv 1 \pmod{p}$

b) $f_{p+1} \equiv 0 \pmod{p}$, where $f_k$ is the kth Fibonacci number

Baillie-PSW primality test is another such example that operates on all odd numbers

and replaces the Fibonacci sequence with the Lucas sequence.
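As an illustration of two of the categories above, the following sketch shows a probabilistic test following steps a) to d), using the Fermat congruence as the test-specific function, and the heuristic Fibonacci test attributed to Selfridge. The function names, the default of 20 rounds and the fast-doubling method for Fibonacci numbers are assumptions made for this example.

```python
import random


def fermat_test(n, rounds=20):
    """Probabilistic test: steps a)-d) with f(a) = a^(n-1) mod n."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in range(rounds):                 # step c: repeat with new a
        a = random.randrange(2, n - 1)      # step a: random a in [2, n-2]
        if pow(a, n - 1, n) != 1:           # step b: congruence fails
            return False                    # n is certainly composite
    return True                             # step d: probable prime


def fib_pair_mod(k, m):
    """Return (F_k mod m, F_(k+1) mod m) using fast doubling."""
    a, b = 0, 1 % m                         # (F_0, F_1)
    for bit in bin(k)[2:]:
        c = a * ((2 * b - a) % m) % m       # F_(2j)   = F_j * (2*F_(j+1) - F_j)
        d = (a * a + b * b) % m             # F_(2j+1) = F_j^2 + F_(j+1)^2
        a, b = (d, (c + d) % m) if bit == "1" else (c, d)
    return a, b


def selfridge_fibonacci_test(p):
    """Heuristic test for odd p with p = +-2 (mod 5); unproven in general."""
    if p % 2 == 0 or p % 5 not in (2, 3):
        raise ValueError("test applies only to odd p with p = +-2 (mod 5)")
    return pow(2, p - 1, p) == 1 and fib_pair_mod(p + 1, p)[0] == 0
```

Note that `fermat_test` can be fooled by Carmichael numbers, which is one reason stronger tests such as Miller-Rabin are preferred in practice.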

In this work, the Baillie-PSW test was chosen for primality testing. Since no known

Baillie-PSW pseudoprime exists [14], we conclude that it is a strong primality test.

1.3 Outline

This work is divided into 5 chapters.

Chapter 2 gives a detailed background and discusses a few of the hardware

implementations of some of the above mentioned primality tests.

Chapter 3 discusses the different modules used in this test with the use of pseudocodes and block diagrams showing the data flow within these modules.

Chapter 4 describes the results obtained from synthesis and simulation, including timing

diagrams and data on execution time, number of logic elements used, etc.

Chapter 5 highlights the conclusions made from the results and talks about future work.

The Appendices include the code for all modules discussed in Chapter 3.

2 Background

2.1 Prime numbers and their distribution

A prime number is a natural number greater than 1 with no positive factors other than

1 and itself. The fundamental theorem of arithmetic [15] states that every integer greater than 1 is either a prime or a unique product of primes. So, prime numbers can be considered the basic building blocks of the natural numbers.

The Prime Number Theorem [16] [17], describes the asymptotic distribution of the prime numbers. It details the instinctive idea that primes become less common as they get larger. Let π(N) = number of primes less than or equal to some real number N.

In 1792, Gauss proposed that

$$\pi(N) \sim \frac{N}{\ln(N)}$$

Furthermore, Legendre (1808) suggested that, for large N,

$$\pi(N) \sim \frac{N}{\ln(N) + B}, \quad \text{where } B = -1.08366$$

Gauss later refined his estimate as

$$\pi(N) \sim \mathrm{Li}(N), \quad \text{where } \mathrm{Li}(N) = \int_2^N \frac{dx}{\ln(x)}$$

In 1901, Helge von Koch showed that, if and only if the Riemann hypothesis [18] is true,

$$\pi(N) = \mathrm{Li}(N) + O\left(N^{1/2} \cdot \ln(N)\right), \quad \text{where } O \text{ is the big O notation}$$

The constant in the big O notation was estimated in 1976 by Lowell Schoenfeld [19],

$$|\pi(N) - \mathrm{Li}(N)| < \frac{1}{8\pi} N^{1/2} \cdot \ln(N) \quad \text{for all } N \geq 2657$$
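The estimates of Gauss and Legendre are easy to evaluate numerically. The sketch below computes both for a given N and, as a rough heuristic consequence of the prime density 1/ln(N), the average number of odd candidates that would have to be tested near a number of a given bit-length before a prime is found. The function names are assumptions, and the candidate count is an average-case estimate derived here for illustration, not a result from the thesis.

```python
from math import log


def gauss_estimate(n):
    """Gauss's first estimate: pi(N) ~ N / ln(N)."""
    return n / log(n)


def legendre_estimate(n):
    """Legendre's estimate: pi(N) ~ N / (ln(N) - 1.08366)."""
    return n / (log(n) - 1.08366)


def expected_odd_candidates(bits):
    """Heuristic average number of odd candidates per prime near 2**bits.

    Primes near N have density about 1/ln(N); skipping even numbers
    halves the number of candidates that need to be tested.
    """
    return bits * log(2) / 2


# The exact count pi(10**6) is 78498, for comparison with the estimates.
print(gauss_estimate(10**6), legendre_estimate(10**6))
print(expected_odd_candidates(1024))
```

For 1024-bit numbers the heuristic gives a few hundred odd candidates on average, which is why a cheap pre-filter such as trial division is attractive before running the more expensive tests.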

2.2 Usage of primes in cryptography

Major public-key cryptographic algorithms like RSA and Diffie-Hellman key exchange

extensively use prime numbers; RSA mainly uses semi-primes. The security of RSA

depends on the inability of an attacker to factor the semi-prime into its prime

factors. Large and unique prime numbers are used to make factoring difficult for the

attacker. In 2012, a major flaw was found in the implementations of RSA. Researchers

[20] found that, to a large extent, public keys were being shared among unrelated parties. They found that for ElGamal and DSA, sharing was rare; but for RSA, the sharing

frequency was high enough to cause concern. The researchers also found that many

1024-bit RSA moduli offered little or no security. If such “high-risk” moduli were part

of the batch that was being shared, then these keys could compromise all concerned

systems. Due to this reason, the decision to use new and distinct prime numbers, which

must be chosen at random for each key-pair generation, was taken. To accomplish this

task, a new random number is generated each time and passed through a certain set of

primality tests. If the number passes these tests, then it is used in key generation.

2.3 Primality tests

Table 1 compares some of the primality tests described in Chapter 1, on the basis of

time complexity and probability of failure. Deterministic tests have zero probability of

failure, but very high running times. Since most probabilistic primality tests exhibit low

running times, they are ideal choices for practical primality tests. However, they also

have a higher rate of failure as compared to deterministic tests. Therefore, most

primality testing algorithms perform more than 1 cycle of these tests, using different

parameters in each cycle, to increase accuracy of the test. The Baillie-PSW primality test demonstrates very high accuracy with low running time [21].

| Name of test | Type of test | Time complexity | Probability of failure |
|---|---|---|---|
| Fermat test | Probabilistic | $O(\log^3 n)$ [21] | 1 for Carmichael numbers [22] |
| Solovay-Strassen test | Probabilistic | $O(\log^3 n)$ [23] | $2^{-1}$ |
| Miller-Rabin test | Probabilistic | $O(\log^3 n)$ [21] | $4^{-1}$ |
| Quadratic Frobenius test | Probabilistic | $\tilde{O}(\log^2 n)$ [24] | less than $7710^{-1}$ |
| Elliptic curve test | Deterministic | $O(\log^6 n)$ [12] | 0 |
| AKS test | Deterministic | $\tilde{O}(\log^6 n)$ [25] | 0 |
| Baillie-PSW test | Heuristic | $O(\log^3 n)$ [14] | 0 (for $n < 2^{64}$) |

Table 1 List of some of the common primality tests

2.3.1 Miller-Rabin test

Miller-Rabin test [8] is a strong and popular primality test. The probability of failure of

a single loop of the test (steps 2-5 described below) is at most 25%, i.e., a single loop declares

a composite number as a probable prime with a probability of at most 25%. Therefore, performing these steps

twice leads to a probability of failure $P_f \le 25\% \times 25\% = 6.25\%$. Performing these steps

thrice leads to $P_f \le 1.5625\%$, and so on. Performing these steps k times results

in $P_f \le 4^{-k}$. Therefore, a single loop of this test must be repeated until $P_f$ reaches an

acceptable level.

The Miller-Rabin test is performed as follows:

1. Find d, s such that $N - 1 = 2^s \cdot d$, with d odd.

2. Pick a random integer a in the range [2, N-2].

3. $X \leftarrow a^d \bmod N$. If X = 1 or X = N - 1, continue to step 6.

4. Repeat s-1 times:

• $X \leftarrow X^2 \bmod N$.

• If X = 1, return composite.

• If X = N - 1, continue to step 6.

5. Return composite if none of the above conditions are true after s-1 loops.

6. Repeat steps 2-5, using different values of a in each loop till required accuracy

is reached.

7. Return probably prime.
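The steps above can be sketched in Python as follows. The function name, the default of 10 rounds and the optional `bases` argument (which replaces the random choice of a with fixed values so that results are reproducible) are assumptions made for this example.

```python
import random


def miller_rabin(n, rounds=10, bases=None):
    """Miller-Rabin test; returns False for composite, True for probable prime."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    d, s = n - 1, 0                         # step 1: n - 1 = 2^s * d, d odd
    while d % 2 == 0:
        d //= 2
        s += 1
    if bases is None:                       # step 2: random a in [2, n-2]
        bases = (random.randrange(2, n - 1) for _ in range(rounds))
    for a in bases:                         # step 6: repeat for each a
        x = pow(a, d, n)                    # step 3
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):              # step 4
            x = x * x % n
            if x == 1:
                return False
            if x == n - 1:
                break
        else:
            return False                    # step 5: no condition was met
    return True                             # step 7
```

For example, 2047 = 23 · 89 is a strong pseudoprime to base 2, so `miller_rabin(2047, bases=[2])` reports a probable prime, while base 3 exposes it as composite. This is why a single fixed base is never sufficient on its own.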

2.3.2 Lucas probable prime test

Lucas probable prime test [26] derives its strength from the properties of the Lucas sequences. Given integers P and Q, where P > 0, $U_k(P, Q)$ and $V_k(P, Q)$ are the

corresponding Lucas sequences. The characteristic equation of the recurrence relation

for the Lucas sequences is $x^2 - Px + Q = 0$, with $D = P^2 - 4Q$ being its discriminant. Let N be a positive integer and J (D, N) be the Jacobi symbol. We define δ (N) = N – J (D, N).

Now, if N is a prime such that the GCD of N and Q is 1, i.e., N and Q are coprime, then

$U_{\delta(N)} \equiv 0 \pmod{N}$. Using this property, the Lucas probable prime test checks the primality of any integer N. If the aforementioned congruence holds, then N is a Lucas probable prime.

Otherwise N is composite. Appendix D talks in detail about the Lucas sequences and the

Jacobi symbol.

In [14], two methods of choosing the parameters D, P and Q are mentioned:

1. Let D be the first element of the sequence 5, -7, 9, -11 … for which J (D, N) = -1.

Let P = 1 and Q = (1 - D) / 4.

2. Let D be the least element of the sequence 5, 9, 13 … for which J (D, N) = -1.

Let P be the least odd number exceeding $D^{1/2}$ and let $Q = (P^2 - D) / 4$.

Baillie uses method 2 while Selfridge prefers method 1. In our work, we have chosen to follow Selfridge's

preference because although it might take longer to find D, the calculations for finding

P and Q and their subsequent usages in calculating U and V sequences become easier.
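Both parameter-selection methods can be sketched in Python. The Jacobi symbol routine below is the standard binary algorithm based on quadratic reciprocity, not the systolic hardware algorithm used later in this work, and all function names are assumptions. A perfect-square check is added because J (D, N) is never -1 when N is a perfect square, in which case the search for D would not terminate.

```python
from math import isqrt


def jacobi(a, n):
    """Jacobi symbol J(a, n) for odd n > 0."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a:
        while a % 2 == 0:                   # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                         # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def select_method_1(n):
    """Method 1: D from 5, -7, 9, -11, ...; P = 1, Q = (1 - D) / 4."""
    if isqrt(n) ** 2 == n:
        raise ValueError("no suitable D exists for a perfect square")
    d = 5
    while jacobi(d, n) != -1:
        d = -(d + 2) if d > 0 else 2 - d    # 5 -> -7 -> 9 -> -11 -> ...
    return d, 1, (1 - d) // 4


def select_method_2(n):
    """Method 2: D from 5, 9, 13, ...; P least odd > sqrt(D), Q = (P^2 - D) / 4."""
    if isqrt(n) ** 2 == n:
        raise ValueError("no suitable D exists for a perfect square")
    d = 5
    while jacobi(d, n) != -1:
        d += 4
    p = isqrt(d) + 1
    if p % 2 == 0:
        p += 1
    return d, p, (p * p - d) // 4
```

For N = 19, for instance, method 1 yields (D, P, Q) = (-7, 1, 2) and method 2 yields (13, 5, 3). With method 1 the fixed value P = 1 removes one multiplication from every step of the Lucas recurrences.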

2.3.3 Baillie-PSW test

Baillie-PSW test [14], named after Robert Baillie, Carl Pomerance, John Selfridge and

Samuel Wagstaff, is a combination of a strong Fermat probable prime test (e.g.

Miller-Rabin test) to base 2 and a strong Lucas probable prime test. An optional trial division is performed before these two tests to eliminate composite numbers which are

divisible by primes less than a suitable limit. The Baillie-PSW test proves to be a strong

probable prime test since there is no known overlap between the lists of strong Fermat

pseudoprimes and strong Lucas pseudoprimes. From [14] and [26], there is evidence to

show that the numbers in each list tend to be different kinds of numbers: Fermat

pseudoprimes fall into the residue class 1 (mod m) for many small m, whereas Lucas

pseudoprimes fall into the residue class -1 (mod m). Therefore, if a number passes both

the tests without being declared composite, then it is very likely to be a prime. No

composite number less than $2^{64}$ has passed the Baillie-PSW test [21], and no composite

number beyond that limit is known to have passed it.

The test is performed as follows:

Input: N > 3, an odd integer to be tested for primality

1. [Optional] Perform a trial division of N with primes less than a convenient limit.

If N is divisible by any such prime, return composite; quit.

2. Perform a Miller-Rabin test (base 2) on N. If N is not a base 2 Miller-Rabin

probable prime, declare N to be composite and quit.

3. Find the first D in the sequence 5, -7, 9, -11 … such that the Jacobi symbol

J (D, N) = -1. Set P = 1; Q = (1 - D) / 4.

4. Perform a Lucas probable prime test on N using D, P and Q from step 3. If N is

not a Lucas probable prime, declare N to be composite and quit.

5. If N hasn’t been declared composite, declare N to be a probable prime.
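A compact software sketch of steps 1 to 5 is given below. It is an illustration only and not the hardware design described in this work: the function names, the short list of trial-division primes and the perfect-square check before the search for D are assumptions. The Lucas step uses the standard (not the strong) Lucas probable prime test with P = 1.

```python
from math import isqrt

SMALL_PRIMES = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47)


def jacobi(a, n):
    """Jacobi symbol J(a, n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def is_base2_strong_probable_prime(n):
    """Miller-Rabin test with the single base 2 (odd n > 2)."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    x = pow(2, d, n)
    if x in (1, n - 1):
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
        if x == 1:
            return False
    return False


def half(x, n):
    """Divide x (already reduced mod odd n) by 2 modulo n."""
    return (x + n) // 2 if x % 2 else x // 2


def is_lucas_probable_prime(n, d):
    """Check U_(n+1) = 0 (mod n) for P = 1, Q = (1 - d) / 4."""
    u, v = 1, 1                                     # U_1 = 1, V_1 = P = 1
    for bit in bin(n + 1)[3:]:                      # bits after the leading 1
        u, v = u * v % n, half((v * v + d * u * u) % n, n)        # k -> 2k
        if bit == "1":
            u, v = half((u + v) % n, n), half((d * u + v) % n, n)  # k -> k+1
    return u == 0


def baillie_psw(n):
    """Return True if n is a Baillie-PSW probable prime."""
    if n < 2 or (n != 2 and n % 2 == 0):
        return False
    if n == 2:
        return True
    for p in SMALL_PRIMES:                          # step 1: trial division
        if n % p == 0:
            return n == p
    if not is_base2_strong_probable_prime(n):       # step 2
        return False
    if isqrt(n) ** 2 == n:                          # no D exists for squares
        return False
    d = 5                                           # step 3
    while jacobi(d, n) != -1:
        d = -(d + 2) if d > 0 else 2 - d
    return is_lucas_probable_prime(n, d)            # steps 4 and 5
```

The number 1373653 = 829 · 1657 is a useful check: it passes the base-2 Miller-Rabin step but is rejected by the Lucas step, which illustrates why the two tests are combined.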

2.4 Hardware implementations

In recent years, citing confidence and security, many of the primality testing algorithms

are being implemented in hardware, both in FPGAs and ASICs. For practical

applications, the size of the key for RSA might need to be changed depending on

improvements in factorization algorithms. Similarly, the algorithm might need to be

modified to respond to change in standards or design flaws. Due to these factors,

reconfigurable hardware like FPGAs and CPLDs are preferred.

2.4.1 Miller-Rabin test

The Miller-Rabin test has received much attention in the academic community. Using variable

pipeline stages and variable serial replications in custom Montgomery multiplier and

exponentiation modules on a Xilinx Virtex-5 FPGA, Adrien Le Masle et al. [27] claim

speeds up to 2.2 times faster than software implementations. Another

implementation of the Miller-Rabin algorithm shows a significant improvement in speed, about 85 times, over previous hardware implementations with only 60% area overhead

[28]. The implementation in [29] takes a different approach by modifying the

Karatsuba-Ofman algorithm to obtain a less recursive algorithm and applying it to

multiplication in the Miller-Rabin test. This implementation, on Texas

Instruments’ TMS320C54x DSP family, achieves a 10-17% improvement over the

standard multiplication algorithm for RSA key lengths of more than 1024 bits.

2.4.2 Lucas probable prime test

A recent implementation of the Lucas probable prime test in software and on an FPGA

as well as an ASIC is discussed in [30]. The FPGA implementation on a Xilinx Virtex-5 is

30% slower but 3 times more energy efficient than the software implementation running on an Intel Xeon W3505. The same design in TSMC 65nm and 45nm ASIC implementations was 3.6 times faster and 400 times more energy efficient than the optimized software implementation. A pipelined modular add-shift module for the Lucas

sequences and a dedicated hardware architecture for the Jacobi symbol calculation were

crucial in achieving these results. The ASIC implementation is suitable for integration

into embedded systems whereas the FPGA implementation is directed towards server

applications.

2.4.3 Baillie-PSW primality test

To our knowledge, no hardware implementations (FPGA or ASIC) exist for the Baillie-

PSW test. Our work is the first implementation of this primality test.

3 Module Description and Implementation

Our implementation for the Baillie-PSW primality test consists of various small modules

which work together to implement the test. Steps 2 and 4 of the Baillie-PSW test as

described in section 2.3.3 were implemented as the two main sub-modules: the Miller-

Rabin module and the Lucas module. Apart from these main sub-modules, three more sub-modules were designed to facilitate data flow within the main sub-modules.

Furthermore, Step 1 of the Baillie-PSW test was implemented as the Divisibility module.

Also a Random Number Generator module was implemented to generate a 1024-bit random number. Brief descriptions of these modules are given below:

1. Miller-Rabin module - This module performs the Miller-Rabin primality test on

the number N to the base 2. If the module finds 2 to be a Miller-Rabin witness

to the compositeness of N, then N is declared composite by setting the output

signal to low. Otherwise, N is declared a strong probable prime by setting the

output signal to high.

2. Lucas module – This module performs the Lucas probable prime test on N using

specific values for the parameters D, P and Q. If N passes the test, then N is

declared a Lucas probable prime by setting the output signal to high. Otherwise,

N is declared composite by setting the output signal to low.

3. Modular multiplication module – This module uses the Montgomery modular

multiplication algorithm to compute the value of multiplication of two variables

modulo a third variable, i.e., output = input1 * input2 (mod input3). This module

was instantiated once in the Miller-Rabin module and twice in Lucas module.

4. Modular exponentiation module – This module computes the value of

exponentiation of the first variable to the second variable modulo the third

variable, i.e., output = input1^input2 (mod input3). This module uses the

Montgomery modular exponentiation algorithm and was instantiated once in the

Miller-Rabin module.

5. Jacobi symbol calculator module - This module returns the value of D to be

used in the Lucas module. For use in the Lucas module, D must satisfy the

following constraints: D ∈ {5, -7, 9, -11, …} such that J (D, N) = -1, where J (D, N)

is the Jacobi symbol. The algorithm used in this module is a binary systolic

algorithm.

6. Divisibility module – This module checks if the number N is divisible by any of

the first 1000 primes. If N is divisible by any such prime, then N is declared

composite. Else N is declared a probable prime.

7. Random Number Generator module – This module generates a 1024-bit random

number that is used in both the Baillie-PSW module and the Divisibility module.
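The behaviour of the Divisibility module (item 6) can be modelled in software with a short sketch. The names `first_primes`, `FIRST_1000_PRIMES` and `passes_trial_division` are assumptions, and the prime table is generated on the fly here, whereas a hardware design would typically read it from memory. The special case in which N is itself one of the small primes is handled for completeness, although it cannot occur for 1024-bit inputs.

```python
def first_primes(count):
    """Return the first `count` primes using trial division by earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < count:
        for p in primes:
            if p * p > candidate:           # no divisor up to sqrt(candidate)
                primes.append(candidate)
                break
            if candidate % p == 0:          # composite
                break
        else:
            primes.append(candidate)        # reached only for candidate = 2
        candidate += 1
    return primes


FIRST_1000_PRIMES = first_primes(1000)


def passes_trial_division(n):
    """Return False if one of the first 1000 primes properly divides n."""
    for p in FIRST_1000_PRIMES:
        if n % p == 0:
            return n == p
    return True
```

The 1000th prime is 7919, so any N that passes this filter has no prime factor below that value, and the more expensive Miller-Rabin and Lucas tests are only run on such candidates.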

Figure 1 shows how the two main sub-modules were integrated together in the

Baillie-PSW module (described in section 3.9). The in_ready signal prompts each module

to start implementing the respective test. Both in_ready signals were forced high simultaneously. This ensures that both modules run in parallel to obtain the fastest output. The outputs from both these modules were combined using a 2-input AND gate

to obtain the final output of the Baillie-PSW module. Based on the truth table of a 2-

input AND gate, both sub-modules need to declare N to be a probable prime for the Baillie-PSW module to declare N to be a probable prime.

Figure 1 Block diagram of the Baillie-PSW module

As can be seen in Figure 1, apart from an output signal, each of the main sub-modules

gives out an output_ready signal. This signal signifies the state of the output. If the

output_ready signal is low, it means that the respective module has not completed all

its computations and therefore reading the value of the output signal at this moment will result in an incorrect value. Therefore, each output signal must be read only if the

corresponding output_ready signal is high.

The Baillie-PSW module itself gives out an output_ready signal, which behaves the same

way as the output_ready signals of the sub-modules. The output signals and

output_ready signals from the sub-modules were combined together using a combiner.

Figure 2 describes the logic of the combiner. The outputs were ANDed together and fed

to the ‘high’ port of a 2x1 Multiplexer. The ‘low’ port is grounded. The output_ready

signals were ANDed together and fed to the ‘select’ port of the multiplexer as well as connected to the output_ready signal of the Baillie-PSW module. The output of the

multiplexer is connected to the output signal of the Baillie-PSW module.

Figure 2 Block diagram of Combiner logic
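The combiner logic described above can be modelled behaviourally in a few lines. This is a software model for illustration, not the Verilog used in this work; the function and argument names are assumptions, and Boolean values stand in for the high and low signal levels.

```python
def combiner(mr_output, mr_output_ready, lucas_output, lucas_output_ready):
    """Model of the combiner: returns (output, output_ready)."""
    # output_ready signals are ANDed; the result drives the mux select line
    # and the output_ready signal of the Baillie-PSW module.
    output_ready = mr_output_ready and lucas_output_ready
    # Sub-module outputs are ANDed and fed to the 'high' port of the 2x1 mux.
    both_probable_prime = mr_output and lucas_output
    # The 'low' port is grounded, so the output stays low until both are ready.
    output = both_probable_prime if output_ready else False
    return output, output_ready
```

Because the output is held low while either sub-module is still computing, a downstream module never sees a spurious high value, although it must still wait for output_ready before interpreting a low value as composite.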

Each of the seven modules is described in detail in sections 3.1 to 3.7. The algorithms

used in each of the modules are included in the module descriptions. Section 3.8

describes the RAM module (instantiated in the Baillie-PSW module). Section 3.9 and

3.10 describe the Baillie-PSW module and the Testbench module respectively. For each

section, N is the number to be tested and k is the bit-length of N.

The Verilog code for all these modules is included in the Appendices. The code listings

include module descriptions (including input, output and internal variables) and any

other module instantiations used within the module description.

3.1 Miller-Rabin module

This module performs a Miller-Rabin primality test with base 2. If N passes base-2 Miller-

Rabin test, then the module declares N to be a probable prime and exits. If N does not pass the test, then the module declares it to be composite and exits. The algorithm for the base-2 Miller-Rabin primality test [8] [31], can be described as follows:

Input: N

Output: Probable Prime if N passes the test; otherwise Composite

1. Find s, m such that $N - 1 = 2^s \cdot m$, with m odd; declare a = 2

2. Compute $X = a^m \bmod N$.

3. If X = 1 or X = N - 1, RETURN Probable Prime and EXIT.

4. FOR i = 1 to s-1 loop

a. Compute $X = X^2 \bmod N$.

b. If X = 1, RETURN Composite and EXIT.

c. If X = N - 1, RETURN Probable Prime and EXIT.

5. RETURN Composite.

To perform step 2 of the algorithm, the Modular exponentiation module is instantiated.

For step 4a, the Modular exponentiation module used in step 2 could be re-used to calculate the square of X. However, the Modular exponentiation module contains two

Modular multiplication module instances, whereas the square of X can be calculated by using a single Modular multiplication module instance, by passing X as both input parameters to achieve X * X. Therefore, instead of re-using the Modular exponentiation module to calculate the square of X, a separate Modular multiplication module was

instantiated to handle this computation. This decision was taken to favor speed over area. In certain applications, where area needs to be given advantage over speed, the exponentiation module can be re-used to calculate $X^2$ instead of the additional multiplication module. Figure 3 shows the data flow within the Miller-Rabin module.

When the in_ready signal becomes high, subsequent to the calculation of s and m, the

‘Check for Primality’ begins at step 1. Similarly, when abort signal becomes high, s and

m are re-calculated and ‘Check for Primality’ restarts at step 1.

Figure 3 Block diagram of the Miller-Rabin module

3.2 Lucas module

This module performs the Lucas probable prime test on the number N. If the number

passes the test, then the module declares it to be a probable prime and exits. Otherwise it declares the number to be composite. The algorithm for the Lucas test [26] can be described as follows:

Input: N

Output: Probable Prime if N passes the test; otherwise Composite

1. U_0 = 0, V_0 = 2.

2. Select D ∈ {5, -7, 9, -11, …} such that J(D, N) = −1. Set P = 1, Q = (1 − D) / 4.

3. Calculate U_{N − J(D, N)}, i.e., U_{N+1}.

4. If U_{N+1} ≡ 0 mod N: RETURN probable prime; else RETURN composite

The values of U and V in the algorithm are the Lucas sequences corresponding to the values of P, Q and D chosen in step 2. The recurrence relation between U_k and U_{k-1} and between V_k and V_{k-1} is as follows:

$$U_k(P, Q) = \frac{P \cdot U_{k-1}(P, Q) + V_{k-1}(P, Q)}{2}, \qquad V_k(P, Q) = \frac{D \cdot U_{k-1}(P, Q) + P \cdot V_{k-1}(P, Q)}{2}$$

Substituting the values of P, Q and D chosen in step 2, both equations can be rewritten as:

$$U_k = \frac{U_{k-1} + V_{k-1}}{2}, \qquad V_k = \frac{D \cdot U_{k-1} + V_{k-1}}{2}$$

Using these equations, U_{N+1} can be calculated. However, it is highly time-inefficient to calculate the values of U_k and V_k for all values of k = 1 to N + 1. Therefore, the relation between U_k and U_{2k} and between V_k and V_{2k} is used:

$$U_{2k} = U_k \cdot V_k, \qquad V_{2k} = V_k^2 - 2 \cdot Q^k$$

For the values of D, P and Q chosen in step 2, the equation for V_{2k} can be rewritten as:

$$V_{2k} = \frac{V_k^2 + D \cdot U_k^2}{2}$$

In addition to the congruence condition in step 4 of the Lucas test, another congruence condition can be checked to ascertain the primality of N. If N is an odd prime and J(D, N) = -1, then the following congruence holds true:

$$V_{N+1} \equiv 2Q \pmod{N}$$

Although this congruence is not a part of the Lucas primality test, checking it is essentially free, since V_{N+1} is calculated in the process of calculating U_{N+1}. Therefore, if either of the two congruence conditions is false, then N is not prime. If both conditions hold true, then the likelihood of N being a prime number increases.
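The doubling and single-step relations can be combined into a left-to-right binary ladder that reaches U_{N+1} and V_{N+1} in about log2(N) steps. The following Python sketch is illustrative only: the names lucas_uv and lucas_probable_prime are hypothetical, D is assumed to have been selected already so that J(D, N) = -1, N is assumed odd, and the additional check is written in its standard form V_{N+1} ≡ 2Q (mod N).

```python
def lucas_uv(n, d, k):
    """Return (U_k mod n, V_k mod n) for P = 1 and Q = (1 - d) / 4.

    Assumptions: n is odd and k >= 1 (hypothetical helper).
    """
    def half(x):
        # Division by 2 modulo odd n: make the value even, then shift.
        x %= n
        return (x + n if x & 1 else x) >> 1

    u, v = 1, 1  # U_1 = 1, V_1 = P = 1
    for bit in bin(k)[3:]:  # bits of k after the leading 1
        # Doubling step: index j -> 2j.
        u, v = (u * v) % n, half(v * v + d * u * u)
        if bit == "1":
            # Single step: index 2j -> 2j + 1.
            u, v = half(u + v), half(d * u + v)
    return u, v


def lucas_probable_prime(n, d):
    """Lucas test with the extra V check; assumes J(d, n) = -1."""
    q = (1 - d) // 4
    u, v = lucas_uv(n, d, n + 1)
    return u == 0 and v == (2 * q) % n
```

With D = 5 (so Q = -1), U_k and V_k are the Fibonacci and Lucas numbers, which gives a convenient sanity check: lucas_uv(10**9 + 7, 5, 28) yields (317811, 710647). Combining both conditions into one return value is a design choice of this sketch; the test as stated in step 4 uses only the U condition.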

Step 2 of the Lucas test is performed by the Jacobi symbol calculator module. A stronger version of the Lucas test [26] can be implemented using the following algorithm:

Input: N

Output: Probable Prime if N passes the test; otherwise Composite

1. U_0 = 0, V_0 = 2.

2. Select D ∈ {5, -7, 9, -11, …} such that J(D, N) = -1. Set P = 1, Q = (1 − D) / 4.

3. Select s, d such that N − J(D, N) = N + 1 = 2^s · d, with d odd.

4. If U_d ≡ 0 mod N: RETURN probable prime.

5. FOR r = 0 to s-1 loop

a. If V_{d·2^r} ≡ 0 mod N: RETURN probable prime

6. RETURN composite

Any composite number which passes the Lucas test (for certain parameters P and Q) is known as a Lucas pseudoprime (for the corresponding values of P and Q). Similarly, any composite number that passes the strong Lucas test (for certain parameters P and Q) is known as a strong Lucas pseudoprime (for the corresponding values of P and Q).

In this thesis, for the implementation of Baillie-PSW primality test, the standard Lucas primality test is used. Figure 4 shows the data flow between different blocks in the

Lucas module.

Figure 4 Block diagram of the Lucas module

3.3 Modular multiplication module

The Miller-Rabin test and the Lucas test use a large number of modular multiplications, as can be seen from the algorithms used to implement these modules. To speed up these tests, it is essential for the modular multiplication module to be fast. Therefore, the Montgomery modular multiplication algorithm [32] was chosen for the implementation of this module. The algorithm is described below:

Input: X, Y, N

Output: 2^{-k} · X · Y (mod N)

1. S = 0

2. FOR i = 0 to k-1 loop

a. q_i = (S + y_i · X) mod 2

b. S = (S + q_i · N + y_i · X)

c. S = S / 2

3. RETURN S

In this algorithm, y_i corresponds to the value of the i-th bit of Y when represented in binary format, i.e., $Y = \sum_{i=0}^{k-1} y_i \cdot 2^i$, where y_i ∈ {0, 1}. All divide operations required are divisions by 2, which are very cheap in hardware.

As can be seen from the algorithm, the output carries an additional factor (2^{-k}) beyond what is required for the tests.

$$S = X \cdot Y \cdot 2^{-k} \bmod N \qquad (1)$$

To counter this factor, the inputs X and Y are converted to their Montgomery form prior to their use in the algorithm. This means that both X and Y are multiplied by 2^k.

$$X_M = X \cdot 2^k \bmod N, \qquad Y_M = Y \cdot 2^k \bmod N$$

Now, using X_M and Y_M instead of X and Y in Equation 1 changes the factor in the output to 2^k.

$$S_M = X_M \cdot Y_M \cdot 2^{-k} \bmod N$$

$$S_M = X \cdot Y \cdot 2^k \bmod N$$

Now, to remove this factor of 2^k, a Montgomery multiplication of the output S_M and 1 needs to be performed. This multiplication re-introduces the 2^{-k} factor to compensate for the 2^k factor and returns the required output.

$$S = 1 \cdot S_M \cdot 2^{-k} \bmod N$$

$$S = X \cdot Y \bmod N$$

Note: In the last step, the multiplication is performed with S_M and the integer 1, NOT the Montgomery forms of S_M and 1.
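The bit-serial loop and the conversion into and out of Montgomery form can be modelled in Python as follows. This is a sketch under stated assumptions rather than the Verilog in the Appendix: the names mont_mul and mod_mul are hypothetical, N is assumed odd with N < 2^k and X < N, the final conditional subtraction is an added step that brings the result into the range [0, N), and the conversion to Montgomery form uses ordinary modular arithmetic as a shortcut.

```python
def mont_mul(x, y, n, k):
    """Bit-serial Montgomery product: x * y * 2^(-k) mod n.

    Assumptions: n is odd, n < 2^k, 0 <= x < n, 0 <= y < 2^k.
    """
    s = 0
    for i in range(k):
        y_i = (y >> i) & 1             # i-th bit of Y
        q_i = (s + y_i * x) & 1        # step 2a: parity decides whether to add N
        s = (s + q_i * n + y_i * x) >> 1  # steps 2b and 2c: sum is even, so halve
    # Added step (assumption): the loop leaves s < 2n, so subtract n once if needed.
    return s - n if s >= n else s


def mod_mul(x, y, n, k):
    """Plain modular product x * y mod n built from Montgomery products."""
    r = (1 << k) % n                # 2^k mod n
    x_m = (x * r) % n               # Montgomery form of X (shortcut conversion)
    y_m = (y * r) % n               # Montgomery form of Y
    s_m = mont_mul(x_m, y_m, n, k)  # equals X * Y * 2^k mod n
    return mont_mul(s_m, 1, n, k)   # multiply by the integer 1 to drop 2^k
```

For instance, with N = 101 and k = 7, mod_mul(55, 77, 101, 7) agrees with (55 * 77) % 101.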

The detailed Verilog code for the modular multiplication module in the Appendix includes the basic algorithm and all subsequent computations mentioned in this description.

3.4 Modular exponentiation module

In the Miller-Rabin test, the calculation of base^m modulo N is performed. For this purpose, the Modular exponentiation module is instantiated within the Miller-Rabin module. This module computes the value of input1^input2 mod input3. The Montgomery modular exponentiation algorithm [33] used in this module is described below:

Input: base, exponent, N

Output: base^exponent mod N

1. R = 1; base = base mod N

2. WHILE exponent > 0 loop

a. If exponent is odd: R = R · base mod N

b. exponent = exponent / 2; base = base · base mod N

3. RETURN R

This module uses two instances of the modular multiplication module: one to calculate (R · base) mod N in step 2a, the other to calculate (base · base) mod N in step 2b. Both calculations could have been completed using just one instance of the Modular multiplication module, first calculating (R · base) and then calculating (base · base). However, within each iteration of the while loop, the calculation of (R · base) uses the value of base from before the squaring in step 2b, so it does not depend on the result of (base · base). Therefore, both instances of the module can run in parallel, computing the new values of R and base together. The division of the exponent is by a factor of 2, which is very cheap in hardware.
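The independence of the two products can be made explicit in a short Python model in which both are computed from the same old values before either variable is updated. This is an illustrative sketch: the name mod_exp is hypothetical, ordinary modular multiplication stands in for the two Modular multiplication module instances, and N > 1 is assumed.

```python
def mod_exp(base, exponent, n):
    """Right-to-left binary exponentiation: base^exponent mod n (assumes n > 1)."""
    r = 1
    base %= n
    while exponent > 0:
        # Both products read the *old* value of base, mirroring the two
        # multiplier instances that run in parallel in hardware.
        new_r = (r * base) % n if exponent & 1 else r  # step 2a
        new_base = (base * base) % n                   # step 2b
        r, base = new_r, new_base
        exponent >>= 1  # divide the exponent by 2 (a shift in hardware)
    return r
```

The result can be compared against Python's built-in pow(base, exponent, n) for any inputs of interest.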

3.5 Jacobi symbol calculator module

This module calculates the value of the Jacobi symbol J (a, b) for two variables a and

b. It uses a binary Jacobi symbol algorithm [34] which can be easily implemented in

hardware and is efficient with respect to space and time. The algorithm used is

described below:

Input: a, b

Output: J (a, b)

1. t = 1

2. If a < 0:

a. a = -a

b. If b mod 4 = 3: t = -t

3. WHILE a ≠ 0 loop

a. WHILE a mod 2 = 0 loop

i. a = a / 2

ii. If (b mod 8 = 3) or (b mod 8 = 5): t = -t

b. If a < b:

i. interchange (a, b)

ii. If (a mod 4 = 3) and (b mod 4 = 3): t = -t

c. a = (a – b) / 2

d. If (b mod 8 = 3) or (b mod 8 = 5): t = -t

4. If b = 1: RETURN t; else: RETURN 0.

Step 3.b.i of this algorithm uses an interchange function. In Verilog, a simple way to achieve an interchange is to assign both variables to each other directly using non-blocking assignments, since all right-hand sides are evaluated before any left-hand side is updated. Therefore, the interchange function is implemented as follows:

interchange (a, b):

1. a <= b;

2. b <= a;

In this module, the binary algorithm is modified to suit the requirements of the Lucas test. In the Lucas test, the value of D is chosen from the set {5, -7, 9, -11, …} such that J(D, N) = −1. Therefore, with a = D and b = N, steps 1 - 3 are repeated for successive values of D until the value of t in step 4 is -1. At that point, the corresponding value of D is returned instead of t. This module therefore returns the value of D for which J(D, N) = -1, rather than the value of the Jacobi symbol itself. The data flow model of the module is shown in Figure 5.
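A Python model of the binary Jacobi algorithm and of the search for D might look as follows. It is a sketch, not the module's Verilog: the names jacobi and select_d are hypothetical, b is assumed odd and positive, a negative a is negated up front so that the loop only sees non-negative values, and the max_tries bound is an added safeguard because no D with J(D, N) = -1 exists when N is a perfect square.

```python
def jacobi(a, b):
    """Jacobi symbol J(a, b) for odd b > 0, using only shifts and subtractions."""
    t = 1
    if a < 0:
        a = -a  # assumption made explicit: continue with |a|
        if b % 4 == 3:
            t = -t
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if b % 8 in (3, 5):
                t = -t
        if a < b:
            a, b = b, a  # interchange; Python swaps atomically like non-blocking <=
            if a % 4 == 3 and b % 4 == 3:
                t = -t
        a = (a - b) // 2
        if b % 8 in (3, 5):
            t = -t
    return t if b == 1 else 0


def select_d(n, max_tries=1000):
    """First D in 5, -7, 9, -11, ... with J(D, n) = -1, or None if none is found."""
    d = 5
    for _ in range(max_tries):
        if jacobi(d, n) == -1:
            return d
        d = -(d + 2) if d > 0 else -(d - 2)  # 5 -> -7 -> 9 -> -11 -> ...
    return None
```

For example, select_d(7) returns 5, while select_d(15) skips 5, -7, 9 and -11 before returning 13.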

Figure 5 Block diagram of the Jacobi symbol calculator module

3.6 Divisibility module

This module checks the primality of a number by checking its divisibility with respect to the first 1000 primes. This module was implemented in software, using Java, to increase the spatial efficiency of the hardware design. If a larger device were chosen, this module could be implemented in hardware to increase the time efficiency of the design. This design choice is further explained in Section 4.1. The algorithm used is as follows:

Input: P = {P_1, P_2, …, P_1000}, N

Output: Probable Prime if N passes the test; otherwise Composite

1. FOR i = 1 to 1000 loop

a. If N mod P_i = 0: RETURN composite and EXIT

2. RETURN probable prime

In this algorithm, P is the set of the first 1000 primes, where P_i is the i-th prime and i ∈ [1, 1000]. If any P_i divides N exactly (and N ≠ P_i), then N cannot be prime, since P_i is a proper factor of N. Therefore, the module declares N to be composite. If after 1000 cycles of the loop none of the primes divides N exactly, then N is declared a probable prime, since no factor of N has been found among the first 1000 primes. This module was implemented in Java, using the BigInteger class to declare and initialize the values of N and all values of P. To calculate the remainder, the BigInteger remainder() method was used. If this module were implemented in hardware, a separate modulo reduction module, based on the Montgomery modulo reduction algorithm [35], would be instantiated to perform the remainder functionality.
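The divisibility check is small enough to model in a few lines. The sketch below uses Python rather than the Java of the actual module; the names first_primes and divisibility_test are hypothetical, and N is assumed to be far larger than the 1000th prime (7919), as it is for 1024-bit inputs.

```python
def first_primes(count):
    """Return the first `count` primes by trial division (fine for count = 1000)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # Only primes p with p * p <= candidate need to be checked.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def divisibility_test(n, primes):
    """True means 'probable prime': no listed prime divides n.

    Assumption: n is larger than every prime in the list.
    """
    for p in primes:
        if n % p == 0:
            return False  # composite: p is a factor of n
    return True


PRIMES = first_primes(1000)
```

A number such as (2**61 - 1) * (2**89 - 1) passes this test even though it is composite, which is why the remaining tests still have to run after a pass.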

In addition to the input variable N and the output variable output, a conditional input

variable next_N and an output variable abort were declared. If the Divisibility module

finds the current value of N to be composite, then it signals the Baillie-PSW module to

terminate the Miller-Rabin and the Lucas tests by setting the abort variable high.

Similarly, if the Baillie-PSW module declares the current value of N to be composite, despite the Divisibility module declaring it to be a probable prime, the next_N variable is set high by the Baillie-PSW module. This signals the Divisibility module to update the value of N, by incrementing it by 2, and restart the divisibility test using the updated value. This concept is explained in detail in Section 3.9. Figure 6 shows the functionality of the Divisibility module.

Figure 6 Block diagram of the Divisibility module

The data flow in the Divisibility module can be represented using a state machine as

shown in Figure 7 and Figure 8.

Figure 7 State machine of data flow within the software module

Figure 8 Explanation of input signals used in Figure 7

3.7 Random Number Generator (RNG) Module

For many of the probabilistic primality tests, testing is done by passing several random numbers through the primality test till a number is found that passes the test. However,

we evaluate our design using a different approach as described below:

1. A 1024-bit random number is generated and passed to the top-level module.

2. If the top-level module finds the number to be composite, the next number is generated by simply adding 2 to the previous number.

3. In this way, all subsequent numbers are generated until a prime number is found.

From the principles highlighted in the (unpublished) proof by G. Purdy [Appendix C], it can be seen that the method described in step 2 is much more cost effective than generating several random numbers, but has only a slightly higher probability of generating the same prime twice (instead of generating two distinct primes). Hence, this particular method was chosen. Precautions to avoid overflow were handled in both software and hardware.

To generate the 1024-bit random number in step 1, the random number generator described in [36] was used, with minor modifications to suit the requirements of our implementation. O’Neill’s generator produces a 32-bit random number. For our implementation, we generate 32 such random numbers to form a 32 × 32 = 1024-bit bit-stream. This bit-stream forms the 1024-bit random number required in step 1. This random number is then stored in a file that is read by the Divisibility module and the Testbench module to initialize their copies of the number.
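The assembly of 32 words into one 1024-bit value can be sketched as below. This is an illustration only: Python's secrets.randbits is used as a stand-in for O'Neill's generator, the names assemble_candidate and random_candidate are hypothetical, the word order (first word most significant) is an assumption, and forcing the lowest bit to 1 is one assumed way of obtaining the odd input that the design expects.

```python
import secrets


def assemble_candidate(words, word_bits=32):
    """Concatenate fixed-width words into one integer, first word most significant."""
    value = 0
    mask = (1 << word_bits) - 1
    for w in words:
        value = (value << word_bits) | (w & mask)
    return value


def random_candidate(word_count=32, word_bits=32):
    """Build a (word_count * word_bits)-bit odd starting value.

    Assumption: secrets.randbits replaces the 32-bit generator of [36].
    """
    words = [secrets.randbits(word_bits) for _ in range(word_count)]
    return assemble_candidate(words, word_bits) | 1  # force the value to be odd
```

With the default arguments, random_candidate() returns an odd integer below 2^1024.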

3.8 RAM module

In order to store the random number, generated by the RNG module, as well as the prime number, obtained from the Baillie-PSW module, a bi-directional dual-port RAM was generated using the IP module in the Quartus Prime software [37]. The RAM stores

2048 bits of memory: the random number in the first 1024 bits and the prime number in the latter 1024 bits. The data input and output buses for each port are 128 bits wide and the address bus for each port is 4 bits wide. Each port has an individual read/write enable signal and both ports use the same clock.

The RAM module initializes its memory using a Memory Initialization File (.mif file). The

Testbench module reads the random number from the software generated file and stores the number in this mif file in the required format. The RAM module then uses this file to initialize its memory. On completion of this, the Baillie-PSW module reads this data using the RAM module instantiation and passes this value to the sub-modules.

On finding a probable prime, the Baillie-PSW module uses the same instantiation to store this prime in the RAM memory. Subsequently, the required prime number can be read from the RAM.

3.9 Baillie-PSW module

In this module, the external structure of the Baillie-PSW test is implemented. This module reads a 1024-bit random number from the RAM and returns the next immediate integral number that passes the Baillie-PSW test in the 1024-bit range.

In this module, the two main sub-modules were instantiated: the Miller-Rabin module and the Lucas module. The two inputs to the Baillie-PSW module, the random number

N (read from the RAM) and the clock variable, were passed to these sub-modules through their respective instantiations. Both these modules return a value of 0 or 1 corresponding to N being composite or probable prime respectively. The return values from both sub-modules were ANDed together to compute the final result value. If the final result value is 1, indicating that N is found to be prime by both the sub-modules, the corresponding value of N is written back into the RAM. If the final result value is 0, this indicates that one or both of the sub-modules found the number N to be composite.

The Baillie-PSW module then increments N by 2, adjusts this new value to compensate for overflow, passes the new value to the sub-modules and restarts both of them.
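The control flow of the top-level module can be summarized in a short Python model. It is a sketch under stated assumptions: the name next_probable_prime is hypothetical, the two tests are passed in as functions returning 0 or 1, overflow is modelled as wrap-around modulo 2^bits (the exact overflow adjustment is not specified here), and the search stops with None if the candidate returns to its starting value.

```python
def next_probable_prime(n, miller_rabin, lucas, bits=1024):
    """Return the first candidate n, n + 2, n + 4, ... accepted by both tests.

    Assumptions: n is odd; overflow wraps modulo 2^bits (hypothetical model).
    """
    limit = 1 << bits
    start = n
    while True:
        # The two sub-module results (0 or 1) are ANDed together.
        if int(miller_rabin(n)) & int(lucas(n)):
            return n
        n = (n + 2) % limit  # next odd candidate, wrapping on overflow
        if n == start:
            return None      # every odd value in range was rejected
```

Any pair of predicates can be plugged in for experimentation, for example a simple trial-division check for both arguments when testing with small bit widths.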

Along with these two modules, the Baillie-PSW module also instantiates a RAM module.

The initial value of the random number generated in software is stored in this RAM module. The Baillie-PSW module reads this value of the random number and initializes both the main sub-modules with this number. Also, the final prime number found by the Baillie-PSW module is also stored in the RAM module.

Along with these tasks, the Baillie-PSW module also needs to signal the Divisibility module that the number N was found to be composite by one or both of the sub-modules. Therefore, if N is declared composite, it sets the next_N output signal high. The Divisibility module reads this value of the next_N signal and updates

its copy of the variable N, thus avoiding any synchronization issues. After this, the

Divisibility module restarts the Divisibility test on the updated value of N.

In addition to these features, this module also takes advantage of two facts:

1. For large values of N, many composite numbers have at least one prime factor

among the first 1000 primes.

2. For large values of N, the Divisibility module finishes all computations earlier

than the hardware. From our analysis, the Divisibility module is about two orders

of magnitude faster than the fastest of the two hardware modules.

Utilizing these two pieces of information, the Baillie-PSW module synchronizes with the

Divisibility module, with the use of the abort input signal, to terminate the current cycle of the Miller-Rabin test and Lucas test as soon as the Divisibility module declares

N to be composite. If the Divisibility module declares N to be composite, it is unnecessary to keep the Miller-Rabin or the Lucas test running. On the other hand, if N is declared a probable prime by the Divisibility module, then this is not sufficient proof for the primality of N. Therefore, in that case the remaining tests continue their execution to

completely determine the primality of N. The synchronicity of the Baillie-PSW module

and the Divisibility module is represented using the state diagram as shown in Figure 9

and Figure 10.

Figure 9 Synchronicity between Baillie-PSW module and Divisibility module

Figure 10 Explanation of input signals used in Figure 9

3.10 Testbench module

This module defines the testbench used to simulate and test the Baillie-PSW module

against various inputs. In this module, only the Baillie-PSW module is instantiated. This

module reads the value of the random number from the file generated by the RNG

module. This value of N is passed to the Baillie-PSW module through its instantiation

using the RAM module. The Testbench module also instantiates a variable in_ready that is connected to the in_ready signal of the Baillie-PSW module. As soon as transmission

of N to the RAM is complete, this variable is set high to indicate to the Baillie-PSW module that it can read the value of N from the RAM and start its computations. The

Testbench module also generates a clock signal with the required clock frequency. This clock is then passed to the Baillie-PSW module as shown in Figure 11. This provision of

clock is only made during simulation phase of the design. For the synthesis phase, the

input clock signal of the Baillie-PSW module is connected directly to the on-board clock pin.

Figure 11 Data flow within the Testbench module

Two other variables were also initialized in this module: out and out_ready. The out variable is connected to the output signal out of the Baillie-PSW module and holds the

value of the output. The out_ready variable is connected to the output_ready signal of

the Baillie-PSW module and indicates whether the Baillie-PSW module has completed

its calculations. The Testbench module follows the truth table highlighted in Table 2.

| out | out_ready | Action to be taken |
|-----|-----------|--------------------|
| 0 | 0 | Wait for Baillie-PSW module to finish computations |
| 0 | 1 | N is composite; update local copy of N |
| 1 | 0 | Wait for Baillie-PSW module to finish computations |
| 1 | 1 | Declare N as prime |

Table 2 Significance of output signals of Baillie-PSW module

Along with these tasks, the testbench also handles the synchronicity between the

software and the hardware. The testbench keeps checking the software sync file to

read any updated values of the abort signal. Upon receiving a high abort, the testbench

updates the sync file to acknowledge this, and the abort signal of the Baillie-PSW

module is updated. A next_N variable is also declared in the testbench and is connected to the next_N output signal of the Baillie-PSW module. As soon as this variable goes high, the testbench updates the value of next_N in the sync file to notify the software module that the current copy of N was found to be composite by the Baillie-PSW module. At this time, the software updates its local copy of N and restarts computations. On encountering a high abort signal or a high next_N signal, the testbench updates its local copy of N to avoid synchronicity issues between itself and the Baillie-PSW module. The testbench also keeps comparing the current value of N with the initially generated random number. If the current value of N returns to that number, the testbench displays a message indicating that no prime number was found in the 1024-bit range.

4 Results

This chapter discusses the results of our implementation in detail. Section 4.1 discusses the implementation of the design and some of the design choices that we have made

in the implementation. Section 4.2 talks about the synthesis and simulation phases of

the implementation. It provides a detailed record of the results of both these phases

(number of logic elements used, execution time, etc.). Section 4.3 compares our

implementation with prior implementations. All hardware syntheses and simulations

were performed using Altera tools [38].

4.1 Implementation

Using the Quartus Prime 15.1 Lite Edition design software, we implemented the

Baillie-PSW primality test on the Cyclone IV GX device family. The EP4CGX150DF31C8 device we used has a core voltage of 1.2 V, 149,760 logic elements, 508 total I/O pins, 464 general-purpose I/O pins, 8 PLLs and 30 global clocks. Apart from this particular

device, only a few Altera devices in the Cyclone III LS device family have a higher

number of logic elements. Although the higher number of logic elements would have

been beneficial for our design, the Cyclone III LS device family is not supported in

Quartus Prime 15.1, which supports parallel compilation. For the Cyclone III LS family,

we would need to use the Quartus II 13.0sp1 Web edition software and sacrifice this

parallel processing capability. Since parallel compilation proves to be a massive

advantage for large systems like ours, we decided to choose the Cyclone IV GX device

family and use the Quartus Prime 15.1, instead of the Cyclone III LS device family and

the associated Quartus II 13.0sp1.

To simplify our design for testing purposes, two major design choices were made:

1. The entire system was implemented in three phases:

a. First, the Miller-Rabin sub-system was implemented. This includes the

synthesis and RTL-simulation of the following three modules: the Modular

multiplication module, the Modular exponentiation module and the

Miller-Rabin module.

b. Next, the Lucas sub-system was implemented, including the synthesis and

RTL-simulation of the following three modules: the Modular multiplication

module, the Jacobi symbol calculator module and the Lucas module.

c. In the last phase, the main system was implemented. This includes the

synthesis of the two main sub-systems along with the Baillie-PSW module.

2. The Divisibility module was implemented in software. This enables the

implementation of the two sub-systems in the available logic elements without

significant deterioration in time efficiency of the main system. If the Cyclone III

LS device family were chosen, then this module could be moved to hardware,

owing to the larger number of logic elements in Cyclone III LS.

4.2 Synthesis and simulation

This section talks about the results of the synthesis and simulation (S&S) of our system.

As mentioned in Section 4.1, the S&S of our design has been broken down into three

phases. The following subsections talk in detail about the results of these three phases.

4.2.1 Miller – Rabin sub-system

For the implementation of this sub-system, the S&S of the following modules was

performed - the Modular multiplication module, the Modular exponentiation module

and the Miller-Rabin module. A 1024-bit number was chosen to be tested.

Figure 12 shows the results of the synthesis of the Miller-Rabin sub-system. Our design uses 55,841 of the available 149,760 logic elements, out of which 55,755 were

combinational functions and 14,765 were dedicated logic registers. This represents 37%

usage of the total available logic elements and combinational functions and 10% usage

of the available logic registers.

Figure 12 Miller-Rabin synthesis results

Figure 13 shows the results of the simulation phase of the Miller-Rabin sub-system in ModelSim-Altera Starter Edition 10.4b. To simulate an ideal scenario, at time t = 0 s, the inputs in_ready and reset were set to high and low respectively. Utilizing the clock summary information from the TimeQuest Timing Analyzer in Quartus Prime 15.1, the clock input signal clk is set to a clock frequency of 10 MHz (1 cycle in 100 ns, 50% duty cycle). As can be seen from Figure 13, the Miller-Rabin test, for this value of N, declares the number to be prime in 478.2 ms (precisely 4,782,003 clock cycles). Different numbers have different values of s and m, as described in the algorithm in Section 3.1. As a result, the time required to complete the Miller-Rabin test may differ for other values of N. On testing our sub-system with many different 1024-bit values of N, the average time taken by the Miller-Rabin sub-system to declare a number to be prime is 483 ms.

Figure 13 Miller-Rabin simulation results

4.2.2 Lucas sub-system

For the implementation of this sub-system, the S&S of the following 3 modules was

performed: the Modular multiplication module, the Jacobi symbol calculator module

and the Lucas module. Similar to the Miller-Rabin sub-system, a bit-length of 1024 was chosen for the number N to be tested.

Figure 14 shows the results of the synthesis of the Lucas sub-system. Our design uses

70,936 of the available 149,760 logic elements, out of which 70,926 were combinational

functions and 16,718 were dedicated logic registers. This represents 47% usage of the

total available logic elements and combinational functions and 11% usage of the available logic registers.

Figure 14 Lucas synthesis results

Figure 15 shows the results of the simulation phase of the Lucas sub-system in ModelSim-Altera Starter Edition 10.4b. Similar to the Miller-Rabin sub-system, to simulate an ideal scenario, at t = 0 s, the inputs in_ready and reset were set to high and low respectively. Utilizing the clock summary information from the TimeQuest Timing Analyzer in Quartus Prime 15.1, the clock input signal clk is also set to a clock frequency of 10 MHz (1 cycle in 100 ns, 50% duty cycle). As can be seen from Figure 15, the Lucas pseudoprime test, for this value of N, declares the number to be prime in 1.18 s (precisely 11,825,551 clock cycles). As with the Miller-Rabin test, different numbers yield different values of D, as described in the algorithm in Section 3.2, which in turn changes the time required to complete the Lucas test for the corresponding value of N. On testing our sub-system with various random values of N, the average time taken by the Lucas sub-system to declare a 1024-bit number to be prime is 1.18 s.

Figure 15 Lucas simulation results

4.2.3 Baillie-PSW system

The Baillie-PSW system is implemented by synthesizing the following modules: the Miller-Rabin sub-system, the Lucas sub-system and the Baillie-PSW module. The RAM module (instantiated within the Baillie-PSW module) was also synthesized. A 1024-bit number is chosen to be tested.

Figure 16 shows the results of the synthesis of the Baillie-PSW system. Our design uses

168,507 logic elements, with 168,507 combinational functions and 33,835 dedicated

logic registers. As can be seen, the total number of logic elements required is more

than the available logic elements on the device (149,760). To rectify this issue, the

Cyclone III LS device family could be used. But due to reasons explained in Section 4.1,

the design choice of splitting the S&S into three phases was taken instead. Subtracting the sub-system totals in Figure 12 and Figure 14 from the totals in Figure 16, it can be concluded that the Baillie-PSW module by itself uses only about 41,730 logic elements, 41,826 combinational functions and 2,352 dedicated registers.

For simulation results of the Baillie-PSW system, we utilize information from the Miller-Rabin and the Lucas sub-systems to extrapolate the results. From the design of our system, we know that the Baillie-PSW module waits for both tests to complete before it declares the number to be prime. Therefore, the slower sub-system, which takes the most time to declare a number to be prime, dictates the time taken for the Baillie-PSW system to complete. From Sections 4.2.1 and 4.2.2, it can be seen that the Lucas sub-system takes the longest time to declare a number prime. Therefore, the Baillie-PSW system will follow the same timeline as the Lucas sub-system and return a positive result for a prime number in an average time of 1.18 s.

Figure 16 Baillie-PSW synthesis results

A 1024-bit known prime number was used to obtain the simulation results discussed in Sections 4.2.1 and 4.2.2. We also ran our tests for two more numbers – both numbers pass the Divisibility test, but neither passes the Miller-Rabin or the Lucas test. The results for all three numbers are discussed in Table 3. P denotes that the number passed the test, NP denotes that it did not pass. The values of N1, N2 and N3 are given in Appendix E.

| Number | Divisibility test | Miller-Rabin test | Lucas test |
|--------|-------------------|-------------------|------------|
| N1 (1024-bit) | P (0.588 ms) | P (478.2 ms) | P (1.18 s) |
| N2 (1326-bit) | P (0.526 ms) | NP (838.48 ms) | NP (2.02 s) |
| N3 (333-bit) | P (0.681 ms) | NP (27.87 ms) | NP (67.89 ms) |

Table 3 Comparison of the results of 3 tests for 3 different numbers

4.3 Analysis

In this section, we compare our design with other implementations. Since no prior hardware implementation of the Baillie-PSW test exists, our Baillie-PSW system is compared with prior software implementations instead. Our two main sub-systems, however, are compared with previous hardware implementations.

4.3.1 Baillie-PSW system vs. Software implementations

For the Baillie-PSW system, the first software implementation we use is Luke

Smallman’s Python implementation [39]. We made a few changes to Smallman’s code to import the time module to calculate the execution time of the isprime() module

using the time() function. Figure 17 shows the output of the module for the same value

of N that was used to test our hardware implementation in Section 4.2.3.

Figure 17 Baillie-PSW Python implementation results

As can be seen from Figure 17, the software implementation takes 642.135 ms to

declare N to be prime. On testing various other 1024-bit numbers, we found that the

average time taken by the software implementation is 636.82 ms. Comparing this value

with the time taken by our implementation, we conclude that our implementation is about 1.85 times slower than this software implementation.

Apart from this, we compare our design to two other software implementations. First, we compare it to our own implementation of the Baillie-PSW test in Java [40], which uses the same principles used in the design of the hardware implementation. Second, we compare our design with the built-in Java method isProbablePrime() of the BigInteger class. The description and usage of this function are given in detail in [41].

From Figure 18, we can conclude that our design is about 22.7 times slower than the built-in Java function and about 6.5 times slower than our own software implementation. However, if we use the number of clock cycles as the measure of speed, accounting for the much faster clock used by the software implementations (about 2.2 GHz) compared with the clock that our system uses (10 MHz), our system performs better than the software implementations. So, we conclude that our system would perform better if it were implemented on a custom chip which could accommodate a faster clock. Table 4 summarizes the results of comparing our Baillie-PSW system with other software implementations.

Figure 18 Baillie-PSW Java implementation results

| | FPGA | Python | Java built-in | Java |
|---|------|--------|---------------|------|
| Execution time (ms) | 1182 | 636.82 | 52 | 180 |
| Execution time (# of clock cycles) | 1.18 × 10^7 | 1.46 × 10^9 | 1.14 × 10^8 | 3.96 × 10^8 |
| Clock speed (MHz) | 10 | 2200 | 2200 | 2200 |

Table 4 Baillie-PSW system vs. Software implementations
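The clock-cycle figures follow from multiplying the execution time by the clock frequency. A small Python sketch of that conversion is given below for the FPGA and Java columns; the function name clock_cycles is hypothetical, and the software clock of 2200 MHz is the approximate figure quoted in the text.

```python
def clock_cycles(time_ms, clock_mhz):
    """Cycles elapsed in time_ms at clock_mhz (1 ms at 1 MHz is 1000 cycles)."""
    return time_ms * clock_mhz * 1000


fpga_cycles = clock_cycles(1182, 10)          # FPGA design at 10 MHz
java_builtin_cycles = clock_cycles(52, 2200)  # built-in Java method at about 2.2 GHz
java_own_cycles = clock_cycles(180, 2200)     # our Java implementation at about 2.2 GHz

# Ratio of cycle counts: how many times more cycles the built-in Java method uses.
cycle_ratio = java_builtin_cycles / fpga_cycles
```

Measured this way, the FPGA design needs roughly a tenth of the cycles used by the built-in Java method, even though its wall-clock time is longer.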

4.3.2 Miller-Rabin and Lucas sub-systems vs. Hardware implementations

We also compare our two main sub-systems to prior hardware implementations. The Miller-Rabin sub-system performs well against its predecessor. The fastest scalable design discussed in [28] (with 8-bit word size and 32 processing elements) is about 3.68 times slower than our Miller-Rabin sub-system implementation. As for the Lucas sub-system, the FPGA implementation in [30] is more efficient than our Lucas sub-system. The implementation in [30] utilizes only 28% of the available resources and takes only 12.66 ms on the Xilinx Virtex-5 device, in comparison to the 47% resource utilization and 1.18 s on the Altera Cyclone IV GX device by our implementation. However, by further optimizing our design using the principles highlighted in Section 5.2, we estimate that our Lucas sub-system could perform well against this implementation. Table 5 summarizes the results of the analysis of our sub-systems’ performance in comparison to prior hardware implementations.

                                Our sub-system     Prior implementation
Execution time (Miller-Rabin)   483 ms @ 10 MHz    1776.8 ms @ 30 MHz
Execution time (Lucas)          1.18 s @ 10 MHz    12.66 ms @ 150 MHz

Table 5 Sub-systems vs. Hardware implementations

4.3.3 Power analysis

Since our main design did not fit on the device, we were unable to run a power analysis on our design. If the Cyclone III LS device were chosen, then the Total Thermal Power Dissipation, the Core Dynamic Power Dissipation and the I/O Thermal Power Dissipation could be calculated using the PowerPlay Power Analyzer Tool in Quartus.

5 Conclusions and Future Work

5.1 Conclusions

In this thesis, we briefly discussed cryptography and major primality algorithms, followed by the history of primes and their usage in the different types of cryptography.

We investigated some of the important primality tests and their implementations in hardware.

We thoroughly discussed our implementation and the various modules that make up our design. Some of the design choices and the reasons behind those choices were explained. The results of the synthesis and simulation of our implementation were discussed.

Since this was the first hardware implementation of the Baillie-PSW primality test, we mainly compared our design with prior software implementations. Our implementation did not perform well against software implementations. Our Miller-Rabin sub-system, however, showed a speed improvement over a previous hardware implementation. The flexibility and scalability of the FPGA hardware were key features of our implementation, and they will help in adapting the primality test if some of the criteria for prime generation in cryptography were to change in the future.

In the next section, we talk about some of the ways in which the results of our implementation could be improved.

5.2 Future Work

Future work on our implementation includes the design implementation on a custom layout (ASIC). The advantages of such an implementation are two-fold:

1. The custom design would help achieve higher clock speed and smaller size of layout, thus achieving higher efficiency in time and area.

2. Also, by decreasing the size of the layout, the Divisibility module could be implemented in hardware instead of software. This would further help in optimizing the Divisibility module itself in addition to increasing the speed of operation of the entire system.

Another avenue that can be explored is the optimization of the Modular multiplication module. In our design of this particular module, a standard modulo reduction technique was used while converting the inputs to the Montgomery form. By using the REDC algorithm [35] instead, the temporal and spatial efficiency of the multiplier could be greatly increased. This would enhance both sub-systems, thus optimizing the entire design.
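To make the proposed use of REDC concrete, the following Python sketch shows Montgomery reduction in the spirit of [35] and how it can replace an ordinary modulo operation when converting operands to the Montgomery form. It is only an illustration under our own assumptions: R = 2^k with k chosen so that R > N, N odd, and operands smaller than N. The helper names montgomery_setup, redc and mont_mul are hypothetical and do not appear in our Verilog design.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n with R = 2**k > n."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime = -1 (mod R); needs Python 3.8+
    r2 = (r * r) % n                 # R^2 mod n, computed once per modulus
    return r, n_prime, r2


def redc(t, n, k, n_prime):
    """Montgomery reduction: returns t * R^-1 mod n for 0 <= t < R*n.

    Only masks, multiplications, one addition and one shift are used;
    there is no division by n.
    """
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k             # exact: t + m*n is a multiple of R
    return u - n if u >= n else u


def mont_mul(a, b, n, k):
    """Compute a*b mod n for 0 <= a, b < n using REDC for every reduction."""
    _, n_prime, r2 = montgomery_setup(n, k)
    a_bar = redc(a * r2, n, k, n_prime)            # a*R mod n (Montgomery form)
    b_bar = redc(b * r2, n, k, n_prime)            # b*R mod n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)  # a*b*R mod n
    return redc(prod_bar, n, k, n_prime)           # back to ordinary form


if __name__ == "__main__":
    n = (1 << 127) - 1
    print(mont_mul(123456789, 987654321, n, 128))
```

In a hardware setting the constants returned by montgomery_setup would be computed once per value of N rather than on every multiplication, as this sketch does for brevity.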

If the REDC algorithm helps in fitting the design on the Cyclone IV GX device, then as discussed in Section 4.3.3, the total power analysis of the design can be performed to compare our implementation with other power-efficient designs. Using the results of this comparison, our implementation can be optimized further to make it more power efficient.

Bibliography

[1] W. Diffie and M. E. Hellman, "New Directions in Cryptography," IEEE Transactions on Information Theory, vol. 22, no. 6, pp. 644-654, November 1976.

[2] R. L. Rivest, A. Shamir and L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems," Communications of the ACM, vol. 21, no. 2, pp. 120-126, 1978.

[3] E. Barker, "Recommendation for Key Management - Part 1: General (Revision 4)," NIST Special Publication 800-57, January 2016.

[4] C. F. Kerry and P. D. Gallagher, "Digital Signature Standard," Federal Information Processing Standards Publication 186-4, July 2013.

[5] P. Pritchard, "Linear Prime-Number Sieves: A Family Tree," Science of Computer Programming, vol. 9, no. 1, pp. 17-35, 1987.

[6] P. Pritchard, "A Sublinear Additive Sieve for Finding Prime Numbers," Communications of the ACM, vol. 24, no. 1, pp. 18-23, 1981.

[7] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, "Primality Testing," in Introduction to Algorithms, 3rd ed., The MIT Press, 2009, pp. 965-968.

[8] M. O. Rabin, "Probabilistic Algorithm for Testing Primality," Journal of Number Theory, vol. 12, no. 1, pp. 128-138, 1980.

[9] R. Solovay and V. Strassen, "A fast Monte-Carlo test for primality," SIAM journal on Computing, vol. 6, no. 1, pp. 84-85, March 1977.

[10] H. C. Pocklington, "The determination of the prime or composite nature of large numbers by Fermat's theorem," Proceedings of the Cambridge Philosophical Society, vol. 18, pp. 29-30, 1914.

[11] L. M. Adleman, C. Pomerance and R. S. Rumely, "On distinguishing prime numbers from composite numbers," Annals of Mathematics, vol. 117, no. 1, pp. 173-206, 1983.

[12] A. O. L. Atkin and F. Morain, "Elliptic Curves and Primality Proving," Mathematics of computation, vol. 61, no. 203, pp. 29-68, 1993.

[13] M. Agrawal, N. Kayal and N. Saxena, "PRIMES Is in P," Annals of Mathematics, vol. 160, pp. 781-793, 2004.

[14] C. Pomerance, J. L. Selfridge and S. S. Wagstaff Jr., "The Pseudoprimes to 25 · 10^9," Mathematics of Computation, vol. 35, no. 151, pp. 1003-1026, July 1980.

[15] H. Davenport, "The fundamental theorem of arithmetic," in The higher arithmetic: an introduction to the theory of numbers, Cambridge University Press, 1999, pp. 9-12.

[16] J. Hadamard, "Sur la distribution des zéros de la fonction ζ(s) et ses conséquences arithmétiques.(On the distribution of zeros of the function ζ(s) arithmetic and its consequences.)," Bulletin de la Societé mathematique de France(Bulletin of Mathematical Society of France), vol. 24, pp. 199-220, 1896.

[17] C. Vallée-Poussin, "Recherches analytiques de la théorie des nombres premiers(Analytical research on the theory of prime numbers)," Annales de la Société scientifique de Bruxelles(Annals of the Brussels Scientific Society), vol. 20, pp. 183-256, 1896.

[18] B. Riemann, "On the Number of Prime Numbers less than a Given Quantity(Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse.)," Monatsberichte der Berliner Akademie, 1859.

[19] L. Schoenfeld, "Sharper Bounds for the Chebyshev Functions θ(x) and ψ(x). II.," Mathematics of Computation, vol. 30, no. 134, pp. 337-360, 1976.

[20] A. K. Lenstra, J. P. Hughes, M. Augier, J. W. Bos, T. Kleinjung and C. Wachter, "Ron was wrong, Whit is right," No. EPFL-REPORT-174943. IACR., 2012.

[21] T. R. Nicely, "The Baillie-PSW primality test," 10 June 2005. [Online]. Available: http://www.trnicely.net/misc/bpsw.html. [Accessed 16 June 2016].

[22] R. D. Carmichael, "On Composite Numbers P Which Satisfy the Fermat Congruence a^(P-1) ≡ 1 mod P," The American Mathematical Monthly, vol. 19, no. 2, pp. 22-27, 1912.

[23] S. Bandyopadhyay, "PRIMALITY TESTING A journey from Fermat to AKS," Chennai Mathematical Institute, Chennai.

[24] J. Grantham, "A Probable Prime Test With High Confidence," Journal of Number Theory, vol. 72, no. 1, pp. 32-47, 1998.

[25] C. Pomerance and H. W. Lenstra Jr., "Primality testing with Gaussian periods," Foundations of Software Technology and Theoretical Computer Science (FSTTCS), p. 1, 2002.

[26] R. Baillie and S. S. Wagstaff Jr., "Lucas Pseudoprimes," Mathematics of Computation, vol. 35, no. 152, pp. 1391-1417, October 1980.

[27] A. Le Masle, W. Luk, J. Eldredge and K. Carver, "Parametric Encryption Hardware Design," Reconfigurable Computing: Architectures, Tools and Applications, pp. 68-79, 2010.

[28] R. C. Cheung, A. Brown, W. Luk and P. Y. Cheung, "A Scalable Hardware Architecture for Prime Number Validation," International Conference on Field-Programmable Technology, pp. 177-184, December 2004.

[29] G. Dordevic and M. Markovic, "On Optimization of Miller-Rabin Primality Test on TI TMS320C54x Signal," 14th International Workshop on Systems, Signals and Image Processing and 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services, pp. 229-232, 2007.

[30] A. Le Masle, W. Luk and C. A. Moritz, "Parametrized Hardware Architectures for the Lucas Primality Test," International Conference on Embedded Computer Systems (SAMOS), pp. 124-131, 2011.

[31] G. L. Miller, "Riemann's Hypothesis and Tests for Primality," Journal of Computer and System Sciences, vol. 13, no. 3, pp. 300-317, 1976.

[32] J. Fry and M. Langhammer, "RSA & Public Key Cryptography in FPGAs," Altera Document, 2005.

[33] B. Schneier, "11.3 Number Theory," in Applied cryptography: protocols, algorithms, and source code in C, 2nd ed., New York, J. Wiley & Sons, 1996, pp. 203-204.

[34] J. Shallit and J. Sorenson, "A Binary Algorithm for the Jacobi Symbol," ACM SIGSAM Bulletin, vol. 27, no. 1, pp. 4-11, 1993.

[35] P. L. Montgomery, "Modular multiplication without trial division," Mathematics of computation, vol. 44, no. 170, pp. 519-521, 1985.

[36] M. E. O'Neill, "Download the PCG Library," 2015. [Online]. Available: http://www.pcg-random.org/. [Accessed 16 June 2016].

[37] Altera Corporation, "Introduction to Altera IP Cores," 5 February 2016. [Online]. Available: https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/ug_intro_to_megafunctions.pdf. [Accessed 4 July 2016].

[38] Altera Corporation, "Design Software - Overview," Altera Corporation, [Online]. Available: https://www.altera.com/products/design-software/overview.html. [Accessed 6 July 2016].

[39] L. Smallman, "Python implementation of the Baillie-PSW probabilistic primality test.," GitHub, 11 December 2013. [Online]. Available: https://github.com/smllmn/baillie-psw. [Accessed 28 June 2016].

[40] Y. V. Kasarabada, "kasarayv/baillie-psw-java-," 8 July 2016. [Online]. Available: https://github.uc.edu/kasarayv/baillie-psw-java-. [Accessed 8 July 2016].

[41] Oracle, "BigInteger (Java Platform SE 7)," Oracle, [Online]. Available: https://docs.oracle.com/javase/7/docs/api/java/math/BigInteger.html#isProbablePrime(int). [Accessed 28 June 2016].

[42] The Apache Software Foundation, "Apache License, Version 2.0," The Apache Software Foundation, January 2004. [Online]. Available: http://www.apache.org/licenses/LICENSE-2.0. [Accessed 1 July 2016].

[43] A.-M. Legendre, Essai sur la theorie des nombres, 1808.

[44] D. H. Lehmer, "Tests for primality by the converse of Fermat's theorem," Bulletin of the American Mathematical Society, vol. 33, no. 3, pp. 327-340, 1927.

[45] R. C. Baker, G. Harman and J. Pintz, "The difference between consecutive primes, II," Proceedings of the London Mathematical Society, vol. 83, no. 03, pp. 532-562, 2001.

[46] H. Cramer, "On the order of magnitude of the difference between consecutive primes.," Acta Arithmetica, vol. 2, no. 1, pp. 23-46, 1936.

Appendices

The program codes for the different modules described in chapter 3 are discussed in appendices A and B. Appendix C gives the proof by G. Purdy discussed in section 3.7. Appendix D gives a brief explanation of the Lucas sequences and the Jacobi symbol and highlights an example of the Lucas probable prime test.

All modules coded in Verilog are included in appendix A; appendix B holds the software modules coded in Java and C. Certain variables used in some of the modules follow a common nomenclature. Some of these variables are:

• N – The number that is being tested for primality.

• NLEN – This parameter defines the bit-length of N.

• clk – This variable is used to declare the clock used by all the modules.

• in_ready – Each hardware module has an input named in_ready. This input variable signals the respective module to commence computations.

• out – The main output of each hardware module. The bit-length of this variable depends on the respective module. For the Baillie-PSW module and the two main sub-modules, out is a single-bit variable which signifies whether N is prime or not: 0 means composite, 1 means prime. For all other modules, the value held by out depends on the respective module.

• out_ready – Similar to in_ready, out_ready is an output signal which signifies the completion of all computations by the respective hardware module. The value of out should be read only after out_ready becomes high; any intermediate value of out must be disregarded.

Figure 19 (a) shows the data flow between the RNG module, the Divisibility module and the Testbench module. Figure 19 (b) shows a rough interconnect of the various modules in the Baillie-PSW module. Figure 20 shows the basic flowchart of the global data flow.

Figure 19 (a) Global data flow (b) Data flow in Baillie-PSW module

Figure 20 Global flowchart
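The global flow can be summarized as a search loop: starting from an odd number, each candidate is first screened by the Divisibility module and then by the two probable prime tests, and the candidate is increased by 2 until one passes. The Python sketch below models only this control flow. The names next_probable_prime, divisible_by_small_prime and SMALL_PRIMES are hypothetical, the list of small primes is an illustrative stand-in for whatever the Divisibility module actually checks, and the trial-division function used in the demonstration is a placeholder for the combined Miller-Rabin and Lucas tests.

```python
from math import isqrt

# Illustrative stand-in for the divisors checked by the Divisibility module.
SMALL_PRIMES = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31)


def divisible_by_small_prime(n):
    """True if n has one of the small primes as a proper divisor."""
    return any(n % p == 0 and n != p for p in SMALL_PRIMES)


def next_probable_prime(n, passes_tests):
    """Return the first odd candidate >= n accepted by passes_tests.

    passes_tests plays the role of the Miller-Rabin and Lucas sub-systems;
    it is only called when the small-prime screen does not reject n.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("the search expects an odd starting value >= 3")
    while True:
        if not divisible_by_small_prime(n) and passes_tests(n):
            return n
        n += 2  # move on to the next odd candidate


def trial_division_is_prime(n):
    """Placeholder test for small demonstrations only (assumes odd n >= 3)."""
    return all(n % d for d in range(3, isqrt(n) + 1, 2))


if __name__ == "__main__":
    print(next_probable_prime(1001, trial_division_is_prime))
```

In the actual system the screen and the two tests run concurrently and are synchronized through the abort and next_N signals, which this sequential sketch does not attempt to model.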

Appendix A – Hardware modules

Miller-Rabin module

//Author: Yasaswy Kasarabada
//Date: June 29, 2016

/****************************************************************************
This module performs the Miller-Rabin test on the input N.
clk is the clock input.
in_ready and reset control data flow within the module.
out indicates primality of N: 0 – composite, 1 – prime.
out_ready indicates readiness of out.
****************************************************************************/
module MillerRabin
#(
    parameter NLEN = 1024,
    parameter TAG = 5
)(
    input signed [NLEN:0] N,
    input in_ready,
    input clk,
    input reset,
    output out,
    output reg out_ready
);

//Local variables
localparam INITVAL = -1;
reg signed [NLEN:0] m;
integer turn, s;
reg signed [NLEN+TAG:0] val;
reg temp;
reg signed [3:0] state = INITVAL;

//////////////////////SUB-MODULE INSTANTIATIONS//////////////////////
reg exp_in_ready=0, mult_in_ready=0, mult_reset=0, exp_reset=0;
wire signed [NLEN+TAG:0] exp_out, mult_out;
wire exp_out_ready, mult_out_ready;

//Modular exponentiation module instantiation
ModExp #(
    .NLEN (NLEN),
    .TAG (TAG)
) exp (
    .exp (m),
    .N (N),
    .in_ready (exp_in_ready),
    .clk (clk),
    .reset (exp_reset),
    .out (exp_out),
    .out_ready (exp_out_ready)
);

//Modular multiplication module instantiation
ModMult #(
    .NLEN (NLEN),
    .TAG (TAG)
) mult (
    .in1 (val),
    .in2 (val),
    .N (N),
    .in_ready (mult_in_ready),
    .clk (clk),
    .reset (mult_reset),
    .out (mult_out),
    .out_ready (mult_out_ready)
);
//////////////////SUB-MODULE INSTANTIATIONS COMPLETE/////////////////

//Main code
always @(posedge clk) begin
    if(reset==1) begin
        out_ready <= 0;
        state <= INITVAL;
    end
    else begin

    case (state)

    INITVAL : if(in_ready==1) begin
        state <= 0;
        s <= 0;
        m <= N - 1;
        out_ready <= 0;
        val <= 1;
        temp <= 0;
    end

    0 : if(m[0]==0) begin
        s <= s + 1;
        m <= m >>> 1;
    end
    else begin
        state <= 1;
    end

    1 : begin
        state <= 2;
        if(exp_in_ready==1)
            exp_reset <= 1;
        else
            exp_in_ready <= 1;
    end

    2 : begin
        exp_reset <= 0;
        if(exp_out_ready==1) begin
            if(exp_out==1 || exp_out==N-1) begin
                temp <= 1;
                state <= 7;
            end
            else begin
                if(s==1) begin
                    temp <= 0;
                    state <= 7;
                end
                else begin
                    val <= exp_out;
                    turn <= 1;
                    state <= 3;
                end
            end
        end
    end

    3 : begin
        state <= 4;
        if(mult_in_ready==1)
            mult_reset <= 1;
        else
            mult_in_ready <= 1;
    end

    4 : begin
        mult_reset <= 0;
        state <= 5;
    end

5 : if(mult_out_ready==1) begin if(mult_out==1) begin temp <= 0; state <= 7; end else if(mult_out==N-1) begin temp <= 1; state <= 7; end else begin if(turn

    6 : begin
        temp <= 0;
        state <= 7;
    end

    7 : begin
        out_ready <= 1;
    end

    //For testing purposes only
    default : begin
        $display("Error : Invalid case");
    end

    endcase
    end
end

//Assign the value of temp variable to the output wire out
assign out = temp;
endmodule
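As a software cross-check for the state machine above, the base-2 strong probable prime test can be sketched in a few lines of Python. This is the textbook form of the test, written by us for illustration; it is not the Java implementation from Appendix B. The function name strong_probable_prime_base2 is hypothetical, and the base is fixed at 2 because the Modular exponentiation module initializes its base to 2.

```python
def strong_probable_prime_base2(n):
    """Base-2 Miller-Rabin (strong probable prime) test for odd n >= 3."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")

    # Write n - 1 = m * 2**s with m odd (states INITVAL and 0 in the module).
    m, s = n - 1, 0
    while m % 2 == 0:
        m >>= 1
        s += 1

    # First check: 2**m mod n equal to 1 or n - 1 means probable prime.
    x = pow(2, m, n)
    if x == 1 or x == n - 1:
        return True

    # Up to s - 1 further squarings, looking for n - 1.
    for _ in range(s - 1):
        x = (x * x) % n
        if x == n - 1:
            return True
        if x == 1:          # reached 1 without passing through n - 1
            return False
    return False


if __name__ == "__main__":
    for candidate in (97, 341, 2047):
        print(candidate, strong_probable_prime_base2(candidate))
```

The value 2047 = 23 x 89 is composite but passes this test, which is why the Baillie-PSW test pairs it with a Lucas test.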

Lucas module

//Author: Yasaswy Kasarabada
//Date: June 29, 2016

/****************************************************************************
This module performs the Lucas primality test on the input N.
clk is the clock input.
in_ready and reset control data flow within the module.
out indicates primality of N: 0 – composite, 1 – prime.
out_ready indicates readiness of out.
****************************************************************************/
module Lucas
#(
    parameter NLEN = 1024,
    parameter TAG = 5
)(
    input signed [NLEN:0] N,
    input in_ready,
    input clk,
    input reset,
    output out,
    output reg out_ready
);

//Local variables
localparam INITVAL = -1;
reg signed [NLEN+TAG:0] u,v,Nplus1,tempN,temp1;
reg temp=0;
reg signed [4:0] state = INITVAL;
integer i;
wire signed [5:0] d;

//////////////////////SUB-MODULE INSTANTIATIONS//////////////////////
reg signed [NLEN+TAG:0] mult1_in1,mult1_in2,mult2_in1,mult2_in2;
reg mult1_in_ready=0,mult2_in_ready=0,mult1_reset=0,mult2_reset=0;
wire signed [NLEN+TAG:0] mult1_out,mult2_out;
wire mult1_out_ready,mult2_out_ready;
reg jacobi_in_ready=0,jacobi_reset=0;
wire jacobi_out_ready;

//Jacobi symbol calculator module
Jacobi #(
    .NLEN (NLEN)
) jac (
    .N (N),
    .in_ready (jacobi_in_ready),
    .clk (clk),
    .reset (jacobi_reset),
    .out (d),
    .out_ready (jacobi_out_ready)
);

//Two Modular multiplication module instantiations
ModMult #(
    .NLEN (NLEN),
    .TAG (TAG)
) mult1 (
    .in1 (mult1_in1),
    .in2 (mult1_in2),
    .N (N),
    .in_ready (mult1_in_ready),
    .clk (clk),
    .reset (mult1_reset),
    .out (mult1_out),
    .out_ready (mult1_out_ready)
);

ModMult #(
    .NLEN (NLEN),
    .TAG (TAG)
) mult2 (
    .in1 (mult2_in1),
    .in2 (mult2_in2),
    .N (N),
    .in_ready (mult2_in_ready),
    .clk (clk),
    .reset (mult2_reset),
    .out (mult2_out),
    .out_ready (mult2_out_ready)
);
//////////////////SUB-MODULE INSTANTIATIONS COMPLETE/////////////////

//Main code
always @(posedge clk) begin
    if(reset==1) begin
        state <= INITVAL;
        out_ready <= 0;
        temp <= 0;
        jacobi_reset <= 1;
    end
    else begin
    case (state)

    INITVAL : if(in_ready==1) begin
        u <= 1;
        v <= 1;
        Nplus1 <= N + 1;
        jacobi_in_ready <= 1;
        jacobi_reset <= 0;
        out_ready <= 0;
        i <= 0;
        tempN <= N;
        state <= 0;
    end

    0 : if(tempN>0) begin
        i <= i + 1;
        tempN <= tempN >>> 1;
    end
    else begin
        if(jacobi_out_ready==1) begin
            state <= 1;
            i <= i - 2;
        end
    end

    1 : begin
        state <= 2;
        mult1_in1 <= u;
        mult1_in2 <= u;
        if(mult1_in_ready==1)
            mult1_reset <= 1;
        else
            mult1_in_ready <= 1;
        mult2_in1 <= v;
        mult2_in2 <= v;
        if(mult2_in_ready==1)
            mult2_reset <= 1;
        else
            mult2_in_ready <= 1;
    end

    2 : begin
        mult1_reset <= 0;
        mult2_reset <= 0;
        state <= 3;
    end

    3 : if(mult1_out_ready==1 && mult2_out_ready==1) begin
        mult1_in1 <= d;
        mult1_in2 <= mult1_out;
        mult1_reset <= 1;
        mult2_in1 <= u;
        mult2_in2 <= v;
        mult2_reset <= 1;
        temp1 <= mult2_out;
        state <= 4;
    end

    4 : begin
        mult1_reset <= 0;
        mult2_reset <= 0;
        state <= 5;
    end

    5 : if(mult1_out_ready==1 && mult2_out_ready==1) begin
        u <= mult2_out;
        v <= temp1 + mult1_out;
        state <= 6;
    end

    6 : begin //make v2 even, if odd
        if(v[0]==1) begin
            if(v>=N)
                v <= v - N;
            else
                v <= v + N;
        end
        else begin
            v <= v >>> 1;
            state <= 7;
        end
    end

    7 : if(Nplus1[i]==1) begin
        u <= u + v;
        mult1_in1 <= d;
        mult1_in2 <= u;
        mult1_reset <= 1;
        state <= 8;
    end
    else begin
        state <= 11;
    end

    8 : begin
        mult1_reset <= 0;
        if(u[0]==1) begin
            if(u>=N)
                u <= u - N;
            else
                u <= u + N;
        end
        else begin
            u <= u >>> 1;
            state <= 9;
        end
    end

    9 : if(mult1_out_ready==1) begin
        v <= mult1_out + v;
        state <= 10;
    end

    10: begin //make v2 even, if odd
        if(v[0]==1) begin
            if(v>=N)
                v <= v - N;
            else
                v <= v + N;
        end
        else begin
            v <= v >>> 1;
            state <= 11;
        end
    end

    11: if(i>0) begin
        i <= i - 1;
        state <= 1;
    end
    else begin
        state <= 12;
    end

    12: begin
        if(u==0 || u==N) begin
            temp <= 1;
            state <= 13;
        end
        else begin
            temp <= 0;
            state <= 13;
        end
    end

    13: begin
        out_ready <= 1;
    end

    //For testing purposes only
    default : begin
        $display("Error : Invalid case");
    end

    endcase
    end
end

//Assign the value of temp variable to the output wire out
assign out = temp;
endmodule
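The recurrences driven by the state machine above can be checked against a short software model. The Python sketch below evaluates U_(N+1) mod N for the Lucas sequence with P = 1 and Q = (1 - D)/4, using the index-doubling and index-increment steps that states 1 to 10 appear to implement, including the halving modulo N used to keep values even before the shift. The function names half_mod and lucas_probable_prime are hypothetical, and the caller is assumed to supply a D whose Jacobi symbol with respect to N is -1.

```python
def half_mod(x, n):
    """Return x/2 mod n for odd n: make x even by adding n, then shift."""
    x %= n
    if x & 1:
        x += n
    return x >> 1


def lucas_probable_prime(n, d):
    """Lucas probable prime test with P = 1, Q = (1 - d) / 4.

    Assumes n is odd and Jacobi(d, n) = -1. Returns True when
    U_(n+1) is congruent to 0 modulo n.
    """
    u, v = 1, 1                      # U_1 = 1, V_1 = P = 1
    for bit in bin(n + 1)[3:]:       # bits of n + 1 below the leading 1
        # Index doubling: U_2k = U_k * V_k, V_2k = (V_k^2 + d * U_k^2) / 2
        u, v = (u * v) % n, half_mod(v * v + d * u * u, n)
        if bit == "1":
            # Index increment: U_(k+1) = (U_k + V_k) / 2,
            #                  V_(k+1) = (d * U_k + V_k) / 2
            u, v = half_mod(u + v, n), half_mod(d * u + v, n)
    return u == 0


if __name__ == "__main__":
    print(lucas_probable_prime(11, 13))   # 11 is prime, Jacobi(13, 11) = -1
    print(lucas_probable_prime(15, 13))   # 15 is composite
```

The composite 323 = 17 x 19 passes this test with D = 5, so the Lucas test, like the Miller-Rabin test, has pseudoprimes of its own; the strength of Baillie-PSW comes from combining the two.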

Modular multiplication module

//Author: Yasaswy Kasarabada
//Date: June 29, 2016

/****************************************************************************
This module performs Modular multiplication on the inputs in1, in2 and N.
clk is the clock input.
in_ready and reset control data flow within the module.
out returns the value of in1*in2 (mod N).
out_ready indicates readiness of out.
****************************************************************************/
module ModMult
#(
    parameter NLEN = 1024,
    parameter TAG = 2
)(
    input signed [NLEN+TAG:0] in1,
    input signed [NLEN+TAG:0] in2,
    input signed [NLEN:0] N,
    input in_ready,
    input clk,
    input reset,
    output reg signed [NLEN+TAG:0] out,
    output reg out_ready
);

//Local variables
localparam INITVAL = -1;
reg signed [2:0] state = INITVAL;
reg signed [NLEN+TAG:0] temp1,temp2;
integer i,j,k;

//Main code
always @(posedge clk) begin
    if(reset==1) begin
        out_ready <= 0;
        state <= INITVAL;
    end
    else begin
    case (state)

    INITVAL : if(in_ready==1) begin
        state <= 0;
        out_ready <= 0;
        out <= 0;
        if(in1<0) begin
            temp1 <= (in2<0) ? -in1 : in2;
            temp2 <= (in2<0) ? -in2 : in1;
        end
        if(in1>0) begin
            temp1 <= in1;
            temp2 <= in2;
        end
        if(in1==0 || in2==0) begin
            j <= NLEN;
            out <= 0;
            state <= 3;
        end
        i <= 0;
        j <= 0;
        k <= 0;
    end

    0 : begin
        temp1 <= temp1 <<< 1;
        temp2 <= temp2 <<< 1;
        state <= 1;
        k <= k + 1;
    end

1 : begin if(temp1=NLEN) begin out <= 0; state <= 2; end end if(temp1>=N && temp2>=N) begin temp1 <= temp1 - N; temp2 <= temp2 - N; end if(temp1>=N && temp2=N) temp2 <= temp2 - N; end

2 : begin if(i>>1; else out <= (out+((out[0]==1)?N:0))>>>1; end else state <= 3; end

3 : if(j>> 1; j <= j + 1; end else begin out_ready <= 1; end

    //For testing purposes only
    default : begin
        $display("Error : Invalid case");
    end

    endcase
    end
end
endmodule

Modular exponentiation module

//Author: Yasaswy Kasarabada
//Date: June 29, 2016

/****************************************************************************
This module performs Modular exponentiation on base and the inputs exp, N.
clk is the clock input.
in_ready and reset control data flow within the module.
out returns the value of base^exp (mod N).
out_ready indicates readiness of out.
****************************************************************************/
module ModExp
#(
    parameter NLEN = 1024,
    parameter TAG = 2
)(
    input signed [NLEN:0] exp,
    input signed [NLEN:0] N,
    input in_ready,
    input clk,
    input reset,
    output reg signed [NLEN+TAG:0] out,
    output reg out_ready
);

//Local variables
localparam INITVAL = -1;
reg signed [2:0] state = INITVAL;
reg flag;
reg signed [NLEN:0] exp_local;
reg signed [NLEN+TAG:0] base;

//////////////////////SUB-MODULE INSTANTIATIONS//////////////////////
reg modm_in_ready=0,base2_in_ready=0,modm_reset=0,base2_reset=0;
wire modm_out_ready,base2_out_ready;
wire signed [NLEN+TAG:0] modm_out,base2_out;

//Modular multiplication module instantiation
//to calculate out*base mod N
//(module name written as ModMult to match its declaration;
//Verilog identifiers are case-sensitive)
ModMult #(
    .NLEN (NLEN),
    .TAG (TAG)
) modm (
    .in1 (base),
    .in2 (out),
    .N (N),
    .in_ready (modm_in_ready),
    .clk (clk),
    .reset (modm_reset),
    .out (modm_out),
    .out_ready (modm_out_ready)
);

//Modular multiplication module instantiation
//to calculate base*base mod N
ModMult #(
    .NLEN (NLEN),
    .TAG (TAG)
) base2 (
    .in1 (base),
    .in2 (base),
    .N (N),
    .in_ready (base2_in_ready),
    .clk (clk),
    .reset (base2_reset),
    .out (base2_out),
    .out_ready (base2_out_ready)
);
//////////////////SUB-MODULE INSTANTIATIONS COMPLETE/////////////////

//Main code
always @(posedge clk) begin
    if(reset==1) begin
        out_ready <= 0;
        state <= INITVAL;
    end
    else begin
    case (state)

    INITVAL : if(in_ready==1) begin
        out <= 1;
        out_ready <= 0;
        exp_local <= exp;
        base <= 2;
        flag <= 0;
        state <= 0;
    end

    0 : if(exp_local>0) begin
        state <= 1;
        //Compute val = val * base (mod N)
        if(exp_local[0]==1) begin
            flag <= 1;
            if(modm_in_ready==1)
                modm_reset <= 1;
            else
                modm_in_ready <= 1;
        end
        //Compute base = base * base (mod N)
        if(base2_in_ready==1)
            base2_reset <= 1;
        else
            base2_in_ready <= 1;
    end
    else
        state <= 3;

    1 : begin
        modm_reset <= 0;
        base2_reset <= 0;
        state <= 2;
    end

    2 : if(flag==1) begin //if exponent is odd
        if(modm_out_ready==1 && base2_out_ready==1) begin
            exp_local <= exp_local >>> 1;
            base <= base2_out;
            flag <= 0;
            out <= modm_out;
            state <= 0;
        end
    end
    else begin //if exponent is even
        if(base2_out_ready==1) begin
            exp_local <= exp_local >>> 1;
            base <= base2_out;
            state <= 0;
        end
    end

    3 : begin
        out_ready <= 1;
    end

    //For testing purposes only
    default : begin
        $display("Error : Invalid case");
    end

    endcase
    end
end
endmodule
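The loop formed by states 0 to 2 is a right-to-left binary (square-and-multiply) exponentiation with the base fixed at 2. A minimal Python model of the same schedule is given below for comparison with simulation output; the function name mod_exp_base2 is hypothetical, and the two multiplications that the hardware issues in parallel are simply performed one after the other here.

```python
def mod_exp_base2(exp, n):
    """Compute 2**exp mod n by scanning the exponent from its lowest bit."""
    out, base = 1, 2
    while exp > 0:
        if exp & 1:                  # exponent bit set: fold base into out
            out = (out * base) % n
        base = (base * base) % n     # square the base on every iteration
        exp >>= 1                    # move to the next exponent bit
    return out


if __name__ == "__main__":
    print(mod_exp_base2(340, 341))
```

Both products in one iteration use the value of base from before the squaring, which matches the module: modm and base2 are started together in state 0 and read the same base register.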

Jacobi symbol calculator module

//Author: Yasaswy Kasarabada
//Date: June 29, 2016

/****************************************************************************
This module calculates the Jacobi symbol J(D, N) and returns D in
{5,-7,9...} such that J=-1.
clk is the clock input.
in_ready and reset control data flow within the module.
out returns D.
out_ready indicates readiness of out.
****************************************************************************/
module Jacobi
#(
    parameter NLEN=1024
)(
    input signed [NLEN:0] N,
    input in_ready,
    input clk,
    input reset,
    output [5:0] out,
    output reg out_ready
);

//Local variables
localparam INITVAL = -1;
reg [5:0] D=0;
reg [NLEN:0] a,b,temp;
reg neg,t;
reg signed [3:0] state = INITVAL;

//Main code
always @(posedge clk) begin
    if(reset==1) begin
        state <= INITVAL;
        out_ready <= 0;
    end
    else begin
    case (state)

    INITVAL : if(in_ready==1) begin
        D <= 5;
        neg <= 0;
        out_ready <= 0;
        state <= 0;
    end

    0 : begin
        a <= D;
        b <= N;
        t <= 0;
        state <= 1;
    end

    1 : begin
        if(neg==1)
            if(b%4==3) t <= ~t;
        state <= 2;
    end

    2 : if(a!=0)
        state <= 3;
    else
        state <= 5;

3 : if(a[0]==0) begin a <= a >>> 1; if(b%8==3 || b%8==5) t <= ~t; end else begin if(a

    4 : begin
        state <= 2;
        a <= (a - b) >>> 1;
        if(b%8==3 || b%8==5) t <= ~t;
    end

    5 : if(b==1) begin
        if(t==1)
            state <= 7;
        else
            state <= 6;
    end
    else
        state <= 6;

    6 : begin
        D <= D + 6'b000010;
        neg <= ~neg;
        state <= 0;
    end

    7 : begin
        out_ready <= 1;
    end

    endcase
    end
end

//Return D
assign out = (neg==0) ? D : -D;
endmodule
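The search performed by this module, trying D = 5, -7, 9, -11, ... until the Jacobi symbol J(D, N) equals -1, can be modeled in software as follows. The Python sketch uses the textbook Jacobi symbol algorithm (removing factors of two and applying quadratic reciprocity) rather than a line-by-line transcription of the state machine. The function names jacobi and select_d are hypothetical, and the perfect-square check is an addition of ours: for a perfect square no suitable D exists, so the search would otherwise never terminate.

```python
from math import isqrt


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0; returns -1, 0 or 1."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n                            # also maps negative a into [0, n)
    result = 1
    while a != 0:
        while a % 2 == 0:             # pull out factors of two
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                   # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def select_d(n):
    """Return the first D in 5, -7, 9, -11, ... with Jacobi(D, n) = -1."""
    if isqrt(n) ** 2 == n:
        raise ValueError("n is a perfect square; no such D exists")
    d = 5
    while jacobi(d, n) != -1:
        # A symbol of 0 means D shares a factor with n; this sketch
        # simply moves on to the next candidate.
        d = -(d + 2) if d > 0 else -(d - 2)
    return d


if __name__ == "__main__":
    print(select_d(11))
```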

Baillie-PSW module

//Author: Yasaswy Kasarabada
//Date: June 29, 2016

/****************************************************************************
This module performs the Baillie-PSW test.
clk is the clock input.
in_ready controls data flow.
abort, abort_ready, next_N and next_N_ready signals handle the synchronicity
between the hardware and software modules.
out returns the primality of N: 0 – composite, 1 – prime.
out_ready indicates readiness of out.
****************************************************************************/
module BailliePSW
#(
    //Size of number to be tested
    parameter NLEN = 1024
)
(
    //Input signal to indicate completion of RN-slice transmission
    input in_ready,

    //Global clock signal
    input clk,

    //Input signal to indicate completion of Divisibility module
    //0 - prime/untested, 1 - composite
    input abort,

    //Input signal to indicate readiness of abort signal
    //0 - not ready, 1 - ready to be read
    input abort_ready,

    //Output signal to indicate primality: 0 - composite, 1 - prime
    output out,

    //Output signal to indicate ready status of output
    output reg out_ready,

    //Output signal to indicate software to proceed with next number
    //0 - don't proceed, 1 - proceed
    output reg next_N,

    //Output signal to indicate readiness of next_N signal
    //0 - not ready, 1 - ready to be read
    output reg next_N_ready,

    //Output signal to indicate any error encountered during computation
    output reg error=0
);

//Local variables
localparam INITVAL = -1, getRANDNUM=-2, RAMenable=-3;
reg signed [NLEN:0] N; //Number to be tested (read from RAM)
reg signed [3:0] state=RAMenable; //State variable
wire mr_ready, luc_ready; //Indicates ready of each sub-module

//////////////////////SUB-MODULE INSTANTIATIONS///////////////////////
/*********************************************************************
Signals connected to sub-modules:
mr_ concerns Miller-Rabin module signals
luc_ concerns Lucas module signals
ram_ concerns RAM module signals
*********************************************************************/
//Input signals indicating module to start computation
reg mr_in_ready=0,luc_in_ready=0;
//Input signals indicating module to reset
reg mr_reset=0,luc_reset=0;
//Output signals indicating ready status of output
wire mr_out_ready,luc_out_ready;
//Output signals indicating output of module
wire mr_out,luc_out;
//Read enable signals for ports A and B
reg ram_a_rden,ram_b_rden;
//Write enable signals for ports A and B
reg ram_a_wren,ram_b_wren;
//Address buses for ports A and B
reg [3:0] ram_a_address,ram_b_address;
//Input data ports A and B
reg [127:0] ram_a_data,ram_b_data;
//Output data ports A and B
wire [127:0] ram_q_a,ram_q_b;

//Miller-Rabin module instantiation
MillerRabin #(
    .NLEN (NLEN)
) millrab (
    .N (N),
    .in_ready (mr_in_ready),
    .clk (clk),
    .reset (mr_reset),
    .out (mr_out),
    .out_ready (mr_out_ready)
);

//Lucas module instantiation
Lucas #(
    .NLEN (NLEN)
) luc (
    .N (N),
    .in_ready (luc_in_ready),
    .clk (clk),
    .reset (luc_reset),
    .out (luc_out),
    .out_ready (luc_out_ready)
);

//RAM module instantiation
ram randprime (
    .address_a (ram_a_address),
    .address_b (ram_b_address),
    .clock (clk),
    .data_a (ram_a_data),
    .data_b (ram_b_data),
    .rden_a (ram_a_rden),
    .rden_b (ram_b_rden),
    .wren_a (ram_a_wren),
    .wren_b (ram_b_wren),
    .q_a (ram_q_a),
    .q_b (ram_q_b)
);
//////////////////SUB-MODULE INSTANTIATIONS COMPLETE/////////////////

//Main code
always @(posedge clk) begin
    case (state)

    RAMenable : begin //enable reading from RAM
        ram_a_rden <= 1;
        ram_b_rden <= 1;
        ram_a_wren <= 0;
        ram_b_wren <= 0;
        state <= getRANDNUM;
        error <= 0;
        ram_a_address <= 0;
        ram_b_address <= 1;
    end

    getRANDNUM: if(in_ready==1) begin //read ram data to get random number
        case (ram_a_address)

        0 : begin
            N[127:0] <= ram_q_a;
            N[255:128] <= ram_q_b;
            ram_a_address <= ram_a_address+2;
            ram_b_address <= ram_b_address+2;
        end

        2 : begin
            N[383:256] <= ram_q_a;
            N[511:384] <= ram_q_b;
            ram_a_address <= ram_a_address+2;
            ram_b_address <= ram_b_address+2;
        end

        4 : begin
            N[639:512] <= ram_q_a;
            N[767:640] <= ram_q_b;
            ram_a_address <= ram_a_address+2;
            ram_b_address <= ram_b_address+2;
        end

        6 : begin
            N[895:768] <= ram_q_a;
            N[1023:896] <= ram_q_b;
            ram_a_address <= ram_a_address+2;
            ram_b_address <= ram_b_address+2;
            state <= INITVAL;
        end

        default : begin
            error <= 1;
        end

        endcase
    end

    INITVAL: begin
        state <= 0;

        //start both tests
        mr_in_ready <= 1;
        luc_in_ready <= 1;
        mr_reset <= 0;
        luc_reset <= 0;

        //initialize all outputs to 0
        out_ready <= 0;
        error <= 0;
        next_N <= 0;
        next_N_ready <= 0;
    end

    0 : //Increment N and reset modules
        if(abort&&abort_ready==1)
            state <= 3;
        else if(mr_ready&&luc_ready==1)
            state <= 2;
        else if( (!mr_out&&mr_out_ready==1)||
                 (!luc_out&&luc_out_ready==1))
            state <= 1;
        else
            state <= 0;

    1 : if(abort&&abort_ready==1)
        state <= 3;
    else begin
        state <= 3;
        next_N <= 1;
        next_N_ready <= 1;
    end

    2 : if(abort&&abort_ready==1) begin
        state <= 3;
        N <= N + 2;
    end
    else begin
        if(abort_ready==0) //Wait for Divisibility module to finish
            state <= 2;
        else
            state <= 5; //Write value to RAM
    end

    3 : begin
        state <= 4;
        mr_reset <= 1;
        luc_reset <= 1;
        N <= N + 2;
    end

    4 : begin
        state <= 0;
        mr_reset <= 0;
        luc_reset <= 0;
        error <= 0;
        out_ready <= 0;
        next_N <= 0;
        next_N_ready <= 0;
    end

    5 : begin //enable writing to RAM
        ram_a_rden <= 0;
        ram_b_rden <= 0;
        ram_a_wren <= 1;
        ram_b_wren <= 1;
        state <= 6;
        error <= 0;
    end

    6 : begin //write prime number to RAM
        case (ram_a_address)

        0 : begin
            ram_a_data <= N[127:0];
            ram_b_data <= N[255:128];
            ram_a_address <= ram_a_address+2;
            ram_b_address <= ram_b_address+2;
        end

        2 : begin
            ram_a_data <= N[383:256];
            ram_b_data <= N[511:384];
            ram_a_address <= ram_a_address+2;
            ram_b_address <= ram_b_address+2;
        end

        4 : begin
            ram_a_data <= N[639:512];
            ram_b_data <= N[767:640];
            ram_a_address <= ram_a_address+2;
            ram_b_address <= ram_b_address+2;
        end

        6 : begin
            ram_a_data <= N[895:768];
            ram_b_data <= N[1023:896];
            state <= 7;
        end

        default : begin
            error <= 1;
        end

        endcase
    end

    7 : begin //signal testbench to read RAM data
        out_ready <= 1;
    end

    default: begin
        error <= 1;
    end

    endcase
end

//Only if all 3 modules declare N to be prime (output as 1),
//the Baillie-PSW module declares the number to be prime
assign out = mr_out && luc_out;

//Only when the outputs of all 3 modules are ready
//only then the Baillie-PSW module declares output to be ready
assign mr_ready = mr_out&&mr_out_ready;
assign luc_ready = luc_out&&luc_out_ready;
endmodule
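The getRANDNUM state and state 6 move the 1024-bit number through the RAM as eight 128-bit words, two words per access through ports A and B, with the least significant word at address 0. The Python sketch below models only this word layout, which can be convenient when preparing or checking RAM contents in software. The names to_words, from_words and load_via_dual_port are hypothetical and are not part of the testbench.

```python
WORD_BITS = 128   # RAM data bus width
WORDS = 8         # 8 x 128 bits = 1024 bits


def to_words(n):
    """Split n into WORDS words of WORD_BITS bits, least significant first."""
    mask = (1 << WORD_BITS) - 1
    return [(n >> (WORD_BITS * i)) & mask for i in range(WORDS)]


def from_words(words):
    """Reassemble a number from its words (word i holds bits 128*i and up)."""
    n = 0
    for i, word in enumerate(words):
        n |= word << (WORD_BITS * i)
    return n


def load_via_dual_port(ram):
    """Model the read sequence: port A reads an even address, port B the
    following odd address, and both addresses advance by 2 each step."""
    n = 0
    for addr_a in range(0, WORDS, 2):
        addr_b = addr_a + 1
        n |= ram[addr_a] << (WORD_BITS * addr_a)
        n |= ram[addr_b] << (WORD_BITS * addr_b)
    return n


if __name__ == "__main__":
    value = (1 << 1023) | 0x1234567890ABCDEF
    ram = to_words(value)
    print(load_via_dual_port(ram) == value)
```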

Testbench module

//Author: Yasaswy Kasarabada //Date: June 29, 2016

/**************************************************************************** This module initializes all vectors for the Baillie-PSW module. clock is generated using the clk variable. Value of random number is read from the file and the .mif is initialized. in_ready is set high after RAM initialization. If N is declared prime by Baillie-PSW module, appropriate message is displayed. Values of sync signals: abort, abort_ready, next_N and next_N_ready are read from/written to sync file as well as read from/passed to the Baillie-PSW module. ****************************************************************************/

`timescale 1 ns/100 ps      //Parameters for time control

module tb_bpsw ( );

localparam NLEN=1024;        //Size of random/prime number
localparam CLKPERIOD = 100;  //Clock period = 100 ns
localparam BUSWIDTH = 128;   //Set RAM data bus width
localparam BUSDEPTH = 16;    //Set RAM data bus depth

reg signed [NLEN-1:0] N;            //Current copy of number
reg [NLEN-1:0] randnum;             //Random number read from file
integer rn,pr,sync,ram,readstat=0;  //File I/O variables
reg sync_data[5:0];                 //Sync data-read/write to sync file
reg flag = 0;

///////////////////////MAIN MODULE INSTANTIATION//////////////////////
reg in_ready=0;             //Variable to signal start of computations
reg clk;                    //Clock signal
reg abort=0,abort_ready=0;  //Abort signal and read status of abort
wire out,out_ready;         //Output and ready status of output
wire next_N,next_N_ready;   //next_N and ready status of next_N
wire error;                 //Error signal

BailliePSW #(
    .NLEN         (NLEN        )
) main (
    .in_ready     (in_ready    ),
    .clk          (clk         ),
    .abort        (abort       ),
    .abort_ready  (abort_ready ),
    .out          (out         ),
    .out_ready    (out_ready   ),
    .next_N       (next_N      ),
    .next_N_ready (next_N_ready),
    .error        (error       )
);
///////////////////MAIN MODULE INSTANTIATION COMPLETE/////////////////

/*********************************************************************
Generate clock with period 100 ns (frequency 10 MHz)
*********************************************************************/
initial begin
    clk = 0;
    forever begin
        #50 clk = ~clk;
    end
end

/*********************************************************************
Set file handles for:
    reading random number(rn)
    writing prime number(pr)
    read/write sync signals(sync)
    initializing RAM .mif file(ram)
*********************************************************************/
initial begin
    rn = $fopen("randnum.list","r");
    pr = $fopen("primenum.txt","w");
    sync = $fopen("sync.txt","w");
    ram = $fopen("randnum.mif","w");
end

/*********************************************************************
Read value of random number from file.
Store value in RAM.
Signal Baillie-PSW module to start computations
*********************************************************************/
always begin
    if(readstat==0) begin
        #CLKPERIOD readstat = $fscanf(rn,"%h",randnum);
        #CLKPERIOD N = randnum;
        N[NLEN] = 1'b0;
    end
    else begin
        #CLKPERIOD $fwrite(ram,"WIDTH=%d;\nDEPTH=%d;\nADDRESS_RADIX=HEX;\nDATA_RADIX=HEX;\nCONTENT BEGIN\n",BUSWIDTH,BUSDEPTH);
        #CLKPERIOD $fwrite(ram,"\t00 : %h\n\t01 : %h\n\t02 : %h\n\t03 : %h\n", N[127:0],N[255:128],N[383:256],N[511:384]);
        #CLKPERIOD $fwrite(ram,"\t04 : %h\n\t05 : %h\n\t06 : %h\n\t07 : %h\n", N[639:512],N[767:640],N[895:768],N[1023:896]);
        #CLKPERIOD $fwrite(ram,"\t[08..0F] : 00000000000000000000000000000000; \n END;");
        #CLKPERIOD in_ready <= 1;
    end
end

/*********************************************************************
Wait for Baillie-PSW module to complete.
Display prime number
Store value in file
*********************************************************************/
always begin
    if(out_ready==1&&flag==0) begin
        #CLKPERIOD $display ("%d is prime",N);
        $fwrite(pr,"%h",N);
        flag=1;
    end
end

/*********************************************************************
1. Read sync file.
2. Update values of abort, abort_ready in module.
3. Update values of next_N, next_N_ready in file.
*********************************************************************/
always begin
    #CLKPERIOD readstat = $readmemb("sync.txt",sync_data,0,5);
    #CLKPERIOD abort <= sync_data[0];
    abort_ready <= sync_data [1];
    if(sync_data[1]==1) begin
        #CLKPERIOD sync_data[2] = 1;
        N = N + 2;
    end
    if(next_N_ready==1) begin
        #CLKPERIOD sync_data[3] = next_N;
        sync_data[4] = next_N_ready;
        N = N + 2;
    end
    if(N[NLEN]==1'b1)   //Compensate N for overflow
        #CLKPERIOD N[NLEN] = 1'b0;
    if(N==randnum)      //If no prime is found in range
        #CLKPERIOD $display("No prime was found in the %d range.",NLEN);
    #CLKPERIOD $fwrite(sync,"%b",sync_data);
end

endmodule

Appendix B – Software modules

Divisibility module

//Author: Yasaswy Kasarabada
//Date: June 29, 2016

/****************************************************************************
This module performs the Divisibility test. N is the number to be tested.
This module reads the initial value of N and the values of the first 1000
primes from respective files. Handles synchronicity with hardware.
****************************************************************************/
import java.util.*;
import java.math.*;
import java.io.*;

class divisibility{
    public static BigInteger N, zero = new BigInteger("0");
    public static BigInteger two = new BigInteger("2");
    public static BigInteger[] primes=null;
    public static Writer fileOut;
    public static Scanner fileIn;
    /*sync array holds values of/for sync files
      sync [0-2] - abort, abort_ready, abort_ack
      sync [3-5] - next_N, next_N_ready, next_N_ack*/
    public static int [] sync = new int[6];

    public static void main(String[] args) throws IOException{
        boolean init;
        try{
            fileOut = new FileWriter("sync.txt",false);
        }
        catch(FileNotFoundException e){
            System.out.println("Error opening sync file");
        }
        try{
            fileIn = new Scanner(new File("sync.txt"));
        }
        catch(FileNotFoundException e){
            System.out.println("Error opening sync file");
        }
        N = readRandNum(); //initialize N to random number
        if(N.equals(zero)==true)
            System.out.println("Error initializing random num");
        else{
            primes = new BigInteger[1000];
            init = initprimes();
            if(init==false)
                System.out.println("Error initializing primes");
            else{
                //to compensate for N = N + 2 for initial pass
                N = N.subtract(two);

                do { //start Divisibility test
                    do {
                        N = N.add(two);
                        writeabort(0,0);
                        for(int i=0;i<1000;i++)
                            //N is composite if a small prime divides it
                            if(N.remainder(primes[i]).equals(zero)) {
                                writeabort(1,1); //set abort = 1 for hardware
                                break;
                            }
                        if(sync[0]==1)
                            break;

                        //if N is prime, wait for hardware
                        writeabort(0,1); //set abort = 0 for hardware
                        do { //wait for abort_ack
                            updatesync(0);
                        } while(sync[2]==0);
                        do {
                            readnext_N(); //wait for next_N
                        } while((sync[3]&sync[4])!=1); //check if N is composite

                        updatesync(1); //set next_N_ack
                    } while(true);

                    //if N is composite, wait for abort_ack
                    do{
                        updatesync(0);
                    } while(sync[2]==0);
                } while(true); //restart test with N = N + 2
            }
        }
        fileIn.close();
        fileOut.close();
    }

    //read random number from file and initialize value of N
    public static BigInteger readRandNum(){
        BigInteger temp = zero;
        Scanner rand;
        try{
            rand = new Scanner(new File("randnum.list"));
        }
        catch(FileNotFoundException e){
            System.out.println("Error opening random number file");
            return zero;
        }

        while(rand.hasNextLine())
            temp = rand.nextBigInteger(16);
        rand.close();
        return temp;
    }

    //Read and initialize values of 1000 primes from file
    public static boolean initprimes(){
        Scanner primefile;
        int i=0;
        try{
            primefile = new Scanner(new File("primes.txt"));
        }
        catch(FileNotFoundException e){
            System.out.println("Error opening primes file");
            return false;
        }
        while(primefile.hasNextInt())
            primes[i++]=new BigInteger(String.valueOf(primefile.nextInt()));
        primefile.close();
        return true;
    }

    //Write value of abort and abort_ready
    public static boolean writeabort(int abort, int abort_ready) throws IOException{
        fileIn.reset();
        for(int i=0;i<6;i++)
            sync[i] = fileIn.nextInt();
        sync[0] = abort;
        sync[1] = abort_ready;
        for(int i=0;i<4;i++)
            fileOut.write(sync[i]);
        return true;
    }

    //Read value of next_N and next_N_ready
    public static boolean readnext_N(){
        fileIn.reset();
        for(int i=0;i<6;i++)
            sync[i] = fileIn.nextInt();
        return true;
    }

    //Update values of sync signals
    public static void updatesync(int val) throws IOException{
        fileIn.reset();
        for(int i=0;i<6;i++)
            sync[i] = fileIn.nextInt();
        sync[5] = val;
        for(int i=0;i<4;i++)
            fileOut.write(sync[i]);
    }
}
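The core of the Divisibility test, separated from the file-based synchronization, can be sketched as follows. This is a minimal illustration rather than the thesis code: the class and method names are hypothetical, and the first 1000 primes are generated in memory instead of being read from primes.txt.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class TrialDivisionSketch {
    // First `count` primes, found by simple trial division.
    // Assumption: stands in for the list read from primes.txt.
    static List<BigInteger> firstPrimes(int count) {
        List<BigInteger> primes = new ArrayList<>();
        for (int c = 2; primes.size() < count; c++) {
            boolean isPrime = true;
            for (int d = 2; (long) d * d <= c; d++) {
                if (c % d == 0) { isPrime = false; break; }
            }
            if (isPrime) primes.add(BigInteger.valueOf(c));
        }
        return primes;
    }

    // True if some listed prime divides n (and n is not that prime itself),
    // i.e. n is certainly composite and the hardware tests can be skipped.
    static boolean hasSmallFactor(BigInteger n, List<BigInteger> primes) {
        for (BigInteger p : primes) {
            if (!n.equals(p) && n.mod(p).signum() == 0) return true;
        }
        return false;
    }
}
```

A candidate for which `hasSmallFactor` returns false is not necessarily prime; it only survives to the Miller-Rabin and Lucas stages.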

Random number generator module

The license for usage of this random number generator can be obtained from [42].

//Author: Yasaswy Kasarabada
//Date: June 29, 2016

/****************************************************************************
This module generates the random number. Using the generator stored in
include file, the 1024-bit random number is generated and stored in a file.
****************************************************************************/

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include "pcg_basic.h"   // Contains the generator [36]

int main()
{
    int i,rounds = 10;
    bool seed = true;
    FILE *fo;                    //File handle
    int32_t numbers[32],num=0;   //Array to store 32 numbers
    pcg32_random_t generator;    //Generator object

    //Seed the random number generator
    pcg32_srandom_r(&generator, time(NULL)^(intptr_t)&printf, (intptr_t)&rounds);

    //Open file
    fo = fopen("randnum.list","w+");

    //Generate 32 numbers(32-bit each) and store them in the array
    for(i=0;i<32;i++){
        num = pcg32_random_r(&generator);
        if(num

    //Write 1024-bit stream into a file
    for(i=0;i<32;i++)
        fprintf(fo,"%x",numbers[i]);

    return 0;
}

Appendix C – Cramer prime generation

Below is the (unpublished) proof of Dr. George B. Purdy on Cramer prime generation.

Cramer prime generation

George B. Purdy

July 2016

Here we compare two algorithms for finding primes.

Algorithm 1

1. Generate random odd number R, a < R < b.

2. Test R for primality

3. If R is not prime, go to 1.

4. Stop. You have your prime.

Algorithm 2

1. Generate random number R, a < R < b.

2. Test R for primality

3. If R is prime stop. You have your prime.

4. R = R + 2.

5. If R < b goto 2.

6. Goto 1.
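Algorithm 2 can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, R is forced to be odd (implied by the step R = R + 2), `BigInteger.isProbablePrime` stands in for whichever primality test is used, and the interval (a, b) with 0 ≤ a is assumed to contain at least one odd prime.

```java
import java.math.BigInteger;
import java.util.Random;

class IncrementalPrimeSearch {
    private static final BigInteger TWO = BigInteger.valueOf(2);

    // Random odd integer r with a < r < b (assumes such a value exists).
    static BigInteger randomOdd(BigInteger a, BigInteger b, Random rnd) {
        // Number of integers strictly between a and b.
        BigInteger width = b.subtract(a).subtract(BigInteger.ONE);
        BigInteger r;
        do {
            r = new BigInteger(width.bitLength(), rnd);  // rejection sampling
        } while (r.compareTo(width) >= 0);
        r = r.add(a).add(BigInteger.ONE);                // now a < r < b
        if (!r.testBit(0)) {
            // Even value: move to a neighbouring odd value inside the interval.
            BigInteger up = r.add(BigInteger.ONE);
            r = up.compareTo(b) < 0 ? up : r.subtract(BigInteger.ONE);
        }
        return r;
    }

    // Algorithm 2: test R, R + 2, R + 4, ... and restart if b is reached.
    static BigInteger findPrime(BigInteger a, BigInteger b, Random rnd) {
        while (true) {                                   // step 6: go to 1
            BigInteger r = randomOdd(a, b, rnd);         // step 1
            while (r.compareTo(b) < 0) {                 // step 5
                if (r.isProbablePrime(64)) return r;     // steps 2 and 3
                r = r.add(TWO);                          // step 4
            }
        }
    }
}
```

Every starting value that falls in the same gap between consecutive primes returns the same result, which is the effect analysed in the remainder of this appendix.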

What is the probability that two random numbers R < R' will produce the same prime?

With Algorithm 1, clearly zero.

With Algorithm 2, suppose that $p_n < R < R' < p_{n+1}$ for two consecutive primes $p_n < p_{n+1}$. Then R and R' will produce the same prime $p_{n+1}$. Otherwise, R and R' will produce different primes.

Now the average value of $p_{n+1} - p_n$ is $\ln p_n$ by the Prime Number Theorem. Also $\limsup\, (p_{n+1} - p_n)/(\ln p_n)^2 = 1$ by a conjecture of H. Cramer. If we can arrange that $R' - R < (\ln p_n)^2$ has low probability, then it will follow that R and R' lead to the same prime with low probability. Now $p_n < b$, so $\ln p_n < \ln b$, and $(\ln p_n)^2 < (\ln b)^2$.

So we are choosing R and R' in (a, b) and we hope that the probability that $|R' - R| < (\ln b)^2$ is small.

An Example

Let $a = 10^{300}$ and $b = 2 \times 10^{300}$.

If we divide the interval (a, b) into subintervals of size $(\ln b)^2$, there will be $(b - a)/(\ln b)^2$ of them, and the probability P that R and R' will lie in the same interval is $P = (\ln b)^2/(b - a)$.

Now $b - a = 10^{300}$ and $\ln b = \ln 2 + 300 \ln 10 \approx 691.5$, so $(\ln b)^2 \approx 4.78 \times 10^5 < 10^6$ and $P < 10^6/10^{300} = 10^{-294}$.
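The order of magnitude in this example can be checked numerically. The following is a small Java sketch (class and method names are illustrative only) that evaluates ln b, (ln b)^2 and log10 P in double precision, working with logarithms so that the value 10^300 never has to be divided directly.

```java
class CramerExample {
    // ln b for b = 2 * 10^300, computed as ln 2 + 300 ln 10.
    static double lnB() {
        return Math.log(2) + 300 * Math.log(10);
    }

    // Size of one subinterval, (ln b)^2.
    static double lnBSquared() {
        double l = lnB();
        return l * l;
    }

    // log10 of P = (ln b)^2 / (b - a), with b - a = 10^300.
    static double log10P() {
        return Math.log10(lnBSquared()) - 300;
    }

    public static void main(String[] args) {
        System.out.println("ln b     = " + lnB());
        System.out.println("(ln b)^2 = " + lnBSquared());
        System.out.println("log10 P  = " + log10P());
    }
}
```

The computed log10 P lies below -294, in agreement with the bound P < 10^-294.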

Rigorous Result

Baker et al. proved that for large n, $p_{n+1} - p_n < p_n^{0.525}/\ln p_n$. This will give a very decent rigorous upper bound on the probability P. We divide the interval (a, b) into intervals of size $s = p_n^{0.525}/\ln p_n$. There are $N = (b - a)/s = 10^{300}/s$ of them.

$s < a^{0.525} < 10^{301 \times 0.525} < 10^{158}$

$P = 1/N = s/(b - a) < 10^{158-300} = 10^{-142}$

References

R. C. Baker, G. Harman and J. Pintz, "The difference between consecutive primes, II," Proceedings of the London Mathematical Society, vol. 83, no. 03, pp. 532-562, 2001.

H. Cramer, "On the order of magnitude of the difference between consecutive primes," Acta Arithmetica, vol. 2, no. 1, pp. 23-46, 1936.

Appendix D – Lucas sequences and Jacobi symbol

Lucas sequences

The Lucas sequences, for given values of P and Q, are defined using the recurrence relations:

$$U_0(P, Q) = 0, \quad U_1(P, Q) = 1, \quad U_n(P, Q) = P \cdot U_{n-1}(P, Q) - Q \cdot U_{n-2}(P, Q) \quad \text{for } n > 1$$

and

$$V_0(P, Q) = 2, \quad V_1(P, Q) = P, \quad V_n(P, Q) = P \cdot V_{n-1}(P, Q) - Q \cdot V_{n-2}(P, Q) \quad \text{for } n > 1$$

Using these recurrence relations, it is easy to show that, for n > 0

$$U_n(P, Q) = \frac{P \cdot U_{n-1}(P, Q) + V_{n-1}(P, Q)}{2}, \qquad V_n(P, Q) = \frac{D \cdot U_{n-1}(P, Q) + P \cdot V_{n-1}(P, Q)}{2}$$

We calculate the first few terms of the Lucas sequences $U_n(P, Q)$ and $V_n(P, Q)$:

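| n | $U_n(P, Q)$ | $V_n(P, Q)$ |
|---|---|---|
| 0 | $0$ | $2$ |
| 1 | $1$ | $P$ |
| 2 | $P$ | $P^2 - 2Q$ |
| 3 | $P^2 - Q$ | $P^3 - 3PQ$ |
| 4 | $P^3 - 2PQ$ | $P^4 - 4P^2Q + 2Q^2$ |
| 5 | $P^4 - 3P^2Q + Q^2$ | $P^5 - 5P^3Q + 5PQ^2$ |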

Table 6 Value of Lucas sequences for small values of n
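The entries of Table 6 can be reproduced for concrete P and Q by iterating the two recurrences directly. The following Java sketch is illustrative only (the class and method names are not part of the thesis code) and uses the plain linear recurrence rather than the faster method of Section 3.2.

```java
import java.math.BigInteger;

class LucasSequences {
    // Returns {U_n(P, Q), V_n(P, Q)} by iterating both recurrences from n = 0.
    static BigInteger[] lucasUV(int n, long p, long q) {
        BigInteger P = BigInteger.valueOf(p);
        BigInteger Q = BigInteger.valueOf(q);
        BigInteger uPrev = BigInteger.ZERO;       // U_0
        BigInteger u = BigInteger.ONE;            // U_1
        BigInteger vPrev = BigInteger.valueOf(2); // V_0
        BigInteger v = P;                         // V_1
        if (n == 0) {
            return new BigInteger[] { uPrev, vPrev };
        }
        for (int i = 2; i <= n; i++) {
            // X_i = P * X_{i-1} - Q * X_{i-2} for both sequences
            BigInteger uNext = P.multiply(u).subtract(Q.multiply(uPrev));
            BigInteger vNext = P.multiply(v).subtract(Q.multiply(vPrev));
            uPrev = u;
            u = uNext;
            vPrev = v;
            v = vNext;
        }
        return new BigInteger[] { u, v };
    }
}
```

With P = 1 and Q = -1 the two sequences are the Fibonacci and Lucas numbers, so `lucasUV(4, 1, -1)` gives U4 = 3 and V4 = 7, matching the n = 4 row of the table.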

The characteristic equation of the above-mentioned recurrence relations for the Lucas sequences is:

$$x^2 - Px + Q = 0$$

The roots of the equation are:

$$a = \frac{P + \sqrt{D}}{2} \quad \text{and} \quad b = \frac{P - \sqrt{D}}{2}, \quad \text{where } D = P^2 - 4Q \text{ is the discriminant}$$

The terms of the Lucas sequences can be expressed in terms of these roots as follows:

$$U_n = \frac{a^n - b^n}{a - b} = \frac{a^n - b^n}{\sqrt{D}}, \qquad V_n = a^n + b^n$$

The value of the discriminant D is used to find the value of the Jacobi symbol J(D, N).

Jacobi symbol

The Legendre symbol [43] is a quadratic character modulo a prime number p. The output value of this multiplicative function is either 1, -1 or 0. The Legendre symbol is represented as $\left(\frac{a}{p}\right)$. For a given value of a and p, the Legendre symbol is defined as:

$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \not\equiv 0 \pmod{p} \text{ and for some integer } x:\ a \equiv x^2 \pmod{p} \\ -1 & \text{if } a \not\equiv 0 \pmod{p} \text{ and there exists no such } x \\ 0 & \text{if } a \equiv 0 \pmod{p} \end{cases}$$

The Jacobi symbol J(a, n) is a generalization of the Legendre symbol. It has prominent use in primality testing and integer factorization. For any integer a and a positive odd integer n, the Jacobi symbol is the product of the Legendre symbols corresponding to the prime factors of n, where $\left(\frac{a}{p}\right)$ is the Legendre symbol for an integer a and an odd prime p:

$$J(a, n) = \left(\frac{a}{p_1}\right)^{\alpha_1} \left(\frac{a}{p_2}\right)^{\alpha_2} \left(\frac{a}{p_3}\right)^{\alpha_3} \cdots \left(\frac{a}{p_k}\right)^{\alpha_k} \quad \text{where } n = p_1^{\alpha_1} p_2^{\alpha_2} p_3^{\alpha_3} \cdots p_k^{\alpha_k}$$

The values of the Jacobi symbol J(a, n) for a from 0 to 10 and for odd n from 1 to 11 are given below. Only the entries with a < n are listed.

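| n \ a | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | | | | | | | | | | |
| 3 | 0 | 1 | -1 | | | | | | | | |
| 5 | 0 | 1 | -1 | -1 | 1 | | | | | | |
| 7 | 0 | 1 | 1 | -1 | 1 | -1 | -1 | | | | |
| 9 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | | |
| 11 | 0 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 |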

Table 7 Value of Jacobi symbol for small values of a and n
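The table entries can be computed without factoring n. The sketch below uses the standard binary algorithm based on quadratic reciprocity; it is an illustration in Java with hypothetical names, not the method used in the hardware design, and it is limited to values that fit in a `long`.

```java
class JacobiSymbol {
    // Jacobi symbol J(a, n) for positive odd n; a may be negative.
    static int jacobi(long a, long n) {
        if (n <= 0 || (n & 1) == 0) {
            throw new IllegalArgumentException("n must be positive and odd");
        }
        a = Math.floorMod(a, n);          // reduce a into [0, n)
        int result = 1;
        while (a != 0) {
            // Remove factors of 2: J(2, n) = -1 exactly when n = 3 or 5 (mod 8).
            while ((a & 1) == 0) {
                a >>= 1;
                long r = n & 7;
                if (r == 3 || r == 5) result = -result;
            }
            // Reciprocity: swap a and n, flipping the sign when both are 3 (mod 4).
            long t = a;
            a = n;
            n = t;
            if ((a & 3) == 3 && (n & 3) == 3) result = -result;
            a %= n;
        }
        return n == 1 ? result : 0;       // a common factor gives 0
    }
}
```

For example, `jacobi(5, 7)` returns -1, the Table 7 entry in row n = 7, column a = 5. Negative arguments such as `jacobi(-7, 11)` are handled by the initial reduction modulo n.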

For use in the Lucas probable prime test, the value of the Jacobi symbol J(D, N) must be equal to -1. We choose D from the sequence {5, -7, 9, -11, …}. To calculate the value of J for D < 0, the following properties of the Jacobi symbol are used:

$$J(a \cdot b, n) = J(a, n) \cdot J(b, n) \quad \text{and} \quad J(-1, n) = (-1)^{(n-1)/2}$$

Using these two properties, any D < 0 is written as the product of |D| and -1, which gives the relation

$$J(D, N) = J(|D| \cdot (-1), N) = J(|D|, N) \cdot J(-1, N) = J(|D|, N) \cdot (-1)^{(N-1)/2}$$

Example of the Lucas test

Let us test the number N = 7 for primality using the Lucas probable prime test. From Table 7, the value for D is 5, since J(5, 7) = -1. Setting P = 1, the value for Q = (1 - D)/4 = -1. Using the algorithm described in Section 3.2, we need to compute $U_8$. Substituting the values of P and Q in Table 6, we compute $U_4$ and $V_4$:

$$U_4 = P^3 - 2PQ = 1^3 - (2 \cdot 1 \cdot (-1)) = 3$$

$$V_4 = P^4 - 4P^2Q + 2Q^2 = 1^4 - (4 \cdot 1^2 \cdot (-1)) + (2 \cdot (-1)^2) = 7$$

Subsequently, calculating $U_8$ using the principles highlighted in Section 3.2,

$$U_8 = U_4 \cdot V_4 = 3 \cdot 7 = 21 \equiv 0 \pmod{7}$$

Since $U_8 \equiv 0 \pmod{7}$, the number 7 is said to have passed the Lucas probable prime test.
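The worked example can be replayed for other small N with the sketch below. It is a naive Java illustration under several assumptions: the names are hypothetical, D is searched in the alternating-sign sequence 5, -7, 9, -11, …, N is odd, small and not a perfect square, and U(N+1) mod N is obtained by stepping the recurrence one term at a time instead of using the doubling relations of Section 3.2.

```java
class LucasTestDemo {
    // Jacobi symbol J(a, n) for positive odd n (binary reciprocity algorithm).
    static int jacobi(long a, long n) {
        a = Math.floorMod(a, n);
        int result = 1;
        while (a != 0) {
            while ((a & 1) == 0) {
                a >>= 1;
                long r = n & 7;
                if (r == 3 || r == 5) result = -result;
            }
            long t = a;
            a = n;
            n = t;
            if ((a & 3) == 3 && (n & 3) == 3) result = -result;
            a %= n;
        }
        return n == 1 ? result : 0;
    }

    // First D in 5, -7, 9, -11, ... with J(D, n) = -1.
    // Assumption: n is not a perfect square, otherwise no such D exists.
    static long selectD(long n) {
        long d = 5;
        while (jacobi(d, n) != -1) {
            d = (d > 0) ? -(d + 2) : -(d - 2);
        }
        return d;
    }

    // Lucas probable prime check with P = 1, Q = (1 - D)/4:
    // n passes when U_{n+1} is congruent to 0 modulo n.
    static boolean isLucasProbablePrime(long n) {
        long d = selectD(n);
        long p = 1;
        long q = (1 - d) / 4;             // exact, since every D is 1 (mod 4)
        long uPrev = 0;                   // U_0
        long u = 1;                       // U_1
        for (long i = 2; i <= n + 1; i++) {
            long uNext = Math.floorMod(p * u - q * uPrev, n);
            uPrev = u;
            u = uNext;
        }
        return u == 0;
    }
}
```

For N = 7 the sketch selects D = 5 and Q = -1 and finds U8 ≡ 0 (mod 7), as in the example above; for the composite N = 15 it selects D = 13 and the check fails.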

Appendix E – Tested numbers

This appendix lists the numbers tested in Section 4.2.3.

N1 115922179551495973383410176342643722334557255682879605864838806293 659619625004303206250384392546855063844106965156287951749387634112 551089284595541103692716528774876311641700929986988023197242224581 099872580798960693521778607396791006450968430359009613295725905514 216842343121690916290236558767890728449777 N2 100000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000 0031 N3 999999999999999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999913

N1 is a 1024-bit PRIME number. N2 is a 1326-bit COMPOSITE number. N3 is a 333-bit COMPOSITE number.