
Institute of Information and Mathematical Sciences

Prime Search Algorithms (Mersenne Primes Search)

11191649 Jun Li

June, 2012

Contents

CHAPTER 1 INTRODUCTION
1.1 BACKGROUND
1.2 MERSENNE PRIME
1.3 STUDY HISTORY
1.3.1 Early history [4]
1.3.2 Modern History
1.3.3 Recent History

CHAPTER 2 METHODOLOGY
2.1 DEFINITION AND THEOREMS
2.2 DISTRIBUTION LAW
2.3 ALGORITHMS
2.3.1 Trial division
2.3.2 Sieve of Eratosthenes
2.3.3 Pollard's (P-1) method
2.3.4 Lucas-Lehmer Test
2.4 GREAT INTERNET MERSENNE PRIME SEARCH
2.4.1 A little bit about GIMPS
2.4.2 How GIMPS Works
2.4.3 GIMPS and Grid Computing

CHAPTER 3 EXPERIMENT AND RESULTS
3.1 “LUCAS-LEHMER TEST WITH GMP” AND “GLUCAS”
3.1.1 Lucas-Lehmer test with GMP
3.1.2 Glucas
3.1.3 “Lucas-Lehmer test with GMP” vs “Glucas”
3.1.4 Experiment conclusion
3.2 UNDISCOVERED BETWEEN MERSENNE PRIMES 41ST AND 42ND
3.2.1 Basic method
3.2.2 Experiment
3.2.3 Conclusion
3.3 STUDY MERSENNE PRIMES ON GPU
3.3.1 Introduction to GPU
3.3.2 GPU Computing
3.3.3 Experiments on GPU
3.3.4 Conclusion

CHAPTER 4 DISCUSSION AND FUTURE WORK
4.1 SUMMARY OF EXPERIMENTS
4.2 MEANING OF STUDY MERSENNE PRIMES
4.3 FUTURE WORK

REFERENCES

APPENDIX
A. PROGRAMS
B. TABLES
C. FIGURES

Chapter 1 Introduction

Although mathematics does not often produce exciting breakthroughs, we still occasionally hear the news that mathematicians have discovered a new super-large prime number. These primes usually have a special form and are called Mersenne primes.

1.1 Background

On April 12th, 2009, the 47th known Mersenne prime, $2^{42,643,801} - 1$, was found by Odd Magnar Strindmo, a Norwegian IT professional, through the Great Internet Mersenne Prime Search (GIMPS). It was confirmed by GIMPS project organizer George Woltman on June 7, 2009. This number has 12,837,064 digits and is the second largest known Mersenne prime; written out in an ordinary font size, it would stretch for more than 50 kilometers. [1]

1.2 Mersenne Prime

There are some concepts, and relationships between them, that we have to be clear about before we start our project: Prime Number, Mersenne Number and Mersenne Prime.

• Prime Number: In mathematics, a prime number is, according to WolframAlpha, “A positive integer that has exactly one positive integer divisor other than 1 (i.e., no factors other than 1 and itself.)”[2] For example, 2, 3, 5, 7 and 11 are prime numbers, as only 1 and themselves divide them. However, 4 is composite, since it has the divisor 2 in addition to 1 and 4. Prime numbers are usually just called primes.

• Mersenne Number: In mathematics, a Mersenne number, named after Marin Mersenne (a French theologian and mathematician who began the study of these numbers in the early 17th century, but is also known for his work on acoustics), is a number of the form $M_p = 2^p - 1$, where p is a positive integer. The first few Mersenne numbers are 1, 3, 7, 15, 31, 63, 127, 255 and so on.

• Mersenne Prime: In mathematics, a Mersenne prime is a Mersenne number that is prime. It has the form

$M_p = 2^p - 1$,

where both $M_p$ and p are primes.

Marin Mersenne compiled a list of primes of the form $2^p - 1$ (exponent p ≤ 257), but it wrongly included $M_{67}$ and $M_{257}$, which are not prime, and missed $M_{61}$, $M_{89}$ and $M_{107}$. [3] A Mersenne number is not necessarily prime, as the following examples show.

Prime: $M_2 = 2^2 - 1 = 3$, $M_3 = 2^3 - 1 = 7$
Non-prime: $M_4 = 2^4 - 1 = 15$

1.3 Study History

Mersenne primes were first studied in connection with perfect numbers. The history of the search for Mersenne primes can be traced back to about 350 BC. However, even today only 47 Mersenne primes are known. Twelve of them were found before 1952 by hand calculation with pen and paper; the rest were found by computer.

1.3.1 Early history [4]

Over 2300 years ago, Euclid, the ancient Greek mathematician, studied numbers of the form $2^p - 1$ (the first such study in recorded history) when he discussed perfect numbers in his book "Elements". In the 17th century, the famous French mathematician Marin Mersenne (1588-1648), a founder of the French Academy of Sciences, was the first person to study $2^p - 1$ deeply and systematically. To commemorate him, the mathematical community named primes of the form $M_p = 2^p - 1$ (where both $M_p$ and p are primes) Mersenne primes.

As a core topic of number theory, Mersenne numbers have long been a focus of research. It was once thought that $2^n - 1$ is prime for every prime n, which is wrong. New erroneous conjectures and extrapolations kept appearing as people continued to study the question. Progress in Mersenne prime research has always been closely tied to the known results, the theory and the computational tools available at the time. The early research can be seen in Table 1. After Marin Mersenne stated his conjecture in 1644, it took 300 years to resolve it completely. By 1947, the whole range given by Mersenne had been checked, and the correct list of Mersenne prime exponents in that range was determined to be: n = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107 and 127.


Table 1: Early History

Year | Discoverer | Conclusion/Guess | Notes
1536 | Hudalricus Regius | Proved that $2^{11} - 1 = 2047$ was not prime. | It is 23*89.
1603 | Pietro Cataldi | Verified that $2^{17} - 1$ and $2^{19} - 1$ were both primes. | Correct
1603 | Pietro Cataldi | Stated that when n = 23, 29, 31 and 37, $2^n - 1$ was also prime. | Incorrect (except for n = 31)
1640 | Fermat | Proved that Pietro Cataldi was wrong when n = 23, 37. |
1644 | Marin Mersenne | Stated in the preface to his Cogitata Physica-Mathematica (1644) that the numbers $2^n - 1$ were prime for n = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127 and 257 and were composite for all other positive n < 257. | The list was wrong, with missing primes and composite numbers in the list.
1738 | Euler | Proved that Pietro Cataldi was wrong when n = 29. |
1750 | Euler | Verified that the next number on Mersenne's and Regius' lists, $2^{31} - 1$, was prime. |
1876 | Lucas | Verified that $2^{127} - 1$ was also prime. |
1883 | Pervouchine | Showed that $2^{61} - 1$ was prime. | Mersenne missed.
1900+ | Powers | Showed that $2^{89} - 1$ and $2^{107} - 1$ were also prime. | Mersenne missed.

1.3.2 Modern History

The research results and discoveries from 1930 to September 1996 (just before the Great Internet Mersenne Prime Search started) are listed in Table 2.


Table 2: Modern History

Year | Discoverer | Conclusion
1930 | Derrick Lehmer | Based on the theory initiated by Lucas in the late 1870s, Lehmer brought forward a simple way to test Mersenne primes: the Lucas-Lehmer Test.*
1963 | Donald B. Gillies | Discovered the 23rd Mersenne prime.
1971 | Bryant Tuckerman | Discovered the 24th Mersenne prime.
1978-1979 | Landon Curt Noll & Laura Nickel | Discovered the 25th and 26th Mersenne primes.
1979-1996 | David Slowinski | Discovered the 30th and 31st Mersenne primes. Wrote a version of the Lucas test that he has convinced many Cray labs around the world to run in their spare time.
1988 | Walter Colquitt & Luke Welsh | Discovered the 29th Mersenne prime.

* Lucas-Lehmer Test: For p an odd prime, the Mersenne number $2^p - 1$ is prime if and only if $2^p - 1$ divides $S(p-1)$, where $S(n+1) = S(n)^2 - 2$ and $S(1) = 4$. [4]

The Lucas-Lehmer Test is an important way to discover new Mersenne primes, and a C program that tests Mersenne numbers can be written based on it. More details about the Lucas-Lehmer Test are presented in Chapter 2.

1.3.3 Recent History

In the last 60 years, people have written computer programs and developed databases to support the search for Mersenne primes. David Slowinski, who works for Cray, was the first person to think of using multiple computers to search for Mersenne primes. In late 1995, George Woltman (a number theory enthusiast and computer programmer in Florida) collected and combined the scattered prime databases and, in 1996, posted them on the Internet together with a free and highly optimized program for searching for Mersenne primes. This marked the beginning of the Great Internet Mersenne Prime Search. Anyone who has a computer with Internet access can join the project. More details about the Great Internet Mersenne Prime Search are presented in Chapter 2. The Mersenne primes found by GIMPS are shown in Table 3.


Table 3: Recent History (Using GIMPS) [5]

# | Date | Discoverer | p | Digits of Mp
35 | November 13th, 1996 | Joel Armengaud | 1,398,269 | 420,921
36 | August 24th, 1997 | Gordon Spence | 2,976,221 | 895,932
37 | January 27th, 1998 | Roland Clarkson | 3,021,377 | 909,526
38 | June 1st, 1999 | Nayan Hajratwala | 6,972,593 | 2,098,960
39 | November 14th, 2001 | Michael Cameron | 13,466,917 | 4,053,946
40 | November 17th, 2003 | Michael Shafer | 20,996,011 | 6,320,430
41 | May 15th, 2004 | Josh Findley | 24,036,583 | 7,235,733
42* | February 18th, 2005 | Martin Nowak | 25,964,951 | 7,816,230
43* | December 15th, 2005 | Curtis Cooper & Steven Boone | 30,402,457 | 9,152,052
44* | September 4th, 2006 | Curtis Cooper & Steven Boone | 32,582,657 | 9,808,358
45* | September 6th, 2008 | Hans-Michael Elvenich | 37,156,667 | 11,185,272
46* | April 12th, 2009 | Odd M. Strindmo | 42,643,801 | 12,837,064
47* | August 23rd, 2008 | Edson Smith | 43,112,609 | 12,978,189

*It is still not known whether there is an undiscovered Mersenne prime smaller than this one.

As can be seen from Tables 1, 2 and 3, Mersenne primes are not always found in increasing order. The full table of known Mersenne primes can be found in reference [4].


Chapter 2 Methodology

2.1 Definition and theorems

Many ancient cultures were interested in the relationship between a number and its divisors, often giving it a mystical interpretation. [4] One example is the perfect number, whose factors include a Mersenne number; this is the origin of the study of Mersenne numbers.

• Definition 1: When $2^n - 1$ is prime (which requires n to be prime), it is said to be a Mersenne prime.

For example, for n = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, $2^n - 1$ is a Mersenne prime.

• Definition 2: A positive integer n is called a perfect number if it is equal to the sum of all of its positive divisors, excluding n itself. [4]

For example, because 6 = 1+2+3 and 28 = 1+2+4+7+14, 6 and 28 are perfect numbers. Similarly, 496 and 8128 are also perfect numbers. (These four perfect numbers were already known before the Common Era.) Discussion: these four numbers (6, 28, 496 and 8128) factor as 2*3, 4*7, 16*31 and 64*127 respectively. They can evidently be written in the form $2^{n-1}(2^n - 1)$ (for n = 2, 3, 5, and 7 respectively), and $2^n - 1$ is a Mersenne prime in each case. [4] It is not difficult to prove the following two theorems:

• Theorem 1: k is an even perfect number if and only if it has the form $2^{n-1}(2^n - 1)$ and $2^n - 1$ is prime. [4]

About 2300 years ago, Euclid proved that if $2^k - 1$ is a prime number (that is, a Mersenne prime), then $2^{k-1}(2^k - 1)$ is a perfect number. About 250 years ago, Euler proved the converse: every even perfect number has this form. However, it is still not known whether there are any odd perfect numbers. [4] The proof can be found in reference [6]; it shows that Mersenne primes and even perfect numbers are in one-to-one correspondence.
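As a small illustration of Theorem 1, the following sketch builds $2^{n-1}(2^n - 1)$ and checks Definition 2 directly by summing divisors. The function names `euclid_number` and `is_perfect` are our own choices for this example, and the code assumes 1 ≤ n ≤ 32 so that the product fits in 64 bits.

```c
#include <stdint.h>

/* 2^(n-1) * (2^n - 1); assumes 1 <= n <= 32 so the result fits in 64 bits. */
uint64_t euclid_number(unsigned n)
{
    uint64_t mersenne = ((uint64_t)1 << n) - 1;
    return ((uint64_t)1 << (n - 1)) * mersenne;
}

/* Returns 1 when k equals the sum of its positive divisors excluding k. */
int is_perfect(uint64_t k)
{
    if (k < 2)
        return 0;
    uint64_t sum = 1;                      /* 1 divides every k */
    for (uint64_t d = 2; d * d <= k; d++) {
        if (k % d == 0) {
            sum += d;
            if (d != k / d)
                sum += k / d;              /* the paired divisor k/d */
        }
    }
    return sum == k;
}
```

For n = 2, 3, 5 and 7 the sketch gives 6, 28, 496 and 8128, all of which pass `is_perfect`; for n = 4 or n = 11, where $2^n - 1$ is composite, the resulting number is not perfect.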

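• Theorem 2: If $2^n - 1$ is prime, then so is n.

Proof: Let r and s be positive integers. Then the polynomial $x^{rs} - 1$ is $x^s - 1$ times $x^{s(r-1)} + x^{s(r-2)} + \cdots + x^s + 1$. So if n is composite (say n = rs with 1 < s < n), then $2^n - 1$ is also composite, because it is divisible by $2^s - 1$.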

• Inference 1: Let a and n be integers greater than one. If $a^n - 1$ is prime, then a is 2 and n is prime.

• Theorem 3: Let p and q be odd primes. If q divides $M_p = 2^p - 1$, then $q \equiv \pm 1 \pmod{8}$ and $q = 2kp + 1$ for some integer k.

Notes: When checking whether a Mersenne number is prime, small divisors are usually tried first. The above theorem, due to Euler and Fermat, makes this much cheaper because it restricts the candidate divisors. The proof of this theorem can be found in reference [8].

• Theorem 4: Let $p \equiv 3 \pmod{4}$ be prime. Then 2p+1 is also prime if and only if 2p+1 divides $M_p$.
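To make the restriction in Theorem 3 concrete, here is a minimal sketch of the candidate filter it implies. The name `has_theorem3_form` is hypothetical (it does not come from any library), and the check only tests the form of q; it does not test whether q is prime or whether q actually divides $M_p$.

```c
#include <stdint.h>

/* Returns 1 when q = 2kp + 1 for some k >= 1 and q = +/-1 (mod 8),
   i.e. when q has the form that Theorem 3 requires of a factor of Mp.
   Illustrative helper only; q is not tested for primality. */
int has_theorem3_form(uint64_t q, uint64_t p)
{
    if (q <= 1 || (q - 1) % (2 * p) != 0)
        return 0;                          /* not of the form 2kp + 1 */
    return q % 8 == 1 || q % 8 == 7;       /* q = +/-1 (mod 8) */
}
```

For example, $M_{11} = 2047 = 23 \times 89$, and both 23 = 2*1*11 + 1 and 89 = 2*4*11 + 1 pass the filter, whereas 67 = 2*3*11 + 1 is rejected because 67 is 3 mod 8.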

2.2 Distribution law

Among the integers from 2 to 127 (2 ≤ n ≤ 127) there are 31 primes, and 12 of them give Mersenne primes. After that, the next Mersenne prime ($M_{521}$) does not appear until the prime n = 521 (there are 66 primes between 127 and 521). Then, after another 12 primes, the next Mersenne prime ($M_{607}$) appears. $M_{1279}$ comes only after a further 95 primes, and after another long gap (120 more primes) $M_{2203}$ comes on the scene. As can be seen from the known Mersenne primes, the distribution of these special primes among the positive integers is irregular: sometimes they are scattered and sometimes they cluster. It therefore seems more difficult to find the distribution law of Mersenne primes than to find a new Mersenne prime. During the long history of studying Mersenne primes, mathematicians have proposed several conjectures, but none of them has been proved. For example, in 1964 Donald B. Gillies made the following conjecture about the distribution of prime divisors of Mersenne numbers $M_p = 2^p - 1$:

Conjecture: If $A < B \le \sqrt{M_p}$, then as $B/A$ and $M_p \to \infty$, the number of prime divisors of $M_p$ in the interval [A, B] is Poisson distributed with mean $\approx \log\left(\frac{\log B}{\log(\max(A, 2p))}\right)$.

He noted that his conjecture would imply that:
(1) The number of Mersenne primes less than x is ~ $\frac{2}{\log 2} \log\log x$.
(2) The expected number of Mersenne primes $M_p$ with p in the interval [x, 2x] is $2 + 2\log\left(\frac{\log 2x}{\log x}\right)$, which is ~ 2.
(3) The probability that $M_p$ is prime is ~ $\frac{2 \log 2p}{p \log 2}$.

2.3 Algorithms

2.3.1 Trial division

Trial division is the most laborious of the algorithms but the easiest to understand. Each integer tried as a possible divisor is called a trial divisor. The principle is as follows: given an integer n, divide n by every integer greater than one and not exceeding $\sqrt{n}$. If any of them divides n, then n is composite; otherwise it is prime. [10] Therefore, given a prime p, we can use trial division to determine whether $M_p = 2^p - 1$ is a Mersenne prime. A simple C program is shown below:

    #include <stdio.h>

    int main(void)
    {
        int p = 0;
        int is_prime = 1; /* Stays 1 unless a factor is found */

        printf("This is a Trial Division example program!\n");
        printf("Please input an exponent (2 to 62): ");
        if (scanf("%d", &p) != 1 || p < 2 || p > 62) { /* Input exponent "p" */
            printf("The exponent must be an integer from 2 to 62.\n");
            return 1;
        }

        /* Mp = 2^p - 1, computed exactly with a shift instead of pow() */
        unsigned long long number = (1ULL << p) - 1;

        /* Try every divisor i with i*i <= Mp */
        for (unsigned long long i = 2; i * i <= number; i++) {
            if (number % i == 0) { /* i divides Mp, so Mp is not prime */
                is_prime = 0;
                break; /* Found a factor, exit the loop */
            }
        }

        if (is_prime)
            printf("When p is %d, Mp is Mersenne prime!\n", p);
        else
            printf("When p is %d, Mp is not Mersenne prime!\n", p);
        return 0;
    }

Figure 1: A trial division example program

In general, trial division is a very inefficient algorithm. If n is the product of prime factors that are all large (for example, two primes of similar size), trial division is impractical. However, when n has at least one small factor, this factor can be found very quickly by trial division. It is worth noting that for a random n, there is a 50% probability that 2 is a factor of n, a 33% chance that 3 is a factor, and so on. It can be shown that 88% of all positive integers have a factor less than 100, and that 92% have a factor less than 1000. [11]

2.3.2 Sieve of Eratosthenes

In mathematics, the sieve of Eratosthenes is a simple algorithm for finding all prime numbers up to a specified integer. [12] It is one of the most efficient ways to find all of the smaller primes (below 10 million or so). The algorithm works as follows:


1. Make a list of consecutive integers from 2 to n: (2, 3, 4, …, n)
2. Strike off the multiples of all primes less than or equal to $\sqrt{n}$
3. The numbers that remain are the primes.

To make the algorithm easier to understand, here is an example with n = 25. The list shown after each step contains the numbers that have not yet been crossed out.
1. Generate a list of integers from 2 to 25:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
2. The first prime in the list is 2, so cross out all the other multiples of 2:
2 3 5 7 9 11 13 15 17 19 21 23 25
3. The next prime in the list after 2 is 3, so cross out all the other multiples of 3:
2 3 5 7 11 13 17 19 23 25
4. The next prime in the list after 3 is 5, so cross out all the other multiples of 5:
2 3 5 7 11 13 17 19 23
5. Because 5 is equal to $\sqrt{25}$, the numbers left in the list after step 4 are all the prime numbers:
2 3 5 7 11 13 17 19 23

We can easily write a C program that uses this algorithm to print all the primes up to a number entered by the user. An example program follows:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int i = 0, j = 0, k = 0;
        printf("This is a Sieve of Eratosthenes example program!\n");
        printf("The program will calculate all primes in the range from 0 to the number you input.\n");
        printf("Please input a number (2 to 10000000): ");
        if (scanf("%d", &k) != 1 || k < 2 || k > 10000000) {
            printf("Invalid input.\n");
            return 1;
        }

        int *a = malloc(((size_t)k + 1) * sizeof *a);
        if (a == NULL)
            return 1;
        for (i = 0; i <= k; i++) /* Assign the array elements */
            a[i] = i;
        for (i = 2; i * i <= k; i++) /* Use each i up to sqrt(k) as a sieving number */
            if (a[i] != 0) /* i has not been crossed out, so it is prime */
                for (j = i * i; j <= k; j += i) /* Visit the multiples of i */
                    a[j] = 0; /* Cross out the multiple */

        printf("All primes between 0 and %d are as below:\n", k);
        int n = 0; /* Count output primes in order to output 10 primes per line */
        for (i = 2; i <= k; i++) { /* Output primes */
            if (a[i] != 0) {
                printf("%-9d", a[i]);
                n++;
                if (n == 10) { /* Output 10 primes on each line */
                    printf("\n");
                    n = 0;
                }
            }
        }
        printf("\n");
        free(a);
        return 0;
    }

Figure 2: A Sieve of Eratosthenes example program


Although this algorithm can easily and quickly calculate all the prime numbers up to the integer n, its space and time requirements become enormous as n increases. With minor modifications the program could find the Mersenne primes less than n, but this is inefficient.

2.3.3 Pollard's (P-1) method

This algorithm was invented by John M. Pollard in 1974. It is based on Fermat's Little Theorem. The idea [14] is that for any prime number p and any number a that is not divisible by p,

$a^{p-1} \equiv 1 \pmod{p}$,

which means that

$a^{p-1} - 1 \equiv 0 \pmod{p}$.

That is, p is a factor of the left hand side. All we need to do is exploit the converse. Suppose that the number N which we want to factor has some unknown prime factor p. We try a large number of values $a^k - 1$ and see whether any of them has a common factor with N. If so, we have found a factor of N. The method can be described as follows:
1. Choose a number N which we want to factor
2. Choose a number a with 1 < a < N, for example, a = 2
3. Choose a number k, for example, k = 2
4. If GCD(a, N) is not 1, we have a factor. Otherwise, go to step 5.
5. Let $t = a^k \bmod N$
6. Let d = GCD(t-1, N)
7. Check whether d is a nontrivial factor of N (1 < d < N). If it is, we have found a factor. Otherwise, change a or k and go back to step 4.
More details about this method can be found in [15].
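The steps above can be sketched in C as follows. This is only an illustration under our own assumptions: N must be below $2^{32}$ so that products fit in 64 bits, the helper names are hypothetical, and instead of picking unrelated values of k the sketch raises the running value to the powers 2, 3, 4, … in turn, so that after step k it holds $a^{k!} \bmod N$ (a common way of "changing k").

```c
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* base^exp mod n by square-and-multiply; assumes n < 2^32. */
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t n)
{
    uint64_t result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % n;
        base = base * base % n;
        exp >>= 1;
    }
    return result;
}

/* Hypothetical helper: returns a nontrivial factor of n, or 0 if none
   was found with exponents up to max_k. */
uint64_t pollard_p_minus_1(uint64_t n, uint64_t a, uint64_t max_k)
{
    uint64_t d = gcd_u64(a, n);            /* step 4 */
    if (d > 1 && d < n)
        return d;
    uint64_t t = a % n;
    for (uint64_t k = 2; k <= max_k; k++) {
        t = pow_mod(t, k, n);              /* step 5: t = a^(k!) mod n */
        d = gcd_u64((t + n - 1) % n, n);   /* step 6: GCD(t - 1, n) */
        if (d > 1 && d < n)                /* step 7: nontrivial factor? */
            return d;
    }
    return 0;
}
```

For example, with N = 299 = 13 × 23 and a = 2, the call `pollard_p_minus_1(299, 2, 10)` returns 13 at k = 4, because 13 - 1 = 12 divides 4! = 24 while 23 - 1 = 22 does not.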

2.3.4 Lucas-Lehmer Test

In mathematics, the Lucas–Lehmer test (LLT) is a primality test for Mersenne numbers. It is the most powerful method currently known for this purpose. The test was originally developed by Édouard Lucas in 1856, and subsequently improved by Lucas in 1878 and by Derrick Henry Lehmer (an American mathematician and number theorist) in the 1930s. The Lucas–Lehmer test works as follows: [16]

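Let p be an odd prime. $M_p$ is prime if and only if $S_{p-1} \equiv 0 \pmod{M_p}$, where $S_1 = 4$ and $S_n = S_{n-1}^2 - 2$.

A simple program can be written like this:

    /* p is the odd prime exponent of Mp; valid for 3 <= p <= 31,
       so that every intermediate value fits in 64 bits */
    int Lucas_Lehmer(int p)
    {
        unsigned long long s = 4;               /* S1 = 4 */
        unsigned long long m = (1ULL << p) - 1; /* Mp = 2^p - 1 */
        int i;

        for (i = 3; i <= p; i++) { /* p - 2 steps: S2, S3, ..., S(p-1) */
            s = (s * s + m - 2) % m; /* adding m keeps the value non-negative */
        }
        return (s == 0 ? 1 : 0); /* Return 1 when Mp is prime; return 0 when Mp is composite */
    }

For example, to determine whether $2^7 - 1 = 127$ is prime:
S1 = 4
S2 = (4*4-2) mod 127 = 14
S3 = (14*14-2) mod 127 = 67
S4 = (67*67-2) mod 127 = 42
S5 = (42*42-2) mod 127 = 111
S6 = (111*111-2) mod 127 = 0
Since S6 = S(p-1) is 0, $2^7 - 1$ is prime.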

2.4 Great Internet Mersenne Prime Search

The Great Internet Mersenne Prime Search (GIMPS) is a distributed computing project dedicated to finding Mersenne primes. The project was founded by George Woltman, an outstanding programmer and organizer, who also wrote the software for the project (upgraded to version 27 on May 15, 2012).

2.4.1 A little bit about GIMPS

With the first version of the GIMPS software, users basically reserved exponents and reported results by hand. The current method uses PrimeNet to implement grid computing, which allows volunteers worldwide who are interested in the Mersenne prime search to join in. PrimeNet is a distributed computing platform set up by Scott Kurowski in 1997. It automates the selection of exponents and the reporting of results for GIMPS. As of April 2012, GIMPS has a sustained throughput of approximately 86.7 teraflops, which would theoretically give GIMPS (regarded here as a virtual supercomputer) a place among the TOP 500 most powerful supercomputers in the world. [17]


2.4.2 How GIMPS Works

The algorithms used in GIMPS to search for Mersenne primes are very efficient, so it is important to know how they are used and how they work together. The following stages describe how the GIMPS program finds Mersenne primes; the mathematical concepts, theorems and algorithms involved were explained in sections 2.1 and 2.3.

• Creating an Exponent List

A Mersenne number is a number of the form $M_p = 2^p - 1$, where p is an integer, as introduced in Chapter 1. Not every exponent is worth testing: by Theorem 2, if $2^p - 1$ is prime, then p must be prime. In GIMPS, the first step is therefore to create a list of prime exponents to test. [18]

• Trial Factoring

Trial factoring uses a modified Sieve of Eratosthenes in which each bit represents a potential factor of the form 2kp+1 (see Theorem 3). The sieve eliminates any potential factor that is divisible by a prime below about 40,000. Bits representing potential factors that are 3 or 5 mod 8 are also removed. This process excludes around 95% of the potential factors. [18] The remaining potential factors are checked with a very efficient algorithm for determining whether a number q divides $2^p - 1$. For example, does q = 89 divide $2^{11} - 1$? In binary, the exponent 11 is 1011. Starting with the value 1 and working from the top (left) bit, repeat the following steps:
1. Square the value
2. Remove the top bit of the exponent
3. If that bit is 1, multiply the value by 2; otherwise go straight to step 4
4. Reduce the value modulo 89
After the last bit has been processed, if the remainder is 1, then q is a factor of $M_p$. The result is shown in Table 4.
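The four steps above amount to left-to-right binary exponentiation of 2. A minimal sketch follows; the function names `pow2_mod` and `divides_mersenne` are our own, and we assume 1 ≤ q < $2^{32}$ so that the squared value fits in 64 bits. Unlike the hand calculation, the sketch reduces modulo q after the squaring as well as after the doubling, which gives the same remainders.

```c
#include <stdint.h>

/* Returns 2^p mod q, scanning the bits of p from the top down.
   Assumes 1 <= q < 2^32. */
uint64_t pow2_mod(uint64_t p, uint64_t q)
{
    uint64_t value = 1 % q;
    int top = 63;
    while (top >= 0 && ((p >> top) & 1) == 0)
        top--;                              /* locate the top set bit of p */
    for (int bit = top; bit >= 0; bit--) {
        value = value * value % q;          /* step 1: square */
        if ((p >> bit) & 1)                 /* steps 2-3: take the top bit */
            value = value * 2 % q;          /* bit is 1: multiply by 2 */
        /* step 4: the "% q" above keeps the value reduced */
    }
    return value;
}

/* Returns 1 when q divides 2^p - 1, i.e. when 2^p mod q is 1. */
int divides_mersenne(uint64_t p, uint64_t q)
{
    return pow2_mod(p, q) == 1 % q;
}
```

For p = 11 and q = 89 the intermediate values are 2, 4, 32 and 1, so `divides_mersenne(11, 89)` returns 1.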

• P-1 Factoring

This is another factoring method used in GIMPS. Its basic idea was introduced in section 2.3.3; it is used to find factors and thereby avoid costly primality tests. According to Theorem 3, if q is a prime factor of $2^P - 1$, then q is of the form 2kP+1. If q-1 is highly composite (it only has small factors), that is, if k in q = 2kP+1 is highly composite, then the P-1 method will find the factor q. Here is a description of the procedure from www.mersenne.org: [18]

Stage 1: P-1 factoring will find the factor q as long as all factors of k are less than B1 (k is called B1-smooth).
1. Choose a bound B1
2. Compute E, the product of all primes less than B1
3. Compute $x = 3^{E \cdot 2 \cdot P}$
4. Check GCD(x-1, $2^P - 1$) to see if a factor was found

Stage 2:
1. Use a second bound B2
2. Check whether k has exactly one factor between B1 and B2 and all its remaining factors are below B1
3. If so, stage 2 will find the factor q
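Stage 1 can be sketched for small exponents as follows. This is a simplified illustration, not the GIMPS implementation: it assumes P ≤ 31 so that all products fit in 64 bits, it builds E from single primes below B1 exactly as in the description above, and all function names are hypothetical.

```c
#include <stdint.h>

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* base^exp mod n by square-and-multiply; assumes n < 2^32. */
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t n)
{
    uint64_t result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % n;
        base = base * base % n;
        exp >>= 1;
    }
    return result;
}

static int is_small_prime(uint64_t n)
{
    if (n < 2)
        return 0;
    for (uint64_t d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Returns GCD(3^(E*2*P) - 1, 2^P - 1), where E is the product of all
   primes less than b1. A result strictly between 1 and 2^P - 1 is a
   factor. Assumes 2 <= P <= 31. */
uint64_t p_minus_1_stage1(unsigned P, uint64_t b1)
{
    uint64_t m = ((uint64_t)1 << P) - 1;            /* 2^P - 1 */
    uint64_t x = pow_mod(3, 2 * (uint64_t)P, m);    /* x = 3^(2P) */
    for (uint64_t r = 2; r < b1; r++)
        if (is_small_prime(r))
            x = pow_mod(x, r, m);                   /* multiply exponent by r */
    return gcd_u64((x + m - 1) % m, m);             /* GCD(x - 1, 2^P - 1) */
}
```

For P = 11 and B1 = 3 the sketch returns 23: the factor 23 = 2*1*11 + 1 has k = 1, whereas 89 = 2*4*11 + 1 has k = 4, which contains the prime 2 twice and is therefore missed when E contains each prime only once.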

Table 4: Testing whether 89 divides $2^{11} - 1$

Square | Remove top bit | Multiply by 2 if needed | Mod 89
1*1=1 | 1 011 | 1*2=2 | 2
2*2=4 | 0 11 | No | 4
4*4=16 | 1 1 | 16*2=32 | 32
32*32=1024 | 1 | 1024*2=2048 | 1

• Lucas-Lehmer Primality Testing

GIMPS first uses trial factoring and P-1 factoring on each Mersenne number, trying to find a factor and thereby remove the prime exponent p from the testing list. If these two tests cannot eliminate a Mersenne number, GIMPS runs the comparatively costly Lucas-Lehmer primality test to decide whether that Mersenne number is prime. The concept, method and an example of Lucas-Lehmer primality testing can be found in section 2.3.4.

• Double-Checking

Double-checking was adopted in GIMPS in order to confirm that a first-time Lucas-Lehmer primality test was performed without error. This job is usually assigned to slower computers.

2.4.3 GIMPS and Grid Computing

Grid computing is a field of computer science in which many computers in a network (or connected through the Internet) work on a single scientific or technical problem at the same time. Such problems usually need to process very large amounts of data; examples are the “Search for Extraterrestrial Intelligence” and the “Great Internet Mersenne Prime Search”. GIMPS is one of the earliest and most successful worldwide grid computing projects. It is supported by PrimeNet, the longest-running all-volunteer scientific research grid in the world, running 24x7 at 42 teraflops on about 100,000 computers worldwide. [19] (It seems that Scott Kurowski has not updated these figures for a long time. According to PrimeNet's status page, it sustained 86.7 teraflops over the last 7 days and about 84.293 teraflops over the last 30 days.) Thanks to the power of PrimeNet, GIMPS has discovered the 13 largest known Mersenne primes. It is therefore worth understanding the mechanism behind PrimeNet, namely grid computing. The mechanism of PC grid computing is shown in Fig. 3.
1. The grid nodes that join the project download the special software and install it on the local computer. For GIMPS, the user needs to download Prime95 from the GIMPS homepage.
2. The local grid node sends a request to the central server of GIMPS, asking it to send the program and data to the node.
3. The central server of GIMPS splits the Mersenne prime search into multiple smaller tasks of a reasonable size (work units, made up of parallel processing programs and data) and sends them to different grid nodes.
4. The grid nodes receive the sub-tasks and keep them as lowest-priority tasks. These tasks are executed when the CPU is idle. The most commonly used method is to run them when the computer goes into screen saver mode.
5. When the calculation is completed, Prime95 on the grid node returns the results to the central server of GIMPS and requests new data; the process then returns to step 3.
6. The central server of GIMPS collects, collates and saves the results obtained from the grid nodes involved in the calculation.


Figure 3: Mechanism of PC Grid Computing [20]

Here is an example that describes how the main program can use grid computing to search for Mersenne primes with exponents between 4000 and 5000.
Step 1: Find the primes between 4000 and 5000. There are N = 119 of them (4001, 4003, …, 4999); keep them in p[i] (i = 1, 2, 3, …, 119).
Step 2: Send the primes p[i] (i = 1, 2, 3, …, 119) to N = 119 nodes, one per node, and run the tests separately. (Note: node i tests whether $2^{p[i]} - 1$ is prime and returns the result to r[i] in the main program.)
Step 3: If r[i] is 0, then $2^{p[i]} - 1$ is not a Mersenne prime; if it is 1, then $2^{p[i]} - 1$ is a Mersenne prime.
In this example, p[33] = 4253 gives the 19th Mersenne prime (the distributed execution returns r[33] = 1), and p[52] = 4423 gives the 20th Mersenne prime (the distributed execution returns r[52] = 1).
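Step 1 of this example can be sketched as below. The function names are hypothetical, and only the preparation of the exponent list is shown; dispatching the exponents to nodes and running the primality tests are outside the scope of the sketch. Note that C arrays are 0-based, so p[33] and p[52] in the text correspond to indices 32 and 51 here.

```c
#include <stddef.h>

/* Trial-division primality check, adequate for small exponents. */
static int is_prime_exponent(int n)
{
    if (n < 2)
        return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Stores the primes in [lo, hi] into out[] (at most cap entries) and
   returns how many were stored. Each entry is one work unit. */
size_t collect_prime_exponents(int lo, int hi, int *out, size_t cap)
{
    size_t count = 0;
    for (int n = lo; n <= hi && count < cap; n++)
        if (is_prime_exponent(n))
            out[count++] = n;
    return count;
}
```

Calling `collect_prime_exponents(4000, 5000, p, 200)` on an array of 200 integers fills in the 119 exponents 4001, 4003, …, 4999, each of which would then be handed to one node.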


Chapter 3 Experiment and results

3.1 “Lucas-Lehmer test with GMP” and “Glucas”

The Lucas-Lehmer test with GMP and Glucas are two common ways to study primality testing of Mersenne numbers. This section gives a brief introduction to both.

3.1.1 Lucas-Lehmer test with GMP

The Lucas-Lehmer test using the GMP library (the GNU Multiple Precision Arithmetic Library) is the easiest and most intuitive implementation. The method was introduced in section 2.3.4, so here we only discuss GMP. GMP is a free, portable library written in C for arbitrary precision arithmetic on integers, rational numbers, and floating-point numbers. It aims to provide the fastest possible arithmetic for all applications that need higher precision than is directly supported by the basic C types, which is exactly what we need in our program. [21] GMP is designed to give good performance by choosing algorithms based on the sizes of the operands, and by carefully keeping the overhead at a minimum. Although the Lucas-Lehmer test operates on numbers with millions of bits, it is still fast with GMP. The program can be compiled with gcc as follows:

gcc program_name.c -lgmp -lm -o executable_name

In order to run the executable program, just type the following command in terminal:

./executable_name Exponent

where Exponent is the exponent to be tested. To check the execution time, just add time before the command:

time ./executable_name Exponent

A comparison of running this program on two different CPUs can be seen in Table 5.

3.1.2 Glucas

Glucas is a free program that tests the primality of Mersenne numbers (numbers of the form $2^n - 1$) and writes the result into a text file to tell the user whether a Mersenne number is prime. Glucas is the first program to use the YEAFFT library, and it uses it in an optimized way. YEAFFT (Yet Another Fast Fourier Transform) is a library designed to perform integer multiplication of very big numbers using FFT (Fast Fourier Transforms). Glucas and YEAFFT are written by the same author. Before using Glucas in our experiments, there are some notes we need to know:
1. The Glucas package includes GNU tools from automake and autoconf and scripts to make the build and install tasks as easy as possible. As usual, the steps are configure, make and install. In the configure step, however, there are many options to choose from. In order to make a comparison, we compiled one version of Glucas with no threads (Glucas) and one version that uses OpenMP threads (Glucas_pthread). Anyone who wants to use Glucas_pthread should make sure that a multiprocessor system is available. A more detailed description can be found in [22].
2. After the above three steps, Glucas is ready for experiments. For example, to test an exponent with the no-thread version of Glucas, use the following command in a terminal:

time ../src/Glucas Exponent

where Exponent is the exponent to be tested and time measures the running time. The result is automatically written to a file named result.txt. Alternatively, one can use a work directory for the test. The advantage of this method is that it can test a whole group of candidate exponents. The command is simply:

time ../src/Glucas -W . Exponents.txt

where -W enables the work directory and Exponents.txt is the file that stores the candidate exponents. For the Glucas_pthread version, the number of threads has to be set with -T followed by the thread count. An example command is shown below:

time ../src/Glucas_pthread -T 4 -W . Exponents.txt

3.1.3 “Lucas-Lehmer test with GMP” vs “Glucas”

In the previous sections, we introduced two different methods for testing the primality of Mersenne numbers. Here we compare them experimentally. The details of the experiment machine are shown in Table 6.


Table 5: Lucas-Lehmer test using GMP on different CPU

# | Exponent | Intel(R) Core(TM) i5-2410M CPU @2.30GHz | Intel(R) Pentium(R) 4 CPU 3.00GHz
1 | 2203 | 0m0.009s | 0m0.029s
2 | 2281 | 0m0.010s | 0m0.031s
3 | 3217 | 0m0.025s | 0m0.072s
4 | 4253 | 0m0.053s | 0m0.143s
5 | 4423 | 0m0.058s | 0m0.156s
6 | 9689 | 0m0.378s | 0m1.142s
7 | 9941 | 0m0.396s | 0m1.261s
8 | 11213 | 0m0.541s | 0m1.652s
9 | 19937 | 0m2.413s | 0m7.503s
10 | 21701 | 0m2.956s | 0m9.351s
11 | 23209 | 0m3.729s | 0m11.198s
12 | 44497 | 0m18.541s | 0m57.258s
13 | 86243 | 1m41.468s | 5m0.852s
14 | 110503 | 3m3.718s | 9m28.154s
15 | 132049 | 4m45.544s | 15m24.032s
16 | 216091 | 16m8.054s | 51m23.976s
17 | 756839 | 351m0.815s | 992m25.108s
18 | 859433 | 520m39.170s | 1295m55.493s
19 | 1257787 | 1299m40.110s | 3076m41.033s
20 | 1398269 | 1542m24.889s | 4120m34.487s

Table 6: Experiment machine detail Processor Intel(R) Core(TM) i5-2410M CPU @2.30GHz Memory 4.00 GB System type 64-bit Operating System Windows 7 Ultimate Service Pack 1/Ubuntu 11.04 Graphic Card GeForce GT 550M

Table 7 compares the running times of the Lucas-Lehmer test with GMP and of Glucas with different numbers of threads for different exponents.

Table 7: Comparison between Lucas Lehmer test with GMP and Glucas

                Lucas Lehmer    Glucas-2.9.2
#    Exponent   test with GMP   No thread    1 thread     2 thread     3 thread     4 thread
1    2203       0m0.009s        *            *            *            *            *
2    2281       0m0.010s        *            *            *            *            *
3    3217       0m0.025s        *            *            *            *            *
4    4253       0m0.053s        0m0.037s     0m0.109s     *            *            *
5    4423       0m0.058s        0m0.039s     0m0.043s     *            *            *
6    9689       0m0.378s        0m0.127s     0m0.177s     0m0.244s     *            *
7    9941       0m0.396s        0m0.146s     0m0.180s     0m0.284s     *            *
8    11213      0m0.541s        0m0.157s     0m0.168s     0m0.297s     *            *
9    19937      0m2.413s        0m0.275s     0m0.383s     0m0.641s     0m0.790s     0m0.967s
10   21701      0m2.956s        0m0.300s     0m0.462s     0m0.684s     0m0.909s     0m1.102s
11   23209      0m3.729s        0m0.444s     0m0.523s     0m0.913s     0m1.088s     0m1.332s
12   44497      0m18.541s       0m1.752s     0m1.942s     0m3.577s     0m3.707s     0m3.892s
13   86243      1m41.468s       0m7.595s     0m7.565s     0m8.630s     0m9.488s     0m10.146s
14   110503     3m3.718s        0m11.075s    0m12.367s    0m13.543s    0m14.538s    0m15.710s
15   132049     4m45.544s       0m16.170s    0m17.276s    0m19.514s    0m20.744s    0m21.740s
16   216091     16m8.054s       0m43.779s    0m46.971s    0m52.837s    0m53.340s    0m54.479s
17   756839     351m0.815s      9m8.263s     10m3.007s    10m17.459s   10m26.190s   10m44.507s
18   859433     520m39.170s     11m51.770s   12m53.515s   13m23.078s   13m45.431s   13m52.866s
19   1257787    1299m40.110s    23m5.019s    24m55.444s   25m35.296s   25m40.675s   26m0.245s
20   1398269    1542m24.889s    30m44.519s   33m54.146s   36m38.354s   37m44.171s   37m58.506s

* means the exponent is too small for this version

Figure 4 shows the difference in running time between the Lucas-Lehmer test with GMP and Glucas with no thread.

Figure 4: Lucas Lehmer test with GMP vs Glucas with no thread (two panels; vertical axes: time in seconds and time in minutes; series: Lucas Lehmer test with GMP, Glucas-2.9.2 with no thread)

Figure 5 shows the difference in running time among Glucas runs with different numbers of threads.

Figure 5: Glucas with no thread vs Glucas with different threads (vertical axis: time; series: no thread, 1 thread, 2 thread, 3 thread, 4 thread)

3.1.4 Experiment conclusion

As we can see from Fig. 4, the running time of the Lucas-Lehmer test with GMP increased dramatically beyond the exponent 132049, whereas the running time of Glucas with no thread grew only slightly. Most importantly, as the exponent increased, the Lucas-Lehmer test with GMP took very long: for the exponent 1398269 it needed about 50 times as long as Glucas with no thread (1542 minutes versus about 31 minutes in Table 7). Therefore, Glucas clearly performs better and is more useful than the Lucas-Lehmer test with GMP in practical tests.

As can be seen from Fig. 5, Glucas performs very similarly with different numbers of threads, and the time spent tends to increase as the number of threads increases. This is not the result we expected, because we wanted to get a speed-up from the parallel method OpenMP. Is the cause that the communication between threads is too heavy, or something else? To find out, we ran another comparison experiment on a machine with a different architecture. The result can be seen in Table 8.

Table 8: Glucas with different threads on Computer Vision lab machine

                Glucas-2.9.2
#    Exponent   2 thread     4 thread     8 thread
1    11213      0m0.545s     *            *
2    19937      0m1.128s     0m1.523s     *
3    21701      0m1.248s     0m1.850s     *
4    23209      0m1.494s     0m2.454s     *
5    44497      0m3.712s     0m5.528s     0m9.072s
6    86243      0m10.592s    0m12.777s    0m21.322s
7    110503     0m16.737s    0m18.134s    0m30.369s
8    132049     0m22.997s    0m24.934s    0m39.290s
9    216091     0m50.360s    0m52.720s    1m21.321s
10   756839     8m47.138s    7m53.449s    11m22.893s
11   859433     10m27.565s   9m34.264s    14m42.188s
12   1257787    22m46.772s   17m2.749s    25m54.023s
13   1398269    26m45.069s   23m19.353s   32m38.578s

* means the exponent is too small for this version

Figure 6: Glucas with different threads on Computer Vision lab machine (vertical axis: time in minutes; series: 2 thread, 4 thread, 8 thread)

As we can see from Table 8 and Fig. 6, for the largest exponents (from 756839 upwards) the 4-thread version of Glucas actually spent less time on the calculation than the 2-thread and 8-thread versions. This differs from the previous experiment and gives us a clue to the reason. I think the difference is caused by the CPU architecture: the performance is directly related to the number of cores the CPU contains and to how many of them we use. In Table 8, the 4-thread version did best because it used all four cores of the CPU, whereas in Table 7 the no-thread version took the least time because only one core was used in the test. This is an interesting result, and we will carry out a further comparison experiment to see whether this surmise is correct.
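The comparisons in this section come down to simple ratios of measured wall-clock times. As a minimal sketch (the helper names parse_time_seconds and time_ratio are ours and are not part of Glucas or GMP), the following C code converts a value in the output format of the time command, such as 26m45.069s, into seconds and computes how many times longer one run took than another.

```c
#include <stdio.h>

/* Hypothetical helper: convert a string such as "26m45.069s", as printed
 * by the shell's `time` command, into seconds. Returns -1.0 on bad input. */
double parse_time_seconds(const char *text)
{
    double minutes = 0.0, seconds = 0.0;
    if (sscanf(text, "%lfm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Hypothetical helper: how many times longer run `a` took than run `b`.
 * Both arguments are assumed to be well-formed time strings. */
double time_ratio(const char *a, const char *b)
{
    return parse_time_seconds(a) / parse_time_seconds(b);
}
```

For example, time_ratio("1542m24.889s", "30m44.519s") evaluates to about 50.2 for the exponent 1398269 in Table 7, and time_ratio("26m45.069s", "23m19.353s") evaluates to about 1.15 for the 2-thread and 4-thread runs of the same exponent in Table 8.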

3.2 Undiscovered between Mersenne primes 41st and 42nd

It is still not known whether any undiscovered Mersenne primes exist between the 41st (M24036583) and the 42nd (M25964951). Therefore, it is worth doing an experiment to check this.

3.2.1 Basic method

Here we present a basic method for carrying out the experiment. The method can be divided into three steps: creating a list of prime exponents, using factor5 to eliminate exponents, and using Glucas to test primality.

3.2.2 Experiment

1. Create prime number exponent list

The first thing we need is a list of exponents, namely the prime numbers between 24036583 and 25964951. We can write a simple program to calculate and output the primes in this range. Alternatively, an existing list of prime numbers is available in [23]; we can download it and select the part we need. There are 113117 primes in the range, so we separate them into 29 samples. The first sample has 3996 exponents, the last one has 1181 primes and each of the other 27 samples has 4000 primes.
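The "simple program" mentioned above could look like the following C sketch. It is only an illustration under our own assumptions: the function name primes_between is hypothetical, and a plain sieve of Eratosthenes over the whole interval from 0 up to the upper bound is used, which needs one byte of memory per number (about 26 MB for the range considered here).

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch: store the primes p with lo < p < hi in out[] (at most cap of them)
 * and return how many such primes exist, or 0 if memory cannot be allocated.
 * A plain sieve of Eratosthenes over [0, hi) is used; it needs hi bytes. */
size_t primes_between(uint32_t lo, uint32_t hi, uint32_t *out, size_t cap)
{
    if (hi < 3)
        return 0;
    unsigned char *composite = calloc(hi, 1);
    if (composite == NULL)
        return 0;

    /* Cross out the multiples of every prime i with i * i < hi. */
    for (uint64_t i = 2; i * i < hi; i++) {
        if (composite[i])
            continue;
        for (uint64_t j = i * i; j < hi; j += i)
            composite[j] = 1;
    }

    /* Collect the numbers strictly between lo and hi that were not crossed out. */
    size_t count = 0;
    for (uint64_t p = (uint64_t)lo + 1; p < hi; p++) {
        if (p < 2 || composite[p])
            continue;
        if (out != NULL && count < cap)
            out[count] = (uint32_t)p;
        count++;
    }
    free(composite);
    return count;
}
```

Calling primes_between(24036583, 25964951, NULL, 0) only counts the candidate exponents, so the result can be compared with the figure quoted above; passing a buffer additionally returns the exponents themselves, which can then be written to the sample files.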

2. Using factor5 to eliminate exponents

Factor5 is a program that performs trial factoring of Mersenne numbers. It is capable of trial factoring very large numbers with many digits. Factor5 asks for the exponent of the Mersenne number to be tested, for the starting and ending bit depths, and for the number of threads. It writes statistical lines on the screen and outputs its results to the file "results.txt". Every 150 seconds a stand-alone thread updates the "status.txt" file. [24] An example command looks like this:

factor5

Here we use the first sample for the trial factoring. To do this, we use a shell script called "ReadExponent.sh", which lets the program read the exponents from a text file and factor them one by one until no element is left in the file. An example is shown here:

#!/bin/bash
# Run factor5 once for every exponent listed in Sample-1.txt.
for i in $(cat Sample-1.txt)
do
    ./factor5 "$i"
done

It took 5986 minutes to trial factor the first sample (3996 exponents) from 1 bit to 56 bits with 1 thread, and 2150 exponents were removed from the list because they have factors. After that, we can run a second round of trial factoring with a larger end bit, which will eliminate more exponents. An experiment shows that if we increase the end bit to 62, the trial factoring takes around 81 minutes per exponent.
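To make the idea behind this step concrete, here is a minimal C sketch of trial factoring a Mersenne number. It is not the factor5 code. It only illustrates two standard facts: any factor q of 2^p - 1 (with p an odd prime) has the form q = 2kp + 1 with q congruent to 1 or 7 modulo 8 [8], and q divides 2^p - 1 exactly when 2^p mod q = 1. The function names are ours, the sketch is limited to factors below 2^63, and it assumes a compiler that provides unsigned __int128 (for example GCC or Clang).

```c
#include <stdint.h>

/* Compute 2^p mod q by binary exponentiation (q > 1).
 * unsigned __int128 is a GCC/Clang extension, assumed to be available. */
static uint64_t pow2_mod(uint64_t p, uint64_t q)
{
    unsigned __int128 result = 1, base = 2 % q;
    while (p > 0) {
        if (p & 1)
            result = result * base % q;
        base = base * base % q;
        p >>= 1;
    }
    return (uint64_t)result;
}

/* Sketch: return the smallest factor q < 2^end_bit of M_p = 2^p - 1,
 * or 0 if there is none. Assumes p is an odd prime and end_bit <= 63. */
uint64_t mersenne_trial_factor(uint64_t p, unsigned end_bit)
{
    uint64_t limit = (uint64_t)1 << end_bit;
    for (uint64_t k = 1;; k++) {
        uint64_t q = 2 * k * p + 1;      /* candidates have the form 2kp + 1 */
        if (q >= limit)
            break;
        if (p < 64 && q >= ((uint64_t)1 << p) - 1)
            break;                       /* M_p itself is not a proper factor */
        if (q % 8 != 1 && q % 8 != 7)
            continue;                    /* factors are 1 or 7 modulo 8 */
        if (pow2_mod(p, q) == 1)
            return q;                    /* 2^p = 1 (mod q), so q divides M_p */
    }
    return 0;
}
```

For example, mersenne_trial_factor(11, 20) returns 23, because 2^11 - 1 = 2047 = 23 × 89, while mersenne_trial_factor(13, 20) returns 0, because 2^13 - 1 = 8191 is prime. An exponent for which a factor is found can be removed from the candidate list without running the much more expensive primality test.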

3. Using Glucas to test primality

As mentioned in section 3.1.2, we can use a work directory to deal with a group of candidate exponents. After the previous step, we have a list of candidate exponents whose primality needs to be tested. However, the list is longer than the work directory can handle (it supports up to 128 elements), so it has to be split into several parts. The primality test is then carried out as discussed in section 3.1.2.

3.2.3 Conclusion

In this section, we presented a method for discovering or checking Mersenne primes between the 41st and 42nd Mersenne primes. Because the primality test takes an enormous amount of time, we only tested one exponent, 24,036,641, in the experiment. It took 32,641 minutes to finish the primality test, which proved that the corresponding Mersenne number is not prime. For comparison, it was only six years after M41 was discovered that it was confirmed as the 41st Mersenne prime number. Therefore, checking for undiscovered Mersenne primes is very time-consuming and challenging work. A better way to do it is to join GIMPS, which assigns different pieces of work to clients all over the world.

3.3 Study Mersenne Primes on GPU

As we have seen, current hardware (the CPU) limits the search for Mersenne primes; therefore, the GPU (Graphics Processing Unit) has been brought into the project.

3.3.1 Introduction to GPU

The GPU is a specialized circuit designed to accelerate the creation of images in a frame buffer intended for output to a display. A modern GPU has the following functions: rendering polygons, texture mapping, coordinate transformations and accelerated video decoding. The main manufacturers are NVIDIA and ATI. We mention this because the products of different manufacturers need different versions of the code for trial factoring and the Lucas-Lehmer test.

3.3.2 GPU Computing

GPU computing is the use of a GPU (graphics processing unit) as a co-processor to accelerate CPUs for general-purpose scientific and engineering computing. [24] Applications run faster because they use the massively parallel processing power of the GPU to improve performance. A current CPU usually consists of two to eight CPU cores, while a GPU includes hundreds of smaller cores. This massively parallel architecture gives the GPU its high compute performance. A core comparison between CPU and GPU can be seen in Fig. 7. [24]

Figure 7: Core comparison between a CPU and a GPU

The physical layout may differ among GPUs, but there is a general idea: [25]

• The GPU is divided into blocks
• Each block contains one or more streaming multiprocessors (SMs)
• Each SM has a number of streaming processors (SPs), which all share the same control logic and instruction cache within an SM
• All SPs from all SMs have access to up to 4 GB of GDDR DRAM global memory
  – GDDR: Graphics double data rate
  – DRAM: Dynamic random access memory

3.3.3 Experiments on GPU

Scientific and engineering computing on the GPU is a concept of recent years. So far there is little documentation and there are few implementations that use the GPU to study Mersenne primes, so it is worthwhile to carry out some experiments on the GPU. Table 9 lists the available GPU implementations related to Mersenne primes.

Table 9: GPU implementation related to Mersenne Primes

Manufacture   Implementation Type   Idea propose Time   First release        Current version   Language
NVIDIA        Trial Factoring       Early in 2007       At the end of 2009   0.18              CUDA
NVIDIA        Primality Testing     Unknown             16, Oct, 2009        2.01              CUDA
ATI           Trial Factoring       07, Jun, 2011       15, Aug, 2011        0.11              OpenCL

In the experiment, we carry out trial factoring and primality testing on the experiment machine (its details are shown in Table 6).

1. Trial Factoring

The trial factoring implementation for NVIDIA GPUs is called mfaktc (Mersenne number faktoring with CUDA). It is a very fast program which takes advantage of the many cores available on modern graphics cards. It works on the Windows and Linux platforms, and the program has some options that can be chosen, as Fig. 8 shows.

Figure 8: Run mfaktc on Windows platform

An example of how to use mfaktc can be seen in Table 10, and the result is shown in Fig. 9.

Table 10: Example of using mfaktc

Name        mfaktc
Version     0.17 (because the GT 550M supports up to CUDA 3.2 and 0.18 requires CUDA 4.0)
Exponents:  1,000,000

Figure 9: Trial factoring M24036641 from 2^1 to 2^62

Alternatively, we can get an assignment from the manual assignments page, which is written into a file called "worktodo.txt". We then simply need to configure the mfaktc.ini file so that the program reads the worktodo.txt file when it runs. An example call command is "mfaktc-win-64.exe".

Table 11 compares the CPU and the GPU for eliminating exponents as in section 3.2.2. As we can see, mfaktc is faster than factor5 in all three experiments. mfaktc is especially efficient for larger bit ranges, so it is a better way to eliminate exponents in the experiment of section 3.2.

Table 11: A comparison between CPU and GPU of implement trial factoring

Items                                  Factor 5                     mfaktc
Sample 1 (1~56 bit)                    5896 minutes                 3472 minutes
Start from 1 bit and end to 56 bit     Around 1 minute 29 seconds   Around 52 seconds
Start from 1 bit and end to 62 bit     Around 81 minutes            Around 2 minutes and 50 seconds
Notes                                  Run on CPU                   Run on GPU

When the end bit is bigger than 65, mfaktc works a little differently. Let us use M24036641 as a sample and factor it from 1 bit to 68 bits.

Table 12: Trial factoring M24036641 from 2^1 to 2^68

M24036641    2^1~2^65     2^65~2^66    2^66~2^67    2^67~2^68
Time         24m1.194s    14m28.908s   26m53.751s   52m16.797s
Total time   117m40.65s

As can be seen from Table 12, mfaktc processes each bit level separately once the bit level is bigger than 65.

Figure 10: Trial factoring M24036641 from 2^1 to 2^68 (four panels: from 2^1 to 2^65, from 2^65 to 2^66, from 2^66 to 2^67, from 2^67 to 2^68)

One thing that needs to be mentioned is that mfaktc actually uses both CPU and GPU resources. Factoring is done by letting the CPU prepare an array of factor candidates, which is passed to the GPU for parallel trial factoring. The CPU's task is mainly to eliminate composite numbers from the list of factor candidates by using a prime sieve.

2. Primality Testing

The code currently available for running the primality test on the GPU is called CUDALucas. It tests an exponent given as a command line argument. To test multiple exponents, a batch file has to be used, because the version used in this experiment does not support a worktodo file. The latest version, 2.01, supports input via a file. An example can be seen in Table 13.

Table 13: Example of using CUDALucas

Name           CUDALucas
Version        1.2b (because the GT 550M supports up to CUDA 3.2 and 2.01 requires CUDA 4.0)
Restrictions   Exponents: 2 <= Exponent < 151,150,000. Upper limit to be verified. Also depends on VRAM size.
Example Call   C:\Users\John\Desktop\CUDA\LL\CUDALucas.1.2b\CUDALucas.cuda3.2.sm_13.WIN64.exe -c10000 24036641 (this means check file every 10000 iterations; the exponent is 24,036,641)
Capable        Around 4.5 GHz days/day

Figure 11: Primality Testing M24036641

Figure 11 is a screenshot of CUDALucas testing the primality of M24036641. In this comparison experiment, we test M24036641 with both Glucas and CUDALucas to see how efficiently CUDALucas performs. The result is recorded in Table 14.

Table 14: A comparison between CPU and GPU of implement primality testing

Items       Glucas                              CUDALucas
M24036641   32,641 minutes                      11,422 minutes
Run on      Intel(R) Pentium(R) 4 CPU 3.00GHz   GeForce GT 550M

As we can see from Table 14, CUDALucas performs much better and is almost three times faster than Glucas. The reason is, of course, the architectural difference between the two pieces of hardware, but it gives researchers a new and powerful method for studying Mersenne primes.

Another interesting experiment uses Glucas and CUDALucas to test some specified exponents. The result is shown in Table 15. As we can see from Table 15 and Fig. 12, CUDALucas has a relatively stable performance for all the exponents. It is interesting that we can derive the ratio Runtime/Exponent ≈ 1.16±0.04 (every 10000 iterations take 1.16±0.04 minutes) from Table 15. However, this ratio no longer holds when the exponent reaches 24036641 (there the ratio is around 4.75). Due to the time limitation, we could not test the full range of large exponents, but this trend still deserves attention.

By comparison, Glucas performs better for small exponents, but its running time grows rapidly as the exponent increases. Glucas clearly spends less time than CUDALucas when the exponent is smaller than a certain number; for exponents bigger than that number, Glucas loses efficiency. This can be seen from Fig. 12 and Fig. 13.

Table 15: Glucas vs CUDALucas

#   Exponent   Glucas         CUDALucas
1   216091     0m43.779s      25m11.648s
2   756839     9m8.263s       89m31.521s
3   859433     11m51.770s     99m28.619s
4   1257787    23m5.019s      146m7.610s
5   1398269    30m44.519s     164m11.523s
6   2976221    163m36.175s    343m6.364s
7   3021377    188m18.627s    363m7.203s
8   6972593    957m37.007s    793m30.734s
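The ratio between running time and exponent discussed for CUDALucas can be recomputed from the CUDALucas column of Table 15. The following C sketch (the function name is ours) divides a running time in minutes by the exponent measured in units of 10000, using the approximation that the Lucas-Lehmer test of the exponent p needs about p iterations.

```c
/* Sketch: minutes of running time per 10000 Lucas-Lehmer iterations,
 * using the approximation that testing exponent p takes about p iterations.
 * The running time is given as whole minutes plus seconds, as in Table 15. */
double minutes_per_10000_iterations(double minutes, double seconds,
                                    unsigned long exponent)
{
    double total_minutes = minutes + seconds / 60.0;
    return total_minutes / ((double)exponent / 10000.0);
}
```

For the first row of Table 15 (exponent 216091, 25m11.648s) this gives about 1.17, and for the last row (exponent 6972593, 793m30.734s) about 1.14. For M24036641, with the 11,422 minutes of Table 14, it gives about 4.75.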

Figure 12: Time Comparison between using Glucas and CUDALucas (bar chart; vertical axis: time in minutes; horizontal axis: the exponents of Table 15; series: Glucas, CUDALucas)

Figure 13: The Time Spent Trend of Glucas and CUDALucas (series: Trend(Glucas), Trend(CUDALucas))

The CUDALucas test results can be seen in Fig. 14.

Figure 14: CUDALucas test results (eight screenshots, one for each exponent: 216091, 756839, 859433, 1257787, 1398269, 2976221, 3021377 and 6972593)

3.3.4 Conclusion

In this section, we discussed some basic concepts of the GPU and of GPU computing and how they relate to Mersenne primes. We carried out trial factoring and primality testing on the GPU. The results show that this is a better way to study Mersenne primes, because the GPU can handle computations with large numbers in a relatively short time compared with the CPU. One thing to note when using CUDALucas is that, because the program uses the FFT (Fast Fourier Transform), the result might not be accurate, which could lead to false Mersenne primes being reported. Therefore, a double check is necessary after getting a result from CUDALucas. Another thing worth mentioning is that, according to the mfaktc author, GPU code performs around 33% better on a 64-bit operating system than on a 32-bit one.

Chapter 4 Discussion and Future Work

4.1 Summary of experiments

In this project, we learnt the concepts and the study history of Mersenne primes, discussed related theorems and conjectures, explained what GIMPS is, why it exists and how it works, and elaborated on the algorithms used to study Mersenne primes. We also carried out a set of different experiments on Mersenne primes. The results provide useful experience for further work:

1. Factoring

Using the GPU to run mfaktc is a smart choice for trial factoring. It performed better in the experiment and will be more efficient in real studies. Compared with the extremely time-consuming Lucas-Lehmer test, trial factoring with a bigger end bit can still save a great deal of time.

2. Primality Test

Based on the experiment results, the Lucas-Lehmer test with GMP is a basic program for studying primality testing, but it is not useful for real research calculations. Glucas is an efficient way to run the Lucas-Lehmer test with small exponents. By comparison, CUDALucas is a new approach that relies on an NVIDIA GPU and can handle bigger exponents with good performance.

4.2 Meaning of Study Mersenne Primes

Although finding a Mersenne prime is very difficult, we need to be clearly aware that studying Mersenne primes has plenty of theoretical and practical value, especially nowadays.

1. Develop mathematical theory

Mathematics is not an experimental science, but researchers usually observe examples when they form conjectures and then hope to prove them. Researchers want to find properties of Mersenne primes and to analyse and summarise their distribution law from the 47 Mersenne primes already known. Accordingly, several conjectures have been proposed in the past few decades, and we hope that the GIMPS project will find more and more Mersenne primes thanks to the constant development of algorithms and hardware. As the number of known Mersenne primes increases, the understanding of the distribution of prime numbers will also deepen; this corresponds to the cognitive rule of moving from the particular to the general. A good example is that the theory of prime numbers was discovered by observing tables of primes. We believe all this research work will improve the understanding of Mersenne primes and help to complete the mathematical theory.

2. Improve grid and parallel computing

GIMPS takes advantage of the idle CPU and GPU resources of client machines in a grid spread all over the world. This project is the first truly worldwide grid computing project and has greatly facilitated the testing, development and improvement of this technology. Parallel computing technology also benefits from this project. These two computing technologies have been applied to many complex scientific computing and industrial areas, which speeds up research and development in the related fields.

3. Improve and test hardware

As we know, Prime95 is software for discovering new Mersenne primes, but it is also used to test hardware by individual users and companies. Individual users, especially overclocking enthusiasts, usually use Prime95 as a stress test to check the stability of a system. It runs the Lucas-Lehmer test on a known Mersenne prime and checks whether the calculated result is the same as the known correct result. The test uses different FFT sizes. Companies such as Intel have used Prime95 to test CPU chips before they were shipped. A well-known case is the Pentium FDIV bug, which was found in a related effort by Thomas Nicely while he was calculating the twin prime constant. Therefore, a great number of people have directly benefited from the search for Mersenne primes.

4.3 Future work

In the immediate future, some new experiments will be implemented in the lab.

1. GPU Test

As we mentioned before, programs running on the GPU are very powerful and efficient, but rare. Fortunately, a new CUDA program for the Lucas-Lehmer test, named gpuLucas, has just been released to the public. The author of the program carried out comparison experiments between gpuLucas, CUDALucas, Glucas and Prime95. The results show that its performance is almost equivalent to that of CUDALucas, but that there are some issues with the implementation. The details are discussed in [26]. The source code is available at [27], but no executable program is provided. The problem we met is that it produces compile errors on the experiment machine. We will carry out this experiment later (after getting rid of all the errors) to test and verify the result and the issues with the implementation.

2. Take advantage of lab resource

There is a project lab in the IIMS building with many idle server machines that can be used to join the GIMPS project. We will set up a small local area network to connect all the machines and run Prime95 on them. The network topology diagram is shown in Fig. 15.

Figure 15: Experiment Network Topological Diagram

The study of Mersenne primes will continue, and more undiscovered properties and laws are waiting to be found.

References

[1] George Woltman, GIMPS Home, http://www.mersenne.org/

[2] Prime number - Wolfram|Alpha, http://www.wolframalpha.com/input/?i=prime+number&a=*C.prime+number-_*MathWorld-

[3] Andre Barczak, Prime Obsession, PrimeObssesionBlue2010.pdf

[4] Mersenne Primes: History, Theorems and Lists, http://primes.utm.edu/mersenne/index.html#hist

[5] Mersenne Prime -- from Wolfram MathWorld, http://mathworld.wolfram.com/MersennePrime.html

[6] A proof that all even perfect numbers are a power of two times a Mersenne prime, http://primes.utm.edu/notes/proofs/EvenPerfect.html

[7] A proof that if 2^n-1 is prime, then so is n, http://primes.utm.edu/notes/proofs/Theorem2.html

[8] Modular restrictions on Mersenne divisors, http://primes.utm.edu/notes/proofs/MerDiv.html

[9] Donald B. Gillies. Three new Mersenne primes and a statistical theory, http://www.ams.org/journals/mcom/1964-18-085/S0025-5718-1964-0159774-6/S0025-5718-1964-0159774-6.pdf

[10] Trial Division -- from Wolfram MathWorld, http://mathworld.wolfram.com/TrialDivision.html

[11] Trial division - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Trial_division

[12] sieve of Eratosthenes - Wolfram|Alpha, http://www.wolframalpha.com/input/?i=sieve+of+Eratosthenes

[13] The Prime Glossary: Sieve of Eratosthenes, http://primes.utm.edu/glossary/page.php?sort=SieveOfEratosthenes


[14] Pollard's P-1 Method, http://www.frenchfries.net/paul/factoring/theory/pollard.p-1.html

[15] P-1 Factorization Method – Mersennewiki, http://mersennewiki.org/index.php/P-1

[16] A proof of Lucas-Lehmer Test, http://primes.utm.edu/notes/proofs/LucasLehmer.html

[17] PrimeNet 5.0, http://www.mersenne.org/primenet/

[18] The Math – GIMPS, http://www.mersenne.org/various/math.php

[19] Scott Kurowski's Home Page, http://scottkurowski.com/

[20] Makoto Tachikawa. Pc grid computing—Using increasingly common and powerful PCs to supply society with ample computing resources [J]. Science & Technology Trends Quarterly Review, 2006(18); 45-53

[21] Torbjorn Granlund and the GMP development team, The GNU Multiple Precision Arithmetic Library, http://gmplib.org/gmp-man-5.0.4.pdf

[22] Glucas manual online, http://glucas.sourceforge.net/glucas/index.html

[23] The first fifty million primes, http://primes.utm.edu/lists/small/millions/

[24] What is GPU Computing? | High-Performance Computing | NVIDIA | NVIDIA, http://www.nvidia.com/object/what-is-gpu-computing.html

[25] Ian Bond, Exploring Massive Parallel Computation with GPU, 2011 Sagan Exoplanet Workshop, July 25-29 2011

[26] Andrew Thall, Fast Mersenne Prime Testing on the GPU, www.ece.neu.edu/groups/nucar/GPGPU4/files/thall.pdf

[27] gpuLucas, https://github.com/Almajester/gpuLucas


Appendix

A. Programs

The programs used in the project are listed below:

1. Trial Division Example Code (pages 8~9)
2. Sieve Of Eratosthenes Example Code (pages 10~11)
3. Lucas–Lehmer Test Example Code (page 13)
4. Lucas-Lehmer test using GMP library

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <gmp.h>
//#pragma precision=log10l(ULLONG_MAX)/2

typedef enum { FALSE=0, TRUE=1 } BOOL;

/* Trial division check that the exponent p itself is prime. */
BOOL is_prime( unsigned long int p ){
    if( p == 2 ) return TRUE;
    else if( p <= 1 || p % 2 == 0 ) return FALSE;
    else {
        const unsigned long int to = (unsigned long int)sqrt((double)p);
        unsigned long int i;
        for(i = 3; i <= to; i+=2)
            if (!(p % i)){ printf("P is NOT prime \n"); exit(0); }
        return TRUE;
    }
}

/* Lucas-Lehmer test: start with s = 4, repeat s = (s*s - 2) mod (2^p-1)
   p-2 times; 2^p-1 is prime exactly when the final s is 0. */
BOOL is_mersenne_prime( unsigned long int p ){
    if( p == 2 ) return TRUE;
    else {
        mpz_t m_p;
        mpz_t s;
        unsigned long int i;
        BOOL result;
        mpz_init(m_p);//initialise m_p to 0
        mpz_ui_pow_ui(m_p,2UL,p);//m_p=2^p
        mpz_sub_ui(m_p,m_p,1UL);//m_p=m_p-1
        mpz_init_set_str(s,"4",10);//initialise s to 4
        for (i = 3; i <= p; i++){
            mpz_mul(s,s,s);//s=s*s
            mpz_sub_ui(s,s,2UL);//s=s-2
            mpz_mod(s,s,m_p);//s=s%m_p
            //s = (s * s - 2) % m_p;
        }
        result = (mpz_cmp_ui(s,0UL)==0) ? TRUE : FALSE;
        mpz_clear(s);//release the GMP integers
        mpz_clear(m_p);
        return result;
    }
}

int main(int argc, char **argv){
    unsigned long int p;
    if(argc!=2) {printf("You need a prime number for lucaslehmertest \n");exit(0);}
    p=strtoul(argv[1],NULL,10);
    printf(" Result for LucasLehmer Test:\n");
    if( is_prime(p) && is_mersenne_prime(p) ){
        printf (" 2^ %lu -1 is prime\n",p);
    } else printf (" 2^ %lu -1 is NOT prime\n",p);
    return 0;
}

5. Glucas: available at http://www.oxixares.com/glucas/
6. Factor5: available at http://www.moregimps.it/billion/download1.php
7. Mfaktc: available at http://www.mersenneforum.org/mfaktc/
8. CUDALucas: available at http://www.mersenneforum.org/showthread.php?t=16142

B. Tables

Table 1: Early History
Table 2: Modern History
Table 3: Recent History (Using GIMPS)
Table 4: Test when prime p=89 is divisible by 2^11-1 or not
Table 5: Lucas-Lehmer test using GMP on different CPU
Table 6: Experiment machine detail
Table 7: Comparison between Lucas Lehmer test with GMP and Glucas
Table 8: Glucas with different threads on Computer Vision lab machine
Table 9: GPU implementation related to Mersenne Primes
Table 10: Example of using mfaktc
Table 11: A comparison between CPU and GPU of implement trial factoring
Table 12: Trial factoring M24036641 from 2^1 to 2^68
Table 13: Example of using CUDALucas
Table 14: A comparison between CPU and GPU of implement primality testing
Table 15: Glucas vs CUDALucas

C. Figures

Figure 1: A trial division example program
Figure 2: A Sieve of Eratosthenes example program
Figure 3: Mechanism of PC Grid Computing
Figure 4: Lucas Lehmer test with GMP vs Glucas with no thread
Figure 5: Glucas with no thread vs Glucas with different threads
Figure 6: Glucas with different threads on Computer Vision lab machine
Figure 7: Core comparison between a CPU and a GPU
Figure 8: Run mfaktc on Windows platform
Figure 9: Trial factoring M24036641 from 2^1 to 2^62
Figure 10: Trial factoring M24036641 from 2^1 to 2^68
Figure 11: Primality Testing M24036641
Figure 12: Time Comparison between using Glucas and CUDALucas
Figure 13: The Time Spent Trend of Glucas and CUDALucas
Figure 14: CUDALucas test results
Figure 15: Experiment Network Topological Diagram