Intern. J. Computer Math., 2002, Vol. 79(7), pp. 797–806

APPLICATION OF PARALLEL FRAMEWORK TO THE STRONG PRIME PROBLEM

DER-CHUYAN LOU,* CHIA-LONG WU and RONG-YI OU

Department of Electrical Engineering, Chung Cheng Institute of Technology, National Defense University, Tahsi, Taoyuan 33509, Taiwan

(Received 9 April 2001)

This paper uses the well-known PVM (Parallel Virtual Machine) software with several personal computers running the widespread Windows 98 operating system to construct a heterogeneous PC cluster. Drawing on research into PC cluster systems and cluster computing theory, we apply our heterogeneous PC cluster to generate more secure parameters for public-key cryptosystems such as RSA. Because each parameter must satisfy the restrictions imposed by its underlying mathematical theory, generating these parameters efficiently requires enormous computing power. We apply heterogeneous PCs combined with the PVM software to generating cryptosystem parameters that conform to today's security specifications and requirements. We generate such data in practice to show that the cluster can effectively accumulate enormous computing power, and we demonstrate the application of cluster computing to finding the strong primes needed by some public-key cryptosystems.

Keywords: Parallel virtual machine; Cluster computing; Cryptography; Primality test; Strong prime

C.R. Categories: E.3, F.1.2, I.1.1.2, K.6.5

1. INTRODUCTION

In this section, we introduce the Parallel Virtual Machine (PVM) system, which is based on the message-passing model. Message passing allows parallel programs to be designed across different machines, each with its own data formats, and lets these machines communicate with one another. Based on this property, PVM [1, 2] connects different platforms and combines them into one virtual machine with strong computing power, even when the machines have different specifications; this is the origin of the name "PVM". The PVM project started at Oak Ridge National Laboratory in 1989 [3]. It was intended to offer a heterogeneous, general-purpose environment that not only supports multi-party protocols effectively but can also be adapted to distributed computation algorithms. Although PVM was regarded in 1992 as the most popular distributed computing system, with the largest user population, this does not mean that PVM can finish all jobs automatically. PVM [4] only provides an environment in which parallel programs can execute. Program designers must explicitly specify the program instructions where parallel computation is needed: PVM cannot distribute instructions and data automatically, that is, it offers no automatic parallelization mechanism. PVM provides a software environment for message passing between heterogeneous computers. When writing a PVM main program, users must define all the parallel procedures and must keep in mind that, although PVM is a parallel computation interface, the controlling main programs still run sequentially. Under this control, a task can run as an ordinary Unix or Win32 process (with no parallel capability of its own), or as a PVM task within the virtual machine. In general, a PVM program remains sequentially controlled.

In this paper, we use the well-known PVM software, whose interface follows the message-passing model, together with personal computers running Windows 98, to build an experimental cluster. PVM can build a framework spanning different computer platforms; here, different computers are combined into one powerful virtual machine to meet the computing demands of modern cryptosystems. We use three PCs of different performance classes to demonstrate the heterogeneous property and to show that ordinary personal computers can also accumulate adequate computing power for solving the strong prime problem. Their specifications are listed in Table I.

The rest of the paper is organized as follows. Section 2 focuses on the strong prime problem, the bottleneck of the RSA public-key cryptosystem, and the popular topic of cluster computing. Section 3 introduces and discusses several primality tests. Sections 4 and 5 present our experimental design and the performance results obtained with these primality test algorithms for generating RSA keys. Finally, Section 6 concludes with our research contributions and directions for future work.

*Corresponding Author. Fax: 886-3-3801407; E-mail: [email protected]

ISSN 0020-7160 print; ISSN 1029-0265 online © 2002 Taylor & Francis Ltd DOI: 10.1080/00207160290029228

2. THE STRONG PRIME PROBLEM

As is well known, number theory plays an important role in public-key cryptography [5], and prime numbers are an essential topic in number theory. Constructing strong primes as the main security parameters of some public-key cryptosystems has been widely discussed. Here we discuss the RSA public-key cryptosystem, its bottleneck, and the strong prime problem; we then turn to cluster computing and PVM concepts.

2.1. Bottleneck of the RSA Public-key Cryptosystem

In 1978, three MIT professors, Rivest, Shamir and Adleman, proposed a public-key cryptosystem based on modular exponentiation, whose security rests on the difficulty of factoring the product of two large primes; it is now known as the RSA public-key cryptosystem [6].

TABLE I System specifications

Name | CPU | Memory
D-Celeron (PC1) | 2 × Celeron-450 | 128 MB
Celeron (PC2) | Celeron-300 | 64 MB
Pentium (PC3) | Pentium-75 | 48 MB

The RSA algorithm is widely used in public-key cryptosystems [7]. Although public-key cryptosystems have advantages, they also have drawbacks. In particular, their encryption/decryption operations are complex and require enormous computing capacity. Compare the RSA public-key cryptosystem with the DES (Data Encryption Standard) secret-key cryptosystem: a DES hardware chip can reach approximately 45 Mbit/s, whereas RSA reaches only about 50 kbit/s, a difference of roughly 1,000 times (45 Mbit/s ÷ 50 kbit/s = 900), which clearly shows the bottleneck of the RSA public-key cryptosystem. Today DES is no longer secure; its main threat is exhaustive key search such as Wiener's design [8] (based on a known-plaintext attack). Because public-key systems are vulnerable to shortcut attacks (for RSA, factoring the modulus), they must use key sizes substantially greater than those required for comparable levels of security with traditional single-key methods. AES [9] now uses secret keys of 128, 192 or 256 bits, and RSA is recommended to extend its public key from 512 bits to 1024 bits to remain secure; the computing capacity required therefore increases enormously.

2.2. Strong Prime Number

The RSA cryptosystem is a block cipher: it processes the input one block at a time and produces an output block for each input block. Plaintext is encrypted in blocks, each of whose binary value is less than some number $N$. Given two primes $p$ and $q$, let $N = pq$. Euler's totient function then gives $\phi(N) = (p-1)(q-1)$, and the decryption key is $d \equiv e^{-1} \pmod{\phi(N)}$; that is, $ed = k\phi(N) + 1$ for some integer $k$, so $ed \equiv 1 \pmod{\phi(N)}$.

It follows that the security of RSA rests on the difficulty of factoring. It is obvious that, for the RSA public key $(e, N)$, if $N$ can be factored into $p$ and $q$, then the trapdoor $T = \phi(N) = (p-1)(q-1)$, and hence the decryption key $d$ on which decryption depends, can no longer be hidden. The decryption key $d$ then ceases to be secret, and the system provides no security whatsoever. Although it has not been proved that breaking RSA is as hard as factoring $N$, it is generally believed that the two problems are equally difficult. RSA parameters must therefore be chosen with great care. Since the security of RSA rests on the difficulty of factoring $N$, the prime factors of $N$ should be strong primes, so that factoring $N$ remains computationally infeasible.

The strong prime property is defined as follows. Let $r_1, s_1, r_2, s_2$ be four very large primes, which we call "simple primes", and let $x \mid y$ denote that $y$ is divisible by $x$. If $r_1 \mid p_1 - 1$, $s_1 \mid p_1 + 1$, $r_2 \mid p_2 - 1$ and $s_2 \mid p_2 + 1$, we call such $p_1, p_2$ "complex primes".
Carrying the construction one level further, if $p_1 \mid p - 1$ and $p_2 \mid p + 1$, then $p$ is called a "strong prime" [10]. The structure of a strong prime is shown in Figure 1. Clearly, any ordinary prime can also be called a simple prime. Factoring a product $N$ constructed from strong primes $p$ and $q$, however, is considered one of the most difficult mathematical problems. Finding a large prime is already regarded as a demanding job, and finding a strong prime is an even greater task.

FIGURE 1 The structure of a strong prime.
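To make the key relations of Section 2.2 concrete, here is a minimal Python sketch (Python 3.8+ for `pow(e, -1, m)`). It uses hypothetical textbook-sized primes 61 and 53 with exponent 17, computes $N$, $\phi(N)$ and $d$, and shows that knowing one factor of $N$ is enough to rebuild $d$, which is exactly why the factors must be hard to find. The function names are illustrative, not part of the paper.

```python
from math import gcd


def rsa_keypair(p, q, e):
    """Return (N, e, d) with N = p*q and e*d ≡ 1 (mod phi(N))."""
    n = p * q
    phi = (p - 1) * (q - 1)      # trapdoor T = phi(N) = (p - 1)(q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(N)")
    d = pow(e, -1, phi)          # modular inverse; needs Python 3.8+
    return n, e, d


def d_from_factor(n, e, p):
    """Whoever learns one prime factor p of N can rebuild phi(N) and hence d."""
    q = n // p
    return pow(e, -1, (p - 1) * (q - 1))


# Hypothetical toy parameters; real RSA primes have hundreds of bits.
n, e, d = rsa_keypair(61, 53, 17)   # n == 3233, d == 2753
message = 65
cipher = pow(message, e, n)         # encryption
assert pow(cipher, d, n) == message # decryption recovers the message
assert d_from_factor(n, e, 61) == d # factoring N exposes the secret key
```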

2.3. Cluster Computing

Under these circumstances, how to effectively integrate the computing capacity of many distributed small computers has become an important topic in computer science, known as "cluster computing". As the name suggests, tens or even hundreds of thousands of networked PCs can form a computer cluster. A time-consuming problem is divided into several small sub-problems, which are distributed to the machines and solved in parallel; finally, the partial results from all machines are combined. Such a cluster not only far exceeds the capability of a single PC but can even outperform supercomputers. For example, Los Alamos National Laboratory (LANL) built Avalon in 1998 from 140 nodes, each with a 533 MHz Alpha 21164A CPU and 256 MB of memory [11]. It achieved 47.7 Gflops on the Linpack benchmark, which ranked Avalon 113th among the world's 500 largest computer systems, ahead of many well-known supercomputers. Therefore, when dealing with time-consuming problems in cryptosystems, the supercomputer is no longer the only solution. A small cluster may consist of three or four computers in one laboratory and handle only modest amounts of data, while a large cluster can extend to millions of Internet-connected computers, as in SETI (Search for Extra-Terrestrial Intelligence). Such a cluster costs only a fraction of a supercomputer (once believed to be the only machine able to finish such jobs) and avoids the supercomputer's output-restrained problem. A computer cluster may therefore be considered an inexpensive solution to time-consuming problems.
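The divide, distribute and combine pattern described above can be sketched in a few lines. This is a minimal illustration, assuming Python threads as stand-ins for cluster machines (it is not PVM code, and because of Python's global interpreter lock threads show the structure rather than real CPU speedup). The job, counting primes in a range, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Plain trial division; adequate for the small toy ranges used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def split_range(lo, hi, parts):
    """Divide [lo, hi) into `parts` contiguous sub-problems of near-equal size."""
    step, extra = divmod(hi - lo, parts)
    bounds, start = [], lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def count_primes(bounds):
    """Sub-problem solved by one 'machine': count primes in [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def cluster_count(lo, hi, workers=3):
    """Scatter the sub-problems, solve them concurrently, then gather and combine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(lo, hi, workers)))
```

Here every worker receives a fixed, equal share up front; Section 4 explains why such a static split causes a bottleneck when the machines differ greatly in speed.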

3. PRIMALITY TEST AND STRONG PRIME NUMBER

Primality testing [12–15] of large numbers is a popular topic in mathematics, computer science and cryptography. For example, it helps solve important security problems in the RSA public-key cryptosystem [16], and many modern primality testing algorithms have recently been incorporated as standard features of Computer Algebra Systems (CAS). Here we introduce three primality testing techniques, "Miller-Rabin", "Wilson" and "Proth", together with a strong prime generation formula.

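Miller-Rabin Primality Test

Let the input $n$ be a positive odd integer, and write $n = 2^s t + 1$ with $s \ge 1$ and $t$ odd. Choose a positive integer $a$ with $1 < a < n - 1$ and test whether $a^t \not\equiv 1 \pmod{n}$ and $a^{2^j t} \not\equiv -1 \pmod{n}$ for all $0 \le j \le s - 1$. If both conditions hold, $a$ is a witness and $n$ is certainly composite; otherwise $n$ passes this round and is probably prime.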
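The test above translates directly into Python, whose built-in integers and three-argument `pow` replace the hand-written big-number routines of Section 4. This is a sketch, not the authors' implementation: the quick screen by small primes mirrors the "small prime number" step mentioned in Section 4, and the `rng` parameter is an assumption added so that runs can be seeded.

```python
import random


def miller_rabin(n, rounds=20, rng=random):
    """Miller-Rabin test as described above.

    Returns False only when a witness proves n composite; returns True when
    n survived `rounds` random bases (error probability at most 4**-rounds).
    """
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):   # quick small-prime screen
        if n % q == 0:
            return n == q
    s, t = 0, n - 1                   # write n = 2^s * t + 1 with t odd
    while t % 2 == 0:
        s += 1
        t //= 2
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, t, n)              # a^t mod n
        if x == 1 or x == n - 1:
            continue                  # a is not a witness
        for _ in range(s - 1):
            x = x * x % n             # a^(2^j * t) mod n for j = 1 .. s-1
            if x == n - 1:
                break                 # a^(2^j t) ≡ -1: not a witness
        else:
            return False              # a is a witness: n is composite
    return True
```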

Wilson Primality Test

An integer $n > 1$ is prime if and only if $(n-1)! \equiv -1 \pmod{n}$.
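A direct sketch of Wilson's criterion follows. Note one deliberate difference from the paper: the experiments in Section 4 compute the full factorial (nearly 4,500 digits for $n = 1597$), whereas this sketch reduces modulo $n$ after every multiplication, which keeps numbers small but still needs about $n$ multiplications, so it remains impractical for cryptographic sizes.

```python
def wilson_is_prime(n):
    """Wilson's theorem: n > 1 is prime iff (n - 1)! ≡ -1 (mod n).

    Deterministic but needs n - 2 modular multiplications, so it is only
    practical for small n; reducing mod n at each step keeps numbers small.
    """
    if n < 2:
        return False
    acc = 1
    for i in range(2, n):
        acc = (acc * i) % n           # running value of i! mod n
    return acc == n - 1               # n - 1 is -1 modulo n
```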

Proth Primality Test

Let $N = k \cdot 2^n + 1$ with $k$ odd and $k < 2^n$. If there is an integer $a$ such that $a^f \equiv -1 \pmod{N}$, where $f = k \cdot 2^{n-1} = (N-1)/2$, then $N$ is prime.
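Proth's theorem can be sketched as a search over a few small bases. The list of bases and the three-valued return are assumptions of this sketch: by Euler's criterion a prime $N$ always gives $a^f \equiv \pm 1$, so any other residue proves $N$ composite, while a run of $+1$ results is merely inconclusive.

```python
def proth_test(k, n, bases=(3, 5, 7, 11, 13, 17, 19, 23)):
    """Proth's theorem for N = k*2^n + 1 with k odd and k < 2^n.

    Returns True (proven prime), False (proven composite), or None
    (no tested base was conclusive).
    """
    if k % 2 == 0 or k >= 2 ** n:
        raise ValueError("Proth form requires odd k < 2^n")
    N = k * 2 ** n + 1
    f = k * 2 ** (n - 1)              # f = (N - 1) / 2
    for a in bases:
        if a % N == 0:
            continue                  # base is a multiple of N: skip it
        r = pow(a, f, N)
        if r == N - 1:
            return True               # a^f ≡ -1 (mod N): Proth certificate
        if r != 1:
            return False              # Euler's criterion fails: N is composite
    return None
```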

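Strong Prime Number Generation Formula

Suppose we have two primes $r$ and $s$. Let $s^{-1}$ denote the multiplicative inverse of $s$ modulo $r$, so that $s s^{-1} \equiv 1 \pmod{r}$. We define the strong prime candidate as $p = (2ss^{-1} - 1) + 2krs$, where $k$ is the quotient of $2^{L-1}$ divided by $2rs$ and $L$ is the bit length of the prime to be generated from $r$ and $s$. Because $p - 1 = 2(ss^{-1} - 1) + 2krs \equiv 0 \pmod{r}$ and $p + 1 = 2ss^{-1} + 2krs \equiv 0 \pmod{s}$, every such candidate satisfies $r \mid p - 1$ and $s \mid p + 1$; each candidate must still be tested for primality.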
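The formula can be checked with a short sketch. It assumes Python 3.8+ for `pow(s, -1, r)` and distinct primes $r$ and $s$. The helper names are hypothetical, and the trial-division predicate is only a stand-in for the Miller-Rabin test used in the paper, suitable for toy sizes. Stepping $k$ upward by one adds $2rs$, which preserves both divisibility properties.

```python
def strong_candidates(r, s, L, count):
    """Yield `count` candidates p = (2*s*s_inv - 1) + 2*k*r*s from k = 2^(L-1) // (2rs)."""
    s_inv = pow(s, -1, r)                  # s * s_inv ≡ 1 (mod r); r, s distinct primes
    p0 = 2 * s * s_inv - 1                 # r | p0 - 1 and s | p0 + 1
    k = 2 ** (L - 1) // (2 * r * s)
    for i in range(count):
        yield p0 + 2 * (k + i) * r * s     # adding 2rs keeps both properties


def first_prime_candidate(r, s, L, is_prime, limit=10_000):
    """Return the first candidate accepted by `is_prime`, or None."""
    for p in strong_candidates(r, s, L, limit):
        if is_prime(p):
            return p
    return None


def trial_division(n):
    """Hypothetical stand-in primality test; fine for toy sizes only."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True


# Toy example: r = 19, s = 13, L = 10 gives s_inv = 3, p0 = 77, k = 1, p = 571.
print(first_prime_candidate(19, 13, 10, trial_division))   # 571
```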

The Proth and Wilson tests never accept a composite number, so a number they accept is certainly prime; they are therefore called "deterministic primality tests", and the numbers they find are called "provable primes". The Miller-Rabin test, on the other hand, does not guarantee that the number found is prime, only that it is prime with high probability: if the test is repeated with $k$ independently chosen bases, the probability that a composite number passes all $k$ rounds is at most $(1/4)^k$. Table II summarizes the advantages and disadvantages of the three methods.

TABLE II Comparison of primality test methods

Method | Advantage | Disadvantage
Miller-Rabin | As k increases, the error rate (1/4)^k → 0, so adequate accuracy can be achieved. | Cannot guarantee that no composite slips through.
Wilson | Can find every prime less than n by exhaustive listing, without misses. | Of theoretical value only; the computation is too costly.
Proth | Primes can be found quickly. | The prime has the special form k · 2^n + 1, so p − 1 is easy to factor.

4. EXPERIMENTAL DESIGN

The basic idea of this paper is to use cluster computing to provide the RSA cryptosystem with "big enough" primes promptly and thus enhance its security. Under the RSA specification, however, primes of at least hundreds of bits are needed to meet the basic requirements of the system. In the most popular parallel programming languages (for example, Fortran 90 and C++), even a variable declared as unsigned long falls far short of this requirement. The first problem we must deal with is therefore the data structure: such large primes are far beyond the representation limit of the built-in types of our programming language. After some evaluation, we decided to use an array as the basic data structure of our test program, and all variables in the program are represented as arrays. The size of each array can then be set freely at the head of the program to meet the requirements of the different primality test methods. For example, in number[digit_size] = {1024, 6, 8, ...}, number is the name of an array that stores an ordinary decimal number. The array size is defined once at the beginning of the program through digit_size, so that it can be changed at any time without affecting any other part of the program. The first position of the array (number[0]) stores the number of decimal digits, and the digits themselves are stored in order in number[1], number[2], number[3], ..., number[number[0]]. This data structure easily solves the long-standing "overflow" problem of fixed-size integer types, and it also provides an effective debugging aid.

During the experiments, the data structure produced correct results even for the Wilson primality test, which involves enormous numbers of digits: for n = 1,597, computing (n − 1)! produces a decimal number of nearly 4,500 digits. More results are given in Table III. Because the digit-count field number[0] can be inspected after each operation, overflow in any subroutine is easy to spot, and any abnormal situation in an individual operation can be detected, which gives an effective debugging capability.
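The digit-array layout above can be sketched as follows, with number[0] holding the digit count and number[1..number[0]] holding decimal digits. The text does not state the digit order, so this sketch assumes least significant digit first; the routines `add` and `mul_small` are hypothetical illustrations of the kind of subroutines the paper describes, not its code. Computing a factorial this way also yields the Table III digit counts directly from number[0].

```python
# Sketch of the digit-array layout: number[0] holds the digit count,
# number[1..number[0]] hold decimal digits. The digit order is an
# ASSUMPTION of this sketch: least significant digit first (number[1] = units).


def from_int(value):
    """Encode a non-negative int as [count, d1, d2, ...] (least significant first)."""
    digits = [int(ch) for ch in reversed(str(value))]
    return [len(digits)] + digits


def to_int(number):
    """Decode [count, d1, d2, ...] back into a Python int (for checking only)."""
    return int("".join(str(d) for d in reversed(number[1:number[0] + 1])))


def add(a, b):
    """Schoolbook addition with carry propagation."""
    out, carry = [], 0
    for i in range(1, max(a[0], b[0]) + 1):
        total = carry + (a[i] if i <= a[0] else 0) + (b[i] if i <= b[0] else 0)
        out.append(total % 10)
        carry = total // 10
    if carry:
        out.append(carry)
    return [len(out)] + out


def mul_small(a, m):
    """Multiply a digit array by a small non-negative int m."""
    out, carry = [], 0
    for i in range(1, a[0] + 1):
        total = a[i] * m + carry
        out.append(total % 10)
        carry = total // 10
    while carry:
        out.append(carry % 10)
        carry //= 10
    while len(out) > 1 and out[-1] == 0:   # drop leading zeros (e.g. m == 0)
        out.pop()
    return [len(out)] + out


def factorial(n):
    """n! as a digit array; factorial(n)[0] is its number of decimal digits."""
    acc = from_int(1)
    for i in range(2, n + 1):
        acc = mul_small(acc, i)
    return acc


print(factorial(96)[0])   # 150 decimal digits in 96!, i.e. (n - 1)! for n = 97
```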

TABLE III Wilson primality test

Test number n | Decimal digits of (n − 1)!
97 | 150
127 | 212
251 | 493
367 | 781
499 | 1129
541 | 1243
677 | 1622
727 | 1764
877 | 2200
977 | 2496
1009 | 2592
1103 | 2876
1213 | 3213
1301 | 3486
1423 | 3868
1597 | 4421

With this data representation, the arithmetic operators built into the programming language can no longer be used directly; even simple addition and subtraction must be re-implemented as subroutines. We therefore designed subroutines for addition, subtraction and the other required operations, and tuned each one for efficiency according to practical requirements and program characteristics. After completing all the arithmetic subroutines on this self-constructed data structure, we construct practical strong primes according to the RSA security criteria: the strong prime must be at least 512 bits long, and it is built up level by level. Based on the comparison in Table II, we start by producing simple primes of approximately 100 bits with the Proth algorithm. Using the strong prime generation formula $(2ss^{-1} - 1) + 2krs$ [17], we input two primes $r$ and $s$ and combine them into a candidate for a 200-bit complex prime. We then use trial division by small primes and the Miller–Rabin test to find a 200-bit complex prime; repeating the same generation and testing process at the next level, we finally obtain a strong prime of approximately 500 bits. Next, we use the proposed PVM framework to connect the distributed personal computers and test the system online. The system structure is shown in Figure 2. Regarding load balancing, a key issue in cluster computing, PVM assigns jobs in round-robin fashion. Since PVM cannot assign jobs according to the type of each machine, a "system bottleneck" easily arises.
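The level-by-level generation procedure just described (Proth simple primes, then complex primes, then a strong prime) can be sketched end to end at toy sizes. This is an illustration under stated assumptions, not the authors' program: Python integers replace the digit arrays, the Proth exponent n = 10 and the `bits` targets are hypothetical choices that keep the run fast, the simple primes are paired at random, and candidates are screened by small primes and then Miller-Rabin. Python 3.8+ is assumed for `pow(s, -1, r)`.

```python
import random

SMALL_PRIMES = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31)


def is_probable_prime(n, rounds=20, rng=random.Random(0)):
    """Trial division by small primes, then Miller-Rabin (error <= 4**-rounds)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for q in SMALL_PRIMES:
        if n % q == 0:
            return n == q
    s, t = 0, n - 1
    while t % 2 == 0:
        s, t = s + 1, t // 2
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), t, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True


def proth_primes(n, count):
    """First `count` primes k*2^n + 1 (k odd, k < 2^n) certified by Proth's theorem."""
    found = []
    for k in range(1, 2 ** n, 2):
        N = k * 2 ** n + 1
        for a in SMALL_PRIMES:
            if a % N == 0:
                continue
            r = pow(a, (N - 1) // 2, N)
            if r == N - 1:            # certificate: N is prime
                found.append(N)
                break
            if r != 1:                # Euler's criterion fails: N is composite
                break
        if len(found) == count:
            break
    return found


def combine(r, s, bits):
    """First probable prime p = (2*s*s_inv - 1) + 2*k*r*s, k from 2^(bits-1) // (2rs)."""
    s_inv = pow(s, -1, r)             # s * s_inv ≡ 1 (mod r)
    p0 = 2 * s * s_inv - 1
    k = 2 ** (bits - 1) // (2 * r * s)
    while not is_probable_prime(p0 + 2 * k * r * s):
        k += 1
    return p0 + 2 * k * r * s         # r | p - 1 and s | p + 1


def strong_prime_pipeline(n=10, bits=40, seed=1):
    """Simple (Proth) primes -> complex primes -> strong prime, at toy sizes."""
    rng = random.Random(seed)
    simple = proth_primes(n, 8)
    while True:
        r1, s1, r2, s2 = rng.sample(simple, 4)      # random pairing of simple primes
        p1, p2 = combine(r1, s1, bits), combine(r2, s2, bits)
        if p1 != p2:                  # the next level needs two distinct primes
            break
    p = combine(p1, p2, 2 * bits + 8) # p1 | p - 1 and p2 | p + 1
    return p, p1, p2, r1, s1, r2, s2
```

In the paper, each call like `combine(r, s, ...)` is the unit of work handed to one machine of the cluster; the sketch runs everything in a single process.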
Because PVM has no load-balancing mechanism, this issue becomes more important when machines of very different performance are used (each processor of the fastest machine is a Celeron-450, while the slowest machine is a Pentium-75). We therefore reject the usual fixed job-assignment pattern and do not bind any machine to a particular portion of the job; instead, jobs are assigned dynamically. In the first round, each distributed process receives the same portion of work (a pair of prime factors). After this first round, jobs are no longer assigned uniformly. When a process finishes its job, it passes the result back to the main process; the result is either a strong prime or a signal that the factor pair was rejected. The idle computer is immediately assigned a new job, while the main process saves the output or keeps it as a factor for the next level. With this design, every process can be given a job at any time without idle waiting, and each process runs independently without interfering with the others. Avoiding the bottleneck in this way improves and accelerates the overall performance.

FIGURE 2 System hardware structure.

5. EXPERIMENTAL RESULTS

We use the Proth algorithm to generate 100 simple primes for each of n = 10, n = 15 and n = 20, which are used in the following experiments. We then randomly combine pairs of these 100 simple primes and use the strong prime generator, based on the Miller-Rabin method, to test the resulting complex prime candidates, measuring the time needed to complete the whole test process. The times and relative performance are shown in Table IV. Table IV shows that, as the computational load increases, the PC cluster built from the three machines maintains a computing capacity equal to 1.9 times PC1, 2.8 times PC2 or 16.1 times PC3. In other words, the cluster adds (1.9 − 1 = 0.9) times the computing power of the dual-processor PC1. Figure 3 plots the results of Table IV for n = 10. At first sight, the cluster delivers only about twice the computing power of the dual-processor machine, but a closer look reveals a more significant result. Machine PC3 has only a Pentium-75 processor, which already falls short of users' needs in daily use, let alone in numerical computation. Although PC2 has the same type of Celeron processor as PC1, its clock is only 300 MHz, whereas PC1 has two Celeron-450 processors; both machines are therefore far slower than PC1, and the performance difference between PC1 and PC3 is about 8 times. In the cluster configuration, however, the modest computing power of these machines can be integrated into substantial overall performance. The contribution of PC clusters should not be neglected: if more modern and efficient computers are integrated, an impressive accumulated computing power could be achieved.
Therefore, it is fair to call such a cluster a DIY supercomputer.

TABLE IV Experimental results (run time; in parentheses, run time relative to the cluster)

n | Pentium (PC3) | Celeron (PC2) | D-Celeron (PC1) | Cluster
10 | 5960 s (16.284) | 1051 s (2.871) | 723 s (1.975) | 366 s (1)
15 | 8410 s (16.173) | 1483 s (2.852) | 994 s (1.911) | 520 s (1)
20 | 18041 s (16.079) | 3122 s (2.782) | 2218 s (1.976) | 1122 s (1)

FIGURE 3 Computation time for n = 10.

At the same time, this points to a new direction for the evolution of computer technology. In the last part of this paper, we apply this experimental PC cluster to generating strong prime parameters for the RSA public-key cryptosystem. Because primes are sparse, it is difficult to control the exact size of the generated prime, especially for higher-level strong primes, which are built from several levels of large primes. A 480-bit strong prime is given below.

P = 174705939271087397460467849078552025263362375115078923266044608617888639142193788032795025286742058701266536973898865521395996115104727364053579;

P1 = 963009432236162034983305340780294908085574150319321844225162363461776567;

P2 = 149080216969893919667592965434580689588408959173673114873263902376468411; where $P_1 \mid P - 1$ and $P_2 \mid P + 1$.

R1 = 126818301348032525784533182072225793;

S1 = 129955736583597393553237522505531393; where $R_1 \mid P_1 - 1$ and $S_1 \mid P_1 + 1$.

R2 = 129955736583597393553237522505531393;

S2 = 131667064893905503245258071832788993; where $R_2 \mid P_2 - 1$ and $S_2 \mid P_2 + 1$.
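Values like those above can be checked mechanically against the definition in Section 2.2. The sketch below verifies all six divisibility relations and the (probable) primality of all seven numbers; it relies on Miller-Rabin, so "prime" here means probable prime. The function name and the small example chain are hypothetical, chosen so that every relation can be checked by hand.

```python
import random


def is_probable_prime(n, rounds=20, rng=random.Random(0)):
    """Miller-Rabin with a small-prime screen; error <= 4**-rounds per composite."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):
        if n % q == 0:
            return n == q
    s, t = 0, n - 1
    while t % 2 == 0:
        s, t = s + 1, t // 2
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), t, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True


def check_strong_chain(p, p1, p2, r1, s1, r2, s2, is_prime=is_probable_prime):
    """True if all seven numbers are (probable) primes and the Section 2.2 chain holds."""
    chain = ((p - 1) % p1 == 0 and (p + 1) % p2 == 0 and
             (p1 - 1) % r1 == 0 and (p1 + 1) % s1 == 0 and
             (p2 - 1) % r2 == 0 and (p2 + 1) % s2 == 0)
    return chain and all(is_prime(x) for x in (p, p1, p2, r1, s1, r2, s2))


# Hypothetical toy chain: 3 | 18, 5 | 20, 3 | 12, 7 | 14, 19 | 570, 13 | 572.
print(check_strong_chain(571, 19, 13, 3, 5, 3, 7))   # True
```

Passing the listed P, P1, P2, R1, S1, R2 and S2 to `check_strong_chain` is a direct way to confirm them against the definition.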

6. CONCLUSIONS

Our experiment shows that a computer cluster can indeed be built from a few small PCs and can provide strong computing power, which grows with the number of PCs integrated. We want to emphasize that PC cluster computing shows great potential: supercomputers are no longer the only solution for complex problems, whether in public science or in defense and security cryptosystems, and this idea opens a new research direction. In our system, when working together with the dual-processor PC1, the two old PCs (PC2 and PC3) contributed computing power nearly equal to that of PC1 itself. Moreover, although the three machines (one new and two old) differ greatly in performance, they could still be integrated and run smoothly. This is an encouraging result and suggests a promising future for this technique. To sum up, this paper identifies two simple results. First, several PCs can be integrated to finish complex computing jobs that originally only a supercomputer could accomplish; the World Wide Web connects an enormous number of computers that could be integrated into a huge computing resource. Second, if this parallel processing technique continues to be developed properly, it is expected to contribute substantially in domains such as weather prediction, drug development, gene exploration and many other open questions.

References

[1] Sunderam, V. S., Geist, G. A., Dongarra, J. and Manchek, R. (1994). "The PVM concurrent computing system: evolution, experiences, and trends", Parallel Computing, 20, 531–545.
[2] Sunderam, V. S. (1997). "Heterogeneous network computing: the next generation", Parallel Computing, 23, 121–135.
[3] Geist, G. A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R. and Sunderam, V. S. (1993). "PVM 3 user's guide and reference manual", Technical Report ORNL/TM-12187, Oak Ridge National Laboratory.
[4] Geist, G. A. and Sunderam, V. S. (1992). "Network-based concurrent computing on the PVM system", Concurrency: Practice and Experience, 4(4), 293–311.
[5] Rivest, R., Shamir, A. and Adleman, L. (1978). "A method for obtaining digital signatures and public-key cryptosystems", Communications of the ACM, 21(2), 120–126.
[6] Denning (1999). Cryptography and Data Security, Second Edition (Addison-Wesley), 104–110.
[7] Stallings, W. (1999). Cryptography and Network Security: Principles and Practice, Second Edition (Prentice Hall), 173–175.
[8] Wiener, M. (1993). "Efficient DES key search", Lecture Notes in Computer Science (Springer-Verlag).
[9] Nechvatal, J. (2000). Report on the Development of the Advanced Encryption Standard, Technology Administration, U.S. Department of Commerce.
[10] Kranakis, E. (1985). Primality and Cryptography (John Wiley & Sons), 39–79.
[11] http://www.lanl.gov.
[12] Beauchemin, P. et al. (1986). "Two observations on probabilistic primality testing", Advances in Cryptology – CRYPTO '86, Lecture Notes in Computer Science (Springer-Verlag), 443–450.
[13] Miller, G. (1976). "Riemann's hypothesis and tests for primality", Journal of Computer and System Sciences, 13, 300–317.
[14] Pollard, J. (1974). "Theorems on factorization and primality testing", Proc. Cambridge Philos. Society, 76, 521–528.
[15] Adleman, L. and Huang, M. (1987). "Recognizing primes in random polynomial time", Proceedings of the Nineteenth ACM STOC, 462–469.
[16] Gordon, J. (1984). "Strong RSA keys", Electronics Letters, 20, 514–516.
[17] Laih, C.-S., Harn, L. and Chang, C.-C. (1995). Contemporary Cryptography and its Applications, Unalis Corp., 155–164.