
Some Computation Remarks on Prime Number Fractals

SABIN TABIRCA, TATIANA TABIRCA¹, JOHN MORRISON, KIERAN REYNOLDS
Department of Computer Science, BCRI
National University of Ireland, Cork
College Road, Cork
IRELAND

Abstract: - This paper presents some recent developments on Prime Number Fractals (PNF). Firstly, it is proved that the numbers of up, down, left and right moves are each asymptotically equal to $li(n)/4$. This result explains the central area of intense brightness and the less detailed areas around the borders. A new PNF algorithm is then introduced so that more border pixels contain colour details. Finally, we present how parallel computation can be used to generate more prime numbers.

Key-Words: - Prime numbers, fractals, parallel computation.

1 Introduction
Prime numbers have been a fascinating challenge for the last few centuries, and all the important results concerning them have had a huge impact on Mathematics. In particular, Number Theory has come a long way from Euclid's proof of the infinitude of the prime numbers to the proof that primality testing is in P. The quest for curious patterns or representations of prime numbers has always fascinated both mathematicians and computer scientists. Graphical representations have perhaps been the most attractive way to highlight curious facts about prime numbers.

The Ulam spiral is perhaps the first graphical representation of primes [6],[7]. The numbers $1, 2, \ldots, n^2$ are written within a square in spiral order with 1 in the centre; when the primes are highlighted, diagonal patterns appear in the representation. There have not been any proofs of these patterns, but the bigger the square is, the more visible they are. The simplicity of this construction has left little room for new interpretations or generalizations for more than four decades.

Only recently has a new graphical representation been proposed, called "Prime Number Fractals" (PNF). As with any simple fact, there has been much mathematical folklore around it, since there was no formal article to propose the construction. Most mathematicians recognize that Adrian Leatherland from Monash University in Australia constructed the first prime number fractal [3]. We should also acknowledge the work of A. Turpel in The Aesthetics of Prime Sequence [9].

In this article we introduce a generalization of the algorithm to generate Prime Number Fractals. We also present some efficient methods to generate the prime numbers needed in the algorithm.

¹ Supported by Boole Centre for Research in Informatics.

2 PNF Algorithm
The algorithm to generate Prime Number Fractals is one of the simplest methods in Number Theory. Suppose that the fractal sizes are w and h for the width and height, respectively, so that the fractal is given by $w \times h$ pixels. Each time a pixel is visited, the pixel colour is incremented. If $p \neq 5$ is a prime number then $p \bmod 5 \in \{1,2,3,4\}$, so that we can match the result $p \bmod 5$ with the four directions up, down, left and right as follows:
1. $p \bmod 5 = 1 \Rightarrow (x,y)$ goes to $(x,y-1)$
2. $p \bmod 5 = 2 \Rightarrow (x,y)$ goes to $(x,y+1)$
3. $p \bmod 5 = 3 \Rightarrow (x,y)$ goes to $(x-1,y)$
4. $p \bmod 5 = 4 \Rightarrow (x,y)$ goes to $(x+1,y)$

If we start from all the prime numbers $p_0 = 2, p_1 = 3, \ldots, p_{m-1}$ less than a number n, we obtain a sequence of up, down, left and right moves that visit the fractal pixels. In this way the pixel colours change. A description of this method is shown in Figure 1.

    procedure fractal(n, w, h, colour)
    begin
      generate_primes(n, m, p);
      x=w/2; y=h/2;
      for i=0 to m-1 do begin
        dir=p[i] mod 5;
        if dir=1 then y=(y-1) mod h;
        if dir=2 then y=(y+1) mod h;
        if dir=3 then x=(x-1) mod w;
        if dir=4 then x=(x+1) mod w;
        colour[x][y]++;
      end;
    end;

Figure 1. The Algorithm to Generate PNF.

The procedure generate_primes(n, m, p) generates all the prime numbers $p_0 = 2, p_1 = 3, \ldots, p_{m-1}$ less than n. The Prime Number Theorem gives that $m = O\left(\frac{n}{\log n}\right)$. The Eratosthenes sieve provides an $O(n \cdot \log\log n)$ computation and is recognized to be the most efficient algorithm to generate these primes [2]. The method stores all the numbers less than n in an array and successively removes all the proper multiples of 2, 3, 5, 7, etc. The numbers left in the array are all the primes we want to find. The major drawback of this method is that the internal memory might not be enough to store all the numbers we want to work with. Note that some programming languages, e.g. C or C++, can allocate sufficient space to generate up to hundreds of millions of primes.

Figure 2. PNF Fractal Using 500,000,000 primes.

The result of this algorithm is an image of coloured pixels. This image, shown in Figure 2, can resemble a distant astronomical object like a nebula or a galaxy of stars. It can also suggest a fireball from an explosion. Marek Wolf, who has also detected some connections between prime number distribution and fractality, has pointed out that the PNF images are fractal [11], [12]. A simple explanation can be that the generation algorithm gives a kind of random walk through the pixels. It is observable that parts of PNF images are similar to the whole image, so that they satisfy fractal self-similarity.

    x                    1,000,000   5,000,000   10,000,000
    $\pi_{5,1}(x)$ (up)      19,618      87,063      166,105
    $\pi_{5,2}(x)$ (down)    19,622      87,179      166,212
    $\pi_{5,3}(x)$ (left)    19,665      87,216      166,230
    $\pi_{5,4}(x)$ (right)   10,593      87,055      166,032

Table 1. Distribution of up, down, left, right moves.

PNF images as presented in [3] or [9] are all very similar. They have a central area of intense brightness and little detail near the borders. This has a simple explanation. The first few prime numbers do not have a uniform distribution of $p \bmod 5$. After that the distribution becomes uniform but random, so that the walk through the image always hits central pixels (see Table 1 and Figure 3). This has a sound mathematical explanation, given by Theorem 1.

Figure 3. Distribution of up, down, left, right moves.

Theorem 1. The numbers of up, down, left and right moves in the PNF algorithm are asymptotically equal.

Proof. Dirichlet's theorem assures that if $(a,b) = 1$ then there are infinitely many primes in the set $\{a \cdot k + b,\ k \ge 0\}$. This means that the random walk has an infinity of up, down, left and right moves. If $\pi_{a,b}(x)$ denotes the number of primes of the form $a \cdot k + b$ less than $x$, then we know from the prime number theorem for arithmetic progressions, as reported by Weisstein [13], that
$$\lim_{x \to \infty} \frac{\pi_{a,b}(x)}{li(x)} = \frac{1}{\varphi(a)}, \tag{1}$$
where $li(x)$ is the logarithmic integral function and $\varphi(a)$ the Euler totient function [13]. This gives that
$$\lim_{x \to \infty} \frac{\pi_{5,1}(x)}{li(x)} = \lim_{x \to \infty} \frac{\pi_{5,2}(x)}{li(x)} = \lim_{x \to \infty} \frac{\pi_{5,3}(x)}{li(x)} = \lim_{x \to \infty} \frac{\pi_{5,4}(x)}{li(x)} = \frac{1}{\varphi(5)} = \frac{1}{4}, \tag{2}$$
which means that
$$\pi_{5,1}(x) \approx \pi_{5,2}(x) \approx \pi_{5,3}(x) \approx \pi_{5,4}(x) \approx \frac{li(x)}{4}. \tag{3}$$
Equation (3) shows that the algorithm has asymptotically the same number of up, down, left and right moves. ▄

If the starting point of the random walk is the pixel at the image centre, the PNF algorithm executes the same number of up, down, left and right moves but in a random fashion. Therefore, the central pixels are more likely to be reached than the border pixels.

3 Generalised PNF Algorithm
We have seen that PNF images have fewer details around the borders and that the brightness is accumulated in the central area. The idea of this algorithm is to keep the directions up, down, left and right for a move but to jump more than one pixel. In this way more border pixels might be reached.

Let $q = 4 \cdot k + 1$ be a prime number. For another prime number p there are q-1 possible modulo results. Depending on the value $p \bmod q$ we have the following moves:
1. $1 \le p \bmod q \le k \Rightarrow$ jump up over $p \bmod q$ pixels.
2. $k+1 \le p \bmod q \le 2k \Rightarrow$ jump down over $p \bmod q - k$ pixels.
3. $2k+1 \le p \bmod q \le 3k \Rightarrow$ jump left over $p \bmod q - 2k$ pixels.
4. $3k+1 \le p \bmod q \le 4k \Rightarrow$ jump right over $p \bmod q - 3k$ pixels.

Figure 5 presents the detailed description of this new algorithm. Equation (1) assures us that all these $q - 1 = 4 \cdot k$ moves occur in a similar number. Moreover, the border pixels are reached more often since the moves jump more than one pixel. Figure 4 illustrates the generalized PNF fractal using modulo $q = 13$. In this case we can jump 1, 2 or 3 pixels in each direction. It can be seen that the brightness has been spread towards the borders.

Figure 4. PNF Fractal Using 500,000,000 primes.

    procedure fractal(q, n, w, h, colour)
    begin
      generate_primes(n, m, p);
      x=w/2; y=h/2; k=(q-1)/4;
      for i=0 to m-1 do begin
        dir=p[i] mod q;
        if 1≤dir≤k then y=y-dir;
        if k+1≤dir≤2k then y=y+dir-k;
        if 2k+1≤dir≤3k then x=x-dir+2k;
        if 3k+1≤dir≤4k then x=x+dir-3k;
        colour[x][y]++;
      end;
    end;

Figure 5. Generalised Algorithm for PNF.
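The procedure of Figure 5 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names mirror the pseudocode, but the return values, the `moves` counter and the wrap-around at the image edges (borrowed from Figure 1, since Figure 5 does not state how the borders are handled) are assumptions. With q = 5 the sketch reduces to the basic algorithm of Figure 1.

```python
def generate_primes(n):
    """Sieve of Eratosthenes: return the list of all primes below n."""
    if n < 3:
        return []
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    i = 2
    while i * i < n:
        if is_prime[i]:
            # Cross out the proper multiples of i, starting at i*i.
            is_prime[i * i::i] = bytearray(len(range(i * i, n, i)))
        i += 1
    return [i for i in range(n) if is_prime[i]]


def fractal(q, n, w, h):
    """Generalised PNF walk for a modulus q = 4k + 1 over the primes below n.

    Returns the w x h colour counts and the number of moves per direction.
    Assumption: coordinates wrap around the image edges, as in Figure 1.
    """
    if q % 4 != 1:
        raise ValueError("q must have the form 4k + 1")
    k = (q - 1) // 4
    colour = [[0] * h for _ in range(w)]
    moves = {"up": 0, "down": 0, "left": 0, "right": 0}
    x, y = w // 2, h // 2
    for p in generate_primes(n):
        d = p % q
        if 1 <= d <= k:
            y = (y - d) % h
            moves["up"] += 1
        elif k < d <= 2 * k:
            y = (y + d - k) % h
            moves["down"] += 1
        elif 2 * k < d <= 3 * k:
            x = (x - (d - 2 * k)) % w
            moves["left"] += 1
        elif 3 * k < d <= 4 * k:
            x = (x + d - 3 * k) % w
            moves["right"] += 1
        # d == 0 occurs only for p == q: no move, but the pixel is still counted.
        colour[x][y] += 1
    return colour, moves
```

Calling `fractal(13, n, w, h)` gives jumps of 1, 2 or 3 pixels per direction, the case discussed for Figure 4; the `moves` dictionary can be compared against the counts of Table 1 when q = 5.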

4 A Parallel Approach
It is clear that the more prime numbers are used in this computation, the more details PNF images contain. The PNF algorithms rely on the Eratosthenes sieve to generate all the prime numbers up to a threshold. The question we address in this section is how to extend this computation above the threshold using parallel computation.

Suppose that the prime numbers up to $n_1$ are generated by using the Eratosthenes sieve in generate_primes(n1, m, p). The array or collection p contains initially $n_1$ elements and finally only $m \approx \frac{n_1}{\log n_1}$. Hence we can use the available $n_1 - \frac{n_1}{\log n_1}$ memory locations to store some other primes bigger than $n_1$. The odd numbers x between $n_1$ and $n_2$ are tested to find whether they are prime by using a primality test is_prime(x). Since they are big odd numbers, the computation is_prime(x) is very expensive, so that parallel processing is needed to speed up the process. Figure 6 illustrates the generic parallel algorithm for this computation.

    procedure generate_prime(n1, n2, m, p)
    begin
      generate_primes(n1, m, p);
      forall x=n1 to n2 step 2 do
        if is_prime(x) then p[m++]=x;
    end;

Figure 6. Extending the computation of primes.

Suppose that a parallel machine with $s = size$ processors $P_1, P_2, \ldots, P_s$ is used in the computation. The parallel loop forall is scheduled onto processors in a block fashion so that the processor $P_j$ receives all the iterations $\{l_j, l_j + 2, \ldots, h_j\}$. The scheduling is therefore given by the bounds $l_j, h_j$, $j = 1,2,\ldots,s$, so that
$$l_1 = n_1,\quad h_s = n_2 \quad\text{and}\quad l_j = h_{j-1} + 2,\ j = 2,3,\ldots,s. \tag{4}$$
Each processor $P_j$ tests all the odd numbers $\{l_j, l_j + 2, \ldots, h_j\}$ and stores the prime numbers found in a local array local_p. Finally, all the local arrays are gathered back in the array p. In this way the prime numbers that are found by the processors are appended to the array p in a consecutive fashion.

The efficiency of this parallel computation depends on how well the bounds $l_j, h_j$, $j = 1,2,\ldots,s$, are chosen. The simplest solution is to use a uniform scheduling where each processor receives the same number of iterations $\frac{n_2 - n_1}{2 \cdot s}$. In this case the processor $P_1$ gets the smallest numbers to test while the last processor $P_s$ tests the largest numbers. In this way the computation will have an important load imbalance overhead.

4.1 Primality Tests
Testing primality for large numbers has been a challenging problem for which several deterministic and probabilistic methods have been developed. For centuries the trial division method has been known to be efficient for small numbers. For a number x, all the odd numbers up to $\sqrt{x}$ should be tested to find a divisor. Certainly, the complexity $O(\sqrt{x})$ of this solution is excessive for very large numbers.

Probabilistic methods, like Lucas' or Rabin-Miller's tests, have been around for the last few decades. If a number passes such a test it is likely to be prime and is called a pseudo-prime. For a number x let $x - 1 = 2^y \cdot z$ with z odd. Rabin-Miller's test uses a random number $1 < a < x - 1$ to check if
$$a^z \equiv 1 \pmod{x} \quad\text{or}\quad a^{2^j \cdot z} \equiv -1 \pmod{x} \text{ for some } 0 \le j < y, \tag{5}$$
in which case x passes the test [5]. If the test is repeated for several values $1 < a < x - 1$ then the certainty of the test becomes close to 1. Moreover, when Equation (5) is satisfied for all the values $1 < a < x - 1$ then x is prime. The complexity of this Rabin-Miller test is $O(\log x)$, which makes it one of the most efficient methods. However, if a number x passes the test then we can only say that x is likely to be prime.

The most effective deterministic test over the last decade was based on elliptic curves. By defining an addition of points on an elliptic curve modulo a prime, a group of rational points can be constructed. Goldwasser and Kilian proposed a theorem for a pseudocurve $E(\mathbb{Z}_n)$ that provided the impetus for a primality proof called ECPP [2]. This algorithm yields a certificate of primality that can easily be provided. ECPP has a complexity of $O((\log n)^5)$ and is proven to be polynomial time for almost all choices of inputs.

The quest for an efficient deterministic test has recently got a positive answer, although this was predicted several years ago [10]. In August 2002, the scientific community was shocked by Agrawal et al., who announced that primality testing is polynomial [1]. Their method, which is known as the cyclotomic AKS test, has the complexity $O(\log^{12} x)$. Recently, this complexity has been reduced dramatically to $O(\log^6 x)$ or even to $O(\log^4 x)$ for some classes of integers [4].

All these methods have been successfully implemented in various libraries to work with large numbers. The Java BigInteger class provides the method isProbablePrime() that uses the Rabin-Miller test. In C/C++ there are several packages for large number computation. lip (long integer processing) contains several probabilistic and deterministic methods for both primality and factorization.
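The two simplest tests discussed in this section, trial division and the Rabin-Miller check of Equation (5), can be sketched in Python as below. The function names `is_prime_trial` and `is_probable_prime` and the default of 20 rounds are illustrative assumptions and do not come from the paper or from any of the libraries mentioned above.

```python
import random


def is_prime_trial(x):
    """Deterministic trial division by 2 and by the odd numbers up to sqrt(x)."""
    if x < 2:
        return False
    if x % 2 == 0:
        return x == 2
    d = 3
    while d * d <= x:
        if x % d == 0:
            return False
        d += 2
    return True


def is_probable_prime(x, rounds=20):
    """Rabin-Miller test: False means composite, True means probably prime."""
    if x < 4:
        return x in (2, 3)
    if x % 2 == 0:
        return False
    # Write x - 1 = 2^y * z with z odd.
    y, z = 0, x - 1
    while z % 2 == 0:
        y += 1
        z //= 2
    for _ in range(rounds):
        a = random.randrange(2, x - 1)      # random base with 1 < a < x - 1
        b = pow(a, z, x)                    # a^z mod x
        if b == 1 or b == x - 1:
            continue                        # Equation (5) holds for this base
        for _ in range(y - 1):
            b = pow(b, 2, x)                # a^(2^j * z) mod x for j = 1..y-1
            if b == x - 1:
                break                       # Equation (5) holds for this base
        else:
            return False                    # this base proves x composite
    return True
```

A prime always passes every round, while a composite survives a single round for at most a quarter of the bases, so repeating the test drives the error probability down quickly; this is the sense in which the certainty "becomes close to 1".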
4.2 Balanced Workload Scheduling - BWBS
Balanced Workload Block Scheduling (BWBS) is a static scheduling method that was proposed recently by Tabirca et al. [8]. The method attempts to find the bounds $l_j, h_j$, $j = 1,2,\ldots,s$, of a block scheduling that balances the workload of the processors. The assumption is that the workload of each parallel iteration is known. Suppose that the workload of the method is_prime(x) is given by $w(x)$, $x = n_1, n_1 + 2, n_1 + 4, \ldots, n_2$. In this case the total workload is $\sum_{x = n_1, n_1+2, n_1+4, \ldots, n_2} w(x)$, so that the average workload per processor becomes
$$W = \frac{1}{s} \cdot \sum_{x = n_1, n_1+2, n_1+4, \ldots, n_2} w(x). \tag{6}$$
BWBS finds the bounds $l_j, h_j$, $j = 1,2,\ldots,s$, so that the workload of each processor is equal to W:
$$\sum_{x = l_j, l_j+2, \ldots, h_j} w(x) \approx W = \frac{1}{s} \cdot \sum_{x = n_1, n_1+2, n_1+4, \ldots, n_2} w(x). \tag{7}$$
Tabirca et al. proposed a method to find equations for the upper bounds $h_j$, $j = 1,2,\ldots,s$, when the workload $w(x)$ has a simple formula [8]. This method can be applied for trial division testing, when the workload is $w(x) = \sqrt{x}$, $x \in [n_1, n_2]$. In this case the upper bounds are
$$h_j = \left[ \frac{j}{s} \cdot n_2^{3/2} + \frac{s - j}{s} \cdot n_1^{3/2} \right]^{2/3},\ j = 1,2,\ldots,s. \tag{8}$$
Unfortunately, the method cannot be applied for the other primality methods whose complexity has the form $w(x) = \log^k x$, $x \in [n_1, n_2]$. For them a pre-processing step is required to compute the bounds of Equation (7) using
$$\sum_{x = l_j, l_j+2, \ldots, h_j} w(x) \le W < \sum_{x = l_j, l_j+2, \ldots, h_j+2} w(x). \tag{9}$$
It is clear that the pre-processing step induces a small scheduling overhead of $O\left(\frac{n_2 - n_1}{2}\right)$ operations.

5 Experimental Results
Some experimental tests have been carried out in order to illustrate the parallel method. They have been done using a Dell cluster machine with 100 PIII processors. The algorithm fractal was translated into a C MPI program in which a simple trial division test has been used. Three scheduling methods have been used to schedule the parallel loop forall. The first one is uniform scheduling (US), where each block has the same number of iterations. The second one is balanced workload scheduling (BWBS1), which uses the upper bounds from Equation (8). Finally, balanced workload scheduling (BWBS2) with a pre-processing step is employed.
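The difference between the US and BWBS1 bounds can be sketched in Python as follows. The helper names are hypothetical, the bounds are left as real numbers (a real scheduler would round them to odd integers), and the workload of a block is approximated by the integral of $\sqrt{x}$ instead of the exact sum, so this is an illustration of Equation (8) under those assumptions and not the C MPI code used in the experiments.

```python
def uniform_bounds(n1, n2, s):
    """Upper bounds h_1..h_s that split [n1, n2] into s blocks of equal length (US)."""
    return [n1 + j * (n2 - n1) / s for j in range(1, s + 1)]


def bwbs1_bounds(n1, n2, s):
    """Upper bounds h_1..h_s from Equation (8), for the workload w(x) = sqrt(x)."""
    return [((j * n2 ** 1.5 + (s - j) * n1 ** 1.5) / s) ** (2 / 3)
            for j in range(1, s + 1)]


def block_workloads(n1, upper_bounds):
    """Approximate workload of each block as the integral of sqrt(x) over the block."""
    lowers = [n1] + upper_bounds[:-1]
    return [(2 / 3) * (hi ** 1.5 - lo ** 1.5) for lo, hi in zip(lowers, upper_bounds)]


if __name__ == "__main__":
    n1, n2, s = 10_000_000, 500_000_000, 4
    for name, bounds in (("US", uniform_bounds(n1, n2, s)),
                         ("BWBS1", bwbs1_bounds(n1, n2, s))):
        work = block_workloads(n1, bounds)
        # Print each block's share of the total estimated workload.
        print(name, [round(v / sum(work), 3) for v in work])
```

Under uniform scheduling the estimated share grows from the first block to the last, while the bounds of Equation (8) give every block the same share by construction.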
The MPI program has used $n_1 = 10,000,000$ to generate the primes using the Eratosthenes sieve. After that the computation of primes has been extended up to $n_2 = 500,000,000$. Table 2 illustrates the processor execution times when s=4. Firstly, we can observe that the uniform scheduling gives an important load imbalance while the BWBS methods achieve a good load balance. We can also remark that the pre-processing step of BWBS2 makes the execution time of P1 a bit bigger.

             P1       P2       P3       P4
    US       479.5    540.4    635.5    767.8
    BWBS1    610.5    608.4    609.1    611.3
    BWBS2    620.7    610.9    611.1    609.6

Table 2. Processor Execution Times for s=4 (seconds).

Figure 7. Processor Execution Times for s=4 (seconds, for US, BWBS1 and BWBS2 on P1 to P4).

             s=1        s=4      s=8      s=16     s=32
    US       3101.54    767.8    403.7    225.1    149.2
    BWBS1    2628.59    611.3    293.7    146.5    75.1
    BWBS2    2669.01    620.7    305.2    152.6    83.3

Table 3. Execution Times for s=1, 4, 8, 16, 32 (seconds).

The second test evaluates the overall execution times over s=1, 4, 8, 16 and 32 processors (see Table 3). Figure 8 shows that the difference between the BWBS methods is marginal, with a small advantage for BWBS1. It also illustrates the importance of parallel computing to solve this problem. Using one processor the PNF image has been generated in around 43 minutes, but when 32 processors have been employed the execution time has been reduced to less than 2 minutes.

Figure 8. Execution Times for s=1, 4, 8, 16, 32 (seconds against number of processors, for US, BWBS1 and BWBS2).

6 Conclusion
This article has presented some new developments on Prime Number Fractals. The first contribution of the article is the proof that the numbers of up, down, left and right moves are asymptotically the same, which is a mathematical explanation of the central area of brightness. A generalized PNF algorithm has been proposed to make the areas around the borders brighter. Since the PNF algorithms depend on the number of primes used in the computation, we have presented a parallel method to generate more primes than the Eratosthenes sieve alone provides.

References:
[1] M. Agrawal, N. Kayal and N. Saxena, Primes in P, Indian Institute of Technology, Preprint, Aug. 6, 2002, www.cse.iitk.ac.in/primality.pdf.
[2] E. Bach and J. Shallit, Algorithmic Number Theory, MIT Press, Cambridge, Massachusetts, USA, 1996.
[3] A. Leatherland, Pulchritudinous primes; Visualizing the distribution of prime numbers, http://yoyo.cc.monash.edu.au/~bunyip/primes
[4] H. W. Lenstra Jr. and C. Pomerance, Primality Testing with Gaussian Periods, Manuscript, March 2003.
[5] M. O. Rabin, Probabilistic Algorithm for Testing Primality, J. Number Theory, Vol. 12, 1980, pp. 128-138.
[6] M. Stein and S. Ulam, An Observation on the Distribution of Primes, Amer. Math. Monthly, Vol. 74, 1967, pp. 43-44.
[7] M. Stein, S. Ulam and B. Wells, A Visual Display of Some Properties of the Distribution of Primes, Amer. Math. Monthly, Vol. 71, 1964, pp. 516-520.
[8] T. Tabirca, L. Freeman, S. Tabirca and T. L. Yang, A Static Workload Balance Scheduling Algorithm, Proceedings of the 2nd Workshop on Parallel and Distributed Scientific and Engineering Computing with Applications (PDSECA 2001), April 2001, San Francisco, USA.
[9] A. Turpel, The Aesthetics of Prime Sequence, http://www.2357.a-tu.net/
[10] S. Wagon, Primality Testing, Math. Intell., Vol. 8, No. 3, 1986, pp. 58-61.
[11] M. Wolf, Multifractality of prime numbers, Physica A 160, 1989, pp. 24-42.
[12] M. Wolf, Random walk on the prime numbers, Physica A 250, 1998, pp. 335-344.
[13] E. E. Weisstein, Arbitrarily Long Progressions of Primes, MathWorld headline news, April 12, 2004, http://mathworld.wolfram.com/news/2004-04-12/primeprogressions/.