Parallel Diffusion-Limited Aggregation
PHYSICAL REVIEW E, VOLUME 52, NUMBER 5, NOVEMBER 1995

Henry Kaufman,1 Alessandro Vespignani,1,* Benoit B. Mandelbrot,1,2 and Lionel Woog1
1 Department of Mathematics, Yale University, New Haven, Connecticut 06520-8283
2 Thomas J. Watson Research Center, Yorktown Heights, New York 10598-0218
(Received 21 March 1995)

We present methods for simulating very large diffusion-limited aggregation (DLA) clusters using parallel processing (PDLA). With our techniques, we have been able to simulate clusters of up to 130 million particles. The time required for generating a 100 million particle PDLA is approximately 13 h. The fractal behavior of these "parallel" clusters changes from a multiparticle aggregation dynamics to the usual DLA dynamics. The transition is described by simple scaling assumptions that define a characteristic cluster size separating the two dynamical regimes. We also use DLA clusters as seeds for parallel processing. In this case, the transient regime disappears and the dynamics converges from the early stage to that of DLA.

PACS number(s): 02.70.-c, 68.70.+w, 05.40.+j, 02.50.-r

*Present address: Instituut Lorentz, Leiden University, P.O. Box 9506, 2300 RA Leiden, The Netherlands.

1063-651X/95/52(5)/5602(8)/$06.00 © 1995 The American Physical Society

I. INTRODUCTION

A model of irreversible growth to generate fractal structures was the diffusion-limited aggregation (DLA) model of Witten and Sander [1]. This model accounts for the origin of fractal structures in a great variety of processes: dendritic growth, viscous fingers in fluids, dielectric breakdown, electrochemical deposition, etc. [2,3]. Despite the simplicity of the rules that govern DLA, it shows unexpectedly subtle and complex properties and poses theoretical problems of a new type [4,5]. To this day, the asymptotic properties of radial DLA are still not completely clear because of the discrepancies between the various measures of fractal dimension [2,6-10] and the slow crossover to the asymptotic regime [11]. To investigate this and other issues in DLA, one seeks increasingly large DLA clusters, so that the microstructure becomes irrelevant and the asymptotic regime can be observed and analyzed. To that end, we propose the parallel DLA (PDLA), a variant of DLA that uses a multiprocessor parallel computer. By parallelizing the cluster aggregation, we generated PDLA clusters of 100 million particles in 13 h on a 32-processor IBM Power Visualization System (PVS) computer. Therefore, the simulation time is no longer the limiting factor in generating very large clusters.

However, a subtle difficulty arises. The fact that more than one particle is diffusing at the same time introduces some interference effects in the dynamical process. The number of processors used in the simulation is the computational counterpart of the density of diffusing particles. This means that the early growth stages of a PDLA are in a multiparticle diffusion aggregation (MPDA) [12] regime that drifts to the DLA regime for larger sizes. A transient region is therefore present in the behavior of several quantities of the PDLA. From simple scaling assumptions we can characterize the transient region and find the behavior of the characteristic cluster size for which the dynamics become identical to DLA's dynamics.

To avoid the effects of the dynamical drift and to speed up the convergence towards the DLA regime, we propose a combination of serial and parallel computing. We present large clusters simulated by a PDLA process growing on DLA cores. The cores have been generated on a serial machine and are used as the seeds of our larger clusters. In this way the PDLA process already starts in a low particle density regime corresponding to the usual (serial) DLA.

The study of PDLA clusters deserves further analysis, particularly to examine possible differences between the asymptotic structure of PDLA and DLA. Nevertheless, this method seems very promising in that it allows simulations to go beyond the present limitations of serial computers.

Section II introduces the algorithm to generate PDLA and discusses the details involved in creating and analyzing such large clusters. Section III presents the results from analyzing the clusters and discusses the differences between DLA and PDLA. Section IV describes the results obtained for clusters generated by PDLA growing on a DLA core. Finally, Sec. V discusses results and perspectives of the method.

II. PARALLEL DLA

DLA is simulated by placing a particle at a random location on a "birth" circle that is at some fixed distance from the maximum radius of the existing cluster. The new particle undergoes Brownian motion until it comes within a fixed "sticking distance" of the cluster (we use a fraction of a particle diameter for the sticking distance). At that point, the particle sticks to the cluster, a new particle is added on the birth circle, and the process continues. The original random walk DLA model is very simple, but biases due to approximation in the algorithm can introduce instabilities that dominate large scale properties of the structure [13]. The relevant parameters are the random walk step size, the distance at which the random walk starts, etc. Our algorithm lets the particle take the largest step possible without landing on the DLA. If the particle exits the birth circle, it is projected back with the "first-hit" probability distribution (see Appendix A) that models correctly the Laplacian boundary conditions of DLA. The particles thus wander "infinitely" until they first hit the DLA.

The above process is extremely time and memory consuming, even with the various efficiency schemes in practice, hence few people could generate clusters greater than 10 million particles. A single cluster of 100 million particles with our current algorithm would require a single computer for two to three weeks. This is possible, but very impractical since several clusters are required for reliable analysis.

A parallel computer, however, accelerates the process linearly in the number of processors. We accessed the IBM Power Visualization System (PVS), whose architecture is well suited for simulating DLA in parallel. The PVS contains 32 CPUs. Each processor has 16 megabytes of "private" random access memory (RAM), as well as access to 512 megabytes of shared memory. This architecture is ideal for PDLA because the cluster can be stored in the shared memory and can be accessed and modified quickly by any processor. Furthermore, the simulation of PDLA is a "read mostly" process, because most of the time the particles wander around the PDLA and access the PDLA structure without changing it. Only when a particle comes close enough to stick to the cluster is the PDLA modified by adding the new particle. This is an important characteristic because, to ensure the correctness of the PDLA data structure, only one processor may modify the shared PDLA data structure at any given time, whereas many may read it.

This mutually exclusive behavior is implemented as a semaphore. When a processor is ready to stick its particle to the PDLA, it sets a semaphore that ensures exclusive write access. If another processor is currently ...

... allocated only for the regions of space occupied by the PDLA. We have found PDLAs to maintain fairly constant average density, so we preallocate a fixed amount of quad-tree memory for the entire PDLA, which is proportional to the number of particles to be generated.

A given region of the quad-tree is represented by a block of four pointers. The subregions of that region that are empty have null pointers, while the occupied subregions point to other blocks of pointers. At a fixed maximum subdivision level, the leaf pointers point to lists of particles. We have found, in practice, that a minimum region size of eight particle diameters works well in the tradeoff between particle list searching efficiency and memory usage. This requires us to store 14 levels of subdivisions to represent a region of width 128 K (1 K = 10^3) particle diameters in order to adequately store a 10^8 particle PDLA whose diameter is around 90 K diameters.

To further optimize memory usage, we strove to minimize the memory requirements for the particle list in the leaves of the quad-tree. In our previous implementation, particles were represented by four-byte floating-point numbers for each coordinate, and a four-byte "next" pointer, which pointed to the subsequent particle in its region. However, 12 bytes per particle prohibited reaching clusters of 10^8 particles on any of the computers we had available. Furthermore, the subparticle accuracy of the coordinates would begin to drop as the PDLA radius increased to beyond 10^4 particle units. The average maximum radius for our 10^8 PDLAs is 4.5 x 10^4, which yields a subdiameter accuracy of 2^-8 in the position of the outer particles because 15 of the 23 bits in the float's fraction must be used to store the particle's integer position.

We observed that the traversal path of the quad-tree from the root node to the leaf regions encodes global position information. Thus, we store only local particle coordinates within the 8 x 8 leaf region with one-byte fixed-point values for each coordinate. This gives us acceptable position accuracy to within 1/32 of a particle diameter. Using local coordinates enabled us to store particle positions with two bytes instead of eight bytes.

We also wanted to optimize the particle list data structure.