<<

M ONTE C ARLO M ETHODS

The Monte Carlo Method in Science and Engineering

Since 1953, researchers have applied the Monte Carlo method to a wide of areas. Specialized have also been developed to extend the method’s applicability and efficiency. The author describes some of the algorithms that have been developed to perform Monte Carlo in science and engineering.

onte Carlo methods are algorithms might work well for functions of one variable, for solving various kinds of compu- such methods can be very inefficient for functions tational problems by using random of several variables. For example, to numerically numbers (or rather, pseudorandom integrate a function of an N-dimensional vector numbers).M Monte Carlo simulations play an impor- (where N = 100) with a grid of 10 points in each tant role in and engineering, would require the evaluation of 10100 with applications ranging from materials science to points, which is far too many to be computed. biology to quantum . They also play an im- Monte Carlo methods provide a way out of this portant role in a variety of other fields, including exponential time increase: as long as the function computer imaging, , and economics. is reasonably well behaved, it can be estimated by suggested the name “Monte randomly selecting points in N-dimensional space Carlo”—in reference to the famous casino in and then taking an appropriate average of the Monaco—in one of the first applications of the function values at these points. By the central Monte Carlo method in physics.1 Because of the limit theorem, this method will display 1/ N repetitive nature of a typical Monte Carlo algo- convergence—that is, quadrupling the number of rithm, as well as the large number of calculations in- sampled points will halve the error, regardless of volved, the Monte Carlo method is particularly the number of . Another very impor- suited to calculation using a computer. tant application for Monte Carlo simulations is Monte Carlo methods are particularly useful for optimization. The traveling salesman problem is problems that involve a large number of degrees an example of an optimization problem that is of freedom. For example, deterministic methods very difficult to solve using conventional methods, of operate by taking several but that might be approximately solved via Monte evenly spaced samples from a function. While this Carlo methods. A variety of Monte Carlo meth- ods such as tunneling,2 ,3 genetic algorithms,4 and parallel tem- 1521-9615/06/$20.00 © 2006 IEEE pering5 have been developed to handle such opti- Copublished by the IEEE CS and the AIP mization problems. JACQUES G. AMAR One of the first uses of Monte Carlo simulations University of Toledo is described in the classic article by Nicholas C. Metropolis, Arianna W. Rosenbluth, Marshall N.

MARCH/APRIL 2006 9 FURTHER READING References 1. M.H. Kalos and P.A. Whitlock, Monte Carlo Methods, John Wiley & Sons, iven the vast literature on Monte Carlo simulations, it is 1986. G virtually impossible to discuss all the methods that 2. D.P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statisti- have been developed in an article of this length. For fairly re- cal Physics, Cambridge Univ. Press, 2000. cent surveys of the literature, see the books on Monte Carlo 3. B.A Berg, Monte Carlo Simulations and Their Statistical methods by M.H. Kalos and P.A. Whitlock,1 D.P. Landau and Analysis, World Scientific, 2004. K. Binder,2 and B.A. Berg3 as well as a recent Los Alamos Na- 4. J.E. Gubernatis, ed., “The Monte Carlo Method in the Physical Sci- tional Laboratory conference proceedings on the Monte ences,” Am. Inst. of Physics Conf. Proc., no. 690, Am. Inst. of Physics, Carlo method in the physical sciences.4 A fairly recent discus- 2003. sion of extended ensemble methods is provided in a review 5. Y. Iba, “Extended Ensemble Monte Carlo,” Int’l J. Modern Physics C, vol. article by Yukito Iba.5 For a good description of classical 12, no. 5, 2001, pp. 623–656. Monte Carlo simulations of fluids, the book by Allen and 6. M.P. Allen and D.J. Tildesley, Computer of Liquids, Oxford Univ. Tildesley6 is also recommended. Press, 2001.

Rosenbluth, Augusta H. Teller, and Edward tem makes a transition from state i to state j, or it 1 acc Teller. In this work, the general Metropolis algo- is rejected with 1 – Pij . The overall rithm is first described along with its application transition rate from state i to state j is thus given acc to the equation of state of fluids. Using the Los by the transition matrix wij = TijPij . Alamos National Laboratory’s MANIAC com- The sequence of configurations generated in a puter (11,000 operations per second), Metropolis Monte Carlo simulation is generally referred to as and his colleagues obtained results for the equa- a Markov chain because the transition rate or prob- tion of state of the hard disk fluid by performing ability depends on the current state but not on pre- Monte Carlo simulations of 2D systems with 56 vious states. To generate a Markov chain of states and 224 particles. Since then, the computational with the desired , the power of a single processor has increased by ap- overall transition wij should satisfy the proximately a factor of 105, and the Monte Carlo detailed balance condition has become increasingly sophisticated. This article describes some of the algorithms that wijPi = wjiPj,(1) have been developed to perform both equilibrium and nonequilibrium Monte Carlo simulations of a which implies that the desired distribution Pi is a variety of systems of interest in biology, physics, stationary state. Assuming ergodicity—that is, a chemistry, materials science, and engineering. nonzero multitransition probability of reaching any allowed state of the system from any other allowed Metropolis-Hastings Monte state—this condition further implies that, in such Carlo and Detailed Balance a Monte Carlo simulation, the system will ap- As in the original article by Metropolis and his proach the equilibrium ensemble distribution. For- colleagues, in many scientific applications, re- mally, the ergodicity requirement also implies that searchers use Monte Carlo simulations to sample the transition matrix w must satisfy as accurately as possible the properties of a many- n body system within a given statistical distribution [w ]ij > 0 (2) or ensemble. An example is the Gibbs canonical ensemble with probability distribution Pi = for n > nmax for all i,j. Although the ergodicity exp(–Ei/kBT)/Z, where Ei is the energy of the sys- properties of a particular many-body system are tem in configuration i, kB is Boltzmann’s constant, difficult to study, it is believed, in general, that al- T is the temperature, and Z is the partition func- most any reasonable choice of allowed trial moves tion. Starting with a given initial state i, two steps will satisfy ergodicity. are usually involved in generating the next Monte Several possible forms for the acceptance acc Carlo state. First, a possible new state or trial con- probabilities Pij satisfy the detailed balance con- figuration j is selected with a trial selection prob- dition in Equation 1. The simplest and most ability or rate Tij. Then the new configuration is commonly used corresponds to the Metropolis- acc 6 either accepted with probability P ij , and the sys- Hastings rule,

10 COMPUTING IN SCIENCE & ENGINEERING ⎛ ⎞ TPji j surv P acc = min⎜1 , ⎟ tr =−dPi () t =τ −−1 t/τ ij .(3)Pti () e ,(6) ⎝ TPij i ⎠ dt

Another is the “symmetric” Barker which implies that the average transition time is expression,7 given by átñ = ␶. Such a distribution of transition times t can be generated using the expression PT acc = jji Pij + .(4)t = –␶ ln(r), (7) PTiij P j T ji where r is a uniform random number between 0 Many Monte Carlo simulations, including the and 1. ones carried out in the original article by Me- tropolis and his colleagues,1 use symmetric trial Constant-NPT Ensemble configuration selection rates Tij = Tji. Thus, in a While Equation 5 may be used to perform Metrop- typical Metropolis Monte Carlo simulation of the olis Monte Carlo simulations in the Gibbs canoni- canonical ensemble with equilibrium distribution cal ensemble, researchers have extended the Monte acc Pi ~ exp(–Ei/kBT), the acceptance probability Pij Carlo method to a variety of other ensembles. For for a transition from state i to state j can be writ- example, in classical simulations of molecular liquids ten as and gases in the constant NPT ensemble (where N is the number of particles, P is the pressure, and T is acc ␤ Pij = min(1, exp(– (Ej – Ei)), (5) the temperature), the configurational average of a quantity A can be rewritten as8 ␤ where = 1/kBT. An example is the Ising spin- model with “spin-flip” or Glauber dynamics: at A = each step, a spin is randomly selected from the lat- NPT − ∞ tice and then flipped with an appropriate accep- ZdVPVV1 ∫ exp(−β ) N tance probability. In Monte Carlo simulations of NPT 0 fluids with velocity-independent interactions,1 the 1 3N ∫ d sUA()exp(ss−β ()),(8) velocity (momentum) degrees of freedom can be 0 integrated out; only the atomic positions are im- portant. Thus, the energy Ei in Equation 5 can be where ZNPT is the corresponding partition func- taken to include only the configuration-dependent tion, V = L3 is the volume of the system, and U(s) portion of the energy. For the hard-disk system corresponds to the system’s configuration-depen- studied in the original Metropolis article,1 the dent total potential energy. Here, we use a set of –1 Monte Carlo method is particularly simple. All scaled coordinates s = s1, s2, ... sN, where si = L ri trial moves involving an overlap are immediately and ri is the coordinate of particle i. Accordingly, rejected, and all trial moves not involving an over- the corresponding equilibrium distribution is lap are accepted. In this case, a simple way of gen- given by erating trial configurations is to randomly select a ␤ hard disk and then displace it randomly within Pi ~ exp(– H), (9) some radius ␦. Although Monte Carlo simulations can be used where H = PV + U(s) – ␤ –1Nln(V). A trial configu- to efficiently sample an equilibrium distribution, ration is generated by randomly displacing a ran- nonequilibrium or “dynamic” Monte Carlo simu- domly selected molecule (molecule k) or making a lations are also of interest. In these simulations, the volume change from Vi to Vj : trial configuration selection rates Tij are usually as- ␶ –1 j i ␦ sumed to correspond to a fixed attempt rate , s k = sk + smax(2η – 1) which is the same for all possible allowed transi- ␦ ␹ tions. We can obtain the time t for a given transi- Vj = Vi + Vmax(2 – 1), (10) tion by noting that the survival probability that the system remains in state i at time t after arrival is where ␹ is a uniform random number between 0 surv –t/␶ given by Pi (t) = e . The probability distribution and 1, η is a 3D vector whose components are also tr P i (t) that the system undergoes a transition from uniform random numbers between 0 and 1, and 1 ␦ state i at time t is then is the vector (1, 1, 1). The quantities smax and

MARCH/APRIL 2006 11 ␦ Vmax govern the maximum changes in the scaled or biasing the trial configurations and selection rates coordinates of the particles and in the volume of Tij to make them more efficient, whereas others in- the simulation box, respectively, and are typically volve performing simulations in different ensem- adjusted8 to produce a Metropolis move acceptance bles. We now review some of these methods. ratio of 35 to 50 percent.9 Once the new state j is selected, the quantity ␦H is calculated as n-Fold Way Algorithm In some cases, such as at low temperatures when the ␦ ␦ ␤–1 Hij = P(Vj – Vi) + Uij – N ln(Vj /Vi), (11) acceptance probability is low and most transitions are rejected, the standard Metropolis algorithm can and the transition from state i to state j is accepted become inefficient. To eliminate rejection in discrete with transition probability Monte Carlo simulations, A.B. Bortz, Mal H. Kalos, and Joel L. Lebowitz developed the rejection-free acc ␤␦ 14 Pij = min(1, exp(– Hij)). (12) n-fold way algorithm. The basic idea is to calcu- acc late all the possible transition rates wij = Tij Pij for all possible trial configurations j (with j ٌ i) for a given Grand Canonical Monte Carlo initial configuration i, and then directly select the In Monte Carlo simulations of phase transitions new configuration j with a probability proportional and phase equilibria, the grand canonical ensemble to wij. Once the new configuration is selected, it is corresponding to constant chemical potential ␮ is always accepted. The penalty for eliminating rejec- particularly useful.10–13 As an example, in Monte tion is the additional overhead for determining all Carlo simulations in the constant-(␮, V, T) ensem- possible transition states and rates, as well as the ble, the energy can fluctuate due to particle dis- memory required to keep track of all the transitions. placements. However, fluctuations in the particle If N possible new states exist, then the new number and energy can also occur via particle in- configuration j can be selected by first calculat- i i ⌺n sertions and deletions, which are selected with ing the partial sums S0 = 0 and Sn = k=1wik for n equal probability. If deletion is chosen, a trial con- = 1 to N and then generating a uniform random i figuration j in which one of the particles is ran- number r between 0 and SN. A search can then be i domly removed is generated. In this case, the performed to find the value of j such that S j–1 < r i acceptance probability for the new configuration < S j, after which a transition to state j is carried can be written as11 out. Such a search can either be performed di- rectly by going through the list of partial sums or ⎛ Λ3 ⎞ NN→−1 N more efficiently by using a binary search. How- P =−−min⎜1 , exp(β (EE )⎟ , (13) ij ⎝ zV ji⎠ ever, in many cases, a relatively small number Nc of possible values of the transition probabilities or rates w␣ exist in which each value of ␣ corre- ␤␮ ⌳ π where z = exp( ), = hmkT/2 B is the ther- sponds to a different transition class. The main i mal wavelength, and N is the number of particles work then involves determining the number n ␣ in the system before deletion. If insertion is cho- of possible transitions from state i for each class, sen, then a trial configuration j in which an addi- and then selecting the type or “class” ␤ of transi- tional particle is inserted at a randomly chosen tion with probability location is generated. In this case, the acceptance

probability for the new configuration can be writ- N c ii i ten as ρβββ= nw/ ∑ nw αα. (15) α =1 →+ P NN1 = ij Once a particular class ␤ is selected, then one of the i ⎛ ⎞ n␤ possible transitions in that class is randomly se- zV −−β min⎜1 , exp( (EEji )⎟ . (14) lected from a list of all transitions in that class. ⎝ ()N +1 Λ3 ⎠ Since in a typical dynamical Metropolis Monte Carlo simulation, the trial configuration selection ␶ Acceleration Methods rate Tij is a constant 1/ for all possible transitions, and Extended Ensembles for the corresponding n-fold way simulation, we –1 acc acc Since the development of the Metropolis algorithm, can write w␣ = ␶ P␣ , where P␣ is the acceptance a variety of Monte Carlo acceleration methods have probability for class ␣. The average time for a emerged. Some of these methods involve altering transition then depends on the initial configura-

12 COMPUTING IN SCIENCE & ENGINEERING á⌬ ñ ␶ ⌺ i acc tion i and is given by ti = / ␣n␣ P␣ , while the critical temperature in this algorithm, the new ⌬ time for a particular transition is given by ti = configuration can differ substantially from the á⌬ ñ –ln(r) ti , where r is a uniform random number original one. In Monte Carlo simulations of the between 0 and 1. Thus, if the acceptance proba- 2D performed with this algo- acc 17,18 bilities P␣ are low, then the time interval per step rithm, researchers found that critical slowing for an n-fold way simulation will be much larger down was significantly reduced. than for a standard Metropolis simulation. Al- Ulli Wolff 20 has developed a somewhat differ- though originally developed for Ising spin systems, ent cluster algorithm to study continuous spin the n-fold way algorithm has recently been ex- models. In the Wolff algorithm, a single cluster is tended to the simulation of continuum systems.15 built at each step and then flipped using a gener- However, in this case, the overhead associated with alized spin-flip operation. In simulations of the determining the next move is significantly larger 2D Ising model and continuous spin O(n) models than in the discrete case. with n = 2 (xy model) and n = 3 (Heisenberg model) using the Wolff algorithm, the critical Cluster Acceleration slowing down was found to be further reduced. A In many cases, the use of Monte Carlo trial moves variety of extensions and modifications of the SW that correspond to relatively small local changes can be relatively efficient. However, when large- In many scientific applications, researchers use scale fluctuations become important, such as at a critical point, such localized moves become rela- Monte Carlo simulations to sample as accurately tively inefficient. On the other hand, the use of ar- bitrary large moves (such as increasing the as possible the properties of a many-body system maximum displacement in a fluid or moving a large number of particles randomly) can lead to a signif- within a given statistical distribution or ensemble. icant decrease in the acceptance probability. One approach to overcoming these problems has been the development of cluster acceleration methods such as the Sweeny16 and Swendsen-Wang and Wolff algorithms have since been developed, (SW)17,18 algorithms. The SW algorithm is based including cluster algorithms for vertex models21 on a mapping of the Ising model to a percolation as well as acceleration algorithms for quantum model by C.M. Fortuin and P.W. Kasteleyn19 and spin systems.22 was originally developed to accelerate the Monte More recently, researchers have extended clus- Carlo simulation of Ising and Potts spin models ter acceleration methods to the simulation of near the critical point. Consider a q-state Potts continuous systems such as complex fluids.23–25 model Hamiltonian, In particular, Jiuwen Liu and Erik Luijten24,25 have developed an elegant generalization of the HJ=− ∑δ SW and Wolff algorithms for the simulation of ssij, , (16) ij, fluids with a pair-potential based on the general- ized geometric cluster algorithm developed by C. 23 where the Potts spins si are on a lattice and take on Dress and W. Krauth. In Liu and Luijten’s al- the integer values 1, 2, ... q, and the sum is over all gorithm, the cluster selection and flipping nearest-neighbor spins. The Kronecker delta in processes are combined. At the beginning of each Equation 16 corresponds to ferromagnetic cou- Monte Carlo step, a randomly chosen particle i at pling, that is, the energy is minimized when near- initial position ri is first inverted (“pivoted”) est-neighbor spins have the same value. In the SW about a randomly selected pivot point to a new ¢ algorithm, bonds are created between all neigh- position r i. Using the same pivot point, subse- boring spins with the same value with probability quent particles j are then added to the cluster p = 1 – exp(–J/kBT), thus leading to a set of bond with probability clusters. (An isolated spin with no bonds is also ␤ ¢ considered to be a single cluster.) In a single pij = max(1 – exp(– (V(|ri – rj|) – V(|ri – rj|)), 0), (17) Monte Carlo move, the bond clusters are then all “flipped”—that is, for each bond cluster, a new where V(r) is the pair-potential describing the randomly chosen Potts value is selected and as- molecular . If accepted as part of the signed to all the spins in that bond cluster. Be- cluster, the particle j is also inverted about the cause the clusters can be arbitrarily large at the pivot point, so that all the particles belonging to a

MARCH/APRIL 2006 13 cluster maintain their original separation. The ric. The acceptance probability for a move from process is iterative—that is, once a particle has state i to state j is then been added to the cluster, it then also plays the role of particle i in Equation 17 and can recruit new ⎛ ⎞ acc gE()i particles to the cluster. The process ends when no P = min⎜1 , ⎟ ij ⎝ gE()⎠ , (18) more particles can be added to the cluster. As in j the discrete cluster algorithms, there is no rejec- tion because at least one particle is always moved. which implies that the probability of configura- For Monte Carlo simulations of binary mixtures tion i is given by 1/g(Ei). Because the density of and complex fluids, the simulation can states is g(Ei), this eventually leads to a flat his- be orders of magnitude higher using a geometric togram in energy space—that is, a “random walk” cluster Monte Carlo than is found in the usual Me- in energy space. Initially, because the density of tropolis Monte Carlo. states isn’t known, we have g(E) = 1. However, each time an energy level E is visited, the corre- Multicanonical Methods sponding density of states g(E) is updated by mul- Several interesting methods have been developed tiplying the existing value by a modification factor in which the ensemble being simulated differs f > 1, which can be reduced slowly during the from the actual ensemble for which results are course of the simulation to a value just slightly desired. The advantage of these methods, which higher than 1. The simulation process stops when include umbrella sampling26 as well as the more the modification factor is smaller than some pre- ⑀ defined final value, for example, ffinal = 1 + , where ⑀ The Wang-Landau multicanonical algorithm is << 1. Once this occurs, the density of states g(E) should be converged to the correct value and can particularly useful near phase transitions or for be used to calculate thermodynamic quantities such as the free energy, disordered systems, and has been applied to a =− FT() kTB ln() Z ⎛ ⎞ variety of classical and quantum systems. −βE =−kTln⎜∑ gEe ( ) ⎟ . (19) B ⎝ ⎠ E

recently developed multicanonical method,27 is To accelerate the calculation of the density of that the use of a different ensemble can lead to a states, one can perform multiple simulations with better of phase space and provide more each for a different range of energy and then piece information, such as the entropy and partition together the results. In each simulation, the ran- function. A somewhat related method is the his- dom walk in energy space is kept in the selected en- togram method of Alan Ferrenberg and Robert ergy range by rejecting any move out of that range. Swendsen.28 In this method, information about The Wang-Landau multicanonical algorithm is the density of states g(E) obtained during canon- particularly useful near phase transitions or for dis- ical ensemble Monte Carlo simulations at tem- ordered systems, and has been applied to a variety perature T1 is used via “re-weighting” to of classical and quantum systems. calculate properties of the system at a nearby temperature T2. Recently, Fugao Wang and David Landau have One method that has proven to be extremely use- developed a very interesting multicanonical Monte ful in Monte Carlo simulations of complex systems Carlo algorithm.29,30 The idea of the Wang-Lan- with a rugged energy landscape is parallel temper- dau algorithm is to sample the energy space uni- ing. In this method, several independent replicas of formly—“generate a flat in energy a system are simulated simultaneously, but under space” to determine the density of states g(E). Once different conditions (typically different tempera- the density of states is determined, all thermo- tures). Periodically, systems with neighboring dynamic quantities can be calculated for arbitrary temperatures are allowed to interchange configu- temperatures using the Boltzmann distribution. rations. For example, in the canonical ensemble, Typically, trial configurations are randomly chosen, the probability for a replica at temperature T1 to as in the standard Metropolis algorithm, so that the accept a trial move from configuration i to config- trial configuration selection matrix Tij is symmet- uration j is given by

14 COMPUTING IN SCIENCE & ENGINEERING acc =−−β PTij ()min,exp((11()1 EEji ). (20) first developed for boson systems by W.L. McMil- lan,36 and then generalized to fermions by David Ceperley and colleagues.37 The diffusion or Similarly, the probability for a replica at tempera- Green’s function Monte Carlo method33,38 takes ture T2 to accept a trial move from configuration j advantage of the mapping between the imaginary- to configuration i is given by time Schrödinger equation and a diffusion process that includes drift due to a trial wavefunction quan- acc =−−β PTij ()min,exp((22()1 EEji ). (21) tum force, and branching (a random walker in con- figuration space is either duplicated or eliminated) The probability of accepting an interchange of due to the difference between the local trial energy configurations between the two replicas corre- and a reference energy. To take into account the sponding to a “swap” move is then given by the antisymmetry of the wavefunction for fermions, the acc acc product of Pij (T1) and Pji (T2), that is, fixed-node approximation, in which particles are absorbed at the nodes can be used. Path- swap =−−−ββ 39 PEEij min()1 ,exp( (12 )(ji ) . (22) is based on the Feynman path-integral interpretation of quantum mechan- Typically, a swap move is attempted between two ics.40 Each quantum particle is represented by a replicas after both have completed the same num- “ring polymer” whose elements (beads) are con- ber of Monte Carlo steps. nected by harmonic forces (springs). Interparticle The swap moves used in parallel tempering can interactions are expressed as interactions between significantly improve the sampling of configuration space. For example, replicas of a system that are Quantum Monte Carlo (QMC) calculations are close to a glassy state can exchange their way “up” in temperature so that energy barriers are easier to an important alternative to ab initio density overcome, and then come back down to low tem- peratures to yield a new independent uncorrelated functional theory calculations. configuration. This is particularly useful in per- forming simulations at low temperatures or when searching for ground states.31 Recent research has combined the parallel tempering method with mul- simultaneous beads of the polymers, thus the par- ticanonical Monte Carlo, which was found to sig- tition function of a quantum many-body system is nificantly improve the efficiency in simulations of reduced to the distribution function of interacting a Lennard-Jones fluid’s coexistence curve.32 classical ring polymers with harmonic bonds. As for the Green’s function Monte Carlo method, the Quantum Monte Carlo main challenge for fermionic systems comes from The correlated motion of electrons plays a crucial the necessity of antisymmetrization. role in the aggregation of atoms into molecules and solids, in electronic transport properties, and in Kinetic Monte Carlo many other important physical phenomena. Al- The techniques discussed so far primarily involve though ab initio calculations performed with den- equilibrium Monte Carlo, in which the goal is to sity functional theory33 have become a vital tool in sample the equilibrium distribution or ensemble as materials science, condensed-matter physics, and efficiently as possible. However, several important quantum chemistry, quantum Monte Carlo (QMC) nonequilibrium processes exist in physics, chem- calculations are an important complementary al- istry, and materials science in which the time evo- ternative. For example, QMC calculations have lution is of interest. Although been used to accurately calculate the exchange-cor- can be used to accurately model the evolution of relation functional for the homogeneous electron such systems on the atomic scale, the time scales gas as a function of density.34 These results have that can be reached are extremely limited. Accord- been parameterized35 and then used as input to ingly, researchers have used kinetic Monte Carlo density functional theory calculations in the local (KMC) simulations in many cases. In particular, density approximation.35 KMC simulations have been used to efficiently A variety of QMC methods have been devel- model a wide variety of dynamical processes. A re- oped, including the , dif- cent review article discussing KMC simulations ap- fusion Monte Carlo, and path-integral Monte pears elsewhere.41 Carlo methods. The variational QMC method was In KMC simulations, we assume that the transi-

MARCH/APRIL 2006 15 tions from one state to another state are Markov- transition from state i, and r is a uniform random ian—that is, the transition rates wij depend only on number between 0 and 1. the initial state i and the final state j—and that each Many KMC simulations use a precalculated rate transition corresponds to a Poisson process (the catalog that includes a limited number of types of 42 transition rates wij are independent of time). This transitions, such as monomer diffusion, and their is a good approximation for systems in which the corresponding rates. However, a new “self- transitions involve infrequent thermally activated learning” KMC method has recently been devel- events between states separated by relatively large oped.45,46 In this method, a precalculated list of energy barriers. According to transition state the- diffusion processes and their associated energetics 43 ory, the rates wij are then determined by an acti- and transition rates isn’t needed—rather, at any b vation barrier E ij , as well as a prefactor Dij : time during the simulation, the activation energies for all possible single or multi-atom processes are b wij = Dij exp(–Eij /kBT). (23) either computed on-the-fly using a saddle-point search procedure, or retrieved from a database in The prefactors can be calculated more accurately which previously encountered processes are stored. by using transition state theory,43 but are often set Initially, this method may require an extensive to a typical vibrational frequency. To converge to amount of work to calculate the possible events that equilibrium, the detailed balance condition might occur in the system. However, as long as the interactions are short-ranged, updating the list of −−EkTb //EkTb ij Bji= B weij wji e (24) possible events and their rates after each event only requires a local calculation. In addition, this must be satisfied. Typically, we assume that the en- method can lead to a substantial gain in accuracy ergy barrier for a transition from configuration i to because of the inclusion of many-particle processes. b a configuration j can be written as Eij = E ij – Ei, where Recently, self-learning KMC has been used to a E ij is the energy of the activated state separating study the diffusion of Cu clusters on the Cu(111) 45,46 configurations i and j, and that Dij = Dji. These as- surface, for which there are many possible sumptions automatically satisfy the detailed balance types of events and energy barriers, and significant condition in Equation 24. differences between the results obtained using self- Starting from an initial configuration i, the pro- learning KMC and ordinary KMC with a restricted cedure is then similar to that for the n-fold way— rate catalog were observed. the next configuration is chosen with a probability proportional to the transition rate wij. In particu- Parallel Monte Carlo i i ⌺n lar, the partial sums S0 = 0 and Sn = k=1wik are gen- Although independent or quasi-independent paral- erated for all N possible final configurations j. After lel simulations can be used in equilibrium Monte generating a uniform random number r between 0 Carlo to increase statistics and accelerate simulation i and SN, a search is performed to find the value of j speed, the standard dynamic i i such that S j–1 < r < S j and a transition to that state is inherently serial because only one event can occur is then carried out. If there are several different at each step. However, in some cases it is desirable to transition rates wij for a given initial state i, the perform Monte Carlo simulations over very long search through the list of partial sums can be per- times or for very large system sizes. In such cases, rig- formed efficiently by using a binary- or K-tree al- orous parallel dynamical Monte Carlo simulations— gorithm.44 However, just as for the n-fold way, in in which the system is decomposed into different many cases there exist only a small number Nc of parts that are then assigned to different processors distinct transition classes ␣ (with corresponding via domain decomposition—are of interest. i rates w␣ and n␣ different possible transitions in each class). In this case, the next configuration can Parallel Metropolis Monte Carlo be selected more efficiently by first selecting the In Metropolis Monte Carlo, the attempt times are class of the transition with a probability propor- typically independent of system configuration (see i tional to n␣w␣ and then randomly selecting one of Equation 7), so parallel simulations can be carried i the n␣ possible transitions in the selected class. The out using an asynchronous “conservative” algo- time t for the transition is then given by rithm.47–50 In this algorithm, each processor keeps track of the times of the events performed in its do- t = –ln(r)/Ri, (25) main, but must check the next attempt times of the appropriate neighboring processors when perform- ⌺ ⌺ i where Ri = jٌiwij = ␣n␣w␣ is the total rate for a ing a boundary move. Thus, although Monte Carlo

16 COMPUTING IN SCIENCE & ENGINEERING events that occur away from the processor boundary tem configuration. In particular, because fast events (that is, whose acceptance probabilities aren’t af- can “propagate” across processors, the time for an fected by another processor’s configuration) are al- event already executed by a processor can change lowed to proceed independently, boundary moves due to earlier events in nearby processors, thus can only be accepted or rejected if the “local time leading to an incorrect evolution. If all possible condition” that the processor’s current attempt time KMC moves and event rates are known in advance, is less than the corresponding neighboring proces- the CA algorithm can still be used by mapping the sors’ next attempt time is satisfied. This leads to an event rates to Metropolis probabilities.53 However, asynchronous dynamics in which all processors or for problems with a wide range of different possi- domains have different local times at any stage of the ble event rates, the resulting parallel simulations simulation, but communication is required when- are typically very inefficient.53 ever a boundary move is attempted. Even in the ex- A more efficient approach to parallel KMC sim- treme case in which all moves are boundary moves, ulations is the synchronous relaxation (SR) algo- such as in the spin-flip Ising model,49 such an algo- rithm.54,55 S.G. Eick and colleagues54 originally rithm doesn’t lead to deadlock because at any point used this algorithm to simulate large circuit- in time the processor whose local time is a minimum switched communication networks, and it has also can always proceed. For efficiency, a modified ver- been discussed recently by Boris Lubachevsky and sion of this algorithm can also be used, in which n- Alan Weiss55 in the context of Ising model simula- fold way simulations are performed in the interior tions. In this approach, all processors remain syn- of each processor, while Metropolis simulations are 48,51 performed at the boundary. While the conservative asynchronous algorithm is Recently, it has been shown49 that the conserva- tive asynchronous (CA) algorithm scales (that is, appropriate for parallel Metropolis simulations, the overall simulation speed in Monte Carlo steps per second is proportional to the number of it cannot be directly applied to parallel kinetic processors) because a finite fraction of all proces- sors can proceed at any given time. However, the Monte Carlo simulations, since in KMC the CA algorithm also leads to a surface or “time hori- zon” (corresponding to the next attempt times of event time depends on the system configuration. all processors) that roughens with time and whose evolution can be described49 by the Kardar-Parisi- Zhang (KPZ) equation.52 The roughening of the time horizon can lead to problems with data-tak- chronized at the beginning and end of a time ing and memory storage because data can only be interval T, and an iterative relaxation method is used taken using configurations in different processors to correct errors due to boundary events between that correspond to the same local time.49 For mod- processors. The SR algorithm has the advantage of els in which a processor’s “boundary moves” can af- generality (for example, it isn’t necessary to know fect not only the acceptance probabilities of a the types or rates of all possible events in advance) neighboring processor, but also that processor’s and flexibility because the cycle length T can be dy- boundary configuration, this can lead to more sub- namically tuned56 to optimize parallel efficiency. tle problems.53 However, research has recently However, due to fluctuations, which increase loga- 50 56 shown that by enhancing the local time restric- rithmically with the number of processors Np, as tion to include a small fraction of randomly chosen well as the requirement of global communications processors that are arbitrarily far away, thus creat- at the end of each cycle (the global communications ing a small-world network, the time horizon’s time also increases logarithmically with Np), the roughness can be significantly reduced without sac- computational speedup as a function of Np is sub- rificing computational efficiency. As a result, the linear for a fixed processor size. memory storage requirements associated with data- Recently, a more efficient but semirigorous syn- taking can be significantly reduced. chronous sublattice (SL) algorithm has been de- veloped.57 The SL algorithm is useful for a variety Parallel Kinetic Monte Carlo of parallel KMC simulations and has recently been While the CA algorithm is appropriate for parallel applied to simulations of epitaxial Cu/Cu(100) Metropolis simulations, it cannot be directly ap- growth.58 Because the SL algorithm requires only plied to parallel kinetic Monte Carlo simulations, local communications, the parallel efficiency is in- since in KMC the event time depends on the sys- dependent of the number of processors in the large

MARCH/APRIL 2006 17 processor limit, thus leading to linear scaling. In nary Liquid Mixtures,” Molecular Physics, vol. 23, 1972, pp. 41–58. addition, research has shown57 that by using cer- 9. M.P. Allen and D.J. Tildesley, of Liquids, Ox- tain reasonable assumptions on the cycle length ford Univ. Press, 2001. and processor size, the results obtained are identi- 10. G.E. Norman and V.S. Filinov, “Investigation of Phase Transitions by a Monte Carlo Method,” High Temperature (USSR), vol. 7, cal to those obtained in serial simulations. Never- 1969, pp. 216–222. theless, the development of rigorous efficient 11. N.B. Wilding, “Computer Simulation of Fluid Phase Transitions,” parallel algorithms for KMC simulations remains Am. J. Physics, vol. 69, no. 11, 2001, pp. 1147–1155. a challenging problem. 12. D. Nicholson and N. Parsonage, Computer Simulation and the Sta- tistical Mechanics of Adsorption, Academic Press, 1982. 13. A.Z. Panagiotopoulos et al., “Phase Equilibria by Simulation in the Gibbs Ensemble, Alternative Derivation, Generalisation and ince the original article by Metropolis and Application to Mixture and Membrane Equilibria,” Molecular his colleagues1 first appeared in 1953, re- Physics, vol. 63, no. 4, 1988, pp. 527–538. searchers have applied the Monte Carlo 14. A.B. Bortz, M.H. Kalos, and J.L. Lebowitz, “A New Algorithm for method to a wide range of nonequilibrium Monte Carlo Simulation of Ising Spin Systems,” J. , vol. 17, 1975, pp. 10–18. Sand equilibrium processes as well as to a variety of 15. J.D. Munoz, M.A. Novotny, and S.J. Mitchell, “Rejection-Free complex problems. Moreover, various specialized Monte Carlo Algorithms for Models with Continuous Degrees of algorithms have been developed to extend its ap- Freedom,” Physical Rev. E, vol. 67, no. 2, 2003, pp. 026101- plicability and make it increasingly sophisticated 1–026101-4. and efficient. Coupled with the rapid increase in 16. M. Sweeny, “Monte Carlo Study of Weighted Percolation Clus- ters Relevant to the Potts Models,” Physical Rev. B, vol. 27, no. 7, computer power, this evolution has greatly ex- 1983, pp. 4445–4455. panded the complexity and size of systems to which 17. R.H. Swendsen and J.-S. Wang, “Nonuniversal Critical Dynamics the Monte Carlo method can be applied. Although in Monte Carlo Simulations,” Physical Rev. Letters, vol. 58, no. 2, the Monte Carlo method’s effectiveness in solving 1987, pp. 86–88. complex problems is due in part to the central-limit 18. J.-S. Wang, O. Kozan, and R.H. Swendsen, “Sweeny and Gliozzi Dynamics for Simulations of Potts Models in the Fortuin-Kaste- theorem, the development of improved algorithms leyn Representation,” Physical Rev. E, vol. 66, 2002, pp. 057101- and sampling methods will continue to play an im- 1–057101-4. portant role in expanding the accuracy and applic- 19. C.M. Fortuin and P.W. Kasteleyn, “On the Random-Cluster ability of Monte Carlo simulations in the sciences Model I. Introduction and Relation to other Models,” Physica, vol. 57, no. 4, 1972, pp. 536–564. and engineering. 20. U. Wolff, “Collective Monte Carlo Updating for Spin Systems,” Physical Rev. Letters, vol. 62, no. 4, 1989, pp. 361–364. Acknowledgments 21. H.G. Evertz, G. Lana, and M. Marcu, “Cluster Algorithm for Vertex This work was supported by the US National Science Models,” Physical Rev. Letters, vol. 70, no. 7, 1993, pp. 875–878. Foundation through grant numbers DMR-0219328 and 22. M. Troyer, S. Wessel, and F. Alet, “Flat Histogram Methods for CCF-0428826, as well as grants of computer time from Quantum Systems: Algorithms to Overcome Tunneling Problems and Calculate the Free Energy,” Physical Rev. Letters, vol. 90, the Ohio Center (grant number PJS0245) 2003, pp. 120201-1–120201-4. and the Pittsburgh Supercomputer Center. 23. C. Dress and W. Krauth, “Cluster Algorithm for Hard Spheres and Related Systems,” J. Physics A: and General, vol. 28, References 1995, pp. L597–L601. 1. N.C. Metropolis et al., “Equation of State Calculations by Fast 24. J. Liu and R. Luijten, “Rejection-Free Geometric Cluster Algorithm Computing Machines,” J. Chemical Physics, vol. 21, 1953, pp. for Complex Fluids,” Physical Rev. Letters, vol. 92, no. 3, 2004, 1087–1092. pp. 035504-1–035504-4. 2. W. Wenzel and K. Hamacher, “ Approach 25. J. Liu and R. Luijten, “Generalized Geometric Cluster Algorithm for Global Minimization of Complex Potential Energy Land- for Fluid Simulation,” Physical Rev. E, vol. 71, no. 12, 2004, pp. scapes,” Physical Rev. Letters, vol. 82, no. 15, 1999, pp. 066701-1–066701-12. 3003–3007. 26. G.M. Torrie and J.P. Valleau, “Nonphysical Sampling Distributions 3. S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, “Optimization by Sim- in Monte Carlo Free-Energy Estimation: Umbrella Sampling,” J. ulated Annealing,” Science, vol. 220, no. 4598, 1983, pp. 671–680. Computational Physics, vol. 23, no. 2, 1977, pp. 187–199. 4. J.R. Koza, Genetic Programming: On the Programming of Comput- 27. G.R. Smith and A.D. Bruce, “A Study of the Multi-Canonical ers by of Natural Selection, MIT Press, 1992. Monte Carlo Method,” J. Physics A: Mathematical and General, 5. E. Marinari and G. Parisi, “Simulated Tempering: A New Monte vol. 28, 1995, pp. 6623–6643. Carlo Scheme,” Europhysics Letters, vol. 19, 1992, pp. 451–458. 28. A.M. Ferrenberg and R.H. Swendsen, “New Monte Carlo Tech- 6. W.K. Hastings, “Monte Carlo Sampling Methods Using Markov nique for Studying Phase Transitions,” Physical Rev. Letters, vol. Chains and Their Applications,” Biometrika, vol. 57, no. 1, 1970, 61, no. 23, 1988, pp. 2635–2638. pp. 97–109. 29. F. Wang and D.P. Landau, “Efficient, Multiple-Range Random 7. A.A. Barker, “Monte Carlo Calculations of Radial Distribution Walk Algorithm to Calculate the Density of States,” Physical Rev. Functions for a Proton-Electron Plasma,” Australian J. Physics, vol. Letters, vol. 86, no. 10, 2001, pp. 2050–2053. 18, no. 2, 1965, pp. 119–128. 30. F. Wang and D.P. Landau, “Determining the Density of States for 8. I.R. McDonald, “NpT-Ensemble Monte Carlo Calculations for Bi- Classical Statistical Models: A Random Walk Algorithm to

18 COMPUTING IN SCIENCE & ENGINEERING Produce a Flat Histogram,” Physical Rev. E, vol. 64, no. 5, 2001, Parallel Kinetic Monte Carlo Simulations of Thin Film Growth,” J. pp. 056101-1–056101-16. Computational Physics, vol. 212, no. 1, 2006, pp. 305–317. 31. F.C. Chuang et al., “Structure of Si(114) Determined by Global Op- 54. S.G. Eick et al., “Synchronous Relaxation for Parallel Simulations timization Methods,” Surface Science, vol. 578, 2005, pp. 183–195. with Applications to Circuit-Switched Networks,” ACM Trans. 32. R. Faller, Q. Yan, and J.J. de Pablo, “Multicanonical Parallel Tem- Modeling and Computer Simulation, vol. 3, no. 4, 1993, pp. pering,” J. Chemical Physics, vol. 116, no. 13, 2002, pp. 287–314. 5419–5423. 55. B.D. Lubachevsky and A. Weiss, “Synchronous Relaxation for Par- 33. W. Kohn and L.J. Sham, “Self-Consistent Equations Including Ex- allel Ising Spin Simulations,” Proc. 15th Workshop Parallel and Dis- change and Correlation Effects,” Physical Rev. A, vol. 140, no. 4A, tributed Simulation (PADS’01), IEEE CS Press, 2001; http:// 1965, pp. A1133–A1138. lanl.arXiv.org/abs/cs.DC/0405053. 34. D.M. Ceperley and B.J. Alder, “Ground State of the Electron Gas 56. Y. Shim and J.G. Amar, “Rigorous Synchronous Relaxation Algo- by a Stochastic Method,” Physical Rev. Letters, vol. 45, no. 7, rithm for Parallel Kinetic Monte Carlo Simulations of Thin Film 1980, pp. 566–569. Growth,” Physical Rev. B, vol. 71, 2005, pp. 115436-1–115436-12. 35. J.P. Perdew and A. Zunger, “Self-Interaction Correction to Den- 57. Y. Shim and J.G. Amar, “Semirigorous Synchronous Sublattice Al- sity-Functional Approximations for Many-Electron Systems,” gorithm for Parallel Kinetic Monte Carlo Simulations of Thin Film Physical Rev. B, vol. 23, no. 10, 1981, pp. 5048–5079. Growth,” Physical Rev. B, vol. 71, 2005, pp. 125432-1–125432-13. 36. W.L. McMillan, “Ground State of Liquid He4,” Physical Rev., vol. 58. Y. Shim and J.G. Amar, “Growth Instability in Cu Multilayer Films 138, no. 2A, 1965, pp. A442–A451. Due to Fast Edge/Corner Diffusion,” to be published in Physical Rev. B, 2006. 37. D.M. Ceperley, G.V. Chester, and M.H. Kalos, “Monte Carlo Sim- ulation of a Many-Fermion Study,” Physical Rev. B, vol. 16, no. 7, 1977, pp. 3081–3099. Jacques G. Amar is an associate professor in the Depart- 38. M.H. Kalos, D. Levesque, and L. Verlet, “Helium at Zero Tem- ment of Physics and Astronomy at the University of perature with Hard-Sphere and other Forces,” Physical Rev. A, vol. 9, no. 5, 1974, pp. 2178–2195. Toledo. His research interests include the theory of 39. S.D. Ivanov, A.P. Lyubartsev, and A. Laaksonen, “Bead-Fourier condensed-matter systems far from equilibrium, surface Path Integral Molecular Dynamics,” Physical Rev. E, vol. 67, 2003, physics, and epitaxial thin-film growth, as well as com- pp. 066710-1–066710-11. putational and . Amar has a PhD in 40. R.P. Feynman and A.R. Hibbs, Quantum Mechanics and Path In- physics from Temple University. He is a member of the APS tegrals, McGraw Hill, 1965. and the Materials Research Society. Contact him at jamar@ 41. A.F. Voter, “Introduction to the Kinetic Monte Carlo Method,” to be published in Radiation Effects in Solids, K.E. Sickafus and E.A. physics.utoledo.edu. Kotomin, eds., Springer, 2005. 42. K.A. Fichthorn and W.H. Weinberg, “Theoretical Foundations of Dynamical Monte Carlo Simulations,” J. Chemical Physics, vol. 95, no. 2, 1991, pp. 1090–1096. 43. G.H. Vineyard, “Frequency Factors and Isotope Effects in Solid State Rate Processes,” J. Physics and Chemistry of Solids, vol. 3, 1957, pp. 121–127. 44. J.L. Blue, I. Beichl, and F. Sullivan, “Faster Monte Carlo Simula- tions,” Physical Rev. E, vol. 51, no. 2, 1995, pp. R867–R868. 45. T.S. Rahman et al., “Atomistic Studies of Thin Film Growth,” Proc. SPIE, vol. 5009, 2004, pp. 1–14. 46. O. Trushin et al., “Self-Learning Kinetic Monte Carlo Method: Application to Cu(111),” Physical Rev. B, vol. 72, 2005, pp. 115401-1–115401-9. 47. K.M. Chandy and J. Misra, “Distributed Simulation: A Case Study in and Verification of Distributed Programs,” IEEE Trans. Software Eng., vol. 5, no. 9, 1979, pp. 440–452. 48. B.D. Lubachevsky, “Efficient Parallel Simulations of Dynamic Ising Spin Systems,” J. Computational Physics, vol. 75, no. 1, Stay on Track 1988, pp. 103–122. 49. G. Korniss et al., “From Massively Parallel Algorithms and Fluc- IEEE Internet Computing reports emerging tools, tuating Time Horizons to Nonequilibrium Surface Growth,” Physical Rev. Letters, vol. 84, no. 6, 2000, pp. 1351–1354. technologies, and applications implemented 50. G. Korniss et al., “Suppressing Roughness of Virtual Times in Par- through the Internet to support a worldwide allel Discrete-Event Simulations,” Science, vol. 299, 31 Jan. 2003, computing environment. pp. 677–679. 51. G. Korniss, M.A. Novotny, and P.A. Rikvold, “Parallelization of a Dynamic Monte Carlo Algorithm: A Partially Rejection-Free Conservative Approach,” J. Computational Physics, vol. 153, no. 2, 1999, pp. 488–508. 52. M. Kardar, G. Parisi, and Y.C. Zhang, “Dynamic Scaling of Growing Interfaces,” Physical Rev. Letters, vol. 56, no. 9, 1986, pp. 889–892. www.computer.org/internet 53. Y. Shim and J.G. Amar, “Hybrid Asynchronous Algorithm for

MARCH/APRIL 2006 19