Monte Carlo Methods for Rough Energy Landscapes

Monte Carlo Methods for Rough Energy Landscapes Jon Machta University of Massachusetts Amherst, Santa Fe Institute SFI, August 9, 2013 Collaborators • Stephan Burkhardt • Chris Pond • Jingran Li • Richard Ellis Motivation • How to sample equilibrium states of systems with rugged free energy landscapes, e.g. spin glasses, configurational glasses, proteins, combinatorial optimization problems. Problem • Markov chain Monte Carlo (MCMC) at a single temperature such as the Metropolis algorithm gets stuck in local minima. Problem • Markov chain Monte Carlo (MCMC) at a single temperature such as the Metropolis algorithm gets stuck in local minima. Outline • Population Annealing • Parallel Tempering • PA vs PT • Conclusions 5 spin glasses and perhaps other systems with rough free 0.1 energy landscapes. By using weighted averages over an ensemble of runs, biases inherent in a single run can be 0.01 can be made small and high precision results can be obtained. The method can be optimized by minizing the ⇤ 0.001 variance of the free energy estimator. If the variance of the free energy estimator is less than unity, high precision 4 results can be obtained from a relatively small ensemble 10 of runs. The method is comparable in efficiency to parallel tempering and is well suited to parallelization. 105 0.1 0.2 0.3 0.4 Var ⇥F FIG. 3: The variance of the free energy Var(βF˜) vs. the prob- ability of a small overlap, ⇥, for a sample⇥ of 25 disorder real- izations. 50 40 30 20 Acknowledgments 10 3813 3814 3815 3816 3817 This work was supported in part from NSF grant FIG. 4: The histogram of the dimensionless free energy, βF . DMR-0907235. I am grateful to Helmut Katzgraber and − PopulationThe solidAnnealing line is the best fit to a Gaussian distribution. Burcu Yucesoy for helpful discussions. [1] K. Hukushima and Y. Iba, in THE MONTE CARLO [7] C. Jarzynski, Phys. Rev. E 56, 5018 (1997). METHOD IN THE PHYSICAL SCIENCES: Celebrating [8] M. E. J. Newman and G. T. Barkema, Monte Carlo the 50th Anniversary of the Metropolis Algorithm, edited Methods in Statistical Physics (Oxford, Oxford, 1999). by J. E. Gubernatis (AIP, 2003), vol. 690, pp. 200–206. [9] H. G. Katzgraber and A. P. Young, Phys. Rev. B 65, [2] R. H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 57, 214402 (2002). 2607 (1986). [10] F. Krzakala and O. C. Martin, Phys. Rev. Lett. 85, 3013 [3] K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn. 65, (2000). • Modification of simulated annealing1604 (1996).for equilibrium [11] M. Palassini and A. P. Young, Phys. Rev. Lett. 85, 3017 [4] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Science (2000). sampling. 220, 671 (1983). [12] E. Marinari and F. Zuliani, J. Phys. A: Math. Gen. 32, [5] A. M. Ferrenberg and R. H. Swendsen, Phys. Rev. Lett. 7447 (1999). A population of replicas is cooled61, according 2635 (1988). to an • [6] J. B. Anderson, J. Chem. Phys. 63, 1499 (1975). annealing schedule. Each replica is acted on by a MCMC (e.g. Metropolis) at the current temperature. • During each temperature step, the population is differentially resampled according to Boltzmann weights to maintain equilibrium. Population Annealing E1 E2 E3 E4 E5 ER R replicas β=1/kT ... Population Annealing E1 E2 E3 E4 E5 ER R replicas β=1/kT ... βʹ Population annealing = simulated annealing with differential reproduction (resampling) of replicas Temperature Step E1 E2 E3 E4 E5 ER β ... exp [ (β β)Ej] ⇥j(β, β)= − − Q(β, β) Replica j is reproduced nj times where nj is an integer random R variate with mean τj. j=1 exp [ (β β)Ej] Q(β, β)= − − R Temperature Step E1 E2 E3 E4 E5 ER β ... βʹ exp [ (β β)Ej] ⇥j(β, β)= − − Q(β, β) Replica j is reproduced nj times where nj is an integer random R variate with mean τj. j=1 exp [ (β β)Ej] Q(β, β)= − − R Resampling e.g. n1 =1, n2 =0, n3 =3, n4 =1 exp [ (β β)Ej] ⇥j(β, β)= − − Q(β, β) Replica j is reproduced nj times where nj is an integer random R variate with mean τj. j=1 exp [ (β β)Ej] Q(β, β)= − − R Nearest integer resampling Prob(n = ⌧ )=⌧ ⌧ Minimizes variance of nj. b c d e Maximizes number of ancestors with Prob(n = ⌧ )=⌧ ⌧ descendants. d e b c Population size fluctuates. Prob(n ⌧ > 1) = 0 − Population Annealing is related to... ✦Simulated annealing ✦Sequential Monte Carlo - See e.g. “Sequential Monte Carlo Methods in Practice”, A. Doucet, et. al. (2001) - aka Particle Filters - Nested Sampling, Skilling • Go with the Winners, Grassberger (2002) • Diffusion (quantum) Monte Carlo • Nonequilibrium Equality for Free Energy Differences, Jarzynski (1997) • Histogram Re-weighting, Swendsen and Ferrenberg (1988) Convergence to Equilibrium Suppose the population at � is a correct, iid sample: Y =exp[ (β0 β)E + λA ] i − − i i JM, R. Ellis, JSP 144, 544 (2011) Convergence to Equilibrium Suppose the population at � is a correct, iid sample: Y =exp[ (β0 β)E + λA ] i − − i i 1 R I(β, β0, λ)=E log( Y ) R j j=1 X JM, R. Ellis, JSP 144, 544 (2011) Convergence to Equilibrium Suppose the population at � is a correct, iid sample: Y =exp[ (β0 β)E + λA ] i − − i i 1 R I(β, β0, λ)=E log( Y ) R j j=1 X dI(β, β0, λ) A 0 = h iR dλ λ=0 JM, R. Ellis, JSP 144, 544 (2011) Convergence to Equilibrium Suppose the population at � is a correct, iid sample: Y =exp[ (β0 β)E + λA ] i − − i i 1 R 1 VarY 1 I(β, β0, λ)=E log( Yj) = log(EY ) +O R 2 3/2 j=1 − 2R (EY ) R X ✓ ◆ dI(β, β0, λ) A 0 = h iR dλ λ=0 JM, R. Ellis, JSP 144, 544 (2011) Convergence to Equilibrium Suppose the population at � is a correct, iid sample: Y =exp[ (β0 β)E + λA ] i − − i i 1 R 1 VarY 1 I(β, β0, λ)=E log( Yj) = log(EY ) +O R 2 3/2 j=1 − 2R (EY ) R X ✓ ◆ dI(β, β0, λ) 1 A 0 = = E0A +O h iR dλ R λ=0 ✓ ◆ JM, R. Ellis, JSP 144, 544 (2011) Convergence to Equilibrium Suppose the population at � is a correct, iid sample: Y =exp[ (β0 β)E + λA ] i − − i i 1 R 1 VarY 1 I(β, β0, λ)=E log( Yj) = log(EY ) +O R 2 3/2 j=1 − 2R (EY ) R X ✓ ◆ dI(β, β0, λ) 1 A 0 = = E0A +O h iR dλ R λ=0 ✓ ◆ A bias inversely proportional to the population size results from the re-weighting to �´ JM, R. Ellis, JSP 144, 544 (2011) Convergence to Equilibrium Resampling complicates the analysis... Conjecture: Fixing other parameters of population annealing, observable converge to their equilibrium values inversely in population size. Quantifying Convergence to Equilibrium • Do many runs for each of many population sizes. Expensive. • Is there an analog to the autocorrelation function and the exponential autocorrelation time? Quantifying Convergence to Equilibrium • Idea: resample small populations from a single run with a large population (bootstrap). – Choose a small population of parents and measure the observable in their descendants. – Repeat many times for different populations sizes. based on 300000*100*10 realization 1 realization 2 log10|<q2>-<q2>exact| log systematic error -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 3.5 4.0 4.5 5.0 log10(R) log population size • Problem: A small sample within a large population is not the same as an isolated small population because the sample within the large populations exists in a more competitive environment. realization 2 =bootstrap log10(|q2R-q2exact|) logsystematicerror 100runs =many runs 100000subsets -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 3.0 3.2 3.4 3.6 3.8 log10(R) log population size 3 the multinomial distribution p [R; n1,...,nR; ⇢1/R, . , ⇢R/R] for R trials. In this implementation the population size is fixed. Other valid resampling methods are available. For example, the number of copies of replica j can be chosen as a PoissonDirect random variable Estimate with mean proportional of Free to ⇢j(β, β0 ),Energies in which case the population size fluctuates [13]. For large R and small (β0 β), the resampled distribution is close to an equilibrium ensemble at the new temperature, − β0. However, the regions of the equilibrium distribution for β0 that di↵er significantly from the equilibrium distribution for β are not well sampled, leading to biasesR exp in the [ population(β β)E at β]0. In addition, due to resampling, the replicas are no longer independent. To mitigate bothj=1 of these problem, thej equilibrating subroutine is now applied. Finally, Q(β, β)= − − observables are measured by averaging over the population.R The entire algorithm consists of S 1 steps: in step k the temperature is lowered from βS k to βS k 1 via resampling − − − − followed by the application of the equilibratingK subroutine and data collection at temperature βS k 1. Population annealing permits one to estimate free energy di↵erences. If the annealing schedule− begins− at infinite βkF (βk)= ln Q(β⇥+1, β⇥)+βK F (βK ) ˜ temperature corresponding− to βS = 0, then it yields an estimate of the absolute free energy F (βk) at every temperature ⇥=k in the annealing schedule. The following calculation shows that the normalization factor Q(β, β0) is an estimator of the ratio of the partition functions at the two temperatures: β0E Z(β ) e− γ Derivation: 0 = γ Z(β) P Z(β) βEγ (β0 β)E e− = e− − γ ( ) Z(β) γ X (β0 β)E = e− − γ h iβ R 1 (β0 β)E e− − j = Q(β, β0).

Monte Carlo Methods for Rough Energy Landscapes

Parallel-Tempered Stochastic Gradient Hamiltonian Monte Carlo for Approximate Multimodal Posterior Sampling

Chapter 2 Monte Carlo Simulations

LAMMPS – an Object Oriented Scientific Application

Desmond Users Guide Release 3.4.0 / 0.7

Hybrid Parallel Tempering and Simulated Annealing Method

The American Ceramic Society 38Th International Conference

Comparing Monte Carlo Methods for Finding Ground States of Ising Spin

Multiscale Implementation of Infinite-Swap Replica Exchange Molecular Dynamics

Parallel Tempering: Theory, Applications, and New Perspectives PCCP

PLUMED Tutorial

Conditions for Rapid Mixing of Parallel and Simulated Tempering on Multimodal Distributions

Qualitative Properties of Parallel Tempering and Its Infinite Swapping