arXiv:2011.08560v1 [cond-mat.dis-nn] 17 Nov 2020

Nonflat Histogram Techniques for Spin Glasses

Fabio Müller,∗ Stefan Schnabel,† and Wolfhard Janke‡
Institut für Theoretische Physik, Universität Leipzig, IPF 231101, 04081 Leipzig, Germany
(Dated: November 18, 2020)

We study the bimodal Edwards-Anderson spin glass, comparing established methods, namely the multicanonical method, the 1/k-ensemble, and parallel tempering, to an approach where the ensemble is modified by simulating power-law-shaped histograms in energy instead of flat histograms as in the standard multicanonical case. We show that by this modification a significant speed-up in terms of mean round-trip times can be achieved for all lattice sizes taken into consideration.

∗ [email protected]
† [email protected]
‡ [email protected]

I. INTRODUCTION

Simulations of systems with rugged free-energy landscapes [1] suffer from a massive slowing down of the dynamics in the low-temperature phase. This problem is encountered in many physical systems, e.g., spin glasses or the folding of polymers. It renders the investigation of the thermodynamic properties of such systems in the low-temperature phase a very challenging task.

The Metropolis algorithm [2] is designed to sample configurations according to their statistical weight in the canonical ensemble. At low temperatures it fails dramatically for systems with a rugged free-energy landscape because ergodicity is effectively broken. The simulations get stuck in local minima of the free-energy landscape (metastable states), and the thermal energy is not sufficient to overcome the free-energy barriers. A wide range of algorithmic developments tackling this problem have been proposed, all of which can be subsumed under the term broad-energy ensembles.

One commonly employed method is the Parallel Tempering (PT) method [3–5], where Metropolis simulations of copies of the system (replicas) at different temperatures are performed. After certain time intervals, exchanges of the replicas between the different temperatures are attempted. This procedure enables the replicas to fully explore deep free-energy valleys at low temperature, and to travel freely through temperature space and thus decorrelate at high temperature. Thereby, the replicas can explore the rugged structure of the free-energy landscape much more efficiently than in a simple Metropolis simulation. Using this simple method, studies of spin-glass systems with sizes up to 40³ spins at temperatures close to the transition have been reported [6, 7]. For ground-state searches, systems of about 10³ spins are feasible [8]. The great advantage of the PT algorithm is its simplicity. It only needs a suitable temperature set and exhibits good performance, which makes it probably the most employed method in the investigation of systems with rugged free-energy landscapes.

Another recent development is the Population Annealing Monte Carlo method [9–13], which proceeds similarly to Simulated Annealing [14]: the system is gradually cooled down according to an annealing schedule. The annealing is performed on a large population of replicas. After each lowering of the temperature, an intermediate resampling of the population is performed, which keeps the simulation at thermal equilibrium. In contrast to Simulated Annealing, this permits the evaluation of thermodynamic observables. Despite attempts at optimizing the method for spin glasses [10, 12, 15], its optimization remains more cumbersome than that of the simple PT method because of the additional complexity. The main advantage of this algorithm is its suitability for massively parallel implementation. For disordered systems, however, this advantage does not come into play. The necessity of simulating many different disorder realizations already allows the efficient use of parallel computing for any method.

The multicanonical (MUCA) method [16–18] is another well-established algorithm. It was designed for the simulation of systems with first-order phase transitions, but it also performs well for systems with rugged free-energy landscapes and has already been applied to spin glasses in Refs. [19, 20]. In this method the simulation is set up to visit all possible energies with the same probability, yielding a flat histogram in energy. However, different researchers have noted that this ensemble is not optimal. One suggested improvement is the 1/k-ensemble by Hesselbo and Stinchcombe [21], where the sampling distribution is the inverse of the integrated density of states. As the authors point out, this prescription results in energy histograms which grow towards the low-energy region, so that the low-energy region is sampled more often than the high-energy region.

Another, non-parametric optimization of the MUCA algorithm was proposed in Ref. [22]. That method uses an estimator of the local diffusivity in order to maximize the number of round trips performed in energy. It was applied, among others, to the ferromagnetic Ising model, for which it improves the scaling behavior of the round-trip times in energy. Its improved performance for the models considered in that work suggests a general strategy: automatically identify the bottlenecks of the simulation and concentrate the simulation effort there. This should reduce the round-trip times independently of the nature of the underlying system. However, in our implementation, in the case of
the three-dimensional (3D) bimodal Edwards-Anderson (EA) spin glass [23], the round-trip times did not systematically improve with this method. Instead, the simulation got stuck for some of the considered samples, rendering a comparison to the other methods impossible.

In this work we present a different approach: we prescribe parametric profiles for the histograms of the simulation and adjust the simulation weights accordingly. As for the three previous MUCA variants, this requires knowledge of the underlying density of states, but it is much more flexible. The profiles are all chosen to be shifted power laws with two free parameters.

As an example we consider the 3D bimodal EA spin glass. This is one of the simplest models exhibiting a rugged free-energy landscape. It is also interesting as an optimization problem, since finding ground states of hard disorder realizations is NP-hard [24]. Despite the exponential growth of computational resources, fundamental questions regarding the nature of the spin-glass phase remain open. Progress on these questions crucially depends on the development of new methods and the improvement of existing ones.

The rest of the paper is organized as follows. In Sec. II the spin-glass model and the simulation methods are explained. The round-trip times of the individual methods are compared directly in Sec. III. The framework of extreme-value statistics is introduced in Sec. IV. In Sec. V benchmarks for the global comparison are discussed and the different methods are compared in terms of those benchmarks. The results are summarized in Sec. VI.

II. MODEL AND EMPLOYED METHODS

We consider the 3D bimodal EA model, whose Hamiltonian takes the form

$$\mathcal{H} = -\sum_{\langle ij \rangle} J_{ij} S_i S_j, \tag{1}$$

where the bonds J_ij and the spins S_i take the values ±1. The sum runs over all nearest-neighbor pairs of the simple-cubic lattice with periodic boundary conditions.

Due to the disordered nature of spin glasses, the study has to take into account a sufficiently large set of disorder realizations over which averaged quantities can be computed. Here, one disorder realization consists of a set of 3V couplings J_ij, each of which is +1 or −1 with probability 50%, where V = L³ is the number of spins in a lattice of linear size L. The disorder realizations are generated prior to the simulation and then kept fixed for all times (quenched disorder). As an adequate set of disorder realizations, we generate 4000 samples each for L = 3 and L = 4, and 5000, 6000, and 4000 samples for L = 5, 6, and 8, respectively.

The method which we adapted is the well-established MUCA method [17]. It employs a generalized Metropolis criterion with an energy-dependent weight function,

$$P_{\mathrm{acc}} = \min\left(1, \frac{W(E_{\mathrm{new}})}{W(E_{\mathrm{old}})}\right), \tag{2}$$

where the weight function is proportional to the inverse of the density of states Ω(E),

$$W(E) \propto \Omega^{-1}(E). \tag{3}$$

For the MUCA simulations, Ω(E) has to be known sufficiently well a priori for each disorder realization. An estimator for it can be obtained, for instance, with the Wang-Landau algorithm [25]. Alternatively, as in this work, it can be obtained with other iterative procedures, which are explained, e.g., in Ref. [26]. This ensemble produces histograms which are flat in energy and is, therefore, often also referred to as the "flat histogram method".

A straightforward generalization of the flat histogram method are the nonflat histogram methods. Suppose the simulation weights of the flat MUCA method are multiplied by the desired energy-dependent shape (or profile) function P_SH(E),

$$W(E) \propto \Omega^{-1}(E)\, P_{\mathrm{SH}}(E). \tag{4}$$

The resulting histograms will then be shaped according to P_SH(E). In this work all profiles are shifted power laws of the form

$$P_{\mathrm{SH}}(E, \Delta E, \alpha) = \left(\frac{E}{\Delta E - E_g} + 1\right)^{\alpha}, \tag{5}$$

where the exponent α < 0, and ΔE > 0 is the position of the pole relative to the ground-state energy E_g of the respective spin-glass realization, i.e., the pole lies at E = E_g − ΔE. In this parametrization the power laws are normalized to unity at E = 0.

In Fig. 1 the recorded histograms of the different methods are displayed on a logarithmic y scale for one disorder realization with L = 8. In contrast to flat MUCA, all methods have in common that the distribution of sampled states grows towards the ground-state energy. The recorded histogram of nonflat MUCA perfectly matches the imposed profile, and in the ground-state region it is similar to that of PT. Among the existing methods, this feature is strongest for PT. We are convinced that it enhances the ability to sample the low-energy region and, especially, to find low-energy states of the investigated systems. Other functional forms that enhance the sampling of the low-energy region are possible. Even a stepwise defined function could be employed and might yield better results. We chose a power law because its two parameters allow for a good adaptation while tuning in the two-dimensional parameter space remains feasible.

[Figure 1: recorded histograms H(E) on a logarithmic scale versus E for flat, 1/k, PT, and power law, together with the profile P_SH(E).]
FIG. 1. The recorded histograms H(E) of the different methods and the profile function P_SH(E) for one disorder realization of linear lattice size L = 8. The dotted and the dashed vertical lines indicate the position of the ground-state energy E_g and the position of the pole of the power law (5), respectively.

For the above parametrization we found a fixed parameter set, namely α = −3.6 and ΔE = 96. Independently of the lattice size, it yielded the shortest mean round-trip times among the considered profiles. In the following we refer to the nonflat MUCA setting with the power-law shape belonging to this parameter set simply as the power-law (PL) setting or nonflat MUCA method. The overall best results are obtained with this parameter set. We want to point out, however, that an improvement compared to flat MUCA was visible for each of the considered parameter sets. Because the pole sits at a fixed offset from the ground-state energy, the relative distributions differ depending on the ground-state energy of the respective disorder realization. The value of the profile function at the ground-state energy is given by

$$P_{\mathrm{SH}}(E_g, \Delta E, \alpha) = \left(\frac{1}{1 - E_g/\Delta E}\right)^{\alpha}. \tag{6}$$

Inserting α = −3.6 and ΔE = 96 into Eq. (6) gives the enhancement of the sampling at the ground-state energy relative to zero energy. For a disorder realization with L = 4 and a typical ground-state energy of ≈ −100, the factor is (1 + 100/96)^3.6 ≈ 13. For a sample with L = 8 and a typical ground-state energy of ≈ −900, it is instead (1 + 900/96)^3.6 ≈ 4500. Due to this feature, this parametrization of the profile function does not require any adjustment of the parameters for the system sizes we considered. Presumably such a profile will also yield good results for larger systems, although we cannot be certain.

Next we consider the 1/k-ensemble [21]. It is defined by setting the simulation weights equal to the inverse of the integrated density of states up to the energy of the respective bin,

$$W_{1/k}(E) \propto \frac{1}{k(E)} = \left(\int_{E_g}^{E} dE'\, \Omega(E')\right)^{-1}. \tag{7}$$

Here, a first-order Taylor expansion of ln Ω around E shows that the weights (4) satisfy W(E) ≈ W_1/k(E) if P_SH(E) = d ln Ω(E′)/dE′ evaluated at E′ = E. This prescription again relies on the knowledge of the density of states. The authors of Ref. [21] stress its robust ergodicity and apply it to spin glasses and the traveling salesman problem [27].

The density of states is the only input needed by the methods above. It was therefore determined only once, to high accuracy, using the iterative procedure adapted from Ref. [26] but with power-law-shaped distributions in energy. In this case, and generally whenever the ground-state energy of the system is not known a priori, the profile function has to be adapted whenever a lower energy is found.

Lastly, the PT method, probably the most employed algorithm for spin-glass simulations, is included in the comparison. The ensemble in this case is defined by a set of M temperatures {T_i, i = 1, ..., M}. For each temperature T_i a Metropolis simulation of a copy of the system (replica) is performed. The replicas at temperatures T_i and T_j are allowed to exchange their configurations with probability

$$P^{\mathrm{ex}}_{ij} = \min\left(1, e^{(1/T_j - 1/T_i)(E_j - E_i)}\right), \tag{8}$$

where E_i and E_j are the energies of replicas i and j, and k_B = 1. This prescription allows for fast decorrelation when a replica travels to high temperature and for the exploration of the local minima at low temperatures. Among the many PT protocols available [28], we opted for the constant-exchange-rate protocol with acceptance rates between 40% and 60% [29]. For all simulations the maximal temperature was chosen well above the critical temperature, T_max > 3 > T_c ≈ 1. The exchange rates were imposed on each individual disorder realization in an initial equilibration run, during which the temperatures were adjusted accordingly. The number of replicas was set to M = 7, 7, 12, 14, and 20 for L = 3, 4, 5, 6, and 8, respectively. We note that the choice of the temperature set is crucial for the PT algorithm and also leaves room for optimization, as, for example, in Ref. [30]. In this work, however, we limit ourselves to a well-established protocol for PT and focus on the optimization of the nonflat histogram technique.

III. COMPARISON OF THE ROUND-TRIP TIMES

The observable considered in this study is the round-trip time. For all methods except PT, and for each disorder realization, it is defined as the time the simulation needs to travel from the highest energy (E ≈ 0) to the ground-state energy and back. For PT, instead, the round trip is measured between the ground-state energy and an energy typical for a temperature well above the freezing point of the disorder realization [31, 32]. This time can be taken as an upper bound of the autocorrelation time of the energy of the respective disorder realization at the ground state. We want to stress that the energies we refer to as ground-state energies are the lowest encountered energies and
may not be the true ground states. However, the round-trip times were always measured over at least 100 round trips for each individual sample and method, so that several hundred round trips have been performed on each disorder realization. Whenever lower energies were encountered during this process, the disorder realization was requeued and simulated again until the desired number of round trips was achieved. This procedure makes the discovery of the true ground state very probable. After at least 100 round trips, the relative statistical error in the round-trip time τ_i is of the order of Δτ_i/τ_i ≈ 0.1.

The first property we want to look at is how the round-trip times of the individual disorder realizations depend on the employed method. The scatter plots in Fig. 2 compare, on a log-log scale, the round-trip times of the same disorder realization under two different methods. All simulated disorder realizations of size L = 4 and L = 8 are shown. Note the strong correlation of the round-trip times for each single disorder realization. It indicates that the hardness of the underlying optimization problem is primarily a characteristic of the disorder realization and mostly independent of the employed method. This allows us to categorize the disorder realizations and speak of easy and hard instances. Consider first flat MUCA versus parallel tempering (left panels). For both lattice sizes L = 4 and L = 8, the τ_i are systematically lower for PT, indicating its superior performance for the whole class of bimodal EA spin glasses at the respective lattice sizes.

[Figure 2: log-log scatter plots of τ_PT(i) versus τ_flat(i) (left), τ_flat(i) versus τ_PL(i) (center), and τ_PT(i) versus τ_PL(i) (right), for L = 4 (top) and L = 8 (bottom).]
FIG. 2. Scatter plots of the round-trip times comparing the nonflat power-law histogram technique (PL) to the standard flat MUCA (flat) and parallel tempering (PT) methods for sizes L = 4 (upper panels) and L = 8 (lower panels). All points scattered above the identity line have longer round-trip times for the method on the y axis.

When comparing the performance of the nonflat histogram method to the flat MUCA method (central panels), the cloud of scattered round-trip times shows a bending. For L = 4, the flat histogram method displays only slightly higher round-trip times for the easy disorder realizations. With increasing hardness, the round-trip times for the flat histogram method grow faster than for the PL setting. This effect is enhanced with a further increase of the lattice size (see lower panels). For L = 8, the round-trip times of the easiest samples for the flat MUCA method are similar to those for PL. However, as will become apparent in the next section, the hard samples contribute most to the mean round-trip time. Even a slightly weaker performance for the easier samples would therefore hardly affect the total computation time.

The right panels show the comparison of PL to PT. For L = 4, PT outperforms the nonflat histogram method for the easy disorder realizations, while for the hard ones PL displays shorter round-trip times. For L = 8 a large
fraction of the disorder realizations is characterized by shorter round-trip times for PT, but the tail of the distribution describing the hard samples exhibits shorter round-trip times for PL.

[Figure 3: cumulative distributions F(τ) versus τ (logarithmic axis) for flat, 1/k, PT, and power law; panel (a) L = 4 with an inset showing f(τ), panel (b) L = 8.]
FIG. 3. Round-trip time distributions (symbols) and best-fitting cumulative distribution functions (lines) for the different methods and lattice sizes L = 4 (a) and L = 8 (b). The inset of the left panel shows the PDF of the distribution.

IV. ROUND-TRIP TIME DISTRIBUTIONS

To quantify the observations of the previous section, we examine the distributions of the round-trip times. A round trip in energy includes a visit to the ground state of the respective disorder realization, which is an extreme event. The statistics of round-trip times must therefore be described in the framework of extreme-value statistics. One of the main results in this field is the Fisher-Tippett-Gnedenko theorem [33], which characterizes the types of distributions to which extreme-value distributions can converge. The round-trip time distributions of the bimodal EA spin glass all seem to converge to Fréchet distributions, independently of the method and the system size. This has already been suggested in Ref. [34] for the round-trip time distributions of the 3D EA model employing the flat histogram ensemble.

One parametrization of the cumulative distribution function (CDF) of the Fréchet distribution is

$$F(\tau) = \exp\left[-\left(1 + \xi\,\frac{\tau - \mu}{\beta}\right)^{-1/\xi}\right], \tag{9}$$

with τ ∈ [µ − β/ξ, ∞). The location of the distribution along the τ axis is determined by µ, and β is the scale parameter. The shape parameter ξ describes the decay of the tail of the distribution, i.e., the occurrence of rare events. The CDF is the integrated form of the probability density function (PDF) f(τ). Each round-trip time distribution is thus defined by a set of parameters µ, β, ξ, which are determined by fitting the CDF to the recorded round-trip times.

The measured round-trip times and the respective best-fitting Fréchet distributions for lattice sizes L = 4 and L = 8 are plotted in Fig. 3. The points represent the measured data and the solid lines the best-fitting Fréchet distributions. The last section showed that the relative performance of the methods depends on the difficulty of the disorder realizations. This is also reflected in the distribution of the round-trip times. For both lattice sizes the CDF belonging to the flat MUCA method lies below the one belonging to PT for all τ. The steepest increase, which corresponds to the bulk of the distribution, is shifted to higher τ for MUCA compared to PT.

[Figure 4: shape parameter ξ(L) versus L = 3 to 8 for flat, 1/k, PT, and power law.]
FIG. 4. Shape parameter ξ of the best-fitting Fréchet distribution as a function of the lattice size L for the different simulation methods. The dotted line indicates the threshold value ξ = 1, at and above which the distribution mean diverges.

Comparing PT instead to the PL setting yields a different picture. For lattice size L = 4, the cumulative distribution functions cross at F(τ) ≈ 1/3, corresponding to a round-trip time τ ≈ 2 × 10². This means that with the PT algorithm the easiest third of all samples has smaller round-trip times than the easiest third with the PL method, while PL is faster for the harder two thirds. For L = 8 the PL round-trip times are larger for the easier half of the samples and smaller for the harder half. The round-trip times for the hard disorder realizations
have most influence on the decay of the distribution and thus on the shape parameter ξ. Figure 4 displays the scaling of the shape parameter for the different methods, where the errors of the best-fitting parameters are estimated via jackknifing [35]. For the considered lattice sizes, the shape parameter scales similarly for all methods. However, the values for PL are systematically lower for L ≥ 4, in good agreement with its superior performance for the difficult disorder realizations.

V. ASSESSING THE PERFORMANCE OF THE DIFFERENT METHODS

Next, we compare the performance of the different simulation methods. The most intuitive observable would be the disorder average of the round-trip times over the set of considered disorder realizations. However, as will turn out, the rare events which dominate the distribution mean are not contained in the set of simulated disorder realizations. We account for this effect by considering distribution means up to large quantiles of the underlying extreme-value distributions. This yields a more reliable measure of the real performance of the different methods.

A. Finding a Benchmark

In principle the real performance could be determined by measuring the round-trip time of every possible disorder realization. This is ruled out by the enormous number of possible disorder realizations [36]. Instead, we generate a subset of all possible disorder realizations. From this subset we try to infer, by means of the population mean τ_pop, the expected mean round-trip time of all disorder realizations belonging to the same problem class. This is a standard approach in all Monte Carlo studies. The law of large numbers assures its convergence for all random variables drawn from distributions with a well-defined mean. However, this prerequisite is not fulfilled for all of the round-trip time distributions encountered in this work.

The expected mean round-trip time resulting from the underlying probability density could be estimated by the distribution mean

$$\langle\tau\rangle = \int_{\mu-\beta/\xi}^{\infty} d\tau\, \tau f(\tau). \tag{10}$$

This integral can be computed analytically, yielding

$$\langle\tau\rangle = \begin{cases} \mu + \dfrac{\beta}{\xi}\left[\Gamma(1-\xi) - 1\right] & \text{for } \xi < 1, \\ \infty & \text{otherwise,} \end{cases} \tag{11}$$

where Γ(x) is the gamma function. The distribution mean is, therefore, only defined as long as the shape parameter ξ is smaller than one [37]. To illustrate this difficulty, consider the running mean. It is the population mean over the first n generated disorder realizations, kept in a fixed order,

$$\tau(n) = \frac{1}{n}\sum_{i=1}^{n} \tau_i, \tag{12}$$

so that τ_pop = τ(N), where N is the number of all simulated disorder realizations.

In Fig. 5 the running mean for the flat MUCA method and different system sizes is plotted together with the respective distribution mean, if it is defined. For L = 4 (ξ ≪ 1) the running mean quickly converges to the distribution mean indicated by the dotted line. For L = 6 (ξ ≈ 1) the jumps due to rare events in the tail of the distribution become more pronounced. The running mean is still expected to approach the distribution mean for a finite number of disorder realizations. For the 6000 samples considered in our work, however, this is not yet the case. For L = 8 (ξ > 1) the distribution mean is not defined. In the picture of the running mean, jumps represent round-trip times in the tail of the distribution. For ξ > 1 those jumps τ_n/n are clearly visible in Fig. 5. They will make the population mean diverge as more disorder realizations are taken into account, and hence as the tail of the distribution is explored more thoroughly. This illustrates that the population mean must be taken with a grain of salt as a measure for the performance of the different methods.

[Figure 5: running mean τ(n) versus population size n (logarithmic axes) for L = 4, 6, and 8.]
FIG. 5. Convergence of the population mean of the round-trip times for the flat MUCA method towards the distribution mean of the underlying distribution, as a function of the population size. The solid lines are the running means including the first n samples. The dotted lines in the respective color are the means of the underlying distributions.

To retain the characteristics of the underlying round-trip time distribution in the performance estimator of the different methods, the distribution mean up to a certain quantile can be used instead.
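The contrast between ξ < 1 and ξ > 1 can be reproduced with a few lines of Python. The sketch below is not the authors' analysis code. It draws synthetic round-trip times from the Fréchet distribution (9) by inverse-transform sampling, evaluates the closed-form mean (11), and computes the running mean (12). The function names and the parameter values (µ = 100, β = 50, ξ = 0.3 and ξ = 1.2, 6000 samples) are hypothetical choices made for illustration only.

```python
import math
import random


def frechet_cdf(tau, mu, beta, xi):
    """Frechet CDF F(tau) in the parametrization of Eq. (9), for xi > 0."""
    z = 1.0 + xi * (tau - mu) / beta
    if z <= 0.0:  # at or below the lower bound tau = mu - beta/xi
        return 0.0
    return math.exp(-z ** (-1.0 / xi))


def frechet_sample(mu, beta, xi, rng):
    """Draw one synthetic round-trip time by solving F(tau) = p for uniform p."""
    p = rng.random()
    while p == 0.0:  # -ln(0) diverges; redraw (practically never happens)
        p = rng.random()
    return mu + beta / xi * ((-math.log(p)) ** (-xi) - 1.0)


def distribution_mean(mu, beta, xi):
    """Distribution mean of Eq. (11); infinite for xi >= 1."""
    if xi >= 1.0:
        return math.inf
    return mu + beta / xi * (math.gamma(1.0 - xi) - 1.0)


def running_mean(samples):
    """Running mean tau(n) of Eq. (12) for n = 1, ..., N."""
    means, total = [], 0.0
    for n, tau in enumerate(samples, start=1):
        total += tau
        means.append(total / n)
    return means


if __name__ == "__main__":
    rng = random.Random(1)
    mu, beta = 100.0, 50.0  # hypothetical fit parameters
    for xi in (0.3, 1.2):
        taus = [frechet_sample(mu, beta, xi, rng) for _ in range(6000)]
        rm = running_mean(taus)
        print(f"xi={xi}: <tau>={distribution_mean(mu, beta, xi):.1f}, "
              f"tau(100)={rm[99]:.1f}, tau(6000)={rm[-1]:.1f}")
```

For ξ = 0.3 the running mean should typically settle close to the finite value from Eq. (11). For ξ = 1.2 the printed mean is infinite, and the running mean is dominated by a few large jumps, qualitatively like the L = 8 curve in Fig. 5.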

[Figure 6: panel (a) τ_pop and panel (b) ⟨τ⟩_ε (logarithmic scale) versus L = 3 to 8 for flat, 1/k, PT, and power law.]

FIG. 6. Population mean τ_pop (a) and quantile mean ⟨τ⟩_ε with ε = 10⁻⁴ (b) of the round-trip times for the different methods as a function of system size. The latter is the more reliable statistical quantity.

The quantile function is the inverse of the CDF (9),

$$Q(p) = F^{-1}(p) = \mu + \frac{\beta}{\xi}\left[(-\ln p)^{-\xi} - 1\right], \qquad p \in (0, 1), \tag{13}$$

It yields the round-trip time τ_p = Q(p) below which a fraction p of the distribution is accumulated. For each ε < 1 we define the quantile mean ⟨τ⟩_ε, which disregards a fraction ε of the tail of the distribution. It is the integral (10) with the upper bound replaced by Q(1 − ε),

$$\langle\tau\rangle_\epsilon = \int_{\mu-\beta/\xi}^{Q(1-\epsilon)} d\tau\, \tau f(\tau). \tag{14}$$

The integral is evaluated with the parameters of the distributions that best fit the measured round-trip times, see Fig. 3. This enables a well-defined extrapolation beyond the measured round-trip times of the simulated disorder realizations. It thus allows a comparison of the different methods beyond the mere population mean, which may depend strongly on the set of disorder realizations taken into account.

B. Comparison of the Different Methods

For the comparison of the mean round-trip times, we therefore consider only two quantities: the population mean τ_pop, and the quantile mean ⟨τ⟩_ε with ε = 10⁻⁴, which neglects a fraction 10⁻⁴ of the tail of the distribution. The reason is that the distribution mean for the parallel tempering method is already ill-defined for L = 6. The two definitions are evaluated for all simulated lattice sizes and plotted in Fig. 6. Both definitions of the mean grow exponentially up to linear system size L = 6, up to which the mean is defined. For L > 6, where the distribution means diverge, they seem to grow faster than exponentially. We have also looked at the scaling of the more commonly used quantiles, including the median [38, 39], which are derived directly from the τ values without the intermediate step of fitting to a statistical model. These quantiles behave similarly to the quantile means (14) but are less stable for small ε.

For the direct comparison of PL with the existing methods, we introduce the relative performance r. It is defined as the ratio of the mean of the respective method to that of PL. Table I lists the relative performance for all system sizes. The errors in r are estimated using the jackknife resampling technique. It generates a set of ratios {r_i}, where each r_i is calculated from only a subset of all disorder realizations. The error in r is derived from the variance of the resulting jackknife sample.

TABLE I. Ratios of the population mean τ_pop and of the quantile mean ⟨τ⟩_ε (ε = 10⁻⁴) of the round-trip times for flat MUCA, the 1/k-ensemble, and parallel tempering with respect to the same quantities for the power-law MUCA method.

| L | flat MUCA r_pop | flat MUCA r_ε=10⁻⁴ | 1/k-ensemble r_pop | 1/k-ensemble r_ε=10⁻⁴ | parallel tempering r_pop | parallel tempering r_ε=10⁻⁴ |
| 3 | 1.160(2) | 1.174(3) | 1.0146(6) | 1.0193(9) | 1.637(4) | 1.640(5) |
| 4 | 1.622(8) | 1.68(2) | 1.288(5) | 1.328(10) | 1.175(6) | 1.25(2) |
| 5 | 2.28(5) | 2.44(6) | 1.63(3) | 1.75(4) | 1.136(5) | 1.185(8) |
| 6 | 3.8(2) | 3.9(2) | 2.59(9) | 2.6(2) | 2.8(2) | 3.4(3) |
| 8 | 10.5(2) | 14.2(6) | 6.9(3) | 9.4(4) | 2.1(2) | 2.62(7) |

The speedup of PL compared to flat MUCA increases with system size, reaching a factor of more than 10 for L = 8 for both definitions of the mean. Compared to the 1/k-ensemble, the speedup for the biggest system size is still a factor of r ≈ 7–9. Compared to PT the speedup is less pronounced and does not grow steadily with system size, reaching a factor of r ≈ 2–3 for our largest system sizes.

VI. CONCLUSION

Setting up multicanonical simulations so that the resulting histograms are shaped according to power laws instead of being flat is trivially achievable. Nevertheless, this simple approach enables us to gather significantly more independent statistics at the ground-state energy. This is important because the thermodynamic contributions of the ground state of spin glasses are believed to be significant. Similar techniques will likely also improve the sampling of the ground state of other systems with complex free-energy landscapes, such as polymers and in particular proteins, for which the importance of the native state is well known.

PT has been the most widely employed method in the simulation of spin glasses, probably also because of its good ability to investigate the ground-state region. We were nevertheless able to show that the power-law setting considerably improves the performance of multicanonical simulations in this respect, rendering them at least comparable to PT. The overall gain in performance grows with increasing lattice size. It reaches a factor of up to 10–15 in comparison to flat MUCA and still a factor of up to 2–3 compared to PT. In terms of round-trip time distributions, the tails are less heavy owing to the method's superior ability to deal with the hard disorder realizations. The power-law MUCA method proposed here is thus better at finding ground states for the hard instances, which indicates its usefulness for general optimization problems. This is particularly relevant because many other optimization problems can be rephrased in terms of spin-glass Hamiltonians [40] and thus solved with the same methodology.

ACKNOWLEDGEMENT

This project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under project No. 189853844 – SFB/TRR 102 (project B04). It was further supported by the Deutsch-Französische Hochschule (DFH-UFA) through the Doctoral College "L4" under Grant No. CDFA-02-07.
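The conclusion's first point is that power-law-shaped histograms are trivially achievable once multicanonical weights are available. A short Python sketch can illustrate this. Take an estimate ln g(E) of the density of states on an increasing energy grid. The expected histogram is H(E) ∝ g(E)W(E), so any target shape follows from W(E) = H_target(E)/g(E). The specific power law used here, H_target(E) ∝ (E − E_min + 1)^(−α), is a hypothetical choice for illustration and not necessarily the form used in this work. The 1/k-ensemble weights W(E) = 1/k(E) of Ref. [21] are included for comparison.

```python
import numpy as np


def flat_log_weights(ln_g):
    """Standard (flat) MUCA: W(E) = 1/g(E), so H(E) ∝ g(E) W(E) is flat."""
    return -np.asarray(ln_g, dtype=float)


def inv_k_log_weights(ln_g):
    """1/k ensemble: W(E) = 1/k(E), k(E) = sum_{E' <= E} g(E'), so H ∝ g/k.

    Assumes ln_g is ordered by increasing energy; the cumulative sum is
    done in log space to avoid overflow for large lattices.
    """
    ln_k = np.logaddexp.accumulate(np.asarray(ln_g, dtype=float))
    return -ln_k


def powerlaw_log_weights(energies, ln_g, alpha):
    """Weights whose expected histogram follows a power law in energy.

    HYPOTHETICAL target shape, for illustration only:
        H_target(E) ∝ (E - E_min + 1)**(-alpha),
    which favours low energies for alpha > 0 and reduces to flat MUCA
    for alpha = 0. Then W(E) = H_target(E) / g(E).
    """
    e = np.asarray(energies, dtype=float)
    ln_target = -alpha * np.log(e - e.min() + 1.0)
    return ln_target - np.asarray(ln_g, dtype=float)


def acceptance(ln_w, i_old, i_new):
    """Metropolis acceptance min(1, W(E_new)/W(E_old)) for a MUCA update."""
    # Clip the exponent at 0 so that large weight ratios cannot overflow.
    return float(np.exp(min(0.0, ln_w[i_new] - ln_w[i_old])))
```

Because all three weight choices only rescale W(E), the Metropolis update itself is unchanged. Only the lookup table of ln W(E) differs, which is why switching from a flat to a power-law histogram costs essentially nothing.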

[1] W. Janke, ed., Rugged Free Energy Landscapes: Common Computational Approaches to Spin Glasses, Structural Glasses and Biological Macromolecules, Lect. Notes Phys., Vol. 736 (Springer, Berlin, 2008).
[2] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21, 1087 (1953).
[3] K. Hukushima and K. Nemoto, Exchange Monte Carlo method and application to spin glass simulations, J. Phys. Soc. Jpn. 65, 1604 (1996).
[4] R. H. Swendsen and J.-S. Wang, Replica Monte Carlo simulation of spin-glasses, Phys. Rev. Lett. 57, 2607 (1986).
[5] C. J. Geyer and E. A. Thompson, Annealing Markov chain Monte Carlo with applications to ancestral inference, J. Am. Stat. Assoc. 90, 909 (1995).
[6] M. Hasenbusch, A. Pelissetto, and E. Vicari, Critical behavior of three-dimensional Ising spin glass models, Phys. Rev. B 78, 214205 (2008).
[7] M. Baity-Jesi, R. A. Baños, A. Cruz, L. A. Fernandez, J. M. Gil-Narvion, A. Gordillo-Guerrero, D. Iñiguez, A. Maiorano, F. Mantovani, E. Marinari, V. Martin-Mayor, J. Monforte-Garcia, A. Muñoz Sudupe, D. Navarro, G. Parisi, S. Perez-Gaviro, M. Pivanti, F. Ricci-Tersenghi, J. J. Ruiz-Lorenzo, S. F. Schifano, B. Seoane, A. Tarancon, R. Tripiccione, and D. Yllanes (Janus Collaboration), Critical parameters of the three-dimensional Ising spin glass, Phys. Rev. B 88, 224416 (2013).
[8] W. Wang, J. Machta, and H. G. Katzgraber, Comparing Monte Carlo methods for finding ground states of Ising spin glasses: Population annealing, simulated annealing, and parallel tempering, Phys. Rev. E 92, 013303 (2015).
[9] Y. Iba, Population Monte Carlo algorithms, Trans. Jpn. Soc. Artif. Intell. 16, 279 (2001).
[10] K. Hukushima and Y. Iba, Population annealing and its application to a spin glass, AIP Conf. Proc. 690, 200 (2003).
[11] J. Machta, Population annealing with weighted averages: A Monte Carlo method for rough free-energy landscapes, Phys. Rev. E 82, 026704 (2010).
[12] W. Wang, J. Machta, and H. G. Katzgraber, Population annealing: Theory and application in spin glasses, Phys. Rev. E 92, 063307 (2015).
[13] L. Y. Barash, M. Weigel, M. Borovský, W. Janke, and L. N. Shchur, GPU accelerated population annealing algorithm, Comput. Phys. Commun. 220, 341 (2017).
[14] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by simulated annealing, Science 220, 671 (1983).
[15] A. Barzegar, C. Pattison, W. Wang, and H. G. Katzgraber, Optimization of population annealing Monte Carlo for large-scale spin-glass simulations, Phys. Rev. E 98, 053308 (2018).
[16] B. A. Berg and T. Neuhaus, Multicanonical algorithms for first order phase transitions, Phys. Lett. B 267, 249 (1991).
[17] B. A. Berg and T. Neuhaus, Multicanonical ensemble: A new approach to simulate first-order phase transitions, Phys. Rev. Lett. 68, 9 (1992).
[18] W. Janke, Multicanonical simulation of the two-dimensional 7-state Potts model, Int. J. Mod. Phys. C 3, 1137 (1992).
[19] B. A. Berg and T. Celik, New approach to spin-glass simulations, Phys. Rev. Lett. 69, 2292 (1992).
[20] B. A. Berg, T. Celik, and U. Hansmann, Multicanonical study of the 3d Ising spin glass, Europhys. Lett. 22, 63 (1993).

[21] B. Hesselbo and R. B. Stinchcombe, Monte Carlo simulation and global optimization without parameters, Phys. Rev. Lett. 74, 2151 (1995).
[22] S. Trebst, D. A. Huse, and M. Troyer, Optimizing the ensemble for equilibration in broad-histogram Monte Carlo simulations, Phys. Rev. E 70, 046701 (2004).
[23] S. F. Edwards and P. W. Anderson, Theory of spin glasses, J. Phys. F: Met. Phys. 5, 965 (1975).
[24] F. Barahona, On the computational complexity of Ising spin glass models, J. Phys. A: Math. Gen. 15, 3241 (1982).
[25] F. Wang and D. P. Landau, Efficient, multiple-range random walk algorithm to calculate the density of states, Phys. Rev. Lett. 86, 2050 (2001).
[26] W. Janke, Histograms and all that, in Computer Simulations of Surfaces and Interfaces, edited by B. Dünweg, D. P. Landau, and A. I. Milchev (Springer Netherlands, Dordrecht, 2003) pp. 137–157.
[27] The aim of that study was not to maximize the number of round trips in energy but rather the amount of statistically independent data in an uncorrelated Monte Carlo simulation.
[28] T. Papakonstantinou and A. Malakis, Parallel tempering and 3d spin glass models, J. Phys. Conf. Ser. 487, 012010 (2014).
[29] E. Bittner, A. Nußbaumer, and W. Janke, Make life simple: Unleash the full power of the parallel tempering algorithm, Phys. Rev. Lett. 101, 130603 (2008).
[30] H. G. Katzgraber, S. Trebst, D. A. Huse, and M. Troyer, Feedback-optimized parallel tempering Monte Carlo, J. Stat. Mech.: Theory 2006, P03018 (2006).
[31] The estimated temperature was extracted from a few single disorder realizations for each lattice size and then taken as constant for all samples with the same lattice size.
[32] This different measuring prescription gives PT a slight advantage in comparison to the other methods, which is, however, negligible in the authors' opinion.
[33] M. Charras-Garrido and P. Lezaud, Extreme value analysis: An introduction, J. Soc. Fr. Statistique 154, 66 (2013).
[34] S. Alder, S. Trebst, A. K. Hartmann, and M. Troyer, Dynamics of the Wang-Landau algorithm and complexity of rare events for the three-dimensional bimodal Ising spin glass, J. Stat. Mech.: Theory 2004, P07008 (2004).
[35] B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans (Society for Industrial and Applied Mathematics, Philadelphia, 1982).
[36] The bimodal EA spin glass, having discrete randomness, has only a finite number of possible disorder realizations. Due to the symmetries in the absence of external fields, this number can be estimated to be of the order of 2^V, where V = L^3 is the number of spins contained in the lattice.
[37] Due to the finite number of possible disorder realizations, the mean is, strictly speaking, defined. Its finite value is expected to be of the order of 10^100. In terms of computation time, it therefore cannot be numerically distinguished from a true divergence.
[38] B. A. Berg, A. Billoire, and W. Janke, Spin-glass overlap barriers in three and four dimensions, Phys. Rev. B 61, 12143 (2000).
[39] E. Bittner and W. Janke, Free-energy barriers in the Sherrington-Kirkpatrick model, Europhys. Lett. 74, 195 (2006).
[40] A. Lucas, Ising formulations of many NP problems, Front. Phys. 2, 5 (2014).