arXiv:2107.08633v1 [cond-mat.stat-mech] 19 Jul 2021

Stochastic dynamics without detailed balance condition connecting simple gradient method and Hamiltonian Monte Carlo

Akihisa Ichiki¹ and Masayuki Ohzeki²,³,⁴

¹ Institutes of Innovation for Future Society, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
² Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan
³ Institute of Innovative Research, Tokyo Institute of Technology, Oh-okayama, Meguro-ku, Tokyo 152-8550, Japan
⁴ Sigma-i, Co. Ltd., Konan, Minato-ku, Tokyo 108-0075, Japan

(Dated: July 20, 2021)

Sampling occupies an important position in theories of various scientific fields, and Markov chain Monte Carlo (MCMC) provides the most common technique of sampling. As MCMC has developed, a huge number of studies have aimed at accelerating convergence to the target distribution. Hamiltonian Monte Carlo (HMC) is one such variant of MCMC. In the recent development of MCMC, another approach based on the violation of the detailed balance condition has attracted much attention. Historically, these two approaches have been proposed independently, and their relationship has not been clearly understood. In this paper, the two approaches are understood seamlessly in the framework of a generalized Monte Carlo method that violates the detailed balance condition. Furthermore, we propose an efficient Monte Carlo method based on our framework.

I. INTRODUCTION

Recently, sampling techniques have become increasingly important in various fields of science and engineering. Sampling methods have been developed to numerically examine the equilibrium behavior of complex systems such as macromolecules like proteins [1, 2], glasses [3], and spin glass transitions [4]. In addition to these traditional applications, the recent development of machine learning has made sampling widely used for training and evaluating models, and for various other purposes such as stochastic inference [5, 6]. The most common technique for sampling is provided by the Markov chain Monte Carlo (MCMC) method. An MCMC method is required to approach an arbitrary target distribution quickly when sampling random variables starting from a given initial state. Since Metropolis et al. successfully introduced MCMC to investigate complex systems [7], many variants have been proposed to accelerate convergence to the target distribution.

The speed-up techniques have been constructed mainly on two concepts. One is the extended ensemble method [8]. In the extended ensemble method, the state space is extended by introducing auxiliary variables, and convergence is accelerated by allowing rapid transition paths in a higher dimension. The extended ensemble techniques are roughly categorized into three groups: the exchange Monte Carlo method [9], simulated tempering [10, 11], and the multicanonical method [12] with the Wang-Landau algorithm [13]. Hamiltonian Monte Carlo (HMC), which introduces momenta as auxiliary variables [14], is also classified as an extended ensemble method. The other concept for acceleration is based on the efficient proposal of candidates for the updated state. Such efficient candidates are generated via coarse-graining. The Swendsen-Wang algorithm [15] makes efficient state updates by using clusters of spins in the Ising model. Wolff later extended this algorithm to the XY model [16], and it has since been extended to arbitrary target distributions [17].

Recently, in addition to the two concepts mentioned above, the possibility of acceleration by violating detailed balance has been intensively investigated [18–21]. Conventional acceleration algorithms have been developed within the range of the detailed balance condition. However, it has been shown that violating detailed balance accelerates convergence to the target distribution [22]. Based on this result, Ohzeki and Ichiki proposed a systematic construction of dynamics that violates detailed balance and converges to any target distribution in a continuous system [23]. The Ohzeki-Ichiki method duplicates the original system and introduces a probability current between the two systems. The force driving the probability current causes a rotational evolution of the state in the duplicated state space, similar to the behavior of Hamiltonian dynamics in symplectic space. In this paper, the Ohzeki-Ichiki method is generalized. We show that the generalized Ohzeki-Ichiki method is indeed a dynamics that seamlessly connects the simple gradient method and Hamiltonian dynamics.

The generalized Ohzeki-Ichiki method provides a family of dynamics that includes the gradient method and HMC. To show this, we review the gradient method in Section II and HMC in Section III. In Section IV, we then show that the generalized Ohzeki-Ichiki method contains the gradient method and HMC as specific limits. In Section V, the generalized Ohzeki-Ichiki method is compared numerically with other methods with respect to the speed of convergence to the target distribution. Section VI is devoted to a summary and discussion.
II. GRADIENT METHOD

The simplest dynamics converging to the target distribution is given by a gradient method. The gradient method satisfies the so-called detailed balance condition. Physically, dynamics satisfying the detailed balance condition relaxes to a steady state in which no macroscopic heat is generated. Such a special steady state is called an equilibrium state. The gradient method attains the Gibbs distribution

$$\pi(x) = \exp[-U(x)/T]/Z, \tag{1}$$

with a partition function $Z$. It does so through the balance between the drift along the energy gradient and the diffusion due to noise. The following dynamics gives the simplest gradient method, in which the $N$-dimensional continuous state $x$ converges to the Gibbs distribution:

$$dx_i(t) = -\frac{\partial U}{\partial x_i}\,dt + \sqrt{2T}\,dW_i(t), \tag{2}$$

where $dx_i$ is the displacement of $x_i$ during an infinitesimal time $dt$, and $U(x)$ and $T$ correspond to the potential and temperature, respectively. $W_i(t)$ is a standard Wiener process that satisfies

$$\langle dW_i(t)\rangle = 0, \tag{3}$$
$$\langle dW_i(t)\,dW_j(t')\rangle = \delta_{ij}\,\delta(t-t')\,dt, \tag{4}$$

where $\delta_{ij}$ and $\delta(t)$ denote the Kronecker and Dirac delta functions, respectively, and $\langle\cdot\rangle$ represents an expectation. The Fokker-Planck equation corresponding to the Langevin equation (2) is

$$\frac{\partial P(x,t)}{\partial t} = -\sum_i \frac{\partial}{\partial x_i}\left[-\frac{\partial U(x)}{\partial x_i} - T\frac{\partial}{\partial x_i}\right]P(x,t). \tag{5}$$

It is straightforward to confirm that the Gibbs distribution (1) is a steady solution of the Fokker-Planck equation (5). Substituting $P = \pi$ makes the bracketed probability current vanish, because $-T\,\partial\pi/\partial x_i = (\partial U/\partial x_i)\,\pi$.

The H-theorem guarantees that the dynamics (2) converges to the unique steady distribution (1), the equilibrium distribution, regardless of the initial condition (provided $\exp[-U(x)/T]$ is normalizable). Therefore, the target Gibbs distribution can be obtained by specifying $U(x)$ and $T$ in the simple gradient dynamics (2). However, the simple gradient method updates the state along the gradient of the potential $U$. The update therefore becomes inefficient when the state is trapped in a local minimum of the potential, where the gradient vanishes. MCMC algorithms exploit noise to escape from such a local minimum. However, if the potential around the local minimum is steep, escape takes a long time. In the history of MCMC studies, various techniques have been proposed to avoid this bottleneck, which restricts relaxation to the target distribution.

III. HAMILTONIAN MONTE CARLO

We have seen that, in the simple gradient method, the state is updated along the gradient of the potential, that is, in the direction normal to the energy surface. With such a method, it is difficult to avoid being trapped in a local minimum of the potential. To overcome this difficulty, it has been proposed to add extra degrees of freedom to the original system. These open new directions along which the state can escape from the local minimum. This idea is called an extended ensemble method, and Hamiltonian Monte Carlo (HMC) is one of its realizations. In HMC, a momentum $p$ is introduced as an auxiliary variable in addition to the original state variable $x$. Introducing the momentum doubles the dimension of the dynamical system and makes it easier to escape from a local minimum of the potential. In other words, when the kinetic energy exceeds the energy gap between the local minimum and the neighboring local maximum of $U(x)$, the state can escape from the local minimum. The basic concept of HMC is that the Gibbs distribution

$$\pi_{x,p}(x,p) = \exp[-H(x,p)/T]/Z_{x,p}, \tag{6}$$
$$H(x,p) = U(x) + \sum_i \frac{p_i^2}{2m_i} \tag{7}$$

is invariant under the Hamiltonian dynamics

$$\dot{x}_i = \frac{p_i}{m_i}, \tag{8}$$
$$\dot{p}_i = -\frac{\partial U(x)}{\partial x_i}, \tag{9}$$

where $Z_{x,p} := \int dx\,dp\,\exp[-H(x,p)/T]$ is a partition function and $m_i$ represents the mass of the $i$-th degree of freedom. The target Gibbs distribution $\pi(x) = \exp[-U(x)/T]/Z$ is acquired as the marginal distribution $\pi(x) = \int dp\,\pi_{x,p}(x,p)$ of the Gibbs distribution (6).

The HMC algorithm consists of the following steps.
(i) Sample the momentum $p'_i$ ($i = 1, \ldots, N$) from the Gaussian distribution

$$P_G(p'_i) = \frac{1}{\sqrt{2\pi m_i T}}\exp\left(-\frac{p_i'^2}{2m_iT}\right). \tag{10}$$

This procedure changes the state from $(x, p)$ to $(x, p')$.
(ii) Evolve the state for a waiting time $\tau$, starting from the initial state $(x, p')$, according to the Hamiltonian dynamics (8) and (9). We denote the resulting state as $(x'', p'')$.
(iii) According to the Metropolis-Hastings rule [7, 24], accept the state $(x'', p'')$ obtained in step (ii) with the acceptance rate $\min\{1, \exp[-(H(x'',p'') - H(x,p'))/T]\}$. Otherwise, the state remains at $(x, p')$.

The HMC algorithm repeats these three steps. Note that the Gibbs distribution (6) is invariant under the Hamiltonian dynamics (8) and (9).
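The two samplers reviewed above can be sketched in a few lines of Python/NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Three choices are introduced here: the Euler-Maruyama discretization of Eq. (2), the leapfrog integrator used for step (ii), and the hypothetical function names `gradient_method` and `hmc`. The potential enters only through user-supplied callables `U` and `grad_U`.

```python
import numpy as np


def gradient_method(grad_U, x0, T, dt, n_steps, rng):
    """Euler-Maruyama discretization (an assumption) of Eq. (2):
    dx_i = -dU/dx_i dt + sqrt(2T) dW_i, with dW_i ~ N(0, dt)."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        dW = rng.standard_normal(x.size) * np.sqrt(dt)
        x = x - grad_U(x) * dt + np.sqrt(2.0 * T) * dW
        samples[k] = x
    return samples


def hmc(U, grad_U, x0, T, m, eps, n_leap, n_iter, rng):
    """Steps (i)-(iii) of HMC. Step (ii) integrates Eqs. (8)-(9) with a
    leapfrog scheme (an assumption) for a waiting time tau = eps * n_leap."""
    x = np.array(x0, dtype=float)
    m = np.broadcast_to(np.asarray(m, dtype=float), x.shape)
    samples = np.empty((n_iter, x.size))
    n_accept = 0
    for k in range(n_iter):
        # (i) draw p' from the Gaussian (10): zero mean, variance m_i * T
        p = rng.standard_normal(x.size) * np.sqrt(m * T)
        H_old = U(x) + np.sum(p**2 / (2.0 * m))
        # (ii) leapfrog: half kick, then n_leap (drift, full kick) pairs,
        # then undo half of the last kick so it becomes a final half kick
        xn = x.copy()
        pn = p - 0.5 * eps * grad_U(xn)
        for _ in range(n_leap):
            xn = xn + eps * pn / m
            pn = pn - eps * grad_U(xn)
        pn = pn + 0.5 * eps * grad_U(xn)
        H_new = U(xn) + np.sum(pn**2 / (2.0 * m))
        # (iii) Metropolis-Hastings: accept with prob min{1, exp[-(H_new - H_old)/T]}
        if np.log(rng.random()) < -(H_new - H_old) / T:
            x = xn
            n_accept += 1
        samples[k] = x  # on rejection the position stays; p is redrawn next time
    return samples, n_accept / n_iter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical test case: harmonic U(x) = sum_i x_i^2 / 2 at T = 1, so Var[x_i] = 1.
    U_h = lambda x: 0.5 * np.sum(x**2)
    grad_h = lambda x: x
    xs = gradient_method(grad_h, np.zeros(5), T=1.0, dt=0.01, n_steps=20000, rng=rng)
    hs, acc = hmc(U_h, grad_h, np.zeros(5), T=1.0, m=1.0, eps=0.1,
                  n_leap=20, n_iter=2000, rng=rng)
    print(f"gradient Var = {xs[2000:].var():.2f}, HMC Var = {hs[200:].var():.2f}, "
          f"acceptance = {acc:.2f}")
```

For the harmonic test potential, the sample variance of each coordinate should approach $T$. The leapfrog scheme is chosen here because it is time-reversible and volume-preserving. Those two properties make the Metropolis-Hastings correction in step (iii) exact despite the discretization error. This is a design choice of the sketch, not something the text prescribes.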
The Gaussian distribution (10) is the steady-state distribution of the momentum, and in step (i) the momentum is sampled from this invariant distribution. An advantage of HMC is that Gaussian random variables are easy to generate numerically. In step (ii), the state update is ballistic on the energy surface. Even if the state is located at a local minimum of the potential $U(x)$, it can escape by virtue of its kinetic energy. The rejection in step (iii) is exploited to eliminate nonphysical time evolution [25]. Since the total energy is conserved under the Hamiltonian dynamics, the acceptance rate is theoretically always unity. However, naive numerical integration has been reported to increase the total energy. Step (iii) is introduced to eliminate this error and to guarantee accuracy. Thus, step (iii) is an extra step and can be omitted when the time evolution of the Hamiltonian dynamics is calculated with sufficiently high accuracy.

In the simple gradient method (2), the state update in the direction normal to the energy surface is ballistic. The update along the energy surface is diffusive, since it is caused only by noise. In HMC, on the other hand, the update in the direction normal to the energy surface is caused only by the random sampling of the momentum. The update along the energy surface, however, is ballistic, since the state evolves according to the Hamiltonian dynamics. The Gibbs distribution obeys the principle of equal a priori weights for states with equal energy. HMC is therefore expected to satisfy this principle quickly through its ballistic state updates along the energy surface.

IV. OHZEKI-ICHIKI METHOD

Violation of the detailed balance condition was shown to accelerate relaxation to the steady state owing to the eigenvalue shift of the Fokker-Planck operator [22]. To introduce this violation systematically, Ohzeki and Ichiki proposed duplicating the original system and introducing a rotating probability current between the two duplicated systems:

$$dx_i(t) = -\frac{\partial U(x)}{\partial x_i}\,dt + \gamma\frac{\partial U(y)}{\partial y_i}\,dt + \sqrt{2T}\,dW_i^x(t), \tag{11}$$
$$dy_i(t) = -\frac{\partial U(y)}{\partial y_i}\,dt - \gamma\frac{\partial U(x)}{\partial x_i}\,dt + \sqrt{2T}\,dW_i^y(t), \tag{12}$$

where $x_i$ and $y_i$ are degrees of freedom belonging to the original and the replicated system, respectively. $W_i^x$ and $W_i^y$ are independent standard Wiener processes:

$$\langle dW_i^x(t)\,dW_j^x(t')\rangle = \delta_{ij}\,\delta(t-t')\,dt, \tag{13}$$
$$\langle dW_i^y(t)\,dW_j^y(t')\rangle = \delta_{ij}\,\delta(t-t')\,dt, \tag{14}$$
$$\langle dW_i^x(t)\,dW_j^y(t')\rangle = 0. \tag{15}$$

This system has a steady-state distribution of Gibbsian form,

$$\pi_{x,y}(x,y) = \exp\{-\beta[U(x) + U(y)]\}/Z_{x,y}, \tag{16}$$

where $\beta = 1/T$ and $Z_{x,y}$ is a partition function. The target distribution $\pi(x) = \exp[-U(x)/T]/Z$ is then acquired as the marginal distribution $\pi(x) = \int dy\,\pi_{x,y}(x,y)$. Note that this system violates the detailed balance condition but satisfies the balance condition

$$\sum_i \frac{\partial}{\partial x_i}\left[u_i^x\,\pi_{x,y}(x,y)\right] + \sum_i \frac{\partial}{\partial y_i}\left[u_i^y\,\pi_{x,y}(x,y)\right] = 0, \tag{17}$$

where the driving force

$$u_i^x = \gamma\frac{\partial U(y)}{\partial y_i}, \tag{18}$$
$$u_i^y = -\gamma\frac{\partial U(x)}{\partial x_i} \tag{19}$$

yields the probability current characteristic of detailed-balance violation. Because the driving force satisfies the balance condition, the Gibbs distribution (16) remains the steady-state distribution. Indeed, the $i$-th terms of the two sums in (17) equal $\mp\gamma\beta\,(\partial U(x)/\partial x_i)(\partial U(y)/\partial y_i)\,\pi_{x,y}$ and cancel. The two duplicated systems affect each other via the driving force. Nevertheless, the steady-state distribution (16) factorizes, so $x$ and $y$ are independent in the steady state.

In the Ohzeki-Ichiki dynamics (11) and (12), the potential of the duplicated y-system has the same form as that of the original x-system. However, the potential of the y-system can be chosen freely, since $y$ is an auxiliary variable and the target distribution is given by the marginal distribution $\pi(x) = \int dy\,\pi_{x,y}(x,y)$. The y-system therefore need not share the potential of the x-system. Consider the following dynamics:

$$dx_i(t) = \left[-\frac{\partial H_x(x)}{\partial x_i} + \gamma\frac{\partial H_y(y)}{\partial y_i}\right]dt + \sqrt{2T}\,dW_i^x(t), \tag{20}$$
$$dy_i(t) = \left[-\frac{\partial H_y(y)}{\partial y_i} - \gamma\frac{\partial H_x(x)}{\partial x_i}\right]dt + \sqrt{2T}\,dW_i^y(t), \tag{21}$$

where $H_x(x) = U(x)$ is the potential of the original x-system, and the energy $H_y(y)$ of the y-system can be an arbitrary function. This system has the following steady-state distribution, independent of the value of $\gamma$:

$$\pi_{x,y}(x,y) = \exp\{-\beta[H_x(x) + H_y(y)]\}/Z_{x,y}. \tag{22}$$

Therefore, the target distribution is obtained as the marginal distribution $\pi(x) = \int dy\,\pi_{x,y}(x,y)$ for an arbitrary form of $H_y$ (provided $\exp[-\beta H_y(y)]$ is normalizable).

Consider the change of variables $\gamma = \tilde\gamma T$, $\tilde t = \tilde\gamma T t$ in dynamics (20) and (21). Then the dynamics

$$dx_i(\tilde t) = \frac{\partial H_y(y)}{\partial y_i}\,d\tilde t, \tag{23}$$
$$dy_i(\tilde t) = -\frac{\partial H_x(x)}{\partial x_i}\,d\tilde t \tag{24}$$
is obtained in the limit $\tilde\gamma \to \infty$. In the rescaled time, the gradient terms not multiplied by $\gamma$ are suppressed by the factor $1/(\tilde\gamma T)$, and the noise variance $2T\,dt = 2\,d\tilde t/\tilde\gamma$ vanishes. Note that $H_x$ and $H_y$ play the roles of potential and kinetic energies in this dynamics, respectively. In fact, the choice $H_y(y) = \sum_i y_i^2/2m_i$ reproduces the Hamiltonian dynamics (8) and (9), with $y$ playing the role of the momentum $p$. In dynamics (20) and (21), the driving force proportional to $\gamma$ causes the violation of the detailed balance condition. The case $\gamma = 0$ corresponds to the simple gradient method, since the x-dynamics (20) then reduces to Eq. (2). The limit $\gamma \to \infty$, with the time rescaling above, corresponds to the Hamiltonian dynamics. We therefore conclude that the dynamics (20) and (21) seamlessly connects the gradient method and the Hamiltonian dynamics on which HMC is based.
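Dynamics (20) and (21) can likewise be simulated directly. The sketch below illustrates it under stated assumptions and is not the authors' code:

- it uses an Euler-Maruyama step, since the text does not specify a discretization;
- the function name `generalized_oi` is hypothetical;
- $x$ and $y$ are assumed to have the same dimension $N$, because the coupling acts component-wise.

With `gamma=0` the x-update is identical to the gradient method (2). With the kinetic choice $H_y(y) = \sum_i y_i^2/2m_i$, increasing `gamma` moves the dynamics toward the Hamiltonian flow (23)-(24).

```python
import numpy as np


def generalized_oi(grad_Hx, grad_Hy, x0, y0, T, gamma, dt, n_steps, rng):
    """Euler-Maruyama discretization (an assumption) of dynamics (20)-(21):
      dx_i = [-dHx/dx_i + gamma * dHy/dy_i] dt + sqrt(2T) dW^x_i
      dy_i = [-dHy/dy_i - gamma * dHx/dx_i] dt + sqrt(2T) dW^y_i
    x and y must have the same dimension (component-wise coupling)."""
    x = np.array(x0, dtype=float)
    y = np.array(y0, dtype=float)
    assert x.shape == y.shape, "x and y must have the same dimension"
    xs = np.empty((n_steps, x.size))
    ys = np.empty((n_steps, y.size))
    noise_scale = np.sqrt(2.0 * T * dt)  # sqrt(2T) times the std of dW ~ N(0, dt)
    for k in range(n_steps):
        gx, gy = grad_Hx(x), grad_Hy(y)  # both gradients at the current state
        x_new = x + (-gx + gamma * gy) * dt + noise_scale * rng.standard_normal(x.size)
        y_new = y + (-gy - gamma * gx) * dt + noise_scale * rng.standard_normal(y.size)
        x, y = x_new, y_new
        xs[k], ys[k] = x, y
    return xs, ys


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical example: double-well H_x(x) = sum_i (x_i^2 - 1)^2 and
    # kinetic H_y(y) = sum_i y_i^2 / 2 (unit masses), at T = 0.5.
    grad_Hx = lambda x: 4.0 * x * (x**2 - 1.0)
    grad_Hy = lambda y: y
    for gamma in (0.0, 1.0, 4.0):
        xs, _ = generalized_oi(grad_Hx, grad_Hy, np.ones(1), np.zeros(1),
                               T=0.5, gamma=gamma, dt=1e-3, n_steps=20000, rng=rng)
        print(f"gamma = {gamma}: fraction of time with x < 0 = {np.mean(xs < 0):.2f}")
```

The drift grows with $\gamma$, so an explicit step size must shrink as $\gamma$ increases. For the harmonic case $H_x = \sum_i x_i^2/2$, $H_y = \sum_i y_i^2/2$, the Euler-Maruyama iteration is stable only if $dt\,(1+\gamma^2) < 2$. For any $\gamma$, the long-run samples of $x$ should follow $\pi(x)$ up to discretization error, since the steady state (22) does not depend on $\gamma$.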