THE JOURNAL OF CHEMICAL PHYSICS 140,044115(2014)

Unification of algorithms for minimum mode optimization Yi Zeng,1,a) Penghao Xiao,2,a) and Graeme Henkelman2,b) 1Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02138, USA 2Department of Chemistry and the Institute for Computational and Engineering Sciences, University of Texas at Austin, Austin, Texas 78712, USA (Received 6 November 2013; accepted 6 January 2014; published online 30 January 2014)

Minimum mode following algorithms are widely used for saddle point searching in chemical and ma- terial systems. Common to these algorithms is a component to find the minimum curvature mode of the second derivative, or Hessian . Several methods, including Lanczos, dimer, Rayleigh-Ritz minimization, shifted power iteration, and locally optimal block preconditioned conjugate gradient, have been proposed for this purpose. Each of these methods finds the lowest curvature mode it- eratively without calculating the Hessian matrix, since the full matrix calculation is prohibitively expensive in the high dimensional spaces of interest. Here we unify these iterative methods in the same theoretical framework using the concept of the . The Lanczos method finds the lowest eigenvalue in a Krylov subspace of increasing size, while the other methods search in a smaller subspace spanned by the set of previous search directions. We show that these smaller subspaces are contained within the Krylov space for which the Lanczos method explicitly finds the lowest curvature mode, and hence the theoretical efficiency of the minimum mode finding methods are bounded by the Lanczos method. Numerical tests demonstrate that the dimer method combined with second-order optimizers approaches but does not exceed the efficiency of the Lanczos method for minimum mode optimization. ©2014AIPPublishingLLC.[http://dx.doi.org/10.1063/1.4862410]

I. INTRODUCTION equation reverses the force along that eigenvector and con- verges to a first-order saddle point. The efficiency of the algo- An important challenge in chemical and materials sci- rithm relies on an efficient update ofτ ˆ. ence is the simulation of the dynamics of systems over long In this paper, we focus on theτ ˆ updating algorithms that time scales. Most chemical reactions cannot be simulated di- avoid calculating the Hessian matrix H, since it is typically rectly with traditional molecular dynamics (MD) simulations too expensive to calculate directly for large chemical sys- due to their limited accessible time scales. However, using the tems of interest. We examine several existing methods for harmonic approximation to transition state theory,1, 2 which is estimatingτ ˆ,includingtheLanczos3 method as used in the generally valid for solid state systems at moderate tempera- activation relaxation technique (ART-nouveau),4 the dimer ture, any reaction rate can be determined from the reactant and method,5–7 Raleigh-Ritz minimization8 as used in the hybrid transition states. Once these states are located, the dynamics eigenvector following method,9 the shifted power iteration of rare-event systems can be evolved by a kinetic Monte Carlo method as used in the gentlest ascent dynamics method,10 algorithm over time scales much longer than is possible with and the locally optimal block preconditioned conjugate gra- MD. Thus the challenge of studying such chemical reactions dient (LOBPCG) method.11 Here, these methods are unified can be transformed into the task of searching for the saddle into the same mathematical framework so that their relative points (transition states) connected to a given minimum (re- theoretical efficiencies can be understood. actant). Due to the high dimensionality and expensive force This paper is structured as follows. In Sec. II the Lanc- evaluation of chemical systems, great efforts have been made zos method is presented with the essential idea of the Krylov in developing efficient saddle point searching algorithms. A subspace behind the algorithm. Another widely used numer- family of these algorithms, called minimum-mode following ical scheme, the dimer method, is presented in Sec. III.We algorithms, employs the following evolution equation: show that the dimer method searches for the lowest eigen- x˙ F (x) 2(τ ˆTF (x))τ ˆ, (1) vector of the Hessian within the same Krylov subspace as = − the . In Sec. IV we present the power it- where x is the position vector, F(x)isthegradientofthepo- − eration method with Rayleigh quotient shift. This method is tential energy surface V (x), andτ ˆ is the unit vector along the shown to be a special restarted case of the Lanczos algorithm minimum curvature mode direction. We denote H V =∇∇ for which the convergence rate is significantly slower in high andτ ˆ as the unit eigenvector of H with the minimum eigen- dimensional space. In Sec. V we numerically compare the ef- value. When H has only one negative eigenvalue, the above ficiency of the Lanczos algorithm, the dimer method coupled with three different optimizers, and the shifted power iteration method. Finally, we conclude in Sec. VI that the performance a)Y. Zeng and P. Xiao contributed equally to this work. b)Author to whom correspondence should be addressed. Electronic mail: of methods such as the dimer, which are limited to a subspace [email protected] of the same Krylov subspace as the Lanczos method, does

0021-9606/2014/140(4)/044115/5/$30.00 140,044115-1 ©2014AIPPublishingLLC 044115-2 Zeng, Xiao, and Henkelman J. Chem. Phys. 140,044115(2014) not exceed Lanczos efficiency for finding the lowest curva- following approximation: ture mode. F (x) F (x v) Hv − + ( v 2). (5) = v + O ∥ ∥ || || II. LANCZOS ALGORITHM To solve the minimization problem in the Krylov sub- The Lanczos algorithm is a specialized space, a set of orthogonal basis is utilized. This basis can method of eigenvalue calculations for symmetric matrices.3 be obtained by the Gram-Schmidt process iteratively as Before discussing the Lanczos algorithm, we first restate the shown in Refs. 12 and 13.Wedefinesuchabasisforan- eigenvalue problem as a minimization problem in Theorem dimensional Krylov subspace n as Qn [q1,q2, ,qn] m n K = ··· 1, and then review the concept of the Krylov subspace and R × .Therefore,anyvectorb n can be represented by, ∈ n ∈ K present the Lanczos algorithm based on Krylov subspace pro- b Qnr,wherer R .Thentheminimizationproblempro- = ∈ jection and search. In this section, we assume that the smallest jected on a Krylov subspace n can be solved, as shown in eigenvalue has multiplicity 1. Theorem 2. K

Theorem 1. Given a symmetric matrix H Rm m, v is Theorem 2. Q r solves the minimization problem in the ∈ × n the eigenvector associated with the smallest eigenvalue λ, if Krylov subspace n and only if v solves the minimization problem, K bTHb min , (6) T bTHb b n 0 b b min . (2) ∈K \{ } b Rm 0 bTb if r is the eigenvector corresponding to the smallest eigen- ∈ \{ } T value of the matrix Qn HQn. Proof. Since H is a symmetric matrix, all eigenvectors m Proof. Since any vector b n can be written as, v1,v2, ,vm form an orthogonal basis of the space R ∈ K ··· b Qnrb and all eigenvalues of H are real numbers. We can write = T T T T b a v ,thenwehave, b Hb (Q r ) HQ r r Q HQn rb i i i n b n b b n . (7) = T T T b b = (Qnrb) (Qnrb) = rb rb ! bTHb λ a2 " # i i i . (3) T 2 By Theorem 1, the eigenvector r associated with the small- b b = i ai T ! est eigenvalue of the matrix Qn HQn solves this minimization This function takes its minimum! value when b is equivalent to problem. Therefore, the vector Qnr solves the original opti- mization problem in the space n. the eigenvector v.Thus,v solves the minimization problem. K ! The other side of the statement follows from the Remark 3. By construction of the orthogonal basis Q , uniqueness of the eigenvector corresponding to the smallest n the matrix QTHQ is an upper Hessenberg matrix, e.g., an eigenvalue. n n ! upper triangular matrix plus a nonzero first subdiagonal. It is Remark 1. Such an optimal solution v must be unique also symmetric since H is symmetric and Qn is an orthogonal T also due to the uniqueness of the eigenvector associates with basis. These two properties confirm that the matrix Qn HQn is the smallest eigenvalue. tridiagonal.

Having transformed the eigenvalue problem to a mini- Finally, the eigenvalue problem of an unknown matrix mization problem, the minimization problem can be solved H is transformed to an iterative series of calculations to find as follows. We first solve the optimization problem in a low the smallest eigenvalue of a known low dimensional matrix T dimension subspace, which can be done more easily than in Qn HQn,whichcanbedoneefficiently,forexample,bythe the original space Rm.Thenweconsidertheoptimalsolution QR algorithm. In theory, this scheme is guaranteed to con- in a space with one more dimension to find a better solution. verge if n grows to the rank of the matrix H,butinpractice, 14, 15 As we move to higher and higher dimensional subspaces, this the convergence will be faster than this bound. optimal solution will converge to the true solution. Here the low dimensional subspace we will use is the Krylov subspace, III. DIMER METHOD n,whichisdefinedasinRemark2. K The dimer method is another iterative algorithm 5 Remark 2. The nth order Krylov subspace n generated for minimum mode finding. With improvements in the m m K m 6, 7, 16, 17 by a square matrix H R × and a nonzero vector v R implementation, the dimer method has become widely is defined as ∈ ∈ used in calculations of chemical reaction rates especially with forces evaluated from self-consistent electronic structure 2 n 1 n span v,Hv,H v, , H − v . (4) methods. K = { ··· } We note that the Raleigh-Ritz optimization method used When n is smaller than the rank r of the matrix H, n is a in hybrid eigenvector following, as developed by the Wales K 9 n-dimensional space; when n is greater or equal to r, n is group, is based upon the same finite-difference gradient of an r-dimensional space. The product Hv is calculated byK the the Rayleigh quotient that is used in the dimer method. So 044115-3 Zeng, Xiao, and Henkelman J. Chem. Phys. 140,044115(2014) while the methods are described using different language and rˆ by the unit eigenvectors of A, have some minor differences in their implementation, they rˆTArˆ (a vˆ a vˆ )TA(a vˆ a vˆ ) a2λ a2λ , are equivalent for the purposes of this analysis. The same ap- x x y y x x y y x x y y = + + = + (12) proach was also used previously by Voter to construct a bias where λ and λ are eigenvalues of the matrix A. potential for the acceleration of MD in his hyperdynamics x y 18 method. Theorem 3. The steepest descent dimer method is equiv- In this section, we present the dimer method within the alent to the Lanczos algorithm with restarts every two steps; same theoretical framework as the Lanczos algorithm. Nu- its theoretical efficiency must be lower than the Lanczos merical comparisons have been previously made between algorithm. these two algorithms,17 but now, under this mathematical framework, we can compare their relative theoretical effi- When the rotation plane is determined by algorithms ciency. based on the previous search direction #,asintheconju- In the dimer method, the minimum curvature mode is de- gate gradient (CG) algorithm, the dimer method is more ef- termined by rotating a pair of images separated by a small ficient but remains under the limit of the Lanczos method. distance v according to the torque acting on the dimer. The The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm || || torque is the force difference divided by the distance, and thus for updating #,aswellasthelimitedmemoryversion(L- has the same form as Hv in Eq. (5).Rotatingalongthetorque BFGS),19 is even more efficient. When the initial Hessian is direction is the mechanism by which the dimer method finds set as a constant times the identity matrix (the standard case the minimum curvature mode in a specific subspace of the in practice), BFGS/L-BFGS is also searching for the lowest Krylov subspace. mode in the subspace of n 1 at iteration n.TheLOBPCG τ THτ K + At each iteration, the new direction τ that minimizes τ Tτ method performs the minimization in a three-dimensional is found in the plane spanned by v,Hv ,assumingthesim- subspace, with one extra direction that is a linear combination { } plest case in which the SD direction is taken for the rotation of the previous directions.11 The search space of LOBPCG plane. A second direction # is then constructed perpendicular is therefore still a subspace of the Krylov subspace, and its to v to form an orthogonal basis set Q2 [v,#], reducing the theoretical efficiency also cannot exceed the Lanczos limit. optimization problem to two dimensions,= # Hv (vTHv)v, (8) IV. POWER ITERATION METHOD = − WITH A RAYLEIGH SHIFT Another method for finding the lowest curvature mode of vTHvvTH# the Hessian is the power iteration method, which has been em- T A Q2 HQ2 . (9) ployed in some recent saddle point searching algorithms.10, 20 = = ⎛ T T ⎞ # Hv#H# In this section, we present the motivation and mechanism of ⎝ ⎠ Here, calculating H# requires a second force evaluation, the power iteration method with a Rayleigh shift, and prove F(x #). The 2 2matrixA can then be diagonalized. The the convergence of this method. Similar to the dimer method, + × we demonstrate that the search space of this method is con- eigenvector τ,whichisexpressedasτ r1v r2#,isthen the starting point of the next iteration. Note= that+ tained in the Krylov subspace n for the minimization prob- lem of Eq. (2). K Hτ r1Hv r2H#, (10) = + can be obtained without any further force evaluation since Hv A. Derivation of the shifted power iteration method and H# are already known, as pointed out in Ref. 7.The The power iteration method is an iterative eigenvector minimization is repeated in a sequence of two-dimensional computation method which can be described as spaces: span v,Hv , span τ,Hτ , ,wherespanv,Hv is { } { } ··· { } the Krylov subspace 2.Alsobecause Hvn vn 1 , (13) K + T 2 = Hvn Hτ (r1 r2(v Hv))Hv r2H v, (11) ∥ ∥ = − + where H is a square matrix with eigenvalues λ1 <λ2 2 τ and Hτ are in the Krylov subspace 3 span v,Hv,H v . < λ3 < ... < λm and λ1 < λm .Inthisscheme,vn will K = { } = = | | | | After the nth iteration, n force evaluations have been made, converge to the eigenvector associated with eigenvalue λm as which is the same number as in the Lanczos method. How- n .13 ever, each two-dimensional space considered in the dimer →∞We already know that all eigenvalues of our Hessian ma- method is a subspace of n 1.Therefore,thedimermethod trix H are on the real axis. Moreover, if λ is an eigenvalue of convergence is theoreticallyK + limited by that of the Lanczos H associated with the eigenvector v,andI is the identity ma- algorithm. trix, then for any constant a R, a λ is the eigenvalue of In previous descriptions of the dimer method, the above anewmatrixaI H with eigenvector∈ − v.Asaresult,wecan − procedure was done by finding a unit vector rˆ R2 to mini- linearly shift the desired eigenvalue to become the eigenvalue mize rˆTArˆ,whichisexactlythedimerenergyinRef.∈ 5.The with greatest absolute value without changing the eigenvec- connection between minimizing the dimer energy and solving tor. Therefore, the power iteration method will converge to the eigenvector problem of Eq. (9) can be seen by expanding our desired eigenvector if we find such a shift. 044115-4 Zeng, Xiao, and Henkelman J. Chem. Phys. 140,044115(2014)

To pick an appropriate shift, a,weusethecurrentmax- tion from this method is located in the Krylov subspace n 1, imum absolute value of Rayleigh quotients at each iteration which is defined in Remark 2. We will prove this statementK + plus a small increment, in Lemma 1 in order to conclude that the shifted power iter- T T T ation method will always converge slower than the Lanczos v1 Hv1 v2 Hv2 vn Hvn an max , , log n. algorithm. = vTv vTv ··· vTv + () 1 1 ) ) 2 2 ) ) n n )* ) ) ) ) ) ) (14) Lemma 1. For any n > 0, vn n 1, where n 1 ) ) ) ) ) ) + + The log n term) is added) ) to prevent) the) case where) an λ1 n ∈ K K − span v0, Hv0, , H v0 . (an λm), i.e. an 0.5 (λ1 λm), even though this = { ··· } =− − = ∗ + scenario is unlikely. With the dynamical update of the shift Proof. We will prove this by induction. constant an according to Eq. (14),themodifiedpoweritera- When n 1, v1 (a1I H)v0 a1v0 Hv0 2. = = − = +n i∈ K tion method can be described, Given vn n 1,wecanwritevn i 0 ci H v0.Then, ∈ K + = = (anI H)vn n vn 1 − . (15) !i + = (anI H)vn vn 1 (anI H)vn (anI H) ci H v0 n 2. (17) + = − = − ∈ K + ∥ − ∥ i 0 .= Theorem 4. The iterative algorithm shown in Eq. (15) Thus, the statement holds for any general n > 0. ! will converge to the eigenvector associated with the smallest eigenvalue of H, if this eigenvalue has multiplicity 1. Since the eigenvalue problem can be taken as a minimiza- tion problem as shown in Theorem 1, the power iteration so- Proof. Let λ1 be the smallest eigenvalue and λ2 the sec- lution is within the same Krylov subspace as the Lanczos al- ond smallest one, by our assumption smallest eigenvalue has gorithm, with the same number of iterations. Therefore, the multiplicity 1, λ1 <λ2. convergence rate of the power iteration method is limited by The Rayleigh quotient is bounded by the maximal abso- the Lanczos algorithm. lute value of eigenvalues, which we assume to be L.Letv be the true eigenvector we want to obtain, then the convergence rate of the vn depends on the ratio of two eigenvalues which V. NUMERICAL TEST 13 are with largest absolute values. The convergence rate is, In Secs. I– IV,wehavecomparedthetheoreticaleffi- n L log k λ1 ciencies of the Lanczos algorithm, the dimer method, and vn v + − 0, (16) ∥ − ∥=O L log k λ → the shifted power iterative method. We proved that the con- +k 1 2 - ,= + − vergence rates of the later two methods are bounded by the as n ,whichprovestheconvergenceofthe Lanczos algorithm due to the restriction of a smaller search →∞ algorithm. ! space than Lanczos at each step. In this section, we conduct a numerical comparison to demonstrate our results in practice. The convergence rates of the algorithms are compared B. Krylov subspace of the shifted power for Lennard-Jones clusters with 38 atoms. Geometry config- iteration method urations are chosen randomly near saddle points where the While the convergence of the power iteration method existence of one negative eigenvalue of the Hessian is guaran- with a Rayleigh shift is guaranteed in principle, the conver- teed. A random direction is used as an initial guess for each of gence is slow in practice. The resulting vn at the nth itera- the minimum-mode searches. More details of the benchmark

1.6 1.6 Lanczos Lanczos 1.4 BFGS dimer 1.4 BFGS dimer CG dimer CG dimer 1.2 1.2 SD dimer 1.0 shifted power 1.0 iteration m mode (radians) m mode (radians) u u 0.8 0.8

0.6 0.6

0.4 0.4 Angle to the minim Angle to the minim 0.2 0.2

0.0 0.0 0 50 100 150 200 250 0 5 10 15 20 25 30 Iteration number Iteration number

FIG. 1. The angle, in radians, towards the true minimum mode as a function of iteration number (left) and a zoom in of the region of relevance for the Lanczos, BFGS dimer, and CG dimer methods (right). 044115-5 Zeng, Xiao, and Henkelman J. Chem. Phys. 140,044115(2014)

TABLE I. Steps to convergence. explicitly finds the minimum curvature mode. This leads to the conclusion that with the same number of evaluations of the ¯ Method NMaxNMinNpotential gradient, the Lanczos algorithm will theoretically Lanczos 25 54 13 converge no slower than the other two classes of methods. BFGS dimer 27 65 13 The result of this research can provide theoretical guidance CG dimer 29 80 13 for any future improvements to methods for finding the min- imum curvature mode. Key to methods that can outperform the Lanczos algorithm will be the determination of subspaces system are discussed elsewhere.21, 22 The Lanczos method is that are outside of the Krylov subspace. implemented with full reorthogonalization, which is faster by one step on average than without reorthogonalization. The dimer method is implemented with three optimizers for de- termining the rotation plane: SD, CG, and BFGS. The ini- ACKNOWLEDGMENTS tial Hessian for BFGS is taken as αI,whereα is set to be This work was supported by the National Science Foun- 60 eV/Å2 and I is the identity matrix. Other α values tested dation. Y.Z. is supported by a Graduate Research Fellow- did not give significantly better results. When the rotation di- ship under Grant No. 1122374 and G.H. by Grant No. CHE- rection from the BFGS becomes almost perpendicular (within 1152342. We gratefully acknowledge the Institute for Pure 3◦)oftheSDdirection,theBFGSisrestartedwiththeinitial and Applied Mathematics at the University of California in Hessian. No parameters are needed for the other methods. All Los Angeles for hosting our participation in the long pro- the methods are implemented in the TSASE software.23, 24 gram “Materials for a Sustainable Energy Future” from Sept. The angle between the estimated lowest mode and the 9toDec.13,2013.WethankKeithPromislowforhelpful true minimum mode is plotted at each iteration in a typical discussions. run in Fig. 1.ClearlytheLanczosmethodisthefastest,while the shifted power iteration and the SD dimer are significantly 1C. Wert and C. Zener, Phys. Rev. 76,1169(1949). slower. The CG dimer and the BFGS dimer are marginally 2G. H. Vineyard, J. Phys. Chem. Solids 3,121(1957). 3C. Lanczos, An Iteration Method for the Solution of the Eigenvalue Problem slower than Lanczos. The similar convergence trends of of Linear Differential and Integral Operators (United States Government these three methods indicate some commonality between Press Office, 1950). them. 4R. Malek and N. Mousseau, Phys. Rev. E 62,7723(2000). 5 The two slowest methods were not considered for fur- G. Henkelman and H. Jónsson, J. Chem. Phys. 111,7010(1999). 6A. Heyden, A. T. Bell, and F. J. Keil, J. Chem. Phys. 123,224101(2005). ther study, but for the three competitive methods, 200 mini- 7J. Kästner and P. Sherwood, J. Chem. Phys. 128,014106(2008). mum mode searches were run at different cluster geometries 8R. A. Horn and C. R. Johnson, Matrix Analysis (Cambridge University for a more statistically significant comparison. A summary of Press, Cambridge, 1985). 9 the results is presented in Table I.Theconvergencecriteria L. J. Munro and D. J. Wales, Phys. Rev. B 59,3969(1999). 10W. E and X. Zhou, Nonlinearity 24,1831(2011). is that the angle to the true minimum mode is smaller than 11J. Leng, W. Gao, C. Shang, and Z.-P. Liu, J. Chem. Phys. 138,094110 0.14, which corresponds to an overlap (dot product of unit (2013). vectors) greater than 0.99. We did not observe any case in 12Y. Saad, Numerical Methods for Large Eigenvalue Problems (SIAM, which the dimer method converges faster than Lanczos, al- 1992), Vol. 158. 13L. N. Trefethen and D. Bau III, (SIAM, 1997), though in some cases they converge at the same rate. Typi- Vol. 50. cally, the BFGS dimer is faster than the CG dimer when a 14Y. Saad, SIAM J. Numer. Anal. 17,687(1980). reasonable initial Hessian value, α,ischosen.Thesenumeri- 15J. Kuczynski and H. Wozniakowski, SIAM J. Matrix Anal. Appl. 13,1094 cal results are consistent with our theoretical conclusions. (1992). 16A. Poddey and P. E. Blöchl, J. Chem. Phys. 128,044107(2008). 17R. A. Olsen, G. J. Kroes, G. Henkelman, A. Arnaldsson, and H. Jónsson, J. VI. CONCLUSION Chem. Phys. 121,9776(2004). 18A. F. Voter, Phys. Rev. Lett. 78,3908(1997). 19 In summary, we have presented three classes of minimum J. Nocedal, Math. Comput. 35,773(1980). 20A. Samanta and W. E, J. Chem. Phys. 136,124104(2012). mode searching algorithms, the Lanczos algorithm, dimer 21S. T. Chill, J. Stevenson, V. Ruehle, C. Shang, P. Xiao, D. Wales, and G. method, and the shifted power method, under the same math- Henkelman, “Benchmarks for characterization of minima, transition states ematical framework of minimization in the Krylov subspace. and pathways in atomic systems,” J. Chem. Phys. (unpublished). 22 With a theoretical understanding of these methods, we can see See http://optbench.org/ for the optimization benchmarks. 23See https://wiki.fysik.dtu.dk/ase/ for information about the ASE project. the dimer and shifted power methods are searching in a sub- 24See http://theory.cm.utexas.edu/henkelman/code/ to obtain the TSASE space of the Krylov subspace for which the Lanczos method code.