THE JOURNAL OF CHEMICAL PHYSICS 140, 044115 (2014)

Unification of algorithms for minimum mode optimization

Yi Zeng,1,a) Penghao Xiao,2,a) and Graeme Henkelman2,b)

1 Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02138, USA
2 Department of Chemistry and the Institute for Computational and Engineering Sciences, University of Texas at Austin, Austin, Texas 78712, USA

a) Y. Zeng and P. Xiao contributed equally to this work.
b) Author to whom correspondence should be addressed. Electronic mail: [email protected]

(Received 6 November 2013; accepted 6 January 2014; published online 30 January 2014)

Minimum mode following algorithms are widely used for saddle point searching in chemical and material systems. Common to these algorithms is a component to find the minimum curvature mode of the second derivative, or Hessian matrix. Several methods, including Lanczos, dimer, Rayleigh-Ritz minimization, shifted power iteration, and locally optimal block preconditioned conjugate gradient, have been proposed for this purpose. Each of these methods finds the lowest curvature mode iteratively without calculating the Hessian matrix, since the full matrix calculation is prohibitively expensive in the high dimensional spaces of interest. Here we unify these iterative methods in the same theoretical framework using the concept of the Krylov subspace. The Lanczos method finds the lowest eigenvalue in a Krylov subspace of increasing size, while the other methods search in a smaller subspace spanned by the set of previous search directions. We show that these smaller subspaces are contained within the Krylov space for which the Lanczos method explicitly finds the lowest curvature mode, and hence the theoretical efficiency of the minimum mode finding methods is bounded by that of the Lanczos method. Numerical tests demonstrate that the dimer method combined with second-order optimizers approaches but does not exceed the efficiency of the Lanczos method for minimum mode optimization. © 2014 AIP Publishing LLC. [http://dx.doi.org/10.1063/1.4862410]

I. INTRODUCTION

An important challenge in chemical and materials science is the simulation of the dynamics of systems over long time scales. Most chemical reactions cannot be simulated directly with traditional molecular dynamics (MD) simulations due to their limited accessible time scales. However, using the harmonic approximation to transition state theory,1,2 which is generally valid for solid state systems at moderate temperature, any reaction rate can be determined from the reactant and transition states. Once these states are located, the dynamics of rare-event systems can be evolved by a kinetic Monte Carlo algorithm over time scales much longer than is possible with MD.

Thus the challenge of studying such chemical reactions can be transformed into the task of searching for the saddle points (transition states) connected to a given minimum (reactant). Due to the high dimensionality and expensive force evaluation of chemical systems, great efforts have been made in developing efficient saddle point searching algorithms. A family of these algorithms, called minimum-mode following algorithms, employs the following evolution equation:

$$\dot{x} = F(x) - 2\left(\hat{\tau}^T F(x)\right)\hat{\tau}, \qquad (1)$$

where $x$ is the position vector, $F(x) = -\nabla V(x)$ is the force, i.e., the negative gradient of the potential energy surface $V(x)$, and $\hat{\tau}$ is the unit vector along the minimum curvature mode direction. We denote $H = \nabla\nabla V$ and take $\hat{\tau}$ to be the unit eigenvector of $H$ with the minimum eigenvalue. When $H$ has only one negative eigenvalue, the above equation reverses the force along that eigenvector and converges to a first-order saddle point. The efficiency of the algorithm relies on an efficient update of $\hat{\tau}$.
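To make Eq. (1) concrete, the following is a minimal sketch of the effective force a minimum-mode following step would use, assuming a NumPy-style force vector; the function name is illustrative and not part of the original paper.

```python
import numpy as np

def effective_force(F, tau):
    """Effective force of Eq. (1): the component of the true force
    F = -grad V along the minimum mode tau is reversed, so that a
    downhill optimizer on this force climbs toward a saddle point.
    """
    tau = tau / np.linalg.norm(tau)        # tau must be a unit vector
    return F - 2.0 * np.dot(tau, F) * tau
```

Following this modified force moves the system up the potential along $\hat{\tau}$ and down in all other directions, which is why the dynamics converge to a first-order saddle point when $H$ has exactly one negative eigenvalue.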
In this paper, we focus on the $\hat{\tau}$ updating algorithms that avoid calculating the Hessian matrix $H$, since it is typically too expensive to calculate directly for the large chemical systems of interest. We examine several existing methods for estimating $\hat{\tau}$, including the Lanczos3 method as used in the activation relaxation technique (ART-nouveau),4 the dimer method,5-7 Rayleigh-Ritz minimization8 as used in the hybrid eigenvector following method,9 the shifted power iteration method as used in the gentlest ascent dynamics method,10 and the locally optimal block preconditioned conjugate gradient (LOBPCG) method.11 Here, these methods are unified into the same mathematical framework so that their relative theoretical efficiencies can be understood.

This paper is structured as follows. In Sec. II the Lanczos method is presented along with the essential idea of the Krylov subspace behind the algorithm. Another widely used numerical scheme, the dimer method, is presented in Sec. III. We show that the dimer method searches for the lowest eigenvector of the Hessian within the same Krylov subspace as the Lanczos algorithm. In Sec. IV we present the power iteration method with Rayleigh quotient shift. This method is shown to be a special restarted case of the Lanczos algorithm for which the convergence rate is significantly slower in high dimensional spaces. In Sec. V we numerically compare the efficiency of the Lanczos algorithm, the dimer method coupled with three different optimizers, and the shifted power iteration method. Finally, we conclude in Sec. VI that the performance of methods such as the dimer, which are limited to a subspace of the same Krylov subspace as the Lanczos method, does not exceed the Lanczos efficiency for finding the lowest curvature mode.

II. LANCZOS ALGORITHM

The Lanczos algorithm is a specialization of the Arnoldi iteration for computing eigenvalues of symmetric matrices.3 Before discussing the Lanczos algorithm, we first restate the eigenvalue problem as a minimization problem in Theorem 1, and then review the concept of the Krylov subspace and present the Lanczos algorithm based on Krylov subspace projection and search. In this section, we assume that the smallest eigenvalue has multiplicity 1.

Theorem 1. Given a symmetric matrix $H \in \mathbb{R}^{m \times m}$, $v$ is the eigenvector associated with the smallest eigenvalue $\lambda$ if and only if $v$ solves the minimization problem

$$\min_{b \in \mathbb{R}^m \setminus \{0\}} \frac{b^T H b}{b^T b}. \qquad (2)$$

Proof. Since $H$ is a symmetric matrix, its eigenvectors $v_1, v_2, \ldots, v_m$ form an orthogonal basis of the space $\mathbb{R}^m$ and all eigenvalues of $H$ are real. Writing $b = \sum_i a_i v_i$, we have

$$\frac{b^T H b}{b^T b} = \frac{\sum_i \lambda_i a_i^2}{\sum_i a_i^2}. \qquad (3)$$

This quotient is a weighted average of the eigenvalues with nonnegative weights $a_i^2$, so it takes its minimum value when $b$ is proportional to the eigenvector $v$. Thus, $v$ solves the minimization problem. The converse follows from the uniqueness of the eigenvector corresponding to the smallest eigenvalue. ∎

Remark 1. Such an optimal solution $v$ must also be unique, again due to the uniqueness of the eigenvector associated with the smallest eigenvalue.

Having transformed the eigenvalue problem into a minimization problem, it can be solved as follows. We first solve the optimization problem in a low dimensional subspace, which can be done more easily than in the original space $\mathbb{R}^m$. We then consider the optimal solution in a space with one more dimension to find a better solution. As we move to higher and higher dimensional subspaces, this optimal solution converges to the true solution. The low dimensional subspace used here is the Krylov subspace, $\mathcal{K}_n$, defined in Remark 2.

Remark 2. The $n$th order Krylov subspace $\mathcal{K}_n$ generated by a square matrix $H \in \mathbb{R}^{m \times m}$ and a nonzero vector $v \in \mathbb{R}^m$ is defined as

$$\mathcal{K}_n = \mathrm{span}\{v, Hv, H^2 v, \ldots, H^{n-1} v\}. \qquad (4)$$

In the minimum mode finding problem the matrix $H$ is not known explicitly, but its product with a vector can be obtained from two force evaluations using the following approximation:

$$Hv = \frac{F(x) - F(x + v)}{\|v\|} + O(\|v\|^2). \qquad (5)$$

To solve the minimization problem in the Krylov subspace, a set of orthogonal basis vectors is used. This basis can be obtained iteratively by the Gram-Schmidt process, as shown in Refs. 12 and 13. We define such a basis for an $n$-dimensional Krylov subspace $\mathcal{K}_n$ as $Q_n = [q_1, q_2, \ldots, q_n] \in \mathbb{R}^{m \times n}$. Therefore, any vector $b \in \mathcal{K}_n$ can be represented as $b = Q_n r$, where $r \in \mathbb{R}^n$. The minimization problem projected onto the Krylov subspace $\mathcal{K}_n$ can then be solved, as shown in Theorem 2 (a numerical sketch of the complete procedure follows below).
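The construction of Eqs. (4) and (5), combined with the projection of Theorem 2 below, suggests the following compact sketch of a Lanczos-style minimum mode search. This is a minimal illustration under stated assumptions, not the authors' implementation: the `force` callable, finite-difference step, and convergence tolerance are choices made for this example.

```python
import numpy as np

def lowest_mode_lanczos(force, x, v0, dr=1e-4, tol=1e-3, nmax=30):
    """Estimate the lowest eigenvalue and eigenvector of H at x.

    force : callable, force(y) returns F(y) = -grad V(y)
    v0    : nonzero starting vector for the Krylov subspace
    dr    : finite-difference step used in Eq. (5)
    """
    f0 = force(x)
    Q = [v0 / np.linalg.norm(v0)]   # orthonormal basis q_1, ..., q_n
    HQ = []                         # finite-difference products H q_i
    lam_prev = None
    for _ in range(nmax):
        q = Q[-1]
        HQ.append((f0 - force(x + dr * q)) / dr)   # Eq. (5)
        Qn = np.column_stack(Q)
        T = Qn.T @ np.column_stack(HQ)             # T = Q_n^T H Q_n
        T = 0.5 * (T + T.T)                        # symmetrize (tridiagonal in theory)
        lam, R = np.linalg.eigh(T)                 # small projected eigenproblem
        tau = Qn @ R[:, 0]                         # Q_n r of Theorem 2
        if lam_prev is not None and abs(lam[0] - lam_prev) < tol:
            break
        lam_prev = lam[0]
        # Gram-Schmidt: orthogonalize H q_n against the current basis
        w = HQ[-1] - Qn @ (Qn.T @ HQ[-1])
        if np.linalg.norm(w) < 1e-12:              # Krylov subspace exhausted
            break
        Q.append(w / np.linalg.norm(w))
    return lam[0], tau / np.linalg.norm(tau)
```

Because $Q_n^T H Q_n$ is tridiagonal (Remark 3 below), practical implementations store only its two distinct nonzero diagonals and replace the full Gram-Schmidt step with the standard three-term Lanczos recurrence; note that each added subspace dimension costs exactly one new force evaluation in Eq. (5), since $F(x)$ is reused.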
Theorem 2. $Q_n r$ solves the minimization problem in the Krylov subspace $\mathcal{K}_n$,

$$\min_{b \in \mathcal{K}_n \setminus \{0\}} \frac{b^T H b}{b^T b}, \qquad (6)$$

if $r$ is the eigenvector corresponding to the smallest eigenvalue of the matrix $Q_n^T H Q_n$.

Proof. Since any vector $b \in \mathcal{K}_n$ can be written as $b = Q_n r_b$, we have

$$\frac{b^T H b}{b^T b} = \frac{(Q_n r_b)^T H Q_n r_b}{(Q_n r_b)^T (Q_n r_b)} = \frac{r_b^T \left(Q_n^T H Q_n\right) r_b}{r_b^T r_b}. \qquad (7)$$

By Theorem 1, the eigenvector $r$ associated with the smallest eigenvalue of the matrix $Q_n^T H Q_n$ solves this minimization problem. Therefore, the vector $Q_n r$ solves the original optimization problem in the space $\mathcal{K}_n$. ∎

Remark 3. By construction of the orthogonal basis $Q_n$, the matrix $Q_n^T H Q_n$ is an upper Hessenberg matrix, i.e., an upper triangular matrix plus a nonzero first subdiagonal. It is also symmetric, since $H$ is symmetric and $Q_n$ is an orthogonal basis. Together, these two properties imply that the matrix $Q_n^T H Q_n$ is tridiagonal.

Finally, the eigenvalue problem of an unknown matrix $H$ is transformed into an iterative series of calculations to find the smallest eigenvalue of a known low dimensional matrix $Q_n^T H Q_n$, which can be done efficiently, for example, by the QR algorithm. In theory, this scheme is guaranteed to converge once $n$ grows to the rank of the matrix $H$, but in practice the convergence is faster than this bound.14,15

III. DIMER METHOD

The dimer method is another iterative algorithm for minimum mode finding.5 With improvements in the implementation,6,7,16,17 the dimer method has become widely used in calculations of chemical reaction rates, especially with forces evaluated from self-consistent electronic structure calculations.
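Although the full dimer algorithm (its rotation and translation updates) lies beyond this excerpt, its central device can be illustrated compactly: the curvature of the potential along a trial direction is estimated from the forces on two displaced images, with no Hessian ever formed. This is a minimal sketch following the original formulation of Ref. 5; the function name, central-difference form, and image spacing `dR` are illustrative assumptions, not this paper's implementation.

```python
import numpy as np

def dimer_curvature(force, x, n_hat, dR=1e-3):
    """Curvature n^T H n along the unit direction n_hat, estimated
    from the forces on two images placed at x +/- dR * n_hat."""
    f1 = force(x + dR * n_hat)            # force on image 1
    f2 = force(x - dR * n_hat)            # force on image 2
    # F = -grad V, so f2 - f1 ~ 2 dR * H n_hat to first order
    return np.dot(f2 - f1, n_hat) / (2.0 * dR)
```

Rotating the dimer to minimize this curvature yields the estimate of $\hat{\tau}$; as stated in Sec. I, the successive search directions remain within the same Krylov subspace that the Lanczos method searches explicitly.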