<<

Theor Chem Acc (2016) 135:84 DOI 10.1007/s00214-016-1847-3

REGULAR ARTICLE

Exploration of some refinements to geometry optimization methods

Adam B. Birkholz1 · H. Bernhard Schlegel1

Received: 15 January 2016 / Accepted: 18 February 2016 © Springer-Verlag Berlin Heidelberg 2016

Abstract The optimization of equilibrium geometries is an approximate Hessian stored and updated using the gra- a key first step in most investigations that utilize quantum dient information computed at each step of the optimiza- chemistry calculations. We present three modifications to tion. Methods also exist that avoid the storage and update standard methods for geometry optimization: a flow chart of the Hessian , such as the conjugate [4] approach to Hessian updating, a scaled RFO approach to and limited memory quasi-Newton [5] optimizers. These controlling the step size and direction, and a quasi-rotation approaches typically require more gradient evaluations approach to handling the propagation of internal coordinate to achieve convergence when compared with full storage data over the course of the optimization. These modifica- quasi-Newton methods and are frequently used to opti- tions are compared against a standard optimization method. mize very large molecules where the memory costs associ- We provide a new test set for geometry optimization con- ated with storing the Hessian are prohibitive. For a recent sisting of 20 structures which range from ca 20 atoms to ca benchmark of these such methods, see Ref. [6]. 100 atoms. For quasi-Newton optimizations, approximate, positive definite Hessian matrices [7] are typically used to avoid the Keywords Geometry optimization · Quasi–Newton · cost of computing the full Hessian exactly. These approxi- Hessian updating · RFO · Optimization test set mate Hessians are then updated by the method of Broyden, Fletcher, Goldfarb and Shanno (BFGS [8–11]) when seek- ing a minimum structure, and either the Powell’s sym- 1 Introduction metric Broyden (PSB [12]) or the symmetric rank 1 (SR1 [13]) updates, or some combination of the two [14], is used Geometry optimization is an important tool in the compu- when attempting to locate a transition state. Additionally, tational chemistry toolbox and has become ubiquitous in sequence acceleration methods such as line searches and modern studies of chemical properties and reactions. There direct inversion of the iterative subspace (GDIIS [15]) are are a wide variety of different algorithms that exist for also used to reduce the number of potential energy surface optimization (see Ref. [1] for a recent review of methods), (PES) calculations necessary to converge to the desired with the most common utilizing a combination of quasi- minimum or transition-state structure. Newton steps in redundant internal coordinates [2, 3], with Since geometry optimization using quasi-Newton meth- ods is a relatively mature field, improvements are likely to be modest. Nevertheless, it is worthwhile to explore the effects Electronic supplementary material The online version of this of various modifications to existing optimization algorithms. article (doi:10.1007/s00214-016-1847-3) contains supplementary material, which is available to authorized users. Described below are three refinements that may offer addi- tional benefits over existing methodologies for locating mini- * H. Bernhard Schlegel mum energy structures using quasi-Newton optimizers. [email protected]

1 Department of Chemistry, Wayne State University, Detroit, • Flowchart update—This approach seeks to improve Hes- MI 48202, USA sian updating by using different update methods only

1 3 84 Page 2 of 12 Theor Chem Acc (2016) 135:84

when they are expected to be well behaved, and falling zzT back to more reliable but less ideal updates when neces- (2) HSR1 = T sary. Additionally, a new modification to the PSB method z s is used by using scaled displacements to compute the T T T update. sz + zs T ss HPSB = − s z (3) • Scaled RFO method—This approach seeks to improve sT s sT s 2 the use of the rational optimization method for  controlling step size and direction by modifying the shift It is fairly well understood that the BFGS (Eq. 1) and SR1 matrix to better represent the expected relative stiffness (Eq. 2) updates can become numerically unstable when yT s of the bond stretches versus the other coordinates. and zT s are small, respectively. The PSB update (Eq. 3), on • Quasi-rotation method—This is an alternative approach the other hand, is always numerically stable since the only to handling the redundancy in an internal coordinate sys- quantity in the denominator (sT s) is always nonzero for a tem. Rather than store the approximate Hessian in the full finite step, but may have undesirable properties when the redundant space, a quasi- is used to rotate quadratic error is large. After observing that, in general, the the approximate Hessian from the non-redundant space at SR1 update produced the most reasonable results when zT s one point, to the non-redundant space at another. In addi- is less than 0, the following flowchart method was developed. tion to reducing the memory requirements for storing the Hessian for large systems, this could also lead to a more 2.2 Method consistent approximation to the Hessian even when the non-redundant space changes over the course of the opti- 2.2.1 Flowchart method mization and may help improve Hessian updating since zT s the change in the gradient can be expressed entirely in the 1. If |z||s| < −0.1, use the SR1 update. non-redundant space at one set of coordinates. yT s 2. If |y||s| > 0.1, use the BFGS update. 3. Otherwise, use PSB method. In the appendix, we summarize the procedures that have This has the benefit of attempting to use the SR1 and become standard over the past few decades for the optimi- BFGS methods, which are often far superior to the PSB zation of molecular geometries using the quasi-Newton method for minimization, as much as possible, but relying method. In Sects. 2–4, we present some new contributions on the numerical stability of PSB when necessary. to improve the stability and efficiency of such optimizations. The standard method will serve as the benchmark for eval- 2.2.2 SSB method uating the performance of each of the modifications to the standard optimization method. All Hessian update methods are based upon computing a To test the performance of the geometry optimiza- correction to the Hessian which satisfies the secant equation tion methods, we have compiled a new set of 20 mol with between 10 and 50 heavy atoms (Fig. 1). This test set is (H + H)s = y (4) more representative of molecules typically being optimized Whenever the dimensionality of the PES is greater than 1, using ab initio and DFT methods on modern architectures this equation is under-determined for the correction ( H ) than the older test sets that were developed according to the and the imposition of different constraints, such as the computer time available in previous decades [16, 17]. requirement that the correction be symmetric, and that it has a minimum size according to some metric, leads to the different update formula. Since all update methods satisfy 2 Flowchart update Eq. 4, the difference in performance between the meth- ods must depend on how they treat the remaining space. 2.1 Motivation For SR1 and BFGS, the numerator is constructed from the outer products of vectors that resemble gradient change Three of the most commonly used Hessian update formu- terms (y, Hs or z), while the numerator for the PSB for- lae are the BFGS, SR1 and PSB updates. They use the dis- mula is constructed from outer products of displacements placement (s = q), change in gradient (y = g ), and q or displacements and gradient change terms. Similarly, the quadratic error (z = y − Hs) to correct the approximation denominators in the SR1/BFGS updates are scalar products to the Hessian according to data computed at two separate of the displacement and a gradient change terms, while geometries. yyT sHHsT the denominators in the PSB update are scalar products of HBFGS = − the displacement alone. If these features play a role in the yT s sHs (1)

1 3 Theor Chem Acc (2016) 135:84 Page 3 of 12 84

O NH2 O H O H O O O N H OH O OH O H O O O

H H HO H O O H OH Artemisinin Aspartame H O O O O O

O O

O O O O Avobenzone O Azadirachtin OH

HO OH O O N N Cl Cl C2H5 Al Al C H C2H5 Cl 2 5 Ethylaluminium Sesquichloride Cetirizine (EASC) Bisphenol A Cl

OH HO O O O

O H

O H H H N O HO

Codeine Diisobutylphthalate Estradiol

Fig. 1 Structures for test set improved behavior observed with the SR1/BFGS updates, used to initialize a quasi-Newton optimization, we obtain then perhaps the PSB update can be modified to produce a sensible v that has characteristics of y, the change in the more reasonable updates as well. gradient, while also having a nonzero overlap with s, the A more general symmetric rank 2 update can be con- displacement. This modified Greenstadt formula will be structed using any vector v as follows refered to as the scaled symmetric Broyden (SSB) update and is also tested in the flowchart method. T T T vz + zv T vv HSR2 = − s z (5) vT s vT s 2  3 Scaled RFO method This update is valid for any choice of v that has a nonzero overlap with s, with v = s giving the PSB update, and v = z 3.1 Motivation giving the SR1 update. Greenstadt [18] observed that the choice of v = Ms, where M is a positive definite weight- The shifted (quasi-)Newton methods control the direction ing matrix, minimizes the magnitude of update matrix rela- and magnitude of a (quasi-)Newton step by a shift of the tive to M. By letting M be the approximate Hessian that is Hessian eigenvalues.

1 3 84 Page 4 of 12 Theor Chem Acc (2016) 135:84

OH N HO N OH OH OH OH N N O HO N N O Mg OH O N N O HO

OH OH Inosine HO Maltose Magnesium Porphin

Cl O O O

O O H NH O N N OH HO

O HO O S Penacillin V Ochratoxin A O

NH O P O OH O O- HO OH O N+ O OH C13H27 OH HO Sphingomyelin O HO OH

O OH Raffinose O HO O O O O N HO OH N O Zn N O OH HO O OH O

O O O OH Tamoxifen Zinc EDTA

Vitamin C

Fig. 1 continued

ξRFO =− aug) of the Hessian augmented with the gradient q =−H + ξI −1g r ( r ) r (6) (A more detailed discussion of the RFO method may be where ξ is computed in such a way that the shifted quan- found in “RFO” section in Appendix.) tity Hr + ξI has the correct number of negative eigenvalues Hr gr (zero for minima, one for transition states) for the struc- Haug = (7) gT 0  ture being optimized, and the subscript r indicates that the r quantities are projected into the reduced, non-redundant When the shift factor is large enough to dominate the Hes- space (see “Projection into non-redundant space” section in sian, the result is a scaled steepest descent (or shallow- the Appendix for further details). The RFO method com- est ascent, depending on the sign of the shift factor) step. putes this shift factor as the negative of an eigenvalue (i.e., This tends to be a poor step on a chemical PES, since the

1 3 Theor Chem Acc (2016) 135:84 Page 5 of 12 84 vibrational modes in a molecule tend to be a combination allowed internal motions of the molecule, and the remain- of both soft angular/torsional and stiff bond stretch coordi- ing coordinates orthogonal to that basis are redundant. This nates, which can lead to oscillations and poor convergence basis is considered local because it depends on the corre- behavior. Standard methods to compute the shift factor sponding geometry, in the case of Cartesian coordinates, often result in a shift that is small enough that the Hessian the rotational orientation defines the direction of the overall remains dominant, and the vibrational modes are scaled rotations, while with redundant internal coor- appropriately to avoid the oscillatory behavior. Considera- dinates, the local basis is given by the left singular vectors tion of better ways to account for the stiffness of the vibra- of the B-matrix (see “Transformation of gradient/Hessian” tional modes when computing the direction and magnitude section in the Appendix), which is a function of the Carte- of the shifted (quasi-)Newton step has resulted in the scaled sian coordinates. RFO method. A method for transforming the in the local basis for one geometry into the local basis for another 3.2 Method geometry is outlined below. This reduces the amount of space required to store approximate Hessian information While the RFO method is normally derived with a gen- down to the amount required for a symmetric nact matrix eral matrix S (see “RFO” section in the Appendix), it and may help keep approximate Hessian information rel- is usually implemented with S = I. In the scaled RFO evant over longer optimizations when the local space method, S is a positive that is chosen to changes dramatically. The transformation is constructed better account for the difference in stiffness between by projecting the B-matrix of a previous geometry into the coordinates. Additionally, for behavior similar to the the local basis at the current geometry, and the result- RFO method, S should be normalized so that the mag- ing nact × nact matrix will have an unsigned nitude of S is comparable to the magnitude of the iden- of approximately, but not less than, 1. For this reason, the tity matrix. This ensures that the only function of the S transformation is referred to as a quasi-rotation coordinate matrix is to change the contribution from each coordi- propagation method. nate in the shifted matrix relative to one another without increasing or decreasing the magnitude of the shift itself. 4.2 Method These requirements are met by using an approximate, positive definite HessianH ˜ used to initialize the Hessian Instead of updating the Hessian in the full redundant space in a quasi-Newton optimization, projected into the active and then projecting into the non-redundant space at every ˜ ˜ T space at the current geometry (Hr = UactHUact), with the step of the optimization, the Hessian may be stored and normalization factor computed using the inverse of the updated in only the active space. To account for the changes determinant. in the active space from one point to the next point, a quasi- rotation scheme is used. If Bold and Bnew are the B-matrix Hr computed at the previous and current point, respectively, S = (8) old new 1/n while U and U are the active space at the previous and |Hr | act act act current point, the quasi-rotation matrix that transforms a Löwdin orthogonalization [19] can be used to scale the matrix or vector from the active space at the previous point Hessian, gradient and computed step, allowing for the to the active space at the current point is given by computation of the shift parameter in the standard way. −1 Mrot = UoldBold UnewBold (10) −1/2 −1/2 −1/2 act act qSRFO = S S HrS  (9) −1 −1/2 The rotated Hessian at the new point may be computed as − RFOI) S gr follows 4 Quasi‑rotation internal coordinate propagation rot rot T old rot Hact = M Hact M (11) 4.1 Motivation  This Hessian may then be updated by using modified q Regardless of the coordinate system chosen, the number of and g vectors internal degrees of freedom defining a molecular geometry prj new new old is fixed atn act = ncrt − 6 for nonlinear molecules. When- q =Uact q − q (12) ever more than nact primitive coordinates are used to define  the molecular geometry, a local basis of n linear combi- act rot new rot T old nations of the primitive coordinates completely defines the g =gr − M gr (13)  1 3 84 Page 6 of 12 Theor Chem Acc (2016) 135:84

The projected displacement will not be 100 % accurate, consisting of an energy and gradient calculation, that each as some of the displacement may exist in the redundant method required to achieve the convergence criteria listed space at the new point. However, this amount is typically in Sect. 1 for the examples in the test set shown in Fig. 1 small relative to the magnitude of the displacement vector, (initial coordinates of the structures are provided in the and the BFGS and SR1 updates only modify space that is supporting information). So that the different algorithms nonzero in g and H q. could be surveyed quickly, Hartree–Fock with the 3–21 G basis set was used to compute the energies for most cases. The behavior for B3LYP [22–25] with a 6–31 G(d,p) basis 5 Results and discussion set should be similar, with the latter used in the EASC and vitamin C optimizations. For comparison, the total number The performance of the modified algorithms are compared of iterations required to converge all of the structures in the to the standard algorithm outlined in the appendix: optimi- test set is listed in Table 1 for each algorithm that success- zation in redundant internal coordinates using the BFGS fully optimized the complete test set. Additionally, since formula to update the Hessian and RFO to compute the the QN method fails to converge on five of the structures, step. An adaptive is also used to control the the total number of iterations required to converge only the maximum step size, with the RFO step scaled back when- remaining 15 structures is also listed. ever the computed step would exceed the trust region. Six Immediately apparent from the table is that optimiza- modified algorithms were constructed by replacing one tion in redundant internal coordinates is vastly superior to component of the standard algorithm: optimization in Cartesian coordinates. Simple quasi-New- ton steps perform slightly better, in general, than the steps • Cartesian—Optimize using Cartesian coordinates rather shifted by the RFO method; however, the RFO method ben- than redundant internal coordinates efits from greater stability due to the ability to produce a • QN—Use a quasi-Newton step (i.e., Eq. 27 with good optimization step even when the Hessian has negative m = 0 ) rather than the RFO step eigenvalues. It is common in optimization programs using • FlowPSB—Replace BFGS updating with the flowchart the quasi-Newton method to employ additional fail-safes, update, using the PSB update as the fallback option such as restarting the optimization with a new approximate • FlowSSB—Replace BFGS updating with the flowchart Hessian or rejecting updates that result in an indefinite Hes- update, using the SSB update as the fallback option sian, but the performance and merits of these approaches • SRFO—Compute the step using the scaled RFO were not studied in the present work. The SRFO method approach rather than the standard RFO step offers a level of efficiency that is closer to the unshifted • RotCrd—Store only the non-redundant nact × nact quasi-Newton method, while still providing the reliability of Hessian, using the quasi-rotation matrix to propagate the RFO method. the Hessian and gradient at the previous point prior to For the new methods introduced in this paper, the Rot- applying the update Crd method was the only one that required more iterations (701) than the standard method (689) to optimize the test In each case, the initial Hessian was estimated as a diag- set. In general, the unsigned determinant of the quasi-rota- onal matrix in redundant internal coordinates, and the ini- tion matrix was <1.05 even for large steps; however, when tial trust region was set to 0.5 bohr or radian. No ceiling on the BFGS update produced an indefinite Hessian, the sub- the maximum step size was found to be necessary, as steps sequent step not only tended to increase the energy, but the longer than approximately 1 bohr were uncommon, particu- resulting quasi-rotation matrix had a determinant much larly when RFO was employed, and so the trust region was larger than previous or subsequent steps, in some cases limited in practice to a maximum of 2 bohr. For the Car- being as large as 1.2. Since corresponding poor updates tesian optimization, the initial Hessian was transformed to were observed in the standard algorithm, it suggests that T Cartesian coordinates using the B-matrix, Hcart = BHintB , changes to the local basis may play some role in these poor and the reduced space at each step is defined as the updates, and that a large determinant of the quasi-rotation 3ncart − 6 vectors orthogonal to the translation and rotation matrix may have some use in rejecting what is likely to be vectors defined in Eqs. 16 and 17. For the QN algorithm, a poor step prior to evaluating the energy/gradient. Addi- if the Hessian is found to have a negative eigenvalue fol- tionally, the results here indicate that the RotCrd method lowing the update, the optimization is terminated since the adequately handles changes in the active coordinates dur- Hessian is not shifted prior to computing the step. ing the course of the optimization, suggesting that it may These algorithms were implemented in a Mathematica be worthwhile to use this approach in future investigations [20] program, using Gaussian 09 [21] to evaluate the poten- of new strategies that involve the evolution of the redundant tial energy surface. Table 1 shows the number of iterations, internal coordinate set during the optimization.

1 3 Theor Chem Acc (2016) 135:84 Page 7 of 12 84

Table 1 Number of iterations Cartesian QN Standard FlowPSB FlowSSB SRFO RotCrd required to converge to a minimum structure Artemisinin 80 25 24 21 21 24 25 Aspartame 98a 26 30 30 30d 28a 27 Avobenzone 193 18a 43 41 42 43 47 Azadirachtin 216a 57 77 73 74 65 72 Bisphenola A 94 Failc 31 32 31 21 30 Cetirizine 157a 31a 35 37 40 30 39 Codeine 70 30b 18 18 18d 18b 19 Diisobutylphthalate 150a Failc 27 27 27d 28a 29 Estradiol 55 25 21 21 21d 19 21 EASC 123 28 40 27 25 24 31 Inosine 96a Failc 41 46 42 34 41 Maltose 168a 36 35 36 37 36 37 Mg porphyrin 49 16 14 24 24d 16 14 Ochratoxin A 174b Failc 31 31 31d 43b 31 Penicillin V 183b 35b 31 32 32d 30a 32 Raffinose 223b 50a 53 55 55d 54b 55 Sphingomyelin 429a Failc 56 48 48d 51 67 Tamoxifen 238 24 35 35 35d 35 36 Vitamin C 48 18 18 18 18d 18 17 Zn EDTA 58 27 29 29 29d 27 31 Subtotale 1959 446 503 497 501 467 503 Total 2902 – 689 681 680 644 701

a The optimization converged to a higher energy structure than the standard method b The optimization converged to a lower energy structure than the standard method c The updated Hessian had a negative eigenvalue, and so the quasi-Newton optimizer was terminated prior to convergence d The symmetric Broyden update was never used in the FlowPSB optimization, so the optimization behav- ior of the FlowSSB method is identical e Comparison of only the 15 structures which successfully converged using the QN algorithm

Both of the flowchart methods (FlowPSB and FlowSSB) to converge the test set than the standard approach. The and the SRFO method required fewer iterations to opti- SRFO method was also the only new method that con- mize all of the structures in the test set than the standard verged to different structures in six of the cases. When method. Only 8 of the 20 optimizations needed to fall back the initial structure is high in energy and far from a mini- to the symmetric Broyden update in the flowchart method, mum, there may be multiple minimum energy wells that so the differences in performance between the FlowPSB are accessible, each containing a different stable con- and FlowSSB methods are limited to the differences in former. Since the SRFO effectively emphasizes changes those 8 structures with neither approach doing consistently to dihedral and valence angles early in the optimization, better than the other. Even when the symmetric Broyden it is reasonable to expect that it will occasionally find update was needed, it was only used once or twice during a different minimum than the standard RFO approach, the course of the optimization, and so the BFGS and SR1 even when using the same initial Hessian and same Hes- updates were well suited to these problems. It is possible sian update formula. For the molecules where the SRFO that greater differences between the BFGS and the flow- method had the largest improvement over the standard chart updates would be observed when the surface is less method (Azadirachtin, Bisphenol A, EASC and Inosine), regular than ab initio or DFT surfaces (for example, using the SRFO and Standard methods converged to the same semi-empirical energies, or solvation models), and future structure. Likewise, the only case where the SRFO method investigation is warranted. needed a significant number of additional iterations to con- The SRFO method had the best overall performance of verge (Ochratoxin A), the SRFO method produced a lower the seven methods in Table 1, requiring 45 fewer iterations energy structure.

1 3 84 Page 8 of 12 Theor Chem Acc (2016) 135:84

6 Summary of the optimization and the rational functional optimiza- tion and trust region schemes to control the step size and The present paper discusses various components of opti- direction. mization methods, including coordinate systems, transfor- Geometry extrapolation methods, such as line searches mations, control of the step size and direction, and Hessian and DIIS, are commonly used with an aim to improve the updating and provides a diverse set of 20 medium size mol- efficiency of an optimization. In practice, they can also have ecules for testing optimization methods. The best current a significant influence on the stability of an optimization as geometry optimization methods typically use redundant well. Likewise, Hessian updating methods that consider the internal coordinates, BFGS updating of the Hessian, RFO change in the gradient at the current geometry relative to and trust region updating for step size control. A number many previous optimization steps are employed for similar of new refinements have been proposed and tested: updat- reasons. The performance of these methods depends heav- ing that switches between SR1, BFGS and PSB or SSB to ily on the choice of parameters, such as how many previous obtain the most reliable update (FlowPSB and FlowSSB), points should be considered in the DIIS extrapolation or the scaled RFO (SRFO) that varies control of the step size Hessian update step. In order to better evaluate the merits and direction depending on the softness or stiffness of the of the methods described in this paper on their own terms, modes, and quasi-rotation propagation of the internal coor- no extrapolation methods were employed, and the Hessian dinates (RotCrd). The latter is an initial approach to work- was updated using only the information at the current point ing in only the active space of the redundant internal coor- and the most recent point. dinates which necessarily changes during the course of the optimization. The results for optimization of the molecules Coordinate transformation/projection in the test set clearly show that redundant internal coor- dinates are superior to Cartesian coordinates. RFO is bet- Coordinate definitions ter than quasi-Newton since it can reach a minimum even when the Hessian has negative eigenvalues. The FlowPSB The standard redundant internal coordinate (RIC) system and FlowSSB methods are comparable to the standard includes bond stretches between all bonded atoms, angles BFGS updating in most cases. The SRFO approach takes between pairs of bonds that share an atom and the dihedral about 7 % fewer steps for optimization of the test set. angles between the planes defined by any two angles that share a common bond. Typically, whether or not a pair of Acknowledgments This work was supported by a Grant from the atoms is bonded can be determined automatically by com- National Science Foundation (CHE1212281 and CHE1464450). paring the distance between the atoms to a standard bond Wayne State University’s computing grid provided computational support. length; however, for the present work, the connectivity was explicitly provided for all examples in order to ensure that the intended coordinate system is used and the behavior of Appendix: Standard method the resulting optimizations does not depend on any arbi- trary numerical thresholds for bondedness. Described below is an optimization algorithm composed Two additional coordinate definitions that are frequently of standard methods and is comparable to the ones used in used in adjunct to the standard stretch, bend and torsion various electron structure packages. This algorithm is used coordinates are out-of-plane bends and linear angle bends. as a reference for comparing and evaluating the modifica- Out-of-plane bends describe how an atom moves relative to tions described in the text. The core of the algorithm is the the plane defined by 3 other atoms, while linear angle bend quasi-Newton method applied to the gradient of the poten- coordinates describe the bending motion of 3 atoms which tial energy, and the additional components include the have become nearly or completely linear. Out-of-plane choice of coordinates to optimize, how the Hessian (the bend coordinates, such as an improper dihedral angle, may matrix of second derivatives of the PES) is computed or provide some benefit to an optimization but are generally approximated, and how the Newton method is modified in unnecessary for structures containing more than 4 atoms as order to control step size and direction. While there are a the relevant motions are well described by the redundancy variety of different approaches that have been used in the in the standard RIC set. For the present work, no out-of- past, a very common combination of methods includes the plane coordinates are used. use of a redundant set of internal coordinates comprised Linear bend coordinates, on the other hand, may be of bond stretches, bends and torsions, approximation of essential when the optimized structures have linear or the full Hessian matrix along with the use of updating near-linear angles. Inclusion of standard bend angle coor- schemes to improve the approximation over the course dinates (and the corresponding dihedral angle coordinates)

1 3 Theor Chem Acc (2016) 135:84 Page 9 of 12 84 for an angle that has become near linear can be disastrous onto the individual infinitesimal translation t( i) and rotation for an optimization, even when the remaining coordinates (ri) vectors in the RIC set are sufficient to fully describe the molecule. 3 Whether or not angles and their corresponding dihedrals P = t tT + r rT can simply be omitted from the RIC definitions depends TR i i i i (15) i on how many atoms are bonded to the center atom in the linear angle and may be possible to do so with as few as where the portion of these vectors corresponding to the kth 3 bonded atoms. When the local structure around a central atom are given by atom is non-planar and there are at least 4 bonds, the linear t =e angle coordinates can always be safely omitted since the i,k i (16) local structure can be completely defined with bonds and ri,k =xk × ei nonlinear angles. In the standard method, linear bends were (17) handled by the approach outlined in Sect. 1 whenever an where × denotes the 3-dimensional cross product, ei is the angle exceeds 165° and the central atom is bonded to fewer unit vector in the ith direction, and xk are the 3-dimensional than 5 atoms, which is assumed to be sufficient to ensure Cartesian coordinates for atom k. These vectors are not non-planarity around the central atom. orthonormal with respect to each other, but for a nonlinear molecule, they fully span and are entirely contained within Transformation of gradient/Hessian the 6 instantaneous translation and rotation degrees of free- dom for the molecule defined by the coordinatesx . The Since the derivatives of the PES are computed in terms of generalized inverse may then be computed as the absolute Cartesian coordinates of the atomic centers −1 − T T in a molecule, at each step of the optimization the gradi- B = B B + PTR B (18) ent (and Hessian, if it is computed) must be transformed  into internal coordinates. This transformation involves the The primary benefit of computing the generalized inverse T −1 multiplication of the Cartesian gradient by the generalized this way is that B B + PTR allows the use of iterative inverse of Wilson’s B-Matrix, which is defined as the par- inversion methods for when the size of the molecule is too tial derivatives of the RIC with respect to a change in the large to invert the matrix directly and/or store the result. Cartesian coordinates: For the present work, however, the generalized inverse was computed using singular value decomposition (SVD). The ∂qi SVD of B is given by B = U VT, where U is a square, Bia = → δqi = Biaδxa ∂x (14) of size n , V is a unitary matrix of size n , a a ric crt and is a matrix of the appropriate dimensions that is zero where qi are the internal coordinates, and xa are the Carte- everywhere except for the firstn crt − 6 diagonal elements. sian coordinates. The generalized inverse may be computed as B is an nric ncart, where nric is the total number of × B− = V −UT (19) redundant internal coordinates, and ncrt is the total num- ber of Cartesian coordinates (3 times the number of where − is the transpose of , with the nonzero diago- − atomic centers). The generalized inverse (also known as nal elements replaced by their inverse (i.e., ii = 1/�ii the Moore-Penrose pseudo-inverse) is a generalization of if ii �= 0). Whether Eq. 18 or Eq. 19 are used, the same the concept of an inverse to non-square or singular matri- matrix should result. The primary benefit of using Eq. 19 ces such that only the non-singular part of the matrix is is that the locally non-redundant active space for the cur- inverted. Two approaches for computing the pseudo-inverse rent geometry (the nact = ncrt − 6 orthogonal linear com- of the B matrix, in particular, are through inversion of binations of the redundant internal coordinates in which B B-squared (BT B), and by singular value decomposition. is defined for a given geometry) is computed as a conse- For the first approach, it should be recognized thatB T B quence of doing the SVD. The active space is defined as the is a ncart × ncart singular matrix, and so care must be taken firstn act rows of U. when inverting it. So long as the internal coordinates cho- With B− defined, the typical approach for transformation sen to defineB are sufficient to fully describe the internal of the gradient and Hessian begins with an algebraic rear- motions of the molecule, the singular part of BT B corre- rangement of the definition of the gradient (using sponds to the infinitesimal rotations and translations of the x and q subscripts to differentiate between Cartesian and molecule. A projector onto this space can be added to BT B redundant internal coordinate , respectively) without effecting the end result of the pseudo-inverse. This g = BT g → g = B− T g projector PTR can be computed as a sum of the projector x q q x (20)  1 3 84 Page 10 of 12 Theor Chem Acc (2016) 135:84

A similar approach is used to transform the Hessian nearly/completely linear is θ123), the dummy atom is indi- cated by the subscript d, and n and m are used to indicate − T T ∂B − Hq = B Hx − g B (21) any atoms bonded to 1 and 3, respectively.  q ∂x    Projection into non‑redundant space Placement of the dummy atom

When using a redundant coordinate system, the total num- If θ123 is not completely linear, then the plane in which ber of coordinates being optimized exceeds the number of the angle bends may be defined byr 12 and r23, while if it internal degrees of freedom, and this must be accounted for is linear, the choice of bend direction is arbitrary. For con- when computing a step. Constraints or a penalty function sistency and convenience, the bend direction may be cho- approach could be used to compute a valid step in the full sen such that either φn12d or φd23m is 0 for some n or m. set of redundant coordinates, but here a projection method Once the plane is defined, the dummy atom is placed such is used. The active space described in the previous section that θ12d = θ32d and r2d is equal to a constant value. In the may be used to project the gradient gr = Uactgq and Hes- present implementation, this is set to 2 bohr since that is T sian Hr = UactHqUact into a reduced space. The step, qr, a fairly typical bond length, but the choice is more or less is computed in this space and then transformed back to the arbitrary. T full q0 = Uact qr. Definition of coordinate set to handle linear bends Back‑transformation of step With the dummy atom placed, the optimization is carried Due to the curvilinear relationship between the redundant out by modifying the coordinate definitions and constrain- internal and Cartesian coordinates, the back-transforma- ing three coordinates per linear bend to define the location tion has to be done iteratively. This is done by minimiz- of the dummy atom relative to the rest of the molecule. ing the difference between the current geometry q(x) The standard approach is used to construct the coordinate and a goal geometry, in redundant internal coordinates set beginning with all of the stretches corresponding to the qg = q0 + q0 , where q0 is the internal coordinate geom- bonded atoms, and angle bends between pairs of stretches etry at the current iteration, and q0 is the computed step. that share a common atom. The angle θ123 is redefined as This can be thought of as the minimization of the following the equivalent sum θ12d + θ32d. The dihedral angles are functional included for all pairs of angle bend coordinates (not includ- ing θ123) that share a bond, including all of the dihedrals 1 2 F(x + x) = ( q) φn12d and φd23m, and the improper dihedral φ12d3. 2 (22) The constrained coordinates always include r2d , 1 2 = q − q(x + x) θ12d − θ32d, and a dihedral, but the choice of dihedral 2 g  depends on the structure. For the vast majority of cases, the with respect to x, starting with x = x0 where x0 is Car- choice of any φn12d or φd23m is appropriate, with special tesian geometry at the current step. This gives the typical care taken when there are multiple adjacent linear bends iterate for x (i.e., 4 or more bonded, colinear atoms) to ensure that one − dihedral for each dummy atom is included. If the molecule xi+1 = xi + Bi qi (23) contains only 3 atoms, the improper dihedral φ12d3 can be This process continues until the RMS of xi = xi+1 − xi is constrained as the bend direction is completely arbitrary for <10−6 bohr. a potential that only depends on the internal coordinates.

Linear bends Coordinate transformation/back‑transformation

Definition of terms Let Bopt be the B-matrix for the redundant internal coordi- nate set defined in the previous section, andB constr be the There are various ways to handle linear bends. In the pre- B-matrix for the set of coordinates that will be used to con- sent work, a dummy atom is used to define the two degrees strain the motion of the dummy atom. The transformation of freedom associated with a linear bend. The Cartesian is carried out using an otherwise identical approach to the coordinates, bond lengths, bend angle, and torsion angle standard unconstrained transformation, using a projected − are denoted by xi, rij > 0, 0 ≤ θijk ≤ π and −π ≤ φijkl ≤ π B-matrix Bprj = Bopt − BoptBconstrBconstr, and assum- respectively. The atoms involved in the linear angle are ing that the gradient and Hessian elements corresponding indicated by numbers 1 through 3 (i.e., the angle which is to the dummy atoms are all 0. Bprj has the correct number

1 3 Theor Chem Acc (2016) 135:84 Page 11 of 12 84 of non-singular values (3 ∗ n ) for the original sys- T atoms m = qRFO,m g (28) tem. If analytical Hessians are used, the B-matrix deriva-  tives are similarly projected: for internal coordinate i, For minimizations, the most negative eigenvalue of the − − ∂ Bprj i = I − BcnsBcns ∂ Bopt i I − BcnsBcns . augmented Hessian is used. For transition states, even Once a step is computed, the  back-transformation is though the second most negative eigenvalue results in a done in an unconstrained fashion, with the goal values for shifted Hessian with a single negative eigenvalue, the parti- the constrained coordinates set according to their current tioned RFO (pRFO [26]) approach tends to result in a more values. A second iterative back-transformation is used, if stable optimization behavior. The pRFO approach involves necessary, to reimpose the dihedral constraint since there searching for a minimum in n − 1 directions (usually may be some redundancy between the dihedral constraint selected to be eigenvectors of the Hessian), and a maxi- chosen and dihedral coordinates used in the optimization. mum in the remaining direction. This is done by combining During the second iterative back-transformation, only the an RFO minimization step in the n − 1 space, and an RFO dummy atom is allowed to move. maximization step in the remaining space. The RFO method provides a means to compute a step Step size and direction control in the correct direction for an optimization even when the Hessian has the incorrect number of negative eigenval- RFO ues (0 for minima, 1 for transition states). Often times, however, when the geometry is far from the solution, The rational function optimization [26] (RFO) method and especially when the Hessian data are approximate/ begins by approximating the PES as a rational function: updated, the computed step size may be unreasonably large and needs to be scaled back to avoid over-shooting ERF (q0 + q) the solution. For minimizations, an adaptive step size T 1 T q g0 + q H0 q (24) approach is frequently used and will be included in the = E + 2 0 1 + qT S q standard method (see Ref. [27] for a more indepth dis- cussion of adaptive step size methods). The adaptive with the matrix S taken to be the for con- step size, or trust region, is initialized to a modest value venience. Equation 24 has the property that, for a nonzero (0.5 bohr or rad for the present work), and at each step gradient, there exactly n + 1 stationary points on the sur- of the optimization, the predicted quadratic change in T T face, where n is the number of coordinates defining the energy ( Equad = q g + 0.5 q H q) is compared to surface, with each having a different cur- the actual change in the energy. When the ratio of Equad vature (determined by the number of negative eigenval- and the actual change in energy is >75 % and the length ues in the second of ERF). By stepping toward of the step is at least 80 % of the current trust region, the the stationary point on the rational function surface with trust region is doubled. If the ratio of Equad and the actual the desired curvature, an optimization may be directed to change in energy is <25 %, the trust region is reduced to a a particular solution even when the number of negative quarter of the length of the previous step. eigenvalues in H0 is incorrect. The stationary condition (the value of q where ∂ERF/∂ q is zero) for all n + 1 Hessian update stationary points is given by diagonalizing the augmented Hessian To facilitate the discussion of different Hessian update for- mulas, the following standard notations for the displace- Hg Haug = (25) ment s = q, change in gradient y = gq, and quadratic gT 0  error z = y − Hs is used. The step may be computed using either the mth eigenvector yyT sHHsT (vm) or eigenvalue ( m) of the augmented Hessian. H = − (29) BFGS yT s sHs

vm,1, vm,2, ..., vm,n T qRFO,m = (26) zz vm n+  H = (30) , 1 SR1 zT s

−1 qRFO,m =−(H − mI) g (27) T T T sz + zs T ss HPSB = − s z (31) Both approaches result in the same displacement, and the sT s sT s 2 resulting step satisfies the following relationship 

1 3 84 Page 12 of 12 Theor Chem Acc (2016) 135:84

HMSP = φMSP HSR1 3. Pulay P, Fogarasi G (1992) J Chem Phys 96(4):2856 + (1 − φ ) H (32) 4. Hestenes MR, Stiefel E (1952) Res Nat Bur Stand 49(6):409 MSP PSB 5. Nocedal J (1980) Math Comput 35(151):773 6. Chill ST, Stevenson J, Ruehle V, Shang C, Xiao P, Farrell JD, Wales DJ, Henkelman G (2014) J Chem Theory Comput sT z 2 10(12):5476 φ = (33) MSP T T 7. Schlegel HB (1984) Theor Chim Acta 66(5):333 z z s s 8. Broyden CG (1970) Math Comput 24(110):365   9. Fletcher R (1970) Comput J 13(3):317 When attempting to locate a minimum energy structure, 10. Goldfarb D (1970) Math Comput 24(109):23 the BFGS update (Eq. 29) is widely considered to be the 11. Shanno DF (1970) Math Comput 24(111):647 gold standard as it is generally quite stable, so long as 12. Powell M (1971) IMA J Appl Math 7(1):21 T 13. Murtagh BA (1970) Comput J 13(2):185 s y is positive and the current approximation to the Hes- 14. Bofill JM (1994) J Comput Chem 15(1):1 sian is positive definite. For transition-state optimiza- 15. Császár P, Pulay P (1984) J Mol Struct 114:31 tions, which require the Hessian to be indefinite with one 16. Peng C, Ayala PY, Schlegel HB, Frisch MJ (1996) J Comput negative eigenvalue, there is not such a clear cut winner. Chem 17(1):49 17. Baker J, Chan F (1996) J Comput Chem 17(7):888 The SR1 method (Eq. 30, also referred to as the MS or 18. Greenstadt J (1970) Math Comput 24(109):1 Murtagh–Sargeant update) tends to produce the most rea- 19. Löwdin PO (1950) J Chem Phys 18(3):365 sonable updates, particularly when the change in the Hes- 20. Wolfram Research, Inc (2013) Mathematica, Version 9.0. Cham- sian is large (for example, when the approximation is poor paign, IL 21. Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, or the change in the gradient is large), but it can suffer from Cheeseman JR, Scalmani G, Barone V, Mennucci B, Petersson T T stability issues when s z is small relative to z z. The PSB GA, Nakatsuji H, Caricato M, Li X, Hratchian HP, Izmaylov AF, method (Eq. 31), on the other hand, is very stable, but the Bloino J, Zheng G, Sonnenberg JL, Hada M, Ehara M, Toyota K, updates tend to be significantly poorer as they increase in Fukuda R, Hasegawa J, Ishida M, Nakajima T, Honda Y, Kitao O, Nakai H, Vreven T, Montgomery JA Jr, Peralta JE, Ogliaro magnitude. A combination of these two updates, the MSP F, Bearpark M, Heyd JJ, Brothers E, Kudin KN, Staroverov method (Eq. 32), was proposed by Bofill [14]. This com- VN, Kobayashi R, Normand J , Raghavachari K, Rendell A, bined update attempts to leverage the pros and cons of the Burant JC, Iyengar SS, Tomasi J, Cossi M, Rega N, Millam JM, SR1 and PSB methods by blending them together accord- Klene M, Knox JE, Cross JB, Bakken V, Adamo C, Jaramillo J, Gomperts R, Stratmann RE, Yazyev O, Austin AJ, Cammi R, ing to the overlap of the quadratic error and the displace- Pomelli C, Ochterski JW, Martin RL, Morokuma K, Zakrzewski ment (Eq. 33). VG, Voth GA, Salvador P, Dannenberg JJ, Dapprich S, Daniels AD, Farkas, Foresman JB, Ortiz JV, Cioslowski J, Fox DJ (2009) Convergence Gaussian 09 revision d.01. Gaussian Inc. Wallingford CT 22. Vosko S, Wilk L, Nusair M (1980) Can J Phys 58(8):1200 23. Lee C, Yang W, Parr RG (1988) Phys Rev B 37(2):785 An optimization is considered converged when the RMS 24. Becke AD (1993) J Chem Phys 98(7):5648 change in q is less than 1.2 × 10−3 bohr or radians, the 25. Stephens P, Devlin F, Chabalowski C, Frisch MJ (1994) J Phys maximum absolute element in q is less than 1.8 × 10−3 Chem 98(45):11623 −4 26. Banerjee A, Adams N, Simons J, Shepard R (1985) J Phys Chem bohr or radians, the RMS of gr is less than 1.5 × 10 har- 89(1):52 tree/(bohr or rad), and the maximum absolute element in gr 27. Dennis JE Jr, Schnabel RB (1983) Numerical methods for is less than 4.5 × 10−4 hartree/(bohr or rad). These conver- unconstrained optimization and nonlinear equations. Prentice- gence criteria are equivalent to those used by Gaussian 09 Hall, Englewood Cliffs, NJ [21].

References

1. Schlegel HB (2011) WIREs Comput Mol Sci 1(5):790 2. Fogarasi G, Zhou X, Taylor PW, Pulay P (1992) J Am Chem Soc 114(21):8191

1 3