EFFICIENT IMPLEMENTATION OF THE TRUNCATED-NEWTON
ALGORITHM FOR LARGE-SCALE CHEMISTRY APPLICATIONS
DEXUAN XIE AND TAMAR SCHLICK
Abstract. To efficiently implement the truncated-Newton (TN) optimization method for large-scale, highly nonlinear functions in chemistry, an unconventional modified Cholesky (UMC) factorization is proposed to avoid large modifications to a problem-derived preconditioner, used in the inner loop in approximating the TN search vector at each step. The main motivation is to reduce the computational time of the overall method: large changes in standard modified Cholesky factorizations are found to increase the number of total iterations, as well as computational time, significantly. Since the UMC may generate an indefinite, rather than a positive definite, effective preconditioner, we prove that directions of descent still result. Hence, convergence to a local minimum can be shown, as in classic TN methods, for our UMC-based algorithm. Our incorporation of the UMC also requires changes in the TN inner loop regarding the negative-curvature test (which we replace by a descent direction test) and the choice of exit directions. Numerical experiments demonstrate that the unconventional use of an indefinite preconditioner works much better than the minimizer without preconditioning or other minimizers available in the molecular mechanics and dynamics package CHARMM. Good performance of the resulting TN method for large potential energy problems is also shown with respect to the limited-memory BFGS method, tested both with and without preconditioning.
Key words. truncated-Newton method, indefinite preconditioner, molecular potential minimization, descent direction, modified Cholesky factorization, unconventional modified Cholesky factorization

AMS subject classifications. 65K10, 92E10
1. Introduction. Optimization of highly nonlinear objective functions is an important task in biomolecular simulations. In these chemical applications, the energy of a large molecular system, such as a protein or a nucleic acid, often surrounded by water molecules, must be minimized to find a favorable configuration of the atoms in space. Finding this geometry is a prerequisite to further studies with molecular dynamics simulations or global optimization procedures, for example. An important feature of the potential energy function is its ill conditioning; function evaluations are also expensive, and the Hessian is typically dense. Moreover, a minimum-energy configuration corresponds to a fairly accurate local optimum. Since thousands of atoms are involved as independent variables and, often, the starting coordinates may be far away from a local minimum, this optimization task is formidable and is attracting an increasing number of numerical analysts to this quest, especially for global optimization (see [17, 18], for example).
The practical requirements that chemists and biophysicists face are somewhat different from those of the typical numerical analyst who develops a new algorithm. Computational chemists seek reliable algorithms that produce answers quickly, with as little tinkering with parameters and options as possible. Thus, theoretical performance is not as important as practical behavior, and CPU time is of the utmost importance. A prominent example is the current preference in the biomolecular community for
Received by the editors September 9, 1997; accepted for publication (in revised form) July 3, 1998; published in SIAM J. Optim., Vol. 9, 1999. This work was supported in part by the National Science Foundation through award number ASC-9318159. T. Schlick is an investigator of the Howard Hughes Medical Institute.

Departments of Chemistry, Mathematics, and Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012 ([email protected], [email protected]). Phone: (212) 998-3596; fax: (212) 995-4152.
Ewald summation techniques for periodic systems over fast-multipole approaches for evaluating the long-range forces in molecular simulations; the latter have smaller complexity in theory (O(n), where n is the system size, rather than the O(n log n) associated with Ewald), but the Ewald procedure is easy to program and very fast in practice for a range of molecular sizes.
The present paper focuses on implementation details of a truncated-Newton (TN) method that are important in practice for performance efficiency in large-scale potential energy minimization problems. The algorithmic variations we discuss are motivated by optimization theory but depart from standard notions (e.g., of a positive-definite preconditioner) for the sake of efficiency. Algorithmic stability and convergence properties are still retained in theory as in the traditional approach, but performance in practice is enhanced by the proposed modifications.
Our interest in such chemistry applications first led to the development of a TN method adapted to potential-energy functions [25]. Our TN package, TNPACK [23, 24], was then adapted [5] for the widely used molecular mechanics and dynamics program CHARMM [1].
In TN methods, the classic Newton equation at step k,

(1)    H(X^k) P^k = -g(X^k),

where g and H are the gradient and Hessian, respectively, of the objective function E at X^k, is solved iteratively and approximately for the search vector P^k [4]. The linear conjugate gradient (CG) method is a suitable choice for this solution process for large-scale problems, and preconditioning is necessary to accelerate convergence.
A main ingredient of TNPACK is the use of an application-tailored preconditioner M_k. This matrix is a sparse approximation to H_k ≡ H(X^k), formulated at each outer minimization step k. The preconditioner in chemical applications is constructed naturally from the local chemical interactions: bond length, bond angle, and torsional potentials [25]. These terms often contain the elements of largest magnitude and lead to a sparse matrix structure which remains constant in topology throughout the minimization process [5]. Since M_k may not be positive-definite, our initial implementation applied the modified Cholesky (MC) factorization of Gill and Murray [7] to solve the linear system M_k z = r at each step of PCG (preconditioned CG). Thus, an effective positive-definite preconditioner, M̃_k, results.
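To make the modification idea concrete, the following is a minimal Python sketch of a pivot-clamping LDL^T factorization in the spirit of MC schemes. It is a simplification for illustration only: it merely forces each pivot to be at least a threshold delta, omitting the off-diagonal growth bounds of the full Gill-Murray method, and the function name and tolerance are our own.

```python
import numpy as np

def modified_ldlt(A, delta=1e-8):
    """Factor A + E = L D L^T with every pivot d[j] >= delta, so the
    effective matrix L D L^T is positive definite even when A is not.
    Bare-bones illustration of pivot clamping; the Gill-Murray MC method
    additionally bounds off-diagonal growth, which is omitted here."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # pivot from the standard LDL^T recurrence, then clamped below
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        d[j] = max(d[j], delta)
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d
```

With such a factorization in hand, each PCG preconditioning step M_k z = r reduces to a forward substitution with L, a diagonal scaling by d, and a back substitution with L^T.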
Why is a TN scheme a competitive approach? First, analytic second-derivative information is available in most molecular modeling packages and should be used to improve minimization performance. That is, curvature information can guide the search better toward low-energy regions. Second, the basic idea of not solving the Newton equations exactly for the search vector when far away from a minimum region saves unnecessary work and accelerates the path toward a solution. Third, the iterative TN scheme can be tailored to the application in many ways: handling of the truncated inner loop, application of a preconditioner, incorporating desired accuracy, and so on. These implementation details are crucial to realized performance in practice.
In our previous studies, we have discussed alternative minimization approaches to TN [5, 23, 24, 25]. We showed that modified Newton methods are computationally too expensive to be feasible for large systems [23, 24, 25] since the large Hessian of the potential energy function is dense and highly indefinite. Nonlinear CG methods can take excessively long times to reach a solution [5]; this is not only because of the known properties of these methods but also due to the expense of evaluating
the objective function at each step, a cost that dominates the CPU time [15]. A competitive approach to TN, however, is the limited-memory BFGS algorithm (LM-BFGS) [11], which also uses curvature information to guide the search. A study by Nash and Nocedal [15] comparing the performance of a discrete TN method^1 to LM-BFGS found both schemes to be effective for large-scale nonlinear problems. They suggested that the former performs better for nearly quadratic functions and also may perform poorly on problems associated with ill-conditioned Hessians. However, as Nash and Nocedal point out, since TN almost always requires fewer iterations than LM-BFGS, TN would be more competitive if the work performed in the inner loop were reduced as much as possible. This is the subject of this article.
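The discrete TN variant mentioned above replaces each Hessian-vector product with a finite difference of gradients. A one-line sketch (the helper name and stepsize h are illustrative, not from the cited work):

```python
import numpy as np

def hessian_vector_fd(grad, x, v, h=1e-6):
    """Approximate the Hessian-vector product H(x) v by a forward
    difference of gradients, as in discrete truncated-Newton methods."""
    return (grad(x + h * v) - grad(x)) / h
```

This avoids forming or storing the Hessian at the cost of one extra gradient evaluation per inner CG iteration.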
Our experiences to date in chemical applications for medium-size problems suggest that the CPU time of the TN approach can be smaller than that of LM-BFGS since the total number of function evaluations is reduced. Examples shown in the present work, for larger problems as well, reinforce this. Surely, both methods can be efficient tools for large-scale optimization, and superiority of one scheme over another cannot be claimed.
In this paper, we focus on an important aspect of the TN method that affects its performance profoundly: the formulation and handling of the preconditioner in the inner PCG loop that is used to approximate the search vector at each step of the method. The use of a standard modified Cholesky factorization applied to the physically constructed preconditioner leads to excessively large modifications, which in turn means many function evaluations and thus a large total CPU time for the minimization method. Pivoting strategies [6, 8] can reduce the size of the modifications but not necessarily the problem condition number, and thus are not a clear solution. The problem we address here is thus a general one, associated with other modified Cholesky factorization methods [3, 6, 7, 8, 26]: how to handle large modifications to matrices that are far from positive definite. However, we address this problem only in the TN minimization context, where the solution of such a linear system is not as important as progress in the overall minimization method.
In chemistry problems, a large negative eigenvalue often corresponds to a transition or saddle point. We argue that in our special context (large-scale computational chemistry problems and TN), a standard MC factorization is inappropriate. Rather, it is sufficient to require only that the preconditioner be nonsingular, and often positive-definite near a minimum point. This leads to the development of our simple unconventional modified Cholesky (UMC) factorization.
We present details of the resulting TN algorithm along with many practical examples that illustrate how the use of an indefinite preconditioner outperforms other variants (e.g., no preconditioning, positive-definite preconditioning) in the TN framework. We detail analysis showing that the directions produced are still descent directions, and thus that the global convergence of the method to a local minimum can be proven in the same way as for the "classic" TN scheme [4]. We also offer comparisons with LM-BFGS that suggest the better performance of TN for large potential energy problems.
The remainder of the paper is organized as follows. In the next section, we summarize the structure of a general descent method and describe the new PCG inner loop we develop for the TN method. In section 3 we present the UMC designed for our applications. In section 4, we present numerical experiments that demonstrate the overall performance of the modified TNPACK minimizer, along with a comparison to
^1 The discrete TN method computes Hessian-vector products by finite differences of gradients.
LM-BFGS and other minimizers available in CHARMM (a nonlinear CG and a Newton method). Conclusions are summarized in section 5. For completeness, analyses for the PCG inner loop of TN are presented in Appendix A, and the full algorithm of the TN method is described in Appendix B. The modified package is also described in [28].
2. Descent methods and the truncated Newton approach. We assume that a real-valued function E(X) is twice continuously differentiable in an open set D of the n-dimensional vector space R^n. Descent methods for finding a local minimum of E from a given starting point generate a sequence {X^k} in the form

(2)    X^{k+1} = X^k + λ_k P^k,

where the search direction P^k satisfies

(3)    g(X^k)^T P^k < 0.

The superscript T denotes a vector or matrix transpose. Equation (3) defines the descent direction P^k, which yields function reduction. The steplength λ_k in (2) is chosen to guarantee sufficient decrease, e.g., such that [12]

(4)    E(X^k + λ_k P^k) ≤ E(X^k) + α λ_k g(X^k)^T P^k

and

(5)    |g(X^k + λ_k P^k)^T P^k| ≤ β |g(X^k)^T P^k|,

where α and β are given constants satisfying 0 < α < β < 1.

Condition (5) is referred to as the strong Wolfe condition. A steplength λ_k satisfying (5) must satisfy the usual Wolfe condition:

g(X^k + λ_k P^k)^T P^k ≥ β g(X^k)^T P^k.
According to the line search algorithm of Moré and Thuente [12] used in TNPACK, such a steplength λ_k is guaranteed to be found in a finite number of iterations. Hence, according to the basic theory of descent methods [10], a descent method defined in the form (2) guarantees that

lim_{k→∞} g(X^k) = 0.

The challenge in developing an efficient descent method is balancing the cost of constructing a descent direction P^k with the performance realized in practice.
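Conditions (4) and (5) are simple to check for a candidate steplength. The Python sketch below verifies both on a toy quadratic; the naive halving search is for illustration only (TNPACK uses the more sophisticated Moré-Thuente algorithm), and the parameter values are conventional choices, not those of the package.

```python
import numpy as np

def satisfies_strong_wolfe(E, g, x, p, lam, alpha=1e-4, beta=0.9):
    """Check sufficient decrease (4) and the strong Wolfe condition (5)
    for a candidate steplength lam along a descent direction p."""
    gTp = g(x) @ p
    sufficient_decrease = E(x + lam * p) <= E(x) + alpha * lam * gTp
    strong_curvature = abs(g(x + lam * p) @ p) <= beta * abs(gTp)
    return sufficient_decrease and strong_curvature

# Toy quadratic E(X) = 0.5 X^T A X, with steepest-descent direction p
A = np.array([[3.0, 1.0], [1.0, 2.0]])
E = lambda x: 0.5 * x @ A @ x
g = lambda x: A @ x
x = np.array([1.0, 1.0])
p = -g(x)

# naive halving search until both Wolfe conditions hold
lam = 1.0
while not satisfies_strong_wolfe(E, g, x, p, lam):
    lam *= 0.5
```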
To reduce the work cost of the classic modified Newton method and develop a globally convergent descent algorithm, Dembo and Steihaug proposed a clever variation known as the truncated-Newton method [4]. Since then, several variants have been developed and applied in various contexts; see, e.g., [13, 14, 15, 23, 25, 29]. The linear PCG framework is the most convenient generator of descent directions in the inner TN loop due to its efficiency and economical storage requirements for solving large positive-definite linear systems. Since the PCG method may fail at some step when the matrix H_k is indefinite, a termination strategy is required to guarantee that the resulting search directions are still descent directions. In addition, the PCG inner loop of the TN method can be made more effective by employing a truncation test.
We present our PCG inner loop of the TN scheme in Algorithm 1. The changes with respect to a "standard" PCG inner loop include allowing an indefinite preconditioner (the UMC factorization discussed in the next section) and a new descent direction test.
Algorithm 1 (PCG inner loop k of TN for solving H_k p = -g_k with a given preconditioner M_k).
Let p_j represent the jth PCG iterate, d_j the direction vector, and r_j the residual vector satisfying r_j = -g_k - H_k p_j. Let IT_PCG denote the maximum number of allowable PCG iterations at each inner loop.
Set p_1 = 0, r_1 = -g_k, and d_1 = z_1, where z_1 solves a system related to M_k z_1 = r_1.
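For orientation, here is a minimal unpreconditioned truncated-CG inner loop in Python. It uses the classical negative-curvature exit of Dembo and Steihaug rather than the descent direction test introduced in this paper, and the truncation tolerance eta and function name are illustrative assumptions, not part of Algorithm 1.

```python
import numpy as np

def truncated_cg(H, g, max_iter=50, eta=0.5):
    """Minimal unpreconditioned truncated-CG loop for H p = -g.
    Exits with the steepest-descent direction if negative curvature is
    detected on the first iteration (the classical Dembo-Steihaug rule),
    and truncates once the residual norm drops below eta * ||g||."""
    p = np.zeros_like(g, dtype=float)
    r = -g.astype(float)           # residual of H p = -g at p = 0
    d = r.copy()
    for j in range(max_iter):
        Hd = H @ d
        curvature = d @ Hd
        if curvature <= 0.0:
            # negative curvature: return -g on the first iteration,
            # otherwise the progress made so far
            return r if j == 0 else p
        step = (r @ r) / curvature
        p = p + step * d
        r_new = r - step * Hd
        if np.linalg.norm(r_new) <= eta * np.linalg.norm(g):
            return p               # truncation test satisfied
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return p
```

The returned vector is always a descent direction for a symmetric H: either an approximate Newton direction or, on an immediate negative-curvature exit, the steepest-descent direction -g.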