EFFICIENT IMPLEMENTATION OF THE TRUNCATED-NEWTON
ALGORITHM FOR LARGE-SCALE CHEMISTRY APPLICATIONS
DEXUAN XIE AND TAMAR SCHLICK
Abstract. To efficiently implement the truncated-Newton (TN) optimization method for large-scale, highly nonlinear functions in chemistry, an unconventional modified Cholesky (UMC) factorization is proposed to avoid large modifications to a problem-derived preconditioner, used in the inner loop in approximating the TN search vector at each step. The main motivation is to reduce the computational time of the overall method: large changes in standard modified Cholesky factorizations are found to increase the number of total iterations, as well as computational time, significantly. Since the UMC may generate an indefinite, rather than a positive definite, effective preconditioner, we prove that directions of descent still result. Hence, convergence to a local minimum can be shown, as in classic TN methods, for our UMC-based algorithm. Our incorporation of the UMC also requires changes in the TN inner loop regarding the negative-curvature test (which we replace by a descent direction test) and the choice of exit directions. Numerical experiments demonstrate that the unconventional use of an indefinite preconditioner works much better than the minimizer without preconditioning or other minimizers available in the molecular mechanics and dynamics package CHARMM. Good performance of the resulting TN method for large potential energy problems is also shown with respect to the limited-memory BFGS method, tested both with and without preconditioning.
Key words. truncated-Newton method, indefinite preconditioner, molecular potential minimization, descent direction, modified Cholesky factorization, unconventional modified Cholesky factorization

AMS subject classifications. 65K10, 92E10
1. Introduction. Optimization of highly nonlinear objective functions is an important task in biomolecular simulations. In these chemical applications, the energy of a large molecular system, such as a protein or a nucleic acid, often surrounded by water molecules, must be minimized to find a favorable configuration of the atoms in space. Finding this geometry is a prerequisite to further studies with molecular dynamics simulations or global optimization procedures, for example. An important feature of the potential energy function is its ill conditioning; function evaluations are also expensive, and the Hessian is typically dense. Moreover, a minimum-energy configuration corresponds to a fairly accurate local optimum. Since thousands of atoms are involved as independent variables and, often, the starting coordinates may be far away from a local minimum, this optimization task is formidable and is attracting an increasing number of numerical analysts to this quest, especially for global optimization (see [17, 18], for example).
The practical requirements that chemists and biophysicists face are somewhat different from those of the typical numerical analyst who develops a new algorithm. Computational chemists seek reliable algorithms that produce answers quickly, with as little tinkering with parameters and options as possible. Thus, theoretical performance is not as important as practical behavior, and CPU time is of the utmost importance. A prominent example is the current preference in the biomolecular community for
Received by the editors September 9, 1997; accepted for publication (in revised form) July 3, 1998; published in SIAM J. Optim., Vol. 9, 1999. This work was supported in part by the National Science Foundation through award number ASC-9318159. T. Schlick is an investigator of the Howard Hughes Medical Institute.

Departments of Chemistry, Mathematics, and Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012 ([email protected], [email protected]). Phone: (212) 998-3596; fax: (212) 995-4152.
Ewald summation techniques for periodic systems over fast-multipole approaches for evaluating the long-range forces in molecular simulations; the latter have smaller complexity in theory (O(n), where n is the system size, rather than the O(n log n) associated with Ewald), but the Ewald procedure is easy to program and very fast in practice for a range of molecular sizes.
The present paper focuses on implementation details of a truncated-Newton (TN) method that are important in practice for performance efficiency in large-scale potential energy minimization problems. The algorithmic variations we discuss are motivated by optimization theory but depart from standard notions (e.g., of a positive-definite preconditioner) for the sake of efficiency. Algorithmic stability and convergence properties are still retained in theory as in the traditional approach, but performance in practice is enhanced by the proposed modifications.
Our interest in such chemistry applications first led to the development of a TN method adapted to potential-energy functions [25]. Our TN package, TNPACK [23, 24], was then adapted [5] for the widely used molecular mechanics and dynamics program CHARMM [1].
In TN methods, the classic Newton equation at step k,

(1)    H(X^k) P^k = -g(X^k),

where g and H are the gradient and Hessian, respectively, of the objective function E at X^k, is solved iteratively and approximately for the search vector P^k [4]. The linear conjugate gradient (CG) method is a suitable choice for this solution process for large-scale problems, and preconditioning is necessary to accelerate convergence.
A main ingredient of TNPACK is the use of an application-tailored preconditioner M_k. This matrix is a sparse approximation to H_k ≡ H(X^k), formulated at each outer minimization step k. The preconditioner in chemical applications is constructed naturally from the local chemical interactions: bond length, bond angle, and torsional potentials [25]. These terms often contain the elements of largest magnitude and lead to a sparse matrix structure which remains constant in topology throughout the minimization process [5]. Since M_k may not be positive-definite, our initial implementation applied the modified Cholesky (MC) factorization of Gill and Murray [7] to solve the linear system M_k z = r at each step of PCG (preconditioned CG). Thus, an effective positive-definite preconditioner, M̃_k, results.
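To make the modification idea concrete, the following is a minimal Python sketch of a pivot-clamping LDL^T factorization in the spirit of MC schemes. It is a simplification for illustration only: it merely forces each pivot to be at least a threshold delta, omitting the off-diagonal growth bounds of the full Gill-Murray method, and the function name and tolerance are our own.

```python
import numpy as np

def modified_ldlt(A, delta=1e-8):
    """Factor A + E = L D L^T with every pivot d[j] >= delta, so the
    effective matrix L D L^T is positive definite even when A is not.
    Bare-bones illustration of pivot clamping; the Gill-Murray MC method
    additionally bounds off-diagonal growth, which is omitted here."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # pivot from the standard LDL^T recurrence, then clamped below
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        d[j] = max(d[j], delta)
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d
```

With such a factorization in hand, each PCG preconditioning step M_k z = r reduces to a forward substitution with L, a diagonal scaling by d, and a back substitution with L^T.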
Why is a TN scheme a competitive approach? First, analytic second-derivative information is available in most molecular modeling packages and should be used to improve minimization performance. That is, curvature information can guide the search better toward low-energy regions. Second, the basic idea of not solving the Newton equations exactly for the search vector when far away from a minimum region saves unnecessary work and accelerates the path toward a solution. Third, the iterative TN scheme can be tailored to the application in many ways: handling of the truncated inner loop, application of a preconditioner, incorporating desired accuracy, and so on. These implementation details are crucial to realized performance in practice.
In our previous studies, we have discussed alternative minimization approaches to TN [5, 23, 24, 25]. We showed that modified Newton methods are computationally too expensive to be feasible for large systems [23, 24, 25] since the large Hessian of the potential energy function is dense and highly indefinite. Nonlinear CG methods can take excessively long times to reach a solution [5]; this is not only because of the known properties of these methods but also due to the expense of evaluating
the objective function at each step, a cost that dominates the CPU time [15]. A competitive approach to TN, however, is the limited-memory BFGS algorithm (LM-BFGS) [11], which also uses curvature information to guide the search. A study by Nash and Nocedal [15] comparing the performance of a discrete TN method^1 to LM-BFGS found both schemes to be effective for large-scale nonlinear problems. They suggested that the former performs better for nearly quadratic functions and also may perform poorly on problems associated with ill-conditioned Hessians. However, as Nash and Nocedal point out, since TN almost always requires fewer iterations than LM-BFGS, TN would be more competitive if the work performed in the inner loop were reduced as much as possible. This is the subject of this article.
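The discrete TN variant mentioned above replaces each Hessian-vector product with a finite difference of gradients. A one-line sketch (the helper name and stepsize h are illustrative, not from the cited work):

```python
import numpy as np

def hessian_vector_fd(grad, x, v, h=1e-6):
    """Approximate the Hessian-vector product H(x) v by a forward
    difference of gradients, as in discrete truncated-Newton methods."""
    return (grad(x + h * v) - grad(x)) / h
```

This avoids forming or storing the Hessian at the cost of one extra gradient evaluation per inner CG iteration.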
Our experiences to date in chemical applications for medium-size problems suggest that the CPU time of the TN approach can be smaller than that of LM-BFGS since the total number of function evaluations is reduced. Examples shown in the present work, for larger problems as well, reinforce this. Surely, both methods can be efficient tools for large-scale optimization, and superiority of one scheme over another cannot be claimed.
In this paper, we focus on an important aspect of the TN method that affects its performance profoundly: the formulation and handling of the preconditioner in the inner PCG loop that is used to approximate the search vector at each step of the method. The use of a standard modified Cholesky factorization applied to the physically constructed preconditioner leads to excessively large modifications, which in turn means many function evaluations and thus a large total CPU time for the minimization method. Pivoting strategies [6, 8] can reduce the size of the modifications but not necessarily the problem condition number, and thus are not a clear solution. The problem we address here is thus a general one, associated with other modified Cholesky factorization methods [3, 6, 7, 8, 26]: how to handle large modifications to matrices that are far from positive definite. However, we address this problem only in the TN minimization context, where the solution of such a linear system is not as important as progress in the overall minimization method.
In chemistry problems, a large negative eigenvalue often corresponds to a transition or saddle point. We argue that in our special context (large-scale computational chemistry problems and TN), a standard MC factorization is inappropriate. Rather, it is sufficient to require only that the preconditioner be nonsingular, and often positive-definite near a minimum point. This leads to the development of our simple unconventional modified Cholesky (UMC) factorization.
We present details of the resulting TN algorithm along with many practical examples that illustrate how the use of an indefinite preconditioner outperforms other variants (e.g., no preconditioning, positive-definite preconditioning) in the TN framework. We detail analysis showing that the directions produced are still descent directions, and thus that the global convergence of the method to a local minimum can be proven in the same way as for the "classic" TN scheme [4]. We also offer comparisons with LM-BFGS that suggest the better performance of TN for large potential energy problems.
The remainder of the paper is organized as follows. In the next section, we summarize the structure of a general descent method and describe the new PCG inner loop we develop for the TN method. In section 3 we present the UMC designed for our applications. In section 4, we present numerical experiments that demonstrate the overall performance of the modified TNPACK minimizer, along with a comparison to
^1 The discrete TN method computes Hessian-vector products by finite differences of gradients.
LM-BFGS and other minimizers available in CHARMM (a nonlinear CG and a Newton method). Conclusions are summarized in section 5. For completeness, analyses for the PCG inner loop of TN are presented in Appendix A, and the full algorithm of the TN method is described in Appendix B. The modified package is also described in [28].
2. Descent methods and the truncated Newton approach. We assume that a real-valued function E(X) is twice continuously differentiable in an open set D of the n-dimensional vector space R^n. Descent methods for finding a local minimum of E from a given starting point generate a sequence {X^k} in the form

(2)    X^{k+1} = X^k + λ_k P^k,

where the search direction P^k satisfies

(3)    g(X^k)^T P^k < 0.

The superscript T denotes a vector or matrix transpose. Equation (3) defines the descent direction P^k, which yields function reduction. The steplength λ_k in (2) is chosen to guarantee sufficient decrease, e.g., such that [12]

(4)    E(X^k + λ_k P^k) ≤ E(X^k) + α λ_k g(X^k)^T P^k

and

(5)    |g(X^k + λ_k P^k)^T P^k| ≤ β |g(X^k)^T P^k|,

where α and β are given constants satisfying 0 < α < β < 1.

Condition (5) is referred to as the strong Wolfe condition. A steplength λ_k satisfying (5) must satisfy the usual Wolfe condition:

g(X^k + λ_k P^k)^T P^k ≥ β g(X^k)^T P^k.
According to the line search algorithm of Moré and Thuente [12] used in TNPACK, such a steplength λ_k is guaranteed to be found in a finite number of iterations. Hence, according to the basic theory of descent methods [10], a descent method defined in the form (2) guarantees that

lim_{k→∞} g(X^k) = 0.

The challenge in developing an efficient descent method is balancing the cost of constructing a descent direction P^k with the performance realized in practice.
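Conditions (4) and (5) are simple to check for a candidate steplength. The Python sketch below verifies both on a toy quadratic; the naive halving search is for illustration only (TNPACK uses the more sophisticated Moré-Thuente algorithm), and the parameter values are conventional choices, not those of the package.

```python
import numpy as np

def satisfies_strong_wolfe(E, g, x, p, lam, alpha=1e-4, beta=0.9):
    """Check sufficient decrease (4) and the strong Wolfe condition (5)
    for a candidate steplength lam along a descent direction p."""
    gTp = g(x) @ p
    sufficient_decrease = E(x + lam * p) <= E(x) + alpha * lam * gTp
    strong_curvature = abs(g(x + lam * p) @ p) <= beta * abs(gTp)
    return sufficient_decrease and strong_curvature

# Toy quadratic E(X) = 0.5 X^T A X, with steepest-descent direction p
A = np.array([[3.0, 1.0], [1.0, 2.0]])
E = lambda x: 0.5 * x @ A @ x
g = lambda x: A @ x
x = np.array([1.0, 1.0])
p = -g(x)

# naive halving search until both Wolfe conditions hold
lam = 1.0
while not satisfies_strong_wolfe(E, g, x, p, lam):
    lam *= 0.5
```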
To reduce the work cost of the classic modified Newton method and develop a globally convergent descent algorithm, Dembo and Steihaug proposed a clever variation known as the truncated-Newton method [4]. Since then, several variants have been developed and applied in various contexts; see, e.g., [13, 14, 15, 23, 25, 29]. The linear PCG framework is the most convenient generator of descent directions in the inner TN loop due to its efficiency and economical storage requirements for solving large positive-definite linear systems. Since the PCG method may fail at some step when the matrix H_k is indefinite, a termination strategy is required to guarantee that the resulting search directions are still descent directions. In addition, the PCG inner loop of the TN method can be made more effective by employing a truncation test.
We present our PCG inner loop of the TN scheme in Algorithm 1. The changes with respect to a "standard" PCG inner loop include allowing an indefinite preconditioner (the UMC factorization discussed in the next section) and a new descent direction test.
Algorithm 1 (PCG inner loop k of TN for solving H_k p = -g_k with a given preconditioner M_k).
Let p_j represent the jth PCG iterate, d_j the direction vector, and r_j the residual vector satisfying r_j = -g_k - H_k p_j. Let IT_PCG denote the maximum number of allowable PCG iterations at each inner loop.
Set p_1 = 0, r_1 = -g_k, and d_1 = z_1, where z_1 solves a system related to M_k z_1 = r_1.
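For orientation, here is a minimal unpreconditioned truncated-CG inner loop in Python. It uses the classical negative-curvature exit of Dembo and Steihaug rather than the descent direction test introduced in this paper, and the truncation tolerance eta and function name are illustrative assumptions, not part of Algorithm 1.

```python
import numpy as np

def truncated_cg(H, g, max_iter=50, eta=0.5):
    """Minimal unpreconditioned truncated-CG loop for H p = -g.
    Exits with the steepest-descent direction if negative curvature is
    detected on the first iteration (the classical Dembo-Steihaug rule),
    and truncates once the residual norm drops below eta * ||g||."""
    p = np.zeros_like(g, dtype=float)
    r = -g.astype(float)           # residual of H p = -g at p = 0
    d = r.copy()
    for j in range(max_iter):
        Hd = H @ d
        curvature = d @ Hd
        if curvature <= 0.0:
            # negative curvature: return -g on the first iteration,
            # otherwise the progress made so far
            return r if j == 0 else p
        step = (r @ r) / curvature
        p = p + step * d
        r_new = r - step * Hd
        if np.linalg.norm(r_new) <= eta * np.linalg.norm(g):
            return p               # truncation test satisfied
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return p
```

The returned vector is always a descent direction for a symmetric H: either an approximate Newton direction or, on an immediate negative-curvature exit, the steepest-descent direction -g.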