
Newton and Halley are one step apart

Trond Steihaug

Department of Informatics, University of Bergen, Norway

4th European Workshop on Automatic Differentiation, December 7-8, 2006, Institute for Scientific Computing, RWTH Aachen University, Aachen, Germany

(This is joint work with Geir Gundersen)

Overview

- Methods for Solving Nonlinear Equations: A Method in the Halley Class is Two Steps of Newton in disguise.
- Local Methods for Unconstrained Optimization.
- How to Utilize Structure in the Problem.
- Numerical Results.

Newton and Halley

A central problem in scientific computation is the solution of a system of n equations in n unknowns

$$F(x) = 0$$
where $F : \mathbb{R}^n \to \mathbb{R}^n$ is sufficiently smooth.

Sir Isaac Newton (1643 - 1727). Edmond Halley (1656 - 1742).

A Nonlinear Newton Method

Taylor expansion

$$T(s) = F(x) + F'(x)s + \tfrac{1}{2} F''(x)ss$$
Nonlinear Newton: Given $x$. Determine $s$: $T(s) = 0$. Update $x_+ = x + s$.
Two Newton steps on the nonlinear problem $T(s) = 0$ with $s^{(0)} \equiv 0$:

In terms of $F$:
$$F'(x)s^{(1)} = -F(x), \qquad \big[F'(x) + F''(x)s^{(1)}\big]s^{(2)} = -\tfrac{1}{2}F''(x)s^{(1)}s^{(1)}, \qquad x_+ = x + s^{(1)} + s^{(2)}.$$
In terms of $T$:
$$T'(0)s^{(1)} = -T(0), \qquad T'(s^{(1)})s^{(2)} = -T(s^{(1)}), \qquad x_+ = x + s^{(1)} + s^{(2)}.$$
The two formulations agree since $T'(s) = F'(x) + F''(x)s$ and, once $F'(x)s^{(1)} = -F(x)$ holds, $T(s^{(1)}) = \tfrac{1}{2}F''(x)s^{(1)}s^{(1)}$.

The Halley Class

The Halley class of iterations (Gutierrez and Hernandez 2001): Given a starting value $x_0$, compute

$$x_{k+1} = x_k - \left\{ I + \tfrac{1}{2} L(x_k)\big[I - \alpha L(x_k)\big]^{-1} \right\} (F'(x_k))^{-1} F(x_k), \qquad k = 0, 1, \ldots,$$
where

$$L(x) = (F'(x))^{-1} F''(x) (F'(x))^{-1} F(x), \qquad x \in \mathbb{R}^n.$$
Classical methods: Chebyshev's method ($\alpha = 0$), Halley's method ($\alpha = \tfrac{1}{2}$), and super Halley's method ($\alpha = 1$).

One Step Halley

This formulation is not suitable for implementation. By rewriting the equation we get the following iterative method for k = 0, 1,...

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ \big[F'(x_k) + \alpha F''(x_k) s_k^{(1)}\big] s_k^{(2)} = -\tfrac{1}{2} F''(x_k) s_k^{(1)} s_k^{(1)}$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

A Key Point
One step super Halley ($\alpha = 1$) is two steps of Newton on the quadratic approximation.
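As a concrete illustration, the two solves of a Halley-class step fit in a few lines of NumPy. This is a minimal sketch only: the test system F, its derivatives dF and d2F, and the helper name halley_class_step are invented for the example and are not part of the talk.

```python
import numpy as np

def halley_class_step(F, dF, d2F, x, alpha):
    """One step of the Halley class: alpha = 0 Chebyshev, 1/2 Halley, 1 super Halley."""
    J = dF(x)                                   # F'(x_k)
    T = d2F(x)                                  # F''(x_k), shape (n, n, n)
    s1 = np.linalg.solve(J, -F(x))              # F'(x_k) s1 = -F(x_k)
    Ts1 = T @ s1                                # the matrix F''(x_k) s1
    s2 = np.linalg.solve(J + alpha * Ts1,       # [F'(x_k) + alpha F''(x_k) s1] s2
                         -0.5 * Ts1 @ s1)       #   = -(1/2) F''(x_k) s1 s1
    return x + s1 + s2

# Hypothetical smooth test system F: R^2 -> R^2, made up for the sketch.
F  = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] + np.exp(x[1]) - 2.0])
dF = lambda x: np.array([[2.0 * x[0], 1.0], [1.0, np.exp(x[1])]])
def d2F(x):
    T = np.zeros((2, 2, 2))
    T[0, 0, 0] = 2.0                            # d^2 F_1 / dx_1^2
    T[1, 1, 1] = np.exp(x[1])                   # d^2 F_2 / dx_2^2
    return T

x = np.array([1.0, 0.5])
for alpha in (0.0, 0.5, 1.0):                   # Chebyshev, Halley, super Halley
    print(alpha, halley_class_step(F, dF, d2F, x, alpha))
```

With $\alpha = 1$ the second solve is exactly the second Newton step on the quadratic model from the previous slide.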

Super Halley as Two Steps of Newton

Two Steps of Newton is:

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ F'(x_k + s_k^{(1)})\, s_k^{(2)} = -F(x_k + s_k^{(1)})$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

One step super Halley:

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ \big[F'(x_k) + F''(x_k) s_k^{(1)}\big] s_k^{(2)} = -\tfrac{1}{2} F''(x_k) s_k^{(1)} s_k^{(1)}$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

1. In addition to $F(x)$, $F'(x)$ and the solution of two linear systems they require:
   - Halley requires $F''(x)s$ (+ the matrix-vector product $[F''(x)s]\, s$).
   - Two steps of Newton require $F'(x+s)$ and $F(x+s)$.
2. All members of the Halley class are cubically convergent.
3. Super Halley and two steps of Newton are equivalent on quadratic functions.
4. The super Halley method is quartically convergent for quadratic equations (D. Chen, I. K. Argyros and Q. Qian 1994).

Motivation

(Ortega and Rheinboldt 1970): Methods which require second and higher order derivatives are rather cumbersome from a computational viewpoint. Note that, while computation of $F'$ involves only $n^2$ partial derivatives $\partial_j F_i$, computation of $F''$ requires $n^3$ second partial derivatives $\partial_j \partial_k F_i$; in general an exorbitant amount of work indeed.

(Rheinboldt 1974): Clearly, comparisons of this type turn out to be even worse for methods with derivatives of order larger than two. Except in the case n = 1, where all derivatives require only one evaluation, the practical value of methods involving more than the first derivative of F is therefore very questionable.

(Rheinboldt 1998): Clearly, for increasing dimension n the required computational work soon outweighs the advantage of the higher-order convergence.

When structure and sparsity are utilized, the picture is very different. Sparsity is more predominant in higher derivatives.

Local Methods for Unconstrained Optimization

The members of the Halley class also apply to the unconstrained optimization problem in the general case

$$\min_{x \in \mathbb{R}^n} f(x)$$

$f(x)$, $\nabla f(x)$, $\nabla^2 f(x)$ and $\nabla^3 f(x)$

Terminology

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a three times continuously differentiable function. For a given $x \in \mathbb{R}^n$ let
$$g_i = \frac{\partial f(x)}{\partial x_i}, \qquad H_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \qquad T_{ijk} = \frac{\partial^3 f(x)}{\partial x_i \partial x_j \partial x_k}.$$

$g \in \mathbb{R}^n$, $H \in \mathbb{R}^{n \times n}$, and $T \in \mathbb{R}^{n \times n \times n}$. $H$ is a symmetric matrix: $H_{ij} = H_{ji}$, $i \neq j$. We say that an $n \times n \times n$ tensor is super-symmetric when

$$T_{ijk} = T_{ikj} = T_{jik} = T_{jki} = T_{kij} = T_{kji}, \quad i \neq j,\ j \neq k,\ i \neq k$$

$$T_{iik} = T_{iki} = T_{kii}, \quad i \neq k.$$

We will use the notation $(pT)$ for the matrix $\nabla^3 f(x)\, p$.

Super-Symmetric Tensor

For a super-symmetric tensor we only store the $\tfrac{1}{6} n(n+1)(n+2)$ elements $T_{ijk}$ with $1 \leq k \leq j \leq i \leq n$ ($n = 9$ in the slide's illustration).
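One possible layout of those stored elements in a flat array is sketched below. The indexing scheme is an illustrative assumption, not taken from the slides.

```python
def packed_index(i, j, k):
    """0-based position of T_ijk (1-based indices, k <= j <= i) in a flat array
    holding only the n(n+1)(n+2)/6 distinct elements of a super-symmetric tensor."""
    assert 1 <= k <= j <= i
    # elements with first index < i, then elements (i, j', *) with j' < j, then k - 1
    return (i - 1) * i * (i + 1) // 6 + (j - 1) * j // 2 + (k - 1)

n = 9
assert packed_index(n, n, n) + 1 == n * (n + 1) * (n + 2) // 6   # 165 stored elements for n = 9

def lookup(T_flat, i, j, k):
    """Read T_ijk for an arbitrary index triple by sorting it (super-symmetry)."""
    i, j, k = sorted((i, j, k), reverse=True)
    return T_flat[packed_index(i, j, k)]
```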

Computations with the Tensor

The cubic value term $p^T (pT) p \in \mathbb{R}$:
$$p^T(pT)p = \sum_{i=1}^{n} p_i \sum_{j=1}^{n} p_j \sum_{k=1}^{n} p_k T_{ijk}$$
$$= \sum_{i=1}^{n} p_i \left\{ \sum_{j=1}^{i-1} p_j \left[ 6 \sum_{k=1}^{j-1} p_k T_{ijk} + 3 p_j T_{ijj} \right] + 3 p_i \sum_{k=1}^{i-1} p_k T_{iik} + p_i^2 T_{iii} \right\}$$

The cubic term $(pT)p \in \mathbb{R}^n$:
$$((pT)p)_i = \sum_{j=1}^{n} \sum_{k=1}^{n} p_j p_k T_{ijk}, \quad 1 \leq i \leq n$$

The cubic Hessian term $(pT) \in \mathbb{R}^{n \times n}$:
$$(pT)_{ij} = \sum_{k=1}^{n} p_k T_{ijk}, \quad 1 \leq i, j \leq n$$
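For reference, a dense NumPy sketch of the three quantities (an illustration only, not code from the talk); einsum spells out the sums directly, and the identities relating them are checked at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
p = rng.standard_normal(n)

# Build a random super-symmetric tensor by averaging over all six index permutations.
A = rng.standard_normal((n, n, n))
T = sum(A.transpose(q) for q in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0

pTp_p = np.einsum('i,j,k,ijk->', p, p, p, T)    # cubic value term   p^T (pT) p
pT_p  = np.einsum('j,k,ijk->i', p, p, T)        # cubic term         (pT) p
pT    = np.einsum('k,ijk->ij', p, T)            # cubic Hessian term (pT)

assert np.isclose(pTp_p, p @ pT_p) and np.allclose(pT_p, pT @ p)
```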

Computing utilizing Super-Symmetry: H + (pT)

$T \in \mathbb{R}^{n \times n \times n}$ is a super-symmetric tensor. $H \in \mathbb{R}^{n \times n}$ is a symmetric matrix. Let $p \in \mathbb{R}^n$.

for i = 1 to n do
  for j = 1 to i − 1 do
    for k = 1 to j − 1 do
      H_ij += p_k T_ijk
      H_ik += p_j T_ijk
      H_jk += p_i T_ijk
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k = 1 to i − 1 do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
end for
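A direct Python transcription of the loop above (an illustrative sketch, 0-based indices, dense arrays): only elements $T_{ijk}$ with $k \leq j \leq i$ are read, only the lower triangle of $H$ is accumulated, and the result is checked against the einsum formula.

```python
import numpy as np

def add_pT_supersym(H, T, p):
    """H += (pT) reading only T[i, j, k] with k <= j <= i (0-based);
    only the lower triangle of H is accumulated, as in the loop above."""
    n = len(p)
    for i in range(n):
        for j in range(i):
            for k in range(j):
                H[i, j] += p[k] * T[i, j, k]
                H[i, k] += p[j] * T[i, j, k]
                H[j, k] += p[i] * T[i, j, k]
            H[i, j] += p[j] * T[i, j, j]
            H[j, j] += p[i] * T[i, j, j]
        for k in range(i):
            H[i, i] += p[k] * T[i, i, k]
            H[i, k] += p[i] * T[i, i, k]
        H[i, i] += p[i] * T[i, i, i]
    return H

# check against the dense formula on a random super-symmetric tensor
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n, n))
T = sum(A.transpose(q) for q in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0
p = rng.standard_normal(n)
H = add_pT_supersym(np.zeros((n, n)), T, p)
assert np.allclose(np.tril(H), np.tril(np.einsum('k,ijk->ij', p, T)))
```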

Induced Sparsity

Definition: the sparsity of the tensor induced by the sparsity of the Hessian (Griewank and Toint 1982). Suppose

$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = 0, \quad \forall x \in \mathbb{R}^n, \text{ for } (i, j) \in Z.$$

Then

$$T_{ijk} = \frac{\partial^3 f(x)}{\partial x_i \partial x_j \partial x_k} = 0 \quad \text{for } (i, j) \in Z \text{ or } (j, k) \in Z \text{ or } (i, k) \in Z.$$

We say that the sparsity structure of the tensor is induced by the sparsity structure of the Hessian matrix.

Stored Elements: Arrowhead

X       X
 X      X
  X     X
   X    X
    X   X
     X  X
      X X
       XX
XXXXXXXXX

Stored elements of a 9 × 9 arrowhead matrix and the induced tensor.

Stored Elements: Tridiagonal

Stored elements of a 9 × 9 tridiagonal matrix and the induced tensor.

XX
XXX
 XXX
  XXX
   XXX
    XXX
     XXX
      XXX
       XX

General Sparsity

Let $Z$ be the set of index pairs of the Hessian matrix that are zero and define

$$N = \{(i, j) \mid 1 \leq i, j \leq n\} \setminus Z$$

N is the set of indices for which the elements in the Hessian matrix at x will be nonzero.

Since

$$T_{ijk} = 0 \quad \text{if } (i, j) \in Z, \text{ or } (j, k) \in Z, \text{ or } (i, k) \in Z,$$
we only need to consider the elements $(i, j, k)$ in the tensor where

$$(i, j) \in N \text{ and } (j, k) \in N \text{ and } (i, k) \in N, \quad 1 \leq k \leq j \leq i \leq n.$$

General Sparsity cont.

In the following we will assume that (i, i) ∈ N . Define

T = {(i, j, k)|1 ≤ k ≤ j ≤ i ≤ n, (i, j) ∈ N , (j, k) ∈ N , (i, k) ∈ N }

Let $C_i$ be the indices of the nonzero elements at or below the diagonal in row $i$ of the sparse Hessian matrix: $C_i = \{j \mid j \leq i, (i, j) \in N\}$, $i = 1, \ldots, n$. Then $T = \{(i, j, k) \mid i = 1, \ldots, n,\ j \in C_i,\ k \in C_i \cap C_j\}$. For a given $(i, j)$, the set of indices $k$ such that $(i, j, k) \in T$ is called tube $(i, j)$ (Bader and Kolda 2004).

A Key Point

The intersection of Ci and Cj defines the tube (i, j).
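A small illustrative sketch (not from the talk) that builds the row index sets $C_i$ and the tubes $C_i \cap C_j$ from a symmetric sparsity pattern, here the 9×9 arrowhead example; indices are 0-based.

```python
def row_sets(pattern):
    """C_i = column indices j <= i with (i, j) structurally nonzero (0-based)."""
    n = len(pattern)
    return [[j for j in range(i + 1) if pattern[i][j]] for i in range(n)]

def tubes(C):
    """Tube (i, j): the indices k with (i, j, k) in the induced tensor structure."""
    return {(i, j): sorted(set(C[i]) & set(C[j]))
            for i in range(len(C)) for j in C[i]}

n = 9
arrowhead = [[i == j or i == n - 1 or j == n - 1 for j in range(n)] for i in range(n)]
C = row_sets(arrowhead)
tube = tubes(C)
print(sum(len(t) for t in tube.values()))   # 25 stored tensor elements for the arrowhead
```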

The (Extreme) Sparsity of the Tensor

The induced tensor sparsity ratio is

$$\frac{nnz(T)}{n(n+1)(n+2)/6} \cdot 100\%.$$

The matrix sparsity ratio is

$$\frac{nnz(H)}{n(n+1)/2} \cdot 100\%.$$
Consider the nos set from Matrix Market.

Matrix   n     nnz(H)   nnz(T)   Sparsity Ratio % (tensor)   Sparsity Ratio % (matrix)
nos1     237    627      1017    0.0453                       3.6060
nos2     957   2547      4137    0.0028                       0.9025
nos3     960   8402     35066    0.0237                       7.6019
nos4     100    347       630    0.3669                      12.4752
nos5     468   2820      6359    0.0370                       5.7943
nos6     675   1965      3255    0.0063                       1.4267
nos7     729   2673      4617    0.0071                       1.7352

A general sparse implementation of: H + (pT)

$T \in \mathbb{R}^{n \times n \times n}$ is a super-symmetric tensor. $H \in \mathbb{R}^{n \times n}$ is a symmetric matrix. Let $p \in \mathbb{R}^n$ and let $C_i$ be the nonzero index pattern of row $i$ of the Hessian matrix.

for i = 1 to n do
  for j ∈ C_i ∧ j < i do
    for k ∈ C_i ∩ C_j ∧ k < j do
      H_ij += p_k T_ijk
      H_ik += p_j T_ijk
      H_jk += p_i T_ijk
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k ∈ C_i ∧ k < i do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
end for
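The same loop written in Python (an illustrative sketch, 0-based indices), with the structurally nonzero tensor elements held in a dictionary keyed by $(i, j, k)$, $k \leq j \leq i$, and the intersection $k \in C_i \cap C_j$ taken with a set test. The container choices are assumptions made for the example, not the authors' data structure.

```python
def add_pT_sparse(H, T, p, C):
    """H += (pT); T = {(i, j, k): value} holds the structural nonzeros with k <= j <= i,
    and C[i] lists the indices j <= i with (i, j) structurally nonzero (0-based)."""
    n = len(p)
    for i in range(n):
        Ci = set(C[i])
        for j in C[i]:
            if j == i:
                continue
            for k in C[j]:
                if k < j and k in Ci:                 # k in the tube C_i ∩ C_j with k < j
                    t = T[(i, j, k)]
                    H[i][j] += p[k] * t
                    H[i][k] += p[j] * t
                    H[j][k] += p[i] * t
            H[i][j] += p[j] * T[(i, j, j)]            # T_ijj
            H[j][j] += p[i] * T[(i, j, j)]
        for k in C[i]:
            if k < i:
                H[i][i] += p[k] * T[(i, i, k)]        # T_iik
                H[i][k] += p[i] * T[(i, i, k)]
        H[i][i] += p[i] * T[(i, i, i)]                # T_iii
    return H
```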

Practical Issues

Four implementations:

1. Store $k \in C_i \cap C_j$.

2. Let $k \in C_j$ and test if $k \in C_i$.

3. Let $k \in C_j$ and $\mathrm{Index}^{(i)}_k = 0$ when $k \notin C_i$ and 1 otherwise.

4. Expand the storage of tube $(i, j)$ to $|C_j|$.

With these implementations of k ∈ Ci ∩ Cj , k < j there is a tradeoff between memory and operations (arithmetic or logical).

Intersection with and without if: (pT)

Intersection with if (boolean marker array $t$):

Set the elements of t to false.
for i = 1 to n do
  Compute t_k^(i) = true if k ∈ C_i
  for j ∈ C_i ∧ j < i do
    for k ∈ C_j ∧ k < j do
      if t_k^(i) then
        H_ij += p_k T_ijk
        H_ik += p_j T_ijk
        H_jk += p_i T_ijk
      end if
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k ∈ C_i ∧ k < i do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
  Reset t_k^(i) = false if k ∈ C_i
end for

Intersection without if (0/1 array Index):

Set the elements of Index to zero.
for i = 1 to n do
  Compute Index_k^(i) = 1 if k ∈ C_i
  for j ∈ C_i ∧ j < i do
    for k ∈ C_j ∧ k < j do
      T_ijk = T_ijk Index_k^(i)
      H_ij += p_k T_ijk
      H_ik += p_j T_ijk
      H_jk += p_i T_ijk
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k ∈ C_i ∧ k < i do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
  Reset Index_k^(i) = 0 if k ∈ C_i
end for

Numerical Results: Computing (pT)p and (pT)

CPU Measurements for the Gradient Term (milliseconds)
Matrix     n     nnz(H)   nnz(T)   Full storage   "if"   Index   x-storage
nos3       960    8402     35066        701        766    1242      1047
nos4       100     347       630         16         23      28        20
nos5       468    2820      6359        154        222     373       289
bcsstk19   817    3835     10363        220        245     355       226
bcsstk22   138     417       841         22         29      32        27
gr3030     900    4322     11108        226        247     343       351

CPU Measurements for the Hessian Term (milliseconds)
Matrix     n     nnz(H)   nnz(T)   Full storage   "if"   Index   x-storage
nos3       960    8402     35066        996       1195    1717      1514
nos4       100     347       630         20         23      28        27
nos5       468    2820      6359        224        288     461       407
bcsstk19   817    3835     10363        216        380     411       483
bcsstk22   138     417       841         24         28      38        31
gr3030     900    4322     11108        230        421     567       525

Analysis

$$\text{Operations} = c_1 S + c_2\, nnz(T) + c_3\, nnz(H) + c_4 n,$$

where

$$S = \sum_{i=1}^{n} \sum_{j \in C_i} |C_j|, \qquad nnz(T) = \sum_{i=1}^{n} \sum_{j \in C_i} |C_i \cap C_j| \qquad \text{and} \qquad nnz(H) = \sum_{i=1}^{n} |C_i|$$

- For full storage, $c_1 = 0$, $c_2 = 6$. Memory is $2\, nnz(T)$.
- Implementations with if and full storage have the same number of arithmetic operations.
- For intersection with if, $c_1 = 1$, $c_2 = 6$, and memory is $nnz(T)$.
- For Index and x-storage, $c_1 = 6$, but memory is $nnz(T)$ or $S \geq nnz(T)$.
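A tiny sketch (an illustration, not from the slides) of the three counts in this cost model, computed from the row index sets $C_i$; the tridiagonal pattern with $n = 9$ gives $S = 32$, $nnz(T) = 25$ and $nnz(H) = 17$.

```python
def cost_counts(C):
    """S, nnz(T) and nnz(H) from the row index sets C[i] = {j <= i : (i, j) nonzero}."""
    S     = sum(len(C[j]) for i in range(len(C)) for j in C[i])
    nnz_T = sum(len(set(C[i]) & set(C[j])) for i in range(len(C)) for j in C[i])
    nnz_H = sum(len(C[i]) for i in range(len(C)))
    return S, nnz_T, nnz_H

# tridiagonal pattern, n = 9 (0-based): C_0 = {0}, C_i = {i-1, i} for i >= 1
C = [[0]] + [[i - 1, i] for i in range(1, 9)]
print(cost_counts(C))   # (32, 25, 17)
```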

How to Utilize Sparsity in the Problem: A Skyline Matrix

A matrix has a symmetric skyline structure (envelope structure) if all nonzero elements in a row are located from the first nonzero element to the element on the diagonal. Define βi to be the (lower) bandwidth of row i,

$$\beta_i = \max\{\, i - j \mid H_{ij} \neq 0,\ j < i \,\}$$

and define $f_i$ to be the start index for row $i$ in the Hessian matrix,

$$f_i = i - \beta_i.$$

Then
$$C_j = \{k \mid f_j \leq k \leq j\}$$
$$C_i \cap C_j = \{k \mid \max\{f_i, f_j\} \leq k \leq j\}, \quad j \leq i.$$

Skyline implementation of: $p^T(pT)p$

Let $T \in \mathbb{R}^{n \times n \times n}$ be a super-symmetric tensor and let $p \in \mathbb{R}^n$ be a vector. Let $\{f_1, \ldots, f_n\}$ be the indices of the first nonzero elements for each row in the Hessian matrix. Let $c, s, t \in \mathbb{R}$ be scalars.

for i = 1 to n do
  t = 0
  for j = f_i to i − 1 do
    s = 0
    for k = max{f_i, f_j} to j − 1 do
      s += p_k T_ijk
    end for
    t += p_j (6s + 3 p_j T_ijj)
  end for
  s = 0
  for k = f_i to i − 1 do
    s += p_k T_iik
  end for
  c += p_i (t + p_i (3s + p_i T_iii))
end for
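A direct Python transcription of the skyline loop (an illustrative sketch, 0-based indices, with the tensor held densely only so it can be checked against the full formula); `f[i]` is the first nonzero column of row `i`.

```python
import numpy as np

def cubic_value_skyline(T, p, f):
    """p^T (pT) p when the tensor structure is induced by a skyline Hessian;
    f[i] is the first nonzero column index of row i (0-based), f[i] <= i."""
    n = len(p)
    c = 0.0
    for i in range(n):
        t = 0.0
        for j in range(f[i], i):
            s = 0.0
            for k in range(max(f[i], f[j]), j):
                s += p[k] * T[i, j, k]
            t += p[j] * (6.0 * s + 3.0 * p[j] * T[i, j, j])
        s = 0.0
        for k in range(f[i], i):
            s += p[k] * T[i, i, k]
        c += p[i] * (t + p[i] * (3.0 * s + p[i] * T[i, i, i]))
    return c

# quick check with a full envelope (f_i = 0 for every row) against the dense formula
rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n, n))
T = sum(A.transpose(q) for q in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0
p = rng.standard_normal(n)
assert np.isclose(cubic_value_skyline(T, p, [0] * n),
                  np.einsum('i,j,k,ijk->', p, p, p, T))
```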

Halley and Two Steps of Newton in Review

Two Steps of Newton is:

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ F'(x_k + s_k^{(1)})\, s_k^{(2)} = -F(x_k + s_k^{(1)})$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

One step Halley:

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ \big[F'(x_k) + \alpha F''(x_k) s_k^{(1)}\big] s_k^{(2)} = -\tfrac{1}{2} F''(x_k) s_k^{(1)} s_k^{(1)}$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

Computational Requirements

The tensor computations and the $LDL^T$ decomposition for dense, banded and skyline structures.

Number of floating point arithmetic operations:
- $LDL^T$: dense $\tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 - \tfrac{5}{6}n$; banded $n\beta^2 + 2n\beta + n - \tfrac{2}{3}\beta^3 - \tfrac{3}{2}\beta^2 - \tfrac{5}{6}\beta$; skyline $2\,nnz(T) - nnz(H) - n$
- $(pT)$: dense $n^3 + n^2$; banded $3n\beta^2 + 5n\beta - n - 2\beta^3 - 4\beta^2 - 3\beta$; skyline $6\,nnz(T) - 4\,nnz(H) - n$
- $(pT)p$: dense $\tfrac{2}{3}n^3 + 6n^2 - \tfrac{2}{3}n$; banded $2n\beta^2 + 14n\beta + 7n - \tfrac{4}{3}\beta^3 - 8\beta^2 - \tfrac{20}{3}\beta$; skyline $4\,nnz(T) + 8\,nnz(H) - 6n$

The $LDL^T$ decomposition has the same complexity as the tensor operations. The total computational requirements for Newton's method and the super Halley method:

Computational Requirements:
- Newton: dense $\tfrac{1}{3}n^3 + \tfrac{5}{2}n^2 - \tfrac{5}{6}n$; banded $n\beta^2 + 6n\beta + 3n - \tfrac{2}{3}\beta^3 - \tfrac{7}{2}\beta^2 - \tfrac{17}{6}\beta$; skyline $2\,nnz(T) + 3\,nnz(H) - 3n$
- Super Halley: dense $\tfrac{5}{3}n^3 + \tfrac{19}{2}n^2 - \tfrac{7}{6}n$; banded $5n\beta^2 + 22n\beta + 11n - \tfrac{10}{3}\beta^3 - 14\beta^2 - \tfrac{70}{6}\beta$; skyline $10\,nnz(T) + 7\,nnz(H) - 9n$

The Halley class and Newton's method have the same asymptotic upper bound for the dense, banded and skyline structures.
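A quick numerical check (an illustration only) of the skyline entries above: plugging the nnz counts of a few nos matrices from the earlier sparsity table into the two skyline expressions, the super Halley / Newton ratio stays below the bound of 5 stated on the next slide. The cost formulas assume a skyline structure, so this is purely illustrative.

```python
def newton_flops(nnz_T, nnz_H, n):          # skyline entry for Newton above
    return 2 * nnz_T + 3 * nnz_H - 3 * n

def super_halley_flops(nnz_T, nnz_H, n):    # skyline entry for super Halley above
    return 10 * nnz_T + 7 * nnz_H - 9 * n

# (name, n, nnz(H), nnz(T)) taken from the sparsity ratio table earlier
for name, n, nnz_H, nnz_T in [("nos1", 237, 627, 1017),
                              ("nos4", 100, 347, 630),
                              ("nos7", 729, 2673, 4617)]:
    ratio = super_halley_flops(nnz_T, nnz_H, n) / newton_flops(nnz_T, nnz_H, n)
    print(name, round(ratio, 3))            # about 3.9 in each case, below 5
```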

Upper Bound for the Skyline Structure

Theorem The ratio of the number of arithmetic operations of a method in the Halley class and Newton’s method is constant per iteration.

flops(One Step Halley) ≤ 5 flops(One Step Newton)

when the tensor is induced by a skyline structure of the Hessian matrix and we use a direct method to solve the systems of linear equations.

(Rheinboldt 1998): Clearly, for increasing dimension n the required computational work soon outweighs the advantage of the higher-order convergence

Chained and Generalized Rosenbrock

Chained Rosenbrock (Toint 1982):

$$f(x) = \sum_{i=2}^{n} \left[ 6.4\, (x_{i-1} - x_i^2)^2 + (1 - x_i)^2 \right]$$

Generalized Rosenbrock (Schwefel 1977):

$$f(x) = \sum_{i=2}^{n} \left[ (x_n - x_i^2)^2 + (x_i - 1)^2 \right]$$

Chained and Generalized Rosenbrock

[Figure: CPU timings (ms) versus $n$ (up to 10000) for Newton, Chebyshev, Halley and super Halley. Left: $x_0 = (1.7, \ldots, 1.7)$, $x^* = (1.0, \ldots, 1.0)$; Newton 9 iterations, Chebyshev 6, Halley 6, super Halley 4. Right: $x_0 = (1.08, 0.99, \ldots, 1.08, 0.99)$, $x^* = (1.0, \ldots, 1.0)$; Newton 5 iterations, Chebyshev 3, Halley 3, super Halley 3.]

The termination criterion for all methods is $\|\nabla f(x_k)\| \leq 10^{-8} \|\nabla f(x_0)\|$. The total CPU time includes function, gradient, Hessian and tensor evaluations.
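A minimal driver sketch showing how this stopping rule can wrap any of the local steps; the function names and the toy objective are made up for the example, and a plain Newton step is used only to keep the sketch self-contained.

```python
import numpy as np

def run_method(step, grad, x0, tol=1e-8, max_iter=100):
    """Iterate x <- step(x) until ||grad f(x_k)|| <= tol * ||grad f(x_0)||."""
    x = np.asarray(x0, dtype=float)
    g0 = np.linalg.norm(grad(x))
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) <= tol * g0:
            return x, k
        x = step(x)
    return x, max_iter

# toy objective f(x) = x_1^4 + x_1 x_2 + x_2^2 (made up), solved with Newton steps
grad = lambda x: np.array([4.0 * x[0]**3 + x[1], x[0] + 2.0 * x[1]])
hess = lambda x: np.array([[12.0 * x[0]**2, 1.0], [1.0, 2.0]])
newton_step = lambda x: x + np.linalg.solve(hess(x), -grad(x))
x_star, iters = run_method(newton_step, grad, np.array([1.0, 1.0]))
print(x_star, iters)
```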

References

B. W. Bader and T. G. Kolda. MATLAB Tensor Classes for Fast Algorithm Prototyping. Technical Report SAND 2004-5187, October 2004.

D. Chen, I. K. Argyros and Q. Qian. A Local Convergence Theorem for the Super-Halley Method in a Banach Space. Appl. Math. Lett. Vol. 7, 5, pp. 49-52, 1994.

A. Griewank and Ph. L. Toint. On the unconstrained optimization of partially separable functions. In Michael J. D. Powell, editor, Nonlinear Optimization 1981, pages 301-312. Academic Press, New York, NY, 1982.

G. Gundersen and T. Steihaug. Sparsity in Higher Order Methods in Optimization. Reports in Informatics 327, Dept. of Informatics, Univ. of Bergen, 2006.

J. M. Gutierrez and M. A. Hernandez. An acceleration of Newton's method: Super-Halley method. Applied Mathematics and Computation, vol. 117, no. 2, pp. 223-239, 25 January 2001.

H. P. Schwefel. Numerical Optimization of Computer Models. John Wiley and Sons, Chichester, 1981.

Ph.L. Toint. Some numerical results using a sparse matrix updating formula in unconstrained optimization. Mathematics of Computation, Volume 32, Number 143. July 1978, pages 839-851.
