
Newton and Halley are one step apart

Trond Steihaug

Department of Informatics, University of Bergen, Norway

4th European Workshop on Automatic Differentiation, December 7-8, 2006, Institute for Scientific Computing, RWTH Aachen University, Aachen, Germany

(This is joint work with Geir Gundersen)

Overview

- Methods for Solving Nonlinear Equations: A Method in the Halley Class is Two Steps of Newton in disguise.
- Local Methods for Unconstrained Optimization.
- How to Utilize Structure in the Problem.
- Numerical Results.

Newton and Halley

A central problem in scientific computation is the solution of a system of n equations in n unknowns

$$F(x) = 0$$
where $F : \mathbb{R}^n \to \mathbb{R}^n$ is sufficiently smooth.

Sir Isaac Newton (1643 - 1727). Edmond Halley (1656 - 1742).

A Nonlinear Newton Method

Taylor expansion

$$T(s) = F(x) + F'(x)s + \tfrac{1}{2} F''(x)ss$$
Nonlinear Newton: Given $x$. Determine $s$: $T(s) = 0$. Update $x_+ = x + s$.
Two Newton steps on the nonlinear problem $T(s) = 0$ with $s^{(0)} \equiv 0$:

In terms of $F$:
$$F'(x)s^{(1)} = -F(x), \qquad \big[F'(x) + F''(x)s^{(1)}\big]s^{(2)} = -\tfrac{1}{2}F''(x)s^{(1)}s^{(1)}, \qquad x_+ = x + s^{(1)} + s^{(2)}.$$
In terms of $T$:
$$T'(0)s^{(1)} = -T(0), \qquad T'(s^{(1)})s^{(2)} = -T(s^{(1)}), \qquad x_+ = x + s^{(1)} + s^{(2)}.$$
The two formulations agree since $T'(s) = F'(x) + F''(x)s$ and, once $F'(x)s^{(1)} = -F(x)$ holds, $T(s^{(1)}) = \tfrac{1}{2}F''(x)s^{(1)}s^{(1)}$.

The Halley Class

The Halley class of iterations (Gutierrez and Hernandez 2001): Given a starting value $x_0$, compute

$$x_{k+1} = x_k - \left\{ I + \tfrac{1}{2} L(x_k)\big[I - \alpha L(x_k)\big]^{-1} \right\} (F'(x_k))^{-1} F(x_k), \qquad k = 0, 1, \ldots,$$
where

$$L(x) = (F'(x))^{-1} F''(x) (F'(x))^{-1} F(x), \qquad x \in \mathbb{R}^n.$$
Classical methods: Chebyshev's method ($\alpha = 0$), Halley's method ($\alpha = \tfrac{1}{2}$), and super Halley's method ($\alpha = 1$).

One Step Halley

This formulation is not suitable for implementation. By rewriting the equation we get the following iterative method for k = 0, 1,...

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ \big[F'(x_k) + \alpha F''(x_k) s_k^{(1)}\big] s_k^{(2)} = -\tfrac{1}{2} F''(x_k) s_k^{(1)} s_k^{(1)}$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

A Key Point
One step super Halley ($\alpha = 1$) is two steps of Newton on the quadratic approximation.
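As a concrete illustration, the two solves of a Halley-class step fit in a few lines of NumPy. This is a minimal sketch only: the test system F, its derivatives dF and d2F, and the helper name halley_class_step are invented for the example and are not part of the talk.

```python
import numpy as np

def halley_class_step(F, dF, d2F, x, alpha):
    """One step of the Halley class: alpha = 0 Chebyshev, 1/2 Halley, 1 super Halley."""
    J = dF(x)                                   # F'(x_k)
    T = d2F(x)                                  # F''(x_k), shape (n, n, n)
    s1 = np.linalg.solve(J, -F(x))              # F'(x_k) s1 = -F(x_k)
    Ts1 = T @ s1                                # the matrix F''(x_k) s1
    s2 = np.linalg.solve(J + alpha * Ts1,       # [F'(x_k) + alpha F''(x_k) s1] s2
                         -0.5 * Ts1 @ s1)       #   = -(1/2) F''(x_k) s1 s1
    return x + s1 + s2

# Hypothetical smooth test system F: R^2 -> R^2, made up for the sketch.
F  = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] + np.exp(x[1]) - 2.0])
dF = lambda x: np.array([[2.0 * x[0], 1.0], [1.0, np.exp(x[1])]])
def d2F(x):
    T = np.zeros((2, 2, 2))
    T[0, 0, 0] = 2.0                            # d^2 F_1 / dx_1^2
    T[1, 1, 1] = np.exp(x[1])                   # d^2 F_2 / dx_2^2
    return T

x = np.array([1.0, 0.5])
for alpha in (0.0, 0.5, 1.0):                   # Chebyshev, Halley, super Halley
    print(alpha, halley_class_step(F, dF, d2F, x, alpha))
```

With $\alpha = 1$ the second solve is exactly the second Newton step on the quadratic model from the previous slide.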

Super Halley as Two Steps of Newton

Two Steps of Newton is:

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ F'(x_k + s_k^{(1)})\, s_k^{(2)} = -F(x_k + s_k^{(1)})$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

One step super Halley:

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ \big[F'(x_k) + F''(x_k) s_k^{(1)}\big] s_k^{(2)} = -\tfrac{1}{2} F''(x_k) s_k^{(1)} s_k^{(1)}$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

1. In addition to $F(x)$, $F'(x)$ and the solution of two linear systems they require:
   - Halley requires $F''(x)s$ (+ the matrix-vector product $[F''(x)s]\, s$).
   - Two steps of Newton require $F'(x+s)$ and $F(x+s)$.
2. All members of the Halley class are cubically convergent.
3. Super Halley and two steps of Newton are equivalent on quadratic functions.
4. The super Halley method is quartically convergent for quadratic equations (D. Chen, I. K. Argyros and Q. Qian 1994).

Motivation

(Ortega and Rheinboldt 1970): Methods which require second and higher order derivatives are rather cumbersome from a computational viewpoint. Note that, while computation of $F'$ involves only $n^2$ partial derivatives $\partial_j F_i$, computation of $F''$ requires $n^3$ second partial derivatives $\partial_j \partial_k F_i$; in general an exorbitant amount of work indeed.

(Rheinboldt 1974): Clearly, comparisons of this type turn out to be even worse for methods with derivatives of order larger than two. Except in the case n = 1, where all derivatives require only one evaluation, the practical value of methods involving more than the first derivative of F is therefore very questionable.

(Rheinboldt 1998): Clearly, for increasing dimension n the required computational work soon outweighs the advantage of the higher-order convergence.

When structure and sparsity are utilized, the picture is very different. Sparsity is more predominant in higher derivatives.

Local Methods for Unconstrained Optimization

The members of the Halley class also apply to the unconstrained optimization problem in the general case

$$\min_{x \in \mathbb{R}^n} f(x)$$

$f(x)$, $\nabla f(x)$, $\nabla^2 f(x)$ and $\nabla^3 f(x)$

Terminology

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a three times continuously differentiable function. For a given $x \in \mathbb{R}^n$ let
$$g_i = \frac{\partial f(x)}{\partial x_i}, \qquad H_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \qquad T_{ijk} = \frac{\partial^3 f(x)}{\partial x_i \partial x_j \partial x_k}.$$

$g \in \mathbb{R}^n$, $H \in \mathbb{R}^{n \times n}$, and $T \in \mathbb{R}^{n \times n \times n}$. $H$ is a symmetric matrix: $H_{ij} = H_{ji}$, $i \neq j$. We say that an $n \times n \times n$ tensor is super-symmetric when

$$T_{ijk} = T_{ikj} = T_{jik} = T_{jki} = T_{kij} = T_{kji}, \quad i \neq j,\ j \neq k,\ i \neq k$$

$$T_{iik} = T_{iki} = T_{kii}, \quad i \neq k.$$

We will use the notation $(pT)$ for the matrix $\nabla^3 f(x)\, p$.

Super-Symmetric Tensor

For a super-symmetric tensor we only store the $\tfrac{1}{6} n(n+1)(n+2)$ elements $T_{ijk}$ with $1 \leq k \leq j \leq i \leq n$ ($n = 9$ in the slide's illustration).
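One possible layout of those stored elements in a flat array is sketched below. The indexing scheme is an illustrative assumption, not taken from the slides.

```python
def packed_index(i, j, k):
    """0-based position of T_ijk (1-based indices, k <= j <= i) in a flat array
    holding only the n(n+1)(n+2)/6 distinct elements of a super-symmetric tensor."""
    assert 1 <= k <= j <= i
    # elements with first index < i, then elements (i, j', *) with j' < j, then k - 1
    return (i - 1) * i * (i + 1) // 6 + (j - 1) * j // 2 + (k - 1)

n = 9
assert packed_index(n, n, n) + 1 == n * (n + 1) * (n + 2) // 6   # 165 stored elements for n = 9

def lookup(T_flat, i, j, k):
    """Read T_ijk for an arbitrary index triple by sorting it (super-symmetry)."""
    i, j, k = sorted((i, j, k), reverse=True)
    return T_flat[packed_index(i, j, k)]
```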

Computations with the Tensor

The cubic value term $p^T (pT) p \in \mathbb{R}$:
$$p^T(pT)p = \sum_{i=1}^{n} p_i \sum_{j=1}^{n} p_j \sum_{k=1}^{n} p_k T_{ijk}$$
$$= \sum_{i=1}^{n} p_i \left\{ \sum_{j=1}^{i-1} p_j \left[ 6 \sum_{k=1}^{j-1} p_k T_{ijk} + 3 p_j T_{ijj} \right] + 3 p_i \sum_{k=1}^{i-1} p_k T_{iik} + p_i^2 T_{iii} \right\}$$

The cubic term $(pT)p \in \mathbb{R}^n$:
$$((pT)p)_i = \sum_{j=1}^{n} \sum_{k=1}^{n} p_j p_k T_{ijk}, \quad 1 \leq i \leq n$$

The cubic Hessian term $(pT) \in \mathbb{R}^{n \times n}$:
$$(pT)_{ij} = \sum_{k=1}^{n} p_k T_{ijk}, \quad 1 \leq i, j \leq n$$
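For reference, a dense NumPy sketch of the three quantities (an illustration only, not code from the talk); einsum spells out the sums directly, and the identities relating them are checked at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
p = rng.standard_normal(n)

# Build a random super-symmetric tensor by averaging over all six index permutations.
A = rng.standard_normal((n, n, n))
T = sum(A.transpose(q) for q in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0

pTp_p = np.einsum('i,j,k,ijk->', p, p, p, T)    # cubic value term   p^T (pT) p
pT_p  = np.einsum('j,k,ijk->i', p, p, T)        # cubic term         (pT) p
pT    = np.einsum('k,ijk->ij', p, T)            # cubic Hessian term (pT)

assert np.isclose(pTp_p, p @ pT_p) and np.allclose(pT_p, pT @ p)
```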

Computing utilizing Super-Symmetry: H + (pT)

$T \in \mathbb{R}^{n \times n \times n}$ is a super-symmetric tensor. $H \in \mathbb{R}^{n \times n}$ is a symmetric matrix. Let $p \in \mathbb{R}^n$.

for i = 1 to n do
  for j = 1 to i − 1 do
    for k = 1 to j − 1 do
      H_ij += p_k T_ijk
      H_ik += p_j T_ijk
      H_jk += p_i T_ijk
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k = 1 to i − 1 do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
end for
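A direct Python transcription of the loop above (an illustrative sketch, 0-based indices, dense arrays): only elements $T_{ijk}$ with $k \leq j \leq i$ are read, only the lower triangle of $H$ is accumulated, and the result is checked against the einsum formula.

```python
import numpy as np

def add_pT_supersym(H, T, p):
    """H += (pT) reading only T[i, j, k] with k <= j <= i (0-based);
    only the lower triangle of H is accumulated, as in the loop above."""
    n = len(p)
    for i in range(n):
        for j in range(i):
            for k in range(j):
                H[i, j] += p[k] * T[i, j, k]
                H[i, k] += p[j] * T[i, j, k]
                H[j, k] += p[i] * T[i, j, k]
            H[i, j] += p[j] * T[i, j, j]
            H[j, j] += p[i] * T[i, j, j]
        for k in range(i):
            H[i, i] += p[k] * T[i, i, k]
            H[i, k] += p[i] * T[i, i, k]
        H[i, i] += p[i] * T[i, i, i]
    return H

# check against the dense formula on a random super-symmetric tensor
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n, n))
T = sum(A.transpose(q) for q in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0
p = rng.standard_normal(n)
H = add_pT_supersym(np.zeros((n, n)), T, p)
assert np.allclose(np.tril(H), np.tril(np.einsum('k,ijk->ij', p, T)))
```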

Induced Sparsity

Definition: the sparsity of the tensor induced by the sparsity of the Hessian (Griewank and Toint 1982). Suppose

$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = 0, \quad \forall x \in \mathbb{R}^n, \text{ for } (i, j) \in Z.$$

Then

$$T_{ijk} = \frac{\partial^3 f(x)}{\partial x_i \partial x_j \partial x_k} = 0 \quad \text{for } (i, j) \in Z \text{ or } (j, k) \in Z \text{ or } (i, k) \in Z.$$

We say that the sparsity structure of the tensor is induced by the sparsity structure of the Hessian matrix.

Stored Elements: Arrowhead

X       X
 X      X
  X     X
   X    X
    X   X
     X  X
      X X
       XX
XXXXXXXXX

Stored elements of a 9 × 9 arrowhead matrix and the induced tensor.

Stored Elements: Tridiagonal

Stored elements of a 9 × 9 tridiagonal matrix and the induced tensor.

XX
XXX
 XXX
  XXX
   XXX
    XXX
     XXX
      XXX
       XX

General Sparsity

Let $Z$ be the set of index pairs of the Hessian matrix that are zero and define

$$N = \{(i, j) \mid 1 \leq i, j \leq n\} \setminus Z$$

N is the set of indices for which the elements in the Hessian matrix at x will be nonzero.

Since

$$T_{ijk} = 0 \quad \text{if } (i, j) \in Z, \text{ or } (j, k) \in Z, \text{ or } (i, k) \in Z,$$
we only need to consider the elements $(i, j, k)$ in the tensor where

$$(i, j) \in N \text{ and } (j, k) \in N \text{ and } (i, k) \in N, \quad 1 \leq k \leq j \leq i \leq n.$$

General Sparsity cont.

In the following we will assume that (i, i) ∈ N . Define

T = {(i, j, k)|1 ≤ k ≤ j ≤ i ≤ n, (i, j) ∈ N , (j, k) ∈ N , (i, k) ∈ N }

Let $C_i$ be the indices of the nonzero elements at or below the diagonal in row $i$ of the sparse Hessian matrix: $C_i = \{j \mid j \leq i, (i, j) \in N\}$, $i = 1, \ldots, n$. Then $T = \{(i, j, k) \mid i = 1, \ldots, n,\ j \in C_i,\ k \in C_i \cap C_j\}$. For a given $(i, j)$, the set of indices $k$ such that $(i, j, k) \in T$ is called tube $(i, j)$ (Bader and Kolda 2004).

A Key Point

The intersection of Ci and Cj defines the tube (i, j).
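A small illustrative sketch (not from the talk) that builds the row index sets $C_i$ and the tubes $C_i \cap C_j$ from a symmetric sparsity pattern, here the 9×9 arrowhead example; indices are 0-based.

```python
def row_sets(pattern):
    """C_i = column indices j <= i with (i, j) structurally nonzero (0-based)."""
    n = len(pattern)
    return [[j for j in range(i + 1) if pattern[i][j]] for i in range(n)]

def tubes(C):
    """Tube (i, j): the indices k with (i, j, k) in the induced tensor structure."""
    return {(i, j): sorted(set(C[i]) & set(C[j]))
            for i in range(len(C)) for j in C[i]}

n = 9
arrowhead = [[i == j or i == n - 1 or j == n - 1 for j in range(n)] for i in range(n)]
C = row_sets(arrowhead)
tube = tubes(C)
print(sum(len(t) for t in tube.values()))   # 25 stored tensor elements for the arrowhead
```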

The (Extreme) Sparsity of the Tensor

The induced tensor sparsity ratio is

$$\frac{nnz(T)}{n(n+1)(n+2)/6} \cdot 100\%.$$

The matrix sparsity ratio is

$$\frac{nnz(H)}{n(n+1)/2} \cdot 100\%.$$
Consider the nos set from Matrix Market.

Matrix   n     nnz(H)   nnz(T)   Sparsity Ratio % (tensor)   Sparsity Ratio % (matrix)
nos1     237    627      1017    0.0453                       3.6060
nos2     957   2547      4137    0.0028                       0.9025
nos3     960   8402     35066    0.0237                       7.6019
nos4     100    347       630    0.3669                      12.4752
nos5     468   2820      6359    0.0370                       5.7943
nos6     675   1965      3255    0.0063                       1.4267
nos7     729   2673      4617    0.0071                       1.7352

A general sparse implementation of: H + (pT)

$T \in \mathbb{R}^{n \times n \times n}$ is a super-symmetric tensor. $H \in \mathbb{R}^{n \times n}$ is a symmetric matrix. Let $p \in \mathbb{R}^n$ and let $C_i$ be the nonzero index pattern of row $i$ of the Hessian matrix.

for i = 1 to n do
  for j ∈ C_i ∧ j < i do
    for k ∈ C_i ∩ C_j ∧ k < j do
      H_ij += p_k T_ijk
      H_ik += p_j T_ijk
      H_jk += p_i T_ijk
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k ∈ C_i ∧ k < i do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
end for
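The same loop written in Python (an illustrative sketch, 0-based indices), with the structurally nonzero tensor elements held in a dictionary keyed by $(i, j, k)$, $k \leq j \leq i$, and the intersection $k \in C_i \cap C_j$ taken with a set test. The container choices are assumptions made for the example, not the authors' data structure.

```python
def add_pT_sparse(H, T, p, C):
    """H += (pT); T = {(i, j, k): value} holds the structural nonzeros with k <= j <= i,
    and C[i] lists the indices j <= i with (i, j) structurally nonzero (0-based)."""
    n = len(p)
    for i in range(n):
        Ci = set(C[i])
        for j in C[i]:
            if j == i:
                continue
            for k in C[j]:
                if k < j and k in Ci:                 # k in the tube C_i ∩ C_j with k < j
                    t = T[(i, j, k)]
                    H[i][j] += p[k] * t
                    H[i][k] += p[j] * t
                    H[j][k] += p[i] * t
            H[i][j] += p[j] * T[(i, j, j)]            # T_ijj
            H[j][j] += p[i] * T[(i, j, j)]
        for k in C[i]:
            if k < i:
                H[i][i] += p[k] * T[(i, i, k)]        # T_iik
                H[i][k] += p[i] * T[(i, i, k)]
        H[i][i] += p[i] * T[(i, i, i)]                # T_iii
    return H
```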

Practical Issues

Four implementations:

1. Store $k \in C_i \cap C_j$.

2. Let $k \in C_j$ and test if $k \in C_i$.

3. Let $k \in C_j$ and $\mathrm{Index}^{(i)}_k = 0$ when $k \notin C_i$ and 1 otherwise.

4. Expand the storage of tube $(i, j)$ to $|C_j|$.

With these implementations of k ∈ Ci ∩ Cj , k < j there is a tradeoff between memory and operations (arithmetic or logical).

Intersection with and without if: (pT)

Intersection with if (boolean marker array $t$):

Set the elements of t to false.
for i = 1 to n do
  Compute t_k^(i) = true if k ∈ C_i
  for j ∈ C_i ∧ j < i do
    for k ∈ C_j ∧ k < j do
      if t_k^(i) then
        H_ij += p_k T_ijk
        H_ik += p_j T_ijk
        H_jk += p_i T_ijk
      end if
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k ∈ C_i ∧ k < i do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
  Reset t_k^(i) = false if k ∈ C_i
end for

Intersection without if (0/1 array Index):

Set the elements of Index to zero.
for i = 1 to n do
  Compute Index_k^(i) = 1 if k ∈ C_i
  for j ∈ C_i ∧ j < i do
    for k ∈ C_j ∧ k < j do
      T_ijk = T_ijk Index_k^(i)
      H_ij += p_k T_ijk
      H_ik += p_j T_ijk
      H_jk += p_i T_ijk
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k ∈ C_i ∧ k < i do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
  Reset Index_k^(i) = 0 if k ∈ C_i
end for

Numerical Results: Computing (pT)p and (pT)

CPU Measurements for the Gradient Term (milliseconds)
Matrix     n     nnz(H)   nnz(T)   Full storage   "if"   Index   x-storage
nos3       960    8402     35066        701        766    1242      1047
nos4       100     347       630         16         23      28        20
nos5       468    2820      6359        154        222     373       289
bcsstk19   817    3835     10363        220        245     355       226
bcsstk22   138     417       841         22         29      32        27
gr3030     900    4322     11108        226        247     343       351

CPU Measurements for the Hessian Term (milliseconds)
Matrix     n     nnz(H)   nnz(T)   Full storage   "if"   Index   x-storage
nos3       960    8402     35066        996       1195    1717      1514
nos4       100     347       630         20         23      28        27
nos5       468    2820      6359        224        288     461       407
bcsstk19   817    3835     10363        216        380     411       483
bcsstk22   138     417       841         24         28      38        31
gr3030     900    4322     11108        230        421     567       525

Analysis

$$\text{Operations} = c_1 S + c_2\, nnz(T) + c_3\, nnz(H) + c_4 n,$$

where

$$S = \sum_{i=1}^{n} \sum_{j \in C_i} |C_j|, \qquad nnz(T) = \sum_{i=1}^{n} \sum_{j \in C_i} |C_i \cap C_j| \qquad \text{and} \qquad nnz(H) = \sum_{i=1}^{n} |C_i|$$

- For full storage, $c_1 = 0$, $c_2 = 6$. Memory is $2\, nnz(T)$.
- Implementations with if and full storage have the same number of arithmetic operations.
- For intersection with if, $c_1 = 1$, $c_2 = 6$, and memory is $nnz(T)$.
- For Index and x-storage, $c_1 = 6$, but memory is $nnz(T)$ or $S \geq nnz(T)$.
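A tiny sketch (an illustration, not from the slides) of the three counts in this cost model, computed from the row index sets $C_i$; the tridiagonal pattern with $n = 9$ gives $S = 32$, $nnz(T) = 25$ and $nnz(H) = 17$.

```python
def cost_counts(C):
    """S, nnz(T) and nnz(H) from the row index sets C[i] = {j <= i : (i, j) nonzero}."""
    S     = sum(len(C[j]) for i in range(len(C)) for j in C[i])
    nnz_T = sum(len(set(C[i]) & set(C[j])) for i in range(len(C)) for j in C[i])
    nnz_H = sum(len(C[i]) for i in range(len(C)))
    return S, nnz_T, nnz_H

# tridiagonal pattern, n = 9 (0-based): C_0 = {0}, C_i = {i-1, i} for i >= 1
C = [[0]] + [[i - 1, i] for i in range(1, 9)]
print(cost_counts(C))   # (32, 25, 17)
```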

How to Utilize Sparsity in the Problem: A Skyline Matrix

A matrix has a symmetric skyline structure (envelope structure) if all nonzero elements in a row are located from the first nonzero element to the element on the diagonal. Define βi to be the (lower) bandwidth of row i,

$$\beta_i = \max\{\, i - j \mid H_{ij} \neq 0,\ j < i \,\}$$

and define $f_i$ to be the start index for row $i$ in the Hessian matrix,

$$f_i = i - \beta_i.$$

Then
$$C_j = \{k \mid f_j \leq k \leq j\}$$
$$C_i \cap C_j = \{k \mid \max\{f_i, f_j\} \leq k \leq j\}, \quad j \leq i.$$

Skyline implementation of: $p^T(pT)p$

Let $T \in \mathbb{R}^{n \times n \times n}$ be a super-symmetric tensor and let $p \in \mathbb{R}^n$ be a vector. Let $\{f_1, \ldots, f_n\}$ be the indices of the first nonzero elements for each row in the Hessian matrix. Let $c, s, t \in \mathbb{R}$ be scalars.

for i = 1 to n do
  t = 0
  for j = f_i to i − 1 do
    s = 0
    for k = max{f_i, f_j} to j − 1 do
      s += p_k T_ijk
    end for
    t += p_j (6s + 3 p_j T_ijj)
  end for
  s = 0
  for k = f_i to i − 1 do
    s += p_k T_iik
  end for
  c += p_i (t + p_i (3s + p_i T_iii))
end for
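A direct Python transcription of the skyline loop (an illustrative sketch, 0-based indices, with the tensor held densely only so it can be checked against the full formula); `f[i]` is the first nonzero column of row `i`.

```python
import numpy as np

def cubic_value_skyline(T, p, f):
    """p^T (pT) p when the tensor structure is induced by a skyline Hessian;
    f[i] is the first nonzero column index of row i (0-based), f[i] <= i."""
    n = len(p)
    c = 0.0
    for i in range(n):
        t = 0.0
        for j in range(f[i], i):
            s = 0.0
            for k in range(max(f[i], f[j]), j):
                s += p[k] * T[i, j, k]
            t += p[j] * (6.0 * s + 3.0 * p[j] * T[i, j, j])
        s = 0.0
        for k in range(f[i], i):
            s += p[k] * T[i, i, k]
        c += p[i] * (t + p[i] * (3.0 * s + p[i] * T[i, i, i]))
    return c

# quick check with a full envelope (f_i = 0 for every row) against the dense formula
rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n, n))
T = sum(A.transpose(q) for q in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0
p = rng.standard_normal(n)
assert np.isclose(cubic_value_skyline(T, p, [0] * n),
                  np.einsum('i,j,k,ijk->', p, p, p, T))
```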

Halley and Two Steps of Newton in Review

Two Steps of Newton is:

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ F'(x_k + s_k^{(1)})\, s_k^{(2)} = -F(x_k + s_k^{(1)})$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

One step Halley:

Solve for $s_k^{(1)}$: $\ F'(x_k)\, s_k^{(1)} = -F(x_k)$
Solve for $s_k^{(2)}$: $\ \big[F'(x_k) + \alpha F''(x_k) s_k^{(1)}\big] s_k^{(2)} = -\tfrac{1}{2} F''(x_k) s_k^{(1)} s_k^{(1)}$
Update the iterate: $x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}$

Computational Requirements

The tensor computations and the $LDL^T$ decomposition for dense, banded and skyline structures.

Number of floating point arithmetic operations:
- $LDL^T$: dense $\tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 - \tfrac{5}{6}n$; banded $n\beta^2 + 2n\beta + n - \tfrac{2}{3}\beta^3 - \tfrac{3}{2}\beta^2 - \tfrac{5}{6}\beta$; skyline $2\,nnz(T) - nnz(H) - n$
- $(pT)$: dense $n^3 + n^2$; banded $3n\beta^2 + 5n\beta - n - 2\beta^3 - 4\beta^2 - 3\beta$; skyline $6\,nnz(T) - 4\,nnz(H) - n$
- $(pT)p$: dense $\tfrac{2}{3}n^3 + 6n^2 - \tfrac{2}{3}n$; banded $2n\beta^2 + 14n\beta + 7n - \tfrac{4}{3}\beta^3 - 8\beta^2 - \tfrac{20}{3}\beta$; skyline $4\,nnz(T) + 8\,nnz(H) - 6n$

The $LDL^T$ decomposition has the same complexity as the tensor operations. The total computational requirements for Newton's method and the super Halley method:

Computational Requirements:
- Newton: dense $\tfrac{1}{3}n^3 + \tfrac{5}{2}n^2 - \tfrac{5}{6}n$; banded $n\beta^2 + 6n\beta + 3n - \tfrac{2}{3}\beta^3 - \tfrac{7}{2}\beta^2 - \tfrac{17}{6}\beta$; skyline $2\,nnz(T) + 3\,nnz(H) - 3n$
- Super Halley: dense $\tfrac{5}{3}n^3 + \tfrac{19}{2}n^2 - \tfrac{7}{6}n$; banded $5n\beta^2 + 22n\beta + 11n - \tfrac{10}{3}\beta^3 - 14\beta^2 - \tfrac{70}{6}\beta$; skyline $10\,nnz(T) + 7\,nnz(H) - 9n$

The Halley class and Newton's method have the same asymptotic upper bound for the dense, banded and skyline structures.
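A quick numerical check (an illustration only) of the skyline entries above: plugging the nnz counts of a few nos matrices from the earlier sparsity table into the two skyline expressions, the super Halley / Newton ratio stays below the bound of 5 stated on the next slide. The cost formulas assume a skyline structure, so this is purely illustrative.

```python
def newton_flops(nnz_T, nnz_H, n):          # skyline entry for Newton above
    return 2 * nnz_T + 3 * nnz_H - 3 * n

def super_halley_flops(nnz_T, nnz_H, n):    # skyline entry for super Halley above
    return 10 * nnz_T + 7 * nnz_H - 9 * n

# (name, n, nnz(H), nnz(T)) taken from the sparsity ratio table earlier
for name, n, nnz_H, nnz_T in [("nos1", 237, 627, 1017),
                              ("nos4", 100, 347, 630),
                              ("nos7", 729, 2673, 4617)]:
    ratio = super_halley_flops(nnz_T, nnz_H, n) / newton_flops(nnz_T, nnz_H, n)
    print(name, round(ratio, 3))            # about 3.9 in each case, below 5
```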

Upper Bound for the Skyline Structure

Theorem The ratio of the number of arithmetic operations of a method in the Halley class and Newton’s method is constant per iteration.

flops(One Step Halley) ≤ 5 flops(One Step Newton)

when the tensor is induced by a skyline structure of the Hessian matrix and we use a direct method to solve the systems of linear equations.

(Rheinboldt 1998): Clearly, for increasing dimension n the required computational work soon outweighs the advantage of the higher-order convergence

Chained and Generalized Rosenbrock

Chained Rosenbrock (Toint 1982):

$$f(x) = \sum_{i=2}^{n} \left[ 6.4\, (x_{i-1} - x_i^2)^2 + (1 - x_i)^2 \right]$$

Generalized Rosenbrock (Schwefel 1977):

$$f(x) = \sum_{i=2}^{n} \left[ (x_n - x_i^2)^2 + (x_i - 1)^2 \right]$$

Chained and Generalized Rosenbrock

[Figure: CPU timings (ms) versus $n$ (up to 10000) for Newton, Chebyshev, Halley and super Halley. Left: $x_0 = (1.7, \ldots, 1.7)$, $x^* = (1.0, \ldots, 1.0)$; Newton 9 iterations, Chebyshev 6, Halley 6, super Halley 4. Right: $x_0 = (1.08, 0.99, \ldots, 1.08, 0.99)$, $x^* = (1.0, \ldots, 1.0)$; Newton 5 iterations, Chebyshev 3, Halley 3, super Halley 3.]

The termination criterion for all methods is $\|\nabla f(x_k)\| \leq 10^{-8} \|\nabla f(x_0)\|$. The total CPU time includes function, gradient, Hessian and tensor evaluations.
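A minimal driver sketch showing how this stopping rule can wrap any of the local steps; the function names and the toy objective are made up for the example, and a plain Newton step is used only to keep the sketch self-contained.

```python
import numpy as np

def run_method(step, grad, x0, tol=1e-8, max_iter=100):
    """Iterate x <- step(x) until ||grad f(x_k)|| <= tol * ||grad f(x_0)||."""
    x = np.asarray(x0, dtype=float)
    g0 = np.linalg.norm(grad(x))
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) <= tol * g0:
            return x, k
        x = step(x)
    return x, max_iter

# toy objective f(x) = x_1^4 + x_1 x_2 + x_2^2 (made up), solved with Newton steps
grad = lambda x: np.array([4.0 * x[0]**3 + x[1], x[0] + 2.0 * x[1]])
hess = lambda x: np.array([[12.0 * x[0]**2, 1.0], [1.0, 2.0]])
newton_step = lambda x: x + np.linalg.solve(hess(x), -grad(x))
x_star, iters = run_method(newton_step, grad, np.array([1.0, 1.0]))
print(x_star, iters)
```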

References

B. W. Bader and T. G. Kolda. MATLAB Tensor Classes for Fast Algorithm Prototyping. Technical Report SAND 2004-5187, October 2004.

D. Chen, I. K. Argyros and Q. Qian. A Local Convergence Theorem for the Super-Halley Method in a Banach Space. Appl. Math. Lett. Vol. 7, 5, pp. 49-52, 1994.

A. Griewank and Ph. L. Toint. On the unconstrained optimization of partially separable functions. In Michael J. D. Powell, editor, Nonlinear Optimization 1981, pages 301-312. Academic Press, New York, NY, 1982.

G. Gundersen and T. Steihaug. Sparsity in Higher Order Methods in Optimization. Reports in Informatics 327, Dept. of Informatics, Univ. of Bergen, 2006.

J. M. Gutierrez and M. A. Hernandez. An acceleration of Newton's method: Super-Halley method. Applied Mathematics and Computation, vol. 117, no. 2, pp. 223-239, 25 January 2001.

H. P. Schwefel. Numerical Optimization of Computer Models. John Wiley and Sons, Chichester, 1981.

Ph.L. Toint. Some numerical results using a sparse matrix updating formula in unconstrained optimization. Mathematics of Computation, Volume 32, Number 143. July 1978, pages 839-851.
