A structured diagonal Hessian approximation method with evaluation complexity analysis for nonlinear least squares

Hassan Mohammad¹,² and Sandra A. Santos²

¹ Department of Mathematical Sciences, Faculty of Physical Sciences, Bayero University, Kano, Nigeria. [email protected]

² Department of Applied Mathematics, Institute of Mathematics, Statistics and Scientific Computing, University of Campinas, Campinas, Brazil. [email protected]

June 9, 2018

Abstract

This work proposes a Jacobian-free strategy for addressing large-scale nonlinear least-squares problems, in which structured secant conditions are used to define a diagonal approximation of the Hessian. Proper safeguards are devised to ensure descent directions along the generated sequence. A worst-case evaluation complexity analysis is provided within the framework of a non-monotone line search. Numerical experiments contextualize the proposed strategy by addressing structured problems from the literature, also solved by related and recently presented conjugate gradient and multivariate spectral gradient strategies. The comparative computational results show a favorable performance of the proposed approach.

Keywords: Nonlinear least-squares problems; Large-scale problems; Jacobian-free strategy; Evaluation complexity; Global convergence; Computational results

1 Introduction

Nonlinear least-squares problems constitute a special class of unconstrained optimization problems that involves the minimization of the sum of squares of the so-called residual functions, usually assumed to be twice continuously differentiable. These problems may be solved either by general unconstrained minimization methods, or by specialized methods that take into account the particular structure of the objective (Björck, 1996). Nonlinear least-squares problems arise in many applications, such as data fitting, optimal control, parameter estimation, experimental design, and imaging problems (see, e.g., (Cornelio, 2011; Golub and Pereyra, 2003; Henn, 2003; Kim et al., 2007; Li et al., 2012; López et al., 2015; Tang, 2011), among others). Most of the classical and modified algorithms for solving nonlinear least-squares problems require the computation of the gradient vector and the Hessian matrix of the objective function. The Hessian, in this case, is the sum of two terms. The first one involves only the first-order derivatives (the Jacobian) of the residual functions, whereas the second involves their second-order derivatives. In practice, computing and storing the complete Hessian matrix is too expensive, because the exact second-order derivatives of the residual functions are rarely available at a reasonable cost. Such a matrix also requires a large amount of storage when considering large-scale problems. However, if the residual of the objective function is nonzero at the minimizer, the computation of the exact second-order

derivatives of the residual functions might become relevant. Nonzero residual problems occur in many practical instances (see (Nocedal and Wright, 2006; Sun and Yuan, 2006)). Newton-type methods, which require the exact Hessian matrix, may be too expensive for such problems. As a consequence, alternative approaches, which rest upon function evaluations and first-order derivatives, have been developed (see (Nazareth, 1980) for details). This is the case of the Gauss-Newton method (GN) and the Levenberg-Marquardt method (LM) (Hartley, 1961; Levenberg, 1944; Marquardt, 1963; Morrison, 1960). Both GN and LM neglect the second-order term of the Hessian matrix of the objective function. As a result, they are expected to perform well for solving small residual problems, but they may perform poorly on large residual problems (Dennis and Schnabel, 1996). Even the requirement of computing the Jacobian matrix for the aforementioned methods may contribute to some inefficiency in the case of large-scale problems. Thus, Jacobian-free approaches are of interest for solving such instances (see, for example, (Knoll and Keyes, 2004; Xu et al., 2012, 2016)).

Many studies have been devoted to the development of methods for minimizing the sum of squares of nonlinear functions (see, e.g., (Al-Baali, 2003; Al-Baali and Fletcher, 1985; Bartholomew-Biggs, 1977; Betts, 1976; Fletcher and Xu, 1987; Lukšan, 1996; McKeown, 1975; Nazareth, 1980)). Along the derivative-free philosophy, we should mention the seminal work of (Powell, 1965), as well as the recent class of algorithms developed in (Zhang et al., 2010), based on polynomial interpolation-based models. From another standpoint, and inspired by the regularizing aspect of the LM method, further analyses of the performance of methods with second- and higher-order regularization terms have been developed (Bellavia et al., 2010; Birgin et al., 2017; Cartis et al., 2013, 2015; Grapiglia et al., 2015; Nesterov, 2007; Zhao and Fan, 2016).

Structured quasi-Newton (SQN) methods have been proposed (Brown and Dennis, 1970; Dennis, 1973) to overcome the difficulties that arise when solving large residual problems using GN or LM methods. These methods combine a Gauss-Newton step and a quasi-Newton step in order to make good use of the structure of the Hessian of the objective function. SQN methods show compelling and improved numerical performance compared to the classical methods (Spedicato and Vespucci, 1988; Wang et al., 2010). However, their performance on large-scale problems is not encouraging, because they require a considerably large amount of matrix storage. Owing to this drawback, matrix-free approaches are preferable for solving large-scale nonlinear least-squares problems.

Motivated by the above discussion, we propose an alternative approximation of the complete Hessian matrix of the objective function of the nonlinear least-squares problem, using a diagonal matrix that carries some information of both the first- and the second-order terms of the Hessian. Our proposed approach is Jacobian-free, not requiring the storage of the Jacobian matrix, but only the action of its transpose upon a vector. Therefore, it is suitable for very large-scale problems.

The remainder of this paper is organized as follows. In Section 2 we discuss some preliminaries, in Section 3 we present the proposed method, and its evaluation complexity analysis is developed in Section 4. Section 5 contains the numerical experiments.
Some conclusions and prospective future work are given in Section 6. Throughout this paper, ∥·∥ stands for the Euclidean norm of vectors and the induced 2-norm of matrices.

2 Preliminaries

In this section, we first recall the diagonal quasi-Newton method for general unconstrained optimization problems, as described in (Deng and Wan, 2012), where the authors provide details about the manuscript (Shi and Sun, 2006), written in Chinese. According to (Deng and Wan, 2012), (Shi and Sun, 2006) presented a sparse (diagonal) quasi-Newton method for the unconstrained optimization problem

\[
\min_{x\in\mathbb{R}^n} f(x), \tag{1}
\]

where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. Their idea was motivated by the work of (Barzilai and Borwein, 1988) and (Raydan, 1993, 1997). Considering an iterative procedure for solving (1), in which the sequence $\{x_k\}$ is generated, the unconstrained optimization problem

\[
\min_{H_k} \frac{1}{2}\left\|H_k y_{k-1} - s_{k-1}\right\|^2 \;=\; \min_{\lambda_k} \frac{1}{2}\left\|(\lambda_k I)\, y_{k-1} - s_{k-1}\right\|^2, \quad k = 1, 2, \ldots, \tag{2}
\]
where $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = \nabla f(x_k) - \nabla f(x_{k-1})$, is solved to obtain a scalar multiple of the identity that approximates the inverse of the Hessian.

Shi and Sun assumed $H_k$ to be a diagonal matrix, say $H_k = \mathrm{diag}(h_k^i)$, $i = 1, 2, \ldots, n$, and solved the constrained problem
\[
\min_{\ell_k \le h_k^i \le u_k} \; \frac{1}{2}\sum_{i=1}^{n} \left(h_k^i y_{k-1}^i - s_{k-1}^i\right)^2, \tag{3}
\]

where $\ell_k$ and $u_k$ are given bounds for $h_k^i$ such that $0 < \ell_k \le h_k^i \le u_k$, so that $H_k$ is safely positive definite. The solution of problem (3) is given by
\[
h_k^i = \begin{cases}
\dfrac{s_{k-1}^i}{y_{k-1}^i}, & \text{if } \ell_k \le \dfrac{s_{k-1}^i}{y_{k-1}^i} \le u_k,\\[1.5ex]
\ell_k, & \text{if } \dfrac{s_{k-1}^i}{y_{k-1}^i} < \ell_k,\\[1.5ex]
u_k, & \text{if } \dfrac{s_{k-1}^i}{y_{k-1}^i} > u_k,\\[1.5ex]
h_{k-1}^i, & \text{if } y_{k-1}^i = 0.
\end{cases} \tag{4}
\]
Shi and Sun's algorithm avoids the computation and storage of matrices in its iterations, making it suitable for large-scale problems. However, the numerical results depend on the bound parameters $\ell_k$ and $u_k$.

In a similar perspective, (Han et al., 2008) presented a multivariate spectral gradient method for unconstrained optimization. They found a solution to the problem
\[
\min_{b_k^i} \; \frac{1}{2}\sum_{i=1}^{n} \left(b_k^i s_{k-1}^i - y_{k-1}^i\right)^2 \tag{5}
\]
as follows:

\[
b_k^i = \begin{cases}
\dfrac{y_{k-1}^i}{s_{k-1}^i}, & \text{if } \dfrac{y_{k-1}^i}{s_{k-1}^i} > 0,\\[1.5ex]
\dfrac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}, & \text{otherwise}.
\end{cases} \tag{6}
\]
In order to bound the diagonal entries $b_k^i$, the authors used the technique introduced by (Raydan, 1997), that is, for some $0 < \epsilon < 1$ and $\delta > 0$, if $b_k^i \le \epsilon$ or $b_k^i \ge 1/\epsilon$, then $b_k^i$ is set to $\delta$.
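To make the update (6) concrete, the following minimal sketch (Python/NumPy, not taken from the paper, whose experiments were coded in MATLAB) computes the diagonal entries $b_k^i$ and applies the Raydan-type safeguard described above; the function name and the default values of $\epsilon$ and $\delta$ are illustrative assumptions.

```python
import numpy as np

def msgm_diagonal(s_prev, y_prev, eps=1e-10, delta=1.0):
    """Diagonal entries b_k of Eq. (6) with the Raydan-type safeguard.

    s_prev = x_k - x_{k-1},  y_prev = grad f(x_k) - grad f(x_{k-1}).
    Assumes s_prev^T y_prev != 0 for the fallback value.
    """
    ratio = np.divide(y_prev, s_prev,
                      out=np.zeros_like(y_prev), where=s_prev != 0)
    fallback = s_prev.dot(s_prev) / s_prev.dot(y_prev)   # s^T s / s^T y
    b = np.where(ratio > 0, ratio, fallback)
    # Safeguard: replace entries outside [eps, 1/eps] by delta.
    b = np.where((b <= eps) | (b >= 1.0 / eps), delta, b)
    return b
```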

Now, let us consider a special case of (1), namely, the least-squares problem (LS) of the form
\[
\min_{x\in\mathbb{R}^n} f(x), \qquad f(x) = \frac{1}{2}\|F(x)\|^2 = \frac{1}{2}\sum_{i=1}^{m} \left(F_i(x)\right)^2, \tag{7}
\]
where $F : \mathbb{R}^n \to \mathbb{R}^m$ ($m \ge n$) is a twice continuously differentiable mapping. The gradient and the Hessian of the objective function in (7) are given by

\[
\nabla f(x) = \sum_{i=1}^{m} F_i(x)\nabla F_i(x) = J(x)^T F(x), \tag{8}
\]

\[
\nabla^2 f(x) = \sum_{i=1}^{m} \nabla F_i(x)\nabla F_i(x)^T + \sum_{i=1}^{m} F_i(x)\nabla^2 F_i(x) = J(x)^T J(x) + S(x), \tag{9}
\]

where $F_i(x)$ is the $i$-th component of $F(x)$, $\nabla^2 F_i(x)$ is its Hessian, $J(x) = \nabla F(x)^T \in \mathbb{R}^{m\times n}$ is the Jacobian of the residual function $F$ at $x$, and $S(x)$ is the matrix representing the second term of (9). For simplicity, we denote the residual function computed at the $k$-th iterate by $F_k = F(x_k)$, and also $f_k = f(x_k)$, $J_k = J(x_k)$, and $g_k = \nabla f_k = J_k^T F_k$. To refer to the $i$-th component of a vector $v_k$, we use $v_k^i$.
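As a small illustration of the Jacobian-free point of view adopted later, the following sketch (Python/NumPy, with hypothetical routine names) evaluates $f$ and $g = J^T F$ for problem (7) under the assumption that only a routine computing products $J(x)^T v$ is available, rather than the Jacobian itself.

```python
import numpy as np

def objective_and_gradient(x, residual, jac_t_vec):
    """f(x) = 0.5 * ||F(x)||^2 and g(x) = J(x)^T F(x), Eqs. (7)-(8).

    residual(x)     -> F(x) in R^m
    jac_t_vec(x, v) -> J(x)^T v in R^n   (Jacobian-transpose-vector product)
    """
    F = residual(x)
    f = 0.5 * F.dot(F)
    g = jac_t_vec(x, F)
    return f, g
```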

2.1 Building blocks for approximating the Hessian

In what follows, we provide the necessary elements for developing our proposed diagonal Hessian approximation, taking into account the special structure of the objective function of the nonlinear least-squares problem. The first-order term of the Hessian of the objective function of (7) at a certain iterate $k-1$, $k \ge 1$, is given by
\[
A(x_{k-1}) = \sum_{i=1}^{m} \nabla F_i(x_{k-1}) \nabla F_i(x_{k-1})^T, \tag{10}
\]
so that the secant equation satisfied by the updating matrix $A(x_k)$ can be obtained as follows. From the Taylor expansion of $F_i(x_{k-1})$, since $s_{k-1} = x_k - x_{k-1}$, we have

\[
F_i(x_{k-1}) = F_i(x_k) - \nabla F_i(x_k)^T s_{k-1} + o(\|s_{k-1}\|), \tag{11}
\]

where $o : \mathbb{R}_+ \to \mathbb{R}$ is such that $\lim_{\xi \to 0} o(\xi)/\xi = 0$. Pre-multiplying (11) by $\nabla F_i(x_k)$ and summing both sides from $i = 1$ to $m$, we have
\[
J_k^T J_k s_{k-1} = J_k^T \gamma_{k-1} + \left(J_k^T 1_m\right) o(\|s_{k-1}\|),
\]

where $\gamma_{k-1} = F_k - F_{k-1}$, and $1_m \in \mathbb{R}^m$ is the vector with all entries equal to one. Let $A_k \approx J_k^T J_k$ be a diagonal matrix which approximately satisfies the secant condition
\[
A_k s_{k-1} \approx \hat{y}_{k-1}, \tag{12}
\]
where
\[
\hat{y}_{k-1} = J_k^T \gamma_{k-1}. \tag{13}
\]
Now, to approximate the Hessian term with the second-order derivatives, from the Taylor expansion of $\nabla F_i(x_{k-1})$, one can get

\[
\nabla^2 F_i(x_k)(x_k - x_{k-1}) = \nabla F_i(x_k) - \nabla F_i(x_{k-1}) + o(\|s_{k-1}\|), \tag{14}
\]

in which $o : \mathbb{R}_+ \to \mathbb{R}^n$ is such that each component $i \in \{1, \ldots, n\}$ satisfies $\lim_{\xi \to 0} o^i(\xi)/\xi = 0$. Pre-multiplying (14) by $F_i(x_k)$ and summing both sides from $i = 1$ to $m$ implies
\[
S(x_k) s_{k-1} = (J_k - J_{k-1})^T F_k + \left(F_k^T 1_m\right) o(\|s_{k-1}\|), \tag{15}
\]
with $1_m$ as before. The left-hand side of (15) is the product of $S(x_k)$, the second-order term of the Hessian, by $s_{k-1}$. Let $B_k \approx S(x_k)$ be a diagonal matrix approximately satisfying

\[
B_k s_{k-1} \approx \bar{y}_{k-1}, \tag{16}
\]
with
\[
\bar{y}_{k-1} = (J_k - J_{k-1})^T F_k. \tag{17}
\]
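Both right-hand sides can be formed from Jacobian-transpose-vector products alone. The sketch below (Python/NumPy; the routine name jac_t_vec is an illustrative assumption) computes $\hat{y}_{k-1}$ of (13) and $\bar{y}_{k-1}$ of (17) without ever forming $J_k$ or $J_{k-1}$.

```python
import numpy as np

def structured_secant_rhs(x_prev, x_curr, residual, jac_t_vec):
    """Right-hand sides of the secant conditions (12) and (16).

    Returns (y_hat, y_bar) with
      y_hat = J_k^T (F_k - F_{k-1})      -- Eq. (13)
      y_bar = (J_k - J_{k-1})^T F_k      -- Eq. (17)
    using only products of a Jacobian transpose with a vector.
    """
    F_prev, F_curr = residual(x_prev), residual(x_curr)
    gamma = F_curr - F_prev
    y_hat = jac_t_vec(x_curr, gamma)
    y_bar = jac_t_vec(x_curr, F_curr) - jac_t_vec(x_prev, F_curr)
    return y_hat, y_bar
```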

The choice of the right-hand side of (16) as the vector $\bar{y}_{k-1}$ in (17) was analyzed by (Dennis et al., 1981), and shown to be the better choice among the ones by (Betts, 1976; Dennis, 1973). Next, we provide an auxiliary result that will be important for defining the matrices $A_k$ and $B_k$ in view of the secant conditions (12) and (16), respectively.

Lemma 2.1 Let $D = \mathrm{diag}(d)$ be a diagonal matrix in $\mathbb{R}^{n\times n}$, and let $c$ and $s$ be vectors in $\mathbb{R}^n$. Then, the solution of the constrained linear least-squares problem with simple bounds
\[
\begin{array}{rl}
\displaystyle\min_{d\in\mathbb{R}^n} & \dfrac{1}{2}\|\mathrm{diag}(d)s - c\|^2\\[1.5ex]
\text{subject to} & -d \le 0
\end{array} \tag{18}
\]
is given by
\[
d^i = \begin{cases}
\dfrac{c^i}{s^i}, & \text{if } \dfrac{c^i}{s^i} > 0,\\[1.5ex]
0, & \text{if } \dfrac{c^i}{s^i} \le 0 \text{ or } s^i = 0,
\end{cases}
\qquad i = 1, 2, \ldots, n. \tag{19}
\]

Proof Let $\Phi(d) = \frac{1}{2}\|\mathrm{diag}(d)s - c\|^2$. For $d$ to be a local solution of (18), it must satisfy the following Karush-Kuhn-Tucker (KKT) conditions, in which $\mu \in \mathbb{R}^n$: (i) $\nabla\Phi(d) - \mu = 0$; (ii) $\mu \ge 0$; (iii) $d \ge 0$; (iv) $\mu^T d = 0$. Now, $\Phi(d) = \frac{1}{2}\sum_{i=1}^{n}(d^i s^i - c^i)^2$, and thus, for all $i = 1, 2, \ldots, n$,
\[
\frac{\partial}{\partial d^i}\Phi(d) = (d^i s^i - c^i)s^i,
\]
i.e., $\nabla\Phi(d) = \mathrm{diag}(\mathrm{diag}(d)s - c)\,s$. From the KKT conditions (i) and (ii), $\mu = \nabla\Phi(d)$, and thus

\[
(d^i s^i - c^i)s^i \ge 0, \quad \forall i, \tag{20}
\]
so that, from the KKT condition (iv),

\[
\sum_{i=1}^{n} d^i (d^i s^i - c^i)s^i = 0. \tag{21}
\]
Now, by the KKT condition (iii),
\[
d^i \ge 0, \quad \forall i. \tag{22}
\]
From (20), if $s^i = 0$, then $d^i$ is free to assume any nonnegative value; otherwise, if either $s^i > 0$ or $s^i < 0$, then $d^i \ge c^i/s^i$. Therefore, for all $i \in \{1, 2, \ldots, n\}$ such that $s^i \ne 0$ we have

\[
d^i \ge \frac{c^i}{s^i}. \tag{23}
\]
Consequently, from (22) and (23), it follows that
\[
d^i \ge \max\left\{0, \frac{c^i}{s^i}\right\}, \quad \forall i. \tag{24}
\]

Now, considering Equation (21), with the assumption that $s^i \ne 0$, we obtain
\[
\sum_{i \,|\, s^i \ne 0} d^i (d^i s^i - c^i)s^i = 0. \tag{25}
\]

From (20) and (22), we notice that the nonnegative terms of the sum (25) only add up to zero if each term is zero. Thus, because $s^i \ne 0$, either $d^i = 0$ or $d^i s^i - c^i = 0$. As a result, whenever $s^i$ ($\ne 0$) and $c^i$ have the same sign, $d^i = c^i/s^i$; otherwise, $d^i = 0$. This reasoning, together with (24), gives (19) and completes the proof.

According to Lemma 2.1, the resulting diagonal matrix is positive semidefinite. However, to obtain a descent direction that will be used with a suitable line search technique, we require our diagonal matrices to be positive definite. With this aim, we provide next our safeguarding strategy for assembling the diagonal matrices $A_k$ and $B_k$, so that the zeros of relation (19) are replaced by strictly positive values.
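Before the safeguards are introduced, note that the unsafeguarded rule (19) is immediate to vectorize; the sketch below (Python/NumPy, illustrative only) returns the nonnegative diagonal prior to any safeguarding.

```python
import numpy as np

def diagonal_secant_lsq(s, c):
    """Solution (19) of the bound-constrained problem (18):
    d_i = c_i / s_i when that ratio is positive, and 0 otherwise
    (including the case s_i = 0)."""
    ratio = np.divide(c, s, out=np.zeros_like(c), where=s != 0)
    return np.where(ratio > 0, ratio, 0.0)
```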

2.2 Safeguarding the components of the new diagonal matrices

In order to retain the positive definiteness of the Hessian approximation, the components of the diagonal matrices $A_k = \mathrm{diag}(a_k^i)$ and $B_k = \mathrm{diag}(b_k^i)$, namely $a_k^i$ and $b_k^i$ for all $i = 1, 2, \ldots, n$, must be strictly positive. By applying Lemma 2.1 to the secant conditions (12) and (16), for $i = 1, 2, \ldots, n$, expression (19) gives

\[
a_k^i = \begin{cases}
\dfrac{\hat{y}_{k-1}^i}{s_{k-1}^i}, & \text{if } \dfrac{\hat{y}_{k-1}^i}{s_{k-1}^i} > 0,\\[1.5ex]
0, & \text{if } \dfrac{\hat{y}_{k-1}^i}{s_{k-1}^i} \le 0 \text{ or } s_{k-1}^i = 0,
\end{cases}
\]
and
\[
b_k^i = \begin{cases}
\dfrac{\bar{y}_{k-1}^i}{s_{k-1}^i}, & \text{if } \dfrac{\bar{y}_{k-1}^i}{s_{k-1}^i} > 0,\\[1.5ex]
0, & \text{if } \dfrac{\bar{y}_{k-1}^i}{s_{k-1}^i} \le 0 \text{ or } s_{k-1}^i = 0.
\end{cases}
\]
If $s_{k-1}^i = 0$, then $a_k^i$ and $b_k^i$ are free to assume any nonnegative safeguarding value. So we focus on the case where $s_{k-1}^i \ne 0$ and the pairs $\hat{y}_{k-1}^i, s_{k-1}^i$ and $\bar{y}_{k-1}^i, s_{k-1}^i$ have different signs, $i = 1, 2, \ldots, n$. We consider the following two possibilities, in which $\beta \in (0, 1)$ is a shrinking factor, and $\delta > 0$ is a tolerance for ensuring strictly positive values.

Case I: Assume that $s_{k-1}^i > 0$.

• If $\hat{y}_{k-1}^i \le 0$, then $(J_k^T F_k)^i - (J_k^T F_{k-1})^i \le 0$, and thus $(J_k^T F_k)^i \le (J_k^T F_{k-1})^i$. In this case, we redefine
\[
\hat{y}_{k-1}^i = \beta \max\left\{ \max\left\{ (J_k^T F_k)^i, (J_k^T F_{k-1})^i \right\}, \delta \right\}, \tag{26}
\]
so that $a_k^i = \hat{y}_{k-1}^i / s_{k-1}^i > 0$.

• If $\bar{y}_{k-1}^i \le 0$, then $(J_k^T F_k)^i - (J_{k-1}^T F_k)^i \le 0$, so that $(J_k^T F_k)^i \le (J_{k-1}^T F_k)^i$. In this case, we redefine
\[
\bar{y}_{k-1}^i = \beta \max\left\{ \max\left\{ (J_k^T F_k)^i, (J_{k-1}^T F_k)^i \right\}, \delta \right\}, \tag{27}
\]
obtaining $b_k^i = \bar{y}_{k-1}^i / s_{k-1}^i > 0$.

Case II: Assume that $s_{k-1}^i < 0$.

• If $\hat{y}_{k-1}^i \ge 0$, then $(J_k^T F_k)^i - (J_k^T F_{k-1})^i \ge 0$, i.e., $(J_k^T F_k)^i \ge (J_k^T F_{k-1})^i$. In this case, we redefine
\[
\hat{y}_{k-1}^i = -\beta \max\left\{ \max\left\{ (J_k^T F_k)^i, (J_k^T F_{k-1})^i \right\}, \delta \right\}, \tag{28}
\]
so that $a_k^i = \hat{y}_{k-1}^i / s_{k-1}^i > 0$.

• If $\bar{y}_{k-1}^i \ge 0$, then $(J_k^T F_k)^i - (J_{k-1}^T F_k)^i \ge 0$, and thus $(J_k^T F_k)^i \ge (J_{k-1}^T F_k)^i$. In this case, we redefine
\[
\bar{y}_{k-1}^i = -\beta \max\left\{ \max\left\{ (J_k^T F_k)^i, (J_{k-1}^T F_k)^i \right\}, \delta \right\}, \tag{29}
\]
obtaining $b_k^i = \bar{y}_{k-1}^i / s_{k-1}^i > 0$.

3 The model algorithm

In this section we present the structured diagonal Hessian approximation model algorithm. The search direction of the new structured diagonal Hessian approximation can be obtained as follows:

\[
(A_k + B_k) d_k = -g_k, \tag{30}
\]
where
\[
A_k + B_k = \begin{cases}
I, & \text{if } k = 0,\\
\mathrm{diag}(a_k) + \mathrm{diag}(b_k), & \text{if } k \ge 1,
\end{cases} \tag{31}
\]
is a diagonal matrix, whose components are computed by
\[
a_k^i + b_k^i = \begin{cases}
\dfrac{\hat{y}_{k-1}^i + \bar{y}_{k-1}^i}{s_{k-1}^i}, & \text{if } s_{k-1}^i \ne 0,\\[1.5ex]
1, & \text{if } s_{k-1}^i = 0,
\end{cases}
\qquad i = 1, 2, \ldots, n, \tag{32}
\]
where the vectors $\hat{y}_{k-1}$ and $\bar{y}_{k-1}$ are defined in (13) and (17), respectively, with some components possibly redefined in (26)–(29), the safeguards of the previous section. Additionally, we safeguard $H_k = A_k + B_k$ against very small and very large values by means of a projection of its components onto a given scalar interval $[\ell, u]$ such that $0 < \ell \le 1$ and $u \ge 1$. Hence, the $i$-th component of matrix $H_k$ is
\[
h_k^i = \begin{cases}
\max\left\{ \min\left\{ \dfrac{\hat{y}_{k-1}^i + \bar{y}_{k-1}^i}{s_{k-1}^i},\, u \right\},\, \ell \right\}, & \text{if } s_{k-1}^i \ne 0,\\[1.5ex]
1, & \text{if } s_{k-1}^i = 0.
\end{cases} \tag{33}
\]
Our local structured diagonal Hessian approximation scheme for iteratively solving nonlinear least-squares problems is given by
\[
x_{k+1} = x_k - H_k^{-1} g_k. \tag{34}
\]
To globalize our algorithm, we use the efficient non-monotone line search of (Zhang and Hager, 2004). In this line search, if $d_k$ is a descent direction for the function $f$ at $x_k$, then the step length $\alpha$ should satisfy the following sufficient decrease condition:

\[
f(x_{k+1}) \le C_k + \theta \alpha g_k^T d_k, \quad \theta \in (0, 1), \tag{35}
\]
where
\[
C_0 = f(x_0), \qquad C_{k+1} = \frac{\eta_k Q_k C_k + f(x_{k+1})}{Q_{k+1}}, \tag{36}
\]

and $Q_0 = 1$, $Q_{k+1} = \eta_k Q_k + 1$, $\eta_k \in [0, 1]$. The sequence $\{C_k\}$ in the above line search scheme is a convex combination of the function values $f(x_0), f(x_1), \ldots, f(x_k)$. The parameter $\eta_k$ controls the degree of monotonicity: if $\eta_k = 0$ for each $k$, then the line search is purely monotone (Armijo-type); otherwise it is non-monotone. Based on these ideas, our globally convergent model algorithm is stated in Algorithm 1.
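For concreteness, the following sketch (Python/NumPy, hypothetical names, not the authors' MATLAB implementation) realizes the acceptance test (35) with backtracking by halving and the update (36) of the pair $(C_k, Q_k)$.

```python
def nonmonotone_backtracking(f, x, g, d, C, theta=1e-4):
    """Backtrack until the Zhang-Hager condition (35) holds.

    f : callable returning the objective value; g, d : gradient and direction;
    C : current non-monotone reference value C_k.
    Returns the accepted step alpha and f(x + alpha * d).
    """
    alpha, gTd = 1.0, g.dot(d)
    f_trial = f(x + alpha * d)
    while f_trial > C + theta * alpha * gTd:
        alpha *= 0.5
        f_trial = f(x + alpha * d)
    return alpha, f_trial

def update_reference(C, Q, f_new, eta):
    """Update (C_k, Q_k) -> (C_{k+1}, Q_{k+1}) according to Eq. (36)."""
    Q_new = eta * Q + 1.0
    C_new = (eta * Q * C + f_new) / Q_new
    return C_new, Q_new
```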

Algorithm 1: Structured Diagonal Hessian Approximation Method (SDHAM)

1. Input: $x_0 \in \mathbb{R}^n$, $\beta, \theta \in (0, 1)$, $0 \le \eta_{\min} \le \eta_{\max} \le 1$, $0 < \ell \le 1 \le u$, $\delta, \varepsilon > 0$, and $k_{\max} \in \mathbb{N}$.
2. Set $k = 0$, $H_k = I$, $Q_k = 1$. Compute $F_k$ and $g_k$, and set $C_k = f_k$.
3. While $\|g_k\| > \varepsilon$ and $k \le k_{\max}$ do
4.   Compute $d_k = -H_k^{-1} g_k$.
5.   Initialize $\alpha = 1$.
6.   While $f(x_k + \alpha d_k) > C_k + \alpha \theta g_k^T d_k$ do
7.     $\alpha = \alpha/2$.
     end
8.   Set $\alpha_k = \alpha$. Compute the next iterate $x_{k+1} = x_k + \alpha_k d_k$, and set $s_k = \alpha_k d_k$ and $\gamma_k = F_{k+1} - F_k$.
9.   Compute $H_{k+1}$ using Equation (33), where
\[
\hat{y}_k^i = (J_{k+1}^T \gamma_k)^i, \qquad \bar{y}_k^i = \left((J_{k+1} - J_k)^T F_{k+1}\right)^i, \qquad i = 1, 2, \ldots, n,
\]
     and $\hat{y}_k^i$, $\bar{y}_k^i$ are safeguarded against possible sign disagreement with $s_k^i$ using Equations (26)–(29) and $\delta > 0$.
10.  Choose $\eta_k \in [\eta_{\min}, \eta_{\max}]$ and compute $Q_{k+1}$ and $C_{k+1}$ using Equation (36).
11.  Set $k = k + 1$.
   end
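The sketch below (Python/NumPy, not the authors' MATLAB code) assembles the safeguarded diagonal $H_{k+1}$ of Equation (33) from already safeguarded vectors and computes the corresponding search direction; the default bound values mirror the roles of $\ell$ and $u$ in Algorithm 1 and are purely illustrative.

```python
import numpy as np

def sdham_diagonal(y_hat, y_bar, s, lower=1e-30, upper=1e30):
    """Safeguarded diagonal H_{k+1} of Eq. (33).

    y_hat, y_bar are assumed to be already safeguarded via (26)-(29);
    components with s^i = 0 receive the value 1.
    """
    ratio = np.divide(y_hat + y_bar, s, out=np.ones_like(s), where=s != 0)
    return np.where(s != 0, np.clip(ratio, lower, upper), 1.0)

def sdham_direction(h, g):
    """Search direction d_k = -H_k^{-1} g_k of Eq. (30), in O(n) operations."""
    return -g / h
```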

Remark 3.1 Since the matrix $H_k$ is diagonal, $H_k^{-1}$ is obtained by inverting the diagonal components of $H_k$, which are nonzero according to (33) and the safeguarding rule of Subsection 2.2. The product at line 4 of Algorithm 1 is simply the product between the elements of $H_k^{-1}$ and the corresponding components of $g_k$, computed in $O(n)$ operations.

Remark 3.2 To compute the gradient $g_k$, Algorithm 1 requires the product of the Jacobian transpose and the residual vector (i.e., $g_k = J_k^T F_k$). Furthermore, the proposed structured diagonal approximation of the Hessian matrix $H_k$ requires the vectors $\hat{y}_{k-1}$ and $\bar{y}_{k-1}$, both of which are obtained from the product of the Jacobian transpose and a vector. For structured problems with a known sparsity pattern, such products can be obtained by coding a loop-free subroutine that computes them directly, without forming or storing the Jacobian matrix, as illustrated below.
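As an illustration of Remark 3.2, consider the Broyden Tridiagonal residual of Table 1 in its usual form, $F_i(x) = (3 - 2x_i)x_i - x_{i-1} - 2x_{i+1} + 1$ with $x_0 = x_{n+1} = 0$. Its Jacobian is tridiagonal, so $J(x)^T v$ can be written with vectorized shifts and no explicit matrix; this sketch (Python/NumPy) is only indicative and is not the authors' implementation.

```python
import numpy as np

def broyden_tridiag_residual(x):
    """F_i(x) = (3 - 2 x_i) x_i - x_{i-1} - 2 x_{i+1} + 1, with x_0 = x_{n+1} = 0."""
    xm = np.concatenate(([0.0], x[:-1]))   # x_{i-1}
    xp = np.concatenate((x[1:], [0.0]))    # x_{i+1}
    return (3.0 - 2.0 * x) * x - xm - 2.0 * xp + 1.0

def broyden_tridiag_jtv(x, v):
    """Loop-free product (J(x)^T v)_j = (3 - 4 x_j) v_j - v_{j+1} - 2 v_{j-1}."""
    vm = np.concatenate(([0.0], v[:-1]))   # v_{j-1}
    vp = np.concatenate((v[1:], [0.0]))    # v_{j+1}
    return (3.0 - 4.0 * x) * v - vp - 2.0 * vm
```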

4 The evaluation complexity analysis

We now turn to the evaluation complexity analysis of Algorithm 1 (SDHAM). First we state our assumptions, which are assumed to hold throughout the subsequent analysis:

A. The level set $L(x_0) = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded, where $x_0 \in \mathbb{R}^n$ is a given initial iterate.

B. The objective function $f$ is twice continuously differentiable.

C. The gradient $\nabla f(x) = J(x)^T F(x)$ is uniformly continuous on an open convex set $\Omega$ that contains the level set $L(x_0)$.

Remark 4.1 Under Assumptions A and B, Assumption C implies that there exists $M > 0$ such that $\|\nabla^2 f(x)\| \le M$ for all $x \in \Omega$. Moreover, if $\nabla f$ is Lipschitz continuous on $\Omega$, then Assumption C holds, so that Assumption C is weaker than the Lipschitz continuity of $\nabla f$ on $\Omega$.

Next we prove that the directions generated by the Algorithm SDHAM satisfy the direction assumption of (Zhang and Hager, 2004).

Lemma 4.2 The directions generated by the Algorithm SDHAM satisfy the following relations:

(a) $g_k^T d_k \le -c_1 \|g_k\|^2$, and

(b) $\|d_k\| \le c_2 \|g_k\|$,

for all $k$, where $c_1, c_2 > 0$.

Proof (a) From the definition of $d_k$, and the boundedness of $H_k^{-1}$,
\[
g_k^T d_k = -g_k^T H_k^{-1} g_k = -\sum_{i=1}^{n} (g_k^i)^2 / h_k^i \le -\frac{1}{u}\sum_{i=1}^{n} (g_k^i)^2 = -c_1 \|g_k\|^2,
\]
where $0 < c_1 = 1/u$.

(b) It follows from the bounds $\ell \le h_k^i \le u$, for all $k$ and all $i$, and from the definition of $d_k$, that

\[
\|d_k\|^2 = g_k^T H_k^{-2} g_k = \sum_{i=1}^{n} \left(\frac{g_k^i}{h_k^i}\right)^2 \le \frac{1}{\ell^2}\sum_{i=1}^{n} (g_k^i)^2,
\]
therefore $\|d_k\| \le c_2 \|g_k\|$, with $0 < c_2 = 1/\ell$, concluding the proof.

The well-definiteness of the Algorithm SDHAM is proved next.

Proposition 4.3 Under Assumption C, the Algorithm SDHAM is well defined.

Proof Let $x_k \in \mathbb{R}^n$ be an iterate generated by the Algorithm SDHAM, and assume that $\|g_k\| \ne 0$. From Lemma 4.2 (a), the direction $d_k = -H_k^{-1} g_k$ satisfies $g_k^T d_k \le -c_1\|g_k\|^2 < 0$, being thus a descent direction for $f$ at $x_k$. Therefore, from Lemma 4.2 (b) and Assumption C, there exists a small enough step length $\hat{\alpha} > 0$ such that
\[
f(x_k + \hat{\alpha} d_k) \le C_k + \theta \hat{\alpha}\, g_k^T d_k
\]
is verified, and the next iterate $x_{k+1} = x_k + \hat{\alpha} d_k$ is well defined. For proving the existence of $\hat{\alpha}$, assume, for the sake of a contradiction, that

\[
f(x_k + \alpha_j d_k) > C_k + \theta \alpha_j g_k^T d_k, \tag{37}
\]
for all $\{\alpha_j\}$, $\alpha_j \ge 0$, $\alpha_j$ strictly decreasing, and such that $\lim_{j\to\infty} \alpha_j = 0$. Notice that, for all $k$, we have $f(x_k) = \frac{1}{2}\|F_k\|^2 \ge 0$, and $C_k$ is a convex combination of $C_{k-1}$ and $f(x_k)$. Since $C_0 = f(x_0)$, it follows that $C_k$ is also nonnegative for all $k$.

Now, in view of Lemma 4.2 (b), relation (37) gives

\[
f(x_k + \alpha_j d_k) > C_k - \theta \alpha_j c_2 \|g_k\|^2. \tag{38}
\]

Taking the limit as $j \to \infty$ in (38), since $\theta \alpha_j c_2 \|g_k\|^2 \to 0$, we reach

\[
f(x_k) \ge C_k. \tag{39}
\]

But
\[
C_k = \frac{\eta_{k-1} Q_{k-1} C_{k-1} + f(x_k)}{\eta_{k-1} Q_{k-1} + 1},
\]
i.e., $C_k$ is between $C_{k-1}$ and $f(x_k)$. Due to (39), it follows that $C_k = f(x_k)$. Consequently, $\eta_{k-1} Q_{k-1} C_{k-1} = 0$. As $C_{k-1} \ne 0$ and $Q_{k-1} \ne 0$, then $\eta_{k-1} = 0$, so the non-monotone line search turns into a monotone one. But, in such a case, (37) becomes

\[
f(x_k + \alpha_j d_k) > f(x_k) + \theta \alpha_j g_k^T d_k, \tag{40}
\]
\[
\Longrightarrow \quad \frac{f(x_k + \alpha_j d_k) - f(x_k)}{\alpha_j} > \theta g_k^T d_k. \tag{41}
\]
Taking (41) to the limit as $j \to \infty$, Assumption C gives $g_k^T d_k \ge \theta g_k^T d_k$, and since $g_k^T d_k < 0$, we have $\theta \ge 1$, a contradiction, concluding the proof.

The following result is obtained from Lemma 1.1 of (Zhang and Hager, 2004), being auxiliary for the analysis that comes in the sequel.

Lemma 4.4 The iterates generated by the Algorithm SDHAM satisfy $f_k \le C_k \le \zeta_k$, for all $k \ge 0$, where $\zeta_k$ is given by
\[
\zeta_k = \frac{1}{k+1}\sum_{i=0}^{k} f_i. \tag{42}
\]

Next, another auxiliary result for establishing the complexity analysis of the Algorithm SDHAM is presented.

Lemma 4.5 The sequence of iterates {xk} generated by the Algorithm SDHAM is contained in the level set L(x0).

Proof From the mechanism of Algorithm SDHAM, and Lemma 4.2 (a), we have

\[
f(x_{k+1}) \le C_k + \alpha_k \theta g_k^T d_k \le C_k - \alpha_k \theta c_1 \|g_k\|^2. \tag{43}
\]

Combining the updating relations (36) and (43),

\[
C_{k+1} = \frac{\eta_k Q_k C_k + f_{k+1}}{Q_{k+1}} \le \frac{\eta_k Q_k C_k + C_k - \alpha_k \theta c_1 \|g_k\|^2}{Q_{k+1}},
\]
and since $Q_{k+1} = \eta_k Q_k + 1$, we obtain

\[
C_{k+1} + \frac{\alpha_k \theta c_1 \|g_k\|^2}{Q_{k+1}} \le C_k, \quad \forall k. \tag{44}
\]

Notice that, by Lemma 4.4, $f_k \le C_k$. Therefore, from (44) we have

\[
f_{k+1} \le C_{k+1} \le C_k \le C_{k-1} \le \cdots \le C_0 = f_0,
\]
so that $\{x_k\} \subset L(x_0)$, and the proof is complete.

The worst-case evaluation complexity result of the Algorithm SDHAM is established below, resting upon the hypothesis that the gradient of $f$ is Lipschitz continuous.

Theorem 4.6 Let $f$ be the objective function of problem (7), and suppose that its gradient is Lipschitz continuous in $\mathbb{R}^n$, with constant $L > 0$. Assume that the Algorithm SDHAM stops with $\|g_k\| \le \varepsilon$. Then, the Algorithm SDHAM needs at most
\[
\left\lfloor \varrho\, \frac{f(x_0)}{\varepsilon^2} \right\rfloor
\]
function evaluations, where
\[
\varrho = \left( \theta c_1 \min\left\{ 1, \frac{(1-\theta)c_1}{L c_2^2} \right\} \right)^{-1}, \tag{45}
\]
and $c_1, c_2$ are the constants of Lemma 4.2.

Proof Under the Lipschitz continuity hypothesis, due to the Taylor expansion
\[
f(x + d) = f(x) + \int_0^1 \nabla f(x + \xi d)^T d \, d\xi,
\]
it follows that
\[
f(x + d) \le f(x) + \nabla f(x)^T d + \frac{L}{2}\|d\|^2, \tag{46}
\]
for all $x, d \in \mathbb{R}^n$. Now, let us take $\alpha > 0$ such that
\[
\alpha \le \frac{2(1-\theta)c_1}{L c_2^2} =: \nu, \tag{47}
\]
from which
\[
\alpha \le \frac{2(1-\theta)\left(-g_k^T d_k\right)}{L c_2^2 \|g_k\|^2} \le \frac{2(1-\theta)\left(-g_k^T d_k\right)}{L \|d_k\|^2},
\]
where the first inequality comes from Lemma 4.2 (a), and the second, from Lemma 4.2 (b). Therefore,
\[
\alpha \le \frac{2(\theta - 1)\, g_k^T d_k}{L \|d_k\|^2},
\]
and hence
\[
g_k^T d_k + \frac{L}{2}\alpha\|d_k\|^2 \le \theta g_k^T d_k.
\]

Multiplying the inequality above by $\alpha$, adding $f(x_k)$ to both sides, and noticing that, from Lemma 4.4, $f(x_k) \le C_k$, we obtain
\[
f(x_k) + \alpha g_k^T d_k + \frac{L}{2}\alpha^2 \|d_k\|^2 \le C_k + \alpha\theta g_k^T d_k.
\]
Applying relation (46), we have

\[
f(x_k + \alpha d_k) \le C_k + \alpha\theta g_k^T d_k. \tag{48}
\]

Since there exists $\alpha > 0$, upper bounded by $\nu$ given in (47), that satisfies the non-monotone decrease condition (48), the backtracking mechanism of the Algorithm SDHAM guarantees that
\[
\alpha_k \ge \min\left\{ 1, \frac{\nu}{2} \right\} = \min\left\{ 1, \frac{(1-\theta)c_1}{L c_2^2} \right\}. \tag{49}
\]
As a result, the number of backtracks necessary to achieve (48) is bounded by a constant that depends only on $\theta$, $c_1$, $c_2$ and $L$. Moreover, whenever (48) is satisfied, we obtain
\[
f(x_{k+1}) \le C_k - \theta\alpha_k c_1 \|g_k\|^2 \le C_k - \theta c_1 \min\left\{ 1, \frac{(1-\theta)c_1}{L c_2^2} \right\} \|g_k\|^2,
\]
where the first inequality comes from Lemma 4.2 (a), and the second one, from (49). Hence, if $\|g_k\| > \varepsilon$, then
\[
f(x_{k+1}) \le C_k - \theta c_1 \min\left\{ 1, \frac{(1-\theta)c_1}{L c_2^2} \right\} \varepsilon^2. \tag{50}
\]

From Lemma 4.5, $C_k \le f(x_0)$ for all $k$, and thus the total number of function evaluations performed while $\|g_k\| > \varepsilon$ cannot exceed
\[
\frac{f(x_0) - f_{\min}}{\theta c_1 \min\left\{ 1, \dfrac{(1-\theta)c_1}{L c_2^2} \right\} \varepsilon^2},
\]
in which $f_{\min} = 0$ is a lower bound for the objective function $f$ of problem (7), which may be attained or not, depending on whether the problem has zero or nonzero residue. Consequently, if the Algorithm SDHAM stops with $\|g_k\| \le \varepsilon$, then it has performed, at most,
\[
\left\lfloor \varrho\, \frac{f(x_0)}{\varepsilon^2} \right\rfloor
\]
function evaluations, where the constant $\varrho$ satisfies (45), concluding the proof.

Remark 4.7 We stress that Theorem 4.6 also encompasses the well-definiteness of the Algorithm SDHAM. Proposition 4.3, however, demands a slightly weaker hypothesis, cf. the reasoning of Remark 4.1. Moreover, alternatively to Theorem 4.6, the global convergence of the Algorithm SDHAM could have been obtained along the lines of Theorem 2.2 of (Zhang and Hager, 2004).

5 Numerical experiments

In this section we present the numerical experiments to assess the effectiveness of the Algorithm SDHAM for solving structured nonlinear least-squares problems. SDHAM was compared with the methods proposed by (Kobayashi et al., 2010) and (Han et al., 2008). The former is a conjugate gradient method for nonlinear least-squares that exploits the structure of the Hessian of the objective function, denoted CGSQN, which stands for conjugate gradient structured quasi-Newton. The latter is a multivariate spectral gradient method, referred to as MSGM, devised for unconstrained minimization, so that the special structure of the Hessian of the least-squares objective function is ignored. Nevertheless, its search directions are computed using a diagonal Hessian approximation similar to the one we have proposed within the Algorithm SDHAM, as detailed in (6). SDHAM is implemented using the non-monotone line search of (Zhang and Hager, 2004) with a quadratic-cubic backtracking strategy for computing the step size (Nocedal and Wright, 2006). The parameters are chosen as $\eta_{\min} = 0.1$, $\eta_{\max} = 0.85$, $\beta = 0.1$, $\ell = 10^{-30}$, $u = 10^{30}$, and $\delta = 10^{-4}$. We

have slightly relaxed the choice of (Santos and Silva, 2014) by defining the sequence $\{\eta_k\}$ of the Algorithm SDHAM as follows:

\[
\eta_k = 0.75\, e^{-(k/45)^2} + 0.1. \tag{51}
\]

It is clear from (51) that $0.1 \le \eta_k \le 0.85$. The CGSQN method was implemented using the strong Wolfe conditions

\[
\begin{array}{l}
f(x_k + \alpha_k d_k) \le f(x_k) + \theta \alpha_k g_k^T d_k,\\[1ex]
\left| g(x_k + \alpha_k d_k)^T d_k \right| \le -\sigma g_k^T d_k,
\end{array} \tag{52}
\]
for computing the step size $\alpha_k$, with $\theta = 0.0001$ and $\sigma = 0.1$. As recommended by the authors in (Kobayashi et al., 2010), we used the choices $\rho = 1$, $t = 1$ for the constants that appear in the CG algorithm. Furthermore, at each iteration ($k \ge 1$), the initial step length was chosen as

\[
\alpha_k^0 = \alpha_{k-1} \frac{g_{k-1}^T d_{k-1}}{g_k^T d_k}, \qquad \alpha_0 = 1.
\]
The MSGM was implemented using the Grippo, Lampariello and Lucidi non-monotone line search (Grippo et al., 1986) for computing the step length $\lambda_k$, with $\theta = 0.0001$ and $M = 10$. If the computed step size is not accepted, then the next trial step length is taken as $\alpha = 0.5\alpha$. The safeguarding parameter that keeps the sequence of the diagonal matrix entries (6) of the MSGM method uniformly bounded was chosen in the following way:
\[
\delta_k = \begin{cases}
1, & \text{if } \|g_k\| > 1,\\
\|g_k\|^{-1}, & \text{if } 10^{-5} \le \|g_k\| \le 1,\\
10^{5}, & \text{if } \|g_k\| < 10^{-5}.
\end{cases} \tag{53}
\]
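The rule (53) is straightforward to code; the sketch below (Python/NumPy, illustrative only) returns the MSGM safeguarding parameter for a given gradient.

```python
import numpy as np

def msgm_delta(g):
    """Safeguarding parameter delta_k of Eq. (53) for the MSGM method."""
    gnorm = np.linalg.norm(g)
    if gnorm > 1.0:
        return 1.0
    if gnorm >= 1e-5:          # 1e-5 <= ||g_k|| <= 1
        return 1.0 / gnorm
    return 1e5                 # ||g_k|| < 1e-5
```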

All algorithms were coded in MATLAB R2017a, and run on a DELL PC with an Intel Core i3 processor, 2.30GHz CPU speed, 4GB of RAM and the Windows 7 operating system. We use the same stopping criterion for all the tested algorithms, namely $\|g_k\| \le 10^{-4} = \varepsilon$. We addressed 28 large-scale problems of dimensions $n = m \in \{1000, 5000, 10000\}$, except for Function 21, for which we have used $n = m \in \{1500, 7500, 15000\}$, together with 12 small-scale problems, giving us a total of 96 instances. In Table 1, we list the test function names, including their references, the initial point used in the numerical experiments, and the residual size. Further details of the test problems can be found in the corresponding references.

The numerical results are shown in Tables 2, 3, 4 and 5, where we report the number of iterations (ITER), the number of function evaluations (FEVAL), the number of matrix-vector products (MVP), the CPU time in seconds (TIME), and the value of the squared residual at the stopping point, $\frac{1}{2}\|F(x^*)\|^2$ (VALf), for each of the methods tested. Failures are reported and denoted by F if the number of iterations is greater than 10000, the number of function evaluations is greater than 50000, or the norm of the gradient is not a number.

We see from Tables 2–5 that, out of the 96 instances, SDHAM failed for 3, whereas its closest competitor MSGM failed for 14 problems, and CGSQN failed for 33 problems. It is worth mentioning that the method we have proposed failed to solve 3 problems for which the other two methods also failed. We have used the performance profiles of (Dolan and Moré, 2002) to plot graphs in log2 scale according to the data from Tables 2–5, considering the 93 problems obtained after eliminating those 3 problems that could not be solved by any of the tested methods.

It is clear from Figure 1 that SDHAM successfully solves and wins for 60% of the problems with the fewest number of iterations, and it also tops the graph for the fraction of $\tau > 2$. MSGM solves and wins for 50% of the problems, and lastly, CGSQN solves and wins for about 15% of the problems. In terms of the number of function evaluations, SDHAM outperforms MSGM and CGSQN. This

can be easily observed from Figure 2. Based on this, we claim that in most of the experiments SDHAM requires the least number of backtracks for the step length computation, implying that the computed direction is mostly acceptable, as further discussed below. Figure 3, on the other hand, shows that SDHAM requires more matrix-vector products (MVP) than MSGM. This is not surprising, because SDHAM is a structured algorithm, whereas MSGM is not. In contrast with the other competitors, it can be observed from Figure 4 that the SDHAM algorithm requires less CPU time to reach the approximate solution.

In addition, we have assessed the effectiveness of the SDHAM direction using comparative box-plots of the distribution of the ratios FEVAL/ITER. After eliminating all the failures for each of the methods, we have prepared the corresponding box-plots for each method (MSGM, 82 problems; CGSQN, 63 problems; and SDHAM, 93 problems). Figure 5 shows that SDHAM solved 75% of the 93 problems with less than 1.5 function evaluations per iteration, whereas MSGM solved 75% of the 82 problems with less than 3.0 function evaluations per iteration, and CGSQN solved 75% of the 63 problems with no more than 3.8 function evaluations per iteration. These values imply that, in these experiments, SDHAM requires the least number of backtracks for the step length computation, compared to its counterparts.

Table 1: List of test functions with reference, starting point and residual size

S/N | Function name (reference) | Starting point | Residual size
Large-scale problems:
1 | Penalty Function I (La Cruz et al., 2004) | (1/3, 1/3, ..., 1/3) | zero
2 | Variably Dimensioned (Moré et al., 1981) | (1 − 1/n, 1 − 2/n, ..., 0) | zero
3 | Trigonometric Function (Moré et al., 1981) | (1/n, ..., 1/n) | zero
4 | Discrete Boundary Value (Moré et al., 1981) | (1/(n+1)(1/(n+1) − 1), ...) | zero
5 | Full Rank (Moré et al., 1981) | (1, 1, ..., 1) | small
6 | Linear Rank 1 Function (Moré et al., 1981) | (1, 1, ..., 1) | large
7 | Problem 202 (Lukšan and Vlček, 2003) | (2, 2, ..., 2) | zero
8 | Problem 206* (Lukšan and Vlček, 2003) | (1/n, ..., 1/n) | zero
9 | Problem 212 (Lukšan and Vlček, 2003) | (0.5, ..., 0.5) | zero
10 | AR Boundary Value Problem* (Lukšan and Vlček, 2003) | (1/n, ..., 1/n) | zero
11 | Strictly Convex Function I (Raydan, 1997) | (1/n, 2/n, ..., 1) | large
12 | Strictly Convex Function II (Raydan, 1997) | (1, 1, ..., 1) | large
13 | Brown Almost Linear (Moré et al., 1981) | (0.5, 0.5, ..., 0.5) | zero
14 | Exponential Function 1 (La Cruz et al., 2004) | (n/(n−1), n/(n−1), ..., n/(n−1)) | zero
15 | Exponential Function 2 (La Cruz et al., 2004) | (1/n², 1/n², ..., 1/n²) | zero
16 | Singular Function (La Cruz et al., 2004) | (1, 1, ..., 1) | zero
17 | Logarithmic Function (La Cruz et al., 2004) | (1, 1, ..., 1) | zero
18 | Trigonometric Exponential System* (Lukšan and Vlček, 2003) | (4, 4, ..., 4) | zero
19 | Extended Freudenstein & Roth (La Cruz et al., 2004) | (6, 3, 6, 3, ..., 6, 3) | zero
20 | Extended Powell Singular (La Cruz et al., 2004) | (1.5E-4, ..., 1.5E-4) | zero
21 | Function 21* (La Cruz et al., 2004) | (−1, −1, ..., −1) | zero
22 | Singular Broyden (Lukšan and Vlček, 2003) | (−1, −1, ..., −1) | zero
23 | Broyden Tridiagonal Function (Moré et al., 1981) | (−1, −1, ..., −1) | zero
24 | Generalized Broyden Tridiagonal (Lukšan and Vlček, 2003) | (−1, −1, ..., −1) | zero
25 | Extended Rosenbrock (Moré et al., 1981) | (−1, 1, −1, 1, ..., −1, 1) | zero
26 | Extended Himmelblau (Jamil and Yang, 2013) | (1, 1/n, 1, 1/n, ..., 1, 1/n) | zero
27 | Function 27 (La Cruz et al., 2004) | (100, 1/n², 1/n², ..., 1/n²) | zero
28 | Trigonometric Logarithmic Function** | (1, 1, ..., 1) | zero
Small-scale problems:
29 | Beale Function (Moré et al., 1981) | (1, 1) | zero
30 | Bard Function* (Moré et al., 1981) | (−1000, −1000, −1000) | large
31 | Brown Badly Scaled (Moré et al., 1981) | (1, 1) | zero
32 | Branin RCOS Function (Jamil and Yang, 2013) | (−5, 0) | small
33 | Jennrich and Sampson* (Moré et al., 1981) | (0.2, 0.2) | large
34 | Box 3D Function* (Moré et al., 1981) | (0, 0, 1) | zero
35 | Rank Deficient Jacobian (Gonçalves and Santos, 2016) | (−1, 1) | small
36 | Rosenbrock Function* (Moré et al., 1981) | (−1, 1) | zero
37 | Parameterized Problem (Huschens, 1994) | (10, 10) | small
38 | Freudenstein & Roth Function (Moré et al., 1981) | (0.5, −2) | large
39 | Linear Rank 1 with zero cols. & rows (Moré et al., 1981) | (1, 1, ..., 1) | small
40 | Linear Rank 2 (La Cruz et al., 2004) | (1, 1/n, 1/n, ..., 1/n) | zero
*The initial point is different from the source. **See Appendix for the details.

Table 2: Numerical Results for MSGM, CGSQN and SDHAM on large-scale Problems 1 to 9, F means failure.

ITER FEVAL MVP TIME VALf PROB m = n MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM 1000 6 F 5 8 F 6 6 F 16 0.04937 F 0.021117 2.56E-06 F 1.23E-05 1 5000 5 F 15 6 F 16 6 F 46 0.466885 F 0.448912 4.84E-05 F 7.92E-05 10000 5 F 23 6 F 24 6 F 70 1.270691 F 0.784702 2.60E-06 F 1.42E-04 1000 1 F 3 2 F 5 2 F 10 0.119752 F 0.02357 2.23E-22 F 5.17E-19 2 5000 F F 4 F F 6 F F 13 F F 0.162431 F F 3.37E-25 10000 2 F 4 26 F 6 3 F 13 0.310928 F 0.245139 3.81E-23 F 5.83E-25 1000 5 F 8 18 F 41 7 F 25 0.027587 F 0.163666 1.29E-20 F 6.27E-19 3 5000 F F F F F F F F F F F F F F F 10000 F F F F F F F F F F F F F F F 1000 20 22 28 48 36 34 21 89 85 0.151333 0.193595 0.256873 3.49E-08 1.74E-08 3.70E-08 4 5000 6 9 12 20 18 15 7 37 37 0.248217 0.3218 0.504632 5.63E-09 2.82E-09 3.40E-09 10000 3 8 6 18 17 9 4 33 19 0.370197 0.422372 0.461165 2.02E-09 1.26E-09 1.00E-09 1000 2 F 2 3 F 3 3 F 7 0.17259 F 0.014425 0.502 F 0.502 5 5000 2 F 2 3 F 3 3 F 7 0.06091 F 0.064377 0.5004 F 0.5004 10000 2 F 2 3 F 3 3 F 7 0.03268 F 0.261286 0.5002 F 0.5002 1000 F F 3 F F 5 F F 10 F F 0.034039 F F 124.8126 6 5000 F F 3 F F 5 F F 10 F F 0.072743 F F 624.8125 10000 F F 3 F F 5 F F 10 F F 0.125148 F F 1.25E+03 16 1000 7 11 5 9 46 6 8 45 16 0.029173 0.044294 0.018465 5.01E-15 4.07E-09 1.39E-10 7 5000 7 14 5 9 53 6 8 57 16 0.060253 0.176751 0.307302 2.50E-14 2.21E-11 6.94E-10 10000 7 14 5 9 53 6 8 57 16 0.283629 0.5305 0.250679 5.01E-14 4.35E-11 1.39E-09 1000 32 20 38 60 33 45 33 81 115 0.209246 0.071715 0.20781 3.66E-08 2.17E-08 3.03E-08 8 5000 6 6 6 22 14 8 7 25 19 0.207378 0.077327 0.173812 4.42E-09 2.97E-09 2.83E-09 10000 3 5 3 19 12 5 4 21 10 0.262789 0.167809 0.185471 2.50E-09 1.07E-09 7.16E-10 1000 7 15 7 9 45 8 8 61 22 0.066021 0.05367 0.167718 1.37E-10 7.48E-10 3.32E-12 9 5000 7 15 7 9 48 8 8 61 22 0.199102 0.200552 0.239117 1.37E-10 7.61E-10 3.32E-12 10000 7 13 7 9 47 8 8 53 22 0.24259 0.345388 0.230193 1.37E-10 2.96E-09 3.32E-12 Table 3: Numerical Results for MSGM, CGSQN and SDHAM on large-scale Problems 10 to 18, F means failure.

ITER FEVAL MVP TIME VALf PROB m = n MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM 1000 39 20 51 82 33 67 40 81 154 0.109706 0.06443 0.252709 3.32E-08 2.07E-08 2.80E-08 10 5000 8 6 11 24 14 17 9 25 34 0.197314 0.087121 0.271173 4.70E-09 2.96E-09 3.75E-09 10000 2 5 2 18 12 5 3 21 7 0.139015 0.216182 0.14838 2.78E-09 1.07E-09 3.24E-09 1000 1 8 4 2 32 5 2 33 13 0.156459 0.044034 0.059165 0.5 0.5 0.5 11 5000 1 8 4 2 32 5 2 33 13 0.062429 0.042786 0.161094 0.5 0.5 0.5 10000 1 8 4 2 32 5 2 33 13 0.199935 0.073398 0.137328 0.5 0.5 0.5 1000 9 F 8 10 F 11 10 F 25 0.021313 F 0.042862 1.67E+06 F 1.67E+06 12 5000 9 F 9 10 F 12 10 F 28 0.129961 F 0.370585 2.08E+08 F 2.08E+08 10000 9 F 12 10 F 15 10 F 37 0.271169 F 0.374726 1.67E+09 F 1.67E+09 1000 3 F 4 5 F 5 4 F 13 0.007375 F 0.031734 1.24E-07 F 1.24E-07 13 5000 4 F 4 7 F 6 5 F 13 0.066471 F 0.144092 5.00E-09 F 5.00E-09 10000 F F F F F F F F F F F F F F F 1000 4 14 5 13 26 6 5 57 16 0.060312 0.039216 0.045708 7.37E-08 4.53E-08 4.53E-08 14 5000 2 8 4 13 17 5 3 33 13 0.056524 0.155331 0.140486 1.10E-07 3.82E-08 3.62E-08 10000 2 6 3 14 14 4 3 25 10 0.167216 0.181063 0.123839 5.49E-08 2.99E-08 7.24E-08 1000 2 18 2 22 43 4 3 73 7 0.092363 0.079672 0.022668 1.13E-11 1.01E-10 1.13E-11 15 5000 2 30 2 27 62 4 3 121 7 0.167989 0.412895 0.074947 4.53E-13 7.72E-12 4.53E-13 10000 2 35 2 29 70 4 3 141 7 0.292698 0.675369 0.174287 1.13E-13 2.69E-12 1.13E-13 17 1000 31 109 29 35 170 31 32 437 88 0.218564 0.541997 0.347942 3.23E-06 1.69E-06 1.70E-06 16 5000 38 61 33 41 139 35 39 245 100 0.835618 1.122087 0.64672 1.52E-06 1.78E-06 2.20E-06 10000 43 41 35 48 121 37 44 165 106 0.950093 1.459593 1.224682 9.68E-07 4.66E-06 1.75E-06 1000 1 16 6 2 41 7 2 65 19 0.685516 0.060063 0.038491 0 3.89E-09 5.18E-12 17 5000 1 16 6 2 49 7 2 65 19 0.027315 0.34729 0.323921 0 3.29E-10 2.54E-11 10000 1 12 6 2 43 7 2 49 19 0.181986 0.310444 0.213843 0 1.44E-10 5.06E-11 1000 26 F 17 36 F 19 27 F 52 0.280566 F 0.327048 1.94E-11 F 1.37E-12 18 5000 28 F 17 48 F 19 29 F 52 0.666795 F 0.740089 9.39E-12 F 3.85E-13 10000 28 F 16 34 F 18 29 F 49 0.966948 F 0.991824 2.80E-11 F 2.65E-11 Table 4: Numerical Results for MSGM, CGSQN and SDHAM on large-scale Problems 19 to 28, F means failure.

ITER FEVAL MVP TIME VALf PROB m = n MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM 1000 73 F 31 468 F 35 74 F 94 0.485248 F 0.172042 4.18E-10 F 3.25E-13 19 5000 F F 31 F F 35 F F 94 F F 0.605424 F F 1.62E-12 10000 F F 31 F F 35 F F 94 F F 0.757161 F F 3.25E-12 1000 2 5 1 15 25 2 3 21 4 0.257663 0.030449 0.009738 1.21E-12 3.28E-12 1.21E-12 20 5000 2 5 1 15 25 3 3 21 4 0.171162 0.077916 0.131582 6.03E-12 1.64E-11 6.04E-12 10000 2 5 1 15 25 3 3 21 4 0.294376 0.285255 0.110694 1.21E-11 3.28E-11 1.21E-11 1500 71 38 88 101 86 102 72 153 265 0.501871 0.254971 0.852861 1.79E-10 1.01E-10 5.66E-10 21 7500 72 39 103 102 88 118 73 157 310 1.116035 0.869544 2.26944 7.08E-10 1.08E-10 6.42E-10 15000 75 40 106 107 91 124 76 161 319 1.696645 1.437185 3.708238 2.49E-10 4.84E-11 4.78E-10 1000 20 136 36 27 222 38 21 545 109 0.163765 0.522557 0.174102 3.23E-08 1.75E-08 9.70E-08 22 5000 22 484 37 30 578 39 23 1937 112 0.455263 3.565122 1.054517 5.15E-08 9.74E-09 1.32E-07 10000 21 980 37 28 1073 39 22 3921 112 0.503334 10.03143 1.139421 2.09E-07 2.50E-08 2.68E-07 1000 19 110 13 32 182 15 20 441 40 0.045901 0.314076 0.045379 3.08E-12 4.18E-13 6.56E-12 23 5000 17 45 13 31 119 15 18 181 40 0.2226 0.452803 0.296201 8.06E-12 2.14E-12 6.56E-12 18 10000 16 76 13 28 153 15 17 305 50 0.366592 1.195828 0.343847 7.25E-12 6.37E-12 6.56E-12 1000 13 141 13 20 215 15 14 565 40 0.028782 0.46639 0.046192 6.52E-12 5.19E-13 1.56E-13 24 5000 13 43 13 20 121 15 14 173 40 0.184298 0.479552 0.230443 6.79E-12 3.57E-12 1.56E-13 10000 13 209 13 20 291 15 14 837 40 0.313876 2.48811 0.357831 6.82E-12 1.02E-11 1.56E-13 1000 F 1 1 F 2 2 F 5 4 F 0.015252 0.011977 F 0 0 25 5000 F 1 1 F 2 2 F 5 4 F 0.010777 0.048624 F 0 0 10000 F 1 1 F 2 2 F 5 4 F 0.232439 0.059754 F 0 0 1000 28 97 28 49 148 33 29 389 85 0.188102 0.218337 0.177441 7.98E-11 2.99E-10 2.78E-10 26 5000 29 37 30 45 88 35 30 149 91 0.592021 0.387213 0.440403 3.01E-11 1.33E-10 3.70E-12 10000 30 36 32 47 89 36 32 145 97 0.860893 0.594024 0.74386 2.02E-10 4.30E-11 6.27E-11 1000 30 13 25 31 75 27 31 53 76 0.344238 0.053442 0.082136 2.30E-07 2.69E-08 1.45E-07 27 5000 30 20 25 31 80 27 31 89 76 0.365395 0.282087 0.356092 2.30E-07 2.56E-07 1.45E-07 10000 30 19 25 31 81 27 31 77 76 0.455881 0.422523 0.497633 2.30E-07 8.29E-10 1.45E-07 1000 1 12 6 2 45 7 2 49 19 0.046555 0.222659 0.030187 0 1.04E-09 2.73E-12 28 5000 1 10 6 2 45 7 2 41 19 0.025227 0.325564 0.174144 0 4.43E-10 1.33E-11 10000 1 8 6 2 39 7 2 33 19 0.055576 0.699232 0.207987 0 3.36E-14 2.65E-11 Table 5: Numerical Results for MSGM, CGSQN and SDHAM on small-scale Problems 29 to 40, F means failure.

ITER FEVAL MVP TIME VALf PROB n, m MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM MSGM CGSQN SDHAM 29 2,3 174 F 65 400 F 86 175 F 196 0.422211 F 0.479203 2.74E-08 F 9.26E-09 30 3,15 2 13 1 3 74 3 3 53 4 0.130271 0.20183 0.033895 8.72E+00 8.7174 8.72E+00 31 2,3 F F 10 F F 12 F F 31 F F 0.188574 F F 1.05E-27 32 2,2 21 F 10 35 F 12 22 F 31 0.185068 F 0.053585 1.99E-01 F 1.99E-01 33 2,20 5 15 10 11 77 11 6 61 25 0.016485 0.054146 0.054585 62.1811 62.1811 62.1811 19 34 3,10 27 22 34 40 55 43 28 89 103 0.303147 0.075935 0.191894 9.12E-10 1.84E-09 2.45E-09 35 2,3 7 14 7 8 48 9 8 57 22 0.14489 0.047093 0.021486 1.29E+00 1.29E+00 1.29E+00 36 2,2 F 1 1 F 2 2 F 5 4 F 0.002377 0.015603 F 0.00E+00 0.00E+00 37 2,3 8 F 6 24 F 8 9 F 19 6.91E-02 F 0.017107 5.00E-01 F 5.00E-01 38 2,2 693 692 64 2084 747 81 694 2769 193 0.549462 0.521478 0.062439 2.45E+01 2.45E+01 2.45E+01 39 10,10 2 F 1 3 F 3 3 F 4 8.66E-02 F 0.050119 1.8235 F 1.8235 40 10,10 10 F 9 37 F 11 11 F 28 0.122116 F 0.048189 1.96E-10 F 3.44E-20 1

[Figure: performance-profile curves for MSGM, CGSQN and SDHAM; horizontal axis τ (log2 scale), vertical axis P(τ).]
Figure 1: Performance profiles (log2 scaled) with respect to the number of iterations (ITER) for problems 1-40.

[Figure: performance-profile curves for MSGM, CGSQN and SDHAM; horizontal axis τ (log2 scale), vertical axis P(τ).]
Figure 2: Performance profiles (log2 scaled) with respect to the number of function evaluations (FEVAL) for problems 1-40.

[Figure: performance-profile curves for MSGM, CGSQN and SDHAM; horizontal axis τ (log2 scale), vertical axis P(τ).]
Figure 3: Performance profiles (log2 scaled) with respect to the number of matrix-vector products (MVP) for problems 1-40.

[Figure: performance-profile curves for MSGM, CGSQN and SDHAM; horizontal axis τ (log2 scale), vertical axis P(τ).]
Figure 4: Performance profiles (log2 scaled) with respect to the CPU time (TIME) for problems 1-40.

Figure 5: Box-plots of the distributions of the ratios FEVAL/ITER for the problems actually solved by MSGM, CGSQN and SDHAM. The corresponding medians are 1.7, 2.4 and 1.2, respectively.

6 Final remarks

We have proposed a structured diagonal Hessian approximation method for solving nonlinear least-squares problems (SDHAM). The presented algorithm is a Jacobian-free strategy, requiring neither forming nor storing the Jacobian matrix. Instead, it requires only a loop-free subroutine for computing the product of the Jacobian transpose by a vector. This is an advantage, especially for very large-scale and structured problems. In addition, our proposed method is suitable for both zero and nonzero residual problems.

To the best of our knowledge, this is the first time an attempt is made to diagonally approximate the complete Hessian of the objective function of (7), taking into account its particular structure. Although the approach is not new for general unconstrained optimization problems, we have addressed a special case of the unconstrained optimization problem, that is, minimizing sums of squares of nonlinear functions, taking advantage of the intrinsic structure of such a problem.

In the Algorithm SDHAM, the direction is always minus the product of a carefully devised positive definite diagonal approximation of the inverse of the Hessian matrix with the gradient. Consequently, any suitable line search strategy can be used for the global convergence of the algorithm. We have chosen Zhang and Hager's non-monotone line search. We have proved that, for obtaining an ε-accurate stationary point (i.e., $\|g_k\| \le \varepsilon$), the Algorithm SDHAM requires $O(\varepsilon^{-2})$ function evaluations. Our preliminary numerical results show that SDHAM efficiently solved more than 90% of the tested problems within the fewest number of iterations and function evaluations. The results also reveal that SDHAM is slightly faster than MSGM and CGSQN, matrix-free algorithms with similar requirements to SDHAM.

In future research, we intend to investigate further techniques for obtaining good approximations of the Hessian matrix, taking into account its special structure, and to develop the local convergence analysis of the Algorithm SDHAM. Because of the low memory requirements of SDHAM, we hope it will perform well when applied to practical data fitting and imaging problems. Indeed, this is also another subject for future research. Furthermore, the Algorithm SDHAM can be easily modified to address general large-scale nonlinear systems of equations directly, without the least-squares framework.

Appendix

Problem 28 in Table 1 is a modification of the Logarithmic Function (La Cruz et al., 2004), the 17th problem of Table 1, being defined as follows.

28. Trigonometric Logarithmic Function:
\[
F_i(x) = \ln(x_i + 1) - \frac{\sin(x_i)}{n}, \quad \text{for } i = 1, 2, \ldots, n.
\]

Acknowledgements This research is supported by the Academic Staff Training and Development (AST&D), Tertiary Education Trust Fund (TETFund), Nigeria (ACT 2011), and by the Brazilian agencies FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, grants 2013/05475-7 and 2013/07375-0) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, grant 302915/2016-8). We thank Dr. Douglas S. Gonçalves, affiliated with the Federal University of Santa Catarina, Brazil, for providing us with useful suggestions at the preliminary stage of this work.

References

M. Al-Baali. Quasi-Newton algorithms for large-scale nonlinear least-squares. In High Performance Algorithms and Software for Nonlinear Optimization, pages 1–21. Springer, 2003.

22 M. Al-Baali and R. Fletcher. Variational methods for non-linear least-squares. Journal of the Operational Research Society, 36(5):405–421, 1985. M. C. Bartholomew-Biggs. The estimation of the Hessian matrix in nonlinear least squares problems with non-zero residuals. Mathematical Programming, 12(1):67–80, 1977. J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8(1):141–148, 1988. Stefania Bellavia, Coralia Cartis, Nicholas I. M. Gould, Benedetta Morini, and Philippe L. Toint. Convergence of a regularized euclidean residual algorithm for nonlinear least-squares. SIAM Journal on Numerical Analysis, 48(1):1–29, 2010. J. T. Betts. Solving the nonlinear least square problem: Application of a general method. Journal of Optimization Theory and Applications, 18(4):469–483, 1976. E. G. Birgin, J. L. Gardenghi, J. M. Mart´ınez,S. A. Santos, and Ph. L. Toint. Worst-case evalu- ation complexity for unconstrained nonlinear optimization using high-order regularized models. Mathematical Programming, 163(1-2):359–368, 2017. A. Bj¨orck. Numerical methods for least squares problems. SIAM, Philadelphia, 1996. K. M Brown and J. E. Dennis. A new algorithm for nonlinear least-squares curve fitting. Technical report, Cornell University, New York, 1970. Coralia Cartis, Nicholas I. M. Gould, and Philippe L. Toint. On the evaluation complexity of cubic regularization methods for potentially rank-deficient nonlinear least-squares problems and its relevance to constrained nonlinear optimization. SIAM Journal on Optimization, 23(3):1553– 1574, 2013. Coralia Cartis, Nicholas I. M. Gould, and Philippe L. Toint. On the evaluation complexity of constrained nonlinear least-squares and general constrained nonlinear optimization using second- order methods. SIAM Journal on Numerical Analysis, 53(2):836–851, 2015. A. Cornelio. Regularized nonlinear least squares methods for hit position reconstruction in small gamma cameras. Applied Mathematics and Computation, 217(12):5589–5595, 2011. S. Deng and Z. Wan. A diagonal quasi-Newton spectral conjugate gradient algorithm for nonconvex unconstrained optimization problems. In Proceedings of the 5th International Conference on Optimization and Control with Applications, pages 305–310, December 2012. J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, volume 16. SIAM, 1996. J. E. Dennis, D. M. Gay, and R. E. Walsh. An adaptive nonlinear least-squares algorithm. ACM Transactions on Mathematical Software (TOMS), 7(3):348–368, 1981. J.E. Dennis. Some computational techniques for the nonlinear least squares problem. In George D. Byrne and Charles A. Hall, editors, Numerical Solution of Systems of Nonlinear Algebraic Equa- tions, pages 157–183. Academic Press, New York, 1973. E. D. Dolan and J. J. Mor´e.Benchmarking optimization software with performance profiles. Math- ematical programming, 91(2):201–213, 2002. R. Fletcher and C. Xu. Hybrid methods for nonlinear least squares. IMA Journal of Numerical Analysis, 7(3):371–389, 1987. G. Golub and V. Pereyra. Separable nonlinear least squares: the variable projection method and its applications. Inverse Problems, 19(2):R1–R26, 2003.

23 D. S. Gon¸calves and S. A. Santos. Local analysis of a spectral correction for the Gauss-Newton model applied to quadratic residual problems. Numerical Algorithms, 73(2):407–431, 2016. Geovani N. Grapiglia, Jinyun Yuan, and Ya xiang Yuan. On the convergence and worst-case com- plexity of trust-region and regularization methods for unconstrained optimization. Mathematical Programming, 152(1-2):491–520, 2015. L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for Newton’s method. SIAM Journal on Numerical Analysis, 23(4):707–716, 1986. L. Han, G. Yu, and L. Guan. Multivariate spectral gradient method for unconstrained optimization. Applied Mathematics and Computation, 201(1-2):621–630, 2008. H. O. Hartley. The modified Gauss-Newton method for the fitting of non- functions by least squares. Technometrics, 3(2):269–280, 1961. S. Henn. A Levenberg-Marquardt scheme for nonlinear image registration. BIT Numerical Mathe- matics, 43(4):743–759, 2003. J. Huschens. On the use of product structure in secant methods for nonlinear least squares problems. SIAM Journal on Optimization, 4(1):108–129, 1994. M. Jamil and X. S. Yang. A literature survey of benchmark functions for global optimisation problems. International Journal of Mathematical Modeling and Numerical Optimisation, 4(2): 150–194, 2013. S.J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An interior-point method for large-scale ℓ1-regularized least squares. IEEE Journal of Selected Topics in Signal Processing, 1(4):606–617, 2007. D. A. Knoll and D. E. Keyes. Jacobian-free Newton–Krylov methods: a survey of approaches and applications. Journal of Computational Physics, 193(2):357–397, 2004. M. Kobayashi, Y. Narushima, and H. Yabe. Nonlinear conjugate gradient methods with structured secant condition for nonlinear least squares problems. Journal of Computational and Applied Mathematics, 234(2):375–397, 2010. W. La Cruz, J. M. Mart´ınez,and M. Raydan. Spectral residual method without gradient information for solving large-scale nonlinear systems: theory and experiments. Technical Report RT-04-08, Universidad Central de Venezuela, Venezuela, July 2004. K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164–168, 1944. J. Li, F. Ding, and G. Yang. Maximum likelihood least squares identification method for input non- linear finite impulse response moving average systems. Mathematical and Computer Modelling, 55(3-4):442–450, 2012. D. C. L´opez, T. Barz, S. Korkel, and G. Wozny. A Levenberg-Marquardt scheme for nonlinear image registration. Computers and Chemical Engineering, 77(Supplement C):24–42, 2015. L. Lukˇsan.Hybrid methods for large sparse nonlinear least squares. Journal of Optimization Theory and Applications, 89(3):575–595, 1996. L. Lukˇsanand J. Vlˇcek. Test problems for unconstrained optimization. Technical Report 897, Academy of Sciences of the Czech Republic, Institute of Computer Science, 2003. D. W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the society for Industrial and Applied Mathematics, 11(2):431–441, 1963.

24 J. J. McKeown. Specialised versus general-purpose algorithms for minimising functions that are sums of squared terms. Mathematical Programming, 9(1):57–68, 1975. J. J. Mor´e,B. S. Garbow, and K. E. Hillstrom. Testing unconstrained optimization software. ACM Transactions on Mathematical Software (TOMS), 7(1):17–41, 1981. D. D. Morrison. Methods for nonlinear least squares problems and convergence proofs. In J. Lorell and F. Yagi, editors, Proceedings of the Seminar on Tracking Programs and Orbit Determination, pages 1–9. Jet Propulsion Laboratory, Pasadena, 1960. L. Nazareth. Some recent approaches to solving large residual nonlinear least squares problems. SIAM Review, 22(1):1–11, 1980. Yu. Nesterov. Modified Gauss-Newton scheme with worst case guarantees for global performance. Optimization Methods and Software, 22(3):469–483, 2007. J. Nocedal and S. J. Wright. Numerical Optimization. Springer Science, New York, 2 edition, 2006. M. J. D. Powell. A method for minimizing a sum of squares of non-linear functions without calculating derivatives. The Computer Journal, 7(4):303–307, 1965. M. Raydan. On the Barzilai and Borwein choice of steplength for the gradient method. IMA Journal of Numerical Analysis, 13(3):321–326, 1993. M. Raydan. The Barzilai and Borwein gradient method for the large scale unconstrained minimiza- tion problem. SIAM Journal on Optimization, 7(1):26–33, 1997. S. A. Santos and R. C. M. Silva. An inexact and nonmonotone proximal method for smooth unconstrained minimization. Journal of Computational and Applied Mathematics, 269:86–100, 2014. Z. Shi and G. Sun. A diagonal-sparse quasi-Newton method for unconstrained optimization problem (in Chinese). Journal of Systems Science and Mathematical Sciences, 26(1):101–112, 2006. E. Spedicato and M. T. Vespucci. Numerical experiments with variations of the Gauss-Newton algorithm for nonlinear least squares. Journal of Optimization Theory and Applications, 57(2): 323–339, 1988. W. Sun and Y. X. Yuan. Optimization Theory and Methods: . Springer Science, New York, 2006. Li Min Tang. A regularization homotopy for ill-posed nonlinear least squares problem and its application. In Advances in Civil Engineering, ICCET 2011, volume 90 of Applied Mechanics and Materials, pages 3268–3273. Trans Tech Publications, 10 2011. F. Wang, D. H. Li, and L. Qi. Global convergence of Gauss-Newton-MBFGS method for solving the nonlinear least squares problem. Advanced Modeling and Optimization, 12(1):1–18, 2010. W. Xu, T. F. Coleman, and G. Liu. A secant method for nonlinear least-squares minimization. Computational Optimization and Applications, 51(1):159–173, 2012. W. Xu, N. Zheng, and K. Hayami. Jacobian-free implicit inner-iteration preconditioner for nonlinear least squares problems. Journal of Scientific Computing, 68(3):1055–1081, 2016. H. Zhang and W. W. Hager. A nonmonotone line search technique and its application to uncon- strained optimization. SIAM Journal on Optimization, 14(4):1043–1056, 2004. H. Zhang, A. R. Conn, and K. Scheinberg. A derivative-free algorithm for least-squares minimiza- tion. SIAM Journal on Optimization, 20(6):3555–3576, 2010. Ruixue Zhao and Jinyan Fan. Global complexity bound of the Levenberg-Marquardt method. Optimization Methods and Software, 31(4):805–814, 2016.
