TRUNCATED METHODS FOR OPTIMIZATION WITH INACCURATE FUNCTIONS AND GRADIENTS

C. T. KELLEY† AND E. W. SACHS‡

Abstract. We consider unconstrained minimization problems that have functions and gradients given by "black box" codes with error control. We discuss several modifications of the trust region CG algorithm of Steihaug that can improve performance for such problems. We illustrate the ideas with two examples.

Key words. Trust region methods, inexact Newton methods, optimal control

AMS subject classifications. 49K20, 65F10, 49M15, 65J15, 65K10

1. Introduction. Consider an unconstrained minimization problem

$$\min_u f(u),$$

where the objective function $f$ and its gradient $\nabla f$ are computed inaccurately, with absolute errors $\delta$ and relative errors $\epsilon$ that can be controlled. We ask how the errors should be set so that a truncated or inexact Newton iteration [10], [5], [11], [12], such as Newton-CG or the CG-trust region method from [15], will perform like the error-free algorithm while the errors remain nonzero and continue to produce an improving sequence of iterates until the limit of resolution of $f$ is reached. The case considered here is different from that considered in [3] and [4], which assumed fully accurate function values and gradient errors that were $O(\|\nabla f\|)$, a condition that is impractical when $\|\nabla f\|$ is small, and used information on the iteration to change the accuracy to which $f$ and $\nabla f$ were computed.

1.1. Motivating Problem. An example of such a situation which motivates this work is the simple optimal control problem

(1.1)    $\min_u f(u),$

where

(1.2)    $f(u) = \int_0^T L(y(t), u(t), t)\, dt,$

where $u \in L^2[0,T]$ is the control and the state variable $y$ is the solution of the initial value problem (with $\dot y = dy/dt$)

(1.3)    $\dot y(t) = \phi(y(t), u(t), t), \quad y(0) = y_0, \quad t \in [0,T].$

If $L$ and $\phi$ are continuously differentiable, then the gradient of $f$ with respect to the $L^2$ inner product can be represented as a continuous function of $t$:

(1.4)    $\nabla f(u)(t) = p(t)^T \phi_u(y(t), u(t), t) + L_u(y(t), u(t), t).$

In (1.4) $p$, the adjoint variable, satisfies the final-value problem on $[0,T]$

(1.5)    $-\dot p(t) = \phi_y(y(t), u(t), t)^T p(t) + L_y(y(t), u(t), t), \quad p(T) = 0.$

Version of July 16, 1999.
† North Carolina State University, Department of Mathematics and Center for Research in Scientific Computation, Box 8205, Raleigh, N. C. 27695-8205 ([email protected]). The research of this author was supported by National Science Foundation grants #DMS-9700569 and #DMS-9714811.
‡ Universität Trier, FB IV – Mathematik and Graduiertenkolleg Mathematische Optimierung, 54296 Trier, Germany ([email protected]).


If one solves (1.3) and (1.5) with the explicit Euler method, then the discretized gradient is also the gradient of the discrete problem. This means that approximating Hessian-vector products to high accuracy can easily be done by differencing for the discrete problem, since analytic gradients are available by computation of the discrete adjoint state. This is not the case if higher order methods are used [8]. If one uses variable-step and variable-order codes that control the local truncation error [13], [2], [1], [14], the error in $f$ will depend on the errors that come from the numerical integration of (1.3) and (1.5). Moreover, after (1.3) has been solved, the values of $y$ obtained will have to be used in an interpolation during the integration of (1.5). That interpolation error will also affect the accuracy of $\nabla f$.

The accuracy in $\nabla f$, in turn, will affect the performance of a Newton-CG algorithm that uses finite difference Hessian-vector products. We will denote by $\delta_f$ and $\delta_g$ the absolute errors in the computation of the function and gradients and by $\epsilon_f$ and $\epsilon_g$ the relative errors. Our scenario is that when a function value $f(u)$ is requested the computed value $\tilde f(u)$ satisfies

(1.6)    $|\tilde f(u) - f(u)| \le \epsilon_f |f(u)| + \delta_f,$

and the computed gradient, which we denote by $g$, satisfies

(1.7)    $\|g(u) - \nabla f(u)\| \le \epsilon_g \|\nabla f(u)\| + \delta_g.$

For the example problem, the errors in $f$ are of the same order as the errors in $y$. Hence $\delta_f = O(\delta_y)$ and $\epsilon_f = O(\epsilon_y)$. The gradient errors are different. If $\delta_p$ and $\epsilon_p$ are the absolute and relative errors in the computation of $p$ then, neglecting products of errors, the computed gradient is

$g = (p(1 + \epsilon_p) + \delta_p)^T \phi_u(y(1 + \epsilon_y) + \delta_y, u, t) + L_u(y(1 + \epsilon_y) + \delta_y, u, t).$

Assuming that $y$, $u$, and $p$ are bounded and $\phi$ and $L$ are sufficiently smooth we have

$(p(1 + \epsilon_p) + \delta_p)^T \phi_u(y(1 + \epsilon_y) + \delta_y, u, t) = p^T \phi_u(y, u, t)(1 + \epsilon_p) + O(\delta_p + \delta_y + \epsilon_y)$

and

$L_u(y(1 + \epsilon_y) + \delta_y, u, t) = L_u(y, u, t) + O(\delta_y + \epsilon_y).$

So the relative error in the $L_u$ term can be taken to be zero. Hence,

$g(u) = \nabla f(u)(1 + \epsilon_g) + \delta_g,$

where

(1.8)    $\epsilon_g = O(\epsilon_p)$ and $\delta_g = O(\delta_p + \delta_y + \epsilon_y).$

In addition to the errors that are controlled by the integrator, interpolation errors in $u$ and $y$ and integration errors in the computation of $f$ must be considered. These errors are independent of the choice of integrator.

We illustrate these errors with a simple example. Let the discrete unknown be a vector $U \in R^N$ with components $U_i$, which represents the values of $u$ on a uniform temporal mesh $\{t_i\}_{i=1}^N$ with mesh width $\Delta t = 1/(N-1)$. When the numerical integrator needs values of $u$ at points other than one of the $t_i$'s, an interpolation needs to be done. After the state equation has been solved, the values of the solution at the $\{t_i\}$ are stored in a vector $Y \in R^N$ and then $f(u)$ is approximated by a numerical integration. If $f$ is smooth and a cubic spline interpolation is used, one may approximate $f$ with Simpson's rule for an integration error of $O(\Delta t^4)$, which is also the interpolation error. If piecewise linear interpolation is used and $f$ is approximated by the trapezoid rule, the integration and interpolation errors are $O(\Delta t^2)$. $U$ and $Y$ must be interpolated for the integration of the adjoint equation. So if $\delta_I$ is the integration error and $\delta_P$ the interpolation error, we have

$\delta_g = O(\delta_I + \delta_P)$ and $\epsilon_g = O(\delta_I + \delta_P).$

In the experiments reported in this paper, the values of $\delta$ are consistent with the interpolation and integration errors. So

$\delta_f, \delta_g, \epsilon_f, \epsilon_g = O(\Delta t^q),$

where $q = 2$ if piecewise linear interpolation and trapezoid rule integration is used and $q = 4$ if cubic spline interpolation and Simpson's rule integration is used.

2. Hessian-Vector Products. With the interpolatory relation between the vector $U$ and the function $u$ in mind, we will no longer distinguish between them.

We approximate the product $\nabla^2 f(u) w$ by differences with a difference increment of $h$. We will scale the difference increment by $\|u\|$ when $u \ne 0$ and obtain a forward difference approximation:

(2.1)    $D_h^2 f(u; w) = 0$ if $w = 0$, and
         $D_h^2 f(u; w) = \|w\| \left[ g(u + h w / \|w\|) - g(u) \right] / h$ if $w \ne 0$.

The floating point arithmetic error in the computation of $u + h w/\|w\|$ is $O(\epsilon_{mach} \|u\|)$, where $\epsilon_{mach}$ is machine roundoff. Hence the error $E$ in the computation of the numerator of the difference quotient in (2.1) is

$E = O(\epsilon_g \|\nabla f(u)\| + \delta_g + \epsilon_{mach} \|u\|).$

Therefore

$\|\nabla^2 f(u) w - D_h^2 f(u; w)\| / \|w\| = O\!\left( h + \frac{\epsilon_g \|\nabla f(u)\| + \delta_g + \epsilon_{mach} \|u\|}{h} \right).$

Assuming that the $\epsilon_{mach} \|u\|$ term can be neglected, which is reasonable for $u$ of moderate size, the difference is first order accurate if $\delta_g = O(h^2)$ and $\epsilon_g \|\nabla f(u)\| = O(h^2)$. This indicates that near optimality, where $\|\nabla f(u)\|$ is small, one can allow the relative error in the gradient to increase somewhat. A similar observation was made in [3] and [4], where large relative errors in the gradient had fairly benign effects.

The situation is similar with central differences, where

$D_h^2 f(u; w) = 0$ if $w = 0$, and
$D_h^2 f(u; w) = \|w\| \left[ g(u + h w / \|w\|) - g(u - h w / \|w\|) \right] / (2h)$ if $w \ne 0$.

Here

$\|\nabla^2 f(u) w - D_h^2 f(u; w)\| / \|w\| = O\!\left( h^2 + \frac{\epsilon_g \|\nabla f(u)\| + \delta_g}{h} \right).$

To maintain second order accuracy we must have $\delta_g = O(h^3)$ and $\epsilon_g \|\nabla f(u)\| = O(h^3)$.

To summarize and make the constant in the $O$-term explicit, there is $K_d > 0$ such that for all $w \in R^N$,

(2.2)    $\|\nabla^2 f(u) w - D_h^2 f(u; w)\| \le K_d \left[ \frac{\epsilon_g \|\nabla f(u)\| + \delta_g}{h} + h^m \right] \|w\|,$

where $m = 1$ for forward differences and $m = 2$ for centered differences.

3. Errors induced by differencing. In this section we illustrate four ways in which difference errors can affect the algorithm from [15] and propose ways to address them.

From now on we will assume that

$\delta_f = \delta_g = \delta$ and $\epsilon_f = \epsilon_g = \epsilon,$

and that $\epsilon \|\nabla f(u)\| \le \delta$ throughout the iteration. In this case there is $\bar K$ such that

(3.1)    $\|g(u) - \nabla f(u)\| \le \bar K \delta,$

and (2.2) can be written more simply as

(3.2)    $\|\nabla^2 f(u) w - D_h^2 f(u; w)\| \le (\bar K + K_d) \left( \delta/h + h^m \right) \|w\|.$

The right side of (3.2) is minimized when

(3.3)    $h = \delta^{1/(m+1)},$

and we will enforce that for the remainder of the paper. Hence (3.2) becomes

(3.4)    $\|\nabla^2 f(u) w - D_h^2 f(u; w)\| \le K_h \delta^{m/(m+1)} \|w\|,$

where $K_h = 2(\bar K + K_d)$.

3.1. The $h$-CG Iteration. For $w \in R^N$ we let $D_h^2(w)$ be either the forward or central difference approximation of the Hessian-vector product $\nabla^2 f(u) w$. Let $\{p_k\}$ be the search directions formed by the usual implementation [9] of CG with either forward or central difference Hessian-vector products. Let $w_k = D_h^2(p_k)$ and let $\lambda_k = p_k^T w_k$. Then the first $N$ CG iterates are the same as those that would be obtained with the matrix

$B = \sum_{k=1}^{N} \frac{w_k w_k^T}{\lambda_k}.$

The finite difference CG iteration (in exact arithmetic) is equivalent to that for the matrix $B$. However the matrix $B$ need not be a good approximation to $\nabla^2 f(u)$.

If $\lambda_k \le 0$ then the CG-TR algorithm moves in the direction $p_k$ to the trust region boundary and returns a step. Therefore, if the approximate solution to the trust region problem is obtained in $N$ iterations, it is also the one that would be obtained with $B$ as the model Hessian.

3.2. Termination of the Linear Iteration. In most Newton-iterative methods the inner iteration is terminated when

(3.5)    $\|\nabla^2 f(u) s + \nabla f(u)\| \le \eta \|\nabla f(u)\|,$

where the parameter $\eta$ is called the forcing term. However, when using finite difference Hessian-vector products and low-resolution functions and gradients, we expect that the step is on the trust region boundary or that

(3.6)    $\|D_h^2(s) + g\| \le \eta \|g\|.$

Assuming that the step is in the interior of the trust region, the CG iteration returns (at least in exact arithmetic [7]) when

(3.7)    $\|B s + g\| \le \eta \|g\|.$

Neither (3.6) nor (3.7) implies (3.5). Moreover, since $D_h^2(s)$ is not linear in $s$, (3.6) is not equivalent to (3.7).

Following the analysis in [9] and [10] one can prove

THEOREM 3.1. Let $\lambda_{min} > 0$ and $\lambda_{max}$ be the smallest and largest eigenvalues of $B$. Then

(3.8)    $\|(\nabla^2 f(u) - B) q\| \le N K_h \sqrt{\lambda_{max}/\lambda_{min}}\; \delta^{m/(m+1)} \|q\|$

for all $q$ in the $N$th Krylov space for $B$.

Proof. We let $\{p_k\}$ be the CG search directions, which form a $B$-orthogonal basis for the Krylov space. By construction, $B p_k = D_h^2(p_k)$. Hence, by (3.4),

$\|(\nabla^2 f(u) - B) p_k\| \le K_h \delta^{m/(m+1)} \|p_k\|.$

If $q$ is in the $N$th Krylov space, then we can expand $q$ using $B$-orthogonality of the basis as

$q = \sum_{k=1}^{N} \xi_k p_k, \quad \text{where} \quad \xi_k = \frac{p_k^T B q}{p_k^T B p_k}.$

Hence,

$(\nabla^2 f(u) - B) q = \sum_{k=1}^{N} \xi_k (\nabla^2 f(u) - B) p_k,$

and therefore,

$\|(\nabla^2 f(u) - B) q\| \le K_h \delta^{m/(m+1)} \sum_{k=1}^{N} |\xi_k| \|p_k\|.$

By $B$-orthogonality,

$\frac{|\xi_k| \|p_k\|}{\|q\|} \le \frac{\|q\|_B}{\sqrt{\lambda_{min}}\, \|q\|} \le \sqrt{\lambda_{max}/\lambda_{min}},$

which completes the proof.

If $\lambda_1 \le 0$ the trust region algorithm will move to the trust region boundary and no further CG iterations will be taken.

LEMMA 3.2. Let $\lambda_{min} > 0$ and $\lambda_{max}$ be the smallest and largest eigenvalues of $B$. Assume (3.8) holds. Then (3.7) implies

(3.9)    $\|\nabla^2 f(u) s + \nabla f(u)\| \le \bar\eta_1 \|\nabla f(u)\| + \mu_1,$

and (3.5) implies

(3.10)    $\|B s + g\| \le \bar\eta_2 \|g\| + \mu_2,$

where, for $i = 1, 2$,

$\bar\eta_i = \eta + O(\delta^{m/(m+1)})$ and $\mu_i = O(\delta).$

Proof. We prove that (3.7) implies (3.9); the other half of the proof is similar. By (3.8) and (3.1),

$\|\nabla^2 f(u) s + \nabla f(u)\| \le \|B s + g\| + O(\delta^{m/(m+1)}) \|s\| + O(\delta) \le \eta \|g\| + O(\delta^{m/(m+1)}) \|s\| + O(\delta) \le \eta \|\nabla f(u)\| + O(\delta^{m/(m+1)}) \|s\| + O(\delta).$

(3.7) implies that

$\|s\| \le (1 + \eta) \lambda_{min}^{-1} \|g\|,$

and hence

$\|\nabla^2 f(u) s + \nabla f(u)\| \le (\eta + O(\delta^{m/(m+1)})) \|\nabla f(u)\| + O(\delta),$

which is (3.9).

Termination of the CG iteration with (3.7) implies that (3.5) holds for a slightly larger $\eta$ (i.e., the step is a useful step for $f$) if

(3.11)    $\delta^{m/(m+1)} \le \bar\sigma \eta \|g\|$

and $0 < \bar\sigma < 1$. (3.11) will hold if, for example,

$\|g\| \ge C \delta^{m/(m+1)}$

for a sufficiently large $C$ and

(3.12)    $\eta \ge \delta^{m/(m+1)} / (\bar\sigma \|g(u)\|).$

In summary, to guarantee that the inexact Newton iteration will behave correctly, the termination criterion for the nonlinear iteration, say,

$\|g\| \le \tau$

for some $\tau > 0$, must be connected to the forcing term $\eta$ in the inexact Newton iteration and to the error in the numerical differencing. One way to do this is to use (3.12). Even larger choices for $\eta$ can be more efficient in the earlier phases of the iteration [6].

3.3. Accuracy of the Quadratic Model. Having generated a step $s$, the next stage in CG-TR is to test the step for acceptability and adjust the trust region radius. These decisions are based on comparing the predicted reduction $pred$ (the reduction in the quadratic model) with the actual reduction $ared$. In the present case, both computations can be in error.

To compute $pred$, the reduction in the quadratic model, one approximates

$pred_{ideal} = -\nabla f(u)^T s - s^T \nabla^2 f(u) s / 2$

with

(3.13)    $pred = -g^T s - s^T D_h^2(s) / 2.$

If the ideal quadratic model is used, $pred > 0$ is guaranteed in TR methods, such as CG-TR, that enforce Cauchy decrease. As mentioned above, we do not use the ideal quadratic model, but one based on $B$. However,

$-g^T s - s^T D_h^2(s)/2 \ne -g^T s - s^T B s / 2$

unless $s = p_k$ for some $k$. Hence the value of $pred$ we compute is neither the ideal quadratic model reduction nor the one based on $B$. The error in $pred$ can be estimated:

$|pred - pred_{ideal}| \le C \left( \delta \|s\| + (\delta/h + h^m) \|s\|^2 \right).$

Using the choice $h = \delta^{1/(m+1)}$, we obtain

(3.14)    $|pred - pred_{ideal}| \le C \left( \delta \|s\| + \delta^{m/(m+1)} \|s\|^2 \right).$

Near optimality, there is $\sigma_- > 0$ such that

$pred_{ideal} \ge \sigma_- \|s\|^2,$

and this will dominate the error in (3.14) while $\|s\| \gg \delta^{m/(m+1)}$, i.e., until convergence.

Far from the solution, however, $\nabla^2 f(u)$ can have small or negative eigenvalues, as can $B$. In this case, the error in (3.14) can be large when $\|s\|$ is large. A detection of negative curvature from the CG iteration may not be confirmed when $pred$ is computed using (3.13). Simply reducing the TR radius when $pred \le 0$ will solve this problem, as the error is entirely in the quadratic term.

3.4. Measurement of Decrease. If $\delta_f = \delta$, $\epsilon_f = \epsilon$, and $ared = f(u) - f(u+s)$, then $ared$, as observed with errors taken into account, is

$ared_{obs} = (f(u) - f(u+s))(1 + O(\epsilon)) + O(\delta).$

The relative errors will not affect the high-order bits of $ared$ and are therefore harmless. However, the absolute error can make $ared$ useless. If, now, $u$ is near a minimizer $u^*$, then

$f(u) - f(u+s) = -\nabla f(u)^T s - s^T \nabla^2 f(u) s / 2 + O(\|s\|^3) = O(\|s\| \|\nabla f(u)\| + \|s\|^2).$

If $\|s\| = O(\|g\|)$, then $f(u) - f(u+s) = O(\|g\|^2)$; hence as soon as $\|g\| = O(\sqrt\delta)$, $ared_{obs}$ will have no accuracy at all and can mislead the trust region algorithm.

In the case where $\delta_f = \delta_g = \delta$, acceptance of an inexact Newton step near $u^*$ implies that

$\|g(u+s)\| = O(\eta \|g(u)\| + \delta).$

So if $\eta$ is sufficiently small and if $ared$ has only one or two digits of accuracy, the step will be accepted and a good reduction obtained. If $\eta$ is large, on the other hand, inaccuracy in $ared$ may result in stagnation.

Two possible solutions are to make sure that $\eta$ is small or to abandon the test for decrease once $\|g\|$ becomes sufficiently small or when $ared \le O(\delta)$. This means that the optimization algorithm becomes a Newton-CG iteration and only seeks to find a root of $g$.

4. Changes to the Trust Region Algorithm. The previous discussion suggests several modifications of the algorithm in [15]:
• Terminate the iteration when the TR radius is below a prescribed tolerance. This is consistent with the more standard practice of terminating when too many reductions in the TR radius have been taken. We implement this in all the experiments.
• Modification pred: Reduce the TR radius if $pred \le 0$.
• Modification $\eta$: Enforce $\|g(u)\|$ and $\delta^{m/(m+1)}/\|g(u)\|$ (cf. (3.12)) as lower bounds for $\eta$.
• Modification ared: Switch to an equations algorithm (currently Newton-CG) when $\|g\|$ or $ared$ becomes sufficiently small; a sketch of the resulting outer loop follows the settings below.

The trust region-CG code from [10] was modified to incorporate these changes. The trust region parameters were left unchanged. Both examples have tolerances that can be controlled, exactly for the simple example in §4.1.1 and approximately in §4.1.2. In all the examples we set $\delta_f = \delta_g = \delta$, use centered differences with a difference increment of $h = \delta^{1/3}$, and terminate the iteration when $\|g(u)\|$ was small or when the trust region radius has been decreased more than 20 times, the latter an indication that the limit of resolution of the function has been reached.

4.1. Examples.

4.1.1. Perturbed Quadratic. The purpose of this example is to show that $pred < 0$ is possible and can lead to failure of the optimization. The modification pred solves this problem and the other modifications lead to a more efficient algorithm.

The error-free problem is a quadratic

$f(u) = (u - 2c)^T H (u - 2c) / 2,$

where $c = (1, \dots, 1)^T$ and the diagonal Hessian $H$ is given by

$H_{ii} = 1 + (\kappa - 1)(i - 1)/(N - 1), \quad 1 \le i \le N.$

The Hessian has condition number $\kappa$.

We designed perturbations that vary rapidly with $u$ in the following way. The perturbation for the function was

$\delta R(u) + \epsilon\, r(u) f(u),$

where

$R(u) = \cos(2000\, \xi(u)), \quad r(u) = \sin(2000\, \xi(u)),$

and

$\xi(u) = \sum_{i=1}^{N} u_i / 100.$

The perturbation for the gradient,

$\delta \hat R(u) + \epsilon\, \hat r(u) \circ \nabla f(u)$

(with $\circ$ the componentwise product), was constructed similarly. Here

$(\hat R(u))_i = \cos(2000 \cos(u_i))$ and $(\hat r(u))_i = \sin(2000 \cos(u_i)).$

In the computations reported in this section the forcing term $\eta$ in (3.5) was set to .1 when modification $\eta$ was inactive.

In the computation reported in Figure 4.1 we terminated the iteration when $\|g\|$ fell below a fixed tolerance. From the plots one can see the increase in the function value if the sign of $pred$ is not tested, and the significant reduction in the number of CG iterations if modification $\eta$ is enforced. The ared modification did not become active in this example.

FIG. 4.1. Quadratic example. For the unmodified algorithm (top), the pred modification (middle), and the fully modified algorithm (bottom): gradient norm against iterations (left) and function value against cumulative CG iterations (right).

4.1.2. Optimal Control Problem. In this example $L$ in (1.2) was a smooth quadratic tracking functional of the state and control, and $\phi$ in (1.3) was a smooth nonlinear right-hand side.

The discretized control $u$ was a piecewise linear spline with 10 nodes, and the unknowns were the values at the nodes, which were equally spaced on $[0, 1]$. In view of the expected second order accuracy, we set the relative and absolute error tolerances in the ODE integrator to $\Delta t^2$. We solved (1.3) and (1.5) with the ode15s stiff integrator in MATLAB. The solution of (1.3) was reported at the nodes by the integrator, and $y$ was extended to all of $[0, 1]$ with piecewise linear interpolation.

In the computations reported in this section the forcing term $\eta$ in (3.5) was set to .1 when modification $\eta$ was inactive, $\delta$ was taken consistent with the integrator tolerances, and the iteration was terminated when $\|g\|$ fell below a fixed tolerance.

FIG. 4.2. Control problem. For the unmodified algorithm (top), the $\eta$ modification (middle), and the fully modified algorithm (bottom): gradient norm against iterations (left) and function value against cumulative CG iterations (right).

REFERENCES

[1] P. N. BROWN, G. D. BYRNE, AND A. C. HINDMARSH, VODE: A variable coefficient ODE solver, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 1038–1051.

[2] P. N. BROWN, A. C. HINDMARSH, AND L. R. PETZOLD, Using Krylov methods in the solution of large-scale differential- algebraic systems, SIAM J. Sci. Comput., 15 (1994), pp. 1467–1488.

[3] R. G. CARTER, On the global convergence of trust region algorithms using inexact gradient information, SIAM J. Numer. Anal., 28 (1991), pp. 251–265. [4] ——, Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information, SIAM J. Sci. Comput., 14 (1993), pp. 368–388.

[5] R. DEMBO AND T. STEIHAUG, Truncated Newton algorithms for large-scale optimization, Math. Prog., 26 (1983), pp. 190– 212.

[6] S. C. EISENSTAT AND H. F. WALKER, Globally convergent inexact Newton methods, SIAM J. Optim., 4 (1994), pp. 393– 422.

[7] A. GREENBAUM, Iterative Methods for Solving Linear Systems, no. 17 in Frontiers in Applied Mathematics, SIAM, Philadel- phia, 1997.

[8] W. W. HAGER, Rates of convergence for discrete approximations to unconstrained optimal control problems, SIAM J. Numer. Anal., 13 (1976), pp. 449–472.

[9] C. T. KELLEY, Iterative Methods for Linear and Nonlinear Equations, no. 16 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 1995. [10] ——, Iterative Methods for Optimization, no. 18 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 1999.

[11] S. G. NASH, Newton-type minimization via the Lanczos method, SIAM J. Numer. Anal., 21 (1984), pp. 770–789. [12] ——, Preconditioning of truncated Newton methods, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 599–616.

[13] K. RADHAKRISHNAN AND A. C. HINDMARSH, Description and use of LSODE, the Livermore solver for ordinary differential equations, Tech. Rep. UCRL-ID-113855, Lawrence Livermore National Laboratory, December 1993.

[14] L. F. SHAMPINE AND M. W. REICHELT, The MATLAB ODE suite, SIAM J. Sci. Comput., 18 (1997), pp. 1–22.

[15] T. STEIHAUG, The conjugate gradient method and trust regions in large scale optimization, SIAM J. Numer. Anal., 20 (1983), pp. 626–637.