TRUNCATED METHODS FOR OPTIMIZATION WITH INACCURATE FUNCTIONS AND GRADIENTS

C. T. KELLEY† AND E. W. SACHS‡

Abstract. We consider unconstrained minimization problems that have functions and gradients given by "black box" codes with error control. We discuss several modifications of the trust region CG algorithm of Steihaug that can improve performance for such problems. We illustrate the ideas with two examples.

Key words. Trust region methods, inexact Newton methods, optimal control

AMS subject classifications. 49K20, 65F10, 49M15, 65J15, 65K10

1. Introduction. Consider an unconstrained minimization problem

$$\min_u f(u),$$

where the objective function $f$ and its gradient $\nabla f$ are computed inaccurately, with absolute errors $\delta$ and relative errors $\epsilon$ that can be controlled. We ask how the errors should be set so that a truncated or inexact Newton iteration [10], [5], [11], [12], such as Newton-CG or the CG-trust region method from [15], will perform like the error-free algorithm while the errors remain nonzero and continue to produce an improving sequence of iterates until the limit of resolution of $f$ is reached. The case considered here is different from that considered in [3] and [4], which assumed fully accurate function values and gradient errors that were $O(\|\nabla f\|)$, a condition that is impractical when $\|\nabla f\|$ is small, and used information on the iteration to change the accuracy to which $f$ and $\nabla f$ were computed.

1.1. Motivating Problem. An example of such a situation which motivates this work is the simple optimal control problem

(1.1)    $\min_u f(u),$

where

(1.2)    $f(u) = \int_0^T L(y(t), u(t), t)\, dt,$

where $u \in L^2[0,T]$ is the control and the state variable $y$ is the solution of the initial value problem (with $\dot y = dy/dt$)

(1.3)    $\dot y(t) = \phi(y(t), u(t), t), \quad y(0) = y_0, \quad t \in [0,T].$

If $L$ and $\phi$ are continuously differentiable, then the gradient of $f$ with respect to the $L^2$ inner product can be represented as a continuous function of $t$:

(1.4)    $\nabla f(u)(t) = p(t)^T \phi_u(y(t), u(t), t) + L_u(y(t), u(t), t).$

In (1.4) $p$, the adjoint variable, satisfies the final-value problem on $[0,T]$

(1.5)    $-\dot p(t) = \phi_y(y(t), u(t), t)^T p(t) + L_y(y(t), u(t), t), \quad p(T) = 0.$

Version of July 16, 1999.
† North Carolina State University, Department of Mathematics and Center for Research in Scientific Computation, Box 8205, Raleigh, N. C. 27695-8205 ([email protected]). The research of this author was supported by National Science Foundation grants #DMS-9700569 and #DMS-9714811.
‡ Universität Trier, FB IV – Mathematik and Graduiertenkolleg Mathematische Optimierung, 54296 Trier, Germany ([email protected]).


If one solves (1.3) and (1.5) with the explicit Euler method, then the discretized gradient is also the gradient of the discrete problem. This means that approximating Hessian-vector products to high accuracy can easily be done by differencing for the discrete problem, since analytic gradients are available by computation of the discrete adjoint state. This is not the case if higher order methods are used [8]. If one uses variable-step and variable-order codes that control the local truncation error [13], [2], [1], [14], the error in $f$ will depend on the errors that come from the numerical integration of (1.3) and (1.5). Moreover, after (1.3) has been solved, the values of $y$ obtained will have to be used in an interpolation during the integration of (1.5). That interpolation error will also affect the accuracy of $\nabla f$.

The accuracy in $\nabla f$, in turn, will affect the performance of a Newton-CG algorithm that uses finite difference Hessian-vector products. We will denote by $\delta_f$ and $\delta_g$ the absolute errors in the computation of the function and gradients and by $\epsilon_f$ and $\epsilon_g$ the relative errors. Our scenario is that when a function value $f(u)$ is requested the computed value $\tilde f(u)$ satisfies

(1.6)    $|\tilde f(u) - f(u)| \le \epsilon_f |f(u)| + \delta_f,$

and the computed gradient, which we denote by $g$, satisfies

(1.7)    $\|g(u) - \nabla f(u)\| \le \epsilon_g \|\nabla f(u)\| + \delta_g.$

For the example problem, the errors in $f$ are of the same order as the errors in $y$. Hence $\delta_f = O(\delta_y)$ and $\epsilon_f = O(\epsilon_y)$. The gradient errors are different. If $\delta_p$ and $\epsilon_p$ are the absolute and relative errors in the computation of $p$ then, neglecting products of errors, the computed gradient is

$g = (p(1 + \epsilon_p) + \delta_p)^T \phi_u(y(1 + \epsilon_y) + \delta_y, u, t) + L_u(y(1 + \epsilon_y) + \delta_y, u, t).$

Assuming that $y$, $u$, and $p$ are bounded and $\phi$ and $L$ are sufficiently smooth we have

$(p(1 + \epsilon_p) + \delta_p)^T \phi_u(y(1 + \epsilon_y) + \delta_y, u, t) = p^T \phi_u(y, u, t)(1 + \epsilon_p) + O(\delta_p + \delta_y + \epsilon_y)$

and

$L_u(y(1 + \epsilon_y) + \delta_y, u, t) = L_u(y, u, t) + O(\delta_y + \epsilon_y).$

So the relative error in the $L_u$ term can be taken to be zero. Hence,

$g(u) = \nabla f(u)(1 + \epsilon_g) + \delta_g,$

where

(1.8)    $\epsilon_g = O(\epsilon_p)$ and $\delta_g = O(\delta_p + \delta_y + \epsilon_y).$

In addition to the errors that are controlled by the integrator, interpolation errors in $u$ and $y$ and integration errors in the computation of $f$ must be considered. These errors are independent of the choice of integrator.

We illustrate these errors with a simple example. Let the discrete unknown be a vector $U \in R^N$ with components $U_i$, which represents the values of $u$ on a uniform temporal mesh $\{t_i\}_{i=1}^N$ with mesh width $\Delta t = 1/(N-1)$. When the numerical integrator needs values of $u$ at points other than one of the $t_i$'s, an interpolation needs to be done. After the state equation has been solved, the values of the solution at the $\{t_i\}$ are stored in a vector $Y \in R^N$ and then $f(u)$ is approximated by a numerical integration. If $f$ is smooth and a cubic spline interpolation is used, one may approximate $f$ with Simpson's rule for an integration error of $O(\Delta t^4)$, which is also the interpolation error. If piecewise linear interpolation is used and $f$ is approximated by the trapezoid rule, the integration and interpolation errors are $O(\Delta t^2)$. $U$ and $Y$ must be interpolated for the integration of the adjoint equation. So if $\delta_I$ is the integration error and $\delta_P$ the interpolation error, we have

$\delta_g = O(\delta_I + \delta_P)$ and $\epsilon_g = O(\delta_I + \delta_P).$

In the experiments reported in this paper, the values of $\delta$ are consistent with the interpolation and integration errors. So

$\delta_f, \delta_g, \epsilon_f, \epsilon_g = O(\Delta t^q),$

where $q = 2$ if piecewise linear interpolation and trapezoid rule integration is used and $q = 4$ if cubic spline interpolation and Simpson's rule integration is used.

2. Hessian-Vector Products. With the interpolatory relation between the vector $U$ and the function $u$ in mind, we will no longer distinguish between them.

We approximate the product $\nabla^2 f(u) w$ by differences with a difference increment of $h$. We will scale the difference increment by $\|u\|$ when $u \ne 0$ and obtain a forward difference approximation:

(2.1)    $D_h^2 f(u; w) = 0$ if $w = 0$, and
         $D_h^2 f(u; w) = \|w\| \left[ g(u + h w / \|w\|) - g(u) \right] / h$ if $w \ne 0$.

The floating point arithmetic error in the computation of $u + h w/\|w\|$ is $O(\epsilon_{mach} \|u\|)$, where $\epsilon_{mach}$ is machine roundoff. Hence the error $E$ in the computation of the numerator of the difference quotient in (2.1) is

$E = O(\epsilon_g \|\nabla f(u)\| + \delta_g + \epsilon_{mach} \|u\|).$

Therefore

$\|\nabla^2 f(u) w - D_h^2 f(u; w)\| / \|w\| = O\!\left( h + \frac{\epsilon_g \|\nabla f(u)\| + \delta_g + \epsilon_{mach} \|u\|}{h} \right).$

Assuming that the $\epsilon_{mach} \|u\|$ term can be neglected, which is reasonable for $u$ of moderate size, the difference is first order accurate if $\delta_g = O(h^2)$ and $\epsilon_g \|\nabla f(u)\| = O(h^2)$. This indicates that near optimality, where $\|\nabla f(u)\|$ is small, one can allow the relative error in the gradient to increase somewhat. A similar observation was made in [3] and [4], where large relative errors in the gradient had fairly benign effects.

The situation is similar with central differences, where

$D_h^2 f(u; w) = 0$ if $w = 0$, and
$D_h^2 f(u; w) = \|w\| \left[ g(u + h w / \|w\|) - g(u - h w / \|w\|) \right] / (2h)$ if $w \ne 0$.

Here

$\|\nabla^2 f(u) w - D_h^2 f(u; w)\| / \|w\| = O\!\left( h^2 + \frac{\epsilon_g \|\nabla f(u)\| + \delta_g}{h} \right).$

To maintain second order accuracy we must have $\delta_g = O(h^3)$ and $\epsilon_g \|\nabla f(u)\| = O(h^3)$.

To summarize and make the constant in the $O$-term explicit, there is $K_d > 0$ such that for all $w \in R^N$,

(2.2)    $\|\nabla^2 f(u) w - D_h^2 f(u; w)\| \le K_d \left[ \frac{\epsilon_g \|\nabla f(u)\| + \delta_g}{h} + h^m \right] \|w\|,$

where $m = 1$ for forward differences and $m = 2$ for centered differences.

3. Errors induced by differencing. In this section we illustrate four ways in which difference errors can affect the algorithm from [15] and propose ways to address them.

From now on we will assume that

$\delta_f = \delta_g = \delta$ and $\epsilon_f = \epsilon_g = \epsilon,$

and that $\epsilon \|\nabla f(u)\| \le \delta$ throughout the iteration. In this case there is $\bar K$ such that

(3.1)    $\|g(u) - \nabla f(u)\| \le \bar K \delta,$

and (2.2) can be written more simply as

(3.2)    $\|\nabla^2 f(u) w - D_h^2 f(u; w)\| \le (\bar K + K_d) \left( \delta/h + h^m \right) \|w\|.$

The right side of (3.2) is minimized when

(3.3)    $h = \delta^{1/(m+1)},$

and we will enforce that for the remainder of the paper. Hence (3.2) becomes

(3.4)    $\|\nabla^2 f(u) w - D_h^2 f(u; w)\| \le K_h \delta^{m/(m+1)} \|w\|,$

where $K_h = 2(\bar K + K_d)$.

3.1. The $h$-CG Iteration. For $w \in R^N$ we let $D_h^2(w)$ be either the forward or central difference approximation of the Hessian-vector product $\nabla^2 f(u) w$. Let $\{p_k\}$ be the search directions formed by the usual implementation [9] of CG with either forward or central difference Hessian-vector products. Let $w_k = D_h^2(p_k)$ and let $\lambda_k = p_k^T w_k$. Then the first $N$ CG iterates are the same as those that would be obtained with the matrix

$B = \sum_{k=1}^{N} \frac{w_k w_k^T}{\lambda_k}.$

The finite difference CG iteration (in exact arithmetic) is equivalent to that for the matrix $B$. However the matrix $B$ need not be a good approximation to $\nabla^2 f(u)$.

If $\lambda_k \le 0$ then the CG-TR algorithm moves in the direction $p_k$ to the trust region boundary and returns a step. Therefore, if the approximate solution to the trust region problem is obtained in $N$ iterations, it is also the one that would be obtained with $B$ as the model Hessian.

3.2. Termination of the Linear Iteration. In most Newton-iterative methods the inner iteration is terminated when

(3.5)    $\|\nabla^2 f(u) s + \nabla f(u)\| \le \eta \|\nabla f(u)\|,$

where the parameter $\eta$ is called the forcing term. However, when using finite difference Hessian-vector products and low-resolution functions and gradients, we expect that the step is on the trust region boundary or that

(3.6)    $\|D_h^2(s) + g\| \le \eta \|g\|.$

Assuming that the step is in the interior of the trust region, the CG iteration returns (at least in exact arithmetic [7]) when

(3.7)    $\|B s + g\| \le \eta \|g\|.$

Neither (3.6) nor (3.7) implies (3.5). Moreover, since $D_h^2(s)$ is not linear in $s$, (3.6) is not equivalent to (3.7).

Following the analysis in [9] and [10] one can prove

THEOREM 3.1. Let $\lambda_{min} > 0$ and $\lambda_{max}$ be the smallest and largest eigenvalues of $B$. Then

(3.8)    $\|(\nabla^2 f(u) - B) q\| \le N K_h \sqrt{\lambda_{max}/\lambda_{min}}\; \delta^{m/(m+1)} \|q\|$

for all $q$ in the $N$th Krylov space for $B$.

Proof. We let $\{p_k\}$ be the CG search directions, which form a $B$-orthogonal basis for the Krylov space. By construction, $B p_k = D_h^2(p_k)$. Hence, by (3.4),

$\|(\nabla^2 f(u) - B) p_k\| \le K_h \delta^{m/(m+1)} \|p_k\|.$

If $q$ is in the $N$th Krylov space, then we can expand $q$ using $B$-orthogonality of the basis as

$q = \sum_{k=1}^{N} \xi_k p_k, \quad \text{where} \quad \xi_k = \frac{p_k^T B q}{p_k^T B p_k}.$

Hence,

$(\nabla^2 f(u) - B) q = \sum_{k=1}^{N} \xi_k (\nabla^2 f(u) - B) p_k,$

and therefore,

$\|(\nabla^2 f(u) - B) q\| \le K_h \delta^{m/(m+1)} \sum_{k=1}^{N} |\xi_k| \|p_k\|.$

By $B$-orthogonality,

$\frac{|\xi_k| \|p_k\|}{\|q\|} \le \frac{\|q\|_B}{\sqrt{\lambda_{min}}\, \|q\|} \le \sqrt{\lambda_{max}/\lambda_{min}},$

which completes the proof.

If $\lambda_1 \le 0$ the trust region algorithm will move to the trust region boundary and no further CG iterations will be taken.

LEMMA 3.2. Let $\lambda_{min} > 0$ and $\lambda_{max}$ be the smallest and largest eigenvalues of $B$. Assume (3.8) holds. Then (3.7) implies

(3.9)    $\|\nabla^2 f(u) s + \nabla f(u)\| \le \bar\eta_1 \|\nabla f(u)\| + \mu_1,$

and (3.5) implies

(3.10)    $\|B s + g\| \le \bar\eta_2 \|g\| + \mu_2,$

where, for $i = 1, 2$,

$\bar\eta_i = \eta + O(\delta^{m/(m+1)})$ and $\mu_i = O(\delta).$

Proof. We prove that (3.7) implies (3.9); the other half of the proof is similar. By (3.8) and (3.1),

$\|\nabla^2 f(u) s + \nabla f(u)\| \le \|B s + g\| + O(\delta^{m/(m+1)}) \|s\| + O(\delta) \le \eta \|g\| + O(\delta^{m/(m+1)}) \|s\| + O(\delta) \le \eta \|\nabla f(u)\| + O(\delta^{m/(m+1)}) \|s\| + O(\delta).$

(3.7) implies that

$\|s\| \le (1 + \eta) \lambda_{min}^{-1} \|g\|,$

and hence

$\|\nabla^2 f(u) s + \nabla f(u)\| \le (\eta + O(\delta^{m/(m+1)})) \|\nabla f(u)\| + O(\delta),$

which is (3.9).

Termination of the CG iteration with (3.7) implies that (3.5) holds for a slightly larger $\eta$ (i.e., the step is a useful step for $f$) if

(3.11)    $\delta^{m/(m+1)} \le \bar\sigma \eta \|g\|$

and $0 < \bar\sigma < 1$. (3.11) will hold if, for example,

$\|g\| \ge C \delta^{m/(m+1)}$

for a sufficiently large $C$ and

(3.12)    $\eta \ge \delta^{m/(m+1)} / (\bar\sigma \|g(u)\|).$

In summary, to guarantee that the inexact Newton iteration will behave correctly, the termination criterion for the nonlinear iteration, say,

$\|g\| \le \tau$

for some $\tau > 0$, must be connected to the forcing term $\eta$ in the inexact Newton iteration and to the error in the numerical differencing. One way to do this is to use (3.12). Even larger choices for $\eta$ can be more efficient in the earlier phases of the iteration [6].

3.3. Accuracy of the Quadratic Model. Having generated a step $s$, the next stage in CG-TR is to test the step for acceptability and adjust the trust region radius. These decisions are based on comparing the predicted reduction $pred$ (the reduction in the quadratic model) with the actual reduction $ared$. In the present case, both computations can be in error.

To compute $pred$, the reduction in the quadratic model, one approximates

$pred_{ideal} = -\nabla f(u)^T s - s^T \nabla^2 f(u) s / 2$

with

(3.13)    $pred = -g^T s - s^T D_h^2(s) / 2.$

If the ideal quadratic model is used, $pred > 0$ is guaranteed in TR methods, such as CG-TR, that enforce Cauchy decrease. As mentioned above, we do not use the ideal quadratic model, but one based on $B$. However,

$-g^T s - s^T D_h^2(s)/2 \ne -g^T s - s^T B s / 2$

unless $s = p_k$ for some $k$. Hence the value of $pred$ we compute is neither the ideal quadratic model reduction nor the one based on $B$. The error in $pred$ can be estimated:

$|pred - pred_{ideal}| \le C \left( \delta \|s\| + (\delta/h + h^m) \|s\|^2 \right).$

Using the choice $h = \delta^{1/(m+1)}$, we obtain

(3.14)    $|pred - pred_{ideal}| \le C \left( \delta \|s\| + \delta^{m/(m+1)} \|s\|^2 \right).$

Near optimality, there is $\sigma_- > 0$ such that

$pred_{ideal} \ge \sigma_- \|s\|^2,$

and this will dominate the error in (3.14) while $\|s\| \gg \delta^{m/(m+1)}$, i.e., until convergence.

Far from the solution, however, $\nabla^2 f(u)$ can have small or negative eigenvalues, as can $B$. In this case, the error in (3.14) can be large when $\|s\|$ is large. A detection of negative curvature from the CG iteration may not be confirmed when $pred$ is computed using (3.13). Simply reducing the TR radius when $pred \le 0$ will solve this problem, as the error is entirely in the quadratic term.

3.4. Measurement of Decrease. If $\delta_f = \delta$, $\epsilon_f = \epsilon$, and $ared = f(u) - f(u+s)$, then $ared$, as observed with errors taken into account, is

$ared_{obs} = (f(u) - f(u+s))(1 + O(\epsilon)) + O(\delta).$

The relative errors will not affect the high-order bits of $ared$ and are therefore harmless. However, the absolute error can make $ared$ useless. If, now, $u$ is near a minimizer $u^*$, then

$f(u) - f(u+s) = -\nabla f(u)^T s - s^T \nabla^2 f(u) s / 2 + O(\|s\|^3) = O(\|s\| \|\nabla f(u)\| + \|s\|^2).$

If $\|s\| = O(\|g\|)$, then $f(u) - f(u+s) = O(\|g\|^2)$; hence as soon as $\|g\| = O(\sqrt\delta)$, $ared_{obs}$ will have no accuracy at all and can mislead the trust region algorithm.

In the case where $\delta_f = \delta_g = \delta$, acceptance of an inexact Newton step near $u^*$ implies that

$\|g(u+s)\| = O(\eta \|g(u)\| + \delta).$

So if $\eta$ is sufficiently small and if $ared$ has only one or two digits of accuracy, the step will be accepted and a good reduction obtained. If $\eta$ is large, on the other hand, inaccuracy in $ared$ may result in stagnation.

Two possible solutions are to make sure that $\eta$ is small or to abandon the test for decrease once $\|g\|$ becomes sufficiently small or when $ared \le O(\delta)$. This means that the optimization algorithm becomes a Newton-CG iteration and only seeks to find a root of $g$.

4. Changes to the Trust Region Algorithm. The previous discussion suggests several modifications of the algorithm in [15]:
• Terminate the iteration when the TR radius is below a prescribed tolerance. This is consistent with the more standard practice of terminating when too many reductions in the TR radius have been taken. We implement this in all the experiments.
• Modification pred: Reduce the TR radius if $pred \le 0$.
• Modification $\eta$: Enforce $\|g(u)\|$ and $\delta^{m/(m+1)}/\|g(u)\|$ (cf. (3.12)) as lower bounds for $\eta$.
• Modification ared: Switch to an equations algorithm (currently Newton-CG) when $\|g\|$ or $ared$ becomes sufficiently small; a sketch of the resulting outer loop follows the settings below.

The trust region-CG code from [10] was modified to incorporate these changes. The trust region parameters were left unchanged. Both examples have tolerances that can be controlled, exactly for the simple example in §4.1.1 and approximately in §4.1.2. In all the examples we set $\delta_f = \delta_g = \delta$, use centered differences with a difference increment of $h = \delta^{1/3}$, and terminate the iteration when $\|g(u)\|$ was small or when the trust region radius has been decreased more than 20 times, the latter an indication that the limit of resolution of the function has been reached.

4.1. Examples.

4.1.1. Perturbed Quadratic. The purpose of this example is to show that $pred < 0$ is possible and can lead to failure of the optimization. The modification pred solves this problem and the other modifications lead to a more efficient algorithm.

The error-free problem is a quadratic

$f(u) = (u - 2c)^T H (u - 2c) / 2,$

where $c = (1, \dots, 1)^T$ and the diagonal Hessian $H$ is given by

$H_{ii} = 1 + (\kappa - 1)(i - 1)/(N - 1), \quad 1 \le i \le N.$

The Hessian has condition number $\kappa$.

We designed perturbations that vary rapidly with $u$ in the following way. The perturbation for the function was

$\delta R(u) + \epsilon\, r(u) f(u),$

where

$R(u) = \cos(2000\, \xi(u)), \quad r(u) = \sin(2000\, \xi(u)),$

and

$\xi(u) = \sum_{i=1}^{N} u_i / 100.$

The perturbation for the gradient,

$\delta \hat R(u) + \epsilon\, \hat r(u) \circ \nabla f(u)$

(with $\circ$ the componentwise product), was constructed similarly. Here

$(\hat R(u))_i = \cos(2000 \cos(u_i))$ and $(\hat r(u))_i = \sin(2000 \cos(u_i)).$

In the computations reported in this section the forcing term $\eta$ in (3.5) was set to .1 when modification $\eta$ was inactive.

In the computation reported in Figure 4.1 we terminated the iteration when $\|g\|$ fell below a fixed tolerance. From the plots one can see the increase in the function value if the sign of $pred$ is not tested, and the significant reduction in the number of CG iterations if modification $\eta$ is enforced. The ared modification did not become active in this example.

FIG. 4.1. Quadratic example. For the unmodified algorithm (top), the pred modification (middle), and the fully modified algorithm (bottom): gradient norm against iterations (left) and function value against cumulative CG iterations (right).

4.1.2. Optimal Control Problem. In this example $L$ in (1.2) was a smooth quadratic tracking functional of the state and control, and $\phi$ in (1.3) was a smooth nonlinear right-hand side.

The discretized control $u$ was a piecewise linear spline with 10 nodes, and the unknowns were the values at the nodes, which were equally spaced on $[0, 1]$. In view of the expected second order accuracy, we set the relative and absolute error tolerances in the ODE integrator to $\Delta t^2$. We solved (1.3) and (1.5) with the ode15s stiff integrator in MATLAB. The solution of (1.3) was reported at the nodes by the integrator, and $y$ was extended to all of $[0, 1]$ with piecewise linear interpolation.

In the computations reported in this section the forcing term $\eta$ in (3.5) was set to .1 when modification $\eta$ was inactive, $\delta$ was taken consistent with the integrator tolerances, and the iteration was terminated when $\|g\|$ fell below a fixed tolerance.

FIG. 4.2. Control problem. For the unmodified algorithm (top), the $\eta$ modification (middle), and the fully modified algorithm (bottom): gradient norm against iterations (left) and function value against cumulative CG iterations (right).

REFERENCES

[1] P. N. BROWN, G. D. BYRNE, AND A. C. HINDMARSH, VODE: A variable coefficient ODE solver, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 1038–1051.

[2] P. N. BROWN, A. C. HINDMARSH, AND L. R. PETZOLD, Using Krylov methods in the solution of large-scale differential- algebraic systems, SIAM J. Sci. Comput., 15 (1994), pp. 1467–1488.

[3] R. G. CARTER, On the global convergence of trust region algorithms using inexact gradient information, SIAM J. Numer. Anal., 28 (1991), pp. 251–265. [4] ——, Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information, SIAM J. Sci. Comput., 14 (1993), pp. 368–388.

[5] R. DEMBO AND T. STEIHAUG, Truncated Newton algorithms for large-scale optimization, Math. Prog., 26 (1983), pp. 190– 212.

[6] S. C. EISENSTAT AND H. F. WALKER, Globally convergent inexact Newton methods, SIAM J. Optim., 4 (1994), pp. 393– 422.

[7] A. GREENBAUM, Iterative Methods for Solving Linear Systems, no. 17 in Frontiers in Applied Mathematics, SIAM, Philadel- phia, 1997.

[8] W. W. HAGER, Rates of convergence for discrete approximations to unconstrained optimal control problems, SIAM J. Numer. Anal., 13 (1976), pp. 449–472.

[9] C. T. KELLEY, Iterative Methods for Linear and Nonlinear Equations, no. 16 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 1995. [10] ——, Iterative Methods for Optimization, no. 18 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 1999.

[11] S. G. NASH, Newton-type minimization via the Lanczos method, SIAM J. Numer. Anal., 21 (1984), pp. 770–789. [12] ——, Preconditioning of truncated Newton methods, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 599–616.

[13] K. RADHAKRISHNAN AND A. C. HINDMARSH, Description and use of LSODE, the Livermore solver for ordinary differential equations, Tech. Rep. UCRL-ID-113855, Lawrence Livermore National Laboratory, December 1993.

[14] L. F. SHAMPINE AND M. W. REICHELT, The MATLAB ODE suite, SIAM J. Sci. Comput., 18 (1997), pp. 1–22.

[15] T. STEIHAUG, The conjugate gradient method and trust regions in large scale optimization, SIAM J. Numer. Anal., 20 (1983), pp. 626–637.