FUNCTION OPTIMIZATION

Andreas Griewank, Graduiertenkolleg 1128, Institut für Mathematik, Humboldt-Universität zu Berlin

Script, version June 4, 2008, by Levis Eneya & Lutz Lehmann

Winter Semester 2007/2008

Contents

Overview 1

1 Existence Theory 2
1.1 Existence via Arzelà-Ascoli 2
1.2 Existence via weak compactness of S 3

2 (Generalized) Differentiation 8
2.1 Directional derivatives for $S \subset X$ Banach 8
2.2 Subdifferential 9
2.3 Generalized Jacobians and Hessians 17

3 Fréchet and Gâteaux Differentiability 19

4 Tangent Cones and Sensitivity 28
4.1 Motivation 28
4.2 Basic Review of Adjoints 31
4.3 Inequality Constraints via Cones 33

5 Results 40

6 Remarks on the Two-Norm Discrepancy 45

Appendix 47

References 50

Overview

We consider abstract mathematical optimization problems

$$\min_{x \in S} f(x) \qquad \text{for } f : \hat S \supset S \to \mathbb{R},$$
i.e. find $\bar x = x_* \in S$ such that $f_* := f(\bar x) \le f(x)$ for all $x \in S$. We call $S$ the feasible set (Jahn [Jah96]: constraint set). Generally $\hat S \subset X$ is an open convex domain in $(X, \|\cdot\|)$.

Optimization Activities:

(0) Prove globally the existence of a minimizer $\bar x$.

(1) Characterize locally the minimality of $\bar x$ by necessary and sufficient optimality conditions.

(2) Approximate the problem by a finite dimensional "discretization"
$$\min_{x \in S_n} f(x), \qquad S_n \subset X_n \subset X \text{ with } \dim(X_n) < \infty.$$

(3) Generate iterates $x_n^{(k)}$ such that
$$\lim_{n\to\infty}\Big(\lim_{k\to\infty} x_n^{(k)}\Big) = \lim_{n\to\infty} \bar x_n = \bar x \in f^{-1}(f_*).$$

(4) Analyze and implement fast algorithms for generating the iterates $x_n^{(k)}$.

In this course we will mostly do (0) and (1) and some of (2)-(4). In terms of notation and organization we follow Jahn [Jah96], while the finite-dimensional terminology is as in Bonnans et al. [BGLS06]. For nonsmooth aspects we follow Klatte & Kummer [KK02] and Clarke [Cla90], and Ponstein [Pon80] for the abstract framework.

1 Existence Theory

Definition. Any sequence $\{x_j\}_{j=0}^\infty \subset S$ such that $f_* \equiv \inf_{x\in S} f(x) = \lim_{j\to\infty} f(x_j)$ is called an infimizing (minimizing) sequence.

$f_*$ always exists but may be $-\infty$. General task: show that some infimizing sequences have, in some sense, a limit $\bar x \in S$ such that $f(\bar x) = f_*$. Classical scenario: $f$ is continuous on a closed and bounded $S \subset \mathbb{R}^n$. This implies that any infimizing sequence has a convergent subsequence $x_{j_k} \to \bar x \in S$ with $f(\bar x) = f_*$.

1.1 Existence via Arzelà-Ascoli

Proposition 1.1. $\dim(X) = \infty \iff$ not every closed and bounded set is sequentially compact.

Examples:

• $\ell^2 \equiv$ space of square summable sequences $\subset \mathbb{R}^\infty$:
$$x = (x_i)_{i=1}^\infty \in \ell^2 \iff \|x\| = \sqrt{\sum_{i=1}^\infty x_i^2} < \infty.$$
Take $x_j = e_j = (0, 0, \dots, 0, \underbrace{1}_{j\text{th pos.}}, 0, \dots)$. Then $\|x_j - x_k\| = \sqrt2$ if $j \ne k$.

$\Rightarrow$ No subsequence of $(x_i)_{i=1}^\infty$ can be Cauchy.

• $X = C[0,1] \equiv$ space of continuous functions on $[0,1]$ with $\|x\| = \max\{|x(\omega)| : \omega \in [0,1]\}$; take e.g. $x_j = \sin(j\omega\pi)$ or $x_j = \omega^j$. For the first family $\|x_j - x_k\| \ge 1$, so there is no Cauchy subsequence. $C(\Omega)$ is known to be a complete normed space, but the pointwise limit
$$x_*(\omega) = \lim_{j\to\infty} x_j(\omega) = \lim_{j\to\infty} \omega^j = \begin{cases} 0 & \text{if } 0 \le \omega < 1,\\ 1 & \text{if } \omega = 1 \end{cases}$$
is discontinuous $\Rightarrow \lim x_j$ does not exist in $C[0,1]$.

Observation: In the second example, the slopes of the $x_j(\omega)$ tend towards infinity near $\omega = 1$.
Question: Why does $\{\omega^j\}$ contain no Cauchy subsequence? Fix $j_0$ and pick $\omega_0 < 1$ such that $\omega_0^{j_0} > \tfrac12$. Since $\omega_0^k \to 0$ as $k \to \infty$, it follows that $\lim_{k\to\infty} \|x_{j_0} - x_k\| \ge \tfrac12$; see also the numeric sketch below.
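The following minimal numeric sketch (not part of the original script; plain NumPy with names chosen here) illustrates this for the monomials: the sup-norm distance between $x_j = \omega^j$ and $x_{2j} = \omega^{2j}$ equals $\max_{t\in[0,1]}(t - t^2) = 1/4$ for every $j$, so no subsequence can be Cauchy.

```python
import numpy as np

# Sup-norm distance between x_j(w) = w**j and x_{2j}(w) = w**(2j) on [0, 1],
# approximated on a fine grid.
w = np.linspace(0.0, 1.0, 100001)

for j in [1, 2, 5, 10, 50]:
    k = 2 * j
    dist = np.max(np.abs(w**j - w**k))
    # max_{t in [0,1]} (t - t^2) = 1/4 with t = w**j, so dist = 1/4 for all j
    print(f"j={j:3d}, k={k:3d}: ||x_j - x_k||_inf = {dist:.4f}")
```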

General Assumptions:

(i) $X = C(\Omega)$ with $\Omega \subset \mathbb{R}^d$ satisfying the cone condition used in Sobolev embedding.

(ii) $\Omega$ is compact in $\mathbb{R}^d$.

Definition. $S \subset X = C(\Omega)$ is called equicontinuous if $\forall \varepsilon > 0\ \exists \delta > 0\ \forall \omega, \omega' \in \Omega,\ \forall x \in S$ it holds that $\|\omega' - \omega\| < \delta \Rightarrow |x(\omega) - x(\omega')| < \varepsilon$.

Proposition 1.2. $S \subset X = C(\Omega) \equiv C^0$ is equicontinuous if it is contained in some subset of $C^{0,\alpha}$ that is bounded with respect to the Hölder norm
$$\|x\|_{0,\alpha} = \|x\|_0 + \sup_{\omega \ne \omega'} \frac{|x(\omega) - x(\omega')|}{\|\omega - \omega'\|^\alpha} \tag{1.1}$$
for $\alpha \in (0,1]$. Thus $\gamma = \sup\{\|x\|_{0,1} : x \in S\} < \infty \Rightarrow S$ is equicontinuous in $C(\Omega)$.

Proof. Take $\delta = (\varepsilon/\gamma)^{1/\alpha}$.

Proposition 1.3 (Arzelà-Ascoli). $S$ is precompact in $X = C(\Omega)$, i.e. the closure $\bar S$ is compact in $X$, if and only if

(i) $S$ is bounded, i.e. $\sup_{x\in S} \|x\| < \infty$,

(ii) $S$ is equicontinuous.

Proof. (See Lebedev & Vorovich [LV03].)

Question: Why does $C^{0,1}([0,1]) \equiv \{x \in C^0[0,1] : \|x\|_{0,1} < \infty\}$ not have a compact unit ball? In other words, is Proposition 1.1 really true? Note that by (1.1) any bounded sequence has a limit $x_* \in C^0[0,1]$. By "intimidation", $\|x_*\|_{0,1} \le \limsup \|x_j\|_{0,1} < \infty$, so $x_* \in C^{0,1}[0,1]$. See Examples 1 and 2 in Appendix 1.

1.2 Existence via weak compactness of S

Observation: An infimizing sequence $\{x_j\}_{j=1}^\infty \subset S \subset X$ need not have a strong accumulation point for the existence of $x_* \in S$ with $f(x_*) = \inf_{x\in S} f(x)$.

Definition. A sequence $x_j \in X$ is called weakly convergent to some weak limit $x_*$ if $\ell(x_*) = \lim_{j\to\infty} \ell(x_j)$ for all $\ell \in X^*$, i.e. for all continuous linear functionals $\ell$ on $X$. Then one writes $x_j \rightharpoonup x_*$.

Observation: $\|x_j - x_*\| \to 0 \iff x_j \to x_* \Rightarrow x_j \rightharpoonup x_*$, because the $\ell$ are by definition continuous with respect to $\|\cdot\|$.

Definition. A set $S \subset X$ is called weakly closed if $S \supset \{x_j\}_{j\ge 0}$ and $x_j \rightharpoonup x_*$ imply $x_* \in S$.

Proposition 1.4. $S$ weakly closed $\Rightarrow$ $S$ closed. Note: the converse is only true if $\dim(X) < \infty$ or $S$ is convex.

Proof. Exercise.

Definition. $S$ is called weakly precompact if any sequence $\{x_j\}_{j=1}^\infty \subset S$ has a weakly convergent subsequence $x_{j_k} \rightharpoonup x_*$. $S$ is called weakly compact if it is weakly precompact and weakly closed, so that $x_* \in S$.

Proposition 1.5. Bounded sets in Hilbert spaces and other reflexive Banach spaces are weakly precompact (e.g. in $L^p(\Omega)$, $1 < p < \infty$).

Definition. For a given $f : S \to \mathbb{R}$ the set $E(f) \equiv \{(x,\alpha) \in S \times \mathbb{R} : f(x) \le \alpha\}$ is called the epigraph of $f$ on $S$.

Proposition 1.6. For $S \subset X$ weakly closed and $f : S \to \mathbb{R}$, the following statements are equivalent:

(i) $f$ is weakly lower semicontinuous (w.l.s.c.) on $S$, i.e. $x_j \rightharpoonup x$ implies $f(x) \le \liminf_{j\to\infty} f(x_j)$.

(ii) $E(f)$ is weakly closed.

(iii) $S_\alpha := \{x \in S : f(x) \le \alpha\}$ is weakly closed or empty for all levels $\alpha \in \mathbb{R}$.

Proof.

• (i) $\Rightarrow$ (ii): Consider $\{(x_j,\alpha_j)\}_{j=1}^\infty \subset E(f)$ with $(x_j,\alpha_j) \rightharpoonup (x,\alpha)$. Then $x \in S$ because $S$ is weakly closed. Furthermore $\lim \alpha_j = \alpha$ because projection onto the last component is continuous. Thus $f(x) \le \liminf_{j\to\infty} f(x_j) \le \liminf_{j\to\infty} \alpha_j = \alpha$, so $f(x) \le \alpha$ and $x \in S$ $\Rightarrow (x,\alpha) \in E(f)$ $\Rightarrow E(f)$ is weakly closed.

• (ii) $\Rightarrow$ (iii): $E(f)$ and $S \times \{\alpha\}$ weakly closed $\Rightarrow E(f) \cap (S \times \{\alpha\}) = S_\alpha \times \{\alpha\}$ weakly closed (because $(x,\alpha) \in E(f) \cap (S\times\{\alpha\}) \iff f(x) \le \alpha \iff (x,\alpha) \in S_\alpha \times \{\alpha\}$) $\Rightarrow S_\alpha \equiv \{x \in S : f(x) \le \alpha\}$ is weakly closed.

• (iii) $\Rightarrow$ (i): Suppose $x_j \rightharpoonup x$ with $f(x) > \lim_{j\to\infty} f(x_j)$. Then there exists $\alpha \in \mathbb{R}$ with $\lim_{j\to\infty} f(x_j) < \alpha < f(x)$, so that $x_j \in S_\alpha$ for all large $j$ while $x \notin S_\alpha$, contradicting the weak closedness of $S_\alpha$.

Observation/Implications:

Continuity $\Leftarrow$ Weak continuity
$\Downarrow$ $\qquad\qquad$ $\Downarrow$
Lower semicontinuity $\Leftarrow$ Weak lower semicontinuity

Proposition 1.7. Let $S$ be a bounded and weakly closed subset of a reflexive Banach space and let $f : S \to \mathbb{R}$ be weakly lower semicontinuous on $S$. Then there exists an $x_* \in f^{-1}(f_*)$, i.e. $f(x_*) = \inf_{x\in S} f(x)$.

Sufficient conditions for lower semicontinuity: Assume the definitions of convex sets and convex functions are known.

Definition. $f$ is called quasi-convex on $S$ if $S_\alpha = \{x \in S : f(x) \le \alpha\}$ is convex for any $\alpha \in \mathbb{R}$.

Observation: $g : \mathbb{R} \to \mathbb{R}$ monotonic and $f$ quasi-convex $\Rightarrow g \circ f$ quasi-convex, which applies especially if $f$ is convex.

Example: $f(x) = x$ for $x \in \mathbb{R}_+ = \{x \in \mathbb{R} : x \ge 0\}$ and $g = \ln$ $\Rightarrow g \circ f = \ln x$ is quasi-convex. The converse holds if $g$ is strictly monotonic, because then $g^{-1}$ exists and is monotonic.

Proposition 1.8. If $S$ is convex and closed and $f$ is continuous and quasi-convex, then $f$ is w.l.s.c.

Proof. $S_\alpha$ is convex and closed for any $\alpha \in \mathbb{R}$ $\Rightarrow S_\alpha$ is weakly closed $\iff f$ is w.l.s.c. (by Proposition 1.6).

[Figure 1: Open neighborhoods of $\bar x$: the ball $\mathcal B_\rho(\bar x)$ containing points $\hat x$, $\tilde x$, $x$, and the smaller ball $\mathcal B_{\rho/4}$]

Lemma. If $f$ is convex and bounded on an open neighborhood $\mathcal N \subset X$ of $\bar x$ then $f$ is Lipschitz continuous near $\bar x$.

Proof. Assume $|f(x)| \le \bar f$ for all $x \in \mathcal B_\rho(\bar x)$ (refer to Figure 1). For $x, \tilde x \in \mathcal B_{\rho/2}(\bar x)$ set $\hat x = x + \frac{\rho}{2}\frac{\tilde x - x}{\|\tilde x - x\|}$, so that $\|\hat x - x\| = \rho/2$ $\Rightarrow \|\hat x - \bar x\| \le \rho/2 + \rho/2 = \rho$, i.e. $\hat x \in \mathcal B_\rho(\bar x)$. Then $\tilde x$ lies on the segment from $x$ to $\hat x$, and by convexity
$$f(\tilde x) \equiv f\Big(\Big(1 - \tfrac{2\|\tilde x - x\|}{\rho}\Big)x + \tfrac{2\|\tilde x - x\|}{\rho}\hat x\Big) \le \Big(1 - \tfrac{2\|\tilde x - x\|}{\rho}\Big)f(x) + \tfrac{2\|\tilde x - x\|}{\rho}f(\hat x)$$
$$= f(x) + \tfrac{2\|\tilde x - x\|}{\rho}\big(f(\hat x) - f(x)\big) \le f(x) + \tfrac{4\bar f}{\rho}\|\tilde x - x\|.$$
Exchanging the roles of $x$ and $\tilde x$ gives $f(x) \le f(\tilde x) + \tfrac{4\bar f}{\rho}\|\tilde x - x\|$, whence
$$|f(\tilde x) - f(x)| \le \tfrac{4\bar f}{\rho}\,\|\tilde x - x\|.$$

Lemma. Let $f : S \to \mathbb{R}$ be convex and bounded on some open neighborhood in $S$. Then $f$ is bounded on some open neighborhood of any $x \in S^\circ$.

Proof. Suppose $f$ is bounded on $\mathcal B_\rho(\bar x)$ (see Figure 2). Choose $\hat x = \lambda x + (1-\lambda)\bar x \in S$ for some $\lambda > 1$ and set $\hat\rho = (1 - \tfrac1\lambda)\rho$. Then $\forall \tilde x \in \mathcal B_{\hat\rho}(x)$ $\exists y \in \mathcal B_\rho(\bar x)$ and $\exists \alpha \in [0,1]$ such that $\tilde x = (1-\alpha)\hat x + \alpha y$
$$\Rightarrow f(\tilde x) \le (1-\alpha)f(\hat x) + \alpha f(y) \le \max\{f(\hat x), f(y)\} \le f(\hat x) + \sup_{y \in \mathcal B_\rho(\bar x)} f(y) < \infty.$$

Corollary: If $f : S \to \mathbb{R}$ is convex and bounded on some open neighborhood in $S$, then $f$ is Lipschitz continuous on some neighborhood of any $x \in \operatorname{int}(S)$.

Application to approximation problem.

Lemma. For $S$ convex and closed and $f$ convex and continuous, $f$ is w.l.s.c.

[Figure 2: Open neighborhoods of $\bar x$ and $x$, with $\hat x = \lambda x + (1-\lambda)\bar x$ for $\lambda > 1$, $y \in \mathcal B_\rho(\bar x)$, and the ball $\mathcal B_{\hat\rho}(x)$ inside $S$]

Proof. By the triangle inequality for the norm.

Proposition 1.9. On a reflexive Banach space $X$, any closed convex $S$ is proximinal in the sense that for any $\hat x \in X$ the distance $f(x) = \|x - \hat x\| : S \to \mathbb{R}$ attains a minimum.

Proof. Assume $f(x_n) \to f_* = \inf_{x\in S} f(x) \ge 0$ monotonically. We may restrict the $x_n$ to the convex, closed and bounded set
$$S_f(x_0) = \{x \in S : \|x - \hat x\| \le \|x_0 - \hat x\| = f(x_0)\}$$
$\Rightarrow S_f(x_0)$ is weakly compact and $f$ is w.l.s.c. $\Rightarrow$ a minimizer $\bar x \leftharpoonup x_{k_j}$ exists.

Question: Is $\bar x$ unique? Answer: Generally not. Counterexample: $X = \mathbb{R}^2$, $\|\cdot\| = \|\cdot\|_\infty$, $S = \{x \in \mathbb{R}^2 : \|x\|_\infty = \max\{|x_1|, |x_2|\} = 1\}$, $\hat x = (2,0)$. Then $x_\alpha = (1,\alpha) \in S$ for $|\alpha| \le 1$ and $\|x_\alpha - \hat x\| = \|(-1,\alpha)\|_\infty = \max\{1, |\alpha|\} = 1$.

Converse: A Banach space $X$ is reflexive if and only if all closed convex $S \subset X$ are proximinal! Proof: see Jahn [Jah96].

Definition. A norm $\|\cdot\|$ on a Banach space $X$ is called uniformly convex if $\forall \varepsilon > 0\ \exists \rho > 0$ such that $x, y \in X$, $\|x\| = \|y\| = 1$ and $\|x - y\| \ge \varepsilon$ imply $\|\tfrac12(x+y)\| \le 1 - \rho$.

Example. In a Hilbert space,
$$\|\tfrac12(x+y)\|^2 = \tfrac14\|x\|^2 + \tfrac14\|y\|^2 + \tfrac12\langle x,y\rangle = 1 - \tfrac14\|x-y\|^2 \le 1 - \tfrac14\varepsilon^2.$$
Hence we may choose $\rho$ with $1 - \rho = \sqrt{1 - \tfrac14\varepsilon^2}$, e.g. $\rho = \varepsilon^2/8$.
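A minimal numeric sketch of the non-uniqueness counterexample above (not from the script; it simply scans the unit sphere of $(\mathbb{R}^2, \|\cdot\|_\infty)$ on a grid): every point $(1,\alpha)$ with $|\alpha| \le 1$ attains the minimal distance 1 to $\hat x = (2,0)$.

```python
import numpy as np

# Unit sphere of (R^2, ||.||_inf): the boundary of the square [-1, 1]^2.
xhat = np.array([2.0, 0.0])

a = np.linspace(-1.0, 1.0, 201)
right  = np.stack([np.ones_like(a), a], axis=1)
left   = np.stack([-np.ones_like(a), a], axis=1)
top    = np.stack([a, np.ones_like(a)], axis=1)
bottom = np.stack([a, -np.ones_like(a)], axis=1)
sphere = np.concatenate([right, left, top, bottom])

dists = np.max(np.abs(sphere - xhat), axis=1)      # inf-norm distances to xhat
print("min distance:", dists.min())                # 1.0
print("points attaining it:", np.sum(np.isclose(dists, dists.min())))
# the whole edge {(1, a) : |a| <= 1} consists of minimizers
```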

Conclusion: On a uniformly convex Banach space any approximation problem has a unique solution and minimizing sequences converge strongly.

Summary: implications for $f : S \to \mathbb{R}$ on convex $S$.

• quasi-convex and continuous $\Rightarrow$ w.l.s.c.

• convex and bounded $\Rightarrow$ continuous and convex $\Rightarrow$ quasi-convex and continuous.

• convex and bounded somewhere $\Rightarrow$ convex and bounded anywhere (near any interior point).

Properties in a reflexive Banach space $X$:

(i) Bounded weakly closed sets are weakly compact.

(ii) Convex closed sets are proximinal (though the minimizer is not always unique).

(iii) There exists a topologically equivalent norm that is uniformly convex (Renorming Theorem).

The non-uniqueness on $(\mathbb{R}^2, \|\cdot\|_\infty)$ is no contradiction: $(\mathbb{R}^n, \|\cdot\|_\infty)$ is topologically equivalent to $(\mathbb{R}^n, \|\cdot\|_2)$, i.e. $1 \le \|x\|_2/\|x\|_\infty \le \sqrt n$. Most importantly, continuity and convergence properties are the same.

Proposition 1.10. Uniform convexity of $\|\cdot\|$ on $X$ implies:

(i) ($x_k \rightharpoonup \bar x$ and $\|x_k\| \to \|\bar x\|$) $\iff x_k \to \bar x \iff \|x_k - \bar x\| \to 0$.

(ii) The minimizer $\bar x$ of $f(x) = \|x - \hat x\|$ over $x \in S$ (closed and convex) is unique.

(iii) Any infimizing sequence for (ii) converges strongly to $\bar x$.

Proof. (i) "$\Leftarrow$" is obvious. "$\Rightarrow$" for the Hilbert case:
$$\|x_k - \bar x\|^2 = \langle x_k - \bar x, x_k - \bar x\rangle = \|\bar x\|^2 - 2\langle \bar x, x_k\rangle + \|x_k\|^2$$
$$\lim_{k\to\infty} \|x_k - \bar x\|^2 = \|\bar x\|^2 - 2\langle \bar x, \bar x\rangle + \lim_{k\to\infty} \|x_k\|^2 = \|\bar x\|^2(1 - 2 + 1) = 0.$$

(ii) Suppose $f(\bar x_1) = f(\bar x_2) = d(S,\hat x) = \inf_{x\in S} f(x)$ but $\bar x_1 \ne \bar x_2$. Set
$$\tilde x_1 = \frac{\bar x_1 - \hat x}{d(S,\hat x)} \ne \tilde x_2 = \frac{\bar x_2 - \hat x}{d(S,\hat x)}$$
$\Rightarrow \varepsilon = \|\tilde x_2 - \tilde x_1\| \ne 0 \Rightarrow \exists \delta > 0$ such that $\|\tfrac12(\tilde x_1 + \tilde x_2)\| \le 1 - \delta$ $\Rightarrow f(\tfrac12(\bar x_1 + \bar x_2)) = d(S,\hat x)\|\tfrac12(\tilde x_1 + \tilde x_2)\| \le d(S,\hat x)(1-\delta)$, which contradicts the assumed optimality of $\bar x_1$ and $\bar x_2$, since $\tfrac12(\bar x_1 + \bar x_2) \in S$ by convexity.

(iii) Any subsequence of an infimizing sequence $\{x_k\} \subset S$ must have a weakly convergent subsequence whose weak limit is $\bar x$. Hence $x_k \rightharpoonup \bar x \iff x_k - \hat x \rightharpoonup \bar x - \hat x$. By the definition of infimizing sequences, $f(x_k) = \|x_k - \hat x\| \to \|\bar x - \hat x\| = \inf_{x\in S} f(x)$. By (i) this implies $x_k - \hat x \to \bar x - \hat x \iff x_k \to \bar x$.

2 (Generalized) Differentiation

2.1 Directional derivatives for $S \subset X$ Banach

Definition. For $f : S \to \mathbb{R}$, $x \in S$, $h \in X$ set
$$f'(x; h) = \lim_{\lambda \to 0^+} \frac{f(x + \lambda h) - f(x)}{\lambda} \in \mathbb{R}$$
provided $x + \lambda h \in S$ for $0 < \lambda \le \bar\lambda$ for some $\bar\lambda = \bar\lambda(h) > 0$ and the limit exists. If $f'(x; h)$ is well defined for all $h \in X$ then $f$ is said to be (Gâteaux-) directionally differentiable at $x$.

Example: $f(x_1, x_2) = x_2 \sin(|x_1|)$ is directionally differentiable everywhere, even where $x_1 = 0$. Counterexample: any univariate function that is not at least one-sidedly differentiable, e.g.
$$f(x) = \begin{cases} 0 & \text{if } x = 0\\ x\sin(1/x) & \text{otherwise.}\end{cases}$$

Observation: $f'(x; h)$ is positively homogeneous, i.e. $f'(x; \alpha h) = \alpha f'(x; h)$ for $\alpha \ge 0$.

Proposition 2.1. For convex $f : S \to \mathbb{R}$ we have:

(i) $f'(x; h) \in \mathbb{R}$ exists uniquely if $x + \lambda h \in S$ for $\lambda \in [-\bar\lambda, \bar\lambda]$.

(ii) $f(x + h) \ge f(x) + f'(x; h)$, if (i) applies.

(iii) $f'(x; h + \tilde h) \le f'(x; h) + f'(x; \tilde h)$ (sublinearity), so subgradients exist by Hahn-Banach.

Proof. After rescaling of $h$ assume $\bar\lambda = 1$. For fixed $x, h$ consider $\Delta f(\lambda) = f(x + \lambda h) - f(x)$, which is convex. To bound $\Delta f(\lambda)/\lambda$ from below consider
$$0 = \Delta f(0) = \Delta f\Big(\tfrac{1}{1+\lambda}\lambda + \tfrac{\lambda}{1+\lambda}(-1)\Big) \le \tfrac{1}{1+\lambda}\Delta f(\lambda) + \tfrac{\lambda}{1+\lambda}\Delta f(-1), \quad \lambda > 0,$$
so $0 \le \Delta f(\lambda) + \lambda\Delta f(-1)$ $\Rightarrow \Delta f(\lambda)/\lambda \ge -\Delta f(-1)$. For any $\mu > \lambda$,
$$\Delta f(\lambda) = \Delta f\Big(\tfrac{\mu-\lambda}{\mu}\cdot 0 + \tfrac{\lambda}{\mu}\cdot\mu\Big) \le \tfrac{\mu-\lambda}{\mu}\Delta f(0) + \tfrac{\lambda}{\mu}\Delta f(\mu) \;\Rightarrow\; \frac{\Delta f(\lambda)}{\lambda} \le \frac{\Delta f(\mu)}{\mu} \quad\text{for } 0 < \lambda \le \mu.$$
Hence $\Delta f(\lambda)/\lambda$ is bounded below by $-\Delta f(-1)$ and monotonically increasing in $\lambda \in [0,1]$, so a unique limit $f'(x; h) = \lim_{\lambda\to 0^+} \tfrac1\lambda\Delta f(\lambda)$ exists. This proves (i). Part (ii) follows immediately from monotonicity: $f(x+h) = f(x) + \tfrac{\Delta f(1)}{1} \ge f(x) + f'(x; h)$. Finally for (iii):
$$f(x + \lambda(h + \tilde h)) = f\big(\tfrac12(x + 2\lambda h) + \tfrac12(x + 2\lambda\tilde h)\big) \le \tfrac12 f(x + 2\lambda h) + \tfrac12 f(x + 2\lambda\tilde h).$$
Subtracting $f(x)$ and dividing by $\lambda$:
$$\frac{f(x + \lambda(h+\tilde h)) - f(x)}{\lambda} \le \frac{f(x + 2\lambda h) - f(x)}{2\lambda} + \frac{f(x + 2\lambda\tilde h) - f(x)}{2\lambda}$$
$\Rightarrow f'(x; h + \tilde h) \le f'(x; h) + f'(x; \tilde h)$, as asserted.
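A small numeric sketch of the monotonicity argument in the proof of Proposition 2.1 (not from the script; the convex test function and all names are chosen here): for a convex $f$ the difference quotients decrease monotonically as the step size shrinks, and their limit is the one-sided directional derivative.

```python
import numpy as np

f = lambda x: abs(x[0]) + x[1]**2          # convex, nonsmooth at x1 = 0
x = np.array([0.0, 1.0])
h = np.array([1.0, 1.0])

# difference quotients (f(x + s h) - f(x)) / s are nondecreasing in s
for s in [1.0, 0.5, 0.1, 0.01, 1e-4, 1e-6]:
    q = (f(x + s*h) - f(x)) / s
    print(f"s={s:8.0e}:  quotient = {q:.6f}")
# limit is f'(x; h) = |h1| + 2*x2*h2 = 1 + 2 = 3
```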

Definition.

(i) $S$ is called star-shaped with respect to $\bar x \in S$ if $x \in S \Rightarrow \alpha x + (1-\alpha)\bar x \in S$ for any $0 \le \alpha \le 1$.

(ii) $f : S \to \mathbb{R}$ is called convex at $\bar x$ if for all $x \in S$ the real function $\alpha \mapsto f(\alpha x + (1-\alpha)\bar x)$ is well defined and convex.

(iii) $h$ is a tangent of a star-shaped $S$ at $\bar x$ if $\bar x + \alpha h \in S$ for $0 < \alpha < \bar\alpha$.

Proposition 2.2 (Directional first order optimality conditions).

(i) $\bar x \in S$ can only be a (local) minimizer of $f$ on $S$ if all existing directional derivatives $f'(\bar x; h)$ along tangents $h$ of $S$ at $\bar x$ are nonnegative.

(ii) $\bar x$ must be a global minimizer of $f$ on star-shaped $S$ if $f$ is convex at $\bar x$ and all $f'(\bar x; h)$ with $h$ tangent exist and are nonnegative.

Proof. (i) For any tangent $h$ we have, by local minimality, $f(\bar x + \lambda h) - f(\bar x) \ge 0$ for small $\lambda > 0$ $\Rightarrow f'(\bar x; h) \ge 0$ if it exists.

(ii) The convexity of $f$ at $\bar x$ implies, as in the proof of Proposition 2.1, that $\tfrac1\lambda[f(\bar x + \lambda h) - f(\bar x)]$ is monotonically increasing with respect to $\lambda$. However, this difference quotient need not be bounded below, so existence has to be assumed. Then monotonicity implies that for $\lambda = 1$ and $h = x - \bar x$:
$$f(x) = f(\bar x + (x - \bar x)) \ge f(\bar x) + f'(\bar x; x - \bar x) \ge f(\bar x),$$
since $f'(\bar x; x - \bar x) \ge 0$ by assumption.

Warning: For a bivariate $f(x_1, x_2)$ one may have that $f(\lambda h)$ has a local minimum at $\lambda = 0$ for every $h \in \mathbb{R}^n \setminus \{0\}$ without $x = 0$ being a local minimizer (e.g. $f(x_1,x_2) = (x_2 - x_1^2)(x_2 - 2x_1^2)$).

Observation: The above first order optimality conditions require inspection of $f'(\bar x; h)$ for all $h \in \{h \in X : \|h\| = 1\}$, which is dimensionally only one magnitude below examining all differences $f(x) - f(\bar x)$. In the smooth finite dimensional case we look at $\nabla f(x) \in X$ instead.

2.2 Subdifferential

Definition. A linear functional $g \in X^*$ is called a subgradient of $f : S \to \mathbb{R}$ at $\bar x$ if $f(x) \ge f(\bar x) + \langle g, x - \bar x\rangle$ for all $x \in S$. The set of all such subgradients is denoted by $\partial f(\bar x)$ and called the subdifferential.

Proposition 2.3.

(i) For $f : S \to \mathbb{R}$ the condition $0 \in \partial f(\bar x)$ is necessary and sufficient for global optimality.

(ii) $\partial f(\bar x)$ is a convex, closed and even weak* closed subset of $X^*$.

(iii) For $f : S \to \mathbb{R}$ convex and $\bar x$ an interior point of $S$, the set $\partial f(\bar x)$ is nonempty, and furthermore bounded if $f$ is locally Lipschitz near $\bar x$.

Proof. Pre-remark: $X^* \ni x_k^* \rightharpoonup^* x^* \in X^* \iff \langle x_k^*, z\rangle \to \langle x^*, z\rangle$ for all $z \in X$. Hence weak* and weak convergence are equivalent if $X$, and hence $X^*$, is reflexive; $x_k^* \rightharpoonup x^* \Rightarrow \langle x_k^*, g\rangle \to \langle x^*, g\rangle$ for all $g \in X^{**} \supset X$.

Theorem of Alaoglu: Bounded sets in $X^*$ are weak* compact.

(i) Global optimality: $f(x) - f(\bar x) \ge 0$ for all $x \in S \iff f(x) - f(\bar x) \ge \langle 0, x - \bar x\rangle$ for all $x \in S$.

(ii) $g, \tilde g \in \partial f(\bar x) \Rightarrow \langle g, x - \bar x\rangle \le f(x) - f(\bar x) \ge \langle \tilde g, x - \bar x\rangle \Rightarrow \langle \alpha\tilde g + (1-\alpha)g, x - \bar x\rangle \le f(x) - f(\bar x)$, which is the convexity of $\partial f(\bar x)$. For (weak*) closedness, let $\langle g_k, x - \bar x\rangle \le f(x) - f(\bar x)$ for all $x \in X$ and $g_k \rightharpoonup^* g$; then
$$\lim_{k\to\infty} \langle g_k, x - \bar x\rangle = \langle g, x - \bar x\rangle \le f(x) - f(\bar x).$$

(iii) As shown in Proposition 2.1, $f'(\bar x; h)$ is for fixed $\bar x$ a sublinear function of $h \in X$. Hence there exists, by Hahn-Banach, a linear functional $g \in X^*$ such that $\langle g, h\rangle \le f'(\bar x; h)$ for all $h \in X$. This makes $g$ a subgradient, since $f'(\bar x; h) \le f(\bar x + h) - f(\bar x)$ by the monotonicity of the difference quotient. Furthermore, if $f$ has local Lipschitz constant $L$, we have
$$|f'(x; h)| = \Big|\lim_{\lambda \to 0} \frac{f(x + \lambda h) - f(x)}{\lambda}\Big| \le \frac{L\|\lambda h\|}{\lambda} = L\|h\|.$$
Then Hahn-Banach ensures that also $\|g\| \le L$. It remains to be shown (exercise) that not only one but all elements $g \in \partial f(\bar x)$ are bounded by $L$.

Corollary of (iii): Given $\partial f(\bar x)$, the directional derivatives $f'(\bar x; h)$ are uniquely characterized as the so-called support function $f'(\bar x; h) = \max_{g \in \partial f(\bar x)} \langle g, h\rangle$.

Proof. $f'(\bar x; h) \ge \langle g, h\rangle$ for all $g \in \partial f(\bar x)$ follows from the subgradient property. $f'(\bar x; h) = \langle g, h\rangle$ for some $g = g(h)$ follows from the version of Hahn-Banach where the linear functional $g$ is prescribed on one vector $h$ to coincide with the sublinear bound, $\langle g, h\rangle \equiv f'(\bar x; h)$.

Generalization to locally Lipschitz functions

Proposition 2.4. If $f : X \to \mathbb{R}$ has Lipschitz constant $L$ on some neighborhood of $\bar x$, then:

(i) There exists for each $h \in X$ a unique value
$$f^0(\bar x; h) = \limsup_{x \to \bar x;\ \lambda \searrow 0} \frac{f(x + \lambda h) - f(x)}{\lambda} \in [-L\|h\|,\ L\|h\|];$$

(ii) $f^0(\bar x; h)$ is sublinear with respect to $h$ and there exists (by Hahn-Banach) a nonempty generalized subdifferential $\partial^0 f(\bar x) = \{g \in X^* : \langle g, h\rangle \le f^0(\bar x; h)\}$;

(iii) For $\bar x$ to be a local unconstrained minimizer of $f$ in $S$ it is necessary that $0 \in \partial^0 f(\bar x)$.

Proof. (i) Existence of $f^0(\bar x; h)$ follows from boundedness. (ii) Homogeneity $f^0(\bar x; \rho h) = \rho f^0(\bar x; h)$ is also clear. For $h, \tilde h \in X$:
$$\tfrac1\lambda[f(x + \lambda(h + \tilde h)) - f(x)] = \tfrac1\lambda[f(x + \lambda\tilde h + \lambda h) - f(x + \lambda\tilde h)] + \tfrac1\lambda[f(x + \lambda\tilde h) - f(x)].$$
As $x \to \bar x$ and $\lambda \searrow 0$, also $x + \lambda\tilde h \to \bar x$, and we get
$$f^0(\bar x; h + \tilde h) = \limsup \tfrac1\lambda[f(x + \lambda(h+\tilde h)) - f(x)] \le \limsup \tfrac1\lambda[f(x + \lambda\tilde h + \lambda h) - f(x + \lambda\tilde h)] + f^0(\bar x; \tilde h) = f^0(\bar x; h) + f^0(\bar x; \tilde h).$$
Existence of a nonempty $\partial^0 f(\bar x)$ follows again by Hahn-Banach.

(iii) $0 \le \limsup_{\lambda \searrow 0} \tfrac1\lambda[f(\bar x + \lambda h) - f(\bar x)]$ (by optimality of $f(\bar x)$) $\le \limsup_{x\to\bar x;\ \lambda\searrow 0} \tfrac1\lambda[f(x + \lambda h) - f(x)] \equiv f^0(\bar x; h)$
$\Rightarrow 0 = \langle 0, h\rangle \le f^0(\bar x; h)$ for all $h \in X$ $\Rightarrow 0 \in \partial^0 f(\bar x)$.

Comment: The above condition is not at all sufficient, because for $f(x) = -|x|$ we have $0 \in \partial^0 f(0)$: with $h \in \{+1, -1\}$,
$$f^0(0; h) = \limsup_{x\to 0;\ \lambda\searrow 0} \frac{-|x + \lambda h| + |x|}{\lambda}.$$
If $h = 1$ and $x < 0$, $\lambda < |x|$:
$$f^0(0; 1) \ge \lim_{x \nearrow 0;\ \lambda < |x|} \frac{(x + \lambda) + (-x)}{\lambda} = 1,$$
and similarly $h = -1 \Rightarrow f^0(0; -1) = 1$, so $f^0(0; h) = |h| \ge \alpha h$ for any $\alpha \in [-1,1]$ $\Rightarrow \partial^0 f(0) = [-1,1] \ni 0$.

As in the convex case, there is a 1-1 relation between the convex set $\partial^0 f(x)$ and its support function $f^0(x; h) \equiv \max\{\langle g, h\rangle : g \in \partial^0 f(x)\}$.

Proposition 2.5. Suppose $f_1$ and $f_2$ are Lipschitz near $\bar x$; then $\partial^0(f_1 + f_2)(\bar x) \subset \partial^0 f_1(\bar x) + \partial^0 f_2(\bar x)$, where $\partial^0 f_1(\bar x) + \partial^0 f_2(\bar x) = \{g_1 + g_2 \in X^* : g_i \in \partial^0 f_i(\bar x) \text{ for } i = 1, 2\}$.

Lemma. Any weak* closed convex $C \subset X^*$ such that $f^0(\bar x; h) = \max\{\langle g, h\rangle : g \in C\}$ is equal to the set of all generalized gradients $\partial^0 f(\bar x)$.

Proof. If $g \in \partial^0(f_1 + f_2)(\bar x)$ then for $f = f_1 + f_2$,
$$\langle g, h\rangle \le f^0(\bar x; h) = \limsup_{x\to\bar x;\ \lambda\searrow 0} \tfrac1\lambda[f_1(x + \lambda h) + f_2(x + \lambda h) - f_1(x) - f_2(x)] \le f_1^0(\bar x; h) + f_2^0(\bar x; h) \quad\text{for all } h \in X.$$
The RHS is the support function of the convex set $\partial^0 f_1(\bar x) + \partial^0 f_2(\bar x)$: for any $h$ we can choose $g_i \in \partial^0 f_i(\bar x)$ such that $\langle g_i, h\rangle = f_i^0(\bar x; h)$ for $i = 1, 2$. Hence $g$ can be decomposed into the sum $g = g_1 + g_2$.

Difficulty: The inclusion is in general not an equality. Example: $f_1 = |x|$, $f_2 = -|x|$, $f = f_1 + f_2 = 0$:
$$\partial^0 f(0) = \{0\} \subset [-1,1] + [-1,1] = [-2,2].$$

Lemma. For Lipschitz continuous $f_1$ and $f_2$ one has $\partial^0(f_1 + f_2)(x) = \partial^0 f_1(x) + \partial^0 f_2(x)$ if $f_1$ or $f_2$ is strictly differentiable in the sense that
$$f^0(\bar x; h) = \lim_{x\to\bar x;\ \lambda\searrow 0} \frac{f(x + \lambda h) - f(x)}{\lambda} = \langle g, h\rangle$$
for a unique $g \in X^*$.

Generalization: For $n$ functions $f_i(x)$ we have
$$\partial^0\Big(\sum_{i=1}^n f_i\Big)(x) \subset \sum_{i=1}^n \partial^0 f_i(x),$$
with equality holding if at most one of the $f_i$'s is not strictly differentiable.

Proposition 2.6 (Chain Rule). Let $F = (F_i)_{i=1,\dots,n} : X \to \mathbb{R}^n$ and $\varphi : \mathbb{R}^n \to \mathbb{R}$, where the $F_i$ and $\varphi$ are locally Lipschitz near $\bar x \in X$ and $F(\bar x) \in \mathbb{R}^n$, respectively. Then
$$\partial^0(\varphi \circ F)(\bar x) \subset \overline{\operatorname{co}}\Big\{\sum_{i=1}^n \alpha_i g_i : g_i \in \partial^0 F_i(\bar x),\ (\alpha_1, \dots, \alpha_n) \in \partial^0\varphi(F(\bar x)) \subset \mathbb{R}^n \equiv (\mathbb{R}^n)^*\Big\}$$
where $\overline{\operatorname{co}} \equiv$ weak* closure of the convex hull.

Proof. See Clarke [Cla90].

Remark: Equality holds, for example, if $n = 1$ and $\varphi$ is strictly differentiable.

Example: $f(x,y) = \max\{x,y\} = \tfrac12(x + y + |x - y|)$. By the sum rule, $\partial^0 f(0,0) = \tfrac12(1,1) + \tfrac12\partial^0(|x-y|)(0,0)$. With $\varphi = |\cdot|$, $n = 1$, $\alpha_1 = \alpha \in [-1,1]$:
$$\partial^0 f(0,0) \subseteq \tfrac12(1,1) + \tfrac12\,\overline{\operatorname{co}}\{\alpha(1,-1) : \alpha \in [-1,1]\} = \{(0,1) + \alpha(1,-1) : \alpha \in [0,1]\} = \{(\alpha, 1-\alpha) : 0 \le \alpha \le 1\} = \partial^0 f(0,0).$$
Check the support property:
$$f^0((0,0); (\Delta x, \Delta y)) = \limsup_{(x,y)\to(0,0);\ \lambda\searrow 0} \frac{\max(x + \lambda\Delta x,\ y + \lambda\Delta y) - \max(x,y)}{\lambda} \ge \Delta x\,\alpha + \Delta y\,(1-\alpha) \quad\text{for } 0 \le \alpha \le 1.$$

Observation: $\partial^0(f_1 + f_2)$, $\partial^0 f_1$, $\partial^0 f_2$ and $\partial^0 f_1 + \partial^0 f_2$ are bounded, convex and closed, and thus weak* compact subsets of $X^*$.
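A numeric sketch of the support-function identity for this example (not from the script; the random sampling scheme and all names are assumptions made here): sampling the Clarke difference quotient of $f(x,y) = \max(x,y)$ near the origin reproduces $f^0((0,0); h) = \max_{g \in \partial^0 f(0,0)} \langle g, h\rangle = \max(\Delta x, \Delta y)$.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x, y: max(x, y)

def clarke_dd(dx, dy, samples=20000):
    """Approximate f^0((0,0); (dx,dy)) = limsup over x -> 0, lam -> 0
    of [f(x + lam*h) - f(x)] / lam by random sampling."""
    best = -np.inf
    for _ in range(samples):
        x, y = rng.normal(scale=1e-6, size=2)   # base point near the origin
        lam = 10.0 ** rng.uniform(-9, -8)       # step much smaller than |x - y|
        q = (f(x + lam*dx, y + lam*dy) - f(x, y)) / lam
        best = max(best, q)
    return best

for h in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0), (-2.0, 1.0)]:
    # support function of the segment {(a, 1-a) : 0 <= a <= 1} is max(dx, dy)
    print(h, " f0 =", round(clarke_dd(*h), 3), "  max(dx,dy) =", max(h))
```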

Proposition 2.7. For any such weak* compact convex $G, \tilde G \subset X^*$ we have (with sublinear $\sigma : X \to \mathbb{R}$ and $G(\sigma) = \{g \in X^* : \langle g, h\rangle \le \sigma(h) \text{ for } h \in X\}$):

(i) $\sigma_G(h) \equiv \sup_{g\in G}\langle g, h\rangle = \max_{g\in G}\langle g, h\rangle$, with $\sigma_G(h)$ being sublinear;

(ii) $G \subset \tilde G \iff \sigma_G(h) \le \sigma_{\tilde G}(h)$ for $h \in X$, i.e. there is a 1-1 relation between the $G$'s and the associated support functions;

(iii) Denoting the association of $G$ with $\sigma$ by $G(\sigma)$, we have $G(\sigma_1 + \sigma_2) = G(\sigma_1) + G(\sigma_2)$.

Proof.

(i) Boundedness of $G$ by $\gamma = \sup\{\langle g, h\rangle/\|h\| : g \in G,\ 0 \ne h \in X\}$ implies $\sigma_G(h) \le |\langle g, h\rangle| \le \gamma\|h\|$. Sublinearity follows simply, and $\sup \equiv \max$ by Hahn-Banach.

(ii) "$\Rightarrow$": immediate by definition. "$\Leftarrow$": proof under construction (assertion in Clarke [Cla90]).

(iii) We have $g_1 \le \sigma_1$ and $g_2 \le \sigma_2 \Rightarrow g_1 + g_2 \le \sigma_1 + \sigma_2 \Rightarrow G(\sigma_1) + G(\sigma_2) \subseteq G(\sigma_1 + \sigma_2)$. Conversely,
$$\sigma_{G(\sigma_1)+G(\sigma_2)}(h) = \sup_{g \in G(\sigma_1)+G(\sigma_2)}\langle g, h\rangle = \sup_{g_i \in G(\sigma_i)}\{\langle g_1, h\rangle + \langle g_2, h\rangle\} = \sigma_1(h) + \sigma_2(h) \Rightarrow G(\sigma_1 + \sigma_2) \subseteq G(\sigma_1) + G(\sigma_2).$$

Mean Value Theorem

Proposition 2.8. Suppose $f$ has Lipschitz constant $L$ on an open neighborhood of a line segment $[x,y] = \{x_\alpha = x(1-\alpha) + \alpha y : 0 \le \alpha \le 1\}$. Then there exists a $z = x_{\tilde\alpha}$ with $0 \le \tilde\alpha \le 1$ such that
$$f(y) - f(x) \in \langle \partial^0 f(z),\ y - x\rangle.$$

Proof. Consider the Lipschitz function $\varphi(\alpha) = f(x_\alpha) - f(x)(1-\alpha) - \alpha f(y)$, so that $\varphi(0) = f(x) - f(x) = 0 = \varphi(1) = f(y) - f(y)$. Hence there exists a stationary point $\tilde\alpha \in (0,1)$ of $\varphi(\alpha)$, where necessarily either $\varphi$ or $-\varphi$ has a minimum, and thus
$$0 \in \partial^0\varphi(\tilde\alpha) \subset \langle \partial^0 f(x_{\tilde\alpha}),\ y - x\rangle + f(x) - f(y) \;\Rightarrow\; f(y) - f(x) \in \langle \partial^0 f(x_{\tilde\alpha}),\ y - x\rangle.$$
Here, for $\psi(\alpha) = f(x_\alpha) = f(x(1-\alpha) + \alpha y)$,
$$\psi^0(\alpha; h) = \limsup_{\hat\alpha\to\alpha;\ \lambda\searrow 0} \frac{\psi(\hat\alpha + \lambda h) - \psi(\hat\alpha)}{\lambda} = \limsup_{\hat\alpha\to\alpha;\ \lambda\searrow 0} \frac{f(x_{\hat\alpha} + \lambda h(y-x)) - f(x_{\hat\alpha})}{\lambda} \le \limsup_{z\to x_\alpha;\ \lambda\searrow 0} \frac{f(z + \lambda h(y-x)) - f(z)}{\lambda} = f^0(x_\alpha; h(y-x)),$$
so that $\partial^0\psi(\alpha) \subset \langle \partial^0 f(x_\alpha),\ y - x\rangle$.

Application: Characterization of convexity via monotonicity

Proposition 2.9. If $f$ is Lipschitz on a convex neighborhood $\mathcal N$ of $\bar x$ then it is convex if and only if $\partial^0 f(x)$ is monotone on $\mathcal N$, in the sense that $x, \tilde x \in \mathcal N$, $g \in \partial^0 f(x)$, $\tilde g \in \partial^0 f(\tilde x)$ $\Rightarrow \langle \tilde g - g,\ \tilde x - x\rangle \ge 0$.

Proof. (W.l.o.g. $x \ne \tilde x$.) Convexity implies $f(\tilde x) - f(x) \ge \langle g, \tilde x - x\rangle$. At the same time, $f(x) - f(\tilde x) \ge \langle \tilde g, x - \tilde x\rangle = \langle -\tilde g, \tilde x - x\rangle$. Adding we get $0 \ge \langle g - \tilde g, \tilde x - x\rangle \iff$ monotonicity.

Conversely, using the mean value theorem at suitable intermediate points $z, \tilde z$ on the segment (with $\tilde z - z = \tfrac1\gamma(\tilde x - x)$ for some $\gamma > 0$, $g \in \partial^0 f(z)$, $\tilde g \in \partial^0 f(\tilde z)$):
$$f(x(1-\alpha) + \alpha\tilde x) - f(x)(1-\alpha) - \alpha f(\tilde x) = (1-\alpha)[f(x(1-\alpha)+\alpha\tilde x) - f(x)] + \alpha[f(x(1-\alpha)+\alpha\tilde x) - f(\tilde x)]$$
$$= (1-\alpha)\langle g, \alpha(\tilde x - x)\rangle - \alpha\langle \tilde g, (1-\alpha)(\tilde x - x)\rangle = \frac{(1-\alpha)\alpha}{\gamma}\,\langle g - \tilde g,\ \tilde z - z\rangle \le 0 \quad\text{since } \gamma > 0,\ \alpha \in (0,1),$$
which is convexity.

Corollary. Under the assumptions of Proposition 2.9, the monotonicity of $\partial^0 f(x)$ on $\mathcal N \ni \bar x$ together with $0 \in \partial^0 f(\bar x)$ is sufficient for $\bar x$ to be an unconstrained local minimum of $f$.

Proof. As in Proposition 2.4, using convexity.

Question. Can we somehow express $\partial^0 f(x)$ in terms of "conventional" derivatives? Answer: Yes, in $X = \mathbb{R}^n$, using Rademacher's theorem and forming the convex upper semicontinuous hull.

Proposition 2.10. Suppose $f$ is Lipschitz near $\bar x$. Then $\partial^0 f(\bar x)$ is a singleton $\{g\}$ with $g \in X^*$ if and only if $f$ is strictly differentiable, i.e.
$$\lim_{x\to\bar x;\ \lambda\searrow 0} \frac{f(x + \lambda h) - f(x)}{\lambda} = \langle g, h\rangle$$
for some $g \in X^*$ and all $h \in X$.

Proof. "$\Leftarrow$" (Exercise). "$\Rightarrow$": For any $h$: $f^0(\bar x; h) = \langle g, h\rangle$ for some $g \in \partial^0 f(\bar x)$, hence for the unique $g$ with $\{g\} = \partial^0 f(\bar x)$. But $f^0$ is merely a limsup, so consider
$$\liminf_{x\to\bar x;\ \lambda\searrow 0} \tfrac1\lambda[f(x + \lambda h) - f(x)] = -\limsup_{x\to\bar x;\ \lambda\searrow 0} \tfrac1\lambda[f(x + \lambda h - \lambda h) - f(x + \lambda h)]$$
$$= -\limsup_{\tilde x\to\bar x;\ \lambda\searrow 0} \tfrac1\lambda[f(\tilde x + \lambda(-h)) - f(\tilde x)] = -f^0(\bar x; -h) = -\langle g, -h\rangle = \langle g, h\rangle$$
$\Rightarrow \liminf = \limsup = \lim$ $\Rightarrow$ strict differentiability.

Proposition 2.11. $\partial^0 f(x) = \{g\} \iff f$ strictly differentiable $\Longleftarrow$ continuous differentiability.

Example:
$$f(x) = \begin{cases} x^2\sin\tfrac1x & \text{if } x \ne 0\\ 0 & \text{if } x = 0.\end{cases}$$
$f$ is differentiable at $x = 0$ with $f'(0) = 0$, but not strictly differentiable there.

Observation: If $f$ is differentiable then $\nabla f(x) \in \partial^0 f(x)$.
Question: How do we characterize or compute the other elements of $\partial^0 f(\bar x)$ where $f$ is not strictly differentiable?

Consider the example above with $h = \pm 1$:
$$f^0(0; 1) = \limsup_{x\to 0;\ \lambda\searrow 0} \frac{f(x+\lambda) - f(x)}{\lambda} = \limsup_{x\to 0} f'(x) = \limsup_{x\to 0}\big(2x\sin\tfrac1x - \cos\tfrac1x\big) = \limsup_{x\to 0}\big(-\cos\tfrac1x\big) = 1,$$
$$f^0(0; -1) = \limsup_{x\to 0}\,\cos\tfrac1x = 1.$$
Thus $f^0(0; h) = |h|$, the same as for $f(x) = |x|$ $\Rightarrow \partial^0 f(0) = [-1,1]$, because $g \cdot h \le |h|$ for all $h \iff |g| \le 1$.
Notice that here $\partial^0 f(0) = [-1,1] = \{\lim_{i\to\infty} f'(x_i) : x_i \to 0\}$, the set of all cluster points of sequences $f'(x_i)$ for $x_i \to 0 = \bar x$.

Rademacher Theorem. A locally Lipschitz $f : S \subseteq X = \mathbb{R}^n \to \mathbb{R}$ has a set of exceptional points $S_f \subset S$ of Lebesgue measure zero such that $\nabla f(x) = \big(\partial f/\partial x_i\big)_{i=1,\dots,n}$ exists at all $x \notin S_f$; i.e. every locally Lipschitz function is almost everywhere differentiable on $\mathbb{R}^n$.

Proof. See Federer [Fed96].

[Figure: a piecewise linear zigzag function $f$ on $[0,1]$ oscillating about the line $f(x) = x/2$]

Example: The zigzag function $f$ in the figure has Lipschitz constant 2 and is differentiable except at a countable set of points. The directional derivative $f'(0; 1)$ does not exist, but $f^0(0; 1) = 2$. Depending on how $f$ is extended to $x \le 0$:
$$f^0(0; -1) = \begin{cases} 0 & \text{if } f \text{ is odd}\\ 2 & \text{if } f \text{ is even}\\ 0 & \text{if } f(x) = 0 \text{ for } x \le 0.\end{cases}$$
Generalized derivative:
$$\partial^0 f(0) = \begin{cases} [0,2] & \text{if } f \text{ is odd}\\ [-2,2] & \text{if } f \text{ is even}\\ \operatorname{conv}[0,2] = [0,2] & \text{otherwise.}\end{cases}$$

Proposition 2.12. Let $f$ be Lipschitz continuous near $\bar x$ and let $\tilde S$ be of measure 0. Then
$$\partial f(\bar x) = \operatorname{conv}\Big\{\lim_{k\to\infty} \nabla f(x_k) : x_k \to \bar x \text{ and } x_k \notin S_f \cup \tilde S\Big\}$$
$\equiv$ the convex hull of all cluster points of conventional gradients at points outside $S_f \cup \tilde S$ near $\bar x$.

Proof. "$\supset$": Consider $g = \lim_{k\to\infty} \nabla f(x_k)$ with $\lim x_k = \bar x$, and any $h \in \mathbb{R}^n = X$. Pick multipliers $\lambda_k \in (0, 1/k)$ such that
$$\frac{f(x_k + \lambda_k h) - f(x_k)}{\lambda_k} \ge \nabla f(x_k)^\top h - \frac1k.$$
Applying limsup we get $f^0(\bar x; h) \ge g^\top h$ $\Rightarrow$ $g$ is a generalized gradient $\iff g \in \partial^0 f(\bar x)$. Convexification follows immediately.

"$\subset$": This direction is left as an exercise.
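A minimal gradient-sampling sketch of Proposition 2.12 (not from the script; it only evaluates the classical derivative at sample points chosen here): for $f(x) = x^2\sin(1/x)$, the derivatives $f'(x_k) = 2x_k\sin(1/x_k) - \cos(1/x_k)$ at points $x_k \to 0$ have cluster points filling $[-1,1]$, whose convex hull is $\partial^0 f(0)$.

```python
import numpy as np

def fprime(x):
    # conventional derivative of f(x) = x^2 sin(1/x) away from 0
    return 2*x*np.sin(1/x) - np.cos(1/x)

xs = np.geomspace(1e-8, 1e-2, 200000)   # sample points x_k -> 0 (all outside S_f)
vals = fprime(xs)
print("range of sampled f'(x_k):", vals.min(), vals.max())
# cluster points of f'(x_k) as x_k -> 0 fill [-1, 1], so conv = [-1, 1] = d0 f(0)
```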

Example. $f(x_1, x_2) = \max(x_1, x_2)$, which is Lipschitz on $\mathbb{R}^2$ with constant 1. It is differentiable everywhere except where $x_1 = x_2$:
$$\nabla f(x_1, x_2) = \begin{cases} (1,0) & \text{if } x_1 > x_2\\ (0,1) & \text{if } x_1 < x_2\end{cases}$$
$$\partial^0 f(2,2) = \operatorname{conv}\{(1,0), (0,1)\} = \{(1-\alpha, \alpha) : 0 \le \alpha \le 1\}$$
$$f^0((2,2), (\Delta x_1, \Delta x_2)) = \max_{g \in \partial^0 f(2,2)}(\Delta x_1 g_1 + \Delta x_2 g_2) = \max_{0\le\alpha\le1}\big(\Delta x_1(1-\alpha) + \alpha\Delta x_2\big) = \max(\Delta x_1, \Delta x_2).$$

Corollary. If $X \subset \mathbb{R}^n$ and $f$ is Lipschitz on $X$, then the multifunction $\partial^0 f : \mathcal N \rightrightarrows X^*$ is upper semicontinuous, in the sense that $g_k \in \partial^0 f(x_k)$, $x_k \to \bar x$, $g_k \to g$ $\Rightarrow g \in \partial^0 f(\bar x)$.

Proof. By diagonalization. Suppose $g_i = \lim_{j\to\infty} g_{ij}$ with $g_{ij} = \nabla f(x_{ij})$ and $\lim_{j\to\infty} x_{ij} = x_i$. Take $\hat\jmath = \hat\jmath(i)$ such that $\|x_{i\hat\jmath} - x_i\| < \tfrac14\|x_i - \bar x\|$ and $\|g_{i\hat\jmath} - g_i\| < \tfrac14\|g_i - g\|$; then $\lim_{i\to\infty} x_{i\hat\jmath(i)} = \bar x$ and $\lim_{i\to\infty} g_{i\hat\jmath(i)} = g \Rightarrow g \in \partial^0 f(\bar x)$.
The result can be extended to convex combinations $g_i = \sum_k g_i^{(k)}\alpha_i^{(k)}$ with $\sum_k \alpha_i^{(k)} = 1$. Here $\dim(X) < \infty$ is important.

Proposition 2.13 (Griewank, Jongen, Mankwong [GJK91]). Let $f : D \subset \mathbb{R}^n \to \mathbb{R}$, $D$ open, be uniformly Lipschitz on an open neighborhood of $\mathcal L := \{x \in D : f(x) \le \mu + g^\top x\}$, $\mu \in \mathbb{R}$, $g \in \mathbb{R}^n$, and assume that $f(x) = \mu + g^\top x$ implies $g \notin \partial^0 f(x)$. Then the following statements are equivalent:

(i) $\mathcal L$ is convex and $f$ is strictly convex in the interior $\mathcal L^0$ of $\mathcal L$, so that for all $x \ne y$, $x, y \in \mathcal L^0$, and $\alpha \in (0,1)$ we have $f(x(1-\alpha) + \alpha y) < (1-\alpha)f(x) + \alpha f(y)$.

(ii) The generalized differential is injective in that for $x, y \in \mathcal L^0$ we have $\partial^0 f(x) \cap \partial^0 f(y) \ne \emptyset \Rightarrow x = y$.

(iii) The tangent plane defined by $\tilde g \in \partial f(x)$ at $x \in \mathcal L^0$ is strictly supporting, in that for all $y \in \mathcal L \setminus \{x\}$: $f(y) > f(x) + \tilde g^\top(y - x)$.

Proof. Exercise.

2.3 Generalized Jacobians and Hessians

Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be locally Lipschitz. Rademacher's theorem implies that the Jacobian $F'(x) \in \mathbb{R}^{m\times n}$ exists at all points $x \in \Omega_F = \bigcap_{i=1}^m \Omega_{F_i}$, with the complement $\mathbb{R}^n \setminus \Omega_F$ still of measure zero. Hence we have the generalized Jacobian
$$\partial^0 F(\bar x) = \operatorname{conv}\Big\{\lim_{k\to\infty} F'(x_k) : x_k \to \bar x,\ x_k \in \Omega_F\Big\}.$$
Properties of the generalized Jacobian for Lipschitz $F$:

(i) $\partial^0 F(x)$ is nonempty, convex and compact at $x$.

(ii) $\partial^0 F : \mathbb{R}^n \rightrightarrows \mathbb{R}^{m\times n}$ is upper semicontinuous.

(iii) $F$ continuously differentiable $\Rightarrow \partial^0 F(x) = \{F'(x)\}$.

(iv) $\partial^0 F(x) \subseteq \partial^0 F_1(x) \times \partial^0 F_2(x) \times \dots \times \partial^0 F_m(x)$.

Example for (iv):

$$F(x) = \binom{|x|}{-|x|} : \mathbb{R}^1 \to \mathbb{R}^2, \qquad F'(x) = \begin{cases} \binom{1}{-1} & \text{if } x > 0\\[2pt] \binom{-1}{1} & \text{if } x < 0\end{cases}$$
$$\Rightarrow\ \partial^0 F(0) = \Big\{\alpha\binom{1}{-1} + (1-\alpha)\binom{-1}{1} = \binom{2\alpha-1}{1-2\alpha} : \alpha \in [0,1]\Big\}$$
$$\subseteq\ \partial^0 F_1(0) \times \partial^0 F_2(0) = \operatorname{conv}\Big\{\binom{-1}{1}, \binom{-1}{-1}, \binom{1}{1}, \binom{1}{-1}\Big\} = \Big\{\binom{\alpha}{\beta} : \alpha \in [-1,1],\ \beta \in [-1,1]\Big\},$$
so the inclusion in (iv) is strict here.

Chain Rule. Let both $F : D \subset \mathbb{R}^n \to \mathbb{R}^m$ and $G : E \subset \mathbb{R}^m \to \mathbb{R}^p$ be locally Lipschitz. We have
$$\partial^0(G \circ F)(x) \subseteq \operatorname{conv}\{NM : M \in \partial^0 F(x) \text{ and } N \in \partial^0 G(F(x))\}$$
where $NM$ denotes matrix multiplication.

Mean Value Theorem. If $F : D \subset \mathbb{R}^n \to \mathbb{R}^m$ is locally Lipschitz on the convex domain $D$ and $x, y \in D$, then
$$F(y) - F(x) \in \operatorname{conv}\{Z(y - x) : Z \in \partial^0 F(z),\ z = (1-\alpha)x + \alpha y,\ \alpha \in [0,1]\}.$$

Corollary. If $m = n$ and for some $z \in D^0$ all $M \in \partial^0 F(z)$ are nonsingular, then for all $x, y$ in some neighborhood $\mathcal U$ of $z$: $x \ne y \Rightarrow F(x) \ne F(y)$. Moreover there exist a neighborhood $\mathcal V \subseteq \mathbb{R}^n$ of $F(z)$ and a Lipschitz function $G : \mathcal V \to \mathbb{R}^n$ such that
$$G \circ F(x) = x \text{ for all } x \in \mathcal U \quad\text{and}\quad F \circ G(y) = y \text{ for all } y \in \mathcal V.$$

Remark. There is also a corresponding implicit function theorem.

Definition. If $F = \nabla f$ for some $C^{1,1}$ function $f : \mathbb{R}^n \to \mathbb{R}$, then
$$\partial^2 f(x) = \partial^0\nabla f(x) = \operatorname{conv}\Big\{\lim_{k\to\infty} \nabla^2 f(x_k) : x_k \to x,\ x_k \in \Omega_{\nabla f}\Big\}.$$
Notation: We write $\nabla^2 f(x)$ to denote the proper Hessian and $\partial^2 f(x)$ for the generalized Hessian.

Proposition 2.14. If $\nabla f(x)$ is Lipschitz continuous on some neighborhood of $[x,y]$ then
$$f(y) - f(x) = \nabla f(x)^\top(y - x) + \tfrac12(y - x)^\top H_z (y - x)$$
where $H_z \in \partial^2 f(z)$ for some $z = (1-\alpha)x + \alpha y$, $0 < \alpha < 1$.

Proof. See Hiriart-Urruty/Lemaréchal [HUL01].

Corollary 2.15. Under the assumptions of Proposition 2.14, the condition that $\nabla f(\bar x) = 0$ and $\partial^2 f(\bar x)$ contains only positive definite matrices is sufficient for $\bar x$ to be a local minimizer of $f$.

Proof. Because $\partial^2 f$ is a generalized Jacobian and thus upper semicontinuous, all generalized Hessians $\partial^2 f(x)$ for $x$ in some neighborhood $\mathcal U$ of $\bar x$ also contain only positive definite matrices. Otherwise there would be a sequence $H_k \in \partial^2 f(x_k)$ with $x_k \to \bar x$ and $H_k$ indefinite; then any cluster point $H = \lim_{j\to\infty} H_{k_j} \in \partial^2 f(\bar x)$ would also fail to be positive definite. Note that the positive definite matrices form an open subset of the linear space of symmetric $n\times n$ matrices. Local optimality of $f$ on $\mathcal U$ follows from the mean value theorem.

Proposition 2.16 (Saddle point property). Suppose $\nabla f(\bar x) = 0$ and all elements of $\partial^2 f(\bar x)$ are nonsingular and indefinite. Then $\bar x$ is neither a local minimizer nor a local maximizer of $f$.

Proof. See Vogel and Griewank [GV].

Proposition 2.17. At any local minimizer $\bar x$ that does not satisfy the second order sufficient condition of Corollary 2.15, the generalized Hessian $\partial^2 f(\bar x)$ must contain at least one singular or positive definite matrix as well as indefinite matrices.

3 Fréchet and Gâteaux Differentiability

Definition. Let $F : X \to Y$ where $X$ and $Y$ are Banach spaces.

(i) $F$ is said to be Gâteaux differentiable at $x \in X$ if for some bounded linear operator $F'(x) \in B(X,Y)$ and all $h \in X$
$$\lim_{\lambda \to 0} \tfrac1\lambda[F(x + \lambda h) - F(x)] = F'(x)\cdot h.$$

(ii) $F$ is called Fréchet differentiable at $x$ if
$$\lim_{\|h\| \to 0} \frac{1}{\|h\|}\big\|F(x + h) - F(x) - F'(x)\cdot h\big\| = 0.$$

Remark. (ii) $\Rightarrow$ (i), while (i) $\Rightarrow$ (ii) only if $\dim(X) < \infty$. The following example highlights the difference: let $F : X \equiv L^2[0,1] \to Y = X$ be given by $F(x(t)) = \sin(x(t))$.

Proposition 3.1. $F$ is uniformly Lipschitz continuous with Lipschitz constant $L = 1$ and $F'(x)h = \cos(x(t))h(t)$; it is everywhere Gâteaux differentiable but nowhere Fréchet differentiable.

Proof.

1) Lipschitz continuity:
$$\|F(x) - F(y)\|_{L^2}^2 = \int_0^1 [\sin(x(t)) - \sin(y(t))]^2\,dt \le \int_0^1 (x(t) - y(t))^2\,dt = 1\cdot\|x - y\|_{L^2}^2.$$

2) Gâteaux differentiability: we must show
$$\lim_{\lambda\to 0} \Big\|\frac{\sin(x(t) + \lambda h(t)) - \sin(x(t))}{\lambda} - \cos(x(t))h(t)\Big\|^2 \overset{?}{=} 0.$$
By the pointwise mean value theorem there is $\tilde h(t)$ between $0$ and $h(t)$, $|\tilde h(t)| \le |h(t)|$, such that the expression equals
$$\lim_{\lambda\to 0} \frac{1}{\lambda^2}\int_0^1 \big[\cos(x(t) + \lambda\tilde h(t))\lambda h(t) - \cos(x(t))\lambda h(t)\big]^2 dt = \lim_{\lambda\to 0}\int_0^1 \big[\cos(x(t) + \lambda\tilde h(t)) - \cos(x(t))\big]^2 h^2(t)\,dt$$
$$\le \lim_{\lambda\to 0}\Big(\int_{\{|h|\le\Delta\}} \lambda^2 h^4(t)\,dt + \int_{\{|h|\ge\Delta\}} 4h^2(t)\,dt\Big) \le \lim_{\lambda\to 0}\lambda^2\Delta^2\|h\|^2 + 4\Big[\|h\|_{L^2}^2 - \int_{\{|h|\le\Delta\}} h^2(t)\,dt\Big]. \quad (*)$$
By dominated convergence, $\lim_{\Delta\to\infty}\int_{\{|h|\le\Delta\}} h^2(t)\,dt = \|h\|^2$. Hence the limit is indeed zero, i.e. $F$ is Gâteaux differentiable at the given arbitrary $x(t)$.

3) Nowhere Fréchet differentiable: Assume $x = 0$, so $\cos(x(t)) = 1$, and consider the perturbations $h_n(t) = 2n\pi$ for $t \in [0, n^{-3}]$ and $h_n(t) = 0$ outside. Thus $\sin(x(t) + h_n(t)) = 0$ and $\cos(x(t) + h_n(t)) = 1$, and
$$\int_0^1 h_n^2(t)\,dt = \int_0^{n^{-3}} 4n^2\pi^2\,dt = \frac{4\pi^2}{n} \;\Rightarrow\; \|h_n\|_{L^2} = \frac{2\pi}{\sqrt n} \underset{n\to\infty}{\longrightarrow} 0.$$
We would need
$$\frac{1}{\|h_n\|^2}\int_0^{n^{-3}} \big[\sin(x(t)+h_n(t)) - \sin(x(t)) - \cos(x(t))h_n(t)\big]^2 dt \overset{?}{\longrightarrow} 0.$$
But
$$\frac{n}{4\pi^2}\int_0^1 \cos^2(x(t))h_n^2(t)\,dt = \frac{n}{4\pi^2}\int_0^1 h_n^2(t)\,dt = \frac{n\|h_n\|_{L^2}^2}{4\pi^2} = 1,$$
i.e. the relative remainder $\|F(h_n) - F(0) - F'(0)h_n\|/\|h_n\| = \|h_n\|/\|h_n\| = 1$ does not tend to 0.
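A discretized numeric sketch of part 3) (a simple grid approximation of $L^2[0,1]$, assumed here; not from the script): the relative remainder $\|\sin(h_n) - h_n\|/\|h_n\|$ stays at 1 even though $\|h_n\|_{L^2} = 2\pi/\sqrt n \to 0$.

```python
import numpy as np

N = 2_000_000
t = np.linspace(0.0, 1.0, N, endpoint=False)
dt = 1.0 / N
l2 = lambda v: np.sqrt(np.sum(v**2) * dt)        # Riemann-sum L2 norm

for n in [2, 5, 10, 50]:
    h = np.where(t < n**-3, 2*np.pi*n, 0.0)      # h_n = 2*pi*n on [0, n^-3]
    # at x = 0: F(x) = sin(0) = 0 and F'(0)h = cos(0)*h = h
    remainder = np.sin(h) - h                    # sin(2*pi*n) = 0, so remainder = -h
    print(f"n={n:3d}: ||h_n||_L2 = {l2(h):.4f},  remainder ratio = {l2(remainder)/l2(h):.6f}")
```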

Conclusion. Uniformly Lipschitz continuous mappings from a (the) separable Hilbert space into itself may not be Fr´echet differentiable anywhere.

Proposition 3.2 (Preiss 1990, [Pre90]). If $X^*$ is separable, then any Lipschitz continuous function $f : X \to \mathbb{R}$ is Fréchet differentiable on a dense set of points in $X$.

Lemma. Where $F : X \to Y$ is Fréchet differentiable, it must be locally Lipschitz continuous.

Proof. Exercise.

Proposition 3.3 (First order optimality, Gâteaux differentiable case). If $f : N \subset X \to \mathbb{R}$ is Gâteaux differentiable at some $\bar x \in N$ open, then $\bar x$ can only be a local minimizer if $f'(\bar x) = 0 \in X^* \iff f'(\bar x)h = 0$ for all $h \in X$.

Proof. By Proposition 2.2 we must have $f'(\bar x; h) = f'(\bar x)h \ge 0$ and also $f'(\bar x)(-h) = -f'(\bar x)h \ge 0$ $\Rightarrow f'(\bar x)h = 0$ $\Rightarrow f'(\bar x) = 0 \in X^*$.

Application to variational problems:

$$\min f(x) \equiv \int_a^b \varphi(x(t), \dot x(t), t)\,dt \qquad\text{where } x(a) = x_a \in \mathbb{R}^n,\ x(b) = x_b \in \mathbb{R}^n.$$
W.l.o.g. take $a = 0$, $b = 1$, $x(0) = 0 = x(1)$; otherwise set $\tilde x(t) = x(t) - (1-t)x_a - t x_b$, $\dot{\tilde x}(t) = \dot x(t) + (x_a - x_b)$.
$\Rightarrow x \in C_0^1([0,1], \mathbb{R}^n)$, once continuously differentiable with homogeneous boundary conditions. Note
$$\|x\| = \max_{0\le t\le 1}\|x(t)\|_2 \;\Rightarrow\; \|x(t)\|_2 = \Big\|\int_0^t \dot x(\tau)\,d\tau\Big\| \le \int_0^t \|\dot x(\tau)\|\,d\tau \le t\max_{0\le\tau\le 1}\|\dot x(\tau)\|_2 \le \|x\|_{C^1[0,1]}.$$
Also assume autonomy, i.e. $\varphi(x, \dot x, t) \equiv \varphi(x, \dot x)$ with $\varphi : \mathbb{R}^{2n} \to \mathbb{R}$ and $\nabla\varphi = (\varphi_x, \varphi_{\dot x}) : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ with global Lipschitz constant $L \ge 0$, so $\varphi \in C^{1,1}(\mathbb{R}^{2n})$.

Lemma. $f : C_0^1([0,1], \mathbb{R}^n) \to \mathbb{R}$ is Fréchet differentiable with gradient
$$f'(x)h = \int_0^1 \varphi_x(x(t), \dot x(t))h(t) + \varphi_{\dot x}(x(t), \dot x(t))\dot h(t)\,dt.$$
Proof.

$$|f(x+h) - f(x) - f'(x)h| = \Big|\int_0^1 \varphi(x(t)+h(t), \dot x(t)+\dot h(t)) - \varphi(x(t),\dot x(t)) - \varphi_x(x(t),\dot x(t))h(t) - \varphi_{\dot x}(x(t),\dot x(t))\dot h(t)\,dt\Big|$$
$$\le \int_0^1 \tfrac12 L\big(\|h(t)\|^2 + \|\dot h(t)\|^2\big)\,dt \le L\,\|h\|^2_{C_0^1[0,1]} = L\|h\|^2 \quad\text{for } h \in C_0^1[0,1].$$

The condition for a classical solution $x \in C_0^1[0,1]$ minimizing $f$ is that for all $h \in C_0^1$
$$\int_0^1 \big[\varphi_x(x(t),\dot x(t))h(t) + \varphi_{\dot x}(x(t),\dot x(t))\dot h(t)\big]\,dt = 0.$$
Assuming $\varphi_{\dot x}(x(t),\dot x(t))$ is differentiable with respect to $t$, we obtain, integrating by parts,
$$\int_0^1 \Big[\varphi_x(x(t),\dot x(t)) - \tfrac{d}{dt}\varphi_{\dot x}(x(t),\dot x(t))\Big]h(t)\,dt = -\varphi_{\dot x}(x(t),\dot x(t))h(t)\Big|_0^1 = 0.$$

Tentative consequence, the Euler-Lagrange equation:
$$\varphi_x(x(t), \dot x(t)) = \frac{d}{dt}\varphi_{\dot x}(x(t), \dot x(t)) \quad\text{for } 0 \le t \le 1.$$
This is the first order optimality condition for $f$ at $x \in C_0^1$.

Question: Does $\frac{d}{dt}\varphi_{\dot x}(x(t), \dot x(t))$ really exist? Answer: Yes, as shown below.

Proposition 3.4. If $u, v \in C^0([0,1], \mathbb{R}^n)$ and for any $h \in C_0^1[0,1]$
$$\int_0^1 \big(u(t)^\top h(t) + v(t)^\top \dot h(t)\big)\,dt = 0,$$
then
$$v(t) = v(0) + \int_0^t u(\tau)\,d\tau$$
and thus $\dot v = u \in C^0[0,1]$.

Proof.

$$\int_0^1 \big[u(t)^\top h(t) + v(t)^\top\dot h(t)\big]\,dt = 0 \quad\text{for } h \in C_0^1([0,1], \mathbb{R}^n),\ h(0) = 0 = h(1)$$
$$= \int_0^1 (v(t) - U(t))^\top\dot h(t)\,dt \quad\text{for } U(t) = \int_0^t u(\tau)\,d\tau \quad\text{(integrating the first term by parts)}$$
$$= \int_0^1 (v(t) - U(t) - c)^\top\dot h(t)\,dt = 0, \quad\text{since } c^\top\int_0^1 \dot h(t)\,dt = c^\top(h(1) - h(0)) = 0 \text{ for any } c \in \mathbb{R}^n.$$
Pick
$$c = \int_0^1 (v(t) - U(t))\,dt \quad\text{such that}\quad \int_0^1 (v(t) - U(t) - c)\,dt = 0.$$
Then consider $\dot h = v(t) - U(t) - c$, corresponding to
$$h(t) = \int_0^t (v(\tau) - U(\tau) - c)\,d\tau \;\Rightarrow\; h(1) = 0 = h(0).$$
For this we obtain
$$0 = \int_0^1 (v(t) - U(t) - c)^\top\dot h(t)\,dt = \|v - U - c\|_{L^2}^2$$
$\Rightarrow v(t) = U(t) + c \Rightarrow v(0) = c$ and $\dot v(t) = u(t) \Rightarrow v \in C^1([0,1])$.
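A sketch of how the Euler-Lagrange equation comes out mechanically, using SymPy on a sample integrand $\varphi(x,\dot x) = \tfrac12\dot x^2 - \tfrac12 x^2$ (the integrand is an assumption chosen here for illustration, not an example from the script):

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')
xd = x(t).diff(t)

# sample integrand phi(x, xdot), assumed for this sketch
phi = xd**2/2 - x(t)**2/2

phi_x  = sp.diff(phi, x(t))                          # phi_x
phi_xd = sp.diff(phi, xd)                            # phi_xdot
euler_lagrange = sp.Eq(phi_x, sp.diff(phi_xd, t))    # phi_x = d/dt phi_xdot
print(euler_lagrange)        # Eq(-x(t), Derivative(x(t), (t, 2)))

sol = sp.dsolve(euler_lagrange, x(t))
print(sol)                   # x(t) = C1*sin(t) + C2*cos(t)
```

The boundary conditions $x(0) = 0 = x(1)$ then select (or rule out) particular constants, exactly as in the examples that follow.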

Proposition 3.5 (Legendre-Clebsch). When $x \in C_0^1[0,1]$ locally minimizes $f$ and $\varphi \in C^2$, then $\varphi_{\dot x\dot x}(x(t), \dot x(t)) \in \mathbb{R}^{n\times n}$ must be positive semidefinite for all $0 \le t \le 1$.

Proof.
$$\frac{d^2}{d\alpha^2} f(x + \alpha h)\Big|_{\alpha=0} = \int_0^1 h(t)^\top\varphi_{xx}(x(t),\dot x(t))h(t)\,dt + \int_0^1 \dot h(t)^\top\varphi_{\dot x\dot x}(x(t),\dot x(t))\dot h(t)\,dt + 2\int_0^1 h(t)^\top\varphi_{x\dot x}(x(t),\dot x(t))\dot h(t)\,dt \ge 0.$$
Assume $\omega^\top\varphi_{\dot x\dot x}(x(t_0),\dot x(t_0))\omega < 0$ for some $\omega \in \mathbb{R}^n$, $\|\omega\| = 1$. By continuity,
$$\omega^\top\varphi_{\dot x\dot x}(x(t),\dot x(t))\omega \le -\delta < 0 \quad\text{for } t_0 - \bar\varepsilon \le t \le t_0 + \bar\varepsilon.$$
For $\varepsilon \le \bar\varepsilon$ consider
$$h_\varepsilon(t) = \begin{cases} \varepsilon\omega\big(1 + \cos(\tfrac\pi\varepsilon(t - t_0))\big) & \text{if } |t - t_0| \le \varepsilon\\ 0 & \text{otherwise,}\end{cases} \qquad \dot h_\varepsilon(t) = \begin{cases} -\pi\omega\sin(\tfrac\pi\varepsilon(t - t_0)) & \text{if } |t - t_0| \le \varepsilon\\ 0 & \text{otherwise.}\end{cases}$$
Then
$$\int_0^1 \|h_\varepsilon(t)\|^2\,dt \le \varepsilon^2\cdot 4\cdot 2\varepsilon = 8\varepsilon^3, \qquad \int_0^1 \|h_\varepsilon(t)\|\,\|\dot h_\varepsilon(t)\|\,dt \le 2\pi\varepsilon^2,$$
$$\frac{d^2}{d\alpha^2} f(x + \alpha h_\varepsilon)\Big|_{\alpha=0} \le \int \dot h_\varepsilon^\top\varphi_{\dot x\dot x}\dot h_\varepsilon\,dt + o(\varepsilon) \le -\delta\int_0^1 \|\dot h_\varepsilon\|^2\,dt + o(\varepsilon) = -\delta\pi^2\int_{|t-t_0|\le\varepsilon}\sin^2(\tfrac\pi\varepsilon(t - t_0))\,dt + o(\varepsilon) = -\delta\pi^2\varepsilon + o(\varepsilon),$$
which would yield negative curvature for sufficiently small $\varepsilon > 0$.

Corollary. If in addition $\varphi \in C^2$ and $\varphi_{\dot x\dot x}(x(t), \dot x(t))$ is nonsingular for all $t \in (0,1)$, it follows from the implicit function theorem that $\dot x$ is a differentiable function of $x(t)$ and $\varphi_{\dot x}(x(t),\dot x(t))$, so that $x(t)$ is in fact twice differentiable.

Proof. By the integrated Euler-Lagrange equation,
$$\varphi_{\dot x}(x(t), \dot x(t)) = \varphi_{\dot x}(x(0), \dot x(0)) + \int_0^t \varphi_x(x(\tau), \dot x(\tau))\,d\tau =: g(t), \quad\text{which is } C^1 \text{ w.r.t. } t.$$
Hence the IFT is applicable to
$$G(\dot x, t) = \varphi_{\dot x}(x(t), \dot x) - g(t) : \mathbb{R}^{n+1} \to \mathbb{R}^n.$$
Whenever $\partial G/\partial\dot x = \varphi_{\dot x\dot x}$ is regular, there is a locally unique function $\dot x = F(t)$ with derivative
$$\frac{d}{dt}\dot x(t) = \ddot x(t) = -\underbrace{\varphi_{\dot x\dot x}(x,\dot x)^{-1}}_{(G_{\dot x})^{-1}}\big(\varphi_{\dot x x}(x,\dot x)\dot x(t) - \varphi_x(x,\dot x)\big),$$
which means that $x(t)$ is in fact twice continuously differentiable.

In other words, we are allowed to differentiate the Euler-Lagrange equation implicitly to obtain the second order boundary value problem (BVP) in ODEs
$$\varphi_{\dot x\dot x}(x,\dot x)\,\ddot x + \varphi_{\dot x x}(x,\dot x)\,\dot x = \varphi_x(x,\dot x) \qquad\text{s.t. } x(0) = 0 = x(1).$$

Question: Does the BVP always have a solution if $\det[\varphi_{\dot x\dot x}(x,\dot x)] \ne 0$ globally? Answer: No, not even in the case where $\varphi_{\dot x\dot x} > 0$.

Examples:

(i) $\ddot x + 4\pi^2(x - t) = 0$. Set $y(t) = t - x(t)$ $\Rightarrow \ddot y(t) + 4\pi^2 y(t) = 0$, $y(0) = 0$, $y(1) = 1$.

Any solution has the form
$$y(t) = \alpha\cos 2\pi t + \beta\sin 2\pi t \;\Rightarrow\; y(0) = y(1),$$
so that the condition $y(0) = 0$, $y(1) = 1$ cannot be satisfied.

(ii) Another example, where the variational problem has a solution outside $C_0^1[0,1]$, is the double well potential $\varphi(x,\dot x) = \tfrac12(\dot x^2 - 1)^2$:

$$\min f(x) = \int_0^1 \tfrac12(\dot x(t)^2 - 1)^2\,dt \quad\text{such that } x(0) = 0 = x(1)$$

has the minimizer

$$x(t) = \begin{cases} t & \text{for } 0 \le t \le \tfrac12\\ 1 - t & \text{for } \tfrac12 \le t \le 1\end{cases} \qquad \dot x(t) = \begin{cases} 1 & \text{for } 0 \le t < \tfrac12\\ -1 & \text{for } \tfrac12 < t \le 1\\ \text{undefined (NaN)} & \text{if } t = \tfrac12.\end{cases}$$

Conclusion: The Euler-Lagrange equation need not hold everywhere along an optimal solution. It does hold wherever $x(t)$ is differentiable. There $x(t)$ solves the local problem
$$\min \int_{\tilde a}^{\tilde b} \varphi(x(t), \dot x(t))\,dt \quad\text{s.t. } x(\tilde a) = x_{\tilde a} \text{ and } x(\tilde b) = x_{\tilde b},$$
with $\varphi_x \equiv 0$ and $\varphi_{\dot x} = 2(\dot x^2 - 1)\dot x \equiv 0$ if $|\dot x| = 1$, so $\tfrac{d}{dt}\varphi_{\dot x} = 0 = \varphi_x$.

The above solution is contained in $H_0^1(0,1) \supseteq C_0^1[0,1]$.

(iii) An example where no solution exists even in $H_0^1([0,1])$ is the following:
$$\varphi(x,\dot x) = \tfrac12(\dot x^2 - 1)^2 + \tfrac12 x^2 \;\Rightarrow\; \min f(x) = \int_0^1 \big[\tfrac12(\dot x^2 - 1)^2 + \tfrac12 x^2\big]\,dt, \qquad x(0) = 0 = x(1).$$

[Figure: infimizing sawtooth sequence: $x_0(t)$ a single tent of height $1/2$, and $x_{k+1}(t)$ with $2^{k+1}$ teeth of height $1/2^{k+2}$, slopes $\pm 1$ throughout]

$$0 \le f(x_k) \le \int_0^1 \tfrac12\cdot 4^{-(k+1)}\,dt = \frac{1}{2^{2k+3}} \longrightarrow 0 \quad\text{as } k \to \infty.$$
Hence
$$\inf_{x \in C_0^1([0,1])} f(x) = \inf_{x \in H_0^1[0,1]} f(x) = f_* = 0.$$
But $f_* = 0$ is not attained by the pointwise limit $x_*(t) \equiv 0$:
$$f(x_*) = \int_0^1 \big(\tfrac12 + 0\big)\,dt = \tfrac12 > \liminf_{k\to\infty} f(x_k),$$
even though $x_k \rightharpoonup 0$ weakly in $H_0^1[0,1]$. That means $f$ is not weakly lower semicontinuous on $H_0^1([0,1])$.

Partial remedy: extension of the variational functional $f(x)$ from $x \in C_0^1[0,1]$ onto $H_0^1[0,1]$ (while still assuming that $\varphi : \mathbb{R}^{2n} \to \mathbb{R}$ is $C^{1,1}$ with global Lipschitz constant $L$).

Observation: $H_0^1[0,1]$ is the closure of $C_0^1[0,1]$ with respect to the norm

$$\|x\|_{H_0^1} = \sqrt{\int_0^1 \|\dot x(t)\|^2\,dt} \;\Rightarrow\; C_0^1 \text{ is a dense subspace of } H_0^1.$$

Any $x \in H_0^1$ has the representation $x = \lim_{k\to\infty} x_k$ with $x_k \in C_0^1$. Then $f(x) = \lim_{k\to\infty} f(x_k)$ is uniquely defined provided $f$ is Lipschitz continuous on $C_0^1$ with respect to $\|\cdot\|_{H_0^1}$:
$$|f(y) - f(x)| \le \int_0^1 |\varphi(y(t),\dot y(t)) - \varphi(x(t),\dot x(t))|\,dt \le L\int_0^1 \sqrt{\|y(t)-x(t)\|^2 + \|\dot y(t)-\dot x(t)\|^2}\,dt$$
$$\le L\Big(\int_0^1 \big(\|y(t)-x(t)\|^2 + \|\dot y(t)-\dot x(t)\|^2\big)\,dt\Big)^{1/2} \le \sqrt2\,L\Big(\int_0^1 \|\dot x(t)-\dot y(t)\|^2\,dt\Big)^{1/2} = \sqrt2\,L\,\|x - y\|_{H_0^1}.$$
Conclusion: $f(x)$ is well-defined on $H_0^1([0,1])$.

1 Proof. Suppose xk ⇀ x H0 = xk x L2 0. Hence we have ∗ ∈ ⇒ k − ∗k → 1 lim f(xk) = lim [ϕ(xk, x˙ k) ϕ(x , x˙ k)+ ϕ(x , x˙ k)]dt k k ∗ ∗ →∞ →∞ 0 − Z 1 lim ϕ(x (t), x˙ (t))(˙xk x˙ )dt + f(x )dt ≥ k ∇ ∗ ∗ − ∗ ∗ →∞ Z0

The first term gives

1 2 lim ϕ(xk, x˙ k) ϕ(x , x˙ k) k | − ∗ | →∞ Z0  1 2 xϕ x (t)+ δ(t)(xk(t) x (t)), x˙(t)+ δ˜(t)(x ˙ k(t) x˙ (t)) ≤ k∇ ∗ − ∗ − ∗  0  Z 1  1  2 2 L (C + (xk(t) + (x (t) + (x ˙ k(t) + (x ˙ (t) ) xk(t) x (t) dt ≤ k k k ∗ k k k k ∗ k k − ∗ k Z0 Z0 2 x x∗ 0 k k− kL→ 2 2 lim L˜ xk 1 + x 1 0=0. | {z } ≤ k k kH0 k ∗kH0 · →∞   1 x˙ ϕ(x (t), x˙ (t))v ˙(t)dt ∇ ∗ ∗ Z0 25 is a bounded linear functional on v H1 so that by assumed weak convergence ∈ 0 1 lim x˙ ϕ(x (t), x˙ (t))(˙xk(t) x˙ (t))dt =0 ∇ ∗ ∗ − ∗ Z0

Remark: Proposition 3.6 is also true for ϕ = ϕ(x, x,˙ t).

Example (Weierstrass): $\varphi(x,\dot x,t) = \tfrac12 t(\dot x - 1)^2$, $x(0) = 0 = x(1)$:
$$f(x) = \int_0^1 \tfrac12 t(\dot x(t) - 1)^2\,dt \ge 0,$$
and $f$ is weakly lower semicontinuous, since $\varphi_{\dot x\dot x} = t \ge 0$ everywhere $\Rightarrow$ convexity with respect to $\dot x$. Let us look at Euler-Lagrange:
$$\varphi_x \equiv 0 \equiv \frac{d}{dt}\varphi_{\dot x} = \frac{d}{dt}\big[t(\dot x - 1)\big] = \dot x - 1 + t\ddot x$$
$$\Rightarrow \frac{\ddot x}{\dot x - 1} = -\frac1t \;\Rightarrow\; \ln(\dot x - 1) = c - \ln t \;\Rightarrow\; \dot x = 1 + \frac{C}{t} \;\Rightarrow\; x(t) = t + C\ln t + d.$$
Again the BVP has no solution: $x(0)$ is undefined unless $C = 0$, in which case $x(0) = d \ne x(1) = d + 1$ $\Rightarrow$ no $C^1$ minimizer. Construct an infimizing sequence by truncation:

$$x_n(t) = \begin{cases} t & \text{for } 0 \le t \le \tfrac1n\\[2pt] t - 1 - \dfrac{\ln t}{\ln n} & \text{for } \tfrac1n \le t \le 1\end{cases} \qquad \dot x_n(t) = \begin{cases} 1 & \text{for } 0 \le t \le \tfrac1n\\[2pt] 1 - \dfrac{1}{t\ln n} & \text{for } \tfrac1n \le t \le 1\end{cases} \qquad \dot x_n\big(\tfrac1n\big) = 1 - \frac{n}{\ln n},$$
$$f(x_n) = \int_0^{1/n} 0\,dt + \int_{1/n}^1 \frac{t}{2}\Big(\frac{1}{t\ln n}\Big)^2 dt = \int_{1/n}^1 \frac{dt}{2t\ln^2 n} = \frac{\ln t}{2\ln^2 n}\Big|_{1/n}^1 = \frac{1}{2\ln n} \to 0 \quad\text{as } n \to \infty,$$
i.e. the $x_n \in H_0^1[0,1]$ form an infimizing sequence with $f_* = \lim_{n\to\infty} f(x_n) = 0$.

Question: Does the sequence $x_n$ have a weak cluster point, which by weak lower semicontinuity would have to be a minimizer of $f$ in $H_0^1$?
Answer: No, because $\|x_n\|_{H_0^1}$ grows unboundedly, as shown below:
$$\|x_n\|_{H_0^1}^2 = \int_0^1 \dot x_n(t)^2\,dt = \int_0^{1/n} 1\,dt + \int_{1/n}^1 \Big(1 - \frac{1}{t\ln n}\Big)^2 dt = \frac1n + \Big[t - \frac{2\ln t}{\ln n} - \frac{1}{t\ln^2 n}\Big]_{1/n}^1$$
$$= \frac1n + 1 - \frac1n - \frac{2\ln n}{\ln n} - \frac{1}{\ln^2 n} + \frac{n}{\ln^2 n} = -1 - \frac{1}{(\ln n)^2} + \frac{n}{(\ln n)^2} \to \infty \quad\text{as } n \to \infty.$$
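A numeric sketch of this truncated sequence (grid discretization assumed here; not from the script): $f(x_n) \to 0$ like $1/(2\ln n)$ while $\|x_n\|_{H_0^1}^2$ grows like $n/\ln^2 n$.

```python
import numpy as np

N = 2_000_000
t = np.linspace(0.0, 1.0, N + 1)[1:]        # avoid t = 0
dt = 1.0 / N

for n in [10, 100, 1000, 10000]:
    xdot = np.where(t <= 1.0/n, 1.0, 1.0 - 1.0/(t*np.log(n)))
    f_val = np.sum(0.5 * t * (xdot - 1.0)**2) * dt
    h1    = np.sum(xdot**2) * dt
    exact = (n - 1)/np.log(n)**2 - 1        # closed-form H1 norm squared
    print(f"n={n:6d}: f(x_n) = {f_val:.5f} (1/(2 ln n) = {0.5/np.log(n):.5f}),"
          f"  ||x_n||_H1^2 = {h1:.1f} (exact {exact:.1f})")
```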

Compare to $\min_{x>0} f(x) = \tfrac1x$.

Conclusion: For unconstrained $S = X$ we need bounded level sets, which is implied by coercivity, i.e. $\lim_{\|x\|\to\infty} f(x) = \infty$. In the variational example,
$$\varphi(x,\dot x,t) \ge C|\dot x|^2 - d, \quad C > 0,\ d \in \mathbb{R},$$
implies that $f(x) \ge C\|x\|_{H_0^1}^2 - d$ $\Rightarrow$ any infimizing sequence must be contained in the ball
$$S = \Big\{x \in X : \|x\|_{H_0^1}^2 \le \frac{2(f_* + d)}{C}\Big\}.$$
That feasible set $S$ is convex, bounded and closed, thus weakly closed and weakly sequentially compact. Hence Proposition 1.7 applies.

Review/Summary

(1) Attainment of the infimal value by a minimizer is guaranteed if

(i) $\dim(X) < \infty$, the feasible set $S \subset X$ is bounded and closed, and $f$ is continuous.

(ii) $X = C(\Omega)$, $f : X \to \mathbb{R}$ is continuous and $S$ is equicontinuous ($\forall\varepsilon > 0\ \exists\delta > 0\ \forall x \in S\ \forall\omega,\bar\omega \in \Omega$: $|\omega - \bar\omega| < \delta \Rightarrow |x(\omega) - x(\bar\omega)| < \varepsilon$); then $S$ is sequentially compact in $X$ and infimizing sequences converge to minimizers. Example: the open pit mining problem (see Appendix for details).

(iii) $X$ is reflexive, $f$ is weakly lower semicontinuous (implied by quasi-convexity and continuity) and $S$ is weakly closed and bounded (boundedness being implied by coercivity in the unconstrained case).

(2) Necessary conditions for local optimality of $\bar x \in S$:

(i) $f^0(\bar x; h) \ge 0$ if $\bar x + \lambda h \in S$ for $0 \le \lambda \le \bar\lambda$ and $f$ is locally Lipschitz continuous.

(ii) $0 \in \partial^0 f(\bar x)$ if $\bar x$ is in the interior of $S$ and $f$ is Lipschitz continuous.

(iii) $0 = f'(\bar x)$ if $f$ is Gâteaux differentiable at $\bar x$.

(iv) Euler-Lagrange holds for $x \in C^1$ on the variational problem with $\varphi \in C^{1,1}$.

(v) Legendre-Clebsch holds for $x \in C^2$ on the variational problem with $\varphi \in C^{2,1}$.

Local minimality implies global minimality for convex $f$ on convex $S$.

4 Tangent Cones and Sensitivity

4.1 Motivation

From unconstrained optimization to (explicitly) constrained optimization problems.

Definition (Cone). Let $C \subset X$ be a subset of a Banach space. $C$ is a cone if $x \in C$ and $\lambda \ge 0 \Rightarrow \lambda x \in C$. A cone is pointed if $x \in C \ni -x \Rightarrow x = 0$ (i.e. only rays, no whole lines, belong to $C$). Example: the positive orthant $(\mathbb{R}_+)^n$.

Lemma (4.0). A cone $C \subset X$ is convex $\iff$ ($x, y \in C \Rightarrow x + y \in C$).

Proof. Exercise.

Definition. For any $S \subseteq X$ set $\operatorname{Cone}(S) = \{\lambda x : \lambda \ge 0,\ x \in S\}$.

Definition. $h \in X$ is called tangent to $S \subset X$ at $\bar x$ if there exist $\{x_k\} \subset S$ and $\lambda_k > 0$ such that $x_k \to \bar x$ and $(x_k - \bar x)\lambda_k \to h$. The set of all such tangents is denoted by $T(S,\bar x)$ and called the "contingent cone" or "Bouligand tangent cone".

Observation: Without loss of generality, $h = 0$ or $\lambda_k = \|h\|/\|x_k - \bar x\|$.

Proposition 4.1.

(i) If $\bar x \in S$, then $T(S,\bar x) \subset \overline{\operatorname{Cone}(S - \bar x)}$.

(ii) If $S$ is star-shaped at $\bar x$, then $\operatorname{Cone}(\tilde S - \bar x) \subset T(S,\bar x)$, where $\tilde S = S \setminus \{\bar x\} = \{x \in S : x \ne \bar x\}$.

Proof.

(i) $h = \lim_{k\to\infty} \lambda_k(x_k - \bar x) = \lim_{k\to\infty} \tilde x_k$ with $\tilde x_k = (x_k - \bar x)\lambda_k \in \operatorname{Cone}(S - \bar x)$.

(ii) $h \in \operatorname{Cone}(\tilde S - \bar x)$ means $h = \lambda(x - \bar x) = \lambda k\big(\tfrac1k(x - \bar x)\big) = \lambda k\big(\bar x + \tfrac1k(x - \bar x) - \bar x\big)$, where $x_k := \bar x + \tfrac1k(x - \bar x) \in S$ by star-shapedness and $x_k \to \bar x$, so $h \in T(S,\bar x)$.

Corollary. If $S$ is star-shaped then $T(S,\bar x) = \overline{\operatorname{Cone}(S - \bar x)}$.

Proposition 4.2. For all $S \subset X$, $T(S,\bar x)$ is closed.

Proof.

Let $h = \lim_{k\to\infty} h_k$ with $h_k \in T(S,\bar x)$ $\Rightarrow$ there are $x_{k,j} \in S$, $\lambda_{k,j} > 0$ s.t. $\lambda_{k,j}(x_{k,j} - \bar x) \to_j h_k$
$\Rightarrow \exists\, j(k)$ s.t. $\|\lambda_{k,j(k)}(x_{k,j(k)} - \bar x) - h_k\| \le \tfrac1k$ and $\|x_{k,j(k)} - \bar x\| \le \tfrac1k$. Then
$$\lim_{k\to\infty} \|\lambda_{k,j(k)}(x_{k,j(k)} - \bar x) - h\| \le \lim_{k\to\infty}\big(\|\lambda_{k,j(k)}(x_{k,j(k)} - \bar x) - h_k\| + \|h - h_k\|\big) \le \lim_{k\to\infty}\big(\|h - h_k\| + \tfrac1k\big) = 0$$
and $\lim_{k\to\infty} x_{k,j(k)} = \bar x$, so $h \in T(S,\bar x)$.

Lemma 4.3. $S$ convex $\Rightarrow T(S,\bar x)$ convex.

Proof. Let $\lambda_k(x_k - \bar x) \to h$ and $\tilde\lambda_k(\tilde x_k - \bar x) \to \tilde h$. With $\hat\lambda_k = \lambda_k + \tilde\lambda_k > 0$ and
$$\hat x_k = \frac{\lambda_k}{\lambda_k + \tilde\lambda_k}\,x_k + \frac{\tilde\lambda_k}{\lambda_k + \tilde\lambda_k}\,\tilde x_k \in S$$
by convexity, we get
$$h + \tilde h = \lim_{k\to\infty}(\lambda_k + \tilde\lambda_k)\Big(\frac{\lambda_k}{\lambda_k+\tilde\lambda_k}x_k + \frac{\tilde\lambda_k}{\lambda_k+\tilde\lambda_k}\tilde x_k - \bar x\Big) = \lim_{k\to\infty}\hat\lambda_k(\hat x_k - \bar x) \in T(S,\bar x)$$
$\Rightarrow T(S,\bar x)$ is convex by Lemma (4.0).

Proposition 4.4. For $\bar x$ to be a local minimizer on $S$ of an $f$ that is Fréchet differentiable on an open neighborhood of $S$, it is necessary that $f'(\bar x; h) = f'(\bar x)h \ge 0$ for all $h \in T(S,\bar x)$.

Proof.

$$f'(\bar x)h = \lim_{k\to\infty}\lambda_k f'(\bar x)(x_k - \bar x) \qquad\text{since } h = \lim_{k\to\infty}\lambda_k(x_k - \bar x)$$
$$= \lim_{k\to\infty}\lambda_k\big\{f(x_k) - f(\bar x) - [f(x_k) - f(\bar x) - f'(\bar x)(x_k - \bar x)]\big\}$$
$$\ge -\lim_{k\to\infty}\lambda_k\|x_k - \bar x\|\cdot\frac{|f(x_k) - f(\bar x) - f'(\bar x)(x_k - \bar x)|}{\|x_k - \bar x\|} = -\|h\|\cdot 0 = 0,$$
using $f(x_k) - f(\bar x) \ge 0$ by local minimality and Fréchet differentiability for the remainder quotient.

Question: Is it good enough for $f'(x; h)$ to be locally Lipschitz continuous with respect to $h$?

Corollary 4.5. Under the assumptions of Proposition 4.4, with $S$ star-shaped at $\bar x$ and $f$ pseudo-convex, i.e. $f'(\bar x; h) \ge 0 \Rightarrow f(\bar x + \lambda h) \ge f(\bar x)$ if $\lambda h \in S - \bar x$, the point $\bar x$ must in fact be a (global) minimizer of $f$ on $S$.

General representation of the feasible set:
$$S = \{x \in \hat S : h(x) = 0 \in Z;\ g(x) \in -C \subset Y\} \quad\text{for } \hat S \subset X,$$
where $g(x) \in -C$ generalizes $g(x) \le 0$, and $h : X \to Z$, $g : X \to Y$ with $X, Y, Z$ Banach spaces.

As a first step, consider $\hat S = X$ and omit $g$, so $S \equiv \{x \in X : h(x) = 0\} = h^{-1}(0)$.

Lemma 4.6. If $h$ is Fréchet differentiable at $\bar x \in S$, then $T(S,\bar x) \subset \ker(h'(\bar x)) \equiv \{v \in X : h'(\bar x)v = 0\}$.

Proof.

$T(S,\bar x) \ni v = \lim_{k\to\infty}\lambda_k(x_k - \bar x)$ with $\lambda_k > 0$, $x_k \in S$, $\lim x_k = \bar x$. Writing $v_k = \lambda_k(x_k - \bar x)$,
$$h'(\bar x)\cdot v = \lim_{k\to\infty}\lambda_k h'(\bar x)(x_k - \bar x) = -\lim_{k\to\infty}\|v_k\|\cdot\frac{h(x_k) - h(\bar x) - h'(\bar x)(x_k - \bar x)}{\|x_k - \bar x\|} = \lim_{k\to\infty}\|v_k\|\cdot 0 = 0$$
by Fréchet differentiability, using $h(x_k) = 0 = h(\bar x)$.

Question: Give a simple example where $T(S,\bar x) \ne \ker(h'(\bar x))$.

Remark: $h'(x)$ may fail to be surjective or continuous with respect to $x$.

Answer (lack of surjectivity): $X = \mathbb{R}^2$, $h(x) = x_1^2 + x_2^2$ $\Rightarrow h^{-1}(0) = \{0\}$ $\Rightarrow T(h^{-1}(0), 0) = \{0\}$, but $\ker(h'(0)) = \ker([0,0]) = \mathbb{R}^2 \supsetneq \{0\}$.

Assume from now on that $h'(x)$ is continuous with respect to $x$ in the vicinity of $\bar x$ and that $h'(\bar x)$ is surjective, i.e. $\operatorname{Range}(h'(\bar x)) = Z$.

Construction of a feasible arc $x(\lambda) = \bar x + \lambda v + r(\lambda) \in h^{-1}(0)$ for small $\lambda \in [0, \bar\lambda) \subset [0,1)$ with $r(\lambda) = o(\lambda)$:

(i) By the open mapping theorem one concludes from the boundedness and surjectivity of $h'(\bar x)$ that $h'(\bar x)\mathcal B_\rho(0_X) \supset \mathcal B_1(0_Z)$ for some $0 < \rho < \infty$, where $\mathcal B_{\hat\rho}(p) = \{x \in X : \|x - p\| \le \hat\rho\}$. Here $\rho$ generalizes the norm of $h'(\bar x)^{-1}$ if it were to exist, which it usually does not.

(ii) For $\varepsilon \in (0, 1/(2\rho))$ pick $\sigma > 0$ such that for $\tilde x \in \mathcal B_\sigma(\bar x)$:
$$\|h'(\tilde x) - h'(\bar x)\| \le \varepsilon \;\Rightarrow\; \|h(\hat x) - h(\tilde x) - h'(\bar x)(\hat x - \tilde x)\| \le \varepsilon\|\hat x - \tilde x\|$$
for all pairs $\tilde x, \hat x \in \mathcal B_\sigma(\bar x)$. To apply the mean value theorem for scalar functions, consider for the given pair $(\tilde x, \hat x)$ (with $\Delta\hat x = \hat x - \tilde x$) some functional $\ell \in Z^*$ such that, by Hahn-Banach, $\|\ell\| \le 1$ and
$$\langle \ell,\ h(\hat x) - h(\tilde x) - h'(\bar x)\Delta\hat x\rangle = \|h(\hat x) - h(\tilde x) - h'(\bar x)\Delta\hat x\|.$$
Consider $\varphi : \mathbb{R} \to \mathbb{R}$ with
$$\varphi(t) = \langle \ell,\ h(\tilde x + t\Delta\hat x) - h(\tilde x) - t\,h'(\bar x)\Delta\hat x\rangle$$
$$\Rightarrow\ \varphi(0) = 0, \qquad \varphi(1) = \|h(\hat x) - h(\tilde x) - h'(\bar x)\Delta\hat x\|,$$
$$\varphi'(t) = \langle \ell,\ h'(\tilde x + t\Delta\hat x)\Delta\hat x - h'(\bar x)\Delta\hat x\rangle, \qquad |\varphi'(t)| \le \|h'(\tilde x + t\Delta\hat x) - h'(\bar x)\|\,\|\Delta\hat x\| \le \varepsilon\|\Delta\hat x\| \quad\text{for } 0 \le t \le 1.$$
By the mean value theorem, for some $\bar t \in (0,1)$:
$$|\varphi(1) - \varphi(0)| = |\varphi'(\bar t)|(1 - 0) \;\Rightarrow\; \|h(\hat x) - h(\tilde x) - h'(\bar x)\Delta\hat x\| = |\varphi'(\bar t)| \le \varepsilon\|\Delta\hat x\|.$$

(iii) Choose some $v \in \ker(h'(\bar x))$ with $0 \ne \|v\| \le \sigma$ and compute $r(\lambda)$ by applying the Newton-like iteration $r_0 = 0$, $r_{k+1} = r_k - u_k$ for $k = 0, 1, \dots$ with $u_k$ satisfying
$$h'(\bar x)u_k = h(\bar x + \lambda v + r_k) \in Z \quad\text{while}\quad \|u_k\| \le \rho\,\|h(\bar x + \lambda v + r_k)\|,$$
which is solvable by (i).

Initial residual:
$$d(\lambda) = \|h(\bar x + \lambda v)\| = \|h(\bar x + \lambda v) - h(\bar x) - h'(\bar x)\lambda v\| \le o(\lambda)\,\|v\| \le o(\lambda)\,\sigma.$$

Successive residuals:
$$\|h(\bar x + \lambda v + r_k - u_k)\| = \big\|h(\bar x + \lambda v + r_k - u_k) - [h(\bar x + \lambda v + r_k) - h'(\bar x)u_k]\big\| \le \varepsilon\|u_k\| \le \varepsilon\rho\,\|h(\bar x + \lambda v + r_k)\| \le \tfrac12\|h(\bar x + \lambda v + r_k)\| \le (\tfrac12)^{k+1} d(\lambda)$$
$$\Rightarrow\ \|u_k\| \le \rho\,\|h(\bar x + \lambda v + r_k)\| \le \rho\,(\tfrac12)^k d(\lambda),$$
$$\sup_k \|r_k\| \le \sum_{k=0}^\infty \|u_k\| \le \rho\sum_{k=0}^\infty (\tfrac12)^k d(\lambda) \le 2\rho\,d(\lambda) \le 2\rho\,o(\lambda)\,\sigma \le \sigma(1 - \lambda) \quad\text{for small } \lambda < \bar\lambda.$$
By the continuity of $h$, which follows from differentiability, we have for $r(\lambda) = -\sum_{k=0}^\infty u_k$ that $h(\bar x + \lambda v + r(\lambda)) = 0$ and $\|r(\lambda)\| \le 2\rho\,o(\lambda)\,\sigma$, so that $\|r(\lambda)\|/\lambda \to 0$ as $\lambda \to 0$.
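A finite-dimensional sketch of this construction (hypothetical data chosen here; in $\mathbb{R}^n$ the least-norm solution of $h'(\bar x)u = b$ plays the role of the open-mapping bound $\|u\| \le \rho\|b\|$):

```python
import numpy as np

# Hypothetical smooth constraint h : R^3 -> R^2 with surjective Jacobian at xbar.
def h(x):
    return np.array([x[0]**2 + x[1] - 1.0, x[0] + x[2]])

xbar = np.array([1.0, 0.0, -1.0])                 # h(xbar) = 0
J = np.array([[2.0, 1.0, 0.0],                    # h'(xbar)
              [1.0, 0.0, 1.0]])
v = np.array([1.0, -2.0, -1.0])                   # J @ v = 0, a tangent direction

for lam in [1e-1, 1e-2, 1e-3, 1e-4]:
    r = np.zeros(3)
    for _ in range(30):                           # r_{k+1} = r_k - u_k
        res = h(xbar + lam*v + r)
        u = np.linalg.lstsq(J, res, rcond=None)[0]  # least-norm solution of J u = res
        r = r - u
    print(f"lam={lam:7.0e}: ||h(x(lam))|| = {np.linalg.norm(h(xbar+lam*v+r)):.1e}, "
          f"||r||/lam = {np.linalg.norm(r)/lam:.2e}")   # -> 0, i.e. r = o(lam)
```

The frozen Jacobian $h'(\bar x)$ is reused in every step, exactly as in the iteration above, so the residual contracts geometrically while $r(\lambda)$ stays of order $d(\lambda) = O(\lambda^2)$ here.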

Proposition 4.7. If $h : X \to Z$ has a Fréchet derivative $h'(x)$ that is surjective at $\bar x$ and continuous with respect to $x$ near $\bar x$, then $T(h^{-1}(0), \bar x) = \ker(h'(\bar x))$.

Corollary 4.8. Under the assumptions of Proposition 4.7, if $\bar x$ is a local minimizer of a Fréchet differentiable $f : X \to \mathbb{R}$ on $h^{-1}(0) \subset X$, then there exists a functional $\lambda^* \in Z^*$ such that $f'(\bar x)v = \langle \lambda^*, h'(\bar x)v\rangle$ for all $v \in X$.

Proof. See the proof following Proposition 4.9 below.

4.2 Basic Review of Adjoints

Let $X, Y$ be Banach spaces and $A \in B(X,Y)$, where $X^* = B(X,\mathbb{R})$ and $Y^* = B(Y,\mathbb{R})$ are the spaces of bounded linear functionals; $A$ maps $X \to Y$ while $A^*$ maps $Y^* \to X^*$. The adjoint $A^* : Y^* \to X^*$ is defined such that for all $y^* \in Y^*$

$$\langle A^*y^*, x\rangle = \langle y^*, Ax\rangle \qquad\text{for } x \in X.$$

(i) $\|A^*\| = \|A\|$ ($\Rightarrow A^* \in B(Y^*, X^*)$)

(ii) $(A + B)^* = A^* + B^*$

(iii) $(\alpha A)^* = \bar\alpha A^*$

Thus adjoining, $*$, is a bounded linear operator in the real case; $* : B(X,Y) \to B(Y^*, X^*)$.

$$|\langle A^*y^*, x\rangle| = |\langle y^*, Ax\rangle| \le \|y^*\|\,\|Ax\| \le \|y^*\|\,\|A\|\,\|x\|$$
$$\|A^*y^*\| \le \sup_x \frac{|\langle A^*y^*, x\rangle|}{\|x\|} \le \|y^*\|\,\|A\| \;\Rightarrow\; \|A^*\| \le \|A\|.$$
For any $x_0 \in X$ and $y_0 = Ax_0$ there exists $y_0^*$ with $\|y_0^*\| \le 1$ and $|\langle y_0^*, y_0\rangle| = \|y_0\|$:
$$\|Ax_0\| = |\langle y_0^*, Ax_0\rangle| = |\langle A^*y_0^*, x_0\rangle| \le \|A^*y_0^*\|\,\|x_0\|$$
$$\frac{\|Ax_0\|}{\|x_0\|} \le \|A^*y_0^*\| \le \|A^*\|\,\|y_0^*\| \le \|A^*\| \quad\text{for any } x_0 \in X \;\Rightarrow\; \|A\| \le \|A^*\| \le \|A\|.$$

Definition.
• $\ker(A) = \{x \in X : Ax = 0\}$,
• $\operatorname{Range}(A) = \{y \in Y : y = Ax,\ x \in X\}$,
• $X^* \supset (\ker A)^\perp = \{x^* \in X^* : \langle x^*, x\rangle = 0 \text{ if } Ax = 0\}$,
• $(\operatorname{Range} A)^\perp = \{y^* \in Y^* : \langle y^*, y\rangle = 0 \text{ if } y = Ax\}$.
Correspondingly, we have $\operatorname{Range}(A^*)$, $\operatorname{Range}(A^*)^\perp$, $\ker(A^*)$, $\ker(A^*)^\perp$.

Proposition 4.9. For $A \in B(X,Y)$:

(i) $\ker(A^*) = \operatorname{Range}(A)^\perp \subset Y^*$.

(ii) If furthermore $\operatorname{Range}(A) = Y$, then $\ker(A)^\perp = \operatorname{Range}(A^*) \subset X^*$.

Proof. (i) "$\subset$": Suppose $y^* \in \ker(A^*)$ and $y = Ax$

$\Rightarrow \langle y^*, y\rangle = \langle y^*, Ax\rangle = \langle A^*y^*, x\rangle = 0$ since $A^*y^* = 0$.
"$\supset$": Suppose $y^* \in \operatorname{Range}(A)^\perp$ and $x \in X$ $\Rightarrow \langle y^*, Ax\rangle = 0 \Rightarrow \langle A^*y^*, x\rangle = 0 \Rightarrow A^*y^* = 0$ since $x$ is arbitrary.

(ii) "$\supset$": Let $x^* \in \operatorname{Range}(A^*)$; then $x^* = A^*y^*$, and for all $x \in \ker(A)$:
$$\langle x^*, x\rangle = \langle A^*y^*, x\rangle = \langle y^*, Ax\rangle = 0 \;\Rightarrow\; x^* \in \ker(A)^\perp.$$
"$\subset$": Let $x^* \in \ker(A)^\perp$. Construct a $y^* \in Y^*$ with $\langle y^*, y\rangle \equiv \langle x^*, x\rangle$ for any $y \in \operatorname{Range}(A) = Y$ and any $x$ with $Ax = y$. This $y^*$ exists and is unique, since for $\tilde x$ with $A\tilde x = y$ we have $\langle x^*, x\rangle = \langle x^*, \tilde x\rangle$, as $x - \tilde x \in \ker(A)$. Hence $y^*$ is uniquely defined on $Y$. Next we must show boundedness. By open mapping we can pick $x$ such that $\|x\| \le \rho\|y\|$ with $Ax = y$ for some $\rho < \infty$:
$$|\langle y^*, y\rangle| = |\langle x^*, x\rangle| \le \|x^*\|\,\|x\| \le \|x^*\|\,\|y\|\,\rho \;\Rightarrow\; \|y^*\| \le \|x^*\|\rho,$$
and $\langle y^*, Ax\rangle = \langle x^*, x\rangle$ for all $x$, so that necessarily $A^*y^* = x^*$ $\Rightarrow x^* \in \operatorname{Range}(A^*)$.

Remark: The result can be generalized, like the concept of adjoints itself, to (only closed) semi-Fredholm operators (Kato).

Proof of Corollary 4.8 :

Proof. Minimality requires, by Propositions 4.4 and 4.7, that $f'(\bar x)v = 0$ for all $v \in T(h^{-1}(0),\bar x) = \ker(h'(\bar x))$ $\Rightarrow f'(\bar x) \in \ker(h'(\bar x))^\perp$. This implies, by the assumed surjectivity of $h'(\bar x)$, that
$$f'(\bar x) \in \operatorname{Range}(h'(\bar x)^*) \;\Rightarrow\; f'(\bar x) = (h'(\bar x))^*\lambda^*, \quad \lambda^* \in Z^*.$$

Practical Consequences

(i) $\bar x$ can under suitable conditions be computed together with $\lambda^*$ as a root of the KKT system:

$$h(x) = 0$$

$$f'(x) - h'(x)^*\lambda^* = 0$$

i.e. as a root of the mapping $(x, \lambda^*) \in X \times Z^* \mapsto (h(x),\ f'(x) - h'(x)^*\lambda^*) \in Z \times X^*$.

(ii) The multiplier vector $\lambda^*$ can be interpreted as a sensitivity as follows. Under suitable assumptions, the perturbed problem

$$\min f(x) \quad\text{s.t.}\quad h(x) = tz \quad\text{for fixed } z \in Z \text{ and } t \in (-\varepsilon, \varepsilon) \subset \mathbb{R}$$

has differentiable paths of solutions $x(t), \lambda^*(t)$ with $x(0) = \bar x$, $\lambda^*(0) = \lambda^*$. Then at $t = 0$ the optimal value has the derivative

$$\frac{d}{dt} f(x(t)) = f'(x(t))\dot x(t) = \langle \lambda^*(t),\ h'(x(t))\dot x(t)\rangle = \langle \lambda^*, z\rangle.$$
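A finite-dimensional numeric sketch of (ii) (a hypothetical equality-constrained quadratic program; all data here are assumptions made for illustration): the finite-difference derivative of the optimal value with respect to $t$ matches $\langle \lambda^*, z\rangle$.

```python
import numpy as np

# min 1/2 x^T Q x - c^T x   s.t.  A x = t z   (the KKT system is linear)
Q = np.array([[3.0, 0.5], [0.5, 2.0]])
c = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])          # h(x) = A x, so h'(x)* lam = A^T lam
z = np.array([1.0])

def solve(t):
    # KKT: Q x - c - A^T lam = 0,  A x = t z   (i.e. f'(x) = h'(x)* lam)
    K = np.block([[Q, -A.T], [A, np.zeros((1, 1))]])
    rhs = np.concatenate([c, t * z])
    sol = np.linalg.solve(K, rhs)
    x, lam = sol[:2], sol[2:]
    return 0.5 * x @ Q @ x - c @ x, lam

dt = 1e-6
f0, lam0 = solve(0.0)
f1, _ = solve(dt)
print("finite difference d f / d t =", (f1 - f0) / dt)
print("<lam*, z>                   =", lam0 @ z)     # the two values agree
```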

4.3 Inequality Constraints via Cones

A linear space $X$ is partially ordered by "$\le$" if
$$x \le x \quad\text{(reflexivity)}$$
$$x \le y \wedge y \le z \Rightarrow x \le z \quad\text{(transitivity)}$$
$$x \le y \wedge z \in X \Rightarrow x + z \le y + z \quad\text{(additivity)}$$
$$x \le y \wedge \alpha \ge 0 \Rightarrow \alpha x \le \alpha y \quad\text{(multiplicativity)}$$

Lemma. Due to additivity and multiplicativity,
$$x \le y \iff y - x \in C \equiv \{z \in X : z \ge 0\}$$
with $C$ being a convex cone, which is pointed if and only if "$\le$" is antisymmetric (i.e. $x \le y \wedge y \le x \Rightarrow x = y$).

Proof. Exercise.

Definition. The dual cone
$$C^* \equiv \{x^* \in X^* : \langle x^*, x\rangle \ge 0 \text{ for } x \in C\}$$
is by definition always closed and convex, even for nonconvex $C$.

Examples:

(i) $X = \mathbb{R}^2$, $C \equiv \{(x_1, x_2) \in \mathbb{R}^2 : x_1 \ge 0 \le x_2\}$ (nonnegative orthant).

(ii) In $C^0[0,1]$: $C = \{x \in C^0[0,1] : x(\omega) \ge 0\}$; here $X^* \supset C^* \equiv$ Stieltjes integrals represented by nondecreasing functions.

An inequality constraint $g : X \to Y$, $g(x) \le 0$ means $g(x) \in -C$ (by the definition of "$\le$"). In many cases we assume that $C \subset Y$ has nonempty interior $\operatorname{Int}(C) \ne \emptyset$.

Typical example (NLP): $g(x) = (g_1(x), g_2(x)) \in \mathbb{R}^2$ with
$$g_1(x) \le 0 \wedge g_2(x) \le 0 \iff g(x) \in -C, \quad C = \mathbb{R}^2_+ \Rightarrow \text{nonempty interior}.$$

Examples of cones without interior in $C[0,1]$:

• the convex functions (with $x \le y \iff y - x$ is convex),

• the Lipschitz continuous functions on $[0,1]$.

From now on, we will consider the more general problem
$$\min f(x) \quad\text{s.t.}\quad x \in S \equiv \{x \in X : g(x) \in -C,\ h(x) = 0\}$$
where $g : X \to Y$ and $h : X \to Z$.

One (bad) idea is to replace $h(x) = 0$ by $h(x) \le 0 \wedge -h(x) \le 0$. This idea is bad because we can never have a vector $v \in X$ such that at $\bar x \in h^{-1}(0)$ strictly $h'(\bar x)v < 0$ and $-h'(\bar x)v < 0$. But the existence of such a $v$ is necessary for various constraint qualifications. In finite dimensions this means two active constraints whose gradients, having opposite signs, are linearly dependent.

Proposition 4.10 (No strict descent direction). Suppose f, g and h are Fréchet differentiable at x̄, and h′(x) is continuous with respect to x ≈ x̄ and surjective. Then x̄ can only be a local minimizer if there exists no v ∈ X such that

⟨f′(x̄), v⟩ < 0,  h′(x̄)v = 0  and  g(x̄) + g′(x̄)v ∈ −Int(C).

Proof. Since v ∈ ker(h′(x̄)) = T(h⁻¹(0), x̄), there exists a sequence x_k ∈ h⁻¹(0) with

x_k → x̄  and  (x_k − x̄)/‖x_k − x̄‖ → v,  with ‖v‖ = 1 w.l.o.g.

By Fréchet differentiability we have

(f(x_k) − f(x̄) − f′(x̄)(x_k − x̄))/‖x_k − x̄‖ → 0  ⇒  (f(x_k) − f(x̄))/‖x_k − x̄‖ → f′(x̄)v < 0.

This contradicts local minimality, provided we can show that g(x_k) ∈ −C for large k and thus x_k ∈ S:

(g(x_k) − g(x̄))/‖x_k − x̄‖ → g′(x̄)v,

y_k ≡ g(x̄) + (g(x_k) − g(x̄))/‖x_k − x̄‖ → g(x̄) + g′(x̄)v ∈ −Int(C)  ⇒  y_k ∈ −C for large k

⇒ g(x_k) = ‖x_k − x̄‖ y_k + (1 − ‖x_k − x̄‖) g(x̄) ∈ −C,

since this right-hand side is a convex combination of two terms in −C.

Geometric Motivation of Lagrange Multipliers via Separation

Suppose we have the map

(f, g, h) : X → R × Y × Z.

At any x̄ where this map is open, arbitrary variations of the value triple (f(x̄), g(x̄), h(x̄)) are possible, i.e. can be realized by perturbations of x̄. Thus x̄ cannot possibly be extremal. Hence for any x̄ to be a minimizer we need that (f(x̄), g(x̄), h(x̄)) lies on the boundary of the set

M̃ = {(f(x) + α, g(x) + y, h(x)) : x ∈ X, α > 0, y ∈ Int(C)}.

If M̃ is convex then it has a supporting hyperplane at the boundary point (f(x̄), g(x̄), h(x̄)), such that for all (f(x) + α, g(x) + y, h(x)) ∈ M̃

µ(f(x) + α) + ⟨y*, g(x) + y⟩ + ⟨z*, h(x)⟩ ≥ µf(x̄) + ⟨y*, g(x̄)⟩ + ⟨z*, h(x̄)⟩.

Problem: M̃ is generally not convex. Remedy: replace M̃ by the linearization

M = {(f′(x̄)v + α, g(x̄) + g′(x̄)v + y, h′(x̄)v) : 0 < α ∈ R, v ∈ X, y ∈ Int(C)}.

Under suitable assumptions M is convex, open, and does not contain 0.

Lemma. For x̄ ∈ X where f, g, and h are Fréchet differentiable,

M = {(f′(x̄)v + α, g(x̄) + g′(x̄)v + y, h′(x̄)v) : 0 < α ∈ R, v ∈ X, y ∈ Int(C)}

(i) is always convex;

(ii) is open if h′(x̄) is surjective;

(iii) does not contain 0 if x̄ is a local minimizer, so that there are no directions of descent according to Proposition 4.10.

Proof. Let (v_i, α_i, y_i) ∈ X × R₊ × Int(C) for i = 1, 2.

(i) Then we have for any λ ∈ [0, 1]:

λ(f′(x̄)v₁ + α₁) + (1−λ)(f′(x̄)v₂ + α₂) = f′(x̄)(λv₁ + (1−λ)v₂) + λα₁ + (1−λ)α₂,
λ(g(x̄) + g′(x̄)v₁ + y₁) + (1−λ)(g(x̄) + g′(x̄)v₂ + y₂) = g(x̄) + g′(x̄)(λv₁ + (1−λ)v₂) + λy₁ + (1−λ)y₂,
λh′(x̄)v₁ + (1−λ)h′(x̄)v₂ = h′(x̄)(λv₁ + (1−λ)v₂)

⇒ M is convex as an affine image of the convex set X × R₊ × Int(C).

(ii) By the open mapping theorem there exists ρ < ∞ such that for any Δh ∈ Z there exists Δv ∈ X with

h′(x̄)Δv = Δh  and  ‖Δv‖ ≤ ρ‖Δh‖,  so that  h′(x̄)(v + Δv) = h + Δh.

Then look for Δy such that

g(x̄) + g′(x̄)(v + Δv) + y + Δy = g + Δg ∈ Y.

For given Δg and g = g(x̄) + g′(x̄)v + y:

Δy = Δg − g′(x̄)Δv,
‖Δy‖ ≤ ‖Δg‖ + ‖g′(x̄)‖‖Δv‖ ≤ ‖Δg‖ + ‖g′(x̄)‖ ρ ‖Δh‖.

For given Δf look for Δα such that

f′(x̄)Δv + Δα = Δf  ⇒  |Δα| ≤ ‖Δf‖ + ‖f′(x̄)‖ ρ ‖Δh‖.

Overall we obtain the estimate

‖Δv‖ + ‖Δy‖ + |Δα| ≤ (‖Δf‖ + ‖Δg‖ + ‖Δh‖) ρ (1 + ‖g′(x̄)‖ + ‖f′(x̄)‖).

Since (v, y, α) ∈ Int(X × Int(C) × R₊) = X × Int(C) × R₊, all sufficiently small changes (Δf, Δg, Δh) can be realized within M.

(iii) Suppose 0 ∈ M. Then, for some v ∈ X, α > 0 and y ∈ Int(C),

f′(x̄)v + α = 0 ∧ g(x̄) + g′(x̄)v + y = 0 ∧ h′(x̄)v = 0

⇒ g(x̄) + g′(x̄)v ∈ −Int(C) ∧ f′(x̄)v < 0 ∧ h′(x̄)v = 0.

This contradicts minimality of x̄ by Proposition 4.10.

Proposition 4.11. Suppose f, g, h are Fréchet differentiable at x̄ and that h′(x̄) is surjective and h′(x) continuous with respect to x. Then x̄ can only be a local minimizer of f on S ≡ {x ∈ X : h(x) = 0, g(x) ∈ −C} if there exists 0 ≠ (µ, y*, z*) ∈ R≥0 × C* × Z* such that

(i) µf′(x̄)v + ⟨y*, g′(x̄)v⟩ + ⟨z*, h′(x̄)v⟩ = 0 for all v ∈ X,

(ii) ⟨y*, g(x̄)⟩ = 0 (complementarity).

If moreover there is a v ∈ X such that h′(x̄)v = 0 and g(x̄) + g′(x̄)v ∈ −Int(C), then µ > 0 and thus w.l.o.g. µ = 1, which means we have KKT rather than Fritz John conditions satisfied.

Proof. Convexity and openness of M follow from the previous Lemma. By the Eidelheit theorem, M can be separated from {0} by some 0 ≠ (µ, y*, z*) such that for all v ∈ X, y ∈ Int(C), and α > 0,

µ(α + f′(x̄)v) + ⟨y*, g(x̄) + g′(x̄)v + y⟩ + ⟨z*, h′(x̄)v⟩ > 0.

By continuity, and because R≥0 and C are the closures of their interiors, we obtain for any α ≥ 0 and y ∈ C

µ(α + f′(x̄)v) + ⟨y*, g(x̄) + g′(x̄)v + y⟩ + ⟨z*, h′(x̄)v⟩ ≥ 0.

Take α = 1, v = 0, y = −g(x̄) ∈ C. Then we have

µ + ⟨y*, g(x̄)⟩ + ⟨y*, −g(x̄)⟩ = µ ≥ 0.

Taking α = 0, v = 0, y ∈ C gives

⟨y*, g(x̄) + y⟩ ≥ 0
⇒ ⟨y*, g(x̄) + λy⟩ ≥ 0 ⟺ ⟨y*, (1/λ)g(x̄) + y⟩ ≥ 0 for all λ > 0 ⟺ ⟨y*, y⟩ ≥ 0 ⇒ y* ∈ C*.

In particular ⟨y*, g(x̄)⟩ ≥ 0 for y = 0. Since g(x̄) ∈ −C we also have ⟨y*, g(x̄)⟩ = −⟨y*, −g(x̄)⟩ ≤ 0, hence ⟨y*, g(x̄)⟩ = 0 (complementarity condition). Thus for α = 0 we get

µf′(x̄)v + ⟨y*, g′(x̄)v⟩ + ⟨z*, h′(x̄)v⟩ ≥ 0,

which must hold for all v ∈ X and thus also for −v. Hence we have in fact equality as asserted in (i).

Case µ > 0: Under MFCQ there exists v such that h′(x̄)v = 0 and g(x̄) + g′(x̄)v ∈ −Int(C). For that v,

µf′(x̄)v + ⟨y*, g′(x̄)v⟩ + ⟨z*, h′(x̄)v⟩ = 0  with  ⟨y*, g′(x̄)v⟩ = ⟨y*, g(x̄) + g′(x̄)v⟩ < 0 whenever y* ≠ 0

⇒ µf′(x̄)v ≠ 0 ⇒ µ ≠ 0, and together with µ ≥ 0 this gives µ = 1 w.l.o.g. (If y* = 0, then µ = 0 would force z* = 0 by the surjectivity of h′(x̄), contradicting (µ, y*, z*) ≠ 0.)
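As a sanity check on (i), (ii) and the role of µ, the following sketch (all problem data our own hypothetical choice) solves a small inequality-constrained problem with SciPy's SLSQP, recovers y* from stationarity by least squares, and verifies sign and complementarity:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy problem:  min (x1-2)^2 + x2^2  s.t.  g(x) = x1 - 1 <= 0.
f      = lambda x: (x[0] - 2.0) ** 2 + x[1] ** 2
grad_f = lambda x: np.array([2.0 * (x[0] - 2.0), 2.0 * x[1]])
g      = lambda x: x[0] - 1.0
grad_g = np.array([1.0, 0.0])

# SciPy's "ineq" convention is fun(x) >= 0, hence the sign flip on g.
res = minimize(f, x0=np.zeros(2), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: -g(x)}])
x_bar = res.x                                       # ~(1, 0)

# Recover y* from stationarity  grad f(x) + y* grad g(x) = 0  (least squares).
y_star = np.linalg.lstsq(grad_g.reshape(-1, 1), -grad_f(x_bar), rcond=None)[0][0]

print("x_bar =", x_bar)                 # ~[1. 0.]
print("y*    =", y_star)                # ~2 >= 0: KKT with mu = 1, not just Fritz John
print("compl =", y_star * g(x_bar))     # ~0: complementarity (ii)
```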

Review of Constraint Qualifications

1) Slater condition: ∃ x̃ ∈ h⁻¹(0) such that g(x̃) ∈ −Int(C).

2) Mangasarian-Fromowitz constraint qualification (MFCQ) at x̄ ∈ S: ∃ x̃ such that h′(x̄)(x̃ − x̄) = 0 and g(x̄) + g′(x̄)(x̃ − x̄) ∈ −Int(C). This can be regarded as the Slater condition on the local linearization of S.

3) Kurcyusz-Zowe (Robinson) constraint qualification (KZCQ) at x̄ ∈ S: g′(x̄) ker(h′(x̄)) + Cone(C + g(x̄)) = Y.

Lemma. MFCQ implies KZCQ.

Proof. Pick any y ∈ Y. Then, with x̃ as in MFCQ, for some sufficiently large 0 < λ ∈ R we have

g(x̄) + g′(x̄)(x̃ − x̄) − (1/λ)y ∈ −C

⇒ y ∈ g′(x̄)[λ(x̃ − x̄)] + λg(x̄) + C ⊂ g′(x̄) ker(h′(x̄)) + Cone(g(x̄) + C).

Case µ > 0: Under KZCQ assume µ = 0. Pick an arbitrary y = g′(x̄)v + λg(x̄) + ỹ with v ∈ ker(h′(x̄)), λ ≥ 0, ỹ ∈ C. Then we have

⟨y*, y⟩ = ⟨y*, g′(x̄)v⟩ + λ⟨y*, g(x̄)⟩ + ⟨y*, ỹ⟩
        = −⟨z*, h′(x̄)v⟩ + 0 + ⟨y*, ỹ⟩ = ⟨y*, ỹ⟩ ≥ 0

⇒ ⟨y*, y⟩ ≥ 0 ∀ y ∈ Y ⇒ ⟨y*, y⟩ = 0 ⇒ y* = 0

⇒ h′(x̄)*z* = 0 ⇒ z* = 0, because ker(h′(x̄)*) = Range(h′(x̄))⊥ = {0}.

Thus (µ, y*, z*) = 0 ∈ R × C* × Z*, which contradicts the construction by Eidelheit separation.

Generalization to more general problems

min_{x∈S} f(x)  where  S ≡ {x ∈ Ŝ : g(x) ∈ −C, h(x) = 0},

where Ŝ is closed and convex with nonempty interior.

Kurcyusz-Zowe: (g′(x̄), h′(x̄)) Cone(Ŝ − x̄) + (Cone(g(x̄) + C) × {0}) = Y × Z.

The multiplier (µ, y*, z*) is as in Proposition 4.11 with µ ≠ 0 under KZCQ, but the adjoint equality becomes an inequality:

0 ≤ µf′(x̄)(x̂ − x̄) + ⟨y*, g′(x̄)(x̂ − x̄)⟩ + ⟨z*, h′(x̄)(x̂ − x̄)⟩  for x̂ ∈ Ŝ.

Remark: Obtaining sufficient optimality conditions requires some kind of convexity of f and g and linearity of h.

Definition. The map g : X → Y (B-spaces), with Y partially ordered by C, is called convex if

g((1−λ)x + λx̃) − (1−λ)g(x) − λg(x̃) ∈ −C.

Lemma. If g is convex and y* ∈ C*, then ϕ(x) ≡ ⟨y*, g(x)⟩ : X → R is convex in the classical (= scalar) sense.

Proof. ϕ((1−λ)x + λx̃) − (1−λ)ϕ(x) − λϕ(x̃) = ⟨y*, y⟩ ≤ 0 for some y ∈ −C.

Proposition 4.12. Suppose (x̄, y*, z*) ∈ X × C* × Z* satisfies the KKT conditions

h(x̄) = 0,  g(x̄) ∈ −C,
⟨y*, g(x̄)⟩ = 0,
f′(x̄) + g′(x̄)*y* + h′(x̄)*z* = 0.

Assume further that f and g are convex and h is linear. Then x̄ is a global minimizer of the problem

min_{x∈S} f(x)  with  S = {x ∈ X : g(x) ∈ −C, h(x) = 0}.

Proof. The Lagrangian

L(x) = f(x) + ⟨y*, g(x)⟩ + ⟨z*, h(x)⟩

is convex, Fréchet differentiable and has a stationary point at x̄. Hence, by Corollary 4.5, x̄ is its global minimizer. For any feasible x ∈ S we have

f(x) = L(x) − ⟨y*, g(x)⟩ − ⟨z*, h(x)⟩ = L(x) + ⟨y*, −g(x)⟩ ≥ L(x)

⇒ x̄ is also a global minimizer of f, since L(x̄) = f(x̄).

Corollary 4.13 (Lagrange multipliers as sensitivities). Under the assumptions of Proposition 4.12 on f, g and h, consider the following pair of optimization problems:

min f(x) : g(x) + y_i ∈ −C ∧ h(x) + z_i = 0,  for i = 0, 1, with y_i ∈ Y and z_i ∈ Z.

Let (x̄_i, y*_i, z*_i) ∈ X × C* × Z* be KKT points of these problems. Then we have

⟨y*₀, y₁ − y₀⟩ + ⟨z*₀, z₁ − z₀⟩ ≤ f(x̄₁) − f(x̄₀) ≤ ⟨y*₁, y₁ − y₀⟩ + ⟨z*₁, z₁ − z₀⟩.

Proof.

f(x̄₁) = f(x̄₁) + ⟨y*₁, g(x̄₁) + y₁⟩ + ⟨z*₁, h(x̄₁) + z₁⟩
      ≤ f(x̄₀) + ⟨y*₁, g(x̄₀) + y₁⟩ + ⟨z*₁, h(x̄₀) + z₁⟩

⇒ f(x̄₁) − f(x̄₀) ≤ ⟨y*₁, y₁ − y₀⟩ + ⟨z*₁, z₁ − z₀⟩.

Exchanging the indices, by symmetry,

f(x̄₀) − f(x̄₁) ≤ ⟨y*₀, y₀ − y₁⟩ + ⟨z*₀, z₀ − z₁⟩.

Changing signs yields the left inequality in the assertion.

Remark: By Cauchy-Schwarz we get

f(x̄₁) − f(x̄₀) ≤ ‖y*₁‖‖y₁ − y₀‖ + ‖z*₁‖‖z₁ − z₀‖,
f(x̄₀) − f(x̄₁) ≤ ‖y*₀‖‖y₁ − y₀‖ + ‖z*₀‖‖z₁ − z₀‖.

Corollary 4.14 (The value function). f̄(y, z) ≡ inf{f(x) : g(x) + y ∈ −C, h(x) + z = 0} is convex under the assumptions of Proposition 4.12, and at any y₀, z₀ with finite f̄(y₀, z₀) there is a subgradient (y*₀, z*₀) ∈ C* × Z* such that

f̄(y, z) − f̄(y₀, z₀) ≥ ⟨y*₀, y − y₀⟩ + ⟨z*₀, z − z₀⟩.

Proof.

f̄((1−λ)y₀ + λy₁, (1−λ)z₀ + λz₁) = inf{ f(x) : g(x) + (1−λ)y₀ + λy₁ ∈ −C, h(x) + (1−λ)z₀ + λz₁ = 0 }

  ≤ inf{ f((1−λ)x₀ + λx₁) : g(x_i) + y_i ∈ −C, h(x_i) + z_i = 0 for i = 0, 1 }

  ≤ inf{ (1−λ)f(x₀) + λf(x₁) : g(x_i) + y_i ∈ −C, h(x_i) + z_i = 0 for i = 0, 1 }
  = (1−λ)f̄(y₀, z₀) + λf̄(y₁, z₁).

This establishes convexity of f̄; hence subgradients exist. That (y*₀, z*₀) ∈ ∂f̄(y₀, z₀) follows from the left inequality in Corollary 4.13 with y = y₁, z = z₁ and f(x̄₁) = f̄(y, z).
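A one-dimensional numerical illustration of the sensitivity bracket in Corollary 4.13, with problem data of our own choosing: perturbing the constraint of a convex quadratic problem and comparing the change of the optimal value against the two multiplier bounds.

```python
# Sensitivity bracket of Corollary 4.13 on a hypothetical convex toy problem:
#   min x^2  s.t.  g(x) + y <= 0  with  g(x) = 1 - x,  i.e.  x >= 1 + y.
def solve(y):
    x_bar = 1.0 + y            # the constraint is active for y > -1
    y_star = 2.0 * x_bar       # stationarity: 2x + y*(-1) = 0, note g'(x) = -1
    return x_bar, y_star

y0, y1 = 0.0, 0.3
(x0, ys0), (x1, ys1) = solve(y0), solve(y1)
df = x1**2 - x0**2

# <y0*, y1 - y0>  <=  f(x_bar1) - f(x_bar0)  <=  <y1*, y1 - y0>
print(ys0 * (y1 - y0), "<=", df, "<=", ys1 * (y1 - y0))   # 0.6 <= 0.69 <= 0.78
```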

5 Duality Results

Consider now the “primal” problem

min{ f(x) : x ∈ S ≡ {x ∈ Ŝ : g(x) ∈ −C} },

where Ŝ is closed, convex and may incorporate linear equality constraints.

Definition. The dual objective ϕ : C* → R is given by

ϕ(y*) = inf_{x∈Ŝ} { f(x) + ⟨y*, g(x)⟩ }.

The set S* = {y* ∈ C* : ϕ(y*) > −∞} is called the feasible set of the dual problem.

Lemma (Weak duality). Without any convexity assumptions on f or g,

ϕ̄ ≡ sup_{y*∈S*} ϕ(y*) ≤ inf_{x∈S} f(x) = f̄.

Proof. For any y* ∈ C* and any x ∈ S we have ϕ(y*) ≤ f(x) + ⟨y*, g(x)⟩ ≤ f(x).

Remark: The difference f̄ − ϕ̄ ≥ 0 is called the duality gap. It can be shown to vanish under suitable conditions, usually involving some kind of convexity. (Even in semi-definite programming, convexity may not be enough.)
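For a convex instance (our own toy data) the gap closes; a brute-force check of the weak duality lemma:

```python
import numpy as np

# Hypothetical convex primal:  min x^2  s.t.  g(x) = 1 - x <= 0,  so f_bar = f(1) = 1.
# Dual objective: phi(y*) = inf_x { x^2 + y*(1 - x) } = y* - y*^2/4, attained at x = y*/2.
ystar = np.linspace(0.0, 4.0, 401)
phi = ystar - ystar**2 / 4.0

print("phi_bar =", phi.max())   # 1.0 at y* = 2: no duality gap here
print("f_bar   =", 1.0)         # weak duality phi_bar <= f_bar holds with equality
```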

Semi-Infinite Programming (SIP):

min c⊤x  such that  a(t)⊤x ≤ β(t) for all t ∈ [0, 1],  with a : [0, 1] → Rⁿ and β : [0, 1] → R.

Semi-Definite Programming (SDP):

min ⟨c, x⟩  such that  ⟨A_i, x⟩ ≤ β_i ∈ R,

where ⟨M, x⟩ = Trace(M⊤x), M = M⊤, and x = x⊤ ⪰ 0; here

x ⪰ 0 ⟺ t⊤x t ≥ 0 for all t ∈ Rⁿ with ‖t‖ = 1.

Definition. g : X → Y is called convex-like on Ŝ ⊂ X with respect to C ⊂ Y if g(Ŝ) + C is convex in Y.

Lemma. Any convex g is convex-like.

Proof.

For x_i ∈ Ŝ and y_i ∈ C,

(1−λ)(g(x₀) + y₀) + λ(g(x₁) + y₁)
  = (1−λ)y₀ + λy₁ + (1−λ)g(x₀) + λg(x₁) − g((1−λ)x₀ + λx₁) + g((1−λ)x₀ + λx₁)
  = g((1−λ)x₀ + λx₁) + ỹ,  with ỹ ∈ C
  ∈ g(Ŝ) + C.

Examples concerning the duality gap

(i) min f(x) = −x²  s.t.  x ∈ Ŝ ≡ [0, 2];  g(x) = (x − 1, 1 − x)⊤ ∈ −C = −R²₊, i.e. S = {1}.

Optimal solution f̄ = f(1) = −1.

ϕ(y*₁, y*₂) = inf_{0≤x≤2} { −x² + y*₁(x − 1) + y*₂(1 − x) }
           = inf_{0≤x≤2} { −x² + (y*₁ − y*₂)(x − 1) }
           = min{ −(y*₁ − y*₂), −4 + y*₁ − y*₂ }   (the concave infimand attains its minimum at x = 0 or x = 2)

sup_{y*₁≥0, y*₂≥0} ϕ(y*₁, y*₂) = ϕ(2, 0) = −2 = ϕ̄ < −1 = f̄

⇒ duality gap = 1 = f̄ − ϕ̄. Note that complementarity ⟨y*, g(x̄)⟩ = y*₁(1 − 1) + y*₂(1 − 1) = 0 holds nonetheless.

(ii) min −x²  s.t.  x − 1 ≤ 0, −x ≤ 0, Ŝ = R ⇒ S = [0, 1],

f̄ = f(x̄) = f(1) = −1,
ϕ(y*₁, y*₂) = inf_{x∈R} { −x² + y*₁(x − 1) + y*₂(−x) } = −∞,
ϕ̄ = −∞ < f̄ = −1.

Check convex-likeness:

Writing the value map in terms of g₂(x) = −x, we have f(x) = −x² = −g₂² and g₁(x) = x − 1 = −g₂ − 1, so the set

{ (f(x) + α, g₁(x) + β, g₂(x) + γ) : x ∈ R, α ≥ 0, β ≥ 0, γ ≥ 0 }

lies above the concave parabola f = −g₂² and is therefore not convex: (f, g) fails to be convex-like on Ŝ = R with respect to R₊ × C.

Figure 3: Illustration of the failure of convex-likeness in Example (ii)
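The gap in Example (i) can also be confirmed numerically by brute force over a grid (a rough sketch, using the same problem data as above):

```python
import numpy as np

# Example (i):  min -x^2 on [0,2]  s.t.  x - 1 <= 0  and  1 - x <= 0,  f_bar = -1.
x = np.linspace(0.0, 2.0, 2001)

def phi(y1, y2):               # dual objective: inner infimum over x in [0, 2]
    return np.min(-x**2 + y1 * (x - 1.0) + y2 * (1.0 - x))

grid = np.linspace(0.0, 6.0, 121)
phi_bar = max(phi(a, b) for a in grid for b in grid)
print("phi_bar =", phi_bar)            # ~ -2.0
print("gap     =", -1.0 - phi_bar)     # ~  1.0 = f_bar - phi_bar, as derived above
```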

Proposition 5.1 (Strong duality). Suppose that (i) (f, g) is convex-like on Ŝ with respect to R₊ × C; (ii) f, g are continuously Fréchet differentiable on a neighborhood of Ŝ; (iii) g(x̂) ∈ −Int(C) for some x̂ ∈ Ŝ ∩ S (Slater condition); and (iv) the primal problem has a minimizer x̄ ∈ X. Then the dual objective

ϕ(y*) = inf_{x∈Ŝ} { f(x) + ⟨y*, g(x)⟩ } : C* → R

has a maximizer ȳ* and there is no duality gap, i.e. f(x̄) = ϕ(ȳ*).

Proof. Look at

M = { (f(x) + α, g(x) + y) ∈ R × Y : x ∈ Ŝ, (α, y) ∈ R₊ × C }.

Convex-likeness ensures exactly that M is convex, and Int(M) ≠ ∅ since Int(R₊ × C) ≠ ∅. By solvability of the primal problem we have (f(x̄), 0) ∉ M. Using Eidelheit separation one finds that there exists 0 ≠ (µ, y*) ∈ R × Y* such that

µ(f(x) + α) + ⟨y*, g(x) + y⟩ ≥ µf(x̄).

Using α = 0 we get

µ(f(x) − f(x̄)) + ⟨y*, g(x) + y⟩ ≥ 0.

Positive scaling of y, keeping everything else fixed, yields ⟨y*, y⟩ ≥ 0 ⇒ y* ∈ C*.

Using y = 0 ∧ x = x̂ we get

µ(f(x̂) − f(x̄)) + ⟨y*, g(x̂)⟩ ≥ 0.

If µ = 0 this would give ⟨y*, g(x̂)⟩ ≥ 0, contradicting g(x̂) ∈ −Int(C) and 0 ≠ y* ∈ C*; hence µ > 0 and w.l.o.g. µ = 1. Using y = 0 ∧ x = x̄ gives

f(x̄) − f(x̄) + ⟨y*, g(x̄)⟩ = ⟨y*, g(x̄)⟩ ≥ 0 ≥ ⟨y*, g(x̄)⟩  ⇒  ⟨y*, g(x̄)⟩ = 0.

Hence for all x ∈ Ŝ,

f(x) + ⟨y*, g(x)⟩ ≥ f(x̄) + ⟨y*, g(x̄)⟩ = f(x̄)

⇒ sup_{y*∈C*} ϕ(y*) ≥ f(x̄)  ⇒  ϕ̄ = sup_{y*∈C*} ϕ(y*) = f̄, attained at ȳ* = y*.

Example:

min x  s.t.  sin x ≤ 0,  Ŝ = [a, ∞),  a ∈ [−π/2, π/2].

The dual objective is

ϕ(y*) = inf_{a≤x} (x + y* sin x) ≥ a − y* > −∞  for y* ≥ 0.

We find the minimizer at the lower boundary or at one of the stationary points of L(x, y*) = x + y* sin x with respect to x. The stationary points of L(x, y*) are those where 0 = ∂ₓL(x, y*) = 1 + y* cos x. If y* ∈ [0, 1), there are no solutions to this condition. If y* ≥ 1, then x ∈ {±arccos(−1/y*) + 2πk : k ∈ Z} are stationary points and

L(x, y*) = 2πk ± arccos(−1/y*) + y* sin(±arccos(−1/y*)) = 2πk ± (arccos(−1/y*) + √((y*)² − 1)).

Since x ≥ a ≥ −π/2, the stationary interior points with the smallest value of L(x, y*) are

x₊ = arccos(−1/y*)         with  L(x₊, y*) = arccos(−1/y*) + √((y*)² − 1),
x₋ = −arccos(−1/y*) + 2π   with  L(x₋, y*) = 2π − arccos(−1/y*) − √((y*)² − 1).

One finds that L(x₋, y*) ≤ L(x₊, y*), so in summary

ϕ(y*) = a + y* sin a                                             for y* ∈ [0, 1),
ϕ(y*) = min{ a + y* sin a,  2π − arccos(−1/y*) − √((y*)² − 1) }  for y* ≥ 1.

Case a = −π/2 ⇒ f̄ = −π/2 = x̄. Here Proposition 5.1 applies due to convex-likeness. We get

ϕ(y*) = −π/2 − y*                                                 for 0 ≤ y* < 1,
ϕ(y*) = min{ −π/2 − y*,  2π − arccos(−1/y*) − √((y*)² − 1) } = −π/2 − y*  for y* ≥ 1,

so ϕ(y*) = −π/2 − y* for all y* ≥ 0

⇒ ȳ* = 0 ⇒ ϕ̄ = ϕ(ȳ*) = −π/2 = f̄  (i.e. no duality gap).

Case a = π/2 ⇒ f̄ = π = x̄. Here Proposition 5.1 does not apply, due to lack of convex-likeness:

ϕ(y*) = π/2 + y*                                                 for y* ∈ [0, 1),
ϕ(y*) = min{ π/2 + y*,  2π − arccos(−1/y*) − √((y*)² − 1) }      for y* ≥ 1,

from which one concludes by numerical methods that ȳ* ≈ 1.38005 with ϕ̄ = ϕ(ȳ*) ≈ 2.95085.

⇒ ϕ̄ ≈ 2.95085 < 3.14159 ≈ π = f̄,  i.e. a positive duality gap.
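The numbers ȳ* ≈ 1.38005 and ϕ̄ ≈ 2.95085 can be reproduced by a crude grid search (a sketch that truncates the domain [a, ∞), which is harmless for moderate y*):

```python
import numpy as np

a = np.pi / 2
x = np.linspace(a, a + 6 * np.pi, 60001)   # truncation of [a, inf)

def phi(ystar):
    return np.min(x + ystar * np.sin(x))   # inner infimum on the grid

ys = np.linspace(0.0, 3.0, 3001)
vals = np.array([phi(t) for t in ys])
k = vals.argmax()
print("y_bar* ~", ys[k])                   # ~1.380
print("phi_bar ~", vals[k])                # ~2.95085 < pi = f_bar
```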

Consequence of strong duality

The Lagrangian function

L(x, y*) = f(x) + ⟨y*, g(x)⟩ : Ŝ × C* → R

has a saddle point at (x̄, ȳ*), in that

L(x̄, y*) ≤ L(x̄, ȳ*) ≤ L(x, ȳ*)  for x ∈ Ŝ and y* ∈ C*.

Question: Is the converse true, i.e. does a saddle point imply that there is no duality gap?
Answer: Yes, as we show below.

Lemma. For L : Ŝ × C* → R ∪ {−∞} ∪ {∞},

ϕ̄ ≡ sup_{y*∈C*} inf_{x∈Ŝ} L(x, y*) ≤ inf_{x∈Ŝ} sup_{y*∈C*} L(x, y*) ≡ f̄

without any restriction beyond C* ≠ ∅ and Ŝ ≠ ∅.

Proof. If ϕ̄ = −∞ the assertion is trivial. Otherwise there exists, for any ε > 0, a y*_ε such that

ϕ̄ − ε ≤ inf_{x∈Ŝ} L(x, y*_ε) ≤ ϕ̄
⇒ ϕ̄ − ε ≤ L(x, y*_ε) ≤ sup_{y*∈C*} L(x, y*)  for all x ∈ Ŝ
⇒ ϕ̄ − ε ≤ inf_{x∈Ŝ} sup_{y*∈C*} L(x, y*) = f̄.

Since ε may be chosen arbitrarily small, we get ϕ̄ ≤ f̄.

Proposition 5.2. If there is a saddle point (x̄, ȳ*) such that

L(x̄, y*) ≤ L(x̄, ȳ*) ≤ L(x, ȳ*)  for all x ∈ Ŝ, y* ∈ C*,

then we have no duality gap, i.e. ϕ̄ = f̄ with ϕ̄, f̄ as defined above.

Proof.

L(x̄, ȳ*) = inf_{x∈Ŝ} L(x, ȳ*) ≤ sup_{y*∈C*} inf_{x∈Ŝ} L(x, y*)
         ≤ inf_{x∈Ŝ} sup_{y*∈C*} L(x, y*)
         ≤ sup_{y*∈C*} L(x̄, y*) ≤ L(x̄, ȳ*).

Thus equality holds throughout.

Example (no duality gap, but also no saddle point):
Take f(x) = e⁻ˣ, Ŝ ≡ R, g(x) = −x ≤ 0, y* ≥ 0, so that

L(x, y*) = e⁻ˣ − y*x.

f̄ = inf_{x∈R} sup_{y*≥0} (e⁻ˣ − y*x) = inf_{x∈R} { +∞ for x < 0;  e⁻ˣ for x ≥ 0 } = 0,

ϕ̄ = sup_{y*≥0} inf_{x∈R} (e⁻ˣ − y*x) = sup_{y*≥0} { −∞ if y* > 0;  0 if y* = 0 } = 0.

But the primal problem has no optimal solution x̄.

Proposition 5.3. If for some (x̄, ȳ*) ∈ Ŝ × C*

inf_{x∈Ŝ} L(x, ȳ*) = max_{y*∈C*} inf_{x∈Ŝ} L(x, y*) = min_{x∈Ŝ} sup_{y*∈C*} L(x, y*) = sup_{y*∈C*} L(x̄, y*),

then (x̄, ȳ*) is a saddle point of L(x, y*).

Proof. From

inf_{x∈Ŝ} L(x, ȳ*) = sup_{y*∈C*} L(x̄, y*)

we obtain, for all x ∈ Ŝ and y* ∈ C*,

L(x̄, y*) ≤ sup_{y*∈C*} L(x̄, y*) = inf_{x∈Ŝ} L(x, ȳ*) ≤ L(x, ȳ*);

taking y* = ȳ* on the left and x = x̄ on the right sandwiches L(x̄, ȳ*) between the two sides, which yields the saddle point inequalities.

Proposition 5.4. Under the assumptions of Proposition 5.1, x̄ is a minimizer of f(x) and ȳ* is a maximizer of ϕ(y*) if and only if (x̄, ȳ*) is a saddle point of the Lagrangian

L(x, y*) = f(x) + ⟨y*, g(x)⟩.

6 Remarks on the Two-Norm Discrepancy

Definition. f : U ⊂ X → R is called twice continuously Fréchet differentiable on an open domain U if the gradient map

f′ : U ⊂ X → X*

has a Fréchet derivative f″(x) which is continuous with respect to x. Note that f″ : U → L(X, X*) and

f″(x)[u, v] = f″(x)[v, u] ∈ R  is bilinear in u, v ∈ X.

Lemma. Under the above assumptions on f, at x̄ ∈ U,

f(x̄ + v) = f(x̄) + f′(x̄)v + ½ f″(x̄)[v, v] + r(x̄, v)  with  lim_{‖v‖→0} r(x̄, v)/‖v‖² = 0,

i.e. r(x̄, v) = o(‖v‖²).

Proposition 6.1. If f is twice continuously Fréchet differentiable on U ∋ x̄ with f′(x̄) = 0, and for some δ > 0

f″(x̄)[v, v] ≥ δ‖v‖²  for v ∈ X,

then x̄ is a local minimizer of f and there exists an ε > 0 such that

f(x) ≥ f(x̄) + (δ/3)‖x − x̄‖²  if ‖x − x̄‖ ≤ ε.

Proof.

f(x) − f(x̄) = ½ f″(x̄)[x − x̄, x − x̄] + r(x̄, x − x̄)
            ≥ (δ/2)‖x − x̄‖² − o(‖x − x̄‖²)
            ≥ (δ/3)‖x − x̄‖²  for ‖x − x̄‖ ≤ ε with ε > 0 small enough.

(Counter)example:

f(x) = −∫₀¹ cos(x(ω)) dω,  x ∈ X = L²[0, 1],
f′(x)v = ∫₀¹ sin(x(ω)) v(ω) dω.

A candidate minimizer is x̄(ω) ≡ 0, where

f(x̄) = −1 ≤ f(x)  for x ∈ X,
f′(x̄)v = ∫₀¹ sin(0) v(ω) dω = 0,
f″(x̄)[v, v] = ∫₀¹ cos(0) v(ω)² dω = ‖v‖²_{L²}.

For δ = 1, Proposition 6.1 seems to imply that x̄ = 0 is the only minimizer in some neighborhood of x̄ = 0. But for any ε > 0,

x_ε(ω) = 2π for 0 ≤ ω ≤ ε,  x_ε(ω) = 0 for ε < ω ≤ 1,

also has the minimal value f(x_ε) = −1, and

‖x_ε − x̄‖² = ∫₀¹ x_ε(ω)² dω = ∫₀^ε (2π)² dω = 4π²ε

becomes arbitrarily small as ε → 0. Proposition 6.1 does not apply because the gradient

f′(x)v = ∫₀¹ sin(x(ω)) v(ω) dω

is nowhere Fréchet differentiable on L²[0, 1].

Possible remedies:
It can be shown that f ∈ C² on the space X = L^∞[0, 1], i.e. the space of all essentially bounded functions on [0, 1]. But with respect to the norm ‖·‖_∞ we do not have f″(0)[v, v] ≥ δ‖v‖²_∞ for any δ > 0. So second order sufficiency works neither with respect to ‖·‖₂ nor with respect to ‖·‖_∞ alone. However, one can show the mixed estimate

f(x) ≥ f(x̄) + (1/3)‖x − x̄‖²₂  if ‖x − x̄‖_∞ ≤ ε,

whose use of two different norms is precisely the two-norm discrepancy of the section title.
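A quick numerical rendering of the counterexample (pure illustration): the steps x_ε keep f at its minimum value −1 while their L² distance to x̄ = 0 shrinks, but their sup-norm distance stays 2π — exactly the situation the mixed estimate above is designed for.

```python
import numpy as np

# f(x) = -integral of cos(x(w)) over [0,1], discretized on a uniform grid.
w = np.linspace(0.0, 1.0, 200001)

def f(x):
    return float(np.mean(-np.cos(x)))          # Riemann approximation of the integral

for eps in [0.1, 0.01, 0.001]:
    x_eps = np.where(w <= eps, 2.0 * np.pi, 0.0)
    l2 = float(np.sqrt(np.mean(x_eps**2)))     # ~ 2*pi*sqrt(eps)
    print(f"eps={eps}: f={f(x_eps):+.4f}  L2={l2:.4f}  sup={x_eps.max():.4f}")
# f stays at -1.0000 while the L2 norm vanishes and the sup-norm stays 2*pi.
```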

Appendix: Examples concerning existence of minimizers

Example 1: Consider a sequence of functions {x_j} as shown in Figure 4.

Figure 4: Sequence of functions x_j

The x_j converge weakly to x* = 0, but ‖x_j‖_{C[0,1]} = 1 for all j. The weak limit x* = 0 is good enough, in that lim f(x_j) ≥ f(x*) for suitable (weakly l.s.c.) f : C[0, 1] → R.

Example 2 (for an existence proof via Arzelà-Ascoli): We consider an open pit mining problem as illustrated in Figure 5.

Figure 5: Open pit mine. The mountain surface is z = x₀(ω) (with z = 0 at the top), the excavation profile is z = x(ω) ≥ x₀(ω), and its slope is limited by the material.

Program:

• Define the set of feasible profiles x ∈ S ⊆ C(Ω).
• Show that S is closed and has a uniform Lipschitz constant (⇒ sequentially compact in C(Ω)).
• Define the objective G(x) ≡ gain, and the capacity constraint E(x) ≤ Ē.
• Show that both G and E are Lipschitz continuous on C(Ω).
• Conclude that an optimal x* exists.

Conditions on feasible profiles:

(i) z = x(ω) ≥ x₀(ω) for ω ∈ Ω and x(ω) = x₀(ω) for ω ∈ ∂Ω
⟺ (x − x₀) ∈ C₀(Ω) ∩ {v ≥ 0},
where C₀(Ω) ≡ continuous functions satisfying homogeneous boundary conditions and {v ≥ 0} ≡ nonnegative orthant.

(ii) ∀ω, ω̃ ∈ Ω: |x(ω̃) − x(ω)| ≤ ϕ(ω, x(ω)) ‖ω̃ − ω‖ + o(‖ω̃ − ω‖), for some given function ϕ : Ω × Z → R (depending on the position in space) with Z = [z̲, z̄]; ‖·‖ = ‖·‖₂ on Ω means orientation invariance.

S = {x ∈ x₀ + C₀(Ω) satisfying conditions (i) and (ii)}.

Question: Under which conditions on ϕ is S closed?
Answer 0: Certainly if ϕ is continuous, but that is too restrictive.
Example: Consider

ϕ(ω, z) = ϕ_α(ω) = 1 if ω > 0,  0 if ω < 0,  α if ω = 0.

⇒ All x_ε for ε > 0 are feasible, but x₀ = lim_{ε→0} x_ε is only feasible if α ≥ 1 ⟺ ϕ_α(ω) is upper semi-continuous (u.s.c.).

Definition. f : X → R is called upper or lower semi-continuous (u.s.c. or l.s.c.) if

lim_{j→∞} x_j = x*  ⇒  f(x*) ≥ limsup_{j→∞} f(x_j)  or  f(x*) ≤ liminf_{j→∞} f(x_j),  respectively.

Theorem. If an infimizing sequence has a limit x* and f is l.s.c., then f(x*) = f*.

Proposition. If ϕ : Ω × Z → R is upper semi-continuous, then S is closed and uniformly Lipschitz continuous, i.e. for some L > 0 and all x ∈ S we have |x(ω̃) − x(ω)| ≤ L‖ω̃ − ω‖ for ω, ω̃ ∈ Ω (i.e. Arzelà-Ascoli is applicable).

(iii) See (∗) in the proof below.

Proof. Consider x ∈ S̄ = closure of S in C(Ω). Due to the u.s.c. of ϕ there exists, for every ε > 0, a δ > 0 such that for all ω, ω̃ ∈ Ω and y ∈ S with ‖ω̃ − ω‖ < δ and ‖y − x‖_∞ < δ,

ϕ(ω̃, y(ω̃)) ≤ ϕ(ω, x(ω)) + ε/4.

Now we prove by contradiction that for all ω̂, ω̃ ∈ B_δ(ω) ⊂ Ω (ω̂ ≠ ω̃) and y ∈ B_δ(x) ⊂ C(Ω),

|y(ω̃) − y(ω̂)| / ‖ω̃ − ω̂‖ ≤ ϕ(ω, x(ω)) + ε/2.   (∗)

Inserting the midpoint ω̃ + Δω of ω̃ and ω̂, so that ‖Δω‖ = ‖ω̂ − ω̃‖/2, we estimate the difference quotient in (∗) by

|y(ω̂) − y(ω̃)| / ‖ω̂ − ω̃‖ = |y(ω̂) − y(ω̃ + Δω) + y(ω̃ + Δω) − y(ω̃)| / (2‖Δω‖)
  ≤ (|y(ω̂) − y(ω̃ + Δω)| + |y(ω̃ + Δω) − y(ω̃)|) / (‖Δω‖ + ‖Δω‖)   (triangle inequality)
  ≤ max{ |y(ω̂) − y(ω̃ + Δω)| / ‖Δω‖,  |y(ω̃ + Δω) − y(ω̃)| / ‖Δω‖ }.

Hence (∗) can only be violated by (ω̃, ω̂) if it is also violated by (ω̃, ω̃ + Δω) or by (ω̃ + Δω, ω̂) = (ω̂ − Δω, ω̂). Continuing the proof by contradiction, we generate sequences (ω̃_j, ω̂_j) by halving Δω, with ‖ω̂_j − ω̃_j‖ = ‖ω̂ − ω̃‖/2^j and each pair violating (∗). At the common limit ω* = lim_{j→∞} ω̂_j = lim_{j→∞} ω̃_j we have the inequalities

ϕ(ω*, y(ω*)) ≤ ϕ(ω, x(ω)) + ε/4,
ϕ(ω, x(ω)) + ε/2 < |y(ω̂_j) − y(ω̃_j)| / ‖ω̂_j − ω̃_j‖
  ≤ (|y(ω̂_j) − y(ω*)| + |y(ω̃_j) − y(ω*)|) / (‖ω̂_j − ω*‖ + ‖ω̃_j − ω*‖)
  ≤ max{ |y(ω̂_j) − y(ω*)| / ‖ω̂_j − ω*‖,  |y(ω̃_j) − y(ω*)| / ‖ω̃_j − ω*‖ }
  ≤ ϕ(ω*, y(ω*)) + o(1)  as j → ∞,

so ε/2 ≤ ε/4, a contradiction since we started with ε > 0.

(Return to x ∈ S̄.) For all ε > 0 and δ > 0 there exists y ∈ S such that ‖y − x‖_∞ ≤ ‖ω̃ − ω‖ ε/4 and ‖ω̃ − ω‖ < δ. Then

|x(ω̃) − x(ω)| / ‖ω̃ − ω‖ ≤ |y(ω̃) − y(ω)| / ‖ω̃ − ω‖ + 2‖y − x‖_∞ / ‖ω̃ − ω‖
  ≤ ϕ(ω, x(ω)) + ε/2 + 2‖ω̃ − ω‖(ε/4) / ‖ω̃ − ω‖ ≤ ϕ(ω, x(ω)) + ε.

Then as ε → 0 with δ → 0 we get

|x(ω̃) − x(ω)| ≤ ϕ(ω, x(ω)) ‖ω̃ − ω‖ + o(‖ω̃ − ω‖).

Hence S is closed in C(Ω). Uniform Lipschitz continuity follows similarly.

Remark. Given any function ϕ ∈ L^∞(Ω × Z) we can define its upper semi-continuous envelope by setting ϕ̄(ω, z) = limsup_{(ω̃,z̃)→(ω,z)} ϕ(ω̃, z̃).

(iv) E(x) = ∫_Ω ∫_{x₀(ω)}^{x(ω)} e(ω, z) dz dω,  e ≡ excavation effort density;

(v) G(x) = ∫_Ω ∫_{x₀(ω)}^{x(ω)} g(ω, z) dz dω,  g ≡ gain density (net profit),

where e, g ∈ L^∞(Ω × Z) and e(ω, z) ≥ 0.

Thus we have the static optimization problem max G(x) s.t. x ∈ S and E(x) ≤ Ē. Existence of a solution x* follows from the compactness of {x ∈ S : E(x) ≤ Ē} and the continuity of G with respect to ‖·‖_∞. In fact E and G are uniformly Lipschitz continuous:

|E(x̃) − E(x)| = |∫_Ω ∫_{x(ω)}^{x̃(ω)} e(ω, z) dz dω|
  ≤ ∫_Ω |x̃(ω) − x(ω)| sup_{z∈Z} |e(ω, z)| dω
  ≤ |Ω| ‖x̃ − x‖_∞ ‖e‖_∞,

and similarly |G(x̃) − G(x)| ≤ |Ω| ‖x̃ − x‖_∞ ‖g‖_∞.

Observation: Neither E nor G is affine, and even assuming e to be concave with respect to z, thus making E(x) convex, is unrealistic, since multiple optima are typical.
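To close, a crude discretized sketch of the open pit problem itself; grid size, densities, slope bound and capacity Ē below are all invented for illustration. With a depth-independent gain density the objective is linear in the profile, and SLSQP handles the slope and capacity constraints:

```python
import numpy as np
from scipy.optimize import minimize

# Discretized open pit (hypothetical data): maximize G(x) s.t. E(x) <= E_bar,
# x >= x0 = 0, fixed boundary x = x0 at both ends, and slope |x[i+1]-x[i]| <= L*h.
n, L_slope, E_bar = 50, 1.0, 0.15
w = np.linspace(0.0, 1.0, n)
h = w[1] - w[0]
gain   = lambda x: float(np.sum((2.0 - w) * x) * h)   # G with g(w, z) = 2 - w
effort = lambda x: float(np.sum(x) * h)               # E with e = 1

cons = [{"type": "ineq", "fun": lambda x: E_bar - effort(x)}]
for i in range(n - 1):                                # slope bounds, both directions
    cons.append({"type": "ineq", "fun": lambda x, i=i: L_slope * h - (x[i+1] - x[i])})
    cons.append({"type": "ineq", "fun": lambda x, i=i: L_slope * h + (x[i+1] - x[i])})

bounds = [(0.0, None)] * n
bounds[0] = bounds[-1] = (0.0, 0.0)                   # profile pinned at the boundary

res = minimize(lambda x: -gain(x), np.zeros(n), method="SLSQP",
               bounds=bounds, constraints=cons)
print("gain:", gain(res.x), " effort:", effort(res.x))
# The optimizer digs deepest where the gain density 2 - w is largest, up to capacity.
```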
