
2 Optimality conditions

Elementary remark: consider $\min_{[a,b]} J$ with $[a,b] \subset \mathbb{R}$. If $x_0$ is a local minimum of $J$, then (necessary conditions):

• $J'(x_0) \ge 0$ if $x_0 = a$,

• $J'(x_0) = 0$ if $x_0 \in ]a, b[$,

• $J'(x_0) \le 0$ if $x_0 = b$.

Indeed, if $x_0 \in [a, b[$, we can consider $x = x_0 + h$ for some $h > 0$ small enough. Then
$$J(x) = J(x_0) + hJ'(x_0) + o(h) \ge J(x_0) \quad \Rightarrow \quad J'(x_0) \ge 0.$$

If $x_0 \in ]a, b]$, take $x = x_0 - h$ and then $J'(x_0) \le 0$. Note that if $x_0 \in ]a, b[$,
$$J(x_0) + \frac{h^2}{2} J''(x_0) + o(h^2) \ge J(x_0),$$
and then $J''(x_0) \ge 0$. One must take the constraints ($x \in [a, b]$) into account in order to test optimality: only admissible directions can be used.
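A minimal numerical sketch of the boundary condition (the function $J(x) = (x - 0.3)^2$ and the interval $[1, 2]$ are illustrative assumptions, not taken from the notes): the minimum over $[1, 2]$ is attained at the left endpoint, where a one-sided finite difference gives $J' \ge 0$.

```python
import numpy as np

# Hypothetical example: J(x) = (x - 0.3)^2 restricted to [a, b] = [1, 2].
# The constrained minimum is at x0 = a = 1, and the necessary condition
# at a left endpoint is J'(x0) >= 0 (only x0 + h, h > 0, is admissible).
J = lambda x: (x - 0.3) ** 2
a, b = 1.0, 2.0

xs = np.linspace(a, b, 10001)
x0 = xs[np.argmin(J(xs))]            # numerical minimizer on a grid of [a, b]

h = 1e-6
dJ = (J(x0 + h) - J(x0)) / h         # one-sided difference along the admissible direction
print(x0, dJ)                        # x0 ~ 1.0 and dJ ~ 1.4 >= 0
```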

2.1 Fréchet and Gâteaux differentiability

For clarity, we may denote by $dJ$ the Fréchet derivative, and by $J'$ the Gâteaux derivative.

Definition 2.1 $J : V \to H$ has a directional derivative at point $u \in V$ along the direction $v \in V$ if
$$\lim_{\theta \to 0^+} \frac{J(u + \theta v) - J(u)}{\theta}$$
exists, and we denote this limit by $J'(u; v)$.

Example: $V = \mathbb{R}^n$ and $H = \mathbb{R}$, partial derivatives of $J$:
$$\frac{J(u + \theta e_i) - J(u)}{\theta} \to \frac{\partial J}{\partial x_i}(u).$$

Definition 2.2 $J : V \to H$ is Gâteaux differentiable at point $u$ if

• $J'(u; v)$ exists, $\forall v \in V$,

• and $v \mapsto J'(u; v)$ is a continuous linear map:
$$J'(u; v) = (J'(u), v), \qquad J'(u) \in L(V, H),$$
and we denote by $J'(u)$ the Gâteaux derivative of $J$ at $u$.
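A short numerical sketch of the directional derivative (the function $J(u) = \|u\|^2$ and the chosen vectors are illustrative assumptions): a one-sided difference quotient is compared with the exact Gâteaux derivative $(J'(u), v) = 2\langle u, v\rangle$.

```python
import numpy as np

# Assumed example: J(u) = ||u||^2 on R^n, whose Gateaux derivative satisfies
# (J'(u), v) = 2 <u, v>. Approximate J'(u; v) by a one-sided difference quotient.
def directional_derivative(J, u, v, theta=1e-7):
    return (J(u + theta * v) - J(u)) / theta

J = lambda u: np.dot(u, u)
u = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.1, -1.0])

print(directional_derivative(J, u, v))   # ~ 2 <u, v>
print(2 * np.dot(u, v))                  # exact value
```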

Definition 2.3 $J : V \to H$ is Fréchet differentiable at point $u$ if there exists $J'(u) \in L(V, H)$ such that
$$J(u + v) = J(u) + (J'(u), v) + \|v\|\varepsilon(v),$$
where $\varepsilon(v) \to 0$ when $v \to 0$.

Proposition 2.1 If $J$ is Fréchet differentiable at $u$, then $J$ is Gâteaux differentiable at $u$, and the two derivatives are equal.

Proof
$J(u + \theta v) = J(u) + (J'(u), \theta v) + \|\theta v\|\varepsilon(\theta v)$, and then, for $\theta > 0$,
$$\frac{J(u + \theta v) - J(u)}{\theta} = (J'(u), v) + \|v\|\varepsilon(\theta v),$$
and $\varepsilon(\theta v) \to 0$ when $\theta \to 0$. Hence the directional derivative exists in every direction, and $J'(u)$ is the Gâteaux derivative. □

Remark: the converse is not true. Consider the following function in $\mathbb{R}^2$: $J(x, y) = 1$ if $y = x^2$ and $x \ne 0$; $J(x, y) = 0$ otherwise.

This function is Gâteaux differentiable at $0$ (with $J'(0) = 0$), but it is not continuous at $0$, and hence cannot be Fréchet differentiable at $0$.

Finite-dimensional case: $V = \mathbb{R}^n$ and $H = \mathbb{R}$. If all partial derivatives of $J$ exist at $u$, then
$$(J'(u), v) = \sum_{i=1}^{n} \frac{\partial J}{\partial x_i}(u)\, v_i = \langle \nabla J(u), v \rangle.$$

Example 1: $J(v) = Av$ where $A \in L(V, H)$. Then $J'(u) = A$, $\forall u$.

Example 2: $J(v) = \frac{1}{2}a(v, v) - L(v)$, where $a$ is a continuous symmetric bilinear form, and $L$ is a continuous linear form. Then
$$J(u + v) = J(u) + (J'(u), v) + \frac{1}{2}a(v, v).$$
As $a$ is continuous, $a(v, v) \le M\|v\|^2$, and then
$$(J'(u), v) = a(u, v) - L(v).$$
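A brief numerical sketch of Example 2 in $\mathbb{R}^n$ (the particular $a$, $L$ and the random data are illustrative assumptions): with $a(u, v) = u^{T}Av$, $A$ symmetric, and $L(v) = b^{T}v$, a finite difference of $J$ is compared with $a(u, v) - L(v)$.

```python
import numpy as np

# Assumed finite-dimensional example: a(u, v) = u^T A v with A symmetric,
# L(v) = b^T v, so J(v) = 0.5 v^T A v - b^T v and (J'(u), v) = a(u, v) - L(v).
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M + M.T                               # symmetric matrix
b = rng.standard_normal(n)

J = lambda v: 0.5 * v @ A @ v - b @ v
u, v = rng.standard_normal(n), rng.standard_normal(n)

theta = 1e-7
fd = (J(u + theta * v) - J(u)) / theta    # directional derivative by finite difference
exact = u @ A @ v - b @ v                 # a(u, v) - L(v)
print(fd, exact)                          # agree up to O(theta)
```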

Second (Gâteaux) derivative: if
$$\frac{J'(u + \theta v; w) - J'(u; w)}{\theta}$$
has a finite limit when $\theta \to 0^+$, $\forall u, v, w \in V$, then we denote this limit by $J''(u; v, w)$; it is the second directional derivative at point $u$ in the directions $v$ and $w$. If $(v, w) \mapsto J''(u; v, w)$ is a continuous bilinear form, then $J''(u)$ is called the second derivative of $J$ at $u$, or Hessian of $J$.

Finite increments and Taylor formula:

Proposition 2.2 If $J : V \to \mathbb{R}$ is Gâteaux differentiable at every point $u + \theta v$, $\theta \in [0, 1]$, then $\exists \theta_0 \in ]0, 1[$ such that
$$J(u + v) = J(u) + (J'(u + \theta_0 v), v).$$

Proof
Let $f(\theta) = J(u + \theta v)$, $f : \mathbb{R} \to \mathbb{R}$. Then
$$f'(\theta) = \lim_{\varepsilon \to 0} \frac{f(\theta + \varepsilon) - f(\theta)}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{J(u + \theta v + \varepsilon v) - J(u + \theta v)}{\varepsilon} = (J'(u + \theta v), v).$$

Moreover, by the mean value theorem, $\exists \theta_0 \in ]0, 1[$ such that $f(1) = f(0) + f'(\theta_0)$. □

Proposition 2.3 If $J : V \to \mathbb{R}$ is twice Gâteaux differentiable at every point $u + \theta v$, $\theta \in [0, 1]$, then $\exists \theta_0 \in ]0, 1[$ such that
$$J(u + v) = J(u) + (J'(u), v) + \frac{1}{2} J''(u + \theta_0 v; v, v).$$

Proof
Same as before, with $f''(\theta) = J''(u + \theta v; v, v)$. □
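A tiny numerical illustration of Proposition 2.3 (the cubic $J(x) = x^3$ with $u = 0$, $v = 1$ is an assumed example): the intermediate point $\theta_0$ can be computed explicitly and the formula checked.

```python
# Assumed 1-D example: J(x) = x^3, u = 0, v = 1. Proposition 2.3 asks for
# theta0 in ]0, 1[ with J(u + v) = J(u) + J'(u) v + 0.5 J''(u + theta0 v) v^2.
# Here J(1) = 1, J'(0) = 0 and J''(t) = 6 t, so theta0 = 1/3.
J = lambda x: x ** 3
dJ = lambda x: 3 * x ** 2
d2J = lambda x: 6 * x

u, v, theta0 = 0.0, 1.0, 1.0 / 3.0
lhs = J(u + v)
rhs = J(u) + dJ(u) * v + 0.5 * d2J(u + theta0 * v) * v ** 2
print(lhs, rhs)          # both equal 1.0
```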

2.2 Convexity and Gâteaux differentiability

Let $V$ be a normed vector space over $\mathbb{R}$, and $K \subset V$ a convex subset.

Definition 2.4 $J : K \to \mathbb{R}$ is a convex function if
$$J((1 - \theta)u + \theta v) \le (1 - \theta)J(u) + \theta J(v), \quad \forall \theta \in [0, 1], \ \forall u, v \in K.$$

$J$ is called strictly convex if
$$J((1 - \theta)u + \theta v) < (1 - \theta)J(u) + \theta J(v), \quad \forall \theta \in ]0, 1[, \ \forall u \ne v \in K.$$

$J$ is $\alpha$-convex ($\alpha > 0$), or strongly convex, if
$$J((1 - \theta)u + \theta v) \le (1 - \theta)J(u) + \theta J(v) - \frac{\alpha}{2}\theta(1 - \theta)\|u - v\|^2, \quad \forall \theta \in [0, 1], \ \forall u, v \in K.$$

Proposition 2.4 $\alpha$-convex ⇒ strictly convex ⇒ convex.

Proposition 2.5 Let $J : K \to \mathbb{R}$ be a convex function; then every local minimum of $J$ is a global minimum.

Proof
Let $u$ be a local minimum of $J$. Then $J(u) \le J(v)$, $\forall v \in \mathcal{V}(u) \cap K$. If $u$ is not a global minimum, $\exists w \in K$ such that $J(w) < J(u)$. Then, for $\theta \in ]0, 1]$, by convexity,

J((1 − θ)u + θw) < J(u).

For θ > 0 small enough, (1 − θ)u + θw ∈ V(u) ∩ K, and there is a contradiction. 

Proposition 2.6 If J is strictly convex, then J has at most one minimum.

Proof
Let $u_1 \ne u_2$ be two (local, hence global) minima of $J$. Then $J(u_1) = J(u_2)$. By strict convexity, $J(w) < J(u_1)$ for all $w \in ]u_1, u_2[$, which contradicts the fact that $u_1$ is a global minimum. □

Proposition 2.7 If $J : V \to \mathbb{R}$ is Gâteaux differentiable in $V$, then:

i) $J$ is convex in $V$ ⇔ $J(v) \ge J(u) + (J'(u), v - u)$, $\forall u, v$;

ii) $J$ is strictly convex in $V$ ⇔ $J(v) > J(u) + (J'(u), v - u)$, $\forall u \ne v$;

iii) $J$ is $\alpha$-convex in $V$ ⇔ $J(v) \ge J(u) + (J'(u), v - u) + \frac{\alpha}{2}\|v - u\|^2$, $\forall u, v$.

Proof

i) If $J$ is convex: $J(u + \theta(v - u)) \le J(u) + \theta(J(v) - J(u))$, hence
$$\frac{J(u + \theta(v - u)) - J(u)}{\theta} \le J(v) - J(u),$$
and consider the limit when $\theta \to 0^+$. Conversely,
$$J(u) \ge J(u + \theta(v - u)) - \theta\,(J'(u + \theta(v - u)), v - u)$$
and
$$J(v) \ge J(u + \theta(v - u)) + (1 - \theta)(J'(u + \theta(v - u)), v - u).$$
Let us multiply the first inequality by $(1 - \theta)$ and the second by $\theta$; then
$$(1 - \theta)J(u) + \theta J(v) \ge J(u + \theta(v - u)).$$

ii) If $J$ is strictly convex: then $J$ is convex, and by (i) between $u$ and $u + \theta(v - u)$,
$$(J'(u), \theta(v - u)) \le J(u + \theta(v - u)) - J(u),$$
so
$$(J'(u), v - u) \le \frac{J(u + \theta(v - u)) - J(u)}{\theta} < J(v) - J(u)$$
by strict convexity. Conversely, see (i).

iii) Same as (ii). □

Geometric interpretation: $J$ is convex if its graph lies above its tangents.
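A minimal numerical sketch of characterization (i) (the convex function $J(v) = \sum_i e^{v_i}$ and the random points are illustrative assumptions): the tangent inequality $J(v) \ge J(u) + (J'(u), v - u)$ is tested at random pairs of points.

```python
import numpy as np

# Assumed example: J(v) = sum_i exp(v_i) is convex on R^n with gradient exp(v).
# Check the tangent inequality J(v) >= J(u) + (J'(u), v - u) of Proposition 2.7 (i).
rng = np.random.default_rng(1)
J = lambda v: np.sum(np.exp(v))
grad_J = lambda v: np.exp(v)

for _ in range(5):
    u, v = rng.standard_normal(4), rng.standard_normal(4)
    print(J(v) >= J(u) + grad_J(u) @ (v - u))   # always True: graph above its tangents
```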

Proposition 2.8 If $J$ is Gâteaux differentiable, then:

i) $J$ is convex iff $J'$ is monotone, i.e.
$$(J'(u) - J'(v), u - v) \ge 0;$$

ii) $J$ is strictly convex iff $J'$ is strictly monotone, i.e.
$$(J'(u) - J'(v), u - v) > 0, \quad u \ne v;$$

iii) $J$ is $\alpha$-convex iff $(J'(u) - J'(v), u - v) \ge \alpha\|u - v\|^2$.

Proof
i) If $J$ is convex, $J(v) \ge J(u) + (J'(u), v - u)$ and $J(u) \ge J(v) + (J'(v), u - v)$; adding the two inequalities gives the monotonicity. Conversely, let $\varphi(\theta) = J(u + \theta(v - u))$. Then $\varphi'(\theta) = (J'(u + \theta(v - u)), v - u)$, and by monotonicity $\varphi'(\theta) - \varphi'(0) \ge 0$. By integrating between $0$ and $\theta$, $\varphi(\theta) - \theta\varphi'(0) - \varphi(0) \ge 0$; taking $\theta = 1$ gives the characterization (i) of Proposition 2.7, so $J$ is convex.

ii) Same as (i).

iii) Add $\frac{\alpha}{2}\theta^2\|v - u\|^2$. □

Proposition 2.9 If $J : V \to \mathbb{R}$ is twice Gâteaux differentiable, then:

i) $J$ is convex in $V$ iff $(J''(u)w, w) \ge 0$, $\forall u, w$;

ii) $J$ is $\alpha$-convex in $V$ iff $(J''(u)w, w) \ge \alpha\|w\|^2$, $\forall u, w$.

Proof
If $J$ is convex, $(J'(u + \theta w) - J'(u), \theta w) \ge 0$. Dividing by $\theta^2$ and letting $\theta \to 0^+$, we get $(J''(u)w, w) \ge 0$. Conversely: Taylor formula. □

Geometric interpretation: if $J$ is $\alpha$-convex, the function $J$ has a curvature larger than (or equal to) the curvature of $w \mapsto \frac{\alpha}{2}\|w\|^2$.

Remark: $(J''(u)w, w) > 0$ for all $u$ and all $w \ne 0$ implies that $J$ is strictly convex; this is a sufficient condition, but not a necessary one (e.g. $J(x) = x^4$ is strictly convex although $J''(0) = 0$).
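A short numerical sketch of Proposition 2.9 and of the remark (the matrix $A$ and the function $x^4$ are assumed examples): for a quadratic $J(v) = \frac{1}{2}v^{T}Av$ the best constant $\alpha$ is the smallest eigenvalue of $A$, while $x^4$ shows that a positive Hessian is not necessary for strict convexity.

```python
import numpy as np

# Assumed example: J(v) = 0.5 v^T A v with A symmetric positive definite has
# J''(u) = A for every u, and (J''(u) w, w) >= alpha ||w||^2 holds with
# alpha = smallest eigenvalue of A, so J is alpha-convex.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
alpha = np.linalg.eigvalsh(A).min()
print(alpha)                 # ~0.79 > 0: J is alpha-convex (hence strictly convex)

# Remark: J(x) = x^4 is strictly convex on R although its second derivative
# 12 x^2 vanishes at 0, so (J''(u) w, w) > 0 is not a necessary condition.
d2J = lambda x: 12 * x ** 2
print(d2J(0.0))              # 0.0
```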

2.3 Optimality conditions

2.3.1 First-order necessary optimality conditions

Proposition 2.10 Let $V$ be a Banach space, and $J : \Omega \to \mathbb{R}$, where $\Omega$ is an open subset of $V$. Let $u \in \Omega$ be a local extremum of $J$, and assume that $J$ is Gâteaux differentiable at $u$. Then $J'(u) = 0$. This is Euler's equation.

Proof
For all $v$, for $\theta > 0$ small enough, $u + \theta v \in \mathcal{V}(u) \subset \Omega$. Then (in the case of a local minimum) $J(u) \le J(u + \theta v)$, so $(J'(u), v) \ge 0$. But also $(J'(u), -v) \ge 0$, and then $(J'(u), v) = 0$, $\forall v$. Hence $J'(u) = 0$. □

Remark: this proposition is not true if $\Omega$ is not an open subset! For instance, consider $J(x) = x$ on $\Omega = [0, 1]$: $0$ is a minimum of $J$, but $J'(0) = 1$. The proposition is usually applied to $\Omega = V$, or at an interior point of a subset.

Notation: the points $u$ where $J'(u) = 0$ are called the critical points of $J$.

Proposition 2.11 Let $V$ be a Banach space, and $K$ a convex subset of $V$. If $u$ is a local minimum of $J$ on $K$, and if $J$ is Gâteaux differentiable at $u$, then
$$(J'(u), v - u) \ge 0, \quad \forall v \in K.$$
This is Euler's inequality.

Proof
For all $v \in K$ and all $\theta \in ]0, 1]$ small enough, $u + \theta(v - u) \in \mathcal{V}(u) \cap K$. Then $J(u) \le J(u + \theta(v - u))$, and letting $\theta \to 0^+$, $(J'(u), v - u) \ge 0$. □

Particular cases:

i) $K$ is a vector subspace of $V$: then $(J'(u), w) = 0$, $\forall w \in K$.

ii) $K$ is an affine subspace of $V$: $v = a + v_0$, with $v_0$ in a vector subspace $V_0$ of $V$. Then $(J'(u), w) = 0$, $\forall w \in V_0$.

iii) Projection on a closed convex set: let $K$ be a closed convex subset of a Hilbert space $V$. Let $J(v) = \|v - f\|^2$, for a given $f \in V \setminus K$, and let $u$ be the projection of $f$ on $K$, i.e. the minimizer of $J$ on $K$. Then $(J'(u), v) = 2\langle u - f, v\rangle$, so $(J'(u), v - u) = 2\langle u - f, v - u\rangle \ge 0$, that is
$$\langle f - u, v - u\rangle \le 0, \quad \forall v \in K.$$

This property characterizes the projection on a closed convex set (see the numerical sketch after case iv) below).

iv) Least-squares estimation:
$$y(x) = \sum_{j=1}^{p} v_j w_j(x).$$

The goal is to approximate given values $b_i$ at points $x_i$: $y(x_i) \approx b_i$. Define
$$J(v) = \sum_{i=1}^{n} [y(x_i) - b_i]^2 = \sum_{i=1}^{n} \left( \sum_{j=1}^{p} v_j w_j(x_i) - b_i \right)^2 = \|Av - b\|^2,$$
where $A_{ij} = w_j(x_i)$. Then $(J'(u), v) = 2\langle Au - b, Av\rangle = 2\langle {}^tA(Au - b), v\rangle$, and the Euler equation gives
$${}^tA A u = {}^tA b.$$
This equality is called the normal equation of the least-squares estimation problem. The same idea can be used for estimating the solution of an overdetermined system of linear equations.
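Two minimal numerical sketches in $\mathbb{R}^n$ for cases iii) and iv) (the box $K = [0,1]^n$, the matrix sizes and the random data are illustrative assumptions): the first checks the characterization $\langle f - u, v - u\rangle \le 0$ of the projection, the second solves the normal equation ${}^tAAu = {}^tAb$ and compares the result with a standard least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# iii) Projection onto the closed convex set K = [0, 1]^n: it reduces to clipping
# each coordinate. Check <f - u, v - u> <= 0 for random points v in K.
n = 5
f = 2.0 * rng.standard_normal(n)            # a point, generally outside K
u = np.clip(f, 0.0, 1.0)                    # projection of f onto K
print(all((f - u) @ (v - u) <= 1e-12 for v in rng.uniform(0.0, 1.0, (20, n))))

# iv) Least squares: solve the normal equation tA A u = tA b for an
# overdetermined system and compare with numpy's least-squares solver.
m, p = 20, 3
A = rng.standard_normal((m, p))             # A[i, j] = w_j(x_i)
b = rng.standard_normal(m)                  # values b_i
u_normal = np.linalg.solve(A.T @ A, A.T @ b)
u_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(u_normal, u_lstsq))       # True
```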

2.3.2 Convex case: sufficient optimality conditions

Let $V$ be a Banach space, and $K$ a convex subset of $V$.

Proposition 2.12 Let $J : V \to \mathbb{R}$ be Gâteaux differentiable and convex on $K$, where $K$ is an open convex subset of $V$. If $J'(u) = 0$, where $u \in K$, then $u$ is a global minimum of $J$ on $K$.

Proof
$J(v) \ge J(u) + (J'(u), v - u) = J(u)$, $\forall v \in K$. □

Proposition 2.13 If $J : K \to \mathbb{R}$ is convex and Gâteaux differentiable, and if $(J'(u), v - u) \ge 0$, $\forall v \in K$, then $u$ is a global minimum of $J$ on $K$.

Euler's equation and inequality are thus necessary conditions in the general case, and necessary and sufficient conditions in the convex case.

Particular case $K = V$: $J'(u) = 0$ is Euler's equation.

Application to the minimization of quadratic cost functions

Let $a$ be a continuous, symmetric, coercive bilinear form: $a(v, v) \ge \alpha\|v\|^2$, $\alpha > 0$, $\forall v \in V$, and let $L$ be a continuous linear form on $V$. We define
$$J(v) = \frac{1}{2}a(v, v) - L(v).$$

Proposition 2.14 $J$ is Fréchet differentiable and $\alpha$-convex.

Proof
$J(u + h) = J(u) + (J'(u), h) + \frac{1}{2}a(h, h)$ with $(J'(u), h) = a(u, h) - L(h)$, so $J$ is Fréchet differentiable. Moreover, $(J'(v) - J'(u), v - u) = a(v - u, v - u) \ge \alpha\|v - u\|^2$, so $J$ is $\alpha$-convex (Proposition 2.8, iii)). □

Euler's inequality for the optimization problem $\inf_{v \in K} J(v)$ is
$$a(u, v - u) \ge L(v - u), \quad \forall v \in K.$$
If $K = V$, then $a(u, v) = L(v)$, $\forall v \in V$.

In the particular case where $V = \mathbb{R}^n$, $a(v, v) = (Av, v)$ where $A$ is a symmetric positive definite matrix ($\alpha = \min_i \lambda_i$, the smallest eigenvalue of $A$), and $L(v) = (b, v)$ with $b \in \mathbb{R}^n$. In this case, Euler's equation is $Au = b$.
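A minimal numerical sketch (the random symmetric positive definite matrix and right-hand side are assumed data): the solution of the Euler equation $Au = b$ is computed and checked to minimize $J(v) = \frac{1}{2}(Av, v) - (b, v)$ against random perturbations.

```python
import numpy as np

# Assumed example in R^n: A symmetric positive definite, L(v) = (b, v).
# The Euler equation of min J, with J(v) = 0.5 (Av, v) - (b, v), is Au = b.
rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)               # symmetric positive definite
b = rng.standard_normal(n)

J = lambda v: 0.5 * v @ A @ v - b @ v
u = np.linalg.solve(A, b)                 # solution of the Euler equation Au = b

# u is the global minimizer: J(u) <= J(u + w) for random perturbations w
print(all(J(u) <= J(u + rng.standard_normal(n)) for _ in range(10)))   # True
```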

Non-differentiable convex cost functions

Let $J(v) = j_0(v) + j_1(v)$, where $j_0$ is convex and Gâteaux differentiable, and $j_1$ is convex. If $J(u) = \inf_{v \in K} J(v)$, with $u \in K$, $K$ a nonempty closed convex subset of $V$:

Proposition 2.15 u is characterized by the following inequality:

$$(j_0'(u), v - u) + j_1(v) - j_1(u) \ge 0, \quad \forall v \in K.$$

Proof
Sufficient condition:
$$J(v) - J(u) = j_0(v) + j_1(v) - j_0(u) - j_1(u) \ge (j_0'(u), v - u) + j_1(v) - j_1(u) \ge 0.$$

Necessary condition: for $\theta \in ]0, 1]$,
$$j_0(u) + j_1(u) \le j_0(u + \theta(v - u)) + j_1(u + \theta(v - u)) \le j_0(u + \theta(v - u)) + j_1(u) + \theta(j_1(v) - j_1(u)),$$
hence
$$\frac{j_0(u + \theta(v - u)) - j_0(u)}{\theta} + j_1(v) - j_1(u) \ge 0,$$
and letting $\theta \to 0^+$, $(j_0'(u), v - u) + j_1(v) - j_1(u) \ge 0$. □
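A short 1-D numerical sketch of Proposition 2.15 (the choices $K = \mathbb{R}$, $j_0(v) = \frac{1}{2}(v - c)^2$, $j_1(v) = \lambda|v|$ and the values of $c$, $\lambda$ are illustrative assumptions): the minimizer is given by soft-thresholding, and the characterizing inequality is tested on a grid of points $v$.

```python
import numpy as np

# Assumed 1-D example with K = R: j0(v) = 0.5 (v - c)^2 (convex, differentiable)
# and j1(v) = lam * |v| (convex, not differentiable at 0). The minimizer of
# j0 + j1 is the soft-thresholding of c; here c = 0.7, lam = 1 gives u = 0.
c, lam = 0.7, 1.0
u = np.sign(c) * max(abs(c) - lam, 0.0)

# Check the characterization (j0'(u), v - u) + j1(v) - j1(u) >= 0 on a grid.
j0_prime = u - c
j1 = lambda v: lam * np.abs(v)
vs = np.linspace(-5.0, 5.0, 1001)
print(np.all(j0_prime * (vs - u) + j1(vs) - j1(u) >= -1e-12))   # True
```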

2.3.3 Second-order optimality conditions

Proposition 2.16 Necessary condition for a local minimum: let $\Omega$ be an open subset of a normed vector space $V$, and $J : \Omega \to \mathbb{R}$ a differentiable function in $\Omega$, twice differentiable at $u \in \Omega$. If $u$ is a local minimum of $J$, then
$$J''(u)(w, w) \ge 0, \quad \forall w \in V.$$

Proof
Taylor-Young formula:
$$J(u + \theta w) = J(u) + \theta(J'(u), w) + \frac{\theta^2}{2}\left[J''(u)(w, w) + \varepsilon(\theta)\right],$$
where $\varepsilon(\theta) \to 0$ when $\theta \to 0$. As $u$ is a local minimum of $J$, $J'(u) = 0$ and $J(u + \theta w) \ge J(u)$ for all $w$, for $\theta \ge 0$ small enough. Then $J''(u)(w, w) \ge 0$. □

Definition 2.5 $u$ is a strict local minimum of $J$ if there exists a neighborhood $\mathcal{V}(u)$ such that $J(u) < J(v)$, $\forall v \in \mathcal{V}(u) \setminus \{u\}$.

Proposition 2.17 Sufficient condition for a strict local minimum: let $\Omega$ be an open subset of $V$, and $J : \Omega \to \mathbb{R}$ a differentiable function such that $J'(u) = 0$. If $J$ is twice differentiable at $u$, and if there exists $\alpha > 0$ such that $J''(u)(w, w) \ge \alpha\|w\|^2$, $\forall w \in V$, then $u$ is a strict local minimum of $J$.

Proof
$$J(u + w) - J(u) = \frac{1}{2}\left[J''(u)(w, w) + \|w\|^2\varepsilon(w)\right] \ge \frac{\alpha - \varepsilon(w)}{2}\|w\|^2 > 0$$
for $w \ne 0$ in an open ball centered at $u$, of radius $r$ small enough that $|\varepsilon(w)| < \alpha$ if $\|w\| \le r$. □
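A brief numerical sketch of the second-order conditions (the two quadratic model functions and their Hessians at $0$ are assumed examples): both have $J'(0) = 0$, but only the first satisfies the sufficient condition $J''(u)(w, w) \ge \alpha\|w\|^2$ with $\alpha > 0$; the second is a saddle point.

```python
import numpy as np

# Assumed examples on R^2, both with a critical point at 0:
#   J1(x, y) = x^2 + 0.5 y^2   -> Hessian H_min, positive definite
#   J2(x, y) = x^2 - 0.5 y^2   -> Hessian H_saddle, indefinite (saddle point)
H_min = np.array([[2.0, 0.0], [0.0, 1.0]])
H_saddle = np.array([[2.0, 0.0], [0.0, -1.0]])

print(np.linalg.eigvalsh(H_min).min())     # 1.0 > 0: alpha = 1, strict local minimum
print(np.linalg.eigvalsh(H_saddle).min())  # -1.0 < 0: necessary condition fails, no minimum
```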

Proposition 2.18 Let $\Omega$ be an open subset of $V$, and $J$ a differentiable function such that $J'(u) = 0$. If $J$ is twice differentiable in $\Omega$, and if there exists a ball $B$ centered at $u$ in $\Omega$ such that $J''(v)(w, w) \ge 0$, $\forall v \in B$, $\forall w \in V$, then $u$ is a local minimum of $J$.

Proof
Taylor-Maclaurin formula: since $J'(u) = 0$, $\exists v \in ]u, u + w[$ such that
$$J(u + w) = J(u) + \frac{1}{2}J''(v)(w, w) \ge J(u)$$
for all $u + w \in B$. □

Remark: Straightforward application to a convex function.
