
Copyright © 2019 Society for Industrial and Applied Mathematics. From Introduction to Optimization and Hadamard Semidifferential Calculus - Delfour (9781611975956)

Chapter 3

Semidifferentiability, Differentiability, Continuity, and Convexities

[Chapter opening figure: a surface plot; only axis ticks survived extraction.]

1 Introduction

According to some historians, the differential or infinitesimal calculus has been implicitly present since the very early years of mathematics. For instance, the mathematician-astronomer Aryabhata (476–550 CE) in 499 CE used a notion of infinitesimals and expressed an astronomical problem in the form of a basic differential equation.66 For other historians, the differential calculus was invented in the 17th century. Accepting this second point of view, the first idea of the differential calculus and the rule for computing the extrema of a function go back to Pierre de Fermat67 in 1638. He developed a method, de maximis et minimis, for determining maxima, minima, and tangents to various curves that was equivalent to differentiation.68 Ideas leading up to the notions of function, derivative, and integral were developed throughout the 17th century. It is generally accepted that the notion of derivative is due to Leibniz69 and Newton.70 Fermat's rule for the extremum of a function is then de facto generalized in the form f′(x) = 0. It was used in the proof of the theorem of Rolle71 in 1691, which led to the rule of l'Hôpital72 in 1696. Publication of Newton's main treatises73 took many years, whereas Leibniz published first (Nova methodus,74 1684) and the whole subject was subsequently marred by a priority dispute between the two inventors of the calculus.

66George G. Joseph, The Crest of the Peacock, Princeton University Press (2000), pp. 298–300.
67Pierre de Fermat (1601–1665).
68First written in a letter to Mersenne (who was corresponding with numerous scientists at the time and was seeing to the diffusion of new results) in 1638; the first printed version of the method can be found in the fifth volume of Supplementum Cursus Mathematici (1642) written by Herigone, and it is only in 1679 that it appears in Varia opera mathematica under the title Methodus ad disquirendam Maximam et Minimam followed by De tangentibus linearum curvarum.
69Gottfried Wilhelm Leibniz (1646–1716).
70Sir Isaac Newton (1643–1727).
71Michel Rolle (1652–1719).
72Guillaume François Antoine de l'Hôpital (1661–1704).
73The Method of Fluxions, completed in 1671 and published in 1736, and Philosophiae Naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy), often called the Principia (Principles), 1687 and 1726 (third edition).
74Nova methodus pro maximis et minimis, itemque tangentibus, quae nec fractas nec irrationales quantitates moratur, et singulare pro illis calculi genus (A new method for maxima and minima, as well as tangents, which is impeded neither by fractional nor by irrational quantities, and a remarkable type of calculus for these), in Acta Eruditorum, 1684, a journal created in Leipzig two years earlier.



“In early calculus the use of infinitesimal quantities was thought unrigorous, and was fiercely criticized by a number of authors, most notably Michel Rolle and Bishop Berkeley. Bishop Berkeley famously described infinitesimals as the ghosts of departed quantities in his book The Analyst in 1734. Several mathematicians, including Maclaurin, attempted to prove the soundness of using infinitesimals, but it would be 150 years later, due to the work of Cauchy and Weierstrass, where a means was finally found to avoid mere “notions” of infinitely small quantities, that the foundations of differential and integral calculus were made firm. In Cauchy's writing, we find a versatile spectrum of foundational approaches, including a definition of continuity in terms of infinitesimals, and a (somewhat imprecise) prototype of an (ε, δ)-definition of limit in the definition of differentiation. In his work Weierstrass formalized the concept of limit and eliminated infinitesimals. Following the work of Weierstrass, it eventually became common to base calculus on limits instead of infinitesimal quantities. This approach formalized by Weierstrass came to be known as the standard calculus. Informally, the name “infinitesimal calculus” became commonly used to refer to Weierstrass' approach.” (from a 2010 article in Wikipedia “Infinitesimal Calculus” that is no longer available).

To characterize the extremum of a smooth function of several variables, the classical notion is the one of total differential generalized by Fréchet to spaces of functions (hence of infinite dimension). However, important classes of functions in optimization are not differentiable in that sense. In fact, the first natural notion for the characterization of an extremum is rather the directional derivative at the point where the extremum is achieved. In this chapter, we shall go back to the older notions of first variation and of differential that will be relaxed to weaker notions of semidifferentials. Nevertheless, existence of directional derivatives or semidifferentials does not guarantee existence of the total differential or of the gradient (linearity with respect to the direction). We shall see that the Gateaux semidifferential (or first variation) is not sufficient to preserve the basic properties of the total differential such as the continuity of the function and the chain rule for the composition of functions. It is the stronger semidifferential in the sense of Hadamard that will make it possible to enlarge the family of classically differentiable functions to some reputedly nondifferentiable functions while preserving those two properties and adding new functional operations to the calculus that will become a semidifferential calculus. For instance, the lower and upper envelopes of finite (and in some cases infinite) families of Hadamard semidifferentiable functions are Hadamard semidifferentiable. It will also be shown that continuous convex functions that play a central role in optimization (as we have seen in Chapter 2) are Hadamard semidifferentiable. This chapter focuses on the properties of differentiable and semidifferentiable real-valued and vector-valued functions and their associated semidifferential calculus. Section 2 revisits real-valued functions of a single real variable. Sections 2.2 to 2.4 review classical results.
Section 3 deals with real-valued functions of several real variables. Notions of semidifferentials and Hadamard semidifferentials are introduced going back to the preoccupations of the time through the papers of J. HADAMARD [2] in 1923 and of M. FRÉCHET [5] in 1937. They will be used to fully characterize the classical notions of differentiability with which will be associated the names of Gateaux, Hadamard, and Fréchet.

Section 3.5 concentrates on Lipschitz continuous functions for which the existence of the semidifferential implies the existence of the stronger Hadamard semidifferential. However, this is not entirely satisfactory. For instance, the distance function dU in Example 5.2 of Chapter 2 is uniformly Lipschitzian in R^n. It is Hadamard semidifferentiable when U is convex, but this semidifferential might not exist at some points for a lousy set U! Yet, since dU is Lipschitzian, its differential quotient is bounded and the liminf and limsup are both finite numbers. Without, for all that, going deep into the clouds and dropping the notion of semidifferential, we are led to introducing the notions of lower and upper semidifferentials by replacing the limit by the liminf and the limsup in the definitions. This idea makes it possible to relax other notions of differentiability for Lipschitzian functions. Thus, the upper semidifferential developed by Clarke75 in 1973 under the name generalized directional derivative relaxes the older notion of strict differentiability. The lower and upper semidifferentials in the sense of Clarke are added to the menagerie of notions of semidifferentials that will be summarized in section 6 and compared in Chapter 4 in the context of the necessary optimality condition. In general, the classical chain rule will not hold for the composition of two lower or upper semidifferentials, but weaker geometric forms of the chain rule will be available. The section is completed with a review of functions of class C^(1) and C^(p).

Section 4 first deals with the characterization of convex functions that are directionally differentiable. It is followed by a characterization via the semidifferentials. It is also shown that a convex function is Hadamard semidifferentiable and continuous at interior points of its domain. At boundary points, the Hadamard semidifferential is replaced by the lower Hadamard semidifferential. The section is completed by introducing the semiconvex and semiconcave functions that are locally Lipschitzian and Hadamard semidifferentiable at each point of the interior of their domain.
Section 5 on the semidifferentiability of an extremum with respect to one or several parameters has been considerably expanded in order to further illustrate how the notion of Hadamard semidifferentiability naturally arises in the characterization of the semidifferential of the parametrized extremum of an objective function. We first give a fairly general theorem on the semidifferentiability of extrema with respect to a parameter. It is applied to get the explicit expression of the Hadamard semidifferential of the extremum of quadratic functions (for instance, the dependence of the least or greatest eigenvalue of a symmetric matrix on parameters). We then give the Theorem of Danskin in 1966 in the differentiable case that only yields a Hadamard semidifferential when the set of points achieving the extremum is not a singleton. We also give the concave and the semidifferentiable versions of this theorem. The convex case (dual of the concave case) was first given by D. Bertsekas in his doctoral thesis in 1971 by using a theorem of R. T. Rockafellar. We include a subsection 5.3.1 on sublinear and superlinear functions. Those notions arise in the conclusion of the Theorem of Danskin and are used later in Chapter 5 for objective and constraint functions that are only semidifferentiable.

The various differentials and semidifferentials introduced in this chapter are summarized and compared in a table in section 6.

2 Real-Valued Functions of a Real Variable

Prior to studying real-valued functions of several real variables, consider functions of a single real variable (see Figure 3.1).

Definition 2.1. Let f : R → R be a real-valued function of a real variable.

(i) f is differentiable from the right or right differentiable at x ∈ R if the limit

lim_{t↘0} [f(x + t) − f(x)]/t (2.1)

exists and is finite, where lim_{t↘0} means that t goes to 0 by strictly positive values. That limit will be denoted df(x; +1) or df(x; 1).

75Frank Herbert Clarke (1948– ). See F. H. CLARKE [1, 2].


[Figure 3.1 shows five sketches: f right differentiable at x; f left differentiable at x; f left and right differentiable at x; f differentiable at x; f neither left nor right differentiable at x.]

Figure 3.1. Examples of right and left differentiability.

f is continuous from the right or right continuous at x ∈ R if

lim_{t↘0} f(x + t) = f(x). (2.2)

(ii) f is differentiable from the left76 or left differentiable at x ∈ R if the limit

lim_{t↘0} [f(x − t) − f(x)]/t (2.3)

exists and is finite. That limit will be denoted df(x; −1). f is continuous from the left or left continuous at x ∈ R if

lim_{t↘0} f(x − t) = f(x). (2.4)

(iii) f is differentiable at x ∈ R if the limit

lim_{t→0} [f(x + t) − f(x)]/t (2.5)

exists and is finite, where lim_{t→0} means that t ∈ R, t ≠ 0, goes to 0. It will be denoted f′(x), df(x)/dx, or f^(1)(x).77

The notions of right and left derivatives at x are what we shall later call semidifferentials at x in the respective directions +1 and −1.
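The one-sided limits of Definition 2.1 can be probed numerically. The following sketch (not from the book; the helper names are ours) evaluates the difference quotients (2.1) and (2.3) at shrinking t > 0 for f(x) = |x| at x = 0, where the two semidifferentials exist but the derivative does not.

```python
# Numerical sketch (not from the book): estimate the one-sided limits of
# Definition 2.1 by evaluating the difference quotients at shrinking t > 0.

def right_quotient(f, x, t):
    # [f(x + t) - f(x)] / t, the quotient in (2.1)
    return (f(x + t) - f(x)) / t

def left_quotient(f, x, t):
    # [f(x - t) - f(x)] / t, the quotient in (2.3)
    return (f(x - t) - f(x)) / t

f = abs  # f(x) = |x| is right and left differentiable at 0, not differentiable

for t in (1e-1, 1e-4, 1e-8):
    print(right_quotient(f, 0.0, t), left_quotient(f, 0.0, t))
# the right quotient tends to df(0; +1) = 1 and the left one to df(0; -1) = 1,
# so df(0; -1) != -df(0; +1) and f is not differentiable at 0
```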

76Technically speaking, it is the derivative defined in (i) in the direction −1.
77The notation f′(x) is the one of Lagrange, df/dx(x) the one of Leibniz, and ḟ(x) the one of Newton.


The right and left differentials are special cases of Dini’s78 differentials that we shall see later in Definition 3.8.

Remark 2.1. If f is right differentiable at x ∈ R, we have the positive homogeneity

∀α ≥ 0, df(x; α) := lim_{t↘0} [f(x + αt) − f(x)]/t = α df(x; +1), df(x; 0) = 0; (2.6)

similarly, if f is left differentiable at x ∈ R, we have the positive homogeneity

∀α ≥ 0, df(x; −α) := lim_{t↘0} [f(x − αt) − f(x)]/t = α df(x; −1), df(x; 0) = 0. (2.7)

It is readily checked that if f is differentiable at x, then

f′(x) = df(x; +1) = −df(x; −1) (2.8)
⇒ ∀α ≥ 0, df(x; −α) := lim_{t→0} [f(x − αt) − f(x)]/t = α df(x; −1) = −α f′(x),

and we have the homogeneity

∀α ∈ R, df(x; α) := lim_{t→0} [f(x + αt) − f(x)]/t = α df(x; +1) = α f′(x). (2.9)

Therefore, if f is differentiable at x ∈ R, the positive homogeneity, combined with the condition

df(x; −1) = −df(x; 1), (2.10)

implies that the map

v ↦ df(x; v) := lim_{t↘0} [f(x + tv) − f(x)]/t : R → R (2.11)

is homogeneous and hence linear and continuous:

∀α, β ∈ R and ∀v, w ∈ R, df(x; αv + βw) = α df(x; v) + β df(x; w).

2.1 Continuity and Differentiability

Theorem 2.1. If f is right (resp., left) differentiable at a point x ∈ R, then f is right (resp., left) continuous at x. If f is differentiable at x, then f is continuous at x.

Proof. It is sufficient to prove the right continuity from the right differentiability. By the definition of the right derivative, for all ε > 0, there exists δ(x) > 0 such that

∀t, 0 < t < δ(x), |[f(x + t) − f(x)]/t − df(x; +1)| < ε.

This implies that |f(x + t) − f(x)| < t (|df(x; +1)| + ε). The number c(x, ε) = |df(x; +1)| + ε > 0 depends only on x and ε. Choosing

δ′(x) = min{δ(x), ε/c(x, ε)} > 0,

then 0 < t < δ′(x) yields |f(x + t) − f(x)| < t c(x, ε) < ε and, thence, the right continuity of f, since t < ε/c(x, ε).

78Ulisse Dini (1845–1918), U. DINI [1].


In general, the derivative of a function is not continuous, as shown in the following example.

Example 2.1. Consider the function

f(x) = x² sin(1/x) if x ≠ 0, f(x) = 0 if x = 0, (2.12)

which is differentiable at all x ≠ 0:

f′(x) = 2x sin(1/x) − cos(1/x), x ≠ 0.

For x = 0, go back to the definition of the differential quotient

|[f(t) − f(0)]/t − 0| = |t sin(1/t) − 0| ≤ |t|, t ≠ 0;

by letting t → 0, it is readily seen that the limit exists and that f′(0) = 0. Therefore, f is differentiable everywhere on R, but f′ is not a continuous function since cos(1/t) does not converge as t goes to 0. There exist no right or left limits, but

lim inf_{x→0} f′(x) = −1 and lim sup_{x→0} f′(x) = +1,

and, since f′(0) = 0, f′ is neither lsc nor usc.
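The behavior in Example 2.1 can be checked numerically. The sketch below (not from the book; the function names are ours) verifies that the difference quotient at 0 is bounded by |t|, and samples f′ at points accumulating at 0 where it oscillates between roughly −1 and +1.

```python
import math

# A numerical companion to Example 2.1 (a sketch, not from the book):
# f(x) = x^2 sin(1/x) for x != 0 and f(0) = 0.

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):
    # the derivative for x != 0: 2x sin(1/x) - cos(1/x)
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# the difference quotient at 0 is bounded by |t|, so f'(0) = 0
for t in (1e-2, 1e-5, 1e-8):
    assert abs((f(t) - f(0.0)) / t) <= abs(t)

# but f' oscillates between roughly -1 and +1 arbitrarily close to 0,
# so f' is not continuous (and neither lsc nor usc) at 0
xs = [1.0 / (k * math.pi) for k in range(100, 200)]  # points accumulating at 0
print(min(fprime(x) for x in xs), max(fprime(x) for x in xs))  # ≈ -1.0, ≈ 1.0
```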

This example shows that a function differentiable at each point can have as derivative a function that is not continuous at some points. However, that does not mean that any function can be the derivative of a continuous function differentiable everywhere. The derivatives of such functions in an open interval share an important property of continuous functions in an open interval: such functions go through all intermediary points (section 2.3, Theorem 2.5).

2.2 Fermat's Rule and the Mean Value Theorem

With the notion of derivative, it is easy to recover Fermat's rule.

Theorem 2.2 (Fermat's rule). Let f : [a, b] → R, a < b. Assume that f has a local maximum at x ∈ ]a, b[, that is,

∃ a neighborhood V(x) of x in ]a, b[ such that f(x) ≥ f(y) ∀y ∈ V(x). (2.13)

If f is differentiable at x, then

f′(x) = 0. (2.14)

Proof. As x is an interior point of [a, b], choose δ > 0 such that ]a, b[ ⊃ V(x) ⊃ I_δ = ]x − δ, x + δ[. As a result, ∀y ∈ I_δ, f(y) ≤ f(x). For t, 0 < t < δ, we have f(x − t) ≤ f(x), which implies

[f(x − t) − f(x)]/t ≤ 0 ⇒ df(x; −1) ≤ 0

since f is differentiable at x; similarly f(x + t) ≤ f(x) implies

[f(x + t) − f(x)]/t ≤ 0 ⇒ df(x; +1) ≤ 0.

But, since f is differentiable at x, we have 0 ≥ df(x; −1) = −df(x; +1) ≥ 0 and hence f′(x) = df(x; +1) = 0.

The next theorem is a generalized form of the mean value theorem involving two functions.

Theorem 2.3. Let f and g be two continuous functions on [a, b] that are differentiable on ]a, b[. Then

∃x ∈ ]a, b[ such that [f(b) − f(a)] g′(x) = [g(b) − g(a)] f′(x). (2.15)

Proof. Define

h(t) := [f(b) − f(a)] g(t) − [g(b) − g(a)] f(t), a ≤ t ≤ b.

It is readily seen that h is continuous on [a, b] and differentiable on ]a, b[ and that

h(a) = f(b)g(a) − f(a)g(b) = h(b).

To prove the theorem, it is sufficient to show that h′(x) = 0 for some point x ∈ ]a, b[. If h is constant, it is true for all x ∈ ]a, b[. If h(t) > h(a) for some t ∈ ]a, b[, let x be the point of [a, b] at which h achieves its maximum. It exists by the Weierstrass theorem since h is continuous on the compact interval [a, b]. Since h(a) = h(b), the point x belongs to the open interval ]a, b[ and h′(x) = 0 by Fermat's rule. If h(t) < h(a) for some t ∈ ]a, b[, repeat the same argument by choosing as x a point of [a, b] where −h achieves its maximum.

The mean value theorem is now a corollary to the previous theorem.

Theorem 2.4. Let f be continuous on [a, b] and differentiable everywhere on ]a, b[. Then there exists a point x ∈ ]a, b[ such that

f(b) − f(a) = (b − a) f′(x). (2.16)

Proof. Set g(x) = x in the previous theorem.
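For a concrete instance of Theorem 2.4, one can exhibit the intermediate point explicitly. The sketch below (not from the book; the choice f = exp and the variable names are ours) assumes f′ is monotone so the point is unique and solvable in closed form.

```python
import math

# Sketch of Theorem 2.4 (not from the book) for f(x) = exp(x) on [a, b]:
# there is a point x in ]a, b[ with f(b) - f(a) = (b - a) f'(x),
# i.e. exp(x) = (exp(b) - exp(a)) / (b - a).

a, b = 0.0, 1.0
slope = (math.exp(b) - math.exp(a)) / (b - a)
x = math.log(slope)  # solve f'(x) = slope in closed form (f' = exp is monotone)
assert a < x < b
assert abs(math.exp(b) - math.exp(a) - (b - a) * math.exp(x)) < 1e-12
print(x)  # ≈ 0.5413, the mean value point of exp on [0, 1]
```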

2.3 Property of the Derivative of a Function Differentiable Everywhere

Go back to the issue raised in the context of Example 2.1 that shows that, in general, the derivative of a differentiable function is not continuous. It still retains the following property of continuous functions on an open interval.

Theorem 2.5 (W. RUDIN, 1976 edition [1, Thm. 5.12, p. 100]). Let f : R → R be differentiable in ]a0, b0[ , a0 < b0.

(i) Given a0 < a < b < b0 and λ a real number such that

f′(a) < λ < f′(b) (resp., f′(a) > λ > f′(b)), (2.17)

there exists a point x ∈ ]a, b[ such that f′(x) = λ.

(ii) The derivative of f in ]a0, b0[ cannot have discontinuities of the first kind.


Recall that if f is discontinuous at a point x and if the limit from the right f(x+) and the limit from the left f(x−) exist and are not equal, we say that f has a discontinuity of the first kind. Otherwise, the discontinuity is of the second kind.

Proof. Consider the function g(x) = f(x) − λx. Since f is differentiable on ]a0, b0[, g is continuous on the compact interval [a, b] and there exists a minimizer x ∈ [a, b] of g with respect to [a, b]. If a < x < b, then, by Fermat's rule (Theorem 2.2), g′(x) = 0 and necessarily f′(x) − λ = 0. It remains to prove that the minimum cannot occur at x = a or x = b. If x = a, then

∀t, 0 < t ≤ b − a, g(a + t) − g(a) ≥ 0 ⇒ [g(a + t) − g(a)]/t ≥ 0 ⇒ g′(a) ≥ 0.

But, as g′(a) = f′(a) − λ and f′(a) − λ < 0, we get a contradiction. Similarly, if the minimum occurs at x = b,

∀t, 0 < t ≤ b − a, g(b − t) − g(b) ≥ 0 ⇒ [g(b − t) − g(b)]/t ≥ 0 ⇒ −g′(b) ≥ 0.

But, as g′(b) = f′(b) − λ and f′(b) − λ > 0, we get a contradiction.

2.4 Taylor's Theorem

When f has a derivative f′ on an interval and f′ also has a derivative on the same interval, this second derivative will be denoted f^(2). In a similar fashion, denote by f^(n) the nth order derivative of f, n ≥ 1. For the existence of f^(n)(x) at a point x, it is necessary that f^(n−1) exist in a neighborhood of x and be differentiable (and hence continuous) at that point. Since f^(n−1) must exist on a neighborhood of x, f^(n−2) must exist and must be differentiable on that neighborhood, and so on down to f itself. As a result, a function f for which f^(n) exists on ]a, b[ is a function such that f and its derivatives up to order n − 1 are continuous and differentiable on ]a, b[.

Theorem 2.6 (Taylor79). Let f : ]a, b[ → R and assume that f^(n) exists on ]a, b[ for some integer n ≥ 1. Given x, a < x < b, define the (n − 1)th order polynomial

P_x(y) := Σ_{k=0}^{n−1} [f^(k)(x)/k!] (y − x)^k, a < y < b. (2.18)

For any y, a < y < b, there exists θ, 0 < θ < 1, such that

f(y) = P_x(y) + [f^(n)(x + θ(y − x))/n!] (y − x)^n. (2.19)

Proof. If y = x, there is nothing to prove since P_x(x) = f(x). If y ≠ x, the convex combination x + θ(y − x), 0 ≤ θ ≤ 1, sweeps the interval [x, y] when x < y and [y, x] when x > y. Consider the following function parametrized by a constant M to be determined:

g(θ) = f(x + θ(y − x)) − Σ_{k=0}^{n−1} [θ^k/k!] f^(k)(x) (y − x)^k − M [θ^n/n!] (y − x)^n

79Brook Taylor (1685–1731). Copyright © 2019 Society for Industrial and Applied Mathematics From Introduction to Optimization and Hadamard Semidifferential Calculus - Delfour (9781611975956)

for 0 ≤ θ ≤ 1. By definition, g(0) = 0 and

g(1) = f(y) − Σ_{k=0}^{n−1} [f^(k)(x)/k!] (y − x)^k − M (y − x)^n/n! = f(y) − P_x(y) − M (y − x)^n/n!. (2.20)

Since y ≠ x, M can be chosen such that g(1) = 0 in (2.20). The ℓth derivative of g, 1 ≤ ℓ ≤ n − 1, is given by

g^(ℓ)(θ) = f^(ℓ)(x + θ(y − x)) (y − x)^ℓ − Σ_{k=ℓ}^{n−1} [θ^(k−ℓ)/(k − ℓ)!] f^(k)(x) (y − x)^k − M [θ^(n−ℓ)/(n − ℓ)!] (y − x)^n
⇒ g′(0) = 0, g^(2)(0) = 0, . . . , g^(n−1)(0) = 0.

As the nth derivative of θ^k, 0 ≤ k ≤ n − 1, is zero, the nth derivative of g is

g^(n)(θ) = [f^(n)(x + θ(y − x)) − M] (y − x)^n.

It is now sufficient to show the existence of θ_n ∈ ]0, 1[ such that g(1) = 0 = g^(n)(θ_n) to get M = f^(n)(x + θ_n(y − x)) and, substituting in (2.20), formula (2.19). Indeed, the function g is continuous on [0, 1] and differentiable in [0, 1] and g(1) = g(0) = 0. By Theorem 2.4, there exists θ₁ ∈ ]0, 1[ such that 0 = g(1) − g(0) = g′(θ₁). Again, by Theorem 2.4, there exists θ₂ ∈ ]0, θ₁[ such that 0 = g′(θ₁) − g′(0) = θ₁ g^(2)(θ₂) and g^(2)(θ₂) = 0. Repeating this argument until the last step, there exists θ_n ∈ ]0, θ_{n−1}[ such that 0 = g^(n−1)(θ_{n−1}) − g^(n−1)(0) = θ_{n−1} g^(n)(θ_n) and g^(n)(θ_n) = 0. Therefore, g(1) = 0 implies g^(n)(θ_n) = 0 and, as y ≠ x, M = f^(n)(x + θ_n(y − x)).
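Theorem 2.6 can be illustrated numerically. The sketch below (not from the book; the choice f = exp and the names are ours) computes the remainder f(y) − P_x(y) and recovers a θ in ]0, 1[ satisfying (2.19), using the fact that every derivative of exp is again exp.

```python
import math

# A numerical sketch of Theorem 2.6 (not from the book) for f = exp, where
# every derivative f^(k) is again exp. With x = 0, y = 1, n = 5, the
# remainder f(y) - P_x(y) must equal exp(x + theta (y - x)) (y - x)^n / n!
# for some theta in ]0, 1[.

def taylor_poly(x, y, n):
    # P_x(y) from (2.18), specialized to f = exp so that f^(k)(x) = exp(x)
    return sum(math.exp(x) * (y - x) ** k / math.factorial(k) for k in range(n))

x, y, n = 0.0, 1.0, 5
remainder = math.exp(y) - taylor_poly(x, y, n)

# solve exp(x + theta (y - x)) (y - x)^n / n! = remainder for theta
theta = (math.log(remainder * math.factorial(n) / (y - x) ** n) - x) / (y - x)
assert 0.0 < theta < 1.0
print(theta)  # ≈ 0.177
```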

3 Real-Valued Functions of Several Real Variables

3.1 Geometrical Approach via the Differential

It is straightforward to extend the results of the previous section to vector-valued functions of a single real variable t ↦ h(t) : R → R^n. The derivative h′(t) and the derivatives from the right dh(t; +1) and from the left dh(t; −1) at t are defined in the same way, but the convergence of the differential quotients takes place in the space R^n instead of R, or component by component. It is quite different for functions of several real variables f : R^n → R. The oldest notion in the literature is the one of differential. For instance, for a function f(x, y) of two variables, the reasoning is on the increment or the variation ∆f(x, y) of the function f resulting from the variations ∆x and ∆y of the variables x and y:

∆f(x, y) = [∆f(x, y)/∆x] ∆x + [∆f(x, y)/∆y] ∆y.

Assuming that, as ∆x and ∆y go to zero, the quotients

∆f(x, y)/∆x → (∂f/∂x)(x, y) and ∆f(x, y)/∆y → (∂f/∂y)(x, y)

converge in R, the differential is formally written as

df(x, y) = (∂f/∂x)(x, y) dx + (∂f/∂y)(x, y) dy, (3.1)

which underlies the notion of partial derivatives in the directions of the x and y axes. But, for J. HADAMARD80 [2] in 1923, this expression is only an operational symbol:

“What is the meaning of equality (3.1)? That, if x, y and hence g = f(x, y) are expressed as a function of some auxiliary variable t, we have, whatever those expressions be,

dg/dt = (dg/dx)(dx/dt) + (dg/dy)(dy/dt). (3.2)

Such is the unique meaning of equality (3.1). The equality (3.2) taking place whatever be the expression of the independent variable as a function of the other two, t is deleted. The invaluable advantage of the differential notation precisely consists of the possibility not to specify what is the variable that we consider as independent.”81

This quotation gives a precise meaning to the notion of differential. Consider the vector function

t ↦ h(t) := (x(t), y(t)) : R → R² (3.3)

t ↦ g(t) := f(h(t)) = f(x(t), y(t)) : R → R.

The vector function h(t) = (x(t), y(t)) defines a path or a trajectory in R² as a function of t. Assuming, without loss of generality, that h(0) = (x, y), the trajectory goes through the point (x(0), y(0)) = (x, y), where the tangent vector is (x′(0), y′(0)). The differential at (x, y) exists when there exists a linear82 map L : R² → R such that g′(0) = L(h′(0)) for all paths h through (x, y). Note that the map L depends on (x, y) but is independent of the choice of the path h. This is what we call the geometric point of view. It can readily be extended to functions defined on manifolds by choosing paths lying in that manifold. If the paths are limited to lines through (x, y), we get the weaker notion of directional derivative in the direction (v, w) at (x, y) that is obtained by choosing paths of the form

t ↦ h(t) := (x + tv, y + tw) : R → R², (3.4)

80Jacques-Salomon Hadamard (1865–1963). He obtained important results on partial differential equations of mathematical physics. He was also one of the contributors to the elaboration of the modern theory of functional analysis.
81From the French: “Que signifie l'égalité (3.1)? Que si x, y et dès lors g = f(x, y) sont exprimés en fonction d'une variable auxiliaire quelconque t, on a, quelles que soient ces expressions, dg/dt = (dg/dx)(dx/dt) + (dg/dy)(dy/dt). (3.2) Tel est le sens unique de l'égalité (3.1). L'égalité (3.2) ayant lieu quelle que soit la variable indépendante en fonction de laquelle les deux autres variables sont exprimées, on supprime la mention de t. L'avantage précieux de la notation différentielle consiste précisément en la possibilité de ne pas préciser quelle est la variable que l'on considère comme indépendante.”
82See Definition 4.11 in Chapter 1.

which yields the function t ↦ g(t) = f(h(t)) = f(x + tv, y + tw) : R → R of a real variable that is formally differentiated at t = 0. Equation (3.2) now becomes

dg/dt(0) = d(f ∘ h)/dt(0) = (∂f/∂x)(h(0)) v + (∂f/∂y)(h(0)) w = ((∂f/∂x)(h(0)), (∂f/∂y)(h(0))) · (v, w).

Denote by L(v, w) the right-hand side of this identity and observe that it is an inner product. So, the function L : R² → R is linear with respect to the direction (v, w): for all α, β ∈ R and all (v1, w1), (v2, w2) ∈ R²,

L(αv1 + βv2, αw1 + βw2) = α L(v1, w1) + β L(v2, w2).

Retain from the quotation of Hadamard that he insists on two elements: (a) identity (3.2) must be verified for all paths h(t) = (x(t), y(t)) and not only along lines; (b) the differential must be linear with respect to the tangent vector h′(0) = (x′(0), y′(0)). We now give two examples to illustrate those considerations: one that satisfies (a) and (b), and one that satisfies (a) but not (b), illustrating that linearity is not a natural property. It must be imposed if wanted.

Example 3.1. Let f(x, y) = x² + 2y². It is easy to check that

dg/dt(0) = 2x x′(0) + 4y y′(0) = (2x, 4y) · (x′(0), y′(0))

for all paths h(t) = (x(t), y(t)) given by (3.3) satisfying h(0) = (x, y).
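Hadamard's requirement (a), validity along every path and not only along lines, can be checked numerically for Example 3.1. The sketch below (not from the book; the particular nonlinear path is our own choice) compares a central difference quotient of g = f ∘ h with the formula 2x x′(0) + 4y y′(0).

```python
import math

# Sketch for Example 3.1 (the path h is our own choice, not the book's):
# along the nonlinear path h(t) = (1 + t + t^2, 2 + sin t) through
# (x, y) = (1, 2), we have x'(0) = 1 and y'(0) = 1, so the derivative of
# g(t) = f(h(t)) at t = 0 should be 2x x'(0) + 4y y'(0) = 2 + 8 = 10.

def f(x, y):
    return x * x + 2.0 * y * y

def h(t):
    return (1.0 + t + t * t, 2.0 + math.sin(t))

def g(t):
    return f(*h(t))

t = 1e-6
numeric = (g(t) - g(-t)) / (2.0 * t)  # central difference quotient at 0
assert abs(numeric - 10.0) < 1e-4
print(numeric)  # ≈ 10.0
```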

Example 3.2 (Exercise 7.1). Let

f(x, y) := x³/(x² + y²) if (x, y) ≠ (0, 0), f(0, 0) := 0. (3.5)

The function f is continuous at every point of the plane (including (0, 0)), and for all paths h given by (3.3) satisfying h(0) = (0, 0), we have

dg/dt(0) = x′(0)³/[x′(0)² + y′(0)²] if (x′(0), y′(0)) ≠ (0, 0), and 0 if (x′(0), y′(0)) = (0, 0)
⇒ dg/dt(0) = f(x′(0), y′(0)).

Since f is not linear, g′(0) is not linear with respect to (x′(0), y′(0)). Specializing to paths along lines in the direction (v, w) through (x, y) = (0, 0),

t ↦ h(t) := (tv, tw) : R → R²,

consider the composition

g(t) := f(h(t)) = (tv)³/[(tv)² + (tw)²] = t v³/(v² + w²) ⇒ dg/dt(0) = v³/(v² + w²).


Observe that the map (v, w) ↦ L(v, w) = g′(0) : R² → R is well defined and independent of the choice of the path, but it is not linear in (v, w). We obtain this result in spite of the fact that the partial derivatives

(∂f/∂x)(0, 0) = 1 and (∂f/∂y)(0, 0) = 0

exist. So, we would have expected identity (3.2),

dg/dt(0) = ((∂f/∂x)(0, 0), (∂f/∂y)(0, 0)) · (dx/dt(0), dy/dt(0)) = (1, 0) · (v, w) = v ≠ v³/(v² + w²)

for w ≠ 0 and v ≠ 0. However, only the linearity is missing.
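The failure of linearity in Example 3.2 is easy to reproduce numerically. The sketch below (not from the book; the helper names are ours) computes the directional derivative L(v, w) = v³/(v² + w²) at the origin via the one-sided quotient and exhibits directions where additivity breaks.

```python
# Sketch for Example 3.2 (helper names are ours): at (0, 0), the limit of
# [f(tv, tw) - f(0, 0)]/t as t -> 0+ is v^3/(v^2 + w^2) -- well defined in
# every direction, yet not linear in (v, w).

def f(x, y):
    return x ** 3 / (x ** 2 + y ** 2) if (x, y) != (0.0, 0.0) else 0.0

def ddf(v, w, t=1e-9):
    # one-sided difference quotient along the line t -> (tv, tw)
    return (f(t * v, t * w) - f(0.0, 0.0)) / t

def L(v, w):
    return v ** 3 / (v ** 2 + w ** 2) if (v, w) != (0.0, 0.0) else 0.0

assert abs(ddf(1.0, 1.0) - L(1.0, 1.0)) < 1e-6   # L(1, 1) = 1/2
# additivity fails: L(1, 0) + L(0, 1) != L(1, 1)
assert abs(L(1.0, 0.0) + L(0.0, 1.0) - L(1.0, 1.0)) > 0.4
print(L(1.0, 0.0), L(0.0, 1.0), L(1.0, 1.0))  # 1.0 0.0 0.5
```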

The second example clearly illustrates that working with the notion of differential in dimensions greater than one does not always yield the equivalent of a derivative in dimension one and a gradient in higher dimensions. It is customary to say that the function of the second example is not differentiable at (x, y), even if there exists a differential independent of the path through (x, y). Hence, it is necessary to carefully review the notion of differential and determine how far it can be relaxed while preserving the basic elements of the differential calculus.

3.2 Semidifferentials, Differentials, Gradient, and Partial Derivatives

The privileged approach in this section is that of the first variation of the calculus of variations as opposed to the geometric approach of the previous section. We begin with the notions of semidifferentials and differentials. Then, we shall relate them to the previous geometric notion of differential in the next section. Several names will come up in that context: Karl Weierstrass (1815–1897), who is most likely the first to have given a correct definition of the differential of a function of several variables, Otto Stolz (1842–1905), James Pierpont (1866–1938), William Henry Young (1863–1942), Jacques Hadamard (1865–1963), Maurice Fréchet (1873–1973), and René Gateaux (1889–1914), as well as Paul Lévy (1886–1971), who carried out the publication of the work of Gateaux after his death in the early moments of the First World War (1914–1918).

3.2.1 Definitions

The previous definition for a real-valued function f : R → R generalizes to vector-valued functions when applied to each of its components. Starting from a function f : R^n → R^m, m ≥ 1, a point x, and a direction83 v ∈ R^n, we consider the vector function of the real variable t,

t ↦ g(t) := f(x + tv) : R → R^m, (3.6)

and we are back to the framework and conditions of section 2. However, this will not be sufficient, and some stronger definitions will be required that will involve the variation of f with respect to both t and v.

Definition 3.1. n n Let m ≥ 1 and n ≥ 1 be two integers, f : R → R a vector function, x ∈ R a point, and n v ∈ R a direction. n 83A direction is often interpreted as a vector v ∈ R of norm one. In this book, the term direction is used for any n vector v ∈ R including 0. Copyright © 2019 Society for Industrial and Applied Mathematics From Introduction to Optimization and Hadamard Semidifferential Calculus - Delfour (9781611975956)

3. Real-Valued Functions of Several Real Variables 87

(i) • f is Gateaux semidifferentiable at x in the direction v if

lim_{t↘0} [f(x + tv) − f(x)]/t exists in R^m. (3.7)

When the limit (3.7) exists, it will be denoted df(x; v). By definition, we get df(x; 0) = 0 and the positive homogeneity

∀α ≥ 0, df(x; αv) exists and df(x; αv) = α df(x; v).

• f is Gateaux semidifferentiable at x if df(x; v) exists for all v ∈ R^n.

• f is Gateaux differentiable⁸⁴ at x if the semidifferential df(x; v) exists in all directions v ∈ R^n and the map

v ↦ Df(x)v := df(x; v) : R^n → R^m (3.8)

is linear.⁸⁵

(ii) • f is Hadamard semidifferentiable⁸⁶ at x in the direction v if the following limit exists:

lim_{t↘0, w→v} [f(x + tw) − f(x)]/t exists in R^m. (3.9)

When the limit (3.9) exists, it will be denoted dH f(x; v). By definition, we have dH f(x; v) = df(x; v).

• f is Hadamard semidifferentiable at x if dH f(x; v) exists for all v ∈ R^n.

• f is Hadamard differentiable at x if the semidifferential dH f(x; v) exists in all directions v ∈ R^n and the map

v ↦ dH f(x)(v) := dH f(x; v) : R^n → R^m (3.10)

is linear. Hence, by definition, dH f(x; v) = Df(x)v.

It is clear that if dH f(x; v) exists, then df(x; v) exists and dH f(x; v) = df(x; v). However, even if df(x; 0) always exists and is equal to 0, dH f(x; 0) does not always exist, as shown in Example 3.6 (Figure 3.4).

Remark 3.1.
In Definition 3.1, checking the existence of the limit as (t, w) → (0, v), t ≠ 0, is equivalent to using all sequences {(tₙ, wₙ)}, tₙ ≠ 0, going to (0, v). For instance, dH f(x; v) exists if there exists q ∈ R^m such that, for all sequences {(tₙ, wₙ)}, tₙ ≠ 0, going to (0, v),

[f(x + tₙwₙ) − f(x)]/tₙ → q.

⁸⁴René Eugène Gateaux (1889–1914). In his birth certificate as in his texts, his letters, and his publications before his death (R. GATEAUX [1, 2, 3, 4, 5]), his name is spelled without a circumflex accent (see L. MAZILAK [1], M. BARBUT, B. LOCKER, L. MAZILAK and P. PRIOURET [1], L. MAZILAK and R. TAZZIOLI [1]). The accent appeared in his three posthumous publications from 1919 to 1922 in the Bulletin de la Société Mathématique de France, probably by homonymy with the French word “gâteaux” (cakes) (see L. MAZILAK [2]). His important unpublished work was left in Jacques Hadamard’s care. It was completed by Paul Lévy, who saw to its publication in the Bulletin de la Société Mathématique de France (see R. GATEAUX [6, 7, 8]). Gateaux was awarded (post-mortem) the Francœur Prize of the Academy of Sciences in 1916 for his work on the functional calculus. Paul Lévy (1886–1971) made Gateaux’s work known in his Leçons d’analyse fonctionnelle (1922) (P. LÉVY [1]).
⁸⁵Cf. Definition 4.11 of Chapter 1.
⁸⁶Cf. the comments at the end of section 3.3.

88 Chapter 3. Semidifferentiability, Differentiability, Continuity, and Convexities

In finite dimension the linearity of v ↦ df(x; v) : R^n → R^m implies its continuity, but we also have the continuity for Hadamard semidifferentiable functions.

Theorem 3.1. If f : R^n → R^m is Hadamard semidifferentiable at x ∈ R^n in all directions v ∈ R^n, then the function

v ↦ dH f(x; v) : R^n → R^m (3.11)

is continuous and positively homogeneous.

Proof. We have already seen that, by definition, the function (3.11) is positively homogeneous. Again by definition of dH f(x; v), for all ε > 0, there exists δ > 0 such that

∀t, 0 < t < δ, ∀w, ‖w − v‖ < δ,  ‖[f(x + tw) − f(x)]/t − dH f(x; v)‖ < ε.

So, we can go to the limit as t ↘ 0 to get df(x; w):

∀w, ‖w − v‖ < δ,  lim_{t↘0} ‖[f(x + tw) − f(x)]/t − dH f(x; v)‖ ≤ ε
⇒ ∀w, ‖w − v‖ < δ,  ‖df(x; w) − dH f(x; v)‖ ≤ ε.

Since df(x; w) = dH f(x; w), we get the continuity at each v ∈ R^n.

Example 3.3.
Consider the square f(x) = ‖x‖² of the norm ‖x‖ = √(x · x) of x in R^n. It is continuous on R^n. For all (t, w) → (0, v), (t, w) ≠ (0, v),

[f(x + tw) − f(x)]/t = [‖x + tw‖² − ‖x‖²]/t = [(2x + tw) · tw]/t = (2x + tw) · w → 2x · v

by continuity of the inner product. Therefore, dH f(x; v) exists for all x and all v,

dH f(x; v) = 2x · v,

and as v ↦ dH f(x; v) is linear⁸⁷ in v, f is Hadamard differentiable.
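As a numerical illustration of Example 3.3 (a sketch with helper names of my own, not from the book), the Hadamard difference quotient of f(x) = ‖x‖² with t small and w close to v is already close to 2x · v:

```python
# Numerical check (illustration): for f(x) = ||x||^2, the quotient
# [f(x + t w) - f(x)]/t with t small and w near v approaches 2 x . v.

def sq_norm(x):
    return sum(xi * xi for xi in x)

def quotient(x, w, t):
    xt = [xi + t * wi for xi, wi in zip(x, w)]
    return (sq_norm(xt) - sq_norm(x)) / t

x, v = [1.0, -2.0], [0.5, 3.0]
t, w = 1e-6, [0.5 + 1e-6, 3.0 - 1e-6]    # w is a small perturbation of v
approx = quotient(x, w, t)
exact = 2 * (x[0] * v[0] + x[1] * v[1])  # 2 x . v = -11.0
print(approx, exact)
```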

The second example is the norm, which is Hadamard semidifferentiable at 0 with a semidifferential that is not linear with respect to the direction v.

Example 3.4 (Figure 3.2).
Consider the norm f(x) = ‖x‖ = √(x · x) of x in R^n. It is continuous on R^n. First consider the case x ≠ 0. For all w → v and t > 0,

[f(x + tw) − f(x)]/t = [‖x + tw‖ − ‖x‖]/t = ([‖x + tw‖² − ‖x‖²]/t) · 1/(‖x + tw‖ + ‖x‖)
→ 2x · v · 1/(2‖x‖) = (x/‖x‖) · v

by continuity of the norm. Therefore, dH f(x; v) exists,

dH f(x; v) = (x/‖x‖) · v,

⁸⁷Cf. Definition 4.11 of Chapter 1.


Figure 3.2. The function f(x) = |x| in a neighborhood of x = 0 for n = 1.

and since dH f(x; v) is linear with respect to v, the norm is Hadamard differentiable at every point x ≠ 0. We now show that f has a Hadamard semidifferential at x = 0 that is not linear in v. Indeed, for all w → v and t > 0,

[f(0 + tw) − f(0)]/t = ‖tw‖/t = (t/t) ‖w‖ = ‖w‖ → ‖v‖

by continuity of the norm. Therefore, dH f(0; v) exists,

dH f(0; v) = ‖v‖

for all directions v, but the map v ↦ dH f(0; v) is not linear.
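Example 3.4 can also be checked numerically (a sketch, not from the book): at 0 the quotient returns ‖w‖ for every t > 0, and the failure of linearity shows up as dH f(0; v) + dH f(0; −v) ≠ 0:

```python
# Numerical check (illustration): for f(x) = ||x||, the quotient
# ||0 + t w||/t equals ||w|| for every t > 0, so dH f(0; v) = ||v||;
# this map is positively homogeneous but not linear in v.

import math

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def quotient_at_zero(w, t):
    return norm([t * wi for wi in w]) / t

d = quotient_at_zero([3.0, 4.0], 1e-9)          # ||(3, 4)|| = 5.0
d_minus = quotient_at_zero([-3.0, -4.0], 1e-9)  # also 5.0
print(d, d_minus, d + d_minus)   # linearity would force d + d_minus = 0
```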

We now complete the definitions by introducing the notion of directional derivative and the special case of partial derivatives.

Definition 3.2.
Let f : R^n → R, x ∈ R^n, and v ∈ R^n (a direction).

(i) f has a derivative in the direction v at the point x if the following limit exists:

lim_{t→0} [f(x + tv) − f(x)]/t. (3.12)

By definition, we get df(x; −v) = −df(x; v) and the homogeneity

∀α ∈ R, df(x; αv) exists and df(x; αv) = α df(x; v).

(ii) Let {eᵢ : 1 ≤ i ≤ n}, (eᵢ)ⱼ = δᵢⱼ, be the canonical orthonormal basis in R^n. f has partial derivatives at x if, for each i, f is differentiable at x in the direction eᵢ, that is,

lim_{t→0} [f(x + t eᵢ) − f(x)]/t exists.

The limit is denoted ∂ᵢf(x) or ∂f/∂xᵢ(x). By definition, ∂ᵢf(x) = df(x; eᵢ) and the function α ↦ df(x; α eᵢ) : R → R is homogeneous.

The intermediate notion of directional derivative is of limited interest since it hides the more fundamental notion of semidifferential and is not yet the Gateaux differential.


3.2.2 Examples and Counterexamples

It is useful to consider a series of examples to fully appreciate the differences between the previous definitions:

(i) In general, a continuous function with partial derivatives may not be semidifferentiable in all directions v ∈ R^n (see Example 3.5).

(ii) df(x; 0) always exists and df(x; 0) = 0, but dH f(x; 0) does not necessarily exist (see Example 3.6 and Theorem 3.8, which will establish that when dH f(x; 0) exists, the function f is continuous at x).

(iii) df(x; v) can exist in all directions while dH f(x; v) does not (see Example 3.6).

(iv) dH f(x; v) can exist in all directions without the linearity of the map v ↦ dH f(x; v) (Example 3.7 is the same as Example 3.2 already introduced in section 3.1).

Moreover, the following diagram summarizes the relations (the broken implications refer to the counterexamples):

f Hadamard differentiable at x  ⇒  f continuous at x   (⇍: Example 3.5)
        ⇓  (⇑ fails: Counterexample 3.6)
f Gateaux differentiable at x  ⇏  f continuous at x   (⇏: Example 3.6; ⇍: Example 3.5)
        ⇓  (⇑ fails: Counterexample 3.7)
f differentiable in all directions at x  ⇏  f continuous at x   (⇏: Example 3.6; ⇍: Example 3.5)

In general, the converse statements are not true as will be shown in the examples. The first example is a function which is continuous with partial derivatives at (0, 0) and is not Gateaux semidifferentiable in all directions.

Example 3.5 (Figure 3.3).
Consider the function f : R² → R:

f(x, y) = (xy)^{1/3} if xy ≥ 0,   f(x, y) = −|xy|^{1/3} if xy < 0.

It is continuous on R². Compute its directional derivative at (0, 0) in the direction v = (v₁, v₂). For v₁v₂ ≥ 0 and t > 0,

[f(0 + tv) − f(0)]/t = (tv₁ tv₂)^{1/3}/t = (1/t^{1/3}) (v₁v₂)^{1/3},

and for v₁v₂ < 0,

[f(0 + tv) − f(0)]/t = −|tv₁ tv₂|^{1/3}/t = −(1/t^{1/3}) |v₁v₂|^{1/3}.

The differential quotients converge if and only if v₁v₂ = 0. Therefore,

df((0, 0); (v₁, v₂)) = 0 if v₁v₂ = 0


Figure 3.3. Example 3.5.

and does not exist for v₁v₂ ≠ 0. In fact, under the condition v₁v₂ = 0, we have

lim_{t→0} [f(0 + tv) − f(0)]/t = 0,

that is, a semidifferential and even a directional derivative. Specializing to the canonical orthonormal basis e₁ = (1, 0) and e₂ = (0, 1) of R², we get

∂f/∂x₁ (0, 0) = df((0, 0); e₁) = 0 and ∂f/∂x₂ (0, 0) = df((0, 0); e₂) = 0,

where ∂f/∂x₁(0, 0) and ∂f/∂x₂(0, 0) are the partial derivatives of f at (0, 0). This example shows that, for directions (v₁, v₂), v₁v₂ ≠ 0, the semidifferential df((0, 0); (v₁, v₂)) does not exist and, hence, is not equal to

(∂f/∂x₁ (0, 0), ∂f/∂x₂ (0, 0)) · (v₁, v₂).

The second example is both rich and important. The function is discontinuous and Gateaux differentiable, but not Hadamard semidifferentiable at (0, 0). In particular, dH f(x; 0) does not exist, even if df(x; 0) exists and df(x; 0) = 0.
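Before turning to it, the behavior of Example 3.5 is easy to reproduce numerically (a sketch; the helper names are mine, not from the book): both partial quotients vanish identically, while the quotient in the direction (1, 1) behaves like t^(−1/3):

```python
# Numerical check (illustration): f(x, y) = sign(xy) |xy|^(1/3) has both
# partial derivatives equal to 0 at (0, 0), but the difference quotient
# in a direction with v1 v2 != 0 blows up like t^(-1/3).

def f(x, y):
    p = x * y
    return (1.0 if p >= 0 else -1.0) * abs(p) ** (1.0 / 3.0)

def quotient(v1, v2, t):
    return (f(t * v1, t * v2) - f(0.0, 0.0)) / t

q_e1 = quotient(1.0, 0.0, 1e-6)      # direction e1: exactly 0
q_e2 = quotient(0.0, 1.0, 1e-6)      # direction e2: exactly 0
q_diag_1 = quotient(1.0, 1.0, 1e-6)  # t^(-1/3): about 100
q_diag_2 = quotient(1.0, 1.0, 1e-9)  # about 1000, diverging as t -> 0
print(q_e1, q_e2, q_diag_1, q_diag_2)
```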

Example 3.6 (Figure 3.4).
Consider the function f : R² → R defined as

f(x, y) := x⁶/((y − x²)² + x⁸) if (x, y) ≠ (0, 0),   f(0, 0) := 0.


Figure 3.4. Examples 3.6 and 3.8 in logarithmic scale.

It is directionally differentiable and even Gateaux differentiable at x = (0, 0), but not Hadamard semidifferentiable at (0, 0) in the directions (0, 0) and (1, 0), and f is not continuous at (0, 0). First compute the directional derivative of f at (0, 0). For v = (v₁, v₂) ≠ (0, 0), consider the following two cases: v₂ = 0 and v₂ ≠ 0. If v₂ = 0 and v₁ ≠ 0,

[f(tv₁, 0) − f(0, 0)]/t = (1/t) (tv₁)⁶/((0 − (tv₁)²)² + (tv₁)⁸) = (1/t) (tv₁)⁶/((tv₁)⁴ + (tv₁)⁸) = t v₁²/(1 + (tv₁)⁴),

and as t goes to 0, df((0, 0); (v₁, 0)) = 0.

If v₂ ≠ 0,

[f(tv₁, tv₂) − f(0, 0)]/t = (1/t) (tv₁)⁶/((tv₂ − (tv₁)²)² + (tv₁)⁸) = t³ v₁⁶/((v₂ − tv₁²)² + t⁶v₁⁸),

and as t goes to 0, df((0, 0); (v₁, v₂)) = 0 if v₂ ≠ 0. Therefore,

∀v = (v₁, v₂) ∈ R²,  df((0, 0); (v₁, v₂)) = 0,

the map (v₁, v₂) ↦ df((0, 0); (v₁, v₂)) : R² → R is linear, and f is Gateaux differentiable. To show that dH f((0, 0); (0, 0)) does not exist, choose the following sequences:

tₙ = 1/n ↘ 0 and wₙ = (1/n, 1/n³) → (0, 0) as n → +∞.

Consider the quotient

qₙ := [f((0, 0) + tₙwₙ) − f(0, 0)]/tₙ.


As n → ∞,

qₙ = n f(1/n², 1/n⁴) = n (1/n²)⁶/(1/n²)⁸ = n⁵ → +∞,

and dH f((0, 0); (0, 0)) does not exist. It is also readily seen that f is discontinuous at x = (0, 0) by following the path (α, α²) as α goes to 0:

|f(α, α²) − f(0, 0)| = α⁶/α⁸ = 1/α² → +∞ when α → 0.

To show that dH f((0, 0); (1, 0)) does not exist, compute the differential quotient for two different sequences (tₙ, wₙ) → (0, v) and show that the two limits are different. First choose wₙ = (1, 0) and tₙ = 1/n for all n:

[f(tₙwₙ) − f(0, 0)]/tₙ = n (1/n)⁶/((1/n)⁴ + (1/n)⁸) = (1/n)/(1 + (1/n)⁴) → 0.

Then choose wₙ = (1, 1/n) and tₙ = 1/n for all n:

[f(tₙwₙ) − f(0, 0)]/tₙ = n (1/n)⁶/(1/n)⁸ = n³ → +∞.

The last example is the same as Example 3.2 already introduced in section 3.1. It is a continuous Hadamard semidifferentiable function whose semidifferential is not linear with respect to the direction.
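The two regimes in Example 3.6 can be reproduced numerically (a sketch with my own helper names, not from the book): along any fixed direction the quotient tends to 0, while along the moving directions wₙ = (1, 1/n), which track the parabola y = x², it blows up like n³:

```python
# Numerical check (illustration): f(x, y) = x^6/((y - x^2)^2 + x^8).
# Fixed-direction (Gateaux) quotients tend to 0, but with t_n = 1/n and
# w_n = (1, 1/n) the point t_n w_n rides the parabola y = x^2 and the
# Hadamard quotient grows like n^3.

def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x**6 / ((y - x**2) ** 2 + x**8)

def quotient(t, w1, w2):
    return (f(t * w1, t * w2) - f(0.0, 0.0)) / t

q_fixed = quotient(1e-4, 1.0, 1.0)          # fixed direction (1, 1): tiny
q10 = quotient(1.0 / 10, 1.0, 1.0 / 10)     # n = 10: about n^3 = 1000
q100 = quotient(1.0 / 100, 1.0, 1.0 / 100)  # n = 100: about 10^6
print(q_fixed, q10, q100)
```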

Example 3.7.
Consider the function

f(x, y) = x³/(x² + y²) if (x, y) ≠ (0, 0),   f(0, 0) = 0,

shown in Figure 3.5. It is readily seen that it is continuous at (0, 0):

|x³/(x² + y²)| = |x| x²/(x² + y²) ≤ |x| ≤ √(x² + y²) → 0 as (x, y) → (0, 0).

For v = (v₁, v₂) ≠ 0 and w = (w₁, w₂) → v = (v₁, v₂),

[f(tw) − f(0)]/t = (1/t) (tw₁)³/((tw₁)² + (tw₂)²) = w₁³/(w₁² + w₂²) = f(w₁, w₂) → f(v₁, v₂) = v₁³/(v₁² + v₂²)

and necessarily

∀v ∈ R²,  dH f(0; v) = v₁³/(v₁² + v₂²) if (v₁, v₂) ≠ (0, 0) and dH f(0; v) = 0 if (v₁, v₂) = (0, 0); that is, dH f(0; v) = f(v₁, v₂).

So f is not Gateaux differentiable since v ↦ dH f(0; v) is not linear.
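Numerically (a sketch, not from the book), the semidifferential of Example 3.7 at the origin reproduces f itself on directions, and additivity fails:

```python
# Numerical check (illustration): for f(x, y) = x^3/(x^2 + y^2), the
# Hadamard semidifferential at the origin is dH f(0; v) = f(v1, v2),
# positively homogeneous but not additive, hence not linear.

def f(x, y):
    return x**3 / (x**2 + y**2) if (x, y) != (0.0, 0.0) else 0.0

def dH_at_zero(v1, v2, t=1e-8):
    return (f(t * v1, t * v2) - f(0.0, 0.0)) / t

a = dH_at_zero(1.0, 0.0)   # f(1, 0) = 1.0
b = dH_at_zero(0.0, 1.0)   # f(0, 1) = 0.0
c = dH_at_zero(1.0, 1.0)   # f(1, 1) = 0.5
print(a, b, c)             # additivity fails: c != a + b
```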


Figure 3.5. Example 3.7.

3.2.3 Gradient, Jacobian Mapping, and Jacobian Matrix

Let {eᵢ : 1 ≤ i ≤ n}, (eᵢ)ⱼ = δᵢⱼ, be the canonical orthonormal basis in R^n. Any element x = (x₁, . . . , xₙ) ∈ R^n can be written in the form x = Σᵢ₌₁ⁿ xᵢeᵢ and the inner product is

x · y := Σᵢ₌₁ⁿ xᵢyᵢ. (3.13)

Similarly, any direction v = (v₁, . . . , vₙ) ∈ R^n can be written as

v = Σᵢ₌₁ⁿ vᵢeᵢ and vᵢ = v · eᵢ.

As f is Gateaux differentiable at x, the map v ↦ df(x; v) : R^n → R is linear. Therefore,

df(x; v) = df(x; Σᵢ₌₁ⁿ vᵢeᵢ) = Σᵢ₌₁ⁿ vᵢ df(x; eᵢ) = g(x) · v,

where

g(x) := Σᵢ₌₁ⁿ df(x; eᵢ) eᵢ ∈ R^n.

The vector g(x) is unique. Indeed, if there exist g₁ and g₂ in R^n such that

∀v ∈ R^n,  g₁ · v = df(x; v) = g₂ · v,

then for all v ∈ R^n we have (g₁ − g₂) · v = 0 and hence g₁ = g₂.


Definition 3.3.
Let f : R^n → R be Gateaux differentiable at x ∈ R^n. The gradient of f at x is the unique vector ∇f(x) of R^n such that

∀v ∈ R^n,  ∇f(x) · v = df(x; v). (3.14)

In particular,

∇f(x) = (∂₁f(x), . . . , ∂ₙf(x)),

where ∂ᵢf(x) = df(x; eᵢ) is the partial derivative of f at x in the direction eᵢ.

Example 3.5 shows that, even if the partial derivatives exist, the gradient may not exist, and, a fortiori, semidifferentials df(x; v) may not exist in some directions v.

Definition 3.4.
Let {eⱼ : j = 1, . . . , n} and {eᵢ : i = 1, . . . , m} be the canonical orthonormal bases in R^n and R^m, respectively. If f : R^n → R^m is Gateaux differentiable at x, the linear mapping Df(x) : R^n → R^m given by (3.8) is called the Jacobian mapping⁸⁸ of f at x. The m × n matrix associated with Df(x),

Df(x)ᵢⱼ := eᵢ · df(x; eⱼ) = ∂ⱼfᵢ(x), 1 ≤ i ≤ m, 1 ≤ j ≤ n,

is called the Jacobian matrix. When m = n, the absolute value of the determinant of the matrix Df(x) is called the Jacobian of f at x.

Remark 3.2.
When f : R^n → R, the matrix Df(x)₁ⱼ = ∂ⱼf(x) of dimension 1 × n corresponds to the gradient ∇f(x) = (∂₁f(x), . . . , ∂ₙf(x)) and

∀v ∈ R^n,  Df(x)v = ∇f(x) · v. (3.15)
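In computations, the gradient and the Jacobian matrix can be approximated column by column from the quotients defining df(x; eⱼ). A minimal sketch, for a sample map of my own choosing (not from the book):

```python
# Sketch (illustration): approximate the Jacobian matrix columnwise via
# Df(x)_ij ~ [f_i(x + t e_j) - f_i(x)]/t, i.e. from the semidifferentials
# df(x; e_j) in the canonical directions e_j.

import math

def f(x):  # sample map f : R^2 -> R^2 (my own choice)
    return [x[0] * x[1], math.sin(x[0])]

def jacobian(f, x, t=1e-7):
    fx = f(x)
    m, n = len(fx), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xt = list(x)
        xt[j] += t            # x + t e_j
        ft = f(xt)
        for i in range(m):
            J[i][j] = (ft[i] - fx[i]) / t
    return J

J = jacobian(f, [1.0, 2.0])
# exact Jacobian at (1, 2): [[2, 1], [cos(1), 0]]
print(J)
```

For m = 1 the single row of this matrix is the gradient of (3.15).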

3.2.4 Fréchet Differential

In this section we turn to the notion of differential that is usually found in contemporary books in analysis. For terminological reasons, it will be referred to as Fréchet differentiability. In finite dimension, it is equivalent to Hadamard differentiability, which is simpler to characterize and can be extended to infinite-dimensional topological vector spaces that do not have a metric structure.

Definition 3.5.
f : R^n → R is Fréchet differentiable⁸⁹ at x ∈ R^n if there exists a linear⁹⁰ map L(x) : R^n → R such that

lim_{v→0} [f(x + v) − f(x) − L(x)v]/‖v‖ = 0. (3.16)

⁸⁸Carl Gustav Jacob Jacobi (1804–1851), brother of the physicist Moritz Hermann von Jacobi. He established the theory of functional determinants, which are now called Jacobians.
⁸⁹Maurice René Fréchet (1873–1973) made important contributions to real analysis and built the foundations of abstract spaces. He wrote his thesis under the supervision of Hadamard in 1906. He introduced the concept of metric space and the abstract formulation of compactness.
⁹⁰The map v ↦ f(x) + L(x)v can be seen as an affine approximation of f(x + v) at (x, f(x)) at the infinitesimal scale.


Remark 3.3.
This definition was initially given by M. FRÉCHET [1] in 1911 in the context of functionals, that is, functions of functions. But, in finite dimension, his definition is equivalent to the earlier notion of total differential used by O. Stolz⁹¹ in 1893, J. Pierpont⁹² in 1905, and W. H. Young⁹³ in 1908–1909:

“In fact, an equivalent definition had been given in 1908 by M. W. H. YOUNG [1, p. 157], [2, p. 21], who had, besides, explicitly developed the consequences.” (translated from M. FRÉCHET [2]).⁹⁴

“But, I noticed that this definition was already in Stolz, Grundzüge der Differential- und Integral-Rechnung, t. I, p. 133, and in James Pierpont, The theory of functions of real variables, t. I, p. 268. But, it is W. H. Young who was the first to truly show all the advantages in his small Book: The fundamental theorems of Differential Calculus and in a few Mémoires.” (translated from M. FRÉCHET [3]).⁹⁵

According to V. M. TIHOMIROV [1], “the correct definitions of derivative and differential of a function of many variables were given by K. Weierstrass96 in his lectures in the eighties of the 19th century. These lectures were published in the thirties of our century (20th). The correct def- initions of the derivative in the multidimensional case appear also at the beginning of the century in some German and English text-books (Stolz, Young) under the influence of Weierstrass.” He does not provide a more specific reference but both of them had had contacts with Weierstrass over long stays in Germany.

It is readily seen that a Fréchet differentiable function at x is Gateaux differentiable at x and that

df(x; v) = L(x)v ∀v ∈ R^n.

Indeed, the result is true for v = 0 since df(x; 0) = 0 = L(x)0. For v ≠ 0 and t > 0,

tv → 0 as t ↘ 0.

As f is Fréchet differentiable at x,

lim_{t↘0} [f(x + tv) − f(x) − L(x)(tv)]/‖tv‖ = 0.

Since t > 0 and v ≠ 0, eliminate ‖v‖ and rewrite the expression as

lim_{t↘0} [f(x + tv) − f(x)]/t − L(x)v = 0,

which is the semidifferential df(x; v) of f at x in the direction v, and hence

df(x; v) = lim_{t↘0} [f(x + tv) − f(x)]/t = L(x)v.

⁹¹Otto Stolz (1842–1905) (see O. STOLZ [1, p. 133]).
⁹²James Pierpont (1866–1938) (see J. PIERPONT [1, p. 268]).
⁹³William Henry Young (1863–1942) (see W. H. YOUNG [1, p. 157], [2, p. 21]).
⁹⁴From the French: “En fait, une définition équivalente avait été donnée en 1908 par M. W. H. YOUNG [1, p. 157], [2, p. 21], qui avait, en outre, développé explicitement les conséquences.”
⁹⁵From the French: “Mais je me suis aperçu qu’on trouve déjà cette définition dans Stolz, Grundzüge der Differential und Integral-Rechnung, t. I, p. 133, et James Pierpont, The theory of functions of real variables, t. I, p. 268. Mais c’est W. H. Young qui en a véritablement montré le premier tous les avantages dans son petit Livre: The fundamental theorems of Differential Calculus et dans quelques Mémoires.”
⁹⁶Karl Theodor Wilhelm Weierstrass (1815–1897).


This is precisely the definition of the semidifferential of f at x in the direction v. Since the map L(x) : R^n → R is linear, f is Gateaux differentiable at x and, by Definition 3.3 of the gradient,

∀v ∈ R^n,  ∇f(x) · v = df(x; v) = L(x)v.

The next example shows that a Gateaux differentiable function f at x is not necessarily Fréchet differentiable at x, and not even continuous at x.

Example 3.8.
Go back to the function f : R² → R of Example 3.6 defined by

f(x, y) := x⁶/((y − x²)² + x⁸) if (x, y) ≠ (0, 0),   f(0, 0) := 0.

It was shown in Example 3.6 that df((0, 0); (v₁, v₂)) = 0 for all (v₁, v₂), that f is Gateaux differentiable at (0, 0), and that f is discontinuous at (0, 0). We now show that f is not Fréchet differentiable at (0, 0). Choose

v(α) = (α, α²), α ≠ 0.

As α goes to 0, v(α) goes to (0, 0). Compute the Fréchet quotient

q(α) := [f(α, α²) − f(0, 0) − df((0, 0); (α, α²))]/‖(α, α²)‖ = 1/(|α|³ (1 + α²)^{1/2}) → +∞ as α → 0.

So, the function f is not Fréchet differentiable at (0, 0).
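The divergence of the Fréchet quotient along v(α) = (α, α²) can be observed directly (a sketch; the helper is mine, not from the book):

```python
# Numerical check (illustration): along v(a) = (a, a^2), the Fréchet
# quotient of Example 3.8 behaves like 1/(|a|^3 sqrt(1 + a^2)) -> +infinity.

import math

def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x**6 / ((y - x**2) ** 2 + x**8)

def frechet_quotient(a):
    x, y = a, a * a
    # df((0, 0); .) = 0, so the numerator reduces to |f(v(a)) - f(0, 0)|
    return abs(f(x, y) - f(0.0, 0.0)) / math.sqrt(x * x + y * y)

q1 = frechet_quotient(0.1)    # about 995
q2 = frechet_quotient(0.01)   # about 1e6: the quotient diverges
print(q1, q2)
```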

The next theorem relates the Fréchet and the Hadamard differentials.

Theorem 3.2. Let x ∈ R^n and f : R^n → R. The following conditions are equivalent:

(i) f is Fréchet differentiable at x;

(ii) f is Hadamard differentiable at x.

In dimension n = 1, the Fréchet, Hadamard, and Gateaux differentiabilities at x coincide and correspond to the derivative at x of Definition 2.1(iii).

Proof. (i) ⇒ (ii). Consider the quotient

q(t, w) := [f(x + tw) − f(x)]/t

as w → v and t ↘ 0. We have h(t, w) := tw → 0 since w → v and t ↘ 0. Then

q(t, w) = Q(h(t, w)) ‖w‖ + L(x)w,

where

Q(h) := [f(x + h) − f(x) − L(x)h]/‖h‖ if h ≠ 0 and Q(0) := 0. (3.17)


Since f is Fréchet differentiable at x and h(t, w) = tw → 0, Q(h(t, w)) → 0. Moreover, by continuity of L(x), L(x)w → L(x)v as w → v. Therefore,

lim_{t↘0, w→v} q(t, w) = L(x)v,

dH f(x; v) exists, and dH f(x; v) = L(x)v is linear (and continuous) with respect to v.

(ii) ⇒ (i). As f is Hadamard differentiable, dH f(x; v) exists for all v and the map v ↦ L(x)v := dH f(x; v) is linear. Letting

Q := lim sup_{‖h‖→0} |Q(h)|,

there exists a sequence {hₙ}, 0 ≠ hₙ → 0, such that |Q(hₙ)| converges to Q ∈ [0, ∞]. Since {h/‖h‖ : h ∈ R^n, h ≠ 0} is the compact sphere S of radius 1 in R^n, there exist a subsequence {hₙₖ} and a point v ∈ S such that

wₙₖ := hₙₖ/‖hₙₖ‖ → v ∈ S.

As h ≠ 0,

Q(h) = Q(‖h‖ h/‖h‖) = [f(x + ‖h‖ h/‖h‖) − f(x)]/‖h‖ − L(x) h/‖h‖.

Since dH f(x; v) exists and L(x)wₙₖ → L(x)v = dH f(x; v), choosing the subsequence tₙₖ := ‖hₙₖ‖ that goes to 0, we get

Q(hₙₖ) = [f(x + tₙₖwₙₖ) − f(x)]/tₙₖ − L(x)wₙₖ → dH f(x; v) − L(x)v = dH f(x; v) − dH f(x; v) = 0

⇒ |Q(hₙₖ)| → 0 and Q = lim sup_{h→0} |Q(h)| = 0.

Since |Q(h)| ≥ 0 and the limsup Q is equal to zero, the limsup is equal to the limit, and the limit of the quotient Q(h) exists and is 0 as h goes to 0. By definition, f is Fréchet differentiable at x.

3.3 Hadamard Differential and Semidifferential

By using the terminology Hadamard differentiability in Definition 3.1(ii), we have slightly cheated. This terminology rather applies to the geometric notion of a differential in the quotation of section 3.1, which is the original differential in the sense of Hadamard, while Definition 3.1(ii) is merely an equivalent characterization, as can be seen from the equivalence of (a) and (c) in part (ii) of the following theorem.

Theorem 3.3. Let f : R^n → R^m and x ∈ R^n.

(i) Given a direction v ∈ R^n, the following conditions are equivalent:

(a) dH f(x; v) exists;
(b) there exists g(x, v) ∈ R^m such that, for all paths h : [0, ∞[ → R^n for which h(0) = x and dh(0; +1) = v, d(f ◦ h)(0; +1) exists and d(f ◦ h)(0; +1) = g(x, v).


(ii) The following conditions are equivalent:

(a) f is Hadamard semidifferentiable at x;
(b) there exists a positively homogeneous function g(x) : R^n → R^m such that, for all functions h : [0, +∞[ → R^n for which h(0) = x and dh(0; +1) exists, d(f ◦ h)(0; +1) exists and d(f ◦ h)(0; +1) = g(x)(dh(0; +1)).

(iii) The following conditions are equivalent:

(a) f is Hadamard differentiable at x;
(b) there exists a linear map L(x) : R^n → R^m such that, for all paths h : R → R^n for which h(0) = x and dh(0; +1) exists, d(f ◦ h)(0; +1) exists and

d(f ◦ h)(0; +1) = L(x) dh(0; +1); (3.18)

(c) there exists a linear map L(x) : R^n → R^m such that, for all paths h : R → R^n for which h(0) = x and h′(0) exists, (f ◦ h)′(0) exists and

(f ◦ h)′(0) = L(x) h′(0). (3.19)

Proof. It is sufficient to prove the equivalence of (a) and (b) in part (i). The equivalence of (a) and (b) in part (ii) is a consequence of part (i), and so is the equivalence of (a) and (b) in part (iii). It will remain to prove the equivalence of (b) and (c) in part (iii) to complete the proof.

(i) (a) ⇒ (b). As dh(0; +1) exists, for any sequence {tₙ > 0} going to 0,

wₙ := [h(tₙ) − h(0)]/tₙ → dh(0; +1)  ⇒  h(tₙ) = x + tₙwₙ.

But, since dH f(x; v) exists, we have for any sequences {tₙ > 0}, tₙ ↘ 0, wₙ → v,

[f(h(tₙ)) − f(h(0))]/tₙ = [f(x + tₙwₙ) − f(x)]/tₙ → dH f(x; v),

so d(f ◦ h)(0; +1) = dH f(x; v) exists and we can choose g(x, v) = dH f(x; v).

(b) ⇒ (a). By contradiction. Assume that there exist v ∈ R^n and sequences wₙ → v and {tₙ}, tₙ ↘ 0, such that the sequence of differential quotients

qₙ := [f(x + tₙwₙ) − f(x)]/tₙ

does not converge to g(x, v). So, there exists η > 0 such that, for all k ≥ 1, there exists nₖ ≥ k such that ‖qₙₖ − g(x, v)‖ ≥ η. To simplify the notation, denote by (tₙ, wₙ) the subsequence (tₙₖ, wₙₖ).

Construct the following new subsequence (tₙₖ, wₙₖ). Let n₁ be the first n ≥ 1 such that tₙ ≤ 1. Let n₂ be the first n > n₁ such that tₙ ≤ tₙ₁/2. At step k + 1, let nₖ₊₁ be the first n > nₖ such that tₙ ≤ tₙₖ/2. By construction, nₖ₊₁ > nₖ and tₙₖ₊₁ ≤ tₙₖ/2 < tₙₖ. The subsequence {(tₙₖ, wₙₖ)} is such that {tₙₖ} is monotone strictly decreasing to 0 and wₙₖ → v. As a result, we can assume, without loss of generality, that for our initial sequence {(tₙ, wₙ)} the sequence {tₙ} is monotone strictly decreasing. As {wₙ} is convergent, there exists a constant c such that ‖wₙ‖ ≤ c for all n. For all ε > 0, there exists N such that

∀n > N,  ‖wₙ − v‖ < ε and tₙ < ε/c.


Define the function h : R → R^n as follows:

h(t) := x + tv if t ≤ 0;  x + twₙ if tₙ₊₁ < t ≤ tₙ, n ≥ 1;  x + tw₁ if t₁ < t.

The vector function h is continuous at t = 0. Indeed, it is continuous from the left since h(t) = x + tv → x = h(0) as t < 0 goes to 0. On the right, for δ = t_{N+1} > 0 and 0 < t < δ, there exists n > N such that tₙ₊₁ < t ≤ tₙ, and hence

‖h(t) − h(0)‖ = t ‖wₙ‖ ≤ t c < t_{N+1} c ≤ (ε/c) c = ε,

and h is continuous from the right at 0. For the derivative from the right, for δ = t_{N+1} > 0 and 0 < t < δ, there exists n > N such that tₙ₊₁ < t ≤ tₙ and

‖[h(t) − h(0)]/t − v‖ = ‖wₙ − v‖ < ε,

and dh(0; +1) = v. For the derivative from the left, we have dh(0; −1) = −v, so −dh(0; −1) = v = dh(0; +1) and, in fact, the derivative exists and h′(0) = v. But, by hypothesis, for such a function h, d(f ◦ h)(0; +1) exists and is equal to g(x, v). On the other hand, by construction of the function h,

qₙ = [f(x + tₙwₙ) − f(x)]/tₙ = [f(h(tₙ)) − f(h(0))]/tₙ → d(f ◦ h)(0; +1) = g(x, v).

This contradicts our initial assumption that qₙ does not converge to g(x, v).

(ii) is a consequence of (i).

(iii) (b) ⇒ (c). Let h be such that h(0) = x and h′(0) exists. Then

dh(0; +1) = h′(0) = −dh(0; −1),

and from (b),

d(f ◦ h)(0; +1) = L(x) dh(0; +1) = L(x) h′(0). (3.20)

The function h̄(t) := h(−t) is such that h̄(0) = x and h̄′(0) = −h′(0). By (b),

d(f ◦ h̄)(0; +1) = L(x) dh̄(0; +1) = −L(x) h′(0). (3.21)

Therefore,

lim_{t↘0} [f(h(0 − t)) − f(h(0))]/t = lim_{t↘0} [f(h̄(t)) − f(h̄(0))]/t = −L(x) h′(0)

⇒ d(f ◦ h)(0; −1) exists and d(f ◦ h)(0; −1) = −L(x) h′(0) = −d(f ◦ h)(0; +1).

So we have existence of the derivative (f ◦ h)′(0) and (f ◦ h)′(0) = L(x) h′(0).

(c) ⇒ (b). Let h be such that h(0) = x and dh(0; +1) exists. Construct the new function h̄ : R → R^n as follows:

h̄(t) := h(t) if t ≥ 0;  h(0) + t dh(0; +1) if t < 0.


It is easy to check that h̄(0) = h(0) = x and that

dh̄(0; +1) = dh(0; +1) and dh̄(0; −1) = −dh(0; +1)  ⇒  h̄′(0) exists and h̄′(0) = dh(0; +1).

By part (c),

d(f ◦ h̄)(0; +1) = (f ◦ h̄)′(0) = L(x) h̄′(0) = L(x) dh(0; +1).

But for t > 0,

[f(h̄(t)) − f(h̄(0))]/t = [f(h(0 + t)) − f(h(0))]/t  ⇒  d(f ◦ h̄)(0; +1) = d(f ◦ h)(0; +1)
⇒ d(f ◦ h)(0; +1) = L(x) dh(0; +1),

and (b) is true.

From the abstract of the paper entitled “Sur la notion de différentielle” of M. FRÉCHET [5] in 1937,

The author shows that the total differential of Stolz-Young⁹⁷ is equivalent to the definition due to Hadamard (Theorem 3.3(ii) (c)). On the other hand, when the latter is extended to functionals it becomes more general than the one of the author . . . ⁹⁸

since it applies to function spaces (infinite dimension) without metric. In the same paper, M. FRÉCHET [5, p. 239] proposes the following definition.

Definition 3.6 (notion proposed by Fréchet).
The function f : R^n → R^m is differentiable at x ∈ R^n if there exists a function g(x) : R^n → R^m such that, for all paths h : R → R^n for which h(0) = x and h′(0) exists, we have

(f ◦ h)′(0) = g(x)(h′(0)). (3.22)

It is important to notice that, in general, the function g(x) is not linear.

Unfortunately, yielding to criticisms, he does not push this new notion further. “But as was pointed out by Mr. Paul Lévy, such a definition is not sufficient, since a differentiable function in that sense could loose important properties of the differential of simple functions and in particular property (3) (the linearity!). Such is, for instance, the case for the function s x2 f(x, y) = x for (x, y) 6= (0, 0) with f(0, 0) = 0.” (3.23) x2 + y2

(M.FRÉCHET [5, p. 239]).99

97See Remark 3.3 on page 96. 98From the French: “L’auteur montre que la différentielle totale de Stolz-Young est équivalente à la définition due à Hadamard (Theorem 3.3(ii) (c)). Par contre, quand on étend cette dernière aux fonctionnelles elle devient plus générale que celle de l’auteur . . . .” 99From the French: “Mais comme l’a fait observer M. Paul Lévy, une telle définition n’est pas suffisante, car une fonction différentiable à ce sens peut perdre d’importantes propriétés de la différentielle des fonctions simples et en particulier la propriété (3) (la linéarité!). Tel est, par exemple, le cas pour la fonction s x2 f(x, y) = x pour (x, y) 6= (0, 0) avec f(0, 0) = 0.” x2 + y2

(M. FRÉCHET [5, p. 239]).


Indeed, it is readily checked that for (v, w) ≠ (0, 0),

dH f((0, 0); (v, w)) = lim_{tn→0, (vn,wn)→(v,w)} [f(tn vn, tn wn) − f(0, 0)]/tn = v √(v²/(v² + w²))

is not linear in (v, w). Far from discrediting this new notion, his example shows that such functions exist. By Theorem 3.3(i), Definition 3.6 implies (b), which implies (a), and, from there, the function is Hadamard semidifferentiable; that is, the semidifferential dH f(x; v) exists for all v ∈ R^n. By using h′(0) and (f ∘ h)′(0) rather than dh(0; +1) and d(f ∘ h)(0; +1), Fréchet was losing some Hadamard semidifferentiable functions (in the sense of Definition 3.1(ii)) such as the norm f(x) = ‖x‖ on R^n at x = 0 since the differential quotient

[‖0 + tn vn‖ − ‖0‖]/tn = (|tn|/tn) ‖vn‖    (3.24)

does not converge as vn → v and tn → 0 (the sequence |tn|/tn has subsequences converging to 1 and −1). It is really necessary to use sequences {tn} of positive numbers to get the existence of the limit of the differential quotient

dH f(0; v) = lim_{tn↘0, vn→v} (|tn|/tn) ‖vn‖ = lim_{vn→v} ‖vn‖ = ‖v‖.
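The role of the sign of t is easy to see numerically. The sketch below (an illustrative Python check; the helper names `norm` and `quotient` are not from the book) evaluates the differential quotient (3.24) at x = 0 along a signed sequence tn = (−1)^n/n and along a positive sequence tn = 1/n with vn → v:

```python
import math

def norm(x):
    return math.sqrt(sum(c * c for c in x))

def quotient(t, v):
    # differential quotient (||0 + t v|| - ||0||) / t at x = 0
    return (norm([t * c for c in v]) - 0.0) / t

v = (1.0, 2.0)
# signed sequence t_n = (-1)^n / n: the quotient equals sign(t_n) * ||v||,
# so it oscillates and the two-sided limit does not exist
signed = [quotient((-1.0) ** n / n, v) for n in range(1, 7)]
# positive sequence t_n = 1/n with v_n -> v: the quotient converges to ||v||
positive = [quotient(1.0 / n, (1.0 + 1.0 / n, 2.0)) for n in (10, 100, 1000)]
```

The signed quotients alternate between −‖v‖ and +‖v‖, while the positive ones approach ‖v‖ = √5, in agreement with the limit displayed above.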

So, Definition 3.6 is slightly stronger than Definition 3.1(ii) of dH f(x; v). Yet, it is quite remarkable since, up to the use of the semiderivatives dh(0; +1) and d(f ∘ h)(0; +1) rather than h′(0) and (f ∘ h)′(0), he introduces a nondifferentiable infinitesimal calculus. We shall see that such functions retain two important properties of differentiable functions: they are continuous at x (Theorem 3.8), and the chain rule is applicable to the composition (Theorem 3.5). We shall see in section 4 that those properties are shared by convex continuous functions, which turn out to be Hadamard semidifferentiable.

Definition 3.1(i) of the semidifferential df(x; v) and of the differential can be found in the posthumous work100 of R. GATEAUX [6, 7] published in 1919 and 1922, except that he uses the directional derivative of Definition 3.2, where the variable t goes to 0, rather than the semidifferential, where t goes to 0 through positive values, as U. DINI [1] was already doing in 1878. This notion is said to have been inspired by a visit to V. Volterra101 in the early days of the calculus of variations. The geometric definition (part (b) of Theorem 3.3(i)) of the semidifferential dH f(x; v) can be found in J. DURDIL [1, p. 457] in 1973 and the analytical Definition 3.1(ii) in J.-P. PENOT [1, p. 250] in 1978 under the name of M-semidifferentiability, and of M-differentiability when v ↦ dH f(x; v) is linear. He attributes M-differentiability to A. D. MICHAL [1] in 1938 for infinite-dimensional spaces. A. D. MICHAL [1] draws the distinction between his M-differentiability and what he calls the MH-differentiability, which is nothing but the geometric version of the Hadamard differential in infinite dimension (see part (c) of Theorem 3.3(ii)) that

100 “. . . Nous allons emprunter au Calcul fonctionnel la notion de variation, qui nous rendra les services que rend la différentielle totale dans la théorie des fonctions d’un nombre fini de variables : δF(x) = [d/dλ F(x + λ δx)]_{λ=0} (see R. GATEAUX [7, page 83]). . . .” “. . . Considérons U(z + λ t1). Supposons que [d/dλ U(z + λ t1)]_{λ=0} existe quel que soit t1. On l’appelle la variation première de U au point z : δU(z, t1). C’est une fonction de z et t1, qu’on suppose habituellement linéaire, en chaque point z par rapport à t1. . . . (see R. GATEAUX [6, page 11]). . . .”
101 Vito Volterra (1860–1940).


Fréchet had already introduced and promoted in his 1937 paper. A. D. MICHAL [1] also introduces the equivalent in infinite dimension of the semidifferential of Definition 3.6 that Fréchet had proposed in 1937.

However, as pointed out in (3.24), the norm is not semidifferentiable according to the definitions of the semidifferential introduced by Fréchet and Michal, but it becomes semidifferentiable for the slightly weaker notion of Definition 3.1(ii) of dH f(x; v). Keeping all this in mind, we have opted for the terminology Hadamard semidifferential rather than M-semidifferential.

3.4 Operations on Semidifferentiable Functions

3.4.1 Algebraic Operations, Lower and Upper Envelopes

Theorem 3.4. Let x ∈ R^n be a point and v ∈ R^n be a direction.

(i) Let f, g : R^n → R be such that df(x; v) and dg(x; v) exist. Then

d(f + g)(x; v) = df(x; v) + dg(x; v),    (3.25)
d(fg)(x; v) = df(x; v) g(x) + f(x) dg(x; v)    (3.26)

by defining

(f + g)(x) def= f(x) + g(x) ∀x ∈ R^n,    (3.27)
(fg)(x) def= f(x) g(x) ∀x ∈ R^n.    (3.28)

If α ∈ R, then d(αf)(x; v) = αdf(x; v) by defining (αf)(x) def= αf(x).

(ii) Let I be a finite set of indices, {fi : i ∈ I}, fi : R^n → R, a family of functions such that dfi(x; v) exists, and the upper envelope

h(x) def= max_{i∈I} fi(x), x ∈ R^n.    (3.29)

Then

dh(x; v) = max_{i∈I(x)} dfi(x; v), where I(x) def= {i ∈ I : fi(x) = h(x)}.    (3.30)

Similarly, for the lower envelope

k(x) def= min_{i∈I} fi(x), x ∈ R^n,    (3.31)

we have

dk(x; v) = min_{i∈J(x)} dfi(x; v), where J(x) def= {i ∈ I : fi(x) = k(x)}.    (3.32)

Corollary 1. Under the assumptions of Theorem 3.4, all the operations are verified for Hadamard semidifferentiable functions.


Figure 3.6. Upper envelope of two functions: h(x) = max{f1(x), f2(x)}.

Proof. (i) By definition. (ii) It is sufficient to prove the result for two functions f1 and f2:

h1(x) = max{f1(x), f2(x)}.

From this, we can go to three functions by letting

h2(x) = max{h1(x), f3(x)} = max{fi(x) : 1 ≤ i ≤ 3},

and so on. Then, we repeat the construction for a finite number of functions. We distinguish three cases (see Figures 3.6 and 3.7): (a) f1(x) = f2(x), (b) f1(x) > f2(x), (c) f1(x) < f2(x).

(a) f1(x) = f2(x). Since h1(x) = f1(x) = f2(x), for t > 0,

[h1(x + tv) − h1(x)]/t = [max{f1(x + tv), f2(x + tv)} − h1(x)]/t
= max{ [f1(x + tv) − f1(x)]/t, [f2(x + tv) − f2(x)]/t }.

Figure 3.7. Upper envelope of three functions: h(x) = max_{1≤i≤3} fi(x).


As both limits exist,

dh(x; v) = max{df1(x; v), df2(x; v)}.

(b) f1(x) > f2(x). Then h1(x) = f1(x) and, by continuity of f1(x + tv) and f2(x + tv) with respect to t > 0, there exists t̄ > 0 such that

f1(x + tv) > f2(x + tv) ∀ 0 < t < t̄.

Then, for 0 < t < t̄,

[h1(x + tv) − h1(x)]/t = [f1(x + tv) − f1(x)]/t → df1(x; v).

(c) f1(x) < f2(x). Repeat (b) by interchanging the indices.

This key theorem further motivates the introduction of semidifferentials. Indeed, the classical notion of differential fails when applied to the maximum of two (however smooth) functions at points where both functions are “active,” and it becomes necessary to weaken the notion of differential to that of semidifferential.
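Formula (3.30) can be checked numerically at a point where both functions are active. A minimal sketch in Python, with the illustrative choices f1(x) = x and f2(x) = −x (not from the book), whose upper envelope is |x|:

```python
# numeric check of formula (3.30) at a point where both functions are active
f1 = lambda x: x
f2 = lambda x: -x
h = lambda x: max(f1(x), f2(x))        # h(x) = |x|

def dq(f, x, v, t):
    # one-sided differential quotient (f(x + t v) - f(x)) / t, t > 0
    return (f(x + t * v) - f(x)) / t

t = 1e-9
# at x = 0 both functions are active, so (3.30) gives dh(0; v) = max(v, -v) = |v|,
# which is not linear in v: no classical derivative h'(0) exists
semi = {v: dq(h, 0.0, v, t) for v in (1.0, -1.0, 0.5)}
```

The one-sided quotients match max{df1(0; v), df2(0; v)} = |v| in every direction, even though the two-sided derivative of h at 0 does not exist.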

3.4.2 Chain Rule for the Composition of Functions

One important operation in a good differential calculus is the chain rule for the composition h of two functions f : R^n → R and g : R^m → R^n, where m and n are positive integers,

R^m −g→ R^n −f→ R,   x ↦ h(x) def= f(g(x)) : R^m → R,

∂h/∂xj (x) = Σ_{k=1}^{n} ∂f/∂yk (g(x)) ∂gk/∂xj (x), 1 ≤ j ≤ m.

This property can also be expressed in terms of semidifferentials: given v = Σ_{j=1}^{m} vj ej,

Σ_{j=1}^{m} ∂h/∂xj (x) vj = Σ_{j=1}^{m} Σ_{k=1}^{n} ∂f/∂yk (g(x)) ∂gk/∂xj (x) vj = Σ_{k=1}^{n} ∂f/∂yk (g(x)) Σ_{j=1}^{m} ∂gk/∂xj (x) vj

⇒ dh(x; v) = Σ_{k=1}^{n} ∂f/∂yk (g(x)) dgk(x; v) = df( g(x); Σ_{k=1}^{n} dgk(x; v) ek )

⇒ dh(x; v) = df(g(x); dg(x; v)),

where g = (g1, . . . , gn) and dg(x; v) = (dg1(x; v), . . . , dgn(x; v)) is the extension of the notion of semidifferential to vector-valued functions g : R^m → R^n. This formula readily extends to vector functions f : R^n → R^k.

Theorem 3.5 (semidifferential of the composition of two functions). Let n ≥ 1, m ≥ 1, and k ≥ 1 be three integers, g : R^m → R^n and f : R^n → R^k be two functions, x be a point in R^m, and v be a direction in R^m. Consider the composition (f ∘ g)(x) = f(g(x)). Assume that

(a) dg(x; v) exists in R^n,

(b) dH f(g(x); dg(x; v)) exists in R^k.


Then,

(i) d(f ◦ g)(x; v) exists and

d(f ◦ g)(x; v) = dH f(g(x); dg(x; v)); (3.33)

(ii) if, in addition, dH g(x; v) exists, then dH (f ◦ g)(x; v) exists and

dH (f ◦ g)(x; v) = dH f(g(x); dH g(x; v)). (3.34)

Corollary 1. If, in addition, f is Hadamard differentiable at g(x) and g is Gateaux differentiable at x, then102

D(f ◦ g)(x) = Df(g(x)) ◦ Dg(x).

Since the mappings are linear, we simply write Df(g(x)) Dg(x).
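The strength of Theorem 3.5 is that the outer function f need only be Hadamard semidifferentiable. A small numeric sketch, with the illustrative choices f(y) = ‖y‖ and g(x) = (x − 1, x² − 1) (so g(1) = (0, 0), precisely where f fails to be differentiable; neither function is taken from the book):

```python
import math

def g(x):
    # illustrative smooth inner map g : R -> R^2 with g(1) = (0, 0)
    return (x - 1.0, x * x - 1.0)

def f(y):
    # outer map f(y) = ||y||: Hadamard semidifferentiable everywhere,
    # but not differentiable at (0, 0)
    return math.hypot(y[0], y[1])

def dq(x, v, t):
    # one-sided differential quotient of the composition f o g
    return (f(g(x + t * v)) - f(g(x))) / t

# (3.33) predicts d(f o g)(1; v) = dH f((0,0); dg(1; v)) = ||(v, 2v)|| = |v| sqrt(5)
checks = {v: dq(1.0, v, 1e-7) for v in (1.0, -1.0, 0.3)}
```

The quotients agree with the nonlinear (in v) prediction |v|√5, which no classical chain rule could produce at this point.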

Remark 3.4. If f : R^n → R is a real-valued function in the corollary,

∀v ∈ R^m, ∇(f ∘ g)(x) · v = ∇f(g(x)) · Dg(x)v.    (3.35)

The result can also be written in matrix form:

∇(f ∘ g)(x) = [Dg(x)]^⊤ ∇f(g(x)),    (3.36)

the three factors being of respective sizes m × 1, m × n, and n × 1,

where Dg(x) is the n × m Jacobian matrix of g : R^m → R^n,

[Dg(x)]_{ij} = ∂_j g_i(x), 1 ≤ i ≤ n, 1 ≤ j ≤ m,    (3.37)

and ∇f(g(x)) is considered as a column vector, that is, an n × 1 matrix. If ∇f(g(x)) is considered as a row vector, that is, a 1 × n matrix, the formula takes the form

∇(f ∘ g)(x) = ∇f(g(x)) Dg(x),

where the factors are of respective sizes 1 × m, 1 × n, and n × m.
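The matrix identity (3.36) can be sanity-checked with finite differences. A sketch with hypothetical smooth maps f and g (illustrative choices, not from the book):

```python
def g(x):
    # hypothetical smooth map R^2 -> R^2: g(x1, x2) = (x1 x2, x1 + x2)
    return (x[0] * x[1], x[0] + x[1])

def f(y):
    # hypothetical smooth scalar function f(y1, y2) = y1^2 + 3 y2
    return y[0] ** 2 + 3.0 * y[1]

def grad_fd(fun, x, h=1e-6):
    # central finite-difference gradient
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((fun(tuple(xp)) - fun(tuple(xm))) / (2 * h))
    return out

x = (1.0, 2.0)
# right-hand side of (3.36): [Dg(x)]^T grad f(g(x)), with Dg(x) = [[x2, x1], [1, 1]]
Dg = [[x[1], x[0]], [1.0, 1.0]]
gf = grad_fd(f, g(x))
rhs = [Dg[0][0] * gf[0] + Dg[1][0] * gf[1],
       Dg[0][1] * gf[0] + Dg[1][1] * gf[1]]
# left-hand side: gradient of the composition f o g
lhs = grad_fd(lambda z: f(g(z)), x)
```

At x = (1, 2), both sides come out as (11, 7), matching the hand computation of ∇(f ∘ g).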

Remark 3.5. It is possible to consider the composition of a finite number of functions. For instance, the Hadamard semidifferential of the composition of three functions g1 ◦ g2 ◦ g3 that are Hadamard semidifferentiable is given by

dH (g1 ◦ g2 ◦ g3)(x; v) = dH g1(g2(g3(x)); dH g2(g3(x); dH g3(x; v))).

All this extends to .

Proof of Theorem 3.5. (i) For t > 0, consider the limit of the differential quotient

q(t) def= [f(g(x + tv)) − f(g(x))]/t.

102 We shall see later that f being Hadamard semidifferentiable plus Gateaux differentiable is equivalent to f being Fréchet differentiable (see Definition 3.5 and Theorem 3.2 of section 3.2.4).


The term f(g(x + tv)) can be written as

g(x + tv) = g(x) + t [g(x + tv) − g(x)]/t = g(x) + t v(t)

by introducing the function

v(t) def= [g(x + tv) − g(x)]/t.

By the assumption of the existence of dg(x; v), v(t) → dg(x; v) as t ↘ 0+. The differential quotient is now of the form

q(t) = [f(g(x) + t v(t)) − f(g(x))]/t.

By definition and existence of dH f(g(x); dg(x; v)), the limit exists and

lim_{t↘0+} q(t) = dH f(g(x); dg(x; v)).

(ii) For t > 0 and w ∈ R^m, consider the differential quotient

q(t, w) def= [f(g(x + tw)) − f(g(x))]/t.

The term f(g(x + tw)) can be rewritten in the form

g(x + tw) = g(x) + t [g(x + tw) − g(x)]/t = g(x) + t v(t, w)

by introducing the function

v(t, w) def= [g(x + tw) − g(x)]/t.

But, by assumption, dH g(x; v) exists and

v(t, w) → dH g(x; v) as t ↘ 0+ and w → v.

The differential quotient is now of the form

q(t, w) = [f(g(x) + t v(t, w)) − f(g(x))]/t.

By definition and existence of dH f(g(x); dH g(x; v)), the limit exists and

lim_{t↘0+, w→v} q(t, w) = dH f(g(x); dH g(x; v)).

Remark 3.6. It is important to recall that the semidifferential dH f(x; v) need not be linear in v, as illustrated by the function f(x) = ‖x‖ for which dH f(0; v) = ‖v‖.

The weaker assumption that dg(x; v) and df(g(x); dg(x; v)) exist is not sufficient to prove the theorem. The proof critically uses the stronger assumption that dH f(g(x); dg(x; v)) also exists. This is now illustrated by an example of the composition f ∘ g of a Gateaux differentiable function f and an infinitely differentiable function103 g. The composition f ∘ g is not Gateaux differentiable and not even Gateaux semidifferentiable at 0 in any direction v ≠ 0.

103 A function that is differentiable as well as all its partial derivatives of all orders.


Example 3.9. Consider the functions

f : R² → R, f(x, y) = x⁶ / [(y − x²)² + x⁸] if (x, y) ≠ (0, 0) and f(0, 0) = 0,

g : R → R², g(x) = (x, x²)^⊤.

We have seen in Example 3.6 that f is Gateaux differentiable at (0, 0) and that

∇f(0, 0) = (0, 0)^⊤.

It is readily seen that g is infinitely continuously differentiable, or of class C^(∞), on R and that the associated Jacobian matrix is

Dg(x) = (1, 2x)^⊤.

The composition of f and g is given by

h(x) = f(g(x)) = 1/x² if x ≠ 0, and h(x) = 0 if x = 0.    (3.38)

By applying the chain rule, we should get

h′(0) = [Dg(0)]^⊤ ∇f(g(0)) = [1 0] (0, 0)^⊤ = 0.

The result given by the chain rule is false since the composition h(x) = f(g(x)), given by (3.38), as a real function of a real variable x is neither continuous at 0 nor right or left continuous at 0. It is not differentiable and not even semidifferentiable at x = 0 in any direction v ≠ 0. This arises from the fact that the Gateaux differentiability of f is not sufficient. It would require the Hadamard semidifferentiability or differentiability of f at (0, 0).
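The failure in Example 3.9 is easy to reproduce numerically: every fixed-direction quotient of f at (0, 0) tends to 0, yet along the parabola traced by g the composition blows up. A sketch in Python (helper names are illustrative):

```python
# the functions of Example 3.9; the path g traces the parabola y = x^2
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 6 / ((y - x * x) ** 2 + x ** 8)

def gateaux_q(v, w, t):
    # directional quotient of f at (0, 0): tends to 0 for every fixed (v, w)
    return (f(t * v, t * w) - f(0.0, 0.0)) / t

def h(x):
    # composition h = f o g with g(x) = (x, x^2); equals 1/x^2 for x != 0
    return f(x, x * x)
```

For small t the directional quotients are tiny, while the quotient (h(t) − h(0))/t = 1/t³ diverges, confirming that Gateaux differentiability of f does not survive composition.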

Up to now we have only considered functions defined on R^n, but semidifferentials can also be computed for functions defined on a finite-dimensional vector space such as L(R^n, R^m), which can be identified with the family of m × n matrices, or the set P^k(R^n) of polynomials on R^n of degree less than or equal to k ≥ 0.

Example 3.10. Let A be an n × n matrix whose entries are denoted {Aij}. As the determinant det(A) is a polynomial function of the entries, it is Hadamard differentiable; that is, dH det(A; B) exists for any n × n matrix B and the function B ↦ dH det(A; B) : L(R^n, R^n) → R is linear. To find its expression, we use Theorem 4.8 in Chapter 1. Write A in terms of its column vectors

A = [a1, . . . , an], (aj)i def= Aij, 1 ≤ i, j ≤ n ⇒ det(A) = det([a1, . . . , an]).

In that form, (a1, . . . , an) ↦ det([a1, . . . , an]) : (R^n)^n → R is a multilinear function; that is, for each j, the function

n aj 7→ det([a1, . . . , aj−1, aj, aj+1, . . . , an]) : R → R Copyright © 2019 Society for Industrial and Applied Mathematics From Introduction to Optimization and Hadamard Semidifferential Calculus - Delfour (9781611975956)

is linear. As a result

dH det(A; B) = Σ_{j=1}^{n} det([a1, . . . , aj−1, bj, aj+1, . . . , an]),    (3.39)

where the bj’s are the column vectors of the matrix B: (bj)i def= Bij, 1 ≤ i ≤ n. Recall that the determinant can be expanded with respect to the jth column by using the cofactor matrix Cof A,

det([a1, . . . , an]) = Σ_{i=1}^{n} (aj)i (Cof A)ij = Σ_{i=1}^{n} Aij (Cof A)ij.    (3.40)

Applying the formula to det([a1, . . . , aj−1, bj, aj+1, . . . , an]),

det([a1, . . . , aj−1, bj, aj+1, . . . , an]) = Σ_{i=1}^{n} Bij (Cof A)ij.    (3.41)

This yields

dH det(A; B) = Σ_{j=1}^{n} Σ_{i=1}^{n} Bij (Cof A)ij = B ·· (Cof A) = (Cof A) ·· B,

by using the Frobenius scalar product between two matrices as defined in (4.10) of Chapter 1. In the change of variable formula for an integral over R^n, it is not det(A) but |det(A)| that appears in the formula. By using the chain rule, we get

dH |det|(A; B) = (det(A)/|det(A)|) (Cof A ·· B) if det(A) ≠ 0, and |Cof A ·· B| if det(A) = 0.    (3.42)

Recall the general formula (Theorem 4.8 in Chapter 1)

A (Cof A)^⊤ = (Cof A)^⊤ A = det(A) In

(In is the n × n identity matrix). If the matrix A is invertible, then

dH det(A; B) = det(A) [A^{−1}]^⊤ ·· B = det(A) In ·· A^{−1}B = det(A) tr(A^{−1}B).

So for the function A ↦ (ln|det|)(A) = ln(|det(A)|),

dH (ln|det|)(A; B) = [A^{−1}]^⊤ ·· B = tr(A^{−1}B),    (3.43)

by using the chain rule with the function ln|t|,

(d/dt) ln|t| = 1/t, t ≠ 0.
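Formula (3.39) and the trace form can be checked on a small example. The sketch below uses an arbitrary 2 × 2 pair A, B (illustrative values, not from the book) and compares a finite-difference quotient of det with Cof A ·· B and det(A) tr(A⁻¹B):

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def cof2(M):
    # cofactor matrix of a 2 x 2 matrix [[a, b], [c, d]] is [[d, -c], [-b, a]]
    return [[M[1][1], -M[1][0]], [-M[0][1], M[0][0]]]

def frob(P, Q):
    # Frobenius scalar product P .. Q
    return sum(P[i][j] * Q[i][j] for i in range(2) for j in range(2))

A = [[2.0, 1.0], [1.0, 3.0]]
B = [[0.5, -1.0], [2.0, 0.3]]

t = 1e-7
fd = (det2([[A[i][j] + t * B[i][j] for j in range(2)] for i in range(2)]) - det2(A)) / t
exact = frob(cof2(A), B)                       # Cof A .. B

d = det2(A)
Ainv = [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]
# det(A) tr(A^{-1} B)
trace_form = d * (Ainv[0][0] * B[0][0] + Ainv[0][1] * B[1][0]
                  + Ainv[1][0] * B[0][1] + Ainv[1][1] * B[1][1])
```

All three quantities agree (here they equal 1.1 up to the finite-difference error), illustrating that the three expressions for dH det(A; B) coincide for invertible A.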

For more examples involving matrices, see K. LANGE [1, Chapter 3].


3.5 Lipschitzian Functions

3.5.1 Definitions and Their Hadamard Semidifferential

Definition 3.7. Let f : R^n → R^m, n ≥ 1, m ≥ 1.

(i) f is Lipschitzian at x if there exist c(x) > 0 and a neighborhood V(x) of x such that

∀y, z ∈ V(x), ‖f(z) − f(y)‖_{R^m} ≤ c(x) ‖z − y‖_{R^n}.

(ii) f is locally Lipschitzian on a subset U of R^n if, for each x ∈ U, there exist c(x) > 0 and a neighborhood V(x) of x such that

∀y, z ∈ V(x) ∩ U, ‖f(z) − f(y)‖_{R^m} ≤ c(x) ‖z − y‖_{R^n}.

(iii) f is Lipschitzian on a subset U of R^n if there exists c(U) > 0 such that

∀y, z ∈ U, ‖f(z) − f(y)‖_{R^m} ≤ c(U) ‖z − y‖_{R^n}.

Example 3.11. The norm f(x) = ‖x‖ is Lipschitzian on R^n since

∀y, z ∈ R^n, |f(y) − f(z)| = | ‖y‖ − ‖z‖ | ≤ ‖y − z‖

with Lipschitz constant c(R^n) = 1. The function f(x) = ‖x‖² is locally Lipschitzian on R^n since, for all x ∈ R^n and r > 0,

∀y, z ∈ Br(x), |f(y) − f(z)| = | ‖y‖² − ‖z‖² | ≤ ‖y + z‖ ‖y − z‖ ≤ 2(r + ‖x‖) ‖y − z‖

with the local constant c(x) = 2(r + ‖x‖).
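The local Lipschitz bound of Example 3.11 can be probed by random sampling. A sketch assuming an arbitrary center x and radius r (illustrative values, not from the book):

```python
import math
import random

def norm(x):
    return math.sqrt(sum(c * c for c in x))

random.seed(0)
x, r = (1.0, -2.0), 0.5
c = 2.0 * (r + norm(x))                  # local Lipschitz constant from Example 3.11
worst = 0.0
for _ in range(2000):
    # sample two points of the ball B_r(x) by rejection
    dy = (random.uniform(-r, r), random.uniform(-r, r))
    dz = (random.uniform(-r, r), random.uniform(-r, r))
    if norm(dy) > r or norm(dz) > r:
        continue
    y = (x[0] + dy[0], x[1] + dy[1])
    z = (x[0] + dz[0], x[1] + dz[1])
    gap = abs(norm(y) ** 2 - norm(z) ** 2) - c * norm((y[0] - z[0], y[1] - z[1]))
    worst = max(worst, gap)
# worst stays <= 0: the bound |‖y‖² − ‖z‖²| ≤ 2(r + ‖x‖) ‖y − z‖ holds on B_r(x)
```

No sampled pair violates the bound, consistent with the inequality chain of the example.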

Theorem 3.6. Let f : R^n → R^m, n ≥ 1, m ≥ 1, be Lipschitzian at x ∈ R^n.

(i) If df(x; v) exists, then dH f(x; v) exists. In particular, dH f(x; 0) exists.

(ii) If df(x; v) exists for all v ∈ R^n, then dH f(x; v) exists for all v ∈ R^n and

∀v, w ∈ R^n, ‖dH f(x; v) − dH f(x; w)‖_{R^m} ≤ c(x) ‖v − w‖_{R^n},    (3.44)

where c(x) is the Lipschitz constant associated with the neighborhood V(x) of x in Definition 3.7(i).

Proof. (i) Consider an arbitrary sequence {wn} converging to v, t > 0, and the differential quotient

[f(x + twn) − f(x)]/t = [f(x + twn) − f(x + tv)]/t + [f(x + tv) − f(x)]/t.

By assumption, the second term goes to df(x; v) as t ↘ 0. As f is Lipschitzian at x,

∃c(x), ∀y, z ∈ V(x), ‖f(y) − f(z)‖_{R^m} ≤ c(x) ‖y − z‖_{R^n}

and the norm of the first term can be estimated:

‖[f(x + twn) − f(x + tv)]/t‖_{R^m} ≤ c(x) ‖wn − v‖_{R^n} → 0 when n → ∞.


As the limit 0 is independent of the choice of the sequence wn → v, by definition of the Hadamard semiderivative, dH f(x; v) exists and

dH f(x; v) = df(x; v) ∀v ∈ R^n.

For v = 0, df(x; 0) = 0 always exists. Hence, dH f(x; 0) exists.

(ii) For all v, w ∈ R^n and t > 0 sufficiently small, x + tv and x + tw belong to V(x) and

‖[f(x + tv) − f(x)]/t − [f(x + tw) − f(x)]/t‖_{R^m} ≤ c(x) ‖v − w‖_{R^n}.

As t goes to zero, ‖dH f(x; v) − dH f(x; w)‖_{R^m} = ‖df(x; v) − df(x; w)‖_{R^m} ≤ c(x) ‖v − w‖_{R^n}.

Example 3.12. Since the norm f(x) = ‖x‖ is Lipschitzian on R^n and df(x; v) exists, so does dH f(x; v).

3.5.2 I Dini and Hadamard Upper and Lower Semidifferentials

In order to get the existence of dH f(x; v), the existence of df(x; v) is necessary. In general, this is not true for an arbitrary Lipschitzian function. For instance, dH dU(x; v) exists for a convex set U since dU is convex and uniformly Lipschitzian on R^n, but the semidifferential might not exist at some points of a lousy set U. However, as f is Lipschitzian at x, the differential quotient

| f(x + tv) − f(x) | / t ≤ c(x) ‖v‖

is bounded as t goes to zero, and the liminf and limsup exist and are finite. In fact, such limits exist for a much larger class of functions.

Definition 3.8. (i) Given a function f : R^n → R ∪ {+∞} and x ∈ dom f,

d̲f(x; v) def= liminf_{t↘0+} [f(x + tv) − f(x)]/t

will be referred to as the Dini lower semidifferential.104

(ii) Given a function f : R^n → R ∪ {−∞} and x ∈ dom f,

d̄f(x; v) def= limsup_{t↘0+} [f(x + tv) − f(x)]/t

will be referred to as the Dini upper semidifferential.105

By definition, d̄f(x; v) = −d̲(−f)(x; v), so that it is sufficient to study the lower notion to get the properties of the upper notion. In fact, when f is Lipschitzian at x, we even get something stronger since the differential quotient

| f(x + tw) − f(x) | / t ≤ c(x) ( ‖v‖ + ‖w − v‖ )

is also bounded as t → 0 and w → v.

104 Ulisse Dini (1845–1918), U. DINI [1].
105 Upper and lower Dini derivatives in P. CANNARSA and C. SINESTRARI [1, Def. 3.1.3, p. 50].


Definition 3.9. (i) Given a function f : R^n → R ∪ {+∞} and x ∈ dom f,

d̲H f(x; v) def= liminf_{t↘0+, w→v} [f(x + tw) − f(x)]/t

will be referred to as the Hadamard lower semidifferential.

(ii) Given a function f : R^n → R ∪ {−∞} and x ∈ dom f,

d̄H f(x; v) def= limsup_{t↘0+, w→v} [f(x + tw) − f(x)]/t

will be referred to as the Hadamard upper semidifferential.

Again, the upper notion can be obtained from the lower one by observing that

d̄H f(x; v) = −d̲H(−f)(x; v).    (3.45)

When f is Lipschitzian at x, the Dini and Hadamard notions coincide and the semidifferentials are finite. This is in line with previous notions such as the lower and upper semicontinuities.
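The gap between the lower and upper notions is visible on a classical oscillating function. A sketch with the illustrative choice f(t) = t sin(1/t), f(0) = 0 (not an example from the book), whose Dini lower and upper semidifferentials at 0 in the direction v = 1 are −1 and +1:

```python
import math

def f(t):
    # f(t) = t sin(1/t) with f(0) = 0
    return t * math.sin(1.0 / t) if t != 0.0 else 0.0

def q(t):
    # differential quotient (f(0 + t) - f(0)) / t = sin(1/t) for t > 0
    return (f(t) - f(0.0)) / t

# along t_n = 1/(2 pi n + pi/2) the quotient is +1; along t_n = 1/(2 pi n - pi/2) it is -1,
# so the Dini lower semidifferential is -1, the upper is +1, and df(0; 1) does not exist
up = [q(1.0 / (2.0 * math.pi * n + math.pi / 2.0)) for n in range(1, 6)]
lo = [q(1.0 / (2.0 * math.pi * n - math.pi / 2.0)) for n in range(1, 6)]
```

The quotient stays bounded by the Lipschitz-type estimate, but only the liminf and limsup exist, not the limit.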

3.5.3 I Clarke Upper and Lower Semidifferentials

Lipschitzian functions were one of the motivations behind the introduction of another semidifferential by F. H. CLARKE [1] in 1973. He considered the differential quotient

[f(y + tv) − f(y)]/t.

For a Lipschitzian function at x, this quotient is bounded as t ↘ 0 and y → x,

| f(y + tv) − f(y) | / t ≤ c(x) ‖v‖,

and both liminf and limsup exist and are finite. Here, it is not sufficient that x ∈ dom f in order to make sense of the difference f(y + tv) − f(y) entering the differential quotient in a neighborhood of x, as can be seen from the convex function of Example 8.1 of Chapter 2 at the point (0, 0) of its effective domain.

Definition 3.10. Let f : R^n → R be Lipschitzian at x and v ∈ R^n be a direction.106

(i) The Clarke upper semidifferential of f at x in the direction v is107

d̄C f(x; v) def= limsup_{t↘0+, y→x} [f(y + tv) − f(y)]/t.    (3.46)

(ii) The Clarke lower semidifferential of f at x in the direction v is108

d̲C f(x; v) def= liminf_{t↘0+, y→x} [f(y + tv) − f(y)]/t.    (3.47)

106 For a function f : R^n → R, the definition of Lipschitzian at x would require that f be finite in a neighborhood of x, that is, x ∈ int(dom f).
107 Generalized directional derivative in F. H. CLARKE [2, p. 10].
108 Generalized lower derivative in P. CANNARSA and C. SINESTRARI [1, Def. 3.1.11, p. 54].


Again, the lower notion can be obtained from the upper one by observing that

d̲C f(x; v) = −d̄C(−f)(x; v).    (3.48)

For Lipschitzian functions at x, the upper and lower Clarke semiderivatives are relaxations of the notion of strict differentiability109 at x, which will not be used in this book.

Remark 3.7. We shall see in section 4.4 that five of the six notions of Definitions 3.8, 3.9, and 3.10 can coincide when the Hadamard semidifferential exists: for semiconvex functions,

∀x ∈ dom f, df(x; v) = dH f(x; v) = d̄C f(x; v) = limsup_{t↘0+, y→x} [f(y + tv) − f(y)]/t,    (3.50)

and for semiconcave functions,

∀x ∈ dom f, df(x; v) = dH f(x; v) = d̲C f(x; v) = liminf_{t↘0+, y→x} [f(y + tv) − f(y)]/t.    (3.51)

In general, the existence of dH f(x; v) does not imply that d̲C f(x; v) = dH f(x; v) = d̄C f(x; v), as can be seen from the following example. Consider the function f(x) = |x| in R. We know that dH f(0; v) = |v|. Choose the direction v = 1 and the sequences yn = (−1)^n/n and tn = 1/n. The differential quotient

[ |yn + tn v| − |yn| ] / tn = (−1)^n

oscillates between +1 and −1 as n → ∞ and

d̲C f(0; 1) = −1 < dH f(0; 1) = |1| = 1 = d̄C f(0; 1).
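This oscillation is immediate to reproduce numerically. A sketch of the Clarke quotient along the sequences yn = (−1)^n/n, tn = 1/n of the text:

```python
def f(x):
    return abs(x)

def q(y, t, v=1.0):
    # Clarke-type differential quotient (f(y + t v) - f(y)) / t
    return (f(y + t * v) - f(y)) / t

vals = [q((-1.0) ** n / n, 1.0 / n) for n in range(1, 9)]
# vals alternates -1, +1, -1, +1, ...: the liminf (lower Clarke value) is -1
# and the limsup (upper Clarke value) is +1, even though dH f(0; 1) = 1 exists
```

The alternating quotients illustrate why the lower and upper Clarke semidifferentials of |x| at 0 differ.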

3.5.4 I Properties of Upper and Lower Subdifferentials

The next theorem gives the properties and relations between the above semidifferentials for Lipschitzian functions.

109 A function f Lipschitzian at x is strictly differentiable at x (see, for instance, J. M. BORWEIN and A. S. LEWIS [1, p. 132]) if

∃L(x) : R^n → R linear such that ∀v ∈ R^n, lim_{t↘0+, y→x} [f(y + tv) − f(y)]/t = L(x)v.    (3.49)

It is stronger than Fréchet differentiability as can be seen from the following example of the function

f(x) def= x² sin(1/x), x ≠ 0, and f(0) = 0.

It is clearly differentiable at x ≠ 0. At x = 0, consider the differential quotient for v ≠ 0,

[f(tv) − f(0)]/t = [(tv)² sin(1/(tv)) − 0]/t = t v² sin(1/(tv)) → 0.

It is not strictly differentiable at x = 0. Choose v = 1 and the sequence of points yn = 1/[(n + 1/2)π] → 0 and tn = 1/[(n + 1/2)π] − 1/[(n + 3/2)π] ↘ 0. Then (f(yn + tn v) − f(yn))/tn oscillates towards ±1/π as n → ∞.


Theorem 3.7. Let f : R^n → R be Lipschitzian at x ∈ R^n.

(i) For all v ∈ R^n,

d̲C f(x; v) ≤ d̲H f(x; v) = d̲f(x; v) ≤ d̄f(x; v) = d̄H f(x; v) ≤ d̄C f(x; v).    (3.52)

(ii) dH f(x; 0) exists and d̲C f(x; 0) = dH f(x; 0) = d̄C f(x; 0) = 0.

(iii) The mappings v ↦ d̲C f(x; v), v ↦ d̲H f(x; v), v ↦ d̄H f(x; v), and v ↦ d̄C f(x; v) are positively homogeneous and, for all v, w ∈ R^n,

|d̲C f(x; w) − d̲C f(x; v)| ≤ c(x) ‖w − v‖,    (3.53)

|d̲H f(x; w) − d̲H f(x; v)| ≤ c(x) ‖w − v‖,    (3.54)

|d̄H f(x; w) − d̄H f(x; v)| ≤ c(x) ‖w − v‖,    (3.55)

|d̄C f(x; w) − d̄C f(x; v)| ≤ c(x) ‖w − v‖.    (3.56)

Moreover, v ↦ d̲C f(x; v) is superadditive,

∀v, w ∈ R^n, d̲C f(x; v + w) ≥ d̲C f(x; v) + d̲C f(x; w),    (3.57)

and v ↦ d̄C f(x; v) is subadditive,

∀v, w ∈ R^n, d̄C f(x; v + w) ≤ d̄C f(x; v) + d̄C f(x; w).    (3.58)

Proof. (i) First we prove the last inequality in (3.52). Since f is Lipschitzian at x, d̄C f(x; v) and d̄H f(x; v) exist, and dH f(x; 0) exists and is equal to 0 by Theorem 3.6. So, there exist sequences {tn}, tn > 0, and {wn}, wn ≠ v, such that tn ↘ 0, wn → v, and

d̄H f(x; v) = limsup_{t↘0+, w→v} [f(x + tw) − f(x)]/t = lim_{n→∞} [f(x + tn wn) − f(x)]/tn.

Consider the differential quotient

[f(x + tn wn) − f(x)]/tn = [f(x + tn(wn − v) + tn v) − f(x + tn(wn − v))]/tn + [f(x + tn(wn − v)) − f(x)]/tn.

Since wn − v → 0 and yn = x + tn(wn − v) → x, we have

lim_{n→∞} [f(x + tn(wn − v)) − f(x)]/tn = dH f(x; 0) = 0,

limsup_{n→∞} [f(x + tn(wn − v) + tn v) − f(x + tn(wn − v))]/tn ≤ d̄C f(x; v)

⇒ d̄H f(x; v) ≤ d̄C f(x; v).

As for the first inequality in (3.52), apply the above result to −f since it is also Lipschitzian:

d̄H(−f)(x; v) ≤ d̄C(−f)(x; v)

⇒ d̲C f(x; v) = −d̄C(−f)(x; v) ≤ −d̄H(−f)(x; v) = d̲H f(x; v).


(ii) From part (i).

(iii) It is sufficient to give the proof for the upper notions. Indeed, for α > 0 and v, w ∈ R^n,

d̲H f(x; αv) = −d̄H(−f)(x; αv) = −α d̄H(−f)(x; v) = α d̲H f(x; v),

|d̲H f(x; w) − d̲H f(x; v)| = |d̄H(−f)(x; w) − d̄H(−f)(x; v)| ≤ c(x) ‖w − v‖.

For d̄H f(x; αv), consider the differential quotient

q(t, w) = [f(x + tw) − f(x)]/t, t > 0, w ∈ R^n.

For α > 0, q(t, αw) = α q(αt, w) and

limsup_{t↘0, w→v} q(t, αw) = α limsup_{t↘0, w→v} q(αt, w) = α limsup_{t↘0, w→v} q(t, w).

Given w1 → v1, w1 ≠ v1, and w2 → v2, w2 ≠ v2,

q(t, w2) = q(t, w1) + [f(x + tw2) − f(x + tw1)]/t ≤ q(t, w1) + c(x) ‖w2 − w1‖.

For ε > 0, 0 < t < ε, ‖w1 − v1‖ < ε, and ‖w2 − v2‖ < ε, we have

q(t, w2) ≤ q(t, w1) + c(x)(‖v2 − v1‖ + 2ε),

sup_{0<t<ε, ‖w2−v2‖<ε} q(t, w2) ≤ sup_{0<t<ε, ‖w1−v1‖<ε} q(t, w1) + c(x)(‖v2 − v1‖ + 2ε),

limsup_{t↘0, w2→v2} q(t, w2) ≤ limsup_{t↘0, w1→v1} q(t, w1) + c(x) ‖v2 − v1‖ as ε ↘ 0,

d̄H f(x; v2) ≤ d̄H f(x; v1) + c(x) ‖v2 − v1‖.

Since the roles of v1 and v2 can be interchanged,

|d̄H f(x; v2) − d̄H f(x; v1)| ≤ c(x) ‖v2 − v1‖.

For d̄C f(x; αv), consider the differential quotient

q(t, v, y) = [f(y + tv) − f(y)]/t, t > 0, y, v ∈ R^n.

For α > 0, q(t, αv, y) = α q(αt, v, y) and

limsup_{t↘0, y→x} q(t, αv, y) = α limsup_{t↘0, y→x} q(αt, v, y) = α limsup_{t↘0, y→x} q(t, v, y).

Given v1, v2 ∈ R^n, for t > 0 sufficiently small and y ≠ x close to x so that y + tv1 and y + tv2 belong to V(x),

q(t, v2, y) = q(t, v1, y) + [f(y + tv2) − f(y + tv1)]/t ≤ q(t, v1, y) + c(x) ‖v2 − v1‖.

For ε > 0, 0 < t < ε, and ‖y − x‖ < ε, we have

q(t, v2, y) ≤ q(t, v1, y) + c(x) ‖v2 − v1‖,

sup_{0<t<ε, ‖y−x‖<ε} q(t, v2, y) ≤ sup_{0<t<ε, ‖y−x‖<ε} q(t, v1, y) + c(x) ‖v2 − v1‖,

limsup_{t↘0, y→x} q(t, v2, y) ≤ limsup_{t↘0, y→x} q(t, v1, y) + c(x) ‖v2 − v1‖,

d̄C f(x; v2) ≤ d̄C f(x; v1) + c(x) ‖v2 − v1‖.


Since the roles of v1 and v2 can be interchanged,

|d̄C f(x; v2) − d̄C f(x; v1)| ≤ c(x) ‖v2 − v1‖.

Finally, given v1, v2 ∈ R^n, for t > 0 sufficiently small and y ≠ x close to x so that y + tv1, y + tv2, and y + t(v1 + v2) belong to V(x),

q(t, v2 + v1, y) = q(t, v2, y + tv1) + q(t, v1, y),

limsup_{t↘0, y→x} q(t, v2 + v1, y) ≤ limsup_{t↘0, y→x} q(t, v2, y + tv1) + limsup_{t↘0, y→x} q(t, v1, y)

= limsup_{t↘0, y→x} q(t, v2, y) + limsup_{t↘0, y→x} q(t, v1, y)

⇒ d̄C f(x; v2 + v1) ≤ d̄C f(x; v2) + d̄C f(x; v1).

For the norm f(x) = ‖x‖, which is convex and Lipschitzian, and, more generally, for semiconvex functions,110 the identity dH f(x; v) = d̄C f(x; v) holds. For functions enjoying that property, the nice semidifferential calculus and the chain rule are available, and the mapping v ↦ dH f(x; v) is convex and uniformly Lipschitzian. But this is not necessarily true for an arbitrary Lipschitzian function. The use of upper or lower semidifferentials involving a limsup or a liminf considerably weakens the nice semidifferential calculus associated with Hadamard semidifferentiable functions. For instance, the basic functional operations take the following form:

d̄C(f + g)(x; v) ≤ d̄C f(x; v) + d̄C g(x; v),

d̄C(f ∨ g)(x; v) ≤ d̄C f(x; v) ∨ d̄C g(x; v), where (f ∨ g)(x) = max{f(x), g(x)}.

In general, the classical chain rule will not hold for the composition of upper semidifferentiable functions111 and only weaker geometric forms will be available.112 In practice, the choice of a semidifferential and its associated calculus can be important. It should be sufficiently general to effectively deal with the problem at hand but not too general, so as to retain as many features of the classical differential calculus as possible. On this subject, it is amusing to read what Maurice Fréchet wrote about a notion of differential suggested by Paul Lévy:

. . . Lastly the definition due to M. Paul Lévy, not necessarily verifying the theorem of composite functions, is still more general, but for this very reason, perhaps too general. . . . 113

3.6 Continuity, Hadamard Semidifferential, and Fréchet Differential

In dimension n = 1, the Fréchet and Gateaux differentials coincide and correspond to the usual derivative of Definition 2.1(iii), and such functions are continuous by Theorem 2.1.

110 For semiconcave functions, we have d̲C f(x; v) = dH f(x; v) (see section 4.4).
111 Consider the real functions f(x) = |x| and g(x) = −x, and their composition (g ∘ f)(x) = g(f(x)) = −|x|. It is readily seen that d̄C(g ∘ f)(0; 1) = +1, d̄C g(0; 1) = −1, d̄C f(0; 1) = 1, and d̄C(g ∘ f)(0; 1) = +1 > −1 = d̄C g(0; d̄C f(0; 1)).
112 Cf. F. H. CLARKE [2, pp. 42–47].
113 From the French: “. . . Enfin la définition due à M. Paul Lévy, ne vérifiant pas nécessairement le théorème des fonctions composées, est plus générale encore, mais, pour cette même raison, peut-être trop générale. . . .” (M. FRÉCHET [5, p. 233] in 1937).


Example 3.8 shows that, in dimension $n \ge 2$, a Gateaux differentiable function is not necessarily continuous. So, it is not the linearity of the semidifferential that causes the continuity of the function. Example 3.7 considers a continuous function that is Hadamard semidifferentiable but not Gateaux differentiable (that is, $d_H f(x; v)$ is not linear in $v$). It is to preserve the continuity of the function that the stronger notion of Fréchet differential has been introduced. From Theorem 3.2, the function must be Hadamard semidifferentiable to be Fréchet differentiable and thence continuous. It turns out that the existence of $d_H f(x; 0)$ alone is sufficient for the continuity of $f$ at $x$.

Theorem 3.8. Let $f : \mathbb{R}^n \to \mathbb{R}^m$, $n \ge 1$, $m \ge 1$. If $d_H f(x; 0)$ exists at $x$, then $f$ is continuous at $x$. Moreover, for any $\alpha \in (0, 1)$ and any $\varepsilon > 0$,

$$\exists \delta > 0 \text{ such that } \forall y \in B_\delta(x), \quad \frac{\|f(y) - f(x)\|_{\mathbb{R}^m}}{\|y - x\|_{\mathbb{R}^n}^{\alpha}} < \varepsilon.$$

Corollary 1. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is Fréchet differentiable at $x$, then $f$ is continuous at $x$.

Remark 3.8. It is interesting to compare Theorem 3.8 and Theorem 3.6, which says that if $f$ is Lipschitzian$^{114}$ at $x$, then $d_H f(x; 0)$ exists. In general, Lipschitz continuous functions are not necessarily Hadamard semidifferentiable, and Hadamard semidifferentiable functions are continuous but not necessarily Lipschitz continuous.

Proof. First proof. If $d_H f(x; 0)$ exists, then $d_H f(x; 0) = df(x; 0) = 0$. Let $\{x_n\}$, $x_n \ne x$, be a sequence converging to $x$. We want to prove that $f(x_n) \to f(x)$. Let $t_n = \|x_n - x\|^{\alpha}$, $0 < \alpha < 1$. We have
$$\frac{f\left(x + t_n \frac{x_n - x}{t_n}\right) - f(x)}{t_n} = \frac{f(x_n) - f(x)}{t_n},$$
$$t_n = \|x_n - x\|^{\alpha} \searrow 0, \qquad v_n = \frac{x_n - x}{t_n}, \qquad \|v_n - 0\| = \|x_n - x\|^{1 - \alpha} \to 0.$$

Since $d_H f(x; 0)$ exists,

$$\lim_{n \to \infty} \frac{f(x_n) - f(x)}{t_n} = d_H f(x; 0) = 0.$$
For all $\varepsilon > 0$, there exists $N$ such that

$$\forall n > N, \quad \left\| \frac{f(x_n) - f(x)}{t_n} \right\| < \varepsilon \ \Rightarrow\ \|f(x_n) - f(x)\| < \varepsilon\, \|x_n - x\|^{\alpha} \to 0 \text{ as } n \to \infty.$$

This proves the continuity of $f$ at $x$.

Second proof. If $d_H f(x; 0)$ exists, then $d_H f(x; 0) = df(x; 0) = 0$. For all $y$ such that $y \ne x$,

$$\frac{\|f(y) - f(x)\|_{\mathbb{R}^m}}{\|y - x\|_{\mathbb{R}^n}^{\alpha}} = \left\| \frac{f\left(x + \|y - x\|^{\alpha}\, \frac{y - x}{\|y - x\|^{\alpha}}\right) - f(x)}{\|y - x\|^{\alpha}} - 0 \right\|_{\mathbb{R}^m}.$$

$^{114}$ This corresponds to $\alpha = 1$.


Since $d_H f(x; 0) = 0$, as $y \to x$, $t = \|y - x\|_{\mathbb{R}^n}^{\alpha} \to 0$ and

$$w = \frac{y - x}{\|y - x\|_{\mathbb{R}^n}^{\alpha}} = \frac{y - x}{\|y - x\|_{\mathbb{R}^n}} \, \|y - x\|^{1 - \alpha} \to 0 \text{ when } y \to x$$
$$\Rightarrow\ \lim_{y \to x} \left\| \frac{f\left(x + \|y - x\|^{\alpha}\, \frac{y - x}{\|y - x\|^{\alpha}}\right) - f(x)}{\|y - x\|_{\mathbb{R}^n}^{\alpha}} - d_H f(x; 0) \right\|_{\mathbb{R}^m} = 0.$$
Therefore, for all $\varepsilon > 0$, there exists $\delta > 0$ such that

$$\forall y \in B_\delta(x), \quad \frac{\|f(y) - f(x)\|_{\mathbb{R}^m}}{\|y - x\|^{\alpha}} = \left\| \frac{f\left(x + \|y - x\|^{\alpha}\, \frac{y - x}{\|y - x\|^{\alpha}}\right) - f(x)}{\|y - x\|^{\alpha}} - 0 \right\|_{\mathbb{R}^m} < \varepsilon.$$
In particular, this yields the continuity of $f$ at $x$.
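The $\alpha$-quotient in Theorem 3.8 can be watched numerically. The following sketch (our illustration, not from the text) takes $f(x) = |x|$ at $x = 0$, where $d_H f(0; 0)$ exists, and checks that $|f(y) - f(0)| / |y - 0|^{\alpha}$ decreases to $0$ for $\alpha = 1/2$:

```python
# Illustration of Theorem 3.8: for f(x) = |x| the quotient |f(y)-f(0)| / |y|^alpha
# equals |y|^(1-alpha) and tends to 0 as y -> 0 for any alpha in (0, 1).
f = lambda x: abs(x)
alpha = 0.5
ys = [10.0 ** -k for k in range(1, 9)]                 # y -> 0
quotients = [abs(f(y) - f(0.0)) / abs(y - 0.0) ** alpha for y in ys]
print(quotients[-1])   # small: roughly 1e-4 for y = 1e-8
```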

3.7 Mean Value Theorem for Functions of Several Variables

We start with the mean value theorem, Theorem 2.4, for a real function of a real variable, which will be suitably modified for vector functions of several variables.

Theorem 3.9. Let $f : \mathbb{R}^n \to \mathbb{R}$, $x \in \mathbb{R}^n$, and $v \in \mathbb{R}^n$. If the function $t \mapsto g(t) \stackrel{\text{def}}{=} f(x + tv)$ is continuous on $[0, 1]$ and differentiable on $]0, 1[$, then

∃θ ∈ ]0, 1[ such that f(x + v) = f(x) + df(x + θv; v). (3.59)

Proof. It is sufficient to observe that for t ∈ ]0, 1[

$$g'(t) = df(x + tv; v).$$

Indeed, by the definition of $g'(t)$ at a point $0 < t < 1$, for $|s|$ sufficiently small,
$$\frac{g(t + s) - g(t)}{s} = \frac{f(x + (t + s)v) - f(x + tv)}{s} = \frac{f((x + tv) + s v) - f(x + tv)}{s} \to df(x + tv; v) \text{ as } s \to 0.$$
To complete the proof, it is sufficient to apply the mean value theorem (Theorem 2.4) to $g(t) = f(x + tv)$: there exists $\theta \in\, ]0, 1[$ such that $g(1) = g(0) + g'(\theta)$.
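An intermediate point $\theta$ as in (3.59) can be located numerically. The sketch below (an added illustration; the function and the bisection scheme are our choices, not the book's) uses $f(x_1, x_2) = x_1^2 + 3x_2$, for which $df(y; v) = 2y_1 v_1 + 3v_2$, and solves $df(x + \theta v; v) = f(x+v) - f(x)$ for $\theta$:

```python
# Sketch of (3.59): find theta in ]0,1[ with f(x+v) = f(x) + df(x + theta*v; v)
# for the quadratic f(x1,x2) = x1**2 + 3*x2 along x = (0,0), v = (1,1).
f = lambda x1, x2: x1 ** 2 + 3 * x2
df = lambda y1, y2, v1, v2: 2 * y1 * v1 + 3 * v2     # gradient dot direction
x, v = (0.0, 0.0), (1.0, 1.0)
gap = lambda th: df(x[0] + th * v[0], x[1] + th * v[1], *v) - (f(x[0] + v[0], x[1] + v[1]) - f(*x))
lo, hi = 0.0, 1.0
for _ in range(60):                 # bisection on gap(theta) = 0
    mid = (lo + hi) / 2
    if gap(lo) * gap(mid) <= 0:
        hi = mid
    else:
        lo = mid
theta = (lo + hi) / 2
print(theta)                        # approximately 0.5 for this quadratic f
```

For a quadratic along a segment the mean value point is the midpoint, so the bisection homes in on $\theta = 1/2$.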

For vector-valued functions there is a $\theta_i$ for each component $f_i$, but not a single $\theta$ that suits all components.

Theorem 3.10. Consider the function $f : \mathbb{R}^n \to \mathbb{R}^m$, $m \ge 1$, $n \ge 1$, and two points $a, b \in \mathbb{R}^n$, $b \ne a$. Assume that $f$ is continuous at each point of the closed segment

{a + t (b − a) : 0 ≤ t ≤ 1} and Gateaux differentiable at each point of the open segment

{a + t (b − a) : 0 < t < 1}.

Then, there exists θ, 0 < θ < 1, such that

$$\|f(b) - f(a)\|_{\mathbb{R}^m} \le \|b - a\|_{\mathbb{R}^n} \, \|Df(a + \theta(b - a))\|_{\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)}. \tag{3.60}$$


Proof. Define the function

$$t \mapsto \varphi(t) \stackrel{\text{def}}{=} [f(b) - f(a)] \cdot f(a + t(b - a)) : [0, 1] \to \mathbb{R}.$$
As $\varphi$ is continuous on $[0, 1]$ and differentiable in $]0, 1[$,
$$\varphi'(t) = [f(b) - f(a)] \cdot df(a + t(b - a); b - a) = [f(b) - f(a)] \cdot Df(a + t(b - a))\,[b - a].$$

There exists $\theta \in\, ]0, 1[$ such that $\varphi(1) - \varphi(0) = \varphi'(\theta)$ by the mean value theorem (Theorem 2.4). Explicitly,

$$\varphi(1) - \varphi(0) = \|f(b) - f(a)\|_{\mathbb{R}^m}^2$$
$$\Rightarrow\ \|f(b) - f(a)\|^2 = [f(b) - f(a)] \cdot Df(a + \theta(b - a))\,[b - a]$$
$$\le \|f(b) - f(a)\|_{\mathbb{R}^m} \, \|Df(a + \theta(b - a))\|_{\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)} \, \|b - a\|_{\mathbb{R}^n}$$
$$\Rightarrow\ \|f(b) - f(a)\|_{\mathbb{R}^m} \le \|b - a\|_{\mathbb{R}^n} \, \|Df(a + \theta(b - a))\|_{\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)}.$$

Theorem 3.11. Consider a function $f : \mathbb{R}^n \to \mathbb{R}^m$ Gateaux differentiable in the open convex set $U \subset \mathbb{R}^n$ for which

$$\exists M > 0, \ \forall x \in U, \quad \|Df(x)\|_{\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)} \le M; \tag{3.61}$$
then

$$\forall a, b \in U, \quad \|f(b) - f(a)\|_{\mathbb{R}^m} \le M \, \|b - a\|_{\mathbb{R}^n}, \tag{3.62}$$
$f$ has a Lipschitzian extension to $\overline{U}$ with constant $M$, and $f$ is Fréchet differentiable in $U$.

Proof. From Theorem 3.10, for each pair $a, b \in U$,

$$\|f(b) - f(a)\|_{\mathbb{R}^m} \le \|b - a\|_{\mathbb{R}^n} \, \|Df(a + \theta(b - a))\|_{\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)} \le M \, \|b - a\|_{\mathbb{R}^n}$$
since, by convexity of $U$, $a + \theta(b - a) \in U$. Therefore, the function $f$ is Lipschitzian on $U$. Moreover, since $f$ is Gateaux differentiable and Lipschitzian on $U$, it is Fréchet differentiable on $U$ by Theorem 3.6(i).
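Inequality (3.62) is easy to sample numerically. In the sketch below (an added illustration with a function of our choosing), $f(x, y) = (\sin x + y, \cos y)$ has every partial derivative bounded by $1$, so the crude constant $M = 2$ dominates the Jacobian norm, and the Lipschitz bound is checked on random pairs of points:

```python
# Sketch of (3.62): ||f(b) - f(a)|| <= M ||b - a|| for f(x,y) = (sin x + y, cos y),
# whose Jacobian [[cos x, 1], [0, -sin y]] has norm at most sqrt(3) < M = 2.
import math
import random

def f(x, y):
    return (math.sin(x) + y, math.cos(y))

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

random.seed(0)
M = 2.0
pairs = [((random.uniform(-5, 5), random.uniform(-5, 5)),
          (random.uniform(-5, 5), random.uniform(-5, 5))) for _ in range(1000)]
ok = all(dist(f(*a), f(*b)) <= M * dist(a, b) + 1e-12 for a, b in pairs)
print(ok)   # -> True
```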

Remark 3.9. Theorem 3.11 seems to contradict Example 3.8 of a function $f(x, y)$ that is Gateaux differentiable at the point $(0, 0)$ but discontinuous at that point. Indeed, the function $f(x, y)$ is Gateaux differentiable not only at $(0, 0)$ but also at every point of $\mathbb{R}^2$. However, the function and its gradient are not bounded in any open ball $B_\delta(0, 0)$, $\delta > 0$, around $(0, 0)$. It is easy to check that for $(x, y) \ne (0, 0)$,
$$\frac{\partial f}{\partial y} = -\frac{2 x^6 (y - x^2)}{[(y - x^2)^2 + x^8]^2}, \qquad \frac{\partial f}{\partial x} = 2 x^5\, \frac{(y - x^2)(3y - x^2) - x^8}{[(y - x^2)^2 + x^8]^2}.$$

Choosing $y = x^2$, $x \ne 0$,
$$\frac{\partial f}{\partial y}(x, x^2) = 0, \qquad \frac{\partial f}{\partial x}(x, x^2) = -\frac{2}{x^3},$$
and $\nabla f(x, x^2)$ is not bounded as $x$ goes to $0$.


Corollary 1. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be Gateaux differentiable on a connected open nonempty subset $U$ of $\mathbb{R}^n$. If $Df(x) = 0$ in $U$, then $f(x)$ is equal to a constant vector in $U$.

Proof. Since $U \ne \emptyset$, pick a point $x_0 \in U$ and define $U_1 = \{x \in U : f(x) = f(x_0)\}$. By definition $U_1 \ne \emptyset$. Let $x \in U_1$. Since $U$ is open, there exists $r > 0$ such that $B_r(x) \subset U$. Apply Theorem 3.11 with $M = 0$ and the open convex set $B_r(x)$ to get

$$\forall y \in B_r(x), \quad f(y) = f(x) = f(x_0).$$
Hence $B_r(x) \subset U_1$. This shows that $U_1$ is an open subset of $\mathbb{R}^n$. Consider the complement with respect to $U$,
$$U \backslash U_1 \stackrel{\text{def}}{=} \{x \in U : f(x) \ne f(x_0)\}.$$

For each point $x_1 \in U \backslash U_1$, there exists $r > 0$ such that $B_r(x_1) \subset U$ and, by the same argument, $f(x) = f(x_1) \ne f(x_0)$ for all $x \in B_r(x_1)$. Thence, $B_r(x_1) \subset U \backslash U_1$ and $U \backslash U_1$ is open. If $U \backslash U_1 \ne \emptyset$, then $U$ is the union of two disjoint open sets $U_1$ and $U \backslash U_1$. But this is impossible since $U$ is connected in $\mathbb{R}^n$. Therefore, $U \backslash U_1 = \emptyset$ and $U = U_1$.

Remark 3.10. H. WHITNEY [1] has given an example of a connected set $U \subset \mathbb{R}^2$ and a differentiable function $f$ such that $\nabla f(x, y) = 0$ for all $(x, y) \in U$, but $f(x, y)$ is not constant on $U$. This set has no interior point.

3.8 Functions of Class $C^{(0)}$ and $C^{(1)}$

When $f$ is Gateaux (resp., Fréchet) differentiable at $x$, we have seen that the gradient can be expressed in terms of the partial derivatives of $f$ at $x$. In general, the converse is not true since the existence of directional derivatives in all directions is not sufficient to get Gateaux (and Fréchet) differentiability. However, when appropriate continuity conditions are imposed on the partial derivatives, the function becomes Fréchet differentiable and the gradient is completely specified by the partial derivatives. The main structure of the proof is borrowed from W. RUDIN [1].

Definition 3.11. Let $f : \mathbb{R}^n \to \mathbb{R}$, $U \subset \mathbb{R}^n$ be open, and $\{e_i\}_{i=1}^n$ be the canonical orthonormal basis of $\mathbb{R}^n$.

(i) $f$ is of class $C^{(0)}$ on $U$ if $f$ is continuous on $U$. It will be denoted $f \in C^{(0)}(U)$.

(ii) $f$ is of class $C^{(1)}$ on $U$ if the partial derivatives $\partial_i f(x)$, $1 \le i \le n$, exist and are continuous on $U$. It will be denoted $f \in C^{(1)}(U)$.

The above definitions extend to vector-valued functions.

Theorem 3.12. Let $f : \mathbb{R}^n \to \mathbb{R}$ and $\{e_i\}_{i=1}^n$ be the canonical orthonormal basis of $\mathbb{R}^n$.

(i) If $f$ has partial derivatives $\partial_i f$, $i = 1, \dots, n$, on a neighborhood of $x$ that are continuous at $x$, then $f$ is Fréchet differentiable at $x$ (and hence continuous at $x$) and
$$\forall v \in \mathbb{R}^n, \quad L(x)v = \sum_{i=1}^n df(x; e_i)\, e_i \cdot v, \qquad \nabla f(x) = \sum_{i=1}^n df(x; e_i)\, e_i.$$

(ii) If $f$ has partial derivatives $\partial_i f$, $i = 1, \dots, n$, on an open subset $U$ of $\mathbb{R}^n$ that are continuous on $U$, then $f$ is Fréchet differentiable on $U$ (and hence continuous on $U$) and for all $y \in U$,
$$\forall v \in \mathbb{R}^n, \quad L(y)v = \sum_{i=1}^n \partial_i f(y)\, e_i \cdot v, \qquad \nabla f(y) = \sum_{i=1}^n \partial_i f(y)\, e_i,$$


and the maps
$$y \mapsto \nabla f(y) : U \to \mathbb{R}^n \quad \text{and} \quad (y, w) \mapsto df(y; w) : U \times \mathbb{R}^n \to \mathbb{R} \tag{3.63}$$
are continuous.

Corollary 1. If $f$ is of class $C^{(1)}$ on an open set $U$, then $f$ is Fréchet differentiable on $U$ and hence of class $C^{(0)}$ on $U$. Moreover, $\nabla f$ is of class $C^{(0)}$ on $U$.

Proof of Theorem 3.12. (i) It is sufficient to prove that $f$ is Fréchet differentiable at $x$. The continuity follows from Theorem 3.8 and the other properties from the comments following the definitions of the Gateaux and Fréchet differentials. To show that $f$ is Fréchet differentiable, we show that $f$ is Hadamard semidifferentiable and that $d_H f(x; v)$ is linear with respect to $v$, and we apply Theorem 3.2.

Any element $v = (v_1, \dots, v_n)$ of $\mathbb{R}^n$ can be written as
$$v = \sum_{i=1}^n v_i e_i.$$
For each $y$ in the neighborhood $V(x)$ of $x$, define the linear map
$$v \mapsto L(y)v \stackrel{\text{def}}{=} \sum_{i=1}^n \partial_i f(y)\, v_i : \mathbb{R}^n \to \mathbb{R}.$$
Fix $v \in \mathbb{R}^n$ and consider sequences $w_k \to v$ and $t_k \searrow 0$. There exists $N$ such that

$$\forall k > N, \quad x + t_k w_k \in V(x).$$
Define the following points:
$$x_k^0 \stackrel{\text{def}}{=} x, \qquad x_k^i \stackrel{\text{def}}{=} x_k^{i-1} + t_k (w_k)_i e_i, \quad 1 \le i \le n.$$
We want to show that the differential quotient minus $L(x)v$,

$$q_k \stackrel{\text{def}}{=} \frac{f(x + t_k w_k) - f(x)}{t_k} - L(x)v,$$
goes to $0$ as $k$ goes to infinity. That difference can be rewritten in the form
$$q_k = \sum_{i=1}^n \left[ \frac{f(x_k^i) - f(x_k^{i-1})}{t_k} - \partial_i f(x)\, v_i \right].$$
Since $f$ is differentiable in the direction $e_i$ at every point of $V(x)$, the function $g_i(\alpha) = f(x_k^{i-1} + \alpha\, t_k (w_k)_i e_i)$ is continuous in $[0, 1]$ and differentiable in $]0, 1[$. By the mean value theorem (Theorem 3.9),
$$\exists \alpha_k^i \in\, ]0, 1[\,, \quad f(x_k^i) - f(x_k^{i-1}) = df(x_k^{i-1} + \alpha_k^i t_k (w_k)_i e_i;\, t_k (w_k)_i e_i) = t_k (w_k)_i\, \partial_i f(x_k^{i-1} + \alpha_k^i t_k (w_k)_i e_i)$$
by homogeneity and finally

$$\frac{f(x_k^i) - f(x_k^{i-1})}{t_k} = (w_k - v)_i\, \partial_i f(x_k^{i-1} + \alpha_k^i t_k (w_k)_i e_i) + v_i\, \partial_i f(x_k^{i-1} + \alpha_k^i t_k (w_k)_i e_i),$$
$$\frac{f(x_k^i) - f(x_k^{i-1})}{t_k} - \partial_i f(x)\, v_i = (w_k - v)_i\, \partial_i f(x_k^{i-1} + \alpha_k^i t_k (w_k)_i e_i) + v_i \left[ \partial_i f(x_k^{i-1} + \alpha_k^i t_k (w_k)_i e_i) - \partial_i f(x) \right].$$


But, by construction, for all i,

$$\left\| x_k^{i-1} + \alpha_k^i t_k (w_k)_i e_i - x \right\| = \left\| \sum_{j=1}^{i-1} (x_k^j - x_k^{j-1}) + \alpha_k^i t_k (w_k)_i e_i \right\| = \left\| \sum_{j=1}^{i-1} t_k (w_k)_j e_j + \alpha_k^i t_k (w_k)_i e_i \right\|$$
$$= t_k \left( \sum_{j=1}^{i-1} |(w_k)_j|^2 + |\alpha_k^i (w_k)_i|^2 \right)^{1/2} \le t_k \left( \sum_{j=1}^{i} |(w_k)_j|^2 \right)^{1/2} \le t_k \left( \sum_{j=1}^{n} |(w_k)_j|^2 \right)^{1/2} = t_k\, \|w_k\|,$$

which goes to zero as $w_k \to v$ and $t_k \searrow 0$. Since $\partial_i f(y)$ is continuous at $x$, $\partial_i f(x_k^{i-1} + \alpha_k^i t_k (w_k)_i e_i) \to \partial_i f(x)$, and $q_k$ goes to zero as $k$ goes to infinity. We have shown that, for all $v \in \mathbb{R}^n$, $d_H f(x; v) = L(x)v$, which is linear in $v$ by the definition of $L(x)$. This proves that $f$ is Fréchet differentiable at $x$.

(ii) Results for $U$ follow from the results in part (i) at $x$. It remains to prove the continuity of the maps (3.63). As the expression of the gradient $\nabla f(y)$ is
$$\nabla f(y) = \sum_{i=1}^n df(y; e_i)\, e_i,$$
it is continuous on $U$ as the sum of $n$ functions continuous on $U$ by assumption on the partial derivatives $\partial_i f(y) = df(y; e_i)$. Since $f$ is Fréchet differentiable on $U$, for all $y$ in $U$,

$$\forall w \in \mathbb{R}^n, \quad df(y; w) = \nabla f(y) \cdot w = L(y)w.$$

Given an arbitrary pair $(x, v)$ in $U \times \mathbb{R}^n$ and another pair $(y, w)$ in $U \times \mathbb{R}^n$, estimate the difference

$$df(y; w) - df(x; v) = df(y; w) - df(x; w) + df(x; w) - df(x; v) = [\nabla f(y) - \nabla f(x)] \cdot w + \nabla f(x) \cdot (w - v)$$
$$\Rightarrow\ |df(y; w) - df(x; v)| \le \|\nabla f(y) - \nabla f(x)\|\, \|w\| + \|\nabla f(x)\|\, \|w - v\|.$$

Since $x$ is fixed, there exists a constant $c > 0$ such that $\|\nabla f(x)\| \le c$, and for all $\varepsilon > 0$ and all $w$ such that $\|w - v\| \le \varepsilon/(2c)$, we have

$$\|\nabla f(x)\|\, \|w - v\| \le \varepsilon/2 \quad \text{and} \quad \|w\| \le \|v\| + \varepsilon/(2c).$$

Now, by continuity of the n partial derivatives on U, there exists δ(x) > 0 such that

$$\|y - x\| < \delta(x) \ \Rightarrow\ \left( \sum_{j=1}^n |\partial_j f(y) - \partial_j f(x)|^2 \right)^{1/2} \le \frac{\varepsilon}{2(\|v\| + \varepsilon/(2c))} \ \Rightarrow\ \|\nabla f(y) - \nabla f(x)\|\, \|w\| \le \varepsilon/2.$$

Finally, for all $\varepsilon > 0$, there exists $\delta_0 = \min\{\delta(x), \varepsilon/(2c)\}$ such that

$$\|y - x\| < \delta_0 \ \text{and} \ \|w - v\| \le \delta_0 \ \Rightarrow\ |df(y; w) - df(x; v)| \le \varepsilon$$
and hence the continuity as $(y, w)$ goes to $(x, v)$.
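The conclusion of Theorem 3.12 is easy to test on a concrete $C^{(1)}$ function. The sketch below (our illustration; the function is an arbitrary choice) builds the gradient of $f(x_1, x_2) = x_1 e^{x_2}$ from its partial derivatives and checks that $\nabla f(x) \cdot v$ matches a difference quotient in the direction $v$:

```python
# Sketch of Theorem 3.12: the gradient assembled from partial derivatives
# reproduces the directional derivative df(x; v) = grad f(x) . v.
import math

f = lambda x1, x2: x1 * math.exp(x2)
grad = lambda x1, x2: (math.exp(x2), x1 * math.exp(x2))   # (d1 f, d2 f)
x, v = (1.0, 0.5), (0.3, -0.7)
t = 1e-6
fd = (f(x[0] + t * v[0], x[1] + t * v[1]) - f(*x)) / t    # difference quotient
exact = grad(*x)[0] * v[0] + grad(*x)[1] * v[1]           # grad f(x) . v
print(abs(fd - exact))   # tiny: O(t)
```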


If $f$ and $g$ are functions of class $C^{(1)}$ on the same open set $U$, then the sum $(f + g)(x) = f(x) + g(x)$ is of class $C^{(1)}$ on $U$. In a similar fashion, the product $(fg)(x) = f(x)g(x)$ is of class $C^{(1)}$ using the property

$$\partial_i(fg) = (\partial_i f)g + f(\partial_i g), \quad 1 \le i \le n.$$
The composition of two functions of class $C^{(1)}$ is also of class $C^{(1)}$. Indeed, if $f = \psi \circ g$, $g : U \to \mathbb{R}$ ($U \subset \mathbb{R}^n$), and $\psi : \mathbb{R} \to \mathbb{R}$, then
$$\partial_i f = (\psi' \circ g)\, \partial_i g, \quad \text{where } \psi' = \frac{d\psi}{dx}.$$
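The $C^{(1)}$ chain rule above can be spot-checked numerically. In this sketch (an added illustration; $\psi$ and $g$ are arbitrary choices), $\psi(s) = s^3$ and $g(x_1, x_2) = x_1 + 2x_2$, so $\partial_1(\psi \circ g)(x) = 3\,g(x)^2 \cdot 1$:

```python
# Sketch of the chain rule d_i(psi o g) = (psi' o g) d_i g with psi(s) = s**3
# and g(x1, x2) = x1 + 2*x2, evaluated at the point (1, 2).
g = lambda x1, x2: x1 + 2 * x2
comp = lambda x1, x2: g(x1, x2) ** 3            # (psi o g)(x)
exact = 3 * g(1.0, 2.0) ** 2 * 1.0              # (psi' o g) * d1 g = 3*25*1 = 75
t = 1e-6
fd = (comp(1.0 + t, 2.0) - comp(1.0, 2.0)) / t  # difference quotient in x1
rel = abs(fd - exact) / abs(exact)
print(rel)   # tiny relative error
```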

3.9 Functions of Class $C^{(p)}$ and Hessian Matrix

Second- and higher-order semidifferentials are defined in the same manner as those of first order.

Definition 3.12. Let $f : \mathbb{R}^n \to \mathbb{R}$, $x \in \mathbb{R}^n$, and $v$ and $w$ be two directions in $\mathbb{R}^n$. Assume that $df(y; v)$ exists for all $y$ on a neighborhood $V(x)$ of $x$. The function $f$ has a second-order semidifferential in the directions $(v, w)$ at $x$ if the limit
$$\lim_{t \searrow 0} \frac{df(x + tw; v) - df(x; v)}{t}$$
exists. The limit is denoted $d^2 f(x; v; w)$.

In general, the order of the directions $(v, w)$ is important.

Theorem 3.13. If $f : \mathbb{R}^n \to \mathbb{R}$ is Gateaux differentiable on a neighborhood $V(x)$ of $x$, and $\nabla f(y)$ is Gateaux differentiable at $x$, then the map

$$(v, w) \mapsto d^2 f(x; v; w) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R} \tag{3.64}$$

$$\forall v \in \mathbb{R}^n, \quad w \mapsto d^2 f(x; v; w) : \mathbb{R}^n \to \mathbb{R} \ \text{is linear}, \tag{3.65}$$
$$\forall w \in \mathbb{R}^n, \quad v \mapsto d^2 f(x; v; w) : \mathbb{R}^n \to \mathbb{R} \ \text{is linear}. \tag{3.66}$$
Moreover, there exists a unique linear map

$$Hf(x) : \mathbb{R}^n \to \mathbb{R}^n \ \text{such that} \ \forall v, w \in \mathbb{R}^n, \quad d^2 f(x; v; w) = Hf(x)w \cdot v. \tag{3.67}$$

Any pair of directions $v = (v_1, \dots, v_n)$ and $w = (w_1, \dots, w_n)$ can be written as
$$v = \sum_{i=1}^n v_i e_i, \qquad w = \sum_{i=1}^n w_i e_i$$
in terms of the canonical orthonormal basis $\{e_i\}_{i=1}^n$. Under the assumptions of Theorem 3.13, the map $(v, w) \mapsto d^2 f(x; v; w) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is bilinear. Therefore
$$d^2 f(x; v; w) = d^2 f\left(x; \sum_{i=1}^n v_i e_i; \sum_{j=1}^n w_j e_j\right) = \sum_{i=1}^n \sum_{j=1}^n d^2 f(x; e_i; e_j)\, v_i w_j.$$

The elements $d^2 f(x; e_i; e_j)$ are the entries of the matrix associated with the linear map $Hf(x) : \mathbb{R}^n \to \mathbb{R}^n$. In the light of the theorem and the above comments, we now introduce the following definitions.


Definition 3.13. Let the assumptions of Theorem 3.13 hold.

(i) The Hessian$^{115}$ of $f$ at $x$ is the linear map $Hf(x) : \mathbb{R}^n \to \mathbb{R}^n$ defined by (3.67).

(ii) The Hessian matrix of $f$ at $x$ is the $n \times n$ matrix with entries

$$Hf(x)_{ij} \stackrel{\text{def}}{=} d^2 f(x; e_i; e_j)$$

with respect to the canonical orthonormal basis $\{e_i\}_{i=1}^n$ of $\mathbb{R}^n$. The same notation $Hf(x)$ will be used for the map and its associated matrix.

Proof of Theorem 3.13. By assumption, for all y ∈ V (x), f is Gateaux differentiable at y and df(y; v) = ∇f(y) · v and, since ∇f is Gateaux differentiable at x, d∇f(x; v) = D(∇f)(x) v,

where $D(\nabla f)(x) : \mathbb{R}^n \to \mathbb{R}^n$ is the Jacobian (linear) map associated with the vector function $y \mapsto \nabla f(y)$.

For $v$ and $w$ in $\mathbb{R}^n$ and $t > 0$, consider the quotient
$$\frac{df(x + tw; v) - df(x; v)}{t} = \frac{\nabla f(x + tw) - \nabla f(x)}{t} \cdot v.$$
As $t \to 0$,
$$\frac{\nabla f(x + tw) - \nabla f(x)}{t} \to D(\nabla f)(x)\, w,$$
$$\frac{df(x + tw; v) - df(x; v)}{t} = \frac{\nabla f(x + tw) - \nabla f(x)}{t} \cdot v \to D(\nabla f)(x)\, w \cdot v$$
$$\Rightarrow\ d^2 f(x; v; w) = D(\nabla f)(x)\, w \cdot v.$$

So we get the bilinearity, and the linear map $Hf(x)$ coincides with the linear map $D(\nabla f)(x)$.

We stress the fact that the definition of the Hessian matrix is compatible with the definition of the Jacobian matrix $DF$ of a vector function $F : \mathbb{R}^n \to \mathbb{R}^n$. Indeed,

$$DF(x)_{ij} \stackrel{\text{def}}{=} \frac{\partial F_i}{\partial x_j}(x),$$
where $F_i$ is the $i$th component of the function $F$. Taking $F = \nabla f$, we get
$$D(\nabla f(x))_{ij} = \frac{\partial}{\partial x_j}\left( \frac{\partial f}{\partial x_i} \right)(x) = d^2 f(x; e_i; e_j) = Hf(x)_{ij}.$$
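The entries $Hf(x)_{ij} = d^2 f(x; e_i; e_j)$ can be approximated by nested difference quotients. The sketch below (an added illustration; the test function and step size are arbitrary choices) does this for $f(x_1, x_2) = x_1^2 x_2 + x_2^3$, whose exact Hessian at $(1, 2)$ is $\begin{pmatrix} 4 & 2 \\ 2 & 12 \end{pmatrix}$:

```python
# Sketch: Hf(x)_ij ~ nested forward differences of f, mirroring d^2 f(x; e_i; e_j).
f = lambda x: x[0] ** 2 * x[1] + x[1] ** 3

def d(x, k, t):
    # forward difference quotient of f in the k-th coordinate direction
    y = list(x); y[k] += t
    return (f(y) - f(x)) / t

def hess_entry(x, i, j, t=1e-4):
    # difference quotient of x -> d(x, i, t) in the j-th direction
    y = list(x); y[j] += t
    return (d(y, i, t) - d(x, i, t)) / t

x = [1.0, 2.0]
H = [[hess_entry(x, i, j) for j in range(2)] for i in range(2)]
print(H)   # close to [[4, 2], [2, 12]], and nearly symmetric
```

The near-symmetry of the computed matrix anticipates Theorem 3.14 below: this $f$ is of class $C^{(2)}$, so the mixed entries agree.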

We shall use the notation $\partial^2_{ji} f(x)$. The higher-order partial derivatives are defined in the same way as second-order derivatives:

$$\frac{\partial^m f(x)}{\partial x_{i_m} \cdots \partial x_{i_1}} = d^m f(x; e_{i_1}; \dots; e_{i_m}). \tag{3.68}$$
We shall use the notation $\partial^m_{i_m \cdots i_1} f(x)$.

$^{115}$ The Hessian matrix was developed in the 19th century by Ludwig Otto Hesse (1811–1874) and later named after him. Hesse himself had used the term "functional determinant."


Definition 3.14. Let $f : \mathbb{R}^n \to \mathbb{R}$ and $U$ an open subset of $\mathbb{R}^n$. The function $f$ is of class $C^{(p)}$ on $U$ if all partial derivatives of $f$ of order $p$ exist and are continuous on $U$.

From the previous results (see Corollary 1 to Theorem 3.12), it is readily seen that functions of class $C^{(p)}$ on $U$ are of class $C^{(p-1)}$ on $U$, and so on. The second-order differentials will be important to characterize the convexity of a differentiable function and hence to characterize its minimizers. In general, the Hessian matrix is not symmetric (see the example of Exercise 7.8), but it is for functions of class $C^{(2)}$.

Theorem 3.14. Let $f : \mathbb{R}^n \to \mathbb{R}$. Let $V(x)$ be a neighborhood of $x$ such that
$$\forall y \in V(x), \ \forall v \in \mathbb{R}^n, \quad df(y; v) \ \text{exists}, \tag{3.69}$$
$$\forall y \in V(x), \ \forall v, w \in \mathbb{R}^n, \quad d^2 f(y; v; w) \ \text{exists}. \tag{3.70}$$
If for all $v$ and $w$ in $\mathbb{R}^n$ the map

$$y \mapsto d^2 f(y; v; w) : V(x) \to \mathbb{R} \ \text{is continuous at } x, \tag{3.71}$$
then

$$\forall v, w \in \mathbb{R}^n, \quad d^2 f(x; v; w) = d^2 f(x; w; v). \tag{3.72}$$

Corollary 1. Let $\{e_i\}_{i=1}^n$ be the canonical orthonormal basis of $\mathbb{R}^n$, $f : \mathbb{R}^n \to \mathbb{R}$, and $U$ be an open subset of $\mathbb{R}^n$. Assume that $f$ has partial derivatives $\partial_i f$ on $U$ and that each $\partial_i f$ also has partial derivatives $\partial_j(\partial_i f)$ on $U$. If for each pair $i$ and $j$, the map

$$y \mapsto \partial_j(\partial_i f(y)) : U \to \mathbb{R} \ \text{is continuous}, \tag{3.73}$$
then $f$ is of class $C^{(2)}$ on $U$,

$$\forall y \in U, \ \forall v, w \in \mathbb{R}^n, \quad d^2 f(y; v; w) = d^2 f(y; w; v), \quad \partial_j(\partial_i f(y)) = \partial_i(\partial_j f(y)), \tag{3.74}$$
and the Hessian matrix $Hf(y)$ is symmetric.

Proof of Theorem 3.14. Define

$$C_{s,t} \stackrel{\text{def}}{=} f(x + sv + tw) - f(x + sv) - f(x + tw) + f(x).$$

We give two different expressions of $C_{s,t}$. For $s > 0$ and $t > 0$ sufficiently small, $x$, $x + sv$, $x + sv + tw$ are in $U$. We have

$$C_{s,t} = g(x + tw) - g(x), \quad \text{where } g(z) \stackrel{\text{def}}{=} f(z + sv) - f(z).$$

By Taylor's theorem (Theorem 2.6), using a first-order expansion, there exists $\alpha_1 \in\, ]0, 1[$ such that $C_{s,t} = dg(x + \alpha_1 tw; tw)$. By the definition of $g$, this identity can be rewritten as a function of $df$:

$$C_{s,t} = df(x + \alpha_1 tw + sv; tw) - df(x + \alpha_1 tw; tw).$$

A second application of Taylor's theorem (Theorem 2.6) yields an $\alpha_2 \in\, ]0, 1[$ :

$$C_{s,t} = d^2 f(x + \alpha_1 tw + \alpha_2 sv;\, tw;\, sv) = st\, d^2 f(x + \alpha_1 tw + \alpha_2 sv;\, w;\, v)$$

using the positive homogeneity. In an analogous fashion, by interchanging the roles of $s$ and $t$, we get
$$\exists \alpha_3, \alpha_4 \in\, ]0, 1[\,, \quad C_{s,t} = st\, d^2 f(x + \alpha_3 tw + \alpha_4 sv;\, v;\, w).$$
Hence
$$d^2 f(x + \alpha_1 tw + \alpha_2 sv;\, w;\, v) = d^2 f(x + \alpha_3 tw + \alpha_4 sv;\, v;\, w).$$
As $s \to 0$ and $t \to 0$, we get, by continuity of $d^2 f$, $d^2 f(x; w; v) = d^2 f(x; v; w)$.

4 Convex and Semiconvex Functions

Recall Definitions 7.1 and 7.6 of Chapter 2 of a convex subset $U$ of $\mathbb{R}^n$ and of convex and strictly convex functions $f : \mathbb{R}^n \to \mathbb{R}$ on a convex $U$.

4.1 Directionally Differentiable Convex Functions

Theorem 4.1. Let $U \subset \mathbb{R}^n$ be a convex open subset of $\mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ be directionally differentiable$^{116}$ in $U$.

(i) $f$ is convex on $U$ if and only if

∀x, y ∈ U, f(y) ≥ f(x) + df(x; y − x). (4.1)

(ii) If, in addition, f is Gateaux differentiable in U, then f is convex on U if and only if

∀x, y ∈ U, f(y) ≥ f(x) + ∇f(x) · (y − x). (4.2)

Proof. (i) (⇒) For all λ ∈ ]0, 1[ ,

f(λy + (1 − λ)x) ≤ λf(y) + (1 − λ)f(x) ⇒ f(x + λ(y − x)) − f(x) ≤ λ(f(y) − f(x)).

By dividing by λ > 0 and going to the limit as λ → 0, we get

df(x; y − x) ≤ f(y) − f(x).

(⇐) Conversely, apply condition (4.1) twice for λ ∈ [0, 1] and x, y ∈ U:

f(x) ≥ f(x + λ(y − x)) + df(x + λ(y − x); −λ(y − x)), f(y) ≥ f(x + λ(y − x)) + df(x + λ(y − x); (1 − λ)(y − x)).

Multiply the first inequality by $1 - \lambda$ and the second by $\lambda$. Add them up. As $U$ is convex, $x + \lambda(y - x) \in U$, and, by homogeneity, $df(x + \lambda(y - x); -(y - x)) = -df(x + \lambda(y - x); y - x)$. This yields $(1 - \lambda)f(x) + \lambda f(y) \ge f(x + \lambda(y - x))$ and the convexity of $f$ on $U$.

(ii) As $f$ is Gateaux differentiable, its gradient exists and $df(x; v) = \nabla f(x) \cdot v$. Substitute it into the expression of part (i).

$^{116}$ In the sense of Definition 3.1(iii), that is, for all $v \in \mathbb{R}^n$ the limit of the differential quotient
$$\lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}$$
exists as $t$ goes to $0$. This condition is equivalent to saying that for all $v \in \mathbb{R}^n$, $df(x; v)$ exists and $df(x; -v) = -df(x; v)$.
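The gradient inequality (4.2) can be sampled numerically. The sketch below (an added illustration; the function is an arbitrary smooth convex choice) checks $f(y) \ge f(x) + \nabla f(x) \cdot (y - x)$ for $f(x_1, x_2) = x_1^2 + x_2^4$ over random pairs of points:

```python
# Sketch of Theorem 4.1(ii): a convex differentiable function lies above all
# of its tangent planes: f(y) >= f(x) + grad f(x) . (y - x).
import random

f = lambda p: p[0] ** 2 + p[1] ** 4
grad = lambda p: (2 * p[0], 4 * p[1] ** 3)
random.seed(1)
pairs = [([random.uniform(-2, 2), random.uniform(-2, 2)],
          [random.uniform(-2, 2), random.uniform(-2, 2)]) for _ in range(1000)]
ok = all(f(y) + 1e-12 >= f(x) + grad(x)[0] * (y[0] - x[0]) + grad(x)[1] * (y[1] - x[1])
         for x, y in pairs)
print(ok)   # -> True
```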


Theorem 4.2. Let $U \subset \mathbb{R}^n$ be a convex open subset of $\mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ be directionally differentiable on $U$.

(i) $f$ is strictly convex on $U$ if and only if

$$\forall x, y \in U, \ x \ne y, \quad f(y) > f(x) + df(x; y - x). \tag{4.3}$$

(ii) If, in addition, f is Gateaux differentiable in U, then f is strictly convex on U if and only if

$$\forall x, y \in U, \ x \ne y, \quad f(y) > f(x) + \nabla f(x) \cdot (y - x). \tag{4.4}$$

Proof. (i) If $f$ is strictly convex on $U$, we get (4.1) from Theorem 4.1. Therefore, for all $x$ and $y$ in $U$ such that $x \ne y$ and $t \in\, ]0, 1[$,

df(x; t(y − x)) ≤ f(x + t(y − x)) − f(x).

By positive homogeneity, tdf(x; y − x) = df(x; t(y − x)). As f is strictly convex,

$$f(x + t(y - x)) - f(x) = f((1 - t)x + ty) - f(x) < (1 - t)f(x) + t f(y) - f(x) = t\,[f(y) - f(x)]$$
$$\Rightarrow\ t\, df(x; y - x) < t\,[f(y) - f(x)].$$

By dividing both sides by $t$, we get (4.3). The proof of the converse is the same as that of Theorem 4.1 but with $\lambda \in\, ]0, 1[$ and $x \ne y$.

(ii) Since $f$ is Gateaux differentiable, the gradient exists and $df(x; v) = \nabla f(x) \cdot v$. Substitute in part (i).

For the next theorem, recall Definition 4.16 of Chapter 2.

Definition 4.1. A symmetric matrix A is positive definite (resp., positive semidefinite) if

$$\forall x \in \mathbb{R}^n, \ x \ne 0, \quad (Ax) \cdot x > 0 \qquad (\text{resp., } \forall x \in \mathbb{R}^n, \ (Ax) \cdot x \ge 0).$$
The property will be denoted $A > 0$ (resp., $A \ge 0$).

Theorem 4.3. Let $U \subset \mathbb{R}^n$ be open convex and $f : \mathbb{R}^n \to \mathbb{R}$ be of class $C^{(2)}$ on $U$.

(i) $f$ is convex on $U$ if and only if $Hf(y) \ge 0$ for all $y \in U$.

(ii) If there exists $x \in U$ such that $Hf(x) > 0$, then there exists a convex neighborhood $V(x)$ of $x$ such that $f$ is strictly convex on $V(x)$.

Remark 4.1. The converse of part (ii) of Theorem 4.3 is not true. Indeed, consider the function $f(x) = x^4$ defined on $\mathbb{R}$. Its second-order derivative is given by $f^{(2)}(x) = 12x^2$. At $x = 0$, $f^{(2)}$ is zero even though $f$ is strictly convex on any neighborhood of $x = 0$.
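Remark 4.1 is simple enough to confirm numerically. The sketch below (an added illustration) evaluates $f^{(2)}(0) = 12 \cdot 0^2$ and tests the strict midpoint inequality $f\big(\tfrac{a+b}{2}\big) < \tfrac{f(a)+f(b)}{2}$ for $f(x) = x^4$ on a grid of distinct points around $0$:

```python
# Sketch for Remark 4.1: f(x) = x**4 has vanishing second derivative at 0
# yet satisfies the strict midpoint-convexity inequality near 0.
f = lambda x: x ** 4
second_at_zero = 12 * 0.0 ** 2           # f''(0) = 0
pts = [-0.5 + 0.1 * i for i in range(11)]
strict = all(f((a + b) / 2) < (f(a) + f(b)) / 2
             for a in pts for b in pts if abs(a - b) > 1e-9)
print(second_at_zero, strict)   # -> 0.0 True
```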

Proof of Theorem 4.3. We again make use of Taylor's theorem (Theorem 2.6) applied to the function $g(t) = f(x + t(y - x))$. For all $x, y$ in $U$, there exists $\alpha \in\, ]0, 1[$ such that
$$f(y) = f(x) + \nabla f(x) \cdot (y - x) + \frac{1}{2}\, Hf(x + \alpha(y - x))(y - x) \cdot (y - x).$$


(i) If $f$ is convex on $U$, by Theorem 4.1,
$$0 \le f(y) - f(x) - \nabla f(x) \cdot (y - x) = \frac{1}{2}\, Hf(x + \alpha(y - x))(y - x) \cdot (y - x).$$

But, since $U$ is open, there exists $r > 0$ such that $B_r(x) \subset U$. Therefore

$$Hf(x + \alpha r b)\, b \cdot b \ge 0 \quad \forall b \in B_1(0).$$
Since $f$ is of class $C^{(2)}$ and $|\alpha r b| < r$, as $r$ goes to $0$,

$$Hf(x)\, b \cdot b \ge 0 \quad \forall b \in B_1(0) \ \Rightarrow\ \forall v \in \mathbb{R}^n, \ Hf(x)\, v \cdot v \ge 0,$$
and $Hf(x) \ge 0$ for all $x \in U$.

Conversely, for all $x, y \in U$, there exists $\alpha \in\, ]0, 1[$ such that
$$f(y) - f(x) - \nabla f(x) \cdot (y - x) = \frac{1}{2}\, Hf(x + \alpha(y - x))(y - x) \cdot (y - x) \ge 0$$
since $x + \alpha(y - x) \in U$ and, by assumption, $Hf(x + \alpha(y - x)) \ge 0$. The function $f$ is then convex by Theorem 4.1.

(ii) If $Hf(x) > 0$ for some $x \in U$, then, by continuity, there exists $r > 0$ such that

$$B_r(x) \subset U \quad \text{and} \quad \forall y \in B_r(x), \ Hf(y) > 0.$$

Now make use of Taylor's theorem (Theorem 2.6) in $B_r(x)$. Then, for all $y, z \in B_r(x)$, $y \ne z$, there exists $\alpha \in\, ]0, 1[$ such that
$$f(y) - f(z) - \nabla f(z) \cdot (y - z) = \frac{1}{2}\, Hf(z + \alpha(y - z))(y - z) \cdot (y - z) > 0$$
since $z + \alpha(y - z) \in B_r(x)$ and, by assumption, $Hf(z + \alpha(y - z)) > 0$. Therefore, for all $y, z \in B_r(x)$, $y \ne z$, $f(y) - f(z) - \nabla f(z) \cdot (y - z) > 0$ and, by Theorem 4.2, $f$ is strictly convex on $B_r(x)$.

4.2 Semidifferentiability and Continuity of Convex Functions

One of the most studied families of functions is that of convex functions. In general, a convex function on a compact convex set $U$ is not continuous and does not necessarily have a semidifferential at all points of $U$, as illustrated by the graphs of Figure 3.8 corresponding to the two functions of Example 4.1.

Example 4.1. Consider the following convex discontinuous function $g_1 : [0, 2] \to \mathbb{R}$:
$$g_1(x) \stackrel{\text{def}}{=} \begin{cases} 3/2 & \text{if } x = 0, \\ 1 - x/2 & \text{if } 0 < x < 2, \\ 1 & \text{if } x = 2. \end{cases}$$

It is readily checked that $dg_1(0; 1) = -\infty$ and $dg_1(2; -1) = -\infty$. Now consider the continuous convex function $g_2 : [0, 2] \to \mathbb{R}$,

$$g_2(x) \stackrel{\text{def}}{=} 1 - \sqrt{1 - (x - 1)^2}.$$

Again, it is easy to check that $dg_2(0; 1) = -\infty$ and $dg_2(2; -1) = -\infty$.
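The claim $dg_2(0; 1) = -\infty$ can be seen from the difference quotients themselves. The sketch below (an added illustration) evaluates $(g_2(t) - g_2(0))/t$ for shrinking $t$; since this equals $-\sqrt{2/t - 1}$, it decreases without bound:

```python
# Sketch for Example 4.1: the quotients of g2(x) = 1 - sqrt(1 - (x-1)**2)
# at x = 0 in direction +1 diverge to -infinity.
import math

g2 = lambda x: 1 - math.sqrt(1 - (x - 1) ** 2)
qs = [(g2(t) - g2(0.0)) / t for t in [10.0 ** -k for k in range(1, 7)]]
print(qs[-1])   # large negative, roughly -1414 at t = 1e-6
```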


Figure 3.8. The two convex functions $g_1$ and $g_2$ on $[0, 2]$ of Example 4.1.

However, problems seem to occur only at the boundary of the set $U$, and Theorem 4.4 will show that a convex function $f$ defined on a convex neighborhood of a point $x$ is semidifferentiable at $x$. If, in addition, $f$ is continuous at $x$, $f$ will be Hadamard semidifferentiable at $x$ (see Theorem 4.7). Finally, Theorem 4.8 will show that, in finite dimensions, a convex function defined on a convex set $U$ is continuous on its interior $\operatorname{int} U$. A complete characterization of a convex function from the existence of semidifferentials and associated conditions on them will be given in Theorem 4.5.

4.2.1 Convexity and Semidifferentiability

Theorem 4.4. Let $V(x)$ be a convex neighborhood of a point $x \in \mathbb{R}^n$ and $f : V(x) \to \mathbb{R}$ a convex function in $V(x)$. Then $df(x; v)$ exists in all directions $v \in \mathbb{R}^n$ and

$$\forall v \in \mathbb{R}^n, \quad df(x; v) + df(x; -v) \ge 0. \tag{4.5}$$

Proof. (i) Existence. Given $v \in \mathbb{R}^n$, there exists $\alpha_0$, $0 < \alpha_0 < 1$, such that $x - \alpha v \in V(x)$, $0 < \alpha \le \alpha_0$, and there exists $\theta_0$, $0 < \theta_0 < 1$, such that $x + \theta v \in V(x)$, $0 < \theta \le \theta_0$. Fix $\alpha$, $0 < \alpha \le \alpha_0$. We first show that

$$\forall \theta, \ 0 < \theta < \theta_0, \quad \frac{f(x) - f(x - \alpha v)}{\alpha} \le \frac{f(x + \theta v) - f(x)}{\theta}. \tag{4.6}$$
Indeed, $x$ can be written as
$$x = \frac{\alpha}{\alpha + \theta}(x + \theta v) + \frac{\theta}{\alpha + \theta}(x - \alpha v)$$
and, by convexity,
$$f(x) \le \frac{\alpha}{\alpha + \theta} f(x + \theta v) + \frac{\theta}{\alpha + \theta} f(x - \alpha v)$$
or, by rearranging,

$$\frac{\theta}{\theta + \alpha}\,[f(x) - f(x - \alpha v)] \le \frac{\alpha}{\theta + \alpha}\,[f(x + \theta v) - f(x)],$$
and hence we get (4.6). Define

$$\varphi(\theta) \stackrel{\text{def}}{=} \frac{f(x + \theta v) - f(x)}{\theta}, \quad 0 < \theta < \theta_0,$$


and show that $\varphi$ is monotone increasing. For all $\theta_1$ and $\theta_2$, $0 < \theta_1 < \theta_2 < \theta_0$,
$$f(x + \theta_1 v) - f(x) = f\left( \frac{\theta_1}{\theta_2}(x + \theta_2 v) + \left(1 - \frac{\theta_1}{\theta_2}\right) x \right) - f(x)$$
$$\le \frac{\theta_1}{\theta_2} f(x + \theta_2 v) + \left(1 - \frac{\theta_1}{\theta_2}\right) f(x) - f(x) \le \frac{\theta_1}{\theta_2}\,[f(x + \theta_2 v) - f(x)]$$

$$\Rightarrow\ \varphi(\theta_1) \le \varphi(\theta_2).$$

Since the function $\varphi(\theta)$ is increasing and bounded below for $\theta \in\, ]0, \theta_0[$, the limit as $\theta$ goes to $0$ exists. By definition, it coincides with the semidifferential $df(x; v)$.

(ii) Given $v \in \mathbb{R}^n$, there exists $\alpha_0$, $0 < \alpha_0 < 1$, such that
$$\frac{f(x) - f(x - \alpha v)}{\alpha} \le df(x; v), \quad 0 < \alpha \le \alpha_0.$$
From part (i), for all $v \in \mathbb{R}^n$, $df(x; v)$ and $df(x; -v)$ exist. Letting $\alpha$ go to $0$, we get
$$-df(x; -v) = -\lim_{\alpha \searrow 0} \frac{f(x - \alpha v) - f(x)}{\alpha} \le df(x; v)$$
and the inequality $df(x; -v) + df(x; v) \ge 0$.

Corollary 1. If $f : \mathbb{R}^n \to \mathbb{R}$ is convex in $\mathbb{R}^n$, then for all $v \in \mathbb{R}^n$ and all $x \in \mathbb{R}^n$,
$$f(x) - f(x - v) \le -df(x; -v) \le df(x; v) \le f(x + v) - f(x). \tag{4.7}$$
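The chain of inequalities (4.7) can be verified on a nonsmooth convex example. In the sketch below (an added illustration), $f(x) = |x|$ at $x = 0$, where the semidifferential is $df(0; u) = |u|$:

```python
# Sketch of (4.7) for f(x) = |x| at x = 0:
# f(x) - f(x-v) <= -df(x;-v) <= df(x;v) <= f(x+v) - f(x).
f = lambda x: abs(x)
df0 = lambda u: abs(u)   # semidifferential of |.| at the origin
ok = all(f(0.0) - f(-v) <= -df0(-v) <= df0(v) <= f(v) - f(0.0)
         for v in [-2.0, -0.5, 0.3, 1.7])
print(ok)   # -> True
```

At the kink the middle inequality is strict ($-|v| \le |v|$), which is exactly the gap (4.5) allows.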

This theorem is the first step towards the complete characterization of a convex function and the relaxation of the conditions of Theorem 4.2.

Theorem 4.5. Let $U \subset \mathbb{R}^n$ be open convex and $f : U \to \mathbb{R}$. Then $f$ is convex (resp., strictly convex) on $U$ if and only if the following conditions are satisfied:

(a) $\forall x \in U$, $\forall v \in \mathbb{R}^n$, $df(x; v)$ exists;

(b) $\forall x \in U$, $\forall v \in \mathbb{R}^n$, $df(x; v) + df(x; -v) \ge 0$;

(c) $\forall x, y \in U$, $f(y) \ge f(x) + df(x; y - x)$ (resp., $\forall x, y \in U$, $x \ne y$, $f(y) > f(x) + df(x; y - x)$).

Proof. ($\Rightarrow$) If $f$ is convex, conditions (a) and (b) are verified by Theorem 4.4 at all points of the open convex set $U$. As for condition (c), let $x$ and $y$ be two points of $U$ and $\theta$, $0 < \theta \le 1$. As $f$ is convex on $U$,

f(θy + (1 − θ)x) ≤ θf(y) + (1 − θ)f(x) ⇒ f(x + θ(y − x)) − f(x) ≤ θ [f(y) − f(x)].

By dividing by θ and going to the limit as θ goes to 0, we get

df(x; y − x) ≤ f(y) − f(x).

When $f$ is strictly convex, choose $x \ne y$ and $\theta$, $0 < \theta < 1$. Then
$$df(x; y - x) = \frac{1}{\theta}\, df(x; \theta(y - x)) \le \frac{1}{\theta}\,[f(x + \theta(y - x)) - f(x)].$$


But
$$f(x + \theta(y - x)) = f(\theta y + (1 - \theta)x) < \theta f(y) + (1 - \theta)f(x)$$
$$\Rightarrow\ df(x; y - x) \le \frac{1}{\theta}\,[f(x + \theta(y - x)) - f(x)] < \frac{1}{\theta}\, \theta(f(y) - f(x)) = f(y) - f(x).$$
($\Leftarrow$) Apply the inequality in (c) to $x$ and $x + \theta(y - x)$ and to $y$ and $x + \theta(y - x)$ with $\theta \in [0, 1]$:
$$f(x) \ge f(x + \theta(y - x)) + df(x + \theta(y - x); -\theta(y - x)),$$
$$f(y) \ge f(x + \theta(y - x)) + df(x + \theta(y - x); (1 - \theta)(y - x)).$$
Multiply the first inequality by $1 - \theta$ and the second by $\theta$ and add them up:
$$(1 - \theta)f(x) + \theta f(y) \ge f(x + \theta(y - x)) + (1 - \theta)\theta\, df(x + \theta(y - x); -(y - x)) + \theta(1 - \theta)\, df(x + \theta(y - x); y - x).$$
By using (b), we get $f(\theta y + (1 - \theta)x) \le \theta f(y) + (1 - \theta)f(x)$. In the strictly convex case, $y \ne x$ and $\theta \in\, ]0, 1[$ yield $x \ne x + \theta(y - x)$ and $y \ne x + \theta(y - x)$. The previous steps can be repeated with a strict inequality.

Theorem 4.6. Let $U \subset \mathbb{R}^n$ be open convex and $f : U \to \mathbb{R}$ be convex on $U$. For each $x \in U$, the function
$$v \mapsto df(x; v) : \mathbb{R}^n \to \mathbb{R} \tag{4.8}$$
is positively homogeneous, convex, and subadditive, that is,
$$\forall v, w \in \mathbb{R}^n, \quad df(x; v + w) \le df(x; v) + df(x; w). \tag{4.9}$$
Proof. We want to show that for all $\alpha$, $0 \le \alpha \le 1$, and $v, w \in \mathbb{R}^n$,
$$df(x; \alpha v + (1 - \alpha)w) \le \alpha\, df(x; v) + (1 - \alpha)\, df(x; w).$$
Since $x \in U$ and $U$ is open and convex,

$$\exists \theta_0, \ 0 < \theta_0 < 1, \ \text{such that} \ \forall \theta, \ 0 < \theta \le \theta_0, \quad x + \theta v \in U \ \text{and} \ x + \theta w \in U$$
$$\Rightarrow\ \forall\, 0 \le \alpha \le 1, \quad x + \theta(\alpha v + (1 - \alpha)w) = \alpha(x + \theta v) + (1 - \alpha)(x + \theta w) \in U,$$
and by convexity of $f$,
$$f(x + \theta(\alpha v + (1 - \alpha)w)) = f(\alpha[x + \theta v] + (1 - \alpha)[x + \theta w]) \le \alpha f(x + \theta v) + (1 - \alpha) f(x + \theta w)$$
$$\Rightarrow\ f(x + \theta(\alpha v + (1 - \alpha)w)) - f(x) \le \alpha\,[f(x + \theta v) - f(x)] + (1 - \alpha)\,[f(x + \theta w) - f(x)].$$
Dividing by $\theta$ and going to the limit as $\theta$ goes to $0$, we get
$$df(x; \alpha v + (1 - \alpha)w) \le \alpha\, df(x; v) + (1 - \alpha)\, df(x; w).$$
Combining the positive homogeneity and the convexity,
$$df(x; v + w) = df\left(x; \frac{1}{2}\, 2v + \frac{1}{2}\, 2w\right) \le \frac{1}{2}\, df(x; 2v) + \frac{1}{2}\, df(x; 2w) = df(x; v) + df(x; w),$$
we get the subadditivity.
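Subadditivity (4.9) is visible on a simple nonsmooth convex function. In this sketch (an added illustration), $f(x) = \max(x_1, x_2)$, for which the quotients $(f(tv) - f(0))/t$ are constant in $t$, so $df(0; v) = \max(v_1, v_2)$ exactly:

```python
# Sketch of Theorem 4.6: for f(x) = max(x1, x2), df(0; v) = max(v1, v2),
# and df(0; v + w) <= df(0; v) + df(0; w) on random direction pairs.
import random

df0 = lambda v: max(v[0], v[1])   # df(0; v) for f(x) = max(x1, x2)
random.seed(2)
trials = [((random.uniform(-1, 1), random.uniform(-1, 1)),
           (random.uniform(-1, 1), random.uniform(-1, 1))) for _ in range(1000)]
ok = all(df0((v[0] + w[0], v[1] + w[1])) <= df0(v) + df0(w) + 1e-12
         for v, w in trials)
print(ok)   # -> True
```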


4.2.2 Convexity and Continuity

We have seen that the Hadamard semidifferential of a convex continuous function in a convex neighborhood of a point x exists at x in all directions. We now give a series of necessary and sufficient conditions for the existence of the Hadamard semidifferential.

Theorem 4.7. Let f : ℝⁿ → ℝ ∪ {+∞} be convex on a convex neighborhood of a point x of ℝⁿ. The following conditions are equivalent.

(i) f is bounded above on a neighborhood of x.

(ii) There exist a neighborhood W (x) of x and a constant c(x) > 0 such that

∀y ∈ W(x), ∀v, w ∈ ℝⁿ, |df(y; w) − df(y; v)| ≤ c(x) ‖w − v‖.   (4.10)

(iii) f is Lipschitzian at x; that is, there exist c(x) > 0 and a neighborhood W(x) of x such that ∀y, z ∈ W(x), |f(y) − f(z)| ≤ c(x) ‖y − z‖.

(iv) f is Hadamard semidifferentiable on a neighborhood of x.

(v) dH f(x; 0) exists.

(vi) f is continuous at x.

Proof. (i) ⇒ (ii) By assumption, there exist a neighborhood V(x) of x and a constant d(x) ∈ ℝ such that ∀y ∈ V(x), f(y) ≤ d(x), and since V(x) is a neighborhood of x, there exists η > 0 such that B2η(x) ⊂ V(x). Choose c(x) = d(x) − f(x). From the previous discussion, c(x) ≥ 0. We get

∀z ∈ B2η(x), f(z) − f(x) ≤ c(x). (4.11)

By convexity of f on B2η(x), we have from Theorem 4.5(a) that df(x; v) exists for all v and, by property (c), that

df(x; z − x) ≤ f(z) − f(x) ≤ c(x) ⇒ ∀y ∈ B2η(0), df(x; y) ≤ c(x). (4.12)

For all w ∈ ℝⁿ, w ≠ 0, η w/‖w‖ ∈ B2η(0) and

df(x; η w/‖w‖) ≤ c(x).

Since v ↦ df(x; v) is positively homogeneous,

∀w ∈ ℝⁿ, df(x; w) ≤ (c(x)/η) ‖w‖.   (4.13)

By convexity of f and Theorem 4.4,
  −df(x; w) ≤ df(x; −w) ≤ (c(x)/η) ‖−w‖ = (c(x)/η) ‖w‖,

and, combining this last inequality with inequality (4.13), we get

∀w ∈ ℝⁿ, |df(x; w)| ≤ (c(x)/η) ‖w‖.   (4.14)

But we want more. Again by convexity of f, we get from (4.9) in Theorem 4.6 that v ↦ df(x; v) is subadditive. Therefore, for all v and w in ℝⁿ,
  df(x; w) − df(x; v) = df(x; v + (w − v)) − df(x; v) ≤ df(x; v) + df(x; w − v) − df(x; v) = df(x; w − v).

Then, using inequality (4.14), we get
  df(x; w) − df(x; v) ≤ df(x; w − v) ≤ |df(x; w − v)| ≤ (c(x)/η) ‖w − v‖.

By repeating this argument and interchanging the roles of v and w, we get

∀v, w ∈ ℝⁿ, |df(x; w) − df(x; v)| ≤ (c(x)/η) ‖w − v‖.

We have proved the result at x. We now extend it to a neighborhood of x. Again by convexity of f and property (b) of Theorem 4.5, we have from (4.12) and (4.13), for all z ∈ Bη(x),
  f(x) − f(z) ≤ −df(x; z − x) ≤ df(x; x − z) ≤ (c(x)/η) ‖x − z‖ ≤ c(x).

Therefore, by combining this inequality with inequality (4.11),

∀y ∈ Bη(x), |f(y) − f(x)| ≤ c(x).

Repeat the proof by replacing x by some y ∈ Bη(x). Indeed, for z ∈ Bη(y), we get z ∈ B2η(x) and

f(z) − f(y) ≤ [f(z) − f(x)] + |f(y) − f(x)| ≤ 2c(x),
since each of the two terms is at most c(x), and f is bounded above on a neighborhood Bη(y) of y by the constant d(y) = f(y) + 2c(x). So we are back to the previous conditions with y in place of x, 2c(x) in place of c(x), and Bη(y) in place of B2η(x). By the same technical steps as above, we finally get, for all y ∈ Bη(x),
  ∀v, w ∈ ℝⁿ, |df(y; w) − df(y; v)| ≤ (2c(x)/(η/2)) ‖w − v‖ = (4c(x)/η) ‖w − v‖.

Choose the neighborhood W(x) = Bη(x).

(ii) ⇒ (iii) By definition, df(y; 0) = 0 and, from (4.10),

∀y ∈ W(x), ∀w ∈ ℝⁿ, |df(y; w)| = |df(y; w) − df(y; 0)| ≤ c(x) ‖w‖.

Since W (x) is a neighborhood of x, there exists η > 0 such that Bη(x) ⊂ W (x). From this we have for y1 and y2 ∈ Bη(x)

f(y2) − f(y1) ≤ −df(y2; y1 − y2) ≤ df(y2; y2 − y1) ≤ c(x) ‖y2 − y1‖.


By interchanging the roles of y1 and y2, we get

∀y1, y2 ∈ Bη(x), |f(y2) − f(y1)| ≤ c(x) ‖y2 − y1‖.

So the function f is Lipschitzian at x.

(iii) ⇒ (iv) By definition, there exist η > 0 and c(x) > 0 such that

Bη(x) ⊂ W(x) and ∀y, z ∈ Bη(x), |f(y) − f(z)| ≤ c(x) ‖y − z‖.

For all y ∈ Bη(x), there exists ε > 0 such that Bε(y) ⊂ Bη(x). Then

∀z1, z2 ∈ Bε(y), |f(z2) − f(z1)| ≤ c(x) ‖z2 − z1‖,

and f is Lipschitzian and convex on Bε(y). Therefore, as f is convex on Bη(x), for all v ∈ ℝⁿ the semidifferential df(y; v) exists and, since f is Lipschitzian on Bη(x), the semidifferential dH f(y; v) exists by Theorem 3.6. Therefore, we have the result with W(x) = Bη(x).

(iv) ⇒ (v) Obvious by choosing v = 0 at x.

(v) ⇒ (vi) By Theorem 3.8.

(vi) ⇒ (i) If f is continuous at x, then

∀ε > 0, ∃η > 0, ∀y, ‖y − x‖ < η, |f(y) − f(x)| < ε,
and, for y in the neighborhood W(x) = Bη(x) of x, f(y) ≤ |f(y)| ≤ |f(x)| + ε. Therefore, the function f is bounded above on the neighborhood W(x) of x.
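The chain (i) ⇒ (ii) ⇒ (iii) is easy to probe numerically. The sketch below takes an arbitrary polyhedral convex function (a max of affine pieces, chosen purely for illustration), computes the exact upper bound d(x) on B2η(x), and checks that the empirical difference quotients on Bη(x) stay below the Lipschitz constant 4c(x)/η produced by the proof.

```python
import math, random

# Numerical sketch of Theorem 4.7, (i) => (iii): a convex function bounded above on
# B_{2eta}(x) by d(x) is Lipschitzian on B_eta(x) with constant 4 c(x)/eta, where
# c(x) = d(x) - f(x).  The polyhedral f below is an arbitrary illustrative example.

A = [(1.0, 0.0, 0.0), (0.0, 2.0, -1.0), (-1.0, -1.0, 0.5)]  # affine pieces (a1, a2, b)

def f(p):
    return max(a1*p[0] + a2*p[1] + b for (a1, a2, b) in A)

x, eta = (0.3, 0.2), 0.5
# exact upper bound d(x) of f on B_{2eta}(x): sup of each affine piece on the ball
d = max(a1*x[0] + a2*x[1] + b + 2*eta*math.hypot(a1, a2) for (a1, a2, b) in A)
c = d - f(x)
L = 4*c/eta  # Lipschitz constant on B_eta(x) supplied by the proof

random.seed(0)
def sample(r):
    # rejection sampling of a point of the open ball B_r(x)
    while True:
        p = (x[0] + random.uniform(-r, r), x[1] + random.uniform(-r, r))
        if math.hypot(p[0] - x[0], p[1] - x[1]) < r:
            return p

ratios = []
for _ in range(2000):
    y, z = sample(eta), sample(eta)
    if y != z:
        ratios.append(abs(f(y) - f(z))/math.hypot(y[0] - z[0], y[1] - z[1]))
print(max(ratios), "<=", L)
```

The empirical ratios are in fact bounded by the much smaller constant max ‖aᵢ‖; the proof's constant 4c(x)/η is not sharp but is what the general argument delivers.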

Since we have established that a continuous convex function is Hadamard semidifferentiable, we complete this section by proving that convex functions on convex sets U are continuous in the interior of U. We need the following lemma.

Lemma 4.1. Given an open ball Bη(x) at x of radius η > 0, there exist x0, x1, . . . , xn in Bη(x) and ε > 0 such that

Bε(x) ⊂ M def= { ∑_{i=0}^{n} λi xi : λi ≥ 0, 0 ≤ i ≤ n, ∑_{i=0}^{n} λi = 1 }.   (4.15)

Proof. Since Bη(x) is a neighborhood of x in ℝⁿ, choose x1, . . . , xn in Bη(x) such that the directions x1 − x, x2 − x, . . . , xn − x are linearly independent and such that

‖xi − x‖ = η/(2n), 1 ≤ i ≤ n.

Also choose
  x0 def= x − ∑_{i=1}^{n} (xi − x).   (4.16)
It is readily verified that ‖x0 − x‖ ≤ η/2 ⇒ x0 ∈ Bη(x).

Moreover, by definition of M,
  x = ∑_{i=0}^{n} (1/(n + 1)) xi ∈ M.
We want to show that

∃ε > 0 such that Bε(x) ⊂ M.   (4.17)


We proceed by contradiction. If (4.17) is not verified, then

∀m ≥ 1, ∃xm such that ‖xm − x‖ < 1/m and xm ∉ M.
Since, by construction, M is a closed convex set, we can apply the separation theorem (Theorem 7.10(ii) of Chapter 2) to separate xm and M:

∃pm, ‖pm‖ = 1, such that ∀y ∈ M, pm · y ≤ pm · xm.   (4.18)

As m goes to infinity, xm goes to x. But the sequence {pm} belongs to the compact sphere of radius one. So there exist a subsequence {p_{m_k}} of {pm} and a point p, ‖p‖ = 1, such that p_{m_k} → p as k → ∞. By going to the limit in inequality (4.18) for the subsequence as k goes to infinity, we get

∃p, ‖p‖ = 1, p · y ≤ p · x ∀y ∈ M.   (4.19)

In particular, for i, 1 ≤ i ≤ n, xi ∈ M, and

p · (xi − x) ≤ 0, 1 ≤ i ≤ n.   (4.20)
By definition of M and inequality (4.19),

p · ( ∑_{i=0}^{n} λi xi ) ≤ p · x, ∀λi ≥ 0, 0 ≤ i ≤ n, ∑_{i=0}^{n} λi = 1
  ⇒ p · ( ∑_{i=0}^{n} λi (xi − x) ) ≤ 0, ∀λi ≥ 0, 0 ≤ i ≤ n, ∑_{i=0}^{n} λi = 1.   (4.21)

By isolating x0,

p · ( ∑_{i=0}^{n} λi (xi − x) ) = p · (λ0 (x0 − x)) + p · ( ∑_{i=1}^{n} λi (xi − x) ),
and, by definition of x0,

p · ( ∑_{i=0}^{n} λi (xi − x) ) = p · ( −λ0 ∑_{i=1}^{n} (xi − x) ) + p · ( ∑_{i=1}^{n} λi (xi − x) )
  = p · ( ∑_{i=1}^{n} (λi − λ0)(xi − x) ) = p · ( ∑_{i=0}^{n} (λi − λ0)(xi − x) ).
Then (4.21) becomes

" n # n X X p · (λi − λ0)(xi − x) ≤ 0 ∀λi ≥ 0, 0 ≤ i ≤ n, λi = 1. (4.22) i=0 i=0 Given i, 1 ≤ i ≤ n, choose 1 + 1 / 2n 1 λ = > 0, λ = > 0, and λ = λ , j 6= i, 1 ≤ j ≤ n. 0 1 + n i 2(n + 1) j 0 It is readily checked that n X 1 + 2n 1 λ = nλ + λ = + = 1. j 0 i 2(1 + n) 2(1 + n) j=0 Copyright © 2019 Society for Industrial and Applied Mathematics From Introduction to Optimization and Hadamard Semidifferential Calculus - Delfour (9781611975956)


By substituting in (4.22), we get
  0 ≥ p · ((λi − λ0)(xi − x)) = −(1/(2n)) p · (xi − x) ⇒ p · (xi − x) ≥ 0.
Hence, from (4.20),

p · (xi − x) ≤ 0 and p · (xi − x) ≥ 0 ⇒ p · (xi − x) = 0.

Since the n vectors x1 − x, . . . , xn − x are linearly independent, any y ∈ ℝⁿ can be represented as a linear combination of those n vectors. Then

p · y = 0 ∀y ∈ ℝⁿ ⇒ p = 0,
which contradicts the fact that ‖p‖ = 1. Hence, assertion (4.17) is true.

The notion of a convex function on a convex set U naturally extends to sets U that are not necessarily convex.

Definition 4.2.
f : ℝⁿ → ℝ ∪ {+∞} is locally convex in U ⊂ ℝⁿ if f is convex in all convex subsets V of U. It is locally concave in U ⊂ ℝⁿ if −f is locally convex in U.

Theorem 4.8. A function f : ℝⁿ → ℝ locally convex on U ⊂ ℝⁿ is locally Lipschitzian and Hadamard semidifferentiable in the interior int U of U.

Proof. For a locally convex function on U and an interior point x ∈ U, there exists an open ball Br(x) ⊂ U on which f is convex. So it is sufficient to prove the theorem for a convex function in a convex U. If int U ≠ ∅, then for all x ∈ int U, there exists η > 0 such that Bη(x) ⊂ int U. By Lemma 4.1, there exist x0, x1, . . . , xn in Bη(x) and ε > 0 such that Bε(x) ⊂ M. So the set M is a neighborhood of x. To prove the continuity of f at x, we show that f is bounded above on M and apply Theorem 4.7 (equivalence of (i) and (v)). By Jensen's inequality (7.14) from Theorem 7.6(i) of Chapter 2, for a convex function f,

f( ∑_{i=0}^{n} λi xi ) ≤ ∑_{i=0}^{n} λi f(xi) ∀λi ≥ 0, 0 ≤ i ≤ n, ∑_{i=0}^{n} λi = 1.
Hence, ∀y ∈ M, f(y) ≤ max{f(xi) : 0 ≤ i ≤ n} < +∞ and f is bounded above on M. The continuity and the Hadamard semidifferentiability now follow from Theorem 4.7.
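The last step of the proof — the Jensen bound f(y) ≤ max{f(xi)} on the simplex M — can be checked numerically. The convex function and the vertices in the sketch below are arbitrary choices made for illustration.

```python
import random

# Numerical sketch of the final step of Theorem 4.8: by Jensen's inequality, a convex
# function f is bounded above on the simplex M = co{x0, ..., xn} by max_i f(xi).

def f(p):
    # a convex function on R^2 (sum of a convex quadratic and a convex absolute value)
    return (p[0] - 0.5)**2 + p[1]**2 + abs(p[0] + p[1])

verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # vertices x0, x1, x2 (n = 2)
bound = max(f(v) for v in verts)

random.seed(1)
for _ in range(1000):
    w = [random.random() for _ in verts]
    s = sum(w)
    lam = [wi/s for wi in w]  # lambda_i >= 0 with sum 1: a random point of M
    y = (sum(l*v[0] for l, v in zip(lam, verts)),
         sum(l*v[1] for l, v in zip(lam, verts)))
    assert f(y) <= bound + 1e-12
print("f(y) <= max_i f(x_i) on all sampled points of M; bound =", bound)
```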

4.3 I Lower Hadamard Semidifferential at a Boundary Point of the Domain

Section 4.2 provides a fairly complete account of the properties of a convex function f at interior points of its domain dom f. However, for the infimum problem, the minimizer may very well occur on a part of the boundary ∂(dom f) contained in dom f where the semidifferential dH f(x; v) does not exist (see Example 8.1 of Chapter 2, where the minimizer (0, 0) ∈ dom f ∩ ∂(dom f)).

When int (dom f) = ∅, the properties of convex functions f : ℝⁿ → ℝ ∪ {+∞} obtained in section 4.2 naturally extend from interior points of dom f in ℝⁿ to points of the relative interior ri (dom f) in aff (dom f) by considering the function f as a function on aff (dom f)


rather than on ℝⁿ. Recall, from section 7.3.4 in Chapter 2 (Definition 7.3 and Lemma 7.5), that we can associate with f the function

y ↦ fx(y) def= f(y + x) − f(x) : S(dom f) → ℝ   (4.23)
which is convex with domain dom fx = dom f − x, where S(dom f) = aff (dom f) − x is the linear subspace associated with aff (dom f). If dom f ≠ ∅ is not a singleton, ri (dom f) is a convex open subset of the affine subspace aff (dom f) and the previous results readily extend to that case. We single out a few properties.

Theorem 4.9. Let f : ℝⁿ → ℝ ∪ {+∞} be a convex function with dom f ≠ ∅ that is not a singleton.

(i) For each x ∈ ri (dom f),
  lim inf_{y→x} f(y) = lim_{y→x, y∈dom f} f(y) = f(x),   (4.24)

and the restriction f : dom f → ℝ is continuous in ri (dom f).

(ii) For each x ∈ ri (dom f), df(x; v) exists in all directions v ∈ S(dom f) and
  ∀v ∈ S(dom f), df(x; v) + df(x; −v) ≥ 0,   (4.25)
  ∀x, y ∈ ri (dom f), f(y) ≥ f(x) + df(x; y − x).   (4.26)
The function

v ↦ df(x; v) : S(dom f) → ℝ   (4.27)
is positively homogeneous, convex, subadditive, and Lipschitzian.

Even if this result is interesting, the minimizer can still occur on the boundary ∂(dom f) and not in the relative interior ri (dom f). In addition, it would be nice to avoid working with relative interiors and directions in the subspace S(dom f). To get around the fact that the nice semidifferential dH f(x; v) is slightly too restrictive since it does not exist at boundary points of dom f, we turn to the lower notion d̲H f(x; v), which is much better suited for our purposes since, as a liminf, it always exists at every point x ∈ dom f, where it can be finite or infinite. This can easily be seen by computing d̲H f((0, 0); (x, y)) for the function of Example 8.1 in Chapter 2 at the point (0, 0) ∈ dom f ∩ ∂(dom f), where dH f((0, 0); (x, y)) does not exist, but

d̲H f((0, 0); (x, y)) = 0 ∀(x, y) ∈ dom f.
The next theorem generalizes the important property (4.1) of Theorem 4.1.

Theorem 4.10. Let f : ℝⁿ → ℝ ∪ {+∞} be a convex function, dom f ≠ ∅.

(i) For each x ∈ dom f,
  ∀y ∈ ℝⁿ, d̲H f(x; y − x) ≤ f(y) − f(x)   (4.28)

except when dom f = {x} and y = x, for which d̲H f(x; 0) = +∞ > 0.

(ii) For each x ∈ ri (dom f) and all v ∈ S(dom f),
  d̲H f(x; v) = lim_{t↘0, w→v, w∈S(dom f)} [f(x + tw) − f(x)]/t   (4.29)
is finite.


Proof. (i) For y ∈ ℝⁿ, 0 < t < 1, and z ≠ y, by the convexity of f,

[f(x + t(z − x)) − f(x)]/t ≤ f(z) − f(x).

If y = x, ‖(x + t(z − x)) − x‖ = t ‖z − y‖ > 0, and if y ≠ x, ‖(x + t(z − x)) − x‖ ≥ t ‖y − x‖ − t ‖z − y‖, which becomes strictly positive as z → y. Therefore, as t → 0 and z → y, x + t(z − x) ≠ x and

d̲H f(x; y − x) = lim inf_{t↘0, z−x→y−x} [f(x + t(z − x)) − f(x)]/t ≤ lim inf_{z→y} f(z) − f(x).

By Lemma 7.3 of Chapter 2, lim inf_{z→y} f(z) ≤ f(y), except in the case where dom f = {x} and y = x.

(ii) From the previous section when f is restricted to aff (dom f).

4.4 I Semiconvex Functions and Hadamard Semidifferentiability

We complete this section on convexity by introducing semiconcave functions, which play a key role in the study of Hamilton–Jacobi equations and in optimal control problems (see, for instance, the book of P. CANNARSA and C. SINESTRARI [1]). As in the case of the notion of concavity, a function f is semiconvex if −f is semiconcave and vice versa. It turns out that semiconvex functions are Hadamard semidifferentiable. Moreover, the Hadamard semidifferential of a semiconvex function coincides with the generalized directional derivative (3.46) briefly introduced in section 3.5.

In its simplest version, a function f : ℝⁿ → ℝ is semiconvex on a convex subset U of ℝⁿ if there exists a constant c > 0 such that f(x) + c ‖x‖² is convex on U. This yields, for all x, y ∈ U and all λ ∈ [0, 1],

f(λx + (1 − λ)y) − λ f(x) − (1 − λ) f(y) ≤ λ(1 − λ) c ‖x − y‖².   (4.30)
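Inequality (4.30) is easy to test numerically. In the sketch below, f = sin is semiconvex on ℝ with c = 1/2, since sin x + (1/2)x² has second derivative 1 − sin x ≥ 0 and is therefore convex; this particular choice of f and c is an illustrative assumption, not taken from the text.

```python
import math, random

# Numerical check of inequality (4.30) on R for f(x) = sin(x), semiconvex with c = 1/2:
# sin(x) + 0.5*x**2 has second derivative 1 - sin(x) >= 0, hence is convex on R.

c = 0.5
f = math.sin

random.seed(2)
worst = -float("inf")
for _ in range(5000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    lam = random.random()
    lhs = f(lam*x + (1 - lam)*y) - lam*f(x) - (1 - lam)*f(y)
    rhs = lam*(1 - lam)*c*(x - y)**2
    worst = max(worst, lhs - rhs)
print("max of lhs - rhs over samples:", worst)  # nonpositive up to rounding
```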

If f is continuous and semiconvex in ℝⁿ, the function

fc(x) def= f(x) + c ‖x‖²   (4.31)

is convex and continuous in ℝⁿ. As a result, f(x) = fc(x) − c ‖x‖² is the difference of two convex continuous functions. In particular, f is Hadamard semidifferentiable at all points x and in all directions v:

dH f(x; v) = dH fc(x; v) − 2c x · v.

When U is not necessarily convex, we say that f : ℝⁿ → ℝ is locally semiconvex on U if there exists a constant c > 0 such that f(x) + c ‖x‖² is convex in all convex subsets V of U, as in Definition 4.2. Semiconcavity and local semiconcavity are obtained by replacing f by −f in the definitions of semiconvexity and local semiconvexity. As in the semiconvex case, a semiconcave function is also the difference of two convex functions.

The above definitions can be weakened in various ways to suit specific purposes. For instance, the square of the norm could be replaced by the norm: f is semiconvex on the convex set U if there exists a constant c > 0 such that f(x) + c ‖x‖ is convex on U. We obtain an inequality

similar to (4.30):
  f(λx + (1 − λ)y) − λ f(x) − (1 − λ) f(y)
   ≤ c [λ‖x‖ + (1 − λ)‖y‖ − ‖λx + (1 − λ)y‖]
   ≤ c [λ (‖x‖ − ‖λx + (1 − λ)y‖) + (1 − λ)(‖y‖ − ‖λx + (1 − λ)y‖)]
   ≤ c [λ |‖x‖ − ‖λx + (1 − λ)y‖| + (1 − λ) |‖y‖ − ‖λx + (1 − λ)y‖|]
   ≤ 2λ(1 − λ) c ‖x − y‖.

If f is continuous and semiconvex in ℝⁿ, then the function
  fc(x) def= f(x) + c ‖x‖   (4.32)
is convex and continuous in ℝⁿ. As a result, f(x) = fc(x) − c ‖x‖ is again the difference of two convex continuous functions. In particular, f is Hadamard semidifferentiable at all points x and in all directions v,

dH f(x; v) = dH fc(x; v) − dH n(x; v), where n(x) def= ‖x‖,
since the norm is Hadamard semidifferentiable.

The definition of a semiconcave function, like that of a semiconvex function, has been generalized in the literature as follows (see, for instance, P. CANNARSA and C. SINESTRARI [1]).

Definition 4.3.
Let f : ℝⁿ → ℝ and U ⊂ ℝⁿ.¹¹⁷

(i) f is semiconvex on U if there exists a nondecreasing function ω : ℝ₊ → ℝ₊ such that ω(ρ) → 0 as ρ → 0 and
  f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y) + λ(1 − λ) ω(‖x − y‖) ‖x − y‖   (4.34)
for all pairs x, y ∈ U such that the segment [x, y] is contained in U.

(ii) f is semiconcave on U if −f is semiconvex on U.

Locally convex (resp., concave) functions are a special case of semiconvex (resp., semiconcave) functions for which ω = 0, and the next theorem is a generalization of Theorem 4.8.

Theorem 4.11. Let U ⊂ ℝⁿ, int U ≠ ∅, and f : ℝⁿ → ℝ be semiconvex on U.

(i) For each x ∈ int U and all v ∈ ℝⁿ, df(x; v) exists and is finite, and df(x; v) + df(x; −v) ≥ 0.

(ii) For each x ∈ int U there exists r > 0 such that f is Lipschitzian in Br(x); for all v ∈ ℝⁿ, dH f(x; v) exists; and
  ∀v, w ∈ ℝⁿ, |dH f(x; v) − dH f(x; w)| ≤ c(x, r) ‖v − w‖,   (4.35)

where c(x, r) is the Lipschitz constant of f on Br(x).

¹¹⁷The additional condition that ω : ℝ₊ → ℝ₊ be upper semicontinuous in the definition of P. CANNARSA and C. SINESTRARI [1] has been dropped since it is redundant. If, for some reason, this property becomes necessary, simply replace ω by its usc regularization
  cl_usc ω(x) def= inf { g(x) : g usc and ω ≤ g on ℝ₊ }   (4.33)
(see Definition 4.4(ii) of Chapter 2). It retains the nondecreasing property, the continuity at 0, and the semiconvexity property (4.34) of ω since, by definition, ω ≤ cl_usc ω.


(iii) For each x ∈ int U and all v ∈ ℝⁿ, dC f(x; v) = dH f(x; v).

(iv) Let x ∈ U ∩ ∂U. If there exists y ∈ U, y ≠ x, such that [x, y] ⊂ U, then

lim inf_{z→x} f(z) ≤ f(x).   (4.36)

For all y ∈ U, y ≠ x, such that [x, y] ⊂ U, we have

dH f(x; y − x) ≤ f(y) − f(x) + ω(‖y − x‖) ‖y − x‖.   (4.37)

(v) (Chain rules) If g : ℝᵐ → ℝⁿ, where m ≥ 1 is an integer, is such that dH g(y; w) exists and g(y) ∈ int U, then

dH (f ◦ g)(y; w) = dH f(g(y); dH g(y; w)); (4.38)

if x ∈ int U and h : ℝ → ℝᵐ is such that dH h(f(x); dH f(x; v)) exists, then

dH (h ◦ f)(x; v) = dH h(f(x); dH f(x; v)). (4.39)

Corollary 1. If the assumption of semiconvexity in Theorem 4.11 is replaced by the semiconcavity of f in U, the conclusions remain the same except in (iii) where

dH f(x; v) = dC f(x; v) = lim inf_{t↘0, y→x} [f(y + tv) − f(y)]/t   (4.40)
and in (iv), where (4.36) and (4.37) are replaced by

lim sup_{z→x} f(z) ≥ f(x),   (4.41)

dH f(x; y − x) ≥ f(y) − f(x) − ω(‖y − x‖) ‖y − x‖.   (4.42)

Proof. Since x ∈ int U, there exist R > 0 and a closed cube Q of diameter L centered at x such that B2R(x) ⊂ Q ⊂ U.

(i) Given v ∈ ℝⁿ, there exists θ0, 0 < θ0 < 1, such that x + θv ∈ B2R(x), 0 < θ ≤ θ0, and there exists α0, 0 < α0 < 1, such that x − αv ∈ B2R(x), 0 < α ≤ α0. Hence, x can be written as a convex combination:
  x = (θ/(α + θ))(x − αv) + (α/(α + θ))(x + θv)
  ⇒ f(x) − (θ/(α + θ)) f(x − αv) − (α/(α + θ)) f(x + θv)
    ≤ (θ/(α + θ))(α/(α + θ)) ω((α + θ)‖v‖)(α + θ)‖v‖ = θα (ω((α + θ)‖v‖)/(α + θ)) ‖v‖.

This last inequality can be rewritten as

[f(x) − f(x − αv)]/α ≤ [f(x + θv) − f(x)]/θ + ω((α + θ)‖v‖) ‖v‖
  ⇒ [f(x) − f(x − αv)]/α ≤ lim inf_{θ↘0} [f(x + θv) − f(x)]/θ + ω(2α ‖v‖) ‖v‖,   (4.43)

since ω is nondecreasing and, as θ → 0, we have 0 < θ < α for θ small. So the liminf is bounded below.

For all θ1 and θ2, 0 < θ1 < θ2 < θ0, using the definition of a semiconvex function,
  f(x + θ1 v) − f(x) = f((θ1/θ2)(x + θ2 v) + (1 − θ1/θ2) x) − f(x)
    ≤ (θ1/θ2) f(x + θ2 v) + (1 − θ1/θ2) f(x) − f(x) + (θ1/θ2)(1 − θ1/θ2) ω(θ2‖v‖) θ2‖v‖
    ≤ θ1 [f(x + θ2 v) − f(x)]/θ2 + θ1 (1 − θ1/θ2) ω(θ2‖v‖) ‖v‖,
and, dividing both sides by θ1,
  [f(x + θ1 v) − f(x)]/θ1 ≤ [f(x + θ2 v) − f(x)]/θ2 + (1 − θ1/θ2) ω(θ2‖v‖) ‖v‖

⇒ lim sup_{θ1↘0} [f(x + θ1 v) − f(x)]/θ1 ≤ [f(x + θ2 v) − f(x)]/θ2 + ω(θ2‖v‖) ‖v‖,   (4.44)
and the liminf and limsup are bounded above. As the liminf was bounded below, both are bounded. Now, take the liminf in (4.44) as θ2 → 0:
  lim sup_{θ1↘0} [f(x + θ1 v) − f(x)]/θ1 ≤ lim inf_{θ2↘0} [f(x + θ2 v) − f(x)]/θ2 + 0,
since, by assumption, ω(θ2‖v‖) → 0 as θ2 → 0. Therefore, df(x; v) exists and is finite for all v ∈ ℝⁿ. Finally, going back to (4.43), take the limit as α → 0:
  lim_{α↘0} [f(x) − f(x − αv)]/α ≤ lim_{θ↘0} [f(x + θv) − f(x)]/θ + lim_{α↘0} ω(2α ‖v‖) ‖v‖
  ⇒ −df(x; −v) ≤ df(x; v) + 0 ⇒ df(x; −v) + df(x; v) ≥ 0,
since, by assumption, ω(2α‖v‖) → 0 as α → 0.

(ii) We first prove that f is bounded above and below in BR(x). For the upper bound, denote by x1, x2, . . . , x_{2^n} the 2ⁿ vertices of Q. Let

M0 def= max{f(xi) : 1 ≤ i ≤ 2ⁿ}.

For two consecutive vertices xi and xj of Q, using the semiconvexity inequality (4.34),

f(λxi + (1 − λ)xj) ≤ λ f(xi) + (1 − λ) f(xj) + λ(1 − λ) ω(‖xi − xj‖) ‖xi − xj‖ ≤ M0 + ω(L) L/4.   (4.45)
This shows that f is bounded above on the one-dimensional faces of Q. Repeat the procedure by taking any convex combination of two points lying on different one-dimensional faces to obtain f(z) ≤ M0 + ω(L)L/4 + ω(L)L/4 = M0 + ω(L)L/2. Iterating this procedure n times, we get the existence of a constant M such that f(z) ≤ M for all z ∈ Q. For the lower bound, since B2R(x) ⊂ Q, f ≤ M in B2R(x). For any z ∈ B2R(x),
  x = (‖z − x‖/(2R + ‖z − x‖)) (x − 2R (z − x)/‖z − x‖) + (2R/(2R + ‖z − x‖)) z.


Using the semiconvexity inequality (4.34),
  f(x) − (‖z − x‖/(2R + ‖z − x‖)) f(x − 2R (z − x)/‖z − x‖) − (2R/(2R + ‖z − x‖)) f(z)
    ≤ (‖z − x‖/(2R + ‖z − x‖)) (2R/(2R + ‖z − x‖)) ω(2R + ‖z − x‖) (2R + ‖z − x‖).

After rearranging the terms, for any z ∈ B2R(x),
  f(z) ≥ ((2R + ‖z − x‖)/(2R)) f(x) − (‖z − x‖/(2R)) f(x − 2R (z − x)/‖z − x‖) − ‖z − x‖ ω(2R + ‖z − x‖)
    ≥ −2|f(x)| − |M| − 2R ω(4R).
So f is bounded below by m = −2|f(x)| − |M| − 2R ω(4R) and bounded above by M in B2R(x).

Now pick any two distinct points y and z in BR(x). There exist y′ and z′ on the line going through y and z such that ‖y′ − x‖ = 2R, ‖z′ − x‖ = 2R, and the four points appear in the order y′, y, z, and z′ on that line:
  ‖y′ − z‖ = ‖y′ − y‖ + ‖y − z‖ and ‖y − z′‖ = ‖y − z‖ + ‖z − z′‖
  ⇒ ‖y′ − y‖ ≥ R and ‖z − z′‖ ≥ R.
This means that y is a convex combination of y′ and z, and z is a convex combination of y and z′:
  y = (‖y − z‖/(‖y′ − y‖ + ‖y − z‖)) y′ + (‖y′ − y‖/(‖y′ − y‖ + ‖y − z‖)) z,
  z = (‖z − z′‖/(‖y − z‖ + ‖z − z′‖)) y + (‖y − z‖/(‖y − z‖ + ‖z − z′‖)) z′.
Using the semiconvexity inequality (4.34) associated with the first convex combination,
  f(y) − (‖y − z‖/(‖y′ − y‖ + ‖y − z‖)) f(y′) − (‖y′ − y‖/(‖y′ − y‖ + ‖y − z‖)) f(z)
    ≤ (‖y − z‖/(‖y′ − y‖ + ‖y − z‖)) (‖y′ − y‖/(‖y′ − y‖ + ‖y − z‖)) ω(‖y′ − z‖) ‖y′ − z‖
    ≤ (‖y − z‖ ‖y′ − y‖/(‖y′ − y‖ + ‖y − z‖)) ω(‖y′ − z‖)
  ⇒ (‖y − z‖/(‖y′ − y‖ + ‖y − z‖)) (f(y) − f(y′)) + (‖y′ − y‖/(‖y′ − y‖ + ‖y − z‖)) (f(y) − f(z))
    ≤ (‖y − z‖ ‖y′ − y‖/(‖y′ − y‖ + ‖y − z‖)) ω(‖y′ − z‖)
  ⇒ (f(y) − f(y′))/‖y′ − y‖ + (f(y) − f(z))/‖y − z‖ ≤ ω(‖y′ − z‖)
  ⇒ (f(y) − f(z))/‖y − z‖ ≤ ω(‖y′ − z‖) − (f(y) − f(y′))/‖y′ − y‖ ≤ ω(3R) + (M − m)/R.
Similarly, for the second convex combination,
  (f(z) − f(y))/‖y − z‖ + (f(z) − f(z′))/‖z − z′‖ ≤ ω(‖y − z′‖)
  ⇒ (f(y) − f(z))/‖y − z‖ ≥ −ω(‖y − z′‖) + (f(z) − f(z′))/‖z − z′‖ ≥ −ω(3R) − (M − m)/R.


Finally, for all y, z ∈ BR(x),
  |f(y) − f(z)| ≤ (ω(3R) + (M − m)/R) ‖y − z‖.

The existence of dH f(x; v) and the identity dH f(x; v) = df(x; v) are now a consequence of part (i) and Theorem 3.6 for a Lipschitzian function.

(iii) From part (ii), dH f(x; v) exists and, since f is locally Lipschitzian in BR(x), dC f(x; v) exists. By definition,
  dC f(x; v) = lim sup_{t↘0, y→x} [f(y + tv) − f(y)]/t ≥ lim sup_{t↘0} [f(x + tv) − f(x)]/t = df(x; v).
It remains to prove the inequality in the other direction. Given v ≠ 0, choose ρ > 0 such that ρ < min{R/(1 + ‖v‖), 1}, so that for all θ, 0 < θ ≤ ρ, and y ∈ Bρ(x),

‖y + θv − x‖ ≤ ‖y − x‖ + θ‖v‖ < ρ + ρ ‖v‖ < R ⇒ y + θv ∈ BR(x).
By the same technique as in the proof of part (i), for all θ, 0 < θ ≤ ρ < 1,
  [f(y + θv) − f(y)]/θ ≤ [f(y + ρv) − f(y)]/ρ + (1 − θ/ρ) ω(ρ‖v‖) ‖v‖
    ≤ [f(y + ρv) − f(y)]/ρ + ω(ρ‖v‖) ‖v‖.

By Lipschitz continuity in BR(x),
  [f(x + ρv) − f(x)]/ρ ≥ [f(y + ρv) − f(y)]/ρ − (2c/ρ) ‖y − x‖,
where c is the Lipschitz constant of f in BR(x). Combining the above two inequalities, for all y ∈ Bρ(x) and 0 < θ ≤ ρ,
  [f(x + ρv) − f(x)]/ρ ≥ [f(y + θv) − f(y)]/θ − ω(ρ‖v‖) ‖v‖ − (2c/ρ) ‖y − x‖.
Choose an arbitrary ε such that 0 < ε < 2c. Then, for any δ, 0 < δ < ερ/(2c), we have δ < ρ and
  [f(x + ρv) − f(x)]/ρ ≥ sup_{‖y−x‖<δ, 0<θ<δ} [f(y + θv) − f(y)]/θ − ω(ρ‖v‖) ‖v‖ − ε
  ⇒ [f(x + ρv) − f(x)]/ρ ≥ lim sup_{θ↘0, y→x} [f(y + θv) − f(y)]/θ − ω(ρ‖v‖) ‖v‖ − ε.
Since both limits exist, by letting ε and then ρ go to zero,
  df(x; v) = lim_{ρ↘0} [f(x + ρv) − f(x)]/ρ ≥ lim sup_{δ↘0, y→x} [f(y + δv) − f(y)]/δ = dC f(x; v).

(iv) From the semiconvexity property, for 0 < t < 1,
  f(x + t(y − x)) − f(x) ≤ t [f(y) − f(x) + (1 − t) ω(‖y − x‖) ‖y − x‖],
  lim inf_{z→x} f(z) − f(x) ≤ lim inf_{t↘0} f(x + t(y − x)) − f(x)
    ≤ lim inf_{t↘0} t [f(y) − f(x) + (1 − t) ω(‖y − x‖) ‖y − x‖] = 0.


Again from the semiconvexity property, for 0 < t < 1,
  [f(x + t(y − x)) − f(x)]/t ≤ f(y) − f(x) + (1 − t) ω(‖y − x‖) ‖y − x‖,
  dH f(x; y − x) ≤ lim inf_{t↘0} [f(x + t(y − x)) − f(x)]/t
    ≤ lim inf_{t↘0} [f(y) − f(x) + (1 − t) ω(‖y − x‖) ‖y − x‖]
    = f(y) − f(x) + ω(‖y − x‖) ‖y − x‖.

(v) From Theorem 3.5(ii).

Remark 4.2. The proofs of (i), (ii), and the last part of (iii) follow those of P. CANNARSA and C. SINESTRARI [1, Thm. 3.2.1, p. 55, and Thm. 2.1.7, p. 33]. Part (iv) applies to the pre- and postcompositions with f under the assumptions of P. CANNARSA and C. SINESTRARI [1, Thm. 2.1.12, p. 37].

5 I Semidifferential of a Parametrized Extremum

5.1 Preliminaries

This section deals with the important issue of the semidifferentiability of an infimum (lower envelope) or a supremum (upper envelope) that depends on a set of parameters. We have already seen the rule for the lower and upper envelopes of a finite family of functions in Theorem 3.4(ii). We now generalize it to an infinite family at the price of a few assumptions.

In 1966, J. M. DANSKIN [1] was interested in a target defense problem and its relationship to the theory of Max Min. This was a special case of a resource allocation problem in economics or operations research. It could also be seen in the context of a multistep two-person (JX and JY) game where there is no equilibrium or saddle point (Min Max ≠ Max Min). Given a strategy x ∈ X of player JX, player JY chooses a strategy y ∈ Y that minimizes a utility function G(x, y),
  min_{y∈Y} G(x, y).

In a second step, player JX maximizes the result,
  max_{x∈X} min_{y∈Y} G(x, y).
From a dynamical perspective, the steps can be repeated: Min Max Min (three-step game), Max Min Max Min (four-step game), etc. So, this leads us in the first step to consider functions of the form
  g(x) def= min_{y∈Y} G(x, y) or h(x) def= max_{y∈Y} G(x, y).   (5.1)
Danskin gives several simple examples in which the function g is not differentiable even if G is very smooth. This type of nondifferentiability is closely related to the fact that the set of minimizers Y(x) is not a singleton, as illustrated in his example of the seesaw problem.

Example 5.1 (the seesaw problem of J. M. DANSKIN [1, p. 643]). Consider the function
  G(x, y) def= y sin x, Y def= {y ∈ ℝ : |y| ≤ 1}, X def= {x ∈ ℝ : |x| ≤ π/2}.   (5.2)


Figure 3.9. The seesaw problem of Danskin at point (x, y) ∈ [−π/2, π/2] × [−1, 1].

Player JX chooses the angle x ∈ X (see Figure 3.9) of the seesaw, player JY chooses any point y between the extremities −1 and +1. The function G(x, y) measures the height of the point (x, y) with respect to the level of the fulcrum of the seesaw. It is readily seen that

g(x) = min_{|y|≤1} (y sin x) = −|sin x|, max_{|x|≤π/2} min_{|y|≤1} (y sin x) = max_{|x|≤π/2} g(x) = 0,
and that the function g(x) is not differentiable at x = 0, where the maximum of g(x) occurs. It is neither convex nor concave. Yet the function g is Hadamard semidifferentiable and, by using the chain rule,
  dH g(x; v) = −(sin x cos x/|sin x|) v if x ≠ 0, and dH g(x; v) = −|v| if x = 0.   (5.3)

The nondifferentiability at x = 0 arises from the fact that the set Y (x) of minimizers of G(x, y) is not a singleton at x = 0,

Y(x) def= {y ∈ Y : G(x, y) = g(x)} = { −sin x/|sin x| } if x ≠ 0, and Y(0) = {y : |y| ≤ 1}.   (5.4)

Indeed, Danskin shows that
  dg(x; v) = min_{y∈Y(x)} (∂G/∂x)(x, y) v = min_{y∈Y(x)} ((y cos x) v),   (5.5)
thus generalizing the formula for the case where Y contains only a finite number of points. It will be shown later that, under the assumptions of Danskin, we not only get the existence of dg(x; v) but also that of dH g(x; v). So we have a complete description of the nondifferentiability at x = 0, which once again confirms the pertinence of the Hadamard semidifferential for this important family of functions.
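Danskin's formula (5.5) can be compared against one-sided difference quotients of g. The sketch below does so for the seesaw problem; the explicit description of Y(x) from (5.4) is built into the code (at x = 0 only the extreme points ±1 of Y(0) matter, since G is linear in y).

```python
import math

def g(x):
    # g(x) = min over |y| <= 1 of y*sin(x) = -|sin x|
    return -abs(math.sin(x))

def danskin_dg(x, v):
    # Danskin's formula (5.5): dg(x; v) = min over y in Y(x) of (y cos x) v
    if math.sin(x) != 0.0:
        y_set = [-math.sin(x)/abs(math.sin(x))]   # Y(x) is a singleton, by (5.4)
    else:
        y_set = [-1.0, 1.0]                       # extreme points of Y(0) = [-1, 1]
    return min(y*math.cos(x)*v for y in y_set)

# compare with the one-sided difference quotient (g(x + t v) - g(x))/t, small t > 0
t = 1e-7
for x in (0.0, 0.7, -0.4):
    for v in (1.0, -1.0, 2.5):
        dq = (g(x + t*v) - g(x))/t
        print(x, v, round(dq, 5), round(danskin_dg(x, v), 5))
```

At x = 0 the formula returns −|v|, matching (5.3), even though g has no derivative there.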

Another simple example is the support function of a nonempty compact set.


Example 5.2 (the support function of Example 8.8 of Chapter 2). Associate with a nonempty compact Y ⊂ ℝⁿ the support function

σY(x) def= sup_{y∈Y} x · y, Y(x) = {y ∈ Y : x · y = σY(x)}.

Since Y is compact, the supremum is finite and dom σY = ℝⁿ. Since σY is the upper envelope of the family x ↦ x · y, y ∈ Y, of convex functions, it is convex in ℝⁿ (Theorem 7.6 of Chapter 2). Therefore, it is continuous and Hadamard semidifferentiable on ℝⁿ by Theorem 4.8. It will be shown later in Theorem 2.4 (for A = 0) that its Hadamard semidifferential is

dH σY(x; v) = sup_{y∈Y(x)} v · y.

In particular, for Y the closed unit ball, σY(x) = ‖x‖, Y(x) = {x/‖x‖} if x ≠ 0, and Y(0) is the whole closed unit ball.
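For a finite set Y, both the support function and the formula dH σY(x; v) = sup over Y(x) of v · y can be evaluated directly. In the sketch below, Y is an arbitrary four-point set chosen so that Y(x) is not a singleton at the chosen x.

```python
# Support function of a finite compact set Y in R^2 and its semidifferential:
# a numerical sketch of Example 5.2 (the point set Y is made up for illustration).

def sigma(x, Y):
    # sigma_Y(x) = sup over y in Y of x . y
    return max(x[0]*y[0] + x[1]*y[1] for y in Y)

def d_sigma(x, v, Y, tol=1e-12):
    # dH sigma_Y(x; v) = sup over the maximizers Y(x) of v . y
    s = sigma(x, Y)
    Yx = [y for y in Y if abs(x[0]*y[0] + x[1]*y[1] - s) <= tol]
    return max(v[0]*y[0] + v[1]*y[1] for y in Yx)

Y = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]  # vertices of a square
x = (1.0, 1.0)   # both (1, 0) and (0, 1) are maximizers: Y(x) is not a singleton
t = 1e-6
for v in ((1.0, 0.0), (-1.0, 2.0), (0.3, -0.7)):
    dq = (sigma((x[0] + t*v[0], x[1] + t*v[1]), Y) - sigma(x, Y))/t
    print(v, dq, d_sigma(x, v, Y))
```

At this x the semidifferential is v ↦ max(v1, v2), which is not linear in v: σY is not differentiable at x, yet the one-sided quotients agree with the formula in every direction.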

Another problem amenable to that formulation is the optimal shape of a column, formulated by J. L. LAGRANGE [1] in 1770 and later studied by T. CLAUSEN [1] in 1849. It consists in finding the best profile of a vertical column to prevent buckling. It is one of the very early optimal design problems. Since Lagrange, many authors have proposed solutions, but a complete theoretical and numerical solution was only given in 1992 by S. J. COX and M. L. OVERTON [1] using the generalized gradient.¹¹⁸ Once the equations of the problem are discretized, the remaining problem is to maximize the least eigenvalue of a symmetric matrix with respect to the parameters that define the geometry. This problem can also be tackled by using the Hadamard semidifferential (see M. C. DELFOUR and J.-P. ZOLÉSIO [2]). The theorems and explicit formulas given in this section can be directly applied to get the Hadamard semidifferential of the discretized version of the Euler buckling load, which is nothing but an eigenvalue depending on the shape parameters of the column. The specialization of Theorem 5.2 of this section to the semidifferential of the least and greatest eigenvalues of a parametrized symmetric matrix will be given in section 2.2 of Chapter 4.

5.2 Infimum with Respect to One Parameter

To compute dg(x; v) of the parametrized infimum g(x) = inf_{y∈Y} G(x, y), it is sufficient to consider the functions

t ↦ G_{x,v}(t, y) = G(x + tv, y) and g_{x,v}(t) = inf_{y∈Y} G(x + tv, y),

where g_{x,v}(t) depends on a single real parameter t ≥ 0. In view of the above remark, it is natural to first study an infimum that depends on a single parameter t ≥ 0. Given a function

G : [0, τ] × Y → ℝ,   τ > 0 and Y ⊂ ℝⁿ,   (5.6)

associate with each t ∈ [0, τ] the infimum

g(t) def= inf_{y∈Y} G(t, y) and Y(t) def= {y ∈ Y : G(t, y) = g(t)}.   (5.7)

118New existence results for this problem have been reported by YU. V. EGOROV [1, 2, 3], M. C. DELFOUR and F. HUOT-CHANTAL [1], and F. HUOT-CHANTAL [1]. Other problems related to columns have been revisited in a series of papers by S. J. COX [1, 2] and S. J. COX and C. M. MCCARTHY [1].

5. Semidifferential of a Parametrized Extremum

In a first step, we determine the assumptions for the existence of the right-hand side derivative

dg(0; +1) def= lim_{t↘0} (g(t) − g(0))/t   (5.8)

and find its exact expression as a function of G and its right-hand side derivative with respect to t. Then, combining this result with an additional property such as the Lipschitz continuity of the infimum g(x) with respect to the parameters x, we can establish the existence of the stronger Hadamard semidifferential d_H g(x; v).

5.2.1 Existence and Expression of the Right-hand Side Derivative

When Y(t) = {y_t} is a singleton, 0 ≤ t ≤ τ, and the right-hand side derivative of y_t,

ẏ = lim_{t↘0} (y_t − y_0)/t,   (5.9)

is available, it is easy to obtain dg(x; v) under some conditions on the differentiability of the function G(t, y) with respect to t and y. But when ẏ is not easily available or when the sets Y(t) are not singletons, this direct approach fails or becomes very intricate.

We now give a theorem that provides the exact expression of dg(0; +1). Its originality is that the differentiability of y_t is replaced by an assumption of sequential semicontinuity of the multivalued function Y(t) and assumptions on the semicontinuity of the right-hand side derivative of the function G(t, y) with respect to the parameter t. In other words, this approach does not require knowledge of the derivative ẏ with respect to t of the minimizer y_t.

Theorem 5.1. Let Y be a nonempty subset of ℝⁿ, τ > 0, and G : [0, τ] × Y → ℝ. Assume that the following conditions are verified:

(H1) for all t, 0 ≤ t ≤ τ, Y(t) ≠ ∅;

(H2) for all y_0 ∈ Y(0), dG(0, y_0; +1, 0) exists;

(H3) for all sequences {t_n} ⊂ ]0, τ[, t_n → 0, there exist a subsequence {t_{n_k}} of {t_n}, y⁰ ∈ Y(0), and a sequence {y_{n_k}}, y_{n_k} ∈ Y(t_{n_k}), such that

liminf_{k→∞} (G(t_{n_k}, y_{n_k}) − G(0, y_{n_k}))/t_{n_k} ≥ dG(0, y⁰; +1, 0).

Then, there exists y⁰ ∈ Y(0) such that

dg(0; +1) = lim_{t↘0} (g(t) − g(0))/t = inf_{y∈Y(0)} dG(0, y; +1, 0) = dG(0, y⁰; +1, 0).   (5.10)

Remark 5.1. Assumption (H3) can be relaxed to the existence of a y⁰ ∈ Y, but at the cost of the additional condition that y_{n_k} → y⁰ and that the function G : [0, τ] × Y → ℝ be lsc (resp., y ↦ G(t, y) : Y → ℝ be lsc for all t when Y is compact, in which case (H1) is verified).

Proof. Consider the lower and upper limits of the differential quotient

d̲g(0) = liminf_{t↘0} (g(t) − g(0))/t,   d̄g(0) = limsup_{t↘0} (g(t) − g(0))/t.


Fix y_0 ∈ Y(0) and y_t ∈ Y(t). Then, by definition,

G(t, y_t) = g(t) ≤ G(t, y_0),   −G(0, y_t) ≤ −g(0) = −G(0, y_0).

Add these two inequalities and divide by t > 0 to get

(G(t, y_t) − G(0, y_t))/t ≤ (g(t) − g(0))/t ≤ (G(t, y_0) − G(0, y_0))/t.   (5.11)

(i) Since dG(0, y_0; +1, 0) exists, by taking the limsup of the last two inequalities of (5.11) as t ↘ 0, we get

d̄g(0) ≤ dG(0, y_0; +1, 0) ⇒ d̄g(0) ≤ inf_{y_0∈Y(0)} dG(0, y_0; +1, 0).

(ii) For the liminf, there exists a sequence t_n ↘ 0 such that

(g(t_n) − g(0))/t_n → d̲g(0).

From the first part of the inequalities (5.11),

(G(t_n, y_n) − G(0, y_n))/t_n ≤ (g(t_n) − g(0))/t_n,

since G(0, y_n) ≥ g(0). By assumption (H3), there exist a subsequence {t_{n_k}} of {t_n}, y⁰ ∈ Y(0), and a sequence {y_{n_k}}, y_{n_k} ∈ Y(t_{n_k}), such that

(G(t_{n_k}, y_{n_k}) − G(0, y_{n_k}))/t_{n_k} ≤ (g(t_{n_k}) − g(0))/t_{n_k},

d̲g(0) ≥ liminf_{k→∞} (G(t_{n_k}, y_{n_k}) − G(0, y_{n_k}))/t_{n_k} ≥ dG(0, y⁰; +1, 0) ≥ inf_{y_0∈Y(0)} dG(0, y_0; +1, 0).

Finally, there exists y⁰ ∈ Y(0) such that

inf_{y_0∈Y(0)} dG(0, y_0; +1, 0) ≥ d̄g(0) ≥ d̲g(0) ≥ dG(0, y⁰; +1, 0) ≥ inf_{y_0∈Y(0)} dG(0, y_0; +1, 0)

⇒ dg(0; +1) = inf_{y_0∈Y(0)} dG(0, y_0; +1, 0) = dG(0, y⁰; +1, 0).

Remark 5.2. This theorem extends to spaces of infinite dimension. In particular, the last part of property (5.10) extends a former result of B. LEMAIRE [1, Thm. 2.1, p. 38] in 1970, where sequential compactness of the set X was assumed. It also completes and extends Theorem 1 of the thesis of J.-P. ZOLÉSIO [1] in 1979 and the specific application to the semidifferentiability of the least eigenvalue presented at the NATO Advanced Institute by J.-P. ZOLÉSIO [2] in 1980. For a similar theorem in the framework of the generalized gradient, the reader is referred to F. H. CLARKE [2, sect. 2.8, pp. 85–95] in 1983.
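The point of Theorem 5.1 is visible numerically even in a toy case where Y(0) is not a singleton. A hedged sketch (my own example, not from the book): take G(t, y) = t·y on Y = [−1, 1], so that at t = 0 every y is a minimizer, Y(0) = Y, and (5.10) predicts dg(0; +1) = inf_{y∈Y(0)} dG(0, y; +1, 0) = inf_{y∈[−1,1]} y = −1.

```python
import numpy as np

# G(t, y) = t*y on a discretized Y = [-1, 1]; at t = 0, G(0, .) is constant,
# so the minimizing set Y(0) is all of Y.
Y = np.linspace(-1.0, 1.0, 2001)

def G(t, y):
    return t * y

def g(t):
    return np.min(G(t, Y))

# One-sided difference quotient of g at 0 versus the formula (5.10)
t = 1e-6
dg_numeric = (g(t) - g(0.0)) / t   # (min_y t*y - 0)/t
dg_formula = np.min(Y)             # inf over Y(0) = Y of dG(0, y; +1, 0) = y
```

Here g(t) = −t for t ≥ 0, so the derivative is −1 even though no minimizer derivative ẏ exists at t = 0, exactly the situation the theorem is designed for.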

5.2.2 Infimum of a Parametrized Quadratic Function

Given an n × n symmetric matrix A, a vector a ∈ ℝⁿ, and a nonempty subset Y of ℝⁿ, consider the infimum parametrized by A and a,

f(A, a) def= inf_{y∈Y} Ay · y + a · y,   (5.12)

and the set of minimizers

Y(0) def= {y ∈ Y : Ay · y + a · y = f(A, a)}.   (5.13)

Let B be another n × n symmetric matrix, b ∈ ℝⁿ, and t ≥ 0. Consider the perturbed problems

f(A + tB, a + tb) = inf_{y∈Y} [A + tB]y · y + [a + tb] · y   (5.14)

and their sets of minimizers

Y(t) def= {y ∈ Y : [A + tB]y · y + [a + tb] · y = f(A + tB, a + tb)}.   (5.15)

In a first step, we show that (A, a) ↦ f(A, a) is concave and Lipschitzian and conclude that f is Hadamard semidifferentiable. In a second step, we use Theorem 5.1 to find the expression of the semidifferential

df(A, a; B, b) def= lim_{t↘0} (f(A + tB, a + tb) − f(A, a))/t.   (5.16)

Theorem 5.2. Let Sym_n be the vector space of all symmetric n × n matrices endowed with the norm

‖A‖ def= sup_{y≠0} ‖Ay‖/‖y‖.

Assume that Y ⊂ ℝⁿ is compact and nonempty and that f is given by expression (5.12).

(i) The function (A, a) ↦ f(A, a) : Sym_n × ℝⁿ → ℝ is concave and Lipschitzian:

∀A, B ∈ Sym_n, ∀a, b ∈ ℝⁿ, |f(B, b) − f(A, a)| ≤ c²‖B − A‖ + c‖b − a‖,

where c is a bound on the norms of the elements of the compact set Y.

(ii) For all A, B ∈ Sym_n and a, b ∈ ℝⁿ,

d_H f(A, a; B, b) def= lim_{t↘0, (W,w)→(B,b)} (f(A + tW, a + tw) − f(A, a))/t
                    = min_{y∈Y(A,a)} By · y + b · y = min_{y∈Y(A,a)} y ⊗ y ·· B + b · y,   (5.17)

where Y(A, a) def= {y ∈ Y : Ay · y + a · y = f(A, a)},

(a ⊗ b)_{ij} def= a_i b_j   (5.18)

is the product of the vectors a and b, and

A ·· B def= Σ_{1≤i,j≤n} A_{ij} B_{ij}   (5.19)

is the double inner product of the two matrices A and B. (Note that y ⊗ y is a positive semidefinite matrix with trace equal to ‖y‖².)

(iii) Let p⃗ ↦ A(p⃗) : ℝᵏ → Sym_n and p⃗ ↦ a(p⃗) : ℝᵏ → ℝⁿ be Hadamard semidifferentiable mappings. Consider the function

p⃗ ↦ ℓ(p⃗) def= f(A(p⃗), a(p⃗)) : ℝᵏ → ℝ.   (5.20)

Then, for v⃗ ∈ ℝᵏ,

d_H ℓ(p⃗; v⃗) = d_H f(A(p⃗), a(p⃗); d_H A(p⃗; v⃗), d_H a(p⃗; v⃗))

= min_{y∈Y(A(p⃗),a(p⃗))} y ⊗ y ·· d_H A(p⃗; v⃗) + y · d_H a(p⃗; v⃗).


Remark 5.3. (1) Formula (5.17) gives a complete description of the nondifferentiability of f(A, a). When Y(A, a) is not a singleton and the direction B changes, the elements of Y(A, a) that achieve the infimum of By · y + b · y change, resulting in a nonlinearity of the mapping (B, b) ↦ d_H f(A, a; B, b).
(2) For a supremum instead of an infimum, it is sufficient to change the concavity for the convexity and the minimum for the maximum in the conclusions of the theorem.
(3) By using the Rayleigh quotient, we shall see in section 2.2 of Chapter 4 that the smallest and largest eigenvalues of a symmetric matrix correspond to the infimum and the supremum of Ay · y with respect to the unit sphere Y = {y ∈ ℝⁿ : ‖y‖ = 1} in ℝⁿ. We shall see later in section 2.3 of Chapter 4 that Theorem 5.2 directly gives the expression of the semidifferential of the smallest and largest eigenvalues of a symmetric matrix A by using Y = {y ∈ ℝⁿ : ‖y‖ = 1} and a = 0. We can also obtain the semidifferential of all the other eigenvalues.
(4) The case A = 0 corresponds to the support function σ_Y of Example 5.2.
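Point (1) of Remark 5.3 can be checked directly on a small instance. The following is a hedged sketch (hypothetical data, chosen so that Y(A, a) has two elements): formula (5.17) is evaluated over the minimizing set and compared with a one-sided difference quotient, and the nonlinearity of (B, b) ↦ d_H f(A, a; B, b) shows up as d_H f(A, a; B, b) + d_H f(A, a; −B, −b) ≠ 0.

```python
import numpy as np

# Finite Y with two minimizers of Ay.y + a.y for A = I, a = 0.
Y = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]])
A = np.eye(2); a = np.zeros(2)
B = np.diag([1.0, 0.0]); b = np.array([1.0, 0.0])

def f(M, m):
    """f(M, m) = min_{y in Y} My.y + m.y."""
    return np.min(np.einsum('ij,jk,ik->i', Y, M, Y) + Y @ m)

vals = np.einsum('ij,jk,ik->i', Y, A, Y) + Y @ a
minimizers = Y[np.isclose(vals, vals.min())]   # Y(A, a) = {(1,0), (-1,0)}

def dH(Bd, bd):
    """Right-hand side of (5.17): min over Y(A, a) of Bd y.y + bd.y."""
    return np.min(np.einsum('ij,jk,ik->i', minimizers, Bd, minimizers)
                  + minimizers @ bd)

t = 1e-7
fd = (f(A + t * B, a + t * b) - f(A, a)) / t   # one-sided quotient
# dH(B, b) = 0 is achieved at (-1, 0); dH(-B, -b) = -2 at (1, 0):
# different minimizers are active in different directions.
```

The switch of the active minimizer with the direction is exactly the mechanism behind the nonlinearity described in Remark 5.3(1).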

Proof. (i) For all α, 0 < α < 1, (A, a), (B, b), and y ∈ Y,

[αA + (1 − α)B]y · y + [αa + (1 − α)b] · y = α[Ay · y + a · y] + (1 − α)[By · y + b · y]
  ≥ α inf_{y∈Y}[Ay · y + a · y] + (1 − α) inf_{y∈Y}[By · y + b · y]
⇒ f(α(A, a) + (1 − α)(B, b)) ≥ α f(A, a) + (1 − α) f(B, b),

and the function is concave. For (A, a) and (B, b),

Ay · y + a · y = (A − B)y · y + (a − b) · y + By · y + b · y
⇒ Ay · y + a · y ≥ −‖A − B‖‖y‖² − ‖a − b‖‖y‖ + By · y + b · y
⇒ inf_{y∈Y} Ay · y + a · y ≥ −c²‖A − B‖ − c‖a − b‖ + inf_{y∈Y} By · y + b · y
⇒ f(B, b) − f(A, a) ≤ c²‖A − B‖ + c‖a − b‖,

since the norms of the elements of the compact set Y are bounded by a constant c. By interchanging the roles of (A, a) and (B, b), we get f(A, a) − f(B, b) ≤ c²‖A − B‖ + c‖a − b‖ and |f(B, b) − f(A, a)| ≤ c²‖A − B‖ + c‖a − b‖.

(ii) It is sufficient to show that

df(A, a; B, b) = min_{y∈Y(A,a)} By · y + b · y.   (5.21)

Then, from the Lipschitz continuity of f(A, a) (cf. Theorem 3.6 of Chapter 3),

d_H f(A, a; B, b) = min_{y∈Y(A,a)} By · y + b · y.

So, choose the polynomial (hence continuous and differentiable) function

(t, y) ↦ G(t, y) def= [A + tB]y · y + [a + tb] · y : ℝ × ℝⁿ → ℝ
⇒ dG(t, y; +1, 0) = By · y + b · y and f(A + tB, a + tb) = inf_{y∈Y} G(t, y)

and check that the assumptions of Theorem 5.1 are verified. Assumption (H1) is verified by the Weierstrass theorem since the function G(t, y) is continuous in y and Y is compact. We have dG(t, y; +1, 0) = By · y + b · y, and Assumption (H2) is verified. It remains to check Assumption (H3). Associate with a sequence {t_n}, t_n > 0, t_n → 0, a sequence of points y_n ∈ Y(t_n). As Y is


compact and y_n ∈ Y(t_n) ⊂ Y, there exist a subsequence {y_{n_k}} and a point y⁰ ∈ Y such that y_{n_k} → y⁰ and t_{n_k} → 0. By definition of the infimum and continuity of G,

∀y ∈ Y, G(t_{n_k}, y_{n_k}) ≤ G(t_{n_k}, y)
⇒ G(0, y⁰) = lim_{k→∞} G(t_{n_k}, y_{n_k}) ≤ lim_{k→∞} G(t_{n_k}, y) = G(0, y) ⇒ y⁰ ∈ Y(0).

Finally, Assumption (H3) is verified since

(G(t_{n_k}, y_{n_k}) − G(0, y_{n_k}))/t_{n_k} = By_{n_k} · y_{n_k} + b · y_{n_k} → By⁰ · y⁰ + b · y⁰ = dG(0, y⁰; +1, 0).

All the assumptions of Theorem 5.1 are verified.

(iii) The formula is a consequence of Theorem 3.5 for functions that are Hadamard semidifferentiable.

Theorem 5.2 assumes that Y is compact, but its conclusions remain true under other sets of assumptions. For a second example involving a quadratic function on ℝⁿ, we need the following lemma that anticipates Chapter 5.

Lemma 5.1. Let A be a symmetric positive definite matrix and a ∈ ℝⁿ. The problem

inf_{y∈ℝⁿ} q(y),   q(y) def= Ay · y + a · y,   (5.22)

has a unique minimizer ŷ that is a solution of the equation

2Aŷ + a = 0.   (5.23)

Proof. The function q has the growth property at infinity in ℝⁿ (Lemma 5.1 and Example 5.8). Therefore, it has a nonempty lower section in U by Theorem 5.4 of Chapter 2. Finally, since q is continuous as a polynomial function and U = ℝⁿ is closed, there exists a minimizer ŷ by Theorem 5.3. By the Taylor theorem, Theorem 2.6, applied to the function g(t) = q(ŷ + t(y − ŷ)), y ∈ ℝⁿ, there exists α ∈ ]0, 1[ such that g(1) = g(0) + g′(0) + (1/2)g″(α). But g′(0) = (2Aŷ + a) · (y − ŷ) and g″(α) = 2A(y − ŷ) · (y − ŷ). Therefore,

q(y) = q(ŷ) + (2Aŷ + a) · (y − ŷ) + A(y − ŷ) · (y − ŷ).

In particular, for all t > 0,

0 ≤ q(ŷ ± t(y − ŷ)) − q(ŷ) = ±t(2Aŷ + a) · (y − ŷ) + t²A(y − ŷ) · (y − ŷ)
⇒ 0 ≤ (q(ŷ ± t(y − ŷ)) − q(ŷ))/t = ±(2Aŷ + a) · (y − ŷ) + tA(y − ŷ) · (y − ŷ).

By letting t go to zero, ±(2Aŷ + a) · (y − ŷ) ≥ 0 and hence (2Aŷ + a) · (y − ŷ) = 0 for all y ∈ ℝⁿ. So, 2Aŷ + a = 0. Since A is positive definite, the solution ŷ = −(1/2)A⁻¹a is unique.

Theorem 5.3. Let A be an n × n positive definite matrix.

(i) There exists r > 0 such that the mapping (A′, a′) ↦ f(A′, a′) : B_r(A) × B_r(a) → ℝ is well defined, concave, and Lipschitzian.


(ii) The function f is Hadamard differentiable at (A, a). For all B ∈ Sym_n and b ∈ ℝⁿ,

d_H f(A, a; B, b) def= lim_{t↘0, (W,w)→(B,b)} (f(A + tW, a + tw) − f(A, a))/t
                    = By⁰ · y⁰ + b · y⁰ = y⁰ ⊗ y⁰ ·· B + b · y⁰,   (5.24)

where y⁰ ∈ ℝⁿ is the unique solution of the equation 2Ay⁰ + a = 0.

(iii) Let p⃗ ↦ A(p⃗) : ℝᵏ → Sym_n and p⃗ ↦ a(p⃗) : ℝᵏ → ℝⁿ be Hadamard semidifferentiable with respect to p⃗. Assume that A(p⃗) is positive definite. Consider the function

p⃗ ↦ ℓ(p⃗) def= f(A(p⃗), a(p⃗)) : ℝᵏ → ℝ.   (5.25)

Then, for v⃗ ∈ ℝᵏ,

d_H ℓ(p⃗; v⃗) = d_H f(A(p⃗), a(p⃗); d_H A(p⃗; v⃗), d_H a(p⃗; v⃗))

= y⁰ ⊗ y⁰ ·· d_H A(p⃗; v⃗) + y⁰ · d_H a(p⃗; v⃗),

where y⁰ ∈ ℝⁿ is the unique solution of the equation 2A(p⃗)y⁰ + a(p⃗) = 0.

Proof. The proof is similar to that of Theorem 5.2 with the following changes. Since A is positive definite, there exists α > 0 such that for all y ∈ ℝⁿ, Ay · y ≥ α‖y‖². Therefore, for A′ ∈ Sym_n,

A′y · y = Ay · y + (A′ − A)y · y
⇒ A′y · y ≥ Ay · y − ‖A′ − A‖‖y‖² ≥ (α − ‖A′ − A‖)‖y‖² > (α/2)‖y‖²

for all A′ such that ‖A′ − A‖ < α/2, that is, for all A′ in the ball B_{α/2}(A) of radius α/2 centered at A. Then, for all A′ ∈ B_{α/2}(A) and a′ ∈ ℝⁿ, the infimum of the quadratic function A′y · y + a′ · y has a unique minimizer y⁰ ∈ ℝⁿ that is the unique solution of 2A′y⁰ + a′ = 0. Hence, the mapping (A′, a′) ↦ f(A′, a′) : B_{α/2}(A) × ℝⁿ → ℝ is well defined. Moreover, for all a′ ∈ B_{α/2}(a),

α‖y⁰‖² ≤ 2A′y⁰ · y⁰ = −a′ · y⁰ ≤ ‖a′‖‖y⁰‖
⇒ ‖y⁰‖ ≤ (1/α)‖a′‖ ≤ (1/α)[‖a‖ + ‖a − a′‖] < k def= (1/α)[‖a‖ + α/2].   (5.26)

Set Y = ℝⁿ and G(t, y) = [A + tB]y · y + [a + tb] · y for B ∈ Sym_n and t, 0 ≤ t < τ, where τ > 0 is chosen such that τ‖B‖ < α/2 in order to have A + tB ∈ B_{α/2}(A) for all t, 0 ≤ t < τ. As a result, Y(t) = {y_t} is a singleton, where y_t is the unique solution of the equation 2[A + tB]y_t + [a + tb] = 0.

(i) For all α, 0 < α < 1, (A₁, a₁), (A₂, a₂), and y ∈ Y,

[αA₁ + (1 − α)A₂]y · y + [αa₁ + (1 − α)a₂] · y = α[A₁y · y + a₁ · y] + (1 − α)[A₂y · y + a₂ · y]

≥ α inf_{y∈Y}[A₁y · y + a₁ · y] + (1 − α) inf_{y∈Y}[A₂y · y + a₂ · y]

⇒ f(α(A₁, a₁) + (1 − α)(A₂, a₂)) ≥ α f(A₁, a₁) + (1 − α) f(A₂, a₂),

and the function is concave.


For (B, b) and (C, c) in B_{α/2}(A) × B_{α/2}(a),

By · y + b · y = (B − C)y · y + (b − c) · y + Cy · y + c · y
⇒ By · y + b · y ≥ −‖B − C‖‖y‖² − ‖b − c‖‖y‖ + Cy · y + c · y
⇒ By · y + b · y ≥ −‖B − C‖‖y‖² − ‖b − c‖‖y‖ + inf_{y∈Y} Cy · y + c · y.

If y_B is the minimizer for the infimum f(B, b), then

f(B, b) = By_B · y_B + b · y_B ≥ −‖B − C‖‖y_B‖² − ‖b − c‖‖y_B‖ + inf_{y∈Y} Cy · y + c · y
⇒ f(C, c) − f(B, b) ≤ ‖y_B‖²‖B − C‖ + ‖y_B‖‖b − c‖.

But, from (5.26), ‖y_B‖ < k and

f(C, c) − f(B, b) ≤ k²‖B − C‖ + k‖b − c‖.

By interchanging the roles of (B, b) and (C, c), we obtain f(B, b) − f(C, c) ≤ k²‖B − C‖ + k‖c − b‖ and |f(B, b) − f(C, c)| ≤ k²‖B − C‖ + k‖c − b‖.

(ii) The proof is the same as that of Theorem 5.2 by observing that all the sets Y(t), 0 ≤ t < τ, are contained in the compact ball B_k(0). Moreover, d_H f(A, a; B, b) is linear in (B, b) since Y(0) is a singleton. Consequently, f is Hadamard and Fréchet differentiable at (A, a).

(iii) The formula is obtained by applying the chain rule to the composition of Hadamard semidifferentiable functions (Theorem 3.5).
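Formula (5.24) of Theorem 5.3 lends itself to a direct numerical check. A hedged sketch with random data (my own illustration, not from the book): f(A′, a′) is evaluated through the closed-form minimizer of Lemma 5.1, and the one-sided quotient along a direction (B, b) is compared with By⁰ · y⁰ + b · y⁰.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)               # symmetric positive definite
a = rng.standard_normal(4)
B = rng.standard_normal((4, 4)); B = (B + B.T) / 2.0   # symmetric direction
b = rng.standard_normal(4)

def f(Ap, ap):
    """f(Ap, ap) = inf_y Ap y.y + ap.y via the minimizer of Lemma 5.1."""
    y = np.linalg.solve(2.0 * Ap, -ap)
    return y @ Ap @ y + ap @ y

y0 = np.linalg.solve(2.0 * A, -a)           # 2 A y0 + a = 0
dH_formula = y0 @ B @ y0 + b @ y0           # = y0 (x) y0 .. B + b . y0
t = 1e-6
dH_numeric = (f(A + t * B, a + t * b) - f(A, a)) / t
```

Since Y(0) is a singleton here, the map (B, b) ↦ d_H f(A, a; B, b) is linear, in contrast with the compact-Y case of Theorem 5.2.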

5.3 Infimum with Respect to Several Parameters

We have seen at the beginning of section 5.2.2 how Theorem 5.1 can be used to obtain the semidifferential with respect to several parameters. In fact, Theorem 5.1 readily extends from one to m parameters. Given Y, ∅ ≠ Y ⊂ ℝⁿ, and a mapping G : ℝᵐ × Y → ℝ, consider the function

g(x) def= inf_{y∈Y} G(x, y),   Y(x) def= {y ∈ Y : G(x, y) = g(x)}.   (5.27)

We first seek conditions under which dg(x; v) is of the form

dg(x; v) = min_{y∈Y(x)} dG(x, y; v, 0)

and, in a second step, those for the existence of the stronger d_H g(x; v). The case of a supremum in place of an infimum in (5.27) is obtained by considering the function −G(x, y).

5.3.1 Sublinear and Superlinear Functions

In the differentiable case, J. M. DANSKIN [1] brings to light the fact that the mapping v ↦ dg(x; v) is superlinear. We recall the definitions of such mappings and take this opportunity to give a rather complete account of their general properties, since they will be used later in this book.

Definition 5.1.
A function f : ℝⁿ → ℝ ∪ {+∞} is sublinear if

(a) ∀α > 0, f(αx) = α f(x) (positive homogeneity),


(b) ∀x, y ∈ ℝⁿ, f(x + y) ≤ f(x) + f(y) (subadditivity).

A function f : ℝⁿ → ℝ ∪ {−∞} is superlinear if −f is sublinear.

We have seen in Theorem 4.9(ii) that the semidifferential of a convex function is sublinear and, consequently, that it is superlinear for a concave function. We recall the definitions of the support function σ_U and the indicator function I_U of a nonempty subset U of ℝⁿ,

σ_U(x) def= { sup_{y∈U} x · y if U ≠ ∅,  −∞ if U = ∅ },   I_U(x) def= { 0 if x ∈ U,  +∞ if x ∉ U }.   (5.28)
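Properties (a) and (b) can be observed directly on a support function, the canonical example of a sublinear map. A small hedged sketch (random finite U, so σ_U is finite everywhere; names are illustrative):

```python
import numpy as np

# Support function of a finite set U in R^3: sigma_U(x) = max_{u in U} u . x.
rng = np.random.default_rng(2)
U = rng.standard_normal((50, 3))

def sigma(x):
    return np.max(U @ x)

x = rng.standard_normal(3)
y = rng.standard_normal(3)
alpha = 2.7

# (a) positive homogeneity: sigma(alpha x) = alpha sigma(x) for alpha > 0
hom_gap = abs(sigma(alpha * x) - alpha * sigma(x))
# (b) subadditivity: sigma(x + y) <= sigma(x) + sigma(y)
sub_gap = sigma(x + y) - (sigma(x) + sigma(y))
```

Both gaps vanish (up to rounding) for every choice of x, y, and α > 0, which is exactly Definition 5.1.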

Theorem 5.4. Let f : ℝⁿ → ℝ ∪ {+∞} be a sublinear function.

(i) f is convex, dom f is a convex cone at 0, and either f(0) = 0 or f(0) = +∞.

(ii) If dom f ≠ ∅ and f is lsc, then f(0) = 0, f* is convex lsc, dom f* ≠ ∅ is convex and closed, and

f = σ_{dom f*} and f* = I_{dom f*}.   (5.29)

(iii) If dom f = ℝⁿ, then dom f* is nonempty, convex, and compact and f is convex, continuous, Hadamard semidifferentiable, and Lipschitzian:

∀x, y ∈ ℝⁿ, |f(y) − f(x)| ≤ c(f)‖y − x‖,   c(f) def= sup_{‖z‖=1} |f(z)|,   (5.30)

∀v₁, v₂, |d_H f(x; v₂) − d_H f(x; v₁)| ≤ c(f)‖v₂ − v₁‖.   (5.31)

Proof. (i) By definition, a sublinear function is convex and its domain dom f is convex. It is also a cone at 0 since

∀x ∈ dom f, ∀α > 0, f(αx) = αf(x) < ∞ ⇒ αx ∈ dom f.

Finally, for λ > 0, f(0) = f(λ0) = λf(0). Therefore, either f(0) = 0 or f(0) = +∞.

(ii) If x ∈ dom f, then f(0) ≤ liminf_{n→∞} f(x/n) = liminf_{n→∞} f(x)/n = 0 and f(0) = 0. By Theorem 8.5(iv) of Chapter 2, f* : ℝⁿ → ℝ ∪ {+∞} is convex lsc and dom f* ≠ ∅ is convex. By definition, we have

f*(x*) = sup_{x∈ℝⁿ} x* · x − f(x).

For all λ > 0, by positive homogeneity,

λf*(x*) = λ sup_{x∈ℝⁿ} [x* · x − f(x)] = sup_{x∈ℝⁿ} [x* · (λx) − f(λx)] = f*(x*).

Since f*(x*) > −∞, either f*(x*) = 0 or f*(x*) = +∞. Therefore, dom f* = {x* : f*(x*) = 0} and f* = I_{dom f*}.

We have shown in Example 8.8 of Chapter 2 that, for U ≠ ∅, σ_U* = I_{co U} and σ_{co(U)} = σ_U. In particular, dom σ_U* = co U is convex and closed. Since f is convex, lsc, and dom f ≠ ∅, by Theorem 8.5(iv) of Chapter 2,

f = f** = (f*)* = (I_{dom f*})* = σ_{dom f*}.


Finally, as co (dom f*) is convex and closed, I_{co(dom f*)} is convex lsc, I_{co(dom f*)}** = I_{co(dom f*)}, and

I_{dom f*} = f* = (σ_{dom f*})* = I_{co(dom f*)} ⇒ dom f* = co (dom f*)

and dom f* is closed and convex.

(iii) Since dom f = ℝⁿ and f : ℝⁿ → ℝ is convex, it is continuous and Hadamard semidifferentiable in ℝⁿ by Theorem 4.8 and f(0) = 0. It is also Lipschitzian since, for all x ≠ y,

f(y) ≤ f(x) + f(y − x) ⇒ |f(y) − f(x)|/‖y − x‖ ≤ f((y − x)/‖y − x‖),
f(x) ≤ f(y) + f(x − y) ⇒ |f(x) − f(y)|/‖x − y‖ ≤ f((x − y)/‖x − y‖)
⇒ |f(y) − f(x)|/‖y − x‖ ≤ c(f) def= sup_{‖z‖=1} |f(z)| < ∞
⇒ ∀x, y ∈ ℝⁿ, |f(y) − f(x)| ≤ c(f)‖y − x‖.

The Lipschitz continuity of v ↦ d_H f(x; v) follows from Theorem 3.6.

If dom f* is not bounded, then there exists a sequence {y_n} ⊂ dom f* such that ‖y_n‖ →

+∞. Therefore, we can extract a subsequence such that y_{n_k}/‖y_{n_k}‖ → x for an x such that ‖x‖ = 1. By continuity of f,

f(x) = lim_{k→∞} f(y_{n_k}/‖y_{n_k}‖) = lim_{k→∞} sup_{y∈dom f*} (y_{n_k}/‖y_{n_k}‖) · y
     ≥ lim_{k→∞} (y_{n_k}/‖y_{n_k}‖) · y_{n_k} = lim_{k→∞} ‖y_{n_k}‖ = +∞,

which contradicts the fact that dom f = ℝⁿ.

We have shown that a sublinear function with nonempty domain is the support function of a nonempty set. We now prove the converse.

Theorem 5.5. For U ⊂ ℝⁿ nonempty, σ_U is sublinear, lsc, σ_U(0) = 0, dom σ_U ≠ ∅, σ_U = σ_{co U}, σ_U* = I_{co U}, and dom σ_U* = co U. If, in addition, U is bounded, then dom σ_U* = co U is compact and convex, dom σ_U = ℝⁿ, and

∀x₁, x₂ ∈ ℝⁿ, |σ_U(x₂) − σ_U(x₁)| ≤ (sup_{y∈co U} ‖y‖) ‖x₂ − x₁‖,   (5.32)

and σ_U : ℝⁿ → ℝ is convex, continuous, and Hadamard semidifferentiable, and

d_H σ_U(x; v) = sup_{y∈Û(x)} v · y,   Û(x) def= {y ∈ co (U) : x · y = σ_U(x)}.   (5.33)

Corollary 1. (i) For a sublinear function f : ℝⁿ → ℝ ∪ {+∞}, the following properties are equivalent: f continuous on ℝⁿ; dom f = ℝⁿ; dom f* nonempty and compact; dom f* nonempty and bounded.

n (ii) For the function σU , we have the following equivalences: σU continuous on R ; dom σU = n R ; co (U) nonempty and compact; U nonempty and bounded. Copyright © 2019 Society for Industrial and Applied Mathematics From Introduction to Optimization and Hadamard Semidifferential Calculus - Delfour (9781611975956)


Remark 5.4. Therefore, there is a bijection between the families

F def= {f : ℝⁿ → ℝ ∪ {+∞} : f sublinear, lsc, and dom f ≠ ∅}   (5.34)
and U def= {U ⊂ ℝⁿ : U convex, closed, and nonempty}

given by f ↦ U = dom f* : F → U and U ↦ f = σ_U : U → F.

Proof. We have already established in Example 5.2 that σ_U is sublinear, lsc, and that σ_U(0) = 0. We have also shown in Example 8.8 of Chapter 2 that σ_U = (I_U)*, σ_U* = (I_U)** = I_{co U}, and σ_U = σ_{co U}. In particular, dom σ_U* = co U is convex and closed.

If, in addition, U is bounded, then co U is compact. For each x, there exists ŷ ∈ co U such that

σ_U(x) = σ_{co(U)}(x) = sup_{y∈co(U)} x · y = x · ŷ ∈ ℝ ⇒ dom σ_U = ℝⁿ.

For x₁, x₂ ∈ ℝⁿ,

x₂ · y = x₁ · y + (x₂ − x₁) · y ≤ x₁ · y + ‖x₂ − x₁‖‖y‖

⇒ sup_{y∈co U} x₂ · y ≤ sup_{y∈co U} x₁ · y + ‖x₂ − x₁‖ sup_{y∈co U} ‖y‖

⇒ |σ_U(x₂) − σ_U(x₁)| ≤ (sup_{y∈co U} ‖y‖) ‖x₂ − x₁‖.

Therefore, the function σ_U is continuous and convex in ℝⁿ. By Theorem 4.8, it is Hadamard semidifferentiable and, by applying Theorem 5.2(ii) (with A = 0) to the function −σ_U(x) = −sup_{y∈U} x · y = inf_{y∈U} (−x) · y, we obtain the expression of d_H σ_U(x; v).

5.3.2 General Theorem (Y Not Necessarily Compact)

Theorem 5.6. Let Y be a nonempty subset of ℝⁿ, V(x) a neighborhood of a point x ∈ ℝᵐ, and G : V(x) × Y → ℝ. Assume that the following conditions are verified:

(H1) for all x′ ∈ V(x), Y(x′) ≠ ∅;

(H2) for all y_0 ∈ Y(x) and all v ∈ ℝᵐ, dG(x, y_0; v, 0) exists;

(H3) for all v ∈ ℝᵐ and all sequences t_n ↘ 0, there exist a subsequence {t_{n_k}} of {t_n}, y⁰ ∈ Y(x), and a sequence {y_{n_k}}, y_{n_k} ∈ Y(x + t_{n_k}v), such that

liminf_{k→∞} (G(x + t_{n_k}v, y_{n_k}) − G(x, y_{n_k}))/t_{n_k} ≥ dG(x, y⁰; v, 0).

Then, we have the following properties.

(i) For each v ∈ ℝᵐ, there exists y⁰ ∈ Y(x) such that

dg(x; v) = lim_{t↘0} (g(x + tv) − g(x))/t = inf_{y∈Y(x)} dG(x, y; v, 0) = dG(x, y⁰; v, 0).   (5.35)

If, in addition, v ↦ dG(x, y; v, 0) is superlinear, then v ↦ dg(x; v) : ℝᵐ → ℝ is superlinear (and continuous).


(ii) If, for all y ∈ Y, x′ ↦ G(x′, y) is Hadamard semidifferentiable at x and, for all v ∈ ℝᵐ, all sequences w_n → v and t_n ↘ 0, there exist y⁰ ∈ Y(x), subsequences {w_{n_k}} of {w_n}, {t_{n_k}} of {t_n}, and a sequence {y_{n_k}}, y_{n_k} ∈ Y(x + t_{n_k}w_{n_k}), such that

liminf_{k→∞} (G(x + t_{n_k}w_{n_k}, y_{n_k}) − G(x, y_{n_k}))/t_{n_k} ≥ dG(x, y⁰; v, 0),   (5.36)

then, for all v ∈ ℝᵐ, d_H g(x; v) exists. If, in addition, v ↦ dG(x, y; v, 0) is superlinear, then v ↦ d_H g(x; v) : ℝᵐ → ℝ is superlinear (and continuous).

(iii) If, in addition, v ↦ dG(x, y; v, 0) is linear and Y(x) is a singleton, then g is Gateaux differentiable in part (i) and Hadamard (Fréchet) differentiable in part (ii).

Proof. (i) From Theorem 5.1.

(ii) Given t > 0 and w → v, and any y_t ∈ Y(x + tw) and y_0 ∈ Y(x),

(g(x + tw) − g(x))/t = (G(x + tw, y_t) − G(x, y_0))/t ≤ (G(x + tw, y_0) − G(x, y_0))/t

⇒ ∀y_0 ∈ Y(x), d̄g def= limsup_{w→v, t↘0} (g(x + tw) − g(x))/t ≤ dG(x, y_0; v, 0)

⇒ d̄g ≤ inf_{y_0∈Y(x)} dG(x, y_0; v, 0).

Conversely, let {t_n}, t_n ↘ 0, and v_n → v be sequences such that

lim_{n→∞} (g(x + t_n v_n) − g(x))/t_n = d̲g def= liminf_{w→v, t↘0} (g(x + tw) − g(x))/t.

Let y_n ∈ Y(x + t_n v_n). By compactness of Y, there exist y⁰ ∈ Y and a subsequence, still denoted {y_n}, such that y_n → y⁰ as t_n ↘ 0. Then,

∀y ∈ Y, G(x + t_n v_n, y) ≥ G(x + t_n v_n, y_n).

Since d_H G(x, y; v, 0) exists, by continuity of (t, w) ↦ G(x + tw, y) at (0⁺, v) and lsc of G,

G(x, y) = lim_{n→∞} G(x + t_n v_n, y) ≥ liminf_{n→∞} G(x + t_n v_n, y_n) ≥ G(x, y⁰)
⇒ ∀y ∈ Y, G(x, y) ≥ G(x, y⁰) ⇒ y⁰ ∈ Y(x).

On the other hand, by assumption (5.36),

(g(x + t_n v_n) − g(x))/t_n = (G(x + t_n v_n, y_n) − G(x, y_0))/t_n ≥ (G(x + t_n v_n, y_n) − G(x, y_n))/t_n

⇒ d̲g ≥ liminf_{n→∞} (G(x + t_n v_n, y_n) − G(x, y_n))/t_n ≥ dG(x, y⁰; v, 0).

Therefore,

inf_{y_0∈Y(x)} dG(x, y_0; v, 0) ≥ d̄g ≥ d̲g ≥ dG(x, y⁰; v, 0) ≥ inf_{y_0∈Y(x)} dG(x, y_0; v, 0)

and we have equality everywhere. So, d_H g(x; v) exists and there exists y⁰ ∈ Y(x) such that

d_H g(x; v) = inf_{y_0∈Y(x)} dG(x, y_0; v, 0) = dG(x, y⁰; v, 0).

Finally, as d_H g(x; v) exists, g is continuous at x. The superlinearity follows directly from this formula.

(iii) From the same formula.


5.4 Theorem of Danskin and Its Variations: Y Compact

Consider the parametrized infimum with m ≥ 1 parameters

g(x) def= inf_{y∈Y} G(x, y),   Y(x) def= {y ∈ Y : G(x, y) = g(x)}   (5.37)

for Y, ∅ ≠ Y ⊂ ℝᵐ, compact and a function G : ℝⁿ × Y → ℝ. We know that under the conditions of Theorem 5.1

dg(x; v) = min_{y∈Y(x)} dG(x, y; v, 0).

J. M. DANSKIN [1] considered this problem in the case of the supremum in 1966 (and in [2] in 1967) under strong differentiability assumptions. The case of the supremum in (5.37) is obtained by considering the minimum of the function −G(x, y). We now give the proof of this theorem and show that its proof and conclusions can be completed by proving the existence of the stronger Hadamard semidifferential d_H g(x; v).

Theorem 5.7 (J. M. DANSKIN [1], Theorems 1 to 4). Let ∅ ≠ Y ⊂ ℝᵐ be compact and (x, y) ↦ G(x, y) : ℝⁿ × Y → ℝ be a continuous function that is Fréchet differentiable with respect to x such that (x, y) ↦ ∇ₓG(x, y) : ℝⁿ × Y → ℝⁿ is continuous.119 Then

(i) Y(x) = {y ∈ Y : g(x) = G(x, y)} is compact and nonempty for each x;

(ii) g : ℝⁿ → ℝ is continuous;

(iii) for each pair (x, v) ∈ ℝⁿ × ℝⁿ,

d_H g(x; v) = min_{y∈Y(x)} ∇ₓG(x, y) · v;   (5.38)

(iv) the mapping v ↦ d_H g(x; v) : ℝⁿ → ℝ is continuous and superlinear, and x ↦ d_H g(x; v) : ℝⁿ → ℝ is lower semicontinuous.

Remark 5.5. J. M. DANSKIN [1] only proves the existence of dg(x; v), but we also obtain the existence of d_H g(x; v), for which the chain rule is applicable.
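Danskin's formula (5.38) can be sketched numerically at a point of nondifferentiability. A hedged example of my own (not from the book): the distance-squared function to a finite Y, G(x, y) = ‖x − y‖², with x equidistant from two points of Y so that Y(x) has two elements and ∇ₓG(x, y) = 2(x − y).

```python
import numpy as np

# Finite (hence compact) Y and a point x equidistant from both its elements.
Y = np.array([[1.0, 0.0], [-1.0, 0.0]])
x = np.array([0.0, 1.0])            # Y(x) = Y: g is not differentiable at x
v = np.array([1.0, 0.0])

def g(z):
    """g(z) = min_{y in Y} |z - y|^2."""
    return np.min(np.sum((z - Y) ** 2, axis=1))

vals = np.sum((x - Y) ** 2, axis=1)
Yx = Y[np.isclose(vals, vals.min())]          # set of minimizers Y(x)
dH_formula = np.min((2.0 * (x - Yx)) @ v)     # right-hand side of (5.38)

t = 1e-6
dH_numeric = (g(x + t * v) - g(x)) / t        # one-sided difference quotient
```

Here d_H g(x; v) = −2 while d_H g(x; −v) = −2 as well, so the semidifferential is superlinear but not linear: g has a downward kink in both directions along the first axis.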

Proof. (i) By continuity of G and compactness of Y, for each x ∈ ℝⁿ, Y(x) is nonempty and compact.

(ii) For x₂, x₁ ∈ ℝⁿ and y₁ ∈ Y(x₁), y₂ ∈ Y(x₂),

G(x₂, y₂) − G(x₁, y₂) ≤ g(x₂) − g(x₁) ≤ G(x₂, y₁) − G(x₁, y₁),

−|G(x₂, y₂) − G(x₁, y₂)| ≤ g(x₂) − g(x₁) ≤ |G(x₂, y₁) − G(x₁, y₁)|.

For ρ > 0, the function G : B_ρ(x₁) × Y → ℝ is uniformly continuous: for all ε > 0, there exists δ > 0 such that

∀x, x′ ∈ B_ρ(x₁), ∀y, y′ ∈ Y, ‖(x′, y′) − (x, y)‖ < δ ⇒ |G(x′, y′) − G(x, y)| < ε

⇒ ∀x₂, ‖x₂ − x₁‖ < δ, |G(x₂, y₂) − G(x₁, y₂)| < ε and |G(x₂, y₁) − G(x₁, y₁)| < ε,

and g is continuous at x₁.

119∇ₓG(x, y) denotes the gradient of the function x ↦ G(x, y) : ℝⁿ → ℝ.


(iii) Given t ↘ 0, w → v, and y_0 ∈ Y(x),

(g(x + tw) − g(x))/t ≤ (G(x + tw, y_0) − G(x, y_0))/t = ∫₀¹ ∇ₓG(x + θtw, y_0) · w dθ.

As tw → 0 and x + θtw → x, by continuity of (x, y) ↦ ∇ₓG(x, y),

limsup_{w→v, t↘0} (g(x + tw) − g(x))/t ≤ lim_{w→v, t↘0} ∫₀¹ ∇ₓG(x + θtw, y_0) · w dθ = ∇ₓG(x, y_0) · v.

Since y_0 ∈ Y(x) is arbitrary,

limsup_{w→v, t↘0} (g(x + tw) − g(x))/t ≤ inf_{y∈Y(x)} ∇ₓG(x, y) · v.   (5.39)

In the other direction, consider sequences t_n ↘ 0 and w_n → v such that

(g(x + t_n w_n) − g(x))/t_n → liminf_{w→v, t↘0} (g(x + tw) − g(x))/t

and, for each n, y_n ∈ Y(x + t_n w_n). Since Y is compact, there exist subsequences, still denoted {t_n} and {w_n}, and y⁰ ∈ Y such that y_n → y⁰. Therefore,

∀y ∈ Y, g(x + t_n w_n) = G(x + t_n w_n, y_n) ≤ G(x + t_n w_n, y)

and, by continuity of G and g,

∀y ∈ Y, g(x) = G(x, y⁰) ≤ G(x, y) ⇒ y⁰ ∈ Y(x).

For the differential quotient,

(g(x + t_n w_n) − g(x))/t_n ≥ (G(x + t_n w_n, y_n) − G(x, y_n))/t_n = ∫₀¹ ∇ₓG(x + θt_n w_n, y_n) · w_n dθ.

Since t_n w_n → 0 and x + θt_n w_n → x, by continuity of (x, y) ↦ ∇ₓG(x, y),

liminf_{w→v, t↘0} (g(x + tw) − g(x))/t ≥ lim_{n→∞} ∫₀¹ ∇ₓG(x + θt_n w_n, y_n) · w_n dθ
= ∇ₓG(x, y⁰) · v ≥ inf_{y∈Y(x)} ∇ₓG(x, y) · v.

By combining this inequality with inequality (5.39), there exists y⁰ ∈ Y(x) such that

d_H g(x; v) = lim_{w→v, t↘0} (g(x + tw) − g(x))/t = inf_{y∈Y(x)} ∇ₓG(x, y) · v = ∇ₓG(x, y⁰) · v.

Therefore the set of minimizers Y(x) is not empty.

(iv) As g is Hadamard semidifferentiable, v ↦ d_H g(x; v) is continuous, and it is superlinear as the infimum of linear functions:

d_H g(x; v₁ + v₂) = inf_{y∈Y(x)} ∇ₓG(x, y) · (v₁ + v₂)

≥ inf_{y∈Y(x)} ∇ₓG(x, y) · v₁ + inf_{y∈Y(x)} ∇ₓG(x, y) · v₂.


Given v ∈ ℝⁿ, let x_n → x and y_n ∈ Y(x_n) be sequences such that

d_H g(x_n; v) = inf_{y∈Y(x_n)} ∇ₓG(x_n, y) · v = ∇ₓG(x_n, y_n) · v.

Since Y is compact, there exist a subsequence and y⁰ ∈ Y such that y_n → y⁰. Again, by continuity of g and G,

∀y ∈ Y, g(x_n) = G(x_n, y_n) ≤ G(x_n, y) ⇒ ∀y ∈ Y, g(x) = G(x, y⁰) ≤ G(x, y) ⇒ y⁰ ∈ Y(x)

⇒ liminf_{x_n→x} d_H g(x_n; v) = liminf_{x_n→x} ∇ₓG(x_n, y_n) · v = ∇ₓG(x, y⁰) · v ≥ inf_{y∈Y(x)} ∇ₓG(x, y) · v = d_H g(x; v);

hence x ↦ d_H g(x; v) is lsc.

We now give two other versions of this theorem where the strong differentiability is relaxed: the concave case (x ↦ G(x, y) concave) in section 5.4.1 and the semidifferentiable case in section 5.4.2. The dual case (x ↦ G(x, y) convex) was treated by D. P. BERTSEKAS [1, Prop. A.22, p. 154] in 1971 (see also the book of D. P. BERTSEKAS [2, pp. 717–719] in 2004). It is important to note that the concave case is not a special case of the differentiable or the semidifferentiable case. It requires a technical result from R. T. ROCKAFELLAR [2, Thm. 24.5] (see also D. P. BERTSEKAS [1, Prop. A.21, p. 154], [2, pp. 710–711]). In sections 5.4.1 and 5.4.2 it will be important to keep in mind the following remark.

Remark 5.6. The assumption that the function x 7→ G(x, y) is Gateaux semidifferentiable is equivalent to the existence of dG(x, y; v, 0) for all v and its linearity. However, in general, the assumption that the function x 7→ G(x, y) is Hadamard semidifferentiable does not imply the stronger notion that dH G(x, y; v, 0) exists for all v since this would imply the continuity of the function (x, y) 7→ G(x, y) by Theorem 3.8.
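Before passing to the concave and semidifferentiable cases, the formula $d_H g(x; v) = \inf_{y\in Y(x)} \nabla_x G(x, y)\cdot v$ of the differentiable case can be observed numerically. The sketch below is a hypothetical finite-$Y$ instance (the set `Y`, the function `G`, and all helper names are illustrative, not from the text); it compares a one-sided difference quotient of $g$ with the infimum over the minimizer set $Y(x)$ at a point where $Y(x)$ contains two minimizers.

```python
# Numerical sketch of  d_H g(x; v) = min_{y in Y(x)} grad_x G(x, y) . v
# for g(x) = min_{y in Y} G(x, y) with a finite index set Y (hypothetical example).
Y = [-1.0, 0.0, 1.0]                      # finite compact index set

def G(x, y):                              # smooth in x for each y
    return (x - y) ** 2 + y ** 2

def g(x):
    return min(G(x, y) for y in Y)

def Y_of(x, tol=1e-12):                   # set of minimizers Y(x)
    gx = g(x)
    return [y for y in Y if G(x, y) <= gx + tol]

def dG_x(x, y):                           # exact partial derivative in x
    return 2.0 * (x - y)

def danskin(x, v):                        # right-hand side of the formula
    return min(dG_x(x, y) * v for y in Y_of(x))

def quotient(x, v, t=1e-7):               # one-sided difference quotient of g
    return (g(x + t * v) - g(x)) / t
```

At $x = 1$ the set $Y(1) = \{0, 1\}$ has two elements, and the quotient matches $\min(2v, 0)$, a genuinely nonlinear (but superlinear) function of $v$.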

5.4.1 Concave Case

Recall that, for a family $x \mapsto G_y(x) \stackrel{\text{def}}{=} G(x, y): \mathbb{R}^n \to \mathbb{R}\cup\{-\infty\}$ of concave usc functions indexed by $y \in Y$, the lower envelope $x \mapsto g(x) = \inf_{y\in Y} G_y(x): \mathbb{R}^n \to \mathbb{R}\cup\{-\infty\}$ is concave usc. Since this function can be $-\infty$ at some points, we can only speak of a semidifferential at a point $x \in \operatorname{int}(\operatorname{dom} g)$. Since $g$ is concave in a convex neighborhood of $x$, it is continuous and Hadamard semidifferentiable at each point of this neighborhood. This amounts to starting with a family of functions $G : U \times Y \to \mathbb{R}$ for an open convex set $U$ in $\mathbb{R}^n$, such as, for instance, a ball $B_\rho(x)$ of radius $\rho > 0$. We go back to the proof of D. P. BERTSEKAS [2, pp. 717–719] in the convex case, relaxing the continuity assumption on $G$ to upper semicontinuity (lower semicontinuity in the concave case). However, we state the theorem in the concave case.

Theorem 5.8 (D. P. BERTSEKAS [1], 1971). Let $Y \subset \mathbb{R}^m$ be nonempty compact, $U \subset \mathbb{R}^n$ be nonempty open and convex, and $G : U \times Y \to \mathbb{R}$ be an lsc function such that
\[
\forall y \in Y,\quad x \mapsto G(x, y): U \to \mathbb{R} \text{ is concave.} \tag{5.40}
\]
For each $x \in U$ consider the function

\[
g(x) \stackrel{\text{def}}{=} \inf_{y\in Y} G(x, y), \qquad Y(x) \stackrel{\text{def}}{=} \{y \in Y : G(x, y) = g(x)\}. \tag{5.41}
\]


Then,

(i) for each $x \in U$, $Y(x)$ is nonempty compact, $U \subset \operatorname{dom} g$, $g : U \to \mathbb{R}$ is concave and locally Lipschitz continuous in $U$, and, for all $(x, v) \in U \times \mathbb{R}^n$, $d_H g(x; v)$ exists and $v \mapsto d_H g(x; v)$ is superlinear and continuous;

(ii) for all $x \in U$ and $v \in \mathbb{R}^n$, there exists $y^0 \in Y(x)$ such that
\[
d_H g(x; v) = \inf_{y\in Y(x)} dG(x, y; v, 0) = dG(x, y^0; v, 0). \tag{5.42}
\]

We need the following theorem for convex functions, which will be applied to concave functions. It turns out that the proof of R. T. ROCKAFELLAR can be slightly modified to obtain not only an inequality between semidifferentials, but an inequality between the stronger Hadamard semidifferentials.

Theorem 5.9 (R. T. ROCKAFELLAR [2, Thm. 24.5]). Let $U \subset \mathbb{R}^n$ be open convex, $F_k : U \to \mathbb{R}$, $k \ge 1$, and $F : U \to \mathbb{R}$ be convex (resp., concave) functions on $U$ such that, for all $x \in U$ and all sequences $\{x_k\}$ with $x_k \to x$, we have $\lim_{k\to\infty} F_k(x_k) = F(x)$. Then, for all $x \in U$ and $v \in \mathbb{R}^n$ and for all sequences $\{x_k\}$ and $\{v_k\}$ such that $x_k \to x$ and $v_k \to v$,

\[
\limsup_{k\to\infty} d_H F_k(x_k; v_k) \le d_H F(x; v) \qquad \Big(\text{resp., } \liminf_{k\to\infty} d_H F_k(x_k; v_k) \ge d_H F(x; v)\Big).
\]

Proof. We go back to the proof of R. T. ROCKAFELLAR, replacing the simple semidifferential by the stronger notion of the Hadamard semidifferential. By Theorems 4.4, 4.8, 4.7, and 3.6(i), any convex function $f : U \to \mathbb{R}$ on a convex open subset $U$ of $\mathbb{R}^n$ is locally Lipschitz continuous and Hadamard semidifferentiable in $U$, and, for all $x \in U$ and $v \in \mathbb{R}^n$, the differential quotient $(f(x + tv) - f(x))/t$, $t > 0$, is monotone increasing as a function of $t$, and
\[
d_H f(x; v) = df(x; v) = \inf_{t\searrow 0} \frac{f(x + tv) - f(x)}{t}.
\]

Let $\mu$ be such that $d_H F(x; v) = dF(x; v) < \mu$. There exists $\bar t > 0$ such that
\[
\forall t,\ 0 < t \le \bar t, \qquad \frac{F(x + tv) - F(x)}{t} < \mu.
\]

Since $x_k + tv_k \to x + tv$ and $x_k \to x$, we get, by assumption,
\[
\frac{F_k(x_k + tv_k) - F_k(x_k)}{t} \to \frac{F(x + tv) - F(x)}{t}.
\]
Therefore, there exists $K$ such that
\[
\forall k > K, \qquad \frac{F_k(x_k + tv_k) - F_k(x_k)}{t} < \mu.
\]

By convexity of Fk,

\[
F_k(x_k + tv_k) - F_k(x_k) \ge d_H F_k(x_k; tv_k) = t\, d_H F_k(x_k; v_k)
\]

\[
\Rightarrow\ \limsup_{k\to\infty} d_H F_k(x_k; v_k) \le \limsup_{k\to\infty} \frac{F_k(x_k + tv_k) - F_k(x_k)}{t} \le \mu.
\]

Since this last inequality is true for all µ such that dH F (x; v) < µ, let µ go to dH F (x; v):

\[
\limsup_{k\to\infty} d_H F_k(x_k; v_k) \le \limsup_{k\to\infty} \frac{F_k(x_k + tv_k) - F_k(x_k)}{t} \le d_H F(x; v).
\]
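The key fact used in this proof, that for a convex $f$ the quotient $t \mapsto (f(x+tv) - f(x))/t$ is nondecreasing in $t$, so that the semidifferential is the infimum over $t > 0$, can be observed numerically; the function below is a hypothetical convex example, not taken from the text.

```python
# For a convex f, t -> (f(x + t*v) - f(x))/t is nondecreasing in t, so the
# semidifferential is the infimum over t > 0.  Hypothetical convex example.
def f(x):
    return abs(x) + x * x          # convex; df(0; 1) = 1

def quotient(x, v, t):
    return (f(x + t * v) - f(x)) / t

ts = [10.0 ** (-k) for k in range(1, 8)]        # t = 1e-1, ..., 1e-7
qs = [quotient(0.0, 1.0, t) for t in ts]        # here equal to 1 + t, decreasing
```

As $t \searrow 0$ the quotients decrease monotonically toward $df(0; 1) = 1$.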


Proof of Theorem 5.8. (i) Since $Y$ is compact and, for $x \in U$, $y \mapsto G(x, y): Y \to \mathbb{R}$ is lsc, by Theorem 5.1(ii) of Chapter 2, there exists $\hat y \in Y$ such that

\[
g(x) = G(x, \hat y) \ \Rightarrow\ Y(x) \ne \emptyset \text{ and } x \in \operatorname{dom} g \ \Rightarrow\ U \subset \operatorname{dom} g.
\]

By concavity of $x \mapsto G(x, y)$, for all $\lambda \in [0, 1]$ and $x_1, x_2 \in U$,

\[
G(\lambda x_1 + (1-\lambda)x_2, y) \ge \lambda G(x_1, y) + (1-\lambda) G(x_2, y) \ge \lambda g(x_1) + (1-\lambda) g(x_2)
\]

\[
\Rightarrow\ g(\lambda x_1 + (1-\lambda)x_2) \ge \lambda g(x_1) + (1-\lambda) g(x_2).
\]

Therefore, the function $g : U \to \mathbb{R}$ is concave and locally Lipschitz in $U$ and, consequently, Hadamard semidifferentiable; that is, $d_H g(x; v)$ exists for all $x \in U$ and $v \in \mathbb{R}^n$.

(ii) For $t > 0$, pick $y_t \in Y(x + tv)$ and $y_0 \in Y(x)$:
\[
\frac{g(x + tv) - g(x)}{t} = \frac{G(x + tv, y_t) - G(x, y_0)}{t} \le \frac{G(x + tv, y_0) - G(x, y_0)}{t}
\]
\[
\Rightarrow\ dg(x; v) \le dG(x, y_0; v, 0) \ \Rightarrow\ dg(x; v) \le \inf_{y_0\in Y(x)} dG(x, y_0; v, 0).
\]

In the other direction, given a sequence $\{t_n\}$, $t_n \searrow 0$, choose arbitrary points $y_n \in Y(x + t_nv)$. By compactness of $Y$, there exist $y^0 \in Y$ and a subsequence, still denoted $\{y_n\}$, such that $y_n \to y^0$ as $t_n \searrow 0$. Therefore, by the lower semicontinuity of $G$,

\[
\forall y \in Y,\quad G(x + t_nv, y) \ge G(x + t_nv, y_n) \ \Rightarrow\ G(x, y) \ge G(x, y^0) \text{ and } y^0 \in Y(x).
\]
On the other hand, since $g$ is concave,

\[
dg(x; v) \ge \frac{g(x + t_nv) - g(x)}{t_n} = \frac{G(x + t_nv, y_n) - G(x, y^0)}{t_n} \ge \frac{G(x + t_nv, y_n) - G(x, y_n)}{t_n}.
\]
By concavity of $G$,

\[
\frac{G(x, y_n) - G(x + t_nv, y_n)}{t_n} \le dG(x + t_nv, y_n; -v, 0) \le -dG(x + t_nv, y_n; v, 0)
\]
\[
\Rightarrow\ dg(x; v) \ge dG(x + t_nv, y_n; v, 0).
\]

We now apply Theorem 5.9 to the following concave functions Fn and F :

\[
F_n(x) \stackrel{\text{def}}{=} G(x + t_nv, y_n), \qquad dF_n(x; v) = dG(x + t_nv, y_n; v, 0),
\]
\[
F(x) \stackrel{\text{def}}{=} G(x, y^0), \qquad dF(x; v) = dG(x, y^0; v, 0).
\]

By assumption, G is lsc. Therefore, for all sequences xn → x

\[
\liminf_{n\to\infty} G(x_n + t_nv, y_n) \ge G(x, y^0),
\]
and, on the other hand, by continuity of $x' \mapsto G(x', y^0)$,

\[
G(x_n + t_nv, y_n) \le G(x_n + t_nv, y^0)
\ \Rightarrow\ \limsup_{n\to\infty} G(x_n + t_nv, y_n) \le \lim_{n\to\infty} G(x_n + t_nv, y^0) = G(x, y^0).
\]


Therefore, we have for all sequences xn → x

\[
F_n(x_n) = G(x_n + t_nv, y_n) \to G(x, y^0) = F(x)
\]
\[
\Rightarrow\ dg(x; v) \ge \liminf_{n\to\infty} dF_n(x; v) \ge dF(x; v) = dG(x, y^0; v, 0)
\]
\[
\Rightarrow\ \exists y^0 \in Y(x) \text{ such that } dg(x; v) \ge dG(x, y^0; v, 0).
\]

Putting everything together,

\[
\inf_{y_0\in Y(x)} dG(x, y_0; v, 0) \ge dg(x; v) \ge dG(x, y^0; v, 0) \ge \inf_{y_0\in Y(x)} dG(x, y_0; v, 0)
\]
and we have equality everywhere, that is, expression (5.42): $d_H g(x; v) = dg(x; v)$. Therefore, the function $v \mapsto d_H g(x; v)$ is continuous. Since, by assumption, for all $y \in Y$, the function $x \mapsto G(x, y): U \to \mathbb{R}$ is concave, the function $v \mapsto d_H g(x; v)$ is concave by Theorem 4.6 and, in particular, superlinear.
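Formula (5.42) in the concave case can also be illustrated numerically. The sketch below uses a hypothetical two-point index set (the names `Y`, `G`, and the helpers are ours, not from the text); at a point where $Y(x)$ has two minimizers, the one-sided quotient of $g$ matches the infimum of $dG(x, y; v, 0)$ over $Y(x)$.

```python
# Concave case: g(x) = min_y G(x, y) with x -> G(x, y) concave for each y.
# Hypothetical example with a two-point index set Y.
Y = [0.0, 1.0]

def G(x, y):
    return -(x - y) ** 2          # concave in x for each y

def g(x):
    return min(G(x, y) for y in Y)

def Y_of(x, tol=1e-12):           # set of minimizers Y(x)
    gx = g(x)
    return [y for y in Y if G(x, y) <= gx + tol]

def dG_x(x, y, v):                # semidifferential of x -> G(x, y): linear here
    return -2.0 * (x - y) * v

def formula(x, v):                # right-hand side of (5.42)
    return min(dG_x(x, y, v) for y in Y_of(x))

def quotient(x, v, t=1e-7):       # one-sided difference quotient of g
    return (g(x + t * v) - g(x)) / t
```

At $x = 1/2$ both $y = 0$ and $y = 1$ are minimizers, and $dg(1/2; v) = \min(-v, v) = -|v|$, concave and superlinear in $v$ as the theorem asserts.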

5.4.2 Semidifferentiable Case

Theorem 5.10. Let $Y$, $\emptyset \ne Y \subset \mathbb{R}^m$, be compact and $G : \mathbb{R}^n \times Y \to \mathbb{R}$ be an lsc function such that $dG(x, y; v, 0)$ exists for all $(x, v, y) \in \mathbb{R}^n \times \mathbb{R}^n \times Y$. Consider the function

\[
g(x) \stackrel{\text{def}}{=} \inf_{y\in Y} G(x, y), \qquad Y(x) \stackrel{\text{def}}{=} \{y \in Y : G(x, y) = g(x)\}
\]

for all $(x, v) \in \mathbb{R}^n \times \mathbb{R}^n$. Assume that
\[
\forall y \in Y(x), \quad \liminf_{\substack{z\to y\\ t\searrow 0}} \frac{G(x + tv, z) - G(x, z)}{t} \ge dG(x, y; v, 0). \tag{5.43}
\]
(i) Then we have the following properties:

(a) for each $x \in \mathbb{R}^n$, $Y(x)$ is nonempty and compact and $\operatorname{dom} g = \mathbb{R}^n$;

(b) for all $(x, v) \in \mathbb{R}^n \times \mathbb{R}^n$, $dg(x; v)$ exists and $\exists y^0 \in Y(x)$ such that
\[
dg(x; v) = \inf_{y\in Y(x)} dG(x, y; v, 0) = dG(x, y^0; v, 0). \tag{5.44}
\]

If, in addition, $v \mapsto dG(x, y; v, 0)$ is superlinear, then $v \mapsto dg(x; v): \mathbb{R}^n \to \mathbb{R}$ is superlinear (and continuous).

(ii) If, for all $y \in Y$, $x \mapsto G(x, y)$ is continuous, then $g$ is continuous in $\mathbb{R}^n$.

(iii) If, for all $y \in Y$, $x \mapsto G(x, y)$ is Hadamard semidifferentiable and, for all $(x, v) \in \mathbb{R}^n \times \mathbb{R}^n$,
\[
\forall y \in Y(x), \quad \liminf_{\substack{(w,z)\to(v,y)\\ t\searrow 0}} \frac{G(x + tw, z) - G(x, z)}{t} \ge dG(x, y; v, 0), \tag{5.45}
\]

then, for all $(x, v) \in \mathbb{R}^n \times \mathbb{R}^n$, $d_H g(x; v)$ exists. If, in addition, the function $v \mapsto dG(x, y; v, 0)$ is superlinear, then $v \mapsto d_H g(x; v): \mathbb{R}^n \to \mathbb{R}$ is superlinear (and continuous).

(iv) If, in addition, $v \mapsto dG(x, y; v, 0)$ is linear and $Y(x)$ is a singleton, then $g$ is Gateaux differentiable in (i) and (ii) and Hadamard (Fréchet) differentiable in (iii).


Remark 5.7. Part (iii) is the most interesting since g is Hadamard semidifferentiable and the chain rule can be used.

Proof of Theorem 5.10. (i) (a) Since $Y$ is compact and the function $y \mapsto G(x, y): Y \to \mathbb{R}$ is lsc, by Theorem 5.1(ii) of Chapter 2, there exists $\hat y \in Y$ such that

\[
g(x) = G(x, \hat y) \ \Rightarrow\ Y(x) \ne \emptyset \text{ and } \operatorname{dom} g = \mathbb{R}^n.
\]

(b) For $t > 0$ and arbitrary $y_t \in Y(x + tv)$ and $y_0 \in Y(x)$,

\[
\frac{g(x + tv) - g(x)}{t} = \frac{G(x + tv, y_t) - G(x, y_0)}{t} \le \frac{G(x + tv, y_0) - G(x, y_0)}{t}
\]
\[
\Rightarrow\ \forall y_0 \in Y(x), \quad \overline{d}g \stackrel{\text{def}}{=} \limsup_{t\searrow 0} \frac{g(x + tv) - g(x)}{t} \le dG(x, y_0; v, 0)
\]

\[
\Rightarrow\ \overline{d}g \le \inf_{y_0\in Y(x)} dG(x, y_0; v, 0).
\]

In the other direction, let $\{t_n\}$, $t_n \searrow 0$, be a sequence such that

\[
\lim_{n\to\infty} \frac{g(x + t_nv) - g(x)}{t_n} = \underline{d}g \stackrel{\text{def}}{=} \liminf_{t\searrow 0} \frac{g(x + tv) - g(x)}{t}.
\]

Let $y_n \in Y(x + t_nv)$. By compactness of $Y$, there exist $y^0 \in Y$ and a subsequence, still denoted $\{y_n\}$, such that $y_n \to y^0$ as $t_n \searrow 0$. Then, by definition of the infimum,

\[
\forall y \in Y, \quad G(x + t_nv, y) \ge G(x + t_nv, y_n).
\]

By continuity of $t \mapsto G(x + tv, y)$ at $0^+$ and the lower semicontinuity of $G$,

\[
G(x, y) = \lim_{n\to\infty} G(x + t_nv, y) \ge \liminf_{n\to\infty} G(x + t_nv, y_n) \ge G(x, y^0)
\]
\[
\Rightarrow\ \forall y \in Y, \quad G(x, y) \ge G(x, y^0) \ \Rightarrow\ y^0 \in Y(x).
\]

On the other hand, by assumption (5.43),

\[
\frac{g(x + t_nv) - g(x)}{t_n} = \frac{G(x + t_nv, y_n) - G(x, y^0)}{t_n} \ge \frac{G(x + t_nv, y_n) - G(x, y_n)}{t_n}
\]
\[
\Rightarrow\ \underline{d}g \ge \liminf_{n\to\infty} \frac{G(x + t_nv, y_n) - G(x, y_n)}{t_n} \ge dG(x, y^0; v, 0).
\]
Therefore,

\[
\inf_{y_0\in Y(x)} dG(x, y_0; v, 0) \ge \overline{d}g \ge \underline{d}g \ge dG(x, y^0; v, 0) \ge \inf_{y_0\in Y(x)} dG(x, y_0; v, 0)
\]
and we have equality everywhere. Hence, $dg(x; v)$ exists and there exists $y^0 \in Y(x)$ such that

\[
dg(x; v) = \inf_{y_0\in Y(x)} dG(x, y_0; v, 0) = dG(x, y^0; v, 0).
\]
The superlinearity is a consequence of formula (5.44).

(ii) (a) To prove the continuity of $g$, consider a sequence $x_n \to x$. Since $Y(x_n) \ne \emptyset$, there exists $y_n \in Y(x_n)$. By compactness of $Y$, there exist $y^0 \in Y$ and a subsequence $\{y_{n_k}\}$ of $\{y_n\}$


such that $y_{n_k} \to y^0$ and $g(x_{n_k}) = G(x_{n_k}, y_{n_k})$. By the lower semicontinuity of $G$,

\[
\liminf_{k\to\infty} g(x_{n_k}) = \liminf_{k\to\infty} G(x_{n_k}, y_{n_k}) \ge G(x, y^0);
\]

for all $y \in Y$, $G(x_{n_k}, y) \ge g(x_{n_k}) = G(x_{n_k}, y_{n_k})$ and, since, by assumption, the function $x \mapsto G(x, y)$ is continuous with respect to $x$,

\[
G(x, y) = \lim_{k\to\infty} G(x_{n_k}, y) \ge \liminf_{k\to\infty} g(x_{n_k}) = \liminf_{k\to\infty} G(x_{n_k}, y_{n_k}) \ge G(x, y^0)
\]
\[
\Rightarrow\ y^0 \in Y(x) \ \text{and}\ \liminf_{k\to\infty} g(x_{n_k}) \ge G(x, y^0) = g(x)
\]
\[
\Rightarrow\ g(x_{n_k}) \le G(x_{n_k}, y^0) \ \Rightarrow\ \limsup_{k\to\infty} g(x_{n_k}) \le \lim_{k\to\infty} G(x_{n_k}, y^0) = G(x, y^0) = g(x)
\]
\[
\Rightarrow\ \lim_{k\to\infty} g(x_{n_k}) = G(x, y^0) = g(x).
\]

Since the limit $g(x)$ is independent of the choice of the subsequence $\{x_{n_k}\}$, the whole sequence converges and $g$ is continuous at $x$.

(iii) Given $t > 0$ and $w \to v$, for all $y_t \in Y(x + tw)$ and $y_0 \in Y(x)$,
\[
\frac{g(x + tw) - g(x)}{t} = \frac{G(x + tw, y_t) - G(x, y_0)}{t} \le \frac{G(x + tw, y_0) - G(x, y_0)}{t}
\]
\[
\Rightarrow\ \forall y_0 \in Y(x), \quad \overline{d}g \stackrel{\text{def}}{=} \limsup_{\substack{w\to v\\ t\searrow 0}} \frac{g(x + tw) - g(x)}{t} \le dG(x, y_0; v, 0)
\]

\[
\Rightarrow\ \overline{d}g \le \inf_{y_0\in Y(x)} dG(x, y_0; v, 0).
\]

In the other direction, let $\{t_n\}$, $t_n \searrow 0$, and $v_n \to v$ be sequences such that
\[
\lim_{n\to\infty} \frac{g(x + t_nv_n) - g(x)}{t_n} = \underline{d}g \stackrel{\text{def}}{=} \liminf_{\substack{w\to v\\ t\searrow 0}} \frac{g(x + tw) - g(x)}{t}.
\]

Let $y_n \in Y(x + t_nv_n)$. By compactness of $Y$, there exist $y^0 \in Y$ and a subsequence, still denoted $\{y_n\}$, such that $y_n \to y^0$ as $t_n \searrow 0$. Then,

\[
\forall y \in Y, \quad G(x + t_nv_n, y) \ge G(x + t_nv_n, y_n).
\]

Since, for all $y \in Y$, $x \mapsto G(x, y)$ is Hadamard semidifferentiable, $(t, w) \mapsto G(x + tw, y)$ is continuous at $(0^+, v)$, and since $G$ is lsc,

\[
G(x, y) = \lim_{n\to\infty} G(x + t_nv_n, y) \ge \liminf_{n\to\infty} G(x + t_nv_n, y_n) \ge G(x, y^0)
\]
\[
\Rightarrow\ \forall y \in Y, \quad G(x, y) \ge G(x, y^0) \ \Rightarrow\ y^0 \in Y(x).
\]

On the other hand, by assumption (5.45),
\[
\frac{g(x + t_nv_n) - g(x)}{t_n} = \frac{G(x + t_nv_n, y_n) - G(x, y^0)}{t_n} \ge \frac{G(x + t_nv_n, y_n) - G(x, y_n)}{t_n}
\]
\[
\Rightarrow\ \underline{d}g \ge \liminf_{n\to\infty} \frac{G(x + t_nv_n, y_n) - G(x, y_n)}{t_n} \ge dG(x, y^0; v, 0).
\]
Finally,

\[
\inf_{y_0\in Y(x)} dG(x, y_0; v, 0) \ge \overline{d}g \ge \underline{d}g \ge dG(x, y^0; v, 0) \ge \inf_{y_0\in Y(x)} dG(x, y_0; v, 0)
\]


and equality holds everywhere. Therefore, $d_H g(x; v)$ exists and there exists $y^0 \in Y(x)$ such that

\[
d_H g(x; v) = \inf_{y_0\in Y(x)} dG(x, y_0; v, 0) = dG(x, y^0; v, 0).
\]

Finally, since $d_H g(x; v)$ exists, $g$ is continuous at $x$. The superlinearity is a direct consequence of formula (5.44).

(iv) From (iii), $g$ is Hadamard semidifferentiable at $x$. Since $Y(x)$ is a singleton, from formula (5.44), the mapping $v \mapsto d_H g(x; v)$ is linear and hence $g$ is Hadamard differentiable at $x$.

6 Summary of Semidifferentiability and Differentiability

This section summarizes the various semidifferentials and differentials introduced in this chapter. The basic notions are the semidifferential $df(x; v)$ and the stronger Hadamard semidifferential $d_H f(x; v)$, which plays a central role since the rules of the classical differential calculus remain valid for that family of functions. With linearity with respect to $v$, we obtain the Gateaux and Hadamard differentials, respectively, and the notion of gradient.

$f$ semidifferentiable at $x$ in the direction $v$:
\[
df(x; v) \stackrel{\text{def}}{=} \lim_{t\searrow 0} \frac{f(x + tv) - f(x)}{t}
\]
$f$ Gateaux differentiable at $x$: $\forall v \in \mathbb{R}^n$, $df(x; v)$ exists and $v \mapsto df(x; v): \mathbb{R}^n \to \mathbb{R}$ is linear.

$f$ Hadamard semidifferentiable at $x$ in the direction $v$:
\[
d_H f(x; v) \stackrel{\text{def}}{=} \lim_{\substack{t\searrow 0\\ w\to v}} \frac{f(x + tw) - f(x)}{t}
\]
$f$ Hadamard differentiable at $x$: $\forall v \in \mathbb{R}^n$, $d_H f(x; v)$ exists and $v \mapsto d_H f(x; v): \mathbb{R}^n \to \mathbb{R}$ is linear.

Hadamard semidifferentiability has an equivalent geometric characterization in terms of paths.

$f$ Hadamard semidifferentiable at $x$ in the direction $v$ $\iff$ there exists $g(x, v) \in \mathbb{R}$ such that, for all $h : [0, \infty[\, \to \mathbb{R}^n$ with $h(0) = x$ and $dh(0; +1) = v$, $d(f \circ h)(0; +1)$ exists and $d(f \circ h)(0; +1) = g(x, v)$.

Hadamard differentiability coincides with Fréchet differentiability and also has an equivalent geometric characterization in terms of paths.

$f$ Fréchet differentiable at $x$: there exists $L(x): \mathbb{R}^n \to \mathbb{R}$ linear such that
\[
\lim_{v\to 0} \frac{f(x + v) - f(x) - L(x)v}{\|v\|} = 0
\]
$\iff$ $f$ Hadamard differentiable at $x$: there exists $L(x): \mathbb{R}^n \to \mathbb{R}$ linear such that, for all $h : \mathbb{R} \to \mathbb{R}^n$ with $h(0) = x$ and $h'(0)$ exists, $(f \circ h)'(0)$ exists and $(f \circ h)'(0) = L(x)\, h'(0)$.
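The path characterization can be observed numerically for a smooth (hence Hadamard, that is, Fréchet differentiable) function: along a curved path $h$, the derivative of $f \circ h$ at $0$ agrees with $\nabla f(h(0)) \cdot h'(0)$. The function and the path below are hypothetical, chosen only for illustration.

```python
# Path characterization of Hadamard (Frechet) differentiability:
# (f o h)'(0) = grad f(h(0)) . h'(0) along any path h with h'(0) defined.
def f(x, y):
    return x * x + 3.0 * y

def grad_f(x, y):
    return (2.0 * x, 3.0)

def h(t):                          # curved path with h(0) = (1, 0), h'(0) = (0, 2)
    return (1.0 + t * t, 2.0 * t + t ** 3)

def d_dt(phi, t=0.0, eps=1e-6):    # central-difference derivative
    return (phi(t + eps) - phi(t - eps)) / (2.0 * eps)

lhs = d_dt(lambda t: f(*h(t)))                              # (f o h)'(0)
gx, gy = grad_f(*h(0.0))
hp = (d_dt(lambda t: h(t)[0]), d_dt(lambda t: h(t)[1]))     # h'(0)
rhs = gx * hp[0] + gy * hp[1]                               # grad f . h'(0)
```

Here both sides equal $6$: the chain rule holds along the curved path even though $h$ is not affine.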

The above notions are sufficient for most problems of practical interest.

In going to arbitrary functions $f : \mathbb{R}^n \to \mathbb{R}$, $\operatorname{dom} f \ne \emptyset$, each of the two semidifferentials splits into two by replacing the limit by the liminf and the limsup, which can both take the value


+∞ or −∞.

\[
\underline{d}f(x; v) \stackrel{\text{def}}{=} \liminf_{t\searrow 0} \frac{f(x + tv) - f(x)}{t}
\qquad\qquad
\underline{d}_H f(x; v) \stackrel{\text{def}}{=} \liminf_{\substack{t\searrow 0\\ w\to v}} \frac{f(x + tw) - f(x)}{t}
\]
Dini lower semidifferential and Hadamard lower semidifferential at $x \in \operatorname{dom} f$ in the direction $v$.

\[
\overline{d}f(x; v) \stackrel{\text{def}}{=} \limsup_{t\searrow 0} \frac{f(x + tv) - f(x)}{t}
\qquad\qquad
\overline{d}_H f(x; v) \stackrel{\text{def}}{=} \limsup_{\substack{t\searrow 0\\ w\to v}} \frac{f(x + tw) - f(x)}{t}
\]
Dini upper semidifferential and Hadamard upper semidifferential at $x \in \operatorname{dom} f$ in the direction $v$.

\[
\underline{d}_H f(x; v) \le \underline{d}f(x; v) \le \overline{d}f(x; v) \le \overline{d}_H f(x; v)
\]

The next two notions necessitate that the function $f : \mathbb{R}^n \to \mathbb{R}$ be finite not only at the point $x \in \operatorname{dom} f$ but also in a neighborhood $V(x)$ of $x$, in order to make sense of terms of the form $f(y + tv) - f(y)$; that is, $x \in \operatorname{int}(\operatorname{dom} f)$.

\[
\underline{d}_C f(x; v) \stackrel{\text{def}}{=} \liminf_{\substack{t\searrow 0\\ y\to x}} \frac{f(y + tv) - f(y)}{t}
\qquad\qquad
\overline{d}_C f(x; v) \stackrel{\text{def}}{=} \limsup_{\substack{t\searrow 0\\ y\to x}} \frac{f(y + tv) - f(y)}{t}
\]
Clarke lower semidifferential and Clarke upper semidifferential at $x \in \operatorname{dom} f$ in the direction $v$.

When f is Lipschitzian at x ∈ dom f, the six lower/upper semidifferentials are finite and

\[
\underline{d}_C f(x; v) \le \underline{d}_H f(x; v) \le \underline{d}f(x; v) \le \overline{d}f(x; v) \le \overline{d}_H f(x; v) \le \overline{d}_C f(x; v)
\]
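These six quantities can be estimated for $f(x) = |x|$ at $x = 0$, $v = 1$, by sampling difference quotients over small grids (a crude illustration of the chain above, not a computation of the exact limits; the grids are chosen for convenience). In this one-dimensional Lipschitz example the Hadamard semidifferentials coincide with the Dini ones, so only the Dini and Clarke quotients are sampled.

```python
# Difference-quotient estimates for f(x) = |x| at x = 0 in direction v = 1.
# The Dini (and, here, Hadamard) semidifferentials are 1; the Clarke lower
# semidifferential drops to -1 because the base point y is allowed to move.
f = abs

ts = [10.0 ** (-k) for k in range(3, 7)]                        # t -> 0
ys = [s * 10.0 ** (-k) for k in range(3, 7) for s in (-1.0, 1.0)] + [0.0]

dini = [(f(0.0 + t) - f(0.0)) / t for t in ts]                  # base point fixed
d_lower, d_upper = min(dini), max(dini)

clarke = [(f(y + t) - f(y)) / t for t in ts for y in ys]        # base point y -> 0
dC_lower, dC_upper = min(clarke), max(clarke)
# sampled ordering:  dC_lower <= d_lower <= d_upper <= dC_upper
```

The sampled values respect the chain: the Clarke lower estimate reaches $-1$ (take $y = -t$), while all the other estimates equal $1$, consistent with $f$ being Lipschitzian with constant $1$.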

To finish, we have also included the intermediary notion of directional derivative:

\[
\lim_{t\to 0} \frac{f(x + tv) - f(x)}{t}
\qquad
f \text{ has a derivative in the direction } v \text{ at } x,
\]
which is of limited interest since it hides the more fundamental notion of the semidifferential and is not yet the Gateaux differential.

7 Exercises

Exercise 7.1. Consider the continuous function in $\mathbb{R}^2$ of Example 3.7 (Figure 3.5),
\[
f(x, y) \stackrel{\text{def}}{=} \frac{x^3}{x^2 + y^2} \ \text{ if } (x, y) \ne (0, 0), \qquad f(0, 0) \stackrel{\text{def}}{=} 0. \tag{7.1}
\]

For all paths $h(t) = (x(t), y(t))$, $t \in \mathbb{R}$, satisfying $h(0) = (x, y)$ and for which the derivative $h'(0) = (x'(0), y'(0))$ exists (tangent to the trajectory $h$ at $(x(0), y(0)) = (x, y)$), consider the new function $g(t) = f(h(t))$. Prove that $g'(0) = f(x'(0), y'(0))$.
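As a numerical sketch of this exercise at the origin (the paths below are chosen for illustration): $f$ is positively homogeneous of degree $1$, and along paths through $(0, 0)$ the one-sided derivative of $t \mapsto f(h(t))$ agrees with $f(h'(0))$.

```python
# Exercise 7.1 at the origin: f(x, y) = x^3/(x^2 + y^2), f(0, 0) = 0, is
# positively homogeneous of degree 1; along a path h with h(0) = (0, 0),
# the one-sided derivative of f o h at 0 equals f(h'(0)).
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x ** 3 / (x ** 2 + y ** 2)

def check(h, hp, eps=1e-6):
    g_prime = (f(*h(eps)) - f(*h(0.0))) / eps   # one-sided quotient of f o h at 0
    return g_prime, f(*hp)                      # hp = h'(0), supplied exactly

# two illustrative paths through the origin
g1, f1 = check(lambda t: (t, t * t), (1.0, 0.0))             # h'(0) = (1, 0)
g2, f2 = check(lambda t: (2.0 * t, t + t * t), (2.0, 1.0))   # h'(0) = (2, 1)
```

Both paths are genuinely curved, yet the quotient matches $f(h'(0))$, which is exactly the Hadamard semidifferentiability of $f$ at the origin with $d_H f(0; v) = f(v)$.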


Figure 3.10. Function of Exercise 7.2.

Exercise 7.2. Prove that the function (see Figure 3.10)
\[
f(x, y) \stackrel{\text{def}}{=} \begin{cases} \dfrac{xy^2}{x^2 + y^4} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases}
\]
is semidifferentiable at $(x, y) = (0, 0)$, but is neither Gateaux differentiable nor continuous at $(x, y) = (0, 0)$. Note the following properties:
\[
x < 0 \Rightarrow f(x, y) \le 0 \ \text{ and } \ x > 0 \Rightarrow f(x, y) \ge 0, \qquad f(-x, y) = -f(x, y) \ \text{ and } \ f(x, -y) = f(x, y).
\]

Exercise 7.3. Consider the function
\[
f(x, y) = \begin{cases} \dfrac{x^3y}{x^4 + y^2} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}
\]
Show that $f$

(i) is continuous in $\mathbb{R}^2$,

(ii) is Gateaux differentiable at $(0, 0)$,

(iii) but is not Fréchet differentiable at $(0, 0)$.
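A numerical sketch for Exercise 7.3 (the grids and step sizes below are chosen for illustration): every directional difference quotient of $f$ at $(0, 0)$ tends to $0$, yet along the parabola $y = x^2$ the Fréchet remainder ratio stays near $1/2$, ruling out Fréchet differentiability.

```python
# Exercise 7.3 sketch: f(x, y) = x^3 y/(x^4 + y^2) has all directional
# derivatives 0 at the origin, yet f(t, t^2)/||(t, t^2)|| -> 1/2 != 0,
# so f is not Frechet differentiable at (0, 0).
import math

def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x ** 3 * y / (x ** 4 + y ** 2)

def dir_deriv(v1, v2, t=1e-6):          # (f(tv) - f(0))/t
    return f(t * v1, t * v2) / t

# directional quotients over a fan of directions: all close to 0
dds = [dir_deriv(math.cos(a), math.sin(a))
       for a in [k * math.pi / 6 for k in range(12)]]

t = 1e-4
ratio = f(t, t * t) / math.hypot(t, t * t)   # Frechet remainder along y = x^2
```

The candidate Gateaux differential is the zero map, but the remainder ratio along the parabola does not vanish, so no linear map can serve as a Fréchet differential.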

Exercise 7.4. Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be linear (or an $n \times n$ matrix) and $b \in \mathbb{R}^n$ (or an $n$-vector). Define
\[
f(x) \stackrel{\text{def}}{=} \frac{1}{2}(Ax)\cdot x + b\cdot x, \qquad x \in \mathbb{R}^n.
\]


(i) Compute $df(x; v)$ (or the gradient $\nabla f(x)$) and $d^2f(x; v; w)$ (or the Hessian $Hf(x)$).

(ii) Give necessary and sufficient conditions on $A$, $b$ for the convexity of $f$.

(iii) Give necessary and sufficient conditions on $A$, $b$ for the strict convexity of $f$.

(iv) Are the functions $f$ associated with the following matrices and vectors convex?

\[
\text{(a)}\ A = \begin{pmatrix} 3 & 1 \\ -1 & 2 \end{pmatrix},\ b = \begin{pmatrix} -2 \\ 1 \end{pmatrix}
\qquad \text{and} \qquad
\text{(b)}\ A = \begin{pmatrix} 2 & 4 \\ 4 & 1 \end{pmatrix},\ b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
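Assuming the matrices of (a) and (b) as reconstructed above, part (iv) reduces to the positive semidefiniteness of the symmetric part $A_s = (A + A^{\mathsf T})/2$, since $(Ax)\cdot x = (A_sx)\cdot x$ and $b$ plays no role; for $2\times 2$ matrices the eigenvalues of $A_s$ are available in closed form. The helper below is an illustrative sketch, not code from the text.

```python
# Convexity of f(x) = (1/2)(Ax).x + b.x is governed by the symmetric part
# A_s = (A + A^T)/2: f is convex iff A_s is positive semidefinite.
import math

def sym_eigs_2x2(A):
    # eigenvalues of the symmetric part of a 2x2 matrix, in closed form
    p = A[0][0]; q = 0.5 * (A[0][1] + A[1][0]); r = A[1][1]
    m = 0.5 * (p + r)
    d = math.sqrt((0.5 * (p - r)) ** 2 + q * q)
    return (m - d, m + d)

A_a = [[3.0, 1.0], [-1.0, 2.0]]      # case (a)
A_b = [[2.0, 4.0], [4.0, 1.0]]      # case (b)

eigs_a = sym_eigs_2x2(A_a)           # symmetric part diag(3, 2): positive definite
eigs_b = sym_eigs_2x2(A_b)           # indefinite symmetric part
```

In case (a) the antisymmetric part cancels and the symmetric part is $\operatorname{diag}(3, 2)$, so $f$ is strictly convex; in case (b) the symmetric part has a negative eigenvalue, so $f$ is not convex.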

Exercise 7.5. Let $f(x) = |x|^n$, $x \in \mathbb{R}$, $n \ge 1$ an integer.

(i) Determine the values of $n$ for which $f$ is differentiable on $\mathbb{R}$.

(ii) Give the directional derivatives of $f$ at $x = 0$ as a function of $n \ge 1$, if they exist. Otherwise, give the semidifferentials.

(iii) Determine the values of $n \ge 1$ for which $f$ is convex on $\mathbb{R}$.

Exercise 7.6. Given an integer $n \ge 1$, define the function $f : \mathbb{R}^n \to \mathbb{R}$:
\[
f(x) \stackrel{\text{def}}{=} \sum_{i=1}^n |x_i|. \tag{7.2}
\]

(i) Prove that the function $f$ is convex and Lipschitzian on $\mathbb{R}^n$ and give the expression of its semidifferential $df(x; v)$.

(ii) For $n = 2$, compute $df(x; v)$ at the points $x = (1, 1), (0, 1), (1, 0), (0, 0)$ as a function of $v = (v_1, v_2)$. In which cases is $f$ Gateaux differentiable?
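For part (i), a candidate expression, stated here as an assumption to be proved in the exercise, is $df(x; v) = \sum_{x_i \ne 0} \operatorname{sign}(x_i)\, v_i + \sum_{x_i = 0} |v_i|$; the sketch below checks it against one-sided difference quotients at the four points of part (ii).

```python
# Candidate semidifferential of f(x) = sum_i |x_i| (to be proved in the exercise):
#   df(x; v) = sum_{x_i != 0} sign(x_i) v_i + sum_{x_i = 0} |v_i|.
def f(x):
    return sum(abs(xi) for xi in x)

def df_formula(x, v):
    return sum(abs(vi) if xi == 0.0 else (vi if xi > 0.0 else -vi)
               for xi, vi in zip(x, v))

def df_numeric(x, v, t=1e-8):     # one-sided difference quotient
    return (f([xi + t * vi for xi, vi in zip(x, v)]) - f(x)) / t

points = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
dirs = [[1.0, 2.0], [-1.0, 0.5], [-2.0, -3.0]]
max_err = max(abs(df_numeric(x, v) - df_formula(x, v))
              for x in points for v in dirs)
```

At $x = (1, 1)$ the expression is linear in $v$ (Gateaux differentiable), while the $|v_i|$ terms at the points with a zero component make it sublinear but nonlinear.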

Exercise 7.7. Show that the function $f(x) = \sin x + (1 + x)^2$ is convex on the interval $[0, 1]$.

Exercise 7.8. Consider the following function $f : \mathbb{R}^2 \to \mathbb{R}$:
\[
f(x, y) \stackrel{\text{def}}{=} \begin{cases} \dfrac{xy\,(x^2 - y^2)}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases} \tag{7.3}
\]

Show that

(i) $f$, $\partial_x f$, and $\partial_y f$ exist and are continuous in $\mathbb{R}^2$;

(ii) $\partial^2_{xy} f = \partial_x(\partial_y f)$ and $\partial^2_{yx} f = \partial_y(\partial_x f)$ exist in $\mathbb{R}^2$ and are continuous except at $(0, 0)$;

(iii) $\partial^2_{xy} f(0, 0) = 1$ and $\partial^2_{yx} f(0, 0) = -1$.

Recall the notation (3.68):
\[
\partial^2_{ji} f(x) = \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)(x) = d^2 f(x; e_i; e_j) = Hf(x)_{ij}.
\]
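The unequal mixed partials of part (iii) can be estimated by nested central differences (the two step sizes below are chosen for illustration; a single shared step would cancel along the diagonal and give a misleading $0$):

```python
# Exercise 7.8: nested central differences estimating the mixed second partials
# of f(x, y) = xy(x^2 - y^2)/(x^2 + y^2) at (0, 0); they disagree: +1 vs -1.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y * (x * x - y * y) / (x * x + y * y)

H, K = 1e-4, 1e-7                 # outer and inner difference steps

def fx(x, y):                     # central difference for the partial in x
    return (f(x + K, y) - f(x - K, y)) / (2.0 * K)

def fy(x, y):                     # central difference for the partial in y
    return (f(x, y + K) - f(x, y - K)) / (2.0 * K)

fxy = (fy(H, 0.0) - fy(-H, 0.0)) / (2.0 * H)   # d/dx (df/dy) at (0, 0)
fyx = (fx(0.0, H) - fx(0.0, -H)) / (2.0 * H)   # d/dy (df/dx) at (0, 0)
```

Since $\partial_y f(x, 0) = x$ and $\partial_x f(0, y) = -y$, the estimates approach $+1$ and $-1$, so the hypotheses of Schwarz's theorem must fail at the origin.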