The Ekeland Variational Principle, the Bishop-Phelps Theorem, and the Brøndsted-Rockafellar Theorem

Our aim is to prove the Ekeland Variational Principle, an abstract result that has found numerous applications in various fields. As an application, we provide a proof of the famous Bishop-Phelps Theorem and some related results.

Let us recall that the epigraph and the hypograph of a function $f\colon X\to[-\infty,+\infty]$ (where $X$ is a set) are the following subsets of $X\times\mathbb{R}$:
\[
\operatorname{epi} f=\{(x,t)\in X\times\mathbb{R} : t\ge f(x)\},\qquad
\operatorname{hyp} f=\{(x,t)\in X\times\mathbb{R} : t\le f(x)\}.
\]
Notice that, if $f>-\infty$ on $X$, then $\operatorname{epi} f\neq\emptyset$ if and only if $f$ is proper, that is, $f$ is finite at at least one point.

0.1. The Ekeland Variational Principle. In what follows, $(X,d)$ is a metric space. Given $\lambda>0$ and $(x,t)\in X\times\mathbb{R}$, we define the set

\[
K_\lambda(x,t)=\{(y,s) : s\le t-\lambda d(x,y)\}=\{(y,s) : \lambda d(x,y)\le t-s\}\subset X\times\mathbb{R}.
\]

Notice that $K_\lambda(x,t)=\operatorname{hyp}[t-\lambda d(x,\cdot)]$. Let us state some useful properties of these sets.

Lemma 0.1.
(a) $K_\lambda(x,t)$ is closed and contains $(x,t)$.
(b) If $(\bar x,\bar t)\in K_\lambda(x,t)$ then $K_\lambda(\bar x,\bar t)\subset K_\lambda(x,t)$.
(c) If $(x_n,t_n)\to(x,t)$ and $(x_{n+1},t_{n+1})\in K_\lambda(x_n,t_n)$ $(n\in\mathbb{N})$, then
\[
K_\lambda(x,t)=\bigcap_{n\in\mathbb{N}}K_\lambda(x_n,t_n).
\]

Proof. (a) is easy: the function $(y,s)\mapsto t-s-\lambda d(x,y)$ is continuous, and it vanishes at $(x,t)$.

To show (b), let $(\bar x,\bar t)\in K_\lambda(x,t)$ and $(y,s)\in K_\lambda(\bar x,\bar t)$. Then
\[
\lambda d(x,y)\le\lambda d(x,\bar x)+\lambda d(\bar x,y)\le(t-\bar t)+(\bar t-s)=t-s,
\]
and hence $(y,s)\in K_\lambda(x,t)$.

To prove (c), we shall use (a) and (b). Notice that $K_\lambda(x_{n+1},t_{n+1})\subset K_\lambda(x_n,t_n)$ for each $n$. Since $(x_m,t_m)\in K_\lambda(x_n,t_n)$ whenever $m\ge n$, we can pass to the limit for $m\to+\infty$ to obtain $(x,t)\in K_\lambda(x_n,t_n)$ for each $n$. Thus $K_\lambda(x,t)\subset\bigcap_n K_\lambda(x_n,t_n)$. To show the equality, take $(y,s)\notin K_\lambda(x,t)$, that is, $\lambda d(x,y)-t+s>0$. But then $\lambda d(x_n,y)-t_n+s>0$ for each sufficiently large $n$. It follows that $(y,s)\notin\bigcap_n K_\lambda(x_n,t_n)$. $\square$

The proof of the following proposition, which is the basis of Ekeland's principle, is somewhat "geometric", and its basic idea is quite simple. The reader is invited to draw a picture to grasp this geometric idea.
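As a small illustration of that picture, here is a sketch in the simplest setting. The choice $X=\mathbb{R}$ with $d(x,y)=|x-y|$ and the specific points are our own assumptions, not part of the text. In this case the sets $K_\lambda$ are downward cones, and Lemma 0.1(b) says that a cone whose vertex lies in another cone is contained in it.

```latex
% Illustrative special case (assumption: X = \mathbb{R}, d(x,y) = |x-y|).
\[
K_\lambda(x,t)=\{(y,s)\in\mathbb{R}^2 : s\le t-\lambda|y-x|\}
\]
% is a closed downward cone with vertex (x,t) and sides of slope \pm\lambda.

% Example with \lambda = 1: K_1(0,0) = \{(y,s) : s \le -|y|\}.
% The point (\bar x,\bar t) = (1,-2) lies in K_1(0,0), because -2 \le -|1|.
% For any (y,s) \in K_1(1,-2), using |y-1| \ge |y|-1:
\[
s\le -2-|y-1|\le -2-(|y|-1)=-1-|y|\le -|y| ,
\]
% so (y,s) \in K_1(0,0), i.e. K_1(1,-2) \subset K_1(0,0), as Lemma 0.1(b) predicts.
```

The same computation with a general vertex is exactly the triangle-inequality step in the proof of (b).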

Proposition 0.2. Let $(X,d)$ be a complete metric space, and let $P_{\mathbb{R}}$ denote the canonical projection of $X\times\mathbb{R}$ onto $\mathbb{R}$, that is, $P_{\mathbb{R}}(x,t)=t$. Let $F\subset X\times\mathbb{R}$ be a closed set such that $\inf P_{\mathbb{R}}(F)>-\infty$. Then for every $\lambda>0$ and $(x_0,t_0)\in F$ there exists $(\bar x,\bar t)\in F\cap K_\lambda(x_0,t_0)$ such that
\[
F\cap K_\lambda(\bar x,\bar t)=\{(\bar x,\bar t)\}.
\]

Proof. Fix a sequence $\{\varepsilon_n\}_{n\ge1}\subset(0,+\infty)$ such that $\sum_{n=1}^{\infty}\varepsilon_n<+\infty$. Let us inductively define sets $F_n$ and points $(x_n,t_n)$ in $X\times\mathbb{R}$ for $n\ge0$.

Case $n=0$. Define $F_0=F$, and recall that we already have $(x_0,t_0)\in F_0$.

Now let $n\ge0$, and assume that we have already defined $F_k$ and $(x_k,t_k)\in F_k$ for $0\le k\le n$. Put $F_{n+1}=F_n\cap K_\lambda(x_n,t_n)$, and fix a point $(x_{n+1},t_{n+1})\in F_{n+1}$ such that $t_{n+1}\le\inf P_{\mathbb{R}}(F_{n+1})+\varepsilon_{n+1}$.

Notice that $F_{n+1}\subset F_n$ and $F_{n+1}=F\cap K_\lambda(x_n,t_n)$ $(n\ge0)$ by Lemma 0.1(b). Moreover, since $(x_{n+1},t_{n+1})\in F_n\cap K_\lambda(x_n,t_n)$, we have that
\[
\lambda d(x_n,x_{n+1})\le t_n-t_{n+1}\le\bigl[\inf P_{\mathbb{R}}(F_n)+\varepsilon_n\bigr]-\inf P_{\mathbb{R}}(F_n)=\varepsilon_n\qquad(n\ge1).
\]
Our choice of $\{\varepsilon_n\}$ easily implies that the sequences $\{x_n\}$ and $\{t_n\}$ are Cauchy and hence convergent: $(x_n,t_n)\to(\bar x,\bar t)\in F_1$. Now,
\[
\bigcap_n F_n=\bigcap_n\bigl[F\cap K_\lambda(x_n,t_n)\bigr]=F\cap\bigcap_n K_\lambda(x_n,t_n)=F\cap K_\lambda(\bar x,\bar t)
\]
by Lemma 0.1(c). This set contains $(\bar x,\bar t)$, since $F$ is closed. To show that $F\cap K_\lambda(\bar x,\bar t)$ is a singleton, it suffices to show that $\operatorname{diam}F_n\to0$. (On $X\times\mathbb{R}$ we can consider, e.g., the metric $\hat d\bigl((x,t),(y,s)\bigr):=\max\{d(x,y),|t-s|\}$, or $\hat d\bigl((x,t),(y,s)\bigr):=d(x,y)+|t-s|$, which is clearly equivalent to the previous one.) For $n\ge2$ and $(y,s)\in F_n=F_{n-1}\cap K_\lambda(x_{n-1},t_{n-1})$, we have as above:
\[
\lambda d(x_{n-1},y)\le t_{n-1}-s\le\bigl[\inf P_{\mathbb{R}}(F_{n-1})+\varepsilon_{n-1}\bigr]-\inf P_{\mathbb{R}}(F_{n-1})=\varepsilon_{n-1}\to0.
\]
The same chain gives $0\le t_{n-1}-s\le\varepsilon_{n-1}$. It follows that $\operatorname{diam}(F_n)\to0$, and we are done. $\square$

We are ready for the famous "variational principle" due to I. Ekeland [J. Math. Anal. Appl. 47 (1974), 324–353]. It asserts that, under appropriate assumptions, if a point $x_0$ is an "almost minimum point" for a function $f$ then there exists a small Lipschitz perturbation of $f$ attaining its (strict global) minimum at a point "near" to $x_0$. As usual, "l.s.c." stands for "lower semicontinuous".

Theorem 0.3 (Ekeland Variational Principle). Let $(X,d)$ be a complete metric space, and $f\colon X\to(-\infty,+\infty]$ a proper l.s.c. function which is bounded below. Let $\varepsilon>0$, $\lambda>0$, $x_0\in X$, and $f(x_0)\le\inf f(X)+\varepsilon$. Then there exists $\bar x\in X$ such that

(a) $f(\bar x)\le f(x_0)$;
(b) $d(x_0,\bar x)\le\varepsilon/\lambda$;
(c) $f(\bar x)<f(x)+\lambda d(\bar x,x)$ for all $x\in X\setminus\{\bar x\}$ (that is, the function $f+\lambda d(\bar x,\cdot)$ attains its strict global minimum at $\bar x$).

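Proof. The set $F:=\operatorname{epi}f$ is a nonempty closed set in $X\times\mathbb{R}$ (closed because $f$ is l.s.c.), $(x_0,f(x_0))\in F$, $f(x_0)\le\inf P_{\mathbb{R}}(F)+\varepsilon$, and the infimum is finite. By Proposition 0.2, there is $(\bar x,\bar t)\in F\cap K_\lambda(x_0,f(x_0))$ such that $F\cap K_\lambda(\bar x,\bar t)=\{(\bar x,\bar t)\}$. Since $(\bar x,f(\bar x))\in F\cap K_\lambda(\bar x,\bar t)$, we necessarily have $\bar t=f(\bar x)$. Moreover, since $(\bar x,f(\bar x))\in K_\lambda(x_0,f(x_0))$, we have
\[
\lambda d(x_0,\bar x)\le f(x_0)-f(\bar x)\le\bigl[\inf f(X)+\varepsilon\bigr]-\inf f(X)=\varepsilon,
\]
which implies (a) and (b). Finally, if $x\neq\bar x$ then $(x,f(x))\notin K_\lambda(\bar x,f(\bar x))$, that is, $\lambda d(\bar x,x)>f(\bar x)-f(x)$, which gives (c). $\square$

Remark 0.4. Notice that for $\lambda=\sqrt{\varepsilon}$, Theorem 0.3(b,c) become
\[
d(x_0,\bar x)\le\sqrt{\varepsilon}\qquad\text{and}\qquad f(\bar x)<f(x)+\sqrt{\varepsilon}\,d(\bar x,x)\ \text{ whenever } x\neq\bar x.
\]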
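To see the three conclusions of Theorem 0.3 at work, the following sketch treats a function whose infimum is not attained. The function $f(x)=e^x$ on $X=\mathbb{R}$ and the data $x_0=0$, $\varepsilon=1$, $0<\lambda\le1$ are our own illustrative choices.

```latex
% Illustrative example (assumptions: X = \mathbb{R}, f(x) = e^x, x_0 = 0,
% \varepsilon = 1, 0 < \lambda \le 1). Here \inf f = 0 is not attained,
% and f(x_0) = 1 \le \inf f + \varepsilon.

% For a candidate \bar x, put \varphi(x) = e^x + \lambda|x - \bar x|.
% If x > \bar x, then \varphi(x) > e^{\bar x} = \varphi(\bar x) automatically.
% If x < \bar x, then \varphi'(x) = e^x - \lambda, which is negative on the
% whole interval (-\infty,\bar x) exactly when \bar x \le \ln\lambda. Hence
\[
\text{(c) holds at } \bar x \iff \bar x\le\ln\lambda .
\]
% Condition (a) reads \bar x \le 0 and condition (b) reads |\bar x| \le 1/\lambda, so
\[
\text{(a), (b), (c) hold simultaneously} \iff -\frac{1}{\lambda}\le\bar x\le\ln\lambda .
\]
% This interval is nonempty because \ln(1/\lambda) \le 1/\lambda - 1 < 1/\lambda.
```

The example also shows the trade-off in the choice of $\lambda$: a smaller $\lambda$ gives a flatter perturbation in (c) but pushes the admissible points $\bar x$ farther from $x_0$.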

0.2. The Bishop-Phelps Theorem. The classical Bishop-Phelps Theorem asserts that if $X$ is a (real) Banach space then the elements of $X^*$ that attain their norm are dense in $X^*$. We shall prove a more general version of this theorem, where the unit ball of $X$ is replaced by an arbitrary closed, convex, proper subset of $X$.

We shall use the following standard terminology. If $C$ is a nonempty set in a normed space $X$, $x\in C$, $x^*\in X^*\setminus\{0\}$ and $x^*(x)=\sup x^*(C)$, then we say that $x$ is a support point of $C$ and $x^*$ is a support functional for $C$. Notice that each support point belongs to the boundary $\partial C$ of $C$, and all strictly positive multiples of a support functional are again support functionals. Moreover, the Hahn-Banach Theorem immediately gives that if $C$ is a closed convex set with nonempty interior (in some topological vector space) then each point of $\partial C$ is a support point of $C$.

The main tool for the Bishop-Phelps Theorem will be the following auxiliary theorem. In the proof we shall use the easy and well-known fact that a closed hyperplane in $X\times\mathbb{R}$ that strictly separates two distinct points having the same first component coincides with the graph of a continuous affine function on $X$.

Theorem 0.5. Let $X$ be a Banach space, $C\subset X$ a nonempty, closed, convex set. Let $\varepsilon>0$, $x_0\in C$ and $x_0^*\in X^*\setminus\{0\}$ be such that
\[
\inf x_0^*(C)>-\infty\qquad\text{and}\qquad x_0^*(x_0)\le\inf x_0^*(C)+\varepsilon.
\]
Then for each $0<\lambda<\|x_0^*\|$ there exist $x_1\in C$ and $x_1^*\in X^*\setminus\{0\}$ such that
\[
\|x_1-x_0\|\le\frac{\varepsilon}{\lambda},\qquad\|x_1^*-x_0^*\|\le\lambda\qquad\text{and}\qquad x_1^*(x_1)=\inf x_1^*(C).
\]

Proof. The function
\[
f(x)=\begin{cases}x_0^*(x) & \text{if } x\in C,\\ +\infty & \text{otherwise,}\end{cases}
\]
is a proper, convex, l.s.c. function on $X$ (since $\operatorname{epi}f=(\operatorname{epi}x_0^*)\cap(C\times\mathbb{R})$), and it satisfies $f(x_0)\le\inf f(X)+\varepsilon$. By the Ekeland Variational Principle, there exists $x_1\in X$ such that $f(x_1)\le f(x_0)$, $\|x_1-x_0\|\le\varepsilon/\lambda$, and $f(x_1)\le f(x)+\lambda\|x-x_1\|$ for all $x\in X$. The first inequality gives that $x_1\in C$, while the last one can be rewritten in the form

\[
g(x):=f(x)-f(x_1)\ge-\lambda\|x-x_1\|=:h(x),\qquad x\in X.
\]

Since $g$ is proper, convex and l.s.c., $h$ is finite, concave and continuous, and $g(x_1)=h(x_1)=0$, we can separate the epigraph $\operatorname{epi}g$ and the hypograph $\operatorname{hyp}h$ (which has nonempty interior!) by a closed hyperplane in $X\times\mathbb{R}$ that necessarily passes through the point $(x_1,0)$. Thus there exists $y^*\in X^*$ such that
\[
f(x)-f(x_1)\ge y^*(x-x_1)\ge-\lambda\|x-x_1\|,\qquad x\in X.
\]
The first inequality can be written as $x_0^*(x)-x_0^*(x_1)\ge y^*(x)-y^*(x_1)$, $x\in C$, that is,
\[
(x_0^*-y^*)(x)\ge(x_0^*-y^*)(x_1),\qquad x\in C,
\]
while the second inequality gives (by putting $x=x_1-v$)
\[
y^*(v)\le\lambda\|v\|,\qquad v\in X,
\]
that is, $\|y^*\|\le\lambda$. Since $\|x_0^*-y^*\|\ge\|x_0^*\|-\lambda>0$, we conclude that the functional $x_1^*:=x_0^*-y^*$ and the point $x_1$ have the required properties. $\square$

Theorem 0.6 (Bishop-Phelps Theorem). Let $C$ be a nonempty, closed, convex, proper subset of a Banach space $X$. Then:
(a) the support points of $C$ are dense in the boundary $\partial C$;
(b) the support functionals of $C$ are dense in the (nonempty) set
\[
\Omega(C)=\{x^*\in X^* : \sup x^*(C)<+\infty\}.
\]
In particular, if $C$ is also bounded then its support functionals are dense in $X^*$.

Proof. (a) Fix arbitrarily $x_0\in\partial C$ and $\varepsilon\in(0,1)$. There is $y\in X\setminus C$ such that $\|x_0-y\|\le\varepsilon$. By the Hahn-Banach theorem, there exists $x^*\in X^*$ such that $\|x^*\|=1$ and $x^*(y)>\sup x^*(C)$. Then
\[
x^*(x_0)\ge x^*(y)-\varepsilon>\sup x^*(C)-\varepsilon.
\]
Define $x_0^*=-x^*$, and apply Theorem 0.5 with $\lambda=\sqrt{\varepsilon}$ (which is admissible since $\sqrt{\varepsilon}<1=\|x_0^*\|$) to get $x_1\in C$ and $0\neq x_1^*\in X^*$ such that
\[
\|x_1-x_0\|\le\sqrt{\varepsilon},\qquad\|x_1^*-x_0^*\|\le\sqrt{\varepsilon}\qquad\text{and}\qquad x_1^*(x_1)=\inf x_1^*(C).
\]
Then the (nonzero) functional $z^*:=-x_1^*$ satisfies $\|z^*-x^*\|\le\sqrt{\varepsilon}$ and $z^*(x_1)=\sup z^*(C)$, and hence $x_1$ is a support point of $C$.

(b) Let $\varepsilon\in(0,1)$ and $x^*\in\Omega(C)$, $x^*\neq0$ (the zero functional is the limit of small positive multiples of any support functional). Since $\Omega(C)$ contains all positive multiples of $x^*$, we can (and do) assume that $\|x^*\|=1$. Fix $x_0\in C$ so that $x^*(x_0)>\sup x^*(C)-\varepsilon$, and repeat the proof of (a) above to get a support functional $z^*$ such that $\|z^*-x^*\|\le\sqrt{\varepsilon}$. Finally, if $C$ is in addition bounded then clearly $\Omega(C)=X^*$, and we are done. $\square$

As a particular case of Theorem 0.6 we obtain the following result, which is the original Bishop-Phelps Theorem (published in [E. Bishop, R.R. Phelps, A proof that every Banach space is subreflexive, Bull. Amer. Math. Soc. 67 (1961), 97–98]).

Recall that the famous James' theorem states that a Banach space is reflexive if and only if each element of its dual attains its norm (that is, the supremum that defines the norm is attained).

Theorem 0.7. Every Banach space $X$ is "subreflexive" in the sense that the norm-attaining functionals are dense in $X^*$.

Proof. Notice that a nonzero functional $x^*\in X^*$ attains its norm if and only if it is a support functional of the closed unit ball $B_X$. Apply Theorem 0.6. $\square$

Let us give a simple example showing that the completeness of the space is necessary in the Bishop-Phelps Theorem. We only sketch the construction, leaving the formal proof to the reader.

Example 0.8. Consider an arbitrary infinite-dimensional normed space $Z$ and a dense hyperplane $X\subset Z$ (that is, $X$ is the kernel of a discontinuous linear functional on $Z$). Fix a norm-one point $z_0\in Z\setminus X$, and a norm-one $z^*\in Z^*=X^*$ with $z^*(z_0)=1$. Then the set
\[
C:=\operatorname{conv}\bigl(\{\pm2z_0\}\cup B_Z\bigr),
\]
where $B_Z$ denotes the closed unit ball of $Z$, is closed, convex, bounded and symmetric, and hence coincides with the unit ball of an equivalent norm $|||\cdot|||$ on $Z$ (where $|||\cdot|||$ can be defined as the Minkowski gauge of $C$). It is not difficult to show that every $x^*\in Z^*$ which is sufficiently near to $z^*$ attains its $|||\cdot|||$-norm (that is, its supremum over $C$) just at $2z_0$. It follows that the (incomplete) normed space $(X,|||\cdot|||)$ is not "subreflexive".
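For contrast with Example 0.8, Theorem 0.7 can be checked by hand in a complete but non-reflexive space. The sketch below uses the standard space $c_0$ with dual $\ell^1$. This choice is ours and does not appear in the text.

```latex
% Illustrative check of Theorem 0.7 (assumption: X = c_0 with the sup norm,
% so that X^* = \ell^1 acts by x^*(x) = \sum_n a_n x_n for x^* = (a_n)).

% If x \in B_{c_0} and x^*(x) = \|x^*\|, then
\[
\sum_n a_n x_n=\sum_n|a_n|
\quad\Longrightarrow\quad
x_n=\operatorname{sign}(a_n)\ \text{ whenever } a_n\neq0 .
\]
% Since x_n \to 0, this forces a_n = 0 for all but finitely many n.
% Conversely, a finitely supported (a_n) attains its norm at
% x = (\operatorname{sign}(a_n))_n, which is finitely supported and so lies in c_0. Therefore
\[
\{x^*\in\ell^1 : x^* \text{ attains its norm on } B_{c_0}\}
=\{(a_n)\in\ell^1 : a_n=0 \text{ for all large } n\},
\]
% which is a dense proper subspace of \ell^1.
```

Density agrees with Theorem 0.7, while the fact that the subspace is proper agrees with James' theorem, since $c_0$ is not reflexive.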

0.3. The Brøndsted-Rockafellar Theorem. As another application of the Ekeland Variational Principle, we are going to prove the Brøndsted-Rockafellar Theorem, which is a kind of "Bishop-Phelps theorem for convex functions".

We shall need the notion of $\varepsilon$-subdifferential. Given a locally convex Hausdorff topological vector space $X$, a proper, convex, l.s.c. function $f\colon X\to(-\infty,+\infty]$, a point $x_0\in\operatorname{dom}f$, and $\varepsilon\ge0$, we can define the set
\[
\partial_\varepsilon f(x_0)=\{x^*\in X^* : \forall x\in X,\ f(x)\ge f(x_0)+x^*(x-x_0)-\varepsilon\},
\]
called the $\varepsilon$-subdifferential of $f$ at $x_0$. If $\varepsilon=0$ then the above set is the subdifferential of $f$ at $x_0$ and it is denoted by $\partial f(x_0)$. For $x_0\notin\operatorname{dom}f$, we define $\partial_\varepsilon f(x_0)=\emptyset$ $(\varepsilon\ge0)$.

Lemma 0.9. Under the above assumptions, $\partial_\varepsilon f(x_0)$ is convex and $w^*$-closed for every $\varepsilon\ge0$. Moreover:

(a) $\partial_\varepsilon f(x_0)\neq\emptyset$ for each $\varepsilon>0$;

(b) if in addition either $X$ is a normed space and $f$ is continuous at $x_0$, or $X$ is a Banach space and $x_0\in\operatorname{int}(\operatorname{dom}f)$, then $\partial f(x_0)$ is nonempty and $w^*$-compact.

Proof. The first part of the statement is an easy exercise.

(a) Let $\varepsilon>0$. Since the point $(x_0,f(x_0)-\varepsilon)\in X\times\mathbb{R}$ does not belong to the (closed, convex) set $\operatorname{epi}f$, it admits a convex open neighborhood $D$ which is disjoint from $\operatorname{epi}f$. Using the Hahn-Banach separation theorem for $D$ and $\operatorname{epi}f$, and the remark just before Theorem 0.5, there exist $x^*\in X^*$ and $\alpha\in\mathbb{R}$ such that
\[
f(x)\ge\alpha+x^*(x-x_0),\quad x\in X,\qquad\text{and}\qquad\alpha>f(x_0)-\varepsilon.
\]
It follows that $f(x)>f(x_0)-\varepsilon+x^*(x-x_0)$, $x\in X$, that is, $x^*\in\partial_\varepsilon f(x_0)$.

(b) By known results on continuity of convex functions, in both cases $A:=\operatorname{int}(\operatorname{dom}f)$ is an open convex set containing $x_0$, and $f$ is locally Lipschitz on $A$. In particular, $(x_0,f(x_0)+1)$ is an interior point of $\operatorname{epi}f$. So we can separate, as in (a), the point $(x_0,f(x_0))$ and $\operatorname{epi}f$ by the graph of an affine function of the form $a(x)=f(x_0)+x^*(x-x_0)$. Then $x^*\in\partial f(x_0)$.

It remains to show that $\partial f(x_0)$ is bounded (a bounded $w^*$-closed set is $w^*$-compact by the Banach-Alaoglu theorem). Let $\delta>0$ be such that $f$ is Lipschitz with a constant $L>0$ on $x_0+\delta B_X$. Then for $y^*\in\partial f(x_0)$ and $v\in B_X$ we have
\[
y^*(v)=\frac{1}{\delta}\,y^*(\delta v)\le\frac{1}{\delta}\bigl[f(x_0+\delta v)-f(x_0)\bigr]\le L,
\]
and hence $\|y^*\|\le L$. $\square$

The following example shows that, even if $X$ is a Banach space, the subdifferential $\partial f(x_0)$ can be empty for some points $x_0\in\operatorname{dom}f$.

Example 0.10. Take some sequence $\{\alpha_n\}\subset(0,+\infty)$ such that $\sum_n\alpha_n<+\infty$. In $X=\ell^2$, consider the closed (even compact!) convex set
\[
C=\{x=(x_n)\in\ell^2 : \forall n\in\mathbb{N},\ |x_n|\le\alpha_n\}
\]
and the function $f(x):=\sum_n x_n$ on $C$. Then $f$ is real-valued, affine, continuous (e.g. by the Dominated Convergence Theorem), and $f(0)=0$. If we define $f(x)=+\infty$ for $x\notin C$ then $f$ becomes a proper, convex, l.s.c. function on $X$. Let us show that $\partial f(0)=\emptyset$. Assume there exists $y^*\in\partial f(0)$. By the Riesz Representation Theorem, $y^*$ is of the form $y^*(x)=\sum_n y_nx_n$ where $(y_n)\in\ell^2$.
Since $y^*\le f$ on $C$, we must have $y^*=f$ on $C$ by symmetry. Considering the points $x=\alpha_ne_n$ (where $e_n$ denotes the $n$-th vector of the canonical orthonormal basis of $\ell^2$), we easily obtain that $y_n=1$ for each $n$, which is clearly impossible since $(y_n)\in\ell^2$. We are done.

Now we are ready for the Brøndsted-Rockafellar Theorem, which shows that the subdifferential is indeed nonempty at "many" points.

Theorem 0.11 (Brøndsted-Rockafellar Theorem). Let $X$ be a Banach space, $f\colon X\to(-\infty,+\infty]$ a proper, convex, l.s.c. function, and $x_0\in\operatorname{dom}f$. If $\varepsilon>0$, $\lambda>0$ and $x_0^*\in\partial_\varepsilon f(x_0)$, then there exist $x_1\in\operatorname{dom}f$ and $x_1^*\in\partial f(x_1)$ such that
\[
\|x_1-x_0\|\le\frac{\varepsilon}{\lambda}\qquad\text{and}\qquad\|x_1^*-x_0^*\|\le\lambda.
\]
Moreover, $|f(x_1)-f(x_0)|\le\varepsilon+\varepsilon\|x_0^*\|/\lambda$.

Proof. We have that $f(x)\ge f(x_0)+x_0^*(x-x_0)-\varepsilon$ $(x\in X)$. So if we denote $f_0(x):=f(x)-x_0^*(x-x_0)$ we obtain

\[
f_0(x_0)\le\inf f_0(X)+\varepsilon.
\]

By the Ekeland Variational Principle (Theorem 0.3), there exists $x_1\in X$ such that
\[
f_0(x_1)\le f_0(x_0),\qquad\|x_1-x_0\|\le\frac{\varepsilon}{\lambda},\qquad f_0(x)\ge f_0(x_1)-\lambda\|x-x_1\|\quad(x\in X).
\]
The last inequality can be rewritten as
\[
g(x):=f(x)-f(x_1)-x_0^*(x-x_1)\ge-\lambda\|x-x_1\|=:h(x),\qquad x\in X.
\]
Proceeding in the same way as in the proof of Theorem 0.5, we deduce that there exists $y^*\in X^*$ such that
\[
f(x)-f(x_1)-x_0^*(x-x_1)\ge y^*(x-x_1)\ge-\lambda\|x-x_1\|,\qquad x\in X,
\]
from which we obtain that $x_1^*:=x_0^*+y^*\in\partial f(x_1)$ and $\|y^*\|\le\lambda$, and the second formula holds.

Let us prove the last part. Since $x_0^*\in\partial_\varepsilon f(x_0)$, we can write
\[
f(x_0)-f(x_1)\le x_0^*(x_0-x_1)+\varepsilon\le\frac{\|x_0^*\|\varepsilon}{\lambda}+\varepsilon,
\]
and $x_1^*\in\partial f(x_1)$ implies
\[
f(x_1)-f(x_0)\le x_1^*(x_1-x_0)=x_0^*(x_1-x_0)+(x_1^*-x_0^*)(x_1-x_0)\le\frac{\|x_0^*\|\varepsilon}{\lambda}+\lambda\,\frac{\varepsilon}{\lambda}.
\]
So the last inequality follows. $\square$

Observation 0.12. Let $f$ be a proper, convex, l.s.c. function on $X$, and let $z^*\in X^*$ be such that $\inf[f-z^*](X)>-\infty$. Then for each $\varepsilon>0$ there exists $z\in\operatorname{dom}f$ such that $z^*\in\partial_\varepsilon f(z)$. Indeed, it is easy to see that it suffices to take $z$ so that $f(z)-z^*(z)\le\inf[f-z^*](X)+\varepsilon$.
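The $\varepsilon$-subdifferential and the conclusion of Theorem 0.11 can be computed explicitly in a one-dimensional case. The following sketch uses $X=\mathbb{R}$ and $f(x)=x^2/2$, both of which are our own illustrative assumptions.

```latex
% Illustrative example (assumptions: X = \mathbb{R}, f(x) = x^2/2, so \partial f(x) = \{x\}).

% y \in \partial_\varepsilon f(x_0) means y(x - x_0) - x^2/2 + x_0^2/2 \le \varepsilon for all x.
% The left-hand side is maximal at x = y, where it equals (y - x_0)^2/2. Hence
\[
\partial_\varepsilon f(x_0)=\bigl[x_0-\sqrt{2\varepsilon},\ x_0+\sqrt{2\varepsilon}\bigr].
\]
% Given x_0^* in this interval and \lambda > 0, Theorem 0.11 asks for a point x_1
% (with x_1^* = x_1) such that
\[
|x_1-x_0|\le\frac{\varepsilon}{\lambda}\qquad\text{and}\qquad|x_1-x_0^*|\le\lambda .
\]
% Such a point exists if and only if |x_0^* - x_0| \le \varepsilon/\lambda + \lambda,
% and this always holds here, because
\[
|x_0^*-x_0|\le\sqrt{2\varepsilon}\le2\sqrt{\varepsilon}\le\frac{\varepsilon}{\lambda}+\lambda .
\]
```

For $\lambda=\sqrt{\varepsilon}$ one may simply take the midpoint $x_1=(x_0+x_0^*)/2$, whose distance to both $x_0$ and $x_0^*$ is at most $\sqrt{\varepsilon/2}$.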

Given a function $f\colon X\to(-\infty,+\infty]$ and a set $E\subset X$, let $f|_E$ denote the restriction of $f$ to $E$, and let $\operatorname{gr}f\subset X\times\mathbb{R}$ be the graph of $f$, that is, $\operatorname{gr}f=\{(x,f(x)) : x\in\operatorname{dom}f\}$. Let us state the following important application of the Brøndsted-Rockafellar Theorem.

Theorem 0.13. Let $f$ be a proper, convex, l.s.c. function on a Banach space $X$, and let $D(\partial f):=\{x\in X : \partial f(x)\neq\emptyset\}$.

(a) $\operatorname{gr}f|_{D(\partial f)}$ is dense in $\operatorname{gr}f$.
(b) $D(\partial f)$ is dense in $\operatorname{dom}f$.
(c) $f$ coincides with the supremum of all continuous affine functions that "support $f$ at some point of its graph", that is,
\[
f(x)=\sup\{f(z)+z^*(x-z) : z\in D(\partial f),\ z^*\in\partial f(z)\},\qquad x\in X.
\]

Proof. Given $x_0\in\operatorname{dom}f$ and $\varepsilon>0$, fix some $x_0^*\in\partial_\varepsilon f(x_0)$, and apply Theorem 0.11 with $\lambda=\max\{1,\|x_0^*\|\}\sqrt{\varepsilon}$ to get the corresponding point $x_1\in D(\partial f)$. It satisfies $\|x_1-x_0\|\le\varepsilon/\lambda\le\sqrt{\varepsilon}$ and $|f(x_1)-f(x_0)|\le\varepsilon+\varepsilon\|x_0^*\|/\lambda\le\varepsilon+\sqrt{\varepsilon}$. This proves (a) and (b).

Let us show (c). Let $g$ be the function on the right-hand side, that is,
\[
g(x)=\sup\{f(z)+z^*(x-z) : z\in D(\partial f),\ z^*\in\partial f(z)\}.
\]
Then $g\le f$ (and $g$ is also proper, convex and l.s.c., but we shall not need this fact). For $x_0\in X$ we shall consider three cases.

Case $x_0\in\operatorname{dom}f$. Given $\varepsilon>0$, choose some $x_0^*\in\partial_\varepsilon f(x_0)$, and apply Theorem 0.11 with an arbitrary $\lambda>0$ to obtain $x_1^*\in\partial f(x_1)$, $\|x_1-x_0\|\le\varepsilon/\lambda$, $\|x_1^*-x_0^*\|\le\lambda$. Then
\[
g(x_0)\le f(x_0)\le f(x_1)+x_0^*(x_0-x_1)+\varepsilon
=\bigl[f(x_1)+x_1^*(x_0-x_1)\bigr]+(x_0^*-x_1^*)(x_0-x_1)+\varepsilon\le g(x_0)+2\varepsilon,
\]
which shows that $g(x_0)=f(x_0)$.

Case $x_0\in\overline{\operatorname{dom}f}\setminus\operatorname{dom}f$. Fix arbitrary $m\in\mathbb{R}$ and $\varepsilon>0$. We have that $f(x_0)=+\infty>m$ and hence, by lower semicontinuity, there exists an open convex neighborhood $V$ of $x_0$ such that $f>m$ on $V$. By the Hahn-Banach separation theorem, the convex sets $\operatorname{epi}f$ and $V\times(-\infty,m)$ (which is open) can be separated by a closed hyperplane in $X\times\mathbb{R}$. Since $V$ contains a point $y\in\operatorname{dom}f$, this hyperplane cannot be "vertical", and hence it coincides with the graph of a continuous affine function. Thus there exist $x_1^*\in X^*$ and $\alpha\in\mathbb{R}$ so that $f(x)\ge x_1^*(x-x_0)+\alpha$ $(x\in X)$ and $\alpha=x_1^*(x_0-x_0)+\alpha\ge m$. Since $\beta:=\inf_{x\in X}[f(x)-x_1^*(x-x_0)]\ge\alpha\ge m>-\infty$, we also have
\[
f(x)\ge x_1^*(x-x_0)+\beta,\qquad x\in X,
\]
and there exists $x_1\in\operatorname{dom}f$ such that $f(x_1)-x_1^*(x_1-x_0)\le\beta+\varepsilon$. This easily implies that $x_1^*\in\partial_\varepsilon f(x_1)$. Apply Theorem 0.11 with $\lambda=\varepsilon/\|x_1-x_0\|$ (note that $x_1\neq x_0$) to get $x_2^*\in\partial f(x_2)$, $\|x_2-x_1\|\le\varepsilon/\lambda$, $\|x_2^*-x_1^*\|\le\lambda$. Then the last displayed formula implies
\[
\begin{aligned}
g(x_0)&\ge f(x_2)+x_2^*(x_0-x_2)\ge x_1^*(x_2-x_0)+\beta+x_2^*(x_0-x_2)\\
&\ge m+(x_1^*-x_2^*)(x_2-x_1)+(x_1^*-x_2^*)(x_1-x_0)\ge m-2\varepsilon
\end{aligned}
\]
since $|(x_1^*-x_2^*)(x_2-x_1)|\le\lambda(\varepsilon/\lambda)=\varepsilon$ and $|(x_1^*-x_2^*)(x_1-x_0)|\le\lambda\|x_1-x_0\|=\varepsilon$. Since $m\in\mathbb{R}$ and $\varepsilon>0$ were arbitrary, we obtain that $g(x_0)=+\infty=f(x_0)$.

Case $x_0\notin\overline{\operatorname{dom}f}$. By the Hahn-Banach theorem, there exists $y^*\in X^*$ such that
\[
y^*(x_0)>s:=\sup y^*(\operatorname{dom}f).
\]
Fix arbitrarily $x_1\in D(\partial f)$, $x_1^*\in\partial f(x_1)$ and $m>0$. Then
\[
f(x)-(x_1^*+my^*)(x)\ge f(x_1)-x_1^*(x_1)-ms=:c-ms,\qquad x\in\operatorname{dom}f.
\]
By Observation 0.12 for $\varepsilon:=1$, we have $x_2^*:=x_1^*+my^*\in\partial_1f(x_2)$ for some $x_2\in\operatorname{dom}f$. Apply Theorem 0.11 with $\lambda=\varepsilon/\|x_2-x_0\|=1/\|x_2-x_0\|$ to get $x_3^*\in\partial f(x_3)$, $\|x_3-x_2\|\le\varepsilon/\lambda$, $\|x_3^*-x_2^*\|\le\lambda$. Then
\[
\begin{aligned}
g(x_0)&\ge f(x_3)+x_3^*(x_0-x_3)\\
&=\bigl[f(x_3)-(x_1^*+my^*)(x_3)\bigr]+(x_1^*+my^*)(x_0)\\
&\quad+(x_1^*+my^*-x_3^*)(x_3-x_2)+(x_1^*+my^*-x_3^*)(x_2-x_0)\\
&\ge c-ms+x_1^*(x_0)+my^*(x_0)-2
\end{aligned}
\]
since $|(x_1^*+my^*-x_3^*)(x_3-x_2)|\le\lambda(\varepsilon/\lambda)=1$ and $|(x_1^*+my^*-x_3^*)(x_2-x_0)|\le\lambda\|x_2-x_0\|=1$. Consequently,
\[
g(x_0)\ge d+m\,[y^*(x_0)-s],\qquad m>0,
\]
where the constant $d$ does not depend on $m$. It follows that $g(x_0)=+\infty=f(x_0)$, and we are done. $\square$
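All three parts of Theorem 0.13 can be verified directly for a function whose subdifferential is empty at one point of its domain. The function below is our own illustrative choice on $X=\mathbb{R}$ and is not taken from the text.

```latex
% Illustrative example (assumption: X = \mathbb{R}, f(x) = -\sqrt{x} for x \ge 0,
% f(x) = +\infty for x < 0; f is proper, convex and l.s.c.).

% For z > 0 we have \partial f(z) = \{-1/(2\sqrt{z})\}, while \partial f(0) = \emptyset,
% because y \in \partial f(0) would require y \le -1/\sqrt{x} for every x > 0.
% Hence D(\partial f) = (0,+\infty), which is dense in dom f = [0,+\infty): parts (a), (b).

% For part (c), the supporting affine functions are, for z > 0,
\[
f(z)+f'(z)(x-z)=-\sqrt{z}-\frac{x-z}{2\sqrt{z}}=-\frac{\sqrt{z}}{2}-\frac{x}{2\sqrt{z}} .
\]
% Writing a = \sqrt{z} > 0 and taking the supremum over a:
\[
\sup_{a>0}\Bigl(-\frac{a}{2}-\frac{x}{2a}\Bigr)=
\begin{cases}
-\sqrt{x} & \text{if } x>0 \quad(\text{attained at } a=\sqrt{x}),\\
0 & \text{if } x=0 \quad(\text{not attained, } a\to0^+),\\
+\infty & \text{if } x<0 \quad(a\to0^+),
\end{cases}
\]
% which coincides with f(x) in all three cases.
```

The three cases of this supremum correspond to the three cases in the proof of (c): a point of $\operatorname{dom}f$ where the supremum is attained, a point of $\operatorname{dom}f$ with empty subdifferential, and a point outside the closure of the domain.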