The Ekeland Variational Principle, the Bishop-Phelps Theorem, and the Brøndsted-Rockafellar Theorem

Our aim is to prove the Ekeland Variational Principle, an abstract result that has found numerous applications in various fields of mathematics. As an application to convex analysis, we provide a proof of the famous Bishop-Phelps Theorem and some related results.

Let us recall that the epigraph and the hypograph of a function $f\colon X\to[-\infty,+\infty]$ (where $X$ is a set) are the following subsets of $X\times\mathbb{R}$:
\[ \operatorname{epi} f=\{(x,t)\in X\times\mathbb{R} : t\ge f(x)\},\qquad \operatorname{hyp} f=\{(x,t)\in X\times\mathbb{R} : t\le f(x)\}. \]
Notice that, if $f>-\infty$ on $X$, then $\operatorname{epi} f\ne\emptyset$ if and only if $f$ is proper, that is, $f$ is finite at at least one point.
0.1. The Ekeland Variational Principle. In what follows, $(X,d)$ is a metric space. Given $\lambda>0$ and $(x,t)\in X\times\mathbb{R}$, we define the set
\[ K_\lambda(x,t)=\{(y,s) : s\le t-\lambda d(x,y)\}=\{(y,s) : \lambda d(x,y)\le t-s\}\subset X\times\mathbb{R}. \]
Notice that $K_\lambda(x,t)=\operatorname{hyp}[t-\lambda d(x,\cdot)]$. Let us state some useful properties of these sets.
Lemma 0.1.
(a) $K_\lambda(x,t)$ is closed and contains $(x,t)$.
(b) If $(\bar x,\bar t)\in K_\lambda(x,t)$ then $K_\lambda(\bar x,\bar t)\subset K_\lambda(x,t)$.
(c) If $(x_n,t_n)\to(x,t)$ and $(x_{n+1},t_{n+1})\in K_\lambda(x_n,t_n)$ ($n\in\mathbb{N}$), then
\[ K_\lambda(x,t)=\bigcap_{n\in\mathbb{N}}K_\lambda(x_n,t_n). \]

Proof. (a) is quite easy. To show (b), let $(\bar x,\bar t)\in K_\lambda(x,t)$ and $(y,s)\in K_\lambda(\bar x,\bar t)$. Then
\[ \lambda d(x,y)\le\lambda d(x,\bar x)+\lambda d(\bar x,y)\le(t-\bar t)+(\bar t-s)=t-s, \]
and hence $(y,s)\in K_\lambda(x,t)$.

To prove (c), we shall use (a) and (b). Notice that $K_\lambda(x_{n+1},t_{n+1})\subset K_\lambda(x_n,t_n)$ for each $n$. Since $(x_m,t_m)\in K_\lambda(x_n,t_n)$ whenever $m\ge n$, we can pass to the limit for $m\to+\infty$ to obtain $(x,t)\in K_\lambda(x_n,t_n)$ for each $n$. Thus $K_\lambda(x,t)\subset\bigcap_n K_\lambda(x_n,t_n)$. To show the equality, take $(y,s)\notin K_\lambda(x,t)$, that is, $\lambda d(x,y)-t+s>0$. But then $\lambda d(x_n,y)-t_n+s>0$ for each sufficiently large $n$. It follows that $(y,s)\notin\bigcap_n K_\lambda(x_n,t_n)$.

The proof of the following proposition, basic for Ekeland’s principle, is somehow “geometric” and its basic idea is quite simple. The reader is invited to draw a picture to grasp this geometric idea of the proof.
Proposition 0.2. Let $(X,d)$ be a complete metric space, and let $P_{\mathbb{R}}$ denote the canonical projection of $X\times\mathbb{R}$ onto $\mathbb{R}$, that is, $P_{\mathbb{R}}(x,t)=t$. Let $F\subset X\times\mathbb{R}$ be a closed set such that $\inf P_{\mathbb{R}}(F)>-\infty$. Then for every $\lambda>0$ and $(x_0,t_0)\in F$ there exists $(\bar x,\bar t)\in F\cap K_\lambda(x_0,t_0)$ such that $F\cap K_\lambda(\bar x,\bar t)=\{(\bar x,\bar t)\}$.

Proof. Fix a sequence $\{\varepsilon_n\}_{n\ge1}\subset(0,+\infty)$ such that $\sum_{n=1}^{\infty}\varepsilon_n<+\infty$. Let us inductively define sets $F_n$ and points $(x_n,t_n)$ in $X\times\mathbb{R}$ for $n\ge0$.

Case $n=0$. Define $F_0=F$, and recall that we already have $(x_0,t_0)\in F_0$.

Now let $n\ge0$, and assume that we have already defined $F_k$ and $(x_k,t_k)\in F_k$ for $0\le k\le n$. Put $F_{n+1}=F_n\cap K_\lambda(x_n,t_n)$, and fix a point $(x_{n+1},t_{n+1})\in F_{n+1}$ such that $t_{n+1}\le\inf P_{\mathbb{R}}(F_{n+1})+\varepsilon_{n+1}$.

Notice that $F_{n+1}\subset F_n$ and $F_{n+1}=F\cap K_\lambda(x_n,t_n)$ ($n\ge0$) by Lemma 0.1. Moreover, since $(x_{n+1},t_{n+1})\in F_n\cap K_\lambda(x_n,t_n)$, we have that
\[ \lambda d(x_n,x_{n+1})\le t_n-t_{n+1}\le\inf P_{\mathbb{R}}(F_n)+\varepsilon_n-\inf P_{\mathbb{R}}(F_n)=\varepsilon_n\qquad(n\ge1). \]
Our choice of $\{\varepsilon_n\}$ easily implies that the sequences $\{x_n\}$ and $\{t_n\}$ are Cauchy and hence convergent: $(x_n,t_n)\to(\bar x,\bar t)\in F_1$. Now,
\[ \bigcap_n F_n=\bigcap_n\,[F\cap K_\lambda(x_n,t_n)]=F\cap\bigcap_n K_\lambda(x_n,t_n)=F\cap K_\lambda(\bar x,\bar t) \]
by Lemma 0.1. To show that $F\cap K_\lambda(\bar x,\bar t)$ is a singleton, it suffices to show that $\operatorname{diam}F_n\to0$. (On $X\times\mathbb{R}$ we can consider, e.g., the metric $\hat d\big((x,t),(y,s)\big):=\max\{d(x,y),|t-s|\}$, or $\hat d\big((x,t),(y,s)\big):=d(x,y)+|t-s|$, which is clearly equivalent to the previous one.) For $n\ge2$ and $(y,s)\in F_n=F_{n-1}\cap K_\lambda(x_{n-1},t_{n-1})$, we have as above:
\[ \lambda d(x_{n-1},y)\le t_{n-1}-s\le\inf P_{\mathbb{R}}(F_{n-1})+\varepsilon_{n-1}-\inf P_{\mathbb{R}}(F_{n-1})=\varepsilon_{n-1}\to0. \]
It follows that $\operatorname{diam}(F_n)\to0$, and we are done.

We are ready for the famous “variational principle” due to I. Ekeland [J. Math. Anal. Appl. 47 (1974), 324–353]. It asserts that, under appropriate assumptions, if a point $x_0$ is an “almost minimum point” for a function $f$, then there exists a small Lipschitz perturbation of $f$ attaining its (strict global) minimum at a point “near” $x_0$. As usual, “l.s.c.” stands for “lower semicontinuous”.
Theorem 0.3 (Ekeland Variational Principle). Let $(X,d)$ be a complete metric space, and $f\colon X\to(-\infty,+\infty]$ a proper l.s.c. function which is bounded below. Let $\varepsilon>0$, $\lambda>0$, $x_0\in X$, and $f(x_0)\le\inf f(X)+\varepsilon$. Then there exists $\bar x\in X$ such that
(a) $f(\bar x)\le f(x_0)$;
(b) $d(x_0,\bar x)\le\varepsilon/\lambda$;
(c) $f(\bar x)<f(x)+\lambda d(\bar x,x)$ for all $x\in X\setminus\{\bar x\}$ (that is, the function $f+\lambda d(\bar x,\cdot)$ attains its strict global minimum at $\bar x$).
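Proof. The set $F:=\operatorname{epi}f$ is a nonempty closed set in $X\times\mathbb{R}$, $(x_0,f(x_0))\in F$, $f(x_0)\le\inf P_{\mathbb{R}}(F)+\varepsilon$ and the infimum is finite. By Proposition 0.2, there is $(\bar x,\bar t)\in F\cap K_\lambda(x_0,f(x_0))$ such that $F\cap K_\lambda(\bar x,\bar t)=\{(\bar x,\bar t)\}$. Since $(\bar x,f(\bar x))\in F\cap K_\lambda(\bar x,\bar t)$, we necessarily have $\bar t=f(\bar x)$. Moreover, since $(\bar x,f(\bar x))\in K_\lambda(x_0,f(x_0))$, we have
\[ \lambda d(x_0,\bar x)\le f(x_0)-f(\bar x)\le\inf f(X)+\varepsilon-\inf f(X)=\varepsilon, \]
which implies (a) and (b). Finally, if $x\ne\bar x$ and $f(x)<+\infty$, then $(x,f(x))\notin K_\lambda(\bar x,f(\bar x))$, that is, $\lambda d(\bar x,x)>f(\bar x)-f(x)$, which gives (c); for $f(x)=+\infty$ the inequality in (c) is trivial.

Remark 0.4. Notice that for $\lambda=\sqrt{\varepsilon}$, Theorem 0.3(b,c) become
\[ d(x_0,\bar x)\le\sqrt{\varepsilon}\quad\text{and}\quad f(\bar x)<f(x)+\sqrt{\varepsilon}\,d(\bar x,x)\ \text{ whenever } x\ne\bar x. \]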
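The construction behind Theorem 0.3 can be tried out on a finite metric space, where completeness is automatic and a process in the spirit of Proposition 0.2 stops after finitely many steps. The following Python sketch is only an illustration, not part of the proof: the function name `ekeland_point`, the sample data and the greedy choice of the next point are assumptions made here, and exact rational arithmetic is used so that the inequalities (a), (b), (c) can be checked without rounding.

```python
from fractions import Fraction


def ekeland_point(points, f, dist, lam, x0):
    """Return a point x_bar with properties (a)-(c) of Theorem 0.3.

    Hypothetical helper for a FINITE metric space `points`: starting at x0,
    move to a point y != x with f(y) + lam*dist(x, y) <= f(x), that is, with
    (y, f(y)) in K_lam(x, f(x)), until no such point exists.
    """
    x = x0
    while True:
        below_cone = [y for y in points
                      if y != x and f(y) + lam * dist(x, y) <= f(x)]
        if not below_cone:
            return x  # the cone at (x, f(x)) meets the graph only at its vertex
        # Every step strictly decreases f, so the loop terminates.
        x = min(below_cone, key=f)


# Sample data (assumed for illustration): integers with the usual metric.
points = list(range(-10, 11))
f = lambda x: Fraction((x - 3) ** 2, 10)
dist = lambda x, y: abs(x - y)
lam = Fraction(1, 4)
x0 = 0
eps = f(x0) - min(f(p) for p in points)  # x0 is an eps-minimum point of f

x_bar = ekeland_point(points, f, dist, lam, x0)
print(x_bar, eps / lam)
```

For these sample data the walk stops at the point 3, and the distance from the starting point is 3, which is below the bound ε/λ = 18/5 from (b). The triangle inequality argument of Lemma 0.1(b) is what guarantees (b) along the whole walk.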
0.2. The Bishop-Phelps Theorem. The classical Bishop-Phelps Theorem asserts that if $X$ is a (real) Banach space then the elements of $X^*$ that attain the norm are dense in $X^*$. We shall prove a more general version of this theorem, where the unit ball of $X$ is replaced by an arbitrary closed, convex, proper subset of $X$.

We shall use the following standard terminology. If $C$ is a nonempty set in a normed space $X$, $x\in C$, $x^*\in X^*\setminus\{0\}$ and $x^*(x)=\sup x^*(C)$, then we say that $x$ is a support point of $C$ and $x^*$ is a support functional for $C$. Notice that each support point belongs to the boundary $\partial C$ of $C$, and all strictly positive multiples of a support functional are again support functionals. Moreover, the Hahn-Banach Theorem immediately gives that if $C$ is a closed convex set with nonempty interior (in some topological vector space) then each point of $\partial C$ is a support point of $C$.

The main tool for the Bishop-Phelps Theorem will be the following auxiliary theorem. In the proof we shall use the easy and well-known fact that a closed hyperplane in $X\times\mathbb{R}$ that strictly separates two distinct points having the same first component coincides with the graph of a continuous affine function on $X$.

Theorem 0.5. Let $X$ be a Banach space, $C\subset X$ a nonempty, closed, convex set. Let $\varepsilon>0$, $x_0\in C$ and $x_0^*\in X^*\setminus\{0\}$ be such that
\[ \inf x_0^*(C)>-\infty\quad\text{and}\quad x_0^*(x_0)\le\inf x_0^*(C)+\varepsilon. \]
Then for each $0<\lambda<\|x_0^*\|$ there exist $x_1\in C$ and $x_1^*\in X^*\setminus\{0\}$ such that
\[ \|x_1-x_0\|\le\frac{\varepsilon}{\lambda},\qquad\|x_1^*-x_0^*\|\le\lambda\qquad\text{and}\qquad x_1^*(x_1)=\inf x_1^*(C). \]

Proof. The function
\[ f(x)=\begin{cases}x_0^*(x)&\text{if }x\in C,\\ +\infty&\text{otherwise,}\end{cases} \]
is a proper, convex, l.s.c. function on $X$ (since $\operatorname{epi}f=(\operatorname{epi}x_0^*)\cap(C\times\mathbb{R})$), and it satisfies $f(x_0)\le\inf f(X)+\varepsilon$. By the Ekeland Variational Principle, there exists $x_1\in X$ such that $f(x_1)\le f(x_0)$, $\|x_1-x_0\|\le\varepsilon/\lambda$, and $f(x_1)\le f(x)+\lambda\|x-x_1\|$ for all $x\in X$. The first inequality gives that $x_1\in C$, while the last one can be rewritten in the form
\[ g(x):=f(x)-f(x_1)\ge-\lambda\|x-x_1\|=:h(x),\qquad x\in X. \]
Since $g$ is proper, convex and l.s.c., $h$ is finite, concave and continuous, and $g(x_1)=h(x_1)=0$, we can separate the epigraph $\operatorname{epi}g$ and the hypograph $\operatorname{hyp}h$ (which has nonempty interior!) by a closed hyperplane in $X\times\mathbb{R}$ that necessarily passes through the point $(x_1,0)$. Thus there exists $y^*\in X^*$ such that
\[ f(x)-f(x_1)\ge y^*(x-x_1)\ge-\lambda\|x-x_1\|,\qquad x\in X. \]
The first inequality can be written as $x_0^*(x)-x_0^*(x_1)\ge y^*(x)-y^*(x_1)$, $x\in C$, that is,
\[ (x_0^*-y^*)(x)\ge(x_0^*-y^*)(x_1),\qquad x\in C, \]
while the second inequality gives (by putting $x=x_1-v$)
\[ y^*(v)\le\lambda\|v\|,\qquad v\in X, \]
that is, $\|y^*\|\le\lambda$. Since $\|x_0^*-y^*\|\ge\|x_0^*\|-\lambda>0$, we conclude that the functional $x_1^*:=x_0^*-y^*$ and the point $x_1$ have the required properties.

Theorem 0.6 (Bishop-Phelps Theorem). Let $C$ be a nonempty, closed, convex, proper subset of a Banach space $X$. Then:
(a) the support points of $C$ are dense in the boundary $\partial C$;
(b) the support functionals of $C$ are dense in the (nonempty) set $\Omega(C)=\{x^*\in X^* : \sup x^*(C)<+\infty\}$.
In particular, if $C$ is also bounded then its support functionals are dense in $X^*$.
Proof. (a) Fix arbitrarily $x_0\in\partial C$ and $\varepsilon\in(0,1)$. There is $y\in X\setminus C$ such that $\|x_0-y\|\le\varepsilon$. By the Hahn-Banach theorem, there exists $x^*\in X^*$ such that $\|x^*\|=1$ and $x^*(y)>\sup x^*(C)$. Then $x^*(x_0)\ge x^*(y)-\varepsilon>\sup x^*(C)-\varepsilon$. Define $x_0^*=-x^*$, and apply Theorem 0.5 with $\lambda=\sqrt{\varepsilon}$ to get $x_1\in C$ and $0\ne x_1^*\in X^*$ such that
\[ \|x_1-x_0\|\le\sqrt{\varepsilon},\qquad\|x_1^*-x_0^*\|\le\sqrt{\varepsilon}\qquad\text{and}\qquad x_1^*(x_1)=\inf x_1^*(C). \]
Then the (nonzero) functional $z^*:=-x_1^*$ satisfies $\|z^*-x^*\|\le\sqrt{\varepsilon}$ and $z^*(x_1)=\sup z^*(C)$, and hence $x_1$ is a support point of $C$.

(b) Let $\varepsilon\in(0,1)$ and $x^*\in\Omega(C)$. Since $\Omega(C)$ contains all positive multiples of $x^*$, we can (and do) assume that $\|x^*\|=1$. Fix $x_0\in C$ so that $x^*(x_0)>\sup x^*(C)-\varepsilon$, and repeat the proof of (a) above to get a support functional $z^*$ such that $\|z^*-x^*\|\le\sqrt{\varepsilon}$. Finally, if $C$ is in addition bounded then clearly $\Omega(C)=X^*$, and we are done.

As a particular case of Theorem 0.6 we obtain the following result which is the original Bishop-Phelps Theorem (published in [E. Bishop, R.R. Phelps, A proof that every Banach space is subreflexive, Bull. Amer. Math. Soc. 67 (1961), 97–98]).
Recall that the famous James’ theorem states that a Banach space is reflexive if and only if each element of its dual attains its norm (that is, the supremum that defines the norm is attained).

Theorem 0.7. Every Banach space $X$ is “subreflexive” in the sense that the norm-attaining functionals are dense in $X^*$.

Proof. Notice that a nonzero functional $x^*\in X^*$ attains its norm if and only if it is a support functional of the closed unit ball $B_X$. Apply Theorem 0.6.

Let us give a simple example showing that the completeness of the space is necessary in the Bishop-Phelps Theorem. We only sketch the construction, leaving the formal proof to the reader.

Example 0.8. Consider an arbitrary infinite-dimensional normed space $Z$ and a dense hyperplane $X\subset Z$ (that is, $X$ is the kernel of a discontinuous linear functional on $Z$). Fix a norm-one point $z_0\in Z\setminus X$, and a norm-one $z^*\in Z^*=X^*$ with $z^*(z_0)=1$. Then the set
\[ C:=\operatorname{conv}(\{\pm2z_0\}\cup B_Z), \]
where $B_Z$ denotes the closed unit ball of $Z$, is closed, convex, bounded and symmetric, and hence coincides with the unit ball of an equivalent norm $|||\cdot|||$ on $Z$ (where $|||\cdot|||$ can be defined as the Minkowski gauge of $C$). It is not difficult to show that every $x^*\in Z^*$ which is sufficiently near to $z^*$ attains its $|||\cdot|||$-norm (that is, its supremum over $C$) just at $2z_0$. It follows that the (incomplete) normed space $(X,|||\cdot|||)$ is not “subreflexive”.
0.3. The Brøndsted-Rockafellar Theorem. As another application of the Ekeland Variational Principle, we are going to prove the Brøndsted-Rockafellar Theorem which is a kind of a “Bishop-Phelps theorem for convex functions”.

We shall need the notion of $\varepsilon$-subdifferential. Given a locally convex Hausdorff topological vector space $X$, a proper, convex, l.s.c. function $f\colon X\to(-\infty,+\infty]$, a point $x_0\in\operatorname{dom}f$, and $\varepsilon\ge0$, we can define the set
\[ \partial_\varepsilon f(x_0)=\{x^*\in X^* : \forall x\in X,\ f(x)\ge f(x_0)+x^*(x-x_0)-\varepsilon\}, \]
called the $\varepsilon$-subdifferential of $f$ at $x_0$. If $\varepsilon=0$ then the above set is the subdifferential of $f$ at $x_0$ and it is denoted by $\partial f(x_0)$. For $x_0\notin\operatorname{dom}f$, we define $\partial_\varepsilon f(x_0)=\emptyset$ ($\varepsilon\ge0$).

Lemma 0.9. Under the above assumptions, $\partial_\varepsilon f(x_0)$ is convex and $w^*$-closed for every $\varepsilon\ge0$. Moreover:
(a) $\partial_\varepsilon f(x_0)\ne\emptyset$ for each $\varepsilon>0$;
(b) if in addition either $X$ is a normed space and $f$ is continuous at $x_0$, or $X$ is a Banach space and $x_0\in\operatorname{int}(\operatorname{dom}f)$, then $\partial f(x_0)$ is nonempty and $w^*$-compact.

Proof. The first part of the statement is an easy exercise.

(a) Let $\varepsilon>0$. Since the point $(x_0,f(x_0)-\varepsilon)\in X\times\mathbb{R}$ does not belong to the (closed, convex) set $\operatorname{epi}f$, it admits a convex open neighborhood $D$ which is disjoint from $\operatorname{epi}f$. Using the Hahn-Banach separation theorem for $D$ and $\operatorname{epi}f$, and the remark just before Theorem 0.5, there exist $x^*\in X^*$ and $\alpha\in\mathbb{R}$ such that
\[ f(x)\ge\alpha+x^*(x-x_0),\quad x\in X,\qquad\text{and}\qquad\alpha>f(x_0)-\varepsilon. \]
It follows that $f(x)>f(x_0)-\varepsilon+x^*(x-x_0)$, $x\in X$, that is, $x^*\in\partial_\varepsilon f(x_0)$.

(b) By known results on continuity of convex functions, in both cases $A:=\operatorname{int}(\operatorname{dom}f)$ is an open convex set containing $x_0$, and $f$ is locally Lipschitz on $A$. In particular, $(x_0,f(x_0)+1)$ is an interior point of $\operatorname{epi}f$. So we can separate, as in (a), the point $(x_0,f(x_0))$ and $\operatorname{epi}f$ by the graph of an affine function of the form $a(x)=f(x_0)+x^*(x-x_0)$. Then $x^*\in\partial f(x_0)$.

It remains to show that $\partial f(x_0)$ is bounded. Let $\delta>0$ be such that $f$ is Lipschitz with a constant $L>0$ on $x_0+\delta B_X$. Then for $y^*\in\partial f(x_0)$ and $v\in B_X$ we have
\[ y^*(v)=(1/\delta)\,y^*(\delta v)\le(1/\delta)[f(x_0+\delta v)-f(x_0)]\le L, \]
and hence $\|y^*\|\le L$.

The following example shows that, even if $X$ is a Banach space, the subdifferential $\partial f(x_0)$ can be empty for some points $x_0\in\operatorname{dom}f$.

Example 0.10. Take some sequence $\{\alpha_n\}\subset(0,+\infty)$ such that $\sum_n\alpha_n<+\infty$. In $X=\ell^2$, consider the closed (even compact!) convex set
\[ C=\{x=(x_n)\in\ell^2 : \forall n\in\mathbb{N},\ |x_n|\le\alpha_n\} \]
and the function $f(x):=\sum_n x_n$ on $C$. Then $f$ is real-valued, affine, continuous (e.g. by the Dominated Convergence Theorem), and $f(0)=0$. If we define $f(x)=+\infty$ for $x\notin C$ then $f$ becomes a proper, convex, l.s.c. function on $X$.

Let us show that $\partial f(0)=\emptyset$. Assume there exists $y^*\in\partial f(0)$. By the Riesz Representation Theorem, $y^*$ is of the form $y^*(x)=\sum_n y_nx_n$ where $(y_n)\in\ell^2$.
Since $y^*\le f$ on $C$, we must have $y^*=f$ on $C$ by symmetry. Considering the points $x=\alpha_ne_n$ (where $e_n$ denotes the $n$-th vector of the canonical orthonormal basis of $\ell^2$), we easily obtain that $y_n=1$ for each $n$, which is clearly impossible. We are done.

Now we are ready for the Brøndsted-Rockafellar Theorem which shows that the subdifferential is indeed nonempty in “many” points.

Theorem 0.11 (Brøndsted-Rockafellar Theorem). Let $X$ be a Banach space, $f\colon X\to(-\infty,+\infty]$ a proper, convex, l.s.c. function, and $x_0\in\operatorname{dom}f$. If $\varepsilon>0$, $\lambda>0$ and $x_0^*\in\partial_\varepsilon f(x_0)$, then there exist $x_1\in\operatorname{dom}f$ and $x_1^*\in\partial f(x_1)$ such that
\[ \|x_1-x_0\|\le\frac{\varepsilon}{\lambda}\qquad\text{and}\qquad\|x_1^*-x_0^*\|\le\lambda. \]
Moreover, $|f(x_1)-f(x_0)|\le\varepsilon+\varepsilon\|x_0^*\|/\lambda$.
Proof. We have that $f(x)\ge f(x_0)+x_0^*(x-x_0)-\varepsilon$ ($x\in X$). So if we denote $f_0(x):=f(x)-x_0^*(x-x_0)$ we obtain
\[ f_0(x_0)\le\inf f_0(X)+\varepsilon. \]
By the Ekeland Variational Principle (Theorem 0.3), there exists $x_1\in X$ such that
\[ f_0(x_1)\le f_0(x_0),\qquad\|x_1-x_0\|\le\frac{\varepsilon}{\lambda},\qquad f_0(x)\ge f_0(x_1)-\lambda\|x-x_1\|\quad(x\in X). \]
The last inequality can be rewritten as
\[ g(x):=f(x)-f(x_1)-x_0^*(x-x_1)\ge-\lambda\|x-x_1\|=:h(x),\qquad x\in X. \]
Proceeding in the same way as in the proof of Theorem 0.5, we deduce that there exists $y^*\in X^*$ such that
\[ f(x)-f(x_1)-x_0^*(x-x_1)\ge y^*(x-x_1)\ge-\lambda\|x-x_1\|,\qquad x\in X, \]
from which we obtain that $x_1^*:=x_0^*+y^*\in\partial f(x_1)$ and $\|y^*\|\le\lambda$, so the estimate $\|x_1^*-x_0^*\|\le\lambda$ holds.

Let us prove the last part. Since $x_0^*\in\partial_\varepsilon f(x_0)$, we can write
\[ f(x_0)-f(x_1)\le x_0^*(x_0-x_1)+\varepsilon\le\frac{\|x_0^*\|\varepsilon}{\lambda}+\varepsilon, \]
and $x_1^*\in\partial f(x_1)$ implies
\[ f(x_1)-f(x_0)\le x_1^*(x_1-x_0)=x_0^*(x_1-x_0)+(x_1^*-x_0^*)(x_1-x_0)\le\frac{\|x_0^*\|\varepsilon}{\lambda}+\lambda\,\frac{\varepsilon}{\lambda}. \]
So the last inequality follows.

Observation 0.12. Let $f$ be a proper, convex, l.s.c. function on $X$, and let $z^*\in X^*$ be such that $\inf[f-z^*](X)>-\infty$. Then for each $\varepsilon>0$ there exists $z\in\operatorname{dom}f$ such that $z^*\in\partial_\varepsilon f(z)$. Indeed, it is easy to see that it suffices to take $z$ so that $f(z)-z^*(z)\le\inf[f-z^*](X)+\varepsilon$.
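For a concrete feeling of the ε-subdifferential and of Theorem 0.11, consider the function f(x) = x² on the real line (an example chosen here, not taken from the text above). Since the infimum of x² − sx over all real x equals −s²/4, the defining inequality holds for all x exactly when (s/2 − x0)² ≤ ε, so the ε-subdifferential at x0 is the interval [2x0 − 2√ε, 2x0 + 2√ε]. The Python sketch below, with hypothetical helper names and sample values, compares this closed form with a brute-force check on a grid and records the point x1 = s/2 and the functional x1* = s as one admissible choice in Theorem 0.11.

```python
from fractions import Fraction


def f(x):
    return x * x


def in_eps_subdiff(s, x0, eps):
    """Closed-form test for s in the eps-subdifferential of f(x) = x**2 at x0."""
    return (Fraction(s) / 2 - x0) ** 2 <= eps


def violates_on_grid(s, x0, eps, grid):
    """True if f(x) >= f(x0) + s*(x - x0) - eps fails at some grid point."""
    return any(f(x) < f(x0) + s * (x - x0) - eps for x in grid)


# Sample values (assumptions for illustration only).
x0, eps, lam = Fraction(1), Fraction(1, 4), Fraction(1, 2)
grid = [Fraction(k, 4) for k in range(-40, 41)]  # step 1/4 on [-10, 10]

# Here the eps-subdifferential at x0 = 1 is the interval [1, 3].
s_in, s_out = Fraction(3), Fraction(7, 2)

# One admissible choice in Theorem 0.11 for x0* = s_in:
x1 = s_in / 2       # point where s_in is an exact subgradient, since f'(x1) = s_in
x1_star = 2 * x1    # equals s_in, so the distance to x0* is 0 <= lam
```

With these values the point x1 = 3/2 lies at distance 1/2 = ε/λ from x0, and the difference of the function values, 5/4, stays below the bound ε + ε|x0*|/λ = 7/4.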
Given a function $f\colon X\to(-\infty,+\infty]$ and a set $E\subset X$, let $f|_E$ denote the restriction of $f$ to $E$, and let $\operatorname{gr}f\subset X\times\mathbb{R}$ be the graph of $f$, that is, $\operatorname{gr}f=\{(x,f(x)) : x\in\operatorname{dom}f\}$. Let us state the following important application of the Brøndsted-Rockafellar Theorem.

Theorem 0.13. Let $f$ be a proper, convex, l.s.c. function on a Banach space $X$, and let $D(\partial f):=\{x\in X : \partial f(x)\ne\emptyset\}$.
(a) $\operatorname{gr}f|_{D(\partial f)}$ is dense in $\operatorname{gr}f$.
(b) $D(\partial f)$ is dense in $\operatorname{dom}f$.
(c) $f$ coincides with the supremum of all continuous affine functions that “support $f$ at some point of its graph”, that is,
\[ f(x)=\sup\{f(z)+z^*(x-z) : z\in D(\partial f),\ z^*\in\partial f(z)\},\qquad x\in X. \]
Proof. Given $x_0\in\operatorname{dom}f$ and $\varepsilon>0$, fix some $x_0^*\in\partial_\varepsilon f(x_0)$, and apply Theorem 0.11 with $\lambda=\max\{1,\|x_0^*\|\}\sqrt{\varepsilon}$ to get the corresponding point $x_1\in D(\partial f)$. It satisfies $\|x_1-x_0\|\le\varepsilon/\lambda\le\sqrt{\varepsilon}$ and $|f(x_1)-f(x_0)|\le\varepsilon+\varepsilon\|x_0^*\|/\lambda\le\varepsilon+\sqrt{\varepsilon}$. This proves (a) and (b).

Let us show (c). Let $g$ be the function on the right-hand side, that is,
\[ g(x)=\sup\{f(z)+z^*(x-z) : z\in D(\partial f),\ z^*\in\partial f(z)\}. \]
Then $g\le f$ (and $g$ is also proper, convex and l.s.c., but we shall not need this fact). For $x_0\in X$ we shall consider three cases.

Case $x_0\in\operatorname{dom}f$. Given $\varepsilon>0$, choose some $x_0^*\in\partial_\varepsilon f(x_0)$, and apply Theorem 0.11 with an arbitrary $\lambda>0$ to obtain $x_1^*\in\partial f(x_1)$, $\|x_1-x_0\|\le\varepsilon/\lambda$, $\|x_1^*-x_0^*\|\le\lambda$. Then
\[ g(x_0)\le f(x_0)\le f(x_1)+x_0^*(x_0-x_1)+\varepsilon=[f(x_1)+x_1^*(x_0-x_1)]+(x_0^*-x_1^*)(x_0-x_1)+\varepsilon\le g(x_0)+2\varepsilon, \]
which shows that $g(x_0)=f(x_0)$.

Case $x_0\in\overline{\operatorname{dom}f}\setminus\operatorname{dom}f$. Fix arbitrary $m\in\mathbb{R}$ and $\varepsilon>0$. We have that $f(x_0)=+\infty>m$ and hence, by lower semicontinuity, there exists an open convex neighborhood $V$ of $x_0$ such that $f>m$ on $V$. By the Hahn-Banach separation theorem, the convex sets $\operatorname{epi}f$ and $V\times(-\infty,m)$ (which is open) can be separated by a closed hyperplane in $X\times\mathbb{R}$. Since $V$ contains a point $y\in\operatorname{dom}f$, this hyperplane cannot be “vertical”, and hence it coincides with the graph of a continuous affine function. Thus there exist $x_1^*\in X^*$ and $\alpha\in\mathbb{R}$ so that $f(x)\ge x_1^*(x-x_0)+\alpha$ ($x\in X$) and $\alpha=x_1^*(x_0-x_0)+\alpha\ge m$. Since $\beta:=\inf_{x\in X}[f(x)-x_1^*(x-x_0)]\ge\alpha\ge m>-\infty$, we also have
\[ f(x)\ge x_1^*(x-x_0)+\beta,\qquad x\in X, \]
and there exists $x_1\in\operatorname{dom}f$ such that $f(x_1)-x_1^*(x_1-x_0)\le\beta+\varepsilon$. This easily implies that $x_1^*\in\partial_\varepsilon f(x_1)$. Apply Theorem 0.11 with $\lambda=\varepsilon/\|x_1-x_0\|$ to get $x_2^*\in\partial f(x_2)$, $\|x_2-x_1\|\le\varepsilon/\lambda$, $\|x_2^*-x_1^*\|\le\lambda$. Then the last displayed formula implies
\[ g(x_0)\ge f(x_2)+x_2^*(x_0-x_2)\ge x_1^*(x_2-x_0)+\beta+x_2^*(x_0-x_2)\ge m+(x_1^*-x_2^*)(x_2-x_1)+(x_1^*-x_2^*)(x_1-x_0)\ge m-2\varepsilon \]
since $|(x_1^*-x_2^*)(x_2-x_1)|\le\lambda(\varepsilon/\lambda)=\varepsilon$ and $|(x_1^*-x_2^*)(x_1-x_0)|\le\lambda\|x_1-x_0\|=\varepsilon$. Since $m\in\mathbb{R}$ and $\varepsilon>0$ were arbitrary, we obtain that $g(x_0)=+\infty=f(x_0)$.

Case $x_0\notin\overline{\operatorname{dom}f}$. By the Hahn-Banach theorem, there exists $y^*\in X^*$ such that
\[ y^*(x_0)>s:=\sup y^*(\operatorname{dom}f). \]
Fix arbitrarily $x_1\in D(\partial f)$, $x_1^*\in\partial f(x_1)$ and $m>0$. Then
\[ f(x)-(x_1^*+my^*)(x)\ge f(x_1)-x_1^*(x_1)-ms=:c-ms,\qquad x\in\operatorname{dom}f. \]
By Observation 0.12 for $\varepsilon:=1$, we have $x_1^*+my^*\in\partial_1f(x_2)$ for some $x_2\in\operatorname{dom}f$. Apply Theorem 0.11 with $\lambda=\varepsilon/\|x_2-x_0\|=1/\|x_2-x_0\|$ to get $x_3^*\in\partial f(x_3)$, $\|x_3-x_2\|\le\varepsilon/\lambda$, $\|x_3^*-(x_1^*+my^*)\|\le\lambda$. Then
\[ \begin{aligned} g(x_0)\ge f(x_3)+x_3^*(x_0-x_3)&=[f(x_3)-(x_1^*+my^*)(x_3)]+(x_1^*+my^*)(x_0)\\ &\quad+(x_1^*+my^*-x_3^*)(x_3-x_2)+(x_1^*+my^*-x_3^*)(x_2-x_0)\\ &\ge c-ms+x_1^*(x_0)+my^*(x_0)-2 \end{aligned} \]
since $|(x_1^*+my^*-x_3^*)(x_3-x_2)|\le\lambda(\varepsilon/\lambda)=1$ and $|(x_1^*+my^*-x_3^*)(x_2-x_0)|\le\lambda\|x_2-x_0\|=1$. Consequently,
\[ g(x_0)\ge d+m[y^*(x_0)-s],\qquad m>0, \]
where the constant $d$ does not depend on $m$. It follows that $g(x_0)=+\infty=f(x_0)$, and we are done.
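Part (c) of Theorem 0.13 can be observed numerically for a smooth function on the real line. As an illustration (the function f(x) = x², whose subdifferential at z is the single slope 2z, and the helper names are choices made here), the affine function supporting f at z is x ↦ z² + 2z(x − z) = x² − (x − z)². Hence the supremum over all z equals f(x), while a supremum over finitely many z only approximates f from below. The Python sketch uses exact rational arithmetic.

```python
from fractions import Fraction


def f(x):
    return x * x


def tangent_sup(x, nodes):
    """Supremum over z in `nodes` of the affine minorants f(z) + 2*z*(x - z)."""
    return max(f(z) + 2 * z * (x - z) for z in nodes)


# Sample nodes (assumed for illustration): z = -4, -7/2, ..., 4.
nodes = [Fraction(k, 2) for k in range(-8, 9)]
on_node = Fraction(3, 2)    # a point of the node set
off_node = Fraction(5, 4)   # midway between the nodes 1 and 3/2

gap_on = f(on_node) - tangent_sup(on_node, nodes)
gap_off = f(off_node) - tangent_sup(off_node, nodes)
print(gap_on, gap_off)
```

At a node the gap is zero because the supporting line at that node touches the graph; off the nodes the gap equals the squared distance to the nearest node, here 1/16, and it shrinks as the node set is refined.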