Affine mappings and convex functions. Examples of convex functions

In this section, X, Y denote real vector spaces, unless otherwise specified.

Affine mappings. Definition 0.1. Let X, Y be vector spaces, A ⊂ X an affine set. A mapping F : A → Y is affine if F((1 − t)x + ty) = (1 − t)F(x) + tF(y) whenever t ∈ R and x, y ∈ A.

Proposition 0.2. Let F : X → Y.
(a) F is linear if and only if F is affine and F(0) = 0.
(b) F is affine if and only if there exist a linear mapping T : X → Y and a vector $y_0 \in Y$ such that $F(x) = Tx + y_0$ (x ∈ X). Moreover, T is unique in this case.

Proof. Exercise. □

Corollary 0.3. Let A ⊂ X be an affine set. If F : A → Y is affine, then $F\left(\sum_{i=1}^{n} \lambda_i x_i\right) = \sum_{i=1}^{n} \lambda_i F(x_i)$ whenever $x_i \in A$, $\lambda_i \in \mathbb{R}$, $\sum_{j=1}^{n} \lambda_j = 1$.

Proof. Fix $a_0 \in A$ and consider the linear set $L = A - a_0$. Since $A = L + a_0$, we can define an affine mapping G : L → Y by $G(y) = F(y + a_0)$. By Proposition 0.2, we can write $G(y) = Ty + y_0$, where T : L → Y is linear and $y_0 \in Y$ is fixed. Then F is of the form $F(x) = G(x - a_0) = T(x - a_0) + y_0$. Now, an easy calculation completes the proof. □

Now, we can consider the following notion of affine mappings defined on convex (not necessarily affine) sets.

Definition 0.4. Let X, Y be vector spaces, C ⊂ X a convex set. We say that a mapping F : C → Y is c-affine if F((1 − t)x + ty) = (1 − t)F(x) + tF(y) whenever t ∈ [0, 1] and x, y ∈ C.

Lemma 0.5. Let A ⊂ X be an affine set and F : A → Y a mapping. Then F is affine if and only if F is c-affine.

Proof. Each affine mapping is clearly c-affine. To prove the converse, suppose that F is c-affine and consider x, y ∈ A, t ∈ R, z = (1 − t)x + ty. We want to show that F(z) = (1 − t)F(x) + tF(y). For t ∈ [0, 1] this is true. Suppose t > 1. Then y is a convex combination of x and z:
$y = \frac{1}{t} z + \frac{t-1}{t} x$.
Consequently, $F(y) = \frac{1}{t} F(z) + \frac{t-1}{t} F(x)$, which easily implies F(z) = (1 − t)F(x) + tF(y). The case t < 0 can be proved in a similar way by expressing x by means of y, z. □

Lemma 0.6. Let C be a convex set in a vector space X. Then aff(C) = {αu − βv : α, β ≥ 0, α − β = 1, u, v ∈ C}.

Proof. Let us prove the inclusion “⊂” (the other one is trivial). Take x ∈ aff(C) and write it as an affine combination $x = \sum_{i=1}^{n} \lambda_i u_i$ of elements $u_i \in C$ (1 ≤ i ≤ n). We can suppose that $\lambda_i \neq 0$ for each i. Denote

$I = \{i : \lambda_i > 0\}$, $J = \{i : \lambda_i < 0\}$.

Clearly, I ≠ ∅. If J = ∅, we have x ∈ C, and hence x = αx − βx with α = 1, β = 0. Now, let J ≠ ∅. Then x = αu − βv where $\alpha = \sum_{j \in I} \lambda_j$, $\beta = \sum_{j \in J} (-\lambda_j)$, $u = \sum_{i \in I} \frac{\lambda_i}{\alpha} u_i \in C$ and $v = \sum_{i \in J} \frac{-\lambda_i}{\beta} u_i \in C$. □

Theorem 0.7. Let X, Y be vector spaces, C ⊂ X a convex set, A = aff(C). If F : C → Y is c-affine, then there exists an affine mapping G : A → Y such that $G|_C = F$. Moreover, such G is unique.

Proof. By Lemma 0.6, each x ∈ A can be written in the form
(1) x = αu − βv where α, β ≥ 0, α − β = 1, u, v ∈ C.
Then we put G(x) = αF(u) − βF(v).

Claim. G(x) does not depend on the representation (1). To see this, consider two such representations αu − βv = α′u′ − β′v′. Then αu + β′v′ = α′u′ + βv and α′ + β = α + β′ =: Δ. We must have Δ > 0, since otherwise we would get α = α′ = β = β′ = 0, which is impossible. Since $\frac{\alpha}{\Delta} u + \frac{\beta'}{\Delta} v' = \frac{\alpha'}{\Delta} u' + \frac{\beta}{\Delta} v$ are convex combinations, we get $\frac{\alpha}{\Delta} F(u) + \frac{\beta'}{\Delta} F(v') = \frac{\alpha'}{\Delta} F(u') + \frac{\beta}{\Delta} F(v)$, which easily implies αF(u) − βF(v) = α′F(u′) − β′F(v′). Our Claim is proved.

Thus we have defined a mapping G : A → Y. Moreover, G is an extension of F (for x ∈ C take α = 1, β = 0, u = v = x in (1)). Since any affine extension of F has to satisfy the same definition, F has at most one affine extension to A.

In view of Lemma 0.5, it suffices to show that G is c-affine on A. Let x, y ∈ A, t ∈ (0, 1), p = (1 − t)x + ty. By Lemma 0.6, we can write x = αu − βv, y = γw − δz, where α, β, γ, δ ≥ 0, α − β = γ − δ = 1, u, v, w, z ∈ C. Then p = [(1 − t)αu + tγw] − [(1 − t)βv + tδz]. Put a = (1 − t)α + tγ and b = (1 − t)β + tδ and observe that a − b = 1. If a ≠ 0 and b ≠ 0, we can write
$p = a\left[\frac{(1-t)\alpha}{a} u + \frac{t\gamma}{a} w\right] - b\left[\frac{(1-t)\beta}{b} v + \frac{t\delta}{b} z\right]$.
The expressions in square brackets are points of C. Thus
$G(p) = a F\left(\frac{(1-t)\alpha}{a} u + \frac{t\gamma}{a} w\right) - b F\left(\frac{(1-t)\beta}{b} v + \frac{t\delta}{b} z\right)$
$= a\left[\frac{(1-t)\alpha}{a} F(u) + \frac{t\gamma}{a} F(w)\right] - b\left[\frac{(1-t)\beta}{b} F(v) + \frac{t\delta}{b} F(z)\right]$
$= (1-t)[\alpha F(u) - \beta F(v)] + t[\gamma F(w) - \delta F(z)] = (1-t)G(x) + tG(y)$.
Note that a = 1 + b ≥ 1, so a ≠ 0 always holds. If b = 0, then β = δ = 0, hence x = u and y = w belong to C, and the required equality is just the c-affinity of F. □
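As a concrete illustration of the construction in the proof of Theorem 0.7, the following sketch takes X = Y = R, C = [0, 1] and the c-affine function F(x) = 2x + 1 on C. These choices, and the helper names `represent` and `extend`, are assumptions made for the example only. The code builds a representation x = αu − βv as in Lemma 0.6 and evaluates G(x) = αF(u) − βF(v).

```python
from fractions import Fraction

def F(x):
    """A c-affine function on C = [0, 1] (illustrative choice)."""
    assert 0 <= x <= 1, "F is only defined on C = [0, 1]"
    return 2 * x + 1

def represent(x):
    """Return (alpha, u, beta, v) with x = alpha*u - beta*v,
    alpha, beta >= 0, alpha - beta = 1 and u, v in C = [0, 1]."""
    x = Fraction(x)
    if 0 <= x <= 1:
        # x already lies in C: take alpha = 1, beta = 0.
        return Fraction(1), x, Fraction(0), x
    if x > 1:
        # x = x*1 - (x - 1)*0
        return x, Fraction(1), x - 1, Fraction(0)
    # x < 0: x = (1 - x)*0 - (-x)*1
    return 1 - x, Fraction(0), -x, Fraction(1)

def extend(x):
    """Affine extension G of F to aff(C) = R, following formula (1)."""
    alpha, u, beta, v = represent(x)
    return alpha * F(u) - beta * F(v)
```

In this example `extend` agrees with the formula 2x + 1 on all of R, as the uniqueness part of the theorem predicts; exact rational arithmetic is used so that the comparison involves no rounding.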

By the above theorem, every c-affine mapping on a convex set is a restriction of an affine mapping. For this reason, c-affine mappings on a convex set will be called affine.

Exercise 0.8. Let X, Y be vector spaces, C ⊂ X a convex set, and F : C → Y an affine mapping. Let A ⊂ C and B ⊂ Y be sets.
(a) If A is convex, then F(A) is convex. If B is convex, then $F^{-1}(B)$ is convex.
(b) Is it true that conv(F(A)) = F(conv(A))?
(c) Is it true that $\mathrm{conv}(F^{-1}(B)) = F^{-1}(\mathrm{conv}(B))$?

Convex functions. By $\overline{\mathbb{R}}$ we mean the extended real line [−∞, +∞].

Definition 0.9. Let C ⊂ X be a convex set, $f : C \to \overline{\mathbb{R}}$ a function.
(a) f is called proper (on C) if its effective domain

dom(f) = {x ∈ C : f(x) ∈ R} is nonempty. (b) f is called convex (on C) if its epigraph

epi(f) = {(x, t) ∈ C × R : t ≥ f(x)} is a convex set in X × R. (c) f is called concave (on C) if −f is convex.

It is clear that f is concave on C if and only if its hypograph hypo(f) = {(x, t) ∈ C × R : t ≤ f(x)} is a convex set. Moreover, f : C → R is both convex and concave if and only if f is affine.

Notice that if f : C → [−∞, +∞] is a convex function, then the function
$\hat{f}(x) = \begin{cases} f(x) & (x \in C) \\ +\infty & (x \in X \setminus C) \end{cases}$
is convex on X.

Observation 0.10. If $f_i : C \to [-\infty, +\infty]$ (i ∈ I) are convex functions, then also their pointwise supremum

$f(x) = \sup_{i \in I} f_i(x)$

is a convex function. Indeed, $\mathrm{epi}(f) = \bigcap_{i \in I} \mathrm{epi}(f_i)$.
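Observation 0.10 can be tried out on a finite family of affine functions on R, whose pointwise supremum is a maximum. The family, the sample grid and the helper `is_convex_on` are assumptions chosen for illustration. The check tests the inequality f((1 − λ)x + λy) ≤ (1 − λ)f(x) + λf(y) only on a finite sample, so it can refute convexity but not prove it.

```python
from fractions import Fraction
from itertools import product

# An illustrative finite family of affine (hence convex) functions on R.
family = [lambda x: -x, lambda x: 2 * x - 1, lambda x: x / 2 + 1]

def f(x):
    """Pointwise supremum (here a maximum) of the family."""
    return max(g(x) for g in family)

def is_convex_on(func, points, lambdas):
    """Check func((1-l)x + l*y) <= (1-l)func(x) + l*func(y) on a finite sample."""
    return all(
        func((1 - l) * x + l * y) <= (1 - l) * func(x) + l * func(y)
        for x, y, l in product(points, points, lambdas)
    )

# Exact rational sample points in [-4, 4] and weights in (0, 1).
points = [Fraction(k, 2) for k in range(-8, 9)]
lambdas = [Fraction(k, 10) for k in range(1, 10)]
```

Calling `is_convex_on(f, points, lambdas)` finds no violation, whereas the pointwise minimum of the same family fails the test (for instance at x = −4, y = 4, λ = 1/2), which shows that the observation has no analogue for infima.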

Theorem 0.11. Let C be a convex set in a vector space X, f : C → [−∞, +∞] a function. Then f is convex if and only if

f((1 − λ)x + λy) ≤ (1 − λ)f(x) + λf(y) whenever x, y ∈ C, λ ∈ (0, 1) and the right-hand side is defined.

Proof. “⇒” Fix x, y ∈ C and λ ∈ (0, 1) such that (1 − λ)f(x) + λf(y) is defined (i.e., f(x), f(y) are not infinite with opposite signs). If f(x) = +∞ or f(y) = +∞, the inequality is obviously satisfied. Let now f(x) < +∞ and f(y) < +∞. For any t, s ∈ R such that t ≥ f(x) and s ≥ f(y), the points (x, t) and (y, s) belong to the convex set epi(f). Thus also ((1 − λ)x + λy, (1 − λ)t + λs) ∈ epi(f), which implies f((1 − λ)x + λy) ≤ (1 − λ)t + λs. The formula follows since t ≥ f(x) and s ≥ f(y) were arbitrary.

“⇐” Let (x, t) and (y, s) be two points of epi(f), and λ ∈ (0, 1). Then f(x) ≤ t < +∞ and f(y) ≤ s < +∞. By our formula, f((1 − λ)x + λy) ≤ (1 − λ)f(x) + λf(y) ≤ (1 − λ)t + λs. Hence (1 − λ)(x, t) + λ(y, s) = ((1 − λ)x + λy, (1 − λ)t + λs) ∈ epi(f). □

Observation 0.12. If f : C → [−∞, +∞] is a convex function on a convex set C, then the sets dom(f), {x ∈ C : f(x) ≤ α} and {x ∈ C : f(x) < α} (α ∈ R) are convex.

Convex functions that assume the value −∞ are not particularly interesting. For example, if f : R → [−∞, +∞] is convex with $J := f^{-1}(-\infty) \neq \emptyset$, then J is an interval, dom(f) ⊂ ∂J, and f is identically +∞ on $\mathbb{R} \setminus \overline{J}$. For this reason, we shall consider convex functions with values in (−∞, +∞] only. In this case, the right-hand side of the formula in Theorem 0.11 is always defined.

Observation 0.13.
(a) If f, g : C → (−∞, +∞] are convex functions and α, β ≥ 0, then also the function αf + βg is convex.
(b) If $f_n : C \to (-\infty, +\infty]$ (n ∈ N) are convex functions, then also the sum $f = \sum_{n=1}^{+\infty} f_n$ is a convex function.

Theorem 0.14 (Jensen's inequality). Let X be a vector space, C ⊂ X a convex set, and f : C → (−∞, +∞] a convex function. Then
(2) $f\left(\sum_{i=1}^{n} \lambda_i x_i\right) \le \sum_{i=1}^{n} \lambda_i f(x_i)$ whenever n ∈ N, $x_i \in C$, $\lambda_i \ge 0$, $\sum_{j=1}^{n} \lambda_j = 1$.

Proof. The proof, based on an induction argument (as in the proof of the fact that convex sets are closed under taking arbitrary convex combinations), is left to the reader as an exercise. □
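A quick numerical instance of Jensen's inequality (2) may help fix ideas. The sketch below evaluates the difference between the right-hand and left-hand sides of (2) for the convex function f(x) = x² on R; the function, the points, the weights and the helper name `jensen_gap` are illustrative assumptions.

```python
from fractions import Fraction

def jensen_gap(f, xs, lams):
    """Return sum(l_i f(x_i)) - f(sum(l_i x_i)) for weights l_i >= 0 summing to 1."""
    assert all(l >= 0 for l in lams) and sum(lams) == 1
    mean = sum(l * x for l, x in zip(lams, xs))
    return sum(l * f(x) for l, x in zip(lams, xs)) - f(mean)

# Illustrative data: four points of R with rational weights summing to 1.
xs = [Fraction(-2), Fraction(0), Fraction(1), Fraction(5)]
lams = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

# For a convex f the gap is nonnegative; here f(x) = x^2.
gap = jensen_gap(lambda x: x * x, xs, lams)
```

With these data the weighted mean is 21/10 and the gap equals 629/100, so (2) holds with room to spare. For f(x) = x² the gap is the weighted variance of the points, which explains why it vanishes only when all points with positive weight coincide.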

Midconvex functions. Let C ⊂ X be a convex set. A function f : C → (−∞, +∞] is called midconvex (or Jensen convex, or J-convex) if
$f\left(\frac{x+y}{2}\right) \le \frac{f(x)+f(y)}{2}$ whenever x, y ∈ C.
It is clear that f is midconvex on C if and only if it is midconvex on each segment C ∩ L where L ⊂ X is a line intersecting C. Each convex function is also midconvex, but not conversely. Examples of nonconvex midconvex functions on R can be constructed using a so-called Hamel basis: a basis of R as a vector space over the rationals Q. However, it can be proved that midconvex functions satisfy the Jensen inequality with rational coefficients. Moreover, a midconvex function f on an open interval I ⊂ R is convex provided it satisfies at least one of the following conditions:

(a) f is continuous;
(b) f is locally bounded on I;
(c) f is Lebesgue measurable;
(d) f is bounded above on a set of positive measure;
(e) f is bounded above on a set of the second Baire category.

Let us conclude this section with some important examples of convex functions.

Indicator function. Given a set A in a vector space X, the indicator function $\delta_A$ of A is defined by
$\delta_A(x) = \begin{cases} 0 & \text{if } x \in A, \\ +\infty & \text{if } x \in X \setminus A. \end{cases}$

It is easy to see that δA is convex if and only if A is a convex set.
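To make this equivalence concrete, here is a small sketch on X = R that represents a set by a membership test and checks the convexity inequality of Theorem 0.11 for its indicator function. The two sets and the helper names `indicator` and `violates_convexity` are assumptions made for the example; `math.inf` plays the role of +∞.

```python
import math

def indicator(member):
    """Return the indicator function delta_A of the set described by `member`."""
    return lambda x: 0.0 if member(x) else math.inf

def violates_convexity(f, x, y, lam):
    """True if f((1-lam)x + lam*y) > (1-lam)f(x) + lam*f(y), for 0 < lam < 1."""
    return f((1 - lam) * x + lam * y) > (1 - lam) * f(x) + lam * f(y)

# A = [0, 1] is convex.
delta_interval = indicator(lambda x: 0 <= x <= 1)
# A = [0, 1] united with [2, 3] is not convex.
delta_two_pieces = indicator(lambda x: 0 <= x <= 1 or 2 <= x <= 3)
```

For the second set, the points x = 1 and y = 2 belong to A while their midpoint 1.5 does not, so `violates_convexity(delta_two_pieces, 1.0, 2.0, 0.5)` is true. Restricting λ to the open interval (0, 1) avoids the undefined product 0 · (+∞).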

Sublinear functions. A function p : X → (−∞, +∞] is called sublinear if p(0) = 0, p(tx) = tp(x) and p(x + y) ≤ p(x) + p(y) whenever t > 0, x, y ∈ X. Observe that every sublinear function is convex and its effective domain is a convex cone (that is, a convex set which is invariant under multiplication by nonnegative scalars). Any norm or seminorm is a sublinear function.

Distance function. Given a nonempty set A in a normed linear space X, its distance function is the function

$d_A : X \to [0, +\infty)$, $d_A(x) = d(x, A) = \inf\{\|x - a\| : a \in A\}$.

Observe that $d_A = d_{\overline{A}}$, where $\overline{A}$ denotes the closure of A. Moreover, $d_A$ is 1-Lipschitz (i.e., Lipschitz with constant 1); this follows easily from the fact that $d_A$ is the pointwise infimum of the 1-Lipschitz functions $\|\cdot - a\|$ (a ∈ A).
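The 1-Lipschitz property can be observed numerically. The sketch below assumes X = R² with the Euclidean norm and a finite set A, so that the infimum is a minimum; the set, the random sample and the name `dist_to_set` are illustrative assumptions, and the check only samples finitely many pairs.

```python
import math
import random

def dist_to_set(x, A):
    """d_A(x) = inf{ ||x - a|| : a in A } for a finite set A (Euclidean norm)."""
    return min(math.dist(x, a) for a in A)

A = [(0.0, 0.0), (3.0, 1.0), (-2.0, 4.0)]   # illustrative finite set

rng = random.Random(0)                       # fixed seed for reproducibility
pairs = [
    ((rng.uniform(-5, 5), rng.uniform(-5, 5)),
     (rng.uniform(-5, 5), rng.uniform(-5, 5)))
    for _ in range(1000)
]

# Largest observed ratio |d_A(x) - d_A(y)| / ||x - y|| over distinct pairs;
# by the 1-Lipschitz property it cannot exceed 1 (up to rounding).
worst = max(
    abs(dist_to_set(x, A) - dist_to_set(y, A)) / math.dist(x, y)
    for x, y in pairs
    if x != y
)
```

The value of `worst` stays below 1 up to floating-point rounding, in line with the estimate |d_A(x) − d_A(y)| ≤ ‖x − y‖.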

Exercise 0.15. Let X be a normed space, $x_i \in X$, $r_i > 0$, $\lambda_i > 0$ (i = 1, . . . , n). Then $\sum_{i=1}^{n} \lambda_i B(x_i, r_i) = B\left(\sum_{i=1}^{n} \lambda_i x_i, \sum_{i=1}^{n} \lambda_i r_i\right)$.

Proposition 0.16. Let X be a normed space, A ⊂ X a nonempty closed set.

(a) dA is convex if and only if A is a convex set. (b) dA is concave on X \ A if and only if X \ A is a convex set.

Proof. (a) If $d_A$ is convex, then the set $A = \{x : d_A(x) \le 0\}$ is convex. Now, let A be convex. Given x, y ∈ X and ε > 0, choose a, b ∈ A so that $\|x - a\| < d_A(x) + \varepsilon$ and $\|y - b\| < d_A(y) + \varepsilon$. Then for each t ∈ [0, 1] we have
$d_A((1-t)x + ty) \le \|(1-t)x + ty - [(1-t)a + tb]\| \le (1-t)\|x - a\| + t\|y - b\| \le (1-t)d_A(x) + t\,d_A(y) + \varepsilon$.
Since ε > 0 was arbitrary, $d_A$ is a convex function.

(b) If $d_A$ is concave on X \ A, then the set $X \setminus A = \{x : d_A(x) > 0\}$ is convex. Now, let X \ A be convex. Let x, y ∈ X \ A and t ∈ [0, 1]. The open balls $B^0(x, d_A(x))$ and $B^0(y, d_A(y))$ are contained in the convex set X \ A. Thus
$X \setminus A \supset \mathrm{conv}[B^0(x, d_A(x)) \cup B^0(y, d_A(y))] \supset (1-t)B^0(x, d_A(x)) + tB^0(y, d_A(y)) = B^0((1-t)x + ty, (1-t)d_A(x) + t\,d_A(y))$.
It follows that $d_A((1-t)x + ty) \ge (1-t)d_A(x) + t\,d_A(y)$. Thus $d_A$ is concave on X \ A. □
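Both parts of Proposition 0.16 can be seen on X = R with three closed sets whose distance functions have closed forms. The sets and all function names in the sketch are assumptions chosen for illustration; `midpoint_defect` measures how far a function is from satisfying the midpoint convexity inequality at a given pair of points.

```python
from fractions import Fraction

def dist_interval(x):
    """d_A for the convex set A = [0, 1] in R."""
    return max(0 - x, x - 1, 0)

def dist_two_points(x):
    """d_A for the nonconvex closed set A = {0, 2} in R."""
    return min(abs(x), abs(x - 2))

def dist_to_complement(x):
    """d_A for A = R minus (0, 1), whose complement (0, 1) is convex."""
    return max(0, min(x, 1 - x))

def midpoint_defect(d, x, y):
    """d((x+y)/2) - (d(x)+d(y))/2; a positive value contradicts convexity,
    a negative value contradicts concavity."""
    return d((x + y) / 2) - (d(x) + d(y)) / 2
```

For A = {0, 2} the defect at x = 0, y = 2 equals 1, so d_A is not convex, in agreement with part (a). For A = R \ (0, 1) the defect is nonnegative at all pairs of points of (0, 1), as part (b) predicts.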

Support functions. Let X be a normed space with the continuous dual X∗ (the normed space of all continuous linear functionals on X). Given nonempty sets A ⊂ X∗ and B ⊂ X, we can define the following so-called support functions:
$s_A : X \to (-\infty, +\infty]$, $s_A(x) = \sup_{a^* \in A} a^*(x) = \sup x(A)$;
$\sigma_B : X^* \to (-\infty, +\infty]$, $\sigma_B(x^*) = \sup_{b \in B} x^*(b) = \sup x^*(B)$.
These two functions are easily seen to be sublinear and hence convex.
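As a small finite-dimensional sketch of the second support function, take X = R², identify a functional x∗ with a vector w via the dot product, and let B be a finite set, so that the supremum is a maximum. The set `B`, the sample `grid` and the name `support` are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

B = [(1, 0), (0, 2), (-1, -1)]   # illustrative finite subset of R^2

def support(w):
    """sigma_B(w) = sup{ <w, b> : b in B }, with functionals on R^2
    identified with vectors w through the dot product."""
    return max(w[0] * b[0] + w[1] * b[1] for b in B)

# Exact rational sample of functionals on which sublinearity can be checked.
grid = [(Fraction(i), Fraction(j, 2)) for i, j in product(range(-3, 4), repeat=2)]
```

On this sample one can verify the three defining properties of a sublinear function: `support((0, 0))` is 0, `support` is positively homogeneous, and `support` of a sum never exceeds the sum of the values.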

Minkowski functional. Let X be a vector space. By the cone hull of a set A ⊂ X we mean the set
$\mathrm{cone}(A) = \{0\} \cup \bigcup_{t > 0} tA$.
The cone hull of a nonempty set is just the set of all nonnegative multiples of the elements of the set. Observe that the cone hull of a convex set C containing 0 is the set $\bigcup_{n \in \mathbb{N}} nC$.

The Minkowski functional (or the Minkowski gauge) of a set A is the function
$p_A : X \to (-\infty, +\infty]$, $p_A(x) = \inf\{t > 0 : x \in tA\} = \inf\{t > 0 : \tfrac{x}{t} \in A\}$,
with the usual convention that inf ∅ = +∞.

Example 0.17. Let $B^0(0, r)$ and $B(0, r)$ denote the open and closed ball of radius r, centered at 0. Then $p_{B^0(0,r)}(x) = \inf\{t > 0 : \|x\| < tr\} = \inf\{t > 0 : \tfrac{\|x\|}{r} < t\} = \tfrac{1}{r}\|x\|$, and analogously $p_{B(0,r)}(x) = \tfrac{1}{r}\|x\|$.

Proposition 0.18. Let C ⊂ X be a convex set containing 0.

(a) $\mathrm{dom}(p_C) = \mathrm{cone}(C)$.
(b) $p_C$ is sublinear.
(c) $\{x : p_C(x) < 1\} \subset C \subset \{x : p_C(x) \le 1\}$.
(d) If, for each line L ⊂ span(C) through 0, C ∩ L is closed in L, then $C = \{x : p_C(x) \le 1\}$.
(e) If, for each line L ⊂ span(C) through 0, C ∩ L is open in L, then $C = \{x : p_C(x) < 1\}$.

Proof. (a),(c),(d),(e) are easy exercises. Let us prove (b). First, pC is positively homogeneous since, for each t > 0, we have

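$p_C(tx) = \inf\{\lambda > 0 : \tfrac{tx}{\lambda} \in C\} \overset{\lambda = t\mu}{=} \inf\{t\mu : \mu > 0,\ \tfrac{x}{\mu} \in C\} = t\,p_C(x)$.

It remains to show that $p_C$ is subadditive. Let $x, y \in \mathrm{dom}(p_C)$, ε > 0. Choose α, β > 0 such that $\tfrac{x}{\alpha}, \tfrac{y}{\beta} \in C$, $\alpha < p_C(x) + \varepsilon$, $\beta < p_C(y) + \varepsilon$. Then
$\frac{x+y}{\alpha+\beta} = \frac{\alpha}{\alpha+\beta} \cdot \frac{x}{\alpha} + \frac{\beta}{\alpha+\beta} \cdot \frac{y}{\beta} \in C$.

Consequently, $p_C(x + y) \le \alpha + \beta < p_C(x) + p_C(y) + 2\varepsilon$. Since ε > 0 was arbitrary, we conclude that $p_C(x + y) \le p_C(x) + p_C(y)$. □

Observation 0.19. Let C, D ⊂ X be two convex sets containing 0. If K ⊂ X is a cone such that C ∩ K ⊂ D, then $p_C(x) \ge p_D(x)$ for each x ∈ K.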
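When a convex set C containing 0 is available only through a membership test, its Minkowski functional can be approximated directly from the definition. The sketch below is one possible approach, not a canonical algorithm: since C is convex and contains 0, the set {t > 0 : x/t ∈ C} is closed under enlarging t, so bisection on t applies. The function name `gauge`, the cut-off `t_hi`, the tolerance and the example ellipse in R² are all assumptions.

```python
import math

def gauge(member, x, t_hi=1e6, tol=1e-9):
    """Approximate p_C(x) = inf{t > 0 : x/t in C} by bisection.

    Assumes C is convex, contains 0, and is given by the membership test
    `member`. Returns math.inf if x/t_hi is not in C, i.e. x is treated
    as lying outside cone(C) (a numerical cut-off, not an exact test).
    """
    def scaled(t):
        return tuple(c / t for c in x)

    if not member(scaled(t_hi)):
        return math.inf
    lo, hi = 0.0, t_hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if member(scaled(mid)):
            hi = mid      # x/mid is in C, so p_C(x) <= mid
        else:
            lo = mid      # x/mid is not in C, so p_C(x) >= mid
    return hi

def in_ellipse(p):
    """Closed ellipse x^2/4 + y^2 <= 1, with gauge sqrt(x^2/4 + y^2)."""
    return p[0] ** 2 / 4 + p[1] ** 2 <= 1
```

For the ellipse the result can be compared with the closed form $\sqrt{x^2/4 + y^2}$. For a set with empty interior, such as the segment [0, 1] × {0}, the function returns `math.inf` at points outside the cone hull, matching Proposition 0.18(a).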

Theorem 0.20. Let X be a normed space, and C ⊂ X a convex set such that 0 ∈ C. Then the following assertions are equivalent:

(i) there exists c > 0 such that $p_C(x) \le c\|x\|$ for each x ∈ cone(C);
(ii) the restriction $(p_C)|_{\mathrm{cone}(C)}$ is continuous at 0;
(iii) 0 belongs to the relative interior of C in cone(C), that is, there exists r > 0 such that $B^0(0, r) \cap \mathrm{cone}(C) \subset C$.

If moreover cone(C) is a linear set, the above equivalent conditions imply that pC is Lipschitz on X.

Proof. (i) obviously implies (ii). If (ii) holds, there exists r > 0 such that $p_C(x) < 1$ whenever $x \in B^0(0, r) \cap \mathrm{cone}(C)$. Then, by Proposition 0.18(c), $B^0(0, r) \cap \mathrm{cone}(C) \subset C$. Finally, if (iii) holds, then for each x ∈ cone(C) we have (by Observation 0.19 and Example 0.17) that $p_C(x) \le p_{B^0(0,r)}(x) = \tfrac{1}{r}\|x\|$. This proves that (i)-(iii) are equivalent.

Now, let cone(C) be a linear set, x, y ∈ cone(C). Then $p_C(x) = p_C(y + (x - y)) \le p_C(y) + p_C(x - y)$. Consequently, $p_C(x) - p_C(y) \le p_C(x - y) \le c\|x - y\|$, where c is as in (i). By interchanging the roles of x and y, we obtain that $p_C$ is c-Lipschitz on cone(C). □

As an application of Minkowski functionals, we prove the following theorem.

Theorem 0.21. Let X be a normed linear space, C ⊂ X and D ⊂ X bounded convex sets with nonempty interiors. Let the sets C, D be either both closed or both open. Then there exists a homeomorphism Φ of X onto itself such that Φ(C) = D. Moreover, given two points $x_0 \in \mathrm{int}(C)$ and $y_0 \in \mathrm{int}(D)$, the homeomorphism Φ can be taken so that it satisfies the property

Φ(x0 + tv) = y0 + tΦ(v) whenever v ∈ X, t ≥ 0.

(In particular, Φ(x0) = y0.)

Proof. The sets $C_0 = C - x_0$ and $D_0 = D - y_0$ contain 0 as an interior point. Define a mapping F : X → X by
$F(0) = 0$, $F(x) = \frac{p_{C_0}(x)}{p_{D_0}(x)}\, x$ (x ≠ 0).

Then F is positively homogeneous, and continuous on X \ {0}, since $p_{C_0}$ and $p_{D_0}$ are continuous by Theorem 0.20. To show that F is continuous at 0, observe that $D_0 \subset B(0, R)$ for some R > 0, and hence $p_{D_0} \ge p_{B(0,R)} = \tfrac{1}{R}\|\cdot\|$ (Observation 0.19). Thus $\frac{\|x\|}{p_{D_0}(x)} \le R$ (x ≠ 0), and this easily implies continuity of F at 0. A direct calculation shows that $G(y) = \frac{p_{D_0}(y)}{p_{C_0}(y)}\, y$ (y ≠ 0), G(0) = 0, is the inverse function of F. Since G is continuous, F is a homeomorphism of X. Moreover, $p_{D_0}(F(x)) = p_{C_0}(x)$ for each x ∈ X. By Proposition 0.18(d,e), $F(C_0) = D_0$. Now, it is easy to verify that the mapping $\Phi(x) := y_0 + F(x - x_0)$ has the desired properties. □

Exercises. Let C ⊂ X be a convex set, f : C → (−∞, +∞] a function.
1. Let p ≥ 1. If f is convex and nonnegative, then $f^p$ is convex, too. Is it possible to omit “nonnegative” for some integer values of p?

2. We say that f is quasiconvex if all sets $f^{-1}((-\infty, \alpha))$ and $f^{-1}((-\infty, \alpha])$ (α ∈ R) are convex. Show that a quasiconvex function is not necessarily convex. Characterize quasiconvex functions by an inequality analogous to the one in Theorem 0.11.
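Returning to the proof of Theorem 0.21, the mapping F constructed there can be written out explicitly in a simple case. The sketch assumes X = R², C₀ = [−1, 1]² (the unit ball of the sup-norm) and D₀ the closed Euclidean unit disc, whose Minkowski functionals are the two norms themselves; these choices and the function names are illustrative only.

```python
import math

def p_square(x):
    """Gauge of C0 = [-1, 1]^2, i.e. the sup-norm."""
    return max(abs(x[0]), abs(x[1]))

def p_disc(x):
    """Gauge of D0 = closed Euclidean unit disc, i.e. the Euclidean norm."""
    return math.hypot(x[0], x[1])

def F(x):
    """F(x) = (p_C0(x) / p_D0(x)) x for x != 0 and F(0) = 0."""
    if x == (0.0, 0.0):
        return (0.0, 0.0)
    s = p_square(x) / p_disc(x)
    return (s * x[0], s * x[1])

def G(y):
    """G(y) = (p_D0(y) / p_C0(y)) y for y != 0 and G(0) = 0 (inverse of F)."""
    if y == (0.0, 0.0):
        return (0.0, 0.0)
    s = p_disc(y) / p_square(y)
    return (s * y[0], s * y[1])
```

The identity p_{D₀}(F(x)) = p_{C₀}(x) from the proof says that F sends each level set of the sup-norm onto the corresponding Euclidean circle; in particular the corner (1, 1) of the square lands on the unit circle, and G undoes F up to rounding.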