Convexity, Smoothness, Duality, and Stability


Yao-Liang Yu, Machine Learning Department, Carnegie Mellon University

December 14, 2015

This note is about the interplay between convexity, smoothness, and stability, through duality. The note is still largely under construction and will be updated from time to time.

Contents

1 Topological background
2 Convex Functions
3 Uniformly convex and uniformly smooth functions

1 Topological background

In this section we collect some useful topological results. A significant portion is devoted to uniform spaces as the writer starts to appreciate this notion hence wishes to learn more about them.

Theorem 1.1: Many useful spaces are not pseudo-metrizable

The space $\{0,1\}^\omega$ is metrizable iff $|\omega| \le \aleph_0$.

Proof: The if part is easy. For the only if part, note that $\{0,1\}^\omega$ is not first countable if $|\omega| > \aleph_0$.

Now take some (pseudo) metric spaces $\{X_\gamma : \gamma \in \Gamma\}$ and consider their product $\prod_\gamma X_\gamma$. If $|\Gamma| > \aleph_0$ (and each space $X_\gamma$ contains two topologically distinct points), then the product space is not (pseudo) metrizable. As we will see, many important topological spaces can be treated as a subspace of a product of metrizable spaces (e.g. $[0,1]^J$ for some index set $J$). Theorem 1.1 suggests that these spaces may not be metrizable but nevertheless they still enjoy enough “metric” structure. So we want to study and characterize these spaces.

Definition 1.2: Topology

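A topology on a set $X$ is a collection of sets (nhoods) $\{\mathcal{U}_x \subseteq 2^X : x \in X\}$ such that:

(I). $U \in \mathcal{U}_x \implies x \in U$;

(II). If $U \in \mathcal{U}_x$, then there exists some $V \in \mathcal{U}_x$ such that $U \in \mathcal{U}_y$ for all $y \in V$;

(III). $U, V \in \mathcal{U}_x \implies U \cap V \in \mathcal{U}_x$;

(IV). $U \in \mathcal{U}_x$, $U \subseteq V \implies V \in \mathcal{U}_x$.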

A set $U$ is called open iff $U \in \mathcal{U}_x$ for all $x \in U$. Trivially $\emptyset$ and $X$ are open. A set is closed iff its complement is open. The collection $\{\mathcal{U}_x \subseteq 2^X : x \in X\}$ is called an nhood basis iff

(I). $U \in \mathcal{U}_x \implies x \in U$;

(II). If $U \in \mathcal{U}_x$, then there exists some $V \in \mathcal{U}_x$ such that for all $y \in V$ there exists some $W \in \mathcal{U}_y$, $W \subseteq U$;

(III). $U, V \in \mathcal{U}_x \implies$ there exists some $W \subseteq U \cap V$, $W \in \mathcal{U}_x$.

Enlarging the basis by including all supersets we recover the nhood. Removing the last condition we get a subbasis of the topology; we can recover the basis by taking all finite intersections.

Theorem 1.3: The closure operator, Kuratowski

Theorem 1.4: Convergence class

Deﬁnition 1.5: Uniformity

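A uniformity on a product set $X \times X$ is a collection $\mathcal{D}$ of sets $\{D \subseteq X \times X\}$ such that

(I). $D \in \mathcal{D} \implies D \supseteq \Delta := \{(x,x) : x \in X\}$;

(II). $D \in \mathcal{D} \implies D^{-1} \in \mathcal{D}$;

(III). If $D \in \mathcal{D}$, then there exists some $E \in \mathcal{D}$, $E \circ E \subseteq D$;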


(IV). $D, E \in \mathcal{D} \implies D \cap E \in \mathcal{D}$;

(V). $D \in \mathcal{D}$, $D \subseteq E \implies E \in \mathcal{D}$,

where $D^{-1} := \{(y,x) : (x,y) \in D\}$ and $D \circ E = \{(x,y) : (x,z) \in E, (z,y) \in D \text{ for some } z \in X\}$.

Uniformity is introduced to measure the “distance” between two points: There is a clear analogy between (I), (II), (III) and the definition of distance. (IV) and (V) are needed to extract a topology from a uniformity. If we omit (V) (and weaken (IV) a bit) we obtain a basis, while if we omit both (IV) and (V) we obtain a subbasis. Note that $\Delta \circ D = D \circ \Delta = D$. Using (I): $E \circ E \subseteq D \implies E \subseteq D$. Thus, if (III) is satisfied, then we can strengthen it: for all $n \in \mathbb{N}$, there exists some $E \in \mathcal{D}$ such that $\underbrace{E \circ \cdots \circ E}_{n} \subseteq D$, where w.l.o.g. we can assume $E$ is symmetric, i.e. $E = E^{-1}$. Note also that usually $\Delta \notin \mathcal{D}$. A topological space that admits a compatible (see Definition 1.7 below) uniformity will be called uniformizable. The notion of uniformity was first introduced in Weil[1937], who allegedly also invented the empty set symbol $\emptyset$ (from the Norwegian alphabet).

Alert 1.6: Intersection / union of uniformities

Unlike topology, intersection of uniformities need not be a (subbasis of) uniformity: For any $x \in [0,1]$ let $\mathcal{D}_x$ be the collection of supersets of $\Delta \cup \{(1,x)\} \cup \{(x,1)\}$. Clearly $\mathcal{D}_x$ is a uniformity but for $x \ne y$, $\mathcal{D}_x \cap \mathcal{D}_y$ is not even a subbasis: it is the collection of supersets of $\hat\Delta := \Delta \cup \{(1,x)\} \cup \{(x,1)\} \cup \{(1,y)\} \cup \{(y,1)\}$ but $\hat\Delta \circ \hat\Delta = \hat\Delta \cup \{(x,y)\} \cup \{(y,x)\}$ hence (III) in Definition 1.5 is violated for $\hat\Delta$. Clearly, intersection of uniformities is a (subbasis of) uniformity iff they all contain a common (subbasis of) uniformity. Thus, for a collection of (subbases of) uniformities, we can define the smallest uniformity that contains all of them (since their union is a subbasis of uniformity).
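The composition in Alert 1.6 can be checked by hand on a three-point set. The sketch below is only an illustration: the concrete points 0.25 and 0.5 and the helper name `compose` are assumptions made here, and the relation is restricted to the three points involved rather than all of [0, 1].

```python
def compose(D, E):
    """D o E = {(x, y) : (x, z) in E and (z, y) in D for some z}."""
    return {(x, y) for (x, z) in E for (w, y) in D if w == z}

# The point 1 together with two other (assumed) points x != y of [0, 1].
one, x, y = 1.0, 0.25, 0.5
points = [one, x, y]

diagonal = {(a, a) for a in points}
delta_hat = diagonal | {(one, x), (x, one), (one, y), (y, one)}

# Composing delta_hat with itself creates the new pairs (x, y) and (y, x).
square = compose(delta_hat, delta_hat)
extra = square - delta_hat
print(sorted(extra))
```

Every member of the intersection contains delta_hat, so its self-composition contains these extra pairs as well and cannot sit inside delta_hat.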

Deﬁnition 1.7: (Uniform) topology from uniformity

Let D be a uniformity on X × X, then its induced topology (a.k.a. uniform topology) on X is deﬁned using nhoods:

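$\forall x \in X, \quad \mathcal{D}_x := \{D_x : D \in \mathcal{D}\}, \quad D_x := \{y : (x,y) \in D\}. \quad (1)$

We easily verify that $\{\mathcal{D}_x\}_{x \in X}$ is a (basis, subbasis of) topology on $X$ if $\mathcal{D}$ is a (basis, subbasis of) uniformity on $X \times X$.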

It is apparent that there is an intimate relation between topology and uniformity, and our central question is to reveal what kind of topology is derived from a uniformity.

Example 1.8: Pseudo-metric uniformity

Let (X, d) be a pseudo-metric space, then it admits a natural uniformity whose basis is:

$\mathcal{D} := \big\{ \{(x,y) : d(x,y) < r\} \big\}_{r>0} \quad (2)$

Not surprisingly, the topology of X induced by the metric d coincides with the one induced by the above uniformity.
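As a sanity check, the basis (2) can be tested against conditions (I) to (III) of Definition 1.5 on a small finite metric space. This is a sketch under assumptions: the integer sample points, the metric |x - y|, and the helper names are chosen here for illustration, and (III) is checked with the candidate E taken at radius r/2.

```python
from itertools import product

points = [0, 3, 10, 11, 25]  # assumed finite sample with the metric |x - y|

def d(x, y):
    return abs(x - y)

def basis_set(r):
    """The basis member {(x, y) : d(x, y) < r} from (2)."""
    return {(x, y) for x, y in product(points, points) if d(x, y) < r}

def inverse(D):
    return {(y, x) for (x, y) in D}

def compose(D, E):
    """D o E = {(x, y) : (x, z) in E and (z, y) in D for some z}."""
    return {(x, y) for (x, z) in E for (w, y) in D if w == z}

def check_axioms(r):
    D, E = basis_set(r), basis_set(r / 2)
    contains_diagonal = all((x, x) in D for x in points)   # (I)
    symmetric = inverse(D) == D                            # (II)
    halves_compose = compose(E, E) <= D                    # (III), by the triangle inequality
    return contains_diagonal, symmetric, halves_compose

for r in (2, 8, 16, 30):
    print(r, check_axioms(r))
```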

Proposition 1.9: Interior preserves uniformity

If $D$ is a member of the uniformity $\mathcal{D}$, then $\operatorname{int} D \in \mathcal{D}$, where the interior is taken w.r.t. the product of the uniform topology on $X$.

Proof: Since $D \in \mathcal{D}$ there exists some symmetric $E \in \mathcal{D}$ such that $E \subseteq E \circ E \circ E \subseteq D$. We claim that $E \subseteq \operatorname{int} D$: Indeed, $(x,y) \in E \implies E_x \times E_y \subseteq E \circ E \circ E \subseteq D$.

Therefore, the collection of open symmetric sets $D \in \mathcal{D}$ is a basis of $\mathcal{D}$. It is now clear that $D \in \mathcal{D} \implies D$ is an nhood of $\Delta$. However, the converse need not hold: Consider $D := \{(x,y) \in \mathbb{R}^2 : |x-y| < \frac{1}{1+|y|}\}$, which is an nhood of the diagonal, but $D$ cannot contain any member of the basis in (2).
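To see concretely why no band {(x, y) : |x - y| < r} fits inside D, one can exhibit a point of the band that escapes D. The sketch below uses exact rational arithmetic; the function names and the particular choice y = 2/r are assumptions made for illustration.

```python
from fractions import Fraction

def in_D(x, y):
    """Membership in D = {(x, y) : |x - y| < 1 / (1 + |y|)}."""
    return abs(x - y) < 1 / (1 + abs(y))

def witness(r):
    """Return a pair with |x - y| = r/2 < r that lies outside D.

    With y = 2/r we get 1/(1 + |y|) = r/(r + 2) < r/2, so the pair escapes D.
    """
    y = 2 / r
    return (y + r / 2, y)

for r in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    x, y = witness(r)
    print(r, abs(x - y) < r, in_D(x, y))
```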

Proposition 1.10: Closed sets as intersection of open sets

A topological space $X$ is $R_0$ (i.e., for all $x, y \in X$, $x$ has an nhood not containing $y$ iff $y$ has an nhood not containing $x$) iff every closed set $A \subseteq X$ can be written as the intersection of (a family of) open supersets of $A$.

Proof: Let $X$ be $R_0$ and $A \subseteq X$ be closed. For any $x \notin A$ and $y \in A$, the open set $A^c$ contains $x$ but not $y$, hence there is an open set $U_y$ containing $y$ but not $x$. Therefore the open set $\cup_{y \in A} U_y$ contains $A$ but not $x$. Since $x$ is arbitrary, we know $A$ is the intersection of all open supersets. Conversely, suppose any closed set $A$ is the intersection of a family of open supersets. Take any $x, y \in X$ such that there is an open set $U$ that contains $x$ but not $y$. Thus the closed set $U^c = \cap_\alpha V_\alpha$ for a family of open sets $V_\alpha \supseteq U^c$, and obviously $y \in U^c$, $x \notin U^c$. Therefore, there exists some $\alpha$ such that $y \in V_\alpha$, $x \notin V_\alpha$.

Note that a topological space is T1 iﬀ it is T0 and R0. Moreover, a regular topological space is R0.

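Proposition 1.11: Closure in uniform space

For any subset $A$ in a uniform space $(X, \mathcal{D})$, $\operatorname{cl} A = \bigcap_{D \in \mathcal{D}} D_A$, where $D_A := \bigcup_{x \in A} D_x$. Similarly, for any $B \subseteq X \times X$, $\operatorname{cl} B = \bigcap_{D \in \mathcal{D}} D \circ B \circ D$.

Proof: $x \in \operatorname{cl} A \iff \forall D \in \mathcal{D}$, $D_x \cap A \ne \emptyset \iff \forall$ symmetric $D \in \mathcal{D}$, $x \in D_A$. Similarly, $(x,y) \in \operatorname{cl} B \iff \forall$ symmetric $D \in \mathcal{D}$, $(D_x \times D_y) \cap B \ne \emptyset \iff \forall$ symmetric $D \in \mathcal{D}$, $(x,y) \in D \circ B \circ D$.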

Clearly, both intersections can be restricted to any symmetric basis (but not subbasis). It follows that for any E ∈ D, cl E ⊆ E ◦ E ◦ E, hence the collection of closed symmetric sets D ∈ D is a basis of D. Thus, the uniform topology is at least regular: Let x ∈ U for any open set U , then there exists a closed symmetric set D such that x ∈ Dx ⊆ U . Since the section map is closed, Dx is closed. Since Dx is an nhood, there is an open set V such that x ∈ V ⊆ Dx ⊆ U . Since the section map is open, DA is an open nhood of A if D is open. Thus, cl A can be written as the intersection of a family of open supersets of A.

Proposition 1.12: T0 = T2 in regular space

A regular topological space is T0 iﬀ T1 iﬀ T2.

Proof: Since a regular space is R0, it is T1 iff T0. If a regular space is T1, then it can separate distinct points since a point is closed.

Therefore, the uniform topology is Hausdorff ($T_2$) iff $(\operatorname{cl}\{x\} =) \bigcap_{D \in \mathcal{D}} D_x = \{x\}$ iff $\bigcap_{D \in \mathcal{D}} D = \Delta$.

Deﬁnition 1.13: Uniform continuity

Let $(X, \mathcal{D})$ and $(Y, \mathcal{E})$ be two uniform spaces. We call the function $f : X \to Y$ uniformly continuous iff for all $E \in \mathcal{E}$, the set $\{(x,y) : (f(x), f(y)) \in E\} \in \mathcal{D}$.

Theorem 1.14: Composition preserves uniform continuity

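Let $f : (X, \mathcal{D}) \to (Y, \mathcal{E})$, $g : (Y, \mathcal{E}) \to (Z, \mathcal{F})$ be uniformly continuous, then $g \circ f$ is also uniformly continuous.

Proof: For any $F \in \mathcal{F}$,

\[ \big\{(x,y) : \big(g(f(x)), g(f(y))\big) \in F\big\} = \overbrace{\big\{(x,y) : (f(x), f(y)) \in \underbrace{\{(a,b) : (g(a), g(b)) \in F\}}_{\in \mathcal{E}}\big\}}^{\in \mathcal{D}}, \]

since both $f$ and $g$ are uniformly continuous.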

Theorem 1.15: Uniformly continuous is continuous

A uniformly continuous function $f : (X, \mathcal{D}) \to (Y, \mathcal{E})$ is continuous (w.r.t. the uniform topology).

Proof: Fix any $x$ and $f(x)$. For any nhood $E_{f(x)}$ of $f(x)$, $E \in \mathcal{E}$, the set $D := \{(y,z) : (f(y), f(z)) \in E\} \in \mathcal{D}$. Then $f(D_x) \subseteq E_{f(x)}$.

The following deﬁnitions are slight modiﬁcations from their topological counterparts (with continuity enhanced to uniform continuity).

Deﬁnition 1.16: Making functions uniformly continuous

Let $f : X \to (Y, \mathcal{E})$, then the collection of sets

$\big\{ \{(x,y) : (f(x), f(y)) \in E\} \big\}_{E \in \mathcal{E}}$

is easily verified to be a basis of uniformity. Therefore, by including all supersets we construct a coarsest uniformity on $X$ that makes $f$ uniformly continuous. Similarly, there exists a coarsest uniformity $\mathcal{W}$ on $X$ that makes a family of functions $f_\alpha : X \to (Y^\alpha, \mathcal{E}^\alpha)$ all uniformly continuous. Moreover, $f : (Z, \mathcal{F}) \to (X, \mathcal{W})$ is uniformly continuous iff $f_\alpha \circ f$ is uniformly continuous for all $\alpha$.

Deﬁnition 1.17: Subspace uniformity

Let $A$ be a subset of the uniform space $(X, \mathcal{D})$. We call $A$ a (uniform) subspace of $X$ if it is equipped with the coarsest uniformity such that the inclusion map $\iota : A \to X$, $a \mapsto a$ is uniformly continuous. More precisely, the subspace uniformity on $A$ is $(A \times A) \cap \mathcal{D} := \{(A \times A) \cap D : D \in \mathcal{D}\}$. Not surprisingly, the subspace topology on $A$ coincides with the topology induced by the subspace uniformity.

Deﬁnition 1.18: Product uniformity

Let $(Y^\alpha, \mathcal{E}^\alpha)$ be a collection of uniform spaces, then its product uniform space is defined as the product $\prod_\alpha Y^\alpha$ equipped with the coarsest uniformity such that the projections $\pi_\alpha : \prod_\alpha Y^\alpha \to Y^\alpha$ are uniformly continuous. Again, the product topology coincides with the topology induced by the product uniformity. Moreover, the function $f : (X, \mathcal{D}) \to (\prod_\alpha Y^\alpha, \prod_\alpha \mathcal{E}^\alpha)$ is uniformly continuous iff $\pi_\alpha \circ f$ is uniformly continuous for all $\alpha$.


Deﬁnition 1.19: Quotient uniformity

Definition 1.20: Topological embedding

Let $f_\alpha : X \to X_\alpha$ be a family of functions and define the evaluation map $e : X \to \prod_\alpha X_\alpha$, with $[e(x)]_\alpha = f_\alpha(x)$. We usually can choose the space $X_\alpha$, and we would like to know when the evaluation map $e$ is a topological embedding. The functions $f_\alpha : X \to X_\alpha$ separate points in $X$ iff for all $x \ne y$ in $X$ there exists some $\alpha$ such that $f_\alpha(x) \ne f_\alpha(y)$. This is equivalent to the evaluation map being 1-1. The functions $f_\alpha : X \to X_\alpha$ separate points from closed sets in $X$ iff for all $x \in X$ and disjoint closed set $A \subseteq X$ there exists some $\alpha$ such that $f_\alpha(x) \notin \operatorname{cl} f_\alpha(A)$.

Theorem 1.21: Topological (uniform) embedding

The evaluation map $e : X \to \prod_\alpha X_\alpha$ is a topological (uniform) embedding iff the functions $f_\alpha : X \to X_\alpha$ separate points and $X$ is equipped with the coarsest topology (uniformity) that makes every $f_\alpha$ (uniformly) continuous.

Proof: We only prove the uniform case. The topological case is completely analogous. First note that the evaluation map $e$ is 1-1 iff $f_\alpha$ separate points. The evaluation map is uniformly continuous iff for all $\alpha$, $D^\alpha \in \mathcal{D}^\alpha$, the sets

\[ \Big\{ (x,y) \in X \times X : \big(e(x), e(y)\big) \in \big\{ (u,v) \in \textstyle\prod_\alpha X_\alpha \times \prod_\alpha X_\alpha : (u_\alpha, v_\alpha) \in D^\alpha \big\} \Big\}, \]

which after simplification are

\[ \{(x,y) : (f_\alpha(x), f_\alpha(y)) \in D^\alpha\}, \quad (3) \]

generate a uniformity coarser than the uniformity on X. The evaluation map, being 1-1, has uniformly continuous inverse (when restricted onto its range) iﬀ for all D in the uniformity of X, the sets

$\{(u,v) \in \operatorname{Im}(e) \times \operatorname{Im}(e) : (e^{-1}(u), e^{-1}(v)) \in D\} = \{(e(x), e(y)) : (x,y) \in D\}$

generate a uniformity coarser than the product uniformity on Im(e) × Im(e). The latter has a subbasis as follows:

$\{(u,v) \in \operatorname{Im}(e) \times \operatorname{Im}(e) : (u_\alpha, v_\alpha) \in D^\alpha\} = \{(e(x), e(y)) : (f_\alpha(x), f_\alpha(y)) \in D^\alpha\}.$

Thus, e has uniformly continuous inverse iﬀ the uniformity on X is coarser than the uniformity generated by (3).

Proposition 1.22: Separating points from closed sets

A collection of continuous real-valued functions $f_\alpha$ on a topological space $X$ separates points from closed sets in $X$ iff the sets $\{f_\alpha^{-1}(V) : V \text{ open in } X_\alpha\}_\alpha$ form a base for the topology on $X$.

Proof: ⇐: Let $x \in X$ be disjoint from the closed set $A \subseteq X$. Then $x \in A^c$. Since $A^c$ is open, there exists an $\alpha$ and an open set $V$ in $X_\alpha$ such that $x \in f_\alpha^{-1}(V) \subseteq A^c$. Thus $f_\alpha(x) \in V \subseteq f_\alpha(A^c)$. Since $f_\alpha(A) \subseteq V^c$, $f_\alpha(x) \notin V^c \supseteq \operatorname{cl} f_\alpha(A)$.

⇒: Let $U$ be an open set in $X$ and $x \in U$. There exists an $\alpha$ such that $f_\alpha(x) \notin \operatorname{cl} f_\alpha(U^c) =: V^c$. Thus $f_\alpha(x) \in V$, i.e. $x \in f_\alpha^{-1}(V)$, which is open since $V$ is open and $f_\alpha$ is continuous. Since $f_\alpha(U^c) \cap V = \emptyset$, $f_\alpha^{-1}(V) \subseteq U$. Therefore the sets $\{f_\alpha^{-1}(V)\}$ form a base.


It follows that the topology on X is the coarsest topology that makes all fα continuous.

Corollary 1.23: Characterizing completely regular spaces

The topological space $X$ is completely regular iff it is endowed with the weak topology generated by all continuous (and bounded) real-valued functions.

Proof: ⇒: Since $X$ is completely regular, the class of continuous (and bounded) functions separate points from closed sets. From Proposition 1.22 we know $X$ is endowed with the weak topology.

⇐: Let $x \notin A$ for any closed set $A$. Then $x \in A^c$ hence $x \in \bigcap_{i=1}^n f_i^{-1}(V_i) \subseteq A^c$ for some open intervals $V_i$ in $\mathbb{R}$. Clearly we can take $V_i = (a_i, \infty)$ or $V_i = (-\infty, b_i)$. By changing $f_i$ to $-f_i$ we can assume $V_i = (a_i, \infty)$. By changing $f_i$ to $(f_i - a_i)_+$ we can assume $a_i \equiv 0$ and $f_i \ge 0$. Let $f = \prod_i f_i$. Then $x \in \bigcap_i f_i^{-1}(\mathbb{R}_{++}) = f^{-1}(\mathbb{R}_{++}) \subseteq A^c$. Since $f(A) = 0$, the continuous function $f$ separates $x$ from $A$.

Corollary 1.24: Certifying topological embedding

If $\{f_\alpha\}$ is a collection of continuous real-valued functions on a topological space $X$ that separate points from points and closed sets, then the evaluation map $e : X \to \prod_\alpha X_\alpha$ is an embedding. If $X$ is $T_1$, then the functions $f_\alpha$ automatically separate points since they separate points from closed sets.

Proposition 1.25: Embedding continuous functions

Let $Y$ be a continuous image of $X$, then $C(Y) \hookrightarrow C(X)$ with the pointwise topology.

Proof: Let $t : X \to Y$ be the continuous surjection, and consider the map $\psi : C(Y) \to C(X)$, $f \mapsto f \circ t$. The claim follows from the surjectivity of $t$: $\psi$ is 1-1, and $f_\alpha \to f$ iff $f_\alpha \circ t \to f \circ t$.

Frequently Y will be a (topological) subspace of X.

Theorem 1.26: Metrization Lemma

Let $\{D^n\}$ be a sequence of decreasing sets in $X \times X$ such that $D^0 = X \times X$, $(x,x) \in D^n$, and $D^{n+1} \circ D^{n+1} \circ D^{n+1} \subseteq D^n$ for all $n$ and $x \in X$. Then there is a function $d : X \times X \to \mathbb{R}_+$ such that

(I). $d(x,x) = 0$ for all $x \in X$;

(II). $d(x,y) + d(y,z) \ge d(x,z)$ for all $x, y, z \in X$;

(III). $D^n \subseteq \{(x,y) : d(x,y) < 2^{-n}\} \subseteq D^{n-1}$ for all $n$.

If each $D^n$ is symmetric, then $d$ can be chosen symmetric too.

Proof: Since $D^0 = X \times X$ the following function is well-defined on $X \times X$:

\[ f(x,y) = \begin{cases} 2^{-n}, & (x,y) \in D^{n-1} \setminus D^n \\ 0, & (x,y) \in \cap_n D^n \end{cases}. \quad (4) \]

We take the chain approximation of f:

\[ d(x,y) = \inf\Big\{ \sum_{i=0}^{n} f(x_i, x_{i+1}) : x_0 = x,\ x_{n+1} = y,\ n \in \mathbb{N} \Big\}. \quad (5) \]

Note that $d : X \times X \to [0, 1/2]$. As $(x,x) \in \cap_n D^n$, $d(x,x) = 0$ for all $x$. Clearly $d$ satisfies the triangle inequality, and it is symmetric if each $D^n$ is so.


By construction $d \le f$ hence $D^n \subseteq \llbracket f < 2^{-n} \rrbracket \subseteq \llbracket d < 2^{-n} \rrbracket$. For the other direction we first use induction to prove that

\[ \forall x_0, \ldots, x_{n+1} \in X, \quad f(x_0, x_{n+1}) \le 2 \sum_{i=0}^{n} f(x_i, x_{i+1}). \quad (6) \]

Indeed, let $a = \sum_{i=0}^{n} f(x_i, x_{i+1})$. If $a = 0$, then $(x_i, x_{i+1}) \in \cap_n D^n$ hence using $D^{n+1} \circ D^{n+1} \circ D^{n+1} \subseteq D^n$ we know $(x_0, x_{n+1}) \in \cap_n D^n$, i.e. the claim holds. So we need only consider $2^{-2} > a > 0$ in the following. Find the largest $k$ such that $\sum_{i=0}^{k} f(x_i, x_{i+1}) \le a/2$. If $k$ does not exist then the claim trivially holds, otherwise $k \le n$ since $a > 0$. Note that we can assume $\sum_{i=k+1}^{n+1} f(x_i, x_{i+1}) \le a/2$ for otherwise the claim holds again trivially. By the induction hypothesis we thus have $f(x_0, x_k) \le 2 \sum_{i=0}^{k} f(x_i, x_{i+1}) \le a$, $f(x_{k+1}, x_{n+1}) \le a$, and of course $f(x_k, x_{k+1}) \le a$ (otherwise we have nothing to prove). Let $m$ be the smallest integer such that $2^{-m} \le a$. Since $2^{-2} > a$ we know $m \ge 3$. Then $(x_0, x_k), (x_k, x_{k+1}), (x_{k+1}, x_{n+1}) \in D^{m-1}$. Using $D^{n+1} \circ D^{n+1} \circ D^{n+1} \subseteq D^n$ we know $(x_0, x_{n+1}) \in D^{m-2}$, i.e. $f(x_0, x_{n+1}) \le 2^{-m+1} = 2 \cdot 2^{-m} \le 2a$. The induction is now complete. Using (6) we know $d(x,y) < 2^{-n} \implies f(x,y) < 2^{-n+1} \implies (x,y) \in D^{n-1}$.
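The construction in (4) and (5) can be run on a small example. The sketch below is an illustration under assumptions: X is taken to be {0, ..., 29}, the sets D^n are chosen here as {(x, y) : |x - y| 3^n < 30} (which satisfy the hypotheses of Theorem 1.26), and the infimum over chains in (5) is computed as an all-pairs shortest path with Floyd-Warshall, a substitute technique that agrees with (5) on a finite set.

```python
from itertools import product

X = range(30)

def in_D(n, x, y):
    """Membership in the assumed toy sets D^n = {(x, y) : |x - y| * 3**n < 30}."""
    return abs(x - y) * 3 ** n < 30

def f(x, y):
    """The function (4): 2^-n on D^{n-1} minus D^n, and 0 on the intersection of all D^n."""
    if x == y:
        return 0.0
    n = 1
    while in_D(n, x, y):
        n += 1
    return 2.0 ** -n

# Chain approximation (5), computed as all-pairs shortest paths (Floyd-Warshall).
d = {(x, y): f(x, y) for x, y in product(X, X)}
for k in X:
    for x in X:
        for y in X:
            d[x, y] = min(d[x, y], d[x, k] + d[k, y])

def sandwich_holds(n):
    """Check (III): D^n is inside {d < 2^-n}, which is inside D^{n-1}."""
    for x, y in product(X, X):
        if in_D(n, x, y) and not d[x, y] < 2.0 ** -n:
            return False
        if d[x, y] < 2.0 ** -n and not in_D(n - 1, x, y):
            return False
    return True

print(all(sandwich_holds(n) for n in range(1, 7)))
```

All values involved are dyadic rationals, so the floating-point comparisons here are exact.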

Corollary 1.27: Metrization

A uniform space is pseudo-metrizable iff its uniformity has a countable base.

Proof: Using induction we can extract from the countable base a decreasing sequence $D^n$ that satisfies the conditions in Theorem 1.26, therefore there is a pseudo-metric whose pseudo-uniformity is equivalent to the original uniformity (see (III) in Theorem 1.26). The converse is clear from the pseudo-metric uniformity.

Theorem 1.28: Characterizing pseudo-metric uniformity

Let (X, D) be a uniform space and d : X × X → R be a pseudo-metric on X. Then d is uniformly continuous (w.r.t. the product uniformity) iﬀ D is ﬁner than the pseudo-uniformity generated by d.

Proof: A moment’s reﬂection convinces us that d is uniformly continuous iﬀ for all r > 0 there exists some D ∈ D such that

$\{((x,y),(u,v)) : (x,u) \in D \text{ and } (y,v) \in D\} \subseteq \{((x,y),(u,v)) : |d(x,y) - d(u,v)| < r\}.$

Taking the restriction $u = v = y$ we know $\{((x,y),(y,y)) : (x,y) \in D\} = \text{LHS} \subseteq \text{RHS}$. Thus, if $d$ is uniformly continuous, for all $r > 0$ there exists some $D \in \mathcal{D}$ such that $D \subseteq \{(x,y) : d(x,y) < r\}$. Conversely, for any $r > 0$, find $D \subseteq \{(x,y) : d(x,y) < r/2\}$. Then, if $(x,u) \in D$ and $(y,v) \in D$,

d(x, y) − d(u, v) ≤ d(x, u) + d(u, y) − d(u, v) ≤ d(x, u) + d(y, v) < r,

and similarly d(u, v) − d(x, y) < r. So d is uniformly continuous, and the proof is complete.
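The key estimate in the proof, |d(x, y) - d(u, v)| ≤ d(x, u) + d(y, v), can be spot-checked numerically. The sketch below assumes a particular pseudo-metric on integer pairs (distance in the first coordinate only, so distinct points can be at distance 0) and a fixed random seed; both are choices made here for illustration.

```python
import random

def d(p, q):
    """An assumed pseudo-metric on pairs: distance in the first coordinate only."""
    return abs(p[0] - q[0])

def lipschitz_bound_holds(x, y, u, v):
    """Check |d(x, y) - d(u, v)| <= d(x, u) + d(y, v)."""
    return abs(d(x, y) - d(u, v)) <= d(x, u) + d(y, v)

random.seed(0)
pts = [(random.randint(-50, 50), random.randint(-50, 50)) for _ in range(40)]
all_ok = all(lipschitz_bound_holds(*random.sample(pts, 4)) for _ in range(2000))
print(all_ok)
```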

As a consequence, the pseudo-uniformity generated by a pseudo-metric is the coarsest uniformity that makes the pseudo-metric uniformly continuous (w.r.t. the product uniformity).

Corollary 1.29: Relating uniformity to pseudo-metric

Every uniformity is the coarsest uniformity that makes a family of pseudo-metrics uniformly continuous (w.r.t. the product uniformity). Proof: By Theorem 1.26 every countable family from the uniformity on X corresponds to a uniformly continuous pseudo-metric. The coarsest claim follows from Theorem 1.28.


Theorem 1.30: Uniform topology through pseudo-metrics

Let $(X, \mathcal{D})$ be a uniform space whose uniformity is generated by a family of pseudo-metrics $d_\kappa$. Then for all $A \subseteq X$, $\operatorname{cl} A = \bigcap_\kappa \big[ \operatorname{cl}_\kappa A := \{x \in X : d_\kappa(x, A) = 0\} \big]$. Moreover, the net $x_\alpha \to x$ iff for all $\kappa$, $d_\kappa(x_\alpha, x) \to 0$.

Proof: We need only note that the balls $\{y \in X : d_\kappa(x,y) < r\}_{\kappa, x \in X, r > 0}$ constitute a subbasis of the topology on $X$.

We also note that the function $f : (Z, \mathcal{F}) \to (X, \mathcal{D})$ is uniformly continuous iff for all $\kappa$, the pseudo-metric $(x,y) \mapsto d_\kappa(f(x), f(y))$ is uniformly continuous on $Z \times Z$.

Definition 1.31: Cauchy net (in completely regular space)

A net $\{x_\gamma : \gamma \in \Gamma\}$ in a uniform space $(X, \mathcal{D})$ is called Cauchy iff for all $D \in \mathcal{D}$ there exists some $\gamma \in \Gamma$ such that $\alpha, \beta \ge \gamma \implies (x_\alpha, x_\beta) \in D$. Equivalently, if the uniformity is generated by the family of pseudo-metrics $\{d_\kappa\}$, then the net is Cauchy iff for all $\kappa$, $d_\kappa(x_\alpha, x_\beta) \to 0$. We want to emphasize that the notion of Cauchy net is well-defined in any uniform space, or equivalently any completely regular space.

Proposition 1.32: Uniformly continuous functions preserve Cauchy nets

Let f :(X, D) → (Y, E) be a uniformly continuous function, then xα is Cauchy in X =⇒ f(xα) is Cauchy in Y .

Proposition 1.33: Convergence and Cauchy

A convergent net is Cauchy, and a Cauchy net converges to each of its cluster points.

Proof: Let the pseudo-metrics $\{d_\kappa\}$ generate the uniformity. If the net $x_\alpha \to x$, then $d_\kappa(x_\alpha, x_\beta) \le d_\kappa(x_\alpha, x) + d_\kappa(x, x_\beta) \to 0$, i.e., the convergent net $x_\alpha$ is Cauchy. Let $x$ be a cluster point of the Cauchy net $x_\alpha$, i.e. for all $\kappa$, $\liminf d_\kappa(x_\alpha, x) = 0$. But for all $\epsilon > 0$, $\limsup_\alpha d_\kappa(x_\alpha, x) \le \limsup_\alpha d_\kappa(x_\alpha, x_\beta) + d_\kappa(x_\beta, x) \le \epsilon$, by choosing $\beta$ smartly.

The catch here is that a Cauchy net need not have a cluster point, and when it always does, we’d better be very serious about the underlying space.

Deﬁnition 1.34: Complete uniform space

A uniform space is complete iﬀ every Cauchy net in it has at least one cluster point, or equivalently iﬀ every Cauchy net converges to some point.

Proposition 1.35: Subspace of complete space

A closed subspace of a complete uniform space is complete. Conversely, a complete subspace of a Hausdorﬀ uniform space is closed.


Proposition 1.36: Suﬃcient condition for compacta to be uncountable

Let $X$ be a locally compact Hausdorff space. If for all $x \in X$, $\{x\}$ is not open, then $|X| > \aleph_0$.

Proof: Using the one-point compactification we can assume w.l.o.g. that $X$ is compact. The one-point sets $\{x\}$ remain non-open. Take any nonempty open set $U$ in $X$. Fix $x$. Since $\{x\}$ is not open we can choose $y \in U$, $y \ne x$. Because $X$ is Hausdorff there exist open sets $V \ni y$, $W \ni x$, $V \cap W = \emptyset$, implying that $y \in \operatorname{cl} V \not\ni x$. Of course we can make $V \subseteq U$ by intersecting the latter. Now let $f : \mathbb{N} \to X$ be any function. We prove that $f$ cannot be surjective. Indeed, let $x_n = f(n)$, $n = 1, 2, \ldots$, and we construct open sets $V_n \subseteq V_{n-1}$ such that $x_n \notin \operatorname{cl} V_n$, where $V_0 = X$. Since $X$ is compact, there exists $x \in \cap_n \operatorname{cl} V_n$, but $x \ne x_n$ for all $n$.

Clearly, if every point x in X is a limit point, then {x} is not open.

Deﬁnition 1.37: Topological group

Recall that a semi-group is a set $G$ on which we can define an associative multiplication operator $\cdot : G \times G \to G$. A group is a semi-group that has an identity so that we can also define the inverse operator $^{-1} : G \to G$. A topological group is a group equipped with a topology so that the multiplication and the inverse are continuous, or equivalently the map $(x,y) \mapsto xy^{-1}$ is continuous. Clearly, the maps $x \mapsto x^{-1}$, $x \mapsto yx$, $x \mapsto xz$, $x \mapsto yxz$ are group homeomorphisms.

Proposition 1.38: Homogeneity of topological groups

$N_e$ is an nhood basis at the identity $e$ of the group $G$ iff $xN_e$ or $N_e x$ is an nhood basis at any element $x \in G$. Therefore a topological group is locally compact (connected, path connected) if it is locally compact (connected, path connected) at the identity (or any other element).

Theorem 1.39: Continuity of group homomorphism

The group homomorphism φ : G → H between two topological groups G and H is continuous iﬀ it is continuous at a single point x ∈ G.

2 Convex Functions

Let our domain be some vector space $X$. Instead of always assuming the field to be real, we note that a complex vector space can always be treated as a real vector space: we simply “forget” the multiplication with complex numbers. Thus, we will use $X_\mathbb{R}$ to denote this “realization”. Be reminded that $X_\mathbb{R}$ is the same space (of points) as $X$ and they share the same topology and vector addition. The only difference is the scalar multiplication in $X_\mathbb{R}$ is the restriction of that of $X$ to real scalars. By definition many algebraic properties, such as convexity below, are the same in $X$ or $X_\mathbb{R}$.

Definition 2.1: Convex set

A point set C ⊆ X is called convex if

∀x, y ∈ C, [x, y] := {λx + (1 − λ)y : λ ∈ [0, 1]} ⊆ C. (7)

Proposition 2.2: Intersection and union of convex sets

Arbitrary intersections and increasing unions of convex sets are convex. Thus, $\liminf_\alpha C_\alpha = \cup_\alpha \cap_{\beta \ge \alpha} C_\beta$ is convex.

Clearly, an arbitrary union or $\limsup_\alpha C_\alpha = \cap_\alpha \cup_{\beta \ge \alpha} C_\beta$ may not be convex.

Deﬁnition 2.3: Convex function, Jensen[1905] (in Danish)

The extended real-valued function f : X → (−∞, ∞] is called convex if

∀x, y, ∀λ ∈ (0, 1), f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y). (8)

It is necessary that the (eﬀective) domain of f, i.e. dom f := {x ∈ X : f(x) < ∞}, is a convex set. We call f strictly convex iﬀ the equality in (8) holds only when x = y. According to wikipedia (https://en.wikipedia.org/wiki/Johan_Jensen_(mathematician)), Jensen (Danish) never held any academic position and proved his mathematical results in his spare time.
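Inequality (8) lends itself to a quick numerical refutation test. The sketch below is only a sampling check, so it can disprove convexity but never prove it; the grid, the tolerance `tol`, and the function names are assumptions made here.

```python
import math

def violates_convexity(f, x, y, lam, tol=1e-12):
    """True if (8) fails at (x, y, lam) by more than a small rounding tolerance."""
    return f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + tol

def convex_on_grid(f, grid, lams=(0.25, 0.5, 0.75)):
    """Return False as soon as some sampled triple violates (8)."""
    return not any(
        violates_convexity(f, x, y, lam)
        for x in grid for y in grid for lam in lams
    )

grid = [i / 4 for i in range(-12, 13)]  # sample points in [-3, 3]

print(convex_on_grid(lambda x: x * x, grid))  # the square passes the sampled check
print(convex_on_grid(math.exp, grid))         # so does the exponential
print(convex_on_grid(math.sin, grid))         # the sine fails it
```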

In the above definitions we have used the fact that $X$ is a vector space, so that we can add vectors and multiply them with (real) scalars. It is quite remarkable that such a simple definition leads to a huge body of interesting results, a tiny part of which we shall be able to present below.

Proposition 2.4: Distributive law of convex sets

Let C,D ⊆ X. Then for all α, β ≥ 0,

α(C + D) = αC + αD, (α + β)C ⊆ αC + βC. (9)

The latter also holds as equality when $C$ is convex.

Proof: We need only consider the case when $C$ is convex. Let $x, y \in C$, then $\frac{\alpha}{\alpha+\beta} x + \frac{\beta}{\alpha+\beta} y \in C$ too. Hence, $\alpha x + \beta y \in (\alpha + \beta) C$.

Letting C = {1, −1} ⊆ R shows that in general the equality may not hold for nonconvex sets.
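The counterexample C = {1, -1} can be worked out explicitly. In the sketch below the helper names `scale` and `minkowski_sum` and the choice α = β = 1 are assumptions made for illustration.

```python
def scale(t, S):
    """tS = {t * s : s in S}."""
    return {t * s for s in S}

def minkowski_sum(S, T):
    """S + T = {s + t : s in S, t in T}."""
    return {s + t for s in S for t in T}

C = {1, -1}
alpha, beta = 1, 1

lhs = scale(alpha + beta, C)                           # (alpha + beta) C
rhs = minkowski_sum(scale(alpha, C), scale(beta, C))   # alpha C + beta C

print(sorted(lhs), sorted(rhs))
```

The inclusion in (9) holds, but the right-hand side additionally contains 0 = 1 + (-1), which is not twice any element of C.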

Deﬁnition 2.5: Subadditive function

An extended real-valued function f : X → (−∞, ∞] is subadditive if for all x, y

f(x + y) ≤ f(x) + f(y). (10)

For a subadditive function f, if 0 ∈ int(dom f), then f is continuous on int(dom f) iﬀ it is continuous at the origin: By subadditivity

−f(x − y) ≤ f(y) − f(x) ≤ f(y − x).

Note that we always have f(x) + f(−x) ≥ 0. If 0 ∈ dom(f), then necessarily ∞ > f(0) ≥ 0.

We are now ready for our first important class of convex functions.

Definition 2.6: Sublinear function, always convex

A subadditive function p that is also positive homogeneous (i.e. p(tx) = tp(x) for all t ≥ 0 and x ∈ dom p) is called sublinear. If 0 ∈ dom(p), then necessarily p(0) = 0.

Theorem 2.7: Cauchy’s functional equation

The solution to Cauchy’s functional equation

∀x, y ∈ R, f(x + y) = f(x) + f(y),

if not linear, must have dense graph.

Proof: Using induction we know for all $r \in \mathbb{Q}$ and $x$, $f(rx) = r f(x)$. Let $\lambda = f(1)$ and suppose for some $z$ we have $f(z) = \lambda z + \delta$ for some $\delta \ne 0$. Fix any rational ball, i.e. the ball with rational center $(p, q)$ and rational radius $r > 0$. Let $x = p + s(z - t)$ for some rational $s$ and $t$. We have

$y = f(x) = f(p + s(z - t)) = \lambda p + s\delta + \lambda s(z - t).$

Certainly we can choose $s$ and then $t$ so that $\|(x,y) - (p,q)\| < r$, i.e., the graph of $f$ meets any rational ball.

Thus, if f is additionally continuous, or bounded on any interval, or monotone on any interval, it has to be linear. To get a solution that is not linear: take a Hamel basis of the vector space R over the ﬁeld Q, assign arbitrary values there, and extend to all of R using linearity.
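The density argument can be made concrete without a full Hamel basis by working on the rational span of 1 and √2 only. The sketch below is an illustration under assumptions: an element a + b√2 is stored as the rational pair (a, b), the additive map is chosen as f(a + b√2) = a (so λ = f(1) = 1, z = √2, δ = -√2), and `graph_point_near` follows the choice of s and t from the proof of Theorem 2.7.

```python
from fractions import Fraction
import math

SQRT2 = math.sqrt(2.0)

def f(a, b):
    """Q-linear map on Q + Q*sqrt(2): f(a + b*sqrt2) = a, so f(1) = 1 and f(sqrt2) = 0."""
    return a

def graph_point_near(p, q, denom=10**6):
    """Return a point (x, f(x)) of the graph close to the rational target (p, q)."""
    p, q = Fraction(p), Fraction(q)
    # s approximates (q - lambda*p) / delta, with lambda = 1 and delta = -sqrt(2).
    s = Fraction((float(q) - float(p)) / -SQRT2).limit_denominator(denom)
    # t is a rational approximation of z = sqrt(2), so s * (z - t) is tiny.
    t = Fraction(SQRT2).limit_denominator(denom)
    # x = p + s*(z - t) = (p - s*t) + s*sqrt(2), stored as the pair (a, b).
    a, b = p - s * t, s
    x = float(a) + float(b) * SQRT2
    y = float(f(a, b))
    return x, y

x, y = graph_point_near(Fraction(3, 10), 5)
print(x, y)
```

Although f is additive, it is not linear (f(√2) = 0 while √2·f(1) ≠ 0), and its graph comes arbitrarily close to the target (0.3, 5), which is far from the line y = x.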

Proposition 2.8: Determining linearity

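A sublinear function $p$ is linear (on its domain) iff for all $x \in \operatorname{dom} p$, $p(\lambda x) = \lambda p(x)$ for all $|\lambda| = 1$.

Proof: Indeed, using subadditivity and homogeneity

\[ p\Big(\sum_i \alpha_i x_i\Big) \le \sum_i p(\alpha_i x_i) = \sum_i \alpha_i p(x_i) \quad (11) \]

\[ -p\Big(\sum_i \alpha_i x_i\Big) = p\Big(-\sum_i \alpha_i x_i\Big) \le -\sum_i \alpha_i p(x_i). \quad (12) \]

Thus we have $p(\sum_i \alpha_i x_i) = \sum_i \alpha_i p(x_i)$.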

Deﬁnition 2.9: Balanced set

A set $A \subseteq X$ is balanced if for all $|\lambda| \le 1$, $\lambda A \subseteq A$. It is star-shaped (at the origin) if for all $\lambda \in [0,1]$, $\lambda A \subseteq A$. Clearly a star-shaped set contains the origin and a balanced set is star-shaped and symmetric. More generally, $A$ is star-shaped at $x$ if $A - x$ is star-shaped. We easily verify that a set is convex iff it is star-shaped at any of its points. All (convex) nhoods of the origin contain a (convex) balanced (open) nhood: The multiplication $(\lambda, x) \mapsto \lambda x$ is continuous at $(0, 0)$ hence for all nhood $V$, there exists $\delta > 0$ and (open) nhood $U$ such that $W := \bigcup_{|\lambda| \le \delta} \lambda U \subseteq V$ is an (open) balanced nhood. Additionally, if $V$ is convex, $\operatorname{conv} W \subseteq V$ is a convex balanced nhood. The union, intersection, convex hull, and closure of a balanced (star-shaped) set is balanced (star-shaped). So we can define the balanced hull of a set, i.e., the smallest balanced superset

\[ \operatorname{bh}(A) = \bigcup_{|\lambda| \le 1} \lambda A. \quad (13) \]

A set is called absolutely convex iﬀ it is convex and balanced. Equivalently, for all |α| + |β| ≤ 1, αA + βA ⊆ A. We can deﬁne the absolute convex hull of a set:

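\[ \operatorname{absconv}(A) := \Big\{ \sum_{i=1}^{n} \lambda_i a_i : n \in \mathbb{N},\ a_i \in A,\ \sum_{i=1}^{n} |\lambda_i| \le 1 \Big\} = \operatorname{conv}(\operatorname{bh} A) \supseteq \operatorname{bh}(\operatorname{conv} A). \quad (14) \]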

Deﬁnition 2.10: Absorbing set

A set $A \subseteq X$ is absorbing if for all $x \in X$ there exists some $r \ge 0$ such that for all $|\lambda| \ge r$ we have $x \in \lambda A$. It is weakly absorbing if $\cup_{t \ge 0} tA = X$. (Think of say $\{-1, 1\}$ in $\mathbb{R}$.) For real vector spaces, the two notions coincide for star-shaped sets. An absolutely convex set is absorbing in its linear hull (not so for a star-shaped balanced set, think of the two axes in $\mathbb{R}^2$). The superset and finite intersection of an absorbing set is absorbing. Every nhood (of origin) is absorbing: The multiplication $(\lambda, x) \mapsto \lambda x$ is continuous at $(0, x)$, hence for all nhood $V$ there exists some $\delta > 0$ and nhood $U$ such that $\lambda(x + U) \subseteq V$ for all $|\lambda| \le \delta$, in particular, $x \in \frac{1}{\lambda} V$.

Deﬁnition 2.11: Core of a set

For a set A ⊆ X, its core is deﬁned as:

core A := {x ∈ A : A − x is absorbing} = {x ∈ A : ∀d ∈ X, ∃t > 0, s.t. x + B(t)d ⊆ A}, (15)

where $B(r)$ is the ball with radius $r$ of the field. Sometimes we only need the core in the sense of the real field, i.e. the core in $X_\mathbb{R}$, hence we also define

rcore A := {x ∈ A : ∀d ∈ X, ∃t > 0, s.t. [x, x + td] ⊆ A}. (16)

Clearly, $\operatorname{int} A \subseteq \operatorname{core} A \subseteq \operatorname{rcore} A \subseteq A$, but note that unlike the interior, the definition of core does not require a topology on $X$. Note that if $A$ is (mid-point) convex then $\operatorname{core} A = \operatorname{rcore} A$: for any $d \in X$, there exists $s, t > 0$ such that $[x - 2td, x + 2td] \subseteq A$, $[x - 2sid, x + 2sid] \subseteq A$, hence $[x - (t + si)d, x + (t + si)d] \subseteq A$. However, in general $\operatorname{core} A \subset \operatorname{rcore} A$: rotate in the plane and shrink the radius to 0.

Deﬁnition 2.12: Positive homogeneous (p.h.) function

The function $p : X \to (-\infty, \infty]$ is p.h. iff for all $\lambda > 0$ and $x \in X$, $p(\lambda x) = \lambda p(x)$. P.h. functions enjoy the following properties (see Definition 2.14 for the gauge $p_B$):

• $p(0) \in \{0, \infty\}$.

• If $p \ge 0$, then $p = p_B$ for any $\{x : p(x) < 1\} \subseteq B \subseteq \{x : p(x) \le 1\}$. Thus gauges exhaust all nonnegative p.h. functions.

• $\{x : p(x) \le \lambda\} = \lambda \{x : p(x) \le 1\}$ for all $\lambda > 0$. Similarly if we use strict inequality. This property in fact characterizes nonnegative p.h., since any extended real-valued function is completely determined by its sublevel sets:

$f(x) = \inf\{\lambda : x \in A_\lambda\}$ where $A_\lambda := \{x : f(x) \le \lambda\}$. (17)

• $p$ is finite-valued iff the open “unit ball” $B_p := \{x : p(x) < 1\}$ is real absorbing (absorbing w.r.t. the real field) iff, provided $p \ge 0$, it is a gauge of a real absorbing set. Thus gauges of real absorbing sets exhaust all nonnegative finite-valued p.h. functions.

• If $p \ge 0$, $p$ is sublinear (equivalently subadditive or convex) iff the (open) closed unit ball $\bar{B}_p := \{x : p(x) \le 1\}$ is convex iff it is a gauge of a convex set. Thus gauges of (real absorbing) convex sets exhaust all nonnegative (finite-valued) sublinear functions.

• If $p \ge 0$, $p$ is symmetric iff its (open) closed ball is balanced iff it is a gauge of a balanced set. Thus gauges of balanced sets exhaust all symmetric nonnegative p.h. functions.

• $p$ is (upper) lower semicontinuous iff its (open) closed unit ball $\bar{B}_p$ is (open) closed iff, provided $p \ge 0$, it is a gauge of a (open) closed star-shaped set.

• $p$ is continuous at origin iff it is bounded on an nhood iff, provided $p \ge 0$, it is a gauge of a (star-shaped) nhood (hence finite-valued). In particular, p.h. functions continuous at origin map bounded sets into bounded intervals, and the converse holds if there exists a bounded nhood (such as any normable space). Similarly, a nonnegative sublinear function is continuous iff it is a gauge of a convex nhood. Thus gauges of convex nhoods exhaust all nonnegative continuous sublinear functions.

We mention three important positive homogeneous functions.

Definition 2.13: Seminorm

A nonnegative finite-valued sublinear function p is called a seminorm if for all λ and x, p(λx) = |λ|p(x), i.e., the open (or closed) unit ball Bp is balanced (and convex and absorbing). Thus gauges of absolutely convex absorbing sets exhaust all seminorms. A seminorm is continuous iff its open unit ball is an nhood. Thus gauges of absolutely convex nhoods exhaust all continuous seminorms. A seminorm is called a norm iff p(x) = 0 ⇐⇒ x = 0, i.e., its unit ball is bounded on each ray (or simply bounded on any normable space).

Deﬁnition 2.14: Minkowski’s gauge function

For any set A ⊆ X we associate the extended nonnegative-valued function:

pA(x) := inf{λ ≥ 0 : x ∈ λA}. (18)

• pA is always positive homogeneous, and pγA = γ⁻¹ pA for all γ > 0 (bigger ball, smaller gauge).

• pA(0) = 0 ⇐⇒ 0 ∈ A.

• A ∩ R+x ⊆ B ∩ R+x =⇒ pA(x) ≥ pB(x), in particular, pcl A ≤ pA. Conversely, if B is star-shaped (on R+x), then pA(x) > pB(x) =⇒ A ∩ R+x ⊆ B ∩ R+x.

• pA∩B ≥ pA ∨ pB, with equality if A, B are star-shaped. (If pA(x) = pB(x) then A ∩ R+x = B ∩ R+x except one of them may not include the endpoint.)

• pA∪B = pA ∧ pB.

• pA+B ≤ (pA⁻¹ + pB⁻¹)⁻¹.

• A is symmetric, i.e. λx ∈ A for all x ∈ A and |λ| = 1, =⇒ pA is symmetric, i.e. pA(λx) = pA(x) for all x and |λ| = 1. For a symmetric set A, pA = pbh(A). The converse does not hold: take A to be an arbitrary unbounded (on both ends) subset of a line Rx, then p(x) = p(−x) ≡ 0.

• A is convex =⇒ pA is subadditive: immediate consequence of Proposition 2.4. The converse is not true: take A = {−1, 0, 1}, then pA(x) = |x| is subadditive.

• pconvA ≤ pA, and the inequality can be strict: take a small ball in, say, R² and put a big triangle around it. Let A be the union of the ball and the three vertices.
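Several of the rules above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper `gauge`, the membership oracles `box` and `disc`, the cap `hi` and the tolerance `tol` are all hypothetical names introduced for illustration, and the bisection is only valid under the assumption that the set is star-shaped at 0.

```python
import math

def gauge(member, x, hi=1e6, tol=1e-10):
    """Approximate p_A(x) = inf{lam >= 0 : x in lam*A} by bisection.

    `member` is a membership oracle for A. Assumption: A is star-shaped
    at 0, so that membership of x/lam in A is monotone in lam. The cap
    `hi` is a hypothetical stand-in for +infinity.
    """
    if all(c == 0 for c in x):
        return 0.0                      # 0 lies in a star-shaped A
    if not member(tuple(c / hi for c in x)):
        return math.inf                 # x is not absorbed below the cap
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if member(tuple(c / mid for c in x)):
            hi = mid                    # x in mid*A, so p_A(x) <= mid
        else:
            lo = mid                    # x not in mid*A, so p_A(x) >= mid
    return hi

# Two star-shaped (in fact convex) sets in the plane, given as oracles.
box = lambda v: abs(v[0]) <= 2 and abs(v[1]) <= 1   # A = [-2,2] x [-1,1]
disc = lambda v: v[0] ** 2 + v[1] ** 2 <= 1         # B = closed unit disc

x = (3.0, 4.0)
p_box, p_disc = gauge(box, x), gauge(disc, x)
p_cap = gauge(lambda v: box(v) and disc(v), x)      # gauge of A ∩ B
p_cup = gauge(lambda v: box(v) or disc(v), x)       # gauge of A ∪ B
p_2disc = gauge(lambda v: disc((v[0] / 2, v[1] / 2)), x)  # gauge of 2B
```

Up to the bisection tolerance one should find p_cap ≈ max(p_box, p_disc), p_cup ≈ min(p_box, p_disc) and p_2disc ≈ p_disc / 2, matching the intersection, union and scaling rules.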

Deﬁnition 2.15: Support function

A dual notion of the gauge is the support function deﬁned on the dual space X∗:

$$ \sigma_A(x^*) := \sup_{x \in A} \langle x; x^* \rangle. \tag{19} $$

The following properties are clear:

• σA is sublinear and σA(0) = 0;

• σA+B = σA + σB, σA∪B = σA ∨ σB.

• A ⊆ B =⇒ σA ≤ σB.


• σA = σconvA.

The closed unit ball of σA will be called the polar of A, and denoted as A°. Note that A° is always convex and 0 ∈ A°. A° is absorbing iff σA is finite-valued.
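For finite sets in the plane the support function is a maximum of finitely many inner products, so the listed properties can be checked directly. This is a minimal sketch under that assumption; `support`, `minkowski_sum`, `in_polar` and the sample sets `A`, `B` are hypothetical names chosen for illustration.

```python
def support(points, xstar):
    """sigma_A(x*) = max over the finite set A of <x, x*>."""
    return max(sum(a * b for a, b in zip(p, xstar)) for p in points)

def minkowski_sum(A, B):
    """All pairwise sums a + b with a in A and b in B."""
    return [tuple(a + b for a, b in zip(p, q)) for p in A for q in B]

def in_polar(points, xstar):
    """x* lies in the polar of A iff sigma_A(x*) <= 1."""
    return support(points, xstar) <= 1

A = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
B = [(-1.0, -1.0), (1.0, 3.0)]
hull_pt = (1.0, 0.5)   # midpoint of (2,0) and (0,1), hence a point of conv A
directions = [(1.0, 0.0), (0.0, 1.0), (-1.0, 2.0), (0.5, -1.5)]

for xs in directions:
    # sigma_{A+B} = sigma_A + sigma_B
    assert abs(support(minkowski_sum(A, B), xs)
               - (support(A, xs) + support(B, xs))) < 1e-12
    # sigma_{A ∪ B} = max(sigma_A, sigma_B)
    assert support(A + B, xs) == max(support(A, xs), support(B, xs))
    # adding a point of conv A does not change sigma_A
    assert support(A + [hull_pt], xs) == support(A, xs)
```

Here `A + B` inside `support(A + B, xs)` is Python list concatenation, i.e. the union of the two point sets, not the Minkowski sum.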

Deﬁnition 2.16: Ray open / closed

We call a set A ray open/closed if for all x, R+x ∩ A is open/closed w.r.t. the inherited topology from R+x. Clearly, if A is open/closed (w.r.t. some topology of X) then it is ray open/closed.

Proposition 2.17: Reconstruct the set from gauge

Let A ⊆ X and pA(·) its gauge function.

• We always have A ⊆ {x : pA(x) ≤ 1};

• If A is star-shaped, then {x : pA(x) < 1} ⊆ A;

• If A is ray open, then A ⊆ {x : pA(x) < 1};

• If A is ray closed, then {x : pA(x)=1 } ⊆ A.

• If A is star-shaped, then A = {x : pA(x) < 1} if A is ray open, and A = {x : pA(x) ≤ 1} if A is ray closed.

Proof: We only prove the last four items. If pA(x) < 1, then either x = 0 or x ∈ λA for some 0 < λ < 1, namely x/λ ∈ A. In either case x ∈ A if A is star-shaped. If A is ray open, then for each x ∈ A there exists some δ > 0 such that (1 + δ)x ∈ A, i.e., pA(x) ≤ 1/(1 + δ) < 1. If pA(x) = 1, then there exists λn → 1 such that x/λn ∈ A. If A is ray closed, then x ∈ A. Finally, we note that for a star-shaped set A = ∪λ∈[0,1] λA, and {x : pA(x) = 0} ⊆ A.

Proposition 2.18: Interior/closure preserve convexity

For any star-shaped (convex) set A, core A, rcore A, int A and cl A are star-shaped (convex).

Proof: Let A be convex and x, y ∈ core A. Fix any λ ∈ (0, 1). For every direction d, there exist t, s > 0 such that x + B(t)d, y + B(s)d ⊆ A, where B(r) is the centered ball of radius r of the underlying field. Then λx + (1 − λ)y + B(λt)d ⊆ λ[x + B(t)d] + (1 − λ)[y + B(s)d] ⊆ A, i.e. λx + (1 − λ)y ∈ core A. Let A be convex and x, y ∈ int A, i.e. x + V ⊆ A, y + W ⊆ A for some nhoods V, W. Taking the nhood U = V ∩ W we know x + U ⊆ A, y + U ⊆ A. Using convexity we know λx + (1 − λ)y + U ⊆ λx + (1 − λ)y + λU + (1 − λ)U = λ(x + U) + (1 − λ)(y + U) ⊆ A. Thus [x, y] ⊆ int A. Let A be convex and x, y ∈ cl A. Then there exist nets xα → x, yβ → y in A. But then A ∋ λxα + (1 − λ)yβ → λx + (1 − λ)y, i.e. [x, y] ⊆ cl A. The star-shaped case is similar and omitted.

In fact, for the star-shaped case we can interpret the interior and closure as taking on each ray. This avoids putting any topology on X. Note that core(core A) ⊆ core A, with equality if A is convex: for any d, e and x ∈ core A, there exists t > 0 such that x + B(2t)d, x + B(2t)e ⊆ A, hence x + B(t)d + B(t)e ⊆ A, i.e. x ∈ core core A. The equality can fail in general, for instance when core A is a singleton. Similar for the real core.


Proposition 2.19: Topological cancellations of convex sets

If x ∈ cl C and y ∈ int C for a convex set C, then (x, y] := {λx + (1 − λ)y : λ ∈ [0, 1)} ⊆ int C. Moreover, int cl C = int C and cl int C = cl C, provided that int C ≠ ∅.

Proof: Since y ∈ int C, there exists an nhood V such that y + V ⊆ C. Then λx + (1 − λ)y + (1 − λ)V = λx + (1 − λ)(y + V) ⊆ C. As λ < 1, (1 − λ)V is also an nhood. The proof for (x, y] ⊆ int C is now complete. For the second claim let x ∈ cl C and y ∈ int C (whose existence is assumed), then we can choose points in (x, y] to converge to x. This proves cl int C = cl C. On the other hand, let x ∈ int cl C and y ∈ int C ⊆ int cl C. For any λ ∈ (0, 1) we have x − λ(x − y) = (1 − λ)x + λy ∈ int C, while for λ sufficiently small we have x + λ(x − y) ∈ int cl C ⊆ cl C. Thus x = ½(x − λ(x − y)) + ½(x + λ(x − y)) ∈ int C. This proves int cl C = int C.

We can of course replace the interior with the relative interior (and replace the closure with the relative closure, too). In a finite dimensional space we always have ri C ≠ ∅ for a nonempty convex set C: there exist affinely independent vectors x1, ..., xd+1 in C, but their convex hull is a non-degenerate simplex hence contains an interior point. It also follows that a (nonempty) convex set in a finite dimensional space is closed iff it is ray closed. For infinite dimensional spaces, ri C can be empty and the second claim above may not hold: Take for instance a (real) infinite dimensional Fréchet space X (locally convex complete metrizable TVS) and a discontinuous linear functionalᵃ f. Let C := {x ∈ X : −1 ≤ f(x) ≤ 1}. Clearly C is convex and 0 ∈ core C. Since C is symmetric convex and 0 ∉ int C (otherwise f would be continuous), int(C) = ∅. However, int cl C = core cl C ∋ 0, see the comment of Corollary 2.32. Note also that the convex set C is not closed but ray closed.

ᵃ Constructed as follows: take a countable subset {x1, x2, ...} from a Hamel basis, a countable decreasing nhood basis V1, V2, ..., and nonzero real numbers t1, t2, ... so that tixi ∈ Vi for all i. Consider the linear functional f(xi) = 1/ti and f(y) = 0 for other elements in the Hamel basis. Clearly tixi → 0 but f(tixi) ≡ 1. This construction shows that the continuous dual of any infinite dimensional metrizable TVS is strictly smaller than its algebraic dual.

The gauge function is also useful in proving the following result:

Theorem 2.20: Compact convex sets are homeomorphic

Compact convex sets with nonempty interior in R^d are homeomorphic.

Proof: W.l.o.g., we assume C is a compact convex set with zero in its interior. We prove that C is homeomorphic to the closed unit ball B̄ of the underlying normed space. We define the (scaling) map s using the gauge function:

$$ s(z) := \begin{cases} \dfrac{p_C(z)}{\|z\|}\, z, & z \in C \setminus \{0\} \\ 0, & z = 0 \end{cases} \tag{20} $$

Clearly, s maps C into B̄, continuously (since C is a convex nhood, the gauge pC is a continuous sublinear function). Because C is compact, pC(z) = 0 iff z = 0. If z1 ≠ z2 are not on the same ray, we clearly have s(z1) ≠ s(z2). If z1 ≠ z2 are on the same ray, again we have s(z1) ≠ s(z2) since C is star-shaped (and compact). This proves the map s is 1-1. Since 0 ∈ core C and C is compact and star-shaped, pC(·) takes values in [0, 1] when restricted to the intersection of C with any direction (with both endpoints attainable). By the intermediate value theorem the map s is onto. To summarize, we have constructed a continuous bijection s from the compact space C onto the Hausdorff space B̄. The inverse of s is automatically continuous.
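To make the scaling map (20) concrete, here is a minimal sketch for one particular choice, assumed purely for illustration: C is the square [−1, 1]² in the Euclidean plane, whose gauge is the max-norm. The names `gauge_square`, `s` and `s_inv` are hypothetical.

```python
import math

def gauge_square(z):
    """Gauge of C = [-1, 1]^2, which is the max-norm."""
    return max(abs(z[0]), abs(z[1]))

def s(z):
    """Scaling map (20): rescale z radially so that ||s(z)||_2 = p_C(z)."""
    r = math.hypot(z[0], z[1])
    if r == 0.0:
        return (0.0, 0.0)
    k = gauge_square(z) / r
    return (k * z[0], k * z[1])

def s_inv(w):
    """Inverse map on the closed unit disc: rescale so that p_C = ||w||_2."""
    g = gauge_square(w)
    if g == 0.0:
        return (0.0, 0.0)
    k = math.hypot(w[0], w[1]) / g
    return (k * w[0], k * w[1])

corner = s((1.0, 1.0))            # a corner of the square lands on the unit circle
edge_mid = s((1.0, 0.0))          # edge midpoints are fixed by s
round_trip = s_inv(s((0.3, -0.8)))
```

Each ray from the origin is mapped to itself, which is why injectivity and surjectivity in the proof reduce to one-dimensional statements along rays.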

The theorem does not hold for compact star-shaped sets, if we recall the following convenient rule:

For any n, homeomorphic sets have homeomorphic sets of points that cut them into n (path) connected components.


Now the compact star-shaped set can be cut into 2 connected components by a single point while the ball cannot. The crucial property we needed in the proof (but missing for the star-shaped set above) is the continuity of the gauge function pC. Thus, compact sets (upon translation) that are star-shaped, with 0 in the real core, and having continuous gauge (restricted to the set) are homeomorphic, such as the ℓp ball with 0 < p < 1. We have written our proof in a way to seemingly suggest that the closed ball of any normed space is compact. This is of course not true, and the catch is that no compact sets in an infinite dimensional Hausdorff TVS can have nonempty interior. In other words, we have accidentally proved: A normed space has a compact nhood iff it is finite dimensional.

We need a technical lemma that is very interesting in its own right.

Proposition 2.21: Closed sets are zeros of smooth functions

Let A ⊆ R^d be a closed set, then there exists a C^∞ function f : R^d → R+ such that f⁻¹(0) = A.

Proof: For each x ∈ A^c, take the ball Bx := B(x, ½ d(x, A)), where d(x, A) = min_{a∈A} ‖x − a‖. Since A is closed, d(x, A) > 0 hence Bx ∩ A = ∅. The set of balls {Bx}x∈A^c is an open covering of the metric space A^c, hence there exists a partition of unity {ϕx}x∈A^c. On each ball Bx consider the C^∞ function fx : R^d → R+ defined as

$$ f_x(z) = \begin{cases} \exp\left( \dfrac{1}{\|z - x\| - \tfrac{1}{2} d(x, A)} \right), & z \in B_x \\ 0, & \text{otherwise.} \end{cases} $$

Putting f(z) = Σ_{x∈A^c} ϕx(z) fx(z) completes our proof.

Clearly closedness of the set is necessary as the zeros of any continuous function is closed.

Theorem 2.22: Open star-shaped sets are C∞-diﬀeomorphic

Every open star-shaped set Ω in R^d is C^∞-diffeomorphic to R^d.

Proof: W.l.o.g. assume 0 ∈ Ω and Ω is star-shaped at 0. Thanks to Proposition 2.21 there exists a C^∞ function φ : R^d → R+ with Ω^c = φ⁻¹({0}). Define the function f : Ω → R^d as

$$ f(x) = \underbrace{\left[ 1 + \left( \int_0^1 \frac{d\nu}{\phi(\nu x)} \right)^2 \|x\|_2^2 \right]}_{\lambda(x)} \cdot x = \left[ 1 + \left( \int_0^{\|x\|_2} \frac{dt}{\phi(t x / \|x\|_2)} \right)^2 \right] \cdot x, \tag{21} $$

which clearly is C^∞ on Ω. Take two points x1, x2 ∈ Ω. If x1 and x2 are not on the same ray then clearly f(x1) ≠ f(x2) since λ(x) ≥ 1. On the other hand, if x1 and x2 are on the same ray, again f(x1) ≠ f(x2) since λ(sx) is an increasing function of s (better seen from the second equality above). Therefore f is 1-1. Since Ω is star-shaped at 0 we know the interval [0, x/pΩ(x)) ⊆ Ω for any x ∈ Ω. Consider the C^∞ function g(s) := f(sx) where s ∈ [0, 1/pΩ(x)). Obviously g(0) = 0. If pΩ(x) = 0, then clearly g(s) is onto R+x (by the intermediate value theorem). If pΩ(x) > 0, then φ(x/pΩ(x)) = 0 since x/pΩ(x) ∈ Ω^c. By the mean value theorem we know

$$ \left| \phi\!\left( \frac{x}{p_\Omega(x)} \right) - \phi\!\left( \frac{\nu x}{p_\Omega(x)} \right) \right| = \phi\!\left( \frac{\nu x}{p_\Omega(x)} \right) \le M (1 - \nu) $$

for some constant M (that does not depend on ν, since |φ′| attains its maximum on the interval [0, x/pΩ(x)]). Therefore λ(sx) diverges to infinity as sx → x/pΩ(x). Thanks again to the intermediate value theorem we know g(s) maps again onto R+x. In summary, f is onto.

To show f⁻¹ is C^∞ we need only show f′ never vanishes, thanks to the inverse function theorem. Suppose to the contrary there exists some h ≠ 0 such that

$$ f'(x) h = \lambda(x) h + \langle \lambda'(x), h \rangle \, x = 0. $$

Then x ≠ 0 and h = µx for some µ ≠ 0. Plugging back in we obtain λ(x) + ⟨λ′(x), x⟩ = 0 = λ(x) + dλ(sx)/ds |_{s=1}, which is impossible as λ(x) ≥ 1 and λ(sx) is increasing w.r.t. s.

This result seems to be folklore, but the beautiful proof here is from page 60 of Gonnord & Tosel’s book "Calcul Diﬀérentiel", ellipses, 1998.

Theorem 2.23: Nullity of the boundary of convex sets, Lang[1986]

The boundary of a convex set in R^d is null w.r.t. every product measure µ := ⊗_{i=1}^{d} µi on the Borel field with non-atomic σ-finite marginals µ1, ..., µd.

Proof: Fix the convex set C ⊆ R^d. First note that the boundary ∂C = cl C \ int C is closed hence Borel. Let M := {B ⊆ R^d Borel : µ(B ∩ ∂C) ≤ (1 − 3^{−d})µ(B)}. We claim that all rectangles are in M. Indeed, let A = ∏_{i=1}^{d} (ai, bi]. W.l.o.g. assume µi is finite, and using non-atomicness we can find ai < xi < yi < bi such that µi((ai, xi)) = µi((xi, yi)) = µi((yi, bi)) = 3^{−1} µi((ai, bi]). Thus, we partition the rectangle A into 3^d open rectangles of equal measure. If cl C meets all of the 2^d corner open rectangles, then the center open rectangle is in int C because of convexity; otherwise some corner open rectangle misses cl C altogether. Either way we have µ(A ∩ ∂C) ≤ (1 − 3^{−d})µ(A), proving our claim that all rectangles are in M. As M is clearly closed upon taking countable unions, we know M is exactly the Borel sets (finite unions of rectangles form an algebra). Therefore, µ(∂C) ≤ (1 − 3^{−d})µ(∂C), i.e., µ(∂C) = 0.

The same proof works for “order solid” sets, i.e., x, y ∈ C =⇒ z ∈ C for all x ≤ z ≤ y.

Corollary 2.24: Measurability of convex sets, Lang[1986]

Convex sets in Rd are measurable w.r.t. every complete σ-ﬁnite product measure on the Borel ﬁeld.

Proof: Note that every σ-finite measure ν on the Borel field of a separable metric space can be written as

$$ \nu = \sum_{k=1}^{\infty} \nu_k + \mu, $$

where νk concentrates on a single point and µ is non-atomic. Therefore the product σ-finite measure can be written as a countable sum of

$$ \bigotimes_{i=1}^{d_0} \nu_i \times \bigotimes_{j=d_0+1}^{d} \mu_j. $$

Since νi concentrates on say xi, C is measurable iff its section C_{x1,...,x_{d0}} is measurable. This is indeed so since the section remains convex hence its boundary is null.

In particular, convex sets are Lebesgue measurable hence have volume. Another proof: For any x ∈ ∂C, half of the open ball centered at x does not meet C due to convexity. By the Lebesgue density theorem we know the boundary is null. The same proof again works for “order solid” sets.


Corollary 2.25: Measure of convex sets

For any convex set C ⊆ R^d, we have µ(C) = µ(cl C) = µ(int C) w.r.t. every complete σ-finite non-atomic product measure µ on the Borel field.

This result is not true for non-convex sets: Take the complement of a Cantor set with Lebesgue measure 1 > ε > 0; its closure has full measure 1 while its interior has measure 1 − ε.

Corollary 2.26: Measurability of monotone functions, Lang[1986]

A monotone function f : Rd → Rm is (Borel) measurable w.r.t. every complete product σ-ﬁnite measure on the Borel ﬁeld. Proof: We need only verify the measurability of the “order solid” set [[a ≤ f ≤ b]].

Thus, monotone functions are Lebesgue measurable.

Example 2.27: Convex sets need not be Borel

The union of the open unit ball and any non-measurable subsetᵃ of the unit sphere is a non-Borel convex set. This convex set is also non-measurable w.r.t. the uniform distribution on the sphere (which is complete, non-atomic but not product).

ᵃ To construct such a set, take the unit interval [0, 1] and consider all equivalence classes [r] := {x ∈ R : x − r ∈ Q}. There are uncountably many such sets [r] and each of them has at least one representative in [0, 1]. Let V ⊆ [0, 1] be the set such that V ∩ [r] is a singleton for all r ∈ R. Yes, here we need the axiom of choice for the existence of V. Let q1, q2, ... enumerate the rationals in [−1, 1], then [0, 1] ⊆ ∪_k (V + qk) ⊆ [−1, 2]. But the sets V + qk are disjoint translates of V, hence we cannot assign a measure to V. To extend the construction to the unit sphere, consider a homeomorphism from the unit interval to the sphere.

Theorem 2.28: Bounded convex functions are locally Lipschitz

Let a convex function f : X → (−∞, ∞] be lower bounded by m on a set A and upper bounded by M on A + W, where W is a star-shaped nhood. Then

∀u, v ∈ A, |f(u) − f(v)| ≤ (M − m) · pW (u − v). (22)

Proof: Let u, v ∈ A, and y = u + (u − v)/(α + δ), where α = pW(u − v) and δ > 0 is arbitrary. Then pW(y − u) = α/(α + δ) < 1, hence y − u ∈ W according to Proposition 2.17 (item II). Therefore, y ∈ u + W ⊆ A + W. Clearly, u = ((α + δ)/(α + δ + 1)) y + (1/(α + δ + 1)) v. Using the boundedness assumption and convexity:

$$ f(u) - f(v) \le \frac{\alpha+\delta}{\alpha+\delta+1} \, [f(y) - f(v)] \le (\alpha+\delta)(M - m) = (M - m) \cdot p_W(u - v) + \delta (M - m). $$

The proof is complete since δ > 0 is arbitrary, and we can swap u and v.
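The estimate (22) can be sanity-checked on the real line. The sketch below assumes X = R, A = [−a, a] and W = (−r, r), so that pW(u) = |u|/r; the helper `lipschitz_gap` is a hypothetical name, and m and M are only estimated on grids (the maximum of a convex function over the closed interval is attained at an endpoint, which the grid contains).

```python
import math

def lipschitz_gap(f, a, r, n=200):
    """Return (m, M, worst) for A = [-a, a] and W = (-r, r) on the real line.

    m is the grid minimum of f on A, M the grid maximum of f on the closure
    of A + W, and worst is the largest value of
        |f(u) - f(v)| - (M - m) * |u - v| / r
    over grid points u, v in A. Bound (22) predicts worst <= 0.
    """
    A = [-a + 2 * a * i / n for i in range(n + 1)]
    AW = [-(a + r) + 2 * (a + r) * i / n for i in range(n + 1)]
    m = min(f(u) for u in A)
    M = max(f(u) for u in AW)
    worst = max(abs(f(u) - f(v)) - (M - m) * abs(u - v) / r
                for u in A for v in A)
    return m, M, worst

m_sq, M_sq, worst_sq = lipschitz_gap(lambda t: t * t, a=1.0, r=1.0)
m_exp, M_exp, worst_exp = lipschitz_gap(math.exp, a=1.0, r=0.5)
```

For f(t) = t² this gives the constant M − m = 4, whereas the true Lipschitz constant on [−1, 1] is 2; the bound is explicit but generally not tight.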

Conveniently, sometimes we need only verify the upper boundedness in Theorem 2.28.

Proposition 2.29: Upper bounded implies lower bounded

For a convex function f : X → (−∞, ∞], it is automatically lower bounded on a set A if: (I). A = −A and f is upper bounded on A; or (II). A is bounded, and f is upper bounded on a star-shaped nhood of A.

Proof: Firstly, if f is upper bounded by M on the symmetric set A, then we have

∀x ∈ A, f(x) ≥ 2f(0) − f(−x) ≥ 2f(0) − M > −∞. (23)


Secondly, let f be upper bounded by M on A + V for some star-shaped nhood V, and A be bounded. Then A − A ⊆ λV for some sufficiently large λ > 0. Therefore, for any x, y ∈ A, we have pV(y − x) ≤ λ. Fix an arbitrary δ > 0 and use convexity:

$$ f(y) \le \frac{\lambda+\delta}{\lambda+\delta+1} \, f\!\left( y + \frac{1}{\lambda+\delta}(y - x) \right) + \frac{1}{\lambda+\delta+1} \, f(x). $$

Note that pV((y − x)/(λ + δ)) ≤ λ/(λ + δ) < 1, hence y + (y − x)/(λ + δ) ∈ A + V since V is star-shaped. Therefore f(x) ≥ (λ + δ + 1)f(y) − (λ + δ)M. As δ > 0 and y ∈ A are arbitrary, we have

$$ \forall x \in A, \quad f(x) \ge \lambda \Big[ \sup_{y \in A} f(y) - \sup_{z \in A+V} f(z) \Big] + \sup_{y \in A} f(y) > -\infty, \tag{24} $$

where recall that λ is the smallest positive number such that A − A ⊆ λV .

In both cases we obtain explicit estimates of the lower bound (cf. the numbered Eqs). Clearly, lower bounded does not imply upper bounded for convex functions (think of the hinge loss). To see the necessity of A = −A in (I) or A bounded in (II), think of a linear function capped on the right.

Theorem 2.30: Continuity of convex functions

Let f : X → (−∞, ∞] be a convex function and x ∈ int dom f be arbitrary. Then the following are equivalent:

(I). f is continuous on int dom f;

(II). f is continuous at x;

(III). f is upper semicontinuous at x;

(IV). f is upper bounded on an nhood of x.

Proof: Clearly, (I) =⇒ (II) =⇒ (III) =⇒ (IV). (IV) =⇒ (II): W.l.o.g. we can take a symmetric nhood U + x such that f is upper bounded and U ⊇ V + W where V and W are balanced nhoods. Then the continuity of f at x follows from Theorem 2.28 and item (I) of Proposition 2.29. (Recall that the p.h. function pV is continuous at origin iﬀ V is an nhood.) (IV) =⇒ (I): Suppose f is upper bounded on x + V ⊆ dom f for some balanced nhood V . We show (IV) holds for any z ∈ int dom f. Find w ∈ int dom f such that z = λx + (1 − λ)w for some λ ∈ (0, 1]. Take U = λV . Clearly, for all v ∈ V ,

f(z + λv) = f(λ(x + v) + (1 − λ)w) ≤ λf(x + v) + (1 − λ)f(w).

Thus f is upper bounded on z + U ⊆ dom f.

Corollary 2.31: Continuity of convex functions: ﬁnite dimensional case

Any convex function on a ﬁnite dimensional space is continuous on the interior of its domain. Proof: In ﬁnite dimensional spaces, the simplex around a point is an nhood. Thus due to convexity the function is upper bounded at the simplex nhood of each interior point of its domain.

Corollary 2.32: Continuity of convex functions: lower semicontinuous case

Any lower semicontinuous (l.s.c.) convex function on a barrelled space is continuous on the interior, or equivalently the core, of its domain.


Proof: Fix x ∈ int dom f (or the core) and choose t such that f(x) < t. Consider the sublevel set C = {y : f(x + y) ≤ t}, which is closed, nonempty, and convex. Thanks to Corollary 2.31, the convex restriction g(λ) := f(x + λy) is continuous at 0 for any y, hence upper bounded by t on some nonempty open set |λ| < δ. Thus C is absorbing hence an nhood. It follows that f is bounded on an nhood of x and Theorem 2.30 applies.

Recall that a topological vector space is barrelled iff every closed (absolutelyᵃ) convex absorbing set (i.e. a barrel) is an nhood. Every Baire TVS (e.g. F-space, complete metrizable TVS) is barrelled: Let V be a closed convex absorbing set. Thus X = ∪_{t>0} tV = ∪_n nV. By Baire's theorem one of nV hence V has nonempty interior, say x + U ⊆ V. Using absorbing again we know −x/n ∈ V for some n > 0. Then (1/(n+1)) U = (1/(n+1))(x + U) + (n/(n+1))(−x/n) ⊆ V is an open nhood of 0. In summary, for a closed convex set C in a barrelled space, core C = int C. The l.s.c. assumption cannot be dropped even on Hilbert spaces: there exist discontinuous linear functionals.

ᵃ The usual definition also requires the set to be balanced, which is motivated by the unit balls of l.s.c. seminorms. To see balanced is unnecessary, let C be closed convex absorbing and consider the set C̃ := {x ∈ C : λx ∈ C for all |λ| ≤ 1}, which we easily verify to be a balanced closed convex absorbing subset of C. Translating to functions, every (l.s.c.) nonnegative finite-valued sublinear function f is “equivalent” to a (l.s.c.) seminorm: let p(x) := max{f(λx) : |λ| ≤ 1}, then f(x) ≤ p(x) ≤ f(x) + f(ix) (or f(x) + f(−x) for the real field).

Theorem 2.33: Continuity of convex functions: real case

Any convex function on R is upper semicontinuous (u.s.c.) when restricted to its domain.

Proof: The domain of any convex function f on R is an interval (a, b) with possibly one or both endpoints included. By Corollary 2.31 f is continuous on (a, b). Convexity also demands f(a) ≥ lim sup_{x↓a} f(x). Indeed, for any λ ∈ [0, 1], f(λa + (1 − λ)x) ≤ λf(a) + (1 − λ)f(x). Similar arguments apply to the other endpoint.

One needs to interpret this result carefully: f as an extended-valued function on the whole space R need not be u.s.c, as we can approach from the inﬁnity side. Still, this result is nontrivial: consider the solid disk in R2; put the interior to 0, half of its boundary to 2, and the rest to 1.

Theorem 2.34: Directional derivative

Let f : X → (−∞, ∞] be convex and x ∈ core dom f. Then for all d ∈ X the univariate function t ↦ (f(x + td) − f(x))/t is finite and increasing on the interval [−δ, 0) ∪ (0, δ] for some δ > 0 (that may depend on x and d). Thus, the right directional derivative

$$ f'_+(x; d) := \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t} \ge \lim_{t \uparrow 0} \frac{f(x + td) - f(x)}{t} =: f'_-(x; d) = -f'_+(x; -d) \tag{25} $$

is a well-defined finite sublinear function of d. Likewise, the left derivative f′₋ is positive homogeneous and superadditive.

Proof: The existence of δ > 0 so that f(x + td) < ∞ on [−δ, 0) ∪ (0, δ] follows from the assumption x ∈ core dom f. Let 0 < s ≤ t ≤ δ, then 0 < s/t ≤ 1 and using convexity

$$ x + sd = \tfrac{s}{t}(x + td) + \left(1 - \tfrac{s}{t}\right) x \implies f(x + sd) - f(x) \le \tfrac{s}{t} \, [f(x + td) - f(x)]. $$

Thus, the increasing property on (0, δ] is clear. The proof for the other interval [−δ, 0) is analogous. Due to monotonicity and sandwiching the directional derivatives are well-defined and finite. Their positive homogeneity is clear. To see the subadditivity, use convexity again: for t > 0,

$$ \frac{f(x + t(d_1 + d_2)) - f(x)}{t} \le \frac{f(x + 2t d_1) - f(x)}{2t} + \frac{f(x + 2t d_2) - f(x)}{2t}. $$

Taking the limit t ↓ 0 shows the subadditivity of f′₊.
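A small numerical illustration of Theorem 2.34 follows. It is a sketch under assumptions chosen only for illustration: X = R², the convex function f(v) = |v₁| + v₂² (nondifferentiable along the line v₁ = 0), the base point x = (0, 1) and the direction d = (1, 1). The helper `quotient` is a hypothetical name.

```python
def quotient(f, x, d, t):
    """Difference quotient (f(x + t d) - f(x)) / t for t != 0."""
    shifted = [xi + t * di for xi, di in zip(x, d)]
    return (f(shifted) - f(x)) / t

f = lambda v: abs(v[0]) + v[1] ** 2      # convex, kink along v[0] = 0
x, d = [0.0, 1.0], [1.0, 1.0]
neg_d = [-di for di in d]

# The quotient is increasing in t across negative and positive values.
ts = [-0.5, -0.1, -0.001, 0.001, 0.1, 0.5]
qs = [quotient(f, x, d, t) for t in ts]

# One-sided derivatives approximated with a small step (here 1e-6).
right = quotient(f, x, d, 1e-6)           # approximates f'_+(x; d)
left = quotient(f, x, d, -1e-6)           # approximates f'_-(x; d)
right_neg = quotient(f, x, neg_d, 1e-6)   # approximates f'_+(x; -d)
```

For this f a direct computation gives quotient 3 + t for t > 0 and 1 + t for t < 0, so f′₊(x; d) = 3 > 1 = f′₋(x; d) = −f′₊(x; −d), in line with (25).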

Deﬁnition 2.35: Bregman divergence, Bregman[1967]

The Bregman divergence induced by a convex function f is: for all y ∈ dom f, x ∈ core dom f,

$$ D_f(y, x) := f(y) - f(x) - f'_+(x; y - x) \ge \frac{t f(y) + (1 - t) f(x) - f(ty + (1 - t)x)}{t} \ge 0, \tag{26} $$

where t > 0 is any sufficiently small number. The Bregman divergence need not be symmetric (i.e. Df(y, x) ≠ Df(x, y)) or satisfy the triangle inequality (i.e. Df(y, x) ≤ Df(y, z) + Df(z, x)). In fact, it is defined over the asymmetric product dom f × core dom f.
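Two standard instances of (26) are sketched below, assuming f is differentiable at x so that f′₊(x; y − x) is the inner product of the gradient with y − x. The names `bregman`, `sq`, `negent` and `kl` are hypothetical and introduced only for this illustration.

```python
import math

def bregman(f, grad, y, x):
    """D_f(y, x) = f(y) - f(x) - <grad f(x), y - x> (f differentiable at x)."""
    inner = sum(g * (yi - xi) for g, yi, xi in zip(grad(x), y, x))
    return f(y) - f(x) - inner

# Half squared Euclidean norm: D_f(y, x) = 0.5 * ||y - x||^2 (symmetric).
sq = lambda v: 0.5 * sum(c * c for c in v)
sq_grad = lambda v: list(v)

# Negative entropy on the positive orthant: D_f is the (generalized)
# Kullback-Leibler divergence, which is not symmetric.
negent = lambda v: sum(c * math.log(c) for c in v)
negent_grad = lambda v: [math.log(c) + 1.0 for c in v]

def kl(y, x):
    """Kullback-Leibler divergence between probability vectors y and x."""
    return sum(yi * math.log(yi / xi) for yi, xi in zip(y, x))

y, x = [0.5, 0.5], [0.9, 0.1]
d_sq = bregman(sq, sq_grad, y, x)
d_yx = bregman(negent, negent_grad, y, x)
d_xy = bregman(negent, negent_grad, x, y)
```

With these probability vectors d_yx and d_xy differ, which illustrates the asymmetry noted above, while d_sq reduces to half the squared distance.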

Theorem 2.36: Weak convexity of the Bregman divergence

Let f be a convex function and y ∈ dom f. Then for all x ∈ core dom f and 0 ≤ λ ≤ 1

Df (x, x) = 0, (27)

Df ((1 − λ)x + λy, x) ≤ λDf (y, x), (28)

i.e. the Bregman divergence is convex w.r.t. the ﬁrst argument on each line segment connecting to the second argument. Proof: The ﬁrst equality is clear. We easily verify

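$$ \begin{aligned} D_f((1 - \lambda)x + \lambda y, x) &= f((1 - \lambda)x + \lambda y) - f(x) - f'_+(x; \lambda(y - x)) \\ &\le (1 - \lambda) f(x) + \lambda f(y) - f(x) - \lambda f'_+(x; y - x) \\ &= \lambda D_f(y, x). \end{aligned} $$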

Since Df (x, x) = 0, this shows a weak form of convexity of the Bregman divergence.

Due to asymmetry, the claim is no longer true if we swap the order of the arguments. The signiﬁcance of this weak convexity lies in its generality: there is no topology involved.
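Since (27) and (28) involve only the one-sided directional derivative, they can be checked for a nondifferentiable function. The sketch below assumes X = R and f(x) = |x| + x², whose right directional derivative is written out by hand in `f_plus`; all names are hypothetical.

```python
def f(x):
    return abs(x) + x * x            # convex, nondifferentiable at 0

def f_plus(x, d):
    """Right directional derivative f'_+(x; d) of f(x) = |x| + x^2."""
    if x == 0:
        return abs(d)                # the kink contributes |d|
    return ((1 if x > 0 else -1) + 2 * x) * d

def bregman(y, x):
    """D_f(y, x) = f(y) - f(x) - f'_+(x; y - x)."""
    return f(y) - f(x) - f_plus(x, y - x)

def weak_convexity_gap(x, y, steps=20):
    """Largest value of D((1-lam)x + lam*y, x) - lam*D(y, x) on a grid of lam."""
    lams = [k / steps for k in range(steps + 1)]
    return max(bregman((1 - l) * x + l * y, x) - l * bregman(y, x)
               for l in lams)

pairs = [(0.0, 2.0), (-1.0, 2.0), (1.5, -3.0), (0.0, -1.0)]
gaps = [weak_convexity_gap(x, y) for x, y in pairs]
```

Inequality (28) predicts that every entry of `gaps` is nonpositive up to rounding, with the value 0 attained at λ = 0 and λ = 1.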

Deﬁnition 2.37: Gateaux derivative

The function f is said to be Gateaux differentiable at x ∈ rcore dom f if the limit

$$ f'(x; d) = \lim_{t \to 0} \frac{f(x + td) - f(x)}{t} \tag{29} $$

exists for all d ∈ X. Usually we also require the derivative to be a linear functional of d.

René Gateaux was killed in WWI before he could defend his doctoral thesis (on integration on functional spaces), see Mazliak[2015] for a detailed account of this history.

Theorem 2.38: Gateaux diﬀerentiable = Directional derivative linear

A convex function is Gateaux diﬀerentiable at a point x ∈ core dom f iﬀ

$$ \forall d \in X, \quad f'_+(x; -d) = -f'_+(x; d), $$

i.e., the directional derivative (hence the Gateaux derivative) is linear. Proof: Simply combine Proposition 2.8 and Theorem 2.34.


Theorem 2.39: Continuity of the directional derivative

If the convex function f : X → (−∞, ∞] is continuous at a point x, then its directional derivative at x is also continuous. (Hence the Gateaux derivative of f, if it exists, is continuous.)

Proof: By Theorem 2.28 and Proposition 2.29 we know f is in fact Lipschitz continuous: For some star-shaped nhood W, some nhood V, and a finite constant L ≥ 0, for any u, v ∈ V,

|f(x + u) − f(x + v)| ≤ L · pW (u − v).

Therefore,

$$ |f'_+(x; d) - f'_+(x; e)| = \lim_{t \downarrow 0} \left| \frac{f(x + td) - f(x + te)}{t} \right| \le L \cdot p_W(d - e). $$

Since W is an nhood, pW is continuous at origin, and the theorem follows.

The proof shows that the directional derivative enjoys the same Lipschitz continuity as the function itself.

Theorem 2.40: Sierpinski Theorem

A Lebesgue measurable function is convex iﬀ it is mid-point convex.

3 Uniformly convex and uniformly smooth functions

Let X∗ ⊆ X′, the latter being the algebraic dual of X (i.e. all linear functionals on X). We associate the dual pairing ⟨·; ·⟩ for X and X∗, and topologize X (resp. X∗) with the weak (resp. weak-∗) topology induced by X∗ (resp. X). Note there is a slight asymmetry between X and its dual X∗: we had to define X first and X∗∗ := (X∗)∗ ⊇ X where the containment may be strict. In this section we will let X be a Banach space with norm ‖·‖, and X∗ its topological dual (but again equipped with the weak-∗ topology). The following class of univariate functions will be frequently referenced:

A := {f : R+ → R+ ∪ {+∞}, f(t) = 0 ⇐⇒ t = 0}. (30)

(A stands for Asplund.)

Deﬁnition 3.1: Uniformly convex functions

A function f : X → (−∞, ∞] is called σ-convex if for all x, y ∈ X and λ ∈ (0, 1),

$$ f((1 - \lambda)x + \lambda y) + \lambda(1 - \lambda) \cdot \sigma(\|x - y\|) \le (1 - \lambda) f(x) + \lambda f(y), \tag{31} $$

where σ : R+ → (−∞, ∞]. Note that dom f has to be convex. The existence of some σ immediately implies the existence of a largest σ, in the following sense:

$$ \sigma_f(t) := \sup\{\sigma(t) : f \text{ is } \sigma\text{-convex}\} \tag{32} $$

$$ \phantom{\sigma_f(t) :}= \inf\left\{ \frac{(1 - \lambda) f(x) + \lambda f(y) - f((1 - \lambda)x + \lambda y)}{\lambda(1 - \lambda)} : \lambda \in (0, 1),\ x, y \in \operatorname{dom} f,\ \|x - y\| = t \right\}. \tag{33} $$

Clearly f is convex iff σf ≥ 0, in which case σf(0) = 0 (provided that dom f ≠ ∅). We call f uniformly convex (hence bona fide convex) if σf ≥ 0 and σf(t) = 0 ⇐⇒ t = 0.
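As a worked instance of (33), take the half squared Euclidean norm on R^d (an assumption made only for this illustration, since the section works in a general Banach space). The parallelogram-type identity (1 − λ)‖x‖² + λ‖y‖² − ‖(1 − λ)x + λy‖² = λ(1 − λ)‖x − y‖² shows that the quotient in (33) is constant and equal to t²/2, hence σf(t) = t²/2. The sketch below checks this on random samples; `convexity_ratio` and `half_sq` are hypothetical names.

```python
import math
import random

def convexity_ratio(f, x, y, lam):
    """The quotient inside the infimum (33) for given x, y and lam in (0, 1)."""
    z = [(1 - lam) * a + lam * b for a, b in zip(x, y)]
    return ((1 - lam) * f(x) + lam * f(y) - f(z)) / (lam * (1 - lam))

half_sq = lambda v: 0.5 * sum(c * c for c in v)

random.seed(0)                           # fixed seed for reproducibility
max_dev = 0.0
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(3)]
    y = [random.uniform(-1, 1) for _ in range(3)]
    lam = random.uniform(0.05, 0.95)     # stay away from 0 and 1
    t = math.dist(x, y)                  # Euclidean norm of x - y
    dev = abs(convexity_ratio(half_sq, x, y, lam) - 0.5 * t * t)
    max_dev = max(max_dev, dev)
```

Since the quotient does not depend on λ, x or y beyond t = ‖x − y‖, the infimum in (33) is attained everywhere and f is uniformly convex in the sense just defined.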


Remark 3.2: Modiﬁcation for positive homogeneous functions

The definition of uniform convexity does not work for positive homogeneous functions: Take y ∝ x we see that σf ≡ 0. The fix is simple: we constrain both x and y to have unit norm in the definition (31) or (33). This modification must be kept in mind when we talk about uniformly convex norms.

Deﬁnition 3.3: Totally convex functions

We deﬁne the moduli of total convexity of a convex function f : X → (−∞, ∞] as:

$$ \tau_f(t) := \inf\{ D_f(y; x) := f(y) - f(x) - f'(x; y - x) : x \in \operatorname{core} \operatorname{dom} f,\ \|x - y\| = t \} \tag{34} $$

$$ \phantom{\tau_f(t) :}= \inf\{ D_f(y; x) := f(y) - f(x) - f'(x; y - x) : x \in \operatorname{core} \operatorname{dom} f,\ \|x - y\| \ge t \}, \tag{35} $$

where the second equality follows from the weak convexity of the Bregman divergence, see Theorem 2.36: If ‖x − y‖ = s > t, then we can find z ∈ [x, y] so that ‖z − x‖ = t and Df(z, x) ≤ Df(y, x). Thus, it follows from (34) that τf(0) = 0 and from (35) that τf is an increasing function. We call a convex function f totally convex iff τf(t) = 0 ⇐⇒ t = 0. Totally convex functions are perfect candidates for Lyapunov functions: Df(xn, x) → 0 =⇒ xn → x.

Theorem 3.4: Uniformly convex ⊂ Totally convex

For any convex (or more generally directionally diﬀerentiable) function f : X → (−∞, ∞] we have τf ≥ σf . Proof: Apply the deﬁnition of the directional derivative in (31).

Theorem 3.5: Uniformly convex ⊂ Strictly convex

Uniformly convex functions are strictly convex.

Deﬁnition 3.6: Uniformly smooth functions

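A (proper) function f : X → (−∞, ∞] is called ρ-smooth if for all x, y ∈ X and λ ∈ (0, 1) such that (1 − λ)x + λy ∈ dom f,

$$ (1 - \lambda) f(x) + \lambda f(y) \le f((1 - \lambda)x + \lambda y) + \lambda(1 - \lambda) \cdot \rho(\|x - y\|), \tag{36} $$

where ρ : R+ → (−∞, ∞]. For convex f, it is necessary to have ρ ≥ 0. The existence of some ρ immediately implies the existence of a smallest ρ, in the following sense:

$$ \rho_f(t) := \inf\{\rho(t) : f \text{ is } \rho\text{-smooth}\} \tag{37} $$

$$ \phantom{\rho_f(t) :}= \sup\left\{ \frac{(1 - \lambda) f(x) + \lambda f(y) - f((1 - \lambda)x + \lambda y)}{\lambda(1 - \lambda)} : \lambda \in (0, 1),\ (1 - \lambda)x + \lambda y \in \operatorname{dom} f,\ \|x - y\| = t \right\} \tag{38} $$

$$ \phantom{\rho_f(t) :}= \sup\left\{ \frac{(1 - \lambda) f(x - \lambda t y) + \lambda f(x + (1 - \lambda) t y) - f(x)}{\lambda(1 - \lambda)} : \lambda \in (0, 1),\ x \in \operatorname{dom} f,\ \|y\| = 1 \right\}. \tag{39} $$

Clearly, ρf(0) = 0 iff dom f ≠ ∅ iff dom ρf ≠ ∅. From (39) we know ρf is l.s.c. or convex if f is so.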

Remark 3.7: Modiﬁcation for positive homogeneous functions

For purely symmetric reasons we also constrain x, y to have unit norms in the above definition of uniform smoothness, when the underlying function is positive homogeneous. This is for the sake of a beautiful duality result, although the original definition is not really broken (unlike the uniformly convex case).

As we shall see, uniform convexity is a dual notion to uniform smoothness, hence there are a lot of “similarities” between the two, with one notable exception: behaviour with respect to restriction. Note that from here on the symbols follow the opposite convention to Definitions 3.1 and 3.6: ρ is attached to uniform convexity and σ to uniform smoothness. It is easily verified that if f is ρ-uniformly convex, then f + δA is also ρ-uniformly convex, while uniform smoothness does not enjoy this property (see the next proposition). Nevertheless, we can still define uniform smoothness¹ of f w.r.t. some convex set A by replacing dom f in (??)-(??) with dom f ∩ A, in which case we denote the gage of uniform smoothness by σf,A (hence σf = σf,dom f).

Proposition 3.8

Let ∅ ≠ A ⊆ dom f and t0 > 0. If σf,A(t0) < ∞ then A + t0BX ⊆ dom f. In particular, if σf(t0) < ∞ then dom f = X hence σf is l.s.c.

Proof: From definition (??) and the given condition σf,A(t0) < ∞ it follows that A + t0BX ⊆ dom f.

We have already seen that σf is nondecreasing if f is closed convex. Much more is true for ρf, although less transparently.

Proposition 3.9

2 The function ρf (t)/t is nondecreasing, hence also ρf (t)/t and ρ(t).

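Proof: Let $t > 0$ and $1 < c < 2$ be such that $\rho_f(ct) < \infty$ (if no such $t$ and $c$ exist, then $\rho_f \equiv \infty$ on $(0, \infty)$ and there is nothing to prove). Fix $\epsilon > 0$. Then there exist $x, y \in \operatorname{dom} f$ and $0 < \lambda \le 1/2$ such that $\|y - x\| = ct$ and

$$\rho_f(ct) + \epsilon > \frac{(1-\lambda) f(x) + \lambda f(y) - f((1-\lambda)x + \lambda y)}{\lambda(1-\lambda)}.$$

Consider $x_\lambda := (1-\lambda)x + \lambda y$ and $x_c := (1 - c^{-1})x + c^{-1}y$. We have $\|x_c - x\| = t$ and $x_\lambda = (1 - c\lambda)x + c\lambda x_c$ (where $c\lambda \in (0, 1)$). Hence

$$(1-\lambda) f(x) + \lambda f(y) - \lambda(1-\lambda)\rho_f(ct) - \lambda(1-\lambda)\epsilon < f(x_\lambda) \le (1 - c\lambda) f(x) + c\lambda f(x_c) - c\lambda(1 - c\lambda)\rho_f(t),$$

and

$$f(x_c) \le (1 - c^{-1}) f(x) + c^{-1} f(y) - c^{-1}(1 - c^{-1})\rho_f(ct).$$

Substituting the second bound into the first and simplifying yields

$$c^2 \rho_f(t) < \rho_f(ct) + c\epsilon\,\frac{1-\lambda}{1 - c\lambda} \le \rho_f(ct) + \frac{c\epsilon}{2 - c},$$

where the last step uses $\lambda \le 1/2$.

Letting $\epsilon \to 0$ proves $c^2 \rho_f(t) \le \rho_f(ct)$ for any $t > 0$ and $1 < c < 2$. Induction gives $c^{2n} \rho_f(t) \le \rho_f(c^n t)$ for any $n \in \mathbb{N}$, $t > 0$, $1 < c < 2$. Since every $c > 1$ can be written as $b^n$ with $1 < b < 2$, we get $c^2 \rho_f(t) \le \rho_f(ct)$ for any $t > 0$ and $1 < c$.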
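As a sanity check of Proposition 3.9, the inequality $c^2 \rho_f(t) \le \rho_f(ct)$ can be worked out by hand in the simplest case. The sketch below rests on assumptions that are not part of the text: $X$ is a Hilbert space, $f = \tfrac12\|\cdot\|^2$, and $\rho_f(t)$ is the infimum, over $\|x - y\| = t$ and $\lambda \in (0, 1)$, of the quotient used in the proof. Under these assumptions the quotient does not depend on $\lambda$, so $\rho_f(t)/t^2$ is constant and the inequality holds with equality.

```latex
% Assumptions (illustrative, not from the text):
%   X is a Hilbert space, f(x) = (1/2)||x||^2,
%   rho_f(t) = inf of the quotient below over ||x - y|| = t, 0 < lambda < 1.
\begin{align*}
(1-\lambda) f(x) + \lambda f(y) - f\bigl((1-\lambda)x + \lambda y\bigr)
  &= \tfrac12 \Bigl[ (1-\lambda)\|x\|^2 + \lambda\|y\|^2
       - \|(1-\lambda)x + \lambda y\|^2 \Bigr] \\
  &= \tfrac12 \lambda(1-\lambda) \bigl[ \|x\|^2 + \|y\|^2
       - 2\langle x, y\rangle \bigr] \\
  &= \tfrac12 \lambda(1-\lambda) \|x - y\|^2 .
\end{align*}
% Dividing by lambda(1-lambda) and taking the infimum over ||x - y|| = t:
\[
\rho_f(t) = \tfrac12 t^2, \qquad
\frac{\rho_f(t)}{t^2} = \tfrac12, \qquad
c^2 \rho_f(t) = \tfrac12 c^2 t^2 = \rho_f(ct) \quad \text{for all } c > 1 .
\]
```

The quadratic is therefore an extremal case: no function in this setting can have $\rho_f(t)/t^2$ decreasing, and here it is exactly flat.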

We have said that uniform convexity and uniform smoothness are “dual” to each other; here is the formal statement.

Theorem 3.10

If $f$ is $\rho$-uniformly convex then $f^*$ is $\rho^*$-uniformly smooth, and if $f$ is $\sigma$-uniformly smooth then $f^*$ is $\sigma^*$-uniformly convex.

Proof:

Footnote 1. When talking about uniform convexity/smoothness on some convex set $A$, our current strategy of definition seems to provide a consistent treatment: just replace $\operatorname{dom} f$ with $\operatorname{dom} f \cap A$. There is an important subtlety though: in Eq. (??), one of $x$ and $y$, say $x$, could lie in $\operatorname{dom} f \setminus A$ and yet $f(x) < \infty$. Had we restricted $f$ to the set $A$, $f(x)$ would have to be $\infty$. Said more explicitly, the uniform convexity of $f$ on some convex set $A$ is the same as the uniform convexity of $f + \delta_A$, while this is NOT so for uniform smoothness.

Remark 3.11

It is tempting to say $f$ is $\rho$-uniformly convex iff $f^*$ is $\rho^*$-uniformly smooth. This is indeed so if $f$ is l.s.c.
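The duality in Theorem 3.10 can be checked by hand in the simplest case. The sketch below assumes several things that are not stated in this part of the text: $X$ is a Hilbert space, $f = \tfrac12\|\cdot\|^2$, the gage $\rho_f(t)$ (resp. $\sigma_f(t)$) is the infimum (resp. supremum) of the convexity quotient over $\|x - y\| = t$ and $\lambda \in (0, 1)$, and $\rho^*$ denotes the conjugate $\rho^*(s) = \sup_{t \ge 0}\{st - \rho(t)\}$. Under these assumptions the quadratic is its own conjugate and its gages are conjugate to each other, which is consistent with the theorem.

```latex
% Assumptions (illustrative, not from the text):
%   Hilbert space X, f(x) = (1/2)||x||^2, quotient-based gages rho_f and sigma_f,
%   rho^*(s) = sup_{t >= 0} { s t - rho(t) }.
\begin{align*}
\frac{(1-\lambda) f(x) + \lambda f(y) - f\bigl((1-\lambda)x + \lambda y\bigr)}
     {\lambda(1-\lambda)}
  &= \tfrac12 \|x - y\|^2
  && \Longrightarrow \quad \rho_f(t) = \sigma_f(t) = \tfrac12 t^2, \\
f^*(x^*) = \sup_{x \in X} \bigl\{ \langle x^*, x \rangle - \tfrac12 \|x\|^2 \bigr\}
  &= \tfrac12 \|x^*\|^2
  && \Longrightarrow \quad \rho_{f^*}(s) = \sigma_{f^*}(s) = \tfrac12 s^2, \\
(\rho_f)^*(s) = \sup_{t \ge 0} \bigl\{ st - \tfrac12 t^2 \bigr\}
  &= \tfrac12 s^2 \quad (s \ge 0)
  && \Longrightarrow \quad \sigma_{f^*} = (\rho_f)^* .
\end{align*}
```

The supremum in the last line is attained at $t = s$, and the same computation with the roles of $\rho$ and $\sigma$ exchanged gives $\rho_{f^*} = (\sigma_f)^*$ for this example.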
