
Radon measures and the dual of C(K), the barycenter map, the Strong Krein-Milman Theorem, and Jensen’s integral inequality Preliminary version

0.1. Radon measures.

Recall that (S, Σ) is a measurable space if S is a nonempty set and Σ is a σ-algebra of subsets of S. By a measure on (S, Σ) we mean any µ: Σ → [0, +∞] such that µ(∅) = 0 and µ is σ-additive, that is,

$$\mu\Big(\bigcup_{n\in\mathbb{N}} A_n\Big) = \sum_{n\in\mathbb{N}} \mu(A_n)$$

whenever the sets A_n ∈ Σ (n ∈ N) are pairwise disjoint. Similarly, given a Banach space X, a function µ: Σ → X is called an X-valued measure whenever µ(∅) = 0 and µ is σ-additive. Using this terminology, we see that any finite measure is just a nonnegative R-valued measure.

Theorem 0.1.1 (Total variation). Given an X-valued measure µ, the formula

$$|\mu|(A) = \sup\Big\{ \sum_{i=1}^{n} \|\mu(A_i)\| : n\in\mathbb{N},\ A_i\in\Sigma \text{ pairwise disjoint},\ A=\bigcup_{i=1}^{n} A_i \Big\},$$

for A ∈ Σ, defines a measure, called the (total) variation of µ, which can be equivalently defined by

$$|\mu|(A) = \sup\Big\{ \sum_{i=1}^{+\infty} \|\mu(A_i)\| : A_i\in\Sigma \text{ pairwise disjoint},\ A=\bigcup_{i=1}^{+\infty} A_i \Big\}.$$

Moreover, for X = R the variation |µ| is always finite.

Given a set E ∈ Σ, we can consider the corresponding “restricted” σ-algebra

Σ_E := {A ∩ E : A ∈ Σ} = {A ∈ Σ : A ⊂ E}.

If µ is an X-valued measure, the restriction of µ to E is the X-valued measure

µ_E := µ|Σ_E on (E, Σ_E).

One can also view µ_E as a measure on (S, Σ) defined by

µ_E(A) = µ(A ∩ E), A ∈ Σ.

We shall not formally distinguish between these two views of µ_E.

Theorem 0.1.2 (Hahn and Jordan decompositions). Let µ be an R-valued measure on (S, Σ). Then there exist two disjoint sets P, N ∈ Σ such that P ∪ N = S and

$$\mu^{+} := \mu_P \quad\text{and}\quad \mu^{-} := -\mu_N$$

are finite (nonnegative) measures, called the positive variation and the negative variation of µ, respectively. Such a decomposition of S is called the Hahn decomposition. Moreover:
(a) the Hahn decomposition (P, N) of S is unique in the following weak sense: if (P′, N′) is another such decomposition then µ_P = µ_{P′} and µ_N = µ_{N′};
(b) µ = µ⁺ − µ⁻ and |µ| = µ⁺ + µ⁻, and hence the variation |µ| is a finite measure;
(c) the measures µ⁺, µ⁻ are minimal in the following sense: if ν_1, ν_2 are (nonnegative) measures such that µ = ν_1 − ν_2 then ν_1 ≥ µ⁺ and ν_2 ≥ µ⁻; this “minimal” decomposition µ = µ⁺ − µ⁻ is clearly uniquely determined and it is called the Jordan decomposition of µ;
(d) the variations µ⁺, µ⁻ are given also by

$$\mu^{+}(A) = \sup\Big\{ \sum_{i=1}^{n} \mu(A_i)^{+} : n\in\mathbb{N},\ A_i\in\Sigma \text{ pairwise disjoint},\ A=\bigcup_{i=1}^{n} A_i \Big\},$$

$$\mu^{-}(A) = \sup\Big\{ \sum_{i=1}^{n} \mu(A_i)^{-} : n\in\mathbb{N},\ A_i\in\Sigma \text{ pairwise disjoint},\ A=\bigcup_{i=1}^{n} A_i \Big\},$$

where t⁺ = max{t, 0} and t⁻ = −min{t, 0} are the positive and the negative parts of a real number t.
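As a quick sanity check of Theorems 0.1.1 and 0.1.2, the following worked example computes both decompositions for a measure carried by two points. The set S, the points a, b and the measure µ below are illustrative assumptions, not objects taken from the text.

```latex
% Assumption (illustration only): S has at least three points, a \ne b are two of them,
% \Sigma contains \{a\} and \{b\}, and \delta_t(A) = 1 if t \in A, \delta_t(A) = 0 otherwise.
\[
  \mu := \delta_a - 2\,\delta_b, \qquad N := \{b\}, \qquad P := S \setminus \{b\}.
\]
% Positive and negative variations, and the total variation:
\[
  \mu^{+} = \mu_P = \delta_a, \qquad
  \mu^{-} = -\mu_N = 2\,\delta_b, \qquad
  |\mu| = \mu^{+} + \mu^{-} = \delta_a + 2\,\delta_b, \qquad
  |\mu|(S) = 3 .
\]
% Weak uniqueness in (a): P' := \{a\}, N' := S \setminus \{a\} is a different pair of sets,
% but it yields the same restricted measures.
\[
  \mu_{P'} = \delta_a = \mu_P, \qquad \mu_{N'} = -2\,\delta_b = \mu_N .
\]
```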

0.2. Radon measures on compact Hausdorff spaces.

Let (K, τ) be a compact Hausdorff topological space, and let Borel(K) denote the σ-algebra of Borel sets of K, i.e., the smallest σ-algebra containing all τ-open sets.

Definition 0.2.1 (Radon measure). A Radon measure on K is a finite (nonnegative) measure µ on (K, Borel(K)) which is regular, that is, for each Borel set B ⊂ K,

µ(B) = sup{µ(C) : C ⊂ B is compact}.

It is easy to see that if µ is a Radon measure on K then

µ(B) = inf{µ(G) : G ⊃ B is open}, B ∈ Borel(K).

Definition 0.2.2 (Radon X-valued measure). Given a Banach space X, a Radon X-valued measure is an X-valued measure µ on (K, Borel(K)) which is regular in the following sense:

∀B ∈ Borel(K), ∀ε > 0, ∃C compact, C ⊂ B, ∀A ∈ Borel(B \ C): ‖µ(A)‖ < ε.

Proposition 0.2.3. Let µ be an X-valued measure on (K, Borel(K)). Then µ is a Radon X-valued measure if and only if its variation |µ| is a Radon measure. Moreover, for X = R, µ is an R-valued Radon measure if and only if both µ⁺, µ⁻ are Radon measures.

Lemma 0.2.4. Let µ be a Radon (nonnegative) measure on K. Then the union of an arbitrary family of open sets of null µ-measure has null µ-measure as well.

Proof. Let 𝒢 be a family of open sets such that µ(G) = 0 for each G ∈ 𝒢. Put H := ⋃𝒢 (the union of all members of 𝒢) and assume that µ(H) > 0. By regularity of µ, there exists a compact C ⊂ H with µ(C) > 0. There exists a finite subfamily 𝒢_0 ⊂ 𝒢 that covers C, but then

$$\mu(C) \le \sum_{G\in\mathcal{G}_0} \mu(G) = 0,$$

a contradiction. □

Corollary 0.2.5 (Existence of support). Every Radon measure µ has a support, that is, the (unique) smallest closed set spt(µ) ⊂ K such that µ is concentrated on spt(µ): µ(K \ spt(µ)) = 0. (Indeed, by the above lemma the union of all open sets of null µ-measure is the largest such set, and spt(µ) is its complement.)

Definition 0.2.6. For an X-valued Radon measure µ we define spt(µ) := spt(|µ|). Notice that, in the particular case of X = R, if µ is an R-valued Radon measure then spt(µ) = spt(µ⁺) ∪ spt(µ⁻).
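To see how the support is computed in practice, here is a small worked example. The space K = [0,1] and the measure µ are assumptions chosen for illustration. The example also shows that the support may contain points of measure zero, and that it can be strictly larger than a Borel set on which µ is concentrated.

```latex
% Assumption (illustration only): K = [0,1] with its usual topology, and
% \delta_t(A) = 1 if t \in A, \delta_t(A) = 0 otherwise.
\[
  \mu := \sum_{n=1}^{\infty} 2^{-n}\,\delta_{1/n},
  \qquad \mu(K) = \sum_{n=1}^{\infty} 2^{-n} = 1 .
\]
% An open set G satisfies \mu(G) = 0 iff G contains no point 1/n.
% The largest such open set is K \setminus (\{0\} \cup \{1/n : n \in \mathbb{N}\}), hence
\[
  \operatorname{spt}(\mu) = \{0\} \cup \{\, 1/n : n \in \mathbb{N} \,\},
  \qquad \mu(\{0\}) = 0 .
\]
% \mu is concentrated on B := \{1/n : n \in \mathbb{N}\}, but \operatorname{spt}(\mu) = \overline{B} \ne B.
```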

0.3. The space of Radon R-valued measures as the dual of C(K).

Definition 0.3.1. Let M(K) denote the vector space of all Radon R-valued measures on K, M_{+}(K) the set of all nonnegative Radon measures on K, and M_{+,1}(K) the set of all probability Radon measures on K (i.e., R-valued measures µ ∈ M(K) such that µ ≥ 0 and µ(K) = 1).

Theorem 0.3.2 (The Banach space M(K)). The formula ‖µ‖_M := |µ|(K) defines a norm on M(K) under which M(K) is a Banach space (even a Banach lattice).

For µ ∈ M(K) and f ∈ L1(µ) := L1(|µ|), the corresponding integral is defined as

$$\int_K f\,d\mu := \int_K f\,d\mu^{+} - \int_K f\,d\mu^{-}.$$

Theorem 0.3.3 (Riesz Representation Theorem). For every compact Hausdorff topological space K, C(K)∗ = M(K) in the following sense: there exists a bijective correspondence, which is a linear isometry, between the two above spaces, and this correspondence C(K)∗ ∋ Φ ↦ µ ∈ M(K) is given by the representation formula

$$\Phi(u) = \int_K u\,d\mu \quad\text{for each } u\in C(K).$$

Corollary 0.3.4. The above theorem implies that

$$|\mu|(K) = \sup\Big\{ \int_K u\,d\mu : u\in C(K),\ \|u\|_\infty\le 1 \Big\}$$

for every µ ∈ M(K).
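The supremum in Corollary 0.3.4 need not be attained. The following sketch illustrates this on K = [0,1]; the measure µ and the functions u_n are assumptions introduced only for this example.

```latex
% Assumption (illustration only): K = [0,1], \lambda is Lebesgue measure on K, and
\[
  d\mu := \operatorname{sgn}\!\big(t - \tfrac12\big)\,d\lambda(t),
  \qquad |\mu|(K) = \int_0^1 \big|\operatorname{sgn}\!\big(t - \tfrac12\big)\big|\,dt = 1 .
\]
% Continuous test functions with \|u_n\|_\infty \le 1 (for n \ge 2):
\[
  u_n(t) := \max\big\{-1,\ \min\{1,\ n\,(t - \tfrac12)\}\big\},
\]
% u_n has the same sign as t - 1/2, so the integrand equals |u_n|:
\[
  \int_K u_n\,d\mu = \int_0^1 |u_n(t)|\,dt
  = \Big(1 - \frac{2}{n}\Big) + 2\int_0^{1/n} n s\,ds
  = 1 - \frac{1}{n} \;\longrightarrow\; 1 .
\]
% Equality \int_K u\,d\mu = 1 with \|u\|_\infty \le 1 would force u = \operatorname{sgn}(t - 1/2)
% \lambda-almost everywhere, which is impossible for a continuous u.
```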

0.4. Extreme points in M(K).

Given a Banach space Y, let B_Y denote its closed unit ball. Keeping in mind the Riesz Representation Theorem, we can consider on M(K) = C(K)∗ also the corresponding w∗-topology σ(M(K), C(K)). Then the sets B_{M(K)} and M_{+,1}(K) are convex and w∗-compact. Indeed, B_{M(K)} is w∗-compact by the Alaoglu Theorem, and

M_{+,1}(K) = B_{M(K)} ∩ {µ : µ(1) = 1},

where 1 ∈ C(K) is the constant function 1(t) = 1 (t ∈ K), so that µ(1) = ∫_K 1 dµ = µ(K). Notice that, since the hyperplane {µ : µ(1) = 1} is a supporting hyperplane to B_{M(K)}, this also implies that M_{+,1}(K) is an extremal set for B_{M(K)}. Given a closed (hence compact) nonempty set F ⊂ K, we clearly can identify the set

{µ ∈ M(K) : spt(µ) ⊂ F} = {µ ∈ M(K) : |µ|(K \ F) = 0}

with M(F). So we shall often consider M_{+,1}(F) as a subset of M_{+,1}(K). The following simple observation shows that this subset is w∗-closed.

Observation 0.4.1. Let F be a (nonempty) closed set in a compact Hausdorff space K. Then M_{+,1}(F) is convex and w∗-closed in M(K).

Proof. We already know (see above) that M_{+,1}(F) is convex and w∗_{C(F)∗}-compact (where w∗_{C(F)∗} is the w∗-topology σ(M(F), C(F))). Now it suffices to notice that Tietze’s extension theorem and the definition of the w∗-topologies easily imply that

$$\big(M_{+,1}(F),\, w^{*}_{C(K)^{*}}\big) = \big(M_{+,1}(F),\, w^{*}_{C(F)^{*}}\big).$$

We are done. □

By the Krein-Milman Theorem, each of the sets B_{M(K)}, M_{+,1}(K) is the w∗-closed convex hull of its extreme points. Let us determine the extreme points of the two sets.

Given t ∈ K, let δ_t denote the corresponding Dirac measure, that is

$$\delta_t(A) = \begin{cases} 1 & \text{if } t\in A,\\ 0 & \text{if } t\notin A. \end{cases}$$

Though this measure is defined for all subsets of K, we can view δ_t (by restricting it to Borel(K)) as an element of M(K). We clearly have even δ_t ∈ M_{+,1}(K).

Exercise 0.4.2. Consider the “Dirac map” δ: K → M(K) that assigns to each t ∈ K its Dirac measure δ_t.
(a) Show that in the norm of M(K) the image δ(K) is a discrete set in which any two distinct points have distance 2.
(b) Show that in the w∗-topology of M(K) the image δ(K) is homeomorphic to K.

Lemma 0.4.3. For every compact Hausdorff K, we have

ext M_{+,1}(K) = {δ_t : t ∈ K} and ext B_{M(K)} = {±δ_t : t ∈ K}.

Proof. (a) Given t ∈ K, if δ_t = ½µ_1 + ½µ_2 with µ_i ∈ M_{+,1}(K) then

$$1 = \delta_t(\{t\}) = \tfrac12\,\mu_1(\{t\}) + \tfrac12\,\mu_2(\{t\}) \le \tfrac12 + \tfrac12 = 1.$$

It follows easily that both µ_1, µ_2 are concentrated on {t}, and hence they coincide with δ_t. This shows that δ_t is an extreme point of M_{+,1}(K). Since both sets ±M_{+,1}(K) are extremal for B_{M(K)}, it follows that both ±δ_t are extreme points of B_{M(K)}.

(b) Now let µ ∈ M_{+,1}(K) be such that spt(µ) contains two distinct points t, s. There exist disjoint open sets A, B ⊂ K such that t ∈ A, s ∈ B. Then µ(A) > 0 since otherwise spt(µ) would be contained in the closed set A^c := K \ A, which does not contain t. By the same reasoning, µ(A^c) ≥ µ(B) > 0. But now we can write

$$\mu = \mu_A + \mu_{A^c} = \mu(A)\,\frac{\mu_A}{\mu(A)} + \mu(A^c)\,\frac{\mu_{A^c}}{\mu(A^c)},$$

which is a nontrivial convex combination of two probability Radon measures on K. Thus µ ∉ ext M_{+,1}(K). Hence the only extreme points of M_{+,1}(K) are the Dirac measures of points of K.

(c) Consider µ ∈ ext B_{M(K)}. Necessarily |µ|(K) = ‖µ‖_M = 1.
(c1) First assume that both µ⁺, µ⁻ are nontrivial. But then

$$\mu = \mu^{+}(K)\,\frac{\mu^{+}}{\mu^{+}(K)} + \mu^{-}(K)\,\frac{-\mu^{-}}{\mu^{-}(K)},$$

which contradicts the assumption that µ is an extreme point.
(c2) Now let µ⁻ = 0, that is, µ ≥ 0. Since M_{+,1}(K) is extremal for B_{M(K)}, we have

µ ∈ M_{+,1}(K) ∩ ext B_{M(K)} = ext M_{+,1}(K) = {δ_t}_{t∈K}

by (a,b), and we are done.
(c3) If µ ≤ 0, we can apply (c2) to −µ to obtain that µ = −δ_t for some t ∈ K. The proof is complete. □
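For a finite space the lemma reduces to familiar finite-dimensional geometry. The sketch below spells this out; the identification of M(K) with R^n is an assumption made only for this illustration.

```latex
% Assumption (illustration only): K = \{1,\dots,n\} with the discrete topology.
% Every \mu \in M(K) is determined by the vector a = (\mu(\{1\}),\dots,\mu(\{n\})) \in \mathbb{R}^n, and
\[
  \|\mu\|_{M} = |\mu|(K) = \sum_{i=1}^{n} |a_i| .
\]
% Probability measures form the standard simplex; its vertices are the Dirac measures:
\[
  M_{+,1}(K) \cong \Big\{ a \in \mathbb{R}^n : a_i \ge 0,\ \sum_{i=1}^{n} a_i = 1 \Big\},
  \qquad \operatorname{ext} M_{+,1}(K) = \{ e_1,\dots,e_n \} = \{\delta_1,\dots,\delta_n\} .
\]
% The unit ball is the \ell_1-ball (cross-polytope), with vertices \pm e_i:
\[
  B_{M(K)} \cong \Big\{ a \in \mathbb{R}^n : \sum_{i=1}^{n} |a_i| \le 1 \Big\},
  \qquad \operatorname{ext} B_{M(K)} = \{ \pm e_1,\dots,\pm e_n \} = \{\pm\delta_1,\dots,\pm\delta_n\} .
\]
```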

0.5. “Intermezzo” for an interested reader: a vector-valued generalization of the Riesz Representation Theorem. In what follows, we are going to give a description of the dual of the space of X-valued continuous functions where X is a Banach space. Let K be a compact Hausdorff topological space. Define C(K,X) := {u: K → X : u is continuous}. Then C(K,X) is a vector space and, when equipped with the norm

$$\|u\|_\infty = \sup_{t\in K} \|u(t)\|_X ,$$

C(K,X) is a Banach space. By M(K,X) we mean the set of all X-valued Radon measures µ such that

‖µ‖_M := |µ|(K) < +∞.

Then ‖µ‖_M is a norm on M(K,X) that makes this space a Banach space. For µ ∈ M(K,X∗) and u ∈ C(K,X), it is possible to develop, in an appropriate way, the (scalar!) integral

$$\int_K \langle u, d\mu\rangle$$

by first defining it for Borel-measurable X-valued simple functions in the following natural way:

$$v = \sum_{i=1}^{n} x_i\,\chi_{B_i} \ \Longrightarrow\ \int_K \langle v, d\mu\rangle = \sum_{i=1}^{n} \langle x_i, \mu(B_i)\rangle ,$$

where ⟨x, x∗⟩ := x∗(x) whenever x ∈ X, x∗ ∈ X∗. (This notation is quite common in functional analysis.) We can now state Singer’s representation theorem.

Theorem 0.5.1 (Singer, 1957). The dual of C(K,X) is isometrically isomorphic to the space M(K,X∗), and the correspondence C(K,X)∗ ∋ Φ ↔ µ ∈ M(K,X∗) is given by the integral formula

$$\Phi(u) = \int_K \langle u, d\mu\rangle \qquad (u\in C(K,X)).$$
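For a finite K, Singer’s theorem reduces to a standard finite duality, which may help to read the integral formula. The identifications below are assumptions made for this sketch only.

```latex
% Assumption (illustration only): K = \{1,\dots,n\} with the discrete topology.
% Then every u : K \to X is continuous and is determined by (u(1),\dots,u(n)):
\[
  C(K,X) \cong X^{n}, \qquad \|u\|_\infty = \max_{1\le i\le n} \|u(i)\|_X .
\]
% A measure \mu \in M(K,X^*) is determined by the functionals x_i^* := \mu(\{i\}) \in X^*:
\[
  \|\mu\|_{M} = |\mu|(K) = \sum_{i=1}^{n} \|x_i^*\|_{X^*} .
\]
% Each u is a simple function, u = \sum_i u(i)\,\chi_{\{i\}}, so the integral formula reads
\[
  \Phi(u) = \int_K \langle u, d\mu\rangle = \sum_{i=1}^{n} \langle u(i), x_i^* \rangle
          = \sum_{i=1}^{n} x_i^*\big(u(i)\big) .
\]
```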

0.6. The Strong Krein-Milman Theorem, and around.

In what follows, the basic assumptions are:

(1) X is a locally convex Hausdorff topological vector space, and K ⊂ X is a nonempty compact convex set.

Let us recall the Krein-Milman Theorem.

Theorem 0.6.1 (Krein–Milman). Under the assumptions (1), one has that

$$K = \overline{\mathrm{conv}}\,(\mathrm{ext}\,K).$$

Observation 0.6.2. Let X, Y be Hausdorff topological vector spaces, F: X → Y a continuous affine map, and A ⊂ X a nonempty set. Then

$$F(\overline{\mathrm{conv}}\,A) \subset \overline{\mathrm{conv}}\,F(A).$$

Indeed, since F is affine it is easy to see that F(conv A) = conv F(A). Then, by continuity we have

$$F(\overline{\mathrm{conv}}\,A) \subset \overline{F(\mathrm{conv}\,A)} = \overline{\mathrm{conv}}\,F(A),$$

and we are done. We are now ready to state the basic theorem of this section.

Theorem 0.6.3. Assume (1).

(a) For each µ ∈ M_{+,1}(K) there exists a unique point r(µ) ∈ X, called the barycenter or resultant of µ, such that

$$x^{*}(r(\mu)) = \int_K x^{*}\,d\mu \quad\text{for each } x^{*}\in X^{*}. \tag{2}$$

Moreover, the barycenter r(µ) belongs to K.
(b) This “barycenter map” r: (M_{+,1}(K), w∗) → K is affine and continuous, and such that r(δ_x) = x for every x ∈ K. Moreover, the map r is uniquely determined by these properties.
(c) For each nonempty closed set F ⊂ K, one has

$$r(M_{+,1}(F)) = \overline{\mathrm{conv}}\,F.$$

Proof. Uniqueness of r(µ) immediately follows from the fact that X∗ separates points of X. Thus we have to show that r(µ) exists in K. To this end, consider the following two mappings

$$I\colon K \to \mathbb{R}^{X^{*}}, \qquad I(x) := \big(x^{*}(x)\big)_{x^{*}\in X^{*}},$$

$$r_0\colon (M_{+,1}(K), w^{*}) \to \mathbb{R}^{X^{*}}, \qquad r_0(\mu) := \Big(\int_K x^{*}\,d\mu\Big)_{x^{*}\in X^{*}} = \big(\mu(x^{*}|_K)\big)_{x^{*}\in X^{*}},$$

where the Cartesian product R^{X∗} = ∏_{x∗∈X∗} R in its product topology is a Hausdorff locally convex topological vector space. Since the topology of R^{X∗} is

the topology of coordinate-wise convergence, both mappings I, r_0 are affine and continuous. Moreover, I is clearly injective, and hence (being a continuous injection of a compact space into a Hausdorff space) it is a homeomorphism between K and I(K). Notice that I(K) is convex and compact in R^{X∗}. We claim that r_0(M_{+,1}(K)) ⊂ I(K). Indeed, since r_0(δ_x) = I(x) (x ∈ K), we can use Lemma 0.4.3, the Krein-Milman theorem and Observation 0.6.2 to obtain

$$r_0(M_{+,1}(K)) = r_0\big(\overline{\mathrm{conv}}^{\,w^{*}}\delta(K)\big) \subset \overline{\mathrm{conv}}\;r_0(\delta(K)) \subset I(K).$$

So we have affine continuous maps

I: K → I(K) and r_0: (M_{+,1}(K), w∗) → I(K),

and I is moreover a homeomorphism. Consider the affine continuous map

r := I^{−1} ∘ r_0 : (M_{+,1}(K), w∗) → K.

It is easy to see that (2) is satisfied, and r(δ_x) = I^{−1}(I(x)) = x for each x ∈ K.

Let us show uniqueness in (b). Let r̃ be another such map. Then the set

E := {µ ∈ M_{+,1}(K) : r(µ) = r̃(µ)} = (r − r̃)^{−1}(0)

is a convex w∗-compact subset of M_{+,1}(K) such that E contains δ(K) = ext M_{+,1}(K). By the Krein-Milman theorem, E contains the whole M_{+,1}(K), that is, r = r̃. This completes the proof of (a) and (b).

Let us show (c). Given a closed set F ⊂ K, we can use Observation 0.4.1 and the Krein-Milman theorem to obtain as above

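$$r(M_{+,1}(F)) = r\big(\overline{\mathrm{conv}}^{\,w^{*}}\delta(F)\big) \subset \overline{\mathrm{conv}}\;r(\delta(F)) = \overline{\mathrm{conv}}\,F.$$

On the other hand, the properties of M_{+,1}(F) and r imply that r(M_{+,1}(F)) is a convex compact set that contains r(δ(F)) = F, and hence it contains also $\overline{\mathrm{conv}}\,F$. We are done. □

Theorem 0.6.4 (Strong Krein-Milman Theorem). Assume (1). Then every x ∈ K is the barycenter of some µ ∈ M_{+,1}($\overline{\mathrm{ext}\,K}$), where $\overline{\mathrm{ext}\,K}$ denotes the closure of ext K.

Proof. By Theorem 0.6.3(c) for F = $\overline{\mathrm{ext}\,K}$,

$$r\big(M_{+,1}(\overline{\mathrm{ext}\,K})\big) = \overline{\mathrm{conv}}\,\big(\overline{\mathrm{ext}\,K}\big) = K.$$

□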
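A one-dimensional example makes the barycenter map concrete. The choices of X, K and of the two measures below are assumptions for illustration only. They show that r is far from injective, and that a point of K can be represented by a measure living on the extreme points.

```latex
% Assumption (illustration only): X = \mathbb{R}, K = [0,1], so \mathrm{ext}\,K = \{0,1\} (already closed).
% Every x^* \in X^* has the form x^*(t) = c\,t, so formula (2) reduces to
\[
  r(\mu) = \int_K t\,d\mu(t) \qquad (\mu \in M_{+,1}(K)).
\]
% Two different probability measures with the same barycenter:
\[
  \lambda := \text{Lebesgue measure on } [0,1], \qquad r(\lambda) = \int_0^1 t\,dt = \tfrac12 ,
\]
\[
  \nu := \tfrac12\,\delta_0 + \tfrac12\,\delta_1 \in M_{+,1}(\mathrm{ext}\,K), \qquad
  r(\nu) = \tfrac12\cdot 0 + \tfrac12\cdot 1 = \tfrac12 .
\]
% More generally, x \in [0,1] is the barycenter of (1-x)\,\delta_0 + x\,\delta_1.
```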

Recall that a measure µ ∈ M_{+,1}(K) is said to be concentrated on a Borel set B ⊂ K if µ(K \ B) = 0. In this case it is clear that the support spt(µ) is contained in the closure of B.

If µ = Σ_{i=1}^{n} λ_i δ_{x_i} is a convex combination of Dirac measures, then clearly µ ∈ M_{+,1}(K), r(µ) = Σ_{i=1}^{n} λ_i x_i, and µ is concentrated on the finite set {x_1, . . . , x_n}. Keeping this in mind, we have the following example.

Example 0.6.5. Let X, K be as in (1). If K is finite-dimensional, then (by Minkowski’s theorem) each point of K is a convex combination of extreme points. This can be said as follows: each point of a finite-dimensional compact convex set K is the barycenter of a probability measure µ ∈ M_{+,1}(K) that is concentrated on ext K. Notice that a simple theorem by Choquet asserts that if K is metrizable (and our K is such!) then ext K is a G_δ set and hence Borel. But it is known that in general ext K need not be a Borel set.

What follows is the famous Choquet Representation Theorem which generalizes the above example. We state it without proof.

Theorem 0.6.6 (Choquet). Assume (1). If K is metrizable, then each point of K is the barycenter of some µ ∈ M_{+,1}(K) which is concentrated on ext K.

Let us conclude this subsection by showing that the formula (2) that defines the barycenter holds for a larger class of functions. Let us start with the following notation.

A(K) = {h ∈ C(K) : h is affine},
A∗(K) = {ĥ|_K : ĥ: X → R is affine and continuous}
      = {h ∈ C(K) : h(x) = x∗(x) + α (x ∈ K) for some x∗ ∈ X∗, α ∈ R}.

It is easy to see that A(K) is a closed subspace of C(K), and A(K) contains A∗(K) as a linear subspace. Let us see that these two subspaces do not coincide in general.

Example 0.6.7. In X = ℓ_2, consider the convex set

K = {x = (x_n)_n ∈ ℓ_2 : |x_n| ≤ 1/n² for each n}.

It is an easy exercise to verify that K is compact. Consider the function

h: K → R, h(x) = Σ_{n∈N} x_n.

Then h is well-defined and affine, and h(0) = 0. Let us show that h is continuous. Assume that x^{(k)} → x in K. Since every absolutely convergent series can be viewed as the Lebesgue integral over N with respect to the counting measure, we can apply Lebesgue’s Dominated Convergence theorem (with the summable dominating sequence (1/n²)_n) to conclude that h(x^{(k)}) → h(x). We have shown that h ∈ A(K).

Now, if h were the restriction to K of a continuous affine ĥ defined on X, then ĥ would be linear (since h(0) = 0) and hence representable as an element of ℓ_2 (by the Riesz Representation Theorem). That is, there would be y ∈ ℓ_2 such that h(x) = Σ_n x_n y_n for each x ∈ K. But it is easy to see that this would imply that y = (1, 1, . . . ), which is not an element of ℓ_2. This shows that h ∉ A∗(K).
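The step “it is easy to see that y = (1, 1, . . . )” in Example 0.6.7 can be checked by testing the hypothetical representation on scaled unit vectors. The vectors e_m and the test points x^{[m]} below are notation introduced only for this sketch.

```latex
% Notation (introduced for this sketch): e_m is the m-th unit vector of \ell_2,
% and x^{[m]} := m^{-2} e_m, which belongs to K since |x^{[m]}_n| \le 1/n^2 for every n.
\[
  h\big(x^{[m]}\big) = \sum_{n\in\mathbb{N}} x^{[m]}_n = \frac{1}{m^{2}},
  \qquad
  \sum_{n\in\mathbb{N}} x^{[m]}_n\,y_n = \frac{y_m}{m^{2}} .
\]
% If h(x) = \sum_n x_n y_n held on K, comparing the two expressions would give
\[
  y_m = 1 \ \text{ for every } m \in \mathbb{N},
  \qquad\text{so}\qquad \sum_{m\in\mathbb{N}} y_m^{2} = +\infty ,
\]
% that is, y \notin \ell_2.
```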

We shall need a simple algebraic lemma, saying that a “nonvertical” hyperplane in X × R coincides with the graph of an affine function.

Lemma 0.6.8. Let X be a vector space and H ⊂ X × R a hyperplane. Assume that H does not contain any “vertical” line, or equivalently, H strictly separates two points of the form (x, t) and (x, s) with t ≠ s. Then there exists an affine function h: X → R such that H = graph(h). If moreover X is a topological vector space and H is closed, then the affine function h is continuous.

Proof. There exist a linear functional Φ: X × R → R and α ∈ R such that H = Φ^{−1}(α). It is an easy exercise to show that Φ must be of the form

Φ(x, t) = ℓ(x) + ct

for some linear ℓ: X → R and c ∈ R. The assumption on H implies that Φ(0, 1) ≠ 0 and hence c ≠ 0. Now we have the equivalences

$$(x,t)\in H \iff \ell(x) + ct = \alpha \iff t = -\tfrac{1}{c}\,\ell(x) + \tfrac{\alpha}{c} =: h(x),$$

which show the first part of the statement. If H is closed then Φ is continuous, hence ℓ = Φ(·, 0) is continuous, showing that h is continuous. □
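A concrete instance of Lemma 0.6.8 in X = R² may help to fix the formula for h. The particular hyperplanes below are assumptions chosen for illustration.

```latex
% Assumption (illustration only): X = \mathbb{R}^2, and
\[
  H := \{ (x_1, x_2, t) \in \mathbb{R}^2 \times \mathbb{R} : 2x_1 - x_2 + 4t = 8 \},
\]
% so that \ell(x) = 2x_1 - x_2, c = 4 \ne 0 and \alpha = 8. The lemma gives
\[
  h(x) = -\tfrac{1}{c}\,\ell(x) + \tfrac{\alpha}{c}
       = -\tfrac12\,x_1 + \tfrac14\,x_2 + 2 ,
  \qquad H = \operatorname{graph}(h).
\]
% By contrast, H' := \{ (x_1, x_2, t) : 2x_1 - x_2 = 8 \} has c = 0; it contains the
% vertical line \{(4, 0)\} \times \mathbb{R} and is not the graph of any function on X.
```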

Proposition 0.6.9. Under the assumptions (1), A∗(K) is dense in A(K) (in the norm ‖·‖∞ of C(K)).

Proof. Given h ∈ A(K) and ε > 0, the sets graph(h) and graph(h + ε) are convex, compact and disjoint. By the Hahn-Banach theorem, they can be strongly separated by a closed hyperplane H ⊂ X × R. Since H strictly separates the vertically situated points (x, h(x)) and (x, h(x) + ε) for any x ∈ K, the previous lemma shows that H coincides with the graph of a continuous affine function ĥ on X. Since h ≤ ĥ|_K ≤ h + ε, we conclude that ‖h − ĥ|_K‖∞ ≤ ε. This completes the proof. □

Corollary 0.6.10. Let X, K be as in (1), µ ∈ M_{+,1}(K), and x = r(µ). Then

$$h(x) = \int_K h\,d\mu \quad\text{for each } h\in A(K). \tag{3}$$

Proof. Since (3) holds for h ∈ X∗ by (2), and trivially for every constant h, it holds for each h ∈ A∗(K). Since each h ∈ A(K) is the uniform limit of a sequence of elements of A∗(K), the proof follows by passing to the limit. □

0.7. Choquet Lemma and Milman’s “converse” to the Krein-Milman Theorem.

Definition 0.7.1. Given a nonempty set A in a topological vector space X, a slice of A is any nonempty set of the form A ∩ H where H is an open halfspace, that is, H = [x∗ > α] for some x∗ ∈ X∗ and α ∈ R.

Theorem 0.7.2 (“Choquet Lemma”). Let K be a compact convex set in a Hausdorff locally convex topological vector space (X, τ), and x ∈ ext K. Then the family of all slices of K that contain x forms a basis of neighborhoods of x in (K, τ).

Proof. First notice that (K, τ) = (K, w) since K is τ-compact, τ ≥ w, and w is Hausdorff. So we have to show that if W is a weak neighborhood of x then K ∩ W contains a slice of K that contains x. By definition of the w-topology, we can assume that

$$W = \bigcap_{i=1}^{n} H_i \quad\text{where each } H_i \text{ is an open halfspace containing } x.$$

Now K \ W = ⋃_{i=1}^{n} (K ∩ H_i^c) where H_i^c = X \ H_i (1 ≤ i ≤ n) is a closed halfspace. Since K \ W is a finite union of compact convex sets, its convex hull conv(K \ W) is compact, hence closed in K. Moreover, the fact that x is an extreme point easily implies that x ∉ conv(K \ W). But then, by the Hahn-Banach theorem, there exists an open halfspace H such that x ∈ H and H ∩ (K \ W) = ∅. It follows that H ∩ K ⊂ W, and since H ∩ K is a slice of K that contains x, we are done. □

Theorem 0.7.3 (Milman). Let X, K be as in (1), and A ⊂ K such that $\overline{\mathrm{conv}}\,A = K$. Then $\overline{A} \supset \mathrm{ext}\,K$.

Proof. Assume that there is x ∈ (ext K) \ $\overline{A}$. By the Choquet Lemma, x belongs to an open halfspace H such that H ∩ A = ∅. Since A ⊂ H^c (the complement of H) and H^c is closed and convex, it follows that K = $\overline{\mathrm{conv}}\,A$ ⊂ H^c. This contradiction completes the proof. □

Corollary 0.7.4. Let X, K be as in (1). For a set A ⊂ K the following assertions are equivalent:
(i) K = $\overline{\mathrm{conv}}\,A$;
(ii) $\overline{A}$ ⊃ ext K;
(iii) ∀x ∈ K, ∃µ ∈ M_{+,1}($\overline{A}$), r(µ) = x.

Proof. The equivalence (i) ⇔ (ii) holds by the Krein-Milman theorem and its “converse”, Milman’s Theorem 0.7.3. The rest of the proof easily follows by Theorem 0.6.3(c):

$$\text{(iii)} \iff r\big(M_{+,1}(\overline{A})\big) = K \iff \overline{\mathrm{conv}}\,(\overline{A}) = K \iff \text{(i)}.$$

□

Remark 0.7.5. Recall that an analogous theorem for finite-dimensional K implies that ext K is the smallest subset of K whose convex hull is K. On the other hand, Corollary 0.7.4 states that the closure of ext K is the smallest closed subset of K whose closed convex hull is K.

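The closure in Milman’s theorem cannot be dropped, even in the plane. The following sketch gives a standard illustration; the disc K and the set A are assumptions chosen for this example.

```latex
% Assumption (illustration only): X = \mathbb{R}^2 \cong \mathbb{C}, K is the closed unit disc, and
\[
  A := \{ e^{i\theta} : \theta \in 2\pi\mathbb{Q} \} , \qquad
  \mathrm{ext}\,K = \{ z : |z| = 1 \} .
\]
% A is a countable dense subset of the unit circle, hence
\[
  \overline{A} = \mathrm{ext}\,K, \qquad \overline{\mathrm{conv}}\,A = K ,
\]
% in agreement with Theorem 0.7.3, although A itself does not contain \mathrm{ext}\,K.
% Moreover \mathrm{conv}\,A \ne K: a point of the circle is extreme in K, so it lies in
% \mathrm{conv}\,A only if it already belongs to A.
```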
Assume that µ ∈ M_{+,1}(K) is supported on a finite set. If r(µ) ∈ ext K, then µ is a Dirac measure. (Indeed, we can write µ = Σ_{i=1}^{n} λ_i δ_{x_i}, a convex combination with λ_i > 0 for each i. The fact that r(µ) = Σ_{i=1}^{n} λ_i x_i is an extreme point of K easily implies that x_i = r(µ) for each i.) The following proposition generalizes this observation to arbitrary µ ∈ M_{+,1}(K).

Proposition 0.7.6. Let X, K be as in (1), and x ∈ ext K. If µ ∈ M_{+,1}(K) is such that x = r(µ), then µ = δ_x.

Proof. Assume that spt(µ) ≠ {x}. Then µ(spt(µ) \ {x}) > 0, hence by regularity there exists a closed set F ⊂ spt(µ) \ {x} such that µ(F) > 0. By the Choquet Lemma there exists an open halfspace H containing x such that F ∩ H = ∅. Consider the convex sets C = K ∩ H^c and D = K ∩ H. Notice that C is closed and µ(C) ≥ µ(F) > 0. If µ were concentrated on C, then by Theorem 0.6.3(c) we would have x = r(µ) ∈ C, a contradiction. Thus µ(D) > 0, and we have the convex combination

$$\mu = \mu(C)\,\nu_1 + \mu(D)\,\nu_2, \qquad\text{where } \nu_1 = \frac{\mu_C}{\mu(C)},\quad \nu_2 = \frac{\mu_D}{\mu(D)}$$

(recall that µ_B denotes the restriction of µ to a Borel set B ⊂ K). But then x = r(µ) = µ(C) r(ν_1) + µ(D) r(ν_2) and, by Theorem 0.6.3(c) again, r(ν_1) ∈ C and hence r(ν_1) ≠ x. But this contradicts the fact that x is an extreme point of K. □

0.8. Jensen’s integral inequalities.

Let X, K be as in (1), and let µ ∈ M_{+,1}(K) be supported on a finite set, that is, µ = Σ_{i=1}^{n} λ_i δ_{x_i}, a convex combination. Then for every convex function f: K → (−∞, +∞] we have the (finite) Jensen inequality

$$f\Big(\sum_{i=1}^{n} \lambda_i x_i\Big) \le \sum_{i=1}^{n} \lambda_i f(x_i).$$

It is immediate to see that this inequality can be rewritten in the following integral form:

$$f(r(\mu)) \le \int_K f\,d\mu.$$

We are going to show that this “finitely supported case” can be generalized.

Theorem 0.8.1 (Jensen Integral Inequality - first form). Let X, K be as in (1), f: K → (−∞, +∞] a convex l.s.c. function, and µ ∈ M_{+,1}(K). Then the (Lebesgue) integral ∫_K f dµ exists (finite or infinite), and

$$f(r(\mu)) \le \int_K f\,d\mu. \tag{4}$$

Proof. Since f is l.s.c., it is clearly Borel measurable. Moreover, f attains its infimum over K, which implies that its negative part f⁻ is bounded and hence µ-integrable. This shows that ∫_K f dµ exists.

The inequality (4) is obvious if ∫_K f dµ = +∞. So let us now suppose that ∫_K f dµ is finite. Notice that this implies that µ is concentrated on the convex set [f < +∞] (which is clearly Borel since it coincides with ⋃_n [f ≤ n]). Applying Theorem 0.6.3(c) to the closed convex set F = $\overline{[f < +\infty]}$, we obtain that r(µ) ∈ $\overline{[f < +\infty]}$. Proceeding by contradiction, assume that (4) is false. Fix a real α such that

$$f(r(\mu)) > \alpha > \int_K f\,d\mu.$$

By lower semicontinuity, there exists an open convex neighborhood V of r(µ) such that f > α on V ∩ K. Consider the convex disjoint sets epi(f) and V × (−∞, α), the second one of which is open. By the Hahn-Banach theorem, they can be separated by a closed hyperplane H ⊂ X × R. Notice that since r(µ) ∈ $\overline{[f < +\infty]}$, we have V ∩ [f < +∞] ≠ ∅. Thus H strictly separates two vertically situated points, and hence H coincides with the graph of an affine continuous function a: X → R. We have that f ≥ a on K, and a ≥ α on V. It follows (using Corollary 0.6.10 for the equality) that

$$\int_K f\,d\mu \ge \int_K a\,d\mu = a(r(\mu)) \ge \alpha > \int_K f\,d\mu.$$

This contradiction completes the proof. □

To prove the second form of the Jensen integral inequality, we shall need the notion of an image of a measure. Let (S, Σ, µ) be a (nonnegative) measure space and (T, 𝒜) a measurable space. Given a (Σ-𝒜)-measurable map ϕ: S → T, we can define

ν: 𝒜 → [0, +∞], ν(A) := µ(ϕ^{−1}(A)).

Then ν is called the image or push-forward of µ by ϕ, and it is denoted by ν = ϕ#µ. It is not difficult to show that:
(a) ν is a measure;
(b) ν(T) = µ(S) (in particular, ν is a probability measure if and only if µ is);
(c) if f: T → R is an 𝒜-measurable function then ∫_T f dν exists (finite or infinite) if and only if ∫_S (f ∘ ϕ) dµ exists, and in this case the two integrals coincide.
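The image measure and property (c) can be verified by hand in a simple case. The spaces, the map ϕ and the test function f below are assumptions made for this illustration.

```latex
% Assumption (illustration only): S = T = [0,1] with Borel sets, \mu = Lebesgue measure,
% \varphi(s) = s^2, and \nu = \varphi_{\#}\mu.
\[
  \nu([0,a]) = \mu\big(\{ s \in [0,1] : s^{2} \le a \}\big) = \mu\big([0,\sqrt{a}\,]\big) = \sqrt{a}
  \qquad (0 \le a \le 1),
\]
% so \nu(T) = 1 = \mu(S), and \nu has density \frac{1}{2\sqrt{t}} with respect to Lebesgue measure.
% Check of (c) for f(t) = t:
\[
  \int_T f\,d\nu = \int_0^1 t\cdot\frac{1}{2\sqrt{t}}\,dt = \frac12\int_0^1 \sqrt{t}\,dt = \frac13 ,
  \qquad
  \int_S (f\circ\varphi)\,d\mu = \int_0^1 s^{2}\,ds = \frac13 .
\]
```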

Another notion we shall use is the so-called Pettis integral of a vector-valued function. Let (S, Σ, µ) be a measure space, X a Hausdorff locally convex topological vector space, and ϕ: S → X a mapping. Then x ∈ X is the Pettis integral of ϕ with respect to µ if x∗ ∘ ϕ ∈ L1(µ) and

$$x^{*}(x) = \int_S (x^{*}\circ\varphi)\,d\mu \quad\text{for each } x^{*}\in X^{*}.$$

Notation:

$$x = (\mathrm{P})\text{-}\!\int_S \varphi\,d\mu.$$

Notice that the Pettis integral, if it exists, is unique since X∗ separates points of X. Also notice that if µ is a Radon probability measure on a compact convex set K ⊂ X, then its barycenter can be viewed as the Pettis integral of the identity map:

$$r(\mu) = (\mathrm{P})\text{-}\!\int_K (\mathrm{id})\,d\mu = (\mathrm{P})\text{-}\!\int_K y\,d\mu(y).$$

Now we are ready for the second form of Jensen’s integral inequality.

Theorem 0.8.2 (Jensen Integral Inequality - second form). Let S be a Hausdorff topological space, µ a nontrivial finite regular Borel measure on S. Let X, K be as in (1), f: K → (−∞, +∞] a l.s.c. convex function, and ϕ: S → K a continuous mapping. Then:
(a) the integrals (P)-∫_S ϕ dµ and ∫_S (f ∘ ϕ) dµ exist (in X and in (−∞, +∞], respectively);
(b) the point (1/µ(S)) (P)-∫_S ϕ dµ belongs to K;
(c) one has the inequality

$$f\Big( \frac{1}{\mu(S)}\,(\mathrm{P})\text{-}\!\int_S \varphi\,d\mu \Big) \le \frac{1}{\mu(S)} \int_S (f\circ\varphi)\,d\mu.$$

Proof. For simplicity we can (and do) assume that µ(S) = 1 (otherwise substitute µ with µ/µ(S)). It is easy to see that the image measure ν := ϕ#µ is a regular Borel probability measure on K, that is, ν ∈ M_{+,1}(K). Moreover,

$$(\mathrm{P})\text{-}\!\int_S \varphi\,d\mu = (\mathrm{P})\text{-}\!\int_K (\mathrm{id})\,d\nu = r(\nu) \quad\text{and}\quad \int_S (f\circ\varphi)\,d\mu = \int_K f\,d\nu$$

by the properties of the image measure. So the rest follows immediately from Theorem 0.8.1. □
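A small numerical instance of Theorem 0.8.2 in the plane shows all three conclusions at once. The data S, µ, K, ϕ and f below are assumptions chosen for this illustration.

```latex
% Assumption (illustration only): S = [0,1], \mu = Lebesgue measure (so \mu(S) = 1),
% X = \mathbb{R}^2, K = [0,1]^2, \varphi(s) = (s, s^2), f(x,y) = x^2 + y^2 (convex, continuous).
% In \mathbb{R}^2 the Pettis integral is computed coordinatewise:
\[
  (\mathrm{P})\text{-}\!\int_S \varphi\,d\mu
  = \Big( \int_0^1 s\,ds,\ \int_0^1 s^{2}\,ds \Big)
  = \Big( \tfrac12,\ \tfrac13 \Big) \in K .
\]
% Left-hand and right-hand sides of the inequality in (c):
\[
  f\big(\tfrac12, \tfrac13\big) = \tfrac14 + \tfrac19 = \tfrac{13}{36},
  \qquad
  \int_S (f\circ\varphi)\,d\mu = \int_0^1 (s^{2} + s^{4})\,ds = \tfrac13 + \tfrac15 = \tfrac{8}{15},
\]
\[
  \tfrac{13}{36} \approx 0.361 \ \le\ \tfrac{8}{15} \approx 0.533 .
\]
```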

0.9. Two further theorems about extreme points.

Let us finish by stating, without proofs, two other interesting results concerning extreme points.

Theorem 0.9.1 (Radò). Let X be a Banach space, and K ⊂ X∗ a w∗-compact convex set. If ext K is (norm) separable, then

$$K = \overline{\mathrm{conv}}^{\,\|\cdot\|}(\mathrm{ext}\,K).$$

(Notice that the Krein–Milman theorem gives a similar formula but with the w∗-closure only. So under the assumptions of Radò’s theorem the two closures coincide.)

Corollary 0.9.2. If X is a Banach space and ext B_{X∗} is separable, then X∗ (and hence also X) is separable. (It is an easy exercise to show that the closed convex hull of a separable set is separable. Hence B_{X∗} is separable by Radò’s theorem.)

Theorem 0.9.3 (Rainwater). Let X be a Banach space, x ∈ X. For a sequence {x_n} ⊂ X the following assertions are equivalent:

(i) {x_n} converges weakly to x;
(ii) {x_n} is bounded and e∗(x_n) → e∗(x) for each e∗ ∈ ext B_{X∗}.
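In a concrete sequence space, Rainwater’s theorem turns into a familiar coordinatewise criterion. The sketch below uses X = c_0, an assumption chosen for illustration, together with the standard identification of its dual with ℓ_1.

```latex
% Assumption (illustration only): X = c_0, so X^* = \ell_1 and
\[
  \operatorname{ext} B_{\ell_1} = \{ \pm e_k^{*} : k \in \mathbb{N} \},
  \qquad e_k^{*}(x) = x(k) \ \text{ (the $k$-th coordinate functional)} .
\]
% Theorem 0.9.3 then reads: for a sequence (x_n) in c_0 and x \in c_0,
\[
  x_n \to x \ \text{weakly}
  \iff
  \sup_n \|x_n\|_\infty < \infty \ \text{ and } \ x_n(k) \to x(k) \ \text{ for each } k \in \mathbb{N} .
\]
% Boundedness cannot be omitted: x_n := n\,e_n satisfies x_n(k) \to 0 for each k,
% but \|x_n\|_\infty = n is unbounded, so (x_n) is not weakly convergent.
```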