Weak topology and Opial property in Wasserstein spaces, with applications to Gradient Flows and Proximal Point Algorithms of geodesically convex functionals
Emanuele Naldi * Giuseppe Savaré † Institut für Analysis und Algebra, TU Braunschweig. Department of Decision Sciences, Bocconi University.
Dedicated to the memory of Claudio Baiocchi, outstanding mathematician and beloved mentor
Abstract In this paper we discuss how to define an appropriate notion of weak topology in the Wasserstein space (P2(H),W2) of Borel probability measures with finite quadratic moment on a separable Hilbert space H. We will show that such a topology inherits many features of the usual weak topology in Hilbert spaces, in particular the weak closedness of geodesically convex closed sets and the Opial property characterising weakly convergent sequences. We apply this notion to the approximation of fixed points for a non-expansive map in a weakly closed subset of P2(H) and of minimizers of a lower semicontinuous and geodesically convex functional φ : P2(H) → (−∞, +∞] attaining its minimum. In particular, we will show that every solution to the Wasserstein gradient flow of φ weakly converge to a minimizer of φ as the time goes to +∞. Similarly, if φ is also convex along generalized geodesics, every sequence generated by the proximal point algorithm converges to a minimizer of φ with respect to the weak topology of P2(H).
1 Introduction
Opial proved in [10] that weak convergence in a separable Hilbert space (H, | · |) admits a nice metric characterization.
Theorem (Opial). Let (xn)n∈N be a sequence in H weakly converging to x ∈ H. Then
2 2 2 |y − x| + lim inf |xn − x| ≤ lim inf |xn − y| . (1.1) n→∞ n→∞ In particular, for every y 6= x lim inf |xn − x| < lim inf |xn − y|. (1.2) n→∞ n→∞
The proof can be easily obtained by passing to the limit in the identity
2 2 2 |xn − y| = |xn − x| + |x − y| + 2hxn − x, x − yi, (1.3) arXiv:2104.06121v1 [math.OC] 13 Apr 2021 observing that limn→∞hxn − x, x − yi = 0 by weak convergence. It is worth noticing that (1.2) shows that
the weak limit x of a sequence (xn)n∈N is the unique strict minimizer of the function
L(y) := lim inf |xn − y|. (1.4) n→∞ Opial property (further extended and studied in more general Banach spaces, see e.g. [11]) has many in- teresting applications. A first one, which already appears as a relevant motivation in Opial’s paper [10], is related to the approximation of a fixed point of a non-expansive map T : C → C defined in a closed and n+1 n convex subset of H. If the set of fixed points of T is not empty and limn→∞ |T x − T x| = 0 for some n x ∈ C, then the sequence of iterated maps (T x)n∈N weakly converges to a fixed point y ∈ C of T as n → ∞. *Universitätsplatz 2, 38106 Braunschweig, Germany. Email: [email protected] †Via Roentgen 1, 20136 Milan, Italy. Email: [email protected]
1 A second kind of applications concerns the dynamic approximation of minimizers of a convex and lower semicontinuous function ϕ : H → (−∞, +∞] as the asymptotic limit of its gradient flow or of the so called Proximal Point Algorithm. More precisely, if arg min ϕ 6= Ø, Bruck [3] proved that every locally Lipschitz curve x : (0, +∞) → H solving the differential inclusion (the gradient flow generated by ϕ) d x(t) ∈ −∂ϕ(x(t)) a.e. in (0, ∞), (1.5) dt weakly converges to a minimizer x∞ of ϕ as t ↑ ∞. An analogous asymptotic behaviour is exhibited by the solutions to the Proximal Point Algorithm: selecting an initial datum x0 ∈ H and a time step τ > 0, one k considers the sequence (xτ )k∈N which recursively solves the variational problem 1 xk minimizes y 7→ |y − xk−1|2 + ϕ(y). (1.6) τ 2τ τ k A result of Martinet [6,7] (see also Rockafellar [12]) shows that the sequence (xτ )k∈N weakly converges to an element x∞ of arg min ϕ.
The aim of the present paper is to study the extension of the Opial Lemma to the metric space (P2(H),W2) of Borel probability measures on H endowed with the Kantorovich-Rubinstein-Wasserstein distance W2 and to derive similar applications to fixed points and to convergence of gradient flows and proximal point algorithms.
Let us recall that a Borel probability measure µ in H belongs to P2(H) if Z the quadratic moment |x|2 dµ(x) is finite. (1.7) H
The squared Wasserstein distance between µ1, µ2 ∈ P2(H) can then be defined as the solution of the Optimal Problem with quadratic cost Z 2 n 2 o W2 (µ1, µ2) := min |x1 − x2| dµ(x1, x2): µ ∈ Γ(µ1, µ2) , (1.8) H×H where Γ(µ1, µ2) denotes the set of couplings between µ1 and µ2, i.e. the Borel probability measures in H×H whose marginals are µ1 and µ2 respectively. It turns out that (P2(H),W2) is a complete and separable metric space, which contains an isometric copy of H given by the Dirac masses {δx : x ∈ H} (see e.g. [15, 1, 13]).
Since the distance in P2(H) cannot be derived by a norm, a first natural question concerns the appropriate definition of a suitable weak topology in P2(H), which enjoys at least some of the most useful properties of weak convergence in Hilbert spaces: (a) bounded sequences admit weakly convergent subsequences, (b) the distance function from a given element is a weakly lower semicontinuous map, (c) the scalar product is sequentially continuous w.r.t. strong/weak convergence of their factors, (d) weakly convergent sequences are bounded, (e) strongly closed convex sets are also weakly closed. A first approach, adopted in [1], is to work with the Wasserstein distance induced by a weaker metric on H, which metrizes the weak topology on bounded sets. This provides a satisfactory answer to the first three questions (a,b,c), at least for bounded sequences.
Here we rely on a different point of view, recalling that the topology of P2(H) could be equivalently char- acterized as the initial topology induced by the family of real functions Fζ : P2(H) → R where Z ζ(x) Fζ : µ → ζ dµ, ζ ∈ C(H), sup 2 < ∞, (1.9) H x∈H 1 + |x| i.e. the coarsest topology such that makes all the functions Fζ in (1.9) continuous. We thus define the weak w topology in P2(H) as the initial topology σ(P2(H), C2 (H)) induced by Fζ as ζ varies in the set
w n ζ(x) o C2 (H) := ζ : H → R is sequentially weakly continuous, lim = 0 , (1.10) |x|→∞ 1 + |x|2
2 w w w and we call P2 (H) the corresponding topological space P2(H), σ(P2(H), C2 (H)) . In this way, P2 (H) ∗ w inherits the weak topology of a subset of the dual of the Banach space C2 (H) and we will show that it satisfies all the previous properties (a,. . . ,e). In particular, we will prove that every lower semicontinuous geodesically convex function φ : P2(H) → (−∞, +∞] is also sequentially lower semicontinuous w.r.t. the w weak topology σ(P2(H), C2 (H)). As a byproduct, for every µ0 ∈ P2(H) and τ > 0 the Proximal Point Algorithm in P2(H) (also known as JKO [5] or Minimizing Movement scheme [1]) 1 µk minimizes µ 7→ W 2(µ, µk−1) + φ(µ) (1.11) τ 2τ 2 τ
k has always a solution (µτ )k∈N. w We will then show that the Opial property holds in P2 (H), with the same structure of (1.1): if (µn)n∈N is a w sequence converging to µ in P2 (H) then
2 2 2 W2 (ν, µ) + lim inf W2 (µ, µn) ≤ lim inf W2 (ν, µn) for every ν ∈ P2(H). (1.12) n→∞ n→∞ Applications to the asymptotic convergence of the gradient flows of a lower semicontinuous and geodesi- cally convex functional φ : P2(H) → (−∞, +∞] can then be easily derived by the same strategy of [3], by using the metric characterization of a solution µ : (0, ∞) → D(φ) of the gradient flow of φ in P2(H) in terms of Evolution Variational Inequalities [1] (see [2] for such a metric approach to (1.5) in Hilbert spaces)
1 d 2 1 W (µt, σ) ≤ φ(σ) − φ(µt) L -a.e. in (0, ∞), for every σ ∈ D(φ). (EVI) 2 dt 2 Analogous results hold for the convergence of the Proximal Point Algorithm (here we use the discrete estimates of [1] for (1.11) assuming convexity along generalized geodesics) and for the approximation of the fixed point of a non-expansive and asymptotically regular map T defined in a weakly closed subset of P2(H).
Plan of the paper. We will collect in Section2 the main facts concerning optimal transport and Kantorovich- Rubinstein-Wasserstein distances; we adopt a general topological framework, in order to include conver- gence of Borel probability measures with respect to non metrizable topologies as the weak topology in a Hilbert space. w Section3 is devoted to the definition of the weak topology of P2 (H). Here we adopt a more general viewpoint, considering probability measures in product spaces Z = Xs × Yw where Xs is a Banach space with its strong topology and Yw is a reflexive Banach space endowed with its weak topology under the general p-q growth condition Z p q kxkX + kykY dµ(x, y) < ∞ X×Y sw and leading to the space Ppq (X × Y). This is quite useful to deal with the integration of bilinear forms; when q = p = 2 and X = Y = H is an Hilbert space, we recover the case which is particularly relevant for w our further applications. The weak topology of P2 (H) corresponds to the choice X = {0} and Y = H. As sw a byproduct, we will prove the stability of optimal couplings w.r.t. the convergence in P22 (H × H). w Section4 deals with the weak lower semicontinuity in P2 (H) of geodesically convex functionals. The w Opial property in P2 (H) is discussed in Section5. The last Section6 contains the applications to the asymptotic behaviour of gradient flows, of the Proximal Point Algorithm, and of the iteration of non-expansive and asymptotically regular maps in P2(H).
Acknowledgments. G.S. gratefully acknowledges the support of the Institute of Advanced Study of the Technical University of Munich, of the MIUR-PRIN 2017 project ”Gradient flows, Optimal Transport and Metric Measure Structures” and of the IMATI-CNR, Pavia. The authors also acknowledge the support of the Department of Mathematics “F. Casorati" of the University of Pavia during the preparation of a prelim- inary version of the paper. E.N. acknowledges that this project has received funding from the European Union’s Framework Pro- gramme for Research and Innovation Horizon 2020 (2014-2020) under the Marie Skłodowska-Curie Grant Agreement No. 861137 G.S. graduated and began his research under the scientific direction of Claudio Baiocchi, to whom he is
3 deeply grateful and indebted for the generous and enlightening guidance, and his gentle humanity. Thanks to the originality of his ideas, the depth of his vision, the superb clarity and broad scope of his teaching, and his encouragement towards new and stimulating problems, Claudio Baiocchi had a fundamental influence on the development of the "Pavia School". To remain in the context of the present paper, here we will limit ourselves to quote his deep contribution to the study of evolution variational inequalities, in particular [2], which had a profound impact on numerous developments and later on the metric theory of gradient flows.
2 Preliminaries 2.1 Radon measures in completely regular topological spaces Let (X, T) be a Hausdorff topological space. We will denote by B(X) the Borel σ-algebra in X and by P(X) the set of all Radon probability measures, i.e. Borel probability measures satisfying the inner approximation property by compact sets
∀B ∈ B(X), ∀ε > 0 ∃Kε ⊂ B compact such that µ(B \ Kε) ≤ ε. (2.1)
Recall that if (X, T) is Polish (i.e. its topology is induced by a metric d such that (X, d) is complete and separable) or it is Lusin (i.e. X admits a Polish topology T0 finer than T) then every Borel probability measure on X is also Radon. We notice moreover that if µ is a Radon measure in a metric space X then its support supp(µ) is separable. We will mostly deal with Borel probability measures in separable Hilbert/Banach spaces (or dual of separa- ble Banach spaces), possibly endowed with their weak/weak∗ topology. Clearly a separable Banach space is Polish; the dual of a separable Banach space with its weak∗ topology is a Lusin space [14, Theorem 7, page 112]. In order to define the natural weak topology of P(X) in such general settings, let us recall that a topological space (X, T) is called completely regular if it is Hausdorff and for all closed set F ⊂ X and for all x0 ∈ X \ F there exists f ∈ Cb(X) (the space of continuous and bounded real functions defined in (X, T)) such that f(x ) = 0 and f ≥ 1. It is worth noticing that every metric space (X, d) and every Hausdorff 0 |F topological vector space (in particular every Banach space endowed with the weak or the weak∗ topology) are completely regular topological spaces. Definition 2.1 (Narrow topology in P(X)). Let (X, T) be a completely regular topological space. The R narrow topology in P(X) is the coarsest topology in P(X) such that all the functionals µ 7→ X f dµ, f ∈ Cb(X), are continuous. In particular, we say that a sequence (µn)n∈N ⊂ P(X) is narrowly convergent to µ ∈ P(X) if Z Z lim f(x) dµn(x) = f(x) dµ(x) ∀f ∈ Cb(X). (2.2) n→∞ X X
Narrow topology in P(X) is also often called weak topology; since we will mostly deal with the case when X is a Banach space endowed with its strong or weak topology (and the corresponding topologies in P(X)), we will adopt the term narrow in order to avoid possible misunderstandings. The Prokhorov Theorem (see [4, III-59] a proof) provides an important criterium for relative compactness w.r.t. the narrow topology.
Definition 2.2 (Tightness). A subset K ⊂ P(X) is tight if ∀ε > 0 there exists a compact set Kε ⊂ X such that µ(X \ Kε) < ε for all µ ∈ K. Theorem 2.3 (Prokhorov). Let K be a tight subset of P(X), then K is relatively compact in P(X) w.r.t. the narrow convergence. Conversely, if X is a Polish space, then every relatively compact (w.r.t. the narrow convergence) subset of P(X) is tight. The following result provides an useful integral condition for tightness. Proposition 2.4. K ⊂ P(X) is tight if and only if there exists a function ϕ : X → [0, +∞), with compact sublevels, such that Z sup ϕ(x) dµ(x) ≤ C < +∞ (2.3) µ∈K X
4 2.2 Transport of measures
Let Xi, i = 1, 2, be Hausdorff topological spaces, µ ∈ P(X1) and r : X1 → X2 be a Lusin µ-measurable map (e.g. a continuous map); we denote by r]µ ∈ P(X2) the push-forward of µ through r, defined by
−1 r]µ(B) := µ(r (B)) ∀B ∈ B(X2). (2.4)
A particularly important case is provided by the projection maps in product spaces. For an integer N ≥ 2 and i, j = 1, ..., N, we denote by πi and πi,j the projection operators defined on the product space X := X1 × ... × XN and respectively defined by
i i,j π :(x1, ..., xN ) 7→ xi ∈ Xi, π :(x1, ..., xN ) 7→ (xi, xj) ∈ Xi × Xj. (2.5)
i i If µ ∈ P(Xi), i = 1, ..., N, the class of multiple plans with marginals µ is defined by
1 N i i Γ(µ , ..., µ ) := {µ ∈ P(X): π]µ = µ , i = 1, ..., N}. (2.6) In the case N = 2, µ ∈ Γ(µ1, µ2) is also called a transport plan or coupling between µ1 and µ2. 1 1 Remark 2.5. To every µ ∈ P(X1) and every Lusin µ -measurable map r : X1 → X2 and we can associate the transport plan
1 1 1 µ := (idX1 × r)]µ ∈ Γ(µ , r]µ ), where idX1 : X1 → X1 is the identity map. (2.7) If µ is representable as in (2.7) then we say that µ is induced by r.
The following glueing lemma guarantees the existence of multiple plans with given marginals. 12 13 Lemma 2.6. Let X1,X2,X3 be Lusin or metrizable spaces and let γ ∈ P(X1 ×X2), γ ∈ P(X1 ×X3) 1 12 1 13 1 such that π] γ = π] γ = µ ∈ P(X1). Then there exists γ ∈ P(X1 × X2 × X3) such that
12 12 13 13 π] γ = γ , π] γ = γ . (2.8)
1 1 2 1 3 We denote by Γ (γ , γ ) the subset of plans µ ∈ P(X1 × X2 × X3) satisfying (2.8).
2.3 Optimal Transport and Kantorovich-Rubinstein-Wasserstein spaces Let (X, d) be a metric space and let p ∈ [1, +∞). We say that a Radon measure µ ∈ P(X) belongs to Pp(X) if Z p d (x, xo) dµ(x) < +∞ for some (and thus any) xo ∈ X. (2.9) X
p Definition 2.7. The L -Kantorovich-Rubinstein-Wasserstein distance Wp between two Radon probability 1 2 measures µ , µ ∈ Pp(X) is defined by Z p 1 2 n p 1 2 o Wp (µ , µ ) := min d (x1, x2) dµ(x1, x2): µ ∈ Γ(µ , µ ) .
1 2 1 2 We denote by Γo(µ , µ ) ⊂ Γ(µ , µ ) the convex and narrowly compact set of optimal plans where the minimum is attained, i.e. Z 1 2 p p 1 2 γ ∈ Γo(µ , µ ) ⇐⇒ d (x1, x2) dγ(x1, x2) = Wp (µ , µ ).
It is possible to prove that Γo is not empty and (Pp(X),Wp) is a metric space. It is easy to check that a set K ⊂ Pp(X) is bounded (i.e. there exists a measure ν ∈ Pp(X) such that {Wp(µ, ν)}µ∈K is a bounded subset of R) if and only if Z p sup d (x, xo) dµ < +∞ for some (and thus any) point xo ∈ X. (2.10) µ∈K X The following result shows the relationships the narrow topology and the topology induced by the Wasser- stein distance Wp.
5 Proposition 2.8. If (X, d) is separable (resp. complete) then (Pp(X),Wp) is a separable (resp. complete) metric space. A set K ⊂ Pp(X) is relatively compact iff it has uniformly integrable p-moments and is tight. In particular, for a given sequence (µn) ⊂ Pp(X) we have ( µn narrowly converge to µ, lim Wp(µn, µ) = 0 ⇐⇒ (2.11) n→∞ (µn) has uniformly integrable p-moments.
1 2 Proposition 2.9 (Stability of optimality and narrow lower semicontinuity). Let (µn), (µn) ⊂ Pp(X) be 1 2 1 2 two bounded sequences narrowly converging to µ , µ respectively, and let µn ∈ Γo(µn, µn) be a sequence of optimal plans. Then (µn) is narrowly relatively compact in P(X × X) and any narrow limit point µ 1 2 belongs to Γo(µ , µ ), with Z p 1 2 p Wp (µ , µ ) = d (x1, x2) dµ(x1, x2) X2 Z (2.12) p p 1 2 ≤ lim inf d (x1, x2) dµn(x1, x2) = lim inf Wp (µn, µn). n→∞ X2 n→∞ 3 A strong-weak topology on measures in product spaces
Let us consider a separable Banach space X endowed with the strong topology induced by its norm k · kX (we will occasionally use the notation Xs when we want to emphasize the choice of the strong topology) and a reflexive and separable Banach space (Y, k · kY). We will denote by Yw the space Y endowed with the weak topology σ(Y, Y0).
We are interested in Radon probability measures in the topological space Xs × Yw. Since Xs × Yw is endowed with the product topology of two Lusin and completely regular spaces, it is a completely regular Lusin space as well; in particular Borel measures are Radon, the set P(Xs × Yw) coincides with P(Xs ×Ys), and narrow convergence in P(Xs × Yw) is well defined. Let us set Z := X × Y; we want now to introduce a natural topology on the subset Z n p q o Ppq(Z) := µ ∈ P(Z): kxkX + kykY dµ(x, y) < +∞ p ∈ [1, +∞), q ∈ (1, +∞). (3.1)
sw In order to define such a topology, we consider the space Cpq (Z) of test functions ζ : Z → R such that
ζ is sequentially continuous in Xs × Yw, (3.2) p q ∀ ε > 0 ∃ Aε ≥ 0 : |ζ(x, y)| ≤ Aε(1 + kxkX) + εkykY for every (x, y) ∈ X × Y. (3.3)
sw We endow Cpq (Z) with the norm
|ζ(x, y)| kζk sw := sup . (3.4) Cpq (Z) p q (x,y)∈Z 1 + kxkX + kykY
Remark 3.1. When Y is finite dimensional, (3.2) is equivalent to the continuity of ζ. It is worth noticing that sw if p, q are conjugate exponents, any continuous and bilinear map β : X × Y → R belongs to Cpq (X × Y). In fact, it is easy to check that β is sequentially continuous in Xs × Yw and its continuity yields the existence of a constant L ≥ 0 such that
|β(x, y)| ≤ LkxkX kykY for every x ∈ X, y ∈ Y, so that Lp ε |β(x, y)| ≤ kxkp + kykq for every x ∈ X, y ∈ Y, ε > 0. pεp/q X q Y This covers in particular the case when Y = X is an Hilbert space and β coincides with the scalar product h·, ·i in X. sw Lemma 3.2. (C (Z), k · k sw ) is a Banach space. pq Cpq (Z)
6 Proof. It is obvious that k · k sw is a norm, we can thus check the completeness. Let {ζ } be a Cpq (Z) n n∈N sw Cauchy sequence in Cpq (Z) and let ζ its pointwise limit, so that for every η > 0 there exists Nη ∈ N such that p q |ζ(x, y) − ζn(x, y)| ≤ η(1 + kxkX + kykY) for every (x, y) ∈ X × Y, n ≥ Nη. (3.5)
If (xk, yk) is a sequence converging to (¯x, y¯) in Xs × Yw as k → ∞ we know that
p q p q kx¯kX + ky¯kY ≤ R := sup kxkkX + kykkY < ∞ k so that for every n ≥ Nη
lim sup |ζ(¯x, y¯) − ζ(xk, yk)| ≤ 2(1 + R)η + lim sup |ζn(¯x, y¯) − ζn(xk, yk)| = 2(1 + R)η. k→∞ k→∞ Since η is arbitrary, we conclude that ζ is sequentially continuous in Z. Let us eventually check that ζ satisfies (3.3). For a given ε > 0 we apply (3.5) with η := ε/2 and we pick up n ≥ Nη. Since ζn belongs sw to Cpq (Z), we find Bε ≥ 0 such that
p q |ζn(x, y)| ≤ Bε(1 + kxkX) + ε/2kykY for every (x, y) ∈ X × Y. Combining such inequality with (3.5) we conclude that
p q |ζ(x, y)| ≤ |ζ(x, y) − ζn(x, y)| + |ζn(x, y)| ≤ (ε + Bε)(1 + kxkX) + εkykY.
sw Definition 3.3 (Topology of Ppq (X × Y)). We endow Ppq(X × Y) with the initial topology induced by the functions Z sw µ 7→ ζ(x, y) dµ(x, y), ζ ∈ Cpq (X × Y), (3.6)
sw sw and we call Ppq (X × Y) the topological space Ppq(X × Y), σ Ppq(X × Y), Cpq (X × Y) .
It is obvious that whenever r ≥ p ∨ q the Wasserstein topology of Pr(X × Y) (induced by the Wasserstein distance Wr generated by any product norm in the Banach space X × Y) is finer than the topology of sw Ppq (X × Y) and the latter is finer than the narrow topology of P(Xs × Yw). The next proposition collects sw other important properties and justifies the interest of the Ppq (X × Y)-topology. sw sw Proposition 3.4. (a) If (µα)α∈A ⊂ Ppq (Z) is a net indexed by the directed set A and µ ∈ Ppq (Z) satisfy
(i) µα → µ narrowly in P(Xs × Yw); Z Z p p (ii) lim kxkX dµα = kxkX dµ; α∈A Z q (iii) sup kykY dµα < ∞, α∈A sw then µα → µ in Ppq (Z). The converse property holds for sequences: i.e. if A = N and µn → µ in sw Ppq (Z) as n → ∞, then properties (a), (b), (c) hold.
(b) For every compact set K ⊂ Pp(X) and every constant c < ∞ the sets Z n sw 1 q o Kc := µ ∈ Ppq (Z): π] µ ∈ K, kykY dµ ≤ c (3.7)
sw are compact and metrizable in Ppq (Z) (in particular they are sequentially compact).
sw Proof. Let us consider the first claim and let (µα)α∈A in Ppq (Z) satisfy properties (i), (ii), (iii) with S := R q supα kykY dµα < ∞. 1 1 sw We first observe that π] µα → π] µ in Pp(X). Let us now fix ζ ∈ Cpq (Z) and for every ε > 0 let Aε as p q in (3.3). The function ζε(x, y) := ζ(x, y) + Aε(1 + kxkX) + 2εkykY is nonnegative and it is also lower semicontinuous w.r.t. the Xs × Yw-topology: in fact, the sublevels Xε,c := {(x, y) ∈ X × Y : ζε(x, y) ≤ c}
7 q of ζε are sequentially closed and contained in X × {y ∈ X : kykY ≤ c/ε} which is a metrizable space, so that Xε,c is closed in Xs × Yw. It follows that Z Z Z Z p q lim inf ζ dµα = lim inf ζε dµα − Aε 1 + kxkX dµα − 2ε kykY dµα α∈A α∈A Z Z Z p ≥ ζε dµ − Aε 1 + kxkX dµ − 2εS ≥ ζ dµ − 2εS R R and, since ε > 0 is arbitrary, lim infα∈A ζ dµα ≥ ζ dµ. Applying the same argument to −ζ we sw conclude that µα converges to µ in Ppq (X × Y).
In order to prove the converse implication in the case of sequences, let us observe that if µn → µ in sw sw Ppq (X × Y) then properties (i) and (ii) are obvious. Since Cpq (X × Y) is a Banach space and each measure sw µn induces a bounded linear functional Ln on Cpq (X × Y), the principle of uniform boundedness implies that S := sup kL k sw 0 < ∞, i.e. n n (Cpq (Z)) Z sw p q ζ dµn ≤ S for every ζ ∈ Cpq (Z), |ζ(x, y)| ≤ 1 + kxkX + kykY. (3.8)
0 Let now (eh)h∈N be a strongly dense subset of the unit ball of Y (the dual of Y, which is separable as well) and let q ζk(x, y) := sup |hy, ehi| ∧ k (3.9) 1≤h≤k sw Clearly each ζk belongs to the unit ball of Cpq (Z) so that Z ζk(x, y) dµn(x, y) ≤ S for every k, n ∈ N. (3.10)
q R q Since ζk(x, y) ↑ kykY as k → ∞, Lebesgue Dominated Convergence Theorem yields kykY dµn ≤ S for every n ∈ N.
q (b) Since K is tight and y 7→ kykY has compact sublevel in Yw, the set Kc is tight in P(Z) and it is also closed, so that it is compact in P(Z). Every net (µα)α∈A in Kc has a subnet (µα(β))β∈B converging to 1 R p R p µ ∈ Kc in P(Z). Since π] µα is uniformly p-integrable we deduce limβ∈B kxkX dµα(β) = kxkX dµ. sw Applying the previous claim, we deduce that µα(β) → µ w.r.t. Ppq (X × Y). In order to prove the metriz- ability we observe that the bounded distance on Y ∞ X −n 0 d$(y1, y2) := 2 (|hy1 − y2, eni| ∧ 1) where (en)n∈N is dense in the unit ball of Y , (3.11) n=1 0 induces a coarser topology than σ(Y, Y ) in Y, so that the L1-Wasserstein distance associated to
d((x1, y1), (x2, y2)) := |x1 − x2|X + d$(y1, y2) sw induces a coarser topology than the topology of Ppq (X × Y), which on the other hand coincides with the sw Ppq (X × Y)-topology on the compact set Kc. Kc is therefore metrizable.
sw It is worth noticing that the topology of Ppp (X × Y) is strictly coarser than the Wasserstein topology of sw Pp(X × Y) even when Y is finite dimensional. In fact, Cpp (X × Y) does not contain the function (x, y) 7→ p kykY, so that convergence of the p-moment w.r.t. y is not guaranteed. The previous construction is useful also in the case of a single space Y (we may think that X reduces to {0}). w Definition 3.5 (The topology of Pq (Y)). Let Y be a reflexive and separable Banach space and q ∈ (1, +∞). w (a) Cq (Y) is the Banach space of sequentially weakly continuous (continuous, if Y is finite dimensional) functions ζ : Y → R satisfying q ∀ ε > 0 ∃ Aε ≥ 0 : |ζ(y)| ≤ Aε + εkykY for every y ∈ Y, (3.12) ζ(y) or, equivalently, lim q = 0, endowed with the norm kykY→∞ 1 + kykY |ζ(y)| kζk w := sup . (3.13) Cq (Y) q y∈Y 1 + kykY
8 w w (b) Pq (Y) is the topological space of measures in Pq(Y) endowed with the initial topology σ Pq(Y), Cq (Y) ∗ w 0 (or, equivalently, the weak topology of Cq (Y) ). The following result is an immediate consequence of Proposition 3.4. Corollary 3.6. Let Y be a reflexive and separable Banach space and q ∈ (1, +∞). w (a) The topology of Pq (Y) is finer than the narrow topology of P(Yw); they coincide on bounded subsets K of Pq(Y), i.e. satisfying Z q sup kykY dµ < ∞. (3.14) µ∈K
w (b) a sequence (µn)n∈N converges to µ in Pq (Y) if and only if Z q (µn)n∈N converges narrowly in P(Yw) and sup kykY dµn(y) < ∞. n∈N
w (c) a set K ⊂ Pq(Y) is relatively sequentially compact in Pq (Y) if and only if it satisfies (3.14). sw sw 2 2 w (d) If a sequence (µn)n∈N in Ppq (X × Y) converges to µ in Ppq (X × Y) then π] µn → π] µ in Pq (Y) 1 1 (and π] µn → π] µ in Pp(X)).
Remark 3.7. All the definitions and results of this Section can be easily extended to the case when Yw is 0 ∗ replaced by the dual Yw∗ of a separable Banach space endowed with its weak -topology and we deal with 0 sw∗ 0 w∗ 0 the product Xs × Yw∗. We could therefore consider the spaces Ppq (X × Y ) and Pq (Y ). Let us conclude this section with a useful application of the weak topology of Definition 3.3 to the stability of optimal plans in Hilbert spaces. i Theorem 3.8. Let H be a separable Hilbert space, let (µn)n∈N be two sequences in P2(H), i = 1, 2 with 1 2 1 2 1 2 γn ∈ Γo(µn, µn), and let µ , µ ∈ P2(H) with γ ∈ Γ(µ , µ ). If
2 (µn) is tight in P(Hw) and γn → γ narrowly in P(Hs × Hw) as n → ∞, (3.15)
1 1 2 2 1 2 then µn → µ narrowly in P(H), µn → µ narrowly in P(Hw) and γ ∈ Γo(µ , µ ). In particular, any sw limit point γ of optimal plans γn in P22 (H × H) is optimal as well.
1 2 Proof. The statement concerning the convergence of µn and µn is obvious. Since γ has finite quadratic moment, in order to check its optimality it is sufficient to prove that γ is concentrated on a cyclically monotone set, i.e. there exists a Borel set M ⊂ H × H such that γ(H2 \ M) = 0 and for every N ∈ N
N k k 0 0 N N X k k−1 k (x1 , x2 ) ∈ M, k = 0, ··· ,N, with (x1, x2) = (x1 , x2 ) ⇒ hx1 − x1 , x2 i ≥ 0. (3.16) k=1
The standard idea, i.e. using the convergence of the supports of γn, should be adapted to the case of the (non-metrizable) weak topology of H. We thus consider also the metric space (H$, d$), whose metric has been defined by (3.11) (here Y = H); we recall that the topology induced by d$ coincides with the weak topology on every bounded subset of H. 1 Since (µn)n is narrowly convergent in P(H) it is tight, so that we can find a function ψ1 : H → [0, +∞] R 1 2 with strongly compact sublevels such that H ψ1(x) dµn(x) ≤ S1 < ∞ for every n ∈ N. Since (µn) is tight in P(Hw) we can find a function ψ2 : H → [0, +∞] with weakly compact sublevels such that R 2 2 H ψ2(x) dµn(x) ≤ S2 < ∞ for every n ∈ N. Let us set σn := (IdH×H ×ψ)]γn ∈ P(H × [0, +∞)). We have that Z ψ1(x1) + ψ2(x2) + |r|) dσn(x1, x2, r) ≤ S1 + 2S2 (3.17) so that the sequence (σn)n∈N is tight in P(H × H$ × R) (recall (3.11)). Since Hs × H$ × R is metrizable, we can thus extract a subsequence (still denoted by σn) converging to a limit plan σ ∈ P(H × H × R) such 12 that π] σ = γ. Since σ is a Radon probability measure in H2 × R, we can find an increasing sequence of compact sets 2 2 Kj ⊂ supp(σ) ⊂ H ×R such that σ(H ×R\∪jKj) = 0. It follows that γ is concentrated on M := ∪jMj 12 where Mj := π Kj) are compact sets.
9 k k k Let now (x1 , x2 ), k = 0, ··· ,N, be points in M as in (3.16). There exists j ∈ N and points r ≥ 0 such k k k that (x1 , x2 , r ) ∈ Kj. Since σn is concentrated on (IdH2 ×ψ)(supp(γn)), we can thus find a sequence k k (x1,n, x2,n) ∈ supp(γn) such that
k k k k k k x1,n → x1 strongly in H, d$(x2,n, x2 ) → 0, ψ2(x2,n) → r in R as n → ∞. (3.18)
k k Since ψ2 has weakly compact sublevels, we deduce that x2,n * x2 as n → ∞. Since γn is cyclically monotone, we know that
N X k k−1 k hx1,n − x1,n , x2,ni ≥ 0 for every n ∈ N. (3.19) k=1 We can then pass to the limit as n → ∞ in (3.19) and using the sequential continuity of the scalar product in H × Hw we obtain (3.16).
4 Weak lower semicontinuity of geodesically convex functions in P2(H)
Let (H, | · |) be a separable Hilbert space and let ϕ : H → R ∪ {+∞} be a convex function. It is well known that ϕ is lower semicontinuous w.r.t. the strong topology of H if and only if it is lower semicontinuous w.r.t. the weak topology. We want to extend this property to geodesically convex functions φ : P2(H) → R ∪ {+∞}, an important class of functions introduced by McCann [8].
Let us first recall that a (minimal, constant speed) geodesic (µs)s∈[0,1] in P2(H) connecting two given measures µ0, µ1 ∈ P2(H) is a Lipschitz curve satisfying
W2(µs, µt) = |t − s|W2(µ0, µ1) for every s, t ∈ [0, 1]. (4.1)
Equivalently, it is possible to prove (see e.g. [1]) that a curve (µs)s∈[0,1] is a geodesic if and only if there exists an optimal plan µ ∈ Γo(µ0, µ1) such that
1→2 1→2 µs := (πs )]µ, πs (x1, x2) := (1 − s)x1 + sx2 x1, x2 ∈ H, s ∈ [0, 1]. (4.2)
Definition 4.1. Let φ : P2(H) → R ∪ {+∞} be a function with proper domain D(φ) := {µ ∈ P2(H): φ(µ) < ∞}= 6 Ø. φ is geodesically convex if every µ0, µ1 ∈ D(φ) can be connected by a geodesic (µs)s∈[0,1] in P2(H) along which
φ(µs) ≤ (1 − s)φ(µ0) + sφ(µ1) for every s ∈ [0, 1]. (4.3)
Equivalently, there exists µ ∈ Γo(µ1, µ2) such that