[1] the Wasserstein Metric Wp (With P ≥ 1) Is Deﬁned for Probability Measures on a Radon Space

PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 136, Number 1, January 2008, Pages 333–339 S 0002-9939(07)09020-X Article electronically published on September 27, 2007 AN ELEMENTARY PROOF OF THE TRIANGLE INEQUALITY FOR THE WASSERSTEIN METRIC PHILIPPE CLEMENT AND WOLFGANG DESCH (Communicated by Richard C. Bradley) Abstract. We give an elementary proof for the triangle inequality of the p- Wasserstein metric for probability measures on separable metric spaces. Unlike known approaches, our proof does not rely on the disintegration theorem in its full generality; therefore the additional assumption that the underlying space is Radon can be omitted. We also supply a proof, not depending on disintegration, that the Wasserstein metric is complete on Polish spaces. 1. Introduction In [1] the Wasserstein metric Wp (with p ≥ 1) is defined for probability measures on a Radon space. The proof of the triangle inequality relies on the disintegration theorem [1, Theorem 5.3.1]. The aim of this paper is to give an elementary proof of the triangle inequality for a general separable metric space. To be more precise, we introduce the following notation and definitions (according to [1]): Let (X, d) be a separable metric space with its Borel σ-algebra B(X). Here P(X) denotes the set of Borel probability measures. If (Y,d) is another separable metric space, if µ ∈P(X)andf : X → Y is a measurable map, then f #µ −1 is the image measure on B(Y ), i.e. (f #µ)(A)=µ(f (A)) for A ∈B(Y ). In particular, considering the product X1 × X2 of two separable metric spaces, we define i i the canonical projections π : X1 × X2 → Xi.Ifγ ∈P(X1 × X2), then π #γ (for i =1, 2) are the marginal distributions of γ. Similarly, for products of three spaces, i,j i π : X1 × X2 × X3 → Xi × Xj denotes the canonical projection. If µ ∈P(Xi), we define 1 2 i i Γ(µ ,µ )={γ ∈P(X1 × X2): π #γ = µ for i =1, 2}. Let 1 ≤ p<∞.ByPp(X)wedenotethesetofallµ ∈P(X) such that p ∞ ∈ 1 2 ∈P X d(x, y) dµ(x) < for some (equivalently, all) y X.Forµ ,µ p(X)we define the Wasserstein metric as in [2, Section 11.8, Problem 7]: 1/p 1 2 p (1.1) Wp(µ ,µ )= inf d(x1,x2) dγ(x1,x2) . ∈ 1 2 γ Γ(µ ,µ ) X×X In [2, Section 11.8.3] it is shown for p =1thatW1 is a metric. Moreover, [2, Section 11.8.3, Problem 9] consists in proving that Wp is a metric for p ≥ 1, Received by the editors October 30, 2006. 2000 Mathematics Subject Classification. Primary 60B05. Key words and phrases. Wasserstein metric, triangle inequality, probability measures on metric spaces. c 2007 American Mathematical Society 333 License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use 334 PHILIPPE CLEMENT AND WOLFGANG DESCH provided (X, d) is a Polish space. In [1, Section 7.1] it is proved that Wp is a metric for p ≥ 1 under the more general assumption that (X, d) is a separable Radon space. In this note (Section 2) we prove that the triangle inequality can be proved by more elementary means, thus extending its validity to general separable metric spaces. The strategy of our proof follows quite closely that of [2]. However, in the case of a countable metric space, the disintegration can be done in a very straightforward way without requiring higher tools of measure theory. The generalisation from countable to separable metric spaces is done by an approximation procedure. In [1], the disintegration theorem is also used to prove the completeness of (Pp(X),Wp)when(X, d) is complete. In Section 3 of this note we give a proof which does not rely on the disintegration theorem but on the completeness of (P(X),β), where β is the dual bounded Lipschitz metric (see [2, p. 394]). 2. The triangle inequality We begin with a proof of the triangle inequality in the case where (X, d)isa countable metric space. Proposition 2.1. Let (X, d) be a countable metric space and let 1 ≤ p<∞.Let 1 2 3 1,2 1 2 2,3 2 3 µ ,µ ,µ ∈Pp(X), γ ∈ Γ(µ ,µ ) and γ ∈ Γ(µ ,µ ). Then there exists some γ1,3 ∈ Γ(µ1,µ3) such that 1/p p 1,3 d(x1,x3) dγ (x1,x3) X×X 1/p 1/p p 1,2 p 2,3 ≤ d(x1,x2) dγ (x1,x2) + d(x2,x3) dγ (x1,x3) . X×X X×X 1 3 1 2 2 3 In particular, Wp(µ ,µ ) ≤ Wp(µ ,µ )+Wp(µ ,µ ) . { ···} i i { } i,j Proof. Let X = v1,v2, . For short, we denote µk = µ ( vk )andγk,l = i,j γ ({(vk,vl)}) whenever i, j ∈{1, 2, 3} and k, l ∈ N. We define (in accordance with the notation just introduced) the measure × × γ = γk,m,n(δvk δvm δvn ) k,m,n γ1,2 γ2,3 k,m m,n 2 µ2 if µm =0, with γk,m,n = m 2 0ifµm =0. Since the first marginal of γ2,3 equals µ2,weobtain γ1,2 γ2,3 k,m m,n 1,2 2 1,2 n µ2 = γk,m if µm =0, π #γ({(vk,vm)})= m 1,2 2 0=γk,m if µm =0. 2,3 2,3 Similarly, π #γ = γ . Since the marginals are probability measures, we infer j j that γ is a probability measure. Moreover it follows for j =1, 2, 3thatπ #γ = µ . We define 1,3 1,3 γ = π #γ which is again a probability measure and has marginals µ1 and µ3. License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use WASSERSTEIN METRIC 335 Now use the definition of γ1,3 and Minkowski’s inequality to estimate 1/p p 1,3 d(x1,x3) dγ (x1,x3) X×X 1/p p = d(x1,x3) dγ(x1,x2,x3) X×X×X 1/p p ≤ [d(x1,x2)+d(x2,x3)] dγ(x1,x2,x3) X×X×X 1/p p ≤ d(x1,x2) dγ(x1,x2,x3) X×X×X 1/p p + d(x2,x3) dγ(x1,x2,x3) X×X×X 1/p 1/p p 1,2 p 2,3 = d(x1,x2) dγ (x1,x2) + d(x2,x3) dγ (x2,x3) . X×X X×X Thus γ1,3 satisfies the desired estimate. The extension from the case of a countable metric space to a separable space is done by an approximation procedure. We will need several lemmas. Lemma 2.2. Let (X, d) be a separable metric space and let X˜ be a countable dense subset of X. Then, for each >0, there exists a Borel measurable map f : X → X˜ −1 such that d(x, f(x)) <for each x ∈ X.ThesetsSi = f (vi) form a partition of X. Proof. Let X˜ = {v1,v2, ···}. We define inductively S = B(v ,), 1 1 Si = B(vi,) \ Sj. j<i This is a partition of X by Borel sets. Finally, we define f(x)=vi whenever x ∈ Si. Remark 2.3. By skipping the indices where Si = ∅ and renumbering, we can obtain a partition of X into nonempty sets Si. Lemma 2.4. Let X be a separable metric space and let X˜ be a countable dense subset of X.Let>0 and let f be given according to Lemma 2.2. Moreover, let γ ∈P(X × X) and γ˜ ∈P(X˜ × X˜) be such that γ˜ =(f × f)#γ. Then the following assertions hold: i i (1) the marginals satisfy (for i =1, 2) π #γ˜ = f #(π#γ), 1/p 1/p p − p ≤ (2) X×X d(x, y) dγ(x, y) X˜×X˜ d(x, y) dγ˜(x, y) 2. Proof. Let U˜ ⊂ X˜.Then 1 −1 1 −1 1 (π #γ˜)(U˜)=˜γ(U˜ × X˜)=γ(f (U˜) × X)=(π #γ)(f (U˜)) = [f #(π #γ)](U˜). License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use 336 PHILIPPE CLEMENT AND WOLFGANG DESCH Of course, the same proof holds for π2. To obtain the estimate, we utilize the fact thatγ ˜ =(f × f) γ and Minkowski’s inequality: # 1/p 1/p d(x, y)p dγ(x, y) − d(x, y)p dγ˜(x, y) × ˜× ˜ X X X X 1/p 1/p p − p = d(x, y) dγ(x, y) d(f(x),f(y)) dγ(x, y) X×X X×X 1/p ≤ |d(x, y) − d(f(x),f(y))|p dγ(x, y) X×X 1/p ≤ (2)p dγ(x, y) =2. X×X Theorem 2.5. Let (X, d) be a separable metric space and let 1 ≤ p<∞.Let 1 2 3 1 3 1 2 2 3 µ ,µ ,µ ∈Pp(X).ThenWp(µ ,µ ) ≤ Wp(µ ,µ )+Wp(µ ,µ ). Proof. Let X˜ = {v1,v2, ···} be a dense countable subset of X,let>0, and let −1 f begivenaccordingtoLemma2.2.LetSi = f ({vi}). For i =1, 2, 3 we define i i µ˜ = f #µ . Now, for the pairs (i, j) ∈{(1, 2), (2, 3)},letγi,j ∈ Γ(µi,µj) be such that 1/p p i,j i j d(xi,xj) dγ (xi,xj) <Wp(µ ,µ )+. X×X i,j i,j On X˜ × X˜ we define the measuresγ ˜ =(f × f)#γ .

[1] the Wasserstein Metric Wp (With P ≥ 1) Is Deﬁned for Probability Measures on a Radon Space

Wasserstein Distance W2, Or to the Kantorovich-Wasserstein Distance W1, with Explicit Constants of Equivalence

Free Complete Wasserstein Algebras

Statistical Aspects of Wasserstein Distances

Optimal Transportation: Duality Theory

Data Assimilation in Optimal Transport Framework

Pdfs) Is Discussed

Constrained Steepest Descent in the 2-Wasserstein Metric

Computational Optimal Transport

Some Geometric Calculations on Wasserstein Space

Maximum Mean Discrepancy Gradient Flow

Arxiv:1505.04954V2 [Math.PR]

1.2 Optimal Transport and Wasserstein Metric