Large Deviations for a matching problem related to the ∞-Wasserstein distance
José Trashorras
To cite this version:
José Trashorras. Large Deviations for a matching problem related to the ∞-Wasserstein distance. ALEA : Latin American Journal of Probability and Mathematical Statistics, Instituto Nacional de Matemática Pura e Aplicada, 2018, 15, pp.247-278. hal-00641378v2
HAL Id: hal-00641378 https://hal.archives-ouvertes.fr/hal-00641378v2 Submitted on 15 Feb 2017
José Trashorras ∗ February 15, 2017
Abstract
Let $(E, d)$ be a compact metric space, $X = (X_1, \dots, X_n, \dots)$ and $Y = (Y_1, \dots, Y_n, \dots)$ two independent sequences of independent $E$-valued random variables and $(L_n^X)_{n\ge 1}$ and $(L_n^Y)_{n\ge 1}$ the associated sequences of empirical measures. We establish a large deviation principle for $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$ where $W_\infty$ is the ∞-Wasserstein distance,
\[ W_\infty(L_n^X, L_n^Y) = \min_{\sigma\in S_n} \max_{1\le i\le n} d(X_i, Y_{\sigma(i)}) \]
where $S_n$ stands for the set of permutations of $\{1, \dots, n\}$.
1 Introduction
We say that a sequence of Borel probability measures $(P^n)_{n\ge 1}$ on a topological space $\mathcal{Y}$ obeys a Large Deviation Principle (hereafter abbreviated LDP) with rate function $I$ if $I$ is a non-negative, lower semi-continuous function defined on $\mathcal{Y}$ such that
\[ -\inf_{y\in A^o} I(y) \le \liminf_{n\to\infty} \frac{1}{n}\log P^n(A) \le \limsup_{n\to\infty} \frac{1}{n}\log P^n(A) \le -\inf_{y\in \bar{A}} I(y) \]
for any measurable set $A \subset \mathcal{Y}$, whose interior is denoted by $A^o$ and closure by $\bar{A}$. If the level sets $\{y : I(y) \le \alpha\}$ are compact for every $\alpha < \infty$, $I$ is called a good rate function. With a slight abuse of language we say that a sequence of random variables obeys an LDP when the sequence of measures induced by these random variables obeys an LDP. For a background on the theory of large deviations see Dembo and Zeitouni [4] and references therein.
Let $(E, d)$ be a metric space. There has been a lot of interest for years in considering the space $M^1(E)$ of Borel probability measures on $E$ endowed with the so-called $p$-Wasserstein distances
\[ W_p(\nu, \gamma) = \inf_{Q\in C(\nu,\gamma)} \left\{ \int_{E\times E} d(x, y)^p\, Q(dx, dy) \right\}^{1/p} \]
where $p \in [1, \infty)$ and $C(\nu, \gamma)$ stands for the set of Borel probability measures on $E^2$ with first marginal $Q_1 = \nu$ and second marginal $Q_2 = \gamma$, see Chapter 6 in [20] for a broad review. However, the ∞-Wasserstein distance, equivalently defined by either
\[ W_\infty(\nu, \gamma) = \lim_{p\to\infty} W_p(\nu, \gamma) \]
or
\[ W_\infty(\nu, \gamma) = \inf_{Q\in C(\nu,\gamma)} \sup\{u,\ u \in S(Q\circ d^{-1})\} \tag{1} \]
where $S(Q\circ d^{-1}) \subset \mathbb{R}$ stands for the support of the probability measure $Q\circ d^{-1}$, has attracted much less attention so far.¹

∗ Université Paris-Dauphine, Ceremade, Place du maréchal de Lattre de Tassigny, 75775 Paris Cedex 16 France. Email: [email protected]
Our framework is the following: we are given a compact metric space $(E, d)$. Without loss of generality we can assume that $\sup_{x,y\in E} d(x, y) = 1$. Let $(X_n)_{n\ge 1}$ and $(Y_n)_{n\ge 1}$ be two independent sequences of $E$-valued independent random variables defined on the same probability space $(\Omega, \mathcal{A}, \mathbb{P})$. We assume that all the $X_i$'s (resp. $Y_i$'s) have the same distribution $\mu^1$ (resp. $\mu^2$). For every $n \ge 1$ we consider the empirical measures
\[ L_n^X = \frac{1}{n}\sum_{i=1}^n \delta_{X_i} \quad\text{and}\quad L_n^Y = \frac{1}{n}\sum_{i=1}^n \delta_{Y_i}. \]
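In [7] Ganesh and O'Connell conjecture that $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$ obeys an LDP with rate function
\[ I_\infty(x) = \inf_{\substack{\nu^1,\nu^2\in M^1(E) \\ W_\infty(\nu^1,\nu^2)=x}} \left\{ H(\nu^1|\mu^1) + H(\nu^2|\mu^2) \right\} \]
where for any two $\nu, \mu \in M^1(E)$
\[ H(\nu|\mu) = \begin{cases} \int_E \log\frac{d\nu}{d\mu}\, d\nu & \text{if } \nu \ll \mu \\ \infty & \text{otherwise.} \end{cases} \]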
If instead of $I_\infty$ one considers its lower semi-continuous regularization $J_\infty$, which is defined by
\[ J_\infty(x) = \sup_{\delta>0} \inf_{y\in B(x,\delta)} I_\infty(y), \]
see e.g. Chapter 1 in [15], then the following holds.

Theorem 1.1 The sequence $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$ satisfies an LDP on $[0, 1]$ with good rate function $J_\infty(x)$.
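To make the relative entropy $H(\nu|\mu)$ entering the rate function concrete, here is a minimal Python sketch for the special case of finitely supported measures. The function name `relative_entropy` and the dictionary representation `{atom: mass}` are illustrative assumptions, not notation from the paper; the general definition above involves a Radon-Nikodym derivative, which reduces to a ratio of masses in this discrete setting.

```python
import math

def relative_entropy(nu, mu):
    """H(nu|mu) for finitely supported measures given as {atom: mass} dicts.

    Hypothetical helper: returns math.inf when nu is not absolutely
    continuous with respect to mu, mirroring the 'otherwise' branch above.
    """
    total = 0.0
    for atom, p in nu.items():
        if p == 0:
            continue  # convention 0 * log 0 = 0
        q = mu.get(atom, 0.0)
        if q == 0:
            return math.inf  # nu charges a point that mu does not
        total += p * math.log(p / q)
    return total
```

For instance, a Dirac mass compared with the uniform measure on two points has relative entropy $\log 2$, while any measure charging a point outside the support of $\mu$ has infinite relative entropy.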
In Section 3.1 we show that if e.g. $E = [0, 1]$ is endowed with the usual Euclidean distance and $\mu^1 = \mu^2 = \mu$ admits $f(x) = \frac{3}{2} 1_{[0,\frac{1}{3}]}(x) + \frac{3}{2} 1_{[\frac{2}{3},1]}(x)$ as a density w.r.t. the Lebesgue measure then, due to the disconnectedness of the support of $\mu$, $I_\infty$ fails to be lower semi-continuous. Hence, $I_\infty$ and $J_\infty$ do not always coincide. This example illustrates a general feature of the problem considered here: the connectedness properties of the support of the reference measures are the key ingredient in the behavior of $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$. We will say more about that in Proposition 1.3 below.

¹ Let us recall that the support $S(P)$ of a Borel probability measure $P$ on a metric space $(E, d)$ is defined by $x \in S(P)$ if and only if for every $\varepsilon > 0$, $P(B(x, \varepsilon)) > 0$ where $B(x, \varepsilon)$ stands for the open ball centered at $x$ with radius $\varepsilon$. If $E$ is separable, in particular if $E$ is compact, $S(P)$ is closed, see e.g. Theorem 2.1 in [14].

Since for every $n \ge 1$ every $Q \in C(L_n^X, L_n^Y)$ can be represented as a bi-stochastic matrix and since, according to the Birkhoff-Von Neumann Theorem, every bi-stochastic matrix is a convex combination of permutation matrices (see e.g. Theorem 5.5.1 in [17]) we have
\[ W_\infty(L_n^X, L_n^Y) = \min_{\sigma\in S_n} \max_{1\le i\le n} d(X_i, Y_{\sigma(i)}) \tag{2} \]
where $S_n$ stands for the set of permutations of $\{1, \dots, n\}$. Hence, computing $W_\infty(L_n^X, L_n^Y)$ is nothing but solving a minimax matching problem, which is a fundamental combinatorial question. Equality (2) can be rephrased as
\[ W_\infty(L_n^X, L_n^Y) = d_H(S(L_n^X), S(L_n^Y)) \tag{3} \]
where $S(L_n^X) = \{X_1, \dots, X_n\}$, $S(L_n^Y) = \{Y_1, \dots, Y_n\}$ and $d_H$ is the so-called Hausdorff distance. Let us recall that the Hausdorff distance is defined on the set of non-empty closed subsets of $E$ by
\[ d_H(A, B) = \max\left\{ \sup_{x\in A} \inf_{y\in B} d(x, y),\ \sup_{y\in B} \inf_{x\in A} d(x, y) \right\}, \]
see e.g. Section 6.2.2 in [3]. It is the essential tool in many applications like image processing [8], object matching [18], face detection [10] and evolutionary optimization [16] to name just a few.
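For small samples, the minimax matching in equality (2) can be evaluated by direct enumeration of permutations. The following Python sketch does exactly that; the function name `w_infinity_empirical` and the default metric $|x - y|$ on the real line are assumptions made for illustration, and the factorial cost makes this unsuitable beyond a handful of points.

```python
from itertools import permutations

def w_infinity_empirical(xs, ys, dist=lambda a, b: abs(a - b)):
    """Minimax matching cost of equality (2), by brute force.

    xs, ys: two samples of the same size n (hypothetical inputs).
    dist:   the metric d; defaults to |a - b| on the real line.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("both samples must have the same positive size")
    n = len(xs)
    # Minimise, over all permutations sigma, the largest matched distance.
    return min(
        max(dist(xs[i], ys[sigma[i]]) for i in range(n))
        for sigma in permutations(range(n))
    )
```

With `xs = [0, 2, 10]` and `ys = [1, 3, 4]` the point 10 must be matched to a point at distance at least 6, and the monotone matching achieves exactly that value.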
Now that we have stated our main result, let us briefly review some facts about $W_\infty$. The reference paper on the subject is [1] by Champion, De Pascale and Juutinen. They consider measures with support on compact subsets of $\mathbb{R}^d$ ($d \ge 1$). Some of their results have been generalized from $\mathbb{R}^d$ to the setting of Polish spaces and general cost functions by Jylhä in [9]. We recall them for later use.

Lemma 1.1 (Proposition 2.1 in [1] and Theorem 2.6 in [9]) For any two $\nu, \gamma \in M^1(E)$ there exists at least one $Q \in C(\nu, \gamma)$ such that
\[ W_\infty(\nu, \gamma) = \sup\{u,\ u \in S(Q\circ d^{-1})\}. \]

They further establish the existence of nice solutions to the problem (1) which they call infinitely cyclically monotone couplings.

Definition 1.1 A probability measure $P \in M^1(E^2)$ is called infinitely cyclically monotone if and only if for every integer $n \ge 2$, every $(x_1, y_1), \dots, (x_n, y_n) \in S(P)$ and every $\sigma \in S_n$ we have
\[ \max_{1\le i\le n} d(x_i, y_i) \le \max_{1\le i\le n} d(x_i, y_{\sigma(i)}). \tag{4} \]

Lemma 1.2 (Theorem 3.2 and Theorem 3.4 in [1] and Theorem 2.16 in [9]) For any two $\gamma, \nu \in M^1(E)$ there exists at least one infinitely cyclically monotone $P \in C(\gamma, \nu)$ such that
\[ W_\infty(\gamma, \nu) = \sup\{u,\ u \in S(P\circ d^{-1})\}. \]
In order to check that $P \in M^1(E^2)$ is infinitely cyclically monotone we only need to know its support. The way mass is spread over $S(P)$ does not matter: this property is concerned with subsets of $E^2$ rather than with fully specified probability measures. This is the reason why we will also call infinitely cyclically monotone the subsets of $E^2$ that satisfy (4). Infinitely cyclically monotone couplings are nice solutions to the problem (1) in the following sense.

Lemma 1.3 (Theorem 3.4 in [1] and Theorem 2.17 in [9]) Any infinitely cyclically monotone $P \in M^1(E^2)$ satisfies
\[ W_\infty(P_1, P_2) = \sup\{u,\ u \in S(P\circ d^{-1})\}. \]

Indeed, once an infinitely cyclically monotone $A \subset E^2$ is given, no matter how one spreads mass over $A$, provided the couple that maximizes $(x, y) \mapsto d(x, y)$ over $A$ is covered, the marginals of the resulting probability measure over $E^2$ will always be at the same $W_\infty$ distance. This will prove useful when constructing approximations. For any two $\gamma, \nu \in M^1(E)$ we shall denote by $C_\infty(\nu, \gamma)$ the set of infinitely cyclically monotone couplings of $\nu$ and $\gamma$.

Let us give an elementary example in order to illustrate the difference between the $W_p$'s and $W_\infty$. Let $E = [0, 1]$ and consider the sequence $(\nu^n = (1 - \frac{1}{n})\delta_0 + \frac{1}{n}\delta_1)_{n\ge 1}$. The $\nu^n$'s are perturbations of $\delta_0$ in the sense that for every $p \in [1, \infty)$ we have $\lim_{n\to\infty} W_p(\nu^n, \delta_0) = 0$. Nevertheless for every $n \ge 1$ we have $W_\infty(\nu^n, \delta_0) = 1$ since $S(\nu^n) = \{0, 1\}$ while $S(\delta_0) = \{0\}$. This shows that perturbations so small that they do not impact the $W_p$ distances in the limit can have dramatic consequences when using the $W_\infty$ distance instead. Again, an illustration of how this singularity affects $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$ is given in Proposition 1.3 below.
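The elementary example above can be checked numerically. Since the only coupling of a measure with a Dirac mass is the product coupling, $W_p(\nu^n, \delta_0)^p$ is simply $\int d(x, 0)^p\, \nu^n(dx) = 1/n$, while $W_\infty(\nu^n, \delta_0)$ is the largest distance from a charged atom to 0. The Python sketch below encodes this; the function names are illustrative assumptions.

```python
def atoms_of_nu(n):
    """Atoms and masses of nu^n = (1 - 1/n) delta_0 + (1/n) delta_1."""
    return {0.0: 1.0 - 1.0 / n, 1.0: 1.0 / n}

def wp_to_dirac_at_zero(n, p):
    """W_p(nu^n, delta_0): the product coupling is the only coupling."""
    return sum(mass * abs(x) ** p for x, mass in atoms_of_nu(n).items()) ** (1.0 / p)

def w_infinity_to_dirac_at_zero(n):
    """W_infinity(nu^n, delta_0): largest distance from a charged atom to 0."""
    return max(abs(x) for x, mass in atoms_of_nu(n).items() if mass > 0)
```

As $n$ grows, `wp_to_dirac_at_zero(n, p)` equals $n^{-1/p}$ and tends to 0 for each fixed $p$, whereas `w_infinity_to_dirac_at_zero(n)` stays equal to 1.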
Next we briefly review the literature on results involving matching problems and the $W_p$ and $W_\infty$ distances. The first results on $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$ have been focused on concentration inequalities for independent $[0, 1]^2$-valued $X_i$'s and $Y_i$'s with common distribution $\mu^1 = \mu^2 = \lambda$ where $\lambda$ stands for the uniform distribution over $[0, 1]^2$. In [13] Leighton and Shor establish that there exists a $K > 0$ such that
\[ \frac{1}{K} n^{-1/2} (\log n)^{3/4} \le W_\infty(L_n^X, L_n^Y) \le K n^{-1/2} (\log n)^{3/4} \]
with probability $1 - o(1)$. Using majorizing measures, Talagrand showed in [19] that the latter result is a particular case of a general property of empirical discrepancies.

Concerning Large Deviations, in [7] the authors prove that if $E = [0, 1]^2$ and $\mu^1 = \lambda$ is the uniform distribution over $E$ then $(W_\infty(L_n^X, \lambda))_{n\ge 1}$ obeys an LDP with good rate function
\[ T_\infty(x) = \inf_{\substack{\nu\in M^1(E) \\ W_\infty(\nu,\lambda)=x}} \{H(\nu|\lambda)\}. \]
To obtain this result they prove that in this particular case the map $\nu \mapsto W_\infty(\nu, \lambda)$ is continuous with respect to the weak convergence topology, and then apply the contraction principle, see Theorem 4.2.1 in [4]. They further show in the same framework as here, i.e. $E$-valued $X_i$'s and $Y_i$'s where $(E, d)$ is any compact metric space, that $(W_1(L_n^X, L_n^Y))_{n\ge 1}$ obeys an LDP with good rate function
\[ I_1(x) = \inf_{\substack{\nu^1,\nu^2\in M^1(E) \\ W_1(\nu^1,\nu^2)=x}} \left\{ H(\nu^1|\mu^1) + H(\nu^2|\mu^2) \right\}. \]
Their proof relies on the fact that, according to the Kantorovitch-Rubinstein Theorem (see e.g. Theorem 11.8.2 in [5]), when $E$ is compact $W_1$ generates the weak convergence topology. As a consequence $(L_n^X)_{n\ge 1}$ and $(L_n^Y)_{n\ge 1}$ satisfy an LDP on $M^1(E)$ endowed with $W_1$ and again the contraction principle leads to an LDP for $(W_1(L_n^X, L_n^Y))_{n\ge 1}$. Following the same approach, one can deduce from Theorem 1.1 in [21] that for every $p \in [1, \infty)$ the sequence $(W_p(L_n^X, L_n^Y))_{n\ge 1}$ obeys an LDP with good rate function
\[ I_p(x) = \inf_{\substack{\nu^1,\nu^2\in M^1(E) \\ W_p(\nu^1,\nu^2)=x}} \left\{ H(\nu^1|\mu^1) + H(\nu^2|\mu^2) \right\}. \]
One might wonder why this idea is not applicable to the analysis of the LD properties of $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$. Actually an LDP for $(L_n^X)_{n\ge 1}$ cannot hold when $M^1(E)$ is endowed with $W_\infty$, at least at this level of generality. Indeed, consider the probability measure $\mu^1$ on $E = [0, 1]$ whose density with respect to the Lebesgue measure is $g(x) = 2\,(1_{[0,\frac{1}{4}]}(x) + 1_{[\frac{3}{4},1]}(x))$ and assume that $(L_n^X)_{n\ge 1}$ satisfies an LDP on $(M^1(E), W_\infty)$ with some rate function $R$. Clearly for every odd integer $n$ we have $\mathbb{P}(W_\infty(L_n^X, \mu^1) < 3/8) = 0$, hence for every $\nu \in B(\mu^1, 3/8)$ we should necessarily have $R(\nu) = \infty$. So we would get
\[ \limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}\left(W_\infty(L_n^X, \mu^1) \le \frac{1}{4}\right) = -\infty \tag{5} \]
but for every even $n$ we have $\mathbb{P}(W_\infty(L_n^X, \mu^1) \le \frac{1}{4}) \ge \binom{n}{n/2} 2^{-n}$, which contradicts (5). Again this is due to the lack of connectedness of the support of $\mu^1$ and, in a sense, Theorem 1.1 is the only possible "Sanov-like" statement involving $W_\infty$ without any additional assumption.
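The contradiction with (5) rests on the fact that $\binom{n}{n/2} 2^{-n}$ decays only polynomially (by Stirling's formula it behaves like $\sqrt{2/(\pi n)}$), so its exponential rate is 0 rather than $-\infty$. The short Python sketch below evaluates $\frac{1}{n}\log\left(\binom{n}{n/2} 2^{-n}\right)$ through log-gamma functions; the function name is an illustrative assumption.

```python
import math

def log_rate_even_split(n):
    """(1/n) * log( C(n, n/2) * 2**(-n) ) for an even integer n >= 2."""
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even integer >= 2")
    # log C(n, n/2) computed via lgamma to avoid huge integers.
    log_binom = math.lgamma(n + 1) - 2.0 * math.lgamma(n / 2 + 1)
    return (log_binom - n * math.log(2.0)) / n
```

The returned value is negative but tends to 0 as $n$ grows, which is incompatible with a limsup equal to $-\infty$.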
Now let us give some facts about the zeros of $J_\infty$. First notice that since
\[ 0 \le I_\infty(W_\infty(\mu^1, \mu^2)) \le H(\mu^1|\mu^1) + H(\mu^2|\mu^2) = 0 \]
we necessarily have $J_\infty(W_\infty(\mu^1, \mu^2)) = 0$. Actually, due to the highly discontinuous nature of $W_\infty$ the zeros of $J_\infty$ need not be reduced to $W_\infty(\mu^1, \mu^2)$. We shall first prove the following

Lemma 1.4 Let $x \in [0, 1]$ be such that $J_\infty(x) = 0$. Necessarily $x \ge W_\infty(\mu^1, \mu^2)$.

The zeros of $J_\infty$ are related to some particular elements of $E^2$ as defined next.
Definition 1.2 We shall say that $(a, b) \in E^2$ is a couple of directly connected points and write $a \leftrightarrow b$ if and only if $a \in S(\mu^1)$, $b \in S(\mu^2)$ and there exists $Q \in C_\infty(\mu^1, \mu^2)$ such that for every integer $N \ge 2$, every $(\alpha_2, \beta_2), \dots, (\alpha_N, \beta_N) \in S(Q)$ and every $\sigma \in S_N$ we have $d(a, b) \le \max_{i=1,\dots,N} d(\alpha_i, \beta_{\sigma(i)})$ where $(\alpha_1, \beta_1) = (a, b)$.

Notice that in this definition $(a, b)$ need not be an element of $S(Q)$. In words, $a \leftrightarrow b$ if and only if there exists $Q \in C_\infty(\mu^1, \mu^2)$ such that $\{(a, b)\} \cup S(Q)$ is still an infinitely cyclically monotone subset of $E^2$. Consider the set
\[ Z_{\mu^1,\mu^2} = \{x \in [0, 1] : \exists (a, b) \in E^2 \text{ such that } a \leftrightarrow b \text{ and } x = d(a, b)\}. \]

Proposition 1.1 $J_\infty(x) = 0$ if and only if $x \in Z_{\mu^1,\mu^2}$.

The notion of directly connected points takes a simpler form when $\mu^1 = \mu^2$.

Proposition 1.2 If $\mu^1 = \mu^2 = \mu$ then $a, b \in E$ are directly connected if and only if $a, b \in S(\mu)$ and for every integer $N \ge 3$ and every sequence $\alpha_1, \dots, \alpha_N$ of elements of $S(\mu)$ such that $\alpha_1 = a$ and $\alpha_N = b$ there is at least one integer $L$ such that $1 \le L \le N - 1$ and $d(a, b) \le d(\alpha_L, \alpha_{L+1})$.

In words, if $\mu^1 = \mu^2 = \mu$ then $a \leftrightarrow b$ if and only if one cannot decompose a journey from $a$ to $b$ into several stages between elements of $S(\mu)$ such that each of these stages has length strictly smaller than $d(a, b)$. As an example consider again $E = [0, 1]$ and $\mu^1 = \mu^2 = \mu$ with density $f(x) = \frac{3}{2} 1_{[0,\frac{1}{3}]}(x) + \frac{3}{2} 1_{[\frac{2}{3},1]}(x)$ w.r.t. the Lebesgue measure. Then $\frac{1}{3} \leftrightarrow \frac{2}{3}$ hence $J_\infty(\frac{1}{3}) = 0$, as is established through a direct computation in Section 3.1.
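When $S(\mu)$ is a finite set, the criterion of Proposition 1.2 becomes a reachability question: $a \leftrightarrow b$ fails exactly when $b$ can be reached from $a$ by hopping between support points with every hop strictly shorter than $d(a, b)$. The Python sketch below implements this check under that finiteness assumption; the function name `directly_connected`, the list representation of the support and the default metric $|x - y|$ are all illustrative choices, not part of the paper.

```python
def directly_connected(support, a, b, dist=lambda x, y: abs(x - y)):
    """Check the criterion of Proposition 1.2 on a finite support.

    support: finite collection of hashable points standing for S(mu).
    Returns True when no journey from a to b through support points
    has all of its stages strictly shorter than dist(a, b).
    """
    if a not in support or b not in support:
        return False
    threshold = dist(a, b)
    if threshold == 0:
        return True  # every stage has length >= 0 = d(a, b)
    # Depth-first search along hops strictly shorter than d(a, b).
    reached = {a}
    stack = [a]
    while stack:
        u = stack.pop()
        for v in support:
            if v not in reached and dist(u, v) < threshold:
                reached.add(v)
                stack.append(v)
    return b not in reached
```

For the support `[0, 1, 5, 10]`, the points 1 and 5 are directly connected (no detour uses only hops shorter than 4), whereas 0 and 5 are not, since the journey 0, 1, 5 has stages of length 1 and 4, both shorter than 5.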
Finally we investigate the almost sure behavior of $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$ as $n \to \infty$ when $\mu^1 = \mu^2 = \mu$. The elements of $Z_\mu$, hence the asymptotic behavior of $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$, are directly related to the connectedness properties of $S(\mu)$. So let us write $S(\mu) = \cup_{i=1}^D A_i$ where the $A_i$ are the connected components of $S(\mu)$. Here $D = 1$ means that $S(\mu)$ is connected and we do not exclude the possibility that $D = \infty$. We have

Lemma 1.5 $Z_\mu = \{0\}$ if and only if $S(\mu)$ is connected.

For every $x \in S(\mu)$ let us denote by $A_{i(x)}$ the connected component of $S(\mu)$ that contains $x$ and for any two closed $B, C \subset E$ we set
\[ d(B, C) = \inf_{(x,y)\in B\times C} d(x, y). \]

Lemma 1.6 For every $a, b \in S(\mu)$, if $a \leftrightarrow b$ then $d(a, b) = d(A_{i(a)}, A_{i(b)})$.

For any two connected components $A_i, A_j$ of $S(\mu)$ we shall write $A_i \leftrightarrow A_j$ if and only if there exist $a \in A_i$ and $b \in A_j$ such that $a \leftrightarrow b$. It directly follows from Lemma 1.6 that
\[ Z_\mu = \left\{ d(A_i, A_j),\ (i, j) \in \{1, \dots, D\}^2 \text{ such that } A_i \leftrightarrow A_j \right\}. \]
In order to formulate our result we need the following
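Lemma 1.7 If $S(\mu)$ is not connected, every $x \in Z_\mu$ such that $x \ne 0$ is an isolated point.

Hence, if $S(\mu)$ is not connected, we can order the elements of $Z_\mu$
\[ \beta_1 > \beta_2 > \dots > \beta_n > \cdots \]
where $\beta_1 = \max Z_\mu$, and for every integer $i \ge 2$
\[ \beta_i = \begin{cases} \max Z_\mu \setminus \{\beta_1, \dots, \beta_{i-1}\} & \text{if } \beta_{i-1} \ne 0 \\ 0 & \text{if } \beta_{i-1} = 0. \end{cases} \]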
Proposition 1.3

1. If $S(\mu)$ is connected then $W_\infty(L_n^X, L_n^Y) \to 0$ almost surely.

2. If $S(\mu)$ is not connected and

(a) there exist only two connected components $A_i, A_j$ of $S(\mu)$ such that $A_i \leftrightarrow A_j$ and $d(A_i, A_j) = \beta_1$ then
\[ \liminf_{n\to\infty} W_\infty(L_n^X, L_n^Y) = \beta_2 \quad\text{and}\quad \limsup_{n\to\infty} W_\infty(L_n^X, L_n^Y) = \beta_1 \tag{6} \]
almost surely.

(b) there are more than two connected components $A_i, A_j$ of $S(\mu)$ such that $A_i \leftrightarrow A_j$ and $d(A_i, A_j) = \beta_1$ then $W_\infty(L_n^X, L_n^Y) \to \beta_1$ almost surely.

Hence, if $E = [0, 1]$ and $\mu$ admits $f(x) = \frac{3}{2} 1_{[0,\frac{1}{3}]}(x) + \frac{3}{2} 1_{[\frac{2}{3},1]}(x)$ as a density w.r.t. the Lebesgue measure then $\limsup_{n\to\infty} W_\infty(L_n^X, L_n^Y) = \frac{1}{3}$ almost surely and $\liminf_{n\to\infty} W_\infty(L_n^X, L_n^Y) = 0$ almost surely.

The paper is structured as follows: Section 2 is devoted to the proof of Theorem 1.1. In Section 3 we discuss the properties of $I_\infty$ and $J_\infty$. Section 4 is concerned with the almost sure asymptotic behavior of $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$ while Section 5 deals with the proof of some complementary results.
2 Proof of Theorem 1.1
The basic idea in the proof of Theorem 1.1 is to partition E in such a way that we are essentially led to consider finite set valued Xi’s and Yi’s. Indeed, on the one hand we will see that W∞ is well-behaved with respect to partitioning (see Lemma 2.1 below) and on the other hand proceeding this way reduces the computation of probabilities to simple classical combinatorial estimates. In order to go from the particular case, i.e. E finite, to the general one we shall need some results on the weak convergence of nets of probability measures. In Section 2.1 we give an account on partitions, nets and the weak convergence topology. The proof of all the statements there is postponed to Section 5. The LD lower bound is established in Section 2.2 while the LD upper bound is derived in Section 2.3.
2.1 Some facts about partitions of E, nets and the weak convergence topology

Let $\mathcal{P}$ be the set of finite measurable partitions of $E$ into non-empty sets. To every $\Pi = (A_1, \dots, A_L) \in \mathcal{P}$ we associate once and for all throughout Section 2 a family $(s_1, \dots, s_L) \in E^L$ such that for every $1 \le i \le L$ we have $s_i \in A_i$. We further associate to every $\Pi = (A_1, \dots, A_L) \in \mathcal{P}$ a map $\pi$ as follows
\[ \pi : M^1(E) \to M^1(\{s_1, \dots, s_L\}), \qquad \nu \mapsto \sum_{i=1}^L \nu(A_i)\, \delta_{s_i}. \]
Finally for every $\Pi = (A_1, \dots, A_L) \in \mathcal{P}$ we define its maximal diameter as
\[ \Delta(\Pi) = \max_{1\le i\le L} \sup_{x,y\in A_i} d(x, y). \]
The following result links the $W_\infty$ distance between elements of $M^1(E)$ to the analogous $W_\infty$ distance between their contractions through $\pi$.

Lemma 2.1 For every $\nu^1, \nu^2 \in M^1(E)$ and every $\Pi \in \mathcal{P}$ we have
\[ |W_\infty(\nu^1, \nu^2) - W_\infty(\pi(\nu^1), \pi(\nu^2))| \le 2\Delta(\Pi). \]
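Lemma 2.1 can be illustrated on empirical measures over $E = [0, 1]$: projecting each sample point onto the representative of its cell yields the empirical measure of the projected sample, so both sides can be computed as minimax matchings. The Python sketch below uses a uniform partition into $L$ intervals with midpoints as representatives, so that $\Delta(\Pi) = 1/L$; the partition, the function names and the brute-force matching are assumptions chosen for illustration, and exact rationals are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import permutations

def bottleneck(xs, ys):
    """Minimax matching cost between two equal-size samples on the line."""
    return min(
        max(abs(x - ys[j]) for x, j in zip(xs, sigma))
        for sigma in permutations(range(len(ys)))
    )

def project(points, L):
    """Map each point of [0, 1] to the midpoint s_i of its cell A_i.

    Cells are [i/L, (i+1)/L) for i < L-1 and [(L-1)/L, 1] for the last one,
    so the maximal diameter of this (hypothetical) partition is 1/L.
    """
    reps = []
    for x in points:
        i = min(int(x * L), L - 1)  # cell index of x
        reps.append(Fraction(2 * i + 1, 2 * L))
    return reps
```

On the sample below with $L = 4$, the original cost is $1/5$ and the projected cost is $1/4$, so the difference $1/20$ is well within the bound $2\Delta(\Pi) = 1/2$.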
Let $\Pi = (A_1, \dots, A_L)$ and $K = (B_1, \dots, B_R)$ be two elements of $\mathcal{P}$. We say that $K$ is a refinement of $\Pi$ if and only if for every $1 \le i \le L$ there exists $J_i \subset \{1, \dots, R\}$ such that $A_i = \cup_{j\in J_i} B_j$, and we denote it by $\Pi \preceq K$. This makes $(\mathcal{P}, \preceq)$ a directed set. Let us recall that $(J, \trianglelefteq)$ is a directed set if and only if $J$ is a non-empty set and $\trianglelefteq$ is a reflexive and transitive relation on $J$ such that for every $i, j \in J$ there exists a $k \in J$ such that $i \trianglelefteq k$ and $j \trianglelefteq k$. We introduce a general directed set $(J, \trianglelefteq)$ since we will use both $(\mathcal{P}, \preceq)$ and $(\mathbb{N}, \le)$ as directed sets and we do not want to be too specific in the results below. We call net any map $(P^j)_{j\in J}$ defined on a directed set $(J, \trianglelefteq)$. A topological space $(T, \mathcal{T})$-valued net $(P^j)_{j\in J}$ is said to converge to some $P \in T$ if and only if for every neighborhood $U$ of $P$ there exists a $j(U) \in J$ such that for every $k \in J$ satisfying $j(U) \trianglelefteq k$ we have $P^k \in U$. We shall denote this $\lim_{j\in J} P^j = P$. We call subnet of $(P^j)_{j\in J}$ any sub-family $(P^l)_{l\in L}$ parametrized by a cofinal $L$, i.e. a subset $L \subset J$ such that for every $j \in J$ there exists $l \in L$ satisfying $j \trianglelefteq l$. (While this is not the most general definition of a subnet it will be sufficient for the problem considered here.) Let us recall that a topological space $(T, \mathcal{T})$ is compact if and only if every net in $T$ admits a subnet that converges to some point in $T$. Finally, for every real-valued net $(P^j)_{j\in J}$ we shall consider as usual
\[ \limsup_{j\in J} P^j = \inf_{j\in J} \sup_{i :\, j \trianglelefteq i} P^i \quad\text{and}\quad \liminf_{j\in J} P^j = \sup_{j\in J} \inf_{i :\, j \trianglelefteq i} P^i. \]
For this and other questions related to nets we refer to e.g. [12].

The following lemmas are a consequence of a kind of Portmanteau result for nets of probability measures that will be derived in Section 5.

Lemma 2.2 Every net $(P^j)_{j\in J}$ of probability measures supported on $\mathbb{R}$ that converges weakly to some probability measure $P$ satisfies
\[ \limsup_{j\in J} \sup S(P^j) \ge \sup S(P). \tag{7} \]
Lemma 2.3 Let $(P^j)_{j\in J}$ be a net of Borel probability measures on a compact metric space $(Y, \delta)$ that converges weakly to some probability measure $P$. For every $x \in S(P)$ and every $j \in J$ there exists an $x^j \in S(P^j)$ such that the net $(x^j)_{j\in J}$ converges to $x$.

Lemma 2.4 Every $P \in M^1(E^2)$ which is the limit in the weak convergence topology of a net $(P^j)_{j\in J}$ of infinitely cyclically monotone elements of $M^1(E^2)$ is infinitely cyclically monotone.
2.2 LD lower bound

For every integer $n \ge 1$ we consider
\[ M^{1,n}(E) = \left\{ \nu \in M^1(E) : \text{there exists } (x_1, \dots, x_n) \in E^n \text{ such that } \nu = \frac{1}{n}\sum_{i=1}^n \delta_{x_i} \right\}. \]
The following lemma is the key point in the proof of the LD lower bound.

Lemma 2.5 For every $\varepsilon > 0$, every $\Pi = (A_1, \dots, A_L) \in \mathcal{P}$ with $\Delta(\Pi) \le \varepsilon/2$ and every $\nu^1, \nu^2 \in M^1(E)$ there exist two sequences $(\nu^{1,n})_{n\ge 1}$ and $(\nu^{2,n})_{n\ge 1}$ such that

1. For every $n \ge 1$ we have $\nu^{1,n}, \nu^{2,n} \in M^{1,n}(E)$.

2. For every $1 \le i \le L$ we have $\nu^{1,n}(A_i) \to \nu^1(A_i)$ and $\nu^{2,n}(A_i) \to \nu^2(A_i)$.

3. There exists an $N_0$ such that for every $n \ge N_0$ we have
\[ |W_\infty(\nu^1, \nu^2) - W_\infty(\nu^{1,n}, \nu^{2,n})| \le \varepsilon. \]
Proving Lemma 2.5 actually requires one more lemma.

Lemma 2.6 For every $Q \in M^1(E^2)$ and every $\Pi = (A_1, \dots, A_L) \in \mathcal{P}$ there exists a sequence $(Q^n = \frac{1}{n}\sum_{i=1}^n \delta_{(x_i^n, y_i^n)})_{n\ge 1}$ of elements of $M^1(E^2)$ such that

1. For every $1 \le u, v \le L$ we have $Q^n(A_u \times A_v) \to Q(A_u \times A_v)$.

2. The sequence $(Q^n)_{n\ge 1}$ converges weakly to $Q$.

3. If $Q(A_u \times A_v) = 0$ then for every $n \ge 1$ we have $Q^n(A_u \times A_v) = 0$.
Proof of Lemma 2.6 Let $Q$ be an element of $M^1(E^2)$ and $\Pi = (A_1, \dots, A_L) \in \mathcal{P}$. Let $(X_i, Y_i)_{i\ge 1}$ be a sequence of independent and $Q$-identically distributed random couples. According to the strong Law of Large Numbers there exists an event $B$ of probability 1 such that for every $1 \le u, v \le L$
\[ \frac{1}{n}\sum_{i=1}^n 1_{A_u\times A_v}(X_i, Y_i) \to Q(A_u \times A_v) \]
on $B$. Moreover, since $E^2$ is separable (it is compact), according to Varadarajan's Lemma, see e.g. Theorem 11.4.1 in [5], there exists an event $C$ of probability 1 such that
\[ \frac{1}{n}\sum_{i=1}^n \delta_{(X_i,Y_i)} \xrightarrow{w} Q \]
on $C$, where $\xrightarrow{w}$ stands for the weak convergence of probability measures. Hence almost every realization of $(\frac{1}{n}\sum_{i=1}^n \delta_{(X_i,Y_i)})_{n\ge 1}$ can play the role of $(Q^n)_{n\ge 1}$ and the conclusions of Lemma 2.6 follow.
Proof of Lemma 2.5 Let $\nu^1$ and $\nu^2$ be two elements of $M^1(E)$, $\varepsilon > 0$ and $\Pi \in \mathcal{P}$ with $\Delta(\Pi) \le \varepsilon/2$. According to Lemma 1.2 there is a $Q \in C(\nu^1, \nu^2)$ such that
\[ W_\infty(\nu^1, \nu^2) = \sup S(Q\circ d^{-1}). \]
Let $(Q^n)_{n\ge 1}$ be the sequence of elements of $M^1(E^2)$ associated to $Q$ and $\Pi$ by Lemma 2.6. We shall prove that $(\nu^{1,n} = Q_1^n)_{n\ge 1}$ and $(\nu^{2,n} = Q_2^n)_{n\ge 1}$ meet the conditions of Lemma 2.5. First, (1) and (2) in Lemma 2.5 are clearly satisfied with these $(\nu^{1,n})_{n\ge 1}$ and $(\nu^{2,n})_{n\ge 1}$. Now, due to the definition of $W_\infty$, for every $n \ge 1$ we have
\[ W_\infty(\nu^{1,n}, \nu^{2,n}) \le \sup S(Q^n\circ d^{-1}) \]
and due to (1) and (3) in Lemma 2.6 there exists $N_1$ such that for every $n \ge N_1$ we have
\[ \sup S(Q^n\circ d^{-1}) \le \sup S(Q\circ d^{-1}) + \varepsilon \]
hence $W_\infty(\nu^{1,n}, \nu^{2,n}) \le W_\infty(\nu^1, \nu^2) + \varepsilon$. So we are left to prove that there exists an $N_2$ such that for every $n \ge N_2$ we have $W_\infty(\nu^1, \nu^2) - \varepsilon \le W_\infty(\nu^{1,n}, \nu^{2,n})$. Let us assume that this is not true, i.e. that there exists a sequence $(n_k)_{k\ge 1}$ such that for every $k \ge 1$ we have
\[ W_\infty(\nu^1, \nu^2) - \varepsilon > W_\infty(\nu^{1,n_k}, \nu^{2,n_k}). \tag{8} \]
According to Lemma 1.1 for every $k \ge 1$ there exists a $C^{n_k} \in C(\nu^{1,n_k}, \nu^{2,n_k})$ such that
\[ W_\infty(\nu^{1,n_k}, \nu^{2,n_k}) = \sup S(C^{n_k}\circ d^{-1}). \]
The sequence $(C^{n_k})_{k\ge 1}$ admits a weakly converging subsequence since $M^1(E^2)$ is compact for the weak convergence topology, see Theorem 6.4 in [14]. By a slight abuse of notation we still denote by $(C^{n_k})_{k\ge 1}$ this converging subsequence. Let $C$ be its limit. Due to (2) in Lemma 2.6 we necessarily have $C_1 = \nu^1$ and $C_2 = \nu^2$. Moreover $(C^{n_k}\circ d^{-1})_{k\ge 1}$ is a sequence of probability measures on $\mathbb{R}$ that weakly converges to $C\circ d^{-1}$. Combining (8) with Lemma 2.2 we have
\begin{align*}
W_\infty(\nu^1, \nu^2) - \varepsilon &\ge \limsup_{k\to\infty} W_\infty(\nu^{1,n_k}, \nu^{2,n_k}) \\
&= \limsup_{k\to\infty} \sup S(C^{n_k}\circ d^{-1}) \\
&\ge \sup S(C\circ d^{-1}) \\
&\ge W_\infty(\nu^1, \nu^2)
\end{align*}
which cannot be. The announced result follows.
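Proof of the LD lower bound. As usual, in order to prove the LD lower bound it is sufficient to prove that for every $x \in [0, 1]$ and every $\varepsilon > 0$ we have
\[ \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}(W_\infty(L_n^X, L_n^Y) \in\, ]x - \varepsilon, x + \varepsilon[) \ge -J_\infty(x). \tag{9} \]
In particular we can assume that $J_\infty(x) < \infty$ for otherwise (9) trivially holds. If $J_\infty(x) < \infty$ then for every $m \ge 1$ there exists a $y_m$ such that $|x - y_m| < 1/m$, $I_\infty(y_m) < \infty$ and $\lim_{m\to\infty} I_\infty(y_m) = J_\infty(x)$. Now let us assume that for every $m$ and every $\nu^1, \nu^2 \in M^1(E)$ such that $W_\infty(\nu^1, \nu^2) = y_m$ we have
\[ \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}(W_\infty(L_n^X, L_n^Y) \in\, ]y_m - \varepsilon/4, y_m + \varepsilon/4[) \ge -H(\nu^1|\mu^1) - H(\nu^2|\mu^2). \tag{10} \]
It follows that for every $m$ large enough and every $\nu^1, \nu^2 \in M^1(E)$ such that $W_\infty(\nu^1, \nu^2) = y_m$
\begin{align*}
\liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}(W_\infty(L_n^X, L_n^Y) \in\, ]x - \varepsilon, x + \varepsilon[) &\ge \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}(W_\infty(L_n^X, L_n^Y) \in\, ]y_m - \varepsilon/4, y_m + \varepsilon/4[) \\
&\ge -H(\nu^1|\mu^1) - H(\nu^2|\mu^2)
\end{align*}
hence
\begin{align*}
\liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}(W_\infty(L_n^X, L_n^Y) \in\, ]x - \varepsilon, x + \varepsilon[) &\ge \sup_{\substack{\nu^1,\nu^2\in M^1(E) \\ W_\infty(\nu^1,\nu^2)=y_m}} \left\{ -H(\nu^1|\mu^1) - H(\nu^2|\mu^2) \right\} \\
&\ge -\inf_{\substack{\nu^1,\nu^2\in M^1(E) \\ W_\infty(\nu^1,\nu^2)=y_m}} \left\{ H(\nu^1|\mu^1) + H(\nu^2|\mu^2) \right\} \\
&\ge -I_\infty(y_m)
\end{align*}
which leads to (9) by letting $m \to \infty$. Hence it is sufficient to establish (10) and this is what we do now.

Let $\varepsilon > 0$, $y_m \in [0, 1]$ be such that $I_\infty(y_m) < \infty$ and $\nu^1, \nu^2 \in M^1(E)$ be such that $W_\infty(\nu^1, \nu^2) = y_m$. According to Lemma 2.5 to any $\Pi = (A_1, \dots, A_L) \in \mathcal{P}$ with $\Delta(\Pi) \le \varepsilon/32$ we can associate two sequences $(\nu^{1,n})_{n\ge 1}$ and $(\nu^{2,n})_{n\ge 1}$ and an integer $N_0$ such that for every $n \ge 1$ we have $\nu^{1,n}, \nu^{2,n} \in M^{1,n}(E)$ and for every $n \ge N_0$ one has
\begin{align*}
\mathbb{P}(W_\infty(L_n^X, L_n^Y) \in\, ]y_m - \varepsilon/4, y_m + \varepsilon/4[) &= \mathbb{P}(|W_\infty(L_n^X, L_n^Y) - W_\infty(\nu^1, \nu^2)| < \varepsilon/4) \\
&\ge \mathbb{P}(|W_\infty(L_n^X, L_n^Y) - W_\infty(\nu^{1,n}, \nu^{2,n})| \le \varepsilon/8).
\end{align*}
We obviously have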
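\begin{align*}
|W_\infty(L_n^X, L_n^Y) - W_\infty(\nu^{1,n}, \nu^{2,n})| \le{}& |W_\infty(L_n^X, L_n^Y) - W_\infty(\pi(L_n^X), \pi(L_n^Y))| \\
&+ |W_\infty(\pi(L_n^X), \pi(L_n^Y)) - W_\infty(\pi(\nu^{1,n}), \pi(\nu^{2,n}))| \\
&+ |W_\infty(\pi(\nu^{1,n}), \pi(\nu^{2,n})) - W_\infty(\nu^{1,n}, \nu^{2,n})|
\end{align*}
so due to Lemma 2.1 we get
\[ \{\pi(L_n^X) = \pi(\nu^{1,n})\} \cap \{\pi(L_n^Y) = \pi(\nu^{2,n})\} \subset \{|W_\infty(L_n^X, L_n^Y) - W_\infty(\nu^{1,n}, \nu^{2,n})| \le \varepsilon/8\} \]
hence
\begin{align*}
\mathbb{P}(|W_\infty(L_n^X, L_n^Y) - W_\infty(\nu^{1,n}, \nu^{2,n})| \le \varepsilon/8) &\ge \mathbb{P}(\{\pi(L_n^X) = \pi(\nu^{1,n})\} \cap \{\pi(L_n^Y) = \pi(\nu^{2,n})\}) \\
&= \mathbb{P}(\{\pi(L_n^X) = \pi(\nu^{1,n})\})\, \mathbb{P}(\{\pi(L_n^Y) = \pi(\nu^{2,n})\})
\end{align*}
since the sequences $(X_n)_{n\ge 1}$ and $(Y_n)_{n\ge 1}$ are independent. It follows from elementary combinatorics (see e.g. Lemma 2.1.9 in [4]) that
\[ \mathbb{P}(\{\pi(L_n^X) = \pi(\nu^{1,n})\}) \ge (n + 1)^{-L} \exp\left\{ -n H(\pi(\nu^{1,n})|\pi(\mu^1)) \right\} \]
and the analogue for $\mathbb{P}(\{\pi(L_n^Y) = \pi(\nu^{2,n})\})$ also holds true. As a consequence
\begin{align*}
\liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}(W_\infty(L_n^X, L_n^Y) \in\, ]y_m - \varepsilon/4, y_m + \varepsilon/4[) &\ge -\limsup_{n\to\infty} \left( H(\pi(\nu^{1,n})|\pi(\mu^1)) + H(\pi(\nu^{2,n})|\pi(\mu^2)) \right) \\
&= -H(\pi(\nu^1)|\pi(\mu^1)) - H(\pi(\nu^2)|\pi(\mu^2)) \tag{11} \\
&\ge -H(\nu^1|\mu^1) - H(\nu^2|\mu^2) \tag{12}
\end{align*}
where (11) is due to (2) in Lemma 2.5 and (12) comes from the fact that $H(\pi(\nu^1)|\pi(\mu^1)) \le H(\nu^1|\mu^1)$ for every $\Pi \in \mathcal{P}$, see e.g. Theorem 1.4.3 in [6].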
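The combinatorial estimate used above is the classical method-of-types bound: on a finite alphabet of $L$ cells, the probability that the empirical measure of $n$ i.i.d. draws equals a given type $\nu$ lies between $(n+1)^{-L} e^{-nH(\nu|\mu)}$ and $e^{-nH(\nu|\mu)}$. The Python sketch below compares the exact multinomial probability with both bounds; the function names and the representation of measures as lists of cell masses are assumptions made for illustration.

```python
import math

def relative_entropy(nu, mu):
    """H(nu|mu) for probability vectors indexed by the same finite cells."""
    total = 0.0
    for p, q in zip(nu, mu):
        if p > 0:
            if q == 0:
                return math.inf
            total += p * math.log(p / q)
    return total

def type_probability(counts, mu):
    """Exact probability that n i.i.d. mu-draws give these cell counts."""
    n = sum(counts)
    log_prob = math.lgamma(n + 1)  # log n!
    for k, q in zip(counts, mu):
        if k > 0:
            if q == 0:
                return 0.0
            log_prob += k * math.log(q) - math.lgamma(k + 1)
    return math.exp(log_prob)

def type_bounds(counts, mu):
    """Lower and upper method-of-types bounds for the same event."""
    n = sum(counts)
    nu = [k / n for k in counts]
    upper = math.exp(-n * relative_entropy(nu, mu))
    lower = (n + 1) ** (-len(mu)) * upper
    return lower, upper
```

With counts `[3, 1, 0]` and cell masses `[0.5, 0.25, 0.25]` the exact probability is $4 \cdot 0.5^3 \cdot 0.25 = 0.125$, which sits between the two bounds.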
2.3 LD upper bound

We first establish the LD upper bound under the assumption that $E$ is finite. Then we extend this result to the general case.
2.3.1 A particular case
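Here we assume that $E = \{a_1, \dots, a_N\}$. Since for every $n \ge 1$ we have $W_\infty(L_n^X, L_n^Y) \in \Gamma = \{d(a_i, a_j),\ 1 \le i, j \le N\}$, which is a finite set, it is sufficient in order to establish the LD upper bound to show that for every $\delta \in \Gamma$ we have
\[ \limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}(W_\infty(L_n^X, L_n^Y) = \delta) \le -\inf_{\substack{\nu^1,\nu^2\in M^1(E) \\ W_\infty(\nu^1,\nu^2)=\delta}} \left\{ H(\nu^1|\mu^1) + H(\nu^2|\mu^2) \right\}. \tag{13} \]
For every $n \ge 1$ we get
\begin{align*}
\mathbb{P}(W_\infty(L_n^X, L_n^Y) = \delta) &= \sum_{\substack{\nu^{1,n},\nu^{2,n}\in M^{1,n}(E) \\ W_\infty(\nu^{1,n},\nu^{2,n})=\delta}} \mathbb{P}(L_n^X = \nu^{1,n}, L_n^Y = \nu^{2,n}) \tag{14} \\
&\le \sum_{\substack{\nu^{1,n},\nu^{2,n}\in M^{1,n}(E) \\ W_\infty(\nu^{1,n},\nu^{2,n})=\delta}} \exp\left\{ -n H(\nu^{1,n}|\mu^1) - n H(\nu^{2,n}|\mu^2) \right\} \tag{15} \\
&\le (n + 1)^{2N} \exp\Big\{ -n \inf_{\substack{\nu^1,\nu^2\in M^1(E) \\ W_\infty(\nu^1,\nu^2)=\delta}} \left\{ H(\nu^1|\mu^1) + H(\nu^2|\mu^2) \right\} \Big\},
\end{align*}
see e.g. Lemma 2.1.9 in [4] for the elementary combinatorial estimate (15). The announced (13) follows.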
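2.3.2 The general case

Let us introduce some more notation. For every $\Pi \in \mathcal{P}$ we consider
\[ I_\infty^\Pi(x) = \inf_{\substack{\nu^1,\nu^2\in M^1(E) \\ W_\infty(\nu^1,\nu^2)=x}} \left\{ H(\pi(\nu^1)|\pi(\mu^1)) + H(\pi(\nu^2)|\pi(\mu^2)) \right\} \]
and we shall denote by $J_\infty^\Pi$ the lower semi-continuous regularization of $I_\infty^\Pi$. Since $[0, 1]$ is compact the sequence $(W_\infty(L_n^X, L_n^Y))_{n\ge 1}$ is naturally exponentially tight, so it is sufficient in order to establish the LD upper bound to consider events of the form $\{W_\infty(L_n^X, L_n^Y) \in [a, b]\}$, see Lemma 1.2.18 in [4]. According to Lemma 2.1 for every $a, b \in [0, 1]$, $a \le b$, and every $\Pi \in \mathcal{P}$ we have for every $n \ge 1$ that
\[ \mathbb{P}(W_\infty(L_n^X, L_n^Y) \in [a, b]) \le \mathbb{P}(W_\infty(\pi(L_n^X), \pi(L_n^Y)) \in [a - 2\Delta(\Pi), b + 2\Delta(\Pi)]). \]
Hence, due to the computation carried out assuming that $E$ is finite we get for every $\Pi \in \mathcal{P}$
\begin{align*}
\limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}(W_\infty(L_n^X, L_n^Y) \in [a, b]) &\le -\inf_{\substack{\nu^1,\nu^2\in M^1(\{s_1,\dots,s_L\}) \\ W_\infty(\nu^1,\nu^2)\in[a-2\Delta(\Pi),\,b+2\Delta(\Pi)]}} \left\{ H(\nu^1|\pi(\mu^1)) + H(\nu^2|\pi(\mu^2)) \right\} \\
&\le -\inf_{\substack{\nu^1,\nu^2\in M^1(E) \\ W_\infty(\nu^1,\nu^2)\in[a-2\Delta(\Pi),\,b+2\Delta(\Pi)]}} \left\{ H(\pi(\nu^1)|\pi(\mu^1)) + H(\pi(\nu^2)|\pi(\mu^2)) \right\} \\
&\le -\inf_{x\in[a-2\Delta(\Pi),\,b+2\Delta(\Pi)]} I_\infty^\Pi(x) \\
&\le -\inf_{x\in[a-2\Delta(\Pi),\,b+2\Delta(\Pi)]} J_\infty^\Pi(x).
\end{align*}
Hence, to conclude the proof of the LD upper bound we are left to prove that
\[ \sup_{\Pi\in\mathcal{P}} \inf_{x\in[a-2\Delta(\Pi),\,b+2\Delta(\Pi)]} J_\infty^\Pi(x) \ge \inf_{x\in[a,b]} J_\infty(x). \tag{16} \]
Assume for a while the following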
Lemma 2.7 For every $x \in [0, 1]$ we have $\sup_\Pi J_\infty^\Pi(x) = J_\infty(x)$.

Assume also that (16) does not hold: there exists a $\delta > 0$ such that for every $\Pi \in \mathcal{P}$
\[ \inf_{x\in[a-2\Delta(\Pi),\,b+2\Delta(\Pi)]} J_\infty^\Pi(x) < \inf_{x\in[a,b]} J_\infty(x) - \delta. \]
Since for every $\Pi \in \mathcal{P}$ the map $x \mapsto J_\infty^\Pi(x)$ is lower semi-continuous there exists a net $(x_\Pi)_{\Pi\in\mathcal{P}}$ such that $x_\Pi \in [a - 2\Delta(\Pi), b + 2\Delta(\Pi)]$ and
\[ J_\infty^\Pi(x_\Pi) - \delta/2 < \inf_{x\in[a,b]} J_\infty(x) - \delta. \]
Due to the log sum inequality, see e.g. Theorem 2.7.1 in [2], for every $K, K' \in \mathcal{P}$ such that $K \preceq K'$ and every $x \in [0, 1]$ we have $I_\infty^K(x) \le I_\infty^{K'}(x)$ hence $J_\infty^K(x) \le J_\infty^{K'}(x)$, so for every $\Pi' \in \mathcal{P}$ such that $\Pi' \preceq \Pi$ we have
\[ J_\infty^{\Pi'}(x_\Pi) - \delta/2 < \inf_{x\in[a,b]} J_\infty(x) - \delta. \]
Since $(x_\Pi)_{\Pi\in\mathcal{P}}$ is a $[0, 1]$-valued net it admits a converging subnet whose limit we denote $x^* \in [a, b]$. Due to the fact that $J_\infty^{\Pi'}$ is lower semi-continuous we get
\[ \liminf_{\Pi\in\mathcal{P}} J_\infty^{\Pi'}(x_\Pi) \ge J_\infty^{\Pi'}(x^*) \]
hence
\[ J_\infty^{\Pi'}(x^*) < \inf_{x\in[a,b]} J_\infty(x) - \delta/2 \]
which, according to Lemma 2.7, implies $J_\infty(x^*) < \inf_{x\in[a,b]} J_\infty(x) - \delta/2$. Since the latter cannot be, (16) holds and the LD upper bound follows.
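Proof of Lemma 2.7 According to Theorem 1.4.3 in [6], for every $x \in [0, 1]$ and every $\Pi \in \mathcal{P}$ we have $I_\infty^\Pi(x) \le I_\infty(x)$ hence $J_\infty^\Pi(x) \le J_\infty(x)$ whence $\sup_\Pi J_\infty^\Pi(x) \le J_\infty(x)$. Conversely let $x$ be a fixed element of $[0, 1]$. We shall assume that $\sup_\Pi J_\infty^\Pi(x) < \infty$ for otherwise the claimed equality trivially holds. Hence for every $\Pi \in \mathcal{P}$ and every $\delta > 0$ there exists $y_{\delta,\Pi}$ such that $|x - y_{\delta,\Pi}| < \delta$, $\lim_{\delta\to 0} I_\infty^\Pi(y_{\delta,\Pi}) = J_\infty^\Pi(x)$ and
\[ I_\infty^\Pi(y_{\delta,\Pi}) - \delta/2 < \inf_{y\in B(x,\delta)} I_\infty^\Pi(y). \tag{17} \]
In particular we can assume that $I_\infty^\Pi(y_{\delta,\Pi}) < \infty$, thus there exist $\nu_1^{\Pi,\delta}, \nu_2^{\Pi,\delta} \in M^1(E)$ such that $W_\infty(\nu_1^{\Pi,\delta}, \nu_2^{\Pi,\delta}) = y_{\delta,\Pi}$ and
\[ H(\pi(\nu_1^{\Pi,\delta})|\pi(\mu^1)) + H(\pi(\nu_2^{\Pi,\delta})|\pi(\mu^2)) - \delta \le I_\infty^\Pi(y_{\delta,\Pi}) - \delta/2. \tag{18} \]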
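Next we define $\rho_1^{\Pi,\delta}, \rho_2^{\Pi,\delta} \in M^1(E)$ by
\[ \rho_1^{\Pi,\delta}(F) = \sum_{A_i\in\Pi} \frac{\pi(\nu_1^{\Pi,\delta})(A_i)}{\mu^1(A_i)}\, \mu^1(F \cap A_i) \]
and accordingly
\[ \rho_2^{\Pi,\delta}(F) = \sum_{A_i\in\Pi} \frac{\pi(\nu_2^{\Pi,\delta})(A_i)}{\mu^2(A_i)}\, \mu^2(F \cap A_i) \]
for every Borel set $F \subset E$. Clearly $\pi(\nu_1^{\Pi,\delta}) = \pi(\rho_1^{\Pi,\delta})$ and $\pi(\nu_2^{\Pi,\delta}) = \pi(\rho_2^{\Pi,\delta})$ hence
\begin{align*}
|W_\infty(\rho_1^{\Pi,\delta}, \rho_2^{\Pi,\delta}) - W_\infty(\nu_1^{\Pi,\delta}, \nu_2^{\Pi,\delta})| \le{}& |W_\infty(\rho_1^{\Pi,\delta}, \rho_2^{\Pi,\delta}) - W_\infty(\pi(\rho_1^{\Pi,\delta}), \pi(\rho_2^{\Pi,\delta}))| \\
&+ |W_\infty(\pi(\rho_1^{\Pi,\delta}), \pi(\rho_2^{\Pi,\delta})) - W_\infty(\pi(\nu_1^{\Pi,\delta}), \pi(\nu_2^{\Pi,\delta}))| \\
&+ |W_\infty(\pi(\nu_1^{\Pi,\delta}), \pi(\nu_2^{\Pi,\delta})) - W_\infty(\nu_1^{\Pi,\delta}, \nu_2^{\Pi,\delta})| \\
\le{}& 4\Delta(\Pi).
\end{align*}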
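According to Theorem 1.4.3 in [6] and Theorem 2.7.1 in [2] we have
\begin{align*}
H(\rho_1^{\Pi,\delta}|\mu^1) &= \sup_{K\in\mathcal{P}} \sum_{B_j\in K} \rho_1^{\Pi,\delta}(B_j) \log\frac{\rho_1^{\Pi,\delta}(B_j)}{\mu^1(B_j)} \\
&= \sup_{\substack{K\in\mathcal{P} \\ \Pi\preceq K}} \sum_{A_i\in\Pi} \sum_{\substack{B_j\in K \\ B_j\subset A_i}} \rho_1^{\Pi,\delta}(B_j) \log\frac{\rho_1^{\Pi,\delta}(B_j)}{\mu^1(B_j)} \\
&= \sup_{\substack{K\in\mathcal{P} \\ \Pi\preceq K}} \sum_{A_i\in\Pi} \sum_{\substack{B_j\in K \\ B_j\subset A_i}} \frac{\pi(\nu_1^{\Pi,\delta})(A_i)}{\mu^1(A_i)}\, \mu^1(B_j) \log\frac{\pi(\nu_1^{\Pi,\delta})(A_i)}{\mu^1(A_i)} \\
&= \sup_{\substack{K\in\mathcal{P} \\ \Pi\preceq K}} \sum_{A_i\in\Pi} \pi(\nu_1^{\Pi,\delta})(A_i) \log\frac{\pi(\nu_1^{\Pi,\delta})(A_i)}{\mu^1(A_i)} \\
&= H(\pi(\nu_1^{\Pi,\delta})|\pi(\mu^1))
\end{align*}
and $H(\rho_2^{\Pi,\delta}|\mu^2) = H(\pi(\nu_2^{\Pi,\delta})|\pi(\mu^2))$ as well, hence
\[ H(\rho_1^{\Pi,\delta}|\mu^1) + H(\rho_2^{\Pi,\delta}|\mu^2) = H(\pi(\nu_1^{\Pi,\delta})|\pi(\mu^1)) + H(\pi(\nu_2^{\Pi,\delta})|\pi(\mu^2)). \]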
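Thus it follows from the preceding, (17) and (18) that for every $\delta > 0$ and every $\Pi \in \mathcal{P}$ we have
\[ \inf_{u\in B(y_{\delta,\Pi},\,4\Delta(\Pi))} I_\infty(u) - \delta \le \inf_{y\in B(x,\delta)} I_\infty^\Pi(y) \]
hence
\[ \inf_{y\in B(x,\,\delta+4\Delta(\Pi))} I_\infty(y) - \delta \le \inf_{y\in B(x,\delta)} I_\infty^\Pi(y) \]
whence
\begin{align*}
\inf_{y\in B(x,\,\delta+4\Delta(\Pi))} I_\infty(y) - \delta &\le \sup_{\Pi\in\mathcal{P}} \sup_{\delta>0} \inf_{y\in B(x,\delta)} I_\infty^\Pi(y) \\
&= \sup_\Pi J_\infty^\Pi(x).
\end{align*}
But for every $\delta > 0$ we have
\[ \sup_{\Pi\in\mathcal{P}} \inf_{y\in B(x,\,\delta+4\Delta(\Pi))} I_\infty(y) = \sup_{\substack{\Pi\in\mathcal{P} \\ \Delta(\Pi)<\delta}} \inf_{y\in B(x,\,\delta+4\Delta(\Pi))} I_\infty(y) \]
hence
\[ \inf_{y\in B(x,6\delta)} I_\infty(y) \le \sup_{\Pi\in\mathcal{P}} J_\infty^\Pi(x) \]
whence $\sup_\Pi J_\infty^\Pi(x) \ge J_\infty(x)$.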
3 On some properties of I∞ and J∞.
3.1 I∞ can fail to be lower semi-continuous All through Section 3.1 we consider that E = [0, 1] is endowed with the usual 1 2 3 3 Euclidean distance and that µ = µ = µ admits f(x) = 1 1 (x) + 1 2 (x) 2 [0, 3 ] 2 [ 3 ,1] as a density w.r.t. the Lebesgue measure. That in this setting I∞ is not lower semi-continuous is an immediate consequence of the following two lemmas.
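Lemma 3.1 Let $\nu^1$ and $\nu^2$ be two elements of $M^1(E)$ such that $W_\infty(\nu^1, \nu^2) = \tfrac13$, $S(\nu^1) \subset S(\mu)$ and $S(\nu^2) \subset S(\mu)$. Necessarily at least one of $\nu^1$ or $\nu^2$ has an atom.

Lemma 3.2 The sequence $(\rho^n)_{n \ge 4}$ of elements of $M^1(E)$ defined by the densities

$$g^n(x) = \tfrac32\, \mathbf{1}_{[0, \frac13 - \frac1n]}(x) + \left(\tfrac32 - \tfrac1n\right) \mathbf{1}_{[\frac13 - \frac1n, \frac13]}(x) + \left(\tfrac32 + \tfrac1n\right) \mathbf{1}_{[\frac23, \frac23 + \frac1n]}(x) + \tfrac32\, \mathbf{1}_{[\frac23 + \frac1n, 1]}(x)$$

is such that for every $n \ge 4$ we have

$$\frac13 < W_\infty(\rho^n, \mu) \le \frac13 + \frac2n$$

and

$$H(\rho^n|\mu) = \frac{3}{2n}\left(1 - \frac{2}{3n}\right) \log\left(1 - \frac{2}{3n}\right) + \frac{3}{2n}\left(1 + \frac{2}{3n}\right) \log\left(1 + \frac{2}{3n}\right).$$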
Indeed, let $\nu^1, \nu^2 \in M^1(E)$ be such that $W_\infty(\nu^1, \nu^2) = \tfrac13$. We have two possibilities:

i) $S(\nu^1)$ or $S(\nu^2)$ is not included in $S(\mu)$, which implies that $H(\nu^1|\mu) + H(\nu^2|\mu) = \infty$;

ii) both $S(\nu^1)$ and $S(\nu^2)$ are included in $S(\mu)$ which, according to Lemma 3.1, implies that at least one of $\nu^1$ or $\nu^2$ has an atom, hence $H(\nu^1|\mu) + H(\nu^2|\mu) = \infty$.

Thus $I_\infty(\tfrac13) = \infty$. Meanwhile, it follows from Lemma 3.2 that the sequence $(x_n)_{n \ge 4}$ defined by $x_n = W_\infty(\rho^n, \mu)$ is such that $x_n \to \tfrac13$ and $0 \le I_\infty(x_n) \le H(\rho^n|\mu) + H(\mu|\mu) \to 0$ as $n \to \infty$, whence $I_\infty$ is not lower semi-continuous. From the preceding we deduce that $J_\infty(\tfrac13) = 0$.
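The decay $H(\rho^n|\mu) \to 0$ used above can be illustrated numerically. The sketch below evaluates the closed-form expression of Lemma 3.2 and compares it with a direct piece-by-piece integration of $g^n \log(g^n/f)$; the function names are hypothetical and the values of $n$ are chosen only for illustration.

```python
import math

def entropy_closed_form(n):
    """Closed-form expression for H(rho^n | mu) stated in Lemma 3.2."""
    a = 2.0 / (3.0 * n)
    c = 3.0 / (2.0 * n)
    return c * (1 - a) * math.log(1 - a) + c * (1 + a) * math.log(1 + a)

def entropy_piecewise(n):
    """Integrate g^n log(g^n / f); only two pieces of g^n differ from f = 3/2."""
    f = 1.5
    pieces = [
        (1.5 - 1.0 / n, 1.0 / n),  # (density, length) on [1/3 - 1/n, 1/3]
        (1.5 + 1.0 / n, 1.0 / n),  # (density, length) on [2/3, 2/3 + 1/n]
    ]
    return sum(g * length * math.log(g / f) for g, length in pieces)

for n in (4, 10, 100, 1000):
    print(n, entropy_closed_form(n), entropy_piecewise(n))
```

Both columns should coincide up to rounding and decrease towards 0 as $n$ grows, while each value stays strictly positive.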
3.1.1 Proof of Lemma 3.1

Lemma 3.1 is an immediate consequence of the following three lemmas.
Lemma 3.3 Let $\nu^1$ and $\nu^2$ be two elements of $M^1(E)$ such that $S(\nu^1) \subset S(\mu)$ and $S(\nu^2) \subset S(\mu)$. If $W_\infty(\nu^1, \nu^2) \le \tfrac13$ then necessarily at least one of these two statements is true:

i) $\nu^1([0, \tfrac13]) = \nu^2([0, \tfrac13])$;

ii) at least one of $\nu^1$ or $\nu^2$ has an atom.
Lemma 3.4 Let $\nu^1$ and $\nu^2$ be two elements of $M^1(E)$ such that $S(\nu^1) \subset S(\mu)$, $S(\nu^2) \subset S(\mu)$, $W_\infty(\nu^1, \nu^2) \le \tfrac13$ and $\nu^1([0, \tfrac13]) = \nu^2([0, \tfrac13]) = \alpha$ with $\alpha \in ]0, 1[$. Then at least one of these two statements is true:

i) we have

$$W_\infty(\nu^1, \nu^2) = \max\left\{ W_\infty(\gamma^1, \gamma^2),\, W_\infty(\theta^1, \theta^2) \right\}$$

where $\gamma^1, \gamma^2 \in M^1([0, \tfrac13])$ and $\theta^1, \theta^2 \in M^1([\tfrac23, 1])$ are defined by

$$\gamma^1(A) = \frac{\nu^1(A)}{\nu^1([0, \frac13])}, \qquad \theta^1(B) = \frac{\nu^1(B)}{\nu^1([\frac23, 1])}$$

for every measurable $A \subset [0, \tfrac13]$ and $B \subset [\tfrac23, 1]$, $\gamma^2$ and $\theta^2$ being defined on the ground of $\nu^2$ accordingly;

ii) at least one of $\nu^1$ or $\nu^2$ has an atom.
Lemma 3.5 Let $\zeta^1$ and $\zeta^2$ be two elements of $M^1([0, \tfrac13])$ (resp. $M^1([\tfrac23, 1])$) such that $W_\infty(\zeta^1, \zeta^2) = \tfrac13$. Then at least one of $\zeta^1$ or $\zeta^2$ has an atom.

Proof of Lemma 3.3. Let $\nu^1$ and $\nu^2$ be two elements of $M^1(E)$ such that $S(\nu^1) \subset S(\mu)$ and $S(\nu^2) \subset S(\mu)$. We shall prove the following, which is equivalent to the announced statement: if $\nu^1([0, \tfrac13]) \ne \nu^2([0, \tfrac13])$ and neither $\nu^1$ nor $\nu^2$ has an atom then $W_\infty(\nu^1, \nu^2) > \tfrac13$. To this end assume that $\nu^1([0, \tfrac13]) > \nu^2([0, \tfrac13])$. Let $Q \in \mathcal{C}(\nu^1, \nu^2)$ be such that $W_\infty(\nu^1, \nu^2) = \sup S(Q \circ d^{-1})$. We have

$$\begin{aligned}
\nu^1([0, \tfrac13]) &= Q([0, \tfrac13] \times E) \\
&= Q([0, \tfrac13] \times [0, \tfrac13]) + Q([0, \tfrac13] \times ]\tfrac13, \tfrac23[) + Q([0, \tfrac13] \times [\tfrac23, 1]) \\
&= Q([0, \tfrac13] \times [0, \tfrac13]) + Q([0, \tfrac13] \times [\tfrac23, 1])
\end{aligned}$$

since $Q([0, \tfrac13] \times ]\tfrac13, \tfrac23[) \le \nu^2(]\tfrac13, \tfrac23[) = 0$ because $S(\nu^2) \subset S(\mu) = [0, \tfrac13] \cup [\tfrac23, 1]$. But $Q([0, \tfrac13] \times [0, \tfrac13]) \le Q(E \times [0, \tfrac13]) = \nu^2([0, \tfrac13])$ hence

$$\begin{aligned}
Q([0, \tfrac13] \times [\tfrac23, 1]) &= \nu^1([0, \tfrac13]) - Q([0, \tfrac13] \times [0, \tfrac13]) \\
&\ge \nu^1([0, \tfrac13]) - \nu^2([0, \tfrac13]) \\
&> 0.
\end{aligned}$$

We have $\nu^2(\{\tfrac23\}) = 0$ since $\nu^2$ has no atom, hence $Q([0, \tfrac13] \times \{\tfrac23\}) \le Q(E \times \{\tfrac23\}) = 0$ whence $Q([0, \tfrac13] \times ]\tfrac23, 1]) > 0$. Since $[0, \tfrac13] \times ]\tfrac23, 1] = \bigcup_{m > 1} [0, \tfrac13] \times [\tfrac23 + \tfrac1m, 1]$ there exists an $m_0 > 0$ such that $Q([0, \tfrac13] \times [\tfrac23 + \tfrac{1}{m_0}, 1]) > 0$ hence

$$W_\infty(\nu^1, \nu^2) = \sup S(Q \circ d^{-1}) \ge \frac13 + \frac{1}{m_0} > \frac13.$$

Proof of Lemma 3.4. Let $\nu^1$ and $\nu^2$ be two elements of $M^1(E)$ such that $S(\nu^1) \subset S(\mu)$, $S(\nu^2) \subset S(\mu)$, $W_\infty(\nu^1, \nu^2) \le \tfrac13$ and $\nu^1([0, \tfrac13]) = \nu^2([0, \tfrac13]) = \alpha$ with $\alpha \in ]0, 1[$. We shall prove that if ii) is false then i) is necessarily true. According to Lemma 1.2 there exists a $Q \in \mathcal{C}_\infty(\nu^1, \nu^2)$ such that $W_\infty(\nu^1, \nu^2) = \sup S(Q \circ d^{-1})$. Since $S(\nu^1) \subset S(\mu)$ and $S(\nu^2) \subset S(\mu)$ we have $Q(E \times ]\tfrac13, \tfrac23[) = Q(]\tfrac13, \tfrac23[ \times E) = 0$. Since $\sup S(Q \circ d^{-1}) \le \tfrac13$ we further have $Q([0, \tfrac13] \times [\tfrac23, 1] \setminus \{(\tfrac13, \tfrac23)\}) = Q([\tfrac23, 1] \times [0, \tfrac13] \setminus \{(\tfrac23, \tfrac13)\}) = 0$. But $Q(\{(\tfrac13, \tfrac23)\}) + Q(\{(\tfrac23, \tfrac13)\}) = 0$ for otherwise either $\nu^1$ or $\nu^2$ would have at least an atom. Hence $Q(E \times E) = 1 = Q([0, \tfrac13] \times [0, \tfrac13]) + Q([\tfrac23, 1] \times [\tfrac23, 1])$ whence $Q([0, \tfrac13] \times [0, \tfrac13]) = \nu^1([0, \tfrac13]) = \nu^2([0, \tfrac13]) = \alpha > 0$ and $Q([\tfrac23, 1] \times [\tfrac23, 1]) = \nu^1([\tfrac23, 1]) = \nu^2([\tfrac23, 1]) = 1 - \alpha > 0$. Since $S(Q)$ is infinitely cyclically monotone, the probability measures on $E^2$ defined by
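$$Q^{sw}(A) = \frac{Q(A \cap [0, \frac13] \times [0, \frac13])}{Q([0, \frac13] \times [0, \frac13])}, \qquad Q^{ne}(B) = \frac{Q(B \cap [\frac23, 1] \times [\frac23, 1])}{Q([\frac23, 1] \times [\frac23, 1])}$$

for every measurable $A, B \subset E^2$ are elements of $\mathcal{C}_\infty(\gamma^1, \gamma^2)$ and $\mathcal{C}_\infty(\theta^1, \theta^2)$ respectively and $S(Q) = S(Q^{sw}) \cup S(Q^{ne})$. Indeed, for every measurable $A \subset [0, \tfrac13]$ we have

\begin{align}
Q^{sw}_1(A) &= Q^{sw}(A \times E) \nonumber \\
&= \frac{Q(A \times E \cap [0, \frac13] \times [0, \frac13])}{Q([0, \frac13] \times [0, \frac13])} \tag{19} \\
&= \frac{Q(A \times E)}{\nu^1([0, \frac13])} \tag{20} \\
&= \frac{\nu^1(A)}{\nu^1([0, \frac13])} \nonumber
\end{align}

where (20) follows from (19) due to the fact that $Q([0, \tfrac13] \times ]\tfrac13, 1]) = 0$. One proves that $Q^{sw}_2 = \gamma^2$, $Q^{ne}_1 = \theta^1$ and $Q^{ne}_2 = \theta^2$ the same way. Thus we have $W_\infty(\gamma^1, \gamma^2) = \sup S(Q^{sw} \circ d^{-1})$ and $W_\infty(\theta^1, \theta^2) = \sup S(Q^{ne} \circ d^{-1})$ hence

$$\begin{aligned}
W_\infty(\nu^1, \nu^2) &= \sup S(Q \circ d^{-1}) \\
&= \max\left\{ \sup S(Q^{sw} \circ d^{-1}),\, \sup S(Q^{ne} \circ d^{-1}) \right\} \\
&= \max\left\{ W_\infty(\gamma^1, \gamma^2),\, W_\infty(\theta^1, \theta^2) \right\}.
\end{aligned}$$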
Proof of Lemma 3.5. Let $\zeta^1$ and $\zeta^2$ be two elements of $M^1([0, \tfrac13])$ such that $W_\infty(\zeta^1, \zeta^2) = \tfrac13$. (The proof when $\zeta^1$ and $\zeta^2$ are elements of $M^1([\tfrac23, 1])$ is obviously the same.) There exists a $Q \in \mathcal{C}_\infty(\zeta^1, \zeta^2)$ such that $\sup S(Q \circ d^{-1}) = \tfrac13$ and, to fix notations, we assume that $(0, \tfrac13) \in S(Q)$. We shall prove that if $\zeta^2$ has no atoms on $[0, \tfrac13]$ then $0$ is an atom for $\zeta^1$. To this end it is sufficient to prove that for every $u, v \in ]0, \tfrac13[$ we have $\zeta^1([0, u]) \ge \zeta^2([0, v])$ since this leads to $\zeta^1([0, u]) = 1$ for every $u \in ]0, \tfrac13[$ hence $\zeta^1(\{0\}) = 1$. So, let us assume there exist $u^*, v^* \in ]0, \tfrac13[$ such that $\zeta^1([0, u^*]) < \zeta^2([0, v^*])$. Since $(0, \tfrac13) \in S(Q)$ we have $0 < \zeta^1([0, u^*]) < \zeta^2([0, v^*])$. Since $\zeta^2$ has no atoms on $[0, \tfrac13]$ there exists a $v^{**} \in ]0, \tfrac13[$ such that $\zeta^1([0, u^*]) = \zeta^2([0, v^{**}])$. We can assume that $\zeta^1([0, u^*]) < 1$ for otherwise we would have $\zeta^1([0, u^*]) = \zeta^2([0, v^{**}]) = 1$ hence $W_\infty(\zeta^1, \zeta^2) = \max\{u^*, v^{**}\} < \tfrac13$. We define four probability measures on $[0, \tfrac13]$ by setting for every measurable $A$

$$\zeta^{1,l}(A) = \frac{\zeta^1(A \cap [0, u^*])}{\zeta^1([0, u^*])}, \qquad \zeta^{2,l}(A) = \frac{\zeta^2(A \cap [0, v^{**}])}{\zeta^2([0, v^{**}])}$$

and

$$\zeta^{1,r}(A) = \frac{\zeta^1(A \cap ]u^*, \frac13])}{\zeta^1(]u^*, \frac13])}, \qquad \zeta^{2,r}(A) = \frac{\zeta^2(A \cap ]v^{**}, \frac13])}{\zeta^2(]v^{**}, \frac13])}.$$

There exist two infinitely cyclically monotone $Q^l \in \mathcal{C}(\zeta^{1,l}, \zeta^{2,l})$ and $Q^r \in \mathcal{C}(\zeta^{1,r}, \zeta^{2,r})$ and we observe that the probability measure $K$ defined on $E^2$ by

$$\begin{aligned}
K &= \zeta^1([0, u^*])\, Q^l + \zeta^1(]u^*, \tfrac13])\, Q^r \\
&= \zeta^2([0, v^{**}])\, Q^l + \zeta^2(]v^{**}, \tfrac13])\, Q^r
\end{aligned}$$

is such that $K \in \mathcal{C}(\zeta^1, \zeta^2)$. Moreover

$$\begin{aligned}
W_\infty(\zeta^1, \zeta^2) &\le \sup S(K \circ d^{-1}) \\
&= \max\{ W_\infty(\zeta^{1,l}, \zeta^{2,l}),\, W_\infty(\zeta^{1,r}, \zeta^{2,r}) \} \\
&\le \max\{ \max(u^*, v^{**}),\, \max(\tfrac13 - u^*, \tfrac13 - v^{**}) \} \\
&< \tfrac13
\end{aligned}$$

which contradicts $W_\infty(\zeta^1, \zeta^2) = \tfrac13$ and concludes the proof.
3.1.2 Proof of Lemma 3.2
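First of all notice that for every $n \ge 4$ we have $\rho^n([0, \tfrac13]) \ne \mu([0, \tfrac13])$ and neither $\rho^n$ nor $\mu$ has an atom, hence $W_\infty(\rho^n, \mu) > \tfrac13$ according to Lemma 3.3. Next, it is not difficult to check that for every $n \ge 4$ the probability measure on $E^2$ denoted $Q^n$ whose density w.r.t. the Lebesgue measure on $E^2$ is given by

$$\begin{aligned}
h^n(x) &= \frac32 \cdot \frac{1}{\frac13 - \frac1n}\, \mathbf{1}_{[0, \frac13 - \frac1n]^2}(x) + n\left(\frac32 - \frac1n\right) \mathbf{1}_{[\frac13 - \frac1n, \frac13]^2}(x) \\
&\quad + \mathbf{1}_{[\frac13 - \frac1n, \frac13] \times [\frac23, \frac23 + \frac1n]}(x) + \frac92\, \mathbf{1}_{[\frac23, 1]^2}(x)
\end{aligned}$$

is such that $Q^n_1 = \mu$ and $Q^n_2 = \rho^n$ hence $W_\infty(\mu, \rho^n) \le \sup S(Q^n \circ d^{-1}) \le \tfrac13 + \tfrac2n$. Finally, that $H(\rho^n|\mu)$ is as given is the result of a straightforward computation.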
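The claims about $Q^n$ can be verified with exact rational arithmetic. The sketch below encodes $h^n$ as four weighted rectangles, computes the two marginal densities at sample points, and evaluates the largest value of $|x - y|$ on the rectangles. The function names and the rectangle encoding are assumptions made for illustration only; they are not part of the paper.

```python
from fractions import Fraction as F

def breakpoints(n):
    """Endpoints a = 1/3 - 1/n, b = 1/3, c = 2/3, d = 2/3 + 1/n (n >= 4)."""
    return F(1, 3) - F(1, n), F(1, 3), F(2, 3), F(2, 3) + F(1, n)

def rectangles(n):
    """Pieces (coefficient, x-interval, y-interval) of the density h^n."""
    a, b, c, d = breakpoints(n)
    return [
        (F(3, 2) / a, (F(0), a), (F(0), a)),
        (n * (F(3, 2) - F(1, n)), (a, b), (a, b)),
        (F(1), (a, b), (c, d)),
        (F(9, 2), (c, F(1)), (c, F(1))),
    ]

def marginal(n, axis, t):
    """Density at t of the marginal on the given axis (0: first, 1: second)."""
    total = F(0)
    for coef, xs, ys in rectangles(n):
        own, other = (xs, ys) if axis == 0 else (ys, xs)
        if own[0] < t < own[1]:
            total += coef * (other[1] - other[0])
    return total

def mu_density(t):
    """Density f of mu away from the breakpoints."""
    return F(3, 2) if (0 < t < F(1, 3) or F(2, 3) < t < 1) else F(0)

def rho_density(n, t):
    """Density g^n of rho^n away from the breakpoints."""
    a, b, c, d = breakpoints(n)
    if 0 < t < a or d < t < 1:
        return F(3, 2)
    if a < t < b:
        return F(3, 2) - F(1, n)
    if c < t < d:
        return F(3, 2) + F(1, n)
    return F(0)

def max_distance(n):
    """Largest |x - y| over the closed rectangles carrying the mass of Q^n."""
    return max(max(abs(xs[0] - ys[1]), abs(xs[1] - ys[0]))
               for _, xs, ys in rectangles(n))

n = 6
a, b, c, d = breakpoints(n)
for t in (a / 2, (a + b) / 2, F(1, 2), (c + d) / 2, (d + 1) / 2):
    print(t, marginal(n, 0, t), mu_density(t), marginal(n, 1, t), rho_density(n, t))
print(max_distance(n), F(1, 3) + F(2, n))
```

The only rectangle that moves mass across the gap $]\tfrac13, \tfrac23[$ is the third one, and it is the one that realises the bound $\tfrac13 + \tfrac2n$.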
3.2 On the zeros of J∞.
First we prove Lemma 1.4. In Section 3.2.2 we establish that if $J_\infty(x) = 0$ then $x \in Z_{\mu^1,\mu^2}$ while the converse is proven in Section 3.2.3. Finally in Section 3.2.4 we prove Proposition 1.2.
3.2.1 Proof of Lemma 1.4.
Let $x \in [0, 1]$ be such that $J_\infty(x) = 0$: there exists a sequence $(x^n)_{n \ge 1}$ of elements of $[0, 1]$ such that $x^n \to x$ and $\lim_{n \to \infty} I_\infty(x^n) = 0$. Thus, by taking subsequences if needed, we can say that there exists a sequence $(P^n)_{n \ge 1}$ of infinitely cyclically monotone elements of $M^1(E \times E)$ such that for every $n \ge 1$ we have $x^n = \sup S(P^n \circ d^{-1})$, $H(P^n_1|\mu^1) + H(P^n_2|\mu^2) \to 0$ as $n \to \infty$ and $(P^n)_{n \ge 1}$ is convergent w.r.t. the weak convergence topology on $M^1(E^2)$ since $E^2$ is compact. Let us denote by $P$ the limit of $(P^n)_{n \ge 1}$. We necessarily have that $P^n_1 \xrightarrow{w} P_1 = \mu^1$ and $P^n_2 \xrightarrow{w} P_2 = \mu^2$ since

$$0 \le H(P_1|\mu^1) + H(P_2|\mu^2) \le \liminf_{n \to \infty} \left( H(P^n_1|\mu^1) + H(P^n_2|\mu^2) \right) = 0$$

for $\nu \mapsto H(\nu|\mu)$ is lower semi-continuous w.r.t. the weak convergence topology, see e.g. Theorem 1.4.3 in [6]. According to Lemma 2.4, $P \in \mathcal{C}_\infty(\mu^1, \mu^2)$ whence $\sup S(P \circ d^{-1}) = W_\infty(\mu^1, \mu^2)$. Finally, since $P^n \xrightarrow{w} P$, Lemma 2.2 implies

$$x = \limsup_{n \to \infty} \sup S(P^n \circ d^{-1}) \ge \sup S(P \circ d^{-1}) = W_\infty(\mu^1, \mu^2).$$
3.2.2 If J∞(x) = 0 then x ∈ Zµ1,µ2 .
Let $x \in [0, 1]$ be such that $J_\infty(x) = 0$. We know from Lemma 1.4 that $x \ge W_\infty(\mu^1, \mu^2)$. Moreover there exists a sequence $(x^n)_{n \ge 1}$ of elements of $[0, 1]$ such that $x^n \to x$ and $\lim_{n \to \infty} I_\infty(x^n) = 0$. Again, by taking subsequences if needed, we can say that there exists a sequence $(P^n)_{n \ge 1}$ of infinitely cyclically monotone elements of $M^1(E \times E)$ such that for every $n \ge 1$ we have

$$x^n = \sup S(P^n \circ d^{-1}) = d(a^n, b^n), \qquad (a^n, b^n) \in S(P^n),$$

$$H(P^n_1|\mu^1) + H(P^n_2|\mu^2) \to 0 \tag{21}$$

as $n \to \infty$ and $(P^n)_{n \ge 1}$ is convergent w.r.t. the weak convergence topology on $M^1(E \times E)$. We can also assume that $(a^n)_{n \ge 1}$ and $(b^n)_{n \ge 1}$ are converging sequences of elements of $S(\mu^1)$ and $S(\mu^2)$ respectively. Let us denote by $P$ (resp. $a$, $b$) the limit of $(P^n)_{n \ge 1}$ (resp. $(a^n)_{n \ge 1}$, $(b^n)_{n \ge 1}$). We obviously have that $x = d(a, b)$ and we prove that $a \leftrightarrow b$. Again, due to Lemma 2.4, Lemma 2.2 and (21) we necessarily have that $P^n_1 \xrightarrow{w} P_1 = \mu^1$ and $P^n_2 \xrightarrow{w} P_2 = \mu^2$, hence $P \in \mathcal{C}_\infty(\mu^1, \mu^2)$. We have that $a \in S(\mu^1)$ and $b \in S(\mu^2)$ since both $S(\mu^1)$ and $S(\mu^2)$ are closed, so we are left to prove that for every integer $N \ge 2$, every $(\alpha_2, \beta_2), \dots, (\alpha_N, \beta_N) \in S(P)$ and every $\sigma \in S_N$ we have

$$d(a, b) \le \max_{i = 1, \dots, N} d(\alpha_i, \beta_{\sigma(i)})$$

where $\alpha_1 = a$ and $\beta_1 = b$. Indeed, it follows from Lemma 2.3 that for every integer $i$, $2 \le i \le N$, and every $n \ge 1$ there exists $(\alpha^n_i, \beta^n_i) \in S(P^n)$ such that $(\alpha^n_i, \beta^n_i) \to (\alpha_i, \beta_i)$ as $n \to \infty$. Moreover, $(a^n, b^n) \to (a, b)$ and $(a^n, b^n) \in S(P^n)$ for every $n \ge 1$. Since $P^n$ is infinitely cyclically monotone and $\sup S(P^n \circ d^{-1}) = d(a^n, b^n)$ we have that for every $\sigma \in S_N$

$$d(a^n, b^n) \le \max_{i = 1, \dots, N} \{ d(\alpha^n_i, \beta^n_{\sigma(i)}) \}$$

where $\alpha^n_1 = a^n$ and $\beta^n_1 = b^n$. This proves the announced inequality by taking the limit $n \to \infty$, and $x \in Z_{\mu^1,\mu^2}$ since $x = d(a, b)$ and $a \leftrightarrow b$.
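When the support is finite, the inequality $d(a, b) \le \max_i d(\alpha_i, \beta_{\sigma(i)})$ can be tested by brute force for small $N$. The sketch below is a truncated check only: it stops at a hypothetical bound `max_n` on $N$, whereas the condition in the text is required for every $N \ge 2$. The function `dominated`, the distance `abs_dist` and the toy supports are illustrative assumptions.

```python
from itertools import permutations, product

def dominated(a, b, support, dist, max_n=4, tol=1e-12):
    """Check d(a,b) <= max_i d(alpha_i, beta_sigma(i)) for 2 <= N <= max_n,
    all (alpha_2, beta_2), ..., (alpha_N, beta_N) in `support` (repetitions
    allowed) and all permutations sigma, with (alpha_1, beta_1) = (a, b)."""
    target = dist(a, b)
    for n in range(2, max_n + 1):
        for extra in product(support, repeat=n - 1):
            alphas = [a] + [p[0] for p in extra]
            betas = [b] + [p[1] for p in extra]
            for sigma in permutations(range(n)):
                worst = max(dist(alphas[i], betas[sigma[i]]) for i in range(n))
                if target > worst + tol:
                    return False  # a rearrangement beats d(a, b)
    return True

def abs_dist(x, y):
    """Euclidean distance on the real line."""
    return abs(x - y)

# Hypothetical finite supports on [0, 1].
print(dominated(0.0, 0.5, [(0.0, 0.0), (1.0, 1.0)], abs_dist))
print(dominated(0.0, 1.0, [(0.5, 0.0)], abs_dist))
```

In the first call the pair $(0, 0.5)$ passes the truncated check. In the second call the swap $\sigma = (2\ 1)$ pairs $0$ with $0$ and $0.5$ with $1$, which gives a maximal distance of $0.5 < 1$, so the check fails.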
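3.2.3 If x ∈ Zµ1,µ2 then J∞(x) = 0.

First we prove that $W_\infty(\mu^1, \mu^2) \in Z_{\mu^1,\mu^2}$. Indeed, according to Lemma 1.2 there exists $P \in \mathcal{C}_\infty(\mu^1, \mu^2)$ such that $W_\infty(\mu^1, \mu^2) = \sup S(P \circ d^{-1}) = d(a, b)$ with $(a, b) \in S(P)$. Clearly $a \leftrightarrow b$ hence $W_\infty(\mu^1, \mu^2) \in Z_{\mu^1,\mu^2}$. So now let $x \in Z_{\mu^1,\mu^2}$ be such that $x > W_\infty(\mu^1, \mu^2)$ since we already know that $J_\infty(W_\infty(\mu^1, \mu^2)) = 0$. Since $x \in Z_{\mu^1,\mu^2}$ there exists $(a, b) \in S(\mu^1) \times S(\mu^2)$ such that $x = d(a, b)$ and $a \leftrightarrow b$, which means that there exists $Q \in \mathcal{C}_\infty(\mu^1, \mu^2)$ such that for every integer $N \ge 2$, every $(\alpha_2, \beta_2), \dots, (\alpha_N, \beta_N) \in S(Q)$ and every $\sigma \in S_N$ we have $d(a, b) \le \max_{i = 1, \dots, N} d(\alpha_i, \beta_{\sigma(i)})$ where $(\alpha_1, \beta_1) = (a, b)$. Since $d(a, b) = x > W_\infty(\mu^1, \mu^2) = \sup S(Q \circ d^{-1}) \ge 0$ there exists an $\varepsilon_0$ small enough to ensure that for every $0 < \varepsilon < \varepsilon_0$ we can find an open neighborhood $U_\varepsilon \times V_\varepsilon$ of $(a, b)$ such that

1. $U_\varepsilon \cap V_\varepsilon = \emptyset$;

2. $U_\varepsilon \times V_\varepsilon \cap S(Q) = \emptyset$;

3. $\mu^1(U_\varepsilon) > 0$ and $\mu^2(V_\varepsilon) > 0$;

4. $\sup_{x, y \in U_\varepsilon} d(x, y) \le \varepsilon/2$ and $\sup_{x, y \in V_\varepsilon} d(x, y) \le \varepsilon/2$.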
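Starting from those $(U_\varepsilon)_{0 < \varepsilon < \varepsilon_0}$ and $(V_\varepsilon)_{0 < \varepsilon < \varepsilon_0}$, for every $0 < \varepsilon < \varepsilon_0$ one can choose a finite partition $\Pi^\varepsilon = (B^\varepsilon_1, \dots, B^\varepsilon_{L(\varepsilon)})$ of $E$ into non-empty measurable sets such that

1. for every $1 \le i \le L(\varepsilon)$ we have $\sup_{x, y \in B^\varepsilon_i} d(x, y) \le \varepsilon/2$;

2. there exist two distinct integers $i(a)$ and $i(b)$ such that $1 \le i(a), i(b) \le L(\varepsilon)$, $B^\varepsilon_{i(a)} = U_\varepsilon$ and $B^\varepsilon_{i(b)} = V_\varepsilon$.

To every $\Pi^\varepsilon$ we associate $(s^\varepsilon_1, \dots, s^\varepsilon_{L(\varepsilon)}) \in E^{L(\varepsilon)}$ such that for every $1 \le i \le L(\varepsilon)$ we have $s^\varepsilon_i \in B^\varepsilon_i$ with $s^\varepsilon_{i(a)} = a$, $s^\varepsilon_{i(b)} = b$ and we set

$$\pi^\varepsilon : M^1(E) \to M^1(\{s^\varepsilon_1, \dots, s^\varepsilon_{L(\varepsilon)}\}), \qquad \nu \mapsto \sum_{i=1}^{L(\varepsilon)} \nu(B^\varepsilon_i)\, \delta_{s^\varepsilon_i}.$$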
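For every $0 < \varepsilon < \varepsilon_0$ we define two elements $\kappa^{1,\varepsilon}$ and $\kappa^{2,\varepsilon}$ of $M^1(E)$ by

$$\kappa^{1,\varepsilon}(A) = \frac{\mu^1(A \cap B^\varepsilon_{i(a)})}{\mu^1(B^\varepsilon_{i(a)})} \quad \text{and} \quad \kappa^{2,\varepsilon}(A) = \frac{\mu^2(A \cap B^\varepsilon_{i(b)})}{\mu^2(B^\varepsilon_{i(b)})}.$$

For every $0 < \varepsilon < \varepsilon_0$ and every $\eta \in ]0, 1[$ we further introduce

$$Q^{\varepsilon,\eta} = \eta\, \kappa^{1,\varepsilon} \otimes \kappa^{2,\varepsilon} + (1 - \eta)\, Q$$

which is an element of $M^1(E \times E)$ such that $Q^{\varepsilon,\eta}_1 = \eta \kappa^{1,\varepsilon} + (1 - \eta) \mu^1$ and $Q^{\varepsilon,\eta}_2 = \eta \kappa^{2,\varepsilon} + (1 - \eta) \mu^2$. Finally, we also consider

$$C^{\varepsilon,\eta} = \eta\, \delta_{(a,b)} + (1 - \eta) \sum_{(i,j) \in \{1, \dots, L(\varepsilon)\}^2} Q(B^\varepsilon_i \times B^\varepsilon_j)\, \delta_{(s^\varepsilon_i, s^\varepsilon_j)}.$$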
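We have $C^{\varepsilon,\eta}_1 = \pi^\varepsilon(Q^{\varepsilon,\eta}_1)$ and $C^{\varepsilon,\eta}_2 = \pi^\varepsilon(Q^{\varepsilon,\eta}_2)$. Since $C^{\varepsilon,\eta}$ need not be infinitely cyclically monotone we have to consider

$$\widetilde{C}^{\varepsilon,\eta} = \eta\, \delta_{(a,b)} + (1 - \eta) \sum_{(i,j) \in \{1, \dots, L(\varepsilon)\}^2} Q(B^\varepsilon_i \times B^\varepsilon_j)\, \delta_{(\tilde{s}^\varepsilon_1(i,j),\, \tilde{s}^\varepsilon_2(i,j))}$$

where for every $(i, j) \in \{1, \dots, L(\varepsilon)\}^2$ such that $Q(B^\varepsilon_i \times B^\varepsilon_j) > 0$ we have $(\tilde{s}^\varepsilon_1(i, j), \tilde{s}^\varepsilon_2(i, j)) \in B^\varepsilon_i \times B^\varepsilon_j \cap S(Q)$ and $(\tilde{s}^\varepsilon_1(i(a), i(b)), \tilde{s}^\varepsilon_2(i(a), i(b))) = (a, b)$. That $a \leftrightarrow b$ implies that $\widetilde{C}^{\varepsilon,\eta}$ is infinitely cyclically monotone thus, according to Lemma 1.3, $W_\infty(\widetilde{C}^{\varepsilon,\eta}_1, \widetilde{C}^{\varepsilon,\eta}_2) = d(a, b) = x$. Clearly $W_\infty(C^{\varepsilon,\eta}_1, \widetilde{C}^{\varepsilon,\eta}_1) \le \varepsilon$ and $W_\infty(C^{\varepsilon,\eta}_2, \widetilde{C}^{\varepsilon,\eta}_2) \le \varepsilon$ so $|W_\infty(C^{\varepsilon,\eta}_1, C^{\varepsilon,\eta}_2) - x| \le 2\varepsilon$, which implies that $|W_\infty(Q^{\varepsilon,\eta}_1, Q^{\varepsilon,\eta}_2) - x| \le 4\varepsilon$ according to Lemma 2.1. For every $0 < \varepsilon < \varepsilon_0$ and every $\eta \in ]0, 1[$ we have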
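$$\begin{aligned}
I_\infty(W_\infty(Q^{\varepsilon,\eta}_1, Q^{\varepsilon,\eta}_2)) &\le H(Q^{\varepsilon,\eta}_1|\mu^1) + H(Q^{\varepsilon,\eta}_2|\mu^2) \\
&= H(\eta \kappa^{1,\varepsilon} + (1 - \eta)\mu^1|\mu^1) + H(\eta \kappa^{2,\varepsilon} + (1 - \eta)\mu^2|\mu^2) \\
&\le \eta H(\kappa^{1,\varepsilon}|\mu^1) + (1 - \eta) H(\mu^1|\mu^1) + \eta H(\kappa^{2,\varepsilon}|\mu^2) + (1 - \eta) H(\mu^2|\mu^2) \\
&= \eta H(\kappa^{1,\varepsilon}|\mu^1) + \eta H(\kappa^{2,\varepsilon}|\mu^2).
\end{aligned}$$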
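The convexity step in this chain, $H(\eta\kappa + (1 - \eta)\mu|\mu) \le \eta H(\kappa|\mu)$, can be illustrated on a finite space where $\kappa$ is $\mu$ conditioned on one cell. This is a minimal sketch under stated assumptions: the four-point measure `mu`, the cell `{0, 1}` and the helper names are hypothetical.

```python
import math

def relative_entropy(p, q):
    """Discrete relative entropy H(p|q) = sum p log(p/q), with 0 log 0 = 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def restrict_and_normalise(mu, cell):
    """kappa(A) = mu(A n B) / mu(B) for a cell B with mu(B) > 0."""
    mass = sum(mu[k] for k in cell)
    return [mu[k] / mass if k in cell else 0.0 for k in range(len(mu))]

def mix(eta, kappa, mu):
    """Convex combination eta * kappa + (1 - eta) * mu."""
    return [eta * k + (1 - eta) * m for k, m in zip(kappa, mu)]

# Hypothetical toy data: four points, kappa is mu conditioned on {0, 1}.
mu = [0.1, 0.2, 0.3, 0.4]
kappa = restrict_and_normalise(mu, {0, 1})

for eta in (0.5, 0.1, 0.01):
    left = relative_entropy(mix(eta, kappa, mu), mu)
    right = eta * relative_entropy(kappa, mu)
    print(eta, left, right)
```

For a conditioned measure of this kind, $H(\kappa|\mu) = -\log \mu(B)$ is finite as soon as $\mu(B) > 0$, so the right-hand column can be made as small as desired by decreasing $\eta$.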
Since for every $0 < \varepsilon < \varepsilon_0$ we have $H(\kappa^{1,\varepsilon}|\mu^1) < \infty$ and $H(\kappa^{2,\varepsilon}|\mu^2) < \infty$ one can find an $\eta_\varepsilon \in ]0, 1[$ such that