ONE-DIMENSIONAL EMPIRICAL MEASURES, ORDER STATISTICS, AND KANTOROVICH TRANSPORT DISTANCES

Sergey Bobkov and Michel Ledoux

University of Minnesota∗ and University of Toulouse†

December 19, 2016

Abstract. This work is devoted to the study of rates of convergence of the empirical measures $\mu_n = \frac{1}{n} \sum_{k=1}^n \delta_{X_k}$, $n \geq 1$, over a sample $(X_k)_{k \geq 1}$ of independent identically distributed real-valued random variables towards the common distribution $\mu$ in Kantorovich transport distances $W_p$. The focus is on finite range bounds on the expected Kantorovich distances $\mathbb{E}(W_p(\mu_n, \mu))$ or $\big[\mathbb{E}(W_p^p(\mu_n, \mu))\big]^{1/p}$ in terms of moments and analytic conditions on the measure $\mu$ and its distribution function. The study describes a variety of rates, from the standard one $\frac{1}{\sqrt{n}}$ to slower rates, and both lower and upper-bounds on $\mathbb{E}(W_p(\mu_n, \mu))$ for fixed $n$ in various instances. Order statistics, reduction to uniform samples and analysis of beta distributions, inverse distribution functions, log-concavity are main tools in the investigation. Two detailed appendices collect classical and some new facts on inverse distribution functions and beta distributions and their densities necessary to the investigation.

Keywords. Empirical measure, Kantorovich distance, rate of convergence, finite rate bound, order statistic, inverse distribution function, beta distribution, log-concave measure.

Mathematics Subject Classification (2010). Primary 60B10, 60F99, 60G57, 62G30, 60B12; Secondary 62G20.

∗ School of Mathematics, University of Minnesota, Minneapolis, MN 55455 USA
† Institut de Mathématiques de Toulouse, Université de Toulouse, F-31062 Toulouse, France, and Institut Universitaire de France

Contents

1 Introduction
2 Generalities on Kantorovich transport distances
    2.1 Kantorovich transport distance $W_p$
    2.2 Topology generated by $W_p$
    2.3 Representations for $W_p$ on the real line
    2.4 Empirical measures
3 The Kantorovich distance $W_1(\mu_n, \mu)$
    3.1 Best and worst rates for the means $\mathbb{E}(W_1(\mu_n, \mu))$
    3.2 Two-sided bounds on $\mathbb{E}(W_1(\mu_n, \mu))$
    3.3 Functional limit theorems
4 Order statistics representations of $W_p(\mu_n, \mu)$
    4.1 Optimal transport, order statistics and inverse functions
    4.2 Reduction to the uniform distribution
5 Standard rate for $\mathbb{E}(W_p^p(\mu_n, \mu))$
    5.1 General upper-bounds on $\mathbb{E}(W_2^2(\mu_n, \mu))$
    5.2 General upper-bounds on $\mathbb{E}(W_p^p(\mu_n, \mu))$
    5.3 Distributions with finite Cheeger constants
    5.4 Connectedness and absolute continuity
    5.5 Necessary and sufficient conditions
    5.6 Standard rate for $W_1$ distance
6 Sampling from log-concave distributions
    6.1 Bounds on variances of order statistics
    6.2 Two-sided bounds on $\mathbb{E}(W_p^p(\mu_n, \mu))$
    6.3 Khinchine-type inequality
    6.4 Bounds in terms of the variance
    6.5 Some other log-concave examples
7 Miscellaneous bounds and results
    7.1 Deviations of $W_p(\mu_n, \mu)$ from the mean
    7.2 Upper-bounds in terms of modulus of continuity
    7.3 Two-sided bounds of order $n^{-1/(2p)}$
    7.4 General upper-bounds on $\mathbb{E}(W_p^p(\mu, \nu))$
    7.5 Moment upper-bounds of order $n^{-1/2}$ on $\mathbb{E}(W_p^p(\mu_n, \mu))$
    7.6 $W_1$-convergence of empirical distributions
A Inverse distribution functions
    A.1 Inverse distribution functions
    A.2 Supports and continuity
    A.3 Modulus of continuity
    A.4 Absolute continuity
    A.5 Integrals containing the derivative of $F^{-1}$
    A.6 Monotone Lipschitz transforms
B Beta distributions
    B.1 Log-concave measures on the real line
    B.2 Log-concave measures of high order
    B.3 Spectral gap
    B.4 Poincaré-type inequalities for $L^p$ norms
    B.5 Gaussian concentration
    B.6 Mean square beta distributions
    B.7 Lower-bounds on the beta densities
    B.8 Lower integral bounds

1 Introduction

This work is devoted to an in-depth investigation of orders of growth of Kantorovich transport distances for one-dimensional empirical measures.

Let $X$ be a real-valued random variable on some probability space $(\Omega, \Sigma, \mathbb{P})$, with law (distribution) $\mu$ (which defines a Borel probability measure on $\mathbb{R}$) and distribution function
\[ F(x) = \mu\big((-\infty, x]\big), \quad x \in \mathbb{R}. \]
Consider a sequence $(X_k)_{k \geq 1}$ of independent copies of $X$, thus with the same distribution $\mu$, and, for each $n \geq 1$, the (random) empirical measure
\[ \mu_n = \frac{1}{n} \sum_{k=1}^n \delta_{X_k}, \]
where $\delta_x$ is the Dirac mass at the point $x \in \mathbb{R}$. Denote by $F_n$ the distribution function of $\mu_n$,
\[ F_n(x) = \frac{1}{n} \sum_{k=1}^n 1_{(-\infty, x]}(X_k), \quad x \in \mathbb{R}. \]
The classical limit theorems by Glivenko-Cantelli and Donsker ensure respectively that, almost surely,
\[ \sup_{x \in \mathbb{R}} \big| F_n(x) - F(x) \big| \to 0 \]
and, weakly in the Skorokhod topology,
\[ \sqrt{n}\, \big( F_n(x) - F(x) \big) \to W^o\big(F(x)\big), \quad x \in \mathbb{R}, \]
where $W^o$ is a Brownian bridge (on $[0, 1]$).

This work is concerned with rates of convergence in the Kantorovich¹ distances $W_p$, $p \geq 1$, of the empirical measures $\mu_n$ towards the theoretical distribution $\mu$. The Kantorovich transport distance $W_p(\mu_n, \mu)$, $p \geq 1$, between $\mu_n$ and $\mu$ is defined by
\[ W_p^p(\mu_n, \mu) = \inf_{\pi} \int_{\mathbb{R}} \int_{\mathbb{R}} |x - y|^p \, d\pi(x, y), \]
where the infimum is taken over all probability measures $\pi$ on the product space $\mathbb{R} \times \mathbb{R}$ with respective marginals $\mu_n$ and $\mu$.

¹ In the literature, the distance $W_p$ is also called the Monge-Kantorovich, or Kantorovich-Rubinshtein, or Wasserstein transport distance, as well as the Fréchet distance (in case $p = 2$), or a minimal distance. Recently, Vershik [Ve1] wrote an interesting historic essay explaining why it is fairer to fix the name "Kantorovich distance" for all metrics like $W_p$ (calling them Kantorovich power metrics) according to the original reference [Ka1]. Some general topological properties of $W_1$ were studied in 1970 by Dobrushin [Do], who re-introduced this metric with reference to [Vas]; apparently, that is why the name "Wasserstein distance" has become rather traditional. As Vershik writes, "Leonid Vasershtein is a famous mathematician specializing in algebraic K-theory and other areas of algebra and analysis, and ... he is absolutely not guilty of this distortion of terminology, which occurs primarily in Western literature". It should be noted that the notation $W$ for quantities like $W_p$ is the one used by Kantorovich in [Ka1], which therefore keeps a balance with today's terminology!

More precisely, we focus in this work on the possible behaviour of the expected Kantorovich distance $\mathbb{E}(W_p(\mu_n, \mu))$ as a function of $n$, where $p \geq 1$ is given. Note that this distance is finite as long as
\[ \int_{-\infty}^{\infty} |x|^p \, d\mu(x) = \mathbb{E}\, |X|^p < \infty, \]
in which case it will be shown below that $W_p(\mu_n, \mu) \to 0$ with probability one. The rate at which $\mu_n \to \mu$ in $W_p$ depends on a variety of hypotheses and properties of the underlying distribution $\mu$, discussed here as completely as possible. As such, these questions were only partially studied in the literature (as far as we can tell).

The asymptotic behaviour of $W_p(\mu_n, \mu)$ for $p = 1$ and $2$ has been investigated previously in papers by del Barrio, Giné, Matrán [B-G-M] and del Barrio, Giné, Utzet [B-G-U], providing in particular necessary and sufficient conditions for the weak convergence of $W_p(\mu_n, \mu)$ (for these values of $p$) towards integrals of the Brownian bridge under some regularity conditions on $\mu$. The purpose of the present work is rather the study of finite range bounds (that is, for $n \geq 1$ large but fixed), both upper and lower-bounds, on the expected Kantorovich distances $\mathbb{E}(W_p(\mu_n, \mu))$ or $\mathbb{E}(W_p^p(\mu_n, \mu))$ for all $p \geq 1$ and under fairly general assumptions on the distribution $\mu$. The functional central limit theorem $\sqrt{n}\, (F_n(x) - F(x)) \to W^o(F(x))$ already indicates that under proper assumptions the value of $\mathbb{E}(W_p(\mu_n, \mu))$ should have the rate of order $\frac{1}{\sqrt{n}}$ (which is in general best possible).
Therefore, we will be in particular interested in conditions that ensure this "standard" rate.

We next present the various parts and summarize some of the main conclusions obtained here.

The first section (Section 2) collects a number of standard results on the Kantorovich transport distances $W_p$ and the topology that they generate. Quantile representations of $W_p$ on the real line are also addressed there. The last paragraph gathers some basic facts on the convergence of empirical measures in $W_p$ over a sample of independent identically distributed random variables towards the common distribution.

Section 3 is devoted to the Kantorovich distance $W_1(\mu_n, \mu)$. It is shown in particular that if $\mathbb{E}(|X|) < \infty$, then $\mathbb{E}(W_1(\mu_n, \mu)) \to 0$, but the convergence may actually hold at an arbitrarily slow rate. On the other hand, the convergence rate cannot be better than $\frac{1}{\sqrt{n}}$. This standard rate is reached under the moment condition $\mathbb{E}(|X|^{2+\delta}) < \infty$ for some $\delta > 0$. In fact, a necessary and sufficient condition for the standard rate is that
\[ J_1(\mu) = \int_{-\infty}^{\infty} \sqrt{F(x)(1 - F(x))} \, dx < \infty. \]
Moreover, explicit two-sided bounds, depending on $n$, for $\mathbb{E}(W_1(\mu_n, \mu))$ in terms of the distribution function $F$ may be provided. Connections with functional limit theorems are also addressed.
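
To make the role of $J_1(\mu)$ concrete, here is a minimal numerical sketch, not taken from the paper, for the special case where $\mu$ is the uniform distribution on $[0, 1]$. It assumes the standard one-dimensional identity $W_1(\mu_n, \mu) = \int |F_n(x) - F(x)| \, dx$ (representations of this kind are the subject of Section 2.3). Since $n F_n(x)$ is binomial with parameters $n$ and $F(x)$, one has $\mathbb{E}\,|F_n(x) - F(x)| \leq \sqrt{F(x)(1 - F(x))/n}$, hence $\mathbb{E}(W_1(\mu_n, \mu)) \leq J_1(\mu)/\sqrt{n}$, with $J_1(\mu) = \pi/8$ in the uniform case. The function names, sample sizes, number of repetitions, and seed are illustrative choices only.

```python
import numpy as np


def w1_to_uniform(sample):
    """Exact W_1 between the empirical measure of `sample` and Uniform[0, 1].

    Uses W_1 = integral over [0, 1] of |F_n(x) - x| dx, where F_n is the
    empirical distribution function. All sample points must lie in [0, 1].
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # F_n equals k/n on [x_(k), x_(k+1)), with x_(0) = 0 and x_(n+1) = 1.
    left = np.concatenate(([0.0], x))
    right = np.concatenate((x, [1.0]))
    level = np.arange(n + 1) / n

    # G(t) = (t - c)|t - c| / 2 is an antiderivative of t -> |t - c|.
    def antiderivative(t):
        d = t - level
        return 0.5 * d * np.abs(d)

    return float(np.sum(antiderivative(right) - antiderivative(left)))


def mean_scaled_w1(n, reps, rng):
    """Monte Carlo estimate of sqrt(n) * E W_1(mu_n, mu) for Uniform[0, 1]."""
    values = [w1_to_uniform(rng.random(n)) for _ in range(reps)]
    return float(np.sqrt(n) * np.mean(values))


J1_UNIFORM = np.pi / 8  # integral of sqrt(x(1 - x)) over [0, 1]

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
for n in (100, 400, 1600):
    est = mean_scaled_w1(n, reps=500, rng=rng)
    print(f"n={n:5d}  sqrt(n)*E W_1 ~ {est:.3f}  (upper bound J_1 = {J1_UNIFORM:.3f})")
```

If the sketch behaves as expected, the rescaled estimates stay roughly constant in $n$ and below $\pi/8 \approx 0.393$, which is what a rate of order $\frac{1}{\sqrt{n}}$ looks like numerically. For a distribution with $J_1(\mu) = \infty$ the same experiment would need a different exact formula for $W_1$, and the rescaled values should not be expected to stabilize.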