Multivariate Quantiles and Ranks using Optimal Transportation

Bodhisattva Sen¹, Department of Statistics, Columbia University, New York

Department of Statistics George Mason University

Joint work with Promit Ghosal (Columbia University)

05 April, 2019

¹ Supported by NSF grants DMS-1712822 and AST-1614743

Ranks and quantiles when d = 1

$X$ is a random variable with c.d.f. $F$

Rank: The rank of $x \in \mathbb{R}$ is $F(x)$

Property: If $F$ is continuous, $F(X) \sim \mathrm{Unif}([0, 1])$

Quantile: The quantile function is $F^{-1}$

Property: If $F$ is continuous, $F^{-1}(U) \sim F$ where $U \sim \mathrm{Unif}([0, 1])$

How to define ranks and quantiles in $\mathbb{R}^d$, $d > 1$?
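The two one-dimensional properties (ranks of a continuous variable are uniform, and quantiles of a uniform variable follow $F$) can be checked numerically. The following is a minimal sketch, assuming $F$ is the Exponential(1) c.d.f.; the helper names `exp_cdf`, `exp_quantile` and `ks_distance_to_uniform` are illustrative choices, not part of the talk.

```python
import math
import random

def exp_cdf(x):
    """Rank map F(x) = 1 - exp(-x) of the Exponential(1) distribution."""
    return 1.0 - math.exp(-x)

def exp_quantile(u):
    """Quantile map F^{-1}(u) = -log(1 - u) of the Exponential(1) distribution."""
    return -math.log(1.0 - u)

def ks_distance_to_uniform(values):
    """Sup-distance between the empirical c.d.f. of `values` and the Unif([0, 1]) c.d.f."""
    v = sorted(values)
    n = len(v)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(v))

rng = random.Random(0)
n = 5000

# Ranks: F(X) should look Unif([0, 1]) when X ~ F.
xs = [rng.expovariate(1.0) for _ in range(n)]
ks_ranks = ks_distance_to_uniform([exp_cdf(x) for x in xs])

# Quantiles: F^{-1}(U) should look like a draw from F when U ~ Unif([0, 1]).
draws = [exp_quantile(rng.random()) for _ in range(n)]
ks_draws = ks_distance_to_uniform([exp_cdf(y) for y in draws])

print(ks_ranks, ks_draws)  # both distances should be small
```

Both distances shrink at the usual $n^{-1/2}$ rate as the sample size grows.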

Defining quantiles, ranks, depth, etc. difficult when d > 1

Lack of a natural ordering in $\mathbb{R}^d$, when $d > 1$

Many notions of multivariate quantiles/ranks have been suggested: Puri and Sen (1971), Chaudhuri and Sengupta (1993), Möttönen and Oja (1995), Chaudhuri (1996), Liu and Singh (1993), Serfling (2010) ...

Spatial median and geometric quantile

Spatial median: $M := \arg\min_{m \in \mathbb{R}^d} E\|X - m\|$

Quantile when $d = 1$: For $u \in (0, 1)$, $F^{-1}(u) = \arg\min_{x \in \mathbb{R}} E\big[\,|X - x| - (2u - 1)x\,\big]$

Geometric quantile [Chaudhuri (1996)]: For $\|u\| < 1$, let $Q(u) := \arg\min_{x \in \mathbb{R}^d} E\big[\,\|X - x\| - \langle u, x \rangle\,\big]$
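The characterization of the one-dimensional quantile as a minimizer can be checked on a sample. This is a sketch under stated assumptions: the expectation is replaced by a sample average, the level $u$ is chosen so that $nu$ is not an integer (so the minimizer is unique), and the function names are hypothetical.

```python
import math
import random

def check_objective(sample, x, u):
    """Empirical version of E[|X - x| - (2u - 1) x]."""
    n = len(sample)
    return sum(abs(xi - x) for xi in sample) / n - (2.0 * u - 1.0) * x

def quantile_by_minimization(sample, u):
    # The objective is convex and piecewise linear in x with kinks at the data,
    # so a minimizer can be found among the data points themselves.
    return min(sample, key=lambda x: check_objective(sample, x, u))

def quantile_by_sorting(sample, u):
    # Order statistic X_(ceil(n u)).
    n = len(sample)
    return sorted(sample)[math.ceil(n * u) - 1]

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(11)]
levels = [0.25, 0.5, 0.9]  # n * u is not an integer for n = 11

by_min = [quantile_by_minimization(sample, u) for u in levels]
by_sort = [quantile_by_sorting(sample, u) for u in levels]
print(by_min == by_sort)
```

The geometric quantile replaces $|X - x|$ by the Euclidean norm and $(2u - 1)x$ by $\langle u, x \rangle$; in that reading the scalar $2u - 1 \in (-1, 1)$ plays the role of the vector $u$ with $\|u\| < 1$.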

Outline

1 Introduction to Optimal Transportation: Monge's Problem; Kantorovich Relaxation: Primal Problem; A Geometric Approach

2 Quantile and Rank Functions in $\mathbb{R}^d$ ($d \ge 1$)

3 Some Applications in Statistics: Two-sample Goodness-of-fit Testing; Independence Testing

Monge Problem

Gaspard Monge (1781): What is the cheapest way to transport a pile of sand to cover a sinkhole?

Blanchet (Columbia U. and Stanford U.)

Goal: $\inf_{T : T(X) \sim \nu} E_\mu[c(X, T(X))]$

$\mu$ (on $\mathcal{X}$) and $\nu$ (on $\mathcal{Y}$) probability measures, $\int_{\mathcal{X}} d\mu(x) = \int_{\mathcal{Y}} d\nu(y) = 1$

$c(x, y) \ge 0$: cost of transporting $x$ to $y$ (e.g., $c(x, y) = \|x - y\|^p$)

$T$ transports $\mu$ to $\nu$, i.e., $T(X) \sim \nu$ where $X \sim \mu$, or, $\nu(B) = \mu(T^{-1}(B)) = \int_{T^{-1}(B)} d\mu$, $B \subset \mathcal{Y}$

One-dimensional optimal transport

Suppose $\mathcal{X}, \mathcal{Y} \subset \mathbb{R}$; $\mu$, $\nu$ abs. cont.; $F_\mu$ and $F_\nu$ c.d.f.'s

Goals: (i) Transport $\mu$ to $\nu$; i.e., find $T$ s.t. if $X \sim \mu$ then $T(X) \sim \nu$

(ii) $T$ minimizes cost $E_\mu[(X - T(X))^2]$; assume $c(x, y) = (x - y)^2$

[Figure 3: Two densities p and q and the optimal transport map that morphs p into q.]

Here $p \ge 1$. When $p = 1$ this is also called the Earth Mover distance. The minimizer $J^*$ (which does exist) is called the optimal transport plan or the optimal coupling. In case there is an optimal transport map $T$ then $J$ is a singular measure with all its mass on the set $\{(x, T(x))\}$.

It can be shown that $W_p^p(P, Q) = \sup_{\psi, \phi} \int \psi(y)\, dQ(y) - \int \phi(x)\, dP(x)$, where $\psi(y) - \phi(x) \le \|x - y\|^p$. This is called the dual formulation. In the special case where $p = 1$ we have the very simple representation $W_1(P, Q) = \sup\left\{ \int f(x)\, dP(x) - \int f(x)\, dQ(x) : f \in \mathcal{F} \right\}$ where $\mathcal{F}$ denotes all maps from $\mathbb{R}^d$ to $\mathbb{R}$ such that $|f(y) - f(x)| \le \|x - y\|$ for all $x, y$.

When $d = 1$, the distance has a closed form: $W_p(P, Q) = \left( \int_0^1 |F^{-1}(z) - G^{-1}(z)|^p\, dz \right)^{1/p}$

The minimizing $T$ must satisfy (Why?) $(x_0 - T(x_0))^2 + (x_1 - T(x_1))^2 \le (x_0 - T(x_1))^2 + (x_1 - T(x_0))^2$

This means that if $x_1 > x_0$ then $T(x_1) \ge T(x_0)$ (expanding the squares, the inequality reduces to $(x_1 - x_0)(T(x_1) - T(x_0)) \ge 0$)

So $T$ must be a monotone nondecreasing function

Therefore, choose $T(\cdot)$ so that (recall: $\nu(B) = \int_{T^{-1}(B)} d\mu$) $\int_{-\infty}^{x} d\mu(s) = \int_{-\infty}^{T(x)} d\nu(y) \Rightarrow F_\mu(x) = F_\nu(T(x))$

Thus, $T = F_\nu^{-1} \circ F_\mu$ (and this map $T$ is unique)
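The monotonicity argument for one-dimensional transport has a simple finite-sample analogue: between two samples of equal size, the quadratic cost is minimized by matching the $k$-th smallest $x$ to the $k$-th smallest $y$. The sketch below verifies this by brute force over all matchings of six points. The two samples and the function names are arbitrary illustrations, and the empirical setting is an assumption (the slides treat absolutely continuous $\mu$ and $\nu$).

```python
import itertools
import random

def transport_cost(xs, ys, perm):
    """Quadratic cost of sending xs[i] to ys[perm[i]]."""
    return sum((x - ys[j]) ** 2 for x, j in zip(xs, perm))

def monotone_matching(xs, ys):
    """Match the k-th smallest x to the k-th smallest y."""
    x_order = sorted(range(len(xs)), key=lambda i: xs[i])
    y_order = sorted(range(len(ys)), key=lambda j: ys[j])
    perm = [0] * len(xs)
    for i, j in zip(x_order, y_order):
        perm[i] = j
    return tuple(perm)

rng = random.Random(2)
xs = [rng.gauss(0.0, 1.0) for _ in range(6)]
ys = [rng.uniform(0.0, 1.0) for _ in range(6)]

# Exhaustive search over all 6! = 720 one-to-one matchings.
best = min(itertools.permutations(range(6)),
           key=lambda p: transport_cost(xs, ys, p))
mono = monotone_matching(xs, ys)
print(best == mono)
```

The monotone matching is the sample counterpart of $T = F_\nu^{-1} \circ F_\mu$: it sends each point to the point of equal rank in the other sample.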

Optimal transportation when d = 1

$\mathcal{X}, \mathcal{Y} \subset \mathbb{R}$; $\mu$, $\nu$ abs. cont.; $F_\mu$ and $F_\nu$ c.d.f.'s

Goals: (i) Transport $\mu$ to $\nu$; i.e., find $T$ s.t. if $X \sim \mu$ then $T(X) \sim \nu$

(ii) $T$ minimizes cost $E_\mu[(X - T(X))^2]$

Solution: $T = F_\nu^{-1} \circ F_\mu$ (and this map $T$ is unique)

Ranks and Quantiles when d = 1

When $\mu = \mathrm{Unif}([0, 1])$, $T = F_\nu^{-1}$ transports $\mu$ to $\nu$ (quantile map)

When $\nu = \mathrm{Unif}([0, 1])$, $T = F_\mu$ (rank map)

Thus, when $d = 1$, the rank and quantile maps are solutions to the optimal transport problem

How to do this in higher dimensions, e.g., when $\mathcal{X} = \mathcal{Y} = \mathbb{R}^d$, $d > 1$?

Outline

1 Introduction to Optimal Transportation: Monge's Problem; Kantorovich Relaxation: Primal Problem; A Geometric Approach

2 Quantile and Rank Functions in $\mathbb{R}^d$ ($d \ge 1$)

3 Some Applications in Statistics: Two-sample Goodness-of-fit Testing; Independence Testing

Monge's problem: Given probability measures $\mu$ and $\nu$ solve:

$\inf_{T : T(X) \sim \nu} E_\mu[c(X, T(X))] = \inf_{T : T\#\mu = \nu} \int_{\mathcal{X}} c(x, T(x))\, d\mu(x)$    (1)

where $T(X) \sim \nu$ iff $\mu(T^{-1}(B)) = \nu(B)$, for all Borel $B$

Drawbacks

Above optimization problem is highly non-linear and can be ill-posed

No admissible $T$ may exist; e.g., if $\mu$ is the Dirac delta and $\nu$ is not

Moreover, the infimum in (1) may not be attained, i.e., a limit of transport maps $\{T_i\}_{i \ge 1}$ may fail to be a transport map

Solution need not be unique (book shifting example)

Not much progress was made for about 160 yrs!

Outline

1 Introduction to Optimal Transportation: Monge's Problem; Kantorovich Relaxation: Primal Problem; A Geometric Approach

2 Quantile and Rank Functions in $\mathbb{R}^d$ ($d \ge 1$)

3 Some Applications in Statistics: Two-sample Goodness-of-fit Testing; Independence Testing

Kantorovich Relaxation: Primal Problem

Monge's problem (M): $\inf_{T : T\#\mu = \nu} \int_{\mathcal{X}} c(x, T(x))\, d\mu(x)$

Let $\Pi(\mu, \nu)$ be the class of joint distributions of $(X, Y) \sim \pi$ s.t. $\pi_X$ = marginal of $X$ = $\mu$, $\pi_Y$ = marginal of $Y$ = $\nu$

Kantorovich relaxation (K): Solve $\min_{\pi \in \Pi(\mu, \nu)} E_\pi[c(X, Y)] = \min_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y)\, d\pi(x, y)$

Always has a solution for $c(\cdot, \cdot) \ge 0$ lower semicontinuous (l.s.c.)²

Linear program (infinite dimensional)

Result: If $\mu$ is abs. cont. then (K) = (M) and $\pi_{opt} = (\mathrm{id}, T_{opt})\#\mu$

Computation: Kantorovich Dual Problem

² A function $\phi : \mathbb{R}^d \to \mathbb{R}$ is l.s.c. at $x_0$ iff $\liminf_{x \to x_0} \phi(x) \ge \phi(x_0)$

Kantorovich relaxation: Solve $\min\{ E_\pi[c(X, Y)] : \pi \in \Pi(\mu, \nu) \}$

Discrete version: $\mu$ and $\nu$ supported on $\{x_i\}_{i=1}^M$ and $\{y_j\}_{j=1}^N$. Then

$\min_{\{p_{ij} \ge 0\}} \left\{ \sum_{i=1}^M \sum_{j=1}^N p_{ij}\, c(x_i, y_j) : \sum_{i=1}^M p_{ij} = \nu(y_j);\ \sum_{j=1}^N p_{ij} = \mu(x_i) \right\}$

Discrete Kantorovich formulation (Earth Mover's Distance)

When $\mu$ and $\nu$ are discrete: let $\mu = \sum_{i=1}^N p_i \delta_{x_i}$ and $\nu = \sum_{j=1}^M q_j \delta_{y_j}$, where $\delta_{x_i}$ is a Dirac measure,

$K(\mu, \nu) = \min_{\gamma} \sum_i \sum_j c(x_i, y_j)\, \gamma_{ij}$ s.t. $\sum_j \gamma_{ij} = p_i$, $\sum_i \gamma_{ij} = q_j$, $\gamma_{ij} \ge 0$    (7)
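A small instance makes the difference between a transport map and a transport plan concrete. The sketch below builds a feasible plan for the discrete problem on the real line by the north-west corner rule on sorted supports, which yields the monotone coupling. For costs of the form $h(x - y)$ with $h$ convex (such as $(x - y)^2$), the monotone coupling is known to be optimal in one dimension, so no linear-programming solver is used here. The function names and the toy weights are assumptions made for illustration.

```python
def monotone_coupling(xs, p, ys, q, tol=1e-12):
    """North-west corner rule on sorted supports.

    Returns a dict {(i, j): mass} whose row sums are p and column sums are q.
    """
    assert abs(sum(p) - sum(q)) < 1e-9, "total masses must agree"
    x_order = sorted(range(len(xs)), key=lambda i: xs[i])
    y_order = sorted(range(len(ys)), key=lambda j: ys[j])
    remaining_p, remaining_q = list(p), list(q)
    plan = {}
    a = b = 0
    while a < len(x_order) and b < len(y_order):
        i, j = x_order[a], y_order[b]
        mass = min(remaining_p[i], remaining_q[j])
        if mass > tol:
            plan[(i, j)] = plan.get((i, j), 0.0) + mass
        remaining_p[i] -= mass
        remaining_q[j] -= mass
        if remaining_p[i] <= tol:
            a += 1  # source atom i is exhausted
        if remaining_q[j] <= tol:
            b += 1  # target atom j is filled
    return plan

def plan_cost(plan, xs, ys, cost):
    return sum(m * cost(xs[i], ys[j]) for (i, j), m in plan.items())

xs, p = [0.0, 1.0], [0.5, 0.5]
ys, q = [0.0, 1.0, 2.0], [0.25, 0.5, 0.25]
plan = monotone_coupling(xs, p, ys, q)
cost = plan_cost(plan, xs, ys, lambda x, y: (x - y) ** 2)
print(plan, cost)
```

In this toy example each source atom carries mass 0.5 while two target atoms carry mass 0.25, so no map $T$ can push $\mu$ to $\nu$; the plan splits the mass at each $x_i$ between two targets, which the Kantorovich relaxation permits.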

S. Kolouri and G. K. Rohde, OT Crash Course

Outline

1 Introduction to Optimal Transportation: Monge's Problem; Kantorovich Relaxation: Primal Problem; A Geometric Approach

2 Quantile and Rank Functions in $\mathbb{R}^d$ ($d \ge 1$)

3 Some Applications in Statistics: Two-sample Goodness-of-fit Testing; Independence Testing

A Geometric Approach to Optimal Transportation

$\mu$, $\nu$: two probability measures on $\mathbb{R}^d$; $c(u, x) = \|u - x\|^2$

Monge's problem³ (M): $\inf_{T : T\#\mu = \nu} \int \|u - T(u)\|^2\, d\mu(u)$

$T\#\mu$ is the push forward of $\mu$ by $T$, i.e., $T\#\mu(B) = \mu(T^{-1}(B))$, $\forall B$

Kantorovich Relaxation (K): $\min_{\pi \in \Pi(\mu, \nu)} \int \|u - x\|^2\, d\pi(u, x)$

Compared to above notions this approach has the following advantages:

This relies on appealing geometric ideas

Does not require any moment conditions

When $d = 1$ opt. transport $T = F_\nu^{-1} \circ F_\mu$ irrespective of moment assump.

³ Monge's problem is not meaningful unless $\mu$ and $\nu$ have finite second moments

$U \sim \mu$ abs. cont. distribution with support $\mathcal{S} \subset \mathbb{R}^d$

Example: $\mu = \mathrm{Unif}([0, 1]^d)$ or $\mathrm{Unif}(B_d(0, 1))$

$X \sim \nu$; $\nu$ is a given probability measure in $\mathbb{R}^d$

Goal: Find the "optimal" transportation map $T$ s.t. $T\#\mu = \nu$

Theorem [Knott and Smith, Brenier, McCann ...]

There is a $\mu$-a.e. unique measurable mapping $Q : \mathcal{S} \to \mathbb{R}^d$, transporting $\mu$ to $\nu$ (i.e., $Q\#\mu = \nu$ or $Q(U) \sim \nu$), of the form $Q(u) = \nabla\varphi(u)$, for $\mu$-a.e. $u$, where $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex function (cf. when $d = 1$).

If, in addition, $\mu$, $\nu$ have finite second moments, then

(i) $Q(\cdot)$ is the $\mu$-a.e. unique transport map (sol. (M)), i.e., $\inf_{T : T\#\mu = \nu} \int \|u - T(u)\|^2\, d\mu(u) = \int \|u - Q(u)\|^2\, d\mu(u)$;

(ii) $\mu$-a.e. unique optimal transference plan (sol. (K)) is $\pi = (\mathrm{id}, Q)\#\mu$

Outline

1 Introduction to Optimal Transportation: Monge's Problem; Kantorovich Relaxation: Primal Problem; A Geometric Approach

2 Quantile and Rank Functions in $\mathbb{R}^d$ ($d \ge 1$)

3 Some Applications in Statistics: Two-sample Goodness-of-fit Testing; Independence Testing

Quantile map when $d \ge 1$

$\mu$ has an abs. cont. distribution with support $\mathcal{S} \subset \mathbb{R}^d$

$\nu$ a given probability measure in $\mathbb{R}^d$ (need not be abs. cont.)

Quantile map: The quantile map of $\nu$ (w.r.t. $\mu$) is the $\mu$-a.e. unique mapᵃ $Q \equiv \nabla\varphi : \mathcal{S} \to \mathbb{R}^d$ where $\nabla\varphi$ pushes $\mu$ to $\nu$ (i.e., $\nabla\varphi\#\mu = \nu$) and $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex function.

Restatement: The quantile map $Q$ of $\nu$ (w.r.t. $\mu$) is the $\mu$-a.e. unique map that is the gradient of a convex function and pushes $\mu$ to $\nu$.

ᵃ Note that $Q$ is uniquely defined for $\mu$-a.e. $u$; w.l.o.g., let $\varphi(u) = +\infty$ for $u \in \mathcal{S}^c$

In the statistics literature this study was initiated by Chernozhukov et al. (2017, AoS); Hallin (2018, AoS, in revision)

Sample quantiles when d = 1

$\mu = \mathrm{Uniform}([0, 1])$

$\nu \equiv \nu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ is the empirical distribution of $\{X_i\}_{i=1}^n \subset \mathbb{R}$

$X_{(1)} < \ldots < X_{(n)}$ be the order statistics

[Figure: quantile function $Q$ plotted against $u \in [0, 1]$.]

Then the sample quantile function $\hat{Q}_n$ ($\hat{Q}_n\#\mu = \nu_n$) reduces to $\hat{Q}_n(u) = X_{(i)}$, if $u \in \left( \frac{i-1}{n}, \frac{i}{n} \right)$, $i = 1, \ldots, n$

At $\frac{i}{n}$, $i = 1, \ldots, n - 1$, we are free to define $\hat{Q}_n\left( \frac{i}{n} \right) \in [X_{(i)}, X_{(i+1)}]$

Sample quantiles in $\mathbb{R}^d$, $d \ge 1$

$\mu$ abs. cont. with support $\mathcal{S} \subset \mathbb{R}^d$; e.g., $\mu = \mathrm{Uniform}([0, 1]^d)$

$\nu \equiv \nu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ is the empirical distribution of $\{X_i\}_{i=1}^n \subset \mathbb{R}^d$

$\hat{Q}_n$ is the transport (Monge) map s.t. $\hat{Q}_n\#\mu = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ and minimizes (in this case (K) = (M)) $\int_{\mathcal{S}} \|u - T(u)\|^2\, d\mu(u) = \sum_{i=1}^n \int_{\{u \in \mathcal{S} : T(u) = X_i\}} \|u - X_i\|^2\, d\mu(u)$

Computation in the semi-discrete case

Obtain a convex subdivision of $\mathcal{S}$: "partition" of $\mathcal{S} = \cup_{i=1}^n \hat{Q}_n^{-1}(X_i)$

Top-dimensional cells: convex polyhedral sets in the subdivision of $\mathcal{S}$ with non-empty interior

Question: How to compute $\hat{Q}_n$? (Figures and plots?)

[Figure: four panels on $[0, 1]^2$.] The data sets are drawn from the following distributions (clockwise top to bottom): (i) $X \sim N_2((0, 0), I_2)$; (ii) $X \sim N_2((0, 0), \Sigma)$ where $\Sigma_{1,1} = \Sigma_{2,2} = 1$ and $\Sigma_{1,2} = \Sigma_{2,1} = 0.99$; (iii) two spiral structures with Gaussian perturbations (with small variance); and (iv) a mixture of four different distributions.
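For $d = 1$ the semi-discrete problem has the explicit solution stated for the sample quantile function: the cells are the intervals $((i-1)/n, i/n)$ and $\hat{Q}_n$ is a step function through the order statistics. The following sketch (with a hypothetical `sample_quantile` helper and an arbitrary four-point sample) also checks the push-forward property by sending a fine midpoint grid on $[0, 1]$ through $\hat{Q}_n$. The general case $d > 1$ requires computing the convex subdivision and is not attempted here.

```python
import math
from collections import Counter

def sample_quantile(sample, u):
    """Step function Q_n(u) = X_(i) for u in ((i-1)/n, i/n).

    At the breakpoints u = i/n the value is a free choice; this sketch
    returns X_(i), one of the admissible values.
    """
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    order = sorted(sample)
    return order[math.ceil(len(order) * u) - 1]

sample = [2.3, -0.7, 1.1, 0.4]

# Midpoint grid on [0, 1]; no grid point falls on a breakpoint i/4.
grid = [(k + 0.5) / 1000 for k in range(1000)]
counts = Counter(sample_quantile(sample, u) for u in grid)
print(counts)  # every data point receives the same share of the grid
```

Each data point receives an equal share of the grid, which is the discrete counterpart of $\hat{Q}_n\#\mu = \nu_n$.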

Rank map

$\mu$ has an abs. cont. distribution with support $\mathcal{S} \subset \mathbb{R}^d$

$\nu$ a given probability measure in $\mathbb{R}^d$ (need not be abs. cont.)

Quantile map: The quantile map of $\nu$ (w.r.t. $\mu$): $\mu$-a.e. unique map $Q \equiv \nabla\varphi : \mathcal{S} \to \mathbb{R}^d$ where $\nabla\varphi\#\mu = \nu$ and $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convexᵃ function

ᵃ By convention $\varphi(u) = +\infty$ for $u \notin \mathcal{S}$

Rank map: The rank map of $\nu$ (w.r.t. $\mu$) is defined by $R \equiv \nabla\varphi^*$ where $\nabla\varphi\#\mu = \nu$, $\varphi^* : \mathbb{R}^d \to \mathbb{R}$ is the (convex) Legendre-Fenchel dual of $\varphi$:

$\varphi^*(x) := \sup_{u \in \mathbb{R}^d} \{ \langle x, u \rangle - \varphi(u) \} = \sup_{u \in \mathcal{S}} \{ \langle x, u \rangle - \varphi(u) \}$

Note that the rank map $R(\cdot)$ is finite on $\mathbb{R}^d$

When is $R = Q^{-1}$?

$X \sim \nu$: supported on convex $\mathcal{X} \subset \mathbb{R}^d$ with Lebesgue density⁴

$U \sim \mu$: supported on cvx. compact $\mathcal{S} \subset \mathbb{R}^d$ with bounded density

Result [Ghosal and S. (2018+)]

Let $Q = \nabla\varphi$ be the quantile function of $\nu$ (w.r.t. $\mu$), where $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is convex. Then:

(i) The inverse function of $Q$ exists, and has the form $Q^{-1} = \nabla\varphi^* =: R$, where $\varphi^*$ is the Legendre-Fenchel dual of $\varphi$.

(ii) $Q$ is a homeomorphism from $\mathrm{Int}(\mathcal{S})$ to $\mathrm{Int}(\mathcal{X})$ (cf. when $d = 1$ᵃ)

(iii) $R = Q^{-1}$ is the $\nu$-a.e. unique map that pushes $\nu$ to $\mu$ (i.e., $R\#\nu = \mu$) which is the gradient of a convex function.

ᵃ the c.d.f. $F$ is continuous and strictly increasing

⁴ with mild boundedness (from below and above) assumptions on the density on $\mathcal{X}$

Properties of the rank/quantile maps

Characterizes the distribution: The quantile and rank functions characterize the associated distribution

Equivariance under orthogonal transformations: Suppose $Y = AX$, $A$ is $d \times d$ matrix

$A$ is an orthogonal matrix, i.e., $AA^\top = A^\top A = I_d$

$\mu$: spherically symmetric distribution (e.g., $\mu = \mathrm{Uniform}(B_d(0, 1))$)

Then, $Q_Y(u) = A\, Q_X(A^\top u)$ for $\mu$-a.e. $u$

$R_Y(y) = A\, R_X(A^\top y)$, for a.e. $y \in \mathbb{R}^d$

Quantile/rank maps: equivariant under orthogonal transformations

Under mutual independence

$X = (X_1, X_2, \ldots, X_k) \sim \nu$ where $k \ge 2$;

$X_i \sim \nu_i$, for $i = 1, \ldots, k$ are r.v. in $\mathbb{R}^{d_i}$ (here $d_1 + \ldots + d_k = d$)

$\mu = \mathrm{Uniform}([0, 1]^d)$

Let $Q$ and $Q_i$ be the quantile maps of $X$ and $X_i$, for $i = 1, \ldots, k$, respectively (w.r.t. $\mu$ and $\mu_i = \mathrm{Uniform}([0, 1]^{d_i})$)

Let $R$ and $R_i$, for $i = 1, \ldots, k$, be the corresponding rank maps

Mutual independence [Ghosal and S. (2018+)]

If $X_1, \ldots, X_k$ are mutually independent then

$Q(u_1, \ldots, u_k) = (Q_1(u_1), \ldots, Q_k(u_k))$, for $\mu$-a.e. $(u_1, \ldots, u_k)$,

$R(x_1, \ldots, x_k) = (R_1(x_1), \ldots, R_k(x_k))$, for a.e. $(x_1, \ldots, x_k) \in \mathbb{R}^d$.

Sample rank map when $d \ge 1$

$X_1, \ldots, X_n \in \mathbb{R}^d$; $\mu$ is abs. cont. on convex $\mathcal{S} \subset \mathbb{R}^d$

Sample quantile function: $\hat{Q}_n \equiv \nabla\hat\varphi_n$ pushes $\mu$ to $\nu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$

Observe that $\nabla\hat\varphi_n \in \{X_1, \ldots, X_n\}$ $\mu$-a.e.

Thus $\hat\varphi_n$ is a piecewise affine convex function: $\hat\varphi_n(u) = \max_{i = 1, \ldots, n} \{ \langle X_i, u \rangle + \hat{h}_i \}$ for $u \in \mathcal{S}$, and $\hat\varphi_n(u) = +\infty$ for $u \in \mathcal{S}^c$

Sample rank map: The sample rank map is defined as $\hat{R}_n = \nabla\hat\varphi_n^*$ where $\hat\varphi_n^* : \mathbb{R}^d \to \mathbb{R}$ is also convex piecewise affine:

$\hat\varphi_n^*(x) = \sup_{u \in \mathcal{S}} \{ \langle x, u \rangle - \hat\varphi_n(u) \}$

∗ −1 Result: Rˆn(Xi ) ∂ϕˆ (Xi ) which contains top-dim. cell Qˆ (Xi )! ∈ n n
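To make the piecewise affine construction concrete, here is a minimal sketch for the special case $d = 1$ with $\mu = \mathrm{Uniform}([0,1])$ (an assumption made only for this illustration). The helper names `heights_1d`, `phi_hat` and `q_hat` are hypothetical. In this case the heights $\hat h_i$ can be written down by hand, by forcing consecutive affine pieces to cross at $u = i/n$, and the gradient of the max recovers the usual sample quantile $X_{(\lceil nu \rceil)}$.

```python
import math

def heights_1d(xs_sorted):
    """Heights h_i such that each affine piece of max_i {x_i*u + h_i}
    is active on an interval of length 1/n inside [0, 1] (d = 1 only)."""
    n = len(xs_sorted)
    h = [0.0]  # the potential is defined up to an additive constant
    for i in range(1, n):
        # Pieces i and i+1 (1-based) must cross at u = i/n.
        h.append(h[-1] - (xs_sorted[i] - xs_sorted[i - 1]) * i / n)
    return h

def phi_hat(u, xs, h):
    """Piecewise affine convex potential: max_i {x_i*u + h_i}."""
    return max(x * u + hi for x, hi in zip(xs, h))

def q_hat(u, xs, h):
    """Sample quantile map: the slope of the active affine piece at u."""
    best = max(range(len(xs)), key=lambda i: xs[i] * u + h[i])
    return xs[best]

xs = sorted([2.5, -1.0, 0.3, 4.0])
h = heights_1d(xs)
for u in (0.1, 0.3, 0.6, 0.9):
    # Agrees with the classical sample quantile X_(ceil(n*u)).
    assert q_hat(u, xs, h) == xs[math.ceil(len(xs) * u) - 1]
```

For $d > 1$ the heights are not available in closed form and have to be found by solving the semi-discrete optimal transport problem; the sketch only illustrates the structure of $\hat\varphi_n$ and $\hat Q_n$.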

How to define the sample ranks $\hat R_n(X_i)$?

The sample rank map is $\hat R_n = \nabla \hat\varphi_n^*$ (Recall: $\hat Q_n = \nabla \hat\varphi_n$)

But $\hat\varphi_n^*$ is not differentiable at $X_i$; $i = 1, \ldots, n$

Fact (for convex functions): $u \in \partial \hat\varphi_n^*(X_i) \iff X_i \in \partial \hat\varphi_n(u)$, where $\partial \hat\varphi_n(u) = \{\hat Q_n(u)\}$ wherever $\hat\varphi_n$ is differentiable

Result: $\hat R_n(X_i) \in \partial \hat\varphi_n^*(X_i)$, which contains the top-dim. cell $\hat Q_n^{-1}(X_i)$!

The sample ranks $\hat R_n(X_i)$ when $d = 1$

The sample rank map: $\hat R_n(x) = \frac{i}{n}$, if $x \in (X_{(i)}, X_{(i+1)})$

Free to define $\hat R_n(X_{(i)})$ as anything in the interval $[(i-1)/n,\ i/n]$

Usual ranks when $d = 1$

The sample DF (rank): $F_n(x) = \frac{i}{n}$, if $x \in [X_{(i)}, X_{(i+1)})$

Convention: We can define $\hat R_n(X_i) = \max_{u \in \mathrm{Cl}(\hat Q_n^{-1}(X_i))} \|u\|$

Distribution-free multivariate ranks $\hat R_n(X_i)$

When $d = 1$, the ranks $\{F_n(X_i)\}_{i=1}^n$ are identically distributed (on $\{1/n, 2/n, \ldots, n/n\}$ with probability $1/n$ each)

We define $\hat R_n(X_i)$ as a random point drawn from the uniform distribution on the cell $\hat Q_n^{-1}(X_i)$, i.e.,

\[
\hat R_n(X_i) \mid X_1, \ldots, X_n \sim \mathrm{Uniform}(\hat Q_n^{-1}(X_i))
\]

Result: If $\mu = \mathrm{Uniform}(\mathcal{S})$, $\hat R_n(X_i) \sim \mu = \mathrm{Uniform}(\mathcal{S})$

Compare: $R(X_i) \sim \mu = \mathrm{Uniform}(\mathcal{S})$, where $R$ is the pop. rank map
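The randomized definition can be sketched in the case $d = 1$ with $\mu = \mathrm{Uniform}([0,1])$, where the cell $\hat Q_n^{-1}(X_{(i)})$ is the interval $((i-1)/n,\ i/n)$. The function name `randomized_ranks_1d` and the use of Python's `random.Random` are choices made for this illustration, not part of the talk.

```python
import random

def randomized_ranks_1d(xs, rng):
    """Randomized ranks for d = 1, mu = Uniform([0, 1]):
    the i-th smallest observation gets a uniform draw from its cell
    ((i-1)/n, i/n). Assumes no ties among the observations."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0.0] * n
    for pos, idx in enumerate(order):  # pos = 0 for the smallest observation
        ranks[idx] = rng.uniform(pos / n, (pos + 1) / n)
    return ranks

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(8)]
print(randomized_ranks_1d(sample, rng))
```

Because the cell index of $X_i$ is uniform on $\{1, \ldots, n\}$ for an i.i.d. continuous sample and the draw inside the cell is uniform, the resulting rank is $\mathrm{Uniform}([0,1])$ whatever the data distribution, which is the distribution-freeness stated above.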

Glivenko-Cantelli type result [Ghosal & S. (2018+)]

Let $X_1, X_2, \ldots \in \mathbb{R}^d$ be i.i.d. $\nu$, where $\nu$ is abs. cont. with support $\mathcal{X}$

Take $\mu = \mathrm{Uniform}(B_d(0,1))$

Let $Q$ and $R$ be the quantile and rank maps of $\nu$ (w.r.t. $\mu$)

Suppose $Q = \nabla\varphi$ where $\nabla\varphi : \mathrm{Int}(\mathcal{S}) \to \mathrm{Int}(\mathcal{X})$ is a homeomorphism

Let $\hat Q_n$ and $\hat R_n$ be any sample quantile and rank functions

Let $K_1 \subset \mathrm{Int}(\mathcal{S})$ be a compact set. Then, we have

\[
\sup_{u \in K_1} \|\hat Q_n(u) - Q(u)\| \xrightarrow{a.s.} 0
\quad \text{and} \quad
\sup_{x \in \mathbb{R}^d} \|\hat R_n(x) - R(x)\| \xrightarrow{a.s.} 0
\]

Generalizes the G-C result in Chernozhukov et al. (2017, AoS) which: (i) assumed that $\nu$ is compactly supported; (ii) showed uniform convergence of $\hat R_n$ only on compacts inside $\mathrm{Int}(\mathcal{X})$; (iii) showed in probability convergence.

Outline

1 Introduction to Optimal Transportation
    Monge's Problem
    Kantorovich Relaxation: Primal Problem
    A Geometric Approach

2 Quantile and Rank Functions in $\mathbb{R}^d$ ($d \ge 1$)

3 Some Applications in Statistics
    Two-sample Goodness-of-fit Testing
    Independence Testing

Two-sample Testing

Suppose that $X_1, \ldots, X_m$ are i.i.d. $\nu_X$ (abs. cont.) on $\mathbb{R}^d$

Suppose that $Y_1, \ldots, Y_n$ are i.i.d. $\nu_Y$ (abs. cont.) on $\mathbb{R}^d$

$\mu$: Distribution on $\mathcal{S} \subset \mathbb{R}^d$ (e.g., $\mu = \mathrm{Unif}([0,1]^d)$)

Goal: Test $H_0 : \nu_X = \nu_Y$ versus $H_1 : \nu_X \ne \nu_Y$

Quantile maps

$\hat Q_X$ and $\hat Q_Y$ are the sample quantile maps for the $X_i$'s and $Y_j$'s

Population quantile maps: $Q_X$ and $Q_Y$

Recall: $Q_X \# \mu = \nu_X$ and $Q_Y \# \mu = \nu_Y$

Joint rank map: $\hat R_{m,n}$ is the rank map (properly defined) of the combined sample $X_1, \ldots, X_m, Y_1, \ldots, Y_n$

Test statistic:

\[
T_{m,n} := \int_{\mathcal{S}} \bigl\| \hat R_{m,n}(\hat Q_X(u)) - \hat R_{m,n}(\hat Q_Y(u)) \bigr\|^2 \, d\mu(u)
\]

Motivation: One-sample Cramér-von Mises statistic

\[
\int \{F_n(x) - F(x)\}^2 \, dF(x) = \int_0^1 \{F_n(F^{-1}(u)) - u\}^2 \, du
\]
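As a rough illustration of the test statistic $T_{m,n}$, the sketch below evaluates it for $d = 1$ with $\mu = \mathrm{Unif}([0,1])$. Everything specific here is an assumption made for the example: the sample quantile maps are taken to be $X_{(\lceil mu \rceil)}$ and $Y_{(\lceil nu \rceil)}$, the joint rank map is taken to be the pooled empirical c.d.f., and the integral is approximated by a midpoint rule. The name `t_stat_1d` is hypothetical.

```python
import math
from bisect import bisect_right

def t_stat_1d(xs, ys, grid=2000):
    """Midpoint-rule approximation of T_{m,n} for d = 1, mu = Unif[0, 1].
    Joint rank map = pooled empirical c.d.f. (an illustrative convention)."""
    m, n = len(xs), len(ys)
    xs_s, ys_s = sorted(xs), sorted(ys)
    pooled = sorted(list(xs) + list(ys))

    def rank(v):
        # Fraction of pooled observations that are <= v.
        return bisect_right(pooled, v) / (m + n)

    total = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid
        qx = xs_s[math.ceil(m * u) - 1]  # sample quantile of the X's at u
        qy = ys_s[math.ceil(n * u) - 1]  # sample quantile of the Y's at u
        total += (rank(qx) - rank(qy)) ** 2
    return total / grid

print(t_stat_1d([0.1, 0.4, 0.9], [0.2, 0.5, 0.8]))  # small: samples interleave
print(t_stat_1d([0.0, 1.0], [10.0, 11.0]))          # larger: samples separated
```

The statistic depends on the data only through the relative order of the pooled sample, which is the reason it is distribution-free when $d = 1$.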

When $d = 1$, $T_{m,n}$ is distribution-free!

Question: Is $T_{m,n}$ (asymptotically) distribution-free when $d > 1$?

Critical value: Can always be computed by a permutation test

Theorem [Ghosal and S. (2018+)]

Suppose that $\frac{m}{m+n} \to \theta \in (0,1)$ as $m, n \to \infty$. Under $H_0 : \nu_X = \nu_Y$,

\[
T_{m,n} \xrightarrow{P} 0, \quad \text{as } m, n \to \infty.
\]

Further, for $\nu_X \ne \nu_Y$ (and mild regularity conditions on $\nu_X$ and $\nu_Y$),

\[
T_{m,n} \xrightarrow{P} c > 0, \quad \text{as } m, n \to \infty.
\]
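The permutation calibration of a two-sample statistic can be sketched generically as follows. The function `permutation_p_value` and the placeholder statistic `mean_gap` are hypothetical names introduced for this example; `mean_gap` is only a stand-in so that the block runs on its own, and in the setting of the talk one would pass a function computing $T_{m,n}$ instead.

```python
import random

def permutation_p_value(xs, ys, statistic, n_perm=999, seed=0):
    """Permutation p-value for H0: both samples share one distribution.
    Large values of `statistic` are treated as evidence against H0."""
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        if statistic(pooled[:m], pooled[m:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself.
    return (1 + exceed) / (1 + n_perm)

def mean_gap(xs, ys):
    """Placeholder statistic (NOT the statistic of the talk)."""
    return abs(sum(xs) / len(xs) - sum(ys) / len(ys))

p = permutation_p_value(list(range(10)), list(range(100, 110)), mean_gap)
print(p)  # small, since the two samples are far apart
```

Under $H_0$ the pooled observations are exchangeable, so rejecting when the p-value is at most $\alpha$ gives a level-$\alpha$ test for any choice of statistic, in any dimension.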

Independence Testing

$(X_1, Y_1), \ldots, (X_n, Y_n)$ are i.i.d. $\nu$ (abs. cont.) on $\mathbb{R}^{d_X} \times \mathbb{R}^{d_Y}$; $d_X + d_Y = d$

Goal: Test $H_0 : X \perp\!\!\!\perp Y$ versus $H_1 : X \not\perp\!\!\!\perp Y$

$\mu_X = \mathrm{Unif}([0,1]^{d_X})$, $\mu_Y = \mathrm{Unif}([0,1]^{d_Y})$

$\mu = \mu_X \times \mu_Y = \mathrm{Unif}([0,1]^d)$

$\hat R_n : \mathbb{R}^d \to \mathbb{R}^d$: rank map of the joint sample $(X_1, Y_1), \ldots, (X_n, Y_n)$

$\hat Q_n$: sample quantile map of the joint sample $(X_1, Y_1), \ldots, (X_n, Y_n)$

$\hat R_n^X : \mathbb{R}^{d_X} \to \mathbb{R}^{d_X}$: rank map of $X_1, \ldots, X_n$; similarly $\hat R_n^Y$ is the rank map of $Y_1, \ldots, Y_n$

Define $\tilde R_n := (\hat R_n^X, \hat R_n^Y) : \mathbb{R}^d \to [0,1]^d$

Test statistic:

\[
T_n := \int_{\mathcal{S}} \bigl\| \hat R_n(\hat Q_n(u)) - \tilde R_n(\hat Q_n(u)) \bigr\|^2 \, d\mu(u)
\]

Question: Is $T_n$ (asymptotically) distribution-free for $d > 1$?

Critical value: Can always be computed by the permutation principle

Theorem [Ghosal and S. (2018+)]

Under $H_0 : X \perp\!\!\!\perp Y$,

\[
T_n \xrightarrow{P} 0, \quad \text{as } n \to \infty.
\]

Further, if $X \not\perp\!\!\!\perp Y$ (and mild regularity conditions),

\[
T_n \xrightarrow{P} c > 0, \quad \text{as } n \to \infty.
\]

Future research

Construct finite sample distribution-free goodness-of-fit tests

Power study of these testing procedures

Comparison with other methods; e.g., RKHS methods, energy distance methods, mutual information, etc.

Estimation of the "center" of the data cloud

Sample "median" $\hat Q_n(0)$ when $\mu = \mathrm{Uniform}(B_d(0,1))$

We can show $\hat Q_n(0) \xrightarrow{P} Q(0)$. What about the rate of convergence? What is the limiting distribution?

What about other sample quantiles?

Thank you very much! Questions?

Kantorovich duality for general cost functions

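$\pi \in \Pi(\mu, \nu)$: all probability dist. on $\mathcal{X} \times \mathcal{Y}$ with marginals $\mu$ and $\nu$

Kantorovich duality

Let $I(\pi) := \int_{\mathcal{X} \times \mathcal{Y}} c(x, y)\, d\pi(x, y)$ where $c(\cdot, \cdot) \ge 0$ is l.s.c. Then,

\[
\inf_{\pi \in \Pi(\mu,\nu)} I(\pi) = \sup_{(\phi,\psi) \in \Phi_c} \left\{ \int_{\mathcal{X}} \phi(x)\, d\mu(x) + \int_{\mathcal{Y}} \psi(y)\, d\nu(y) \right\}
\]

where $\Phi_c$ is the set of all measurable functions $(\phi, \psi) \in L^1(\mu) \times L^1(\nu)$ satisfying

\[
\phi(x) + \psi(y) \le c(x, y) \quad \text{for } \mu\text{-a.e. } x \in \mathcal{X},\ \nu\text{-a.e. } y \in \mathcal{Y}.
\]

In fact, one can restrict to functions in $\Phi_c$ that are bounded and continuous.

Duality when $c(x, y) = \|x - y\|^2$ and $\mathcal{X} = \mathcal{Y} = \mathbb{R}^d$

$\mu, \nu$: probability measures on $\mathbb{R}^d$ with finite second moments

$\pi \in \Pi(\mu, \nu)$: all prob. measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$

Let $M_2 = \int_{\mathbb{R}^d} \|x\|^2 \, d\mu(x) + \int_{\mathbb{R}^d} \|y\|^2 \, d\nu(y) < +\infty$

Then,

\[
\inf_{\pi \in \Pi(\mu,\nu)} \int \|x - y\|^2 \, d\pi(x, y) = M_2 - 2 \sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y \rangle \, d\pi(x, y)
\]

Kantorovich duality

Over pairs $(\phi, \phi^*)$ of l.s.c. proper conjugate convex functions (so that $\langle x, y \rangle \le \phi(x) + \phi^*(y)$ for a.e. $x, y \in \mathbb{R}^d$), we have

\[
\sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y \rangle \, d\pi(x, y) = \inf_{(\phi, \phi^*)} \left\{ \int \phi(x)\, d\mu(x) + \int \phi^*(y)\, d\nu(y) \right\} \tag{2}
\]

Result: Suppose that $(\varphi, \varphi^*)$ solves (2). If $\mu$ is abs. cont., then (i) the unique optimal transference plan is $\pi = (\mathrm{id}, \nabla\varphi) \# \mu$; (ii) $\nabla\varphi$ is the unique solution to the Monge problem:

\[
\int \|x - \nabla\varphi(x)\|^2 \, d\mu(x) = \inf_{T : T \# \mu = \nu} \int \|x - T(x)\|^2 \, d\mu(x)
\]

Proof: Step 1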

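Let $J(\phi, \psi) := \int_{\mathbb{R}^d} \phi(x)\, d\mu(x) + \int_{\mathbb{R}^d} \psi(y)\, d\nu(y)$

Duality: $\inf_{\pi \in \Pi(\mu,\nu)} I(\pi) = \sup_{(\phi,\psi) \in \Phi} J(\phi, \psi)$, where $(\phi, \psi) \in \Phi$ iff for a.a. $x, y \in \mathbb{R}^d$, $\phi(x) + \psi(y) \le \|x - y\|^2$

Simple algebra (expand $\|x - y\|^2 = \|x\|^2 - 2\langle x, y \rangle + \|y\|^2$ and rearrange) yields:

\[
\langle x, y \rangle \le \tfrac{1}{2}\bigl[\|x\|^2 - \phi(x)\bigr] + \tfrac{1}{2}\bigl[\|y\|^2 - \psi(y)\bigr]
\]

Define: $\tilde\phi(x) = \tfrac{1}{2}\bigl[\|x\|^2 - \phi(x)\bigr]$, $\tilde\psi(y) = \tfrac{1}{2}\bigl[\|y\|^2 - \psi(y)\bigr]$

\[
\inf_{\pi \in \Pi(\mu,\nu)} \int \|x - y\|^2 \, d\pi(x, y) = M_2 - 2 \sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y \rangle \, d\pi(x, y)
\]

\[
\sup_{(\phi,\psi) \in \Phi} J(\phi, \psi) = M_2 - 2 \inf_{(\tilde\phi, \tilde\psi) \in \tilde\Phi} J(\tilde\phi, \tilde\psi)
\]

where $(\phi, \psi) \in \tilde\Phi$ iff for a.a. $x, y$, $\langle x, y \rangle \le \phi(x) + \psi(y)$

Thus, $\sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y \rangle \, d\pi(x, y) = \inf_{(\tilde\phi, \tilde\psi) \in \tilde\Phi} J(\tilde\phi, \tilde\psi)$

Proof: Step 2

Double convexification trick to improve the admissible pairs in the dual problem

Semi-discrete OT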

Data: $X_1, \ldots, X_n$ in $\mathbb{R}^d$; $\nu = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$

$\mu$: an abs. cont. distribution on a compact convex set $\mathcal{S} \subset \mathbb{R}^d$

The dual Kantorovich problem in this setting can be written as:

\[
\inf_{\psi \text{ convex}} \int \psi^*(x)\, d\mu(x) + \int \psi(y)\, d\nu(y)
\]

Letting $\psi_i = \psi(X_i)$, the above minimization problem reduces to

\[
(M) = \inf_{\psi_1, \ldots, \psi_n} \left[ \int_{\mathcal{S}} \sup_{i} \{\langle X_i, x \rangle - \psi_i\}\, d\mu(x) + \frac{1}{n} \sum_{i=1}^n \psi_i \right]
\]
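The finite-dimensional form of the semi-discrete dual can be checked numerically in a toy case. The sketch below assumes $d = 1$, $\mu = \mathrm{Unif}([0,1])$ and the three data points $1, 2, 3$; the name `dual_objective_1d` and the midpoint quadrature are choices made for this illustration. The candidate minimizer $\psi = (0, 1/3, 1)$ is obtained by hand, by requiring the affine pieces $X_i x - \psi_i$ to cross at $x = 1/3$ and $x = 2/3$, so that each piece is active on a set of $\mu$-mass $1/n$.

```python
def dual_objective_1d(xs, psi, grid=6000):
    """Semi-discrete dual objective for d = 1 and mu = Unif[0, 1]:
    integral over [0, 1] of max_i {x_i*u - psi_i}, plus the mean of psi.
    The integral is approximated with a midpoint rule."""
    n = len(xs)
    integral = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid
        integral += max(x * u - p for x, p in zip(xs, psi))
    return integral / grid + sum(psi) / n

xs = [1.0, 2.0, 3.0]
psi_opt = [0.0, 1.0 / 3.0, 1.0]   # pieces cross at 1/3 and 2/3
psi_bad = [0.0, 1.0 / 3.0 + 0.5, 1.0]

print(dual_objective_1d(xs, psi_opt))  # lower value
print(dual_objective_1d(xs, psi_bad))  # higher value
```

The objective is convex in $(\psi_1, \ldots, \psi_n)$, and its partial derivative in $\psi_i$ is $1/n$ minus the $\mu$-mass of the region where piece $i$ is active, which is why equal-mass cells characterize the minimizer.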

Some facts about convex functions

Given a convex function $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ we define the subdifferential of $f$ at $x \in \mathrm{Dom}(f)$ by

\[
\partial f(x) := \{ \xi \in \mathbb{R}^d : f(x) + \langle y - x, \xi \rangle \le f(y), \text{ for all } y \in \mathbb{R}^d \}
\]

Any element in $\partial f(x)$ is called a subgradient of $f$ at $x$

If $f$ is differentiable at $x$ then $\partial f(x) = \{\nabla f(x)\}$

A convex function (in $\mathbb{R}^d$) is a.e. differentiable

Legendre-Fenchel dual

Let $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$. The convex conjugate $\phi^* : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ of $\phi$ is defined as

\[
\phi^*(x) := \sup_{y \in \mathbb{R}^d} \{ \langle x, y \rangle - \phi(y) \}
\]
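The conjugate can be approximated numerically by restricting the supremum to a grid. The sketch below does this for the illustrative choice $\phi(y) = y^2/2$ in $d = 1$, whose conjugate is known to be $\phi^*(x) = x^2/2$; the helper `conjugate` and the grid on $[-5, 5]$ are assumptions of the example. Applying the same operation twice returns the original function at the test point, as expected for a convex l.s.c. function.

```python
def conjugate(f, grid):
    """Return x -> max over y in grid of {x*y - f(y)} (d = 1).
    This is a grid approximation of the Legendre-Fenchel conjugate."""
    def f_star(x):
        return max(x * y - f(y) for y in grid)
    return f_star

grid = [i / 100 for i in range(-500, 501)]  # y in [-5, 5], step 0.01

def f(y):
    return 0.5 * y * y

f_star = conjugate(f, grid)
print(f_star(1.5))  # close to 1.5**2 / 2

# Biconjugate: tabulate f* on the grid, then conjugate the table.
table = {y: f_star(y) for y in grid}
f_star_star = conjugate(lambda y: table[y], grid)
print(f_star_star(1.0))  # close to f(1.0) = 0.5
```

The grid restriction is harmless here only because the maximizer $y = x$ lies inside $[-5, 5]$ for the points evaluated; for other functions or larger $|x|$ the grid would have to be widened.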

Lemma (Characterization of subdifferential)

Let $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ be a proper (i.e., $f(x) < \infty$ for some $x \in \mathbb{R}^d$) l.s.c. convex function. Then for all $x, y \in \mathbb{R}^d$,

\[
\langle x, y \rangle = f(x) + f^*(y) \iff y \in \partial f(x) \iff x \in \partial f^*(y).
\]

Lemma (Legendre duality)

Let $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ be a proper function. Then the following three properties are equivalent: (i) $f$ is a convex l.s.c. function; (ii) $f = \psi^*$ for some proper function $\psi$; (iii) $f^{**} = f$.

When is (K) = (M)? (with $c(x, y) = \|x - y\|^2$)

Discrete case

Suppose that $\mu$ and $\nu$ are supported on $\{x_i\}_{i=1}^M$ and $\{y_j\}_{j=1}^N$

Kantorovich's problem (K):

\[
\min_{\{p_{ij} \ge 0\}} \left\{ \sum_{i=1}^M \sum_{j=1}^N p_{ij}\, c(x_i, y_j) : \sum_{i=1}^M p_{ij} = \nu(y_j);\ \sum_{j=1}^N p_{ij} = \mu(x_i) \right\}
\]

Monge's problem (M):

\[
\min_{T} \left\{ \sum_{i=1}^M \|x_i - T(x_i)\|^2 \, \mu(x_i) : T \# \mu = \nu \right\}
\]

In general, (K) $\ne$ (M)

If $M = N$ and $\mu(x_i) = \nu(y_j) = \frac{1}{N}$ for all $i, j$, then the optimal transference plan in the (K) problem coincides with the solution of the (M) problem

Absolutely continuous case

$\mu$ and $\nu$ are absolutely continuous

Then there is a unique solution to the (K) problem which turns out to be also the solution of the (M) problem

Semi-discrete case

$\nu$ supported on a finite set $\{y_j\}_{j=1}^N$; $\mu$ abs. cont. with support $\mathcal{S} \subset \mathbb{R}^d$

Monge's problem: Find $T$ s.t. $T \# \mu = \sum_{j=1}^N \nu_j \delta_{y_j}$ and which minimizes

\[
\int \|x - T(x)\|^2 \, d\mu(x) = \sum_{j=1}^N \int_{\{x : T(x) = y_j\}} \|x - y_j\|^2 \, d\mu(x)
\]

Note that: $\mu(T^{-1}(y_j)) = \mu(\{x : T(x) = y_j\}) = \nu_j$
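For the discrete case with $M = N$ and equal weights $1/N$, a Monge map is a one-to-one matching of the $x_i$ to the $y_j$, so for tiny $N$ it can be found by brute force over all permutations. The sketch below does exactly that; the function names `sq_dist` and `monge_uniform` and the example points are hypothetical, and the enumeration is only meant to illustrate the problem, not to be an efficient solver.

```python
from itertools import permutations

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def monge_uniform(xs, ys):
    """Brute-force Monge problem for M = N with weights 1/N:
    minimize (1/N) * sum_i ||x_i - y_{p(i)}||^2 over permutations p."""
    n = len(xs)

    def total(p):
        return sum(sq_dist(xs[i], ys[p[i]]) for i in range(n))

    best = min(permutations(range(n)), key=total)
    return best, total(best) / n

xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ys = [(0.0, 1.1), (0.1, 0.0), (1.0, 0.2)]
matching, cost = monge_uniform(xs, ys)
print(matching, cost)  # matching[i] is the index j with T(x_i) = y_j
```

The enumeration visits $N!$ matchings, so it is usable only for very small $N$; it is a direct way to see the Monge problem as a search over maps $T$ with $T \# \mu = \nu$.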