NEW TECHNIQUES IN OPTIMAL TRANSPORT
A Thesis Submitted to the Faculty in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics
by James Gordon Ronan
DARTMOUTH COLLEGE
Hanover, New Hampshire
May 2021
Examining Committee:
Anne Gelb, Chair
Matthew Parno
Peter Doyle
Douglas Cochran
F. Jon Kull, Ph.D. Dean of the Guarini School of Graduate and Advanced Studies
Abstract
This thesis develops a new technique for applications of optimal transport and presents a new perspective on optimal transport through the measure theoretic tool of transition kernels. Optimal transport provides a way of lifting a distance metric on a space to probability measures over that space. This makes the field well suited for certain types of image analysis. Part of this thesis focuses on a new application for optimal transport, while the other focuses on a new approach to optimal transport itself. With respect to the first part of this thesis, we propose using semi-discrete optimal transport for the estimation of parameters in physical scenes and show how to do so. Optimal transport is a natural setting when studying images because displacements of the objects in the image directly correspond to a change in the optimal transport cost. In the second part of this thesis we discuss transition kernels, which provide a mathematical tool that can be used to map measures to measures. It therefore seems intuitive to incorporate transition kernels into optimal transport problems. However, this requires changing the traditional perspective of viewing optimal transport as primarily a tool to measure distances between two fixed measures. To that end, this thesis develops theory to show how kernels may be used to extend optimal transport to signed measures.
Preface
I am fortunate and grateful to be where I am today. My parents have supported and encouraged me along the way, and pushed me to take all of the opportunities that I have been given. My sincere gratitude goes to my advisor Anne Gelb for accepting me as her odd duck of a student. She has helped me to grow into a better mathematician and think about how mathematics should serve the world. She and my secondary advisor, Matthew Parno, have been resiliently positive and optimistic despite the challenges of the past year. I am excited to continue to work with and to learn from them in the future. Thank you to the remainder of my committee, Douglas Cochran and Peter Doyle, for meeting with me and helping to improve this thesis.
Contents
Abstract
Preface
1 Overview
2 Preliminaries
  2.1 Measure Theory Background
  2.2 OT Background
    2.2.1 Historical development of optimal transport
    2.2.2 The Kantorovich Problem
    2.2.3 Duality Theory
  2.3 Wasserstein Distances
    2.3.1 Examples of Wasserstein distances
    2.3.2 Geodesics in Wasserstein Space
  2.4 1-D OT
    2.4.1 One Dimensional Transport
    2.4.2 Structure of Monotone Coupling
    2.4.3 c-Cyclic Monotonicity in One Dimension
    2.4.4 Optimal Transport Maps
3 SDOT
  3.1 SDOT
    3.1.1 Laguerre Cells
    3.1.2 Theoretical Results on SDOT
  3.2 Algorithmic SDOT
    3.2.1 Algorithmic Modifications
    3.2.2 Other Regularized Optimal Transport
    3.2.3 Quantization
4 Parameter Estimation
  4.1 Simple Examples
    4.1.1 Centers of Mass
    4.1.2 Angle of Rotation
    4.1.3 Rotating and Translating Object
  4.2 Misfits in Time
    4.2.1 Velocity Estimation
    4.2.2 Colliding Balls
  4.3 Cantilever Beam
    4.3.1 Model Set-up
    4.3.2 Results
    4.3.3 Adjoint Gradient
  4.4 Further Questions
    4.4.1 Representation of Objects
    4.4.2 Time Sensitivity
    4.4.3 Blurring and Noise
    4.4.4 Analysis of Solution Operators
5 OT Kernels
  5.1 Kernels Background
  5.2 Kernels for OT
    5.2.1 Optimal transport kernels
    5.2.2 Geodesics and Kernels
  5.3 Signed OT with Kernels
    5.3.1 One Dimensional Signed Optimal Transport
  5.4 Future Questions
6 Conclusion
References
Chapter 1
Overview
Optimal transport is an area of mathematics that combines analysis, probability, and geometry while offering applications in diverse areas. Optimal transport studies various kinds of transport costs that describe how to rearrange one measure into another. The initial description of optimal transport codified these rearrangements as transport maps $T$, where the mass of the measure at $x$ is sent to $T(x)$. Just as there are many ways to go from point A to point B, there are many different possible transport maps which send one measure into another. Optimal transport is a way of assigning a cost to these rearrangements and understanding the transport map with the lowest total cost. This can be useful when looking at physical systems because it acts like Occam's razor: the optimal transport map is the one that requires the least effort to get things done. While an optimal map might not perfectly reflect what happened, in some ways it is the simplest. It is important to note that a transport map sends one measure to another on the global scale, but it does so by coordinating the local paths at each point in the support of the measure. The optimal transport cost represents perfect cooperation, where every point acts in accordance with every other point to minimize the cost. Hence we see that optimal transport exists in a perfect world while still allowing us to understand our imperfect one.
The roots of optimal transport date back to Monge in 1781; the idea was originally motivated by Monge's interest in quantifying the amount of work it would take to excavate a hole of a particular shape and construct a pile in another shape [38]. While we have moved on from focusing on digging ditches, applications are an integral part of optimal transport to this day. In 1942 Kantorovich offered a new perspective, re-framing the problem and showing the field's applicability to economic problems [23]. The trends of re-interpretation and application run through the history of optimal transport. A notable re-formulation was to view the problem in a continuous time setting in [8], which opened up the geometric aspects of the field that were explored further in [35, 40, 33]. The different formulations of the problem lent themselves to differing applications and improved implementations that reinforce each other: more applications become feasible as the implementations improve, while the increase in potential applications drives the desire for better implementations. A few notable implementations include an elliptic partial differential equation (PDE) formulation of the problem [9]; entropic regularization, which adds a regularization term to the objective function of the problem [14, 51]; and semi-discrete optimal transport (SDOT), which exploits connections to computational geometry that arise when the class of measures under consideration is restricted [24, 26, 28, 37]. Applications arise in diverse fields including economics [36], fluid dynamics [21], and seismic imaging [16, 17, 18].
This thesis presents a new way to apply optimal transport for parameter estimation. We show how to apply SDOT to form misfit functions for parameter estimation of physical systems. We show how quantization can be used to create a discrete representation of the objects in the system, and how SDOT can then be used to compare our model to the observation. This approach can be used for a variety of applications, and we demonstrate this through examples.
Additionally, this thesis demonstrates that there is a latent measure theoretic kernel structure in optimal transport, which allows us to restore some of the functionality that has been lost in reformulations of the problem. The original formulation of optimal transport suffered from many disadvantages, but when it was reframed in terms of couplings we lost the ability to send multiple measures through the same map. Kernels allow us to understand couplings in a new way to restore some of the lost functionality. We discuss how they may be exploited as a new approach to extending optimal transport to signed measures.
The remainder of this thesis proceeds in four parts:
(a) Chapter 2 reviews the conceptual and mathematical foundations of optimal transport. This provides a somewhat broad introduction to the subject and establishes the notational conventions used throughout the thesis.
(b) Chapter 3 presents a focused review of semi-discrete optimal transport. Both the theoretical background of the area as well as topics related to the implementation are discussed.
(c) Chapter 4 demonstrates how to construct an optimal transport misfit function and provides examples. We show how this formulation can be used to estimate parameters of physical systems.
(d) Chapter 5 introduces the kernel-based framework and presents a new approach to signed optimal transport. We conclude this chapter by focusing on the one-dimensional case.
Small results are presented throughout the thesis, with the major contributions taking place in Chapters 4 and 5.
Chapter 2
Preliminaries
The purpose of this chapter is to serve as an easy reference for the necessary background on optimal transport. This includes first defining and introducing our notation for measure theoretic concepts before providing an introduction to optimal transport. A few examples are included in this chapter to illustrate key ideas.
Section 2.1 Definitions and Conventions in Measure Theory
Measure theory provides an essential building block of optimal transport. This section therefore summarizes measure theoretic concepts used in the thesis. A more thorough background may be found in standard analysis texts such as [20, 48] and probability focused texts such as [13, 25]. A curated review of the measure theory necessary for optimal transport is provided in [5, Ch. 5].
At the heart of measure theory is the intertwined triple $(X, \mathcal{M}, \mu)$, where $X$ is a set, $\mathcal{M}$ is a $\sigma$-algebra of subsets of $X$, and $\mu$ is a measure. Elements of $\mathcal{M}$ are called measurable sets and the pair $(X, \mathcal{M})$ is called a measurable space. When the $\sigma$-algebra is understood from context, $X$ itself may be called a measurable space.
Both signed measures $\mu : \mathcal{M} \to \mathbb{R} \cup \{\infty\}$ and positive measures (more properly called non-negative measures) $\mu : \mathcal{M} \to [0, \infty]$ are used in this thesis. A measure may be restricted to a subspace and sub-algebra. If $\mu$ is a measure on $X$, then we denote the restriction of $\mu$ to $Y \subset X$ as $\mu \llcorner Y$.
A metric space $X$ with metric $\mathrm{dist}(x, y)$ is complete when all Cauchy sequences converge and is separable when it contains a dense countable subset. If $X$ is a topological space, the smallest $\sigma$-algebra containing all open subsets of $X$ is called the Borel $\sigma$-algebra. Measures on Borel $\sigma$-algebras are called Borel measures.
Definition 2.1 ([56, Pg. XX]). A Polish space is a complete, separable metric space with the metric topology generated by open balls of the distance metric and the corresponding Borel $\sigma$-algebra.
Following [56, Pg. XX], the convention of this thesis is that all measures which we consider will be Borel measures on Polish spaces. For a Polish space, the triple $(X, \mathcal{B}, \mu)$ can also be thought of as $(X, \mathrm{dist}, \mu)$ because the $\sigma$-algebra is generated from the distance $\mathrm{dist}$.
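Definition 2.2 ([5, Pg. 105]). The support $\operatorname{supp}(\mu)$ of a positive measure $\mu$ is the closed set defined by
\[ \operatorname{supp}(\mu) := \{ x \in X : \mu(U) > 0 \text{ for each neighborhood } U \text{ of } x \}. \]
If $\operatorname{supp}(\mu) \subset A$, then we say that $\mu$ is concentrated on $A$.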
We say a property holds $\mu$-almost everywhere, or $\mu$-a.e. or simply a.e., if the set of points $N$ where the property does not hold has measure zero, i.e. $\mu(N) = 0$. This notion is commonly used to express uniqueness of functions defined through integral conditions: if $\int_E f \, d\mu = \int_E g \, d\mu$ for all sets $E \in \mathcal{M}$, then $f = g$ a.e.
Definition 2.3 ([20, Pg. 87]). Positive measures $\mu$, $\nu$ are said to be mutually singular if there are disjoint sets $A$ and $B$ with $A \cup B = X$ and $\mu(A) = 0$ and $\nu(B) = 0$. This is written $\mu \perp \nu$.
Theorem 2.4 (The Jordan Decomposition Theorem [20, Pg. 87]). If $\nu$ is a signed measure, there exist unique positive measures $\nu^+$ and $\nu^-$ such that $\nu = \nu^+ - \nu^-$ and $\nu^+ \perp \nu^-$.
Definition 2.5 ([20, Pg. 88]). We say a signed measure $\nu$ is absolutely continuous with respect to a positive measure $\mu$ if $\nu(E) = 0$ whenever $\mu(E) = 0$. We denote this as $\nu \ll \mu$.
We will say that a positive measure $\nu$ is dominated by a positive measure $\mu$ if $\nu(E) \le \mu(E)$ for all $E \in \mathcal{M}$.
The integral of a measure $\mu$ on a space $X$ is $\mu(X)$. The mass of a measure $\mu$ is $\mu^+(X) + \mu^-(X)$. For positive measures the mass and integral are the same. For a signed measure $\nu = \nu^+ - \nu^-$, the total variation measure is defined as $|\nu| = \nu^+ + \nu^-$, so the mass of a measure is the integral of the total variation of the measure.
The space of signed Borel measures with finite mass on $X$ is denoted $\mathcal{M}(X)$, and the space of positive measures is denoted by $\mathcal{M}_+(X)$. A positive measure with mass equal to 1 is a probability measure and the space of Borel probability measures is $\mathcal{P}(X)$.
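For a finitely supported signed measure, the Jordan decomposition, integral, and mass can be computed directly. The following is a minimal Python sketch, assuming a measure is stored as a dictionary mapping atoms to signed weights; the helper names `jordan_decomposition`, `integral`, and `mass` and the example measure are illustrative choices and are not part of the thesis.

```python
def jordan_decomposition(nu):
    """Split a finitely supported signed measure into positive and negative parts.

    `nu` maps each atom to a signed weight. The two returned measures are
    positive and have disjoint supports, so they are mutually singular.
    """
    nu_plus = {x: w for x, w in nu.items() if w > 0}
    nu_minus = {x: -w for x, w in nu.items() if w < 0}
    return nu_plus, nu_minus


def integral(nu):
    """Integral of the measure, nu(X): the sum of all signed weights."""
    return sum(nu.values())


def mass(nu):
    """Mass of the measure, nu_plus(X) + nu_minus(X) = |nu|(X)."""
    nu_plus, nu_minus = jordan_decomposition(nu)
    return sum(nu_plus.values()) + sum(nu_minus.values())


# Hypothetical example: three atoms on the real line.
nu = {0.0: 0.5, 1.0: -0.25, 2.0: 0.75}
nu_plus, nu_minus = jordan_decomposition(nu)

print(integral(nu))  # 1.0
print(mass(nu))      # 1.5
```

In this example the integral equals 1 even though the measure is not a probability measure, which illustrates why the mass and the integral only coincide for positive measures.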
Definition 2.6 ([20, Pg. 43]). For two Polish spaces $(X, \mathcal{B}_X)$ and $(Y, \mathcal{B}_Y)$, a function $T : X \to Y$ is a measurable map if $T^{-1}(E) \in \mathcal{B}_X$ for all $E \in \mathcal{B}_Y$.
The class of measurable functions is broad but not all-encompassing. However, limiting our discussion to measurable maps should not be seen as a constraint. A function $f : X \to \mathbb{R}$ is a measurable function if it is a measurable map between $X$ and $\mathbb{R}$ with the Borel $\sigma$-algebra on $\mathbb{R}$.
Definition 2.7 ([50, Pg. 3]). On a metric space $X$, a function $f : X \to \mathbb{R} \cup \{\infty\}$ is lower semicontinuous if for every sequence $x_n \to x$, $f(x) \le \liminf f(x_n)$.
Definition 2.8 ([20, Pg. 314]). Let $(X, \mathcal{B}_X, \mu)$ be a measure space, $(Y, \mathcal{B}_Y)$ be a measurable space, and $T : X \to Y$ be a measurable map. Then $T$ induces a push-forward measure, or image measure, $T_\#\mu$ on $Y$ via
\[ T_\#\mu(E) := \mu(T^{-1}(E)) \quad \text{for all } E \in \mathcal{B}_Y. \]
Integration of a function $f$ on $Y$ with respect to a push-forward measure $T_\#\mu$ is given by
\[ \int_Y f(y) \, dT_\#\mu(y) = \int_X f(T(x)) \, d\mu(x). \]
Push-forward measures play a large role in historical and modern optimal transport theory, and an important class of push-forward measures comes from projection maps. To this end, let $X$ and $Y$ be two Polish spaces (this condition is unnecessary, but present because our default space is a Polish space); then $X \times Y$ is the product space, which is also a Polish space with the product $\sigma$-algebra.¹ In this thesis we will use $\mathrm{proj}_x$ and $\mathrm{proj}_y$ to represent the maps $X \times Y \to X$ and $X \times Y \to Y$ given by $\mathrm{proj}_x((x, y)) = x$ and $\mathrm{proj}_y((x, y)) = y$.
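For finitely supported measures, the push-forward and its integration formula reduce to finite sums and can be checked by hand. The sketch below is an illustration only, again assuming measures are stored as dictionaries from atoms to weights; the function names `push_forward` and `integrate`, the map `T`, the test function `f`, and the product measure `gamma` are all hypothetical choices made for this example.

```python
from collections import defaultdict


def push_forward(mu, T):
    """Push a finitely supported measure `mu` forward through the map `T`.

    Each point y receives the total weight of its preimage under T,
    which is the discrete version of mu(T^{-1}(E)).
    """
    result = defaultdict(float)
    for x, w in mu.items():
        result[T(x)] += w
    return dict(result)


def integrate(f, mu):
    """Integrate `f` against a finitely supported measure `mu`."""
    return sum(f(x) * w for x, w in mu.items())


def T(x):
    """Hypothetical transport map; it merges the atoms at -1 and 1."""
    return x * x


def f(y):
    """Hypothetical test function on the target space."""
    return 3.0 * y + 1.0


mu = {-1.0: 0.25, 0.0: 0.5, 1.0: 0.25}
T_mu = push_forward(mu, T)                 # {1.0: 0.5, 0.0: 0.5}

lhs = integrate(f, T_mu)                   # integral of f against T#mu
rhs = integrate(lambda x: f(T(x)), mu)     # integral of f(T(x)) against mu
print(lhs, rhs)                            # 2.5 2.5

# Pushing a measure on a product space through the projections
# yields its marginals on each factor.
gamma = {(0, "a"): 0.25, (0, "b"): 0.25, (1, "a"): 0.5}
marginal_x = push_forward(gamma, lambda pair: pair[0])   # {0: 0.5, 1: 0.5}
marginal_y = push_forward(gamma, lambda pair: pair[1])   # {"a": 0.75, "b": 0.25}
```

The weights are chosen to be exactly representable in floating point, so the two sides of the integration formula agree exactly here; with general weights a tolerance-based comparison would be more appropriate.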
Definition 2.9 ([20, Pg. 53]). A measurable function $f$ is said to be integrable with respect to $\mu$ if $\int_X |f| \, d\mu < \infty$, and we write $f \in L^1(\mu)$.
Complex measures do not take on the value $\infty$ on any set. In this thesis we will primarily be concerned with probability measures, which also do not take on the value $\infty$. Observe in Theorem 2.10 that one of the measures is complex. This is done to avoid the complications caused by taking the value $\infty$.
¹ A complete metric can be put on this space, and a countable product of separable spaces will still be separable.
Theorem 2.10 (The Theorem of Lebesgue-Radon-Nikodym [48, Pg. 121]). Recall that a positive measure $\mu$ on $X$ is said to be $\sigma$-finite if there is a countable collection of sets $E_i$ such that $\mu(E_i) < \infty$ and $X = \bigcup_i E_i$. Let $\mu$ be a positive $\sigma$-finite measure on a $\sigma$-algebra $\mathcal{M}$, and let $\lambda$ be a complex measure on $\mathcal{M}$.
(a) There is a unique pair of complex measures $\lambda_a$ and $\lambda_s$ on $\mathcal{M}$ such that