Algorithms for Optimal Transport and Wasserstein Distances

Dissertation for the award of the degree “Doctor rerum naturalium” (Dr. rer. nat.) of the Georg-August-Universität Göttingen within the doctoral program “Mathematical Sciences” of the Georg-August University School of Science (GAUSS)

submitted by Jörn Schrieber from Hamburg

Göttingen, 2019

Thesis Committee
Prof. Dr. Dominic Schuhmacher, Institute for Mathematical Stochastics
Prof. Dr. Anita Schöbel, Institute for Numerical and Applied Mathematics

Members of the Examination Board
Reviewer: Prof. Dr. Dominic Schuhmacher, Institute for Mathematical Stochastics
Second Reviewer: Prof. Dr. Anita Schöbel, Institute for Numerical and Applied Mathematics

Further Members of the Examination Board
Prof. Dr. Axel Munk, Institute for Mathematical Stochastics
Prof. Dr. Stephan Huckemann, Institute for Mathematical Stochastics
Prof. Dr. Gerlind Plonka-Hoch, Institute for Numerical and Applied Mathematics
Prof. Dr. Russell Luke, Institute for Numerical and Applied Mathematics

Date of the oral examination: 14 February 2019

Acknowledgements

I would like to express my gratitude towards everyone who supported me during my work on this thesis. First and foremost, thanks to my supervisors Dominic Schuhmacher and Anita Schöbel, who took the time for discussions on a regular basis and provided me with their unique perspectives, insights on the matter and invaluable advice. This dedication is not taken for granted and greatly appreciated!

I wish to thank everyone else I had the pleasure to collaborate with on research projects: Carsten Gottschlich, Max Sommerfeld, Axel Munk and Yoav Zemel from the Institute for Mathematical Stochastics. Further thanks to Carsten Gottschlich and Christian Böhm for their aid in resolving countless technical difficulties. I would also like to thank my colleagues from the Institute for Mathematical Stochastics, who all contribute to the friendly and cooperative atmosphere at the institute, and the workgroup for optimization at the Institute for Numerical and Applied Mathematics, who despite my spatial distance always welcomed me to seminars and workshops. Furthermore, thanks to Khwam Tabougua Trevor for helping me with the creation of the plots in Chapter 5 during his time as a student assistant.

I gratefully acknowledge funding by the German Research Foundation (DFG) as part of the Research Training Group 2088 “Discovering Structure in Complex Data: Statistics Meets Optimization and Inverse Problems” during the time from October 2015 to September 2018.

Preface

Optimal Transport and Wasserstein Distance are closely related terms that not only have a long history in the mathematical literature, but have also seen a resurgence in recent years, particularly in the context of the many applications they are used in, which span a variety of scientific fields including - but not limited to - imaging, statistics and machine learning. Due to drastic increases in data volume and a high demand for Wasserstein distance computation, the development of more efficient algorithms in the domain of optimal transport has grown in priority and picked up pace quickly.

This thesis is dedicated to algorithms for solving the optimal transport problem and computing Wasserstein distances. After an introduction to the field of optimal transport, there will be an overview of the application areas as well as a summary of the most important methods for computation with a focus on the discrete optimal transport problem. This is followed by a presentation of a benchmark for discrete optimal transport together with a performance test of a selection of algorithms on this data set. Afterwards, two new approaches are introduced: a probabilistic approximation method for Wasserstein distances using subsampling and a clustering method, which aims to generalize multiscale methods to discrete optimal transport problems, including instances with a non-metric cost function.

Collaborations and Contributions

While this thesis is largely based on the author’s original research, it would not have been possible without the work of other people. In particular, parts of this thesis have been previously published as papers and are the results of collaborations.

Chapters 1 and 2 serve as a general introduction to optimal transport and an overview of popular algorithms, respectively, and are the author’s compilation of work done by other researchers.

Chapter 3 is a collaboration between Dominic Schuhmacher, Carsten Gottschlich and the author of this thesis and was published as the paper DOTmark - A Benchmark for Discrete Optimal Transport [78] in IEEE Access. Conceptualization and creation of the benchmark itself was largely done by Dominic Schuhmacher, while the author was responsible for the performance test, including the implementation of the shielding method, execution and analysis of the results.

The content of Chapter 4 is available as the preprint Optimal Transport: Fast Probabilistic Approximation with Exact Solvers [90], which is joint work with Max Sommerfeld, Yoav Zemel and Axel Munk. While the author and Max Sommerfeld contributed in equal parts to the simulations and analysis in Section 4.4, Max Sommerfeld and Axel Munk conceptualized the subsampling scheme and proved the theoretical results, with several improvements contributed by Yoav Zemel. The proofs of the results in Section 4.3 are omitted in this thesis; they can be found in the appendix of the paper [90]. Due to this collaboration the content of this chapter is also contained in Max Sommerfeld’s doctoral thesis [88].

Chapter 5 is based on the author’s own research.

Contents

1 Introduction
  1.1 Mathematical Setup
  1.2 Applications
  1.3 Motivation and Organization of this Work

2 Algorithms for Optimal Transport
  2.1 Linear Programming
  2.2 AHA Method
  2.3 Entropically Regularized Optimal Transport
  2.4 Multiscale Methods

3 DOTmark: A Benchmark for Discrete Optimal Transport
  3.1 Brief Theoretical Background
  3.2 Benchmark
  3.3 Tested Methods
  3.4 Computational Results
    3.4.1 Runtimes
    3.4.2 Errors of the AHA Method
    3.4.3 Iterations of the Shielding Method
  3.5 Discussion
  3.6 Outlook

4 Probabilistic Approximation with Exact Solvers
  4.1 Introduction
    4.1.1 Computation
    4.1.2 Contribution
  4.2 Problem and Algorithm
  4.3 Theoretical Results
    4.3.1 Expected Absolute Error
    4.3.2 Concentration Bounds
  4.4 Simulations
    4.4.1 Simulation Setup
    4.4.2 Computational Results
  4.5 Discussion

5 Cost-Based Clustering: A General Multiscale Approach to Discrete Optimal Transport
  5.1 Introduction
  5.2 The Cost-Based Clustering Problem
    5.2.1 Deviation Bounds of Clusterings
    5.2.2 Comparison to Other Clustering Objectives and Deviation Bounds
  5.3 Clustering Algorithms
    5.3.1 Agglomerative Clustering
    5.3.2 Clustering with Random Representatives
    5.3.3 Propagation
  5.4 Simulations
    5.4.1 Simulation Setup
    5.4.2 Computational Results
  5.5 A General View on Cost-Based Clustering
  5.6 Summary and Discussion
  5.7 NP-Hardness Proof
    5.7.1 Geometric Clustering Simplification
    5.7.2 Geometric Reduction Concept
    5.7.3 Geometric Reduction Proofs
    5.7.4 Auxiliary Lemmas

Chapter 1

Introduction

Between the different mathematical fields, the subject of optimal transport is uniquely positioned. The history of its theory goes back to the 18th century, when the French mathematician Gaspard Monge published his work [59] on a novel problem formulation at the boundary between analysis and probability theory, which would later become known as the Monge formulation of optimal transport. Two centuries later, Kantorovich revisited this subject with his more general statement of the problem [46], which laid the foundation for optimal transport to be as widely applied as it is today; the Kantorovich formulation is the one primarily considered in modern applications. In the discrete case, both formulations have deep connections to well-studied optimization problems. For example, the assignment problem is a special case of the Monge formulation, and the Kantorovich problem can be written as a minimum cost flow problem on a complete bipartite network.

Conceptually, this problem revolves around the idea of efficient rearrangement and operates on probability measures. What is rearranged in a particular application can vastly differ - from physical goods, people or particles to more abstract concepts, like greyscale values of pixels in images or the classification result of a neural network - as long as it can be cast in the form of a measure. It comes as no surprise that potential applications are plenty and consequently optimal transport research has thrived in recent years. The rich theoretical background in this field is more and more supported by the development and analysis of modern methods to tackle optimal transport problems.

However, in many subject areas, the application of optimal transport or Wasserstein distances is still held back by the large computational burden. Despite the ability to cast optimal transport into the form of a very simple linear program, the input sizes often vastly exceed the boundaries of viability of even the latest linear programming methods with high runtime and memory consumption. This was somewhat mitigated by advances on regularized optimal transport and the introduction of the Sinkhorn scaling algorithm to optimal transport [20], which gave it another boost to viability in applications, as it is a simple and fast, albeit not necessarily precise, algorithm.

1.1 Mathematical Setup

The theory of optimal transport has been explored thoroughly since its inception, with connections drawn to analysis and probability theory. There is a great and extensive body of literature for anyone who wants to engage deeply with the theory, in particular the book written by Fields Medal holder Cédric Villani [96]. More recently, two books have been published with a higher emphasis on applied mathematics - one from 2015 by Filippo Santambrogio [74], giving insight into the theory from a numerical point of view, and a new open book by Gabriel Peyré and Marco Cuturi from 2018 [65], which gives a comprehensive overview of numerical methods. Since this thesis is focused on algorithms and applications for optimal transport, we keep the mathematical introduction succinct. For further background see the literature mentioned and the references therein.

Transport Maps and Couplings In what follows, we consider optimal transport between two separable and complete metric spaces X and Y, equipped with Borel σ-algebras, and probability measures µ on X and ν on Y. Most of the time throughout this thesis we have X = Y and only consider Euclidean spaces. A transport map (or Monge map) is a measurable mapping T : X → Y that transports the mass of µ onto ν, that is, T#µ = ν, where T#µ denotes the pushforward of µ under T. In other words, for each measurable set A ⊆ Y, µ(T^{-1}(A)) = ν(A).

Figure 1.1: The left picture shows two measures µ and ν on X given as densities. The right depicts a measurable subset A ⊆ X and its inverse image under a transport map T. The pushforward condition requires that ν(A) and µ(T^{-1}(A)) are equal for any measurable subset A.

A different notion of rearrangement is the transference plan or coupling between measures. This is a measure π on the product space X × Y whose marginals are µ and ν on X and Y, respectively, that is, π(A × Y) = µ(A) and π(X × B) = ν(B) for any measurable A ⊆ X, B ⊆ Y. This condition can be expressed in terms of pushforwards of projections: pr_X#π = µ and pr_Y#π = ν, where pr_X is the projection onto X and pr_Y the projection onto Y. We denote the set of all couplings between µ and ν as Π(µ, ν).

In the case X = Y, if we think of measures on X as “configurations of mass” or “piles of sand” on X, then a transport map T can be seen as a shift of the mass that transforms one configuration (µ) into the other (ν). This is ensured by the pushforward condition, as the amount of “sand” ν(A) in a set A after the shift has to match the amount µ(T^{-1}(A)) before the shift (see Figure 1.1). Similarly, a transference plan contains information on origins and destinations of mass relocations. For measurable sets A, B ⊆ X, π(A × B) is the amount of “sand” that is picked up at A and placed at B.

It is important to note that couplings are a more general concept than transport maps, as each transport map T defines a coupling as the unique measure π_T that satisfies π_T(A × B) = µ(A ∩ T^{-1}(B)), but not every coupling defines a transport map (see Figure 1.2). In fact, in many cases where µ is a discrete measure a transport map cannot exist (for example, if ν is absolutely continuous with respect to Lebesgue measure in R^D), but there always exists a coupling, since the product measure µ ⊗ ν is a trivial example in any case.

Figure 1.2: Two very simple discrete examples. The mass of µ is indicated in red, ν in blue. On the left, the transport can be expressed either as a Monge map or a coupling. In the right example, however, since the mass from one location is split among multiple destinations, this transport cannot be written as a Monge map. Since it is the only possible coupling, the Monge problem is infeasible in this simple case.

Remark. When X = Y, and in particular for discrete optimal transport, we often consider two subsets X and Y of X with supp(µ) ⊆ X and supp(ν) ⊆ Y . It suffices to look at restrictions of transport maps T to X and of couplings π to X × Y . Also, the measures µ and ν are not technically required to be probability measures - it suffices to assume finite measures with µ(X ) = ν(X ).

Optimal Transport Formulations Now that we have an idea of the concepts of transportation, we have a look at the efficiency of transportation, which is measured by a cost function c: X × Y → R_+. For two points x ∈ X and y ∈ Y, c(x, y) is the cost of transporting mass from location x to location y. For optimal transport on a single metric space (X, d), the cost function is often c = d^p for p ≥ 1, which means transportation cost is simply a power of the distance between source and target locations. However, a close connection between cost function and distance is not necessary and many other cost functions are possible. With a cost function c we can now define the cost of transport maps and couplings. The cost of a transport map T is the integral of the cost between source and target locations under T with respect to µ,

∫_X c(x, T(x)) dµ(x).

Trying to find the feasible transport map with the least cost is formulated in the Monge formulation of optimal transport (MOT), as follows:

(MOT)   min_T ∫_X c(x, T(x)) dµ(x),   s.t.   T#µ = ν

This is the first version of an optimal transportation problem. It is mostly considered in the context of continuous measures, since it is not feasible in many discrete cases. The problem formulation that operates on couplings, the Kantorovich formulation, is always feasible and since couplings are more general than transport maps, the Kantorovich formulation can be viewed as a relaxation of the Monge formulation.

The cost of a transference plan π is defined as the integral of the cost function on the product space with respect to π,

∫_{X×Y} c(x, y) dπ(x, y),

and consequently, the Kantorovich formulation of the optimal transport (KOT) problem is to find the feasible coupling with the least cost,

(KOT)   min_π ∫_{X×Y} c(x, y) dπ(x, y)   s.t.   π ∈ Π(µ, ν).

This is a (potentially infinite-dimensional) linear problem and thus generally expected to be easier than the Monge problem. It always admits an optimal solution, as long as the cost function c: X × Y → R+ is lower semi-continuous (see for example [74, Theorem 1.7]).

We mainly consider KOT throughout this work, although we generally put an emphasis on discrete measures, which allows us to restate it as a finite linear program.

Wasserstein Distance One very important aspect of optimal transport is that it gives rise to the Wasserstein distance [95]. Essentially, this concept allows us to lift the ground metric d on a complete, separable metric space X to a distance between probability measures on X, which is the core reason it is useful in many theoretical contexts and applications. It has several different names, such as earth mover’s distance [71], Mallows distance [55], Monge-Kantorovich-Rubinstein distance or similar ([47], [52]), depending on the field in which it is used. Simply put, the p-Wasserstein distance between two probability measures µ and ν on a complete, separable metric space (X, d) for p ≥ 1 is the p-th root of the optimal transport cost above, with respect to the cost function c = d^p:

W_p(µ, ν) = ( min_{π ∈ Π(µ,ν)} ∫_{X×X} d^p(x, y) dπ(x, y) )^{1/p}

W_p defines a metric on the space of probability measures over X with finite moment of order p, that is, the set

P_p(X) = { µ ∈ P(X) : ∫_X d^p(x_0, x) dµ(x) < ∞ },

where P(X) is the set of probability measures on X and x_0 ∈ X is an arbitrary element [96]. A proof that W_p satisfies the metric axioms can be found for example in [96, Chapter 6]. The fact that the Wasserstein distance W_p incorporates the ground distance d on X is very attractive both in theoretical and practical contexts, since large distances between mass locations of µ and ν on X are reflected in a higher Wasserstein distance.
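As a small worked example (ours, not taken from the data considered later), let X = R with d(x, y) = |x − y|, µ = δ_0 and ν = ½ δ_0 + ½ δ_1. The marginal conditions force the unique coupling π({(0, 0)}) = π({(0, 1)}) = ½, so

W_p(µ, ν) = ( ½ · |0 − 1|^p )^{1/p} = (1/2)^{1/p},

that is, W_1(µ, ν) = 1/2 and W_2(µ, ν) = 1/√2. As in the right-hand example of Figure 1.2, the Kantorovich problem is feasible even though no Monge map exists, since the mass at 0 has to be split.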

Discrete Optimal Transport Depending on the measures µ and ν the optimal transport problem can take different forms. Usually, one differentiates between three types of optimal transport problems:

• Continuous optimal transport - both measures are continuous.

• Semi-discrete optimal transport - one of the measures is continuous, the other one is discrete.

• Discrete optimal transport - both measures are discrete.

These three problem types require different, carefully tailored methods, and algorithms suited to solve one of these types usually do not transfer easily to the other types. However, it is possible to reformulate a problem as a different type. For example, if both measures are continuous, one can discretize one or both to obtain the (semi-)discrete optimal transport problem and utilize (semi-)discrete methods for solving it. Similarly, discrete measures can be made continuous by interpreting a Dirac mass as uniformly distributed over a small area. Both introduce an error that is controllable, for example by the diameter of the area a Dirac mass is blurred on. In this work, we mostly focus on algorithms for the Kantorovich formulation of finite, and hence discrete, optimal transport, although we also have a look at important methods for the semi-discrete case and entropically regularized optimal transport. In the discrete case KOT can be written as a finite linear program. To this end, we consider measures µ and ν, which are finite sums of Diracs,

µ = Σ_{i=1}^n µ_i δ_{x_i}   and   ν = Σ_{j=1}^m ν_j δ_{y_j},

where n, m ∈ N, all µ_i, ν_j ∈ R_+ and x_i, y_j ∈ X for all i, j, and require that

Σ_{i=1}^n µ_i = Σ_{j=1}^m ν_j.

Further, we define the sets X = supp(µ) and Y = supp(ν) (sometimes the elements of X are called sources and the elements of Y are called sinks). Any cost function c: X × Y → R_+ only has finitely many values and as such, we can interpret it as a cost matrix C = (c_{i,j}) ∈ R^{n×m}, where c_{i,j} = c(x_i, y_j). Any transference plan π is finitely supported on X × Y and we write it as

π_{i,j} = π(x_i, y_j) for i = 1, . . . , n and j = 1, . . . , m. With that, the marginality condition π ∈ Π(µ, ν) simply states that the rows of π written as a matrix (π_{i,j})_{i,j} sum up to the values µ_i and the columns of π sum up to the values ν_j. The objective function, the cost of the transference plan, is a linear function in π,

Σ_{x∈X} Σ_{y∈Y} c(x, y) π(x, y) = Σ_{i=1}^n Σ_{j=1}^m c_{i,j} π_{i,j}.

We obtain the following linear programming formulation with variables πi,j:

(DOT)   min Σ_{i=1}^n Σ_{j=1}^m c_{i,j} π_{i,j}
        subject to   Σ_{j=1}^m π_{i,j} = µ_i   ∀ i = 1, . . . , n
                     Σ_{i=1}^n π_{i,j} = ν_j   ∀ j = 1, . . . , m
                     π_{i,j} ≥ 0   ∀ i, j

This formulation is known in the linear programming literature as the transportation problem [54]. It is a special case of the minimum cost network flow problem and itself contains the assignment problem as the special case with n = m and µ_i = ν_j = 1 for all i, j.
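To make the linear program concrete, the following is a minimal sketch of how (DOT) can be handed to a generic LP solver; the tiny 3 × 2 instance is a hypothetical example, and SciPy's linprog is used only as an off-the-shelf back-end (specialized transportation codes are preferable for larger instances).

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 3 x 2 instance: masses and cost matrix.
mu = np.array([0.5, 0.3, 0.2])           # source masses, sum to 1
nu = np.array([0.6, 0.4])                # sink masses, sum to 1
C = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 0.5]])               # c_{i,j}

n, m = C.shape
# Equality constraints: row sums of pi equal mu, column sums equal nu.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0     # sum_j pi_{i,j} = mu_i
for j in range(m):
    A_eq[n + j, j::m] = 1.0              # sum_i pi_{i,j} = nu_j
b_eq = np.concatenate([mu, nu])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
pi = res.x.reshape(n, m)                 # optimal transport plan
print("optimal cost:", res.fun)
```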

1.2 Applications

Recently, Wasserstein distances in particular have gained a lot of attention in many fields, mainly due to their ability to meaningfully incorporate the ground distance of a space into the distance between measures. This often makes the Wasserstein distance the preferable option over other distances between measures, such as the total variation distance. It also means the Wasserstein metric, and by extension optimal transport, is applied in many different contexts. We outline some of the main fields where these concepts are applied. Again, this is by no means an exhaustive list and there are many other important applications.

Imaging This is a broad field with high importance to many scientific areas (biology, medicine and others) and itself contains a plethora of subfields and problems. Optimal transport distances are considered in a lot of different imaging contexts, mainly as a way to gauge the difference between two images. Greyscale images, for example, are easily interpreted as discrete measures through the greyscale values of the pixels. Since the ground distance on the pixel grid is incorporated in the Wasserstein distance, it reflects well what humans perceive as similar or different. This was shown by Rubner and others [71], who successfully applied the earth mover’s (Wasserstein) distance to image retrieval. Since then, both continuous and discrete optimal transport have been applied to different imaging problems. Some are demonstrated in [63], including image interpolation along transport maps [42], adaptive color transfer [67] and color image segmentation [68]. There are applications in medical imaging [73], computer vision [34, 61], shape matching [40], image classification [99] and on more specific problems, such as the detection of phishing web pages [31], measuring plant color differences [48] or three-part image decomposition [93, 52, 17], which itself can be applied to fingerprint segmentation [94].

Probability and Statistics In probability theory, the Wasserstein metric is a useful tool for approximation, among other things. Two selected applications are perturbation theory for Markov chains with applications to Markov chain Monte Carlo (MCMC) methods [72] and Poisson process approximation [79]. Mallows distance is an alternative term for the Wasserstein metric that is often used in the context of mathematical statistics. It can give rise to theoretical techniques, for example used in proofs of central limit and similar theorems [45, 24]. However, it also appears in practical applications like goodness-of-fit testing [60, 29]. Other important concepts in the realm of optimal transport in statistics are empirical optimal transport distances [89, 92, 60] and Wasserstein barycenters [1], which are for instance used for Bayesian inference [91].

Machine Learning Artificial intelligence, deep learning, neural networks – these are fields that have gained widespread traction over the last few years. Although neural networks have been a well-known concept for many decades and machine learning was conceptualized over fifty years ago, the hardware technology has only recently become powerful enough for machine learning techniques to bear fruit in many applications. The Wasserstein distance is considered as a loss function, for example in multi-label learning [30], since it can incorporate a ground distance on the label space in an intuitive way. It can occur in altered forms, such as the smoothed Wasserstein loss used for dictionary learning [70] or optimal spectral transport for musical unmixing [26]. Generative adversarial networks (GAN) [37] are immensely popular in unsupervised learning and the Wasserstein metric has been adapted to this concept as well [3, 16].

Economics One simple example of optimal transport that is often used to illustrate the problem, but is also considered in the economics literature, is optimal assignment [50]. It has been explored in different economic contexts such as the labor market [83] or the marriage market [6, 19]. Different economic models often require adjustments in the formulation of the transportation problem. For example, introducing unobserved heterogeneity into the matching model leads to a regularized variant of optimal transport [33]. Equilibrium transport is considered in the context of adding taxes to the matching market [25]. For more in-depth information on the matter, we refer to a book by Alfred Galichon published in 2016, Optimal Transport Methods in Economics [32], which gives broad insight into the synergy between the two fields. Other applications besides matching include the quantification of risk measures [69] and martingale optimal transport, which is essentially the usual optimal transport problem with the added constraint that the transport plan is a martingale. It is applied in mathematical finance, for example in robust hedging [23].

1.3 Motivation and Organization of this Work

In this section we outline how the remainder of the thesis is organized. Chapters 3 and 4 are part of the publications [78] and [90], respectively, and the overview of these chapters we present here contains parts of the introductions of the mentioned papers.

As we have shown above, optimal transport is considered in many different fields and modern applications. Due to the massive volume of available data in many of these applications there is a high demand for increasingly efficient methods to tackle optimal transport problems on a large scale. While considerable advancement has been achieved, as many new and improved algorithms have been developed in the last few years, this is an ongoing research objective.

In Chapter 2 we give an overview of some of the most important algorithms to tackle optimal transport problems. The selection presented is not exhaustive, but rather focused on the most relevant methods in the world of modern optimal transport, which include linear programming approaches, multiscale methods and entropically regularized optimal transport, also known as the Sinkhorn method.

While many of these methods undoubtedly mean substantial progress compared to what was available a decade ago, it is nearly impossible from the current literature to figure out how these methods compare to one another and which method is most suitable for a given task. This is mainly due to the fact that only for a few of the modern methods user-friendly code is publicly available. What is more, many articles that introduce new methods compare their computational performance only on a restricted set of self-generated ad-hoc examples and typically demonstrate improved performance only in comparison to some classical method or to a simplified version that does not use the novelty introduced.

This is why we introduce the DOTmark in Chapter 3, which stands for discrete optimal transport benchmark. It is a collection of greyscale images divided into ten different classes, including various mass distributions, shapes, microscopy images, as well as classic test images used in image processing. Its main purpose is to serve as open and neutral test data for optimal transport, so that methods can be compared in a more meaningful way. Together with this data set we give a first test of a selection of algorithms on the DOTmark and report the results. See Chapter 3 for a more thorough description of the benchmark and the results of the test. In this test we focus on precise, singlescale algorithms and it becomes apparent that those methods are not very suitable for optimal transport solutions on high resolution images, as 128 × 128 pixel images are already excluded from the test due to prohibitively high runtimes. Thus, multiscale methods and approximate methods are increasingly important.

In Chapter 4 we introduce a simple subsampling scheme, which proves to be a powerful tool in approximating Wasserstein and other optimal transport distances. It can use exact methods as back-end solvers, provides us with non-asymptotic approximation guarantees in the Wasserstein case, and by changing the sample size it can be tuned toward higher accuracy or lower runtimes as desired. We show that this scheme performs very convincingly on instances from the DOTmark, delivering an approximation with a relative error of less than 5% in about 1% of the runtime.
The theoretical results imply that the approximation quality is independent of the size of the full problem in the low-dimensional case, which includes many examples, such as 2D imaging.

In the recent past, several multiscale approaches for optimal transport have been introduced. The idea is to reduce the number of sources and sinks by aggregating them to clusters and to define an instance of optimal transport on the clusters in resemblance of the original problem. The solution process on the coarsened instance is much faster and by propagating a solution to the finer scale, a good solution to the original problem can typically be obtained efficiently. The coarsening and propagation can be iterated to create multiple scales of the problem. See Chapter 2 for a more detailed explanation of the ideas behind multiscale methods. Ad-hoc tests suggest that multiscale methods typically lead to a significant speedup compared to the singlescale version of an algorithm. However, those methods rely on the ability to easily create coarser versions of the problems at hand. For example, in images we can simply lower the resolution in order to get a coarser version. Similarly, when the optimal transport instance is comprised of point clouds in a metric space, there are clustering methods which can be used to aggregate points in accordance with the distances between them. But other problems, such as assignment problems, typically do not have cost matrices which are defined through proximity in a metric space. In these cases, the multiscale approaches mentioned above cannot be applied.

We aim to change this unfortunate circumstance by introducing cost-based clustering in Chapter 5. This is an approach to cluster sources and sinks into subsets in a meaningful way based only on the data we have available in any discrete optimal transport instance: the measures µ and ν and the cost matrix C. We show that any clustering provides us with a rigorous error bound on the original optimal transport cost after solving the problem on the coarse scale, and we set up the cost-based clustering problem as the problem of finding the clustering with the best error bound. Unfortunately, this problem is NP-hard, thus finding the best clustering is not a feasible strategy in the context of multiscale methods for optimal transport. However, we provide several heuristics for this problem and show that clusterings of reasonable quality can be found in a reasonable runtime. This lays the foundation for the application of multiscale methods to general discrete optimal transport problems. Moreover, we show that applying these data-driven algorithms to geometric problems can yield interesting clusterings that typical geometric clustering methods cannot find.

Although all necessary preliminaries for Chapters 3, 4 and 5 are introduced in the first two chapters of this thesis, the specific problem setting is still stated in each of the later three chapters. While this may lead to repetition for readers of the whole thesis, this is done to clarify the notation and setting, which is slightly different in each chapter. Furthermore, the chapters end up being more self-contained.

Chapter 2

Algorithms for Optimal Transport

Several methods to tackle optimal transport problems have been known for a long time. Many of the algorithms that are used in modern applications, however, were developed over the last decade. Gabriel Peyré and Marco Cuturi released an open book in 2018, Computational Optimal Transport [65], in which a majority of the relevant algorithms for optimal transport is discussed. It provides a comprehensive overview of the different types of methods and applications.

In light of this publication, we intend to keep this chapter concise and focused on a summary of the most important methods. This includes the algorithms tested in Chapter 3, multiscale methods and others which are particularly relevant in the context of this thesis. Furthermore, we give a rather heuristic description of the methods; since their correctness is well established, we refrain from discussing it in detail.

The methods discussed in Chapter 3 were also briefly introduced in the paper about the DOTmark [78]. Parts of the descriptions in this chapter are taken from this publication.

2.1 Linear Programming

Obviously, since the Kantorovich formulation in the discrete setting is a finite linear program, standard algorithms, such as the simplex method, are applicable. However, this formulation has a very particular form as a special case of the minimum cost flow network problem, which makes it possible to use more specialized algorithms to solve it. We start by restating the program:

(DOT)   min Σ_{i=1}^n Σ_{j=1}^m c_{i,j} π_{i,j}
        subject to   Σ_{j=1}^m π_{i,j} = µ_i   ∀ i = 1, . . . , n
                     Σ_{i=1}^n π_{i,j} = ν_j   ∀ j = 1, . . . , m
                     π_{i,j} ≥ 0   ∀ i, j

Here, c_{i,j} are the elements of the cost matrix C ∈ R_+^{n×m} and the measures µ and ν satisfy Σ_i µ_i = Σ_j ν_j in order for the problem to be feasible. The dual formulation is given as follows:

max Σ_{i=1}^n u_i µ_i + Σ_{j=1}^m v_j ν_j
subject to   u_i + v_j ≤ c_{i,j}   ∀ i, j

The dual variables ui and vj are also called potentials. This dual formulation is utilized in most of the linear programming methods. The complementary slackness condition is

(ci,j − ui − vj) · πi,j = 0 ∀i, j.

This means that whenever we find a primal feasible solution π and a dual feasible solution (u, v) satisfying the above condition, both solutions must be optimal.

We have a look at well-established methods, such as the Hungarian method for assignment problems, the auction algorithm and the transportation simplex, as well as more modern modifications, such as the shortlist method and the shielding neighborhood method.

Hungarian Method This method goes back to Kuhn in 1955 [51], who developed it based on the results of two Hungarian mathematicians and dubbed it the Hungarian method. It is an algorithm specialized to assignment problems, that is, the special case of the transportation problem with n = m and µ_i = ν_j = 1 for all i, j. Thus, each source is assigned to precisely one sink (one can think of workers assigned to jobs) in such a way that the total cost of the assignment is minimized. The algorithm is a primal-dual method at heart and more general primal-dual algorithms were derived from it later. It is described in more detail for example in [5, Section 10.7], including a proof of its convergence. The algorithm begins by defining a dual feasible solution (û, v̂) via

û_i := min_{j=1,...,n} c_{i,j}   ∀ i = 1, . . . , n

and

v̂_j := min_{i=1,...,n} (c_{i,j} − û_i)   ∀ j = 1, . . . , n.

The reduced cost matrix Ĉ is defined by ĉ_{i,j} := c_{i,j} − û_i − v̂_j. Now, if we can find an assignment π such that each non-zero entry of π corresponds to a zero entry of Ĉ, by complementary slackness this assignment is optimal. If not, we look at the minimum number of rows and columns necessary to cover all zero entries in Ĉ, fix one such cover and define the sets of uncovered rows and columns as S_r and S_c, respectively. Now we find the minimal uncovered entry in Ĉ:

c_0 := min_{i∈S_r, j∈S_c} ĉ_{i,j} > 0.

We obtain a new feasible dual solution (ū, v̄) by setting ū_i = û_i + c_0 for i ∈ S_r, ū_i = û_i for i ∉ S_r, v̄_j = v̂_j for j ∈ S_c and v̄_j = v̂_j − c_0 for j ∉ S_c, and in turn a new reduced cost matrix C̄. This process is iterated until the complementary slackness conditions are satisfied and we get an optimal assignment through the zero entries of the reduced cost matrix.
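The initialization just described translates directly into a few lines of code. The following minimal sketch (function name ours) computes the initial potentials and the reduced cost matrix; the cover-and-update loop is omitted. In practice, ready-made routines such as scipy.optimize.linear_sum_assignment can also be used to solve assignment problems directly.

```python
import numpy as np

def hungarian_init(C):
    """Initial dual feasible solution (u, v) and reduced costs for a square cost matrix C."""
    u = C.min(axis=1)                      # u_i = min_j c_{i,j}
    v = (C - u[:, None]).min(axis=0)       # v_j = min_i (c_{i,j} - u_i)
    C_hat = C - u[:, None] - v[None, :]    # reduced cost matrix, entrywise >= 0
    return u, v, C_hat
```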

Auction Algorithm This is an algorithm which was originally introduced by Bertsekas in 1981 to solve the assignment problem [9]. It has been adapted to the general discrete optimal transport problem [11]. We describe the version for assignment problems as in [65]. We simply set X = {1, . . . , n} here. Note that any feasible basic assignment π is the permutation matrix of a permutation σ: X → X and vice versa. With this connection we can rewrite the complementary slackness conditions as

ui + vσ(i) = ci,σ(i) ∀ i ∈ X.

Also, if we fix the dual variables v, the best possible completion to a feasible dual solution (u, v) can be achieved by defining u_i = min_j (c_{i,j} − v_j) for all i. This is also called a C^t-transform (see [65] for details). Consequently, if we have v and σ such that

c_{i,σ(i)} − v_{σ(i)} = min_j (c_{i,j} − v_j)   ∀ i ∈ X,

then σ is an optimal assignment. The algorithm maintains partial assignments consisting of a subset S ⊆ X, an injective assignment map ξ: S ↪ X, and a dual vector v. It is initialized with an empty assignment S = ∅ and v = 0 and iteratively adds elements to S until S = X. For now, we fix ε > 0. In one iteration of the algorithm, choose i ∈ X \ S and search for the first- and second-best indices in the C^t-transform, that is,

j_i^{(1)} ∈ argmin_j (c_{i,j} − v_j)   and   j_i^{(2)} ∈ argmin_{j ≠ j_i^{(1)}} (c_{i,j} − v_j).

Then v is updated by subtracting the value (c_{i,j_i^{(2)}} − v_{j_i^{(2)}}) − (c_{i,j_i^{(1)}} − v_{j_i^{(1)}}) + ε from v_{j_i^{(1)}}. S and ξ are updated to reflect that i is now assigned to j_i^{(1)}; that means i is added to S and ξ(i) is set to j_i^{(1)}. If j_i^{(1)} was assigned before (if there was i′ ∈ S with ξ(i′) = j_i^{(1)}), that assignment is removed. This way, the number of assigned objects |S| never decreases and one entry of v decreases by at least ε in each iteration. Further, the algorithm maintains ε-complementary slackness throughout the iterations [65, Proposition 3.7], that is

c_{i,ξ(i)} − v_{ξ(i)} ≤ ε + min_j (c_{i,j} − v_j)   ∀ i ∈ S.

Eventually, all objects are assigned (S = X). For large ε bigger steps can be taken in each iteration, but smaller ε keep the assignments closer to satisfying the complementary slackness conditions. By changing ε throughout the iterations, finite convergence to an optimal assignment can be achieved. This is called ε-scaling [11].
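The iteration just described can be sketched as follows for a fixed ε > 0 (minimization version; the function name is ours and n ≥ 2 is assumed). ε-scaling would wrap this routine in an outer loop with decreasing ε.

```python
import numpy as np

def auction_assignment(C, eps=1e-3):
    """Assignment by the auction algorithm with fixed eps; returns xi (source -> sink) and prices v."""
    n = C.shape[0]
    v = np.zeros(n)                        # prices (dual variables) on the sinks
    owner = -np.ones(n, dtype=int)         # owner[j] = source currently assigned to sink j
    xi = -np.ones(n, dtype=int)            # xi[i]    = sink currently assigned to source i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        vals = C[i] - v                    # values c_{i,j} - v_j
        j1 = int(np.argmin(vals))          # best sink for bidder i
        second = np.min(np.delete(vals, j1))
        v[j1] -= (second - vals[j1]) + eps # price update as described above
        if owner[j1] >= 0:                 # evict the previous owner, if any
            xi[owner[j1]] = -1
            unassigned.append(owner[j1])
        owner[j1], xi[i] = i, j1
    return xi, v
```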

Transportation Simplex This is a specialized version of the network simplex, described in detail for example in [54]. Like other simplex variants, the transportation simplex has two phases: one phase to construct an initial basic feasible solution π and another phase to improve this solution to optimality. Typically, the majority of time is spent in the second phase, as an initial solution to optimal transport is easily obtained.

There are quite a few different ways to construct an initial basic feasible solution. See [39] for a selection of methods suited for this task or [81] for a thorough analysis of the performance of primal and dual heuristics. The method we point out here, the modified row minimum rule, has a universally solid performance both in runtime and quality of the solution constructed. We iterate over all source locations (rows) x_i ∈ X that still have mass left, choose for each source the available target y_j with the least cost, and include it in the solution by setting π_{i,j} to the largest possible value. This process is repeated until all sources are depleted. The solution we obtain is automatically basic.

If we interpret the source and target locations as nodes in a graph and draw arcs for every possible transport, we get a complete bipartite graph. Every non-degenerate basic feasible solution can now be represented by a spanning tree in this graph by choosing all the arcs belonging to active transports in our solution. Given a basic feasible solution, a simplex step is performed as follows:

• A new variable (a transport πi,j) is selected to enter the basis.

• This creates a cycle in the previous tree, which is then identified.

• The maximal amount of mass possible is shifted along this cycle, that is, alternately added to and subtracted from consecutive transports.

• A variable that has become zero in the process is removed from the basis.

When searching for a new basic variable, there are again many options, but we focus on the following row minimum strategy due to performance reasons:

We use the current solution to compute the values of dual variables ui and vj via the complementary slackness conditions that ui + vj = ci,j, whenever

πi,j > 0. This is a linear system of n + m − 1 equations with n + m variables, which can be solved via backwards substitution after setting u1 = 0. Then we scan the non-basic variables πi,j row by row and compute the reduced costs ri,j = ci,j − ui − vj. If we encounter a variable with negative reduced costs, we stop at the end of that row and choose the variable with the least reduced cost encountered thus far as a new basic variable. If no candidates are found among all rows, the current solution is optimal and we stop. The second phase of the transportation simplex is very similar to the network simplex. The only difference is that we always have a complete bipartite network in the optimal transport problem. In phase one, however, this structure is very beneficial and allows for easy construction of a feasible solution, whereas in the more general network case the introduction of artificial variables is often necessary. More details on the network simplex can also be found in [54].
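As an illustration of phase one, here is a minimal sketch of the modified row minimum rule described above (function name ours); degeneracy handling and the spanning-tree bookkeeping needed for phase two are omitted.

```python
import numpy as np

def row_minimum_initial(C, mu, nu):
    """Greedy initial feasible solution by the modified row minimum rule."""
    n, m = C.shape
    pi = np.zeros((n, m))
    supply, demand = mu.astype(float).copy(), nu.astype(float).copy()
    while supply.sum() > 1e-12:
        for i in range(n):
            if supply[i] <= 1e-12:
                continue
            costs = np.where(demand > 1e-12, C[i], np.inf)  # only sinks with remaining demand
            j = int(np.argmin(costs))
            amount = min(supply[i], demand[j])               # largest possible value for pi_{i,j}
            pi[i, j] += amount
            supply[i] -= amount
            demand[j] -= amount
    return pi
```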

Shortlist Method At its core, the shortlist method is a variant of the transportation simplex. It comes with three parameters:

• A parameter s for the shortlist length.

• Parameters p and k that control how many variables are searched to find a new basic variable.

Before optimization starts, a shortlist is created for every source, consisting of the s targets with least transport costs, ordered by cost. The basic feasible solution is again constructed by a modified row minimum rule, where the shortlists are prioritized. After that, the solution is improved by simplex steps similar to the transportation simplex, but the search is limited to the shortlists. The lists are scanned, until either k variables with negative reduced costs are found, or p percent of the shortlists have been searched. Then the candidate with least reduced cost is chosen to enter the basis. If no improvement can be achieved within the shortlists, the last solution is improved to global optimality by the same simplex steps as in the transportation simplex. For further details see [39].

Shielding Neighborhood Method The shielding neighborhood method, or shortcut method, was introduced by Schmitzer in [76]. Its main idea is to solve a sequence of sparse (i.e. restricted) optimal transport instances instead of the dense (full) problem. The algorithm is proposed as a multiscale method, but we focus on the singlescale variant, which basically works in the same way: A neighborhood is a small subset N ⊆ X × Y of the product space and imposes a restricted instance of the problem by only considering transport variables π_{i,j} such that (x_i, y_j) ∈ N. Due to the significant reduction in the number of variables, this instance is much faster to solve. The idea behind shielding neighborhoods is the so-called shielding condition, which ensures that for (x_i, y_j) ∉ N a shortcut exists. For a given coupling

π with supp(π) ⊆ N a shortcut for a pair (x1, yn) ∈ X × Y is a sequence of active transports, ((x2, y2),..., (xn−1, yn−1)) in supp(π) with (xi, yi+1) ∈ N for all i = 1, . . . , n − 1, such that

c(x_1, y_n) ≥ c(x_1, y_2) + Σ_{i=2}^{n−1} (c(x_i, y_{i+1}) − c(x_i, y_i)).

This ensures that the dual constraint corresponding to (x1, yn) is satisfied [76, Proposition 3.2]. The shortcuts are not constructed explicitly. Rather, their existence follows from the shielding condition between xA ∈ X and yB ∈ Y : (xs, ys) ∈ supp(π) shields xA from yB if

c(xA, yB) − c(xs, yB) > c(xA, ys) − c(xs, ys).

A subset N ⊆ X × Y is a shielding neighborhood for π if supp(π) ⊆ N and every pair (x, y) ∈ X × Y is either included in N or there is an active transport (x_s, y_s) ∈ supp(π) with (x, y_s) ∈ N that shields x from y. The main result exploited by the algorithm is that if π is an optimal coupling with respect to N and N is a shielding neighborhood for π, then π is optimal for the original instance [76, Corollary 3.10]. Starting with a basic feasible solution generated with the modified row minimum rule, a shielding neighborhood for that solution is constructed as described in [76]. The algorithm then alternates between optimizing the sparse instance of the problem and generating a new shielding neighborhood for the current solution. If a solution is optimal for two successive shielding neighborhoods, it is globally optimal and the algorithm stops.
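The shielding condition lends itself to a direct, if naive, check. The following sketch (names ours) tests whether a candidate neighborhood N, given as a boolean matrix over the variables, is a shielding neighborhood for a plan π in the sense above.

```python
import numpy as np

def shields(C, iA, jB, iS, jS):
    # (x_s, y_s) shields x_A from y_B if c(x_A,y_B) - c(x_s,y_B) > c(x_A,y_s) - c(x_s,y_s)
    return C[iA, jB] - C[iS, jB] > C[iA, jS] - C[iS, jS]

def is_shielding_neighborhood(C, pi, N):
    """N: boolean n x m matrix of admitted variables; pi: feasible plan with supp(pi) contained in N."""
    support = list(zip(*np.nonzero(pi > 0)))      # active transports (i_s, j_s)
    n, m = C.shape
    for iA in range(n):
        for jB in range(m):
            if N[iA, jB]:
                continue                          # pair already in the neighborhood
            if not any(N[iA, jS] and shields(C, iA, jB, iS, jS) for iS, jS in support):
                return False                      # (iA, jB) is neither admitted nor shielded
    return True
```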

2.2 AHA Method

The acronym AHA stands for Aurenhammer, Hoffmann and Aronov, who showed in their seminal paper [4] that the transport problem with squared Euclidean cost is equivalent to an unrestricted continuous minimization problem for a certain convex objective function Φ. The algorithm exploiting this fact is introduced in [57] as a multiscale method, but as in the case of the shielding neighborhood method the singlescale variant works essentially in the same way. This method applies to the semi-discrete case in R^D with the squared Euclidean distance as cost function, c(x, y) = ‖x − y‖². We assume that µ is absolutely continuous with respect to Lebesgue measure and ν is a finite sum of Diracs,

ν = Σ_{j=1}^n ν_j δ_{y_j}. The key observation utilized by the AHA method is that any power diagram (a.k.a. Laguerre tessellation) governed by the support points y_1, . . . , y_n of ν and arbitrary weights w_1, . . . , w_n ∈ R characterizes an optimal transport plan from µ to some measure living on these support points. By minimizing the function Φ in the weights w_1, . . . , w_n we can find a power diagram that defines an optimal transport to the correct measure ν. The power diagram is defined as the following decomposition of R^D: Each point y_j has a weight w_j. The cell belonging to y_j with respect to the weight vector w ∈ R^n is the set

Pow_j(w) = { x ∈ R^D : ‖x − y_j‖² − w_j ≤ ‖x − y_i‖² − w_i for all i = 1, . . . , n }.

The intersection of two adjacent cells is part of the hyperplane on which the above holds with equality, and thus a null set with respect to µ (see Figure 2.1 for a two-dimensional example). Geometrically, the intersection hyperplanes can be roughly characterized as follows: If we draw spheres around the points y_j whose radii are the square roots of the weights w_j, the hyperplane between two cells contains the intersection of the spheres if it is nonempty. If the two spheres touch, the hyperplane is the common tangent plane at the intersection point. If the two spheres are disjoint, the hyperplane is the set of points in R^D whose distances to the points of tangency of the two spheres are equal. Figure 2.1 shows the circles around the points.

The transport map T_w, which maps an element x ∈ R^D to the point y_j such that x ∈ Pow_j(w), is well-defined µ-almost everywhere. It is a known result that T_w is an optimal transport map between µ and T_w#µ [57]. Therefore, we need to find a weight vector w such that T_w#µ = ν. As it turns out [57, Theorem 2], a weight vector w satisfies T_w#µ = ν if and only if it is a global minimizer of the convex function

Φ(w) = Σ_{j=1}^n ( ν_j · w_j − ∫_{Pow_j(w)} (‖x − y_j‖² − w_j) dµ(x) ).

Figure 2.1: Example of a power diagram in R². The black dots are the twenty support points y_j of ν. The radii of the grey circles around the points are the square roots of the weights w_j. The cells Pow_j(w) are constrained by the blue line segments. Each power diagram defines a Monge map µ-almost everywhere as the piecewise constant map which maps the interior of each cell Pow_j(w) onto y_j. This example also shows that the points y_j are not necessarily contained in their own cells.

Evaluating Φ at a given weight vector w ∈ R^n involves computation of the power diagram for w and a rather simple integration procedure over each power cell. The gradient of Φ is accessible, and its i-th component is in fact just the difference between ν_i and the µ-mass transported to this point under the current power diagram. In [57] it is shown that using the L-BFGS-B algorithm with Moré–Thuente type line search, a quasi-Newton method in which the inverse Hessian is estimated, outperforms gradient descent methods. Second order algorithms such as the Newton method are problematic, since they require exact computation of the Hessian. A damped Newton algorithm was recently proposed and analyzed [49]. Moreover, the method can be adapted to the case of the non-squared Euclidean distance (p = 1) [41]. The main challenge here is that the boundaries of the cells are no longer hyperplanes, which makes the integration over the power cells, and thus the evaluation of the function Φ, more expensive.
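As a rough illustration (this is not the exact integration procedure of [57]), one can approximate µ by weighted sample points, assign each sample to its power cell via the defining inequality, and read off the gradient of Φ as stated above, namely ν_j minus the µ-mass currently transported to y_j:

```python
import numpy as np

def aha_gradient(w, y, nu, x_samples, x_weights):
    """Approximate gradient of Phi: nu_j minus the mu-mass assigned to Pow_j(w).

    y: (n, D) support points of nu; x_samples: (N, D) sample points approximating mu,
    with weights x_weights summing to one.
    """
    d2 = ((x_samples[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)  # squared distances, shape (N, n)
    cell = np.argmin(d2 - w[None, :], axis=1)                         # power cell index of each sample
    transported = np.bincount(cell, weights=x_weights, minlength=len(nu))
    return nu - transported
```

Driving this approximate gradient to zero, for example with a quasi-Newton method as advocated in [57], yields weights w with T_w#µ ≈ ν.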

2.3 Entropically Regularized Optimal Transport

Although the idea to regularize optimal transport problems by adding an entropic penalty term goes back to 1969 [97], only recently has this approach gained massive traction in computational optimal transport, as it was discovered by Cuturi in 2013 that entropically regularized optimal transport problems can be very efficiently solved by a simple matrix scaling algorithm [20]. This algorithm is sometimes called Sinkhorn scaling or the Sinkhorn-Knopp algorithm and was introduced as early as 1967 [86]. It is one of the key reasons why the entropy is used as a regularization term instead of other options, and why regularized optimal transport is covered as an algorithm in this chapter. We loosely follow the explanations given in [65], but keep them more concise. As usual, we cover the discrete case of the Kantorovich problem. For a coupling π we consider the entropic regularization term

H(π) := − Σ_{i,j} π_{i,j} (log(π_{i,j}) − 1).

For a regularization parameter λ > 0 the entropically regularized optimal transport problem is

min_{π ∈ Π(µ,ν)} Σ_{i,j} c_{i,j} π_{i,j} − λ H(π).

Since the entropy is a strictly concave function, this results in the regularized transport problem being a strictly convex problem. Consequently, it admits a unique solution π*_λ for any λ > 0. This solution always has full support, in sharp contrast to the basic solutions to the original problem obtained through linear programming, which are sparse. If λ is small, the regularized problem is close to the original, while the convexity is stronger for large λ (in fact, the objective function is λ-strongly convex). This is reflected in the limiting behaviour of π*_λ, since

lim_{λ→0} π*_λ = π*,

where π* is the optimal solution of the original transport problem with the highest entropy, and

lim_{λ→∞} π*_λ = µ ⊗ ν,

since the product measure is the coupling that maximizes the entropy. In the Wasserstein case c = d^p, we can view the results of the regularized problem as a slightly different optimal transport distance

W_{p,λ}(µ, ν) = ( min_{π ∈ Π(µ,ν)} Σ_{i,j} d^p(x_i, y_j) π_{i,j} − λ H(π) )^{1/p},

which can be seen as an approximation to the Wasserstein distance, especially for small λ, since W_{p,λ}(µ, ν) → W_p(µ, ν) as λ → 0.

The Sinkhorn Scaling Algorithm For a fixed value λ > 0, the Gibbs kernel associated to the cost matrix C is K ∈ R^{n×m}, where

K_{i,j} = e^{−c_{i,j}/λ},   i = 1, . . . , n,  j = 1, . . . , m.

The main fact exploited by the algorithm is that the unique solution π*_λ has the form π*_{λ,i,j} = u_i K_{i,j} v_j with vectors u ∈ R^n, v ∈ R^m akin to dual vectors in the linear program [65, Proposition 4.3]. Since π*_λ has to be a feasible coupling, we need to find vectors u and v such that π*_λ ∈ Π(µ, ν), that is, u and v satisfy the constraints

diag(u) K v = µ   and   diag(v) K^t u = ν,

where µ and ν are interpreted as mass vectors and diag(u) is the n by n matrix which has the entries of u on the diagonal and zero elsewhere. These are nonlinear constraints in u and v. The general strategy to satisfy these constraints is to start with an arbitrary positive vector (e.g. v^{(0)} = (1, . . . , 1)^t) and update u and v alternatingly to satisfy the constraints. This is achieved by the iterations

u^{(k+1)} := µ / (K v^{(k)})   and   v^{(k+1)} := ν / (K^t u^{(k+1)}),

where the division is performed entry-wise. At its core, this is an alternating projections method: The current solution is projected alternatingly onto the sets of solutions (u, v) satisfying the source constraints diag(u) K v = µ and the sink constraints diag(v) K^t u = ν. The iterates of transportation matrices π^{(k)}_λ = diag(u^{(k)}) K diag(v^{(k)}) converge to the unique optimal solution π*_λ as k → ∞ [65, Theorem 4.2]. This algorithm has a very good runtime performance in practice and is thus used in many applications where large-scale computation or approximation of optimal transport distances is necessary, for example in computing Wasserstein barycenters for shape interpolation [87]. However, it has its own issues, in particular with numerical stability. As mentioned before, the approximation of the Wasserstein distance through the regularized version is better the smaller the regularization parameter λ. There are two problems with choosing a small λ in this algorithm:

• A small λ leads to a problem with weaker convexity. Consequently, the convergence of π^{(k)}_λ → π*_λ is slower.

• Entries of the Gibbs kernel K approach zero quickly for small λ. This leads to numerically unstable iterations of the matrix scaling, which is especially problematic when entries of K approach or fall below the machine precision threshold. This means λ cannot be chosen arbitrarily small.

There are, however, workarounds for these problems, such as λ-scaling or log-domain stabilization [77]. Another very recently proposed variant of Sinkhorn scaling only updates one greedily selected row or column in each step and is consequently called Greenkhorn [2].
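A minimal sketch of the plain (unstabilized) Sinkhorn iterations reads as follows; for small λ the kernel entries underflow, which is exactly the stability issue discussed above, and a log-domain implementation would then be preferred.

```python
import numpy as np

def sinkhorn(C, mu, nu, lam, n_iter=1000):
    """Entropically regularized OT via Sinkhorn scaling (no stabilization)."""
    K = np.exp(-C / lam)                  # Gibbs kernel K_{i,j} = exp(-c_{i,j}/lambda)
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)                  # enforce the source constraints
        v = nu / (K.T @ u)                # enforce the sink constraints
    pi = u[:, None] * K * v[None, :]      # pi = diag(u) K diag(v)
    return pi, (C * pi).sum()             # plan and its (unregularized) transport cost
```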

2.4 Multiscale Methods

Multiscale approaches to tackle optimal transport problems go back at least to 2010 for the semi-discrete transportation problem [15]. Independently, Mérigot integrated a multiscale scheme into the AHA method in 2011 [57]. There, the discrete measure ν is coarsened in order to decrease the dimension of the convex optimization problem that needs to be solved. For discrete optimal transport different methods have been proposed by Oberman and Ruan in 2015 [62] and later by Schmitzer [76], as well as Gerber and Maggioni [35], among others. We describe the general ideas behind the multiscale approach with the discrete case in mind and summarize the concepts presented in the aforementioned papers.

Consider the (Kantorovich) discrete optimal transport setting with two measures µ and ν with finite supports X = supp(µ), Y = supp(ν) and a cost function c: X × Y → R_+ written as a cost matrix C. Essentially, multiscale methods can be divided into three steps:

• Coarsening: From the original instance, a finite sequence of successively coarser instances is created. This requires the construction of hierarchical partitions for the sets X and Y and the definition of a cost function (matrix) on each scale.

• Propagation: Assuming we have a feasible transport plan π^{(i)} on a certain scale i, propagation methods construct a feasible plan π^{(i−1)} on the finer scale i − 1.

• Refinement: A transport plan is iteratively improved towards optimality. Refinement strategies usually operate on a single scale.

The three terms are chosen according to [35]. Multiscale methods as a whole work roughly as depicted in this diagram, where (Xi, Yi,Ci) denotes an optimal transport instance on scale i and π(i) is a coupling for that instance:

                  coarsening                   coarsening                    coarsening
(X_0, Y_0, C_0) -------------> (X_1, Y_1, C_1) -------------> ... -------------> (X_k, Y_k, C_k)
                                                                                       |
                                                                                       | solve
                                                                                       v
    π^(0)  <-----------------  π^(1)  <----------------- ... <-----------------    π^(k)
 (refinement)   propagation (refinement)   propagation            propagation

We now discuss the three processes in more detail.

Coarsening  There are two constructions that have to be taken care of for the coarsening of an optimal transport instance: hierarchical partitions for X and Y and the construction of a cost matrix for each scale. If we fix the number k of scales, a hierarchical partition for X = {x_1, ..., x_n} is a sequence

X_0, X_1, ..., X_k

of systems of subsets of X, such that

i) X_0 = {{x_1}, {x_2}, ..., {x_n}}.

ii) X_i is a partition of X for all i = 0, ..., k.

iii) For each i = 0, ..., k − 1 and each subset S ∈ X_i, there is a set T ∈ X_{i+1} such that S ⊆ T.

These conditions ensure that our partitions of X are nested in such a way that they become successively coarser. While it is not technically required that the partitions get strictly coarser, as X_i = X_0 for all i would satisfy the above conditions, identical successive partitions are pointless in practice. Given a measure µ = Σ_{i=1}^n µ_i δ_{x_i} on X, a hierarchical partition X_0, ..., X_k defines a sequence of measures µ^{(i)} in a natural way: for i = 0, ..., k, we define
\[
\mu^{(i)} = \sum_{S \in X_i} \mu_S^{(i)} \delta_S, \qquad \text{where } \mu_S^{(i)} = \mu(S) = \sum_{i \colon x_i \in S} \mu_i.
\]

This means that we simply sum up the masses of the elements in each subset of X.

Constructing a cost matrix C_i on scale i is not as straightforward as constructing the measures. Depending on the surrounding space of X and Y and the original cost matrix, different options are available:

• If X and Y are supported on regular grids in R^D with c(x, y) = ‖x − y‖^p, a simple coarsening strategy is to aggregate 2^D adjoining points into one. When operating on two-dimensional images, for example, this means decreasing the resolution from N × M to N/2 × M/2 pixels. The new cost matrix is defined by the values ‖x̄ − ȳ‖^p, where x̄ and ȳ are the respective center points of the 2^D grid points.

• If X, Y ⊆ R^D are point clouds with c(x, y) = ‖x − y‖^p, the cost matrix between two subsets can be defined as ‖x̄ − ȳ‖^p, where x̄ and ȳ are (weighted) medians or centers of their respective subsets.

• Generally, for subsets S ⊆ X and T ⊆ Y one can define the cost between S and T as the cost between two selected representatives, that is, c(x̄, ȳ) with x̄ ∈ S and ȳ ∈ T.

• Alternatively, the cost between S and T can be set as the minimum, median, center or mean of the set {c(x, y) : x ∈ S, y ∈ T} ⊆ R+.

See [76] or [35] for more options and details.

Given partitions X_i, Y_i and a cost matrix C_i on the subsets, together with the natural measures µ^{(i)} and ν^{(i)}, this defines the optimal transport instance on scale i:

\[
\begin{aligned}
\min \quad & \sum_{S \in X_i} \sum_{T \in Y_i} C_i(S, T) \cdot \pi_{S,T}^{(i)} \\
\text{subject to} \quad & \sum_{T \in Y_i} \pi_{S,T}^{(i)} = \mu_S^{(i)} \quad \forall S \in X_i \\
& \sum_{S \in X_i} \pi_{S,T}^{(i)} = \nu_T^{(i)} \quad \forall T \in Y_i \\
& \pi_{S,T}^{(i)} \geq 0 \quad \forall S \in X_i,\ T \in Y_i
\end{aligned}
\]

The number of variables in this instance is |X_i| |Y_i| instead of nm. Therefore, we obtain a sequence of successively coarser instances with fewer and fewer variables.
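The grid-based coarsening described above can be sketched in a few lines of Python/NumPy. This is purely illustrative (the function names and helper structure are ours, not from any of the cited implementations); it assumes a square image with an even side length, squared Euclidean costs, and aggregation of 2 × 2 pixel blocks with cell centers as representatives.

```python
import numpy as np

def coarsen_grid_instance(mass, cell_size=1.0):
    """One coarsening step for a measure on an N x N grid (N even).

    Returns the coarse mass image and the coarse cell-center coordinates;
    the coarse cost matrix is then built from these centers.
    """
    n = mass.shape[0]
    # mu_S = sum of the pixel masses in each 2 x 2 block S
    coarse = mass.reshape(n // 2, 2, n // 2, 2).sum(axis=(1, 3))
    # centers of the aggregated cells (the representatives x-bar)
    coords = (np.arange(n // 2) + 0.5) * 2 * cell_size
    centers = np.stack(np.meshgrid(coords, coords, indexing="ij"), axis=-1)
    return coarse, centers.reshape(-1, 2)

def squared_euclidean_costs(x_centers, y_centers):
    """Coarse cost matrix C_i(S, T) = ||x_bar - y_bar||^2."""
    diff = x_centers[:, None, :] - y_centers[None, :, :]
    return (diff ** 2).sum(axis=-1)
```

Applying coarsen_grid_instance repeatedly to both µ and ν yields the sequence of instances (X_i, Y_i, C_i) from the diagram above.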

Propagation  The way a coupling π^{(i+1)} is propagated to π^{(i)} on scale i again depends on the type of problem. Sometimes it is not the coupling itself that is propagated, but rather a neighborhood N. This is a subset of the set of variables (or, equivalently, a subset of X_i × Y_i) chosen with the goal that an optimal coupling for the finer scale i is likely supported on this neighborhood. From a neighborhood N a coupling supported on N can be constructed, for example via a network simplex method. We describe several options from the aforementioned papers, based on neighborhoods or otherwise:

• A simple neighborhood propagation method is to define a neighborhood on scale i through the support of the coupling π^{(i+1)} on scale i + 1 (a code sketch of this rule follows after this list) via

\[
N := \left\{ (S, T) \in X_i \times Y_i : \exists (S', T') \in \operatorname{supp}(\pi^{(i+1)}) \text{ with } S \subseteq S',\ T \subseteq T' \right\}.
\]

• Spatial neighborhood growth [62]: a similar method on grids. The support of π^{(i+1)} is extended by the four adjacent grid points in X_i and Y_i. Then, the neighborhood is propagated as above.

• Shielding neighborhood propagation [76]: The shielding method maintains a shielding neighborhood on each scale and this neighborhood is propagated as above.

• Capacity constraint propagation [35]: By adding randomized capacity constraints on the active transports of a coupling π^{(i+1)}, a larger support is forced, including also the second-best transportation arcs. This support is propagated as above.

• We introduce another propagation technique in Section 5.3.3, which directly propagates a coupling π^{(i+1)} to a coupling π^{(i)} on scale i.
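As announced in the first item of the list, here is a minimal Python sketch of the simple support-based neighborhood propagation. It is an illustration under our own conventions, not code from any of the cited papers: each cell on scale i is assumed to store the index of its parent cell on scale i + 1, and the coarse coupling is given as a dense matrix.

```python
import numpy as np

def propagate_neighborhood(plan_coarse, x_parent, y_parent, threshold=0.0):
    """Build the neighborhood N on scale i from supp(pi^(i+1)) on scale i+1.

    plan_coarse : coupling pi^(i+1), array of shape (|X_{i+1}|, |Y_{i+1}|)
    x_parent    : x_parent[s] = index of the cell S' on scale i+1 that
                  contains cell S (index s) on scale i; analogously y_parent
    Returns the list of index pairs (s, t) whose parents lie in supp(pi^(i+1)).
    """
    support = plan_coarse > threshold            # supp(pi^(i+1))
    return [
        (s, t)
        for s, sp in enumerate(x_parent)
        for t, tp in enumerate(y_parent)
        if support[sp, tp]
    ]
```

A coupling π^{(i)} supported on N can then be obtained by solving the transport problem on scale i restricted to the variables in N, e.g. with a network simplex solver; the same restriction can be reused during refinement, as described next.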

Refinement  Refinement strategies mostly employ iterations of optimal transport methods, such as iterations of the transportation simplex, in order to improve the coupling π^{(i)}. If the propagation was done via a neighborhood, this neighborhood can also be used to restrict the number of variables in the refinement process. Another strategy is potential refinement [35], where the potentials (dual solution) of a coupling are improved. If the transport problem on scale i is an assignment problem, then the primal-dual iterations of the Hungarian method can be used as a refinement strategy.

Chapter 3

DOTmark: A Benchmark for Discrete Optimal Transport

This chapter is largely identical to the paper DOTmark - A Benchmark for Discrete Optimal Transport [78], which is a collaboration with Dominic Schuhmacher and Carsten Gottschlich. Its purpose is twofold. First, we propose a collection of real and simulated images, the DOTmark, that is designed to span a wide range of different mass distributions and serves as a benchmark for testing optimal transport algorithms. The data can be downloaded at www.stochastik.math.uni-goettingen.de/DOTmark/. We invite other researchers to use this benchmark, report their results, and thus help build a more transparent picture of the suitability of different methods for various tasks. The second purpose is to provide a survey and a performance test based on the DOTmark for a cross section of established methods. Since not much code is freely available, we have used our own previous implementations of various methods (done to the best of our knowledge), added an implementation of the recently proposed shielding neighborhood method [76] and let them compete against each other. This also allows us to draw conclusions about the behavior of different methods on different types of input data. In order to make this comparison as meaningful as possible, we restricted ourselves to using only singlescale methods and the squared Euclidean distance as a cost function. We hope this comparison will provide a first spark for a healthy competition of various methods in the public discussion.

3.1 Brief Theoretical Background

For the present context it is sufficient to restrict ourselves to optimal transport on R^d. Let X, Y be subsets of R^d and let µ and ν be probability measures on X and Y, respectively. In this paper we will always have X = Y, but using different notation for domain and target space makes definitions easier to grasp. A transport map T is any (measurable) map X → Y that transforms the measure µ into the measure ν. More precisely, it satisfies µ(T^{-1}(B)) = ν(B) for every measurable B ⊂ Y. A transference plan is a measure π on X × Y with marginals π(· × Y) = µ and π(X × ·) = ν. The set of transference plans from µ to ν is denoted by Π(µ, ν). Any transport map T from µ to ν defines a transference plan π_T from µ to ν as the unique measure satisfying π_T(A × B) = µ(A ∩ T^{-1}(B)) for all measurable A ⊂ X and B ⊂ Y. Not every transference plan π can be represented in this way, because transference plans allow mass from one site x ∈ X to be split between multiple destinations, which is not possible under a transport map. Figure 3.1 shows such an example. We assume that the cost of transporting a unit mass from x ∈ X to y ∈ Y is c_p(x, y) = ‖x − y‖^p for some p ≥ 1. The minimum cost for transferring µ to ν is then given by

\[
C_p(\mu, \nu) = \min_{\pi \in \Pi(\mu, \nu)} \int_{X \times Y} \|x - y\|^p \, d\pi(x, y). \tag{3.1}
\]

Taking the p-th root, we obtain the Wasserstein metric W_p. More precisely, we have
\[
W_p(\mu, \nu) = C_p(\mu, \nu)^{1/p}
\]

for any measures µ and ν that satisfy ∫_X ‖x‖^p dµ(x) < ∞ and ∫_Y ‖y‖^p dν(y) < ∞. In order to evaluate the Wasserstein metric, we need to find an optimal solution to (3.1), i.e., a minimizing transference plan π. This problem is often referred to as the Kantorovich formulation of optimal transport. Note that by Theorem 4.1 in [96] a minimizing π always exists. However, it neither has to be unique nor representable in terms of an optimal transport map.

Often one would like to compare data sets that are available as images from a certain source, e.g. real photography, astronomical imagery, or microscopy data. We may think of such images as discrete measures on a grid. For example, the first two panels in Figure 3.1 show tiny clippings from STED microscopy images of mitochondrial networks. A question of interest might be whether both images stem from the same part of the network, which can in principle be answered by finding an optimal transference plan (third panel in Figure 3.1) and computing the Wasserstein distance. Note that this coarse resolution is not representative for a serious analysis, but was only chosen for illustrative purposes.

Even if the measures we would like to consider are more general probability measures, we can always approximate them (weakly) by a discrete measure, e.g. by considering the empirical distribution of a sample from the general measure or based on a more sophisticated quantization scheme. Lemma 8.3 in [12] characterizes when optimal costs are approximated in this way (e.g. always if X and Y are compact). Theorem 5.20 in [96] and the subsequent discussion give sufficient conditions about the approximation of optimal transference plans.

Assume now that we have discrete measures of the form µ = Σ_{i=1}^m µ_i δ_{x_i} and ν = Σ_{j=1}^n ν_j δ_{y_j} and write c_{ij} = ‖x_i − y_j‖^p. In what follows, we always have m = n, and (x_i)_{1≤i≤m} = (y_j)_{1≤j≤n} form a regular square grid in R², but since it is more intuitive, we keep different notation for source locations and target locations. Let π_{ij} be the amount of mass transported from x_i to y_j.

Figure 3.1: Left panels: Two tiny clippings A (top) and B (bottom) from STED microscopy images of mitochondrial networks. Right panel: The difference A − B of the first two panels with an optimal transference plan for p = 2 superimposed. Arrows show from where to where mass is transported in the optimal transference plan. The colors indicate the amount of mass from dark red (small) to bright yellow (large). Since mass from individual sites is split (indicated by several arrows leaving the site), this transference plan cannot be represented by a transport map.

Then, the problem (3.1) can be rewritten as a linear program:

\[
\begin{aligned}
\text{(DOT)} \qquad \min \quad & \sum_{i=1}^m \sum_{j=1}^n c_{ij} \pi_{ij} \\
\text{subject to} \quad & \sum_{j=1}^n \pi_{ij} = \mu_i \quad \forall i = 1, \ldots, m \\
& \sum_{i=1}^m \pi_{ij} = \nu_j \quad \forall j = 1, \ldots, n \\
& \pi_{ij} \geq 0
\end{aligned}
\]

This is the classic transportation problem from linear programming. Efficient ways of solving this problem for small to medium sized (m and) n have been known since the middle of the last century. However, in the context of modern optimal transport problems it has become necessary to solve such problems efficiently at a scale where (m and) n are many thousands or even tens of thousands and more. Currently, this cannot be done with the classical algorithms and requires utilizing the geometry of the problem in one way or the other.
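For illustration only, the linear program (DOT) can be set up and handed to a generic LP solver in a few lines; the sketch below uses SciPy's HiGHS backend and is meant merely to make the formulation concrete — it is not one of the methods benchmarked in this chapter and does not scale to large instances.

```python
import numpy as np
from scipy.optimize import linprog

def solve_dot(mu, nu, cost):
    """Solve (DOT) exactly as a dense linear program.

    mu   : source masses, shape (m,)
    nu   : target masses, shape (n,), with mu.sum() == nu.sum()
    cost : cost matrix c_ij, shape (m, n)
    Returns the optimal cost and an optimal transport plan pi of shape (m, n).
    """
    m, n = cost.shape
    # row constraints: sum_j pi_ij = mu_i
    A_rows = np.kron(np.eye(m), np.ones((1, n)))
    # column constraints: sum_i pi_ij = nu_j
    A_cols = np.kron(np.ones((1, m)), np.eye(n))
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([mu, nu])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(m, n)
```

This brute-force formulation has mn variables, which is exactly the scaling problem that the specialized and multiscale methods discussed above are designed to avoid.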

3.2 Benchmark

Our philosophy in compiling this benchmark was to represent a wide range of theoretically different structures, while incorporating typical images that are used in practice and/or have been used for previous performance tests in the literature. We refer to it as DOTmark, where DOT stands for discrete optimal transport. The benchmark consists of 10 classes of 10 different images (in what follows sometimes called mass distributions or measures), each of which is available at the 5 different resolutions from 32 × 32 to 512 × 512 (in doubling steps per dimension). This allows for a total of 45 computations of Wasserstein distances between two images for any one class at any fixed resolution. Table 3.1 gives an overview of how the classes were created. Classes 1–7 are random simulations of scenarios based on various probability distributions. Images at different resolutions are generated independently from each other but according to the same laws. Classes 8–10 were obtained by ad-hoc choices of simple geometric shapes, classic test images and images of mitochondria acquired using STED super-resolution microscopy [43, 44, 98]. For geometric shapes and classic test images the various resolutions available are coarsenings of a single image. For the microscopy images different clippings of various sizes have been selected from larger images to obtain the various resolutions. We shifted and scaled the pixel values for all classes and randomly redistributed a small percentage of the mass in order to achieve non-negative integer values at each pixel with an average of 10^5. We chose integer values to make the benchmark (directly) accessible to a wide range of algorithms

#   Name            Description
1   WhiteNoise      i.i.d. uniformly distributed values in [0, 1] at each pixel
2   GRFrough        GRF with σ² = 1, ν = 0.25, γ = 0.05
3   GRFmoderate     GRF with σ² = 1, ν = 1, γ = 0.15
4   GRFsmooth       GRF with σ² = 1, ν = 2.5, γ = 0.3
5   LogGRF          exp-function of a GRF with σ² = 1, ν = 0.5, γ = 0.4
6   LogitGRF        Logistic function of a GRF with σ² = 4, ν = 4.5, γ = 0.1
7   CauchyDensity   Bivariate Cauchy density with random center and a varying scale ellipse
8   Shapes          An ad-hoc choice of simple geometric shapes
9   ClassicImages   Standard greyscale test images used in image processing
10  Microscopy      Clippings from STED microscopy images of mitochondria

Table 3.1: The 10 classes in the DOTmark with details about their creation. GRF stands for Gaussian random field. For technical details and the meaning of the parameters see text.

Figure 3.2 shows the first image of each of the classes 1–6 along with the average histogram over all members of the class. Figure 3.3 shows the complete collection of images in classes 7–10. We provide some more details on how classes 2–6 are generated. The images are simulated from stationary centered Gaussian random fields (GRF) on [0, 1]² with Matérn covariance function k := k_{σ²,ν,γ} : R² × R² → R,

\[
k_{\sigma^2,\nu,\gamma}(x, y) = \sigma^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, \|x - y\|}{\gamma} \right)^{\nu} K_\nu\!\left( \frac{\sqrt{2\nu}\, \|x - y\|}{\gamma} \right),
\]
where K_ν is the modified Bessel function of the second kind of order ν. In brief this means that the pixel values are distributed according to a multivariate normal distribution with mean vector zero and covariance matrix (k(x_i, x_j))_{1≤i,j≤m}, where x_i, 1 ≤ i ≤ m, is an enumeration of the pixel centers.

Figure 3.2: Top row: One representative at resolution 128 × 128 for each of the completely randomly generated classes 1–6. Bottom row: Average histograms of all images at resolution 128 × 128 in classes 1–6. A regular bin width of 5000 was used. The annotated pixel values are in multiples of 10^4.

Figure 3.3: The images in the classes 7–10 at resolution 128 × 128.

The Matérn covariance function is a popular choice in spatial statistics. Its significance comes from the fact that in addition to having parameters σ² > 0 for the variance and γ > 0 for the range of the covariance, it also has a parameter ν > 0 that allows one to control the regularity of the random image created from very rough (ν small) to very smooth (ν large). Accordingly, classes 2–4 go from very rough with short range dependence to quite smooth with long range dependence. Class 5 is rough with long range dependence, which is hard to see from Figure 3.2 because of the exponential function applied to the pixel values. Class 6 is very smooth with medium range dependence and the logistic function ψ : R → [0, 1], ψ(x) = e^x/(1 + e^x) was applied to the pixel values. See [36] for more theoretical details about Gaussian random fields. Our simulations were performed using the R package RandomFields; see [75] and [66]. Histograms 4 and 6 deviate quite a bit from the theoretical histograms expected due to the rescaling and redistribution of mass that we apply in order to obtain mass distributions that are non-negative, integer-valued, and have an average of 10^5. Also, due to long range dependence and smoothness, histogram 4 is based on a sample of much smaller effective size from the normal distribution than histograms 2 and 3. On the whole we consider this a reasonable and versatile benchmark for many (planar and typically grid-based) optimal transport algorithms. It covers a wide selection of types of mass distributions whose comparisons are useful for theoretical or practical investigations. Similar types have been considered in the literature before. Gottschlich and Schuhmacher [39] have considered a sparse version of the WhiteNoise class. Schmitzer [76] considered sums of randomly scaled and positioned Gaussian densities (sometimes filtered by discontinuous masks), which is a somewhat different type of random function generation along the lines of our classes 2–7. Further random or deterministic functions and geometric shapes were considered by Benamou, Carlier, Cuturi, Froese, Nenna, Oberman, Peyré, Ruan and others; see e.g. [8], [7], [62]. The use of real greyscale images as in class 9, but also color images, is abundant in the computer science literature (e.g. [71], [64]), where optimal transport is typically considered on some feature space. Mérigot [57] illustrated and tested his algorithm on greyscale images directly. Biological imagery has been used in [38] (fingerprints in feature space), [35] (brain MRIs) and elsewhere. Another example that is frequently used, in particular in the statistics and machine learning literature, are images from the MNIST handwritten digit database. Due to the low resolution of these images we do not consider them, but we might include other images of handwritten text in later revisions of the benchmark. All in all, we see the current benchmark as a first stable version. We are happy to adapt and extend future versions based on feedback from other researchers.
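As an aside, the Matérn construction above can be reproduced in a few lines without the RandomFields package (which is what was actually used for the benchmark): build the covariance matrix over the pixel centers and draw one multivariate normal sample. The following Python sketch, with our own function names and defaults, is purely illustrative and ignores the subsequent shifting, scaling and mass redistribution.

```python
import numpy as np
from scipy.special import gamma, kv  # kv = modified Bessel function K_nu

def matern_cov(dists, sigma2=1.0, nu=1.0, gam=0.15):
    """Matern covariance k_{sigma^2, nu, gamma} evaluated on a distance matrix."""
    d = np.asarray(dists, dtype=float)
    scaled = np.maximum(np.sqrt(2 * nu) * d / gam, 1e-12)  # avoid 0 * inf at d = 0
    cov = sigma2 * (2 ** (1 - nu) / gamma(nu)) * scaled ** nu * kv(nu, scaled)
    cov[d == 0] = sigma2   # k(x, x) = sigma^2
    return cov

def simulate_grf(n=32, sigma2=1.0, nu=1.0, gam=0.15, rng=None):
    """One sample of a centered GRF on the n x n pixel grid in [0, 1]^2."""
    rng = np.random.default_rng(rng)
    coords = (np.arange(n) + 0.5) / n
    xx, yy = np.meshgrid(coords, coords)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    cov = matern_cov(dists, sigma2, nu, gam)
    return rng.multivariate_normal(np.zeros(n * n), cov).reshape(n, n)
```

For class 5 one would additionally apply exp to the simulated field, and for class 6 the logistic function ψ, as described above.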

3.3 Tested Methods

In what follows, we list the methods that we have tested on the DOTmark. Due to the large number of suggested methods, which unfortunately is not well reflected in the number of user-friendly implementations available, some restrictions had to be made. We chose methods with a good track record, such as the AHA method, as well as some new and promising methods, such as the shielding method. In order to make our comparison as meaningful as possible, we abstained from using methods of a purely approximative nature, such as Sinkhorn scaling [20]. Multiscale versions exist for all tested methods. Sometimes these are tailor-made, like for the AHA and the shielding methods (see [57] and [76]). Sometimes these are just relatively simple but efficient procedures exploiting the fact that all mass distributions considered live on a square grid in R². Such a simple strategy may be as follows: First solve the transport problem on coarsened images (e.g. obtained by adding up the pixel values in contiguous squares of four pixels); then propagate the obtained transport plan in a suitable way so that a feasible transport plan for the finer images is obtained; finally solve the original (fine) problem using this transport plan as a starting value. In our experience this simple strategy already results in an improvement by a factor of 2 to 5 in the transportation simplex at resolution 64 × 64. A more elaborate, efficient, but not entirely rigorous alternative was proposed in [62]. Since the precise variant and implementation of a multiscale method may distort competition and distract from the merit of an algorithm as such, we decided not to use any multiscale methods for this first comparison. All used methods are described in Chapter 2. For more information see the respective sections and references therein.

Transportation Simplex One of the classical optimal transport methods we test in this paper is the transportation simplex. We use the modified row minimum rule to determine a basic feasible solution and in the simplex steps we use the row minimum strategy to find a variable to include into the basis. These methods were chosen due to their consistently good performance. In the test we use our implementation of the transportation simplex in Java.

Shortlist Method Just as with the transportation simplex, we use our Java implementation of the shortlist method for the benchmark. Many routines are shared with the transportation simplex. We use the default parameters presented in [39]. They were chosen with regard to the problems considered in that paper – a version of the class WhiteNoise with the Euclidean distance as cost function (p = 1) and irregular source and target locations.

Shielding Neighborhood Method  For the internal sparse instances we follow the recommendation of the paper introducing this method [76] and use the network simplex solver of CPLEX with a warm start of the previously optimal basis. The Java API is used to create the models and we implemented the shielding neighborhood generation routine in Java as well. One call of the CPLEX solver for a single sparse instance is what we refer to as one iteration of the shielding method in the remainder of this paper. Schmitzer published his own code of the method on his website1. But since we encountered difficulties in getting it to run on our server and since we already had our own well-working implementation of this method in place, we decided to use the latter in our test. Like all the other methods, it was implemented to the best of our knowledge. Since the majority of the runtime is occupied by the internal CPLEX solver, we expect our code to have a similar runtime as the code provided by Schmitzer. Additionally, we tested the shielding method with an adapted implementation of the transportation simplex as internal solver, using the same pivot strategy and simplex step routines as before.

1https://www.ceremade.dauphine.fr/~schmitzer/

AHA Method  Our concrete implementation is largely based on [57], but as with other algorithms we only use the singlescale version for comparability and we follow that paper in using the L-BFGS-B algorithm with Moré–Thuente type line search. Since we use a continuous optimization method, we typically cannot reach a weight vector w* where the gradient of Φ is exactly zero (and hence the image measure of µ is exactly ν), but have to stop when its length is still slightly positive. We thus commit a small controllable error, which we refer to as precision error. In order to make the algorithm applicable to the fully discrete situation we study in the other algorithms, we turn the first image into an absolutely continuous measure µ by interpreting pixel values as masses uniformly distributed over the squared areas represented by the pixels, rather than centered at grid points. Compared to the other methods this leads to slightly different results, a discrepancy we refer to as blurring error.2 Unlike for the other methods we use an implementation that is written mainly in C with some minimal R overhead. The construction of power diagrams was reimplemented in C based on ideas from the CGAL Regular_triangulation_2 package and other sources. For the L-BFGS-B algorithm we used the implementation in the R function optim.

Solvers  For representatives of LP solvers, we used CPLEX and Gurobi. For both we modeled the optimal transport problem as an LP and used the default parameters. This is denoted CPLEX-Def and Gurobi-Def, respectively. Additionally, we tested the network simplex solver CPLEX provides (CPLEX-NWS). Gurobi apparently does not come with a network solver, but as the Gurobi documentation page for methods3 recommends the dual simplex for memory intensive models, we included it in our tests (Gurobi-DS). All models were set up using the Java APIs of the solvers.

2Note that the term “error” is subjective. We might as well declare that we want to solve the semidiscrete problem, in which case all the other methods commit a “concentration error”.
3http://www.gurobi.com/documentation/6.5/refman/method.html#parameter:Method

3.4 Computational Results

All our tests were performed using a single core on a Linux server (AMD Opteron Processor 6140 from 2011 with 2.6 GHz). Note that much better absolute runtimes can be achieved when using modern CPU hardware. For many of the algorithms considerable further improvements are possible by multithreaded implementations that use multiple CPU cores simultaneously. In our experiments we placed the main emphasis on ensuring that the commercial solvers and all of our own implementations were run under the same conditions. In particular, they were all restricted to use only one of 32 available cores, which was realized by the Linux kernel feature cgroups. Pairing any two of the 10 images in each of the 10 classes gives 45 transport problems (“instances”) per class, yielding 450 instances in total. These were all solved at resolutions 32 × 32 and 64 × 64 by each of the described methods using the squared Euclidean metric as cost function. All optimal transport costs returned were checked for correctness. The AHA method is the only procedure where we cannot expect precisely correct results due to the errors described in Section 3.3. These errors are reported in Subsection 3.4.2. All other errors were zero.

3.4.1 Runtimes

The runtimes of the tests are listed in Table 3.2 for 32 × 32 and Table 3.3 for 64 × 64, respectively, averaged over all 45 instances in one class. The average over all classes can be found in the bottom row under 'Overall'. The fastest algorithm for each class is highlighted in bold. Additionally, boxplots for four selected methods are given in Figure 3.4. As the numbers show, the shielding neighborhood method is clearly the fastest algorithm for 32 × 32 instances among the methods tested. It takes hardly more than half the time on average compared to the transportation simplex, the shortlist method and the AHA method. The default solvers of CPLEX and Gurobi perform particularly badly. It is remarkable, however, that the network simplex solver of CPLEX outperforms the default solvers and the Gurobi dual simplex by a huge margin. The performance is still

Class      TPS    SHL   Shielding-CPX  Shielding-TPS   AHA   CPLEX-Def  CPLEX-NWS  Gurobi-Def  Gurobi-DS
WhNoise    1.58   1.38      0.67           1.3        3.28      29.8       5.76        8.1        50.5
GRFrgh     2.16   1.98      1.08           2.3        3.19      43.0       5.94        9.0        50.7
GRFmod     3.45   3.80      1.86           6.5        3.17      77.3       6.14       21.1        49.8
GRFsmth    4.23   5.46      2.66          10.0        4.39     101.9       6.15       36.0        50.4
LogGRF     5.18   6.40      3.00          10.8        6.80     119.9       6.23       49.5        51.7
LogitGRF   4.50   5.26      2.40           7.9        8.49      98.4       6.33       31.0        51.7
Cauchy     4.46   6.06      3.62          12.7        3.76     140.4       6.09       54.1        49.3
Shapes     1.06   1.07      0.92           8.9        1.27       8.9       1.22        5.2         9.1
Classic    3.33   3.26      1.58           5.2        2.01      68.5       6.05       18.0        49.6
Micro      2.34   3.02      1.66           9.4        3.14      35.2       2.74       20.7        22.1
Overall    3.23   3.77      1.94           7.5        3.95      72.3       5.26       25.3        43.5

Table 3.2: Average runtimes on 32 × 32 instances in seconds. The columns represent the methods tested: transportation simplex (TPS), shortlist method (SHL), shielding neighborhood method with CPLEX (CPX) and TPS as internal solvers, AHA method, CPLEX with default (Def) parameters and the network simplex solver (NWS), as well as Gurobi with default parameters and the dual simplex solver (DS).

somewhat worse than our implementations of the transportation simplex and the shortlist method. At resolution 64 × 64 we see a similar picture with some exceptions. The shielding method (with CPLEX as internal solver) is even further ahead of most other algorithms, but at the same time the AHA method, which seems to be scaling much better than the linear programming approaches, has gained even more and in fact shows the best times now for many of the classes. However, keep in mind that the results of this method are not exactly correct and the timing varies according to the stopping criterion one applies (see the next subsection). The CPLEX network solver is with the exception of the classes WhiteNoise and GRFrough quite a bit faster at resolution 64 × 64 than our implementations of the transportation simplex and the shortlist methods. In the two classes Shapes and Microscopy, which have many zeros in the images, it is even competitive with the shielding and AHA methods. Overall, it shows the most

Figure 3.4: Boxplots of the runtimes of selected methods (transportation simplex, Shielding-CPX, AHA, CPLEX-NWS) in seconds. Every box represents 45 computed instances. Left: 32 × 32. Right: 64 × 64.

Class      TPS    SHL   Shielding-CPX  Shielding-TPS   AHA   CPLEX-Def  CPLEX-NWS  Gurobi-Def  Gurobi-DS
WhNoise     74     56        13              65         36      2057       174         311       1657
GRFrgh     153    110        24             153         20      3216       174         473       1659
GRFmod     261    306        51             469         23      4592       190        1971       1634
GRFsmth    306    468        80             778         57      5621       195        3723       1590
LogGRF     439    531        79             756         59      7156       198        4628       1687
LogitGRF   333    362        69             552         77      6024       198        2294       1637
Cauchy     245    397        97             968         30      6336       195        4461       1435
Shapes      73     73        25             893         12       885        40         302        219
Classic    218    246        41             418         18      6298       189        1551       1546
Micro       28     37        26             824         24       352        18         179        114
Overall    213    258        51             588         36      4254       157        1989       1318

Table 3.3: Average runtimes on 64 × 64 instances in seconds. See the caption above for details.

consistent performance both across the various benchmark classes (if effective size of the problem is taken into account) and within each class; see the last row in Figure 3.4. In contrast, the other methods show a much stronger sensitivity with respect to the class considered. Especially for classes 4–7 and to some extent also for class 3, we see higher average computation times and in particular a much wider spread of times including several outliers in Figure 3.4.

3.4.2 Errors of the AHA Method

As described in Section 3.3, the AHA method does not solve the problems we consider here with full accuracy, but makes a blurring error by interpreting pixel values of the source measure µ as uniformly distributed over small squares and a precision error by tackling a continuous minimization problem which for numerical reasons cannot be solved exactly. In Table 3.4 we report the precision error (PE) in terms of the mass in the probability measure µ that is wrongly allocated, as well as the relative

Wasserstein error (RWE) made with the AHA method, i.e.

\[
\frac{W_2^{\mathrm{AHA}}(\mu, \tilde{\nu}) - W_2^{\mathrm{TSP}}(\mu, \nu)}{W_2^{\mathrm{TSP}}(\mu, \nu)},
\]
where AHA and TSP denote the methods used and ν̃ denotes the second marginal of the transference plan returned by AHA. We can see that the precision error is reasonably small. What is more, if we assume that we would have to reroute the wrongly allocated mass roughly by a distance that corresponds to the true Wasserstein distance between µ and ν (this is reasonable in the sense that it is roughly the same order of magnitude as relevant distances measured in the image), we can compare the precision errors to the relative Wasserstein errors and see that the former play only a minor role. Consequently, the RWE is mainly due to the blurring effect. This is corroborated further by the fact that the RWE for the 64 × 64 resolution is considerably smaller than for 32 × 32.

Instance Class   PE 32        PE 64        RWE 32     RWE 64
WhiteNoise       0.3098e-05   0.3400e-05   0.031224   0.027232
GRFrough         0.4261e-05   0.6327e-05   0.018957   0.008058
GRFmoderate      0.9840e-05   1.8471e-05   0.003965   0.001075
GRFsmooth        2.4910e-05   4.0698e-05   0.001303   0.000349
LogGRF           4.6378e-05   5.6221e-05   0.000556   0.000230
LogitGRF         1.6833e-05   3.7144e-05   0.002008   0.000682
CauchyDensity    3.9273e-05   8.2546e-05   0.000988   0.000289
Shapes           1.3435e-05   3.3610e-05   0.003192   0.000994
ClassicImages    0.7797e-05   1.8311e-05   0.005119   0.001512
Microscopy       2.7111e-05   6.1215e-05   0.001243   0.000412
Overall          1.9294e-05   3.5794e-05   0.006855   0.004083

Table 3.4: Average precision error (PE) and relative Wasserstein error (RWE) over the ten classes. See text for details.

3.4.3 Iterations of the Shielding Method

The shielding method solves the optimal transport problem via a sequence of restricted instances. Here we have a look at how many iterations of these instances are necessary. On the scale 32 × 32 the average number of iterations varies between 3.9 for WhiteNoise and 16.6 for CauchyDensity. The numbers for scale 64 × 64 are higher (4.6 through 28.5), but show similar behavior otherwise. Figure 3.5 presents scatterplots of the runtime against the number of iterations. For most classes (green points) we observe a linear scaling of the runtime with the number of iterations. This means that the runtime in each iteration is roughly the same over these classes and we may conclude, since CPLEX has runtimes that scale consistently with model size, that the neighborhood sizes remain more or less constant as well. The only exceptions are the

Figure 3.5: Scatterplots for Shielding-CPX showing the runtimes against the number of iterations for the classes Shapes (blue), Microscopy (red) and the other classes combined (green). Every data point represents one of the 450 computed instances. Left: 32 × 32. Right: 64 × 64.

classes Shapes (blue) and Microscopy (red), where the runtimes are lower than expected from the number of iterations. This can be attributed to the internal solver, which benefits from the lower effective dimension that comes from the zero mass pixels occurring in these two classes. The difference is not as significant as for the global CPLEX network simplex runtimes, since the dimension has already been reduced by the construction of the shielding neighborhoods.

3.5 Discussion

If we look at the different classes, we note that the solvers perform much better on the classes where the number of pixels with mass zero is large (Microscopy and Shapes). This is because they seem to benefit particularly from the reduction of the effective dimension of the problem. The runtimes are very consistent across the other classes, which allows the conclusion that the solvers can only exploit the mathematical structure of the model, but are unable to use geometric features of the input data to their advantage. Other linear programming methods benefit from the lower effective dimension as well, although the difference is not as significant. However, these methods seem to be comparatively faster on classes where most of the transports are rather short (rough structure, such as GRFrough or WhiteNoise), and slower on classes with longer transports (smooth structure, such as GRFsmooth or LogGRF). This can be explained by the initial solution routine, which is shared across many of the tested methods. The greedily selected initial transport plan, which favors short transports, is more likely to be close to optimal in short range classes than in long range classes. The shortlist method, which performs another greedy step in addition when searching for new basis variables within the shortlists, benefits particularly from short transports in the solutions, but not as much as in the sparse examples considered in [39] with the Euclidean metric as cost function. The runtimes of the AHA method are relatively consistent. They are only considerably shorter for the class Shapes. This may be due to the fact that in these instances there are only a few different mass values in the images and the mass is uniformly distributed over large areas. Interestingly, a comparison of the transportation simplex and the CPLEX network simplex reveals that the performance of the transportation simplex is better on 32 × 32 instances, while at resolution 64 × 64 the opposite is the case. This can be explained by looking at the two methods at hand. Although details of the CPLEX network simplex solver are not known, it is safe to assume that the simplex steps are implemented very efficiently. On the other hand, the transportation simplex has the advantage of an easily obtainable good initial solution, whereas in the network simplex a preceding simplex phase is necessary. This makes the TPS perform better on smaller instances. On the higher resolution our results suggest that the advantage of a strong initial solution is not as influential as the efficient simplex steps. Considering the small disparity between the transportation simplex and the CPLEX network simplex runtimes, the difference in performance between the two internal solvers for the shielding neighborhood method is surprisingly large. This is due to the fact that initial solutions to the interior models are available in the shielding method and thus the first phase of the network simplex is not necessary. This is why the initial advantage of the transportation simplex disappears and hence using CPLEX as the internal solver yields much lower runtimes. Another observation worth mentioning is that the runtimes, and therefore numbers of iterations, of the shielding method for the randomly created classes 1–6 agree very well with the ranges of dependence within the data of the classes.
The class with the lowest runtimes, WhiteNoise, has no dependence at all, whereas the classes with long range dependence, GRFsmooth and LogGRF, belong to the more difficult classes for the shielding method to solve. That seems to indicate that constructing small shielding neighborhoods prevents larger updates to the current solution per iteration, which might be required in these instances. This may also contribute to the comparatively long and inconsistent runtimes on the class CauchyDensity. Based on this first comparison of singlescale methods, we give the following recommendations for large discrete transport problems (on grids):

1. If you have to be precise and have access to the IBM CPLEX software, use the shielding method in combination with CPLEX.

2. If you have to be reasonably precise but can afford a small controllable error, use the AHA method. This is especially advisable if you require a very high resolution for the mass distribution at the source, as this comes for the AHA method at virtually no extra cost.

3. Neither method is widely available or typically very efficient for costs other than the squared Euclidean metric. So, for other costs direct use of a conventional simplex algorithm or the shortlist method may be preferable.

4. If you use a solver directly, be very careful which one you use and that you call the most appropriate function. Especially when using CPLEX, make sure that you use the network simplex solver and that you set up the input as a network structure. If you can, solve the model with a warm start.

We emphasize again that the absolute runtimes given in Tables 3.2 and 3.3 should not be taken at face value and that actual computations on modern CPUs are typically much faster. While the relative comparison presented here is justified to the best of our knowledge, it allows only limited conclusions about the performance of multiscale variants and multithreaded implementations of the different methods.

3.6 Outlook

By providing this benchmark we hope to improve the comparability of different methods for solving discrete optimal transport problems. Contributions or suggestions for extending the benchmark are welcome. In particular we plan to include data sets concentrated on more general grid structures and especially with irregular support points if there is enough public interest. The R package transport [80] offers user-friendly implementations that are mostly written in C of three of the methods presented here (transportation simplex, shortlist and the AHA method). It will be updated in the near future to include additional state-of-the-art methods. Solving transport problems exactly for larger images (e.g. with 128, 256 or 512 pixels in each dimension) is still computationally very demanding, even for state-of-the-art methods. Efficient solutions of such large problems could pave the way for a new class of algorithms in image processing. In the area of computer vision and image processing, important applications include image enhancement, denoising, inpainting, feature extraction and compression. In one subdomain of image processing, these challenges are approached by decomposing images into two or three parts [93], e.g. a cartoon component, which contains piecewise constant or piecewise smooth parts, a texture component, which captures oscillating patterns, and a noise component, which contains small scale objects (corresponding to high frequency parts in the Fourier domain). After the decomposition step, the texture component can be utilized for applications such as fingerprint segmentation [94]. Image decompositions are obtained by formulating and solving minimization problems that impose suitable norms on the respective components. The total variation (TV) norm is commonly used for the cartoon component and the G-norm [58] for the texture component. Recent works by Brauer and Lorenz [17] and by Lellmann et al. [52] connect the G-norm to solutions of transport problems. Typically, the minimization problems described above are solved iteratively and are computationally expensive. It is conceivable to formulate transport norms for image decomposition, which would require solving a large transport problem in each iteration. Thus, efficient algorithms for optimal transport are a prerequisite for future research in this direction.

Chapter 4

Probabilistic Approximation with Exact Solvers

Except for minor adjustments, this chapter is identical to the publication Optimal Transport: Fast Probabilistic Approximation with Exact Solvers [90], which is joint work with Max Sommerfeld, Yoav Zemel and Axel Munk, and many parts of this paper also appear in Max Sommerfeld’s doctoral thesis [88]. The proofs of the theoretical results in Section 4.3 are omitted here. They can be found in the appendix of the paper [90].

4.1 Introduction

Optimal transport distances compare probability measures by incorporating a suitable ground distance on the underlying space, typically driven by the specific application, e.g. Euclidean distance. This often makes it preferable to competing distances such as total-variation or χ²-distances, which are oblivious to any metric or similarity structure on the ground space. Note that total variation is the Wasserstein distance with respect to the trivial metric, which usually does not carry the geometry of the underlying ground space. In this setting, optimal transport distances have a clear and intuitive interpretation as the amount of ‘work’ required to transport one probability distribution onto the other. This notion is typically well-aligned with human perception of similarity [71].

Figure 4.1: Relative error and relative runtime compared to the exact computation of the proposed scheme. Optimal transport distances and its approximations were computed between images of different sizes (32 × 32, 64 × 64, 128 × 128). Each point represents a specific parameter choice in the scheme and is a mean over different problem instances, solvers and cost exponents. For the relative runtimes the geometric mean is reported. For details on the parameters see Figure 4.2.

4.1.1 Computation

The outstanding theoretical and practical performance of optimal transport distances is contrasted by its excessive computational cost. For example, optimal transport distances can be computed with an auction algorithm [10]. For two probability measures supported on N points this algorithm has a worst-case run time of O(N³ log N). Other methods like the transportation simplex have sub-cubic empirical average runtime (compare [39]), but exponential worst-case runtimes. Therefore, many attempts have been made to design improved algorithms. We give some selective references: Ling and Okada [53] proposed a specialized algorithm for L1-ground distance and X a regular grid and report an empirical runtime of O(N²). Gottschlich and Schuhmacher [39] improved existing general purpose algorithms by initializing with a greedy heuristic. Their

Shortlist algorithm achieves an empirical average runtime of the order O(N^{5/2}). Schmitzer [76] solves the optimal transport problem by solving a sequence of sparse problems. The theoretical runtime of his algorithm is not known, but it exhibits excellent performance on two-dimensional grids (see [78] or Chapter 3). Despite these efforts, many practically relevant problems remain well outside the scope of available algorithms. See [78] or Chapter 3 for an overview and a numerical comparison of state-of-the-art algorithms for discrete optimal transport. This is true in particular for two- or three-dimensional images and spatio-temporal imaging, which constitute an important area of potential applications. Here, N is the number of pixels or voxels and is typically of size 10^5 to 10^7. Naturally, this problem is aggravated when many distances have to be computed as is the case for Wasserstein barycenters [1, 21], which have become an important use case. To bypass the computational bottleneck, many surrogates for optimal transport distances that are more amenable to fast computation have been proposed. Shirdhonkar and Jacobs [84] proposed to use an equivalent distance based on wavelets that can be computed in linear time but cannot be calibrated to approximate the Wasserstein distance with arbitrary accuracy. Pele and Werman [64] threshold the ground distance to reduce the complexity of the underlying linear program, obtaining a lower bound for the exact distance. Cuturi [20] altered the optimization problem by adding an entropic penalty term in order to use faster and more stable algorithms, see also [2]. Bonneel and others [14] consider the 1-D Wasserstein distances of radial projections of the original measures, exploiting the fact that, in one dimension, computing the Wasserstein distance amounts to sorting the point masses and hence has quasi-linear computation time.

4.1.2 Contribution

We do not propose a new algorithm to solve the optimal transport problem. Instead, we propose a simple probabilistic scheme as a meta-algorithm that can use any algorithm (e.g. those mentioned above) solving finitely supported optimal transport problems as a black-box back-end and gives a random but fast approximation of the exact distance. This scheme

a) is extremely easy to implement, to parallelize and to tune towards higher accuracy or shorter computation time as desired;

b) can be used with any algorithm for transportation problems as a back-end, including general LP solvers, specialized network solvers and algorithms using entropic penalization [20];

c) comes with theoretical non-asymptotic guarantees for the approximation error of the Wasserstein distance — in particular, this error is independent of the size of the original problem in many important cases, including images;

d) works well in practice. For example, the Wasserstein distance between two 128²-pixel images can typically be approximated with a relative error of less than 5% in only 1% of the time required for exact computation.

4.2 Problem and Algorithm

Although our meta-algorithm is applicable to exact solvers for any optimal transport distance between probability measures, for example the Sinkhorn distance [20], the theory we present here concerns the Kantorovich transport distance [46], often also denoted as Wasserstein distance.

Wasserstein Distance Consider a fixed finite space X = {x1, . . . , xN } with a metric d : X × X → [0, ∞). Every probability measure on X is given by a vector r in

\[
\mathcal{P}_{\mathcal{X}} = \Big\{ r = (r_x)_{x \in \mathcal{X}} \in \mathbb{R}_{\geq 0}^{\mathcal{X}} : \sum_{x \in \mathcal{X}} r_x = 1 \Big\},
\]
via P_r({x}) = r_x. We will not distinguish between the vector r and the measure it defines. For p ≥ 1, the p-th Wasserstein distance between two

probability measures r, s ∈ PX is defined as

\[
W_p(r, s) = \left( \min_{w \in \Pi(r, s)} \sum_{x, x' \in \mathcal{X}} d^p(x, x') \, w_{x, x'} \right)^{1/p}, \tag{4.1}
\]
where Π(r, s) is the set of all probability measures on X × X with marginal distributions r and s, respectively. The minimization in (4.1) can be written as a linear program

\[
\min \sum_{x, x' \in \mathcal{X}} w_{x, x'} \, d^p(x, x') \quad \text{s.t.} \quad \sum_{x' \in \mathcal{X}} w_{x, x'} = r_x, \quad \sum_{x \in \mathcal{X}} w_{x, x'} = s_{x'}, \quad w_{x, x'} \geq 0, \tag{4.2}
\]
with N² variables w_{x,x'} and 2N constraints, where the weights d^p(x, x') are known and have been precalculated.

Approximating the Wasserstein Distance The idea of the proposed algorithm is to replace a probability measure r ∈ P(X ) with an empirical measure rˆS based on i.i.d. picks X1,...,XS ∼ r for some integer S:

\[
\hat{r}_{S,x} = \frac{1}{S} \, \# \{ k : X_k = x \}, \qquad x \in \mathcal{X}. \tag{4.3}
\]

Likewise, replace s with sˆS. Then, use the empirical optimal transport distance

(EOT) Wp(rˆS, sˆS) as a random approximation of Wp(r, s).

Algorithm 1: Statistical approximation of Wp(r, s)

1: Input: Probability measures r, s ∈ P_X, sample size S and number of repetitions B
2: for i = 1, ..., B do
3:     Sample i.i.d. X_1, ..., X_S ∼ r and independently Y_1, ..., Y_S ∼ s
4:     r̂_{S,x} ← #{k : X_k = x}/S for all x ∈ X
5:     ŝ_{S,x} ← #{k : Y_k = x}/S for all x ∈ X
6:     Compute Ŵ^{(i)} ← W_p(r̂_S, ŝ_S)
7: end for
8: Return: Ŵ_p^{(S)}(r, s) ← B^{-1} Σ_{i=1}^B Ŵ^{(i)}

In each of the B iterations in Algorithm 1, the Wasserstein distance between two sets of S point masses has to be computed. For the exact Wasserstein distance, two measures on N points need to be compared. If we take for example the super-cubic runtime of the auction algorithm as a basis, Algorithm 1 has worst-case runtime

O(B S³ log S) compared to O(N³ log N) for the exact distance. This means a dramatic reduction of computation time if S (and B) are small compared to N. The application of Algorithm 1 to other optimal transport distances is straightforward. One can simply replace W_p(r̂_S, ŝ_S) with the desired distance, e.g. the Sinkhorn distance [20], see also our numerical experiments below. Further, the algorithm can be applied to non-discrete instances as long as we can sample from the measures. However, the theoretical results below only apply to the EOT on a finite ground space X.
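A minimal Python sketch of Algorithm 1 is given below. It treats the exact solver as a black box (here represented by a user-supplied callable), in line with the meta-algorithm character of the scheme; the function name and interface are ours and not part of any published implementation.

```python
import numpy as np

def approx_wasserstein(r, s, locations, solve_wp, S, B, p=2, rng=None):
    """Algorithm 1: subsampling approximation of W_p(r, s).

    r, s      : probability vectors over the N support points
    locations : array of the N support points (passed through to the solver)
    solve_wp  : callable (r_hat, s_hat, locations, p) -> W_p(r_hat, s_hat),
                any exact (or approximate) OT solver used as back-end
    S, B      : sample size and number of repetitions
    """
    rng = np.random.default_rng(rng)
    n = len(r)
    estimates = np.empty(B)
    for i in range(B):
        # empirical measures r_hat_S and s_hat_S from S i.i.d. draws
        x_idx = rng.choice(n, size=S, p=r)
        y_idx = rng.choice(n, size=S, p=s)
        r_hat = np.bincount(x_idx, minlength=n) / S
        s_hat = np.bincount(y_idx, minlength=n) / S
        estimates[i] = solve_wp(r_hat, s_hat, locations, p)
    return estimates.mean()   # W_hat_p^(S)(r, s)
```

Only the at most S sampled support points carry mass in each repetition, so the back-end solver effectively works with S × S cost entries, which is where the runtime reduction comes from.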

4.3 Theoretical Results

We give general non-asymptotic guarantees for the quality of the approximation Ŵ_p^{(S)}(r, s) = B^{-1} Σ_{i=1}^B W_p(r̂_{S,i}, ŝ_{S,i}) (where r̂_{S,i} are independent empirical measures of size S from r; see Algorithm 1) in terms of the expected

L1-error. That is, we give bounds of the form

\[
\mathbb{E} \left| \hat{W}_p^{(S)}(r, s) - W_p(r, s) \right| \leq g(S, \mathcal{X}, p), \tag{4.4}
\]
for some function g. We are particularly interested in the dependence of the bound on the size N of X and on the sample size S as this determines how the number of sampling points S (and hence the computational effort of Algorithm 1) must be increased for increasing problem size N in order to retain (on average) a certain approximation quality. In a second step, we obtain deviation inequalities for Ŵ^{(S)}(r, s) via concentration of measure techniques.

Related Work  The question of the convergence of empirical measures to the true measure in expected Wasserstein distance has been considered in detail by Boissard and Le Gouic [13] and Fournier and Guillin [28]. The case of the underlying measures being different (that is, the convergence of E W_p(r̂_S, ŝ_S) to W_p(r, s) when r ≠ s) has not been considered to the best of our knowledge. Theorem 4.3.1 is reminiscent of the main result of [13]. However, we give a result here, which is explicitly tailored to finite spaces and makes explicit the dependence of the constants on the size N of the underlying set X. In fact, when we consider finite spaces X which are subsets of R^D later in Theorem 4.3.4, we will see that in contrast to the results of [13], the rate of convergence (in S) does not change when the dimension gets large, but rather the dependence of the constants on N changes. This is a valuable insight as our main concern here is how the subsample size S (driving the computational cost) must be chosen when N grows in order to retain a certain approximation quality.

4.3.1 Expected Absolute Error

Recall that, for δ > 0 the covering number N (X , δ) of X is defined as the minimal number of closed balls with radius δ and centers in X that is needed to cover X . Note that in contrast to continuous spaces, N (X , δ) is bounded by N for all δ > 0.

Theorem 4.3.1. Let rˆS be the empirical measure obtained from i.i.d. samples

X_1, ..., X_S ∼ r, then
\[
\mathbb{E}\big[ W_p^p(\hat{r}_S, r) \big] \leq E_q / \sqrt{S}, \tag{4.5}
\]
where the constant E_q := E_q(X, p) is given by

\[
E_q = 2^{p-1} q^{2p} (\operatorname{diam}(\mathcal{X}))^p \left( q^{-(l_{\max}+1)p} \sqrt{N} + \sum_{l=0}^{l_{\max}} q^{-lp} \sqrt{\mathcal{N}\big(\mathcal{X}, q^{-l} \operatorname{diam}(\mathcal{X})\big)} \right) \tag{4.6}
\]
for any 2 ≤ q ∈ N and l_max ∈ N.

Remark 4.3.2. Since Theorem 4.3.1 holds for any integer q ≥ 2 and lmax ∈ N, they can be chosen freely to minimize the constant Eq. In the proof they appear as the branching number and depth of a spanning tree that is constructed on X (see appendix).

Based on Theorem 4.3.1, we can formulate a bound for the mean approxi- mation error of Algorithm 1. A mean squared error version is given below, in Theorem 4.3.8.

Theorem 4.3.3. Let Ŵ_p^{(S)}(r, s) be as in Algorithm 1 for any choice of B ∈ N. Then for every integer q ≥ 2

\[
\mathbb{E} \left| \hat{W}_p^{(S)}(r, s) - W_p(r, s) \right| \leq 2 E_q^{1/p} S^{-1/(2p)}. \tag{4.7}
\]

Proof. The statement is an immediate consequence of the reverse triangle inequality for the Wasserstein distance, Jensen’s inequality and Theorem 4.3.1,

\[
\mathbb{E} \left| \hat{W}_p^{(S)}(r, s) - W_p(r, s) \right| \leq \mathbb{E} \left[ W_p(\hat{r}_S, r) + W_p(\hat{s}_S, s) \right]
\leq \mathbb{E}\big[ W_p^p(\hat{r}_S, r) \big]^{1/p} + \mathbb{E}\big[ W_p^p(\hat{s}_S, s) \big]^{1/p} \leq 2 E_q^{1/p} / S^{1/(2p)}.
\]

Measures on Euclidean Space While the constant Eq in Theorem 4.3.1 may be difficult to compute or estimate in general, we give explicit bounds in the case when X is a finite subset of a Euclidean space. They exhibit the dependence of the approximation error on N = |X |. In particular, it comprises the case when the measures represent two- or more dimensional images.

Theorem 4.3.4. Let X be a finite subset of R^D with the usual Euclidean metric. Then,
\[
E_2 \leq \min\!\big(2^{D/2}, 2^p\big) \, \min\!\big(2^{D/2} D^{D/4}, D^{p/2}\big) \, 2^{3p} \, (\operatorname{diam}(\mathcal{X}))^p \cdot C_{D,p}(N),
\]
where N = |X| and

\[
C_{D,p}(N) =
\begin{cases}
3 & \text{if } D/2 - p < 0, \\
1 + \frac{1}{2p} \log_2 N & \text{if } D/2 - p = 0, \\
1 + 2 N^{\frac{1}{2}\left(1 - \frac{2p}{D}\right)} & \text{if } D/2 - p > 0.
\end{cases} \tag{4.8}
\]

One can obtain bounds for Eq, q > 2 (see the proof), but the choice q = 2 leads to the smallest bound. In particular, we have for the most important cases p = 1, 2:

Corollary 4.3.5. Under the conditions of Theorem 4.3.4,

 D/2 3 · 2 if D < 2,  1/2  p = 1 =⇒ E2 ≤ 8D diam(X ) · 2 + log2 N if D = 2,   1 (1− 2 ) 2 + 4N 2 D if D > 2.

 D/2 3 · 2 if D < 4,  2  p = 2 =⇒ E2 ≤ 64D( diam(X )) · 4 + log2 N if D = 4,   1 (1− 4 ) 4 + 8N 2 D if D > 4.

Theorem 4.3.4 gives control over the error made by the approximation Ŵ_p^{(S)}(r, s) of W_p(r, s). Of particular interest is the behavior of this error as N gets large (e.g. for high resolution images). We distinguish three cases. In the low-dimensional case p' = D/2 − p < 0, we have C_{D,p}(N) = O(1) and the approximation error is O(S^{−1/(2p)}) independent of the size of the image. In the critical case p' = 0 the approximation error is no longer independent of N but is of order O(log(N) S^{−1/(2p)}). Finally, in the high-dimensional case the dependence on N becomes stronger with an approximation error of order

 1   (1− 2p )  2p N D O    .  S 

In all cases one can choose $S = o(N)$ while still guaranteeing a vanishing approximation error for $N \to \infty$. In practice, this means that S can typically be chosen (much) smaller than N to obtain a good approximation of the Wasserstein distance. In particular, this implies that for low-dimensional applications with two- or three-dimensional histograms (for example greyscale images, where N corresponds to the number of pixels / voxels and r, s correspond to the grey value distribution after normalization), the approximation error is essentially not affected by the size of the problem when p is not too small, e.g. p = 2. While the three cases in Theorem 4.3.4 resemble those given by Boissard and Le Gouic [13], the rate of convergence in S as seen in Theorem 4.3.1 is $O(S^{-1/2})$, regardless of the dimension of the underlying space $\mathcal{X}$. The constant depends on D, however, roughly at the polynomial rate $D^{p/2}$ and through $C_{D,p}(N)$. It is also worth mentioning that by considering the dual transport problem, one can invoke the framework of Shalev-Shwartz and others [82], particularly Theorem 7. However, the dependence on S and N and the constants are not easily accessible from that paper.
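Since the practical question raised here is how large S must be for a given N, a small numerical sketch can make the bound concrete. The following R snippet (not part of the thesis; function names are illustrative) evaluates the constant $\mathcal{E}_2$ via Corollary 4.3.5 for p = 2 and inverts the mean-error bound of Theorem 4.3.3 for a target error; these worst-case constants are loose, so the value of the bound lies mainly in its scaling in S and N.

```r
# Illustrative helpers (not from the thesis): the E_2 bound of Corollary 4.3.5
# for p = 2, the mean-error bound of Theorem 4.3.3 (q = 2, p = 2), and the
# smallest S that pushes that bound below a target eps.
E2_bound <- function(N, D, diam) {
  tail <- if (D < 4) 3 * 2^(D / 2)
          else if (D == 4) 4 + log2(N)
          else 4 + 8 * N^(0.5 * (1 - 4 / D))
  64 * D * diam^2 * tail
}
mean_error_bound <- function(S, N, D, diam) 2 * sqrt(E2_bound(N, D, diam)) * S^(-1 / 4)
S_for_target     <- function(eps, N, D, diam) ceiling((2 * sqrt(E2_bound(N, D, diam)) / eps)^4)

# Example: a 128 x 128 image, i.e. N = 128^2 grid points in D = 2 dimensions.
# The constants are conservative, so the absolute numbers are large; the point
# is that for D < 2p the bound does not grow with N.
mean_error_bound(S = 4000, N = 128^2, D = 2, diam = sqrt(2) * 127)
```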

Remark 4.3.6. The results presented here extend to the case where $\mathcal{X}$ is a bounded, countable subset of $\mathbb{R}^D$. However, $\mathcal{E}_q$ can only be bounded in the low-dimensional case ($D/2 - p < 0$) due to Theorem 4.3.4 and is infinite otherwise.

4.3.2 Concentration Bounds

Based on the bounds for the expected approximation error we now give non-asymptotic guarantees for the approximation error in the form of deviation bounds using standard concentration of measure techniques.

Theorem 4.3.7. If $\hat{W}_p^{(S)}(r, s)$ is obtained from Algorithm 1, then for every $z \ge 0$
$$\mathbb{P}\left[ \bigl| \hat{W}_p^{(S)}(r, s) - W_p(r, s) \bigr| \ge z + \frac{2 \mathcal{E}_q^{1/p}}{S^{1/(2p)}} \right] \le 2 \exp\left( - \frac{S B z^{2p}}{8 \operatorname{diam}(\mathcal{X})^{2p}} \right). \tag{4.9}$$

Note that while the mean approximation quality $2\mathcal{E}_q^{1/p}/S^{1/(2p)}$ only depends on the subsample size S, the stochastic variability (see the right-hand side term in (4.9)) depends on the product SB. This means that the repetition number B cannot decrease the expected error, but it decreases the magnitude of fluctuation around it. From these concentration bounds we can obtain a mean squared error version of Theorem 4.3.3:

Theorem 4.3.8. Let $\hat{W}_p^{(S)}(r, s)$ be as in Algorithm 1 for any choice of $B \in \mathbb{N}$. Then for every integer $q \ge 2$ the mean squared error of the estimate can be bounded as
$$\mathbb{E}\left[ \left( \hat{W}_p^{(S)}(r, s) - W_p(r, s) \right)^2 \right] \le 18\, \mathcal{E}_q^{2/p} S^{-1/p} = O(S^{-1/p}).$$

Remark 4.3.9. The power 2 can be replaced by any $\alpha \le 2p$ with rate $S^{-\alpha/(2p)}$, as can be seen from a straightforward modification of the first lines of the proof.

For example, in view of Theorem 4.3.4, when $\mathcal{X}$ is a finite subset of $\mathbb{R}^D$, with the (optimal) choice q = 2 we obtain

$$\mathbb{E}\left[ \left( \hat{W}_p^{(S)}(r, s) - W_p(r, s) \right)^2 \right] \le 3^2\, 2^9\, D\, C_{D,p}(N)^{2/p} \bigl[\operatorname{diam}(\mathcal{X})\bigr]^2 S^{-1/p},$$
with the constant $C_{D,p}(N)$ given in (4.8). Thus, we qualitatively observe the same dependence on N as in Theorem 4.3.4; e.g., the mean squared error is independent of N when $D < 2p$.
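For completeness, the right-hand side of the deviation bound (4.9) is elementary to evaluate; the following small R function (an illustration, not part of the thesis) returns the probability bound for given z, S, B, p and diameter.

```r
# Tail bound of Theorem 4.3.7 / equation (4.9): probability that the estimate
# misses W_p by more than z + 2 * E_q^(1/p) / S^(1/(2p)).
deviation_prob_bound <- function(z, S, B, p, diam) {
  2 * exp(-S * B * z^(2 * p) / (8 * diam^(2 * p)))
}

# Example values: S = 1000 subsampled points, B = 2 repetitions, p = 1,
# a space of diameter 1.
deviation_prob_bound(z = 0.1, S = 1000, B = 2, p = 1, diam = 1)
```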

4.4 Simulations

This section covers the numerical findings of the simulations. Runtimes and returned values of Algorithm 1 for each back-end solver are reported in relation to the results of that solver on the original problem. Four different solvers were tested.

4.4.1 Simulation Setup

The setup of our simulations is identical to that of [78] and Chapter 3. One single core of a Linux server (AMD Opteron Processor 6140 from 2011 with

2.6 GHz) was used. The original and subsampled instances were run under the same conditions.

Three of the four methods featured in this simulation are exact linear programming solvers. The transportation simplex is a modified version of the network simplex solver tailored towards optimal transport problems. Details can be found for example in [54]. The shortlist method [39] is a modification of the transportation simplex that performs an additional greedy step to quickly find a good initial solution. The parameters were chosen as the default parameters described in that paper. The third method is the network simplex solver of CPLEX (www.ibm.com/software/commerce/optimization/cplex-optimizer/). For the transportation simplex and the shortlist method the implementations provided in the R package transport [80] were used. The models for the CPLEX solver were created and solved via the R package Rcplex [18].

Additionally, the Sinkhorn scaling algorithm [20] was tested in our simulation. This method computes an entropy regularized optimal transport distance. The regularization parameter was chosen according to the heuristic in [20]. Note that the Sinkhorn distance is not covered by the theoretical results from Section 4.3. The errors reported for the Sinkhorn scaling are relative to the values returned by the algorithm on the full problems, which themselves differ from the actual Wasserstein distances.

The instances of optimal transport considered here are discrete instances of two different types: regular grids in two dimensions, that is, images in various resolutions, as well as point clouds in $[0,1]^D$ with dimensions D = 2, 3 and 4. For the image case, three instances were chosen from the DOTmark, which contains images of various types intended to be used as optimal transport instances in the form of two-dimensional histograms: two images from each of the classes White Noise, Cauchy Density, and Classic Images, which are then treated in the three resolutions 32 × 32, 64 × 64 and 128 × 128. Images are interpreted as finitely supported measures. The mass of a pixel is given by the greyscale value and the support of the measure is the grid $\{1, \ldots, R\} \times \{1, \ldots, R\}$ for an image with resolution R × R. In the White Noise class the greyscale values of the pixels are independent of each other, the Cauchy Density images show bivariate Cauchy densities with random centers and varying scale ellipses, while Classic Images contains greyscale test images. See [78] or Chapter 3 for further details on the different image classes and example images. The instances were chosen to cover different types of images, while still allowing for the simulation of a large variety of parameters for subsampling.

The point cloud type instances were created as follows: the support points of the measures are independently, uniformly distributed on $[0,1]^D$. The number of points N was chosen as $32^2$, $64^2$ and $128^2$ in order to match the sizes of the grid-based instances. For each choice of D and N, three instances were generated, corresponding to the three image types used in the grid-based case. Two measures on the points are drawn from the Dirichlet distribution with all parameters equal to one. That means the masses on different points are independent of each other, similar to the white noise images. To create point cloud versions of the Cauchy Density and Classic Images classes, the greyscale values of the same images were used to obtain the mass values for the support points. In three and four dimensions, the product measure of the images with their sum of columns and with themselves, respectively, was used.
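A point cloud instance of the Dirichlet type described above can be generated in a few lines; the following R sketch (illustrative, not the code used for the simulations) draws uniform support points and Dirichlet(1, ..., 1) masses by normalizing independent Exp(1) variables.

```r
# Generate one point cloud measure: N uniform support points in [0,1]^D with
# Dirichlet(1,...,1) masses (sampled as normalized independent Exp(1) draws).
make_dirichlet_instance <- function(N, D) {
  support <- matrix(runif(N * D), nrow = N, ncol = D)
  mass <- rexp(N)
  mass <- mass / sum(mass)
  list(support = support, mass = mass)
}

r_inst <- make_dirichlet_instance(N = 32^2, D = 2)
s_inst <- make_dirichlet_instance(N = 32^2, D = 2)
```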

All original instances were solved by each back-end solver in each resolution for the values p = 1, p = 2, and p = 3 in order to be compared to the approximative results for the subsamples in terms of runtime and accuracy, except for CPLEX, where the 128 × 128 instances could not be solved due to memory limitations. Algorithm 1 was applied to each of these instances with parameters S ∈ {100, 500, 1000, 2000, 4000} and B ∈ {1, 2, 5}. For every combination of instance and parameters, the subsampling algorithm was run 5 times in order to mitigate the randomness of the results.
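To fix ideas, the subsampling scheme itself can be sketched in a few lines of R. This is only a schematic reading of Algorithm 1 as described in Section 4.3 and above (S i.i.d. samples from each measure, the back-end solver applied to the subsampled instance, B repetitions averaged, as suggested by the discussion of Figure 4.4); the solver interface `solve_wp(mass_r, mass_s, costm, p)` is a hypothetical stand-in for whichever back-end solver is used.

```r
# Schematic sketch of the subsampling scheme; `solve_wp` is a hypothetical
# back-end solver returning W_p for two mass vectors and a cost matrix.
subsample_wp <- function(mass_r, mass_s, costm, p, S, B, solve_wp) {
  estimates <- numeric(B)
  for (b in seq_len(B)) {
    idx_r <- sample(length(mass_r), size = S, replace = TRUE, prob = mass_r)
    idx_s <- sample(length(mass_s), size = S, replace = TRUE, prob = mass_s)
    emp_r <- tabulate(idx_r, nbins = length(mass_r)) / S   # empirical measure of r
    emp_s <- tabulate(idx_s, nbins = length(mass_s)) / S   # empirical measure of s
    supp_r <- which(emp_r > 0)                             # restrict to the sampled support
    supp_s <- which(emp_s > 0)
    estimates[b] <- solve_wp(emp_r[supp_r], emp_s[supp_s],
                             costm[supp_r, supp_s, drop = FALSE], p)
  }
  mean(estimates)   # average over the B repetitions
}
```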

Since the linear programming solvers had a very similar performance on the grid-based instances (see below), only one of them - the transportation simplex - was tested on the point cloud instances.

Figure 4.2: Relative errors $|\hat{W}_p^{(S)}(r, s) - W_p(r, s)| / W_p(r, s)$ vs. relative runtimes $\hat{t}/t$ for different parameters S and B and different problem sizes (panels: 32×32, 64×64, 128×128) for images. $\hat{t}$ is the runtime of Algorithm 1 and $t$ is the runtime of the respective back-end solver without subsampling.

Figure 4.3: Relative errors vs. relative runtimes for different parameters S and B and different problem sizes for point clouds. The number of support points matches the number of pixels in the images.

4.4.2 Computational Results

As mentioned before, all results of Algorithm 1 are relative to the results of the methods applied to the original problems. We are mainly interested in the reduction in runtime and in the accuracy of the returned values. Many important results can be observed in Figures 4.2 and 4.3. The points in the diagrams represent averages over the different methods, instances, and multiple tries, but are separated by resolution and by the choices of the parameters S and B in Algorithm 1.

For images we observe a decrease in relative runtimes with higher resolution, while the average relative error is independent of the image resolution. In the point cloud case, however, the relative error increases slightly with the instance size. The number S of sampled points seems to considerably affect the relative error. An increase of the number of points results in more accurate values, with average relative errors as low as about 3% for S = 4000, while still maintaining a speedup of two orders of magnitude on 128 × 128 images. Lower sample sizes yield higher average errors, but also lower runtimes. With S = 500 the runtime is reduced by over four orders of magnitude with an average relative error of less than 10%.

As is to be expected, runtime increases linearly with the number of repetitions B. However, the impact on the relative errors is rather inconsistent. This is due to the fact that the costs returned by the subsampling algorithm are often overestimated; therefore, averaging over multiple tries does not yield improvements (see Figure 4.4). This means that in order to increase the accuracy of the algorithm it is advisable to keep B = 1 and instead increase the sample size S. However, increasing B can be useful to lower the variability of the results.

In contrast, there is a big difference in accuracy between the image classes. While Algorithm 1 has consistently low relative errors on the Cauchy Density images, the exact optimal costs for White Noise images cannot be approximated as reliably. The relative errors fluctuate more and are generally much higher, as one can see from Figure 4.5 (left). In images with smooth structures and regular features the subsamples are able to capture that structure and therefore deliver a more precise representation of the images

Figure 4.4: The signed relative approximation errors $\bigl(\hat{W}_p^{(S)}(r, s) - W_p(r, s)\bigr) / W_p(r, s)$ show that the approximation overestimates the exact distance for small S but the bias vanishes for larger S.

and a more precise value. This is not possible in images that are very irregular or noisy, such as the White Noise images, which have no structure to begin with. The Classic Images contain both regular structures and more irregular regions. Therefore, their relative errors are slightly higher than in the Cauchy Density cases. The algorithm has a similar performance on the point cloud instances that are modelled after the Cauchy Density and Classic Images classes, while the Dirichlet instances have a more desirable accuracy compared to the White Noise images, as seen in Figure 4.5 (right).

There are no significant differences in performance between the different back-end solvers for the Wasserstein distance. As Figure 4.6 shows, accuracy seems to be better for the Sinkhorn distance compared to the other three solvers, which report the exact Wasserstein distance.

In the results of the point cloud instances we can observe the influence of the value $p' = (D/2) - p$ on the scaling of the relative error with the instance size N for constant sample size (S = 4000). This is shown in Figure 4.7. We observe an increase of the relative error with $p'$, as expected from the theory. However, we are not able to clearly distinguish between the three cases $p' < 0$, $p' = 0$ and $p' > 0$. This might be due to the relatively small instance sizes N in the experiments.

Figure 4.5: A comparison of the relative errors for different image classes (left) and point cloud instance classes (right).

Figure 4.6: A comparison between the approximations of the Wasserstein and Sinkhorn distances.

Figure 4.7: A comparison of the mean relative errors in the point cloud instances with sample size S = 4000 for different values of $p' = (D/2) - p$.

While we see that the relative errors are independent of N in the image case (compare Figure 4.2), for the point clouds N has an influence on the accuracy that depends on $p'$.

4.5 Discussion

As our simulations demonstrate, subsampling is a simple, yet powerful tool to obtain good approximations to Wasserstein distances with only a small fraction of the required runtime and memory. It is especially remarkable that in the case of two-dimensional images, for a fixed number of subsampled points, and therefore a fixed amount of time and memory, the relative error is independent of the resolution/size of the images. Based on these results, we expect the subsampling algorithm to return similarly precise results for even higher resolutions of the images it is applied to, while the effort to obtain them stays the same. Even in point cloud instances the relative error only scales mildly with the original input size N and depends on the value $p'$. The numerical results (Figure 4.2) show an inverse polynomial decrease of the approximation error with S, in accordance with the theoretical results. As we see little dependence on the cost exponent p, we speculate that the rate

$O(S^{-1/(2p)})$ might be improved upon. In fact, recent results on the asymptotic distribution of the empirical Wasserstein distance would suggest an $O(S^{-1/2})$ rate [89].

When applying the algorithm, it is important to note that the quality of the returned values depends on the structure of the data. In very irregular instances it is necessary to increase the sample size in order to obtain similarly precise results, while in regular structures a small sample size suffices. Our scheme allows the parameters to be easily tuned towards faster runtimes or more precise results, as desired. Increases and decreases of the sample size S are recommended to influence the performance in either direction, while the parameter B should only be increased if a particularly low variability of the estimate is required or if the repetitions can be computed in parallel. Otherwise, the higher runtime should be spent on a higher sample size (compare Figure 4.2).

The scheme presented here can readily be applied to other optimal transport distances, as long as a solver is available, as we demonstrated with the Sinkhorn distance [20]. Empirically, we can report good performance in this case, suggesting that entropically regularized distances might be even more amenable to subsampling approximation than the Wasserstein distance itself. Extending the theoretical results to this case would require an analysis of the mean speed of convergence of empirical Sinkhorn distances, which is an interesting task for future research.

All in all, subsampling proves to be a general, powerful and versatile tool that can be used with virtually any optimal transport solver as back-end and has both theoretical approximation error guarantees and a convincing performance in practice. It is a challenge to extend this method in a way which is specifically tailored to the geometry of the underlying space $\mathcal{X}$, which may result in further improvements.

Chapter 5

Cost-Based Clustering: A General Multiscale Approach to Discrete Optimal Transport

5.1 Introduction

As previously discussed, discrete optimal transport problems are considered in many applications (see Section 1.2 for an overview). Despite the availability of general purpose algorithms, such as the transportation simplex, large-scale instances require more modern and efficient methods. Several such algorithms, some of them introduced very recently, are described in Chapter 2. We summarize a number of strategies employed by these methods to push the boundaries of viability of large-scale optimal transport:

• Performance optimization: Through code optimization, parallelization or simply the usage of more modern hardware, the runtimes of methods can be lowered. Commercial solvers are becoming more and more efficient, and performance-enhancing variations, such as the shortlist method [39] compared to the transportation simplex, have been introduced.

• Exploitation of geometric structures: The transportation simplex and similar algorithms only make use of values of the cost function as

coefficients, but are oblivious of any underlying geometric structure. For Wasserstein distance computation in Euclidean spaces in particular, this means missed potential. Methods making use of the geometry include the shielding method [76], the AHA method [57] and others.

• Approximative methods: If it is not necessary to compute an exact solution or the exact optimal transport cost, approximative methods can be applied. Options include primal or dual heuristics [81], the entropically regularized optimal transport [20] or randomized approximation via subsampling [90] (see Chapter 4).

• Multiscale methods: These are approaches to reduce the size of the problem (coarsening), solve the smaller instance more quickly and derive from this a solution to the original problem (propagation). There are different strategies for the coarsening, the propagation and refinement of solutions. See Section 2.4 including the references for a more thorough introduction to multiscale methods.

Despite these advancements, some exceedingly difficult problems remain. If approximations are not an option, the instance at hand does not have an exploitable geometric structure, and standard methods, however optimized, are still too time- or memory-consuming, the last option to rely on is multiscale approaches. We give the following prototypical example of such a problem:

Example 5.1.1. In economics, optimal transport problems arise in matching models, such as labor market matching. Workers with different properties must be matched with jobs with different requirements, such that the weighted total benefit is maximized. Each pair of worker and job carries a benefit value, similar to cost values between sources and sinks in the regular optimal transport problems. However, workers and jobs cannot easily be embedded into a metric space in a way that expresses the benefit values in terms of the distances. This makes it impossible to apply many of the best performing strategies. Labor market and other matching problems are brought up for example in [32].

Adapting multiscale methods to cases like this, however, is not an easy task. The main problem is that many coarsening strategies rely on geometric structure as well, for example, a grid. Popular clustering methods, which can be considered for coarsening, such as k-means clustering and similar, also require point clouds in a Euclidean space. Therefore, it is not guaranteed that these algorithms can be applied in general. We introduce a clustering method in this chapter that has no requirements on the cost function in order to achieve the highest level of generality and make multiscale methods available for any optimal transport problem. We name the new method cost-based clustering. While it is driven by available data instead of geometry, this approach also offers advantages in some Euclidean situations, particularly in high dimensions, and we highlight examples in Section 5.5.

Discrete Optimal Transport. In this chapter we focus on discrete optimal transport. The problem statement is the same as given in Section 1.1. We identify measures $\mu$ on $X = \{x_1, \ldots, x_N\}$ and $\nu$ on $Y = \{y_1, \ldots, y_M\}$ with mass vectors, say $\mu_i = \mu(x_i)$ for all $i = 1, \ldots, N$ and $\nu_j = \nu(y_j)$ for all $j = 1, \ldots, M$.

Similarly, we identify the cost function $c \colon X \times Y \to \mathbb{R}_+$ with a cost matrix $C = (c_{i,j})_{1 \le i \le N,\, 1 \le j \le M} \in \mathbb{R}_+^{N \times M}$, where $c_{i,j} = c(x_i, y_j)$ for all $i, j$, and view couplings $\pi$ as matrices $\pi \in \mathbb{R}_+^{N \times M}$ as well.

We consider the following linear program with variables πi,j:

$$
\begin{aligned}
\text{(DOT)} \qquad \min\; & \langle C, \pi \rangle \\
\text{subject to}\; & \sum_{j=1}^{M} \pi_{i,j} = \mu_i \quad \forall\, i = 1, \ldots, N, \\
& \sum_{i=1}^{N} \pi_{i,j} = \nu_j \quad \forall\, j = 1, \ldots, M, \\
& \pi_{i,j} \ge 0.
\end{aligned}
$$

Here, $\langle \cdot, \cdot \rangle$ between matrices stands for the Frobenius inner product. In what follows, inequalities between matrices are interpreted as componentwise inequalities. Although the notation is mostly identical to the setup in Section 1.1, note that the numbers of elements in X and Y are denoted with capital letters N and M here. An instance of DOT is entirely characterized by $\mu$, $\nu$ and $C$. It is feasible if and only if $\mu(X) = \nu(Y)$, and we always assume that this is the case.

We introduce a multiscale approach to this general setting. See Section 2.4 and references therein for a description of the main ideas behind multiscale methods. In what follows, the focus lies on cost-based clustering as a way to make a coarsening method available in general. We also introduce a propagation algorithm which makes use of the way the instance was coarsened. While multiscale methods usually operate on more than two scales, we only consider one coarsening step here, since the coarsening and propagation techniques can be iterated. This means we need to find partitions for the sets X and Y, as well as a meaningful cost matrix on the subsets.
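For very small instances, (DOT) can be solved directly as a linear program. The following R sketch is only an illustration (assuming the lpSolve package is available; it is not the solver used in the experiments and is only sensible for tiny N and M): it builds the row- and column-sum constraints for the coupling matrix stacked column-wise.

```r
# Solve a small (DOT) instance as an LP with lpSolve; variables are the entries
# of the coupling matrix pi, stacked column-wise.
library(lpSolve)

solve_dot <- function(mu, nu, C) {
  N <- length(mu); M <- length(nu)
  A_row <- t(sapply(1:N, function(i) as.numeric(rep(1:N, times = M) == i)))  # sum_j pi_ij = mu_i
  A_col <- t(sapply(1:M, function(j) as.numeric(rep(1:M, each = N) == j)))   # sum_i pi_ij = nu_j
  res <- lp(direction = "min",
            objective.in = as.vector(C),        # column-wise stacking matches the constraints
            const.mat   = rbind(A_row, A_col),
            const.dir   = rep("=", N + M),
            const.rhs   = c(mu, nu))
  list(cost = res$objval, plan = matrix(res$solution, nrow = N, ncol = M))
}

# Tiny example: the optimal cost below is 0.25.
mu <- c(0.5, 0.5); nu <- c(0.25, 0.75)
C  <- matrix(c(0, 1, 1, 0), nrow = 2)
solve_dot(mu, nu, C)$cost
```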

5.2 The Cost-Based Clustering Problem

Finding partitions of X and Y that permit a good representation of the original problem in the multiscale scheme is not an easy task in the general setting, because we might not have a geometrical structure or a metric space, and therefore classical clustering algorithms cannot be applied. In fact, we need to find clusterings based only on the measures $\mu$ and $\nu$, as well as the cost matrix C. However, for a given clustering we can show bounds on the absolute deviation of the optimal transport cost of the clustered problem from the original one if a suitable cost matrix between clusters is chosen. Based on this result, we can define the cost-based clustering problem as follows: find the clusterings of X and Y into n and m subsets, respectively, which minimize a suitable cost deviation bound. We state the problem first and prove the bounds afterwards.

Definition 5.2.1. Let $X = \{x_1, \ldots, x_N\}$ and $Y = \{y_1, \ldots, y_M\}$, let $\mu$ and $\nu$ be measures on $X$ and $Y$, respectively, and let $C$ be a cost matrix. For $n, m \in \mathbb{N}$ and partitions $\mathcal{X} = \{X_1, \ldots, X_n\}$ and $\mathcal{Y} = \{Y_1, \ldots, Y_m\}$ of $X$ and $Y$ into $n$ and $m$ subsets, respectively, we define

• the minimum cost matrix $C^{\min}(\mathcal{X}, \mathcal{Y}) \in \mathbb{R}_+^{n \times m}$ by
$$c^{\min}_{k,l}(\mathcal{X}, \mathcal{Y}) := \min_{x_i \in X_k,\; y_j \in Y_l} c_{i,j} \quad \text{for all } k = 1, \ldots, n \text{ and } l = 1, \ldots, m,$$

• the maximum cost matrix $C^{\max}(\mathcal{X}, \mathcal{Y}) \in \mathbb{R}_+^{n \times m}$ by
$$c^{\max}_{k,l}(\mathcal{X}, \mathcal{Y}) := \max_{x_i \in X_k,\; y_j \in Y_l} c_{i,j} \quad \text{for all } k = 1, \ldots, n \text{ and } l = 1, \ldots, m,$$

• and the gap matrix $G(\mathcal{X}, \mathcal{Y}) := C^{\max}(\mathcal{X}, \mathcal{Y}) - C^{\min}(\mathcal{X}, \mathcal{Y})$ with entries $g_{k,l}(\mathcal{X}, \mathcal{Y})$.

With this we define four gap objective functions:
$$G_{\mathrm{row}}(\mathcal{X}, \mathcal{Y}) := \sum_{k=1}^{n} \mu(X_k) \cdot \Bigl( \max_{l=1,\ldots,m} g_{k,l}(\mathcal{X}, \mathcal{Y}) \Bigr),$$
$$G_{\mathrm{col}}(\mathcal{X}, \mathcal{Y}) := \sum_{l=1}^{m} \nu(Y_l) \cdot \Bigl( \max_{k=1,\ldots,n} g_{k,l}(\mathcal{X}, \mathcal{Y}) \Bigr),$$
$$G_{\min}(\mathcal{X}, \mathcal{Y}) := \sum_{k=1}^{n} \sum_{l=1}^{m} \min\{\mu(X_k), \nu(Y_l)\} \cdot g_{k,l}(\mathcal{X}, \mathcal{Y}),$$
$$G_{\mathrm{prod}}(\mathcal{X}, \mathcal{Y}) := \sum_{k=1}^{n} \sum_{l=1}^{m} \mu(X_k) \cdot \nu(Y_l) \cdot g_{k,l}(\mathcal{X}, \mathcal{Y}).$$

When they are clear from the context, we leave out the arguments $(\mathcal{X}, \mathcal{Y})$ of the matrices and functions above. We can now define the cost-based clustering problem.

Definition 5.2.2. Let $X$ and $Y$ be finite sets with measures $\mu$ and $\nu$ and a cost matrix $C$ as above, and let $n, m \in \mathbb{N}$. The cost-based clustering problem (CBC) with respect to $G_*$ is to find clusterings $\mathcal{X}$ for $X$ and $\mathcal{Y}$ for $Y$ minimizing $G_*(\mathcal{X}, \mathcal{Y})$, where $G_*$ stands for one of the above gap objective functions $G_{\mathrm{row}}$, $G_{\mathrm{col}}$, $G_{\min}$ and $G_{\mathrm{prod}}$.
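The objects of Definition 5.2.1 and the four gap objectives are straightforward to evaluate for a given pair of clusterings. The following R sketch is illustrative (it assumes the clusterings are encoded as index vectors `clus_x` and `clus_y` assigning each point to a cluster in 1..n and 1..m) and computes $C^{\min}$, $C^{\max}$, $G$ and the four objectives.

```r
# Evaluate the gap matrix and the four gap objectives of Definition 5.2.1.
gap_objectives <- function(mu, nu, C, clus_x, clus_y) {
  n <- max(clus_x); m <- max(clus_y)
  cmin <- matrix(0, n, m); cmax <- matrix(0, n, m)
  for (k in 1:n) for (l in 1:m) {
    block <- C[clus_x == k, clus_y == l, drop = FALSE]
    cmin[k, l] <- min(block)
    cmax[k, l] <- max(block)
  }
  G <- cmax - cmin
  mu_c <- sapply(1:n, function(k) sum(mu[clus_x == k]))   # cluster masses mu(X_k)
  nu_c <- sapply(1:m, function(l) sum(nu[clus_y == l]))   # cluster masses nu(Y_l)
  list(Cmin = cmin, Cmax = cmax, G = G,
       Grow  = sum(mu_c * apply(G, 1, max)),
       Gcol  = sum(nu_c * apply(G, 2, max)),
       Gmin  = sum(outer(mu_c, nu_c, pmin) * G),
       Gprod = sum(outer(mu_c, nu_c) * G))
}
```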

Usually, clusterings are meant to convey the similarity of points or elements in the same subset in some sense. Since we have no direct notion of similarity or dissimilarity between points in $X$ in this general setting of optimal transport, we need to measure similarity with respect to the costs to the points in $Y$. Given two clusters $X_k$ and $Y_l$, the gap matrix entry $g_{k,l}$ represents how different the points in $X_k$ are with respect to the costs to the points in $Y_l$, and vice versa. Therefore, we generally prefer small entries in the gap matrix. The four objective functions are different weightings of the gap matrix. $G_{\mathrm{row}}$, $G_{\mathrm{col}}$ and $G_{\min}$ have an additional meaning as deviation bounds, whereas $G_{\mathrm{prod}}$ can lead to interesting clusterings in practice (see Section 5.4.2 for results). Every function defines one version of CBC. Unfortunately, finding the optimal clustering for any of the objective functions is hard.

Theorem 5.2.3. CBC is NP-hard for any $G_* \in \{G_{\mathrm{row}}, G_{\mathrm{col}}, G_{\min}, G_{\mathrm{prod}}\}$.

We postpone the proof of this theorem to Section 5.7.

5.2.1 Deviation Bounds of Clusterings

As mentioned, three of the gap objective values for a clustering, and hence their minimum, serve as a bound for the absolute deviation of the optimal cost value on the clusters from the desired optimal cost value if the cost matrix between clusters is chosen suitably.

Theorem 5.2.4. Let $\mathcal{X} = \{X_1, \ldots, X_n\}$ and $\mathcal{Y} = \{Y_1, \ldots, Y_m\}$ be clusterings of $X$ and $Y$, respectively. Let $\bar{C} \in \mathbb{R}_+^{n \times m}$ be such that
$$C^{\min}(\mathcal{X}, \mathcal{Y}) \le \bar{C} \le C^{\max}(\mathcal{X}, \mathcal{Y}).$$
Let $\bar{c}^*$ be the optimal objective value of the optimal transport problem on the clusters defined by $\mathcal{X}$, $\mathcal{Y}$ and $\bar{C}$, and let $c^*$ be the optimal cost value of the original problem. Then
$$|\bar{c}^* - c^*| \le \min\{G_{\mathrm{row}}(\mathcal{X}, \mathcal{Y}),\, G_{\mathrm{col}}(\mathcal{X}, \mathcal{Y}),\, G_{\min}(\mathcal{X}, \mathcal{Y})\}.$$

Before we go into the proof, we introduce some notation. Let $\pi^*$ be an optimal coupling for the original problem and let $\bar{\pi}^*$ and $\pi^*_{\min}$ be optimal couplings on the clusters with respect to $\bar{C}$ and $C^{\min}$, respectively. We define $\tilde{C}^{\min} \in \mathbb{R}_+^{N \times M}$ by $\tilde{c}^{\min}_{i,j} := c^{\min}_{k,l}$ for the $k$ and $l$ such that $x_i \in X_k$ and $y_j \in Y_l$. Analogously, we define $\tilde{C}^{\max} \in \mathbb{R}_+^{N \times M}$. Since $c^{\min}_{k,l} \le c_{i,j} \le c^{\max}_{k,l}$ for all $x_i \in X_k$, $y_j \in Y_l$, we have $\tilde{C}^{\min} \le C \le \tilde{C}^{\max}$. We denote an optimal coupling for the original problem with respect to $\tilde{C}^{\min}$ instead of $C$ as $\tilde{\pi}^*_{\min}$. Then we have the following lower and upper bounds on $c^*$ and $\bar{c}^*$:

Lemma 5.2.5. In the situation of Theorem 5.2.4 we have

i) $c^*_{\min} \le c^* \le \langle C^{\max}, \pi^*_{\min} \rangle$ and

ii) $c^*_{\min} \le \bar{c}^* \le \langle C^{\max}, \pi^*_{\min} \rangle$,

where $c^*_{\min}$ is the optimal cost value of the clustered problem with respect to $C^{\min}$.

Proof. The optimal transport instance on the clusters with respect to $C^{\min}$ is equivalent to the instance on $X$ and $Y$ with respect to $\tilde{C}^{\min}$ in the sense that every feasible (and hence every optimal) solution to the former can be trivially propagated to a feasible (or optimal) solution to the latter with the same cost value. Vice versa, through aggregating transports according to the clustering, every solution on the finer scale has a natural counterpart solution on the clusters with the same cost. Therefore, $\langle C^{\min}, \pi^*_{\min} \rangle = \langle \tilde{C}^{\min}, \tilde{\pi}^*_{\min} \rangle$ and similarly $\langle \tilde{C}^{\max}, \tilde{\pi}^*_{\min} \rangle = \langle C^{\max}, \pi^*_{\min} \rangle$. Since the problem with respect to $\tilde{C}^{\min}$ is a relaxation of the original problem,
$$c^*_{\min} = \langle C^{\min}, \pi^*_{\min} \rangle = \langle \tilde{C}^{\min}, \tilde{\pi}^*_{\min} \rangle \le \langle \tilde{C}^{\min}, \pi^* \rangle \le \langle C, \pi^* \rangle = c^* = \langle C, \pi^* \rangle \le \langle C, \tilde{\pi}^*_{\min} \rangle \le \langle \tilde{C}^{\max}, \tilde{\pi}^*_{\min} \rangle = \langle C^{\max}, \pi^*_{\min} \rangle.$$
The bounds on $\bar{c}^*$ follow from
$$c^*_{\min} = \langle C^{\min}, \pi^*_{\min} \rangle \le \langle C^{\min}, \bar{\pi}^* \rangle \le \langle \bar{C}, \bar{\pi}^* \rangle = \bar{c}^* \le \langle \bar{C}, \pi^*_{\min} \rangle \le \langle C^{\max}, \pi^*_{\min} \rangle.$$

Proof of Theorem 5.2.4. It follows immediately from Lemma 5.2.5 that
$$|\bar{c}^* - c^*| \le \langle C^{\max}, \pi^*_{\min} \rangle - c^*_{\min} = \langle C^{\max} - C^{\min}, \pi^*_{\min} \rangle = \langle G, \pi^*_{\min} \rangle.$$
We denote the components of $\pi^*_{\min}$ as $\pi^*_{k,l}$ for $k = 1, \ldots, n$ and $l = 1, \ldots, m$. Then we have
$$\langle G, \pi^*_{\min} \rangle = \sum_{k=1}^{n} \sum_{l=1}^{m} g_{k,l} \pi^*_{k,l} \le \sum_{k=1}^{n} \Bigl( \max_{l=1,\ldots,m} g_{k,l} \Bigr) \cdot \sum_{l=1}^{m} \pi^*_{k,l} = \sum_{k=1}^{n} \mu(X_k) \cdot \max_{l=1,\ldots,m} g_{k,l} = G_{\mathrm{row}},$$
$$\langle G, \pi^*_{\min} \rangle = \sum_{l=1}^{m} \sum_{k=1}^{n} g_{k,l} \pi^*_{k,l} \le \sum_{l=1}^{m} \Bigl( \max_{k=1,\ldots,n} g_{k,l} \Bigr) \cdot \sum_{k=1}^{n} \pi^*_{k,l} = \sum_{l=1}^{m} \nu(Y_l) \cdot \max_{k=1,\ldots,n} g_{k,l} = G_{\mathrm{col}},$$
$$\langle G, \pi^*_{\min} \rangle = \sum_{k=1}^{n} \sum_{l=1}^{m} g_{k,l} \pi^*_{k,l} \le \sum_{k=1}^{n} \sum_{l=1}^{m} \min\{\mu(X_k), \nu(Y_l)\} \cdot g_{k,l} = G_{\min},$$
and hence
$$|\bar{c}^* - c^*| \le \min\{G_{\mathrm{row}}, G_{\mathrm{col}}, G_{\min}\}.$$

Theorem 5.2.4 gives us rigorous bounds for the deviation of a clustered solution with respect to a suitable cost matrix $\bar{C}$ from the original one for a given clustering. The same is true for Lemma 5.2.5, since it gives us concrete upper and lower bounds on $c^*$. However, in order to obtain these, $\pi^*_{\min}$ has to be known, that is, an additional optimal transport instance on the clusters has to be solved to optimality. This makes the bounds from Lemma 5.2.5 impractical to use as an objective function for the clustering process. When a clustering is given and $\bar{c}^*$ is computed for some $\bar{C}$ between $C^{\min}$ and $C^{\max}$, we can give an interval of possible values for $c^*$, namely
$$c^* \in [\bar{c}^* - \gamma,\; \bar{c}^* + \gamma],$$

where $\gamma = \min\{G_{\mathrm{row}}, G_{\mathrm{col}}, G_{\min}\}$ for the clustering at hand. If in addition we solve the instance on the clusters with respect to $C^{\min}$ and have access to $\pi^*_{\min}$, we can compute the lower and upper bounds in Lemma 5.2.5 directly and obtain the interval
$$c^* \in \bigl[ c^*_{\min},\; \langle C^{\max}, \pi^*_{\min} \rangle \bigr],$$
which has a length of $\langle G, \pi^*_{\min} \rangle \le \gamma$ instead of $2\gamma$. This means there is always the option to cut the interval of potential cost values in half by computing $\pi^*_{\min}$. Another option is to choose $\bar{C} = C^{\min}$, which accomplishes both at the same time. However, the approximation $\bar{c}^*$, which is then a lower bound, will probably be less precise. Compare Section 5.4.2 for how the choice of $\bar{C}$ has an influence on the difference between $\bar{c}^*$ and $c^*$.

Another important aspect is that if the cost-based clustering is used as a setup for a multiscale scheme and a feasible solution for the finer scale (here on X and Y) is constructed via the propagation method presented in Section 5.3.3, we obtain an additional upper bound, which can in practice be significantly tighter than the upper bounds provided here (compare Section 5.4.2). Therefore, in that case the value of additionally computing $\pi^*_{\min}$ might be diminished.
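The two intervals discussed above reduce to a few lines of arithmetic once the relevant quantities are available. The following R sketch is illustrative (the inputs are assumed to come from a clustered solve as described here): it returns the interval from the gap objectives of Theorem 5.2.4 and the tighter interval from Lemma 5.2.5.

```r
# Interval for c* from a clustered solution c_bar and the gap objectives.
interval_from_gaps <- function(c_bar, Grow, Gcol, Gmin) {
  gamma <- min(Grow, Gcol, Gmin)
  c(lower = c_bar - gamma, upper = c_bar + gamma)
}

# Tighter interval from Lemma 5.2.5, given the optimal value and coupling of
# the clustered problem with respect to C^min; sum(Cmax * pi_min) is the
# Frobenius inner product <C^max, pi*_min>.
interval_from_pimin <- function(cmin_value, Cmax, pi_min) {
  c(lower = cmin_value, upper = sum(Cmax * pi_min))
}
```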

5.2.2 Comparison to Other Clustering Objectives and Deviation Bounds

As mentioned already, the main difference between the cost-based clustering and other clustering problems is that we operate on two sets X and Y with a cost function $c\colon X \times Y \to \mathbb{R}_+$ instead of a distance function $d\colon X \times X \to \mathbb{R}_+$ on only one space X. However, even if the optimal transport instance at hand takes place in $\mathbb{R}^D$ for $D \in \mathbb{N}$ with $c(x, y) = \|x - y\|^p$ as cost function, the objective functions are different from those of other clustering problems, such as k-means clustering. This means that while there are established algorithms computing clusterings, they might not yield clusterings well suited for our purpose of minimizing the error bounds.

For example, most clustering algorithms aggregate points that are close together. While nearby points likely have similar distances to points in the other set, the same might be true for points which are far from each other. The following example illustrates an extreme case of this effect achieved by symmetry.

Example 5.2.6. Suppose we have two sets with four points each as shown in Figure 5.1, left, both equipped with uniform distributions, and $c(x_i, y_j) = \|x_i - y_j\|^2$ for all $x_i$, $y_j$. Now suppose we want to find two clusters for X, while the trivial clustering $\mathcal{Y} = \{\{y_1\}, \{y_2\}, \{y_3\}, \{y_4\}\}$ of Y is fixed. Most proximity-based clustering algorithms find the clusters $\{x_1, x_2\}$ and $\{x_3, x_4\}$, as indicated by the ellipses in Figure 5.1, right, which we would agree with judging by geometric intuition. However, because of the symmetry, the cost matrix
$$C = \begin{pmatrix} 2 & 1 & 2 & 5 \\ 5 & 2 & 1 & 2 \\ 2 & 1 & 2 & 5 \\ 5 & 2 & 1 & 2 \end{pmatrix}$$
has two pairs of identical rows, the first and third, as well as the second and fourth row. This means the clustering $\{x_1, x_3\}$ and $\{x_2, x_4\}$ has a gap matrix with only zero entries, which makes it the optimal clustering with respect to the objective functions in the cost-based clustering problem. In fact, since the objective value is zero, the optimal transport instance on the clusters has the same optimal cost as the original instance, since no information is lost due to the clustering.
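The claim of the example is easy to verify numerically; the following R snippet (a quick check, not part of the thesis) shows that the clustering $\{x_1, x_3\}$, $\{x_2, x_4\}$ has a gap matrix of zeros when Y is left unclustered.

```r
# Rows 1 and 3 and rows 2 and 4 of C are identical, so the clustering
# {x1, x3}, {x2, x4} yields gap matrix entries g_{k,l} = 0 for all k, l.
C <- matrix(c(2, 1, 2, 5,
              5, 2, 1, 2,
              2, 1, 2, 5,
              5, 2, 1, 2), nrow = 4, byrow = TRUE)
clus_x <- c(1, 2, 1, 2)                      # x1, x3 -> cluster 1; x2, x4 -> cluster 2
gaps <- sapply(1:4, function(l)
  sapply(1:2, function(k) {
    block <- C[clus_x == k, l]
    max(block) - min(block)                  # gap matrix entry g_{k,l}
  }))
gaps                                         # a 2 x 4 matrix of zeros
```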

Of course, this is a small example constructed only for the purpose of showing this effect, and it relies on some kind of (approximative) symmetry. Usually, geometric clustering algorithms produce clusterings that are reasonable in terms of their objective values of the gap functions in CBC. There are some situations, however, in which a symmetry effect is in place. For example, if X and Y are concentrated on lower-dimensional affine subspaces A and B of $\mathbb{R}^D$ which are orthogonal to each other, optimal clusterings have concentric spherical clusters around the intersection point between A and B. Such clusterings are typically not found by traditional geometric clustering algorithms. This is discussed further in Section 5.5.

Figure 5.1: This example illustrates that aggregating close points can lead to a suboptimal clustering with respect to the objective functions of the cost-based clustering problem. (The figure shows the points $x_1, \ldots, x_4$ of X and $y_1, \ldots, y_4$ of Y.)

A different error bound is given by the Wasserstein metric, if we assume we have a metric space $(X, d)$ with probability measures $\mu$ and $\nu$ with finite support in $X$ and $c(x, y) = d^p(x, y)$. The optimal transport cost $c^*$ between $\mu$ and $\nu$ is the $p$-th power of the Wasserstein distance between $\mu$ and $\nu$,
$$c^* = W_p^p(\mu, \nu).$$
Any clusterings of the supports of $\mu$ and $\nu$ together with representatives in $X$ for each cluster define approximative probability measures $\hat{\mu}$ and $\hat{\nu}$ with
$$\bar{c}^* = W_p^p(\hat{\mu}, \hat{\nu}).$$
Due to the triangle inequality for the Wasserstein metric, we have
$$|W_p(\mu, \nu) - W_p(\hat{\mu}, \hat{\nu})| \le W_p(\mu, \hat{\mu}) + W_p(\nu, \hat{\nu}),$$
and hence
$$\bigl| (c^*)^{1/p} - (\bar{c}^*)^{1/p} \bigr| \le W_p(\mu, \hat{\mu}) + W_p(\nu, \hat{\nu}).$$

The distances $W_p(\mu, \hat{\mu})$ are often easily computed. Let, for example,

$\operatorname{supp}(\mu) = \{x_1, \ldots, x_N\} \subseteq X$ with clusters $X_1, \ldots, X_n$ and representatives $\hat{x}_1, \ldots, \hat{x}_n \in X$. If each point $x_i$ is contained in the cluster $X_k$ with the closest representative with respect to $d$, then

$$W_p^p(\mu, \hat{\mu}) = \sum_{k=1}^{n} \sum_{x_i \in X_k} \mu(x_i)\, d^p(x_i, \hat{x}_k),$$
which is the case for the k-means algorithm or the standard grid-based aggregation for multiscale schemes on images. Note that this is an error bound between the p-th roots of the two optimal objective values, while the gap value given by the cost-based clustering provides an error bound on the absolute deviation of the optimal cost values. The next example shows a comparison of the two bounds in the grid-based case.

Example 5.2.7. Let $X = Y = \{1, \ldots, R\}^2 \subseteq \mathbb{R}^2$ for some even number $R \in 2\mathbb{N}$ with $d(x, y) = \|x - y\|$, and consider the clustering $\mathcal{X} = \{X_1, \ldots, X_n\}$ with $n = R^2/4$, where four adjoining points are aggregated and the representatives $\hat{x}_k$ of the clusters are the centers of the four clustered points. This defines $\hat{\mu}$ and $\hat{\nu}$ on X for probability measures $\mu$ and $\nu$ on X as detailed in Section 2.4 under the paragraph Coarsening. Since $d^p(x_i, \hat{x}_k) = 2^{-p/2}$ for all $X_k$ and $x_i \in X_k$, we have
$$\bigl| (c^*)^{1/p} - (\bar{c}^*)^{1/p} \bigr| \le W_p(\mu, \hat{\mu}) + W_p(\nu, \hat{\nu}) = 2 \cdot 2^{-1/2} = \sqrt{2}.$$
The error bounds of the cost-based clustering are more difficult to compute and often depend on $\mu$ and $\nu$, since the values in the gap matrix range between $d^p((1,1),(2,2)) = 2^{p/2}$ and
$$d^p((1,1),(R,R)) - d^p((2,2),(R-1,R-1)) = \bigl( (R-1)\sqrt{2} \bigr)^p - \bigl( (R-3)\sqrt{2} \bigr)^p = 2^{p/2} \cdot \bigl( (R-1)^p - (R-3)^p \bigr).$$

For p = 1, however, we can compute $G_{\mathrm{row}}$ and $G_{\mathrm{col}}$ directly. For every cluster $X_k$ there exists another cluster $X_l$ which is diagonally aligned (assuming $R \ge 4$), such that the difference between the maximal and the minimal distances is precisely twice the diameter of the clusters, thus $g_{k,l} = 2\sqrt{2}$. This is in line with the more general maximal gap matrix entry given above. Therefore,
$$G_{\mathrm{row}} = G_{\mathrm{col}} = 2\sqrt{2}.$$
$G_{\min}$ depends on $\mu$ and $\nu$, with the best case being $\sqrt{2}$ and the worst case being $2\sqrt{2}$, depending on how the masses of $\mu$ and $\nu$ are distributed. In any case,
$$|c^* - \bar{c}^*| \le W_1(\mu, \hat{\mu}) + W_1(\nu, \hat{\nu}) \le G_{\min} \le G_{\mathrm{row}} = G_{\mathrm{col}},$$
meaning that the Wasserstein bound is the tightest of the bounds, $G_{\mathrm{row}}$ and $G_{\mathrm{col}}$ are the least tight, and $G_{\min}$ is in between, depending on $\mu$ and $\nu$. For $p > 1$ these bounds cannot be compared directly, since the Wasserstein bound applies to $|(c^*)^{1/p} - (\bar{c}^*)^{1/p}|$, whereas the deviation bound from the cost-based clustering applies to $|c^* - \bar{c}^*|$.
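The p = 1 computation in this example can also be checked numerically for a small grid; the following self-contained R snippet (a quick check, not part of the thesis) reproduces $G_{\mathrm{row}} = G_{\mathrm{col}} = 2\sqrt{2}$ for R = 4 with uniform measures.

```r
# Check Example 5.2.7 for R = 4, p = 1: the 2x2-block clustering of the grid
# {1,...,4}^2 gives Grow = Gcol = 2*sqrt(2) under uniform measures.
R <- 4
pts  <- as.matrix(expand.grid(x = 1:R, y = 1:R))              # the grid points
cost <- as.matrix(dist(pts))                                  # Euclidean costs (p = 1)
clus <- (ceiling(pts[, "x"] / 2) - 1) * (R / 2) + ceiling(pts[, "y"] / 2)  # 2x2 blocks
n <- R^2 / 4
cmin <- matrix(0, n, n); cmax <- matrix(0, n, n)
for (k in 1:n) for (l in 1:n) {
  block <- cost[clus == k, clus == l]
  cmin[k, l] <- min(block); cmax[k, l] <- max(block)
}
G  <- cmax - cmin
mu <- rep(1 / n, n)                                           # uniform cluster masses
c(Grow = sum(mu * apply(G, 1, max)),
  Gcol = sum(mu * apply(G, 2, max)),
  reference = 2 * sqrt(2))                                    # all three agree
```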

5.3 Clustering Algorithms

Although the cost-based clustering problem is NP-hard, there are some fast polynomial algorithms producing clusterings that are maybe not optimal with respect to the objective function, but are reasonable approximate solutions to the cost-based clustering problem. The same is true for most clustering problems, since they are typically NP-hard and tackled via heuristic methods, such as k-means, k-center and similar geometric algorithms, via hierarchical clusterings, such as agglomerative and divisive clustering, and via other methods. When adapting established clustering methods to our problem, we need to pay attention to a couple of features in particular:

• We cannot rely on a metric or distance function between elements in X, since we only have a function measuring the costs between elements in X and Y . This immediately rules out a lot of the geometric methods like k-means clustering. While there are of course optimal transport instances in metric or Euclidean spaces that can be handled with these clustering methods, this work is focused on the general case.

• Even in the case of optimal transport in a Euclidean space it is not ensured that geometric clustering methods such as k-means lead to clusterings that are preferable with respect to the objective functions of the cost-based clustering problem. This is highlighted in Example 5.2.6, and more examples can be found in Section 5.5.

• Runtime has to be a major focus of the clustering methods. Since the general discrete optimal transport problem on N elements can be solved in $O(N^3 \log(N))$ time with the auction algorithm [10] and sub-cubic empirical runtimes are possible with a specialized transportation simplex method [39], it is important to keep the clustering runtimes low, as cost-based clustering is meant to accelerate the solution process in a multiscale scheme.

5.3.1 Agglomerative Clustering

One of the heuristic clustering methods that can be adapted to the cost-based clustering problem is agglomerative clustering. It is a hierarchical bottom-up clustering approach, where every element starts as one cluster and in each step two current clusters are selected to be merged until the desired number of clusters is reached. The selection happens greedily with respect to a linkage criterion that is usually tied to an objective function or a dissimilarity function between the elements.

Agglomerative clustering methods go back to Florek et al. in 1951 [27], who suggested a nearest-neighbor rule, which is now known as single-linkage. Later, more efficient clustering algorithms for single-linkage [85] and complete-linkage [22] were developed in 1973 and 1977, respectively. The distance between clusters is evaluated by the shortest distance between elements in single-linkage, and by the farthest distance between elements in complete-linkage. Unfortunately, those and many other linkage criteria require a distance or dissimilarity function $d\colon X \times X \to \mathbb{R}_+$, also called a dissimilarity coefficient, whereas we only have access to a cost (or dissimilarity) function $c\colon X \times Y \to \mathbb{R}_+$ between elements in X and Y. This is why we suggest linkage criteria based on the objective functions of the cost-based clustering problem, $G_{\mathrm{row}}$, $G_{\mathrm{col}}$, $G_{\min}$ and $G_{\mathrm{prod}}$.

Starting with $\mathcal{X} = \{\{x_1\}, \ldots, \{x_N\}\}$ and $\mathcal{Y} = \{\{y_1\}, \ldots, \{y_M\}\}$, the gap matrix is the zero matrix. Whenever we fuse two clusters of $\mathcal{X}$ (or $\mathcal{Y}$), we eliminate one row (column) of the gap matrix and another row (column) increases its values. This leads to an increase of the gap objective function values that depends on the choice $G_* \in \{G_{\mathrm{row}}, G_{\mathrm{col}}, G_{\min}, G_{\mathrm{prod}}\}$. We can compute this increase in advance in order to choose the two clusters to fuse which lead to the least increase of the function. This process is iterated until the desired number of clusters is reached. The two sets X and Y are clustered in succession. Since this linkage criterion depends on the choice of $G_*$, it has to be chosen beforehand.

In each step of clustering X (and likewise when clustering Y with different notation), assuming N current clusters, the fusion matrix is a matrix $F \in \mathbb{R}_+^{N \times N}$ such that $f_{k,l}$ is the increase in the chosen objective function if $X_k$ and $X_l$ are fused. Since it is symmetric with diagonal zero, we only need to keep track of the upper triangular part. In most cases, if we fuse two clusters $X_k$ and $X_l$, the other entries do not change, and we only have to compute the entries for the new cluster $X_k \cup X_l$. This leads to Algorithm 2 below.

However, if we cluster X with respect to Gcol or Y with respect to Grow, then, due to the column-wise (or row-wise) maximum of the gap matrix in the objective function, all of the values in the fusion matrix might change after the aggregation of two clusters. This means we either have to recalculate the entire fusion matrix after each step (Algorithm 3) or accept that the remaining values in the matrix are inaccurate (Algorithm 2). Another option is to recalculate the matrix whenever the number of clusters is halved, or similar. This leads to variants of Algorithms 2 and 3. Some details on the implementation:

• In addition to G we always keep track of Cmin and Cmax during the clustering. The reason is simply that they are needed to compute the fusion values $f_{i,j}$ and it would be inefficient to recalculate the required values each time. For Grow (Gcol) we also keep track of the row-wise (column-wise) maximum of G, that is, $G_{k,\cdot} = \max_l g_{k,l}$ or $G_{\cdot,l} = \max_k g_{k,l}$.

Algorithm 2: Agglomerative cost-based clustering

Input: X, Y, µ, ν, C, n, m, G∗
Output: Clusterings X, Y with gap matrix G
Initialize X ← {{x1}, ..., {xN}}, Y ← {{y1}, ..., {yM}}, G ← 0 ∈ R_+^{N×M}
Initialize F (N × N fusion matrix for X) by computing f_{i,j} for i = 1, ..., N − 1, j = i + 1, ..., N with respect to G∗; for all other (i, j): f_{i,j} ← ∞
while |X| > n do
    (k', l') ← argmin_{k,l} f_{k,l}
    X_{k'} ← X_{k'} ∪ X_{l'}
    X_{l'} ← ∅ (remove from X)
    Update G, µ and F
end
Initialize F (M × M fusion matrix for Y)
while |Y| > m do
    (k', l') ← argmin_{k,l} f_{k,l}
    Y_{k'} ← Y_{k'} ∪ Y_{l'}
    Y_{l'} ← ∅ (remove from Y)
    Update G, ν and F
end

Algorithm 3: Agglomerative cost-based clustering with recalculation of the fusion matrix (only relevant for Grow and Gcol)

Input: X, Y, µ, ν, C, n, m, G∗
Output: Clusterings X, Y with gap matrix G
if G∗ = Gcol then
    Perform this algorithm for Y, X, ν, µ, C^t, m, n, Grow
else
    Initialize X ← {{x1}, ..., {xN}}, Y ← {{y1}, ..., {yM}}, G ← 0 ∈ R_+^{N×M}
    Fuse the clusters of X as in Algorithm 2
    for K = M, ..., m + 1 do
        Calculate the K × K fusion matrix F for Y
        (k', l') ← argmin_{k,l} f_{k,l}
        Y_{k'} ← Y_{k'} ∪ Y_{l'}
        Y_{l'} ← ∅ (remove from Y)
        Update G and ν
    end
end

• When updating F in Algorithm 2 after the fusion of Xi and Xj, we remove rows and columns i and j from F, then add a new row and column for the new cluster Xi ∪ Xj.

• We compute f_{i,j} for two clusters Xi, Xj depending on G∗ as follows (a small code sketch of these fusion values is given after this list):

– for Grow:
\[ f_{i,j} := \mu(X_i \cup X_j) \cdot \max_{l=1,\dots,M}\Big( \max\{c^{\max}_{i,l}, c^{\max}_{j,l}\} - \min\{c^{\min}_{i,l}, c^{\min}_{j,l}\} \Big) - \mu(X_i)\cdot G_{i,\cdot} - \mu(X_j)\cdot G_{j,\cdot} \]

– for Gcol:
\[ f_{i,j} := \sum_{l=1}^{M} \nu(Y_l) \cdot \Big( \max\Big\{ \max\{c^{\max}_{i,l}, c^{\max}_{j,l}\} - \min\{c^{\min}_{i,l}, c^{\min}_{j,l}\},\; G_{\cdot,l} \Big\} - G_{\cdot,l} \Big) \]

– for Gmin:
\[ f_{i,j} := \sum_{l=1}^{M} \Big( \min\{\mu(X_i \cup X_j), \nu(Y_l)\} \cdot \big( \max\{c^{\max}_{i,l}, c^{\max}_{j,l}\} - \min\{c^{\min}_{i,l}, c^{\min}_{j,l}\} \big) - \min\{\mu(X_i), \nu(Y_l)\}\cdot g_{i,l} - \min\{\mu(X_j), \nu(Y_l)\}\cdot g_{j,l} \Big) \]

– for Gprod:
\[ f_{i,j} := \sum_{l=1}^{M} \nu(Y_l) \cdot \Big( \mu(X_i \cup X_j) \cdot \big( \max\{c^{\max}_{i,l}, c^{\max}_{j,l}\} - \min\{c^{\min}_{i,l}, c^{\min}_{j,l}\} \big) - \mu(X_i)\cdot g_{i,l} - \mu(X_j)\cdot g_{j,l} \Big) \]

• In the update steps of G, when Xi and Xj (with i < j) are fused, we remove row j and update row i of G. The same for columns in the clustering process of Y .

• We update µ by µi := µi + µj and remove entry j. The same for ν.

• Usually, Algorithm 2 is used for clustering. For the linkage criteria Grow and Gcol a Boolean option accel indicates whether Algorithm 2 (accel = true) or Algorithm 3 (accel = false) is used.

• The clustering for (X, Y, µ, ν, C, n, m, Grow) via the agglomerative clustering method is the same as the clustering for (Y, X, ν, µ, C^t, m, n, Gcol). This allows us to always perform the unproblematic clustering first in Algorithm 3, so that the clustering in which the entire fusion matrix is recalculated in each step is performed with an already reduced number of clusters for the other set. For G∗ ∈ {Gmin, Gprod} the clustering for (X, Y, µ, ν, C, n, m, G∗) is the same as the clustering for (Y, X, ν, µ, C^t, m, n, G∗), since Gmin and Gprod are both symmetric in X and Y.
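As an illustration of this bookkeeping, the following is a minimal R sketch (illustrative only, not the thesis implementation) of the fusion values for the linkage criteria Grow and Gprod. It assumes that the cluster-wise cost minima and maxima are kept in K × M matrices cmin and cmax (so that the current gap matrix is cmax − cmin), that mu contains the cluster masses µ(X_k), and that nu contains the masses ν(Y_l) of the current clusters of Y (initially the element masses).

# Fusion value f_{i,j} for the linkage criterion G_row (sketch):
# the increase of G_row if the clusters X_i and X_j are merged.
fusion_value_row <- function(i, j, cmin, cmax, mu) {
  new_spread <- pmax(cmax[i, ], cmax[j, ]) - pmin(cmin[i, ], cmin[j, ])
  (mu[i] + mu[j]) * max(new_spread) -
    mu[i] * max(cmax[i, ] - cmin[i, ]) -   # mu(X_i) * G_{i,.}
    mu[j] * max(cmax[j, ] - cmin[j, ])     # mu(X_j) * G_{j,.}
}

# Fusion value f_{i,j} for the linkage criterion G_prod (sketch).
fusion_value_prod <- function(i, j, cmin, cmax, mu, nu) {
  new_spread <- pmax(cmax[i, ], cmax[j, ]) - pmin(cmin[i, ], cmin[j, ])
  old_terms  <- mu[i] * (cmax[i, ] - cmin[i, ]) +
                mu[j] * (cmax[j, ] - cmin[j, ])
  sum(nu * ((mu[i] + mu[j]) * new_spread - old_terms))
}

The criteria Gmin and Gcol can be coded analogously from the formulas above; after a fusion, only the row and column of the merged cluster in the fusion matrix need to be recomputed, except in the problematic cases for Grow and Gcol discussed above.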

In order to analyze the runtime of the agglomerative clustering method, we focus on the clustering of X first and observe that each computation of f_{i,j} is done in O(M) time, independent of the choice of G∗. In Algorithm 2 the fusion matrix F is initialized with $N^2$ computations of f_{i,j}, so the initialization step is done in $O(MN^2)$ time. Within the while loop, for K = |X|, choosing the argmin is done in $O(K^2)$ time, and in the update of F one row is computed, which amounts to K computations of f_{i,j}; the other operations are insignificant in runtime. The values of K range from N down to n + 1, hence the runtime of the while loop, and consequently of the whole clustering of X, is $\sum_{K=n+1}^{N} (K^2 + MK) = O(N^3 + MN^2)$. The clustering of Y has only one difference, namely that the computation of one value f_{i,j} takes O(n) instead of O(N) time, since |X| has previously been reduced to n. The rest still holds with N and M interchanged, thus the runtime of the clustering of Y is $O(M^3 + nM^2)$. As a whole, Algorithm 2 has an $O(N^3 + MN^2 + nM^2 + M^3)$ runtime and is therefore cubic in max{M, N}.

For Algorithm 3 let us fix the linkage criterion Grow (for Gcol we obtain the same with X and Y exchanged). The first clustering (i.e. of X) is the same as before, thus done in $O(N^3 + MN^2)$ time. The clustering of Y requires the fusion matrix F to be computed within the loop, so the runtime is $\sum_{K=m+1}^{M} K^2 n = O(M^3 n)$, which yields $O(N^3 + MN^2 + nM^3)$ in total. For N = M and n = m, for example, this is $O(N^3 n)$, which is already prohibitively slow for the cost-based clustering as a multiscale method for optimal transport, since the original problem can already be solved in $O(N^3 \log N)$ time in the worst case; in order to remain below this order, one would have to choose $n = o(\log N)$. Thankfully, it is not necessary at all to use Algorithm 3 for Gmin and Gprod, and there are alternatives to it for Grow and Gcol as well, albeit with less accurate fusion matrices.

5.3.2 Clustering with Random Representatives

As the name suggests, clustering with random representatives is a probabilistic clustering method, which combines the idea of subsampling to get sparse subsets of X and Y with the idea of cluster representatives, which are present for instance in the k-means clustering algorithm. Generally, this approach focuses on shorter runtimes compared to the agglomerative clustering. The aim here is to obtain a clustering by only looking at each element once and deciding which cluster is the best fit for that element, instead of evaluating the outcomes of all potential cluster fusions. This also means we do not minimize with respect to one of the gap objective functions, but instead try to cluster elements with similar rows or columns in the cost matrix. This assignment criterion generally leads to smaller entries of the gap matrix and consequently to smaller gap objective values. Algorithm 4 shows the details of this method, which works as follows:

• First, we sample n elements {x̂_1, ..., x̂_n} of X from the distribution µ and in the same way m elements {ŷ_1, ..., ŷ_m} of Y from ν. These act as representatives for the n and m clusters, respectively.

• Then, we look at each element x_i ∈ X and decide which cluster this element should be assigned to. To that end, with the parameter q ≥ 1 we compute the q-th power of the ν-weighted ℓ_q-distances from the i-th row of the cost matrix to each of the representatives' rows (compare Algorithm 4). The cluster whose representative has the lowest value is chosen for this element. The same is then repeated with each element y_i ∈ Y and the columns of the cost matrix.

• Since the sampling of the representatives is random, the clustering outcome varies if we run the algorithm several times. In order to mitigate the randomness one can repeat this clustering process and then pick the clustering with the least gap value (that is, min{Grow, Gcol, Gmin}). This is done via a parameter K ∈ N, which indicates K repetitions.

• In each assignment step of an element in X to a cluster we compute n values, which are sums of M summands each. In order to further decrease the runtime, we add the option to only evaluate the differences of rows of the cost matrix with respect to the columns belonging to the representatives ŷ_1, ..., ŷ_m. This decreases the number of summands from M to m, which can mean a significant speedup if m is small compared to M. However, since a large portion of the cost matrix is effectively ignored, this can lead to less precise results depending on the initial sampling of representatives. This option is included via a Boolean parameter accel.

As long as we insist on incorporating the gap matrix into the assignment criterion, there is not much room to further decrease the runtime of the accelerated version of this method. One option would be to choose the subset of S elements in Y (or the subset of representatives of Y) with the largest mass with respect to ν and only take the entries of these columns when assigning points of X to their clusters. This would change the effort to compute one value a_j in Algorithm 4 from O(M) or O(m) to O(S), which is an improvement if S is small enough. Similarly, if we do have access to a distance function d on X that is computable in O(1) time, we can use it by choosing a_j = d(x_i, x̂_j). However, if d is not connected to c in any way, this does not necessarily yield good clusterings with respect to the gap matrix. In the general non-accelerated case, Algorithm 4 has a runtime of $O(KNM \cdot \max\{n, m\})$, while the accelerated version has a runtime of $O(Knm \cdot \max\{N, M\})$. If K is fixed and N = M, both are at most cubic in N and hence below the order $O(N^3 \log N)$ of the original problem, since n = O(N) and m = O(M). Moreover, if for example $n = m = O(\sqrt{N})$, the accelerated algorithm has a quadratic runtime in N, while the non-accelerated algorithm terminates in $O(N^{5/2})$ time.
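Before stating the pseudocode of Algorithm 4 below, a minimal R sketch of its accelerated assignment step may help to make the rule concrete. Here C is the N × M cost matrix, rep_x and rep_y are the index vectors of the sampled representatives, nu is the mass vector of Y and q is the power parameter; all names are illustrative and do not refer to the thesis code.

# Accelerated assignment of a single element x_i (sketch): choose the cluster
# whose representative row of the cost matrix is closest in the nu-weighted
# l_q sense, evaluated only at the representative columns.
assign_x <- function(i, C, rep_x, rep_y, nu, q = 2) {
  a <- sapply(rep_x, function(r) {
    sum(nu[rep_y] * abs(C[i, rep_y] - C[r, rep_y])^q)
  })
  which.min(a)   # index of the chosen cluster in 1, ..., n
}

# One pass over all rows of C assigns every element of X:
# labels_x <- vapply(seq_len(nrow(C)), assign_x, integer(1),
#                    C = C, rep_x = rep_x, rep_y = rep_y, nu = nu, q = 2)

The assignment of the elements of Y works in the same way with the roles of rows and columns exchanged.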

Algorithm 4: Cost-based clustering with random representatives

Input: X, Y, µ, ν, C, n, m, K, q, accel
Output: Clusterings X, Y with gap matrix G
for k = 1, ..., K do
    Initialize X^(k) ← {X_1^(k), ..., X_n^(k)}, Y^(k) ← {Y_1^(k), ..., Y_m^(k)} with X_i^(k) = Y_j^(k) = ∅
    Sample {x̂_1, ..., x̂_n} ⊆ X from µ without replacement
    Sample {ŷ_1, ..., ŷ_m} ⊆ Y from ν without replacement
    for i = 1, ..., N do
        for j = 1, ..., n do
            if accel then
                a_j ← Σ_{t=1}^{m} ν(ŷ_t) · |c(x_i, ŷ_t) − c(x̂_j, ŷ_t)|^q
            else
                a_j ← Σ_{t=1}^{M} ν(y_t) · |c(x_i, y_t) − c(x̂_j, y_t)|^q
            end
        end
        l ← argmin_j a_j
        X_l^(k) ← X_l^(k) ∪ {x_i}
    end
    for i = 1, ..., M do
        for j = 1, ..., m do
            if accel then
                a_j ← Σ_{t=1}^{n} µ(x̂_t) · |c(x̂_t, y_i) − c(x̂_t, ŷ_j)|^q
            else
                a_j ← Σ_{t=1}^{N} µ(x_t) · |c(x_t, y_i) − c(x_t, ŷ_j)|^q
            end
        end
        l ← argmin_j a_j
        Y_l^(k) ← Y_l^(k) ∪ {y_i}
    end
end
(X, Y) ← argmin over (X^(1), Y^(1)), ..., (X^(K), Y^(K)) of min{Grow, Gcol, Gmin}
G ← gap matrix with respect to X and Y

5.3.3 Propagation

We propose a general propagation technique, which takes a feasible coarse coupling π̄ on X and Y and outputs a feasible coupling π for the original problem. It works in the general discrete optimal transport setting, does not rely on any structure of the cost matrix and is independent of how the clustering was obtained. The only necessary inputs are the original instance, the coupling π̄ and the clustering itself with the gap matrix. At its core this method is a greedy algorithm akin to the construction of feasible solutions in the transportation simplex method. The mass of every active transport of π̄ is distributed among the elements in the respective clusters. This allows us to only look at isolated submatrices of the cost matrix in order to keep the runtime low. Within the submatrices the mass is propagated by a submatrix-minimum rule, which in each step chooses the available transport with the least cost. The details can be found in Algorithm 5.

Algorithm 5: Propagation algorithm for the construction of a feasible solution to the original problem based on a solution on the clusters

Input: X, Y, µ, ν, C, G, π̄
Output: π
L ← list of transport indices {(k, l) : π̄_{k,l} > 0} in descending order of the values g_{k,l}
for (k, l) ∈ L in order do
    while π̄_{k,l} > 0 do
        (i', j') ← argmin_{(i,j)} {c_{i,j} : x_i ∈ X_k, y_j ∈ Y_l, µ_i > 0, ν_j > 0}
        γ ← min{µ_{i'}, ν_{j'}, π̄_{k,l}}
        π_{i',j'} ← γ
        π̄_{k,l} ← π̄_{k,l} − γ,  µ_{i'} ← µ_{i'} − γ,  ν_{j'} ← ν_{j'} − γ
    end
end

The reason why the non-zero transports are sorted by the values in the gap matrix is the following: if we look at one particular transport π̄_{k,l} > 0 between clusters and fix the coupling π between the elements in these clusters, this might limit the options when propagating other active transports on the same clusters, that is, π̄_{k,l'} for some l' ≠ l or π̄_{k',l} for some k' ≠ k, since some of the sources (µ_i with x_i ∈ X_k) or sinks (ν_j with y_j ∈ Y_l) might already be exhausted. In order to mitigate this limitation, we propagate first those active transports where the difference between the best possible choice (i.e., $c^{\min}_{k,l}$) and the worst possible choice (i.e., $c^{\max}_{k,l}$) is the largest, hence the non-zero transports with the largest gap matrix entries.
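Combined with this ordering rule, a compact R sketch of Algorithm 5 could look as follows. The names are illustrative: members_x[[k]] and members_y[[l]] hold the element indices of the clusters X_k and Y_l, mu and nu are the element masses, pibar is the feasible coupling on the clusters and G is the gap matrix, used only to order the active transports.

# Propagation (sketch): turn a feasible coupling pibar on the clusters into
# a feasible coupling on the original elements. Assumes consistent masses.
propagate <- function(pibar, G, C, mu, nu, members_x, members_y) {
  coupling <- matrix(0, nrow = length(mu), ncol = length(nu))
  act <- which(pibar > 0, arr.ind = TRUE)                        # active transports
  act <- act[order(G[act], decreasing = TRUE), , drop = FALSE]   # largest gap first
  for (r in seq_len(nrow(act))) {
    k <- act[r, 1]; l <- act[r, 2]
    rows <- members_x[[k]]; cols <- members_y[[l]]
    while (pibar[k, l] > 0) {
      costs <- C[rows, cols, drop = FALSE]
      costs[!outer(mu[rows] > 0, nu[cols] > 0, "&")] <- Inf      # exhausted cells
      idx <- which(costs == min(costs), arr.ind = TRUE)[1, ]     # cheapest open cell
      i <- rows[idx[1]]; j <- cols[idx[2]]
      gamma <- min(mu[i], nu[j], pibar[k, l])
      coupling[i, j] <- gamma
      pibar[k, l] <- pibar[k, l] - gamma
      mu[i] <- mu[i] - gamma
      nu[j] <- nu[j] - gamma
    }
  }
  coupling
}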

It is apparent that Algorithm 5 outputs a feasible coupling π if the initial coupling π̄ was feasible on the clusters. The quality of π with respect to the original optimal transport problem, however, depends highly on the quality of π̄, as well as on the quality of the clustering itself. For example, if X and Y constitute a perfect clustering, that is, all entries of the gap matrix are zero, and π̄ is an optimal solution to the optimal transport problem on the clusters with C̄ = Cmin = Cmax, then π is automatically an optimal coupling. This is not guaranteed in any other case, however. Note that while the solution is always feasible, it might not be a basic feasible solution, since we look at non-zero transports on the clusters separately, which means we might not deplete a source or sink when depleting π̄_{k,l} (in Algorithm 5 this is the case γ < µ_{i'} and γ < ν_{j'} in the while loop). If it is necessary to obtain a basic solution in order to apply a refinement strategy, for example a simplex method, the algorithm can be extended to account for that. If there are cycles in the matrix π, they include one of the elements where neither source nor sink is depleted; therefore, by keeping track of all of these entries, all cycles can easily be detected and eliminated.

5.4 Simulations

The clustering algorithms described in the previous section have been implemented in R and tested for their performance in terms of runtime and the quality of the resulting clusterings, measured by the gap objective functions. The propagation algorithm was implemented as well and used to compare, via the original cost function, the feasible solutions to the original problems obtained from the different clusterings. In this section, we describe in detail under which conditions and on which instances these tests were conducted and report their results.

5.4.1 Simulation Setup

In order to be able to test many combinations of parameters of the clustering algorithms, we restricted ourselves to three instances of optimal transport.

The three instances were taken from the DOTmark: images 1 and 2 from the three classes Cauchy Density, Classic Images and Shapes, respectively, in 32 × 32 resolution, that is, $X = Y = \{1,\dots,32\}^2$ with $c(x_i, y_j) = \|x_i - y_j\|^2$. Descriptions of the different classes and the DOTmark altogether can be found in [78] and Chapter 3. This choice allows us to look at instances with structurally different measures µ and ν, while they are still supported on a grid, so that we can compare the results of the clustering methods above to the standard coarsening of images used in classic multiscale methods. The runtimes are the main reason why we only conduct the test on 32 × 32 instances. Of course, finding clusterings via the methods described above is possible for larger instances. However, especially Algorithm 3 already takes a long time on 64 × 64 instances, which is why we decided against testing all combinations of parameters on larger instances in this simulation study.
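For concreteness, such a grid instance can be set up along the following lines; the placeholder images below merely stand in for the DOTmark data, which is not reproduced here.

# Build a grid-based OT instance from two 32 x 32 images (R sketch).
res  <- 32
img1 <- matrix(runif(res^2), res, res)   # placeholders for the DOTmark images
img2 <- matrix(runif(res^2), res, res)
grid <- as.matrix(expand.grid(x1 = 1:res, x2 = 1:res))   # support points
mu   <- as.vector(img1)                                  # pixel masses of image 1
nu   <- as.vector(img2)                                  # pixel masses of image 2
# squared Euclidean cost c(x_i, y_j) = ||x_i - y_j||^2
C <- outer(grid[, 1], grid[, 1], "-")^2 +
     outer(grid[, 2], grid[, 2], "-")^2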

For the agglomerative clustering, Algorithm 2 was executed when G∗ was chosen as either Gmin or Gprod. For Grow and Gcol the option accel indicates whether Algorithm 2 (accel = true) or Algorithm 3 (accel = false) was used. The number of clusters n was chosen as 16, 64 or 256 in order to match the number of clusters we get through the grid-based coarsening, and we always set m = n. The same instances and numbers of clusters were used for clustering with random representatives. Additionally, Algorithm 4 was tested both with accel = true and accel = false. The power parameter q was chosen as 1, 2 and 3, while the number of repetitions K was either 1, 2 or 5. Independent of the clustering method, the optimal transport problem on the clusters was solved by the transportation simplex provided by the R package transport [80], since this is a robust and well-understood method whose performance does not depend on geometric structure. For the cost matrix C̄, we tried three different options, all of which satisfy Cmin ≤ C̄ ≤ Cmax:

• min: C̄ := Cmin

• median: c̄_{k,l} is defined as the median of the set {c_{i,j} : x_i ∈ X_k, y_j ∈ Y_l}

• center: C̄ := 1/2 · (Cmin + Cmax)
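Given clusterings of X and Y, the matrices Cmin and Cmax, the gap matrix and the three choices of C̄ can be computed along the following lines; lab_x and lab_y are cluster labels in 1, ..., n and 1, ..., m, every cluster is assumed to be nonempty, and all names are illustrative rather than the thesis code. The second function evaluates the four gap objectives for given cluster masses, following the formulas used in Section 5.2 and in the proofs of Section 5.7.

# Cluster-wise cost statistics and the three choices of Cbar (R sketch).
cluster_costs <- function(C, lab_x, lab_y, n, m) {
  cmin <- cmax <- cmed <- matrix(NA_real_, n, m)
  for (k in 1:n) {
    for (l in 1:m) {
      block <- C[lab_x == k, lab_y == l, drop = FALSE]
      cmin[k, l] <- min(block)
      cmax[k, l] <- max(block)
      cmed[k, l] <- median(block)
    }
  }
  list(cmin   = cmin, cmax = cmax,
       gap    = cmax - cmin,        # gap matrix G
       center = (cmin + cmax) / 2,  # "center" choice of Cbar
       median = cmed)               # "median" choice of Cbar
}

# Gap objective values for cluster masses mu_c = mu(X_k), nu_c = nu(Y_l).
gap_objectives <- function(gap, mu_c, nu_c) {
  list(G_row  = sum(mu_c * apply(gap, 1, max)),
       G_col  = sum(nu_c * apply(gap, 2, max)),
       G_min  = sum(outer(mu_c, nu_c, pmin) * gap),
       G_prod = sum(outer(mu_c, nu_c) * gap))
}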

All computations were run on a Linux laptop, on a single core of an Intel Core i5-5300 CPU at 2.3 GHz.

5.4.2 Computational Results

Before we compare the clustering methods, we analyze the effect of the parameter choices on the runtime and quality of the resulting clusterings, both for the agglomerative clustering and the clustering with random representatives. Since the computed instances are images from the DOTmark, we can also compare our results to the standard multiscale procedure of coarsening images and observe the deviation bounds, as well as the quality of the feasible solutions obtained by the propagation algorithm described in Section 5.3.3. Since the instances are computed on the grid $\{1,\dots,32\}^2$ with high total mass values µ(X) and ν(Y), the optimal cost values of the original instances are roughly of order $10^9$. Accordingly, the deviation bounds shown in this section also have very high values.

Parameter Choices for the Agglomerative Clustering  First, we look at the different linkage criteria in the agglomerative clustering, that is, the choice of G∗ ∈ {Grow, Gcol, Gmin, Gprod}, in Figures 5.2 and 5.3. In Figure 5.2 we observe the clustering runtime of Algorithm 2 dependent on the linkage criterion and the number of clusters. The runtimes with respect to Gmin are far higher than for any other linkage criterion. The other linkage criteria show similar runtimes. While Grow appears to have slightly higher runtimes than Gcol, this can only be attributed to the order of X and Y in the computed instances, since the clustering of (X, Y) with respect to Grow is the same as the clustering of (Y, X) with respect to Gcol. One can also observe that the runtime slightly increases with the number of clusters n. This is surprising at first glance, as more fusion steps are necessary to achieve a smaller number of clusters. However, the two sets X and Y are clustered in succession, which means that for a smaller n, although the first clustering takes slightly longer, the second clustering is performed with respect to fewer clusters, which results in a more significant speedup. This effect is in line with the runtime analysis in Section 5.3.1.

Figure 5.2: Boxplot for the runtime of the agglomerative clustering (Algorithm 2) depending on the number of clusters n and the linkage criterion.

Figure 5.3: This plot shows the mean values of the objective functions Grow, Gcol and Gmin over multiple clusterings for n = 256 on a logarithmic scale. They have been computed with respect to the linkage criterion indicated in the legend.

Next, we look at the values of the different gap objective functions relative to the linkage criterion for n = 256 (Figure 5.3). Note that Gprod is not presented in this figure, since it is not a deviation bound. As expected, if we try to minimize Grow or Gcol in the algorithm, the respective function has the smallest value and the other one is higher. Gmin, however, behaves very differently. As a linkage criterion it leads to clusterings that have moderately high gap values with respect to all three functions, while none of them, and therefore neither their minimum, is small enough to compete with the other linkage criteria. Also, the Gmin values of the clusterings obtained by the other linkage criteria are much higher (by about one order of magnitude). This means that Gmin neither works well as a linkage criterion in the agglomerative clustering, nor are its function values indicative of the quality of a given clustering with respect to the deviation bound γ = min{Grow, Gcol, Gmin}.

Figure 5.4: Runtime and gap value results for the agglomerative clustering method comparing the accelerated Algorithm 2 with the non-accelerated Algorithm 3 for the linkage criteria Grow and Gcol and various choices for n. Gap denotes the deviation bound min{Grow, Gcol, Gmin}. The runtimes are shown on a log-scale.

The linkage criterion Gprod, on the other hand, shows a similar performance in both runtime and gap values compared to Grow and Gcol. Although the values for Gcol are smaller than those for Grow under this criterion, this can only be a result of the order of the two images in the instances, as well as the order of clustering, since Gprod is a symmetric function in X and Y.

Figure 5.4 shows the effect of choosing between the accelerated Algorithm 2 and the slower Algorithm 3 when constructing a clustering for the criteria Grow and Gcol. Recall that this only makes a difference for the clustering of the second set (X with respect to Gcol or Y with respect to Grow). However, it has an enormous impact on the runtime in practice. Algorithm 2 runs between 150 and 200 seconds, where a larger number of clusters n has a small increasing effect on the time, but Algorithm 3 needs about 4000 seconds for n = 16 and up to almost 7500 seconds for n = 256. This does not only show the large impact of having to recalculate the matrix in each step, but also the increased dependence on the number of clusters, since the calculation time of F increases with n.

Despite this disparity in runtime, choosing Algorithm 3 does have its benefits, especially when considering the resulting deviation bounds, since those decrease as expected with a higher number of clusters. We observe an increase by a factor of about three between n = 256 and n = 64 and by another factor of two between n = 64 and n = 16. This does not apply to the accelerated Algorithm 2. While the results for n = 16 are in line with the slower algorithm, the gap values do not decrease as strongly and consistently as for Algorithm 3. For the computed instances we even see values for n = 64 that are lower than the average for n = 256. This is a very surprising result, since fusing clusters can only increase the values in the gap matrix and therefore in the objective functions. Since the first clustering is the same in both algorithms, it seems that the inaccuracy introduced by not recalculating the fusion matrix for the second clustering is more severe for larger n. Since the second clustering depends on the first one and thus on n, a better result with fewer clusters is mathematically – and, as the numerical experiments show, also practically – possible. The instances we chose play another role, since the same algorithms applied to different instances show the expected behaviour, that is, lower gap values for a higher number of clusters (compare Figure 5.5).

To summarize, for small n both algorithms show consistent performance and the runtime is in favor of Algorithm 2. For larger n this runtime benefit increases, but comes at the cost of a less consistent and predictable performance in terms of gap value quality.

Figure 5.5: A check of the accelerated agglomerative clustering applied to the image classes GRFmoderate, LogitGRF and White Noise shows results which are more in line with the expectation that a higher number of clusters leads to lower objective values.

Parameter Choices for the Random Clustering  Next, we analyze the effect of the different parameters of the clustering with random representatives, Algorithm 4. In Figure 5.6 we can see the runtimes (top) and gap values (bottom) of this method for various choices of q and K, while n = 256 and accel = true. As expected, the runtime depends linearly on K. For q the lowest runtimes appear for q = 2, with q = 1 being only slightly higher, whereas the increase for q = 3 is much larger. This is due to the effort necessary to compute the power of q and depends on how this is handled in detail in R. Nevertheless, it is still remarkable that computing the third power increases the runtime by a factor of more than two. The quality of the clusterings does not seem to depend on the power q, therefore it is safe to recommend choosing q = 1 or q = 2 for runtime reasons. A higher number of repetitions K does yield a small improvement in general, but because of the random nature of the algorithm this is not guaranteed. Because of the linearly increasing runtime, it is mostly preferable to choose K = 1, although if consistency is a concern the number can also be increased in order to avoid bad outliers. The results for the non-accelerated version are not shown, but they are qualitatively very similar.

Figure 5.6: Runtime and gap value results for the clustering with random representatives depending on the parameters q and K for n = 256 and accel = true. Gap denotes the deviation bound min{Grow, Gcol, Gmin}. The results of the non-accelerated version look qualitatively very similar.

Figure 5.7: Runtime and gap value results for the clustering with random representatives comparing the accelerated with the non-accelerated Algorithm 4 with various choices for n. Gap denotes the deviation bound min{Grow, Gcol, Gmin}. Results are shown for K = 1, but include all values of q. The outliers in terms of runtime for both options of accel can be explained by the choice q = 3 (compare Figure 5.6).

Figure 5.8: Runtime and gap value results comparing the accelerated versions of both clustering methods. For the agglomerative clustering the results of Algorithm 2 are reported for all linkage criteria.

If we compare the accelerated with the non-accelerated variant (Figure 5.7), we see a large influence on the runtime. It is not as severe as for the agglomerative clustering, but on average the slower version takes about three times as long. What is more, the quality of the resulting clusterings apparently does not depend on the version of the algorithm, as the mean gap values of the two options are comparable. This means that assigning points to clusters only with respect to the costs of the representatives, instead of all cost matrix entries, comes with virtually no disadvantage, and choosing accel = true in Algorithm 4 is universally the better option.

Comparing Both Clustering Methods  Finally, we have a look at how the two clustering methods compare to each other. To that end, we look at the results for both accelerated versions in Figure 5.8. Of course, the runtimes of the clustering with random representatives are much shorter across the board, even if we include the results for higher values of K and q.

Figure 5.9: Comparing the interval lengths 2γ = 2 · min{Grow, Gcol, Gmin} given by the gap values, denoted 2·Gap, with the interval lengths $\langle G, \pi^*_{\min} \rangle$ after computing $\pi^*_{\min}$, denoted Pi Min, for the three instances. Data for different replicates of clusterings with random representatives is scattered across the x-axis. The original optimal cost values $c^*$ are depicted by the horizontal lines for comparison. The values are shown on a log-scale.

For the deviation bounds we observe that the random clustering shows a higher dependence of the quality of clusterings on the number of clusters, as the decrease in the bounds with an increase in n is much more noticeable compared to the effect on the agglomerative clustering. More importantly, the random clustering does not result in worse clusterings in general. While the error bounds are larger for few clusters (n = 16) and very similar for n = 64, the random clustering outperforms the agglomerative clustering on average in both runtime and bound quality for large n. However, we should note that results for all linkage criteria are shown for the agglomerative clustering, including the criterion Gmin, which has been shown to have a particularly poor performance with respect to min{Grow, Gcol, Gmin} in Figure 5.3 and hence is responsible for some outliers here. On the other hand, the agglomerative clustering is designed to minimize gap values directly, whereas the random clustering does so only indirectly.

Bound Quality and Multiscale Potential  In Section 5.2.1 we discuss the interval of possible cost values of length 2γ, where γ = min{Grow, Gcol, Gmin}, and the option to decrease the length of this interval by computing $\pi^*_{\min}$, which decreases the length from 2γ to $\langle G, \pi^*_{\min} \rangle$. In Figure 5.9 we look at how these values compare in practice. While we know that $\langle G, \pi^*_{\min} \rangle \le \gamma$, it is noteworthy that in the experiments the former is smaller by far. Especially for n = 256 they differ by more than one order of magnitude. When comparing to the true value $c^*$, the deviation bounds γ are too large to give a meaningful approximation interval, even for large n. This is much better for $\langle G, \pi^*_{\min} \rangle$, but even then the best results show an interval length which is about as large as the true value itself. This limits the usefulness of the solution on the clusters as an approximation, and $\pi^*_{\min}$ needs to be computed in order for it to be viable. However, this is not surprising considering that the deviation bounds hold for any choice of C̄ between Cmin and Cmax, and we can therefore not expect very tight bounds. Despite this bad news there are also positive results. In particular, the feasible solutions obtained by the propagation method (Algorithm 5) from optimal solutions on the clusters with respect to the median cost matrix C̄ have cost values which are much closer to the original optimal cost values than the deviation bounds might indicate. Figure 5.10 gives an in-depth look at the various bounds and values. We compare the accelerated clustering with random representatives with the grid-based coarsening, that is, 4, 16 or 64 pixels that form a square are aggregated, which results in 256, 64 and 16 clusters, respectively. Figure 5.11 shows a subset of this data. We can infer from the plot that both the optimal cost values of the clustered instances and the cost values of the propagated solutions are significantly closer to the original optimal cost than the lower and upper bounds, for all of the observed clusterings. Recall that the values of $\langle G, \pi^*_{\min} \rangle$ from Figure 5.9 are exactly the difference between the lower and upper bounds reported here. Although this gives further indication that these bounds are not tight, they are rigorous and can thus still be useful. Furthermore, the clustered and refined cost values are a reliable source for approximations (see Figure 5.11).

Figure 5.10: Different bounds are compared to bounds obtained from the grid-based coarsening of images (highlighted with a blue background). The x-axis shows the three different instances. On the y-axis the following quantities are shown, vertically aligned for each clustering: the Lower Bound and Upper Bound from Lemma 5.2.5, the Clustered Optimal Cost on the clusters with respect to the cost matrix C̄ whose entries are the medians of the values in the respective submatrices of the original cost matrix (compare Section 5.4.1), and the Refined Cost, that is, the cost value of the feasible solution after the solution on the clusters has been propagated via Algorithm 5. The Original Optimal Cost for each instance is plotted as a horizontal line. Shown are 5 replicates of clusterings with random representatives (accelerated, K = 1, q = 2) and the grid-based coarsening, for n = 16, 64 and 256; within each instance, clusterings are sorted in ascending order of the refined cost value.

Figure 5.11: A subset of the same data as in Figure 5.10, limited to n = 256 for better visibility.

The comparison between the random clustering and the grid-based coarsening is, unsurprisingly, in favor of the coarsening, as aggregating nearby pixels seems to also work well in the context of cost-based clustering, and the resulting cost values of the propagated solution are closest to the original optimal costs. In the Cauchy Density instance, the results for n = 16 are even better than the results of the clustering with random representatives for n = 64. This is further evidence that the standard aggregation of pixels works well in a multiscale scheme. However, the results for the random clustering are not far off. In Figure 5.11 we see that the approximation quality of the grid-based coarsening is slightly better in the Cauchy Density and Classic Images instances, but this difference is negligible in the Shapes class. It is not surprising that the grid-based coarsening is the superior clustering method in grid-based instances, but the random clustering, a vastly more general method, replicates the performance reasonably well without relying on geometric information and shows promise to deliver a similar performance on instances without a metric. This opens up multiscale approaches to general optimal transport problems.

5.5 A General View on Cost-Based Clustering

As mentioned before, clustering methods usually consider only one set X with some distance function d, whereas we look at a cost function c: X × Y → R+ between two sets X and Y. This is a generalization, since one can choose X = Y and c = d, and a very flexible framework that can cater to different objectives. The gap objective functions we define in Section 5.2 are examples of different weightings of the entries of the gap matrix, but any other weighting can be used as an objective function for the cost-based clustering problem. Moreover, the gap matrix can be replaced by a different notion of quality of clusters. Instead of the difference between the maximum and minimum entry of a submatrix one could use, for example, the mean absolute deviation of the values in the submatrix or other options.

Generalizations of the Clustering Algorithms  The agglomerative clustering method is a general principle and can be applied to a wide range of clustering objectives. Algorithms 2 and 3 can be used as stated for other functions, provided they translate well into linkage criteria. In order to calculate the fusion matrix F we need to be able to compute the increase in the objective function for the fusion of any two clusters. It depends on the function whether or not all of the entries in the fusion matrix change after agglomerating two clusters, and to what extent. If this is the case, it might be necessary or helpful to use the slower Algorithm 3 instead of Algorithm 2. The clustering method with random representatives can even more easily be generalized to other weightings of the gap matrix. Since Algorithm 4 generally constructs clusterings with small gap matrix entries rather than minimizing a concrete objective function, this algorithm can be applied directly. It can also be used for different cluster quality criteria, as long as small differences of costs within the same clusters are preferred, since this difference is the most important factor in the assignment decision. Otherwise, one has to adjust the decision step with respect to the quality criterion.

Examples of Geometric Clusterings  Although cost-based clustering methods are not necessarily designed for Euclidean spaces, they can still be applied there and yield interesting results that other clustering methods would not find. The following examples are based on the Euclidean distance in $\mathbb{R}^D$ for various D and show that clusterings of X are not limited to points that are spatially close, but can include features like spherical shapes or separated cluster parts. Consider the extreme case where Y = {y} is a singleton. Then X is clustered based on the distances of its points to y, which means that the clusters roughly form concentric spheres around y. Similarly, if Y is (roughly) supported on an affine subspace, X is divided into slices that are orthogonal to this subspace, and each slice has the same spherical features as in the singleton case. This can be observed in Figure 5.12. Although it looks like the mentioned example in $\mathbb{R}^2$ where Y is a singleton, this example is actually three-dimensional, and the figure shows the projection onto the second and third component. The set X contains samples from a standard multivariate normal distribution. In the clustering it is divided into multiple layers of clusters in $x_1$-direction. Figure 5.12 shows one layer around the $(x_2, x_3)$-plane, that is, all clusters containing both points with a positive $x_1$-value and points with a negative $x_1$-value. The points in this layer are roughly divided with respect to their distance to the origin, since the entries in the cost matrix of the points in X only depend on their $x_1$-coordinate as well as on the distance to the origin in the $(x_2, x_3)$-plane. In the example shown in Figure 5.13, Y is supported on a line in $\mathbb{R}^2$ and the set X consists of samples of a bivariate normal distribution, roughly distributed on a second line orthogonal to the first (top left panel). After a rotation of the set X around the origin we observe different cluster shapes, as the relation between X and Y changes (other panels). In the orthogonal case we see a division of X into slices near the origin, whereas clusters further away are divided into two parts each, one above and one below Y, in a somewhat symmetrical fashion. For these points, similarity of rows in the cost matrix depends mainly on the distance to the line. This effect decreases at different angles. At 60 degrees (top right) there still are non-connected clusters, whereas for 30 degrees and parallel lines (bottom panels) the clusters are mostly slice-structured.

Figure 5.12: A geometric example for cost-based clustering in three dimensions. The set Y consists of twenty points evenly spread on the line segment $[-2, 2] \times \{0\} \times \{0\} \subseteq \mathbb{R}^3$, while X consists of 200 samples from a standard multivariate normal distribution in $\mathbb{R}^3$. X is clustered into twenty subsets by the clustering algorithm with random representatives using $c(x_i, y_j) = \|x_i - y_j\|_2^2$. The figure shows a projection on the $(x_2, x_3)$-plane. The projection of Y is {(0, 0)}, shown by a cross at the origin, and for X we only show clusters that contain points with a positive and a negative $x_1$-component, as colored circles.

Figure 5.13: A simple example for cost-based clustering on $\mathbb{R}^2$. The set Y is fixed in all panels and consists of twenty points evenly spread across the line segment $\{(x, -x) \in \mathbb{R}^2 : -2 \le x \le 2\}$ (black crosses). The set X (colored circles) consists of 1000 samples from a bivariate normal distribution with mean vector zero and a covariance matrix with ones on the diagonal and off-diagonal entries 0.9, shown in the upper left panel. The same set is turned by π/6 (upper right panel), π/3 (lower left panel) and π/2 (lower right panel) around the origin. The agglomerative clustering was used in all cases with $c(x_i, y_j) = \|x_i - y_j\|_2^2$, n = 10 and linkage criterion Gprod. The ten clusters are indicated by the ten different colors.

5.6 Summary and Discussion

In this chapter, we introduced the cost-based clustering problem as a way to generalize multiscale approaches, which are used to great effect in reducing runtimes for grid-based or other Euclidean optimal transport problems, to more general and abstract instances of optimal transport, including matching problems in economics. Our clusterings come with rigorous deviation bounds for the solutions on the clusters, and while finding the optimal clustering is an NP-hard problem, it is possible to generate clusterings of good quality in reasonable time with the algorithms presented above. Moreover, we present a well-working propagation algorithm, which can be applied in any multiscale scheme and is compatible with refinement techniques, such as the transportation simplex.

In experiments on the DOTmark we explored the performance of the proposed clustering algorithms and analyzed parameter choices. To summarize these results, it seems that in most circumstances trying to obtain higher quality clusters is not worth the additional time investment. For example, the clustering with random representatives can lead to better or worse clusterings (depending on the number of clusters) compared to the agglomerative clustering, but is much faster. Similarly, running multiple tries (K > 1) does not seem to make much of a difference, and the same applies to the choice of the non-accelerated over the accelerated version. This suggests the random representatives clustering algorithm as a prime candidate for applications in multiscale settings, although the agglomerative clustering has its merits in that it can easily be tuned towards constructing a hierarchical clustering, which is not as straightforward for the random clustering. Unfortunately, the theoretical and practical runtime analyses of the agglomerative clustering show that in the context of multiscale methods for optimal transport it is prohibitively slow.

The comparison of the rigorous deviation bounds of the clusterings with the optimal original cost shows that the intervals of potential cost values are generally rather large. However, the numerical experiments also suggest that the cost values of the clustered solutions are typically much closer to the original cost values and that the propagated solutions provide significantly tighter upper bounds. What is more, there seems to be a correlation between the quality of the clustering in terms of the deviation bound and the quality of the propagated solution. That means, even if the bounds given by the gap objective values themselves are not highly relevant, using them as quality measures for clusterings may still be useful.

Although we do not have clear evidence of a substantial advantage of using cost-based clustering in a non-Euclidean multiscale scheme for optimal transport at this time, the results of the clustering and propagation algorithms we gathered from the simulations on the DOTmark are promising. In these grid-based instances the coarsening, which is usually used for multiscale purposes, does outperform the cost-based clustering approach as expected, but only by small margins when comparing the cost values of the propagated solutions (compare Figure 5.11).
Since there is – to the best of my knowledge – no suitable general multiscale approach for optimal transport problems without a metric context, and since multiscale approaches generally lead to a considerable speedup in the grid-based case, it is a reasonable endeavor to pursue a cost-based clustering approach in cases where other multiscale approaches fail, even if we do not expect quite as significant speedups. On top of all of that, we show that the two-sided clustering approach serves as a generalization of the usual one-sided case and allows for clusterings to be constructed with respect to relations of the elements in one set to the other, instead of relations only within the set itself. The algorithms we present are adaptable beyond the context of optimal transport, and we show some interesting examples where the relation between the two sets plays an important role and the resulting clusterings show remarkable features that conventional clustering methods would not be able to find.

5.7 NP-Hardness Proof

This section is dedicated to the proof of Theorem 5.2.3. The proof is done via a chain of reductions from 3-satisfiability (3-SAT) over a simplified geometric version of the cost-based clustering problem and a one-sided version to the original clustering problem. This section and the overall proof are structured as follows: First, we introduce two simplifications of the cost-based clustering problem, the one-sided cost-based clustering problem (1-CBC) and the geometric clustering problem (GC). Both have four different objective functions in correspondence to the four functions of the original clustering problem. We show that GC can be reduced to 1-CBC and that this can be reduced to CBC. These reductions are done for the four different functions in parallel. Secondly, we explain the general idea and geometric construction behind the reductions from 3-SAT to GC. The reduction scheme was introduced in [56]; we describe the features of their construction that we adopt and add some aspects which are necessary for the proof in this section.

Thirdly, we go into the details and prove the NP-hardness of GC directly. We need to perform the reduction twice, as each of the two constructions covers two of the four objective functions in GC. Lastly, we prove two auxiliary lemmas that are needed in the reduction proofs. Overall, we prove the following chain of reductions,

3-SAT ≤p GC ≤p 1-CBC ≤p CBC,

where A ≤p B means that problem A is polynomially reducible to problem B.

5.7.1 Geometric Clustering Simplification

We define two simplifications of the cost-based clustering problem and prove that they can be reduced to CBC.

Definition 5.7.1. Following the notation of CBC, the one-sided cost-based clustering problem (1-CBC) is to minimize the selected objective function G∗ ∈ {Grow, Gcol, Gmin, Gprod} over all possible clusterings X of X into n subsets, while the clustering Y = {Y1, ..., Ym} is fixed.

Proposition 5.7.2. 1-CBC ≤p CBC.

Proof. We show this reduction simultaneously for the four objective functions. Technically, these are four separate reductions, but the constructions and arguments are identical. This means that everything in this proof holds for each objective function, unless stated otherwise; we handle the objective functions separately where necessary. From a given instance of 1-CBC we create an instance of CBC which has the same optimal objective value and whose optimal clustering contains the optimal clustering for 1-CBC. Let an instance of 1-CBC be given by X, Y, µ, ν and a cost matrix C with a fixed clustering Y = {Y1, ..., Ym}, as well as the desired number of clusters n ∈ N for X. Let without loss of generality $Y_1 = \{y_1, \dots, y_{k_1}\}$, $Y_2 = \{y_{k_1+1}, \dots, y_{k_2}\}, \dots, Y_m = \{y_{k_{m-1}+1}, \dots, y_M\}$. We construct an instance for CBC with sets X̄, Ȳ, measures µ̄, ν̄, cost matrix C̄ and the numbers of desired clusters n̄ and m̄ as follows:

• X̄ := X ∪ {x̄_1, ..., x̄_m}. One element is added for each cluster of Y.

• Ȳ := Y ∪ {ȳ}.

• µ̄(x_i) := µ_i for all x_i ∈ X and µ̄(x̄_i) := µ(X) for i = 1, ..., m.

• ν̄(y_j) := ν_j for all y_j ∈ Y and ν̄(ȳ) := m · µ(X). Note that with these definitions µ̄(X̄) = (m + 1) · µ(X) = ν(Y) + ν̄(ȳ) = ν̄(Ȳ).

• C̄ is defined as the cost matrix for the cost function c̄: X̄ × Ȳ → R+ with

– c̄(x_i, y_j) = c_{i,j} for all x_i ∈ X, y_j ∈ Y,

– $\bar{c}(\bar{x}_k, y_j) = \begin{cases} A, & \text{if } y_j \in Y_k, \\ 0, & \text{else,} \end{cases}$ for all k = 1, ..., m, y_j ∈ Y, where
\[ A := \frac{\mu(X)}{\min_j \nu_j} \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big) + \max_{i,j} c_{i,j}, \]

– c̄(x, ȳ) = 0 for all x ∈ X̄.

The structure of C̄ is shown below. Note that A < ∞, since min_j ν_j > 0.

• n̄ := n + m and m̄ := m + 1.

\[
\bar{C} =
\begin{array}{c|cccc|c}
 & Y_1 & Y_2 & \cdots & Y_m & \bar{y} \\
\hline
X & \multicolumn{4}{c|}{C} & 0 \\
\hline
\bar{x}_1 & A \cdots A & 0 & \cdots & 0 & 0 \\
\bar{x}_2 & 0 & A \cdots A & & \vdots & \vdots \\
\vdots & \vdots & & \ddots & 0 & \\
\bar{x}_m & 0 & \cdots & 0 & A \cdots A & 0
\end{array}
\]
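Rendered as code, the construction can be sketched as follows in R (illustrative names only; lab_y contains the fixed cluster labels of the elements of Y in 1, ..., m).

# Build the CBC instance of Proposition 5.7.2 from a 1-CBC instance (sketch).
build_cbc_instance <- function(C, mu, nu, lab_y) {
  m <- max(lab_y)
  A <- sum(mu) / min(nu) * (max(C) - min(C)) + max(C)
  extra <- matrix(0, nrow = m, ncol = ncol(C))   # rows for xbar_1, ..., xbar_m
  for (k in 1:m) extra[k, lab_y == k] <- A
  Cbar <- cbind(rbind(C, extra), 0)              # zero column for ybar
  list(Cbar  = Cbar,
       mubar = c(mu, rep(sum(mu), m)),           # mubar(xbar_k) = mu(X)
       nubar = c(nu, m * sum(mu)))               # nubar(ybar)   = m * mu(X)
}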

Now we show that this instance of CBC has the same optimal objective value $\bar{G}^*$ as the original instance of 1-CBC ($G^*$). Let $\mathcal{X}^* = \{X_1^*, \dots, X_n^*\}$ be an optimal clustering of X for 1-CBC. Then an optimal solution to the CBC instance is given by the clusterings $\bar{\mathcal{X}}^* = \mathcal{X}^* \cup \{\{\bar{x}_1\}, \dots, \{\bar{x}_m\}\}$ of X̄ into n̄ subsets and $\bar{\mathcal{Y}}^* = \mathcal{Y} \cup \{\{\bar{y}\}\}$ of Ȳ into m̄ subsets. The gap matrix for these clusterings has the form
\[ \bar{g}^* = \begin{pmatrix} g^* & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{(n+m) \times (m+1)}, \]
where $g^* \in \mathbb{R}^{n \times m}$ is the gap matrix of the optimal clustering for 1-CBC. Thus, the gap objective value of $\bar{g}^*$ is the same as that of $g^*$, which is $G^*$. This holds for each of the four objective functions. It remains to be shown that this is indeed an optimal clustering for CBC. To that end, let $\bar{\mathcal{X}} = \{\bar{X}_1, \dots, \bar{X}_{\bar{n}}\}$ and $\bar{\mathcal{Y}} = \{\bar{Y}_1, \dots, \bar{Y}_{\bar{m}}\}$ be a clustering with gap objective value $\bar{G} < G^*$. We show that this is impossible for each of the four objective functions. Note that
\[ G^* \le \mu(X) \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big) \]
for the objective functions Grow, Gcol and Gmin, and
\[ G^*_{\mathrm{prod}} \le \mu(X)^2 \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big), \]
since these are the values of the trivial clustering for n = m = 1. We show that the following two statements hold:

i) $\{\{\bar{x}_1\}, \dots, \{\bar{x}_m\}\} \subseteq \bar{\mathcal{X}}$.

ii) $\bar{\mathcal{Y}} = \bar{\mathcal{Y}}^*$.

Assume that one of these is violated. Due to the structure of the cost matrix C̄ there are $k', l'$ with an entry $\bar{g}_{k',l'}$ of the gap matrix of the clusterings $\bar{\mathcal{X}}, \bar{\mathcal{Y}}$ such that
\[ \bar{g}_{k',l'} \ge A - \max_{i,j} c_{i,j} \ge \frac{\mu(X)}{\min_j \nu_j} \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big). \]
The corresponding cluster $\bar{X}_{k'} \in \bar{\mathcal{X}}$ has an element $\bar{x}_i \in \bar{X}_{k'} \cap (\bar{X} \setminus X)$, thus $\bar{\mu}(\bar{X}_{k'}) \ge \mu(X)$, and the corresponding cluster $\bar{Y}_{l'} \in \bar{\mathcal{Y}}$ has $\bar{\nu}(\bar{Y}_{l'}) \ge \min_j \nu_j$. Therefore, we have

\[
\bar{G}_{\mathrm{row}} = \sum_{k=1}^{\bar{n}} \bar{\mu}(\bar{X}_k) \max_{l=1,\dots,\bar{m}} \bar{g}_{k,l}
\ge \bar{\mu}(\bar{X}_{k'}) \cdot \frac{\mu(X)}{\min_j \nu_j} \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big)
\ge \mu(X) \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big) \ge G^*_{\mathrm{row}},
\]

\[
\bar{G}_{\mathrm{col}} = \sum_{l=1}^{\bar{m}} \bar{\nu}(\bar{Y}_l) \max_{k=1,\dots,\bar{n}} \bar{g}_{k,l}
\ge \bar{\nu}(\bar{Y}_{l'}) \cdot \frac{\mu(X)}{\min_j \nu_j} \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big)
\ge \mu(X) \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big) \ge G^*_{\mathrm{col}},
\]

\[
\bar{G}_{\mathrm{min}} = \sum_{k=1}^{\bar{n}} \sum_{l=1}^{\bar{m}} \min\{\bar{\mu}(\bar{X}_k), \bar{\nu}(\bar{Y}_l)\} \cdot \bar{g}_{k,l}
\ge \bar{\nu}(\bar{Y}_{l'}) \cdot \frac{\mu(X)}{\min_j \nu_j} \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big) \ge G^*_{\mathrm{min}},
\]

\[
\bar{G}_{\mathrm{prod}} = \sum_{k=1}^{\bar{n}} \sum_{l=1}^{\bar{m}} \bar{\mu}(\bar{X}_k) \cdot \bar{\nu}(\bar{Y}_l) \cdot \bar{g}_{k,l}
\ge \mu(X) \cdot \bar{\nu}(\bar{Y}_{l'}) \cdot \frac{\mu(X)}{\min_j \nu_j} \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big)
\ge \mu(X)^2 \cdot \Big( \max_{i,j} c_{i,j} - \min_{i,j} c_{i,j} \Big) \ge G^*_{\mathrm{prod}},
\]

which is a contradiction to $\bar{G} < G^*$ for each of the gap objective functions. Thus, statements i) and ii) above are satisfied. The remaining n̄ − m = n clusters, $\bar{\mathcal{X}} \setminus \{\{\bar{x}_1\}, \dots, \{\bar{x}_m\}\}$, form a clustering of X, so a feasible clustering for the original instance of 1-CBC. The gap objective value of this clustering is $\hat{G} \le \bar{G}$, since it is a part of the clustering for CBC, thus $\hat{G} < G^*$. Since $\bar{\mathcal{Y}} = \bar{\mathcal{Y}}^* = \mathcal{Y} \cup \{\{\bar{y}\}\}$, this is a contradiction to the assumption that $G^*$ is the optimal objective value of 1-CBC. We conclude that the constructed instance of CBC has the same optimal objective value $G^*$ as the original instance of 1-CBC and its optimal clustering contains an optimal clustering for the original problem.

Definition 5.7.3. Let $B = \{b_1, \dots, b_N\} \subseteq \mathbb{R}^2$ with $b_i = (b_{i,1}, b_{i,2})$ for i = 1, ..., N and n ∈ N. The geometric clustering problem (GC) with respect to G∗ ∈ {Gsq, Gr1, Gr2, Gr3} is to find a partition $\mathcal{B} = \{B_1, \dots, B_n\}$ of B into n subsets such that $G_*(\mathcal{B})$ is minimized. The four functions are defined as follows:

\[ G_{\mathrm{sq}}(\mathcal{B}) := \sum_{k=1}^{n} \Big( |B_k| \cdot \max_{b_i, b_j \in B_k} \|b_i - b_j\|_\infty \Big) = \sum_{k=1}^{n} \big( |B_k| \cdot \max\{s_1(B_k), s_2(B_k)\} \big), \]

\[ G_{\mathrm{r1}}(\mathcal{B}) := \frac{N}{2} \cdot \Big( \max_{k=1,\dots,n} s_1(B_k) + \max_{k=1,\dots,n} s_2(B_k) \Big), \]

\[ G_{\mathrm{r2}}(\mathcal{B}) := \sum_{k=1}^{n} \min\Big\{ |B_k|, \frac{N}{2} \Big\} \cdot s(B_k), \]

\[ G_{\mathrm{r3}}(\mathcal{B}) := \frac{N}{2} \cdot \sum_{k=1}^{n} |B_k| \cdot s(B_k), \]

where we define
\[ s_1(B_k) := \max_{b_i \in B_k} b_{i,1} - \min_{b_i \in B_k} b_{i,1}, \qquad s_2(B_k) := \max_{b_i \in B_k} b_{i,2} - \min_{b_i \in B_k} b_{i,2} \]
and
\[ s(B_k) := s_1(B_k) + s_2(B_k) \]
as the extension of cluster $B_k$ in $x_1$-direction, in $x_2$-direction, and their sum, respectively. These four objective functions correspond to the four functions in CBC and 1-CBC.

Each objective function defines one version of the problem. The first objective Gsq is the sum over all clusters of the number of points in that cluster multiplied by the side length of the smallest axis-parallel square that covers all points in the cluster. Gr1 is N/2 times the sum of the largest extension of any cluster in $x_1$-direction and the largest extension of any cluster in $x_2$-direction; in other words, it is N/4 times the perimeter of the smallest axis-parallel rectangle that can cover any of the clusters when properly translated. Gr2 and Gr3 are different weightings of the rectangular extensions of the clusters.
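For illustration, the four objectives can be evaluated for a given partition of a planar point set as follows (an R sketch; B is an N × 2 matrix of points and lab a vector of cluster labels in 1, ..., n).

# Objective functions of the geometric clustering problem GC (sketch).
gc_objectives <- function(B, lab) {
  N  <- nrow(B)
  s1 <- tapply(B[, 1], lab, function(v) max(v) - min(v))   # extension in x1
  s2 <- tapply(B[, 2], lab, function(v) max(v) - min(v))   # extension in x2
  sz <- as.vector(table(lab))                              # cluster sizes |B_k|
  list(G_sq = sum(sz * pmax(s1, s2)),
       G_r1 = N / 2 * (max(s1) + max(s2)),
       G_r2 = sum(pmin(sz, N / 2) * (s1 + s2)),
       G_r3 = N / 2 * sum(sz * (s1 + s2)))
}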

Note that this problem is solely defined to connect CBC with 3-SAT and has no purpose beyond that.

Proposition 5.7.4. GC ≤p 1-CBC, that is, Gsq, Gr1, Gr2 and Gr3 can be reduced to Grow, Gcol, Gmin and Gprod, respectively.

Proof. Again, these are technically four reductions, which we show simultaneously. From an instance of GC we construct an instance of 1-CBC with a fixed clustering for Y.

Let without loss of generality $B = \{b_1, \dots, b_N\} \subseteq \mathbb{R}_+^2$. Set X = B and Y = {y_1, y_2} with $\mu_i = 1$ for all i = 1, ..., N and $\nu_1 = \nu_2 = N/2$. Fix the partition $\mathcal{Y} = \{\{y_1\}, \{y_2\}\}$. Define the cost matrix $C \in \mathbb{R}_+^{N \times 2}$ by $c_{i,j} = b_{i,j}$ for all i = 1, ..., N, j = 1, 2.

Now any partition of X for 1-CBC is a partition of B for GC. It remains to be shown that the objective values for the two problems of a given clustering coincide. We show that Grow = Gsq, Gcol = Gr1, Gmin = Gr2 and Gprod = Gr3.

Let X = {X1,...,Xn} be a partition. Since

\[ g_{k,l} = \max_{x_i \in X_k} c_{i,l} - \min_{x_i \in X_k} c_{i,l} = s_l(X_k) \]
for k ∈ {1, ..., n} and l ∈ {1, 2}, we have

\[
G_{\mathrm{row}} = \sum_{k=1}^{n} \Big( \mu(X_k) \cdot \max_{l=1,2} g_{k,l} \Big)
= \sum_{k=1}^{n} \Big( |X_k| \cdot \max_{l=1,2} \big( \max_{x_i \in X_k} c_{i,l} - \min_{x_i \in X_k} c_{i,l} \big) \Big)
= \sum_{k=1}^{n} \big( |X_k| \cdot \max\{s_1(X_k), s_2(X_k)\} \big) = G_{\mathrm{sq}},
\]

\[
G_{\mathrm{col}} = \sum_{l=1}^{2} \Big( \nu(Y_l) \cdot \max_{k=1,\dots,n} g_{k,l} \Big)
= \frac{N}{2} \cdot \max_{k=1,\dots,n} g_{k,1} + \frac{N}{2} \cdot \max_{k=1,\dots,n} g_{k,2}
= \frac{N}{2} \cdot \Big( \max_{k=1,\dots,n} s_1(X_k) + \max_{k=1,\dots,n} s_2(X_k) \Big) = G_{\mathrm{r1}},
\]

\[
G_{\mathrm{min}} = \sum_{k=1}^{n} \sum_{l=1}^{2} \min\{\mu(X_k), \nu(Y_l)\} \cdot g_{k,l}
= \sum_{k=1}^{n} \min\Big\{ |X_k|, \frac{N}{2} \Big\} \cdot (g_{k,1} + g_{k,2})
= \sum_{k=1}^{n} \min\Big\{ |X_k|, \frac{N}{2} \Big\} \cdot s(X_k) = G_{\mathrm{r2}}
\]
and
\[
G_{\mathrm{prod}} = \sum_{k=1}^{n} \sum_{l=1}^{2} \mu(X_k) \cdot \nu(Y_l) \cdot g_{k,l}
= \frac{N}{2} \sum_{k=1}^{n} |X_k| \cdot (g_{k,1} + g_{k,2})
= \frac{N}{2} \sum_{k=1}^{n} |X_k| \cdot s(X_k) = G_{\mathrm{r3}}.
\]

5.7.2 Geometric Reduction Concept

The main part is to show that GC is NP-hard with respect to its four objective functions. This proof is done via a reduction from 3-satisfiability. In [56] Megiddo and Supowit reduce the 3-satisfiability problem to four geometric location problems, proving their NP-hardness. While the reduction idea can be adapted to our geometric clustering problem, some additions and adjustments are necessary, since the details of the construction depend strongly on the type of problem and the objective function at hand. We do this reduction proof twice. The first reduction proves the NP-hardness of GC with respect to Gsq and Gr1, while the second reduction handles Gr2 and Gr3. In this section we sketch the idea behind the reduction proofs as introduced in [56], while pointing out the key similarities and differences, before we delve into the details of the proofs in the next section. We start by stating the definition of 3-SAT as given for example in [56]:

Definition 5.7.5. Let $\{u_1, \dots, u_r\}$ be the set of (Boolean) variables and let E be a Boolean expression in 3-conjunctive normal form (3-CNF), that is, $E = E_1 \wedge E_2 \wedge \dots \wedge E_s$, and for all $j = 1, \dots, s$ we have a clause $E_j = x_j \vee y_j \vee z_j$ with $\{x_j, y_j, z_j\} \subseteq \{u_1, \bar{u}_1, u_2, \bar{u}_2, \dots, u_r, \bar{u}_r\}$. The decision problem 3-satisfiability (3-SAT) is to decide whether or not there exists a variable assignment such that E is satisfied.
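For small instances, satisfiability can of course be decided by brute force over all 2^r assignments; the following sketch (illustrative only, with our own encoding of literals as signed integers) does exactly that.

```python
from itertools import product

def satisfiable(clauses, r):
    """Brute-force 3-SAT check: clauses are triples of nonzero integers,
    where +i stands for u_i and -i for the negation of u_i (1 <= i <= r)."""
    for assignment in product([False, True], repeat=r):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

# E = (u1 or not u2 or u3) and (not u1 or not u3 or u2)
print(satisfiable([(1, -2, 3), (-1, -3, 2)], r=3))  # True
```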

Each reduction proof from 3-SAT to GC entails constructing an instance of the decision version of GC from a given instance of 3-SAT, that is, a given 3-CNF E. This means we need to define a suitable finite set $B \subseteq \mathbb{R}^2$, $n \in \mathbb{N}$ and a threshold value $R \in \mathbb{R}$. Then we need to show that with this definition E is satisfiable if and only if there exists a clustering B of B into n subsets with an objective value G∗(B) ≤ R with respect to the objective function

G∗ ∈ {Gsq,Gr1,Gr2,Gr3}. The constructions of B in the proofs inherit the following features from [56], which are common for all objective functions:

• For each variable ui in E we construct a circuit of points, which are arranged such that there are exactly two optimal clusterings, one representing the state true and the other representing the state false of the variable (see Figure 5.15 for an example). The circuits in [56] consist of different objects, such as circles, squares and points, whereas the circuits in our proofs only consist of points. The circuits differ based on the objective function and are specified in more detail in the proofs.

• The circuits may intersect. The junctions between two circuits are handled in a way that does not interfere with the variable assignments (see Figure 5.16 or Figure 5.21).

• For each clause Ej we add a point representing the clause in a location, where the three circuits for the variables appearing in the clause meet (see Figure 5.17 or Figure 5.22). This meeting point is referred to as clause configuration. It is constructed in a way such that if the represented clause is satisfied the additional point can be efficiently included in one of the clusters of the circuits. If not, it forces an inefficient clustering, whose objective function value inevitably exceeds the threshold value R.

In the proofs below we show that with suitable construction details we obtain the equivalence of the satisfiability of E and the existence of a clustering with objective value smaller than or equal to R. A schematic example for the whole construction is shown in Figure 5.14. For the proof of Proposition 5.7.6 it is necessary to add the following features to the construction, both of which are related to the objective functions Gsq and Gr1 and are not present in any of the proofs in [56]:

• To ensure that the optimal clustering for a circuit represents the same variable state before and after a clause configuration we have to add a consistency check configuration for each clause configuration the circuit belongs to by adding two points, which can be included in suitable clusters of the circuit (see Figure 5.18).

• An additional diamond-shaped circuit is added, which is large enough to prevent an optimal clustering formed by non-square rectangles (see Figure 5.20). This ensures that the reduction works for both Gsq and Gr1 simultaneously.

Figure 5.14: Schematic example of the whole construction with four circuits representing the variables u1, u2, u3, u4 and two clauses E1 and E2. The triangles represent the clause configurations and, if necessary, the consistency check configurations are located in the places indicated by the circles. The details of the construction depend on the objective function. The construction can easily be extended to an arbitrary number of variables and clauses, since additional variables can be added horizontally and additional clauses can be added vertically.

Lastly, for each of the proofs we make use of one lemma and both lemmas are proved in Section 5.7.4.

5.7.3 Geometric Reduction Proofs

Proposition 5.7.6. GC with respect to Gsq and Gr1 is NP-hard.

Proof. First, we show this for the objective function Gsq. Then, with the addition of a diamond-shaped circuit, we can enforce that Gsq ≤ R and Gr1 ≤ R are equivalent, which proves the NP-hardness with respect to both functions. For a given 3-CNF E the details of the construction are as follows:

For each variable $u_i$ we have a circuit $P_i$ containing an even number of points, that is, $P_i = \{p_1^{(i)}, \dots, p_{2m_i}^{(i)}\} \subseteq \mathbb{R}^2$ with the following properties:

i) For all $k = 1, \dots, 2m_i - 1$: $\|p_k^{(i)} - p_{k+1}^{(i)}\|_\infty = 1$ and $\|p_{2m_i}^{(i)} - p_1^{(i)}\|_\infty = 1$.

ii) For all $k, l$ not affected by i): $\|p_k^{(i)} - p_l^{(i)}\|_\infty \geq 2$.

Given a circuit $P = \{p_1, \dots, p_{2m}\}$ there are exactly two clusterings of P into m clusters, namely $\mathcal{P} = \{\{p_1, p_2\}, \{p_3, p_4\}, \dots, \{p_{2m-1}, p_{2m}\}\}$ and $\mathcal{P} = \{\{p_2, p_3\}, \dots, \{p_{2m-2}, p_{2m-1}\}, \{p_{2m}, p_1\}\}$, that have an objective value of $G_{sq} = |P| = 2m$. These represent the two assignments true and false for the variable ui and we assign labels t and f to the edges between subsequent points in the circuit to illustrate this. An example of a circuit is shown in Figure 5.15.

Figure 5.15: Example of a circuit (left) with the two optimal clusterings shown as square covers (middle) and the resulting labels t and f for the edges between subsequent points (right).

Pairs of points (p, q) from different circuits have to satisfy $\|p - q\|_\infty \geq 2$, unless they are part of the same junction or clause configuration. Figure 5.14 indicates that this is achievable, as the spaces between the circuits can be extended. Note that condition ii) may be violated in clause configurations and consistency check configurations (details below).
Since the same variable may appear in many clauses, junctions between circuits are necessary and are set up as shown in Figure 5.16. The circuits have a common point at the junction and for any of the four combinations of variable assignments it is possible to cover the common point together with one point of each circuit, saving one cluster in total. That means for r circuits with J junctions there is exactly one clustering into $n = \sum_{i=1}^{r} m_i - J$ clusters with
\[ G_{sq} = \Bigl| \bigcup_{i=1}^{r} P_i \Bigr| = \sum_{i=1}^{r} 2m_i - J \]
for every possible variable assignment. We demand that the indices of the common points in junctions (for example, k and l in Figure 5.16) are all odd. This ensures that there is always an odd number of points between two consecutive junctions in a circuit. This is necessary for the proof that any clustering B with G∗(B) ≤ R represents a consistent variable assignment. We follow the argument in [56] for this.

Now we add one point qj for each clause Ej in E, such that qj can be covered by one of the clusters for the circuits of variables in Ej with no further cost other than adding the point (that is, the diameter of the cluster does not increase) if and only if Ej is satisfied. This is achieved by the precise point placement in clause configurations as shown in Figure 5.17. We require the following conditions:

i) For each variable $u_i$ that appears in $E_j$, there is a $k_{i,j} \in \{1, \dots, 2m_i\}$, such that $\|q_j - p_{k_{i,j}}^{(i)}\|_\infty = \|q_j - p_{k_{i,j}+1}^{(i)}\|_\infty = 1$.

ii) The label on the edge between $p_{k_{i,j}}^{(i)}$ and $p_{k_{i,j}+1}^{(i)}$ is t if variable $u_i$ appears as a positive literal in $E_j$ and f if it appears as a negative literal ($\bar{u}_i$) in $E_j$.


Figure 5.16: A junction between circuits Pi and Pj with a common point (left) and an optimal clustering shown as square cover (right) representing the variable assignment ui = true and uj = false with one cluster including three points.


Figure 5.17: A clause configuration representing the clause $E_j = \bar{u}_1 \vee \bar{u}_2 \vee u_3$ (left). The optimal clustering shown as square cover representing the variable assignment u1 = true, u2 = true and u3 = true (right). Since it satisfies Ej, the point qj can be included in one of the clusters in P3.


Figure 5.18: Example of a consistency check configuration with two additional points (left). This setup ensures that the two points can be included in the clusters with no further cost if and only if the variable assignment is the same before and after the clause configuration (point qj). The clustering representing the state false of the variable and satisfying clause Ej (right).

These conditions ensure that if a variable assignment satisfies a clause Ej, its point qj can be included in the corresponding clustering in at least one of the circuits of the variables in Ej. Conversely, if a variable assignment does not satisfy a clause Ej, either one of the clusters has to be enlarged, leading to a higher optimal objective value of Gsq in GC, or qj is in a cluster with two points from different circuits (see Figure 5.19, left). The latter changes the variable state represented by the affected circuits. We prevent this by adding a consistency check configuration to each circuit in each clause, which ensures that before and after each clause (and therefore in the whole circuit) the same Boolean value is represented, and a clustering that compromises the representation always has a higher objective value. This is achieved by adding two additional points, $q'_{i,j}$ and $q''_{i,j}$, for circuit Pi and clause Ej, that can be covered without enlarging any cluster if and only if the variable state represented before and after a clause in a circuit is the same (see Figure 5.18). We now define the set B for the instance of GC as the set of all points in the construction, that is,
\[ B := \bigcup_{i=1}^{r} P_i \cup \bigcup_{j=1}^{S} \Bigl( \{q_j\} \cup \bigl\{ q'_{i,j}, q''_{i,j} : u_i \text{ appears in } E_j \bigr\} \Bigr). \]

Figure 5.19: A different clustering for the clause is possible, where qj is coupled with two points from two different circuits (left). This can be prevented through the consistency check configurations. If the clustering does not represent the same variable assignment before and after the clause, one of the additional points cannot be included in any cluster without additional cost (right). This prevents an optimal clustering where one cluster contains points of different circuits.

Since the circuits have one common point for each junction and we add 7 points for each clause, the total number of points is
\[ N := |B| = \sum_{i=1}^{r} 2m_i - J + 7S. \]

We define the number of clusters n as
\[ n := \frac{N - 7S - J}{2} = \sum_{i=1}^{r} m_i - J \]
and the threshold value as R := N. We need to show that E is satisfiable if and only if there is a clustering B of B into n subsets with Gsq(B) ≤ R.
To that end, assume that E is satisfiable. This means there is a variable assignment satisfying all clauses simultaneously. Therefore, the clustering of the circuits into n subsets which represents this variable assignment can also cover all additional points in the clause configurations and consistency check configurations, as is shown in Figures 5.17 (right) and 5.18 (right). Since all clusters $B_k \in \mathcal{B}$ have $\max\{s_1(B_k), s_2(B_k)\} = 1$, it follows that
\[ G_{sq}(\mathcal{B}) = \sum_{k=1}^{n} |B_k| \cdot \max\{s_1(B_k), s_2(B_k)\} = \sum_{k=1}^{n} |B_k| = |B| = N. \]

Conversely, let B = {B1,...,Bn} be a clustering of B with Gsq(B) ≤ N.

We define a map $\sigma : \{1, \dots, N\} \to \mathbb{R}_+$ by
\[ \sigma(k) := \begin{cases} \min\bigl\{ \max\{s_1(\bar{B}), s_2(\bar{B})\} : \bar{B} \subseteq B,\ |\bar{B}| = k \bigr\}, & k = 1,\ k = 2,\ k \geq 5, \\ \tfrac{3}{2}, & k = 3, \\ 2, & k = 4, \end{cases} \]
which is the minimum side length of a square that can cover k points of B for all k other than three and four, while for those numbers it is defined as the second smallest possible side length. It is easy to see that σ(1) = 0, σ(2) = 1 and σ(k) ≥ 2 > 2 − 2/k for k ≥ 5.
For k = 3 the smallest possible side length is 1. However, these clusters are only possible if they include either the common point of two circuits in a junction, the additional point qj of a clause configuration or at least one of the additional points $q'_{i,j}$, $q''_{i,j}$ in a consistency check configuration. Apart from these options the smallest side length of a square covering three points is σ(3) = 3/2. Four points can be covered by a square of side length 3/2 if they either include a clause configuration point qj or both points $q'_{i,j}$ and $q''_{i,j}$ in one consistency check configuration. Otherwise, the minimum side length of a square covering four or more points is σ(4) = 2. We define the following quantities:

• $a_k := \bigl| \{ \bar{B} \in \mathcal{B} : |\bar{B}| = k \} \bigr|$ for k = 1, . . . , N, k ≠ 3, 4

• $b_k := \bigl| \{ \bar{B} \in \mathcal{B} : |\bar{B}| = k,\ q_j \in \bar{B} \text{ for some } j \} \bigr|$ for k = 3, 4

• $c_k := \bigl| \{ \bar{B} \in \mathcal{B} : |\bar{B}| = k,\ \max\{s_1(\bar{B}), s_2(\bar{B})\} < \sigma(k),\ q_j \notin \bar{B} \text{ for all } j \} \bigr|$ for k = 3, 4

• $a_k := \bigl| \{ \bar{B} \in \mathcal{B} : |\bar{B}| = k \} \bigr| - b_k - c_k$ for k = 3, 4

For k other than three and four, ak indicates the number of clusters in B containing exactly k points. For k = 3, 4 we separate the clusters into three groups: clusters in clause configurations (b3 and b4), small clusters apart from clause configurations (c3 and c4) and other clusters (a3 and a4).

From this definition it follows directly that b3 + b4 ≤ S. Since small clusters apart from clause configurations with three points contain a junction point or a point in a consistency check configuration and small clusters with four points have to contain both points in a consistency check configuration, c3 + 2c4 ≤ 6S + J. Moreover, since we have n clusters covering N points, the following equations are satisfied:

\begin{align*}
b_3 + c_3 + b_4 + c_4 + \sum_{k=1}^{N} a_k &= n, \\
3(b_3 + c_3) + 4(b_4 + c_4) + \sum_{k=1}^{N} k \cdot a_k &= N.
\end{align*}

All the clusters counted by b3 and c3 have a side length of at least 1, the clusters counted by b4 and c4 have a side length of at least 3/2 and the remaining clusters (counted by ak) have a side length of at least σ(k). Therefore,

\begin{align*}
N \geq G_{sq} &= \sum_{l=1}^{n} |B_l| \cdot \max\{s_1(B_l), s_2(B_l)\} = \sum_{k=1}^{N} \sum_{\substack{l=1 \\ |B_l| = k}}^{n} k \cdot \max\{s_1(B_l), s_2(B_l)\} \\
&\geq 3 \bigl( 1 \cdot (b_3 + c_3) + \sigma(3) \cdot a_3 \bigr) + 4 \Bigl( \tfrac{3}{2} \cdot (b_4 + c_4) + \sigma(4) \cdot a_4 \Bigr) + \sum_{\substack{k=1 \\ k \neq 3,4}}^{N} \sigma(k) \sum_{\substack{l=1 \\ |B_l| = k}}^{n} k \\
&= 3(b_3 + c_3) + 6(b_4 + c_4) + \sum_{k=1}^{N} k \cdot \sigma(k) \cdot a_k.
\end{align*}

This means the system of linear inequalities in Lemma 5.7.9 as well as all assumptions are satisfied. Thus, applying this lemma we can conclude that the only non-zero quantities are a2 = n − 7S − J, b3 = S and c3 = 6S + J. It follows that each cluster in B needs a covering square of side length at least 1, hence Gsq(B) ≥ N; together with Gsq(B) ≤ N this yields Gsq(B) = N, and the side lengths all have to be exactly 1. Moreover, the clusters containing three points have to be located at clause and consistency check configurations (7S clusters) and junctions (J clusters).
Now with the same argument as in [56] we show that this clustering represents a variable assignment. We divide the points in the circuits into junctions (we consider the common point and the four surrounding points as belonging to the junction) and segments between junctions (maximal subsets of consecutive points in the same circuit not belonging to junctions). Since the indices of the common points are all odd, each segment contains an odd number of points, and since each segment is adjacent to two junctions and each junction is adjacent to four segments, there are 2J segments. The segments contain
\[ \sum_{i=1}^{r} 2m_i - 6J \]
points in total, not accounting for additional points in consistency check or clause configurations. Since they each have an odd number of points and clusters have at most a side length of 1, we need
\[ \frac{1}{2} \cdot \Bigl( \sum_{i=1}^{r} 2m_i - 6J - 2J \Bigr) + 2J = \sum_{i=1}^{r} m_i - 2J = n - J \]
clusters in order to cover all of them, while one point in each segment takes up one cluster of its own (we call them singletons). Note that the clusters containing three points in consistency check and clause configurations are also used for this and thus only the J clusters for junctions remain. This means three of the five points in each junction are covered by these clusters and the two remaining points have to be assigned to the singletons in the segments. This is precisely possible, since there are exactly 2J of these singletons and 2J extra points in junctions, but it requires that each singleton is either the first or last point in its segment.
In each segment the clustering represents a variable assignment. This also holds for segments involved in clause configurations, as all of the additional points in the consistency check configurations are contained in separate clusters with three points as shown in Figure 5.18 (right). Along one circuit, if the first point in a segment is the singleton, the first point in the next junction has to belong to the junction cluster, which means the third point in the junction does not. Thus, the first point of the next segment of the circuit again has to be the singleton. It follows that the singletons in the segments of a circuit are either always the first or always the last point, which means that the clustering represents either the value true or false of the variable represented by the circuit. Therefore, the clustering represents a consistent variable assignment for all Boolean variables ui. Since all of the points qj in the clause configurations are contained in clusters with side lengths 1, this variable assignment satisfies all clauses Ej and we can conclude that E is satisfiable.

This finishes the proof with respect to Gsq. Now we show that with the addition of a diamond-shaped circuit $P_{r+1}$ (see Figure 5.20) to the construction with 4Q points, where
\[ Q = \sum_{i=1}^{r} |P_i|, \]
we can enforce that Gr1 ≤ N is equivalent to Gsq ≤ N. This circuit has no meaning for E and is disconnected from the other circuits, thus the above arguments also hold for Gr1.

We first show that Gr1 ≤ N implies Gsq ≤ N. Recall that

\[ G_{r1} = \frac{N}{2} \cdot \Bigl( \max_{k=1,\dots,n} s_1(B_k) + \max_{k=1,\dots,n} s_2(B_k) \Bigr), \]
so if Gr1 ≤ N then there exists an axis-parallel rectangle with side lengths a and b, such that a + b ≤ 2, and each cluster can be covered by a translated version of this rectangle. If 0 < a < 1 or 0 < b < 1, then no two points in Pr+1 share a cluster. Thus, there are at least 4Q clusters, but 4Q > N/2 > n. Similarly, if a = 0 or b = 0, at least 4Q − 2 clusters are necessary and still 4Q − 2 > n (compare Figure 5.20). Hence, a = b = 1. That means every cluster can be covered by a square with side length 1 and thus Gsq ≤ N.

Figure 5.20: The diamond-shaped circuit Pr+1 for Q = 3. A non-square rectangle with side lengths a + b = 2 can never cover more than one point (left rectangle), unless a = 0 or b = 0 (right rectangle), in which case only two clusters of two points each can exist.

Conversely, if Gsq ≤ N, the optimal clustering constructed above consists of clusters covered by squares with side length 1. Therefore, all of the clusters can be covered by a translated rectangle with side lengths a = b = 1, so Gr1 ≤ N/2 · (1 + 1) = N.

Remark 5.7.7. Since Gsq ≤ N and Gr1 ≤ N are equivalent in the above con- struction, the proof also holds for the problem version, where min{Gsq,Gr1} is the objective function and thus the minimization of min{Grow,Gcol} in CBC is NP-hard as well.

Proposition 5.7.8. GC with respect to Gr2 and Gr3 is NP-hard.

Proof. We reduce 3-SAT to GC in a similar way as before. Circuits represent variables and clause configurations are constructed such that there exists an optimal clustering with a certain objective value $R \in \mathbb{R}$ if and only if the given 3-CNF E is satisfiable. The objective function we consider in this proof is
\[ G_{rec} := \sum_{k=1}^{n} |B_k| \cdot s(B_k). \]

Since Gr3 is a constant multiple of Grec (namely Gr3 = (N/2) · Grec), this shows the NP-hardness of GC with respect to Gr3. Later, we show that with this construction Grec ≤ R if and only if Gr2 ≤ R, since there is no optimal clustering with any cluster Bk such that |Bk| ≥ N/2, which proves the NP-hardness of GC with respect to Gr2.
The construction here has a few differences compared to the one in the previous proof:

• Both the junctions and the clause configurations force clusters with three points each that also have a larger extension s(Bk) and therefore a larger contribution to the objective value of the clustering. These are the least possible increases compared to different clusterings and hence the only options for a clustering with a value not exceeding R.

• The additional consistency check configurations are not necessary, since any clustering changing the variable assignment in a clause configuration cannot be optimal due to the specific construction. The diamond-shaped circuit is not necessary either, as the reduction for Grec implies the NP-hardness for both Gr2 and Gr3.

Each circuit is a set of an even number of points $P_i = \{p_1^{(i)}, \dots, p_{2m_i}^{(i)}\} \subseteq \mathbb{R}^2$ representing the Boolean variable $u_i$ such that

i) for all $k = 1, \dots, 2m_i - 1$: $\|p_k^{(i)} - p_{k+1}^{(i)}\|_1 = 1$ and $\|p_{2m_i}^{(i)} - p_1^{(i)}\|_1 = 1$;

ii) for all $k, l$ not affected by i): $\|p_k^{(i)} - p_l^{(i)}\|_1 \geq 2$.

This is the same as above, but the distance is measured in the 1-norm instead of the maximum norm. We allow condition i) to be violated in junctions and condition ii) to be violated in clause configurations, since these have a specific point placement. Again, there are two optimal clusterings of the points in one circuit into mi clusters with objective value Grec = 2mi, since a cluster Bk has s(Bk) = 1 if it contains two consecutive points and s(Bk) ≥ 2 otherwise. We can label the edges between consecutive points alternatingly with t and f for true and false, respectively (compare Figure 5.15). Moreover, it is required that pairs of points (p, q) from different circuits have $\|p - q\|_1 \geq 2$, unless they are part of the same junction or clause configuration.


Figure 5.21: A diagonal junction between circuits Pi and Pj with a common point (left) and an optimal clustering shown as rectangle cover (middle) representing the variable assignment ui = true and uj = false with one cluster including three points with the exact distances indicated. A suboptimal cluster containing five points is shown on the right.

We require that two circuits cross diagonally in junctions such that the $\ell_1$-distance of the central point to each of the four surrounding points is 10/9, which violates condition i) above. This is shown in Figure 5.21. There are exactly four different ways a cluster Bk with s(Bk) = 5/3 can contain three points of the junction, one for every combination of variable assignments of the variables represented by the circuits.

As before, one point qj is added for each clause Ej. The precise placement of the points in a clause configuration can be seen in Figure 5.22. We have to make sure that the Boolean labels on the edges of the circuits are aligned such that the edge of circuit Pi facing towards qj has the label t if clause Ej contains ui and f if Ej contains $\bar{u}_i$. This way, if a variable assignment satisfies Ej, the point qj can be included in one of the blue clusters in Figure 5.22. Those clusters have s(Bk) = 5/3. Other clusters with three points (for example the red clusters in Figure 5.22, right) have s(Bk) > 5/3. Note that there are two non-subsequent points in the left circuit (and similarly two in the right circuit) of the clause configuration that violate circuit condition ii). However, aggregating the two points with the circuit point between them leads to another cluster Bk with s(Bk) = 11/6 > 5/3.

Figure 5.22: Clause configuration with the exact placements of points. One point qj, which represents the clause Ej, is added. If Ej is satisfied by a variable assignment, the point qj can be added to one of the clusters of the circuits, yielding one of the three blue clusters on the left. The right side shows three examples of clusters with three points each, two of which belong to different circuits. Since they have a larger extension s, clusterings containing one of these clusters are suboptimal.

Let B be the set of all points in the construction, J be the number of junctions and S be the number of clauses in E. The number of points is
\[ N := \sum_{i=1}^{r} 2m_i - J + S \]
and the number of clusters
\[ n := \sum_{i=1}^{r} m_i - J. \]

Furthermore, we define T := J + S = N − 2n and R := N + 2T. Then E is satisfiable if and only if there exists a clustering of the N points into n clusters, $\mathcal{B} = \{B_1, \dots, B_n\}$, such that $G_{rec}(\mathcal{B}) \leq R$.
To prove this, we first assume E is satisfiable. Then there is a variable assignment that satisfies all of the clauses Ej in E simultaneously. Therefore, there is a clustering representing this variable assignment, such that the point qj can be included in one of the blue clusters of Figure 5.22 for each clause Ej. This clustering has a cluster with three points for each junction and each clause, hence T clusters with three points, each of which has an extension of s(Bk) = 5/3. The remaining n − T clusters have two points each and an extension of s(Bk) = 1. It follows that
\[ G_{rec} = \sum_{k=1}^{n} |B_k| \cdot s(B_k) = T \cdot 3 \cdot \frac{5}{3} + (n - T) \cdot 2 \cdot 1 = 2n + 3T = N + 2T = R. \]
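The arithmetic in this chain can be double-checked mechanically, for instance as follows (an illustrative sketch with arbitrarily chosen n and T; Fraction keeps 5/3 exact):

```python
from fractions import Fraction as F

n, T = 12, 5                       # arbitrary values with n >= T
N = 2 * n + T                      # since T = N - 2n
G_rec = T * 3 * F(5, 3) + (n - T) * 2 * 1
assert G_rec == N + 2 * T == 2 * n + 3 * T
```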

For the reverse direction assume we have a clustering $\mathcal{B} = \{B_1, \dots, B_n\}$ of B with $G_{rec}(\mathcal{B}) \leq R$ and show that this implies E is satisfiable. We define a map $\sigma : \{1, \dots, N\} \to \mathbb{R}_+$ by
\[ \sigma(k) := \min\bigl\{ s(\bar{B}) : \bar{B} \subseteq B,\ |\bar{B}| = k \bigr\}, \]
which is the smallest possible extension a cluster containing k points can have in the whole construction. The values of σ for small k can be seen in Table 5.1.

k      1   2   3     4      5     6     7
σ(k)   0   1   5/3   25/12  20/9  17/6  19/6

Table 5.1: Values of σ for small k.

The smallest clusters for k = 4, 6, 7 can be found in the clause configurations, while the smallest cluster for k = 5 is the center cluster in junctions (red cluster in Figure 5.21). This is assuming there are no smaller clusters aside from junctions and clause configurations, but such clusters are easily avoided in the construction. Since σ is non-decreasing in k, we have σ(k) ≥ σ(7) > 3 for k ≥ 7. Thus, σ satisfies
\[ \sigma(k) > 3 - \frac{4}{k} \quad \forall\, k \geq 4. \]
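The tabulated values indeed satisfy this bound; together with the monotonicity of σ and σ(7) > 3 this covers all k ≥ 4. A short check (values copied from Table 5.1; the snippet is ours and purely illustrative):

```python
from fractions import Fraction as F

sigma = {1: F(0), 2: F(1), 3: F(5, 3), 4: F(25, 12),
         5: F(20, 9), 6: F(17, 6), 7: F(19, 6)}
assert all(sigma[k] > 3 - F(4, k) for k in range(4, 8))
assert sigma[7] > 3   # with monotonicity this covers all k >= 7
```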

Furthermore, for k = 1,...,N we define

\[ a_k := \bigl| \{ \bar{B} \in \mathcal{B} : |\bar{B}| = k \} \bigr| \]
as the number of clusters in the clustering $\mathcal{B}$ containing exactly k points. Since we have n clusters covering N points,

\[ \sum_{k=1}^{N} a_k = n \quad\text{and}\quad \sum_{k=1}^{N} k \cdot a_k = N, \]
and since $s(\bar{B}) \geq \sigma(|\bar{B}|)$ for all $\bar{B} \in \mathcal{B}$,

\begin{align*}
\sum_{k=1}^{N} k \cdot \sigma(k) \cdot a_k &= \sum_{k=1}^{N} \sigma(k) \sum_{\substack{l=1 \\ |B_l| = k}}^{n} |B_l| \leq \sum_{k=1}^{N} \sum_{\substack{l=1 \\ |B_l| = k}}^{n} |B_l| \cdot s(B_l) \\
&= \sum_{l=1}^{n} |B_l| \cdot s(B_l) = G_{rec} \leq R = N + 2T.
\end{align*}

With ak ≥ 0 for all k = 1,...,N and n−T ≥ 0, this is the system of linear inequalities in Lemma 5.7.10 (see Section 5.7.4) and all of the assumptions are satisfied. Therefore, for our clustering a1 = 0, a2 = n − T , a3 = T and ak = 0 for k ≥ 4. Since

\[ \sum_{k=1}^{N} k \cdot \sigma(k) \cdot a_k = T \cdot 3 \cdot \frac{5}{3} + (n - T) \cdot 2 \cdot 1 = N + 2T = R \]
is satisfied with equality, it follows that $s(\bar{B}) = \sigma(|\bar{B}|)$ for all $\bar{B} \in \mathcal{B}$. Hence, the clustering consists of n − T clusters with two points each and extension 1, and T clusters with three points each and extension 5/3.

The only places in the construction where a cluster with three points and an extension of 5/3 is possible are junctions and clause configurations, with at most one such cluster each. Since we have T = J + S of these clusters, there is one such cluster in each junction and clause configuration. Since all clusters with two points have an extension of 1, we can apply the same argument as in the last proof and conclude that the clustering represents a variable assignment. Since each clause configuration has a cluster of extension 5/3, the variable assignment satisfies the clause Ej (compare Figure 5.22). Hence, it satisfies all clauses simultaneously and therefore also E.

This proves the NP-hardness of GC with respect to Grec and hence with respect to Gr3, which is just Grec scaled by the constant factor N/2. Moreover, since all clusters $\bar{B} \in \mathcal{B}$ in the relevant clusterings have $|\bar{B}| \leq 3 < N/2$, all arguments hold for Gr2 as well.

Proof of Theorem 5.2.3. This follows directly from the chain of reductions in the Propositions 5.7.2, 5.7.4, 5.7.6 and 5.7.8.

5.7.4 Auxiliary Lemmas

Lemma 5.7.9. Let $r, J, S \in \mathbb{N}$ and $m_i \in \mathbb{N}$ for $i = 1, \dots, r$, such that
\[ \sum_{i=1}^{r} m_i - 2J - 7S \geq 0. \]
Define
\[ N := \sum_{i=1}^{r} 2m_i - J + 7S \qquad\text{and}\qquad n := \sum_{i=1}^{r} m_i - J. \]

Let $\sigma : \{1, \dots, N\} \to \mathbb{R}_+$ be a map such that $\sigma(1) = 0$, $\sigma(2) = 1$ and
\[ \sigma(k) > 2 - \frac{2}{k} \quad \forall\, k \geq 3. \tag{5.1} \]

Then the following system of linear inequalities consisting of the variables ak, k = 1, . . . , N, b3, b4, c3 and c4,

\[ b_3 + c_3 + b_4 + c_4 + \sum_{k=1}^{N} a_k = n \tag{5.2} \]
\[ 3(b_3 + c_3) + 4(b_4 + c_4) + \sum_{k=1}^{N} k \cdot a_k = N \tag{5.3} \]
\[ 3(b_3 + c_3) + 6(b_4 + c_4) + \sum_{k=1}^{N} k \cdot \sigma(k) \cdot a_k \leq N \tag{5.4} \]
\[ b_3 + b_4 \leq S \tag{5.5} \]
\[ c_3 + 2c_4 \leq 6S + J \tag{5.6} \]
\[ b_3, b_4, c_3, c_4, a_k \geq 0 \quad \forall\, k = 1, \dots, N \tag{5.7} \]
has the unique solution $a_1 = 0$, $a_2 = n - 7S - J$, $a_k = 0$ for $k = 3, \dots, N$, $b_3 = S$, $b_4 = 0$, $c_3 = 6S + J$ and $c_4 = 0$.

Proof. Assume we have a solution $a_1, \dots, a_N, b_3, b_4, c_3, c_4$ to the system (5.2) through (5.7). If we subtract (5.2) from (5.3) we get
\[ a_2 = N - n - 2(b_3 + c_3) - 3(b_4 + c_4) - \sum_{k=3}^{N} (k-1) a_k. \tag{5.8} \]
Inserting this into (5.4) yields
\begin{align*}
N &\geq 2a_2 + 3(b_3 + c_3) + 6(b_4 + c_4) + \sum_{k=3}^{N} k \cdot \sigma(k) \cdot a_k \\
&= 2N - 2n - (b_3 + c_3) + \sum_{k=3}^{N} \bigl( k \cdot \sigma(k) - 2(k-1) \bigr) \cdot a_k \tag{5.9} \\
&= N + 7S + J - (b_3 + c_3) + \sum_{k=3}^{N} \bigl( k \cdot \sigma(k) - 2(k-1) \bigr) \cdot a_k.
\end{align*}

Since k · σ(k) − 2(k − 1) > 0 for k ≥ 3 due to (5.1), it follows from the above inequality, (5.5), (5.6) and (5.7) that

\begin{align*}
0 &\leq \sum_{k=3}^{N} \bigl( k \cdot \sigma(k) - 2(k-1) \bigr) \cdot a_k \\
&\leq b_3 + c_3 - 7S - J \\
&\leq S - b_4 + 6S + J - 2c_4 - 7S - J \\
&= -(b_4 + 2c_4) \leq 0.
\end{align*}

Hence, $b_4 = c_4 = 0$ and $a_k = 0$ for $k = 3, \dots, N$. This reduces the inequality (5.9) to
\[ b_3 + c_3 \geq 7S + J, \]
which, together with (5.5) and (5.6), implies $b_3 = S$ and $c_3 = 6S + J$. Solving (5.2) and (5.3) for $a_1$ and $a_2$ yields $a_1 = 0$ and $a_2 = n - 7S - J$. Since
\[ a_2 = \sum_{i=1}^{r} m_i - 2J - 7S \geq 0 \]
by assumption, this is indeed the unique solution to the system.
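As a quick numerical sanity check (illustrative only, not a substitute for the uniqueness argument above), the claimed solution can be plugged into (5.2) through (5.7) for a concrete parameter choice; r = 2, m = (5, 5), J = S = 1 and the σ from the construction are our own choices.

```python
# Sanity check of the solution claimed in Lemma 5.7.9 for one concrete
# parameter choice (r = 2, m = (5, 5), J = S = 1); illustrative only.
from fractions import Fraction as F

m, J, S = (5, 5), 1, 1
N = sum(2 * mi for mi in m) - J + 7 * S          # = 26
n = sum(m) - J                                   # = 9
sigma = lambda k: [F(0), F(0), F(1), F(3, 2), F(2)][k] if k <= 4 else F(2)
assert all(sigma(k) > 2 - F(2, k) for k in range(3, N + 1))   # assumption (5.1)

a = {k: 0 for k in range(1, N + 1)}
a[2] = n - 7 * S - J                             # = 1
b3, b4, c3, c4 = S, 0, 6 * S + J, 0

assert b3 + c3 + b4 + c4 + sum(a.values()) == n                          # (5.2)
assert 3 * (b3 + c3) + 4 * (b4 + c4) + sum(k * a[k] for k in a) == N     # (5.3)
assert 3 * (b3 + c3) + 6 * (b4 + c4) + sum(k * sigma(k) * a[k] for k in a) <= N  # (5.4)
assert b3 + b4 <= S and c3 + 2 * c4 <= 6 * S + J                         # (5.5), (5.6)
```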

Lemma 5.7.10. Let r, J, S ∈ N and mi ∈ N for i = 1, . . . , r, such that

\[ \sum_{i=1}^{r} m_i - 2J - S \geq 0. \]
Define
\[ N := \sum_{i=1}^{r} 2m_i - J + S, \qquad n := \sum_{i=1}^{r} m_i - J \]
and $T := J + S = N - 2n$. Moreover, let $\sigma : \{1, \dots, N\} \to \mathbb{R}_+$ be a map such that $\sigma(1) = 0$, $\sigma(2) = 1$, $\sigma(3) = 5/3$ and
\[ \sigma(k) > 3 - \frac{4}{k} \quad \forall\, k \geq 4. \tag{5.10} \]

Then the following system of linear inequalities,

\[ \sum_{k=1}^{N} a_k = n \tag{5.11} \]
\[ \sum_{k=1}^{N} k \cdot a_k = N \tag{5.12} \]
\[ \sum_{k=1}^{N} k \cdot \sigma(k) \cdot a_k \leq N + 2T \tag{5.13} \]
\[ a_k \geq 0 \quad \forall\, k = 1, \dots, N \tag{5.14} \]

has the unique solution a1 = 0, a2 = n − T, a3 = T and ak = 0 for all k ≥ 4.

Proof. Assume we have a solution $a_1, \dots, a_N$ to the system (5.11) through (5.14). If we subtract (5.11) from (5.12) we get
\[ a_2 = N - n - 2a_3 - \sum_{k=4}^{N} (k-1) \cdot a_k. \tag{5.15} \]
Inserting this into (5.13) yields
\begin{align*}
N + 2T &\geq 2a_2 + 5a_3 + \sum_{k=4}^{N} k \cdot \sigma(k) \cdot a_k \\
&= 2N - 2n + a_3 + \sum_{k=4}^{N} \bigl( k \cdot \sigma(k) - 2(k-1) \bigr) \cdot a_k \\
&= a_3 + N + T + \sum_{k=4}^{N} \bigl( k \cdot \sigma(k) - 2(k-1) \bigr) \cdot a_k
\end{align*}
and therefore
\[ a_3 \leq T + \sum_{k=4}^{N} \bigl( 2(k-1) - k \cdot \sigma(k) \bigr) \cdot a_k. \tag{5.16} \]

Now we combine (5.11), (5.14), (5.15) and (5.16) to get

\begin{align*}
0 \leq a_1 &= n - a_2 - a_3 - \sum_{k=4}^{N} a_k \\
&= 2n - N + a_3 + \sum_{k=4}^{N} (k-2) \cdot a_k \tag{5.17} \\
&\leq -T + T + \sum_{k=4}^{N} \bigl( k - 2 + 2(k-1) - k \cdot \sigma(k) \bigr) \cdot a_k \\
&= \sum_{k=4}^{N} \bigl( 3k - 4 - k \cdot \sigma(k) \bigr) \cdot a_k.
\end{align*}

We know from (5.10) that 3k − 4 − k · σ(k) < 0 for all k = 4,...,N. Thus, in order for (5.17) to be satisfied, we need ak = 0 for all k = 4,...,N. It follows immediately that a1 = 0. Solving (5.11) and (5.12) for a2 and a3 leads to a2 = n − T and a3 = T . This satisfies

\[ a_2 = n - T = \sum_{i=1}^{r} m_i - 2J - S \geq 0 \]
and (5.13), hence it is the unique solution to the above system of inequalities.

Bibliography

[1] Agueh, M., Carlier, G.: Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43(2), 904–924 (2011)

[2] Altschuler, J., Weed, J., Rigollet, P.: Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. CoRR abs/1705.09634 (2017). URL http://arxiv.org/abs/1705.09634

[3] Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN (2017). URL https://arxiv.org/abs/1701.07875

[4] Aurenhammer, F., Hoffmann, F., Aronov, B.: Minkowski-Type Theorems and Least-Squares Clustering. Algorithmica 20(1), 61–76 (1998)

[5] Bazaraa, M.S., Jarvis, J.J., Sherali, H.D.: Linear Programming and Network Flows, fourth edn. Wiley (2009)

[6] Becker, G.: A Theory of Marriage: Part I. Journal of Political Economy 81(4), 813–46 (1973). URL https://EconPapers.repec.org/RePEc: ucp:jpolec:v:81:y:1973:i:4:p:813-46

[7] Benamou, J.D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman Projections for Regularized Transportation Problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)

[8] Benamou, J.D., Froese, B.D., Oberman, A.M.: Numerical solution of the Optimal Transportation problem using the Monge-Ampère equation. J. Comput. Phys. 260, 107–126 (2014)

[9] Bertsekas, D.P.: A new algorithm for the assignment problem. Mathematical Programming 21(1), 152–171 (1981). DOI 10.1007/BF01584237

[10] Bertsekas, D.P.: Auction Algorithms for Network Flow Problems: A Tutorial Introduction. Computational Optimization and Applications 1(1), 7–66 (1992)

[11] Bertsekas, D.P., Castanon, D.A.: The auction algorithm for the transportation problem. Annals of Operations Research 20(1), 67–96 (1989). DOI 10.1007/BF02216923

[12] Bickel, P.J., Freedman, D.A.: Some Asymptotic Theory for the Bootstrap. Ann. Statist. 9(6), 1196–1217 (1981)

[13] Boissard, E., Le Gouic, T.: On the mean speed of convergence of empirical and occupation measures in Wasserstein distance. Ann. Inst. H. Poincaré Probab. Statist. 50(2), 539–563 (2014)

[14] Bonneel, N., Rabin, J., Peyré, G., Pfister, H.: Sliced and Radon Wasserstein Barycenters of Measures. Journal of Mathematical Imaging and Vision 51(1), 22–45 (2015)

[15] Bosc, D.: Numerical Approximation of Optimal Transport Maps. ERN: Optimization Techniques; Programming Models; Dynamic Analysis (Topic) (2010). DOI 10.2139/ssrn.1730684

[16] Bousquet, O., Gelly, S., Tolstikhin, I., Simon-Gabriel, C.J., Schoelkopf, B.: From optimal transport to generative modeling: the VEGAN cookbook (2017). URL http://arxiv.org/abs/1705.09634

[17] Brauer, C., Lorenz, D.: Cartoon-Texture-Noise Decomposition with Transport Norms. In: Proc. SSVM, pp. 142–153. Lege-Cap Ferret, France (2015)

[18] Bravo, H.C., Theussl, S.: Rcplex: R Interface to CPLEX (2016). URL https://CRAN.R-project.org/package=Rcplex. R package version 0.3-3

[19] Ciscato, E., Galichon, A., Goussé, M.: Like Attract Like? A Structural Comparison of Homogamy Across Same-Sex and Different-Sex Households. SSRN Electronic Journal (2014). DOI 10.2139/ssrn.2530724

[20] Cuturi, M.: Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In: Proc. NIPS, pp. 2292–2300 (2013)

[21] Cuturi, M., Doucet, A.: Fast Computation of Wasserstein Barycenters. In: Proceedings of The 31st International Conference on Machine Learning, pp. 685–693 (2014)

[22] Defays, D.: An efficient algorithm for a complete link method. The Computer Journal 20(4), 364–366 (1977). DOI 10.1093/comjnl/20.4.364. URL http://dx.doi.org/10.1093/comjnl/20.4.364

[23] Dolinsky, Y., Soner, H.M.: Martingale optimal transport and robust hedging in continuous time. Probability Theory and Related Fields 160(1), 391–427 (2014)

[24] Dorea, C.C.Y., Ferreira, D.B.: Conditions for equivalence between Mal- lows distance and convergence to stable laws. Acta Math Hung 134(1-2), 1–11 (2012)

[25] Dupuy, A., Galichon, A., Jaffe, S., Duke Kominers, S.: Taxation in Matching Markets. SSRN Electronic Journal (2017). DOI 10.2139/ssrn. 3060746

[26] Flamary, R., Févotte, C., Courty, N., Emiya, V.: Optimal spectral transportation with application to music transcription. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pp. 703–711. Curran Associates Inc., USA (2016). URL http://dl.acm.org/citation.cfm?id=3157096.3157175

[27] Florek, K., Lukaszewicz, J., Perkal, J., Steinhaus, H., Zubrzycki, S.: Sur la liaison et la division des points d'un ensemble fini. Colloquium Mathematicae 2(3-4), 282–285 (1951). URL http://eudml.org/doc/209969

[28] Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162(3-4), 707–738 (2015)

[29] Freitag, G., Munk, A.: On Hadamard differentiability in k-sample semiparametric models—with applications to the assessment of structural relationships. Journal of Multivariate Analysis 94(1), 123–158 (2005)

[30] Frogner, C., Zhang, C., Mobahi, H., Araya-Polo, M., Poggio, T.: Learning with a Wasserstein Loss. In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS'15, pp. 2053–2061. MIT Press, Cambridge, MA, USA (2015). URL http://dl.acm.org/citation.cfm?id=2969442.2969469

[31] Fu, A., Wenyin, L., Deng, X.: Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover’s Distance (EMD). IEEE Transactions on Dependable and Secure Computing 3(4), 301–311 (2006)

[32] Galichon, A.: Optimal Transport Methods in Economics. Princeton University Press (2016)

[33] Galichon, A., Salanie, B.: Cupid’s Invisible Hand: Social Surplus and Identification in Matching Models. SSRN Electronic Journal (2012)

[34] Gangbo, W., McCann, R.J.: Shape recognition via Wasserstein distance. Quarterly of Applied Mathematics LVIII(4), 705–737 (2000)

[35] Gerber, S., Maggioni, M.: Multiscale Strategies for Computing Optimal Transport. Journal of Machine Learning Research 18(72), 1–32 (2017). URL http://jmlr.org/papers/v18/16-108.html

[36] Gneiting, T., Guttorp, P.: Continuous Parameter Spatio-Temporal Processes. In: Handbook of spatial statistics, Chapman & Hall/CRC Handb. Mod. Stat. Methods, pp. 427–436. CRC Press, Boca Raton, FL (2010)

[37] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Nets. In: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, K.Q. Weinberger (eds.) Advances in Neural Information Processing Systems 27, pp. 2672–2680. Curran Associates, Inc. (2014). URL http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

[38] Gottschlich, C., Huckemann, S.: Separating the Real from the Synthetic: Minutiae Histograms as Fingerprints of Fingerprints. IET Biometrics 3(4), 291–301 (2014)

[39] Gottschlich, C., Schuhmacher, D.: The Shortlist Method for Fast Com- putation of the Earth Mover’s Distance and Finding Optimal Solutions to Transportation Problems. PLoS ONE 9(10), e110,214 (2014)

[40] Grauman, K., Darrell, T.: Fast Contour Matching Using Approximate Earth Mover’s Distance. In: Proc. CVPR, pp. 220 –227. Washington, DC, USA (2004)

[41] Hartmann, V., Schuhmacher, D.: Semi-discrete optimal transport - the case p=1 (2017). URL https://arxiv.org/abs/1706.07650

[42] Hug, R., Maitre, E., Papadakis, N.: Multi-physics optimal transportation and image interpolation. ESAIM: M2AN 49(6), 1671–1692 (2015). DOI 10.1051/m2an/2015038. URL https://doi.org/10.1051/m2an/2015038

[43] Ilgen, P., Stoldt, S., Conradi, L.C., Wurm, C., Rüschoff, J., Ghadimi, B., Liersch, T., Jakobs, S.: STED Super-Resolution Microscopy of Clinical Paraffin-Embedded Human Rectal Cancer Tissue. PLoS ONE 9(7), e101,563 (2014)

[44] Jans, D., Wurm, C., Riedel, D., Wenzel, D., Stagge, F., Deckers, M., Rehling, P., Jakobs, S.: STED super-resolution microscopy reveals an array of MINOS clusters along human mitochondria. PNAS 110(22), 8936–8941 (2013)

[45] Johnson, O., Samworth, R.: Central limit theorem and convergence to stable laws in Mallows distance. Bernoulli 11(5), 829–845 (2005)

[46] Kantorovich, L.V.: On the translocation of masses. (Dokl.) Acad. Sci. URSS 37 3, 199–201 (1942)

[47] Kantorovich, L.V., Rubinstein, G.S.: On a space of completely additive functions. Vestnik Leningrad. Univ 13(7), 52–59 (1958)

[48] Kendal, D., Hauser, C., Garrard, G., Jellinek, S., Giljohann, K., Moore, J.: Quantifying Plant Colour and Colour Difference as Perceived by Humans Using Digital Images. PLoS ONE 8(8), e72,296 (2013)

[49] Kitagawa, J., Mérigot, Q., Thibert, B.: Convergence of a Newton algorithm for semi-discrete optimal transport (2017). URL https: //arxiv.org/abs/1603.05579

[50] Koopmans, T.C., Beckmann, M.: Assignment Problems and the Location of Economic Activities. Econometrica 25(1), 53–76 (1957). URL http: //www.jstor.org/stable/1907742

[51] Kuhn, H.W., Yaw, B.: The Hungarian method for the assignment problem. Naval Res. Logist. Quart pp. 83–97 (1955)

[52] Lellmann, J., Lorenz, D., Schönlieb, C.B., Valkonen, T.: Imaging with Kantorovich–Rubinstein discrepancy. SIAM Journal on Imaging Sciences 7(4), 2833–2859 (2014)

[53] Ling, H., Okada, K.: An Efficient Earth Mover’s Distance Algorithm for Robust Histogram Comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(5), 840–853 (2007)

[54] Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming, third edn. International Series in Operations Research & Management Science, 116. Springer, New York (2008)

[55] Mallows, C.L.: A Note on Asymptotic Joint Normality. Ann. Math. Statist. 43(2), 508–515 (1972)

[56] Megiddo, N., Supowit, K.: On the complexity of some common geometric location problems. SIAM J. Comput. 13(1), 182–196 (1984). DOI 10.1137/0213014. URL https://doi.org/10.1137/0213014

[57] Mérigot, Q.: A multiscale approach to optimal transport. Comput. Graph. Forum 30(5), 1583–1592 (2011)

[58] Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. American Mathematical Society, Boston, MA, USA (2001)

[59] Monge, G.: Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences pp. 666–704 (1781)

[60] Munk, A., Czado, C.: Nonparametric Validation of Similar Distributions and Assessment of Goodness of Fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60(1), 223–241 (1998)

[61] Ni, K., Bresson, X., Chan, T., Esedoglu, S.: Local Histogram Based Segmentation Using the Wasserstein Distance. International Journal of Computer Vision 84(1), 97–111 (2009)

[62] Oberman, A., Ruan, Y.: An efficient linear programming method for Optimal Transportation. Preprint (2015). URL http://arxiv.org/ abs/1509.03668

[63] Papadakis, N.: Optimal Transport for Image Processing. Habilitation à diriger des recherches, Université de Bordeaux (2015). URL https: //hal.archives-ouvertes.fr/tel-01246096

[64] Pele, O., Werman, M.: Fast and Robust Earth Mover’s Distances. In: ICCV (2009)

[65] Peyré, G., Cuturi, M.: Computational Optimal Transport (2018). URL https://arxiv.org/abs/1803.00567

[66] R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2016). URL https://www.R-project.org/. Version 3.3.0

[67] Rabin, J., Ferradans, S., Papadakis, N.: Adaptive color transfer with relaxed optimal transport. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 4852–4856 (2014). DOI 10.1109/ICIP.2014. 7025983

[68] Rabin, J., Papadakis, N.: Convex Color Image Segmentation with Optimal Transport Distances. CoRR abs/1503.01986 (2015). URL http://arxiv.org/abs/1503.01986

[69] Rachev, S.T., Stoyanov, S.V., Fabozzi, F.J.: A Probability Metrics Approach to Financial Risk Measures. Wiley-Blackwell (2011)

[70] Rolet, A., Cuturi, M., Peyré, G.: Fast Dictionary Learning with a Smoothed Wasserstein Loss. In: A. Gretton, C.C. Robert (eds.) Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 51, pp. 630–638. PMLR, Cadiz, Spain (2016). URL http://proceedings.mlr.press/v51/rolet16.html

[71] Rubner, Y., Tomasi, C., Guibas, L.J.: The Earth Mover’s Distance as a Metric for Image Retrieval. International Journal of Computer Vision 40(2), 99–121 (2000)

[72] Rudolf, D., Schweizer, N.: Perturbation theory for Markov chains via Wasserstein distance. Bernoulli 24(4A), 2610–2639 (2018)

[73] Ruttenberg, B.E., Luna, G., Lewis, G.P., Fisher, S.K., Singh, A.K.: Quantifying spatial relationships from whole retinal images. Bioinformatics 29(7), 940–946 (2013)

[74] Santambrogio, F.: Optimal Transport for Applied Mathematicians. Birkhäuser Basel (2015)

[75] Schlather, M., Malinowski, A., Oesting, M., Boecker, D., Strokorb, K., Engelke, S., Martini, J., Ballani, F., Moreva, O.: RandomFields: Simulation and Analysis of Random Fields (2016). URL https://cran.r-project.org/web/packages/RandomFields/. R package version 3.1.12

[76] Schmitzer, B.: A Sparse Multiscale Algorithm for Dense Optimal Transport. J Math Imaging Vis 56(2), 238–259 (2016)

[77] Schmitzer, B.: Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems (2016). URL https://arxiv.org/abs/1610.06519

[78] Schrieber, J., Schuhmacher, D., Gottschlich, C.: DOTmark – A Benchmark for Discrete Optimal Transport. IEEE Access 5, 271–282 (2016)

[79] Schuhmacher, D.: Stein’s method and Poisson process approximation for a class of Wasserstein metrics. Bernoulli 15(2), 550–568 (2009). URL http://www.jstor.org/stable/20680164

[80] Schuhmacher, D., Bähre, B., Gottschlich, C.: transport: Optimal Transport in Various Forms (2016). URL https://cran.r-project.org/web/packages/transport/. R package version 0.7-4

[81] Schwinn, J., Werner, R.: On the effectiveness of primal and dual heuristics for the transportation problem. IMA Journal of Management Mathematics (2017)

[82] Shalev-Shwartz, S., Shamir, O., Srebro, N., Sridharan, K.: Learnability, Stability and Uniform Convergence. Journal of Machine Learning Research 11(Oct), 2635–2670 (2010)

[83] Shimer, R.: The Assignment of Workers to Jobs In an Economy with Coordination Frictions. Working Paper 8501, National Bureau of Economic Research (2001). DOI 10.3386/w8501. URL http://www.nber.org/papers/w8501

[84] Shirdhonkar, S., Jacobs, D.W.: Approximate earth mover’s distance in linear time. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)

[85] Sibson, R.: Slink: An optimally efficient algorithm for the single-link cluster method. Comput. J. 16(1), 30–34 (1973)

[86] Sinkhorn, R., Knopp, P.: Concerning nonnegative matrices and doubly stochastic matrices. Pacific J. Math. 21(2), 343–348 (1967)

[87] Solomon, J., de Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains. ACM Trans. Graph. 34(4), 66:1–66:11 (2015)

[88] Sommerfeld, M.: Wasserstein Distance on Finite Spaces: Statistical Inference and Algorithms (2017). URL https://ediss.uni-goettingen.de/handle/11858/00-1735-0000-0023-3FA1-C

[89] Sommerfeld, M., Munk, A.: Inference for empirical Wasserstein distances on finite spaces. J. R. Stat. Soc. B 80(1), 219–238 (2018)

[90] Sommerfeld, M., Schrieber, J., Zemel, Y., Munk, A.: Optimal Transport: Fast Probabilistic Approximation with Exact Solvers (2018). URL https://arxiv.org/abs/1802.05570

[91] Srivastava, S., Li, C., Dunson, D.B.: Scalable Bayes via Barycenter in Wasserstein Space. arXiv:1508.05880 (2015)

[92] Tameling, C., Sommerfeld, M., Munk, A.: Empirical optimal transport on countable metric spaces: Distributional limits and statistical applications (2018). URL https://arxiv.org/abs/1707.00973

[93] Thai, D., Gottschlich, C.: Directional global three-part image decomposition. EURASIP Journal on Image and Video Processing 2016(12), 1–20 (2016)

[94] Thai, D., Gottschlich, C.: Global variational method for fingerprint segmentation by three-part decomposition. IET Biometrics 5(2), 120– 130 (2016)

[95] Vasershtein, L.N.: Markov processes over denumerable products of spaces describing large system of automata. Problemy Peredači Informacii 5(3), 64–72 (1969)

[96] Villani, C.: Optimal transport, old and new, Grundlehren der Math- ematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 338. Springer-Verlag, Berlin (2009)

[97] Wilson, A.G.: The Use of Entropy Maximising Models, in the Theory of Trip Distribution, Mode Split and Route Split. Journal of Transport Economics and Policy 3(1), 108–126 (1969)

[98] Wurm, C., Neumann, D., Lauterbach, M., Harke, B., Egner, A., Hell, S., Jakobs, S.: Nanoscale distribution of mitochondrial import receptor Tom20 is adjusted to cellular conditions and exhibits an inner-cellular gradient. PNAS 108(33), 13,546–13,551 (2011)

[99] Zhang, J., Marszałek, M., Lazebnik, S., Schmid, C.: Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study. Int J Comput Vision 73(2), 213–238 (2007)

Curriculum Vitae

Jörn Schrieber

Born 5th November 1989 in Hamburg
Annastr. 5, 37075 Göttingen
[email protected]

Education

since 2015  Ph.D. student and research assistant, Institute for Mathematical Stochastics, Georg-August-Universität Göttingen
2015  M.Sc. in Mathematics, Georg-August-Universität Göttingen. Thesis: Recovery to Feasibility for Integer Programs
2013  B.Sc. in Mathematics, Georg-August-Universität Göttingen. Thesis: Das injektive Spektrum von Ringen
2009  Abitur, Albert-Einstein-Gymnasium, Buchholz i.d.N.

Publications

• J. Schrieber, D. Schuhmacher, C. Gottschlich: DOTmark – A Benchmark for Discrete Optimal Transport. IEEE Access, Volume 5, 271–282 (2016)

• M. Sommerfeld, J. Schrieber, Y. Zemel, A. Munk: Optimal Transport: Fast Probabilistic Approximation with Exact Solvers, submitted. https://arxiv.org/abs/1802.05570 (2018)