Shape classification via Optimal Transport and Persistent Homology

A Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Mathematical Sciences in the Graduate School of The Ohio State University

By

Ying Yin, B.S.

Graduate Program in Mathematical Sciences

The Ohio State University

2019

Master’s Examination Committee:

Facundo Mémoli, Advisor
Tom Needham, Advisor
Janet Best, Committee member

© Copyright by

Ying Yin

2019

Abstract

Quantifying similarity between shapes is an important task in many disciplines, such as architecture, anatomy, security, and manufacturing. My research project is motivated by taxonomic studies in Biology. Taxonomy is the classification of biological organisms based on shared characteristics. In this thesis, we will explore two approaches, based on optimal transport and persistent homology, to discriminating shapes through defining a meaningful distance that reflects geometric or topological features of the shapes under study. By approximating lower bounds to the Gromov-Wasserstein distance and the Gromov-Hausdorff distance, we automate the process of taxon classification by comparing geometric or topological features of anatomical surfaces. We test our implementations on a data set containing surfaces of crowns of teeth that are from primates and non-primate close relatives, obtained from [11].

Acknowledgments

I want to thank my advisors, Facundo Mémoli and Tom Needham, for all the help and guidance that they have provided over the past two years. I appreciate that they motivated me to do my best and kept me on schedule. I also would like to thank Janet Best for sitting on my committee.

I am very glad that Facundo introduced me to the TGDA group, where I received tremendous help. I want to thank Woojin Kim for his patience in helping me understand optimal transport and persistent homology. I also want to thank Samir Chowdhury and Kritika Singhal for proofreading my thesis. Their comments were extremely helpful.

Special thanks to my boyfriend, Kairui Zhang, for keeping me sane when I was stressed out. I want to acknowledge Pingfan Hu for being an entertaining friend for many years. Kiwon Lee and Weihong Su, thank you for keeping me company for the past two years. In particular, I want to thank Kairui and Kiwon for proofreading my thesis and providing many valuable comments.

I want to thank Slawomir Solecki, who triggered my interest in Mathematics and was also a good friend. This thesis would not be written if it were not for his encouragement.

Lastly, I want to thank my parents for their support during my study and at every stage of my life.

Vita

2017 ...... B.S., Double major in Mathematics and Economics, University of Illinois at Urbana-Champaign
2017-present ...... Graduate Teaching Associate and Research Associate, The Ohio State University.

Fields of Study

Major Field: Mathematical Sciences

Table of Contents

Page

Abstract ...... ii

Acknowledgments ...... iii

Vita...... iv

List of Tables ...... vii

List of Figures ...... viii

1. Introduction ...... 1

1.1 Motivation ...... 1
1.2 Overview of shape analysis ...... 3
1.3 Optimal Transport ...... 4
1.3.1 Wasserstein distance ...... 5
1.3.2 Gromov-Wasserstein distance ...... 5
1.3.3 Computation of TLB ...... 6
1.4 Persistent homology ...... 6
1.4.1 Bottleneck distance ...... 7
1.4.2 Computation of the Bottleneck distance ...... 7
1.5 Application of the TLB and the bottleneck distance ...... 7

2. Optimal Transport ...... 10

2.1 Brief History ...... 10
2.2 Monge-Kantorovich formulation ...... 12
2.3 Wasserstein distance ...... 14
2.4 Gromov-Wasserstein distance ...... 15
2.5 Third lower bound ...... 16
2.6 Sinkhorn’s Algorithm ...... 18

3. Persistent Homology ...... 21

3.1 Brief History ...... 21
3.2 Simplicial homology ...... 22
3.3 Functoriality of Hk ...... 24
3.4 Persistent homology ...... 25
3.5 Persistence diagrams ...... 27
3.5.1 The four point example ...... 29
3.6 Bottleneck distance ...... 31
3.7 Interleaving distance ...... 31
3.8 Stability results of persistence diagrams ...... 37
3.8.1 Stability of Vietoris-Rips filtration ...... 37
3.8.2 Stability of filtration functions ...... 37
3.9 Computation of bottleneck distance ...... 38

4. Experiments ...... 42

4.1 Summary of data ...... 42
4.2 Overview of experiments ...... 44
4.2.1 Quantitative measure of quality of classification ...... 47
4.3 The OT approach ...... 47
4.3.1 Outline of the OT approach ...... 47
4.3.2 Results of using the OT approach ...... 50
4.3.3 Using Euclidean distance with uniform probability measures ...... 51
4.3.4 Using geodesic distance with uniform probability measures ...... 52
4.3.5 Using Voronoi probability measures ...... 55
4.3.6 Summary of results using the OT approach ...... 55
4.4 The PH Approach ...... 58
4.4.1 Preprocessing data ...... 58
4.4.2 Mean curvature ...... 58
4.4.3 Outline of the PH approach ...... 60
4.4.4 Results of using the PH approach ...... 61
4.4.5 Using Vietoris-Rips filtration ...... 62
4.4.6 Using mean curvature based filtration functions ...... 64
4.4.7 Summary of results using the PH approach ...... 70
4.5 Comparison of the results from the OT approach and the PH approach ...... 71

5. Contributions and Future Work ...... 74

5.1 Conclusion ...... 74
5.2 Future work ...... 74
5.2.1 Improvement on the OT approach ...... 74
5.2.2 Improvement on the PH approach ...... 75
5.2.3 Other approaches ...... 75
5.2.4 Experiment on different data sets ...... 75

Appendices ...... 77

A. Main functions and scripts ...... 77

A.1 OT approach ...... 77
A.1.1 Compute local distribution ...... 77
A.1.2 Compute TLB ...... 77
A.2 The PH approach ...... 79
A.2.1 Fill 1-cycles ...... 79
A.2.2 Compute persistence diagrams of sublevel set filtration of a function in (4.3) - (4.6) ...... 81
A.3 Probability of error (Pe) ...... 82

Bibliography ...... 83

List of Tables

Table Page

4.1 Statistics of families, genera and diets of the teeth in the data set...... 43

4.2 Parameters that can be tuned in the experiment. The choices of D are not uniformly distributed. The “Normalization” column in the table indicates if the distance matrix is normalized...... 46

4.3 All combinations of parameters that we tried using the PH approach. For each method, we consider both the normalized and the unnormalized versions...... 46

4.4 Probability of Error table for experiments using optimal transport...... 50

4.5 Parameters used in the experiments that achieve the lowest Pe for each label category...... 57

4.6 Probability of error of experiments using persistent homology...... 62

4.7 Probability of error of experiments where filtered simplicial complex is built through Vietoris-Rips filtration...... 63

4.8 Probability of error of experiments where filtered simplicial complex is built through Vietoris-Rips with modified weight given in (4.1)...... 64

4.9 Probability of error of experiments where filtered simplicial complex is built through the filtration function (4.2) in section 4.4.6...... 66

4.10 Probability of error of experiments where filtered simplicial complex is built through the filtration function (4.3) and normalized (4.5). “absMeancurv” represents (4.3) and “minus absMeancurv” represents (4.5)...... 70

4.11 Change in Pe observed in Table 4.10 that is caused by switching from dB,∞ to dB,2 for the distance between persistence diagrams...... 70

4.12 List of methods with the best Pe using the PH approach...... 71

List of Figures

Figure Page

1.1 Surface of the crown of a tooth in the data set belonging to an animal in family Lemuridae with frugivorous diet. The color bar on the side indicates mean curvature...... 8

1.2 Dendrogram of classification results of TLB using D = 0.32. An outlier is excluded. Labels indicate dietary preferences of the owners of the teeth...... 9

2.1 Example of an optimal transport problem. Figure 1 in [51]...... 11

2.2 Figure from [73] showing two examples of Monge’s problem. On the left is a case where the cardinalities of the two spaces are the same and each point has equal weight. Hence, the optimal transport map is a permutation. However, on the right is an example where a transport map from the red dots to the blue dots does not exist...... 13

3.1 Figure from [95]. It shows the Betti numbers βk for different shapes...... 24

3.2 Example of a Vietoris-Rips complex on a set of three points. Figure 3.2a shows the original set. Figures 3.2b and 3.2c show all the simplices in Kr as r increases...... 27

3.3 An example of persistence diagrams of applying Vietoris-Rips filtration on a sampled point cloud of a circle. On the left is the sampled circle with radius 0.5 centered at (0.5,0.5). The middle figure shows the persistence diagram in the 0th dimension and the figure on the right shows the persistence diagram in the 1st dimension. We observe one point ...... 29

3.4 Example of a simplicial complex with filtration function defined in the following way: all vertices enter at filtration value 0; the value of the filtration function at an edge is given by the weight of the edge; once three edges form a triangle, we add a face to the interior of the triangle...... 30

3.5 Filtered simplicial complex Kt at filtration value t. Note that a cycle in H1 appears when t = 1 and is “killed” by the new triangles that appear when t = √2...... 30

3.6 Persistence barcodes in H0 and H1 of the filtered simplicial complex. . . . . 30

3.7 Persistence barcode representation of all the cases of the relative positions between I1 and I2. Figure from [71]. An interval module is indicated by a line connecting the left endpoint, the midpoint and the right endpoint...... 34

3.8 Figure from [68]. An example of a k-d tree...... 41

4.1 Figure from [10] showing crowns of teeth from animals with different diets...... 44

4.2 Histogram of the diameters of the teeth in Euclidean (Figure 4.2a) and geodesic (Figure 4.2b) distances. The tooth that belongs to Megaladapis has diameter about 24.6mm (in Euclidean distance) and 32.7mm (in geodesic distance). It is excluded from both histograms...... 45

4.3 An example of a Voronoi partition. We generate 5000 random points and pick 7 points as representatives using the Farthest Point Search algorithm (explained in section 4.3.1). 4.3a shows the 5000 points as small blue dots; the larger colored dots are the 7 representatives. 4.3b shows the Voronoi cells associated to each representative...... 49

4.4 Error against niter when ε = 0.005 and K = 2000...... 50

4.5 Figure shows the dendrogram using Euclidean distance, uniform probability measure and D = 0.36. The Pe for diet is 0.494...... 52

4.6 Best dendrogram (in terms of structure) using geodesic distance and uniform probability measure with D = 0.35. The separation between the clusters is less optimal than in the best dendrogram using Euclidean distance as the metric. The Pe for diet is 0.557...... 53

4.7 Dendrograms of results using the OT approach with geodesic distance, uniform probability measures and D = 0.3 (Figure 4.7a), 0.33 (Figure 4.7b) and 0.5 (Figure 4.7c). All three experiments yield Pe = 0.519 for dietary classification...... 54

4.8 Figure shows the values of Pe for family (Figure 4.8a), genus (Figure 4.8b) and diet (Figure 4.8c) against the choice of D when using uniform probability measures with either Euclidean distance or geodesic distance as cost. We observe that the lowest Pe’s when using Euclidean distance are always lower than those when using geodesic distance...... 56

4.9 Dendrogram for the distance matrix using the Voronoi probability measures and geodesic distance with D = 0.35. This method produces the lowest Pe (= 0.494) for diet when using the OT approach...... 57

4.10 Examples of holes in meshes. Both figures are zoomed in to better show the holes. These holes are visually hard to detect when one is looking at an entire triangulated surface. Figure 4.10a shows an example of a loop that is caused by missing triangles. The boundary of the hole is polygonal. Figure 4.10b is an example of a hole caused by filling triangles at the wrong place: the mesh becomes non-manifold in this case...... 58

4.11 Figure 4.11a shows an example of a 1-cycle in a triangular mesh. Figure 4.11b shows the new mesh with the cycle (shown on the left) filled. Darker areas are new triangles added to the mesh. The yellow vertex is the centroid of the cycle and is added to the list of vertices that generate the mesh...... 59

4.12 Figure from Wikipedia created by Eric Gaba at https://en.wikipedia.org/ wiki/Curvature#/media/File:Minimal_surface_curvature_planes-en.svg 59

4.13 Figure from Wikipedia created by Cepheus at https://en.wikipedia.org/ wiki/Curvature#/media/File:Osculating_circle.svg. Given a point p, an osculating circle is shown as a blue circle in the figure...... 60

4.14 Figure from [94]. The figure on the left shows the triangle on which A(△(v0, v1, v2)) is computed. The figure on the right shows the angles used to compute the cotangent formula...... 61

4.15 Dendrogram of the bottleneck distance matrix of 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance. The Pe for diet is 0.430, which is also the lowest Pe among all the experiments using the PH approach...... 63

4.16 Dendrograms of the bottleneck distances between the 1st-persistence diagrams. The persistence diagrams are produced using the modified geodesic distance in (4.1) in the Vietoris-Rips filtration. Figure 4.16a shows the result when setting α = 30. Figure 4.16b shows the result when setting α = 50. Both results yield Pe = 0.519 for dietary classification...... 65

4.17 Figure shows the dendrogram of the result when using (4.2) with λ = 0.1 to construct a filtered simplicial complex. The Pe for diet is 0.506...... 67

4.18 Crown surfaces of a collection of teeth in the data set that are used for testing. Title of each subfigure shows family, diet and code of the tooth. Teeth in the same row share the same family and diet. The coloring on the surfaces indicates mean curvatures at the vertices...... 68

4.19 Dendrograms for filtration functions listed in (4.3) - (4.6). Each row corresponds to a filtration function in the order of (4.3) - (4.6). Odd columns show dendrograms of bottleneck distance matrices of 0th-persistence diagrams whereas even columns show those of 1st-persistence diagrams. The left half of the figure is from unnormalized filtration functions and the right half is where normalized filtration functions are applied...... 69

4.20 Figure shows the dendrogram of the result when using dB,2 on 1st-persistence diagrams for Normalized (4.3). The Pe for diet is 0.620...... 71

4.21 Clustergrams (heatmaps with dendrograms) for the distance matrices that produce the best Pe for dietary classification using the OT and PH approaches. Figure 4.21a is produced by using Euclidean distance, uniform probability measures, D = 0.36 and K = 2000. Figure 4.21b is produced by using dB,∞ to compare 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance. Labels on the right indicate diets and labels at the bottom show the family of each tooth...... 73

Chapter 1: Introduction

1.1 Motivation

In this thesis, we will explore two approaches, based on optimal transport and persistent homology, to discriminating shapes through defining a “meaningful” distance that reflects geometric or topological features of the shapes under study. The need for quantifying similarities between shapes arises as an aid to decision making and classification tasks in many fields, such as architecture, anatomy, security, and manufacturing. To tackle shape comparison problems, a large amount of research has proposed different techniques, e.g. landmark-based comparison, functional analysis, and geometric comparison. A more detailed review of shape analysis will be provided in section 1.2.

My research project is motivated by taxonomic studies in Biology. Taxonomy is the classification of biological organisms based on shared morphological, anatomical or genetic characteristics. A group of organisms in a taxonomy is called a taxon (plural: taxa). These taxa yield a ranking of groups of organisms. The principal ranks in modern taxonomy (in decreasing order) are domain, kingdom, phylum, class, order, family, genus and species. Traversing the taxonomic hierarchy, we develop a sense of “proximity” between organisms.

Taxonomy arose naturally in the history of humankind. An emperor in Chinese mythology, named Shen Nung, from around 3000 BC, is said to have tasted hundreds of herbs to test their medical values and is believed to have composed the Shen-nung pen ts’ao ching (Divine Husbandman’s Materia Medica), the earliest extant Chinese pharmacopoeia [1]. The first compilation of the book was prepared by a Taoist named T’ao Hung-ching (452 - 536) [91]. Although attempts at building a classification of organisms were observed in ancient ages for agricultural and medical purposes, Aristotle is considered the first to formalize taxonomy in his work History of Animals around 350 BC [52]. In 1753, Linnaeus introduced the binomial system of nomenclature that we are using today in his work Systema naturae [54, 38]. Later, the International Code of Zoological Nomenclature and the International Code of Botanical Nomenclature were created to structure the scientific naming process of organisms and are still in use to the present day. Despite the fact that the codes decide the formality of the names, the actual taxon of an organism is decided by biologists.

Taxonomy plays an important role in biology. It gives a unique ID to each taxon to assist communication within the biology community. The number of taxa allows biologists to understand biodiversity on the Earth as the diversity of species is in peril [61]. Habitat losses are causing species to become extinct at an increasing rate [75, 88] and thus, the conservation of biodiversity is more urgent than ever. Taxonomy provides a means of conserving species, as biologists can study a close relative of an endangered species to reveal insights on how to improve the habitat of the endangered one. Moreover, [38] pointed out that taxonomy paved the ground for physiology, genetics, ecology and evolutionary biology. Based on the work done in developing the taxonomy, biologists are able to devise new similarities between organisms, which slowly reveal the origin of life. However, the slow rate at which taxonomists can describe or identify organisms obstructs the future of taxonomy [62].

The best way of determining the proximity between organisms is an open question. There have been various attempts at developing notions of proximity in biology. A recently popular approach to measuring similarity between two organisms is by studying their genetic sequences. The 1-dimensional sequential alignment of genomes is in the form of permutations of four nitrogenous bases: adenine (A), thymine (T), guanine (G), and cytosine (C). Such representation of DNA sequences reduces the complexity of computing similarity measures and encourages automation of the process [55]. A DNA sequence based tool called DNA barcoding has received high popularity for animal taxa identification [14]. It uses a short region of mitochondrial cytochrome c oxidase 1 (COI) as a unique identifier for organisms [81]. DNA barcoding has shown success in identifying genetic diversity and unusual patterns in genetic variability that are not yet well understood [62]. However, empirical results suggest that COI does not always act as an identifier for organisms [81]. Moreover, DNA barcoding initiatives have not reconciled the discrepancies between DNA barcoding predictions and the nomenclature given by classical taxonomy [82].

Another traditional but more intuitive approach to quantifying similarity is to compare the geometry of gross anatomical structures of organisms. For example, [11] computed the Procrustes distance between landmarks on surfaces of teeth. A biologist who studies the geometry of anatomical surfaces is called a morphologist. Usually, morphologists label 10 to 100 landmark points on anatomical surfaces and compare the coordinates of such landmarks to determine similarities [66]. However, labelling landmarks requires domain knowledge and the result does not extend to the anatomical structures of new species [79]. In addition, the variety of anatomical surfaces provides different angles for understanding proximity between species and may lead to differing conclusions of similarity. To overcome the limitations of personal knowledge, we would like to develop a systematic way of determining proximity.

We will propose two different frameworks, one using optimal transport and the other using persistent homology, to quantify similarities between organisms in a morphological sense: our objective is to automate the process of taxon classification by comparing the geometric or topological features of anatomical surfaces. Both approaches realize two important distances that one can define on shapes: the Gromov-Wasserstein distance and the Gromov-Hausdorff distance. To develop a sensible metric that recovers taxonomic proximity, we will compute approximations to the Gromov-Wasserstein distance and the Gromov-Hausdorff distance on dental information of a group of primates (e.g. prosimian primates) and non-primate close relatives (e.g. Dermoptera, Scandentia, and Plesiadapiforms).

Many biologists have used dentition to understand evolutionary histories because ecology has direct consequences on dietary preference [37, 40, 41]. Differences in sizes and shapes of teeth have long been considered as reflectors of diet since teeth are used for processing food. For example, Kay stated in 1975 that species with different diets have different molar structures [48]. In addition, mammalian teeth are unique identifiers of taxonomy [10]. By testing our approximations of the Gromov-Wasserstein distance and the Gromov-Hausdorff distance on second molars of animals which belong to Euarchonta, we hope to develop a good automated classifier that recovers taxonomy or diets.

1.2 Overview of shape analysis

Motivated by real life scenarios, shape analysis techniques aim to provide meaningful invariants that quantify similarities between shapes under deformations or rotations. The need for such quantification of similarity appears in different domains, e.g. in documenting archaeological objects [86], in comparing anatomical structures [11], in recognizing same or different faces [39] and in finding mechanical parts in a large database [2, 18]. In many of these studies, similarities were originally determined by human experts. However, visually comparing shapes is an expensive and laborious task. Moreover, results are subjective and may be inconsistent. On the other hand, computer algorithms that can automatically compare shapes are systematic. Recent advances in computational technologies have enabled such automatic processes of shape comparison. There have been many studies proposing different tools for shape analysis. Popular approaches include landmark-based shape analysis, conformal geometry, optimal transport and persistent homology.

A mathematical extension to the approach originally used by morphologists, which relied on choosing landmarks, is landmark-based shape analysis. It was one of the first formal methods for the analysis of shapes and was first suggested by Kendall [49] in 1981. The idea is that, given a collection of landmarks, one can apply vector calculus by treating landmarks as vectors after removing the action of translation, scaling and rotation. This approach frees morphologists from visual inspections of landmarks. However, it still depends highly on domain expertise in choosing landmarks. A solution to landmark dependency is to study the boundaries or silhouettes of shapes instead of landmarks.

Conformal geometry is an alternative tool for the comparison of anatomical structures. Originating from Riemannian geometry, conformal mapping studies angle-preserving transformations between pairs of shapes. An introduction to the subject, along with applications, for example, in studying brain images, can be found in [42]. In [11], a distance on two-dimensional surfaces is defined using conformal mappings. However, the limitation of conformal geometry is that it only applies to smooth manifolds.

In the following sections, we will introduce optimal transport and persistent homology. One of the benefits of using optimal transport and persistent homology is that they are applicable to point clouds (collections of points) as well as manifolds. Hence, contrary to conformal geometry, which can only be used to study smooth surfaces, optimal transport and persistent homology can study a wider range of data sets. In addition, the cost function in optimal transport and the filtration function in persistent homology provide freedom in detecting various features in the shapes that address users’ interests more directly.

1.3 Optimal Transport

Optimal transport is the theory of finding the most efficient way of moving a distribution of mass from one object to another. Let (X, dX) be a metric space and let α and β be two probability measures on X. We call (X, dX, α) and (X, dX, β) metric-measure spaces. A coupling µ between α and β is a probability measure on the space X × X such that for all measurable A ⊂ X, µ(A × X) = α(A) and µ(X × A) = β(A). A coupling between α and β describes a transport plan for moving mass from α to β. We refer our readers to Chapter 2 for more precise definitions.
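On finite spaces these conditions become concrete matrix constraints. The sketch below (helper names are mine, not from the thesis code) checks the two marginal conditions and builds the product coupling, which is always admissible.

```python
# A coupling between finite measures alpha and beta is a nonnegative matrix
# whose row sums equal alpha and whose column sums equal beta. The product
# coupling mu[i][j] = alpha[i] * beta[j] always satisfies both conditions.

def is_coupling(mu, alpha, beta, tol=1e-9):
    rows = [sum(row) for row in mu]                                  # mu(A x X)
    cols = [sum(mu[i][j] for i in range(len(mu))) for j in range(len(mu[0]))]
    return all(abs(r - a) < tol for r, a in zip(rows, alpha)) and \
           all(abs(c - b) < tol for c, b in zip(cols, beta))

alpha, beta = [0.2, 0.8], [0.5, 0.3, 0.2]
product_coupling = [[a * b for b in beta] for a in alpha]
```

The product coupling corresponds to moving mass independently of position; optimal transport searches over all admissible couplings for the cheapest one.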

1.3.1 Wasserstein distance

Let p ∈ [1, ∞). A useful distance between two probability measures on a metric space is called the p-Wasserstein distance, given by

$$d_{W,p}(\alpha, \beta) := \inf_{\mu \in \mathcal{M}(\alpha,\beta)} \left( \int_{X \times X} d_X(x, y)^p \, \mu(dx, dy) \right)^{1/p}$$

where $\mathcal{M}(\alpha, \beta)$ is the collection of all admissible couplings between α and β. We call $d_X(x, y)^p$ a cost function. The p-Wasserstein distance is the least possible cost of moving from distribution α to β, and the optimal solution µ describes the transport plan.

The Wasserstein distance arises from the Monge-Kantorovich formulation of the optimal transport problem, which we will discuss in Chapter 2. When p = 1, the Wasserstein distance is also called the Earth mover’s distance and is widely studied in the computer vision literature, e.g. in image retrieval [78]. When studying shapes, we may put a probability measure of importance on points and then apply optimal transport to obtain the Wasserstein distance. However, note that in order to compare two metric spaces using the Wasserstein distance, we need to embed the two spaces into a common ambient metric space and then compute the Wasserstein distance in the ambient space. In this case, the positions of the two shapes in the ambient space matter because of the cost function $d_X(x, y)$.
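For intuition, the p-Wasserstein distance between two small point sets with uniform measures can be computed by brute force: for uniform marginals of equal size an optimal coupling may be taken to be a permutation, so it suffices to minimize over all pairings. This is only a sketch for tiny inputs (O(n!) pairings; the function name is illustrative, not from the thesis code), not a practical solver.

```python
# Brute-force p-Wasserstein distance between two n-point sets in R^d carrying
# uniform measures: minimize (1/n * sum of d(x_i, y_{sigma(i)})^p)^(1/p) over
# all permutations sigma.
from itertools import permutations

def wasserstein_uniform(xs, ys, p=2):
    n = len(xs)
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    best = min(
        sum(dist(xs[i], ys[pi]) ** p for i, pi in enumerate(perm)) / n
        for perm in permutations(range(n))
    )
    return best ** (1.0 / p)
```

For example, translating a two-point set vertically by one unit gives distance 1, since the optimal plan matches each point with its translate.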

1.3.2 Gromov-Wasserstein distance

In many cases, the relative position of two metric spaces inside an ambient space is not important. Instead, we use pairwise distances as a representation of each shape and define a new distance between these representations. In this way, we do not need an embedding to compute a distance between two metric-measure spaces.

Let $\mathcal{X} = (X, d_X, \alpha)$ and $\mathcal{Y} = (Y, d_Y, \beta)$ be two metric-measure spaces. Let p ∈ [1, ∞). The p-Gromov-Wasserstein distance is defined as

$$d_{GW,p}(\mathcal{X}, \mathcal{Y}) := \inf_{\mu \in \mathcal{M}(\alpha,\beta)} \left( \iint_{X \times Y} |d_X(x, x') - d_Y(y, y')|^p \, \mu(dx, dy)\, \mu(dx', dy') \right)^{1/p}.$$

A disadvantage of the Gromov-Wasserstein distance is that it is more expensive to compute than the Wasserstein distance. Instead, we calculate lower bounds of the Gromov-Wasserstein distance to reduce the complexity of computation. In this project, we will use the Third Lower Bound (TLB) [63].

The local distribution is defined as

$$F_X(x, r) := \alpha(B_X(x, r)),$$

and $F_X^{-1}(x, t) := \inf\{u \in \mathbb{R} \mid F_X(x, u) > t\}$ is the generalized inverse of $F_X(x, t)$. The Third Lower Bound (TLB) [63] is defined as

$$\mathrm{TLB}_p(\mathcal{X}, \mathcal{Y}) := \inf_{\mu \in \mathcal{M}(\alpha,\beta)} \left( \int_{X \times Y} c(x, y) \, \mu(dx, dy) \right)^{1/p}$$

where

$$c(x, y) := \int_0^{D} |F_X^{-1}(x, t) - F_Y^{-1}(y, t)|^p \, dt$$

is the cost function of the TLB. Note that in the cost function, we only integrate the local distribution up to D. Hence, D controls the locality in the TLB and it can be tuned to output the “best” TLB for a specific classification task.

The TLB provides a cheaper alternative to the Gromov-Wasserstein distance in terms of computation. We will explain the naming (why it is the “third” lower bound) and show that the TLB is truly a lower bound in Chapter 2.
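On a finite metric-measure space these ingredients have direct analogues. The sketch below (illustrative helper names, not the code of Appendix A.1) evaluates the local distribution of a point from its row of the distance matrix, evaluates the generalized inverse by scanning cumulative mass, and approximates the TLB cost c(x, y) by a Riemann sum over [0, D].

```python
# row = distances from one point to all points; weights = point masses (sum 1).

def local_distribution(row, weights, r):
    # F_X(x, r) = alpha(B_X(x, r)): total mass within distance r of x
    return sum(w for d, w in zip(row, weights) if d <= r)

def inverse_local_distribution(row, weights, t):
    # F_X^{-1}(x, t) = inf { u : F_X(x, u) > t }
    cum = 0.0
    for d, w in sorted(zip(row, weights)):
        cum += w
        if cum > t:
            return d
    return max(row)

def tlb_cost(row_x, wx, row_y, wy, D, p=1, steps=200):
    # c(x, y) ~ midpoint-rule integral of |F_X^{-1}(x,t) - F_Y^{-1}(y,t)|^p on [0, D]
    h = D / steps
    return sum(
        abs(inverse_local_distribution(row_x, wx, (k + 0.5) * h)
            - inverse_local_distribution(row_y, wy, (k + 0.5) * h)) ** p * h
        for k in range(steps)
    )
```

The matrix of costs c(x, y) then feeds a single ordinary optimal transport problem, which is what makes the TLB cheaper than the Gromov-Wasserstein distance itself.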

1.3.3 Computation of TLB

In my thesis work, the computation of the TLB was done by Sinkhorn’s algorithm [22], where entropy is integrated into the optimal transport problem. Entropy quantifies the randomness in a distribution. The entropy of a coupling µ is defined as $H(\mu) := -\int_{X \times Y} \log(\mu(x, y)) \, \mu(dx, dy)$. Let ε > 0. Sinkhorn’s algorithm solves an entropic regularized optimal transport problem

$$\inf_{\mu \in \mathcal{M}(\alpha,\beta)} \int_{X \times Y} c(x, y) \, \mu(dx, dy) - \varepsilon H(\mu).$$

The entropic regularized optimization problem guarantees the existence of an optimizer [22]. Moreover, the solution can be found using Sinkhorn’s iterative method [80] with linear convergence [33].

The result of Sinkhorn’s algorithm depends on the choice of the entropic regularizer (ε) and the number of iterations (niter). We fix ε = 0.005 and niter = 44 when using Sinkhorn’s algorithm to ensure a good and computable approximation to the original optimal transport problem.
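A minimal version of the iteration can be sketched as follows. This is the standard Sinkhorn scaling scheme from the literature, not the thesis implementation; for poorly scaled cost matrices a log-domain stabilized variant would be preferable.

```python
# Given cost matrix C, marginals a and b, and regularizer eps, form the Gibbs
# kernel K = exp(-C / eps) and alternately rescale rows and columns so that the
# coupling P[i][j] = u[i] * K[i][j] * v[j] has the prescribed marginals.
import math

def sinkhorn(C, a, b, eps=0.005, niter=44):
    n, m = len(C), len(C[0])
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(niter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(P[i][j] * C[i][j] for i in range(n) for j in range(m))
    return P, cost
```

With the cost matrix of TLB costs c(x, y) as C, the returned `cost` approximates the inner optimal transport value in the TLB.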

1.4 Persistent homology

Persistent homology summarizes topological features of a topological space or of a function on a topological space [26]. We call this function a filtration function. Through collecting the sublevel sets of a filtration function, we build an increasing sequence of simplicial complexes, called a filtered simplicial complex, from a set. This filtered simplicial complex allows the computation of (k-th) persistent homology vector spaces that detect and track k-dimensional cycles. Let $\overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$ be the extended real line. Due to the interval decomposition of persistent homology vector spaces, we obtain a (k-th) persistence diagram (or persistence barcode), which is a multiset of points in $\overline{\mathbb{R}}^2$. The off-diagonal points in a persistence diagram record the “birth” (time of appearance) and “death” (time of disappearance) of these k-dimensional cycles with finite multiplicity. The diagonal points in a persistence diagram are assigned infinite multiplicity.
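As a concrete illustration of how a filtration yields a diagram, the 0th persistence diagram of a Vietoris-Rips filtration on a finite point cloud can be read off with a union-find pass over the edges sorted by length. This is a sketch for intuition only; the actual computations in this thesis use JavaPlex and Ripser (section 1.4.2).

```python
# Every point is a connected component born at 0. Scanning edges by increasing
# length, each merge of two components "kills" one of them, producing the
# diagram point (0, edge length); one component lives forever.

def h0_diagram(points):
    n = len(points)
    d = lambda i, j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
    edges = sorted((d(i, j), i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, length))   # a component dies at this scale
    diagram.append((0.0, float("inf")))     # the surviving component
    return diagram
```

For three collinear points at 0, 1 and 3, the two merges happen at scales 1 and 2, and one component persists forever.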

1.4.1 Bottleneck distance

The bottleneck distance $d_{B,\infty}$ between two k-th persistence diagrams $D_k(f)$ and $D_k(g)$, for two filtration functions f and g, is defined as

$$d_{B,\infty}(D_k(f), D_k(g)) := \inf_{\varphi} \sup_{x \in D_k(f)} \|x - \varphi(x)\|_\infty$$

where $\varphi : D_k(f) \to D_k(g)$ ranges over bijections. That is, we find the best way of matching the points in two persistence diagrams and define the bottleneck distance to be the ∞-distance of the farthest pair in the best bijection.

Similarly, a p-bottleneck distance is defined as

Z 1/p p dB,p(Dk(f), Dk(g)) := inf kx − ϕ(x)k∞dx . ϕ Dk(f)
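On very small diagrams the bottleneck distance can be checked directly by brute force (a sketch; Hera, used in this thesis, solves the equivalent matching problem efficiently, see section 1.4.2). Since off-diagonal points may also be matched to the diagonal, each diagram is augmented with the diagonal projections of the other diagram's points, and matching two diagonal points costs nothing.

```python
# Brute-force bottleneck distance between two tiny persistence diagrams, each a
# list of (birth, death) pairs. Only feasible for a handful of points.
from itertools import permutations

def bottleneck(D1, D2):
    proj = lambda p: ((p[0] + p[1]) / 2.0, (p[0] + p[1]) / 2.0)
    A = list(D1) + [proj(q) for q in D2]     # augmented diagrams, equal sizes
    B = list(D2) + [proj(p) for p in D1]
    if not A:
        return 0.0
    def cost(p, q):
        if p[0] == p[1] and q[0] == q[1]:    # diagonal-to-diagonal is free
            return 0.0
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    return min(
        max(cost(A[i], B[pi]) for i, pi in enumerate(perm))
        for perm in permutations(range(len(B)))
    )
```

For instance, a single point matched against the empty diagram is sent to the diagonal, so the distance equals half its persistence.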

1.4.2 Computation of the Bottleneck distance

We use JavaPlex [87] and Ripser [89] to compute the persistence diagrams and Hera [50] for computations of the bottleneck distances. Hera converts the optimization problem of finding the best bijection into a graph matching problem. It uses a k-d tree [5] to search for the best matching in the graph. Details are discussed in Section 3.9.

1.5 Application of the TLB and the bottleneck distance

In this thesis, we will explore several implementations of the optimal transport and persistent homology approaches and test the performances of these implementations on a data set of triangulated surfaces of teeth studied in [11]. These teeth belong to either primates or non-primate close relatives. Our objective for this project is to build an automatic classifier that recovers either taxonomic classification (e.g. biological family or genus) or dietary classification.

An actual tooth in the data set is shown in Figure 1.1. There are 116 teeth in the data set. Each tooth was cut off in the middle so that only the crown is shown.

Figure 1.1: Surface of the crown of a tooth in the data set, belonging to an animal in the family Lemuridae with a frugivorous diet. The color bar on the side indicates mean curvature.

In the hope of developing a new means of automated classification, we tune the parameters to obtain a "good" classifier that resembles either the taxonomic or the dietary classification. When experimenting with the TLB, we tuned three parameters: D, the cost function, and the probability measures. When experimenting with the bottleneck distance, we used different filtration functions with either dB,∞ or dB,2. For further details, readers can proceed to Chapter 4.

We will present our results as dendrograms of single-linkage hierarchical clustering. The dendrograms are ordered so that the closest clusters are merged first. We classify the teeth by three types of labelling: family (biological), genus (biological), and dietary preference. There are 24 families, 37 genera, and four dietary types. Labels are adapted from [11]. A "good" dendrogram should show clusters of teeth with the same labels at each level.
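The single-linkage dendrograms in this thesis were produced with standard software; as a sketch of the underlying merge procedure (the function name and the toy distance matrix below are ours), a naive implementation repeatedly merges the pair of clusters at minimal single-linkage distance:

```python
def single_linkage(D):
    """Naive single-linkage clustering on a symmetric distance matrix D.
    The single-linkage distance between two clusters is the minimum
    pairwise distance between their members. Returns the list of merges
    as (members_of_cluster_a, members_of_cluster_b, merge_distance),
    closest clusters merged first."""
    n = len(D)
    clusters = {i: {i} for i in range(n)}
    merges = []
    while len(clusters) > 1:
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[2]:
                        best = (a, b, d)
    # merge the closest pair and record it
        a, b, d = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]
    return merges
```

On four points on a line at positions 0, 1, 5, 6, the two tight pairs merge first at distance 1 and the final merge happens at the single-linkage distance 4.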

For better comprehension of the results in this thesis, we will only show visualizations through dendrograms using diets as labels. For more results, we refer our readers to the webpage, where the dendrograms for any available combination of D, metric, probability measure, and label can be viewed by choosing values in the dropdown menus. There are four different diets in the data set: folivorous (leaf-eating), frugivorous (fruit-eating), insectivorous (insect-eating), and omnivorous (unrestricted diet).

We observe that the "best" classification result in terms of structure is obtained using the TLB with D = 0.32 (Figure 1.2). Although we cannot find four major clusters that separate the four dietary types, we observe that at the top of the hierarchy (i.e., among the largest clusters that merge in the end), we have two clusters that almost separate insectivorous and omnivorous from frugivorous and folivorous.

Figure 1.2: Dendrogram of the classification results of the TLB with D = 0.32. An outlier is excluded. Labels indicate the dietary preferences of the owners of the teeth.

We compute the probability of error (Pe) in leave-one-out classification to quantify the quality of our results. Fixing a labelling category (family, genus, or diet), we predict the label of each tooth in the data set to be the label of its nearest neighbor according to our distance matrix. The Pe is the fraction of items whose predicted labels differ from their true labels. We observe that the best Pe for the different labelling categories is obtained in different experiments.
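This leave-one-out rule is simple to state in code; the following minimal sketch (function and variable names are ours) computes Pe from any precomputed distance matrix and label list:

```python
def leave_one_out_pe(D, labels):
    """Leave-one-out probability of error: each item is assigned the label
    of its nearest neighbor (itself excluded) in the distance matrix D,
    and Pe is the fraction of mismatches with the true labels."""
    n = len(labels)
    errors = 0
    for i in range(n):
        nearest = min((k for k in range(n) if k != i), key=lambda k: D[i][k])
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / n
```

Two well-separated groups with consistent labels give Pe = 0, while alternating labels within tight pairs give Pe = 1.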

When using family and genus as labels, the lowest Pe's are 0.737 and 0.793, respectively, both obtained in experiments using the optimal transport approach. When using diet as labels, the lowest Pe is 0.430, obtained in an experiment using the persistent homology approach.

Chapter 2: Optimal Transport

2.1 Brief History

Optimization is a task that we do on a daily basis. We want to take the shortest path to save time when commuting; we want to make the right decisions to avoid failures; essentially, we want to succeed with minimal effort. An economist may identify such optimization behaviors as acts of rationality. Due to scarcity of resources, we need to be rational to maximize the output from limited resources. Hence, economics was developed to study how societies can best utilize these resources. Since economics is a rather broad subject and society is a large entity, we often break the task down into smaller cases to better study the subject. A possible scenario in a construction site may be finding the best way of moving a pile of soil from one place to another. Various theories were proposed to find a general solution of the optimal transport problem under a similar setting.

Optimal transport is the theory of finding the most efficient way of moving a distribution of mass (continuous or discrete) from one object to another. The problem was formalized in 1781 by the French mathematician Gaspard Monge [67]. The weighted sum of distances between two mass distributions of soil can be considered the cost of the transportation.

Then moving a pile of soil optimally to another is equivalent to finding a map between two mass distributions with minimal cost. Hence, Monge’s formulation aims at finding an optimal transport map between two mass distributions with respect to a cost function.

Later, in 1942, the Soviet mathematician Leonid V. Kantorovich developed a general version of the optimal transport problem in which mass splitting is allowed [45]. That is, a point in the support of the measure can be mapped to two different destinations. Hence, the resulting optimal solution is no longer a map; we call the solution of Kantorovich's formulation a transport plan. Although the new formulation sounds harder to solve than the original problem proposed by Monge, Kantorovich converted the optimal transport problem into a linear programming problem. The optimal solution exists and can be found in polynomial time [22]. This generalization of the optimal transport problem won Kantorovich the Nobel Prize in Economics in 1975. Figure 2.1 shows the difference between Monge's formulation and Kantorovich's relaxation.

Figure 2.1: Example of an optimal transport problem. Figure 1 in [51].

Kantorovich's work led to many studies in the field of optimal transport [51]. Sudakov showed the existence of the optimal transport map in 1979 [84]. In the 1990s, Brenier studied the characterization, existence, and uniqueness of the optimal transport map [13], and Gangbo and McCann provided a geometric interpretation of the problem [36].

Thanks to this large body of theory, optimal transport has become computationally tractable and is gradually being applied in different areas [73]. In image processing, optimal transport has extensive uses in image matching [99, 97, 70], image fusion [21], medical imaging [96], shape registration [58], and image watermarking [60]. Optimal transport has also been used in music transcription [32] and in economics [35].

Optimal transport gives rise to two important metrics defined on the space of all metric-measure spaces (i.e. metric spaces equipped with measures): the Wasserstein distance and the Gromov-Wasserstein distance. These distances are consequences of geometric properties of the optimal transport problem. The Wasserstein distance is a distance between two metric-measure spaces embedded in a common ambient space, whereas the Gromov-Wasserstein distance is defined between any two metric-measure spaces.

However, computing the Gromov-Wasserstein distance leads to an instance of a Quadratic Assignment problem and is hence NP-hard. A handful of lower bounds for the Gromov-Wasserstein distance are introduced in [63]. In this thesis, we only use the Third Lower Bound, as it is tighter than the First Lower Bound [63]. Moreover, the TLB allows finer detection of local structures via local distributions, whereas the Second Lower Bound only uses the global distribution.

The actual computation of the optimal transport problem is approximated using Sinkhorn's algorithm [22]. The algorithm adds an entropic regularization term to the original optimal transport problem and hence turns it into a strictly convex problem. Instead of lying at a vertex of the domain, the solution to the new problem is located in the interior, with sufficient smoothness. The solution obtained by Sinkhorn's fixed point iteration is guaranteed to converge [33].

2.2 Monge-Kantorovich formulation

Let (X, dX) be a metric space. The Borel σ-algebra B(X) on X is the smallest collection of subsets of X that contains all open subsets of X and is closed under countable union, countable intersection, and complement. An element of the Borel σ-algebra is called a Borel set.

A Borel probability measure on X is a function µ : B(X) → [0, 1] such that µ(X) = 1 and, for any countable collection {Si}i of pairwise disjoint measurable subsets of X, µ(⋃i Si) = Σi µ(Si). A subset B ⊂ X is measurable in X if B ∈ B(X). We denote the space of Borel probability measures on X by P1(X).

Let α ∈ P1(X) and let Y be a metric space. Let f : X → Y be a continuous function. We say that f is a measurable function if for all measurable B ⊂ Y, f^{-1}(B) is also measurable in X.

For a measurable function f, we define the pushforward measure β = f#α on Y as follows: for all measurable B ⊂ Y,

β(B) := α({x ∈ X | f(x) ∈ B}) = α(f^{-1}(B)).

We observe that β is a probability measure on Y. From [92], we have that for all measurable functions g : Y → R,

∫_Y g(y) β(dy) = ∫_X g ∘ f(x) α(dx).    (2.1)

Such an f is also called a transport map in Monge's formulation.

Monge's optimal transport problem is then

M(α, β) := inf_{f ∈ MP} ∫_X c(x, f(x)) α(dx),    (2.2)

Figure 2.2: Figure from [73] showing two examples of Monge's problem. On the left is a case where the cardinalities of the two spaces are the same and each point has equal weight; hence the optimal transport map is a permutation. On the right, however, is an example where a transport map from the red dots to the blue dots does not exist.

where MP := {f : X → Y |f#α = β} and c : X × Y → R+ is a cost function that determines the cost of transporting an object x ∈ X to f(x) ∈ Y .

The existence of an optimal transport map depends on the measures α, β as well as the cost function c. Suppose X and Y are finite metric spaces with |X| = |Y| = n. Let α ∈ P1(X) and β ∈ P1(Y) be uniform measures, i.e. α({xi}) = β({yj}) = 1/n for all xi ∈ X and yj ∈ Y. Then, when solving (2.2), we are looking for an optimal permutation σ : {1, ..., n} → {1, ..., n} such that the total cost of sending each xi to yσ(i) is minimized. Hence, (2.2) becomes

min_σ (1/n) Σ_{i=1}^{n} c(xi, yσ(i)),

which is an optimal assignment problem [73]. However, if |X| < |Y|, then a transport map does not exist. Figure 2.2 shows two examples of optimal transport problems from the set of blue points to the set of red ones. We note that a transport map from the red points to the blue ones does not exist in the case on the right of Figure 2.2.
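For tiny n, the assignment problem above can be solved exactly by enumerating all permutations. The sketch below (names ours) does exactly that; in practice one would use the Hungarian algorithm or a routine such as scipy.optimize.linear_sum_assignment instead.

```python
from itertools import permutations

def optimal_assignment(C):
    """Brute-force solution of the assignment form of (2.2) for uniform
    measures on equal-size spaces: minimize (1/n) * sum_i C[i][sigma(i)]
    over all permutations sigma. C is the n x n cost matrix."""
    n = len(C)
    best_cost, best_perm = float("inf"), None
    for sigma in permutations(range(n)):
        cost = sum(C[i][sigma[i]] for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, sigma
    return best_cost, best_perm
```

For the cost matrix [[0, 1], [1, 0]] the identity permutation has average cost 0, so it is optimal.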

In general, difficulties arise in computing the optimal solution of (2.2): both the objective function and the constraints are non-linear with respect to f. In addition, the domain MP of the problem is non-convex [73]. To tackle these challenges, we consider Kantorovich's formulation, which converts the optimal transport problem in (2.2) into a linear programming problem.

Let α and β be two probability measures on two metric spaces (X, dX) and (Y, dY) respectively. A coupling µ between α and β is a probability measure on the product space X × Y such that for all measurable A ⊂ X, µ(A × Y) = α(A), and for all measurable B ⊂ Y, µ(X × B) = β(B). Alternatively, we can rewrite the constraints as ΠX#µ = α and ΠY#µ = β, where ΠX : X × Y → X and ΠY : X × Y → Y are the canonical projections. We call α and β the marginals of µ. By considering a coupling instead of a permutation or a transport map, we allow the mass at a single point xi ∈ X to be distributed over multiple destinations in Y. The idea of mass splitting suggests that instead of a deterministic model for mass transportation, one can consider a probabilistic approach [73].

Let M(α, β) be the collection of all possible couplings between α and β. Kantorovich's formulation is defined as

K(α, β) := min_{µ ∈ M(α,β)} ∫_{X×Y} c(x, y) µ(dx × dy).    (2.3)

A minimizer of (2.3) is called an (optimal) transport plan. Notice that in (2.3), the objective function and the constraints are linear with respect to µ. Moreover, (2.3) always has a solution [73]. We observe that M(α, β) is non-empty, since the trivial coupling α ⊗ β, given by the tensor product (α ⊗ β)(Ba × Bb) = α(Ba)β(Bb) for all measurable Ba ⊂ X and Bb ⊂ Y, is in M(α, β).
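Since (2.3) is a linear program, small discrete instances can be solved with an off-the-shelf LP solver. The sketch below is our own encoding using scipy.optimize.linprog: the coupling µ is flattened into a vector and the two marginal conditions become linear equality constraints.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(alpha, beta, C):
    """Solve the discrete Kantorovich problem (2.3) as a linear program.
    alpha (length n) and beta (length m) are probability vectors; C is
    the n x m cost matrix. The coupling mu is flattened row-major."""
    n, m = C.shape
    A_eq = []
    for i in range(n):                    # row sums of mu equal alpha
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row)
    for j in range(m):                    # column sums of mu equal beta
        col = np.zeros(n * m)
        col[j::m] = 1.0
        A_eq.append(col)
    b_eq = np.concatenate([alpha, beta])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, m)

# Toy instance: 0.1 units of mass must cross at cost 1, so K = 0.1.
cost, plan = kantorovich(np.array([0.4, 0.6]), np.array([0.5, 0.5]),
                         np.array([[0.0, 1.0], [1.0, 0.0]]))
```

The optimal plan keeps as much mass as possible on the zero-cost diagonal and moves only the excess 0.1.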

If a transport map exists in Monge's formulation, then we can rewrite Monge's problem in Kantorovich's formulation. For any transport map f, we define a transport plan µ = (Id × f)#α. Then for any measurable subsets A ⊂ X and B ⊂ Y we have

µ(A × B) = α((Id × f)^{-1}(A × B)) = α(A ∩ f^{-1}(B)).

In particular, µ(A × Y) = α(A) and µ(X × B) = α(f^{-1}(B)) = β(B), so µ is indeed a coupling between α and β.

Therefore,

∫_{X×Y} c(x, y) µ(dx × dy) = ∫_{X×Y} c(x, y) ((Id × f)#α)(dx × dy)
                           = ∫_X c ∘ (Id × f)(x) α(dx)
                           = ∫_X c(x, f(x)) α(dx).

If there exists an optimal transport map f, then the coupling given by (Id × f)#α is an optimal transport plan [93]. However, the converse is not true, since a transport map does not always exist.

2.3 Wasserstein distance

Given two Borel probability measures α and β on a metric space (X, dX), the Wasserstein distance measures how different these measures are by solving Kantorovich's optimal transport problem. For p ≥ 1, we define the p-Wasserstein distance between α and β to be

dW,p^X(α, β) := ( inf_{µ ∈ M(α,β)} ∫_{X×X} dX(x, x′)^p µ(dx × dx′) )^{1/p}.    (2.4)

Note that to compare two metric-measure spaces (X, dX, α) and (Y, dY, β) using the p-Wasserstein distance, we need to embed X and Y into a common ambient metric space (Z, dZ) and compute dW,p^Z between the images of α and β in Z. Moreover, the p-Wasserstein distance is sensitive to the positions of X and Y in the ambient space Z. To overcome this limitation of the p-Wasserstein distance, we can use the Gromov-Wasserstein distance.

2.4 Gromov-Wasserstein distance

Again, suppose that Z = (Z, dZ) is a metric space and X, Y ⊂ Z. A correspondence between X and Y is any R ⊆ X × Y such that

• for all x ∈ X, there is y ∈ Y with (x, y) ∈ R;

• for all y′ ∈ Y, there is x′ ∈ X with (x′, y′) ∈ R.

Let R(X, Y) denote the set of all correspondences between X and Y. Then we define the Hausdorff distance between X and Y as

dH^Z(X, Y) := min_{R ∈ R(X,Y)} max_{(x,y) ∈ R} dZ(x, y).

We define the Gromov-Hausdorff distance between X and Y as

dGH(X, Y) := inf_{Z, γX, γY} dH^Z(γX(X), γY(Y)),    (2.5)

where γX : X → Z and γY : Y → Z are isometric embeddings.

We can also define the Gromov-Hausdorff distance in the following way:

dGH(X, Y) = (1/2) inf_R sup_{(x,y),(x′,y′) ∈ R} |dX(x, x′) − dY(y, y′)|.    (2.6)

We observe that (2.5) and (2.6) are equivalent by Theorem 7.3.25 of [15].

Let X := (X, dX, α) and Y := (Y, dY, β) be two metric-measure spaces. The Gromov-Wasserstein distance is an analogue of the Gromov-Hausdorff distance for metric-measure spaces. For p ≥ 1, the Gromov-Wasserstein distance [63] between X and Y is defined as

dGW,p(X, Y) := (1/2) min_{µ ∈ M(α,β)} disp(µ),    (2.7)

where, for 1 ≤ p < ∞,

disp(µ) := ( ∫_{X×Y} ∫_{X×Y} |dX(x, x′) − dY(y, y′)|^p µ(dx, dy) µ(dx′, dy′) )^{1/p},

and, for p = ∞,

dis∞(µ) := sup_{x,x′,y,y′} |dX(x, x′) − dY(y, y′)|,

is called the p-distortion of µ.

Theorem 2.8 ([63]). Let M be the collection of all metric-measure spaces. Then (M, dGW,p) is a metric space up to isomorphism.

We observe that computing the Gromov-Wasserstein distance is an instance of a Quadratic Assignment problem and is hence NP-hard [63]. Therefore, in practice, an alternative is to compute a tight lower bound for the Gromov-Wasserstein distance. Several lower bounds are proposed in [63]. We will introduce the so-called Third Lower Bound in the following section.

2.5 Third lower bound

Again, let X = (X, dX, α) and Y = (Y, dY, β) be two finite metric-measure spaces. Fix a p ≥ 1 and a D ∈ R+. The Third Lower Bound (TLB) [63] for the p-Gromov-Wasserstein distance between X and Y is given by

TLBp(X, Y) := ( inf_{µ ∈ M(α,β)} ∫_{X×Y} c(x, y) µ(dx, dy) )^{1/p},    (2.9)

where

c(x, y) := ∫_0^D |FX^{-1}(x, t) − FY^{-1}(y, t)|^p dt

is the cost function of the TLB. We define

FX(x, r) := α(BX(x, r))    (2.10)

to be the local distribution, and FX^{-1}(x, t) := inf {u ∈ R | FX(x, u) > t} is its generalized inverse.

We observe that D controls the size of the neighborhoods we are interested in. If D = ∞, then BX(x, D) = X and hence FX(x, D) = 1. Let diam(X) := max_{x,x′ ∈ X} dX(x, x′) be the diameter of X. In fact, since we assumed X and Y to be finite, FX(x, D) = 1 already when D = max(diam(X), diam(Y)). On the other hand, if D is small, then we forget the global distribution of mass and focus only on local features. In Chapter 4, we will see how the choice of D affects classification outcomes.
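To make the construction concrete, here is a small sketch (the function names and the discretization are ours) that computes the local distributions FX(x, ·) for finite spaces with uniform measures and brute-forces the outer transport problem. We take p = 1, for which integrating the distribution functions directly instead of their generalized inverses gives the same area between the curves.

```python
from itertools import permutations

def local_dist(Dmat, i, r, weights):
    """F_X(x_i, r) = alpha(B(x_i, r)) for a finite metric-measure space
    with distance matrix Dmat and point weights."""
    return sum(w for j, w in enumerate(weights) if Dmat[i][j] <= r)

def tlb_p1(DX, DY, D, grid=200):
    """Sketch of the Third Lower Bound for p = 1, uniform measures, and
    |X| = |Y|. The cost c(x,y) = int_0^D |F_X(x,r) - F_Y(y,r)| dr is
    discretized on a midpoint radius grid, and the outer transport
    problem is brute-forced over permutations."""
    n = len(DX)
    wX = [1.0 / n] * n
    wY = [1.0 / len(DY)] * len(DY)
    rs = [D * (k + 0.5) / grid for k in range(grid)]
    dr = D / grid
    C = [[sum(abs(local_dist(DX, i, r, wX) - local_dist(DY, j, r, wY))
              for r in rs) * dr
          for j in range(len(DY))] for i in range(n)]
    return min(sum(C[i][s[i]] for i in range(n)) / n
               for s in permutations(range(n)))
```

For two 2-point spaces with interpoint distances 1 and 2, the local distributions differ by 0.5 exactly on the radius interval [1, 2), so the TLB value is approximately 0.5.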

To see how the local distributions appear in (2.9), we observe that when the measures are one-dimensional, dW,p^R has a closed-form solution.

Proposition 2.5.1. Let α, β ∈ P1(R). Then

dW,p^R(α, β) = ( inf_{µ ∈ M(α,β)} ∫_{R²} |x − y|^p µ(dx × dy) )^{1/p} = ( ∫_R |Fα^{-1}(t) − Fβ^{-1}(t)|^p dt )^{1/p},

where Fα(t) = α((−∞, t]) and Fα^{-1}(t) = inf {u ∈ R | Fα(u) > t} is the generalized inverse of Fα.

Proof. By Brenier's theorem [13], there exists a unique optimal transport map f = ∇φ(x) with φ a convex scalar function. Since φ is convex and f is a function from R to R, f is monotonically increasing. In R, there exists only one monotonically increasing transport map, namely f(x) := inf {y ∈ R | Fα(x) ≤ Fβ(y)} for x ∈ R. To see this, fix x ∈ R. Then for all x′ ≥ x, we have f(x) ≤ f(x′), and hence

β((−∞, f(x)]) ≤ β((−∞, f(x′)])
α(f^{-1}((−∞, f(x)])) ≤ β((−∞, f(x′)])
Fα(x) ≤ Fβ(f(x′)) for all x′ ≥ x
⟹ Fα(x) ≤ inf {Fβ(y) | y ≥ f(x)}.

Hence, f is as described above; in other words, f = Fβ^{-1} ∘ Fα. Then

dW,p^R(α, β) = ( ∫_R |x − f(x)|^p α(dx) )^{1/p}
            = ( ∫_R |x − (Fβ^{-1} ∘ Fα)(x)|^p α(dx) )^{1/p}
            = ( ∫_R |Fα^{-1}(t) − Fβ^{-1}(t)|^p dt )^{1/p}.
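For empirical measures with n equally weighted points each, this closed form reduces to matching sorted samples, since composing the quantile functions pairs the ith smallest point of α with the ith smallest point of β. A minimal sketch (names ours):

```python
def wasserstein_1d(xs, ys, p=2):
    """Closed-form p-Wasserstein distance between two empirical measures
    on R, each with n equally weighted points: compose the quantile
    functions, i.e. match sorted samples (cf. Proposition 2.5.1)."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    n = len(xs)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / n) ** (1.0 / p)
```

Translating a measure by 1 (e.g. {0, 1} versus {1, 2}) gives distance 1 for every p, as expected.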

Lemma 2.5.1 ([64]). Let f : X → R and g : Y → R be two continuous functions. Then

inf_{µ ∈ M(α,β)} ∫_{X×Y} |f(x) − g(y)|^p µ(dx × dy) ≥ ∫_R |F^{-1}(t) − G^{-1}(t)|^p dt,

where F(u) = α({x ∈ X | f(x) ≤ u}) and G(u) = β({y ∈ Y | g(y) ≤ u}), and F^{-1} and G^{-1} are the generalized inverses of F and G.

Although the proof was shown in detail in [64], we will still reconstruct the proof below for the sake of completeness.

Proof. Fix a µ ∈ M(α, β). Let ν = (f, g)#µ. Note that for all measurable A ⊂ R, we have ν(A × R) = µ(f^{-1}(A) × g^{-1}(R)) = µ(f^{-1}(A) × Y) = α(f^{-1}(A)). Similarly, for any measurable B ⊂ R, ν(R × B) = β(g^{-1}(B)). Then we have

∫_{X×Y} |f(x) − g(y)|^p µ(dx × dy) = ∫_{R²} |t − s|^p ν(dt × ds) ≥ inf_{ν ∈ M(f#α, g#β)} ∫_{R²} |t − s|^p ν(dt × ds).

We observe that since f and g are real-valued functions, f#α and g#β are two probability measures on R. Hence, F(t) = f#α((−∞, t]) = α(f^{-1}((−∞, t])) = α({x ∈ X | f(x) ≤ t}), and similarly G(t) = g#β((−∞, t]) = β({y ∈ Y | g(y) ≤ t}). By the above proposition,

inf_{ν ∈ M(f#α, g#β)} ∫_{R²} |t − s|^p ν(dt × ds) = ∫_R |F^{-1}(t) − G^{-1}(t)|^p dt.

Combining this equality with the inequality obtained above, we obtain the conclusion.

Proposition 2.5.2. TLBp(X , Y) ≤ dGW,p(X , Y).

Proof. Fix x ∈ X and y ∈ Y. Then note that

∫_{X×Y} |dX(x, x′) − dY(y, y′)|^p µ(dx′ × dy′) = ∫_{X×Y} |Dx^X(x′) − Dy^Y(y′)|^p µ(dx′ × dy′),

where Dx^X(x′) := dX(x, x′) and Dy^Y(y′) := dY(y, y′). Applying the lemma above to Dx^X and Dy^Y, we have F(u) = α({x′ ∈ X | dX(x, x′) ≤ u}) = α(BX(x, u)) = FX(x, u), and similarly G(u) = FY(y, u). Hence, we conclude that

∫_{X×Y} |Dx^X(x′) − Dy^Y(y′)|^p µ(dx′ × dy′) ≥ ∫_R |FX^{-1}(x, t) − FY^{-1}(y, t)|^p dt.

Since the choice of µ is arbitrary, we have

inf_µ ∫_{X×Y} |Dx^X(x′) − Dy^Y(y′)|^p µ(dx′ × dy′) ≥ ∫_R |FX^{-1}(x, t) − FY^{-1}(y, t)|^p dt.

Hence, integrating over all x and y, we obtain TLBp(X, Y) ≤ dGW,p(X, Y).

2.6 Sinkhorn’s Algorithm

Although the Wasserstein distance gives a powerful tool for comparing two probability distributions on metric spaces, its computation is still costly. The worst-case cost of computing the Wasserstein distance between two discrete measures with N elements each is O(N³ log(N)) when using the best algorithm [22]. If the metric-measure space can be embedded into a low-dimensional space Rⁿ, the cost of approximating the Wasserstein distance can be reduced [22]. However, due to the distortion of embeddings and the exponential increase in cost, approximating the Wasserstein distance through embedding into Rⁿ is impractical when n > 4 [22].

To approximate the Wasserstein distance faster, a method was proposed in [22]: by turning the optimal transport problem into a strictly convex problem, one can use Sinkhorn's fixed point algorithm [80] to solve the new optimization problem with linear convergence [33]. The performance can be further boosted by implementing the algorithm on GPGPU architectures.

Let α and β be two probability mass functions on a finite metric space (X, dX) with cardinality d, and let X = {x1, x2, ..., xd}. The entropy of α is defined by h(α) = −Σ_{i=1}^{d} α(xi) log(α(xi)). Entropy quantifies the information in a probability distribution, or in other words, the uncertainty in a distribution. To see this, consider two distributions γ1 = {0.8, 0.1, 0.1} and γ2 = {0.3, 0.3, 0.4}. We observe that (using base-10 logarithms) h(γ1) = −0.8 log(0.8) − 2 · 0.1 log(0.1) ≈ 0.28 and h(γ2) = −2 · 0.3 log(0.3) − 0.4 log(0.4) ≈ 0.47. This illustrates the idea that a more uniform probability distribution has higher entropy. We define the entropy of a coupling µ ∈ M(α, β) in the same fashion: h(µ) = −Σ_{i,j=1}^{d} µ(xi, xj) log(µ(xi, xj)). Fix an ε > 0. First, we introduce an entropic regularization term into the optimal transport problem. That is,

dS^ε(α, β) = inf_{µ ∈ M(α,β)} ( ∫_{X×Y} c(x, y) µ(dx × dy) − ε h(µ) ),    (2.11)

where c(x, y) is a cost function.
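The entropy values quoted for γ1 and γ2 can be checked numerically; note that they correspond to base-10 logarithms, an assumption we make explicit in this small check (function name ours):

```python
from math import log10

def entropy(p):
    """Entropy h with base-10 logarithms (which reproduces the values
    quoted in the text); terms with zero mass contribute nothing."""
    return -sum(x * log10(x) for x in p if x > 0)

h1 = entropy([0.8, 0.1, 0.1])   # concentrated: lower entropy
h2 = entropy([0.3, 0.3, 0.4])   # near-uniform: higher entropy
```

This confirms h(γ1) ≈ 0.28 < h(γ2) ≈ 0.47, i.e. the more uniform distribution has the higher entropy.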

Due to the principle of maximum entropy [44], the distribution with the largest entropy best represents the current state of knowledge. Hence, by subtracting ε h(µ) from the original objective function, we favor high-entropy couplings when taking the infimum. Moreover, the objective function in (2.11) is strictly convex. This can be seen as follows: let f(µ) = Σ_{i,j} cij µij + ε Σ_{i,j} µij log(µij). Note that ∂f/∂µij = cij + ε(log(µij) + 1) and ∂²f/∂µij² = ε/µij > 0. Hence, f is strictly convex. In general, the solution of a linear optimization problem is found at an extremal point of the domain [7]. However, by converting the optimal transport problem into a strictly convex problem, we can now search for µ in the interior of M(α, β).

The computation of dS^ε utilizes a matrix scaling algorithm and Sinkhorn's fixed point iteration. First, we note that the optimizer is unique [22]. For a vector w, we use diag(w) to denote the diagonal matrix with diagonal entries from w. The optimal coupling µ that realizes the infimum has a closed-form solution µ = diag(u) K diag(v), where u and v are non-negative vectors and K = e^{−(1/ε)c} is the entrywise exponential of the rescaled cost matrix c [30]. Hence, µ is a rescaled version of K, and we can use Sinkhorn's algorithm to update the optimal coupling. The algorithm is described in detail in [22]; we outline the procedure here. Recall that α, β ∈ R^N are the marginals of the coupling µ. We solve the following system of equations for u and v iteratively:

u ⊙ (Kv) = α
v ⊙ (K^T u) = β,

where ⊙ denotes the component-wise product. We start with v^(0) = (1, 1, ..., 1) and compute u^(1) by plugging v^(0) into the first equation, solving for u^(1) = α / (Kv^(0)) (component-wise division). Then we plug u^(1) into the second equation and solve for v^(1) = β / (K^T u^(1)). We repeat these steps until convergence or until a stopping criterion is met.
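The iteration above is easy to implement; the following sketch is our own minimal version (not the GPU implementation of [22]), alternately rescaling u and v and recovering the coupling diag(u) K diag(v):

```python
import numpy as np

def sinkhorn(alpha, beta, C, eps, n_iter=2000):
    """Sinkhorn's fixed-point iteration for the entropy-regularized
    optimal transport problem (2.11). alpha, beta: marginal probability
    vectors; C: cost matrix; eps: regularization strength. Returns the
    coupling diag(u) K diag(v) and its (unregularized) transport cost."""
    K = np.exp(-C / eps)                  # K = e^{-(1/eps) c}, entrywise
    v = np.ones(len(beta))                # v^(0) = (1, ..., 1)
    for _ in range(n_iter):
        u = alpha / (K @ v)               # enforce u * (K v) = alpha
        v = beta / (K.T @ u)              # enforce v * (K^T u) = beta
    mu = np.diag(u) @ K @ np.diag(v)
    return mu, float(np.sum(mu * C))

# Toy instance: 0.1 units of mass must cross at cost 1, so the
# unregularized optimum is 0.1, well approximated for small eps.
alpha = np.array([0.4, 0.6])
beta = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
mu, cost = sinkhorn(alpha, beta, C, eps=0.02)
```

For small ε, the returned coupling nearly satisfies both marginal constraints and its cost is close to the exact Kantorovich value.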

The convergence of Sinkhorn's iteration is proven in [33]. The idea is to show that Sinkhorn's iteration converges with respect to the Hilbert projective metric on positive vectors,

dH(u, v) := log max_{i,j} (ui vj)/(uj vi),

using Birkhoff's theorem [9]. The Hilbert projective metric is a distance on the projective cone, where scalar multiplication gives an equivalence relation on the collection of positive vectors. Moreover, dH(u, v) is invariant under component-wise multiplication, i.e. for all positive vectors w, dH(u, v) = dH(w ⊙ u, w ⊙ v). Let u* and v* be the true solutions. Then, by Birkhoff's theorem, we obtain an upper bound on the rate of convergence of u (and v) as a factor of κ(K)², where

κ(K) = (√θ(K) − 1)/(√θ(K) + 1), and θ(K) = max_{i,j,l,m} (Kil Kjm)/(Kjl Kim).

Chapter 3: Persistent Homology

3.1 Brief History

Persistent homology summarizes topological features of a shape or of a function on a topological space [26]. Through building a sequence of simplicial complexes and tracking the "birth" and "death" of features in these simplicial complexes, we obtain a concise representation of the original space as a collection of coordinate pairs in R². This collection of coordinates is called a persistence diagram.

As the name suggests, persistent homology extends the results of homology and studies the "persistence" of holes in a topological space. In 1990, Patrizio Frosini and collaborators introduced size functions, which are equivalent to what is nowadays known as 0th persistent homology [34]. In 1999, Vanessa Robins studied inclusion maps in building homology [77]. In 2002, Edelsbrunner, Letscher and Zomorodian introduced persistent homology together with a fast algorithm [28]. Both the work of Robins and that of Edelsbrunner et al. built on the notion of alpha shapes [27].

In the following sections, we will define persistent homology on top of simplicial homology.

Although in many cases our input data may not come with a simplicial structure, we can build a sequence of simplicial complexes with the points of the data set as vertices. Starting with an empty complex, we gradually add simplices to the complex. Tracing the simplicial complex at each time step, we obtain a sequence of simplicial complexes, that is, a filtration. We will discuss a well-known filtration, called the Vietoris-Rips filtration, in Section 3.4.

Applying the functoriality of kth homology with field coefficients, we obtain a sequence of kth homology vector spaces that track k-dimensional holes; we call these holes k-cycles. The time at which a k-cycle appears is called its "birth" time, and the time at which it is filled in by a higher-dimensional chain is called its "death" time. We collect the "birth" time and "death" time of each k-cycle as a tuple. Let R̄ := R ∪ {+∞}. The collection of the birth and death times of all the k-cycles is called the kth persistence diagram. It is a multiset of points in R̄² with multiplicities: off-diagonal points have finite multiplicities and diagonal points have infinite multiplicity. The persistence diagram provides a visual representation of the filtered simplicial complex. There is an equivalent notion, called a barcode, where we think of each point of the persistence diagram as an interval. We can also track the difference between the "death" and "birth" of a point, in other words, how long a k-cycle lasts: the longer a k-cycle lasts, the farther from the diagonal the corresponding point on the persistence diagram lies. We call the most off-diagonal points signatures.

The bottleneck distance is a metric defined on the collection of persistence diagrams. In Section 3.6, we will see that the infinite bottleneck distance is bounded above by the Gromov-Hausdorff distance.

3.2 Simplicial homology

Let X be a finite set. An abstract simplicial complex on X is a collection Σ of finite subsets of X such that for all σ ∈ Σ, all subsets of σ are in Σ. We call an element σ ∈ Σ with σ = {v0, ..., vk} a k-simplex, and vi a vertex of σ for each 0 ≤ i ≤ k. We denote such a σ by [v0, v1, ..., vk].

Fix a σ ∈ Σ and a field F. Although F can be any field, for simplicity we assume throughout this thesis that we are working with the field F₂, which contains only two elements, 0F and 1F. Multiplication and addition in F₂ follow the usual integer arithmetic, except that 1F + 1F = 0F. The characteristic function of σ is the map ϕσ : Σ → F such that

ϕσ(σ′) := 1F if σ′ = σ, and 0F otherwise.

Let K be the collection of all k-simplices in Σ. The kth chain complex Ck(K) of K

over F is a free vector space on the set K. That is, Ck(K) is the vector space of F -valued

functions on K with vector space addition as

(ϕ + ψ)(σ) := ϕ(σ) + ψ(σ)

and scalar multiplication as

(λ · ϕ)(σ) := λ · ϕ(σ)

for all ϕ, ψ ∈ Ck(K), all k-simplices σ ∈ K, and all λ ∈ F. We call an element of Ck(K) a k-chain.

Proposition 3.2.1. The family of characteristic functions {ϕσ}σ∈K forms a basis for Ck(K).

Proof. Let ϕ ∈ Ck(K) and set λσ = ϕ(σ) for each σ ∈ K. It is clear that we can rewrite ϕ = Σ_{σ∈K} λσ ϕσ. Hence, {ϕσ} spans Ck(K).

Next, suppose Σ_{σ∈K} λσ ϕσ = 0. For any σ′ ∈ K, we observe that 0F = Σ_{σ∈K} λσ ϕσ(σ′) = λσ′, since ϕσ(σ′) = 0F for any σ ≠ σ′. Since the choice of σ′ is arbitrary, we have λσ = 0F for all σ. Hence, {ϕσ} is linearly independent.

The boundary map ∂k : Ck(K) → Ck−1(K) is defined on each k-simplex σ := [v0, ..., vk] by

∂k(ϕσ) = Σ_{i=0}^{k} (−1F)^i ϕ_[v0, ..., v̂i, ..., vk],

where [v0, ..., v̂i, ..., vk] is the (k − 1)-simplex with the vertex vi omitted; ∂k is then extended to all elements of Ck(K) by linearity. We note that in F₂ we have −1F = 1F. Thus, the boundary map of any chain complex over F₂ can be written as

∂k(ϕσ) = Σ_{i=0}^{k} ϕ_[v0, ..., v̂i, ..., vk].

Proposition 3.2.2. ∂k ∘ ∂k+1(ϕσ) = 0F for every (k + 1)-simplex σ when F = F₂.

Proof. Let σ = [v0, ..., vk+1]. Then

∂k ∘ ∂k+1(ϕσ) = Σ_{j=0}^{k+1} ∂k(ϕ_[v0, ..., v̂j, ..., vk+1]) = Σ_{i ≠ j} ϕ_[v0, ..., v̂i, ..., v̂j, ..., vk+1].

Note that [v0, ..., v̂i, ..., v̂j, ..., vk+1] = [v0, ..., v̂j, ..., v̂i, ..., vk+1]. Hence, for each pair i ≠ j, the corresponding simplex appears exactly twice in the sum. Since 2F = 0F in F₂, we conclude that ∂k ∘ ∂k+1(ϕσ) = 0F.
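This cancellation can be verified mechanically. The sketch below (representing a simplex as a sorted tuple of vertices; names are ours) applies the F₂ boundary twice and checks that every coefficient cancels mod 2:

```python
def boundary(simplex):
    """Faces of a k-simplex (a sorted tuple of vertices): drop one vertex."""
    return [simplex[:i] + simplex[i + 1:] for i in range(len(simplex))]

def boundary_of_boundary(simplex):
    """Apply the F2 boundary twice, collecting (k-1)-faces with mod-2
    coefficients. Each such face arises from exactly two faces of the
    original simplex, so every coefficient cancels and the result is empty."""
    counts = {}
    for face in boundary(simplex):
        for sub in boundary(face):
            counts[sub] = (counts.get(sub, 0) + 1) % 2  # F2 arithmetic
    return {s for s, c in counts.items() if c == 1}
```

For the triangle [0, 1, 2], each vertex appears in exactly two edges of the boundary, so the double boundary is empty, as the proposition asserts.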

A homology vector space is defined on a simplicial chain complex. Let

Zk := ker ∂k = {σ ∈ Ck(K) | ∂k(σ) = 0}

and

Bk := im ∂k+1 = {τ ∈ Ck(K) | ∃σ ∈ Ck+1(K) s.t. ∂k+1(σ) = τ}.

A consequence of Proposition 3.2.2 is that Bk ⊂ Zk. Then the kth homology vector space is given by

Hk := Zk/Bk.

Note that Hk captures the "holes" (or cycles) in dimension k that are not filled by any higher-dimensional chains.

The dimension of a vector space is an important invariant in linear algebra. Accordingly, the kth Betti number βk is defined to be the vector space dimension of Hk. Due to the construction of Hk, the kth Betti number captures the number of independent k-dimensional cycles of the simplicial complex K.
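As a sketch of how Betti numbers can be computed in practice (over F₂, with our own naive Gaussian elimination; real software such as JavaPlex or Ripser is far more efficient), note that βk = dim Zk − dim Bk = (nk − rank ∂k) − rank ∂k+1, where nk is the number of k-simplices:

```python
def gf2_rank(rows):
    """Rank over F2 of a matrix whose rows are given as integer bitmasks."""
    basis = {}                      # leading-bit position -> reduced row
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in basis:
                basis[lead] = r
                break
            r ^= basis[lead]
    return len(basis)

def betti_numbers(simplices, max_dim):
    """Betti numbers beta_k = dim ker d_k - dim im d_{k+1} over F2.
    simplices: dict mapping k to the list of k-simplices (sorted tuples)."""
    ranks = {}
    for k in range(1, max_dim + 1):
        faces = {s: i for i, s in enumerate(simplices.get(k - 1, []))}
        rows = []
        for s in simplices.get(k, []):
            mask = 0
            for i in range(len(s)):           # F2 boundary: sum of faces
                mask |= 1 << faces[s[:i] + s[i + 1:]]
            rows.append(mask)
        ranks[k] = gf2_rank(rows)             # rank of d_k
    betti = []
    for k in range(max_dim + 1):
        n_k = len(simplices.get(k, []))
        dim_Zk = n_k - ranks.get(k, 0)        # d_0 = 0, so rank 0 in dim 0
        dim_Bk = ranks.get(k + 1, 0)
        betti.append(dim_Zk - dim_Bk)
    return betti

# Hollow triangle (a simplicial circle): one component and one 1-cycle.
circle = {0: [(0,), (1,), (2,)], 1: [(0, 1), (0, 2), (1, 2)]}
```

The hollow triangle gives β₀ = 1 and β₁ = 1, matching Figure 3.1's circle; filling in the 2-simplex kills the 1-cycle.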

23 Figure 3.1: Figure from [95]. It shows the Betti numbers βk for different shapes.

3.3 Functoriality of Hk

The fact that Hk is functorial means that simplicial maps between simplicial complexes K and L are sent by Hk to linear maps between Hk(K) and Hk(L).

Let K and L be two abstract simplicial complexes. A simplicial map f : K → L is a map that sends each vertex of K to a vertex of L and is extended to simplices as follows: for all k ≥ 0 and every k-simplex σ = [v0, ..., vk], f([v0, ..., vk]) := [f(v0), ..., f(vk)]. A simplicial map need not be injective on vertices, and hence it may map a k-simplex to a lower-dimensional simplex. A simplicial map is a simplicial isomorphism if it is bijective and if, for all σ ∈ L, f^{-1}(σ) ∈ K.

Let K,L be two simplicial complexes and f be a simplicial map between them. Let

Ck(K) and Ck(L) be the simplicial k-chains of K and L over F, respectively. Then f induces a well-defined linear map Ck(f) : Ck(K) → Ck(L) given on basis elements in the following way:

Ck(f)(ϕσ) := ϕ_{f(σ)} if ϕ_{f(σ)} ∈ Ck(L), and Ck(f)(ϕσ) := 0 otherwise

(the second case occurs when f(σ) has dimension less than k). This map is extended linearly in the following way: for any ϕ ∈ Ck(K) and any k-simplex τ of L,

Ck(f)(ϕ)(τ) := Σ_{s : f(s) = τ} ϕ(s).

Proposition 3.3.1. [69] The diagram

          ∂k
  Ck(K) ------> Ck−1(K)
    |              |
    | Ck(f)        | Ck−1(f)
    v              v
  Ck(L) ------> Ck−1(L)
          ∂k

commutes, i.e. ∂k ◦ Ck(f) = Ck−1(f) ◦ ∂k.

Proof. Let σ = [v0, ··· , vk] be a k-simplex in K. Then

Ck−1(f) ◦ ∂k(ϕσ) = Ck−1(f)(Σ_{i=0}^{k} ϕ_{[v0,···,v̂i,···,vk]}) = Σ_{i=0}^{k} ϕ_{f([v0,···,v̂i,···,vk])}.

On the other hand, we have

∂k ◦ Ck(f)(ϕσ) = ∂k(ϕ_{f(σ)}) = Σ_{i=0}^{k} ϕ_{[f(v0),···,f̂(vi),···,f(vk)]} = Σ_{i=0}^{k} ϕ_{f([v0,···,v̂i,···,vk])}.

Proposition 3.3.2. [69] Ck(f) induces a well defined map Hk(f): Hk(K) → Hk(L).

Proof. We first claim that Ck(f) sends Zk(K) into Zk(L) and Bk(K) into Bk(L).

Let ϕσ ∈ Zk(K). Then ∂k ◦ Ck(f)(ϕσ) = Ck−1(f) ◦ ∂k(ϕσ) = Ck−1(f)(0) = 0 by the above proposition. Hence we have Ck(f)(ϕσ) ∈ Zk(L).

Let ϕσ ∈ Bk(K). Then there is a ϕσ′ ∈ Ck+1(K) such that ∂k+1(ϕσ′) = ϕσ. Let Ck(f)(ϕσ) = ϕτ for some ϕτ. Then ϕτ = Ck(f) ◦ ∂k+1(ϕσ′) = ∂k+1 ◦ Ck+1(f)(ϕσ′). Hence, ϕτ ∈ Bk(L).

Then Hk(f) : Zk(K)/Bk(K) → Zk(L)/Bk(L) is defined by Hk(f)([ϕσ]) := [Ck(f)(ϕσ)] for every equivalence class [ϕσ] ∈ Hk(K). We claim that Hk(f) is well defined.

Let ϕσ, ϕσ′ ∈ Zk(K) and suppose [ϕσ] = [ϕσ′], i.e. ϕσ − ϕσ′ ∈ Bk(K). Since Ck(f) sends Bk(K) into Bk(L), we have Ck(f)(ϕσ) − Ck(f)(ϕσ′) = Ck(f)(ϕσ − ϕσ′) ∈ Bk(L). Hence, [Ck(f)(ϕσ)] = [Ck(f)(ϕσ′)].

3.4 Persistent homology

In many cases, we are interested in the topology of a set of points that are sampled

from a shape, e.g. a circle. We would expect the set of sample points to possess the same

topology as the original space that they are sampled from. However, the input data set

may not possess a structure from which one can construct a simplicial complex. Thus,

simplicial homology and the Betti numbers are not available. Persistent homology provides

a solution to this problem by constructing an increasing sequence of simplicial complexes

called a filtered simplicial complex

K0 ↪ K1 ↪ ··· ↪ Kn

on which we can apply the homology functor and obtain a sequence of vector spaces.

The construction of a filtered simplicial complex is done by collecting the sublevel sets of a filtration function. Let X be a finite set and let K be a subcollection of the powerset of X such that for any σ ∈ K and any τ ⊂ σ, we have τ ∈ K. A filtration function is a map f : K → R such that f(τ) ≤ f(σ) for all τ ⊆ σ. Let Kr denote the sublevel set of f at height r, i.e.

Kr := f^{−1}((−∞, r]) = {σ ∈ K | f(σ) ≤ r}.

Note that for all r ≤ r′, we have Kr ⊆ Kr′. A filtered simplicial complex is a collection of simplicial complexes {Kr}_{r∈R+} such that Kr ⊆ Kr′ for all r ≤ r′. By applying the kth homology functor to a filtered simplicial complex, we obtain a collection of homology vector spaces {Hk(Kr)}_r.

As r increases, we obtain an increasing sequence of simplicial complexes. A filtration

function gives instructions on how to build such an increasing sequence of simplicial complexes.

By using different filtration functions, we may detect different features in a given data set.

Then the kth homology functor tracks how k-cycles appear and disappear as r increases in

the filtered simplicial complexes.

A widely used example is the Vietoris-Rips (or Rips) complex. Let (X, dX) be a finite metric space. Fix a non-negative number r ∈ R+. The Vietoris-Rips complex VR(X, r) is the collection of abstract simplices σ ⊆ X such that dX(x, x′) ≤ r for all vertices x, x′ ∈ σ.

Figure 3.2 shows an example of a Vietoris-Rips complex. Starting from a set of points, we

inductively add higher dimensional simplices into the Vietoris-Rips complex. Let K be the

powerset of X. The filtration function f : K → R used to construct the Vietoris-Rips complex is defined as follows:

• f([x]) = 0 for all x ∈ X

• f([x, x′]) = dX(x, x′) for all x, x′ ∈ X

• f([x, x′, x″]) = max {dX(x, x′), dX(x, x″), dX(x′, x″)} for all x, x′, x″ ∈ X

and so on for higher-dimensional simplices.

Due to the construction of a Vietoris-Rips complex, we have VR(X, r) ⊆ VR(X, r′) for any r, r′ ∈ R such that r ≤ r′. Thus, we obtain a filtered simplicial complex {VR(X, r)}_{r∈R≥0} and a collection of homology vector spaces {Hk(VR(X, r))}_{r∈R≥0}.
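The Vietoris-Rips filtration value of a simplex is simply the diameter of its vertex set. A small illustrative sketch (the function name `rips_filtration` is ours, not from the thesis):

```python
from itertools import combinations

def rips_filtration(points, dist, max_dim=2):
    """All simplices on `points` up to max_dim, each paired with its
    Vietoris-Rips filtration value: the largest pairwise distance
    (diameter) among its vertices. Vertices enter at value 0."""
    simplices = []
    for k in range(max_dim + 1):
        for sigma in combinations(range(len(points)), k + 1):
            value = max((dist(points[i], points[j])
                         for i, j in combinations(sigma, 2)), default=0.0)
            simplices.append((sigma, value))
    # Sort by filtration value, breaking ties by dimension, so that every
    # face appears no later than the simplices it bounds.
    return sorted(simplices, key=lambda sv: (sv[1], len(sv[0])))

# Three points on a line with the usual metric.
pts = [0.0, 1.0, 3.0]
for sigma, value in rips_filtration(pts, lambda a, b: abs(a - b)):
    print(sigma, value)
```

The edge {0, 2} and the triangle {0, 1, 2} both enter at value 3, the diameter of the whole set, matching the max-of-pairwise-distances rule above.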

We observe that, due to functoriality, in any filtered simplicial complex {Kr}r with r ≤ r′, the inclusion Kr ↪ Kr′ induces a linear map Hk(Kr) → Hk(Kr′). This observation motivates a general definition of a collection of homology vector spaces called a persistence vector space.

A persistence vector space over a field F is a collection of vector spaces {Vr}_{r∈R≥0}, together with a family of linear maps {L_{r,r′} : Vr → Vr′}_{r≤r′} such that L_{r,r″} = L_{r′,r″} ◦ L_{r,r′} for all r ≤ r′ ≤ r″.


Figure 3.2: Example of a Vietoris-Rips complex on a set of three points. Figure 3.2a shows the original set. Figures 3.2b and 3.2c show all the simplices in Kr as r increases.

We can further generalize the construction of the persistence vector spaces described in

the beginning of the section. Let X be a set and f : X → R+. Let Xr denote the sublevel set of f at r. The free vector space generated by (X, f) is the persistence vector space

{VF (X, f)r}r where VF (X, f)r := VF (Xr) is the free vector space generated by Xr for each

r ∈ R+. Since Xr ⊆ Xr′ for all r ≤ r′, we can define the linear maps

L^{VF(X,f)}_{r,r′}(ϕx) := ϕx

for all basis elements ϕx ∈ VF(X, f)r.

We call a persistence vector space free if it can be expressed as {VF(X, f)r}r for some (X, f). We observe that the persistence vector spaces obtained through building a Vietoris-Rips complex are free. A free persistence vector space is finitely generated if there exists a pair (X, f) with X finite.

3.5 Persistence diagrams

Let {Vr}r and {Wr}r be two persistence vector spaces. A linear transformation f from {Vr} to {Wr} is a family of linear transformations fr : Vr → Wr such that for all 0 ≤ r ≤ r′, all diagrams of the following form commute:

        L^V_{r,r′}
   Vr ----------> Vr′
   |               |
   | fr            | fr′
   v               v
   Wr ----------> Wr′
        L^W_{r,r′}

Let V = {Vr} and W = {Wr}. The direct sum of persistence vector spaces V ⊕ W is defined as the persistence vector space

(V ⊕ W)r := Vr ⊕ Wr

together with the linear maps

L^{V⊕W}_{r,r′} : Vr ⊕ Wr → Vr′ ⊕ Wr′

given by

L^{V⊕W}_{r,r′}(v, w) := (L^V_{r,r′}(v), L^W_{r,r′}(w)).

Recall from linear algebra that a finite-dimensional vector space V over a field F is isomorphic to ⊕_{i=1}^{n} F, where n = dim(V). We can decompose some persistence vector spaces in a similar fashion. A persistence vector space is finitely presented if it is isomorphic to a persistence vector space of the form {Wr/im(f)r}r for some linear transformation f : {Vr} → {Wr} between finitely generated free persistence vector spaces {Vr} and {Wr} [16].

First we denote R+ ∪ {+∞} by R̄+. Let a ∈ R+, b ∈ R̄+ and a < b. Let P(a, b) := {P(a, b)r}r be the persistence vector space over F where

P(a, b)r := F if r ∈ [a, b), and P(a, b)r := 0 otherwise,

together with the collection of linear maps {L_{r,r′}} given by

L_{r,r′} := idF if r, r′ ∈ [a, b), and L_{r,r′} := 0 otherwise.

We call such a P(a, b) an interval module. We can also visualize an interval module in the following way:

P(a, b) = ··· → 0 → 0 → F →idF ··· idF→ F → 0 → 0 → ···,

where the copies of F sit over the parameters r with a ≤ r < b, and the zero spaces over r < a and r ≥ b.

Theorem 3.1. Any finitely presented persistence vector space is isomorphic to a finite direct sum of the form

P(a1, b1) ⊕ P(a2, b2) ⊕ ··· ⊕ P(an, bn)

for some ai ∈ R+, bi ∈ R̄+, and ai < bi for all i. Moreover, the direct sum is unique up to reordering.

We refer our readers to [16] for a complete proof of this theorem. This is also known

as the classification theorem for finitely presented persistence vector spaces. We call ai the

“birth” time and bi the “death” time.

First, we notice that a persistent homology vector space is also a persistence vector space, with the linear maps induced by the inclusions. From Theorem 3.1, we see that a finitely presented persistence vector space can be represented by a set of the form

{(ai, bi) | ai ∈ R+, bi ∈ R̄+, and ai < bi}

with multiplicities. Notice that any diagonal point (a, a) means that the cycle dies at the

time it was born. Hence, these diagonal points are trivial. We give all diagonal points infinite

Figure 3.3: An example of persistence diagrams obtained by applying the Vietoris-Rips filtration to a point cloud sampled from a circle. On the left is the sampled circle with radius 0.5 centered at (0.5, 0.5). The middle figure shows the persistence diagram in the 0th dimension and the figure on the right shows the persistence diagram in the 1st dimension. We observe one point far from the diagonal in the 1st dimension, corresponding to the loop of the circle.

multiplicity for the convenience of computing the bottleneck distance, which we will discuss in Section 3.6.

Utilizing Theorem 3.1 and the convention above, we can represent a finitely presented persistence vector space as a collection of intervals (called a persistence barcode), or as a collection of points in R2 (called a persistence diagram). Figure 3.3 shows an example of the persistence diagrams of a sampled circle in dimension 0 and dimension 1; Figure 3.6 shows an example of a persistence barcode.

3.5.1 The four point example

We will present a simple example of computing the persistence diagram of the set of four points shown in Figure 3.4.

Let us construct a simple filtration function in the following way:

• At filtration value t = 0, all the vertices are introduced in the simplicial complex

• The filtration value of an edge in K is the weight on the edge

• The filtration value of a triangle is given by the maximum filtration value of the edges

forming the triangle.

Figure 3.5 shows the changes in the filtered simplicial complex Kt as the filtration value t increases. We observe that there are four connected components in Kt for t < 1, and that a cycle in H1 is present when 1 ≤ t < √2. To show the multiplicity of the points, we will visualize the persistent homology in H0 and H1 through the persistence barcodes shown in

Figure 3.6.
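The H0 intervals of this example can be recovered with a union-find computation: we sort the edges by filtration value and record a death each time two components merge. The helper below is our own sketch, not the thesis' code:

```python
def h0_barcode(n_vertices, edges):
    """H0 persistence intervals of a filtration in which every vertex is
    born at 0 and each weighted edge (t, u, v) enters at value t;
    at every merge one component dies at that value."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    bars = []
    for t, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv           # merge: one component dies at t
            bars.append((0.0, t))
    bars.append((0.0, float('inf')))  # one component lives forever
    return bars

# Four-point example: square a, b, c, d with sides 1 and diagonals sqrt(2).
from math import sqrt
edges = [(1.0, 0, 1), (1.0, 1, 2), (1.0, 2, 3), (1.0, 3, 0),
         (sqrt(2), 0, 2), (sqrt(2), 1, 3)]
print(h0_barcode(4, edges))  # three bars [0, 1) and one bar [0, inf)
```

This reproduces the H0 part of Figure 3.6: four components at t < 1, of which three die when the weight-1 edges appear.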

Figure 3.4: Example of a simplicial complex on vertices a, b, c, d, with edge weights 1 on the four sides and √2 on the two diagonals. The filtration function is defined in the following way: all vertices enter at filtration value 0; the value of the filtration function at an edge is given by the weight of the edge; once three edges form a triangle, we add a face to the interior of the triangle.

Figure 3.5: Filtered simplicial complex Kt at filtration value t, for (a) 0 < t < 1, (b) 1 ≤ t < √2, and (c) t ≥ √2. Note that a cycle in H1 appears when t = 1 and is "killed" by the new triangles that appear when t = √2.

Figure 3.6: Persistence barcodes in H0 and H1 of the filtered simplicial complex.

3.6 Bottleneck distance

Given two persistence diagrams, it is useful to be able to compare similarity/dissimilarity

between them. We can view persistence diagrams as collections of points. If the two persistence diagrams have the same number of points, then by finding the "best" bijection between the points, we will know how different the two persistence diagrams are by measuring the pairwise distances between matched pairs of points.

However, the number of non-diagonal points in two persistence diagrams need not be

the same. Hence, a bijection between non-diagonal points may not exist. Thus, we add

diagonal points to a persistence diagram so that all persistence diagrams have the same

cardinality. Then we need to find the “best” way to add diagonal points so that we obtain

the “optimal” matching between the persistence diagrams.

For any pair of persistence diagrams DX (f) and DY (f), the ∞-bottleneck distance is

defined as

dB,∞(DX(f), DY(f)) := inf_γ sup_{x∈DX(f)} ‖x − γ(x)‖∞ (3.2)

where γ : DX(f) → DY(f) ranges over bijections. The p-bottleneck distance on persistence diagrams is defined as

dB,p(DX(f), DY(f)) := inf_γ ( Σ_{x∈DX(f)} ‖x − γ(x)‖∞^p )^{1/p}. (3.3)
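For very small diagrams, definition (3.2) can be evaluated directly by padding each diagram with the diagonal projections of the other and enumerating all bijections. This brute-force sketch (our own illustration, exponential in the number of points) is only meant to make the definition concrete:

```python
from itertools import permutations

def bottleneck(dgm_x, dgm_y):
    """Brute-force d_{B,infinity} between two small persistence diagrams
    (lists of (birth, death) points)."""
    proj = lambda p: ((p[0] + p[1]) / 2,) * 2   # nearest diagonal point
    U = list(dgm_x) + [proj(q) for q in dgm_y]
    V = list(dgm_y) + [proj(p) for p in dgm_x]

    def cost(u, v):
        # Matching two diagonal points is free: they "do not exist".
        if u[0] == u[1] and v[0] == v[1]:
            return 0.0
        return max(abs(u[0] - v[0]), abs(u[1] - v[1]))  # l-infinity

    return min(max(cost(u, v) for u, v in zip(U, perm))
               for perm in permutations(V))

# One diagram has an extra short bar; matching it to the diagonal is cheap.
print(bottleneck([(0.0, 4.0)], [(0.5, 4.0), (1.0, 1.5)]))  # 0.5
```

The short bar (1.0, 1.5) is absorbed into the diagonal at cost 0.25, so the distance is driven by matching the two long bars.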

3.7 Interleaving distance

Note that the bottleneck distance defined in section 3.6 is a distance defined on persistence diagrams, rather than on persistence vector spaces. A natural metric on the set of persistence vector spaces is the interleaving distance. It compares two persistence vector spaces via

“shifting” as we will see later in this section. We will see that the interleaving distance coincides with the bottleneck distance in some cases.

Let V = {Vr}_{r∈R} and W = {Ws}_{s∈R} be two persistence vector spaces. We defined earlier that if there exists a family of isomorphisms fr : Vr → Wr for all r compatible with the structure maps, then V ≅ W; i.e., letting fr : Vr → Wr denote the isomorphisms, the following diagram

        L^V_{r,r′}          L^V_{r′,r″}
··· → Vr ----------> Vr′ ----------> Vr″ → ···
       |fr            |fr′            |fr″
       v              v               v
··· → Wr ----------> Wr′ ----------> Wr″ → ···
        L^W_{r,r′}          L^W_{r′,r″}

commutes. From the notion of isomorphism, it is natural to ask whether it is possible to compare two non-isomorphic persistence vector spaces. Let V = P(0, 1) and W = P(0.1, 1). It is obvious that there is no isomorphism between V and W, but intuitively we would think that they are "similar". Hence, we are looking for a metric that can quantify the "similarity" between two persistence vector spaces.

Let ϕ·,ε : V → W be a family of maps {ϕr,ε : Vr → Wr+ε}. We call ϕ·,ε an ε-morphism if for all r ≤ s, the diagram

         L^V_{r,s}
   Vr ----------> Vs
   |               |
   | ϕr,ε          | ϕs,ε
   v               v
  Wr+ε ---------> Ws+ε
        L^W_{r+ε,s+ε}

commutes.

An ε-interleaving between V and W is a pair of ε-morphisms ϕ·,ε : V → W and ψ·,ε : W → V such that for all r ∈ R,

ψr+ε,ε ◦ ϕr,ε = L^V_{r,r+2ε},

and for all s ∈ R,

ϕs+ε,ε ◦ ψs,ε = L^W_{s,s+2ε}.

We call V and W ε-interleaved if there exists an ε-interleaving. Note that V and W are isomorphic if and only if there exists a 0-interleaving between them.

Proposition 3.7.1. If V and W are ε-interleaved, then they are also ε′-interleaved for all ε′ > ε.

Proof. Let {ϕr : Vr → Wr+ε} and {ψs : Ws → Vs+ε} be an ε-interleaving.

Let ϕ̄r := L^W_{r+ε,r+ε′} ◦ ϕr and ψ̄s := L^V_{s+ε,s+ε′} ◦ ψs. We want to show that

1. ϕ̄ and ψ̄ are ε′-morphisms;

2. ϕ̄ and ψ̄ form an ε′-interleaving.

1. Let d = ε′ − ε. First suppose r′ > r + d. Then we have a commuting diagram

Vr → Vr+d → Vr′ → Vr′+d (structure maps L^V_{r,r+d}, L^V_{r+d,r′}, L^V_{r′,r′+d}),
with vertical maps ϕr, ϕr+d, ϕr′, ϕr′+d down to
Wr+ε → Wr+ε′ → Wr′+ε → Wr′+ε′ (structure maps L^W_{r+ε,r+ε′}, L^W_{r+ε′,r′+ε}, L^W_{r′+ε,r′+ε′}).

From it we obtain the commuting square

L^W_{r+ε′,r′+ε′} ◦ ϕ̄r = ϕ̄r′ ◦ L^V_{r,r′}.

Now suppose r < r′ < r + d. Then we have a commuting diagram

Vr → Vr′ → Vr+d → Vr′+d,
with vertical maps ϕr, ϕr′, ϕr+d, ϕr′+d down to
Wr+ε → Wr′+ε → Wr+ε′ → Wr′+ε′,

and we obtain the same commuting square. Therefore, ϕ̄ is an ε′-morphism. A similar argument shows that ψ̄ is also an ε′-morphism.

2. Since (ϕ, ψ) is an ε-interleaving, we have ψr+ε ◦ ϕr = L^V_{r,r+2ε}. Since moreover L^V_{r+2ε,r+2ε+d} ◦ ψr+ε = ψr+ε+d ◦ L^W_{r+ε,r+ε+d}, we have

ψr+ε+d ◦ L^W_{r+ε,r+ε+d} ◦ ϕr = L^V_{r,r+2ε+d}.

Then

L^V_{r+2ε+d,r+2(ε+d)} ◦ ψr+ε+d ◦ L^W_{r+ε,r+ε+d} ◦ ϕr = L^V_{r,r+2(ε+d)},

i.e.

ψ̄r+ε′ ◦ ϕ̄r = L^V_{r,r+2ε′}.

A similar argument shows that

ϕ̄r+ε′ ◦ ψ̄r = L^W_{r,r+2ε′}.

Thus, (ϕ̄, ψ̄) is an ε′-interleaving.

Figure 3.7: Persistence barcode representation of all the cases of the relative positions between I1 and I2. Figure from [71]. An interval module is indicated by a line connecting the left endpoint, the midpoint and the right endpoint.

We define the interleaving distance between V and W as

dI(V, W) := inf {ε > 0 | V and W are ε-interleaved}. (3.4)

Proposition 3.7.2. Let I1 = P (a1, b1) and I2 = P (a2, b2) be two interval modules. Then dI (I1,I2) = dB,∞((a1, b1), (a2, b2)).

One can prove the proposition case by case. Figure 3.7 shows all the different cases of the relative positions of I1 and I2. We will not list all the details here, but rather provide a proof for the cases in the first row of Figure 3.7. One can use similar arguments to prove the other cases.

Proof. Case 1: left case in the first row of Figure 3.7.

From the diagram, we assume that a1 < a2 < b1 < b2. Without loss of generality, suppose ε = max {|a1 − a2|, |b1 − b2|} = |a1 − a2|. Let ε′ < ε, and suppose there is an ε′-interleaving (ϕ, ψ). Let c = a1 + ε′; then c < a2, so I2 vanishes at c and hence ψc,ε′ ◦ ϕa1,ε′ = 0, while L^{I1}_{a1,a1+2ε′} = id. This contradicts the fact that (ϕ, ψ) is an ε′-interleaving. Therefore, dI(I1, I2) ≥ ε.

Now let ϕ and ψ be the two ε-morphisms that are the identity wherever both interval modules are nonzero. It suffices to check the interleaving diagrams at the boundaries. Since a1 < a2, when r = a1 − 2ε the diagrams commute because

L^{I1}_{r,r+2ε} = 0 = ψr+ε,ε ◦ ϕr,ε   and   L^{I2}_{r+ε,a2} = 0 = ϕa1,ε ◦ ψr+ε,ε.

When r1 = b1 − ε and r2 = b2 − ε, the diagrams also commute because

L^{I2}_{r1,r1+2ε} = 0 = ϕb1,ε ◦ ψr1,ε   and   L^{I1}_{r2,r2+2ε} = ψb2,ε ◦ ϕr2,ε.

Hence, dI(I1, I2) = max {|a1 − a2|, |b1 − b2|}.

Now recall that

dB,∞({(a1, b1)}, {(a2, b2)}) = inf_f sup_x ‖x − f(x)‖∞.

We can either map (a1, b1) to (a2, b2), at cost max {|a1 − a2|, |b1 − b2|}, or map both points to their closest diagonal points. The closest diagonal point to (a1, b1) is ((a1 + b1)/2, (a1 + b1)/2) and the closest to (a2, b2) is ((a2 + b2)/2, (a2 + b2)/2), so the second option costs max {(b1 − a1)/2, (b2 − a2)/2}. From Figure 3.7, we see that in this case

max {|a1 − a2|, |b1 − b2|} < max {(b1 − a1)/2, (b2 − a2)/2}.

Hence, we let f((a1, b1)) = (a2, b2). Thus,

dB,∞({(a1, b1)}, {(a2, b2)}) = max {|a1 − a2|, |b1 − b2|} = dI(I1, I2).

Case 2: right case in the first row of Figure 3.7. Let m = (a2 + b2)/2 and ε = (b2 − a2)/2. Suppose ε′ < ε and let (ϕ, ψ) be an ε′-interleaving.

If ε′ > b1 − a2, then ψa2,ε′ maps into I1 at a2 + ε′ > b1, where I1 vanishes, so ϕa2+ε′,ε′ ◦ ψa2,ε′ = 0, while L^{I2}_{a2,a2+2ε′} = id since a2 + 2ε′ < b2. This contradicts the assumption that (ϕ, ψ) is an ε′-interleaving.

If ε′ ≤ b1 − a2, the same contradiction arises at m: since m + ε′ < b2 we have L^{I2}_{m,m+2ε′} = id, while ϕm+ε′,ε′ ◦ ψm,ε′ = 0. Again, this contradicts the assumption that (ϕ, ψ) is an ε′-interleaving.

Hence, the interleaving distance between I1 and I2 is at least ε.

To show that dI(I1, I2) = ε, suppose ϕ and ψ form an ε-interleaving. The diagrams involving ϕa1,ε, ψa1+ε,ε and those involving ψa2,ε, ϕa2+ε,ε commute, since the corresponding maps L^{I1}_{r,r+2ε} and L^{I2}_{s,s+2ε} vanish. Hence, dI(I1, I2) = (b2 − a2)/2.

To compute the bottleneck distance, we again need to construct a bijection; as before, we can either map (a1, b1) to (a2, b2) or map both points to their closest diagonal points. We observe that

max {|a1 − a2|, |b1 − b2|} > max {(b1 − a1)/2, (b2 − a2)/2} = (b2 − a2)/2,

since |b1 − b2| > (b2 − a2)/2 ≥ (b1 − a1)/2. Therefore, we let f((a1, b1)) = ((a1 + b1)/2, (a1 + b1)/2) and f(((a2 + b2)/2, (a2 + b2)/2)) = (a2, b2).

Hence, we have

dB,∞({(a1, b1)}, {(a2, b2)}) = (b2 − a2)/2 = dI(I1, I2).

Theorem 3.5 (The isometry theorem [53]). Let M = ⊕_{i=1}^{m} P(ai, bi) and N = ⊕_{j=1}^{n} P(cj, dj). Then

dB,∞({(ai, bi)}_i, {(cj, dj)}_j) = dI(M, N).
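For one-point diagrams, the two candidate matchings in the proofs above (match the points to each other, or send both to the diagonal) give a closed form that, by Proposition 3.7.2, also computes the interleaving distance of two interval modules. A sketch under that assumption (the function name is ours):

```python
def interval_distance(i1, i2):
    """d_{B,inf} between the one-point diagrams {(a1, b1)} and {(a2, b2)},
    which by Proposition 3.7.2 equals the interleaving distance between
    the interval modules P(a1, b1) and P(a2, b2). Only two matchings can
    be optimal: match the points to each other, or both to the diagonal."""
    (a1, b1), (a2, b2) = i1, i2
    match_points = max(abs(a1 - a2), abs(b1 - b2))
    match_diagonal = max((b1 - a1) / 2, (b2 - a2) / 2)
    return min(match_points, match_diagonal)

# Case 1 of the proof: overlapping intervals of comparable length.
print(interval_distance((0.0, 2.0), (0.5, 2.5)))  # 0.5 (points matched)
# Case 2: a short interval versus a long one far away.
print(interval_distance((0.0, 0.4), (3.0, 5.0)))  # 1.0 (diagonal wins)
```

The first call lands in Case 1 (direct matching is cheaper); the second lands in Case 2, where the answer is the half-length (b2 − a2)/2 of the longer interval.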

3.8 Stability results of persistence diagrams

The stability of persistence diagrams is important when deciding whether a feature of a topological space, or of a function on such a space, is real: a true feature observed in the persistence diagram is expected to be stable with respect to small perturbations.

3.8.1 Stability of Vietoris-Rips filtration

In section 3.4 we introduced the Vietoris-Rips complex on a metric space. Here, we will state the stability theorem of the Vietoris-Rips filtration with respect to the Gromov-

Hausdorff distance.

Let (X, dX) and (Y, dY) be two metric spaces. Let DkR(X) and DkR(Y) be the kth persistence diagrams of the Vietoris-Rips filtrations of X and Y.

Theorem 3.6. [17]

dB(DkR(X), DkR(Y)) ≤ dGH(X, Y) for all natural numbers k.

We refer our readers to [17] for a proof. Theorem 3.6 provides a lower bound for the Gromov-Hausdorff distance, which is hard to compute.

3.8.2 Stability of filtration functions

Let X be a topological space and let f : X → R. A value r ∈ R is called a homological critical value of f if there exist k ∈ Z and arbitrarily small ε > 0 such that the map Hk(f^{−1}((−∞, r − ε])) → Hk(f^{−1}((−∞, r + ε])) induced by inclusion is not an isomorphism. The function f is tame if it has a finite number of homological critical values and dim(Hk(f^{−1}((−∞, t]))) is finite for all k ∈ Z and t ∈ R.

Theorem 3.7. [19] Let X be a triangulable topological space and let f and g be real-valued tame functions on X. Then

dB(Dk(f), Dk(g)) ≤ ‖f − g‖∞.

We refer our readers to [19] for the proof of the stability theorem.

3.9 Computation of bottleneck distance

In the actual computation of bottleneck distance, we use a software called Hera [50].

Hera introduces a fast way of computing bottleneck distance between persistence diagrams

by modifying the observation in [29] on the Hopcroft and Karp algorithm [43, 29]. That

is, Hera aims to find a partial matching between the points in two persistence diagrams by

finding a maximum matching of a bipartite graph. It speeds up the computation in finding

such a maximum matching by utilizing the properties of a k-d tree. We will briefly explain the related concepts and outline the algorithm below.

Given two persistence diagrams, first we need to construct a bipartite graph from them and convert the bottleneck distance into a matching problem. Let X and Y be two persistence diagrams. Recall that we only record off-diagonal points in a persistence diagram and put infinite multiplicity on diagonal points. Hence, we assume that X and Y contain only the off-diagonal points with multiplicities. Let X′ and Y′ be the projections of X and Y onto the diagonal, respectively, i.e. X′ = {((x + y)/2, (x + y)/2) | (x, y) ∈ X} and Y′ = {((x′ + y′)/2, (x′ + y′)/2) | (x′, y′) ∈ Y}. Let U = X ∪ Y′ and V = Y ∪ X′. We call

an undirected graph G a weighted complete bipartite graph if G = (U ⊔ V, U × V) with the weights given by

c(u, v) := ‖u − v‖∞ if u ∈ X or v ∈ Y, and c(u, v) := 0 otherwise.

To see how this weight function relates to a map between persistence diagrams, we note that there are four cases of matching the points:

there are four cases of matching the points:

Case 1: If u ∈ X, v ∈ Y , then we are sending an off-diagonal point in X to an off-diagonal

point in Y . Thus, the cost is ku − vk∞;

Case 2: If u ∈ X and v ∈ X′, then we are sending an off-diagonal point of X to a diagonal point. Hence the cost of matching u = (x, y) to its diagonal projection is (y − x)/2.

Case 3: If u ∈ Y′ and v ∈ Y, then we are sending an off-diagonal point of Y to a diagonal point. Again, the cost of matching v = (x′, y′) to its diagonal projection is (y′ − x′)/2.

Case 4: If u ∈ Y′ and v ∈ X′, then a diagonal point is sent to another diagonal point. However, a diagonal point does not "exist" in a persistence diagram. Hence, the cost is 0.

Note that G contains all possible ways to match the points of X and Y. Moreover, because we do not need to match a diagonal point to another diagonal point, we can remove all the edges with weight 0 together with their vertices.

Recall that in (3.2), when we compute the bottleneck distance, we want to find an optimal matching such that the cost of the most expensive matched pair is minimized. Since G contains all possible ways of matching the points of the two persistence diagrams, the bottleneck distance is given by the weight of some edge in G. We can find the weight of such an edge by searching for the smallest maximum edge weight over all matchings of the points of the persistence diagrams.

Let G[r] be the subgraph of G that contains all the edges with weight at most r. A collection of edges is called a matching if no two edges in the collection share a common vertex. An edge is matched if it belongs to the matching M. We call a vertex of G free if it is unmatched. An augmenting path is a path that starts and ends at free vertices and alternates between unmatched and matched edges. A maximum matching is a matching with the maximum number of edges. Suppose G[r] has 2n vertices; then G[r] has a perfect matching if its maximum matching has n edges. Note that a perfect matching gives a matching of the points of the diagrams. The bottleneck distance is then the minimal value r such that G[r] contains a perfect matching [50]. A maximum matching is hard to find directly but, due to Berge [6], a matching is maximum once no more augmenting paths can be found.

The Hopcroft-Karp algorithm [43] is used to find a maximum matching. Starting with an empty matching M, we find an augmenting path π and update the matching by taking the symmetric difference; that is, the updated matching is M′ = (M − π) ∪ (π − M). Observe that an augmenting path has one more unmatched edge than matched edges. Because the symmetric difference flips the status (matched/unmatched) of every edge on the path, we have |M′| = |M| + 1. We then repeat the process of finding an augmenting path and updating the matching until there is no more augmenting path, at which point we have obtained a maximum matching.

However, updating augmenting paths one by one may take a long time. Hopcroft and Karp proposed a breadth-first search algorithm that finds all shortest augmenting paths (with respect to a matching M) by constructing layers of vertices. Let M be a matching. In the first layer L1, we put all free vertices of U. In layer L2i, we consider all vertices of V that have not appeared in any Lj for j < 2i and that are connected in the underlying graph to some vertex of L2i−1. If L2i contains free vertices, then L2i is the last layer and we have found augmenting paths of length 2i − 1. Otherwise, we construct L2i+1 to contain all vertices of U that are matched to vertices in L2i. We can then add all the augmenting paths found to M by taking symmetric differences and thereby increase the size of the matching.

Efrat et al. observed that we can avoid explicit computation of the layers for a geometric graph G[r] by using a near-neighbor search data structure [29]. They introduced a data structure Dr(S), for some S ⊆ V, with near-neighbor search and deletion operations. More precisely:

• neighborr(Dr(S), q): returns an s ∈ S such that the distance between q and s is at

most r. If no such s exists, returns ∅;

• deleter(Dr(S), s): deletes s from S.

Then a layered graph can be constructed in the following fashion. Let r∗ denote the minimal r such that G[r] contains a perfect matching. Fix an r and consider the graph G[r] with a matching M. Let D = Dr(V). We follow the usual construction for the first layer, i.e. L1 = {u ∈ U | u is free}. Then for any even layer L2i, we iterate over all a ∈ L2i−1. For each a, we repeatedly query neighborr(D, a), putting each returned vertex b in L2i and deleting b from D using deleter(D, b). Note that the deletion of b excludes the possibility of building a loop in the graph. If L2i is empty, then there is no augmenting path. If L2i contains a free vertex of V, then we have found a layered graph that contains an augmenting path; hence, we return all the layers. Otherwise, we build L2i+1 by adding all vertices of U that are matched in M to vertices in L2i. In this way, we partition the vertices of V into even layers. If there is no free vertex of V in the last even layer, we conclude r∗ > r.

After the layers {Li} are built, we construct all augmenting paths using a depth-first search [29]. We start from the free vertices in L1 and build an alternating path. For each vertex a ∈ L2i−1, we choose a b ∈ neighborr(D(L2i), a), add (a, b) to the current path, and advance to b. If no such b exists, then no neighbor of a in L2i remains, so we backtrack from a. If a ∈ L1, we delete a. Otherwise, we take the two vertices a− and b− preceding a in the path, remove a and b− from the path, and continue from a−. If all vertices are matched at the end, then we have obtained a perfect matching; thus r∗ ≤ r. Otherwise, we conclude r∗ > r.

To find the optimal r∗, we can perform a binary search on the magnitude of r. Let n = |U| = |V|. Recall that r∗ must be the weight of an edge of G, which is the ℓ∞-distance between two points. Hence, we only need to search among n^2 weights. Given all these n^2 weights, we can sort them in increasing order and perform a binary search on these weights to find the optimal r∗ using the constructions described above.
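The overall scheme (pad with diagonal projections, test whether G[r] has a perfect matching, binary-search the sorted candidate edge weights) can be sketched as follows. This is our own simplified illustration: the feasibility test is a plain augmenting-path matching, not Hera's k-d-tree-accelerated Hopcroft-Karp:

```python
from itertools import product

def bottleneck_via_search(dgm_x, dgm_y):
    """Binary search for the smallest r such that G[r] has a perfect
    matching, following the outline in the text."""
    proj = lambda p: ((p[0] + p[1]) / 2,) * 2
    U = list(dgm_x) + [proj(q) for q in dgm_y]  # off-diagonal + projections
    V = list(dgm_y) + [proj(p) for p in dgm_x]

    def cost(u, v):
        if u[0] == u[1] and v[0] == v[1]:
            return 0.0  # diagonal-to-diagonal edges are free
        return max(abs(u[0] - v[0]), abs(u[1] - v[1]))

    def has_perfect_matching(r):
        match = [None] * len(V)
        def augment(i, seen):
            for j in range(len(V)):
                if cost(U[i], V[j]) <= r and j not in seen:
                    seen.add(j)
                    if match[j] is None or augment(match[j], seen):
                        match[j] = i
                        return True
            return False
        return all(augment(i, set()) for i in range(len(U)))

    # r* must be one of the n^2 edge weights; sort and binary-search them.
    weights = sorted({cost(u, v) for u, v in product(U, V)})
    lo, hi = 0, len(weights) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if has_perfect_matching(weights[mid]):
            hi = mid
        else:
            lo = mid + 1
    return weights[lo]

print(bottleneck_via_search([(0.0, 4.0)], [(0.5, 4.0), (1.0, 1.5)]))  # 0.5
```

Feasibility of G[r] is monotone in r, which is what makes the binary search over the sorted weights valid.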

Kerber et al. followed the construction suggested by Efrat et al. but they used k-d trees

to simplify the near-neighbor search. A k-d tree [5] is a binary tree that partitions points in

a k-dimensional Euclidean space. Given a set of points, we first split the set at the median

value of the first coordinate into two subsets. The split point is added as a node to the binary

tree. Then we recursively split the two halves in the next dimension, and collect the medians

Figure 3.8: Figure from [68]. An example of a k-d tree.

as nodes in the tree. If there are no more dimensions to split on, we simply restart from the first dimension. We repeat the construction until the subset to be split contains only one element; these singletons are the leaf nodes of the tree. Note that for each subtree, we can associate the root of the subtree with a bounding box in the original space. In the end, we obtain a balanced binary tree. Figure 3.8 shows an example of a k-d tree. One can follow the labels of the points to see how the splitting was performed.
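A minimal sketch of this construction for points in the plane follows; `build_kdtree` and `neighbor_r` are our own illustrative names, and the query uses the ℓ∞-distance that is natural for persistence diagrams:

```python
def build_kdtree(points, depth=0):
    """Build a 2-d k-d tree: split at the median of the current coordinate,
    alternating coordinates by depth. Returns (point, left, right) or None."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def neighbor_r(node, q, r, depth=0):
    """Return some point within l-infinity distance r of q, or None,
    pruning subtrees whose slab cannot contain such a point."""
    if node is None:
        return None
    point, left, right = node
    if max(abs(point[0] - q[0]), abs(point[1] - q[1])) <= r:
        return point
    axis = depth % 2
    near, far = (left, right) if q[axis] < point[axis] else (right, left)
    found = neighbor_r(near, q, r, depth + 1)
    if found is None and abs(q[axis] - point[axis]) <= r:
        found = neighbor_r(far, q, r, depth + 1)  # far slab may still qualify
    return found

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
print(neighbor_r(tree, (9, 2), 1.5))  # finds (8, 1)
print(neighbor_r(tree, (0, 0), 1.0))  # None: nothing that close
```

The pruning step is exactly the bounding-box test described below: a subtree is skipped when its splitting slab lies farther from q than r.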

We can traverse the tree to perform the near-neighbor search neighborr(Dr(S), q) on a

k-d tree for a query point q and a set of points S. Starting at the root of the binary tree,

we use the root as the current candidate for near neighbor and move down to its children.

If the bounding box of the root of the current subtree is farther from q than the radius r,

then we can remove the subtree from the search.

Using the k-d tree structure, Kerber et al. integrated the Hopcroft-Karp algorithm

and Efrat’s observation to implement a tool to compute the bottleneck distance. Instead

of constructing Dr(S) when building layered graph, they build a k-d tree for S. Then

deleter(Dr(S), s) corresponds to removing s from the k-d tree without rebalancing the tree.

Chapter 4: Experiments

4.1 Summary of data

We use the teeth data from [11] to test the performance of our implementations. The data set consists of 116 triangulated surface scans of mandibular second molars which belong to either primates or non-primate close relatives. The mandibular second molar is a tooth used for grinding; it is located between the first and the third molars (the third molar is more commonly referred to as a "wisdom tooth").

Each tooth in the data set is composed of about 5,000 vertices and more than 10,000 triangles constructing a mesh of the surface of the crown. Figure 1.1 in Chapter 1 shows an example of the surface of the crown of a tooth in the data set colored by its mean curvature.

We will classify the teeth following three different schemes: family (biology), genus (biology), and diet. There are 24 families, 37 genera and 4 diets found in the data set. We provide a table of the classes in each category, with counts, in Table 4.1. Labels are mostly obtained from the supplementary materials of [11]. Only 74 specimens have dietary labels in [11]; we searched for and added dietary labels for another 12 specimens. We also label the class Incertae sedis, which means "of uncertain placement", as "NA". Any specimen with an "NA" label is excluded when testing the quality of classification.

Unlike family and genus, which are widely used in the taxonomic classification of organisms, using dietary categories to classify organisms may be foreign to some readers. However, such a choice is rather intuitive, since our data are teeth and the shape of a tooth is expected to be correlated with the particular diet of the animal.

Qualitative analysis on dietary preferences of fossil taxa suggests correlations between diets and shapes of the crowns [40, 85, 37, 57]. As summarized in [10], biologists associate lower-crowned fossil teeth with diets that involve crushing “brittle” food, e.g. fruits and nuts. Teeth with long blades are good for cracking leaves or some insects. To crush harder insects, like beetles, one may need taller cusps [48, 83, 31]. Major dietary preferences include: frugivore (fruits), folivore (leaves), insectivore (insects) and omnivore (all). Figure 4.1

Family class      Count    Genus class      Count    Diet class      Count
(unreadable)      4        adapis           4        folivorous      28
carpolestidae     1        altanius         2        frugivorous     9
cercamonaiidae    3        arctocebus       4        insectivorous   30
cheirogaleiidae   11       avahi            3        na              37
chronolestidae    1        (unreadable)     4        omnivorous      12
cynocephalidae    4        cheirogaleus     3
eosimiidae        4        chronolestes     1
galagidae         9        cynocephalus     4
indridae          9        donrussellia     3
lemuridae         14       elphidotarsius   1
lepilemuridae     4        eosimias         4
lorisidae         12       eulemur          3
megaladapidae     1        galago           9
na                2        hapalemur        2
(unreadable)      4        indri            1
nyctitheriidae    3        lemur            4
(unreadable)      8        lepilemur        4
palaechthonidae   1        leptacodon       3
paromomyidae      1        loris            3
pitlocercidae     2        megaladapis      1
plesiadapidae     4        microcebus       4
purgatoriidae     4        mirza            2
saxonellidae      1        nycticebus       3
tarsiidae         5        paromomys        1
tupaiidae         4        perodicticus     2
                           phaner           2
                           plesiolestes     1
                           prolemur         1
                           pronothodectes   4
                           propithecus      5
                           ptilocercus      2
                           purgatorius      4
                           saxonella        1
                           tarsius          5
                           teilhardina      8
                           tupaia           4
                           varecia          4

(Entries marked "(unreadable)" could not be recovered from the source.)

Table 4.1: Statistics of families, genera and diets of the teeth in the data set.

Figure 4.1: Figure from [10] showing crowns of teeth from animals with different diets.

shows a side-by-side comparison between molars from primates with different diets. The cusps and valleys in the teeth become deeper as we scan the figure from left to right. These cusps and valleys can be characterized by curvature in differential geometry. Therefore, the curvature distributions on shapes may distinguish teeth with different diets.

However, the effectiveness of characterizing teeth by these dietary categories is contro- versial. The dietary categories (frugivore, folivore, insectivore and omnivore) fail to perfectly differentiate the teeth since textures and properties of the materials are shared among dif- ferent dietary categories [83]. On the other hand, in some cases, the shape of a tooth corresponds to distinct properties of the materials in different dietary categories [56, 46, 47].

As one may observe in Figure 4.1, the size of teeth varies a lot from species to species. Hence, it is necessary to know the range of sizes of the teeth in our data set. Figure 4.2 shows the histograms of all the diameters (the maximum distance between any two points in a metric space) using Euclidean and geodesic distances, with an outlier removed. We use the graph distance induced by the triangulation on a mesh as the geodesic distance. The outlier tooth belongs to an animal in the extinct genus Megaladapis, also known as the koala lemur, which once inhabited Madagascar. Megaladapis differed from other lemurs, with a body built like that of a modern koala, and is believed to have been on a folivorous diet [72]. The shape of its skull was also unique among primates on a folivorous diet [4]. The difference between the Megaladapis tooth and those of other animals in the data set is also reflected in its size. The tooth that belongs to Megaladapis has a diameter of about 24.6mm (in Euclidean distance) while the second largest diameter is about 0.8mm.

4.2 Overview of experiments

We call each attempt of constructing a distance on the collection of shapes using different

parameters an experiment. Our experiments correspond to two different approaches:

Figure 4.2: Histograms of the diameters of the teeth in Euclidean (Figure 4.2a) and geodesic (Figure 4.2b) distances. The tooth that belongs to Megaladapis has a diameter of about 24.6mm (in Euclidean distance) and 32.7mm (in geodesic distance). It is excluded from both histograms.

OT: Approximate the TLB on the teeth using different metrics and probability measures, with different values of D;

PH: Construct a mean curvature based filtered simplicial complex or a Vietoris-Rips complex; then apply persistent homology and compute the bottleneck distance between the resulting persistence diagrams.

The OT approach. Our experiments using the OT approach depend on the choices of three parameters: the probability measure, the metric (used to form the balls B(x, t) when computing the local distribution (2.10)), and D. Table 4.2 shows all the combinations on which we experiment.

Probability measure   Metric      Normalization   D
Uniform               Euclidean   No              0.01-0.8
Uniform               Euclidean   Yes             1
Uniform               Geodesic    No              0.1-1.2
Uniform               Geodesic    Yes             1
Voronoi               Euclidean   No              0.32
Voronoi               Euclidean   Yes             1
Voronoi               Geodesic    No              0.35
Voronoi               Geodesic    Yes             1

Table 4.2: Parameters that can be tuned in the experiments. The choices of D are not uniformly spaced. The "Normalization" column indicates whether the distance matrix is normalized.

The PH approach. We use either a Vietoris-Rips filtration or a sublevel set filtration from a mean curvature based function to obtain a persistence diagram. Then we choose dB,∞ (3.2) or dB,2 (3.3) between persistence diagrams to construct a bottleneck distance matrix. Table 4.3 lists the filtration functions and bottleneck distances with which we experiment.

Filtration method                                           Metric                            Barcode distance
Vietoris-Rips                                               (Normalized) Euclidean/geodesic   dB,∞
Vietoris-Rips                                               (Normalized) Modified geodesic    dB,∞
(Normalized) filtration functions based on mean curvature   N/A                               dB,∞/dB,2

Table 4.3: All the combinations of parameters that we tried using the PH approach. For each method, we consider both the normalized and the unnormalized versions.

4.2.1 Quantitative measure of quality of classification

Each experiment yields a distance matrix. To visualize the result, we use dendrograms of the clustering results obtained with single linkage hierarchical clustering. Such a dendrogram reflects the proximity between each pair of clusters independently of the ordering on the data.

Figure 1.2 shows an example of a dendrogram. We will also show the dendrograms that produce the lowest Pe for dietary classification in later sections. We remind our readers that the tooth that belongs to Megaladapis is removed from all the dendrograms shown in this thesis.

We will outline a bottom-up single linkage hierarchical clustering algorithm below; that is, it builds a binary tree from the leaf nodes up to the root. We refer our readers to [59] for details. The algorithm starts with a collection of singletons and forms a new partition of the data set by merging the two closest clusters at a time, until there are no clusters left to be merged. Eventually, at the root of the tree, the entire data set is in a single cluster. Single linkage is a rule for updating the distance between two clusters: at each step, for each pair of clusters, we compute all the pairwise distances between the elements of the two clusters, and the proximity between the clusters is given by the minimal such distance.
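The merge procedure just described can be sketched in a few lines. The following naive, quadratic-time pure-Python illustration (function name and output format are our own) is not the implementation used for the experiments:

```python
# Bottom-up single-linkage clustering on a symmetric distance matrix
# given as nested lists. At each step the two clusters at minimal
# single-linkage (i.e. minimum pairwise) distance are merged.
def single_linkage(dist):
    """Return the list of merges as (cluster_a, cluster_b, height)."""
    n = len(dist)
    clusters = [{i} for i in range(n)]   # start from singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single-linkage proximity: minimal pairwise distance
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]
    return merges
```

The returned merge heights are exactly the heights at which branches join in the dendrogram.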

Although a dendrogram is capable of showing clusters in a distance matrix, a quantitative measurement of the quality of the classification result is still needed for a rigorous comparison between the methods. An intuitive measurement of the quality is the probability of error (Pe) in leave-one-out classification. Leave-one-out is a supervised classification algorithm: for each item in the data set, we predict its label by the label of its nearest neighbor. By repeating the prediction for each item in the data set, we obtain a collection of predicted labels. The Pe is the fraction of items whose predicted labels differ from their true labels.
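The leave-one-out computation of Pe can be illustrated directly on a distance matrix. The sketch below, with names of our own choosing, also excludes items labeled "na" as described in Section 4.1:

```python
# Leave-one-out nearest-neighbor error rate from a precomputed
# distance matrix. Items labeled "na" are excluded both as queries
# and as candidate neighbors.
def leave_one_out_pe(dist, labels):
    errors = 0
    total = 0
    for i, label in enumerate(labels):
        if label == "na":          # unlabeled specimens are excluded
            continue
        # nearest labeled neighbor other than the item itself
        j = min((k for k in range(len(labels))
                 if k != i and labels[k] != "na"),
                key=lambda k: dist[i][k])
        total += 1
        if labels[j] != label:
            errors += 1
    return errors / total
```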

4.3 The OT approach

4.3.1 Outline of the OT approach

We recall (2.9) from Chapter 2 which states that the TLB between two metric-measure spaces X = (X, dX , α) and Y = (Y, dY , β) is defined as

\[
\mathrm{TLB}_p(\mathcal{X}, \mathcal{Y}) := \inf_{\mu \in \mathcal{M}(\alpha, \beta)} \int_{X \times Y} \left( \int_0^D \left| F_X^{-1}(x, t) - F_Y^{-1}(y, t) \right|^p \, dt \right)^{1/p} \mu(dx, dy),
\]

where D controls the size of the neighborhood of interest and

\[
F_X(x, r) := \alpha(B(x, r))
\]

is the local distribution of \(\mathcal{X}\) defined in (2.10).

Our implementation of the method that computes the TLB can be broken up into three parts:

1. subsampling vertices and computing a pairwise distance matrix;

2. computing local distribution at each sample point;

3. solving the optimal transport problem in (2.9) and obtaining the TLB.

In part 1, we first subsample K vertices from each tooth using the Farthest Point Sampling (FPS) algorithm. Let S be the sampling set and X be a set of points in a metric space. We initialize S by adding a random point of X. Then we repeatedly add to S the point of X \ S that is farthest (in the Euclidean sense) from the points already in S, i.e. the point maximizing the minimum distance to S, until S has K elements.
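As an illustration, the FPS loop above might be sketched as follows for points in the plane; the function name, the deterministic starting index, and the 2D setting are our own simplifications:

```python
# Farthest Point Sampling: grow the sample by repeatedly adding the
# point whose minimum distance to the current sample is largest.
def fps(points, k, start=0):
    def d(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    sample = [points[start]]
    # minimum distance of every point to the current sample
    mind = [d(p, sample[0]) for p in points]
    while len(sample) < k:
        i = max(range(len(points)), key=lambda i: mind[i])
        sample.append(points[i])
        # adding points[i] can only shrink the min distances
        mind = [min(mind[j], d(points[j], points[i]))
                for j in range(len(points))]
    return sample
```

Maintaining the `mind` array makes each round linear in |X|, so sampling K points costs O(K·|X|).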

Then, to compute the local distributions, we convert a point cloud into a pairwise distance matrix using either the Euclidean distance in R^3 or the geodesic distance induced by the triangulated mesh. To compute the geodesic distance, we first build a weighted graph: the underlying graph is given by the edges of the triangulation, and the weight of each edge is the Euclidean distance between its endpoints. The geodesic distance between two vertices of the graph is then the sum of the weights of the edges along the shortest path, computed with Dijkstra's algorithm [24].
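A minimal sketch of this graph geodesic computation, assuming the weighted graph is given as an adjacency list (our own format, `{vertex: [(neighbor, weight), ...]}`) rather than a mesh:

```python
import heapq

# Dijkstra's algorithm: single-source shortest path distances on a
# graph with non-negative edge weights (here, Euclidean edge lengths).
def dijkstra(adj, source):
    dist = {v: float("inf") for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue          # stale heap entry, already improved
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Running this from every vertex yields the full geodesic distance matrix of the mesh.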

In part 2, we discretize the interval [0, D] into N equally spaced points and compute the local distribution (2.10) at each point. We chose N = 100 to ensure speed of computation. Given a probability measure µ, recall that the local distribution of x at radius t is given by µ(BX(x, t)). Hence, for each point xi, we look in row i of the distance matrix for the entries with distance less than t, sum the probability masses at these points, and repeat for all t to obtain the local distribution vxi.
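The computation of one local distribution vector can be sketched as follows; the discretization of [0, D] into N radii follows the text, while the helper name and the closed-ball convention (distance ≤ t) are our assumptions:

```python
# Local distribution at a single point x_i: for each radius t on the
# grid over [0, D], sum the probability masses of the points whose
# distance to x_i (one row of the distance matrix) is at most t.
def local_distribution(dist_row, masses, D, N):
    radii = [D * (k + 1) / N for k in range(N)]
    return [sum(m for d, m in zip(dist_row, masses) if d <= t)
            for t in radii]
```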

In our experiments, we use two different probability measures: the uniform probability measure and a probability measure induced by the Voronoi cells of a subsample of the data points. Let S ⊂ X. The Voronoi probability measure is defined as follows: to each s ∈ S, we associate the set

\[
V_s := \{ x' \in X \mid d_X(s, x') \le d_X(s', x') \ \forall s' \in S \text{ s.t. } s' \neq s \}.
\]

We observe that the collection {Vs}s∈S forms a partition of X. Figure 4.3 shows an example of a Voronoi partition. Then we can define a probability measure ν on S by setting, for each

Figure 4.3: An example of a Voronoi partition. We generated 5000 random points and picked 7 points as representatives using the Farthest Point Sampling algorithm (explained in Section 4.3.1). Figure 4.3a shows the 5000 points as small blue dots; the larger colored dots are the 7 representatives. Figure 4.3b shows the Voronoi cells associated to each representative.

s ∈ S,

\[
\nu(s) := \frac{|V_s|}{|X|}.
\]

One can check that such a ν is a probability measure when S and X are finite sets.
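A small sketch of this construction for planar points; the tie-breaking rule (by order of the representatives) is our own assumption:

```python
# Voronoi probability measure: assign each point of X to its closest
# representative s in S, then weight each s by the fraction of X in
# its cell V_s. The weights sum to 1 since the cells partition X.
def voronoi_measure(X, S):
    def d(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    counts = {s: 0 for s in S}
    for x in X:
        s = min(S, key=lambda s: d(s, x))   # nearest representative
        counts[s] += 1
    return {s: c / len(X) for s, c in counts.items()}
```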

In part 3, we compute the TLB using Sinkhorn's algorithm [22]. For each pair of shapes X and Y, we compute the cost of transporting a point xi ∈ X to yj ∈ Y as

\[
c(x_i, y_j) = \sum_{k=1}^{N} \left| v_{x_i, k} - v_{y_j, k} \right| \cdot \frac{D}{N},
\]

where vxi and vyj are the local distributions at xi and yj respectively. Then, using Sinkhorn's algorithm [22] to solve the optimization problem in (2.9), we obtain an approximation of the TLB.

Note that the iterative Sinkhorn's algorithm depends both on the choice of ε and on the number of iterations (see Section 2.6). Using a small ε with many iterations leads to a good approximation of the TLB, but the computation is costly in time. We fix a pair of shapes and test ε = 0.1, 0.05 and 0.005 with K = 1000, 1500 and 2000. After experimenting with a few combinations of these parameters, we choose ε = 0.005 with niter = 44 and K = 2000. It takes roughly 17 seconds to solve for the TLB for a pair of shapes with these parameters. The error of Sinkhorn's algorithm at the ith iteration is

\[
\| u^{(i)} - u^{(i-1)} \|_1 = \sum_j \left| u_j^{(i)} - u_j^{(i-1)} \right|,
\]

where u is the scaling factor described in Section 2.6. The error measures the rate of convergence of Sinkhorn's algorithm. Choosing niter = 44 guarantees the error to be of magnitude 10^-6. Figure 4.4 shows the error of the computation against the number of iterations.
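For concreteness, here is a compact, unoptimized sketch of Sinkhorn's iteration on a tiny cost matrix. The function name and the direct (non-log-domain) formulation are ours; a production implementation would monitor the scaling-factor error described above instead of running a fixed number of iterations, and would guard against numerical underflow for small ε:

```python
import math

# Sinkhorn's algorithm for entropic optimal transport: alternately
# rescale the rows and columns of the Gibbs kernel K = exp(-cost/eps)
# so that the plan u_i * K_ij * v_j matches the marginals a and b.
def sinkhorn(cost, a, b, eps, niter):
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(niter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # transport plan and its transport cost
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total
```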

Figure 4.4: Error against niter when ε = 0.005 and K = 2000.

4.3.2 Results of using the OT approach

We computed the TLB using different sets of parameters. By choosing combinations of parameters from Table 4.2, we obtain different results. Table 4.4 shows the probability of error associated to each experiment using optimal transport.

Table 4.4: Probability of Error table for experiments using optimal transport.

ProbabilityMeasure   Metric                D      K     PeFamily  PeGenus  PeDiet
Uniform              Euclidean             0.01   2000  0.851     0.897    0.658
Uniform              Euclidean             0.10   2000  0.816     0.888    0.506
Uniform              Euclidean             0.20   2000  0.789     0.879    0.519
Uniform              Euclidean             0.21   2000  0.781     0.871    0.506
Uniform              Euclidean             0.22   2000  0.789     0.879    0.544
Uniform              Euclidean             0.23   2000  0.772     0.862    0.544
Uniform              Euclidean             0.24   2000  0.807     0.879    0.532
Uniform              Euclidean             0.25   2000  0.807     0.897    0.557
Uniform              Euclidean             0.26   2000  0.816     0.879    0.519
Uniform              Euclidean             0.27   2000  0.789     0.871    0.532
Uniform              Euclidean             0.28   2000  0.772     0.853    0.532
Uniform              Euclidean             0.29   2000  0.737     0.845    0.519
Uniform              Euclidean             0.30   2000  0.798     0.879    0.557
Uniform              Euclidean             0.31   2000  0.798     0.871    0.519
Uniform              Euclidean             0.32   2000  0.772     0.853    0.532
Uniform              Euclidean             0.33   2000  0.781     0.862    0.532
Uniform              Euclidean             0.34   2000  0.754     0.836    0.532
Uniform              Euclidean             0.35   2000  0.781     0.853    0.532
Uniform              Euclidean             0.36   2000  0.746     0.819    0.494
Uniform              Euclidean             0.37   2000  0.781     0.871    0.532
Uniform              Euclidean             0.40   2000  0.798     0.871    0.532
Uniform              Euclidean             0.50   2000  0.763     0.836    0.532
Uniform              Euclidean             0.80   2000  0.772     0.853    0.519
Uniform              Normalized Euclidean  1.00   2000  0.842     0.845    0.570
Uniform              Geodesic              0.10   2000  0.851     0.879    0.570
Uniform              Geodesic              0.15   2000  0.816     0.853    0.532
Uniform              Geodesic              0.20   2000  0.798     0.853    0.532
Uniform              Geodesic              0.25   2000  0.816     0.862    0.544
Uniform              Geodesic              0.27   2000  0.807     0.871    0.544
Uniform              Geodesic              0.28   2000  0.781     0.862    0.532
Uniform              Geodesic              0.29   2000  0.781     0.845    0.544
Uniform              Geodesic              0.30   2000  0.807     0.879    0.519
Uniform              Geodesic              0.31   2000  0.816     0.871    0.570
Uniform              Geodesic              0.32   2000  0.798     0.879    0.570
Uniform              Geodesic              0.33   2000  0.807     0.879    0.519
Uniform              Geodesic              0.34   2000  0.807     0.871    0.557
Uniform              Geodesic              0.35   2000  0.789     0.853    0.557
Uniform              Geodesic              0.36   2000  0.781     0.862    0.570
Uniform              Geodesic              0.38   2000  0.789     0.862    0.532
Uniform              Geodesic              0.40   2000  0.789     0.862    0.544
Uniform              Geodesic              0.42   2000  0.746     0.862    0.544
Uniform              Geodesic              0.44   2000  0.763     0.845    0.544
Uniform              Geodesic              0.45   2000  0.798     0.871    0.570
Uniform              Geodesic              0.50   2000  0.772     0.862    0.519
Uniform              Geodesic              0.60   2000  0.772     0.853    0.532
Uniform              Geodesic              0.70   2000  0.789     0.862    0.557
Uniform              Geodesic              0.80   2000  0.772     0.853    0.544
Uniform              Geodesic              0.90   2000  0.781     0.862    0.582
Uniform              Geodesic              1.00   2000  0.763     0.845    0.557
Uniform              Geodesic              1.10   2000  0.781     0.853    0.582
Uniform              Geodesic              1.20   2000  0.781     0.862    0.570
Uniform              Normalized geodesic   1.00   2000  0.798     0.810    0.570
Voronoi              Euclidean             0.32   50    0.746     0.828    0.557
Voronoi              Euclidean             0.32   100   0.754     0.828    0.557
Voronoi              Euclidean             0.32   200   0.763     0.836    0.544
Voronoi              Normalized Euclidean  1.00   50    0.807     0.862    0.608
Voronoi              Normalized Euclidean  1.00   100   0.807     0.828    0.696
Voronoi              Normalized Euclidean  1.00   200   0.772     0.793    0.519
Voronoi              Geodesic              0.35   50    0.807     0.871    0.519
Voronoi              Geodesic              0.35   100   0.763     0.836    0.557
Voronoi              Geodesic              0.35   200   0.763     0.836    0.494
Voronoi              Normalized geodesic   1.00   50    0.868     0.922    0.608
Voronoi              Normalized geodesic   1.00   100   0.781     0.836    0.620
Voronoi              Normalized geodesic   1.00   200   0.816     0.879    0.671

4.3.3 Using Euclidean distance with uniform probability measures

We test many choices of D with Euclidean distance and uniform probability measures.

We observe that for D around 0.32, we obtain a "good" dendrogram, as shown in Figure 1.2, where the immediate children of the root consist of two large clusters that almost separate frugivores/folivores from insectivores/omnivores. Hence, we expect the Pe to be lower around D = 0.32. To view all the dendrograms, we refer our readers to our webpage 1, which we built to assist comparisons of dendrograms. When using uniform probability measures, we

1 https://research.math.osu.edu/networks/demos/teeth-dendrograms/


Figure 4.5: Dendrogram using Euclidean distance, uniform probability measure and D = 0.36. The Pe for diet is 0.494.

can see that as D moves towards 0.32, the separation between the dietary categories becomes more obvious.

However, the probability of error (Pe) in leave-one-out tells a different story. The lowest

Pe’s for family, genus and diet classifications are 0.737, 0.819 and 0.494 and are obtained at

D = 0.29, 0.36 and 0.36 respectively. In addition, at D = 0.29 and 0.36, we obtain the lowest

Pe's for family and diet across all the experiments using the OT approach. The dendrogram for D = 0.36 is shown in Figure 4.5. Although 0.29 and 0.36 are close to 0.32, the value of D with the best overall structure in its dendrogram, the Pe for diet in Table 4.4 and in Figure 4.8c shows no decreasing trend as D moves toward 0.32. This does not match our observation of the change in the structure of the dendrograms.

4.3.4 Using geodesic distance with uniform probability measures

We observe (visually) that the best dendrogram using geodesic distance with uniform probability measures was obtained when D = 0.35 as shown in Figure 4.6. The lowest Pe for family is obtained at D = 0.42; Pe for genus is the lowest at D = 1.0 when using normalized


Figure 4.6: Best dendrogram (in terms of structure) using geodesic distance and uniform probability measure with D = 0.35. The separation between the clusters is less clear than in the best dendrogram obtained using the Euclidean distance as the metric. The Pe for diet is 0.557.

geodesic distance; and the best Pe for diet is obtained at D = 0.3, 0.33 and 0.50. The dendrograms of the results that produce the lowest Pe for diet are shown in Figure 4.7.

Similar to what we observed in Section 4.3.3, Table 4.4 and Figure 4.8c suggest that there is no clear correlation between Pe and D. Moreover, the overall structures of the dendrograms in Figure 4.7b (D = 0.33) and in Figure 4.7c (D = 0.5) are drastically different.

Yet the lowest Pe is attained at both values of D when using geodesic distance. Hence, we conclude that the overall structure of a dendrogram may have little to do with the dietary identification accuracy.

When using the uniform probability measures, the classification errors for family and dietary labels are higher in general for geodesic distance than those for Euclidean distance as shown in Figure 4.8. However, for classification tasks on genus classes, we observe that the highest Pe for geodesic distance is 0.879, which is lower than the Pe = 0.897 for Euclidean distance. Moreover, the lowest Pe = 0.810 for genus when using geodesic distance is lower than that of Euclidean distance (Pe = 0.819).



Figure 4.7: Dendrograms of results using the OT approach with geodesic distance, uniform probability measures and D = 0.3 (Figure 4.7a), 0.33 (Figure 4.7b) and 0.5 (Figure 4.7c). All three experiments yield Pe = 0.519 for dietary classification.


4.3.5 Using Voronoi probability measures

From Table 4.4, we observe that when using the Euclidean distance with D = 0.32 and K = 50, we obtain the lowest Pe for family, at 0.746. When using genus as the label, we obtain a Pe as low as 0.793 in the experiment using the normalized Euclidean distance with D = 1 and K = 200. When using diet as the label, we obtain the lowest Pe of 0.494 in the experiment using the geodesic distance with D = 0.35 and K = 200. Figure 4.9 shows the corresponding dendrogram. The lowest Pe's for genus and diet observed in this section are also the lowest Pe's for genus and diet across all the experiments using the OT approach.

4.3.6 Summary of results using the OT approach

Table 4.5 shows the lowest Pe for each labeling category. The lowest Pe's are observed in experiments using various parameters. Therefore, we conclude that the flexibility given by these parameters is useful in different classification tasks.


Figure 4.8: Values of Pe for family (Figure 4.8a), genus (Figure 4.8b) and diet (Figure 4.8c) against the choice of D when using uniform probability measures with either the Euclidean distance or the geodesic distance as cost. We observe that the lowest Pe's when using the Euclidean distance are always lower than those when using the geodesic distance.


Figure 4.9: Dendrogram for the distance matrix using the Voronoi probability measures and geodesic distance with D = 0.35. This method produces the lowest Pe (= 0.494) for diet when using the OT approach.

Label    Parameters                                                                       Pe
Family   Euclidean distance, uniform probability measures, D = 0.29, K = 2000             0.737
Genus    Normalized Euclidean distance, Voronoi probability measures, D = 1.0, K = 200    0.793
Diet     Euclidean distance, uniform measure, D = 0.36, K = 2000; and                     0.494
         geodesic distance, Voronoi probability measures, D = 0.35, K = 200

Table 4.5: Parameters used in the experiments where the lowest Pe for each label category is attained.


Figure 4.10: Examples of holes in meshes. Both figures are zoomed in to better show the holes. These holes are visually hard to detect when one is looking at an entire triangulated surface. Figure 4.10a shows an example of a loop that is caused by missing triangles. The boundary of the hole is polygonal. Figure 4.10b is an example of a hole caused by filling triangles at the wrong place: the mesh becomes non-manifold in this case.

4.4 The PH Approach

4.4.1 Preprocessing data

We find tiny holes in the meshes that create cycles in H1. Figure 4.10 shows two examples of such holes. However, the surface of the crown of a tooth is homotopy equivalent to a disk [11]; that is, the surface can be continuously deformed into a disk without tearing. Hence, we believe that these holes in the meshes are erroneous and should be removed.

We use the program ShortLoop [23] to detect holes in a triangulation. The program returns the shortest list of generators (vertices) for all the holes in the triangulated surface. For each hole, we first check whether the hole has only three generators. If so, we fill the loop simply by adding a triangle whose vertices are the generators. Otherwise, we add an extra vertex at the centroid of the generators and fill the hole by connecting the centroid with each generator. Figure 4.11 shows an example of a hole being filled.
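The two cases of the filling rule can be sketched as follows, with our own data layout (vertices as 3D tuples, triangles as index triples):

```python
# Fill a hole bounded by a cycle of generator vertex indices:
# a 3-generator hole becomes a single triangle; a longer cycle is
# filled by a fan of triangles around the centroid of the generators.
def fill_hole(vertices, triangles, cycle):
    if len(cycle) == 3:
        triangles.append(tuple(cycle))
        return
    # add the centroid of the generators as a new vertex
    c = tuple(sum(vertices[i][k] for i in cycle) / len(cycle)
              for k in range(3))
    vertices.append(c)
    ci = len(vertices) - 1
    # connect the centroid to each boundary edge of the hole
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        triangles.append((a, b, ci))
```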

4.4.2 Mean curvature

Mean curvature is an important component of the experiments. Therefore, we provide a brief review of the definition and computation of mean curvature below. We refer our readers to [25] for a detailed review of curvature.

Let \(S \subset \mathbb{R}^3\) be a smooth surface and let \(T_p S\) be the tangent space of S at p. That is, for every \(v \in T_p S\) there exists a curve \(\gamma : [-1, 1] \to \mathbb{R}^3\) such that \(\gamma([-1, 1]) \subseteq S\), \(\gamma(0) = p\) and \(\gamma'(0) = v\). Let \(n_p\) be a unit normal vector to the tangent plane at p; by doing so, we fix an orientation of the surface S. Fix \(v_p \in T_p S\); then \(v_p\) and \(n_p\) span a normal plane.


Figure 4.11: Figure 4.11a shows an example of a 1-cycle in a triangular mesh. Figure 4.11b shows the new mesh with the cycle (shown on the left) filled. Darker areas are new triangles added to the mesh. The yellow vertex is the centroid of the cycle and is added to the list of vertices that generate the mesh.

Figure 4.12: Figure from Wikipedia created by Eric Gaba at https://en.wikipedia.org/wiki/Curvature#/media/File:Minimal_surface_curvature_planes-en.svg

Note that the normal plane intersects S in a curve. Figure 4.12 shows a visualization of the normal plane and tangent space given the saddle point p.

The curvature κ for the normal section is defined to be the reciprocal of the radius of

the osculating circle. That is, we fit a circle on the normal plane with maximum radius such

that the circle intersects with S at p and points sufficiently close to p. Figure 4.13 shows an

example of an osculating circle. We say that κ is positive if the osculating circle is on the

same side with the normal vector np. Otherwise κ is negative. The principal curvatures at

p, κ1 and κ2 are the maximum and minimum values of the curvatures over all the choices of

Figure 4.13: Figure from Wikipedia created by Cepheus at https://en.wikipedia.org/wiki/Curvature#/media/File:Osculating_circle.svg. Given a point p, an osculating circle is shown as a blue circle in the figure.

\(v_p\). Finally, we define the mean curvature as

\[
H(p) = \frac{1}{2}(\kappa_1 + \kappa_2) = \frac{1}{2\pi} \int_0^{2\pi} \kappa(\theta) \, d\theta,
\]

where \(\kappa(\theta)\) is the curvature associated to the unit vector at angle \(\theta\) in the tangent space.

One can use a different construction to compute the mean curvature in a discrete setting.

One of these constructions is the cotangent formula [65]. We briefly describe it below; for further details, we refer our readers to [65, 98]. Suppose M is a triangular mesh, i.e. a piecewise linear approximation of a smooth surface. Let vi be a vertex in the mesh. We call N1(vi) := {vj ∈ M | vj and vi share an edge in M} the 1-ring neighborhood of vi. The idea is to use spatial averages to describe geometric properties using finite elements. Let Ai = Σ_{j ∈ N1(vi)} A(△(vi, vj, vj+1)), where △(vi, vj, vj+1) is the triangle with vertices vi, vj, vj+1 and A(△) is the area of a triangle. Then the cotangent formula for a polygonal surface M at a vertex vi is

Lc(vi) := (1/Ai) · (1/2) Σ_{j ∈ N1(vi)} (cot(αij) + cot(βij)) (vj − vi)

where αij and βij are the two angles opposite to the edge (vi, vj) in the two triangles sharing that edge in the mesh. Figure 4.14 shows an illustration of αij and βij. The mean curvature is then given by |H(vi)| = ‖Lc(vi)‖ / 2,

where the sign of H(vi) is given by a normal vector of the surface.
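To make the construction concrete, the cotangent formula can be sketched as follows. This is an illustrative Python version (the thesis computes mean curvature via the implementation in [20]); the mesh representation — a list of 3D vertex coordinates plus triangle index triples — is an assumption for the example.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_curvature(verts, tris, i):
    """|H(v_i)| = ||L_c(v_i)|| / 2, where L_c is the cotangent Laplacian:
    L_c(v_i) = (1 / (2 A_i)) * sum_j (cot(a_ij) + cot(b_ij)) (v_j - v_i)."""
    acc = [0.0, 0.0, 0.0]
    area = 0.0  # the 1-ring area A_i
    for t in tris:
        if i not in t:
            continue
        a, b, c = (verts[k] for k in t)
        area += 0.5 * norm(cross(sub(b, a), sub(c, a)))
        j, k = (x for x in t if x != i)
        # in this triangle, the angle at k is opposite the edge (v_i, v_j),
        # and the angle at j is opposite the edge (v_i, v_k)
        for p, q in ((j, k), (k, j)):
            u, w = sub(verts[i], verts[q]), sub(verts[p], verts[q])
            cot = dot(u, w) / norm(cross(u, w))
            for d in range(3):
                acc[d] += cot * (verts[p][d] - verts[i][d])
    lc = [x / (2.0 * area) for x in acc]
    return norm(lc) / 2.0
```

On a flat patch the cotangent terms cancel and |H| vanishes, while a curved vertex (e.g. the apex of an octahedron) gives a positive value.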

4.4.3 Outline of the PH approach

Our computation is done in three steps:

Figure 4.14: Figure from [94]. The figure on the left shows the triangle on which A(△(v0, v1, v2)) is computed. The figure on the right shows the angles used to compute the cotangent formula.

1. compute mean curvatures at each vertex and build filtration functions;

2. construct filtered simplicial complexes and compute the persistence diagrams;

3. calculate the bottleneck distance between persistence diagrams.

In part 1, mean curvature is computed via [20]. Then we build a mean curvature based

filtration function on the simplicial complex induced by the triangulation of a mesh, i.e. the vertices are the 0-simplices; the edges in the triangulation are 1-simplices and the triangles are 2-simplices.

In part 2, we either build Vietoris-Rips filtration (discussed in section 3.4) or filter the simplicial complex given by a triangulation through a mean curvature based filtration func- tion constructed in part 1. In building the Vietoris-Rips filtration, we consider both the

Euclidean and the geodesic distance induced by the triangulation. The geodesic distance is constructed in the same way as discussed in section 4.3.1. We will list the filtration functions in later sections. Vietoris-Rips filtrations are built using Ripser [89] and filtered simplicial complexes on curvature-based filtration functions are computed using JavaPlex

[87].

Then part 3 is computed using Hera [50] as discussed in section 3.9.
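For intuition about what Hera computes, the bottleneck distance dB,∞ between two very small diagrams can be evaluated by brute force: every point is matched either to a point of the other diagram or to the diagonal, and the cost of a matching is its largest l∞ pair distance. A naive Python sketch (exponential in the diagram size, for illustration only — Hera implements an efficient algorithm):

```python
import itertools

def diag_dist(p):
    # l_inf distance from a point (birth, death) to the diagonal
    return (p[1] - p[0]) / 2.0

def bottleneck(D1, D2):
    """Brute-force d_B,inf between two small persistence diagrams."""
    n1, n2 = len(D1), len(D2)
    # augment each side with "diagonal" slots (None) for the other's points
    A = list(D1) + [None] * n2
    B = list(D2) + [None] * n1
    best = float("inf")
    for perm in itertools.permutations(range(len(B))):
        cost = 0.0
        for i, j in enumerate(perm):
            p, q = A[i], B[j]
            if p is None and q is None:
                c = 0.0
            elif p is None:
                c = diag_dist(q)
            elif q is None:
                c = diag_dist(p)
            else:
                c = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
            cost = max(cost, c)
        best = min(best, cost)
    return best
```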

4.4.4 Results of using the PH approach

Table 4.3 shows a list of parameters that we tuned in experimenting with persistent homology. Table 4.6 shows the Pe (probability of error) associated to each experiment using the PH approach.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
Vietoris-Rips | Euclidean | dB,∞ | 0 | 0.851 | 0.897 | 0.582
Vietoris-Rips | Euclidean | dB,∞ | 1 | 0.833 | 0.879 | 0.557
Vietoris-Rips | Normalized Euclidean | dB,∞ | 0 | 0.930 | 0.957 | 0.671
Vietoris-Rips | Normalized Euclidean | dB,∞ | 1 | 0.947 | 0.948 | 0.696
Vietoris-Rips | Geodesic | dB,∞ | 0 | 0.868 | 0.922 | 0.430
Vietoris-Rips | Geodesic | dB,∞ | 1 | 0.860 | 0.897 | 0.633
Vietoris-Rips | Normalized geodesic | dB,∞ | 0 | 0.965 | 0.974 | 0.772
Vietoris-Rips | Normalized geodesic | dB,∞ | 1 | 0.912 | 0.940 | 0.608
Vietoris-Rips | Modified geodesic, coef=30 | dB,∞ | 0 | 0.921 | 0.931 | 0.544
Vietoris-Rips | Modified geodesic, coef=30 | dB,∞ | 1 | 0.895 | 0.905 | 0.519
Vietoris-Rips | Modified geodesic, coef=40 | dB,∞ | 0 | 0.851 | 0.905 | 0.582
Vietoris-Rips | Modified geodesic, coef=40 | dB,∞ | 1 | 0.851 | 0.897 | 0.582
Vietoris-Rips | Modified geodesic, coef=50 | dB,∞ | 0 | 0.860 | 0.914 | 0.608
Vietoris-Rips | Modified geodesic, coef=50 | dB,∞ | 1 | 0.825 | 0.905 | 0.519
Vietoris-Rips | Normalized modified geodesic, coef=35 | dB,∞ | 0 | 0.956 | 0.966 | 0.646
Vietoris-Rips | Normalized modified geodesic, coef=35 | dB,∞ | 1 | 0.886 | 0.897 | 0.570
Vietoris-Rips | Normalized modified geodesic, coef=45 | dB,∞ | 0 | 0.930 | 0.940 | 0.747
Vietoris-Rips | Normalized modified geodesic, coef=45 | dB,∞ | 1 | 0.939 | 0.966 | 0.646
maxMeancurv, coef=0.1 | N/A | dB,∞ | 0 | 0.860 | 0.914 | 0.506
maxMeancurv, coef=0.2 | N/A | dB,∞ | 0 | 0.851 | 0.922 | 0.532
maxMeancurv, coef=0.3 | N/A | dB,∞ | 0 | 0.816 | 0.914 | 0.570
Normalized maxMeancurv, coef=0.1 | N/A | dB,∞ | 0 | 0.956 | 0.974 | 0.696
Normalized maxMeancurv, coef=0.2 | N/A | dB,∞ | 0 | 0.991 | 0.991 | 0.671
Normalized maxMeancurv, coef=0.3 | N/A | dB,∞ | 0 | 0.912 | 0.948 | 0.722
Normalized absMeancurv | N/A | dB,∞ | 0 | 0.965 | 0.974 | 0.696
Normalized absMeancurv | N/A | dB,∞ | 1 | 0.921 | 0.940 | 0.658
Normalized minus absMeancurv | N/A | dB,∞ | 0 | 0.921 | 0.957 | 0.620
Normalized minus absMeancurv | N/A | dB,∞ | 1 | 0.974 | 0.974 | 0.772
Normalized absMeancurv | N/A | dB,2 | 0 | 0.974 | 0.983 | 0.696
Normalized absMeancurv | N/A | dB,2 | 1 | 0.930 | 0.957 | 0.658
Normalized minus absMeancurv | N/A | dB,2 | 0 | 0.895 | 0.948 | 0.646
Normalized minus absMeancurv | N/A | dB,2 | 1 | 0.956 | 0.983 | 0.633

Table 4.6: Probability of error of experiments using persistent homology.

4.4.5 Using Vietoris-Rips filtration

The first set of experiments in the PH approach applies the Vietoris-Rips filtration (discussed in section 3.4) to the set of vertices sampled from each tooth. We use FPS (explained in section 4.3.1) to sample 500 vertices. For each shape, we build the Vietoris-Rips filtration on the sampled vertices using both the Euclidean distance and the geodesic distance discussed in section 4.3.1. According to Table 4.6 and Table 4.7, the best Pe for diet is 0.430 and is obtained when using the geodesic distance with the Vietoris-Rips filtration and the bottleneck distance between 0th-persistence diagrams. Its dendrogram is shown in Figure 4.15. For family and genus, the best Pe is 0.833 and 0.879 respectively, obtained when using the Euclidean distance with the Vietoris-Rips filtration and the bottleneck distance between 1st-persistence diagrams.
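The FPS subsampling step can be sketched as a greedy loop: repeatedly add the point farthest from everything chosen so far. A minimal Python version, assuming a precomputed pairwise distance matrix (the variable names are illustrative):

```python
def farthest_point_sampling(dm, k, start=0):
    """Greedy farthest point sampling from a full pairwise distance
    matrix dm (list of lists); returns the indices of k chosen points."""
    chosen = [start]
    # mind[i] = distance from point i to the chosen set so far
    mind = list(dm[start])
    while len(chosen) < k:
        nxt = max(range(len(dm)), key=lambda i: mind[i])
        chosen.append(nxt)
        mind = [min(mind[i], dm[nxt][i]) for i in range(len(dm))]
    return chosen
```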

In the hope of improving the classification results, we consider a modification of the geodesic distance. Instead of using the Euclidean distance as the edge weight, we construct a new weight function that depends on the mean curvature for each edge [u, v]:

W([u, v]) := ‖u − v‖ e^{−α min{|H(u)|, |H(v)|}} (4.1)
where H(u) and H(v) are the mean curvatures at u and v defined in section 4.4.2 and α is a constant. This new weight function on edges makes edges where both vertices have large


Figure 4.15: Dendrogram of bottleneck distance matrix of 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance. The Pe for diet is 0.430, which is also the lowest Pe among all the experiments using the PH approach.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
Vietoris-Rips | Euclidean | dB,∞ | 0 | 0.851 | 0.897 | 0.582
Vietoris-Rips | Euclidean | dB,∞ | 1 | 0.833 | 0.879 | 0.557
Vietoris-Rips | Normalized Euclidean | dB,∞ | 0 | 0.930 | 0.957 | 0.671
Vietoris-Rips | Normalized Euclidean | dB,∞ | 1 | 0.947 | 0.948 | 0.696
Vietoris-Rips | Geodesic | dB,∞ | 0 | 0.868 | 0.922 | 0.430
Vietoris-Rips | Geodesic | dB,∞ | 1 | 0.860 | 0.897 | 0.633
Vietoris-Rips | Normalized geodesic | dB,∞ | 0 | 0.965 | 0.974 | 0.772
Vietoris-Rips | Normalized geodesic | dB,∞ | 1 | 0.912 | 0.940 | 0.608

Table 4.7: Probability of error of experiments where filtered simplicial complex is built through Vietoris-Rips filtration.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
Vietoris-Rips | Modified geodesic, coef=30 | dB,∞ | 0 | 0.921 | 0.931 | 0.544
Vietoris-Rips | Modified geodesic, coef=30 | dB,∞ | 1 | 0.895 | 0.905 | 0.519
Vietoris-Rips | Modified geodesic, coef=40 | dB,∞ | 0 | 0.851 | 0.905 | 0.582
Vietoris-Rips | Modified geodesic, coef=40 | dB,∞ | 1 | 0.851 | 0.897 | 0.582
Vietoris-Rips | Modified geodesic, coef=50 | dB,∞ | 0 | 0.860 | 0.914 | 0.608
Vietoris-Rips | Modified geodesic, coef=50 | dB,∞ | 1 | 0.825 | 0.905 | 0.519
Vietoris-Rips | Normalized modified geodesic, coef=35 | dB,∞ | 0 | 0.956 | 0.966 | 0.646
Vietoris-Rips | Normalized modified geodesic, coef=35 | dB,∞ | 1 | 0.886 | 0.897 | 0.570
Vietoris-Rips | Normalized modified geodesic, coef=45 | dB,∞ | 0 | 0.930 | 0.940 | 0.747
Vietoris-Rips | Normalized modified geodesic, coef=45 | dB,∞ | 1 | 0.939 | 0.966 | 0.646

Table 4.8: Probability of error of experiments where the filtered simplicial complex is built through Vietoris-Rips with the modified weight given in (4.1).

magnitude of mean curvature cheap to traverse. Hence, when the filtration value is small, the filtered simplicial complex captures the silhouette of each shape first.
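A sketch of how the modified weights in (4.1) feed into a geodesic computation, using a standard Dijkstra traversal; the graph representation below is an assumption for the example, and the thesis pipeline differs in detail:

```python
import heapq
import math

def modified_geodesic(verts, edges, H, alpha, src):
    """Single-source shortest paths where, as in (4.1), each edge [u, v]
    has weight ||u - v|| * exp(-alpha * min(|H(u)|, |H(v)|))."""
    adj = {i: [] for i in range(len(verts))}
    for u, v in edges:
        d = math.dist(verts[u], verts[v])
        wt = d * math.exp(-alpha * min(abs(H[u]), abs(H[v])))
        adj[u].append((v, wt))
        adj[v].append((u, wt))
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, wt in adj[u]:
            nd = du + wt
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

Edges whose endpoints both have large |H| get exponentially discounted weights, so paths through high-curvature regions become short, which is the effect the text describes.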

We tested α = 30, 40, and 50 and observed that the lowest Pe for diet is 0.519, obtained at α = 30 and at α = 50 in dimension 1. Figure 4.16 shows the dendrograms for α = 30 and α = 50. For genus, the lowest Pe is 0.897 and is observed in two experiments:

1. using α = 40 in (4.1) and comparing the 1st-persistence diagrams;

2. using α = 35 in Normalized (4.1) and comparing the 1st-persistence diagrams.

Although the classification accuracy is not improved over using the original geodesic distance for genus and diet, the lowest Pe for family is 0.825 when setting α = 50. Table 4.8 shows the list of Pe's of the experiments using modified geodesic distances.

4.4.6 Using mean curvature based filtration functions

The first filtration function that we considered incorporates both the Euclidean distance and the mean curvature. Let H(v) denote the mean curvature at a vertex v and let h = max_v |H(v)|. The filtration function f is constructed in the following way:

f(v) := h − |H(v)| for all vertices v (4.2)
f([u, v]) := max{λ‖u − v‖, h − |H(u)|, h − |H(v)|} for all edges [u, v]

where λ is to be determined. The parameter λ adjusts the effect of the Euclidean distance in the filtration function. When λ = 0, the filtration value for any edge is the maximum of h − |H(·)| over its two vertices, so vertices with high magnitude of mean curvature enter the filtered simplicial complex first. When λ is sufficiently large, the filtration value for any edge is dominated by λ times the Euclidean distance between its two vertices; then short edges connecting two vertices with high curvatures enter the filtered simplicial complex before any other edges. For some λ in between, we observe an outline of a tooth for small filtration values.
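The construction of (4.2) can be written directly; the following hypothetical Python helper returns the filtration values on vertices and edges (the data layout — vertex coordinates, edge index pairs, and a list of mean curvatures — is assumed for the example):

```python
import math

def filtration_42(verts, edges, H, lam):
    """Filtration values from (4.2): f(v) = h - |H(v)| on vertices and
    f([u,v]) = max(lam * ||u - v||, h - |H(u)|, h - |H(v)|) on edges,
    where h = max_v |H(v)|."""
    h = max(abs(x) for x in H)
    fv = {i: h - abs(H[i]) for i in range(len(verts))}
    fe = {}
    for u, v in edges:
        fe[(u, v)] = max(lam * math.dist(verts[u], verts[v]), fv[u], fv[v])
    return fv, fe
```

With lam = 0 the edge values reduce to the curvature term, and for large lam the Euclidean term dominates, matching the two regimes described above.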


Figure 4.16: Dendrograms of the bottleneck distances between the 1st-persistence diagrams. The persistence diagrams are produced using the modified geodesic distance in (4.1) in the Vietoris-Rips filtration. Figure 4.16a shows the result when setting α = 30. Figure 4.16b shows the result when setting α = 50. Both results yield Pe = 0.519 for dietary classification.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
maxMeancurv, coef=0.1 | N/A | dB,∞ | 0 | 0.860 | 0.914 | 0.506
maxMeancurv, coef=0.2 | N/A | dB,∞ | 0 | 0.851 | 0.922 | 0.532
maxMeancurv, coef=0.3 | N/A | dB,∞ | 0 | 0.816 | 0.914 | 0.570
Normalized maxMeancurv, coef=0.1 | N/A | dB,∞ | 0 | 0.956 | 0.974 | 0.696
Normalized maxMeancurv, coef=0.2 | N/A | dB,∞ | 0 | 0.991 | 0.991 | 0.671
Normalized maxMeancurv, coef=0.3 | N/A | dB,∞ | 0 | 0.912 | 0.948 | 0.722

Table 4.9: Probability of error of experiments where filtered simplicial complex is built through the filtration function (4.2) in section 4.4.6.

We only compute the 0-dimensional persistence diagrams. The Pe’s are shown in Table

4.9. The lowest Pe for family is 0.816 and is obtained when using λ = 0.3. It is better than the Pe for the experiments using Vietoris-Rips filtration. In fact, it is the best Pe for family across all the experiments using the PH approach. The lowest Pe for genus is 0.914 and is obtained when using λ = 0.1 or 0.3. The lowest Pe for diet is 0.506 and is obtained when

λ = 0.1. It is slightly better than the Pe for diet when using modified geodesic distance.

The dendrogram for the experiment using λ = 0.1 is shown in Figure 4.17. The structure of the dendrogram is visually similar to that of Figure 4.15.

Since (4.2) improved the Pe for family classification, we hope that by exploring other filtration functions that are more directly related to the mean curvature, we will obtain a better Pe for family.

For the following family of filtration functions, we first define the filtration values at the vertices and extend them to edges and triangles by

fi([u, v]) = max {fi(u), fi(v)} for any edge [u, v]

fi([u, v, w]) = max {fi(u), fi(v), fi(w)} for any triangle [u, v, w].

We considered the following collection of functions:

f1(v) := |H(v)| (4.3)

f2(v) := H(v) − min_v H(v) (4.4)

f3(v) := max_v |H(v)| − |H(v)| (4.5)

f4(v) := max_v H(v) − H(v) (4.6)

We note that in (4.5), the vertex with the largest magnitude of mean curvature appears first in the filtered complex, followed by vertices with the next largest magnitude of mean curvature, and so on. Hence, the sublevel set filtration of (4.5) mimics the behavior of applying the superlevel set filtration to the function in (4.3). Similarly, (4.6) is the superlevel set formulation of (4.4).
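Once a vertex function is fixed and extended to edges by taking maxima, its 0-dimensional sublevel-set persistence can be obtained by the standard union-find sweep with the elder rule. The thesis uses JavaPlex for this; the sketch below is only illustrative:

```python
def zeroth_persistence(fv, edges):
    """0-dimensional persistence of a sublevel-set filtration: fv is a
    list of vertex values; each edge enters at max(fv[u], fv[v]).
    Returns sorted (birth, death) pairs; the oldest component never dies."""
    parent = list(range(len(fv)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for u, v in sorted(edges, key=lambda e: max(fv[e[0]], fv[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        # elder rule: the younger component (larger birth value) dies;
        # keep the root with the smaller vertex value as the survivor
        if fv[ru] > fv[rv]:
            ru, rv = rv, ru
        pairs.append((fv[rv], max(fv[u], fv[v])))
        parent[rv] = ru
    roots = {find(i) for i in range(len(fv))}
    pairs.extend((fv[r], float("inf")) for r in roots)
    return sorted(pairs)
```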


Figure 4.17: The dendrogram of the result when using (4.2) with λ = 0.1 to construct a filtered simplicial complex. The Pe for diet is 0.506.

Because it is too expensive to compute a full bottleneck distance matrix of all 116 teeth, we first compare the bottleneck distance matrix of a collection of seven shapes. The collection of sample shapes is chosen such that it contains a pair of teeth from three different families covering two diets, together with the tooth from Megaladapis (see section 4.1). The surfaces of the crowns of these teeth are shown in Figure 4.18. The dendrograms for the bottleneck distance matrices of the 0th and 1st-persistence diagrams for (4.3) - (4.6) are shown in Figure

4.19. We see that only the third dendrogram in the first row and the last dendrogram in the third row show clusters with the same dietary label being merged first. These are the dendrograms corresponding to the bottleneck distance of the 0th-persistence diagrams for normalized (4.3) and of the 1st-persistence diagrams for normalized (4.5). Hence, we only compute the full bottleneck distance between persistence diagrams for these filtration functions. To test the performance of other bottleneck distances such as dB,2, we also

compute the dB,2 between persistence diagrams for normalized (4.3) and normalized (4.5).

Table 4.10 shows the Pe for the full distance matrix using normalized (4.3) and normalized (4.5). The best Pe for family is 0.895 and is obtained when applying the dB,2 distance

Figure 4.18: Crown surfaces of a collection of teeth in the data set that are used for testing. The title of each subfigure shows the family, diet and code of the tooth. Teeth in the same row share the same family and diet. The coloring on the surfaces indicates the mean curvature at the vertices.


Figure 4.19: Dendrograms for the filtration functions listed in (4.3) - (4.6). Each row corresponds to a filtration function in the order of (4.3) - (4.6). Odd columns show dendrograms of bottleneck distance matrices of 0th-persistence diagrams whereas the even columns show those of 1st-persistence diagrams. The left half of the figure is from unnormalized filtration functions and the right half is where normalized filtration functions are applied.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
Normalized absMeancurv | N/A | dB,∞ | 0 | 0.965 | 0.974 | 0.696
Normalized absMeancurv | N/A | dB,∞ | 1 | 0.921 | 0.940 | 0.658
Normalized minus absMeancurv | N/A | dB,∞ | 0 | 0.921 | 0.957 | 0.620
Normalized minus absMeancurv | N/A | dB,∞ | 1 | 0.974 | 0.974 | 0.772
Normalized absMeancurv | N/A | dB,2 | 0 | 0.974 | 0.983 | 0.696
Normalized absMeancurv | N/A | dB,2 | 1 | 0.930 | 0.957 | 0.658
Normalized minus absMeancurv | N/A | dB,2 | 0 | 0.895 | 0.948 | 0.646
Normalized minus absMeancurv | N/A | dB,2 | 1 | 0.956 | 0.983 | 0.633

Table 4.10: Probability of error of experiments where the filtered simplicial complex is built through normalized (4.3) and normalized (4.5). "absMeancurv" represents (4.3) and "minus absMeancurv" represents (4.5).

Filtration | Dim | Change in PeFamily | Change in PeGenus | Change in PeDiet
Normalized absMeancurv | 0 | -0.009 | -0.009 | -0.006
Normalized absMeancurv | 1 | -0.009 | -0.017 | 0
Normalized minus absMeancurv | 0 | 0.026 | 0.009 | -0.026
Normalized minus absMeancurv | 1 | 0.018 | -0.009 | 0.139

Table 4.11: Change in Pe observed in Table 4.10 that is caused by switching from dB,∞ to dB,2 for the distance between persistence diagrams.

to 0th-persistence diagrams that are filtered by normalized (4.5). The best Pe for genus is 0.940 and is obtained when using dB,∞ to compare 1st-persistence diagrams for normalized (4.3). The best Pe for diet is 0.620 and is obtained when using dB,∞ on 0th-persistence diagrams for normalized (4.5). The dendrogram is shown in Figure 4.20. The lowest Pe's we obtained using normalized (4.3) are greater than the lowest Pe's when using (4.2) as the filtration function.

The benefit of using dB,2 as opposed to dB,∞ is not obvious. Table 4.11 shows the change in Pe caused by switching from dB,∞ to dB,2. Only 4 of the 12 cells in Table 4.11 show an improvement in Pe from the switch, while 7 of the 12 cells show a worse Pe. Persistence diagram-wise, computing dB,2 instead of dB,∞ between 0th-persistence diagrams for normalized (4.3) does not improve the Pe's, but computing dB,2 between persistence diagrams for normalized (4.5) helps lower 2 out of 3 Pe's. A more thorough examination is required for a more decisive conclusion.

4.4.7 Summary of results using the PH approach

Table 4.12 shows the methods with the best Pe using the PH approach. We see that although the Pe's for family and genus classification for the PH approach are higher than those for the OT approach, the Pe for diet for the PH approach is better than the Pe for diet


Figure 4.20: The dendrogram of the result when using dB,∞ on 0th-persistence diagrams for normalized (4.5) ("Normalized minus absMeancurv"). The Pe for diet is 0.620.

Label | Method | Pe
Family | bottleneck distance on 0th-persistence diagrams of filtration function (4.2) with λ = 0.3 | 0.816
Genus | bottleneck distance on 1st-persistence diagrams of Vietoris-Rips filtration with Euclidean distance | 0.879
Diet | bottleneck distance on 0th-persistence diagrams of Vietoris-Rips with geodesic distance | 0.430

Table 4.12: List of methods with the best Pe using the PH approach.

for the OT approach (which is 0.494 according to Table 4.5). We observe that across all the implementations we discussed, the best Pe for genus is almost always obtained from distance matrices of 1st-persistence diagrams.

4.5 Comparison of the results from the OT approach and the PH approach

Figure 4.21 provides a more detailed visualization of the distance matrices that produce one of the best Pe's for dietary classification. Such a visualization, combining a heatmap with a dendrogram, is called a clustergram. The colors in the heatmap indicate the distances between two teeth given by the distance matrix. Figure 4.21a shows the heatmap and the dendrogram from the experiment using Euclidean distance, uniform probability measures, D = 0.36 and K = 2000. Figure 4.21b shows the heatmap and dendrogram for the experiment using dB,∞ comparing 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance.

In Figure 4.21a, we can see two large blocks along the diagonal that represent the two clusters we observed in Figure 4.5 separating the diets into two groups. However, such a division is not reflected in Figure 4.21b when using the PH approach.

Consensuses and differences between the two clustergrams can be found by comparing smaller clusters in Figure 4.21. The colors in the dendrograms aid the discrimination of smaller clusters. These smaller clusters are chosen so that there are 9 clusters of relatively short merge heights in each dendrogram. The same color is assigned to a pair of clusters that share similar members in Figures 4.21a and 4.21b. We can learn about the general structures of the distance matrices by grouping the clusters into two collections as shown in Figure 4.21. The fact that we can use color to identify similar clusters suggests that the two distance matrices are locally similar. Another consensus between the two results is that clusters in Group 1 are close to each other in both distance matrices.

Moreover, we can see that the clusters in Group 2 are far away from clusters in Group 1. If we look more closely at the clusters in Group 2, we can discover that the collection of the bright red, gray and black clusters contains the exact same items in both distance matrices.

On the other hand, one major difference between the two distance matrices is that the overall structures of the dendrograms are different. As we observed in the dendrograms shown in section 4.3 as well as in Figure 4.21a, there is a clear separation between the clusters that contain omnivores/insectivores and those that contain folivores/frugivores. However, such a clear distinction between dietary labels cannot be observed in Figure 4.21b. Another difference is that clusters in Group 2 are farther away from each other in Figure 4.21b than they are in Figure 4.21a.

We can see more yellow at the boundary of the distance matrix in Figure 4.21b if we restrict the distance matrix to clusters in Group 2. If one inspects the clusters individually, one can identify that the yellow and purple clusters exhibit the most distinctions. In fact, items shift between the yellow cluster and the purple cluster. Moreover, the order of merges is different in the dendrograms. In Figure 4.21a, the yellow cluster merges with the blue cluster

first, whereas in Figure 4.21b, the yellow cluster merges with the purple cluster first. Recall that in single-linkage hierarchical clustering, the distance between clusters is given by the minimal distance between a pair of items in the two clusters. Hence, we know that in Figure 4.21a, the closest pair of teeth in the blue and yellow clusters is closer than the closest pair in the yellow and purple clusters.
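The single-linkage rule just recalled can be sketched in a few lines; this naive quadratic Python version records the merge order and heights that a dendrogram displays:

```python
def single_linkage_merges(dm):
    """Naive single-linkage agglomeration from a pairwise distance
    matrix dm: repeatedly merge the two clusters whose closest pair of
    items is smallest, recording (cluster_a, cluster_b, merge_height)."""
    clusters = [{i} for i in range(len(dm))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: minimal distance over all cross pairs
                d = min(dm[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] |= clusters[b]
        del clusters[b]
    return merges
```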


Figure 4.21: Clustergrams (heatmap with dendrogram) for the distance matrices that produce the best Pe for dietary classifications using the OT and PH approaches. Figure 4.21a is produced by using Euclidean distance, uniform probability measures, D = 0.36 and K = 2000. Figure 4.21b is produced by using dB,∞ comparing 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance. Labels on the right indicate diets and labels at the bottom show the family of each tooth.

Chapter 5: Contributions and Future Work

5.1 Conclusion

We explored several implementations of two approaches, based on optimal transport (OT) and persistent homology (PH), to constructing distances between shapes. Table 4.5 and Table 4.12 show the experiments that yield the best Pe for the OT approach and the PH approach respectively. We observe that the OT approach yields better Pe's for family and genus classifications (Pe = 0.737 and 0.793), but the PH approach outperforms the OT approach when classifying our data set by diet (Pe = 0.430). Although the classification success rates are not very good, by testing various methods, we improve the classification results by 22.5%, 18% and 32% over randomly guessing family, genus or diet labels.

Our results do not compete with the results obtained in [11]. The success rates obtained there in leave-one-out classification for family and genus are 90.9% and 92.5% when using the continuous Procrustes distance, which is also discussed in [3].

The benefit of the flexibility that both approaches provide is also reflected in the results. By choosing different metrics and probability measures in the OT approach, or choosing a suitable filtration function in the PH approach, we refined our results for various classification tasks. Therefore, we believe that there is plenty of room for improvement in shape classification via optimal transport and persistent homology.

5.2 Future work

5.2.1 Improvement on the OT approach

We fed the entropic regularizer ε = 0.005 to Sinkhorn's algorithm throughout our experiments to compute the TLB as discussed in section 4.3.1. One can decrease ε further and choose the number of iterations correspondingly. As ε decreases, the approximation given by Sinkhorn's algorithm converges to the true value of the unregularized transport problem defining the lower bound.
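The Sinkhorn iterations themselves are short enough to sketch. The version below is an illustrative Python implementation of entropic-regularized transport between two discrete measures (the thesis calls an existing MATLAB implementation; the cost matrix C and measures mu, nu here are placeholders):

```python
import math

def sinkhorn(C, mu, nu, eps, niter=200):
    """Entropic-regularized OT: with K = exp(-C/eps), alternately rescale
    u and v so that the coupling P = diag(u) K diag(v) has marginals
    close to mu and nu; returns P."""
    K = [[math.exp(-c / eps) for c in row] for row in C]
    n, m = len(mu), len(nu)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(niter):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Smaller eps makes the coupling closer to an unregularized optimal plan but requires more iterations, which is the trade-off noted above.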

Another direction to improve the classification results using the OT approach is to ap- proximate the Gromov-Wasserstein distance directly, rather than computing a lower bound.

Discussion on using gradient descent to solve for the Gromov-Wasserstein distance can be

74 found in [74]. Although in theory, gradient descent seeks a local minimum without the

promise of finding a global one, empirical results suggest that using gradient descent to solve

the optimal transport problem always outputs the global optimal solution [74].

5.2.2 Improvement on the PH approach

In section 4.4.6, we proposed a few filtration functions. One can invent more filtration

functions that do not necessarily depend on the mean curvature. For example, the Gaussian

curvature can be a reasonable substitute for the mean curvature.
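For instance, a common discrete Gaussian curvature is the angle defect at a vertex, normalized by one third of the 1-ring area. A hypothetical Python sketch (the mesh layout is assumed, as in the mean-curvature example):

```python
import math

def angle_defect_gaussian(verts, tris, i):
    """Discrete Gaussian curvature at vertex i via the angle defect:
    K(v_i) = (2*pi - sum of incident angles) / (A_i / 3)."""
    total, area = 0.0, 0.0
    for t in tris:
        if i not in t:
            continue
        j, k = [x for x in t if x != i]
        u = [a - b for a, b in zip(verts[j], verts[i])]
        w = [a - b for a, b in zip(verts[k], verts[i])]
        nu = math.sqrt(sum(x * x for x in u))
        nw = math.sqrt(sum(x * x for x in w))
        cosang = sum(a * b for a, b in zip(u, w)) / (nu * nw)
        total += math.acos(max(-1.0, min(1.0, cosang)))  # angle at v_i
        cx = (u[1]*w[2] - u[2]*w[1], u[2]*w[0] - u[0]*w[2], u[0]*w[1] - u[1]*w[0])
        area += 0.5 * math.sqrt(sum(x * x for x in cx))
    return (2.0 * math.pi - total) / (area / 3.0)
```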

Persistent homology transform (PHT) [90] is another tool on which one can experiment.

Instead of filtering through one filtration function, persistent homology transform proposes

that one can filter a simplicial complex on multiple filtration functions and obtain a collection

of persistence diagrams for a shape. To compare two shapes using PHT, one can compute the

bottleneck distance on each pair of persistence diagrams coordinate-wise. Although computing the coordinate-wise bottleneck distance increases the computational cost, PHT reduces the possibility that noise is picked up by one particular filtration function.
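The core of the idea can be sketched as follows: for each direction w, the height function v ↦ ⟨v, w⟩ induces a lower-star filtration, and PHT compares the resulting families of diagrams. A minimal 2D Python illustration (the direction set and mesh layout are assumptions for the example):

```python
import math

def pht_filtrations(verts, edges, n_dirs=8):
    """For each of n_dirs unit directions w in the plane, build the
    lower-star filtration of the height function v -> <v, w>: vertex
    values <v, w>, edge values max over endpoints."""
    filtrations = []
    for s in range(n_dirs):
        theta = 2.0 * math.pi * s / n_dirs
        w = (math.cos(theta), math.sin(theta))
        fv = [v[0] * w[0] + v[1] * w[1] for v in verts]
        fe = {(u, t): max(fv[u], fv[t]) for u, t in edges}
        filtrations.append((fv, fe))
    return filtrations
```

Each (fv, fe) pair would then be fed to a persistence computation, yielding one diagram per direction; two shapes are compared diagram-by-diagram.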

5.2.3 Other approaches

One can explore the continuous Procrustes distance [11, 3] for disk-like shape classification tasks. Yet, a more general landmark-free approach proposed as a follow-up work to [11] may interest some of our readers. In 2015, Boyer et al. released an R package called auto3dgm [12]. The method was developed by Puente as part of his PhD thesis [76]. By computing all pairwise alignments and distances using an iterative closest point process [8] and computing the minimum spanning tree, auto3dgm automatically generates a set of landmarks for shape comparison. Boyer et al. [12] tested the approach on a data set of bones, compared the results with those based on user-defined landmarks, and concluded that auto3dgm provides reliable, automatically generated landmarks.
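A minimal version of the alignment step can be sketched as follows (Python; `best_rigid` and `icp` are our illustrative names, and production code such as auto3dgm additionally handles subsampling, candidate initializations, and the spanning-tree propagation described above):

```python
import numpy as np

def best_rigid(A, B):
    """Least-squares rotation + translation mapping 3D points A onto
    corresponding points B (Kabsch/orthogonal Procrustes via SVD)."""
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t

def icp(X, Y, n_iter=20):
    """Bare-bones iterative closest point: repeatedly match each point
    of X to its nearest neighbor in Y, then solve for the best rigid
    motion; returns the aligned copy of X."""
    Z = X.copy()
    for _ in range(n_iter):
        # brute-force nearest neighbors (a k-d tree is used in practice)
        nn = Y[np.argmin(((Z[:, None, :] - Y[None, :, :]) ** 2).sum(-1), axis=1)]
        R, t = best_rigid(Z, nn)
        Z = Z @ R.T + t
    return Z
```

As with any ICP variant, this only converges to the correct alignment when the initial pose is close enough that most nearest-neighbor matches are correct, which is why auto3dgm propagates alignments along a minimum spanning tree of similar shapes.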

5.2.4 Experiments on different data sets

Morphobrowser2 is an online database of 3D scans of mammalian teeth, together with visualizations. One can add the teeth in Morphobrowser to the data set provided in [11] to form a larger data set and repeat the experiments.

One can also explore other anatomical surfaces, since optimal transport and persistent homology do not require special structure on the input data set. For example, brain imaging methods generate a large amount of data for studying activity in human brains.

2http://morphobrowser.biocenter.helsinki.fi/

One can also explore other data sets where interesting topology is observed. ShapeNetCore3 is an online database that contains 3D models of 51,300 items across 55 categories of common objects, such as airliners, bathtubs, and pianos.

3https://www.shapenet.org/

Appendix A: Main functions and scripts

A.1 OT approach

A.1.1 Compute local distribution

function v = localDist(dm,NN,DD,mu)
%%%% Input:
%%%% dm: pairwise distance matrix
%%%% NN: number of nodal points in [0,DD]
%%%% DD: size of neighborhood
%%%% mu: measure
n = length(dm);
TimeMesh = linspace(0,DD,NN);
v = zeros(n,NN);
for i = 1:n
    for j = 1:NN
        ii = find(dm(i,:) <= TimeMesh(j));
        v(i,j) = sum(mu(ii));
    end
end
v = v/max(v(:));
end

A.1.2 Compute TLB

function compute_TLBpar(K, epsParam, niter, DD, NN, muX, muY, metric_type, normed, data_path, localDist_path, save_path)
%%%% Input:
%%%% K: number of sampled points
%%%% epsParam: entropic regularizer
%%%% niter: number of iterations for Sinkhorn's algorithm
%%%% DD: size of neighborhood
%%%% NN: number of points in the mesh of [0,DD]
%%%% muX: measure on X
%%%% muY: measure on Y
%%%% metric_type: "euc" for Euclidean distance as cost
%%%%              "geo" for geodesic distance as cost
%%%% normed: boolean for normalizing distance
%%%% data_path, localDist_path, save_path: paths for loading data,
%%%%              saving local distributions, and saving output
addpath('sinkhorn/matlab/'); % add path to Sinkhorn's algorithm
addpath('toolbox_general/'); % add path to toolbox_general
savename = ['K' num2str(K) '_epsParam' num2str(epsParam) ...
    '_niter' num2str(niter) '_DD' num2str(DD)];
if normed
    savename = ['normed_' savename];
end
if strcmp(metric_type,'geo')
    savename = ['geo' savename];
else
    savename = ['euc' savename];
end
localDist_path = [localDist_path savename '/'];
disp(['locDist path: ' localDist_path])
if ~exist(localDist_path)
    mkdir(localDist_path);
end

%% options for Sinkhorn solver
options.tau = 0;
options.niter = niter;
options.verb = 0;
EpsParam = epsParam;

%% read meshes and process
dir_meshes = dir([data_path '*.off']);
nm = length(dir_meshes);
if ~exist(localDist_path)
    mkdir(localDist_path)
end
if ~exist(['tlb_cache/' savename '/'])
    mkdir(['tlb_cache/' savename '/'])
end

M = 200;
dim = min(nm,M);
p = nchoosek(1:dim,2);
parfor k = 1:size(p,1)
    ik = p(k,1);
    jk = p(k,2);
    if exist(['tlb_cache/' savename '/tlb' num2str(ik) '_' num2str(jk) '.mat'],'file') ~= 2
        try
            T = load([localDist_path num2str(ik) '.mat']);
            vX = T.v;
        catch
            vX = process_dm([data_path dir_meshes(ik).name],ik,K,NN,DD, ...
                muX,metric_type,normed,localDist_path);
        end
        try
            T = load([localDist_path num2str(jk) '.mat']);
            vY = T.v;
        catch
            vY = process_dm([data_path dir_meshes(jk).name],jk,K,NN,DD, ...
                muY,metric_type,normed,localDist_path);
        end

        %% compute cost matrix (p=1)
        Q = zeros(K,K);
        for i = 1:K
            vi = vX(i,:);
            for j = 1:K
                vj = vY(j,:);
                Q(i,j) = sum(abs(vi-vj))*DD/NN;
            end
        end

        %% compute TLB using Sinkhorn's algorithm
        [u,v,gamma,Wprimal,Wdual,err] = sinkhorn_log(muX,muY,Q,epsParam,options);
        tlb = sum(sum(Q.*gamma)); % this calculates the TLB (p=1)
        save_gamma(gamma,ik,jk,savename);
        save_tlbij(tlb,ik,jk,savename);
    end
end

%% read tlb for each ij
dir_tlb = dir(['tlb_cache/' savename '/*.mat']);
TLB = zeros(length(dir_meshes));
for l = 1:length(dir_tlb)
    namel = dir_tlb(l).name;
    [i,j,tlb] = read_tlbij(['tlb_cache/' savename '/' namel]);
    TLB(i,j) = tlb;
end
size(TLB)
TLB = max(TLB,TLB');

%% put everything together -- prepare results so they can be saved
disp('Saving TLB...')
results.TLB = TLB;
results.DD = DD;
results.NN = NN;
results.EpsParam = EpsParam;
results.options = options;
results.K = K;
save(['results_' savename '.mat'],'results');
end

function v = process_dm(filename,ik,K,NN,DD,mu,metric_type,normed,localDist_path)
%%%% input:
%%%% filename: path to the mesh file to be read
%%%% ik: index of the file in the directory (also acts
%%%%     as the saved file name)
%%%% NN: number of nodal points in [0,DD]
%%%% DD: size of neighborhood
%%%% mu: measure
%%%% metric_type: "euc" for Euclidean distance as cost
%%%%              "geo" for geodesic distance as cost
%%%% localDist_path: path to where local distributions are saved
[T,X,Y,Z] = read_off_ph(filename);
disp('Computing Euclidean Farthest Point Sampling...')
if K < length(X)
    I = euclid_far_samp(X,Y,Z,K);
else
    I = 1:length(X);
end
% compute dm
dmX = distance_matrix(metric_type,I,T,X,Y,Z);
diamX = max(max(dmX));
if normed
    dmX = dmX/diamX; % normalize dm
end
v = localDist(dmX,NN,DD,mu);
savetomat([localDist_path num2str(ik) '.mat'],v,I,dmX);
end

A.2 The PH approach

A.2.1 Fill 1-cycles

%% load all the names for the meshes that contain a 1-cycle
%% res: structure variable with one field ``fnames''
%% fnames: cell array of the filenames of the meshes that
%%         contain a 1-cycle
res = load('../data/cache/all_loops.mat');
fnames = res.fnames;
wd = pwd;
datapath = '../../CPsurfcomp/DATA/teeth/meshes/';
savepath = '../data/cache/ShortLoop/';
shortloop_path = '../../ShortLoop/';
options.curvature_smoothing = 1; % structure variable for
                                 % curvature computation
max_dimension = 3;
num_divisions = 1000;
dim = 2;

% find shortest sets of generators of 1-cycles using ShortLoop
cd(shortloop_path)
for f = 1:length(fnames)
    system(['./ShortLoop ../CPsurfcomp/DATA/teeth/meshes/' fnames{f} ' -v -t'])
end
cd(wd)
system(['mv ../../CPsurfcomp/DATA/teeth/meshes/*_loops* ' savepath])
system(['mv ../../CPsurfcomp/DATA/teeth/meshes/*_timing* ' savepath])
for f = 1:length(fnames)
    disp('loading loops')
    all_loops = load_loop([savepath fnames{f}]);
    [Tp,Xp,Yp,Zp] = read_off_ph([datapath fnames{f}]);
    disp('filling holes')
    [T,X,Y,Z] = fill_holes(Tp,Xp,Yp,Zp,all_loops);
    save(['../data/cache/fillhole_T/' fnames{f}(1:end-4) '.mat'], ...
        'T','X','Y','Z');
end

function all_loops = load_loop(fname)
%%%% load shortest loops
%%%% input:
%%%% fname: name of file

%% read file
filestr = fileread([fname(1:end-4) '_loops.txt']);
filebyline = regexp(filestr, '\n', 'split');
%% eliminate empty rows
filebyline(cellfun(@isempty,filebyline)) = [];
all_loops = cell(length(filebyline)-2,1);
for l = 2:length(filebyline)-1
    line = regexp(filebyline{l}, ':', 'split');
    loop = regexp(line{2}, ' ', 'split');
    loop(cellfun(@isempty,loop)) = [];
    loop_id = str2double(loop)+1;
    all_loops{l-1} = loop_id;
end
end

function [Tp,Xp,Yp,Zp] = fill_holes(T,X,Y,Z,all_loops)
%%%% fill loops
%%%% input:
%%%% T: triangular mesh
%%%% X,Y,Z: x,y,z coordinates of vertices
%%%% all_loops: cell array of generators of 1-cycles
%%%% output:
%%%% Tp: new triangular mesh where 1-cycles are filled
%%%% Xp,Yp,Zp: new x,y,z coordinates of vertices
num_holes = length(all_loops);
Tp = T; Xp = X; Yp = Y; Zp = Z;
for h = 1:num_holes
    hole = all_loops{h};
    if length(hole) == 3
        Tp(end+1,:) = hole;
    else
        % add the centroid of the loop as a new vertex and cone over it
        Xp = [Xp; mean(X(hole))];
        Yp = [Yp; mean(Y(hole))];
        Zp = [Zp; mean(Z(hole))];
        center = length(Xp);
        for j = 1:length(hole)
            if j == length(hole)
                k = 1;
            else
                k = j+1;
            end
            curr_triangle = [center,hole(j),hole(k)];
            Tp(end+1,:) = curr_triangle;
        end
    end
end
end

A.2.2 Compute persistence diagrams of sublevel set filtration of a function in (4.3)-(4.6)

function [infinite_barcodes,dgm_dict] = sublevel_filtration_simp(T,f,dim)
%%%% input:
%%%% T: a triangulation of the shape
%%%% f: function values at the vertices
%%%% dim: maximum dimension of persistence diagrams
%%%% output:
%%%% infinite_barcodes: list of infinite barcodes
%%%% dgm_dict: cell array of size dim-by-1 where each row stores
%%%%           the (dim-1)th persistence diagram

%% add javaplex path
wd = pwd;
addpath('../../matlab_examples/')
addpath('../../appliedtopology-javaplex-6a2ef48/')
load_javaplex;
import edu.stanford.math.plex4.*;

%% initialize a filtered simplicial complex
stream = api.Plex4.createExplicitSimplexStream(max(f)+10);
for face = T'
    edges_ind = nchoosek(1:length(face),2); % set index of edges
    for k = 1:size(edges_ind,1)
        vertexA = face(edges_ind(k,1));
        vertexB = face(edges_ind(k,2));
        stream.addVertex(vertexA,f(vertexA));
        stream.addVertex(vertexB,f(vertexB));
        stream.addElement([vertexA,vertexB],max(f(vertexA),f(vertexB)));
    end
    stream.addElement(face,max(f(face)));
end
if ~stream.validateVerbose() % validate the filtered simplicial complex
    disp('Not a valid filtered simplicial complex')
    return
end
stream.finalizeStream();
%% compute the persistence diagrams
persistence = api.Plex4.getModularSimplicialAlgorithm(dim,2);
intervals = persistence.computeAnnotatedIntervals(stream);
%% return the set of infinite barcodes
infinite_barcodes = intervals.getInfiniteIntervals();
for curr_dim = 1:dim
    % extract the (curr_dim-1)-dimensional diagram as a matrix of
    % (birth,death) pairs via javaplex's standard accessor
    dgm = homology.barcodes.BarcodeUtility.getEndpoints(intervals,curr_dim-1,0);
    %% remove trivial points in a persistence diagram for Hera
    dgm_dict{curr_dim} = remove_trivial(dgm);
end
clear stream
cd(wd)
end

function simp_dgm = remove_trivial(M)
%%%% removes trivial points in a persistence diagram for Hera
%%%% input:
%%%% M: persistence diagram
if isempty(M)
    simp_dgm = M;
    return;
end
try
    zero_persist = find(abs(M(:,1) - M(:,2)) < 1e-4);
catch
    disp('not able to simplify dgm')
end
simp_dgm = M;
simp_dgm(zero_persist,:) = [];
end

A.3 Probability of error (Pe)

function Pe = leave_one_out(dm,labels)
%%%% input:
%%%% dm: pairwise distance matrix
%%%% labels: true labels
n = size(dm,1);
[~,nearest_neighbor] = sort(dm,2);
mock_label = labels(nearest_neighbor(:,2));
indicator = (mock_label == labels);
Pe = (n - sum(indicator))/n;
end

Bibliography

[1] Classics of traditional Chinese medicine: Emperors and physicians, 04 2012.

[2] Abdallah A. Alshennawy. Extract the geometry of mechanical parts by vision system using Hough transform. International Journal of Control, Automation and Systems, 3(2), 04 2014.

[3] Reema AlAifari, Ingrid Daubechies, and Yaron Lipman. Continuous Procrustes distance between two surfaces. Communications on Pure and Applied Mathematics, 66(6):934–964, 2013.

[4] Karen L. Baab, Jonathan M. G. Perry, F. James Rohlf, and William L. Jungers. Phylogenetic, ecological, and allometric correlates of cranial shape in Malagasy lemuriforms. Evolution, 68(5):1450–1468.

[5] Jon L. Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, 09 1975.

[6] Claude Berge. Two theorems in graph theory. Proceedings of the National Academy of Sciences, 43(9):842–844, 1957.

[7] Dimitris Bertsimas and John Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1st edition, 1997.

[8] Paul J. Besl and Neil D. McKay. A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, Feb 1992.

[9] Garrett Birkhoff. Extensions of Jentzsch's theorem. Transactions of the American Mathematical Society, 85(1):219, 1957.

[10] Doug M. Boyer. Relief index of second mandibular molars is a correlate of diet among prosimian primates and other euarchontan mammals. Journal of Human Evolution, 55(6):1118–1137, 2008.

[11] Doug M. Boyer, Yaron Lipman, Elizabeth St. Clair, Jesus Puente, Biren A. Patel, Thomas Funkhouser, Jukka Jernvall, and Ingrid Daubechies. Algorithms to automatically quantify the geometric similarity of anatomical surfaces. Proceedings of the National Academy of Sciences, 108(45):18221–18226, 2011.

[12] Doug M. Boyer, Jesus Puente, Justin T. Gladman, Chris Glynn, Sayan Mukherjee, Gabriel S. Yapuncich, and Ingrid Daubechies. A new fully automated approach for aligning and comparing shapes. The Anatomical Record, 298(1):249–276.

[13] Yann Brenier. Polar factorization and monotone rearrangement of vector-valued func- tions. Communications on Pure and Applied Mathematics, 44(4):375–417, 1991.

[14] Andrew V.Z. Brower. Problems with DNA barcodes for species delimitation: ten species of Astraptes fulgerator reassessed (Lepidoptera: Hesperiidae). Systematics and Biodiversity, 4(2):127–132, 2006.

[15] Dmitri Burago, Yuri Burago, and Sergei Ivanov. A course in metric geometry. Graduate Studies in Math., 33, 01 2001.

[16] Gunnar Carlsson, Afra Zomorodian, Anne Collins, and Leonidas J. Guibas. Persistence barcodes for shapes. International Journal of Shape Modeling, 11(02):149–187, 2005.

[17] Frédéric Chazal, David Cohen-Steiner, Leonidas J. Guibas, Facundo Mémoli, and Steve Y. Oudot. Gromov-Hausdorff stable signatures for shapes using persistence. In Proceedings of the Symposium on Geometry Processing, SGP ’09, pages 1393–1403, Aire-la-Ville, Switzerland, 2009. Eurographics Association.

[18] Vincent Cicirello and William C. Regli. Machining feature-based comparisons of mechanical parts. In Proceedings International Conference on Shape Modeling and Applications, pages 176–185, 05 2001.

[19] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, 01 2007.

[20] David Cohen-Steiner and Jean-Marie Morvan. Restricted Delaunay triangulations and normal cycle. In ACM SYMPOSIUM ON COMPUTATIONAL GEOMETRY, 2003.

[21] Nicolas Courty, Rémi Flamary, Devis Tuia, and Thomas Corpetti. Optimal transport for data fusion in remote sensing. In 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages 3571–3574, 07 2016.

[22] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transportation distances. Advances in Neural Information Processing Systems, 26, 06 2013.

[23] Tamal K Dey, Jian Sun, and Yusu Wang. Approximating cycles in a shortest basis of the first homology group from point data. Inverse Problems, 27(12):124004, 11 2011.

[24] Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271, 1959.

[25] Manfredo P. do Carmo. Differential Geometry of Curves and Surfaces: Revised and Updated Second Edition. Dover Books on Mathematics. Dover Publications, 2016.

[26] Herbert Edelsbrunner and John Harer. Persistent homology - a survey. Discrete Computational Geometry - DCG, 453, 01 2008.

[27] Herbert Edelsbrunner, David G. Kirkpatrick, and Raimund Seidel. On the shape of a set of points in the plane. IEEE Transactions on Information Theory, 29(4):551–559, 07 1983.

[28] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. Discrete Computational Geometry, 28:511–533, 2002.

[29] Alon Efrat, Alon Itai, and Matthew J. Katz. Geometry helps in bottleneck matching and related problems. Algorithmica, 31(1):1–28, 09 2001.

[30] Sven Erlander and Neil S. Stewart. The gravity model in transportation analysis: theory and extensions. VSP, 1990.

[31] Alistair R. Evans and Gordon D. Sanson. The effect of tooth shape on the breakdown of insects. Journal of Zoology, 246(4):391–400.

[32] Rémi Flamary, Cédric Févotte, Nicolas Courty, and Valentin Emiya. Optimal spectral transportation with application to music transcription. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 703–711, USA, 2016. Curran Associates Inc.

[33] Joel Franklin and Jens Lorenz. On the scaling of multidimensional matrices. Linear Algebra and its Applications, 114-115:717–735, 1989. Special Issue Dedicated to Alan J. Hoffman.

[34] Patrizio Frosini. A distance for similarity classes of submanifolds of a Euclidean space. Bulletin of the Australian Mathematical Society, 42(3):407–415, 1990.

[35] Alfred Galichon. Optimal Transport Methods in Economics. Princeton University Press, 1 edition, 2016.

[36] Wilfrid Gangbo and Robert J. McCann. The geometry of optimal transportation. Acta Math., 177(2):113–161, 1996.

[37] Philip D. Gingerich. Function of pointed premolars in Phenacolemur and other mammals. Journal of Dental Research, 53(2):497–497, 1974. PMID: 4521916.

[38] Hugh C. J. Godfray and Sandra Knapp. Taxonomy for the twenty-first century - introduction. Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 359:559–69, 05 2004.

[39] Afzal Godil. Facial shape analysis and sizing system. In Vincent G. Duffy, editor, Digital Human Modeling, pages 29–35, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.

[40] William K. Gregory. The origin and evolution of the human dentition: A palontological review. Journal of Dental Research, 3(1):87–228, 1921.

[41] Frederick E. Grine. Dental evidence for dietary differences in Australopithecus and Paranthropus: a quantitative analysis of permanent molar microwear. Journal of Human Evolution, 15(8):783–822, 1986.

[42] Xianfeng D. Gu and Shing-Tung Yau. Computational Conformal Geometry. Advanced lectures in mathematics. International Press, 2008.

[43] John E. Hopcroft and Richard M. Karp. An n^{5/2} algorithm for maximum matchings in bipartite graphs. In 12th Annual Symposium on Switching and Automata Theory (swat 1971), pages 122–125, 10 1971.

[44] Edwin T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630, 05 1957.

[45] Leonid Kantorovich. On the transfer of masses (in Russian). Transactions of the Amer- ican Mathematical Society, 37(2):227–229, 1942.

[46] Richard Kay. Molar structure and diet in extant Cercopithecidae, pages 309–339. 01 1978.

[47] Richard Kay, Blythe Williams, and Federico Anaya. The Adaptations of Branisella boliviana, the Earliest South American Monkey, pages 339–370. 01 2002.

[48] Richard F. Kay. The functional adaptations of primate molar teeth. American journal of physical anthropology, 43 2:195–216, 1975.

[49] David G. Kendall. Shape Manifolds, Procrustean Metrics, and Complex Projective Spaces. Bulletin of the London Mathematical Society, 16(2):81–121, 03 1984.

[50] Michael Kerber, Dmitriy Morozov, and Arnur Nigmetov. Geometry helps to compare persistence diagrams. J. Exp. Algorithmics, 22:1.4:1–1.4:20, 09 2017.

[51] Soheil Kolouri, Serim Park, Matthew Thorpe, Dejan Slepcev, and Gustavo K. Rohde. Optimal mass transport: Signal processing and machine-learning applications. IEEE Signal Processing Magazine, 34:43–59, 07 2017.

[52] James Lennox. Aristotle's biology. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, spring 2019 edition, 2019.

[53] Michael Lesnick. The optimality of the interleaving distance on multidimensional persistence modules. CoRR, abs/1106.5305, 2011.

[54] Carl Linnaeus. Systema naturae. 1753.

[55] Kevin Liu, Sindhu Raghavan, Serita Nelesen, C. Randal Linder, and Tandy Warnow. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science (New York, N.Y.), 324:1561–4, 07 2009.

[56] Peter W. Lucas. Dental Functional Morphology: How Teeth Work. Cambridge University Press, 2004.

[57] Mahammed Mahboubi and Marc Godinot. Earliest known simian primate found in Algeria. Nature, 04 1992.

[58] Yasushi Makihara and Yasushi Yagi. Earth mover’s morphing: Topology-free shape morphing using cluster-based emd flows. In Ron Kimmel, Reinhard Klette, and Akihiro Sugimoto, editors, Computer Vision – ACCV 2010, pages 202–215, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.

[59] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.

[60] Benjamin Mathon, François Cayre, Patrick Bas, and Benoît Macq. Optimal transport for secure spread-spectrum watermarking of still images. IEEE Transactions on Image Processing, 23(4):1694–1705, 04 2014.

[61] Jeffrey A. McNeely. Biodiversity and ecosystem insecurity: A planet in peril edited by Ahmed Djoghlaf and Felix Dodds. The Quarterly Review of Biology, 88(4):336–336, 2013.

[62] Rudolf Meier, Kwong Shiyang, Gaurav Vaidya, and Peter K L Ng. DNA Barcoding and Taxonomy in Diptera: A Tale of High Intraspecific Variability and Low Identification Success. Systematic Biology, 55(5):715–728, 10 2006.

[63] Facundo Mémoli. On the use of Gromov-Hausdorff distances for shape comparison. Proceedings Point Based Graphics, pages 81–90, 01 2007.

[64] Facundo Mémoli. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011.

[65] Mark Meyer, Mathieu Desbrun, Peter Schröder, and Alan H. Barr. Discrete differential-geometry operators for triangulated 2-manifolds. In Hans-Christian Hege and Konrad Polthier, editors, Visualization and Mathematics III, pages 35–57, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.

[66] Philipp Mitteroecker and Philipp Gunz. Advances in geometric morphometrics. Evolutionary Biology, 36(2):235–247, 06 2009.

[67] Gaspard Monge and Augustus De Morgan. Memoire sur la theorie des deblais et des remblais. Imprimerie royale, 1781.

[68] Andrew Moore. Efficient Memory-based Learning for Robot Control. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, 03 1991.

[69] James R. Munkres. Elements Of Algebraic Topology. CRC Press, 2018.

[70] O. Museyko, Michael Stiglmayr, Kathrin Klamroth, and Günter Leugering. On the application of the Monge–Kantorovich problem to image registration. SIAM J. Imaging Sciences, 2:1068–1097, 2009.

[71] Tom Needham. Introduction to applied algebraic topology. Available at https://drive.google.com/file/d/1SCrKHfZdDuMmSKlZ7xveQT8SqBHjFEkk/view, 02 2019.

[72] Jonathan M. G. Perry. Inferring the diets of extinct giant lemurs from osteological correlates of muscle dimensions. The Anatomical Record, 301(2):343–362.

[73] Gabriel Peyré and Marco Cuturi. Computational Optimal Transport. 2018.

[74] Gabriel Peyré, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein Averaging of Kernel and Distance Matrices. In ICML 2016, Proc. 33rd International Conference on Machine Learning, New-York, United States, 06 2016.

[75] Stuart Pimm and Peter Raven. Biodiversity - extinction by numbers. Nature, 403:843–5, 03 2000.

[76] Jesús Puente. Distances and algorithms to compare sets of shapes for automated biological morphometrics. PhD thesis, Princeton University, 2013.

[77] Vanessa Robins. Towards computing homology from approximations. Topology Pro- ceedings, 24, 01 1999.

[78] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 11 2000.

[79] George G. Simpson. Studies of the Earliest Mammalian Dentitions. 1936.

[80] Richard Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Statist., 35(2):876–879, 06 1964.

[81] Hojun Song, Jennifer E. Buhay, Michael F. Whiting, and Keith A. Crandall. Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proceedings of the National Academy of Sciences, 105(36):13486–13491, 2008.

[82] Svetlana Stepanović, Andrea Kosovac, Oliver Krstić, Jelena Jović, and Ivo Toševski. Morphology versus DNA barcoding: two sides of the same coin. A case study of Ceutorhynchus erysimi and C. contractus identification. Insect Science, 23(4):638–648.

[83] Suzanne G. Strait. Molar morphology and food texture among small-bodied insectivo- rous mammals. Journal of Mammalogy, 74(2):391–402, 1993.

[84] V. N. Sudakov. Geometric problems in the theory of infinite-dimensional probability distributions. American Mathematical Society, 1979.

[85] Frederick S. Szalay. The beginnings of primates. Evolution, 22(1):19–36, 1968.

[86] Ayellet Tal. 3D Shape Analysis for Archaeology, pages 50–63. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.

[87] Andrew Tausz, Mikael Vejdemo-Johansson, and Henry Adams. JavaPlex: A research software package for persistent (co)homology. In Han Hong and Chee Yap, editors, Proceedings of ICMS 2014, Lecture Notes in Computer Science 8592, pages 129–136, 2014. Software available at http://appliedtopology.github.io/javaplex/.

[88] Jeremy Thomas, Mark Telfer, David B. Roy, Chris D. Preston, Jeremy Greenwood, J Asher, Richard Fox, Ralph Clarke, and J. H. Lawton. Comparative losses of British butterflies, birds, and plants and the global extinction crisis. Science (New York, N.Y.), 303:1879–81, 04 2004.

[89] Christopher Tralie, Nathaniel Saul, and Rann Bar-On. Ripser.py: A lean persistent homology library for python. The Journal of Open Source Software, 3(29):925, 09 2018.

[90] Katharine Turner, Sayan Mukherjee, and Doug M. Boyer. Persistent homology transform for modeling shapes and surfaces. Information and Inference: A Journal of the IMA, 3(4):310–344, 12 2014.

[91] P.U. Unschuld. Medicine in China: A History of Ideas. Comparative Studies of Health Systems and Medical Care. University of California Press, 1985.

[92] Cédric Villani. Topics in Optimal Transportation. Graduate studies in mathematics. American Mathematical Society, 2003.

[93] Cédric Villani. Optimal transport – Old and new, volume 338, pages xxii+973. 01 2008.

[94] Etienne Vouga. Lectures in discrete differential geometry 3 discrete surfaces. Available at https://www.cs.utexas.edu/users/evouga/uploads/4/5/6/8/45689883/notes3.pdf, 03 2014.

[95] Brenton Walker. Using persistent homology to recover spatial information from encounter traces. pages 371–380, 01 2008.

[96] Wei Wang, J A Ozolek, Dejan Slepčev, Ann B Lee, Cheng Chen, and G K Rohde. An optimal transportation approach for nuclear structure-based pathology. IEEE Transactions on Medical Imaging, 30(3):621–631, 2011.

[97] Wei Wang, Dejan Slepčev, Saurav Basu, John A. Ozolek, and Gustavo K. Rohde. A linear optimal transportation framework for quantifying and visualizing variations in sets of images. International Journal of Computer Vision, 101(2):254–269, 01 2013.

[98] Max Wardetzky. Convergence of the Cotangent Formula: An Overview, pages 275–286. Birkhäuser Basel, Basel, 2008.

[99] Lei Zhu, Yan Yang, Steven Haker, and Allen Tannenbaum. An image morphing tech- nique based on optimal mass preserving mapping. IEEE Transactions on Image Pro- cessing, 16(6):1481–1495, 06 2007.
