Shape classification via Optimal Transport and Persistent Homology

A Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Mathematical Sciences in the Graduate School of The Ohio State University

By

Ying Yin, B.S.

Graduate Program in Mathematical Sciences

The Ohio State University

2019

Master’s Examination Committee:

Facundo Mémoli, Advisor
Tom Needham, Advisor
Janet Best, Committee member

© Copyright by

Ying Yin

2019

Abstract

Quantifying similarity between shapes is an important task in many disciplines, such as architecture, anatomy, security, and manufacturing. My research project is motivated by taxonomic studies in Biology. Taxonomy is the classification of biological organisms based on shared characteristics. In this thesis, we will explore two approaches, based on optimal transport and persistent homology, to discriminating shapes through defining a meaningful distance that reflects geometric or topological features of the shapes under study. By approximating lower bounds to the Gromov-Wasserstein distance and the Gromov-Hausdorff distance, we automate the process of taxon classification by comparing geometric or topological features of anatomical surfaces. We test our implementations on a data set containing surfaces of crowns of teeth that are from primates and non-primate close relatives, obtained from [11].

Acknowledgments

I want to thank my advisors, Facundo Mémoli and Tom Needham, for all the help and guidance that they have provided over the past two years. I appreciate that they motivated me to do my best and kept me on schedule. I also would like to thank Janet Best for sitting on my committee.

I am very glad that Facundo introduced me to the TGDA group, where I received tremendous help. I want to thank Woojin Kim for his patience in helping me understand optimal transport and persistent homology. I also want to thank Samir Chowdhury and Kritika Singhal for proofreading my thesis. Their comments were extremely helpful.

Special thanks to my boyfriend, Kairui Zhang, for keeping me sane when I was stressed out. I want to acknowledge Pingfan Hu for being an entertaining friend for many years. Kiwon Lee and Weihong Su, thank you for keeping me company for the past two years. In particular, I want to thank Kairui and Kiwon for proofreading my thesis and providing many valuable comments.

I want to thank Slawomir Solecki, who triggered my interest in Mathematics and was also a good friend. This thesis would not be written if it were not for his encouragement.

Lastly, I want to thank my parents for their support during my study and at every stage of my life.

Vita

2017 ...... B.S., Double major in Mathematics and Economics, University of Illinois at Urbana-Champaign
2017-present ...... Graduate Teaching Associate and Research Associate, The Ohio State University.

Fields of Study

Major Field: Mathematical Sciences

Table of Contents

Page

Abstract ...... ii

Acknowledgments ...... iii

Vita...... iv

List of Tables ...... vii

List of Figures ...... viii

1. Introduction ...... 1

1.1 Motivation ...... 1
1.2 Overview of shape analysis ...... 3
1.3 Optimal Transport ...... 4
1.3.1 Wasserstein distance ...... 5
1.3.2 Gromov-Wasserstein distance ...... 5
1.3.3 Computation of TLB ...... 6
1.4 Persistent homology ...... 6
1.4.1 Bottleneck distance ...... 7
1.4.2 Computation of the Bottleneck distance ...... 7
1.5 Application of the TLB and the bottleneck distance ...... 7

2. Optimal Transport ...... 10

2.1 Brief History ...... 10
2.2 Monge-Kantorovich formulation ...... 12
2.3 Wasserstein distance ...... 14
2.4 Gromov-Wasserstein distance ...... 15
2.5 Third lower bound ...... 16
2.6 Sinkhorn’s Algorithm ...... 18

3. Persistent Homology ...... 21

3.1 Brief History ...... 21
3.2 Simplicial homology ...... 22
3.3 Functoriality of Hk ...... 24
3.4 Persistent homology ...... 25
3.5 Persistence diagrams ...... 27
3.5.1 The four point example ...... 29
3.6 Bottleneck distance ...... 31
3.7 Interleaving distance ...... 31
3.8 Stability results of persistence diagrams ...... 37
3.8.1 Stability of Vietoris-Rips filtration ...... 37
3.8.2 Stability of filtration functions ...... 37
3.9 Computation of bottleneck distance ...... 38

4. Experiments ...... 42

4.1 Summary of data ...... 42
4.2 Overview of experiments ...... 44
4.2.1 Quantitative measure of quality of classification ...... 47
4.3 The OT approach ...... 47
4.3.1 Outline of the OT approach ...... 47
4.3.2 Results of using the OT approach ...... 50
4.3.3 Using Euclidean distance with uniform probability measures ...... 51
4.3.4 Using geodesic distance with uniform probability measures ...... 52
4.3.5 Using Voronoi probability measures ...... 55
4.3.6 Summary of results using the OT approach ...... 55
4.4 The PH Approach ...... 58
4.4.1 Preprocessing data ...... 58
4.4.2 Mean curvature ...... 58
4.4.3 Outline of the PH approach ...... 60
4.4.4 Results of using the PH approach ...... 61
4.4.5 Using Vietoris-Rips filtration ...... 62
4.4.6 Using mean curvature based filtration functions ...... 64
4.4.7 Summary of results using the PH approach ...... 70
4.5 Comparison of the results from the OT approach and the PH approach ...... 71

5. Contributions and Future Work ...... 74

5.1 Conclusion ...... 74
5.2 Future work ...... 74
5.2.1 Improvement on the OT approach ...... 74
5.2.2 Improvement on the PH approach ...... 75
5.2.3 Other approaches ...... 75
5.2.4 Experiment on different data sets ...... 75

Appendices ...... 77

A. Main functions and scripts ...... 77

A.1 OT approach ...... 77
A.1.1 Compute local distribution ...... 77
A.1.2 Compute TLB ...... 77
A.2 The PH approach ...... 79
A.2.1 Fill 1-cycles ...... 79
A.2.2 Compute persistence diagrams of sublevel set filtration of a function in (4.3) - (4.6) ...... 81
A.3 Probability of error (Pe) ...... 82

Bibliography ...... 83

List of Tables

Table Page

4.1 Statistics of families, genera and diets of the teeth in the data set...... 43

4.2 Parameters that can be tuned in the experiment. The choices of D are not uniformly distributed. The “Normalization” column in the table indicates if the distance matrix is normalized...... 46

4.3 All combinations of parameters that we tried using the PH approach. For each method, we consider both the normalized and the unnormalized versions...... 46

4.4 Probability of Error table for experiments using optimal transport...... 50

4.5 Parameters used in the experiments that achieve the lowest Pe for each label category...... 57

4.6 Probability of error of experiments using persistent homology...... 62

4.7 Probability of error of experiments where filtered simplicial complex is built through Vietoris-Rips filtration...... 63

4.8 Probability of error of experiments where filtered simplicial complex is built through Vietoris-Rips with modified weight given in (4.1)...... 64

4.9 Probability of error of experiments where filtered simplicial complex is built through the filtration function (4.2) in section 4.4.6...... 66

4.10 Probability of error of experiments where filtered simplicial complex is built through the filtration function (4.3) and normalized (4.5). “absMeancurv” represents (4.3) and “minus absMeancurv” represents (4.5)...... 70

4.11 Change in Pe observed in Table 4.10 that is caused by switching from dB,∞ to dB,2 for the distance between persistence diagrams...... 70

4.12 List of methods with the best Pe using the PH approach...... 71

List of Figures

Figure Page

1.1 Surface of the crown of a tooth in the data set belonging to an animal in family Lemuridae with frugivorous diet. The color bar on the side indicates mean curvature...... 8

1.2 Dendrogram of classification results of TLB using D = 0.32. An outlier is excluded. Labels indicate dietary preferences of the owners of the teeth...... 9

2.1 Example of an optimal transport problem. Figure 1 in [51]...... 11

2.2 Figure from [73] showing two examples of Monge’s problem. On the left is a case where the cardinalities of the two spaces are the same and each point has equal weight. Hence, the optimal transport map is a permutation. However, on the right is an example where a transport map from the red dots to the blue dots does not exist...... 13

3.1 Figure from [95]. It shows the Betti numbers βk for different shapes...... 24

3.2 Example of a Vietoris-Rips complex on a set of three points. Figure 3.2a shows the original set. Figures 3.2b and 3.2c show all the simplices in Kr as r increases...... 27

3.3 An example of persistence diagrams of applying Vietoris-Rips filtration on a sampled point cloud of a circle. On the left is the sampled circle with radius 0.5 centered at (0.5,0.5). The middle figure shows the persistence diagram in the 0th dimension and the figure on the right shows the persistence diagram in the 1st dimension. We observe one point ...... 29

3.4 Example of a simplicial complex with filtration function defined in the following way: all vertices enter at filtration value 0; the value of the filtration function at an edge is given by the weight of the edge; once three edges form a triangle, we add a face to the interior of the triangle...... 30

3.5 Filtered simplicial complex Kt at filtration value t. Note that a cycle in H1 appears when t = 1 and is “killed” by the new triangles that appear when t = √2...... 30

3.6 Persistence barcodes in H0 and H1 of the filtered simplicial complex. . . . . 30

3.7 Persistence barcode representation of all the cases of the relative positions between I1 and I2. Figure from [71]. An interval module is indicated by a line connecting the left endpoint, the midpoint and the right endpoint...... 34

3.8 Figure from [68]. An example of a k-d tree...... 41

4.1 Figure from [10] showing crowns of teeth from animals with different diets...... 44

4.2 Histogram of the diameters of the teeth in Euclidean (Figure 4.2a) and geodesic (Figure 4.2b) distances. The tooth that belongs to Megaladapis has diameter about 24.6mm (in Euclidean distance) and 32.7mm (in geodesic distance). It is excluded from both histograms...... 45

4.3 An example of a Voronoi partition. We generate 5000 random points and pick 7 points as representatives using the Farthest Point Search algorithm (explained in section 4.3.1). 4.3a shows the 5000 points as small blue dots; the larger colored dots are the 7 representatives. 4.3b shows the Voronoi cells associated to each representative...... 49

4.4 Error against niter when ε = 0.005 and K = 2000...... 50

4.5 Figure shows the dendrogram using Euclidean distance, uniform probability measure and D = 0.36. The Pe for diet is 0.494...... 52

4.6 Best dendrogram (in terms of structure) using geodesic distance and uniform probability measure with D = 0.35. The separation between the clusters is less optimal than in the best dendrogram using Euclidean distance as the metric. The Pe for diet is 0.557...... 53

4.7 Dendrograms of results using the OT approach with geodesic distance, uniform probability measures and D = 0.3 (Figure 4.7a), 0.33 (Figure 4.7b) and 0.5 (Figure 4.7c). All three experiments yield Pe = 0.519 for dietary classification...... 54

4.8 Figure shows the values of Pe for family (Figure 4.8a), genus (Figure 4.8b) and diet (Figure 4.8c) against the choice of D when using uniform probability measures with either Euclidean distance or geodesic distance as cost. We observe that the lowest Pe’s when using Euclidean distance are always lower than those when using geodesic distance...... 56

4.9 Dendrogram for the distance matrix using the Voronoi probability measures and geodesic distance with D = 0.35. This method produces the lowest Pe (= 0.494) for diet when using the OT approach...... 57

4.10 Examples of holes in meshes. Both figures are zoomed in to better show the holes. These holes are visually hard to detect when one is looking at an entire triangulated surface. Figure 4.10a shows an example of a loop that is caused by missing triangles. The boundary of the hole is polygonal. Figure 4.10b is an example of a hole caused by filling triangles at the wrong place: the mesh becomes non-manifold in this case...... 58

4.11 Figure 4.11a shows an example of a 1-cycle in a triangular mesh. Figure 4.11b shows the new mesh with the cycle (shown on the left) filled. Darker areas are new triangles added to the mesh. The yellow vertex is the centroid of the cycle and is added to the list of vertices that generate the mesh...... 59

4.12 Figure from Wikipedia created by Eric Gaba at https://en.wikipedia.org/ wiki/Curvature#/media/File:Minimal_surface_curvature_planes-en.svg 59

4.13 Figure from Wikipedia created by Cepheus at https://en.wikipedia.org/ wiki/Curvature#/media/File:Osculating_circle.svg. Given a point p, an osculating circle is shown as a blue circle in the figure...... 60

4.14 Figure from [94]. The figure on the left shows the triangle on which A(△(v0, v1, v2)) is computed. The figure on the right shows the angles used to compute the cotangent formula...... 61

4.15 Dendrogram of the bottleneck distance matrix of 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance. The Pe for diet is 0.430, which is also the lowest Pe among all the experiments using the PH approach...... 63

4.16 Dendrograms of the bottleneck distances between the 1st-persistence diagrams. The persistence diagrams are produced using the modified geodesic distance in (4.1) in the Vietoris-Rips filtration. Figure 4.16a shows the result when setting α = 30. Figure 4.16b shows the result when setting α = 50. Both results yield Pe = 0.519 for dietary classification...... 65

4.17 Figure shows the dendrogram of the result when using (4.2) with λ = 0.1 to construct a filtered simplicial complex. The Pe for diet is 0.506...... 67

4.18 Crown surfaces of a collection of teeth in the data set that are used for testing. Title of each subfigure shows family, diet and code of the tooth. Teeth in the same row share the same family and diet. The coloring on the surfaces indicates mean curvatures at the vertices...... 68

4.19 Dendrograms for filtration functions listed in (4.3) - (4.6). Each row corresponds to a filtration function in the order of (4.3) - (4.6). Odd columns show dendrograms of bottleneck distance matrices of 0th-persistence diagrams whereas even columns show those of 1st-persistence diagrams. The left half of the figure is from unnormalized filtration functions and the right half is where normalized filtration functions are applied...... 69

4.20 Figure shows the dendrogram of the result when using dB,2 on 1st-persistence diagrams for Normalized (4.3). The Pe for diet is 0.620...... 71

4.21 Clustergrams (heatmaps with dendrograms) for the distance matrices that produce the best Pe for dietary classification using the OT and PH approaches. Figure 4.21a is produced by using Euclidean distance, uniform probability measures, D = 0.36 and K = 2000. Figure 4.21b is produced by using dB,∞ to compare 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance. Labels on the right indicate diets and labels at the bottom show the family of each tooth...... 73

Chapter 1: Introduction

1.1 Motivation

In this thesis, we will explore two approaches, based on optimal transport and persistent homology, to discriminating shapes through defining a “meaningful” distance that reflects geometric or topological features of the shapes under study. The need for quantifying similarities between shapes arises as an aid to decision making and classification tasks in many fields, such as architecture, anatomy, security, and manufacturing. To tackle shape comparison problems, a large amount of research has proposed different techniques, e.g. landmark-based comparison, functional analysis, and geometric comparison. A more detailed review of shape analysis will be provided in section 1.2.

My research project is motivated by taxonomic studies in Biology. Taxonomy is the classification of biological organisms based on shared morphological, anatomical or genetic characteristics. A group of organisms in a taxonomy is called a taxon (plural: taxa). These taxa yield a ranking of groups of organisms. The principal ranks in modern taxonomy (in decreasing order) are domain, kingdom, phylum, class, order, family, genus and species. Traversing the taxonomic hierarchy, we develop a sense of “proximity” between organisms.

Taxonomy arose naturally in the history of humankind. An emperor in Chinese mythology, named Shen Nung, from around 3000 BC, is said to have tasted hundreds of herbs to test their medical values and is believed to have composed the Shen-nung pen ts’ao ching (Divine Husbandman’s Materia Medica), the earliest extant Chinese pharmacopoeia [1]. The first compilation of the book was prepared by a Taoist named T’ao Hung-ching (452 - 536) [91]. Although attempts at building a classification of organisms were observed in ancient ages for agricultural and medical purposes, Aristotle is considered the first to formalize taxonomy in his work History of Animals around 350 BC [52]. In 1753, Linnaeus introduced the binomial system of nomenclature that we are using today in his work Systema naturae [54, 38]. Later, the International Code of Zoological Nomenclature and the International Code of Botanical Nomenclature were created to structure the scientific naming process of organisms and are still in use to the present day. Despite the fact that the codes decide the formality of the names, the actual taxon of an organism is decided by biologists.

Taxonomy plays an important role in biology. It gives a unique ID to each taxon to assist communication within the biology community. The number of taxa allows biologists to understand biodiversity on the Earth as the diversity of species is in peril [61]. Habitat losses are causing species to become extinct at an increasing rate [75, 88] and thus, the conservation of biodiversity is more urgent than ever. Taxonomy provides a means of conserving species, as biologists can study a close relative of an endangered species to reveal insights on how to improve the habitat of the endangered one. Moreover, [38] pointed out that taxonomy paved the ground for physiology, genetics, ecology and evolutionary biology. Based on the work done in developing the taxonomy, biologists are able to devise new similarities between organisms, which slowly reveal the origin of life. However, the slow rate at which taxonomists can describe or identify organisms obstructs the future of taxonomy [62].

The best way of determining the proximity between organisms is an open question. There have been various attempts at developing notions of proximity in biology. A recently popular approach to measuring similarity between two organisms is by studying their genetic sequences. The 1-dimensional sequential alignment of genomes is in the form of permutations of four nitrogenous bases: adenine (A), thymine (T), guanine (G), and cytosine (C). Such representation of DNA sequences reduces the complexity of computing similarity measures and encourages automation of the process [55]. A DNA sequence based tool called DNA barcoding has received high popularity for animal taxa identification [14]. It uses a short region of mitochondrial cytochrome c oxidase 1 (COI) as a unique identifier for organisms [81]. DNA barcoding has shown success in identifying genetic diversity and unusual patterns in genetic variability that are not yet well understood [62]. However, empirical results suggest that COI does not always act as an identifier for organisms [81]. Moreover, DNA barcoding initiatives have not reconciled the discrepancies between DNA barcoding predictions and the nomenclature given by classical taxonomy [82].

Another traditional but more intuitive approach to quantifying similarity is to compare the geometry of gross anatomical structures of organisms. For example, [11] computed the Procrustes distance between landmarks on surfaces of teeth. A biologist who studies the geometry of anatomical surfaces is called a morphologist. Usually, morphologists label 10 to 100 landmark points on anatomical surfaces and compare the coordinates of such landmarks to determine similarities [66]. However, labelling landmarks requires domain knowledge and the result does not extend to the anatomical structures of new species [79]. In addition, the variety of anatomical surfaces provides different angles for understanding proximity between species and may lead to differing conclusions of similarity. To overcome the limitations of personal knowledge, we would like to develop a systematic way of determining proximity.

We will propose two different frameworks, one using optimal transport and the other using persistent homology, to quantify similarities between organisms in a morphological sense: our objective is to automate the process of taxon classification by comparing the geometric or topological features of anatomical surfaces. Both approaches realize two important distances that one can define on shapes: the Gromov-Wasserstein distance and the Gromov-Hausdorff distance. To develop a sensible metric that recovers taxonomic proximity, we will compute approximations to the Gromov-Wasserstein distance and the Gromov-Hausdorff distance on dental information of a group of primates (e.g. prosimian primates) and non-primate close relatives (e.g. Dermoptera, Scandentia, and Plesiadapiforms).

Many biologists have used dentition to understand evolutionary histories because ecology has direct consequences on dietary preference [37, 40, 41]. Differences in sizes and shapes of teeth have long been considered as reflectors of diet since teeth are used for processing food. For example, Kay stated in 1975 that species with different diets have different molar structures [48]. In addition, mammalian teeth are unique identifiers of taxonomy [10]. By testing our approximations of the Gromov-Wasserstein distance and the Gromov-Hausdorff distance on second molars of animals which belong to Euarchonta, we hope to develop a good automated classifier that recovers taxonomy or diets.

1.2 Overview of shape analysis

Motivated by real life scenarios, shape analysis techniques aim to provide meaningful invariants that quantify similarities between shapes under deformations or rotations. The need for such quantification of similarity appears in different domains, e.g. in documenting archaeological objects [86], in comparing anatomical structures [11], in recognizing same or different faces [39] and in finding mechanical parts in a large database [2, 18]. In many of these studies, similarities were originally determined by human experts. However, visually comparing shapes is an expensive and laborious task. Moreover, results are subjective and may be inconsistent. On the other hand, computer algorithms that can automatically compare shapes are systematic. Recent advances in computational technologies have enabled such automatic processes of shape comparison. There have been many studies proposing different tools for shape analysis. Popular approaches include landmark-based shape analysis, conformal geometry, optimal transport and persistent homology.

A mathematical extension to the approach originally used by morphologists, which relied on choosing landmarks, is landmark-based shape analysis. It was one of the first formal methods for the analysis of shapes and was first suggested by Kendall [49] in 1981. The idea is that, given a collection of landmarks, one can apply vector calculus by treating landmarks as vectors after removing the action of translation, scaling and rotation. This approach frees morphologists from visual inspections of landmarks. However, it still depends highly on domain expertise in choosing landmarks. A solution to landmark dependency is to study the boundaries or silhouettes of shapes instead of landmarks.

Conformal geometry is an alternative tool for the comparison of anatomical structures. Originating from Riemannian geometry, conformal mapping studies angle-preserving transformations between pairs of shapes. An introduction to the subject, along with applications, for example, in studying brain images, can be found in [42]. In [11], a distance on two-dimensional surfaces is defined using conformal mappings. However, the limitation of conformal geometry is that it only applies to smooth manifolds.

In the following sections, we will introduce optimal transport and persistent homology. One of the benefits of using optimal transport and persistent homology is that they are applicable to point clouds (collections of points) as well as manifolds. Hence, contrary to conformal geometry, which can only be used to study smooth surfaces, optimal transport and persistent homology can study a wider range of data sets. In addition, the cost function in optimal transport and the filtration function in persistent homology provide freedom in detecting various features in the shapes that address users’ interests more directly.

1.3 Optimal Transport

Optimal transport is the theory of finding the most efficient way of moving a distribution of mass from one object to another. Let (X, dX) be a metric space and let α and β be two probability measures on X. We call (X, dX, α) and (X, dX, β) metric-measure spaces. A coupling µ between α and β is a probability measure on the space X × X such that for all measurable A ⊂ X, µ(A × X) = α(A) and µ(X × A) = β(A). A coupling between α and β describes a transport plan for moving mass from α to β. We refer our readers to Chapter 2 for more precise definitions.
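On finite spaces these conditions become concrete matrix constraints. The sketch below (helper names are mine, not from the thesis code) checks the two marginal conditions and builds the product coupling, which is always admissible.

```python
# A coupling between finite measures alpha and beta is a nonnegative matrix
# whose row sums equal alpha and whose column sums equal beta. The product
# coupling mu[i][j] = alpha[i] * beta[j] always satisfies both conditions.

def is_coupling(mu, alpha, beta, tol=1e-9):
    rows = [sum(row) for row in mu]                                  # mu(A x X)
    cols = [sum(mu[i][j] for i in range(len(mu))) for j in range(len(mu[0]))]
    return all(abs(r - a) < tol for r, a in zip(rows, alpha)) and \
           all(abs(c - b) < tol for c, b in zip(cols, beta))

alpha, beta = [0.2, 0.8], [0.5, 0.3, 0.2]
product_coupling = [[a * b for b in beta] for a in alpha]
```

The product coupling corresponds to moving mass independently of position; optimal transport searches over all admissible couplings for the cheapest one.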

1.3.1 Wasserstein distance

Let p ∈ [1, ∞). A useful distance between two probability measures on a metric space is called the p-Wasserstein distance, given by

$$d_{W,p}(\alpha, \beta) := \inf_{\mu \in \mathcal{M}(\alpha,\beta)} \left( \int_{X \times X} d_X(x, y)^p \, \mu(dx, dy) \right)^{1/p}$$

where $\mathcal{M}(\alpha, \beta)$ is the collection of all admissible couplings between α and β. We call $d_X(x, y)^p$ a cost function. The p-Wasserstein distance is the least possible cost of moving from distribution α to β, and the optimal solution µ describes the transport plan.

The Wasserstein distance arises from the Monge-Kantorovich formulation of the optimal transport problem, which we will discuss in Chapter 2. When p = 1, the Wasserstein distance is also called the Earth mover’s distance and is widely studied in the computer vision literature, e.g. in image retrieval [78]. When studying shapes, we may put a probability measure of importance on points and then apply optimal transport to obtain the Wasserstein distance. However, note that in order to compare two metric spaces using the Wasserstein distance, we need to embed the two spaces into a common ambient metric space and then compute the Wasserstein distance in the ambient space. In this case, the positions of the two shapes in the ambient space matter because of the cost function $d_X(x, y)$.
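For intuition, the p-Wasserstein distance between two small point sets with uniform measures can be computed by brute force: for uniform marginals of equal size an optimal coupling may be taken to be a permutation, so it suffices to minimize over all pairings. This is only a sketch for tiny inputs (O(n!) pairings; the function name is illustrative, not from the thesis code), not a practical solver.

```python
# Brute-force p-Wasserstein distance between two n-point sets in R^d carrying
# uniform measures: minimize (1/n * sum of d(x_i, y_{sigma(i)})^p)^(1/p) over
# all permutations sigma.
from itertools import permutations

def wasserstein_uniform(xs, ys, p=2):
    n = len(xs)
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    best = min(
        sum(dist(xs[i], ys[pi]) ** p for i, pi in enumerate(perm)) / n
        for perm in permutations(range(n))
    )
    return best ** (1.0 / p)
```

For example, translating a two-point set vertically by one unit gives distance 1, since the optimal plan matches each point with its translate.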

1.3.2 Gromov-Wasserstein distance

In many cases, the relative position of two metric spaces inside an ambient space is not important. Instead, we use pairwise distances as a representation of each shape and define a new distance between these representations. In this way, we do not need an embedding to compute a distance between two metric-measure spaces.

Let $\mathcal{X} = (X, d_X, \alpha)$ and $\mathcal{Y} = (Y, d_Y, \beta)$ be two metric-measure spaces. Let p ∈ [1, ∞). The p-Gromov-Wasserstein distance is defined as

$$d_{GW,p}(\mathcal{X}, \mathcal{Y}) := \inf_{\mu \in \mathcal{M}(\alpha,\beta)} \left( \iint_{X \times Y} |d_X(x, x') - d_Y(y, y')|^p \, \mu(dx, dy)\, \mu(dx', dy') \right)^{1/p}.$$

A disadvantage of the Gromov-Wasserstein distance is that it is more expensive to compute than the Wasserstein distance. Instead, we calculate lower bounds of the Gromov-Wasserstein distance to reduce the complexity of computation. In this project, we will use the Third Lower Bound (TLB) [63].

The local distribution is defined as

$$F_X(x, r) := \alpha(B_X(x, r)),$$

and $F_X^{-1}(x, t) := \inf\{u \in \mathbb{R} \mid F_X(x, u) > t\}$ is the generalized inverse of $F_X(x, t)$. The Third Lower Bound (TLB) [63] is defined as

$$\mathrm{TLB}_p(\mathcal{X}, \mathcal{Y}) := \inf_{\mu \in \mathcal{M}(\alpha,\beta)} \left( \int_{X \times Y} c(x, y) \, \mu(dx, dy) \right)^{1/p}$$

where

$$c(x, y) := \int_0^{D} |F_X^{-1}(x, t) - F_Y^{-1}(y, t)|^p \, dt$$

is the cost function of the TLB. Note that in the cost function, we only integrate the local distribution up to D. Hence, D controls the locality in the TLB and it can be tuned to output the “best” TLB for a specific classification task.

The TLB provides a cheaper alternative to the Gromov-Wasserstein distance in terms of computation. We will explain the naming (why it is the “third” lower bound) and show that the TLB is truly a lower bound in Chapter 2.
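On a finite metric-measure space these ingredients have direct analogues. The sketch below (illustrative helper names, not the code of Appendix A.1) evaluates the local distribution of a point from its row of the distance matrix, evaluates the generalized inverse by scanning cumulative mass, and approximates the TLB cost c(x, y) by a Riemann sum over [0, D].

```python
# row = distances from one point to all points; weights = point masses (sum 1).

def local_distribution(row, weights, r):
    # F_X(x, r) = alpha(B_X(x, r)): total mass within distance r of x
    return sum(w for d, w in zip(row, weights) if d <= r)

def inverse_local_distribution(row, weights, t):
    # F_X^{-1}(x, t) = inf { u : F_X(x, u) > t }
    cum = 0.0
    for d, w in sorted(zip(row, weights)):
        cum += w
        if cum > t:
            return d
    return max(row)

def tlb_cost(row_x, wx, row_y, wy, D, p=1, steps=200):
    # c(x, y) ~ midpoint-rule integral of |F_X^{-1}(x,t) - F_Y^{-1}(y,t)|^p on [0, D]
    h = D / steps
    return sum(
        abs(inverse_local_distribution(row_x, wx, (k + 0.5) * h)
            - inverse_local_distribution(row_y, wy, (k + 0.5) * h)) ** p * h
        for k in range(steps)
    )
```

The matrix of costs c(x, y) then feeds a single ordinary optimal transport problem, which is what makes the TLB cheaper than the Gromov-Wasserstein distance itself.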

1.3.3 Computation of TLB

In my thesis work, the computation of the TLB was done by Sinkhorn’s algorithm [22], where entropy is integrated into the optimal transport problem. Entropy quantifies the randomness in a distribution. The entropy of a coupling µ is defined as $H(\mu) := -\int_{X \times Y} \log(\mu(x, y)) \, \mu(dx, dy)$. Let ε > 0. Sinkhorn’s algorithm solves an entropic regularized optimal transport problem

$$\inf_{\mu \in \mathcal{M}(\alpha,\beta)} \int_{X \times Y} c(x, y) \, \mu(dx, dy) - \varepsilon H(\mu).$$

The entropic regularized optimization problem guarantees the existence of an optimizer [22]. Moreover, the solution can be found using Sinkhorn’s iterative method [80] with linear convergence [33].

The result of Sinkhorn’s algorithm depends on the choice of the entropic regularizer (ε) and the number of iterations (niter). We fix ε = 0.005 and niter = 44 when using Sinkhorn’s algorithm to ensure a good and computable approximation to the original optimal transport problem.
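A minimal version of the iteration can be sketched as follows. This is the standard Sinkhorn scaling scheme from the literature, not the thesis implementation; for poorly scaled cost matrices a log-domain stabilized variant would be preferable.

```python
# Given cost matrix C, marginals a and b, and regularizer eps, form the Gibbs
# kernel K = exp(-C / eps) and alternately rescale rows and columns so that the
# coupling P[i][j] = u[i] * K[i][j] * v[j] has the prescribed marginals.
import math

def sinkhorn(C, a, b, eps=0.005, niter=44):
    n, m = len(C), len(C[0])
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(niter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(P[i][j] * C[i][j] for i in range(n) for j in range(m))
    return P, cost
```

With the cost matrix of TLB costs c(x, y) as C, the returned `cost` approximates the inner optimal transport value in the TLB.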

1.4 Persistent homology

Persistent homology summarizes topological features of a topological space or of a function on a topological space [26]. We call this function a filtration function. Through collecting the sublevel sets of a filtration function, we build an increasing sequence of simplicial complexes, called a filtered simplicial complex, from a set. This filtered simplicial complex allows the computation of (k-th) persistent homology vector spaces that detect and track k-dimensional cycles. Let $\overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$ be the extended real line. Due to the interval decomposition of persistent homology vector spaces, we obtain a (k-th) persistence diagram (or persistence barcode), which is a multiset of points in $\overline{\mathbb{R}}^2$. The off-diagonal points in a persistence diagram record the “birth” (time of appearance) and “death” (time of disappearance) of these k-dimensional cycles with finite multiplicity. The diagonal points in a persistence diagram are assigned infinite multiplicity.
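As a concrete illustration of how a filtration yields a diagram, the 0th persistence diagram of a Vietoris-Rips filtration on a finite point cloud can be read off with a union-find pass over the edges sorted by length. This is a sketch for intuition only; the actual computations in this thesis use JavaPlex and Ripser (section 1.4.2).

```python
# Every point is a connected component born at 0. Scanning edges by increasing
# length, each merge of two components "kills" one of them, producing the
# diagram point (0, edge length); one component lives forever.

def h0_diagram(points):
    n = len(points)
    d = lambda i, j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
    edges = sorted((d(i, j), i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, length))   # a component dies at this scale
    diagram.append((0.0, float("inf")))     # the surviving component
    return diagram
```

For three collinear points at 0, 1 and 3, the two merges happen at scales 1 and 2, and one component persists forever.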

1.4.1 Bottleneck distance

The bottleneck distance $d_{B,\infty}$ between two k-th persistence diagrams $D_k(f)$ and $D_k(g)$, for two filtration functions f and g, is defined as

$$d_{B,\infty}(D_k(f), D_k(g)) := \inf_{\varphi} \sup_{x \in D_k(f)} \|x - \varphi(x)\|_\infty$$

where $\varphi : D_k(f) \to D_k(g)$ ranges over bijections. That is, we find the best way of matching the points in two persistence diagrams and define the bottleneck distance to be the ∞-distance of the farthest pair in the best bijection.

Similarly, a p-bottleneck distance is defined as

Z 1/p p dB,p(Dk(f), Dk(g)) := inf kx − ϕ(x)k∞dx . ϕ Dk(f)
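On very small diagrams the bottleneck distance can be checked directly by brute force (a sketch; Hera, used in this thesis, solves the equivalent matching problem efficiently, see section 1.4.2). Since off-diagonal points may also be matched to the diagonal, each diagram is augmented with the diagonal projections of the other diagram's points, and matching two diagonal points costs nothing.

```python
# Brute-force bottleneck distance between two tiny persistence diagrams, each a
# list of (birth, death) pairs. Only feasible for a handful of points.
from itertools import permutations

def bottleneck(D1, D2):
    proj = lambda p: ((p[0] + p[1]) / 2.0, (p[0] + p[1]) / 2.0)
    A = list(D1) + [proj(q) for q in D2]     # augmented diagrams, equal sizes
    B = list(D2) + [proj(p) for p in D1]
    if not A:
        return 0.0
    def cost(p, q):
        if p[0] == p[1] and q[0] == q[1]:    # diagonal-to-diagonal is free
            return 0.0
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    return min(
        max(cost(A[i], B[pi]) for i, pi in enumerate(perm))
        for perm in permutations(range(len(B)))
    )
```

For instance, a single point matched against the empty diagram is sent to the diagonal, so the distance equals half its persistence.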

1.4.2 Computation of the Bottleneck distance

We use JavaPlex [87] and Ripser [89] to compute the persistence diagrams and Hera [50] for computations of the bottleneck distances. Hera converts the optimization problem of finding the best bijection into a graph matching problem. It uses a k-d tree [5] to search for the best matching in the graph. Details are discussed in Section 3.9.

1.5 Application of the TLB and the bottleneck distance

In this thesis, we will explore several implementations of the optimal transport and persistent homology approaches and test the performances of these implementations on a data set of triangulated surfaces of teeth studied in [11]. These teeth belong to either primates or non-primate close relatives. Our objective for this project is to build an automatic classifier that recovers either taxonomic classification (e.g. biological family or genus) or dietary classification.

An actual tooth in the data set is shown in Figure 1.1. There are 116 teeth in the data set. Each tooth was cut off in the middle so that only the crown is shown.

Figure 1.1: Surface of the crown of a tooth in the data set, belonging to an animal in the family Lemuridae with a frugivorous diet. The color bar on the side indicates mean curvature.

In the hope of developing a new means of automated classification, we tune the parameters to obtain a "good" classifier that resembles either the taxonomic or the dietary classification. When experimenting with the TLB, we tuned three parameters: D, the cost function, and the probability measures. When experimenting with the bottleneck distance, we used different filtration functions with either dB,∞ or dB,2. For further details, readers can proceed to Chapter 4.

We will present our results as dendrograms of single-linkage hierarchical clustering. The dendrograms are ordered so that the closest clusters are merged first. We classify the teeth by three types of labelling: family (biological), genus (biological), and dietary preference. There are 24 families, 37 genera, and four dietary types. Labels are adapted from [11]. A "good" dendrogram should show clusters of teeth with the same labels at each level.
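The single-linkage dendrograms in this thesis were produced with standard software; as a sketch of the underlying merge procedure (the function name and the toy distance matrix below are ours), a naive implementation repeatedly merges the pair of clusters at minimal single-linkage distance:

```python
def single_linkage(D):
    """Naive single-linkage clustering on a symmetric distance matrix D.
    The single-linkage distance between two clusters is the minimum
    pairwise distance between their members. Returns the list of merges
    as (members_of_cluster_a, members_of_cluster_b, merge_distance),
    closest clusters merged first."""
    n = len(D)
    clusters = {i: {i} for i in range(n)}
    merges = []
    while len(clusters) > 1:
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[2]:
                        best = (a, b, d)
    # merge the closest pair and record it
        a, b, d = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]
    return merges
```

On four points on a line at positions 0, 1, 5, 6, the two tight pairs merge first at distance 1 and the final merge happens at the single-linkage distance 4.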

For better comprehension of the results in this thesis, we will only show visualizations through dendrograms using diets as labels. For more results, we refer our readers to the webpage, where the dendrograms for any available combination of D, metric, probability measure, and label can be viewed by choosing values in the dropdown menus. There are four different diets in the data set: folivorous (leaf-eating), frugivorous (fruit-eating), insectivorous (insect-eating), and omnivorous (unrestricted diet).

We observe that the "best" classification result in terms of structure is obtained using the TLB with D = 0.32 (Figure 1.2). Although we cannot find four major clusters that separate the four dietary types, we observe that at the top of the hierarchy (i.e., among the largest clusters that merge in the end), we have two clusters that almost separate insectivorous and omnivorous from frugivorous and folivorous.

Figure 1.2: Dendrogram of the classification results of the TLB with D = 0.32. An outlier is excluded. Labels indicate the dietary preferences of the owners of the teeth.

We compute the probability of error (Pe) in leave-one-out classification to quantify the quality of our results. Fixing a labelling category (family, genus, or diet), we predict the label of each tooth in the data set to be the label of its nearest neighbor according to our distance matrix. The Pe is the fraction of items whose predicted labels differ from their true labels. We observe that the best Pe for the different labelling categories is obtained in different experiments.
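This leave-one-out rule is simple to state in code; the following minimal sketch (function and variable names are ours) computes Pe from any precomputed distance matrix and label list:

```python
def leave_one_out_pe(D, labels):
    """Leave-one-out probability of error: each item is assigned the label
    of its nearest neighbor (itself excluded) in the distance matrix D,
    and Pe is the fraction of mismatches with the true labels."""
    n = len(labels)
    errors = 0
    for i in range(n):
        nearest = min((k for k in range(n) if k != i), key=lambda k: D[i][k])
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / n
```

Two well-separated groups with consistent labels give Pe = 0, while alternating labels within tight pairs give Pe = 1.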

When using family and genus as labels, the lowest Pe's are 0.737 and 0.793, respectively, both obtained in experiments using the optimal transport approach. When using diet as labels, the lowest Pe is 0.430, obtained in an experiment using the persistent homology approach.

Chapter 2: Optimal Transport

2.1 Brief History

Optimization is a task that we do on a daily basis. We want to take the shortest path to save time when commuting; we want to make the right decisions to avoid failures; essentially, we want to succeed with minimal effort. An economist may identify such optimization behaviors as acts of rationality. Due to scarcity of resources, we need to be rational to maximize the output from limited resources. Hence, economics was developed to study how societies can best utilize these resources. Since economics is a rather broad subject and society is a large entity, we often break the task down into smaller cases to better study the subject. A possible scenario in a construction site may be finding the best way of moving a pile of soil from one place to another. Various theories were proposed to find a general solution of the optimal transport problem under a similar setting.

Optimal transport is the theory of finding the most efficient way of moving a distribution of mass (continuous or discrete) from one object to another. The problem was formalized in 1781 by the French mathematician Gaspard Monge [67]. The weighted sum of distances between two mass distributions of soil can be considered the cost of the transportation.

Then moving a pile of soil optimally to another is equivalent to finding a map between two mass distributions with minimal cost. Hence, Monge’s formulation aims at finding an optimal transport map between two mass distributions with respect to a cost function.

Later, in 1942, the Soviet mathematician Leonid V. Kantorovich developed a general version of the optimal transport problem in which mass splitting is allowed [45]. That is, a point in the support of the measure can be mapped to two different destinations. Hence, the resulting optimal solution is no longer a map; we call the solution of Kantorovich's formulation a transport plan. Although the new formulation sounds harder to solve than the original problem proposed by Monge, Kantorovich converted the optimal transport problem into a linear programming problem. The optimal solution exists and can be found in polynomial time [22]. This generalization of the optimal transport problem won Kantorovich the Nobel Prize in Economics in 1975. Figure 2.1 shows the difference between Monge's formulation and Kantorovich's relaxation.

Figure 2.1: Example of an optimal transport problem. Figure 1 in [51].

Kantorovich's work led to many studies in the field of optimal transport [51]. Sudakov showed the existence of the optimal transport map in 1979 [84]. In the 1990s, Brenier studied the characterization, existence, and uniqueness of the optimal transport map [13], and Gangbo and McCann provided a geometric interpretation of the problem [36].

Thanks to this large body of theory, optimal transport has become computationally tractable and is gradually being applied in different areas [73]. In image processing, optimal transport has extensive uses in image matching [99, 97, 70], image fusion [21], medical imaging [96], shape registration [58], and image watermarking [60]. Optimal transport has also been used in music transcription [32] and in economics [35].

Optimal transport gives rise to two important metrics defined on the space of all metric-measure spaces (i.e. metric spaces equipped with measures): the Wasserstein distance and the Gromov-Wasserstein distance. These distances are consequences of geometric properties of the optimal transport problem. The Wasserstein distance is a distance between two metric-measure spaces embedded in a common ambient space, whereas the Gromov-Wasserstein distance is defined between any two metric-measure spaces.

However, computing the Gromov-Wasserstein distance leads to an instance of a Quadratic Assignment problem and is hence NP-hard. A handful of lower bounds for the Gromov-Wasserstein distance are introduced in [63]. In this thesis, we only use the Third Lower Bound, as it is tighter than the First Lower Bound [63]. Moreover, the TLB allows finer detection of local structures via local distributions, whereas the Second Lower Bound only uses the global distribution.

The actual computation of the optimal transport problem is approximated using Sinkhorn's algorithm [22]. The algorithm adds an entropic regularization term to the original optimal transport problem and hence turns it into a strictly convex problem. Instead of lying at a vertex of the domain, the solution to the new problem is located in the interior, with sufficient smoothness. The solution obtained by Sinkhorn's fixed point iteration is guaranteed to converge [33].

2.2 Monge-Kantorovich formulation

Let (X, dX) be a metric space. The Borel σ-algebra B(X) on X is the smallest collection of subsets of X that contains all open subsets of X and is closed under countable union, countable intersection, and complement. An element of the Borel σ-algebra is called a Borel set.

A Borel probability measure on X is a function µ : B(X) → [0, 1] such that µ(X) = 1 and, for any countable collection {Si}i of pairwise disjoint measurable subsets of X, µ(⋃i Si) = Σi µ(Si). A subset B ⊂ X is measurable in X if B ∈ B(X). We denote the space of Borel probability measures on X by P1(X).

Let α ∈ P1(X) and let Y be a metric space. Let f : X → Y be a continuous function. We say that f is a measurable function if for all measurable B ⊂ Y, f^{-1}(B) is also measurable in X.

For a measurable function f, we define the pushforward measure β = f#α on Y as follows: for all measurable B ⊂ Y,

β(B) := α({x ∈ X | f(x) ∈ B}) = α(f^{-1}(B)).

We observe that β is a probability measure on Y. From [92], we have that for all measurable functions g : Y → R,

∫_Y g(y) β(dy) = ∫_X g ∘ f(x) α(dx).    (2.1)

Such an f is also called a transport map in Monge's formulation.

Monge's optimal transport problem is then

M(α, β) := inf_{f ∈ MP} ∫_X c(x, f(x)) α(dx),    (2.2)

Figure 2.2: Figure from [73] showing two examples of Monge's problem. On the left is a case where the cardinalities of the two spaces are the same and each point has equal weight; hence the optimal transport map is a permutation. On the right, however, is an example where a transport map from the red dots to the blue dots does not exist.

where MP := {f : X → Y |f#α = β} and c : X × Y → R+ is a cost function that determines the cost of transporting an object x ∈ X to f(x) ∈ Y .

The existence of an optimal transport map depends on the measures α, β as well as the cost function c. Suppose X and Y are finite metric spaces with |X| = |Y| = n. Let α ∈ P1(X) and β ∈ P1(Y) be uniform measures, i.e. α({xi}) = β({yj}) = 1/n for all xi ∈ X and yj ∈ Y. Then, when solving (2.2), we are looking for an optimal permutation σ : {1, ..., n} → {1, ..., n} such that the total cost of sending each xi to yσ(i) is minimized. Hence, (2.2) becomes

min_σ (1/n) Σ_{i=1}^{n} c(xi, yσ(i)),

which is an optimal assignment problem [73]. However, if |X| < |Y|, then a transport map does not exist. Figure 2.2 shows two examples of optimal transport problems from the set of blue points to the set of red ones. We note that a transport map from the red points to the blue ones does not exist in the case on the right of Figure 2.2.
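For tiny n, the assignment problem above can be solved exactly by enumerating all permutations. The sketch below (names ours) does exactly that; in practice one would use the Hungarian algorithm or a routine such as scipy.optimize.linear_sum_assignment instead.

```python
from itertools import permutations

def optimal_assignment(C):
    """Brute-force solution of the assignment form of (2.2) for uniform
    measures on equal-size spaces: minimize (1/n) * sum_i C[i][sigma(i)]
    over all permutations sigma. C is the n x n cost matrix."""
    n = len(C)
    best_cost, best_perm = float("inf"), None
    for sigma in permutations(range(n)):
        cost = sum(C[i][sigma[i]] for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, sigma
    return best_cost, best_perm
```

For the cost matrix [[0, 1], [1, 0]] the identity permutation has average cost 0, so it is optimal.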

In general, difficulties arise in computing the optimal solution of (2.2): both the objective function and the constraints are non-linear with respect to f. In addition, the domain MP of the problem is non-convex [73]. To tackle these challenges, we consider Kantorovich's formulation, which converts the optimal transport problem in (2.2) into a linear programming problem.

Let α and β be two probability measures on two metric spaces (X, dX) and (Y, dY) respectively. A coupling µ between α and β is a probability measure on the product space X × Y such that for all measurable A ⊂ X, µ(A × Y) = α(A), and for all measurable B ⊂ Y, µ(X × B) = β(B). Alternatively, we can rewrite the constraints as ΠX#µ = α and ΠY#µ = β, where ΠX : X × Y → X and ΠY : X × Y → Y are the canonical projections. We call α and β the marginals of µ. By considering a coupling instead of a permutation or a transport map, we allow the mass at a single point xi ∈ X to be distributed over multiple destinations in Y. The idea of mass splitting suggests that instead of a deterministic model for mass transportation, one can consider a probabilistic approach [73].

Let M(α, β) be the collection of all possible couplings between α and β. Kantorovich's formulation is defined as

K(α, β) := min_{µ ∈ M(α,β)} ∫_{X×Y} c(x, y) µ(dx × dy).    (2.3)

A minimizer of (2.3) is called an (optimal) transport plan. Notice that in (2.3), the objective function and the constraints are linear with respect to µ. Moreover, (2.3) always has a solution [73]. We observe that M(α, β) is non-empty, since the trivial coupling α ⊗ β, given by the tensor product (α ⊗ β)(Ba × Bb) = α(Ba)β(Bb) for all measurable Ba ⊂ X and Bb ⊂ Y, is in M(α, β).
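Since (2.3) is a linear program, small discrete instances can be solved with an off-the-shelf LP solver. The sketch below is our own encoding using scipy.optimize.linprog: the coupling µ is flattened into a vector and the two marginal conditions become linear equality constraints.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(alpha, beta, C):
    """Solve the discrete Kantorovich problem (2.3) as a linear program.
    alpha (length n) and beta (length m) are probability vectors; C is
    the n x m cost matrix. The coupling mu is flattened row-major."""
    n, m = C.shape
    A_eq = []
    for i in range(n):                    # row sums of mu equal alpha
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row)
    for j in range(m):                    # column sums of mu equal beta
        col = np.zeros(n * m)
        col[j::m] = 1.0
        A_eq.append(col)
    b_eq = np.concatenate([alpha, beta])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, m)

# Toy instance: 0.1 units of mass must cross at cost 1, so K = 0.1.
cost, plan = kantorovich(np.array([0.4, 0.6]), np.array([0.5, 0.5]),
                         np.array([[0.0, 1.0], [1.0, 0.0]]))
```

The optimal plan keeps as much mass as possible on the zero-cost diagonal and moves only the excess 0.1.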

If a transport map exists in Monge's formulation, then we can rewrite Monge's problem in Kantorovich's formulation. For any transport map f, we define a transport plan µ = (Id × f)#α. Then for any measurable subsets A ⊂ X and B ⊂ Y we have

µ(A × B) = α((Id × f)^{-1}(A × B)) = α(A ∩ f^{-1}(B)).

In particular, µ(A × Y) = α(A) and µ(X × B) = α(f^{-1}(B)) = β(B), so µ is indeed a coupling between α and β.

Therefore,

∫_{X×Y} c(x, y) µ(dx × dy) = ∫_{X×Y} c(x, y) ((Id × f)#α)(dx × dy)
                           = ∫_X c ∘ (Id × f)(x) α(dx)
                           = ∫_X c(x, f(x)) α(dx).

If there exists an optimal transport map f, then the coupling given by (Id × f)#α is an optimal transport plan [93]. However, the converse is not true, since a transport map does not always exist.

2.3 Wasserstein distance

Given two Borel probability measures α and β on a metric space (X, dX), the Wasserstein distance measures how different these measures are by solving Kantorovich's optimal transport problem. For p ≥ 1, we define the p-Wasserstein distance between α and β to be

dW,p^X(α, β) := ( inf_{µ ∈ M(α,β)} ∫_{X×X} dX(x, x′)^p µ(dx × dx′) )^{1/p}.    (2.4)

Note that to compare two metric-measure spaces (X, dX, α) and (Y, dY, β) using the p-Wasserstein distance, we need to embed X and Y into a common ambient metric space (Z, dZ) and compute dW,p^Z between the images of α and β in Z. Moreover, the p-Wasserstein distance is sensitive to the positions of X and Y in the ambient space Z. To overcome this limitation of the p-Wasserstein distance, we can use the Gromov-Wasserstein distance.

2.4 Gromov-Wasserstein distance

Again, suppose that Z = (Z, dZ) is a metric space and X, Y ⊂ Z. A correspondence between X and Y is any R ⊆ X × Y such that

• for all x ∈ X, there is y ∈ Y with (x, y) ∈ R;

• for all y′ ∈ Y, there is x′ ∈ X with (x′, y′) ∈ R.

Let R(X, Y) denote the set of all correspondences between X and Y. Then we define the Hausdorff distance between X and Y as

dH^Z(X, Y) := min_{R ∈ R(X,Y)} max_{(x,y) ∈ R} dZ(x, y).

We define the Gromov-Hausdorff distance between X and Y as

dGH(X, Y) := inf_{Z, γX, γY} dH^Z(γX(X), γY(Y)),    (2.5)

where γX : X → Z and γY : Y → Z are isometric embeddings.

We can also define the Gromov-Hausdorff distance in the following way:

dGH(X, Y) = (1/2) inf_R sup_{(x,y),(x′,y′) ∈ R} |dX(x, x′) − dY(y, y′)|.    (2.6)

We observe that (2.5) and (2.6) are equivalent by Theorem 7.3.25 of [15].

Let X := (X, dX, α) and Y := (Y, dY, β) be two metric-measure spaces. The Gromov-Wasserstein distance is an analogue of the Gromov-Hausdorff distance for metric-measure spaces. For p ≥ 1, the Gromov-Wasserstein distance [63] between X and Y is defined as

dGW,p(X, Y) := (1/2) min_{µ ∈ M(α,β)} disp(µ),    (2.7)

where, for 1 ≤ p < ∞,

disp(µ) := ( ∫_{X×Y} ∫_{X×Y} |dX(x, x′) − dY(y, y′)|^p µ(dx, dy) µ(dx′, dy′) )^{1/p},

and, for p = ∞,

dis∞(µ) := sup_{x,x′,y,y′} |dX(x, x′) − dY(y, y′)|,

is called the p-distortion of µ.

Theorem 2.8 ([63]). Let M be the collection of all metric-measure spaces. Then (M, dGW,p) is a metric space up to isomorphism.

We observe that computing the Gromov-Wasserstein distance is an instance of a Quadratic Assignment problem and is hence NP-hard [63]. Therefore, in practice, an alternative is to compute a tight lower bound for the Gromov-Wasserstein distance. Several lower bounds are proposed in [63]. We will introduce the so-called Third Lower Bound in the following section.

2.5 Third lower bound

Again, let X = (X, dX, α) and Y = (Y, dY, β) be two finite metric-measure spaces. Fix a p ≥ 1 and a D ∈ R+. The Third Lower Bound (TLB) [63] for the p-Gromov-Wasserstein distance between X and Y is given by

TLBp(X, Y) := ( inf_{µ ∈ M(α,β)} ∫_{X×Y} c(x, y) µ(dx, dy) )^{1/p},    (2.9)

where

c(x, y) := ∫_0^D |FX^{-1}(x, t) − FY^{-1}(y, t)|^p dt

is the cost function of the TLB. We define

FX(x, r) := α(BX(x, r))    (2.10)

to be the local distribution, and FX^{-1}(x, t) := inf {u ∈ R | FX(x, u) > t} is its generalized inverse.

We observe that D controls the size of the neighborhoods we are interested in. If D = ∞, then BX(x, D) = X and hence FX(x, D) = 1. Let diam(X) := max_{x,x′ ∈ X} dX(x, x′) be the diameter of X. In fact, since we assumed X and Y to be finite, FX(x, D) = 1 already when D = max(diam(X), diam(Y)). On the other hand, if D is small, then we forget the global distribution of mass and focus only on local features. In Chapter 4, we will see how the choice of D affects classification outcomes.
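To make the construction concrete, here is a small sketch (the function names and the discretization are ours) that computes the local distributions FX(x, ·) for finite spaces with uniform measures and brute-forces the outer transport problem. We take p = 1, for which integrating the distribution functions directly instead of their generalized inverses gives the same area between the curves.

```python
from itertools import permutations

def local_dist(Dmat, i, r, weights):
    """F_X(x_i, r) = alpha(B(x_i, r)) for a finite metric-measure space
    with distance matrix Dmat and point weights."""
    return sum(w for j, w in enumerate(weights) if Dmat[i][j] <= r)

def tlb_p1(DX, DY, D, grid=200):
    """Sketch of the Third Lower Bound for p = 1, uniform measures, and
    |X| = |Y|. The cost c(x,y) = int_0^D |F_X(x,r) - F_Y(y,r)| dr is
    discretized on a midpoint radius grid, and the outer transport
    problem is brute-forced over permutations."""
    n = len(DX)
    wX = [1.0 / n] * n
    wY = [1.0 / len(DY)] * len(DY)
    rs = [D * (k + 0.5) / grid for k in range(grid)]
    dr = D / grid
    C = [[sum(abs(local_dist(DX, i, r, wX) - local_dist(DY, j, r, wY))
              for r in rs) * dr
          for j in range(len(DY))] for i in range(n)]
    return min(sum(C[i][s[i]] for i in range(n)) / n
               for s in permutations(range(n)))
```

For two 2-point spaces with interpoint distances 1 and 2, the local distributions differ by 0.5 exactly on the radius interval [1, 2), so the TLB value is approximately 0.5.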

To see how the local distributions appear in (2.9), we observe that when the measures are one-dimensional, dW,p^R has a closed-form solution.

Proposition 2.5.1. Let α, β ∈ P1(R). Then

dW,p^R(α, β) = ( inf_{µ ∈ M(α,β)} ∫_{R²} |x − y|^p µ(dx × dy) )^{1/p} = ( ∫_R |Fα^{-1}(t) − Fβ^{-1}(t)|^p dt )^{1/p},

where Fα(t) = α((−∞, t]) and Fα^{-1}(t) = inf {u ∈ R | Fα(u) > t} is the generalized inverse of Fα.

Proof. By Brenier's theorem [13], there exists a unique optimal transport map f = ∇φ(x) with φ a convex scalar function. Since φ is convex and f is a function from R to R, f is monotonically increasing. In R, there exists only one monotonically increasing transport map, namely f(x) := inf {y ∈ R | Fα(x) ≤ Fβ(y)} for x ∈ R. To see this, fix x ∈ R. Then for all x′ ≥ x, we have f(x) ≤ f(x′), and hence

β((−∞, f(x)]) ≤ β((−∞, f(x′)])
α(f^{-1}((−∞, f(x)])) ≤ β((−∞, f(x′)])
Fα(x) ≤ Fβ(f(x′)) for all x′ ≥ x
⟹ Fα(x) ≤ inf {Fβ(y) | y ≥ f(x)}.

Hence, f is as described above; in other words, f = Fβ^{-1} ∘ Fα. Then

dW,p^R(α, β) = ( ∫_R |x − f(x)|^p α(dx) )^{1/p}
            = ( ∫_R |x − (Fβ^{-1} ∘ Fα)(x)|^p α(dx) )^{1/p}
            = ( ∫_R |Fα^{-1}(t) − Fβ^{-1}(t)|^p dt )^{1/p}.
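For empirical measures with n equally weighted points each, this closed form reduces to matching sorted samples, since composing the quantile functions pairs the ith smallest point of α with the ith smallest point of β. A minimal sketch (names ours):

```python
def wasserstein_1d(xs, ys, p=2):
    """Closed-form p-Wasserstein distance between two empirical measures
    on R, each with n equally weighted points: compose the quantile
    functions, i.e. match sorted samples (cf. Proposition 2.5.1)."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    n = len(xs)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / n) ** (1.0 / p)
```

Translating a measure by 1 (e.g. {0, 1} versus {1, 2}) gives distance 1 for every p, as expected.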

Lemma 2.5.1 ([64]). Let f : X → R and g : Y → R be two continuous functions. Then

inf_{µ ∈ M(α,β)} ∫_{X×Y} |f(x) − g(y)|^p µ(dx × dy) ≥ ∫_R |F^{-1}(t) − G^{-1}(t)|^p dt,

where F(u) = α({x ∈ X | f(x) ≤ u}) and G(u) = β({y ∈ Y | g(y) ≤ u}), and F^{-1} and G^{-1} are the generalized inverses of F and G.

Although the proof was shown in detail in [64], we will still reconstruct the proof below for the sake of completeness.

Proof. Fix a µ ∈ M(α, β). Let ν = (f, g)#µ. Note that for all measurable A ⊂ R, we have ν(A × R) = µ(f^{-1}(A) × g^{-1}(R)) = µ(f^{-1}(A) × Y) = α(f^{-1}(A)). Similarly, for any measurable B ⊂ R, ν(R × B) = β(g^{-1}(B)). Then we have

∫_{X×Y} |f(x) − g(y)|^p µ(dx × dy) = ∫_{R²} |t − s|^p ν(dt × ds) ≥ inf_{ν ∈ M(f#α, g#β)} ∫_{R²} |t − s|^p ν(dt × ds).

We observe that since f and g are real-valued functions, f#α and g#β are two probability measures on R. Hence, F(t) = f#α((−∞, t]) = α(f^{-1}((−∞, t])) = α({x ∈ X | f(x) ≤ t}), and similarly G(t) = g#β((−∞, t]) = β({y ∈ Y | g(y) ≤ t}). By the above proposition,

inf_{ν ∈ M(f#α, g#β)} ∫_{R²} |t − s|^p ν(dt × ds) = ∫_R |F^{-1}(t) − G^{-1}(t)|^p dt.

Combining this equality with the inequality obtained above, we obtain the conclusion.

Proposition 2.5.2. TLBp(X , Y) ≤ dGW,p(X , Y).

Proof. Fix x ∈ X and y ∈ Y. Then note that

∫_{X×Y} |dX(x, x′) − dY(y, y′)|^p µ(dx′ × dy′) = ∫_{X×Y} |Dx^X(x′) − Dy^Y(y′)|^p µ(dx′ × dy′),

where Dx^X(x′) := dX(x, x′) and Dy^Y(y′) := dY(y, y′). Applying the lemma above to Dx^X and Dy^Y, we have F(u) = α({x′ ∈ X | dX(x, x′) ≤ u}) = α(BX(x, u)) = FX(x, u), and similarly G(u) = FY(y, u). Hence, we conclude that

∫_{X×Y} |Dx^X(x′) − Dy^Y(y′)|^p µ(dx′ × dy′) ≥ ∫_R |FX^{-1}(x, t) − FY^{-1}(y, t)|^p dt.

Since the choice of µ is arbitrary, we have

inf_µ ∫_{X×Y} |Dx^X(x′) − Dy^Y(y′)|^p µ(dx′ × dy′) ≥ ∫_R |FX^{-1}(x, t) − FY^{-1}(y, t)|^p dt.

Hence, integrating over all x and y, we obtain TLBp(X, Y) ≤ dGW,p(X, Y).

2.6 Sinkhorn’s Algorithm

Although the Wasserstein distance gives a powerful tool for comparing two probability distributions on metric spaces, its computation is still costly. The worst-case cost of computing the Wasserstein distance between two discrete measures with N elements each is O(N³ log(N)) when using the best algorithm [22]. If the metric-measure space can be embedded into a low-dimensional space Rⁿ, the cost of approximating the Wasserstein distance can be reduced [22]. However, due to the distortion of embeddings and the exponential increase in cost, approximating the Wasserstein distance through embedding into Rⁿ is impractical when n > 4 [22].

To approximate the Wasserstein distance faster, a method was proposed in [22]: by turning the optimal transport problem into a strictly convex problem, one can use Sinkhorn's fixed point algorithm [80] to solve the new optimization problem with linear convergence [33]. The performance can be further boosted by implementing the algorithm on GPGPU architectures.

Let α and β be two probability mass functions on a finite metric space (X, dX) with cardinality d, and let X = {x1, x2, ..., xd}. The entropy of α is defined by h(α) = −Σ_{i=1}^{d} α(xi) log(α(xi)). Entropy quantifies the information in a probability distribution, or in other words, the uncertainty in a distribution. To see this, consider two distributions γ1 = {0.8, 0.1, 0.1} and γ2 = {0.3, 0.3, 0.4}. We observe that (using base-10 logarithms) h(γ1) = −0.8 log(0.8) − 2 · 0.1 log(0.1) ≈ 0.28 and h(γ2) = −2 · 0.3 log(0.3) − 0.4 log(0.4) ≈ 0.47. This illustrates the idea that a more uniform probability distribution has higher entropy. We define the entropy of a coupling µ ∈ M(α, β) in the same fashion: h(µ) = −Σ_{i,j=1}^{d} µ(xi, xj) log(µ(xi, xj)). Fix an ε > 0. First, we introduce an entropic regularization term into the optimal transport problem. That is,

dS^ε(α, β) = inf_{µ ∈ M(α,β)} ( ∫_{X×Y} c(x, y) µ(dx × dy) − ε h(µ) ),    (2.11)

where c(x, y) is a cost function.
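The entropy values quoted for γ1 and γ2 can be checked numerically; note that they correspond to base-10 logarithms, an assumption we make explicit in this small check (function name ours):

```python
from math import log10

def entropy(p):
    """Entropy h with base-10 logarithms (which reproduces the values
    quoted in the text); terms with zero mass contribute nothing."""
    return -sum(x * log10(x) for x in p if x > 0)

h1 = entropy([0.8, 0.1, 0.1])   # concentrated: lower entropy
h2 = entropy([0.3, 0.3, 0.4])   # near-uniform: higher entropy
```

This confirms h(γ1) ≈ 0.28 < h(γ2) ≈ 0.47, i.e. the more uniform distribution has the higher entropy.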

Due to the principle of maximum entropy [44], the distribution with the largest entropy best represents the current state of knowledge. Hence, by subtracting ε h(µ) from the original objective function, we favor high-entropy couplings when taking the infimum. Moreover, the objective function in (2.11) is strictly convex. This can be seen as follows: let f(µ) = Σ_{i,j} cij µij + ε Σ_{i,j} µij log(µij). Note that ∂f/∂µij = cij + ε(log(µij) + 1) and ∂²f/∂µij² = ε/µij > 0. Hence, f is strictly convex. In general, the solution of a linear optimization problem is found at an extremal point of the domain [7]. However, by converting the optimal transport problem into a strictly convex problem, we can now search for µ in the interior of M(α, β).

The computation of dS^ε utilizes a matrix scaling algorithm and Sinkhorn's fixed point iteration. First, we note that the optimizer is unique [22]. For a vector w, we use diag(w) to denote the diagonal matrix with diagonal entries from w. The optimal coupling µ that realizes the infimum has a closed-form solution µ = diag(u) K diag(v), where u and v are non-negative vectors and K = e^{−(1/ε)c} is the entrywise exponential of the rescaled cost matrix c [30]. Hence, µ is a rescaled version of K, and we can use Sinkhorn's algorithm to update the optimal coupling. The algorithm is described in detail in [22]; we outline the procedure here. Recall that α, β ∈ R^N are the marginals of the coupling µ. We solve the following system of equations for u and v iteratively:

u ⊙ (Kv) = α
v ⊙ (K^T u) = β,

where ⊙ denotes the component-wise product. We start with v^(0) = (1, 1, ..., 1) and compute u^(1) by plugging v^(0) into the first equation, solving for u^(1) = α / (Kv^(0)) (component-wise division). Then we plug u^(1) into the second equation and solve for v^(1) = β / (K^T u^(1)). We repeat these steps until convergence or until a stopping criterion is met.
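The iteration above is easy to implement; the following sketch is our own minimal version (not the GPU implementation of [22]), alternately rescaling u and v and recovering the coupling diag(u) K diag(v):

```python
import numpy as np

def sinkhorn(alpha, beta, C, eps, n_iter=2000):
    """Sinkhorn's fixed-point iteration for the entropy-regularized
    optimal transport problem (2.11). alpha, beta: marginal probability
    vectors; C: cost matrix; eps: regularization strength. Returns the
    coupling diag(u) K diag(v) and its (unregularized) transport cost."""
    K = np.exp(-C / eps)                  # K = e^{-(1/eps) c}, entrywise
    v = np.ones(len(beta))                # v^(0) = (1, ..., 1)
    for _ in range(n_iter):
        u = alpha / (K @ v)               # enforce u * (K v) = alpha
        v = beta / (K.T @ u)              # enforce v * (K^T u) = beta
    mu = np.diag(u) @ K @ np.diag(v)
    return mu, float(np.sum(mu * C))

# Toy instance: 0.1 units of mass must cross at cost 1, so the
# unregularized optimum is 0.1, well approximated for small eps.
alpha = np.array([0.4, 0.6])
beta = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
mu, cost = sinkhorn(alpha, beta, C, eps=0.02)
```

For small ε, the returned coupling nearly satisfies both marginal constraints and its cost is close to the exact Kantorovich value.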

The convergence of Sinkhorn's iteration is proven in [33]. The idea is to show that Sinkhorn's iteration converges with respect to the Hilbert projective metric on positive vectors,

dH(u, v) := log max_{i,j} (ui vj)/(uj vi),

using Birkhoff's theorem [9]. The Hilbert projective metric is a distance on the projective cone, where scalar multiplication gives an equivalence relation on the collection of positive vectors. Moreover, dH(u, v) is invariant under component-wise multiplication, i.e. for all positive vectors w, dH(u, v) = dH(w ⊙ u, w ⊙ v). Let u* and v* be the true solutions. Then, by Birkhoff's theorem, we obtain an upper bound on the rate of convergence of u (and v) as a factor of κ(K)², where

κ(K) = (√θ(K) − 1)/(√θ(K) + 1), and θ(K) = max_{i,j,l,m} (Kil Kjm)/(Kjl Kim).

Chapter 3: Persistent Homology

3.1 Brief History

Persistent homology summarizes topological features of a shape or of a function on a topological space [26]. Through building a sequence of simplicial complexes and tracking the "birth" and "death" of features in these simplicial complexes, we obtain a concise representation of the original space as a collection of coordinate pairs in R². This collection of coordinates is called a persistence diagram.

As the name suggests, persistent homology extends the results of homology and studies the "persistence" of holes in a topological space. In 1990, Patrizio Frosini and collaborators introduced size functions, which are equivalent to what is nowadays known as 0th persistent homology [34]. In 1999, Vanessa Robins studied inclusion maps in building homology [77]. In 2002, Edelsbrunner, Letscher and Zomorodian introduced persistent homology together with a fast algorithm [28]. Both the work of Robins and that of Edelsbrunner et al. built on the notion of alpha shapes [27].

In the following sections, we will define persistent homology on top of simplicial homology.

Although in many cases our input data may not come with a simplicial structure, we can build a sequence of simplicial complexes with the points of the data set as vertices. Starting with an empty complex, we gradually add simplices to the complex. Tracing the simplicial complex at each time step, we obtain a sequence of simplicial complexes, that is, a filtration. We will discuss a well-known filtration, called the Vietoris-Rips filtration, in Section 3.4.

Applying the functoriality of kth homology with field coefficients, we obtain a sequence of kth homology vector spaces that track k-dimensional holes; we call these holes k-cycles. The time at which a k-cycle appears is called its "birth" time, and the time at which it is filled in by a higher-dimensional chain is called its "death" time. We collect the "birth" time and "death" time of each k-cycle as a tuple. Let R̄ := R ∪ {+∞}. The collection of the birth and death times of all the k-cycles is called the kth persistence diagram. It is a multiset of points in R̄² with multiplicities: off-diagonal points have finite multiplicities and diagonal points have infinite multiplicity. The persistence diagram provides a visual representation of the filtered simplicial complex. There is an equivalent notion, called a barcode, where we think of each point of the persistence diagram as an interval. We can also track the difference between the "death" and "birth" of a point, in other words, how long a k-cycle lasts: the longer a k-cycle lasts, the farther from the diagonal the corresponding point on the persistence diagram lies. We call the most off-diagonal points signatures.

The bottleneck distance is a metric defined on the collection of persistence diagrams. In Section 3.6, we will see that the infinite bottleneck distance is bounded above by the Gromov-Hausdorff distance.

3.2 Simplicial homology

Let X be a finite set. An abstract simplicial complex on X is a collection Σ of finite subsets of X such that for all σ ∈ Σ, all subsets of σ are in Σ. We call an element σ ∈ Σ with σ = {v0, ..., vk} a k-simplex, and vi a vertex of σ for each 0 ≤ i ≤ k. We denote such a σ by [v0, v1, ..., vk].

Fix a σ ∈ Σ and a field F. Although F can be any field, for simplicity we assume throughout this thesis that we are working with the field F₂, which contains only two elements, 0F and 1F. Multiplication and addition in F₂ follow the usual integer arithmetic, except that 1F + 1F = 0F. The characteristic function of σ is the map ϕσ : Σ → F such that

ϕσ(σ′) := 1F if σ′ = σ, and 0F otherwise.

Let K be the collection of all k-simplices in Σ. The kth chain complex Ck(K) of K

over F is a free vector space on the set K. That is, Ck(K) is the vector space of F -valued

functions on K with vector space addition as

(ϕ + ψ)(σ) := ϕ(σ) + ψ(σ)

and scalar multiplication as

(λ · ϕ)(σ) := λ · ϕ(σ)

for all ϕ, ψ ∈ Ck(K), all k-simplices σ ∈ K, and all λ ∈ F. We call an element of Ck(K) a k-chain.

Proposition 3.2.1. The family of characteristic functions {ϕσ}σ∈K forms a basis for Ck(K).

Proof. Let ϕ ∈ Ck(K) and set λσ = ϕ(σ) for each σ ∈ K. It is clear that we can rewrite ϕ = Σ_{σ∈K} λσ ϕσ. Hence, {ϕσ} spans Ck(K).

Next, suppose Σ_{σ∈K} λσ ϕσ = 0. For any σ′ ∈ K, we observe that 0F = Σ_{σ∈K} λσ ϕσ(σ′) = λσ′, since ϕσ(σ′) = 0F for any σ ≠ σ′. Since the choice of σ′ is arbitrary, we have λσ = 0F for all σ. Hence, {ϕσ} is linearly independent.

The boundary map ∂k : Ck(K) → Ck−1(K) is defined on each k-simplex σ := [v0, ..., vk] by

∂k(ϕσ) = Σ_{i=0}^{k} (−1F)^i ϕ_[v0, ..., v̂i, ..., vk],

where [v0, ..., v̂i, ..., vk] is the (k − 1)-simplex with the vertex vi omitted; ∂k is then extended to all elements of Ck(K) by linearity. We note that in F₂ we have −1F = 1F. Thus, the boundary map of any chain complex over F₂ can be written as

∂k(ϕσ) = Σ_{i=0}^{k} ϕ_[v0, ..., v̂i, ..., vk].

Proposition 3.2.2. ∂k ∘ ∂k+1(ϕσ) = 0F for every (k + 1)-simplex σ when F = F₂.

Proof. Let σ = [v0, ..., vk+1]. Then

∂k ∘ ∂k+1(ϕσ) = Σ_{j=0}^{k+1} ∂k(ϕ_[v0, ..., v̂j, ..., vk+1]) = Σ_{i ≠ j} ϕ_[v0, ..., v̂i, ..., v̂j, ..., vk+1].

Note that [v0, ..., v̂i, ..., v̂j, ..., vk+1] = [v0, ..., v̂j, ..., v̂i, ..., vk+1]. Hence, for each pair i ≠ j, the corresponding simplex appears exactly twice in the sum. Since 2F = 0F in F₂, we conclude that ∂k ∘ ∂k+1(ϕσ) = 0F.
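This cancellation can be verified mechanically. The sketch below (representing a simplex as a sorted tuple of vertices; names are ours) applies the F₂ boundary twice and checks that every coefficient cancels mod 2:

```python
def boundary(simplex):
    """Faces of a k-simplex (a sorted tuple of vertices): drop one vertex."""
    return [simplex[:i] + simplex[i + 1:] for i in range(len(simplex))]

def boundary_of_boundary(simplex):
    """Apply the F2 boundary twice, collecting (k-1)-faces with mod-2
    coefficients. Each such face arises from exactly two faces of the
    original simplex, so every coefficient cancels and the result is empty."""
    counts = {}
    for face in boundary(simplex):
        for sub in boundary(face):
            counts[sub] = (counts.get(sub, 0) + 1) % 2  # F2 arithmetic
    return {s for s, c in counts.items() if c == 1}
```

For the triangle [0, 1, 2], each vertex appears in exactly two edges of the boundary, so the double boundary is empty, as the proposition asserts.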

A homology vector space is defined on a simplicial chain complex. Let

Zk := ker ∂k = {σ ∈ Ck(K) | ∂k(σ) = 0}

and

Bk := im ∂k+1 = {τ ∈ Ck(K) | ∃σ ∈ Ck+1(K) s.t. ∂k+1(σ) = τ}.

A consequence of Proposition 3.2.2 is that Bk ⊂ Zk. Then the kth homology vector space is given by

Hk := Zk/Bk.

Note that Hk captures the "holes" (or cycles) in dimension k that are not filled by any higher-dimensional chains.

The dimension of a vector space is an important invariant in linear algebra. Accordingly, the kth Betti number βk is defined to be the vector space dimension of Hk. Due to the construction of Hk, the kth Betti number captures the number of independent k-dimensional cycles of the simplicial complex K.
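As a sketch of how Betti numbers can be computed in practice (over F₂, with our own naive Gaussian elimination; real software such as JavaPlex or Ripser is far more efficient), note that βk = dim Zk − dim Bk = (nk − rank ∂k) − rank ∂k+1, where nk is the number of k-simplices:

```python
def gf2_rank(rows):
    """Rank over F2 of a matrix whose rows are given as integer bitmasks."""
    basis = {}                      # leading-bit position -> reduced row
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in basis:
                basis[lead] = r
                break
            r ^= basis[lead]
    return len(basis)

def betti_numbers(simplices, max_dim):
    """Betti numbers beta_k = dim ker d_k - dim im d_{k+1} over F2.
    simplices: dict mapping k to the list of k-simplices (sorted tuples)."""
    ranks = {}
    for k in range(1, max_dim + 1):
        faces = {s: i for i, s in enumerate(simplices.get(k - 1, []))}
        rows = []
        for s in simplices.get(k, []):
            mask = 0
            for i in range(len(s)):           # F2 boundary: sum of faces
                mask |= 1 << faces[s[:i] + s[i + 1:]]
            rows.append(mask)
        ranks[k] = gf2_rank(rows)             # rank of d_k
    betti = []
    for k in range(max_dim + 1):
        n_k = len(simplices.get(k, []))
        dim_Zk = n_k - ranks.get(k, 0)        # d_0 = 0, so rank 0 in dim 0
        dim_Bk = ranks.get(k + 1, 0)
        betti.append(dim_Zk - dim_Bk)
    return betti

# Hollow triangle (a simplicial circle): one component and one 1-cycle.
circle = {0: [(0,), (1,), (2,)], 1: [(0, 1), (0, 2), (1, 2)]}
```

The hollow triangle gives β₀ = 1 and β₁ = 1, matching Figure 3.1's circle; filling in the 2-simplex kills the 1-cycle.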

23 Figure 3.1: Figure from [95]. It shows the Betti numbers βk for different shapes.

3.3 Functoriality of Hk

The fact that Hk is functorial means that simplicial maps between simplicial complexes K and L are sent by Hk to linear maps between Hk(K) and Hk(L).

Let K and L be two abstract simplicial complexes. A simplicial map f : K → L is a map that sends each vertex of K to a vertex of L and is extended to simplices as follows: for all k ≥ 0 and every k-simplex σ = [v0, ..., vk], f([v0, ..., vk]) := [f(v0), ..., f(vk)]. A simplicial map need not be injective on vertices, and hence it may map a k-simplex to a lower-dimensional simplex. A simplicial map is a simplicial isomorphism if it is bijective and if, for all σ ∈ L, f^{-1}(σ) ∈ K.

Let K,L be two simplicial complexes and f be a simplicial map between them. Let

Ck(K) and Ck(L) be the simplicial k-chains of K and L over F, respectively. Then f induces a well-defined linear map Ck(f) : Ck(K) → Ck(L) given on basis elements in the following way:

Ck(f)(ϕσ) := ϕ_{f(σ)} if ϕ_{f(σ)} ∈ Ck(L), and Ck(f)(ϕσ) := 0 otherwise

(the second case occurs when f(σ) has dimension less than k). This map is extended linearly in the following way: for any ϕ ∈ Ck(K) and any k-simplex τ of L,

Ck(f)(ϕ)(τ) := Σ_{s : f(s) = τ} ϕ(s).

Proposition 3.3.1. [69] The diagram

          ∂k
  Ck(K) ------> Ck−1(K)
    |              |
    | Ck(f)        | Ck−1(f)
    v              v
  Ck(L) ------> Ck−1(L)
          ∂k

commutes, i.e. ∂k ◦ Ck(f) = Ck−1(f) ◦ ∂k.

Proof. Let σ = [v0, ··· , vk] be a k-simplex in K. Then

Ck−1(f) ◦ ∂k(ϕσ) = Ck−1(f)(Σ_{i=0}^{k} ϕ_{[v0,···,v̂i,···,vk]}) = Σ_{i=0}^{k} ϕ_{f([v0,···,v̂i,···,vk])}.

On the other hand, we have

∂k ◦ Ck(f)(ϕσ) = ∂k(ϕ_{f(σ)}) = Σ_{i=0}^{k} ϕ_{[f(v0),···,f̂(vi),···,f(vk)]} = Σ_{i=0}^{k} ϕ_{f([v0,···,v̂i,···,vk])}.

Proposition 3.3.2. [69] Ck(f) induces a well defined map Hk(f): Hk(K) → Hk(L).

Proof. We first claim that Ck(f) sends Zk(K) into Zk(L) and Bk(K) into Bk(L).

Let ϕσ ∈ Zk(K). Then ∂k ◦ Ck(f)(ϕσ) = Ck−1(f) ◦ ∂k(ϕσ) = Ck−1(f)(0) = 0 by the above proposition. Hence we have Ck(f)(ϕσ) ∈ Zk(L).

Let ϕσ ∈ Bk(K). Then there is a ϕσ′ ∈ Ck+1(K) such that ∂k+1(ϕσ′) = ϕσ. Let Ck(f)(ϕσ) = ϕτ for some ϕτ. Then ϕτ = Ck(f) ◦ ∂k+1(ϕσ′) = ∂k+1 ◦ Ck+1(f)(ϕσ′). Hence, ϕτ ∈ Bk(L).

Then Hk(f) : Zk(K)/Bk(K) → Zk(L)/Bk(L) is defined by Hk(f)([ϕσ]) := [Ck(f)(ϕσ)] for every equivalence class [ϕσ] ∈ Hk(K). We claim that Hk(f) is well defined.

Let ϕσ, ϕσ′ ∈ Zk(K) and suppose [ϕσ] = [ϕσ′], i.e. ϕσ − ϕσ′ ∈ Bk(K). Since Ck(f) sends Bk(K) into Bk(L), we have Ck(f)(ϕσ) − Ck(f)(ϕσ′) = Ck(f)(ϕσ − ϕσ′) ∈ Bk(L). Hence, [Ck(f)(ϕσ)] = [Ck(f)(ϕσ′)].

3.4 Persistent homology

In many cases, we are interested in the topology of a set of points that are sampled

from a shape, e.g. a circle. We would expect the set of sample points to possess the same

topology as the original space that they are sampled from. However, the input data set

may not possess a structure from which one can construct a simplicial complex. Thus,

simplicial homology and the Betti numbers are not available. Persistent homology provides

a solution to this problem by constructing an increasing sequence of simplicial complexes

called a filtered simplicial complex

K0 ↪ K1 ↪ ··· ↪ Kn

on which we can apply the homology functor and obtain a sequence of vector spaces.

The construction of a filtered simplicial complex is done by collecting the sublevel sets of a filtration function. Let X be a finite set and let K be a subcollection of the powerset of X such that for any σ ∈ K and any τ ⊂ σ, we have τ ∈ K. A filtration function is a map f : K → R such that f(τ) ≤ f(σ) for all τ ⊆ σ. Let Kr denote the sublevel set of f at height r, i.e.

Kr := f^{−1}((−∞, r]) = {σ ∈ K | f(σ) ≤ r}.

Note that for all r ≤ r′, we have Kr ⊆ Kr′. A filtered simplicial complex is a collection of simplicial complexes {Kr}_{r∈R+} such that Kr ⊆ Kr′ for all r ≤ r′. By applying the kth homology functor to a filtered simplicial complex, we obtain a collection of homology vector spaces {Hk(Kr)}_r.

As r increases, we obtain an increasing sequence of simplicial complexes. A filtration

function gives instructions on how to build such an increasing sequence of simplicial complexes.

By using different filtration functions, we may detect different features in a given data set.

Then the kth homology functor tracks how k-cycles appear and disappear as r increases in

the filtered simplicial complexes.

A widely used example is the Vietoris-Rips (or Rips) complex. Let (X, dX) be a finite metric space. Fix a non-negative number r ∈ R+. The Vietoris-Rips complex VR(X, r) is the collection of abstract simplices σ ⊆ X such that dX(x, x′) ≤ r for all vertices x, x′ ∈ σ.

Figure 3.2 shows an example of a Vietoris-Rips complex. Starting from a set of points, we

inductively add higher dimensional simplices into the Vietoris-Rips complex. Let K be the

powerset of X. The filtration function f : K → R used to construct the Vietoris-Rips complex is defined as follows:

• f([x]) = 0 for all x ∈ X

• f([x, x′]) = dX(x, x′) for all x, x′ ∈ X

• f([x, x′, x″]) = max {dX(x, x′), dX(x, x″), dX(x′, x″)} for all x, x′, x″ ∈ X

and so on for higher-dimensional simplices.

Due to the construction of a Vietoris-Rips complex, we have VR(X, r) ⊆ VR(X, r′) for any r, r′ ∈ R such that r ≤ r′. Thus, we obtain a filtered simplicial complex {VR(X, r)}_{r∈R≥0} and a collection of homology vector spaces {Hk(VR(X, r))}_{r∈R≥0}.
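The Vietoris-Rips filtration value of a simplex is simply the diameter of its vertex set. A small illustrative sketch (the function name `rips_filtration` is ours, not from the thesis):

```python
from itertools import combinations

def rips_filtration(points, dist, max_dim=2):
    """All simplices on `points` up to max_dim, each paired with its
    Vietoris-Rips filtration value: the largest pairwise distance
    (diameter) among its vertices. Vertices enter at value 0."""
    simplices = []
    for k in range(max_dim + 1):
        for sigma in combinations(range(len(points)), k + 1):
            value = max((dist(points[i], points[j])
                         for i, j in combinations(sigma, 2)), default=0.0)
            simplices.append((sigma, value))
    # Sort by filtration value, breaking ties by dimension, so that every
    # face appears no later than the simplices it bounds.
    return sorted(simplices, key=lambda sv: (sv[1], len(sv[0])))

# Three points on a line with the usual metric.
pts = [0.0, 1.0, 3.0]
for sigma, value in rips_filtration(pts, lambda a, b: abs(a - b)):
    print(sigma, value)
```

The edge {0, 2} and the triangle {0, 1, 2} both enter at value 3, the diameter of the whole set, matching the max-of-pairwise-distances rule above.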

We observe that, due to functoriality, in any filtered simplicial complex {Kr}r with r ≤ r′, the inclusion Kr ↪ Kr′ induces a linear map Hk(Kr) → Hk(Kr′). This observation motivates a general definition of a collection of homology vector spaces called a persistence vector space.

A persistence vector space over a field F is a collection of vector spaces {Vr}_{r∈R≥0}, together with a family of linear maps {L_{r,r′} : Vr → Vr′}_{r≤r′} such that L_{r,r″} = L_{r′,r″} ◦ L_{r,r′} for all r ≤ r′ ≤ r″.


Figure 3.2: Example of a Vietoris-Rips complex on a set of three points. Figure 3.2a shows the original set. Figures 3.2b and 3.2c show all the simplices in Kr as r increases.

We can further generalize the construction of the persistence vector spaces described in

the beginning of the section. Let X be a set and f : X → R+. Let Xr denote the sublevel set of f at r. The free vector space generated by (X, f) is the persistence vector space

{VF (X, f)r}r where VF (X, f)r := VF (Xr) is the free vector space generated by Xr for each

r ∈ R+. Since Xr ⊆ Xr′ for all r ≤ r′, we can define the linear maps

L^{VF(X,f)}_{r,r′}(ϕx) := ϕx

for all basis elements ϕx ∈ VF(X, f)r.

We call a persistence vector space free if it can be expressed as {VF(X, f)r}r for some (X, f). We observe that the persistence vector spaces obtained through building a Vietoris-Rips complex are free. A free persistence vector space is finitely generated if there exists a pair (X, f) with X finite.

3.5 Persistence diagrams

Let {Vr}r and {Wr}r be two persistence vector spaces. A linear transformation f from {Vr} to {Wr} is a family of linear transformations fr : Vr → Wr such that for all 0 ≤ r ≤ r′, all diagrams of the following form commute:

        L^V_{r,r′}
   Vr ----------> Vr′
   |               |
   | fr            | fr′
   v               v
   Wr ----------> Wr′
        L^W_{r,r′}

Let V = {Vr} and W = {Wr}. The direct sum of persistence vector spaces V ⊕ W is defined as the persistence vector space

(V ⊕ W)r := Vr ⊕ Wr

together with the linear maps

L^{V⊕W}_{r,r′} : Vr ⊕ Wr → Vr′ ⊕ Wr′

given by

L^{V⊕W}_{r,r′}(v, w) := (L^V_{r,r′}(v), L^W_{r,r′}(w)).

Recall from linear algebra that a finite-dimensional vector space V over a field F is isomorphic to ⊕_{i=1}^{n} F, where n = dim(V). We can decompose some persistence vector spaces in a similar fashion. A persistence vector space is finitely presented if it is isomorphic to a persistence vector space of the form {Wr/im(f)r}r for some linear transformation f : {Vr} → {Wr} between finitely generated free persistence vector spaces {Vr} and {Wr} [16].

First we denote R+ ∪ {+∞} by R̄+. Let a ∈ R+, b ∈ R̄+ and a < b. Let P(a, b) := {P(a, b)r}r be the persistence vector space over F where

P(a, b)r := F if r ∈ [a, b), and P(a, b)r := 0 otherwise,

together with the collection of linear maps {L_{r,r′}} given by

L_{r,r′} := idF if r, r′ ∈ [a, b), and L_{r,r′} := 0 otherwise.

We call such a P(a, b) an interval module. We can also visualize an interval module in the following way:

P(a, b) = ··· → 0 → 0 → F →idF ··· idF→ F → 0 → 0 → ···,

where the copies of F sit over the parameters r with a ≤ r < b, and the zero spaces over r < a and r ≥ b.

Theorem 3.1. Any finitely presented persistence vector space is isomorphic to a finite direct sum of the form

P(a1, b1) ⊕ P(a2, b2) ⊕ ··· ⊕ P(an, bn)

for some ai ∈ R+, bi ∈ R̄+, and ai < bi for all i. Moreover, the direct sum is unique up to reordering.

We refer our readers to [16] for a complete proof of this theorem. This is also known

as the classification theorem for finitely presented persistence vector spaces. We call ai the

“birth” time and bi the “death” time.

First, we notice that a persistent homology vector space is also a persistence vector space, with the linear maps induced by the inclusions. From Theorem 3.1, we see that a finitely presented persistence vector space can be represented by a set of the form

{(ai, bi) | ai ∈ R+, bi ∈ R̄+, and ai < bi}

with multiplicities. Notice that any diagonal point (a, a) means that the cycle dies at the

time it was born. Hence, these diagonal points are trivial. We give all diagonal points infinite

Figure 3.3: An example of persistence diagrams obtained by applying the Vietoris-Rips filtration to a point cloud sampled from a circle. On the left is the sampled circle with radius 0.5 centered at (0.5, 0.5). The middle figure shows the persistence diagram in the 0th dimension and the figure on the right shows the persistence diagram in the 1st dimension. We observe one point far from the diagonal in the 1st dimension, corresponding to the loop of the circle.

multiplicity for the convenience of computing the bottleneck distance, which we will discuss in Section 3.6.

Utilizing Theorem 3.1 and the convention above, we can represent a finitely presented persistence vector space as a collection of intervals (called a persistence barcode), or as a collection of points in R2 (called a persistence diagram). Figure 3.3 shows an example of the persistence diagrams of a sampled circle in dimension 0 and dimension 1; Figure 3.6 shows an example of a persistence barcode.

3.5.1 The four point example

We will present a simple example of computing the persistence diagram of the set of four points shown in Figure 3.4.

Let us construct a simple filtration function in the following way:

• At filtration value t = 0, all the vertices are introduced in the simplicial complex

• The filtration value of an edge in K is the weight on the edge

• The filtration value of a triangle is given by the maximum filtration value of the edges

forming the triangle.

Figure 3.5 shows the changes in the filtered simplicial complex Kt as the filtration value t increases. We observe that there are four connected components in Kt for t < 1, and that a cycle in H1 is present when 1 ≤ t < √2. To show the multiplicity of the points, we will visualize the persistent homology in H0 and H1 through the persistence barcodes shown in

Figure 3.6.
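The H0 intervals of this example can be recovered with a union-find computation: we sort the edges by filtration value and record a death each time two components merge. The helper below is our own sketch, not the thesis' code:

```python
def h0_barcode(n_vertices, edges):
    """H0 persistence intervals of a filtration in which every vertex is
    born at 0 and each weighted edge (t, u, v) enters at value t;
    at every merge one component dies at that value."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    bars = []
    for t, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv           # merge: one component dies at t
            bars.append((0.0, t))
    bars.append((0.0, float('inf')))  # one component lives forever
    return bars

# Four-point example: square a, b, c, d with sides 1 and diagonals sqrt(2).
from math import sqrt
edges = [(1.0, 0, 1), (1.0, 1, 2), (1.0, 2, 3), (1.0, 3, 0),
         (sqrt(2), 0, 2), (sqrt(2), 1, 3)]
print(h0_barcode(4, edges))  # three bars [0, 1) and one bar [0, inf)
```

This reproduces the H0 part of Figure 3.6: four components at t < 1, of which three die when the weight-1 edges appear.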

Figure 3.4: Example of a simplicial complex on vertices a, b, c, d, with edge weights 1 on the four sides and √2 on the two diagonals. The filtration function is defined in the following way: all vertices enter at filtration value 0; the value of the filtration function at an edge is given by the weight of the edge; once three edges form a triangle, we add a face to the interior of the triangle.

Figure 3.5: Filtered simplicial complex Kt at filtration value t, for (a) 0 < t < 1, (b) 1 ≤ t < √2, and (c) t ≥ √2. Note that a cycle in H1 appears when t = 1 and is "killed" by the new triangles that appear when t = √2.

Figure 3.6: Persistence barcodes in H0 and H1 of the filtered simplicial complex.

3.6 Bottleneck distance

Given two persistence diagrams, it is useful to be able to compare similarity/dissimilarity

between them. We can view persistence diagrams as collections of points. If the two persistence diagrams have the same number of points, then by finding the "best" bijection between the points, we will know how different the two persistence diagrams are by measuring the pairwise distances between matched pairs of points.

However, the number of non-diagonal points in two persistence diagrams need not be

the same. Hence, a bijection between non-diagonal points may not exist. Thus, we add

diagonal points to a persistence diagram so that all persistence diagrams have the same

cardinality. Then we need to find the “best” way to add diagonal points so that we obtain

the “optimal” matching between the persistence diagrams.

For any pair of persistence diagrams DX (f) and DY (f), the ∞-bottleneck distance is

defined as

dB,∞(DX(f), DY(f)) := inf_γ sup_{x∈DX(f)} ‖x − γ(x)‖∞ (3.2)

where γ : DX(f) → DY(f) ranges over bijections. The p-bottleneck distance on persistence diagrams is defined as

dB,p(DX(f), DY(f)) := inf_γ ( Σ_{x∈DX(f)} ‖x − γ(x)‖∞^p )^{1/p}. (3.3)
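For very small diagrams, definition (3.2) can be evaluated directly by padding each diagram with the diagonal projections of the other and enumerating all bijections. This brute-force sketch (our own illustration, exponential in the number of points) is only meant to make the definition concrete:

```python
from itertools import permutations

def bottleneck(dgm_x, dgm_y):
    """Brute-force d_{B,infinity} between two small persistence diagrams
    (lists of (birth, death) points)."""
    proj = lambda p: ((p[0] + p[1]) / 2,) * 2   # nearest diagonal point
    U = list(dgm_x) + [proj(q) for q in dgm_y]
    V = list(dgm_y) + [proj(p) for p in dgm_x]

    def cost(u, v):
        # Matching two diagonal points is free: they "do not exist".
        if u[0] == u[1] and v[0] == v[1]:
            return 0.0
        return max(abs(u[0] - v[0]), abs(u[1] - v[1]))  # l-infinity

    return min(max(cost(u, v) for u, v in zip(U, perm))
               for perm in permutations(V))

# One diagram has an extra short bar; matching it to the diagonal is cheap.
print(bottleneck([(0.0, 4.0)], [(0.5, 4.0), (1.0, 1.5)]))  # 0.5
```

The short bar (1.0, 1.5) is absorbed into the diagonal at cost 0.25, so the distance is driven by matching the two long bars.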

3.7 Interleaving distance

Note that the bottleneck distance defined in section 3.6 is a distance defined on persistence diagrams, rather than on persistence vector spaces. A natural metric on the set of persistence vector spaces is the interleaving distance. It compares two persistence vector spaces via

“shifting” as we will see later in this section. We will see that the interleaving distance coincides with the bottleneck distance in some cases.

Let V = {Vr}_{r∈R} and W = {Ws}_{s∈R} be two persistence vector spaces. We defined earlier that if there exists a family of isomorphisms fr : Vr → Wr for all r compatible with the structure maps, then V ≅ W; i.e., letting fr : Vr → Wr denote the isomorphisms, the following diagram

        L^V_{r,r′}          L^V_{r′,r″}
··· → Vr ----------> Vr′ ----------> Vr″ → ···
       |fr            |fr′            |fr″
       v              v               v
··· → Wr ----------> Wr′ ----------> Wr″ → ···
        L^W_{r,r′}          L^W_{r′,r″}

commutes. From the notion of isomorphism, it is natural to ask whether it is possible to compare two non-isomorphic persistence vector spaces. Let V = P(0, 1) and W = P(0.1, 1). It is obvious that there is no isomorphism between V and W, but intuitively we would think that they are "similar". Hence, we are looking for a metric that can quantify the "similarity" between two persistence vector spaces.

Let ϕ·,ε : V → W be a family of maps {ϕr,ε : Vr → Wr+ε}. We call ϕ·,ε an ε-morphism if for all r ≤ s, the diagram

         L^V_{r,s}
   Vr ----------> Vs
   |               |
   | ϕr,ε          | ϕs,ε
   v               v
  Wr+ε ---------> Ws+ε
        L^W_{r+ε,s+ε}

commutes.

An ε-interleaving between V and W is a pair of ε-morphisms ϕ·,ε : V → W and ψ·,ε : W → V such that for all r ∈ R,

ψr+ε,ε ◦ ϕr,ε = L^V_{r,r+2ε},

and for all s ∈ R,

ϕs+ε,ε ◦ ψs,ε = L^W_{s,s+2ε}.

We call V and W ε-interleaved if there exists an ε-interleaving. Note that V and W are isomorphic if and only if there exists a 0-interleaving between them.

Proposition 3.7.1. If V and W are ε-interleaved, then they are also ε′-interleaved for all ε′ > ε.

Proof. Let {ϕr : Vr → Wr+ε} and {ψs : Ws → Vs+ε} be an ε-interleaving.

Let ϕ̄r := L^W_{r+ε,r+ε′} ◦ ϕr and ψ̄s := L^V_{s+ε,s+ε′} ◦ ψs. We want to show that

1. ϕ̄ and ψ̄ are ε′-morphisms;

2. ϕ̄ and ψ̄ form an ε′-interleaving.

1. Let d = ε′ − ε. First suppose r′ > r + d. Then we have a commuting diagram

Vr → Vr+d → Vr′ → Vr′+d (structure maps L^V_{r,r+d}, L^V_{r+d,r′}, L^V_{r′,r′+d}),
with vertical maps ϕr, ϕr+d, ϕr′, ϕr′+d down to
Wr+ε → Wr+ε′ → Wr′+ε → Wr′+ε′ (structure maps L^W_{r+ε,r+ε′}, L^W_{r+ε′,r′+ε}, L^W_{r′+ε,r′+ε′}).

From it we obtain the commuting square

L^W_{r+ε′,r′+ε′} ◦ ϕ̄r = ϕ̄r′ ◦ L^V_{r,r′}.

Now suppose r < r′ < r + d. Then we have a commuting diagram

Vr → Vr′ → Vr+d → Vr′+d,
with vertical maps ϕr, ϕr′, ϕr+d, ϕr′+d down to
Wr+ε → Wr′+ε → Wr+ε′ → Wr′+ε′,

and we obtain the same commuting square. Therefore, ϕ̄ is an ε′-morphism. A similar argument shows that ψ̄ is also an ε′-morphism.

2. Since (ϕ, ψ) is an ε-interleaving, we have ψr+ε ◦ ϕr = L^V_{r,r+2ε}. Since moreover L^V_{r+2ε,r+2ε+d} ◦ ψr+ε = ψr+ε+d ◦ L^W_{r+ε,r+ε+d}, we have

ψr+ε+d ◦ L^W_{r+ε,r+ε+d} ◦ ϕr = L^V_{r,r+2ε+d}.

Then

L^V_{r+2ε+d,r+2(ε+d)} ◦ ψr+ε+d ◦ L^W_{r+ε,r+ε+d} ◦ ϕr = L^V_{r,r+2(ε+d)},

i.e.

ψ̄r+ε′ ◦ ϕ̄r = L^V_{r,r+2ε′}.

A similar argument shows that

ϕ̄r+ε′ ◦ ψ̄r = L^W_{r,r+2ε′}.

Thus, (ϕ̄, ψ̄) is an ε′-interleaving.

Figure 3.7: Persistence barcode representation of all the cases of the relative positions between I1 and I2. Figure from [71]. An interval module is indicated by a line connecting the left endpoint, the midpoint and the right endpoint.

We define the interleaving distance between V and W as

dI(V, W) := inf {ε > 0 | V and W are ε-interleaved}. (3.4)

Proposition 3.7.2. Let I1 = P (a1, b1) and I2 = P (a2, b2) be two interval modules. Then dI (I1,I2) = dB,∞((a1, b1), (a2, b2)).

One can prove the proposition case by case. Figure 3.7 shows all the different cases of the relative positions of I1 and I2. We will not list all the details here, but rather provide a proof for the cases in the first row of Figure 3.7. One can use similar arguments to prove the other cases.

Proof. Case 1: left case in the first row of Figure 3.7.

From the diagram, we assume that a1 < a2 < b1 < b2. Without loss of generality, suppose ε = max {|a1 − a2|, |b1 − b2|} = |a1 − a2|. Let ε′ < ε, and suppose there is an ε′-interleaving (ϕ, ψ). Let c = a1 + ε′; then c < a2, so I2 vanishes at c and hence ψc,ε′ ◦ ϕa1,ε′ = 0, while L^{I1}_{a1,a1+2ε′} = id. This contradicts the fact that (ϕ, ψ) is an ε′-interleaving. Therefore, dI(I1, I2) ≥ ε.

Now let ϕ and ψ be the two ε-morphisms that are the identity wherever both interval modules are nonzero. It suffices to check the interleaving diagrams at the boundaries. Since a1 < a2, when r = a1 − 2ε the diagrams commute because

L^{I1}_{r,r+2ε} = 0 = ψr+ε,ε ◦ ϕr,ε   and   L^{I2}_{r+ε,a2} = 0 = ϕa1,ε ◦ ψr+ε,ε.

When r1 = b1 − ε and r2 = b2 − ε, the diagrams also commute because

L^{I2}_{r1,r1+2ε} = 0 = ϕb1,ε ◦ ψr1,ε   and   L^{I1}_{r2,r2+2ε} = ψb2,ε ◦ ϕr2,ε.

Hence, dI(I1, I2) = max {|a1 − a2|, |b1 − b2|}.

Now recall that

dB,∞({(a1, b1)}, {(a2, b2)}) = inf_f sup_x ‖x − f(x)‖∞.

We can either map (a1, b1) to (a2, b2), at cost max {|a1 − a2|, |b1 − b2|}, or map both points to their closest diagonal points. The closest diagonal point to (a1, b1) is ((a1 + b1)/2, (a1 + b1)/2) and the closest to (a2, b2) is ((a2 + b2)/2, (a2 + b2)/2), so the second option costs max {(b1 − a1)/2, (b2 − a2)/2}. From Figure 3.7, we see that in this case

max {|a1 − a2|, |b1 − b2|} < max {(b1 − a1)/2, (b2 − a2)/2}.

Hence, we let f((a1, b1)) = (a2, b2). Thus,

dB,∞({(a1, b1)}, {(a2, b2)}) = max {|a1 − a2|, |b1 − b2|} = dI(I1, I2).

Case 2: right case in the first row of Figure 3.7. Let m = (a2 + b2)/2 and ε = (b2 − a2)/2. Suppose ε′ < ε and let (ϕ, ψ) be an ε′-interleaving.

If ε′ > b1 − a2, then ψa2,ε′ maps into I1 at a2 + ε′ > b1, where I1 vanishes, so ϕa2+ε′,ε′ ◦ ψa2,ε′ = 0, while L^{I2}_{a2,a2+2ε′} = id since a2 + 2ε′ < b2. This contradicts the assumption that (ϕ, ψ) is an ε′-interleaving.

If ε′ ≤ b1 − a2, the same contradiction arises at m: since m + ε′ < b2 we have L^{I2}_{m,m+2ε′} = id, while ϕm+ε′,ε′ ◦ ψm,ε′ = 0. Again, this contradicts the assumption that (ϕ, ψ) is an ε′-interleaving.

Hence, the interleaving distance between I1 and I2 is at least ε.

To show that dI(I1, I2) = ε, suppose ϕ and ψ form an ε-interleaving. The diagrams involving ϕa1,ε, ψa1+ε,ε and those involving ψa2,ε, ϕa2+ε,ε commute, since the corresponding maps L^{I1}_{r,r+2ε} and L^{I2}_{s,s+2ε} vanish. Hence, dI(I1, I2) = (b2 − a2)/2.

To compute the bottleneck distance, we again need to construct a bijection; as before, we can either map (a1, b1) to (a2, b2) or map both points to their closest diagonal points. We observe that

max {|a1 − a2|, |b1 − b2|} > max {(b1 − a1)/2, (b2 − a2)/2} = (b2 − a2)/2,

since |b1 − b2| > (b2 − a2)/2 ≥ (b1 − a1)/2. Therefore, we let f((a1, b1)) = ((a1 + b1)/2, (a1 + b1)/2) and f(((a2 + b2)/2, (a2 + b2)/2)) = (a2, b2).

Hence, we have

dB,∞({(a1, b1)}, {(a2, b2)}) = (b2 − a2)/2 = dI(I1, I2).

Theorem 3.5 (The isometry theorem [53]). Let M = ⊕_{i=1}^{m} P(ai, bi) and N = ⊕_{j=1}^{n} P(cj, dj). Then

dB,∞({(ai, bi)}_i, {(cj, dj)}_j) = dI(M, N).
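For one-point diagrams, the two candidate matchings in the proofs above (match the points to each other, or send both to the diagonal) give a closed form that, by Proposition 3.7.2, also computes the interleaving distance of two interval modules. A sketch under that assumption (the function name is ours):

```python
def interval_distance(i1, i2):
    """d_{B,inf} between the one-point diagrams {(a1, b1)} and {(a2, b2)},
    which by Proposition 3.7.2 equals the interleaving distance between
    the interval modules P(a1, b1) and P(a2, b2). Only two matchings can
    be optimal: match the points to each other, or both to the diagonal."""
    (a1, b1), (a2, b2) = i1, i2
    match_points = max(abs(a1 - a2), abs(b1 - b2))
    match_diagonal = max((b1 - a1) / 2, (b2 - a2) / 2)
    return min(match_points, match_diagonal)

# Case 1 of the proof: overlapping intervals of comparable length.
print(interval_distance((0.0, 2.0), (0.5, 2.5)))  # 0.5 (points matched)
# Case 2: a short interval versus a long one far away.
print(interval_distance((0.0, 0.4), (3.0, 5.0)))  # 1.0 (diagonal wins)
```

The first call lands in Case 1 (direct matching is cheaper); the second lands in Case 2, where the answer is the half-length (b2 − a2)/2 of the longer interval.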

3.8 Stability results of persistence diagrams

The stability of persistence diagrams is important when deciding whether a feature of a topological space, or of a function on such a space, is real: a true feature observed in the persistence diagram is expected to be stable with respect to small perturbations.

3.8.1 Stability of Vietoris-Rips filtration

In section 3.4 we introduced the Vietoris-Rips complex on a metric space. Here, we will state the stability theorem of the Vietoris-Rips filtration with respect to the Gromov-

Hausdorff distance.

Let (X, dX) and (Y, dY) be two metric spaces. Let DkR(X) and DkR(Y) be the kth persistence diagrams of the Vietoris-Rips filtrations of X and Y.

Theorem 3.6. [17]

dB(DkR(X), DkR(Y)) ≤ dGH(X, Y) for all natural numbers k.

We refer our readers to [17] for a proof. Theorem 3.6 provides a lower bound for the Gromov-Hausdorff distance, which is hard to compute.

3.8.2 Stability of filtration functions

Let X be a topological space and let f : X → R. A value r ∈ R is called a homological critical value of f if there exist k ∈ Z and arbitrarily small ε > 0 such that the map Hk(f^{−1}((−∞, r − ε])) → Hk(f^{−1}((−∞, r + ε])) induced by inclusion is not an isomorphism. The function f is tame if it has a finite number of homological critical values and dim(Hk(f^{−1}((−∞, t]))) is finite for all k ∈ Z and t ∈ R.

Theorem 3.7. [19] Let X be a triangulable topological space and let f and g be real-valued tame functions on X. Then

dB(Dk(f), Dk(g)) ≤ ‖f − g‖∞.

We refer our readers to [19] for the proof of the stability theorem.

3.9 Computation of bottleneck distance

In the actual computation of bottleneck distance, we use a software called Hera [50].

Hera introduces a fast way of computing bottleneck distance between persistence diagrams

by modifying the observation in [29] on the Hopcroft and Karp algorithm [43, 29]. That

is, Hera aims to find a partial matching between the points in two persistence diagrams by

finding a maximum matching of a bipartite graph. It speeds up the computation in finding

such a maximum matching by utilizing the properties of a k-d tree. We will briefly explain the related concepts and outline the algorithm below.

Given two persistence diagrams, first we need to construct a bipartite graph from them and convert the bottleneck distance into a matching problem. Let X and Y be two persistence diagrams. Recall that we only record off-diagonal points in a persistence diagram and put infinite multiplicity on diagonal points. Hence, we assume that X and Y contain only the off-diagonal points with multiplicities. Let X′ and Y′ be the projections of X and Y onto the diagonal, respectively, i.e. X′ = {((x + y)/2, (x + y)/2) | (x, y) ∈ X} and Y′ = {((x′ + y′)/2, (x′ + y′)/2) | (x′, y′) ∈ Y}. Let U = X ∪ Y′ and V = Y ∪ X′. We call

an undirected graph G a weighted complete bipartite graph if G = (U ⊔ V, U × V) with the weights given by

c(u, v) := ‖u − v‖∞ if u ∈ X or v ∈ Y, and c(u, v) := 0 otherwise.

To see how this weight function relates to a map between persistence diagrams, we note that there are four cases of matching the points:

there are four cases of matching the points:

Case 1: If u ∈ X, v ∈ Y , then we are sending an off-diagonal point in X to an off-diagonal

point in Y . Thus, the cost is ku − vk∞;

Case 2: If u ∈ X and v ∈ X′, then we are sending an off-diagonal point of X to a diagonal point. Hence the cost of matching u = (x, y) to its diagonal projection is (y − x)/2.

Case 3: If u ∈ Y′ and v ∈ Y, then we are sending an off-diagonal point of Y to a diagonal point. Again, the cost of matching v = (x′, y′) to its diagonal projection is (y′ − x′)/2.

Case 4: If u ∈ Y′ and v ∈ X′, then a diagonal point is sent to another diagonal point. However, a diagonal point does not "exist" in a persistence diagram. Hence, the cost is 0.

Note that G contains all possible ways to match the points of X and Y. Moreover, because we do not need to match a diagonal point to another diagonal point, we can remove all the edges with weight 0 together with their vertices.

Recall that in (3.2), when we compute the bottleneck distance, we want to find an optimal matching such that the cost of the most expensive matched pair is minimized. Since G contains all possible ways of matching the points of the two persistence diagrams, the bottleneck distance is given by the weight of some edge in G. We can find the weight of such an edge by searching for the smallest maximum edge weight over all matchings of the points of the persistence diagrams.

Let G[r] be the subgraph of G that contains all the edges with weight at most r. A collection of edges is called a matching if no two edges in the collection share a common vertex. An edge is matched if it belongs to the matching M. We call a vertex of G free if it is unmatched. An augmenting path is a path that starts and ends at free vertices and alternates between unmatched and matched edges. A maximum matching is a matching with the maximum number of edges. Suppose G[r] has 2n vertices; then G[r] has a perfect matching if its maximum matching has n edges. Note that a perfect matching gives a matching of the points of the diagrams. The bottleneck distance is then the minimal value r such that G[r] contains a perfect matching [50]. A maximum matching is hard to find directly but, due to Berge [6], a matching is maximum once no more augmenting paths can be found.

The Hopcroft-Karp algorithm [43] is used to find a maximum matching. Starting with an empty matching M, we find an augmenting path π and update the matching by taking the symmetric difference; that is, the updated matching is M′ = (M − π) ∪ (π − M). Observe that an augmenting path has one more unmatched edge than matched edges. Because the symmetric difference flips the status (matched/unmatched) of every edge on the path, we have |M′| = |M| + 1. We then repeat the process of finding an augmenting path and updating the matching until there is no more augmenting path, at which point we have obtained a maximum matching.

However, updating augmenting paths one by one may take a long time. Hopcroft and Karp proposed a breadth-first search algorithm that finds all shortest augmenting paths (with respect to a matching M) by constructing layers of vertices. Let M be a matching. In the first layer L1, we put all free vertices of U. In layer L2i, we consider all vertices of V that have not appeared in any Lj for j < 2i and that are connected in the underlying graph to some vertex of L2i−1. If L2i contains free vertices, then L2i is the last layer and we have found augmenting paths of length 2i − 1. Otherwise, we construct L2i+1 to contain all vertices of U that are matched to vertices in L2i. We can then add all the augmenting paths found to M by taking symmetric differences and thereby increase the size of the matching.

Efrat et al. observed that we can avoid explicit computation of the layers for a geometric graph G[r] by using a near-neighbor search data structure [29]. They introduced a data structure Dr(S), for some S ⊆ V, with near-neighbor search and deletion operations. More precisely:

• neighborr(Dr(S), q): returns an s ∈ S such that the distance between q and s is at

most r. If no such s exists, returns ∅;

• deleter(Dr(S), s): deletes s from S.

Then a layered graph can be constructed in the following fashion. Let r∗ denote the minimal r such that G[r] contains a perfect matching. Fix an r and consider the graph G[r] with a matching M. Let D = Dr(V). We follow the usual construction for the first layer, i.e. L1 = {u ∈ U | u is free}. Then for any even layer L2i, we iterate over all a ∈ L2i−1. For each a, we repeatedly query neighborr(D, a), putting each returned vertex b in L2i and deleting b from D using deleter(D, b). Note that the deletion of b excludes the possibility of building a loop in the graph. If L2i is empty, then there is no augmenting path. If L2i contains a free vertex of V, then we have found a layered graph that contains an augmenting path; hence, we return all the layers. Otherwise, we build L2i+1 by adding all vertices of U that are matched in M to vertices in L2i. In this way, we partition the vertices of V into even layers. If there is no free vertex of V in the last even layer, we conclude r∗ > r.

After the layers {Li} are built, we construct all augmenting paths using a depth-first search [29]. We start from the free vertices in L1 and build an alternating path. For each vertex a ∈ L2i−1, we choose a b ∈ neighborr(D(L2i), a), add (a, b) to the current path, and advance to b. If no such b exists, then no neighbor of a in L2i remains, so we backtrack from a. If a ∈ L1, we delete a. Otherwise, we take the two vertices a− and b− preceding a in the path, remove a and b− from the path, and continue from a−. If all vertices are matched at the end, then we have obtained a perfect matching; thus r∗ ≤ r. Otherwise, we conclude r∗ > r.

To find the optimal r∗, we can perform a binary search on the magnitude of r. Let n = |U| = |V|. Recall that r∗ must be the weight of an edge of G, which is the ℓ∞-distance between two points. Hence, we only need to search among n^2 weights. Given all these n^2 weights, we can sort them in increasing order and perform a binary search on these weights to find the optimal r∗ using the constructions described above.
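The overall scheme (pad with diagonal projections, test whether G[r] has a perfect matching, binary-search the sorted candidate edge weights) can be sketched as follows. This is our own simplified illustration: the feasibility test is a plain augmenting-path matching, not Hera's k-d-tree-accelerated Hopcroft-Karp:

```python
from itertools import product

def bottleneck_via_search(dgm_x, dgm_y):
    """Binary search for the smallest r such that G[r] has a perfect
    matching, following the outline in the text."""
    proj = lambda p: ((p[0] + p[1]) / 2,) * 2
    U = list(dgm_x) + [proj(q) for q in dgm_y]  # off-diagonal + projections
    V = list(dgm_y) + [proj(p) for p in dgm_x]

    def cost(u, v):
        if u[0] == u[1] and v[0] == v[1]:
            return 0.0  # diagonal-to-diagonal edges are free
        return max(abs(u[0] - v[0]), abs(u[1] - v[1]))

    def has_perfect_matching(r):
        match = [None] * len(V)
        def augment(i, seen):
            for j in range(len(V)):
                if cost(U[i], V[j]) <= r and j not in seen:
                    seen.add(j)
                    if match[j] is None or augment(match[j], seen):
                        match[j] = i
                        return True
            return False
        return all(augment(i, set()) for i in range(len(U)))

    # r* must be one of the n^2 edge weights; sort and binary-search them.
    weights = sorted({cost(u, v) for u, v in product(U, V)})
    lo, hi = 0, len(weights) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if has_perfect_matching(weights[mid]):
            hi = mid
        else:
            lo = mid + 1
    return weights[lo]

print(bottleneck_via_search([(0.0, 4.0)], [(0.5, 4.0), (1.0, 1.5)]))  # 0.5
```

Feasibility of G[r] is monotone in r, which is what makes the binary search over the sorted weights valid.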

Kerber et al. followed the construction suggested by Efrat et al. but they used k-d trees

to simplify the near-neighbor search. A k-d tree [5] is a binary tree that partitions points in

a k-dimensional Euclidean space. Given a set of points, we first split the set at the median

value of the first coordinate into two subsets. The split point is added as a node to the binary

tree. Then we recursively split the two halves in the next dimension, and collect the medians

Figure 3.8: Figure from [68]. An example of a k-d tree.

as nodes in the tree. If there are no more dimensions to split on, we simply restart from the first dimension. We repeat the construction until the subset to be split contains only one element; these singletons are the leaf nodes of the tree. Note that for each subtree, we can associate the root of the subtree with a bounding box in the original space. In the end, we obtain a balanced binary tree. Figure 3.8 shows an example of a k-d tree. One can follow the labels of the points to see how the splitting was performed.
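A minimal sketch of this construction for points in the plane follows; `build_kdtree` and `neighbor_r` are our own illustrative names, and the query uses the ℓ∞-distance that is natural for persistence diagrams:

```python
def build_kdtree(points, depth=0):
    """Build a 2-d k-d tree: split at the median of the current coordinate,
    alternating coordinates by depth. Returns (point, left, right) or None."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def neighbor_r(node, q, r, depth=0):
    """Return some point within l-infinity distance r of q, or None,
    pruning subtrees whose slab cannot contain such a point."""
    if node is None:
        return None
    point, left, right = node
    if max(abs(point[0] - q[0]), abs(point[1] - q[1])) <= r:
        return point
    axis = depth % 2
    near, far = (left, right) if q[axis] < point[axis] else (right, left)
    found = neighbor_r(near, q, r, depth + 1)
    if found is None and abs(q[axis] - point[axis]) <= r:
        found = neighbor_r(far, q, r, depth + 1)  # far slab may still qualify
    return found

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
print(neighbor_r(tree, (9, 2), 1.5))  # finds (8, 1)
print(neighbor_r(tree, (0, 0), 1.0))  # None: nothing that close
```

The pruning step is exactly the bounding-box test described below: a subtree is skipped when its splitting slab lies farther from q than r.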

We can traverse the tree to perform the near-neighbor search neighborr(Dr(S), q) on a

k-d tree for a query point q and a set of points S. Starting at the root of the binary tree,

we use the root as the current candidate for near neighbor and move down to its children.

If the bounding box of the root of the current subtree is farther from q than the radius r,

then we can remove the subtree from the search.

Using the k-d tree structure, Kerber et al. integrated the Hopcroft-Karp algorithm

and Efrat’s observation to implement a tool to compute the bottleneck distance. Instead

of constructing Dr(S) when building layered graph, they build a k-d tree for S. Then

deleter(Dr(S), s) corresponds to removing s from the k-d tree without rebalancing the tree.

Chapter 4: Experiments

4.1 Summary of data

We use the teeth data from [11] to test the performance of our implementations. The data set consists of 116 triangulated surface scans of mandibular second molars which belong to either primates or non-primate close relatives. The mandibular second molar is a tooth used for grinding; it is located between the first and the third molars (the third molar is more commonly referred to as a "wisdom tooth").

Each tooth in the data set is composed of about 5,000 vertices and more than 10,000 triangles constructing a mesh of the surface of the crown. Figure 1.1 in Chapter 1 shows an example of the surface of the crown of a tooth in the data set colored by its mean curvature.

We will classify the teeth following three different schemes: family (biology), genus (biology), and diet. There are 24 families, 37 genera and 4 diets found in the data set. We provide a table of the classes in each category, with counts, in Table 4.1. Labels are mostly obtained from the supplementary materials of [11]. Only 74 specimens have dietary labels in [11]; we searched for and added dietary labels for another 12 specimens. We also label the class Incertae sedis, which means "of uncertain placement", as "NA". Any specimen with an "NA" label is excluded when testing the quality of classification.

Unlike family and genus, which are widely used in the taxonomic classification of organisms, using dietary categories to classify organisms may be foreign to some readers. However, such a choice is rather intuitive, since our data are teeth and the shape of a tooth is expected to be correlated with the particular diet of the animal.

Qualitative analysis on dietary preferences of fossil taxa suggests correlations between diets and shapes of the crowns [40, 85, 37, 57]. As summarized in [10], biologists associate lower-crowned fossil teeth with diets that involve crushing “brittle” food, e.g. fruits and nuts. Teeth with long blades are good for cracking leaves or some insects. To crush harder insects, like beetles, one may need taller cusps [48, 83, 31]. Major dietary preferences include: frugivore (fruits), folivore (leaves), insectivore (insects) and omnivore (all). Figure 4.1

Family class      Count    Genus class      Count    Diet class      Count
(unreadable)      4        adapis           4        folivorous      28
carpolestidae     1        altanius         2        frugivorous     9
cercamonaiidae    3        arctocebus       4        insectivorous   30
cheirogaleiidae   11       avahi            3        na              37
chronolestidae    1        (unreadable)     4        omnivorous      12
cynocephalidae    4        cheirogaleus     3
eosimiidae        4        chronolestes     1
galagidae         9        cynocephalus     4
indridae          9        donrussellia     3
lemuridae         14       elphidotarsius   1
lepilemuridae     4        eosimias         4
lorisidae         12       eulemur          3
megaladapidae     1        galago           9
na                2        hapalemur        2
(unreadable)      4        indri            1
nyctitheriidae    3        lemur            4
(unreadable)      8        lepilemur        4
palaechthonidae   1        leptacodon       3
paromomyidae      1        loris            3
pitlocercidae     2        megaladapis      1
plesiadapidae     4        microcebus       4
purgatoriidae     4        mirza            2
saxonellidae      1        nycticebus       3
tarsiidae         5        paromomys        1
tupaiidae         4        perodicticus     2
                           phaner           2
                           plesiolestes     1
                           prolemur         1
                           pronothodectes   4
                           propithecus      5
                           ptilocercus      2
                           purgatorius      4
                           saxonella        1
                           tarsius          5
                           teilhardina      8
                           tupaia           4
                           varecia          4

(Entries marked "(unreadable)" could not be recovered from the source.)

Table 4.1: Statistics of families, genera and diets of the teeth in the data set.

Figure 4.1: Figure from [10] showing crowns of teeth from animals with different diets.

shows a side-by-side comparison between molars from primates with different diets. The cusps and valleys in the teeth become deeper as we scan the figure from left to right. These cusps and valleys can be characterized by curvature in differential geometry. Therefore, the curvature distributions on shapes may distinguish teeth with different diets.

However, the effectiveness of characterizing teeth by these dietary categories is contro- versial. The dietary categories (frugivore, folivore, insectivore and omnivore) fail to perfectly differentiate the teeth since textures and properties of the materials are shared among dif- ferent dietary categories [83]. On the other hand, in some cases, the shape of a tooth corresponds to distinct properties of the materials in different dietary categories [56, 46, 47].

As one may observe in Figure 4.1, the size of teeth varies a lot from species to species. Hence, it is necessary to know the range of sizes of the teeth in our data set. Figure 4.2 shows the histograms of all the diameters (the maximum distance between any two points in a metric space) using Euclidean and geodesic distances, with an outlier removed. We use the graph distance induced by the triangulation on a mesh as the geodesic distance. The outlier tooth belongs to an animal in the extinct genus Megaladapis, also known as the koala lemur, which once inhabited Madagascar. Megaladapis differed from other lemurs, with a body built like that of a modern koala, and is believed to have been on a folivorous diet [72]. The shape of its skull was also unique among primates on a folivorous diet [4]. The difference between the Megaladapis tooth and those of other animals in the data set is also reflected in its size. The tooth that belongs to Megaladapis has a diameter of about 24.6mm (in Euclidean distance) while the second largest diameter is about 0.8mm.

4.2 Overview of experiments

We call each attempt of constructing a distance on the collection of shapes using different

parameters an experiment. Our experiments correspond to two different approaches:

Figure 4.2: Histograms of the diameters of the teeth in Euclidean (Figure 4.2a) and geodesic (Figure 4.2b) distances. The tooth that belongs to Megaladapis has a diameter of about 24.6mm (in Euclidean distance) and 32.7mm (in geodesic distance). It is excluded from both histograms.

OT: Approximate the TLB on the teeth using different metrics and probability measures, with different values of D;

PH: Construct a mean curvature based filtered simplicial complex or a Vietoris-Rips complex; then apply persistent homology and compute the bottleneck distance between the resulting persistence diagrams.

The OT approach. Our experiments using the OT approach depend on the choices of three parameters: the probability measure, the metric (used to form the balls B(x, t) when computing the local distribution (2.10)), and D. Table 4.2 shows all the combinations on which we experiment.

Probability measure   Metric      Normalization   D
Uniform               Euclidean   No              0.01-0.8
Uniform               Euclidean   Yes             1
Uniform               Geodesic    No              0.1-1.2
Uniform               Geodesic    Yes             1
Voronoi               Euclidean   No              0.32
Voronoi               Euclidean   Yes             1
Voronoi               Geodesic    No              0.35
Voronoi               Geodesic    Yes             1

Table 4.2: Parameters that can be tuned in the experiments. The choices of D are not uniformly spaced. The "Normalization" column indicates whether the distance matrix is normalized.

The PH approach. We use either a Vietoris-Rips filtration or a sublevel set filtration from a mean curvature based function to obtain a persistence diagram. Then we choose dB,∞ (3.2) or dB,2 (3.3) between persistence diagrams to construct a bottleneck distance matrix. Table 4.3 lists the filtration functions and bottleneck distances with which we experiment.

Filtration method                                           Metric                            Barcode distance
Vietoris-Rips                                               (Normalized) Euclidean/geodesic   dB,∞
Vietoris-Rips                                               (Normalized) Modified geodesic    dB,∞
(Normalized) filtration functions based on mean curvature   N/A                               dB,∞/dB,2

Table 4.3: All the combinations of parameters that we tried using the PH approach. For each method, we consider both the normalized and the unnormalized versions.

4.2.1 Quantitative measure of quality of classification

Each experiment yields a distance matrix. To visualize the result, we use dendrograms of the clustering results obtained with single linkage hierarchical clustering. Such a dendrogram reflects the proximity between each pair of clusters independently of the ordering on the data.

Figure 1.2 shows an example of a dendrogram. We will also show the dendrograms that produce the lowest Pe for dietary classification in later sections. We remind our readers that the tooth that belongs to Megaladapis is removed from all the dendrograms shown in this thesis.

We will outline a bottom-up single linkage hierarchical clustering algorithm below; that is, it builds a binary tree from the leaf nodes up to the root. We refer our readers to [59] for details. The algorithm starts with a collection of singletons and forms a new partition of the data set by merging the two closest clusters at a time, until there are no clusters left to be merged. Eventually, at the root of the tree, the entire data set is in a single cluster. Single linkage is a rule for updating the distance between two clusters: at each step, for each pair of clusters, we compute all the pairwise distances between the elements of the two clusters, and the proximity between the clusters is given by the minimal such distance.
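The merge procedure just described can be sketched in a few lines. The following naive, quadratic-time pure-Python illustration (function name and output format are our own) is not the implementation used for the experiments:

```python
# Bottom-up single-linkage clustering on a symmetric distance matrix
# given as nested lists. At each step the two clusters at minimal
# single-linkage (i.e. minimum pairwise) distance are merged.
def single_linkage(dist):
    """Return the list of merges as (cluster_a, cluster_b, height)."""
    n = len(dist)
    clusters = [{i} for i in range(n)]   # start from singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single-linkage proximity: minimal pairwise distance
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]
    return merges
```

The returned merge heights are exactly the heights at which branches join in the dendrogram.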

Although a dendrogram is capable of showing clusters in a distance matrix, a quantitative measurement of the quality of the classification result is still needed for a rigorous comparison between the methods. An intuitive measurement of the quality is the probability of error (Pe) in leave-one-out classification. Leave-one-out is a supervised classification algorithm: for each item in the data set, we predict its label by the label of its nearest neighbor. By repeating the prediction for each item in the data set, we obtain a collection of predicted labels. The Pe is the fraction of items whose predicted labels differ from their true labels.
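The leave-one-out computation of Pe can be illustrated directly on a distance matrix. The sketch below, with names of our own choosing, also excludes items labeled "na" as described in Section 4.1:

```python
# Leave-one-out nearest-neighbor error rate from a precomputed
# distance matrix. Items labeled "na" are excluded both as queries
# and as candidate neighbors.
def leave_one_out_pe(dist, labels):
    errors = 0
    total = 0
    for i, label in enumerate(labels):
        if label == "na":          # unlabeled specimens are excluded
            continue
        # nearest labeled neighbor other than the item itself
        j = min((k for k in range(len(labels))
                 if k != i and labels[k] != "na"),
                key=lambda k: dist[i][k])
        total += 1
        if labels[j] != label:
            errors += 1
    return errors / total
```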

4.3 The OT approach

4.3.1 Outline of the OT approach

We recall (2.9) from Chapter 2 which states that the TLB between two metric-measure spaces X = (X, dX , α) and Y = (Y, dY , β) is defined as

\[
\mathrm{TLB}_p(\mathcal{X}, \mathcal{Y}) := \inf_{\mu \in \mathcal{M}(\alpha, \beta)} \int_{X \times Y} \left( \int_0^D \left| F_X^{-1}(x, t) - F_Y^{-1}(y, t) \right|^p \, dt \right)^{1/p} \mu(dx, dy),
\]

where D controls the size of the neighborhood of interest and

\[
F_X(x, r) := \alpha(B(x, r))
\]

is the local distribution of \(\mathcal{X}\) defined in (2.10).

Our implementation of the method that computes the TLB can be broken up into three parts:

1. subsampling vertices and computing a pairwise distance matrix;

2. computing local distribution at each sample point;

3. solving the optimal transport problem in (2.9) and obtaining the TLB.

In part 1, we first subsample K vertices from each tooth using the Farthest Point Sampling (FPS) algorithm. Let S be the sampling set and X be a set of points in a metric space. We initialize S by adding a random point of X. Then we repeatedly add to S the point of X \ S that is farthest (in the Euclidean sense) from the points already in S, i.e. the point maximizing the minimum distance to S, until S has K elements.
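As an illustration, the FPS loop above might be sketched as follows for points in the plane; the function name, the deterministic starting index, and the 2D setting are our own simplifications:

```python
# Farthest Point Sampling: grow the sample by repeatedly adding the
# point whose minimum distance to the current sample is largest.
def fps(points, k, start=0):
    def d(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    sample = [points[start]]
    # minimum distance of every point to the current sample
    mind = [d(p, sample[0]) for p in points]
    while len(sample) < k:
        i = max(range(len(points)), key=lambda i: mind[i])
        sample.append(points[i])
        # adding points[i] can only shrink the min distances
        mind = [min(mind[j], d(points[j], points[i]))
                for j in range(len(points))]
    return sample
```

Maintaining the `mind` array makes each round linear in |X|, so sampling K points costs O(K·|X|).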

Then, to compute the local distributions, we convert a point cloud into a pairwise distance matrix using either the Euclidean distance in R^3 or the geodesic distance induced by the triangulated mesh. To compute the geodesic distance, we first build a weighted graph: the underlying graph is given by the edges of the triangulation, and the weight of each edge is the Euclidean distance between its endpoints. The geodesic distance between two vertices of the graph is then the sum of the weights of the edges along the shortest path, computed with Dijkstra's algorithm [24].
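A minimal sketch of this graph geodesic computation, assuming the weighted graph is given as an adjacency list (our own format, `{vertex: [(neighbor, weight), ...]}`) rather than a mesh:

```python
import heapq

# Dijkstra's algorithm: single-source shortest path distances on a
# graph with non-negative edge weights (here, Euclidean edge lengths).
def dijkstra(adj, source):
    dist = {v: float("inf") for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue          # stale heap entry, already improved
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Running this from every vertex yields the full geodesic distance matrix of the mesh.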

In part 2, we discretize the interval [0, D] into N equally spaced points and compute the local distribution (2.10) at each point. We chose N = 100 to ensure speed of computation. Given a probability measure µ, recall that the local distribution of x at radius t is given by µ(BX(x, t)). Hence, for each point xi, we look in row i of the distance matrix for the entries with distance less than t, sum the probability masses at these points, and repeat for all t to obtain the local distribution vxi.
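The computation of one local distribution vector can be sketched as follows; the discretization of [0, D] into N radii follows the text, while the helper name and the closed-ball convention (distance ≤ t) are our assumptions:

```python
# Local distribution at a single point x_i: for each radius t on the
# grid over [0, D], sum the probability masses of the points whose
# distance to x_i (one row of the distance matrix) is at most t.
def local_distribution(dist_row, masses, D, N):
    radii = [D * (k + 1) / N for k in range(N)]
    return [sum(m for d, m in zip(dist_row, masses) if d <= t)
            for t in radii]
```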

In our experiments, we use two different probability measures: the uniform probability measure and a probability measure induced by the Voronoi cells of a subsample of the data points. Let S ⊂ X. The Voronoi probability measure is defined as follows: to each s ∈ S, we associate the set

\[
V_s := \{ x' \in X \mid d_X(s, x') \le d_X(s', x') \ \forall s' \in S \text{ s.t. } s' \neq s \}.
\]

We observe that the collection {Vs}s∈S forms a partition of X. Figure 4.3 shows an example of a Voronoi partition. Then we can define a probability measure ν on S by setting, for each

Figure 4.3: An example of a Voronoi partition. We generated 5000 random points and picked 7 points as representatives using the Farthest Point Sampling algorithm (explained in Section 4.3.1). Figure 4.3a shows the 5000 points as small blue dots; the larger colored dots are the 7 representatives. Figure 4.3b shows the Voronoi cells associated to each representative.

s ∈ S,

\[
\nu(s) := \frac{|V_s|}{|X|}.
\]

One can check that such a ν is a probability measure when S and X are finite sets.
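A small sketch of this construction for planar points; the tie-breaking rule (by order of the representatives) is our own assumption:

```python
# Voronoi probability measure: assign each point of X to its closest
# representative s in S, then weight each s by the fraction of X in
# its cell V_s. The weights sum to 1 since the cells partition X.
def voronoi_measure(X, S):
    def d(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    counts = {s: 0 for s in S}
    for x in X:
        s = min(S, key=lambda s: d(s, x))   # nearest representative
        counts[s] += 1
    return {s: c / len(X) for s, c in counts.items()}
```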

In part 3, we compute the TLB using Sinkhorn's algorithm [22]. For each pair of shapes X and Y, we compute the cost of transporting a point xi ∈ X to yj ∈ Y as

\[
c(x_i, y_j) = \sum_{k=1}^{N} \left| v_{x_i, k} - v_{y_j, k} \right| \cdot \frac{D}{N},
\]

where vxi and vyj are the local distributions at xi and yj respectively. Then, using Sinkhorn's algorithm [22] to solve the optimization problem in (2.9), we obtain an approximation of the TLB.

Note that the iterative Sinkhorn's algorithm depends both on the choice of ε and on the number of iterations (see Section 2.6). Using a small ε with many iterations leads to a good approximation of the TLB, but the computation is costly in time. We fix a pair of shapes and test ε = 0.1, 0.05 and 0.005 with K = 1000, 1500 and 2000. After experimenting with a few combinations of these parameters, we choose ε = 0.005 with niter = 44 and K = 2000. It takes roughly 17 seconds to solve for the TLB for a pair of shapes with these parameters. The error of Sinkhorn's algorithm at the ith iteration is

\[
\| u^{(i)} - u^{(i-1)} \|_1 = \sum_j \left| u_j^{(i)} - u_j^{(i-1)} \right|,
\]

where u is the scaling factor described in Section 2.6. The error measures the rate of convergence of Sinkhorn's algorithm. Choosing niter = 44 guarantees the error to be of magnitude 10^-6. Figure 4.4 shows the error of the computation against the number of iterations.
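For concreteness, here is a compact, unoptimized sketch of Sinkhorn's iteration on a tiny cost matrix. The function name and the direct (non-log-domain) formulation are ours; a production implementation would monitor the scaling-factor error described above instead of running a fixed number of iterations, and would guard against numerical underflow for small ε:

```python
import math

# Sinkhorn's algorithm for entropic optimal transport: alternately
# rescale the rows and columns of the Gibbs kernel K = exp(-cost/eps)
# so that the plan u_i * K_ij * v_j matches the marginals a and b.
def sinkhorn(cost, a, b, eps, niter):
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(niter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # transport plan and its transport cost
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total
```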

Figure 4.4: Error against niter when ε = 0.005 and K = 2000.

4.3.2 Results of using the OT approach

We computed the TLB using different sets of parameters. By choosing combinations of parameters from Table 4.2, we obtain different results. Table 4.4 shows the probability of error associated to each experiment using optimal transport.

Table 4.4: Probability of Error table for experiments using optimal transport.

ProbabilityMeasure   Metric                D      K     PeFamily  PeGenus  PeDiet
Uniform              Euclidean             0.01   2000  0.851     0.897    0.658
Uniform              Euclidean             0.10   2000  0.816     0.888    0.506
Uniform              Euclidean             0.20   2000  0.789     0.879    0.519
Uniform              Euclidean             0.21   2000  0.781     0.871    0.506
Uniform              Euclidean             0.22   2000  0.789     0.879    0.544
Uniform              Euclidean             0.23   2000  0.772     0.862    0.544
Uniform              Euclidean             0.24   2000  0.807     0.879    0.532
Uniform              Euclidean             0.25   2000  0.807     0.897    0.557
Uniform              Euclidean             0.26   2000  0.816     0.879    0.519
Uniform              Euclidean             0.27   2000  0.789     0.871    0.532
Uniform              Euclidean             0.28   2000  0.772     0.853    0.532
Uniform              Euclidean             0.29   2000  0.737     0.845    0.519
Uniform              Euclidean             0.30   2000  0.798     0.879    0.557
Uniform              Euclidean             0.31   2000  0.798     0.871    0.519
Uniform              Euclidean             0.32   2000  0.772     0.853    0.532
Uniform              Euclidean             0.33   2000  0.781     0.862    0.532
Uniform              Euclidean             0.34   2000  0.754     0.836    0.532
Uniform              Euclidean             0.35   2000  0.781     0.853    0.532
Uniform              Euclidean             0.36   2000  0.746     0.819    0.494
Uniform              Euclidean             0.37   2000  0.781     0.871    0.532
Uniform              Euclidean             0.40   2000  0.798     0.871    0.532
Uniform              Euclidean             0.50   2000  0.763     0.836    0.532
Uniform              Euclidean             0.80   2000  0.772     0.853    0.519
Uniform              Normalized Euclidean  1.00   2000  0.842     0.845    0.570
Uniform              Geodesic              0.10   2000  0.851     0.879    0.570
Uniform              Geodesic              0.15   2000  0.816     0.853    0.532
Uniform              Geodesic              0.20   2000  0.798     0.853    0.532
Uniform              Geodesic              0.25   2000  0.816     0.862    0.544
Uniform              Geodesic              0.27   2000  0.807     0.871    0.544
Uniform              Geodesic              0.28   2000  0.781     0.862    0.532
Uniform              Geodesic              0.29   2000  0.781     0.845    0.544
Uniform              Geodesic              0.30   2000  0.807     0.879    0.519
Uniform              Geodesic              0.31   2000  0.816     0.871    0.570
Uniform              Geodesic              0.32   2000  0.798     0.879    0.570
Uniform              Geodesic              0.33   2000  0.807     0.879    0.519
Uniform              Geodesic              0.34   2000  0.807     0.871    0.557
Uniform              Geodesic              0.35   2000  0.789     0.853    0.557
Uniform              Geodesic              0.36   2000  0.781     0.862    0.570
Uniform              Geodesic              0.38   2000  0.789     0.862    0.532
Uniform              Geodesic              0.40   2000  0.789     0.862    0.544
Uniform              Geodesic              0.42   2000  0.746     0.862    0.544
Uniform              Geodesic              0.44   2000  0.763     0.845    0.544
Uniform              Geodesic              0.45   2000  0.798     0.871    0.570
Uniform              Geodesic              0.50   2000  0.772     0.862    0.519
Uniform              Geodesic              0.60   2000  0.772     0.853    0.532
Uniform              Geodesic              0.70   2000  0.789     0.862    0.557
Uniform              Geodesic              0.80   2000  0.772     0.853    0.544
Uniform              Geodesic              0.90   2000  0.781     0.862    0.582
Uniform              Geodesic              1.00   2000  0.763     0.845    0.557
Uniform              Geodesic              1.10   2000  0.781     0.853    0.582
Uniform              Geodesic              1.20   2000  0.781     0.862    0.570
Uniform              Normalized geodesic   1.00   2000  0.798     0.810    0.570
Voronoi              Euclidean             0.32   50    0.746     0.828    0.557
Voronoi              Euclidean             0.32   100   0.754     0.828    0.557
Voronoi              Euclidean             0.32   200   0.763     0.836    0.544
Voronoi              Normalized Euclidean  1.00   50    0.807     0.862    0.608
Voronoi              Normalized Euclidean  1.00   100   0.807     0.828    0.696
Voronoi              Normalized Euclidean  1.00   200   0.772     0.793    0.519
Voronoi              Geodesic              0.35   50    0.807     0.871    0.519
Voronoi              Geodesic              0.35   100   0.763     0.836    0.557
Voronoi              Geodesic              0.35   200   0.763     0.836    0.494
Voronoi              Normalized geodesic   1.00   50    0.868     0.922    0.608
Voronoi              Normalized geodesic   1.00   100   0.781     0.836    0.620
Voronoi              Normalized geodesic   1.00   200   0.816     0.879    0.671

4.3.3 Using Euclidean distance with uniform probability measures

We test many choices of D with Euclidean distance and uniform probability measures.

We observe that for D around 0.32, we obtain a "good" dendrogram, as shown in Figure 1.2, where the immediate children of the root consist of two large clusters that almost separate frugivores/folivores from insectivores/omnivores. Hence, we expect the Pe to be lower around D = 0.32. To view all the dendrograms, we refer our readers to our webpage 1, which we built to assist comparisons of dendrograms. When using uniform probability measures, we

1 https://research.math.osu.edu/networks/demos/teeth-dendrograms/


Figure 4.5: Dendrogram using Euclidean distance, uniform probability measure and D = 0.36. The Pe for diet is 0.494.

can see that as D moves towards 0.32, the separation between the dietary categories becomes more obvious.

However, the probability of error (Pe) in leave-one-out tells a different story. The lowest

Pe’s for family, genus and diet classifications are 0.737, 0.819 and 0.494 and are obtained at

D = 0.29, 0.36 and 0.36 respectively. In addition, at D = 0.29 and 0.36, we obtain the lowest

Pe's for family and diet across all the experiments using the OT approach. The dendrogram for D = 0.36 is shown in Figure 4.5. Although 0.29 and 0.36 are close to 0.32, the value of D with the best overall structure in its dendrogram, the Pe for diet in Table 4.4 and in Figure 4.8c shows no decreasing trend as D moves toward 0.32. This does not match our observation of the change in the structure of the dendrograms.

4.3.4 Using geodesic distance with uniform probability measures

We observe (visually) that the best dendrogram using geodesic distance with uniform probability measures was obtained when D = 0.35 as shown in Figure 4.6. The lowest Pe for family is obtained at D = 0.42; Pe for genus is the lowest at D = 1.0 when using normalized


Figure 4.6: Best dendrogram (in terms of structure) using geodesic distance and uniform probability measure with D = 0.35. The separation between the clusters is less clear than in the best dendrogram obtained using the Euclidean distance as the metric. The Pe for diet is 0.557.

geodesic distance; and the best Pe for diet is obtained at D = 0.3, 0.33 and 0.50. The dendrograms of the results that produce the lowest Pe for diet are shown in Figure 4.7.

Similar to what we observed in Section 4.3.3, Table 4.4 and Figure 4.8c suggest that there is no clear correlation between Pe and D. Moreover, the overall structures of the dendrograms in Figure 4.7b (D = 0.33) and in Figure 4.7c (D = 0.5) are drastically different.

Yet the lowest Pe is attained at both values of D when using geodesic distance. Hence, we conclude that the overall structure of a dendrogram may have little to do with the dietary identification accuracy.

When using the uniform probability measures, the classification errors for family and dietary labels are higher in general for geodesic distance than those for Euclidean distance as shown in Figure 4.8. However, for classification tasks on genus classes, we observe that the highest Pe for geodesic distance is 0.879, which is lower than the Pe = 0.897 for Euclidean distance. Moreover, the lowest Pe = 0.810 for genus when using geodesic distance is lower than that of Euclidean distance (Pe = 0.819).



Figure 4.7: Dendrograms of results using the OT approach with geodesic distance, uniform probability measures and D = 0.3 (Figure 4.7a), 0.33 (Figure 4.7b) and 0.5 (Figure 4.7c). All three experiments yield Pe = 0.519 for dietary classification.


4.3.5 Using Voronoi probability measures

From Table 4.4, we observe that when using the Euclidean distance with D = 0.32 and K = 50, we obtain the lowest Pe for family, at 0.746. When using genus as the label, we obtain a Pe as low as 0.793 in the experiment using the normalized Euclidean distance with D = 1 and K = 200. When using diet as the label, we obtain the lowest Pe of 0.494 in the experiment using the geodesic distance with D = 0.35 and K = 200. Figure 4.9 shows the corresponding dendrogram. The lowest Pe's for genus and diet observed in this section are also the lowest Pe's for genus and diet across all the experiments using the OT approach.

4.3.6 Summary of results using the OT approach

Table 4.5 shows the lowest Pe for each labeling category. The lowest Pe's are observed in experiments using various parameters. Therefore, we conclude that the flexibility given by these parameters is useful in different classification tasks.


Figure 4.8: Values of Pe for family (Figure 4.8a), genus (Figure 4.8b) and diet (Figure 4.8c) against the choice of D when using uniform probability measures with either the Euclidean distance or the geodesic distance as cost. We observe that the lowest Pe's when using the Euclidean distance are always lower than those when using the geodesic distance.


Figure 4.9: Dendrogram for the distance matrix using the Voronoi probability measures and geodesic distance with D = 0.35. This method produces the lowest Pe (= 0.494) for diet when using the OT approach.

Label    Parameters                                                                       Pe
Family   Euclidean distance, uniform probability measures, D = 0.29, K = 2000             0.737
Genus    Normalized Euclidean distance, Voronoi probability measures, D = 1.0, K = 200    0.793
Diet     Euclidean distance, uniform measure, D = 0.36, K = 2000; and                     0.494
         geodesic distance, Voronoi probability measures, D = 0.35, K = 200

Table 4.5: Parameters used in the experiments where the lowest Pe for each label category is attained.


Figure 4.10: Examples of holes in meshes. Both figures are zoomed in to better show the holes. These holes are visually hard to detect when one is looking at an entire triangulated surface. Figure 4.10a shows an example of a loop that is caused by missing triangles. The boundary of the hole is polygonal. Figure 4.10b is an example of a hole caused by filling triangles at the wrong place: the mesh becomes non-manifold in this case.

4.4 The PH Approach

4.4.1 Preprocessing data

We find tiny holes in the meshes that create cycles in H1. Figure 4.10 shows two examples of such holes. However, the surface of the crown of a tooth is homotopy equivalent to a disk [11]; that is, the surface can be continuously deformed into a disk without tearing. Hence, we believe that these holes in the meshes are erroneous and should be removed.

We use the program ShortLoop [23] to detect holes in a triangulation. The program returns the shortest list of generators (vertices) for all the holes in the triangulated surface. For each hole, we first check whether the hole has only three generators. If so, we fill the loop simply by adding a triangle whose vertices are the generators. Otherwise, we add an extra vertex at the centroid of the generators and fill the hole by connecting the centroid with each generator. Figure 4.11 shows an example of a hole being filled.
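The two cases of the filling rule can be sketched as follows, with our own data layout (vertices as 3D tuples, triangles as index triples):

```python
# Fill a hole bounded by a cycle of generator vertex indices:
# a 3-generator hole becomes a single triangle; a longer cycle is
# filled by a fan of triangles around the centroid of the generators.
def fill_hole(vertices, triangles, cycle):
    if len(cycle) == 3:
        triangles.append(tuple(cycle))
        return
    # add the centroid of the generators as a new vertex
    c = tuple(sum(vertices[i][k] for i in cycle) / len(cycle)
              for k in range(3))
    vertices.append(c)
    ci = len(vertices) - 1
    # connect the centroid to each boundary edge of the hole
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        triangles.append((a, b, ci))
```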

4.4.2 Mean curvature

Mean curvature is an important component of the experiments. Therefore, we provide a brief review of the definition and computation of mean curvature below. We refer our readers to [25] for a detailed review of curvature.

Let \(S \subset \mathbb{R}^3\) be a smooth surface and let \(T_p S\) be the tangent space of S at p. That is, for every \(v \in T_p S\) there exists a curve \(\gamma : [-1, 1] \to \mathbb{R}^3\) such that \(\gamma([-1, 1]) \subseteq S\), \(\gamma(0) = p\) and \(\gamma'(0) = v\). Let \(n_p\) be a unit normal vector to the tangent plane at p; by doing so, we fix an orientation of the surface S. Fix \(v_p \in T_p S\); then \(v_p\) and \(n_p\) span a normal plane.


Figure 4.11: Figure 4.11a shows an example of a 1-cycle in a triangular mesh. Figure 4.11b shows the new mesh with the cycle (shown on the left) filled. Darker areas are new triangles added to the mesh. The yellow vertex is the centroid of the cycle and is added to the list of vertices that generate the mesh.

Figure 4.12: Figure from Wikipedia created by Eric Gaba at https://en.wikipedia.org/wiki/Curvature#/media/File:Minimal_surface_curvature_planes-en.svg

Note that the normal plane intersects S in a curve. Figure 4.12 shows a visualization of the normal plane and tangent space given the saddle point p.

The curvature κ for the normal section is defined to be the reciprocal of the radius of

the osculating circle. That is, we fit a circle on the normal plane with maximum radius such

that the circle intersects with S at p and points sufficiently close to p. Figure 4.13 shows an

example of an osculating circle. We say that κ is positive if the osculating circle is on the

same side with the normal vector np. Otherwise κ is negative. The principal curvatures at

p, κ1 and κ2 are the maximum and minimum values of the curvatures over all the choices of

Figure 4.13: Figure from Wikipedia created by Cepheus at https://en.wikipedia.org/wiki/Curvature#/media/File:Osculating_circle.svg. Given a point p, an osculating circle is shown as a blue circle in the figure.

\(v_p\). Finally, we define the mean curvature as

\[
H(p) = \frac{1}{2}(\kappa_1 + \kappa_2) = \frac{1}{2\pi} \int_0^{2\pi} \kappa(\theta) \, d\theta,
\]

where \(\kappa(\theta)\) is the curvature associated to the unit vector at angle \(\theta\) in the tangent space.

One can use a different construction to compute the mean curvature in a discrete setting.

One of these constructions is the cotangent formula [65]. We briefly describe it below; for further details, we refer our readers to [65, 98]. Suppose M is a triangular mesh, i.e. a piecewise linear approximation of a smooth surface. Let vi be a vertex in the mesh. We call N1(vi) := {vj ∈ M | vj and vi share an edge in M} the 1-ring neighborhood of vi. The idea is to use spatial averages to describe geometric properties using finite elements. Let Ai = Σ_{j ∈ N1(vi)} A(△(vi, vj, vj+1)), where △(vi, vj, vj+1) is the triangle with vertices vi, vj, vj+1 and A(△) is the area of a triangle. Then the cotangent formula for a polygonal surface M at a vertex vi is

Lc(vi) := (1/Ai) · (1/2) Σ_{j ∈ N1(vi)} (cot(αij) + cot(βij)) (vj − vi)

where αij and βij are the two angles opposite to the edge (vi, vj) in the two triangles sharing that edge in the mesh. Figure 4.14 shows an illustration of αij and βij. The mean curvature is then given by |H(vi)| = ‖Lc(vi)‖ / 2,

where the sign of H(vi) is given by a normal vector of the surface.
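To make the construction concrete, the cotangent formula can be sketched as follows. This is an illustrative Python version (the thesis computes mean curvature via the implementation in [20]); the mesh representation — a list of 3D vertex coordinates plus triangle index triples — is an assumption for the example.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_curvature(verts, tris, i):
    """|H(v_i)| = ||L_c(v_i)|| / 2, where L_c is the cotangent Laplacian:
    L_c(v_i) = (1 / (2 A_i)) * sum_j (cot(a_ij) + cot(b_ij)) (v_j - v_i)."""
    acc = [0.0, 0.0, 0.0]
    area = 0.0  # the 1-ring area A_i
    for t in tris:
        if i not in t:
            continue
        a, b, c = (verts[k] for k in t)
        area += 0.5 * norm(cross(sub(b, a), sub(c, a)))
        j, k = (x for x in t if x != i)
        # in this triangle, the angle at k is opposite the edge (v_i, v_j),
        # and the angle at j is opposite the edge (v_i, v_k)
        for p, q in ((j, k), (k, j)):
            u, w = sub(verts[i], verts[q]), sub(verts[p], verts[q])
            cot = dot(u, w) / norm(cross(u, w))
            for d in range(3):
                acc[d] += cot * (verts[p][d] - verts[i][d])
    lc = [x / (2.0 * area) for x in acc]
    return norm(lc) / 2.0
```

On a flat patch the cotangent terms cancel and |H| vanishes, while a curved vertex (e.g. the apex of an octahedron) gives a positive value.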

4.4.3 Outline of the PH approach

Our computation is done in three steps:

Figure 4.14: Figure from [94]. The figure on the left shows the triangle on which A(△(v0, v1, v2)) is computed. The figure on the right shows the angles used to compute the cotangent formula.

1. compute mean curvatures at each vertex and build filtration functions;

2. construct filtered simplicial complexes and compute the persistence diagrams;

3. calculate the bottleneck distance between persistence diagrams.

In part 1, mean curvature is computed via [20]. Then we build a mean curvature based

filtration function on the simplicial complex induced by the triangulation of a mesh, i.e. the vertices are the 0-simplices; the edges in the triangulation are 1-simplices and the triangles are 2-simplices.

In part 2, we either build Vietoris-Rips filtration (discussed in section 3.4) or filter the simplicial complex given by a triangulation through a mean curvature based filtration func- tion constructed in part 1. In building the Vietoris-Rips filtration, we consider both the

Euclidean and the geodesic distance induced by the triangulation. The geodesic distance is constructed in the same way as discussed in section 4.3.1. We will list the filtration functions in later sections. Vietoris-Rips filtrations are built using Ripser [89] and filtered simplicial complexes on curvature-based filtration functions are computed using JavaPlex

[87].

Then part 3 is computed using Hera [50] as discussed in section 3.9.
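For intuition about what Hera computes, the bottleneck distance dB,∞ between two very small diagrams can be evaluated by brute force: every point is matched either to a point of the other diagram or to the diagonal, and the cost of a matching is its largest l∞ pair distance. A naive Python sketch (exponential in the diagram size, for illustration only — Hera implements an efficient algorithm):

```python
import itertools

def diag_dist(p):
    # l_inf distance from a point (birth, death) to the diagonal
    return (p[1] - p[0]) / 2.0

def bottleneck(D1, D2):
    """Brute-force d_B,inf between two small persistence diagrams."""
    n1, n2 = len(D1), len(D2)
    # augment each side with "diagonal" slots (None) for the other's points
    A = list(D1) + [None] * n2
    B = list(D2) + [None] * n1
    best = float("inf")
    for perm in itertools.permutations(range(len(B))):
        cost = 0.0
        for i, j in enumerate(perm):
            p, q = A[i], B[j]
            if p is None and q is None:
                c = 0.0
            elif p is None:
                c = diag_dist(q)
            elif q is None:
                c = diag_dist(p)
            else:
                c = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
            cost = max(cost, c)
        best = min(best, cost)
    return best
```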

4.4.4 Results of using the PH approach

Table 4.3 shows a list of parameters that we tuned in experimenting with persistent homology. Table 4.6 shows the Pe (probability of error) associated to each experiment using the PH approach.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
Vietoris-Rips | Euclidean | dB,∞ | 0 | 0.851 | 0.897 | 0.582
Vietoris-Rips | Euclidean | dB,∞ | 1 | 0.833 | 0.879 | 0.557
Vietoris-Rips | Normalized Euclidean | dB,∞ | 0 | 0.930 | 0.957 | 0.671
Vietoris-Rips | Normalized Euclidean | dB,∞ | 1 | 0.947 | 0.948 | 0.696
Vietoris-Rips | Geodesic | dB,∞ | 0 | 0.868 | 0.922 | 0.430
Vietoris-Rips | Geodesic | dB,∞ | 1 | 0.860 | 0.897 | 0.633
Vietoris-Rips | Normalized geodesic | dB,∞ | 0 | 0.965 | 0.974 | 0.772
Vietoris-Rips | Normalized geodesic | dB,∞ | 1 | 0.912 | 0.940 | 0.608
Vietoris-Rips | Modified geodesic, coef=30 | dB,∞ | 0 | 0.921 | 0.931 | 0.544
Vietoris-Rips | Modified geodesic, coef=30 | dB,∞ | 1 | 0.895 | 0.905 | 0.519
Vietoris-Rips | Modified geodesic, coef=40 | dB,∞ | 0 | 0.851 | 0.905 | 0.582
Vietoris-Rips | Modified geodesic, coef=40 | dB,∞ | 1 | 0.851 | 0.897 | 0.582
Vietoris-Rips | Modified geodesic, coef=50 | dB,∞ | 0 | 0.860 | 0.914 | 0.608
Vietoris-Rips | Modified geodesic, coef=50 | dB,∞ | 1 | 0.825 | 0.905 | 0.519
Vietoris-Rips | Normalized modified geodesic, coef=35 | dB,∞ | 0 | 0.956 | 0.966 | 0.646
Vietoris-Rips | Normalized modified geodesic, coef=35 | dB,∞ | 1 | 0.886 | 0.897 | 0.570
Vietoris-Rips | Normalized modified geodesic, coef=45 | dB,∞ | 0 | 0.930 | 0.940 | 0.747
Vietoris-Rips | Normalized modified geodesic, coef=45 | dB,∞ | 1 | 0.939 | 0.966 | 0.646
maxMeancurv, coef=0.1 | N/A | dB,∞ | 0 | 0.860 | 0.914 | 0.506
maxMeancurv, coef=0.2 | N/A | dB,∞ | 0 | 0.851 | 0.922 | 0.532
maxMeancurv, coef=0.3 | N/A | dB,∞ | 0 | 0.816 | 0.914 | 0.570
Normalized maxMeancurv, coef=0.1 | N/A | dB,∞ | 0 | 0.956 | 0.974 | 0.696
Normalized maxMeancurv, coef=0.2 | N/A | dB,∞ | 0 | 0.991 | 0.991 | 0.671
Normalized maxMeancurv, coef=0.3 | N/A | dB,∞ | 0 | 0.912 | 0.948 | 0.722
Normalized absMeancurv | N/A | dB,∞ | 0 | 0.965 | 0.974 | 0.696
Normalized absMeancurv | N/A | dB,∞ | 1 | 0.921 | 0.940 | 0.658
Normalized minus absMeancurv | N/A | dB,∞ | 0 | 0.921 | 0.957 | 0.620
Normalized minus absMeancurv | N/A | dB,∞ | 1 | 0.974 | 0.974 | 0.772
Normalized absMeancurv | N/A | dB,2 | 0 | 0.974 | 0.983 | 0.696
Normalized absMeancurv | N/A | dB,2 | 1 | 0.930 | 0.957 | 0.658
Normalized minus absMeancurv | N/A | dB,2 | 0 | 0.895 | 0.948 | 0.646
Normalized minus absMeancurv | N/A | dB,2 | 1 | 0.956 | 0.983 | 0.633

Table 4.6: Probability of error of experiments using persistent homology.

4.4.5 Using Vietoris-Rips filtration

The first set of experiments in the PH approach applies the Vietoris-Rips filtration (discussed in section 3.4) to the set of vertices sampled from each tooth. We use FPS (explained in section 4.3.1) to sample 500 vertices. For each shape, we build the Vietoris-Rips filtration on the sampled vertices using both the Euclidean distance and the geodesic distance discussed in section 4.3.1. According to Table 4.6 and Table 4.7, the best Pe for diet is 0.430 and is obtained when using the geodesic distance with the Vietoris-Rips filtration and the bottleneck distance between 0th-persistence diagrams. Its dendrogram is shown in Figure 4.15. For family and genus, the best Pe is 0.833 and 0.879 respectively, obtained when using the Euclidean distance with the Vietoris-Rips filtration and the bottleneck distance between 1st-persistence diagrams.
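The FPS subsampling step can be sketched as a greedy loop: repeatedly add the point farthest from everything chosen so far. A minimal Python version, assuming a precomputed pairwise distance matrix (the variable names are illustrative):

```python
def farthest_point_sampling(dm, k, start=0):
    """Greedy farthest point sampling from a full pairwise distance
    matrix dm (list of lists); returns the indices of k chosen points."""
    chosen = [start]
    # mind[i] = distance from point i to the chosen set so far
    mind = list(dm[start])
    while len(chosen) < k:
        nxt = max(range(len(dm)), key=lambda i: mind[i])
        chosen.append(nxt)
        mind = [min(mind[i], dm[nxt][i]) for i in range(len(dm))]
    return chosen
```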

In the hope of improving the classification results, we consider a modification of the geodesic distance. Instead of using the Euclidean distance as the edge weight, we construct a new weight function that depends on the mean curvature for each edge [u, v]:

W([u, v]) := ‖u − v‖ e^{−α min{|H(u)|, |H(v)|}} (4.1)
where H(u) and H(v) are the mean curvatures at u and v defined in section 4.4.2 and α is a constant. This new weight function on edges makes edges where both vertices have large


Figure 4.15: Dendrogram of bottleneck distance matrix of 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance. The Pe for diet is 0.430, which is also the lowest Pe among all the experiments using the PH approach.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
Vietoris-Rips | Euclidean | dB,∞ | 0 | 0.851 | 0.897 | 0.582
Vietoris-Rips | Euclidean | dB,∞ | 1 | 0.833 | 0.879 | 0.557
Vietoris-Rips | Normalized Euclidean | dB,∞ | 0 | 0.930 | 0.957 | 0.671
Vietoris-Rips | Normalized Euclidean | dB,∞ | 1 | 0.947 | 0.948 | 0.696
Vietoris-Rips | Geodesic | dB,∞ | 0 | 0.868 | 0.922 | 0.430
Vietoris-Rips | Geodesic | dB,∞ | 1 | 0.860 | 0.897 | 0.633
Vietoris-Rips | Normalized geodesic | dB,∞ | 0 | 0.965 | 0.974 | 0.772
Vietoris-Rips | Normalized geodesic | dB,∞ | 1 | 0.912 | 0.940 | 0.608

Table 4.7: Probability of error of experiments where filtered simplicial complex is built through Vietoris-Rips filtration.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
Vietoris-Rips | Modified geodesic, coef=30 | dB,∞ | 0 | 0.921 | 0.931 | 0.544
Vietoris-Rips | Modified geodesic, coef=30 | dB,∞ | 1 | 0.895 | 0.905 | 0.519
Vietoris-Rips | Modified geodesic, coef=40 | dB,∞ | 0 | 0.851 | 0.905 | 0.582
Vietoris-Rips | Modified geodesic, coef=40 | dB,∞ | 1 | 0.851 | 0.897 | 0.582
Vietoris-Rips | Modified geodesic, coef=50 | dB,∞ | 0 | 0.860 | 0.914 | 0.608
Vietoris-Rips | Modified geodesic, coef=50 | dB,∞ | 1 | 0.825 | 0.905 | 0.519
Vietoris-Rips | Normalized modified geodesic, coef=35 | dB,∞ | 0 | 0.956 | 0.966 | 0.646
Vietoris-Rips | Normalized modified geodesic, coef=35 | dB,∞ | 1 | 0.886 | 0.897 | 0.570
Vietoris-Rips | Normalized modified geodesic, coef=45 | dB,∞ | 0 | 0.930 | 0.940 | 0.747
Vietoris-Rips | Normalized modified geodesic, coef=45 | dB,∞ | 1 | 0.939 | 0.966 | 0.646

Table 4.8: Probability of error of experiments where the filtered simplicial complex is built through Vietoris-Rips with the modified weight given in (4.1).

magnitude of mean curvature cheap to traverse. Hence, when the filtration value is small, the filtered simplicial complex captures the silhouette of each shape first.
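A sketch of how the modified weights in (4.1) feed into a geodesic computation, using a standard Dijkstra traversal; the graph representation below is an assumption for the example, and the thesis pipeline differs in detail:

```python
import heapq
import math

def modified_geodesic(verts, edges, H, alpha, src):
    """Single-source shortest paths where, as in (4.1), each edge [u, v]
    has weight ||u - v|| * exp(-alpha * min(|H(u)|, |H(v)|))."""
    adj = {i: [] for i in range(len(verts))}
    for u, v in edges:
        d = math.dist(verts[u], verts[v])
        wt = d * math.exp(-alpha * min(abs(H[u]), abs(H[v])))
        adj[u].append((v, wt))
        adj[v].append((u, wt))
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, wt in adj[u]:
            nd = du + wt
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

Edges whose endpoints both have large |H| get exponentially discounted weights, so paths through high-curvature regions become short, which is the effect the text describes.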

We tested α = 30, 40, and 50 and observed that the lowest Pe for diet is 0.519, obtained at α = 30 and at α = 50 in dimension 1. Figure 4.16 shows the dendrograms for α = 30 and α = 50. For genus, the lowest Pe is 0.897 and is observed in two experiments:

1. using α = 40 in (4.1) and comparing the 1st-persistence diagrams;

2. using α = 35 in Normalized (4.1) and comparing the 1st-persistence diagrams.

Although the classification accuracy is not improved over using the original geodesic distance for genus and diet, the lowest Pe for family is 0.825 when setting α = 50. Table 4.8 shows the list of Pe's of the experiments using modified geodesic distances.

4.4.6 Using mean curvature based filtration functions

The first filtration function that we considered incorporates both the Euclidean distance and the mean curvature. Let H(v) denote the mean curvature at a vertex v and let h = max_v |H(v)|. The filtration function f is constructed in the following way:

f(v) := h − |H(v)| for all vertices v (4.2)
f([u, v]) := max{λ‖u − v‖, h − |H(u)|, h − |H(v)|} for all edges [u, v]

where λ is to be determined. The parameter λ adjusts the effect of the Euclidean distance in the filtration function. When λ = 0, the filtration value for any edge is the maximum of h − |H(·)| over its two vertices, so vertices with high magnitude of mean curvature enter the filtered simplicial complex first. When λ is sufficiently large, the filtration value for any edge is dominated by λ times the Euclidean distance between its two vertices; then short edges connecting two vertices with high curvatures enter the filtered simplicial complex before any other edges. For some λ in between, we observe an outline of a tooth for small filtration values.
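The construction of (4.2) can be written directly; the following hypothetical Python helper returns the filtration values on vertices and edges (the data layout — vertex coordinates, edge index pairs, and a list of mean curvatures — is assumed for the example):

```python
import math

def filtration_42(verts, edges, H, lam):
    """Filtration values from (4.2): f(v) = h - |H(v)| on vertices and
    f([u,v]) = max(lam * ||u - v||, h - |H(u)|, h - |H(v)|) on edges,
    where h = max_v |H(v)|."""
    h = max(abs(x) for x in H)
    fv = {i: h - abs(H[i]) for i in range(len(verts))}
    fe = {}
    for u, v in edges:
        fe[(u, v)] = max(lam * math.dist(verts[u], verts[v]), fv[u], fv[v])
    return fv, fe
```

With lam = 0 the edge values reduce to the curvature term, and for large lam the Euclidean term dominates, matching the two regimes described above.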


Figure 4.16: Dendrograms of the bottleneck distances between the 1st-persistence diagrams. The persistence diagrams are produced using the modified geodesic distance in (4.1) in the Vietoris-Rips filtration. Figure 4.16a shows the result when setting α = 30. Figure 4.16b shows the result when setting α = 50. Both results yield Pe = 0.519 for dietary classification.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
maxMeancurv, coef=0.1 | N/A | dB,∞ | 0 | 0.860 | 0.914 | 0.506
maxMeancurv, coef=0.2 | N/A | dB,∞ | 0 | 0.851 | 0.922 | 0.532
maxMeancurv, coef=0.3 | N/A | dB,∞ | 0 | 0.816 | 0.914 | 0.570
Normalized maxMeancurv, coef=0.1 | N/A | dB,∞ | 0 | 0.956 | 0.974 | 0.696
Normalized maxMeancurv, coef=0.2 | N/A | dB,∞ | 0 | 0.991 | 0.991 | 0.671
Normalized maxMeancurv, coef=0.3 | N/A | dB,∞ | 0 | 0.912 | 0.948 | 0.722

Table 4.9: Probability of error of experiments where filtered simplicial complex is built through the filtration function (4.2) in section 4.4.6.

We only compute the 0-dimensional persistence diagrams. The Pe’s are shown in Table

4.9. The lowest Pe for family is 0.816 and is obtained when using λ = 0.3. It is better than the Pe for the experiments using Vietoris-Rips filtration. In fact, it is the best Pe for family across all the experiments using the PH approach. The lowest Pe for genus is 0.914 and is obtained when using λ = 0.1 or 0.3. The lowest Pe for diet is 0.506 and is obtained when

λ = 0.1. It is slightly better than the Pe for diet when using modified geodesic distance.

The dendrogram for the experiment using λ = 0.1 is shown in Figure 4.17. The structure of the dendrogram is visually similar to that of Figure 4.15.

Since (4.2) improved the Pe for family classification, we hope that by exploring other filtration functions that are more directly related to the mean curvature, we will obtain a better Pe for family.

For the following family of filtration functions, we first define the filtration values at the vertices and extend them to edges and triangles by

fi([u, v]) = max {fi(u), fi(v)} for any edge [u, v]

fi([u, v, w]) = max {fi(u), fi(v), fi(w)} for any triangle [u, v, w].

We considered the following collection of functions:

f1(v) := |H(v)| (4.3)

f2(v) := H(v) − min_v H(v) (4.4)

f3(v) := max_v |H(v)| − |H(v)| (4.5)

f4(v) := max_v H(v) − H(v) (4.6)

We note that in (4.5), the vertex with the largest magnitude of mean curvature appears first in the filtered complex, followed by vertices with the next largest magnitude of mean curvature, and so on. Hence, the sublevel set filtration of (4.5) mimics the behavior of applying the superlevel set filtration to the function in (4.3). Similarly, (4.6) is the superlevel set formulation of (4.4).
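Once a vertex function is fixed and extended to edges by taking maxima, its 0-dimensional sublevel-set persistence can be obtained by the standard union-find sweep with the elder rule. The thesis uses JavaPlex for this; the sketch below is only illustrative:

```python
def zeroth_persistence(fv, edges):
    """0-dimensional persistence of a sublevel-set filtration: fv is a
    list of vertex values; each edge enters at max(fv[u], fv[v]).
    Returns sorted (birth, death) pairs; the oldest component never dies."""
    parent = list(range(len(fv)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for u, v in sorted(edges, key=lambda e: max(fv[e[0]], fv[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        # elder rule: the younger component (larger birth value) dies;
        # keep the root with the smaller vertex value as the survivor
        if fv[ru] > fv[rv]:
            ru, rv = rv, ru
        pairs.append((fv[rv], max(fv[u], fv[v])))
        parent[rv] = ru
    roots = {find(i) for i in range(len(fv))}
    pairs.extend((fv[r], float("inf")) for r in roots)
    return sorted(pairs)
```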


Figure 4.17: The dendrogram of the result when using (4.2) with λ = 0.1 to construct a filtered simplicial complex. The Pe for diet is 0.506.

Because it is too expensive to compute a full bottleneck distance matrix of all 116 teeth, we first compare the bottleneck distance matrix of a collection of seven shapes. The collection of sample shapes is chosen such that it contains a pair of teeth from three different families covering two diets, together with the tooth from Megaladapis (see section 4.1). The surfaces of the crowns of these teeth are shown in Figure 4.18. The dendrograms for the bottleneck distance matrices of the 0th and 1st-persistence diagrams for (4.3) - (4.6) are shown in Figure

4.19. We see that only the third dendrogram in the first row and the last dendrogram in the third row show clusters with the same dietary label being merged first. These are the dendrograms corresponding to the bottleneck distance of the 0th-persistence diagrams for normalized (4.3) and of the 1st-persistence diagrams for normalized (4.5). Hence, we only compute the full bottleneck distance between persistence diagrams for these filtration functions. To test the performance of other bottleneck distances such as dB,2, we also

compute the dB,2 between persistence diagrams for normalized (4.3) and normalized (4.5).

Table 4.10 shows the Pe for the full distance matrix using normalized (4.3) and normalized (4.5). The best Pe for family is 0.895 and is obtained when applying the dB,2 distance

Figure 4.18: Crown surfaces of a collection of teeth in the data set that are used for testing. The title of each subfigure shows the family, diet and code of the tooth. Teeth in the same row share the same family and diet. The coloring on the surfaces indicates the mean curvature at the vertices.


Figure 4.19: Dendrograms for the filtration functions listed in (4.3) - (4.6). Each row corresponds to a filtration function in the order of (4.3) - (4.6). Odd columns show dendrograms of bottleneck distance matrices of 0th-persistence diagrams whereas the even columns show those of 1st-persistence diagrams. The left half of the figure is from unnormalized filtration functions and the right half is where normalized filtration functions are applied.

Filtration | Metric | BarcodeDistance | Dim | PeFamily | PeGenus | PeDiet
Normalized absMeancurv | N/A | dB,∞ | 0 | 0.965 | 0.974 | 0.696
Normalized absMeancurv | N/A | dB,∞ | 1 | 0.921 | 0.940 | 0.658
Normalized minus absMeancurv | N/A | dB,∞ | 0 | 0.921 | 0.957 | 0.620
Normalized minus absMeancurv | N/A | dB,∞ | 1 | 0.974 | 0.974 | 0.772
Normalized absMeancurv | N/A | dB,2 | 0 | 0.974 | 0.983 | 0.696
Normalized absMeancurv | N/A | dB,2 | 1 | 0.930 | 0.957 | 0.658
Normalized minus absMeancurv | N/A | dB,2 | 0 | 0.895 | 0.948 | 0.646
Normalized minus absMeancurv | N/A | dB,2 | 1 | 0.956 | 0.983 | 0.633

Table 4.10: Probability of error of experiments where the filtered simplicial complex is built through normalized (4.3) and normalized (4.5). "absMeancurv" represents (4.3) and "minus absMeancurv" represents (4.5).

Filtration | Dim | Change in PeFamily | Change in PeGenus | Change in PeDiet
Normalized absMeancurv | 0 | -0.009 | -0.009 | -0.006
Normalized absMeancurv | 1 | -0.009 | -0.017 | 0
Normalized minus absMeancurv | 0 | 0.026 | 0.009 | -0.026
Normalized minus absMeancurv | 1 | 0.018 | -0.009 | 0.139

Table 4.11: Change in Pe observed in Table 4.10 that is caused by switching from dB,∞ to dB,2 for the distance between persistence diagrams.

to 0th-persistence diagrams that are filtered by normalized (4.5). The best Pe for genus is 0.940 and is obtained when using dB,∞ to compare 1st-persistence diagrams for normalized (4.3). The best Pe for diet is 0.620 and is obtained when using dB,∞ on 0th-persistence diagrams for normalized (4.5). The dendrogram is shown in Figure 4.20. The lowest Pe's we obtained using normalized (4.3) are greater than the lowest Pe's when using (4.2) as the filtration function.

The benefit of using dB,2 as opposed to dB,∞ is not obvious. Table 4.11 shows the change in Pe caused by switching from dB,∞ to dB,2. Only 4 of the 12 cells in Table 4.11 show an improvement in Pe from the switch, while 7 of the 12 cells show a worse Pe. Persistence diagram-wise, computing dB,2 instead of dB,∞ between 0th-persistence diagrams for normalized (4.3) does not improve the Pe's, but computing dB,2 between persistence diagrams for normalized (4.5) helps lower 2 out of 3 Pe's. A more thorough examination is required for a more decisive conclusion.

4.4.7 Summary of results using the PH approach

Table 4.12 shows the methods with the best Pe using the PH approach. We see that although the Pe's for family and genus classification for the PH approach are higher than those for the OT approach, the Pe for diet for the PH approach is better than the Pe for diet


Figure 4.20: The dendrogram of the result when using dB,∞ on 0th-persistence diagrams for normalized (4.5) ("Normalized minus absMeancurv"). The Pe for diet is 0.620.

Label | Method | Pe
Family | bottleneck distance on 0th-persistence diagrams of filtration function (4.2) with λ = 0.3 | 0.816
Genus | bottleneck distance on 1st-persistence diagrams of Vietoris-Rips filtration with Euclidean distance | 0.879
Diet | bottleneck distance on 0th-persistence diagrams of Vietoris-Rips with geodesic distance | 0.430

Table 4.12: List of methods with the best Pe using the PH approach.

for the OT approach (which is 0.494 according to Table 4.5). We observe that across all the implementations we discussed, the best Pe for genus is almost always obtained from distance matrices of 1st-persistence diagrams.

4.5 Comparison of the results from the OT approach and the PH approach

Figure 4.21 provides a more detailed visualization of the distance matrices that produce one of the best Pe's for dietary classification. Such a visualization, combining a heatmap with a dendrogram, is called a clustergram. The colors in the heatmap indicate the distances between two teeth given by the distance matrix. Figure 4.21a shows the heatmap and the dendrogram from the experiment using Euclidean distance, uniform probability measures, D = 0.36 and K = 2000. Figure 4.21b shows the heatmap and dendrogram for the experiment using dB,∞ comparing 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance.

In Figure 4.21a, we can see two large blocks along the diagonal that represent the two clusters we observed in Figure 4.5 separating the diets into two groups. However, such a division is not reflected in Figure 4.21b when using the PH approach.

Consensuses and differences between the two clustergrams can be found by comparing smaller clusters in Figure 4.21. The colors in the dendrograms aid the discrimination of smaller clusters. These smaller clusters are chosen so that there are 9 clusters of relatively short merge heights in each dendrogram. The same color is assigned to a pair of clusters that share similar members in Figures 4.21a and 4.21b. We can learn about the general structures of the distance matrices by grouping the clusters into two collections as shown in Figure 4.21. The fact that we can use color to identify similar clusters suggests that the two distance matrices are locally similar. Another consensus between the two results is that clusters in Group 1 are close to each other in both distance matrices.

Moreover, we can see that the clusters in Group 2 are far away from clusters in Group 1. If we look more closely at the clusters in Group 2, we can discover that the collection of the bright red, gray and black clusters contains the exact same items in both distance matrices.

On the other hand, one major difference between the two distance matrices is that the overall structures of the dendrograms are different. As we observed in the dendrograms shown in section 4.3 as well as in Figure 4.21a, there is a clear separation between the clusters that contain omnivores/insectivores and those that contain folivores/frugivores. However, such a clear distinction between dietary labels cannot be observed in Figure 4.21b. Another difference is that clusters in Group 2 are farther away from each other in Figure 4.21b than they are in Figure 4.21a.

We can see more yellow at the boundary of the distance matrix in Figure 4.21b if we restrict the distance matrix to clusters in Group 2. If one inspects the clusters individually, one can identify that the yellow and purple clusters exhibit the most distinctions. In fact, items shift between the yellow cluster and the purple cluster. Moreover, the order of merges is different in the dendrograms. In Figure 4.21a, the yellow cluster merges with the blue cluster

first, whereas in Figure 4.21b, the yellow cluster merges with the purple cluster first. Recall that in single-linkage hierarchical clustering, the distance between clusters is given by the minimal distance between a pair of items in the two clusters. Hence, we know that in Figure 4.21a, the closest pair of teeth in the blue and yellow clusters is closer than the closest pair in the yellow and purple clusters.
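The single-linkage rule just recalled can be sketched in a few lines; this naive quadratic Python version records the merge order and heights that a dendrogram displays:

```python
def single_linkage_merges(dm):
    """Naive single-linkage agglomeration from a pairwise distance
    matrix dm: repeatedly merge the two clusters whose closest pair of
    items is smallest, recording (cluster_a, cluster_b, merge_height)."""
    clusters = [{i} for i in range(len(dm))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: minimal distance over all cross pairs
                d = min(dm[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] |= clusters[b]
        del clusters[b]
    return merges
```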


Figure 4.21: Clustergrams (heatmap with dendrogram) for the distance matrices that produce the best Pe for dietary classifications using the OT and PH approaches. Figure 4.21a is produced by using Euclidean distance, uniform probability measures, D = 0.36 and K = 2000. Figure 4.21b is produced by using dB,∞ comparing 0th-persistence diagrams for Vietoris-Rips filtration with geodesic distance. Labels on the right indicate diets and labels at the bottom show the family of each tooth.

Chapter 5: Contributions and Future Work

5.1 Conclusion

We explored several implementations of two approaches, based on optimal transport (OT) and persistent homology (PH), to constructing distances between shapes. Table 4.5 and Table 4.12 show the experiments that yield the best Pe for the OT approach and the PH approach respectively. We observe that the OT approach yields better Pe's for family and genus classifications (Pe = 0.737 and 0.793), but the PH approach outperforms the OT approach when classifying our data set by diet (Pe = 0.430). Although the classification success rates are not very good, by testing various methods, we improve the classification results by 22.5%, 18% and 32% over randomly guessing family, genus or diet labels.

Our results do not compete with the results obtained in [11]. The success rates obtained there in leave-one-out classification for family and genus are 90.9% and 92.5% when using the continuous Procrustes distance, which is also discussed in [3].

The benefit of the flexibility that both approaches provide is also reflected in the results. By choosing different metrics and probability measures in the OT approach, or choosing a suitable filtration function in the PH approach, we refined our results for various classification tasks. Therefore, we believe that there is plenty of room for improvement in shape classification via optimal transport and persistent homology.

5.2 Future work

5.2.1 Improvement on the OT approach

We fed the entropic regularizer ε = 0.005 to Sinkhorn's algorithm throughout our experiments to compute the TLB as discussed in section 4.3.1. One can decrease ε further and choose the number of iterations correspondingly. As ε decreases, the approximation given by Sinkhorn's algorithm converges to the true value of the unregularized transport problem defining the lower bound.
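The Sinkhorn iterations themselves are short enough to sketch. The version below is an illustrative Python implementation of entropic-regularized transport between two discrete measures (the thesis calls an existing MATLAB implementation; the cost matrix C and measures mu, nu here are placeholders):

```python
import math

def sinkhorn(C, mu, nu, eps, niter=200):
    """Entropic-regularized OT: with K = exp(-C/eps), alternately rescale
    u and v so that the coupling P = diag(u) K diag(v) has marginals
    close to mu and nu; returns P."""
    K = [[math.exp(-c / eps) for c in row] for row in C]
    n, m = len(mu), len(nu)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(niter):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Smaller eps makes the coupling closer to an unregularized optimal plan but requires more iterations, which is the trade-off noted above.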

Another direction to improve the classification results using the OT approach is to ap- proximate the Gromov-Wasserstein distance directly, rather than computing a lower bound.

Discussion on using gradient descent to solve for the Gromov-Wasserstein distance can be

74 found in [74]. Although in theory, gradient descent seeks a local minimum without the

promise of finding a global one, empirical results suggest that using gradient descent to solve

the optimal transport problem always outputs the global optimal solution [74].

5.2.2 Improvement on the PH approach

In section 4.4.6, we proposed a few filtration functions. One can invent more filtration

functions that do not necessarily depend on the mean curvature. For example, the Gaussian

curvature can be a reasonable substitute for the mean curvature.
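For instance, a common discrete Gaussian curvature is the angle defect at a vertex, normalized by one third of the 1-ring area. A hypothetical Python sketch (the mesh layout is assumed, as in the mean-curvature example):

```python
import math

def angle_defect_gaussian(verts, tris, i):
    """Discrete Gaussian curvature at vertex i via the angle defect:
    K(v_i) = (2*pi - sum of incident angles) / (A_i / 3)."""
    total, area = 0.0, 0.0
    for t in tris:
        if i not in t:
            continue
        j, k = [x for x in t if x != i]
        u = [a - b for a, b in zip(verts[j], verts[i])]
        w = [a - b for a, b in zip(verts[k], verts[i])]
        nu = math.sqrt(sum(x * x for x in u))
        nw = math.sqrt(sum(x * x for x in w))
        cosang = sum(a * b for a, b in zip(u, w)) / (nu * nw)
        total += math.acos(max(-1.0, min(1.0, cosang)))  # angle at v_i
        cx = (u[1]*w[2] - u[2]*w[1], u[2]*w[0] - u[0]*w[2], u[0]*w[1] - u[1]*w[0])
        area += 0.5 * math.sqrt(sum(x * x for x in cx))
    return (2.0 * math.pi - total) / (area / 3.0)
```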

Persistent homology transform (PHT) [90] is another tool on which one can experiment.

Instead of filtering through one filtration function, persistent homology transform proposes

that one can filter a simplicial complex on multiple filtration functions and obtain a collection

of persistence diagrams for a shape. To compare two shapes using PHT, one can compute the

bottleneck distance on each pair of persistence diagrams coordinate-wise. Although computing the coordinate-wise bottleneck distance increases the computational cost, PHT reduces the possibility that noise is picked up by one particular filtration function.
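The core of the idea can be sketched as follows: for each direction w, the height function v ↦ ⟨v, w⟩ induces a lower-star filtration, and PHT compares the resulting families of diagrams. A minimal 2D Python illustration (the direction set and mesh layout are assumptions for the example):

```python
import math

def pht_filtrations(verts, edges, n_dirs=8):
    """For each of n_dirs unit directions w in the plane, build the
    lower-star filtration of the height function v -> <v, w>: vertex
    values <v, w>, edge values max over endpoints."""
    filtrations = []
    for s in range(n_dirs):
        theta = 2.0 * math.pi * s / n_dirs
        w = (math.cos(theta), math.sin(theta))
        fv = [v[0] * w[0] + v[1] * w[1] for v in verts]
        fe = {(u, t): max(fv[u], fv[t]) for u, t in edges}
        filtrations.append((fv, fe))
    return filtrations
```

Each (fv, fe) pair would then be fed to a persistence computation, yielding one diagram per direction; two shapes are compared diagram-by-diagram.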

5.2.3 Other approaches

One can explore the continuous Procrustes distance [11, 3] for disk-like shape classification tasks. Yet, a more general landmark-free approach proposed as a follow-up work to [11] may interest some of our readers. In 2015, Boyer et al. released an R package called auto3dgm [12]. The method was developed by Puente as part of his PhD thesis [76]. By computing all pairwise alignments and distances using an iterative closest point process [8] and computing the minimum spanning tree, auto3dgm automatically generates a set of landmarks for shape comparison. Boyer et al. [12] tested the approach on a data set of bones, compared the results with those based on user-defined landmarks, and concluded that auto3dgm provides reliable, automatically generated landmarks.
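A minimal version of the alignment step can be sketched as follows (Python; `best_rigid` and `icp` are our illustrative names, and production code such as auto3dgm additionally handles subsampling, candidate initializations, and the spanning-tree propagation described above):

```python
import numpy as np

def best_rigid(A, B):
    """Least-squares rotation + translation mapping 3D points A onto
    corresponding points B (Kabsch/orthogonal Procrustes via SVD)."""
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t

def icp(X, Y, n_iter=20):
    """Bare-bones iterative closest point: repeatedly match each point
    of X to its nearest neighbor in Y, then solve for the best rigid
    motion; returns the aligned copy of X."""
    Z = X.copy()
    for _ in range(n_iter):
        # brute-force nearest neighbors (a k-d tree is used in practice)
        nn = Y[np.argmin(((Z[:, None, :] - Y[None, :, :]) ** 2).sum(-1), axis=1)]
        R, t = best_rigid(Z, nn)
        Z = Z @ R.T + t
    return Z
```

As with any ICP variant, this only converges to the correct alignment when the initial pose is close enough that most nearest-neighbor matches are correct, which is why auto3dgm propagates alignments along a minimum spanning tree of similar shapes.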

5.2.4 Experiments on different data sets

Morphobrowser2 is an online database of 3D scans of mammalian teeth, together with visualizations. One can add the teeth in Morphobrowser to the data set provided in [11] to form a larger data set and repeat the experiments.

One can also explore other anatomical surfaces, since optimal transport and persistent homology do not require special structure on the input data set. For example, brain imaging methods generate a large amount of data for studying activity in human brains.

2http://morphobrowser.biocenter.helsinki.fi/

One can also explore other data sets where interesting topology is observed. ShapeNetCore3 is an online database that contains 3D models of 51,300 items across 55 categories of common objects, such as airliners, bathtubs, and pianos.

3https://www.shapenet.org/

Appendix A: Main functions and scripts

A.1 OT approach

A.1.1 Compute local distribution

function v = localDist(dm,NN,DD,mu)
%%%% Input:
%%%% dm: pairwise distance matrix
%%%% NN: number of nodal points in [0,DD]
%%%% DD: size of neighborhood
%%%% mu: measure
n = length(dm);
TimeMesh = linspace(0,DD,NN);
v = zeros(n,NN);
for i = 1:n
    for j = 1:NN
        ii = find(dm(i,:) <= TimeMesh(j));
        v(i,j) = sum(mu(ii));
    end
end
v = v/max(v(:));
end

A.1.2 Compute TLB

function compute_TLBpar(K, epsParam, niter, DD, NN, muX, muY, metric_type, normed, data_path, localDist_path, save_path)
%%%% Input:
%%%% K: number of sampled points
%%%% epsParam: entropic regularizer
%%%% niter: number of iterations for Sinkhorn's algorithm
%%%% DD: size of neighborhood
%%%% NN: number of points in the mesh of [0,DD]
%%%% muX: measure on X
%%%% muY: measure on Y
%%%% metric_type: "euc" for Euclidean distance as cost
%%%%              "geo" for geodesic distance as cost
%%%% normed: boolean for normalizing distance
%%%% data_path, localDist_path, save_path: paths for loading data,
%%%%              saving local distributions, and saving output
addpath('sinkhorn/matlab/'); % add path to Sinkhorn's algorithm
addpath('toolbox_general/'); % add path to toolbox_general
savename = ['K' num2str(K) '_epsParam' num2str(epsParam) ...
    '_niter' num2str(niter) '_DD' num2str(DD)];
if normed
    savename = ['normed_' savename];
end
if strcmp(metric_type,'geo')
    savename = ['geo' savename];
else
    savename = ['euc' savename];
end
localDist_path = [localDist_path savename '/'];
disp(['locDist path: ' localDist_path])
if ~exist(localDist_path)
    mkdir(localDist_path);
end

%% options for Sinkhorn solver
options.tau = 0;
options.niter = niter;
options.verb = 0;
EpsParam = epsParam;

%% read meshes and process
dir_meshes = dir([data_path '*.off']);
nm = length(dir_meshes);
if ~exist(localDist_path)
    mkdir(localDist_path)
end
if ~exist(['tlb_cache/' savename '/'])
    mkdir(['tlb_cache/' savename '/'])
end

M = 200;
dim = min(nm,M);
p = nchoosek(1:dim,2);
parfor k = 1:size(p,1)
    ik = p(k,1);
    jk = p(k,2);
    if exist(['tlb_cache/' savename '/tlb' num2str(ik) '_' num2str(jk) '.mat'],'file') ~= 2
        try
            T = load([localDist_path num2str(ik) '.mat']);
            vX = T.v;
        catch
            vX = process_dm([data_path dir_meshes(ik).name],ik,K,NN,DD, ...
                muX,metric_type,normed,localDist_path);
        end
        try
            T = load([localDist_path num2str(jk) '.mat']);
            vY = T.v;
        catch
            vY = process_dm([data_path dir_meshes(jk).name],jk,K,NN,DD, ...
                muY,metric_type,normed,localDist_path);
        end

        %% compute cost matrix (p=1)
        Q = zeros(K,K);
        for i = 1:K
            vi = vX(i,:);
            for j = 1:K
                vj = vY(j,:);
                Q(i,j) = sum(abs(vi-vj))*DD/NN;
            end
        end

        %% compute TLB using Sinkhorn's algorithm
        [u,v,gamma,Wprimal,Wdual,err] = sinkhorn_log(muX,muY,Q,epsParam,options);
        tlb = sum(sum(Q.*gamma)); % this calculates the TLB (p=1)
        save_gamma(gamma,ik,jk,savename);
        save_tlbij(tlb,ik,jk,savename);
    end
end

%% read tlb for each ij
dir_tlb = dir(['tlb_cache/' savename '/*.mat']);
TLB = zeros(length(dir_meshes));
for l = 1:length(dir_tlb)
    namel = dir_tlb(l).name;
    [i,j,tlb] = read_tlbij(['tlb_cache/' savename '/' namel]);
    TLB(i,j) = tlb;
end
size(TLB)
TLB = max(TLB,TLB');

%% put everything together -- prepare results so they can be saved
disp('Saving TLB...')
results.TLB = TLB;
results.DD = DD;
results.NN = NN;
results.EpsParam = EpsParam;
results.options = options;
results.K = K;
save(['results_' savename '.mat'],'results');
end

function v = process_dm(filename,ik,K,NN,DD,mu,metric_type,normed,localDist_path)
%%%% input:
%%%% filename: path to the mesh file to be read
%%%% ik: index of the file in the directory (also acts
%%%%     as the saved file name)
%%%% NN: number of nodal points in [0,DD]
%%%% DD: size of neighborhood
%%%% mu: measure
%%%% metric_type: "euc" for Euclidean distance as cost
%%%%              "geo" for geodesic distance as cost
%%%% localDist_path: path to where local distributions are saved
[T,X,Y,Z] = read_off_ph(filename);
disp('Computing Euclidean Farthest Point Sampling...')
if K < length(X)
    I = euclid_far_samp(X,Y,Z,K);
else
    I = 1:length(X);
end
% compute dm
dmX = distance_matrix(metric_type,I,T,X,Y,Z);
diamX = max(max(dmX));
if normed
    dmX = dmX/diamX; % normalize dm
end
v = localDist(dmX,NN,DD,mu);
savetomat([localDist_path num2str(ik) '.mat'],v,I,dmX);
end

A.2 The PH approach

A.2.1 Fill 1-cycles

%% load all the names for the meshes that contain a 1-cycle
%% res: structure variable with one field ``fnames''
%% fnames: cell array of the filenames of the meshes that
%%         contain a 1-cycle
res = load('../data/cache/all_loops.mat');
fnames = res.fnames;
wd = pwd;
datapath = '../../CPsurfcomp/DATA/teeth/meshes/';
savepath = '../data/cache/ShortLoop/';
shortloop_path = '../../ShortLoop/';
options.curvature_smoothing = 1; % structure variable for
                                 % curvature computation
max_dimension = 3;
num_divisions = 1000;
dim = 2;

% find shortest sets of generators of 1-cycles using ShortLoop
cd(shortloop_path)
for f = 1:length(fnames)
    system(['./ShortLoop ../CPsurfcomp/DATA/teeth/meshes/' fnames{f} ' -v -t'])
end
cd(wd)
system(['mv ../../CPsurfcomp/DATA/teeth/meshes/*_loops* ' savepath])
system(['mv ../../CPsurfcomp/DATA/teeth/meshes/*_timing* ' savepath])
for f = 1:length(fnames)
    disp('loading loops')
    all_loops = load_loop([savepath fnames{f}]);
    [Tp,Xp,Yp,Zp] = read_off_ph([datapath fnames{f}]);
    disp('filling holes')
    [T,X,Y,Z] = fill_holes(Tp,Xp,Yp,Zp,all_loops);
    save(['../data/cache/fillhole_T/' fnames{f}(1:end-4) '.mat'], ...
        'T','X','Y','Z');
end

function all_loops = load_loop(fname)
%%%% load shortest loops
%%%% input:
%%%% fname: name of file

%% read file
filestr = fileread([fname(1:end-4) '_loops.txt']);
filebyline = regexp(filestr, '\n', 'split');
%% eliminate empty rows
filebyline(cellfun(@isempty,filebyline)) = [];
all_loops = cell(length(filebyline)-2,1);
for l = 2:length(filebyline)-1
    line = regexp(filebyline{l}, ':', 'split');
    loop = regexp(line{2}, ' ', 'split');
    loop(cellfun(@isempty,loop)) = [];
    loop_id = str2double(loop)+1;
    all_loops{l-1} = loop_id;
end
end

function [Tp,Xp,Yp,Zp] = fill_holes(T,X,Y,Z,all_loops)
%%%% fill loops
%%%% input:
%%%% T: triangular mesh
%%%% X,Y,Z: x,y,z coordinates of vertices
%%%% all_loops: cell array of generators of 1-cycles
%%%% output:
%%%% Tp: new triangular mesh where 1-cycles are filled
%%%% Xp,Yp,Zp: new x,y,z coordinates of vertices
num_holes = length(all_loops);
Tp = T; Xp = X; Yp = Y; Zp = Z;
for h = 1:num_holes
    hole = all_loops{h};
    if length(hole) == 3
        Tp(end+1,:) = hole;
    else
        % add the centroid of the loop as a new vertex and cone over it
        Xp = [Xp; mean(X(hole))];
        Yp = [Yp; mean(Y(hole))];
        Zp = [Zp; mean(Z(hole))];
        center = length(Xp);
        for j = 1:length(hole)
            if j == length(hole)
                k = 1;
            else
                k = j+1;
            end
            curr_triangle = [center,hole(j),hole(k)];
            Tp(end+1,:) = curr_triangle;
        end
    end
end
end

A.2.2 Compute persistence diagrams of sublevel set filtration of a function in (4.3)-(4.6)

function [infinite_barcodes,dgm_dict] = sublevel_filtration_simp(T,f,dim)
%%%% input:
%%%% T: a triangulation of the shape
%%%% f: function values at the vertices
%%%% dim: maximum dimension of persistence diagrams
%%%% output:
%%%% infinite_barcodes: list of infinite barcodes
%%%% dgm_dict: cell array of size dim-by-1 where each row stores
%%%%           the (dim-1)th persistence diagram

%% add javaplex path
wd = pwd;
addpath('../../matlab_examples/')
addpath('../../appliedtopology-javaplex-6a2ef48/')
load_javaplex;
import edu.stanford.math.plex4.*;

%% initialize a filtered simplicial complex
stream = api.Plex4.createExplicitSimplexStream(max(f)+10);
for face = T'
    edges_ind = nchoosek(1:length(face),2); % set index of edges
    for k = 1:size(edges_ind,1)
        vertexA = face(edges_ind(k,1));
        vertexB = face(edges_ind(k,2));
        stream.addVertex(vertexA,f(vertexA));
        stream.addVertex(vertexB,f(vertexB));
        stream.addElement([vertexA,vertexB],max(f(vertexA),f(vertexB)));
    end
    stream.addElement(face,max(f(face)));
end
if ~stream.validateVerbose() % validate the filtered simplicial complex
    disp('Not a valid filtered simplicial complex')
    return
end
stream.finalizeStream();
%% compute the persistence diagrams
persistence = api.Plex4.getModularSimplicialAlgorithm(dim,2);
intervals = persistence.computeAnnotatedIntervals(stream);
%% return the set of infinite barcodes
infinite_barcodes = intervals.getInfiniteIntervals();
for curr_dim = 1:dim
    % extract the (curr_dim-1)-dimensional diagram as a matrix of
    % (birth,death) pairs via javaplex's standard accessor
    dgm = homology.barcodes.BarcodeUtility.getEndpoints(intervals,curr_dim-1,0);
    %% remove trivial points in a persistence diagram for Hera
    dgm_dict{curr_dim} = remove_trivial(dgm);
end
clear stream
cd(wd)
end

function simp_dgm = remove_trivial(M)
%%%% removes trivial points in a persistence diagram for Hera
%%%% input:
%%%% M: persistence diagram
if isempty(M)
    simp_dgm = M;
    return;
end
try
    zero_persist = find(abs(M(:,1) - M(:,2)) < 1e-4);
catch
    disp('not able to simplify dgm')
end
simp_dgm = M;
simp_dgm(zero_persist,:) = [];
end

A.3 Probability of error (Pe)

function Pe = leave_one_out(dm,labels)
%%%% input:
%%%% dm: pairwise distance matrix
%%%% labels: true labels
n = size(dm,1);
[~,nearest_neighbor] = sort(dm,2);
mock_label = labels(nearest_neighbor(:,2));
indicator = (mock_label == labels);
Pe = (n - sum(indicator))/n;
end

Bibliography

[1] Classics of traditional Chinese medicine: Emperors and physicians, 04 2012.

[2] Abdallah A. Alshennawy. Extract the geometry of mechanical parts by vision system using Hough transform. International Journal of Control, Automation and Systems, 3(2), 04 2014.

[3] Reema AlAifari, Ingrid Daubechies, and Yaron Lipman. Continuous Procrustes distance between two surfaces. Communications on Pure and Applied Mathematics, 66(6):934–964, 2013.

[4] Karen L. Baab, Jonathan M. G. Perry, F. James Rohlf, and William L. Jungers. Phylogenetic, ecological, and allometric correlates of cranial shape in Malagasy lemuriforms. Evolution, 68(5):1450–1468.

[5] Jon L. Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, 09 1975.

[6] Claude Berge. Two theorems in graph theory. Proceedings of the National Academy of Sciences, 43(9):842–844, 1957.

[7] Dimitris Bertsimas and John Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1st edition, 1997.

[8] Paul J. Besl and Neil D. McKay. A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, Feb 1992.

[9] Garrett Birkhoff. Extensions of Jentzsch's theorem. Transactions of the American Mathematical Society, 85(1):219, 1957.

[10] Doug M. Boyer. Relief index of second mandibular molars is a correlate of diet among prosimian primates and other euarchontan mammals. Journal of Human Evolution, 55(6):1118–1137, 2008.

[11] Doug M. Boyer, Yaron Lipman, Elizabeth St. Clair, Jesus Puente, Biren A. Patel, Thomas Funkhouser, Jukka Jernvall, and Ingrid Daubechies. Algorithms to automatically quantify the geometric similarity of anatomical surfaces. Proceedings of the National Academy of Sciences, 108(45):18221–18226, 2011.

[12] Doug M. Boyer, Jesus Puente, Justin T. Gladman, Chris Glynn, Sayan Mukherjee, Gabriel S. Yapuncich, and Ingrid Daubechies. A new fully automated approach for aligning and comparing shapes. The Anatomical Record, 298(1):249–276.

[13] Yann Brenier. Polar factorization and monotone rearrangement of vector-valued func- tions. Communications on Pure and Applied Mathematics, 44(4):375–417, 1991.

[14] Andrew V.Z. Brower. Problems with DNA barcodes for species delimitation: ten species of Astraptes fulgerator reassessed (Lepidoptera: Hesperiidae). Systematics and Biodiversity, 4(2):127–132, 2006.

[15] Dmitri Burago, Yuri Burago, and Sergei Ivanov. A course in metric geometry. Graduate Studies in Math., 33, 01 2001.

[16] Gunnar Carlsson, Afra Zomorodian, Anne Collins, and Leonidas J. Guibas. Persistence barcodes for shapes. International Journal of Shape Modeling, 11(02):149–187, 2005.

[17] Frédéric Chazal, David Cohen-Steiner, Leonidas J. Guibas, Facundo Mémoli, and Steve Y. Oudot. Gromov-Hausdorff stable signatures for shapes using persistence. In Proceedings of the Symposium on Geometry Processing, SGP ’09, pages 1393–1403, Aire-la-Ville, Switzerland, 2009. Eurographics Association.

[18] Vincent Cicirello and William C. Regli. Machining feature-based comparisons of mechanical parts. In Proceedings International Conference on Shape Modeling and Applications, pages 176–185, 05 2001.

[19] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, 01 2007.

[20] David Cohen-Steiner and Jean-Marie Morvan. Restricted Delaunay triangulations and normal cycle. In ACM SYMPOSIUM ON COMPUTATIONAL GEOMETRY, 2003.

[21] Nicolas Courty, Rémi Flamary, Devis Tuia, and Thomas Corpetti. Optimal transport for data fusion in remote sensing. In 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages 3571–3574, 07 2016.

[22] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transportation distances. Advances in Neural Information Processing Systems, 26, 06 2013.

[23] Tamal K Dey, Jian Sun, and Yusu Wang. Approximating cycles in a shortest basis of the first homology group from point data. Inverse Problems, 27(12):124004, 11 2011.

[24] Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271, 1959.

[25] Manfredo P. do Carmo. Differential Geometry of Curves and Surfaces: Revised and Updated Second Edition. Dover Books on Mathematics. Dover Publications, 2016.

[26] Herbert Edelsbrunner and John Harer. Persistent homology - a survey. Discrete Computational Geometry - DCG, 453, 01 2008.

[27] Herbert Edelsbrunner, David G. Kirkpatrick, and Raimund Seidel. On the shape of a set of points in the plane. IEEE Transactions on Information Theory, 29(4):551–559, 07 1983.

[28] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. Discrete Computational Geometry, 28:511–533, 2002.

[29] Alon Efrat, Alon Itai, and Matthew J. Katz. Geometry helps in bottleneck matching and related problems. Algorithmica, 31(1):1–28, 09 2001.

[30] Sven Erlander and Neil S. Stewart. The gravity model in transportation analysis: theory and extensions. VSP, 1990.

[31] Alistair R. Evans and Gordon D. Sanson. The effect of tooth shape on the breakdown of insects. Journal of Zoology, 246(4):391–400.

[32] Rémi Flamary, Cédric Févotte, Nicolas Courty, and Valentin Emiya. Optimal spectral transportation with application to music transcription. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 703–711, USA, 2016. Curran Associates Inc.

[33] Joel Franklin and Jens Lorenz. On the scaling of multidimensional matrices. Linear Algebra and its Applications, 114-115:717–735, 1989. Special Issue Dedicated to Alan J. Hoffman.

[34] Patrizio Frosini. A distance for similarity classes of submanifolds of a Euclidean space. Bulletin of the Australian Mathematical Society, 42(3):407–415, 1990.

[35] Alfred Galichon. Optimal Transport Methods in Economics. Princeton University Press, 1 edition, 2016.

[36] Wilfrid Gangbo and Robert J. McCann. The geometry of optimal transportation. Acta Math., 177(2):113–161, 1996.

[37] Philip D. Gingerich. Function of pointed premolars in Phenacolemur and other mammals. Journal of Dental Research, 53(2):497–497, 1974. PMID: 4521916.

[38] Hugh C. J. Godfray and Sandra Knapp. Taxonomy for the twenty-first century - introduction. Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 359:559–69, 05 2004.

[39] Afzal Godil. Facial shape analysis and sizing system. In Vincent G. Duffy, editor, Digital Human Modeling, pages 29–35, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.

[40] William K. Gregory. The origin and evolution of the human dentition: A palontological review. Journal of Dental Research, 3(1):87–228, 1921.

[41] Frederick E. Grine. Dental evidence for dietary differences in Australopithecus and Paranthropus: a quantitative analysis of permanent molar microwear. Journal of Human Evolution, 15(8):783–822, 1986.

[42] Xianfeng D. Gu and Shing-Tung Yau. Computational Conformal Geometry. Advanced lectures in mathematics. International Press, 2008.

[43] John E. Hopcroft and Richard M. Karp. An n^{5/2} algorithm for maximum matchings in bipartite graphs. In 12th Annual Symposium on Switching and Automata Theory (swat 1971), pages 122–125, 10 1971.

[44] Edwin T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630, 05 1957.

[45] Leonid Kantorovich. On the transfer of masses (in Russian). Transactions of the Amer- ican Mathematical Society, 37(2):227–229, 1942.

[46] Richard Kay. Molar structure and diet in extant Cercopithecidae, pages 309–339. 01 1978.

[47] Richard Kay, Blythe Williams, and Federico Anaya. The Adaptations of Branisella boliviana, the Earliest South American Monkey, pages 339–370. 01 2002.

[48] Richard F. Kay. The functional adaptations of primate molar teeth. American journal of physical anthropology, 43 2:195–216, 1975.

[49] David G. Kendall. Shape Manifolds, Procrustean Metrics, and Complex Projective Spaces. Bulletin of the London Mathematical Society, 16(2):81–121, 03 1984.

[50] Michael Kerber, Dmitriy Morozov, and Arnur Nigmetov. Geometry helps to compare persistence diagrams. J. Exp. Algorithmics, 22:1.4:1–1.4:20, 09 2017.

[51] Soheil Kolouri, Serim Park, Matthew Thorpe, Dejan Slepcev, and Gustavo K. Rohde. Optimal mass transport: Signal processing and machine-learning applications. IEEE Signal Processing Magazine, 34:43–59, 07 2017.

[52] James Lennox. Aristotle's biology. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, spring 2019 edition, 2019.

[53] Michael Lesnick. The optimality of the interleaving distance on multidimensional persistence modules. CoRR, abs/1106.5305, 2011.

[54] Carl Linnaeus. Systema naturae. 1753.

[55] Kevin Liu, Sindhu Raghavan, Serita Nelesen, C. Randal Linder, and Tandy Warnow. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science (New York, N.Y.), 324:1561–4, 07 2009.

[56] Peter W. Lucas. Dental Functional Morphology: How Teeth Work. Cambridge University Press, 2004.

[57] Mahammed Mahboubi and Marc Godinot. Earliest known simian primate found in Algeria. Nature, 04 1992.

[58] Yasushi Makihara and Yasushi Yagi. Earth mover’s morphing: Topology-free shape morphing using cluster-based emd flows. In Ron Kimmel, Reinhard Klette, and Akihiro Sugimoto, editors, Computer Vision – ACCV 2010, pages 202–215, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.

[59] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.

[60] Benjamin Mathon, François Cayre, Patrick Bas, and Benoît Macq. Optimal transport for secure spread-spectrum watermarking of still images. IEEE Transactions on Image Processing, 23(4):1694–1705, 04 2014.

[61] Jeffrey A. McNeely. Biodiversity and ecosystem insecurity: A planet in peril edited by Ahmed Djoghlaf and Felix Dodds. The Quarterly Review of Biology, 88(4):336–336, 2013.

[62] Rudolf Meier, Kwong Shiyang, Gaurav Vaidya, and Peter K L Ng. DNA Barcoding and Taxonomy in Diptera: A Tale of High Intraspecific Variability and Low Identification Success. Systematic Biology, 55(5):715–728, 10 2006.

[63] Facundo Mémoli. On the use of Gromov-Hausdorff distances for shape comparison. Proceedings Point Based Graphics, pages 81–90, 01 2007.

[64] Facundo Mémoli. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011.

[65] Mark Meyer, Mathieu Desbrun, Peter Schröder, and Alan H. Barr. Discrete differential-geometry operators for triangulated 2-manifolds. In Hans-Christian Hege and Konrad Polthier, editors, Visualization and Mathematics III, pages 35–57, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.

[66] Philipp Mitteroecker and Philipp Gunz. Advances in geometric morphometrics. Evolutionary Biology, 36(2):235–247, 06 2009.

[67] Gaspard Monge and Augustus De Morgan. Memoire sur la theorie des deblais et des remblais. Imprimerie royale, 1781.

[68] Andrew Moore. Efficient Memory-based Learning for Robot Control. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, 03 1991.

[69] James R. Munkres. Elements Of Algebraic Topology. CRC Press, 2018.

[70] O. Museyko, Michael Stiglmayr, Kathrin Klamroth, and Günter Leugering. On the application of the Monge–Kantorovich problem to image registration. SIAM J. Imaging Sciences, 2:1068–1097, 2009.

[71] Tom Needham. Introduction to applied algebraic topology. Available at https://drive.google.com/file/d/1SCrKHfZdDuMmSKlZ7xveQT8SqBHjFEkk/view, 02 2019.

[72] Jonathan M. G. Perry. Inferring the diets of extinct giant lemurs from osteological correlates of muscle dimensions. The Anatomical Record, 301(2):343–362.

[73] Gabriel Peyré and Marco Cuturi. Computational Optimal Transport. 2018.

[74] Gabriel Peyré, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein Averaging of Kernel and Distance Matrices. In ICML 2016, Proc. 33rd International Conference on Machine Learning, New-York, United States, 06 2016.

[75] Stuart Pimm and Peter Raven. Biodiversity - extinction by numbers. Nature, 403:843–5, 03 2000.

[76] Jesús Puente. Distances and algorithms to compare sets of shapes for automated biological morphometrics. PhD thesis, Princeton University, 2013.

[77] Vanessa Robins. Towards computing homology from approximations. Topology Pro- ceedings, 24, 01 1999.

[78] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 11 2000.

[79] George G. Simpson. Studies of the Earliest Mammalian Dentitions. 1936.

[80] Richard Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Statist., 35(2):876–879, 06 1964.

[81] Hojun Song, Jennifer E. Buhay, Michael F. Whiting, and Keith A. Crandall. Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proceedings of the National Academy of Sciences, 105(36):13486–13491, 2008.

[82] Svetlana Stepanović, Andrea Kosovac, Oliver Krstić, Jelena Jović, and Ivo Toševski. Morphology versus DNA barcoding: two sides of the same coin. A case study of Ceutorhynchus erysimi and C. contractus identification. Insect Science, 23(4):638–648.

[83] Suzanne G. Strait. Molar morphology and food texture among small-bodied insectivo- rous mammals. Journal of Mammalogy, 74(2):391–402, 1993.

[84] V. N. Sudakov. Geometric problems in the theory of infinite-dimensional probability distributions. American Mathematical Society, 1979.

[85] Frederick S. Szalay. The beginnings of primates. Evolution, 22(1):19–36, 1968.

[86] Ayellet Tal. 3D Shape Analysis for Archaeology, pages 50–63. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.

[87] Andrew Tausz, Mikael Vejdemo-Johansson, and Henry Adams. JavaPlex: A research software package for persistent (co)homology. In Han Hong and Chee Yap, editors, Proceedings of ICMS 2014, Lecture Notes in Computer Science 8592, pages 129–136, 2014. Software available at http://appliedtopology.github.io/javaplex/.

[88] Jeremy Thomas, Mark Telfer, David B. Roy, Chris D. Preston, Jeremy Greenwood, J Asher, Richard Fox, Ralph Clarke, and J. H. Lawton. Comparative losses of British butterflies, birds, and plants and the global extinction crisis. Science (New York, N.Y.), 303:1879–81, 04 2004.

[89] Christopher Tralie, Nathaniel Saul, and Rann Bar-On. Ripser.py: A lean persistent homology library for python. The Journal of Open Source Software, 3(29):925, 09 2018.

[90] Katharine Turner, Sayan Mukherjee, and Doug M. Boyer. Persistent homology transform for modeling shapes and surfaces. Information and Inference: A Journal of the IMA, 3(4):310–344, 12 2014.

[91] P.U. Unschuld. Medicine in China: A History of Ideas. Comparative Studies of Health Systems and Medical Care. University of California Press, 1985.

[92] Cédric Villani. Topics in Optimal Transportation. Graduate studies in mathematics. American Mathematical Society, 2003.

[93] Cédric Villani. Optimal transport – Old and new, volume 338, pages xxii+973. 01 2008.

[94] Etienne Vouga. Lectures in discrete differential geometry 3 discrete surfaces. Available at https://www.cs.utexas.edu/users/evouga/uploads/4/5/6/8/45689883/notes3.pdf, 03 2014.

[95] Brenton Walker. Using persistent homology to recover spatial information from encounter traces. pages 371–380, 01 2008.

[96] Wei Wang, J A Ozolek, Dejan Slepčev, Ann B Lee, Cheng Chen, and G K Rohde. An optimal transportation approach for nuclear structure-based pathology. IEEE Transactions on Medical Imaging, 30(3):621–631, 2011.

[97] Wei Wang, Dejan Slepčev, Saurav Basu, John A. Ozolek, and Gustavo K. Rohde. A linear optimal transportation framework for quantifying and visualizing variations in sets of images. International Journal of Computer Vision, 101(2):254–269, 01 2013.

[98] Max Wardetzky. Convergence of the Cotangent Formula: An Overview, pages 275–286. Birkhäuser Basel, Basel, 2008.

[99] Lei Zhu, Yan Yang, Steven Haker, and Allen Tannenbaum. An image morphing tech- nique based on optimal mass preserving mapping. IEEE Transactions on Image Pro- cessing, 16(6):1481–1495, 06 2007.
