Persistence, Metric Invariants, and Simplification
Dissertation
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University
By
Osman Berat Okutan, M.S.
Graduate Program in Mathematics
The Ohio State University
2019
Dissertation Committee:
Facundo Mémoli, Advisor
Matthew Kahle
Jean-François Lafont
© Copyright by
Osman Berat Okutan
2019
Abstract
Given a metric space, a natural question to ask is how to obtain a simpler and faithful approximation of it. In such a situation two main concerns arise: How to construct the approximating space and how to measure and control the faithfulness of the approximation.
In this dissertation, we consider the following simplification problems: finite approximations of compact metric spaces, lower cardinality approximations of filtered simplicial complexes, tree metric approximations of metric spaces, and finite metric graph approximations of compact geodesic spaces. In each case, we give a simplification construction and measure the faithfulness of the process by using the metric invariants of the original space, including the Vietoris-Rips persistence barcodes.
For Esra and Elif Beste
Acknowledgments
First and foremost I’d like to thank my advisor, Facundo Mémoli, for our discussions and for his continual support, care and understanding since I started working with him. I’d like to thank Mike Davis for his support, especially in Summer 2016. I’d like to thank Dan Burghelea for all the courses I took from him and the feedback I received from him. Finally, I would like to thank my wife Esra and my daughter Elif Beste for their patience and support. This research was supported by NSF grants AF 1526513, DMS 1723003, CCF 1740761, IIS-1422400 and CCF-1526513.
Vita
May 2010 ...... B.S. in Mathematics, Bilkent University
May 2012 ...... M.S. in Mathematics, Bilkent University
September 2012 - present ...... Graduate Teaching Associate, Department of Mathematics, The Ohio State University
Publications
Research Publications
O. Okutan, E. Yalcin “Free actions on products of spheres at high dimensions”. Algebraic & Geometric Topology 13, pp: 2087-2099, 2013.
Fields of Study
Major Field: Mathematics
Specialization: Topological Data Analysis, Metric Geometry
Table of Contents
Abstract
Dedication
Acknowledgments
Vita
List of Figures
1. Introduction
   1.1 Content
2. Background
   2.1 Metric Geometry
      2.1.1 Gromov-Hausdorff Distance
      2.1.2 Hyperbolicity
      2.1.3 Injective/Hyperconvex Spaces
   2.2 Persistence
      2.2.1 Persistence Sequence
   2.3 Interleaving Distance
   2.4 Metric Graphs
   2.5 Reeb Graphs
      2.5.1 Stability of Reeb Metric Graphs
      2.5.2 Smoothings
   2.6 Differential Topology
3. A Geometric Characterization of Vietoris-Rips Filtration
   3.1 Introduction
   3.2 Persistence via Metric Pairs
   3.3 Isomorphism
      3.3.1 Stability of Metric Homotopy Pairings
   3.4 Application to the Vietoris-Rips Filtration
      3.4.1 Products and Wedge Sums
   3.5 Applications to the Filling Radius
      3.5.1 Bounding Barcode Length via Spread
      3.5.2 Bounding the Filling Radius
      3.5.3 Stability
4. Finite Approximations of Compact Metric Spaces
   4.1 Introduction
   4.2 Discrete Length Structures
   4.3 Discrete Length Structures on Metric Spaces
   4.4 Proof of Theorem 4.1.1
5. Metric Graph Approximations of Geodesic Spaces
   5.1 Introduction
   5.2 Graph Approximations
   5.3 Tree Approximations
6. The Distortion of the Reeb Quotient Map on Riemannian Manifolds
   6.1 Introduction
   6.2 Distortion of the Reeb Quotient Map
   6.3 Thickness
      6.3.1 A Calculation: Thickened Filtered Graphs
   6.4 The Diameter of a Fiber of the Reeb Quotient Map
   6.5 The Bound of Theorem 6.1.1 for Thickened Graphs
7. Reeb Posets and Metric Tree Approximations
   7.1 Introduction
   7.2 Posets
   7.3 Reeb Constructions
      7.3.1 Poset Paths and Length Structures
      7.3.2 Reeb Posets
      7.3.3 Reeb Tree Posets
   7.4 Hyperbolicity for Reeb Posets
   7.5 Approximation
   7.6 An Application to Metric Graphs and Finite Metric Spaces
   7.7 Example where Φ ∼ Υ
8. Quantitative Simplification of Filtered Simplicial Complexes
   8.1 Introduction
   8.2 Gromov-Hausdorff and Interleaving Type Distances between Filtered Simplicial Complexes
      8.2.1 Gromov-Hausdorff Distance between Filtered Simplicial Complexes
      8.2.2 The Interleaving Type Distance $d_I^F$ between Filtered Simplicial Complexes
      8.2.3 Remarks About the Definition of $d_I^F$
      8.2.4 Stability and the Proof of Theorem 8.1.1
   8.3 The Vertex Quasi-distance and Simplification
      8.3.1 Computing $\delta_X(v, w)$: The Procedure ComputeCodensityMatrix()
      8.3.2 Specializing $\delta_X(v, w)$ According to Homology Degree
   8.4 An Application to the Vietoris-Rips Filtration of Finite Metric Spaces and Graphs
      8.4.1 Finite Metric Spaces
      8.4.2 Application to Metric Graphs
   8.5 Classification of Filtered Simplicial Complexes via $d_I^F$
   8.6 An Example where $d_I^F \ll d_{GH}$
   8.7 Chain Construction
9. Future work
Bibliography
List of Figures
1.1 The space on the right is a simplification of the space on the left.
1.2 B(X, r) is the r-neighborhood of X in E. Notice how the small loop in X is filled.
1.3 A four point approximation of a metric space with three components.
2.1 A metric graph.
2.2 Reeb graph $X_h$ of the height function h on X.
3.1 A big sphere X with a small handle. In this case, as r > 0 increases, $B_r(X, \kappa(X))$ changes homotopy type from that of X to that of $S^2$ as soon as $r > r_0$ for some $r_0 < \mathrm{FillRad}(X)$.
6.1 $T_A = \frac{2ab}{a^2 + b^2}$ depends only on $\frac{a}{b}$ and converges to 0 as $\frac{a}{b} \to 0$.
6.2 A 2-dimensional thickened filtered graph.
6.3 A 2-dimensional thickened 3-fork.
6.4 An inverse 2-dimensional thickened fork.
6.5 A vertical fork.
7.1 A finite metric space embedded in a metric graph.
7.2 Let R ≥ r > 0 and consider the metric graph from the figure. Let $Z_n$ be the finite subset $\{p, x_0, \dots, x_n, y_1, \dots, y_n\}$. We show that $\Phi(Z_n) \sim 2 \log(4n)\,\mathrm{hyp}(Z_n)$ and $\Upsilon_p(Z_n) = 2 \log(4n + 4)\,\mathrm{hyp}(Z_n)$.
8.1 These two finite spaces have the same Vietoris-Rips $\mathrm{PH}_{\geq 1}$, see Example 8.4.3.
8.2 $X^* := \Delta_3^*$.
8.3 A simple metric graph.
8.4 These two filtered simplicial complexes are at 0 $d_I^F$-distance while they are at Gromov-Hausdorff distance at least $\frac{1}{2}$.
Chapter 1: Introduction
In Data Analysis, a data set can generally be endowed with a metric structure. This enables the analysis of the data set not just from a statistical point of view, but also from a geometric point of view. Topological Data Analysis tries to combine and take advantage of both the quantitative (albeit often noisy) nature of Data and the qualitative nature of Topology [19].
Geometric intuition tells us that metric spaces have topological features and that these features have quantitative properties like size. To recognize these quantitative properties, one needs to go beyond the topological structure induced by the metric, since a given topological space can be metrized in many different ways. For example, a finite metric subspace of the unit circle should inherit some circularity as a metric space, but its underlying topological space is just discrete. More generally, every finite metric space has the discrete topology, so the underlying topology alone cannot explain any expected topological feature of the metric space. The main insight of persistence ([69, 38, 19]) for metric spaces is the following: A metric space does not simply induce a topological space, but a family of topological spaces indexed by non-negative real numbers. This family generally arises as a filtration. One can then observe how topological features change as the index changes.
Figure 1.1: The space on the right is a simplification of the space on the left.
Given a metric space X, the most common way of obtaining a family of topological spaces is via the Vietoris-Rips filtration $(\mathrm{VR}^r(X))_{r \geq 0}$, where $\mathrm{VR}^r(X)$ is the simplicial complex with vertex set X whose simplices are the finite subsets of X with diameter less than or equal to r. There is also a more geometric method of obtaining a filtration, which is equivalent to the Vietoris-Rips filtration up to homotopy. Given a metric space X, there are several natural metric spaces into which X can be isometrically embedded; for example the Kuratowski space κ(X) and the tight span E(X) [34]. These spaces have many nice metric and topological properties, which mainly follow from being hyperconvex [53]. They also have very nice categorical properties in the category of metric spaces [53]. Now, let H be such a natural space associated to X. Then, we can look at the open or closed r-neighborhoods $B_r(X)$ of X in H, and investigate how the topology of this filtration changes. We make this interpretation precise in Chapter 3. Note that for r = 0 we have the underlying topological space of X itself, and as r increases it starts to look more like H.
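As a rough illustration of the definition of the Vietoris-Rips complex above, the following minimal Python sketch lists the simplices of the complex at scale r for a finite metric space. The function name `vietoris_rips`, the `max_dim` cutoff, and the example of the four corners of a unit square are illustrative assumptions and do not come from the thesis.

```python
import math
from itertools import combinations


def vietoris_rips(points, dist, r, max_dim=2):
    """List the simplices of VR^r(X) up to dimension max_dim.

    A simplex is a nonempty finite subset of `points` whose diameter
    (largest pairwise distance) is at most r.
    """
    simplices = []
    for size in range(1, max_dim + 2):  # size = number of vertices
        for subset in combinations(points, size):
            # A single point has diameter 0 (the default below).
            diameter = max(
                (dist(a, b) for a, b in combinations(subset, 2)),
                default=0.0,
            )
            if diameter <= r:
                simplices.append(subset)
    return simplices


# Hypothetical example: the four corners of a unit square in the plane.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
for r in (0.0, 1.0, 1.5):
    print(r, len(vietoris_rips(square, math.dist, r)))
```

In this example, at r = 1 the complex consists of the four vertices and the four sides of the square, so it is a loop. Once r reaches the length of the diagonal, every triangle on the four points is present and the loop is filled, which is the kind of change in topology that the filtration records.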
Figure 1.2: B(X, r) is the r-neighborhood of X in E. Notice how the small loop in X is filled.

One of the main concerns of Data Analysis is obtaining simpler and faithful representations of data. In this spirit, the main theme of this thesis is utilizing persistence and metric invariants for simplification of metric objects. More specifically, we try to answer questions like the following: Let X be a metric object (it can be a metric space, a geodesic space, a filtered simplicial complex, etc.). How can we measure the complexity of X? Given a measurement of complexity, how can we construct new metric objects from X with less complexity, and how much does such an object differ from X? What type of quantification of difference should be used for such a comparison? Given a family F of simple metric objects, how similar can X be to an object in F?
1.1 Content
In Chapter 2, we give the main definitions and results in Metric Geometry, Topological Data Analysis, and Topology which we are going to use in the rest of this thesis. This chapter includes many classical concepts and results as well as new ones which we introduce. In particular, the stability result for Reeb graphs (Theorem 2.5.8) and the effect of ε-smoothings on the first Betti number of a metric graph (Proposition 2.5.10) are novel results proved in Chapter 2, and they are essential for the rest of our work.
Figure 1.3: A four point approximation of a metric space with three components.
In Chapter 3 we establish a precise relationship (i.e. a filtered homotopy equivalence) between the Vietoris-Rips simplicial filtration of a metric space and a more geometric (or extrinsic) way of assigning a persistence module to a metric space, which consists of first embedding it into a larger space and then taking the persistent homology of the filtration obtained by considering the increasing neighborhoods of the original space inside the ambient space (see Figure 1.2). These neighborhoods are also metric spaces and we benefit from this, for example, in obtaining a short proof of the Künneth formula for persistent homology.
In Chapter 4, we consider finite approximations of compact metric spaces given an upper bound for the cardinality of the approximating space (see Figure 1.3). The main result (Theorem 4.1.1) of this chapter has a striking similarity with one of the main results (Theorem 5.1.1) about graph approximations of compact geodesic spaces in Chapter 5, which hints at a deeper and more general result about approximations.
In Chapter 5, we consider graph approximations of compact geodesic spaces (see Figure 1.1). A classical result in metric geometry is that any compact geodesic space can be approximated in the Gromov-Hausdorff sense arbitrarily well by finite metric graphs. The classical construction for this approximation consists of taking an ε-net N from the compact geodesic space and inducing a graph structure on N based on the proximity of its points. However, this construction does not provide any control on the first Betti number of the approximating graphs. If we interpret a graph approximation of a geodesic space as a simplification of the space, we encounter the following problem: As ε gets smaller, the approximating graphs mentioned above become quite complicated themselves, i.e. their first Betti numbers grow without bound. To get a better handle on this problem, we introduce the following invariant.
Given a compact geodesic space X, define:

\[ \delta_n^X := \inf\{ d_{GH}(X, G) : G \text{ a finite metric graph}, \ \beta_1(G) \leq n \}, \]

where $d_{GH}$ denotes the Gromov-Hausdorff distance. We study the rate of decay of this sequence, specific elements in the sequence like $\delta_{\beta_1(X)}^X$ and $\delta_0^X$, and metric graph constructions for obtaining upper and lower bounds. Note that the element $\delta_0^X$ corresponds to metric tree approximations.
In Chapter 6, we analyze approximations of compact Riemannian manifolds by Reeb graphs of Morse functions. We generalize the results of Gromov and Zinov’ev [43, 73], which give measure theoretic bounds on the distortion of the Reeb quotient map of a Morse function, to arbitrary n-dimensional closed Riemannian manifolds. In order to do this, we introduce a metric invariant $T_f$ associated to any filtered metric space $f : X \to \mathbb{R}$ (see Section 6.3), which we refer to as the thickness of $f : X \to \mathbb{R}$. This invariant gives a quantitative measure of how the volume of level sets of f is distributed with respect to their diameters.
In Chapter 7, we consider tree approximations of metric spaces. From the standpoint of applications, datasets which can be associated with a tree representation can be readily visualized because trees are planar graphs. When a dataset does not directly lend itself to being represented by a tree, motivated by the desire to visualize it, the question arises of what is the closest tree to the given dataset. In this sense, one would then want to have (1) ways of quantifying the treeness of data and (2) efficient methods for actually computing a tree that is (nearly) optimally close to the given dataset. There are three different but related ways in which trees can be mathematically described. The first one is poset theoretic: a tree is a partially ordered set such that any two elements less than a given element are comparable, or in other words there is a unique way to go down the poset. The second is graph theoretic: a tree is a graph without loops. Finally, there is the metric way: a tree metric space is a metric space which can be embedded in a metric tree (graph). This last description is the bridge between data analysis and combinatorics of trees. Through it, we can ask and eventually answer the following questions: How tree-like is a given metric data set? How does this treeness affect its geometric features? How can we obtain a tree which is close to a given dataset?
For a metric space X, there exists a metric invariant called hyperbolicity (see Section 2.1.2), denoted by hyp(X), such that hyp(X) ≥ 0, with equality if and only if X is a tree metric space. A natural question that ensues is whether the relaxed condition that hyp(X) be small (instead of hyp(X) = 0) guarantees the existence of a tree metric on X which is close to $d_X$. In this respect, in [42] Gromov shows that for each finite metric space $(X, d_X)$, there exists a tree metric $t_X$ on X such that

\[ \|d_X - t_X\|_\infty \leq \Upsilon(X) := 2\,\mathrm{hyp}(X) \log(2|X|), \]

where |X| is the cardinality of X. Despite the seemingly unsatisfactory fact that Υ(X) blows up with the cardinality of X (unless hyp(X) = 0), it is known that this bound is asymptotically tight [27]. This suggests searching for alternative bounds which may perform better in more restricted scenarios. We refine Gromov’s bound Υ(X) by identifying a quantity Φ(X) that is related to, but often much smaller than, Υ(X).
In Chapter 8, we consider simplifications of filtered simplicial complexes. For a subset I of $\mathbb{R}$, a filtered simplicial complex indexed over I is a family $(X^t)_{t \in I}$ of simplicial complexes such that for each $t \leq t'$ in I, $X^t$ is contained in $X^{t'}$. Filtered simplicial complexes arise in topological data analysis, for example, as Vietoris-Rips or Čech complexes of metric spaces [37]. Simplicial complexes have the advantage of admitting a discrete description, hence they are naturally better suited for computations when compared to arbitrary topological spaces.

A useful and computationally feasible way of analyzing the scale dependent features of a filtered simplicial complex is through persistent homology and persistence diagrams/barcodes [19, 37]. Given a filtered simplicial complex $X^*$ and $k \in \mathbb{N}$, efficient computation of its k-th dimensional persistent homology $\mathrm{PH}_k(X^*)$ is studied in many papers, for example [38, 75, 31, 39].
To reduce computational complexity, in the interest of being able to process large datasets, an important task is that of simplifying filtered simplicial complexes (that is, reducing the total number of simplices) in such a way that it is possible to precisely quantify the trade-off between degree of simplification and loss/distortion of homological features [38, 52, 71, 21, 31, 33, 14]. In this chapter we consider the effect on persistent homology of removing a vertex and all cells containing it. In this respect, our study is related to [71, Section 7] and [21, Section 6]. A standard measure of the change in persistent homology is the interleaving distance [10], which, by the Isometry Theorem [56, Theorem 3.4], is equal to the bottleneck distance between persistence barcodes. To quantify the distortion at the persistent homology level incurred by operations carried out at the simplicial level, we introduce an interleaving type distance for filtered simplicial complexes which is compatible with the distance between their persistent homology signatures. More precisely, persistent homology is stable with respect to this new metric. We bound the effect of removing a vertex with respect to this new metric in terms of a new invariant that we call the codensity of the vertex, which in turn gives a bound on the change in persistent homology. Finally, we introduce a construction which we call the chain construction which takes an arbitrary family of simplicial complexes and produces a filtered simplicial complex with the same persistent homology.
Finally, in Chapter 9, we discuss some open problems and future directions arising from our research.
Chapter 2: Background
In this chapter, we give the necessary definitions and results we are going to use in the rest of this thesis.
We start Section 2.1 by stating a result about connected metric spaces and the coarea formula for Riemannian manifolds. Then we review the Gromov-Hausdorff distance and introduce a novel version of it for geodesic spaces. Then we review hyperbolicity and injective/hyperconvex metric spaces. In Section 2.2, we review persistence modules and the Vietoris-Rips filtration. Then we introduce persistence sequences. In Section 2.4, we review metric graphs and analyze paths in finite metric graphs. In Section 2.5, we review Reeb graphs and ε-smoothings. Here we also prove novel results about the stability of the Reeb graph construction and the effect of smoothings on the first Betti number. Finally, in Section 2.6, we prove a few results about Morse functions that we need later.
2.1 Metric Geometry
The following is a result about covers of connected metric spaces.
Proposition 2.1.1. Let X be a connected metric space and $\mathcal{A}$ be a finite cover of X. Then

\[ \mathrm{diam}(X) \leq \sum_{A \in \mathcal{A}} \mathrm{diam}(A). \]

For a proof of Proposition 2.1.1, see the proof of [18, p. 53, Lemma 2.6.1].
For an integer k ≥ 0, we denote the kth Hausdorff measure [18] on a metric space by $\mu^k$. We have the following coarea formula (see [40, Theorem 3.2.12, p. 249]).

Proposition 2.1.2 (Coarea Formula). If $f : X \to \mathbb{R}$ is a smooth L-Lipschitz function defined on an n-dimensional Riemannian manifold X, then for each $t_0 \leq t_1$ in $\mathbb{R}$ we have

\[ \mu^n(f^{-1}[t_0, t_1]) \geq \int_{t_0}^{t_1} \frac{\mu^{n-1}(f^{-1}(t))}{L} \, dt. \]
2.1.1 Gromov-Hausdorff Distance
Gromov-Hausdorff distance is a way of measuring how similar two metric spaces are.
There are several equivalent ways of defining the Gromov-Hausdorff distance (see [18, Section
7.3]). We are going to define it using correspondences.
Definition 2.1.1 (Correspondences).
• A correspondence R between two given sets X and Y is a relation between them such that for each x in X there exists a $y'$ in Y such that $x \mathrel{R} y'$, and for each y in Y there exists an $x'$ in X such that $x' \mathrel{R} y$.
• A correspondence between pointed sets (X, p) and (Y, q) is a correspondence R between X and Y such that $p \mathrel{R} q$.
• A correspondence $R'$ between X and Y is called a subcorrespondence of R if $x \mathrel{R'} y$ implies that $x \mathrel{R} y$.
• If R is a correspondence between X and Y and S is a correspondence between Y and Z, then we define the relation $S \circ R$ between X and Z as follows: $x \mathrel{(S \circ R)} z$ if there exists y in Y such that $x \mathrel{R} y$ and $y \mathrel{S} z$. Note that $S \circ R$ is a correspondence between X and Z.

Note that the composition of pointed correspondences is a pointed correspondence and the composition of subcorrespondences is a subcorrespondence of the composition.
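For finite sets, the notions in Definition 2.1.1 can be checked directly. The following is a minimal Python sketch in which a relation is stored as a set of pairs; the function names `is_correspondence` and `compose` and the small example sets are assumptions made for illustration and are not notation from the thesis.

```python
def is_correspondence(R, X, Y):
    """Check that the relation R (a set of pairs) is a correspondence
    between the finite sets X and Y: every x is related to some y
    and every y is related to some x."""
    X, Y = set(X), set(Y)
    pairs_are_valid = all(x in X and y in Y for x, y in R)
    covers_X = {x for x, _ in R} == X
    covers_Y = {y for _, y in R} == Y
    return pairs_are_valid and covers_X and covers_Y


def compose(S, R):
    """Return S o R: all (x, z) such that x R y and y S z for some y."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


# Hypothetical example sets.
X, Y, Z = {0, 1, 2}, {"u", "v"}, {"s"}
R = {(0, "u"), (1, "u"), (2, "v")}   # correspondence between X and Y
S = {("u", "s"), ("v", "s")}         # correspondence between Y and Z

print(is_correspondence(R, X, Y))               # True
print(is_correspondence(compose(S, R), X, Z))   # True
print(is_correspondence({(0, "u"), (1, "u")}, X, Y))  # False: 2 and "v" are unmatched
```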
Definition 2.1.2 (Distortion of a correspondence). Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces and R be a correspondence between X and Y. The metric distortion dis(R) of the correspondence R is defined as

\[ \mathrm{dis}(R) := \sup_{(x, y), (x', y') \in R} |d_X(x, x') - d_Y(y, y')|. \]
Remark 2.1.3. • If $R'$ is a subcorrespondence of R, then $\mathrm{dis}(R') \leq \mathrm{dis}(R)$.
• $\mathrm{dis}(S \circ R) \leq \mathrm{dis}(R) + \mathrm{dis}(S)$.
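The distortion and the two inequalities of Remark 2.1.3 can be illustrated on small finite subsets of the real line. The sketch below is only an example under stated assumptions: the helper names `distortion` and `compose`, the metric `d`, and the specific point sets are hypothetical choices, not taken from the thesis.

```python
from itertools import product


def distortion(R, d_X, d_Y):
    """dis(R): the largest value of |d_X(x, x') - d_Y(y, y')| over all
    pairs (x, y), (x', y') in the finite correspondence R."""
    return max(
        abs(d_X(x, x2) - d_Y(y, y2))
        for (x, y), (x2, y2) in product(R, repeat=2)
    )


def compose(S, R):
    """Return S o R: all (x, z) such that x R y and y S z for some y."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


def d(a, b):
    """Metric on subsets of the real line."""
    return abs(a - b)


# Hypothetical example: X = {0, 1, 3}, Y = {0, 2}, Z = {0}.
R = {(0, 0), (1, 0), (3, 2)}   # correspondence between X and Y
R_big = R | {(1, 2)}           # R is a subcorrespondence of R_big
S = {(0, 0), (2, 0)}           # correspondence between Y and Z

print(distortion(R, d, d))              # 1
print(distortion(R_big, d, d))          # 2
print(distortion(S, d, d))              # 2
print(distortion(compose(S, R), d, d))  # 3
```

Here dis(R) = 1 is at most dis(R_big) = 2, and dis(S ∘ R) = 3 equals dis(R) + dis(S), so the second inequality of the remark can be attained.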
Definition 2.1.3 (Gromov-Hausdorff distance). Let X and Y be metric spaces.
• The Gromov-Hausdorff distance $d_{GH}(X, Y)$ is defined as

\[ d_{GH}(X, Y) := \frac{1}{2} \inf\{\mathrm{dis}(R) : R \text{ is a correspondence between } X \text{ and } Y\}. \]

• Let p and q be points in X and Y respectively. The pointed Gromov-Hausdorff distance is defined as

\[ d_{GH}((X, p), (Y, q)) := \frac{1}{2} \inf\{\mathrm{dis}(R) : R \text{ is a correspondence between } (X, p) \text{ and } (Y, q)\}. \]
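For very small finite metric spaces, the infimum in Definition 2.1.3 is a minimum over finitely many correspondences and can be found by exhaustive search. The following Python sketch does exactly that. It is exponential in |X|·|Y| and is meant only to illustrate the definition; the function name `gromov_hausdorff` and the example spaces are assumptions, not part of the thesis.

```python
from itertools import combinations, product


def gromov_hausdorff(X, Y, d_X, d_Y):
    """Brute-force Gromov-Hausdorff distance between finite metric spaces.

    Enumerates every nonempty subset of X x Y, keeps those that are
    correspondences, and returns half of the smallest distortion.
    """
    X, Y = list(X), list(Y)
    pairs = list(product(X, Y))
    best = float("inf")
    for size in range(1, len(pairs) + 1):
        for R in combinations(pairs, size):
            # Skip relations that miss a point of X or a point of Y.
            if {x for x, _ in R} != set(X) or {y for _, y in R} != set(Y):
                continue
            dis = max(
                abs(d_X(x, x2) - d_Y(y, y2))
                for (x, y), (x2, y2) in product(R, repeat=2)
            )
            best = min(best, dis)
    return best / 2


def d(a, b):
    """Metric on subsets of the real line."""
    return abs(a - b)


# Hypothetical examples with points on the real line.
print(gromov_hausdorff([0, 1, 3], [0, 2], d, d))  # 0.5
print(gromov_hausdorff([0, 1, 3], [0], d, d))     # 1.5
```

In the second example the only correspondence with a one-point space relates every point of X to that point, so the distance is half the diameter of X.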
The following remark is straightforward.
Remark 2.1.4. Let X and Y be metric spaces and p be a point in X. Then