Homology Inference from Point Cloud Data
Yusu Wang

Abstract. In this paper, we survey a common framework for estimating topological information from point cloud data (PCD) that has been developed in the field of computational topology over the past twenty years. Specifically, we focus on the inference of homological information. We briefly explain the main ingredients involved and present some basic results. This chapter is part of the AMS short course on Geometry and Topology in Statistical Inference. It aims to introduce the general mathematical audience to the problem of topology inference, but is not meant to be a comprehensive review of the field.

1. Introduction

The past two decades have witnessed tremendous development in the field of computational and applied topology. Much progress has been made not only on the theoretical and computational fronts, but also in the application of topological methods / ideas to data analysis. For example, on the theoretical front, there has been elegant work on so-called persistent homology, originally proposed in [52]¹ and further developed in [15, 16, 19, 22, 33, 94] etc. On the application front, topological methods have been successfully used in fields such as computer graphics, e.g., [67, 82, 42], visualization, e.g., [10, 77, 76], geometric reconstruction and meshing, e.g., [2, 41, 90], sensor networks, e.g., [38, 39, 61, 88, 91], high-dimensional data analysis, e.g., [59, 60, 84, 89], and so on. Indeed, topological data analysis (TDA) has become a vibrant field attracting researchers from mathematics, computer science and statistics. We refer the readers to the surveys and books [14, 48, 49, 50, 92] for some developments in computational and applied topology.

One of the prevailing ideas in topological data analysis is to provide descriptors that encode useful information about hidden objects from observed data.
A common type of observed data is a discrete set of points sampled from the hidden domain of interest, often referred to as point cloud data (PCD). For example, one may be interested in understanding the space $X$ formed by images of handwritten digits. However, what one can obtain is only a collection of images of such handwritten digits, where each image can be considered as a sample point from this unknown domain $X$. Hence an important question in topological data analysis is to compute or approximate certain topological quantities of the hidden domain from a PCD input. In particular, one hopes to perform such topology inference both efficiently and reliably (with some theoretical guarantees on the approximation).

In this chapter, we aim to introduce the readers to some basic ideas and approaches, developed mostly in the fields of applied topology and computational geometry, to tackle the problem of topology inference from point cloud data. In particular, we will focus on the estimation of a specific topological structure: the homology information of a hidden domain, from its point samples.

Organization. In Section 2, we present some necessary background and introduce the three main ingredients involved in estimating homology from point cloud data. Each of these ingredients is then described in more detail in Sections 3, 4, and 5, respectively. We also discuss the handling of noise in the input samples in Section 6. Finally, we discuss some extensions of homology inference and give concluding remarks in Section 7.

¹ The zero-dimensional case, referred to as size function theory, was studied in e.g. [57, 86].

1991 Mathematics Subject Classification. Primary 55U10, 68P01; Secondary 57Q55, 68W99.
Key words and phrases. Computational topology, applied topology, homology inference.

2. Preliminaries and Overview

Topological space and simplicial complex. Let $X$ be a topological space.
We will be interested in the $i$-th dimensional homology group of $X$, denoted by $H_i(X)$. We consider the simplicial homology group if $X$ is a simplicial complex, and the singular homology group otherwise. The definitions of these two homology groups can be found in any standard book on algebraic topology (e.g. [66, 71]) and are thus not included in this short review. We consider only homology with coefficients in $\mathbb{Z}_2$; since $\mathbb{Z}_2$ is a field, $H_i(X)$ is a vector space. Let $\beta_i(X) := \mathrm{rank}(H_i(X))$ denote the rank of the $i$-th homology group of $X$, that is, its $i$-th Betti number.

We briefly remind the readers of the concept of a simplicial complex, since it will play an important role in homology inference: Given a set $V$, a collection $K$ of finite subsets of $V$ forms an (abstract) simplicial complex if the following condition holds: if $\sigma$ is an element of $K$, then any non-empty subset of $\sigma$ is also in $K$. Each subset $\sigma$ of $V$ in $K$ is referred to as a simplex, and its dimension equals the cardinality of $\sigma$ minus one. A face of a simplex $\sigma$ is any non-empty subset $\beta \subseteq \sigma$. The vertex set of $K$ is the union of the simplices in $K$ with dimension zero. The $k$-skeleton of a simplicial complex $K$ is the set of simplices of $K$ of dimension at most $k$. For example, the 1-skeleton of $K$ consists of the set of vertices and edges in $K$.

Problem setup. We are interested in the homology group information of a topological space $X$. In what follows, $X$ is assumed to be either a smooth Riemannian manifold embedded in the Euclidean space $\mathbb{R}^d$, or a compact subset of $\mathbb{R}^d$. In practice, the domain $X$ of interest may only be accessible through a set of points $P \subset \mathbb{R}^d$ sampled on or around $X$. To simplify the problem, assume our goal is to compute or approximate $\beta_i(X)$ from the PCD $P$. We will also briefly discuss computing additional information related to $H_i(X)$ in Section 7.

Main ingredients.
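As a concrete aside on the definitions above, the following is a minimal Python sketch, not part of the original survey, that represents an abstract simplicial complex as a set of sorted vertex tuples, extracts its $k$-skeleton, and computes the ranks $\beta_i$ over $\mathbb{Z}_2$ via $\beta_i = \dim C_i - \mathrm{rank}\,\partial_i - \mathrm{rank}\,\partial_{i+1}$. The function names (`closure`, `is_simplicial_complex`, `skeleton`, `rank_z2`, `betti_numbers`) are hypothetical, and plain Gaussian elimination is assumed in place of any optimized algorithm.

```python
from itertools import combinations

def closure(simplices):
    """Complex generated by the given simplices: all their non-empty subsets."""
    K = set()
    for s in simplices:
        s = tuple(sorted(s))
        for size in range(1, len(s) + 1):
            K.update(combinations(s, size))
    return K

def is_simplicial_complex(K):
    """Defining condition: every non-empty proper subset of a simplex is in K."""
    return all(face in K
               for s in K
               for size in range(1, len(s))
               for face in combinations(s, size))

def skeleton(K, k):
    """k-skeleton: simplices of dimension (cardinality minus one) at most k."""
    return {s for s in K if len(s) - 1 <= k}

def rank_z2(rows):
    """Rank over Z_2 of a matrix whose rows are encoded as integer bitmasks."""
    basis = {}  # leading-bit position -> reduced row
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top not in basis:
                basis[top] = r
                break
            r ^= basis[top]  # cancel the leading bit (addition mod 2)
    return len(basis)

def betti_numbers(K):
    """List [beta_0, ..., beta_m] for a non-empty complex K of dimension m."""
    top_dim = max(len(s) for s in K) - 1
    by_dim = [sorted(s for s in K if len(s) == i + 1) for i in range(top_dim + 1)]
    index = [{s: j for j, s in enumerate(level)} for level in by_dim]
    # ranks[i] = rank of the boundary map from i-chains to (i-1)-chains
    ranks = [0] * (top_dim + 2)
    for i in range(1, top_dim + 1):
        rows = []
        for s in by_dim[i]:
            mask = 0
            for face in combinations(s, i):  # the (i-1)-dimensional faces of s
                mask |= 1 << index[i - 1][face]
            rows.append(mask)
        ranks[i] = rank_z2(rows)
    return [len(by_dim[i]) - ranks[i] - ranks[i + 1] for i in range(top_dim + 1)]

# Three edges forming a triangle with no 2-simplex: one component, one loop.
hollow_triangle = closure([(0, 1), (1, 2), (0, 2)])
print(betti_numbers(hollow_triangle))  # [1, 1]
```

Filling in the triangle, `closure([(0, 1, 2)])`, kills the loop and gives `[1, 0, 0]`; its 1-skeleton is again the hollow triangle.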
Since points themselves do not have interesting topology, the natural first step, given the input set of points $P$, is to construct an intermediate object that in some sense approximates the hidden domain $X$ of interest. In particular, a popular choice for this intermediate object is to construct a simplicial complex $K$ from points in $P$.

After the simplicial complex $K$ is constructed, one can then compute the homology information of $K$ and return that as an approximation of the homology of $X$. However, in order to provide more precise and quantitative statements on the "approximation quality" of the outcome from such an approach, we need to first describe what the quality of the input point cloud data $P$ is. Intuitively, a better approximation in homology is achieved if the input points $P$ "approximate" the hidden domain $X$ better. In computational geometry and topology, the language to describe the quality of input points is called the sampling condition.

In summary, there are three main issues that we will discuss in the remainder of this paper:

Issue 1: the choices of simplicial complexes (Section 3).
Issue 2: sampling conditions of input points $P$ (Section 4).
Issue 3: how to obtain approximation guarantees (Section 5).

Furthermore, input point samples could be corrupted with noise. We will also discuss the issue of handling noise, and in particular focus on the work of Chazal et al. [21] based on the so-called distance to measures.

Issue 4: the handling of noise in $P$ (Section 6).
Further discussion: approximating homology cycles and Reeb graphs (Section 7).

3. Simplicial Complexes

Recall that we have assumed that $P, X \subset \mathbb{R}^d$. Suppose $P = \{p_1, \ldots, p_n\}$. In what follows, we use $B(p, r)$ to denote the $d$-dimensional Euclidean ball centered at $p$ with radius $r$, and $d(x, y)$ to denote the Euclidean distance between $x$ and $y$.

Delaunay complex.
One of the most well-known simplicial complexes spanned by a set of points in $\mathbb{R}^d$ is the Delaunay complex (sometimes called the Delaunay triangulation, especially in low dimensions), defined as follows:

Definition 3.1. Given a set of points $P = \{p_1, \ldots, p_n\} \subset \mathbb{R}^d$, a $k$-simplex $\sigma = \langle p_{i_0}, \ldots, p_{i_k} \rangle$ is in the Delaunay complex $\mathrm{Del}(P)$ if and only if there exists a ball $B$ whose boundary contains the vertices of $\sigma$ and whose interior contains no point from $P$. A simplex in $\mathrm{Del}(P)$ is also called a Delaunay simplex.

The Delaunay complex is named after the mathematician Boris Delaunay [40]. It has many beautiful properties and has been well studied, especially in the low-dimensional case when $d = 2$ or $3$. Indeed, this concept is fundamental to the fields of surface reconstruction and meshing; see for example the books [41, 90]. The Delaunay complex is the dual complex of the so-called Voronoi diagram $\mathrm{Vor}(P)$ of $P$ [87], which decomposes the space $\mathbb{R}^d$ into cells. Each cell is uniquely associated with an input point $p_i$, and contains the set of points from $\mathbb{R}^d$ that have $p_i$ as the nearest neighbor among points in $P$; that is, the Voronoi cell of $p_i$ is given by $\{x \in \mathbb{R}^d \mid d(x, p_i) \le d(x, p) \text{ for any } p \in P\}$. See Figure 1 (a) for an illustration.

For points in high-dimensional space, however, the construction of the Delaunay complex is expensive: it takes $O(n^{\lceil d/2 \rceil} + n \log n)$ time in the worst case to compute the Delaunay complex for a set of $n$ points in $\mathbb{R}^d$ [35]. Furthermore, computing only the one-dimensional Delaunay simplices does not appear to be any faster,

[Panels (a), (b), (c), (d).]
Figure 1.