A Primer in Persistent Homology

Bastian Rieck

Motivation

What is the ‘shape’ of data?

A simple example

What is the shape of this set of points? Technically, a set of points does not have a ‘shape’. Still, we perceive the points to be arranged in a circle. How can we quantify this?

We can ‘squint’ our eyes and look at how the connectivity of the points changes. The more we squint, the more connections we see.

What did we see?

The points are arranged in a circle, as long as the radius of the disks we use to cover them does not exceed a certain critical threshold. How can we formulate this more precisely?

Algebraic topology

The branch of mathematics that is concerned with finding invariant properties of high-dimensional objects.

Simple invariants

1. Dimension: $\mathbb{R}^2 \neq \mathbb{R}^3$ because $2 \neq 3$.
2. Determinant: If matrices $A$ and $B$ are similar, their determinants are equal.
Betti numbers

A topological invariant. Informally, they count the number of holes in different dimensions that occur in an object: $\beta_0$ counts connected components, $\beta_1$ counts tunnels, and $\beta_2$ counts voids.

Space     β0   β1   β2
Point     1    0    0
Circle    1    1    0
Sphere    1    0    1
Torus     1    2    1

Calculating Betti numbers

The $k$th Betti number $\beta_k$ is the rank of the $k$th homology group $H_k(X)$ of the topological space $X$. To define this formally, we require a notion of ‘holes’ in simplicial complexes. This, in turn, requires the concepts of boundaries and cycles.

Technically, I should write simplicial homology group every time. I am not going to do this. Instead, let us first talk about simplicial complexes.

Simplicial complexes

A set $K$ together with a collection $S$ of subsets of $K$ is called an abstract simplicial complex if:

1. $\{v\} \in S$ for all $v \in K$.
2. If $\sigma \in S$ and $\tau \subseteq \sigma$, then $\tau \in S$.

The elements of a simplicial complex are called simplices. A $k$-simplex consists of $k + 1$ vertices.

[Figure: a valid and an invalid example of a simplicial complex]

Chain groups

Given a simplicial complex $K$, the $p$th chain group $C_p$ of $K$ contains all linear combinations of $p$-simplices in the complex. Coefficients are in $\mathbb{Z}_2$, hence all elements of $C_p$ are of the form $\sum_j \sigma_j$, for $\sigma_j \in K$. The group operation is addition with $\mathbb{Z}_2$ coefficients. We need chain groups to algebraically express the concept of a boundary.

Boundary homomorphism

Given a simplicial complex $K$, the $p$th boundary homomorphism is the homomorphism that maps each simplex $\sigma = \{v_0, \dots, v_p\} \in K$ to its boundary:

$$\partial_p \sigma = \sum_{i=0}^{p} \{v_0, \dots, \hat{v}_i, \dots, v_p\}$$

In the equation above, $\hat{v}_i$ indicates that the set does not contain the $i$th vertex. The function $\partial_p \colon C_p \to C_{p-1}$ is thus a homomorphism between the chain groups.
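The boundary homomorphism with $\mathbb{Z}_2$ coefficients can be sketched in a few lines of Python. This is only an illustration under stated assumptions: simplices are represented as sorted tuples of vertex indices, a chain is a set of simplices, and addition modulo 2 becomes the symmetric difference of sets. The function names `boundary` and `boundary_of_chain` are hypothetical and do not come from the slides.

```python
def boundary(simplex):
    """Return the boundary of one simplex as the set of its faces.

    Each face is obtained by leaving out exactly one vertex.
    """
    simplex = tuple(sorted(simplex))
    if len(simplex) <= 1:
        return set()  # the boundary of a vertex is empty
    return {simplex[:i] + simplex[i + 1:] for i in range(len(simplex))}


def boundary_of_chain(chain):
    """Boundary of a Z2 chain, given as a set of simplices.

    Adding chains with Z2 coefficients is the symmetric difference:
    a face that appears twice cancels out.
    """
    result = set()
    for simplex in chain:
        result ^= boundary(simplex)
    return result


triangle = (0, 1, 2)
edges = boundary(triangle)
print(sorted(edges))             # [(0, 1), (0, 2), (1, 2)]
print(boundary_of_chain(edges))  # set()
```

The second line of output is empty because every vertex of the triangle lies on exactly two of its edges, so all vertices cancel modulo 2.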
Fundamental lemma & chain complex

For all $p$, we have

$$\partial_{p-1} \circ \partial_p = 0.$$

Boundaries do not have a boundary themselves. This leads to the chain complex:

$$0 \xrightarrow{\partial_{n+1}} C_n \xrightarrow{\partial_n} C_{n-1} \xrightarrow{\partial_{n-1}} \dots \xrightarrow{\partial_2} C_1 \xrightarrow{\partial_1} C_0 \xrightarrow{\partial_0} 0$$

Cycle and boundary groups

Cycle group: $Z_p = \ker \partial_p$
Boundary group: $B_p = \operatorname{im} \partial_{p+1}$

We have $B_p \subseteq Z_p$ in the group-theoretical sense. In other words, every boundary is also a cycle.

Homology groups & Betti numbers

The $p$th homology group $H_p$ is a quotient group, defined by ‘removing’ cycles that are boundaries from a higher dimension:

$$H_p = Z_p / B_p = \ker \partial_p / \operatorname{im} \partial_{p+1}$$

With this definition, we may finally calculate the $p$th Betti number:

$$\beta_p = \operatorname{rank} H_p$$

Intuitively: Calculate all cycles, remove the ones that are boundaries of higher-dimensional objects, and count what is left.

Real-world multivariate data

Often: unstructured point clouds
- $n$ items with $D$ attributes; $n \times D$ matrix
- Non-random sample from $\mathbb{R}^D$

Manifold hypothesis: There is an unknown $d$-dimensional manifold $M \subseteq \mathbb{R}^D$, with $d \ll D$, from which our data have been sampled.

Converting unstructured data into a simplicial complex

Rips graph $R_\epsilon$: Use a distance measure $\operatorname{dist}(\cdot, \cdot)$ such as the Euclidean distance and a threshold parameter $\epsilon$. Connect $u$ and $v$ if $\operatorname{dist}(u, v) \leq \epsilon$.

How to get a simplicial complex from $R_\epsilon$? Construct the Vietoris–Rips complex $V_\epsilon$ by adding a $k$-simplex whenever all of its $(k - 1)$-dimensional faces are present.

How to calculate Betti numbers?
Direct calculations are unstable.

[Figure: Vietoris–Rips complexes at $\epsilon = 0.35$, $\epsilon = 0.53$, $\epsilon = 0.88$, and $\epsilon = 1.05$, with a plot of $\beta_1$ against $\epsilon$]

Persistent homology

Note that the ‘correct’ Betti number of the data persists over a certain range of the threshold parameter $\epsilon$. To formalize this, assume that simplices in the Vietoris–Rips complex are added one after the other with an associated weight. This gives rise to a filtration,

$$\emptyset = K_0 \subseteq K_1 \subseteq \dots \subseteq K_{n-1} \subseteq K_n = K,$$

such that each $K_i$ is a valid simplicial subcomplex of $K$. We write $w(K_i)$ to denote the weight of $K_i$.

Similar to what we have previously seen, this gives rise to a sequence of homomorphisms,

$$f_p^{i,j} \colon H_p(K_i) \to H_p(K_j),$$

and a sequence of homology groups, i.e.

$$0 = H_p(K_0) \xrightarrow{f_p^{0,1}} H_p(K_1) \xrightarrow{f_p^{1,2}} \dots \xrightarrow{f_p^{n-2,n-1}} H_p(K_{n-1}) \xrightarrow{f_p^{n-1,n}} H_p(K_n) = H_p(K),$$

where $p$ denotes the dimension of the homology groups.

Persistent homology group

Given two indices $i \leq j$, the $p$th persistent homology group $H_p^{i,j}$ is defined as

$$H_p^{i,j} := Z_p(K_i) / \left( B_p(K_j) \cap Z_p(K_i) \right),$$

which contains all the homology classes of $K_i$ that are still present in $K_j$. We may now track the different homology classes through the individual homology groups.

Tracking of homology classes

Creation in $K_i$: $c \in H_p(K_i)$, but $c \notin H_p^{i-1,i}$.
Destruction in $K_j$: $f_p^{i,j-1}(c) \notin H_p^{i-1,j-1}$ and $f_p^{i,j}(c) \in H_p^{i-1,j}$.

The persistence of a class $c$ that is created in $K_i$ and destroyed in $K_j$ is defined as

$$\operatorname{pers}(c) = |w(K_j) - w(K_i)|,$$

and measures the ‘scale’ at which a certain topological feature occurs.

[Figure: Vietoris–Rips complexes at $\epsilon = 0.35$, $\epsilon = 0.53$, $\epsilon = 0.88$, and $\epsilon = 1.05$]

Here, the topological feature is the circle that underlies the data. It persists from $\epsilon = 0.53$ to $\epsilon = 1.05$, so its persistence is:

$$\operatorname{pers} = 1.05 - 0.53 = 0.52$$

In general, a high persistence indicates relevant features.
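The dependence of the Betti numbers on the threshold can be reproduced with a small experiment. The sketch below is an assumption-laden illustration, not the computation behind the slides: it uses 12 synthetic points spaced evenly on the unit circle (not the data set shown in the figures), builds the Vietoris–Rips complex only up to 2-simplices (which is enough for $\beta_0$ and $\beta_1$), and computes ranks over $\mathbb{Z}_2$ with a simple elimination on bitmasks. The names `rank_gf2` and `rips_betti` are hypothetical.

```python
import math
from itertools import combinations


def rank_gf2(rows):
    """Rank over Z2 of a matrix whose rows are integer bitmasks."""
    pivots = {}  # leading bit -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # eliminate the leading bit
    return len(pivots)


def rips_betti(points, eps):
    """Return (beta_0, beta_1) of the Vietoris-Rips complex at scale eps."""
    n = len(points)
    # Rips graph: connect u and v if dist(u, v) <= eps.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}
    # Add a triangle whenever all three of its edges are present.
    triangles = [t for t in combinations(range(n), 3)
                 if all(e in edge_index for e in combinations(t, 2))]
    # Boundary matrices, one row per simplex, one bit per face.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [sum(1 << edge_index[e] for e in combinations(t, 2))
          for t in triangles]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    beta0 = n - r1                  # dim ker d0 - rank d1, with d0 = 0
    beta1 = (len(edges) - r1) - r2  # dim ker d1 - rank d2
    return beta0, beta1


points = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]

for eps in (0.3, 0.6, 1.2, 2.1):
    print(eps, rips_betti(points, eps))
# 0.3 (12, 0)
# 0.6 (1, 1)
# 1.2 (1, 1)
# 2.1 (1, 0)
```

For this synthetic sample, $\beta_1 = 1$ only for an intermediate range of thresholds: below it the points are disconnected, above it the hole is filled in. A single threshold therefore gives an unstable answer, which is the motivation for tracking features across the whole filtration.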
How to represent topological information?

Persistence diagram: Given a topological feature created in $K_i$ and destroyed in $K_j$, add a point with coordinates $(w(K_i), w(K_j))$ to a diagram.

[Figure: a persistence diagram]

This summarizing description is always two-dimensional, regardless of the dimensionality of the input data!

Uses for persistence diagrams

Well-defined distance measures

[Figure: persistence diagrams from the same object] Some noise has been added to the object, resulting in spurious topological features. Large-scale features remain the same, though!

Distance measure

Second Wasserstein distance:

$$W_2(X, Y) = \sqrt{\inf_{\eta \colon X \to Y} \sum_{x \in X} \| x - \eta(x) \|_\infty^2}$$

Stability

Theorem: Let $f$ and $g$ be two Lipschitz-continuous functions. There are constants $k$ and $C$ that depend on the input space and on the Lipschitz constants of $f$ and $g$ such that

$$W_2(X, Y) \leq C \, \| f - g \|_\infty^{1 - \frac{k}{2}},$$

where $X$ and $Y$ refer to the persistence diagrams of $f$ and $g$.

Summarizing statistics

Given a persistence diagram $D$, there are various summary statistics that we can calculate:

$\infty$-norm:

$$\| D \|_\infty = \max_{(x, y) \in D} |x - y|$$

$p$-norm:

$$\| D \|_p = \left( \sum_{(x, y) \in D} |x - y|^p \right)^{\frac{1}{p}}$$

Total persistence:

$$\operatorname{pers}(D)_p = \sum_{(x, y) \in D} |x - y|^p$$

Scalar field analysis

Climate research

[Figure: scalar field from climate research]

What are the issues?

- Need to know about large-scale and small-scale differences in the qualitative behaviour of the fields.
- Similar phenomena may appear at different regions in the data.
- Time-varying aspects: Which time steps are outliers, with markedly different properties than the remaining ones?

Using a 2D simplicial complex (the surface of the Earth), we can only find topological features in dimensions 0, 1, and 2.
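Returning to the summary statistics of a persistence diagram, which are one way to compare fields or time steps numerically, the following sketch shows how they might be computed. It assumes that a diagram is a list of (creation, destruction) pairs and uses the absolute difference of the two coordinates throughout. The function names and the example diagram are hypothetical; only the point (0.53, 1.05) is taken from the circle example in the slides.

```python
def infinity_norm(diagram):
    """Largest persistence of any point in the diagram."""
    return max(abs(x - y) for x, y in diagram)


def total_persistence(diagram, p=2):
    """Sum of the p-th powers of all persistence values."""
    return sum(abs(x - y) ** p for x, y in diagram)


def p_norm(diagram, p=2):
    """p-th root of the total persistence."""
    return total_persistence(diagram, p) ** (1.0 / p)


# Hypothetical diagram: one prominent feature and two minor ones.
diagram = [(0.0, 0.5), (0.53, 1.05), (0.2, 0.25)]

print(infinity_norm(diagram))
print(total_persistence(diagram, p=1))
print(p_norm(diagram, p=2))
```

A larger value of $p$ puts more weight on high-persistence points, so the $p$-norm approaches the $\infty$-norm as $p$ grows.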
