Computing Persistent Homology ∗

Afra Zomorodian† and Gunnar Carlsson‡

(Discrete and Computational Geometry)

Abstract a a a a a a b b b b b b We show that the persistent homology of a filtered d- dimensional simplicial complex is simply the standard homol- d c d c d c d c d c c, d, ogy of a particular graded module over a polynomial . Our 0 a, b 1 ab, bc 2 cd, ad 3 ac 4 abc 5 acd analysis establishes the existence of a simple description of persistent homology groups over arbitrary fields. It also en- Figure 1. A filtered complex with newly added simplices high- ables us to derive a natural algorithm for computing persistent lighted. homology of spaces in arbitrary dimension over any field. This result generalizes and extends the previously known algorithm of vector spaces. Our classification is in terms of a set of 3 that was restricted to subcomplexes of S and Z2 coefficients. intervals. We also derive a natural algorithm for comput- Finally, our study implies the lack of a simple classification ing this family of intervals. Using this family, we may over non-fields. Instead, we give an algorithm for computing identify homological features that persist within the fil- individual persistent homology groups over an arbitrary prin- tration, the persistent homology of the filtered complex. cipal ideal domains in any dimension. Furthermore, our interpretation makes it clear that if 1 Introduction the ground ring is not a field, there exists no similarly simple classification of persistent homology. Rather, the In this paper, we study the homology of a filtered d- structures are very complicated, and although we may dimensional simplicial complex K, allowing an arbi- assign interesting invariants to them, no simple classifi- trary principal ideal domain D as the ground ring of co- cation is, or is likely ever to be, available. In this case efficients. A filtered complex is an increasing sequence we provide an algorithm for computing a single persis- of simplicial complexes, as shown in Figure 1. It deter- tent for the filtration. mines an inductive system of homology groups, i.e., a In the rest of this section we first motivate our study family of Abelian groups {Gi}i≥0 together with homo- through three examples in which filtered complexes morphisms Gi → Gi+1. If the homology is computed arise whose persistent homology is of interest. We then with field coefficients, we obtain an inductive system of discuss prior work and its relationship to our work. We vector spaces over the field. Each vector space is deter- conclude this section with an outline of the paper. mined up to isomorphism by its dimension. In this paper we obtain a simple classification of an inductive system 1.1 Motivation ∗ Research by the first author is partially supported by NSF un- We call a filtered simplicial complex, along with its as- der grants CCR-00-86013 and ITR-0086013. Research by the sec- ond author is partially supported by NSF under grant DMS-0101364. sociated chain and boundary maps, a persistence com- Research by both authors is partially supported by NSF under grant plex. We formalize this concept in Section 3. Per- DMS-0138456. sistence complexes arise naturally whenever one is at- † Department of Computer Science, , Stanford, tempting to study topological invariants of a space com- California. ‡Department of Mathematics, Stanford University, Stanford, Cali- putationally. Often, our knowledge of this space is lim- fornia. ited and imprecise. Consequently, we must utilize a multiscale approach to capture the connectivity of the Persistent homology groups are initially defined in space, giving us a persistence complex. [11, 18]. The authors also provide an algorithm that worked only for spaces that were subcomplexes of S3 Example 1.1 (point cloud data) Suppose we are given over Z2 coefficients. The algorithm generates a set of in- a finite set of points X from a subspace X ∈ Rn. We call tervals for a filtered complex. Surprisingly, the authors X point cloud data or PCD for short. It is reasonable to show that these intervals allowed the correct computa- believe that if the sampling is dense enough, we should tion of the rank of persistent homology groups. In other be able to compute the topological invariants of X di- words, the authors prove constructively that persistent rectly from the PCD. To do so, we may either compute homology groups of subcomplexes of S3, if computed the Cechˇ complex, or approximate it via a Rips com- over Z2 coefficients, have a simple description in terms plex [15]. The latter complex R(X) has X as its vertex of a set of intervals. To build these intervals, the algo- set. We declare a set of vertices σ = {x0, x1, . . . , xk} rithm pairs positive cycle-creating simplices with neg- to span a k-simplex of R(X) iff the vertices are pair- ative cycle-destroying simplices. During the computa- wise close, that is, d(xi, xj) ≤  for all pairs xi, xj ∈ σ. tion, the algorithm ignores negative simplices and al- There is an obvious inclusion R(X) → R0 (X) when- ways looks for the youngest simplex. While the authors ever  < 0. In other words, for any increasing sequence prove the correctness of the results of the algorithm, the of non-negative real numbers, we obtain a persistence underlying structure remains hidden. complex. 1.3 Our Work Example 1.2 (density) Often, our samples are not from a geometric object, but are heavily concentrated on it. It We are motivated primarily by the unexplained results is important, therefore, to compute a measure of den- of the previous work. We wish to answer the following sity of the data around each sample. For instance, we questions: may count the number of samples ρ(x) contained in a ball of size  around each sample x. We may then de- 1. Why does a simple description exist for persistent fine Rr(X) ⊆ R to be the Rips subcomplex on the 3   homology of subcomplexes of S over Z2? vertices for which ρ(x) ≤ r. Again, for any increasing sequence of non-negative real numbers r, we obtain a 2. Does this description also exist over other rings persistence complex. We must analyze this complex to of coefficients and arbitrary-dimensional simplicial compute topological invariants attached to the geomet- complexes? ric object around which our data is concentrated. 3. Why can we ignore negative simplices during com- Example 1.3 (Morse functions) Given a manifold M putation? equipped with a Morse function f, we may filter M 4. Why do we always look for the youngest simplex? via the excursion sets Mr = {m ∈ M | f(m) ≤ r}. We again choose an increasing sequence of non-negative 5. What is the relationship between the persistence al- numbers to get a persistence complex. If the Morse gorithm and the standard reduction scheme? function is a height function attached to some embed- M n ding of in R , persistent homology can now give in- In this paper we resolve all these questions by uncov- formation about the shape of the submanifolds, as well ering and elucidating the structure of persistent homol- the homological invariants of the total manifold. ogy. Specifically, we show that the persistent homology of a filtered d-dimensional simplicial complex is simply 1.2 Prior Work the standard homology of a particular graded module over a polynomial ring. Our analysis places persistent We assume familiarity with basic group theory and refer homology within the classical framework of algebraic the reader to Dummit and Foote [10] for an introduc- topology. This placement allows us to utilize a standard tion. We make extensive use of Munkres [16] in our de- structure theorem to establish the existence of a simple scription of algebraic homology and recommend it as an description of persistent homology groups as a set of accessible resource to non-specialists. There is a large intervals, answering the first question above. This de- body of work on the efficient computation of homology scription exists over arbitrary fields, not just Z2 as in the groups and their ranks [1, 8, 9, 13]. previous result, resolving the second question.

2 Our analysis also enables us to derive a persistence in Section 5 that computes individual persistent groups. algorithm from the standard reduction scheme in alge- Section 6 describes our implementation and some ex- bra, resolving the next three questions using two main periments. We conclude the paper in Section 7 with a lemmas. Our algorithm generalizes and extends the pre- discussion of current and future work. viously known algorithm to complexes in arbitrary di- mensions over arbitrary fields of coefficients. We also show that if we consider integer coefficients or coeffi- 2 Background cients in some non-field R, there is no similar simple In this section we review the mathematical and algo- classification. This negative result suggests the possibil- rithmic background necessary for our work. We begin ity of interesting yet incomplete invariants of inductive by reviewing the structure of finitely generated modules systems. For now, we give an algorithm for classifying a over principal ideal domains. We then discuss simpli- single homology group over an arbitrary principal ideal cial complexes and their associated chain complexes. domain. Putting these concepts together, we define simplicial ho- mology and outline the standard algorithm for its com- 1.4 Spectral Sequences putation. We conclude this section by describing persis- tent homology. Any filtered complex gives rise to a spectral sequence, so it is natural to wonder about the relationship be- tween this sequence and persistence. A full discussion 2.1 Algebra on spectral sequences is outside the scope of this pa- Throughout this paper we assume a ring R to be com- per. However, we include a few remarks here for the mutative with unity. A polynomial f(t) with coefficients reader who is familiar with the subject. We may easily P∞ i in R is a formal sum i=0 ait , where ai ∈ R and t is show that the persistence intervals for a filtration corre- the indeterminate. For example, 2t + 3 and t7 − 5t2 are spond to nontrivial differentials in the spectral sequence both polynomials with integer coefficients. The set of that arises from the filtration. Specifically, an interval of all polynomials f(t) over R forms a commutative ring length r corresponds to some differential dr+1. Given R[t] with unity. If R has no divisors of zero, and all its this correspondence, we realize that the method of spec- ideals are principal, it is a principal ideal domain (PID). tral sequences computes persistence intervals in order of For our purposes, a PID is simply a ring in which we length, finding all intervals of length r during the com- may compute the greatest common divisor or gcd of a r+1 putation of the E term. In principle, we may use this pair of elements. This is the key operation needed by method to compute the result of our algorithm. How- the structure theorem that we discuss below. PIDs in- ever, the method does not provide an algorithm, but a clude the familiar rings Z, Q, and R. Finite fields Zp scheme that must be tailored for each problem indepen- for p a prime, as well as F [t], polynomials with coeffi- dently. The practitioner must decide on an appropriate cients from a field F , are also PIDs and have effective basis, find the zero terms in the sequence, and deduce algorithms for computing the gcd [6]. the nature of the differentials. Our analysis of persis- A graded ring is a ring hR, +, ·i equipped with tent homology, on the other hand, provides a complete, a direct sum decomposition of Abelian groups R =∼ effective, and implementable algorithm for any filtered L R , i ∈ , so that multiplication is defined by bi- complex. i i Z linear pairings Rn ⊗ Rm → Rn+m. Elements in a single Ri are called homogeneous and have degree i, 1.5 Outline deg e = i for all e ∈ Ri. We may grade the polyno- mial ring R[t] non-negatively with the standard grading n 6 3 We begin by reviewing concepts from algebra and sim- Rn = Rt , n ≥ 0. In this grading, 2t and 7t are plicial homology in Section 2. We also re-introduce both homogeneous of degree 6 and 3, respectively, but persistent homology over integers and arbitrary dimen- their sum 2t6 + 7t3 is not homogeneous. The product sions. In Section 3 we define and study the persistence of the two terms, 14t9, has degree 9 as required by the module, a structure that represents the homology of a fil- definition. A graded module M over a graded ring R tered complex. In addition, we establish a relationship is a module equipped with a direct sum decomposition, ∼ L between our results and prior work. Using our analy- M = i Mi, i ∈ Z, so that the action of R on M is sis, we derive an algorithm for computation over fields defined by bilinear pairings Rn ⊗ Mm → Mn+m. The in Section 4. For non-fields, we describe an algorithm main structure in our paper is a graded module and we

3 include concrete examples that clarify this concept later vertices of σ, where (v0, . . . , vk) ∼ (vτ(0), . . . , vτ(k)) on. A graded ring (module) is non-negatively graded if are equivalent if the sign of τ is 1. We denote an ori- Ri = 0 (Mi = 0) for all i < 0. ented simplex by [σ]. A simplex may be realized geo- The standard structure theorem describes finitely gen- metrically as the convex hull of k + 1 affinely indepen- erated modules and graded modules over PIDs. dent points in Rd, d ≥ k. A realization gives us the familiar low-dimensional k-simplices: vertices, edges, Theorem 2.1 (structure) If D is a PID, then every triangles, and tetrahedra, for 0 ≤ k ≤ 3, shown in Fig- finitely generated D-module is isomorphic to a direct ure 2. Within a realized complex, the simplices must sum of cyclic D-modules. That is, it decomposes b b uniquely into the form c a m ! a β M a a b D ⊕ D/diD , (1) c d i=1 vertex edge triangle tetrahedron a [a, b] [a, b, c] [a, b, c, d] for di ∈ D, β ∈ Z, such that di|di+1. Similarly, ev- 3 ery graded module M over a graded PID D decomposes Figure 2. Oriented k-simplices in R , 0 ≤ k ≤ 3. The orien- uniquely into the form tation on the tetrahedron is shown on its faces.

n !  m  meet along common faces. A subcomplex of K is a sub- M αi M γj set L ⊆ K that is also a simplicial complex. A filtration Σ D ⊕  Σ D/djD , (2) i=1 j=1 of a complex K is a nested subsequence of complexes ∅ = K0 ⊆ K1 ⊆ ... ⊆ Km = K. For generality, i m where dj ∈ D are homogeneous elements so that we let K = K for all i ≥ m. We call K a filtered α dj|dj+1, αi, γj ∈ Z, and Σ denotes an α-shift upward complex. We show a small filtered complex in Figure 1. in grading. 2.3 Chain Complex In both cases, the theorem decomposes the structures into two parts. The free portion on the left includes The kth chain group Ck of K is the free Abelian group generators that may generate an infinite number of el- on its set of oriented k-simplices, where [σ] = −[τ] if ements. This portion is a vector space and should be σ = τ and σ and τ are differently oriented.. An element P familiar to most readers. Decomposition (1) has a vec- c ∈ Ck is a k-chain, c = i ni[σi], σi ∈ K with coeffi- tor space of dimension β. The torsional portion on the cients ni ∈ Z. The boundary operator ∂k : Ck → Ck−1 right includes generators that may generate a finite num- is a homomorphism defined linearly on a chain c by its ber of elements. For example, if PID D is Z in the theo- action on any simplex σ = [v0, v1, . . . , vk] ∈ c, /3 = rem, Z Z Z3 would represent a generator capable of X i only creating three elements. These torsional elements ∂kσ = (−1) [v0, v1,..., vˆi, . . . , vk], are also homogeneous. Intuitively then, the theorem de- i scribes finitely generated modules and graded modules where vˆi indicates that vi is deleted from the sequence. as structures that look like vector spaces but also have The boundary operator connects the chain groups into a some dimensions that are “finite” in size. chain complex C∗:

∂k+1 ∂k · · · → Ck+1 −−−→ Ck −→ Ck−1 → · · · . 2.2 Simplicial Complexes We may also define subgroups of Ck using the boundary A simplicial complex is a set K, together with a col- operator: the cycle group Zk = ker ∂k and the bound- lection S of subsets of K called simplices (singular sim- ary group Bk = im ∂k+1. We show examples of cy- plex) such that for all v ∈ K, {v} ∈ S, and if τ ⊆ σ ∈ S, cles in Figure 3. An important property of the boundary then τ ∈ S. We call the sets {v} the vertices of K. When operators is that the boundary of a boundary is always it is clear from context what S is, we refer to set K as empty, ∂k∂k+1 = 0. This fact, along with the defi- a complex. We say σ ∈ S is a k-simplex of dimension nitions, implies that the defined subgroups are nested, k if |σ| = k + 1. If τ ⊆ σ, τ is a face of σ, and σ Bk ⊆ Zk ⊆ Ck, as in Figure 4. For generality, we often is a coface of τ. An orientation of a k-simplex σ, σ = define null boundary operators in dimensions where Ck {v0, . . . , vk}, is an equivalence class of orderings of the is empty.

4 δk+1 δk 2.4 Homology Ck+1 Ck Ck−1

Z Z k Zk−1 The kth homology group is Hk = Zk/Bk. Its elements k+1 are classes of homologous cycles. To describe its struc- Bk+1 Bk Bk−1 ture, we view the Abelian groups we have defined so far as modules over the integers. This view allows alter- nate ground rings of coefficients, including fields. If the 0 0 0 Figure 4. A chain complex with its internals: chain, cycle, and ring is a PID D, Hk is a D-module and Theorem (2.1) applies: β, the rank of the free submodule, is the Betti boundary groups, and their images under the boundary opera- tors. number of the module, and di are its torsion coefficients. When the ground ring is Z, the theorem above describes the structure of finitely generated Abelian groups. Over The algorithm also uses elementary column operations a field, such as R, Q, or Zp for p a prime, the torsion sub- that are similarly defined. Each column (row) opera- module disappears. The module is a vector space that is tion corresponds to a change in the basis for Ck (Ck−1). fully described by a single integer, its rank β, which de- For example, if ei and ej are the ith and jth basis ele- pends on the chosen field. ments for Ck, respectively, a column operation of type (3) amounts to replacing ei with ei + qej. A similar row operation on basis elements eˆ and eˆ for C , how- 2.5 Reduction i j k−1 ever, replaces eˆj by eˆj − qeˆi. We shall make use of this The standard method for computing homology is the re- fact in Section 4. The algorithm systematically modifies duction algorithm. We describe this method for integer the bases of Ck and Ck−1 using elementary operations M coefficients as it is the more familiar ring. The method to reduce k to its (Smith) normal form: extends to modules over arbitrary PIDs, however.   b1 0 As C is free, the oriented k-simplices form the stan- k  ..  dard basis for it. We represent the boundary operator  . 0    ∂ : C → C relative to the standard bases of the ˜  0 bl  k k k−1 Mk =  k  , chain groups as an integer matrix M with entries in   k   {−1, 0, 1}. The matrix Mk is called the standard ma-  0 0  trix representation of ∂k. It has mk columns and mk−1 rows (the number of k- and (k − 1)-simplices, respec- i tively). The null-space of Mk corresponds to Zk and its where lk = rank Mk = rank M˜ k, b ≥ 1, and bi|bi+1 range-space to Bk−1, as manifested in Figure 4. The re- for all 1 ≤ i < lk. The algorithm can also compute duction algorithm derives alternate bases for the chain corresponding bases {ej} and {eˆi} for Ck and Ck−1, groups, relative to which the matrix for ∂k is diagonal. respectively, although this is unnecessary if a decompo- The algorithm utilizes the following elementary row op- sition is all that is needed. Computing the normal form erations on Mk: in all dimensions, we get a full characterization of Hk:

1. exchange row i and row j, (i) the torsion coefficients of Hk−1 (di in (1)) are pre- 2. multiply row i by −1, cisely the diagonal entries bi greater than one.

3. replace row i by (row i) + q(row j), where q is an (ii) {ei | lk +1 ≤ i ≤ mk} is a basis for Zk. Therefore, integer and j 6= i. rank Zk = mk − lk.

(iii) {bieˆi | 1 ≤ i ≤ lk} is a basis for Bk−1. Equiva-

lently, rank Bk = rank Mk+1 = lk+1. ¡ ¡ ¡

¢¡¢¡¢¡¢ Combining (ii) and (iii), we have

¡ ¡ ¡

¢¡¢¡¢¡¢

¡ ¡ ¡

¢¡¢¡¢¡¢

¡ ¡ ¡

¢¡¢¡¢¡¢

¡ ¡ ¡ ¢¡¢¡¢¡¢ βk = rank Zk − rank Bk = mk − lk − lk+1. (3)

Figure 3. The dashed 1-boundary rests on the surface of a torus. The two solid 1-cycles form a basis for the first homology class of the torus. These cycles are non-bounding: neither is a boundary of a piece of surface. Example 2.1 For the complex in Figure 1, the standard

5 matrix representation of ∂1 is The definition is well-defined: both groups in the de- nominator are subgroups of Cl+p, so their intersection   k ab bc cd ad ac is also a group, a subgroup of the numerator. The p-  a −1 0 0 −1 −1  k Ki βi,p   persistent th Betti number of is k , the rank of M1 =  b 1 −1 0 0 0  , the free subgroup of Hi,p. We may also define persistent  c 0 1 −1 0 1  k   homology groups using the injection ηi,p : Hi → Hi+p, d 0 0 1 1 0 k k k that maps a homology class into the one that contains i,p i,p where we show the bases within the matrix. Reducing it. Then, im ηk ' Hk [11, 18]. We extend this the matrix, we get the normal form definition over arbitrary PIDs, as before. Persistent ho- mology groups are modules and Theorem 2.1 describes   cd bc ab z1 z2 their structure.  d − c 1 0 0 0 0  ˜   M1 =  c − b 0 1 0 0 0  ,    b − a 0 0 1 0 0  a 0 0 0 0 0 3 The Persistence Module where z1 = ad − bc − cd − ab and z2 = ac − bc − ab form a basis for Z1 and {d − c, c − b, b − a} is a basis for B0. In this section we take a different view of persistent homology in order to understand its structure. Intu- We may use a similar procedure to compute homol- itively, the computation of persistence requires compat- i i+p ogy over graded PIDs. A homogeneous basis is a basis ible bases for Hk and Hk . It is not clear when a suc- of homogeneous elements. We begin by representing ∂k cinct description is available for the compatible bases. relative to the standard basis of Ck (which is homoge- We begin this section by combining the homology of neous) and a homogeneous basis for Zk−1. Reducing all the complexes in the filtration into a single alge- to normal form, we read off the description provided by braic structure. We then establish a correspondence that direct sum (2) using the new basis {eˆj} for Zk−1: reveals a simple description over fields. Most signifi- cantly, we illustrate that the persistent homology of a (i) zero row i contributes a free term with shift αi = filtered complex is simply the standard homology of a deg eˆi, particular graded module over a polynomial ring. A sim- ple application of the structure theorem (Theorem 2.1) (ii) row with diagonal term bi contributes a torsional gives us the needed description. We end this section by term with homogeneous dj = bj and shift γj = illustrating the relationship of our structures to the per- deg eˆj. sistence equation (Equation (4).) The reduction algorithm requires O(m3) elementary operations, where m is the number of simplices in K. The operations, however, must be performed in exact Definition 3.1 (persistence complex) A persistence integer arithmetic. This is problematic in practice, as i the entries of the intermediate matrices may become ex- complex C is a family of chain complexes {C∗}i≥0 over i i i+1 tremely large. R, together with chain map’s f : C∗ → C∗ , so that we have the following diagram: 2.6 Persistence 0 1 2 0 f 1 f 2 f We end this section with by re-introducing persistence. C∗ −→ C∗ −→ C∗ −→· · · . Given a filtered complex, the ith complex Ki has asso- i i ciated boundary operators ∂k, matrices Mk, and groups i i i i Our filtered complex K with inclusion maps for the sim- Ck, Zk, Bk and Hk for all i, k ≥ 0. Note that super- scripts indicate the filtration index and are not related to plices becomes a persistence complex. Below, we show cohomology. The p-persistent kth homology group of a portion of a persistence complex, with the chain com- Ki is plexes expanded. The filtration index increases hori- zontally to the right under the chain maps f i, and the i,p i i+p i Hk = Zk / (Bk ∩ Zk). (4) dimension decreases vertically to the bottom under the

6 boundary operators ∂k. Theorem 3.1 (correspondence) The correspondence α defines an equivalence of categories between the       category of persistence modules of finite type over R ∂3y ∂3y ∂3y and the category of finitely generated non-negatively 0 1 2 0 f 1 f 2 f C2 −−−−→ C2 −−−−→ C2 −−−−→ · · · graded modules over R[t].       ∂2y ∂2y ∂2y The proof is the Artin-Rees theory in commutative alge- 0 1 2 bra [12]. 0 f 1 f 2 f C1 −−−−→ C1 −−−−→ C1 −−−−→ · · · Intuitively, we are building a single structure that con-    tains all the complexes in the filtration. We begin by ∂  ∂  ∂  1y 1y 1y computing a direct sum of the complexes, arriving at a 0 1 2 0 f 1 f 2 f much larger space that is graded according to the filtra- C −−−−→ C −−−−→ C −−−−→ · · · 0 0 0 tion ordering. We then remember the time each simplex enters using a polynomial coefficient. For instance, sim- plex a enters the filtration in Figure 1 at time 0. To shift Definition 3.2 (persistence module) A persistence this simplex along the grading, we must multiply the module M is a family of R-modules M i, together with simplex using t. Therefore, while a exists at time 0, t · a homomorphisms ϕi : M i → M i+1. exists at time 1, t2 · a at time 2, and so on. The key idea is that the filtration ordering is encoded in the co- For example, the homology of a persistence complex is a efficient polynomial ring. We utilize these coefficients persistence module, where ϕi simply maps a homology in Section 4 to derive the persistence algorithm from the class to the one that contains it. reduction scheme in Section 2.5.

Definition 3.3 (finite type) A persistence complex 3.2 Decomposition i i i i {C∗, f } (persistence module {M , ϕ }) is of finite type if each component complex (module) is a finitely gen- The correspondence established by Theorem 3.1 sug- erated R-module, and if the maps f i (ϕi, respectively) gests the non-existence of simple classifications of per- are isomorphisms for i ≥ m for some integer m. sistence modules over a ground ring that is not a field, such as Z. It is well known in commutative algebra that As our complex K is finite, it generates a persistence the classification of modules over Z[t] is extremely com- complex C of finite type, whose homology is a persis- plicated. While it is possible to assign interesting in- tence module M of finite type. We showed in the Intro- variants to Z[t]-modules, a simple classification is not duction how such complexes arise in practice. available, nor is it ever likely to be available. On the other hand, the correspondence gives us a sim- ple decomposition when the ground ring is a field F . 3.1 Correspondence Here, the graded ring F [t] is a PID and its only graded ideals are homogeneous of form (tn) = tn · R[t], n ≥ Suppose we have a persistence module M = 0, so the structure of the F [t]-module is described by i i {M , ϕ }i≥0 over ring R. We now equip R[t] with the sum (2) in Theorem 2.1: standard grading and define a graded module over R[t]   by n ! m M α M γ n ∞ Σ i F [t] ⊕  Σ j F [t]/(t j ) . (5) M i α(M) = M , i=1 j=1 i=0 We wish to parametrize the isomorphism classes of where the R-module structure is simply the sum of the F [t]-modules by suitable objects. structures on the individual components, and where the action of t is given by Definition 3.4 (P-interval) A P-interval is an ordered pair (i, j) with 0 ≤ i < j ∈ Z∞ = Z ∪ {+∞}. t·(m0, m1, m2,...) = (0, ϕ0(m0), ϕ1(m1), ϕ2(m2),...). We associate a graded F [t]-module to a set S of P- That is, t simply shifts elements of the module up in the intervals via a bijection Q. We define Q(i, j) = gradation. ΣiF [t]/(tj−i) for P-interval (i, j). Of course,

7 i l,p l,p Q(i, +∞) = Σ F [t]. For a set of P-intervals S = rank βk of Hk is the number of triangles in T contain- {(i1, j1), (i2, j2) ..., (in, jn)}, we define ing the point (l, p). n M Consequently, computing persistent homology over a Q(S) = Q(il, jl). field is equivalent to finding the corresponding set of P-

l=1 intervals.

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

Our correspondence may now be restated as follows. ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨ ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

l i

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨ ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

>

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦l+p < j

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

Corollary 3.1 The correspondence S → Q(S) defines ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢

¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

a bijection between the finite sets of P-intervals and the ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦(i, 0) (j, 0) index (l)

¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢

¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢£¢

¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤£¤

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

persistence (p)

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

finitely generated graded modules over the graded ring p ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦> 0

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

F [t]. Consequently, the isomorphism classes of persis- ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

tence modules of finite type over F are in bijective cor- ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦(i, j − i) (i, 0) (j, 0)

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

respondence with the finite sets of P-intervals. ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

3.3 Interpretation ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦(i, j − i)

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

Before proceeding any further, we recap our work so far ¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥£¥

¡¨

¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦£¦

§

¡¨

§ ¨

and relate it to prior results. Recall that our input is a ¡

§

¡¨ filtered complex K and we are interested in its kth ho- mology. In each dimension the homology of complex Figure 5. The inequalities p ≥ 0, l ≥ i, and l+p < j define a Ki becomes a vector space over a field, described fully triangular region in the index-persistence plane. This region de- i fines when the cycle is a basis element for the homology vector by its rank βk. We need to choose compatible bases space. across the filtration in order to compute persistent ho- mology for the entire filtration. So, we form the persis- tence module corresponding to K, a direct sum of these vector spaces. The structure theorem states that a ba- 4 Algorithm for Fields sis exists for this module that provides compatible bases for all the vector spaces. In particular, each P-interval In this section we devise an algorithm for computing (i, j) describes a basis element for the homology vec- persistent homology over a field. Given the theoret- tor spaces starting at time i until time j − 1. This ele- ical development of the last section, our approach is ment is a k-cycle e that is completed at time i, forming a rather simple: we simplify the standard reduction al- new homology class. It also remains non-bounding un- gorithm using the properties of the persistence module. j P til time j, at which time it joins the boundary group B . Our arguments give an algorithm for computing the - k F Therefore, the P-intervals discussed here are precisely intervals for a filtered complex directly over the field , the so-called k-intervals utilized in [11] to describe per- without the need for constructing the persistence mod- ule. This algorithm is a generalized version of the pair- sistent Z2-homology. That is, while component homol- ogy groups are torsionless, persistence appears as tor- ing algorithm shown in [11]. sional and free elements of the persistence module. l Our interpretation also allows us to ask when e + Bk 4.1 Derivation is a basis element for the persistent groups Hl,p. Recall k We use the small filtration in Figure 1 as a running ex- Equation (4). As e 6∈ Bl for all l < j, we know that k ample and compute over Z2, although any field will do. e 6∈ Bl+p for l + p < j. Along with l ≥ i and p ≥ 0, k The persistence module corresponds to a Z2[t]-module the three inequalities define a triangular region in the by the correspondence established in Theorem 2.1. Ta- index-persistence plane, as drawn in Figure 5. The re- ble 1 reviews the degrees of the simplices of our filtra- gion gives us the values for which the k-cycle e is a basis tion as homogeneous elements of this module. element for Hl,p. In other words, we have just shown a k Throughout this section we use {ej} and {eˆi} to rep- direct proof of the k-triangle Lemma in [11], which we resent homogeneous bases for Ck and Ck−1, respec- restate here in a different form. tively. Relative to homogeneous bases, any represen- tation Mk of ∂k has the following basic property: Lemma 3.1 Let T be the set of triangles defined by P- intervals for the k-dimensional persistence module. The deg eˆi + deg Mk(i, j) = deg ej, (6)

8 non-pivot columns form the desired basis for Zk. In our a b c d ab bc cd ad ac abc acd example, we have 0 0 1 1 1 1 2 2 3 4 5  cd bc ab z1 z2  Table 1. Degree of simplices of filtration in Figure 1  d t 0 0 0 0  ˜   M1 =  c t 1 0 0 0  , (8)    b 0 t t 0 0  where Mk(i, j) denotes the element at location (i, j). We get a 0 0 t 0 0 2  ab bc cd ad ac  where z1 = ad − cd − t · bc − t · ab, and z2 = ac − t · 2  d 0 0 t t 0  bc − t · ab form a homogeneous basis for Z1.  2  M1 =  c 0 1 t 0 t  , (7) The procedure that arrives at the echelon form is    b t t 0 0 0  Gaussian elimination on the columns, utilizing elemen- a t 0 0 t2 t3 tary column operations of types (1, 3) only. Starting with the left-most column, we eliminate non-zero en- for ∂1 in our example. The reader may verify Equa- tries occurring in pivot rows in order of increasing row. tion (6) using this example for intuition, e.g. M1(4, 4) = To eliminate an entry, we use an elementary column op- 2 t as deg ad − deg a = 2 − 0 = 2, according to Table 1. eration of type (3) that maintains the homogeneity of Clearly, the standard bases for chain groups are ho- the basis and matrix elements. We continue until we ei- mogeneous. We need to represent ∂k : Ck → Ck−1 rel- ther arrive at a zero column, or we find a new pivot. If ative to the standard basis for Ck and a homogeneous needed, we then perform a column exchange (type (1)) basis for Zk−1. We then reduce the matrix and read to reorder the columns appropriately. off the description of Hk according to our discussion in Section 2.5. We compute these representations in- Lemma 4.1 (Echelon Form) The pivots in column- ductively in dimension. The base case is trivial. As echelon form are the same as the diagonal elements in ∂0 ≡ 0, Z0 = C0 and the standard basis may be used normal form. Moreover, the degree of the basis elements for representing ∂1. Now, assume we have a matrix on pivot rows is the same in both forms. representation Mk of ∂k relative to the standard basis {ej} for Ck and a homogeneous basis {eˆi} for Zk−1. Proof: Because of our sort, the degree of row basis el- For induction, we need to compute a homogeneous ba- ements eˆi is monotonically decreasing from the top row sis for Zk and represent ∂k+1 relative to Ck+1 and the down. Within each fixed column j, deg ej is a constant computed basis. We begin by sorting basis eˆi in re- c. By Equation (6), deg Mk(i, j) = c − deg eˆi. There- verse degree order, as already done in the matrix in fore, the degree of the elements in each column is mono- Equation (7). We next transform Mk into the column- tonically increasing with row. We may eliminate non- echelon form M˜ k, a lower staircase form shown in Fig- zero elements below pivots using row operations that do ure 6 [17]. The steps have variable height, all landings not change the pivot elements or the degrees of the row have width equal to one, and non-zero elements may basis elements. We then place the matrix in diagonal only occur beneath the staircase. A boxed value in the form with row and column swaps. 

 ∗ 0 0  The lemma states that if we are only interested in the de-  ∗ 0 ···  gree of the basis elements, we may read them off from  .  the echelon form directly. That is, we may use the fol-  ∗ ∗ 0 .    lowing corollary of the standard structure theorem to ob-  ∗ ∗ 0 ···    tain the description. ∗ ∗ 0 ··· 0 Corollary 4.1 Let M˜ be the column-echelon form for Figure 6. The column-echelon form. An ∗ indicates a non-zero k value and pivots are boxed. ∂k relative to bases {ej} and {eˆi} for Ck and Zk−1, re- n spectively. If row i has pivot M˜ k(i, j) = t , it con- deg eˆi n figure is a pivot and a row (column) with a pivot is called tributes Σ F [t]/t to the description of Hk−1. Oth- a pivot row (column). From linear algebra, we know erwise, it contributes Σdeg eˆi F [t]. Equivalently, we get that rank Mk = rank Bk−1 is the number of pivots in (deg eˆi, deg eˆi + n) and (deg eˆi, ∞), respectively, as P- an echelon form. The basis elements corresponding to intervals for Hk−1.

9 In our example, M˜ 1(1, 1) = t in Equation (8). As for ∂k+1 in terms of the basis for Zk. This completes 1 deg d = 1, the element contributes Σ Z2[t]/(t) or P- the induction. In our example, the standard matrix rep- interval (1,2) to the description of H0. resentation for ∂2 is  abc acd  j i  ac t t2    j  ad 0 t3  M2 =   .  cd 0 t3   3  = 0  bc t 0  3 Mk i ab t 0

Mk+1 To get a representation in terms of C2 and the basis

m k−1 x m k m k x m k+1 (z1, z2) for Z1 we computed earlier, we simply elimi- nate the bottom three rows. These rows are associated Figure 7. As ∂k∂k+1 = 0, MkMk+1 = 0 and this is with pivots in M˜ 1, according to Equation (8). We get unchanged by elementary operations. When Mk is reduced to echelon form M˜ by column operations, the corresponding k  abc acd  row operations zero out rows in Mk+1 that correspond to pivot ˇ 2 columns in M˜ k. M2 =  z2 t t  , 3 z1 0 t We now wish to represent ∂k+1 in terms of the basis where we have also replaced ad and ac with the corre- we computed for Zk. We begin with the standard ma- sponding basis elements z1 = ad − bc − cd − ab and trix representation Mk+1 of ∂k+1. As ∂k∂k+1 = 0, z2 = ac − bc − ab. MkMk+1 = 0, as shown in Figure 7. Furthermore, this relationship is unchanged by elementary operations. Since the domain of ∂k is the codomain of ∂k+1, the el- 4.2 Algorithm ementary column operations we used to transform Mk Our discussion gives us an algorithm for computing P- into echelon form M˜ k give corresponding row opera- intervals of an F [t]-module over field F . It turns out, tions on Mk+1. These row operations zero out rows in however, that we can simulate the algorithm over the Mk+1 that correspond to non-zero pivot columns in M˜ k, field itself, without the need for computing the F [t]- and give a representation of ∂k+1 relative to the basis we module. Rather, we use two significant observations just computed for Zk. This is precisely what we are af- ter. We can get it, however, with hardly any work. from the derivation of the algorithm. First, Lemma 4.1 guarantees that if we eliminate pivots in the order of de- creasing degree, we may read off the entire description Lemma 4.2 (Basis Change) To represent ∂k+1 relative from the echelon form and do not need to reduce to nor- to the standard basis for Ck+1 and the basis computed mal form. Second, Lemma 4.2 tells us that by simply for Zk, simply delete rows in Mk+1 that correspond to noting the pivot columns in each dimension and elimi- pivot columns in M˜ k. nating the corresponding rows in the next dimension, we Proof: We only used elementary column operations of get the required basis change. types (1,3) in our variant of Gaussian elimination. Only Therefore, we only need column operations through- the latter changes values in the matrix. Suppose we re- out our procedure and there is no need for a matrix rep- place column i by (column i) + q(column j) in order to resentation. We represent the boundary operators as a eliminate an element in a pivot row j, as shown in Fig- set of boundary chains corresponding to the columns ure 7. This operation amounts to replacing column basis of the matrix. Within this representation, column ex- element ei by ei+qej in Mk. To effect the same replace- changes (type (1)) have no meaning, and the only opera- ment in the row basis for ∂k+1, we need to replace row tion we need is of type (3). Our data structure is an array j with (row j) − q(row i). However, row j is eventu- T with a slot for each simplex in the filtration, as shown ally zeroed-out, as shown in Figure 7, and row i is never in Figure 8 for our example. Each simplex gets a slot in changed by any such operation.  the table. For indexing, we need a full ordering of the simplices, so we complete the partial order defined by Therefore, we have no need for row operations. We the degree of a simplex by sorting simplices according simply eliminate rows corresponding to pivot columns to dimension, breaking all remaining ties arbitrarily (we one dimension lower to get the desired representation did this implicitly in the matrix representation.) We also

10 a b c d ab bc cd ad ac abc acd EMOVE IVOT OWS 0 1 2 3 4 5 6 7 8 9 10 chain R P R (σ) { 4 5 6 10 9 k = dim σ; d = ∂kσ; Remove unmarked terms in d; while (d 6= ∅) { b c d ad ac i = maxindex d; −a −b −c if T [i] is empty, break; i Figure 8. Data structure after running the algorithm on the fil- Let q be the coefficient of σ in T [i]; tration in Figure 1. Marked simplices are in bold italic. d = d − q−1T [i]; } need the ability to mark simplices to indicate non-pivot return d; columns. } Rather than computing homology in each dimension independently, we compute homology in all dimen- Figure 10. Algorithm REMOVEPIVOTROWS first eliminates rows sions incrementally and concurrently. The algorithm, as not marked (not corresponding to the basis for Zk−1), and then eliminates terms in pivot rows. shown in Figure 9, stores the list of P-intervals for Hk in Lk. tinue until we exhaust the filtration. We then perform COMPUTEINTERVALS (K) { another pass through the filtration in search of infinite for k = 0 to dim(K) Lk = ∅; P-intervals: marked simplices whose slot is empty. for j = 0 to m − 1 { We give the function REMOVEPIVOTROWS in Fig- j d = REMOVEPIVOTROWS (σ ); ure 10. Initially, the function computes the boundary j if (d = ∅) Mark σ ; chain d for the simplex. It then applies Lemma 4.2, else { eliminating all terms involving unmarked simplices to i = d k = dim σi maxindex ; ; get a representation in terms of the basis for Zk−1. The Store j and d in T [i]; rest of the procedure is Gaussian elimination in the order i j Lk = Lk ∪ {(deg σ , deg σ )} of decreasing degree, as dictated by our discussion for } the F [t]-module. The term with the maximum index i = } max d is a potential pivot. If T [i] is non-empty, a pivot for j = 0 to m − 1 { already exists in that row, and we use the inverse of its j if σ is marked and T [j] is empty { coefficient to eliminate the row from our chain. Other- j j k = dim σ ; Lk = Lk ∪ {(deg σ , ∞)} wise, we have found a pivot and our chain is a pivot col- } umn. For our example filtration in Figure 8, the marked } 0-simplices {a, b, c, d} and 1-simplices {ad, ac} gener- } ate P-intervals L0 = {(0, ∞), (0, 1), (1, 1), (1, 2)} and L1 = {(2, 5), (3, 4)}, respectively.

Figure 9. Algorithm COMPUTEINTERVALS processes a complex of m simplices. It stores the sets of P-intervals in dimension k 4.3 Discussion in Lk. From our derivation, it is clear that the algorithm has the j When simplex σ is added, we check via proce- same running time as Gaussian elimination over fields. dure REMOVEPIVOTROWS to see whether its bound- That is, it takes O(m3) in the worst case, where m is the ary chain d corresponds to a zero or pivot column. If number of simplices in the filtration. The algorithm is the chain is empty, it corresponds to a zero column and very simple, however, and represents the matrices effi- j we mark σ : its column is a basis element for Zk, and ciently. In our preliminary experiments, we have seen a the corresponding row should not be eliminated in the linear time behavior for the algorithm. next dimension. Otherwise, the chain corresponds to a pivot column and the term with the maximum index i = maxindex d is the pivot, according the procedure de- 5 Algorithm for PIDs scribed for the F [t]-module. We store index j and chain d representing the column in T [i]. Applying Corol- The correspondence we established in Section 3 elimi- lary 4.1, we get P-interval (deg σi, deg σj). We con- nates any hope for a simple classification of persistent

11 groups over rings that are not fields. Nevertheless, we 6 Experiments may still be interested in their computation. In this sec- tion, we give an algorithm to compute the persistent ho- In this section, we discuss experiments using an im- i,p plementation of the persistence algorithm for arbitrary mology groups Hk of a filtered complex K for a fixed i and p. The algorithm we provide computes persistent fields. Our aim is to further elucidate the contributions homology over any PID D of coefficients by utilizing a of this paper. We look at two scenarios where the previ- reduction algorithm over that ring. ous algorithm would not be applicable, but where our al- gorithm succeeds in providing information about a topo- To compute the persistent group, we need to obtain logical space. a description of the numerator and denominator of the quotient group in Equation (4). We already know how to characterize the numerator. We simply reduce the stan- 6.1 Implementation dard matrix representation M i of ∂i using the reduc- k k We have implemented our field algorithm for Zp for p a tion algorithm. The denominator, Bi,p = Bi+p ∩ Zi , k k k prime, and Q coefficients. Our implementation is in C plays the role of the boundary group in Equation (4). and utilizes GNU MP, a multiprecision library, for ex- i Therefore, instead of reducing matrix Mk+1, we need act computation [14]. We have a separate implementa- i,p to reduce an alternate matrix Mk+1 that describes this tion for coefficients in Z2 as the computation is greatly boundary group. We obtain this matrix as follows: simplified in this field. The coefficients are either 0, or 1, so there is no need for orienting simplices or main- i (1) We reduce matrix Mk to its normal form and obtain taining coefficients. A k-chain is simply a list of sim- j i plices, those with coefficient 1. Each simplex is its a basis {z } for Zk, using fact (ii) in Section 2.5. We may merge this computation with that of the own inverse, reducing the group operation to the sym- numerator. metric difference, where the sum of two k-chains c, d is c + d = (c ∪ d) − (c ∩ d). We use a 2.2 GHz Pentium i+p 4 Dell PC with 1 GB RAM running Red Hat Linux 7.3 (2) We reduce matrix Mk+1 to its normal form and ob- l i+p for computing the timings. tain a basis {b } for Bk using fact (iii) in Sec- tion 2.5. 6.2 Data N = [{bl}{zj}] = [BZ] (3) Let , that is, the columns Our algorithm requires a persistence complex as input. N of matrix consist of the basis elements from the In the introduction, we discussed how persistence com- B Z bases we just computed, and and are the re- plexes arise naturally in practice. In Example 1.3, we spective submatrices defined by the bases. We next discussed generating persistence complexes using ex- N {uq} reduce to normal form to find a basis for cursion sets of Morse functions over manifolds. We its null-space. As before, we obtain this basis us- have implemented a general framework for computing uq = [αq ζq] αq, ζq ing fact (ii). Each , where are complexes of this type. We must emphasize, however, {bl}, {zj} vectors of coefficients of , respectively. that our persistence software processes persistence com- Nuq = Bαq +Zζq = 0 Note that by definition. In plexes of any origin. other words, element Bαq = −Zζq is belongs to Our framework takes a tuple (K, f) as input and pro- the span of both bases. Therefore, both {Bαq} and duces a persistence complex C(K, f) as output. K is {Zζq} are bases for Bi,p = Bi+p ∩ Zi . We form a k k k a d-dimensional simplicial complex that triangulates an matrix M i,p from either. k+1 underlying manifold, and f : vert K → R is a discrete function over the vertices of K that we extend linearly i,p We now reduce Mk+1 to normal form and read off the over the remaining simplices of K. The function f acts i,p as the Morse function over the manifold, but need not torsion coefficients and the rank of Bk . It is clear from the procedure that we are computing the persistent be Morse for our purposes. Frequently, our complex is d groups correctly, giving us the following. augmented with a map ϕ : K → R that immerses or embeds the manifold in Euclidean space. Our algo- rithm does not require ϕ for computation, but ϕ is of- Theorem 5.1 For coefficients in any PID, persistent ho- ten provided as a discrete map over the vertices of K mology groups are computable in the order of time and and is extended linearly as before. For example, Fig- space of computing homology groups. ure 11 displays a triangulated Klein bottle, immersed

12 s k number k of -simplices χ 0 1 2 3 4 K 2,000 6,000 4,000 0 0 0 E 3,095 52,285 177,067 212,327 84,451 1 J 17,862 297,372 1,010,203 1,217,319 486,627 1

Table 2. Datasets. K is the Klein bottle, shown in Figure 11. E is potential around electrostatic charges. J is supersonic jet flow.

3 in R . For each dataset, Table 2 gives the number F β0 β1 β2 time (s) sk of k-simplices, as well as the Euler characteristic Z2 1 2 1 0.01 P k χ = k(−1) sk. We use the Morse function to com- Z3 1 1 0 0.23 pute the excursion set filtration for each dataset. Table 3 Z5 1 1 0 0.23 gives information on the resulting filtrations. Z3203 1 1 0 0.23 Q 1 1 0 0.50

Table 4. Field coefficients. The Betti numbers of K computed over field F and time for the persistence algorithm. We use a separate implementation for Z2 coefficients.

in Figure 11. The change in triangle orientation at the parametrization boundary leads to a rendering artifact where two sets of triangles are front-facing. In homol- ogy, the non-orientability of the Klein bottle manifests itself as a torsional 1-cycle c where 2c is a boundary (in- deed, it bounds the surface itself.) The homology groups over Z are: H (K) = , Figure 11. A wire-frame visualization of dataset K, an immersed 0 Z triangulated Klein bottle with 4000 triangles. H1(K) = Z × Z2, H2(K) = {0}.

|K| len filt (s) pers (s) Note that β1 = rank H1 = 1. We now use the “height K 12,000 1,020 0.03 < 0.01 function” as our Morse function, f = z, to generate the E 529,225 3,013 3.17 5.00 filtration in Table 3. We then compute the homology of J 3,029,383 256 24.13 50.23 dataset K with field coefficients using our algorithm, as shown in Table 4. Table 3. Filtrations. The number of simplices in the filtration Over Z2, we get β1 = 2 as homology is unable to rec- P |K| = i si, the length of the filtration (number of distinct ognize the torsional boundary 2c with coefficients 0 and values of function f), time to compute the filtration, and time to 1. Instead, it observes an additional class of homology compute persistence over Z2 coefficients. P 1-cycles. By the Euler-Poincare´ relation, χ = i βi, so we also get a class of 2-cycles to compensate for the in- crease in β1 [16]. Therefore, Z2-homology misidentifies 6.3 Field Coefficients the Klein bottle as the torus. Over any other field, how- ever, homology turns the torsional cycle into a boundary, A contribution of this paper is the generalization of the as the inverse of 2 exists. In other words, while we can- persistence algorithm to arbitrary fields. This contribu- not observe torsion in computing homology over fields, tion is important when the manifold under study con- we can deduce its existence by comparing our results tains torsion. To make this clear, we compute the ho- over different coefficient sets. Similarly, we can com- mology of the Klein bottle using the persistence algo- pare sets of P-intervals from different computations to rithm. Here, we are interested only in the Betti numbers discover torsion in a persistence complex. of the final complex in the filtration for illustrative pur- Note that our algorithm’s performance for this dataset poses. The non-orientability of the Klein bottle is visible is about the same over arbitrary finite fields, as the coef-

13 ficients do not get large. The computation over Q takes 90 about twice as much time and space, since each rational 80

is represented as two integers in GNU MP. 70

60

6.4 Higher Dimensions 50 f 2 β 40 A second contribution of this paper is the extension of the persistence algorithm from subcomplexes of S3 to 30 complexes in arbitrary dimensions. We have already uti- 20

lized this capability in computing the homology of the 10 Klein bottle. We now examine the performance of this 0 0 50 100 150 200 250 algorithm in higher dimensions. For practical motiva- f

tion, we use large-scale time-varying volume data as in- f Figure 12. Graph of β2 for dataset J, where f is the flow ve- put. Advances in data acquisition systems and comput- locity. ing technologies have resulted in the generation of mas- sive sets of measured or simulated data. The datasets usually contain the time evolution of physical variables, requires at least 16 bytes for representing any integer. such as temperature, pressure, or flow velocity at sample points in space. The goal is to identify and localize sig- nificant phenomena within the data. We propose using 7 Conclusion persistence as the significance measure. We believe the most important contribution of this pa- The underlying space for our datasets is the four- per is a reinterpretation of persistent homology within dimensional space-time manifold. For each dataset, we the classical framework of . Our in- triangulate the convex hull of the samples to get a trian- terpretation allows us to: gulation. Each complex listed in Table 2 is homeomor- phic to a four-dimensional ball and has χ = 1. Dataset 1. establish a correspondence that fully describes the E contains the potential around electrostatic charges at structure of persistent homology over any field, not each vertex. Dataset J records the supersonic flow ve- only over Z2, as in the previous result, locity of a jet engine. We use these values as Morse functions to generate the filtrations. We then compute 2. and relate the previous algorithm to the classic persistence over Z2 coefficients to get the Betti numbers. reduction algorithm, thereby extending it to arbi- We give filtration sizes and timings in Table 3. Figure 12 trary fields and arbitrary dimensional complexes, displays β2 for dataset J. We observe a large number of not just subcomplexes of S3 as in the previous re- two-dimensional cycles (voids), as the co-dimension is sult. 2. Persistence allows us to decompose this graph into the set of P-intervals. Although there are 730,692 P- We provide implementations of our algorithm for fields, intervals in dimension 2, most are empty as the topolog- and show that they perform quite well for large datasets. ical attribute is created and destroyed at the same func- Finally, we give an algorithm for computing a persis- tion level. We draw the 502 non-empty P-intervals in tent homology group with fixed parameters over arbi- Figure 13. We note that the P-intervals represent a com- trary PIDs. pact and general shape descriptor for arbitrary spaces. Our software for n-dimensional complexes enables us to analyze arbitrary-dimensional point cloud data and For the large data sets, we do not compute persistence their derived spaces. One current project uses this im- over alternate fields as the computation requires in ex- plementation for feature recognition using a novel alge- cess of 2 gigabytes of memory. In the case of finite fields braic method [2]. Another project analyzes the topo- Zp, we may restrict the prime p so that the computation logical structures in a high-dimensional data set derived fits within an integer. This is a reasonable restriction, from natural images [7]. Yet another applies persistence as on most modern machines with 32-bit integers, it im- to derived spaces to arrive at compact shape descriptors plies p < 216 − 1. Given this restriction, any coefficient for geometric objects [3, 5]. Future theoretical work in- will be less than p and representable as a 4-byte integer. clude examining invariants for persistent homology over The GNU MP exact integer format, on the other hand, non-fields and defining multivariate persistence, where

14 the Euler characteristic of semi-algebraic sets. Discrete Comput. Geom. 22 (1999), 1–18. [2] CARLSSON,E.,CARLSSON,G., ANDDE SILVA, V. An algebraic topological method for feature identification, 2003. Manuscript. [3] CARLSSON,G.,ZOMORODIAN,A.,COLLINS,A., AND GUIBAS, L. Persistence barcodes for shapes. In Proc. Symp. Geom. Process. (2004), pp. 127–138. [4] CGAL. Computational geometry algorithms library. http://www.cgal.org. [5] COLLINS,A.,ZOMORODIAN,A.,CARLSSON,G., AND GUIBAS, L. A barcode shape descriptor for curve point cloud data. In Proc. Symp. Point-Based Graph. (2004), pp. 181–191. [6] COX,D.,LITTLE,J., AND O’SHEA,D. Ideals, Vari- eties, and Algorithms, second ed. Springer-Verlag, New York, NY, 1991. [7] DE SILVA, V., AND CARLSSON, G. Topological esti- mation using witness complexes. In Proc. Symp. Point- Based Graph. (2004), pp. 157–166. [8] DONALD,B.R., AND CHANG, D. R. On the complex- ity of computing the homology type of a triangulation. In Proc. 32nd Ann. IEEE Sympos. Found Comput. Sci. Figure 13. The 502 non-empty P-intervals for dataset J in (1991), pp. 650–661. dimension 2. The amalgamation of these intervals gives the graph in Figure 12. [9] DUMAS,J.-G.,HECKENBACH, F., SAUNDERS,B.D., AND WELKER, V. Computing simplicial homology based on efficient Smith normal form algorithms. In Al- there is more than one persistence dimension. An ex- gebra, Geometry, and Software Systems (2003), pp. 177– ample would be tracking a Morse function as well as 207. density of sampling on a manifold. Finally, we have [10] DUMMIT,D., AND FOOTE,R. Abstract Algebra. John recently reimplemented the algorithm using the generic Wiley & Sons, Inc., New York, NY, 1999. paradigm. This implementation will soon be a part of [11] EDELSBRUNNER,H.,LETSCHER,D., AND ZOMORO- the CGAL library [4]. DIAN, A. Topological persistence and simplification. Discrete Comput. Geom. 28 (2002), 511–533. [12] EISENBUD,D. Commutative Algebra with a View To- Acknowledgments ward Algebraic Theory. Springer, New York, NY, 1995. The first author thanks Herbert Edelsbrunner and John [13] FRIEDMAN, J. Computing Betti numbers via combina- Harer for discussions on Z-homology, and Leo Guibas torial Laplacians. In Proc. 28th Ann. ACM Sympos. The- for providing support and encouragement. Both au- ory Comput. (1996), pp. 386–391. thors thank Anne Collins for her thorough review of the [14] GRANLUND, T. The GNU multiple precision arithmetic manuscript and Ajith Mascarenhas for providing sim- library. http://www.swox.com/gmp. plicial complexes for datasets E and J. The original [15] GROMOV, M. Hyperbolic groups. In Essays in Group data are part of the Advanced Visualization Technology Theory, S. Gersten, Ed. Springer Verlag, New York, NY, Center’s data repository and appears courtesy of Andrea 1987, pp. 75–263. Malagoli and Milena Micono of the Laboratory for As- [16] MUNKRES,J.R. Elements of Algebraic Topology. trophysics and Space Research (LASR) at the University Addison-Wesley, Reading, MA, 1984. of Chicago. Figure 11 was rendered in Stanford Graph- [17] UHLIG, F. Transform Linear Algebra. Prentice Hall, ics Lab’s Scanalyze. Upper Saddle River, NJ, 2002. [18] ZOMORODIAN,A. Computing and Comprehending Topology: Persistence and Hierarchical Morse Com- References plexes. PhD thesis, University of Illinois at Urbana- Champaign, 2001. [1] BASU, S. On bounding the Betti numbers and computing

15