Graph Filtration Learning
Christoph D. Hofer 1 Florian Graf 1 Bastian Rieck 2 Marc Niethammer 3 Roland Kwitt 1
Abstract

We propose an approach to learning with graph-structured data in the problem domain of graph classification. In particular, we present a novel type of readout operation to aggregate node features into a graph-level representation. To this end, we leverage persistent homology computed via a real-valued, learnable, filter function. We establish the theoretical foundation for differentiating through the persistent homology computation. Empirically, we show that this type of readout operation compares favorably to previous techniques, especially when the graph connectivity structure is informative for the learning problem.

Figure 1. Overview of the proposed homological readout. Given a graph/simplicial complex, we use a vertex-based graph functional f to assign a real-valued score to each node. A practical choice is to implement f as a GNN with one level of message passing. We then compute persistence barcodes, B_k, using the filtration induced by f. Finally, barcodes are fed through a vectorization scheme V and passed to a classifier (e.g., an MLP). Our approach allows passing a learning signal through the persistent homology computation, allowing f to be optimized for the classification task.

1. Introduction

We consider the task of learning a function from the space of (finite) undirected graphs, G, to a (discrete/continuous) target domain Y. Additionally, graphs might have discrete or continuous attributes attached to each node. Prominent examples of this class of learning problem appear in the context of classifying molecule structures, chemical compounds, or social networks.

A substantial amount of research has been devoted to developing techniques for supervised learning with graph-structured data, ranging from kernel-based methods (Shervashidze et al., 2009; 2011; Feragen et al., 2013; Kriege et al., 2016) to more recent approaches based on graph neural networks (GNN) (Scarselli et al., 2009; Hamilton et al., 2017; Zhang et al., 2018b; Morris et al., 2019; Xu et al., 2019; Ying et al., 2018). Most of the latter works use an iterative message passing scheme (Gilmer et al., 2017) to learn node representations, followed by a graph-level pooling operation that aggregates node-level features. This aggregation step is typically referred to as a readout operation. While research has mostly focused on variants of the message passing function, the readout step may have a significant impact, as it aims to capture properties of the entire graph. Importantly, both simple and more refined readout operations, such as summation, differentiable pooling (Ying et al., 2018), or sort pooling (Zhang et al., 2018a), are inherently coupled to the amount of information carried over via multiple rounds of message passing. Hence, architectural GNN choices are typically guided by dataset characteristics, e.g., requiring the number of message passing rounds to be tuned to the expected size of graphs.

Contribution. We propose a homological readout operation that captures the full global structure of a graph, while relying only on node representations learned from immediate neighbors. This not only alleviates the aforementioned design challenge, but potentially offers additional discriminative information. Similar to previous works, we consider a graph, G, as a simplicial complex, K, and use persistent homology (Edelsbrunner & Harer, 2010) to capture homological changes that occur when constructing the graph one

1Department of Computer Science, University of Salzburg, Austria. 2Department of Biosystems Science and Engineering, ETH Zurich, Switzerland. 3UNC Chapel Hill. Correspondence to: Christoph D. Hofer

arXiv:1905.10996v3 [cs.LG] 17 May 2021
vector space generated by K_k over Z/2Z¹ and define

∂_k : K_k → C_{k−1}(K) ,  σ ↦ Σ_{τ ⊂ σ, dim(τ) = k−1} τ .

In other words, σ is mapped to the formal sum of its (k−1)-dim. faces, i.e., its subsets of cardinality k. The linear extension of this mapping to C_k(K) defines the k-th boundary operator

∂_k : C_k(K) → C_{k−1}(K) ,  Σ_{i=1}^n σ_i ↦ Σ_{i=1}^n ∂_k(σ_i) .

Using ∂_k, we obtain the corresponding k-th homology group, H_k, as in Eq. (1).

Graphs are simplicial complexes. In fact, we can directly interpret a graph G as a 1-dimensional simplicial complex whose vertices (0-simplices) are the graph's nodes and whose edges (1-simplices) are the graph's edges. In this setting, only the 1st boundary operator is non-trivial; it simply maps an edge to the formal sum of its defining nodes. Via homology, we can then compute the graph's 0- and 1-dimensional topological features, i.e., connected components and cycles. However, as mentioned earlier, this only gives a rather coarse summary of the graph's topological properties, raising the demand for a refinement of homology.

Persistent homology. For consistency with the relevant literature, we introduce the idea of refining homology to persistent homology in terms of simplicial complexes, but recall that graphs are 1-dimensional simplicial complexes. Now, let (K^i)_{i=0}^m be a sequence of simplicial complexes such that ∅ = K^0 ⊆ K^1 ⊆ ··· ⊆ K^m = K. Then, (K^i)_{i=0}^m is called a filtration of K. Using the extra information provided by the filtration of K, we obtain a sequence of chain complexes, where C_k^i = C_k(K^i) and ι denotes the inclusion. This leads to the concept of persistent homology groups, i.e.,

H_k^{i,j} = ker ∂_k^i / (im ∂_{k+1}^j ∩ ker ∂_k^i)  for 1 ≤ i ≤ j ≤ m .

The ranks, β_k^{i,j} = rank H_k^{i,j}, of these homology groups (i.e., the k-th persistent Betti numbers) capture the number of homological features of dimensionality k (e.g., connected components for k = 0, loops for k = 1, etc.) that persist from i to (at least) j. According to the Fundamental Lemma of Persistent Homology (Edelsbrunner & Harer, 2010), the quantities

µ_k^{i,j} = (β_k^{i,j−1} − β_k^{i,j}) − (β_k^{i−1,j−1} − β_k^{i−1,j})   (2)

for 1 ≤ i < j ≤ m encode all the information about the persistent Betti numbers of dimension k.

Persistence barcodes. In principle, introducing persistence barcodes only requires a filtration as defined above. In our setting, we define a filtration of K via a vertex filter function f : V → R. In particular, given the sorted sequence of filter values a_1 < ··· < a_m, a_i ∈ {f(v) : {v} ∈ K_0}, we define the filtration w.r.t. f as

K^{f,0} = ∅ ,  K^{f,i} = {σ ∈ K : max_{v∈σ} f(v) ≤ a_i}   (3)

for 1 ≤ i ≤ m. Intuitively, this means that the filter function is defined on the vertices of K and lifted to higher-dimensional simplices in K by maximum aggregation. This enables us to consider a sublevel set filtration of K.

Then, for a given filtration of K and 0 ≤ k ≤ dim(K), we can construct a multiset by inserting the point (a_i, a_j), 1 ≤ i < j ≤ m, with multiplicity µ_k^{i,j}. This effectively encodes the k-dimensional persistent homology of K w.r.t. the given filtration. This representation is called a persistence barcode, B_k. Hence, for a given complex K of dimension dim(K) and a filter function f (of the discussed form), we can interpret k-dimensional persistent homology as a mapping of simplicial complexes, defined by

ph_k^f(K) = B_k ,  0 ≤ k ≤ dim(K) .   (4)

Remark 1. By setting

µ_k^{i,∞} = β_k^{i,m} − β_k^{i−1,m} ,

we extend Eq. (2) to features which never disappear, also referred to as essential. If we use this extension, Eq. (4) yields an additional barcode, denoted as B_k^∞, per dimension. For practical reasons, the points in B_k^∞ are just the birth-times, as all death-times equal ∞ and are thus omitted.

Persistent homology on graphs. In the setting of graph classification, we can think of a filtration as a growth process of a weighted graph. As the graph grows, we track homological changes and gather this information in its 0- and 1-dimensional persistence barcodes. An illustration of 0-dimensional persistent homology for a toy graph example is shown in Fig. 2.

4. Filtration learning

Having introduced the required terminology, we now focus on the main contribution of this paper, i.e., how to learn an appropriate filter function. To the best of our knowledge, our paper constitutes the first approach to this problem. While different filter functions for graphs have been proposed and used in the past, e.g., based on degrees (Hofer et al., 2017),

¹Simplicial homology is not specific to Z/2Z, but it is a typical choice, since it allows us to interpret k-chains as sets of k-simplices.
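The sublevel set filtration of Eq. (3) and the 0-dimensional barcode of Eq. (4) can be made concrete for graphs with a short, self-contained sketch. This is not the authors' implementation; it is a minimal union-find computation (the standard elder rule) of B_0, plus the essential birth times of Remark 1, for a toy graph with pairwise distinct vertex filter values; zero-persistence (diagonal) points are omitted.

```python
def zero_dim_persistence(filter_values, edges):
    """0-dimensional sublevel-set persistence of a graph under a vertex
    filter: edges enter at the max of their endpoint values (Eq. (3)),
    components merge by the elder rule. Returns the finite barcode B_0
    (diagonal points omitted) and the essential birth times.
    filter_values: dict vertex -> filter value (assumed pairwise distinct)
    edges: iterable of vertex pairs (u, v)"""
    parent = {v: v for v in filter_values}
    birth = dict(filter_values)  # birth value of each component's root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = []
    for u, v in sorted(edges, key=lambda e: max(filter_values[e[0]],
                                                filter_values[e[1]])):
        t = max(filter_values[u], filter_values[v])  # edge's filter value
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a cycle: a 1-dimensional feature
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru
        if birth[rv] < t:  # elder rule: the younger component dies at t
            pairs.append((birth[rv], t))
        parent[rv] = ru
    essential = sorted({birth[find(v)] for v in filter_values})
    return sorted(pairs), essential
```

For `filter_values = {'a': 1, 'b': 2, 'c': 3, 'd': 4}` and edges (a,c), (b,c), (a,d), (b,d), the component born at value 2 dies when the two branches meet at value 3, giving B_0 = {(2, 3)}; the oldest component (born at 1) is essential, and the last edge closes a cycle that would instead contribute to B_1.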
Figure 2. Illustration of 0-dimensional persistent homology on a graph G, via a node-based filter function f. The barcode (shown right), computed by persistent homology, is B_0 = {(1, 4), (2, 3)}. If we consider essential topological features, we further get B_0^∞ = {1}.

cliques (Petri et al., 2013), multiset distances (Rieck et al., 2019a), or the Jaccard distance (Zhao & Wang, 2019), prior work typically treats the filter function as fixed. In this setting, the filter function needs to be selected via, e.g., cross-validation, which is cumbersome. Instead, we show that it is possible to learn a filter function end-to-end.

This endeavor hinges on the capability of backpropagating gradients through the persistent homology computation. While this has previously been done in (Chen et al., 2019) or (Hofer et al., 2019b), their results are not directly applicable here, due to the special problem settings considered in these works. In (Chen et al., 2019), the authors regularize decision boundaries of classifiers; in (Hofer et al., 2019b), the authors impose certain topological properties on representations learned by an autoencoder. This warrants a detailed analysis in the context of graphs.

We start by recalling that the computation of sublevel set persistent homology, cf. Eq. (3), depends on two arguments: (1) the complex K and (2) the filter function f, which determines the order of the simplices in the filtration of K. As K is of discrete nature, it is clear that K cannot be subject to gradient-based optimization. However, assume that the filter f has a differentiable dependence on a real-valued parameter θ. In this case, the persistent homology of the sublevel set filtration of K also depends on θ, raising the following question: Is the mapping differentiable in θ?

Notation. If any symbol introduced in the context of persistent homology depends on θ, we interpret it as a function of θ and attach "(θ)" to this symbol. If the dependence on the simplicial complex K is irrelevant to the current context, we omit it for brevity. For example, we write µ_k^{i,j}(θ) for the multiplicity of barcode points.

Next, we concretize the idea of a learnable vertex filter.

Definition 1 (Learnable vertex filter function). Let V be a vertex domain, K the set of possible simplicial complexes over V, and let

f : R × K × V → R ,  (θ, K, v) ↦ f(θ, K, v)

be differentiable in θ for K ∈ K, v ∈ V. Then, we call f a learnable vertex filter function with parameter θ.

The assumption that f depends on a single parameter is solely dedicated to simplifying the following theoretical part. The derived results immediately generalize to the case f : R^n × K × V → R, where n is the number of parameters; we will drop this assumption later on.

By Eq. (4), (θ, K) ↦ ph_k^f(θ, K) is a mapping to the space of persistence barcodes B (or B² if essential barcodes are included). As B has no natural linear structure, we consider differentiability in combination with a coordinatization strategy V : B → R. This allows studying the map

(θ, K) ↦ V(ph_k^f(θ, K)) ,

for which differentiability is defined. In fact, we can rely on prior works on learnable vectorization schemes for persistence barcodes (Hofer et al., 2019a; Carrière et al., 2019). In the following definition of a barcode coordinate function, we adhere to (Hofer et al., 2019b).

Definition 2 (Barcode coordinate function). Let s : R² → R be a differentiable function that vanishes on the diagonal of R². Then

V : B → R ,  B ↦ Σ_{(b,d)∈B} s(b, d)

is called a barcode coordinate function.

Intuitively, V maps a barcode B to a real value by aggregating the points in B via a weighted sum. Notably, the deep sets approach of (Zaheer et al., 2017) would also be a natural choice; however, it is not specifically tailored to persistence barcodes. Upon using d barcode coordinate functions of the form described in Definition 2, we can effectively map a barcode into R^d and feed this representation through any differentiable layer downstream, e.g., implementing a classifier. We will discuss our particular choice of s in §5.

Next, we show that, under certain conditions, ph_k^f in combination with a suitable barcode coordinate function preserves differentiability. This is the key result that will enable end-to-end learning of a filter function.

Lemma 1. Let K be a finite simplicial complex with vertex set V = {v_1, ..., v_n}, let f : R × K × V → R be a learnable vertex filter function as in Definition 1, and let V be a barcode coordinate function as in Definition 2. If, for θ_0 ∈ R, it holds that the pairwise vertex filter values are distinct, i.e.,

f(θ_0, K, v_i) ≠ f(θ_0, K, v_j)  for 1 ≤ i < j ≤ n ,

then the mapping

face of σ, formally τ ∈ {ρ ⊆ σ : dim ρ = dim σ − 1} ⇒ τ < σ. Specifically, the sublevel set filtration (K^{f,i})_{i=1}^m only yields a partial strict ordering by

σ ∈ K^i, τ ∈ K^j and i < j ⇒ σ < τ ,   (8)
σ, τ ∈ K and dim(σ) < dim(τ) ⇒ σ < τ .   (9)
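Definition 2 admits a tiny concrete instance. The particular s below, the persistence (d − b) times a Gaussian weight in the birth coordinate, is an illustrative assumption on this editor's part and not the paper's §5 choice; the (d − b) factor is what makes s vanish on the diagonal. Using d such functions maps a barcode into R^d.

```python
import math

def make_coordinate_fn(c, sigma):
    """A barcode coordinate function V as in Definition 2, built from
    s(b, d) = (d - b) * exp(-(b - c)**2 / sigma**2).
    The factor (d - b) makes s vanish on the diagonal; c and sigma are
    illustrative parameters, not the paper's particular choice of s."""
    def V(barcode):
        return sum((d - b) * math.exp(-(b - c) ** 2 / sigma ** 2)
                   for b, d in barcode)
    return V

def vectorize(barcode, params):
    """Map a barcode into R^d via d coordinate functions, one per (c, sigma)."""
    return [make_coordinate_fn(c, sigma)(barcode) for c, sigma in params]
```

Since each V is a sum of differentiable terms in the barcode points, and the points' coordinates are vertex filter values, gradients with respect to the filter parameter θ can flow through this vectorization, which is exactly the mechanism Lemma 1 makes precise.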