Spectral Clustering of Signed Graphs via Matrix Power Means

Pedro Mercado 1 2 Francesco Tudisco 3 Matthias Hein 2

1 Saarland University  2 University of Tübingen  3 University of Strathclyde. Correspondence to: Pedro Mercado.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Abstract

Signed graphs encode positive (attractive) and negative (repulsive) relations between nodes. We extend spectral clustering to signed graphs via the one-parameter family of Signed Power Mean Laplacians, defined as the matrix power mean of the normalized standard and signless Laplacians of positive and negative edges. We provide a thorough analysis of the proposed approach in the setting of a general Stochastic Block Model that includes models such as the Labeled Stochastic Block Model and the Censored Block Model. We show that in expectation the signed power mean Laplacian captures the ground truth clusters under reasonable settings where state-of-the-art approaches fail. Moreover, we prove that the eigenvalues and eigenvectors of the signed power mean Laplacian concentrate around their expectation under reasonable conditions in the general Stochastic Block Model. Extensive experiments on random graphs and real-world datasets confirm the theoretically predicted behaviour of the signed power mean Laplacian and show that it compares favourably with state-of-the-art methods.

1. Introduction

The analysis of graphs has received a significant amount of attention due to their capability to encode interactions that naturally arise in social networks. Yet, the vast majority of graph methods has focused on the case where interactions are of the same type, leaving aside the case where different kinds of interactions are available (Leskovec et al., 2010b). Graphs and networks with both positive and negative edge weights arise naturally in a number of social, biological and economic contexts. Social dynamics and relationships are intrinsically positive and negative: users of online social networks such as Slashdot and Epinions, for example, can express positive interactions, like friendship and trust, and negative ones, like enmity and distrust. Other important application settings are the analysis of gene expressions in biology (Fujita et al., 2012) or the analysis of financial and economic time sequences (Ziegler et al., 2010; Pavlidis et al., 2006), where commonly used similarity and variable-dependence measures may attain both positive and negative values (e.g., the Pearson correlation coefficient).

Although the majority of the literature has focused on graphs that encode only positive interactions, the analysis of signed graphs can be traced back to social balance theory (Cartwright & Harary, 1956; Harary, 1953; Davis, 1967), where the concept of a k-balanced signed graph is introduced. The analysis of signed networks has since been pushed forward through the study of a variety of tasks on signed graphs, for example edge prediction (Kumar et al., 2016; Leskovec et al., 2010a; Falher et al., 2017), node classification (Bosch et al., 2018; Tang et al., 2016a), node embeddings (Chiang et al., 2011; Derr et al., 2018; Kim et al., 2018; Wang et al., 2017; Yuan et al., 2017), node ranking (Chung et al., 2013; Shahriari & Jalili, 2014), and clustering (Chiang et al., 2012; Kunegis et al., 2010; Mercado et al., 2016; Sedoc et al., 2017; Doreian & Mrvar, 2009; Knyazev, 2018; Kirkley et al., 2018; Cucuringu et al., 2019; Cucuringu et al., 2018). See (Tang et al., 2016b; Gallier, 2016) for recent surveys on the topic.

In this paper we present a novel extension of spectral clustering for signed graphs. Spectral clustering (Luxburg, 2007) is a well-established technique for unsigned graphs, which partitions the set of nodes based on a k-dimensional node embedding obtained from the first eigenvectors of the graph Laplacian. Our contributions are as follows. We introduce the family of Signed Power Mean (SPM) Laplacians: a one-parameter family of graph matrices for signed graphs that blends the information from positive and negative interactions through the matrix power mean, a general class of matrix means that contains the arithmetic, geometric and harmonic means as special cases. This is inspired by recent extensions of spectral clustering which merge the information of different types of interactions via the arithmetic (Chiang et al., 2012; Kunegis et al., 2010) and geometric (Mercado et al., 2016) means of the standard and signless graph Laplacians. We analyze the performance of the signed power mean Laplacian in a general Signed Stochastic Block Model. We first provide an analysis in expectation showing that the smaller the parameter of the signed power mean Laplacian, the less restrictive are the conditions that ensure recovery of the ground truth clusters. In particular, we show that the limit cases +∞ and −∞ are related to the boolean operators AND and OR, respectively, in the sense that for the limit case +∞ clusters are recovered only if both positive and negative interactions are informative, whereas for −∞ clusters are recovered if positive or negative interactions are informative. This is consistent with related work in the context of unsigned multilayer graphs (Mercado et al., 2018). Second, we show that the eigenvalues and eigenvectors of the signed power mean Laplacian concentrate around their mean, so that our results hold also for the case where one samples from the stochastic block model. Our result extends, with changes, to the unsigned multilayer graph setting considered in (Mercado et al., 2018), where just the expected case has been studied. To our knowledge these are the first concentration results for matrix power means under any stochastic block model for signed graphs. Finally, we show that the signed power mean Laplacian compares favorably with state-of-the-art approaches through extensive numerical experiments on diverse real-world datasets. All proofs have been moved to the supplementary material.

Notation. A signed graph is a pair G± = (G+, G−), where G+ = (V, W+) and G− = (V, W−) encode positive and negative edges, respectively, with positive symmetric adjacency matrices W+ and W−, and a common vertex set V = {v_1, ..., v_n}. Note that this definition allows the simultaneous presence of both positive and negative interactions between the same two nodes. This is a major difference with respect to the alternative point of view where G± is associated to a single weight matrix W with positive and negative entries. In this case W = W+ − W−, with W+_{ij} = max{0, W_{ij}} and W−_{ij} = −min{0, W_{ij}}, implying that every interaction is either positive or negative, but not both at the same time. We denote by D+_{ii} = Σ_{j=1}^n w+_{ij} and D−_{ii} = Σ_{j=1}^n w−_{ij} the diagonal matrices of the degrees of G+ and G−, respectively, and D = D+ + D−.

2. Related Work

The study of clustering of signed graphs can be traced back to the theory of social balance (Cartwright & Harary, 1956; Harary, 1953; Davis, 1967), where a signed graph is called k-balanced if the set of vertices can be partitioned into k sets such that within the subsets there are only positive edges, and between them only negative ones.

Inspired by the notion of k-balance, different approaches for signed graph clustering have been introduced. In particular, many of them aim to extend spectral clustering to signed graphs by proposing novel signed graph Laplacians. A related approach is correlation clustering (Bansal et al., 2004). Unlike spectral clustering, where the number of clusters is fixed a priori, correlation clustering approximates the optimal number of clusters by identifying a partition that is as close as possible to being k-balanced. In this setting, the case where the number of clusters is constrained has been considered in (Giotis & Guruswami, 2006).

We briefly introduce the standard and signless Laplacians and review different definitions of Laplacians on signed graphs. The final clustering algorithm to find k clusters is the same for all of them: compute the smallest k eigenvectors of the corresponding Laplacian, use the eigenvectors to embed the nodes into R^k, and obtain the final clustering by running k-means in the embedding space. However, we will see below that in some cases we have to slightly deviate from this generic principle by using the k − 1 smallest eigenvectors instead.

Laplacians of Unsigned Graphs: In the following all weight matrices are non-negative and symmetric. Given an assortative graph G = (V, W), standard spectral clustering is based on the Laplacian and its normalized version, defined as

L = D − W,    L_sym = D^{−1/2} L D^{−1/2},

where D_{ii} = Σ_{j=1}^n w_{ij} is the diagonal matrix of the degrees of G. Both Laplacians are symmetric positive semidefinite, and the multiplicity of the eigenvalue 0 is equal to the number of connected components in G.

For disassortative graphs, i.e. when edges carry only dissimilarity information, the goal is to identify clusters such that the amount of edges between clusters is larger than the one inside clusters. Spectral clustering is extended to this setting by considering the signless Laplacian matrix and its normalized version (see e.g. (Liu, 2015; Mercado et al., 2016)), defined as

Q = D + W,    Q_sym = D^{−1/2} Q D^{−1/2}.

Both Laplacians are positive semidefinite, and the smallest eigenvalue is zero if and only if the graph has a bipartite component (Desai & Rao, 1994).
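For concreteness, the following short sketch builds these four matrices with NumPy. This is our own illustration, not the authors' reference code; the helper name is an assumption of ours, and we guard against isolated nodes, which the paper implicitly excludes.

import numpy as np

def normalized_laplacians(W):
    # Return (L_sym, Q_sym) for a symmetric non-negative weight matrix W:
    # L_sym = I - D^{-1/2} W D^{-1/2},  Q_sym = I + D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard for isolated nodes
    T = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    I = np.eye(W.shape[0])
    return I - T, I + T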
Laplacians of Signed Graphs: Signed graphs encode both positive and negative interactions. In the ideal k-balanced case positive interactions present an assortative behaviour, whereas negative interactions present a disassortative behaviour. With this in mind, several novel definitions of signed Laplacians have been proposed. We briefly review them for later reference.

In (Chiang et al., 2012) the balance ratio Laplacian and its normalized version are defined as

L_BR = D+ − W+ + W−,    L_BN = D^{−1/2} L_BR D^{−1/2},

whereas in (Kunegis et al., 2010) the signed ratio Laplacian and its normalized version have been defined as

L_SR = D − W+ + W−,    L_SN = D^{−1/2} L_SR D^{−1/2}.

The signed Laplacians L_BR and L_BN need not be positive semidefinite, while the signed Laplacians L_SR and L_SN are positive semidefinite, with eigenvalue zero if and only if the graph is 2-balanced.

In the context of correlation clustering, in (Saade et al., 2015) the Bethe Hessian is defined as

H = (α − 1)I − √α (W+ − W−) + D,

where α is the average node degree, α = (1/n) Σ_{i=1}^n D_{ii}. The Bethe Hessian H need not be positive definite. In fact, eigenvectors with negative eigenvalues bring information on the clustering structure (Saade et al., 2014).

Let L+ = D+ − W+ and Q− = D− + W− be the Laplacian and signless Laplacian of G+ and G−, respectively. As noted in (Mercado et al., 2016), L_SR = L+ + Q−, i.e. it coincides with twice the arithmetic mean of L+ and Q−. Note that the same holds for H when the average degree α is equal to one, i.e. H = L_SR when α = 1. In (Mercado et al., 2016), the arithmetic mean and geometric mean of the normalized Laplacian and its signless version are used to define new Laplacians for signed graphs:

L_AM = L+_sym + Q−_sym,    L_GM = L+_sym # Q−_sym,

where A # B = A^{1/2} (A^{−1/2} B A^{−1/2})^{1/2} A^{1/2} is the geometric mean of A and B, L+_sym = (D+)^{−1/2} L+ (D+)^{−1/2} and Q−_sym = (D−)^{−1/2} Q− (D−)^{−1/2}. While the computation of L_GM is more challenging, in (Mercado et al., 2016) it is shown that the clustering assignment obtained with the geometric mean Laplacian L_GM outperforms all other signed Laplacians.

Both the arithmetic and the geometric mean are special cases of a much richer one-parameter family of means known as power means. Based on this observation, we introduce the Signed Power Mean Laplacian in Section 2.2, defined via a matrix version of the family of power means, which we briefly review below.

2.1. Matrix Power Means

The scalar power mean of two non-negative scalars a, b is a one-parameter family of means defined for p ∈ R as m_p(a, b) = ((a^p + b^p)/2)^{1/p}. Particular cases are the arithmetic, geometric and harmonic means, as shown in Table 1. Moreover, the scalar power mean is monotone in the parameter p, i.e. m_p(a, b) ≤ m_q(a, b) when p ≤ q (see (Bullen, 2013), Ch. 3, Thm. 1), which yields the well-known arithmetic-geometric-harmonic mean inequality m_{−1}(a, b) ≤ m_0(a, b) ≤ m_1(a, b).

Table 1: Particular cases of scalar power means

    p         m_p(a, b)            name
    p → ∞     max{a, b}            maximum
    p = 1     (a + b)/2            arithmetic mean
    p → 0     √(ab)                geometric mean
    p = −1    2(1/a + 1/b)^{−1}    harmonic mean
    p → −∞    min{a, b}            minimum

As matrices do not commute, several matrix extensions of the scalar power mean have been introduced, which typically agree if the matrices commute; see e.g. Chapter 4 in (Bhatia, 2009). We consider the following matrix extension of the scalar power mean:

Definition 1 ((Bhagwat & Subramanian, 1978)). Let A, B be symmetric positive definite matrices, and p ∈ R. The matrix power mean of A, B with exponent p is

M_p(A, B) = ((A^p + B^p)/2)^{1/p},

where Y^{1/p} is the unique positive definite solution of the equation X^p = Y.

Please note that this definition can be extended to positive semidefinite matrices (Bhagwat & Subramanian, 1978) for p > 0, as M_p(A, B) still exists, whereas for p < 0 a diagonal shift is necessary to ensure that the matrices A, B are positive definite.
2.2. The Signed Power Mean Laplacian

Given a signed graph G± = (G+, G−) we define the Signed Power Mean (SPM) Laplacian L_p of G± as

L_p = M_p(L+_sym, Q−_sym).    (1)

For the case p < 0 the matrix power mean requires positive definite matrices, hence we use in this case the matrix power mean of diagonally shifted Laplacians, i.e. L+_sym + εI and Q−_sym + εI. Our theoretical analysis below holds for all possible shifts ε > 0, whereas we discuss the numerical robustness with respect to ε in the supplementary material. The clustering algorithm for identifying k clusters in signed graphs is given in Algorithm 1. Please note that for p ≥ 1 we deviate from the usual scheme and use the first k − 1 eigenvectors rather than the first k. The reason is a result of the analysis in the stochastic block model in Section 3. In general, the main influence of the parameter p of the power mean is on the ordering of the eigenvalues. In Section 3 we will see that this significantly influences the performance of different instances of SPM Laplacians; in particular, the arithmetic and geometric means discussed in (Mercado et al., 2016) are suboptimal for the recovery of the ground truth clusters. For the computation of the matrix power mean we adapt the scalable Krylov subspace-based algorithm proposed in (Mercado et al., 2018).
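To make Definition 1 and Eq. (1) concrete, here is a minimal dense sketch based on an eigendecomposition; it is our own illustration for small graphs only, whereas the Krylov-based scheme of (Mercado et al., 2018) never forms L_p explicitly. All function names are ours.

import numpy as np

def matrix_power(A, p):
    # A^p for a symmetric positive definite A via eigendecomposition.
    lam, U = np.linalg.eigh(A)
    return (U * lam**p) @ U.T

def matrix_power_mean(A, B, p):
    # M_p(A, B) = ((A^p + B^p)/2)^{1/p} of Definition 1.
    if p == 0:  # p = 0: return the matrix geometric mean A # B (cf. Theorem 2)
        iA = matrix_power(A, -0.5)
        sA = matrix_power(A, 0.5)
        return sA @ matrix_power(iA @ B @ iA, 0.5) @ sA
    return matrix_power(0.5 * (matrix_power(A, p) + matrix_power(B, p)), 1.0 / p)

def spm_laplacian(L_sym_pos, Q_sym_neg, p, eps=None):
    # Signed power mean Laplacian of Eq. (1), with the diagonal shift
    # eps = log10(1 + |p|) + 1e-6 used in the paper's experiments (Section 3).
    if eps is None:
        eps = np.log10(1.0 + abs(p)) + 1e-6
    I = np.eye(L_sym_pos.shape[0])
    return matrix_power_mean(L_sym_pos + eps * I, Q_sym_neg + eps * I, p)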

Algorithm 1: Spectral clustering of signed graphs with L_p

Input: Symmetric matrices W+, W−, number k of clusters to construct.
Output: Clusters C_1, ..., C_k.
1. Let k′ = k − 1 if p ≥ 1 and k′ = k if p < 1.
2. Compute eigenvectors u_1, ..., u_{k′} corresponding to the k′ smallest eigenvalues of L_p.
3. Set U = (u_1, ..., u_{k′}) and cluster the rows of U with k-means into clusters C_1, ..., C_k.
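A direct transcription of Algorithm 1 in Python (our own sketch for small dense graphs, reusing the hypothetical helpers normalized_laplacians and spm_laplacian from the previous snippets; scikit-learn provides k-means) could read:

import numpy as np
from sklearn.cluster import KMeans

def spm_spectral_clustering(W_pos, W_neg, k, p):
    # Algorithm 1: spectral clustering of a signed graph with L_p.
    L_sym_pos, _ = normalized_laplacians(W_pos)  # Laplacian of positive edges
    _, Q_sym_neg = normalized_laplacians(W_neg)  # signless Laplacian of negative edges
    Lp = spm_laplacian(L_sym_pos, Q_sym_neg, p)
    k_prime = k - 1 if p >= 1 else k             # step 1
    _, U = np.linalg.eigh(Lp)                    # eigenvalues in ascending order
    embedding = U[:, :k_prime]                   # step 2: k' smallest eigenvectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)  # step 3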
3. Stochastic Block Model Analysis of the Signed Power Mean Laplacian

In this section we analyze the signed power mean Laplacian L_p under a general Signed Stochastic Block Model. Our results here are twofold. First, we derive new conditions in expectation that guarantee that the eigenvectors corresponding to the smallest eigenvalues of L_p recover the ground truth clusters. These conditions reveal that, in this setting, the state-of-the-art signed graph matrices are suboptimal compared to L_p for negative values of p. Second, we show that our results in expectation transfer to sampled graphs, as we prove conditions that ensure that both eigenvalues and eigenvectors of L_p concentrate around their expected value with high probability. We verify our results by several experiments where the clustering performance of state-of-the-art matrices and L_p are compared on random graphs following the Signed Stochastic Block Model.

All proofs hold for an arbitrary diagonal shift ε > 0, whereas the shift is set to ε = log_10(1 + |p|) + 10^{−6} in all the numerical experiments. Numerical robustness with respect to ε is discussed in the supplementary material.

The Stochastic Block Model (SBM) is a well-established generative model for graphs and a canonical tool for studying clustering methods (Holland et al., 1983; Rohe et al., 2011; Abbe, 2018). Graphs drawn from the SBM show a prescribed clustering structure, as the probability of an edge between two nodes depends only on the cluster membership of each node. We introduce our SBM for signed graphs (SSBM): we consider k ground truth clusters C_1, ..., C_k, all of them of size |C| = n/k, and parameters p+_in, p+_out, p−_in, p−_out ∈ [0, 1], where p+_in (resp. p−_in) is the probability of observing an edge inside clusters in G+ (resp. G−) and p+_out (resp. p−_out) is the probability of observing an edge between clusters in G+ (resp. G−). Calligraphic letters are used for the expected matrices: 𝓦+ and 𝓦− are the expected adjacency matrices of G+ and G−, respectively, where 𝓦+_{ij} = p+_in and 𝓦−_{ij} = p−_in if v_i, v_j belong to the same cluster, whereas 𝓦+_{ij} = p+_out and 𝓦−_{ij} = p−_out if v_i, v_j belong to different clusters.

Other extensions of the SBM to the signed setting have been considered. Particularly relevant examples are the Labelled Stochastic Block Model (LSBM) (Heimlicher et al., 2012) and the Censored Block Model (CBM) (Abbe et al., 2014). In the context of signed graphs, both the LSBM and the CBM assume that an observed edge can be either positive or negative, but not both. Our SSBM, instead, allows the simultaneous presence of both positive and negative edges between the same pair of nodes, as the parameters p+_in, p+_out, p−_in, p−_out in the SSBM are independent. Moreover, the edge probabilities defining both the LSBM and the CBM can be recovered as special cases of the SSBM. In particular, the LSBM corresponds to the SSBM for the choices

p+_in = p_in μ+,   p−_in = p_in μ−   (within clusters),
p+_out = p_out ν+,   p−_out = p_out ν−   (between clusters),

where p_in and p_out are edge probabilities within and between clusters, respectively, whereas μ+ and μ− = 1 − μ+ (resp. ν+ and ν− = 1 − ν+) are the probabilities of assigning a positive and a negative label to an edge within (resp. between) clusters. Similarly, the CBM corresponds to the SSBM for the particular choices p_in = p_out, μ+ = ν− = 1 − η and μ− = ν+ = η, where η is a noise parameter.

Our goal is to identify conditions in terms of k, p+_in, p+_out, p−_in and p−_out such that C_1, ..., C_k are recovered by the smallest eigenvectors of the signed power mean Laplacian. Consider the following k vectors:

χ_1 = 1,    χ_i = (k − 1) 1_{C_i} − 1_{C̄_i},    i = 2, ..., k.

The node embedding given by {χ_i}_{i=1}^k is informative in the sense that applying k-means on {χ_i}_{i=1}^k trivially recovers the ground truth clusters C_1, ..., C_k, as all nodes of a cluster are mapped to the same point. Note that the constant vector χ_1 could be omitted as it does not add clustering information. We derive conditions for the SSBM such that {χ_i}_{i=1}^k are the smallest eigenvectors of the signed power mean Laplacian in expectation.

Theorem 1. Let 𝓛_p = M_p(𝓛+_sym, 𝓠−_sym) and let ε > 0 be the diagonal shift.
• If p ≥ 1, then {χ_i}_{i=2}^k correspond to the (k − 1) smallest eigenvalues of 𝓛_p if and only if m_p(ρ+_ε, ρ−_ε) < 1 + ε;
• If p < 1, then {χ_i}_{i=1}^k correspond to the k smallest eigenvalues of 𝓛_p if and only if m_p(ρ+_ε, ρ−_ε) < 1 + ε;
with ρ+_ε = 1 − (p+_in − p+_out)/(p+_in + (k − 1)p+_out) + ε and ρ−_ε = 1 + (p−_in − p−_out)/(p−_in + (k − 1)p−_out) + ε.

Note that Theorem 1 is the reason why Algorithm 1 uses only the first k − 1 eigenvectors for p ≥ 1. The problem is that the constant eigenvector need not be among the first k eigenvectors in the SSBM for p ≥ 1. However, as it is constant and thus uninformative in the embedding, this does not lead to any loss of information.
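The SSBM and the recovery condition of Theorem 1 are easy to simulate. The sketch below is our own illustration (all names are assumptions): it samples one layer of the model and evaluates m_p(ρ+_ε, ρ−_ε) < 1 + ε.

import numpy as np

def sample_ssbm_layer(n, k, p_in, p_out, rng):
    # One symmetric adjacency matrix of a k-cluster SSBM layer (no self-loops).
    labels = np.repeat(np.arange(k), n // k)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    A = np.triu((rng.random((n, n)) < P).astype(float), 1)
    return A + A.T

def theorem1_condition(k, pin_p, pout_p, pin_m, pout_m, p, eps):
    # m_p(rho_eps^+, rho_eps^-) < 1 + eps  <=>  the informative eigenvectors
    # lie at the bottom of the spectrum of the expected SPM Laplacian.
    rho_p = 1 - (pin_p - pout_p) / (pin_p + (k - 1) * pout_p) + eps
    rho_m = 1 + (pin_m - pout_m) / (pin_m + (k - 1) * pout_m) + eps
    mp = np.sqrt(rho_p * rho_m) if p == 0 else (0.5 * (rho_p**p + rho_m**p))**(1.0 / p)
    return mp < 1 + eps

rng = np.random.default_rng(0)
W_pos = sample_ssbm_layer(200, 2, 0.09, 0.01, rng)  # assortative positive graph
W_neg = sample_ssbm_layer(200, 2, 0.01, 0.09, rng)  # disassortative negative graph
print(theorem1_condition(2, 0.09, 0.01, 0.01, 0.09, p=-1, eps=np.log10(2) + 1e-6))  # True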

The following corollary shows that the limit cases of 𝓛_p are related to the boolean operators AND and OR.

Corollary 1. Let 𝓛_p = M_p(𝓛+_sym, 𝓠−_sym).
• {χ_i}_{i=2}^k correspond to the (k − 1) smallest eigenvalues of 𝓛_∞ if and only if p+_in > p+_out and p−_in < p−_out;
• {χ_i}_{i=1}^k correspond to the k smallest eigenvalues of 𝓛_{−∞} if and only if p+_in > p+_out or p−_in < p−_out.

The conditions for 𝓛_∞ are the most conservative ones, as they require that both G+ and G− are informative, i.e. G+ has to be assortative and G− disassortative. Under these conditions every clustering method for signed graphs should be able to identify the ground truth clusters in expectation. On the other hand, the least restrictive conditions for the recovery of the ground truth clusters correspond to the limit case 𝓛_{−∞}: if G+ or G− is informative, then the ground truth clusters are recovered; that is, 𝓛_{−∞} only requires that G+ is assortative or G− is disassortative. In particular, the following corollary shows that smaller values of p require less restrictive conditions to ensure the identification of the informative eigenvectors.

Corollary 2. Let q ≤ p. If {χ_i}_{i=θ(p)}^k correspond to the k smallest eigenvalues of 𝓛_p, then {χ_i}_{i=θ(q)}^k correspond to the k smallest eigenvalues of 𝓛_q, where θ(x) = 1 if x ≤ 0 and θ(x) = 2 if x > 0.

To better understand the different conditions we have derived, we visualize them in Fig. 1, where the x-axis corresponds to how assortative G+ is, while the y-axis corresponds to how disassortative G− is. The conditions of the limit case 𝓛_∞, i.e. the case where both G+ and G− have to be informative, correspond to the upper-right dark blue region in Fig. 1c, i.e. to 25% of all possible configurations of the SBM. The conditions for the limit case 𝓛_{−∞}, i.e. the case where G+ or G− has to be informative, instead correspond to all possible configurations of the SBM except for the bottom-left region. This is depicted in Fig. 1b and corresponds to 75% of all possible configurations under the SBM.
smallest eigenvalues of LSN . • inequalities p− + (k − 1)p− < p+ + (k − 1)p+ and In Fig.2 we present the corresponding conditions for recov- in out in out p− + p+ < p+ + p− hold. ery in expectation for the cases p ∈ {−10, −1, 0, 1, 10}. We in out in out can visually verify that the larger the value of p the smaller Finally, we present conditions in expectation for the Bethe is the region where the conditions of Theorem1 hold. In Hessian to identify the ground truth clustering. particular, one can compare the change of conditions as one Let H be the Bethe Hessian of the expected moves from the signed harmonic (L ), geometric (L ), to Theorem 4. −1 0 signed graph. Then {χ }k are the eigenvectors corre- the arithmetic (L ) mean Laplacians verifying the ordering i i=2 1 sponding to the (k − 1)-smallest negative eigenvalues of H described in Corollary2. Moreover, we clearly observe if and only if the following conditions hold: that L−10 and L10 are already quite close to the conditions 2(d++d−)−1 necessary for the limit cases L and L , respectively. 1. max{0, √ } < (p+ − p+ ) − (p− − p− ) −∞ ∞ d++d−|C| in out in out + − In the middle row of Fig.2 we show the average clustering 2. pout < pout error for each power mean Laplacian when sampling 50 Moreover, for the limit case |V | → ∞ the first condition times from the SSBM following the diagram presented in − + + − reduces to p + pout < p + pout. + − + in in Fig. 1a and fixing the sparsity of G and G by setting pin + + − − pout = 0.1 and pin + pout = 0.1 with two clusters each of Please see the supplementary material for a further analysis size 100. We observe that the areas with low clustering error in expectation. We can observe that the first condition in qualitatively match the regions where in expectation we Theorem4 is related to conditions of L1 and LSN , LBN − + + − have recovery of the clusters. However, due to the sampling through the inequality pin +pout < pin +pout. This explains Spectral Clustering of Signed Graphs via Matrix Power Means

Figure 1: Stochastic Block Model (SBM) for signed graphs. From left to right: Fig. 1a, SBM diagram; Fig. 1b, SBM for L_{−∞} (OR); Fig. 1c, SBM for L_∞ (AND), according to Corollary 1.

[Figure 2 panels. Top row, "Recovery of Clusters in Expectation" (true/false): (a) L_{−10}, (b) L_{−1}, (c) L_0, (d) L_1, (e) L_{10}. Middle row, clustering error (scale 0 to 0.5): (f) L_{−10}, (g) L_{−1}, (h) L_0, (i) L_1, (j) L_{10}. Bottom row: (k) L_GM, (l) L_SN, (m) L_BN, (n) H. All panels share axes with ticks at −0.1, 0, 0.1.]
Figure 2: Performance visualization for two clusters for different parameters of the SBM. Top row: in dark blue, the settings where the signed power mean Laplacians L_p identify the ground truth clusters in expectation for the SBM (see Theorem 1), whereas yellow indicates failure. Middle/bottom rows: average clustering error (dark blue: small error; yellow: large error) of the signed power mean Laplacian L_p and of L_GM, L_SN, L_BN, H over 50 samples from the SBM.

We now zoom in on a particular setting of Fig. 2: namely, the case where G+ (resp. G−) is fixed to be informative, whereas the remaining graph transitions from informative to uninformative. The corresponding results are shown in Fig. 3.

[Figure 3 panels: (a) and (b) show mean clustering error curves (y-axis 0.1 to 0.4, x-axis −0.05 to 0.05); (c)–(j) show node embeddings; see the caption below.]
Figure 3: Left: mean clustering error under the SSBM, with two clusters of size 100 and 50 runs. In Fig. 3a, G+ is informative, i.e. assortative with p+_in = 0.09 and p+_out = 0.01. In Fig. 3b, G− is informative, i.e. disassortative with p−_in = 0.01 and p−_out = 0.09. Right: node embeddings ((c) L_{−10}, (d) L_{−1}, (e) L_0, (f) L_1, (g) L_{10}, (h) L_SN, (i) L_BN, (j) H) induced by eigenvectors of different signed Laplacians for a random graph drawn from the SSBM with 2 clusters of size 100, p+_in = 0.025, p+_out = 0.075, p−_in = 0.01, p−_out = 0.09.

In Fig. 3a we consider the case where G+ is informative with parameters p+_in = 0.09 and p+_out = 0.01 (this corresponds to p+_in − p+_out = 0.08 in Fig. 2), and G− goes from being informative (p−_in < p−_out) to non-informative (p−_in ≥ p−_out). We confirm that the power mean Laplacian L_p presents smaller clustering errors for smaller values of p. Moreover, it is clear that in the case p < 0, L_p is able to recover clusters even in the case where G− is not informative, whereas for p > 0, L_p requires both G+ and G− to be informative. We observe that the smallest (resp. largest) clustering errors correspond to L_{−10} (resp. L_{10}), corroborating Corollary 2. Further, we can observe that L_GM and L_0 have a similar performance, as well as L_SN, L_BN, L_1, H, as observed before, confirming Theorem 2 and Theorem 4, respectively.

In Fig. 3b similar observations hold for the case where G− is informative with parameters p−_in = 0.01 and p−_out = 0.09 (this corresponds to p−_in − p−_out = −0.08 in Fig. 2), and G+ goes from being non-informative (p+_in ≤ p+_out) to informative (p+_in > p+_out). Within this setting we present the eigenvector-based node embeddings of each method for the case p+_in = 0.025, p+_out = 0.075, p−_in = 0.01, p−_out = 0.09, in the right-hand side of Fig. 3. For L_{−10}, L_{−1}, L_0 the embeddings split the clusters properly, whereas the remaining embeddings are not informative, verifying the effectiveness of L_p with p < 0.
3.1. Consistency of the Signed Power Mean Laplacian for the Stochastic Block Model

In this section we prove two novel concentration bounds for signed power mean Laplacians of signed graphs drawn from the SSBM. The bounds show that, for large graphs, our previous results in expectation transfer to sampled graphs with high probability. We first show in Theorem 5 that L_p is close to 𝓛_p. Then, in Theorem 6, we show that eigenvalues and eigenvectors of L_p are close to those of 𝓛_p. We derive this result by tracing back the consistency of the matrix power mean to the consistency of the standard and signless Laplacians established in (Chung & Radcliffe, 2011).

The consistency of spectral clustering on unsigned graphs for the SBM has been studied in (Lei & Rinaldo, 2015; Sarkar & Bickel, 2015; Rohe et al., 2011), and more recently consistency of several variants of spectral clustering has been shown (Qin & Rohe, 2013; Joseph & Yu, 2016; Chaudhuri et al., 2012; Le et al.; Fasino & Tudisco, 2018; Davis & Sethuraman, 2018). Moreover, while the case of multilayer graphs under the SBM has been previously analyzed (Han et al., 2015; Heimlicher et al., 2012; Jog & Loh, 2015; Paul & Chen, 2017; Xu et al., 2014; 2017; Yun & Proutiere, 2016), there are no consistency results for matrix power means for multilayer graphs as studied in (Mercado et al., 2018). While our main emphasis is on the analysis of the SPM Laplacian, our proofs are general enough to cover also the consistency of the matrix power means for unsigned multilayer graphs (Mercado et al., 2018).

In Theorem 5 we show that the SPM Laplacian L_p for the SSBM is concentrated around 𝓛_p with high probability for large n. The following results hold for general shifts ε.

Theorem 5. Let p be a non-zero integer, let

C_p = (2p)^{1/p} (2 + ε)^{1−1/p} for p ≥ 1,    C_p = |2p|^{1/|p|} ε^{−(3+1/|p|)} for p ≤ −1,

and choose ϵ > 0. If (n/k)(p+_in + (k − 1)p+_out) > 3 ln(8n/ϵ) and (n/k)(p−_in + (k − 1)p−_out) > 3 ln(8n/ϵ), then with probability at least 1 − ϵ we have

‖L_p − 𝓛_p‖ ≤ C_p m_{|p|}^{1/|p|}( √(3 ln(8n/ϵ) / ((n/k)(p+_in + (k − 1)p+_out))), √(3 ln(8n/ϵ) / ((n/k)(p−_in + (k − 1)p−_out))) ).

In Theorem 5 we take the spectral norm. A more general version of Theorem 5 for the inhomogeneous Erdős–Rényi model, where edges are formed independently with probabilities p+_{ij}, p−_{ij}, is given in the supplementary material. Theorem 5 builds on top of the concentration results of (Chung & Radcliffe, 2011), proven for the unsigned case ‖L+_sym − 𝓛+_sym‖. We can see that the deviation of L_p from 𝓛_p depends on the power mean of the individual deviations of L+_sym and Q−_sym from 𝓛+_sym and 𝓠−_sym, respectively. Note that the larger the size n of the graph, the stronger is the concentration of L_p around 𝓛_p.
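To get a feel for the rate in Theorem 5, the following snippet (our own illustration; parameter names are ours) evaluates the right-hand side of the bound for a concrete SSBM configuration; it decays like O(√(log(n)/n)) as the graph grows:

import numpy as np

def theorem5_bound(n, k, pin_p, pout_p, pin_m, pout_m, p, eps, eps_prob):
    # Right-hand side of Theorem 5; eps is the diagonal shift,
    # eps_prob the failure probability.
    Cp = ((2 * p)**(1 / p) * (2 + eps)**(1 - 1 / p) if p >= 1
          else abs(2 * p)**(1 / abs(p)) * eps**(-(3 + 1 / abs(p))))
    delta_p = (n / k) * (pin_p + (k - 1) * pout_p)  # expected degree in G+
    delta_m = (n / k) * (pin_m + (k - 1) * pout_m)  # expected degree in G-
    a = np.sqrt(3 * np.log(8 * n / eps_prob) / delta_p)
    b = np.sqrt(3 * np.log(8 * n / eps_prob) / delta_m)
    q = abs(p)
    return Cp * ((0.5 * (a**q + b**q))**(1 / q))**(1 / q)  # C_p m_{|p|}^{1/|p|}(a, b)

for n in (10**3, 10**4, 10**5):
    print(n, theorem5_bound(n, 2, 0.09, 0.01, 0.01, 0.09, p=1,
                            eps=np.log10(2) + 1e-6, eps_prob=0.05))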


Figure 4: Adjacency matrices W+ and W− of the Wikipedia-Elections dataset, sorted through the clustering obtained by the proposed signed power mean Laplacians L_{−10}, L_{−5}, L_{−2}, L_{−1}, L_0, L_1 (panels (a)–(f)). Top row: zoom-in visualization of positive edges W+. Bottom row: zoom-in visualization of negative edges W−. See supplementary material for more details.

The next theorem shows that the eigenvectors corresponding to the smallest eigenvalues of L_p are close to the corresponding eigenvectors of 𝓛_p. This is a key result showing consistency of our spectral clustering technique with L_p for signed graphs drawn from the SSBM.

Theorem 6. Let p ≠ 0 be an integer. Let V_k, 𝓥_k ∈ R^{n×k} be orthonormal matrices whose columns are the eigenvectors of the k smallest eigenvalues of L_p and 𝓛_p, respectively. Let ρ+_ε, ρ−_ε and C_p be defined as in Theorems 1 and 5, respectively. Define k̃ = k − 1 if p ≥ 1 and k̃ = k if p ≤ −1, and choose ϵ > 0. If m_p(ρ+_ε, ρ−_ε) < 1 + ε, δ+ := (n/k)(p+_in + (k − 1)p+_out) > 3 ln(8n/ϵ), and δ− := (n/k)(p−_in + (k − 1)p−_out) > 3 ln(8n/ϵ), then there exists an orthogonal matrix O_k̃ ∈ R^{k̃×k̃} such that, with probability at least 1 − ϵ, we have

‖V_k̃ − 𝓥_k̃ O_k̃‖ ≤ √(8k̃) C_p m_{|p|}^{1/|p|}( √(3 ln(8n/ϵ)/δ+), √(3 ln(8n/ϵ)/δ−) ) / ((1 + ε) − m_p(ρ+_ε, ρ−_ε)).

Note that the main difference compared to Theorem 5 is the spectral gap γ_p = (1 + ε) − m_p(ρ+_ε, ρ−_ε) of 𝓛_p, which is the difference between the eigenvalues corresponding to the informative and the non-informative eigenvectors of 𝓛_p. Thus, the stronger the clustering structure, the tighter is the concentration of the eigenvectors.
Moreover, from the monotonicity of m_p we have γ_p ≥ γ_q for p < q, and thus for p ≤ −1 the spectral gap increases with |p|, ensuring a stronger concentration of the eigenvectors for smaller values of p.

4. Experiments on Wikipedia-Elections

We now evaluate the signed power mean Laplacian L_p with p ∈ {−10, −5, −2, −1, 0, 1} on the Wikipedia-Elections dataset (Leskovec & Krevl, 2014). In this dataset each node represents an editor requesting to become an administrator, and positive (resp. negative) edges represent supporting (resp. opposing) votes for the corresponding admin candidate.

While (Chiang et al., 2012) conjectured that this dataset has no clustering structure, recent works (Mercado et al., 2016; Cucuringu et al., 2019) have shown that there is indeed clustering structure. As noted in (Mercado et al., 2016), using the geometric mean Laplacian L_GM and looking for k clusters unveils the presence of one large non-informative cluster and k − 1 remaining smaller clusters which show relevant clustering structure.

Our results verify these recent findings. We set the number of clusters to identify to k = 30, and in Fig. 4 we show the portion of the adjacency matrices of positive and negative edges W+ and W− corresponding to the k − 1 clusters, sorted according to the identified clusters. We can see that for p ≤ 0 the signed power mean Laplacian L_p identifies clustering structure, whereas this structure is overlooked by the arithmetic mean case p = 1. Moreover, we can see that different powers identify slightly different clusters: this happens as this dataset does not necessarily follow the Signed Stochastic Block Model, and hence we do not fully retrieve the same behaviour studied in Section 3. Further experiments on UCI datasets are available in the supplementary material, suggesting that L_GM together with L_p for p < 0 is a reasonable option under different settings.

Acknowledgments. The work of F.T. has been funded by the Marie Curie Individual Fellowship MAGNET no. 744014.

References

Abbe, E. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18(177):1–86, 2018.

Abbe, E., Bandeira, A. S., Bracher, A., and Singer, A. Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery. IEEE Transactions on Network Science and Engineering, 1(1):10–22, 2014.

Bansal, N., Blum, A., and Chawla, S. Correlation clustering. Machine Learning, 56(1):89–113, 2004.

Bhagwat, K. V. and Subramanian, R. Inequalities between means of positive operators. Mathematical Proceedings of the Cambridge Philosophical Society, 83(3):393–401, 1978.

Bhatia, R. Matrix Analysis. Springer New York, 1997.

Bhatia, R. Positive Definite Matrices. Princeton University Press, 2009.

Bosch, J., Mercado, P., and Stoll, M. Node classification for signed networks using diffuse interface methods. arXiv:1809.06432, 2018.

Bullen, P. S. Handbook of Means and Their Inequalities, volume 560. Springer Science & Business Media, 2013.

Cartwright, D. and Harary, F. Structural balance: a generalization of Heider's theory. Psychological Review, 63(5):277–293, 1956.

Chaudhuri, K., Chung, F., and Tsiatas, A. Spectral clustering of graphs with general degrees in the extended planted partition model. In COLT, 2012.

Chiang, K.-Y., Natarajan, N., Tewari, A., and Dhillon, I. S. Exploiting longer cycles for link prediction in signed networks. In CIKM, 2011.

Chiang, K., Whang, J., and Dhillon, I. Scalable clustering of signed networks using balance normalized cut. In CIKM, 2012.

Chung, F. and Radcliffe, M. On the spectra of general random graphs. The Electronic Journal of Combinatorics, 18(1):P215, 2011.

Chung, F., Tsiatas, A., and Xu, W. Dirichlet pagerank and ranking algorithms based on trust and distrust. Internet Mathematics, 9(1):113–134, 2013.

Cucuringu, M., Davies, P., Glielmo, A., and Tyagi, H. SPONGE: A generalized eigenproblem for clustering signed networks. In AISTATS, 2019.

Cucuringu, M., Pizzoferrato, A., and van Gennip, Y. An MBO scheme for clustering and semi-supervised clustering of signed networks. arXiv:1901.03091, 2018.

Davis, E. and Sethuraman, S. Consistency of modularity clustering on random geometric graphs. Annals of Applied Probability, 28(4):2003–2062, 2018.

Davis, J. A. Clustering and structural balance in graphs. Human Relations, 20:181–187, 1967.

Derr, T., Ma, Y., and Tang, J. Signed graph convolutional network. arXiv:1808.06354, 2018.

Desai, M. and Rao, V. A characterization of the smallest eigenvalue of a graph. Journal of Graph Theory, 18(2):181–194, 1994.

Doreian, P. and Mrvar, A. Partitioning signed social networks. Social Networks, 31(1):1–11, 2009.

Falher, G. L., Cesa-Bianchi, N., Gentile, C., and Vitale, F. On the Troll-Trust model for edge sign prediction in social networks. In AISTATS, 2017.

Fasino, D. and Tudisco, F. A modularity based spectral method for simultaneous community and anti-community detection. Linear Algebra and its Applications, 542:605–623, 2018.

Fujita, A., Severino, P., Kojima, K., Sato, J. R., Patriota, A. G., and Miyano, S. Functional clustering of time series gene expression data by Granger causality. BMC Systems Biology, 6(1):137, 2012.

Gallier, J. Spectral theory of unsigned and signed graphs. Applications to graph clustering: a survey. arXiv:1601.04692, 2016.

Giotis, I. and Guruswami, V. Correlation clustering with a fixed number of clusters. In SODA, 2006.

Han, Q., Xu, K. S., and Airoldi, E. M. Consistent estimation of dynamic and multi-layer block models. In ICML, 2015.

Harary, F. On the notion of balance of a signed graph. Michigan Mathematical Journal, 2:143–146, 1953.

Heimlicher, S., Lelarge, M., and Massoulié, L. Community detection in the labelled stochastic block model. arXiv:1209.2910, 2012.

Holland, P. W., Laskey, K. B., and Leinhardt, S. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

Jog, V. and Loh, P.-L. Information-theoretic bounds for exact recovery in weighted stochastic block models using the Renyi divergence. arXiv:1509.06418, 2015.

Joseph, A. and Yu, B. Impact of regularization on spectral clustering. Annals of Statistics, 44(4):1765–1791, 2016.

Kim, J., Park, H., Lee, J.-E., and Kang, U. SIDE: Representation learning in signed directed networks. In WWW, 2018.

Kirkley, A., Cantwell, G. T., and Newman, M. Balance in signed networks. arXiv:1809.05140, 2018.

Knyazev, A. On spectral partitioning of signed graphs. In SIAM Workshop on Combinatorial Scientific Computing, 2018.

Kumar, S., Spezzano, F., Subrahmanian, V., and Faloutsos, C. Edge weight prediction in weighted signed networks. In ICDM, 2016.

Kunegis, J., Schmidt, S., Lommatzsch, A., Lerner, J., Luca, E., and Albayrak, S. Spectral analysis of signed graphs for clustering, prediction and visualization. In ICDM, 2010.

Le, C. M., Levina, E., and Vershynin, R. Concentration and regularization of random graphs. Random Structures & Algorithms, 51(3):538–561.

Lei, J. and Rinaldo, A. Consistency of spectral clustering in stochastic block models. Annals of Statistics, 43(1):215–237, 2015.

Leskovec, J. and Krevl, A. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data, June 2014.

Leskovec, J., Huttenlocher, D., and Kleinberg, J. Predicting positive and negative links in online social networks. In WWW, 2010a.

Leskovec, J., Huttenlocher, D., and Kleinberg, J. Signed networks in social media. In CHI, 2010b.

Liu, S. Multi-way dual Cheeger constants and spectral bounds of graphs. Advances in Mathematics, 268:306–338, 2015.

Luxburg, U. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

Mercado, P., Tudisco, F., and Hein, M. Clustering signed networks with the geometric mean of Laplacians. In NIPS, 2016.

Mercado, P., Gautier, A., Tudisco, F., and Hein, M. The power mean Laplacian for multilayer graph clustering. In AISTATS, 2018.

Paul, S. and Chen, Y. Consistency of community detection in multi-layer networks using spectral and matrix factorization methods. arXiv:1704.07353, 2017.

Pavlidis, N. G., Plagianakos, V. P., Tasoulis, D. K., and Vrahatis, M. N. Financial forecasting through unsupervised clustering and neural networks. Operational Research, 6(2):103–127, 2006.

Qin, T. and Rohe, K. Regularized spectral clustering under the degree-corrected stochastic blockmodel. In NIPS, 2013.

Rohe, K., Chatterjee, S., and Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878–1915, 2011.

Saade, A., Krzakala, F., and Zdeborová, L. Spectral clustering of graphs with the Bethe Hessian. In NIPS, 2014.

Saade, A., Lelarge, M., Krzakala, F., and Zdeborová, L. Spectral detection in the censored block model. In IEEE International Symposium on Information Theory (ISIT), pp. 1184–1188, 2015.

Sarkar, P. and Bickel, P. J. Role of normalization in spectral clustering for stochastic blockmodels. Annals of Statistics, 43(3):962–990, 2015.

Sedoc, J., Gallier, J., Foster, D., and Ungar, L. Semantic word clusters using signed spectral clustering. In ACL, 2017.

Shahriari, M. and Jalili, M. Ranking nodes in signed social networks. Social Network Analysis and Mining, 4(1):172, 2014.

Tang, J., Aggarwal, C., and Liu, H. Node classification in signed social networks. In SDM, 2016a.

Tang, J., Chang, Y., Aggarwal, C., and Liu, H. A survey of signed network mining in social media. ACM Computing Surveys, 49(3):42:1–42:37, 2016b.

Tropp, J. A. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1–230, 2015.

Wang, S., Tang, J., Aggarwal, C., Chang, Y., and Liu, H. Signed network embedding in social media. In SDM, 2017.

Xu, J., Massoulié, L., and Lelarge, M. Edge label inference in generalized stochastic block models: from spectral theory to impossibility results. In COLT, 2014.

Xu, M., Jog, V., and Loh, P.-L. Optimal rates for community estimation in the weighted stochastic block model. arXiv:1706.01175, 2017.

Yu, Y., Wang, T., and Samworth, R. J. A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323, 2015.

Yuan, S., Wu, X., and Xiang, Y. SNE: Signed network embedding. In PAKDD, 2017.

Yun, S.-Y. and Proutiere, A. Optimal cluster recovery in the labeled stochastic block model. In NIPS, 2016.

Ziegler, H., Jenny, M., Gruse, T., and Keim, D. A. Visual market sector analysis for financial time series data. In IEEE Symposium on Visual Analytics Science and Technology (VAST), pp. 83–90, 2010.

Appendices

This section contains all the proofs of the results mentioned in the main paper. It is organized as follows:
• Section A contains the proof of Theorem 1,
• Section B contains the proof of Corollary 1,
• Section C contains the proof of Corollary 2,
• Section D contains the proof of Theorem 2,
• Section E contains the proof of Theorem 5,
• Section G contains the proof of Theorem 6,
• Section I contains the proof of Theorem 4,
• Section J contains experiments on the Censored Block Model, where the superiority of the Bethe Hessian in the sparse regime is verified,
• Section K contains experiments on UCI datasets,
• Section L contains an analysis of the diagonal shift of the signed power mean Laplacian,
• Section M contains a description of the numerical scheme for the computation of eigenvectors without ever computing the power mean Laplacian matrix,
• Section N contains a further study of the conditions in expectation of the power mean Laplacian and of state-of-the-art approaches,
• Section O contains a more detailed inspection of the results on the Wikipedia-Elections dataset presented in Section 4.

Before starting with the general results, we first mention a couple of basic facts. The following theorem states the monotonicity of the scalar power mean.

Theorem 7 ((Bullen, 2013), Ch. 3, Thm. 1). Let p < q. Then m_p(a, b) ≤ m_q(a, b), with equality if and only if a = b.

The following lemma shows the effect of the matrix power mean when matrices have a common eigenvector. Lemma 1 ((Mercado et al., 2018)). Let u be an eigenvector of both A and B, with corresponding eigenvalues α and β. Then u is an eigenvector of Mp(A, B) with eigenvalue mp(α, β).

A. Proof of Theorem 1

Proof (of Theorem 1). We first show that χ_1, ..., χ_k are eigenvectors of 𝓦+ and 𝓦−. For χ_1 we have

𝓦+ χ_1 = 𝓦+ 1 = |C| (p+_in + (k − 1)p+_out) 1 = d+ 1 = λ+_1 1.

For the remaining vectors χ_2, ..., χ_k we have

𝓦+ χ_i = 𝓦+ ((k − 1) 1_{C_i} − 1_{C̄_i})
       = 𝓦+ (k 1_{C_i} − (1_{C_i} + 1_{C̄_i}))
       = 𝓦+ (k 1_{C_i} − 1)
       = k |C| (p+_in 1_{C_i} + p+_out 1_{C̄_i}) − d+ 1
       = k |C| (p+_in 1_{C_i} + p+_out 1_{C̄_i}) − d+ (1_{C_i} + 1_{C̄_i})
       = |C| (k − 1)(p+_in − p+_out) 1_{C_i} − |C| (p+_in − p+_out) 1_{C̄_i}
       = |C| (p+_in − p+_out) ((k − 1) 1_{C_i} − 1_{C̄_i})
       = |C| (p+_in − p+_out) χ_i = λ+_i χ_i.

The same procedure holds for 𝓦−. Thus, we have shown that χ_1, ..., χ_k are eigenvectors of both 𝓦+ and 𝓦−. In particular, we have seen that

λ+_1 = |C| (p+_in + (k − 1)p+_out),   λ+_i = |C| (p+_in − p+_out),
λ−_1 = |C| (p−_in + (k − 1)p−_out),   λ−_i = |C| (p−_in − p−_out),

for i = 2, ..., k. Further, as the matrices 𝓦+ and 𝓦− share all their eigenvectors, they are simultaneously diagonalizable, that is, there exists a non-singular matrix Σ such that Σ^{−1} 𝓦± Σ = Λ±, where Λ+ and Λ− are the diagonal matrices Λ± = diag(λ±_1, ..., λ±_k, 0, ..., 0).

As we assume that all clusters are of the same size |C|, the expected signed graph is a regular graph with degrees d+ and d−. Hence, the normalized Laplacian and the normalized signless Laplacian of the expected signed graph can be expressed as

𝓛+_sym = Σ (I − (1/d+) Λ+) Σ^{−1},    𝓠−_sym = Σ (I + (1/d−) Λ−) Σ^{−1}.

Thus, we can observe that

λ_1(𝓛+_sym) = 0,        λ_1(𝓠−_sym) = 2,
λ_i(𝓛+_sym) = 1 − ρ+,   λ_i(𝓠−_sym) = 1 + ρ−,
λ_j(𝓛+_sym) = 1,        λ_j(𝓠−_sym) = 1,

for i = 2, ..., k and j = k + 1, ..., |V|, where

ρ+ = (p+_in − p+_out)/(p+_in + (k − 1)p+_out),    ρ− = (p−_in − p−_out)/(p−_in + (k − 1)p−_out).

By taking the signed power mean Laplacian of the diagonally shifted matrices,

𝓛_p = M_p(𝓛+_sym + εI, 𝓠−_sym + εI),

we have by Lemma 1

λ_1(𝓛_p) = m_p(λ_1(𝓛+_sym) + ε, λ_1(𝓠−_sym) + ε) = m_p(ε, 2 + ε),
λ_i(𝓛_p) = m_p(1 − ρ+ + ε, 1 + ρ− + ε),    (2)
λ_j(𝓛_p) = m_p(λ_j(𝓛+_sym) + ε, λ_j(𝓠−_sym) + ε) = 1 + ε.

Observe that λ_j(𝓛_p), with j = k + 1, ..., |V|, corresponds to eigenvectors that do not yield an informative embedding. Hence, we do not want this eigenvalue to belong to the bottom of the spectrum of 𝓛_p. Thus, for the case of χ_2, ..., χ_k, we can see that they will be located at the bottom of the spectrum if and only if the following condition holds:

λ_i(𝓛_p) = m_p(1 − ρ+ + ε, 1 + ρ− + ε) < 1 + ε = λ_j(𝓛_p).

It remains to analyze the case of the constant eigenvector χ_1. Note that its associated eigenvalue λ_1(𝓛_1) has the following relationship to the non-informative eigenvalues:

λ_1(𝓛_1) = m_1(ε, 2 + ε) = 1 + ε = λ_j(𝓛_p).

By Theorem 7 we know that the scalar power mean is monotone in its parameter p, and thus, for the case p < 1 we observe

λ_1(𝓛_p) = m_p(ε, 2 + ε) < m_1(ε, 2 + ε) = λ_1(𝓛_1) = λ_j(𝓛_p),

and for the case p ≥ 1 we observe

λ_1(𝓛_p) = m_p(ε, 2 + ε) ≥ m_1(ε, 2 + ε) = λ_1(𝓛_1) = λ_j(𝓛_p).

This means that for powers p ≥ 1 the constant eigenvector χ_1 need not belong to the bottom of the spectrum, whereas for p < 1 it always does. With this in mind, we reach the desired result, namely:
• Let p ≥ 1. Then {χ_i}_{i=2}^k correspond to the (k − 1) smallest eigenvalues of 𝓛_p if and only if m_p(ρ+_ε, ρ−_ε) < 1 + ε;
• Let p < 1. Then {χ_i}_{i=1}^k correspond to the k smallest eigenvalues of 𝓛_p if and only if m_p(ρ+_ε, ρ−_ε) < 1 + ε;
where ρ+_ε = 1 − ρ+ + ε and ρ−_ε = 1 + ρ− + ε.
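The eigenvalue formula (2) derived above is easy to verify numerically on an expected SSBM graph. The following sketch (ours, reusing the hypothetical helpers from Section 2) prints the k smallest eigenvalues of 𝓛_p next to m_p(1 − ρ+ + ε, 1 + ρ− + ε):

import numpy as np

n, k, eps, p = 120, 3, 0.1, -2
pin_p, pout_p, pin_m, pout_m = 0.09, 0.01, 0.01, 0.09
labels = np.repeat(np.arange(k), n // k)
same = labels[:, None] == labels[None, :]
L_pos, _ = normalized_laplacians(np.where(same, pin_p, pout_p))
_, Q_neg = normalized_laplacians(np.where(same, pin_m, pout_m))
Lp = spm_laplacian(L_pos, Q_neg, p, eps=eps)
rho_p = (pin_p - pout_p) / (pin_p + (k - 1) * pout_p)
rho_m = (pin_m - pout_m) / (pin_m + (k - 1) * pout_m)
predicted = (0.5 * ((1 - rho_p + eps)**p + (1 + rho_m + eps)**p))**(1 / p)
# eigenvalues 2..k equal `predicted`; the first one is m_p(eps, 2 + eps)
print(np.linalg.eigvalsh(Lp)[:k], predicted)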

B. Proof of Corollary 1

Proof of Corollary 1. Following the proof of Theorem 1, recall that lim_{p→∞} m_p(x) = max{x_1, ..., x_T} and lim_{p→−∞} m_p(x) = min{x_1, ..., x_T}. Thus m_∞(ρ+_ε, ρ−_ε) = max(1 − ρ+ + ε, 1 + ρ− + ε), and hence m_∞(ρ+_ε, ρ−_ε) < 1 + ε if and only if ρ+ > 0 and ρ− < 0, yielding the desired conditions. The case p → −∞ is analogous: m_{−∞}(ρ+_ε, ρ−_ε) = min(1 − ρ+ + ε, 1 + ρ− + ε), and thus m_{−∞}(ρ+_ε, ρ−_ε) < 1 + ε if and only if ρ+ > 0 or ρ− < 0, yielding the desired conditions.

C. Proof of Corollary 2

Proof of Corollary 2. If λ_1, ..., λ_k (resp. λ_2, ..., λ_k) are among the k (resp. k − 1) smallest eigenvalues of 𝓛_p, then by Theorem 1 we have m_p(ρ+_ε, ρ−_ε) < 1 + ε. By Theorem 7 we have m_q(ρ+_ε, ρ−_ε) ≤ m_p(ρ+_ε, ρ−_ε), and Theorem 1 concludes the proof.

D. Proof of Theorem 2

Proof of Theorem 2. Following the proof of Theorem 1, we can see that 𝓛+_sym and 𝓠−_sym share all of their eigenvectors. Let u be an eigenvector of 𝓛+_sym and 𝓠−_sym with eigenvalues α and β, respectively. By Lemma 1 we have

𝓛_0 u = m_0(α, β) u.

Moreover, from Theorem 1 in (Mercado et al., 2016) we know that

(𝓛+_sym # 𝓠−_sym) u = √(αβ) u.

Further, m_0(α, β) = √(αβ). Hence, as 𝓛_0 and 𝓛+_sym # 𝓠−_sym have all eigenvectors and eigenvalues in common, we conclude that 𝓛_0 = 𝓛+_sym # 𝓠−_sym.

E. Proof of Theorem 5

For the proof of Theorem 5, we first present Theorem 8, which is a general version that allows choosing different diagonal shifts of the Laplacians together with different edge probabilities.

Theorem 8. Let G+ and G− be random graphs with independent edges, P(W+_{ij} = 1) = p+_{ij} and P(W−_{ij} = 1) = p−_{ij}. Let δ+, δ− be the minimum expected degrees of G+ and G−, respectively. Let C+_p = p^{1/p} β^{1−1/p} and C−_p = |p|^{1/|p|} α^{−(3+1/|p|)}. Choose ϵ > 0. Then there exist constants k+ = k+(ϵ/2) and k− = k−(ϵ/2) such that if δ+ > k+ ln n and δ− > k− ln n, then with probability at least 1 − ϵ,

‖L_p − 𝓛_p‖ ≤ C+_p m_p( 2√(3 ln(8n/ϵ)/δ+), 2√(3 ln(8n/ϵ)/δ−) )^{1/p}

for p ≥ 1, with p integer, and

‖L_p − 𝓛_p‖ ≤ C−_p m_{|p|}( 2√(3 ln(8n/ϵ)/δ+), 2√(3 ln(8n/ϵ)/δ−) )^{1/|p|}

for p ≤ −1, with p integer, where L_p = M_p(L+_sym + αI, Q−_sym + αI) and 𝓛_p = M_p(𝓛+_sym + αI, 𝓠−_sym + αI).

Before starting the proof of Theorem 8, we present an upper bound on the matrix power mean.

Theorem 9. Let A_1, ..., A_T, B_1, ..., B_T be symmetric matrices with α ≤ λ(A_i) ≤ β and α ≤ λ(B_i) ≤ β for i = 1, ..., T, and α, β > 0. Let C+_p = p^{1/p} β^{1−1/p} and C−_p = |p|^{1/|p|} α^{−(3+1/|p|)}. Then, for p ≥ 1, with p integer,

‖M_p(A_1, ..., A_T) − M_p(B_1, ..., B_T)‖ ≤ C+_p m_p(‖A_1 − B_1‖, ..., ‖A_T − B_T‖)^{1/p},

and, for p ≤ −1, with p integer,

‖M_p(A_1, ..., A_T) − M_p(B_1, ..., B_T)‖ ≤ C−_p m_{|p|}(‖A_1 − B_1‖, ..., ‖A_T − B_T‖)^{1/|p|}.

Proof. The proof is contained in Section F.

Observe that the upper bound in Theorem 9 is general, in the sense that it is suitable for symmetric positive definite matrices with bounded spectrum and for an arbitrary number of matrices. We are now ready to prove Theorem 8.

Proof of Theorem 8. Let

A_1 = L+_sym,  B_1 = 𝓛+_sym,  A_2 = Q−_sym,  B_2 = 𝓠−_sym,

with the corresponding signed power mean Laplacians

L_p = M_p(L+_sym, Q−_sym),    𝓛_p = M_p(𝓛+_sym, 𝓠−_sym).

We start with the case p ≥ 1, with p integer. Let C+_p = p^{1/p} β^{1−1/p}. By Theorem 9 we have

‖L_p − 𝓛_p‖ ≤ C+_p m_p(‖L+_sym − 𝓛+_sym‖, ‖Q−_sym − 𝓠−_sym‖)^{1/p}.

Let γ = (γ_1, γ_2), where

γ_1 = 2√(3 ln(8n/ϵ)/δ+),    γ_2 = 2√(3 ln(8n/ϵ)/δ−).

Define c = C+_p and a = c m_p(γ)^{1/p}. Then

P(‖L_p − 𝓛_p‖ > a)
  ≤ P( c m_p(‖L+_sym − 𝓛+_sym‖, ‖Q−_sym − 𝓠−_sym‖)^{1/p} > a )
  = P( m_p(‖L+_sym − 𝓛+_sym‖, ‖Q−_sym − 𝓠−_sym‖) > (a/c)^p )
  = P( ‖L+_sym − 𝓛+_sym‖^p + ‖Q−_sym − 𝓠−_sym‖^p > 2 (a/c)^p )
  = P( ‖L+_sym − 𝓛+_sym‖^p + ‖Q−_sym − 𝓠−_sym‖^p > Σ_{i=1}^2 γ_i^p )
  ≤ P( {‖L+_sym − 𝓛+_sym‖^p > γ_1^p} ∪ {‖Q−_sym − 𝓠−_sym‖^p > γ_2^p} )
  ≤ P( ‖L+_sym − 𝓛+_sym‖^p > γ_1^p ) + P( ‖Q−_sym − 𝓠−_sym‖^p > γ_2^p )    (3)
  = P( ‖L+_sym − 𝓛+_sym‖ > γ_1 ) + P( ‖Q−_sym − 𝓠−_sym‖ > γ_2 )
  = P( ‖L+_sym − 𝓛+_sym‖ > 2√(3 ln(4n/ϵ̂)/δ+) ) + P( ‖L−_sym − 𝓛−_sym‖ > 2√(3 ln(4n/ϵ̂)/δ−) )
  ≤ ϵ̂ + ϵ̂ = ϵ,    (4)

where ϵ̂ = ϵ/2. Inequality (3) follows from Boole's inequality. Inequality (4) comes from applying Theorem 12 from (Chung & Radcliffe, 2011) to G+ and G−, with corresponding minimum expected degrees δ+ and δ−, respectively, and ϵ̂, together with

‖Q−_sym − 𝓠−_sym‖ = ‖(I + T) − (I + 𝓣)‖ = ‖(I − T) − (I − 𝓣)‖ = ‖L−_sym − 𝓛−_sym‖,

where T = (D−)^{−1/2} W− (D−)^{−1/2} and 𝓣 = (𝓓−)^{−1/2} 𝓦− (𝓓−)^{−1/2}. Thus

P(‖L_p − 𝓛_p‖ > a) < ϵ,   and hence   P(‖L_p − 𝓛_p‖ ≤ a) ≥ 1 − ϵ,

completing the proof for the case p ≥ 1. For the proof of the case p ≤ −1, with p integer, let c = |p|^{1/|p|} α^{−(3+1/|p|)} and proceed as for the previous case with |p|.

We now finally give the proof of Theorem 5.

Proof of Theorem 5. We adapt the general version presented in Theorem 8 to our particular case, by showing that our Stochastic Block Model together with the shift of our model are particular cases of Theorem 8. First, note that the spectrum of the normalized Laplacians L+_sym and Q−_sym is upper bounded by two, i.e. λ(L+_sym), λ(Q−_sym) ∈ [0, 2]. Hence, by adding a diagonal shift we get λ(L+_sym + αI), λ(Q−_sym + αI) ∈ [α, 2 + α]. Letting α = ε and β = 2 + α, we get the shift corresponding to the particular case of Theorem 5.

Further, observe that our SBM is obtained by setting p+_{ij} = p+_in and p−_{ij} = p−_in if v_i, v_j belong to the same cluster, and p+_{ij} = p+_out and p−_{ij} = p−_out if v_i, v_j belong to different clusters. Moreover, under the Stochastic Block Model considered here, the induced expected graphs are regular, and thus all nodes have the same degree. Hence, the minimum expected degrees of G+ and G− are

δ+ = (n/k)(p+_in + (k − 1)p+_out),    δ− = (n/k)(p−_in + (k − 1)p−_out).

Thus, plugging these settings into Theorem 8 we get the desired result, except that the condition on the minimum expected degrees is that there exist constants k+ = k+(ϵ/2) and k− = k−(ϵ/2) such that the desired concentration holds. To overcome this, observe in the proof of Theorem 12 (p. 9) that the condition δ > k ln(n) comes from the requirement √(3 ln(4n/ϵ)/δ) < 1. Thus, setting δ > 3 ln(4n/ϵ) fulfills the condition. In our case this yields δ+ > 3 ln(4n/ϵ̂) = 3 ln(8n/ϵ) and δ− > 3 ln(4n/ϵ̂) = 3 ln(8n/ϵ), leading to the desired result.

  Before going into the proof, a set of preliminary results are kL − L k ≤ a < 1 −  P p p necessary. completing the proof for the case p ≥ 1. In what follows, for Hermitian matrices A and B we mean by A  B that B − A is positive semidefinite (see (Bhatia, For the proof of the case p ≤ −1 with p integer, let c = 1997), Ch. 5, and (Tropp, 2015), Ch. 2.1.8 for more details). 1/|p| |p| α−(3+1/|p|), and proceed as for the previous case We now proceed with the definition of a operator monotone with |p|. function: Definition 2 ((Tropp, 2015) Ch. 8.4.2, (Bhatia, 1997) Ch. We now finally give the proof for Theorem5. 5.). Let f:I → R be a function on an interval I of the Spectral Clustering of Signed Graphs via Matrix Power Means real line. The function f is operator monotone on I when Next we show that the spectrum of the matrix power mean A  B implies f(A)  f(B) for all Hermitian matrices A is well bounded for positive powers larger than one. and B whose eigenvalues are contained in I. Proposition 3. Let A1,...,AT be symmetric positive defi- nite matrices that are bounded below and above by α and The following result states that the negative inverse is opera- β; i.e. αI ≤ A ≤ βI for positive numbers α and β. Then, tor monotone. i for p ≥ 1, with p integer Proposition 1 ((Bhatia, 1997), Prop. V.1.6, (Tropp, 2015), Prop. 8.4.3). The function f(t) = − 1 is operator monotone t α ≤ λ(Mp(A1,...,AT )) ≤ β on (0, ∞).

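As a numerical aside (our addition, not part of the paper), the bound of Theorem 9 can be sanity-checked in a few lines of NumPy for T = 2; matrix_power, power_mean and random_spd below are our own helper names:

import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 20, 0.5, 3.0

def matrix_power(A, p):
    # A^p for a symmetric positive definite A via its eigendecomposition
    lam, U = np.linalg.eigh(A)
    return (U * lam**p) @ U.T

def power_mean(A, B, p):
    # M_p(A, B) = ((A^p + B^p) / 2)^(1/p)
    return matrix_power(0.5 * (matrix_power(A, p) + matrix_power(B, p)), 1.0 / p)

def random_spd():
    # random symmetric matrix with spectrum inside [alpha, beta]
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return (Q * rng.uniform(alpha, beta, n)) @ Q.T

A1, A2, B1, B2 = (random_spd() for _ in range(4))
for p in (2, -2):
    lhs = np.linalg.norm(power_mean(A1, A2, p) - power_mean(B1, B2, p), 2)
    d1 = np.linalg.norm(A1 - B1, 2)
    d2 = np.linalg.norm(A2 - B2, 2)
    q = abs(p)
    m = ((d1**q + d2**q) / 2) ** (1 / q)    # scalar power mean m_|p| of the norms
    C = p**(1/p) * beta**(1 - 1/p) if p >= 1 else q**(1/q) * alpha**(-(3 + 1/q))
    assert lhs <= C * m**(1/q) + 1e-9       # bound of Theorem 9

Note how the constant C_p^- blows up as α decreases, which is consistent with the role of the diagonal shift discussed in Section L.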
The following result states that the effect of operator monotone functions can be upper bounded in a helpful way.

Theorem 10 ((Bhatia, 1997), Theorem X.3.8). Let f be an operator monotone function on (0, ∞) and let A, B be two positive definite matrices that are bounded below by a, i.e. A ≥ aI and B ≥ aI for the positive number a. Then for every unitarily invariant norm

  |||f(A) − f(B)||| ≤ f′(a) |||A − B||| .

Applying this to the case of the negative inverse leads to the following Corollary.

Corollary 3. Let A, B be two positive definite matrices that are bounded below by a, i.e. A ≥ aI and B ≥ aI for the positive number a. Then for every unitarily invariant norm

  |||A^{−1} − B^{−1}||| ≤ (1/a²) |||A − B||| .

Proof. Let f(t) = −1/t. Then, by Proposition 1 we know that f is operator monotone. Since f′(t) = 1/t², it follows from Theorem 10 that

  |||A^{−1} − B^{−1}||| = |||f(A) − f(B)||| ≤ f′(a) |||A − B||| = (1/a²) |||A − B||| .

The next result states a useful fact on positive powers between zero and one.

Corollary 4 ((Bhatia, 1997), Eq. X.2). Let A, B be two positive semidefinite matrices. Then, for 0 ≤ r ≤ 1,

  ‖A^r − B^r‖ ≤ ‖A − B‖^r .

Its equivalent for positive integer powers is stated in the following result.

Proposition 2 (See (Bhatia, 1997), Eq. IX.4). For any two matrices X, Y, and for m = 1, 2, ...,

  ‖X^m − Y^m‖ ≤ m M^{m−1} ‖X − Y‖

where M = max(‖X‖, ‖Y‖).

Next we show that the spectrum of the matrix power mean is well bounded for positive powers larger than one.

Proposition 3. Let A_1, ..., A_T be symmetric positive definite matrices that are bounded below and above by α and β, i.e. αI ≤ A_i ≤ βI for positive numbers α and β. Then, for p ≥ 1, with p integer,

  α ≤ λ(M_p(A_1,...,A_T)) ≤ β .

Proof. Let S_p(A_1,...,A_T) = (1/T) Σ_{i=1}^{T} A_i^p. Then

  ⟨x, S_p(A_1,...,A_T) x⟩ = ⟨x, (1/T) Σ_{i=1}^{T} A_i^p x⟩ = (1/T) Σ_{i=1}^{T} ⟨x, A_i^p x⟩ .

Thus, we obtain the following upper bound

  max_{‖x‖=1} ⟨x, S_p(A_1,...,A_T) x⟩ = max_{‖x‖=1} (1/T) Σ_{i=1}^{T} ⟨x, A_i^p x⟩
  ≤ (1/T) Σ_{i=1}^{T} max_{‖x‖=1} ⟨x, A_i^p x⟩ ≤ β^p .

Hence λ_max(S_p(A_1,...,A_T)) ≤ β^p, and thus we obtain the corresponding upper bound λ_max(M_p(A_1,...,A_T)) ≤ β. In a similar way we obtain the following lower bound,

  min_{‖x‖=1} ⟨x, S_p(A_1,...,A_T) x⟩ = min_{‖x‖=1} (1/T) Σ_{i=1}^{T} ⟨x, A_i^p x⟩
  ≥ (1/T) Σ_{i=1}^{T} min_{‖x‖=1} ⟨x, A_i^p x⟩ ≥ α^p .

Hence λ_min(S_p(A_1,...,A_T)) ≥ α^p, and thus we obtain the corresponding lower bound λ_min(M_p(A_1,...,A_T)) ≥ α. Therefore, α ≤ λ(M_p(A_1,...,A_T)) ≤ β.

We now present the results for the case p ≥ 1 of Theorem 9.

F.1. Results for the case p ≥ 1

The following two propositions are the main ingredients for the upper bound presented in Theorem 9 for the case p ≥ 1.

Proposition 4. Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive semidefinite matrices. Then, for p ≥ 1 with p integer,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖ ≤ ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖^{1/p} .

Proof. Let S_p(A_1,...,A_T) = (1/T) Σ_{i=1}^{T} A_i^p and r = 1/p. Then,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  = ‖S_p^{1/p}(A_1,...,A_T) − S_p^{1/p}(B_1,...,B_T)‖
  = ‖S_p^r(A_1,...,A_T) − S_p^r(B_1,...,B_T)‖
  ≤ ‖S_p(A_1,...,A_T) − S_p(B_1,...,B_T)‖^r
  = ‖S_p(A_1,...,A_T) − S_p(B_1,...,B_T)‖^{1/p}
  = ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖^{1/p}

where the inequality comes from Corollary 4, giving the desired result.

Proposition 5. Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive semidefinite matrices such that λ(A_i) ≤ β and λ(B_i) ≤ β for i = 1, ..., T. Then, for p ≥ 1,

  ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖ ≤ p β^{p−1} m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ ) .

Proof. Let β_i = max(‖A_i‖, ‖B_i‖). Then,

  ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖
  = ‖ (1/T) Σ_{i=1}^{T} A_i^p − (1/T) Σ_{i=1}^{T} B_i^p ‖
  = ‖ (1/T) Σ_{i=1}^{T} (A_i^p − B_i^p) ‖
  ≤ (1/T) Σ_{i=1}^{T} ‖A_i^p − B_i^p‖
  ≤ (1/T) Σ_{i=1}^{T} p (β_i)^{p−1} ‖A_i − B_i‖
  ≤ (1/T) Σ_{i=1}^{T} p β^{p−1} ‖A_i − B_i‖
  = p β^{p−1} (1/T) Σ_{i=1}^{T} ‖A_i − B_i‖
  = p β^{p−1} m_1( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )
  ≤ p β^{p−1} m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )

where: the first inequality follows from the triangle inequality, the second inequality follows from Proposition 2, the third inequality follows as β_i ≤ β, and the last inequality comes from the monotonicity of the scalar power means.

The next Lemma contains the proof corresponding to the case of positive powers of Theorem 9.

Lemma 2 (Theorem 9 for the case p ≥ 1). Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive semidefinite matrices where λ(A_i) ≤ β and λ(B_i) ≤ β for i = 1, ..., T. Let C_p^+ = p^{1/p} β^{1−1/p}. Let p ≥ 1, with p integer. Then,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖ ≤ C_p^+ m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )^{1/p} .

Proof. We can see that

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  ≤ ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖^{1/p}
  ≤ ( p β^{p−1} m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ ) )^{1/p}
  = C_p^+ m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )^{1/p}

where the first inequality comes from Proposition 4, and the second inequality comes from Proposition 5.

F.2. Results for the case p ≤ −1

The following two propositions are the main ingredients for the upper bound presented in Theorem 9 for the case p ≤ −1.

Proposition 6. Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive definite matrices where α ≤ λ(A_i) and α ≤ λ(B_i) for i = 1, ..., T, and α > 0. Then, for p ≤ −1, with p integer,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  ≤ (1/α²) ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖^{1/|p|} .

Proof. Let S_p(A_1,...,A_T) = (1/T) Σ_{i=1}^{T} A_i^p. Then,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  = ‖S_p^{1/p}(A_1,...,A_T) − S_p^{1/p}(B_1,...,B_T)‖
  = ‖( S_{|p|}^{1/|p|}(A_1^{−1},...,A_T^{−1}) )^{−1} − ( S_{|p|}^{1/|p|}(B_1^{−1},...,B_T^{−1}) )^{−1}‖
  = ‖M_{|p|}(A_1^{−1},...,A_T^{−1})^{−1} − M_{|p|}(B_1^{−1},...,B_T^{−1})^{−1}‖
  ≤ (1/α²) ‖M_{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}(B_1^{−1},...,B_T^{−1})‖
  ≤ (1/α²) ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖^{1/|p|}

where the first inequality follows from Corollary 3 and Proposition 3, whereas the second inequality follows from Proposition 4.

Proposition 7. Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive definite matrices such that α ≤ λ(A_i) and α ≤ λ(B_i) for i = 1, ..., T. Then, for p ≤ −1, with p integer,

  ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖
  ≤ |p| α^{−(1+|p|)} m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ ) .

Proof. Let α_i = min(‖A_i‖, ‖B_i‖); then it clearly follows that 1/α_i = max(‖A_i^{−1}‖, ‖B_i^{−1}‖). Thus,

  ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖
  = ‖ (1/T) Σ_{i=1}^{T} (A_i^{−1})^{|p|} − (1/T) Σ_{i=1}^{T} (B_i^{−1})^{|p|} ‖
  = ‖ (1/T) Σ_{i=1}^{T} ( (A_i^{−1})^{|p|} − (B_i^{−1})^{|p|} ) ‖
  ≤ (1/T) Σ_{i=1}^{T} ‖(A_i^{−1})^{|p|} − (B_i^{−1})^{|p|}‖
  ≤ (1/T) Σ_{i=1}^{T} |p| (1/α_i)^{|p|−1} ‖A_i^{−1} − B_i^{−1}‖
  ≤ |p| (1/α)^{|p|−1} (1/T) Σ_{i=1}^{T} ‖A_i^{−1} − B_i^{−1}‖
  ≤ |p| (1/α)^{|p|−1} (1/T) Σ_{i=1}^{T} (1/α²) ‖A_i − B_i‖
  = |p| (1/α)^{|p|+1} (1/T) Σ_{i=1}^{T} ‖A_i − B_i‖
  = |p| (1/α)^{|p|+1} m_1( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )
  ≤ |p| α^{−(1+|p|)} m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )

where: the first inequality follows from the triangle inequality, the second inequality follows from Proposition 2, the third inequality follows as α ≤ α_i, the fourth inequality follows from Corollary 3, and the last inequality comes from the monotonicity of the scalar power means.

The next Lemma contains the proof corresponding to the case of negative powers of Theorem 9.

Lemma 3 (Theorem 9 for the case p ≤ −1). Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive definite matrices where α ≤ λ(A_i) and α ≤ λ(B_i) for i = 1, ..., T. Let C_p^- = |p|^{1/|p|} α^{−(3+1/|p|)}. Let p ≤ −1 with p integer. Then,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖ ≤ C_p^- m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )^{1/|p|} .

Proof.

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  ≤ (1/α²) ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖^{1/|p|}
  ≤ (1/α²) ( |p| α^{−(1+|p|)} m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ ) )^{1/|p|}
  = C_p^- m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )^{1/|p|}

where the first inequality comes from Proposition 6, and the second inequality comes from Proposition 7.

We are now ready to prove the result of Theorem 9.

Proof of Theorem 9. For the case p ≥ 1 see Lemma 2. For the case p ≤ −1 see Lemma 3.

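As a quick sanity check (our addition), for p = 1 the bound of Theorem 9 collapses to the triangle inequality:

% p = 1: C_1^+ = 1^{1}\beta^{0} = 1 and m_p(\cdot)^{1/p} reduces to m_1, so
% the claim of Theorem 9 becomes
\|M_1(A_1,\dots,A_T) - M_1(B_1,\dots,B_T)\|
  = \Big\|\tfrac{1}{T}\sum_{i=1}^{T}(A_i - B_i)\Big\|
  \le \tfrac{1}{T}\sum_{i=1}^{T}\|A_i - B_i\|
  = m_1\big(\|A_1 - B_1\|,\dots,\|A_T - B_T\|\big),
% which is exactly the triangle inequality for the arithmetic mean.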
G. Proof of Theorem 6

Before giving the proof of Theorem 6 we need to present two auxiliary results. The following is an auxiliary technical result that extends an implicit result stated in (Rohe et al., 2011) (p. 1908-1909) for the Frobenius norm to the case of the operator norm.

Lemma 4. Let X, X̄ ∈ R^{n×k} be matrices with orthonormal columns. Let U, V be orthonormal matrices and Σ a diagonal matrix such that

  X̄^T X = U Σ V^T

where the diagonal entries of Σ are the cosines of the principal angles between the column space of X̄ and the column space of X. Let O = U V^T. Then,

  (1/√2) ‖X − X̄ O‖ ≤ ‖sin Θ(X̄, X)‖ .

Proof. For the proof we will make use of the identity X^T X̄ O = (V Σ U^T) U V^T = V Σ V^T, and the fact that ‖A‖ = √(λ_max(A^T A)). That is,

  (X − X̄O)^T (X − X̄O)
  = (X^T − O^T X̄^T)(X − X̄O)
  = X^T X − X^T X̄ O − O^T X̄^T X + O^T X̄^T X̄ O
  = I − X^T X̄ O − O^T X̄^T X + O^T O
  = I − X^T X̄ O − O^T X̄^T X + I
  = 2I − X^T X̄ O − O^T X̄^T X
  = 2I − V Σ V^T − V Σ V^T
  = 2(I − V Σ V^T) .

Thus,

  ‖X − X̄O‖² = λ_max( (X − X̄O)^T (X − X̄O) )
  = 2 λ_max(I − V Σ V^T)
  = 2 max_i (1 − cos Θ_i)
  ≤ 2 max_i (1 − cos² Θ_i)
  = 2 max_i (sin Θ_i)²
  = 2 ‖sin Θ‖² .

Hence, (1/√2) ‖X − X̄ O‖ ≤ ‖sin Θ(X̄, X)‖.

The next result is a useful representation of the Davis-Kahan theorem. It is a technical adaption from the Frobenius norm to the operator norm based on Lemma 4 and Theorem 14.

Theorem 11. Let Σ, Σ̂ ∈ R^{p×p} be symmetric, with eigenvalues µ_1 ≥ ... ≥ µ_p and µ̂_1 ≥ ... ≥ µ̂_p respectively. Fix 1 ≤ r ≤ s ≤ p and assume that min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) > 0, where µ_0 := ∞ and µ_{p+1} := −∞. Let d := s − r + 1, and let V = (v_r, v_{r+1}, ..., v_s) ∈ R^{p×d} and V̂ = (v̂_r, v̂_{r+1}, ..., v̂_s) ∈ R^{p×d} have orthonormal columns satisfying Σ v_j = µ_j v_j and Σ̂ v̂_j = µ̂_j v̂_j for j = r, r+1, ..., s. Then there exists an orthogonal matrix O ∈ R^{d×d} such that

  (1/√2) ‖V − V̂ O‖ ≤ 2 d^{1/2} ‖Σ̂ − Σ‖ / min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) .

Proof. By Theorem 14 we have

  ‖sin Θ(V̂, V)‖_F ≤ 2 min( d^{1/2} ‖Σ̂ − Σ‖, ‖Σ̂ − Σ‖_F ) / min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) .

From Lemma 4 we can see that

  (1/√2) ‖V − V̂ O‖ ≤ ‖sin Θ(V̂, V)‖ .

Moreover, as sin Θ(V̂, V) is a diagonal matrix, it holds that

  ‖sin Θ(V̂, V)‖² = max_i (sin Θ_i)² ≤ Σ_i sin² Θ_i = ‖sin Θ(V̂, V)‖²_F .

Thus,

  (1/√2) ‖V − V̂ O‖ ≤ ‖sin Θ(V̂, V)‖ ≤ ‖sin Θ(V̂, V)‖_F .

Further, it is straightforward to see that

  min( d^{1/2} ‖Σ̂ − Σ‖, ‖Σ̂ − Σ‖_F ) ≤ d^{1/2} ‖Σ̂ − Σ‖ .

Thus, all in all, we have

  (1/√2) ‖V − V̂ O‖ ≤ ‖sin Θ(V̂, V)‖ ≤ ‖sin Θ(V̂, V)‖_F ≤ 2 d^{1/2} ‖Σ̂ − Σ‖ / min(µ_{r−1} − µ_r, µ_s − µ_{s+1})

which completes the proof.

We are now ready to give the proof of Theorem 6.

Proof of Theorem 6. The proof is an application of the Davis-Kahan theorem as presented in Theorem 11. Observe that in Theorem 11 the eigenvalues are sorted in a decreasing way, i.e. µ_1 ≥ ... ≥ µ_n, whereas in our case they are sorted in an increasing manner, i.e. λ_1 ≤ ... ≤ λ_n. Notationally, let the variables p, s, r from Theorem 11 be defined as p = s = n, r = p − k + 1.

We first focus on the case p ≤ −1. For this case we are interested in the k smallest eigenvalues, i.e. λ_1, ..., λ_k, which correspond to µ_p, ..., µ_r, where µ_p = λ_1 and µ_r = λ_k.

By definition, in Theorem 11 we have that µ_{p+1} = −∞. Thus, µ_p − µ_{p+1} = ∞. Further, we can see that µ_{r−1} − µ_r = λ_{k+1} − λ_k = (1 + ε) − m_p(1 − ρ + ε, 1 + ρ + ε), and hence by Eq. 2

  min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) = (1 + ε) − m_p(1 − ρ + ε, 1 + ρ + ε) =: γ ,

which by Theorem 11 leads to the following inequality

  ‖V_k − V̄_k O_k‖ ≤ (2^{3/2} k^{1/2} / γ) ‖L_p − L̄_p‖ = (√(8k)/γ) ‖L_p − L̄_p‖ .

By applying Theorem 5, we know that if

  δ^+ = (n/k)(p_in^+ + (k − 1) p_out^+) > 3 ln(8n/ε) , and
  δ^- = (n/k)(p_in^- + (k − 1) p_out^-) > 3 ln(8n/ε)

then with probability at least 1 − ε

  ‖V_k − V̄_k O_k‖ ≤ (√(8k)/γ) C_p^- m_{|p|}( 2√(3 ln(8n/ε)/δ^+), 2√(3 ln(8n/ε)/δ^-) )^{1/|p|}

yielding the desired result. The case for p ≥ 1 is similar, where instead of k the value k′ = k − 1 is used.

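The following small NumPy experiment (our addition, not the authors' code) illustrates the bound of Theorem 11 for the top-d eigenvectors, i.e. r = 1 and s = d, so that the relevant eigengap is µ_d − µ_{d+1}:

import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3

# Sigma with eigenvalues n, n-1, ..., 1 (decreasing), so every eigengap is 1
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
mu = np.arange(n, 0, -1).astype(float)
Sigma = (Q * mu) @ Q.T

E = rng.standard_normal((n, n)) * 0.01
Sigma_hat = Sigma + (E + E.T) / 2          # symmetric perturbation

def top_eigvecs(M, d):
    lam, U = np.linalg.eigh(M)             # eigh returns ascending order
    return U[:, ::-1][:, :d]               # top-d eigenvectors

V, V_hat = top_eigvecs(Sigma, d), top_eigvecs(Sigma_hat, d)

# orthogonal alignment O = U W^T from the SVD of V_hat^T V (cf. Lemma 4)
U_, _, Wt = np.linalg.svd(V_hat.T @ V)
O = U_ @ Wt

gap = mu[d - 1] - mu[d]                    # mu_d - mu_{d+1} in 1-indexed notation
lhs = np.linalg.norm(V - V_hat @ O, 2) / np.sqrt(2)
rhs = 2 * np.sqrt(d) * np.linalg.norm(Sigma_hat - Sigma, 2) / gap
assert lhs <= rhs
print(f"{lhs:.4f} <= {rhs:.4f}")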
H. Main building block for our results

In this section we present two results from (Chung & Radcliffe, 2011) that are the main building blocks for our results.

Theorem 12 ((Chung & Radcliffe, 2011)). Let G be a random graph, where P(v_i ∼ v_j) = p_ij, and each edge is independent of each other edge. Let A be the adjacency matrix of G, so A_ij = 1 if v_i ∼ v_j and 0 otherwise, and Ā = E(A), so Ā_ij = p_ij. Let D be the diagonal matrix with D_ii = deg(v_i), and D̄ = E(D). Let δ be the minimum expected degree of G, and L = I − D^{−1/2} A D^{−1/2} the (normalized) Laplacian matrix of G. Choose ε > 0. Then there exists a constant k = k(ε) such that if δ > k ln n, then with probability at least 1 − ε the eigenvalues of L and L̄ satisfy

  |λ_j(L) − λ_j(L̄)| ≤ 2 √(3 ln(4n/ε)/δ)

for all 1 ≤ j ≤ n, where L̄ = I − D̄^{−1/2} Ā D̄^{−1/2}.

Although this theorem is presented as the main result, one can see in the proof of Theorem 12 in (Chung & Radcliffe, 2011) that indeed what they proved was a concentration bound for ‖L − L̄‖.

Theorem 13 ((Chung & Radcliffe, 2011)). Assume that the conditions of Theorem 12 hold. Choose ε > 0. Then there exists a constant k = k(ε) such that if δ > k ln n, then

  P( ‖L − L̄‖ ≤ 2 √(3 ln(4n/ε)/δ) ) > 1 − ε .    (5)

Theorem 14 ((Yu et al., 2015)). Let Σ, Σ̂ ∈ R^{p×p} be symmetric, with eigenvalues µ_1 ≥ ... ≥ µ_p and µ̂_1 ≥ ... ≥ µ̂_p respectively. Fix 1 ≤ r ≤ s ≤ p and assume that min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) > 0, where µ_0 := ∞ and µ_{p+1} := −∞. Let d := s − r + 1, and let V = (v_r, v_{r+1}, ..., v_s) ∈ R^{p×d} and V̂ = (v̂_r, v̂_{r+1}, ..., v̂_s) ∈ R^{p×d} have orthonormal columns satisfying Σ v_j = µ_j v_j and Σ̂ v̂_j = µ̂_j v̂_j for j = r, r+1, ..., s. Then

  ‖sin Θ(V̂, V)‖_F ≤ 2 min( d^{1/2} ‖Σ̂ − Σ‖, ‖Σ̂ − Σ‖_F ) / min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) .

Moreover, there exists an orthogonal matrix Ô ∈ R^{d×d} such that

  ‖V̂ Ô − V‖_F ≤ 2^{3/2} min( d^{1/2} ‖Σ̂ − Σ‖, ‖Σ̂ − Σ‖_F ) / min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) .

I. Results on Bethe Hessian

The following Lemma 5 states that for the case where α = 1 the Bethe Hessian is equal to the arithmetic mean of Laplacians, i.e. the signed ratio Laplacian L_SR.

Lemma 5. Let α = 1. Then the Bethe Hessian is two times the arithmetic mean of L^+ and Q^-.

Proof of Lemma 5. Let J^+, J^- be the positive and negative parts of J, i.e. J_ij^+ = max{0, J_ij} and J_ij^- = −min{0, J_ij}. Let D^+ and D^- be the degree diagonal matrices of J^+ and J^- respectively, i.e. D = D^+ + D^-. Then,

  H = (α − 1)I − √α J + D
  = −J + D
  = −J^+ + J^- + D^+ + D^-
  = (D^+ − J^+) + (D^- + J^-)
  = L^+ + Q^-
  = L_SR .

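Lemma 5 is easy to verify numerically; the following short sketch (our addition) checks that for α = 1 the Bethe Hessian coincides with L^+ + Q^- on a random signed adjacency matrix:

import numpy as np

rng = np.random.default_rng(2)
n = 10
J = np.triu(rng.choice([-1.0, 0.0, 1.0], size=(n, n)), 1)
J = J + J.T                                   # signed symmetric adjacency, J = W+ - W-

Jp, Jm = np.maximum(J, 0), np.maximum(-J, 0)  # positive and negative parts
Dp, Dm = np.diag(Jp.sum(1)), np.diag(Jm.sum(1))
D = Dp + Dm

alpha = 1.0
H = (alpha - 1) * np.eye(n) - np.sqrt(alpha) * J + D
L_SR = (Dp - Jp) + (Dm + Jm)                  # L+ + Q- (unnormalized)
assert np.allclose(H, L_SR)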
Lemma 6. Let H̄ be the Bethe Hessian of the expected signed graph. Then {χ_i}_{i=2}^{k} are the eigenvectors corresponding to the (k − 1) smallest negative eigenvalues of H̄ if and only if the following conditions hold:

  1. max{0, (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|)} < (p_in^+ − p_out^+) − (p_in^- − p_out^-)
  2. p_out^+ < p_out^-

Proof. In our framework we can see that J = W^+ − W^-. In Section A we can see that the expected adjacency matrices W̄^+ and W̄^- have three distinct eigenvalues:

  λ_1^+ = |C| (p_in^+ + (k − 1) p_out^+), λ_i^+ = |C| (p_in^+ − p_out^+)
  λ_1^- = |C| (p_in^- + (k − 1) p_out^-), λ_i^- = |C| (p_in^- − p_out^-)

for i = 2, ..., k, with corresponding eigenvectors χ_1, ..., χ_k. The remaining eigenvalues are equal to zero. Further, as both matrices W̄^+ and W̄^- share all their eigenvectors, the expected matrix J̄ = W̄^+ − W̄^- has the same eigenvectors, with eigenvalues being the difference between the positive and negative counterparts, i.e. J̄ χ_i = µ_i χ_i where µ_i = λ_i^+ − λ_i^-.

As we assume that all clusters are of the same size |C|, the expected signed graph is a regular graph with degrees d^+ and d^-. Thus, in expectation α̂ = d^+ + d^-, where d^+ = |C| (p_in^+ + (k − 1) p_out^+) and d^- = |C| (p_in^- + (k − 1) p_out^-).

Hence, the Bethe Hessian of the expected signed graph can be expressed as

  H̄ = (α̂ − 1)I − √α̂ J̄ + D̄
  = (α̂ − 1)I − √α̂ J̄ + α̂ I
  = (2α̂ − 1)I − √α̂ J̄ .

It is easy to see that the matrix H̄ is a diagonal shift of J̄ (up to scaling), and thus they have the same eigenvectors. In particular we can observe that:

  H̄ χ_i = ( (2α̂ − 1)I − √α̂ J̄ ) χ_i = (2α̂ − 1) χ_i − √α̂ J̄ χ_i = (2α̂ − 1) χ_i − √α̂ µ_i χ_i = ( (2α̂ − 1) − √α̂ µ_i ) χ_i .

Hence, the corresponding eigenvalues of H̄ are:

  λ_i = (2α̂ − 1) − √α̂ µ_i .    (6)

All in all, the corresponding eigenvalues of the expected Bethe Hessian matrix H̄ are:

  λ_1 = (2α̂ − 1) − √α̂ (d^+ − d^-) ,
  λ_i = (2α̂ − 1) − √α̂ |C| ( (p_in^+ − p_out^+) − (p_in^- − p_out^-) ) ,
  λ_j = (2α̂ − 1) ,

for i = 2, ..., k and j = k + 1, ..., n.

We now focus on the conditions that are necessary so that the eigenvectors χ_2, ..., χ_k have the smallest negative eigenvalues. This is based on the fact that informative eigenvectors of the Bethe Hessian H have the smallest negative eigenvalues. From Eq. 6 we can see that the general condition for eigenvalues of the Bethe Hessian in expectation H̄ to be negative is

  λ_i < 0 ⟺ (2α̂ − 1)/√α̂ < µ_i .    (7)

Hence the conditions to be analyzed are:

  λ_i < λ_1, for i = 2, ..., k
  λ_i < 0, for i = 2, ..., k
  λ_i < λ_j, for i = 2, ..., k and j = k + 1, ..., n

Therefore we can easily see that the corresponding condition λ_i < λ_1 boils down to

  p_out^+ < p_out^-

whereas the condition λ_i < 0 is equivalent to

  (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|) < (p_in^+ − p_out^+) − (p_in^- − p_out^-)

and for the remaining condition λ_i < λ_j the equivalent condition is

  0 < (p_in^+ − p_out^+) − (p_in^- − p_out^-) .

By putting together the conditions for λ_i < 0 and λ_i < λ_j we get the desired result.

Lemma 7. Let H̄ be the Bethe Hessian of the expected non-empty signed graph. Let |V| → ∞. Then {χ_i}_{i=2}^{k} are the eigenvectors corresponding to the (k − 1) smallest negative eigenvalues of H̄ if and only if the following conditions hold:

  1. p_in^- + p_out^+ < p_in^+ + p_out^-
  2. p_out^+ < p_out^-

Proof. From Lemma 6 we have the following conditions for the recovery of informative eigenvectors on finite graphs:

  1. max{0, (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|)} < (p_in^+ − p_out^+) − (p_in^- − p_out^-)
  2. p_out^+ < p_out^-

Let

  c_2 = p_in^+ + (k − 1) p_out^+ + p_in^- + (k − 1) p_out^-
  c_3 = (p_in^+ − p_out^+) − (p_in^- − p_out^-)

The first condition of Lemma 6 can then be expressed as follows:

  (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|) < (p_in^+ − p_out^+) − (p_in^- − p_out^-)
  ⟺ 2 c_2^{1/2} / |C|^{1/2} − 1 / ( |C|^{3/2} c_2^{1/2} ) < c_3 .

Hence, in the limit where |C| → ∞ the above condition turns into

  0 < c_3 ⟺ p_in^- + p_out^+ < p_in^+ + p_out^- ,    (8)

yielding the desired conditions.

The following Lemma states the interesting fact that the Bethe Hessian works better for large graphs.

Lemma 8. Let H̄_n be the Bethe Hessian of the expected signed graph under the SBM with n nodes. Let χ^n = {χ_i}_{i=2}^{k} where χ_2, ..., χ_k ∈ R^n. Let 3/2 < d^+ + d^-. Let n < m. If χ^n are eigenvectors corresponding to the (k − 1) smallest negative eigenvalues of H̄_n, then χ^m are eigenvectors corresponding to the (k − 1) smallest negative eigenvalues of H̄_m.

Proof. In this proof we show that if the conditions of Lemma 6 hold for a given expected signed graph with n nodes, then the conditions of Lemma 6 hold for expected signed graphs with a larger number of nodes.

By Lemma 6, we know that for a given graph in expectation with n nodes, the eigenvectors χ^n = {χ_i}_{i=2}^{k} correspond to the (k − 1) smallest negative eigenvalues of H̄_n if and only if the following conditions hold:

  1. max{0, (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|)} < (p_in^+ − p_out^+) − (p_in^- − p_out^-)
  2. p_out^+ < p_out^-

Observe that the right hand side of the above conditions does not depend on the number of nodes in the graph. We proceed by analyzing the left hand side of the first condition:

  (2(d^+ + d^-) − 1) / ( |C| √(d^+ + d^-) ) .    (9)

Note that under the Stochastic Block Model in consideration, all k clusters are of size |C| = n/k. We now identify conditions such that Equation 9 decreases with larger values of |C|.

Let x, α ∈ R. Define the scalar function g : R_{>0} → R as

  g(x) = (2αx − 1) / √(αx³) .

Observe that we recover Equation 9 by letting x = |C| and α = p_in^+ + (k − 1) p_out^+ + p_in^- + (k − 1) p_out^-, where αx = d^+ + d^-. The corresponding derivative is

  g′(x) = (3 − 2αx) / ( 2x √(αx³) ) .

Then

  g′(x) < 0 ⟺ 3/2 < αx .    (10)

Hence, if 3/2 < αx then g(y) < g(x) if and only if x < y. We now apply this result to our setting.

Let |C_n| := |C| = n/k and |C_m| = m/k denote the cluster sizes of the expected signed graphs with n and m nodes, respectively. Let α = p_in^+ + (k − 1) p_out^+ + p_in^- + (k − 1) p_out^-. Let 3/2 < d^+ + d^-. Then

  g(|C_m|) < g(|C_n|) = (2(d^+ + d^-) − 1) / ( |C| √(d^+ + d^-) )    (11)

if and only if n < m. Hence, if conditions 1 and 2 hold for the expected graph G with n nodes and its expected absolute degree is larger than 3/2, i.e. 3/2 < d^+ + d^-, then conditions 1 and 2 hold for expected graphs with a larger number of nodes, leading to the desired result.

[Figure 5: Mean clustering error under the Censored Block Model (Saade et al., 2015), with two clusters of size 500 and 20 runs. Fig. 5a: the probability of observing an edge is fixed to p = 0.03, and ε ∈ [0, 0.5]. Fig. 5b: the probability of flipping the sign of an edge is fixed to ε = 0.25, and p ∈ [0.001, 0.03].]

J. Experiments with the Censored Block Model

In this section we present a numerical evaluation of different methods under the Stochastic Block Model with the parameters corresponding to the Censored Block Model (CBM), following (Saade et al., 2015). Observe that the CBM is a particular case of the Stochastic Block Model for signed graphs as introduced in Section 3. Following (Saade et al., 2015), the CBM has two parameters: the probability of observing an edge (p), and the probability of flipping the sign of an edge (ε). The CBM can be recovered from the SSBM introduced in Section 3 by setting p_in^+ = p_out^- = p(1 − ε) and p_in^- = p_out^+ = pε. Observe that the parameter ε works as a noise parameter: the noiseless setting corresponds to ε = 0, where positive and negative edges are only inside and between clusters, respectively. The case where ε = 0.5 corresponds to the case where no clustering structure is conveyed by the sign of the edges. A small sampler illustrating this parameterization is sketched below.

We present a numerical evaluation under the SSBM with parameters from the CBM in Fig. 5. We consider two clusters and fix a priori their size to be of 500 nodes each. We report the clustering error out of 20 realizations from the SSBM with parameters following the CBM. We consider two settings:

First setting: we fix the probability of observing an edge to p = 0.03, and evaluate over different values of ε ∈ [0, 0.5]. In Fig. 5a we can observe that there is no relevant difference in clustering error between methods. Further, as expected, we can see that for small values of ε all methods perform well, and for larger values of ε the clustering error increases.

Second setting: we fix the probability of flipping the sign of an edge to ε = 0.25, and evaluate over different values of p ∈ [0.001, 0.03]. In Fig. 5b we can observe that the performance of the Bethe Hessian is best for small values of p, i.e. for sparser graphs. Following the Bethe Hessian are the arithmetic mean Laplacian L_1 together with the signed normalized Laplacian L_SN.

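For concreteness, the following hedged sketch (our addition; sample_cbm is our own name) samples a signed graph from the SSBM with the CBM parameterization described above:

import numpy as np

def sample_cbm(cluster_size, k, p, eps, seed=0):
    # SSBM with CBM parameters: p_in+ = p_out- = p(1 - eps), p_in- = p_out+ = p*eps
    rng = np.random.default_rng(seed)
    n = cluster_size * k
    labels = np.repeat(np.arange(k), cluster_size)
    same = labels[:, None] == labels[None, :]
    present = rng.random((n, n)) < p            # an edge is observed w.p. p
    flip = rng.random((n, n)) < eps             # its sign is flipped w.p. eps
    # within-cluster edges are positive unless flipped; between-cluster edges
    # are negative unless flipped
    sign = np.where(same ^ flip, 1.0, -1.0)
    W = np.where(np.triu(present, 1), sign, 0.0)
    W = W + W.T
    return W, labels

W, labels = sample_cbm(500, 2, p=0.03, eps=0.25)
W_pos, W_neg = np.maximum(W, 0), np.maximum(-W, 0)   # W+ and W-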
Table 2: Experiments on UCI datasets. Positive edges generated by k-nearest neighbours, negative edges generated by k-farthest neighbours. Reported is the percentage of cases where each method achieves the smallest and strictly smallest clustering error, and the average clustering error.

                       iris  wine ecoli australian cancer vehicle german image optdig isolet  USPS pendig 20new MNIST
  # vertices            150   178   336        690    699     846   1000  2310   5620   7797  9298  10992 18846 70000
  # classes               3     3     8          2      2       4      2     7     10     26    10     10    20    10
  H     Best (%)       14.1  14.1  10.9        7.8    0.0    20.3   34.4   0.0    3.1    0.0   0.0    0.0  45.3   0.0
        Str. best (%)   4.7   3.1   7.8        3.1    0.0    14.1   17.2   0.0    3.1    0.0   0.0    0.0  45.3   0.0
        Avg. error     16.8  32.2  23.9       41.7   13.2    58.4   29.5  64.0   32.5   54.0  41.3   39.2  88.5  48.2
  LSN   Best (%)       10.9  10.9  14.1        4.7   15.6    12.5   12.5  17.2   26.6    7.8  14.1    7.8  26.6  14.1
        Str. best (%)   1.6   3.1   9.4        1.6   15.6     7.8    0.0  17.2   26.6    7.8  14.1    7.8  26.6  14.1
        Avg. error     17.5  32.2  24.5       42.8    8.8    57.2   29.9  53.9   24.9   51.2  38.6   37.8  89.0  45.8
  LBN   Best (%)        4.7  12.5   0.0        6.3    1.6     6.3   40.6   0.0    0.0    0.0   0.0    0.0   1.6   0.0
        Str. best (%)   1.6   1.6   0.0        0.0    1.6     4.7   17.2   0.0    0.0    0.0   0.0    0.0   1.6   0.0
        Avg. error     26.6  33.6  30.5       42.5   10.2    61.6   29.6  57.2   41.1   67.4  50.1   50.5  92.5  58.6
  LAM   Best (%)        6.3  20.3   7.8        6.3    0.0    20.3   15.6   9.4    0.0    0.0   0.0    0.0   4.7   0.0
        Str. best (%)   1.6   9.4   6.3        1.6    0.0     7.8    1.6   6.3    0.0    0.0   0.0    0.0   4.7   0.0
        Avg. error     19.0  32.7  24.4       42.7   11.6    58.1   29.7  47.7   33.5   49.6  44.7   48.3  89.7  56.1
  LGM   Best (%)       32.8  35.9  34.4       32.8    7.8    17.2   46.9   6.3   29.7   28.1  12.5    0.0   1.6  82.8
        Str. best (%)   1.6   7.8  21.9       23.4    6.3    14.1   25.0   6.3   28.1   28.1   9.4    0.0   1.6  82.8
        Avg. error     14.1  31.9  20.4       39.3   11.3    57.6   29.5  46.8   13.0   42.6  27.6   45.0  89.9  26.7
  L−1   Best (%)       25.0  45.3  39.1       42.2    0.0    12.5   15.6  39.1    4.7   37.5   4.7    9.4  12.5   1.6
        Str. best (%)   0.0  14.1  18.8       31.3    0.0     9.4    1.6  29.7    4.7   37.5   4.7    9.4  12.5   1.6
        Avg. error     13.8  29.8  20.3       38.2    8.3    56.2   29.8  39.7   16.3   42.1  25.2   32.9  88.3  32.3
  L−10  Best (%)       73.4  43.8  25.0       34.4   76.6    31.3   20.3  39.1   37.5   26.6  71.9   82.8   7.8   1.6
        Str. best (%)  42.2   7.8  10.9       18.8   75.0    25.0    4.7  31.3   35.9   26.6  68.8   82.8   7.8   1.6
        Avg. error     12.7  30.2  20.8       38.6    5.7    55.9   29.7  39.4   12.1   42.3  21.9   26.9  89.8  28.6

Hence we have observed that for sufficiently dense graphs following the Censored Block Model the performance of the different methods is rather similar, whereas for sparser graphs the Bethe Hessian performs best, confirming the analysis presented in (Saade et al., 2015).

K. Experiments on UCI datasets

We evaluate the signed power mean Laplacian with L−10, L−1 against LSN, LBN, LAM, LGM and H using datasets from the UCI repository. We build W^+ from the k^+ nearest neighbour graph, whereas W^- is obtained from the k^- farthest neighbour graph; a sketch of this construction is given below. For each dataset we evaluate all clustering methods over all possible choices of k^+, k^- ∈ {3, 5, 7, 10, 15, 20, 40, 60}, yielding in total 64 cases. We present the following statistics: Best (%): proportion of cases where a method yields the smallest clustering error. Strictly Best (%): proportion of cases where a method is the only one yielding the smallest clustering error. Results are shown in Table 2.

Observe that in 4 datasets H and LGM present a competitive performance. For the remaining cases we can see that the best performance is obtained by the signed power mean Laplacians L−1, L−10. This verifies the superiority of negative powers (p < 0) over positive powers (p > 0) of L_p and related approaches like LSN, LBN. Moreover, although the Bethe Hessian is known to be optimal under the sparse transition theoretic limit under the Censored Block Model (Saade et al., 2015), in contexts where graphs are unlikely to follow an SBM distribution we can see that it is outperformed by the signed power mean Laplacian L_p.

We consider a second setting where we generate k^- noiseless negative edges via cannot-link constraints between nodes of different classes. The corresponding results are shown in Table 3. We observe in this setting that the arithmetic mean Laplacian LAM presents the best performance, followed by the geometric mean Laplacian LGM and the Balance Normalized Laplacian LBN. This suggests that from the family of non-arithmetic based Laplacians the case of LGM is a reasonable option, showing certain robustness to different signed graph regimes.

We emphasize that the eigenvectors of L_p are calculated without ever computing the matrix itself, by adapting the method proposed in (Mercado et al., 2018), described in Sec. M. Also, please see Section L for a performance comparison with respect to changes in the diagonal shift on UCI datasets.

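The following sketch (our addition; the function name and the max-symmetrization convention are ours, as the paper does not spell this step out) illustrates the construction of W^+ and W^- used in these experiments:

import numpy as np
from scipy.spatial.distance import cdist

def signed_knn_graphs(X, k_pos, k_neg):
    D = cdist(X, X)                        # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)            # self sorts last, never a nearest neighbour
    n = X.shape[0]
    Wp, Wm = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        Wp[i, order[:k_pos]] = 1           # k+ nearest neighbours
        Wm[i, order[-k_neg - 1:-1]] = 1    # k- farthest neighbours, excluding self
    # symmetrize: keep an edge if either endpoint selected it (our convention)
    return np.maximum(Wp, Wp.T), np.maximum(Wm, Wm.T)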
Observe that in 4 datasets H and LGM present a compet- In this section we briefly discuss the effect of the diagonal itive performance. For the remaining cases we can see shift on the power mean Laplacian Lp for p ≤ 0. In the defi- that the best performance are obtained by the signed power nition of power mean Laplacian in Eq.1 it is mentioned that mean Laplacians L−1,L−10. This verifies the superiority for negative powers p ≤ 0 a diagonal shift is necessary. To of negative powers (p < 0) to positive (p > 0) powers of evaluate the influence of the magnitude of the diagonal shift Lp and related approaches like LSN ,LBN . Moreover, al- we perform numerical evaluations on two different kinds of though the Bethe Hessian is known to be optimal under the signed graphs: on one side we consider signed graphs gener- sparse transition theoretic limit under the Censored Block ated through the Signed Stochastic Block Model introduced Spectral Clustering of Signed Graphs via Matrix Power Means

Table 3 Experiments on UCI datasets. Positive edges generated by k-nearest neighbours. Negative edges generated are cannot links between nodes of different classes. Reported is the percentage of cases where each method achieves the smallest and stricly smallest clustering error, and the average clustering error. iris wine ecoli australian cancer vehicle german image optdig isolet USPS pendigits 20new MNIST # vertices 150 178 336 690 699 846 1000 2310 5620 7797 9298 10992 18846 70000 # classes 3 3 8 2 2 4 2 7 10 26 10 10 20 10 Best (%) 54.7 51.6 20.3 43.8 40.6 26.6 45.3 12.5 4.7 3.1 4.7 1.6 15.6 4.7 H Str. best (%) 0.0 9.4 9.4 1.6 0.0 7.8 0.0 0.0 3.1 3.1 4.7 1.6 15.6 3.1 Avg. error 3.6 14.3 15.9 8.6 0.8 32.0 8.5 16.9 6.5 47.8 15.7 10.4 87.2 10.7 Best (%) 59.4 42.2 17.2 42.2 45.3 37.5 48.4 17.2 15.6 20.3 12.5 12.5 28.1 9.4 LSN Str. best (%) 0.0 4.7 9.4 0.0 0.0 17.2 0.0 3.1 10.9 18.8 12.5 12.5 28.1 7.8 Avg. error 4.0 15.5 15.9 11.9 5.3 30.5 11.1 13.8 4.9 44.1 11.9 7.2 86.1 7.8 Best (%) 68.8 51.6 45.3 42.2 53.1 50.0 50.0 50.0 37.5 28.1 35.9 35.9 35.9 42.2 LBN Str. best (%) 3.1 12.5 40.6 0.0 1.6 28.1 0.0 35.9 34.4 26.6 35.9 35.9 35.9 42.2 Avg. error 2.5 14.6 14.9 11.0 0.8 30.1 10.1 13.6 6.6 47.0 12.5 8.8 84.6 9.2 Best (%) 42.2 25.0 26.6 0.0 0.0 23.4 0.0 45.3 45.3 35.9 46.9 48.4 18.8 45.3 LAM Str. best (%) 21.9 21.9 23.4 0.0 0.0 21.9 0.0 43.8 45.3 35.9 46.9 48.4 18.8 45.3 Avg. error 2.1 22.9 17.0 11.3 0.5 44.2 12.5 13.0 3.2 40.2 8.2 3.9 88.1 4.2 Best (%) 3.1 4.7 0.0 87.5 87.5 1.6 100.0 0.0 0.0 7.8 0.0 0.0 1.6 0.0 LGM Str. best (%) 0.0 0.0 0.0 56.3 46.9 1.6 48.4 0.0 0.0 7.8 0.0 0.0 1.6 0.0 Avg. error 5.6 28.8 19.4 2.2 4.3 55.9 0.0 35.9 9.4 41.9 21.1 25.0 89.5 25.4 Best (%) 7.8 9.4 1.6 0.0 0.0 1.6 0.0 1.6 1.6 6.3 0.0 1.6 0.0 0.0 L−1 Str. best (%) 0.0 4.7 0.0 0.0 0.0 1.6 0.0 1.6 1.6 6.3 0.0 1.6 0.0 0.0 Avg. error 3.9 28.3 19.1 10.8 4.2 54.9 22.5 27.3 9.1 41.5 19.2 15.9 89.5 19.5 Best (%) 6.3 0.0 4.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 L−10 Str. best (%) 4.7 0.0 3.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg. error 8.1 29.3 20.0 5.6 3.6 56.4 7.8 38.5 11.2 42.1 21.4 26.3 89.9 25.9 in Section3, and on the otherside we consider signed graphs Experiments with benchmark datasets. We now perform built from standard machine learning benchmark datasets a numerical evaluation on different real world networks, following SectionK. following the procedure of SectionK. Moreover, we per- form this analysis for p ∈ {−1, −10} and diagonal shifts Experiments with SSBM. We begin with experiments {10−10, 10−9,..., 103}. The corresponding results are pre- based on signed graphs following the SSBM. The corre- sented in Fig.8, where we present the average clustering sponding results are presented in Fig.6. We study the perfor- error taken across all values of k+ and k− (for more de- mance of the power mean Laplacians L ,L ,L ,L −1 −2 −5 −10 tails on the construction of the corresponding signed graphs with diagonal shifts {10−10, 10−9,..., 103}. Moreover, the please see SectionK). case where either G+ or G− are informative i.e. assorta- tive and disassortative, respectively. In particular, in top We can observe a general behaviour for L−10 across (resp. bottom) row of Fig.6 the results correspond to the datasets, where for a small diagonal shift, the clustering case where G+ (resp.G−) is fixed to be assortative (resp. error is high, and decreases for larger shifts, generally reach- disassortative). 
ing its mininum clustering error around diagonal shifts equal to one, to later present a slight increase in clustering error. We can observe that the larger the value of p, the more This confirms the proposed approach to set the diagonal shift robust the performance of the corresponding power mean to log (1 + |p|) + 10−6 which for the case of p = −10 Laplacian L to the values of the diagonal shift. For in- 10 p is ≈ 1.04. For the case of the harmonic mean Laplacian stance, we can see for L (see Figs. 6a and 6e) that the −1 L we can observe that it presents a more stable behaviour smaller the diagonal shift, the better the smaller the cluster- −1 that slightly resembles the one of L . In particular, we ing error, whereas for diagonal shifts 100, 101, 102, 103 its −10 can observe that there is a region from 10−6 to 10−1 where performance clearly deteriorates. the smallest average clustering error is achieved. Hence, On the other side we can see that the power mean Laplacian L−1 is relatively more robust to different diagonal shifts. L−10 presents a high sensibility towards the value of the This confirms the observations made based on signed graphs diagonal shift (see Figs. 6d and 6h) where the diagonal shift following the SBM. should be neither too large nor too small, being the val- On condition number. We now consider a condition num- ues {10−2, 10−1, 100} the more suitable for this particular ber approach to study the effect of the diagonal shift. Recall case. This observations are confirmation for the setting that the eigenvalue computation scheme considered in this with sparse graphs, as it is observed in Fig.7. paper is described in SectionM with the corresponding Al- Spectral Clustering of Signed Graphs via Matrix Power Means
[Figure 6: Mean clustering error under the SBM for different diagonal shifts, with sparsity 0.1. Panels (a)-(d), top row, and (e)-(h), bottom row, correspond to L−1, L−2, L−5, L−10. Details in Sec. L.]

[Figure 7: Mean clustering error under the SBM for different diagonal shifts, with sparsity 0.05. Panels (a)-(d), top row, and (e)-(h), bottom row, correspond to L−1, L−2, L−5, L−10. Details in Sec. L.]

[Figure 8: Mean clustering error of the power mean Laplacians L−1 and L−10 with diagonal shifts {10^−10, 10^−9, ..., 10^3}.]

Algorithm 2: PM applied to M_p(L_sym^+, Q_sym^-)
  Input: x_0, p < 0
  Output: eigenpair (λ, x) of M_p(L_sym^+, Q_sym^-)
  1: repeat
  2:   u_k^(1) ← (L_sym^+)^p x_k    (compute with Alg. 3)
  3:   u_k^(2) ← (Q_sym^-)^p x_k    (compute with Alg. 3)
  4:   y_{k+1} ← (1/2)(u_k^(1) + u_k^(2))
  5:   x_{k+1} ← y_{k+1} / ‖y_{k+1}‖_2
  6: until tolerance reached
  7: λ ← (x_{k+1}^T x_k)^{1/p};  x ← x_{k+1}

Algorithm 3: PKSM for the computation of x = A^p y
  Input: u_0 = y, V_0 = [ · ], p < 0
  Output: x = A^p y
  1: v_0 ← y / ‖y‖_2
  2: for s = 0, 1, 2, ..., n do
  3:   Ṽ_{s+1} ← [V_s, v_s]
  4:   V_{s+1} ← orthogonalize columns of Ṽ_{s+1}
  5:   H_{s+1} ← V_{s+1}^T A V_{s+1}
  6:   x_{s+1} ← V_{s+1} (H_{s+1})^p e_1 ‖y‖_2
  7:   if tolerance reached then break
  8:   v_{s+1} ← A v_s
  9: end for
  10: x ← x_{s+1}
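The following Python transcription (our addition; the paper's implementation is MATLAB-based and matrix-free) sketches Algorithms 2 and 3 for dense symmetric positive definite inputs; the eigenvalue estimate in the power method is our reading of line 7 of Algorithm 2:

import numpy as np

def pksm_apply(A, y, p, s_max=50, tol=1e-8):
    # Algorithm 3 (PKSM): approximate x = A^p y for symmetric positive
    # definite A (e.g. a shifted Laplacian) without ever forming A^p
    V = y[:, None] / np.linalg.norm(y)
    x_prev = None
    for _ in range(s_max):
        H = V.T @ A @ V                          # projected matrix (tridiagonal in exact arithmetic)
        lam, U = np.linalg.eigh(H)
        Hp = (U * lam**p) @ U.T                  # H^p via eigendecomposition
        x = V @ (Hp[:, 0] * np.linalg.norm(y))   # V (H^p) e_1 ||y||
        if x_prev is not None and np.linalg.norm(x - x_prev) < tol:
            return x
        x_prev = x
        w = A @ V[:, -1]                         # extend the Krylov space
        w = w - V @ (V.T @ w)                    # orthogonalize against the basis
        nw = np.linalg.norm(w)
        if nw < 1e-12:                           # Krylov space is exhausted
            return x
        V = np.hstack([V, w[:, None] / nw])
    return x

def smallest_eigpair(Lp_shift, Qm_shift, p, iters=100, seed=0):
    # Algorithm 2 (PM): power method on M_p(...)^p; its dominant eigenvector
    # is the eigenvector of the smallest eigenvalue of M_p for p < 0.
    # Lp_shift and Qm_shift are the diagonally shifted L_sym^+ and Q_sym^-.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(Lp_shift.shape[0])
    x /= np.linalg.norm(x)
    lam = None
    for _ in range(iters):
        y = 0.5 * (pksm_apply(Lp_shift, x, p) + pksm_apply(Qm_shift, x, p))
        lam = (x @ y) ** (1 / p)    # x^T M_p^p x ~ lambda^p, so this estimates lambda_min(M_p)
        x = y / np.linalg.norm(y)
    return lam, x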
On condition number. We now consider a condition number approach to study the effect of the diagonal shift. Recall that the eigenvalue computation scheme considered in this paper is described in Section M with the corresponding Algorithm 2. We can observe that the main computation steps are related to the matrix-vector operations (L_sym^+)^p x_k and (Q_sym^-)^p x_k with p < 0. We highlight that this framework considers only the case where p < 0.

Observe that in the operation (L_sym^+)^p x_k, with p < 0, the condition number plays an influential role due to the inverse operation implied by the negativity of p. Note that the eigenvalues of the normalized Laplacian L_sym^+ are contained in the interval [0, 2]; hence, it is a singular matrix. As mentioned in the definition of the power mean Laplacian in Eq. 1, a suitable diagonal shift is necessary for the case where p < 0. Hence, the eigenvalues of the shifted Laplacian L_sym^+ + µI are contained in the interval [µ, 2 + µ]; therefore, the condition number is equal to λ_max(L_sym^+ + µI)/λ_min(L_sym^+ + µI), which in this case reduces to (2 + µ)/µ. Thus, it follows that the condition number of (L_sym^+ + µI)^p is g(µ, p) := ((2 + µ)/µ)^{|p|}. It is easy to see that (2 + µ)/µ > 1, and hence g(µ, p) grows with larger values of |p|, so the condition number is larger for smaller powers p of the power mean Laplacian. Moreover, the growth rate of g(µ, p) is larger for smaller values of µ, suggesting that the shift µ should be set as large as possible. Yet, very large values of µ overwhelm the information contained in the Laplacian matrix. Hence, the diagonal shift should not be too small (due to numerical stability) and should not be too large (due to information obfuscation). This confirms the behaviour presented in Figs. 6, 7 and 8.
M. Computation of the Smallest Eigenvalues and Eigenvectors of L_p

For the computation of the eigenvectors corresponding to the smallest eigenvalues of the signed power mean Laplacian L_p with p < 0, we take the Polynomial Krylov Subspace Method for multilayer graphs presented in (Mercado et al., 2018) and apply it to our case. The corresponding adaption is presented in Algorithms 2 and 3.

We briefly explain Algorithm 2. Let λ_1 ≤ ... ≤ λ_n be the eigenvalues of L_p = M_p(L_sym^+, Q_sym^-). Let p < 0. Then the eigenvalues of L_p^p are λ_1^p ≥ ... ≥ λ_n^p; that is, the eigenvectors corresponding to the smallest eigenvalues of L_p correspond to the largest eigenvalues of L_p^p. Thus, in order to obtain the eigenvectors corresponding to the smallest eigenvalues of L_p we have to apply the power method to L_p^p. This is depicted in Algorithm 2. However, the main computational task now is the matrix-vector multiplications (L_sym^+)^p x and (Q_sym^-)^p x. These are approximated through the Polynomial Krylov Subspace Method (PKSM). This approximation method allows to obtain (L_sym^+)^p x and (Q_sym^-)^p x without ever computing the matrices (L_sym^+)^p and (Q_sym^-)^p, respectively. This is depicted in Algorithm 3.

The main idea of the PKSM s-step is to project a given matrix A onto the space K^s(A, y) = span{y, Ay, ..., A^{s−1}y} and solve the corresponding problem there. The projection onto K^s(A, y) is done by means of the Lanczos process, producing a sequence of matrices V_s with orthogonal columns, where the first column of V_s is y/‖y‖ and range(V_s) = K^s(A, y). Moreover, at each step we have A V_s = V_s H_s + v_{s+1} e_s^T, where H_s is an s × s symmetric tridiagonal matrix and e_i is the i-th canonical vector. The matrix-vector product x = A^p y is then approximated by x_s = V_s (H_s)^p e_1 ‖y‖ ≈ A^p y.

Time Execution Analysis. We present a time execution analysis in Fig. 9. We depict the mean execution time out of 10 runs of the power mean Laplacian L_p with p ∈ {−1, −2, −5, −10}. In particular, L−1(ours), L−2(ours), L−5(ours) and L−10(ours) depict the execution time using our proposed method based on Algorithm 2 together with the polynomial Krylov subspace method described in Algorithm 3. For comparison we consider L−1(eigs), which is computed with the function eigs from MATLAB instead of using Algorithm 3. All experiments are performed using one thread. For evaluation, random signed graphs following the SSBM are generated, with parameters p_in^+ = p_out^- = 0.05 and p_in^- = p_out^+ = 0.025, two equal sized clusters, and graph sizes |V| ∈ {10000, 20000, 30000, 40000}. We can observe that our computational matrix-free approach based on the polynomial Krylov subspace method systematically outperforms the natural approach based on the explicit computation of power matrices per layer.

[Figure 9: Time execution analysis. Mean execution time (sec.) against graph size |V|.]

N. Proportion of cases where conditions hold

In order to understand how often the conditions from Theorems 1, 3 and 4 hold, we perform a series of experiments. For the Bethe Hessian we take the limit result when |V| → ∞, as the corresponding conditions do not have the size of the graph as a parameter. The corresponding results are depicted in Fig. 10. We discretize each of the parameters p_in^+, p_out^+, p_in^-, p_out^- in [0, 1] in one hundred steps and count how many times the conditions of Theorems 1, 3 and 4 hold under different settings.

In Fig. 10a we analyze the case when both G^+ and G^- are informative (p_in^+ > p_out^+ and p_in^- < p_out^-). We can see that the conditions for the signed power mean Laplacian L_p are always fulfilled, whereas those of LSN and LBN hold in a significantly smaller fraction of cases, while the Bethe Hessian H is closer to the power mean Laplacians than to LSN and LBN.

In Fig. 10b we analyze the case when G^+ or G^- is informative (p_in^+ > p_out^+ or p_in^- < p_out^-). Now we see an ordering between the different L_p: the smaller the value of p, the larger the proportion of cases leading to recovery of the clusters in expectation. In particular, L−∞ always fulfills the conditions, whereas L∞ realizes the smallest proportion of cases where its conditions hold, comparable to the one of LSN and LBN, while the Bethe Hessian holds for 50% of the cases.

In Fig. 10c we treat the case where on average G^+ and G^- are informative (p_in^- + p_out^+ < p_in^+ + p_out^-). We observe the same ordering as in the previous case, and again all signed power mean Laplacians outperform LSN and LBN, while the Bethe Hessian holds for around 75% of the cases.

In Figs. 10b and 10c we observe that the difference between the signed power mean Laplacians with finite p gets smaller as the number of clusters k increases. The reason is that the eigenvalues of L̄_sym and Q̄_sym are of the form 1 ± ρ(k), where lim_{k→∞} ρ(k) = 0. Thus, as k increases the eigenvalues become equal and the gap vanishes.

O. On Wikipedia Experiments

We provide a more detailed inspection of the results from Sec. 4. In Fig. 11 we present the adjacency matrices sorted according to the identified clusters. In the first two columns (left to right) we can see that there is a large cluster (upper-left corner of each adjacency matrix) that does not resemble any structure, whereas the remaining part of the graph does present a certain clustering structure. The third and fourth columns zoom into this region, which corresponds to the results presented in Fig. 4.
[Figure 10: Proportion of cases where the conditions of Theorems 1, 3 and 4 hold under different settings, plotted against the number of clusters k ∈ {2, 5, 10, 25}. Panels: (a) both G^+ and G^- informative; (b) G^+ or G^- informative; (c) G^+ and G^- informative on average.]
[Figure 11: Sorted adjacency matrices according to the clusters identified by the power mean Laplacian L_p with p ∈ {−10, −5, −2, −1, 0, 1}. Columns from left to right: the first two columns depict the adjacency matrices W^+ and W^- sorted through the corresponding clustering; the third and fourth columns depict the portion of the adjacency matrices W^+ and W^- corresponding to the k − 1 identified clusters. Rows from top to bottom: clusterings corresponding to L−10, L−5, L−2, L−1, L0, L1.]