Spectral Clustering of Signed Graphs via Matrix Power Means

Pedro Mercado 1 2 Francesco Tudisco 3 Matthias Hein 2

1 Saarland University  2 University of Tübingen  3 University of Strathclyde. Correspondence to: Pedro Mercado.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Abstract

Signed graphs encode positive (attractive) and negative (repulsive) relations between nodes. We extend spectral clustering to signed graphs via the one-parameter family of Signed Power Mean Laplacians, defined as the matrix power mean of the normalized standard and signless Laplacians of positive and negative edges. We provide a thorough analysis of the proposed approach in the setting of a general Stochastic Block Model that includes models such as the Labeled Stochastic Block Model and the Censored Block Model. We show that in expectation the signed power mean Laplacian captures the ground truth clusters under reasonable settings where state-of-the-art approaches fail. Moreover, we prove that the eigenvalues and eigenvectors of the signed power mean Laplacian concentrate around their expectation under reasonable conditions in the general Stochastic Block Model. Extensive experiments on random graphs and real-world datasets confirm the theoretically predicted behaviour of the signed power mean Laplacian and show that it compares favourably with state-of-the-art methods.

1. Introduction

The analysis of graphs has received a significant amount of attention due to their capability to encode interactions that naturally arise in social networks. Yet, the vast majority of graph methods has focused on the case where interactions are of the same type, leaving aside the case where different kinds of interactions are available (Leskovec et al., 2010b). Graphs and networks with both positive and negative edge weights arise naturally in a number of social, biological and economic contexts. Social dynamics and relationships are intrinsically positive and negative: users of online social networks such as Slashdot and Epinions, for example, can express positive interactions, like friendship and trust, and negative ones, like enmity and distrust. Other important application settings are the analysis of gene expressions in biology (Fujita et al., 2012) or the analysis of financial and economic time sequences (Ziegler et al., 2010; Pavlidis et al., 2006), where commonly used similarity and variable-dependence measures may attain both positive and negative values (e.g., the Pearson correlation coefficient).

Although the majority of the literature has focused on graphs that encode only positive interactions, the analysis of signed graphs can be traced back to social balance theory (Cartwright & Harary, 1956; Harary, 1953; Davis, 1967), where the concept of a k-balanced signed graph is introduced. The analysis of signed networks has since been pushed forward through the study of a variety of tasks on signed graphs, for example edge prediction (Kumar et al., 2016; Leskovec et al., 2010a; Falher et al., 2017), node classification (Bosch et al., 2018; Tang et al., 2016a), node embeddings (Chiang et al., 2011; Derr et al., 2018; Kim et al., 2018; Wang et al., 2017; Yuan et al., 2017), node ranking (Chung et al., 2013; Shahriari & Jalili, 2014), and clustering (Chiang et al., 2012; Kunegis et al., 2010; Mercado et al., 2016; Sedoc et al., 2017; Doreian & Mrvar, 2009; Knyazev, 2018; Kirkley et al., 2018; Cucuringu et al., 2019; Cucuringu et al., 2018). See (Tang et al., 2016b; Gallier, 2016) for recent surveys on the topic.

In this paper we present a novel extension of spectral clustering for signed graphs. Spectral clustering (Luxburg, 2007) is a well-established technique for unsigned graphs, which partitions the set of nodes based on a k-dimensional node embedding obtained from the first eigenvectors of the graph Laplacian. Our contributions are as follows. We introduce the family of Signed Power Mean (SPM) Laplacians: a one-parameter family of graph matrices for signed graphs that blends the information from positive and negative interactions through the matrix power mean, a general class of matrix means that contains the arithmetic, geometric and harmonic means as special cases. This is inspired by recent extensions of spectral clustering which merge the information of different types of interactions via the arithmetic (Chiang et al., 2012; Kunegis et al., 2010) and geometric (Mercado et al., 2016) means of the standard and signless graph Laplacians. We analyze the performance of the signed power mean Laplacian in a general Signed Stochastic Block Model. We first provide an analysis in expectation showing that the smaller the parameter of the signed power mean Laplacian, the less restrictive are the conditions that ensure recovery of the ground truth clusters. In particular, we show that the limit cases +∞ and −∞ are related to the boolean operators AND and OR, respectively, in the sense that for the limit case +∞ clusters are recovered only if both positive and negative interactions are informative, whereas for −∞ clusters are recovered if positive or negative interactions are informative. This is consistent with related work in the context of unsigned multilayer graphs (Mercado et al., 2018). Second, we show that the eigenvalues and eigenvectors of the signed power mean Laplacian concentrate around their mean, so that our results hold also for the case where one samples from the stochastic block model. Our result extends, with changes, to the unsigned multilayer graph setting considered in (Mercado et al., 2018), where just the expected case has been studied. To our knowledge these are the first concentration results for matrix power means under any stochastic block model for signed graphs. Finally, we show that the signed power mean Laplacian compares favorably with state-of-the-art approaches through extensive numerical experiments on diverse real-world datasets. All proofs have been moved to the supplementary material.

Notation. A signed graph is a pair G± = (G+, G−), where G+ = (V, W+) and G− = (V, W−) encode positive and negative edges, respectively, with positive symmetric adjacency matrices W+ and W−, and a common vertex set V = {v_1, ..., v_n}. Note that this definition allows the simultaneous presence of both positive and negative interactions between the same two nodes. This is a major difference with respect to the alternative point of view where G± is associated to a single weight matrix W with positive and negative entries. In this case W = W+ − W−, with W+_{ij} = max{0, W_{ij}} and W−_{ij} = −min{0, W_{ij}}, implying that every interaction is either positive or negative, but not both at the same time. We denote by D+_{ii} = Σ_{j=1}^n w+_{ij} and D−_{ii} = Σ_{j=1}^n w−_{ij} the diagonal matrices of the degrees of G+ and G−, respectively, and D = D+ + D−.

2. Related Work

The study of clustering of signed graphs can be traced back to the theory of social balance (Cartwright & Harary, 1956; Harary, 1953; Davis, 1967), where a signed graph is called k-balanced if the set of vertices can be partitioned into k sets such that within the subsets there are only positive edges, and between them only negative ones.

Inspired by the notion of k-balance, different approaches for signed graph clustering have been introduced. In particular, many of them aim to extend spectral clustering to signed graphs by proposing novel signed graph Laplacians. A related approach is correlation clustering (Bansal et al., 2004). Unlike spectral clustering, where the number of clusters is fixed a priori, correlation clustering approximates the optimal number of clusters by identifying a partition that is as close as possible to being k-balanced. In this setting, the case where the number of clusters is constrained has been considered in (Giotis & Guruswami, 2006).

We briefly introduce the standard and signless Laplacians and review different definitions of Laplacians on signed graphs. The final clustering algorithm to find k clusters is the same for all of them: compute the smallest k eigenvectors of the corresponding Laplacian, use the eigenvectors to embed the nodes into R^k, and obtain the final clustering by running k-means in the embedding space. However, we will see below that in some cases we have to slightly deviate from this generic principle by using the k − 1 smallest eigenvectors instead.

Laplacians of Unsigned Graphs: In the following all weight matrices are non-negative and symmetric. Given an assortative graph G = (V, W), standard spectral clustering is based on the Laplacian and its normalized version, defined as

L = D − W,    L_sym = D^{−1/2} L D^{−1/2},

where D_{ii} = Σ_{j=1}^n w_{ij} is the diagonal matrix of the degrees of G. Both Laplacians are symmetric positive semidefinite, and the multiplicity of the eigenvalue 0 is equal to the number of connected components in G.

For disassortative graphs, i.e. when edges carry only dissimilarity information, the goal is to identify clusters such that the amount of edges between clusters is larger than the one inside clusters. Spectral clustering is extended to this setting by considering the signless Laplacian matrix and its normalized version (see e.g. (Liu, 2015; Mercado et al., 2016)), defined as

Q = D + W,    Q_sym = D^{−1/2} Q D^{−1/2}.

Both Laplacians are positive semidefinite, and the smallest eigenvalue is zero if and only if the graph has a bipartite component (Desai & Rao, 1994).
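For concreteness, the following short sketch builds these four matrices with NumPy. This is our own illustration, not the authors' reference code; the helper name is an assumption of ours, and we guard against isolated nodes, which the paper implicitly excludes.

import numpy as np

def normalized_laplacians(W):
    # Return (L_sym, Q_sym) for a symmetric non-negative weight matrix W:
    # L_sym = I - D^{-1/2} W D^{-1/2},  Q_sym = I + D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard for isolated nodes
    T = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    I = np.eye(W.shape[0])
    return I - T, I + T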
Laplacians of Signed Graphs: Signed graphs encode both positive and negative interactions. In the ideal k-balanced case positive interactions present an assortative behaviour, whereas negative interactions present a disassortative behaviour. With this in mind, several novel definitions of signed Laplacians have been proposed. We briefly review them for later reference.

In (Chiang et al., 2012) the balance ratio Laplacian and its normalized version are defined as

L_BR = D+ − W+ + W−,    L_BN = D^{−1/2} L_BR D^{−1/2},

whereas in (Kunegis et al., 2010) the signed ratio Laplacian and its normalized version have been defined as

L_SR = D − W+ + W−,    L_SN = D^{−1/2} L_SR D^{−1/2}.

The signed Laplacians L_BR and L_BN need not be positive semidefinite, while the signed Laplacians L_SR and L_SN are positive semidefinite, with eigenvalue zero if and only if the graph is 2-balanced.

In the context of correlation clustering, in (Saade et al., 2015) the Bethe Hessian is defined as

H = (α − 1)I − √α (W+ − W−) + D,

where α is the average node degree, α = (1/n) Σ_{i=1}^n D_{ii}. The Bethe Hessian H need not be positive definite. In fact, eigenvectors with negative eigenvalues bring information on the clustering structure (Saade et al., 2014).

Let L+ = D+ − W+ and Q− = D− + W− be the Laplacian and signless Laplacian of G+ and G−, respectively. As noted in (Mercado et al., 2016), L_SR = L+ + Q−, i.e. it coincides with twice the arithmetic mean of L+ and Q−. Note that the same holds for H when the average degree α is equal to one, i.e. H = L_SR when α = 1. In (Mercado et al., 2016), the arithmetic mean and geometric mean of the normalized Laplacian and its signless version are used to define new Laplacians for signed graphs:

L_AM = L+_sym + Q−_sym,    L_GM = L+_sym # Q−_sym,

where A # B = A^{1/2} (A^{−1/2} B A^{−1/2})^{1/2} A^{1/2} is the geometric mean of A and B, L+_sym = (D+)^{−1/2} L+ (D+)^{−1/2} and Q−_sym = (D−)^{−1/2} Q− (D−)^{−1/2}. While the computation of L_GM is more challenging, in (Mercado et al., 2016) it is shown that the clustering assignment obtained with the geometric mean Laplacian L_GM outperforms all other signed Laplacians.

Both the arithmetic and the geometric mean are special cases of a much richer one-parameter family of means known as power means. Based on this observation, we introduce the Signed Power Mean Laplacian in Section 2.2, defined via a matrix version of the family of power means, which we briefly review below.

2.1. Matrix Power Means

The scalar power mean of two non-negative scalars a, b is a one-parameter family of means defined for p ∈ R as m_p(a, b) = ((a^p + b^p)/2)^{1/p}. Particular cases are the arithmetic, geometric and harmonic means, as shown in Table 1. Moreover, the scalar power mean is monotone in the parameter p, i.e. m_p(a, b) ≤ m_q(a, b) when p ≤ q (see (Bullen, 2013), Ch. 3, Thm. 1), which yields the well-known arithmetic-geometric-harmonic mean inequality m_{−1}(a, b) ≤ m_0(a, b) ≤ m_1(a, b).

Table 1: Particular cases of scalar power means

    p         m_p(a, b)            name
    p → ∞     max{a, b}            maximum
    p = 1     (a + b)/2            arithmetic mean
    p → 0     √(ab)                geometric mean
    p = −1    2(1/a + 1/b)^{−1}    harmonic mean
    p → −∞    min{a, b}            minimum

As matrices do not commute, several matrix extensions of the scalar power mean have been introduced, which typically agree if the matrices commute; see e.g. Chapter 4 in (Bhatia, 2009). We consider the following matrix extension of the scalar power mean:

Definition 1 ((Bhagwat & Subramanian, 1978)). Let A, B be symmetric positive definite matrices, and p ∈ R. The matrix power mean of A, B with exponent p is

M_p(A, B) = ((A^p + B^p)/2)^{1/p},

where Y^{1/p} is the unique positive definite solution of the equation X^p = Y.

Please note that this definition can be extended to positive semidefinite matrices (Bhagwat & Subramanian, 1978) for p > 0, as M_p(A, B) still exists, whereas for p < 0 a diagonal shift is necessary to ensure that the matrices A, B are positive definite.
2.2. The Signed Power Mean Laplacian

Given a signed graph G± = (G+, G−) we define the Signed Power Mean (SPM) Laplacian L_p of G± as

L_p = M_p(L+_sym, Q−_sym).    (1)

For the case p < 0 the matrix power mean requires positive definite matrices, hence we use in this case the matrix power mean of diagonally shifted Laplacians, i.e. L+_sym + εI and Q−_sym + εI. Our theoretical analysis below holds for all possible shifts ε > 0, whereas we discuss the numerical robustness with respect to ε in the supplementary material. The clustering algorithm for identifying k clusters in signed graphs is given in Algorithm 1. Please note that for p ≥ 1 we deviate from the usual scheme and use the first k − 1 eigenvectors rather than the first k. The reason is a result of the analysis in the stochastic block model in Section 3. In general, the main influence of the parameter p of the power mean is on the ordering of the eigenvalues. In Section 3 we will see that this significantly influences the performance of different instances of SPM Laplacians; in particular, the arithmetic and geometric means discussed in (Mercado et al., 2016) are suboptimal for the recovery of the ground truth clusters. For the computation of the matrix power mean we adapt the scalable Krylov subspace-based algorithm proposed in (Mercado et al., 2018).
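To make Definition 1 and Eq. (1) concrete, here is a minimal dense sketch based on an eigendecomposition; it is our own illustration for small graphs only, whereas the Krylov-based scheme of (Mercado et al., 2018) never forms L_p explicitly. All function names are ours.

import numpy as np

def matrix_power(A, p):
    # A^p for a symmetric positive definite A via eigendecomposition.
    lam, U = np.linalg.eigh(A)
    return (U * lam**p) @ U.T

def matrix_power_mean(A, B, p):
    # M_p(A, B) = ((A^p + B^p)/2)^{1/p} of Definition 1.
    if p == 0:  # p = 0: return the matrix geometric mean A # B (cf. Theorem 2)
        iA = matrix_power(A, -0.5)
        sA = matrix_power(A, 0.5)
        return sA @ matrix_power(iA @ B @ iA, 0.5) @ sA
    return matrix_power(0.5 * (matrix_power(A, p) + matrix_power(B, p)), 1.0 / p)

def spm_laplacian(L_sym_pos, Q_sym_neg, p, eps=None):
    # Signed power mean Laplacian of Eq. (1), with the diagonal shift
    # eps = log10(1 + |p|) + 1e-6 used in the paper's experiments (Section 3).
    if eps is None:
        eps = np.log10(1.0 + abs(p)) + 1e-6
    I = np.eye(L_sym_pos.shape[0])
    return matrix_power_mean(L_sym_pos + eps * I, Q_sym_neg + eps * I, p)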

Algorithm 1: Spectral clustering of signed graphs with L_p

Input: Symmetric matrices W+, W−, number k of clusters to construct.
Output: Clusters C_1, ..., C_k.
1. Let k′ = k − 1 if p ≥ 1 and k′ = k if p < 1.
2. Compute eigenvectors u_1, ..., u_{k′} corresponding to the k′ smallest eigenvalues of L_p.
3. Set U = (u_1, ..., u_{k′}) and cluster the rows of U with k-means into clusters C_1, ..., C_k.
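A direct transcription of Algorithm 1 in Python (our own sketch for small dense graphs, reusing the hypothetical helpers normalized_laplacians and spm_laplacian from the previous snippets; scikit-learn provides k-means) could read:

import numpy as np
from sklearn.cluster import KMeans

def spm_spectral_clustering(W_pos, W_neg, k, p):
    # Algorithm 1: spectral clustering of a signed graph with L_p.
    L_sym_pos, _ = normalized_laplacians(W_pos)  # Laplacian of positive edges
    _, Q_sym_neg = normalized_laplacians(W_neg)  # signless Laplacian of negative edges
    Lp = spm_laplacian(L_sym_pos, Q_sym_neg, p)
    k_prime = k - 1 if p >= 1 else k             # step 1
    _, U = np.linalg.eigh(Lp)                    # eigenvalues in ascending order
    embedding = U[:, :k_prime]                   # step 2: k' smallest eigenvectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)  # step 3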
3. Stochastic Block Model Analysis of the Signed Power Mean Laplacian

In this section we analyze the signed power mean Laplacian L_p under a general Signed Stochastic Block Model. Our results here are twofold. First, we derive new conditions in expectation that guarantee that the eigenvectors corresponding to the smallest eigenvalues of L_p recover the ground truth clusters. These conditions reveal that, in this setting, the state-of-the-art signed graph matrices are suboptimal compared to L_p for negative values of p. Second, we show that our results in expectation transfer to sampled graphs, as we prove conditions that ensure that both eigenvalues and eigenvectors of L_p concentrate around their expected value with high probability. We verify our results by several experiments where the clustering performance of state-of-the-art matrices and L_p are compared on random graphs following the Signed Stochastic Block Model.

All proofs hold for an arbitrary diagonal shift ε > 0, whereas the shift is set to ε = log_10(1 + |p|) + 10^{−6} in all the numerical experiments. Numerical robustness with respect to ε is discussed in the supplementary material.

The Stochastic Block Model (SBM) is a well-established generative model for graphs and a canonical tool for studying clustering methods (Holland et al., 1983; Rohe et al., 2011; Abbe, 2018). Graphs drawn from the SBM show a prescribed clustering structure, as the probability of an edge between two nodes depends only on the cluster membership of each node. We introduce our SBM for signed graphs (SSBM): we consider k ground truth clusters C_1, ..., C_k, all of them of size |C| = n/k, and parameters p+_in, p+_out, p−_in, p−_out ∈ [0, 1], where p+_in (resp. p−_in) is the probability of observing an edge inside clusters in G+ (resp. G−) and p+_out (resp. p−_out) is the probability of observing an edge between clusters in G+ (resp. G−). Calligraphic letters are used for the expected matrices: 𝓦+ and 𝓦− are the expected adjacency matrices of G+ and G−, respectively, where 𝓦+_{ij} = p+_in and 𝓦−_{ij} = p−_in if v_i, v_j belong to the same cluster, whereas 𝓦+_{ij} = p+_out and 𝓦−_{ij} = p−_out if v_i, v_j belong to different clusters.

Other extensions of the SBM to the signed setting have been considered. Particularly relevant examples are the Labelled Stochastic Block Model (LSBM) (Heimlicher et al., 2012) and the Censored Block Model (CBM) (Abbe et al., 2014). In the context of signed graphs, both the LSBM and the CBM assume that an observed edge can be either positive or negative, but not both. Our SSBM, instead, allows the simultaneous presence of both positive and negative edges between the same pair of nodes, as the parameters p+_in, p+_out, p−_in, p−_out in the SSBM are independent. Moreover, the edge probabilities defining both the LSBM and the CBM can be recovered as special cases of the SSBM. In particular, the LSBM corresponds to the SSBM for the choices

p+_in = p_in μ+,   p−_in = p_in μ−   (within clusters),
p+_out = p_out ν+,   p−_out = p_out ν−   (between clusters),

where p_in and p_out are edge probabilities within and between clusters, respectively, whereas μ+ and μ− = 1 − μ+ (resp. ν+ and ν− = 1 − ν+) are the probabilities of assigning a positive and a negative label to an edge within (resp. between) clusters. Similarly, the CBM corresponds to the SSBM for the particular choices p_in = p_out, μ+ = ν− = 1 − η and μ− = ν+ = η, where η is a noise parameter.

Our goal is to identify conditions in terms of k, p+_in, p+_out, p−_in and p−_out such that C_1, ..., C_k are recovered by the smallest eigenvectors of the signed power mean Laplacian. Consider the following k vectors:

χ_1 = 1,    χ_i = (k − 1) 1_{C_i} − 1_{C̄_i},    i = 2, ..., k.

The node embedding given by {χ_i}_{i=1}^k is informative in the sense that applying k-means on {χ_i}_{i=1}^k trivially recovers the ground truth clusters C_1, ..., C_k, as all nodes of a cluster are mapped to the same point. Note that the constant vector χ_1 could be omitted as it does not add clustering information. We derive conditions for the SSBM such that {χ_i}_{i=1}^k are the smallest eigenvectors of the signed power mean Laplacian in expectation.

Theorem 1. Let 𝓛_p = M_p(𝓛+_sym, 𝓠−_sym) and let ε > 0 be the diagonal shift.
• If p ≥ 1, then {χ_i}_{i=2}^k correspond to the (k − 1) smallest eigenvalues of 𝓛_p if and only if m_p(ρ+_ε, ρ−_ε) < 1 + ε;
• If p < 1, then {χ_i}_{i=1}^k correspond to the k smallest eigenvalues of 𝓛_p if and only if m_p(ρ+_ε, ρ−_ε) < 1 + ε;
with ρ+_ε = 1 − (p+_in − p+_out)/(p+_in + (k − 1)p+_out) + ε and ρ−_ε = 1 + (p−_in − p−_out)/(p−_in + (k − 1)p−_out) + ε.

Note that Theorem 1 is the reason why Algorithm 1 uses only the first k − 1 eigenvectors for p ≥ 1. The problem is that the constant eigenvector need not be among the first k eigenvectors in the SSBM for p ≥ 1. However, as it is constant and thus uninformative in the embedding, this does not lead to any loss of information.
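The SSBM and the recovery condition of Theorem 1 are easy to simulate. The sketch below is our own illustration (all names are assumptions): it samples one layer of the model and evaluates m_p(ρ+_ε, ρ−_ε) < 1 + ε.

import numpy as np

def sample_ssbm_layer(n, k, p_in, p_out, rng):
    # One symmetric adjacency matrix of a k-cluster SSBM layer (no self-loops).
    labels = np.repeat(np.arange(k), n // k)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    A = np.triu((rng.random((n, n)) < P).astype(float), 1)
    return A + A.T

def theorem1_condition(k, pin_p, pout_p, pin_m, pout_m, p, eps):
    # m_p(rho_eps^+, rho_eps^-) < 1 + eps  <=>  the informative eigenvectors
    # lie at the bottom of the spectrum of the expected SPM Laplacian.
    rho_p = 1 - (pin_p - pout_p) / (pin_p + (k - 1) * pout_p) + eps
    rho_m = 1 + (pin_m - pout_m) / (pin_m + (k - 1) * pout_m) + eps
    mp = np.sqrt(rho_p * rho_m) if p == 0 else (0.5 * (rho_p**p + rho_m**p))**(1.0 / p)
    return mp < 1 + eps

rng = np.random.default_rng(0)
W_pos = sample_ssbm_layer(200, 2, 0.09, 0.01, rng)  # assortative positive graph
W_neg = sample_ssbm_layer(200, 2, 0.01, 0.09, rng)  # disassortative negative graph
print(theorem1_condition(2, 0.09, 0.01, 0.01, 0.09, p=-1, eps=np.log10(2) + 1e-6))  # True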

The following corollary shows that the limit cases of 𝓛_p are related to the boolean operators AND and OR.

Corollary 1. Let 𝓛_p = M_p(𝓛+_sym, 𝓠−_sym).
• {χ_i}_{i=2}^k correspond to the (k − 1) smallest eigenvalues of 𝓛_∞ if and only if p+_in > p+_out and p−_in < p−_out;
• {χ_i}_{i=1}^k correspond to the k smallest eigenvalues of 𝓛_{−∞} if and only if p+_in > p+_out or p−_in < p−_out.

The conditions for 𝓛_∞ are the most conservative ones, as they require that both G+ and G− are informative, i.e. G+ has to be assortative and G− disassortative. Under these conditions every clustering method for signed graphs should be able to identify the ground truth clusters in expectation. On the other hand, the least restrictive conditions for the recovery of the ground truth clusters correspond to the limit case 𝓛_{−∞}: if G+ or G− is informative, then the ground truth clusters are recovered; that is, 𝓛_{−∞} only requires that G+ is assortative or G− is disassortative. In particular, the following corollary shows that smaller values of p require less restrictive conditions to ensure the identification of the informative eigenvectors.

Corollary 2. Let q ≤ p. If {χ_i}_{i=θ(p)}^k correspond to the k smallest eigenvalues of 𝓛_p, then {χ_i}_{i=θ(q)}^k correspond to the k smallest eigenvalues of 𝓛_q, where θ(x) = 1 if x ≤ 0 and θ(x) = 2 if x > 0.

To better understand the different conditions we have derived, we visualize them in Fig. 1, where the x-axis corresponds to how assortative G+ is, while the y-axis corresponds to how disassortative G− is. The conditions of the limit case 𝓛_∞, i.e. the case where both G+ and G− have to be informative, correspond to the upper-right dark blue region in Fig. 1c, i.e. to 25% of all possible configurations of the SBM. The conditions for the limit case 𝓛_{−∞}, i.e. the case where G+ or G− has to be informative, instead correspond to all possible configurations of the SBM except for the bottom-left region. This is depicted in Fig. 1b and corresponds to 75% of all possible configurations under the SBM.
smallest eigenvalues of LSN . • inequalities p− + (k − 1)p− < p+ + (k − 1)p+ and In Fig.2 we present the corresponding conditions for recov- in out in out p− + p+ < p+ + p− hold. ery in expectation for the cases p ∈ {−10, −1, 0, 1, 10}. We in out in out can visually verify that the larger the value of p the smaller Finally, we present conditions in expectation for the Bethe is the region where the conditions of Theorem1 hold. In Hessian to identify the ground truth clustering. particular, one can compare the change of conditions as one Let H be the Bethe Hessian of the expected moves from the signed harmonic (L ), geometric (L ), to Theorem 4. −1 0 signed graph. Then {χ }k are the eigenvectors corre- the arithmetic (L ) mean Laplacians verifying the ordering i i=2 1 sponding to the (k − 1)-smallest negative eigenvalues of H described in Corollary2. Moreover, we clearly observe if and only if the following conditions hold: that L−10 and L10 are already quite close to the conditions 2(d++d−)−1 necessary for the limit cases L and L , respectively. 1. max{0, √ } < (p+ − p+ ) − (p− − p− ) −∞ ∞ d++d−|C| in out in out + − In the middle row of Fig.2 we show the average clustering 2. pout < pout error for each power mean Laplacian when sampling 50 Moreover, for the limit case |V | → ∞ the first condition times from the SSBM following the diagram presented in − + + − reduces to p + pout < p + pout. + − + in in Fig. 1a and fixing the sparsity of G and G by setting pin + + − − pout = 0.1 and pin + pout = 0.1 with two clusters each of Please see the supplementary material for a further analysis size 100. We observe that the areas with low clustering error in expectation. We can observe that the first condition in qualitatively match the regions where in expectation we Theorem4 is related to conditions of L1 and LSN , LBN − + + − have recovery of the clusters. However, due to the sampling through the inequality pin +pout < pin +pout. This explains Spectral Clustering of Signed Graphs via Matrix Power Means

Figure 1: Stochastic Block Model (SBM) for signed graphs. From left to right: Fig. 1a, SBM diagram; Fig. 1b, SBM for L_{−∞} (OR); Fig. 1c, SBM for L_∞ (AND), according to Corollary 1.

[Figure 2 panels. Top row, "Recovery of Clusters in Expectation" (true/false): (a) L_{−10}, (b) L_{−1}, (c) L_0, (d) L_1, (e) L_{10}. Middle row, clustering error (scale 0 to 0.5): (f) L_{−10}, (g) L_{−1}, (h) L_0, (i) L_1, (j) L_{10}. Bottom row: (k) L_GM, (l) L_SN, (m) L_BN, (n) H. All panels share axes with ticks at −0.1, 0, 0.1.]
Figure 2: Performance visualization for two clusters for different parameters of the SBM. Top row: in dark blue, the settings where the signed power mean Laplacians L_p identify the ground truth clusters in expectation for the SBM (see Theorem 1), whereas yellow indicates failure. Middle/bottom rows: average clustering error (dark blue: small error; yellow: large error) of the signed power mean Laplacian L_p and of L_GM, L_SN, L_BN, H over 50 samples from the SBM.

We now zoom in on a particular setting of Fig. 2: namely, the case where G+ (resp. G−) is fixed to be informative, whereas the remaining graph transitions from informative to uninformative. The corresponding results are shown in Fig. 3.

[Figure 3 panels: (a) and (b) show mean clustering error curves (y-axis 0.1 to 0.4, x-axis −0.05 to 0.05); (c)–(j) show node embeddings; see the caption below.]
Figure 3: Left: mean clustering error under the SSBM, with two clusters of size 100 and 50 runs. In Fig. 3a, G+ is informative, i.e. assortative with p+_in = 0.09 and p+_out = 0.01. In Fig. 3b, G− is informative, i.e. disassortative with p−_in = 0.01 and p−_out = 0.09. Right: node embeddings ((c) L_{−10}, (d) L_{−1}, (e) L_0, (f) L_1, (g) L_{10}, (h) L_SN, (i) L_BN, (j) H) induced by eigenvectors of different signed Laplacians for a random graph drawn from the SSBM with 2 clusters of size 100, p+_in = 0.025, p+_out = 0.075, p−_in = 0.01, p−_out = 0.09.

In Fig. 3a we consider the case where G+ is informative with parameters p+_in = 0.09 and p+_out = 0.01 (this corresponds to p+_in − p+_out = 0.08 in Fig. 2), and G− goes from being informative (p−_in < p−_out) to non-informative (p−_in ≥ p−_out). We confirm that the power mean Laplacian L_p presents smaller clustering errors for smaller values of p. Moreover, it is clear that in the case p < 0, L_p is able to recover clusters even in the case where G− is not informative, whereas for p > 0, L_p requires both G+ and G− to be informative. We observe that the smallest (resp. largest) clustering errors correspond to L_{−10} (resp. L_{10}), corroborating Corollary 2. Further, we can observe that L_GM and L_0 have a similar performance, as well as L_SN, L_BN, L_1, H, as observed before, confirming Theorem 2 and Theorem 4, respectively.

In Fig. 3b similar observations hold for the case where G− is informative with parameters p−_in = 0.01 and p−_out = 0.09 (this corresponds to p−_in − p−_out = −0.08 in Fig. 2), and G+ goes from being non-informative (p+_in ≤ p+_out) to informative (p+_in > p+_out). Within this setting we present the eigenvector-based node embeddings of each method for the case p+_in = 0.025, p+_out = 0.075, p−_in = 0.01, p−_out = 0.09, in the right-hand side of Fig. 3. For L_{−10}, L_{−1}, L_0 the embeddings split the clusters properly, whereas the remaining embeddings are not informative, verifying the effectiveness of L_p with p < 0.
3.1. Consistency of the Signed Power Mean Laplacian for the Stochastic Block Model

In this section we prove two novel concentration bounds for signed power mean Laplacians of signed graphs drawn from the SSBM. The bounds show that, for large graphs, our previous results in expectation transfer to sampled graphs with high probability. We first show in Theorem 5 that L_p is close to 𝓛_p. Then, in Theorem 6, we show that eigenvalues and eigenvectors of L_p are close to those of 𝓛_p. We derive this result by tracing back the consistency of the matrix power mean to the consistency of the standard and signless Laplacians established in (Chung & Radcliffe, 2011).

The consistency of spectral clustering on unsigned graphs for the SBM has been studied in (Lei & Rinaldo, 2015; Sarkar & Bickel, 2015; Rohe et al., 2011), and more recently consistency of several variants of spectral clustering has been shown (Qin & Rohe, 2013; Joseph & Yu, 2016; Chaudhuri et al., 2012; Le et al.; Fasino & Tudisco, 2018; Davis & Sethuraman, 2018). Moreover, while the case of multilayer graphs under the SBM has been previously analyzed (Han et al., 2015; Heimlicher et al., 2012; Jog & Loh, 2015; Paul & Chen, 2017; Xu et al., 2014; 2017; Yun & Proutiere, 2016), there are no consistency results for matrix power means for multilayer graphs as studied in (Mercado et al., 2018). While our main emphasis is on the analysis of the SPM Laplacian, our proofs are general enough to cover also the consistency of the matrix power means for unsigned multilayer graphs (Mercado et al., 2018).

In Theorem 5 we show that the SPM Laplacian L_p for the SSBM is concentrated around 𝓛_p with high probability for large n. The following results hold for general shifts ε.

Theorem 5. Let p be a non-zero integer, let

C_p = (2p)^{1/p} (2 + ε)^{1−1/p} for p ≥ 1,    C_p = |2p|^{1/|p|} ε^{−(3+1/|p|)} for p ≤ −1,

and choose ϵ > 0. If (n/k)(p+_in + (k − 1)p+_out) > 3 ln(8n/ϵ) and (n/k)(p−_in + (k − 1)p−_out) > 3 ln(8n/ϵ), then with probability at least 1 − ϵ we have

‖L_p − 𝓛_p‖ ≤ C_p m_{|p|}^{1/|p|}( √(3 ln(8n/ϵ) / ((n/k)(p+_in + (k − 1)p+_out))), √(3 ln(8n/ϵ) / ((n/k)(p−_in + (k − 1)p−_out))) ).

In Theorem 5 we take the spectral norm. A more general version of Theorem 5 for the inhomogeneous Erdős–Rényi model, where edges are formed independently with probabilities p+_{ij}, p−_{ij}, is given in the supplementary material. Theorem 5 builds on top of the concentration results of (Chung & Radcliffe, 2011), proven for the unsigned case ‖L+_sym − 𝓛+_sym‖. We can see that the deviation of L_p from 𝓛_p depends on the power mean of the individual deviations of L+_sym and Q−_sym from 𝓛+_sym and 𝓠−_sym, respectively. Note that the larger the size n of the graph, the stronger is the concentration of L_p around 𝓛_p.
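To get a feel for the rate in Theorem 5, the following snippet (our own illustration; parameter names are ours) evaluates the right-hand side of the bound for a concrete SSBM configuration; it decays like O(√(log(n)/n)) as the graph grows:

import numpy as np

def theorem5_bound(n, k, pin_p, pout_p, pin_m, pout_m, p, eps, eps_prob):
    # Right-hand side of Theorem 5; eps is the diagonal shift,
    # eps_prob the failure probability.
    Cp = ((2 * p)**(1 / p) * (2 + eps)**(1 - 1 / p) if p >= 1
          else abs(2 * p)**(1 / abs(p)) * eps**(-(3 + 1 / abs(p))))
    delta_p = (n / k) * (pin_p + (k - 1) * pout_p)  # expected degree in G+
    delta_m = (n / k) * (pin_m + (k - 1) * pout_m)  # expected degree in G-
    a = np.sqrt(3 * np.log(8 * n / eps_prob) / delta_p)
    b = np.sqrt(3 * np.log(8 * n / eps_prob) / delta_m)
    q = abs(p)
    return Cp * ((0.5 * (a**q + b**q))**(1 / q))**(1 / q)  # C_p m_{|p|}^{1/|p|}(a, b)

for n in (10**3, 10**4, 10**5):
    print(n, theorem5_bound(n, 2, 0.09, 0.01, 0.01, 0.09, p=1,
                            eps=np.log10(2) + 1e-6, eps_prob=0.05))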


Figure 4: Adjacency matrices W+ and W− of the Wikipedia-Elections dataset, sorted through the clustering obtained by the proposed signed power mean Laplacians L_{−10}, L_{−5}, L_{−2}, L_{−1}, L_0, L_1 (panels (a)–(f)). Top row: zoom-in visualization of positive edges W+. Bottom row: zoom-in visualization of negative edges W−. See supplementary material for more details.

The next theorem shows that the eigenvectors corresponding to the smallest eigenvalues of L_p are close to the corresponding eigenvectors of 𝓛_p. This is a key result showing consistency of our spectral clustering technique with L_p for signed graphs drawn from the SSBM.

Theorem 6. Let p ≠ 0 be an integer. Let V_k, 𝓥_k ∈ R^{n×k} be orthonormal matrices whose columns are the eigenvectors of the k smallest eigenvalues of L_p and 𝓛_p, respectively. Let ρ+_ε, ρ−_ε and C_p be defined as in Theorems 1 and 5, respectively. Define k̃ = k − 1 if p ≥ 1 and k̃ = k if p ≤ −1, and choose ϵ > 0. If m_p(ρ+_ε, ρ−_ε) < 1 + ε, δ+ := (n/k)(p+_in + (k − 1)p+_out) > 3 ln(8n/ϵ), and δ− := (n/k)(p−_in + (k − 1)p−_out) > 3 ln(8n/ϵ), then there exists an orthogonal matrix O_k̃ ∈ R^{k̃×k̃} such that, with probability at least 1 − ϵ, we have

‖V_k̃ − 𝓥_k̃ O_k̃‖ ≤ √(8k̃) C_p m_{|p|}^{1/|p|}( √(3 ln(8n/ϵ)/δ+), √(3 ln(8n/ϵ)/δ−) ) / ((1 + ε) − m_p(ρ+_ε, ρ−_ε)).

Note that the main difference compared to Theorem 5 is the spectral gap γ_p = (1 + ε) − m_p(ρ+_ε, ρ−_ε) of 𝓛_p, which is the difference between the eigenvalues corresponding to the informative and the non-informative eigenvectors of 𝓛_p. Thus, the stronger the clustering structure, the tighter is the concentration of the eigenvectors.
Moreover, from the monotonicity of m_p we have γ_p ≥ γ_q for p < q, and thus for p ≤ −1 the spectral gap increases with |p|, ensuring a stronger concentration of the eigenvectors for smaller values of p.

4. Experiments on Wikipedia-Elections

We now evaluate the signed power mean Laplacian L_p with p ∈ {−10, −5, −2, −1, 0, 1} on the Wikipedia-Elections dataset (Leskovec & Krevl, 2014). In this dataset each node represents an editor requesting to become an administrator, and positive (resp. negative) edges represent supporting (resp. opposing) votes for the corresponding admin candidate.

While (Chiang et al., 2012) conjectured that this dataset has no clustering structure, recent works (Mercado et al., 2016; Cucuringu et al., 2019) have shown that there is indeed clustering structure. As noted in (Mercado et al., 2016), using the geometric mean Laplacian L_GM and looking for k clusters unveils the presence of one large non-informative cluster and k − 1 remaining smaller clusters which show relevant clustering structure.

Our results verify these recent findings. We set the number of clusters to identify to k = 30, and in Fig. 4 we show the portion of the adjacency matrices of positive and negative edges W+ and W− corresponding to the k − 1 clusters, sorted according to the identified clusters. We can see that for p ≤ 0 the signed power mean Laplacian L_p identifies clustering structure, whereas this structure is overlooked by the arithmetic mean case p = 1. Moreover, we can see that different powers identify slightly different clusters: this happens as this dataset does not necessarily follow the Signed Stochastic Block Model, and hence we do not fully retrieve the same behaviour studied in Section 3. Further experiments on UCI datasets are available in the supplementary material, suggesting that L_GM together with L_p for p < 0 is a reasonable option under different settings.

Acknowledgments. The work of F.T. has been funded by the Marie Curie Individual Fellowship MAGNET no. 744014.

References

Abbe, E. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18(177):1–86, 2018.

Abbe, E., Bandeira, A. S., Bracher, A., and Singer, A. Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery. IEEE Transactions on Network Science and Engineering, 1(1):10–22, 2014.

Bansal, N., Blum, A., and Chawla, S. Correlation clustering. Machine Learning, 56(1):89–113, 2004.

Bhagwat, K. V. and Subramanian, R. Inequalities between means of positive operators. Mathematical Proceedings of the Cambridge Philosophical Society, 83(3):393–401, 1978.

Bhatia, R. Matrix Analysis. Springer New York, 1997.

Bhatia, R. Positive Definite Matrices. Princeton University Press, 2009.

Bosch, J., Mercado, P., and Stoll, M. Node classification for signed networks using diffuse interface methods. arXiv:1809.06432, 2018.

Bullen, P. S. Handbook of Means and Their Inequalities, volume 560. Springer Science & Business Media, 2013.

Cartwright, D. and Harary, F. Structural balance: a generalization of Heider's theory. Psychological Review, 63(5):277–293, 1956.

Chaudhuri, K., Chung, F., and Tsiatas, A. Spectral clustering of graphs with general degrees in the extended planted partition model. In COLT, 2012.

Chiang, K.-Y., Natarajan, N., Tewari, A., and Dhillon, I. S. Exploiting longer cycles for link prediction in signed networks. In CIKM, 2011.

Chiang, K., Whang, J., and Dhillon, I. Scalable clustering of signed networks using balance normalized cut. In CIKM, 2012.

Chung, F. and Radcliffe, M. On the spectra of general random graphs. The Electronic Journal of Combinatorics, 18(1):P215, 2011.

Chung, F., Tsiatas, A., and Xu, W. Dirichlet pagerank and ranking algorithms based on trust and distrust. Internet Mathematics, 9(1):113–134, 2013.

Cucuringu, M., Davies, P., Glielmo, A., and Tyagi, H. SPONGE: A generalized eigenproblem for clustering signed networks. In AISTATS, 2019.

Cucuringu, M., Pizzoferrato, A., and van Gennip, Y. An MBO scheme for clustering and semi-supervised clustering of signed networks. arXiv:1901.03091, 2018.

Davis, E. and Sethuraman, S. Consistency of modularity clustering on random geometric graphs. Annals of Applied Probability, 28(4):2003–2062, 2018.

Davis, J. A. Clustering and structural balance in graphs. Human Relations, 20:181–187, 1967.

Derr, T., Ma, Y., and Tang, J. Signed graph convolutional network. arXiv:1808.06354, 2018.

Desai, M. and Rao, V. A characterization of the smallest eigenvalue of a graph. Journal of Graph Theory, 18(2):181–194, 1994.

Doreian, P. and Mrvar, A. Partitioning signed social networks. Social Networks, 31(1):1–11, 2009.

Falher, G. L., Cesa-Bianchi, N., Gentile, C., and Vitale, F. On the Troll-Trust model for edge sign prediction in social networks. In AISTATS, 2017.

Fasino, D. and Tudisco, F. A modularity based spectral method for simultaneous community and anti-community detection. Linear Algebra and its Applications, 542:605–623, 2018.

Fujita, A., Severino, P., Kojima, K., Sato, J. R., Patriota, A. G., and Miyano, S. Functional clustering of time series gene expression data by Granger causality. BMC Systems Biology, 6(1):137, 2012.

Gallier, J. Spectral theory of unsigned and signed graphs. Applications to graph clustering: a survey. arXiv:1601.04692, 2016.

Giotis, I. and Guruswami, V. Correlation clustering with a fixed number of clusters. In SODA, 2006.

Han, Q., Xu, K. S., and Airoldi, E. M. Consistent estimation of dynamic and multi-layer block models. In ICML, 2015.

Harary, F. On the notion of balance of a signed graph. Michigan Mathematical Journal, 2:143–146, 1953.

Heimlicher, S., Lelarge, M., and Massoulié, L. Community detection in the labelled stochastic block model. arXiv:1209.2910, 2012.

Holland, P. W., Laskey, K. B., and Leinhardt, S. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

Jog, V. and Loh, P.-L. Information-theoretic bounds for exact recovery in weighted stochastic block models using the Renyi divergence. arXiv:1509.06418, 2015.

Joseph, A. and Yu, B. Impact of regularization on spectral clustering. Annals of Statistics, 44(4):1765–1791, 2016.

Kim, J., Park, H., Lee, J.-E., and Kang, U. SIDE: Representation learning in signed directed networks. In WWW, 2018.

Kirkley, A., Cantwell, G. T., and Newman, M. Balance in signed networks. arXiv:1809.05140, 2018.

Knyazev, A. On spectral partitioning of signed graphs. In SIAM Workshop on Combinatorial Scientific Computing, 2018.

Kumar, S., Spezzano, F., Subrahmanian, V., and Faloutsos, C. Edge weight prediction in weighted signed networks. In ICDM, 2016.

Kunegis, J., Schmidt, S., Lommatzsch, A., Lerner, J., Luca, E., and Albayrak, S. Spectral analysis of signed graphs for clustering, prediction and visualization. In ICDM, 2010.

Le, C. M., Levina, E., and Vershynin, R. Concentration and regularization of random graphs. Random Structures & Algorithms, 51(3):538–561.

Lei, J. and Rinaldo, A. Consistency of spectral clustering in stochastic block models. Annals of Statistics, 43(1):215–237, 2015.

Leskovec, J. and Krevl, A. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data, June 2014.

Leskovec, J., Huttenlocher, D., and Kleinberg, J. Predicting positive and negative links in online social networks. In WWW, 2010a.

Leskovec, J., Huttenlocher, D., and Kleinberg, J. Signed networks in social media. In CHI, 2010b.

Liu, S. Multi-way dual Cheeger constants and spectral bounds of graphs. Advances in Mathematics, 268:306–338, 2015.

Luxburg, U. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

Mercado, P., Tudisco, F., and Hein, M. Clustering signed networks with the geometric mean of Laplacians. In NIPS, 2016.

Mercado, P., Gautier, A., Tudisco, F., and Hein, M. The power mean Laplacian for multilayer graph clustering. In AISTATS, 2018.

Paul, S. and Chen, Y. Consistency of community detection in multi-layer networks using spectral and matrix factorization methods. arXiv:1704.07353, 2017.

Pavlidis, N. G., Plagianakos, V. P., Tasoulis, D. K., and Vrahatis, M. N. Financial forecasting through unsupervised clustering and neural networks. Operational Research, 6(2):103–127, 2006.

Qin, T. and Rohe, K. Regularized spectral clustering under the degree-corrected stochastic blockmodel. In NIPS, 2013.

Rohe, K., Chatterjee, S., and Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878–1915, 2011.

Saade, A., Krzakala, F., and Zdeborová, L. Spectral clustering of graphs with the Bethe Hessian. In NIPS, 2014.

Saade, A., Lelarge, M., Krzakala, F., and Zdeborová, L. Spectral detection in the censored block model. In IEEE International Symposium on Information Theory (ISIT), pp. 1184–1188, 2015.

Sarkar, P. and Bickel, P. J. Role of normalization in spectral clustering for stochastic blockmodels. Annals of Statistics, 43(3):962–990, 2015.

Sedoc, J., Gallier, J., Foster, D., and Ungar, L. Semantic word clusters using signed spectral clustering. In ACL, 2017.

Shahriari, M. and Jalili, M. Ranking nodes in signed social networks. Social Network Analysis and Mining, 4(1):172, 2014.

Tang, J., Aggarwal, C., and Liu, H. Node classification in signed social networks. In SDM, 2016a.

Tang, J., Chang, Y., Aggarwal, C., and Liu, H. A survey of signed network mining in social media. ACM Computing Surveys, 49(3):42:1–42:37, 2016b.

Tropp, J. A. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1–230, 2015.

Wang, S., Tang, J., Aggarwal, C., Chang, Y., and Liu, H. Signed network embedding in social media. In SDM, 2017.

Xu, J., Massoulié, L., and Lelarge, M. Edge label inference in generalized stochastic block models: from spectral theory to impossibility results. In COLT, 2014.

Xu, M., Jog, V., and Loh, P.-L. Optimal rates for community estimation in the weighted stochastic block model. arXiv:1706.01175, 2017.

Yu, Y., Wang, T., and Samworth, R. J. A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323, 2015.

Yuan, S., Wu, X., and Xiang, Y. SNE: Signed network embedding. In PAKDD, 2017.

Yun, S.-Y. and Proutiere, A. Optimal cluster recovery in the labeled stochastic block model. In NIPS, 2016.

Ziegler, H., Jenny, M., Gruse, T., and Keim, D. A. Visual market sector analysis for financial time series data. In IEEE Symposium on Visual Analytics Science and Technology (VAST), pp. 83–90, 2010.

Appendices

This section contains all the proofs of the results mentioned in the main paper. It is organized as follows:
• Section A contains the proof of Theorem 1,
• Section B contains the proof of Corollary 1,
• Section C contains the proof of Corollary 2,
• Section D contains the proof of Theorem 2,
• Section E contains the proof of Theorem 5,
• Section G contains the proof of Theorem 6,
• Section I contains the proof of Theorem 4,
• Section J contains experiments on the Censored Block Model, where the superiority of the Bethe Hessian in the sparse regime is verified,
• Section K contains experiments on UCI datasets,
• Section L contains an analysis of the diagonal shift of the signed power mean Laplacian,
• Section M contains a description of the numerical scheme for the computation of eigenvectors without ever computing the power mean Laplacian matrix,
• Section N contains a further study of the conditions in expectation of the power mean Laplacian and of state-of-the-art approaches,
• Section O contains a more detailed inspection of the results on the Wikipedia-Elections dataset presented in Section 4.

Before starting with the general results, we first mention a couple of basic facts. The following theorem states the monotonicity of the scalar power mean.

Theorem 7 ((Bullen, 2013), Ch. 3, Thm. 1). Let p < q. Then m_p(a, b) ≤ m_q(a, b), with equality if and only if a = b.

The following lemma shows the effect of the matrix power mean when matrices have a common eigenvector. Lemma 1 ((Mercado et al., 2018)). Let u be an eigenvector of both A and B, with corresponding eigenvalues α and β. Then u is an eigenvector of Mp(A, B) with eigenvalue mp(α, β).

A. Proof of Theorem 1

Proof (of Theorem 1). We first show that χ_1, ..., χ_k are eigenvectors of 𝓦+ and 𝓦−. For χ_1 we have

𝓦+ χ_1 = 𝓦+ 1 = |C| (p+_in + (k − 1)p+_out) 1 = d+ 1 = λ+_1 1.

For the remaining vectors χ_2, ..., χ_k we have

𝓦+ χ_i = 𝓦+ ((k − 1) 1_{C_i} − 1_{C̄_i})
       = 𝓦+ (k 1_{C_i} − (1_{C_i} + 1_{C̄_i}))
       = 𝓦+ (k 1_{C_i} − 1)
       = k |C| (p+_in 1_{C_i} + p+_out 1_{C̄_i}) − d+ 1
       = k |C| (p+_in 1_{C_i} + p+_out 1_{C̄_i}) − d+ (1_{C_i} + 1_{C̄_i})
       = |C| (k − 1)(p+_in − p+_out) 1_{C_i} − |C| (p+_in − p+_out) 1_{C̄_i}
       = |C| (p+_in − p+_out) ((k − 1) 1_{C_i} − 1_{C̄_i})
       = |C| (p+_in − p+_out) χ_i = λ+_i χ_i.

The same procedure holds for 𝓦−. Thus, we have shown that χ_1, ..., χ_k are eigenvectors of both 𝓦+ and 𝓦−. In particular, we have seen that

λ+_1 = |C| (p+_in + (k − 1)p+_out),   λ+_i = |C| (p+_in − p+_out),
λ−_1 = |C| (p−_in + (k − 1)p−_out),   λ−_i = |C| (p−_in − p−_out),

for i = 2, ..., k. Further, as the matrices 𝓦+ and 𝓦− share all their eigenvectors, they are simultaneously diagonalizable, that is, there exists a non-singular matrix Σ such that Σ^{−1} 𝓦± Σ = Λ±, where Λ+ and Λ− are the diagonal matrices Λ± = diag(λ±_1, ..., λ±_k, 0, ..., 0).

As we assume that all clusters are of the same size |C|, the expected signed graph is a regular graph with degrees d+ and d−. Hence, the normalized Laplacian and the normalized signless Laplacian of the expected signed graph can be expressed as

𝓛+_sym = Σ (I − (1/d+) Λ+) Σ^{−1},    𝓠−_sym = Σ (I + (1/d−) Λ−) Σ^{−1}.

Thus, we can observe that

λ_1(𝓛+_sym) = 0,        λ_1(𝓠−_sym) = 2,
λ_i(𝓛+_sym) = 1 − ρ+,   λ_i(𝓠−_sym) = 1 + ρ−,
λ_j(𝓛+_sym) = 1,        λ_j(𝓠−_sym) = 1,

for i = 2, ..., k and j = k + 1, ..., |V|, where

ρ+ = (p+_in − p+_out)/(p+_in + (k − 1)p+_out),    ρ− = (p−_in − p−_out)/(p−_in + (k − 1)p−_out).

By taking the signed power mean Laplacian of the diagonally shifted matrices,

𝓛_p = M_p(𝓛+_sym + εI, 𝓠−_sym + εI),

we have by Lemma 1

λ_1(𝓛_p) = m_p(λ_1(𝓛+_sym) + ε, λ_1(𝓠−_sym) + ε) = m_p(ε, 2 + ε),
λ_i(𝓛_p) = m_p(1 − ρ+ + ε, 1 + ρ− + ε),    (2)
λ_j(𝓛_p) = m_p(λ_j(𝓛+_sym) + ε, λ_j(𝓠−_sym) + ε) = 1 + ε.

Observe that λ_j(𝓛_p), with j = k + 1, ..., |V|, corresponds to eigenvectors that do not yield an informative embedding. Hence, we do not want this eigenvalue to belong to the bottom of the spectrum of 𝓛_p. Thus, for the case of χ_2, ..., χ_k, we can see that they will be located at the bottom of the spectrum if and only if the following condition holds:

λ_i(𝓛_p) = m_p(1 − ρ+ + ε, 1 + ρ− + ε) < 1 + ε = λ_j(𝓛_p).

It remains to analyze the case of the constant eigenvector χ_1. Note that its associated eigenvalue λ_1(𝓛_1) has the following relationship to the non-informative eigenvalues:

λ_1(𝓛_1) = m_1(ε, 2 + ε) = 1 + ε = λ_j(𝓛_p).

By Theorem 7 we know that the scalar power mean is monotone in its parameter p, and thus, for the case p < 1 we observe

λ_1(𝓛_p) = m_p(ε, 2 + ε) < m_1(ε, 2 + ε) = λ_1(𝓛_1) = λ_j(𝓛_p),

and for the case p ≥ 1 we observe

λ_1(𝓛_p) = m_p(ε, 2 + ε) ≥ m_1(ε, 2 + ε) = λ_1(𝓛_1) = λ_j(𝓛_p).

This means that for powers p ≥ 1 the constant eigenvector χ_1 need not belong to the bottom of the spectrum, whereas for p < 1 it always does. With this in mind, we reach the desired result, namely:
• Let p ≥ 1. Then {χ_i}_{i=2}^k correspond to the (k − 1) smallest eigenvalues of 𝓛_p if and only if m_p(ρ+_ε, ρ−_ε) < 1 + ε;
• Let p < 1. Then {χ_i}_{i=1}^k correspond to the k smallest eigenvalues of 𝓛_p if and only if m_p(ρ+_ε, ρ−_ε) < 1 + ε;
where ρ+_ε = 1 − ρ+ + ε and ρ−_ε = 1 + ρ− + ε.
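The eigenvalue formula (2) derived above is easy to verify numerically on an expected SSBM graph. The following sketch (ours, reusing the hypothetical helpers from Section 2) prints the k smallest eigenvalues of 𝓛_p next to m_p(1 − ρ+ + ε, 1 + ρ− + ε):

import numpy as np

n, k, eps, p = 120, 3, 0.1, -2
pin_p, pout_p, pin_m, pout_m = 0.09, 0.01, 0.01, 0.09
labels = np.repeat(np.arange(k), n // k)
same = labels[:, None] == labels[None, :]
L_pos, _ = normalized_laplacians(np.where(same, pin_p, pout_p))
_, Q_neg = normalized_laplacians(np.where(same, pin_m, pout_m))
Lp = spm_laplacian(L_pos, Q_neg, p, eps=eps)
rho_p = (pin_p - pout_p) / (pin_p + (k - 1) * pout_p)
rho_m = (pin_m - pout_m) / (pin_m + (k - 1) * pout_m)
predicted = (0.5 * ((1 - rho_p + eps)**p + (1 + rho_m + eps)**p))**(1 / p)
# eigenvalues 2..k equal `predicted`; the first one is m_p(eps, 2 + eps)
print(np.linalg.eigvalsh(Lp)[:k], predicted)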

B. Proof of Corollary 1

Proof of Corollary 1. Following the proof of Theorem 1, recall that lim_{p→∞} m_p(x) = max{x_1, ..., x_T} and lim_{p→−∞} m_p(x) = min{x_1, ..., x_T}. Thus m_∞(ρ+_ε, ρ−_ε) = max(1 − ρ+ + ε, 1 + ρ− + ε), and hence m_∞(ρ+_ε, ρ−_ε) < 1 + ε if and only if ρ+ > 0 and ρ− < 0, yielding the desired conditions. The case p → −∞ is analogous: m_{−∞}(ρ+_ε, ρ−_ε) = min(1 − ρ+ + ε, 1 + ρ− + ε), and thus m_{−∞}(ρ+_ε, ρ−_ε) < 1 + ε if and only if ρ+ > 0 or ρ− < 0, yielding the desired conditions.

C. Proof of Corollary 2

Proof of Corollary 2. If λ_1, ..., λ_k (resp. λ_2, ..., λ_k) are among the k (resp. k − 1) smallest eigenvalues of 𝓛_p, then by Theorem 1 we have m_p(ρ+_ε, ρ−_ε) < 1 + ε. By Theorem 7 we have m_q(ρ+_ε, ρ−_ε) ≤ m_p(ρ+_ε, ρ−_ε), and Theorem 1 concludes the proof.

D. Proof of Theorem 2

Proof of Theorem 2. Following the proof of Theorem 1, we can see that 𝓛+_sym and 𝓠−_sym share all of their eigenvectors. Let u be an eigenvector of 𝓛+_sym and 𝓠−_sym with eigenvalues α and β, respectively. By Lemma 1 we have

𝓛_0 u = m_0(α, β) u.

Moreover, from Theorem 1 in (Mercado et al., 2016) we know that

(𝓛+_sym # 𝓠−_sym) u = √(αβ) u.

Further, m_0(α, β) = √(αβ). Hence, as 𝓛_0 and 𝓛+_sym # 𝓠−_sym have all eigenvectors and eigenvalues in common, we conclude that 𝓛_0 = 𝓛+_sym # 𝓠−_sym.

E. Proof of Theorem 5

For the proof of Theorem 5, we first present Theorem 8, which is a general version that allows choosing different diagonal shifts of the Laplacians together with different edge probabilities.

Theorem 8. Let G+ and G− be random graphs with independent edges, P(W+_{ij} = 1) = p+_{ij} and P(W−_{ij} = 1) = p−_{ij}. Let δ+, δ− be the minimum expected degrees of G+ and G−, respectively. Let C+_p = p^{1/p} β^{1−1/p} and C−_p = |p|^{1/|p|} α^{−(3+1/|p|)}. Choose ϵ > 0. Then there exist constants k+ = k+(ϵ/2) and k− = k−(ϵ/2) such that if δ+ > k+ ln n and δ− > k− ln n, then with probability at least 1 − ϵ,

‖L_p − 𝓛_p‖ ≤ C+_p m_p( 2√(3 ln(8n/ϵ)/δ+), 2√(3 ln(8n/ϵ)/δ−) )^{1/p}

for p ≥ 1, with p integer, and

‖L_p − 𝓛_p‖ ≤ C−_p m_{|p|}( 2√(3 ln(8n/ϵ)/δ+), 2√(3 ln(8n/ϵ)/δ−) )^{1/|p|}

for p ≤ −1, with p integer, where L_p = M_p(L+_sym + αI, Q−_sym + αI) and 𝓛_p = M_p(𝓛+_sym + αI, 𝓠−_sym + αI).

Before starting the proof of Theorem 8, we present an upper bound on the matrix power mean.

Theorem 9. Let A_1, ..., A_T, B_1, ..., B_T be symmetric matrices with α ≤ λ(A_i) ≤ β and α ≤ λ(B_i) ≤ β for i = 1, ..., T, and α, β > 0. Let C+_p = p^{1/p} β^{1−1/p} and C−_p = |p|^{1/|p|} α^{−(3+1/|p|)}. Then, for p ≥ 1, with p integer,

‖M_p(A_1, ..., A_T) − M_p(B_1, ..., B_T)‖ ≤ C+_p m_p(‖A_1 − B_1‖, ..., ‖A_T − B_T‖)^{1/p},

and, for p ≤ −1, with p integer,

‖M_p(A_1, ..., A_T) − M_p(B_1, ..., B_T)‖ ≤ C−_p m_{|p|}(‖A_1 − B_1‖, ..., ‖A_T − B_T‖)^{1/|p|}.

Proof. The proof is contained in Section F.

Observe that the upper bound in Theorem 9 is general, in the sense that it is suitable for symmetric positive definite matrices with bounded spectrum and for an arbitrary number of matrices. We are now ready to prove Theorem 8.

Proof of Theorem 8. Let

A_1 = L+_sym,  B_1 = 𝓛+_sym,  A_2 = Q−_sym,  B_2 = 𝓠−_sym,

with the corresponding signed power mean Laplacians

L_p = M_p(L+_sym, Q−_sym),    𝓛_p = M_p(𝓛+_sym, 𝓠−_sym).

We start with the case p ≥ 1, with p integer. Let C+_p = p^{1/p} β^{1−1/p}. By Theorem 9 we have

‖L_p − 𝓛_p‖ ≤ C+_p m_p(‖L+_sym − 𝓛+_sym‖, ‖Q−_sym − 𝓠−_sym‖)^{1/p}.

Let γ = (γ_1, γ_2), where

γ_1 = 2√(3 ln(8n/ϵ)/δ+),    γ_2 = 2√(3 ln(8n/ϵ)/δ−).

Define c = C+_p and a = c m_p(γ)^{1/p}. Then

P(‖L_p − 𝓛_p‖ > a)
  ≤ P( c m_p(‖L+_sym − 𝓛+_sym‖, ‖Q−_sym − 𝓠−_sym‖)^{1/p} > a )
  = P( m_p(‖L+_sym − 𝓛+_sym‖, ‖Q−_sym − 𝓠−_sym‖) > (a/c)^p )
  = P( ‖L+_sym − 𝓛+_sym‖^p + ‖Q−_sym − 𝓠−_sym‖^p > 2 (a/c)^p )
  = P( ‖L+_sym − 𝓛+_sym‖^p + ‖Q−_sym − 𝓠−_sym‖^p > Σ_{i=1}^2 γ_i^p )
  ≤ P( {‖L+_sym − 𝓛+_sym‖^p > γ_1^p} ∪ {‖Q−_sym − 𝓠−_sym‖^p > γ_2^p} )
  ≤ P( ‖L+_sym − 𝓛+_sym‖^p > γ_1^p ) + P( ‖Q−_sym − 𝓠−_sym‖^p > γ_2^p )    (3)
  = P( ‖L+_sym − 𝓛+_sym‖ > γ_1 ) + P( ‖Q−_sym − 𝓠−_sym‖ > γ_2 )
  = P( ‖L+_sym − 𝓛+_sym‖ > 2√(3 ln(4n/ϵ̂)/δ+) ) + P( ‖L−_sym − 𝓛−_sym‖ > 2√(3 ln(4n/ϵ̂)/δ−) )
  ≤ ϵ̂ + ϵ̂ = ϵ,    (4)

where ϵ̂ = ϵ/2. Inequality (3) follows from Boole's inequality. Inequality (4) comes from applying Theorem 12 from (Chung & Radcliffe, 2011) to G+ and G−, with corresponding minimum expected degrees δ+ and δ−, respectively, and ϵ̂, together with

‖Q−_sym − 𝓠−_sym‖ = ‖(I + T) − (I + 𝓣)‖ = ‖(I − T) − (I − 𝓣)‖ = ‖L−_sym − 𝓛−_sym‖,

where T = (D−)^{−1/2} W− (D−)^{−1/2} and 𝓣 = (𝓓−)^{−1/2} 𝓦− (𝓓−)^{−1/2}. Thus

P(‖L_p − 𝓛_p‖ > a) < ϵ,   and hence   P(‖L_p − 𝓛_p‖ ≤ a) ≥ 1 − ϵ,

completing the proof for the case p ≥ 1. For the proof of the case p ≤ −1, with p integer, let c = |p|^{1/|p|} α^{−(3+1/|p|)} and proceed as for the previous case with |p|.

We now finally give the proof of Theorem 5.

Proof of Theorem 5. We adapt the general version presented in Theorem 8 to our particular case, by showing that our Stochastic Block Model together with the shift of our model are particular cases of Theorem 8. First, note that the spectrum of the normalized Laplacians L+_sym and Q−_sym is upper bounded by two, i.e. λ(L+_sym), λ(Q−_sym) ∈ [0, 2]. Hence, by adding a diagonal shift we get λ(L+_sym + αI), λ(Q−_sym + αI) ∈ [α, 2 + α]. Letting α = ε and β = 2 + α, we get the shift corresponding to the particular case of Theorem 5.

Further, observe that our SBM is obtained by setting p+_{ij} = p+_in and p−_{ij} = p−_in if v_i, v_j belong to the same cluster, and p+_{ij} = p+_out and p−_{ij} = p−_out if v_i, v_j belong to different clusters. Moreover, under the Stochastic Block Model considered here, the induced expected graphs are regular, and thus all nodes have the same degree. Hence, the minimum expected degrees of G+ and G− are

δ+ = (n/k)(p+_in + (k − 1)p+_out),    δ− = (n/k)(p−_in + (k − 1)p−_out).

Thus, plugging these settings into Theorem 8 we get the desired result, except that the condition on the minimum expected degrees is that there exist constants k+ = k+(ϵ/2) and k− = k−(ϵ/2) such that the desired concentration holds. To overcome this, observe in the proof of Theorem 12 (p. 9) that the condition δ > k ln(n) comes from the requirement √(3 ln(4n/ϵ)/δ) < 1. Thus, setting δ > 3 ln(4n/ϵ) fulfills the condition. In our case this yields δ+ > 3 ln(4n/ϵ̂) = 3 ln(8n/ϵ) and δ− > 3 ln(4n/ϵ̂) = 3 ln(8n/ϵ), leading to the desired result.

  Before going into the proof, a set of preliminary results are kL − L k ≤ a < 1 −  P p p necessary. completing the proof for the case p ≥ 1. In what follows, for Hermitian matrices A and B we mean by A  B that B − A is positive semidefinite (see (Bhatia, For the proof of the case p ≤ −1 with p integer, let c = 1997), Ch. 5, and (Tropp, 2015), Ch. 2.1.8 for more details). 1/|p| |p| α−(3+1/|p|), and proceed as for the previous case We now proceed with the definition of a operator monotone with |p|. function: Definition 2 ((Tropp, 2015) Ch. 8.4.2, (Bhatia, 1997) Ch. We now finally give the proof for Theorem5. 5.). Let f:I → R be a function on an interval I of the Spectral Clustering of Signed Graphs via Matrix Power Means real line. The function f is operator monotone on I when Next we show that the spectrum of the matrix power mean A  B implies f(A)  f(B) for all Hermitian matrices A is well bounded for positive powers larger than one. and B whose eigenvalues are contained in I. Proposition 3. Let A1,...,AT be symmetric positive defi- nite matrices that are bounded below and above by α and The following result states that the negative inverse is opera- β; i.e. αI ≤ A ≤ βI for positive numbers α and β. Then, tor monotone. i for p ≥ 1, with p integer Proposition 1 ((Bhatia, 1997), Prop. V.1.6, (Tropp, 2015), Prop. 8.4.3). The function f(t) = − 1 is operator monotone t α ≤ λ(Mp(A1,...,AT )) ≤ β on (0, ∞).

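As a numerical aside (our addition, not part of the paper), the bound of Theorem 9 can be sanity-checked in a few lines of NumPy for T = 2; matrix_power, power_mean and random_spd below are our own helper names:

import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 20, 0.5, 3.0

def matrix_power(A, p):
    # A^p for a symmetric positive definite A via its eigendecomposition
    lam, U = np.linalg.eigh(A)
    return (U * lam**p) @ U.T

def power_mean(A, B, p):
    # M_p(A, B) = ((A^p + B^p) / 2)^(1/p)
    return matrix_power(0.5 * (matrix_power(A, p) + matrix_power(B, p)), 1.0 / p)

def random_spd():
    # random symmetric matrix with spectrum inside [alpha, beta]
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return (Q * rng.uniform(alpha, beta, n)) @ Q.T

A1, A2, B1, B2 = (random_spd() for _ in range(4))
for p in (2, -2):
    lhs = np.linalg.norm(power_mean(A1, A2, p) - power_mean(B1, B2, p), 2)
    d1 = np.linalg.norm(A1 - B1, 2)
    d2 = np.linalg.norm(A2 - B2, 2)
    q = abs(p)
    m = ((d1**q + d2**q) / 2) ** (1 / q)    # scalar power mean m_|p| of the norms
    C = p**(1/p) * beta**(1 - 1/p) if p >= 1 else q**(1/q) * alpha**(-(3 + 1/q))
    assert lhs <= C * m**(1/q) + 1e-9       # bound of Theorem 9

Note how the constant C_p^- blows up as α decreases, which is consistent with the role of the diagonal shift discussed in Section L.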
The following result states that the effect of operator monotone functions can be upper bounded in a helpful way.

Theorem 10 ((Bhatia, 1997), Theorem X.3.8). Let f be an operator monotone function on (0, ∞) and let A, B be two positive definite matrices that are bounded below by a, i.e. A ≥ aI and B ≥ aI for the positive number a. Then for every unitarily invariant norm

  |||f(A) − f(B)||| ≤ f′(a) |||A − B||| .

Applying this to the case of the negative inverse leads to the following Corollary.

Corollary 3. Let A, B be two positive definite matrices that are bounded below by a, i.e. A ≥ aI and B ≥ aI for the positive number a. Then for every unitarily invariant norm

  |||A^{−1} − B^{−1}||| ≤ (1/a²) |||A − B||| .

Proof. Let f(t) = −1/t. Then, by Proposition 1 we know that f is operator monotone. Since f′(t) = 1/t², it follows from Theorem 10 that

  |||A^{−1} − B^{−1}||| = |||f(A) − f(B)||| ≤ f′(a) |||A − B||| = (1/a²) |||A − B||| .

The next result states a useful fact on positive powers between zero and one.

Corollary 4 ((Bhatia, 1997), Eq. X.2). Let A, B be two positive semidefinite matrices. Then, for 0 ≤ r ≤ 1,

  ‖A^r − B^r‖ ≤ ‖A − B‖^r .

Its equivalent for positive integer powers is stated in the following result.

Proposition 2 (See (Bhatia, 1997), Eq. IX.4). For any two matrices X, Y, and for m = 1, 2, ...,

  ‖X^m − Y^m‖ ≤ m M^{m−1} ‖X − Y‖

where M = max(‖X‖, ‖Y‖).

Next we show that the spectrum of the matrix power mean is well bounded for positive powers larger than one.

Proposition 3. Let A_1, ..., A_T be symmetric positive definite matrices that are bounded below and above by α and β, i.e. αI ≤ A_i ≤ βI for positive numbers α and β. Then, for p ≥ 1, with p integer,

  α ≤ λ(M_p(A_1,...,A_T)) ≤ β .

Proof. Let S_p(A_1,...,A_T) = (1/T) Σ_{i=1}^{T} A_i^p. Then

  ⟨x, S_p(A_1,...,A_T) x⟩ = ⟨x, (1/T) Σ_{i=1}^{T} A_i^p x⟩ = (1/T) Σ_{i=1}^{T} ⟨x, A_i^p x⟩ .

Thus, we obtain the following upper bound

  max_{‖x‖=1} ⟨x, S_p(A_1,...,A_T) x⟩ = max_{‖x‖=1} (1/T) Σ_{i=1}^{T} ⟨x, A_i^p x⟩
  ≤ (1/T) Σ_{i=1}^{T} max_{‖x‖=1} ⟨x, A_i^p x⟩ ≤ β^p .

Hence λ_max(S_p(A_1,...,A_T)) ≤ β^p, and thus we obtain the corresponding upper bound λ_max(M_p(A_1,...,A_T)) ≤ β. In a similar way we obtain the following lower bound,

  min_{‖x‖=1} ⟨x, S_p(A_1,...,A_T) x⟩ = min_{‖x‖=1} (1/T) Σ_{i=1}^{T} ⟨x, A_i^p x⟩
  ≥ (1/T) Σ_{i=1}^{T} min_{‖x‖=1} ⟨x, A_i^p x⟩ ≥ α^p .

Hence λ_min(S_p(A_1,...,A_T)) ≥ α^p, and thus we obtain the corresponding lower bound λ_min(M_p(A_1,...,A_T)) ≥ α. Therefore, α ≤ λ(M_p(A_1,...,A_T)) ≤ β.

We now present the results for the case p ≥ 1 of Theorem 9.

F.1. Results for the case p ≥ 1

The following two propositions are the main ingredients for the upper bound presented in Theorem 9 for the case p ≥ 1.

Proposition 4. Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive semidefinite matrices. Then, for p ≥ 1 with p integer,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖ ≤ ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖^{1/p} .

Proof. Let S_p(A_1,...,A_T) = (1/T) Σ_{i=1}^{T} A_i^p and r = 1/p. Then,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  = ‖S_p^{1/p}(A_1,...,A_T) − S_p^{1/p}(B_1,...,B_T)‖
  = ‖S_p^r(A_1,...,A_T) − S_p^r(B_1,...,B_T)‖
  ≤ ‖S_p(A_1,...,A_T) − S_p(B_1,...,B_T)‖^r
  = ‖S_p(A_1,...,A_T) − S_p(B_1,...,B_T)‖^{1/p}
  = ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖^{1/p}

where the inequality comes from Corollary 4, giving the desired result.

Proposition 5. Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive semidefinite matrices such that λ(A_i) ≤ β and λ(B_i) ≤ β for i = 1, ..., T. Then, for p ≥ 1,

  ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖ ≤ p β^{p−1} m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ ) .

Proof. Let β_i = max(‖A_i‖, ‖B_i‖). Then,

  ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖
  = ‖ (1/T) Σ_{i=1}^{T} A_i^p − (1/T) Σ_{i=1}^{T} B_i^p ‖
  = ‖ (1/T) Σ_{i=1}^{T} (A_i^p − B_i^p) ‖
  ≤ (1/T) Σ_{i=1}^{T} ‖A_i^p − B_i^p‖
  ≤ (1/T) Σ_{i=1}^{T} p (β_i)^{p−1} ‖A_i − B_i‖
  ≤ (1/T) Σ_{i=1}^{T} p β^{p−1} ‖A_i − B_i‖
  = p β^{p−1} (1/T) Σ_{i=1}^{T} ‖A_i − B_i‖
  = p β^{p−1} m_1( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )
  ≤ p β^{p−1} m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )

where: the first inequality follows from the triangle inequality, the second inequality follows from Proposition 2, the third inequality follows as β_i ≤ β, and the last inequality comes from the monotonicity of the scalar power means.

The next Lemma contains the proof corresponding to the case of positive powers of Theorem 9.

Lemma 2 (Theorem 9 for the case p ≥ 1). Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive semidefinite matrices where λ(A_i) ≤ β and λ(B_i) ≤ β for i = 1, ..., T. Let C_p^+ = p^{1/p} β^{1−1/p}. Let p ≥ 1, with p integer. Then,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖ ≤ C_p^+ m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )^{1/p} .

Proof. We can see that

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  ≤ ‖M_p^p(A_1,...,A_T) − M_p^p(B_1,...,B_T)‖^{1/p}
  ≤ ( p β^{p−1} m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ ) )^{1/p}
  = C_p^+ m_p( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )^{1/p}

where the first inequality comes from Proposition 4, and the second inequality comes from Proposition 5.

F.2. Results for the case p ≤ −1

The following two propositions are the main ingredients for the upper bound presented in Theorem 9 for the case p ≤ −1.

Proposition 6. Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive definite matrices where α ≤ λ(A_i) and α ≤ λ(B_i) for i = 1, ..., T, and α > 0. Then, for p ≤ −1, with p integer,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  ≤ (1/α²) ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖^{1/|p|} .

Proof. Let S_p(A_1,...,A_T) = (1/T) Σ_{i=1}^{T} A_i^p. Then,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  = ‖S_p^{1/p}(A_1,...,A_T) − S_p^{1/p}(B_1,...,B_T)‖
  = ‖( S_{|p|}^{1/|p|}(A_1^{−1},...,A_T^{−1}) )^{−1} − ( S_{|p|}^{1/|p|}(B_1^{−1},...,B_T^{−1}) )^{−1}‖
  = ‖M_{|p|}(A_1^{−1},...,A_T^{−1})^{−1} − M_{|p|}(B_1^{−1},...,B_T^{−1})^{−1}‖
  ≤ (1/α²) ‖M_{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}(B_1^{−1},...,B_T^{−1})‖
  ≤ (1/α²) ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖^{1/|p|}

where the first inequality follows from Corollary 3 and Proposition 3, whereas the second inequality follows from Proposition 4.

Proposition 7. Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive definite matrices such that α ≤ λ(A_i) and α ≤ λ(B_i) for i = 1, ..., T. Then, for p ≤ −1, with p integer,

  ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖
  ≤ |p| α^{−(1+|p|)} m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ ) .

Proof. Let α_i = min(‖A_i‖, ‖B_i‖); then it clearly follows that 1/α_i = max(‖A_i^{−1}‖, ‖B_i^{−1}‖). Thus,

  ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖
  = ‖ (1/T) Σ_{i=1}^{T} (A_i^{−1})^{|p|} − (1/T) Σ_{i=1}^{T} (B_i^{−1})^{|p|} ‖
  = ‖ (1/T) Σ_{i=1}^{T} ( (A_i^{−1})^{|p|} − (B_i^{−1})^{|p|} ) ‖
  ≤ (1/T) Σ_{i=1}^{T} ‖(A_i^{−1})^{|p|} − (B_i^{−1})^{|p|}‖
  ≤ (1/T) Σ_{i=1}^{T} |p| (1/α_i)^{|p|−1} ‖A_i^{−1} − B_i^{−1}‖
  ≤ |p| (1/α)^{|p|−1} (1/T) Σ_{i=1}^{T} ‖A_i^{−1} − B_i^{−1}‖
  ≤ |p| (1/α)^{|p|−1} (1/T) Σ_{i=1}^{T} (1/α²) ‖A_i − B_i‖
  = |p| (1/α)^{|p|+1} (1/T) Σ_{i=1}^{T} ‖A_i − B_i‖
  = |p| (1/α)^{|p|+1} m_1( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )
  ≤ |p| α^{−(1+|p|)} m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )

where: the first inequality follows from the triangle inequality, the second inequality follows from Proposition 2, the third inequality follows as α ≤ α_i, the fourth inequality follows from Corollary 3, and the last inequality comes from the monotonicity of the scalar power means.

The next Lemma contains the proof corresponding to the case of negative powers of Theorem 9.

Lemma 3 (Theorem 9 for the case p ≤ −1). Let A_1, ..., A_T, B_1, ..., B_T be symmetric positive definite matrices where α ≤ λ(A_i) and α ≤ λ(B_i) for i = 1, ..., T. Let C_p^- = |p|^{1/|p|} α^{−(3+1/|p|)}. Let p ≤ −1 with p integer. Then,

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖ ≤ C_p^- m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )^{1/|p|} .

Proof.

  ‖M_p(A_1,...,A_T) − M_p(B_1,...,B_T)‖
  ≤ (1/α²) ‖M_{|p|}^{|p|}(A_1^{−1},...,A_T^{−1}) − M_{|p|}^{|p|}(B_1^{−1},...,B_T^{−1})‖^{1/|p|}
  ≤ (1/α²) ( |p| α^{−(1+|p|)} m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ ) )^{1/|p|}
  = C_p^- m_{|p|}( ‖A_1 − B_1‖, ..., ‖A_T − B_T‖ )^{1/|p|}

where the first inequality comes from Proposition 6, and the second inequality comes from Proposition 7.

We are now ready to prove the result of Theorem 9.

Proof of Theorem 9. For the case p ≥ 1 see Lemma 2. For the case p ≤ −1 see Lemma 3.

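As a quick sanity check (our addition), for p = 1 the bound of Theorem 9 collapses to the triangle inequality:

% p = 1: C_1^+ = 1^{1}\beta^{0} = 1 and m_p(\cdot)^{1/p} reduces to m_1, so
% the claim of Theorem 9 becomes
\|M_1(A_1,\dots,A_T) - M_1(B_1,\dots,B_T)\|
  = \Big\|\tfrac{1}{T}\sum_{i=1}^{T}(A_i - B_i)\Big\|
  \le \tfrac{1}{T}\sum_{i=1}^{T}\|A_i - B_i\|
  = m_1\big(\|A_1 - B_1\|,\dots,\|A_T - B_T\|\big),
% which is exactly the triangle inequality for the arithmetic mean.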
G. Proof of Theorem 6

Before giving the proof of Theorem 6 we need to present two auxiliary results. The following is an auxiliary technical result that extends an implicit result stated in (Rohe et al., 2011) (p. 1908-1909) for the Frobenius norm to the case of the operator norm.

Lemma 4. Let X, X̄ ∈ R^{n×k} be matrices with orthonormal columns. Let U, V be orthonormal matrices and Σ a diagonal matrix such that

  X̄^T X = U Σ V^T

where the diagonal entries of Σ are the cosines of the principal angles between the column space of X̄ and the column space of X. Let O = U V^T. Then,

  (1/√2) ‖X − X̄ O‖ ≤ ‖sin Θ(X̄, X)‖ .

Proof. For the proof we will make use of the identity X^T X̄ O = (V Σ U^T) U V^T = V Σ V^T, and the fact that ‖A‖ = √(λ_max(A^T A)). That is,

  (X − X̄O)^T (X − X̄O)
  = (X^T − O^T X̄^T)(X − X̄O)
  = X^T X − X^T X̄ O − O^T X̄^T X + O^T X̄^T X̄ O
  = I − X^T X̄ O − O^T X̄^T X + O^T O
  = I − X^T X̄ O − O^T X̄^T X + I
  = 2I − X^T X̄ O − O^T X̄^T X
  = 2I − V Σ V^T − V Σ V^T
  = 2(I − V Σ V^T) .

Thus,

  ‖X − X̄O‖² = λ_max( (X − X̄O)^T (X − X̄O) )
  = 2 λ_max(I − V Σ V^T)
  = 2 max_i (1 − cos Θ_i)
  ≤ 2 max_i (1 − cos² Θ_i)
  = 2 max_i (sin Θ_i)²
  = 2 ‖sin Θ‖² .

Hence, (1/√2) ‖X − X̄ O‖ ≤ ‖sin Θ(X̄, X)‖.

The next result is a useful representation of the Davis-Kahan theorem. It is a technical adaption from the Frobenius norm to the operator norm based on Lemma 4 and Theorem 14.

Theorem 11. Let Σ, Σ̂ ∈ R^{p×p} be symmetric, with eigenvalues µ_1 ≥ ... ≥ µ_p and µ̂_1 ≥ ... ≥ µ̂_p respectively. Fix 1 ≤ r ≤ s ≤ p and assume that min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) > 0, where µ_0 := ∞ and µ_{p+1} := −∞. Let d := s − r + 1, and let V = (v_r, v_{r+1}, ..., v_s) ∈ R^{p×d} and V̂ = (v̂_r, v̂_{r+1}, ..., v̂_s) ∈ R^{p×d} have orthonormal columns satisfying Σ v_j = µ_j v_j and Σ̂ v̂_j = µ̂_j v̂_j for j = r, r+1, ..., s. Then there exists an orthogonal matrix O ∈ R^{d×d} such that

  (1/√2) ‖V − V̂ O‖ ≤ 2 d^{1/2} ‖Σ̂ − Σ‖ / min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) .

Proof. By Theorem 14 we have

  ‖sin Θ(V̂, V)‖_F ≤ 2 min( d^{1/2} ‖Σ̂ − Σ‖, ‖Σ̂ − Σ‖_F ) / min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) .

From Lemma 4 we can see that

  (1/√2) ‖V − V̂ O‖ ≤ ‖sin Θ(V̂, V)‖ .

Moreover, as sin Θ(V̂, V) is a diagonal matrix, it holds that

  ‖sin Θ(V̂, V)‖² = max_i (sin Θ_i)² ≤ Σ_i sin² Θ_i = ‖sin Θ(V̂, V)‖²_F .

Thus,

  (1/√2) ‖V − V̂ O‖ ≤ ‖sin Θ(V̂, V)‖ ≤ ‖sin Θ(V̂, V)‖_F .

Further, it is straightforward to see that

  min( d^{1/2} ‖Σ̂ − Σ‖, ‖Σ̂ − Σ‖_F ) ≤ d^{1/2} ‖Σ̂ − Σ‖ .

Thus, all in all, we have

  (1/√2) ‖V − V̂ O‖ ≤ ‖sin Θ(V̂, V)‖ ≤ ‖sin Θ(V̂, V)‖_F ≤ 2 d^{1/2} ‖Σ̂ − Σ‖ / min(µ_{r−1} − µ_r, µ_s − µ_{s+1})

which completes the proof.

We are now ready to give the proof of Theorem 6.

Proof of Theorem 6. The proof is an application of the Davis-Kahan theorem as presented in Theorem 11. Observe that in Theorem 11 the eigenvalues are sorted in a decreasing way, i.e. µ_1 ≥ ... ≥ µ_n, whereas in our case they are sorted in an increasing manner, i.e. λ_1 ≤ ... ≤ λ_n. Notationally, let the variables p, s, r from Theorem 11 be defined as p = s = n, r = p − k + 1.

We first focus on the case p ≤ −1. For this case we are interested in the k smallest eigenvalues, i.e. λ_1, ..., λ_k, which correspond to µ_p, ..., µ_r, where µ_p = λ_1 and µ_r = λ_k.

By definition, in Theorem 11 we have that µ_{p+1} = −∞. Thus, µ_p − µ_{p+1} = ∞. Further, we can see that µ_{r−1} − µ_r = λ_{k+1} − λ_k = (1 + ε) − m_p(1 − ρ + ε, 1 + ρ + ε), and hence by Eq. 2

  min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) = (1 + ε) − m_p(1 − ρ + ε, 1 + ρ + ε) =: γ ,

which by Theorem 11 leads to the following inequality

  ‖V_k − V̄_k O_k‖ ≤ (2^{3/2} k^{1/2} / γ) ‖L_p − L̄_p‖ = (√(8k)/γ) ‖L_p − L̄_p‖ .

By applying Theorem 5, we know that if

  δ^+ = (n/k)(p_in^+ + (k − 1) p_out^+) > 3 ln(8n/ε) , and
  δ^- = (n/k)(p_in^- + (k − 1) p_out^-) > 3 ln(8n/ε)

then with probability at least 1 − ε

  ‖V_k − V̄_k O_k‖ ≤ (√(8k)/γ) C_p^- m_{|p|}( 2√(3 ln(8n/ε)/δ^+), 2√(3 ln(8n/ε)/δ^-) )^{1/|p|}

yielding the desired result. The case for p ≥ 1 is similar, where instead of k the value k′ = k − 1 is used.

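The following small NumPy experiment (our addition, not the authors' code) illustrates the bound of Theorem 11 for the top-d eigenvectors, i.e. r = 1 and s = d, so that the relevant eigengap is µ_d − µ_{d+1}:

import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3

# Sigma with eigenvalues n, n-1, ..., 1 (decreasing), so every eigengap is 1
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
mu = np.arange(n, 0, -1).astype(float)
Sigma = (Q * mu) @ Q.T

E = rng.standard_normal((n, n)) * 0.01
Sigma_hat = Sigma + (E + E.T) / 2          # symmetric perturbation

def top_eigvecs(M, d):
    lam, U = np.linalg.eigh(M)             # eigh returns ascending order
    return U[:, ::-1][:, :d]               # top-d eigenvectors

V, V_hat = top_eigvecs(Sigma, d), top_eigvecs(Sigma_hat, d)

# orthogonal alignment O = U W^T from the SVD of V_hat^T V (cf. Lemma 4)
U_, _, Wt = np.linalg.svd(V_hat.T @ V)
O = U_ @ Wt

gap = mu[d - 1] - mu[d]                    # mu_d - mu_{d+1} in 1-indexed notation
lhs = np.linalg.norm(V - V_hat @ O, 2) / np.sqrt(2)
rhs = 2 * np.sqrt(d) * np.linalg.norm(Sigma_hat - Sigma, 2) / gap
assert lhs <= rhs
print(f"{lhs:.4f} <= {rhs:.4f}")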
H. Main building block for our results

In this section we present two results from (Chung & Radcliffe, 2011) that are the main building blocks for our results.

Theorem 12 ((Chung & Radcliffe, 2011)). Let G be a random graph, where P(v_i ∼ v_j) = p_ij, and each edge is independent of each other edge. Let A be the adjacency matrix of G, so A_ij = 1 if v_i ∼ v_j and 0 otherwise, and Ā = E(A), so Ā_ij = p_ij. Let D be the diagonal matrix with D_ii = deg(v_i), and D̄ = E(D). Let δ be the minimum expected degree of G, and L = I − D^{−1/2} A D^{−1/2} the (normalized) Laplacian matrix of G. Choose ε > 0. Then there exists a constant k = k(ε) such that if δ > k ln n, then with probability at least 1 − ε the eigenvalues of L and L̄ satisfy

  |λ_j(L) − λ_j(L̄)| ≤ 2 √(3 ln(4n/ε)/δ)

for all 1 ≤ j ≤ n, where L̄ = I − D̄^{−1/2} Ā D̄^{−1/2}.

Although this theorem is presented as the main result, one can see in the proof of Theorem 12 in (Chung & Radcliffe, 2011) that indeed what they proved was a concentration bound for ‖L − L̄‖.

Theorem 13 ((Chung & Radcliffe, 2011)). Assume that the conditions of Theorem 12 hold. Choose ε > 0. Then there exists a constant k = k(ε) such that if δ > k ln n, then

  P( ‖L − L̄‖ ≤ 2 √(3 ln(4n/ε)/δ) ) > 1 − ε .    (5)

Theorem 14 ((Yu et al., 2015)). Let Σ, Σ̂ ∈ R^{p×p} be symmetric, with eigenvalues µ_1 ≥ ... ≥ µ_p and µ̂_1 ≥ ... ≥ µ̂_p respectively. Fix 1 ≤ r ≤ s ≤ p and assume that min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) > 0, where µ_0 := ∞ and µ_{p+1} := −∞. Let d := s − r + 1, and let V = (v_r, v_{r+1}, ..., v_s) ∈ R^{p×d} and V̂ = (v̂_r, v̂_{r+1}, ..., v̂_s) ∈ R^{p×d} have orthonormal columns satisfying Σ v_j = µ_j v_j and Σ̂ v̂_j = µ̂_j v̂_j for j = r, r+1, ..., s. Then

  ‖sin Θ(V̂, V)‖_F ≤ 2 min( d^{1/2} ‖Σ̂ − Σ‖, ‖Σ̂ − Σ‖_F ) / min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) .

Moreover, there exists an orthogonal matrix Ô ∈ R^{d×d} such that

  ‖V̂ Ô − V‖_F ≤ 2^{3/2} min( d^{1/2} ‖Σ̂ − Σ‖, ‖Σ̂ − Σ‖_F ) / min(µ_{r−1} − µ_r, µ_s − µ_{s+1}) .

I. Results on Bethe Hessian

The following Lemma 5 states that for the case where α = 1 the Bethe Hessian is equal to the arithmetic mean of Laplacians, i.e. the signed ratio Laplacian L_SR.

Lemma 5. Let α = 1. Then the Bethe Hessian is two times the arithmetic mean of L^+ and Q^-.

Proof of Lemma 5. Let J^+, J^- be the positive and negative parts of J, i.e. J_ij^+ = max{0, J_ij} and J_ij^- = −min{0, J_ij}. Let D^+ and D^- be the degree diagonal matrices of J^+ and J^- respectively, i.e. D = D^+ + D^-. Then,

  H = (α − 1)I − √α J + D
  = −J + D
  = −J^+ + J^- + D^+ + D^-
  = (D^+ − J^+) + (D^- + J^-)
  = L^+ + Q^-
  = L_SR .

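Lemma 5 is easy to verify numerically; the following short sketch (our addition) checks that for α = 1 the Bethe Hessian coincides with L^+ + Q^- on a random signed adjacency matrix:

import numpy as np

rng = np.random.default_rng(2)
n = 10
J = np.triu(rng.choice([-1.0, 0.0, 1.0], size=(n, n)), 1)
J = J + J.T                                   # signed symmetric adjacency, J = W+ - W-

Jp, Jm = np.maximum(J, 0), np.maximum(-J, 0)  # positive and negative parts
Dp, Dm = np.diag(Jp.sum(1)), np.diag(Jm.sum(1))
D = Dp + Dm

alpha = 1.0
H = (alpha - 1) * np.eye(n) - np.sqrt(alpha) * J + D
L_SR = (Dp - Jp) + (Dm + Jm)                  # L+ + Q- (unnormalized)
assert np.allclose(H, L_SR)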
Lemma 6. Let H̄ be the Bethe Hessian of the expected signed graph. Then {χ_i}_{i=2}^{k} are the eigenvectors corresponding to the (k − 1) smallest negative eigenvalues of H̄ if and only if the following conditions hold:

  1. max{0, (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|)} < (p_in^+ − p_out^+) − (p_in^- − p_out^-)
  2. p_out^+ < p_out^-

Proof. In our framework we can see that J = W^+ − W^-. In Section A we can see that the expected adjacency matrices W̄^+ and W̄^- have three distinct eigenvalues:

  λ_1^+ = |C| (p_in^+ + (k − 1) p_out^+), λ_i^+ = |C| (p_in^+ − p_out^+)
  λ_1^- = |C| (p_in^- + (k − 1) p_out^-), λ_i^- = |C| (p_in^- − p_out^-)

for i = 2, ..., k, with corresponding eigenvectors χ_1, ..., χ_k. The remaining eigenvalues are equal to zero. Further, as both matrices W̄^+ and W̄^- share all their eigenvectors, the expected matrix J̄ = W̄^+ − W̄^- has the same eigenvectors, with eigenvalues being the difference between the positive and negative counterparts, i.e. J̄ χ_i = µ_i χ_i where µ_i = λ_i^+ − λ_i^-.

As we assume that all clusters are of the same size |C|, the expected signed graph is a regular graph with degrees d^+ and d^-. Thus, in expectation α̂ = d^+ + d^-, where d^+ = |C| (p_in^+ + (k − 1) p_out^+) and d^- = |C| (p_in^- + (k − 1) p_out^-).

Hence, the Bethe Hessian of the expected signed graph can be expressed as

  H̄ = (α̂ − 1)I − √α̂ J̄ + D̄
  = (α̂ − 1)I − √α̂ J̄ + α̂ I
  = (2α̂ − 1)I − √α̂ J̄ .

It is easy to see that the matrix H̄ is a diagonal shift of J̄ (up to scaling), and thus they have the same eigenvectors. In particular we can observe that:

  H̄ χ_i = ( (2α̂ − 1)I − √α̂ J̄ ) χ_i = (2α̂ − 1) χ_i − √α̂ J̄ χ_i = (2α̂ − 1) χ_i − √α̂ µ_i χ_i = ( (2α̂ − 1) − √α̂ µ_i ) χ_i .

Hence, the corresponding eigenvalues of H̄ are:

  λ_i = (2α̂ − 1) − √α̂ µ_i .    (6)

All in all, the corresponding eigenvalues of the expected Bethe Hessian matrix H̄ are:

  λ_1 = (2α̂ − 1) − √α̂ (d^+ − d^-) ,
  λ_i = (2α̂ − 1) − √α̂ |C| ( (p_in^+ − p_out^+) − (p_in^- − p_out^-) ) ,
  λ_j = (2α̂ − 1) ,

for i = 2, ..., k and j = k + 1, ..., n.

We now focus on the conditions that are necessary so that the eigenvectors χ_2, ..., χ_k have the smallest negative eigenvalues. This is based on the fact that informative eigenvectors of the Bethe Hessian H have the smallest negative eigenvalues. From Eq. 6 we can see that the general condition for eigenvalues of the Bethe Hessian in expectation H̄ to be negative is

  λ_i < 0 ⟺ (2α̂ − 1)/√α̂ < µ_i .    (7)

Hence the conditions to be analyzed are:

  λ_i < λ_1, for i = 2, ..., k
  λ_i < 0, for i = 2, ..., k
  λ_i < λ_j, for i = 2, ..., k and j = k + 1, ..., n

Therefore we can easily see that the corresponding condition λ_i < λ_1 boils down to

  p_out^+ < p_out^-

whereas the condition λ_i < 0 is equivalent to

  (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|) < (p_in^+ − p_out^+) − (p_in^- − p_out^-)

and for the remaining condition λ_i < λ_j the equivalent condition is

  0 < (p_in^+ − p_out^+) − (p_in^- − p_out^-) .

By putting together the conditions for λ_i < 0 and λ_i < λ_j we get the desired result.

Lemma 7. Let H̄ be the Bethe Hessian of the expected non-empty signed graph. Let |V| → ∞. Then {χ_i}_{i=2}^{k} are the eigenvectors corresponding to the (k − 1) smallest negative eigenvalues of H̄ if and only if the following conditions hold:

  1. p_in^- + p_out^+ < p_in^+ + p_out^-
  2. p_out^+ < p_out^-

Proof. From Lemma 6 we have the following conditions for the recovery of informative eigenvectors on finite graphs:

  1. max{0, (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|)} < (p_in^+ − p_out^+) − (p_in^- − p_out^-)
  2. p_out^+ < p_out^-

Let

  c_2 = p_in^+ + (k − 1) p_out^+ + p_in^- + (k − 1) p_out^-
  c_3 = (p_in^+ − p_out^+) − (p_in^- − p_out^-)

The first condition of Lemma 6 can then be expressed as follows:

  (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|) < (p_in^+ − p_out^+) − (p_in^- − p_out^-)
  ⟺ 2 c_2^{1/2} / |C|^{1/2} − 1 / ( |C|^{3/2} c_2^{1/2} ) < c_3 .

Hence, in the limit where |C| → ∞ the above condition turns into

  0 < c_3 ⟺ p_in^- + p_out^+ < p_in^+ + p_out^- ,    (8)

yielding the desired conditions.

The following Lemma states the interesting fact that the Bethe Hessian works better for large graphs.

Lemma 8. Let H̄_n be the Bethe Hessian of the expected signed graph under the SBM with n nodes. Let χ^n = {χ_i}_{i=2}^{k} where χ_2, ..., χ_k ∈ R^n. Let 3/2 < d^+ + d^-. Let n < m. If χ^n are eigenvectors corresponding to the (k − 1) smallest negative eigenvalues of H̄_n, then χ^m are eigenvectors corresponding to the (k − 1) smallest negative eigenvalues of H̄_m.

Proof. In this proof we show that if the conditions of Lemma 6 hold for a given expected signed graph with n nodes, then the conditions of Lemma 6 hold for expected signed graphs with a larger number of nodes.

By Lemma 6, we know that for a given graph in expectation with n nodes, the eigenvectors χ^n = {χ_i}_{i=2}^{k} correspond to the (k − 1) smallest negative eigenvalues of H̄_n if and only if the following conditions hold:

  1. max{0, (2(d^+ + d^-) − 1) / (√(d^+ + d^-) |C|)} < (p_in^+ − p_out^+) − (p_in^- − p_out^-)
  2. p_out^+ < p_out^-

Observe that the right hand side of the above conditions does not depend on the number of nodes in the graph. We proceed by analyzing the left hand side of the first condition:

  (2(d^+ + d^-) − 1) / ( |C| √(d^+ + d^-) ) .    (9)

Note that under the Stochastic Block Model in consideration, all k clusters are of size |C| = n/k. We now identify conditions such that Equation 9 decreases with larger values of |C|.

Let x, α ∈ R. Define the scalar function g : R_{>0} → R as

  g(x) = (2αx − 1) / √(αx³) .

Observe that we recover Equation 9 by letting x = |C| and α = p_in^+ + (k − 1) p_out^+ + p_in^- + (k − 1) p_out^-, where αx = d^+ + d^-. The corresponding derivative is

  g′(x) = (3 − 2αx) / ( 2x √(αx³) ) .

Then

  g′(x) < 0 ⟺ 3/2 < αx .    (10)

Hence, if 3/2 < αx then g(y) < g(x) if and only if x < y. We now apply this result to our setting.

Let |C_n| := |C| = n/k and |C_m| = m/k denote the cluster sizes of the expected signed graphs with n and m nodes, respectively. Let α = p_in^+ + (k − 1) p_out^+ + p_in^- + (k − 1) p_out^-. Let 3/2 < d^+ + d^-. Then

  g(|C_m|) < g(|C_n|) = (2(d^+ + d^-) − 1) / ( |C| √(d^+ + d^-) )    (11)

if and only if n < m. Hence, if conditions 1 and 2 hold for the expected graph G with n nodes and its expected absolute degree is larger than 3/2, i.e. 3/2 < d^+ + d^-, then conditions 1 and 2 hold for expected graphs with a larger number of nodes, leading to the desired result.

[Figure 5: Mean clustering error under the Censored Block Model (Saade et al., 2015), with two clusters of size 500 and 20 runs. Fig. 5a: the probability of observing an edge is fixed to p = 0.03, and ε ∈ [0, 0.5]. Fig. 5b: the probability of flipping the sign of an edge is fixed to ε = 0.25, and p ∈ [0.001, 0.03].]

J. Experiments with the Censored Block Model

In this section we present a numerical evaluation of different methods under the Stochastic Block Model with the parameters corresponding to the Censored Block Model (CBM), following (Saade et al., 2015). Observe that the CBM is a particular case of the Stochastic Block Model for signed graphs as introduced in Section 3. Following (Saade et al., 2015), the CBM has two parameters: the probability of observing an edge (p), and the probability of flipping the sign of an edge (ε). The CBM can be recovered from the SSBM introduced in Section 3 by setting p_in^+ = p_out^- = p(1 − ε) and p_in^- = p_out^+ = pε. Observe that the parameter ε works as a noise parameter: the noiseless setting corresponds to ε = 0, where positive and negative edges are only inside and between clusters, respectively. The case where ε = 0.5 corresponds to the case where no clustering structure is conveyed by the sign of the edges. A small sampler illustrating this parameterization is sketched below.

We present a numerical evaluation under the SSBM with parameters from the CBM in Fig. 5. We consider two clusters and fix a priori their size to be of 500 nodes each. We report the clustering error out of 20 realizations from the SSBM with parameters following the CBM. We consider two settings:

First setting: we fix the probability of observing an edge to p = 0.03, and evaluate over different values of ε ∈ [0, 0.5]. In Fig. 5a we can observe that there is no relevant difference in clustering error between methods. Further, as expected, we can see that for small values of ε all methods perform well, and for larger values of ε the clustering error increases.

Second setting: we fix the probability of flipping the sign of an edge to ε = 0.25, and evaluate over different values of p ∈ [0.001, 0.03]. In Fig. 5b we can observe that the performance of the Bethe Hessian is best for small values of p, i.e. for sparser graphs. Following the Bethe Hessian are the arithmetic mean Laplacian L_1 together with the signed normalized Laplacian L_SN.

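For concreteness, the following hedged sketch (our addition; sample_cbm is our own name) samples a signed graph from the SSBM with the CBM parameterization described above:

import numpy as np

def sample_cbm(cluster_size, k, p, eps, seed=0):
    # SSBM with CBM parameters: p_in+ = p_out- = p(1 - eps), p_in- = p_out+ = p*eps
    rng = np.random.default_rng(seed)
    n = cluster_size * k
    labels = np.repeat(np.arange(k), cluster_size)
    same = labels[:, None] == labels[None, :]
    present = rng.random((n, n)) < p            # an edge is observed w.p. p
    flip = rng.random((n, n)) < eps             # its sign is flipped w.p. eps
    # within-cluster edges are positive unless flipped; between-cluster edges
    # are negative unless flipped
    sign = np.where(same ^ flip, 1.0, -1.0)
    W = np.where(np.triu(present, 1), sign, 0.0)
    W = W + W.T
    return W, labels

W, labels = sample_cbm(500, 2, p=0.03, eps=0.25)
W_pos, W_neg = np.maximum(W, 0), np.maximum(-W, 0)   # W+ and W-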
Table 2: Experiments on UCI datasets. Positive edges generated by k-nearest neighbours, negative edges generated by k-farthest neighbours. Reported is the percentage of cases where each method achieves the smallest and strictly smallest clustering error, and the average clustering error.

                       iris  wine ecoli australian cancer vehicle german image optdig isolet  USPS pendig 20new MNIST
  # vertices            150   178   336        690    699     846   1000  2310   5620   7797  9298  10992 18846 70000
  # classes               3     3     8          2      2       4      2     7     10     26    10     10    20    10
  H     Best (%)       14.1  14.1  10.9        7.8    0.0    20.3   34.4   0.0    3.1    0.0   0.0    0.0  45.3   0.0
        Str. best (%)   4.7   3.1   7.8        3.1    0.0    14.1   17.2   0.0    3.1    0.0   0.0    0.0  45.3   0.0
        Avg. error     16.8  32.2  23.9       41.7   13.2    58.4   29.5  64.0   32.5   54.0  41.3   39.2  88.5  48.2
  LSN   Best (%)       10.9  10.9  14.1        4.7   15.6    12.5   12.5  17.2   26.6    7.8  14.1    7.8  26.6  14.1
        Str. best (%)   1.6   3.1   9.4        1.6   15.6     7.8    0.0  17.2   26.6    7.8  14.1    7.8  26.6  14.1
        Avg. error     17.5  32.2  24.5       42.8    8.8    57.2   29.9  53.9   24.9   51.2  38.6   37.8  89.0  45.8
  LBN   Best (%)        4.7  12.5   0.0        6.3    1.6     6.3   40.6   0.0    0.0    0.0   0.0    0.0   1.6   0.0
        Str. best (%)   1.6   1.6   0.0        0.0    1.6     4.7   17.2   0.0    0.0    0.0   0.0    0.0   1.6   0.0
        Avg. error     26.6  33.6  30.5       42.5   10.2    61.6   29.6  57.2   41.1   67.4  50.1   50.5  92.5  58.6
  LAM   Best (%)        6.3  20.3   7.8        6.3    0.0    20.3   15.6   9.4    0.0    0.0   0.0    0.0   4.7   0.0
        Str. best (%)   1.6   9.4   6.3        1.6    0.0     7.8    1.6   6.3    0.0    0.0   0.0    0.0   4.7   0.0
        Avg. error     19.0  32.7  24.4       42.7   11.6    58.1   29.7  47.7   33.5   49.6  44.7   48.3  89.7  56.1
  LGM   Best (%)       32.8  35.9  34.4       32.8    7.8    17.2   46.9   6.3   29.7   28.1  12.5    0.0   1.6  82.8
        Str. best (%)   1.6   7.8  21.9       23.4    6.3    14.1   25.0   6.3   28.1   28.1   9.4    0.0   1.6  82.8
        Avg. error     14.1  31.9  20.4       39.3   11.3    57.6   29.5  46.8   13.0   42.6  27.6   45.0  89.9  26.7
  L−1   Best (%)       25.0  45.3  39.1       42.2    0.0    12.5   15.6  39.1    4.7   37.5   4.7    9.4  12.5   1.6
        Str. best (%)   0.0  14.1  18.8       31.3    0.0     9.4    1.6  29.7    4.7   37.5   4.7    9.4  12.5   1.6
        Avg. error     13.8  29.8  20.3       38.2    8.3    56.2   29.8  39.7   16.3   42.1  25.2   32.9  88.3  32.3
  L−10  Best (%)       73.4  43.8  25.0       34.4   76.6    31.3   20.3  39.1   37.5   26.6  71.9   82.8   7.8   1.6
        Str. best (%)  42.2   7.8  10.9       18.8   75.0    25.0    4.7  31.3   35.9   26.6  68.8   82.8   7.8   1.6
        Avg. error     12.7  30.2  20.8       38.6    5.7    55.9   29.7  39.4   12.1   42.3  21.9   26.9  89.8  28.6

Hence we have observed that for sufficiently dense graphs following the Censored Block Model the performance of the different methods is rather similar, whereas for sparser graphs the Bethe Hessian performs best, confirming the analysis presented in (Saade et al., 2015).

K. Experiments on UCI datasets

We evaluate the signed power mean Laplacian with L−10, L−1 against LSN, LBN, LAM, LGM and H using datasets from the UCI repository. We build W^+ from the k^+ nearest neighbour graph, whereas W^- is obtained from the k^- farthest neighbour graph; a sketch of this construction is given below. For each dataset we evaluate all clustering methods over all possible choices of k^+, k^- ∈ {3, 5, 7, 10, 15, 20, 40, 60}, yielding in total 64 cases. We present the following statistics: Best (%): proportion of cases where a method yields the smallest clustering error. Strictly Best (%): proportion of cases where a method is the only one yielding the smallest clustering error. Results are shown in Table 2.

Observe that in 4 datasets H and LGM present a competitive performance. For the remaining cases we can see that the best performance is obtained by the signed power mean Laplacians L−1, L−10. This verifies the superiority of negative powers (p < 0) over positive powers (p > 0) of L_p and related approaches like LSN, LBN. Moreover, although the Bethe Hessian is known to be optimal under the sparse transition theoretic limit under the Censored Block Model (Saade et al., 2015), in contexts where graphs are unlikely to follow an SBM distribution we can see that it is outperformed by the signed power mean Laplacian L_p.

We consider a second setting where we generate k^- noiseless negative edges via cannot-link constraints between nodes of different classes. The corresponding results are shown in Table 3. We observe in this setting that the arithmetic mean Laplacian LAM presents the best performance, followed by the geometric mean Laplacian LGM and the Balance Normalized Laplacian LBN. This suggests that from the family of non-arithmetic based Laplacians the case of LGM is a reasonable option, showing certain robustness to different signed graph regimes.

We emphasize that the eigenvectors of L_p are calculated without ever computing the matrix itself, by adapting the method proposed in (Mercado et al., 2018), described in Sec. M. Also, please see Section L for a performance comparison with respect to changes in the diagonal shift on UCI datasets.

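The following sketch (our addition; the function name and the max-symmetrization convention are ours, as the paper does not spell this step out) illustrates the construction of W^+ and W^- used in these experiments:

import numpy as np
from scipy.spatial.distance import cdist

def signed_knn_graphs(X, k_pos, k_neg):
    D = cdist(X, X)                        # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)            # self sorts last, never a nearest neighbour
    n = X.shape[0]
    Wp, Wm = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        Wp[i, order[:k_pos]] = 1           # k+ nearest neighbours
        Wm[i, order[-k_neg - 1:-1]] = 1    # k- farthest neighbours, excluding self
    # symmetrize: keep an edge if either endpoint selected it (our convention)
    return np.maximum(Wp, Wp.T), np.maximum(Wm, Wm.T)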
Observe that in 4 datasets H and LGM present a compet- In this section we briefly discuss the effect of the diagonal itive performance. For the remaining cases we can see shift on the power mean Laplacian Lp for p ≤ 0. In the defi- that the best performance are obtained by the signed power nition of power mean Laplacian in Eq.1 it is mentioned that mean Laplacians L−1,L−10. This verifies the superiority for negative powers p ≤ 0 a diagonal shift is necessary. To of negative powers (p < 0) to positive (p > 0) powers of evaluate the influence of the magnitude of the diagonal shift Lp and related approaches like LSN ,LBN . Moreover, al- we perform numerical evaluations on two different kinds of though the Bethe Hessian is known to be optimal under the signed graphs: on one side we consider signed graphs gener- sparse transition theoretic limit under the Censored Block ated through the Signed Stochastic Block Model introduced Spectral Clustering of Signed Graphs via Matrix Power Means

Table 3 Experiments on UCI datasets. Positive edges generated by k-nearest neighbours. Negative edges generated are cannot links between nodes of different classes. Reported is the percentage of cases where each method achieves the smallest and stricly smallest clustering error, and the average clustering error. iris wine ecoli australian cancer vehicle german image optdig isolet USPS pendigits 20new MNIST # vertices 150 178 336 690 699 846 1000 2310 5620 7797 9298 10992 18846 70000 # classes 3 3 8 2 2 4 2 7 10 26 10 10 20 10 Best (%) 54.7 51.6 20.3 43.8 40.6 26.6 45.3 12.5 4.7 3.1 4.7 1.6 15.6 4.7 H Str. best (%) 0.0 9.4 9.4 1.6 0.0 7.8 0.0 0.0 3.1 3.1 4.7 1.6 15.6 3.1 Avg. error 3.6 14.3 15.9 8.6 0.8 32.0 8.5 16.9 6.5 47.8 15.7 10.4 87.2 10.7 Best (%) 59.4 42.2 17.2 42.2 45.3 37.5 48.4 17.2 15.6 20.3 12.5 12.5 28.1 9.4 LSN Str. best (%) 0.0 4.7 9.4 0.0 0.0 17.2 0.0 3.1 10.9 18.8 12.5 12.5 28.1 7.8 Avg. error 4.0 15.5 15.9 11.9 5.3 30.5 11.1 13.8 4.9 44.1 11.9 7.2 86.1 7.8 Best (%) 68.8 51.6 45.3 42.2 53.1 50.0 50.0 50.0 37.5 28.1 35.9 35.9 35.9 42.2 LBN Str. best (%) 3.1 12.5 40.6 0.0 1.6 28.1 0.0 35.9 34.4 26.6 35.9 35.9 35.9 42.2 Avg. error 2.5 14.6 14.9 11.0 0.8 30.1 10.1 13.6 6.6 47.0 12.5 8.8 84.6 9.2 Best (%) 42.2 25.0 26.6 0.0 0.0 23.4 0.0 45.3 45.3 35.9 46.9 48.4 18.8 45.3 LAM Str. best (%) 21.9 21.9 23.4 0.0 0.0 21.9 0.0 43.8 45.3 35.9 46.9 48.4 18.8 45.3 Avg. error 2.1 22.9 17.0 11.3 0.5 44.2 12.5 13.0 3.2 40.2 8.2 3.9 88.1 4.2 Best (%) 3.1 4.7 0.0 87.5 87.5 1.6 100.0 0.0 0.0 7.8 0.0 0.0 1.6 0.0 LGM Str. best (%) 0.0 0.0 0.0 56.3 46.9 1.6 48.4 0.0 0.0 7.8 0.0 0.0 1.6 0.0 Avg. error 5.6 28.8 19.4 2.2 4.3 55.9 0.0 35.9 9.4 41.9 21.1 25.0 89.5 25.4 Best (%) 7.8 9.4 1.6 0.0 0.0 1.6 0.0 1.6 1.6 6.3 0.0 1.6 0.0 0.0 L−1 Str. best (%) 0.0 4.7 0.0 0.0 0.0 1.6 0.0 1.6 1.6 6.3 0.0 1.6 0.0 0.0 Avg. error 3.9 28.3 19.1 10.8 4.2 54.9 22.5 27.3 9.1 41.5 19.2 15.9 89.5 19.5 Best (%) 6.3 0.0 4.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 L−10 Str. best (%) 4.7 0.0 3.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg. error 8.1 29.3 20.0 5.6 3.6 56.4 7.8 38.5 11.2 42.1 21.4 26.3 89.9 25.9 in Section3, and on the otherside we consider signed graphs Experiments with benchmark datasets. We now perform built from standard machine learning benchmark datasets a numerical evaluation on different real world networks, following SectionK. following the procedure of SectionK. Moreover, we per- form this analysis for p ∈ {−1, −10} and diagonal shifts Experiments with SSBM. We begin with experiments {10−10, 10−9,..., 103}. The corresponding results are pre- based on signed graphs following the SSBM. The corre- sented in Fig.8, where we present the average clustering sponding results are presented in Fig.6. We study the perfor- error taken across all values of k+ and k− (for more de- mance of the power mean Laplacians L ,L ,L ,L −1 −2 −5 −10 tails on the construction of the corresponding signed graphs with diagonal shifts {10−10, 10−9,..., 103}. Moreover, the please see SectionK). case where either G+ or G− are informative i.e. assorta- tive and disassortative, respectively. In particular, in top We can observe a general behaviour for L−10 across (resp. bottom) row of Fig.6 the results correspond to the datasets, where for a small diagonal shift, the clustering case where G+ (resp.G−) is fixed to be assortative (resp. error is high, and decreases for larger shifts, generally reach- disassortative). 
ing its mininum clustering error around diagonal shifts equal to one, to later present a slight increase in clustering error. We can observe that the larger the value of p, the more This confirms the proposed approach to set the diagonal shift robust the performance of the corresponding power mean to log (1 + |p|) + 10−6 which for the case of p = −10 Laplacian L to the values of the diagonal shift. For in- 10 p is ≈ 1.04. For the case of the harmonic mean Laplacian stance, we can see for L (see Figs. 6a and 6e) that the −1 L we can observe that it presents a more stable behaviour smaller the diagonal shift, the better the smaller the cluster- −1 that slightly resembles the one of L . In particular, we ing error, whereas for diagonal shifts 100, 101, 102, 103 its −10 can observe that there is a region from 10−6 to 10−1 where performance clearly deteriorates. the smallest average clustering error is achieved. Hence, On the other side we can see that the power mean Laplacian L−1 is relatively more robust to different diagonal shifts. L−10 presents a high sensibility towards the value of the This confirms the observations made based on signed graphs diagonal shift (see Figs. 6d and 6h) where the diagonal shift following the SBM. should be neither too large nor too small, being the val- On condition number. We now consider a condition num- ues {10−2, 10−1, 100} the more suitable for this particular ber approach to study the effect of the diagonal shift. Recall case. This observations are confirmation for the setting that the eigenvalue computation scheme considered in this with sparse graphs, as it is observed in Fig.7. paper is described in SectionM with the corresponding Al- Spectral Clustering of Signed Graphs via Matrix Power Means
[Figure 6: Mean clustering error under the SBM for different diagonal shifts, with sparsity 0.1. Panels (a)-(d), top row, and (e)-(h), bottom row, correspond to L−1, L−2, L−5, L−10. Details in Sec. L.]

[Figure 7: Mean clustering error under the SBM for different diagonal shifts, with sparsity 0.05. Panels (a)-(d), top row, and (e)-(h), bottom row, correspond to L−1, L−2, L−5, L−10. Details in Sec. L.]

[Figure 8: Mean clustering error of the power mean Laplacians L−1 and L−10 with diagonal shifts {10^−10, 10^−9, ..., 10^3}.]

Algorithm 2: PM applied to M_p(L_sym^+, Q_sym^-)
  Input: x_0, p < 0
  Output: eigenpair (λ, x) of M_p(L_sym^+, Q_sym^-)
  1: repeat
  2:   u_k^(1) ← (L_sym^+)^p x_k    (compute with Alg. 3)
  3:   u_k^(2) ← (Q_sym^-)^p x_k    (compute with Alg. 3)
  4:   y_{k+1} ← (1/2)(u_k^(1) + u_k^(2))
  5:   x_{k+1} ← y_{k+1} / ‖y_{k+1}‖_2
  6: until tolerance reached
  7: λ ← (x_{k+1}^T x_k)^{1/p};  x ← x_{k+1}

Algorithm 3: PKSM for the computation of x = A^p y
  Input: u_0 = y, V_0 = [ · ], p < 0
  Output: x = A^p y
  1: v_0 ← y / ‖y‖_2
  2: for s = 0, 1, 2, ..., n do
  3:   Ṽ_{s+1} ← [V_s, v_s]
  4:   V_{s+1} ← orthogonalize columns of Ṽ_{s+1}
  5:   H_{s+1} ← V_{s+1}^T A V_{s+1}
  6:   x_{s+1} ← V_{s+1} (H_{s+1})^p e_1 ‖y‖_2
  7:   if tolerance reached then break
  8:   v_{s+1} ← A v_s
  9: end for
  10: x ← x_{s+1}
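The following Python transcription (our addition; the paper's implementation is MATLAB-based and matrix-free) sketches Algorithms 2 and 3 for dense symmetric positive definite inputs; the eigenvalue estimate in the power method is our reading of line 7 of Algorithm 2:

import numpy as np

def pksm_apply(A, y, p, s_max=50, tol=1e-8):
    # Algorithm 3 (PKSM): approximate x = A^p y for symmetric positive
    # definite A (e.g. a shifted Laplacian) without ever forming A^p
    V = y[:, None] / np.linalg.norm(y)
    x_prev = None
    for _ in range(s_max):
        H = V.T @ A @ V                          # projected matrix (tridiagonal in exact arithmetic)
        lam, U = np.linalg.eigh(H)
        Hp = (U * lam**p) @ U.T                  # H^p via eigendecomposition
        x = V @ (Hp[:, 0] * np.linalg.norm(y))   # V (H^p) e_1 ||y||
        if x_prev is not None and np.linalg.norm(x - x_prev) < tol:
            return x
        x_prev = x
        w = A @ V[:, -1]                         # extend the Krylov space
        w = w - V @ (V.T @ w)                    # orthogonalize against the basis
        nw = np.linalg.norm(w)
        if nw < 1e-12:                           # Krylov space is exhausted
            return x
        V = np.hstack([V, w[:, None] / nw])
    return x

def smallest_eigpair(Lp_shift, Qm_shift, p, iters=100, seed=0):
    # Algorithm 2 (PM): power method on M_p(...)^p; its dominant eigenvector
    # is the eigenvector of the smallest eigenvalue of M_p for p < 0.
    # Lp_shift and Qm_shift are the diagonally shifted L_sym^+ and Q_sym^-.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(Lp_shift.shape[0])
    x /= np.linalg.norm(x)
    lam = None
    for _ in range(iters):
        y = 0.5 * (pksm_apply(Lp_shift, x, p) + pksm_apply(Qm_shift, x, p))
        lam = (x @ y) ** (1 / p)    # x^T M_p^p x ~ lambda^p, so this estimates lambda_min(M_p)
        x = y / np.linalg.norm(y)
    return lam, x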
On condition number. We now consider a condition number approach to study the effect of the diagonal shift. Recall that the eigenvalue computation scheme considered in this paper is described in Section M with the corresponding Algorithm 2. We can observe that the main computation steps are related to the matrix-vector operations (L_sym^+)^p x_k and (Q_sym^-)^p x_k with p < 0. We highlight that this framework considers only the case where p < 0.

Observe that in the operation (L_sym^+)^p x_k, with p < 0, the condition number plays an influential role due to the inverse operation implied by the negativity of p. Note that the eigenvalues of the normalized Laplacian L_sym^+ are contained in the interval [0, 2]; hence, it is a singular matrix. As mentioned in the definition of the power mean Laplacian in Eq. 1, a suitable diagonal shift is necessary for the case where p < 0. Hence, the eigenvalues of the shifted Laplacian L_sym^+ + µI are contained in the interval [µ, 2 + µ]; therefore, the condition number is equal to λ_max(L_sym^+ + µI)/λ_min(L_sym^+ + µI), which in this case reduces to (2 + µ)/µ. Thus, it follows that the condition number of (L_sym^+ + µI)^p is g(µ, p) := ((2 + µ)/µ)^{|p|}. It is easy to see that (2 + µ)/µ > 1, and hence g(µ, p) grows with larger values of |p|, so the condition number is larger for smaller powers p of the power mean Laplacian. Moreover, the growth rate of g(µ, p) is larger for smaller values of µ, suggesting that the shift µ should be set as large as possible. Yet, very large values of µ overwhelm the information contained in the Laplacian matrix. Hence, the diagonal shift should not be too small (due to numerical stability) and should not be too large (due to information obfuscation). This confirms the behaviour presented in Figs. 6, 7 and 8.
M. Computation of the Smallest Eigenvalues and Eigenvectors of L_p

For the computation of the eigenvectors corresponding to the smallest eigenvalues of the signed power mean Laplacian L_p with p < 0, we take the Polynomial Krylov Subspace Method for multilayer graphs presented in (Mercado et al., 2018) and apply it to our case. The corresponding adaption is presented in Algorithms 2 and 3.

We briefly explain Algorithm 2. Let λ_1 ≤ ... ≤ λ_n be the eigenvalues of L_p = M_p(L_sym^+, Q_sym^-). Let p < 0. Then the eigenvalues of L_p^p are λ_1^p ≥ ... ≥ λ_n^p; that is, the eigenvectors corresponding to the smallest eigenvalues of L_p correspond to the largest eigenvalues of L_p^p. Thus, in order to obtain the eigenvectors corresponding to the smallest eigenvalues of L_p we have to apply the power method to L_p^p. This is depicted in Algorithm 2. However, the main computational task now is the matrix-vector multiplications (L_sym^+)^p x and (Q_sym^-)^p x. These are approximated through the Polynomial Krylov Subspace Method (PKSM). This approximation method allows to obtain (L_sym^+)^p x and (Q_sym^-)^p x without ever computing the matrices (L_sym^+)^p and (Q_sym^-)^p, respectively. This is depicted in Algorithm 3.

The main idea of the PKSM s-step is to project a given matrix A onto the space K^s(A, y) = span{y, Ay, ..., A^{s−1}y} and solve the corresponding problem there. The projection onto K^s(A, y) is done by means of the Lanczos process, producing a sequence of matrices V_s with orthogonal columns, where the first column of V_s is y/‖y‖ and range(V_s) = K^s(A, y). Moreover, at each step we have A V_s = V_s H_s + v_{s+1} e_s^T, where H_s is an s × s symmetric tridiagonal matrix and e_i is the i-th canonical vector. The matrix-vector product x = A^p y is then approximated by x_s = V_s (H_s)^p e_1 ‖y‖ ≈ A^p y.

Time Execution Analysis. We present a time execution analysis in Fig. 9. We depict the mean execution time out of 10 runs of the power mean Laplacian L_p with p ∈ {−1, −2, −5, −10}. In particular, L−1(ours), L−2(ours), L−5(ours) and L−10(ours) depict the execution time using our proposed method based on Algorithm 2 together with the polynomial Krylov subspace method described in Algorithm 3. For comparison we consider L−1(eigs), which is computed with the function eigs from MATLAB instead of using Algorithm 3. All experiments are performed using one thread. For evaluation, random signed graphs following the SSBM are generated, with parameters p_in^+ = p_out^- = 0.05 and p_in^- = p_out^+ = 0.025, two equal sized clusters, and graph sizes |V| ∈ {10000, 20000, 30000, 40000}. We can observe that our computational matrix-free approach based on the polynomial Krylov subspace method systematically outperforms the natural approach based on the explicit computation of power matrices per layer.

[Figure 9: Time execution analysis. Mean execution time (sec.) against graph size |V|.]

N. Proportion of cases where conditions hold

In order to understand how often the conditions from Theorems 1, 3 and 4 hold, we perform a series of experiments. For the Bethe Hessian we take the limit result when |V| → ∞, as the corresponding conditions do not have the size of the graph as a parameter. The corresponding results are depicted in Fig. 10. We discretize each of the parameters p_in^+, p_out^+, p_in^-, p_out^- in [0, 1] in one hundred steps and count how many times the conditions of Theorems 1, 3 and 4 hold under different settings.

In Fig. 10a we analyze the case when both G^+ and G^- are informative (p_in^+ > p_out^+ and p_in^- < p_out^-). We can see that the conditions for the signed power mean Laplacian L_p are always fulfilled, whereas those of LSN and LBN hold in a significantly smaller fraction of cases, while the Bethe Hessian H is closer to the power mean Laplacians than to LSN and LBN.

In Fig. 10b we analyze the case when G^+ or G^- is informative (p_in^+ > p_out^+ or p_in^- < p_out^-). Now we see an ordering between the different L_p: the smaller the value of p, the larger the proportion of cases leading to recovery of the clusters in expectation. In particular, L−∞ always fulfills the conditions, whereas L∞ realizes the smallest proportion of cases where its conditions hold, comparable to the one of LSN and LBN, while the Bethe Hessian holds for 50% of the cases.

In Fig. 10c we treat the case where on average G^+ and G^- are informative (p_in^- + p_out^+ < p_in^+ + p_out^-). We observe the same ordering as in the previous case, and again all signed power mean Laplacians outperform LSN and LBN, while the Bethe Hessian holds for around 75% of the cases.

In Figs. 10b and 10c we observe that the difference between the signed power mean Laplacians with finite p gets smaller as the number of clusters k increases. The reason is that the eigenvalues of L̄_sym and Q̄_sym are of the form 1 ± ρ(k), where lim_{k→∞} ρ(k) = 0. Thus, as k increases the eigenvalues become equal and the gap vanishes.

O. On Wikipedia Experiments

We provide a more detailed inspection of the results from Sec. 4. In Fig. 11 we present the adjacency matrices sorted according to the identified clusters. In the first two columns (left to right) we can see that there is a large cluster (upper-left corner of each adjacency matrix) that does not resemble any structure, whereas the remaining part of the graph does present a certain clustering structure. The third and fourth columns zoom into this region, which corresponds to the results presented in Fig. 4.
[Figure 10: Proportion of cases where the conditions of Theorems 1, 3 and 4 hold under different settings, plotted against the number of clusters k ∈ {2, 5, 10, 25}. Panels: (a) both G^+ and G^- informative; (b) G^+ or G^- informative; (c) G^+ and G^- informative on average.]
[Figure 11: Sorted adjacency matrices according to the clusters identified by the power mean Laplacian L_p with p ∈ {−10, −5, −2, −1, 0, 1}. Columns from left to right: the first two columns depict the adjacency matrices W^+ and W^- sorted through the corresponding clustering; the third and fourth columns depict the portion of the adjacency matrices W^+ and W^- corresponding to the k − 1 identified clusters. Rows from top to bottom: clusterings corresponding to L−10, L−5, L−2, L−1, L0, L1.]