Finding Large Balanced Subgraphs in Signed Networks
Bruno Ordozgoiti (Aalto University), Antonis Matakos (Aalto University), Aristides Gionis∗ (KTH Royal Institute of Technology)

∗This work was done while the author was with Aalto University.

ABSTRACT
Signed networks are graphs whose edges are labelled with either a positive or a negative sign, and can be used to capture nuances in interactions that are missed by their unsigned counterparts. The concept of balance in signed graph theory determines whether a network can be partitioned into two perfectly opposing subsets, and is therefore useful for modelling phenomena such as the existence of polarized communities in social networks. While determining whether a graph is balanced is easy, finding a large balanced subgraph is hard. The few heuristics available in the literature for this purpose are either ineffective or non-scalable. In this paper we propose an efficient algorithm for finding large balanced subgraphs in signed networks. The algorithm relies on signed spectral theory and a novel bound for perturbations of the graph Laplacian. In a wide variety of experiments on real-world data we show that our algorithm can find balanced subgraphs much larger than those detected by existing methods, and in addition, it is faster. We test its scalability on graphs of up to 34 million edges.

CCS CONCEPTS
• Theory of computation → Graph algorithms analysis.

KEYWORDS
graph mining, signed graphs, dense subgraph, community detection

ACM Reference Format:
Bruno Ordozgoiti, Antonis Matakos, and Aristides Gionis. 2020. Finding large balanced subgraphs in signed networks. In Proceedings of The Web Conference 2020 (WWW ’20), April 20–24, 2020, Taipei, Taiwan. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3366423.3380212

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW ’20, April 20–24, 2020, Taipei, Taiwan
© 2020 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-7023-3/20/04.
https://doi.org/10.1145/3366423.3380212
arXiv:2002.00775v1 [cs.SI] 3 Feb 2020

1 INTRODUCTION
Social-media platforms have taken hold as one of the main forms of communication in today’s society. Despite having served to facilitate connections between individuals, in recent years we have observed an array of negative phenomena associated with these technologies. Among others, these platforms seem to contribute to the polarization of political deliberation, which can be detrimental to the health of democracy. Thus, the study of methods to detect and mitigate polarization in online debates is becoming an increasingly compelling topic [14, 15, 26, 28, 29, 33].

Many social-media platforms can be represented by graphs. Thus, graph theory has found a variety of applications in this domain over the last few decades, such as community detection [13], partitioning [4], and recommendation [30]. One limitation of the graph representations usually employed in the literature is that they can capture the existence, or even the strength, of connections between vertices, but not their disposition. For instance, in a social network, vertices may represent people and edges interactions between them. By relying just on this information we cannot know whether each interaction is friendly or hostile.

Signed graphs can be used to overcome this limitation. In signed graphs, each edge is labeled with either a positive or negative sign. If a graph represents social interactions, signs can be employed to determine whether these interactions are friendly or not. Thus, signed graphs constitute a good representation for detecting polarized groups in online debates. Signed graphs were first introduced by Harary to study the concept of balance [17]. A signed graph is said to be balanced if its vertices can be partitioned into two sets in perfect agreement with the edge signs; that is, every edge within each set is positive and every edge between the two sets is negative. Equivalently, a signed graph is balanced when the product of the signs of every cycle is positive. This is analogous to the commonplace notion “the friend of a friend is a friend,” “the enemy of a friend is an enemy,” etc., as illustrated in Fig. 1.

Figure 1: The four possible signed triangles. The two on the left are balanced, while the two on the right are not.

A substantial body of work has been devoted to studying the spectral properties of signed graphs, which have strong connections to the concept of balance. In particular, the spectrum of the Laplacian matrix of a signed graph reveals whether it is balanced [23]. Graphs found in real applications are often not balanced, and therefore the question of finding a balanced subgraph arises naturally. The problem of finding a maximum balanced subgraph can be formulated in terms of vertex cardinality (MBS) or edge cardinality (MBS-EDGE). Both formulations lead to NP-hard problems; thus, the development of efficient heuristics to approximately solve this problem is well motivated.

In this paper we present an algorithm to find large balanced subgraphs in signed networks. The algorithm works in two stages. First, we rely on spectral theory, as well as on a novel bound for perturbations of the Laplacian, to develop a greedy method to remove vertices and uncover a balanced subgraph. Then, any removed vertices that do not violate the balance of the located structure are restored.
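
The introduction notes that deciding whether a signed graph is balanced is easy. A minimal sketch of one standard way to do this follows, assuming an undirected graph given as a list of (u, v, sign) triples: a breadth-first two-colouring that keeps the endpoints of positive edges on the same side and the endpoints of negative edges on opposite sides. The function name balanced_partition and the helper triangle are illustrative and not taken from the paper; this is not the authors' subgraph-finding algorithm, only a check of the balance definition.

```python
from collections import deque


def balanced_partition(n, signed_edges):
    """Return side[v] in {0, 1} for every vertex if the graph is balanced, else None.

    n: number of vertices, labelled 0..n-1 (assumed input convention).
    signed_edges: iterable of (u, v, sign) triples with sign in {+1, -1}.
    """
    adj = [[] for _ in range(n)]
    for u, v, sign in signed_edges:
        adj[u].append((v, sign))
        adj[v].append((u, sign))

    side = [None] * n
    for start in range(n):  # handle every connected component
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adj[u]:
                # Positive edge: same side. Negative edge: opposite side.
                expected = side[u] if sign > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return None  # some cycle has a negative sign product
    return side


def triangle(s01, s12, s02):
    """Build a signed triangle on vertices 0, 1, 2 from its three edge signs."""
    return [(0, 1, s01), (1, 2, s12), (0, 2, s02)]


# The four signed triangles of Fig. 1, identified by their edge signs.
for signs in [(+1, +1, +1), (+1, -1, -1), (+1, +1, -1), (-1, -1, -1)]:
    print(signs, balanced_partition(3, triangle(*signs)) is not None)
```

The check visits each vertex and edge a constant number of times, so it runs in time linear in the size of the graph. The hard part addressed in the paper is different: choosing which vertices to discard so that the remaining subgraph passes this check and is as large as possible.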
We derive analytical properties that allow us to efficiently implement the algorithm. Finally, we devise a random sampling strategy to significantly enhance the scalability of the method.

In a variety of experiments on real-world and synthetic data, we show that the proposed algorithm finds larger balanced subgraphs than alternative heuristics from the literature, both in terms of vertex and edge count. Furthermore, the proposed algorithm has the desirable properties that (i) it runs faster than any other competing method, (ii) it can be tuned to trade off running time and solution quality, and (iii) it produces a vertex-removal sequence, which can be used to trade off balance and graph size. We validate the scalability of the method by testing it on graphs of up to 34 million edges.

Our contributions are summarized as follows:
• We propose an algorithm for finding large balanced subgraphs in signed graphs, based on spectral theory and its connections to balance.
• We give an upper bound for the smallest Laplacian eigenvalue after removing a set of vertices, which allows us to efficiently trim the graph to find a balanced subgraph.
• We experimentally show that our algorithm finds subgraphs much larger than those found by state-of-the-art methods.
• We devise a random sampling strategy to significantly enhance the scalability of the method, and show empirically that the quality of the output is not affected.

The rest of this paper is structured as follows. We discuss related work in Section 2. In Section 3 we introduce our notation and relevant notions. In Section 4 we describe our algorithm in detail and discuss relevant considerations, and in Section 5 we show our experimental results. Finally, Section 6 is a short conclusion.

2 RELATED WORK
Signed graphs and balance theory. Signed graphs were first studied by Harary, who was interested in the notion of balance [17]. In 1956, Cartwright and Harary generalized Heider’s psychological […]

[…] a generalization of the standard MaxCut problem, and thus, NP-hard. To obtain an exact solution, the problem has been studied in the context of fixed-parameter tractability (FPT). Hüffner et al. [21] propose an FPT algorithm for deciding whether the maximum balanced subgraph has size at least m − k, where k is the parameter. Motivated by the lower bound of Poljak and Turzík, Crowston et al. [9] give an FPT algorithm for deciding whether the maximum balanced subgraph has at least m/2 + (n − 1)/4 + k/4 edges, where k is the parameter. These algorithms are not practical, as their running time is exponential in k and the degree of the polynomial in n is large.

The MBS-EDGE problem has also been considered in application-driven studies, and different heuristics have been proposed. DasGupta et al. [10] consider an edge-deletion formulation of the MBS-EDGE problem in the context of biological networks. Motivated by the MaxCut connection, the authors develop an algorithm based on a semidefinite programming (SDP) relaxation; the approach, however, is not scalable and was tested only on very small networks. Figueiredo and Frota [12] ask to find a balanced subgraph that maximizes the number of vertices. They propose and experiment with a branch-and-cut exact approach, a heuristic based on minimum-spanning-tree computation, and a heuristic combining a greedy algorithm and local search. We experimentally compare the proposed method with these heuristics in our empirical evaluation.

Community detection in signed graphs. Different approaches have been proposed for community detection in signed graphs, some of which try to incorporate balance theory. Anchuri et al. [2] propose a spectral approach to partition a signed graph into a number of non-overlapping balanced communities. Yang et al. [34] propose a random-walk-based approach for partitioning a signed graph into communities, where in addition to positive edges within clusters and negative edges across clusters, they also want to maximize cluster densities. Doreian and Mrvar [11] propose an algorithm for partitioning a signed directed graph so as to minimize a measure of imbalance.