Balance Maximization in Signed Networks via Edge Deletions

Kartik Sharma (IIT Delhi), Iqra Altaf Gillani (IIT Delhi), Sourav Medya (Northwestern University), Sayan Ranu (IIT Delhi), Amitabha Bagchi (IIT Delhi)

arXiv:2010.10991v1 [cs.SI] 21 Oct 2020

ABSTRACT
In signed networks, each edge is labeled as either positive or negative. The edge sign captures the polarity of a relationship. Balance of signed networks is a well-studied property in graph theory. In a balanced (sub)graph, the vertices can be partitioned into two subsets with negative edges present only across the partitions. Balanced portions of a graph have been shown to increase coherence among their members and lead to better performance. While existing works have focused primarily on finding the largest balanced subgraph inside a graph, we study the network design problem of maximizing the balance of a target community (subgraph). In particular, given a budget $b$ and a community of interest within the signed network, we aim to make the community as close to being balanced as possible by deleting up to $b$ edges. Besides establishing NP-hardness, we also show that the problem is non-monotone and non-submodular. To overcome these computational challenges, we propose heuristics based on the spectral relation of balance with the Laplacian spectrum of the network. Since the spectral approach lacks approximation guarantees, we further design a greedy algorithm, and its randomized version, with provable bounds on the approximation quality. The bounds are derived by exploiting pseudo-submodularity of the balance maximization function. Empirical evaluation on eight real-world signed networks establishes that the proposed algorithms are effective, efficient, and scalable to graphs with millions of edges.

1 INTRODUCTION AND RELATED WORK
Graphs can model various complex systems such as knowledge graphs [33], road networks [27], communication networks [29], and social networks [19]. Typically, nodes represent entities, and edges characterize relationships between pairs of entities. Signed graphs further enhance the representative power of graphs by capturing the polarity of a relationship through positive and negative edge labels [15, 17, 31]. For example, if a graph represents social interactions, a positive edge would denote a friendly interaction, and a negative edge would indicate a hostile relationship. Similarly, in a collaboration network, positive edges may indicate complementary skill sets, whereas negative edges would indicate disparate skills.

Signed graphs were first studied by Harary et al. [15] with particular focus on their balance. A balanced graph is one in which the vertices can be partitioned into two sets such that all edges inside each partition have a positive sign and all negative edges go across the partitions. Balance is correlated with both positive and negative side-effects on a community. On the positive side, balanced communities are positively correlated with performance in financial networks where edges represent trading links [3, 12]. On the negative side, in social networks, balanced communities often promote "echo chambers", reduce diversity of opinions, and ultimately lead to more polarized viewpoints [13].

Owing to the correlation of balance with several higher-order functional traits, it is natural to measure how far a community is from being balanced. For example, in financial networks, it is important to evaluate how the community may be engineered to further improve its balance. On the other hand, in social networks, an adversary, such as a political party, may be interested in polarizing the community in its favor by further increasing its balance. To avoid such adversarial attacks, it is important to know the weak links in a community so that they can be safeguarded.

In this paper, we address these applications by studying the problem of maximizing balance via edge deletions (Mbed). In the Mbed problem, we are given a graph, a target community within this graph, and a budget $b$. Our goal is to remove $b$ edges such that the community gets as close to being balanced as possible. We formally define the notion of balance closeness in § 2. Deleting an edge would correspond to actions such as unfollowing or blocking a connection. If increasing balance is desirable, then Mbed provides a mechanism towards achieving that goal. On the other hand, Mbed also measures how susceptible a community is to adversarial attacks by revealing how much the balance can be increased through a small number of edge deletions, and which critical edges must be protected.

1.1 Related Work
The problem we study falls in the class of network design problems. In network design, the goal is to modify the network so that an objective function modeling a desirable property is optimized. Examples of such objective functions include optimizing shortest-path distances (traffic and sustainability improvement) [11, 23, 27, 28], increasing the centrality of target nodes by adding a small set of edges [7, 18, 26], optimizing the $k$-core [24, 39], manipulating node similarities [10], and boosting or containing influence on social networks [5, 20, 25].
While several works exist on finding balanced subgraphs [9, 12, 15, 17, 31], work on optimizing balance through network design is rather limited. The only such work is by Akiyama et al. [1], who study the minimum number of sign flips needed to make a graph balanced. Our work differs from [1] in several respects. First, [1] does not have any notion of a budget constraint. Second, the cascading impact of a sign flip on the balance of a graph is significantly different from that of an edge deletion. Third, [1] lacks evaluation on large real-world graphs containing millions of edges. Finally, from a practical viewpoint, selectively flipping the sign of an edge is difficult since the edge sign encodes the nature of the interaction between the two entities (endpoints) of the edge. In contrast, deleting an edge is a more lightweight task, as it only involves stopping further interactions with a chosen node.

Several studies on identifying large balanced subgraphs exist. Poljak and Turzík addressed the problem of finding a maximum-weight balanced subgraph and showed an equivalence with max-cut in a graph with a general weight function [35]. Other approaches include finding balanced subgraphs with the maximum number of vertices [12, 31] and edges [9] in the context of biological networks. Hüffner et al. [17] gave an exact algorithm for finding such balanced subgraphs using the idea of graph separators. More recently, Ordozgoiti et al. [31] studied the problem of identifying the maximum balanced subgraph in a given graph and designed efficient and effective heuristics.

1.2 Contributions
Our key contributions are summarized as follows.
• We propose the novel network design problem of maximizing balance in a target subgraph via edge deletion (Mbed). We establish that Mbed is NP-hard, non-submodular and non-monotonic (§ 2).
• Since NP-hardness rules out an efficient exact algorithm (unless P = NP), we propose an efficient, algebraically-grounded heuristic that exploits the connection of balance in a signed graph with the spectrum of its Laplacian (§ 3). Although this spectral approach is extremely efficient, it lacks an approximation guarantee. We overcome this weakness by establishing that Mbed is pseudo-submodular, which is then utilized to design greedy algorithms with provable quality guarantees (§ 4).
• We extensively benchmark the proposed methodologies on an array of eight real-world signed graphs. Our experiments establish that the proposed methodologies are effective, efficient, and scalable to million-sized graphs (§ 5).

2 PROBLEM DEFINITION
In this section, we introduce the concepts central to our problem. All important notations used in our work are summarized in Table 1.

Definition 1 (Signed graph). A signed graph $\Gamma = (G, \sigma)$ is an undirected graph $G = (V, E)$ along with a mapping $\sigma : E \to \{-1, +1\}$, called its edge labelling, that assigns a sign to each edge.

Given a signed graph $\Gamma = ((V, E), \sigma)$, we use the notation $E^+ = \{e \in E : \sigma(e) = +1\}$ and $E^- = \{e \in E : \sigma(e) = -1\}$ to denote the sets of positive and negative edges in $\Gamma$, respectively.

Definition 2 (Balanced graph). A signed graph $\Gamma = ((V, E), \sigma)$ is said to be balanced if there exists a partition $(V_1, V_2)$ of $V$ such that for every $(u, v) \in E$, $u$ and $v$ lie in the same part if $\sigma(u, v) = +1$ and in different parts if $\sigma(u, v) = -1$.

Example 1. Consider the signed graph in Fig. 1(a). The subgraph induced by the coloured nodes is a balanced subgraph since it can be partitioned into disjoint sets $V_1$ (blue) and $V_2$ (red) with positive edges within the partitions and negative edges across partitions.

[Figure 1: five signed graphs, panels (a) to (e); node labels $v_1$, $v_2$ and edge labels $e_1$ to $e_4$ appear in the drawing.]
Figure 1: This figure shows a series of signed graphs. We use the following colouring scheme. The balanced subgraphs contain the coloured nodes in blue (marked 'o') and red (marked '×'), representing node partition sets $V_1$ and $V_2$. Nodes outside the balanced component are in white. The current balance of (a) is $\Delta(\Gamma) = 6$, whereas in (b)-(d) a single edge deletion increases $\Delta(\Gamma)$ to 8. (e) Illustration of why Mbed is not submodular.

Definition 3 (Current Balance $\Delta(\Gamma)$ [12]). Given a signed graph $\Gamma$, the current balance $\Delta(\Gamma)$ is the maximum number of nodes in any induced subgraph that is connected and balanced. The largest such connected induced balanced subgraph is denoted by $S(\Gamma)$; thus, $\Delta(\Gamma) = |V(S(\Gamma))|$.

It is worth noting that the largest connected induced balanced subgraph might not be unique. We solve a network design problem where the balance is maximized via edge deletions. The graph obtained after deleting an edge set $X$ from $\Gamma$ is denoted $\Gamma_X$. Deleting an edge (positive or negative) may increase the balance of a graph.

Example 2. The current balance of the graph in Fig. 1(a) is 6. Deleting a single suitably chosen edge, whether negative or positive, increases the balance to 8 (Fig. 1(b)-(d)). Note that deleting an edge may initiate a cascading impact and bring multiple nodes into the balanced subgraph.

Problem 1 (Maximizing Balance via Edge Deletion (Mbed)). Given a signed (sub)graph $H$, a candidate edge set $\mathcal{C}$ and a budget $b$, find the set $B \subseteq \mathcal{C}$ of $b$ edges to be deleted such that $f(B) = \Delta(H_B) - \Delta(H)$, i.e., the gain in the number of nodes of the largest connected balanced subgraph, is maximized. Here, $H_B = (V(H), E(H) \setminus B)$ is the subgraph of $H$ formed by deleting the edge set $B$ from $H$.

Since $\Delta(H)$ does not depend on $B$, maximizing $f(B) = \Delta(H_B) - \Delta(H)$ is equivalent to maximizing $\Delta(H_B)$. We envision $H$ to be the target community where we would like to maximize balance. $\mathcal{C}$ denotes the edges that may be deleted, which may be the entire edge set of $H$.

2.1 Problem Characterization
Theorem 1. The Mbed problem is NP-hard.
Proof. We give a reduction from the Set Union Knapsack Problem [14] to Mbed. The details are in Section 8.1.

Lemma 1. The optimization function $f(B)$ of Mbed is non-monotonic, i.e., an edge deletion may lead to a decrease in current balance.
Proof. Consider the path $a - b - c - d$ in which only the edge $(b, c)$ is negative. The current balance is 4 since the entire graph is connected and balanced. Deleting any edge disconnects the path, so the balance decreases to at most 3. □

A function $f(\cdot)$ is submodular [19] if the marginal gain from adding an element $e$ to a set $S$ is at least the marginal gain from adding it to a superset $T$. Mathematically, it satisfies
$$f(S \cup \{e\}) - f(S) \ge f(T \cup \{e\}) - f(T) \qquad (1)$$
for all pairs of sets $S \subseteq T$ and all elements $e \notin T$.

Lemma 2. $f(B)$ is not submodular.¹

¹ We show a stronger result, that it is not even proportionally-submodular, in Sec. 8.2.

WSDM ’21, March 08–12, 2021, Jerusalem, Israel. © 2021 Association for Computing Machinery. https://doi.org/10.1145/1122445.1122456
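Definitions 2 and 3 and the path argument of Lemma 1 can be checked mechanically on very small graphs. The Python sketch below is a minimal illustration under stated assumptions, not the implementation used in the experiments (which obtain $S(H)$ with TIMBAL [31]): it tests balance by BFS 2-colouring and computes $\Delta$ by exhaustive search over vertex subsets, which is exponential. The names `is_balanced`, `is_connected`, `current_balance` and `signed`, and the frozenset-keyed edge dictionary, are hypothetical choices made for this sketch.

```python
from collections import deque
from itertools import combinations


def is_balanced(nodes, edges):
    """Definition 2 via BFS 2-colouring.

    `edges` maps frozenset({u, v}) -> +1 or -1. Only edges with both endpoints
    in `nodes` are used, so this also checks induced subgraphs.
    """
    adj = {u: [] for u in nodes}
    for e, s in edges.items():
        u, v = tuple(e)
        if u in adj and v in adj:
            adj[u].append((v, s))
            adj[v].append((u, s))
    side = {}
    for root in nodes:
        if root in side:
            continue
        side[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                want = side[u] if s == +1 else 1 - side[u]  # +1: same part, -1: other part
                if v not in side:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    return False  # some cycle has an odd number of negative edges
    return True


def is_connected(nodes, edges):
    adj = {u: set() for u in nodes}
    for e in edges:
        u, v = tuple(e)
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for v in adj[stack.pop()] - seen:
            seen.add(v)
            stack.append(v)
    return len(seen) == len(nodes)


def current_balance(nodes, edges):
    """Delta (Definition 3): size of the largest connected, balanced induced subgraph.
    Sketch only: exhaustive over vertex subsets, so usable on tiny graphs only."""
    for k in range(len(nodes), 0, -1):
        for subset in combinations(sorted(nodes), k):
            sub = set(subset)
            if is_connected(sub, edges) and is_balanced(sub, edges):
                return k
    return 0


def signed(*triples):
    return {frozenset((u, v)): s for u, v, s in triples}


# Lemma 1: the path a-b-c-d with only (b, c) negative.
path = signed(("a", "b", +1), ("b", "c", -1), ("c", "d", +1))
V = {"a", "b", "c", "d"}
print(current_balance(V, path))  # 4
for e in path:
    rest = {f: s for f, s in path.items() if f != e}
    print(sorted(e), current_balance(V, rest))  # at most 3 after any single deletion
```

Running it prints 4 for the intact path and a value of at most 3 after each single deletion, matching Lemma 1.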

Proof (Lemma 2). In Fig. 1(e), let $S = \{e_4\}$ and $T = \{e_1, e_4\}$. Here, $f(S \cup \{e_2\}) = 0$ and $f(T \cup \{e_2\}) = 1$. Thus, $f(S \cup \{e_2\}) - f(S) < f(T \cup \{e_2\}) - f(T)$. □

Table 1: Frequently used symbols
$\Gamma = ((V, E), \sigma)$: signed undirected graph with sign function $\sigma$
$S(\Gamma)$: largest balanced (connected and induced) subgraph of $\Gamma$
$\Delta(\Gamma)$: $|V(S(\Gamma))|$
$\mathcal{C}$: candidate edge set
$b$: budget (i.e., number of edges to be deleted)
$L(\Gamma)$: Laplacian matrix of signed graph $\Gamma$
$\lambda_1(\Gamma)$: smallest eigenvalue of $L(\Gamma)$
$\mathbf{u}, \mathbf{v}$: vectors (bold lower case)
$v_i$: $i$-th entry of $\mathbf{v}$
$H_X$: subgraph $H$ after deleting edges $X \subseteq \mathcal{C}$
$cep(H, x)$: set of contradictory edge-pairs for subgraph $H$ with one end at node $x$

Owing to NP-hardness, no polynomial-time optimal algorithm for Mbed exists unless P = NP. Furthermore, since the optimization function is non-monotonic and non-submodular, greedy algorithms whose guarantees rely on these properties are not directly applicable. We overcome these computational challenges through two different approaches: a spectral approach built on signed graph Laplacians (§ 3) and approximation schemes based on pseudo-submodular optimization (§ 4).

3 THE SPECTRAL APPROACH
Given a signed graph $\Gamma = ((V, E), \sigma)$, let $A$ be its signed adjacency matrix, where $A_{ij} = \sigma(i, j)$ for $(i, j) \in E$, and 0 otherwise. Furthermore, let $D$ be the diagonal matrix defined as $D_{ii} = d(i)$, where $d(i)$ is the degree of vertex $i$, i.e., the total number of edges incident on it. We define the corresponding signed Laplacian as follows.

Definition 4 (Signed Laplacian). The Laplacian of a signed graph $\Gamma = ((V, E), \sigma)$, denoted $L(\Gamma)$, is the symmetric $|V| \times |V|$ matrix defined as $L(\Gamma) = D(\Gamma) - A(\Gamma)$, i.e., $L_{ii} = d(i)$, and for $i \ne j$, $L_{ij} = -\sigma(i, j)$ if $(i, j) \in E$ and 0 otherwise.

Lemma 3 ([16]). Given a signed graph $\Gamma = ((V, E), \sigma)$, $\Gamma$ is balanced iff the smallest eigenvalue of its Laplacian is $\lambda_1(\Gamma) = 0$.

It has further been shown that $\lambda_1(\Gamma)$ measures how "far" the graph is from being balanced [4, 22].

Lemma 4 ([4]). Given a signed graph $\Gamma = ((V, E), \sigma)$ with $\lambda_1(\Gamma)$ the smallest eigenvalue of the corresponding Laplacian,
$$\lambda_1(\Gamma) \le \nu(\Gamma) \le \epsilon(\Gamma),$$
where $\nu(\Gamma)$ ($\epsilon(\Gamma)$) denotes the frustration number (frustration index), i.e., the minimum number of vertices (edges) to be deleted such that the signed graph becomes balanced.

Note that $\Delta(\Gamma) \le |V| - \nu(\Gamma)$: deleting $\nu(\Gamma)$ vertices leaves a balanced induced subgraph, but that subgraph need not be connected. Through Lemma 4, for any given subgraph $H$, we have:
$$\Delta(H) \le |V(H)| - \nu(H) \le |V(H)| - \lambda_1(H). \qquad (2)$$

3.1 An Upperbound Based Algorithm
Since directly maximizing $\Delta(H)$ is NP-hard, we turn our focus to the upper bound provided by Eq. (2). Maximizing this upper bound is equivalent to minimizing $\lambda_1(H)$. To minimize $\lambda_1$ over the graphs $H_X$ obtainable by edge deletions, we first derive the following upper bound.

Lemma 5. Given a signed graph $\Gamma$, a subgraph $H$ and a candidate edge set $\mathcal{C}$, for any set $X \subseteq \mathcal{C}$ we have
$$\lambda_1(H_X) \le \lambda_1(H) - \sum_{(i,j) \in X} \left(v_i - \sigma(i,j)\, v_j\right)^2, \qquad (3)$$
where $\mathbf{v}$ denotes the unit eigenvector of the Laplacian $L(H)$ corresponding to the minimum eigenvalue $\lambda_1(H)$, and $v_i$ denotes the $i$-th entry of $\mathbf{v}$. Recall that $H_X$ denotes the subgraph formed by removing the edge set $X$ from $H$.

Proof. Let $L(\Gamma)$ be the Laplacian of a signed graph $\Gamma$. For any $\mathbf{u} \in \mathbb{R}^{|V|}$,
$$\mathbf{u}^T L(\Gamma)\, \mathbf{u} = \sum_{(i,j) \in E^+} (u_i - u_j)^2 + \sum_{(i,j) \in E^-} (u_i + u_j)^2 = \sum_{(i,j) \in E} \left(u_i - \sigma(i,j)\, u_j\right)^2. \qquad (4)$$
Now, using Eq. (4) for $L(H_X)$ and the unit eigenvector $\mathbf{v}$ of $L(H)$ corresponding to $\lambda_1(H)$, we get
$$\begin{aligned}
\mathbf{v}^T L(H_X)\, \mathbf{v} &= \sum_{(i,j) \in E(H_X)} \left(v_i - \sigma(i,j)\, v_j\right)^2 \\
&= \sum_{(i,j) \in E(H)} \left(v_i - \sigma(i,j)\, v_j\right)^2 - \sum_{(i,j) \in X} \left(v_i - \sigma(i,j)\, v_j\right)^2 \\
&= \mathbf{v}^T L(H)\, \mathbf{v} - \sum_{(i,j) \in X} \left(v_i - \sigma(i,j)\, v_j\right)^2.
\end{aligned}$$
Since $\lambda_1(H_X) = \min_{\mathbf{z} \ne 0} \frac{\mathbf{z}^T L(H_X)\, \mathbf{z}}{\mathbf{z}^T \mathbf{z}}$, we have $\lambda_1(H_X) \le \frac{\mathbf{v}^T L(H_X)\, \mathbf{v}}{\mathbf{v}^T \mathbf{v}}$. Substituting $\mathbf{v}^T \mathbf{v} = 1$ and $\mathbf{v}^T L(H)\, \mathbf{v} = \lambda_1(H)$ proves the result. □

We denote the upper bound by the function $g$:
$$g(X) = \lambda_1(H) - \sum_{(i,j) \in X} \left(v_i - \sigma(i,j)\, v_j\right)^2.$$
Minimizing the upper bound $g(X)$ is easier than minimizing $\lambda_1(H_X)$ directly. In particular, $g(X)$ is a modular function, and hence greedily choosing the top-$b$ edges achieves an optimal solution [30].

Lemma 6. $g(X)$ is modular (both submodular and supermodular).
Proof. The proof is in Section 8.3. □

Algorithm: Since $g(X)$ is modular, we simply compute $g(\{e\})$ for each edge $e = (i, j) \in \mathcal{C}$ and select the top-$b$ edges by the value of $(v_i - \sigma(i,j)\, v_j)^2$, where $b$ is the budget. This approach needs to compute the smallest eigenpair of $L(H)$ only once, for which we can use the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method proposed by Knyazev [21]. This method has a theoretical guarantee of linear convergence, and its cost per iteration and memory use are competitive with those of the Lanczos method.²

3.2 Perturbation & Iterative Algorithm
We extend the upper bound of Lemma 5 into a tighter expression and design another, iterative way to solve Mbed. As before, the main idea is to compute the change in the smallest eigenvalue $\lambda_1(H)$ of the Laplacian caused by a single edge deletion. We drop $H$ and write $\lambda_1 = \lambda_1(H)$ where the context is clear. Let $\hat{\lambda}_1$ be the (exact) smallest eigenvalue of $\hat{L}(H)$, where $\hat{L}(H)$ is the perturbed version of $L(H)$ obtained by deleting a single edge $(i, j) \in E(H)$. Let $\delta = \lambda_2 - \lambda_1$ be the eigengap of $L(H)$. For graphs that have sufficiently large eigengaps, we show the following result.

Lemma 7. Given that $\lambda_1$ is the smallest eigenvalue of $L(H)$ and $\mathbf{v}$ is the corresponding unit eigenvector, for $\delta \ge 4$ we have
$$\hat{\lambda}_1 = \lambda_1 - \left(v_i - \sigma(i,j)\, v_j\right)^2 + O(1).$$
Proof. See App. 7.1. □

² The Lanczos algorithm [32] (with the Fast Multipole method [6]) has a time complexity of $O(d_{avg}\, |V(H)|\, k)$, where $d_{avg}$ is the average number of nonzero elements in a row of the matrix and $k$ is the number of iterations of the algorithm.
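The spectral quantities of this section can be illustrated with a small dense sketch in Python. It builds the signed Laplacian of Definition 4, takes the smallest eigenpair, scores each edge by $(v_i - \sigma(i,j)\, v_j)^2$, and uses that score either once (the Spec-Top rule of § 3.1) or after every deletion (Algorithm 1). Assumptions: nodes are labelled $0, \ldots, n-1$, NumPy is available, and the dense `numpy.linalg.eigh` stands in for the LOBPCG solver that the text recommends for large graphs; `signed_laplacian`, `smallest_eigenpair`, `score`, `spec_top` and `isa` are hypothetical names.

```python
import numpy as np


def signed_laplacian(n, edges):
    """Definition 4: L = D - A on nodes 0..n-1; edges are (i, j, sign) triples."""
    L = np.zeros((n, n))
    for i, j, s in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= s
        L[j, i] -= s
    return L


def smallest_eigenpair(L):
    """(lambda_1, unit eigenvector).
    Assumption: dense eigh for illustration; LOBPCG would be used on large sparse graphs."""
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vals[0], vecs[:, 0]


def score(edge, v):
    i, j, s = edge
    return (v[i] - s * v[j]) ** 2  # per-edge term of g(X) in Lemma 5


def spec_top(n, edges, candidates, b):
    """Spec-Top (Sec. 3.1): one eigenpair, then the b highest-scoring candidates.
    Because g is modular (Lemma 6), this top-b choice minimises g exactly."""
    _, v = smallest_eigenpair(signed_laplacian(n, edges))
    return sorted(candidates, key=lambda e: score(e, v), reverse=True)[:b]


def isa(n, edges, candidates, b):
    """Algorithm 1: recompute the eigenpair of the current graph after every deletion.
    Assumes candidates is a subset of edges and b <= len(candidates)."""
    remaining, chosen = list(edges), []
    for _ in range(b):
        _, v = smallest_eigenpair(signed_laplacian(n, remaining))
        pool = [e for e in candidates if e not in chosen]
        best = max(pool, key=lambda e: score(e, v))
        chosen.append(best)
        remaining.remove(best)
    return chosen


triangle = [(0, 1, +1), (1, 2, +1), (0, 2, -1)]  # one negative edge: unbalanced
lam1, v = smallest_eigenpair(signed_laplacian(3, triangle))
print(round(lam1, 6))  # 1.0 (non-zero, as Lemma 3 requires for an unbalanced graph)
print(spec_top(3, triangle, triangle, 1), isa(3, triangle, triangle, 1))
```

On the triangle with a single negative edge, $\lambda_1 = 1$, consistent with Lemma 3 (the triangle is unbalanced) and with Lemma 4 (one vertex deletion suffices). Because $\lambda_1$ has multiplicity two here, the edge picked by either routine depends on which eigenvector the solver returns; any single deletion leaves a tree, which is balanced.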

Algorithm 1 Spectral Edge Deletion
Require: The initial subgraph $H$, budget $b$, candidate set $\mathcal{C}$
Ensure: A set $B$ of $b$ edges
1: $H_0 \leftarrow H$, $B \leftarrow \emptyset$
2: for $k = 1$ to $b$ do
3:   Compute the smallest eigenpair $\lambda_1(H_{k-1})$, $\mathbf{v}$ of $L(H_{k-1})$
4:   for $e = (i, j) \in \mathcal{C} \setminus B$ do
5:     Compute $score(e) = (v_i - \sigma(i,j)\, v_j)^2$
6:   $e_k = \arg\max_{e \in \mathcal{C} \setminus B} score(e)$
7:   $B \leftarrow B \cup \{e_k\}$, $E(H_k) = E(H_{k-1}) \setminus \{e_k\}$
8: return $B$

3.2.1 Algorithm: We use Lemma 7 to design an iterative algorithm (Alg. 1). Given the unit eigenvector $\mathbf{v}$ corresponding to the smallest eigenvalue $\lambda_1$, we define the score of an edge $e = (i, j) \in E(H)$ as $(v_i - \sigma(i,j)\, v_j)^2$. We use this score to find the best edge from the candidate edge set $\mathcal{C}$ (lines 4-6). In each subsequent iteration (lines 2-8), we recompute the eigenpair (line 3) corresponding to the minimum eigenvalue of the perturbed matrix after the deletion of the best edge (line 7), and we use the LOBPCG method in all such iterations to achieve faster convergence.

3.2.2 Limitations: Alg. 1 does not provide any approximation guarantee and does not directly optimize the objective of Mbed; rather, it minimizes the smallest eigenvalue. Although it is known that $\lambda_1 = 0$ in a balanced graph, no result is known relating the change in balance to the change in $\lambda_1$, i.e., relating $\Delta(H_X) - \Delta(H)$ to $\lambda_1(H) - \lambda_1(H_X)$. To address these weaknesses, we next directly optimize the objective function and show that Mbed is pseudo-submodular, which in turn allows us to provide an approximation guarantee on quality.

4 APPROXIMATION ALGORITHMS
In § 2.1, we showed that Mbed is not monotonic. We next show that if the set of deleted edges $X$ is selected strategically, then monotonicity can be guaranteed. If the optimization function is monotonic and pseudo-submodular, then greedy algorithms achieve approximation bounds. The rest of the section builds towards this result.

Observation 1. If the set of deleted edges $X$ is chosen such that $H_X$ and $H$ have the same number of connected components, then the objective function $f(\cdot)$ is monotonic, i.e., $f(S \cup \{e\}) \ge f(S)$ for all $S, e$.

Proof. For all subsequent discussions, we use $S(H)$ to denote the largest balanced subgraph of $H$, with vertex partition $V_1$ and $V_2$. A deleted edge $X = \{e\}$ falls into one of three categories. (1) Both endpoints lie in $V_1$ (or equivalently $V_2$), in which case $\Delta(H) = \Delta(H_X)$ since $H_X$ and $H$ have the same number of connected components. (2) One endpoint lies in $V_1$ and the other in $V_2$; again $\Delta(H) = \Delta(H_X)$. (3) One endpoint lies in $V_1$ (or $V_2$) and the other in $V(H) \setminus (V_1 \cup V_2)$. In this case, the node in $V_1$ stays there while the other endpoint may move into $V_1$ or $V_2$; thus $\Delta(H) \le \Delta(H_X)$. □

Since choosing $X$ is in our control, we may assume that Mbed is monotonic by ensuring that $X$ satisfies the constraint in Obs. 1. We next establish that although Mbed is not submodular (Lem. 2), it is pseudo-submodular (Thm. 2).

4.1 Pseudo-Submodularity
We first prove that our objective function is pseudo-submodular (Thm. 2) and then provide approximation guarantees (Thms. 3 and 4) via the Randomized Greedy and Greedy algorithms.

Definition 5 (Contradictory Edge-pair). Given a subgraph $H$ whose largest balanced subgraph $S(H)$ has balance partition $(V_1, V_2)$, two edges $e_1, e_2$ form a contradictory edge-pair if any of the following conditions holds for some $u, u' \in V_1$, $w, w' \in V_2$, and $x \notin V_1 \cup V_2$:
(1) $e_1 = (x, u)$ and $e_2 = (x, w)$ such that $\sigma((x, u)) = \sigma((x, w))$.
(2) $e_1 = (x, u)$ and $e_2 = (x, u')$ such that $\sigma((x, u)) = -\sigma((x, u'))$.
(3) $e_1 = (x, w)$ and $e_2 = (x, w')$ such that $\sigma((x, w)) = -\sigma((x, w'))$.

We use $cep(H, x)$ to denote the set of contradictory edge-pairs for subgraph $H$ with one end at node $x$. A contradictory edge-pair prevents node $x$ from contributing to the balance. This property is expressed more formally as follows.

Observation 2. A node $x$ will not be part of $S(H)$ if one of the following conditions holds: (1) $|cep(H, x)| > 0$; (2) node $x$ is connected to $S(H)$ only via paths ending at a node $y$ with $|cep(H, y)| > 0$.

Example 3. In Fig. 1(a), nodes $v_1$ and $v_2$ are not part of the balanced subgraph $S(\Gamma)$ due to condition (1) and condition (2), respectively.

Obs. 2 allows us to formally characterize when an edge deletion increases the balance.

Observation 3. $f(\{e\}) > 0$ iff $(e, e') \in cep(H, x)$ for some $e' \in E$ and $x \in V(H)$, and $|cep(H_{\{e\}}, x)| = 0$, i.e., after the deletion of $e$, $x$ is not associated with any contradictory edge-pair.

From Obs. 3, it follows that only the deletion of a peripheral edge may increase the balance. A peripheral edge has one endpoint within $S(H)$ and the other outside $S(H)$. Owing to this result, hereon we implicitly assume that any edge considered for deletion is a peripheral edge. Note, however, that the set of peripheral edges changes after each edge deletion. Empowered with these observations, we next establish pseudo-submodularity.

4.1.1 Local Pseudo-submodularity.
Definition 6 (Pseudo-submodularity [36]). Given a scalar $0 < \gamma \le 1$, a function $f$ is pseudo-submodular if
$$\sum_{e \in R} \left[f(Q \cup \{e\}) - f(Q)\right] \ge \gamma \left[f(Q \cup R) - f(Q)\right]$$
for any pair of disjoint sets $Q, R \subseteq \mathcal{C}$.

Note that the pseudo-submodularity ratio $\gamma$ is a pessimistic bound over all pairs of disjoint sets. Instead of using $\gamma$, we compute approximation bounds using a local submodularity ratio [36] defined on two sets $Q, R$, i.e., a non-negative $\gamma_{Q,R}$ satisfying $\sum_{e \in R} [f(Q \cup \{e\}) - f(Q)] \ge \gamma_{Q,R}\, [f(Q \cup R) - f(Q)]$. It has been shown that using local bounds leads to significantly better guarantees [36]. First, we prove a lower bound for $\gamma_{Q,R}$ as follows.

Theorem 2. For two disjoint sets $Q, R$,
$$\sum_{e \in R} \left[f(Q \cup \{e\}) - f(Q)\right] \ge \gamma_{Q,R} \left[f(Q \cup R) - f(Q)\right], \quad \text{where } \gamma_{Q,R} \ge \frac{1}{1 + \frac{\Delta(H_Q)}{4}\,(|R| - 1)} = \frac{4}{4 + \Delta(H_Q)\,(|R| - 1)}.$$
Proof. See App. 7.2.

This theorem gives a lower bound on $\gamma_{Q,R}$ for any disjoint sets $Q$ and $R$. Obs. 1 and Thm. 2 show that monotonicity and local pseudo-submodularity hold for our objective function. We next leverage these properties to design a randomized greedy algorithm with approximation guarantees.

4.2 Randomized Greedy (Rg)
Lemma 8 ([36]). Assume $0 \le \gamma_i \le 1$ for $i \in \{0, 1, 2, \ldots, k-1\}$ such that $\sum_{e \in OPT} [f(S_i \cup \{e\}) - f(S_i)] \ge \gamma_i \cdot [f(S_i \cup OPT) - f(S_i)]$ (local pseudo-submodularity) throughout the execution of the Rg algorithm, where $f$ is monotonic, $OPT$ denotes the optimal set of edges,
and $S_i$ denotes the set of chosen elements after the $i$-th iteration (i.e., $|S_i| = i$). Then Rg obtains an approximation of $1 - \exp\left(-\frac{1}{k} \sum_{i=0}^{k-1} \gamma_i\right)$ with high probability.

We can directly apply this lemma in our setting. The Rg algorithm is described in Algorithm 2.

Algorithm 2 Randomized Greedy
Require: The initial subgraph $H$, balanced subgraph $S(H)$, budget $b$, candidate set $\mathcal{C}$
Ensure: A set $B$ of $b$ edges
1: $B \leftarrow \emptyset$
2: for $i = 1$ to $b$ do
3:   Compute the set of edges $\mathcal{C}^*$ on the periphery of $S(H)$ connecting to nodes in $H \setminus S(H)$.
4:   for $e \in \mathcal{C}^* \cap \mathcal{C}$ do
5:     Compute $f(\{e\}) = \Delta(H_{\{e\}}) - \Delta(H)$.
6:   Find a subset $M_i \subseteq \mathcal{C}^*$ of size $b$ maximizing $\sum_{e \in M_i} f(\{e\})$.
7:   Select a uniformly random element $e_i$ from $M_i$.
8:   Delete $e_i$ from $H$; $B \leftarrow B \cup \{e_i\}$.
9:   Update $S(H)$ to include the nodes that become compatible after deleting $e_i$.
10: return $B$

Theorem 3. For Mbed, the Rg algorithm obtains an approximation of $1 - e^{-\gamma'}$, where $\gamma' \ge \frac{4}{4 + \Delta^*(b-1)}$, and $b$ and $\Delta^*$ denote the budget and the balance after deleting the optimal set of edges, respectively.
Proof. Let us denote the optimal set of $b$ edges by $B^*$. By monotonicity, $\Delta(H_{S_0}) \le \Delta(H_{S_1}) \le \cdots \le \Delta(H_{S_{b-1}}) \le \Delta^*$. From Theorem 2, $\gamma_{S_i, B^*} \ge \frac{4}{4 + \Delta(H_{S_i})(|B^*| - 1)} \ge \frac{4}{4 + \Delta^*(b-1)}$. Substituting $\gamma_{S_i, B^*}$ for $\gamma_i$ in Lem. 8 yields the desired result. □

Improved Bounds: The lower bound on $\gamma'$ in Thm. 3 can be tightened. In particular, $\gamma' \ge \frac{4}{4 + \Delta_{RG}(b-1)}$, where $\Delta_{RG}$ denotes the balance after deleting the solution set of $b$ edges produced by Rg. The bound can be further improved to $\gamma' \ge \frac{4\psi_r}{4\psi_r + \Delta_{RG}(b-1)}$, where $\psi_r$ is the sum of the marginal gains of the elements in the optimal solution set over the solution set produced by Rg (see App. 7.3). Table 2 summarizes the additional lower bounds on $\gamma'$ (where the approximation guarantee is $1 - e^{-\gamma'}$) that can be derived for Rg.

Table 2: Lower bounds (higher is better) on $\gamma'$ produced by Rg and Greedy, where $\Delta^*$, $\Delta_{RG}$ and $\Delta_G$ denote the balance after deleting the optimal set of edges, the set produced by Rg, and the set produced by Greedy, respectively. $\psi_r$ and $\psi_g$ are the sums of the marginal gains of the elements in the optimal solution set over the solution sets produced by Rg and Greedy, respectively.
Algorithm   Case I                          Case II                            Case III
Rg          $\frac{4}{4+\Delta^*(b-1)}$     $\frac{4}{4+\Delta_{RG}(b-1)}$     $\frac{4\psi_r}{4\psi_r+\Delta_{RG}(b-1)}$
Greedy      $\frac{4}{4+\Delta^*(b-1)}$     $\frac{4}{4+\Delta_{G}(b-1)}$      $\frac{4\psi_g}{4\psi_g+\Delta_{G}(b-1)}$

Implementation: Alg. 2 first computes the set of peripheral edges of the initial balanced subgraph $S(H)$ (line 3). Then, $f(\{e\})$ is computed for all peripheral candidate edges (lines 4-5). Using these values, the subset of peripheral edges of cardinality $b$ maximizing the sum of $f(\{e\})$ is chosen, and a random edge from this subset is selected for deletion (lines 6-8). Following this edge deletion, the balanced subgraph $S(H)$ is updated to include the newly compatible nodes (line 9). The peripheral edge set of the updated $S(H)$ is recomputed (line 3), and this process continues iteratively for $b$ iterations.

4.3 The Greedy Approach
The only difference from Alg. 2 is that instead of choosing a random edge from the top-$b$ edges with the highest sum of $f(\{e\})$ values (lines 6-7), the greedy algorithm (Greedy) chooses the edge with the highest $f(\{e\})$, i.e., $e_i = \arg\max_{e \in \mathcal{C}^*} f(\{e\})$.

Theoretical Bounds: We derive the approximation guarantee of Greedy in App. 7.4. Table 2 summarizes the different lower bounds on $\gamma'$ (where the approximation guarantee is $1 - e^{-\gamma'}$).

4.4 Time Complexity
Alg. 2 comprises three main parts that dominate its time complexity: (i) calls to compute $f(\{e\})$ for all candidate edges, (ii) computing the peripheral edge set (line 3), and (iii) updating the balanced subgraph $S(H)$ (line 9).
(i) Computing $f(\{e\})$: For each edge $e$ in the peripheral edge set, the computation of $f(\{e\})$ first checks whether the endpoint of $e$ lying outside the balanced subgraph can be inducted into it upon deletion of $e$. This requires checking the signs of all edges incident on that vertex, which on average takes $O(d_{avg})$ time, where $d_{avg}$ is the average degree of a node in the graph. If the node is inducted, a breadth-first search (BFS) is performed to count its compatible neighbors that could be included in the newly balanced subgraph. So, each $f(\{e\})$ computation takes $O(|E|\, d_{avg})$ time.
(ii)-(iii) For updating the balanced subgraph and the corresponding peripheral edge set, a similar BFS is performed to find the vertices to be inducted into $S(H)$; the incompatible edges encountered during this search form the peripheral edge set of the updated balanced subgraph.
So, the overall time complexity of Rg is $O(b\, |\mathcal{C}|\, |E|\, d_{avg})$. Greedy has the same complexity.

5 EXPERIMENTS
In this section, we benchmark the proposed algorithms and analyze their efficacy, efficiency and scalability.

5.1 Experimental Setup
All algorithms have been implemented in Python 3.6.9 on an Ubuntu 18.04 PC with a 2.1 GHz Intel® Xeon® Platinum 8160 processor, 256 GB RAM and a 7200 RPM, 8.5 TB disk. The codebase is available online.³

³ https://github.com/Ksartik/MBED

Table 3: Description of datasets. $G^*$ denotes the largest connected component (LCC) of graph $G$, and $\Delta(G^*)$ the size of the maximum balanced subgraph of the LCC. $\rho^- = \frac{|E^-|}{|E^+ \cup E^-|}$ denotes the proportion of negative edges in the graph.
Dataset         $|V|$   $|E^+ \cup E^-|$   $\rho^-$   $|V(G^*)|$   $\Delta(G^*)$
BitcoinAlpha    4k      14k                0.09       3772         2903
BitcoinOTC      6k      21k                0.15       5872         4487
Chess           7k      32k                0.42       6601         3477
WikiElections   7k      100k               0.22       7066         3857
Slashdot        82k     498k               0.23       82052        51486
WikiConflict    118k    1.4M               0.62       96243        53542
Epinions        131k    708k               0.17       119070       81385
WikiPolitics    138k    712k               0.12       137713       68037

5.1.1 Datasets. We use publicly available signed networks from http://konect.cc. Table 3 summarizes the dataset statistics. Each of these networks models polarized (signed) social interactions. BitcoinOTC and BitcoinAlpha are trust/distrust networks on two Bitcoin trading platforms, and Epinions is a trust/distrust network on an online product-rating site. Chess represents the results of chess games, with an edge being positive if white won and negative otherwise. Slashdot comprises the friend/foe relations on the news site Slashdot. The edges in WikiConflict represent positive/negative conflicts on Wikipedia. WikiPolitics contains interpreted interactions

(a) BitcoinAlpha (b) BitcoinOTC (c) Chess (d) WikiElections

(e) Epinions (f) Slashdot (g) WikiConflict (h) WikiPolitics Figure 2: Impact of budget on IB% (Eq. 5). Rg and Greedy are superior by up to 7 times than the closest baseline (Min-Cep). between editors of political articles on Wikipedia. WikiElections criterion. In particular, the spectral methods Isa and Spec-Top do connects Wikipedia users who voted for/against each other. We not perform well since it chooses edges based on an upper-bound ignore the direction of the edges in the directed graphs and remove to minimize the minimum eigenvalue of the corresponding Lapla- any loops and multi-edges. cian. Though the balanced graph has minimum eigenvalue of the 5.1.2 Baselines. Besides Greedy and Randomized Greedy (Rg), we Laplacian as 0, the rate at which the edge deletions move towards consider the following baselines: achieving it, might still be low. We also observe that Greedy, in • Spec-Top: In §3.1, we design a spectral approach using an upper- general, performs better than Rg. It would be wrong, however, to bound of the minimum eigen value of the Laplacian. draw the conclusion that Greedy is always better. In subsequent • Isa: Alg. 1 describes this baseline, which is based on perturbation experiments where we choose 푘-cores as the input subgraphs, we theory. We only consider the peripheral edges as the candidates. will see that Rg performs better. We will revisit the topic of Greedy • Random: We randomly delete 푏 edges from the periphery of 푆 (퐻), vs Rg while discussing that experiment. where 퐻 is the initial given subgraph. • Min-Cep: Obs. 3 shows that an edge ( ) deletion associated with 푒 5.2.2 Larger budget on large datasets: To further demonstrate the | ( )| = a node 푥 is favorable if 푐푒푝 퐻{푒 }, 푥 0. Thus, we iteratively efficacy of our methods we vary the budget as a function of C. i.e., delete the peripheral edge minimizing | ( )|. 푐푒푝 퐻{푒 }, 푥 all edges in 퐻. Fig. 
3 shows the percentage increase in balance (IB) 5.1.3 Parameters: The default input subgraph 퐻 is the largest con- for the four largest datasets. Consistent with previous experiments, nected component (LCC) of the signed graph. We find the initial Rg and Greedy outperform all baselines (better by up to 6% points). maximum balanced graph 푆 (퐻) using TIMBAL [31]. Table 3 lists the More interestingly, we observe that a substantial increase in balance size of the LCC and its balance in each of the datasets. In addition, is feasible (9% or up to 4000 nodes) by deleting only 0.1% of edges for some experiments, we also use 푘-core structures that are well- (≈ 500 edges). In other words, improvement in balance-dependent known for community discovery [34]. The set of candidate edges C community functions, such as team performance or stability, may be is set to all edges in 퐻. The budget 푏 is varied in each experiment. significantly improved through minor adjustments to the network. 5.1.4 Performance Metric: The quality of a solution (edge) set 퐵 for a given subgraph 퐻 is defined as the percentage of nodes that 5.2.3 Scalability: Table 4 shows the running times of all algorithms gets included in the balanced subgraph after the deletion of 퐵. against budget in the three largest datasets. Although Rg and Greedy Δ(퐻 ) − Δ(퐻) 퐼퐵(퐵, 퐻)(%) = 퐵 × 100. (5) are slower than the other baselines, they finish within a few minutes |퐻 | − Δ(퐻) even on a million edges’ network. Thus, scalability to large networks 5.2 Efficacy and Efficiency is not a concern. A more interesting behavior is witnessed in the correlation between efficacy and efficiency. More specifically, we 5.2.1 Small budget on all datasets: Fig. 2 shows the percentage in- observe that the better performance of an algorithm in IB%, the crease in balance (IB) for eight datasets achieved by each algorithm. higher is its running time. When an algorithm performs better, it Greedy and Rg outperform all the baselines by up to 12%. 
Besides means in each iteration, the algorithm produces a larger cascading having approximation guarantees (Thms. 3 and 4), Greedy and Rg impact following an edge deletion. Higher cascading impact leads to directly optimize the objective function in an iterative fashion. In a larger number of new peripheral edges coming into consideration. contrast, the baselines choose solution edges depending on other Consequently, the running time goes up. Balance Maximization in Signed Networks via Edge Deletions WSDM ’21, March 08–12, 2021, Jerusalem, Israel
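As a concrete reading of Eq. (5), the helper below computes IB% from the three quantities it uses. It is a minimal sketch, not the authors' released code: the function name `increase_in_balance` is hypothetical, and it assumes that $|H|$ counts the nodes of $H$ and that $\Delta(H)$ and $\Delta(H_B)$ have already been computed (for instance with TIMBAL [31], as in the experiments).

```python
def increase_in_balance(delta_before: int, delta_after: int, num_nodes: int) -> float:
    """Percentage increase in balance IB(B, H), following Eq. (5).

    delta_before: Delta(H), size of the maximum balanced subgraph of H.
    delta_after:  Delta(H_B), the same quantity after deleting the edge set B.
    num_nodes:    |H|, taken here to be the number of nodes of H (assumption).
    """
    for value in (delta_before, delta_after):
        if not 0 <= value <= num_nodes:
            raise ValueError("balance values must lie in [0, |H|]")
    headroom = num_nodes - delta_before  # nodes still outside S(H)
    if headroom == 0:
        raise ValueError("H is already balanced, so IB is undefined")
    # May be negative, since deletions are not guaranteed to increase balance.
    return 100.0 * (delta_after - delta_before) / headroom


if __name__ == "__main__":
    # Hypothetical numbers: 1000 nodes, balance grows from 600 to 640.
    print(increase_in_balance(600, 640, 1000))  # 10.0
```

Normalizing by $|H| - \Delta(H)$ rather than by $|H|$ measures progress relative to the nodes that were still outside the balanced subgraph, so in the example 40 newly balanced nodes out of 400 candidates give 10%.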

Figure 3: The quality of all methods with large budgets (ED denotes the fraction of edges deleted) on the four large datasets. Panels: (a) Epinions, (b) Slashdot, (c) WikiConflict, (d) WikiPolitics.
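The Greedy-versus-Rg discussion in §5.2.1 and §5.3 turns on a single selection step. The sketch below contrasts the two rules on an abstract marginal-gain oracle. It is illustrative only: all names are hypothetical, the oracle `gain(chosen, e)` stands in for $f(\text{chosen} \cup \{e\}) - f(\text{chosen})$, and for Rg we assume the common random-greedy rule of sampling uniformly among the $b$ candidates with the largest marginal gains, which may differ in detail from the paper's Rg.

```python
import random
from typing import Callable, Hashable, List, Sequence

Edge = Hashable
# gain(chosen, e): marginal gain of adding edge e to the already chosen edges.
GainFn = Callable[[List[Edge], Edge], float]


def greedy_select(candidates: Sequence[Edge], gain: GainFn, budget: int) -> List[Edge]:
    """Deterministic greedy: always take the edge with the largest marginal gain."""
    chosen: List[Edge] = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        # max() returns the first maximal element, so ties follow candidate order.
        best = max(remaining, key=lambda e: gain(chosen, e))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def randomized_greedy_select(candidates: Sequence[Edge], gain: GainFn,
                             budget: int, rng: random.Random) -> List[Edge]:
    """Assumed Rg-style rule: pick uniformly among the top-`budget` edges."""
    chosen: List[Edge] = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        ranked = sorted(remaining, key=lambda e: gain(chosen, e), reverse=True)
        pick = rng.choice(ranked[:budget])
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


if __name__ == "__main__":
    # Hypothetical, modular toy gains with three near-tied edges.
    gains = {"e1": 5.0, "e2": 4.9, "e3": 4.8, "e4": 0.1}
    toy_gain = lambda chosen, e: gains[e]
    print(greedy_select(list(gains), toy_gain, 2))  # ['e1', 'e2']
    print(randomized_greedy_select(list(gains), toy_gain, 2, random.Random(0)))
```

With near-tied gains such as 5.0, 4.9 and 4.8, Greedy's choice is fixed by tiny differences, while the randomized rule spreads its picks over the near-tied edges; when the sorted gains drop steeply, Greedy's deterministic choice is the safer bet.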

Figure 4: Increase in the balance when the input subgraph is a $k$-core. Results are shown against varying values of $k$ for $b = 50$. Panels: (a) Epinions, (b) Slashdot, (c) WikiConflict, (d) WikiPolitics.

5.3 Impact of Community Density
In this experiment, we systematically vary the density of the input community $H$ and analyze its impact on performance. To control the density of $H$, we use the $k$-core [38] as the input subgraph. As $k$ increases, $H$ gets denser. Table 5 shows the maximum and minimum $k$-core sizes along with their balance for each dataset. We vary the value of $k$ depending on the $k$-core distribution of the graph. As high $k$-cores contain fewer nodes, the highest value of $k$ is chosen such that the $k$-core contains at least 10% of the nodes of the original graph.

Fig. 4 presents the results. In this section, we only consider the three best-performing algorithms: Greedy, Rg and Min-Cep. Greedy and Rg continue to be the best performers. Another interesting behavior we observe is that the higher the $k$, and therefore the density, the smaller the gap between Greedy and Rg. In some cases, Rg performs better than Greedy. This behavior is a direct consequence of how Rg and Greedy operate. Greedy deterministically chooses the edge with the highest marginal gain. Consequently, when the marginal gains drop steeply in sorted order, choosing the top edge produces a good result. However, when they drop only gradually and several edges provide similarly high marginal gains, Rg performs better.

5.4 Visualizations on Bitcoin Network
In the next experiment, we visually inspect the impact of edge deletions on increasing balance in the BitcoinOTC data. Fig. 5 presents the gradual increase in the size of the balanced component following 5 and 10 edge deletions. It shows that: (1) both positive and negative edges are chosen for deletion, and (2) a single deletion may have a significant cascading impact (as visible in the appearance of several new green squares in Fig. 5(c)).

6 CONCLUSIONS
In this paper, we studied the problem of maximizing the balance in signed networks via edge deletion. While existing studies have focused primarily on finding the largest balanced subgraph, we adopted a network design approach to improve balance inside a subgraph. We proved that the problem is NP-hard, non-submodular, and non-monotone. To overcome the resulting computational challenges, we designed an efficient heuristic based on the relation between Laplacian eigenvalues and the balance of the corresponding signed graphs. Since these heuristics do not carry approximation guarantees, we leveraged the pseudo-submodularity of the objective function to design greedy algorithms with provable approximation guarantees. Through an extensive set of experiments, we showed that the proposed approximation algorithms outperform the baseline algorithms while being scalable to large graphs. An interesting future direction would be to explore alternative network design mechanisms, such as node deletion and edge-sign flips, to improve balance. From a theoretical perspective, we also aim to investigate the parameterized complexity of balance-related design problems.

Table 4: Running times (in minutes) of the algorithms for varying budget on the largest available datasets.

               Epinions        WikiPolitics    WikiConflict
Method  b =    10   30   50    10   30   50    10   30   50
Isa             2    6   11     2    7   12     4   12   19
Spec-Top        2    6    9     3    9   15     3    7   11
Min-Cep         4    5    6     4    6    7    10   12   14
Rg              6    7    9     7    9   10    13   16   18
Greedy          7   13   18     9   18   25    15   22   28

Table 5: Sizes of the $k$-core ($H_k$) corresponding to the minimum ($H_{k_{min}}$) and maximum ($H_{k_{max}}$) values of $k$ considered for each dataset in Fig. 4.

Dataset        |V(H_kmin)|   Δ(H_kmin)   |V(H_kmax)|   Δ(H_kmax)
Epinions       26k           20k         13k           10k
Slashdot       23k           14k         8k            4k
WikiConflict   25k           17k         12k           9k
WikiPolitics   37k           30k         14k           11k

7 APPENDIX

7.1 Proof of Lemma 7
Proof. Given a signed graph $G = (V, E, \sigma)$ and a subgraph $H$, let $\lambda_i$ and $\tilde{\lambda}_i$ be the eigenvalues of $L(H)$ and of the perturbed matrix $\hat{L}(H)$ (after deleting a single edge $(i, j)$), respectively, where $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_m$. We have $\hat{L}(H) = L(H) + P$ with perturbation matrix $P = \bar{D} + S$, where $\bar{D}$ is a diagonal matrix with $\bar{D}_{ii} = \bar{D}_{jj} = -1$ and 0 otherwise, and $S_{ij} = S_{ji} = \sigma(i, j)$ for the perturbed edge $(i, j) \in E$ and 0 otherwise. Given $\boldsymbol{v}$ as the unit eigenvector corresponding to $\lambda_1$, we have

$$\boldsymbol{v}^T \bar{D} \boldsymbol{v} = \sum_{k=i,j} \bar{D}_{kk} v_k^2, \qquad \boldsymbol{v}^T S \boldsymbol{v} = \sum_{i,j} \sigma(i, j)\, v_i v_j.$$

From first-order matrix perturbation theory (see p. 183 of [37]),

$$\begin{aligned}
\tilde{\lambda}_1 &= \lambda_1 + \boldsymbol{v}^T P \boldsymbol{v} + O(\|P\|_F^2) = \lambda_1 + \boldsymbol{v}^T \bar{D} \boldsymbol{v} + \boldsymbol{v}^T S \boldsymbol{v} + O(\|P\|_F^2)\\
&= \lambda_1 - \sum_{k=i,j} v_k^2 + \sum_{i,j} \sigma(i, j)\, v_i v_j + O(1)\\
&= \lambda_1 - v_i\,(v_i - \sigma(i, j)\, v_j) - v_j\,(v_j - \sigma(j, i)\, v_i) + O(1)\\
&= \lambda_1 - (v_i - \sigma(i, j)\, v_j)^2 + O(1).
\end{aligned}$$

Now, to show that $\tilde{\lambda}_1(H)$ is indeed the smallest eigenvalue of $\hat{L}(H)$, we use matrix perturbation theory (p. 203 of [37]):

$$\tilde{\lambda}_1 \le \lambda_1 + \|P\|_2 \le \lambda_1 + \|P\|_F \le \lambda_1 + 2, \qquad \tilde{\lambda}_i \ge \lambda_i - \|P\|_2 \ge \lambda_i - \|P\|_F \ge \lambda_i - 2 \quad (i \ge 2).$$

Since the spectral gap $\delta = \lambda_2 - \lambda_1 \ge 4$, we have $\tilde{\lambda}_i \ge \tilde{\lambda}_1$ for all $i \ge 2$. Hence $\tilde{\lambda}_1 = \hat{\lambda}_1$ is the smallest eigenvalue of $\hat{L}(H)$. □

7.2 Details for the proof of Theorem 2
Before proving Thm. 2, we derive a few results. Let $\hat{\mathcal{V}}$ be the node set that gets added to the maximum balanced subgraph $S(H)$ after deleting the edges in $B$. We know that for every $u \in \hat{\mathcal{V}}$ there exists $v \in S(H)$ such that $(u, v) \in B$. The inclusion of one node may lead to the inclusion of more nodes in the balanced portion. Let $C_u$ be the size of the component that gets added with $u \in \hat{\mathcal{V}}$, and let $C^* = \max\{C_u : u \in \hat{\mathcal{V}}\}$.

Observation 4.
$$C^* + 1 \le \frac{\Delta(H)}{2} \qquad (6)$$

Proof by Contradiction. If $|C_u| > \frac{\Delta(H) - 2}{2}$, then the initial $S(H)$ would consist of the larger of $V_1$ and $V_2$ (which has size at least $\frac{\Delta(H)}{2}$) together with $\{u\} \cup C_u$. This set would have more than $\frac{\Delta(H)}{2} + 1 + \frac{\Delta(H) - 2}{2} = \Delta(H)$ nodes, contradicting the maximality of $S(H)$.

Choice of $\alpha(B)$ and Peripheral Edges (PE): Let $\alpha(B)$ be the number of nodes $x$ satisfying: (1) $|\mathrm{cep}(H_{\{e\}}, x)| > 0$ for all $e \in B$, and (2) $|\mathrm{cep}(H_Y, x)| = 0$ for some subset $Y \subseteq B$, $Y \ne \emptyset$. We use Obs. 3 to restrict the edge set $B$ to always lie on the periphery of the current balanced subgraph. An upper bound on $f(B)$ is as follows.

Lemma 9.
$$f(B) \le \sum_{i=1}^{b} f(\{e_i\}) + (C^* + 1)\,\alpha(B). \qquad (7)$$

Proof. This is proved by induction (Sec. 8.4). □

7.2.1 Final proof for Theorem 2.
Proof. Note that $f(Q \cup R) - f(Q) = \Delta(H_{Q \cup R}) - \Delta(H_Q)$. We can write this as $\Delta(H'_R) - \Delta(H')$, where $H' = H_Q$. That is, the marginal gain in balance of deleting the set $R$ over $Q$ is the same as the marginal gain in balance of deleting the set $R$ from $H_Q$. We can thus use $f'(R) = \Delta(H'_R) - \Delta(H')$ in place of $f$. Thus, by Lem. 9:

$$f'(B) \le \sum_{i=1}^{b} f'(\{e_i\}) + (C^* + 1)\,\alpha(B), \qquad (8)$$

where $C^*$ and $\alpha$ are defined accordingly for the new initial subgraph $H' = H_Q$. Next, we propose an upper bound on $\alpha(\cdot)$:

$$\alpha(B) \le \frac{|B| - 1}{2}. \qquad (9)$$

This holds since at least two edges are needed for one node to be counted in $\alpha(B)$. Now we have

$$\begin{aligned}
\frac{\sum_{e \in R} [f(Q \cup \{e\}) - f(Q)]}{f(Q \cup R) - f(Q)} = \frac{\sum_{e \in R} f'(\{e\})}{f'(R)}
&\ge \frac{1}{1 + \frac{\alpha(R)\,(C^* + 1)}{\sum_{e \in R} f'(\{e\})}} && \text{(replace } f'(R) \text{ using Eq. 8)}\\
&\ge \frac{1}{1 + \frac{\frac{|R| - 1}{2}\,(C^* + 1)}{\sum_{e \in R} f'(\{e\})}} && \text{(upper bound on } \alpha \text{ from Eq. 9)}\\
&\ge \frac{1}{1 + \frac{1}{4}\,\Delta(H_Q)\,(|R| - 1)} && \left(\textstyle\sum_{e \in R} f'(e) \ge 1,\; C^* + 1 \le \frac{\Delta(H_Q)}{2} \text{ by Eq. 6}\right).
\end{aligned}$$
□
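The last line of this derivation, together with the construction of Sec. 8.5, can be checked with exact rational arithmetic. The sketch below (hypothetical function names; integer inputs assumed) evaluates the lower bound $\frac{1}{1 + \frac{1}{4}\Delta(H_Q)(|R|-1)}$ and the ratio attained by the Sec. 8.5 instance, whose singleton gains sum to 1 and whose joint gain is $1 + \left(\frac{\Delta(H_Q)-2}{2} + 1\right)\frac{b-1}{2}$.

```python
from fractions import Fraction


def ratio_lower_bound(delta_hq: int, r_size: int) -> Fraction:
    """Final line of the Thm. 2 derivation: 1 / (1 + Delta(H_Q) * (|R| - 1) / 4)."""
    return 1 / (1 + Fraction(delta_hq * (r_size - 1), 4))


def tight_instance_ratio(delta_hq: int, b: int) -> Fraction:
    """Ratio attained by the Sec. 8.5 construction (hypothetical helper).

    Singleton marginal gains sum to 1, while the joint gain of R is
    1 + ((Delta(H_Q) - 2) / 2 + 1) * (b - 1) / 2.
    """
    singleton_sum = Fraction(1)
    joint_gain = 1 + (Fraction(delta_hq - 2, 2) + 1) * Fraction(b - 1, 2)
    return singleton_sum / joint_gain


if __name__ == "__main__":
    for delta_hq, b in [(10, 5), (100, 20)]:
        print(delta_hq, b, ratio_lower_bound(delta_hq, b), tight_instance_ratio(delta_hq, b))
```

Since $\left(\frac{\Delta(H_Q)-2}{2}+1\right)\frac{b-1}{2} = \frac{\Delta(H_Q)(b-1)}{4}$, both expressions equal $\frac{4}{4 + \Delta(H_Q)(b-1)}$ for every $\Delta(H_Q)$ and $b$, which is what makes the bound tight.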

We also show a construction for the tight lower bound in Thm. 2 (Sec. 8.5).

Figure 5: Visualization of the impact of edge deletions by Greedy. Green and orange denote the two partitions of the balanced subgraph $S(H)$; grey denotes the component outside $S(H)$. The solid red and blue edges are positive and negative edges, respectively, while the dashed edges in (b) and (c) are the ones being deleted. Panels (b) and (c) also show, as green and orange squares, the new components being added to the balanced portion. Panels: (a) Initial subgraph, (b) After 5 deletions, (c) After 10 deletions.

7.3 Proof with bound $\frac{4\psi_r}{4\psi_r + \Delta^{RG}(b-1)}$
In the proof of Thm. 2, we used $\sum_{e \in R} f'(e) \ge 1$. However, $\sum_{e \in R} f'(e) \ge \psi_r$, where $\psi_r$ is the sum of the marginal gains of the elements in the optimal solution set (i.e., $R$) over the solution set produced by Rg (i.e., $Q$). Replacing $\sum_{e \in R} f'(e)$ by $\psi_r$, we get $\gamma' \ge \frac{4\psi_r}{4\psi_r + \Delta^{RG}(b-1)}$ according to Thm. 3.
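To see how a lower bound on $\gamma'$ turns into a worst-case guarantee, the sketch below evaluates $\frac{4\psi}{4\psi + \Delta(b-1)}$ and the factor $1 - e^{-\gamma'}$ of Lemma 10. It is a hedged illustration with hypothetical names: `psi = 1` recovers the unrefined bound of Theorems 2 and 4 (with `delta` playing the role of $\Delta(H_Q)$ or $\Delta^*$), while setting `psi` to $\psi_r$ (or $\psi_g$ for Greedy) gives the refinement of this section.

```python
import math


def gamma_lower_bound(delta: float, budget: int, psi: float = 1.0) -> float:
    """Lower bound 4*psi / (4*psi + delta*(budget - 1)) on the ratio gamma'.

    psi = 1 gives the bound of Thms. 2 and 4 (delta = Delta(H_Q) or Delta*);
    psi = psi_r (Rg) or psi_g (Greedy) gives the refined bound of Sec. 7.3.
    """
    if budget < 1 or delta < 0 or psi <= 0:
        raise ValueError("need budget >= 1, delta >= 0 and psi > 0")
    return 4.0 * psi / (4.0 * psi + delta * (budget - 1))


def approximation_factor(gamma: float) -> float:
    """Worst-case guarantee 1 - e^(-gamma) from Lemma 10."""
    return 1.0 - math.exp(-gamma)


if __name__ == "__main__":
    # Hypothetical community: balance 1000 after the optimal deletions, budget 50.
    for psi in (1.0, 10.0, 100.0):
        g = gamma_lower_bound(1000, 50, psi)
        print(f"psi={psi:>6}: gamma' >= {g:.2e}, guarantee >= {approximation_factor(g):.2e}")
```

With $\Delta = 1000$ and $b = 50$ the unrefined bound is $4/49004 \approx 8.2 \times 10^{-5}$, so the worst-case guarantee alone is small; while $4\psi_r \ll \Delta(b-1)$, the refined bound grows roughly in proportion to $\psi_r$.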

7.4 Approximation by Greedy
Lemma 10 ([8]). Let $f$ be a non-negative and monotone set function and $b$ a budget, and suppose

$$\sum_{e \in R} \left[f(S^G \cup \{e\}) - f(S^G)\right] \ge \gamma \cdot \left[f(S^G \cup R) - f(S^G)\right],$$

where $S^G$ is the final set selected by the Greedy algorithm. Then the algorithm has an approximation guarantee of $(1 - e^{-\gamma_{S^G,b}})$, where $\gamma_{S^G,b} = \min\{\gamma\}$ over all $R$ with $S^G \cap R = \emptyset$.

We apply this result in our problem setting:

Theorem 4. For the MBED problem, the Greedy algorithm obtains an approximation of $1 - e^{-\gamma'}$ with $\gamma' \ge \frac{4}{4 + \Delta^*(b-1)}$, where $b$ and $\Delta^*$ denote the budget and the balance after deleting the optimal set of edges, respectively.

Proof. Let the optimal set of $b$ edges be $B^*$ and let $S^G$ denote the final edge set selected by the Greedy algorithm. Let $\Delta^*$ denote the balance after deleting the optimal set of edges; by definition, $\Delta(H_{S^G}) \le \Delta^*$. From Theorem 2,

$$\gamma' = \gamma_{S^G,|B^*|} \ge \frac{4}{4 + \Delta(H_{S^G})(|B^*| - 1)} \ge \frac{4}{4 + \Delta^*(b - 1)}.$$

Substituting $\gamma_{S^G,|B^*|}$ (i.e., $\gamma'$) for $\gamma_{S^G,b}$ in Lem. 10, we obtain the desired approximation $1 - e^{-\gamma'}$ with $\gamma' \ge \frac{4}{4 + \Delta^*(b-1)}$. □

The other lower bounds for $\gamma'$ (where the approximation produced by Greedy is $1 - e^{-\gamma'}$), namely $\frac{4}{4 + \Delta^{G}(b-1)}$ and $\frac{4\psi_g}{4\psi_g + \Delta^{G}(b-1)}$, can be derived in the same way as in the case of Rg.

REFERENCES
[1] Jin Akiyama, David Avis, Vasek Chvátal, and Hiroshi Era. 1981. Balancing signed graphs. Discrete Applied Mathematics 3, 4 (1981), 227–233.
[2] Ashwin Arulselvan. 2014. A note on the set union knapsack problem. Discrete Applied Mathematics 169 (2014), 214–218.
[3] O. Askarisichani, J. Ng Lane, F. Bullo, N. E. Friedkin, A. K. Singh, and B. Uzzi. 2019. Structural Balance Emerges and Explains Performance in Risky Decision-Making. 10, 2648 (2019). https://doi.org/10.1038/s41467-019-10548-8
[4] Francesco Belardo. 2014. Balancedness and the least eigenvalue of Laplacian of signed graphs. Linear Algebra Appl. 446 (2014), 133–147.
[5] Vineet Chaoji, Sayan Ranu, Rajeev Rastogi, and Rushi Bhatt. 2012. Recommendations to boost content spread in social networks. In WWW. 529–538.
[6] Ed S. Coakley and Vladimir Rokhlin. 2013. A fast divide-and-conquer algorithm for computing the spectra of real symmetric tridiagonal matrices. Applied and Computational Harmonic Analysis 34, 3 (2013), 379–414.
[7] Pierluigi Crescenzi, Gianlorenzo D’Angelo, Lorenzo Severini, and Yllka Velaj. 2015. Greedily Improving Our Own Centrality in A Network. In SEA. Springer International Publishing, 43–55.
[8] A. Das and D. Kempe. 2018. Approximate submodularity and its applications: subset selection, sparse approximation and dictionary selection. The Journal of Machine Learning Research 19, 1 (2018), 74–107.
[9] Bhaskar DasGupta, German Andres Enciso, Eduardo Sontag, and Yi Zhang. 2007. Algorithmic and complexity results for decompositions of biological networks into monotone subsystems. Biosystems 90, 1 (2007), 161–178.
[10] Palash Dey and Sourav Medya. 2020. Manipulating Node Similarity Measures in Network. In AAMAS.
[11] Bistra Dilkina, Katherine J. Lai, and Carla P. Gomes. 2011. Upgrading shortest paths in networks. In Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems. Springer, 76–91.
[12] Rosa Figueiredo and Yuri Frota. 2014. The maximum balanced subgraph of a signed graph: Applications and solution approaches. European Journal of Operational Research 236, 2 (2014), 473–487.
[13] Venkata Rama Kiran Garimella and Ingmar Weber. 2017. A long-term analysis of polarization on Twitter. In Eleventh International AAAI Conference on Web and Social Media.
[14] Olivier Goldschmidt, David Nehme, and Gang Yu. 1994. Note: On the set-union knapsack problem. Naval Research Logistics (NRL) (1994).
[15] Frank Harary et al. 1953. On the notion of balance of a signed graph. The Michigan Mathematical Journal 2, 2 (1953), 143–146.
[16] Yaoping Hou, Jiongsheng Li, and Yongliang Pan. 2003. On the Laplacian eigenvalues of signed graphs. Linear and Multilinear Algebra 51, 1 (2003), 21–30.
[17] Falk Hüffner, Nadja Betzler, and Rolf Niedermeier. 2007. Optimal edge deletions for signed graph balancing. In International Workshop on Experimental and Efficient Algorithms. Springer, 297–310.
[18] Vatche Ishakian, Dóra Erdos, Evimaria Terzi, and Azer Bestavros. 2012. A Framework for the Evaluation and Management of Network Centrality. In Proc. SIAM International Conference on Data Mining. 427–438.
[19] David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In KDD.
[20] Masahiro Kimura, Kazumi Saito, and Hiroshi Motoda. 2008. Minimizing the Spread of Contamination by Blocking Links in a Network. In AAAI.
[21] Andrew V. Knyazev. 2001. Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method. SIAM Journal on Scientific Computing 23, 2 (2001), 517–541.
[22] Hong-hai Li and Jiong-sheng Li. 2009. Note on the normalized Laplacian eigenvalues of signed graphs. Australasian J. Combinatorics 44 (2009), 153–162.
[23] Yimin Lin and Kyriakos Mouratidis. 2015. Best upgrade plans for single and multiple source-destination pairs. GeoInformatica 19, 2 (2015), 365–404.
[24] Sourav Medya, Tiyani Ma, Arlei Silva, and Ambuj Singh. 2020. A Game Theoretic Approach For Core Resilience. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20.
[25] S. Medya, A. Silva, and A. Singh. 2020. Approximate Algorithms for Data-driven Influence Limitation. IEEE Transactions on Knowledge and Data Engineering (2020).
[26] Sourav Medya, Arlei Silva, Ambuj Singh, Prithwish Basu, and Ananthram Swami. 2018. Group centrality maximization via network design. In Proc. 24th SIAM International Conference on Data Mining. SIAM, 126–134.
[27] Sourav Medya, Jithin Vachery, Sayan Ranu, and Ambuj Singh. 2018. Noticeable network delay minimization via node upgrades. Proceedings of the VLDB Endowment 11, 9 (2018), 988–1001.
[28] Adam Meyerson and Brian Tagiku. 2009. Minimizing average shortest path distances via shortcut edge addition. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX-RANDOM). Springer, 272–285.
[29] Shubhadip Mitra, Sayan Ranu, Vinay Kolar, Aditya Telang, Arnab Bhattacharya, Ravi Kokku, and Sriram Raghavan. 2015. Trajectory aware macro-cell planning for mobile users. In 2015 IEEE Conference on Computer Communications (INFOCOM). IEEE, 792–800.
[30] George L. Nemhauser and Laurence A. Wolsey. 1978. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research 3, 3 (1978), 177–188.
[31] Bruno Ordozgoiti, Antonis Matakos, and Aristides Gionis. 2020. Finding large balanced subgraphs in signed networks. In Proceedings of The Web Conference 2020. 1378–1388.
[32] Lorenzo Orecchia, Sushant Sachdeva, and Nisheeth K. Vishnoi. 2012. Approximating the exponential, the Lanczos method and an Õ(m)-time spectral algorithm for balanced separator. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing. 1141–1160.
[33] Heiko Paulheim. 2017. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8, 3 (2017), 489–508.
[34] Chengbin Peng, Tamara G. Kolda, and Ali Pinar. 2014. Accelerating community detection by using k-core subgraphs. arXiv preprint arXiv:1403.2226 (2014).
[35] Svatopluk Poljak and Daniel Turzík. 1986. A polynomial time heuristic for certain subgraph optimization problems with guaranteed worst case bound. Discrete Mathematics 58, 1 (1986), 99–104.
[36] Richard Santiago and Yuichi Yoshida. 2020. Weakly Submodular Function Maximization Using Local Submodularity Ratio. arXiv preprint arXiv:2004.14650 (2020).
[37] G. W. Stewart and J.-g. Sun. 1990. Matrix Perturbation Theory. Academic Press, Inc.
[38] Fan Zhang, Ying Zhang, Lu Qin, Wenjie Zhang, and Xuemin Lin. 2017. Finding Critical Users for Social Network Engagement: The Collapsed k-Core Problem. In Thirty-First AAAI Conference on Artificial Intelligence. 245–251.
[39] Zhongxin Zhou, Fan Zhang, Xuemin Lin, Wenjie Zhang, and Chen Chen. 2019. K-Core Maximization: An Edge Addition Approach. In IJCAI. 4867–4873.

8 ADDITIONAL PROOFS

8.1 NP-hardness
Proof. Let $SK(U, S, P, W, q)$ be an instance of the Set Union Knapsack Problem [14], where $U = \{u_1, \dots, u_n\}$ is a set of items, $S = \{S_1, \dots, S_m\}$ is a set of subsets ($S_i \subseteq U$), $P: S \to \mathbb{R}^+$ is a subset profit function, $w: U \to \mathbb{R}^+$ is an item weight function, and $q \in \mathbb{R}^+$ is the budget. For a subset $\mathcal{A} \subseteq S$, the weighted union of $\mathcal{A}$ is $W(\mathcal{A}) = \sum_{e \in \cup_{t \in \mathcal{A}} S_t} w_e$, and $P(\mathcal{A}) = \sum_{t \in \mathcal{A}} p_t$. The problem is to find a subset $\mathcal{A}^* \subseteq S$ such that $W(\mathcal{A}^*) \le q$ and $P(\mathcal{A}^*)$ is maximized. SK is NP-hard to approximate within a constant factor [2]. We reduce a version of SK with equal profits and weights (also NP-hard) to the Mbed problem. We define a corresponding Mbed problem instance by constructing a graph $\Gamma$ as follows.

For each $S_i \in S$ and $u_j \in U$, we create nodes $x_i$ and $y_j$, respectively. We also add a node $v$ with a large connected component $L$ of size $l$ attached to it only via positive edges. The node $v$ has negative edges with every node $x_i$, $\forall i \in [m]$, and every node $y_j$, $\forall j \in [n]$. Additionally, if $u_j \in S_i$, a negative edge $(x_i, y_j)$ is added to the edge set $E$. In Mbed, the number of edges to be removed is the budget, $b = q$. The candidate set is $\mathcal{C} = \{(v, y_j) \mid \forall j \in [n]\}$. Note that the initial largest connected balanced component is $\{v \cup L\} \cup \{y_j\ \forall j \in [n]\}$ if $l > m + 1$ (assuming $n > m$). Our claim is that, for any solution $\mathcal{A}$ of an instance of SK, there is a corresponding solution set of edges $B$ (where $|B| = b$) in the graph $\Gamma$ of the Mbed version such that $f(B) = P(\mathcal{A}) + n + l + 1$ if the edges $B = \{(v, y) \mid y \in \mathcal{A}\}$ are removed.

In the new balanced graph, we aim to build two partitions ($W_1$ and $W_2$) as follows. One partition, $W_1$, initially consists of $\{v \cup L\}$. Our goal is to delete edges from $\mathcal{C}$ and add the nodes $y_j$ to $W_1$. If $(v, y_{j'})$ for any $j'$ does not get deleted, then $y_{j'}$ would be in $W_2$. If there is any node $x_i$ that is connected only to nodes in $\mathcal{A}$ besides being connected to $v$, then removing all the edges in $B$ would put the node $x_i$ in $W_2$. Thus, removing the edges in $\mathcal{A}$ would put $P(\mathcal{A})$ nodes in $W_2$. Hence, $f(B) = P(\mathcal{A}) + n + l + 1$.

8.2 Proportionally Submodular
Lemma 8.1. The objective function $f$ is not proportionally submodular [36]. In other words, there exist $S, T \subseteq E$ for some graph $H$ such that

$$|T|\, f(S) + |S|\, f(T) < |S \cap T|\, f(S \cup T) + |S \cup T|\, f(S \cap T).$$

Proof. Consider a balanced subgraph $S(H)$ of $H$ with partition $V_1$ and $V_2$. A node $v$ outside $S(H)$ is connected to $V_1$ with positive edges $e_1$ and $e_2$, and to $V_2$ with another positive edge $e_3$. Thus, the node $v$ cannot be part of $S(H)$. Consider an edge $e_4$ inside $V_1$ that can be removed without disconnecting the graph. Let $S = \{e_1, e_4\}$ and $T = \{e_2, e_4\}$. Then $f(\{e_1, e_4\}) = 0$ and $f(\{e_2, e_4\}) = 0$, since even after removing either of these sets it is not possible to add the node $v$ to $S(H)$. Note that $f(S \cap T) = f(\{e_4\}) = 0$. However, $f(S \cup T) = f(\{e_1, e_2, e_4\}) = 1$, since the node $v$ can now be added. Substituting these values, the left-hand side is $2 \cdot 0 + 2 \cdot 0 = 0$ and the right-hand side is $1 \cdot 1 + 3 \cdot 0 = 1$, so $|T| f(S) + |S| f(T) < |S \cap T| f(S \cup T) + |S \cup T| f(S \cap T)$. □

8.4 Proof of Lemma 9
Proof. We prove this by induction on the number of edges, $b$. Let $B_k \subseteq B$ denote $\{e_1, \dots, e_k\}$. We construct $B$ by only considering peripheral edges $e_{k+1}$ such that, for all $k \le b$, $(e_{k+1}, e') \in \mathrm{cep}(H_{B_k}, x)$ for some node $x$ and edge $e'$.
Base case ($b = 1$): $f(\{e_1\}) \le f(\{e_1\})$. Also, $\alpha(\{e_1\}) = 0$.
Inductive hypothesis (IH): Suppose the inequality holds for $b = k$, i.e., $f(B_k) \le \sum_{i=1}^{k} f(\{e_i\}) + (C^* + 1)\,\alpha(B_k)$.
Inductive step ($b = k + 1$): We present different cases for $e_{k+1}$. Note that $(e_{k+1}, e') \in \mathrm{cep}(H_{B_k}, x)$ for some $x, e'$.
Case 1: $(e_{k+1}, e') \in \mathrm{cep}(H, x)$ and $|\mathrm{cep}(H_{\{e_{k+1}\}}, x)| = 0$, i.e., after deleting $e_{k+1}$, $x$ moves into the balanced subgraph. Then we must also have $|\mathrm{cep}(H_{B_{k+1}}, x)| = 0$. Hence, $f(B_k \cup \{e_{k+1}\}) - f(B_k) = f(\{e_{k+1}\})$, and the inequality holds.
Case 2: Either (1) $(e_{k+1}, e') \in \mathrm{cep}(H, x)$ and $|\mathrm{cep}(H_{\{e_{k+1}\}}, x)| > 0$, or (2) $(e_{k+1}, e') \notin \mathrm{cep}(H, x)$. Thus, by Observation 2, we have $f(\{e_{k+1}\}) = 0$.
Case 2a: Suppose $|\mathrm{cep}(H_{B_{k+1}}, x)| = 0$. Then, by the definitions of $\alpha$ and $C^*$, we have $\alpha(B_{k+1}) = \alpha(B_k) + 1$ and $f(B_k \cup \{e_{k+1}\}) - f(B_k) \le C^* + 1$. Substituting this, we get $f(B_{k+1}) \le (C^* + 1) + \sum_{i=1}^{k} f(\{e_i\}) + (C^* + 1)\,\alpha(B_k) = \sum_{i=1}^{k+1} f(\{e_i\}) + (C^* + 1)\,\alpha(B_{k+1})$.
Case 2b: In all other cases, $f(B_k \cup \{e_{k+1}\}) - f(B_k) = f(\{e_{k+1}\}) = 0$.
This exhausts all cases, and the claim holds for all $b > 0$. □

8.5 Construction for the tight lower bound in Thm. 2
One can construct a graph $H$ and sets $Q, R$ for which equality holds. In particular, let $R$ be of an arbitrary size $b$. Let $H_Q$ have the MBS partition $V_1, V_2$, each of size $\frac{\Delta(H_Q)}{2}$. Nodes of type 1 (Obs. 2) are attached to these, each with a sole connected component of size $\frac{\Delta(H_Q) - 2}{2}$. Let these nodes have 3 such connections (thus, removing two will help; any two such that our "connected assumption" holds are in the set $R$). We add another node of type 1 such that only two such connections are present, one of these is in $R$, and the connected component $C$ attached to it is of size 0. This completes the set $R$. Thus, $\sum_{e \in R} [f(Q \cup \{e\}) - f(Q)] = 1$ and

$$f(Q \cup R) - f(Q) = 1 + \left(\frac{\Delta(H_Q) - 2}{2} + 1\right)\frac{b - 1}{2},$$

so the ratio equals $\frac{4}{4 + \Delta(H_Q)(b-1)}$, matching the bound of Thm. 2.

8.3 Proof of Lemma 6
We denote by $g_X(Y)$ the marginal gain of the set of edges $Y$ over the set $X$, i.e., $g_X(Y) = g(X \cup Y) - g(X)$. To prove modularity, we need to show that $g_X(Y) = \sum_{e \in Y} g_X(e)$, i.e., the marginal gain of the set $Y$ over $X$ is the sum of the marginal gains of the individual edges in $Y$ over $X$, for any $X, Y$.

Proof. We can write $g_X(Y)$ as follows.

$$g_X(Y) = -\sum_{(i,j) \in X \cup Y} (v_i - \sigma(i, j)\, v_j)^2 + \sum_{(i,j) \in X} (v_i - \sigma(i, j)\, v_j)^2 = -\sum_{(i,j) \in Y} (v_i - \sigma(i, j)\, v_j)^2 = \sum_{e \in Y} g_X(e).$$
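Lemmas 6 and 7 can be sanity-checked numerically. The sketch below is an illustration, not the paper's Alg. 1: it assumes the signed Laplacian $L = D - A_\sigma$ with unsigned degrees (so that deleting $(i, j)$ adds exactly the matrix $P$ of Sec. 7.1), uses a hypothetical 4-node unbalanced graph, and takes $g(X) = -\sum_{(i,j) \in X} (v_i - \sigma(i,j) v_j)^2$ as the modular surrogate, consistent with the expansion above. Because $\boldsymbol{v}$ is a unit vector, $\boldsymbol{v}^T (L + P) \boldsymbol{v}$ equals the first-order estimate, so by the Rayleigh quotient the exact smallest eigenvalue after a deletion can never exceed it; the loop checks this for every edge.

```python
import numpy as np


def signed_laplacian(n, edges):
    """L = D - A_sigma with unsigned degrees, so x^T L x = sum (x_i - s_ij x_j)^2."""
    L = np.zeros((n, n))
    for (i, j), sign in edges.items():
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= sign
        L[j, i] -= sign
    return L


def first_order_gain(v, e, sign):
    """Lemma 7 estimate of the change in lambda_1 when edge e = (i, j) is deleted."""
    i, j = e
    return -(v[i] - sign * v[j]) ** 2


# Hypothetical unbalanced signed graph: cycle 0-1-2-3-0 contains one negative edge.
EDGES = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): -1, (0, 2): 1}
N = 4
eigvals, eigvecs = np.linalg.eigh(signed_laplacian(N, EDGES))
lam1, v = eigvals[0], eigvecs[:, 0]  # eigh: ascending eigenvalues, unit-norm vectors


def g(X):
    """Modular surrogate: sum of first-order gains over an edge set X."""
    return sum(first_order_gain(v, e, EDGES[e]) for e in X)


X, Y = {(0, 1)}, {(2, 3), (0, 2)}
lhs = g(X | Y) - g(X)
rhs = sum(g(X | {e}) - g(X) for e in Y)
print(np.isclose(lhs, rhs))  # True (Lemma 6)

for e, sign in EDGES.items():
    rest = {k: s for k, s in EDGES.items() if k != e}
    exact = np.linalg.eigvalsh(signed_laplacian(N, rest))[0]
    estimate = lam1 + first_order_gain(v, e, sign)
    print(e, round(float(exact), 4), round(float(estimate), 4), bool(exact <= estimate + 1e-9))
```

Because $g$ is modular, the size-$b$ set that lowers this first-order estimate the most is simply the $b$ edges with the most negative individual gains.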