Fully Dynamic Approximate Maximum Independent Set on Massive Graphs

Xiangyu Gao §‡, Jianzhong Li §‡, Dongjing Miao §
§ Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
‡ Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
{gaoxy, lijzh, miaodongjing}@hit.edu.cn

Abstract—Computing a maximum independent set (MaxIS) is The MaxIS problem has a wide range of real-world ap- a fundamental NP-hard problem in , which has plications, such as indexing techniques for shortest path and important applications in a wide range of areas such as social net- distance queries [16], [24], collusion detection [4], automated work analysis, graphical information systems and coding theory. Since the underlying graphs of numerous applications are always map labeling [18], social network analysis [19], and associa- changing continuously, the problem of maintaining a MaxIS over tion rule mining [31]. Besides having these direct applications, dynamic graphs has received increasing attention in recent years. the MaxIS problem is also closely related to two well-known Due to the intractability of maintaining an exact MaxIS, this optimization problems, i.e., the minimum problem paper studies the problem of maintaining an approximate MaxIS and the maximum problem. To find the maximum clique over fully dynamic graphs, where 4 graph update operations are allowed i.e., adding or deleting a vertex or an edge. Based on (the largest complete subgraph) of a graph G, it suffices to swap operation, we present a novel framework for maintaining find the maximum independent set of the complement of G. an approximate maximum independent set which contains no k- And, to find the minimum vertex cover (the smallest subset swaps and make a deep analysis of performance ratio achieved of vertices that contains at least one endpoint of each edge by it. We implement a dynamic ( ∆ + 1)-approximate 2 in the graph) of G = (V,E), one can compute the maximum and a more effective algorithm based on one-swap vertex and two-swap vertex set respectively and make a further analysis of independent set M of G and return V \M. their performance based on Power-Law Random graph model. 
Due to the importance of the MaxIS problem, it has been Extensive experiments are conducted over real graphs to confirm extensively studied in static graphs for decades of years [2], the effectiveness and efficiency of the proposed . [14], [15], [22], [29], [30]. Since it is NP-hard to compute an exact maximum independent set, all known exact algorithms I.INTRODUCTION have worst-case exponential time complexities regarding the number of vertices in the graph. The state-of-the-art algorithm Graph has been widely used to model many types of proposed by Xiao et al. [30] reduces the base of the exponent relationships among entities in a wide spectrum of applica- to 1.1996, i.e., with time complexity O(1.1996nnO(1)), which tions such as social networks, collaboration networks, com- still can not handle large graphs. Moreover, the MaxIS problem munication networks and biological networks. The maximum is also hard to approximate. It has been proved that the MaxIS independent set (MaxIS) problem is a classic NP-hard problem problem does not admit a constant approximation factor in in graph theory [17]. A subset M of vertices in a graph G is general graphs [28] and for any ε > 0, there is no polynomial- an independent set if there is no edge between any two vertices time n1−ε algorithm for the MaxIS problem unless NP = in M. A maximal independent set is an independent set such ZPP [23]. As a result, the approximation ratios of the existing that adding any other vertex to the set forces the set to contain techniques depend on either n or ∆, where n is the number arXiv:2009.11435v2 [cs.DS] 21 Jun 2021 an edge. The independent set with the largest size, measured of vertices in G and ∆ is the maximum vertex degree of G. by the number of vertices in it, among all independent sets Till now, the best approximation ratio known for the MaxIS of G is called the maximum independent set of G, which problem is O(n(log log n)2/(log n)3) [14]. In recent yeas, may not be unique. 
For example, in Figure 1, {v2, v6, v8} is many algorithms adopt heuristics techniques to compute high- a maximal independent set of size 3, while {v1, v4, v6, v8} is quality (large-size) independent sets [3], [10], [12], [20], [25], the maximum independent set of size 4. [27]. The state-of-the-art method is proposed by Chang et al., which iteratively applies reduction rules on vertices and remove the vertex with highest degree when no reduction rules can be applied [10]. Notice that if all vertices are removed from the graph according to reduction rules, the solution computed by the algorithm is certain to be maximum. However, the underlying graphs of many real-world ap- plications are changing continuously. For instance, when a (a) Maximal independent set. (b) Maximum independent set. user follows the other in Facebook, a directed edge will be Fig. 1. An example graph to illustrate independent sets. added between the two vertices corresponding to these two and O(log2 ∆ · log2 n). However, the solution maintained users. Similarly, a user can also remove the directed edges by these algorithms is only a maximal independent set that between himself and his out-neighbors by unfollowing them. may be not good enough to be used in many real-world Unfortunately, all the existing algorithms can not be used applications. Furthermore, these algorithms may be inefficient directly over dynamic graphs, as it is costly to compute in practice since they are complex and some of them are a solution from scratch each time, especially in large-scale analyzed against a non-adaptive oblivious adversary for an frequently updated graphs. Hence, the problem of maintaining expected time complexity. a MaxIS over dynamic graphs has received increasing attention In summary, the existing heuristic algorithms suffer from over the last few years. 
Because it is NP-hard to compute high time consumption and can not guarantee the solution an exact MaxIS over dynamic graphs [33], all the existing quality with the increasing of the amount of updates. And, methods resort to heuristic techniques to maintain a high- the theoretical algorithms with lower time complexities only quality solution without theoretical guarantee. provide a solution with limited accuracy in practice. To The first non-trivial algorithm is given by Zheng et al. in address this issue, in this paper, we study the problem of 2018 [33]. They propose a lazy search strategy to enable the maintaining an approximation maximum independent set over MaxIS computation over dynamic graphs. The main idea is dynamic graphs. The main challenge is to achieve a non-trivial that an exact MaxIS is obtained if a solution is found in the approximation ratio without sacrificing the time efficiency. search; Otherwise, some visited vertices will not be explored We develop a swap-based update framework for maintaining further. However, the quality of the maintained independent an independent set containing no k-swaps all the time and set is not satisfying after a few rounds of updates when the this structural feature also enables a non-trivial performance initial solution is not optimal. After that, to further improve ratio of our algorithms. Although swap-operations have been the quality of the maintained solution, they devise a directed successfully used to enlarge the independent set in static graph index DG, named as dependency graph, to guide the graphs [12], [27], none of them make an analysis of the search [32]. They category the vertices into reducing vertices performance ratio achieved by their algorithms. We also show and dependent vertices according to the procedure of applying that taking additional kinds of swap operations into consider reduction rules to vertices introduced in [10]. When a reducing will not improve the approximation ratio. 
This points out the vertex u (the vertex where a reduction rule is applied) and its limitation of the swap-based methods for the MaxIS problem. ∆ dependent vertices (the neighbors of u in current graph) are We implement a dynamic ( 2 + 1)-approximation algorithm removed from G, directed edges are added from u to each of and a practically more effective algorithm based on one-swap its dependent vertices in DG. Finally, directed edges will also vertex and two-swap vertex set respectively. Moreover, based be added from each dependent vertex to its reducing neighbors, on the observation that real networks are usually power-law excluding its current in-neighbors in DG. Assuming a vertex graphs with many lower-degree vertices, we make a further v is to be removed from the solution, they try to find two analysis of the performance of our algorithms based on Power- set of vertices Vin and Vout starting from v along with the law Random graph model. The main contributions of this directed edges in DG such that |Vin| ≥ |Vout|. They design a paper are as follows. bottom-up dynamic searching algorithm with time complexity • A swap-based update framework is proposed for main- O(m) in the worst case, where m is the number of edges in taining an approximate maximum independent set over the graph. It is notice that as the number of updates increases, dynamic graphs, and the performance ratio achieved by the dependency graph will soonly become non-instructive due it is deeply analyzed. We also derive the lower bound of to the change in the vertex categories. However, in practice, the performance ratio achieved by swap-based methods the number of update operations is always proportional to the for the MaxIS problem in general graphs. ∆ number of vertices in the graph. And rebuilding the index • A dynamic ( 2 + 1)-approximate algorithm is proposed frequently to ensure the solution quality is expensive. based on one-swap vertex. 
To the best of our knowledge, There are also some theoretical algorithms for maintaining this is the first algorithm which maintains an approximate a maximal independent set over dynamic graphs. The first maximum independent set over dynamic graphs with non-trivial algorithm for fully dynamic maximal independent non-trivial theoretical guarantee. Moreover, the expected 2(ζ(2β,∆)−1) set problem was presented by Assadi et al. who obtain a approximation ratio is (1 + ζ(β,∆) ) in power-law deterministic algorithm with O(m3/4) update time [5]. Later, random graph P (α, β). ˜ 2/3 ∆ the update time was independently improved to O(m ) by • A dynamic ( 2 + 1)-approximate algorithm, which is Du and Zhang [13] and Gupta and Khan [21]. Du and Zhang more effective in practice, for maintains an independent set containing no 2-swaps is proposed based on the [13] also present a√ randomized algorithm against an oblivious ˜ framework. adversary with O( m) update time. This√ randomized upper bound was currently improved to O˜( n) by Assadi et al. • Extensive experiments are conducted over a bunch of [6]. The current state of art algorithms are independently large-scale graphs. As confirmed in the experiments, the proposed by Chechik and Zhang [11] and Behnezhad et al. [7]. proposed algorithms are both effective and efficient. They achieve a dynamic randomized algorithm against obvious The reminder of this paper is organized as follows. Pre- adversary with expected worst-case update time of O(log4 n) liminaries and problem definition are stated in Section II. A 0 0 swap-based update framework for maintaining an approximate and Gt+1 is obtained by inserting an edge in E0 to Gt. This maximum independent set is introduced in Section III. Two ensures that we can always have a maximum independent set concrete algorithms based on one-swap vertex and two-swap V0 as the initial set, which is definitely a γ-approximation. 
As vertex sets are presented in Section IV and Section V respec- proved in [33], it is NP-hard to maintain an exact maximum tively. Experimental results are reported in Section VI. The independent set over a dynamic graph. Similarly, the following paper is concluded in Section VII. theorem shows the hardness of maintaining an approximate maximum independent set. II.PRELIMINARIES Theorem 1: There is no polynomial-time approximation In this section, we first introduce the basic notations and algorithm which maintains an approximate maximum indepen- define the problem studied in this paper. Then we introduce dent set with performance ratio n1−ε or less for any ε ∈ (0, 1) the Power-Law Random (PLR) graph model, which will be over a dynamic graph G, unless NP = ZPP. used to analyze the performance of our algorithms later. Proof: For any input graph G = (V,E), construct a (m+ A. Notations and Problem Definition 1)-length dynamic graph G = hG0, ··· ,Gi as follows. Let G = (V, ∅), G = (V,E), and for i ∈ [m], G be obtained Without loss of generality, we focus on unweighted undi- 0 m i by inserting an edge in E \ Ei−1 to Gi−1. If an independent rected graphs. For ease of presentation, we simply refer to an set with approximation ratio n1−ε or less for any ε ∈ (0, 1) unweighted undirected graph as a graph. A dynamic graph can be maintained in polynomial time over dynamic graph G, G = hG ,G , · · · i is a graph sequence denoted as 0 1 . Each then such a solution of G can be computed in polynomial G G t+1 is obtained from the previous graph t in the sequence time, which contradicts to the result of [23]. by either inserting or deleting a vertex or an edge1. At step t, let Gt = (Vt,Et) denote the current graph and define B. Power-Law Random Graph Model. nt = |Vt|, mt = |Et|. 
The neighbors of a vertex u is denoted Considerable research has focused on discovering and mod- by Nt(u) = {v ∈ Vt | (u, v) ∈ Et}, and the degree of a eling the topological properties of various large-scale real- vertex u is denoted by dt(u) = |Nt(u)|. For any S ⊆ Vt, world graphs, including social graphs and web graphs. A define Nt(S) to be the set of all neighbors of S in Gt and power-law random (PLR) graph G = P (α, β) [1] is a graph Gt[S] to be the of Gt on S. Throughout that conforms the following degree distribution parameterized the paper M denotes the independent set maintained by the with two given values α and β: for any x, the number of ver- algorithm at each time step, and given an integer k ≥ 1, [k] tices in G with degree x equals y, such that log y = α−β log x. denotes the integer set {1, ··· , k}. In other word, we have Definition 1 (Independent Set): Given a graph G, a vertex eα |{v | d(v) = x}| = y = (1) subset M is an independent set of G if for any two vertices xβ u and v in M, there is no edge between u and v in G. Basically, α is the logarithm of the number of nodes of degree The size of an independent set is measured by the number of 1 and β is the log-log growth rate of the number of nodes with vertices in it. An independent set M is a maximal independent a given degree. Moreover, the number of vertices and edges set if adding any other vertex to it forces M to contain an of the graph can be expressed as follows ∆ edge. A maximal independent set M of G is a maximum X eα n = = eαζ(β, ∆), independent set if its size is the largest among all independent iβ sets of G, and this size is called the independence number of i=1 (2) ∆ G, denoted as α(G). Note that, the maximum independent set X eα 2m = = eαζ(β − 1, ∆), of G is not unique; that is, there may exist several independent iβ−1 sets of G with size α(G). 
An independnet set M is called a γ- i=1 Pb 1 where ζ(a, b) = a is the first b terms of Riemann zeta approximate maximum independent set if α(G) ≤ γ ·|M| and i=1 i α M is called an expected γ-approximate maximum independent function with parameter a, and ∆ = be β c is the maximum set if E[α(G)] ≤ γ · E[|M|]. For an independent set algorithm degree of the graph. A, let A(G) denote the size of the solution obtained by A III.ASWAP-BASED UPDATE FRAMEWORK on graph G. The performance ratio ρA of A is defined as In this section, we develop a swap-based update framework ρA = ρA(n) = maxG.|V |=n α(G)/A(G). The problem that we studied in this paper is formally defined as follows. for maintaining an approximate maximum independent set Definition 2 (Dynamic Approx-MaxIS): Given a dynamic over dynamic graphs. We first review a simple dynamic algorithm for maximal independent set. graph G = hG0,G1, · · · i and an initial γ-approximate maxi- mum independent set M of G0, maintain the γ-approximate A. A Simple Dynamic Algorithm maximum independent set M over G. There is a straightforward algorithm for maintaining a Given any dynamic graph G, we can add a (m + 1)-length 0 maximal independent set M over dynamic graphs, which has prefix hG0 , ··· ,G i to the sequence, where G0 = (V , ∅) 0 0 0 0 been deeply studied in [5], [13]. For each vertex v in the graph, 1When inserting a vertex to the graph, both the vertex and the adjacency the algorithm maintains a count count[v], counting the number list are given as the input. of its neighbors currently in M. It is apparently that a vertex v belongs to M if and only if count[v] = 0, and whenever a For a subset S of M, we define a list C[S] including vertices vertex v enters or leaves M, the counts of its neighbors can in N(S) that possibly make up the swap-in vertex set IS for be updated in O(d(v)) time. In the following, we state how S. That is the neighbors of vertices in C[S] that currently to update M after each update operation. 
belong to M are all in S. We arrange all C[S]s in a hierarchy (1) In case of inserting a vertex v, we first compute count[v] structure, denoted as C. For each vertex v not in M, we store in O(d(v)) time, and if count[v] = 0, we add v to M. it in the list C[S] with minimal level to which it belongs, and (2) In case of deleting a vertex v, an update is only required use pointers to represent the containment relationship between when v belongs to M, where v is simply removed from adjacent levels of C. This allows us to update v’s position in M. After that if the count of any neighbor of v reduces constant time when M[v] changes. To ensure M contains no to zero, we add it to M. k-swaps, we only maintain the first k level of C. Since C (3) On insertion of an edge (u, v), an update is required only provides enough information for finding the swap-in set of a in case when both u and v belong to M, where either one particular set, we will not maintain C[v] explicitly in each of them, say u, is removed from M. In case the count of vertex. But for easy of representation, we will keep using any neighbor of v reduces to zero, we add it to M. the notation in the rest of the paper. After a change in the (4) On deletion of an edge (u, v), an update is required when topology or M, we update the information as follows. When one of them, say u, is in M, where we subtract one from a vertex v enters or leaves M, we iterate the neighbors u of v count[v], and if count[v] reduces to zero, we add v to M. to update M[u] and move u from C[M[u] ∪ {v}] to C[M[u]]. As stated in [5], [13], there is at most one vertex will We use hashing as a position index to enable the removal be removed from M once the topology of the graph is and insertion operations of M[u] and C[S] can be done in changed. Hence, the amortized time complexity of the simple constant time. 
Therefore, the total time needed to update the dynamic algorithm is O(∆), where ∆ is an upper bound of the information is O(d(v)), and the additional space consumption maximum degree in the graph. To simplify the representation, caused by hashing is O(m). we refer to this simple dynamic algorithm as Simple in the rest of the paper. Algorithm 1: Swap-based Update Framework: main- B. The Swap-based Update Framework tain an independent set M containing no k-swaps Now, we are ready to present our swap-based update frame- Input: The graph Gt, the indpendent set M of Gt, work for maintaining an approximate maximum independent and an update operation op. set and make a discussion about its performance ratio. Output: The independent set M of Gt+1 = Gt ⊕ op Definition 3 (k-swap Vertex Set): Given a graph G and an 1 Update Gt, M, and initialize S1, ··· , Sk; independent set M of G, a vertex set S = {v1, ··· , vk} ⊆ M 2 while ∃j ∈ [k]: Sj 6= ∅ do is a k-swap vertex set (or k-swappable) if there is an indepen- 3 Retrieve a pair (S, P[S]) from Sj; dent set IS ⊆ N(S) such that N(IS) ∩ M ⊆ S and |IS| > k. 4 if S is a j-swap vertex set then A subset S of M is k-swappable if and only if there exists a 5 MoveOut(S); MoveIn(IS); (k, k + 1)-swap, also referred as a k-improvement, between S 6 Extend M to be maximal around C[S]; and N(S). The main idea of the framework is that for a given 7 for u ∈ Nt+1(S) do 0 k, maintain an independent set containing no k-swaps all the 8 if u is newly added to C[S ] then 0 time. We say that an independent set M contains no k-swaps 9 Add (S , {u}) to Slevel[C[S0]]; if there are no j-swap vertex sets in M for all j ∈ [k]. That is we can not enlarge M by finding j-improvements among it for all j ∈ [k]. This feature enables the framework to achieve a non-trivial performance ratio, which will be analyzed later. We propose the swap-based update framework, shown in 2 We first describe the information maintained in the framework. Algorithm 1. 
Given an update operation op = (u, v) , let • status[v]: a boolean entry indicating whether or not v Nt+1[op] denote Nt+1({u, v}) ∪ {u, v}. We first update the belongs to M; structure of Gt according to op and invoke Simple to main- •M[v]: a list containing the neighbors of v currently in tain M to be maximal. Then, we initialize k dictionary-like M and note that count[v] is the size of M[v]; structures S1, ··· , Sk, one for each of the first k level of C. •C[v]: an array of vertex sets, one for each possible value Specifically, for each j ∈ [k], Sj treats a set S ⊆ M with size of count, that is C[v][k] contains all current neighbors of j as the key and P[S] as the value, where P[S] is the set of v whose count equals to k; vertices newly added to C[S]. Since only the counts of vertices •C[S]: a list including the candidate vertices of S which in Nt+1[op] may decrease, for each u ∈ Nt+1[op] we insert is recursively defined as follows, (M[u], {u}) to Slevel[C[M[u]]] if count[u] ≤ k. After that, we ( iteratively check whether or not the set in Sj is j-swappable C[v][1], if S = v, until all S are empty for j ∈ [k]. We retrieve a pair (S, P[S]) C[S] = j (∪v∈SC[S \ v]) ∪ (∩v∈SC[v][k]), otherwise. from the first non-empty Sj, and for each vertex w ∈ P[S],

And define level[C[S]] = |S|. 2To unify the presentation if op is a vertex operation let u = v. we test whether there is an independent set including w in Let Mopt denote a maximum independent set of G. Discard G[C[S]], whose size is at least j +1. If so, we remove vertices the first k items of inequality 4, it is known that in S from M and insert vertices in IS to M. Since IS may c c c |Mopt ∩ (M>k ∪ M>k)| ≤ |Mk+1| + ··· + |M∆| + |M>k| be not maximal in G[C[S]] we extend M around C[S]. After ∆ ≤ · |M| + |M |. that, we examine whether there is a vertex u ∈ N(S) newly k + 1 >k added to some set C[S0] in the first k level of C. If so, we add (5) 0 (S , u) to Slevel[C[S0]]. Hence, the critical part is to give a deep analysis of how swap Theorem 2: Algorithm 1 always maintains an independent operation will benefit in the remaining graph. The following set M which contains no k-swaps over a dynamic graph G = lemma suggests an optimal case when k = 1. hG0,G1, · · · i. Lemma 1: Suppose M is an independent set contains no Proof: We will prove by induction that when Algo- 1-swaps, then M1 is a maximum independent set of G[1], c rithm 1 terminates, for all vertex sets S at level i ∈ [k], where G[1] denotes G[M1 ∪ M1]. α(G[C[S]]) ≤ i. First, we add a (m0 +1)-length prefix starting Proof: Since there are no 1-swaps in M, for all v ∈ M1, 0 at Gs = (V0, ∅) and ending with Ge = (V0,E0) to G, where C[v][1] must be a clique. For contradiction, suppose M1 is a 0 m0 = |E0|. This ensures that we can always have a maximum maximum independent set of G[1] and |M1| > |M1|. Since independent set M = V0 as the initial input, which definitely M1 is a maximal independent set, it is known that M1 * 0 0 contains no k-swap vertex sets. Then, suppose M contains M1 which means |M1 \M1| ≥ |M1 \ M| + 1. Due to the no k-swap vertex sets at step t, we show that M contains no Pigeonhole Principle, there exists a v ∈ M1 such that C[v][1] k-swap vertex sets when Algorithm 1 terminates at step t + 1. 
contains two vertices x, y and (x, y) ∈/ E, which contradicts For contradiction, assume there exists a l-swap vertex set S in to C[v][1] is a clique for all v ∈ M1. Thus, M1 is a maximum M for some l ∈ [k] at step t + 1. If S belong to M at step t, independent set of G[1]. it is known that α(G[C[S]]) ≤ l since S is not l-swappbale at Taking things together, the following theorem is obtained. step t. The increase of α(G[C[S]]) at step t + 1 means some Theorem 3: Suppose M is an independent set of G which ∆ vertices are newly added to G[C[S]] due to swap operation. contains no 1-swaps, then M is a ( 2 + 1)-approximate Otherwise, C[S] is empty at step t but not during the update maximum independent set of G. process. In both of this two cases, S would be added to Sl Proof: Let Mopt denote a maximum independent set of during the update procedure, which contradicts to the terminal G. Since M1 is a maximum independent set of G[1], it is c conditions of Algorithm 1. known that |Mopt ∩ (M1 ∪ M1)| = |M1|. Together with equation 5, it is derived that Performance Ratio. Given a graph G with maximum degree c c ∆, recall that α(G) is the size of the maximum independent set |Mopt| = |Mopt ∩ (M1 ∪ M1)| + |Mopt ∩ (M>1 ∪ M>1)| c c of G. Define Mc = V \M to be the vertices which are not in ≤ |M1| + |M2| + · · · |M∆| + |M>1| c ∆ M. Partition M according to their count into at most ∆ sets ≤ |M| + · |M|. c c c 2 M1, ··· , M∆ where Mk = {w | w∈ / M, count[w] = k}. It is obviously that Thus the theorem is derived. Counter-intuitively, the following theorem shows that the c c c α(G) ≤ n = |M| + |M1| + |M2| + ··· + |M∆|, (3) performance ratio will not be improved with considering addition kinds of swap vertex sets. c c c X |M1| + 2|M2| + ··· + ∆|M∆| = d(v) ≤ ∆ · |M|. (4) v∈M

To analyze the performance ratio achieved by Algorithm 1, suppose M is an independent set of G which contains no k- swaps. As stated above, only vertices whose count ≤ k will be considered when finding a k-swap vertex set. Hence, we divide Mc into two parts as illustrated in Figure 2, where c c c c c (a) Worst case for k = 3. (b) Worst case for k = 4. M≤k = ∪l∈[k]Ml and M>k = M \M≤k. This also results Fig. 3. Example graphs with worst-case approximate ratio. in a partition of M according to whether or not some of its Theorem 4: For k ≥ 2, there is an infinite family of graphs c 2 neighbors belong to M≤k, i.e., M≤k = {w | w ∈ M,N(w)∩ in which an independent set M containing no k-swaps is c ∆ M≤k 6= ∅} and M>k = M\M≤k. of the optimal. Proof: For k = 2 or 3, consider the infinite family of instances given by the Kn. For each (u, v) ∈ Kn, a vertex w is added between u and v, and the edge (u, v) is replaced by two edges (u, w) and (w, v). Denote the resulted 0 graph as Kn. An example for k = 3 and n = 4 is shown in Fig. 2. Graph partition according to count. Figure 3(a). Notice that the original n vertices constitute of an 0 independent set of Kn which contains no k-swaps. However, (1) In case of inserting a vertex v, if v does not belong to M 0 n n(n−1) α(Kn) = 2 = 2 and ∆ = n − 1. and count[v] = 1 in Gt+1, which means v is newly added As for k ≥ 4, consider the infinite family of instances given to C[v], we add (M[v], v) to S1. by the Qn for n ≥ k. A hypercube graph Qn (2) In case of deleting a vertex v and v belongs to M in n n−1 has 2 vertices, and 2 n edges, and is a regular graph with Gt, we iterate over all neighbors u of v in Gt and add 0 n edges touching each vertex. For each edge (u, v) in Qn, a (M[u], u) to S1 if count[u] reduces to one in Gt+1. 
vertex w is added between u and v, and the edge (u, v) is (3) On insertion of an edge (u, v), if both u and v belong to replaced by two edges (u, w) and (w, v) Denote the resulted M in Gt and u is removed from M in Gt+1, we iterate 0 graph as Qn. Since the length of the shortest cycle in Qn is over all neighbors w of u in Gt and add (M[w], w) to S1 n, the induced graph of any vertex subset S with size k in if count[w] reduces to one in Gt+1. 0 Qn has at most k edges. That is, in Qn, there are at most k (4) On deletion of an edge (u, v), there are two cases to be vertices in C[S]. An example for k = 4 and n = 4 is shown in considered: (i) if one of them, say u, belongs to M in Gt, Figure 3(b), where the newly added vertices in the middle of and count[v] = 1 in Gt+1, we add (M[v], v) to S1; (ii) if each edge are omitted in it. Notice that the original 2n vertices neither u nor v belongs to M and M[u] = M[v] = {w} 0 in Qn constitute of an independent set of Qn which contains in Gt, we add (w, u) to S1. 0 n−1 no k-swaps. However, α(Qn) = 2 n and ∆ = n. Then in each loop of while, we retrieve a pair (v, P[v]) from S until it is empty. We check whether C[v] is still a clique after IV. ONE-SWAP BASED UPDATE ALGORITHM 1 adding each vertex in P[v] to it. For each u ∈ P[v], instead Following the swap-based update framework, we propose of locating a specific vertex in C[v] that is not in Nt+1(u), we an algorithm for maintaining an approximate maximum inde- only need to know if such a vertex exists. Hence, we compute pendent set containing no 1-swaps and make a further analysis the number of neighbors of u that is in C[v], and if |Nt+1(u)∩ of its performance based on the PLR graph model. 
C[v]| < |C[v]|, which means $C[v] \setminus N_{t+1}(u) \neq \emptyset$, that is, $C[v] \cup \{u\}$ is not a clique, we remove $v$ from $M$, add $u$ to $M$, and extend $M$ to be maximal by adding to $M$ every vertex in $N_{t+1}(v)$ whose count reduces to zero. Since $C[v] \cup \{u\}$ is not a clique, at least one vertex besides $u$ is newly added to $M$. Then, for any vertex in $N_{t+1}(v)$ whose count reduces to one, we add it along with its neighbor in $M$ to $S_1$. Otherwise, since $C[v]$ is still a clique, we simply add $u$ to $C[v]$.

Definition 4 (One-swap Vertex): Given a graph $G$ and an independent set $M$, a vertex $v \in M$ is a one-swap vertex (or one-swappable) if there exists an independent set $I_v \subseteq C[v] = \{w \mid count[w] = 1, w \in N(v)\}$ such that $|I_v| > 1$.

As stated in Section III, we maintain the first level of $C$ in our algorithm. That is, for each vertex $v \in M$, $C[v]$ is the set of neighbors of $v$ whose count is one. Notice that a vertex $v \in M$ is one-swappable if and only if $G[C[v]]$ is not a clique.

Algorithm 2: Maintain an independent set M containing no 1-swaps
  Input: The graph G_t, the independent set M of G_t, and an update operation op
  Output: The independent set M of G_{t+1} = G_t ⊕ op
  Update G_t, M, and initialize S_1;
  while S_1 ≠ ∅ do
    Retrieve a pair (v, P[v]) from S_1;
    for u ∈ P[v] do
      if |N_{t+1}(u) ∩ C[v]| < |C[v]| then
        MoveOut(v); MoveIn(u);
        for w ∈ N_{t+1}(v) do
          if count[w] = 0 then MoveIn(w);
        for w ∈ N_{t+1}(v) do
          if count[w] = 1 and w ∉ C[v] then
            Add (M[w], w) to S_1;
      else Add u to C[v];

We give the concrete implementation based on the template; the pseudocode is shown in Algorithm 2. We first update the structure of $G_t$ and maintain $M$ to be maximal utilizing Simple. After that we initialize $S_1$ as follows.

Time Complexity. For each vertex $u \in P[v]$, we compute $|N_{t+1}(u) \cap C[v]|$ to test whether $C[v] \setminus N_{t+1}(u)$ is empty. With the help of the hashing index of $C[v]$, we can check whether a vertex belongs to $C[v]$ in constant time. Hence, the calculation can be done in $d(u)$ time. So, if $v$ is not one-swappable, the total time consumed by a pair $(v, P[v])$ is $\sum_{u \in P[v]} d_{t+1}(u)$. Otherwise, suppose there is a vertex $u \in P[v]$ such that $C[v] \setminus N_{t+1}(u)$ is not empty. Algorithm 2 takes $d_{t+1}(v) + d_{t+1}(u)$ time to make the swap and at most $\sum_{u \in C[v]} d_{t+1}(u)$ time to extend $M$ to be maximal. Since each vertex $u$ belongs to at most one $P[v]$, summing up over all vertices in $M$, the total cost of Algorithm 2 is $O(\sum_{v \in M \cup M_1^c} d_{t+1}(v)) = O(m_{t+1})$.

Given a power-law random graph $P(\alpha, \beta)$ and an independent set $M$ of it, let $\psi = \sum_{v \in M} d(v)$. The following theorem states the expected running time of Algorithm 2 on $P(\alpha, \beta)$.

Theorem 5: When $\beta > 1$, the expected time complexity of Algorithm 2 on a power-law random graph $P(\alpha, \beta)$ is $O\left(\left(\frac{1}{\zeta(\beta,\Delta)\ln n} + \zeta(\beta-1,\Delta)\right) n \ln n\right)$.

Proof: Notice that the running time of Algorithm 2 is bounded by $O(\sum_{v \in M \cup M_1^c} d(v))$. We first show that a vertex whose degree is greater than $\tau = \zeta(\beta-1,\Delta)\ln n$ does not belong to $M$ with high probability:

$$\Pr\{v \in M \mid d(v) > \tau\} \le \left(1 - \frac{\psi}{m}\right)^{\tau} \le \exp\left(-\frac{\zeta(\beta-1,\Delta)\,\psi}{m}\ln n\right) \le \frac{1}{n}.$$

The last inequality holds because $\psi$ is greater than the number of vertices whose degree is one. The probability that a vertex with degree $j$ has count one is bounded by

$$\Pr\{v \in M_1^c \mid d(v) = j\} \le j\,\frac{\psi}{m}\left(1 - \frac{\psi}{m}\right)^{j-1}.$$

Then, we have

$$\begin{aligned}
E\Big[\sum_{v \in M_1^c} d(v)\Big] &= \sum_{j=1}^{\Delta} j \cdot E[|\{v \mid v \in M_1^c\}| \mid d(v) = j] \cdot \Pr\{d(v) = j\} \\
&\le \sum_{j=1}^{\Delta} \frac{e^{\alpha}}{\zeta(\beta,\Delta)}\,\frac{1}{j^{2\beta-2}}\,\frac{\psi}{m}\left(1 - \frac{\psi}{m}\right)^{j-1}
\le \frac{e^{\alpha}}{\zeta(\beta,\Delta)}\,\frac{\psi}{m}\sum_{j=1}^{\Delta}\left(1 - \frac{\psi}{m}\right)^{j-1} \\
&\le \frac{e^{\alpha}}{\zeta(\beta,\Delta)}\,\frac{\psi}{m}\cdot\frac{1}{1 - \left(1 - \frac{\psi}{m}\right)} \le \frac{n}{\zeta(\beta,\Delta)}.
\end{aligned}$$

The second inequality holds because, when $\beta > 1$, $1/j^{2\beta-2} \le 1$ for all $j \in [\Delta]$. Combining these two inequalities, the theorem is obtained.

Performance Ratio. From Theorem 2, we know that Algorithm 2 maintains a $(\frac{\Delta}{2} + 1)$-approximate maximum independent set in general graphs. However, as mentioned earlier, real graphs are usually power-law graphs with many low-degree vertices. In the following, we make a further analysis of the performance ratio achieved by Algorithm 2 based on the PLR graph model.

Lemma 2: Given a power-law random graph $P(\alpha, \beta)$ and an independent set $M$ of $P$, if $M$ contains no 1-swap, then $E[\sum_{i=2}^{\Delta} |M_i^c|] \le \frac{2(\zeta(2\beta,\Delta)-1)}{\zeta(\beta,\Delta)} \cdot |M|$.

Proof: The probability that a vertex with degree $j$ has count $i$ is bounded by

$$\Pr\{v \in M_i^c \mid d(v) = j\} \le \binom{j}{i}\left(\frac{\psi}{m}\right)^{i}\left(\frac{m-\psi}{m}\right)^{j-i}.$$

Thus, we have

$$E[|\{v \mid v \in M_i^c\}| \mid d(v) = j] \le \frac{e^{\alpha}}{j^{\beta}} \cdot \binom{j}{i}\left(\frac{\psi}{m}\right)^{i}\left(\frac{m-\psi}{m}\right)^{j-i}.$$

Since the degree of a vertex whose count is $i$ is greater than or equal to $i$, the expected size of $M_i^c$ is bounded by

$$E[|M_i^c|] \le \sum_{j=i}^{\Delta} \frac{e^{\alpha}}{j^{2\beta}\zeta(\beta,\Delta)} \cdot \binom{j}{i}\left(\frac{\psi}{m}\right)^{i}\left(\frac{m-\psi}{m}\right)^{j-i}.$$

Summing up the expectations of $|M_2^c|, \cdots, |M_\Delta^c|$, we get

$$\begin{aligned}
E\Big[\sum_{i=2}^{\Delta} |M_i^c|\Big] &\le \sum_{i=2}^{\Delta}\sum_{j=i}^{\Delta}\left(\frac{e^{\alpha}}{j^{2\beta}\zeta(\beta,\Delta)} \cdot \binom{j}{i}\left(\frac{\psi}{m}\right)^{i}\left(\frac{m-\psi}{m}\right)^{j-i}\right) \\
&\le \sum_{j=2}^{\Delta}\sum_{i=2}^{j}\left(\frac{e^{\alpha}}{j^{2\beta}\zeta(\beta,\Delta)} \cdot \binom{j}{i}\left(\frac{\psi}{m}\right)^{i}\left(1 - \frac{\psi}{m}\right)^{j-i}\right) \\
&\le \sum_{j=2}^{\Delta}\frac{e^{\alpha}}{j^{2\beta}\zeta(\beta,\Delta)}\left(\frac{\psi}{m} + 1 - \frac{\psi}{m}\right)^{j} \\
&= \frac{e^{\alpha}(\zeta(2\beta,\Delta)-1)}{\zeta(\beta,\Delta)} \\
&\le \frac{2(\zeta(2\beta,\Delta)-1)}{\zeta(\beta,\Delta)} \cdot E[|M|].
\end{aligned}$$

Taking the expectation of both sides of Inequality 3, together with Lemma 1, we derive the following theorem.

Theorem 6: Given a power-law graph $P(\alpha, \beta)$ and an independent set $M$ of $P$, if $M$ contains no 1-swaps, then $M$ is an expected $\left(1 + \frac{2(\zeta(2\beta,\Delta)-1)}{\zeta(\beta,\Delta)}\right)$-approximate maximum independent set.

V. TWO-SWAP BASED UPDATE ALGORITHM

In this section, we propose a more effective algorithm by maintaining an independent set containing no 2-swaps. Although we have shown that taking additional kinds of swap vertex sets into consideration does not improve the performance ratio, finding two-swap vertex sets among $M$ can indeed further enlarge it in practice when there are no one-swap vertices.

Definition 5 (Two-swap Vertex Set): Given a graph $G$ and an independent set $M$ of $G$, a vertex pair $S = \{u, v\}$ is a two-swap vertex set (or two-swappable) if there exists an independent set $I_S \subseteq C[S] = \{w \mid count[w] = 1, w \in N(u) \cup N(v)\} \cup \{w \mid count[w] = 2, w \in N(u) \cap N(v)\}$ such that $|I_S| > 2$.

It is easy to see that two vertices $u, v$ in $M$ constitute a two-swap vertex set $S$ if and only if there exists a triangle $(x, y, z)$ in $G^R[C[S]]$. Moreover, when there are no one-swap vertices in $M$, we can derive the following useful facts about $x, y, z$: (1) $count[x] = 2$, and $M[x] = \{u, v\}$; (2) $count[y] \le 2$, and $y$ is adjacent to $u$ (and maybe to $v$); (3) $count[z] \le 2$, and $z$ is adjacent to $v$ (and maybe to $u$). We maintain the first two levels of $C$ in our algorithm, and in addition to $S_1$, we keep a set $S_2$ to store all possible $x$ for a given $S$. That is, the count of all vertices in $P[S]$ is two.

We give the concrete implementation based on the template; the pseudocode is shown in Algorithm 3. After updating the structure of $G_t$ according to $op$ and maintaining $M$ to be maximal by Simple, we initialize $S_1$ and $S_2$ as follows.

(1) In case of inserting a vertex $v$, if $v$ is not in $M$ in graph $G_{t+1}$ and $count[v] \le 2$, we add $(M[v], v)$ to $S_{count[v]}$.
(2) In case of deleting a vertex $v$ which belongs to $M$ in $G_t$, we iterate over all neighbors $u$ of $v$ in $G_t$, and if $count[u]$ reduces to two or less, we add $(M[u], u)$ to $S_{count[u]}$.
(3) On insertion of an edge $(u, v)$, if both $u$ and $v$ belong to $M$ in $G_t$ and $u$ is removed from $M$ in $G_{t+1}$, we iterate over all neighbors $w$ of $u$ in $G_t$, and if $count[w]$ reduces to two or less, we add $(M[w], w)$ to $S_{count[w]}$.
(4) On deletion of an edge $(u, v)$, there are two cases to be considered: (i) if one of them, say $u$, belongs to $M$ in $G_t$ and $count[v] \le 2$ in $G_{t+1}$, we add $(M[v], v)$ to $S_{count[v]}$; (ii) if neither $u$ nor $v$ belongs to $M$ in $G_t$, the following three conditions are considered: (a) if $M[u] = M[v] = \{w\}$, we add $(w, u)$ to $S_1$; (b) if $M[u] = \{x\}$, $M[v] = \{y\}$, and there is a vertex $w \in C[x][2] \cap C[y][2]$ such that $(u, w)$ and $(v, w)$ are not in $G_{t+1}$, we add $(\{x, y\}, w)$ to $S_2$; (c) if $count[u] \le count[v] = 2$ and $M[u] \subseteq M[v] = \{w, x\}$, we add $(M[v], v)$ to $S_2$.
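Definitions 4 and 5 can be checked directly on a small static graph. The sketch below is a brute-force illustration in Python, not the incremental procedure of Algorithms 2 and 3. The function names (`candidates`, `is_independent`, `is_one_swappable`, `find_two_swap`) and the dict-of-neighbor-sets graph representation are hypothetical choices made for this example, and it assumes that $G^R$ denotes the complement graph, so that a triangle in $G^R[C[S]]$ is a set of three pairwise non-adjacent vertices of $C[S]$.

```python
from itertools import combinations

def candidates(adj, M, S):
    """C[S]: vertices outside M whose neighbours in M are non-empty and all lie in S."""
    S = set(S)
    result = []
    for w in sorted(adj):
        if w in M:
            continue
        m_nbrs = adj[w] & M  # this is M[w]; count[w] == len(m_nbrs)
        if m_nbrs and m_nbrs <= S:
            result.append(w)
    return result

def is_independent(adj, vertices):
    """True if no two of the given vertices are adjacent."""
    return all(b not in adj[a] for a, b in combinations(vertices, 2))

def is_one_swappable(adj, M, v):
    """Definition 4: C[v] holds two non-adjacent vertices, i.e. G[C[v]] is not a clique."""
    return any(is_independent(adj, pair)
               for pair in combinations(candidates(adj, M, {v}), 2))

def find_two_swap(adj, M, u, v):
    """Definition 5: return three pairwise non-adjacent vertices of C[{u, v}], or None."""
    for triple in combinations(candidates(adj, M, {u, v}), 3):
        if is_independent(adj, triple):
            return triple
    return None

# Toy graph (an assumption for illustration): M = {0, 1}; vertex 2 is adjacent
# to both 0 and 1, vertex 3 only to 0, vertex 4 only to 1.
adj = {0: {2, 3}, 1: {2, 4}, 2: {0, 1}, 3: {0}, 4: {1}}
M = {0, 1}
no_one_swap = not is_one_swappable(adj, M, 0) and not is_one_swappable(adj, M, 1)
swap = find_two_swap(adj, M, 0, 1)
```

In this toy graph neither vertex of $M$ is one-swappable, yet replacing $\{0, 1\}$ by the returned triple enlarges the independent set by one; the triple matches facts (1) to (3) above with $x = 2$, $y = 3$, $z = 4$. The enumeration is cubic in $|C[S]|$ and is meant only to make the definitions concrete.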
In the proof of Lemma 2, the last inequality holds because $e^{\alpha}$ is the number of vertices whose degree is one, and at least half of them must belong to $M$.

Then, in each iteration of the while loop, if $S_1$ is not empty, we process a pair $(v, P[v])$ as stated in Section IV. Notice that if $v$ is not one-swappable, we additionally need to check whether a superset $S$ including $v$ becomes two-swappable due to $P[v]$. Or, if $S_1$ is empty but $S_2$ is not empty, we retrieve a pair $(S, P[S])$ from $S_2$ and determine whether $S$ is two-swappable as follows. For each $x \in P[S]$, we check whether there exists a pair $y, z$ such that $(x, y, z)$ constitutes a triangle in $G^R[C[S]]$. We first build a set $C_x[S]$ containing the neighbors of $x$ that belong to $C[S]$. If $S$ is a two-swap vertex set, vertex $z$ must belong to $C[S] \setminus C_x[S]$, denoted as $C_x^c[S]$. Then, for each $y$ in $C[S] \setminus C[v]$, if it is not in $C_x[S]$, we compute the number of neighbors of $y$ which are not in $C_x[S]$, i.e., $|N_{t+1}(y) \cap C_x^c[S]|$. If $|N_{t+1}(y) \cap C_x^c[S]| < |C_x^c[S]|$, we conclude that $C_x^c[S] \setminus N_{t+1}(y)$ is not empty. An element $z$ that belongs to $C_x^c[S] \setminus N_{t+1}(y)$ is the one we are looking for. We remove $u, v$ from $M$, add $x, y$ to $M$, and extend $M$ to be maximal by adding to $M$ every vertex in $N_{t+1}(S)$ whose count reduces to zero. Then, for any vertex $w$ in $N_{t+1}(S)$ whose count is at most 2, we add it to $S_{count[w]}$. Otherwise, since $count[x] = 2$, we add $x$ to $C[S]$.

Algorithm 3: Maintain an independent set M containing no 2-swaps
  Input: The graph G_t, an independent set M of G_t, and an update operation op
  Output: An independent set M of G_{t+1} = G_t ⊕ op
  Update G_t, M, and initialize S_1 and S_2;
  while S_1 ≠ ∅ or S_2 ≠ ∅ do
    if S_1 ≠ ∅ then OneSwap();
    else if S_2 ≠ ∅ then TwoSwap();

  Procedure OneSwap()
    Retrieve a pair (v, P[v]) from S_1;
    foreach u ∈ P[v] do
      if |N_{t+1}(u) ∩ C[v]| < |C[v]| then
        MoveOut(v); MoveIn(u);
        foreach w ∈ N_{t+1}(v) do
          if count[w] = 0 then MoveIn(w);
        Update-S(v);
      else Add u to C[v];
    if v is not one-swappable then
      foreach w ∈ C[v][2] do
        if |N_{t+1}(w) ∩ P[v]| < |P[v]| then
          Add (M[w], w) to S_2;

  Procedure TwoSwap()
    Retrieve a pair (S = {u, v}, P[S]) from S_2;
    foreach x ∈ P[S] do
      C_x[S] ← C[S] ∩ N_{t+1}(x);
      foreach y ∈ C[S] \ C[v] do
        /* C_x^c[S] denotes C[S] \ C_x[S] */
        if y ∉ C_x[S] and C_x^c[S] \ N_{t+1}(y) ≠ ∅ then
          MoveOut(S); MoveIn({x, y});
          foreach z ∈ N_{t+1}(S) do
            if count[z] = 0 then MoveIn(z);
          Update-S(S);
      if y, z are not found for x then Add x to C[S];

  Procedure Update-S(S)
    foreach w ∈ N_{t+1}(S) do
      if count[w] = 1 then Add (M[w], w) to S_1;
      else if count[w] = 2 then Add (M[w], w) to S_2;

Time Complexity. Since pairs in $S_1$ are processed in the same way as in Algorithm 2, we only focus on the time consumed by each pair in $S_2$. Algorithm 3 retrieves a pair $(S, P[S])$ from $S_2$ when $S_1$ is empty and $S_2$ is not. For a fixed vertex $x \in P[S]$, with the help of the hashing index of $C[S]$, $C_x[S]$ can be built in $d_{t+1}(x)$ time. Then the algorithm considers each vertex $y$ in $C[S] \setminus C[v]$ in turn. One can test whether a vertex belongs to $C_x[S]$ in constant time and compute $|N_{t+1}(y) \cap C_x^c[S]|$ in $d(y)$ time. Therefore, if $S$ is not two-swappable, the total time consumption is at most $\sum_{x \in P[S]} \left(d_{t+1}(x) + \sum_{y \in C[S] \setminus C[v]} d_{t+1}(y)\right)$. Otherwise, suppose there is a triangle $(x, y, z)$ in $G^R[C[S]]$. It takes $d_{t+1}(S) + d_{t+1}(x) + d_{t+1}(y)$ time to make the swap and at most $\sum_{z \in C[S] \cup P[S]} d_{t+1}(z)$ time to extend $M$ to be maximal. Since $count[x] = 2$, each $x$ belongs to at most one $P[S]$. And each vertex $y \in C[S] \setminus C[v]$ is checked at most $|N_{t+1}(u) \cap M_2^c|$ times. Therefore, summing up over all vertices, the total time cost is $O\left(\sum_{v \in M} d_{t+1}(v) + k \sum_{u \in M_{\le 2}^c} d_{t+1}(u)\right) = O(k\,m_{t+1})$, where $k = \max_{u \in V_{t+1}} |N_{t+1}(u) \cap M_2^c|$.

Theorem 7: When $\beta > 1.5$, the expected time complexity of Algorithm 3 on a power-law random graph $P(\alpha, \beta)$ is $O\left(\left(1 + \frac{3}{2\zeta(\beta,\Delta)}\right)\zeta(\beta-1,\Delta)\, n \ln n\right)$.

Proof: According to the analysis in the proof of Lemma 2, the expected sum of the degrees of the vertices in $M_2^c$ is bounded by

$$\begin{aligned}
E\Big[\sum_{v \in M_2^c} d(v)\Big] &= \sum_{j=2}^{\Delta} j \cdot E[|\{v \mid v \in M_2^c\}| \mid d(v) = j] \cdot \Pr\{d(v) = j\} \\
&\le \sum_{j=2}^{\Delta} \frac{e^{\alpha}}{2\zeta(\beta,\Delta)}\,\frac{j-1}{j^{2\beta-2}}\left(\frac{\psi}{m}\right)^{2}\left(1 - \frac{\psi}{m}\right)^{j-2}
\le \frac{e^{\alpha}}{2\zeta(\beta,\Delta)}\,\frac{\psi}{m}\sum_{j=2}^{\Delta}\left(1 - \frac{\psi}{m}\right)^{j-2} \\
&\le \frac{e^{\alpha}}{2\zeta(\beta,\Delta)}\,\frac{\psi}{m}\cdot\frac{1}{1 - \left(1 - \frac{\psi}{m}\right)} \le \frac{n}{2\zeta(\beta,\Delta)}.
\end{aligned}$$

Combining this with the proof of Theorem 5, the expected time complexity of Algorithm 3 on $P(\alpha, \beta)$ is derived.

VI. EXPERIMENTS

In this section, we conduct extensive experiments to evaluate the efficiency and effectiveness of the proposed algorithms.

A. Experiment Setting

Datasets. 22 real graphs are used to evaluate the algorithms. All of the graphs are downloaded from the Stanford Network Analysis Platform (http://snap.stanford.edu/data/) [26] and the Laboratory for Web Algorithmics (http://law.di.unimi.it/datasets.php) [8], [9]. The statistics are summarized in Table I, where the last column gives the average degree $\bar{d}$. We categorize the graphs in Table I into easy instances and hard instances, where the easy instances are the graphs that VCSolver [2] can finish in five hours; they are listed in the first half of Table I.

TABLE I: Statistics of graphs

Graph            n            m              d̄
Epinions         75,879       405,740        10.69
Slashdot         82,168       504,230        12.27
Email            265,214      364,481        2.75
com-dblp         317,080      1,049,866      6.62
com-amazon       334,863      925,872        5.53
web-Google       875,713      4,322,051      9.87
web-BerkStan     685,230      6,649,470      19.41
in-2004          1,382,970    13,591,473     19.66
as-skitter       1,696,415    11,095,298     13.08
hollywood        1,985,306    114,492,816    115.34
WikiTalk         2,394,385    4,659,565      3.89
com-lj           3,997,962    34,681,189     17.35
soc-LiveJournal  4,847,571    42,851,237     17.68
soc-pokec        1,632,803    22,301,964     27.32
wiki-topcats     1,791,489    25,444,207     28.41
com-orkut        3,072,441    117,185,083    76.28
cit-Patents      3,774,768    16,518,947     8.75
uk-2005          39,454,746   783,027,125    39.70
it-2004          41,290,682   1,027,474,947  49.77
twitter-2010     41,652,230   1,468,365,182  70.51
Friendster       65,608,366   1,806,067,135  55.06
uk-2007          109,499,800  3,448,528,200  62.99

Algorithms. We compare our algorithms with the state-of-the-art methods DGOneDIS and DGTwoDIS proposed in [32], which maintain a high-quality independent set over dynamic graphs without a theoretical guarantee on the approximation ratio. We implement the following algorithms and compare them with the competitors.

• OneSwap: our dynamic $(\frac{\Delta}{2} + 1)$-approximation algorithm based on one-swap vertex (see Section IV).
• TwoSwap: our effective dynamic approximation algorithm based on two-swap vertex set (see Section V).

All the programs above are implemented in C++ and compiled by GNU G++ 7.5.0 with -O2 optimization; the source codes of DGOneDIS and DGTwoDIS are obtained from the authors of [32]. All experiments are conducted on a machine with eight 16-core Intel Xeon processors, 3072GB of main memory, and a 40TB hard disk, running CentOS 7. Similar to [32], we randomly insert/remove u vertices/edges to simulate the update operations. For easy graphs, we use a maximum independent set computed by VCSolver [2] as the initial $M$, and for hard graphs, we treat the independent set returned by ARW [3] as the input one. This is reasonable as we can compute the initial $M$ within limited time consumption.

Metrics. We evaluate the different algorithms from three aspects: size of the maintained independent set, response time, and memory usage. Firstly, the larger the independent set maintained by an algorithm, the better the algorithm. Secondly, for the response time, the smaller, the better; we run each algorithm three times and report the average CPU time. Thirdly, the smaller the memory consumed by an algorithm, the better; we measure the heap memory usage by the Linux command /usr/bin/time (https://man7.org/linux/man-pages/man1/time.1.html).

B. Experimental Results

We report our findings concerning the performance of our algorithms in this section.

Evaluate Solution Quality. We first evaluate the effectiveness of our proposed algorithms against the existing methods. First, we report the gap between the independent set size maintained by each algorithm and the independence number computed by VCSolver [2], together with the approximation ratio $\gamma$, after 100,000 updates for the thirteen easy real graphs in Table II. It is clear that not only TwoSwap but also OneSwap outperforms the competitors DGOneDIS and DGTwoDIS on the first six graphs and achieves a competitive approximation ratio on the remaining graphs. As stated previously, in practice, the number of updates is always proportional to the number of vertices in the graph. Hence, we report the gap and the approximation ratio of the solutions returned by each algorithm after 1,000,000 updates for the last seven easy graphs in Table IV. All of our methods have a smaller gap on all of them, especially on web-BerkStan and hollywood. The reason is that, as the number of updates increases, the vertex categories in the dependency graph no longer fit the procedure of applying reductions to the vertices. The competitors need to rebuild the index frequently to ensure effectiveness, which is costly in practice.

Then, we report the gap between the independent set size maintained by each algorithm after 1,000,000 updates and the result size obtained by the local search algorithm ARW [3] on the hard graphs in Table III. Notice that DGOneDIS and DGTwoDIS did not finish within five hours on the last five graphs. Our proposed algorithms sometimes even return a solution with more vertices (marked with ↑ behind the gap). Although there is no improvement on the performance ratio, TwoSwap is indeed much more effective than OneSwap on all graphs. We conclude that our algorithms are more effective when the number of updates is of the same order as $n$, which is quite common in real-life applications.

Evaluate Time Efficiency. To study the time efficiency of these algorithms, the time consumed by each of them to handle 100,000 updates on the thirteen easy real graphs is shown in Figure 4(a). Generally, the response time of all four algorithms increases with the graph size. Due to its simplicity, OneSwap runs the fastest across all graphs. Both OneSwap and TwoSwap run faster than DGOneDIS and DGTwoDIS on most of the graphs, especially when the graph is dense, e.g., hollywood. Since it additionally considers two-swap vertex sets, TwoSwap takes a little more time than OneSwap. We show the response time taken by each algorithm on the last seven easy graphs and on the hard graphs in Figure 4(c) and Figure 5(a), respectively. It is surprising that DGOneDIS and DGTwoDIS suffer from a very high time consumption, especially on web-BerkStan and hollywood. Moreover, they did not even finish in five hours on the last five hard graphs.

Evaluate Memory Cost. The memory usage of each algorithm on easy graphs and hard graphs is shown in Figure

TABLE II: The gap to the independence number obtained by VCSolver [2] and approximation ratio after 100,000 updates.
Graphs           α(G)     DGOneDIS         DGTwoDIS         OneSwap          TwoSwap
                          gap   γ          gap   γ          gap   γ          gap   γ
Epinions         26668    421   1.01604    408   1.01554    70    1.00263    31    1.00116
Slashdot         30529    478   1.01591    473   1.01574    99    1.00325    36    1.00118
Email            200028   74    1.00037    74    1.00037    21    1.00011    16    1.00008
com-dblp         144410   921   1.00642    479   1.00333    158   1.00109    57    1.00039
com-amazon       160042   1069  1.00672    828   1.00520    706   1.00443    342   1.00214
web-Google       506172   896   1.00177    620   1.00123    432   1.00085    239   1.00047
web-BerkStan     386906   2413  1.00628    2196  1.00573    2890  1.00753    2202  1.00572
in-2004          871245   1713  1.00193    1508  1.00173    2212  1.00255    1626  1.00187
as-skitter       1142177  344   1.00031    292   1.00026    738   1.00065    511   1.00045
hollywood        334188   4476  1.01358    3584  1.01073    23    1.00007    1     1.000002
WikiTalk         2276264  10    1.000004   10    1.000004   19    1.000008   7     1.000003
com-lj           2068594  556   1.00027    317   1.00015    1095  1.00053    724   1.00035
soc-LiveJournal  2613884  420   1.00016    263   1.00010    995   1.00038    627   1.00024
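The relation between the two reported quantities in Table II can be made explicit with a small Python sketch. It assumes, since the excerpt does not restate the definition, that gap $= \alpha(G) - |M|$ and $\gamma = \alpha(G)/|M|$; this reading agrees with the Epinions row to within rounding, but it is an inference from the numbers rather than a definition given by the authors. The function name `approximation_ratio` is hypothetical.

```python
def approximation_ratio(alpha, gap):
    """Return gamma = alpha / |M|, assuming |M| = alpha - gap is the maintained set size."""
    size = alpha - gap
    if size <= 0:
        raise ValueError("gap must be smaller than alpha")
    return alpha / size

# Epinions row of Table II: alpha(G) = 26668, one gap per algorithm.
epinions = {
    name: approximation_ratio(26668, gap)
    for name, gap in [("DGOneDIS", 421), ("DGTwoDIS", 408),
                      ("OneSwap", 70), ("TwoSwap", 31)]
}
```

Under this assumption a gap of 421 on Epinions corresponds to a ratio of roughly 1.016, while the gap of 31 kept by TwoSwap corresponds to roughly 1.001, which is how the two columns of each algorithm should be read together.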

TABLE III: The gap to the best result size obtained by the local search algorithm ARW [3] after 1,000,000 updates

Graphs        Best Result  Gap to the Best Result Size
                           DGOneDIS  DGTwoDIS  OneSwap  TwoSwap
soc-pokec     612901       4006      3939      1143↑    3595↑
wiki-topcats  792023       10885     10100     4013     882
com-orkut     747459       4669      3037      3347     5062↑
cit-Patents   1865112      6795      6261      5509     276↑
uk-2005       23363561     -         -         7443     227↑
it-2004       25238765     -         -         13078    276↑
twitter-2010  28423449     -         -         6871     3515
Friendster    36012590     -         -         5248     703↑
uk-2007       68976197     -         -         12746    15974↑

[Figure 4, panel (a): Response time caused by 100,000 updates]

4(b) and Figure 5(b), respectively. The memory usage of all methods increases with the graph size. OneSwap and TwoSwap take more space than DGOneDIS and DGTwoDIS since we maintain more information to speed up the swap operation and equip $M$ and $C$ with a hashing position index to enable constant-time deletion and insertion. TwoSwap consumes more space than OneSwap since it maintains an additional level of the hierarchy structure $C$, as stated previously. Considering the main memory size of common commercial computers, the memory usage of the proposed algorithms is acceptable.

Evaluate Scalability. To study the scalability of the proposed algorithms, the number of update operations (denoted by #Updates) is varied from 100,000 to 1,000,000 and from 1,000,000 to 10,000,000. Figure 6(a) and Figure 6(c) show the effect of #Updates on the time efficiency over the graph soc-LiveJournal. It is clear that the response time grows nearly linearly with the number of update operations, and the improvement of TwoSwap and OneSwap in time efficiency is stable and significant (more than 2 times speedup). Figure 6(b) and Figure 6(d) show the effect of #Updates on the gap and accuracy over the graph soc-LiveJournal. As we can see, the performance of all algorithms degrades as the number of updates increases. However, our proposed methods have a lower decreasing rate than the competitors. As for the rise in the final stage in Figure 6(d), it is because the graph is much simpler at that time.

[Fig. 4. Response time and memory usage on easy graphs. Panels: (b) Memory usage caused by 100,000 updates; (c) Response time caused by 1,000,000 updates]

TABLE IV: The gap to the independence number obtained by VCSolver [2] and approximation ratio after 1,000,000 updates.

Graphs           α(G)     DGOneDIS          DGTwoDIS          OneSwap          TwoSwap
                          gap    γ          gap    γ          gap   γ          gap   γ
web-BerkStan     202164   6018   1.03115    5866   1.02988    1283  1.00638    608   1.00301
in-2004          871245   11127  1.01723    10071  1.01557    3414  1.00522    1572  1.00239
as-skitter       1347791  6337   1.00706    5869   1.00654    3112  1.00345    1705  1.00189
hollywood        350855   17777  1.05337    13268  1.03930    1020  1.00291    214   1.00061
WikiTalk         1802415  540    1.00030    535    1.00029    110   1.00006    66    1.00003
com-lj           1917114  11221  1.00589    9794   1.00513    7443  1.00389    4043  1.00211
soc-LiveJournal  2452407  9704   1.00397    8251   1.00338    7003  1.00286    3879  1.00158

Power-Law Random Graphs. We generate nine Power-Law Random (PLR) graphs by NetworkX with $10^6$ vertices by varying the growth exponent $\beta$ from 1.9 to 2.7. The results on PLR graphs are shown in Table V and Figure 7. It is clear that our proposed methods OneSwap and TwoSwap outperform the competitors DGOneDIS and DGTwoDIS significantly in terms of both gap and response time. The proposed methods are better than the competitors by a margin of around 1.5% when $\beta$ is small, which is a noticeable improvement. Moreover, both DGOneDIS and DGTwoDIS suffer from a high time consumption when $\beta$ is small, i.e., when the number of edges is huge. One thing to be noticed is that DGOneDIS and
DGTwoDIS maintain a solution of the same size all the time. This is because the power-law random graph is easy to process: only the degree-one reduction is applied to the vertices when constructing the dependency graph index.

[Fig. 5. Response time and memory usage on hard graphs. Panels: (a) Response time caused by 1,000,000 updates; (b) Memory usage caused by 1,000,000 updates]

[Fig. 6. Scalability evaluation on graph soc-LiveJournal. Panels: (a) Response Time ($10^5$ to $10^6$); (b) Gap&Accuracy ($10^5$ to $10^6$); (c) Response Time ($10^6$ to $10^7$); (d) Gap&Accuracy ($10^6$ to $10^7$)]

TABLE V: The gap to the independence number obtained by VCSolver [2] and approximation ratio on power-law graphs after 1,000,000 updates

Graphs  β    α(G)    DGOneDIS        DGTwoDIS        OneSwap         TwoSwap
                     gap   γ         gap   γ         gap   γ         gap  γ
PLR1    1.9  386299  2603  1.00678   2603  1.00678   541   1.00140   242  1.00063
PLR2    2.0  354997  3728  1.01061   3728  1.01061   864   1.00244   324  1.00091
PLR3    2.1  342790  4304  1.01272   4304  1.01272   853   1.00252   332  1.00097
PLR4    2.2  328441  5186  1.01604   5186  1.01604   1143  1.00349   414  1.00126
PLR5    2.3  319165  5345  1.01703   5345  1.01703   1176  1.00370   419  1.00131
PLR6    2.4  312687  5833  1.01901   5833  1.01901   1140  1.00366   406  1.00130
PLR7    2.5  306760  5804  1.01929   5804  1.01929   1182  1.00387   415  1.00135
PLR8    2.6  300774  6185  1.02100   6185  1.02100   1159  1.00387   413  1.00138
PLR9    2.7  297614  6072  1.02083   6072  1.02083   1160  1.00391   375  1.00126

[Fig. 7. Response time caused by $10^6$ updates on power-law graphs]

VII. CONCLUSION

In this paper, we developed a swap-based update framework for efficiently maintaining approximate maximum independent sets over dynamic graphs and made a deep analysis of the performance ratio achieved by it. We also gave a lower bound on the approximation ratio achieved by algorithms based on swap operations, which indicates the limit of this methodology. Following the framework, we designed a dynamic $(\frac{\Delta}{2} + 1)$-approximation algorithm based on one-swap vertex. Notice that this is the first dynamic approximation algorithm for the maximum independent set problem over dynamic graphs. To further improve the quality of the solution, we proposed a much more effective algorithm based on two-swap vertex set. Extensive empirical studies demonstrate that all our algorithms maintain much larger independent sets while having less running time as the number of update operations increases. For future directions, there are two possible ways. On the one hand, it is possible to utilize other structural information to achieve a better approximation ratio; on the other hand, applying perturbation to the framework may break the worst case sometimes, which may further improve the solution quality in practice.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China under grants 61732003, 61832003, 61972110 and U1811461.

REFERENCES

[1] William Aiello, Fan R. K. Chung, and Linyuan Lu. A random graph model for massive graphs. In F. Frances Yao and Eugene M. Luks, editors, Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, May 21-23, 2000, Portland, OR, USA, pages 171–180. ACM, 2000.
[2] Takuya Akiba and Yoichi Iwata. Branch-and-reduce exponential/fpt algorithms in practice: A case study of vertex cover. Theor. Comput. Sci., 609:211–225, 2016.
[3] Diogo Vieira Andrade, Mauricio G. C. Resende, and Renato Fonseca F. Werneck. Fast local search for the maximum independent set problem. J. Heuristics, 18(4):525–547, 2012.
[4] Filipe Araújo, Jorge Farinha, Patrício Domingues, Gheorghe Cosmin Silaghi, and Derrick Kondo. A maximum independent set approach for collusion detection in voting pools. J. Parallel Distributed Comput., 71(10):1356–1366, 2011.
[5] Sepehr Assadi, Krzysztof Onak, Baruch Schieber, and Shay Solomon. Fully dynamic maximal independent set with sublinear update time. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 815–826. ACM, 2018.
[6] Sepehr Assadi, Krzysztof Onak, Baruch Schieber, and Shay Solomon. Fully dynamic maximal independent set with sublinear in n update time. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '19, pages 1919–1936, USA, 2019. Society for Industrial and Applied Mathematics.
[7] Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Cliff Stein, and Madhu Sudan. Fully dynamic maximal independent set with polylogarithmic update time. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 382–405. IEEE Computer Society, 2019.
[8] Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Sadagopan Srinivasan, Krithi Ramamritham, Arun Kumar, M. P. Ravindra, Elisa Bertino, and Ravi Kumar, editors, Proceedings of the 20th international conference on World Wide Web, pages 587–596. ACM Press, 2011.
[9] Paolo Boldi and Sebastiano Vigna. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), pages 595–601, Manhattan, USA, 2004. ACM Press.
[10] Lijun Chang, Wei Li, and Wenjie Zhang. Computing A near-maximum independent set in linear time by reducing-peeling. In Semih Salihoglu, Wenchao Zhou, Rada Chirkova, Jun Yang, and Dan Suciu, editors, Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017, pages 1181–1196. ACM, 2017.
[11] Shiri Chechik and Tianyi Zhang. Fully dynamic maximal independent set in expected poly-log update time. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 370–381. IEEE Computer Society, 2019.
[12] Jakob Dahlum, Sebastian Lamm, Peter Sanders, Christian Schulz, Darren Strash, and Renato F. Werneck. Accelerating local search for the maximum independent set problem. In Andrew V. Goldberg and Alexander S. Kulikov, editors, Experimental Algorithms - 15th International Symposium, SEA 2016, St. Petersburg, Russia, June 5-8, 2016, Proceedings, volume 9685 of Lecture Notes in Computer Science, pages 118–133. Springer, 2016.
[13] Yuhao Du and Hengjie Zhang. Improved algorithms for fully dynamic maximal independent set. CoRR, abs/1804.08908, 2018.
[14] Uriel Feige. Approximating maximum clique by removing subgraphs. SIAM J. Discret. Math., 18(2):219–225, 2004.
[15] Fedor V. Fomin, Fabrizio Grandoni, and Dieter Kratsch. A measure & conquer approach for the analysis of exact algorithms. J. ACM, 56(5):25:1–25:32, 2009.
[16] Ada Wai-Chee Fu, Huanhuan Wu, James Cheng, and Raymond Chi-Wing Wong. IS-LABEL: an independent-set based labeling scheme for point-to-point distance querying. Proc. VLDB Endow., 6(6):457–468, 2013.
[17] M. R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
[18] Andreas Gemsa, Martin Nöllenburg, and Ignaz Rutter. Evaluation of labeling strategies for rotating maps. In Joachim Gudmundsson and Jyrki Katajainen, editors, Experimental Algorithms - 13th International Symposium, SEA 2014, Copenhagen, Denmark, June 29 - July 1, 2014. Proceedings, volume 8504 of Lecture Notes in Computer Science, pages 235–246. Springer, 2014.
[19] Mark K. Goldberg, David L. Hollinger, and Malik Magdon-Ismail. Experimental evaluation of the greedy and random algorithms for finding independent sets in random graphs. In Sotiris E. Nikoletseas, editor, Experimental and Efficient Algorithms, 4th International Workshop, WEA 2005, Santorini Island, Greece, May 10-13, 2005, Proceedings, volume 3503 of Lecture Notes in Computer Science, pages 513–523. Springer, 2005.
[20] Andrea Grosso, Marco Locatelli, and Wayne J. Pullan. Simple ingredients leading to very efficient heuristics for the maximum . J. Heuristics, 14(6):587–612, 2008.
[21] Manoj Gupta and Shahbaz Khan. Simple dynamic algorithms for maximal independent set and other problems. CoRR, abs/1804.01823, 2018.
[22] Magnús M. Halldórsson and Jaikumar Radhakrishnan. Greed is good: Approximating independent sets in sparse and bounded-degree graphs. Algorithmica, 18(1):145–163, 1997.
[23] Johan Håstad. Clique is hard to approximate within $n^{1-\epsilon}$. In 37th Annual Symposium on Foundations of Computer Science, FOCS '96, Burlington, Vermont, USA, 14-16 October, 1996, pages 627–636. IEEE Computer Society, 1996.
[24] Minhao Jiang, Ada Wai-Chee Fu, Raymond Chi-Wing Wong, and Yanyan Xu. Hop doubling label indexing for point-to-point distance querying on scale-free networks. Proc. VLDB Endow., 7(12):1203–1214, 2014.
[25] Sebastian Lamm, Peter Sanders, Christian Schulz, Darren Strash, and Renato F. Werneck. Finding near-optimal independent sets at scale. In Michael T. Goodrich and Michael Mitzenmacher, editors, Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments, ALENEX 2016, Arlington, Virginia, USA, January 10, 2016, pages 138–150. SIAM, 2016.
[26] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
[27] Yu Liu, Jiaheng Lu, Hua Yang, Xiaokui Xiao, and Zhewei Wei. Towards maximum independent sets on massive graphs. Proc. VLDB Endow., 8(13):2122–2133, 2015.
[28] J. M. Robson. Algorithms for maximum independent sets. J. Algorithms, 7(3):425–440, 1986.
[29] Robert Endre Tarjan and Anthony E. Trojanowski. a maximum independent set. SIAM J. Comput., 6(3):537–546, 1977.
[30] Mingyu Xiao and Hiroshi Nagamochi. Exact algorithms for maximum independent set. Inf. Comput., 255:126–146, 2017.
[31] Mohammed Javeed Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, and Wei Li. New algorithms for fast discovery of association rules. In David Heckerman, Heikki Mannila, and Daryl Pregibon, editors, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), Newport Beach, California, USA, August 14-17, 1997, pages 283–286. AAAI Press, 1997.
[32] Weiguo Zheng, Chengzhi Piao, Hong Cheng, and Jeffrey Xu Yu. Computing a near-maximum independent set in dynamic graphs. In 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8-11, 2019, pages 76–87. IEEE, 2019.
[33] Weiguo Zheng, Qichen Wang, Jeffrey Xu Yu, Hong Cheng, and Lei Zou. Efficient computation of a near-maximum independent set over evolving graphs. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, pages 869–880. IEEE Computer Society, 2018.