Journal of Statistical Mechanics: Theory and Experiment

PAPER: Interdisciplinary statistical mechanics

Multi-level spectral graph partitioning method

Muhammed Fatih Talu

Computer Science Department, Inonu University, Malatya, Turkey
E-mail: [email protected]

Received 29 March 2017
Accepted for publication 31 July 2017
Published 27 September 2017

Online at stacks.iop.org/JSTAT/2017/093406
https://doi.org/10.1088/1742-5468/aa85ba

To cite this article: Muhammed Fatih Talu J. Stat. Mech. (2017) 093406

Abstract. In this article, a new method for multi-level and balanced division of non-directional graphs (MSGP) is introduced. Using the eigenvectors of graphs, the method has a spectral approach which has superiority over local methods (Kernighan–Lin and Fiduccia-Mattheyses) with a global division ability. Bisection, which is a spectral method, can divide the graph by using the Fiedler vector, while the recursive version of this method can divide into multiple levels. However, the spectral methods have two disadvantages: (1) high processing costs; (2) dividing the sub-graphs independently. With a better understanding of the eigenvectors of the whole graph, and by discovering the hidden information they hold, MSGP can divide graphs in a multi-leveled and balanced way without recursive processing. Inspired by Haar wavelets, it uses the eigenvectors with a binary heap tree. Comparison results with seven existing methods (some are community detection algorithms) on regular and irregular graphs clearly demonstrate that MSGP works about 14.4 times faster than the others to produce a proper partitioning result.

Keywords: random graphs, networks, clustering techniques, heuristics

© 2017 IOP Publishing Ltd and SISSA Medialab srl

Contents

1. Introduction
2. Spectral partitioning (SP)
3. The proposed method (MSGP)
4. Applications with MSGP
5. Comparison results
6. Conclusion
Acknowledgments
References

1. Introduction

Graphs (G) are mathematical structures used to define objects (nodes or vertices) and the relationships (edges) between them. Edges may be directed or undirected according to the relation type. A graph G without loops or multiple edges can be defined in the form G = (V, E), with node set V and edge set E. The graph partitioning problem is to divide G into smaller subsets such that each subset has about the same size and the cost of the edges spanning subsets is minimal. For example, a k-way partition splits the node set V of G into k smaller subsets $V_1, V_2, \ldots, V_k$ [3].

Graph partitioning is a widely researched topic, and many books [3, 21] and papers [29] about the subject have been published. It has been used in many applications such as VLSI circuit layout [17], solving linear systems [22] and distributing workloads for parallel computation [15]. In recent years, graph partitioning has gained importance due to its applications for the grouping and discovering of social, biological, pathological and cyber security networks (see figure 1). It allows us to model many practical applications. For example, in natural language processing, after expressing the verbal arguments as vertices and the similarity between them as edges, graph partitioning methods help to find the same semantic or unique roles [19]. The extraction of building patterns (collinear, curvilinear, parallel groups and grid) in [8] is another example: building clusters are detected using graph partitioning on the basis of area, shape and visual similarity features.

The graph partitioning problem is known to be NP-hard [5]. Even for special graphs such as trees and grids (without holes), no reasonable approximation algorithms exist. There is an unavoidable tradeoff between runtime and partition quality (balance) [9]. Some algorithms are fast but compute inappropriate partitioning results; others provide high quality ratios but are very slow.
In the literature, graph partitioning methods can be separated into two broad categories, local and global. Widely used local methods are the algorithms of Kernighan–Lin [18] and Fiduccia-Mattheyses [10]. They are efficient 2-way partitioning algorithms that use local search strategies. Their major disadvantage is the arbitrary initial splitting of the node set, which influences partition quality considerably.

Figure 1. Graph partitioning applications.

Global partitioning approaches use the entire graph's information and do not require an arbitrary initial split. Well-known examples of global approaches are the spectral scheme (SS) [15, 30] and the multilevel scheme (MS) [16]. In SS, the graph is partitioned on the basis of the eigenvalues and eigenvectors of the Laplacian (or adjacency) matrix of the graph. MS contains coarsening, partitioning and uncoarsening phases. In [23], SSs and MSs are compared on GPUs in terms of partition quality and cost (time) for fast graph partitioning. All experiments in that study used the nvGRAPH library [24] with CUDA 8 introduced by NVIDIA. The numerical results showed that SSs can obtain significantly higher quality partitions than MSs for networks with high degree nodes (especially social networks), while the time taken by both schemes is essentially the same. In applications such as medical image segmentation, the partition quality has a significant impact on the result [27]. Therefore the superiority of the partition quality of SSs over MSs seems to be very valuable.

The major disadvantage of SSs, especially in multi-level partitioning, is the need to recalculate eigenvalues and eigenvectors. As is well known, the eigenvalue calculation part of spectral graph partitioning methods requires extensive time. For quick calculations, many solver methods such as Lanczos, Jacobi/Davidson and LOBPCG have been developed. In this paper, a new multi-level and balanced spectral graph partitioning method (MSGP), which does not need to iteratively calculate eigenvalues and eigenvectors, is proposed. By placing the eigenvectors in a binary heap tree structure in a certain order and in optimal time, it reveals the knowledge of level division.

In section 2, a summary of classical SP methods is given. MSGP is introduced in section 3. Application results of MSGP on specific graphs (three regular, one irregular) are given in section 4, and the comparison results obtained between MSGP and the existing methods are given in section 5. In the last section, the conclusion of the work is given.

2. Spectral partitioning (SP)

Because they can be solved effectively with linear algebra, SP methods outperform traditional clustering algorithms such as k-means, mean-shift and PCA [11]. Because of this superiority, SP is used to solve problems in unspecified complex graphs [20]. It is a partitioning approach based on the eigenvalues and eigenvectors of the Laplacian matrix L(G), which is a square, symmetric and sparse matrix defined by

$$L(G) = D(G) - A(G) = \begin{cases} \deg(i), & i = j \\ 0, & i \neq j,\ (i,j) \notin E \\ -w_{i,j}, & (i,j) \in E \end{cases}, \quad i, j = 1, \ldots, n \tag{1}$$

where A(G) is the adjacency matrix (square) and D(G) is the node degree matrix (diagonal), defined as follows:

$$A(G) = \begin{cases} 0, & (i,j) \notin E \\ w_{i,j}, & (i,j) \in E \end{cases}, \quad i, j = 1, \ldots, n \tag{2}$$

$$D(G) = \begin{cases} \deg(i), & i = j \\ 0, & i \neq j \end{cases}, \quad i, j = 1, \ldots, n \tag{3}$$

where $w_{i,j}$ denotes the edge weight between nodes i and j and deg(i) refers to the number of nodes to which node i is connected. Because L(G) is a symmetric matrix, its eigenvalues are real, and its eigenvectors are perpendicular (orthogonal) to each other. After solving L(G)v = λv, the eigenvector corresponding to the second smallest eigenvalue of L(G) is known as the Fiedler vector [14]:

$$v = \begin{bmatrix} \vdots & \vdots & & \vdots \\ v_1 & \overbrace{v_2}^{\text{Fiedler}} & \cdots & v_n \\ \vdots & \vdots & & \vdots \end{bmatrix}, \qquad \lambda = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix}. \tag{4}$$

Using the Fiedler vector, G is divided into two sub-graphs ($G_1$ and $G_2$). For the division, the signs or the median of the Fiedler vector are used. The graph partitioning algorithm using the Fiedler vector is known as the spectral bisection (SB) algorithm, shown in algorithm 1:

Algorithm 1. Spectral bisection (SB).

INPUT: a graph, G = (V, E)
OUTPUT: two sub-graphs, G1 and G2
Compute L(G)
Compute eigenvalues and eigenvectors of L(G): [v, λ] = eig(L)
Fiedler vector: v2
while node i of G do
    if v2(i) ≤ median(v2) then
        Put node i in G1
    else
        Put node i in G2
    end if
end while
return G1 and G2
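As an illustration of algorithm 1, the following minimal Python sketch bisects a small path graph. It is not the implementation used in this paper: to stay within the standard library, the `eig` call is replaced by a shifted power iteration that approximates only the Fiedler vector, and it assumes a connected graph with unit edge weights. The helper names (`laplacian`, `fiedler_vector`, `spectral_bisection`) and the 8-node path graph are hypothetical choices made for the example.

```python
from statistics import median


def laplacian(n, edges):
    """Dense Laplacian L = D - A of an undirected graph with unit edge weights."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def fiedler_vector(L, iters=2000):
    """Approximate the Fiedler vector by shifted power iteration (assumed solver)."""
    n = len(L)
    shift = 2.0 * max(L[i][i] for i in range(n))   # upper bound on the eigenvalues of L
    x = [i - (n - 1) / 2.0 for i in range(n)]       # zero-mean start vector
    for _ in range(iters):
        # Multiply by (shift * I - L): the smallest eigenvalues of L become dominant.
        y = [shift * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]                   # project out the constant eigenvector
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def spectral_bisection(n, edges):
    """Split the nodes at the median of the Fiedler vector, as in algorithm 1."""
    v2 = fiedler_vector(laplacian(n, edges))
    med = median(v2)
    g1 = [i for i in range(n) if v2[i] <= med]
    g2 = [i for i in range(n) if v2[i] > med]
    return g1, g2


# Hypothetical test graph: a path 0-1-2-...-7.
path_edges = [(i, i + 1) for i in range(7)]
g1, g2 = spectral_bisection(8, path_edges)
print(sorted(g1), sorted(g2))
```

On the path graph the two halves of the chain end up in different sub-graphs; which half is labelled G1 depends on the sign of the computed vector, since an eigenvector is only defined up to sign.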

Algorithm 2. Recursive spectral bisection (RSB).

INPUT: a graph, G = (V, E) and a partition number, k
OUTPUT: k sub-graphs, G1, G2, ..., Gk
[G1, G2] = SB(G)
if (k/2) > 1 then
    RSB(G1, k/2)
    RSB(G2, k/2)
end if
return G1, G2, ..., Gk

To perform multi-level partitioning, a recursive version of SB (named recursive spectral bisection, RSB) is used; it is shown in algorithm 2 [14]. The partitioning process of RSB continues until the number of nodes in a sub-graph reaches one. The necessity of recalculating eigenvalues and eigenvectors at each recursion step causes RSB to run extremely slowly, especially on large graphs (social networks). To resolve the slowness of RSB, parallel programming techniques have been used in [24]; that approach uses multiple eigenvalues and the k-means algorithm on GPUs for faster partitioning. Since the method uses a heuristic approach like k-means, it does not reach the desired accuracy. The other disadvantage of RSB is that each sub-graph is divided independently, whereas each sub-graph should be considered as part of the whole structure. In the comparison section, these two spectral approaches are named SpectralClust1 and SpectralClust2 respectively. A comprehensive literature review of the spectral partitioning concept can be found in [31]. No method other than RSB that provides multilevel splitting within spectral methods has been found.

3. The proposed method (MSGP)

MSGP is a multi-level spectral graph partitioning method which does not require repeated calculations of eigenvectors. It is inspired by wavelets, which are often used in signal processing and can perform a multi-level analysis of a signal [1]. Wavelets provide a multi-level decomposition of the signal using predefined, differently structured wavelets, so the input signal can be written as a linear combination of wavelets. The Haar wavelet is the simplest and most commonly used wavelet type within the Daubechies wavelet family and can be described as

$$\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

By adding the scaling (j) and shifting (k) properties, it can be written as $\psi_{jk}(t) = \psi(2^j t - k)$. Note that $\psi$ and $\psi_{jk}$ are orthogonal in the range [0, 1]:

$$\int_0^1 \psi(t)\,\psi_{jk}(t)\,dt = 0. \tag{6}$$

Just as Haar wavelets encode the different parts of the input signal, according to MSGP, each eigenvector (of the Laplacian of any graph) contains information about different parts (sub-graphs) of the graph. For a clearer view, the 8 × 8 Haar wavelet matrix ($H_8$) and the eigenvector sign matrix of a graph with 8 nodes are shown in figure 2. It is seen from figure 2 (left) that each Haar wavelet encodes information about a particular region of the signal. The first wavelet encodes the entire signal, while the second wavelet separately encodes each half of the signal (child signals). The third and fourth wavelets encode each child signal by dividing them into two. The encoding process continues until the child signal size has reached one. Figure 2 (right) shows the eigenvector sign matrix of a simple graph with 8 nodes.
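The level structure of the Haar matrix can be made concrete with a short sketch. The code below builds an unnormalised $H_8$ by sampling equation (5) at each scale and shift, preceded by a constant row, and prints it; the function name `haar_matrix` and the choice to leave the rows unnormalised are assumptions made for the illustration only.

```python
def haar_matrix(n):
    """Unnormalised n x n Haar matrix (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    rows = [[1] * n]              # row 0: constant row covering the whole signal
    width = n                     # support length of the wavelets at the current level
    while width > 1:
        half = width // 2
        for start in range(0, n, width):        # each shift moves the support along the signal
            row = [0] * n
            for t in range(start, start + half):
                row[t] = 1                      # first half of the support: +1
            for t in range(start + half, start + width):
                row[t] = -1                     # second half of the support: -1
            rows.append(row)
        width = half                            # next level: supports are half as long
    return rows


H8 = haar_matrix(8)
for row in H8:
    print(" ".join(f"{v:2d}" for v in row))
```

Each level doubles the number of rows and halves their support (1, 1, 2 and 4 rows for n = 8), and every pair of distinct rows has a zero dot product, which is the discrete counterpart of equation (6).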

Similarly, according to MSGP, the first eigenvector represents all nodes in the graph, while the second eigenvector (Fiedler) divides the graph into two and produces two sub-graphs. The third and fourth eigenvectors can divide the sub-graphs produced by the second eigenvector into two again. The division process continues until one node remains in the sub-graphs.

Key Question-1: Do the eigenvectors of the Laplacian of any graph have a balanced division of knowledge?

Answer: According to MSGP, in order for a balanced division to take place, the graphs (or sub-graphs) must contain one single connected component. Otherwise a balanced division does not occur. To better understand the balanced division of a graph, the modularity (Q) metric [25] can be used as follows:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \frac{s_i s_j + 1}{2} \tag{7}$$

where i and j are two nodes with degrees $k_i$ and $k_j$ respectively. For a particular division of the graph into two groups, let $s_i = 1$ if node i belongs to group 1 and $s_i = -1$ if it belongs to group 2.

Key Question-2: Do the eigenvectors of the Laplacian of any graph with one single connected component have a multi-level division of knowledge?

Answer: The concept of MSGP is defined so that a sub-graph on the kth eigenvector can be divided into two parts (in a balanced manner) by using one of the 2kth or (2k − 1)th eigenvectors. Let us try to understand this with an example. Let G be divided into two level-1 sub-graphs ($G_2^1$ and $G_3^1$) using the second eigenvector (Fiedler); in $G_{\mathrm{index}}^{\mathrm{level}}$, the superscript and subscript represent the partition level and the sub-graph index respectively. According to MSGP, these sub-graphs can be divided using the third or fourth eigenvectors. Thus the children of the sub-graph $G_2^1$ will be $G_4^2$ and $G_5^2$, while the children of the sub-graph $G_3^1$ will be $G_6^2$ and $G_7^2$. To perform MSGP, the binary heap tree (T) [28] is used. The data structure of each node of T is shown in figure 3.
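Returning to the Q metric of equation (7), a small worked example may help to show how it separates a good two-way division from a poor one. The sketch below evaluates the sum directly for an unweighted, undirected graph; it assumes that m denotes the number of edges, and the function name `modularity_two_groups` and the test graph (two triangles joined by a single edge) are hypothetical.

```python
def modularity_two_groups(n, edges, s):
    """Evaluate equation (7); s[i] is +1 for group 1 and -1 for group 2."""
    m = len(edges)                        # number of edges (assumed meaning of m)
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    k = [sum(row) for row in A]           # node degrees
    q = 0.0
    for i in range(n):
        for j in range(n):
            # (s_i * s_j + 1) / 2 is 1 when i and j share a group and 0 otherwise.
            q += (A[i][j] - k[i] * k[j] / (2 * m)) * (s[i] * s[j] + 1) / 2
    return q / (2 * m)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
balanced = modularity_two_groups(6, edges, [1, 1, 1, -1, -1, -1])
skewed = modularity_two_groups(6, edges, [1, -1, -1, -1, -1, -1])
print(balanced, skewed)
```

Splitting the graph along the bridging edge gives Q = 5/14, whereas isolating a single node gives a slightly negative value, so the balanced division scores higher.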
When any sub-graph is divided by an eigenvector, the division results are stored in T as a new node. Each node contains a divider eigenvector index (k), the division level and the information (node indexes and total node weights) of two sub-graphs (2k and 2k − 1). The node weight represents its value in the eigenvector.

Algorithm 3 shows the steps of MSGP. Accordingly, the algorithm takes a non-directional graph G as input and produces sub-graphs, $G_{\mathrm{index}}^{\mathrm{level}}$, as output. Being a spectral method, it calculates the Laplacian matrix and the eigenvalues/eigenvectors in the first and second steps of the algorithm. In the third and fourth steps, a T is built where the information of the sub-graphs is stored.

Figure 2. Haar wavelets (left) and eigenvector sign matrix of Laplacian of a graph (right).

The first division of the graph begins with the second eigenvector (Fiedler). The division results ($G_2^1$ and $G_3^1$ sub-graphs) are stored in T as the root node. This is the first division level. One more level of division means adding new child nodes to T by dividing each sub-graph again. The division process is continued until one node remains in the sub-graphs (in the while-loop). It can be seen that the maximum level of division of an n-node graph is $\lfloor \log_2 n \rfloor$.

Algorithm 3. MSGP.

INPUT: a graph, G = (V, E) and a partition number, k
OUTPUT: k sub-graphs, G1, G2, ..., Gk
Compute L(G)
Compute k eigenvalues/eigenvectors (in ascending order): [V, λ] = eig(L(G), k)
Add the second eigenvector (Fiedler) to T as root node: AddNode(T, root)
    root.evec = 2
    root.node = [V_2 > 0, V_2 ≤ 0]
    root.level = 1
    root.weight = sum(and(root.node, V_root.evec))
while parent = SubtractNode(T) do
    for Calculate candidate child nodes (c = 1, 2) do
        child_c.evec = 2 ∗ parent.evec − mod(k, 2)
        child_c.node = and(parent.node, [V_child_c.evec > 0, V_child_c.evec ≤ 0])
        child_c.level = parent.level + 1
        child_c.weight = sum(and(child_c.node, V_child_c.evec))
    end for
    if min_c diff(child_c.weight) then
        ind = ind + 1
        addnode(T, child_c)
    end if
end while
return G1, G2, ..., Gk

In the division process of a sub-graph in each node of T, the answer to the question 'which eigenvector will divide the sub-graph' must be found. According to the above-mentioned concept of MSGP, there are two candidate eigenvectors (2k and 2k − 1) to divide each sub-graph obtained with the eigenvector (k). So that the proper eigenvector can be determined, the sub-graph is divided using both eigenvectors and the two results are stored in the two candidates (child_1 and child_2) (in the for-loop). In the control stage (if-block), the difference value (weight) between the parts after division by each candidate is calculated, and the candidate providing the minimum value is selected.

Figure 3. Data structure of each node at the binary heap tree (n: node number).

Obviously, the superiority of MSGP over RSB is that it does not need to repeatedly (recursively) calculate eigenvalues and eigenvectors. This calculation is performed only once, as in SB, which equals a 1-level MSGP division. For higher levels, MSGP has the cost of building the binary tree, which is O(log(n)).
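The idea of dividing at several levels from a single eigen-decomposition can be illustrated with a deliberately simplified sketch; it is not the implementation evaluated in this paper. Three assumptions are made: the graph is an n-node path, whose Laplacian eigenvectors are known in closed form (cos(π(k − 1)(i + 1/2)/n) for the kth eigenvector), so no eigen-solver is needed; the balance test is the absolute sum of the candidate eigenvector's entries over the sub-graph, standing in for the weight difference of algorithm 3; and the tree is kept implicitly as a list of (eigenvector index, node set) pairs rather than as an explicit heap. All function names are hypothetical.

```python
import math


def path_eigenvector(n, k):
    """k-th (1-based) Laplacian eigenvector of the n-node path graph, in closed form."""
    return [math.cos(math.pi * (k - 1) * (i + 0.5) / n) for i in range(n)]


def imbalance(nodes, vec):
    """Assumed balance score: |sum of the eigenvector entries| over the sub-graph."""
    return abs(sum(vec[i] for i in nodes))


def multilevel_split(n, levels):
    assert 1 <= levels <= int(math.log2(n))
    # All eigenvectors are obtained once, before any division takes place.
    evec = {k: path_eigenvector(n, k) for k in range(1, n + 1)}
    parts = [(None, list(range(n)))]      # (eigenvector that produced the part, nodes)
    for _ in range(levels):
        next_parts = []
        for parent_k, nodes in parts:
            # The whole graph is divided by eigenvector 2 (Fiedler); a part produced
            # by eigenvector k has the two candidates 2k - 1 and 2k.
            candidates = [2] if parent_k is None else [2 * parent_k - 1, 2 * parent_k]
            k = min(candidates, key=lambda c: imbalance(nodes, evec[c]))
            positive = [i for i in nodes if evec[k][i] > 0]
            negative = [i for i in nodes if evec[k][i] <= 0]
            next_parts += [(k, p) for p in (positive, negative) if p]
        parts = next_parts
    return sorted(nodes for _, nodes in parts)


print(multilevel_split(8, 2))
```

For the 8-node path, the second level selects eigenvector 3 over eigenvector 4 for both halves, because eigenvector 4 would split four nodes into groups of one and three. No eigenvector is recomputed after the first step.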

4. Applications with MSGP

In this section, application results are evaluated on specific graphs (three regular, one irregular) so that the division ability of MSGP can be better understood.

In the first application, MSGP shows its ability of multi-level partitioning. For this application, the graph with 40 nodes and regular structure shown in figure 4 is used. A 3-level division was performed on this graph by MSGP and 8 sub-graphs (shown in different colors) were produced as output. As a result of the 1-level division, the graph [1–40] is divided into two sub-graphs, [1–20] and [21–40]. In the 2-level division, the sub-graph [1–20] is divided into two sub-graphs, [1–10] and [11–20], and the sub-graph [21–40] is also divided into two sub-graphs, [21–30] and [31–40]. In the 3-level division, a total of 8 sub-graphs [1–5], [6–10], [11–15], [16–20], [21–25], [26–30], [31–35] and [36–40] are obtained.

In the second application, it is shown how MSGP builds the binary heap tree T and calculates the total weights and sign information. Figure 5(a) shows a simple graph with 26 nodes. This graph is divided by using 2-level MSGP and the output sub-graphs are shown in figure 5(b) with different colors. The eigenvectors of the Laplacian of this graph are shown in figure 5(c) and its sign matrix is shown in figure 5(d). The colors on the columns indicate the positive and negative nodes of the sub-graphs resulting from division. The partition levels are shown on the matrices. The total weight values used to test whether a split is balanced are displayed under each column. The child node with the smallest weight difference between the sub-graphs is added to T. Figure 5(e) shows the T built for this graph. It appears that a maximum of 4-level divisions can take place for that graph. Accordingly, the 2nd, 3rd–4th, 5th–8th and 9th–15th eigenvectors are used in the 1st, 2nd, 3rd and 4th-level divisions.
In the third application, it is shown how MSGP performs proper eigenvector selection while dividing the sub-graphs. For this, a graph with 60 nodes and a regular structure was used. The first column of figure 6 shows the 2nd, 3rd and 5th level division results on this graph. In the second column, the candidate eigenvectors appearing at each division level are shown in the tree structure. The eigenvector index and sign (positive/negative) are shown on the nodes of the tree, and the number of nodes in the sub-graphs is located on the edges. Careful observation shows that each sub-graph is split using two eigenvectors. In the first division level, the second eigenvector is used and two sub-graphs with 30 nodes are obtained from the whole graph with 60 nodes. These are marked with the 2P and 2N symbols. At the second division level, each of the sub-graphs is divided by eigenvectors 3 and 4. As a result, four sub-graphs with 15 nodes are obtained. At this point, it must be determined which eigenvector is responsible for the proper division. The weight differences of the sub-graphs resulting from the split are used to determine the appropriate eigenvector. The node with the appropriate eigenvector is added to the tree T.

Figure 4. 3-level MSGP division result.

In the three applications mentioned up to now, regular graphs have been used. In the fourth and last application, the division ability of MSGP on an irregular graph is shown. Figure 7 shows the results of the level division of MSGP on an irregularly structured graph with 830 nodes. The first, second and third level division results are shown in figure 7 in order. It is clear from the results that MSGP is quite successful and fast in the balanced and multi-level partition of regular or irregular graphs.

Figure 5. Multilevel and balanced partitioning example with MSGP algorithm. (a) Original Graph. (b) 2-level MSGP partitioning result. (c) Eigenvectors of Laplacian of the graph. (d) Signs of eigenvectors. (e) Binary heap tree, T.

Figure 6. Selection of suitable eigenvector for division (left) sub-graphs (right) binary heap tree. (a) 1-level. (b) 2-level. (c) 4-level.

5. Comparison results

In this section, the seven existing algorithms in the graph partitioning (or community detection) field and MSGP are compared in terms of performance and accuracy. The algorithms are briefly as follows:

(i) AFG [2] is a community detection algorithm which allows for multi-scale screening of graphs.

(ii) Danon [7] is a modified version of the Newman algorithm [12], which is a method for detecting communities in social and biological networks.

(iii) MSGP is the method proposed in this manuscript.

(iv) ModulMax1 [4] is a heuristic-based method for detecting the community structure of large social networks.

(v) ModulMax2 is a different implementation of [2] which provides fast detection of communities using modularity optimization.

(vi) ModulMax3 is also an implementation of the greedy agglomerative modularity maximization method in [2].

(vii) SpectralClust1 [18] is a spectral graph partitioning method based on factorization, which uses the k-means clustering of eigenvectors.

(viii) SpectralClust2 [13] is also a spectral grouping approach based on optimization of a generalized eigenvalue problem.

Figure 7. MSGP division results on irregular graph. (a) 1-level. (b) 2-level. (c) 3-level. (d) 4-level.

Two different types of synthetic graphs were produced for comparison. First, for a realistic comparison, four large graphs (Graph-1, Graph-2, Graph-3 and Graph-4 with 1024, 960, 832 and 1536 nodes respectively) were produced. These large graphs were obtained by copying and combining the simple and well-structured micro graphs shown in figure 8. To produce these large graphs, the nodes indexed with 9, 8, 7, 6 and their peers (after copying) were used for the linking (combining) of micro graphs, respectively. After producing the large graphs, four levels of division results (sub-graphs with 2, 4, 8 and 16 pieces) were produced as the ground truth of each large graph. Note that these four large graphs do not contain any noise and have a fairly simple and orderly structure.

Figure 8. Micro structures of simple graphs.

The second type of synthetic graphs was produced using the method in [26], whose graphs are generally used for the purpose of community detection (or identifying associations) in social networks. To produce the graphs, the number of groups (communities), the number of nodes in each group, and the numbers of internal and external half-edges per node must be given as inputs.

Figure 9. A sample graph produced by [26]. It has eight clusters and the node number in each group is 1024.

All eight algorithms in the comparison take two inputs (the adjacency matrix of the graph and the partition number) and produce a single output vector, the node vector, which specifies which node is in which group. In order to calculate the accuracy of the algorithm results, the estimated node vector ($V_{est}$) is compared with the real node vector ($V_{real}$) by using the evaluation criterion in [6], which calculates the normalized mutual information between $V_{est}$ and $V_{real}$, defined as follows:

$$\mathrm{Similarity}(V_{real}, V_{est}) = \frac{-2 \sum_{i=1}^{G_r} \sum_{j=1}^{G_e} N_{ij} \log\left( \frac{N_{ij} N}{N_i N_j} \right)}{\sum_{i=1}^{G_r} N_i \log\left( \frac{N_i}{N} \right) + \sum_{j=1}^{G_e} N_j \log\left( \frac{N_j}{N} \right)} \tag{8}$$

where $G_r$ and $G_e$ denote the numbers of real and estimated groups respectively, N is the confusion matrix between the two vectors ($V_{real}$, $V_{est}$), and $N_i$ and $N_j$ denote the sums over row i and column j of the matrix $N_{ij}$ respectively. If $V_{real}$ is completely different from $V_{est}$, the similarity of the two vectors is 0. If $V_{real}$ is identical to $V_{est}$, the mutual information takes its maximum value of 1.

The first experiment was performed on the first type of synthetic graphs. The four large graphs were divided into 2, 4, 8 and 16 parts by the eight partitioning algorithms. After each division process, the time spent and the accuracy value were recorded. The results obtained at all levels are shown in table 1.

Table 1. The running time (in seconds) and accuracy values of the eight partitioning algorithms on the four synthetic graphs ($G_i$: ith graph).

It is clear from the results that the spectral partitioning methods (SpectralClust1 and MSGP) can produce results faster than the other methods on the first type of synthetic graphs (four large graphs). The reason is that they need to calculate the k eigenvectors only once. After this calculation, all that is done is to use the k-means algorithm for SpectralClust1 and to build the binary heap tree for MSGP. The average running times of the seven existing methods and MSGP for the first type of synthetic graphs at four levels are 2.31108 s and 0.16049 s respectively. This shows that MSGP works about 14.4 times faster than the other seven existing methods. Looking at the accuracy values, it is seen that MSGP can produce the ground truth vector. The reason why MSGP can achieve such high accuracy is its linear algebraic basis, as opposed to being a heuristic approach. It is also seen that SpectralClust1 is the closest method to MSGP in general accuracy results. The reason for this is that it is also a method based on linear algebra, but it uses the k-means clustering method after calculating eigenvectors. The average accuracy values of the seven existing methods and MSGP for four graphs at four levels are 0.52835 and 1 respectively. This shows that MSGP produces about 47.16% more correct results than the other seven existing methods.

In the second experiment, the second type of synthetic graphs was used to analyze the effect of the number of groups on the performance of the algorithms. As in the first experiment, eight different graphs were produced, four with 1024 nodes and four with 2048 nodes, by changing the group numbers to 2, 4, 8 and 16. For all generated graphs, the numbers of internal and external half-edges were set to 20 and 5 respectively. Table 2 shows the performance results of the algorithms in terms of time and accuracy.

Table 2. The running time (in seconds) and accuracy values of the eight partitioning algorithms on the second type of synthetic graphs produced by [26].

In general, MSGP works faster than all other methods. However, it is observed that the accuracy of MSGP is lower than that of the other methods. The average running times of the seven existing methods and MSGP for four graphs at four levels are 0.90310 s and 0.11689 s respectively. This shows that MSGP works about 7.72 times faster than the other seven existing methods. The average accuracy values of the seven existing methods and MSGP for four graphs at four levels are 0.96005 and 0.816872 respectively. This shows that the seven existing methods produce about 14.31% more correct results than MSGP.

When the results in tables 1 and 2 are evaluated together, it can be seen that there is a significant change in the accuracy of MSGP. It is important to note that the graphs used in the first experiment are produced with noiseless, smooth and simple logic. For this reason, there is no ambiguity or inconsistency as to what the ground truth node vector should be. Because of that, the results given in table 1 are 100 percent correct. In contrast, the graphs and the ground truth node vector used in the second experiment are produced by [26]. I think that there is a discrepancy between the generated graph and its vector. This graph generation algorithm uses two coefficients (internal and external), which may cause the ground truth node vector to be inaccurate. For this reason, there is doubt about the consistency of the results given in table 2.
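For readers who wish to reproduce the accuracy values, the similarity measure of equation (8) can be computed from two label vectors as in the following sketch. It assumes that the scalar N inside the logarithms is the total number of nodes (the sum of all entries of the confusion matrix) and that natural logarithms are used; the function name `similarity` and the convention of returning 1 when both vectors contain a single group are choices made for the example.

```python
import math
from collections import Counter


def similarity(v_real, v_est):
    """Normalised mutual information of equation (8) between two label vectors."""
    n = len(v_real)                          # total number of nodes
    joint = Counter(zip(v_real, v_est))      # non-zero confusion-matrix entries N_ij
    rows = Counter(v_real)                   # row sums N_i (real groups)
    cols = Counter(v_est)                    # column sums N_j (estimated groups)
    numerator = -2.0 * sum(
        nij * math.log(nij * n / (rows[i] * cols[j])) for (i, j), nij in joint.items()
    )
    denominator = sum(ni * math.log(ni / n) for ni in rows.values()) + sum(
        nj * math.log(nj / n) for nj in cols.values()
    )
    if denominator == 0:                     # both vectors contain a single group
        return 1.0
    return numerator / denominator


same = similarity([0, 0, 1, 1], [1, 1, 0, 0])         # same grouping, labels renamed
independent = similarity([0, 0, 1, 1], [0, 1, 0, 1])  # groupings share no information
print(same, independent)
```

Only the grouping matters, not the label values, so a correct partition with permuted group indexes still attains the maximum value.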

6. Conclusion

This paper presents a new method, MSGP, which enables multilevel and balanced partition of regular and irregular graphs. The method has a spectral approach and shows that the eigenvectors of the Laplacian of a graph contain the multi-level and balanced partitioning knowledge. Inspired by Haar wavelets, MSGP reveals this hidden knowledge in the eigenvectors by using binary heap trees in the implementation stage. The experimental works clearly demonstrate the superiority of MSGP over the seven existing methods in terms of correctness and performance.

Acknowledgments

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK—grant no: 215E075).

References

[1] Akansu A N, Serdijn W A and Selesnick I W 2010 Emerging applications of wavelets: a review Phys. Commun. 3 1–18
[2] Arenas A, Fernández A and Gómez S 2008 Analysis of the structure of complex networks at different resolution levels New J. Phys. 10 053039
[3] Bichot C E and Siarry P 2011 Graph Partitioning (New York: Wiley)
[4] Blondel V D, Guillaume J-L, Lambiotte R and Lefebvre E 2008 Fast unfolding of communities in large networks J. Stat. Mech. P10008
[5] Dahlke K 2008 NP-complete problems Math Reference Project (http://www.mathreference.com/lan-cx-np,intro.html) Retrieved 2008-06-21
[6] Danon L, Diaz-Guilera A, Duch J and Arenas A 2005 Comparing community structure identification J. Stat. Mech. P09008
[7] Danon L, Guilera A D and Arenas A 2006 Effect of size heterogeneity on community identification in complex networks J. Stat. Mech. P11010
[8] Du S, Luo L, Cao K and Shu M 2016 Extracting building patterns with multilevel graph partition and building grouping ISPRS J. Photogramm. Remote Sens. 122 81–96
[9] Feldmann A E and Emil A 2012 Fast Balanced Partitioning is Hard Even on Grids and Trees (Berlin: Springer) pp 372–82
[10] Fiduccia C M and Mattheyses R M 1982 A linear-time heuristic for improving network partitions 19th Design, Automation Conf. (IEEE) pp 175–81
[11] Fishkind D, Sussman D, Tang M, Vogelstein J and Priebe C 2013 Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown SIAM J. Matrix Anal. Appl. 34 23–39
[12] Girvan M and Newman M E J 2002 Community structure in social and biological networks Proc. Natl Acad. Sci. USA 99 7821–6
[13] Shi J J and Malik J 2000 Normalized cuts and image segmentation IEEE Trans. Pattern Anal. Mach. Intell. 22 888–905
[14] Gross P Z J L and Yellen J 2013 Handbook of Graph Theory 2nd edn (Boca Raton, FL: CRC)
[15] Kabelikov P 2006 Graph partitioning using spectral methods PhD Thesis VSB-Technical University of Ostrava
[16] Karypis G and Kumar V 1998 A fast and high quality multilevel scheme for partitioning irregular graphs SIAM J. Sci. Comput. 20 359–92
[17] Kaur M and Singh K 2013 Digital circuit layout based on graph partitioning technique using DNA computing Int. J. Comput. Appl. 69 17–2
[18] Kernighan B W and Lin S 1970 An efficient heuristic procedure for partitioning graphs Bell Syst. Tech. J. 49 291–307
[19] Lang J and Lapata M 2014 Similarity-driven semantic role induction via graph partitioning Comput. Linguist. 40 633–69
[20] Liu H, Cao M and Wu C W 2014 Coupling strength allocation for synchronization in complex networks using spectral graph theory IEEE Trans. Circuits Syst. I 61 1520–30
[21] Lorca X 2011 Tree-Based Graph Partitioning Constraint (New York: Wiley)
[22] Moraglio A, Kim Y-H, Yoon Y and Moon B-R 2007 Geometric crossovers for multiway graph partitioning Evol. Comput. 15 445–74
[23] Naumov M 2016 Fast spectral graph partitioning on GPUs NVIDIA publications https://devblogs.nvidia.com/parallelforall/fast-spectral-graph-partitioning-gpus/
[24] Naumov M and Moon T 2016 Parallel spectral graph partitioning research Technical Report NVIDIA Research NVR-2016-001
[25] Newman M E J 2006 Modularity and community structure in networks Proc. Natl Acad. Sci. USA 103 8577–82
[26] Newman M E J and Girvan M 2004 Finding and evaluating community structure in networks Phys. Rev. E 69 026113
[27] Stawiaski J 2008 Mathematical morphology and graphs: application to interactive medical image segmentation PhD Thesis École Nationale Supérieure des Mines de Paris
[28] Stølting G, Madalgo B, Lagogiannis G and Tarjan R E 2012 Strict fibonacci heaps Proc. 44th Symp. Theory Computational vol 12 p 1177
[29] Van Dongen S 2008 Graph clustering via a discrete uncoupling process SIAM J. Matrix Anal. Appl. 30 121–41
[30] Vecharynski E, Saad Y and Sosonkina M 2014 Graph partitioning using matrix values for preconditioning symmetric positive definite systems SIAM J. Sci. Comput. 36 A63–87
[31] Von Luxburg U 2007 A tutorial on spectral clustering Stat. Comput. 17 395–416
