Network Exploration by Complements of Graphs with Graph Coloring

Journal of Advanced Statistics, Vol. 2, No. 2, June 2017 78 https://dx.doi.org/10.22606/jas.2017.22002 Network Exploration by Complements of Graphs with Graph Coloring Tai-Chi Wang1, Frederick Kin Hing Phoa2* and Yuan-Lung Lin2 1National Center for High-Performance Computing, National Applied Research Laboratories, Taiwan 2Institute of Statistical Science, Academia Sinica, Taiwan Email: [email protected] Abstract Network data have become very popular with the growth of technologies and social applications such as Twitter and Facebook. However, few visualization tools have been created for exploring large-scale networks. We propose a simple and quick procedure to explore a network in this study. The algorithm changes the edge representation based on the complement of a simple graph and the partition method of vertex coloring. Furthermore, the colors provide additional information on top of the partitions. Our proposed method is demonstrated in some famous networks. Keywords: Visualization, Complement of Graph, Greedy Algorithm, Network Partition, N-clique 1 Introduction Graph theory has been verified as a useful approach to explain complex networks. For example, Erdős and Rényi [1] first provided Poisson random model to explain the relation in a network. Watts and Strogatz [2] used a small-world model to describe the dynamic network system. Barabási and Albert [3] provided a scale-free model to explain the expanded connectivity of networks. There have been huge amount of studies and applications since these models were proposed. It is difficult to glimpse structures of large-size networks due to their network complexities. Although many useful R-, Python- and C-based softwares such as Cytoscape [4], Gephi [5], and igraph [6] were developed to visualize and analyze network data, many network features were still hindered inside their complicated structures. Few studies proposed simple and efficient methods to visualize networks. Fruchterman and Reingold [7] provided some useful methods to embed proper vertex locations and to make visualization easier. However, once a network is big, the analysis becomes a time-consuming task and the network features are still difficult to clearly visualize. Among complicated network patterns, recent researches have focused on exploring network patterns by graph partition and network communities. Graph partition is an important issue in network recognition. The main issue of graph partition is how to divide a graph into subgraphs so that the number of edges connecting different subgraphs is minimized. For example, Karypis and Kumar [8] provided a multilevel partition√ algorithm by considering the coarsening and uncoarsening procedures. Arora et al. [9] developed a O( log n)-approximation algorithm that can quickly partition a graph into two parts by a 2 l2-representation approach. Similar but not exactly equivalent to a graph partition problem that finds the minimum number of cut, a community detection problem focuses on finding subgraphs so that each subgraph is densely connected. Given the reachability of community and the geodesic distance among community members, Wasserman and Faust [10] defined two important quantities: (1) the n−clique is a maximal subgraph in which the largest geodesic distance between any two nodes is no greater than n, and (2) the n−club restricts the geodesic distance within the subgraph to be no greater than n. These n−cliques and n−clubs are possible communities under a weaker definition. Furthermore, the modularity-based method [11] considered a modularity measure to evaluate the similarity of groups. A spectral optimization method is used to classify network communities. Bickel and Chen [12] further investigated the theoretical properties of the modularity measure. Bayesian models, which were developed by Handcock et al. [13] and Heard et al. [14], deduced network properties and assigned cluster labels to vertices by posterior samples. Readers who are interested in other community detection approaches and definitions can refer to Wasserman and Faust [10] and Tang and Liu [15]. JAS Copyright © 2017 Isaac Scientific Publishing Journal of Advanced Statistics, Vol. 2, No. 2, June 2017 79 Although these models and methodologies are helpful in realizing the network patterns, they are commonly limited by their computational efficiencies. In specific, these methodologies might not be helpful and informative if one needs to quickly characterize a complex network within a limited amount of time. For the visualization purpose, an initial step to understand a network is still by raw visual recognitions. In addition, it is common among traditional approaches that they use geodesic distances between pairs of vertices in a network. If we want to quickly glimpse structures of a network according to the interactive distances among vertices, it is essential to develop a simple representation to summarize a network. In general, we want to construct an algorithm to achieve the following goals: – In terms of computing efficiency, the algorithm can quickly visualize and explore a network. – In terms of graph partition, the algorithm can label vertices in a network. – In terms of cluster pattern visualization, same labels are assigned to close vertices. – In terms of extended information, the proposed approach can provide additional information on top of network partition. In order to achieve the above goals, we propose a popular but old-school approach, graph coloring, to visualize and to explore networks in a simple way. Graph coloring is one of the most useful tools developed in graph theory. It originally arose from the four-colors problem and has been widely used in other fields such as time tabling and scheduling, register allocation, printed circulated board testing, and so on [16]. In operation research, many researches used this approach to deal with scheduling problems and to make decisions [17, 18]. In compiler optimization, graph coloring is an important tool that used for solving the register allocation problem [19, 20]. The original purpose of graph coloring is to find the minimum chromatic number for labeling a graph. This leads to the partition of a graph into small groups when each group has good properties such as n−cliques or n−clubs (for recognizing partition and cluster patterns). Although Graph coloring is a promising approach to feature a network, it suffers some difficulties when applying to network visualization. One of the most difficulties is the meaning of labels of graph coloring. The most important and fundamental property of the graph coloring is “two adjacent vertices must label different colors”. The opposite side of this property is “two vertices with the SAME color must be NOT adjacent”. Suppose a studied graph is denoted as G. The graph coloring technique on G can shed the light on the dependence of cliques, i.e., the clique members must have different colors. On the other hand, we consider to color the complement of a graph, G, where two vertices of G are connected if and only if they are not connected in G. Based on the above “opposite” property, two vertices with the SAME color must be NOT adjacent on the colored G. Interestingly, this means that the vertices with the same colors are adjacent on the original graph G. Thus, the colors of G can be the partition labels. Based on this idea, we can apply the graph coloring technique to partition a network. In this study, we propose the greedy vertex coloring approach for coloring a graph. The greedy vertex coloring approach has been verified to run O(n + m) time [21], and this running time is efficient and acceptable in the case of a large scale network. Thus, the graph coloring is a good approach to achieve our goals. The remaining part of this paper is organized as follows. First, we introduce some preliminary concepts of graph theory and the basic algorithm for coloring in section 2. In section 3, we consider the complements of different edge representations for partitions by graph coloring, and then we use the Karate network [22] to demonstrate our proposed method with an emphasize on aspects such as labeling the complement results, the representation of partitions, and statistics of labels and edges. Section 4 shows the feasibility and efficiency of our proposed approach using other famous networks. The last section provides the discussion and the final conclusion of the paper. 2 Preliminary Definitions and Graph Coloring In this study, we focus on the undirected network and vertex coloring only. Since an undirected network is simpler to visualize than a directed network, we can directly see the patterns via graph coloring. 2.1 Preliminary Definitions A simple graph is an ordered pair G = (V, E), where V = {v1, . , vn} is a set of vertices and E = {e1, . , em} is a set of edges in which each edge is denoted as (vi, vj). In practice, each edge is denoted as Copyright © 2017 Isaac Scientific Publishing JAS 80 Journal of Advanced Statistics, Vol. 2, No. 2, June 2017 (vi, vj), where the vertices vi and vj are known as the endpoints of the corresponding edge. Additionally, they are said to be adjacent to each other, and are also called neighbors to each other. The adjacency matrix of G is an n by n square matrix with entries aij = 1 if vi and vj are adjacent and aij = 0 otherwise. The degree of a vertex v is the number of neighbors of v.A u-v path is a sequence of distinct vertices {u = v0, v1, . , vp−1, vp = v}, and any two consecutive vertices are joined by an edge. The shortest path from u to v is the u-v path containing minimum number of edges among all paths from u to v, and the distance between two vertices is the number of edges in the shortest path from u to v, denoted as dG(u, v). A graph is connected if it has at least one u-v path whenever u, v ∈ V .

Load more