Efficient Anytime Anywhere Algorithms for Closeness Centrality in Large and Dynamic Graphs

Efficient Anytime Anywhere Algorithms for Closeness Centrality in Large and Dynamic Graphs Eunice E. Santos, John Korah, Vairavan Murugappan, Suresh Subramanian Department of Computer Science Illinois Institute of Technology, Chicago, USA {esantos2, jkorah3}@iit.edu, {vmuruga1, ssubra20}@hawk.iit.edu Abstract—Recent advances in social network analysis SNA techniques in large-scale networks, especially for time methodologies for large (millions of nodes and billions of edges) critical applications. A number of recent papers have and dynamic (evolving at different rates) networks have focused proposed various methodologies such as parallel and on leveraging new high performance architectures, multithreaded tools, novel data structures and efficient graph parallel/distributed tools and novel data structures. However, libraries to perform large-scale graph analysis [3][4][5][6]. there has been less focus on designing scalable and efficient algorithms to handle the challenges of dynamism in large-scale While parallelization techniques solve some of the scalability networks. In our previous work, we presented an overarching issues, there are other issues such as network dynamism that anytime anywhere framework for designing parallel and need to be better addressed. Also, these methodologies do not distributed social network analysis algorithms that are scalable to generally prescribe algorithm designs that take into account large network sizes and can handle dynamism. A key contribution tradeoffs between memory availability and performance. of our work is to leverage the anytime and anywhere properties of In many SNA tasks such as real-time social media graph analysis problems to design algorithms that can efficiently analytics, disaster management, etc., the underlying network handle network dynamism by reusing partial results, and by is continuously evolving, with nodes and edges being added reducing re-computations. In this paper, we present an algorithm and/or removed, during analysis. One approach used to for closeness centrality analysis that can handle changes in the network in the form of edge deletions. Using both theoretical address this issue is to restart the analysis of the modified analysis and experimental evaluations, we examine the graph from scratch [7]. While this method can be feasible for performance of our algorithm with different network sizes and smaller graphs with occasional network changes, restarting dynamism rates. the entire analysis from scratch in larger real-time networks with frequent updates will often yield poor performance. Keywords-social network analysis; parallel and distributed Another method is to analyze a static snapshot of the dynamic processing; centrality analysis; dynamic graphs; anytime network. However, this will only yield approximate results algorithms which, depending on the rate of change of the network, can I. INTRODUCTION quickly become obsolete. Although there are some graph processing methods[6][8] that are sensitive to networks Social networks provide the capability to leverage graph changes, their main objective is to reduce the load imbalance algorithms and other computational methodologies to caused by dynamism. We believe less attention has been rigorously analyze various social phenomena such as social given to formulating critical algorithm designs that can truly influence. Depending on what relationships are depicted by adapt to different rates of changes in the social networks by the arcs/edges of a social network, the graph can be efficiently incorporating the new information, and avoiding represented as directed or undirected. A number of social massive re-computations. Finally, as discussed in the network services such as Twitter, LinkedIn, and a myriad of background section, many computational platforms such as other similar applications have witnessed explosive growth shared-memory, distributed-memory, massively in the past decade. As a result, these domains generate multithreaded and other specialized architectures are dynamic network information in the order of millions of leveraged to perform large-scale graph analysis. In order to nodes and edges. Availability of such large data sets, while take advantage of various computational platforms, any significantly extending our understanding of the underlying graph analysis algorithm or framework has to be compatible social phenomenon, has also created new challenges for with these environments. social network analysis (SNA) research. We now describe the In order to address these challenges, a parallel/distributed following key issues in performing SNA on large and anytime anywhere methodology for large and dynamic social dynamic social networks. network analysis was proposed by Santos et al.[1], [9]. In the The most important issue is network size. As the network anytime anywhere approach, the large social network graph size grows, computation time and resources required for is decomposed into smaller sub-graphs. Each sub-graph is measuring SNA metrics may increase dramatically[1][2]. analyzed and refined over time, while providing nontrivial Computation time can significantly constrain the utility of intermediate results. The quality of these results are monotonically non-decreasing with respect to the for massive shared-memory and many thread processor computation time. Moreover, the dynamic changes in the architectures. Our anytime anywhere framework, on the other network occurring over the course of analysis are hand, is computational platform independent and can be used continuously incorporated. In the anytime anywhere to develop dynamic large-scale algorithms for both cluster methodology, the term anytime refers to the capability of the computing platforms and other specialized architectures. algorithms to provide nontrivial partial results that will get MapReduce[15] is a popular framework that provides a refined over time. The term anywhere, on the other hand, simple, scalable and fault tolerant programming model for refers to the capability to incorporate changes to the network developing parallel applications. It is specifically designed whenever it happens and to propagate the effects of these for processing large amounts of data stored on commodity changes to the whole network. We demonstrated our anytime clusters. Large-scale graph analysis tools such as Surfer [5] anywhere approach for centrality measurements [1] and and PEGASUS [4] were developed based on the MapReduce maximal clique enumeration problem [2] as part of our framework. However, most graph analysis algorithms are previous work. In this work, we provide algorithm designs iterative in nature [16], and MapReduce systems are not well that can adapt to edge deletions efficiently, and validate this suited for such tasks. Pregel [7], a vertex-centric and Bulk approach for the problem of closeness centrality calculations Synchronous Parallel (BSP) model based graph analysis in social networks. Furthermore, we have provided both platform, was designed to address these issues. In this model, theoretical and experimental evaluations of the proposed the computation is divided into a sequence of iterations or algorithm. Note that our previous algorithm designs[2] for supersteps. During each superstep, a user defined function is edge deletions were specific for the maximum clique performed at each graph vertex. Messages are exchanged problem. between the vertices at the end of each superstep. This iterative framework can be used to formulate a number of II. BACKGROUND graph algorithms. In addition, a user defined function can be A number of graph libraries [4][10], computational invoked for each vertex, theoretically in parallel, so the frameworks [3][7], tools [5][6], and data structures[3][10] maximum number of parallel tasks in the system could be as have been proposed for large-scale graph analysis. However, high as the number of vertices in the graph, thereby making even with the existing computational capabilities researchers this framework highly scalable. This framework provides and analysts are overwhelmed with the ever-increasing size ease of programming due to the synchronicity imposed by the of the social network datasets. In addition to the enormous supersteps. Giraph2, Hama3, Mizan [6] and Pregelix [17] are scale, these data sets exhibit various degrees of dynamism, some of the popular Pregel based graph analysis systems which further impedes the network analysis. Some of the [18]. These systems provide efficient data structures, APIs, prominent single machine, shared-memory based graph and platforms for large-scale graph analysis. However, analysis tools are LEMON [11], Pajek 1 , igraph [12], MapReduce and Pregel are programming frameworks, and do NetworkX [13], UCInet [14], etc. Pajek and UCInet are graph not specify efficient algorithm designs for handling dynamic analysis, software tools that provide implementations of graphs. various graph analysis algorithms, and provisions for graph A large part of existing work on SNA algorithms, visualization. LEMON, NetworkX and igraph are graph especially for computing shortest path related metrics for libraries that provides APIs and other useful graph data dynamic graphs [19][20], is focused on providing efficient structures for performing graph analysis. Due to issues such serial algorithms. While these solutions perform well for the as limited memory size, these tools can only provide analysis analysis of small to medium sized graphs, they are not for small to medium scale (hundreds

Efficient Anytime Anywhere Algorithms for Closeness Centrality in Large and Dynamic Graphs

Approximating Network Centrality Measures Using Node Embedding and Machine Learning

Closeness Centrality Extended to Unconnected Graphs : the Harmonic Centrality Index

“A Tie Is a Tie? Gender and Network Positioning in Life Science Inventor Collaboration”

Measures of Centrality

Application of Social Network Analysis in Interlibrary Loan Services

Closeness Centrality in Some Splitting Networks

Triangle Centrality [16] Is Based on the Sum of Triangle Counts for a Vertex and Its Neighbors, Normalized Over the Total Triangle Count in the Graph

A Faster Method to Estimate Closeness Centrality Ranking Arxiv

Centrality Measures

Measures of Centrality

Patterns of Iran's Research Collaboration in the Field Of

Centralities in Large Networks: Algorithms and Observations