
Visualization of Large Networks Using Recursive Community Detection

San Jose State University SJSU ScholarWorks

Master's Projects Master's Theses and Graduate Research

Fall 12-18-2020

Visualization of Large Networks Using Recursive Community Detection

Xinyuan Fan San Jose State University

Follow this and additional works at: https://scholarworks.sjsu.edu/etd_projects

Part of the Other Computer Sciences Commons

Recommended Citation
Fan, Xinyuan, "Visualization of Large Networks Using Recursive Community Detection" (2020). Master's Projects. 965. DOI: https://doi.org/10.31979/etd.atus-pbv9 https://scholarworks.sjsu.edu/etd_projects/965

This Master's Project is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been accepted for inclusion in Master's Projects by an authorized administrator of SJSU ScholarWorks. For more information, please contact [email protected].

Visualization of Large Networks Using Recursive Community Detection

CS298 Report

Presented to

The Faculty of the Department of Computer Science

San Jose State University

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

By

Xinyuan Fan

December 2020

© 2020 Xinyuan Fan

ALL RIGHTS RESERVED

The Designated Project Committee Approves the Project Titled

Visualization of Large Networks Using Recursive Community Detection

by

Xinyuan Fan

APPROVED FOR THE DEPARTMENT OF COMPUTER SCIENCE

SAN JOSE STATE UNIVERSITY

December 2020

Dr. Katerina Potika Department of Computer Science

Dr. Teng Moh Department of Computer Science

Prof. Kevin Smith Department of Computer Science

ABSTRACT

Networks show relationships between people or things. For instance, a person has a network of friends, and websites are connected through a network of hyperlinks. Networks are most commonly represented as graphs, so graph drawing becomes significant for network visualization. An effective graph drawing can quickly reveal connections and patterns within a network that would be difficult to discern without visual aid. But graph drawing becomes a challenge for large networks. Ambiguous edge crossings are inevitable in large networks with numerous nodes and edges, and large graphs often become a complicated tangle of lines. These issues greatly reduce graph readability and make analyzing complex networks an arduous task. This project aims to address the large network visualization problem by combining recursive community detection, node size scaling, layout formation, labeling, edge coloring, and interactivity to make large graphs more readable. Experiments are performed on five known datasets to test the effectiveness of the proposed approach.

A survey of the visualization results is conducted to measure their effectiveness.

Keywords: Networks, Visualization, Graph Drawing, Community Detection, Louvain, Node Size, Graph Layout, Labeling, Edge Coloring, Interactivity

ACKNOWLEDGMENTS

I would like to thank Dr. Katerina Potika for serving as the advisor for this project. Dr. Potika's continued guidance and feedback have been invaluable towards the completion of this project. I would also like to thank committee members Dr. Teng Moh and Prof. Kevin Smith for taking an interest in this project. I am very appreciative of their time and efforts. Lastly, I would like to thank my family and friends for their encouragement and support.

TABLE OF CONTENTS

CHAPTER

1 Introduction
2 Related Works
2.1 Network Tools
2.2 Network Embedding
2.3 Network Nucleus
2.4 Layouts
2.5 Edge Coloring
3 Methodology
3.1 Community Detection
3.1.1 Louvain Method for Community Detection
3.1.2 Recursive Community Detection
3.2 Node Size Scaling
3.2.1 Minimum Node Size Scaling
3.2.2 Logarithmic Node Size Scaling
3.2.3 Capped Node Size Scaling
3.3 Layouts
3.3.1 Static Layouts
3.3.2 Interactive Layouts
3.4 Labeling
3.5 Edge Coloring
3.5.1 Node Coloring
3.6 Interactivity
4 Setup of Experiments
4.1 Datasets
4.2 Technologies
4.3 System Requirements
4.4 User Survey
5 Results
5.1 Recursive Community Detection
5.2 Node Size Scaling
5.2.1 Minimum Node Size Scaling
5.2.2 Logarithmic Node Size Scaling
5.2.3 Capped Node Size Scaling
5.3 Layouts
5.4 Labeling
5.5 Edge Coloring
5.6 Interactivity
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
List of References

APPENDIX
User Survey Questions
A.1 Email Iteration 1
A.2 Email Iteration 2
A.3 DBLP Iteration 1
A.4 DBLP Iteration 2
A.5 Amazon Iteration 1
A.6 YouTube Iteration 1
A.7 YouTube Iteration 2
A.8 Wikipedia Iteration 1

CHAPTER 1

Introduction

A network is an interconnected system of people or things. Networks are ubiquitous, as everyone has their own social network of friends, colleagues, and other acquaintances. Furthermore, with the rise of online social networks, such as Facebook and Twitter, millions of people can connect from across the world. With these rapidly growing networks, visualization becomes more important than ever to comprehending complex networks. Network analysis provides valuable information about network organization. These insights can be most clearly seen by representing networks as graphs. Actors correspond to nodes, and relationships are depicted as edges. A simple example of a social network in graph format can be seen in Figure 1.

Figure 1: Social network graph [1].

A visual graph representation can reveal details about a network from just a glance. This makes understanding network connections quick and easy compared to trying to analyze raw data. However, as networks grow larger, they become increasingly difficult to visualize. With so many nodes and overlapping edges, a graph can become a giant hairball, impossible to extract valuable information from. This project aims to address this problem by combining various visualization techniques and proposing some for the first time in order to improve the readability of large graphs.

The objective of this research is to create a method to effectively visualize large networks. This is achieved by

• proposing a recursive community detection approach,

• exploring various node size scaling approaches,

• applying various layout formations,

• applying node labeling to convey information on community size,

• incorporating edge coloring for the inter-connections between communities,

• and experimenting with interactive visualization capabilities.

Community detection is necessary for identifying closely related groups within a network [2]. Node size scaling ensures nodes are sized proportionately compared to other communities based on how many nodes from the original network are contained in the community. Visualizations are drawn in different layouts [3, 4] depending on the graph characteristics. Communities are labeled with the number of nodes contained in the community. This labeling enables the viewer to quickly gather insights on community size. Edge coloring [5] is applied to edges between communities (inter-connections) to minimize confusion from edge crossings. Finally, making the graphs interactive allows for useful capabilities such as zooming, dragging, and highlighting node connections.

These topics have been individually researched, but there has yet to be a visualization method that utilizes all of these features together to generate optimal network visualizations. Combining these methods for graph visualization is the primary innovation and challenge of this project. Additionally, a user survey on different visualizations of five known datasets for community detection is used to rate the experimental results.

CHAPTER 2

Related Works

There have been various research efforts towards developing new graph analysis and visualization techniques. This project draws inspiration from some of these existing solutions to produce a comprehensive graph visualization approach.

2.1 Network Tools

Many network analysis tools have arisen in recent years with the boom of social networks. These tools provide community detection and mining capabilities to extract useful information from networks. However, with so many different tools available, it can be difficult to determine which tool is best for a job. Amin, Ahmad, and Choi [1] hoped to address this issue by comparing four major network analysis tools: Pajek, , IGraph, and NetMiner. They tested network metrics, such as degree and network diameter, as well as different community detection algorithms across the four platforms. Amin, Ahmad, and Choi also compared the execution time of features on each of these tools. This project utilizes different network tools, NetworkX and D3.js (D3), but the metrics from [1] can be applied to these platforms as well. Also, [1] serves as a good reference if additional tools are required for future work.

2.2 Network Embedding

One of the challenges in network mining is encoding graphs to be exploitable by traditional machine learning algorithms. Gutierrez-Gomez and Delvenne [6] proposed an unsupervised learning method for graph embeddings. The model would learn graph embeddings from a collection of networks and capture the overall structure of network data in order to distinguish between different types of networks. Gutierrez-Gomez and Delvenne evaluated their model in the areas of graph clustering, classification, and visualization. They compared their approach with other feature extraction and network comparison techniques such as graph distances and graph kernels and created synthetic random datasets for their experiments. Machine learning is not a focus for this project but can be useful for future work. The embedding techniques from [6] may prove helpful for layout classification in graph drawing.

2.3 Network Nucleus

Community detection is a well studied area of network analysis, but little focus has been put on finding the core structure, or nucleus, of a network. The nucleus holds parts of the network, or constituent communities, together. Dumba and Zhang [7] researched uncovering the nucleus of networks using k-shell decomposition. For datasets, they used an autonomous systems graph and nine social network graphs, including an email communication network, Facebook friendship network, and a jazz musician collaboration network. Dumba and Zhang analyzed the network core structure using core path length, core centrality, and core removal metrics. This project uses the standard Louvain method for community detection by Blondel et al. [2], but finding the nucleus of graphs could be a logical next step for experiments beyond community detection.

2.4 Layouts

Graphs can be arranged in different layouts such as force-directed, hierarchical, or circular to name a few. Viewing the same graph in different layouts can result in diverse visualizations and reveal different information. Thus, it is important to select the best layout for a task. Aesthetic metrics are commonly used to select a good layout, but it is computationally expensive to calculate multiple layouts and their corresponding aesthetics. Kwon, Crnovrsanin, and Ma [8] proposed using graph kernels to calculate the topological similarity of graphs for a machine learning approach to large graph visualization. Their approach can display graphs in different layouts and calculate their associated aesthetic metrics. They used 3,700 graphs from the University of Florida Sparse Matrix Collection and studied aesthetic metrics such as minimizing the number of edge crossings and maximizing the angle between incident edges. To evaluate layout aesthetics in this project, a user survey is conducted to rate visualizations in different layouts.

Wang et al. [9] are also interested in finding optimal graph layouts. They noticed that graph drawing functions often require users to adjust parameters and find the best graph layout for their needs through trial and error. Wang et al. decided to address this by applying deep learning techniques to graph drawing with DeepDrawing. DeepDrawing is a long short-term memory based model trained to extract layout characteristics from graphs. It can then draw graphs in a similar layout for new networks. Wang et al. synthetically generated graphs for their experiments and tested DeepDrawing on four kinds of layouts: grid, star, ForceAtlas2, and PivotMDS. This project borrows [9]'s concept of adjusting layout parameters based on graph characteristics to produce the best looking visualization.

2.5 Edge Coloring

Edge crossings in a graph can cause confusion and decrease readability. Hu, Shi, and Liu [5] proposed an edge coloring algorithm to differentiate edge crossings in a graph with contrasting colors. It uses a branch-and-bound method on color gamut space decomposition. The approach first creates a sparse dual collision graph and then colors the nodes of the dual graph to maximize color differences. Finally, the node colors from the dual graph are used to color the edges of the original graph. They tested their results on a combination of six graphs from the University of Florida Sparse Matrix Collection and additional test graphs. They also conducted a user study using the Zachary's Karate Club social graph to measure the effectiveness of their edge coloring approach. Edge coloring proves to be useful for small to medium sized graphs and is used in this project to distinguish edge crossings between communities.

CHAPTER 3

Methodology

This project aims to develop a method for visualizing large networks. This is achieved through the following steps:

1. Input the original network data.

2. Perform recursive community detection on the network.

3. Scale the node sizes proportionately.

4. Draw the graph in different layouts.

5. Label the nodes with the number of nodes from the original graph and the

number of nodes from the previous community detection iteration.

6. Add edge coloring to help distinguish overlapping edges.

7. Incorporate interactive capabilities such as zooming, panning, and highlighting

node connections.

These steps are outlined in Figure 2.

3.1 Community Detection

When a network has hundreds of nodes or more, plotting every node results in a visualization that is overcrowded and difficult to extract insights from. This project proposes a recursive community detection approach to contract the graph size until the number of communities cannot be further reduced.

Figure 2: Methodology.

3.1.1 Louvain Method for Community Detection

The community detection algorithm used in each recursive iteration is the Louvain method for community detection. The Louvain method created by Blondel et al. [2] is a well known community detection technique that aims to extract communities through modularity optimization. Modularity is the number of edges within groups minus the expected number of edges within groups of an equivalent network where the node degrees are the same but the edges are randomly placed [10]. The degree of a node is the number of edges connected to that node. A high modularity value means that there are many edge connections between nodes of the same community (intra-community connections) but few edges between nodes from different communities (inter-community connections) [11]. The range for modularity lies within [-1/2, 1] [12], and modularity between (0.3, 0.7) suggests notable community structure [13].
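For reference, the standard modularity measure that Louvain optimizes (stated here from the general literature rather than from this report) can be written in LaTeX form as

    Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)

where m is the number of edges, A_{ij} is the adjacency matrix entry for nodes i and j, k_i is the degree of node i, and \delta(c_i, c_j) equals one when nodes i and j are assigned to the same community and zero otherwise.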

The Louvain method is used over other community detection algorithms such as Girvan-Newman [14] because Louvain is better suited for large networks. Louvain runs in time O(n log n) where n is the number of nodes [15]. In contrast, Girvan-Newman has time complexity O(m^2 n) for a network with n nodes and m edges [16]. This makes the Girvan-Newman algorithm inefficient for networks with more than about one thousand nodes.
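Since Python-Louvain is one of the libraries listed in Section 4.2, a single Louvain pass can be run as in the minimal sketch below; the demo graph is a placeholder rather than one of the project's datasets.

    import networkx as nx
    import community as community_louvain  # the Python-Louvain package

    demo_graph = nx.karate_club_graph()  # placeholder graph

    # One Louvain pass: maps each node to a community id (0, 1, 2, ...).
    partition = community_louvain.best_partition(demo_graph)

    # Modularity of the resulting partition, in the range [-1/2, 1].
    q = community_louvain.modularity(partition, demo_graph)
    print(len(set(partition.values())), "communities, modularity =", round(q, 3))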

3.1.2 Recursive Community Detection

This project proposes an algorithm that recursively applies the Louvain community detection algorithm to the input network until Louvain cannot further condense the communities, in order to make the graph smaller and likely more visually pleasant. Louvain partitions the input graph nodes into communities. From these partitions, a new graph is created where each community is represented by a single node. An edge is added between community nodes A and B if there is a node from the input graph partitioned into community A that connects to a node from the input graph partitioned into community B. This new graph is then used as the input graph for the next iteration of community detection.

The recursion ends when running Louvain on the input graph results in the same number of nodes as the input. Louvain uses the modularity measure to form communities, and modularity is calculated based on the network edges. Thus, once there are no more edges in the graph, the nodes cannot be further grouped using the Louvain algorithm. There are two possible final iteration graphs that can result from the recursive community detection:

1. A graph with only one node.

2. A graph with multiple nodes which all have degree zero.

This project is interested in testing how many rounds of community detection are required before the minimum number of communities is reached. The following algorithms explain the recursive community detection method and show how it is used to create new graphs for every iteration.

Algorithm 1 is the recursive community detection function. The details of the iterative step procedures are explained in sub-function Algorithm 2.

Algorithm 1: Recursive Community Detection
Input: G: NetworkX input graph; nodeSizeList: list of node sizes for graph G; prevG: input graph from previous iteration, default = None if this is the first iteration
Output: This function does not return any values.
Result: Creates graphs for all community detection iterations of the initial input dataset.

Function recursiveCommunityDetection(G, nodeSizeList, prevG):
    if (prevG is not None) and (len(G.nodes()) == len(prevG.nodes())) then
        // base case: input graph has the same number of nodes as the input graph from the previous iteration
        return;
    else
        // iterative case
        newInput = doCommunityDetection(G, nodeSizeList);
        newG = newInput[0];
        newNodeSizeList = newInput[1];
        recursiveCommunityDetection(newG, newNodeSizeList, G);
    end

Algorithm 2 presents the iterative step for Algorithm 1. It uses Louvain community detection [2] and creates a new graph from the results. The graph creation and node size scaling methods are detailed in helper functions Algorithm 3 and Algorithm 4 respectively.

Algorithm 2: Recursive Community Detection Iterative Step
Input: G: NetworkX input graph; nodeSizeList: list of node sizes for graph G
Output: results: a list containing the new graph and node size list after community detection.
Result: Performs Louvain community detection [2] on G and saves/draws the new graph results.

Function doCommunityDetection(G, nodeSizeList):
    /* partition = dictionary where:
       left hand side (LHS) = node number in input graph,
       right hand side (RHS) = community number LHS node was assigned to (starting from 0) */
    partition = Louvain(G);
    partitionValues = [partition.get(node) for node in G.nodes()];
    communityNodes = set(partitionValues);
    if (len(communityNodes) == len(G.nodes())) then
        return [G, nodeSizeList];
    end
    newG = createGraph(G, partition, communityNodes);
    newNodeSizes = calculateNodeSize(G, partitionValues, nodeSizeList);
    newNodeSizeList = [];
    foreach nodeSizeGroup in list(newNodeSizes.values()) do
        newNodeSizeList.append(nodeSizeGroup[0]);
    end
    // write results to JSON file for interactive visualizations
    writeJson(newG, newNodeSizes);
    // draw static visualizations with NetworkX
    draw(newG, newNodeSizeList);
    return [newG, newNodeSizeList];

Algorithm 3 creates a new graph from the community detection partition results of Algorithm 2.

Algorithm 3: Create Graph
Input: G: NetworkX input graph; partition: dictionary where LHS = node number in input graph and RHS = community number LHS node was assigned to; communityNodes: set of unique community node numbers
Output: newG: new NetworkX graph created from communityNodes.
Result: Creates a new graph structure for Algorithm 2's community detection partition results.

Function createGraph(G, partition, communityNodes):
    newG = NetworkX.Graph();
    newG.add_nodes_from(communityNodes);
    communityEdges = set();
    foreach edge of G.edges() do
        community1 = partition[edge[0]];
        community2 = partition[edge[1]];
        if (community1 != community2) then
            communityEdges.add((community1, community2));
        end
    end
    newG.add_edges_from(communityEdges);
    return newG;

Algorithm 4 calculates how many nodes from the original dataset graph are contained in each of the new communities formed in Algorithm 2. It also counts the number of nodes from the previous community detection iteration graph contained in each community. The results of this algorithm are used for node size scaling and labeling when generating the visualizations.

Algorithm 4: Calculate Node Size
Input: G: NetworkX input graph; partitionValues: list of community numbers for each node in G; nodeSizeList: list of node sizes for graph G
Output: nodeSize: dictionary where LHS = community number, RHS = list with two elements: (index 0) number of nodes from the original dataset graph, (index 1) number of nodes from the previous iteration graph
Result: Calculates how many nodes from the original dataset graph and how many nodes from the previous community detection iteration graph are contained in each community from Algorithm 2's community detection partitions.

Function calculateNodeSize(G, partitionValues, nodeSizeList):
    nodeSize = {};
    for (i = 0; i < G.number_of_nodes(); i++) do
        communityNum = partitionValues[i];
        if (communityNum not in nodeSize) then
            nodeSize[communityNum] = [nodeSizeList[i], 1];
        else
            nodeSize[communityNum][0] += nodeSizeList[i];
            nodeSize[communityNum][1] += 1;
        end
    end
    return nodeSize;
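Algorithms 1 through 4 can be condensed into a short runnable sketch, shown below for orientation. The function names (contract_once, recursive_detection) are illustrative rather than taken from the project code, and the JSON writing and drawing steps are omitted.

    import networkx as nx
    import community as community_louvain  # Python-Louvain

    def contract_once(graph, node_sizes):
        # One iteration: run Louvain, then collapse each community into a single node.
        # node_sizes maps each node to how many original-dataset nodes it represents.
        partition = community_louvain.best_partition(graph)
        if len(set(partition.values())) == graph.number_of_nodes():
            return None  # Louvain could not condense the graph any further
        new_graph = nx.Graph()
        new_graph.add_nodes_from(set(partition.values()))
        new_graph.add_edges_from(
            (partition[u], partition[v])
            for u, v in graph.edges() if partition[u] != partition[v]
        )
        new_sizes = {}
        for node, comm in partition.items():
            new_sizes[comm] = new_sizes.get(comm, 0) + node_sizes[node]
        return new_graph, new_sizes

    def recursive_detection(graph):
        # Keep contracting until the number of communities stops shrinking;
        # returns every intermediate (graph, node_sizes) level for visualization.
        levels = [(graph, {n: 1 for n in graph.nodes()})]
        while True:
            result = contract_once(*levels[-1])
            if result is None:
                return levels
            levels.append(result)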

3.2 Node Size Scaling

When drawing a graph, node sizes should be representative of how large a community is compared to other communities. This can be measured by a node's raw node size, or how many nodes from the original dataset graph are contained in the community node. Since the datasets contain up to millions of nodes, the raw node sizes can be very large. If visualizations were to be drawn using the raw node sizes, a single node could take up the entire screen. Thus, node size scaling methods are required to best resize the nodes for visualization.

This project explores three different node scaling methods in search of one which would generate optimally sized nodes for graph drawing:

1. Minimum Node Size Scaling

2. Logarithmic Node Size Scaling

3. Capped Node Size Scaling

3.2.1 Minimum Node Size Scaling

Minimum node size scaling divides all raw node sizes by the minimum raw node size. This results in a list where the smallest node has a node size value of one.

However, a node with node size equal to one is too small to be easily visible in the graph drawing. Thus, after division, all node sizes are then multiplied by a constant factor of fifty so that the smallest nodes are still visible in the visualization. Algorithm 5 presents the minimum node size scaling function.

Algorithm 5: Minimum Node Size Scaling
Input: nodeSizeList: raw node size list
Output: scaledMinNodeSizeList: node size list scaled using the minimum node size scaling method
Result: Creates a scaled node size list by dividing the raw node sizes by the minimum node size and multiplying the results by a constant value.

Function minNodeSize(nodeSizeList):
    minNodeSize = min(nodeSizeList);
    minNodeSizeList = [];
    foreach nodeSize in nodeSizeList do
        minNodeSizeList.append(nodeSize / minNodeSize);
    end
    scaledMinNodeSize = [];
    foreach nodeSize in minNodeSizeList do
        scaledMinNodeSize.append(nodeSize * 50);
    end
    return scaledMinNodeSize;

3.2.2 Logarithmic Node Size Scaling

Raw node sizes can span over a large range. For instance, the same graph may contain raw node sizes in the single digits as well as raw node sizes in the thousands.

The logarithmic node size scaling approach attempts to even out the difference by taking the log of all raw node size values. Logarithmic scaling ensures that small node sizes are not drawn too small and large node sizes are not drawn too big in the visualization.

Logarithmic node size scaling has a special case when the raw node size value is one. Since log(1) = 0, the logarithmic node size value is zero when the raw node size value is one. A node size value of zero means that these nodes cannot be displayed in the graph drawing. Logarithmic node size values of zero are thus replaced with 0.1 so that all nodes can be represented in the graph.

Finally, similar to minimum node size scaling, logarithmic scaling requires that all node sizes be multiplied by a fixed number, fifty, so that the node sizes are not too small to see. Algorithm 6 shows the full details of the logarithmic node size scaling method.

Algorithm 6: Logarithmic Node Size Scaling
Input: nodeSizeList: raw node size list
Output: scaledLogNodeSizeList: node size list scaled using the logarithmic node size scaling method
Result: Creates a scaled node size list by taking the log of the raw node sizes and multiplying the results by a constant value.

Function logNodeSize(nodeSizeList):
    logNodeSizeList = NumPy.log(nodeSizeList);
    foreach index, nodeSize in enumerate(logNodeSizeList) do
        if (nodeSize == 0) then
            logNodeSizeList[index] = 0.1;
        end
    end
    scaledLogNodeSize = [];
    foreach nodeSize in logNodeSizeList do
        scaledLogNodeSize.append(nodeSize * 50);
    end
    return scaledLogNodeSize;

3.2.3 Capped Node Size Scaling

Capped node size scaling defines a constant, MAX_NODE_SIZE, as the maximum node size to be used in the visualization. All raw node sizes are scaled in accordance with MAX_NODE_SIZE. For instance, the largest raw node size is re-scaled to MAX_NODE_SIZE. The scale factor is equal to MAX_NODE_SIZE divided by the maximum raw node size. All raw node sizes are then multiplied by the scale factor to get the capped node sizes. The predefined MAX_NODE_SIZE value varies depending on the visualization library. Static visualizations created with NetworkX require a MAX_NODE_SIZE value of 1000. Interactive visualizations created by D3, on the other hand, use a MAX_NODE_SIZE value of 8000. The capped node size scaling method is summarized in Algorithm 7.

Algorithm 7: Capped Node Size Scaling
Input: nodeSizeList: raw node size list
Output: cappedNodeSizeList: node size list scaled using the capped node size scaling method
Result: Scales the raw node sizes based on a predefined constant MAX_NODE_SIZE.

Function cappedNodeSize(nodeSizeList):
    MAX_NODE_SIZE = 8000;
    CAP_SCALE_FACTOR = MAX_NODE_SIZE / max(nodeSizeList);
    cappedNodeSizeList = [];
    foreach nodeSize in nodeSizeList do
        cappedNodeSizeList.append(nodeSize * CAP_SCALE_FACTOR);
    end
    return cappedNodeSizeList;
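The three scaling rules of Algorithms 5 through 7 can also be expressed compactly with NumPy, as in the illustrative re-implementation below; the constants mirror the values described in this section.

    import numpy as np

    def min_scaled(sizes, factor=50):
        # Divide by the smallest raw size, then multiply so small nodes stay visible.
        sizes = np.asarray(sizes, dtype=float)
        return sizes / sizes.min() * factor

    def log_scaled(sizes, factor=50):
        # Take the log of the raw sizes; replace log(1) = 0 with 0.1 so every node is drawable.
        scaled = np.log(np.asarray(sizes, dtype=float))
        scaled[scaled == 0] = 0.1
        return scaled * factor

    def capped_scaled(sizes, max_node_size=8000):
        # Linear scaling so the largest node maps exactly to max_node_size.
        sizes = np.asarray(sizes, dtype=float)
        return sizes * (max_node_size / sizes.max())

    raw = [1, 4, 120, 9500]  # illustrative raw community sizes
    print(min_scaled(raw), log_scaled(raw), capped_scaled(raw), sep="\n")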

3.3 Layouts

This project explores visualizations in both static and interactive environments.

Static versus interactive visualizations use different technologies and libraries and thus have different layout modules. Static visualizations are created with NetworkX, a Python network library. NetworkX hosts a variety of different graph layouts. Four of these layouts best suited for this project are used in the visualization experiments.

Interactive visualizations use D3, a JavaScript visualization library. D3 offers a force function which can be customized with different parameters to create varying force-directed layouts. Three different force settings are used to generate interactive layouts in the experiments of this project.

All layouts, with the exception of NetworkX's Random layout, use force-directed drawing algorithms. Force-directed layouts are frequently used for network visualization since they tend to produce aesthetically pleasing results based on criteria such as even node distribution, uniform edge lengths, and symmetry [17, 18]. Force-directed algorithms usually assign forces among the graph nodes and edges. Spring-like forces based on Hooke's law attract nodes to each other while repulsive forces like those of electric particle repulsion based on Coulomb's law push nodes apart [19].

3.3.1 Static Layouts

NetworkX produces a still image of graphs and provides several different layout options [3] to determine where nodes should be drawn. Experiments were conducted on four of these layouts (a short drawing sketch follows the list):

1. Random: The Random layout plots nodes uniformly at random [20]. This

layout is used as the baseline to compare the other layouts to.

20 2. Kamada-Kawai: This layout positions nodes using the Kamada-Kawai path-

length function [21]. The Kamada-Kawai algorithm is a force-directed graph

drawing approach that aims to create an optimal layout positioning by mini-

mizing the total spring energy of the network. There is a spring force between

each pair of vertices. The spring lengths are proportional to the nodes’ graph-

theoretic distance, or the desired Euclidean distance between two nodes in a

graph drawing [22].

3. Spring: NetworkX’s Spring layout draws nodes using the Fruchterman-

Reingold force-directed algorithm [23]. The Fruchterman-Reingold algorithm

calculates attractive and repulsive forces under the principle that nodes should

be drawn towards, but not too close to, each other if they are connected by an

edge [18].

4. NetworkX: This is NetworkX’s default drawing method. It positions nodes

using the Spring layout but adds default labels to each of the nodes as well.

The nodes are labeled with an integer in [0, n - 1], where n is the number of nodes in the graph [24].
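The sketch below shows how the four static layouts above can be drawn with NetworkX and Matplotlib (the libraries listed in Section 4.2); the graph and node sizes are placeholders, and the drawing parameters used in the project may differ.

    import matplotlib.pyplot as plt
    import networkx as nx

    g = nx.karate_club_graph()                      # placeholder graph
    sizes = [50 * (d + 1) for _, d in g.degree()]   # placeholder node sizes

    positions = {
        "Random": nx.random_layout(g),
        "Kamada-Kawai": nx.kamada_kawai_layout(g),
        "Spring": nx.spring_layout(g),
    }
    fig, axes = plt.subplots(1, 4, figsize=(16, 4))
    for ax, (name, pos) in zip(axes, positions.items()):
        nx.draw(g, pos=pos, ax=ax, node_size=sizes)
        ax.set_title(name)

    # "NetworkX" layout: the default drawing (Spring positions) with integer labels.
    nx.draw_networkx(g, ax=axes[3], node_size=sizes)
    axes[3].set_title("NetworkX (default, labeled)")
    plt.show()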

3.3.2 Interactive Layouts

Interactive visualizations are created with D3 and use D3’s force layout with different parameter settings based on how many nodes are in the graph. D3’s force module simulates physical forces on particles using Verlet integration, a method for integrating Newton’s equations of motion [4, 25].

D3's force function can be customized through various parameter settings. Small versus large networks require different parameters to produce the most user friendly interactive visualizations for the given graph. For this project, a small graph is defined as a graph with fifty or fewer nodes. A large graph is defined as a graph with over fifty nodes. There are four parameter values manipulated to achieve the optimal force settings for different sized graphs:

1. Link Distance: Link distance is the target distance between linked nodes [4]. Large graphs

can benefit from longer link distances to minimize overlapping nodes.

2. Charge: Charge measures the level of attraction or repulsion between nodes.

Nodes are repelled from each other if the charge value is negative and attracted

to each other if the value is positive. It is recommended that a negative value be

used for graph layouts [4]. Since small graphs have only a few nodes, they can

use stronger repulsive forces than large graphs, which may become too spread

out by strong negative charges.

3. Gravity: Gravity acts as a virtual spring between each node and the center

of the visualization [4]. A low gravity value means that the nodes are more

dispersed and take longer to settle into position since the gravitational pull to-

wards the visualization center is weak. In contrast, a high gravity value quickly

converges towards the center due to the strong gravitational pull. The larger

the graph, the longer it takes for the force simulation to settle into position.

Thus, a higher gravity value is used for large graphs in an attempt to make the

graph converge faster.

4. Friction: Particles, or the nodes of the graph, have velocities in a force simula-

tion. Friction is the measure by which the particle velocity decays each step of

the simulation. Friction values lie in the range [0, 1], where friction equal to zero

freezes all particles in place, and friction equal to one simulates a friction-less

22 environment [4]. A lower friction value allows the force simulation to stabilize

faster, which is more desirable for large graphs which tend to stay in motion for

an extended period of time.

Three different force settings were tested for the visualization layouts in this project:

1. Default: This layout uses all of the default D3 force attribute values and serves

as a baseline to compare the other layouts against.

• Link Distance: 20
• Charge: -30
• Gravity: 0.1
• Friction: 0.9

2. Small: The Small layout is customized to create the best interactive visual-

izations for graphs with fifty or fewer nodes.

• Link Distance: 200
• Charge: -2000
• Gravity: 0.5
• Friction: 0.7

3. Large: The Large layout uses custom parameters to create interactive visual-

izations optimized for graphs with over fifty nodes.

• Link Distance: 250
• Charge: -600
• Gravity: 0.7
• Friction: 0.5

3.4 Labeling

Labeling the nodes in a network visualization is useful for quickly conveying useful information about the graph to the user. However, early experiments quickly showed that labels are not particularly effective in static visualizations. This was gathered from static visualizations created using the NetworkX layout, which auto- matically adds node number labels. When there are multiple nodes in near proximity to one another, the labels tend to overlap, making the labels difficult to read. The overlapping labels make static visualizations less aesthetically pleasing and fail to convey useful information to the user. Thus, poor label readability in static visual- izations leads to undesirable visualizations, and the following labeling procedures are only applied to interactive visualizations. The static NetworkX layout drawings with labels are kept for comparison with the identical Spring layout drawings, which do not have labels. The rest of the static visualizations do not include labels to maximize graph readability.

In interactive visualizations, this project hopes to communicate information on community size through node labeling. Nodes are labeled with a number, x, which is the number of nodes from the original graph contained in each community. If the graph is the result of more than one iteration of community detection, another number, y, is appended to the node label. y is the number of nodes from the previous iteration graph partitioned into the corresponding community node of the current visualization.

The label is displayed in the format "x, y".

For example, suppose a network has ten nodes total. The first iteration of community detection creates three community nodes, A, B, and C. Community A contains one node, community B contains two nodes, and community C contains seven nodes. Communities A, B, and C are labeled "1", "2", and "7" respectively, according to the number of nodes from the original network contained in each community.

In the second iteration of community detection, communities A and B are combined into a single node, community D. Community D's label is "3, 2" because it contains three nodes from the original graph and two nodes from the previous community detection iteration.
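A minimal sketch of how the "x, y" label format described above can be produced is given below; make_label is an illustrative helper rather than a function from the project code.

    def make_label(original_count, previous_count=None):
        # original_count: nodes from the original dataset graph in this community (x).
        # previous_count: nodes from the previous iteration graph (y), present only
        # after the first community detection iteration.
        if previous_count is None:
            return str(original_count)
        return f"{original_count}, {previous_count}"

    print(make_label(7))      # "7"    (first iteration, community C)
    print(make_label(3, 2))   # "3, 2" (second iteration, community D)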

3.5 Edge Coloring

Following edge paths in a graph is a challenging task if all edges are the same color. It is especially difficult to differentiate edges if multiple edges cross at small angles. Inspired by Hu, Shi, and Liu's [5] work on edge coloring, this project incorporates random edge coloring to differentiate edge crossings. This project randomly colors visualization edges with colors from a predefined color scale to create visual differentiation between edges. Interactive visualizations use D3's "category20" color scale, an ordinal scale with a range of twenty colors [26]. Static visualizations use

Matplotlib’s "tab20" color map, Matplotlib’s equivalent of the same twenty color scale.
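A minimal sketch of random edge coloring for a static drawing, assuming Matplotlib's tab20 color map as described above; the graph and random seed are placeholders.

    import random
    import matplotlib.pyplot as plt
    import networkx as nx

    g = nx.les_miserables_graph()  # placeholder graph
    random.seed(1)
    edge_colors = [plt.cm.tab20(random.randrange(20)) for _ in g.edges()]

    pos = nx.spring_layout(g, seed=1)
    nx.draw(g, pos=pos, node_size=60, edge_color=edge_colors)
    plt.show()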

3.5.1 Node Coloring

However, even with edge coloring, node distinctions are not always clear when nodes overlap. This is because all nodes are the same color, and NetworkX does not provide a method to draw node outlines. Thus, static visualizations are also drawn with node coloring instead of edge coloring to make node differentiation more clear.

When node distributions are not understandable from the edge coloring visualization results, the equivalent node coloring visualization can be used for clarification and comparison. Different node colors are applied according to Matplotlib's "jet" color map, a spectral color mapping based on jet simulations [27]. An example of edge versus node coloring is shown in Figure 3.

Figure 3: Static visualizations which demonstrate edge coloring (left) versus node coloring (right) on the Amazon dataset.

Node coloring is not necessary in interactive visualizations since node differentiation can be achieved with node borders in D3. Figure 4 shows how node borders help distinguish overlapping nodes in interactive visualizations. Without the issue of differentiating overlapping nodes, edge coloring provides more benefits than node coloring since it can help differentiate edge crossings. Furthermore, node coloring can give the false impression that nodes of similar colors are related, which is not the case in this project.

Figure 4: Interactive visualization with edge coloring and node borders on the Wikipedia dataset.

3.6 Interactivity

Adding interactivity to graphs gives users many helpful capabilities such as zooming, dragging, and highlighting node connections [28]. These features can significantly enhance the visualization experience. Zooming and panning allow users to explore the graph at their desired level of detail. Users can zoom into an area of interest or zoom out to view the graph as a whole. Interactive visualizations do not restrict nodes to a fixed position. Thus, if a node happens to be obstructing other nodes or edge connections that the user is interested in, the user can simply drag the node to another position. This makes dragging particularly useful for differentiating overlapping nodes. Dragging can also help single out a particular node the user is looking for.

Node highlighting can further improve node focus by highlighting the selected node and its connections. When a node and its connections are highlighted, all other nodes and edges are faded in the background to draw emphasis to the selection. This is especially beneficial for quickly identifying inter-community relationships in a graph.

Figure 5 shows an example of how a node and its connections are highlighted in the

Email dataset.

Figure 5: Node highlighting example on Email dataset.

CHAPTER 4

Setup of Experiments

4.1 Datasets

The experiments are conducted on five datasets of varying sizes from the Stanford

Network Analysis Project (SNAP) Stanford Large Network Dataset Collection [29].

These are networks with ground truth communities that can be used for comparison.

Table 1 summarizes the characteristics of each dataset.

Table 1: SNAP Networks with Ground Truth Communities [29].

Name       Nodes      Edges       Communities  Description
Email      1,005      25,571      42           E-mail network
DBLP       317,080    1,049,866   13,477       DBLP collaboration network
Amazon     334,863    925,872     75,149       Amazon product network
YouTube    1,134,890  2,987,624   8,385        YouTube online social network
Wikipedia  1,791,489  28,511,807  17,364       Wikipedia hyperlinks

The details of each dataset are as follows:

1. Email: This is a dataset of emails from a European research institute. Each

node is a person from the institute. There is an edge between institute members

if one member has sent the other at least one email [30].

2. DBLP: The DBLP computer science bibliography is home to a list of computer

science research papers and proceedings. This dataset is an author collaboration

network where the nodes are authors and two authors are connected if they have

published at least one paper together [31].

29 3. Amazon: The Amazon dataset is a network of products on Amazon which are

frequently bought with each other. Every node is a product, and there exists

an edge between two products if they are frequently co-purchased [32].

4. YouTube: While primarily known as a video sharing platform, YouTube used

to have a social network aspect as well. YouTube had a contacts feature which

allowed YouTube members to connect with one another. This dataset shows

these connections between YouTube members. Users are represented by nodes

and an edge is added between nodes if the users are connected [33]. YouTube

has since discontinued the contacts feature.

5. Wikipedia: This dataset is a directed network of Wikipedia hyperlinks. Each

node is a Wikipedia page, and there is an edge between nodes if one page links

to the other [34].

These datasets have been preprocessed by SNAP and are packaged in a text file.

Each line of the file consists of two numbers. Every number represents a unique node, and a line containing two numbers represents an edge between those two nodes.
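A minimal sketch of reading one of these edge-list files with NetworkX is shown below; the file name is illustrative.

    import networkx as nx

    # SNAP edge lists use one whitespace-separated node pair per line,
    # with header lines starting with "#".
    graph = nx.read_edgelist("com-dblp.ungraph.txt", comments="#", nodetype=int)
    print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")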

4.2 Technologies

This project is implemented with Python, HTML, CSS, and JavaScript and utilizes the following libraries:

• NetworkX: Python graph and networks library
• Python-Louvain: Python Louvain community detection library
• Matplotlib: Python plotting and visualization library
• NumPy: Python multidimensional array and math library
• D3.js: JavaScript interactive data visualization library

4.3 System Requirements

Given the large size of some datasets, a MacBook Pro with 16GB memory proved insufficient for effectively processing large amounts of data. Experiments were subsequently moved to the cloud in order to increase the system memory capacity and computing power.

Experiments were performed on a Google Cloud Platform Compute Engine virtual machine (VM) instance with machine type e2-highmem-4. The e2-highmem-4 machine type is a general purpose machine with 4 vCPUs and 32GB memory [35].

This helped improve the program execution speed and resolve out of memory errors.

However, data files over about 450MB could not be uploaded to the VM due to session timeout in the middle of upload. This prevented testing on some of the extremely large datasets available on SNAP [29].

The interactive visualizations were hosted on localhost with XAMPP 7.3.1 for

OS X. XAMPP is an open source web server solution [36].

4.4 User Survey

This project conducts a user survey to measure the effectiveness of different visualization layouts. Users are presented with two questions for every visualization:

1. How visually pleasing is the visualization?

2. How effective is the visualization in displaying inter-community relationships?

The users can answer the questions on a scale of one to three where:

• 1 = good
• 2 = average
• 3 = bad

Seven different layouts are judged in the survey:

• Static Layouts

1. Random

2. Kamada-Kawai

3. Spring

4. NetworkX

• Interactive Layouts

1. Default

2. Small

3. Large

A graph is created for every iteration of community detection for each of the five

SNAP datasets. If a resulting graph has more than three nodes, it is drawn in all seven different layouts and included in the survey. This survey presents eight different graphs drawn in seven layouts each. There are thus a total of 56 visualizations in this survey.

A web server to publicly host interactive visualizations could not be attained for this project. As a result, users were unable to view interactive visualizations in an interactive web environment as intended. Instead, users were shown screenshots of the interactive visualizations. Due to this limitation, the full advantages of interactive layouts could not be accurately captured by this survey.

The user survey was created using Qualtrics’s online survey software tool [37].

The survey results data and analysis were also gathered from Qualtrics. The survey was distributed to Dr. Katerina Potika’s San Jose State University (SJSU) students and Xinyuan Fan’s family. A total of 21 responses were collected. The full survey is available in Appendix A.

CHAPTER 5

Results

5.1 Recursive Community Detection

Recursive community detection is performed on an input graph until the minimum number of communities is reached. The goal is to condense a large network into graphs of its communities. These community graphs have fewer nodes and are thus more manageable to visualize. The maximum number of nodes in a graph drawing is configurable so the user can skip visualizing the graph if it is still too big after community detection. This project is interested in seeing how many community detection iterations are required before a network converges to the minimum number of communities for each dataset. Table 2 shows a summary of the datasets and how many community detection iterations were required for each. Table 3 goes over the detailed community detection results for each dataset.

Table 2: Number of community detection iterations performed on each dataset given the original network properties.

Dataset    Nodes      Edges       Communities  Community Detection Iterations
Email      1,005      25,571      42           2
DBLP       317,080    1,049,866   13,477       3
Amazon     334,863    925,872     75,149       3
YouTube    1,134,890  2,987,624   8,385        3
Wikipedia  1,791,489  28,511,807  17,364       3

All datasets except for the Email dataset converged to a single node in three iterations of community detection. The Email dataset converged to a graph of twenty disconnected nodes in the second community detection iteration. This can be explained by the fact that the Email dataset is significantly smaller than the other

Table 3: Detailed community detection results for all datasets.

Dataset    Ground Truth Communities  Iteration  Nodes      Edges
Email      44                        0          1,005      25,571
                                     1          28         36
                                     2          20         0
DBLP       13,477                    0          317,080    1,049,866
                                     1          203        6,067
                                     2          6          15
                                     3          1          0
Amazon     75,149                    0          334,863    925,872
                                     1          238        4,370
                                     2          3          3
                                     3          1          0
YouTube    8,385                     0          1,134,890  2,987,624
                                     1          5,739      10,496
                                     2          7          21
                                     3          1          0
Wikipedia  17,364                    0          1,134,890  2,987,624
                                     1          37         503
                                     2          2          1
                                     3          1          0

datasets. Due to the immense quantity of source material for the larger datasets,

SNAP has already preprocessed the data to exclude nodes with no connections or else the networks would become extremely large [29]. But since the Email dataset only has one thousand nodes compared to hundreds of thousands or millions of nodes of other datasets, this level of data clean up is not necessary. If the Email dataset did not include any disconnected nodes, it is reasonable to assume that the first round of community detection would create a graph of nine nodes and 36 edges, and the second round of community detection would converge to a graph with one node and zero edges.

35 The communities end up converging much faster than originally anticipated. The initial communities generated by the Louvain method for community detection are always smaller than the ground truth number of communities, sometimes significantly so. For instance, the Wikipedia dataset has 17,364 ground truth communities, but the first iteration of community detection only produces 37 communities from the

1,134,890 node input graph. In this case, 37 nodes is a fair number for visualization and produces a graph that is neither too large nor small. However, sometimes the large jumps in iteration sizes skip over the ideal graph size range for visualization. The

YouTube dataset, for instance, has 5,739 nodes in the first iteration and seven nodes in the second iteration. Graphs with over a few hundred nodes can look overwhelming in a visualization and have performance issues in an interactive environment. But graphs with less than ten nodes sometimes do not convey enough information to present a clear image of community relationships.

The fast convergence rate is likely due to the fact that the Louvain algorithm itself is already iterative in nature. The Louvain method partitions nodes into small communities and then combines those nodes into one node. The process is repeated until the modularity cannot be further optimized [38]. This could explain why the number of nodes drops so quickly between each iteration of Louvain community de- tection in this project.

Despite the large differences in graph sizes between community detection itera- tions, the recursive community detection method is very effective at shrinking net- works for visualization. It would take impractically long to draw every node and edge for large networks. Furthermore, trying to fit too much visual information into a limited visualization space conversely makes the graph drawing overcrowded and de- tracts from the visualization effectiveness. Community detection condenses the graph

size while retaining the community structure of the network. It is a powerful tool for visualizing networks too large to be graphed in their entirety.

5.2 Node Size Scaling

Experiment results showed that capped node scaling produced the best visualization results followed by logarithmic node size scaling, then minimum node size scaling. Most of the minimum node size scaling results included nodes far too large for effective visualizations. Both logarithmic and capped node size scaling produce aesthetically pleasing visualizations, but the capped node size scaling results are more proportionally accurate to how many nodes from the original network are contained in a community. A comparison of the three different node size scaling results is shown in Figure 6. Since capped node size scaling consistently produced the best results across all datasets, all of the experiment visualizations presented outside of Section

5.2 are graphed using capped node size scaling.

5.2.1 Minimum Node Size Scaling

Minimum node size scaling was the first of the three node size scaling methods formulated for this project. As an initial attempt, it failed to take into account the case where the difference between the largest and smallest raw node sizes is very large.

In such cases, dividing all raw node sizes by the minimum raw node size would not do much to control the size of the largest nodes. Thus, minimum node size scaling did not scale well for large datasets, which tend to have more variation in raw node sizes due to the sheer quantity of nodes. There are various instances, like that shown in Figure 7, where one node can fill up the entire visualization. These visualizations are practically useless, making minimum node size scaling the worst scaling method.

Figure 6: A comparison of the results of three different node size scaling methods: minimum (top), logarithmic (bottom left), and capped (bottom right). Visualizations are for the Wikipedia dataset (iteration 1).

Figure 7: YouTube dataset (iteration 1) drawn with minimum node size scaling to demonstrate a case where the node takes up the entire visualization.

5.2.2 Logarithmic Node Size Scaling

Logarithmic node size scaling is a significant improvement from minimum node size scaling. There are no cases where node sizes become so large that they obstruct

38 the overall graph structure of the visualization. However, logarithmic scaling has the problem of making all nodes appear similar in size. One node could have a raw node size one hundred times larger than that of another node. But in the visualization, the two node sizes would not appear very different due to the nature of logarithmic scales. A comparison of logarithmic versus capped node size scaling is presented in

Figure 8. It reveals how uniform the logarithmic node sizes appear compared to the capped node sizes, which are linearly scaled.

Figure 8: Logarithmic node size scaling (left) versus capped node size scaling (right) on the Amazon dataset (iteration 1).

5.2.3 Capped Node Size Scaling

The capped node size scaling method was conceived last to fix the problems revealed by the minimum and logarithmic node size scaling algorithms. Capped node size scaling sets a maximum node size so that nodes do not become too big like in minimum node size scaling. It also uses linear, rather than logarithmic, scaling so that node sizes remain proportional to the raw node sizes. By evolving from the failures of previous node scaling methods, it makes sense for capped node size scaling to produce the best visualization results. An area for improvement for capped node size scaling would be to set a minimum node size in addition to the maximum node

size for graph drawing. This is because the current implementation can result in cases where some nodes are too small to be viewed clearly. An example of this is shown in

Figure 9.

Figure 9: Email dataset (iteration 2) drawn with capped node size scaling to demon- strate a case where the node sizes are too small.
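One possible way to implement the improvement suggested above, keeping capped scaling but clamping very small nodes up to a visible floor, is sketched below; the constants are illustrative rather than values used in the project.

    import numpy as np

    def capped_scaled_with_floor(sizes, max_node_size=8000, min_node_size=50):
        # Linear capped scaling as in Algorithm 7, then raise tiny nodes to a minimum size.
        sizes = np.asarray(sizes, dtype=float)
        scaled = sizes * (max_node_size / sizes.max())
        return np.clip(scaled, min_node_size, max_node_size)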

5.3 Layouts

A user survey was conducted to measure how aesthetically pleasing graph draw- ings for the five SNAP datasets were in different layouts. The survey also asked users to rate how well each of the layouts conveyed inter-community relationships. All survey visualizations are available in Appendix A. The visualizations were rated on a scale of one to three, where "1" is good, "2" is average, and "3" is bad. The results were determined based on which layout was rated "1" the most times. If there were multiple layouts with the same number of votes for "1," then the layout with the average rating closest to "1" was chosen. Table 4 shows a summary of the survey results. Tables 5 and 6 break down the results by static versus interactive layouts and show the corresponding average ratings of the top layouts.

Table 4: Data visualization survey results summary.

Visualization          Nodes  Edges   Most Visually Appealing Layout  Layout with Best Community Structure
Email Iteration 1      27     28      Large                           Large
Email Iteration 2      20     0       NetworkX                        Large
DBLP Iteration 1       181    6,443   Large                           Spring
DBLP Iteration 2       5      10      Large                           Large
Amazon Iteration 1     244    4,392   Large                           Large
YouTube Iteration 1    5,798  10,432  Small                           Spring
YouTube Iteration 2    7      21      Large                           Small
Wikipedia Iteration 1  33     445     Kamada-Kawai                    Kamada-Kawai

Table 5: Data visualization survey results for the best layout based on visualization aesthetics split by static versus interactive layouts.

Visualization  Best Static Layout   Best Interactive Layout  Best Overall Layout
Email 1        Kamada-Kawai (1.57)  Large (1.48)             Large (1.48)
Email 2        NetworkX (1.67)      Large (1.71)             NetworkX (1.67)
DBLP 1         Kamada-Kawai (1.71)  Large (1.62)             Large (1.62)
DBLP 2         Spring (1.57)        Large (1.33)             Large (1.33)
Amazon 1       Kamada-Kawai (1.71)  Large (1.57)             Large (1.57)
YouTube 1      Spring (1.67)        Small (1.57)             Small (1.57)
YouTube 2      Spring (1.52)        Large (1.38)             Large (1.38)
Wikipedia 1    Kamada-Kawai (1.57)  Large (1.67)             Kamada-Kawai (1.57)

The survey results show that the Large interactive layout tends to produce the most aesthetically pleasing visualizations and also has the best community structure.

Unexpectedly, the Large layout, which was created for interactive visualizations of large graphs, was often rated as the best layout for both small and large graphs. The

Small layout, on the other hand, was voted the best layout for the largest graph.

Multiple of the interactive layout force parameters, such as gravity and friction, were chosen with the intention of helping the interactive layouts settle quickly within the

Table 6: Data visualization survey results for the best layout based on representation of community relationships split by static versus interactive layouts.

Visualization  Best Static Layout            Best Interactive Layout  Best Overall Layout
Email 1        Kamada-Kawai (1.57)           Large (1.52)             Large (1.52)
Email 2        Spring (1.90)                 Large (1.71)             Large (1.71)
DBLP 1         Spring (1.67)                 Large (1.71)             Spring (1.67)
DBLP 2         NetworkX (1.52)               Large (1.43)             Large (1.43)
Amazon 1       Spring (1.81)                 Large (1.71)             Large (1.71)
YouTube 1      Spring (1.57)                 Large (1.76)             Spring (1.57)
YouTube 2      Kamada-Kawai/NetworkX (1.57)  Small (1.33)             Small (1.33)
Wikipedia 1    Kamada-Kawai (1.62)           Small/Large (1.67)       Kamada-Kawai (1.62)

visualization frame. Since larger graphs are more difficult to contain in the visu- alization window due to their size, the Large layout attempts to make the graph visualization more compact. This way, the visualization can stabilize faster, and less zooming out is required to view the full visualization. This may be useful in an inter- active environment, but since the survey is limited to static screenshot representations of interactive visualizations, users would only see the final visualization pre-zoomed out to best fit the entire graph. Disregarding interactive elements such as load time and framing, the Large layout received better ratings than the Small layout for almost all graphs.

In many cases, the layout with the best looking visualization matched the layout with the best community structure for a given network. Interactive layouts generally outperformed static layouts in both aesthetics and representation of inter-community relationships. Of the static layouts, the Kamada-Kawai layout produced the most visually appealing visualizations, and the Spring layout was the best at communicating inter-community relationships. Overall, the Large layout performed the best, with consistently good ratings for both visualization aesthetics and community structure.

5.4 Labeling

Labeling in interactive visualizations successfully conveys community size information: the number of nodes from the original graph and the number of nodes from the previous iteration graph contained in each community. It is useful to see the true community size of a node beyond what can be inferred from the node size alone.

Node size is proportional to the number of nodes from the original graph contained in a community, but it cannot convey exactly how large the original dataset is. Nodes in two different dataset visualizations could be drawn with the same node size yet contain significantly different numbers of nodes from their respective dataset graphs. Community size labeling solves this issue by explicitly labeling each community node with the number of nodes it contains. This reveals information about the scale of the original network while keeping the visualization graph size small.
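As an illustration, the sketch below shows one way the community sizes behind these labels could be computed for a single community detection iteration. It is a minimal example rather than the project's implementation: it assumes the python-louvain package (imported here as community_louvain) as the Louvain implementation [2], and the function name and label format are chosen for this example only.

```python
import networkx as nx
import community as community_louvain  # python-louvain; assumed Louvain implementation [2]

def community_size_labels(G):
    """Collapse G into one node per community and label each with its size.

    Each community node carries the number of original-graph nodes it
    contains, which is the information shown in the interactive labels.
    Later iterations would also need to carry these counts forward,
    which is omitted here for brevity.
    """
    partition = community_louvain.best_partition(G)  # node -> community id
    sizes = {}                                       # community id -> node count
    for node, comm in partition.items():
        sizes[comm] = sizes.get(comm, 0) + 1

    # Build the community graph: one node per community, one edge per
    # pair of communities joined by at least one inter-community edge.
    C = nx.Graph()
    C.add_nodes_from(sizes)
    for u, v in G.edges():
        cu, cv = partition[u], partition[v]
        if cu != cv:
            C.add_edge(cu, cv)

    labels = {c: f"{n} nodes" for c, n in sizes.items()}
    return C, labels
```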

5.5 Edge Coloring

Drawing edges in different colors is an effective method for differentiating edges in a visualization. The experimental results of this project found edge coloring to be most effective on small graphs, which is in line with the findings of Hu, Shi, and Liu’s edge coloring work [5]. When a graph contains thousands of edges, edge coloring alone is insufficient for tracing node connections. Highlighting node connections in interactive visualizations is a powerful capability that allows the user to single out a node and its connections from the rest of the graph, and it is especially useful in medium to large graphs where edge coloring alone cannot identify node connections.
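A minimal NetworkX/matplotlib sketch of the edge coloring idea is shown below. It is not the project's drawing code: matplotlib colormaps [27] stand in for the D3 ordinal scales used in the interactive visualizations [26], and the tab20 colormap and helper name are illustrative choices only.

```python
import matplotlib.pyplot as plt
import networkx as nx

def draw_with_colored_edges(G, pos, cmap_name="tab20"):
    """Draw G with edges cycling through a qualitative colormap.

    With more edges than colors the palette repeats, which is one
    reason edge coloring loses effectiveness on graphs with
    thousands of edges.
    """
    cmap = plt.get_cmap(cmap_name)
    edge_colors = [cmap(i % cmap.N) for i in range(G.number_of_edges())]
    nx.draw_networkx_nodes(G, pos, node_color="lightgray")
    nx.draw_networkx_edges(G, pos, edge_color=edge_colors)
    plt.axis("off")
    plt.show()

# Example usage: pos could come from any layout compared in this project,
# e.g. nx.spring_layout(G) or nx.kamada_kawai_layout(G).
```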

5.6 Interactivity

Experiments showed that interactivity is most effective on smaller graphs. As graph size increases, visualizations take longer to stabilize and become less responsive. Stabilization time in interactive visualizations refers to how long it takes for the D3 force simulation to stop; the visualization is settled when all nodes and edges in the graph have stopped moving. For small graphs with fewer than fifty nodes, the force layout settles within a few seconds, and the graph fits neatly on screen. Performance is good, and there is no lag when highlighting node connections. Interactive visualizations of medium-sized graphs, those with a few hundred nodes, take longer to settle into position and show mild performance issues. Medium-sized graphs take about 30 seconds for the force simulation to stop, and they have a few seconds of lag when highlighting node connections. While less effective than on small graphs, interactive visualizations are still useful on medium-sized networks, and the performance impacts are relatively minor.

Large graphs with over a thousand nodes, on the other hand, do not perform very well in interactive environments. Visualizations of large graphs can take over two minutes to stabilize, and the full visualizations do not fit in the window. Figure 10 shows the initial view of the large YouTube Iteration 1 graph after it has stabilized. Only a portion of the visualization fits on screen, so the user must manually zoom out to view the full visualization. There are also about four seconds of lag when trying to highlight node connections in large graphs. But even with the performance drawbacks of interactivity on large graphs, interactive visualizations are still more useful than their static counterparts. Zooming and panning are especially useful for large graphs to see details which may otherwise be lost in static visualizations. The slow layout stabilization time is inconvenient but not detrimental to the user experience, as the user can manually stop the force simulation by pressing the space bar.

For node highlighting, the majority of the lag time comes from fading the unconnected nodes and edges into the background. As a workaround, the lag can be greatly reduced by highlighting node connections without fading out the rest of the graph, which is achieved by hovering over a node instead of clicking on it. Figure 11 shows node highlighting with and without fading out unrelated nodes and edges on the YouTube Iteration 1 visualization. The example makes clear that, for large communities, highlighting without fading the background can be just as effective as highlighting with fading. This alternative method does not work as well for small communities, since the connecting edges may be hidden by other unrelated edges; in that case, the unrelated nodes and edges must be greyed out to reveal the highlighted connections. However, as a means to quickly get a big-picture view of large community connections, highlighting without fading out the rest of the graph is a good solution.

Table 7 lists the layout stabilization and node highlighting times for all visualizations.

Figure 10: The initial view of the YouTube Iteration 1 interactive graph drawing does not fit the entire visualization on screen.

Figure 11: Example of node highlighting with (left) and without (right) fading out unconnected nodes and edges on the YouTube dataset (Iteration 1).

Table 7: Force layout stabilization times and node highlighting lag times for interactive visualizations.

Visualization   Graph Size   Force Layout             Node Highlighting
                             Stabilization Time (s)   Lag Time (s)
Email 1         Small          5                      0
Email 2         Small          5                      0
DBLP 1          Medium        30                      2
DBLP 2          Small          3                      0
DBLP 3          Small          1                      0
Amazon 1        Medium        23                      1.5
Amazon 2        Small          1                      0
Amazon 3        Small          1                      0
YouTube 1       Large        127                      4
YouTube 2       Small          3                      0
YouTube 3       Small          1                      0
Wikipedia 1     Small          6                      0
Wikipedia 2     Small          1                      0
Wikipedia 3     Small          1                      0

CHAPTER 6

Conclusion and Future Work

6.1 Conclusion

Data visualization is a powerful tool for quickly and effectively communicating large amounts of information. Network visualization in particular is useful for conveying information about relationships between network actors. Networks are represented with graphs, but creating visually appealing graph drawings can be difficult, especially when networks are large. This project aims to optimize visualization aesthetics and effectively convey inter-community relationships in a network. Graph drawings are created for five known datasets using a combination of techniques including recursive community detection, node size scaling, layout formation, labeling, edge coloring, and interactivity.

By applying recursive community detection, large networks with up to millions of nodes can be succinctly expressed as smaller graphs better suited for visualization. Graphs usually converge to the minimum number of communities within three community detection iterations. This is an effect of the Louvain community detection algorithm, which experiments showed sharply reduces the number of nodes at each iteration; these steep drops in graph size allowed the recursion to finish within three iterations even for the largest datasets.
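The recursion can be summarized with the sketch below. It is a simplified illustration, not the project's exact implementation: it assumes the python-louvain package (imported as community_louvain) provides the Louvain method [2], and it omits the bookkeeping that carries original-node counts and edge weights between iterations.

```python
import networkx as nx
import community as community_louvain  # python-louvain; assumed Louvain implementation [2]

def recursive_communities(G, max_iters=10):
    """Repeatedly collapse G into its community graph until it stops shrinking.

    Returns the list of graphs produced at each iteration; for the
    datasets in this project the loop typically ends after about
    three iterations.
    """
    graphs = [G]
    current = G
    for _ in range(max_iters):
        partition = community_louvain.best_partition(current)
        n_communities = len(set(partition.values()))
        if n_communities >= current.number_of_nodes():
            break  # no further reduction possible
        # Collapse each community into a single node.
        collapsed = nx.Graph()
        collapsed.add_nodes_from(set(partition.values()))
        for u, v in current.edges():
            cu, cv = partition[u], partition[v]
            if cu != cv:
                collapsed.add_edge(cu, cv)
        graphs.append(collapsed)
        current = collapsed
    return graphs
```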

Capped node size scaling proved to be the most effective node size scaling method, as it helped create aesthetically pleasing visualizations while keeping the node sizes proportionally accurate to the number of nodes from the original graph contained in each community node. Based on a survey of visualizations in seven different layouts, the Large interactive layout produced the most visually appealing visualizations and was the most effective in showing the networks’ community structure. Labeling in interactive visualizations also helped convey key community size details by showing the number of nodes from the original graph and the number of nodes from the previous community detection iteration graph contained in each community node.

Edge coloring helps differentiate edge crossings but loses effectiveness as graph size increases. Interactive visualizations introduce node highlighting, which can be used to highlight node connections when edge coloring is insufficient for distinguishing edges. Interactivity also adds zooming, panning, and dragging capabilities. With these additional features, interactive visualizations are significantly more powerful than static graph drawings. But interactivity is best suited for smaller graphs since interactive visualizations become less responsive as graph size grows.

6.2 Future Work

This project can be further extended by graphing the nodes within each community. For instance, a user may be interested in a particular community and want to examine the relationships within that community in more detail. In an interactive environment, it would be useful if the user could click on a community node and be brought to a new visualization of the nodes within the community. However, a single community could contain hundreds of thousands of nodes, which would be difficult to visualize all at once. Recursive community detection can be used in this case to create a smaller graph from the community network. This way, a large network can be broken down into multiple sub-graphs of more manageable size where any specific community can still be viewed in more detail.
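A sketch of this drill-down step is given below. It assumes the original graph and its node-to-community partition are still available, and that python-louvain provides the Louvain implementation; the function name is illustrative only.

```python
import community as community_louvain  # python-louvain; assumed Louvain implementation [2]

def drill_down(G, partition, community_id):
    """Return the induced subgraph of one community, ready for its own
    round of recursive community detection and visualization.

    partition maps each node of G to its community id, as produced by
    community_louvain.best_partition(G).
    """
    members = [n for n, c in partition.items() if c == community_id]
    sub = G.subgraph(members).copy()
    # The subgraph can itself be large, so it would be collapsed the
    # same way as the full graph before being drawn.
    sub_partition = community_louvain.best_partition(sub)
    return sub, sub_partition
```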

Community detection algorithms other than Louvain can also be tested to see how many communities are produced in each iteration of recursive community detection. The goal would be to find a community detection algorithm that does not shrink the graph too significantly between iterations. Another area for improvement would be to add a minimum node size constant in addition to the existing maximum node size constant for capped node size scaling. This would create a linear mapping from the raw node sizes to the range between the constant minimum and maximum visualization node sizes.
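The proposed extension could look roughly like the sketch below. The minimum and maximum pixel sizes are placeholder values rather than constants used in the project, which currently caps only the maximum size.

```python
def scaled_node_sizes(raw_sizes, min_px=5, max_px=50):
    """Linearly map raw community sizes to the range [min_px, max_px].

    raw_sizes holds the number of original-graph nodes in each community;
    min_px and max_px are illustrative drawing-size constants.
    """
    lo, hi = min(raw_sizes), max(raw_sizes)
    if hi == lo:  # all communities the same size: no range to stretch over
        return [max_px] * len(raw_sizes)
    span = max_px - min_px
    return [min_px + span * (s - lo) / (hi - lo) for s in raw_sizes]
```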

Finally, it would be useful to label community nodes based on their contents instead of their community sizes. For instance, the Amazon products dataset could have labels such as "Kitchen Appliances" or "Electronics." Content based labels are useful for conveying what each community actually represents. However, finding datasets with such metadata available can be challenging. The datasets used in this project come in a format where each node is represented by a number. But information about what each number represents is not given. This makes it impossible to label nodes according to their contents for this project. If appropriate datasets with content metadata can be found, content based labels would be very helpful for communicating what each community represents in a graph visualization.

LIST OF REFERENCES

[1] F. Amin, A. Ahmad, and G. S. Choi, “Community detection and mining using complex networks tools in social internet of things,” in TENCON 2018-2018 IEEE Region 10 Conference, pp. 2086–2091, IEEE, 2018.

[2] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, p. P10008, Oct. 2008.

[3] NetworkX, “Drawing - NetworkX 2.5 documentation.” https://networkx.org/documentation/stable/reference/drawing.html, August 2020. Accessed 2020-12-05.

[4] D3, “Force layout - D3 wiki.” https://d3-wiki.readthedocs.io/zh_CN/master/Force-Layout/. Accessed 2020-11-29.

[5] Y. Hu, L. Shi, and Q. Liu, “A coloring algorithm for disambiguating graph and map drawings,” IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 2, pp. 1321–1335, 2018.

[6] L. Gutiérrez-Gómez and J.-C. Delvenne, “Unsupervised network embedding for graph visualization, clustering and classification,” arXiv preprint arXiv:1903.05980, 2019.

[7] B. Dumba and Z.-L. Zhang, “Uncovering the nucleus of social networks,” in Proceedings of the 10th ACM Conference on Web Science, pp. 37–46, ACM, 2018.

[8] O.-H. Kwon, T. Crnovrsanin, and K.-L. Ma, “What would a graph look like in this layout? A machine learning approach to large graph visualization,” IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 1, pp. 478–488, 2017.

[9] Y. Wang, Z. Jin, Q. Wang, W. Cui, T. Ma, and H. Qu, “Deepdrawing: A deep learning approach to graph drawing,” IEEE Transactions on Visualization and Computer Graphics, 2019.

[10] M. E. J. Newman, “From the Cover: Modularity and community structure in networks,” Proceedings of the National Academy of Sciences, vol. 103, pp. 8577–8582, June 2006.

[11] Wikipedia, “Modularity (networks).” https://en.wikipedia.org/wiki/Modularity_(networks), October 2020. Accessed 2020-11-26.

[12] U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, pp. 172–188, 2008.

[13] K. Potika, “Strength of weak ties in networks.” San Jose State University CS286 Lecture, 2020.

[14] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821–7826, 2002.

[15] A. Lancichinetti and S. Fortunato, “Community detection algorithms: A comparative analysis,” Phys. Rev. E, vol. 80, p. 056117, Nov. 2009.

[16] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Phys. Rev. E, vol. 69, p. 066133, June 2004.

[17] R. Tamassia, ed., Handbook of Graph Drawing and Visualization. CRC Press, June 2013.

[18] T. M. J. Fruchterman and E. M. Reingold, “Graph drawing by force-directed placement,” Softw. Pract. Exper., vol. 21, pp. 1129–1164, Nov. 1991.

[19] Wikipedia, “Force-directed graph drawing.” https://en.wikipedia.org/wiki/Force-directed_graph_drawing, April 2020. Accessed 2020-11-28.

[20] NetworkX, “networkx.drawing.layout.random_layout.” https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.random_layout.html, August 2020. Accessed 2020-11-28.

[21] NetworkX, “networkx.drawing.layout.kamada_kawai_layout.” https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.kamada_kawai_layout.html, August 2020. Accessed 2020-11-28.

[22] T. Kamada and S. Kawai, “An algorithm for drawing general undirected graphs,” Information Processing Letters, vol. 31, no. 1, pp. 7–15, 1989.

[23] NetworkX, “networkx.drawing.layout.spring_layout.” https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.spring_layout.html, August 2020. Accessed 2020-11-28.

[24] NetworkX, “networkx.drawing.nx_pylab.draw_networkx.” https://networkx.org/documentation/stable/reference/generated/networkx.drawing.nx_pylab.draw_networkx.html, August 2020. Accessed 2020-11-28.

[25] L. Verlet, “Computer “experiments” on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules,” Phys. Rev., vol. 159, pp. 98–103, July 1967.

[26] M. Bostock, “Ordinal scales.” https://github.com/d3/d3-3.x-api-reference/blob/master/Ordinal-Scales.md. Accessed 2020-11-29.

[27] Kite, “colormaps.” https://www.kite.com/python/docs/matplotlib.pyplot.colormaps. Accessed 2020-11-29.

[28] Eyaler, “Force-Directed Graph with Drag/Zoom/Pan/Center/Resize/Labels/Shapes/Filter/Highlight.” https://gist.github.com/eyaler/10586116. Accessed 2020-12-07.

[29] J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large network dataset collection.” http://snap.stanford.edu/data, June 2014. Accessed 2020-11-17.

[30] J. Leskovec and A. Krevl, “SNAP: Network datasets: email-eu-core network.” https://snap.stanford.edu/data/email-Eu-core.html, June 2014. Accessed 2020-11-27.

[31] J. Leskovec and A. Krevl, “SNAP: Network datasets: DBLP collaboration network.” https://snap.stanford.edu/data/com-DBLP.html, June 2014. Accessed 2020-11-27.

[32] J. Leskovec and A. Krevl, “SNAP: Network datasets: Amazon co-purchasing network.” https://snap.stanford.edu/data/com-Amazon.html, June 2014. Accessed 2020-11-27.

[33] J. Leskovec and A. Krevl, “SNAP: Network datasets: Youtube social network.” https://snap.stanford.edu/data/com-Youtube.html, June 2014. Accessed 2020-11-27.

[34] J. Leskovec and A. Krevl, “SNAP: Network datasets: wiki-topcats network.” https://snap.stanford.edu/data/wiki-topcats.html, June 2014. Accessed 2020-11-27.

[35] Google Cloud, “Machine types | Compute Engine Documentation | Google Cloud.” https://cloud.google.com/compute/docs/machine-types. Accessed 2020-11-17.

[36] Apache Friends, “XAMPP Installers and Downloads for Apache Friends.” https://www.apachefriends.org/index.html. Accessed 2020-11-26.

[37] Qualtrics, “Qualtrics XM // The Leading Experience Management Software.” https://www.qualtrics.com/. Accessed 2020-11-30.

[38] V. A. Traag, L. Waltman, and N. J. van Eck, “From Louvain to Leiden: guaranteeing well-connected communities,” CoRR, vol. abs/1810.08473, 2018.

APPENDIX

Data Visualization User Survey Questions

Please answer the following two questions for every visualization.

1. How visually pleasing is the visualization?

2. How effective is the visualization in displaying inter-community relationships?

Please rate the visualizations on a scale of one to three where:

• 1 = good

• 2 = average

• 3 = bad

A.1 Email Iteration 1

(a) Random (b) Kamada-Kawai (c) Spring

(d) NetworkX (e) Default (f) Small

(g) Large

Figure A.12: Email Iteration 1 drawn in seven layouts.

A.2 Email Iteration 2

(a) Random (b) Kamada-Kawai (c) Spring

(d) NetworkX (e) Default (f) Small

(g) Large

Figure A.13: Email Iteration 2 drawn in seven layouts.

A.3 DBLP Iteration 1

(a) Random (b) Kamada-Kawai (c) Spring

(d) NetworkX (e) Default (f) Small

(g) Large

Figure A.14: DBLP Iteration 1 drawn in seven layouts.

A.4 DBLP Iteration 2

(a) Random (b) Kamada-Kawai (c) Spring

(d) NetworkX (e) Default (f) Small

(g) Large

Figure A.15: DBLP Iteration 2 drawn in seven layouts.

A.5 Amazon Iteration 1

(a) Random (b) Kamada-Kawai (c) Spring

(d) NetworkX (e) Default (f) Small

(g) Large

Figure A.16: Amazon Iteration 1 drawn in seven layouts.

A.6 YouTube Iteration 1

(a) Random (b) Kamada-Kawai (c) Spring

(d) NetworkX (e) Default (f) Small

(g) Large

Figure A.17: YouTube Iteration 1 drawn in seven layouts.

A.7 YouTube Iteration 2

(a) Random (b) Kamada-Kawai (c) Spring

(d) NetworkX (e) Default (f) Small

(g) Large

Figure A.18: YouTube Iteration 2 drawn in seven layouts.

A.8 Wikipedia Iteration 1

(a) Random (b) Kamada-Kawai (c) Spring

(d) NetworkX (e) Default (f) Small

(g) Large

Figure A.19: Wikipedia Iteration 1 drawn in seven layouts.
