A Comparative Study of Large-Scale Network Data Visualization Tools

Total Page:16

File Type:pdf, Size:1020Kb

A Comparative Study of Large-Scale Network Data Visualization Tools University of New Orleans ScholarWorks@UNO Senior Honors Theses Undergraduate Showcase 5-2018 A Comparative Study Of Large-Scale Network Data Visualization Tools Prakash Joshi University of New Orleans Follow this and additional works at: https://scholarworks.uno.edu/honors_theses Part of the Computer Sciences Commons Recommended Citation Joshi, Prakash, "A Comparative Study Of Large-Scale Network Data Visualization Tools" (2018). Senior Honors Theses. 108. https://scholarworks.uno.edu/honors_theses/108 This Honors Thesis-Unrestricted is protected by copyright and/or related rights. It has been brought to you by ScholarWorks@UNO with permission from the rights-holder(s). You are free to use this Honors Thesis-Unrestricted in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/or on the work itself. This Honors Thesis-Unrestricted has been accepted for inclusion in Senior Honors Theses by an authorized administrator of ScholarWorks@UNO. For more information, please contact [email protected]. A Comparative Study Of Large-Scale Network Data Visualization Tools An Honors Thesis Submitted to The Department of Computer Science Of the University of New Orleans In Partial Fulfillment Of the Requirements For the Degree of Bachelor of Science With University High Honors and Honors in Computer Science By Prakash Joshi May 2018 Acknowledgments To my advisor and mentor Dr. Shaikh M. Arifuzzaman: I owe you this thesis. You have been a great teacher and guide throughout the writing process. I highly appreciate the feedbacks you gave me whenever I needed them. To my advisor and second reader Dr. Christopher Summa, thank you for taking time from your busy schedule and being my second reader. I appreciate your patience and friendly attitude, a lot. To Dr. Eishita Farjana, thank you for being the third reader and giving valuable suggestions on styling, editing and helping me shape the overall layout of the project. To Erin Sutherland, from the honors office department, thank you for the support whenever I was stressed about meeting the deadlines. To my friends Haydar Mahdi and Hiram Molina, thank you for being my resident grammarians. And finally – I dedicate this thesis to my support system – my parents. ii Table of Contents Acknowledgments ................................................................................................................... ii LIST OF TABLES ........................................................................................................................ v LIST OF FIGURES ..................................................................................................................... vi Abstract ................................................................................................................................ vii Introduction ........................................................................................................................... 1 Importance of Visualization .................................................................................................... 3 Network Data and Terminologies ........................................................................................... 5 Network (Graph) Terminologies ....................................................................................................... 5 Node Degree ........................................................................................................................................ 5 Size ....................................................................................................................................................... 5 Weight ................................................................................................................................................. 5 Network Features ................................................................................................................................ 6 Node Features ..................................................................................................................................... 7 Edge Features ...................................................................................................................................... 8 Data Visualization Tools ......................................................................................................... 9 Gephi ............................................................................................................................................. 10 NodeXL .......................................................................................................................................... 11 iii Pajek .............................................................................................................................................. 12 Basis of Comparison: ............................................................................................................. 14 Visualization .......................................................................................................................... 15 Graph Analysis Features ................................................................................................................. 30 Compatibility ................................................................................................................................. 39 Based on Operating Systems ............................................................................................................. 39 Data Types Supported ....................................................................................................................... 40 Scalability ....................................................................................................................................... 42 Gephi: ............................................................................................................................................ 42 NodeXL .......................................................................................................................................... 42 Pajek .............................................................................................................................................. 42 Conclusion (End Report) ........................................................................................................ 43 [References] .......................................................................................................................... 44 iv LIST OF TABLES Table 1: Dataset 1 (Les Misérables) ............................................................................................. 16 Table 2: Dataset 2 (People who tweeted “mardi gras”) ................................................................ 22 Table 3: Dataset 3 Yeast PPI interaction ...................................................................................... 30 Table 4: Yeast Graph features calculated- by Gephi .................................................................... 31 Table 5: Twitter data graph feature calculated by NodeXl ........................................................... 35 Table 6: Compatibility based on Operating System ..................................................................... 39 Table 7: Input Data types supported by Tools .............................................................................. 40 Table 8: Output Data types supported by Tools ........................................................................... 41 v LIST OF FIGURES Figure 1: German School Boys’ Friendship Network .................................................................... 4 Figure 2: Gephi in action .............................................................................................................. 11 Figure 3: NodeXL in action .......................................................................................................... 12 Figure 4: Initial Pajek Welcome Window .................................................................................... 13 Figure 5: Pajek in Action .............................................................................................................. 13 Figure 6: Raw Layout Les Misérables .......................................................................................... 17 Figure 7: Fruchterman Reingold Les Misérables .......................................................................... 19 Figure 8: Les Misérables Groups and Clusters ............................................................................. 20 Figure 9: NodeXL Fruchterman Reingold .................................................................................... 23 Figure 10 NodeXL Clusters and Groups ...................................................................................... 24 Figure 11: NodeXL groups in a box ............................................................................................. 25 Figure 12: Fruchterman Reingold Pajek ....................................................................................... 27 Figure 13: Clusters and groups in Pajek ....................................................................................... 29 Figure 14: Betweenness Centrality calculated by Pajek ..............................................................
Recommended publications
  • Networkx: Network Analysis with Python
    NetworkX: Network Analysis with Python Salvatore Scellato Full tutorial presented at the XXX SunBelt Conference “NetworkX introduction: Hacking social networks using the Python programming language” by Aric Hagberg & Drew Conway Outline 1. Introduction to NetworkX 2. Getting started with Python and NetworkX 3. Basic network analysis 4. Writing your own code 5. You are ready for your project! 1. Introduction to NetworkX. Introduction to NetworkX - network analysis Vast amounts of network data are being generated and collected • Sociology: web pages, mobile phones, social networks • Technology: Internet routers, vehicular flows, power grids How can we analyze this networks? Introduction to NetworkX - Python awesomeness Introduction to NetworkX “Python package for the creation, manipulation and study of the structure, dynamics and functions of complex networks.” • Data structures for representing many types of networks, or graphs • Nodes can be any (hashable) Python object, edges can contain arbitrary data • Flexibility ideal for representing networks found in many different fields • Easy to install on multiple platforms • Online up-to-date documentation • First public release in April 2005 Introduction to NetworkX - design requirements • Tool to study the structure and dynamics of social, biological, and infrastructure networks • Ease-of-use and rapid development in a collaborative, multidisciplinary environment • Easy to learn, easy to teach • Open-source tool base that can easily grow in a multidisciplinary environment with non-expert users
    [Show full text]
  • Networks in Nature: Dynamics, Evolution, and Modularity
    Networks in Nature: Dynamics, Evolution, and Modularity Sumeet Agarwal Merton College University of Oxford A thesis submitted for the degree of Doctor of Philosophy Hilary 2012 2 To my entire social network; for no man is an island Acknowledgements Primary thanks go to my supervisors, Nick Jones, Charlotte Deane, and Mason Porter, whose ideas and guidance have of course played a major role in shaping this thesis. I would also like to acknowledge the very useful sug- gestions of my examiners, Mark Fricker and Jukka-Pekka Onnela, which have helped improve this work. I am very grateful to all the members of the three Oxford groups I have had the fortune to be associated with: Sys- tems and Signals, Protein Informatics, and the Systems Biology Doctoral Training Centre. Their companionship has served greatly to educate and motivate me during the course of my time in Oxford. In particular, Anna Lewis and Ben Fulcher, both working on closely related D.Phil. projects, have been invaluable throughout, and have assisted and inspired my work in many different ways. Gabriel Villar and Samuel Johnson have been col- laborators and co-authors who have helped me to develop some of the ideas and methods used here. There are several other people who have gener- ously provided data, code, or information that has been directly useful for my work: Waqar Ali, Binh-Minh Bui-Xuan, Pao-Yang Chen, Dan Fenn, Katherine Huang, Patrick Kemmeren, Max Little, Aur´elienMazurie, Aziz Mithani, Peter Mucha, George Nicholson, Eli Owens, Stephen Reid, Nico- las Simonis, Dave Smith, Ian Taylor, Amanda Traud, and Jeffrey Wrana.
    [Show full text]
  • Modularity in Static and Dynamic Networks
    Modularity in static and dynamic networks Sarah F. Muldoon University at Buffalo, SUNY OHBM – Brain Graphs Workshop June 25, 2017 Outline 1. Introduction: What is modularity? 2. Determining community structure (static networks) 3. Comparing community structure 4. Multilayer networks: Constructing multitask and temporal multilayer dynamic networks 5. Dynamic community structure 6. Useful references OHBM 2017 – Brain Graphs Workshop – Sarah F. Muldoon Introduction: What is Modularity? OHBM 2017 – Brain Graphs Workshop – Sarah F. Muldoon What is Modularity? Modularity (Community Structure) • A module (community) is a subset of vertices in a graph that have more connections to each other than to the rest of the network • Example social networks: groups of friends Modularity in the brain: • Structural networks: communities are groups of brain areas that are more highly connected to each other than the rest of the brain • Functional networks: communities are groups of brain areas with synchronous activity that is not synchronous with other brain activity OHBM 2017 – Brain Graphs Workshop – Sarah F. Muldoon Findings: The Brain is Modular Structural networks: cortical thickness correlations sensorimotor/spatial strategic/executive mnemonic/emotion olfactocentric auditory/language visual processing Chen et al. (2008) Cereb Cortex OHBM 2017 – Brain Graphs Workshop – Sarah F. Muldoon Findings: The Brain is Modular • Functional networks: resting state fMRI He et al. (2009) PLOS One OHBM 2017 – Brain Graphs Workshop – Sarah F. Muldoon Findings: The
    [Show full text]
  • Introduction to Network Science & Visualisation
    IFC – Bank Indonesia International Workshop and Seminar on “Big Data for Central Bank Policies / Building Pathways for Policy Making with Big Data” Bali, Indonesia, 23-26 July 2018 Introduction to network science & visualisation1 Kimmo Soramäki, Financial Network Analytics 1 This presentation was prepared for the meeting. The views expressed are those of the author and do not necessarily reflect the views of the BIS, the IFC or the central banks and other institutions represented at the meeting. FNA FNA Introduction to Network Science & Visualization I Dr. Kimmo Soramäki Founder & CEO, FNA www.fna.fi Agenda Network Science ● Introduction ● Key concepts Exposure Networks ● OTC Derivatives ● CCP Interconnectedness Correlation Networks ● Housing Bubble and Crisis ● US Presidential Election Network Science and Graphs Analytics Is already powering the best known AI applications Knowledge Social Product Economic Knowledge Payment Graph Graph Graph Graph Graph Graph Network Science and Graphs Analytics “Goldman Sachs takes a DIY approach to graph analytics” For enhanced compliance and fraud detection (www.TechTarget.com, Mar 2015). “PayPal relies on graph techniques to perform sophisticated fraud detection” Saving them more than $700 million and enabling them to perform predictive fraud analysis, according to the IDC (www.globalbankingandfinance.com, Jan 2016) "Network diagnostics .. may displace atomised metrics such as VaR” Regulators are increasing using network science for financial stability analysis. (Andy Haldane, Bank of England Executive
    [Show full text]
  • Evolving Networks and Social Network Analysis Methods And
    DOI: 10.5772/intechopen.79041 ProvisionalChapter chapter 7 Evolving Networks andand SocialSocial NetworkNetwork AnalysisAnalysis Methods and Techniques Mário Cordeiro, Rui P. Sarmento,Sarmento, PavelPavel BrazdilBrazdil andand João Gama Additional information isis available atat thethe endend ofof thethe chapterchapter http://dx.doi.org/10.5772/intechopen.79041 Abstract Evolving networks by definition are networks that change as a function of time. They are a natural extension of network science since almost all real-world networks evolve over time, either by adding or by removing nodes or links over time: elementary actor-level network measures like network centrality change as a function of time, popularity and influence of individuals grow or fade depending on processes, and events occur in net- works during time intervals. Other problems such as network-level statistics computation, link prediction, community detection, and visualization gain additional research impor- tance when applied to dynamic online social networks (OSNs). Due to their temporal dimension, rapid growth of users, velocity of changes in networks, and amount of data that these OSNs generate, effective and efficient methods and techniques for small static networks are now required to scale and deal with the temporal dimension in case of streaming settings. This chapter reviews the state of the art in selected aspects of evolving social networks presenting open research challenges related to OSNs. The challenges suggest that significant further research is required in evolving social networks, i.e., existent methods, techniques, and algorithms must be rethought and designed toward incremental and dynamic versions that allow the efficient analysis of evolving networks. Keywords: evolving networks, social network analysis 1.
    [Show full text]
  • Network Science, Homophily and Who Reviews Who in the Linux Kernel? Working Paper[+] › Open-Access at ECIS2020-Arxiv.Pdf
    Network Science, Homophily and Who Reviews Who in the Linux Kernel? Working paper[P] Open-access at http://users.abo.fi/jteixeir/pub/linuxsna/ ECIS2020-arxiv.pdf José Apolinário Teixeira Åbo Akademi University Finland # jteixeira@abo. Ville Leppänen University of Turku Finland Sami Hyrynsalmi arXiv:2106.09329v1 [cs.SE] 17 Jun 2021 LUT University Finland [P]As presented at 2020 European Conference on Information Systems (ECIS 2020), held Online, June 15-17, 2020. The ocial conference proceedings are available at the AIS eLibrary (https://aisel.aisnet.org/ecis2020_rp/). Page ii of 24 ? Copyright notice ? The copyright is held by the authors. The same article is available at the AIS Electronic Library (AISeL) with permission from the authors (see https://aisel.aisnet.org/ ecis2020_rp/). The Association for Information Systems (AIS) can publish and repro- duce this article as part of the Proceedings of the European Conference on Information Systems (ECIS 2020). ? Archiving information ? The article was self-archived by the rst author at its own personal website http:// users.abo.fi/jteixeir/pub/linuxsna/ECIS2020-arxiv.pdf dur- ing June 2021 after the work was presented at the 28th European Conference on Informa- tion Systems (ECIS 2020). Page iii of 24 ð Funding and Acknowledgements ð The rst author’s eorts were partially nanced by Liikesivistysrahasto - the Finnish Foundation for Economic Education, the Academy of Finland via the DiWIL project (see http://abo./diwil) project. A research companion website at http://users.abo./jteixeir/ECIS2020cw supports the paper with additional methodological details, additional data visualizations (plots, tables, and networks), as well as high-resolution versions of the gures embedded in the paper.
    [Show full text]
  • Gephi Tools for Network Analysis and Visualization
    Frontiers of Network Science Fall 2018 Class 8: Introduction to Gephi Tools for network analysis and visualization Boleslaw Szymanski CLASS PLAN Main Topics • Overview of tools for network analysis and visualization • Installing and using Gephi • Gephi hands-on labs Frontiers of Network Science: Introduction to Gephi 2018 2 TOOLS OVERVIEW (LISTED ALPHABETICALLY) Tools for network analysis and visualization • Computing model and interface – Desktop GUI applications – API/code libraries, Web services – Web GUI front-ends (cloud, distributed, HPC) • Extensibility model – Only by the original developers – By other users/developers (add-ins, modules, additional packages, etc.) • Source availability model – Open-source – Closed-source • Business model – Free of charge – Commercial Frontiers of Network Science: Introduction to Gephi 2018 3 TOOLS CINET CyberInfrastructure for NETwork science • Accessed via a Web-based portal (http://cinet.vbi.vt.edu/granite/granite.html) • Supported by grants, no charge for end users • Aims to provide researchers, analysts, and educators interested in Network Science with an easy-to-use cyber-environment that is accessible from their desktop and integrates into their daily work • Users can contribute new networks, data, algorithms, hardware, and research results • Primarily for research, teaching, and collaboration • No programming experience is required Frontiers of Network Science: Introduction to Gephi 2018 4 TOOLS Cytoscape Network Data Integration, Analysis, and Visualization • A standalone GUI application
    [Show full text]
  • Networkx Reference Release 1.9.1
    NetworkX Reference Release 1.9.1 Aric Hagberg, Dan Schult, Pieter Swart September 20, 2014 CONTENTS 1 Overview 1 1.1 Who uses NetworkX?..........................................1 1.2 Goals...................................................1 1.3 The Python programming language...................................1 1.4 Free software...............................................2 1.5 History..................................................2 2 Introduction 3 2.1 NetworkX Basics.............................................3 2.2 Nodes and Edges.............................................4 3 Graph types 9 3.1 Which graph class should I use?.....................................9 3.2 Basic graph types.............................................9 4 Algorithms 127 4.1 Approximation.............................................. 127 4.2 Assortativity............................................... 132 4.3 Bipartite................................................. 141 4.4 Blockmodeling.............................................. 161 4.5 Boundary................................................. 162 4.6 Centrality................................................. 163 4.7 Chordal.................................................. 184 4.8 Clique.................................................. 187 4.9 Clustering................................................ 190 4.10 Communities............................................... 193 4.11 Components............................................... 194 4.12 Connectivity..............................................
    [Show full text]
  • Multidimensional Network Analysis
    Universita` degli Studi di Pisa Dipartimento di Informatica Dottorato di Ricerca in Informatica Ph.D. Thesis Multidimensional Network Analysis Michele Coscia Supervisor Supervisor Fosca Giannotti Dino Pedreschi May 9, 2012 Abstract This thesis is focused on the study of multidimensional networks. A multidimensional network is a network in which among the nodes there may be multiple different qualitative and quantitative relations. Traditionally, complex network analysis has focused on networks with only one kind of relation. Even with this constraint, monodimensional networks posed many analytic challenges, being representations of ubiquitous complex systems in nature. However, it is a matter of common experience that the constraint of considering only one single relation at a time limits the set of real world phenomena that can be represented with complex networks. When multiple different relations act at the same time, traditional complex network analysis cannot provide suitable an- alytic tools. To provide the suitable tools for this scenario is exactly the aim of this thesis: the creation and study of a Multidimensional Network Analysis, to extend the toolbox of complex network analysis and grasp the complexity of real world phenomena. The urgency and need for a multidimensional network analysis is here presented, along with an empirical proof of the ubiquity of this multifaceted reality in different complex networks, and some related works that in the last two years were proposed in this novel setting, yet to be systematically defined. Then, we tackle the foundations of the multidimensional setting at different levels, both by looking at the basic exten- sions of the known model and by developing novel algorithms and frameworks for well-understood and useful problems, such as community discovery (our main case study), temporal analysis, link prediction and more.
    [Show full text]
  • Strongly Connected Components and Biconnected Components
    Strongly Connected Components and Biconnected Components Daniel Wisdom 27 January 2017 1 2 3 4 5 6 7 8 Figure 1: A directed graph. Credit: Crash Course Coding Companion. 1 Strongly Connected Components A strongly connected component (SCC) is a set of vertices where every vertex can reach every other vertex. In an undirected graph the SCCs are just the groups of connected vertices. We can find them using a simple DFS. This works because, in an undirected graph, u can reach v if and only if v can reach u. This is not true in a directed graph, which makes finding the SCCs harder. More on that later. 2 Kernel graph We can travel between any two nodes in the same strongly connected component. So what if we replaced each strongly connected component with a single node? This would give us what we call the kernel graph, which describes the edges between strongly connected components. Note that the kernel graph is a directed acyclic graph because any cycles would constitute an SCC, so the cycle would be compressed into a kernel. 3 1,2,5,6 4,7 8 Figure 2: Kernel graph of the above directed graph. 3 Tarjan's SCC Algorithm In a DFS traversal of the graph every SCC will be a subtree of the DFS tree. This subtree is not necessarily proper, meaning that it may be missing some branches which are their own SCCs. This means that if we can identify every node that is the root of its SCC in the DFS, we can enumerate all the SCCs.
    [Show full text]
  • Analyzing Social Media Network for Students in Presidential Election 2019 with Nodexl
    ANALYZING SOCIAL MEDIA NETWORK FOR STUDENTS IN PRESIDENTIAL ELECTION 2019 WITH NODEXL Irwan Dwi Arianto Doctoral Candidate of Communication Sciences, Airlangga University Corresponding Authors: [email protected] Abstract. Twitter is widely used in digital political campaigns. Twitter as a social media that is useful for building networks and even connecting political participants with the community. Indonesia will get a demographic bonus starting next year until 2030. The number of productive ages that will become a demographic bonus if not recognized correctly can be a problem. The election organizer must seize this opportunity for the benefit of voter participation. This study aims to describe the network structure of students in the 2019 presidential election. The first debate was held on January 17, 2019 as a starting point for data retrieval on Twitter social media. This study uses data sources derived from Twitter talks from 17 January 2019 to 20 August 2019 with keywords “#pilpres2019 OR #mahasiswa since: 2019-01-17”. The data obtained were analyzed by the communication network analysis method using NodeXL software. Our Analysis found that Top Influencer is @jokowi, as well as Top, Mentioned also @jokowi while Top Tweeters @okezonenews and Top Replied-To @hasmi_bakhtiar. Jokowi is incumbent running for re-election with Ma’ruf Amin (Senior Muslim Cleric) as his running mate against Prabowo Subianto (a former general) and Sandiaga Uno as his running mate (former vice governor). This shows that the more concentrated in the millennial generation in this case students are presidential candidates @jokowi. @okezonenews, the official twitter account of okezone.com (MNC Media Group).
    [Show full text]
  • Performing Knowledge Requirements Analysis for Public Organisations in a Virtual Learning Environment: a Social Network Analysis
    chnology Te & n S o o ti ft a w a m r r e Fontenele et al., J Inform Tech Softw Eng 2014, 4:2 o f E Journal of n n I g f i o n DOI: 10.4172/2165-7866.1000134 l e a e n r r i n u g o J ISSN: 2165-7866 Information Technology & Software Engineering Research Article Open Access Performing Knowledge Requirements Analysis for Public Organisations in a Virtual Learning Environment: A Social Network Analysis Approach Fontenele MP1*, Sampaio RB2, da Silva AIB2, Fernandes JHC2 and Sun L1 1University of Reading, School of Systems Engineering, Whiteknights, Reading, RG6 6AH, United Kingdom 2Universidade de Brasília, Campus Universitário Darcy Ribeiro, Brasília, DF, 70910-900, Brasil Abstract This paper describes an application of Social Network Analysis methods for identification of knowledge demands in public organisations. Affiliation networks established in a postgraduate programme were analysed. The course was ex- ecuted in a distance education mode and its students worked on public agencies. Relations established among course participants were mediated through a virtual learning environment using Moodle. Data available in Moodle may be ex- tracted using knowledge discovery in databases techniques. Potential degrees of closeness existing among different organisations and among researched subjects were assessed. This suggests how organisations could cooperate for knowledge management and also how to identify their common interests. The study points out that closeness among organisations and research topics may be assessed through affiliation networks. This opens up opportunities for ap- plying knowledge management between organisations and creating communities of practice.
    [Show full text]