GraphPrism: Compact Visualization of Network Structure

Sanjay Kairam, Diana MacLean, Manolis Savva, Jeffrey Heer
Stanford University Computer Science Department
{skairam, malcdi, msavva, jheer}@cs.stanford.edu

ABSTRACT
Visual methods for supporting the characterization, comparison, and classification of large networks remain an open challenge. Ideally, such techniques should surface useful structural features (such as effective diameter, small-world properties, and structural holes) not always apparent from either summary statistics or typical network visualizations. In this paper, we present GraphPrism, a technique for visually summarizing arbitrarily large graphs through combinations of 'facets', each corresponding to a single node- or edge-specific metric (e.g., transitivity). We describe a generalized approach for constructing facets by calculating distributions of graph metrics over increasingly large local neighborhoods and representing these as a stacked multi-scale histogram. Evaluation with paper prototypes shows that, with minimal training, static GraphPrism diagrams can aid network analysis experts in performing basic analysis tasks with network data. Finally, we contribute the design of an interactive system using linked selection between GraphPrism overviews and node-link detail views. Using a case study of data from a co-authorship network, we illustrate how GraphPrism facilitates interactive exploration of network data.

Figure 1: GraphPrism and node-link diagrams for the largest component of a co-authorship graph. (Facet panels shown: Connectivity, Transitivity, Density, Conductance, Jaccard, MeetMin. Screenshot credit: created by Sanjay Kairam; visualization using D3.)

Categories and Subject Descriptors
I.6.9 [Computing Methodologies]: Visualization - Visualization techniques and methodologies

Keywords
Network analysis, graph visualization, scalability

1. INTRODUCTION
The size of available network datasets is growing considerably, already exceeding millions of nodes and edges [25]. Many existing graph visualization methods such as node-link diagrams and adjacency matrices were developed in the context of smaller graphs. For larger networks these techniques break down, making them unsuitable for supporting inference about higher-level network properties.

In this paper, we make two contributions towards the problem of scaling graph visualization to larger networks. First, we present GraphPrism, a visualization technique for compactly summarizing networks of arbitrary size. GraphPrism views combine multiple 'facets', each providing a statistical summary of the graph with respect to a single node- or edge-specific metric. By computing metrics over graph neighborhoods of increasing size, GraphPrism generalizes Bagrow et al.'s [3] B-Matrix technique. Paper prototype evaluations with network analysis experts demonstrate that analysts can use GraphPrism diagrams to assess and compare network structures.

Second, we describe the design of an interactive system for inspecting graphs at multiple levels of detail using linked selection between a GraphPrism overview and a node-link detail view. Through a case study with a network science co-authorship graph (Figure 1), we show how GraphPrism facets function as both visual summaries and dynamic query selectors to investigate patterns ranging from the level of individual nodes and edges up to global network structure.

2. RELATED WORK
We first review relevant work in the area of network analysis and identify properties critical for understanding graph data. We then examine existing graph visualization approaches and discuss how well each conveys these properties.
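As a rough illustration of the facet idea summarized in the abstract (distributions of a metric over increasingly large local neighborhoods, one histogram per neighborhood size), the following Python sketch may help. It is not the authors' implementation. The function names (`nodes_within`, `density`, `facet`), the choice of induced-subgraph edge density as the metric, the fixed number of bins, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def nodes_within(adj, source, radius):
    """Return the set of nodes at most `radius` hops from `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand past the requested radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def density(adj, nodes):
    """Edge density of the subgraph induced by `nodes` (assumed metric)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each undirected edge is seen from both endpoints, hence the // 2.
    edges = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    return edges / (n * (n - 1) / 2)


def facet(adj, metric, max_radius, bins=5):
    """One histogram (list of bin counts) per radius 1..max_radius.

    Assumes `metric` returns a value in [0, 1].
    """
    rows = []
    for r in range(1, max_radius + 1):
        counts = [0] * bins
        for v in adj:
            value = metric(adj, nodes_within(adj, v, r))
            counts[min(int(value * bins), bins - 1)] += 1
        rows.append(counts)
    return rows


# Hypothetical toy graph: a triangle (0, 1, 2) with a tail node 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rows = facet(adj, density, max_radius=2)
for radius, counts in enumerate(rows, start=1):
    print(radius, counts)
```

Each row of `rows` corresponds to one neighborhood radius; stacking the rows gives the kind of multi-scale histogram the abstract describes. A real facet would likely use the paper's own metric definitions and binning, which are not specified in this section.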
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. AVI ’12, May 21-25, 2012, Capri Island, Italy. Copyright 2012 ACM 978-1-4503-1287-5/12/05 ...$10.00.

2.1 Properties of Complex Networks
Our goal with GraphPrism is to surface structural properties of complex networks in a scalable, spatially compact manner. Here, we identify structural properties of networks which are critical in performing common analysis tasks, and thus should ideally be conveyed by network visualizations. Key definitions are summarized in Table 1.

Table 1: Common Graph-Theoretic Measures
Measure               Definition
Degree                For a node v in G, the count of edges in G incident on v.
Diameter              The length of the maximum shortest path between any pair of connected nodes u, v in G.
Triad                 A triplet of connected nodes. A triad is closed if edges connect all three nodes to each other.
Local Transitivity    For some v in G, the fraction of triads in G which are closed for the subgraph defined by v and its immediate neighbors.
Global Transitivity   The fraction of all triads in G which are closed.

Degree. Node degrees for many real networks have been shown to follow a power-law distribution [15]; such patterns can yield insight into mechanisms underlying system growth [4]. Deviations from expected distributions can reveal unpredicted system properties which may affect the structure of the network [25]. Thus, there are significant benefits to visualizing the overall shape of the degree distribution.

Diameter. The diameter of a graph intuitively expresses the graph's 'width'. The presence of specific structures, such as a long isolated chain, can change this value dramatically, even if such structures do not play a significant role in the overall system. A more robust version of this metric is the effective diameter, defined as the number of hops separating a given percentage of connected node pairs (commonly 90% [24]). Visualizing the relationship between this choice of threshold and the effective diameter may provide additional insight into patterns of graph connectivity.

Transitivity. Also named clustering coefficient, transitivity describes network interconnectedness. Local transitivity around individual nodes can aid reasoning about processes like clique formation [11]. Global transitivity in neural, infrastructure, and social networks (among others) [32] has been shown to be significantly higher than in random networks with equivalent degree distributions [24, 29].

Small-Worlds. Networks with short average path lengths and local clustering of nodes exhibit so-called small-world effects [32]. First identified in social networks [28], further study has revealed this phenomenon in many other types of systems, such as biological and information networks [25, 32]. The presence of small-world properties in communication and information networks indicates that these systems can be leveraged for rapid information passing and search [1, 23]. Making such properties evident in a visualization can aid reasoning about global system behavior.

Bridges. We define a local bridge as an edge connecting two nodes which otherwise share no immediate neighbors, and a global bridge as connecting two otherwise disconnected nodes. Such bridges can represent critical pathways through which resources in a system might travel [12]. Structural holes created around these bridges have been shown to foster creativity and productivity in professional information networks by preventing 'echo chambers' which might occur in more heavily clustered networks [13, 14].

Communities. In network analysis, communities are generally defined as collections of nodes which tend to link more with each other than with nodes outside. Though this concept is clearly derived from social networks, meaningful analogues have been discovered in a variety of systems, in-

Adjacency matrices avoid edge-crossing and occlusion problems but are still subject to layout effects; the perception of larger structures such as communities depends on a proper permutation of rows and columns [6]. Algorithms exist for sorting adjacency matrices [16], but the choice of which to use is dependent on task context and prior insight into the expected structure [20]. Like node-link diagrams, the scalability of adjacency matrices is limited due to the bounded resolution of both pixel displays and human visual acuity.

Recent work has attempted to overcome limitations of each of these approaches by combining them [20, 21, 22]. To aid path-following, MatLink [21] augments an adjacency matrix with arcs along row and column headers. NodeTrix [22] starts with a traditional node-link view, but reduces occlusion by representing dense clusters as small adjacency matrices. While these approaches marry benefits of the two visualization types, they remain limited by layout and scaling issues as network size increases and show only a subset of the topological features we hope to convey.

Abstracting Network Properties. To facilitate analysis of larger graphs, successful visualization approaches must abstract the data. PivotGraph [31] and Honeycomb [19] both aggregate graph data based on node attributes. PivotGraph collapses nodes with particular attribute values into a single super-node, allowing viewers to easily see how edges connect nodes of different types. Honeycomb aggregates on the basis of node hierarchy and visualizes customizable metrics to surface interesting network properties at multiple