Mapper on Graphs for Network Visualization
Mustafa Hajij,∗ Paul Rosen,† Bei Wang‡
Abstract Networks are an exceedingly popular type of data for representing relationships between individuals, businesses, proteins, brain regions, telecommunication endpoints, etc. Network or graph visualization provides an intuitive way to explore the node-link structures of network data for instant sense-making. However, naive node-link diagrams can fail to convey insights regarding network structures, even for moderately sized data of a few hundred nodes. We pro- pose to apply the mapper construction—a popular tool in topological data analysis—to graph visualization, which provides a strong theoretical basis for summarizing network data while preserving their core structures. We develop a variation of the mapper construction target- ing weighted, undirected graphs, called mapper on graphs, which generates property-preserving summaries of graphs. We provide a software tool that enables interactive explorations of such summaries and demonstrates the effectiveness of our method for synthetic and real-world data. The mapper on graphs approach we propose represents a new class of techniques that leverages tools from topological data analysis in addressing challenges in graph visualization.
1 Introduction
Networks are often used to model social, biological, and technological systems. In recent years, our ability to collect and archive such data has far outpaced our ability to understand them. For instance, the Blue Brain Project—the worlds largest-scale simulations of neural circuits—generates instances of the micro-connectome containing 10 million neurons and 88 billion synaptic connections for the rodent brain. The challenges for graph visualization (sometimes called network visualization) are two-fold: how to effectively extract features from such complex data; and how to design effective visualizations to communicate these features to the users. We propose to address these challenges by leveraging the mapper construction [88], a tool in topological data analysis (TDA), to develop visualizations for large network data. Given a topological space X equipped with a function f on X, the classic mapper construction from the arXiv:1804.11242v4 [cs.SI] 17 Dec 2019 seminal work of Singh et al. [88] provides a topological summary of the data for efficient computation, manipulation, and exploration. It has enjoyed tremendous success in data science, from cancer research [72] to sports analytics [1], among others [17, 64, 65, 91]; it is also a cornerstone of several data analytics companies, e.g., Ayasdi and Alpine Data Labs. In this paper, we develop a variation of the mapper construction targeting weighted undirected graphs, called mapper on graphs. For the rest of the paper, we use networks to refer to the data and graphs as an abstraction to the data. The mapper construction connects naturally with visualization by providing a strong theoretical basis for simplifying large complex data while preserving their core structures. Specifically:
∗KLA corporation. E-mail: [email protected] †University of South Florida. E-mail: [email protected] ‡University of Utah. E-mail: [email protected]. Corresponding author.
1 • We propose a set of summarization techniques to transform large graphs into hierarchical representations and provide interactive visualizations for their exploration.
• We demonstrate the effectiveness of our method on synthetic and real-world data using three different topological lenses that capture various properties of the graphs.
• We provide open-sourced implementation together with our experimental datasets via GitHub (see the supplement material).
2 Related Work
Graph visualization. We limit our review to node-link diagrams, which are utilized by many visu- alization software tools, including Gephi [7], GraphViz [34], and NodeXL [44]. For a comprehensive overview of graph visualization techniques, see [93]. One of the biggest challenges with node-link diagrams is visual clutter, which has been exten- sively studied in graph visualization [33]. It is mainly addressed in three ways: improved node layouts, edges bundling, and alternative visual representations. Tutte [92] provided the earliest graph layout method for node-link diagrams, followed by meth- ods driven by linear programming [40], force-directed embeddings [37, 47], embeddings of the graph metric [39], and connectivity structures [13, 50, 52, 53]. TopoLayout [2] creates a hybrid layout by decomposing a graph into subgraphs based on their topological features, including trees, complete graphs, bi-connected components, and clusters, which are subsequently grouped and laid out as meta-nodes. One of many differences between TopoLayout and our work is that we use functions defined on the graph to automatically and interactively guide decomposition and feature extraction among subgraphs. Edge bundling, which bundles adjacent edges together, is commonly used to reduce visual clutter on dense graphs [45]. For massive graphs, hierarchical edge bundling scales to millions of edges [38], while divided edge bundling [85] tends to produce higher-quality visual results. Nevertheless, these approaches only deal with edge clutter, not node clutter, and they only support limited types of analytic tasks [4, 67]. Finally, alternative visual representations have been used to remove clutter, ranging from varia- tions on node-link diagrams, such as replacing nodes with modules [31] and motifs [30], to abstract representations, such as matrix diagrams [28] and graph statistics [49]. Node clustering. The objective in node clustering (or graph clustering) is to group the nodes of the graph by taking into consideration its edge structure [84]. Common techniques include spectral methods [26, 36, 54, 94], similarity-based aggregation [90], community detection [69, 70], random walks [48, 80], and hierarchical clustering [12, 16]. Edge clustering has also been studied [23, 35]. Broadly speaking, our approach is a type of graph clustering that simultaneously preserves relationships between clusters. TDA in graph analysis and visualization. Persistent homology (the study of topological features across multi-scales) and mapper construction are two of the most widely used tools in TDA. A number of works use persistent homology to analyze graphs [29, 32, 46, 76, 77], and it has been applied to the study of collaboration networks [5, 20] and brain networks [21, 24, 56, 57, 58, 59, 79]. In terms of graph visualization, persistent homology has been used in capturing changes in time- varying graphs [43], as well as supporting interactive force-directed layouts [89]. The mapper construction [88] has been widely utilized in TDA for a number of applications [17, 64, 65, 73, 91]. Recently, it has witnessed major theoretical developments (e.g., [19, 25, 68]) that
2 Figure 1: An illustration of the mapper on graphs construction: (a) A weighted graph G = (V,E) has (b) a topological lens f : V → R applied. (c) A cover U of the range space is given by intervals −1 U1, U2, U3, and U4 as cover elements. (d-e) The connected subgraphs induced by f (Ui) form a cover of G, denoted as f ∗(U). (f) The 1-dimensional skeleton of the nerve (1-nerve) of f ∗(U) is the resulting mapper on a graph whose nodes represent the connected subgraphs (in orange), and edges represent the non-empty intersections between the subgraphs (in purple).
further adjudicate its use in data analysis. To the best of our knowledge, this is the first time the mapper construction is utilized explicitly in graph visualization.
3 Methods: Mapper on Graphs Construction
Suppose the data is a weighted graph G = (V,E) equipped with a positive edge weight w : E → R and a real-valued function defined on its nodes f : V → R. Our mapper on graphs method—a variation of the classic mapper construction—provides a general framework to analyze, simplify, and visualize G, as well as functions on G. An open cover of a topological space X is a collection U = {Uα}α∈A of open sets for some S indexing set A such that α∈A Uα = X. A finite open cover U is a good cover if every finite nonempty intersection of sets in U is contractible. The mapper on graphs construction starts with a finite good cover U = {Uα}α∈A of the image S ∗ f(V ) of f, such that f(V ) ⊆ Uα. Let f (U) denote the cover of G obtained by considering the −1 connected components (i.e., maximal connected subgraphs) induced by nodes in f (Uα) for each α. Given a cover U = {Uα}α∈A of X, let N (U) denote the simplicial complex that corresponds to T ∗ the nerve of the cover U, that is, N (U) = σ ⊆ A | α∈σ Uα 6= ∅ . We compute the nerve of f (U), denoted as N (f ∗(U)), and refer to its 1-dimensional skeleton as the mapper on a graph, denoted as M; see Figure 1 for an illustrative example. Parameters. Mapper on graphs is inherently multi-scale; its construction relies on two sets of parameters: the first defines the function/lens f, and the other specifies the cover U. For simplicity, we normalize the range space f(V ) to be within [0, 1].
1. Topological Lens: The function f plays the role of a topological lens through which we look at the properties of the data, and different lenses provide different insights [9, 88].
3 Mapper on graphs currently considers three graph-theoretic lenses, average geodesic distance (AGD), density estimation, and eigenfunctions of the graph Laplacian (see Figure 2), although our framework can be easily extended to include other lenses (see Section 6).
2. Cover: The range of f, f(V ) ⊆ [0, 1], is covered by U, which consists of a finite number of open intervals as cover elements U = {Uα}α∈A. A common strategy is to use uniformly sized overlapping intervals. Let n be the number of intervals, and describes the amount of overlap between adjacent intervals (see Section 3.2 for details). Adjusting these parameters increases or decreases the amount of aggregation mapper on graphs provides.