Mapper on Graphs for Network Visualization

Mustafa Hajij,∗ Paul Rosen,† Bei Wang‡

Abstract Networks are an exceedingly popular type of data for representing relationships between individuals, businesses, proteins, brain regions, telecommunication endpoints, etc. Network or graph visualization provides an intuitive way to explore the node-link structures of network data for instant sense-making. However, naive node-link diagrams can fail to convey insights regarding network structures, even for moderately sized data of a few hundred nodes. We pro- pose to apply the mapper construction—a popular tool in topological data analysis—to graph visualization, which provides a strong theoretical basis for summarizing network data while preserving their core structures. We develop a variation of the mapper construction target- ing weighted, undirected graphs, called mapper on graphs, which generates property-preserving summaries of graphs. We provide a software tool that enables interactive explorations of such summaries and demonstrates the effectiveness of our method for synthetic and real-world data. The mapper on graphs approach we propose represents a new class of techniques that leverages tools from topological data analysis in addressing challenges in graph visualization.

1 Introduction

Networks are often used to model social, biological, and technological systems. In recent years, our ability to collect and archive such data has far outpaced our ability to understand them. For instance, the Blue Brain Project—the worlds largest-scale simulations of neural circuits—generates instances of the micro-connectome containing 10 million neurons and 88 billion synaptic connections for the rodent brain. The challenges for graph visualization (sometimes called network visualization) are two-fold: how to effectively extract features from such complex data; and how to design effective visualizations to communicate these features to the users. We propose to address these challenges by leveraging the mapper construction [88], a tool in topological data analysis (TDA), to develop visualizations for large network data. Given a topological space X equipped with a function f on X, the classic mapper construction from the arXiv:1804.11242v4 [cs.SI] 17 Dec 2019 seminal work of Singh et al. [88] provides a topological summary of the data for efficient computation, manipulation, and exploration. It has enjoyed tremendous success in data science, from cancer research [72] to sports analytics [1], among others [17, 64, 65, 91]; it is also a cornerstone of several data analytics companies, e.g., Ayasdi and Alpine Data Labs. In this paper, we develop a variation of the mapper construction targeting weighted undirected graphs, called mapper on graphs. For the rest of the paper, we use networks to refer to the data and graphs as an abstraction to the data. The mapper construction connects naturally with visualization by providing a strong theoretical basis for simplifying large complex data while preserving their core structures. Specifically:

∗KLA corporation. E-mail: [email protected] †University of South Florida. E-mail: [email protected] ‡University of Utah. E-mail: [email protected]. Corresponding author.

1 • We propose a set of summarization techniques to transform large graphs into hierarchical representations and provide interactive visualizations for their exploration.

• We demonstrate the effectiveness of our method on synthetic and real-world data using three different topological lenses that capture various properties of the graphs.

• We provide open-sourced implementation together with our experimental datasets via GitHub (see the supplement material).

2 Related Work

Graph visualization. We limit our review to node-link diagrams, which are utilized by many visu- alization software tools, including Gephi [7], GraphViz [34], and NodeXL [44]. For a comprehensive overview of graph visualization techniques, see [93]. One of the biggest challenges with node-link diagrams is visual clutter, which has been exten- sively studied in graph visualization [33]. It is mainly addressed in three ways: improved node layouts, edges bundling, and alternative visual representations. Tutte [92] provided the earliest graph layout method for node-link diagrams, followed by meth- ods driven by linear programming [40], force-directed embeddings [37, 47], embeddings of the graph metric [39], and connectivity structures [13, 50, 52, 53]. TopoLayout [2] creates a hybrid layout by decomposing a graph into subgraphs based on their topological features, including trees, complete graphs, bi-connected components, and clusters, which are subsequently grouped and laid out as meta-nodes. One of many differences between TopoLayout and our work is that we use functions defined on the graph to automatically and interactively guide decomposition and feature extraction among subgraphs. Edge bundling, which bundles adjacent edges together, is commonly used to reduce visual clutter on dense graphs [45]. For massive graphs, hierarchical edge bundling scales to millions of edges [38], while divided edge bundling [85] tends to produce higher-quality visual results. Nevertheless, these approaches only deal with edge clutter, not node clutter, and they only support limited types of analytic tasks [4, 67]. Finally, alternative visual representations have been used to remove clutter, ranging from varia- tions on node-link diagrams, such as replacing nodes with modules [31] and motifs [30], to abstract representations, such as matrix diagrams [28] and graph statistics [49]. Node clustering. The objective in node clustering (or graph clustering) is to group the nodes of the graph by taking into consideration its edge structure [84]. Common techniques include spectral methods [26, 36, 54, 94], similarity-based aggregation [90], community detection [69, 70], random walks [48, 80], and hierarchical clustering [12, 16]. Edge clustering has also been studied [23, 35]. Broadly speaking, our approach is a type of graph clustering that simultaneously preserves relationships between clusters. TDA in graph analysis and visualization. Persistent homology (the study of topological features across multi-scales) and mapper construction are two of the most widely used tools in TDA. A number of works use persistent homology to analyze graphs [29, 32, 46, 76, 77], and it has been applied to the study of collaboration networks [5, 20] and brain networks [21, 24, 56, 57, 58, 59, 79]. In terms of graph visualization, persistent homology has been used in capturing changes in time- varying graphs [43], as well as supporting interactive force-directed layouts [89]. The mapper construction [88] has been widely utilized in TDA for a number of applications [17, 64, 65, 73, 91]. Recently, it has witnessed major theoretical developments (e.g., [19, 25, 68]) that

2 Figure 1: An illustration of the mapper on graphs construction: (a) A weighted graph G = (V,E) has (b) a topological lens f : V → R applied. (c) A cover U of the range space is given by intervals −1 U1, U2, U3, and U4 as cover elements. (d-e) The connected subgraphs induced by f (Ui) form a cover of G, denoted as f ∗(U). (f) The 1-dimensional skeleton of the nerve (1-nerve) of f ∗(U) is the resulting mapper on a graph whose nodes represent the connected subgraphs (in orange), and edges represent the non-empty intersections between the subgraphs (in purple).

further adjudicate its use in data analysis. To the best of our knowledge, this is the first time the mapper construction is utilized explicitly in graph visualization.

3 Methods: Mapper on Graphs Construction

Suppose the data is a weighted graph G = (V,E) equipped with a positive edge weight w : E → R and a real-valued function defined on its nodes f : V → R. Our mapper on graphs method—a variation of the classic mapper construction—provides a general framework to analyze, simplify, and visualize G, as well as functions on G. An open cover of a topological space X is a collection U = {Uα}α∈A of open sets for some S indexing set A such that α∈A Uα = X. A finite open cover U is a good cover if every finite nonempty intersection of sets in U is contractible. The mapper on graphs construction starts with a finite good cover U = {Uα}α∈A of the image S ∗ f(V ) of f, such that f(V ) ⊆ Uα. Let f (U) denote the cover of G obtained by considering the −1 connected components (i.e., maximal connected subgraphs) induced by nodes in f (Uα) for each α. Given a cover U = {Uα}α∈A of X, let N (U) denote the simplicial complex that corresponds to  T ∗ the nerve of the cover U, that is, N (U) = σ ⊆ A | α∈σ Uα 6= ∅ . We compute the nerve of f (U), denoted as N (f ∗(U)), and refer to its 1-dimensional skeleton as the mapper on a graph, denoted as M; see Figure 1 for an illustrative example. Parameters. Mapper on graphs is inherently multi-scale; its construction relies on two sets of parameters: the first defines the function/lens f, and the other specifies the cover U. For simplicity, we normalize the range space f(V ) to be within [0, 1].

1. Topological Lens: The function f plays the role of a topological lens through which we look at the properties of the data, and different lenses provide different insights [9, 88].

3 Mapper on graphs currently considers three graph-theoretic lenses, average geodesic (AGD), density estimation, and eigenfunctions of the graph Laplacian (see Figure 2), although our framework can be easily extended to include other lenses (see Section 6).

2. Cover: The range of f, f(V ) ⊆ [0, 1], is covered by U, which consists of a finite number of open intervals as cover elements U = {Uα}α∈A. A common strategy is to use uniformly sized overlapping intervals. Let n be the number of intervals, and  describes the amount of overlap between adjacent intervals (see Section 3.2 for details). Adjusting these parameters increases or decreases the amount of aggregation mapper on graphs provides.

Figure 2: Examples of topological lenses for graphs: (a) average geodesic distance AGD (orange); (b) density estimation Dδ with δ = 2 (green); and (c) eigenvectors of the Fiedler vector of the graph Laplacian, l2 (purple). Darker colors mean lower function values.

3.1 Topological Lens An interesting open problem for the classic mapper construction is how to formulate topological lenses beyond the best practice or a rule of thumb [9, 10]. In practice, height functions, distances from the barycenter of the space, surface curvature, integral geodesic distances, and geodesic dis- tances from a source point in the space have all been proposed as reasonable choices [9]. In the graph setting, we focus on graph-theoretic lenses defined on the nodes of a graph, as illustrated in Figure 2. Each lens is chosen to reflect a specific property of interest that is intrinsic to the structure of a graph. In particular, we use as lenses average geodesic distance (AGD) [51] that detects symmetries in the graph while being invariant to reflection, rotation, and scaling; density estimation [87] that differentiates dense regions from sparse regions and outliers; and eigenfunctions of the graph Laplacian [55] that capture geometric properties of the graph. Average geodesic distance. Suppose a weighted graph G = (V,E) is equipped with a geodesic distance metric d. That is, d(u, v) measures the geodesic/graph distance between two nodes u, v ∈ V . d can be computed by utilizing Dijkstra’s shortest algorithm. The average geodesic distance, AGD : V → R, is given by 1 X AGD(v) = d(v, u). |V | u∈V This definition implies that the nodes near the center of the graph will likely have low function values, while points on the periphery will have high values. The AGD function has been used

4 extensively in shape analysis due to its desirable proprieties in detecting and reflecting symme- try [51] based on how the function values are distributed. Therefore, the AGD as a topological lens captures the symmetric properties of a graph, which are described by all or parts of the graph that are invariant to transformations such as reflection, rotation, and scaling. The mathematical notion of automorphism, in some sense, captures the symmetry of the space as it is a structural-preserving way of mapping a space to itself.

More precisely, consider a graph G as a metric space equipped with the geodesic distance, (G, d). A bijec- tion T : V → V is called an automorphism on (G, d) if d(u, v) = d(T (u),T (v)) for every u, v ∈ V . Let Aut(G) denote the group of automorphisms on G. A function f : V → R is isometry invariant over Aut(G) if for every T ∈ Aut(G): f ◦ T = f. AGD is, therefore, an isometry invariant scalar function. Indeed, let T be an automorphism on G, then for every v ∈ V , we 1 P can verify that AGD(T (v)) = |V | u∈V d(T (u)), and 1 P T (v) = |V | u∈V d(u, v) = AGD(v). See Figure 2(a) and Figure 3(a) for examples of AGD on graphs. Density estimation. The density estimation func- tion [87] is given by

X −d(u, v)2 D (v) = exp( ), δ δ Figure 3: The effect of a lens. The orig- u∈V inal graph, colored by one of the three where d(u, v) is the geodesic distance between two nodes lenses, is shown on the left; its corre- in the graph and δ > 0. sponding mapper on a graph M, along with a chosen cover, is shown on the Since Dδ is completely defined in terms of the dis- right. (a) AGD. (b) Dδ with δ = 1. tance d, it is not hard to see that Dδ is also isometry in- (c) l2. variant. Dδ correlates negatively with AGD as it tends to take larger values on nodes that are close to the center, see Figure 2(b) and Figure 3(b) for examples. Eigenfunctions of the graph Laplacian. Let C(G) be the vector space of all functions f : V −→ R. The unnormalized Laplacian of the graph G is the linear operator L : C(G) → C(G) defined by P mapping f ∈ C(G) to Lf, where (Lf)(v) = u∈N(v) wu,v(f(v) − f(u)). The eigenvectors of L form a rich family of scalar functions defined on G with many geometric properties [55]. First, the gradient of the eigenfunctions of L tends to follow the overall shape of the data [63]; and these functions have been used in applications, such as graph understanding [86], segmentation [82], spectral clustering [71], and min-cut problems [66]. Sorting the eigenvectors of L by increasing eigenvalues, we use eigenvectors of the second and third smallest eigenvalues of L as the lens, denoted as l2 and l3. These vectors usually contain low-frequency information about the graph, and they help to retain the shape of complex graphs. In particular, l2, commonly referred to as the Fiedler vector [63], has desirable geometric properties [27]. For instance, the maximum and the minimum of the Fielder vector tend to occur at nodes with maximum geodesic distances [22], see Figure 2(c) and Figure 3(c) for examples.

5 Furthermore, there is a connection between mapper on graphs, spectral clustering, and graph min-cut. For instance, the Fielder’s vector l2 can be used to bi-partition the graph G (i.e., based on l2(v) > 0 or l2(v) ≤ 0); such a partition could also be approximated by computing mapper on a graph M with l2 as the lens and setting n = 2 and  ≈ 0. M not only provides a generalization of spectral clustering but also preserves the connections (which form the min-cut) between the clusters (for appropriately chosen  > 0).

3.2 Cover Given a graph G equipped with a lens function f : V → R, suppose the range of the function f(V ) is normalized to be within [0, 1], and A = {1, ··· , n} for a finite good cover U = {Uα}α∈A of the interval [0, 1]. We represent this cover U visually by drawing long, colored rectangu- lar boxes, as indicated in 4. The mapper on graphs construction relies on the choice of a cover for the interval [0, 1]; such a choice is rather flexible but also essential to achieve effective graph visualization. To obtain an initial cover, we start by splitting [0, 1] into n (the resolution parameter) inter- vals [c1, c2], ··· , [cn−1, cn] with equal length, such that c1 = 0 and cn = 1. The overlap parameter  is then used to obtain the initial cover U consisting of cover elements (ci − , ci+1 + ) for 1 ≤ i ≤ n − 1. Choosing n and  has a significant impact on the mapper on graphs output, as illustrated in Figure 4. Broadly speaking, smaller n leads to a smaller topo- logical summary of the graph; and smaller  captures Figure 4: Varying  and n in a fewer connections between clusters of nodes. In the ex- cover. (a) G with a AGD lens. Vari- amples shown in the paper, we find a smaller  typically ous mapper on graphs constructions: (b- give a more effective visualization for large and highly d) n = 2, 3, and 4, respectively;  = connected graphs, e.g.,  ∈ [0.01, 0.1], while  ∈ [0.1, 0.3] 0.1. (e-f) n = 3,  = 0.2 and 0.3, respec- appears to be sufficient for small graphs. tively.

4 Visual Design and Interaction

We provide a linked-view interface to enable exploration of the structure of a graph. It connects the original graph G to its (multi-scale) summary in the form of a mapper on a graph M, through interactive cover manipulation that supports customization of M.

4.1 Cover Visualization and Interaction The mapper on graphs construction relies on the choice of two sets of parameters: a lens and a cover. Therefore, the cover visualization consists of two components: a histogram of the lens, showing the distribution of function values, and an interactive cover designer. Via interactive visualization, we treat the exploration and manipulation of these parameters as a vehicle to study and summarize the intrinsic structure of an input graph.

6 Histogram of a lens. Understanding the distribution of the values of a lens f can be helpful in the mapper on graphs construction. Figure 5(left) shows an example of a histogram for the AGD lens in gray. The histogram is split into a fixed number of bins within the range [0, 1]. We will illustrate later how the visual information encoded in the histogram can be utilized to optimize the choice of the cover. In addition, the histogram of a lens can be used to inform the choices for n and —generally speaking, a uniformly distributed lens function requires smaller . Cover visualization and interactive manipulation. The cover is visualized using a series of boxes, one per cover element, displayed next to the histogram of the lens. Each box is placed based upon the start and end val- ues of its interval and colored based upon the midpoint. Figure 5 shows the cover as red and orange boxes. While an initial, uniform cover is sufficient for most graph visualizations, we provide interactive manipula- tion of individual cover elements that can be used to Figure 5: The histogram of a lens. U obtain more desirable mapper on graphs output in some consists of open intervals U1,U2, and U3. cases. Given an interval Ui ∈ U, the user can manipu- late its endpoints dynamically via an interactive interface such that Ui can be shrunk, expanded, or shifted, as illustrated in Figure 6. As a user manipulates the interval Ui, the connected components −1 within f (Ui) split, merge, appear, or disappear. The histogram of a lens can be used to inform the cover manipulation; for instance, the length of an interval could be inversely proportional to the density of the histogram.

Figure 6: (a) The case when an interval Ui ∈ U shrinks, from left to right, or equivalently when it expands, from right to left, are shown. (b) The effect of interval shrinking (expanding) on the mapper on graphs nodes is shown. (c) The case when the interval Ui shifts to a new position is shown. (d) Changes to the mapper on graphs nodes that occur due to the interval shift are shown.

4.2 For both G and its summary M, we apply a Fruchterman-Reingold force-directed layout [37] with the Barnes-Hut approximation for repulsive forces [6]. Our approach is ultimately agnostic of the graph layout algorithm, and different layouts (e.g., layered approaches) may improve the presentation of certain graphs. For a given lens, f : V → R, a node in G is colored by a saturated colormap (red for AGD, green for Dδ, and purple for l2 or l3) based on its function value. A node u in M (associated with

7 a connected Cu ⊆ G) is colored similarly by taking the average of the function values of nodes in Cu. The size of u is proportional to |Cu|. For both graphs, edge thickness is drawn proportional to edge weight. For M, an edge [u, v] represents the nonempty intersection between Cu and Cv. Therefore, its edge weight, and thus thickness, is drawn proportional to the size of the intersection, |Cu ∩ Cv|. For readability, only the largest component of the mapper on a graph is included in the visu- alization. Furthermore, mapper on a graph nodes are removed from the output if the size of their connected component |Cu| is less than a user-selected value.

4.3 Interactive Structural Correspondences We provide three mechanisms for exploring structural correspondences between G and M: cover element selection, node selection, and edge selection.

(a) (b) (c) (d) (e)

Figure 7: (a-c) Selecting a cover element (in blue) triggers the selection of its corresponding nodes in M (top) and G (bottom). (d-e) Selecting a node u of M (top right in blue) triggers the selection of its corresponding cover element that generates u (left) and nodes in G (bottom).

Cover element selection. When a cover element U is se- lected, the action triggers the selection of nodes f −1(U) in G, as well as nodes that represent connected compo- nents of f −1(U) in M. As illustrated in Figure 7(a-c), after the selection of a cover element (top left in (a-c), highlighted in blue), our system selects the correspond- ing nodes in M (top) and in G (bottom). As previously noted, if the nodes of M captured by a particular cover element need fine-tuning, the box may be dragged, ex- (a) (b) (c) panded, or contracted. M will update correspondingly. Figure 8: Selecting an edge (blue) in M Node Selection. Each node u in M corresponds to highlights the clusters of nodes in G as- a connected component Cu from the original graph G. sociated with its endpoints and their in- With the selection of a node u, our interface recovers tersection. and highlights Cu from G, as well as the cover element that generates the node u, see Figure 7(d-e).

Edge Selection. Each edge uv in M is determined by two connected components Cu and Cv in G. With the selection of an edge uv, our interface highlights the clusters Cu and Cv in G associated with its endpoints. Specifically, the sets Cu −(Cu ∩Cv), Cv −(Cu ∩Cv), and Cu ∩Cv are colored differently

8 to highlight node memberships and the relationship between the clusters. Figure 8 illustrates this process. Nodes that are unique to each endpoint are in purple and sky blue, respectively; nodes that correspond to the intersection are in blue (see Figure 8(a)). For comparison, clusters of nodes attached to individual endpoints are also highlighted in blue in Figure 8(b).

5 Results

To demonstrate our approach, we have implemented our approach using Java and Processing1. We evaluate our approach by examining mapper on graphs on synthetic and real datasets. Our code and datasets are available on GitHub2. Synthetic datasets. We apply mapper on graphs to 20 synthetic datasets, all of which are generated using Net- 6 1 2 3 5 workX [42]. Figure 9 shows 16 synthetic graphs G and their 4 corresponding mapper on graphs outputs M. For each ex- Anchorage Int ample, certain structures are emphasized depending on the St Mary's Honolulu Int. Bethel Guam Int. choice of the lens and cover, such as symmetry (e.g., (b), Aniak (c), (k)) and the overall shape of the data (e.g., (a), (f), (l)). The original graphs for (a) and (f) have a circular shape, so we choose the Fiedler’s vector l2 as the lens as (a) it has variability across the graph. M for (f) is partic- ularly interesting as the original graph G comes from a 6 1 2 5 torus mesh, and M appears similar to its corresponding 3 4 Anchorage Int Reeb graph. M also captures the dual structures of some St Mary's Honolulu Int. graphs in the cases of Grid graph (j) and the Dorogovtsev- Guam Int. Aniak Goltsev-Mendes graph (c). Juneau Int USAIR 97. The previous example illustrates a natural in- terpretation of the nodes in M as clusters in G. In Fig- ure 10, we provide a natural interpretation of the edges in (b) M as connections between clusters. The USAIR 97 graph Figure 10: Edge selection within M consists of 332 nodes and 2126 edges [8]. The nodes rep- for the USAIR 97 graph: selecting resent airports and the edges routes between airports. We edge (1, 2) in (a) vs. edge (2, 3) in (b). use l3 as the lens and explore the USAIR 97 dataset by utilizing interactive edge selection as described in Section 4. First in Figure 10(a), selecting edge (1, 2) in M allows us to inspect their corresponding clusters C1 (light and dark blue nodes) and C2 (dark blue and purple nodes) in G. C1 and C2 have Bethel and Anchorage International as major airports, respectively. Edge (1, 2) corresponds to airports in C1 ∩ C2 (blue nodes), including the Aniak and the St Mary’s. M clearly captures the fact that these airports are the hubs between Bethel and Anchorage international in G. Second, selecting edge (2, 3) in M enables the exploration of C2 and C3 in G, respectively. C2, as shown in Figure 10(b), has Aniak and St Mary’s; while C3 contains the Juneau International Airport. Edge (2, 3) in M is mainly represented by Anchorage International. M captures the fact that Anchorage International serves as a hub—in order to go from any airport in C3 to airports in C2 one must travel through Anchorage. Finally, via node selection, node 6 in M corresponds to a peripheral cluster C6 on the outskirt

1https://processing.org/ 2https://github.com/USFDataVisualization/MapperOnGraphs

9 (a) (b) (c) (d)

(e) (f) (g) (h)

(i) (j) (k) (l)

Figure 9: Mapper on graphs applied to the visualization of synthetic datasets. In all examples, M is shown on the top with its cover and the original graph G is shown on the bottom. (a) Connected caveman graph (n = 5,  = 0.1); (b) Lobster graph (n = 5,  = 0.1); (c) Dorogovtsev-Goltsev- Mendes graph (n = 3,  = 0.2, δ = 7); (d) Large (n = 9,  = 0.15); (e) Community graph (n = 5,  = 0.25); (f) Torus graph (n = 3,  = 0.3); (g) Dorogovtsev-Goltsev-Mendes graph (n = 3,  = 0.3); (h) Lollipop graph (n = 3,  = 0.1, δ = 7); (i) Small bipartite graph (n = 3,  = 0.4); (j) Grid graph (n = 3,  = 0.1); (k) Tree (n = 3,  = 0.1); (l) Ladder graph (n = 4,  = 0.1). 10 of G (see Figure 10(a)). C6 is represented mainly by the Guam international airport. In order to pass from any airport in C6 to airports in C5, one must pass from the Honolulu International, which is contained in edge (5, 6) in M.

M M M (MMdMM M M MdM M bMMdM

g, g, g,

Figure 11: (a) Map of science graph G where nodes are colored by 13 scientific disciplines. (b) Mapper on a graph M (top, n = 10,  = 0.1) with l3 as the lens, in comparison with the original graph G (bottom). (c) Applying interactive cover manipulation on M achieves better clustering quality and shape summary.

Map of science. The map of science graph [11] (see Figure 11(a)) consists of 554 nodes and 2276 edges. Nodes represent and are colored by specialties within 13 major scientific disciplines, and edges represent co-authorship of publications between those specialties. Since G does not exhibit obvious symmetry, we choose the 3rd smallest eigenfunctions of the graph Laplacian, l3, as the lens to help to retain the shape of G. As illustrated in Figure 11(b), both M and G are laid out by way of correspondence where M is shown to preserve the overall structure of G. The highlighted nodes in M also capture certain clusters in G. We could utilize interactive cover manipulation to obtain an even better representation of the data in Figure 11(c) where nodes in M are circled to highlight the majority scientific discipline from the underlying cluster. For instance, node 1 in M represents the humanities-labeled nodes in G; nodes 8 and 6 represent chemistry and biology, respectively. Node 8 and 6 merge at node 5 in M, which represents medical science and infectious diseases.

6 Scalable Computation

The running time of mapper on graphs algorithm relies heavily on the choice of a lens. While the lenses we discussed earlier (Section 3) are effective in capturing various structures of an underlying graph, they are expensive to compute for very large graphs. To address the issue, we need a lens that can be computed efficiently for very large graphs while still carrying structural information. We, therefore, consider PageRank [14] as an additional, scalable lens. We consider a version of the PageRank algorithm applicable to undirected graphs [41]. A PageRank vector R : V → R is defined for every node v ∈ V , (1 − d) X R(u) R(v) = + d , |V | |N(u)| u∈N(v)

11 where N(v) is the set of neighbors of v; 0 < d < 1 is the damping factor, which is typically set at 0.85. Using the formation of R(v), PageRank yields an iterative algorithm that can be computed efficiently in practice [14, 74]. The existence of the PageRank vector R is guaranteed by the PerronFrobenius theorem [78]. A high PageRank score at v typically means that v is connected to many nodes, which also have high PageRank scores. The PageRank has been shown to be a continuous function in [0, 1] [81]. For example, Figure 12(a) bottom illustrates the continuous variation of R on V for a random geometric graph. For the lens, we utilized f(v) = log(R(v)), as it provided a good distribution of function values. We utilize the PageRank implementation in NetworkX [42] and ran mapper on graphs on two synthetic graphs generated using NetworkX [42] and five real-world large graphs obtained from Stanford Large Network Dataset Collection [62] with up to 3 million edges using a MacBook Pro with 2.3 GHz Quad-Core Intel Core i5 with 8 GB memory. We report the average computational time in terms of PageRank (R) and mapper on graphs (M) in Table 1.

Table 1: Average computational time for mapper on graphs with PageRank on five large real-world graphs. Time is reported in seconds. Graph Figure —V— —E— n  R (s) M (s) Amazon0302 [60] 12(g) 262, 111 1, 234, 877 5 0.15 31.63 2.88 ca-CondMat [61] 12(e) 23, 133 93, 497 5 0.15 4.18 0.15 com-amazon.ungraph [95] 12(c) 334, 863 925, 872 5 0.15 46.90 4.15 com-youtube.ungraph [95] 12(d) 1, 134, 890 2, 987, 624 6 0.15 240.33 14.44 soc-Epinions1 [83] 12(f) 75, 879 508, 837 6 0.15 16.74 1.02

Finally, we would like to demonstrate that not only our approach is scalable using the PageR- ank lens, it also produces meaningful visualization results. Figure 12 gives examples of applying mapper on graphs to seven large graphs using the PageRank lens. Figure 12(a-b) are both synthetic graphs: (a) is a random geometric graph [75] with a radius 0.2 (n = 10,  = 0.15), while (b) contains a balanced tree G with branching factor 5 and height 9 (n = 3,  = 0.15). Mapper on graphs M for both graphs are shown to capture the global organizational principle of the original graph. Figure 12(c) contains a co-purchasing graph based on the CWBTIAB feature (Customers Who Bought This Item Also Bought) on Amazon website: if a product i is frequently purchased together with product j, then the original graph G contains an edge (i, j). While the node-link diagram of G cannot be improved beyond a hairball, M provides a very compact summary containing information regarding popular products serving as “hubs”. Similarly, Figure 12(d-g) show the mapper on graphs M for (d) YouTube , (e) Condense Matter collaboration network, (f) Epinions social network, and (g) a 2nd Amazon product co-purchasing network on March 2nd, 2003. The original graphs for these figures are not shown because they, too, are essentially hairballs.

7 Discussion

In this paper, we present a TDA approach for graph visualization using a variant of the mapper construction called mapper on graphs. Our approach is effective at detecting clusters in a graph and preserving the relations among these clusters. It is also flexible in capturing the structure of a graph across multiple scales based on different topological or geometric lenses. Mapper on graphs could potentially be applied to other visualization tasks, for instance, as a skeleton representation for the

12

Figure 12: Mapper on graphs using the PageRank lens. In examples (a-c), mapper on a graph M is shown on the top with its cover, and the original graph G is shown on the bottom. For examples (d-g), the original graphs G are not shown because it renders essentially as a hairball, similar to the graph G shown in (c). (a) A random geometric graph: |V | = 1, 000, |E| = 53, 741. (b) A (9, 5) balanced tree: |V | = 66, 430, |E| = 66, 429. (c) Amazon product co-purchasing network: |V | = 334, 863, |E| = 925, 872. (d) YouTube social network: |V | = 1, 134, 890, |E| = 2987624. (e) Condense Matter collaboration network: |V | = 23, 133, |E| = 93, 497. (f) Epinions social network: |V | = 75, 879, |E| = 508, 837. (g) A 2nd Amazon product co-purchasing network on March 2nd, 2003: |V | = 262, 111, |E| = 1, 234, 877. underlying graph in determining an initial layout in graph drawing. Traditionally, the PageRank vector is employed on the web graph, which could be considered as a —the PageRank vector at a previous instance of the web graph is used as an initial vector to obtain a fast PageRank solution for the current instance of the web graph. Therefore, Mapper on graphs in conjugation with the PageRank lens could be utilized for the study of temporal networks. Our focus in this paper is applying mapper on graphs to graph visualization. We are also interested in the theoretical properties of mapper on graphs. For example, how do we measure the distance between a mapper on graphs M and its underlying graph G? What is an appropriate metric under which M converges to the Reeb graph of G as the granularity of the cover goes to zero? What is the structural stability of mapper on graphs with respect to perturbations of the lens f, graph G, and cover U? Recent theoretical advances on the stability [15, 19] and convergence [3, 15, 18, 19, 25, 68] of mapper construction could address these questions, at least partially. However, questions remain that are unique to mapper on graphs: e.g., what is the relation between the shape of M and the geometric and topological properties of the lens f?

Acknowledgments

This work was supported in part by the National Science Foundation NSF IIS-1513616 and NSF DBI-1661375.

13 References

[1] Muthu Alagappan. From 5 to 13: Redefining the positions in basketball. MIT Sloan Sports Analytics Conference, 2012.

[2] Daniel Archambault, Tamara Munzner, and David Auber. Topolayout: Multilevel graph layout by topological features. IEEE transactions on visualization and computer graphics, 13(2), 2007.

[3] Aravindakshan Babu. Zigzag Coarsenings, Mapper Stability and Gene-network Analyses. PhD thesis, Stanford University, 2013.

[4] Benjamin Bach, Nathalie Henry Riche, Christophe Hurter, Kim Marriott, and Tim Dwyer. To- wards unambiguous edge bundling: Investigating confluent drawings for network visualization. IEEE transactions on visualization and computer graphics, 23(1):541–550, 2017.

[5] Maria Bampasidou and Thanos Gentimis. Modeling collaborations with persistent homology. CoRR, abs/1403.5346, 2014.

[6] Josh Barnes and Piet Hut. A hierarchical o(n log N) force-calculation algorithm. Nature, 324(6096):446, 1986.

[7] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. Gephi: an open source software for exploring and manipulating networks. In ICWSM, pages 361–362, 2009.

[8] Vladimir Batagelj and Andrej Mrvar. Pajek datasets. http://vlado.fmf.uni-lj.si/pub/ networks/data/, 2006. [9] S. Biasotti, D. Giorgi, M. Spagnuolo, and B. Falcidieno. Reeb graphs for shape analysis and applications. Theoretical Computer Science, 392:5–22, 2008.

[10] S. Biasotti, S. Marini, M. Mortara, and G. Patane. An overview on properties and efficacy of topological skeletons in shape modelling. Shape Modeling International, 2003.

[11] Katy B¨orner,Richard Klavans, Michael Patek, Angela M Zoss, Joseph R Biberstine, Robert P Light, Vincent Larivi`ere,and Kevin W Boyack. Design and update of a classification system: The UCSD map of science. PloS one, 7(7):e39464, 2012.

[12] Ulrik Brandes, Marco Gaertler, and Dorothea Wagner. Experiments on graph clustering algo- rithms. In European Symposium on Algorithms, pages 568–579, 2003.

[13] Ulrik Brandes and Christian Pich. Eigensolver methods for progressive multidimensional scal- ing of large data. In Graph Drawing, pages 42–53. Springer, 2007.

[14] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems, 30(1-7):107–117, 1998.

[15] Adam Brown, Omer Bobrowski, Elizabeth Munch, and Bei Wang. Probabilistic convergence and stability of random mapper graphs. arXiv preprint arXiv:1909.03488, 2019.

[16] Thang Nguyen Bui, Soma Chaudhuri, Frank Thomson Leighton, and Michael Sipser. Graph bisection algorithms with good average case behavior. Combinatorica, 7(2):171–191, 1987.

[17] Gunnar Carlsson. Topological pattern recognition for point cloud data. Acta Numerica, 23:289– 368, 2014.

14 [18] Mathieu Carri´ere,Bertrand Michel, and Steve Oudot. Statistical analysis and parameter selection for mapper. Journal of Machine Learning Research, 19:1–39, 2018.

[19] Mathieu Carri´ereand Steve Oudot. Structure and stability of the one-dimensional mapper. Foundations of Computational Mathematics, 18(6):1333–1396, 2018.

[20] C. J. Carstens and K. J. Horadam. Persistent homology of collaboration networks. Mathe- matical Problems in Engineering, 2013, 2013.

[21] Ben Cassidy, Caroline Rae, and Victor Solo. Brain activity: Conditional dissimilarity and persistent homology. IEEE 12th International Symposium on Biomedical Imaging (ISBI), pages 1356 – 1359, 2015.

[22] Moo K Chung, Seongho Seo, Nagesh Adluru, and Houri K Vorperian. Hot spots conjecture and its application to modeling tubular structures. In International Workshop on Machine Learning in Medical Imaging, pages 225–232, 2011.

[23] Weiwei Cui, Hong Zhou, Huamin Qu, Pak Chung Wong, and Xiaoming Li. Geometry-based edge clustering for graph visualization. IEEE Transactions on Visualization and Computer Graphics, 14(6):1277–1284, 2008.

[24] Y. Dabaghian, F. M´emoli,L. Frank, and G. Carlsson. A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Computational Biology, 8(8):e1002581, 2012.

[25] Tamal K Dey, Facundo Memoli, and Yusu Wang. Topological analysis of nerves, reeb spaces, mappers, and multiscale mappers. arXiv preprint arXiv:1703.07387, 2017.

[26] Inderjit S Dhillon. Co-clustering documents and words using bipartite spectral graph par- titioning. In Proceedings of the 7th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 269–274, 2001.

[27] Chris H. Q. Ding, Xiaofeng He, Hongyuan Zha, Ming Gu, and Horst D. Simon. A min-max cut algorithm for graph partitioning and data clustering. IEEE International Conference on Data Mining, pages 107–114, 2001.

[28] Kasper Dinkla, Michel A Westenberg, and Jarke J van Wijk. Compressed adjacency matrices: untangling gene regulatory networks. IEEE Transactions on Visualization and Computer Graphics, 18(12):2457–2466, 2012.

[29] Irene Donato, Giovanni Petri, Martina Scolamiero, Lamberto Rondoni, and Francesco Vac- carino. Decimation of fast states and weak nodes: topological variation via persistent homol- ogy. Proceedings of the European Conference on Complex Systems, pages 295–301, 2012.

[30] Cody Dunne and Ben Shneiderman. Motif simplification. Proceedings of the SIGCHI Confer- ence on Human Factors in Computing Systems, 2013.

[31] Tim Dwyer, Nathalie Henry Riche, Kim Marriott, and Christopher Mears. Edge compression techniques for visualization of dense directed graphs. IEEE Transactions on Visualization and Computer Graphics, 19(12):2596–2605, 2013.

[32] Weinan E, Jianfeng Lu, and Yuan Yao. The landscape of complex networks. CoRR, abs/1204.6376, 2012.

15 [33] Geoffrey Ellis and Alan Dix. A taxonomy of clutter reduction for information visualisation. IEEE Transactions on Visualization and Computer Graphics, 13(6):1216–1223, 2007.

[34] John Ellson, Emden Gansner, Lefteris Koutsofios, Stephen C North, and Gordon Woodhull. Graphviz— open source graph drawing tools. Graph Drawing, pages 483–484, 2002.

[35] Ozan Ersoy, Christophe Hurter, Fernando Paulovich, Gabriel Cantareiro, and Alex Telea. Skeleton-based edge bundling for graph visualization. IEEE Transactions on Visualization and Computer Graphics, 17(12):2364–2373, 2011.

[36] Miroslav Fiedler. Algebraic connectivity of graphs. Czechoslovak mathematical journal, 23(2):298–305, 1973.

[37] Thomas M. J. Fruchterman and Edward M. Reingold. Graph drawing by force-directed place- ment. Software: Practice and experience, 21(11):1129–1164, 1991.

[38] Emden R Gansner, Yifan Hu, Stephen North, and Carlos Scheidegger. Multilevel agglomerative edge bundling for visualizing large graphs. In IEEE Pacific Visualization Symposium, pages 187–194, 2011.

[39] Emden R Gansner, Yehuda Koren, and Stephen North. Graph drawing by stress majorization. In Graph Drawing, pages 239–250. Springer, 2005.

[40] Emden R. Gansner, Eleftherios Koutsofios, Stephen C. North, and Kiem-Phong Vo. A tech- nique for drawing directed graphs. IEEE Transactions on Software Engineering, 19(3):214–230, 1993.

[41] Vince Grolmusz. A note on the pagerank of undirected graphs. arXiv preprint arXiv:1205.1960, 2012.

[42] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dy- namics, and function using NetworkX. Proceedings of the 7th Python in Science Conference, 2008.

[43] Mustafa Hajij, Bei Wang, Carlos Scheidegger, and Paul Rosen. Visual detection of struc- tural changes in time-varying graphs using persistent homology. IEEE Pacific Visualization Symposium (PacificVis), 2018.

[44] Derek Hansen, Ben Shneiderman, and Marc A Smith. Analyzing social media networks with NodeXL: Insights from a connected world. Morgan Kaufmann, 2010.

[45] Danny Holten and Jarke J Van Wijk. Force-directed edge bundling for graph visualization. Computer Graphics Forum, 28(3):983–990, 2009.

[46] Danijela Horak, Slobodan Maleti´c,and Milan Rajkovi´c.Persistent homology of complex net- works. Journal of Statistical Mechanics: Theory and Experiment, page P03034, 2009.

[47] Yifan Hu. Efficient, high-quality force- drawing. Mathematica Journal, 10(1):37– 71, 2005.

[48] Glen Jeh and Jennifer Widom. SimRank: a measure of structural-context similarity. In Proceedings of the 8th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 538–543, 2002.

16 [49] Sanjay Kairam, Diana MacLean, Manolis Savva, and Jeffrey Heer. Graphprism: Compact visualization of network structure. In Advanced Visual Interfaces, 2012.

[50] Marc Khoury, Yifan Hu, Shankar Krishnan, and Carlos Scheidegger. Drawing large graphs by low-rank stress majorization. Computer Graphics Forum, 31:975–984, 2012.

[51] Vladimir G Kim, Yaron Lipman, Xiaobai Chen, and Thomas Funkhouser. M¨obiustransfor- mations for global intrinsic symmetry analysis. Computer Graphics Forum, 29(5):1689–1700, 2010.

[52] Yehuda Koren. On spectral graph drawing. In Computing and Combinatorics, pages 496–508. Springer, 2003.

[53] Yehuda Koren, Liran Carmel, and David Harel. Ace: A fast multiscale eigenvectors computa- tion for drawing huge graphs. IEEE Symposium on Information Visualization, pages 137–144, 2002.

[54] Brian Kulis, Sugato Basu, Inderjit Dhillon, and Raymond Mooney. Semi-supervised graph clustering: a kernel approach. Machine learning, 74(1):1–22, 2009.

[55] Stephane Lafon and Ann B Lee. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE transac- tions on pattern analysis and machine intelligence, 28(9):1393–1403, 2006.

[56] Hyekyoung Lee, Moo K. Chung, Hyejin Kang, Boong-Nyun Kim, and Dong Soo Lee. Com- puting the shape of brain networks using graph filtration and gromov-hausdorff metric. Inter- national Conference on Medical Image Computing and Computer Assisted Intervention, pages 302–309, 2011.

[57] Hyekyoung Lee, Moo K. Chung, Hyejin Kang, Bung-Nyun Kim, and Dong Soo Lee. Discrim- inative persistent homology of brain networks. IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 841–844, 2011.

[58] Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, and Dong Soo Lee. Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging, 31(12):2267–2277, 2012.

[59] Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, and Dong Soo Lee. Weighted functional brain network modeling via network filtration. NIPS Workshop on Algebraic Topol- ogy and Machine Learning, 2012.

[60] Jure Leskovec, Lada A Adamic, and Bernardo A Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1), 2007.

[61] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007.

[62] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.

[63] Bruno L´evy. Laplace-beltrami eigenfunctions towards an algorithm that understands geometry. In IEEE International Conference on Shape Modeling and Applications, page 13, 2006.

17 [64] Shusen Liu, Dan Maljovec, Bei Wang, Peer-Timo Bremer, and Valerio Pascucci. Visualizing high-dimensional data: Advances in the past decade. IEEE Transactions on Visualization and Computer Graphics, 23(3):1249–1268, 2017.

[65] P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carls- son, and G. Carlsson. Extracting insights from the shape of complex data using topology. Scientific Reports, 3, 2013.

[66] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.

[67] Fintan McGee and John Dingliana. An empirical study on the impact of edge bundling on user comprehension of graphs. In Proceedings of the International Working Conference on Advanced Visual Interfaces, pages 620–627. ACM, 2012.

[68] Elizabeth Munch and Bei Wang. Convergence between categorical representations of Reeb space and mapper. In S´andorFekete and Anna Lubiw, editors, Proceedings of the 32nd Inter- national Symposium on Computational Geometry, volume 51 of Leibniz International Proceed- ings in Informatics (LIPIcs), pages 53:1–53:16, Dagstuhl, Germany, 2016. Schloss Dagstuhl– Leibniz-Zentrum fuer Informatik.

[69] Mark E. J. Newman. Fast algorithm for detecting in networks. Physical review E, 69(6):066133, 2004.

[70] Mark EJ Newman. Properties of highly clustered networks. Physical Review E, 68(2):026121, 2003.

[71] Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems, pages 849–856, 2002.

[72] Monica Nicolau, Arnold J. Levine, and Gunnar Carlsson. Topology based data analysis iden- tifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences, 108(17):7265–7270, 2011.

[73] Monica Nicolaua, Arnold J. Levine, and Gunnar Carlsson. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings National Academy of Sciences of the United States of America, 108(17):7265–7270, 2011.

[74] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.

[75] Mathew Penrose. Random geometric graphs. Oxford University Press, 2003.

[76] Giovanni Petri, Martina Scolamiero, Irene Donato, and Francesco Vaccarino. Networks and cy- cles: A persistent homology approach to complex networks. Proceedings European Conference on Complex Systems 2012, Springer Proceedings in Complexity, pages 93–99, 2013.

[77] Giovanni Petri, Martina Scolamiero, Irene Donato, and Francesco Vaccarino. Topological strata of weighted complex networks. PLoS ONE, 8(6):e66506, 2013.

[78] S. Unnikrishna Pillai, Torsten Suel, and Seunghun Cha. The Perron-Frobenius theorem: some of its applications. IEEE Signal Processing Magazine, 22(2):62–75, 2005.

18 [79] Virginia Pirino, Eva Riccomagno, Sergio Martinoia, and Paolo Massobrio. A topological study of repetitive co-activation networks in in vitro cortical assemblies. Physical Biology, 12(1), 2015. [80] Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. In International symposium on computer and information sciences, pages 284–293, 2005. [81] Luca Pretto. Analysis of web link analysis algorithms: The mathematics of ranking. In Maristella Agosti, editor, Information Access through Search Engines and Digital Libraries, pages 97–111. Springer, 2008. [82] Martin Reuter, Silvia Biasotti, Daniela Giorgi, Giuseppe Patan´e,and Michela Spagnuolo. Dis- crete laplace–beltrami operators for shape analysis and segmentation. Computers & Graphics, 33(3):381–390, 2009. [83] Matthew Richardson, Rakesh Agrawal, and Pedro Domingos. Trust management for the se- mantic web. In International semantic Web conference, pages 351–368. Springer, 2003. [84] Satu Elisa Schaeffer. Graph clustering. Computer science review, 1(1):27–64, 2007. [85] David Selassie, Brandon Heller, and Jeffrey Heer. Divided edge bundling for directional network data. IEEE Transactions on Visualization and Computer Graphics, 17(12):2354–2363, 2011. [86] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013. [87] Bernard W Silverman. Density estimation for statistics and data analysis, volume 26. CRC press, 1986. [88] Gurjeet Singh, Facundo M´emoli,and Gunnar Carlsson. Topological methods for the analysis of high dimensional data sets and 3d object recognition. Eurographics Symposium on Point-Based Graphics, 22, 2007. [89] Ashley Suh, Mustafa Hajij, Bei Wang, Carlos Scheidegger, and Paul Rosen. Persistent ho- mology guided force-directed graph layouts. Proceedings of IEEE Visualization Conference (InfoVis), 2019. [90] Yuanyuan Tian, Richard A Hankins, and Jignesh M Patel. Efficient aggregation for graph summarization. In Proceedings of the ACM SIGMOD international conference on Management of data, pages 567–580, 2008. [91] Brenda Y Torres, Jose Henrique M Oliveira, Ann Thomas Tate, Poonam Rath, Katherine Cumnock, and David S Schneider. Tracking resilience to infections by mapping disease space. PLoS biology, 14(4), 2016. [92] W. T. Tutte. How to draw a graph. Proceedings of the London Mathematical Society, s3- 13(1):743–767, Jan 1963. URL: http://dx.doi.org/10.1112/plms/s3-13.1.743, doi:10. 1112/plms/s3-13.1.743. [93] Tatiana Von Landesberger, Arjan Kuijper, Tobias Schreck, J¨ornKohlhammer, Jarke J van Wijk, J-D Fekete, and Dieter W Fellner. Visual analysis of large graphs: State-of-the-art and future research challenges. Computer Graphics Forum, 30(6):1719–1749, 2011.

19 [94] Scott White and Padhraic Smyth. A spectral clustering approach to finding communities in graphs. In Proceedings of the SIAM international conference on data mining, pages 274–285, 2005.

[95] Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.

20