Motif Simplification: Improving Network Visualization Readability

HCIL Tech Report Motif Simplification: Improving Network Visualization Readability with Fan and Parallel Glyphs Cody Dunne and Ben Shneiderman ) Fig. 1: The left bipartite sociogram shows edit histories of the Lostpedia wiki article entitled “Four-toed-statue”. The right sociogram shows the same network after replacing the common fan and 2-parallel motifs with simplified glyphs. Abstract— Network data structures have been used extensively in recent years for modeling entities and their ties, across diverse disciplines. Analyzing networks involves understanding the complex relationships between entities, as well as any attributes, statistics, or groupings associated with them. A widely used class of visualizations called sociograms excel at showing the network topology, attributes, and groupings simultaneously. However, many sociograms are not easily readable or difficult to extract meaning from because of the inherent complexity of the relationships and the number of items designers try to render in limited space. This paper introduces a technique called motif simplification that leverages the repeating motifs in networks to reduce visualization complexity and increase readability. We propose replacing motifs in the network with easily understandable glyphs that (1) require less screen space, (2) are easier to understand in the context of the network, (3) can reveal otherwise hidden relationships, and (4) result in minimal loss of fidelity. We tackle two frequently occurring and high-payoff motifs: a fan motif consisting of a fan of nodes with only a single neighbor connecting them to the network, and a parallel motif of functionally equivalent nodes that span two or more other nodes together. We contribute the design of representative glyphs for these motifs, algorithms for detecting them, a publicly available reference implementation, and initial case studies and user feedback that support the motif simplification approach. Index Terms—Network motif simplification, network analysis, social network, graph. 1 INTRODUCTION Networks have long been common data structures in Computer Sci- when faced with these varied and oftentimes immense datasets. vi- ence, but have only recently exploded into popular culture with pub- sualcomplexity.com provides many beautiful alternative network vi- lishers like the New York Times now frequently including elaborate sualizations, but one enduring visualization in particular models re- and interesting networks with their articles. Online communities like lationships using a node-link visualization or sociogram [1], where Facebook, MySpace, Twitter, Flickr, and mailing lists (to name only a nodes represent actors in a community and the links or edges indicate handful) enjoyed enormous growth over the last few years and provide relationships between individual actors [2]. Sociograms have only re- incredibly rich datasets of interpersonal relationships called social net- cently been established as tools for network analysis, but have already works, which social scientists are now fervently exploring. Networks been put to great effect. [3,4] successfully used sociograms to detect have also found applications in such diverse disciplines as Bioinfor- common social roles in online discussion newsgroups such as answer matics, Urban Planning, and Archeology. person and discussion person. Sociograms have also been applied to Analysis of network data requires knowledge of the connectivity, the study of relationships between political blogs during the 2004 U.S. clusters, and centrality of the nodes. Statistical analysis and conven- Presidential Election, showing the division between liberal and con- tional visualization tools like bar and pie charts are often inadequate servative communities as well as their internal interactions [5]. However, there is a huge array of possible sociograms for any given • Cody Dunne is with University of Maryland, E-mail: [email protected]. social network, many of which can be misleading or incomprehensi- • Ben Shneiderman is with University of Maryland, E-mail: ble. Visualizations of relational structures like social networks are only [email protected]. useful to the degree they “effectively convey information to the people that use them” [6]. In fact, the spatial layout of a sociogram can have a 1 profound impact on the detection of communities in the network and motif glyphs, followed in Section4 with the algorithmic details of the perceived importance of actors [7]. Significant thought must be detecting the fan and parallel motifs. We then discuss the NodeXL given to properly visualizing networks so that analysts will be able to reference implementation in Section5. Finally, Sections6 and7 de- understand and effectively communicate data like clusters, the paths scribe the visualization coverage metric, example applications to two and spans between them, and the importance of individual actors. network datasets, and initial user impressions of motif simplification. As manual layout of nodes in the sociogram is incredibly time con- suming to do well, a lot of effort has been put into developing au- 2 RELATED WORK tomated network layout algorithms and filtering tools. As the opti- Network analysis tools generally use sociograms, as in SocialAction mization of many readability metrics is NP-hard [6], layout algorithms [9]. Several alternate tools are built around matrix representations, like often use heuristics that produce suboptimal visualizations quickly. Matrix Zoom [10] and MatrixExplorer [11]. Both show the topology However, the results of applying a layout algorithm can vary greatly of small networks well but can be unreadable with a few thousand depending on the size and topology of the network, and the layout nodes. In order to quantify the effectiveness of sociograms, Purchase generated is highly dependent on the algorithm used. We believe that [12] developed a set of aesthetic criteria or readability metrics that state of the art layout algorithms alone are insufficient to consistently measure aspects of the visualization that worsen readabilitiy, such as produce understandable network visualizations. edge crossings. Dunne and Shneiderman [13] proposed extending One way forward is the use of aggregation, specifically by simpli- these readabilty metrics to specific nodes and edges, so as to iden- fying common repeating network structures called network motifs. tify problem areas in the drawing that users or a laytout algorithm can Large, complex network visualizations often have these repeated pat- fix. However no sufficiently fast automatic layout techniques exist to terns throughout because of either the network structure or how the leverage these metrics to create better sociograms. data was collected. Regardless of their cause, many frequently ex- One way to get an overview of a large network is to aggregate pressed motifs contain little information compared to the space they nodes based on the topology like in Ask-GraphView [14]. Similarly, occupy in the network visualization. Two network motifs in particular ManyNets [15] partitions networks according to topology or node at- plague social scientists, especially those using heterogeneous (multi- tributes, supporting easy statistical comparisons. These clusterings can ple node type) networks. First, fan motifs with a fan of leaf nodes show the aggregated topology of networks with hundreds of thousands connected via a single head node to the rest of the network can ac- of nodes, but require effective topological clustering techniques. Piv- count for a large portion of the visualization in many data sets. Sec- otGraph [16] also aggregates nodes by attributes and shows relation- ond, parallel motifs consist of functionally equivalent span nodes ships between aggregates using arcs, but does not allow for several that solely span two or more anchor nodes together. node types. NetLens [17] can show two node types and GraphTrail These two motifs are both visible in the bipartite network for the [18] allows an arbitrary number, but both focus on attribute compar- Lostpedia community shown on the left side of Fig.1. With such a isons at the expense of showing network topology. small network seeing the complete topology is easy. However, many Our chosen approach is to aggregate the network by the frequently of the networks scientists find interesting are much larger and more occurring motifs it contains. While the fan and parallel motifs we complex. Visualizations of these networks can be dominated by large target are quite prominent in social network datasets, there are many fans of nodes spread around the periphery, in one common example other motifs of interest, especially for biologists. Motif census (count- taking up 58% of screen real estate and wasting another 21% as empty ing the kinds of motifs) and analysis of them is used extensively to space (see Section 6.2.2). Parallel motifs connecting parts of the net- analyze the behavior of complex biologic networks, looking for feed- work can be hard to detect on their own, much less in the 33% of the forward loops and other repeated patterns that indicate underlying pro- screen remaining for the core network and occluded by fan motifs. cesses. For example, Milo et al. [19] used an approach that finds mo- This paper describes a technique we call motif simplification that tifs that appeared more frequently than expected in suitably random leverages these repeating motifs in large network visualizations to re- networks. They provide an extensive chart of motifs of three or four duce

Load more