HCIL Tech Report

Motif Simplification: Improving Network Visualization Readability with Fan and Parallel Glyphs

Cody Dunne and Ben Shneiderman

Fig. 1: The left bipartite sociogram shows edit histories of the Lostpedia wiki article entitled “Four-toed-statue”. The right sociogram shows the same network after replacing the common fan and 2-parallel motifs with simplified glyphs.

Abstract— Network data structures have been used extensively in recent years for modeling entities and their ties across diverse disciplines. Analyzing networks involves understanding the complex relationships between entities, as well as any attributes, statistics, or groupings associated with them. A widely used class of visualizations called sociograms excels at showing the relationships, attributes, and groupings simultaneously. However, many sociograms are not easily readable, or are difficult to extract meaning from, because of the inherent complexity of the relationships and the number of items designers try to render in limited space.

This paper introduces a technique called motif simplification that leverages the repeating motifs in networks to reduce visualization complexity and increase readability. We propose replacing motifs in the network with easily understandable glyphs that (1) require less screen space, (2) are easier to understand in the context of the network, (3) can reveal otherwise hidden relationships, and (4) result in minimal loss of fidelity. We tackle two frequently occurring and high-payoff motifs: a fan motif consisting of a fan of nodes with only a single neighbor connecting them to the network, and a parallel motif of functionally equivalent nodes that span two or more other nodes together. We contribute the design of representative glyphs for these motifs, algorithms for detecting them, a publicly available reference implementation, and initial case studies and user impressions that support the motif simplification approach.

Index Terms—Network motif simplification, network analysis, graph.

• Cody Dunne is with University of Maryland, E-mail: [email protected].
• Ben Shneiderman is with University of Maryland, E-mail: [email protected].

1 INTRODUCTION

Networks have long been common data structures in Computer Science, but have only recently exploded into popular culture, with publishers like the New York Times now frequently including elaborate and interesting networks with their articles. Online communities like Facebook, MySpace, Twitter, Flickr, and mailing lists (to name only a handful) enjoyed enormous growth over the last few years and provide incredibly rich datasets of interpersonal relationships called social networks, which social scientists are now fervently exploring. Networks have also found applications in such diverse disciplines as Bioinformatics, Urban Planning, and Archeology.

Analysis of network data requires knowledge of the connectivity, clusters, and of the nodes. Statistical analysis and conventional visualization tools like bar and pie charts are often inadequate when faced with these varied and oftentimes immense datasets. The site visualcomplexity.com provides many beautiful alternative network visualizations, but one enduring visualization in particular models relationships using a node-link visualization or sociogram [1], where nodes represent actors in a community and the links or edges indicate relationships between individual actors [2]. Sociograms have only recently been established as tools for network analysis, but have already been put to great effect. Prior work [3, 4] successfully used sociograms to detect common social roles in online discussion newsgroups, such as answer person and discussion person. Sociograms have also been applied to the study of relationships between political blogs during the 2004 U.S. Presidential Election, showing the division between liberal and conservative communities as well as their internal interactions [5].

However, there is a huge array of possible sociograms for any given social network, many of which can be misleading or incomprehensible. Visualizations of relational structures like social networks are only useful to the extent they “effectively convey information to the people that use them” [6]. In fact, the spatial layout of a sociogram can have a profound impact on the detection of communities in the network and the perceived importance of actors [7]. Significant thought must be given to properly visualizing networks so that analysts will be able to understand and effectively communicate data like clusters, the paths and spans between them, and the importance of individual actors.

As manual layout of nodes in the sociogram is incredibly time consuming to do well, a lot of effort has been put into developing automated network layout algorithms and filtering tools. As the optimization of many readability metrics is NP-hard [6], layout algorithms often use heuristics that produce suboptimal visualizations quickly. However, the results of applying a layout algorithm can vary greatly depending on the size and topology of the network, and the layout generated is highly dependent on the algorithm used. We believe that state of the art layout algorithms alone are insufficient to consistently produce understandable network visualizations.

One way forward is the use of aggregation, specifically by simplifying common repeating network structures called network motifs. Large visualizations often have these repeated patterns throughout because of either the network structure or how the data was collected. Regardless of their cause, many frequently expressed motifs contain little information compared to the space they occupy in the network visualization. Two network motifs in particular plague social scientists, especially those using heterogeneous (multiple node type) networks. First, fan motifs with a fan of leaf nodes connected via a single head node to the rest of the network can account for a large portion of the visualization in many data sets. Second, parallel motifs consist of functionally equivalent span nodes that solely span two or more anchor nodes together.

These two motifs are both visible in the bipartite network for the Lostpedia community shown on the left side of Fig. 1. With such a small network, seeing the complete topology is easy. However, many of the networks scientists find interesting are much larger and more complex. Visualizations of these networks can be dominated by large fans of nodes spread around the periphery, in one common example taking up 58% of screen real estate and wasting another 21% as empty space (see Section 6.2.2). Parallel motifs connecting parts of the network can be hard to detect on their own, much less in the 33% of the screen remaining for the core network and occluded by fan motifs.

This paper describes a technique we call motif simplification that leverages these repeating motifs in large network visualizations to reduce the overall complexity and increase readability. Instead of highlighting or summarizing the motifs found in a network, we propose new techniques for simplifying their graphical representations into representative glyphs that (1) require less screen real estate, (2) are easier to understand in the context of the network, (3) result in minimal loss of fidelity, and (4) can reveal otherwise hidden relationships. We discuss the design of representative glyphs for the fan and parallel motifs, as well as novel algorithms for detecting them. We also provide a new metric for measuring visualization coverage, so as to quantify the space saved through simplification. The techniques discussed in this paper are implemented and made publicly available as part of the free and open source NodeXL network analysis tool [8].

Specifically, the contributions of this paper are:

• A technique for simplifying sociogram network visualizations by replacing common motifs with easily understandable glyphs,
• The design of representative glyphs for the fan and parallel motifs that show the motif contents and underlying attributes,
• Algorithms for detecting the high-payoff fan and parallel motifs common to social network datasets,
• A new metric for measuring visualization coverage, and
• A free and open source reference implementation, made publicly available as part of NodeXL [8].

The remainder of this paper is organized as follows. Section 2 presents related work on the use of motifs for understanding networks. Next, Section 3 describes the overall motif simplification technique and design of our motif glyphs, followed in Section 4 with the algorithmic details of detecting the fan and parallel motifs. We then discuss the NodeXL reference implementation in Section 5. Finally, Sections 6 and 7 describe the visualization coverage metric, example applications to two network datasets, and initial user impressions of motif simplification.

2 RELATED WORK

Network analysis tools generally use sociograms, as in SocialAction [9]. Several alternate tools are built around matrix representations, like Matrix Zoom [10] and MatrixExplorer [11]. Both show the topology of small networks well but can be unreadable with a few thousand nodes. In order to quantify the effectiveness of sociograms, Purchase [12] developed a set of aesthetic criteria or readability metrics that measure aspects of the visualization that worsen readability, such as edge crossings. Dunne and Shneiderman [13] proposed extending these readability metrics to specific nodes and edges, so as to identify problem areas in the drawing that users or a layout algorithm can fix. However, no sufficiently fast automatic layout techniques exist to leverage these metrics to create better sociograms.

One way to get an overview of a large network is to aggregate nodes based on the topology, as in Ask-GraphView [14]. Similarly, ManyNets [15] partitions networks according to topology or node attributes, supporting easy statistical comparisons. These clusterings can show the aggregated topology of networks with hundreds of thousands of nodes, but require effective topological clustering techniques. PivotGraph [16] also aggregates nodes by attributes and shows relationships between aggregates using arcs, but does not allow for several node types. NetLens [17] can show two node types and GraphTrail [18] allows an arbitrary number, but both focus on attribute comparisons at the expense of showing network topology.

Our chosen approach is to aggregate the network by the frequently occurring motifs it contains. While the fan and parallel motifs we target are quite prominent in social network datasets, there are many other motifs of interest, especially for biologists. Motif census (counting the kinds of motifs) and analysis of them is used extensively to analyze the behavior of complex biologic networks, looking for feed-forward loops and other repeated patterns that indicate underlying processes. For example, Milo et al. [19] used an approach that finds motifs that appeared more frequently than expected in suitably random networks. They provide an extensive chart of motifs of three or four nodes, and describe their frequency in various biologic networks. Zhu et al. [20] provide an overview of the use of network motifs for analyzing biologic networks, while Luscombe et al. [21] and Ye et al. [22] both demonstrate specific applications of motif analysis.

To look for motifs larger than three or four nodes, Grochow and Kellis [23] developed a technique called symmetry-breaking that quickly finds motifs of various sizes. In applying their algorithm to the protein-protein interaction network of S. cerevisiae, a species of yeast, they discovered one motif that appeared 27,720 times but did not appear at all in suitably created random ensembles. This motif is composed of various overlapping combinations of 29 nodes that represent cellular transcription machinery.

Knowledge of the motifs present in a network can help predict behavior and the “structural signatures” of individual entities [4], but visualizing these motifs effectively is challenging. In one approach to visualizing these motifs, the matches to a selected motif are highlighted within the overall network visualization and can be drawn identically so they are easily spotted [24]. While highlighting the motifs can help biologists spot the locations of particular processes, it does little to reduce the clutter of a complex network visualization and can even reduce readability. Similarly, combining functionally equivalent nodes into meta-nodes is an aggregation technique often used to simplify the network being visualized, such as the greedy graph summarization approach by Navlakha et al. [25]. This includes functionally equivalent nodes like our parallel motif spans and fan leaves. However, without visible indications on the meta-nodes showing what patterns were compressed, it is difficult to understand the original network topology. Moreover, the greedy approach used pays no attention to the motifs present, focusing only on reducing the complexity of the network data structure. This greedy heuristic is more difficult for the user to understand, as the structure of the underlying nodes is not well defined.

While current tools can highlight detected motifs, there are few techniques for providing a graphical overview or summary of them. More importantly, we know of no approaches that leverage the motifs present to reduce the visual complexity of the network visualization.

3 NETWORK MOTIF SIMPLIFICATION

Many common motifs in social network datasets present little meaningful information, yet dominate much of the display space and can obscure the more interesting parts of the network. We believe that replacing these motifs with representative glyphs will create network visualizations that (1) require less screen real estate, (2) make motifs easier to understand in the context of the network, (3) result in minimal loss of fidelity, and (4) can reveal otherwise hidden relationships. We term this approach motif simplification.

We chose two motifs for our initial foray in motif simplification: the fan motif and parallel motif described in the introduction. These two motifs are prime simplification candidates for several reasons. For one, these motifs are quite common in the social network datasets that sociograms are frequently used for. While simple to understand on their own, these motifs can account for much of the visual complexity of a sociogram. The fan motifs especially can dominate much of the display space, obscuring any underlying relationships and reducing the space available for more interesting aspects of the network. The parallel motifs usually occupy less screen space than the fan motifs, but can contribute substantial complexity to the core of the sociogram.

For each motif we want to simplify, careful thought must be given to the design of a glyph to represent it. While any arbitrary motif can be shown as a simple meta-node, a representative glyph that reveals the properties of the motif will be easier to understand. For example, sizing the glyph proportional to the number of nodes it contains will help users understand important differences in scale. Similarly, the underlying attributes of the simplified nodes may be important for an analysis. Normally, users can use a color scale to show this information. Rather than hiding these colors by using a standard meta-node, we can aggregate the underlying attributes and show the aggregate representation on the same color scale.

3.1 Fan Glyph Design

We applied these design principles to the creation of a fan glyph to represent the fan motif. The basic design is shown in Fig. 2. We take the fan motif, with its green head node and orange leaf nodes, and replace the leaf nodes with a sector representation. The head node retains all of its attribute encodings, including any shape, color, size, opacity, or label specified by the user. Note that the head node of a fan motif may be connected to other nodes not participating in the motif, shown in gray. By default, the color of the sector is chosen to represent the motif uniquely. This basic representation is sufficient for showing the presence of a fan motif, but does not provide any information about the number of nodes it contains or their attributes.

Fig. 2: A fan motif (left), simplified using a fan glyph (right).

To show the number of nodes in a fan motif, we scale the arc of the sector representation from 10° to 120°. While we experimented with other arc ranges, we found that this large range most effectively revealed differences between motifs and helped users notice their presence. We use a relative scale for the arcs, such that the largest motif in the network always has an arc of 120° regardless of the number of nodes it contains. While an absolute scale would enable motif comparisons across multiple networks, the wide range of motif sizes we encountered pushed us to use a relative scale so that size differences within a network would be visible. To allow easier size comparison between fans, we fix the left side of the sector vertically, moving only the right side clockwise.

Fig. 3: Three fan motifs (left) and simplified fan glyph versions (right).

This arc scaling is shown in Fig. 3 for three fan motifs and their associated glyphs. The top fan motif has six leaf nodes, the most in the network, so its glyph is given an arc of 120°. The second motif has three leaf nodes, the fewest in the network, so its glyph is given an arc of 10°. The bottom motif has four leaf nodes, so it is given an arc of about 47°.

There are several advantages to this representation of the motif size. Foremost, if the arc of a sector increases, its area increases proportionally. Other size options such as changing sector radius would increase area quadratically. Moreover, these fan glyphs can be easily compared at a glance or superimposed, such that the exact size ratios are more clearly elucidated.

In addition to the membership of a fan motif, we would like to show information about the member nodes’ attributes. If the visualization has a color scale for the nodes based on numerical attributes, we modify the fan glyph to represent the average value of its leaf nodes (or any other function of the attributes). Taking the average of a set of node colors may be meaningless in some cases, like when values are binned, and misleading in others, like when a log scale is used. Instead, we query the original values of the underlying nodes that were used for the color coding, and compute the average of these values. We then use the original node color scale to find the proper color for the average. This requires that the implementation retain information about the applied color scale, so it can be reused to color the fan glyph. As head nodes are generally important entities in the network, we retain their original color coding and use the leaf average for only the sector part of the fan glyph.

Fig. 3 shows three examples of the fan glyph color coding. The nodes are colored on a scale from orange to purple, where orange represents low values and purple shows high values. The leaves of the top fan motif have low orange values, so their fan glyph is filled with orange. The second fan motif has a range of colors in its leaves, and the fan glyph is filled with a reddish midrange color. The bottom fan motif has generally high purple values, thus the associated fan glyph is filled with purple.

One disadvantage of this fan glyph design is that it hides any attribute encoding used by the edges connecting the fan head to its leaf nodes. This edge information could be encoded by the sector radius; however, that would affect perceived node count as well. Instead, users interested in edge attributes may prefer the alternate fan glyph shown in Fig. 4. This fan glyph uses a meta-edge to connect the fan head to the sector, showing the aggregate of the underlying edges. The meta-edge is colored and sized based on the average of its underlying values, as with the original sector coloring. However, this alternate version requires more display space and reduces the simplicity and understandability of the glyph.

Fig. 4: Alternate fan glyphs for showing edge color and size coding.

Fig. 6: A 3-parallel motif (left), simplified with a parallel glyph (right).
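
The relative arc scaling of the fan glyph in Section 3.1 can be illustrated with a short Python sketch. It assumes linear interpolation between the smallest and largest fan in the network, which is consistent with the 10°, 47°, and 120° arcs reported for Fig. 3 but is not stated explicitly in the text. The names `fan_arc_degrees` and `sector_area` are hypothetical and are not taken from NodeXL.

```python
import math


def fan_arc_degrees(leaf_count, smallest, largest, min_arc=10.0, max_arc=120.0):
    """Map a fan's leaf count onto the sector arc range (relative scale).

    Assumption: linear interpolation between the smallest and largest fan
    in the network; the function and parameter names are hypothetical.
    """
    if largest == smallest:
        return max_arc  # assumption: equally sized fans all get the largest arc
    fraction = (leaf_count - smallest) / (largest - smallest)
    return min_arc + fraction * (max_arc - min_arc)


def sector_area(radius, arc_degrees):
    """Area of a circular sector: (1/2) * r^2 * theta, with theta in radians."""
    return 0.5 * radius ** 2 * math.radians(arc_degrees)


# The three fans of Fig. 3 have six, three, and four leaf nodes.
arcs = [fan_arc_degrees(count, smallest=3, largest=6) for count in (6, 3, 4)]
```

The `sector_area` helper makes the stated advantage concrete: doubling the arc doubles the area, while doubling the radius quadruples it.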

3.2 Parallel Glyph Design

We used the same design principles to create a parallel glyph to represent the parallel motif. An example of the parallel glyph is shown in Fig. 5. Each parallel motif consists of a set of two or more anchor nodes, shown in green, and a set of functionally equivalent span nodes shown in orange. Note that the anchor nodes of a parallel motif may be connected to other nodes not participating in the motif (gray), or to each other. Moreover, the head node of a fan motif can be an anchor node for a parallel motif. The parallel glyph replaces the span nodes with an arch that is meant to signify their connecting nature. This arch is connected to the anchor nodes via meta-edges, which represent the aggregate of the edges between specific anchor nodes and all the span nodes.
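
As a rough illustration of how one meta-edge per anchor could be derived, the following Python sketch averages a numeric edge attribute over the edges between each anchor and the span nodes. The average is the aggregate the paper uses for meta-edge size and color. The function name `meta_edge_values`, the `frozenset` edge keys, and the sample weights are all assumptions made for this example.

```python
from statistics import mean


def meta_edge_values(anchors, span_nodes, edge_values):
    """Average the edge attribute between each anchor and the span nodes.

    edge_values maps frozenset({u, v}) to a numeric attribute; this data
    layout is hypothetical and chosen only to keep the sketch short.
    """
    result = {}
    for anchor in anchors:
        values = [
            edge_values[frozenset((anchor, span))]
            for span in span_nodes
            if frozenset((anchor, span)) in edge_values
        ]
        if values:
            result[anchor] = mean(values)
    return result


# A 2-parallel motif with anchors X and Y and span nodes s1 and s2.
weights = {
    frozenset(("X", "s1")): 1.0,
    frozenset(("X", "s2")): 3.0,
    frozenset(("Y", "s1")): 8.0,
    frozenset(("Y", "s2")): 10.0,
}
meta = meta_edge_values(["X", "Y"], ["s1", "s2"], weights)
```

In this example the meta-edge to Y carries a much larger average than the one to X, which corresponds to the unbalanced meta-edges the paper illustrates with the middle motif of Fig. 7.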

Fig. 7: Three 2-parallel motifs (left) and their parallel glyphs (right).

Fig. 5: A 2-parallel motif (left), simplified with a parallel glyph (right).

Parallel motifs have the added complexity of a dimension, denoted D, that indicates the number of anchors the motif has. D can be any integer two or greater, though the frequency of the motifs generally decreases as D increases. Our previous examples showed only 2-parallel motifs with two anchors, though in many networks there are plenty of 3- and 4-parallel motifs as well. We chose to use the same span node representation for any dimension of parallel motif, connecting the meta-edges to the arch at various points based on the anchor locations. An example simplification of a 3-parallel motif with three green anchors is shown in Fig. 6. We chose this consistent representation because we believed that multiple versions of the same motif would add too much visual complexity to be easily intelligible.

As with the fan glyph, we show the number of span nodes in each parallel glyph by scaling the size of the arch. Again, we use a relative scale, where the same size is always used for the largest arch regardless of the number of span nodes it represents. Fig. 7 shows three example 2-parallel motifs, and the scaled arches in their simplified 2-parallel glyphs.

We use the same color encoding for the arch as for the fan glyph sector (Section 3.1). By default, the color of the arch is chosen to represent the motif uniquely. If a node color scale exists, the scale will be used to color the arch based on the average of the span node values. The anchor nodes of the parallel motif are visibly distinct from the arch and retain their original visual encodings. Moreover, the meta-edges in the parallel glyph are sized and colored using the average of any underlying edge attribute values. For example, many dark, thick edges like in the bottom motif of Fig. 7 will be shown as a dark, thick meta-edge. Likewise, many light, thin edges will be simplified into a light and thin meta-edge like the top motif. However, the meta-edges connecting to an arch can be unbalanced as with the middle motif, where the connections to one anchor are substantially darker or thicker than the other connections. While the size and color of the arch represent the span nodes and their attributes, the thickness and color of the meta-edges represent the underlying span edges and their attributes.

Our current parallel glyph uses an arch to represent the span nodes, though we previously experimented with a diamond representation like Fig. 8. The advantage of the diamond representation is that, like the fan glyph, it is easily understandable by users and makes it easier to compare motif sizes. Moreover, it can be scaled on one axis to represent the number of span nodes with a proportional change in area. However, we found that users would easily confuse the parallel glyph with the diamond node shape often used to show several types of nodes with shape coding. While the arch is more complex to scale appropriately and makes it harder to compare sizes, it is much more noticeable and visually distinct.

Fig. 8: Alternate less distinct parallel glyph for better size scaling.
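
The color rule shared by the fan sector and the arch (average the original attribute values, then apply the existing color scale) can be sketched in Python as follows. The logarithmic scale, the function names, and the sample values are assumptions chosen to show why averaging positions on the color scale can mislead; the sketch returns a position between 0.0 (low color) and 1.0 (high color) rather than an actual color.

```python
import math


def log_scale_position(value, low, high):
    """Position of a value on a log color scale, from 0.0 (low) to 1.0 (high)."""
    return (math.log(value) - math.log(low)) / (math.log(high) - math.log(low))


def glyph_scale_position(values, low, high):
    """Average the raw attribute values first, then apply the color scale."""
    average = sum(values) / len(values)
    return log_scale_position(average, low, high)


# Two span (or leaf) nodes at the extremes of a hypothetical 1..100 log scale.
values = [1.0, 100.0]
from_raw_values = glyph_scale_position(values, low=1.0, high=100.0)
from_colors = sum(log_scale_position(v, 1.0, 100.0) for v in values) / len(values)
```

Here `from_colors` sits exactly in the middle of the scale, while `from_raw_values` lies much closer to the high end, because the arithmetic mean of 1 and 100 is 50.5 and a log scale places that value near the top.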


3.3 Overlapping Motifs

While each motif glyph is individually useful, they will be more effective when used in combination to substantially simplify the visualization. In the ideal case motifs are non-overlapping and easily transformed into glyphs. However, many networks like the Lostpedia example in Fig. 1 have fan motif heads that also serve as parallel motif anchors. The design of any motif glyphs must take these overlaps into account, as we have for the fan and parallel glyphs. As the fan motif heads and parallel motif anchors are represented individually, their functionality can be combined. As we show later in Section 4, our definitions of fan and parallel motifs do not allow any other overlap except in degenerate cases. However, more complicated motifs like cliques may have complexities that require careful consideration, such as the order and choice of motif simplification when a node is a member of multiple cliques.

3.4 Glyph Interactivity

While the motif glyphs we described before can be effective tools for simplifying a network visualization, we would like to make sure that they are easily understandable and investigable. One important aspect of this is to ensure that users can switch between the original and simplified versions interactively. Users may wish to simplify the entire network, or only a particular subset of motifs. Likewise, they may want to expand the entire network to see the original visualization, or only expand a selected motif they are interested in understanding.

Moreover, direct manipulation of the motif glyphs and underlying nodes is an effective way of exploring the network. This includes allowing users to adjust the node or glyph placement, as well as highlight incident edges or adjacent nodes through simple context menus. To retain the ability to compare across networks, and to aid in more exact comparisons within networks, we added tooltips to the motif glyphs that display some of their contents. For example, a fan motif may have a tooltip like “Fan motif: 5 leaf nodes with head node ‘Sean’” while one for a 2-parallel motif could read “2-Parallel motif: 7 span nodes anchored by ‘Susan’ and ‘Carol’”.

Additionally, automatic layout algorithms may be effective for laying out the reduced networks resulting from these motif simplifications. Ideally the layout would take the shape and size of the glyphs into account, in addition to the number of edges in any meta-edges. However, standard layout algorithms are still useful in many cases.

4 MOTIF DETECTION

In this section we detail algorithms for detecting and recording the fan and parallel motifs in a network. We use the terminology of a network or graph G with a set of nodes G.nodes, and each node n has a set of adjacent nodes n.neighbors. The size of each of these node sets, say s, is denoted as |s|.

4.1 Fan Motif

Our approach to detecting all the fan motifs in a network is detailed in Algorithm 1, which has a run time complexity of O(|G.nodes| × average neighbor count). The algorithm first passes through all the nodes in the network, searching for potential fan heads. Each fan head must have two or more neighbors to exclude the degenerate barbell case (Line 3), though this criterion could be increased to find only larger fans. For each potential fan head, we then search through the set of its neighbors to find any leaf nodes connected only to it (Line 5). Each of these leaf nodes is added to the set of potential leaves. If two or more leaves have been detected in the neighbor set, the found fan motif is acceptable and recorded (Line 10).

The differing neighbor count criteria for head and leaf nodes in Algorithm 1 prohibit any overlapping motifs from being detected. However, please note that we are using |n.neighbors| to show the size of the neighbor set of n, which may differ from n’s degree if there are overlapping edges. For example, in a network with directed edges a leaf node may have two overlapping edges connecting it to the head node, one for each direction. Moreover, an undirected network with several edge types may have overlapping edges of differing types. Some algorithms for computing degree would return higher values in these cases than the actual number of neighboring nodes.

Algorithm 1 Fan motif detection algorithm. Time complexity: O(|G.nodes| × average neighbor count)
 1: procedure DETECTFANS
 2:   for all n ∈ G.nodes do
 3:     if |n.neighbors| ≥ 2 then
 4:       leaves ← ∅
 5:       for all nbr ∈ n.neighbors do
 6:         if |nbr.neighbors| = 1 then
 7:           leaves.add(nbr)
 8:         end if
 9:       end for
10:       if |leaves| ≥ 2 then
11:         RECORDFAN(n, leaves)
12:       end if
13:     end if
14:   end for
15: end procedure
16: procedure RECORDFAN(head, leaves)
17:   ··· ▷ Record a given fan motif
18: end procedure

4.2 D-Parallel Motif

For our discussion of parallel motif detection we use the same notation as Section 4.1. A parallel motif also has a dimension, denoted D, that indicates the number of anchors it has. D can be any integer two or greater, though the frequency of the motifs generally decreases as D increases. For example, Fig. 1 shows several 2-parallel motifs simplified with glyph representations, in addition to several 3- and 4-parallel motifs in the center that were not simplified.

Our algorithm for detecting D-parallel motifs is shown in Algorithm 2, with parameters D-min and D-max to indicate the range of dimensions to search for. In most of our examples we searched for only 2-parallel motifs, with D-min = D-max = 2. The run time complexity of this algorithm does not vary with any reasonable range of dimensions, and is also O(|G.nodes| × average neighbor count).

Parallel motifs are not as straightforward to detect as fan motifs, despite the algorithm having the same run time complexity. Algorithm 2 is broken into several procedures and a class to store the details for each potential parallel motif. The detect loop in the algorithm (Line 3) passes through all nodes in the network, searching for potential span nodes. Each span node must have between D-min and D-max neighbors, which must be anchor nodes. We require a minimum of two span nodes for the parallel motif, so each anchor node must have two or more neighbors itself (Line 7). At least two of the neighbors are span nodes, but the remainder can be connections to the rest of the network or other anchor nodes in the motif.

If all the anchor nodes check out, the span node is added to a parallel motif (Algorithm 2, Line 10) using the ADDSPAN procedure (Line 28). This motif can be new or an existing one with the same set of anchors. All existing motifs are stored in a map (Line 2), using a string representation of the anchors as a key and an instance of the PARALLELMOTIF class (Line 35) as the associated value. This allows speedy lookup of each motif given a sorted anchor set. Note that the anchor set and its string representation must be sorted so as to avoid creating separate motifs for identical anchor sets whose anchors were found in a different order.

After searching for all potential span nodes, Algorithm 2 requires an additional pass over the detected parallel motifs to ensure that (1) they have two or more span nodes and (2) they do not overlap with other parallel motifs. The filter loop on Line 15 goes through each potential PARALLELMOTIF instance in the map to verify that it passes these two criteria.
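
The detection procedures of Algorithm 1 and Algorithm 2 can be sketched compactly in Python. This is an illustrative sketch, not the NodeXL implementation: the function names `detect_fans` and `detect_parallels` and the input format (a dictionary mapping each node to its set of neighbors) are assumptions. Following the prose description, a span node is added only after all of its anchors pass the neighbor count check, and a tuple of sorted anchors stands in for the string key.

```python
from collections import defaultdict


def detect_fans(neighbors):
    """Return {head: sorted leaf nodes} for every fan motif (cf. Algorithm 1)."""
    fans = {}
    for node, nbrs in neighbors.items():
        if len(nbrs) < 2:  # a fan head needs two or more neighbors
            continue
        # Leaves are neighbors whose only neighbor is this head.
        leaves = sorted(n for n in nbrs if len(neighbors[n]) == 1)
        if len(leaves) >= 2:
            fans[node] = leaves
    return fans


def detect_parallels(neighbors, d_min=2, d_max=2):
    """Return [(anchors, span nodes)] for D-parallel motifs (cf. Algorithm 2)."""
    candidates = defaultdict(list)  # sorted anchor tuple -> span nodes
    for node, nbrs in neighbors.items():
        if not d_min <= len(nbrs) <= d_max:
            continue
        # Every anchor must have two or more neighbors of its own.
        if all(len(neighbors[anchor]) >= 2 for anchor in nbrs):
            candidates[tuple(sorted(nbrs))].append(node)

    motifs, used = [], set()
    for anchors, spanners in candidates.items():
        if len(spanners) < 2:
            continue  # too few span nodes
        if any(span in used for span in spanners):
            continue  # overlaps an already recorded motif; keep the first
        motifs.append((anchors, spanners))
        used.update(spanners)
        used.update(anchors)
    return motifs


# Hypothetical example: head H with leaves a, b, c; span nodes s1, s2 join X and Y.
example = {
    "H": {"a", "b", "c", "X"},
    "a": {"H"}, "b": {"H"}, "c": {"H"},
    "X": {"H", "s1", "s2"},
    "Y": {"s1", "s2"},
    "s1": {"X", "Y"}, "s2": {"X", "Y"},
}
# The isolated ring A-B-C-D-A, where two 2-parallel motifs would overlap.
ring = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}

fans = detect_fans(example)
parallels = detect_parallels(example)
ring_parallels = detect_parallels(ring)
```

Storing neighbors as sets means overlapping edges between the same pair of nodes count once, matching the use of |n.neighbors| rather than degree. On the ring, both anchor pairs qualify as candidates, and the overlap check keeps only the one detected first.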

The first criterion, the minimum number of span nodes, could be increased if only larger, higher-payoff motifs are of interest (Lines 7, 17). The sole example we have found that matches the second criterion, parallel motif overlap, is a ring of four nodes A−B−C−D−A isolated from the rest of the network. In this case it is unclear whether to choose A & C or B & D as the 2-parallel motif anchors, as we do not allow overlap. As there may be other yet undetected examples of overlap that need to be caught, we chose a general overlap detection approach that compares each node in a motif to all nodes in already detected motifs (Lines 14, 18, 24). From a set of overlapping motifs we choose to keep the first detected; however, if more overlapping cases emerge, a choice heuristic based on anchor importance metrics may be necessary. After passing the minimum span count and overlap detection checks, the detected parallel motif is then recorded (Line 23).

Algorithm 2 D-Parallel motif detection algorithm. [D-min, D-max] is the range of dimensions of the parallel motifs to find (the number of anchors). Time complexity: O(|G.nodes| × average neighbor count).
 1: procedure DETECTPARALLELS(D-min, D-max)
 2:   parallels ← Map⟨string, PARALLELMOTIF⟩
 3:   detectLoop:
 4:   for all n ∈ G.nodes do
 5:     if |n.neighbors| ∈ [D-min, D-max] then
 6:       for all nbr ∈ n.neighbors do
 7:         if |nbr.neighbors| < 2 then
 8:           continue detectLoop
 9:         end if
10:         ADDSPAN(n.neighbors.sorted, n, parallels)
11:       end for
12:     end if
13:   end for
14:   usedNodes ← ∅
15:   filterLoop:
16:   for all p ∈ parallels.values do
17:     if |p.spanners| ≥ 2 then
18:       for all s ∈ p.spanners do
19:         if s ∈ usedNodes then
20:           continue filterLoop
21:         end if
22:       end for
23:       RECORDPARALLEL(p)
24:       usedNodes.addAll(p.spanners ∪ p.anchors)
25:     end if
26:   end for
27: end procedure
28: procedure ADDSPAN(anchors, spanner, parallels)
29:   key ← string(anchors)
30:   if key ∉ parallels then
31:     parallels[key] ← new PARALLELMOTIF(anchors)
32:   end if
33:   parallels[key].add(spanner)
34: end procedure
35: class PARALLELMOTIF
36:   dimension ← 0
37:   anchors ← ∅, spanners ← ∅
38:   procedure PARALLELMOTIF(new-anchors)
39:     dimension ← |new-anchors|
40:     anchors ← new-anchors
41:   end procedure
42: end class
43: procedure RECORDPARALLEL(parallelMotif)
44:   ··· ▷ Record a given parallel motif
45: end procedure

Fig. 9: An image of the standard NodeXL workspace, showing U.S. Senate voting patterns from 2007. The left view shows the worksheets that store the network and its attributes, while the right pane shows a sociogram of the network.

5 NODEXL IMPLEMENTATION

We have built a reference implementation of our motif simplification approach and made it publicly available as part of the NodeXL network analysis tool [8, 26]. NodeXL is a free and open source template for Excel 2007/2010 that is tailored to be “your first network analysis tool.” The basic interface of NodeXL is shown in Fig. 9. The left side provides several worksheets in an Excel workbook that represents the network: one each for the nodes, edges, and any groups. Each worksheet has several columns, including basic information about the network like the nodes and edges between them. Additionally, there are places to insert columns for node or edge attributes and calculated metrics, as well as columns that control the visual display of each network item. These include color, shape, size, label, tooltip, display position, and the like. Any of these visual properties can be automatically filled based on the metric or attribute columns using a special autofill dialog. Moreover, standard Excel formulas or macros can be used for arbitrary calculations and scales. The Excel ribbon is customized with a new tab for many of the common operations users perform on networks, including the autofill feature.

The visualization pane shown on the right of Fig. 9 displays a sociogram based on the network in the workbook. Whenever the contents of the workbook are updated, the visualization pane can be updated using a button. The pane also provides users with several automatic layout algorithms to arrange the network, and any automatic or manual adjustments to the node positions are stored in the workbook as well. Moreover, the contents of the visualization can be filtered using a dynamic filters dialog.

The worksheet view and the visualization pane are connected using brushing, where any selection in one is reflected in the other. Clicking a node in the visualization or dragging a box around several causes the associated rows to be selected in the nodes worksheet. Likewise, any incident edges are selected in the edges worksheet. The reverse is also true. Any nodes or edges selected in the worksheets are highlighted in the visualization pane as well.

5.1 Motif Simplification in NodeXL

NodeXL is widely used in many disciplines and has a full-time developer as well as a team of volunteer advisors, including the authors of this paper. Many introductory courses on network analysis have used NodeXL and its companion book [27] as part of their curriculum (see http://goo.gl/oa4tg), and

user studies have shown NodeXL to be effective in these situations [28, 29]. Given that these users generally have little prior knowledge about network visualization readability, we believe that they will particularly benefit from our interactive motif simplification techniques.

We have integrated our motif simplifications into the standard NodeXL groups infrastructure, which stores groups using two worksheets: (1) Groups, which contains a row for each group and its attributes, and (2) Group Vertices, where each row maps an individual grouped node to its associated group. These worksheets can be populated automatically in a variety of manners, including detection of topological clusters, exact-value attribute groupings, connected components, and now the fan and parallel network motifs. The NodeXL group model allows for nodes that are in no group at all, which is important for motif simplification as not every node in the network is part of a motif. Note, however, that this group model does not allow overlapping groups, which means that special care must be given to the definition of what members of each motif constitute the group in the worksheets.

In the group worksheets users can interactively edit the labels, attributes, visual encoding, and membership of specific groups; remove groups completely; or even create custom sets of groups by editing the worksheets or by visual interaction with the sociogram visualization. Moreover, automated statistics can be computed for each group and added to the Groups worksheet, including node & edge counts, geodesic distances, and graph density; as well as the number of edges between pairs of groups in a special Group Edges worksheet.

After the groups have been computed or entered into the worksheets manually, users can display them in the visualization pane. When users select a group in the worksheet, all its member nodes are selected in the visualization. Likewise, for any nodes selected in the visualization users can select any groups in the worksheet that contain them using the ribbon menu. By default, groups are shown in their original expanded form based on the current layout algorithm, with categorical color and shape coding so as to distinguish them from each other. However, users can switch between the original expanded form and an alternate collapsed form for specific selected groups or all groups. This is done using the context menu in the visualization pane or the ribbon groups menu.

The default collapsed form for groups is a meta-node representation of the same categorically coded shape with a plus sign inside to indicate its status (e.g., ⊕), sized proportional to the number of nodes the group contains and with any associated label next to it. However, the groups for our motifs use their representative glyphs that were described in Section 3. When a collapsed group is selected in the visualization pane it is also selected in the Groups worksheet, and its position in the visualization can be adjusted with the mouse. These collapsed representations are by default colored using the same categorical coloring as for the expanded version so the association between views can be easily identified. Through an option in the groups menu, users can switch from the default categorical colors and shapes to the underlying attribute encodings the user specified for the nodes. This updates all collapsed motifs so that they show the aggregate attribute information about the underlying nodes they represent (see Section 3 for details).

6 QUANTIFYING EFFECTIVENESS
To evaluate the effectiveness of the motif simplifications, we first measured the changes between several visualizations before and after motif simplification. Some of these measures are straightforward and based on the number of visual items shown, such as the number of motifs detected of each type and the change in the number of displayed nodes, meta-nodes, edges, and meta-edges between visualizations. Two other metrics we use are readability metrics that measure the amount of node-node overlap and edge crossing in the visualizations [13], both of which are common problems sociograms face. Additionally, we define a new measure to quantify the amount of the visualization space used.

Metric (before ⇒ after)   Lostpedia        VOSON
Number of nodes           513 ⇒ 25         3958 ⇒ 559
Number of edges           586 ⇒ 40         4380 ⇒ 765
Graph density             0.00446          0.00056
Fan motifs                4                16
2-parallel motifs         4                24
Fan sizes                 7–247            17–852
2-parallel sizes          7–28             2–50
Node-node overlap         0.981 ⇒ 0.983    0.709 ⇒ 0.971
Edge crossing             0.999 ⇒ 0.917    0.989 ⇒ 0.910
Coverage                  0.179 ⇒ 0.046    0.456 ⇒ 0.090

Table 1: Motif simplification results on two network visualizations

6.1 Visualization Coverage Metric ℵvc
The visualization coverage or ink metric, denoted ℵvc, is our attempt to quantify the amount of screen space used by the visual items in a visualization compared to the entire space available. It is formulated as the area occupied by all visual items divided by the area of the screen space. The objective of this metric is to measure the amount of theoretically available screen space, so as to quantify the reduction in ink presented to the user after filtering or motif simplification. It can also measure the reduction in ink by using aggregate edges (or no edges) between groups in a meta-layout like NodeXL’s Group-in-a-Box layout [30].

Here we use a notation of a network or graph G with |G.nodes| nodes and |G.edges| edges and a network visualization V(G). Each individual node n ∈ G.nodes and edge e ∈ G.edges is indexed using subscripts (e.g., n_i, e_j). For any node, edge, or visualization k, bounds(k) indicates a bounding shape b for that item in the visualization, and area(b) denotes the area of that bounding shape.

The visualization coverage metric ℵvc is defined as follows:

\begin{align}
b_n &= \bigcup_{n \in G.nodes} \mathrm{bounds}(n) \tag{1} \\
b_e &= \bigcup_{e \in G.edges} \mathrm{bounds}(e) \tag{2} \\
a &= \mathrm{area}(b_n \cup b_e) \tag{3} \\
n_{amax} &= \operatorname*{argmax}_{n_i \in G.nodes} \mathrm{area}(\mathrm{bounds}(n_i)) \tag{4} \\
e_{amax} &= \operatorname*{argmax}_{e_j \in G.edges} \mathrm{area}(\mathrm{bounds}(e_j)) \tag{5} \\
a_\Delta &= \max(n_{amax}, e_{amax}) \tag{6} \\
a_{max} &= \mathrm{area}(\mathrm{bounds}(V(G))) \tag{7} \\
\aleph_{vc} &= \frac{a - a_\Delta}{a_{max}} \tag{8}
\end{align}

First, a union is computed of all the node bounding shapes and edge bounding shapes in the visualization, including all meta-nodes and meta-edges. In order for the metric to have a range of [0,1], this area a must have the maximum node or edge area a∆ subtracted from it. This quantity is then divided by the total visualization area.

6.2 Case Studies
In our efforts to understand the impact of motifs in network visualizations we have explored several example datasets and their motif simplification versions. The visualization coverage and group overlap metrics, along with several other measures, were applied to two network visualizations of varying complexity before and after motif simplification. The networks are described in the following sections, and their results are shown in Table 1. We also tested several other networks not described here, including one with 7124 nodes and 16,109 edges that had 529 detected motifs. In that case the motif simplification eliminated 3853 nodes and 1437 edges, and seemed effective at revealing previously unseen features.
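To make the detection step behind these case studies more concrete, the following is a minimal Python sketch of the D-parallel detection procedure (Algorithm 2). It is an illustration under stated assumptions rather than the NodeXL implementation: the function name `detect_parallels`, the adjacency-dictionary input, and the `min_spanners` parameter are hypothetical, and the sketch registers a span node only after all of its anchors have passed the neighbor-count check.

```python
from collections import defaultdict


def detect_parallels(neighbors, d_min=2, d_max=2, min_spanners=2):
    """Sketch of D-parallel motif detection on an undirected graph.

    neighbors: dict mapping each node to the set of its adjacent nodes
               (hypothetical input format, not NodeXL's data model).
    Returns a list of (anchors, spanners) pairs, where anchors is a sorted
    tuple and spanners is a set.
    """
    # Group candidate span nodes by their (sorted) anchor set.
    candidates = defaultdict(set)
    for node, nbrs in neighbors.items():
        if not d_min <= len(nbrs) <= d_max:
            continue
        # Assumption: skip the node if any anchor has fewer than 2 neighbors.
        if any(len(neighbors[anchor]) < 2 for anchor in nbrs):
            continue
        candidates[tuple(sorted(nbrs))].add(node)

    # Filter: enforce a minimum span count and disallow overlap with
    # motifs that were recorded earlier (the first detected motif wins).
    used = set()
    motifs = []
    for anchors, spanners in candidates.items():
        if len(spanners) < min_spanners:
            continue
        if used & spanners:
            continue
        motifs.append((anchors, spanners))
        used |= spanners | set(anchors)
    return motifs
```

On the isolated four-node ring A−B−C−D−A, with nodes supplied in the order A, B, C, D, this sketch keeps the first candidate it encounters (anchors B and D, span nodes A and C) and rejects the overlapping alternative. Raising `min_spanners` corresponds to restricting detection to larger motifs.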

Fig. 10: This visualization represents the network of web pages connected to voson.anu.edu.au obtained by a web crawl, modified from Fig. 12.9 of the NodeXL Book [27, p. 192]. A similar graph for wiki structure is shown on p. 259.

Fig. 11: The same as Fig. 10, with each fan and 2-parallel motif shown in a distinct color and shape.

6.2.1 Lostpedia Wiki Edits
One example we investigated is shown in Fig. 1. The network represents the bipartite network for the Lostpedia wiki community collected by Beth Foss in Derek Hansen’s Communities of Practice class. Page nodes are shown as boxes with labels and are connected by the contributors editing them, with some contributors editing only one page and other users editing two or more. Contributors are colored by their total number of edits.

The right side of Fig. 1 shows the simplified version of the network, with specific measures shown in Table 1. There was a substantial reduction in both node count, from 513 to 25, and edge count, from 586 to 40, with several large fans contributing the most to the simplification. Overall node overlap improves slightly, but the original visualization was rather good to start with. The edge crossing worsens, though this is due to the high connectivity of the remaining nodes and motifs. The simplified visualization occupies only a quarter of the original screen space, and additional layout would improve its presentation substantially. While these simplifications are not completely necessary to understand such a small and well-arranged visualization, they appear effective and understandable.

6.2.2 VOSON Web Crawl
A larger real-world dataset we encountered is shown in Fig. 10, which we modified from the NodeXL book, Fig. 12.9 [27, p. 192]. This network of 3958 web pages and the 4380 links connecting them was collected by crawling sites connected to voson.anu.edu.au. Nodes are colored based on their betweenness centrality. It is immediately evident that large fans of nodes dominate the periphery of the visualization. This is partially because the NodeXL [26] implementation of the layout algorithm used, Fruchterman-Reingold [31], tends to create elliptical layouts within the rectangular visualization space. However, these kinds of structures tend to dominate network visualizations regardless of the layout algorithm used. For example, Fig. 10 shows the same graph using the Harel-Koren NodeXL layout [32].

Our manual calculations using Gimp showed that approximately 21% of the screen space in Fig. 10 is wasted as blank space in the corners, with 33% showing the core network with its parallel motifs, and the remaining 46% used to show the fan motifs. Calculating only for the elliptical visualization region, approximately 58% of the space available is used to show the fan motifs (46% out of the 79% that is not blank corner space). This is a substantial amount of screen area dedicated to showing a very common structure in network datasets obtained by crawling web sites, social networks, or using surveys. Moreover, the visualizations of these fans do not provide any additional information besides the number of nodes they contain. The fans in the visualization range from 17 to 852 nodes, but due to overlap their size can be difficult to distinguish.

Some of the overlap between motifs and with other nodes is visible in Fig. 11, where we have colored and shaped each of the fans of nodes distinctly. You can see in the bottom-right that the large green and blue fans overlap substantially, while many of the smaller fans are spread in several directions or hidden in the interior. Moreover, many of the fans overlap and obscure other more important nodes that are not participating in any fan, such as the light green nodes hidden in the blue and green fans at the bottom-right. Those light green nodes are actually a huge 2-parallel motif with 50 span nodes. This 2-parallel motif, as well as the several others connecting parts of the web page network together, is quite hard to detect among the clutter.

Our next step was to simplify the representation of the fan and 2-parallel motifs we detected by replacing them with the motif glyphs. The simplified version shown in Fig. 12 is much less cluttered, occupying 80% less space than the original visualization (Table 1). Applying an automatic layout algorithm to this simplified network would result in a new layout that makes more effective use of the newfound space. The node overlap in the visualization was improved substantially from 0.709 to 0.971, primarily due to eliminating the overlap between fan leaf nodes. The edge crossing metric for the simplified visualization was again somewhat worse, due to the increased density of the simplified network.

Subjectively, this visualization is much clearer at presenting (1) the size and membership of the various fan motifs, (2) the large parallel motifs connecting pairs of fan heads, and (3) the underlying attribute encoding for specific subsets. Moreover, it appears that there is minimal loss of information and reduced visual clutter compared to the original visualization.
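The coverage values reported in Table 1 follow the metric of Section 6.1. As a rough illustration of how such a value could be computed, the sketch below assumes that every node and edge bounding shape is an axis-aligned rectangle given as `(x0, y0, x1, y1)`; real edge bounds are unlikely to be rectangles, so this is a simplification. The function names `union_area`, `rect_area`, and `visualization_coverage` are hypothetical, and following the prose description, the sketch subtracts the area of the largest single item from the union area.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    return (x1 - x0) * (y1 - y0)


def union_area(rects):
    """Exact area of the union of axis-aligned rectangles.

    Uses coordinate compression: the plane is cut into grid cells along
    every rectangle edge, and each cell covered by any rectangle is counted.
    """
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    total = 0.0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            covered = any(
                r[0] <= xa and xb <= r[2] and r[1] <= ya and yb <= r[3]
                for r in rects
            )
            if covered:
                total += (xb - xa) * (yb - ya)
    return total


def visualization_coverage(node_rects, edge_rects, canvas):
    """Coverage = (union area - largest item area) / canvas area."""
    items = list(node_rects) + list(edge_rects)
    if not items:
        return 0.0
    a = union_area(items)                         # area of all drawn items
    a_delta = max(rect_area(r) for r in items)    # largest node or edge area
    a_max = rect_area(canvas)                     # total visualization area
    return (a - a_delta) / a_max
```

For example, two overlapping 2×2 node rectangles and one 1×4 edge rectangle on a 10×10 canvas cover a union area of 11; subtracting the largest item area (4) and dividing by 100 gives a coverage of 0.07. The pairwise cell scan is quadratic in the number of distinct coordinates, which is adequate for a sketch but not for large visualizations.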


Fig. 12: The same as Fig. 11, with each of the fan motifs converted into small fan glyphs sized by the number of nodes they contain.

7 USER IMPRESSIONS
We invited four individuals to use the motif simplification techniques inside NodeXL to analyze the Lostpedia and VOSON datasets from Section 6.2, as well as an additional dataset of co-patent relationships between innovators. These participants had varying backgrounds, including Computer Science, Information Studies, and Economics. They also had varying education, including a recent undergraduate student, two graduate students, and a professor. They all had little or no experience with NodeXL and none with motif simplification.

After an initial hands-on training session, we invited participants to explore the networks and recorded anything they had difficulty with or mentioned. Their explorations ranged from 45–60 minutes. Overall they were quite excited by the motif simplifications and, especially for the VOSON example, eager to change to the simplified version. One of them stated about the original VOSON view, “I’m overwhelmed, ... this is like one of those vision tests at the eye doctor”, but when asked to switch to the simplified view emphatically stated, “Yes please!”. Asked afterward about her overall impressions of motif simplification, one participant said, “I like it because it makes more sense. For specific nodes it is easier to look at the spreadsheet side”. None of the participants detected the bottom-right parallel motif hidden in the VOSON fan motifs in the original view, but they did so immediately in the simplified view.

There were several issues the participants encountered, though. First, they wanted to simplify all repeating patterns they saw, not just fans and parallel motifs. One even did the simplification manually using the standard meta-nodes. Next, they were unsure about the design of the parallel motif. They did not understand why edges connected to the arch in several places instead of only the corners, and had difficulty comparing parallel glyph size exactly. A few even confused the parallel glyphs with overlapping or strange fan glyphs. In spite of these reservations, they strongly appreciated the benefits of simplifying complex networks and expressed enthusiasm for integration of the glyphs in future sociograms.

8 DISCUSSION & FUTURE WORK
Overall, the case studies in Section 6.2 and user impressions in Section 7 seem to indicate that motif simplification is an effective way of reducing sociogram complexity. By replacing the common repeating motifs with representative glyphs, many nuances of the network are revealed. When one participant was looking for relationships in the network, she stated, “I could only look at two at a time”. This seems to indicate that the simplified view will help users understand larger relationships in the network, as parallel glyphs and fan glyphs allow the comparisons of larger subsets of the network and reduce the number of analyses. However, the current glyph design needs some improvement. While the fan glyph was easily understandable, its border caused some error during size comparison. The parallel glyph was harder to compare, and the varied edge connection locations the arch used were confusing.

There are many avenues of future work on new motifs and their glyphs, as well as many sources for additional complexity. Fuzzy motifs like almost-cliques pose an interesting challenge, both in terms of glyphs and the potential for motif overlap. Displaying the ambiguity of these motifs in the glyph representations is an interesting problem. In the almost-clique case, the absence of specific edges in the motif can be represented as light cuts across a regular hexagon glyph that shows a complete clique. Similarly, the presence of additional edges in a fan motif, connecting the leaf nodes, or in a parallel motif, connecting the span nodes, can be shown using various line styles or textures for the glyph components.

Directed networks have the added complexity of edge directionality, which is important for some tasks like determining information flow and trust analysis. This edge directionality needs to be taken into account in the glyph design so as to show these flows. For example, the fan motif’s leaf glyph can be divided into three representatively sized segments to represent edges pointing to the head node, to leaf nodes, or in both directions (reciprocated ties). The directionality of each segment can be shown with small arrows, or by arranging the segments at different angles around the head node. Another approach would be to embed a flow visualization or matrix inside the motif glyph.

For arbitrary networks, the expressed motifs may be different from the ones we have designed glyphs to represent. A motif census tool could be created that makes recommendations for specific motif simplifications to target based on readability metrics for the original and reduced visualizations.

9 CONCLUSION
We present motif simplification, a technique for increasing the readability of sociogram network visualizations. With motif simplification, common repeating network motifs are replaced with easily understandable motif glyphs that require less space, are easier to understand, and reveal hidden relationships. We contribute the design of glyph replacements for the fan and parallel motifs common in social network datasets, as well as algorithms for detecting the fan and parallel motifs. Moreover, we provide a visualization coverage metric for measuring the space saved by the motif glyphs. Finally, we have developed a free and open source reference implementation, made publicly available as part of NodeXL.

With two case studies and feedback from four initial users, we demonstrate the effectiveness of motif simplification as well as areas to focus on for improving glyph design. The motif simplifications can result in substantial reductions in visual complexity, allowing easier understanding and manipulation of larger network visualizations. There are several avenues for exploration opened up by this work, including additional glyphs for other common motif types, algorithms and glyphs for fuzzy motifs, and methods for showing edge directionality within glyphs.

ACKNOWLEDGMENTS
The authors wish to thank Marc Smith and the NodeXL team for their support. This work was supported in part by the Social Media Research Foundation and the Connected Action Consulting Group.

REFERENCES
[1] J. L. Moreno, Who shall survive? Foundations of sociometry, group psychotherapy and sociodrama. Beacon House, 1953, p. 141.
[2] J. Blythe, C. McGrath, and D. Krackhardt, The effect of graph layout on inference from social network data, in GD ’95: Proc. 3rd International Symposium on Graph Drawing, ser. GD ’95, 1996, pp. 40–51. DOI: 10.1007/BFb0021783.

[3] D. Fisher, M. Smith, and H. T. Welser, You are who you talk to: Detecting roles in Usenet newsgroups, in HICSS ’06: Proc. 39th Annual Hawaii International Conference on System Sciences, 2006, p. 59.2. DOI: 10.1109/HICSS.2006.536.
[4] H. T. Welser, E. Gleave, D. Fisher, and M. Smith, Visualizing the signatures of social roles in online discussion groups, JOSS: Journal of Social Structure, vol. 8, no. 2, 2007.
[5] L. A. Adamic and N. Glance, The political blogosphere and the 2004 U.S. election: Divided they blog, in LinkKDD ’05: Proc. 3rd International Workshop on Link Discovery, 2005, pp. 36–43. DOI: 10.1145/1134271.1134277.
[6] G. D. Battista, P. Eades, R. Tamassia, and I. G. Tollis, Graph drawing: Algorithms for the visualization of graphs, L. Steele, Ed. Prentice Hall, 1998.
[7] C. McGrath, J. Blythe, and D. Krackhardt, The effect of spatial arrangement on judgments and errors in interpreting graphs, Social Networks, vol. 19, no. 3, pp. 223–242, 1997. DOI: 10.1016/S0378-8733(96)00299-7.
[8] M. Smith, B. Shneiderman, N. Milic-Frayling, E. M. Rodrigues, V. Barash, C. Dunne, T. Capone, A. Perer, and E. Gleave. (2010). NodeXL: A free and open network overview, discovery and exploration add-in for Excel 2007/2010, Social Media Research Foundation, [Online]. Available: http://.codeplex.com.
[9] A. Perer and B. Shneiderman, Balancing systematic and flexible exploration of social networks, TVCG: IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, pp. 693–700, 2006. DOI: 10.1109/TVCG.2006.122.
[10] J. Abello and F. van Ham, Matrix Zoom: A visual interface to semi-external graphs, in INFOVIS ’04: Proc. 2004 IEEE Symposium on Information Visualization, 2004, pp. 183–190. DOI: 10.1109/INFVIS.2004.46.
[11] N. Henry and J.-D. Fekete, MatrixExplorer: A dual-representation system to explore social networks, TVCG: IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, pp. 677–684, 2006. DOI: 10.1109/TVCG.2006.160.
[12] H. C. Purchase, Metrics for graph drawing aesthetics, Journal of Visual Languages & Computing, vol. 13, pp. 501–516, 2002. DOI: 10.1006/jvlc.2002.0232.
[13] C. Dunne and B. Shneiderman, Improving graph drawing readability by incorporating readability metrics: A software tool for network analysts, University of Maryland, Human-Computer Interaction Lab Tech Report HCIL-2009-13, 2009.
[14] J. Abello, F. van Ham, and N. Krishnan, ASK-GraphView: a large scale graph visualization system, TVCG: IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, pp. 669–676, 2006. DOI: 10.1109/TVCG.2006.120.
[15] M. Freire, C. Plaisant, B. Shneiderman, and J. Golbeck, ManyNets: An interface for multiple network analysis and visualization, in CHI ’10: Proc. 28th international conference on Human factors in computing systems, 2010, pp. 213–222. DOI: 10.1145/1753326.1753358.
[16] M. Wattenberg, Visual exploration of multivariate graphs, in CHI ’06: Proc. SIGCHI conference on Human Factors in Computing Systems, ser. CHI ’06, 2006, pp. 811–819. DOI: 10.1145/1124772.1124891.
[17] H. Kang, C. Plaisant, B. Lee, and B. B. Bederson, NetLens: Iterative exploration of content-actor network data, in VAST ’06: Proc. IEEE Symposium on Visual Analytics Science And Technology, 2006, pp. 91–98. DOI: 10.1109/VAST.2006.261426.
[18] C. Dunne, N. H. Riche, B. Lee, R. A. Metoyer, and G. G. Robertson, GraphTrail: Analyzing large multivariate and heterogeneous networks while supporting exploration history, in CHI ’12: Proc. 2012 international conference on Human factors in computing systems, 2012.
[19] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Network motifs: Simple building blocks of complex networks. Science, vol. 298, no. 5594, pp. 824–827, 2002. DOI: 10.1126/science.298.5594.824.
[20] X. Zhu, M. Gerstein, and M. Snyder, Getting connected: Analysis and principles of biological networks. Genes Development, vol. 21, no. 9, pp. 1010–1024, 2007. DOI: 10.1101/gad.1528707.
[21] N. M. Luscombe, M. M. Babu, H. Yu, M. Snyder, S. A. Teichmann, and M. Gerstein, Genomic analysis of regulatory network dynamics reveals large topological changes. Nature, vol. 431, no. 7006, pp. 308–312, 2004. DOI: 10.1038/nature02782.
[22] P. Ye, B. D. Peyser, F. A. Spencer, and J. S. Bader, Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast. BMC Bioinformatics, vol. 6, p. 270, 2005. DOI: 10.1186/1471-2105-6-270.
[23] J. Grochow and M. Kellis, Network motif discovery using Subgraph Enumeration and Symmetry-Breaking, in RECOMB ’07: Proc. 11th International conference on Research in Computational Molecular Biology, 2007, pp. 92–106. DOI: 10.1007/978-3-540-71681-5_7.
[24] C. Klukas, F. Schreiber, and H. Schwöbbermeyer, Coordinated perspectives and enhanced force-directed layout for the analysis of network motifs, in APVis ’06: Proc. 2006 Asia-Pacific Symposium on Information Visualisation, 2006, pp. 39–48.
[25] S. Navlakha, M. C. Schatz, and C. Kingsford, Revealing biological modules via graph summarization, Journal of Computational Biology, vol. 16, no. 2, pp. 253–264, 2009. DOI: 10.1089/cmb.2008.11TT.
[26] M. Smith, B. Shneiderman, N. Milic-Frayling, E. M. Rodrigues, V. Barash, C. Dunne, T. Capone, A. Perer, and E. Gleave, Analyzing (social media) networks with NodeXL, in C&T ’09: Proc. fourth international conference on Communities and Technologies, 2009, pp. 255–264. DOI: 10.1145/1556460.1556497.
[27] D. Hansen, B. Shneiderman, and M. Smith, Analyzing social media networks with NodeXL: Insights from a connected world, M. James and D. Bevans, Eds. Morgan Kaufmann, 2011.
[28] E. M. Bonsignore, C. Dunne, D. Rotman, M. Smith, T. Capone, D. L. Hansen, and B. Shneiderman, First steps to NetViz Nirvana: Evaluating with NodeXL, in CSE ’09: Proc. 2009 international conference on computational science and engineering, vol. 4, 2009, pp. 332–339. DOI: 10.1109/CSE.2009.120.
[29] D. L. Hansen, D. Rotman, E. M. Bonsignore, N. Milic-Frayling, E. M. Rodrigues, M. Smith, and B. Shneiderman, Do you know the way to SNA?: A process model for analyzing and visualizing social media data, University of Maryland, Human Computer Interaction Lab Tech Report HCIL-2009-17, 2009.
[30] E. M. Rodrigues, N. Milic-Frayling, M. Smith, B. Shneiderman, and D. Hansen, Group-in-a-Box layout for multi-faceted analysis of communities, in SocialCom ’11: Proc. 2011 IEEE 3rd International Conference on Social Computing, 2011.
[31] T. M. J. Fruchterman and E. M. Reingold, Graph drawing by force-directed placement, Software: Practice and Experience, vol. 21, no. 11, pp. 1129–1164, 1991. DOI: 10.1002/spe.4380211102.


[32] D. Harel and Y. Koren, A fast multi-scale method for drawing large graphs, JGAA: Journal of Graph Algorithms and Appli- cations, vol. 6, no. 3, pp. 179–202, 2002.
