Network CMSC828O Principles of Visualization Design

–Tamara Munzner. Visualization Analysis and Design. AK Peters Visualization Series. CRC Press, 2014. • http://www.cs.ubc.ca/~tmm/vadbook/

Why have a human in the loop?

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively. Visualization is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods.

• don’t need vis when fully automatic solution exists and is trusted
• many analysis problems ill-specified
  – don’t know exactly what questions to ask in advance
• possibilities
  – long-term use for end users (e.g. exploratory analysis of scientific data)
  – presentation of known results
  – stepping stone to better understanding of requirements before developing models
  – help developers of automatic solution refine/debug, determine parameters
  – help end users of automatic solutions verify, build trust

Why use an external representation?

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.

• external representation: replace cognition with perception

[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE TVCG (Proc. InfoVis) 14(6):1253-1260, 2008.]

Anscombe’s Quartet: Raw Data
[Table: datasets 1 to 4, each with an X and a Y column]

Why represent all the data?

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.

• summaries lose information, details matter
  – confirm expected and find unexpected patterns
  – assess validity of statistical model

Anscombe’s Quartet: Identical statistics
  x mean            9
  x variance        10
  y mean            7.5
  y variance        3.75
  x/y correlation   0.816

https://www.youtube.com/watch?v=DbJyPELmhJc
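The identical statistics listed above can be checked directly. The sketch below uses the commonly published values of Anscombe's quartet and only the Python standard library. It assumes the slide's variances are population variances (dividing by n), since that convention reproduces 10 and 3.75; the names `ANSCOMBE`, `summarize`, and `SUMMARIES` are illustrative.

```python
from math import sqrt
from statistics import mean, pvariance

# Commonly published values of Anscombe's quartet (datasets 1-3 share x).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)


def summarize(xs, ys):
    """Summary statistics; variance is the population variance (assumption)."""
    return {
        "x_mean": mean(xs),
        "x_var": pvariance(xs),
        "y_mean": mean(ys),
        "y_var": pvariance(ys),
        "corr": correlation(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four datasets give nearly the same summary, which is why plotting the raw data is needed to tell them apart.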

Same Stats, Different Graphs

Why focus on tasks and effectiveness?

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.

• tasks serve as constraint on design (as does data)
  – idioms do not serve all tasks equally!
  – challenge: recast tasks from domain-specific vocabulary to abstract forms
• most possibilities ineffective
  – validation is necessary, but tricky
  – increases chance of finding good solutions if you understand full space of possibilities
• what counts as effective?
  – novel: enable entirely new kinds of analysis
  – faster: speed up existing workflows

Why are there resource limitations?

Vis designers must take into account three very different kinds of resource limitations: those of computers, of humans, and of displays.

• computational limits
  – processing time
  – system memory
• human limits
  – human attention and memory
• display limits
  – pixels are precious resource, the most constrained resource
  – information density: ratio of space used to encode info vs unused whitespace
    • tradeoff between clutter and wasting space, find sweet spot between dense and sparse

Analysis: What, why, and how
• what is shown?
  – data abstraction
• why is the user looking at it?
  – task abstraction
• how is it shown?
  – idiom: visual encoding and interaction
• abstract vocabulary avoids domain-specific terms
  – translation process iterative, tricky
• what-why-how analysis framework as scaffold to think systematically about design space

Why analyze?
• imposes structure on huge design space
  – scaffold to help you think systematically about choices
  – analyzing existing as stepping stone to designing new
  – most possibilities ineffective for particular task/data combination

[SpaceTree: Supporting Exploration in Large Node Link Tree, Design Evolution and Empirical Evaluation. Grosjean, Plaisant, and Bederson. Proc. InfoVis 2002, p 57–64.]
[TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context With Guaranteed Visibility. ACM Trans. on Graphics (Proc. SIGGRAPH) 22:453–462, 2003.]

• what? tree
• why?
  – actions: present, locate, identify
  – targets: path between two nodes
• how?
  – SpaceTree: encode, navigate, select, filter, aggregate
  – TreeJuxtaposer: encode, navigate, select, arrange

What? Data abstraction [VAD Fig 2.1]
• data types
  – items, attributes, links, positions, grids
• data and dataset types
  – tables: items, attributes
  – networks & trees: items (nodes), links, attributes
  – fields: grids, positions, attributes
  – geometry: items, positions
  – clusters, sets, lists: items
• dataset types
  – tables: items (rows), attributes (columns), cell containing value
  – multidimensional table: value in cell
  – networks: node (item), link
  – trees
  – fields (continuous): grid of positions, cell, value in cell
  – geometry (spatial): position
• attribute types
  – categorical
  – ordered: ordinal, quantitative
• ordering direction
  – sequential, diverging, cyclic
• dataset availability
  – static, dynamic

VAD Ch 3: Task Abstraction

Why? [VAD Fig 3.1]
• actions
  – analyze
    • consume: discover, present, enjoy
    • produce: annotate (tag), record, derive
  – search
    • lookup: target known, location known
    • browse: target unknown, location known
    • locate: target known, location unknown
    • explore: target unknown, location unknown
  – query
    • identify, compare, summarize
• targets
  – all data: trends, outliers, features
  – attributes
    • one: distribution, extremes
    • many: dependency, correlation, similarity
  – network data: topology, paths
  – spatial data: shape
• {action, target} pairs
  – discover distribution
  – compare trends
  – locate outliers
  – browse topology

How?
• encode
  – arrange: express, separate, order, align, use
  – map from categorical and ordered attributes
    • color: hue, saturation, luminance
    • size, angle, curvature, ...
    • shape
    • motion: direction, rate, frequency, ...
• manipulate
  – change, select, navigate
• facet
  – juxtapose, partition, superimpose
• reduce
  – filter, aggregate, embed

High-level actions: Analyze
• consume
  – discover vs present
    • classic split
    • aka explore vs explain
  – enjoy
    • newcomer
    • aka casual, social
• produce
  – annotate, record
  – derive
    • crucial design choice

Derive
• don’t just draw what you’re given!
  – decide what the right thing to show is
  – create it with a series of transformations from the original dataset
  – draw that
• one of the four major strategies for handling complexity

[Figure: original data (exports, imports) and derived data (trade balance)]
trade balance = exports − imports
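The trade balance derivation above can be sketched as a single transformation of the original table. The row layout, the country labels, and the numbers are made up for illustration; only the formula trade balance = exports − imports comes from the slide.

```python
def derive_trade_balance(rows):
    """Return new rows with a derived 'trade_balance' attribute added.

    The original rows are left untouched, so the original and the derived
    dataset can both be kept around.
    """
    return [
        {**row, "trade_balance": row["exports"] - row["imports"]}
        for row in rows
    ]


# Hypothetical original data: one item per country, two quantitative attributes.
original = [
    {"country": "A", "exports": 120, "imports": 100},
    {"country": "B", "exports": 80, "imports": 95},
]
derived = derive_trade_balance(original)

if __name__ == "__main__":
    for row in derived:
        print(row["country"], row["trade_balance"])
```

Drawing the single derived attribute shows surplus versus deficit directly, instead of asking the viewer to subtract two encoded values by eye.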

Actions: Mid-level search, low-level query
• what does user know?
  – target, location
• how much of the data matters?
  – one, some, all
• independent choices, mix & match
  – analyze, query, search

[Figure: search actions by what is known. Location known: lookup (target known), browse (target unknown). Location unknown: locate (target known), explore (target unknown). Query actions: identify, compare, summarize.]
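The search taxonomy above is a two-by-two table, which a small lookup function makes explicit. This is only an illustrative sketch; the function name `search_action` is an assumption, not part of any library.

```python
def search_action(target_known: bool, location_known: bool) -> str:
    """Map what the user knows to the mid-level search action."""
    if location_known:
        return "lookup" if target_known else "browse"
    return "locate" if target_known else "explore"


if __name__ == "__main__":
    for target_known in (True, False):
        for location_known in (True, False):
            print(target_known, location_known,
                  search_action(target_known, location_known))
```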

Why? Targets
[Figure: actions and targets. Targets are all data (trends, outliers, features), attributes (one: distribution, extremes; many: dependency, correlation, similarity), network data (topology, paths), and spatial data (shape).]

Analysis example: Compare idioms
[SpaceTree: Supporting Exploration in Large Node Link Tree, Design Evolution and Empirical Evaluation. Grosjean, Plaisant, and Bederson. Proc. InfoVis 2002, p 57–64.]
[TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context With Guaranteed Visibility. ACM Trans. on Graphics (Proc. SIGGRAPH) 22:453–462, 2003.]

• what? tree
• why?
  – actions: present, locate, identify
  – targets: path between two nodes
• how?
  – SpaceTree: encode, navigate, select, filter, aggregate
  – TreeJuxtaposer: encode, navigate, select, arrange

Analysis example: Derive one attribute
• Strahler number
  – centrality metric for trees/networks
  – derived quantitative attribute
  – draw top 5K of 500K for good skeleton
[Using Strahler numbers for real time visual exploration of huge graphs. Auber. Proc. Intl. Conf. Computer Vision and Graphics, pp. 56–69, 2002.]

[Figure: Task 1 takes a tree in and puts out a quantitative attribute on nodes. Task 2 takes the tree plus the quantitative attribute on nodes in and puts out a filtered tree with unimportant parts removed. Node labels such as .58, .74, .64, .54, .84, .24, .94 show the attribute values.]
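The two tasks above (derive a quantitative attribute, then filter by it) can be sketched for the tree case. The code below uses the classic Strahler definition for trees: a leaf gets 1, and a node gets the largest child value, plus one if that value occurs at least twice among its children. This is a simplification; the cited paper extends the idea to general graphs, which is not attempted here. The function names and the child-list representation are assumptions.

```python
def strahler_numbers(children, root):
    """Derive a Strahler number for every node of a rooted tree.

    `children` maps a node to the list of its child nodes; nodes missing
    from the mapping are treated as leaves.
    """
    numbers = {}
    stack = [(root, False)]  # iterative post-order traversal
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if kids and not expanded:
            stack.append((node, True))  # revisit after the children
            stack.extend((kid, False) for kid in kids)
            continue
        if not kids:
            numbers[node] = 1
        else:
            values = [numbers[kid] for kid in kids]
            top = max(values)
            numbers[node] = top + 1 if values.count(top) >= 2 else top
    return numbers


def top_k_nodes(numbers, k):
    """Filter: keep the k nodes with the highest derived attribute."""
    return sorted(numbers, key=numbers.get, reverse=True)[:k]


if __name__ == "__main__":
    tree = {"root": ["a", "b"], "a": ["c", "d"]}
    numbers = strahler_numbers(tree, "root")
    print(numbers)
    print(top_k_nodes(numbers, 2))
```

Because a parent's number is never smaller than its children's, keeping the highest-ranked nodes tends to keep the trunk of the tree, which is the skeleton the slide refers to.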

Task 1
• what? in: tree; out: quantitative attribute on nodes
• why? derive
Task 2
• what? in: tree, quantitative attribute on nodes; out: filtered tree
• why? summarize topology
• how? reduce: filter

Definitions: Marks and channels
• marks
  – geometric primitives
  – points, lines, areas
• channels
  – control appearance of marks
  – position (horizontal, vertical, both), color, shape, tilt, size (length, area, volume)

Encoding visually with marks and channels
• analyze idiom structure
  – as combination of marks and channels

  1: mark: line;  channels: vertical position
  2: mark: point; channels: vertical position, horizontal position
  3: mark: point; channels: vertical position, horizontal position, color hue
  4: mark: point; channels: vertical position, horizontal position, color hue, size (area)
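The four examples above can be written down as data, which makes the "combination of marks and channels" analysis concrete. This is a hypothetical representation, not an API from any visualization library; it assumes each listed channel carries one attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idiom:
    """An idiom described as one mark type plus the channels it uses."""
    mark: str
    channels: tuple


# The four examples from the slide, in order.
IDIOMS = {
    1: Idiom("line", ("vertical position",)),
    2: Idiom("point", ("vertical position", "horizontal position")),
    3: Idiom("point", ("vertical position", "horizontal position", "color hue")),
    4: Idiom("point", ("vertical position", "horizontal position",
                       "color hue", "size (area)")),
}


def attributes_encoded(idiom: Idiom) -> int:
    """Number of attributes shown, assuming one attribute per channel."""
    return len(idiom.channels)


if __name__ == "__main__":
    for number, idiom in IDIOMS.items():
        print(number, idiom.mark, attributes_encoded(idiom))
```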

Channels: Rankings
[Figure: Channels: Expressiveness Types and Effectiveness Ranks]

• magnitude channels: ordered attributes (most to least effective)
  – position on common scale
  – position on unaligned scale
  – length (1D size)
  – tilt/angle
  – area (2D size)
  – depth (3D position)
  – color luminance
  – color saturation
  – curvature
  – volume (3D size)
• identity channels: categorical attributes (most to least effective)
  – spatial region
  – color hue
  – motion
  – shape
• effectiveness principle
  – encode most important attributes with highest ranked channels
• expressiveness principle
  – match channel and data characteristics

Accuracy: Fundamental Theory

Rules of Thumb
• No unjustified 3D
  – Power of the plane, dangers of depth
  – Occlusion hides information
  – Distortion loses information
  – Tilted text isn’t legible
• No unjustified 2D
• Eyes beat memory
• Resolution over immersion
• Overview first, zoom and filter, details on demand
• Function first, form next
• (Get it right in black and white)

No unjustified 3D: Power of the plane
• high-ranked spatial position channels: planar spatial position
  – not depth!
[Figure: Channels: Expressiveness Types and Effectiveness Ranks. Among the magnitude channels, depth (3D position) ranks below position on common scale, position on unaligned scale, length, tilt/angle, and area.]

Occlusion hides information
• occlusion
• interaction complexity
[Distortion Viewing Techniques for 3D Data. Carpendale et al. InfoVis 1996.]

No unjustified 2D
• consider whether network data requires 2D spatial layout
  – especially if reading text is central to task!
  – arranging as network means lower information density and harder label lookup compared to text lists
• benefits outweigh costs when topological structure/context important for task
  – be especially careful for search results, document collections, ontologies

Eyes beat memory example: Cerebral

• small multiples: one graph instance per experimental condition
  – same spatial layout
  – color differently, by condition
[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE Trans. Visualization and Computer Graphics (Proc. InfoVis 2008) 14:6 (2008), 1253–1260.]

Why not animation?
• disparate frames and regions: comparison difficult
  – vs contiguous frames
  – vs small region
  – vs coherent motion of group
• change blindness
  – even major changes difficult to notice if mental buffer wiped
• safe special case
  – animated transitions

Overview first, zoom and filter, details on demand
• influential mantra from Shneiderman
[The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. Shneiderman. Proc. IEEE Visual Languages, pp. 336–343, 1996.]
• overview = summary
  – microcosm of full vis design problem
• nuances
  – beyond just two levels: multi-scale structure
  – difficult when scale huge: give up on overview and browse local neighborhoods?
[Search, Show Context, Expand on Demand: Supporting Large Graph Exploration with Degree-of-Interest. van Ham and Perer. IEEE Trans. Visualization and Computer Graphics (Proc. InfoVis 2009) 15:6 (2009), 953–960.]

Arrange networks and trees
• node–link diagrams: connection marks
• adjacency matrix: derived table
• enclosure: containment marks

Idiom: force-directed placement
• visual encoding
  – link connection marks, node point marks
• considerations
  – spatial position: no meaning directly encoded
    • left free to minimize crossings
  – proximity semantics?
    • sometimes meaningful
    • sometimes arbitrary, artifact of layout algorithm
    • tension with length
      – long edges more visually salient than short
• tasks
  – explore topology; locate paths, clusters
• scalability
  – node/edge density E < 4N
http://mbostock.github.com/d3/ex/force.html

Idiom: sfdp (multi-level force-directed placement)
• data
  – original: network
  – derived: cluster hierarchy atop it
• considerations
  – better algorithm for same encoding technique
    • same: fundamental use of space
    • hierarchy used for algorithm speed/quality but not shown explicitly
• scalability
  – nodes, edges: 1K-10K
  – hairball problem eventually hits
[Efficient and high quality force-directed graph drawing. Hu. The Mathematica Journal 10:37–71, 2005.]
http://www.research.att.com/yifanhu/GALLERY/GRAPHS/index1.html
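To make the force-directed idiom above concrete, here is a minimal single-level sketch in the style of Fruchterman and Reingold: all node pairs repel, linked nodes attract, and a cooling temperature caps how far a node moves per step. It is not the D3 or sfdp implementation, the constants are arbitrary assumptions, and the all-pairs repulsion is quadratic in the number of nodes. `is_sparse_enough` simply encodes the E < 4N rule of thumb from the slide.

```python
import math
import random


def is_sparse_enough(num_nodes, num_edges):
    """Rule of thumb from the slide: node-link layouts work when E < 4N."""
    return num_edges < 4 * num_nodes


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y)} for a list of nodes and a list of (a, b) edges."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # ideal edge length (assumption)
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Attraction along each link.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each node, limited by a linearly cooling temperature.
        temperature = 0.1 * (1.0 - step / iterations)
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, temperature)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return {n: (p[0], p[1]) for n, p in pos.items()}


if __name__ == "__main__":
    layout = force_directed_layout(["a", "b", "c", "d"],
                                   [("a", "b"), ("b", "c"), ("c", "d")])
    for node, (x, y) in layout.items():
        print(node, round(x, 3), round(y, 3))
```

Note that the resulting positions encode nothing directly: they are an artifact of the starting positions and the force balance, which is the proximity caveat listed under considerations.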

Idiom: adjacency matrix view
• data: network
  – transform into same data/encoding as heatmap
• derived data: table from network
  – 1 quant attrib
    • weighted edge between nodes
  – 2 categ attribs: node list x 2
• visual encoding
  – cell shows presence/absence of edge
• scalability
  – 1K nodes, 1M edges
[NodeTrix: a Hybrid Visualization of Social Networks. Henry, Fekete, and McGuffin. IEEE TVCG (Proc. InfoVis) 13(6):1302-1309, 2007.]
[Points of view: Networks. Gehlenborg and Wong. Nature Methods 9:115.]

Figure 7.5: Comparing matrix and node-link views of a five-node network. a) Matrix view. b) Node-link view. From [Henry et al. 07], Figure 3b and 3a.

[...] the number of available pixels per cell; typically only a few levels would be distinguishable between the largest and the smallest cell size. Network matrix views can also show weighted networks, where each link has an associated quantitative value attribute, by encoding with an ordered channel such as color luminance or size.

For undirected networks where links are symmetric, only half of the matrix needs to be shown, above or below the diagonal, because a link from node A to node B necessarily implies a link from B to A. For directed networks, the full square matrix has meaning, because links can be asymmetric. Figure 7.5 shows a simple example of an undirected network, with a matrix view of the five-node dataset in Figure 7.5a and a corresponding node-link view in Figure 7.5b. Matrix views of networks can achieve very high information density, up to a limit of one thousand nodes and one million edges, just like cluster heatmaps and all other matrix views that use small area marks.

Technique: network matrix view
Data Types: network
Derived Data: table: network nodes as keys, link status between two nodes as values
View Comp.: space: area marks in 2D matrix alignment
Scalability: nodes: 1K; edges: 1M
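The derived table described above (two categorical keys from the node list, one quantitative value per cell) can be sketched as a plain nested list. The function name and the edge-triple format are assumptions for illustration; a zero cell stands for an absent link.

```python
def adjacency_table(nodes, weighted_edges, directed=False):
    """Derive a matrix (list of rows) from a network.

    `nodes` fixes the row and column order; `weighted_edges` is an iterable
    of (source, target, weight) triples.
    """
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, weight in weighted_edges:
        matrix[index[source]][index[target]] = weight
        if not directed:
            # Undirected links are symmetric, so mirror across the diagonal.
            matrix[index[target]][index[source]] = weight
    return matrix


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    edges = [("A", "B", 2), ("B", "C", 5)]
    for name, row in zip(nodes, adjacency_table(nodes, edges)):
        print(name, row)
```

With `directed=False` the result is symmetric, which is why only half of the matrix needs to be drawn for undirected networks. Reordering `nodes` reorders rows and columns without changing the data.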

Connection vs. adjacency comparison
• adjacency matrix strengths
  – predictability, scalability, supports reordering
  – some topology tasks trainable
• node-link strengths
  – topology understanding, path tracing
  – intuitive, no training needed
http://www.michaelmcguffin.com/courses/vis/patternsInAdjacencyMatrix.png
• empirical study
  – node-link best for small networks
  – matrix best for large networks
    • if tasks don’t involve topological structure!
[On the readability of graphs using node-link and matrix-based representations: a controlled experiment and statistical analysis. Ghoniem, Fekete, and Castagliola. Information Visualization 4:2 (2005), 114–135.]

Idiom: radial node-link tree
• data
  – tree
• encoding
  – link connection marks
  – point node marks
  – radial axis orientation
    • angular proximity: siblings
    • distance from center: depth in tree
• tasks
  – understanding topology, following paths
• scalability
  – 1K - 10K nodes
http://mbostock.github.com/d3/ex/tree.html

Idiom: treemap
• data
  – tree
  – 1 quant attrib at leaf nodes
• encoding
  – area containment marks for hierarchical structure
  – rectilinear orientation
  – size encodes quant attrib
• tasks
  – query attribute at leaf nodes
• scalability
  – 1M leaf nodes
http://www.nytimes.com/packages/html/newsgraphics/2011/0119-budget/index.html
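The treemap encoding above (containment for hierarchy, area for the leaf attribute) can be illustrated with the simplest layout, slice-and-dice, which alternates the split direction at each tree level. This is a sketch, not the layout used by the linked example; the nested-dict tree format and the function names are assumptions, and leaf values are assumed positive.

```python
def leaf_total(node):
    """Sum of the quantitative attribute over all leaves under `node`."""
    if not node.get("children"):
        return node["value"]
    return sum(leaf_total(child) for child in node["children"])


def slice_and_dice(node, x, y, width, height, depth=0, out=None):
    """Return a list of (name, x, y, width, height) rectangles for the leaves."""
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, width, height))
        return out
    total = leaf_total(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        share = leaf_total(child) / total
        if depth % 2 == 0:
            # Even levels: split the width, side by side.
            slice_and_dice(child, x + offset * width, y,
                           width * share, height, depth + 1, out)
        else:
            # Odd levels: split the height, stacked.
            slice_and_dice(child, x, y + offset * height,
                           width, height * share, depth + 1, out)
        offset += share
    return out


if __name__ == "__main__":
    tree = {"name": "root", "children": [
        {"name": "A", "value": 2},
        {"name": "B", "children": [
            {"name": "C", "value": 1},
            {"name": "D", "value": 1},
        ]},
    ]}
    for rect in slice_and_dice(tree, 0.0, 0.0, 4.0, 2.0):
        print(rect)
```

Each leaf's area is proportional to its value, and every child rectangle lies inside its parent's rectangle, so no link marks are needed to show the hierarchy.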

Link marks: Connection and containment
• marks as links (vs. nodes)
  – common case in network drawing
  – 1D case: connection
    • ex: all node-link diagrams
    • emphasizes topology, path tracing
    • networks and trees
  – 2D case: containment
    • ex: all treemap variants
    • emphasizes attribute values at leaves (size coding)
    • only trees
[Figure: marks as items/nodes (points, lines, areas) and marks as links (containment, connection); examples: node–link diagram, treemap, elastic hierarchy]
[Elastic Hierarchies: Combining Treemaps and Node-Link Diagrams. Dong, McGuffin, and Chignell. Proc. InfoVis 2005, p. 57-64.]

Tree drawing idioms comparison
• data shown
  – link relationships
  – tree depth
  – sibling order
• design choices
  – connection vs containment link marks
  – rectilinear vs radial layout
  – spatial position channels
• considerations
  – redundant? arbitrary?
  – information density?
    • avoid wasting space
[Quantifying the Space-Efficiency of 2D Graphical Representations of Trees. McGuffin and Robert. Information Visualization 9:2 (2010), 115–140.]

Paper: ABySS-Explorer

ABySS-Explorer: Design study
• reconstructing genome with ABySS algorithm (Assembly By Short Sequences)
• domain task
  – go from short subsequences to contigs, long contiguous sequences
  – extensive automatic support, but still human in the loop for visual inspection and manual editing
  – ambiguities, like repetitions longer than read length
• data, domain: abstract
  – millions of reads of 25-100 nucleotides (nt): strings
  – read coverage, proxy for quality: quant attrib
  – read pairing distances, proxy for size distribution: quant

Fig 2. ABySS-Explorer: visualizing genome sequence assemblies. Nielsen, Jackman, Birol, Jones. TVCG 15(6):881-8, 2009 (Proc. InfoVis 2009).

Contigs: abstraction as derived network data
• derived data: de Bruijn graph/network
  – directed network, compact representation of sequence overlaps
  – node: contig
  – edge: overlap of k − 1 nt between two contigs
  – good for computing, bad for reasoning about sequence space
• derived data: dual de Bruijn graph
  – node: points of contig overlap
  – edge: contig
  – better match for arrow diagrams used in hand drawn sketches
• base layout: force-directed
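The two derived networks above differ only in what plays the role of node and edge. The toy sketch below builds both from a list of contig strings. It is an illustration under strong simplifying assumptions, not the ABySS algorithm: it ignores reverse complements and read pairing, assumes k ≥ 2, and assumes every contig has at least k − 1 nucleotides. The function names are hypothetical.

```python
def overlap_graph(contigs, k):
    """Contigs as nodes; a directed edge a -> b when the last k-1 nt of a
    equal the first k-1 nt of b."""
    edges = []
    for i, a in enumerate(contigs):
        for j, b in enumerate(contigs):
            if i != j and a[-(k - 1):] == b[:k - 1]:
                edges.append((a, b))
    return edges


def dual_graph(contigs, k):
    """Dual view: overlap points as nodes, each contig as one directed edge
    from its (k-1)-nt prefix to its (k-1)-nt suffix."""
    return [(c[:k - 1], c[-(k - 1):], c) for c in contigs]


if __name__ == "__main__":
    contigs = ["ACGT", "CGTA", "GTAC"]
    print(overlap_graph(contigs, 4))
    print(dual_graph(contigs, 4))
```

In the dual form each contig is drawn as an arrow, which is what leaves room to encode contig attributes such as length and coverage on the edges.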

Fig 3. ABySS-Explorer: visualizing genome sequence assemblies. Nielsen, Jackman, Birol, Jones. TVCG 15(6):881-8, 2009 (Proc. InfoVis 2009).

DNA as double stranded: idiom for encoding & interaction
• rejected option: 2 nodes per contig
  – excess clutter if one for each direction
  – choice at data abstraction level
• encoding & interaction idiom: polar node
  – encoding: upper vs lower attachment point
    • redundant with arc direction
  – large-scale visibility, without need to zoom
    • arbitrary but consistent
  – interaction: click to reverse direction
    • switches polarity of vertex connections
    • changes sign of label

Fig 4. ABySS-Explorer: visualizing genome sequence assemblies. Nielsen, Jackman, Birol, Jones. TVCG 15(6):881-8, 2009 (Proc. InfoVis 2009).

Contig length: encoding
• rejected option: scale edge lengths by sequence lengths
  – short contigs are important sources of ambiguity, would be hard to distinguish
  – task guidance: only low-res judgements needed, relatively long or short
• encoding idiom: wave pattern
  – oscillation shows fixed number, shapes distinguishable
  – min amplitude at connections so edges visible
  – orientation with max amplitude asymmetric wrt start
    • rejected initial option: max in middle
• rejected options:
  – color (keep for other attribute)
  – half-lines
  – curvature (used for polar nodes)
• aligned with empirical guidance for tapered edges
[Figure labels: 3K nt, 12K nt]

Fig 5. ABySS-Explorer: visualizing genome sequence assemblies. Nielsen, Jackman, Birol, Jones. TVCG 15(6):881-8, 2009 (Proc. InfoVis 2009).

Contig coverage: encoding
• rejected options: luminance/lightness
  – not distinguishable given denseness variation from wave shapes
  – also problematic with desire for separable color/hue encoding
• chosen: line thickness
  – not distinguishable for extremely long contigs
  – can address by adjusting oscillation frequency to suitable size

Read pairs: encoding
• data:
  – distance estimate
  – orientation
• encoding:
  – dashed line (shape channel for line mark)
    • implying inferred vs observed sequences
  – color for both dashed line and contig leaf
  – [same length as for contigs]
  – rejected initial option: line color alone
    • too ambiguous
  – interaction to fully resolve remaining ambiguity
  – or color by unambiguous paths in grey

Fig 6. ABySS-Explorer: visualizing genome sequence assemblies. Nielsen, Jackman, Birol, Jones. TVCG 15(6):881-8, 2009 (Proc. InfoVis 2009).

Displaying meta-data
• reserve color for additional attributes
• ex: color to compare reference human to lymphoma genome
  – inconsistencies visible as interconnections between different colors
  – inversion breakpoint visible
  – interaction to check if error in metadata from experiments vs assembly
• read pair info supports metadata
  – speedup claim vs prev work

Fig 10. ABySS-Explorer: visualizing genome sequence assemblies. Nielsen, Jackman, Birol, Jones. TVCG 15(6):881-8, 2009 (Proc. InfoVis 2009).

Assembly examples
• ideal: single large contig
  – overview/gist: many small contigs remain
• interaction to resolve
  – integrate paired read highlighting on top of contig paths structure

Fig 7/9. ABySS-Explorer: visualizing genome sequence assemblies. Nielsen, Jackman, Birol, Jones. TVCG 15(6):881-8, 2009 (Proc. InfoVis 2009).