Visualization Analysis & Design

Tamara Munzner Department of Computer Science University of British Columbia

D3 Unconference Keynote
November 21 2015, San Francisco CA
http://www.cs.ubc.ca/~tmm/talks.html#vad15d3
@tamaramunzner

Defining visualization (vis)

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.

Why have a human in the loop?

Visualization is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods.

• don’t need vis when fully automatic solution exists and is trusted
• many analysis problems ill-specified
  – don’t know exactly what questions to ask in advance
• possibilities
  – long-term use for end users (e.g. exploratory analysis of scientific data)
  – presentation of known results
  – stepping stone to better understanding of requirements before developing models
  – help developers of automatic solution refine/debug, determine parameters
  – help end users of automatic solutions verify, build trust

Why use an external representation?

• external representation: replace cognition with perception

[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE TVCG (Proc. InfoVis) 14(6):1253-1260, 2008.]

Why represent all the data?

• summaries lose information, details matter
  – confirm expected and find unexpected patterns
  – assess validity of statistical model

Anscombe’s Quartet

Identical statistics
  x mean 9
  x variance 10
  y mean 8
  y variance 4
  x/y correlation 1
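The point about summaries can be checked directly. The sketch below is a minimal illustration, not part of the talk: it recomputes the summary statistics for the four datasets published by Anscombe (1973) using only the Python standard library. The helper names `pearson` and `summarize` are made up for this example. Unrounded, the shared values are roughly x mean 9, y mean 7.5, y variance 4.1 and correlation 0.82, so the figures on the slide appear to be rounded to whole numbers. The x variance of 10 matches the population variance; the sample variance computed here is 11.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics of the kind listed on the slide (sample variance)."""
    return {
        "x_mean": mean(xs),
        "x_var": variance(xs),
        "y_mean": mean(ys),
        "y_var": variance(ys),
        "corr": pearson(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four rows print nearly identical numbers, even though scatterplots of the four datasets look completely different. That is the argument for showing the data rather than only its summary.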


Analysis framework: Four levels, three questions

• domain situation
  – who are the target users?
• abstraction
  – translate from specifics of domain to vocabulary of vis
  – what is shown? data abstraction
    • often don’t just draw what you’re given: transform to new form
  – why is the user looking at it? task abstraction
• idiom
  – how is it shown?
    • visual encoding idiom: how to draw
    • interaction idiom: how to manipulate
• algorithm
  – efficient computation

[A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).]
[A Multi-Level Typology of Abstract Visualization Tasks. Brehmer and Munzner. IEEE TVCG 19(12):2376-2385, 2013 (Proc. InfoVis 2013).]

Why is validation difficult?

• different ways to get it wrong at each level

Domain situation: You misunderstood their needs
Data/task abstraction: You’re showing them the wrong thing
Visual encoding/interaction idiom: The way you show it doesn’t work
Algorithm: Your code is too slow

Why is validation difficult?

• solution: use methods from different fields at each level


Domain situation (problem-driven work)
  – anthropology/ethnography: observe target users using existing tools
  Data/task abstraction
    Visual encoding/interaction idiom
      – design: justify design with respect to alternatives
      Algorithm (technique-driven work)
        – computer science: measure system time/memory, analyze computational complexity
      – cognitive psychology: analyze results qualitatively, measure human time with lab experiment (lab study)
    – anthropology/ethnography: observe target users after deployment
  – anthropology/ethnography: measure adoption

[A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).]

Why analyze?

• imposes a structure on huge design space
  – scaffold to help you think systematically about choices
  – analyzing existing as stepping stone to designing new

[SpaceTree: Supporting Exploration in Large Node Link Tree, Design Evolution and Empirical Evaluation. Grosjean, Plaisant, and Bederson. Proc. InfoVis 2002, p 57–64.]
[TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context With Guaranteed Visibility. ACM Trans. on Graphics (Proc. SIGGRAPH) 22:453–462, 2003.]

What? Tree
Why?
  Actions: Present, Locate, Identify
  Targets: Path between two nodes
How?
  SpaceTree: Encode, Navigate, Select, Filter, Aggregate
  TreeJuxtaposer: Encode, Navigate, Select, Arrange

What?

Datasets
  Data Types: Items, Attributes, Links, Positions, Grids
  Data and Dataset Types
    Tables: Items, Attributes
    Networks & Trees: Items (nodes), Links, Attributes
    Fields: Grids, Positions, Attributes
    Geometry: Items, Positions
    Clusters, Sets, Lists: Items
  Dataset Types
    Tables: Items (rows), Attributes (columns), Cell containing value
    Multidimensional Table: Value in cell
    Networks: Node (item), Link
    Trees
    Fields (Continuous): Grid of positions, Cell, Attributes (columns), Value in cell
    Geometry (Spatial): Position
  Dataset Availability: Static, Dynamic

Attributes
  Attribute Types
    Categorical
    Ordered: Ordinal, Quantitative
  Ordering Direction: Sequential, Diverging, Cyclic

Types: Datasets and data

Dataset Types
  Tables: Items (rows), Attributes (columns), Cell containing value
  Networks: Node (item), Link
  Spatial
    Fields (Continuous): Grid of positions, Cell, Attributes (columns), Value in cell
    Geometry (Spatial): Position

Attribute Types
  Categorical
  Ordered: Ordinal, Quantitative

Why?

Actions
  Analyze
    Consume: Discover, Present, Enjoy
    Produce: Annotate, Record, Derive
  Search
    Target known, location known: Lookup
    Target unknown, location known: Browse
    Target known, location unknown: Locate
    Target unknown, location unknown: Explore
  Query: Identify, Compare, Summarize

Targets
  All Data: Trends, Outliers, Features
  Attributes
    One: Distribution, Extremes
    Many: Dependency, Correlation, Similarity
  Network Data: Topology, Paths
  Spatial Data: Shape

• {action, target} pairs
  – discover distribution
  – compare trends
  – locate outliers
  – browse topology

Actions I: Analyze

• consume
  – discover vs present
    • classic split
    • aka explore vs explain
  – enjoy
    • newcomer
    • aka casual, social
• produce
  – annotate, record
  – derive
    • crucial design choice

Derive

• don’t just draw what you’re given!
  – decide what the right thing to show is
  – create it with a series of transformations from the original dataset
  – draw that
• one of the four major strategies for handling complexity

Original Data: exports, imports
Derived Data: trade balance = exports − imports

Analysis example: Derive one attribute

• Strahler number
  – centrality metric for trees/networks
  – derived quantitative attribute
  – draw top 5K of 500K for good skeleton

[Using Strahler numbers for real time visual exploration of huge graphs. Auber. Proc. Intl. Conf. Computer Vision and Graphics, pp. 56–69, 2002.]
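To make the derive-then-filter idea concrete, here is a minimal sketch that computes the classic Strahler number for a rooted tree and keeps only the highest-ranked nodes. It is an illustration under assumptions rather than the method of the cited paper, which extends the measure to general graphs. The function names, the `{node: [children]}` input format and the tiny example tree are all hypothetical. The rule used: a leaf gets 1; an internal node gets the largest child value, plus 1 if at least two children share that largest value.

```python
def strahler_numbers(children, root):
    """Return {node: Strahler number} for a rooted tree given as {node: [children]}."""
    numbers = {}
    # Iterative post-order traversal, so deep trees do not hit the recursion limit.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if not kids:
            numbers[node] = 1  # leaves have Strahler number 1
        elif not expanded:
            # Revisit this node after all of its children have been numbered.
            stack.append((node, True))
            stack.extend((kid, False) for kid in kids)
        else:
            values = [numbers[kid] for kid in kids]
            top = max(values)
            # Increase by one only when the maximum occurs at least twice.
            numbers[node] = top + 1 if values.count(top) >= 2 else top
    return numbers


def top_k_nodes(children, root, k):
    """Derive the attribute, then filter: keep the k nodes with the highest values.

    Ties are broken arbitrarily in this sketch.
    """
    numbers = strahler_numbers(children, root)
    ranked = sorted(numbers, key=lambda node: numbers[node], reverse=True)
    return set(ranked[:k])


# Hypothetical example tree: a has children b and c, b has d and e, e has f.
TREE = {"a": ["b", "c"], "b": ["d", "e"], "c": [], "d": [], "e": ["f"], "f": []}
NUMBERS = strahler_numbers(TREE, "a")
SKELETON = top_k_nodes(TREE, "a", 2)
print(NUMBERS, SKELETON)
```

In the two-task analysis, `strahler_numbers` plays the role of Task 1 (derive a quantitative attribute on nodes) and `top_k_nodes` the role of Task 2 (filter the tree using that attribute).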

Task 1
  What? In: Tree. Out: Quantitative attribute on nodes
  Why? Derive
Task 2
  What? In: Tree + Quantitative attribute on nodes. Out: Filtered Tree (removed unimportant parts)
  Why? Summarize: Topology
  How? Reduce: Filter

Actions II: Search

• what does user know?
  – target, location

                    Target known    Target unknown
Location known      Lookup          Browse
Location unknown    Locate          Explore
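The two-by-two table above can be read as a lookup from what the user knows to the search action. A minimal sketch follows; the function name and boolean parameters are invented for illustration.

```python
def search_action(target_known: bool, location_known: bool) -> str:
    """Map what the user already knows to one of the four search actions."""
    table = {
        (True, True): "lookup",
        (False, True): "browse",
        (True, False): "locate",
        (False, False): "explore",
    }
    return table[(target_known, location_known)]


# Example: the user knows what they are looking for but not where it is.
print(search_action(target_known=True, location_known=False))
```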

Actions III: Query

• what does user know?
  – target, location
• how much of the data matters?
  – one, some, all
  – Query: Identify, Compare, Summarize
• analyze, search, query
  – independent choices for each

Targets

All Data: Trends, Outliers, Features
Attributes
  One: Distribution, Extremes
  Many: Dependency, Correlation, Similarity
Network Data: Topology, Paths
Spatial Data: Shape

How?

Encode
  Arrange: Express, Separate, Order, Align, Use
  Map from categorical and ordered attributes
    Color: Hue, Saturation, Luminance
    Size, Angle, Curvature, ...
    Shape
    Motion: Direction, Rate, Frequency, ...
Manipulate: Change, Select, Navigate
Facet: Juxtapose, Partition, Superimpose
Reduce: Filter, Aggregate, Embed

How to encode: Arrange space, map channels

Encode
  Arrange: Express, Separate, Order, Align, Use
  Map from categorical and ordered attributes
    Color: Hue, Saturation, Luminance
    Size, Angle, Curvature, ...
    Shape
    Motion: Direction, Rate, Frequency, ...

Definitions: Marks and channels

• marks
  – geometric primitives
  – Points, Lines, Areas
• channels
  – control appearance of marks
  – Position: Horizontal, Vertical, Both
  – Color
  – Shape
  – Tilt
  – Size: Length, Area, Volume

Encoding visually with marks and channels

• analyze idiom structure
  – as combination of marks and channels

1: vertical position; mark: line
2: vertical position, horizontal position; mark: point
3: vertical position, horizontal position, color hue; mark: point
4: vertical position, horizontal position, color hue, size (area); mark: point

Channels: Expressiveness types and effectiveness rankings

Magnitude Channels: Ordered Attributes (ranked from most to least effective)
  Position on common scale
  Position on unaligned scale
  Length (1D size)
  Tilt/angle
  Area (2D size)
  Depth (3D position)
  Color luminance
  Color saturation
  Curvature
  Volume (3D size)

Identity Channels: Categorical Attributes (ranked from most to least effective)
  Spatial region
  Color hue
  Motion
  Shape

• expressiveness principle
  – match channel and data characteristics
• effectiveness principle
  – encode most important attributes with highest ranked channels
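The two principles can be combined into a simple greedy assignment. The sketch below is an illustrative simplification, not a procedure from the talk: attributes are listed from most to least important, each is matched to a channel family by type (expressiveness), and within that family it takes the best channel still unused (effectiveness). The function name and the `(name, kind)` input format are assumptions, and the sketch ignores interactions between channels.

```python
# Channel rankings as listed on the slide, most effective first.
MAGNITUDE_CHANNELS = [
    "position on common scale",
    "position on unaligned scale",
    "length (1D size)",
    "tilt/angle",
    "area (2D size)",
    "depth (3D position)",
    "color luminance",
    "color saturation",
    "curvature",
    "volume (3D size)",
]
IDENTITY_CHANNELS = ["spatial region", "color hue", "motion", "shape"]


def assign_channels(attributes):
    """Greedily assign channels to attributes.

    attributes: list of (name, kind) pairs ordered from most to least
    important, where kind is "ordered" or "categorical".
    """
    free = {
        "ordered": list(MAGNITUDE_CHANNELS),      # expressiveness: magnitude channels
        "categorical": list(IDENTITY_CHANNELS),   # expressiveness: identity channels
    }
    assignment = {}
    for name, kind in attributes:
        if kind not in free:
            raise ValueError(f"unknown attribute kind: {kind!r}")
        if not free[kind]:
            raise ValueError(f"no {kind} channel left for {name!r}")
        # Effectiveness: the most important attribute gets the best free channel.
        assignment[name] = free[kind].pop(0)
    return assignment


# Hypothetical attributes, most important first.
EXAMPLE = assign_channels(
    [("price", "ordered"), ("region", "categorical"), ("volume", "ordered")]
)
print(EXAMPLE)
```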

How to handle complexity: 3 more strategies + 1 previous

Manipulate: Change, Select, Navigate
Facet: Juxtapose, Partition, Superimpose
Reduce: Filter, Aggregate, Embed
Derive

• change view over time
  – most obvious & flexible of the 4 strategies
• facet across multiple views
• reduce items/attributes within single view
• derive new data to show within view

Idiom: Animated transitions

• smooth transition from one state to another
  – alternative to jump cuts
  – support for item tracking when amount of change is limited
• example: multilevel matrix views
  – scope of what is shown narrows down
    • middle block stretches to fill space, additional structure appears within
    • other blocks squish down to increasingly aggregated representations

[Using Multilevel Call Matrices in Large Software Projects. van Ham. Proc. IEEE Symp. Information Visualization (InfoVis), pp. 227–232, 2003.]

Facet

• facet data across multiple views

Juxtapose: Coordinate Multiple Side By Side Views
  – Share Encoding: Same/Different
  – Share Data: All/Subset/None
  – Share Navigation
  – Linked Highlighting
Partition
Superimpose

Idiom: Linked highlighting
System: EDV

• see how regions contiguous in one view are distributed within another
  – powerful and pervasive interaction idiom
• encoding: different
  – multiform
• data: all shared

[Visual Exploration of Large Structured Datasets. Wills. Proc. New Techniques and Trends in Statistics (NTTS), pp. 237–246. IOS Press, 1995.]

Idiom: bird’s-eye
System: Google Maps

• encoding: same
• data: subset shared
• navigation: shared
  – bidirectional linking
• differences
  – viewpoint
  – (size)
• overview-detail

[A Review of Overview+Detail, Zooming, and Focus+Context Interfaces. Cockburn, Karlson, and Bederson. ACM Computing Surveys 41:1 (2008), 1–31.]

Idiom: Small multiples
System: Cerebral

• encoding: same
• data: none shared
  – different attributes for node colors
  – (same network layout)
• navigation: shared

[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE Trans. Visualization and Computer Graphics (Proc. InfoVis 2008) 14:6 (2008), 1253–1260.]

Coordinate views: Design choice interaction

                       Data: All    Data: Subset                 Data: None
Encoding: Same         Redundant    Overview/Detail              Small Multiples
Encoding: Multiform    Multiform    Multiform, Overview/Detail   No Linkage

• why juxtapose views?
  – benefits: eyes vs memory
    • lower cognitive load to move eyes between 2 views than remembering previous state with single changing view

  – costs: display area, 2 views side by side each have only half the area of one view

Partition into views

• how to divide data between views
  – encodes association between items using spatial proximity
  – major implications for what patterns are visible
  – split according to attributes
• design choices
  – how many splits
    • all the way down: one mark per region?
    • stop earlier, for more complex structure within region?
  – order in which attribs used to split
  – how many views

Partitioning: List alignment

• single bar chart with grouped bars
  – split by state into regions
    • complex glyph within each region showing all ages
  – compare: easy within state, hard across ages
• small-multiple bar charts
  – split by age into regions
    • one chart per region
  – compare: easy within age, harder across states

http://bl.ocks.org/mbostock/3887051
http://bl.ocks.org/mbostock/4679202

Partitioning: Recursive subdivision
System: HIVE

• split by neighborhood
• then by type
• then time
  – years as rows
  – months as columns
• color by price
• neighborhood patterns
  – where it’s expensive
  – where you pay much more for detached type
• switch order of splits
  – type then neighborhood
• switch color
  – by price variation
• type patterns
  – within specific type, which neighborhoods inconsistent
• different encoding for second-level regions
  – choropleth maps

[Configuring Hierarchical Layouts to Address Research Questions. Slingsby, Dykes, and Wood. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis 2009) 15:6 (2009), 977–984.]
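The effect of split order can be sketched with a small recursive grouping function. This is a minimal illustration with made-up rows, not code from HIVE or the linked examples. The `partition` helper and the sample values are hypothetical. Splitting by state first yields one region per state with all ages inside (the grouped-bar layout). Splitting by age first yields one region per age group (the small-multiples layout).

```python
from collections import defaultdict


def partition(rows, keys):
    """Recursively split rows into nested regions, one level per key in order."""
    if not keys:
        return list(rows)  # stop splitting: these rows share one region
    first, rest = keys[0], keys[1:]
    groups = defaultdict(list)
    for row in rows:
        groups[row[first]].append(row)
    return {value: partition(group, rest) for value, group in groups.items()}


# Hypothetical data: population (in millions) by state and age group.
ROWS = [
    {"state": "CA", "age": "Under 5", "population": 3},
    {"state": "CA", "age": "65+", "population": 4},
    {"state": "NY", "age": "Under 5", "population": 1},
    {"state": "NY", "age": "65+", "population": 2},
]

BY_STATE = partition(ROWS, ["state", "age"])  # easy to compare within a state
BY_AGE = partition(ROWS, ["age", "state"])    # easy to compare within an age group

print(list(BY_STATE), list(BY_AGE))
```

The same rows end up in both structures. Only the nesting order changes, and with it which comparisons fall within a single region.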

Reduce

• reduce what is shown within single view

Reduce items and attributes

Filter: Items, Attributes
Aggregate: Items, Attributes
Embed

• reduce/increase: inverses
• filter
  – pro: straightforward and intuitive
    • to understand and compute
  – con: out of sight, out of mind
• aggregation
  – pro: inform about whole set
  – con: difficult to avoid losing signal

• not mutually exclusive
  – combine filter, aggregate
  – combine reduce, facet, change, derive

Idiom: boxplot

• static item aggregation
• task: find distribution
• data: table
• derived data
  – 5 quant attribs
    • median: central line
    • lower and upper quartile: boxes
    • lower and upper fences: whiskers
      – values beyond which items are outliers
  – outliers beyond fence cutoffs explicitly shown

[40 years of boxplots. Wickham and Stryjewski. 2012. had.co.nz]

[…] pod, and the rug looks like the seeds within. Kampstra (2008) also suggests a way of comparing two groups more easily: use the left and right sides of the bean to display different distributions. A related idea is the raindrop plot (Barrowman and Myers, 2003), but its focus is on the display of error distributions from complex models.

Figure 4 demonstrates these density boxplots applied to 100 numbers drawn from each of four distributions with mean 0 and standard deviation 1: a standard normal, a skew-right distribution (Johnson distribution with skewness 2.2 and kurtosis 13), a leptokurtic distribution (Johnson distribution with skewness 0 and kurtosis 20) and a bimodal distribution (two normals with mean -0.95 and 0.95 and standard deviation 0.31). Richer displays of density make it much easier to see important variations in the distribution: multi-modality is particularly important, and yet completely invisible with the boxplot.

Figure 4: From left to right: box plot, vase plot, violin plot and bean plot. Within each plot, the distributions from left to right are: standard normal (n), right-skewed (s), leptokurtic (k), and bimodal (mm). A normal kernel and bandwidth of 0.2 are used in all plots for all groups.

A more sophisticated display is the sectioned density plot (Cohen and Cohen, 2006), which uses both colour and space to stack a density estimate into a smaller area, hopefully without losing any information (not formally verified with a perceptual study). The sectioned density plot is similar in spirit to horizon graphs for time series (Reijner, 2008), which have been found to be just as readable as regular line graphs despite taking up much less space (Heer et al., 2009). The density strips of Jackson (2008) provide a similar compact display that uses colour instead of width to display density. These methods are shown in Figure 5.
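The five derived attributes of the boxplot idiom can be computed in a few lines. The sketch below is one common convention, not something specified in the talk: fences are placed 1.5 times the interquartile range beyond the quartiles (Tukey's rule), and quartiles use the `inclusive` method of Python's `statistics.quantiles`. The function name and the sample data are hypothetical.

```python
from statistics import quantiles


def boxplot_summary(values, whisker=1.5):
    """Derive the five boxplot attributes plus the explicitly shown outliers.

    Assumes the Tukey convention: fences lie `whisker` * IQR beyond the quartiles.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    return {
        "median": median,
        "lower_quartile": q1,
        "upper_quartile": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # Items beyond the fence cutoffs are drawn individually.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical sample with one extreme value.
SUMMARY = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(SUMMARY)
```

The whole sample collapses to five numbers and a list of outliers. This is the aggregation trade-off: the display is compact, but features such as multimodality are lost.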

Idiom: Dimensionality reduction for documents

• attribute aggregation
  – derive low-dimensional target space from high-dimensional measured space

Task 1
  What? In: High-dimensional data. Out: 2D data
  Why? Produce, Derive
Task 2
  What? In: 2D data. Out: Scatterplot, Clusters & points
  Why? Discover, Explore, Identify
  How? Encode, Navigate, Select
Task 3
  What? In: Scatterplot, Clusters & points. Out: Labels for clusters
  Why? Produce, Annotate

[Summary figure: the What? / Why? / How? overviews combined across the domain, abstraction, idiom, and algorithm levels]

More Information

@tamaramunzner
• this talk
  http://www.cs.ubc.ca/~tmm/talks.html#vad15d3

• book page (including tutorial lecture slides)
  http://www.cs.ubc.ca/~tmm/vadbook
  – 20% promo code for book+ebook combo: HVN17
  – http://www.crcpress.com/product/isbn/9781466508910

: Eamonn Maguire

• papers, videos, software, talks, full courses
  http://www.cs.ubc.ca/group/infovis
  http://www.cs.ubc.ca/~tmm

Visualization Analysis and Design. Munzner. A K Peters Visualization Series, CRC Press, 2014.