Session 2 Tamara Munzner Department of Computer Science University of British Columbia
Total Page:16
File Type:pdf, Size:1020Kb
Visualization Analysis & Design Full-Day Tutorial Session 2 Tamara Munzner Department of Computer Science University of British Columbia Sanger Institute / European Bioinformatics Institute June 2014, Cambridge UK http://www.cs.ubc.ca/~tmm/talks.html#minicourse14 Outline • Visualization Analysis Framework • Idiom Design Choices Session 1 9:30-10:45am Session 2 11:00am-12:15pm – Introduction: Definitions – Arrange Tables – Analysis: What, Why, How – Arrange Spatial Data – Marks and Channels – Arrange Networks and Trees – Map Color • Idiom Design Choices, Part 2 • Guidelines and Examples Session 3 1:15pm-2:45pm Session 4 3-4:30pm – Manipulate: Change, Select, Navigate – Rules of Thumb – Facet: Juxtapose, Partition, Superimpose – Validation – Reduce: Filter, Aggregate, Embed – BioVis Analysis Example http://www.cs.ubc.ca/~tmm/talks.html#minicourse14 2 How? EncodeEncode Manipulate Manipulate Facet Facet ReduReducece Arrange Map Change Juxtapose Filter Express Separate from categorical and ordered attributes Color Hue Saturation Order Align Luminance Select Partition Aggregate Size, Angle, Curvature, ... Use Navigate Superimpose Embed Shape Motion Direction, Rate, Frequency, ... 3 Arrange space Encode Arrange Express Separate Order Align Use 4 Arrange tables Express Values Axis Orientation Rectilinear Parallel Radial Separate, Order, Align Regions Separate Order Layout Density Dense Space-Filling Align 1 Key 2 Keys 3 Keys Many Keys List Matrix Volume Recursive Subdivision 5 Keys and values Tables • key Attributes (columns) Items – independent attribute (rows) – used as unique index to look up items Cell containing value – simple tables: 1 key – multidimensional tables: multiple keys Multidimensional Table • value – dependent attribute, value of cell Value in cell • classify arrangements by key count – 0, 1, 2, many... Express Values 1 Key 2 Keys 3 Keys Many Keys List Matrix Volume Recursive Subdivision 6 Idiom: scatterplot Express Values • express values – quantitative attributes • no keys, only values – data • 2 quant attribs – mark: points – channels • horiz + vert position – tasks • find trends, outliers, distribution, correlation, clusters – scalability • hundreds of items [A layered grammar of graphics. Wickham. Journ. Computational and Graphical Statistics 19:1 (2010), 3–28.] 7 Some keys: Categorical regions Separate Order Align • regions: contiguous bounded areas distinct from each other – using space to separate (proximity) – following expressiveness principle for categorical attributes • use ordered attribute to order and align regions 1 Key 2 Keys 3 Keys Many Keys List Matrix Volume Recursive Subdivision 8 Idiom: bar chart 100 100 • one key, one value 75 75 50 50 – data 25 25 • 1 categ attrib, 1 quant attrib 0 0 – mark: lines – channels Animal Type Animal Type • length to express quant value • spatial regions: one per mark – separated horizontally, aligned vertically – ordered by quant attrib » by label (alphabetical), by length attrib (data-driven) – task • compare, lookup values – scalability • dozens to hundreds of levels for key attrib 9 Idiom: stacked bar chart • one more key – data • 2 categ attrib, 1 quant attrib – mark: vertical stack of line marks • glyph: composite object, internal structure from multiple marks – channels [Using Visualization to Understand the • length and color hue Behavior of Computer Systems. Bosch. Ph.D. • spatial regions: one per glyph thesis, Stanford Computer Science, 2001.] – aligned: full glyph, lowest bar component – unaligned: other bar components – task • part-to-whole relationship – scalability • several to one dozen levels for stacked attrib 10 Idiom: streamgraph • generalized stacked graph – emphasizing horizontal continuity • vs vertical items [Stacked Graphs Geometry & Aesthetics. Byron and Wattenberg. – data IEEE Trans. Visualization and Computer Graphics (Proc. InfoVis • 1 categ key attrib (artist) 2008) 14(6): 1245–1252, (2008).] • 1 ordered key attrib (time) • 1 quant value attrib (counts) – derived data • geometry: layers, where height encodes counts • 1 quant attrib (layer ordering) – scalability • hundreds of time keys • dozens to hundreds of artist keys – more than stacked bars, since most layers don’t extend across whole chart 11 Idiom: line chart 20 • one key, one value 15 – data 10 • 2 quant attribs 5 – mark: points 0 • line connection marks between them – channels Year • aligned lengths to express quant value • separated and ordered by key attrib into horizontal regions – task • find trend – connection marks emphasize ordering of items along key axis by explicitly showing relationship between one item and the next 12 Choosing bar vs line charts 60 60 50 50 40 40 • depends on type of key attrib 30 30 20 20 – bar charts if categorical 10 10 0 0 – line charts if ordered Female Male Female Male • do not use line charts for 60 60 50 50 categorical key attribs 40 40 30 30 – violates expressiveness principle 20 20 10 10 • implication of trend so strong that 0 0 10-year-olds 12-year-olds 10-year-olds 12-year-olds it overrides semantics! – “The more male a person is, the after [Bars and Lines: A Study of Graphic Communication. taller he/she is” Zacks and Tversky. Memory and Cognition 27:6 (1999), 1073–1079.] 13 Idiom: heatmap • two keys, one value – data • 2 categ attribs (gene, experimental condition) • 1 quant attrib (expression levels) – marks: area • separate and align in 2D matrix – indexed by 2 categorical attributes 1 Key 2 Keys Many Keys – channels List Matrix Recursive Subdivision • color by quant attrib – (ordered diverging colormap) – task • find clusters, outliers – scalability • 1M items, 100s of categ levels, ~10 quant attrib levels 14 Idiom: cluster heatmap • in addition – derived data • 2 cluster hierarchies – dendrogram • parent-child relationships in tree with connection line marks • leaves aligned so interior branch heights easy to compare – heatmap • marks (re-)ordered by cluster hierarchy traversal 15 Axis Orientation Rectilinear Parallel Radial 16 Idioms: scatterplot matrix, parallel coordinates • scatterplot matrix (SPLOM) Scatterplot Matrix Parallel Coordinates Math Physics Dance Drama – rectilinear axes, point mark Math 100 – all possible pairs of axes 90 Physics 80 70 – scalability 60 Dance 50 • one dozen attribs 40 30 • dozens to hundreds of items 20 Drama 10 • parallel coordinates Math Physics Dance Drama 0 – parallel axes, jagged line representing item Table – rectilinear axes, item as point Math Physics Dance Drama • axis ordering is major challenge 85 95 70 65 90 80 60 50 – scalability 65 50 90 90 50 40 95 80 • dozens of attribs 40 60 80 90 • hundreds of items after [Visualization Course Figures. McGuffin, 2014. http://www.michaelmcguffin.com/courses/vis/] 17 Task: Correlation • scatterplot matrix – positive correlation • diagonal low-to-high – negative correlation • diagonal high-to-low [A layered grammar of graphics. Wickham. Journ. Computational and Graphical Statistics – uncorrelated 19:1 (2010), 3–28.] • parallel coordinates – positive correlation • parallel line segments – negative correlation • all segments cross at halfway point – uncorrelated [Hyperdimensional Data Analysis Using Parallel Coordinates. • scattered crossings Wegman. Journ. American Statistical Association 85:411 (1990), 664–675.] 18 Idioms: radial bar chart, star plot • radial bar chart – radial axes meet at central ring, line mark • star plot – radial axes, meet at central point, line mark • bar chart – rectilinear axes, aligned vertically • accuracy – length unaligned with radial • less accurate than aligned with rectilinear [Vismon: Facilitating Risk Assessment and Decision Making In Fisheries Management. Booshehrian, Möller, Peterman, and Munzner. Technical Report TR 2011-04, Simon Fraser University, School of Computing Science, 2011.] 19 Idioms: pie chart, polar area chart • pie chart – area marks with angle channel – accuracy: angle/area much less accurate than line length • polar area chart – area marks with length channel – more direct analog to bar charts • data – 1 categ key attrib, 1 quant value attrib • task – part-to-whole judgements [A layered grammar of graphics. Wickham. Journ. Computational and Graphical Statistics 19:1 (2010), 3–28.] 20 Idioms: normalized stacked bar chart3/21/2014 bl.ocks.org/mbostock/raw/3886394/ 100% 65 Years and Over 90% 80% • task 45 to 64 Years 70% 60% – part-to-whole judgements 50% 25 to 44 Years 40% 30% 18 to 24 Years 20% 14 to 17 Years 10% 5 to 13 Years Under 5 Years • normalized stacked bar chart 3/21/200%14 bl.ocks.org/mbostock/raw/3886208/ UT TX ID AZ NV GA AK MS NMNE CA OK SDCO KS WY NC AR LA IN IL MNDE HI SCMOVA IA TN KY AL WAMDNDOH WI OR NJ MT MI FL NY DC CT PA MAWV RI NHME VT n o i 65 Years and Over t a 35M l u 45 to 64 Years p o – stacked bar chart, normalized to full vert height P 25 to 44 Years 18 to 24 Years 30M 14 to 17 Years 5 to 13 Years Under 5 Years – single stacked bar equivalent to full pie 25M • high information density: requires narrow rectangle 20M 15M 10M 5.0M 0.0 • pie chart CA TX NY FL IL PA OH MI GA3N/2C1/N2J01V4A WA AZ MA IN TN MO MD WI MN CO AL SC LA KY OR OK CT IAblM.oScAksR.oKrgS/mUbToNstVocNkM/rWawV/N38E86ID39M4E/ NH HI RI MT DE SD AK ND VT DC WY 100% 3/21/2014 bl.ocks.org/mbostock/raw/3887235/ 65 Years and Over – information density: requires large circle 90% 80% ≥65 70% <5 45 to 64 Years 45-64 5-13 60% 14-17 50% 18-24 25 to 44 Years http://bl.ocks.org/mbostock/3887235, 40% http://bl.ocks.org/mbostock/raw/3886394/ 1/1 30% 25-44 18 to 24 Years http://bl.ocks.org/mbostock/3886208, 20% 14 to 17 Years 10% 5 21to 13 Years http://bl.ocks.org/mbostock/3886394. Under 5 Years 0% UT TX ID AZ NV GA AK MS NMNE CA OK SDCO KS WY NC AR LA IN IL MNDE HI SCMOVA IA TN KY AL WAMDNDOH WI OR NJ MT MI FL NY DC CT PA MAWV RI NHME VT http://bl.ocks.org/mbostock/raw/3886208/ 1/1 http://bl.ocks.org/mbostock/raw/3887235/ 1/1 http://bl.ocks.org/mbostock/raw/3886394/ 1/1 Two types of glyph – lines and stars – are especially useful for temporal displays.