Best Practices in Data Visualization
Jodie Jenkinson, Associate Professor + Director
Biomedical Communications, University of Toronto
bmc.med.utoronto.ca

Why visualize data?
• Because of the inexplicability of complex information or raw data
• In order to leverage visual perception
• To create an aid to understanding
• To provide insight

Best Practices in Data Visualization – ComSciComCan Jenkinson

Leveraging visual perception
Vision is high bandwidth
• 40%+ of cortex devoted to visual perception
• Information processing capacity of the visual system
  • High-bandwidth channel
  • Parallel processing
  • 10^9 bits per second*
    • = 1 billion bits
    • = ~120 Megabytes per second
* Information Capacity of a Single Retinal Channel, DH Kelly, IRE Transactions on Information Theory, 1962, pp. 221

[Figure labels: All visual stimuli; Pre-attentive; Attend to]

Providing insight
Why visualize?
Anscombe’s quartet
• Same
  • Mean (x and y)
  • Variance
  • Correlation
  • Regression

       I             II            III            IV
    x      y      x      y      x      y      x      y
   10   8.04     10   9.14     10   7.46      8   6.58
    8   6.95      8   8.14      8   6.77      8   5.76
   13   7.58     13   8.74     13  12.74      8   7.71
    9   8.81      9   8.77      9   7.11      8   8.84
   11   8.33     11   9.26     11   7.81      8   8.47
   14   9.96     14   8.10     14   8.84      8   7.04
    6   7.24      6   6.13      6   6.08      8   5.25
    4   4.26      4   3.10      4   5.39     19  12.50
   12  10.84     12   9.13     12   8.15      8   5.56
    7   4.82      7   7.26      7   6.42      8   7.91
    5   5.68      5   4.74      5   5.73      8   6.89

[Figure: scatter plots of datasets I to IV; x-axis ticks 4 to 20, y-axis ticks 4 to 13]

Francis J. Anscombe, Graphs in Statistical Analysis. The American Statistician, vol. 27, no. 1, pp. 17–21, 1973

“…If a picture is only worth a thousand words, we're screwed.”
Eric Lander, Professor of Biology, MIT
VIZBI 2011, Opening Remarks

Data Visualization in a nutshell
• Visual mapping (encoding data)
• Providing adequate context
• Balancing clarity & aesthetics

Encoding Data

Data types
• Visual representation of data should be consistent with the numerical representation
• Nominal
  • name, type, category
  • e.g. mammals, reptiles, birds
• Ordinal
  • integer sequence
  • e.g. first, second, third
  • happy, very happy, ecstatic
• Interval
  • gap in values
  • e.g. every three months
• Ratio
  • real numbers; zero as reference
  • 45.7 out of 100

Data features
• Spatial
  • e.g. maps, GIS, directions
  • scalar fields
• Narrative
  • e.g. assembly sequence, process
• 1d, 2d, 3d

Visual encoding
• The way in which data is mapped to visual structures
• Every visualization can be described as a set of mappings:
  • from data items to visual marks
  • from data attributes to visual channels

Marks
• Data Items: Data Marks, the basic visual units that represent data objects visually
• Data Attributes: Visual Channels, the visual variables we can use to represent characteristics of these objects
From Enrico Bertini

Channels
From Enrico Bertini

Cleveland & McGill’s Perceptual Task Scale
From The Functional Art, Alberto Cairo
Allows more accurate judgments
• Position along a common scale
• Position along nonaligned scales
• Length
• Direction
• Angle
• Area
• Volume
• Curvature
• Shading
• Colour saturation
Allows more generic judgments

Cleveland and McGill’s Perceptual Task Scale
• Tasks are grouped according to how well you can perceive differences in the data

A basic example: the pie chart
• People are not good at making visual angular distinctions
• Pie charts are sometimes rolled out to encode 1 or 2 numbers; usually a very low data density!

Pie charts
[Figure: pie charts with legend 2002, 2003, 2004, 2005, 2006, 2007; slice labels 7%, 8%, 10%, 11%, 29%, 35%]

Lies, damned lies…
• Misalignments between graphic elements and the data they are intended to represent
“There are three kinds of lies: lies, damned lies, and statistics”
Benjamin Disraeli

Misleading use of area
[Figures: examples of misleading use of area]
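The Anscombe's quartet slide earlier states that the four datasets share the same mean, variance, correlation, and regression. A minimal Python sketch for checking that claim is given below. It uses only the standard library, the values are transcribed from the quartet listed earlier, and the names `QUARTET` and `summarize` are illustrative choices rather than anything from the original slides.

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the slide data.
# Datasets I, II, and III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics named on the slide for one dataset."""
    mx, my = mean(xs), mean(ys)
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares regression of y on x
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),  # sample variance (n - 1 denominator)
        "var_y": variance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows should agree to roughly two decimal places, even though the scatter plots of the same data look nothing alike. That gap between the summary numbers and the pictures is the point the slide makes.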
Linear vs. quadratic change
• A common mistake for chart design is to scale an area by two sides at the same time, producing a quadratic effect for a linear change

Circles (area = πr^2)
• r = 80: the area of the blue circle is equal to πr^2 (20,106)
• r = 160: the area of the red circle is equal to πr^2 (80,424)

Squares (area = a^2)
• a = 80: the area of the white square = 6,400 px
• a = 160: the area of the red square = 25,600 px ...four times that of the white square
• a = 113: to double the area of the white square (= 12,800 px), a = √12,800 = 113; the area of the grey square is twice that of the white square

Size encoding
• Height
• Area
• Volume

How statistics lie
• Modifications to the X or Y axis in an attempt to make differences or change appear to be more dramatic
• Data represented out of context does not allow for adequate comparison

The disappearing baseline
[Figures: charts with a truncated baseline]

Reversing the x-axis
[Figure]

No defined y-axis
[Figure]

Context
• Without context we are unable to see the big picture
• Without context we are unable to make meaningful comparisons
Source: New York Times
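The linear-versus-quadratic scaling problem described earlier (doubling both sides of a square quadruples its area) can be checked with a few lines of arithmetic. The sketch below is only an illustration under the slide's own numbers. The function name `length_for_area_ratio` is hypothetical, and it relies on the fact that the area of a square or circle grows with the square of its side or radius.

```python
import math


def length_for_area_ratio(base_length, area_ratio):
    """Side (or radius) needed so the area is `area_ratio` times the base area.

    Area grows with the square of the length, so the length must grow
    with the square root of the desired area ratio.
    """
    return base_length * math.sqrt(area_ratio)


base_side = 80                # white square from the slide, area 6,400 px
naive_side = base_side * 2    # doubling the side to show "twice as much"
correct_side = length_for_area_ratio(base_side, 2)

print(naive_side ** 2 / base_side ** 2)    # area ratio when the side is doubled
print(round(correct_side))                 # side that actually doubles the area
print(round(correct_side ** 2))            # resulting area in px
```

With the slide's values this gives an area ratio of 4 for the doubled side, and a side of about 113 px for a true doubling of area to 12,800 px. The same function applies to the circle radii, since circle area also scales with the square of the radius.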
Data in and out of context
[Figures]

Historical context
[Figures]

Clarity & Aesthetics

Effectiveness
• The importance of the information should match the salience of the channel
Kim OY, et al. (2012) Higher levels of serum triglyceride and dietary carbohydrate intake are associated with smaller LDL particle size in healthy Korean women.