CSE 5544: Introduction to Data Visualization Raghu Machiraju [email protected]
[xkcd] Many Thanks to
– Alex Lex, University of Utah – Torsten Moeller, University of Vienna – Tamara Munzner, Univ of British Columbia – Hanspeter Pfister, Harvard University
2 Design Principles
3 Previously
4 Visualization To convey information through visual representations
5 Design Excellence “Well-designed presentations of interesting data are a matter of substance, of statistics, and of design.” E. Tufte
7 Mark Goetz And … Stephen Few Robin Williams Garr Reynolds Design Critique & Redesign https://goo.gl/lHWp4x
Sunday Star Times, 2012 Quantity encoded by diameter, not area! Fixing that: But is this visual encoding appropriate ?
R. Cunliffe, Stats Chat The Premise http://creativity.wikia.com/wiki/File:Aiden-terrifying-horrific-classic.jpg What can you distort ? When you use geometry, photometry do not distort
Distort What ? Data – representations – clusters, trends – correlations
23 Design Excellence “Well-designed presentations of interesting data are a matter of substance, of statistics, and of design.” - Tufte
24 Design Guidelines Mantra – Graphical Integrity – Design Principles – Design Elements
26 Graphical Integrity Flowing Data Dealing with Scale Distortions
Flowing Data Chart Integrity
• Where’s baseline? • What’s scale? • What’s context?
Summer term 2008 15
Where’s 0? MissingVol 1, p. 54 Scales (1) Note middle ‘70
Summer term 2008 16 Tufte, VDQI 30
8 Incomplete Scales Consider This … Look at this closely … Start y-scale at 0?
A. Kriebel, VizWiz Another one …
The Daily Mail, UK, Jan 2012 Putting it in CONTEXT
Mother Jones Frame the Data
Mother Jones A huge problem Junk Charts Other distortions Vol 1, p 54 (2)
What’s being compared?
Summer term 2008 17 The Lie Factor • (Size of effect in graphic)/ Vol 1, 57(size of effect in data)
Tufte, VDQI 41
Scale?
Summer term 2008 18
9 Vol 1, p 54 (2)
What’s being compared?
Summer term 2008 17 The Lie Factor
5.3 0.6 27.5 18 Vol 1, 57 / = 14.8 0.6 18
Tufte, VDQI 42
Scale?
Summer term 2008 18
9 The Lie Factor
http://www.theoildrum.com/node/4645 43 Vol 1, p. 69
area = value?
Summer term 2008 23
Vol 1, p. 62 VolThe 1, p. Lie69 Factor
area = value?
volume = value?
Summer term 2008 23 Tufte, VDQI 44 Summer term 2008 24
Vol 1, p. 62 12
volume = value?
Summer term 2008 24
12 Design Distortions
Tufte, VDQI 45 Inherently pathological Death to Pie Charts
Share of coverage on TechCrunch
“I hate pie charts. I mean, really hate them.”
www.storytellingwithdata.com/2011/07/death-to-pie-charts.html Cole Nussbaumer Redesign The differences? The differences? Tufte’s Integrity
• Show data variation, not design variation • Clear, detailed, and thorough labeling and appropriate scales • Size of the graphic effect should be directly proportional to the numerical quantities (“lie factor”)
51 The problem … Vol 1, p 54 (2)
What’s being compared?
VisualSummer term 2008 Encoding17 Vol 1, 57
Scale?
Summer term 2008 18
9 Not all Encodings are same
Electri c
From Wilkinson 99, based on Stevens 61 Design Principles Use Decomposition
Beer sales Hierarchical Display Show Context
M. Ericson, NY Times 58 M. Ericson, NY Times M. Ericson, NY Times M. Ericson, NY Times 61 Maximize Data-Ink Ratio
0-$24,999 $25,000+ 0-$24,999 $25,000+ Maximize Data-Ink Ratio • Data-ink = the ink used to show data • Data-ink ratio = data-ink / total ink used
700
525
350
175
0 Males Females
0-$24,999 $25,000+ 0-$24,999 $25,000+
63 Avoid Chartjunk Extraneous visual elements that distract from the message
ongoing, Tim Brey
64 Avoid Chartjunk Extraneous visual elements that distract from the message
ongoing, Tim Brey
65 Avoid Chartjunk Extraneous visual elements that distract from the message
ongoing, Tim Brey
66 Avoid Chartjunk Extraneous visual elements that distract from the message
ongoing, Tim Brey
67 Avoid Chartjunk Extraneous visual elements that distract from the message
ongoing, Tim Brey
68 Avoid Chartjunk Extraneous visual elements that distract from the message
ongoing, Tim Brey
69 Which is better?
[Bateman et al. 2010]
70 Which is better?
[Bateman et al. 2010] https://eagereyes.org/criticism/chart-junk-considered-useful-after-all
71
EXPERIMENTAL RESULTS
1. No significant difference between plain and image charts for interactive interpretation accuracy 2. No significant difference in recall accuracy after a five- minute gap 3. Significantly better recall for plain charts of both the chart topic and the details (categories and trend) after long-term gap (2-3 weeks). 4. Participants saw value messages in plain charts significantly more often than in the plain charts. 5. Participants found plain charts more attractive, most enjoyed them, and found that they were easiest and fastest to remember.
73 Before After
G. Reynolds, Presentation Zen Before After
G. Reynolds, Presentation Zen Bring in the Clowns…
World Population in 2008 PTS Blog A better version…
World Population in 2008 PTS Blog Use Chart Junk? It depends! PROS CONS persuasion unbiased analysis memorability trustworthiness engagement interpretability space efficiency More 2.2. HIGH DATA DENSITY 43
2.2 High Data Density
Another theme of Tufte’s is that good graphics have high “data density”. To calculate this Dataquantity, note thatDensity the numbers to be plotted in a graphic can be written as an array of one or more dimensions.
Definition 3 (Date Density)
numbernumber of of entries entries in data array in data array data density =data density = areaarea of data of graphic data graphic
Ho et al., “Thermal Conductivity of the Elements: A Comprehensive Review” J. Phys. Chem. 1974 80 Figure 2.12: High data density graphic: illustration of many experimental measurements of the thermal conductivity of copper. Each string of dots, connected by a thin curve, is from a single publication which is identified by a number enclosed in a circle. Each of these 200-plus articles is listed with its identification number in the bibliography of the source of this figure, C. Y. Ho, R. W. Powell, P. E. Liley, “Thermal conductivity of the elements: A comprehensive review”, supplement no. 1, J. Phys. Chem. Reference Data, 3, pg. 1-244 (1974). The thick solid line is the recommended curve.
Not withstanding this formal definition, the concept of a “high data density” graphic is perhaps best illustrated through some examples. Fig. 2.12 compares a large number of simultaneous measurements of thermal conductivity as a function of temperature. Because more than 200 different curves, each labelled, are combined into a single graph, Tufte has canonized this as a Good Example. Is it? Well, yes, but only up to a point. First, there is no really good way to present 200 data sets. A single figure is certainly much easier to grasp than a whole of lot of figures, Escaping Flatland
Reebee Garofalo, Genealogy of Pop/Rock Music 81 Reebee Garofalo, Genealogy of Pop/Rock Music 82 Sparklines
Tufte, 1990 83 Tufte’s Integrity
• Show data variation, not design variation • Clear, detailed, and thorough labeling and appropriate scales •Size of the graphic effect should be directly proportional to the numerical quantities (“lie factor”)
84 Design Elements Tufte’s Design • Above all else show the data • Maximize data-ink ratio • Erase non-data ink • Erase redundant data ink • Revise and edit
86 Design Pyramid
Aesthetics affective
Usability efficient
Functionality effective Subjective • Aesthetics: Attractive things are perceived as more useful than unattractive ones • Style: Communicates brand, process, who the designer is • Playfulness: Encourages experimentation and exploration • Vividness: Can make a visualization more memorable
Pat Hanrahan, Nov 2007
88 CRAP Contrast Repetition Alignment Proximity Contrast Before After
G. Reynolds, Presentation Zen Before After
G. Reynolds, Presentation Zen Repetition
Faces of the Dead in Iraq, NY Times 94 Alignment
Before After
S. Few, Show me the numbers
95 Proximity
S. Few, Show me the numbers
96 Small Multiples
S. Few, Show me the numbers
97 Small Multiples
Tufte, VDQI 98 62 CHAPTER 2. THE GOSPEL ACCORDING TO TUFTE
Layering and Separation
Train No. 3701 3301 3801 3542 3765 New York 12:10 1:30 3:45 7:30 4:33
Newark, N. J. 1:43 10:30 5:21 8:50 11:45 North Elizabeth ...... 6:45 Elizabeth 3:33 2:05 ...... 7:05
Peekskill 5:34 6:40 ...... 7:20 8:50 Ediison, N. J. 4:45 5:20 4:40 2:10 11:05 Princeton, N. J. 1:30 ...... 3:30 7:30
New York 12:10 1:30 3:45 7:30 4:33
Newark, N. J. 1:43 10:30 5:21 8:50 11:45 North Elizabeth ...... 6:45 Elizabeth 3:33 2:05 ...... 7:05 Tufte, VDQI Peekskill 5:34 6:40 ...... 7:20 8:50 99
Ediison, N. J. 4:45 5:20 4:40 2:10 11:05
Princeton, N. J. 1:30 ...... 3:30 7:30
Train No. 3701 3301 3801 3542 3765
Figure 2.25: Two versions of a railway timetable with the “bad” version on top. Inspired by an example from pgs. 54-55 of Tufte(1990). 62 CHAPTER 2. THE GOSPEL ACCORDING TO TUFTE
Train No. 3701 3301 3801 3542 3765 New York 12:10 1:30 3:45 7:30 4:33 Newark, N. J. 1:43 10:30 5:21 8:50 11:45 North Elizabeth ...... 6:45 Elizabeth 3:33 2:05 ...... 7:05 Peekskill 5:34 6:40 ...... 7:20 8:50 LayeringEdiison, N. J.and 4:45 5:20 Separation 4:40 2:10 11:05 Princeton, N. J. 1:30 ...... 3:30 7:30
New York 12:10 1:30 3:45 7:30 4:33
Newark, N. J. 1:43 10:30 5:21 8:50 11:45 North Elizabeth ...... 6:45 Elizabeth 3:33 2:05 ...... 7:05
Peekskill 5:34 6:40 ...... 7:20 8:50
Ediison, N. J. 4:45 5:20 4:40 2:10 11:05
Princeton, N. J. 1:30 ...... 3:30 7:30
Train No. 3701 3301 3801 3542 3765
Figure 2.25: Two versions of a railway timetable with the “bad” version on top. Inspired by an example from pgs. 54-55 of Tufte(1990).
Tufte, VDQI 100 Negative Space
Tufte, EI (Vol. 2) p. 62 Negative Space
Negative Space Logos To Summarize The Zen Aesthetic • Simplicity • Clarity • Uncluttered • Restraint
G. Reynolds, Presentation Zen 104 Tufte’s Graphical Excellence
• Interesting data – Complex ideas, multivariate data • Clear, precise, concise presentation – Data-ink ratio • Accurate communication – Lie factor
105 Few is Applied Tufte
S. Few, Show Me the Numbers, p. 87 106 Analysis Questions
• Who is the intended audience? • What information does this visualization represent? • How many data dimensions does it encode? • List several tasks, comparisons or evaluations it enables • What principles of excellence best describe why it is good / bad? • Can you suggest any improvements? • Why do you like / dislike this visualization?
107 Graphical displays • Show the data • Induce the viewer to think about the substance, rather than about methodology, graphic design, [or] the technology of graphic productions... • Avoid distorting what the data have to say • Present many numbers in a small space • Make large data sets coherent • Encourage the eye to compare different pieces of data • Reveal the data at several levels of detail • Serve a reasonably clear purpose • Be closely integrated with the statistical and verbal descriptions Edward Tufte, The Display of Quantitative Information, page 1 108