<<

CSE 5544: Introduction to Data Raghu Machiraju [email protected]

[xkcd] Many Thanks to

– Alex Lex, University of Utah – Torsten Moeller, University of Vienna – , Univ of British Columbia – Hanspeter Pfister, Harvard University

2 Design Principles

3 Previously

4 Visualization To convey through visual representations

5 Design Excellence “Well-designed presentations of interesting data are a matter of substance, of , and of design.” E. Tufte

6

7 Mark Goetz And … Stephen Few Robin Williams Garr Reynolds Design Critique & Redesign https://goo.gl/lHWp4x

Sunday Star Times, 2012 Quantity encoded by diameter, not area! Fixing that: But is this visual encoding appropriate ?

R. Cunliffe, Stats Chat The Premise http://creativity.wikia.com/wiki/File:Aiden-terrifying-horrific-classic.jpg What can you distort ? When you use geometry, photometry do not distort

Distort What ? Data – representations – clusters, trends – correlations

23 Design Excellence “Well-designed presentations of interesting data are a matter of substance, of statistics, and of design.” - Tufte

24 Design Guidelines Mantra – Graphical Integrity – Design Principles – Design Elements

26 Graphical Integrity Flowing Data Dealing with Scale Distortions

Flowing Data Integrity

• Where’s baseline? • What’s scale? • What’s context?

Summer term 2008 15

Where’s 0? MissingVol 1, p. 54 Scales (1) Note middle ‘70

Summer term 2008 16 Tufte, VDQI 30

8 Incomplete Scales Consider This … Look at this closely … Start y-scale at 0?

A. Kriebel, VizWiz Another one …

The Daily Mail, UK, Jan 2012 Putting it in CONTEXT

Mother Jones Frame the Data

Mother Jones A huge problem Junk Other distortions Vol 1, p 54 (2)

What’s being compared?

Summer term 2008 17 The Lie Factor • (Size of effect in graphic)/ Vol 1, 57(size of effect in data)

Tufte, VDQI 41

Scale?

Summer term 2008 18

9 Vol 1, p 54 (2)

What’s being compared?

Summer term 2008 17 The Lie Factor

5.3 0.6 27.5 18 Vol 1, 57 / = 14.8 0.6 18

Tufte, VDQI 42

Scale?

Summer term 2008 18

9 The Lie Factor

http://www.theoildrum.com/node/4645 43 Vol 1, p. 69

area = value?

Summer term 2008 23

Vol 1, p. 62 VolThe 1, p. Lie69 Factor

area = value?

volume = value?

Summer term 2008 23 Tufte, VDQI 44 Summer term 2008 24

Vol 1, p. 62 12

volume = value?

Summer term 2008 24

12 Design Distortions

Tufte, VDQI 45 Inherently pathological Death to Pie Charts

Share of coverage on TechCrunch

“I hate pie charts. I mean, really hate them.”

www.storytellingwithdata.com/2011/07/death-to-pie-charts.html Cole Nussbaumer Redesign The differences? The differences? Tufte’s Integrity

• Show data variation, not design variation • Clear, detailed, and thorough labeling and appropriate scales • Size of the graphic effect should be directly proportional to the numerical quantities (“lie factor”)

51 The problem … Vol 1, p 54 (2)

What’s being compared?

VisualSummer term 2008 Encoding17 Vol 1, 57

Scale?

Summer term 2008 18

9 Not all Encodings are same

Electri c

From Wilkinson 99, based on Stevens 61 Design Principles Use Decomposition

Beer sales Hierarchical Display Show Context

M. Ericson, NY Times 58 M. Ericson, NY Times M. Ericson, NY Times M. Ericson, NY Times 61 Maximize Data-Ink Ratio

0-$24,999 $25,000+ 0-$24,999 $25,000+ Maximize Data-Ink Ratio • Data-ink = the ink used to show data • Data-ink ratio = data-ink / total ink used

700

525

350

175

0 Males Females

0-$24,999 $25,000+ 0-$24,999 $25,000+

63 Avoid Extraneous visual elements that distract from the message

ongoing, Tim Brey

64 Avoid Chartjunk Extraneous visual elements that distract from the message

ongoing, Tim Brey

65 Avoid Chartjunk Extraneous visual elements that distract from the message

ongoing, Tim Brey

66 Avoid Chartjunk Extraneous visual elements that distract from the message

ongoing, Tim Brey

67 Avoid Chartjunk Extraneous visual elements that distract from the message

ongoing, Tim Brey

68 Avoid Chartjunk Extraneous visual elements that distract from the message

ongoing, Tim Brey

69 Which is better?

[Bateman et al. 2010]

70 Which is better?

[Bateman et al. 2010] https://eagereyes.org/criticism/chart-junk-considered-useful-after-all

71

EXPERIMENTAL RESULTS

1. No significant difference between plain and image charts for interactive interpretation accuracy 2. No significant difference in recall accuracy after a five- minute gap 3. Significantly better recall for plain charts of both the chart topic and the details (categories and trend) after long-term gap (2-3 weeks). 4. Participants saw value messages in plain charts significantly more often than in the plain charts. 5. Participants found plain charts more attractive, most enjoyed them, and found that they were easiest and fastest to remember.

73 Before After

G. Reynolds, Presentation Zen Before After

G. Reynolds, Presentation Zen Bring in the Clowns…

World Population in 2008 PTS Blog A better version…

World Population in 2008 PTS Blog Use Chart Junk? It depends! PROS CONS persuasion unbiased analysis memorability trustworthiness engagement interpretability space efficiency More 2.2. HIGH DATA DENSITY 43

2.2 High Data Density

Another theme of Tufte’s is that good graphics have high “data density”. To calculate this Dataquantity, note thatDensity the numbers to be plotted in a graphic can be written as an array of one or more dimensions.

Definition 3 (Date Density)

numbernumber of of entries entries in data array in data array data density =data density = areaarea of data of graphic data graphic

Ho et al., “Thermal Conductivity of the Elements: A Comprehensive Review” J. Phys. Chem. 1974 80 Figure 2.12: High data density graphic: of many experimental measurements of the thermal conductivity of copper. Each string of dots, connected by a thin curve, is from a single publication which is identified by a number enclosed in a circle. Each of these 200-plus articles is listed with its identification number in the bibliography of the source of this figure, C. Y. Ho, R. W. Powell, P. E. Liley, “Thermal conductivity of the elements: A comprehensive review”, supplement no. 1, J. Phys. Chem. Reference Data, 3, pg. 1-244 (1974). The thick solid line is the recommended curve.

Not withstanding this formal definition, the concept of a “high data density” graphic is perhaps best illustrated through some examples. Fig. 2.12 compares a large number of simultaneous measurements of thermal conductivity as a function of temperature. Because more than 200 different curves, each labelled, are combined into a single graph, Tufte has canonized this as a Good Example. Is it? Well, yes, but only up to a point. First, there is no really good way to present 200 data sets. A single figure is certainly much easier to grasp than a whole of lot of figures, Escaping Flatland

Reebee Garofalo, Genealogy of Pop/Rock Music 81 Reebee Garofalo, Genealogy of Pop/Rock Music 82 Sparklines

Tufte, 1990 83 Tufte’s Integrity

• Show data variation, not design variation • Clear, detailed, and thorough labeling and appropriate scales •Size of the graphic effect should be directly proportional to the numerical quantities (“lie factor”)

84 Design Elements Tufte’s Design • Above all else show the data • Maximize data-ink ratio • Erase non-data ink • Erase redundant data ink • Revise and edit

86 Design Pyramid

Aesthetics affective

Usability efficient

Functionality effective Subjective • Aesthetics: Attractive things are perceived as more useful than unattractive ones • Style: Communicates brand, process, who the designer is • Playfulness: Encourages experimentation and exploration • Vividness: Can make a visualization more memorable

Pat Hanrahan, Nov 2007

88 CRAP Contrast Repetition Alignment Proximity Contrast Before After

G. Reynolds, Presentation Zen Before After

G. Reynolds, Presentation Zen Repetition

Faces of the Dead in Iraq, NY Times 94 Alignment

Before After

S. Few, Show me the numbers

95 Proximity

S. Few, Show me the numbers

96 Small Multiples

S. Few, Show me the numbers

97 Small Multiples

Tufte, VDQI 98 62 CHAPTER 2. THE GOSPEL ACCORDING TO TUFTE

Layering and Separation

Train No. 3701 3301 3801 3542 3765 New York 12:10 1:30 3:45 7:30 4:33

Newark, N. J. 1:43 10:30 5:21 8:50 11:45 North Elizabeth ...... 6:45 Elizabeth 3:33 2:05 ...... 7:05

Peekskill 5:34 6:40 ...... 7:20 8:50 Ediison, N. J. 4:45 5:20 4:40 2:10 11:05 Princeton, N. J. 1:30 ...... 3:30 7:30

New York 12:10 1:30 3:45 7:30 4:33

Newark, N. J. 1:43 10:30 5:21 8:50 11:45 North Elizabeth ...... 6:45 Elizabeth 3:33 2:05 ...... 7:05 Tufte, VDQI Peekskill 5:34 6:40 ...... 7:20 8:50 99

Ediison, N. J. 4:45 5:20 4:40 2:10 11:05

Princeton, N. J. 1:30 ...... 3:30 7:30

Train No. 3701 3301 3801 3542 3765

Figure 2.25: Two versions of a railway timetable with the “bad” version on top. Inspired by an example from pgs. 54-55 of Tufte(1990). 62 CHAPTER 2. THE GOSPEL ACCORDING TO TUFTE

Train No. 3701 3301 3801 3542 3765 New York 12:10 1:30 3:45 7:30 4:33 Newark, N. J. 1:43 10:30 5:21 8:50 11:45 North Elizabeth ...... 6:45 Elizabeth 3:33 2:05 ...... 7:05 Peekskill 5:34 6:40 ...... 7:20 8:50 LayeringEdiison, N. J.and 4:45 5:20 Separation 4:40 2:10 11:05 Princeton, N. J. 1:30 ...... 3:30 7:30

New York 12:10 1:30 3:45 7:30 4:33

Newark, N. J. 1:43 10:30 5:21 8:50 11:45 North Elizabeth ...... 6:45 Elizabeth 3:33 2:05 ...... 7:05

Peekskill 5:34 6:40 ...... 7:20 8:50

Ediison, N. J. 4:45 5:20 4:40 2:10 11:05

Princeton, N. J. 1:30 ...... 3:30 7:30

Train No. 3701 3301 3801 3542 3765

Figure 2.25: Two versions of a railway timetable with the “bad” version on top. Inspired by an example from pgs. 54-55 of Tufte(1990).

Tufte, VDQI 100 Negative Space

Tufte, EI (Vol. 2) p. 62 Negative Space

Negative Space Logos To Summarize The Zen Aesthetic • Simplicity • Clarity • Uncluttered • Restraint

G. Reynolds, Presentation Zen 104 Tufte’s Graphical Excellence

• Interesting data – Complex ideas, multivariate data • Clear, precise, concise presentation – Data-ink ratio • Accurate communication – Lie factor

105 Few is Applied Tufte

S. Few, Show Me the Numbers, p. 87 106 Analysis Questions

• Who is the intended audience? • What information does this visualization represent? • How many data dimensions does it encode? • List several tasks, comparisons or evaluations it enables • What principles of excellence best describe why it is good / bad? • Can you suggest any improvements? • Why do you like / dislike this visualization?

107 Graphical displays • Show the data • Induce the viewer to think about the substance, rather than about methodology, , [or] the technology of graphic productions... • Avoid distorting what the data have to say • Present many numbers in a small space • Make large data sets coherent • Encourage the eye to compare different pieces of data • Reveal the data at several levels of detail • Serve a reasonably clear purpose • Be closely integrated with the statistical and verbal descriptions Edward Tufte, The Display of Quantitative Information, page 1 108