Data Visualization

Data 100: Principles and Techniques of Data Science

Sandrine Dudoit
Department of Statistics and Division of Biostatistics, UC Berkeley

Spring 2019

Outline

1 Motivation

2 Principles of Data Visualization
  2.1 Do We Really Need a Graph?
  2.2 General Considerations
  2.3 Graphical Perception
  2.4 Bad Graphs

3 Survey of Data Visualization Techniques
  3.1 One Quantitative Variable
  3.2 Multiple Quantitative Variables
  3.3 One Qualitative Variable
  3.4 Multiple Qualitative Variables
  3.5 Conditional Plots

Version: 05/02/2019, 17:17

Data Visualization

“One picture worth ten thousand words.”

Frederick R. Barnard, Printer’s Ink, March 10th, 1927.

An Oldie But Goodie

Figure 1: Minard’s representation of Napoleon’s 1812 Russian Campaign. This graph, made in 1869 by Charles Joseph Minard (1781–1870), is commonly regarded as one of the finest ever. It represents, in only two dimensions, the size of the troops, their location, their direction of movement, dates, and temperatures. https://en.wikipedia.org/wiki/Charles_Joseph_Minard.

New But ...

Figure 2: Bitcoin wealth distribution. http://viz.wtf/image/166329900475.

Data Visualization

One picture worth ten thousand words.
• Only if it is a good picture.
• We tend to be more demanding with text than with graphics.
• How long does it take to write/read one thousand words? At least the same effort should be put into making/viewing a graph.

Learning Objectives

• Become a wise and effective “creator”/“maker” as well as “reader”/“viewer” of data visualization.
• Master general principles for data visualization and apply these when making your own graphs as well as when viewing others’.
• Produce the right graph for the matter at hand.
• Become aware of the variety of graphical techniques available for different types of data and purposes and understand their pros and cons. Go beyond barplots and pie charts!
• Think more carefully about each plot you create, consider the pros and cons of different choices, and try several different plots for a given dataset.

• Familiarize yourself with software for data visualization. Most of the examples in these slides are based on Python’s matplotlib and seaborn libraries. However, as discussed in the first lecture, other languages such as R may be better suited for certain tasks.
• Focus on what type of plot to make rather than how to make it, i.e., compose the plot conceptually before thinking of its software implementation details. Concepts are general and long-lasting, while syntax is highly specific and ephemeral.
• Avoid bad graphs!

Data Visualization

• Data visualization is a fundamental aspect of Data Science.
• It is essential to “look at data” throughout the workflow, from exploratory data analysis (EDA) to model diagnostics and reporting the results of the inquiry.
• Visualization is valuable for detecting the main features (good or bad) of a dataset, revealing patterns, and suggesting theories or further questions.
• Visualization is also useful for quality assessment/control (QA/QC) and detecting problems with the data.
• An effective plot can be good enough to answer the question on its own. In some cases, it may even be the only appropriate type of answer.

• An effective plot can also be sufficient to convince stakeholders of the findings from a full-blown statistical inference procedure.

• Although data visualization is ubiquitous and heavily relied upon, in research as well as in the media, typically not much thought is put into creating or reading plots.
  - Creators often rely on very limited subsets of plots, without proper consideration of their limitations.
  - Readers often passively absorb a message imposed on them by the graph, rather than reason and think critically about it.
• Very few Statistics, Computer Science (CS), or domain curricula offer courses in data visualization.
• Proper data visualization is nontrivial. Entire courses could and should be devoted to data visualization, including discussions of vision and perception to guide the design of effective graphs.

Do We Really Need a Graph?

• When the data only comprise a handful of values, a table or a simple mention in text may be a more effective, i.e., accurate and simple, display.
• E.g. Percentage of popular vote for Trump and Clinton in 2016 presidential election:

  Trump    46.1%
  Clinton  48.2%
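The two shares above sum to 94.3% rather than 100%, since other candidates also received votes. A display restricted to the two main candidates therefore has to either show the shares as they are or rescale them to sum to 100%. The following is a minimal sketch of that rescaling, using only the two percentages quoted above; the function name `renormalize` is an illustrative choice, not something from the lecture.

```python
def renormalize(shares):
    """Rescale percentage shares so that they sum to 100."""
    total = sum(shares.values())
    return {name: 100 * value / total for name, value in shares.items()}

# Popular-vote percentages quoted above.
popular_vote = {"Trump": 46.1, "Clinton": 48.2}

two_way = renormalize(popular_vote)
others = 100 - sum(popular_vote.values())  # share left for other candidates

for name, value in two_way.items():
    print(f"{name}: {value:.1f}%")
print(f"Others (absent from a two-candidate display): {others:.1f}%")
```

Rescaling changes every displayed number, so a reader comparing a two-candidate display with a three-category display will see different percentages for the same candidate.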

Figure 3: US Election Results 2016. Left: Pie chart of percentage of popular vote for Trump (48.9%) and Clinton (51.1%). Right: Pie chart of percentage of popular vote for Trump (46.1%), Clinton (48.2%), and other candidates (5.7%). Why the different percentages on left and right?

From Tables to Graphs

• When a table represents two or more variables, with more than a handful of values each, a graph may be more effective.
• Tables leave the interpretation to the viewer.
• Graphs provide a summary of the data and are more amenable to comparisons.
• Gelman et al. (2002). Let’s Practice What We Preach: Turning Tables into Graphs. http://www.stat.columbia.edu/~gelman/research/published/dodhia.pdf.
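One reason a graph can make comparisons easier than a table is the freedom to choose scales. As a sketch with made-up numbers (not taken from the cited paper), the snippet below checks that points sharing the same rate (count divided by total) fall on a line of slope 1 in log-log coordinates. With equal log scales on both axes, a constant relative frequency therefore appears as a 45° line, even when the totals span several orders of magnitude.

```python
import math

rate = 0.02  # hypothetical constant relative frequency
totals = [100, 1_000, 10_000, 100_000]  # hypothetical totals over 3 orders of magnitude
counts = [rate * t for t in totals]

log_x = [math.log10(t) for t in totals]
log_y = [math.log10(c) for c in counts]

# Slope between consecutive points in log-log coordinates.
slopes = [
    (log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i])
    for i in range(len(totals) - 1)
]
print(slopes)
```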

Figure 4: Turning tables into graphs (Gelman et al., 2002, Figure 2). Counts and rates of citations of various professions from the New York Times database. Graph: Log-log scale allows comparison across several orders of magnitude. Any 45° line indicates constant relative frequency. The relative positions of the different professions are clearer.

More Oldies But Goodies

Figure 5: Album de Statistique Graphique (1881). https://www.davidrumsey.com/.

Figure 6: Album de Statistique Graphique (1881). Train load (scaled by length of line) is represented by thickness of bands. How would you represent this data without a graph?

More Oldies But Goodies: Graphical Timetables

Figure 7: Marey (1885). Train schedule Paris–Lyon, 1880s. https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0003zP. How would you represent this data without a graph?

Figure 8: Marey (1885). Train schedule Paris–Lyon with TGV, 1980s vs. 1880s. The red line indicates the 1981 itinerary of the TGV, a new express train that cut the trip from Paris to Lyon to under three hours (vs. nine hours in the 1880s).

Figure 9: Train schedule SF–Gilroy, now. https://i.stack.imgur.com/qJ1hH.
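In a graphical timetable, each train is a line in a time-versus-distance plane. Its speed between two stations is the slope of the corresponding segment, and a stop shows up as a segment of zero slope. The following is a minimal sketch with a made-up itinerary; the station labels, times, and distances are all hypothetical and serve only to illustrate the computation.

```python
# Each entry: (station, distance from origin in km, arrival hour, departure hour).
# All values are hypothetical.
itinerary = [
    ("Station A", 0, 8.0, 8.0),
    ("Station B", 150, 10.5, 11.0),  # 30-minute stop
    ("Station C", 300, 13.0, 13.0),
]

def segment_speeds(stops):
    """Speed (km/h) between consecutive stations, i.e., the slope of each segment."""
    speeds = []
    for (_, d0, _, dep0), (_, d1, arr1, _) in zip(stops, stops[1:]):
        speeds.append((d1 - d0) / (arr1 - dep0))
    return speeds

def stop_lengths(stops):
    """Length of each stop in hours, i.e., the length of each horizontal segment."""
    return [dep - arr for _, _, arr, dep in stops]

print(segment_speeds(itinerary))
print(stop_lengths(itinerary))
```

With time on the x axis, a faster train covers more distance per hour, so its line is steeper.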

• In Marey’s (1885) graphical train schedule for Paris–Lyon in the 1880s, time is represented on the x axis and the stations and distances between stations are represented on the y axis (Tufte, 2001).
• A train’s itinerary is represented by a line.
• The slope of the line reflects the speed of the train: The more nearly vertical the line, the faster the train.
• The length of a stop at a station is indicated by the length of the horizontal line.
• The intersection of two lines locates the time and place that trains going in opposite directions pass each other.
• This type of graph, known as a parallel coordinates plot, is still used today and has many other applications.

Caveats

• Graphs should attempt to summarize data in a simple, intuitive, and efficient manner, without distorting or losing important information.
• However, not all good graphs are simple. As with text, plots conveying a lot of information (e.g., displaying multiple variables) require both a skillful creator and an educated reader. E.g. Minard’s graph for Napoleon’s Russia campaign, old graphical train schedules.
• There is no “one-size-fits-all” graph, i.e., different types of graphs should be used for different
  - types of data, e.g., quantitative, qualitative variables;
  - purposes, e.g., debugging code, EDA, reporting results;
  - media, e.g., print journal, projector.

• Graphs typically reduce the information contained in the data. E.g. Histograms map n data points into B < n bins; boxplots map n data points into 5 summary statistics (+ possibly outliers).
• By focusing on certain aspects of the data or even imposing structure on data, graphs can also be subjective. E.g. Choosing which variables to plot, decisions regarding axes and scales, dendrogram representation of clusters¹.
• As with text, the creator of the plot makes editorial decisions as to which data to display and which aspects of these data to show or emphasize.
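The information reduction performed by histograms and boxplots can be made concrete with a small sketch. The sample below is hypothetical, and both helper functions are illustrative. Equal-width bins and the "inclusive" quartile method are assumptions, since plotting libraries differ in their binning and quartile conventions.

```python
import statistics

# Hypothetical sample, n = 10.
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.8, 7.1, 8.3, 12.5]

def histogram_counts(values, n_bins):
    """Map n values into n_bins equal-width bins and return the bin counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

counts = histogram_counts(data, n_bins=4)   # 10 points reduced to 4 counts
summary = five_number_summary(data)         # 10 points reduced to 5 statistics
print(counts)
print(summary)
```

In both cases the original values cannot be recovered from the output, which is the sense in which these displays reduce information.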

• The reader should assess the relevance and reliability of the data being displayed, as well as the appropriateness of the graph.
• Software implicitly makes many decisions for the creator of a plot, e.g., axes, scales, plotting symbols, color, ordering of data. Experiment with different settings.
• Graphs are rarely presented on their own. They should be interpreted in the context of the text which they support. The reader should examine the graph-text interface and, in particular, whether the conclusions in the text are supported by the graph.

¹ A dendrogram is a graphical representation of hierarchical clustering results; for a given clustering of n objects, there are $2^{n-1}$ possible dendrograms. The various choices made in hierarchical clustering as well as the dendrogram representation impose (vs. reveal) structure on the data.

Statistical Inference

• Graphs are by definition functions of the data, i.e., statistics.
• Although not typically viewed this way, visualization can therefore be used as part of statistical inference.
• One can produce the same types of plots for a sample and for a population; in that sense, the plot for the sample can be viewed as an estimator of the plot for the population, i.e., the parameter.
• A pattern that we detect from plotting data for a sample can be used to infer properties of the population from which the sample was drawn. A formalized special case of such an approach is given by [...].

General Considerations

In the process of creating a plot, you should consider the following issues.
• Determine the purpose of the plot. E.g. EDA, debugging code, comparing distributions, model diagnostics, summarizing results, reporting results.
• Formulate the message.
• Identify the audience.
• Identify the display mode/medium (e.g., journal, projector).
• Think about the best type of graph for the purpose, message, audience, and display mode.
• Aim for efficient perception: Speed, accuracy, and minimum cognitive load for understanding the message.

• Apply principles of graphical perception. E.g. Angles and areas are harder to perceive/compare than lengths.
• Do not use more dimensions to represent the data than are in the data. This rules out pie charts and barplots.
• An important consideration when selecting a graphical technique is how easily it can be extended (e.g., to multiple variables) and how amenable it is to comparing distributions.
• Choose graphical parameters carefully: Aspect ratio, plotting symbols, line types, texture, axes, etc.

• Choose color palette carefully. E.g. Be mindful of color blindness, use different color schemes for different types of data and messages (e.g., sequential, qualitative, and diverging).
• Provide sufficient information so that the plot can be interpreted properly. E.g. Title, axis parameters (i.e., label, tick marks), annotation, legend, caption, etc. In a document, number the figures and tables.
• Do not include irrelevant information, i.e., avoid “chart junk”.
• Principle of “least surprise”: If you defy expectations, people may get confused. Only defy expectations if it is very important.
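As an illustration of the points on color palettes and annotation, here is a minimal matplotlib sketch. The data, labels, and group names are hypothetical. The palette "tab10" is one of matplotlib's built-in qualitative colormaps, chosen here on the assumption that the groups are unordered; a sequential or diverging colormap would be a better fit for ordered or signed data.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only.
years = [2015, 2016, 2017, 2018]
series = {"Group A": [3.1, 3.6, 4.0, 4.7], "Group B": [2.8, 2.9, 3.5, 3.6]}

# Qualitative palette: distinct hues for unordered groups.
palette = plt.get_cmap("tab10")

fig, ax = plt.subplots(figsize=(6, 4))
for i, (label, values) in enumerate(series.items()):
    ax.plot(years, values, marker="o", color=palette(i), label=label)

# Information needed to interpret the plot: title, axis labels, ticks, legend.
ax.set_title("Hypothetical measurements by group")
ax.set_xlabel("Year")
ax.set_ylabel("Measurement (arbitrary units)")
ax.set_xticks(years)
ax.legend(title="Group")

# Render to an in-memory buffer; pass a file path instead to keep the image.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```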

• Experiment, i.e., consider different types of plots and update the plots iteratively.
• Of course, always think about the quality of the data you plot.

• Sample size.
  - For small sample sizes, plot all of the data. Why lose information?
  - For larger sample sizes, plot relevant summaries of the data that do not distort or lose important information in the data.
• Variables to display/emphasize. Depends on the purpose and message of the plot.
• Type of variables. Quantitative and qualitative variables call for different types of graphical summaries.
• Pre-processing. E.g. Transformation (e.g., log), dimensionality reduction, imputation.

Graphical Perception

• Cleveland and McGill (1985): “Graphical perception is the visual decoding of the quantitative and qualitative information encoded on graphs. Recent investigations have uncovered basic principles of human graphical perception that have important implications for the display of data.”
• When we create a graph, we encode the data as graphical attributes.
• Possible graphical attributes are: Angles, areas, lengths, position on common aligned/unaligned scale, slopes, color properties.
• Effective graphs are those for which attributes are most easily decoded.

• There are empirical laws for perception that can be used to rank different types of graphical encodings.
• In general, such laws relate the perceived (change in) intensity in a physical stimulus to the actual (change in) intensity. This concerns stimuli to all senses, i.e., vision, hearing, taste, touch, and smell.

Graphical Perception: Weber’s Law

• Weber’s Law is an empirical relationship in psychophysics between the initial intensity in a stimulus ($I$) and the smallest perceivable difference (a.k.a. just noticeable difference) in the stimulus intensity ($\Delta I$):
\[ \frac{\Delta I}{I} = k, \tag{1} \]
where $k$ is a proportionality constant for a given type of stimulus².
• In terms of length, this means we detect a 1 cm change in a 1 m length as easily as we detect a 10 m change in a 1 km length.
• Weber’s Law appears to hold for many different graphical encodings.

² Law formulated and published by Gustav Theodor Fechner (1801–1887), a student of Ernst Heinrich Weber (1795–1878).

Graphical Perception: Stevens’ Law

• Stevens’ (1957) Law is an empirical relationship in psychophysics between the intensity in a stimulus and the perceived magnitude of the sensation created by the stimulus:
\[ \psi(I) = C I^{\beta}, \tag{2} \]
where $I$ is the intensity or strength of the stimulus in physical units (energy, weight, pressure, mixture proportions, etc.), $\psi(I)$ is the magnitude of the sensation, $\beta$ is an exponent that depends on the type of stimulation or sensory modality, and $C$ is a proportionality constant that depends on the units used.
• Examples of values for the exponent $\beta$:
  Length: 0.9–1.1
  Area: 0.6–0.9
  Volume: 0.5–0.8
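The power law can be turned into a small function to see what the exponent ranges imply. This is a sketch only: the function name is illustrative, and using the midpoint of each listed range is an assumption made here for the sake of a single number per encoding.

```python
def perceived_magnitude(intensity, beta, c=1.0):
    """Stevens' power law: psi(I) = C * I**beta."""
    return c * intensity ** beta

# Midpoints of the exponent ranges listed above (an illustrative choice).
exponents = {"length": 1.0, "area": 0.75, "volume": 0.65}

for encoding, beta in exponents.items():
    ratio = perceived_magnitude(2.0, beta) / perceived_magnitude(1.0, beta)
    print(f"{encoding}: doubling the stimulus is perceived as a factor of {ratio:.2f}")
```

The constant C cancels in any ratio of perceived magnitudes, so only the exponent matters when comparing two values shown with the same encoding.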

• For lengths, the relationship is almost linear, thus our perception is about right.
• However, according to this power law, our perception of areas and volumes is conservative, i.e., when values are represented as areas or volumes, we underestimate the large values relative to the small ones and overestimate the small ones relative to the large ones.
• E.g. Areas, with $\beta = 0.7$. Consider two areas of size 1 and 2, respectively.
\[ \frac{\psi(2)}{\psi(1)} = \frac{2^{0.7}}{1^{0.7}} \approx 1.62. \]
Thus, we don’t see the bigger area as twice as large.

Now consider two areas of size 1/2 and 1, respectively.
\[ \frac{\psi(1/2)}{\psi(1)} = \frac{0.5^{0.7}}{1^{0.7}} \approx 0.62. \]
Thus, we don’t see the smaller area as half as large.
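The two ratios in the area example can be checked directly. The sketch below also inverts the power law to ask how much larger an area must actually be in order to be perceived as twice as large. That last step is an added illustration that follows from equation (2) with β = 0.7 and is not part of the original example.

```python
beta = 0.7  # area exponent used in the example above

ratio_double = 2 ** beta   # perceived ratio when the area doubles
ratio_half = 0.5 ** beta   # perceived ratio when the area halves

# Actual area ratio r needed for a perceived ratio of 2: solve r**beta = 2.
needed_ratio = 2 ** (1 / beta)

print(f"perceived ratio for areas 2 vs. 1:   {ratio_double:.2f}")
print(f"perceived ratio for areas 1/2 vs. 1: {ratio_half:.2f}")
print(f"actual ratio perceived as 2x:        {needed_ratio:.2f}")
```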

Figure 10: Graphical perception: Stevens’ Law. Stevens (1957) perceived sensory magnitude power law. Curves of perceived intensity (vertical axis, 0 to 5) against actual intensity (horizontal axis, 0 to 5), labeled by exponent: electric shock 3.5, saturation 1.7, length 1, area 0.7, depth 0.67, brightness 0.5.

Graphical Perception: Combining Weber’s and Stevens’ Laws

• Consider comparing the values $x$ and $x + w$, using length ($\beta = 1$) and area ($\beta = 0.7$) encodings.
• For length, we perceive the relative value
\[ \frac{x + w}{x} = 1 + \frac{w}{x}. \]
• For area, we perceive the relative value
\[ \frac{(x + w)^{0.7}}{x^{0.7}} = \left(1 + \frac{w}{x}\right)^{0.7} \approx 1 + \frac{0.7\,w}{x}. \]
• Thus, we are more likely to detect small differences using length encoding.

Graphical Perception
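The comparison of length and area encodings under Weber's and Stevens' Laws can be checked numerically. The sketch below uses hypothetical values of x and w with a small relative difference w/x. It compares the exact perceived ratio with the first-order approximation 1 + βw/x; the function names are illustrative.

```python
def perceived_ratio(x, w, beta):
    """Perceived ratio of x + w to x under Stevens' law with exponent beta."""
    return ((x + w) / x) ** beta

def first_order(x, w, beta):
    """First-order approximation 1 + beta * w / x, valid for small w / x."""
    return 1 + beta * w / x

x, w = 100.0, 2.0  # hypothetical values, relative difference w / x = 0.02
results = {}
for name, beta in [("length", 1.0), ("area", 0.7)]:
    exact = perceived_ratio(x, w, beta)
    approx = first_order(x, w, beta)
    results[name] = (exact, approx)
    print(f"{name}: exact {exact:.4f}, first-order {approx:.4f}")
```

The perceived relative difference is smaller under the area encoding, by a factor of roughly 0.7. By Weber's Law, it is therefore less likely to exceed the just noticeable threshold k.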

• Cleveland and McGill (1985) carried out an extensive study of graphical encodings to obtain a best to worst ranking.
• The encodings they examined include: position on a common aligned scale, position on a common unaligned scale, length, slope, angle, area, volume, color hue, brightness, and purity.
• One of their experiments consisted of
  - 7 graphical encodings,
  - 3 judgments per encoding,
  - 10 replications per subject,
  - 127 experimental subjects.
  Assessment criterion: error = |perceived p − true p|, where p denotes the ratio (in percentages) of the smaller to the larger magnitude.
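The assessment criterion is simple to compute once judgments are recorded. The sketch below uses made-up judgments for two encodings. The numbers are hypothetical and are not taken from the Cleveland and McGill study; they only show how mean absolute errors could be compared across encodings.

```python
def judgment_error(perceived_p, true_p):
    """Absolute error between perceived and true percentages."""
    return abs(perceived_p - true_p)

# Hypothetical judgments: (true percentage, percentage reported by a subject).
judgments = {
    "position": [(50.0, 48.0), (25.0, 27.0), (80.0, 79.0)],
    "angle": [(50.0, 42.0), (25.0, 33.0), (80.0, 70.0)],
}

mean_error = {
    encoding: sum(judgment_error(p_hat, p) for p, p_hat in pairs) / len(pairs)
    for encoding, pairs in judgments.items()
}
print(mean_error)
```

Ranking the encodings by mean error, from smallest to largest, gives a best to worst ordering of the kind the study reports.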

Figure 11: Graphical perception. Based on Table 1 in Cleveland and McGill (1985). http://paldhous.github.io/ucb/2016/dataviz/week2.html.

Bad Graphs

• The literature is full of “bad graphs” that, for instance, distort the data and are misleading, are too complicated, or are missing essential information.
• Karl Broman’s Top Ten Worst Graphs (including one of his own!): https://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/.
• Ross Ihaka’s Good and Bad Graphs: https://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture03.pdf.
• Edward Tufte: https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=00040Z.
• Junk Charts: https://junkcharts.typepad.com/junk_charts/.
• WTF Visualization: http://viz.wtf.

Bad Graphs: Pie Charts

Figure 12: Top 10 Google salaries by job category: Pie chart. https://junkcharts.typepad.com/junk_charts/2011/10/the-massive-burden-of-pie-charts.html. What’s the message? What do the angles represent? What’s a better graph?

Figure 13: Top 10 Google salaries by job category: Interval chart. https://junkcharts.typepad.com/junk_charts/2011/10/the-massive-burden-of-pie-charts.html.

[Interval chart: one row per job category; x-axis: Salary, thousands of dollars, 140 to 240.]

Figure 14: Top 10 Google salaries by job category: Interval chart. Sorted by midpoint of salary range.

Bad Graphs: Pie Charts

[Interval chart: one row per job category; x-axis: Salary, thousands of dollars, 140 to 240.]

Figure 15: Top 10 Google salaries by job category: Interval chart. Sorted by salary range.

Bad Graphs: Pie Charts

Figure 16: Google Home query categories: Pie chart. http://viz.wtf/image/171134950336. Unreadable. Can’t match numbers to categories. What’s a better graph?

Bad Graphs: Pie Charts

Figure 17: Bitcoin wealth distribution: Pie chart. http://viz.wtf/image/166329900475. What’s the message? How to compare shapes and areas? Without text, pie uninformative. What’s a better graph?

Bad Graphs: Pie Charts

[Two scatterplots of % BTC owned (0 to 100): against % of top addresses (left) and against % of bottom addresses (right).]

Figure 18: Bitcoin wealth distribution: Scatterplot.

Bad Graphs: Multilevel Donut Charts

Figure 19: Goldman Sachs job listings: Multilevel donut chart. https://s3.amazonaws.com/cbi-research-portal-uploads/2017/09/18173935/GSteardownjobs. What’s the message? Unreadable. What’s a better graph?

Bad Graphs: Wordclouds

[Wordcloud of words from the two speeches, with font size varying by word.]

Figure 20: State of the Union speeches 2010 and 2011: Wordcloud. Frequency of words with at least 15 occurrences. What’s the message? How to compare frequencies of words? What’s a better graph?

Bad Graphs: Wordclouds
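
One possible answer to “What’s a better graph?” is a sorted dotplot, as in Figure 21. The sketch below uses matplotlib with a handful of made-up word counts; the `word_counts` values are illustrative assumptions, not the State of the Union data, and this is not necessarily how the slide's figure was produced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical word counts, for illustration only.
word_counts = {"jobs": 48, "people": 71, "new": 60, "will": 118, "energy": 27}

# Sort by frequency so that position on the common scale also gives the ranking.
words_sorted = sorted(word_counts, key=word_counts.get)
freqs_sorted = [word_counts[w] for w in words_sorted]

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(freqs_sorted, range(len(words_sorted)), "o")  # one dot per word
ax.set_yticks(range(len(words_sorted)))
ax.set_yticklabels(words_sorted)
ax.set_xlabel("Frequency")
ax.grid(axis="y", linestyle=":")  # light guide lines from each label to its dot
fig.tight_layout()
```

Calling `fig.savefig(...)` with a file name of your choice writes the plot to disk. Frequencies are read as positions along one aligned axis, so relative differences between words can be judged directly.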

[Dotplot: one row per word, ordered by frequency; x-axis: Frequency, 20 to 120.]

Figure 21: State of the Union speeches 2010 and 2011: Dotplot. Frequency of words with at least 15 occurrences.

Bad Graphs: Wordclouds

Figure 22: Gilets jaunes: Wordcloud. Frequency of expressions and hashtags on Twitter for first four days of gilets jaunes movement. How to compare frequencies between days? https://www.lexpress.fr/actualite/societe/gilets-jaunes-ce-qu-en-disent-les-francais_2055542.html.

Bad Graphs: Wordclouds

Figure 23: Names: Wordcloud. https://www.wordclouds.com/?cloud=names.

Bad Graphs: Wordclouds

Figure 24: Business words: Wordcloud. https://www.wordclouds.com/?cloud=business.

Bad Graphs

• Chart junk. The previous graphs exemplify “chart junk”, i.e., they contain superfluous elements that are not necessary to convey the information contained in the data, but instead distract the viewer from this information or even mask or distort important information.
• Pie charts.
  - Frequency represented by angle/area.
  - Angles and areas are hard to perceive and compare.
  - Pie charts quickly become unreadable for more than a handful of values.
  - Listing the values is often better – they are actually often added to a pie chart anyway!
  - How to select order of categories?
  - Not amenable to comparing distributions; side-by-side comparisons not effective.
  - Hard to extend to multiple variables.
  - A lot of junk often added to pie charts, e.g., thickness, slice explosion.
• Wordclouds/tag clouds.
  - Frequency represented by font size.
  - Neither area nor height corresponds to frequency of words.
  - How do longer words compare with shorter words?
  - How are capital letters handled?
  - How to calculate relative difference in frequency between two words?
  - How are the words ordered within the cloud (alphabetical, frequency)?
  - Not amenable to comparing distributions; side-by-side comparisons not effective.
  - How to extend to multiple variables?
  - A lot of junk often added to word clouds.
• Barcharts/barplots. Better.
  - Based on length and position on common aligned scale.
  - Add an irrelevant dimension (thickness of bar).
  - How to select order of categories?
  - Not readily amenable to comparisons.
  - Extension to multiple variables problematic.
• Dotcharts/dotplots. (And interval charts.) Even better.
  - Based on length and position on common aligned scale.
  - Display only the relevant information.
  - How to select order of categories?
  - More amenable to comparisons and extensions to multiple variables.

Gapminder

Gapminder (https://www.gapminder.org).
• We will use data from Gapminder to reason through the process of data visualization, e.g., population, population density, life expectancy, income for each country.
• Note that in this case we have a census, i.e., there is no sampling involved [3].
• Gapminder is a Swedish foundation co-created in 2005 by Hans Rosling (Professor of International Health at Karolinska Institute) and family members.
• “Gapminder is a fact tank, not a think tank.”
  “Gapminder measures ignorance about the world.”
  “Gapminder makes global data easy to use and understand.”
  “Gapminder promotes Factfulness, a new way of thinking.”
• Gapminder developed Trendalyzer (acquired by Google in 2007), data visualization software providing dynamic and interactive graphics of data compiled by organizations such as the United Nations and the World Bank.

[3] Some of the data could be estimates, but we won’t concern ourselves with this at this point.

Gapminder

[Gapminder World Poster 2015, “Health & Income of Nations in 2015”: bubble chart of life expectancy (years) vs. GDP per capita ($ adjusted for price differences, PPP 2011; log scale) for all 182 nations recognized by the UN, colored by region and sized by population. Source: www.gapminder.org.]

Figure 25: Gapminder: World Poster 2015. “How Does Income Relate to Life Expectancy? Short answer - Rich people live longer.” Bubble chart with four variables displayed in 2D.

Software

• Most of the plots below are produced with Python’s seaborn library, using default arguments.
• Default settings typically do not correspond to the most basic version of the plot, but rather impose many decisions on the plot, e.g., color, legend, ordering. Experiment with different settings to make sure you get the plot you want.
• Seaborn tutorial: https://seaborn.pydata.org/tutorial.html. Each plotting function has many arguments to customize the plots. As usual, consult the documentation.
• Datasets available at: https://github.com/mwaskom/seaborn-data. E.g., Titanic survival dataset, Fisher’s iris dataset.

One Quantitative Variable
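
Before choosing a plot, a numerical summary is a useful first look at one quantitative variable. The count/mean/std/quartile table for life expectancy has the layout that pandas' `Series.describe()` prints; the sketch below reproduces that layout on a few made-up values (the numbers are assumptions, not the Gapminder data).

```python
import pandas as pd

# Hypothetical life expectancies (years) for five countries; illustration only.
life_expectancy = pd.Series([51.1, 67.2, 74.1, 78.0, 84.2], name="life_expectancy")

# describe() returns count, mean, std, min, the three quartiles, and max.
summary = life_expectancy.describe()
print(summary)
```

The labels `25%`, `50%`, and `75%` are the lower quartile, median, and upper quartile.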

How would you visualize life expectancy in 2018 over all countries?

count    182.000000
mean      72.726374
std        7.237996
min       51.100000
25%       67.150000
50%       74.100000
75%       78.075000
max       84.200000

Stem-And-Leaf Plots
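
The idea behind a stem-and-leaf display can be sketched in a few lines of plain Python. This is a minimal illustration, not the tool used for Figure 26: `stem_and_leaf` is a hypothetical helper, the stem is taken to be the integer part and the leaf the first decimal digit, and the cumulative-count column and empty stems are omitted.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (integer part) to its sorted leaves (first decimal digit)."""
    plot = defaultdict(list)
    for v in values:
        tenths = round(v * 10)           # e.g. 84.2 -> 842
        stem, leaf = divmod(tenths, 10)  # 842 -> (84, 2)
        plot[stem].append(leaf)
    # Largest stem first, leaves in increasing order.
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items(), reverse=True)}

# The three highest-stem pairs and the two lowest values, read off Figure 26.
values = [84.2, 84.0, 83.5, 83.2, 51.6, 51.1]
for stem, leaves in stem_and_leaf(values).items():
    print(f"{stem:>3} | {''.join(str(d) for d in leaves)}")
```

Each printed row is one stem followed by its leaves, so row length plays the role of a histogram bar while every value stays readable.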

Key: aggr|stem|leaf. Example: 182 | 84 | 0 = 84.0 x 1.0 = 84.0.

aggr  stem  leaf
 182    84  02
 180    83  25
 178    82  1244446669
 168    81  11223333458889
 154    80  0125778
 147    79  13446
 142    78  00122367
 134    77  0223466677899
 121    76  0125678899
 111    75  1223355578999
  98    74  011238899_
  89    73  123448
  83    72  0002334456
  73    71  115569
  67    70  3555679
  60    69  138
  57    68  0002378
  50    67  113389
  44    66  114689
  38    65  0245788
  31    64  356
  28    63  14569
  23    62  24599
  18    61  01112269
  10    60  025
   7    59  57
   5    58  067
   2    57
   2    56
   2    55
   2    54
   2    53
   2    52
   2    51  16

Figure 26: Life expectancy, 2018.

Stripplots
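
Jittering, as used in Figure 27, only changes where a point is drawn across the strip, not the value it represents. A minimal numpy sketch, assuming uniform noise of half-width 0.1 (an arbitrary choice) and made-up values with ties:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical values with ties, which would overplot at a single x position.
y = np.array([70.3, 70.5, 70.5, 70.5, 70.6, 74.1, 74.1])

x_plain = np.zeros_like(y)                               # all points stacked at x = 0
x_jittered = x_plain + rng.uniform(-0.1, 0.1, y.size)    # horizontal noise only

# The data values on the y-axis are untouched; only the x position is perturbed.
for xj, yi in zip(x_jittered, y):
    print(f"x={xj:+.3f}  y={yi}")
```

In seaborn, `sns.stripplot(..., jitter=True)` applies the same idea; how much noise it adds by default is best checked in its documentation.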

[Two stripplots of life expectancy (y-axis, 50 to 85): without jittering (left) and with jittering (right).]

Figure 27: Life expectancy, 2018. Right: Jittering, i.e., adding random noise, to avoid overplotting.

Histograms

[Histogram; x-axis: life expectancy, 50 to 85; y-axis ticks: 0 to 35.]

Figure 28: Life expectancy, 2018.

Histograms
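
How much a histogram depends on the number of bins can be checked numerically. The sketch below uses `numpy.histogram` on simulated values standing in for the 182 life expectancies (an assumption, not the Gapminder data). The `"auto"` rule is numpy's own default-style choice and need not match the default used for the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for 182 life expectancies.
values = rng.normal(loc=72.7, scale=7.2, size=182)

for bins in (2, 20, "auto"):
    counts, edges = np.histogram(values, bins=bins)
    width = edges[1] - edges[0]  # all bins have equal width here
    print(f"bins={bins!s:>4}: {len(counts):2d} bins, width={width:.2f}, "
          f"tallest bin holds {counts.max()} values")
```

With 2 bins almost all shape information is lost, while too many bins make the counts noisy; the total count is the same in every case.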

[Overlaid histograms; x-axis: life expectancy, 50 to 85; y-axis ticks: 0 to 140; legend: bins=default, 2, 20.]

Figure 29: Life expectancy, 2018. Different numbers of bins.

Density Plots

[Density plot; x-axis: life expectancy, 50 to 90; y-axis ticks: 0.00 to 0.05.]

Figure 30: Life expectancy, 2018.

Density Plots
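
A kernel density estimate averages one smooth bump per observation, and the bandwidth sets the width of each bump. The sketch below hand-rolls a Gaussian kernel estimate in numpy to show the effect; `kde_gaussian` is a hypothetical helper, the data are simulated, and the bandwidth values are borrowed from the legend of Figure 31 without assuming the plotting library interprets them the same way.

```python
import numpy as np

def kde_gaussian(values, grid, bandwidth):
    """Average of Gaussian bumps with standard deviation `bandwidth`, one per value."""
    z = (grid[:, None] - values[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

rng = np.random.default_rng(0)
values = rng.normal(loc=72.7, scale=7.2, size=182)  # hypothetical data
grid = np.linspace(0, 150, 1501)                    # evaluation points, step 0.1
step = grid[1] - grid[0]

for bw in (0.5, 2, 16):
    density = kde_gaussian(values, grid, bw)
    area = density.sum() * step  # Riemann sum; a density should integrate to about 1
    print(f"bandwidth={bw:>4}: peak={density.max():.3f}, area={area:.3f}")
```

A small bandwidth gives a spiky curve that follows individual points; a large one flattens the curve and spreads mass well beyond the observed range, which is why the x-axis in Figure 31 extends from 0 to 120.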

[Overlaid density plots; x-axis: 0 to 120; y-axis ticks: 0.00 to 0.07; legend: bw=default, 0.5, 2, 16.]

Figure 31: Life expectancy, 2018. Different bandwidths.

Boxplots
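
The numbers a boxplot draws can be computed directly. The sketch below follows the common convention of quartiles plus whiskers reaching the most extreme points within 1.5 times the inter-quartile range; `boxplot_stats` is a hypothetical helper, the data are made up, and `numpy.percentile`'s default interpolation is assumed, which may differ slightly from other software.

```python
import numpy as np

def boxplot_stats(values):
    """Median, quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    values = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(),    # most extreme points still inside the fences
        "whisker_high": inside.max(),
        "outliers": values[(values < lower_fence) | (values > upper_fence)],
    }

box = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(box)
```

Here the value 30 lies beyond the upper fence and would be drawn as a separate point, while the upper whisker stops at 9.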

[Boxplot; y-axis: life expectancy, 50 to 85.]

Figure 32: Life expectancy, 2018.

One Quantitative Variable and One Qualitative Variable

How would you visually compare life expectancy between regions?

Stripplots
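
Comparing a quantitative variable across the levels of a qualitative one usually amounts to passing the qualitative column as the grouping axis. The sketch below assumes seaborn and matplotlib are installed and uses a made-up data frame; the column names `six_regions` and `life_expectancy` mimic the axis labels on the slides, and the values are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows, two countries per region; illustration only.
df = pd.DataFrame({
    "six_regions": ["america", "america", "south_asia", "south_asia",
                    "sub_saharan_africa", "sub_saharan_africa"],
    "life_expectancy": [78.5, 74.2, 69.1, 71.3, 58.0, 63.4],
})

fig, axes = plt.subplots(1, 2, figsize=(9, 3), sharey=True)
sns.stripplot(data=df, x="six_regions", y="life_expectancy", ax=axes[0])  # every value
sns.boxplot(data=df, x="six_regions", y="life_expectancy", ax=axes[1])    # summaries
for ax in axes:
    ax.tick_params(axis="x", rotation=45)  # long region names overlap otherwise
fig.tight_layout()
```

Rotating the tick labels is one of the non-default settings worth changing: with the defaults, long category names can run together on the axis.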

[Stripplot; x-axis: six_regions (south_asia, europe_central_asia, middle_east_north_africa, sub_saharan_africa, america, east_asia_pacific); y-axis: life expectancy, 50 to 85.]

Figure 33: Life expectancy by region, 2018.

Histograms

[Overlaid histograms, one per region; x-axis: life expectancy, 50 to 85; y-axis ticks: 0.0 to 20.0; legend: middle_east_north_africa, america, east_asia_pacific, sub_saharan_africa, europe_central_asia, south_asia.]

Figure 34: Life expectancy by region, 2018.

Density Plots

[Overlaid density plots, one per region; x-axis: 50 to 90; y-axis ticks: 0.00 to 0.10; legend: middle_east_north_africa, america, east_asia_pacific, sub_saharan_africa, europe_central_asia, south_asia.]

Figure 35: Life expectancy by region, 2018.

Boxplots

[Side-by-side boxplots; x-axis: six_regions; y-axis: life expectancy, 50 to 85.]

Figure 36: Life expectancy by region, 2018.

Violin Plots

[Side-by-side violin plots; x-axis: six_regions; y-axis: life expectancy, 50 to 90.]

Figure 37: Life expectancy by region, 2018.

Log-Transformation
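
A log transformation turns equal ratios into equal distances, which is why skewed variables such as income spread out more evenly on a log axis. A small numeric sketch with made-up incomes (assumed values, each ten times the previous one):

```python
import numpy as np

# Hypothetical incomes (GDP/capita, $), each ten times the previous one.
income = np.array([1_000, 10_000, 100_000])
log_income = np.log10(income)

raw_steps = np.diff(income)      # unequal steps on the raw scale
log_steps = np.diff(log_income)  # equal steps on the log10 scale

print("raw steps:", raw_steps)
print("log steps:", log_steps)
```

On the raw scale the lowest incomes are squeezed against zero, whereas on the log scale a tenfold increase always covers the same distance.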

[Two panels of income by six_regions: linear y-axis, 0 to 120000 (left), and log y-axis, 10^3 to 10^5 (right).]

Figure 38: Income, 2018. Left: Income (GDP/capita, inflation-adjusted $). Right: Log-transformed income.

Time Series

How did life expectancy vary between 1800 and 2018?

Time Series

[Line plot of life expectancy vs. year, 1800 to 2000; legend: United States, Russia, China, Syria, Cambodia.]

Figure 39: Life expectancy over time for five countries.

Time Series

[Plot of life expectancy (y-axis, 20 to 80) vs. year (x-axis, 1800 to 2000).]

Figure 40: Life expectancy over time.

One Quantitative Variable: Summary

Displaying and comparing marginal distributions for quantitative data.
• Stem-and-leaf plots.
  - Simple pen-and-paper method for visualizing the distribution of a handful of values, all of which are shown.
  - Not amenable to comparisons between distributions.
  - No reason to use these days.
• Stripcharts/Stripplots. (Sometimes referred to as dotcharts/dotplots, related to rug plots.)
  - Effective for visualizing the distribution of a moderate number of values, all of which are shown.
  - Can use side-by-side stripplots to compare multiple distributions.
• Histograms.
  - Classical method for displaying a single distribution.
  - Sensitive to bin width and bin boundaries.
  - Cannot easily display and compare multiple distributions.
• Density plots.
  - Based on kernel density estimation (cf. smoothing).
  - Sensitive to bandwidth, but methods are available to select the bandwidth.
  - Effective for displaying and comparing multiple distributions.
• Boxplots. (A.k.a. box-and-whiskers plots.)
  - Summarize the distribution by only 5 numbers (+ outliers): median, upper and lower quartiles, and whiskers extending up to 1.5 times the inter-quartile range (IQR) above the upper quartile and below the lower quartile.
  - Possible loss of information, e.g., multimodality.
  - Effective for displaying and comparing multiple distributions, especially with notches.
• Violin plots.
  - Trendy hybrids of boxplots and density plots.
  - Redundant (twice the density plot!), unless different densities are plotted on each side.
  - Same limitations and issues as with boxplots and density plots.
  - Cannot compare densities as readily as with standard density plots.

Multiple Quantitative Variables

How would you visually examine the relationship between life expectancy and income in 2018 over all countries?

Scatterplots

[Two scatterplots of life expectancy (50 to 85) vs. income: linear x-axis, 0 to 120000 (left), and log x-axis, 10^3 to 10^5 (right).]

Figure 41: Life expectancy vs. income, 2018. Left: Income (GDP/capita, inflation-adjusted $). Right: Log-transformed income.

Scatterplots

[Scatterplot of life expectancy (50 to 85) vs. income (log x-axis, 10^3 to 10^5); legend six_regions: south_asia, europe_central_asia, middle_east_north_africa, sub_saharan_africa, america, east_asia_pacific.]

Figure 42: Life expectancy vs. income, colored by region, 2018.

Bubble Charts
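
In a bubble chart the third quantitative variable should be encoded by bubble area, not radius, since a radius encoding exaggerates large values quadratically. A minimal numpy sketch with made-up populations; `max_area` is an arbitrary, assumed scaling constant.

```python
import numpy as np

population = np.array([1.4e9, 3.3e8, 1.0e7])  # hypothetical populations

max_area = 400.0                                  # assumed largest marker area, in points^2
area = max_area * population / population.max()   # area proportional to population
radius = np.sqrt(area / np.pi)                    # radius grows only as the square root

print("area ratio  :", area[0] / area[1])
print("radius ratio:", radius[0] / radius[1])
```

In matplotlib, `ax.scatter(x, y, s=area)` interprets `s` as marker area in points squared, so passing `area` directly keeps the encoding proportional to population.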

[Bubble chart of life expectancy (50 to 85) vs. income (log x-axis, 10^3 to 10^5); legend four_regions: asia, europe, africa, americas; legend population: 0, 500000000, 1000000000, 1500000000.]

Figure 43: Life expectancy vs. income, colored by region and with area of bubbles representing population, 2018.

Mean-Difference Plots
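
A mean-difference plot replaces each pair of measurements by its mean (x-axis) and its difference (y-axis), with reference lines at the mean difference and at 1.96 standard deviations on either side. The sketch below computes those quantities for made-up paired values; the numbers are assumptions, and using the sample standard deviation (`ddof=1`) is also an assumption about the convention.

```python
import numpy as np

# Hypothetical life expectancies for the same five countries in two years.
y1998 = np.array([52.0, 60.0, 68.0, 74.0, 78.0])
y2018 = np.array([61.0, 66.0, 73.0, 77.0, 81.0])

means = (y1998 + y2018) / 2   # x-axis of the mean-difference plot
diffs = y2018 - y1998         # y-axis
mean_diff = diffs.mean()
sd_diff = diffs.std(ddof=1)   # sample standard deviation of the differences
lower = mean_diff - 1.96 * sd_diff
upper = mean_diff + 1.96 * sd_diff

print(f"mean diff: {mean_diff:.2f}, limits: [{lower:.2f}, {upper:.2f}]")
```

Plotting `diffs` against `means` makes it easy to see whether the change depends on the level, which is hard to judge from a scatterplot of one year against the other.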

[Left: scatterplot of 2018 vs. 1998 life expectancy. Right: mean-difference plot of Difference vs. Means, with reference lines at mean diff: 5.87, +1.96 SD: 12.95, and -1.96 SD: -1.21.]

Figure 44: Life expectancy, 2018 vs. 1998.

Scatterplot Matrices

[Scatterplot matrix of life expectancy, log pop, log pop dens, and log inc; legend six_regions: south_asia, europe_central_asia, middle_east_north_africa, sub_saharan_africa, america, east_asia_pacific.]

Figure 45: Life expectancy, population, population density, and income, by region, 2018.

Scatterplot Matrices

[Scatterplot matrix of x, y, and z, each ranging from 0 to 1.]

Figure 46: RANDU RNG. Triples of successive numbers.

3D Scatterplots

[3D scatterplot with axes x, y, and z.]

Figure 47: RANDU RNG. Triples of successive numbers.

RANDU RNG
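
RANDU is usually defined as the linear congruential generator x_{k+1} = 65539 · x_k mod 2^31, with outputs divided by 2^31 to land in (0, 1). This definition is standard background and is not stated on the slides. Because 65539 = 2^16 + 3, squaring the multiplier gives x_{k+2} = 6 x_{k+1} - 9 x_k (mod 2^31), so for every scaled triple (u, v, w) the quantity 9u - 6v + w is an integer, which places the triples on a small number of parallel planes. A minimal Python sketch, in which the seed, the helper name `randu`, and the grouping into non-overlapping triples are assumptions for illustration:

```python
M = 2**31
A = 65539  # 2**16 + 3

def randu(seed, n):
    """Return n successive RANDU states following an odd `seed`."""
    x = seed
    out = []
    for _ in range(n):
        x = (A * x) % M
        out.append(x)
    return out

# 400 triples of successive numbers (non-overlapping grouping is an assumption).
states = randu(seed=1, n=1200)
triples = [tuple(states[i:i + 3]) for i in range(0, 1200, 3)]

# Every triple satisfies x2 = 6*x1 - 9*x0 (mod 2**31).
assert all((9 * a - 6 * b + c) % M == 0 for a, b, c in triples)

# On the unit scale, 9u - 6v + w is an integer; each value indexes one plane.
planes = sorted({(9 * a - 6 * b + c) // M for a, b, c in triples})
print(len(triples), "triples on planes", planes)
```

Since u, v, and w all lie strictly between 0 and 1, the plane index 9u - 6v + w can only take integer values from -5 to 9, i.e., at most 15 planes.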

RANDU random number generator. (R. Ihaka, https://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture27.pdf.)

• The dataset consists of 400 triples of successive numbers produced by the RANDU random number generator (RNG).
• The consecutive triples produced by RANDU are constrained to lie on a series of parallel planes which cut through the unit cube.
• The planes are not aligned with the sides of the unit cube and so do not show up in any of the panels of a scatterplot matrix.

Overplotting

[Scatterplot of y vs. x, both axes from about 4 to 20.]

Figure 48: Simulated data, n = 60,000: Scatterplot.

Overplotting: Hexagonal Binning
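
Hexagonal binning replaces individual points by counts per hexagonal cell, which are then drawn with a color scale. One common construction, sketched below in plain Python, places hexagon centres on two interleaved rectangular lattices and assigns each point to the nearest centre. The function names `hexbin_counts` and `hex_centre`, the cell size, and the simulated Gaussian data are assumptions for illustration; the slides do not say how Figure 49 was produced.

```python
import math
import random
from collections import Counter

def hexbin_counts(points, size):
    """Count points per regular hexagonal cell.

    Centres lie on two rectangular lattices with horizontal step `size`
    and vertical step `size * sqrt(3)`; the second lattice is shifted by
    half a step in both directions. Assigning each point to its nearest
    centre yields regular hexagonal cells. Keys are (lattice, i, j).
    """
    dy = size * math.sqrt(3)
    counts = Counter()
    for x, y in points:
        i1, j1 = round(x / size), round(y / dy)            # nearest centre, lattice 0
        i2, j2 = math.floor(x / size), math.floor(y / dy)  # nearest centre, lattice 1
        d1 = (x - i1 * size) ** 2 + (y - j1 * dy) ** 2
        d2 = (x - (i2 + 0.5) * size) ** 2 + (y - (j2 + 0.5) * dy) ** 2
        counts[(0, i1, j1) if d1 <= d2 else (1, i2, j2)] += 1
    return counts

def hex_centre(key, size):
    """Convert a (lattice, i, j) key back to the centre coordinates."""
    lattice, i, j = key
    shift = 0.5 * lattice
    return ((i + shift) * size, (j + shift) * size * math.sqrt(3))

# Simulated data (distribution chosen for illustration only).
random.seed(0)
points = [(random.gauss(12, 2), random.gauss(12, 2)) for _ in range(60_000)]

counts = hexbin_counts(points, size=0.5)
key, n = counts.most_common(1)[0]
print(len(counts), "non-empty cells; densest cell at", hex_centre(key, 0.5), "holds", n, "points")
```

Drawing one hexagon per key, shaded by its count, gives a display in which dense regions remain readable even with tens of thousands of points.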

[Hexagonal binning plot of y vs. x, both axes from about 4 to 20.]

Figure 49: Simulated data, n = 60,000: Hexagonal binning.

Overplotting: Scatterplot Smoothing

[Smoothed scatterplot of y vs. x, both axes from about 2.5 to 22.5.]

Figure 50: Simulated data, n = 60,000: Scatterplot smoothing.

Multiple Quantitative Variables: Summary

Displaying joint distributions for quantitative data.

• While density plots and boxplots are useful for comparing two or more marginal distributions (e.g., in terms of location and scale), they do not provide any information about joint distributions and, in particular, associations between two variables.
• Scatterplots and scatterplot matrices.
  ▸ Useful for examining linear association between two variables.
  ▸ Can extend beyond two variables by using color and plotting symbol area, as in bubble charts.
  ▸ However, can miss important higher-dimensional patterns (cf. RANDU example).
• Mean-difference plots.
  ▸ Rotated and scaled version of scatterplot.
  ▸ Better suited to examining differences than associations.
• Bubble charts. A bubble chart is a type of scatterplot that displays one or two extra dimensions using area and color.
• Parallel coordinates plots.
  ▸ Natural for visualizing time series data, i.e., the same variable measured across time. Cf. train schedules.
  ▸ Can also be used for visualizing the relationship between multiple variables, but trickier: each line corresponds to an observation and each axis to a variable.
  ▸ Three important considerations that can affect interpretation of the plot: the order, the rotation, and the scaling of the axes.

Overplotting

Overplotting issues can be reduced by the following approaches.

• Changing plotting symbol.
• Jittering, i.e., adding random noise.
• Smoothing.
• Hexagonal binning.

Qualitative Variables

How would you visualize the 2017 UK election results?

Number of seats for each of 13 parties.

    Party    MPs
 0  CON      318
 1  LAB      261
 2  SNP       35
 3  LIB DEM   12
 4  DUP       10
 5  SF         7
 6  PC         4
 7  GREEN      1
 8  IND        1
 9  OTHER      1
10  UKIP       0
11  SDLP       0
12  UUP        0

Pie Charts
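
The percentage labels on a pie chart are each category's share of the total. A short Python sketch, using the seat counts from the table above (the variable names are illustrative), computes these shares and lists the parties from largest to smallest:

```python
seats = {
    "CON": 318, "LAB": 261, "SNP": 35, "LIB DEM": 12, "DUP": 10,
    "SF": 7, "PC": 4, "GREEN": 1, "IND": 1, "OTHER": 1,
    "UKIP": 0, "SDLP": 0, "UUP": 0,
}

total = sum(seats.values())
shares = {party: 100 * n / total for party, n in seats.items()}

# Sort by seat count, largest first; sorted() is stable, so ties keep table order.
for party, n in sorted(seats.items(), key=lambda item: item[1], reverse=True):
    print(f"{party:<8} {n:>4} {shares[party]:5.1f}%")
```

With 650 seats in total, the 318 CON seats correspond to 48.9% and the 261 LAB seats to 40.2%, while six of the thirteen parties fall at or below 0.2%, which is why their slices and labels collide on a pie chart.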

[Pie chart of seat shares by party, with percentage labels; the two largest slices are CON (48.9%) and LAB (40.2%), and the labels of the smallest slices overlap.]

Figure 51: UK Election Results 2017. Number of seats for each of 13 parties.

Barplots

[Barplot: MPs (0 to 300) by Party.]

Figure 52: UK Election Results 2017. Number of seats for each of 13 parties.

Dotplots

[Dotplot: Party on the vertical axis, listed from CON to UUP; MPs (0 to 300) on the horizontal axis.]

Figure 53: UK Election Results 2017. Number of seats for each of 13 parties.

Lollipop Plots

[Lollipop plot: MPs (0 to 300) by party.]

Figure 54: UK Election Results 2017. Number of seats for each of 13 parties.

One Qualitative Variable: Summary

• Pie charts.
  ▸ Frequency represented by angle/area.
  ▸ Angles and areas are hard to perceive and compare.
  ▸ Pie charts quickly become unreadable for more than a handful of values.
  ▸ Listing the values is often better; they are actually often added to a pie chart anyway!
  ▸ How to select order of categories?
  ▸ Not amenable to comparing distributions; side-by-side comparisons not effective.
  ▸ Hard to extend to multiple variables.
  ▸ A lot of junk often added to pie charts, e.g., thickness, slice explosion.
• Wordclouds/tag clouds.
  ▸ Frequency represented by font size.
  ▸ Neither area nor height corresponds to frequency of words.
  ▸ How do longer words compare with shorter words?
  ▸ How are capital letters handled?
  ▸ How to calculate relative difference in frequency between two words?
  ▸ How are the words ordered within the cloud (alphabetical, frequency)?
  ▸ Not amenable to comparing distributions; side-by-side comparisons not effective.
  ▸ How to extend to multiple variables?
  ▸ A lot of junk often added to word clouds.
• Barcharts/barplots.
  ▸ Based on length and position on common aligned scale.
  ▸ Add an irrelevant dimension (thickness of bar).
  ▸ How to select order of categories?
  ▸ Not readily amenable to comparisons.
  ▸ Extension to multiple variables problematic.
• Dotcharts/dotplots. (And interval charts.)
  ▸ Based on length and position on common aligned scale.
  ▸ Display only the relevant information.
  ▸ How to select order of categories?
  ▸ More amenable to comparisons and extensions to multiple variables.
• Lollipop plots.
  ▸ Similar to dotcharts/dotplots (with added stem) and barcharts/barplots.
  ▸ Stem is redundant.
  ▸ How to select order of categories?
  ▸ Not readily amenable to comparisons.
  ▸ Extension to multiple variables problematic.

Multiple Qualitative Variables

How would you display survival data on the Titanic according to class, gender, and age?

Barplots

[Two barplots of passenger counts by class (First, Second, Third), split by survival (No/Yes).]

Figure 55: Titanic: Survival by class.

Dotplots

[Dotplot: survival frequency per class (horizontal axis) for Third, Second, and First class.]

Figure 56: Titanic: Survival by class.

Dotplots

[Dotplot: survival frequency per gender/age (horizontal axis) for woman, man, and child.]

Figure 57: Titanic: Survival by gender/age.

Dotplots

[Dotplot: survival frequency per class and gender/age (horizontal axis, 0.2 to 1.0), with First, Second, and Third class shown within each of child, man, and woman.]

Figure 58: Titanic: Survival by class and gender/age.

Mosaic Plots
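
For two categorical variables, a mosaic plot can be built directly from a contingency table: each tile's width is the marginal frequency of a category of the first variable, and its height is the conditional frequency of a category of the second variable given the first, so the tile's area equals the cell frequency. A minimal Python sketch follows; the helper name `mosaic_tiles` is an assumption, and the counts are made up for illustration (they are not the Titanic data).

```python
def mosaic_tiles(table):
    """Map each cell of a two-way count table to its tile (width, height).

    `table` is {first_category: {second_category: count}}; every row is
    assumed to contain at least one observation.
    width  = marginal frequency of the first variable's category
    height = conditional frequency of the second variable given the first
    """
    total = sum(sum(row.values()) for row in table.values())
    tiles = {}
    for first, row in table.items():
        row_total = sum(row.values())
        width = row_total / total
        for second, count in row.items():
            tiles[(first, second)] = (width, count / row_total)
    return tiles

# Hypothetical counts: class by outcome (made-up values).
table = {
    "First":  {"no": 40, "yes": 60},
    "Second": {"no": 50, "yes": 50},
    "Third":  {"no": 150, "yes": 50},
}

for cell, (w, h) in mosaic_tiles(table).items():
    print(cell, f"width={w:.2f}", f"height={h:.2f}", f"area={w * h:.3f}")
```

The tile areas sum to one, and comparing tile heights across columns compares the conditional distributions, which is the comparison a stacked barplot makes harder.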

[Mosaic plot: class (First, Second, Third) by alive (no, yes).]

Figure 59: Titanic: Survival and class.

Mosaic Plots

[Mosaic plot: class (First, Second, Third) by alive (no, yes), subdivided by gender/age (child, man, woman).]

Figure 60: Titanic: Survival, class, and gender/age.

Multiple Qualitative Variables: Summary

The following types of plots are used to represent conditional distributions for multiple categorical variables or counts for hierarchical categories.

• Multilevel donut/pie/sunburst plots.
  ▸ Same or worse perception issues as with univariate pie charts.
  ▸ Which variable to choose for “outer” layer?
• Barcharts/barplots.
  ▸ For two categorical variables, a barchart/barplot displays the counts (or percentages) for each category of the second variable within each category of the first variable, i.e., the conditional distribution of the second variable given the first.
  ▸ Which variable to choose as “first”?
  ▸ In a side-by-side barplot, the frequencies for the second variable are displayed as juxtaposed bars.
  ▸ In a stacked/segmented barplot, the bars for the second variable are stacked, so that their total height is the total count for the category of the first variable or 100 percent.
  ▸ Hard to compare frequencies between categories of first variable with both types of barplots.
  ▸ Hard to compare frequencies of second variable within categories of first variable with stacked barplot.
  ▸ Circular barcharts/barplots: eye-catching, but even harder to compare frequencies.
• Treemap. The hierarchical or conditional frequencies are represented using nested figures, usually rectangles.
• Mosaic plots.
  ▸ A mosaic plot is a graphical display of the counts in a contingency table (a.k.a. cross-tabulation or crosstab), where each cell is represented by a tile (i.e., rectangle) whose area is proportional to the cell frequency.
  ▸ Color and shading of the tiles can be used to represent unusually large or small counts, or the sign and magnitude of residuals (deviations) for particular models (e.g., independence).
  ▸ For two categorical variables, the width of each tile is proportional to the marginal frequency of the category for the first variable and the height of the tile to the conditional frequency of the category for the second variable given the first.
  ▸ Can be hard to read mosaic plots for more than two variables.

Conditional Plots

Conditional plots/coplots/faceting/panels/small multiples.

• Collection of plots, where each plot represents the conditional distribution of one or more variables given a conditioning variable.
• Each plot corresponds to a value or set of values for the conditioning variable. For a quantitative conditioning variable, the ranges are typically chosen so that there are equal numbers of observations in each panel.
• The scales on the axes have to be the same for all panels.
• The colors (and legends) also have to be the same for all panels.
• E.g., scatterplots of life expectancy vs. income for each of the six world regions.

Conditional Plots
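
For a quantitative conditioning variable, panels with roughly equal numbers of observations can be obtained by sorting the values and cutting at equally spaced ranks. The Python sketch below does this with non-overlapping intervals. The function name `equal_count_intervals` and the toy incomes are assumptions for illustration, and some plotting tools instead let adjacent intervals overlap.

```python
def equal_count_intervals(values, panels):
    """Split `values` into `panels` groups of (nearly) equal size.

    Returns a list of (low, high, members), with members sorted ascending.
    Group sizes differ by at most one.
    """
    if panels < 1 or panels > len(values):
        raise ValueError("need 1 <= panels <= len(values)")
    ordered = sorted(values)
    n = len(ordered)
    groups = []
    for k in range(panels):
        start = k * n // panels        # rank where panel k begins
        stop = (k + 1) * n // panels   # rank where panel k ends (exclusive)
        members = ordered[start:stop]
        groups.append((members[0], members[-1], members))
    return groups

# Hypothetical incomes (made-up values), strongly right-skewed.
income = [900, 1200, 1500, 2500, 4000, 7000,
          11000, 18000, 30000, 45000, 60000, 110000]

for low, high, members in equal_count_intervals(income, panels=4):
    print(f"[{low}, {high}]: {len(members)} observations")
```

Equal-width income bins would put most of these skewed values in the first panel, whereas equal-count cuts keep every panel populated.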

[Plot of life.expectancy (50 to 85) for each of the six regions: america, east_asia_pacific, europe_central_asia, middle_east_north_africa, south_asia, sub_saharan_africa.]

Figure 61: Life expectancy by region, 2018.

Conditional Plots

[Conditional plot: life.expectancy (50 to 85) vs. six_regions, in panels given income (0 to 120,000).]

Figure 62: Life expectancy by region conditioning on income, 2018.

References

• Peter Aldhous. Data visualization: basic principles. http://paldhous.github.io/ucb/2016/dataviz/week2.html.
• Ross Ihaka. Statistics 120 – Information Visualisation. https://www.stat.auckland.ac.nz/~ihaka/120/.
• Duncan Temple Lang. Data Visualization Workshops. http://dsi.ucdavis.edu/tag/data-visualization.html.

W. S. Cleveland and R. McGill. Graphical perception and graphical methods for analyzing scientific data. Science, 229(4716):828–833, 1985.

A. Gelman, C. Pasarica, and R. Dodhia. Let’s practice what we preach: Turning tables into graphs. The American Statistician, 56(2):121–130, 2002.

E. J. Marey. La Méthode Graphique. Librairie de l’Académie de Médecine, 1885.

S. S. Stevens. On the psychophysical law. Psychological Review, 64(3):153–181, 1957.

E. R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 2nd edition, 2001.