Visualizing Scientific Data

Elena A. Allena, Erik Barry Erhardtb

aDepartment of Biological and Medical Psychology, University of Bergen bDepartment of Mathematics and Statistics, University of New Mexico

Contents List of Figures

1 Introduction and Motivation 1 1 Perceptual Tasks ...... 2 2 Difference between curves ...... 3 2 Designing for Human Abilities and Limita- 3 types ...... 5 tions 2 4 Alternatives to 3D ...... 9 2.1 Graphical Encoding ...... 2 5 Symbol shape and saturation ...... 9 2.2 Short-term and Long-term Memory . . . . . 3 6 Color benefits ...... 11 2.3 Design Principles ...... 3 7 Color cylinder ...... 12 8 Aspect ratio ...... 13 3 Selecting a Chart Type 4 9 Categorical axis ...... 14 3.1 Common Chart Types ...... 4 10 Integrate statistical descriptions ...... 15 3.2 Displaying Variation and Uncertainty . . . 6 11 Data process ...... 17 3.3 Multivariate Visualizations ...... 7 12 Modified designs with real data ...... 18 3.4 3D visualizations ...... 8

4 Symbols and Colors 8 1. Introduction and Motivation 4.1 Using Symbols ...... 8 4.2 Using Color ...... 10 is more than just the graphical dis- play of information. Rather, it is a form of visual commu- 5 Supporting Details 12 nication intended to generate understanding or insights 5.1 Axes ...... 12 about data. Specific to scientific research, data visual- 5.2 Grids and Guides ...... 13 ization serves many distinct purposes. In exploratory 5.3 Annotation ...... 13 data analysis, which renowned statistician John Tukey de- 5.4 Font Choice ...... 14 scribes as “graphical detective work” (Tukey, 1977), visu- 5.5 Numerical Precision ...... 15 alizations provide the ability to reveal unanticipated pat- terns and relationships. When a priori hypotheses are 6 Putting it into Practice 15 available, visualizations are used to support hypothesis 6.1 Tools ...... 15 testing and model validation. In presentations and pub- 6.2 Examples ...... 16 lications, visualizations are the primary medium used to 6.3 A Checklist for Assessing Visualizations . . 19 convey research findings to colleagues. While tables have their place for point/value reading in small or moderate- sized datasets, graphs are the superior choice for show- List of Tables ing trends, summarizing data, and demonstrating rela- 1 Decoding accuracy ...... 3 tionships (Jarvenpaa and Dickson, 1988). 2 Data types ...... 6 As with any form of communication, data visualiza- 3 Checklist ...... 20 tion can be effective or ineffective. It has been argued by Wallgren, Wallgren, Persson, Jorner, and Haaland (1996) that “a poor chart is worse than no chart at all”. With- $Chapter to appear in 2016 in Handbook of Psychophysiol- out consideration of how a visualization will be interpreted ogy, 4th Edition by Cambridge University Press, edited by Pro- (or possibly misinterpreted), you run the risk of confusing fessor John T. Cacioppo, Professor Louis G. Tassinary and Gary G. Berntson. your viewer rather than enhancing their understanding. Acknowledgments: We thank Lana Chavez for generous help Unfortunately, there is evidence to suggest that graphical editing early versions of the chapter. We also acknowledge Bang communication in the sciences can be substantially im- Wong and Martin Krzywinski, whose clear writing and proved. An early survey by Cleveland (1984) of 377 graphs inspired much of the work presented here. Email addresses: [email protected] (Elena A. Allen), in a volume of Science showed that 30% had at least one [email protected] (Erik Barry Erhardt) major error compromising interpretation. Our own more

Preprint Handbook of Psychophysiology, 4th Edition, 2016 October 2015 Allen and Erhardt Visualizing Scientific Data Page 2 of 21 recent survey of nearly 1500 figures from five neuroscience the viewer to decode information effortlessly and accu- journals found that graphical displays became less infor- rately. mative and interpretable as the dimensions of datasets in- Psychophysics experiments have long shown that some crease (Allen, Erhardt, and Calhoun, 2012). For example, encodings are easier to decode than others (e.g., Fech- only 43% of graphics displaying higher-dimensional data ner, 1860). An intensive period of investigation into hu- labeled the dependent variable (meaning that more than man perceptual abilities in the 1950s–70s found power- half the time the viewer couldn’t determine what quantity law relationships between the true magnitude of a stimu- was being plotted) and only 20% portrayed the statistical lus and its perceived magnitude (Baird, 1970a,b; Feinberg uncertainty of measured or calculated quantities. and Franklin, 1975; Stevens, 1975). For example, while The inadequacies of many scientific graphics are not judgements of length were approximately linear, perceived due to a lack of guidelines or established best practices. brightness, area, and volume all showed sublinear rela- Quite the contrary, there are numerous excellent resources tionships to true stimulus magnitude, i.e., in order to get available (e.g., Cleveland, 1994; Tufte, 2001). The prob- the same increase in perceived area from one object to lem appears to be that the advice of the experts is not another, one must consistently over-represent the actual reaching its intended audience (Wickham, 2013). It is area of the object. increasingly recognized that data visualization education In the 1980s, William Cleveland and colleagues pur- must be integrated into resources broadly accessed by the sued investigations into perceptual decoding in relation to scientific community. Examples of this paradigm shift graphical communication, positing that visual attributes include the appearance of relatively simple graphing tu- which could be decoded more accurately would ultimately torials in prominent journals (e.g., Spitzer, Wildenhain, result in a greater likelihood of perceiving patterns in Rappsilber, and Tyers, 2014; Wand, Iversen, Law, and data and making correct interpretations (Cleveland and Maher, 2014), and recent development of the strongly rec- McGill, 1984). The findings of Cleveland and McGill ommended “Points of View” column by Bang Wong, Mar- (1985), which have also been reproduced more recently tin Krzywinski, and colleagues, in Nature Methods, which by Heer and Bostock (2010), led to a ranking of visual at- provides a high level discussion of data visualization fun- tributes based on decoding accuracy, presented in 1. damentals. Education — particularly early on — is crit- Several of these visual attributes are shown in Figure 1, ical for the development of skills, which represents the same set of data encoded differently and it is exciting to see innovative resources (such as this in each column. You can test for yourself whether some book) that acknowledge data visualization as a general attributes are easier to accurately decode than others. research method for scientists. It is our intention that by reading this chapter you will VALUE SATURATION VOLUME AREA ANGLE LENGTH POSITION become a better communicator of visual information. We 15 will guide you through the motivating principles of graph- ? ical design, how to construct graphs that play to human ? perceptual strengths, how to clarify visual messages with ? annotations, and, finally, how to implement these concepts ? in practice. ? 40 2. Designing for Human Abilities and Limitations Figure 1: Test your ability to decode quantitative information from There are countless ways to represent information graph- different visual attributes. Each column encodes the same 7 values ically, however not all of these ways are useful or even in- using a different graphical feature. How accurate is your perception? Values from top to bottom: 15, 30, 50, 10, 32, 24, 40. Adapted from terpretable to human beings. When designing a graphic, Wong (2010b). we must consider our perceptual and cognitive capabil- ities. Effective graphics will be those that play to the strengths of human information-processing abilities and The relative abilities of humans to make accurate per- avoid the weaknesses. ceptual judgements has become the basis for recommend- ing particular graphical encodings over others. It is com- 2.1. Graphical Encoding monly accepted that position along a common axis should In data visualizations, information is represented not be the default encoding method for displaying quantita- with numbers, but with geometric objects. Thus, quanti- tive information because it is most accurately decoded. tative and categorical data must be mapped or encoded to Histograms, bar , line plots, scatter plots, and many visual objects, typically through attributes such as posi- other forms encode information in this fashion. When a tion, size, shape, or color. Communication is effective only single common axis is not sufficient to show the data, a if the perceptual decoding of information by the viewer is good alternative is a series of replicated axes, known as successful. Ideally, a visualization is designed that helps small multiples (Tufte, 2001). Color hue and saturation, Allen and Erhardt Visualizing Scientific Data Page 3 of 21

Visual attribute Rank material in a display, overwhelming the viewer, or relegate Position along a common scale 1 (most accurate) too much information to the figure legend, requiring the Position along non-aligned scales 2 viewer to arduously memorize encodings and definitions Length 3 (Kosslyn, 1985). Improved graphical designs (Sec. 2.3) Angle, Slope 4 and integrated annotation (Sec. 5.3) can reduce demands Area 5 Volume, Color saturation 6 on short-term memory, enhancing and accelerating under- Color hue 7 (least accurate) standing. Considerations of long-term memory involve taking Table 1: A ranking of decoding accuracy for different visual at- advantage of past experiences. Comprehending a con- tributes, adapted from Cleveland and McGill (1984) and Cleveland ventional display requires little more than recognizing the and McGill (1985). chart type and identifying graphical elements. In con- trast, a novel display type must first be deciphered — it is a puzzle to be solved, rather than a graph to be read which rank as the least accurate visual attributes, should (Kosslyn, 1985). By following conventions and using fa- be used cautiously to encode quantitative information, a miliar chart types (Sec. 3.1), we can exploit the repository topic we discuss at length in Section 4.2. of knowledge stored in long-term memory and facilitate With regard to perceptual limitations, special care should comprehension. also be taken when visually comparing curves (Haemer, 1947a; Cleveland and McGill, 1984). To determine the 2.3. Design Principles difference between curves, we must accurately judge the When we write, we choose each word carefully and vertical distance between them. Unfortunately, our brains consider its integration into the surrounding text. When tend to judge the minimum distance between the curves, we design, we follow the same process and choose each vi- rather than the vertical distance. You can test your own sual element based on its role in the graphic. Good design abilities to perceive curve differences in Figure 2 — you choices, like good writing, make ideas easy to understand. may be surprised by the result. Thus, if the difference We present five design principles that will help to guide between curves is of interest, we recommend plotting this your choices. quantity directly. Principle 1: Determine the goal. Designing a visualization without a goal is a bit like taking a road

How does the difference between curves change with X? trip without a destination — you may see many beauti- ful things along the way, but you’ll never be sure when a a a A B b C you’ve arrived. A clear goal can often be specified as a b b question to be answered by the visualization. For exam- Y ple, a simple goal is to answer “What is the distribution of continuous variable Y ”, with secondary questions, “Does the variable require transformation?” and “Are there ex- X treme outlying observations?”. With the goals defined, one can implement one or more graphical forms for visu- Figure 2: Test your perceptual ability to determine the difference alizing distributional shape (e.g., see Fig. 3A) that will between curves. In each panel, how does the difference between curves a and b change as a function of x? Answers: (A) a − b answer the associated questions. Designing with a goal in increases exponentially with x;(B) a−b is constant over x;(C) a−b mind will help you to determine which information your increases linearly with x. Loosely adapted from Haemer (1947a). visualization needs to convey and the appropriate salience of each visual element (Chambers, Cleveland, Kleiner, and Tukey, 1983). We conclude this section by reminding readers that Principle 2: “Let the data dominate” (Wall- while computer graphing capabilities advance rapidly, the gren et al., 1996). A graph includes two parts: the human visual system persists relatively unchanged. Con- data and the annotations that put the data in context. sider the enduring perceptual limitations of humans when The data should always take the leading role, with an- implementing new graphical designs (Cleveland, 1994). notations as supporting players. Comprehension can be diminished when minor details are highlighted (Wainer, 2.2. Short-term and Long-term Memory 1984). Adjust object size, line thickness, color, satura- Effective designs accommodate not only perceptual lim- tion, etc., to emphasize the data over non-data elements itations, but also the constraints of short-term memory. (Kosslyn, 1985; Krzywinski, 2013a). If you are in doubt Healthy individuals can keep only three to five objects regarding the visual prominence of elements, try squint- in visual short-term memory (Cowan, 2001). Since this ing at the image from a distance: the data should be most capacity is not easily increased, we must design graphics salient feature and annotations should fade into the back- that do not require the reader to hold too many details ground (Wong, 2011b). For example, in Figure 8B, the at once. For instance, graphics commonly place too much data points are most prominent, followed by the smooth Allen and Erhardt Visualizing Scientific Data Page 4 of 21 curve obtained with locally weighted regression (LOESS), 3. Selecting a Chart Type and finally the grid and labels. Principle 3: “Simplify to clarify” (Wong, 2011c). When faced with visualizing a set of data, building or Every element in a graphic competes for our visual at- selecting a chart can be a daunting task. We find it use- tention. Good designs use the fewest elements possible ful to first determine dataset characteristics, since these to communicate the message without compromising data place constraints on possible visualizations. One should integrity. Statistician and artist provides be well-acquainted with the dataset sample size (i.e., the two pragmatic guidelines to help streamline figures: (1) number of observations collected), the data dimensionality “maximize the data-ink ratio”, i.e., the majority of ink (i.e., the number of variables collected for each observa- in a graphic should depict the data, and (2) “erase non- tion), and whether variables are considered independent data-ink”, i.e., superfluous elements that fail to convey or dependent. Equally important is the data type of each vital information should be removed (Tufte, 2001). When variable, a simplified taxonomy for which is provided in refining a graphic, consider the primary goal of a figure, Table 2. A critical distinction is made between quantita- prune it down to its essential parts, eliminate any extra- tive and qualitative (categorical) variables, as this affects neous elements, then refine the elements that remain to the types of encodings that can be used. uphold the message. Once dataset characteristics are established, we can Principle 4: Consistency trumps creativity. Good make choices regarding graphical encodings and transfor- design uses consistent encodings, layouts, colors, and other mations into geometric objects, or “geoms” (Wickham, elements to achieve visual continuity across panels and 2009). Geoms are the building blocks of all charts and figures (Wainer, 2008). A single (well-considered) design represent the mapping between a visual attribute and its will reduce the burden on your reader to decipher graph- representation with points, lines, and shapes. For exam- ics and will discourage misinterpretation. Creative and ple, if you encode a series of numbers using position along novel designs have their place, but a conventional or fa- a common axis (as in Fig. 3C–D), you could use a dot miliar display will require the least effort to comprehend (where the geom is the set of points whose positions are (see also Sec. 2.2). As an example, consider Figure 3. defined by the data), a line plot (where the geom is the line Although each panel displays distinct content, the visual that connects points defined by the data), or a bar plot style, font families, and font hierarchy is maintained con- (where the geom is the series of bars whose heights are sistently across panels, simplifying digestion for the reader. defined by the data). As we will see in the next section, Principle 5: Group with Gestalt. Gestalt princi- layering geoms can provide rich and informative charts ples are the rules of perceptual organization which help that meet our communication needs. us to understand and exploit the interaction between the parts and the whole (Ellis, 1999). Although there are 3.1. Common Chart Types numerous Gestalt principles, we focus on two describing Common chart types are shown in Figure 3. Each how humans tend to organize visual elements into groups. panel displays one or more charts that may be appro- First, the proximity principle states that objects that are priate based on data types, data dimensionality, and the closer together tend to be perceived as being part of the goals of the visualization. As discussed previously, using same group. This principle explains why *** *** and conventional chart types (and geoms therein) and adapt- ** ** ** are seen as two and three groups, respectively. ing as necessary will typically yield designs that are more Proximity is commonly applied to achieve effective lay- accessible to viewers. outs that reveal hierarchical relationships through align- Take some time to study the charts in Figure 3 and ment and the use of negative space (i.e., blank space). As consider the encodings used to portray the data, and de- an example, Figure 4 uses a relatively subtle addition of coding accuracy thereof. For low dimensional data (top negative space between rows A and B to encourage the panels), nearly all common visualizations encode data based interpretation that the plots within each row should be on position along a common axis. As dimensionality in- considered together, but that the rows themselves are dis- creases and multiple encodings are required, less optimal tinct. Second, the similarity principle states that objects encodings (such as area in mosaic plots and color in heat with similar visual attributes (e.g., shape, color, size, ori- ) may be necessary. A notable exception to this entation, etc.) tend to be perceived as being part of the principle is the pie chart, which encodes data with an- same group. This principle is commonly exploited with gles and/or area despite visualizing only a single categori- symbol encoding. For example, in the scatter plots dis- cal variable. This discrepancy between decoding accuracy played in Figure 5, we easily perceive the presence of two and data complexity, along with additional shortcomings groups based on the visual (dis-)similarity of plotting sym- of pie charts (e.g., low data-density and labeling difficul- bols. ties), have encouraged some data visualization experts to recommend strongly against their use (Tufte, 2001; Few, 2007). Although we name and portray a limited number of chart types, there are in practice infinite ways to commu- Allen and Erhardt Visualizing Scientific Data Page 5 of 21

DATA DISTRIBUTION EMPHASIS OF VISUALIZATION COMPARISON A B C BAR PLOT BAR CHART Compare the mean or other summary HISTOGRAM statistic over categories. The bar SMALL height, representing distance from zero, should be meaningful. Error bars can illustrate the uncertainty of the statistic. VIOLIN PLOT STACKED BAR CHART

PIE CHART D LINE PLOT BEE SWARM Compare the mean or other summary statistic over a qualitative or quantitative variable. Comparisons between groups can be made by plotting multiple lines. Error bars/bands Visualize the distribution shape, center, Visualize frequencies or proportions can illustrate the uncertainty of the and spread of a single quantitative variable. over a single qualitative variable. statistic.

TREEMAP E SCATTER PLOT F TABLE G I SMALL MULTIPLES 11 5 6 29 7 17 18 17 22 Visualize proportions in a HEAT hierarchical data structure. Color encoding (a heat map) can further visualize a qualitative or quantitative value for each cell.

2D HISTOGRAM

H MAP MOSAIC PLOT

Visualize the bivariate distribution Visualize frequencies Compare any type of visualization over one or between two quantitative variables. more categorical dimensions. Particularly useful DATA DIMENSIONALITY DATA or proportions over A scatter plot with multiple symbols two or more qualitative Visualize spatial patterns of when comparing across 4+ categories, which allows group comparisons. variables. qualitative or quantitative variables. will typically overwhelm single-panel displays.

SCATTER PLOT MATRIX J K PARALLEL COORDINATE PLOT L HEAT MAP V W X Y Z

X

GRAPH

W RADAR PLOT GLYPHS DENDROGRAM

Y X

V 2D PROJECTION

Z Y

Z Visualize multivariate datasets, in particular to Visualize multivariate compare groups or samples. In parallel datasets to discover data X Y Z coordinate plots dimensions are aligned in structure. Visualizations Visualize pairwise relationships between qualitative or quantitative parallel lines; in radar plots dimensions are portray the similarity among variables in a multivariate dataset. Bivariate distributions are displayed arranged radially. Glyphs can be generated from samples, emphasizing LARGE in off-diagonal panels; univariate distributions are along the diagonal. the hulls created with a radar plot. relationships and clustering.

Figure 3: Common chart types. Each panel shows possible visualizations based on the type of data, dimensionality, and desired emphasis. Panels A through L are arranged by increasing data dimensionality (top to bottom) and increasing emphasis on comparison (left to right). Allen and Erhardt Visualizing Scientific Data Page 6 of 21

Data Type Example Values Qualitative/Categorical Nominal gender {female, male} Ordinal disease stage {mild, moderate, severe} Quantitative/Numerical Discrete number of errors made 8, 1, 5, . . . Continuous cortical thickness 1.72, 3.32, 2.26, . . .

Table 2: Data can be broadly classified as qualitative or quantitative. Qualitative data are categorical. Nominal variables differentiate between groups based only on their names, and ordinal variables allows for rank ordering of groups. Quantitative data are numerical with meaningful intervals between measurements. Discrete variables can take only integer values, and continuous variables can take any real value.

nicate data through various encodings and the creative must “remind us that the data being displayed do con- use of points, lines, and shapes. In his highly influen- tain some uncertainty” and must “characterize the size of tial book “The Grammar of Graphics”, statistician Leland that uncertainty as it pertains to the inferences we have in Wilkinson shuns the notion of a fixed “chart typology”, mind” (Wainer, 1996). Unfortunately, our recent survey of and instead encourages flexibility by layering geoms that neuroscience journals demonstrates that many published portray different levels of detail (Wilkinson, 2005). Ex- figures do not meet this standard, particularly as data di- amples of such layering can be seen in Figure 3. In Panel mensionality increases (Allen et al., 2012). When the goal A, a violin plot layers a box plot on top of a kernel den- of a visualization is to compare a measured or derived sity estimate (smoothed histogram) (Hintze and Nelson, quantity across categories or conditions, one must include 1998), making both the distribution shape and the dis- a geom portraying the quantity and a second geom por- tributional quartiles accessible to viewers. Of course, one traying the uncertainty of the quantity. Note that without could replace the box plot geom with a “rug” of the indi- the portrayal of uncertainty, accurate visual comparison vidual data points (referred to as a “beanplot” (Kampstra, is not possible; viewers may draw incorrect or uninformed 2008)) or with a portrayal of the mean and/or standard conclusions. deviation, tailoring the visualization for what is meant to Variation and uncertainty can be portrayed with a va- be communicated. Another effective use of layering is the riety of geoms, but are most commonly displayed with portrayal of data and a model fit to the data, as shown in error bars. Unfortunately, there is no single standard for Figure 10B. When including a geom that depicts a model, what quantity the error bar should represent. In fact, ensure that the data maintains visual prominence (i.e., there is an overwhelming plurality of possible meanings, follow Principle 2 to let the data dominate). e.g., a standard deviation (SD) of the sample, a stan- One should also consider clarity when choosing be- dard error of the mean (SEM), a range, a parametric tween geoms. For example, when visualizing a bivariate 100(1 − α)% confidence interval (CI) of the mean, a boot- distribution between continuous variables, a scatter plot strap CI, a Bayesian probability interval, a prediction in- (Panel E, top), may be the natural first choice. However, terval, etc. Each quantity has its own statistical inter- with a large dataset (e.g., hundreds or thousands of ob- pretation, and poor labeling of error bars has been shown servations) the density of points may obfuscate portrayal to mislead viewers (Wainer, 1996; Cumming and Finch, of the distribution. In this case, a two-dimensional (2D) 2005). histogram or contours of the 2D kernel density estimate Thus, when using error bars, ensure that (1) the quan- (Panel E, bottom) offers greater clarity. (When permitted tity encoded by the bar is consistent with the goal of the by the data density, layering a scatter plot on top of the visualization, and (2) the quantity is unambiguously de- 2D contours can provide an attractive visualization that fined. Regarding the first point, we offer the following allows access to the data at high and low levels of detail.) guidelines when using error bars to portray variation of a As a second example, a scatter plot can effectively com- parameter estimate, or variation of the data. pare bivariate distributions between a few categories that If the interest is estimating a population parameter, are distinguished by different plotting symbols or colors. such as the mean or variance, then the variation of the However when the number of categories exceeds three or estimate (i.e., the sampling distribution of the statistic) four, it may be difficult to discern and compare distri- is wanted. Examples of suitable error bars include the butions. Small multiples can yield a less overwhelming SEM or a 95% parametric or bootstrap CI, as seen in graphic. visualizations that emphasize comparisons (Fig. 3C–D). Parametric CIs should only be used if data meet the as- 3.2. Displaying Variation and Uncertainty sumptions of the underlying model, otherwise a bootstrap The concept of layering also facilitates the depiction (or other strategy for approximating the sampling distri- of statistical variation and uncertainty. As emphasized by bution) should be used. Note that many aspects of CIs statistician , an effective data visualization (and p-values) are often misunderstood, even by experi- Allen and Erhardt Visualizing Scientific Data Page 7 of 21 enced scientists (Belia, Fidler, Williams, and Cumming, full square form of a scatter plot matrix is redundant (i.e., 2005; Hoekstra, Morey, Rouder, and Wagenmakers, 2014). it displays both X vs. Y and Y vs. X), many choose to in- For example, when comparing population parameters be- clude only axes in the upper or lower triangle of the matrix tween two or more groups, viewers may look to the overlap (i.e., above or below the diagonal, respectively). Alterna- between CI bars to determine whether the parameters are tively, one can use the redundancy to provide slightly dif- statistically different. While non-overlapping 95% CIs do ferent visualizations of the same bivariate distributions. indicate a significant difference (under a normal proba- For example, we plot X vs. Z (bottom left axis) using bility model), the converse is not true — depending on violin plots to emphasize distributional differences, and sample size, the CI bars may overlap by as much as 50% plot Z vs. X (top right axis) using bee swarms to show and still meet significance criteria (Cumming, Fidler, and individual observations. Vaux, 2007; Krzywinski and Altman, 2013). Therefore, One limitation of scatter plot matrices is that visual- after defining the quantity portrayed by an error bar, we izations are limited to pairs of variables. Panel K shows recommended interpreting that quantity and the results several chart types that help to overcome this limitation. of any relevant hypothesis tests to help your reader reach In a parallel coordinate plot, data dimensions are depicted the correct conclusions. as equally-spaced parallel lines. The example dataset has Alternatively, if the interest is possible observational five dimensions (variables V , W , X, Y , and Z) portrayed outcomes, then the variation of the data (i.e., the em- by parallel lines. Each observation in the dataset is rep- pirical distribution) is desired. In this case, “error bars” resented as a line or profile which traverses the parallels. might indicate one SD from the mean or the interquartile Profiles of similar observations will form “bundles”, indi- range (middle 50% of data) around the median (Cleve- cating natural clusters in the data and making outliers ap- land, 1994). Note, however, that these “error bars” do parent. A related visualization is a radar plot, or star plot, not reflect error or uncertainty, but rather variation. As which orients axes radially rather than in parallel. For such, in these cases we prefer to use geoms such as box both methods, the scaling and order of axes are arbitrary. plots and violin plots (Fig. 3A) that are intended to em- Both properties can profoundly affect the appearance of phasize distributions. the visualization, thus one may consider re-ordering or re- Displaying uncertainty remains an active area of re- scaling to help reveal patterns. Note, too, that arbitrary search and development, particularly for compact visual- scaling and ordering can also make pairwise relationships izations of high-dimensional data (e.g., Griethe and Schu- much less accessible than in a scatter plot matrix (see mann, 2006; Potter, Rosen, and Johnson, 2012). For heatmaps, Fig. 4B for an example). Hengl (2003) suggests using a 2D colormap: color hue is Multivariate data can also be visualized with glyphs, used to visualize observations, and transparency is used which are symbolic or iconic representations wherein each to visualize the error or uncertainty associated with each variable is mapped to a visual feature. Simple glyphs observation. In Section 6.2 we apply this strategy to func- can be generated directly from the hulls created with a tional brain data and find the visualization to be radar plot, where each variable is mapped to the ver- much more informative than the typical design. One con- tex position of a polygon (Panel K, bottom). More com- cern with this approach is how accurately combined hue plex glyphs include Chernoff faces (Chernoff, 1973), which and transparency encodings can be decoded, particularly maps each variable to a different facial feature (e.g., nose given the perceptual non-independence of these visual at- length, mouth curvature, eye separation, eyebrow angle, tributes (Wilkinson, 2005). We look forward to further etc.), exploiting the innate ability of humans to recognize improvements and innovations in visualizing uncertainty. and analyze faces. In , glyphs are commonly used to display diffusion weighted images: variables de- 3.3. Multivariate Visualizations scribing the orientation and magnitude of diffusion along Because readers may be less familiar with the multi- fiber tracts are mapped to ellipsoid-like shapes displayed variate visualizations shown in Panels J-L of Figure 3, we at each voxel location (Margulies, B¨ottger,Watanabe, and provide a more detailed discussion of their composition Gorgolewski, 2013). A significant advantage of glyphs over and usage. The scatter plot matrix (Panel J) displays scatter plot matrices or parallel coordinate plots is that bivariate relationships between all pairs of variables ar- patterns involving several dimensions can be more readily ranged into a matrix. Originally conceived as a visualiza- perceived. However, the accuracy with which glyphs can tion for only continuous variables, the scatter plot matrix convey data is often questioned, and the use of complex has since been generalized to include both continuous and glyphs may be best suited for data exploration or quali- categorical variables and is also referred to as a generalized tative comparisons (Ward, 2008). pairs plot (Emerson, Green, Schloerke, Crowley, Cook, While the chart types in Panels J and K lose their in- Hofmann, and Wickham, 2011). The example dataset in terpretability at 10–20 dimensions, the visualizations in Panel J includes three variables, X (categorical), Y (con- Panel L are suitable for very high-dimensional data (e.g., tinuous), and Z (continuous). The visualizations of bi- observations with hundreds of time points or thousands of variate distributions make use of simpler geoms depicted pixels). Notably, none of these visualizations display the in earlier panels and are highly customizable. Since the original data or even reveal the number of variables in the Allen and Erhardt Visualizing Scientific Data Page 8 of 21 dataset. Rather, they focus on the similarity or differences the visual system relies on depth cues such as occlusion between observations, or use data reduction methods to (where objects in the foreground partially conceal those project the data to a lower-dimensional space. In the ex- in the background) or graphical (where par- ample, we consider a dataset with five observations, each allel lines converge toward the background) (Gehlenborg of which has values over a large number of variables. We and Wong, 2012). These cues can interfere with effective can visualize the relationships between observations by visualization by obstructing data or distorting height and computing the similarity matrix (or distance matrix) and length estimates. Thus, while 3D graphics may be appro- displaying the result as a heat map. Reordering the simi- priate in some cases (e.g., portraying 3D objects such as larity matrix (Friendly, 2002) can help to reveal patterns neuroanatomical structures), 3(+)-dimensional data can or clusters; in this case we can visually identify two differ- typically be visualized more effectively on the 2D plane. ent clusters with three and two observations, respectively. As shown in Figure 4, using additional encodings (such as The similarity matrix can be used to produce a theoretical color) or selecting chart types suitable for higher-dimensional network graph, where each observation is represented by data can easily supplant the need for 3D graphics. a node (circle) and relationships between them are rep- Recommendations against plotting in a 3D space are resented by edges (lines). Such visualizations have be- even stronger when visualizations include a pseudo-third come ubiquitous in the study of structural and functional dimension or “3D-effect” to create a sense of graphical brain connectivity, where methods adapted from graph sophistication, a prime example of “” (Wainer, theory are used to describe and analyze brain networks 1984; Tufte, 2001). Examples include 3D-effect bar charts, (Bullmore and Sporns, 2009). Alternatively, the original where each bar is portrayed as a column, or the highly crit- data can be subjected to hierarchical clustering and dis- icized 3D pie chart. In these cases the empty third dimen- played as a dendrogram, where the height of each branch sion is completely gratuitous and simply degrades decod- indicates the difference between observations. Finally, we ing accuracy. Although data visualization experts have could also apply dimension reduction techniques to the advised against such designs for nearly 70 years (Haemer, original dataset, producing an accessible visualization in 1947b, 1951), such graphics unfortunately persist. two (or three) dimensions. The classical linear data re- duction approach is principle component analysis (PCA), 4. Symbols and Colors where data are projected onto the dimensions that cap- ture the greatest variance. Nonlinear dimension reduction 4.1. Using Symbols approaches, such as Sammon mapping (Sammon, 1969) When it comes to plotting data, not all symbols are or t-Distributed Stochastic Neighbor Embedding (t-SNE) created equal. Effective symbols minimize data occlusion (Van der Maaten and Hinton, 2008), determine a mapping and provide intuitive visual contrast between different cat- to a low-dimensional space by optimally preserving rela- egories (Krzywinski and Wong, 2013). tionships between data present in the high-dimensional For plots with a single category of data, Cleveland space. T-SNE, in particular, has been found to be very (1994) recommends using the open (i.e., unfilled) circle as well-suited for visualizing high-dimensional data. a plotting symbol. Open symbols are preferred to filled While multivariate visualizations can be initially over- symbols because they retain their distinctness in instances whelming and may take more time to interpret, we believe of overlap. Additionally, circles are unique in that their they are under-utilized both for the purposes of data ex- intersection does not form another circle — this is not ploration and presentation. Often, multivariate datasets the case with squares (which can form additional squares are unfortunately portrayed with a series of univariate vi- when intersecting), triangles, crosses, or other symbols. sualizations. This reduces the ability to systematically One downside of the open circle (or any open symbol) study relationships between variables or find outliers in is that it does not contain ink at the value being plot- the multivariate space. As a simple example, consider the ted. Thus, the eye may be drawn to the periphery of the scatter plot in Panel E. The plot reveals not only a nonlin- shape, distorting judgements of length or position. One ear relationship between variables, but also an outlier that can mitigate this concern by using small symbols, or, in is only detectable in the 2D space, i.e., the point would cases of low or no overlap, by reverting to the filled circle not be seen as an anomaly in any portrayal of univari- (e.g., see Fig. 9). ate distributions. Likewise, outliers in an n-dimensional For plots with multiple categories, choose symbols with space may only be visible in visualizations that portray or high contrast that are easy to distinguish. Figure 5 com- incorporate multiple dimensions simultaneously. pares combinations of plotting symbols for two groups. As 3.4. 3D visualizations seen in Panel A, using open circles and open squares (or any other pair of open polygons) provides relatively poor Note that regardless of dataset dimensionality, none contrast — the eye must actively seek out different cat- of the chart types displayed in Figure 3 use a 3D visu- egories. Discrimination is improved by changing symbol alization. This is not by accident. Portraying data in shape (from squares to crosses or filled circles) or symbol a 3D space requires additional complexity, often at the saturation, as shown in panels B-F. Where possible, we cost of clarity. In order to interpret 3D representations, Allen and Erhardt Visualizing Scientific Data Page 9 of 21

LINE PLOT A 3D BAR PLOT HEAT MAP SMALL MULTIPLES WITH SYMBOL ENCODING Reaction time (s) 5 4 aa Aa AA mean ± AA 4 1 SEM 1.5 2.0 2.5 3.0 3.5 aa 4

3 3 2 aa Aa

2

Reaction time (s) Aa 0 2 Reaction time (s) Reaction time (s) aa Genotype 1 AA 4 GenotypeAa 3 1 0 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 AA 2 1 Task difficulty Task difficulty Task difficulty Task difficulty

SCATTER PLOT PARALLEL B 3D SCATTER PLOT SCATTER PLOT MATRIX WITH COLOR ENCODING COORDINATE PLOT 140 PWV (m/s) Age (years)

120 15 40 65 90 4.5 6.5 8.5 10.5 12.5 140 100 100 140 14 13

MAP (mmHg) 80 120 10 12 100 7 80 9 60 80 Pulse wave velocity (m/s) 4 40 PWV (m/s) 6 10 60 3 140 Mean arterial pressure (mmHg) 120 20 100 Age (years) Mean arterial pressure80 (mmHg) 60 20 40 60 80 80 100 120 140 20 40 60 80 Age MAP PWV (years) (mmHg) (m/s) Age (years) MAP (mmHg) Age (years)

Figure 4: Datasets with 3(+) dimensions need not be visualized in a 3D space. Panels A and B show examples of 3D visualizations (left) as well as alternative visualizations that portray the same data on the 2D plane. Both datasets are synthetic.

recommend changing symbol color rather than saturation, as color is recognized as the most effective discriminator (Lewandowsky and Spence, 1989); see also Section 4.2 and Figure 6. When varying symbol attributes, be mindful of the effects on salience. For example, in Panels C, D, CHANGE SYMBOL SHAPE and E, one category is more salient than the other. If A B C categories are intended to have equal prominence, we pre- fer panel B (which uses very distinct shapes with similar salience) or panel F (which balances the ink used by each category). When plotting categories that have a natural hierar- chy or grouping, choose parallel symbols that reflect this D E F organization (Krzywinski and Wong, 2013). The degree of visual contrast between categories should mirror the in- tended analytic contrast. For example, consider a study comparing subjects with different diagnoses where data SATURATION CHANGE SYMBOL has been collected at multiple research sites around the nation. You are primarily interested in differences with respect to diagnosis, but also want to consider the influ- Figure 5: Increase the visual distinction between symbols by chang- ence of collection site. One approach would be to use ing symbol shape or saturation. When available, color can also be different colors to encode diagnosis (since color will pro- a powerful discriminator. vide the greatest visual distinction between categories) and different symbols to encode study site. With numerous categories to distinguish, visual dis- tinctness may be futile regardless of how strategically sym- Allen and Erhardt Visualizing Scientific Data Page 10 of 21 bols have been chosen. The data-density and degree of may become impossible to distinguish, though see below overlap may also present a challenge. As mentioned in for strategies to avoid this problem. Section 3.1, a good strategy is to use small multiples and Given the disadvantages of color, we recommend try- present each category (or subset of related categories) in ing to use other designs or encodings first. If you do decide a separate panel. to use color, make sure that the benefits to your visual- ization outweigh any costs. Figure 6 shows two examples 4.2. Using Color of incorporating color. Panel A uses color to encode a The trichromatic visual system of humans allows most categorical variable (which we refer to as qualitative color individuals to distinguish millions of different colors, cre- encoding), and Panel B applies color to a continuous vari- ating a rich sensory experience of the visual world. Incor- able (quantitative color mapping). Specific considerations porating color into scientific visualizations can increase for each type of encoding are discussed below. In Panel A, expressiveness and heighten impact, however there are a color-coded categories are easier to discriminate, despite number of disadvantages introduced by color that should our efforts to maximize contrast between grayscale sym- be addressed. bols with shape and saturation. Because we have double First, color is a relative (rather than an absolute) medium encoded (with symbols and colors), the graphic will be (Wong, 2010a). Human color perception depends on con- accessible to color blind individuals as well as viewers of trast with neighboring colors, as is often illustrated with grayscale reproductions. In Panel B, color has a profound beguiling optical illusions (Albers, 1975). This depen- impact. Using only grayscale, the salience of negative and dence can create unfortunate interactions between color positive values is unbalanced as our attention is drawn to- and spatial location, biasing our judgments. As demon- ward darker regions. Color restores balance because nega- strated by Cleveland and McGill (1983), color also inter- tive and positive values can be represented with equivalent acts with other visual features such as area (brighter colors saturation. Additionally, color encoding allows zero to be can make areas appear larger than darker colors). mapped to visually neutral white (rather than ambiguous A second concern in using color is color blindness, medium-gray), facilitating pattern detection. The chosen which affects seven to ten percent of males. Common blue-to-red colormap avoids red/green contrast and is ro- forms of color blindness involve deficiencies in red- or bust to a variety of color vision impairments. green-sensitive cone cells, thus red/green contrast (or any contrast that involves similar levels of red and green, such Qualitative Color Encoding as orange/light-green) should be avoided. To ensure that Effective qualitative color encoding requires adept choice your graphic is accessible to the widest possible audience, of easily-differentiated colors. One strategy for doing so, use colorblind-friendly palettes (colorbrewer2.org). Ad- suggested by artist and color theorist Albert Munsell, is ditionally, graphical editing software often includes tools to determine a “harmonic” palette by selecting equally- to simulate red-green color blindess, allowing you to make spaced points along a perceptually uniform color space adjustments to color while you work. Similar web-based (Munsell, 1947). Similar recommendations are made more simulators, such as Coblis (color-blindness.com) or Vis- recently by (Brewer, 1994). The col- check (vischeck.com), invite you to upload your visual- ors resulting from this approach have distinct hues with ization and check how it appears to individuals with com- comparable saturation and brightness, providing a palette mon color vision impairments. that is unlikely to produce attentional bias. We adopt Thirdly, when working with color you must consider this strategy in Figure 6A and use a 4-color qualitative the different representations of your graphic in digital and palette provided by Brewer’s online resource, ColorBrewer printed media. Digital media (e.g., computer screens, pro- (colorbrewer2.org). A second strategy, proposed by jectors, etc.) operate in the additive RGB (red, green, Bang Wong, is to select colors by spiraling through hue blue) color space: different intensities of R, G, and B light and saturation while varying brightness (see Wong, 2010a, are emitted to produce different colors. Print media re- Fig. 2). The resulting colors have distinct hues and bright- lies on the subtractive CMYK (cyan, magenta, yellow, ness, thus are discernible even in grayscale reproductions. black) color space: quantities of C, M, and Y pigments As with symbol encoding, if your dataset has so many absorb wavelengths to produce different colors via reflec- categories that a palette with easily-differentiated colors tion. RGB has a wider color gamut than CMYK, meaning is unobtainable, consider small multiples. that not all colors visible on the screen can be reproduced in print. The difference in gamut is most noticeable for Quantitative Color Mapping vibrant colors. You may find your printed graphic appear- Quantitative color mapping, that is, the mapping of ing dull or muted with a loss of discriminability around a numerical variable to a continuous progression of color, green and violet hues. Issues surrounding conversion to is considered controversial by several data visualization print media are more severe when color printing is not an experts. Tufte (2001) writes that “the mind’s eye does option and images are converted to grayscale. Categories not readily give a visual ordering to colors,” and Kosslyn or features that were once easily differentiated by color (1985) emphatically states that “differences in quantities Allen and Erhardt Visualizing Scientific Data Page 11 of 21

should not be represented by differences in color . . . shift- ing from red to green does not result in ‘more of some- thing’ in the same way as shifting from a small dot to a large one does.” We agree that the relationship between quantitative value and color is arbitrary (and must be clearly defined), but feel that strategically chosen map- pings can compensate for the lack of an absolute color order. Consider four common colormaps displayed in Fig- ure 7, as well as the trajectories these maps take through the hue-saturation-value (HSV) color space. (Note that HSV is equivalent to RGB in terms of gamut and simply uses a different set of axes to navigate the same space.) A QUALITATIVE COLOR ENCODING HSV is portrayed as a cylinder, where angle around the 100 100 task 1 task 1 vertical axis corresponds to hue (the color), radial dis- task 2 task 2 90 tance from the vertical axis corresponds to increasing sat- 90 task 3 task 3 task 4 task 4 uration (color intensity), and height along the vertical 80 80 axis corresponds to value (color brightness). We find that the trajectories taken by the colormaps (Paths 1–4) sug- 70 70 gest natural and specific mappings to different data types.

Predictied score (%) 60 60 For example, Path 1 circumnavigates the upper rim of the cylinder and represents a circular progression through 50 50 50 60 70 80 90 100 50 60 70 80 90 100 pure hues, thus is appropriate for cyclical data such as True score (%) True score (%) phase angle or time of day. In contrast, Paths 2 and 3 in- tersect the vertical axis representing neutral colors (white B QUANTITATIVE COLOR MAPPING at Point B; gray at Point E), thus are suitable for data Correlation Correlation that diverge from an origin (commonly, zero). These “di- −1 −0.5 0 0.5 1 −1 −0.5 0 0.5 1 vergent” or “bipolar” colormaps are consistent with our A A B B C C perceptual expectations that the distance from the origin D D E E should correspond to “more of something” — more color F F G G H H intensity in Path 2 and more brightness in Path 3. I I J J Thus, the choice or design of a colormap must consider K K Region L L M M the nature of the data and how this can be aligned with N N O O perceptual expectations. If the data contain a discontinu- P P Q Q R R ity or abrupt change point, construct a colormap with a S S T T similar perceptual discontinuity. Rely primarily on satu- A B C D E F G H I J K L MN O P Q R S T A B C D E F G H I J K L MN O P Q R S T Region Region ration or brightness (rather than hue) to reflect changes in quantitative magnitude, since these dimensions do have Figure 6: Examples of incorporating color for qualitative encoding a natural order. Map neutral grayscale colors to zero or of different categories (A) and quantitative mapping of a continuous other natural baselines. Most critically, include a color variable (B). In A, the addition of color facilitates group discrimina- axis (i.e., a colorbar) defining your color mapping (Allen tion. Colors were chosen based on a 4-color qualitative palette pro- vided by ColorBrewer (colorbrewer2.org). In B, colors facilitate et al., 2012). Analogous to any x- or y-axis, the color pattern detection and balance the salience of positive and negative axis should be properly labeled and include sufficient tick values. marks to help viewers decode the map. See Figure 6 for an example. In Figure 7 we also draw attention to the fact that although the colormaps progress smoothly through HSV space, the color transitions we perceive are non-uniform. For example, in Path 1, a pair of points represent colors perceived as nearly indistinguishable green hues, while a second pair of equidistant points are perceived as very distinct hues (yellow and orange). Similar discrepancies can be seen in the other paths. In general, a percep- tually smooth colormap must be carefully designed and calibrated. Despite numerous limitations, quantitative color map- ping is and will likely remain a preferred choice for com- Allen and Erhardt Visualizing Scientific Data Page 12 of 21

Saturation HUE-SATURATION-VALUE COLOR SPACE C B SLICE AT A SINGLE HUE Hue

C Value I A

G TRAJECTORIES OF COMMON COLORMAPS Saturation B

Path 1 D F G C D F

F E

Value Path 2 H D B F

A J Path 3 C D E F G

Path 4 H D C I G F J

Figure 7: A depiction of the hue-saturation-value (HSV) color space. Four different paths, described by points A–J, denote the trajectories that common colormaps take through HSV space. On each colormap, colors highlighted in circles illustrate that equidistant shifts along the trajectory are not commensurate with our perceived shifts in color.

pact visualizations of large datasets. Maintain awareness One should also consider the axes aspect ratio and of these limitations and expect that mappings may intro- scale the width and height to maximize visual discrimina- duce biases or distortions. If the goal of your visualization tion of trends and change points. Figure 8 displays two requires highly accurate judgements, either avoid color different datasets at three different aspect ratios. It is ob- mapping altogether or provide additional designs (e.g., vious from these plots that the accessibility of patterns small multiples of relevant “slices” through an image) that is strongly impacted by aspect ratio. Cleveland, McGill, clarify the data through more accurate encodings. and McGill (1988) recommend that axes should be scaled such that the mean absolute angle of line segments is 45 degrees, since deviations of slopes in this range will be eas- 5. Supporting Details iest to detect. Although the “bank at 45 degrees” guide- The supporting details of graph construction include line has received some recent criticism and may not be the use of axes, grids and guides, and verbal and graphical optimal in all cases (Talbot, Gerth, and Hanrahan, 2012), annotations. These details may seem minor, but play a we feel it is a valuable starting point. profound role in clarifying information and guiding atten- Additional recommendations for axes apply to specific tion. chart types or data types.When working with small multi- ples, determine whether comparisons between axes should 5.1. Axes be based on relative or absolute differences. If absolute differences are most relevant, use uniform scaling across Axes are the coordinate system reference around data, the axes and label a single y- (or x-) axis to emphasize orienting viewers to the data’s extent, scale, and precision. that the scale is fixed (Krzywinski, 2013a). See Figure 4A Axes should extend slightly beyond the extremes of the (rightmost panel) for an example. When working with data to avoid concealing symbols or lines, as well as to categorical (in particular, nominal) variables, special at- provide a visual frame for the viewer’s attention (Wainer, tention should be paid to the order of categories. Fein- 2008). If extreme values dramatically compress the range berg and Wainer (2011) and many others suggest that of the majority of your data, consider using axis breaks, categories should be ordered based on the data (e.g., by scale transformations, or trimming outlying values, and frequency or median value) and that one should avoid al- notate these elements or omissions on the plot (Krzywin- phabetical or other arbitrary orders. As seen in Figure 9, ski, 2013a). a data-driven ordering can greatly enhance one’s ability Allen and Erhardt Visualizing Scientific Data Page 13 of 21

to detect trends and identify categories with specific prop- erties. Note too, that categorical axes often benefit from vertical orientation. Though this may break with the con- vention of placing dependent variables along the y-axis, it allows for categories to be labeled with horizontal text, eliminating the need for viewers to rotate their heads 90 degrees. A ASPECT RATIO 2 : 1 ASPECT RATIO 1 : 1 130 130 5.2. Grids and Guides Axis ticks, grids, and guides comprise the “naviga- tional elements” that help viewers determine where data 125 125 lie in the (x, y) plane and in relation to other important quantities (Heer and Bostock, 2010; Krzywinski, 2013a). Rather than falling victim to software defaults, manually Pressure (mmHg) Pressure (mmHg) 120 120 set tick mark spacing and grid density in a manner that is consistent with the data. For example, if your data 0 2 4 6 0 2 4 6 are discrete and take on only integer values, remove ir- Time (seconds) Time (seconds) relevant ticks or grid lines at non-integers (e.g., see Fig- ASPECT RATIO 1 : 5 ure 11). When appropriate, tick marks should be spaced 130 by base-10 conventions in multiples of 1, 2, 5, or 10, since 125 labels will be more accessible to readers (Wainer, 2008; 120 Wallgren et al., 1996). Choose tick mark and grid density

Pressure (mmHg) 0 2 4 6 carefully: a dense grid suggests high data precision and Time (seconds) that minor differences are important (Krzywinski, 2013a). Furthermore, very dense grids can actually lead to greater B ASPECT RATIO 2 : 1 ASPECT RATIO 1 : 1 judgement errors (Heer and Bostock, 2010) or give rise to 300 300 moir´eeffects (i.e., the illusion of movement or vibration) (Tufte, 2001). In all cases, ensure that the data retains 250 250 prominence over the grid. This can be achieved by adjust- ing grid line width, color, and opacity, or, for bar graphs, 200 200 by “gridding” only the bar rather than the entire plot 150 150 region (see Tufte, 2001, p. 128). Guides refer to lines or other geoms that provide a Reaction time SD (ms) Reaction time SD Reaction time SD (ms) 100 100 specific reference. Common examples include a point or

0 5 10 0 5 10 line at the value of a parameter under the null hypothesis, Error rate (%) Error rate (%) or a y = x line for scatter plots comparing pre- and post- ASPECT RATIO 1 : 5 measurements or true and predicted values (e.g., Fig. 6A). 300 As usual, emphasize the data over the guide. 200 5.3. Annotation 100 0 2 4 6 8 10 The majority of verbal annotation in a figure is ded- Reaction time SD (ms) Error rate (%) icated to the labeling and defining of graphical objects. In all cases, labels should be explicit and complete. For Figure 8: Choose an aspect ratio for the axes that maximizes visual example, an axis label of “Time (s)” is considerably more discrimination. Panels A and B show two different sets of simulated ambiguous than “Time since stimulus onset (s)” (Wainer, data plotted at different aspect ratios. In A, an aspect ratio of 1:5 2008). Labels defining symbol encodings should also be (bottom) yields curve segments with slopes close to 45◦ and reveals included on the graphic, potentially in a legend or key as detail in the oscillatory pattern that are not apparent at higher aspect ratios. In B, an aspect ratio of 1:1 (top right) yields line shown in Figure 6A. Place legends as close as possible to segments with slopes close to 45◦, making the change point clearly the plotting symbols they define. As mentioned in Sec- visible. Loosely adapted from (Cleveland, 1994). tion 2.2, the separation between labels and the objects they are labeling can place unnecessary demand on visual short-term memory, forcing viewers to toggle back and forth between the data and the legend to recall encod- ings. When possible, omit a legend and label shapes or lines directly, as in Figure 4A (Schmid, 1983). In the case of crossing lines, ambiguity can be reduced by providing labels at both ends (Wainer, 2008). Allen and Erhardt Visualizing Scientific Data Page 14 of 21

ORDERED ALPHABETICALLY ORDERED BY MAGNITUDE

Angular gyrus L Angular gyrus R Angular gyrus R Angular gyrus L Anterior cingulate Medial frontal Inferior frontal L Superior frontal Inferior frontal R Anterior cingulate Inferior frontal R Middle temporal R Inferior temporal L Inferior temporal L Medial frontal Middle temporal R Medulla L Parahippocampal L Medulla R Parahippocampal R Midbrain Semilunar lobule L Middle temporal R Medulla L Middle temporal R Pyramis Parahippocampal L Medulla R Parahippocampal R Inferior frontal R Pons Inferior frontal L Pyramis Inferior frontal R Semilunar lobule L Semilunar lobule R Semilunar lobule R Superior temporal L Superior frontal Midbrain Superior temporal L Pons

0 10 20 30 0 10 20 30 Functional connectivity to Functional connectivity to posterior cingulate (t−statistic) posterior cingulate (t−statistic)

Figure 9: Order nominal variables based on the data. Ordering alphabetically (left) or by other arbitrary criteria yields data patterns that are difficult to parse. Ordering based on statistical magnitude (right) immediately reveals regions with high and low functional connectivity to the posterior cingulate. Graphic is based on data reported in Tomasi and Volkow (2011).

When error bars or other portrayals of uncertainty are to prevent excessive “storytelling” or elaboration. If you included, define the type of uncertainty that is portrayed find yourself limited by journal conventions, include sta- (e.g., see Fig. 10A). Such information is typically included tistical descriptions and definitions in the figure caption in the figure caption, however we recommend it appear to maintain proximity to the graphic. directly on the graph to avoid the possibility of misinter- A type of non-verbal annotation that deserves special pretation (Cumming and Finch, 2005; Vaux, 2004). mention is the arrow, which is best used as a “visual verb” Another important use of annotation is the integra- to describe processes and relationships (Wong, 2011a). In tion of statistical or numerical descriptions. When scien- this chapter, arrows are used in different scenarios to in- tific graphs are intended to address a specific question or dicate continuation (Fig. 3), interchangeability (Fig. 4), hypothesis visually, there is often a corresponding statis- and process flow (Fig. 11). Because arrows can represent tical model that addresses the same question numerically. so many different functions and actions, ensure that your Placing relevant statistics alongside their graphical coun- intended meaning is clear from context. terparts can improve understanding of both the graphic and the model and remove the burden on readers to di- 5.4. Font Choice vide their attention between the figure and explanatory The clarity and impact of verbal annotation is affected text elsewhere (Lane and S´andor,2009). Commonly inte- not only by what you say, but also how you say it. The grated statistics include correlation coefficients, regression choice of typeface (more commonly known as font family, coefficients and p-values, however annotations can also e.g., Helvetica) and font style (bold, italic, etc.) should include instructive explanations of the hypothesis being be made based on the purpose of the text as well as tested. Figure 10 provides two examples where relatively its desired tone. Typefaces are broadly classified into elaborate statistical annotation has been integrated into two groups depending on the presence or absence of ”ser- the visualization. In both cases, annotation is limited to ifs” (i.e., the small projections attached to the end of description of the statistical model(s) and key parameters strokes). Serif typefaces tend to be easier to read in that address the hypothesis. One should keep the text printed multi-line blocks. Sans-serif (without serifs) fonts concise and avoid indiscriminately reporting all informa- are simpler forms which can decrease time for word recog- tion that software may output. nition (Moret-Tatay and Perea, 2011), thus are suited for Note that while we and others advocate for the ad- smaller chunks of text such as labels and headings. For dition of descriptive numerical and verbal annotation in both classes, Gestalt principles dictate that labels of a figures (e.g., Tufte, 2001; Lane and S´andor,2009), this given type (axis tick labels, line labels, etc.) be of a single practice is actively discouraged by some journals, perhaps typeface and style. Using multiple typefaces or different Allen and Erhardt Visualizing Scientific Data Page 15 of 21

font styles within a typeface can be an effective approach to create contrast or establish a visual hierarchy, however one should be wary of overwhelming the viewer with in- consistencies. Additionally, consider the context of your message and find a typeface and font style with a comple- mentary mood. For example, throughout this chapter we have used two different typefaces: to label data, axes, and tick marks we use the freely available sans-serif typeface “Open Sans”, which is easy to read and has a clean and unassuming appearance; for more expository labels, com- ments, and questions, we use a contrasting handwritten typeface “Architect’s Daughter”, which evokes a friendly and less austere tone. A How do the groups differ across conditions?

Group 1 5.5. Numerical Precision Group 2 When reporting numbers in figures (or tables), a good 5 strategy is to round liberally, retaining the fewest sensi- ble significant digits. There are several reasons for this: 4 mean, 95% confidence interval (1) humans don’t comprehend more than three digits very easily, (2) it’s very rare that we care about more than three repeated measures ANOVA: F (1,18) = 16.0, p < 0.001 digits of absolute accuracy (when using scientific nota- 3 C

Pupil diameter (mm) F (1,18) = 0.3, p = 0.6 G tion), and (3) it’s even rarer that we can justify more than F (1,18) = 12.7, p = 0.002 C × G three digits of statistical precision (Feinberg and Wainer, 2 neutral angry 2011). As an example, consider the statistical precision Facial expression condition of the Pearson correlation coefficient. While the default reporting of correlation by some software is four digits, B Are the slopes different between groups? achieving a standard error less than 0.00005 requires a 100 Group 1 sample size greater than 400 million! Justifying more than Group 2 one digit still requires a sample size at least 400, a level

95 Model 1: met by relatively few research studies. (To perform your Y = β + β G + β X + β G × X own calculations, consider that the standard error of r can 0 1 2 3 √ 90 β3 = 0.14, t(76) = 0.6, p = 0.6 be as large as tanh(1/ n − 3), where n is the sample size R2 = 0.64 (Feinberg and Wainer, 2011).) Similar considerations of

Accuracy (%) significant digits should be made when labeling axis ticks. 85 Model 2 (shown): For example, avoid “1.00”, “2.00”, etc., if “1” and “2” will Y = β0 + β1G + β2X 2 80 R = 0.64 suffice, and use scientific notation in axes labels to keep tick labels concise (e.g., replace “Time (ms)” with “Time −10 −5 0 ERN amplitude (µV) (s)” and change tick labels from “1000”, “2000”, etc. to “1” and “2”). Figure 10: Integrate statistical descriptions related to the hypoth- esis of interest. In each panel, the graphic is designed to address a specific question or hypothesis. Incorporating corresponding statis- 6. Putting it into Practice tics can enhance interpretation of both the graphic and the model. Both datasets are synthetic. Previous sections have focused on the theory of graph- ical perception and numerous guidelines for effective visu- alizations. In this section we address the practical imple- mentation of this information, from tools of the trade to a straightforward checklist with which to assess visualiza- tions.

6.1. Tools Paper and Pencil While some might consider a paper and pencil to be crude or out-dated in the age of tablets and touch-screens, there are few tools that can match the same immediacy of expression and ease of use (Wong and Kjærgaard, 2012). A quick doodle allows us to rapidly specify what we do Allen and Erhardt Visualizing Scientific Data Page 16 of 21 and do not know, explore design ideas, and exchange con- Our first example focuses on the full process of creat- cepts with our colleagues. Additionally, drawing can help ing a visualization. We use a simple set of synthetic data to assess understanding and gain insights into data or de- comprising subject reaction times collected over a range signs: just as you may discover the limits of your knowl- of task difficulty levels. As illustrated in Figure 11, we be- edge when you attempt to teach a concept, you may re- gin by thinking about the data and the visualization goals. alize those same limits when you try to draw it. Finally, (If ever in doubt about where to start with data visualiza- doodling focuses your thinking on the content of your cre- tion, consider the phrase “think before you ink”.) In Step ation, rather than on the “how” of creating it. 1, we establish the sample size, the data dimensions, and the type of each variable. Additionally, we formulate a hy- Plotting Software pothesis relating mean reaction time to subject genotype. Data visualization is often performed with the same Our ability to visually address this hypothesis guides our tool used for processing and statistical analysis (e.g., MAT- design choices. In Step 2, we doodle. Putting pen to pa- LAB, R, Python, SAS, SPSS, Microsoft Excel, etc.), though per allows for experimentation with different chart types, data and summaries can typically be exported to be plot- layouts, and encodings. Doodles need not be accurate — ted with other software. When choosing among tools, con- their purpose is to explore designs rather than the data sider flexibility and accessibility. Nearly all software will itself. We select one of these designs in Step 3. In this produce ubiquitous chart types such as bar and line plots, case, a heat map is a less optimal choice because it is dif- however custom encodings of visual attributes and geoms ficult to incorporate uncertainty in mean reaction times. may be difficult or impossible to achieve with some pack- Small multiples provide a very good choice, as they per- ages. You may also consider the community of users for mit portrayal of the data at both group and individual different programming languages and the repositories of levels. Ultimately, we choose a line plot with symbol en- sophisticated visualization methods that are readily avail- coding because the proximity of the lines to each other able. For example, the Grammar of Graphics (Wilkinson, encourages the eye to make comparisons between groups 2005) has been implemented in libraries for R and Python (following design Principle 1). (ggplot, Wickham (2009)), and ColorBrewer is accessible In Step 4 we finally “ink” and create our first version within R, Python, and MATLAB. There are also special- of the visualization using MATLAB, with the majority of ized software packages for analyzing and visualizing par- figure properties left at their default values. Step 5 offers ticular data types, such as FreeSurfer (freesurfer.net) a refinement of the visualization, following design princi- for cortical surfaces, TrackVis (trackvis.org) for fiber ples provided in Section 2.3. The bulk of revisions are tract data, and Gephi (gephi.org) for networks. For any implemented in MATLAB, with small adjustments added plotting software, developing a high level of proficiency in Adobe Illustrator. Depending on the results of Step will help you be less restricted in implementing and revis- 4 or 5, one may reconsider the design choice or its im- ing your design. plementation. As noted by scientist and illustrator Mar- tin Krzywinski, “a good figure, like good writing, doesn’t Graphical Editing Software simply happen — it is crafted.” Similarly, the custom- Assembling a multi-panel figure or fine-tuning a single ary advice to “revise and rewrite” becomes “revise and plot may require the use of graphical editing software that redraw” (Krzywinski, 2013b). Typically, one’s initial idea provides the ability to adjust colors and transparency, re- for a visualization will be immature and unclear. Use it as size objects, add annotation, etc. Because data visualiza- a starting point, then re-doodle (literally go back to the tions are almost entirely composed of points, lines, curves, drawing board) and re-design as necessary to clarify your and shapes, they are best edited in vector-based software message. Once you are satisfied with the visualization, it programs such as Adobe Illustrator, Inkscape, or Corel- is important to consider how it integrates into a multi- DRAW. These programs allow you to manipulate graph- panel figure or larger body of work such as a manuscript ical objects in their native format, avoiding distortion or or presentation (Step 6). For example, if additional pan- degradation introduced by “rasterizing”. For graphical els or figures also compare genotypes, the same symbols elements such as photos or pictures that originate as pix- should be used consistently. Follow Gestalt principles and elated images, editing can be performed with raster-based the recommendations in Section 5 to maintain uniformity software such as Adobe Photoshop, GIMP, or ImageMag- of visual styles and typefaces such that your viewers can ick. Regardless of the software choice, we again recom- focus on data variation, rather than design variation. mend investing the time to master it. Graphical editing Our second example focuses on modifying visualiza- should empower you to explore and create, rather than tions of electroencephalographic (EEG) and functional mag- leave you feeling frustrated. netic resonance imaging (fMRI) datasets which are com- monly seen in psychophysiology and neuroscience research. 6.2. Examples Figure 12A presents data from an EEG visual flanker Here we provide two concrete examples of data visu- task. Subjects were asked to indicate the direction of a alization, revisiting and integrating principles introduced visual target which appeared shortly after the presenta- earlier. tion of flanking distractors. Multi-channel EEG record- Allen and Erhardt Visualizing Scientific Data Page 17 of 21

STEP 1 : THINK STEP 2 : DOODLE STEP 3 : CHOOSE A DESIGN

THINK BEFORE YOU INK A line plot with DATA CHARACTERISTICS W symbol encoding Sample size: 25 subjects A makes comparisons R

Dimensions: 3 D between genotypes • Genotype (nominal, independent) E easiest.

R

• Task difficulty (ordinal, independent)

&

• Reaction time (continuous, dependent)

E

S

I HYPOTHESIS V

Mean reaction time profiles over task E

difficulty depend on subject genotype. R

STEP 6 : INTEGRATE STEP 5 : REFINE STEP 4 : INK

4.5 4 mean ± AA 4 1 SEM Let the data dominate. aa Remove extraneous grid 3.5 3 lines and labels. Annotate Aa data directly, rather than 3 with a legend. Adjust axes to frame the data. Jitter 2.5 data as necessary to avoid 2 2 Reaction time (s) occlusion. Define error Reaction time (s) bars on the graphic. aa Integrate into a larger multi-panel figure 1.5 Aa or body of work using Gestalt principles. AA 1 1 Similar content should have similar visual 1 2 3 4 1 1.5 2 2.5 3 3.5 4 styles. Consider grouping and alignment. Task difficulty Task difficulty

Figure 11: An of the data visualization process.

ings were decomposed using independent component anal- asked to respond to target tones presented within a series ysis and a single component best matching the expected of standard tones and novel sounds. Blood oxygenation fronto-central topography for a performance monitoring level-dependent (BOLD) time series at each brain voxel process was selected for further analysis (Eichele, Juvod- were regressed onto activation models for the target, novel, den, Ullsperger, and Eichele, 2010). Here, we ask how the and standard stimuli (Kiehl, Laurens, Duty, Forster, and extracted event-related potential (ERP) differs according Liddle, 2001). Here, we ask which brain regions might to the subject’s response (i.e., correct or error). Panel A be involved in the novelty processing of auditory stimuli provides a common portrayal of such results, with the and compare beta parameters between novel and stan- mean ERP displayed for each condition. While this vi- dard conditions. Panel B presents voxelwise differences sualization clearly contrasts the mean ERPs, it does not between beta coefficients using a widely reproduced de- provide sufficient information to determine whether they sign: functional-imaging results are thresholded based on differ. In the modified display (Panel A’), we incorporate statistical significance and overlaid on a high-resolution 95% confidence bands around the average ERPs. The structural image. This design provides excellent spatial bands are made slightly transparent to highlight overlap localization for functional effects, but is not without prob- between conditions and to maintain the visual prominence lems. It does not portray uncertainty and has a remark- of the means. Confidence intervals clarify that there is ably low data-ink ratio due to the prominent (non-data) greater uncertainty in the error response than the correct structural image and sparsity of actual data (Habeck and response (since subjects make few errors), and that there Moeller, 2011). Moreover, the design hides results that is insufficient evidence to conclude a response difference do not pass a somewhat arbitrary statistical threshold. A after ∼800 ms. We also add several annotations to the rich and complex dataset is reduced to little more than graphic. These define the type of uncertainty that is por- a dichotomous representation (i.e., “significant or not?”) trayed, specify our null and alternative hypotheses as well that suffers from all the limitations of hypothesis testing as the alpha level chosen to determine statistical signif- (Harlow, Mulaik, and Steiger, 2013). icance, indicate the results of the significance tests, and Rather than threshold results, we suggest the dual- establish that the timeline is relative to the presentation of coding approach to represent uncertainty proposed by Hengl the target stimulus. The resulting design provides view- (2003). As shown in Panel B’, differences in beta esti- ers with more information about the experiment, data, mates are mapped to color hue, and associated paired t- analysis, and results. statistics (providing a measure of uncertainty) are mapped Figure 12B portrays results from an auditory odd- to color transparency. Note that for dual-coding, the ball event-related fMRI experiment. Participants were hue colormap should be restricted to fully saturated col- Allen and Erhardt Visualizing Scientific Data Page 18 of 21

COMMON VISUALIZATION MODIFIED VISUALIZATION

flankers target A 4 Error A’ 6 mean, 95% CI 3 Error 4 m V) 2 m V) 2 1 Correct Correct 0 0 −1 −2 −2 H : μ = μ

Average IC potential ( 0 E C Average IC potential ( −4 −3 Ha: μE ≠ μC : p < 0.001 −4 −6 * * −500 0 500 1000 −500 −200 0 500 1000 Time (ms) Time (ms)

B Novel − Standard B’ Novel − Standard

1

1 2

n = 28 subjects

1.5 2 Z = 22 Z = 22 Z = 2 Z = 2 1 0.5

Novel β 0 −0.5 −0.5 0 0.5 1 1.5 Standard β L R L R

Z = −19 Z = −39 Z = −19 Z = −39 H0: μN = μS ∆β weight H : μ ≠ μ ∆β weight ≥5 a N S |t| : p < 0.001 0 −1.6 0 +1.6 −1.6 0 +1.6

Figure 12: Conventional (A, B) and modified designs (A’, B’) portraying real data. Captions describe the modified designs. (A’) EEG flanker data. Mean ERPs (averaged over 10 subjects) and 95% non-parametric CIs for Error trials (red) and Correct trials (blue). Asterisks indicate significantly different mean ERPs at p < 0.001 (nonparametric randomization test with implicit correction for multiple comparisons). (B’) FMRI auditory oddball data. Axial slices show the difference between Novel and Standard beta weights averaged over 28 subjects. Beta difference is mapped to color hue; t-statistic magnitude is mapped to transparency. Black contours denote significantly different betas at p < 0.001 (two-tailed paired t-tests, corrected with false discovery rate). Scatter plots show Standard versus novel Betas for regions of interest (ROIs) 1 and 2. Beta weights are averaged over clusters of contiguous voxels passing significance (ROI 1 = 2426 voxels; ROI 2 = 1733 voxels). Gray lines indicate y = x. Adapted from Allen et al. (2012).

ors (i.e., the outer cylindrical surface in Figure 7), since vation to novel stimuli compared to standard tones. These changes in saturation will be confounded with changes in brain areas coincide with the default-mode network, a sys- transparency. In this example we use the “jet” or “rain- tem preferentially active when subjects engage in internal bow” colormap (Path 4 in Figure Figure 7). Compared to rather than external processes (Buckner, Andrews-Hanna, Panel B, no information is lost. Transparency is sufficient and Schacter, 2008). Along with portraying uncertainty, to determine structural boundaries and statistical signif- Panel B’ also includes scatter plots displaying the Stan- icance is indicated with contours. However substantial dard and Novel beta parameters in regions of interest for information is gained. The quality of the data is now ap- individual subjects. Scatter plots allow viewers to access parent: large and consistent differences in betas are wholly the data in greater detail as they indicate the beta esti- localized to gray matter, while white matter and ventric- mates for each condition (rather than just the difference), ular regions exhibit very small or very uncertain differ- reveal the degree of variability across subjects (and the ences. In addition, isolated blobs of differential activation absence of outliers), and validate a “paired” statistical in Panel B are now seen as the peaks of larger contiguous approach since beta values covary across conditions. Ad- activations (often with bilateral homologues) that failed to ditionally, scatter plots remove dependence on color map- meet significance criteria. The modified display also re- ping and remain perfectly clear when reproduced in black veals regions in lateral parietal cortex, medial prefrontal and white. cortex, and posterior cingulate cortex with reduced acti- In summary, the modified design and use of dual cod- Allen and Erhardt Visualizing Scientific Data Page 19 of 21 ing provides substantially more information to viewers, Cumming, G. and Finch, S. Inference by eye: confidence intervals increases data accessibility, and provides clarity regarding and how to read pictures of data. American Psychologist, 60(2): hypothesis testing. To encourage the use of this approach, 170, 2005. Cumming, G., Fidler, F., and Vaux, D. L. Error bars in experimental sample Matlab scripts for hue and transparency coding are biology. The Journal of Cell Biology, 177(1):7–11, 2007. provided at mialab.mrn.org/datavis. Eichele, H., Juvodden, H. T., Ullsperger, M., and Eichele, T. Mal- adaptation of event-related eeg responses preceding performance errors. Frontiers in Human Neuroscience, 4, 2010. 6.3. A Checklist for Assessing Visualizations Ellis, W. D. A source book of Gestalt psychology, volume 2. Psy- Table 3 provides a checklist for data visualizations that chology Press, 1999. can be used before, during, and after you implement your Emerson, J. W., Green, W. A., Schloerke, B., Crowley, J., Cook, D., Hofmann, H., and Wickham, H. The generalized pairs plot. design (Wallgren et al., 1996). If you can answer “yes” to Journal of Computational and Graphical Statistics, 22(1):79–91, most of the questions, your figure is likely to be accessible 2011. and correctly interpreted by others. We hope this check- Fechner, G. Elemente Der Psychophysik. Breitkopf & H¨artel, list will serve as a useful reference for you as you create Leipzig, 1860. Feinberg, B. M. and Franklin, C. A. Social Graphics Bibliography. your own visualizations, as well as a tool to evaluate and Bureau of Social Science Research, Washington, D.C., 1975. improve the visualizations of your colleagues. Feinberg, R. A. and Wainer, H. Extracting sunbeams from cucum- bers. Journal of Computational and Graphical Statistics, 20(4): 793–810, 2011. References Few, S. Save the pies for dessert. Visual Business Intelligence Newsletter, pages 1–14, 2007. Albers, J. Interaction of Color. Yale University Press, New Haven, Friendly, M. Corrgrams: Exploratory displays for correlation matri- Connecticut, 1975. ces. The American Statistician, 56(4):316–324, 2002. Allen, E. A., Erhardt, E. B., and Calhoun, V. D. Data visualization Gehlenborg, N. and Wong, B. Points of view: Into the third dimen- in the neurosciences: Overcoming the curse of dimensionality. sion. Nature Methods, 9(9):851–851, 2012. Neuron, 74(4):603–608, 2012. Griethe, H. and Schumann, H. The visualization of uncertain data: Baird, J. C. A cognitive theory of psychophysics I. Scandinavian Methods and problems. In Proceedings of SimVis, pages 143–156, Journal of Psychology, 11(1):35–46, 1970a. 2006. Baird, J. C. A cognitive theory of psychophysics II. Scandinavian Habeck, C. and Moeller, J. R. Intrinsic functional-connectivity net- Journal of Psychology, 11(1):89–102, 1970b. works for diagnosis: Just beautiful pictures? Brain Connectivity, Belia, S., Fidler, F., Williams, J., and Cumming, G. Researchers 1(2):99–103, 2011. misunderstand confidence intervals and standard error bars. Psy- Haemer, K. W. Hold that line. The American Statistician, 1(1): chological Methods, 10(4):389, 2005. 25–25, 1947a. Brewer, C. A. Guidelines for use of the perceptual dimensions of Haemer, K. W. The perils of perspective. The American Statisti- color for mapping and visualization. In Proceedings of the Inter- cian, 1(3):19–19, 1947b. national Society for Optical Engineering (SPIE), volume 2171, Haemer, K. W. The pseudo third dimension. The American Statis- pages 54–63, San Jose, 1994. International Society for Optics and tician, 5(4):28–28, 1951. Photonics. Harlow, L. L., Mulaik, S. A., and Steiger, J. H. What if there were Buckner, R. L., Andrews-Hanna, J. R., and Schacter, D. L. The no significance tests? Psychology Press, 2013. brain’s default network. Annals of the New York Academy of Heer, J. and Bostock, M. Crowdsourcing graphical perception: Us- Sciences, 1124(1):1–38, 2008. ing mechanical turk to assess visualization design. In Proceedings Bullmore, E. and Sporns, O. Complex brain networks: Graph the- of the SIGCHI Conference on Human Factors in Computing Sys- oretical analysis of structural and functional systems. Nature tems, pages 203–212. ACM, 2010. Reviews Neuroscience, 10(3):186–198, 2009. Hengl, T. Visualisation of uncertainty using the HSI colour model: Chambers, J., Cleveland, W., Kleiner, B., and Tukey, P. Graphi- Computations with colours. In Proceedings of the 7th Interna- cal Methods for Data Analysis. Wadsworth International Group, tional Conference on GeoComputation, pages 8–17, 2003. Belmont, CA, 1983. Hintze, J. L. and Nelson, R. D. Violin plots: A box plot-density Chernoff, H. The use of faces to represent points in k-dimensional trace synergism. The American Statistician, 52(2):181–184, May space graphically. Journal of the American Statistical Associa- 1998. tion, 68(342):361–368, 1973. Hoekstra, R., Morey, R. D., Rouder, J. N., and Wagenmakers, E.-J. Cleveland, W. S. Graphs in scientific publications. The American Robust misinterpretation of confidence intervals. Psychonomic Statistician, 38(4):261–269, 1984. bulletin & review, 21(5):1157–1164, 2014. Cleveland, W. S. The Elements of Graphing Data. Hobart Press, Jarvenpaa, S. L. and Dickson, G. W. Graphics and managerial Summit, New Jersey, revised edition, 1994. decision making: Research-based guidelines. Communications of Cleveland, W. S. and McGill, R. A color-caused optical illusion on a the ACM, 31(6):764–774, 1988. statistical graph. The American Statistician, 7(2):101–105, 1983. Kampstra, P. Beanplot: A boxplot alternative for visual comparison Cleveland, W. S. and McGill, R. Graphical perception: Theory, of distributions. Journal of Statistical Software, 28(1):1–9, 2008. experimentation, and application to the development of graphical Kiehl, K. A., Laurens, K. R., Duty, T. L., Forster, B. B., and Lid- methods. Journal of the American Statistical Association, 79 dle, P. F. Neural sources involved in auditory target detection (387):531–554, 1984. and novelty processing: an event-related fmri study. Psychophys- Cleveland, W. S. and McGill, R. Graphical perception and graphical iology, 38(01):133–142, 2001. methods for analyzing scientific data. Science, 229(4716):828– Kosslyn, S. M. Graphics and human information processing: A 833, 1985. review of five books. Journal of the American Statistical Associ- Cleveland, W. S., McGill, M. E., and McGill, R. The shape param- ation, 80(391):499–512, 1985. eter of a two-variable graph. Journal of the American Statistical Krzywinski, M. Points of view: Axes, ticks and grids. Nature Meth- Association, 83(402):289–300, 1988. ods, 10(3):183–183, 2013a. Cowan, N. Metatheory of storage capacity limits. Behavioral and Krzywinski, M. Points of view: Elements of visual style. Nature Brain Sciences, 24(01):154–176, 2001. Methods, 10(5):371–371, 2013b. Krzywinski, M. and Altman, N. Points of significance: Error bars. Allen and Erhardt Visualizing Scientific Data Page 20 of 21

Before you begin Is a visualization necessary? Have you considered a table or written description? Have you clarified the goal(s) of the visualization? Are you aware of size and color constraints?

While you are working Design Is the visualization consistent with the model or hypothesis being tested? Does the visualization emphasize actual data over idealized models? Where possible, does the graphical encoding method have high decoding accuracy? Have “empty dimensions” or 3D effects been removed? Have encodings and annotations been used consistently? Axes Are axes scales defined as linear, log, or radial? Do axes limits frame the data? Is the aspect ratio appropriate for the data? Are the axes units intuitive? Uncertainty Does the visualization portray uncertainty where necessary? Is the type of uncertainty appropriate for the data? Are the units of uncertainty labeled? Color Is color necessary or useful? Can features be discriminated when printed in grayscale? Does the visualization accommodate common forms of colorblindness? Colormapping Is the colormap consistent with the data type? Does a colorbar fully define the mapping (quantity, units, and scale)? Annotation Are all symbols defined, preferably by directly labeling objects? For numerical annotation, are the significant figures appropriate? Are uncommon abbreviations avoided or clearly defined?

When you think you are finished Have you tested the visualization on a na¨ıve viewer? Did you communicate the intended message?

Table 3: A checklist for data visualizations.

Nature Methods, 10(10):921–922, 2013. IEEE Transactions on Computers, 18(5):401–409, 1969. Krzywinski, M. and Wong, B. Points of view: Plotting symbols. Schmid, C. F. : Design Principles and Practices. Nature Methods, 10(6):451–451, 2013. Wiley New York, 1983. Lane, D. M. and S´andor,A. Designing better graphs by including Spitzer, M., Wildenhain, J., Rappsilber, J., and Tyers, M. Box- distributional information and integrating words, numbers, and PlotR: A web tool for generation of box plots. Nature Methods, images. Psychological Methods, 14(3):239, 2009. 11(2):121–122, 2014. Lewandowsky, S. and Spence, I. Discriminating strata in scatter- Stevens, S. S. Psychophysics. Transaction Publishers, 1975. plots. Journal of the American Statistical Association, 84(407): Talbot, J., Gerth, J., and Hanrahan, P. An empirical model of slope 682–688, 1989. ratio comparisons. Visualization and , IEEE Margulies, D. S., B¨ottger,J., Watanabe, A., and Gorgolewski, K. J. Transactions On, 18(12):2613–2620, 2012. Visualizing the human connectome. NeuroImage, 80:445–461, Tomasi, D. and Volkow, N. D. Association between functional con- 2013. nectivity hubs and brain networks. Cerebral Cortex, 21(9):2003– Moret-Tatay, C. and Perea, M. Do serifs provide an advantage in the 2013, 2011. recognition of written words? Journal of Cognitive Psychology, Tufte, E. R. The Visual Display of Quantitative Information. 23(5):619–624, 2011. Graphics Press, Cheshire, CT, USA, second edition, 2001. ISBN Munsell, A. H. A Color Notation. Munsell color company, Inc., 0-9613921-0-X. Baltimore, 12 edition, 1947. Tukey, J. Exploratory Data Analysis. Addison-Wesley series in Potter, K., Rosen, P., and Johnson, C. R. From quantification behavioral science. Addison-Wesley Publishing Company, 1977. to visualization: A taxonomy of uncertainty visualization ap- Van der Maaten, L. and Hinton, G. Visualizing data using t-SNE. proaches. In Uncertainty Quantification in Scientific Computing, Journal of Machine Learning Research, 9(2579-2605):85, 2008. pages 226–249. Springer, 2012. Vaux, D. L. Error message. Nature, 428(6985):799–799, 2004. Sammon, J. W. A nonlinear mapping for data structure analysis. Wainer, H. How to display data badly. The American Statistician, Allen and Erhardt Visualizing Scientific Data Page 21 of 21

38(2):137–147, 1984. Wainer, H. Depicting error. The American Statistician, 50(2):101– 111, 1996. Wainer, H. Visual revelations: Improving graphic displays by con- trolling creativity. Chance, 21(2):46–52, 2008. Wallgren, A., Wallgren, B., Persson, R., Jorner, U., and Haaland, J.-A. Graphing Statistics & Data: Creating Better Charts. Sage, 1996. Wand, H., Iversen, J., Law, M., and Maher, L. Quilt plots: A simple tool for the visualisation of large epidemiological data. PloS One, 9(1):e85047, 2014. Ward, M. O. Multivariate data glyphs: Principles and practice. In Handbook of Data Visualization, pages 179–198. Springer, 2008. Wickham, H. ggplot2: Elegant Graphics for Data Analysis. Springer Science & Business Media, 2009. Wickham, H. Graphical criticism: Some historical notes. Journal of Computational and Graphical Statistics, 22(1):38–44, 2013. Wilkinson, L. The Grammar of Graphics. Springer Science & Busi- ness Media, second edition, 2005. Wong, B. Points of view: Color coding. Nature Methods, 7(8): 573–573, 2010a. Wong, B. Points of view: Design of data figures. Nature Methods, 7(9):665–665, 2010b. Wong, B. Points of view: Arrows. Nature Methods, 8(9):701–701, 2011a. Wong, B. Points of view: Salience to relevance. Nature Methods, 8 (11):889–889, 2011b. Wong, B. Points of view: Simplify to clarify. Nature Methods, 8(8): 611–611, 2011c. Wong, B. and Kjærgaard, R. S. Points of view: Pencil and paper. Nature Methods, 9(11):1037–1037, 2012.