Visualizing Scientific Data
Elena A. Allen (a), Erik Barry Erhardt (b)
(a) Department of Biological and Medical Psychology, University of Bergen
(b) Department of Mathematics and Statistics, University of New Mexico

Chapter to appear in 2016 in Handbook of Psychophysiology, 4th Edition, by Cambridge University Press, edited by Professor John T. Cacioppo, Professor Louis G. Tassinary and Gary G. Berntson.
Acknowledgments: We thank Lana Chavez for generous help editing early versions of the chapter. We also acknowledge Bang Wong and Martin Krzywinski, whose clear writing and illustrations inspired much of the work presented here.
Preprint, Handbook of Psychophysiology, 4th Edition, 2016. October 2015.

Contents
1 Introduction and Motivation
2 Designing for Human Abilities and Limitations
    2.1 Graphical Encoding
    2.2 Short-term and Long-term Memory
    2.3 Design Principles
3 Selecting a Chart Type
    3.1 Common Chart Types
    3.2 Displaying Variation and Uncertainty
    3.3 Multivariate Visualizations
    3.4 3D visualizations
4 Symbols and Colors
    4.1 Using Symbols
    4.2 Using Color
5 Supporting Details
    5.1 Axes
    5.2 Grids and Guides
    5.3 Annotation
    5.4 Font Choice
    5.5 Numerical Precision
6 Putting it into Practice
    6.1 Tools
    6.2 Examples
    6.3 A Checklist for Assessing Visualizations

List of Tables
1 Decoding accuracy
2 Data types
3 Checklist

List of Figures
1 Perceptual Tasks
2 Difference between curves
3 Chart types
4 Alternatives to 3D
5 Symbol shape and saturation
6 Color benefits
7 Color cylinder
8 Aspect ratio
9 Categorical axis
10 Integrate statistical descriptions
11 Data visualization process
12 Modified designs with real data

1. Introduction and Motivation

Data visualization is more than just the graphical display of information. Rather, it is a form of visual communication intended to generate understanding or insights about data. Specific to scientific research, data visualization serves many distinct purposes. In exploratory data analysis, which renowned statistician John Tukey describes as "graphical detective work" (Tukey, 1977), visualizations provide the ability to reveal unanticipated patterns and relationships. When a priori hypotheses are available, visualizations are used to support hypothesis testing and model validation. In presentations and publications, visualizations are the primary medium used to convey research findings to colleagues. While tables have their place for point/value reading in small or moderate-sized datasets, graphs are the superior choice for showing trends, summarizing data, and demonstrating relationships (Jarvenpaa and Dickson, 1988).

As with any form of communication, data visualization can be effective or ineffective. It has been argued by Wallgren, Wallgren, Persson, Jorner, and Haaland (1996) that "a poor chart is worse than no chart at all". Without consideration of how a visualization will be interpreted (or possibly misinterpreted), you run the risk of confusing your viewer rather than enhancing their understanding. Unfortunately, there is evidence to suggest that graphical communication in the sciences can be substantially improved. An early survey by Cleveland (1984) of 377 graphs in a volume of Science showed that 30% had at least one major error compromising interpretation. Our own more recent survey of nearly 1500 figures from five neuroscience journals found that graphical displays became less informative and interpretable as the dimensions of datasets increased (Allen, Erhardt, and Calhoun, 2012). For example, only 43% of graphics displaying higher-dimensional data labeled the dependent variable (meaning that more than half the time the viewer couldn't determine what quantity was being plotted) and only 20% portrayed the statistical uncertainty of measured or calculated quantities.

The inadequacies of many scientific graphics are not due to a lack of guidelines or established best practices. Quite the contrary, there are numerous excellent resources available (e.g., Cleveland, 1994; Tufte, 2001). The problem appears to be that the advice of the experts is not reaching its intended audience (Wickham, 2013). It is increasingly recognized that data visualization education must be integrated into resources broadly accessed by the scientific community. Examples of this paradigm shift include the appearance of relatively simple graphing tutorials in prominent journals (e.g., Spitzer, Wildenhain, Rappsilber, and Tyers, 2014; Wand, Iversen, Law, and Maher, 2014), and recent development of the strongly recommended "Points of View" column by Bang Wong, Martin Krzywinski, and colleagues, in Nature Methods, which provides a high-level discussion of data visualization fundamentals. Education, particularly early on, is critical for the development of visual communication skills, and it is exciting to see innovative resources (such as this book) that acknowledge data visualization as a general research method for scientists.

It is our intention that by reading this chapter you will become a better communicator of visual information. We will guide you through the motivating principles of graphical design, how to construct graphs that play to human perceptual strengths, how to clarify visual messages with annotations, and, finally, how to implement these concepts in practice.

2. Designing for Human Abilities and Limitations

There are countless ways to represent information graphically; however, not all of these ways are useful or even interpretable to human beings. When designing a graphic, we must consider our perceptual and cognitive capabilities. Effective graphics will be those that play to the strengths of human information-processing abilities and avoid the weaknesses.

2.1. Graphical Encoding

In data visualizations, information is represented not with numbers, but with geometric objects. Thus, quantitative and categorical data must be mapped or encoded to visual objects, typically through attributes such as position, size, shape, or color. Communication is effective only if the perceptual decoding of information by the viewer is successful. Ideally, a visualization is designed to help the viewer decode information effortlessly and accurately.

Psychophysics experiments have long shown that some encodings are easier to decode than others (e.g., Fechner, 1860). An intensive period of investigation into human perceptual abilities in the 1950s–70s found power-law relationships between the true magnitude of a stimulus and its perceived magnitude (Baird, 1970a,b; Feinberg and Franklin, 1975; Stevens, 1975). For example, while judgements of length were approximately linear, perceived brightness, area, and volume all showed sublinear relationships to true stimulus magnitude, i.e., in order to get the same increase in perceived area from one object to another, one must consistently over-represent the actual area of the object.

In the 1980s, William Cleveland and colleagues pursued investigations into perceptual decoding in relation to graphical communication, positing that visual attributes which could be decoded more accurately would ultimately result in a greater likelihood of perceiving patterns in data and making correct interpretations (Cleveland and McGill, 1984). The findings of Cleveland and McGill (1985), which have also been reproduced more recently by Heer and Bostock (2010), led to a ranking of visual attributes based on decoding accuracy, presented in Table 1. Several of these visual attributes are shown in Figure 1, which represents the same set of data encoded differently in each column. You can test for yourself whether some attributes are easier to accurately decode than others.

[Figure 1 column labels: VALUE, SATURATION, VOLUME, AREA, ANGLE, LENGTH, POSITION]
Figure 1: Test your ability to decode quantitative information from different visual attributes. Each column encodes the same 7 values using a different graphical feature. How accurate is your perception? Values from top to bottom: 15, 30, 50, 10, 32, 24, 40. Adapted from Wong (2010b).

The relative abilities of humans to make accurate perceptual judgements have become the basis for recommending particular graphical encodings over others. It is commonly accepted that position along a common axis should be the default encoding method for displaying quantitative information because it is most accurately decoded. Histograms, bar charts, line plots, scatter plots, and many other forms encode information in this fashion. When a single common axis is not sufficient to show the data, a good alternative is a series of replicated axes, known as small multiples (Tufte, 2001). Color hue and saturation,

Visual attribute                      Rank
Position along a common scale         1 (most accurate)
Position along non-aligned scales     2
Length                                3
Angle, Slope                          4
Area                                  5
Volume, Color saturation              6
Color hue                             7 (least accurate)

material in a display, overwhelming the viewer, or relegate too much information to the figure legend, requiring the viewer to arduously memorize encodings and definitions (Kosslyn, 1985). Improved graphical designs (Sec. 2.3) and integrated annotation (Sec. 5.3) can reduce demands on short-term memory, enhancing and accelerating understanding. Considerations of long-term