Five Ways Visualizations Can Mislead (And How to Fix Them)


COVER STORY

The Good, the Bad, and the Biased: Five Ways Visualizations Can Mislead (and How to Fix Them)

Danielle Albers Szafir, University of Colorado Boulder

INTERACTIONS, JULY–AUGUST 2018, INTERACTIONS.ACM.ORG

Image by GN8 / Shutterstock

Insights

- Visualizations allow people to readily analyze and communicate data. However, many common visualization designs lead to engaging imagery but false conclusions.
- By understanding what people see when they look at a visualization, we can design visualizations that support more accurate data analysis and avoid unnecessary biases.

Data visualizations allow people to readily explore and communicate knowledge drawn from data. Visualization methods range from standard scatterplots and line graphs to intricate interactive systems for analyzing large data volumes at a glance. But how can we craft visualizations that effectively communicate the right information from our data? What aspects of data and design need to come together to develop accurate insights? The answer lies in the way we see the world: People use their visual and cognitive systems (i.e., their eyes and brain) to extract meaning from visualized data. However, flashy visualizations are not always optimized to help people see what matters. This article reviews common visualization practices that may inhibit effective analysis, why these designs are problematic, and how to avoid them. The discussion illustrates a need to better understand how visualizations can support flexible and accurate data analysis while mitigating potential sources of bias.

Glancing at the bar chart in Figure 1 will likely convince you that one method performs twice as well as the other. However, this visualization is misleading: The true difference between methods is only 5 percent. Talks and articles frequently feature flashy visualizations like this: visualizations that, despite the data’s simplicity, break several rules for honest and effective data visualization, exaggerating the differences between methods and calling into question the statistical conclusions drawn from the results. Are these violations nefarious? No. Are they done with the intention of making a cool graph? Probably. Do they lie with data? Yes.

The mistakes made in this visualization (unnecessary use of 3D, a lack of uncertainty information, axes starting above zero) are common throughout the scientific world. People often justify these designs with comments like “I have learned to read these charts correctly” or “If I label my axes, no one will make that mistake.” While there are small individual differences in how we interpret visualizations, everyone has the same visual system, is subject to the same visual biases, and can be fooled by the same visual illusions. And we are only fooling ourselves if we assume differently.

The choice to use flashy rather than accurate data visualizations is growing increasingly problematic. Data provides a crucial foundation for the decisions on which our society operates. It allows us to characterize the world in new ways and drive innovative discoveries. While algorithms and computational tools provide powerful mechanisms for harnessing data, interpretation and decision making are ultimately done by people. People bring context, expertise, and situational awareness to analyses that are not easily integrated into databases but that are critical to disentangling the signal from the noise. How can we as developers and data scientists enable access to the right information to support effective data analysis and communication?

The answer lies in understanding what people actually see in data visualizations. Our sense of sight provides us with a well-tuned pattern-recognition system. Centuries of evolution have refined our visual abilities to rapidly process large amounts of complex information. We can find a tiger in long grass or ripe red berries in a bush. We can detect whether people are approaching or moving away. Visualizations leverage the high-throughput processing capabilities offered by our sense of sight to help people make sense of data. If we understand the patterns and information people extract from a visualization, we can enable people to draw informed conclusions from data at a glance.

Visualizations must be crafted with care, as we are easily tricked into seeing patterns in data that are not actually present, such as the apparent 50 percent difference in Figure 1. While some visual biases and illusions are difficult to avoid, by understanding how information is transformed between the visualization and the knowledge it creates, we can encourage designs that help people better communicate, and ultimately understand, data.

Here, we identify several (sometimes controversial) visualization design choices that can lead to potentially erroneous conclusions and offer solutions to overcome them, focusing on color choice, animation, axis scales, unnecessary 3D, and privileging statistics over data.

A PRIMER IN VISUALIZATION: WHEN, WHY, AND HOW

Visualizations are powerful tools for discovering and communicating insights in data. However, visualizations are not always necessary: People are not optimized to compute precise statistical quantities from abstract images. Many analysis problems can be solved with direct queries and algorithmic methods. For example, statistical models allow companies to optimize shipping procedures. Purely computational approaches scale further and more accurately estimate precise quantities than people. If you can distill what you need to know about your data into one computable value, you likely do not need a visualization.

However, visualizations often prove robust where statistics fall short. Visualizations take advantage of the universality of visual structure: We can see the shapes these data points make even when we cannot directly enumerate them. Take, for example, Anscombe’s Quartet: four datasets with nearly identical means, variances, correlations, and regression lines (Figure 2). While these datasets appear statistically identical, visualizing them shows substantial qualitative differences in their structure. Our sight detects these high-level structures within 100 milliseconds of looking at a graph [1], far faster than the blink of an eye.

[Figure 2. The four datasets of Anscombe’s Quartet share the same basic descriptive statistics, but visualizing these datasets reveals four qualitatively different structures.]

How do you decide when to visualize and when to compute? Factors such as uncertainty (how well do statistics represent the data?), transparency (what does the underlying data look like?), context (what additional knowledge could inform analysis and decision making?), scale (how many distinct quantities do we need to evaluate?), exposition (what story must the data tell?), and purpose (do we know what we are looking for?) all help determine when visualizations are valuable. For example, if you cannot readily quantify (or even know) what data properties matter, you can use visualizations to synthesize a diverse set of conclusions.

This trade-off between flexibility and precision is often the primary deciding factor for determining when a visualization is necessary: If access to the data underlying a statistic or prediction might change our decisions about that data, we should use a visualization.

Crafting visualizations generally follows a systematic process: clean the data, precompute relevant information, map that information to different visual channels (e.g., position, size, color), and integrate interaction and other details where appropriate. By combining a small number of channels, visualization designers can create intricate interactive systems that reveal patterns in large data collections at a glance. Choosing among these channels, while simple in concept, is where most visualizations go wrong. While many combinations create flashy and engaging graphics, these approaches may inadvertently obscure or even misrepresent data in ways that lead to flawed and biased interpretations.

Misleading visualizations appear in our news reports, creating public mistrust in data; in scientific results, leading to incorrect theories; and even in Congress, where policymakers find themselves in conflict over data. So how do we avoid faulty visualizations? Science still cannot fully answer that question, but we can start by avoiding well-studied design pitfalls.

[Figure 1. 3D marks, truncated axes, and other design choices create stylish visualizations; however, these visualizations are at best difficult to read and at worst lead to incorrect conclusions. Avoiding known bad practices leads to more honest and accurate data communication. Panels: “Misleading” (3D bars, Prediction Accuracy axis running from 65% to 75%) and “Improved” (Prediction Accuracy axis running from 0% to 100%), each comparing Our Method and Their Method.]
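As a rough check on how much the truncated axis in Figure 1 exaggerates the comparison, the sketch below computes the ratio of bar heights a viewer sees for a given axis baseline. The accuracy values (75 and 70 percent) and the truncated baseline (65 percent) are assumptions read off the figure's tick labels rather than numbers stated in the text, and the function name is illustrative.

```python
def apparent_ratio(value_a, value_b, baseline=0.0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    height_a = value_a - baseline
    height_b = value_b - baseline
    if height_a <= 0 or height_b <= 0:
        raise ValueError("baseline must lie below both values")
    return height_a / height_b


# Assumed values, read from the tick labels of Figure 1 (percent accuracy).
our_method, their_method = 75.0, 70.0

truncated = apparent_ratio(our_method, their_method, baseline=65.0)
honest = apparent_ratio(our_method, their_method, baseline=0.0)

print(f"axis starting at 65%: left bar looks {truncated:.2f}x as tall")
print(f"axis starting at 0%:  left bar looks {honest:.2f}x as tall")
```

Under these assumed values, the truncated axis turns a 5-point gap into bars whose heights differ by a factor of two, while a zero-based axis shows a ratio of about 1.07.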

GETTING OVER THE RAINBOW

Many visualizations, such as geographic choropleth maps, eye-tracking heatmaps, and scalar field visualizations, represent data using a familiar red-yellow-green-blue scheme referred to as the rainbow colormap. A longtime default of tools like MATLAB, this colormap creates bright and engaging imagery that has led to incorrect conclusions and even retracted papers in top scientific venues. Many insist that the rainbow colormap allows them to interpret more variations in their data, as they have “learned to read the colormap correctly.” However, a number of studies have shown that rainbow colormaps distort data even for people who use them daily. For example, researchers at Harvard worked with cardiologists who used rainbow colormaps to diagnose arterial disease [2]. Despite experts’ insistence that they could accurately interpret rainbows, switching from rainbows to more mundane colors increased experts’ abilities to correctly identify cardiac issues from 50 percent to 81 percent.

While in most cases using a rainbow colormap is not life-or-death, getting over the rainbow can improve data interpretation. Rainbows trick people into seeing false patterns in data. Color changes over rainbows are not uniform in their magnitude or direction, causing mismatches between perceived color differences and actual data differences. These mismatches distort value relationships and lead people to see data differences as being artificially smaller or larger than they actually are. For example, in Figure 3 the yellows appear far more similar to the oranges than to the equidistant greens.

Rainbows also cause people to visually group colors sharing the same name, such as shades of blue. In practice, this grouping makes rainbows useful for visualizing categorical data (e.g., apples and oranges). However, using rainbows for continuous values introduces artificial divisions in smoothly varying data. These divisions create false associations within grouped colors and dissociations between colors that bias what we see as same and different data. In Figure 3, we see clear bands of blues, greens, yellows, and reds, even though the data varies smoothly across the entire dataset. More appropriate colormaps overcome these biases by visually preserving relative data magnitudes.

Even if you consider yourself robust to the rainbow, consider that nearly 1 in 12 men are colorblind [3]. Colorblind individuals see the rainbow differently: They cannot discriminate between certain hues. This lack of discrimination does not just cause people to see reds and greens as the same but also shifts the perception of all hues by removing individual color components from each color in the rainbow. This shift further skews the mapping between color and data, leading to significant misperceptions and inaccessible data.

Tools such as ColorBrewer, Colorgorical, and Adobe Kuler offer principled alternatives to rainbows and allow you to tailor colormaps to best represent the visualized data types. If your data is categorical (e.g., dogs and cats), rainbows are fair game. However, ordered or continuous data should use either sequential or diverging colormaps. To choose between them, determine if there is a meaningful middle point in your data (e.g., differences from a baseline or natural zero value). If so, diverging colormaps (those that extend continuously from a neutral middle color) allow easy comparisons to that middle point. If not, sequential colormaps intuitively represent data magnitudes (Figure 3). By matching color to data, visualizations can avoid needless distortions that so often lead to false conclusions.

DATA ON THE MOVE

Many visualizations use animations. For example, a data point’s velocity may represent its value. We visualize data at different time points in sequence to show change over time. Animated visualizations are flashy and engaging; however, they also blind people to important changes in data.

While we can use motion direction and velocity to encode data, people can distinguish only a handful of different speeds and motion directions [4] and can trace the specific movement of only three to four data points at a time [5]. Our limited abilities to track moving objects imply that representing data using motion may help us identify only a few high-level patterns with little sense of what those patterns mean.

These limitations are especially problematic for showing values changing over time. For example, Hans Rosling’s Gapminder TED Talk [6] leverages animation to narrate changes in the global economy. Much of the power in this story lies in Rosling’s ability to direct your attention to important changes in the data.

[Figure 3. Rainbow colormaps make engaging figures but also create artificial divisions and skew value differences in ways that have caused innumerable false conclusions. Using a sequential colormap supports more accurate insights into smoothly varying datasets. Panels: “Misleading” and “Improved”; axes: Feature 1 and Feature 2, each running from -3 to 3; color scale from 0.0 to 1.0.]
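The colormap guidance illustrated by Figure 3 (categorical data may use distinct hues; ordered data uses a diverging map when there is a meaningful midpoint and a sequential map otherwise) can be sketched in a few lines. The function names and the RGB endpoint colors below are illustrative assumptions, not palettes taken from ColorBrewer or any other tool, and plain linear interpolation in RGB is only a rough stand-in for a perceptually designed colormap.

```python
def choose_colormap(is_categorical, midpoint=None):
    """Pick a colormap family following the rule of thumb in the text."""
    if is_categorical:
        return "categorical"
    if midpoint is not None:
        return "diverging"
    return "sequential"


def _lerp(c0, c1, t):
    """Linearly interpolate between two RGB triples, with t clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


# Illustrative endpoint colors (assumed, not from a published palette).
LIGHT, DARK = (240, 240, 240), (20, 60, 120)
LOW, NEUTRAL, HIGH = (40, 90, 170), (245, 245, 245), (180, 30, 40)


def sequential_color(value, vmin, vmax):
    """Map a value onto a light-to-dark ramp so darker always means larger."""
    return _lerp(LIGHT, DARK, (value - vmin) / (vmax - vmin))


def diverging_color(value, vmin, vmax, midpoint):
    """Map a value onto two ramps that meet in a neutral color at the midpoint."""
    if value >= midpoint:
        return _lerp(NEUTRAL, HIGH, (value - midpoint) / (vmax - midpoint))
    return _lerp(NEUTRAL, LOW, (midpoint - value) / (midpoint - vmin))


print(choose_colormap(False, midpoint=0.0))   # ordered data with a natural zero
print(diverging_color(0.0, -3.0, 3.0, 0.0))   # color assigned to the midpoint
print(sequential_color(0.5, 0.0, 1.0))        # color halfway along the ramp
```

In practice a vetted palette from one of the tools named in the text would replace the hand-picked endpoints; the point of the sketch is only that the data type, not visual appeal, selects the colormap family.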

[Figure 4. Animated data can leave people blind to important changes. Instead, consider methods for directly supporting comparison across time points. Panels: Juxtaposition (separate plots for 2010, 2011, and 2012), Superposition (2010-2012), and Explicit Encoding (2010-2012); companies: CBS, EA, EBAY, MSFT, NDAQ, UPS, YHOO; horizontal axis from 0 to 500, vertical axis from 0 to 80.]

However, our attention is a scarce resource: We can allot a limited amount to any given set of data points. By directing our attention to one set of values, we effectively ignore changes in the rest of the dataset. As a result, animating your data over time may cause people to lose sight of most of the data.

This information loss is in large part due to change blindness, a phenomenon where attending to one change leaves us blind to others. For example, counting the number of times a basketball is passed causes us to miss a gorilla dancing through the passers [7]. A conversation partner can even be replaced mid-discussion without our noticing [8]. In data visualizations, change blindness means that if we don’t tell the analyst what aspects of an animated visualization to pay attention to, they may never see important changes in their data. Even if they see these changes, our limited memory prevents us from recalling precise differences over time.

We can overcome these limitations by directly visualizing how data changes over time. Methods for such temporal comparison fall into three categories (Figure 4): juxtaposition (visualizing multiple time points side by side), superposition (arranging data from multiple time points on the same axes), and explicit encoding (directly visualizing the differences between time points). We choose between these different techniques by focusing on what aspects of change we want to highlight in our visualization and how many time points we need to see at any one time. Superposition facilitates precise and immediate comparison across a small number of time points; however, layering too many time points causes data points to occlude one another. Juxtaposition scales comparisons across larger datasets; however, it is difficult to precisely compare visualizations that are far apart. Explicit encoding can extract and represent salient information about changes over time, such as the trajectory a point follows on a scatterplot; however, these techniques require determining what differences matter for the analysis. By considering data scale and relevant questions, we can use these visualizations to compare changes over time without blinding people to critical changes in their data.
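Of the three strategies, explicit encoding is the one that requires computation before anything is drawn: the differences between time points have to be derived first. A minimal sketch follows, assuming one value per company per year. The company labels echo Figure 4, but the numbers are invented placeholders and the function names are hypothetical.

```python
def explicit_changes(snapshots, start, end):
    """Per-item difference between two time points (end minus start).

    Items missing from either time point are skipped rather than guessed.
    """
    before, after = snapshots[start], snapshots[end]
    return {name: after[name] - before[name] for name in before if name in after}


def largest_changes(changes, top_n=3):
    """Items ordered by absolute change, largest first."""
    ranked = sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# Placeholder values for illustration only (not the data behind Figure 4).
snapshots = {
    2010: {"CBS": 19, "EA": 16, "EBAY": 28, "MSFT": 28},
    2012: {"CBS": 38, "EA": 14, "EBAY": 51, "MSFT": 27},
}

changes = explicit_changes(snapshots, 2010, 2012)
for name, delta in largest_changes(changes):
    print(f"{name}: {delta:+d}")
```

Plotting `changes` directly (for example, one bar per company) would be the explicit-encoding view; drawing each year in its own panel or on shared axes would correspond to juxtaposition and superposition.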

A MATTER OF SCALES

When we represent data on a standard Cartesian plane, many systems by default fit axis ranges to natural data scales, such as the minimum and maximum value. This choice maximizes the space in a graph dedicated to data. However, it also may cause people to see differences in the data that simply do not exist.

This issue is most problematic when visualizations begin their y-axes above zero. In many common visualizations, we interpret visualized values by measuring the distance between the x-axis and our marks (e.g., the top of a bar, the position of a point). Non-zero y-axes distort the difference between values, causing small differences to appear much larger than they truly are. Consider the example shown in Figure 1: The data difference is only 5 percent, yet the left bar appears twice as large as the right bar. Many argue that labeling axes counteracts the biasing effects of truncating the y-axis. However, people seldom read axis labels: The ratios people see at a glance often reflect the conclusions they will draw from data [9].

The same is true of normalized axes. If you have multiple consecutive plots showing the same variables, the axes should map to the same data ranges. Consider the infamous Planned Parenthood comparison chart [10]. The y-axis corresponds to the number of services provided; however, these axes are normalized to two different ranges, creating a false crossing in the data. Renormalizing these axes to the same scale tells a different story: At no point does the dominant provided service change. The most salient feature of the original graph led to a false conclusion because of improper normalization.

The distortion caused by poor axis scaling is a by-product of the way we read visualizations. Axis labels require conscious attention to interpret: We have to actively read these numbers to make sense of them. However, when we look at a visualization, we form the gist of a visual scene unconsciously. We get a sense of the data’s shape and distributional properties without actively reading anything. If we use different axes to represent different facets of our data, the resulting shapes and structures distort our perceptions of the data.

The one place where starting y-axes at values greater than zero is still a matter of debate is in communicating variation. For line graphs, small variations become less noticeable as the amount of space dedicated to those varying elements grows smaller; the magnitude of these small-scale variations becomes distorted by truncated axes. But if an analyst cares about variation rather than magnitude, many argue that the loss of fidelity from non-zero y-axes may be mitigated: The distortions created by the axis may not matter.

Instead of truncating your axes, consider the story the visualization is supposed to tell. What are the important differences in the data? For example, if you want to visualize change in a value over time, instead of communicating the raw magnitudes, you may wish to compute change relative to some baseline and visualize that computed value instead. To tell a story about growth or decline, visualize the rate of growth rather than the full population. By visualizing metrics more closely tied to the actual quantity of interest using honest axes, visualizations can focus on data that matters without introducing unnecessary bias.

THREE PROBLEMS WITH 3D

Three-dimensional visualizations create graphics that appear to pop out of the page. They are seen as engaging, futuristic, and sophisticated. And removing the ability to generate them is one of the best things presentation tools could do for honest data communication.

3D visualizations in two-dimensional media like slideshows and papers suffer from three primary issues that bias analysis: occlusion, projection, and perceptual ambiguity. Occlusion occurs when some marks make it difficult (or even impossible) to view others. Consider Figure 5: Center bars are occluded by outer values, complicating analysis. In the real world, people can move around objects to resolve occlusions. For example, we peek around a wall to see what lies behind it. In 2D, people generally cannot change their viewpoint to see occluded data. Occluded data is effectively lost. Even if people can move their viewpoint, occlusion may prevent us from knowing where to look.

[Figure 5. 3D bar charts can occlude data and distort values. Leveraging a third visual variable, such as size or color, supports more accurate comparisons over multiple dimensions. Panels: “Misleading” (3D bar chart over Cell Shape Uniformity, Cell Size Uniformity, and Cell Clump Thickness) and “Improved” (Cell Size Uniformity plotted against Cell Shape Uniformity, with Cell Clump Thickness shown through a third visual variable).]

Our ability to resolve 3D objects stems from both monocular cues (e.g., one object being partially occluded by another) and binocular cues (e.g., information coming from each eye fused into a single picture). When we project 3D data onto a 2D image, we lose several of these depth cues. For example, we cannot engage motion parallax (the same depth cue that cats use when bobbing back and forth to judge how far to leap) or vergence (our brain’s ability to resolve 3D position using the angles between an object and our eyes). As a result, 2D projections are inherently imperfect approximations of 3D space and are often difficult to resolve.
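One way to see how projection alone distorts a chart is to tilt a flat pie away from the viewer and measure the angle each slice appears to subtend on screen. The sketch below assumes a simple orthographic projection (the vertical coordinate is compressed by the cosine of the tilt), which is a modelling choice made here for illustration; real 3D charts also add perspective and wedge thickness, and the function name is hypothetical.

```python
import math


def apparent_span(start_deg, end_deg, tilt_deg):
    """On-screen angle (degrees) of a pie slice after tilting the pie.

    The slice runs counterclockwise from start_deg to end_deg on the flat
    pie. Tilting by tilt_deg about the horizontal axis scales y by cos(tilt).
    """
    squash = math.cos(math.radians(tilt_deg))

    def on_screen(angle_deg):
        a = math.radians(angle_deg)
        return math.degrees(math.atan2(math.sin(a) * squash, math.cos(a)))

    return (on_screen(end_deg) - on_screen(start_deg)) % 360.0


# Four equal slices (25 percent each), with the pie tilted by 60 degrees.
for start in (-45, 45, 135, 225):
    span = apparent_span(start, start + 90, tilt_deg=60)
    print(f"slice starting at {start:>4} deg appears to span {span:.1f} deg")
```

With these assumptions, equal quarter slices appear to span roughly 53 or 127 degrees depending on where they sit on the pie, so a reader judging by angle would rank identical values differently.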

For example, when we tilt a pie chart in 3D, we distort the angles between slices of the pie (Figure 6) [11]. This distortion at best makes it harder to read the data and at worst causes incorrect analysis by distorting mark shape and size (and consequently perceived values) at different depths. These distortions worsen when we map data to size: As objects get farther away, they also appear smaller. In 3D visualizations, a small object may either have a small value or be far away. We cannot visually resolve the two possibilities.

[Figure 6. Distortion due to projection in the 3D pie chart causes the green wedge to appear to represent a far larger market share than the data supports. Panels: “Misleading” (3D pie chart of U.S. Smartphone Market Share) and “Improved” (bar chart of U.S. Smartphone Market Share, axis from 0 to 45); categories: Other, RIM, Apple, Palm, Motorola, Nokia.]

To avoid occlusion and ambiguity in visualizations, use 3D only when absolutely necessary. Instead of representing the third dimension of your data using depth, try using alternative visual variables like color or size (Figure 5). Some kinds of data, like molecular surfaces or architectural structures, have inherent 3D shapes. In these cases, 3D can provide important contextual information. However, 3D is still often imperfect for these scenarios. For example, we can see only half of any 3D volume from a single viewpoint. Pairing 2D summary representations with 3D structures can help overcome these limitations, even for complex geometries and inherently spatial data.

SHOW, DON’T TELL

As algorithms improve, it is tempting to rely on statistical processing for most data analysis. Visualizations increasingly represent the outputs of these processes rather than the original data. People often see algorithms as less error prone and less biased; however, like people, algorithms are subject to bias and make mistakes. Electing to visualize algorithmic outputs without the context of the underlying data deprives people of the information necessary to evaluate the output’s meaning and validity.

In collaboration with Microsoft and the University of Wisconsin, we surveyed the ways in which people visualize large collections of data. The majority of systems (74 percent) computed and directly visualized representative statistics [12]. While such statistical aggregation allows people to make precise claims about target quantities, it comes at the expense of context and flexibility. Consider a scatterplot comparing two clusters, A and B. If we choose to show the means of A and B, we have precise information about these means but have no data about other statistics of each cluster, such as the variance or density.

People can efficiently estimate aspects of a statistical distribution at a glance [13]. They can use visualizations to estimate properties of a distribution like means, variance, and even higher-order statistics like correlation quickly and accurately [14]. For example, within a half second of looking at a bubbleplot, we already have an approximate sense of the mean size of the collection of bubbles. Our ability to visually compute these values relates to the concept of ensemble coding, a process our brain uses to compactly represent large quantities of visual information by recalling the data’s distributional parameters.

When reasonable, visualizations should err toward providing more data rather than less. This design choice sacrifices precise statistical comparison in order to enrich analysis. There are two primary cases where we may choose to explicitly aggregate data: when aggregate statistics are sufficient for our analysis and when we have too much data to visualize at once. In some cases, we may not need much data to address the question at hand. However, such visualizations should use caution when communicating statistics. For example, analysts often compare sample populations using bar charts with error bars. This method, despite its popularity, causes people to interpret values inside of a bar as statistically more likely than those outside of the bar, a phenomenon known as within-the-bar bias [15,16]. We can avoid this bias by using representations that provide more transparent insight into the data distribution. A violin plot (Figure 7) visualizes data distributions alongside means to help avoid within-the-bar bias; it also surfaces aspects of the data distribution that enrich analysis, such as the normal, bimodal, and skewed distributions in the figure’s three samples.

Showing the full dataset is not always an option. Modern datasets may simply have too much data to visualize. Trying to show all available data can lead to clutter: We have so much visual information, we cannot find the data that matters. For example, network visualizations may gain so many connections that they become a “hairball”: It is impossible to disentangle the individual relationships between entities in the graph. We can overcome clutter by carefully coupling statistics and visualization to construct visual summaries, that is, visualizations that reduce the amount of data shown while preserving important properties of the distribution. For example, we can compute representative statistics for relevant subsets such as clusters or connected components. Alternatively, we can filter out irrelevant information to focus on relevant elements of the dataset. We can even randomly subsample our data, preserving the underlying data distribution while reducing the overall amount of information shown.

Balancing showing and telling in visualization is more of an art than a science, as we need to allow accurate and flexible analysis while not overwhelming people with too much information.
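The trade-off between telling (statistics) and showing (data) can be illustrated with Anscombe's Quartet from Figure 2. The sketch below computes the usual summary statistics for each of the four datasets and then draws a small random subsample, one of the data-reduction strategies mentioned above. The data values are the commonly published quartet; the helper names, the sample size, and the random seed are choices made for this illustration.

```python
import random
from statistics import mean, variance

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample variances, Pearson correlation, and least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(xs), "var_y": variance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }


def subsample(xs, ys, k, seed=0):
    """Keep k randomly chosen raw points instead of replacing them with statistics."""
    rng = random.Random(seed)
    return rng.sample(list(zip(xs, ys)), k)


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(val, 2) for key, val in stats.items()})

print(subsample(*QUARTET["IV"], k=5))
```

All four datasets round to essentially the same summary, so a chart of those statistics alone could not tell them apart, whereas plotting the raw points (or a subsample of them) keeps differences in structure visible. Recording which statistics, filters, and sampling steps were applied is what lets a reader judge such a summary.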

Ideal visualizations should be transparent: People should understand how the data changed between the raw, unprocessed file and visualized marks, and how the patterns they see reflect the underlying data. What statistics are used? What was filtered for? What happened to outliers? By being transparent with visualizations, we can help people better understand the available data and intuitively generate informed insights and decisions, even with large data collections.

[Figure 7. Traditional aggregation methods, such as bar charts encoding means, replace data with statistics, obscuring important patterns in the underlying data distribution. Panels: “Misleading” and “Improved”; y-axis: Runtime (s); x-axis: Approach A, Approach B, Approach C.]

TOWARD BETTER PRACTICES

This article focuses on common mistakes in visualizations that bias data analysis. These guidelines are deeply grounded in empirical studies and decades of observation and practice. Vision science and visualization offer some explanation for why these phenomena occur and allow us to design alternative representations that more faithfully depict data.

However, we are far from understanding all of the mechanisms at play when people interpret data. For example, how might visualizations account for illusions that occur naturally in data? Can we rescale or renormalize visualizations to account for biases introduced by the ways we see the world? How do we intuitively navigate high-dimensional data? How do we effectively pair visualization and computation to help people better leverage petabyte datasets?

A principled and quantified understanding of the way we see data can empower people to better leverage the many benefits offered by data. Crafting optimal visualizations is still an unsolved and wicked problem. Deeper collaboration between data science, cognitive science, and vision science is necessary to move us toward algorithmic and visual solutions that can scaffold an informed and inclusive data-driven society.

Endnotes

1. Larson, A.M., Freeman, T.E., Ringer, R.V., and Loschky, L.C. The spatiotemporal dynamics of scene gist recognition. Journal of Experimental Psychology: Human Perception and Performance 40, 2 (2014), 471.
2. Borkin, M.A., Gajos, K.Z., Peters, A., Mitsouras, D., Melchionna, S., Rybicki, F.J., Feldman, C.L., and Pfister, H. Evaluation of artery visualizations for heart disease diagnosis. IEEE Trans. on Visualization and Computer Graphics 17, 12 (2011), 2479–2488.
3. Wong, B. Points of view: Color blindness. Nature Methods 8, 441 (2011).
4. Ball, K. and Sekuler, R. A specific and enduring improvement in visual motion discrimination. Science 218, 4573 (1982), 697–698.
5. Franconeri, S.L., Jonathan, S.V., and Scimeca, J.M. Tracking multiple objects is limited only by object spacing, not by speed, time, or capacity. Psychological Science 21, 7 (2010), 920–925.
6. https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen
7. Neisser, U. The control of information pickup in selective looking. In Perception and Its Development: A Tribute to Eleanor J. Gibson. A.D. Pick, ed. Erlbaum, New York, 1979, 201–219.
8. Simons, D.J. and Levin, D.T. Failure to detect changes to people during a real-world interaction. Psychonomic Bulletin & Review 5, 4 (1998), 644–649.
9. Pandey, A.V., Rall, K., Satterthwaite, M.L., Nov, O., and Bertini, E. How deceptive are deceptive visualizations?: An empirical analysis of common distortion techniques. Proc. of the ACM Conference on Human Factors in Computing Systems. ACM, New York, 2015, 1469–1478.
10. http://www.msnbc.com/msnbc/congressman-chaffetz-misleading-graph-smear-planned-parenthood
11. https://www.wired.com/2008/02/macworlds-iphon/
12. Sarikaya, A., Gleicher, M., and Szafir, D.A. Design factors for summary visualization in visual analytics. Computer Graphics Forum 37, 3 (2018).
13. Ariely, D. Seeing sets: Representation by statistical properties. Psychological Science 12, 2 (2001), 157–162.
14. Szafir, D.A., Haroz, S., Gleicher, M., and Franconeri, S. Four types of ensemble coding in data visualizations. Journal of Vision 16, 5 (2016), 1–19.
15. Correll, M. and Gleicher, M. Error bars considered harmful: Exploring alternate encodings for mean and error. IEEE Trans. on Visualization and Computer Graphics 20, 12 (2014), 2142–2151.
16. Newman, G.E. and Scholl, B.J. Bar graphs depicting averages are perceptually misinterpreted: The within-the-bar bias. Psychonomic Bulletin & Review 19, 4 (2012), 601–607.

Danielle Albers Szafir is an assistant professor in the Department of Information Science at the University of Colorado Boulder. Her research bridges data science and vision science to develop interactive visualization systems, guidelines, and techniques for exploratory data analysis.
[email protected]

DOI: 10.1145/3231772 COPYRIGHT HELD BY AUTHOR. PUBLICATION RIGHTS LICENSED TO ACM. $15.00
