Investigating the Effect of the Multiple Comparisons Problem in Visual Analysis Emanuel Zgraggen1 Zheguang Zhao1 Robert Zeleznik1 Tim Kraska1, 2 Brown University1 Massachusetts Institute of Technology2 Providence, RI, United States Cambridge, MA, United States {ez, sam, bcz}@cs.brown.edu
[email protected] ABSTRACT Similar to others [9], we argue that the same thing happens The goal of a visualization system is to facilitate data-driven when performing visual comparisons, but instead of winning, insight discovery. But what if the insights are spurious? Fea- an analyst “loses” when they observe an interesting-looking tures or patterns in visualizations can be perceived as relevant random event (e.g., two sixes). Instead of being rewarded for insights, even though they may arise from noise. We often persistence, an analyst increases their chances of losing by compare visualizations to a mental image of what we are inter- viewing more data visualizations. This concept is formally ested in: a particular trend, distribution or an unusual pattern. known as the multiple comparisons problem (MCP) [5]. Con- As more visualizations are examined and more comparisons sider an analyst looking at a completely random dataset. As are made, the probability of discovering spurious insights more comparisons are made, the probability rapidly increases increases. This problem is well-known in Statistics as the mul- of encountering interesting-looking (e.g., data trend, unex- tiple comparisons problem (MCP) but overlooked in visual pected distribution, etc.), but still random events. Treating analysis. We present a way to evaluate MCP in visualization such inevitable patterns as insights is a false discovery (Type I tools by measuring the accuracy of user reported insights on error) and the analyst “loses” if they act on such false insights.