DESIGNING VISUALIZATIONS FOR BIOLOGICAL DATA
Miriah Meyer, School of Computing, University of Utah, Salt Lake City, UT, 84112, U.S.A. E-mail: [email protected]
Submitted:
Abstract
Visualization is now a vital component of the biological discovery process. In this article I present visualization design studies as a promising process for creating effective visualization tools for biological data.

Fig. 1. This nine-stage framework [1] provides practical guidance for designing visualization tools that focus on solving specific, real-world problems. Copyright Miriah Meyer, 2012.
The field of biology has been transformed over the last decade with the rapidly decreasing cost of data. Today, the focus is on combining huge genomics databases with large amounts of molecular data, and then augmenting this data with long-term clinical outcomes across large populations of people. Biologists believe that embedded in these complex, massive datasets are scientific goldmines like a cure for cancer. But how do you combine data that spans from the molecular level up to the population level in a meaningful way?

Visualization has emerged as an important way to make sense of this data. Lying at the intersection of computer science, design, and biology, visualization of biological data enables scientists to understand data by encoding meaning through images and supporting exploration through human-computer interactions. My research shows that producing agile visualizations by matching the rate of software development to the biologists’ rate of experimentation produces tools that not only support scientists, but also influence them as they tackle complex questions.

A recent trend in the visualization community is to develop influential tools by conducting design studies in close collaboration with scientists and other expert end users. In this paper I describe a methodology for conducting design studies, and illustrate their effectiveness with an example from my own work.

Visualization Design Studies
A design study is a “project in which visualization researchers analyze a specific real-world problem faced by domain experts, design a visualization system that supports solving this problem, validate the design, and reflect about lessons learned in order to refine visualization design guidelines”[1]. Design studies are often highly collaborative projects that, at their heart, rely on a visualization practitioner acquiring a deep understanding of a problem and exploring the broad space of design solutions. Through progressively refining prototypes based on feedback from the target end-users, effective visualization tools emerge for solving a real-world analysis problem.

The nine-stage framework [1] lays out a practical process for conducting design studies, shown in Figure 1. This framework contains three precondition stages that focus on learning the space of visualization techniques and establishing synergistic collaborations. The inner four core stages describe the process for characterizing a real-world problem followed by the design, implementation, and evaluation of a visualization solution. The final two analysis stages describe steps for analyzing lessons learned during the project and communicating those findings to the visualization research community. The entire process is iterative, and practitioners often jump backwards to earlier stages when refining ideas.

Pathline: A Design Study
To illustrate how the nine-stage framework applies to a real-world visualization design problem, I’ll discuss the process of creating the tool Pathline [2], a visualization system for exploring molecular biology data.

In this project we collaborated with a group of biologists at the Broad Institute of Harvard and MIT who are studying how the same sets of genes can facilitate different kinds of cellular functions in related species. Our collaborators specifically study metabolism in yeast, and they spent several years conducting experiments and collecting data for many different genes in fourteen species of yeast. Their goal is to understand patterns and trends in this data in the context of known information about chemical reactions that occur in cells. The problem they faced at the onset of our collaboration was that existing tools could only look at a subset of this data at a time.

During the discover stage of the project we conducted weekly meetings with the group of 7 biologists over the course of 3 months to learn about their scientific questions, data analysis needs, and existing visualization tools.

From these meetings we came to understand that the group was working with four kinds of data: several dozen graphs that describe cellular reactions that are catalyzed by genes; a large table of quantitative values that describe how much each gene in each species is on or off over time during an experiment; a tree depicting the evolutionary relationship of the fourteen species of yeast; and quantitative correlation values that describe how similar the temporal values are for a specific gene across the species.

We performed a similar analysis on the questions our collaborators were asking and translated them into a list of data analysis tasks. We characterized these tasks as operating at four different levels. Briefly, those tasks, starting at the lowest level, are:
1. Study the gene data over time to observe dynamic trends in the data.
2. Compare a limited number of time series, such as those corresponding to a specific gene across all the species, to characterize meaningful differences.
3. Compare the correlation values across one or more of the reaction graphs to understand where the genes behave the same, and differently, in the species.
4. Compare multiple correlation values to validate which correlation metric is the most informative for the specific analysis needs of the group.

This data and task abstraction [3] served as the input for the design stage. In this stage we focused our designs around the idea of encoding quantitative values with a spatial encoding channel instead of other, less effective channels such as color [4]. This means that in-

the discovery of a previously unknown evolutionary event. The design study project that underlies Pathline has several visualization

References and Notes
* This paper was presented as a keynote talk at Arts, Humanities, and Complex Networks – 3rd Leonardo satellite symposium at NetSci2012. See