Supplement on Visualizing Biological Data
Biology is a visually grounded scientific discipline, from the way data is collected and analyzed to the manner in which the results are communicated to others. Visualization methods have advanced greatly from the hand-drawn pictures found in scientific publications before the twentieth century and now rely almost exclusively on computer-based visualization tools. But the similarity of modern computer-generated phylogenetic trees to their ancestral hand-drawn evolutionary trees illustrates the challenges involved in developing novel visualization methods that present information in a self-evident way and yet can handle the demands placed on them by modern methods of data generation.

The exponentially increasing amount of scientific data is taxing the abilities of scientists to make sense of it all and communicate it to others in a concise and meaningful way. Although the computers that facilitate this data deluge also help handle it, it is critical that scientists be able to participate intimately in the analysis steps using qualitative and quantitative abstractions of the underlying data.

This supplement describes data visualization methods and tools and how these methods are adapting to the challenges accompanying modern biology. A Commentary introduces the topic and summarizes the general challenges. Five Reviews describe the visualization approaches and software tools that biologists use for, respectively, data visualization of genomes, alignments and phylogenies, image-based data, macromolecular structures and systems biology data.

Each review highlights a recommended fraction of the available tools. Because these tools can be very specialized and the writers themselves are developers of some of the tools, there is little comparative assessment. Instead, the reviews focus more on the challenges and methods behind the tools. The tools themselves, ranging from simple stand-alone software to complex integrated software packages, are conveniently listed in tables within each review, and links are provided so that readers may easily access the tools and evaluate which ones best meet their specific needs.

Daniel Evanko

The cover image shows a range of data visualizations currently used by life scientists. Source images come from figures in the Nature Methods supplement “Visualizing biological data” and from Nature Cell Biology and Nature Biotechnology. Cover design by Seán O’Donoghue and Bang Wong.
Supplement Foreword p193

CONTENTS

Visualizing biological data—now and in the future
S I O’Donoghue, A-C Gavin, N Gehlenborg, D S Goodsell, J-K Hériché, C B Nielsen, C North, A J Olson, J B Procter, D W Shattuck, T Walter & B Wong

Visualizing genomes: techniques and challenges
C B Nielsen, M Cantor, I Dubchak, D Gordon & T Wang

Visualization of multiple alignments, phylogenies and gene family evolution
J B Procter, J Thompson, I Letunic, C Creevey, F Jossinet & G J Barton

Visualization of image data from cells to organisms
T Walter, D W Shattuck, R Baldock, M E Bastin, A E Carpenter, S Duce, J Ellenberg, A Fraser, N Hamilton, S Pieper, M A Ragan, J E Schneider, P Tomancak & J-K Hériché

Visualization of macromolecular structures
S I O’Donoghue, D S Goodsell, A S Frangakis, F Jossinet, R A Laskowski, M Nilges, H R Saibil, A Schafferhans, R C Wade, E Westhof & A J Olson

Visualization of omics data for systems biology
N Gehlenborg, S I O’Donoghue, N S Baliga, A Goesmann, M A Hibbs, H Kitano, O Kohlbacher, H Neuweger, R Schneider, D Tenenbaum & A-C Gavin

Editor, Nature Methods: Daniel Evanko
Publisher: Veronique Kiermer
Senior Copy Editor: Anita Gould
Managing Production Editor: Ingrid McNamara
Senior Production Editor: Brandy Cafarella
Production Editor: Amanda Crawford
Marketing: Joanna Budukiewicz

NATURE METHODS SUPPLEMENT | VOL.7 NO.3s | MARCH 2010
© 2010 Nature America, Inc. All rights reserved.

COMMENTARY

Visualizing biological data—now and in the future

Seán I O’Donoghue¹, Anne-Claude Gavin¹, Nils Gehlenborg²,³, David S Goodsell⁴, Jean-Karim Hériché¹, Cydney B Nielsen⁵, Chris North⁶, Arthur J Olson⁴, James B Procter⁷, David W Shattuck⁸, Thomas Walter¹ & Bang Wong⁹

Methods and tools for visualizing biological data have improved considerably over the last decades, but they are still inadequate for some high-throughput data sets. For most users, a key challenge is to benefit from the deluge of data without being overwhelmed by it. This challenge is still largely unfulfilled and will require the development of truly integrated and highly usable tools.

Computer-based visualization is widely used in biology to help understand and communicate data, to generate ideas and to gain insight into biological processes. This collection of reviews examines the key methods now being used to visualize genomes¹, alignments and phylogenies², macromolecular structures³, systems biology data⁴ and image-based data⁵. Here, we outline several common trends, challenges and recent advances that suggest the nature of future visualization in biology.

Visualization goes mainstream

Twenty years ago, only experts could create computer images of a protein structure at atomic detail, a large phylogenetic tree, or a complex biochemical pathway. Today, software tools for creating these images are widely available and widely used. Of the different visualization areas in biology, molecular graphics³ is perhaps the most mature, and as a result, molecular graphic images are widely used in textbooks, presentations and popular media. Other fields, such as genome visualization¹, are much younger; however, even here, molecular biologists have a rich toolbox of visualization software at their disposal, many of these tools amenable to use by non-experts¹.

A main reason for the increased accessibility and use of visualization software has been the advances in computer hardware and network access. Many visualization tasks that previously required expensive and specialized hardware can now be easily managed with a standard personal computer. However, an equally important factor has been the development of a wide range of methods and tools specialized in visualizing specific kinds of biological data. In this Supplement, we discuss over 200 tools selected from the much greater number now available. This diversity of tools can be confusing, but it is probably unavoidable, given the diverse nature of the biosciences. In fact, in many cases, biologists still find that their exact requirements are not met by current tools and often have to create custom solutions. This has helped spur a growing trend to allow reuse of visualization software, either by means of open source software libraries (for example, http://www.vtk.org/) or by means of architectures specifically designed to allow extensions (for example, Cytoscape⁶).

Integration is improving

In the past, visualization tools were typically stand-alone programs designed to view data from a single experiment. In contrast, many of today’s tools are integrated with remote databases and provide visualizations that integrate data from multiple sources. For instance, Jalview⁷, a popular tool for editing multiple sequence alignments, can connect to multiple data sources and displays not

In addition, tools are increasingly being designed to interoperate directly with other visualization and analysis tools. Such interoperation can enable, for example, simultaneous interactive visualization of a multiple sequence alignment with corresponding three-dimensional structures (Procter et al.² and O’Donoghue et al.³), or of a network with corresponding heat maps, profile plots or phylogenetic trees and dendrograms (Gehlenborg et al.⁴).

Finally, many of today’s visualization tools can be either directly embedded into, or launched from, web pages; and such tools are being used to construct integrated web applications for data mining and browsing, often using multiple visualization tools. For example, the UCSC Genome Browser⁸ shows genomic sequences assembled from many laboratories and provides access to a diverse range of related data, including multiple sequence alignments among sequences from similar organisms, three-dimensional structures and in situ hybridization images.

The improved integration in visualization tools has been helped greatly by a trend toward increased consolidation of experimental data. An exemplary case of this trend is macromolecular three-dimensional structure: almost all experimentally determined structures are consolidated in a single resource (wwPDB⁹). Unfortunately, such consolidation is still the exception: it is more typical in biology to have equivalent data distributed over many resources. In the case of image data from high-throughput experiments, most of these data are never made

¹European Molecular Biology Laboratory, Heidelberg, Germany. ²European Bioinformatics Institute, Cambridge, UK. ³Graduate School of Life Sciences, University of Cambridge, Cambridge, UK. ⁴The Scripps Research Institute, La Jolla, California, USA. ⁵British Columbia Cancer Agency, Genome Sciences Centre, Vancouver, British Columbia, Canada. ⁶Virginia Tech, Blacksburg, Virginia, USA. ⁷School of Life Sciences Research, College of Life Sciences, University of Dundee, Dundee, UK. ⁸Laboratory of Neuro Imaging, University of California, Los Angeles, California, USA. ⁹Broad Institute of MIT & Harvard, Cambridge,