Bioimage Informatics for Experimental Biology*

ANRV376-BB38-16 ARI 27 March 2009 11:9 Bioimage Informatics for Experimental Biology∗ Jason R. Swedlow,1 Ilya G. Goldberg,2 Kevin W. Eliceiri,3 and the OME Consortium4 1Wellcome Trust Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee DD1 5EH, Scotland, United Kingdom; email: [email protected] 2Image Informatics and Computational Biology Unit, Laboratory of Genetics, National Institute on Aging, IRP, NIH Biomedical Research Center, Baltimore MD 21224; email: [email protected] 3Laboratory for Optical and Computational Instrumentation, University of Wisconsin at Madison, Madison, Wisconsin 53706; email: [email protected] 4http://openmicroscopy.org/site/about/development-teams Annu. Rev. Biophys. 2009. 38:327–46 Key Words The Annual Review of Biophysics is online at microscopy, file formats, image management, image analysis, image biophys.annualreviews.org processing This article’s doi: 10.1146/annurev.biophys.050708.133641 Abstract Copyright c 2009 by Annual Reviews. Over the past twenty years there have been great advances in light mi- Annu. Rev. Biophys. 2009.38:327-346. Downloaded from arjournals.annualreviews.org All rights reserved croscopy with the result that multidimensional imaging has driven a rev- by NORTHWESTERN UNIVERSITY - Evanston Campus on 05/17/10. For personal use only. 1936-122X/09/0609-0327$20.00 olution in modern biology. The development of new approaches of data ∗The U.S. Government has the right to retain a acquisition is reported frequently, and yet the significant data manage- nonexclusive, royalty-free license in and to any ment and analysis challenges presented by these new complex datasets copyright covering this paper. remain largely unsolved. As in the well-developed field of genome bioinformatics, central repositories are and will be key resources, but there is a critical need for informatics tools in individual laboratories to help manage, share, visualize, and analyze image data. In this article we present the recent efforts by the bioimage informatics community to tackle these challenges, and discuss our own vision for future development of bioimage informatics solutions. 327 ANRV376-BB38-16 ARI 27 March 2009 11:9 Contents INTRODUCTION .................. 328 SUPPORT FOR DATA FLEXIBLE INFORMATICS TRANSLATION—BIO- FOR EXPERIMENTAL FORMATS ........................ 334 BIOLOGY ........................ 329 Utility ............................. 334 Proprietary File Formats ........... 329 Modularity ........................ 335 Experimental Protocols ............ 330 Flexibility ......................... 335 Image Result Management ......... 330 Extensibility ....................... 335 Remote Image Access .............. 330 DATA MANAGEMENT Image Processing and Analysis ...... 330 APPLICATIONS: OME Distributed Processing ............. 330 AND OMERO .................... 336 Image Data and Interoperability .... 330 OMERO ENHANCEMENTS: TOWARDS BIOIMAGE BETA3ANDBETA4.............. 338 INFORMATICS .................. 331 OMERO.blitz ..................... 338 BUILDING BY AND FOR Structured Annotations............. 339 THE COMMUNITY ............. 331 OMERO.search ................... 339 DELIVERING ON THE PROMISE: OMERO.java ...................... 339 STANDARDIZED FILE OMERO.editor.................... 339 FORMATS VERSUS “JUST PUT OMERO.web...................... 339 IT IN A DATABASE” ............. 332 OMERO.scripts ................... 340 OME: A COMMUNITY-BASED OMERO.fs ........................ 340 EFFORT TO DEVELOP IMAGE WORKFLOW-BASED DATA INFORMATICS TOOLS ......... 333 ANALYSIS: WND-CHARM ...... 340 THE FOUNDATION: THE OME USABILITY ......................... 342 DATA MODEL ................... 334 SUMMARY AND FUTURE OME FILE FORMATS............... 334 IMPACT.......................... 342 INTRODUCTION a five-dimensional structure—space, time, and channel. High content screening (HCS) and Modern imaging systems have enabled a new fluorescence lifetime, polarization, and corre- kind of discovery in cellular and developmen- lation are all examples of new modalities that tal biology. With spatial resolutions running further increase the complexity of the mod- from millimeters to nanometers, analysis of cell Annu. Rev. Biophys. 2009.38:327-346. Downloaded from arjournals.annualreviews.org ern microscopy dataset. However, multidimen- and molecular structure and dynamics is now by NORTHWESTERN UNIVERSITY - Evanston Campus on 05/17/10. For personal use only. sional data acquisition generates a significant routinely possible across a range of biological data problem: A typical four-year project gen- systems. The development of fluorescent re- erates hundreds of gigabytes of images, per- porters, most notably in the form of geneti- haps on many different proprietary data ac- cally encoded fluorescent proteins (FPs), com- quisition systems, making hypothesis-driven bined with increasingly sophisticated imaging research dependent on data management, systems has enabled direct study of molecular visualization, and analysis. structure and dynamics (6, 52). Cell and tissue Bioinformatics is a mature science that imaging assays have scaled to include all three forms the cornerstone of much of modern spatial dimensions, a temporal component, and HCS: high content biology. Modern biologists routinely use ge- the use of spectral separation to measure mul- screening nomic databases to inform their experiments. tiple molecules such that a single image is now 328 Swedlow et al. ANRV376-BB38-16 ARI 27 March 2009 11:9 In fact these databases are well-crafted multi- to manage a single laboratory’s data that are layered applications that include defined data comparable to that used to deliver genomic se- structures, application programming interfaces quence applications and databases to the whole Application (APIs), and use standardized user interfaces to community? Why can’t the tools used in ge- programming enable querying, browsing, and visualization of nomics be immediately adapted to imaging? interface (API): an the underlying genome sequences. These fa- Are image informatics tools from other fields interface providing cilities serve as a great model of the sophis- appropriate for biological microscopy? In this one software program tication necessary to deliver complex, hetero- article, we address these questions, discuss the or library easy access to its functionality geneous datasets to bench biologists. However, requirements for successful image informatics with full knowledge of most genomic resources work on the basis of de- solutions for biological microscopy, and con- the underlying code or fined data structures with defined formats and sider the future directions that these applica- data structures known identifiers that all applications can ac- tions must take to deliver effective solutions for cess (they also employ expert staff to monitor biological microscopy. systems and databases, a resource that is rarely available in individual laboratories). There is no single agreed data format, but a defined num- FLEXIBLE INFORMATICS FOR ber are used in various applications, depend- EXPERIMENTAL BIOLOGY ing on the exact application (e.g., FASTA and Experimental imaging data are by their very EMBL files). These files are accessed through a nature heterogeneous and dynamic. The chal- number of defined software libraries that trans- lenge is to capture the evolving nature of an late data into defined data structures that can be experiment in data structures that by their very used for further analysis and visualization. Be- nature are specifically typed and static, for later cause a relatively small number of sequence data recall, analysis, and comparison. Achieving this generation and collation centers exist, standards goal in imaging applications means solving a have been relatively easy to declare and support. number of problems. Nonetheless, a key to the successful use of these data was the development of software applications, designed for use by bench biologists as Proprietary File Formats well as specialist bioinformaticists, that enabled There are over 50 different proprietary file querying and discovery based on genomic data formats (PFFs) used in commercial and aca- held by and served from central data resources. demic image acquisition software packages for Given this paradigm, the same facility light microscopy (34). This number signifi- should in principle be available for all biological cantly increases if electron microscopy, new imaging data (as well as proteomics and, soon, HCS systems, tissue imaging systems, and deep sequencing). In contrast to centralized other new modes of imaging modalities are genomics resources, in most cases, these meth- included. Regardless of the specific applica- Annu. Rev. Biophys. 2009.38:327-346. Downloaded from arjournals.annualreviews.org ods are used for defined experiments in indi- tion, almost all store data in their own PFFs. by NORTHWESTERN UNIVERSITY - Evanston Campus on 05/17/10. For personal use only. vidual laboratories or facilities, and the number Each of these formats includes the binary data of image datasets recorded by a single postdoc- (i.e., the values in the pixels) and the meta- toral fellow (hundreds to thousands) can easily data (i.e., the data that describes the binary rival the number of genomes that have been data). Metadata include physical pixel sizes, sequenced to date. For the continued develop- time stamps, spectral ranges, and any other ment and application of experimental biology measurements or values required

Bioimage Informatics for Experimental Biology*

Bioimage Analysis Tools

Other Departments and Institutes Courses 1

Bioimage Informatics

Bioimage Informatics 2019

A Bioimage Informatics Platform for High-Throughput Embryo Phenotyping James M

View of Above Observations and Arguments, the Fol- Scopy Environment) [9]

Principles of Bioimage Informatics: Focus on Machine Learning of Cell Patterns

Introduction to Bio(Medical)- Informatics

Bioimagexd: an Open, General-Purpose and High-Throughput Image-Processing Platform

Bioimage Informatics for Plant Sciences Blobs and Curves: Object-Based Colocalisation for Plant Cells

A Short Course on Bioimage Informatics Lecture 1: Basic Concepts and Techniques

Phenotypic Image Analysis Software Tools for Exploring and Understanding Big Image Data from Cell-Based Assays