A Call for Public Archives for Biological Image Data Public Data Archives Are the Backbone of Modern Biological Research

comment A call for public archives for biological image data Public data archives are the backbone of modern biological research. Biomolecular archives are well established, but bioimaging resources lag behind them. The technology required for imaging archives is now available, thus enabling the creation of the frst public bioimage datasets. We present the rationale for the construction of bioimage archives and their associated databases to underpin the next revolution in bioinformatics discovery. Jan Ellenberg, Jason R. Swedlow, Mary Barlow, Charles E. Cook, Ugis Sarkans, Ardan Patwardhan, Alvis Brazma and Ewan Birney ince the mid-1970s it has been possible image data to encourage both reuse and in automation and throughput allow this to analyze the molecular composition extraction of knowledge. to be done in a systematic manner. We Sof living organisms, from the individual Despite the long history of biological expect that, for example, super-resolution units (nucleotides, amino acids, and imaging, the tools and resources for imaging will be applied across biological metabolites) to the macromolecules they collecting, managing, and sharing image and biomedical imaging; examples in are part of (DNA, RNA, and proteins), data are immature compared with those multidimensional imaging8 and high- as well as the interactions among available for sequence and 3D structure content screening9 have recently been these components1–3. The cost of such data. In the following sections we reported. It is very likely that imaging data measurements has decreased remarkably outline the current barriers to progress, volume and complexity will continue to over time, and technological development the technological developments that grow. Second, the emergence of powerful has widened their scope from investigations could provide solutions, and how those computer-based approaches for image of gene and transcript sequences (genomics/ infrastructure solutions can meet the interpretation, quantification, and modeling transcriptomics) to proteins (proteomics) outlined need. has vastly improved the ability to process and metabolic products (metabolomics). Most important, “imaging” is not a single large image datasets. These approaches However, life cannot be understood simply technology but an umbrella term for diverse are underpinned by the rapid evolution from measurements of the presence of technologies that create spatiotemporal of algorithms, data storage, and cloud biomolecules; scientists also need to know maps of biological systems at different scales computing technologies. Wider adoption of when and where these biomolecules are and resolutions (Table 1). This is further such technology has also led to substantial present and how they interact inside cells complicated by the large size and diversity of decreases in cost, thus making the and organisms. This requires mapping image datasets and the concomitant diversity establishment of integrated public bioimage of their spatiotemporal distribution, of the associated computational analysis data resources feasible and highly valuable. structural changes, and interactions in tools. Achieving the level of integration for We believe that the establishment of a biological systems. imaging data that is routine in biomolecular biological image archive, with coordinated Observation and measurement of the data will require coordination of data data resources brokering image data ‘when’ and ‘where’ in living organisms resources and community approaches deposition and further analysis, will provide predates chemical measurements by more that harmonizes measurements across community momentum and a strong than three centuries: such observation spatiotemporal scales, imaging modalities, incentive to harmonize both measurements began with light microscopy4 and was and data formats. In specific areas, the and analysis approaches. augmented centuries later with diverse ‘new’ biological imaging community has technologies such as electron microscopy, tackled a number of these challenges, as Data archives and added-value X-ray imaging, electron beam scattering, exemplified by coordinated data-format databases and magnetic resonance imaging. These reading schemes5 and individual examples There are two distinct types of biological direct physical measurements range from of harmonized measurements across scales; databases. Data archives are long-lasting the atomic scale to the whole-organism scale however, there is far more potential that data stores with the dual goals of (1) and have a striking dual use: they provide could be achieved through data accessibility, faithfully representing and efficiently quantitative measurements of molecular reuse, and integration. storing experimental data and supporting structure, composition, and dynamics across Recent breakthroughs in imaging metadata, thus preserving the scientific many spatial and temporal scales and, in technology and in computer science now record, and (2) making these data easily parallel, powerful visual representations of make it both feasible and of the utmost searchable by and accessible to the scientific biological structures and processes for the importance to address this challenge. community. An archive serves as an scientific community and the wider public. First, the resolution revolutions in light authoritative public resource for data, but it With the huge expansion of imaging at and electron microscopy (recognized with does not aim to synthesize datasets or make all levels made possible by revolutionary the Nobel Prize in chemistry in 20146 and value judgments beyond assuring adherence technologies including cryo- and volume- 20177) now routinely allow molecular to standards and quality. Archives make it electron microscopy, super-resolution light identification and position and structure possible to connect different datasets on the microscopy, and light-sheet microscopy, the determination in imaging data from basis of common standardized elements, opportunities for research and biomedical biological systems, rapidly bringing the such as genes, molecules, and publications. insights have never been greater. Delivering where and when of molecular structure A typical example of a biological data on this potential requires open sharing of and function within reach. New advances archive is the European Nucleotide NATURE METHODS | VOL 15 | NOVEMBER 2018 | 849–854 | www.nature.com/naturemethods 849 comment Table 1 | Biological scales of imaging Scale (unit) Imaging technology Use Molecular (angstrom) Single-particle cryo-electron microscopy (EM) and electron Structural analysis, molecular function tomography averaging Molecular machines Cryo-EM, super-resolution light microscopy (SRM) Biochemistry, molecular mechanisms (nanometer) Cells (micrometer) Transmission EM, volume EM, light microscopy Cellular morphology, activity within cells, mechanism (wide-field, confocal, SRM), electron tomography, 3D scanning EM, soft X-ray tomography Tissues (millimeter) Volume EM, scanning EM, light microscopy (multiphoton, Protein localization, tissue morphology and anatomy, light sheet, OPT, etc.), X-rays (micro-CT), fluorescence interactions between cells imaging, mass spectrometry imaging Organism/organ Photography, X-rays, magnetic resonance imaging, optical Mechanistic understanding of development and disease (centimeter) tomography technologies, computerized tomography, luminescence imaging Imaging is used to understand a range of phenomena at different size and time scales. In general, image capture at different scales uses different technologies and records different types of metadata. Archive10, which stores nucleotide phenotype, and PhenoImageShare12, which A clear separation between data archives sequences. Data archives often have a single uses defined ontologies to integrate image and added-value databases is important global scope, even if they are provided by a datasets on the basis of phenotype. for ensuring the smooth flow of data from distributed organization worldwide. We propose the creation of a central the experimentalists generating the data The second type, referred to as added- ‘BioImage Archive’ that would provide the to the public archives. If journals make it value databases, are synthetic: they enrich foundation for existing and new future a requirement that image data supporting and combine different datasets through added-value databases. Data pipelines from a publication must be submitted to the well-designed analysis, expert curation, this archive would feed databases, and at BioImage Archive, the data-submission and, where possible, meta-analysis. They the same time maximize the possibilities for process must be straightforward and rapid, typically provide integrated information knowledge extraction and new discoveries and should produce a unique identifier for and biological knowledge for a community by allowing rapid data aggregation and citation of the data. Added-value databases, of users. They may also include advanced cross-discipline comparisons. Given the in contrast, may want to carefully curate the functionalities such as question-oriented currently available and rapidly growing submitted data, and often require additional searches and queries, cross-comparison image datasets in the Electron Microscopy quality control, updates to ontologies, or of datasets, and advanced mining and Data Bank (EMDB)13, Electron Microscopy reprocessing of the data. The separation of visualization. Examples of added-value Public Image Archive

A Call for Public Archives for Biological Image Data Public Data Archives Are the Backbone of Modern Biological Research

3 on the Nature of Biological Data

Visualization of Biomedical Data

When Biology Gets Personal: Hidden Challenges of Privacy and Ethics in Biological Big Data

From Big Data Analysis to Personalized Medicine for All: Challenges and Opportunities Akram Alyass1, Michelle Turcotte1 and David Meyre1,2*

Personalized Medicine in Big Data

Introduction to Bioinformatics

Introduction to Computational Biology & Bioinformatics – Part I

Application of Data Mining in Bioinformatics

Deep Learning in Mining Biological Data

Machine Learning and Complex Biological Data Chunming Xu1,2* and Scott A

From Big Data Analysis to Personalized Medicine for All: Challenges and Opportunities Akram Alyass1, Michelle Turcotte1 and David Meyre1,2*

Image Data Resource: a Bioimage Data Integration and Publication Platform