<<

comment A call for public archives for biological image data Public data archives are the backbone of modern biological research. Biomolecular archives are well established, but bioimaging resources lag behind them. The technology required for imaging archives is now available, thus enabling the creation of the frst public bioimage datasets. We present the rationale for the construction of bioimage archives and their associated databases to underpin the next revolution in discovery. Jan Ellenberg, Jason R. Swedlow, Mary Barlow, Charles E. Cook, Ugis Sarkans, Ardan Patwardhan, Alvis Brazma and Ewan Birney

ince the mid-1970s it has been possible image data to encourage both reuse and in automation and throughput allow this to analyze the molecular composition extraction of knowledge. to be done in a systematic manner. We Sof living organisms, from the individual Despite the long history of biological expect that, for example, super-resolution units (nucleotides, amino acids, and imaging, the tools and resources for imaging will be applied across biological metabolites) to the macromolecules they collecting, managing, and sharing image and biomedical imaging; examples in are part of (DNA, RNA, and proteins), data are immature compared with those multidimensional imaging8 and high- as well as the interactions among available for sequence and 3D structure content screening9 have recently been these components1–3. The cost of such data. In the following sections we reported. It is very likely that imaging data measurements has decreased remarkably outline the current barriers to progress, volume and complexity will continue to over time, and technological development the technological developments that grow. Second, the emergence of powerful has widened their scope from investigations could provide solutions, and how those computer-based approaches for image of gene and transcript sequences (genomics/ infrastructure solutions can meet the interpretation, quantification, and modeling transcriptomics) to proteins (proteomics) outlined need. has vastly improved the ability to process and metabolic products (metabolomics). Most important, “imaging” is not a single large image datasets. These approaches However, life cannot be understood simply technology but an umbrella term for diverse are underpinned by the rapid evolution from measurements of the presence of technologies that create spatiotemporal of algorithms, data storage, and cloud biomolecules; scientists also need to know maps of biological systems at different scales computing technologies. Wider adoption of when and where these biomolecules are and resolutions (Table 1). This is further such technology has also led to substantial present and how they interact inside cells complicated by the large size and diversity of decreases in cost, thus making the and organisms. This requires mapping image datasets and the concomitant diversity establishment of integrated public bioimage of their spatiotemporal distribution, of the associated computational analysis data resources feasible and highly valuable. structural changes, and interactions in tools. Achieving the level of integration for We believe that the establishment of a biological systems. imaging data that is routine in biomolecular biological image archive, with coordinated Observation and measurement of the data will require coordination of data data resources brokering image data ‘when’ and ‘where’ in living organisms resources and community approaches deposition and further analysis, will provide predates chemical measurements by more that harmonizes measurements across community momentum and a strong than three centuries: such observation spatiotemporal scales, imaging modalities, incentive to harmonize both measurements began with light microscopy4 and was and data formats. In specific areas, the and analysis approaches. augmented centuries later with diverse ‘new’ biological imaging community has technologies such as electron , tackled a number of these challenges, as Data archives and added-value X-ray imaging, electron beam scattering, exemplified by coordinated data-format databases and magnetic resonance imaging. These reading schemes5 and individual examples There are two distinct types of biological direct physical measurements range from of harmonized measurements across scales; databases. Data archives are long-lasting the atomic scale to the whole-organism scale however, there is far more potential that data stores with the dual goals of (1) and have a striking dual use: they provide could be achieved through data accessibility, faithfully representing and efficiently quantitative measurements of molecular reuse, and integration. storing experimental data and supporting structure, composition, and dynamics across Recent breakthroughs in imaging metadata, thus preserving the scientific many spatial and temporal scales and, in technology and in computer science now record, and (2) making these data easily parallel, powerful visual representations of make it both feasible and of the utmost searchable by and accessible to the scientific biological structures and processes for the importance to address this challenge. community. An archive serves as an scientific community and the wider public. First, the resolution revolutions in light authoritative public resource for data, but it With the huge expansion of imaging at and electron microscopy (recognized with does not aim to synthesize datasets or make all levels made possible by revolutionary the Nobel Prize in chemistry in 20146 and value judgments beyond assuring adherence technologies including cryo- and volume- 20177) now routinely allow molecular to standards and quality. Archives make it electron microscopy, super-resolution light identification and position and structure possible to connect different datasets on the microscopy, and light-sheet microscopy, the determination in imaging data from basis of common standardized elements, opportunities for research and biomedical biological systems, rapidly bringing the such as genes, molecules, and publications. insights have never been greater. Delivering where and when of molecular structure A typical example of a biological data on this potential requires open sharing of and function within reach. New advances archive is the European Nucleotide

Nature Methods | VOL 15 | NOVEMBER 2018 | 849–854 | www.nature.com/naturemethods 849 comment

Table 1 | Biological scales of imaging Scale (unit) Imaging technology Use Molecular (angstrom) Single-particle cryo-electron microscopy (EM) and electron Structural analysis, molecular function tomography averaging Molecular machines Cryo-EM, super-resolution light microscopy (SRM) Biochemistry, molecular mechanisms (nanometer) Cells (micrometer) Transmission EM, volume EM, light microscopy Cellular morphology, activity within cells, mechanism (wide-field, confocal, SRM), electron tomography, 3D scanning EM, soft X-ray tomography Tissues (millimeter) Volume EM, scanning EM, light microscopy (multiphoton, Protein localization, tissue morphology and anatomy, light sheet, OPT, etc.), X-rays (micro-CT), fluorescence interactions between cells imaging, mass spectrometry imaging Organism/organ Photography, X-rays, magnetic resonance imaging, optical Mechanistic understanding of development and disease (centimeter) tomography technologies, computerized tomography, luminescence imaging

Imaging is used to understand a range of phenomena at different size and time scales. In general, image capture at different scales uses different technologies and records different types of metadata.

Archive10, which stores nucleotide phenotype, and PhenoImageShare12, which A clear separation between data archives sequences. Data archives often have a single uses defined ontologies to integrate image and added-value databases is important global scope, even if they are provided by a datasets on the basis of phenotype. for ensuring the smooth flow of data from distributed organization worldwide. We propose the creation of a central the experimentalists generating the data The second type, referred to as added- ‘BioImage Archive’ that would provide the to the public archives. If journals make it value databases, are synthetic: they enrich foundation for existing and new future a requirement that image data supporting and combine different datasets through added-value databases. Data pipelines from a publication must be submitted to the well-designed analysis, expert curation, this archive would feed databases, and at BioImage Archive, the data-submission and, where possible, meta-analysis. They the same time maximize the possibilities for process must be straightforward and rapid, typically provide integrated information knowledge extraction and new discoveries and should produce a unique identifier for and biological knowledge for a community by allowing rapid data aggregation and citation of the data. Added-value databases, of users. They may also include advanced cross-discipline comparisons. Given the in contrast, may want to carefully curate the functionalities such as question-oriented currently available and rapidly growing submitted data, and often require additional searches and queries, cross-comparison image datasets in the Electron Microscopy quality control, updates to ontologies, or of datasets, and advanced mining and Data Bank (EMDB)13, Electron Microscopy reprocessing of the data. The separation of visualization. Examples of added-value Public Image Archive (EMPIAR)14 (Box 1), the archival layer from the added-value layer image databases exist already, such as the and IDR resources and beyond (Table 2), we makes both functions possible, efficient, Image Data Resource11 (IDR; https://idr. see this archive as a response to an urgent and effective. openmicroscopy.org/), which integrates community need, as well as an essential Thus, a functional biological imaging cell and tissue imaging studies on the foundational layer for the accelerated data storage and coordinated data resource basis of genetic or drug perturbations and development of new added-value resources. would include both archival and added-value

Table 2 | Examples of potential high-value datasets Data type Utility and impact Types of users/ Examples of public resources applications Correlative light and electron Link functional information Structural biologists and EMPIAR14 microscopy across spatial and temporal modelers: structural scales models that span spatial and temporal scales Cell and tissue atlases Construction, composition, Educational resources; Allen Brain Atlas (https://www.brain-map.org), Allen Cell and orientation of reference for Explorer (https://www.allencell.org/), Human Protein Atlas biological systems in construction of tissues (https://www.proteinatlas.org), Human Protein Cell Atlas (https:// normal and pathological and/or organisms www.proteinatlas.org), Mitotic Cell Atlas (https://omictools.com/ states mitotic-cell-atlas-tool), model-organism gene expression atlases19 Benchmark datasets Standardized test datasets Algorithm developers, EMDataBank13, BBBC (https://data.broadinstitute.org/bbbc), for the development of new testing systems IDR11, CELL Image Library (http://www.cellimagelibrary.org) algorithms Systematic phenotyping Comprehensive studies of Queries for genes or MitoCheck (http://www.mitocheck.org), SSBD (http://ssbd. cell structure, systems, and inhibitor effects qbic.riken.jp), IMPC (https://www.mousephenotype.org), response PhenoImageShare12

This table is exemplary and is not a comprehensive survey of all imaging datasets.

850 Nature Methods | VOL 15 | NOVEMBER 2018 | 849–854 | www.nature.com/naturemethods comment

BioImage archive biologists. Examples of such images are electron microscopy maps of protein Submission Re-create/Reuse structure, systematic characterizations of proteins in a common cell type, mapping of spatiotemporal expression patterns or phenotypes, and organ or developmental atlases. Such reference images not only will be useful to the research community, but also would serve to demonstrate the value of Added value bioimage data archiving. At the beginning, Other image metadata defining the principles for the identification databases of reference-image datasets will require a scientific review process. Over time, however, image datasets would be expected to attain a ‘highly used’ label based on their actual reuse, similar to citation scores of scientific publications. The scope of the archive would not be tied to any particular imaging technology; rather, it would be defined by the life sciences community on the basis of the likelihood of data reuse. This is an established guiding concept in biomolecular data archiving; for instance, the Gene Expression Omnibus and the Functional Genomics Data Archive 19,20 Research community ArrayExpress accept data from different experimental modalities and generated by different technologies. The acceptance is Fig. 1 | The bioimaging-data archiving ecosystem. The flow of data between the different parts of the based on the data’s relevance to functional community is shown. The BioImaging Archive serves as the central bioimage data repository for the genomics, which in practice is defined by scientific community, while added-value databases consume reference datasets and enable reuse and the goals of the project that generated the integration. Credit: Marina Corral Spence/Springer Nature. data or by the journal where the research paper is published. We list below four key synergistic aspects (Fig. 1). The archival repository deployment of technologies to ensure that functions that the integrated ecosystem of a of images would preferably be supported the growth in the volume of data year to year bioimage data archive and associated added- by an international collaborative effort matches the decrease in disk storage costs; value databases should fulfill: with common standards in the same for DNA sequences, this has required the 1. Promote open data and reproduc- way that DNA archives are part of the development of data-specific compression. ibility of research, following the FAIR International Nucleotide Sequence Database Similar methods of highly effective lossless principles21, allowing authors to provide Collaboration consortium (http://insdc. or near-lossless compression and big-data a full audit trail of their original image org)10. Added-value databases, developed handling are currently being developed for data and analysis methods, and allowing around the archive, would focus on digital images17,18. other interested scientists to explore al- particular biological areas where greater To accelerate the development of the ternative analyses of the same raw data. understanding can be obtained through BioImage Archive, datasets that are likely Making data, materials, and methods systematic integration of images, just as the to be reused and of high value to the used in scientifc research available to Ensembl database and genome browser15 community should be rapidly identified. other researchers, with no restrictions add value to DNA data, and the Expression So-called reference images, a concept that on their use other than a request for a Atlas16 integrates archived data to elucidate was introduced in a white paper published citation, is a long-standing tradition in gene expression in a biology-centric way. jointly by Euro-BioImaging (https:// the life sciences, as well as an essential www.eurobioimaging.eu), the European bedrock principle of scientifc progress The rationale for a BioImage Archive infrastructure for imaging technologies and and rigor22,23. The BioImage Archive should store and ELIXIR (https://www.elixir-europe.org), and 2. Provide reference data for the research make available imaging datasets from the the European infrastructure that coordinates community. Tese data, provided with molecular to the organism scale (Table 1). life science data resources (https://www. rules of community use and analysis Archived datasets should be directly eurobioimaging.eu/content-news/euro- methods, can prevent the redundant related to the results and figures included bioimaging-elixir-image-data-strategy), production of replica datasets and serve in a publication. Scaling of such an archive are data that have value beyond a single as the basis for added-value resources to the large number of publications that experiment or project because they also can (e.g., atlases). Tey improve the ef- contain bioimage data is made possible serve as a resource for the larger community. fciency of routine experiments and over by new, efficient, object-based storage They must be interpretable by researchers time will constitute a comprehensive systems. A key aspect for the sustainability outside the laboratory that generated scientifc reference-image resource for of large-scale data archiving has been the them and also of general interest for many cells, tissues, organs, and organisms.

Nature Methods | VOL 15 | NOVEMBER 2018 | 849–854 | www.nature.com/naturemethods 851 comment

3. Allow new scientifc discoveries to be more and more the standard case, as many Additionally, journals and funding made with existing data, in particular research funders now include mandatory agencies should exert their influence to via novel or integrated analysis of data- data-management plans in their grants, encourage data submission. The BioImage sets that were originally generated for a for which the archive would again be the Archive should work closely with journals diferent purpose or as a resource. Te trusted partner. and preprint servers to encourage image- feasibility and value of such reanalysis of We have considered the concern that data deposition at the time of submission combined imaging datasets has already a BioImage Archive would be flooded or acceptance for publication. The been demonstrated24. Integrative struc- by enthusiastic depositors offloading image-data sharing infrastructure must ture determination is another example their storage costs to the central resource demonstrate a positive return on investment of such a synergy25. without scientific benefit for anyone beyond to both funders and data submitters. The 4. Accelerate the development of image- the submitting group. Our experience involvement of funders will be crucial to analysis methods using a broad range of running DNA archives is that the ensure that large systematic data-generation of benchmark datasets, thus encourag- requirement that datasets be associated projects commit to depositing their image ing validation and robustness of these with a publication and the inclusion of data and appropriately budget for these computational methods and enabling key quality control steps and management costs in projects. well-structured image-analysis chal- prevent ‘spurious’ use of the archive. The ultimate driver for data submission lenges24,26,27. Importantly, the BioImage Archive would will be scientific impact: sharing and reuse not replace the need for initial local storage of data maximize the impact of research, and A similarly strategic approach to data for capture, quality control, and primary can increase the volume of citations by one archiving has been critical to the success image analysis in scientific projects; or two orders of magnitude32. The BioImage of genomics—indeed, it has amplified the it would, however, provide long-term Archive will ensure the preservation of the value of DNA sequencing data enormously, archiving after publication. As noted above, scientific record while building up a critical as archives allow the comparison of new we propose to start filling the archive with mass of reusable data. Associated added- sequences to all previously archived reference-image datasets, but it is quite likely value databases, as noted above, will build sequences. Although we recognize that that the definition and scope of reference on the archive to provide in-depth curation, biological image data are more complex, images will evolve as the archive grows and annotation, standardization, reanalysis, and context dependent, and multidimensional, its associated databases develop protocols integration of independent datasets. such dependencies are not entirely new for and tools for identifying and integrating molecular archives. Gene-expression data, data from the archive. An evolution of Human image archiving for example, are also context dependent, data-submission standards has occurred There are many potential synergies between multidimensional, and potentially affected with other archives, and discussions with the use of imaging in clinical research by experimental conditions. Three- the imaging community at an international and practice and that in basic biological dimensional macromolecular structures and workshop we convened (details are included research. Both domains require robust, electron microscopy data have been archived in the Acknowledgements) suggest a similar usable tools for data processing and analysis, for 45 and 15 years, respectively, and this has dynamic in the bioimaging community. and there are many opportunities for reuse proven to be tremendously valuable28–30. of methods, algorithms, and software. A central BioImage Archive will prove Easy submission of image data will Because of the extensive use of imaging in just as powerful. drive the value of the archive clinical research and practice, there is also One of the keys to successful sharing is an increasing need to link imaging used for Paying for a BioImage Archive the ease of submitting data to the archives. human phenotyping with genotyping efforts, Economic analysis of EMBL-EBI’s Submission criteria must balance the such as in the UK BioBank33. Although biomolecular data resources31 shows that collection of standardized data and metadata there are analogies to human genome data research efficiency is increased many-fold necessary for dataset interpretation against sharing between researchers (for example, by the availability of reference data (function the risk of overburdening submitters with as enabled by the European Genome- 2)—enough to more than compensate complex technical requirements. Moreover, phenome Archive (https://www.ebi.ac.uk/ for the storage and running of these it is not always obvious what metadata ega)), data sharing in the clinical community resource—and that only open data enable are minimally required, particularly when has different challenges and a different assessment of reproducibility (function dealing with rapidly evolving technologies culture. Additionally, practical factors such 1). An integrated archive will provide a like imaging. The solution to this problem as appropriate consent, ethics approval, long-term home for image data that has is for relevant scientific communities to and data privacy must be addressed before high utility and reduces the need for every rapidly agree on the initial minimum broader sharing protocols can be set up. institution with substantial image-data- requirements and update them as Our recommendation is to use the BioImage generation capabilities to invest in long-term technologies change and science advances, Archive to drive concrete, example-driven archiving efforts. For consortia and projects for example, as in the case of the Genomic discussions and recommendations around planning to systematically generate large Standards Consortium (http://gensc.org), these issues. We aim to (a) continue bringing amounts of imaging data, the archive would which works to develop community-driven the atomic, cellular, tissue, and model become a trusted partner in presenting cost- ‘minimum information’ standards for organism imaging communities together effective and sustainable solutions for image descriptions of genomes, metagenomes, with the clinical community; (b) continue deposition and storage to their funders. and marker genes. However, with the a broad discussion on the merits and As with biomolecular data resources, rapid advances in imaging technology, requirements of comprehensive data sharing the maximum utility is realized when all the underlying data structures must also among clinical researchers, with a particular scientists (whether in academia or industry) evolve with changing requirements, and the focus on factors of known complexity, can freely deposit to the archive and freely submission tools will have to coevolve to such as consent and privacy; and (c) where access and reuse the data. This will be keep up with those changes. possible, incorporate suitably anonymized

852 Nature Methods | VOL 15 | NOVEMBER 2018 | 849–854 | www.nature.com/naturemethods comment

that transnational initiatives that include Box 1 | Archives for the EM revolution scientists, funders, and institutions are best One prominent public imaging resource is in the area of structural biology. Increased placed to build and sustain community detector sensitivity and new methods in electron microscopy have created a new data resources. Indeed, given that the approach to atomic-scale measurement of biological molecules: single-particle cryo- proposed resources will fuel life science electron microscopy. Tis method bridges the gap between the atomic resolution of research globally and not just in Europe, crystallography and that of difraction-limited light microscopy to provide visualization a worldwide community might ultimately of large protein complexes at near-atomic resolution. contribute to their long-term operation. Te structural biology community has enthusiastically embraced these technologies, Connecting European efforts with implementing resources for cryo-electron microscopy imaging (EMDB (http:// international partners is a major objective of emdb-empiar.org/), established at EMBL-EBI in 2002, and EMPIAR (http://empiar. the Global BioImaging Project (http://www. org/), established at EMBL-EBI in 2014) that robustly demonstrate the value of image eurobioimaging.eu/global-bioimaging). resources13,14. EMDB now has more than 6,000 entries, with one-third of these released Euro-BioImaging, Global BioImaging, and in the past two years, and this growth is mirrored by a rapid rise in the number of unique other national and transnational imaging hosts accessing the service, which grew 20-fold to almost 100,000 between 2009 and 2017. networks can serve as powerful advocates Tis rapid growth is unlikely to slow. In the UK alone, the number of high-end for the adoption and use of the bioimage microscopes will increase from 7 in 2016 to nearly 20 in 2018. Worldwide, 50 Titan data ecosystem by scientists, institutions, Krios microscopes were installed between 2008 and 2015, around 100 were installed in journals, and funders. This is already 2016–2017, and even more will come online in 2018 (S. Welsh, personal communication). beginning: as of this writing, Springer Nature is recommending that its authors use IDR for their imaging data. and consented clinical image datasets into easily imported into IDR for inclusion in the the BioImage Archive to help establish the added-value resource. At the time of writing, Conclusions correct submission requirements and overall before the establishment of the BioImage Three parallel developments have together value of these data. These submissions Archive, these two resources held nearly created both the need for an open access may come directly from study authors or 5,000 bioimage datasets and 2.9 million BioImage Archive and the opportunity to eventually from applications such as i2b2 individual images. These developments establish it: (1) the development of new that have been adapted to submit data to are an exemplar for the coordination imaging technologies that generate large, the BioImage Archive’s submission portal34. needed to deliver the overall vision: high-quality image datasets ranging from develop submission protocols, establish molecular and structural information Building a bioimage database system review processes for identifying reference to organismal descriptions; (2) faster and its user communities datasets, assemble imaging datasets, and computers and new computational methods We envisage an integrated BioImage build the publication systems that will be that allow faster, better analysis of these Archive with user-friendly submission the foundation for the mature bioimage images; and (3) the decreasing costs of data systems that support deposition using database system. storage and cloud computing technologies. community-developed data standards, The mature BioImage Archive, like The possibilities for deeper biological interconnected with a growing and those already established for genomics, understanding opened up by new imaging diverse set of added-value databases that transcriptomics, and proteomics, will need technology make the development of the together evolve into a comprehensive to be a large international effort, leveraging BioImaging Archive and the corresponding bioimage database system. We have already contributions, resources, and technology coordinated data resources critical for made progress toward this end with the from the global scientific community. In new opportunities in research, methods establishment of the BioStudies archive and Europe, the Euro-BioImaging consortium development, and training. To deliver IDR. BioStudies enables authors to package provides a framework for imaging-research on the promise of imaging technology, all data supporting a publication, either infrastructure. During Euro-BioImaging’s it is essential that scientists openly share by providing links to data in specialized planning, representatives from almost all biological image data. databases such as the Protein Data Bank EU member states participated in user We envision a BioImage Archive (PDB) or by storing the actual data when a surveys, technological evaluations, and that will store reference images that are dedicated resource does not exist35. It has proof-of-concept tests of a transnational freely available for reuse. These are very a flexible structure that supports metadata bioimaging infrastructure. This activity likely to include any image that has been capture from rapidly evolving technologies, produced plans for image-data archiving formally published, as well as other, curated including imaging. While the standards and community-accessible tools for image image datasets. The archive will, in turn, for bioimaging data submission and a analysis and processing. Euro-BioImaging support a host of added-value image-data dedicated archive are being developed by has recently submitted its application to resources that will enhance the scientific the relevant community, BioStudies fulfills become a European Research Infrastructure value of the archival images through the role of a temporary bioimage data Consortium to obtain independent curation, integrative analysis, and the archive; its ‘working blueprint’ approach international status and implement the plans development of new analytical methods. will help developers design functionality developed in the preparatory phase. The practical experience of the latest for imaging data submissions and access, Euro-BioImaging, the IDR, and the generation of bioimaging data resources determine practical and sufficient metadata strategic collaboration with ELIXIR are shows that data volumes can be technically requirements, and build a scalable technical examples of how national and transnational managed at scale and, importantly, that infrastructure capable of dealing with large efforts can collaborate to deliver the datasets generated by the latest imaging data volumes. We are currently working components of the bioimage data ecosystem technologies can be made FAIR. Examples to connect BioStudies and the IDR so that we have outlined here. Past experience with of reuse of bioimaging data suggest that reference-image data in BioStudies can be resources such as the PDB demonstrates the long-term value of these datasets,

Nature Methods | VOL 15 | NOVEMBER 2018 | 849–854 | www.nature.com/naturemethods 853 comment

just like that of public biomolecular data, 3. Karas, M., Bachmann, D., Bahr, U. & Hillenkamp, F. Int. J. Mass 34. Murphy, S. et al. Genome Res. 19, 1675–1681 (2009). Spectrom. Ion Process 78, 53–68 (1987). 35. Sarkans, U. et al. Nucleic Acids Res. 46, D1266–D1270 (2018). will grow and evolve. The construction 4. Leewenhoeck, A. Philosophical Transactions of the Royal Society of public bioimage data resources is just 14, 568–574 (1684). beginning, and we need to continue to 5. Linkert, M. et al. J. Cell Biol. 189, 777–782 (2010). Acknowledgements expand engagement across bioimaging 6. Te Nobel Prize Organization. Nobelprize.org http://www. We thank the participants (R. Ankerhold, Carl Zeiss Ltd; nobelprize.org/nobel_prizes/chemistry/laureates/2014/ (2014). communities, in particular in clinical J.A. Aston, University of Cambridge; K. Brakspear, MRC; 7. Te Royal Swedish Academy of Sciences. Nobelprize.org J.A. Briggs, EMBL; L.M. Collinson, Francis Crick Institute; imaging, as well as with scientific journals http://www.nobelprize.org/nobel_prizes/chemistry/ B. Fischer, DKFZ; S.E. Fraser, University of Southern and funders. Nevertheless, it is already clear laureates/2017/press.html (2017). California; K. Grünewald, Universität Hamburg; E. Gustin, 8. Schueder, F. et al. Nat. Commun. 8, 2090 (2017). Janssen Pharmaceuticals; S. Isaacson, Wellcome Trust; G. that routine bioimage archiving will deliver 9. Beghin, A. et al. Nat. Methods 14, 1184–1190 (2017). Kleywegt, EMBL-EBI; E. Lundberg, KTH Royal Institute strong scientific and economic returns, as 10. Cochrane, G., Karsch-Mizrachi, I. & Takagi, T. Nucleic Acids Res. of Technology; R.S. McKibbin, BBSRC; T.F. Meehan, shown by the reuse made possible by the 44, D48–D50 (2016). EMBL-EBI; S.J. Newhouse, EMBL-EBI; A.W. Nicholls, GSK; EMPIAR and IDR resources. The potential 11. Williams, E. et al. Nat. Methods 14, 775–781 (2017). 12. Adebayo, S. et al. J. Biomed. Semantics 7, 35 (2016). D.P. O’Regan, MRC; H. Parkinson, EMBL-EBI; J.W. Raff, resulting research discoveries and clinical 13. Patwardhan, A. Acta Crystallogr. D Struct. Biol. 73, University of Oxford; Z. Sachak, BBSRC; A. Sali, University applications make the integration of the 503–508 (2017). of California San Francisco; U. Sarkans, EMBL-EBI; T.R. proposed BioImage Archive and associated 14. Iudin, A., Korir, P. K., Salavert-Torres, J., Kleywegt, G. J. & Schneider, EMBL; P. Tomancak, MPI-CBG; J. Vamathevan, added-value databases an imperative for the Patwardhan, A. Nat. Methods 13, 387–388 (2016). EMBL-EBI) in an international workshop in bioimaging 15. Aken, B. L. et al. Nucleic Acids Res. 45, D635–D642 (2017). convened by the authors in Hinxton, UK, on 23–24 January biological, computational, and, in the future, 16. Papatheodorou, I. et al. Nucleic Acids Res. 46, D246–D251 (2018). 2017 to explore the opportunities and challenges for clinical research communities. ❐ 17. Amat, F. et al. Nat. Protoc. 10, 1679–1696 (2015). biological image archiving. The goal of the workshop was to 18. Pietzsch, T., Saalfeld, S., Preibisch, S. & Tomancak, P. establish the scientific rationale for building public biological Jan Ellenberg1*, Jason R. Swedlow2*, Nat. Methods 12, 481–483 (2015). imaging data resources and the strategic principles that 19. Clough, E. & Barrett, T. Methods Mol. Biol. 1418, 93–110 (2016). Mary Barlow3, Charles E. Cook3, should guide their construction, and much of the discussion 20. Kolesnikov, N. et al. Nucleic Acids Res. 43, D1113–D1116 (2015). in this paper is distilled from dialogue at the workshop. Ugis Sarkans3, Ardan Patwardhan3, 21. Wilkinson, M. D. et al. Sci. Data 3, 160018 (2016). 3 3 22. National Research Council (US) Committee on Responsibilities Alvis Brazma * and Ewan Birney * Competing interests 1 of Authorship in the Biological Sciences. Sharing Publication- European Molecular Biology Laboratory, Heidelberg, related Data and Materials: Responsibilities of Authorship in the The authors declare no competing interests 2 Germany. Division of Computational Biology, School Life Sciences (National Academies Press, Washington, DC, 2003). of Life Sciences, University of Dundee, Dundee, UK. 23. Royal Society. Science as an Open Enterprise (Te Royal Society Open Access Tis article is licensed 3European Molecular Biology Laboratory, European Science Policy Centre, London, 2012). under a Creative Commons At- 24. Ludtke, S. J., Lawson, C. L., Kleywegt, G. J., Berman, H. M. & tribution 4.0 International License, Bioinformatics Institute (EMBL-EBI), Wellcome Chiu, W. Pac. Symp. Biocomput. 2011, 369–373 (2011). which permits use, sharing, adaptation, distribution and Genome Campus, Cambridge, UK. 25. Beck, M. & Hurt, E. Nat. Rev. Mol. Cell Biol. 18, 73–89 (2017). reproduction in any medium or format, as long as you give 26. Ludtke, S. J., Lawson, C. L., Kleywegt, G. J., Berman, H. & Chiu, *e-mail: [email protected]; jrswedlow@dundee. appropriate credit to the original author(s) and the source, ac.uk; [email protected]; [email protected] W. Biopolymers 97, 651–654 (2012). 27. Marabini, R. et al. J. Struct. Biol. 190, 348–359 (2015). provide a link to the Creative Commons license, and 28. Berman, H. M. et al. Curr. Opin. Struct. Biol. 40, 17–22 (2016). indicate if changes were made. Te images or other third Published online: 30 October 2018 29. Lawson, C. L. et al. Nucleic Acids Res. 44, D396–D403 (2016). party material in this article are included in the article’s https://doi.org/10.1038/s41592-018-0195-8 30. Patwardhan, A. & Lawson, C. L. Methods Enzymol. 579, Creative Commons license, unless indicated otherwise in 393–412 (2016). a credit line to the material. If material is not included in References 31. Beagrie, N. & Houghton, J. Te Value and Impact of the European the article’s Creative Commons license and your intended 1. Maxam, A. M. & Gilbert, W. Proc. Natl. Acad. Sci. USA 74, Bioinformatics Institute (EMBL-EBI, Hinxton, UK, 2016). use is not permitted by statutory regulation or exceeds the 560–564 (1977). 32. Wang, X. et al. Scientometrics 103, 555 (2015). permitted use, you will need to obtain permission directly 2. Sanger, F., Nicklen, S. & Coulson, A. R. Proc. Natl. Acad. Sci. USA 33. Allen, N. E., Sudlow, C., Peakman, T. & Collins, R. from the copyright holder. To view a copy of this license, 74, 5463–5467 (1977). Sci. Transl. Med. 6, 224ed4 (2014). visit http://creativecommons.org/licenses/by/4.0/.

854 Nature Methods | VOL 15 | NOVEMBER 2018 | 849–854 | www.nature.com/naturemethods