FEATURE ARTICLE

CUTTING EDGE

Building bridges between cellular and molecular structural biology

Abstract The integration of cellular and molecular structural data is key to understanding the function of macromolecular assemblies and complexes in their in vivo context. Here we report on the outcomes of a workshop that discussed how to integrate structural data from a range of public archives. The workshop identified two main priorities: the development of tools and file formats to support segmentation (that is, the decomposition of a three-dimensional volume into regions that can be associated with defined objects), and the development of tools to support the annotation of biological structures.

DOI: 10.7554/eLife.25835.001

ARDAN PATWARDHAN*, ROBERT BRANDT, SARAH J BUTCHER, LUCY COLLINSON, DAVID GAULT, KAY GRÜNEWALD, COREY HECKSEL†, JUHA T HUISKONEN, ANDRII IUDIN, MARTIN L JONES, PAUL K KORIR, ABRAHAM J KOSTER, INGVAR LAGERSTEDT‡, CATHERINE L LAWSON, DAVID MASTRONARDE, MATTHEW MCCORMICK, HELEN PARKINSON, PETER B ROSENTHAL, STEPHAN SAALFELD, HELEN R SAIBIL, SIRARAT SARNTIVIJAI, IRENE SOLANES VALERO§, SRIRAM SUBRAMANIAM, JASON R SWEDLOW, ILINCA TUDOSE, MARTYN WINN AND GERARD J KLEYWEGT*

*For correspondence: ardan@ebi.ac.uk (AP); [email protected] (GJK)

Present address: †Electron Bio-Imaging Centre, Diamond Light Source Ltd, Didcot, United Kingdom; ‡Computational Chemistry and Cheminformatics, Lilly UK, Windlesham, United Kingdom; §University of Vic - Central University of Catalonia, Barcelona, Spain

Reviewing editor: Werner Kühlbrandt, Max Planck Institute of Biophysics, Germany

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Introduction

To obtain an integrated view of how molecular machinery operates inside cells, biologists are increasingly combining structural data at different length scales, obtained using a range of techniques such as electron tomography, electron microscopy, NMR spectroscopy and X-ray crystallography. Structural data is held in public archives such as the Electron Microscopy Data Bank (EMDB; emdb-empiar.org; Tagari et al., 2002), the Electron Microscopy Public Image Archive (EMPIAR; empiar.org; Iudin et al., 2016), and the Protein Data Bank (PDB; wwpdb.org; Bernstein et al., 1977).

Integration between PDB and EMDB data is based on atomic models in the PDB that have been fitted to or built into EMDB volume maps. For purified biological molecules or larger defined complexes this approach is done routinely. Sequence information from the models can be used to link to other resources such as the Universal Protein Resource (UniProt; .org; UniProt Consortium, 2013). However, atomic models are not always available for a variety of reasons, such as when molecular averaging fails to obtain high-resolution features or the inherently lower resolution associated with molecules being imaged in more complex or even cellular environments. In such cases, the identification of features often relies on prior knowledge or correlation of structural data obtained at different scales.

Once features have been identified, segmentation (defined here as the decomposition of the 3D volume into regions that can be associated with defined objects) can be employed to facilitate and visualise the interpretation of the map. For example, in a recent study the segmentation of electron and soft X-ray tomography reconstructions was used to study leakage and

Patwardhan et al. eLife 2017;6:e25835. DOI: 10.7554/eLife.25835 1 of 11 Feature article Cutting edge Building bridges between cellular and molecular structural biology

Figure 1. Segmentation of Plasmodium falciparum–infected erythrocytes. Soft X-ray tomography shows loss of mechanical integrity of the red cell membrane in the final stages of egress. Panels A-C depict schizonts treated with a selective malarial cGMP-dependent protein kinase G inhibitor (C2), and panels D-F depict schizonts treated with a broad-spectrum cysteine protease inhibitor, E64, which allows parasitophorous vacuole membrane (PVM) rupture but prevents erythrocyte membrane rupture, resulting in merozoites trapped in the blood cell. (A) Slice from tomogram of C2-arrested schizont. (B) Outlines of erythrocyte membrane (red), PVM (yellow), and parasites (cyan) in the tomogram slice in A. (C) 3D rendering of the schizont. The vacuole (yellow) is densely packed with merozoites (cyan) that have been collectively rather than individually rendered, for clarity. The overall height of the cell is ~5 µm. (D) Tomogram slice from an E64-arrested schizont, shown with outlining of membranes in E. Remnants of the PVM are visible. (F) 3D rendering of the schizont. Figure and legend adapted with permission from Hale et al. (2017). Scale bar 1 µm. DOI: 10.7554/eLife.25835.002

breakage of the membranes in erythrocytes infected by Plasmodium falciparum, and documented the dramatic changes in the morphology of cells during egress (Hale et al., 2017). The soft X-ray tomograms provided overviews of the membrane compartments in intact, vitrified cells (Figure 1). It should be noted that the word 'segmentation' may have different interpretations: for example, in whole animal, pre-clinical and medical imaging, segmentation includes a concept of a model that is used for fitting of the features. In this manuscript we limit the definition to the separation of density into distinct sub-domains.

In tomography, where multiple copies of nearly identical objects are found, 3D sub-tomogram averaging and 3D classification may be employed to obtain higher resolution reconstructions. This process often involves combining information from multiple tomograms. Since the higher resolution afforded by sub-tomogram averaging provides more structural detail, displaying sub-tomogram averages at the original tomogram positions and orientations may reveal important information about the organization and distribution of the object within a cellular and functional context. If properly annotated such data can be further mined with other questions in mind by other researchers. For example, researchers recently created composite maps of Lassa virus particles by inserting the sub-tomogram average structure of the Lassa virus glycoprotein spike back into the original tomographic reconstructions, revealing the organisation and copy number of the spikes on the virus surface (Figure 2; Li et al., 2016). Another example revealed the lateral clustering of viral membrane proteins mediating membrane fusion (Maurer et al., 2013).
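As an illustration of what placing a sub-tomogram average back at its original tomogram positions and orientations involves, the following minimal Python sketch applies one rigid transformation per particle to a point defined in the coordinate frame of the average. It is not taken from any of the cited studies: the function names, the use of a rotation about the z axis only, and the two placements are assumptions made purely for illustration, and a real pipeline would use the full rotation and shift determined for each particle.

```python
import math

def rotation_z(angle_deg):
    """Return a 3x3 rotation matrix (nested lists) for a rotation about z."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_transform(rotation, translation):
    """Combine a 3x3 rotation and a 3-vector translation into a 3x4 matrix."""
    return [row + [t] for row, t in zip(rotation, translation)]

def apply_transform(transform, point):
    """Map a point from the average's frame into tomogram coordinates."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in transform)

# Hypothetical placements: the same averaged volume is positioned twice,
# once per particle found in the tomogram.
placements = [
    make_transform(rotation_z(0.0), [10.0, 20.0, 30.0]),
    make_transform(rotation_z(90.0), [40.0, 50.0, 60.0]),
]

# A reference point on the average (for example, the tip of a spike).
spike_tip = (1.0, 0.0, 0.0)
placed = [apply_transform(t, spike_tip) for t in placements]

for index, coords in enumerate(placed):
    print(index, [round(c, 6) for c in coords])
```

Keeping one transformation per copy, rather than one transformed volume per copy, means that the averaged map is stored once and only the small matrices are repeated.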


Figure 2. Arrangement of Lassa virus glycoprotein spikes on the virion surface. Left to right: A slice from a tomographic volume of Lassa viruses, a sub-tomogram average of the glycoprotein spike, and the sub-tomogram average inserted back onto a virus reconstruction. Images adapted from Li et al., 2016 (under a CC BY 4.0 license). DOI: 10.7554/eLife.25835.003

The archiving of segmentation data in EMDB entries was identified as an area requiring urgent attention in previous workshops on "Data-Management Challenges in 3D Electron Microscopy" in 2011 (Patwardhan et al., 2012) and "A 3D Cellular Context for the Macromolecular World" in 2012 (Patwardhan et al., 2014), as was the improved biological annotation of structural data to make it more accessible to the wider biological audience and to enable integration with structural and other bioinformatics resources. Crucially for data integration we need "structured biological annotation" which is here defined as the association of data with identifiers (e.g., accession codes from UniProt) and ontologies taken from well established bioinformatics resources. (Ontologies are formal collections of statements defining concepts, relationships and constraints; for example, the mitochondrial large and small ribosomal subunits are parts of the mitochondrial ribosome which, in turn, is a part of the mitochondrion). To our knowledge, none of the segmentation formats widely used in electron microscopy and related fields currently support structured biological annotation. Furthermore, spatial transformations relating sub-tomograms to their parent tomograms are not currently captured in EMDB. Moreover, wider usage of both segmentation and transformation data by non-expert users is hindered by a plurality of formats.

To discuss and address the challenges of representing and capturing segmentations and transformation data, the Protein Data Bank in Europe (PDBe) organised an expert workshop on "3D Segmentations and Transformations - Building Bridges between Cellular and Molecular Structural Biology" in December 2015. The objectives were:

• To identify data models and formats for representing segmentation and transformation data that could provide support for structured biological annotation, thus facilitating their use by EMDB and enabling data-exchange between different software packages.
• To gain a better understanding of the challenges involved in the annotation of electron microscopy data and develop requirements in terms of tools and strategies to facilitate annotation.

Here we report and discuss the main outcomes of the workshop, which was attended by a range of participants including software developers, users of segmentation software, ontology experts, and experts in structure and data archiving.

Data models and file formats for segmentations and transformations

Prior to the workshop, PDBe developed a draft data model to support segmentations and their annotations in EMDB that could accommodate


segmentation descriptions from a range of existing formats and software packages as well as structured biological annotation. It supported the key features of major segmentation packages such as Amira (www.fei.com/software/amira), IMOD (Kremer et al., 1996) and Chimera (Pettersen et al., 2004), and provided scope for extension and flexibility as the field developed. However, the draft data model did not cover minor features (e.g., surface rendering parameters), especially those that are only relevant in the context of a particular software package. The data model was implemented in an XML schema with the following features:

a. Support for hierarchical segmentation description. This is important for representing segmentations from (semi-)automatic approaches that naturally result in a hierarchal segmentation, such as Segger (which iteratively groups the results of the initial watershed segmentation into a hierarchy; Pintilie et al., 2010).
b. Different representations of segmentations. Contours and simple geometric primitives such as spheres and lines are often used to delineate regions of interest (ROIs) when segmentation is performed manually. In automatic segmentation the segments are typically represented as surface meshes and/or 3D volume masks. In the latter case, run-length encoding and limited bit-depth are commonly used techniques to minimise memory requirements. It could be argued that it would be useful to have only one canonical representation and convert all the individual representations to it. However, representing geometric primitives such as spheres as surface meshes could lead to substantial increases in storage size and decreases in accuracy of the descriptions.
c. Support for externally defined (i.e., as separate files) 3D volume masks. It may be useful to allow separation between the metadata (annotations) and the actual segmentations (e.g., to lessen the burden on tools and web-services that only require the metadata). The data model accommodates links to external files (and locations within these files) for representing segments.
d. Segment colours. In some application areas, colour is used to identify objects of the same kind, so it is important that such information is not lost.

The draft data model was intended primarily for internal use in EMDB. However, the meeting participants strongly favoured a broader scope so that the format could serve the entire biological segmentation field. This would also make it easier to support the development of translators between different formats and possibly contribute to a reduction of the number of formats (or at least prevent further proliferation of formats). Representatives for several major software packages used for segmentation including IMOD (D.M.), Amira (R.B.) and Chimera (Tom Goddard, personal communication) have expressed a commitment to providing read/write capabilities for the developed format if standard libraries are made available.

The draft data model included support for various colour models including RGB, HSV and colour names. Participants argued that it would be sufficient to support only the most commonly used one, namely the RGB model, as the other models can be converted to it.

Participants also noted that it might be useful to allow quantification of the estimated certainty of a biological annotation, for example a score for the agreement between a sub-tomogram average and a corresponding region from an originating tomogram. There may also be alternative biological annotations in various combinations (logical OR, XOR, AND, etc.). The quantification of alternative annotations could become very complex to represent and use, and the participants agreed to initially limit the scope to a single annotation per segment and to let the need for more complex representations be driven by actual use cases.

Concerning the transformations between sub-tomogram averages and tomograms, the participants agreed that this information should be incorporated into the segmentation data model; it simply requires adding support for multiple transformations of the same 3D volume representation. It was agreed that the convention to define affine transformations should be well-defined in terms of the transformation, the order in which they are applied, the direction of the transformation, and the orientations and origins of coordinate systems.

With respect to correlative multi-modal imaging it was recognized that there would eventually be a need to go beyond affine transformations, for example to represent distortions and deformations of slice data, but the participants did not come to a conclusion about a coherent extensible format. Often, a segment consists of multiple spatially transformed copies of the same primitive. This is also relevant for sub-tomogram averages as the same volume is


to be spatially transformed into multiple locations within a tomogram. To accommodate these situations, every segment can be associated with a list of transformations. This representation will also be useful in the context of template matching for describing the transformations between the template and the 3D volume.

The draft data model was developed in XSD (XML Schema Definition). The definition of data models is greatly facilitated by tools that enable GUI-based development of schemas such as Oxygen and XMLSpy. Code generators such as generateDS create object-model wrappers from schemas that enable reading, writing and manipulation of XML files, thus allowing for rapid prototyping. Various XML validators also allow the correctness of a file relative to a schema to be tested. However, concerns were raised about the verbosity of the XML format and the efficiency with which it can be used. Participants proposed that while XML may be the natural format for a schema defined in XSD, it would be useful to consider other more compact and efficient formats such as JSON and HDF5 (a binary format that allows for efficient representation of hierarchal metadata and data in a single container). Both JSON and HDF5 are now widely supported with libraries in most major programming languages, including Python and C/C++, to facilitate reading and writing. To this end, utilities to convert between the XML, JSON and HDF5 representations of the segmentation data model are currently in development at PDBe.

Future format development will be an iterative process involving extensive consultation with relevant stakeholders to obtain consensus in and support from the community of developers, yielding a format that they will support. A "Segmentation and transformation file format working group" has been established by a subset of the workshop participants, and other developers working on segmentation who are interested in joining the group are asked to contact AP.

PDBe has already modified the data model based on the feedback from the meeting, and this will continue in several rounds of consultation with the working group. The schema is versioned to keep track of changes. To facilitate adoption of the format, dubbed EMDB-SFF (SFF=Segmentation File Format), PDBe is developing translators to/from other commonly used formats. The code for these translators is provided as free open source and distributed via the CCP-EM SVN repository. Comments on the schema should be sent to AP.

Structured biological annotation

As previously explained, structured biological annotation is the association of data with identifiers and ontologies taken from well-established bioinformatics resources. The use of structured biological annotation is not common practice in the electron microscopy or structural biology communities. Therefore, ontology experts were invited to the workshop to explain why these are useful and what resources and tools are available for assigning annotations. Use-cases such as mouse imaging data helped to explain the principles and practice of structured biological annotation. By the end of the meeting there was a clearer appreciation of the importance of structured biological annotation for searching and linking imaging data across different scales, between different imaging and structural databases and with other bioinformatics resources.

Structured annotation would enable the seamless integration of structural, imaging and bioinformatics data from different resources, thus making it possible to provide problem-centric views of biology that incorporate structural and imaging data and are easily accessible by the broader biological community (and in contrast to the highly specialised structure-centric resources that are available today and mainly serve domain-specific communities). However, there were concerns that many in the electron microscopy community would find navigating the landscape of ontologies challenging and that this approach would only gain traction in the community if tools were developed to simplify the biological annotation process.

It was also discussed whether annotation should be performed by the depositor or by EMDB curators. While curators could be trained to a high level of expertise in the use of ontologies, they would not necessarily have enough knowledge about the sample and the specifics of the biological system underlying the study. It was concluded that depositors should perform the annotation, with curators overseeing and checking annotations.

Tools for structured biological annotation

Structured biological annotation for electron microscopy will rely on a range of established ontologies such as Gene Ontology (GO; Gene Ontology Consortium, 2008),


Experimental Factor Ontology (EFO; Malone et al., 2010), Protein Ontology (PRO; Natale et al., 2014), Cellular Microscopy Phenotype Ontology (CMPO; Jupp et al., 2016), NCBI organismal classification (NCBITaxon), integrated cross-species ontology for anatomical structures (UBERON; Mungall et al., 2012), imaging modality and sample preparation from Fbbi (Orloff et al., 2013), Foundational Model of Anatomy (FMA) and Cell Ontology (CL; Diehl et al., 2016). It may also include identifiers from resources such as UniProt and the Complex Portal, which in turn contain cross-reference information to other useful standardised vocabularies and common terminology identifiers, such as the OMIM and KEGG (Kanehisa et al., 2016) pathways. This cross-reference information is useful when linking data coded with these terminologies to the ontologies.

Several of these resources provide application programming interfaces (APIs) that can be used to access the information programmatically and provide search functionality. The Samples, Phenotypes and Ontologies Team (SPOT) at the EMBL-EBI has developed tools such as Zooma and the Ontology Lookup Service (OLS; Jupp et al., 2015), which aggregate information from a wide range of ontologies and provide APIs to access these tools. These APIs can be used when building tools for segmentation annotation to provide simplified views and search facilities for ontological terms.

At the workshop, PDBe presented mock-ups of a web-based segmentation annotation tool (SAT; Figure 3). This tool would allow a user to add structured biological annotation to segmentations obtained from a variety of different software packages and then output an annotated segmentation file in EMDB-SFF that could be deposited to EMDB or EMPIAR. Annotation could either be done during deposition, in which case the biological annotation from the segmentation file could be harvested by the deposition system to facilitate the deposition process, or it could be done post deposition. The workflow would consist of: (i) the user uploading segmentation files (there could be several if the segments have been saved as separate files) and the corresponding map (unless it is already released in EMDB or EMPIAR); (ii) conversion to an EMDB-SFF file; (iii) use of a GUI-based interface to view the segmentations overlaid on the map and to select segmentations and add annotation; (iv) output of a fully annotated EMDB-SFF file that could be uploaded to EMDB (Figure 4).

Two different options were presented for how annotation could take place (Figure 3). Many macromolecular systems for which data are deposited in EMDB fall into broad categories such as ribosomes, proteasomes, chaperonins and so on: for each of these categories, and with the added information about taxonomy, lists of likely components could be generated to facilitate annotation (Figure 3A). Similarly for cellular level annotation, lists of cellular components could be used. As it would not be possible to cover every potential scenario with pre-defined lists, the other option is to provide a search facility that offers potentially applicable terms from available ontologies (Figure 3B).

The workshop participants expressed strong support for the development of the SAT and the functionality depicted in the mock-ups but raised concerns about a number of issues: the upload of data to a web server (some users may find it challenging to upload large maps and segmentations); the need to annotate segmentations twice (users would typically add free text annotations in the software used for segmentation and would then need to re-annotate in the SAT); finding the 'right' metadata terms (particularly in cases where a search yields more than one term, and it is not clear which is the most relevant term); annotating a hierarchical segmentation. (The SAT mock-up accommodates annotation on only one level of hierarchy: this might be sufficient in many cases, but it could become problematic as more automated segmentation techniques are developed and their usage expands.)

A desktop version of the SAT would help users concerned about the upload of large amounts of data to a web-server. Another option would be to integrate the functionality for structured biological annotation into existing packages such as IMOD, Chimera and Amira; this would also avoid the problem of users having to annotate the segmentations twice. This alternative would require the development of libraries and widgets that facilitate the use of ontologies and the EMDB-SFF by third parties. For example, the program for segmentation in IMOD already has a 'Name Wizard' plugin that helps the user to choose standardized object names from a CSV file: however, additional development would be needed to provide access to on-line ontologies.

Participants agreed that PDBe should start by developing the web-based SAT because it could reuse a number of components that are already being used in other electron microscopy-related


Figure 3. Mock-up of a possible Segmentation-Annotation Tool (SAT). Image slices are shown with the segmentations overlaid. (A) The top right panel presents a tree that enables the user to select the segment to be annotated, and existing annotations are shown in the middle right panel. The bottom right panel provides pre-defined lists of annotation terms for frequently studied assemblies and complexes. The image in the left panel is adapted from Müller et al. (2014) (under a CC BY 3.0 license). (B) The top right and middle right panels are similar to those in A. The bottom right panel provides a search option to find relevant terms. The image in the left panel is adapted from Santarella-Mellwig et al., 2013 (under a CC BY 4.0 license). DOI: 10.7554/eLife.25835.004

web services (such as the Volume slicer; Salavert-Torres et al., 2016), followed by the desktop version. Once the SAT reaches a certain level of maturity PDBe could work with third-party developers to integrate the annotation functionality into their packages.
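The two annotation routes discussed for the SAT (choosing from a pre-defined list, or searching for a term) can be sketched in a few lines of Python. This is only an illustration of the idea, written under stated assumptions: the field names (`parent_id`, `colour_rgb`, `annotation`), the helper functions and the `EXAMPLE:` identifiers are all hypothetical placeholders. They are not the EMDB-SFF schema, not real ontology accessions, and not the API of Zooma or OLS. In keeping with the scope agreed at the workshop, each segment carries a single annotation and an RGB colour.

```python
import json

# Hypothetical pre-defined term list for one category of assembly.
# The identifiers are placeholders, not real ontology accessions.
PREDEFINED_TERMS = {
    "ribosome": [
        {"label": "large ribosomal subunit", "ontology": "EXAMPLE", "id": "EXAMPLE:0001"},
        {"label": "small ribosomal subunit", "ontology": "EXAMPLE", "id": "EXAMPLE:0002"},
    ],
}

def search_terms(query, terms):
    """Case-insensitive substring search over term labels."""
    q = query.lower()
    return [t for t in terms if q in t["label"].lower()]

def annotate(segment, term):
    """Return a copy of the segment carrying one structured annotation."""
    annotated = dict(segment)
    annotated["annotation"] = {"free_text": segment.get("name", ""), **term}
    return annotated

# A hypothetical segment as it might arrive from segmentation software:
# a free-text name, a parent in the hierarchy and an RGB colour.
segment = {"id": 1, "parent_id": 0, "name": "LSU", "colour_rgb": [0.0, 1.0, 1.0]}

candidates = search_terms("large", PREDEFINED_TERMS["ribosome"])
annotated = annotate(segment, candidates[0])

# Serialise the annotated record, here as JSON.
print(json.dumps(annotated, indent=2))
```

Keeping the free-text name alongside the structured identifier preserves what the user typed in the segmentation software while adding a term that other resources can match on.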


Figure 4. Segmentation-annotation workflow. A user launches the Segmentation-Annotation Tool and uploads segmentations obtained with third-party software. After the segmentation has been annotated with biologically meaningful terms, a segmentation file is written in EMDB-SFF format; this file can be uploaded to the Electron Microscopy Data Bank when the structure is deposited. Once released, the EMDB-SFF file can be used for the integration of structural data between different imaging scales and across resources. The Volume browser mock-up (bottom right) contains images adapted from Bennett et al. (2007) and Bennett et al. (2009) (under a CC0 1.0 license). The 3D rendering was generated from EMDB entry EMD-5020 and PDB entry 3dno (Liu et al., 2008). DOI: 10.7554/eLife.25835.005

By far the greatest challenge is developing the functionality to find the appropriate biological metadata (Malone et al., 2016) and tools such as Zooma and OLS will be useful for this purpose. A "Segmentation annotation working group" has been established by a subset of the workshop participants to provide data sets and use cases to aid the design of the SAT and to help with its testing. Members of the electron microscopy community and related communities who are interested in joining the group are asked to contact AP.

Discussion

The EMDB-SFF data model has undergone a round of updates based on the feedback from the meeting. The development of the file format and the segmentation annotation tools will be iterative, with user-testing and feedback from the working groups being integral parts of the process. The file format and tools are expected to be ready by late 2017, although they might not offer all the features discussed above.

Wide acceptance and support of the EMDB-SFF format by software developers working on segmentations and transformations will be crucial. Providing well-documented open-source tools for working with the format will help in this regard, and the Collaborative Computational Project for Electron cryo-Microscopy (CCP-EM) has committed to distributing these tools, and including them in training events for users and developers. However, the scope of the format is not limited to the cryo-EM field. For example, segmentation is an essential element of the workflow for interpreting data in 3D scanning electron microscopy (3D-SEM; Patwardhan et al., 2014). It will also be possible to provide support for segmentations for other imaging modalities (and also for imaging on other length scales), although the range of biological ontologies and vocabularies will need to be expanded. It should also be possible to support techniques that combine imaging modalities (such as correlative light and electron microscopy), but this will involve extra work on the transformation model.


It was clear from the discussions regarding Ingvar Lagerstedt is in the European Molecular the annotation of segmentations that there are Biology Laboratory, European Bioinformatics Institute, significant language barriers between the fields. Wellcome Genome Campus, Hinxton, United Kingdom Catherine L Lawson is in the Center for Integrative Overcoming these barriers is a prerequisite for Proteomics Research and the Research Collaboratory progress, as is the development of new tools for Structural Bioinformatics, Rutgers, The State that will facilitate annotation. University of New Jersey, Piscataway, United States This workshop was an important milestone in David Mastronarde is in the Department of Molecular, that it defined concrete actionable outcomes to Cellular, and Developmental Biology, University of address the challenges involved in the integra- Colorado, Boulder, United States tion of cellular and molecular structural data in Matthew McCormick is at Kitware, Inc., Carrboro, the public archives. This integration will provide United States researchers with "problem-centric views" of Helen Parkinson is in Molecular Archival Resources, European Molecular Biology Laboratory, European data from many different sources, and will also Bioinformatics Institute, Wellcome Genome Campus, help the wider biological and medical communi- Hinxton, United Kingdom ties by making make structural data more Peter B Rosenthal is in Structural Biology of Cells and accessible. 
Viruses, , London, United Kingdom

Ardan Patwardhan is in Cellular Structure and 3D Bioimaging, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
http://orcid.org/0000-0001-7663-9028

Robert Brandt is in the Visualization Sciences Group, FEI, Mérignac, France

Sarah J Butcher is in the Institute of Biotechnology and the Department of Biosciences, University of Helsinki, Helsinki, Finland

Lucy Collinson is in the Electron Microscopy Science Technology Platform, Francis Crick Institute, London, United Kingdom

David Gault is in the Centre for Gene Regulation and Expression, University of Dundee, Dundee, United Kingdom

Kay Grünewald is in the Division of Structural Biology, Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

Corey Hecksel is in the National Center for Macromolecular Imaging, Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, United States

Juha T Huiskonen is in the Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

Andrii Iudin is in Cellular Structure and 3D Bioimaging, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom

Martin L Jones is in the Electron Microscopy Science Technology Platform, Francis Crick Institute, London, United Kingdom
http://orcid.org/0000-0003-0994-5652

Paul K Korir is in Cellular Structure and 3D Bioimaging, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom

Abraham J Koster is in the Department of Molecular Cell Biology, Leiden University Medical Center, Leiden, The Netherlands

Stephan Saalfeld is at the Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
http://orcid.org/0000-0002-4106-1761

Helen R Saibil is in the Institute of Structural and Molecular Biology, Department of Crystallography, Birkbeck College, London, United Kingdom

Sirarat Sarntivijai is in Molecular Archival Resources, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom

Irene Solanes Valero is in the European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom

Sriram Subramaniam is in the Laboratory for Cell Biology, Center for Cancer Research, National Cancer Institute, Bethesda, United States

Jason R Swedlow is in the Centre for Gene Regulation and Expression and the Division of Computational Biology, University of Dundee, Dundee, United Kingdom

Ilinca Tudose is in Molecular Archival Resources, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom

Martyn Winn is in the Scientific Computing Department, Science and Technology Facilities Council, Research Complex at Harwell, Didcot, United Kingdom

Gerard J Kleywegt is in the Molecular and Cellular Structure Cluster, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom

Author contributions: AP, Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing, Organized workshop and chaired sessions; RB, DG, CH, MLJ, IL, CLL, DM, MM, SSaa, SSar, JRS, IT, Conceptualization, Writing—review and editing; SJB,

Patwardhan et al. eLife 2017;6:e25835. DOI: 10.7554/eLife.25835 9 of 11 Feature article Cutting edge Building bridges between cellular and molecular structural biology

Conceptualization, Writing—review and editing, Chaired workshop sessions; LC, KG, JTH, AJK, PBR, Conceptualization, Writing—review and editing, Chaired workshop session; AI, HRS, SS, Conceptualization, Visualization, Writing—review and editing; PKK, Conceptualization, Software, Visualization, Methodology, Writing—review and editing; HP, Conceptualization; ISV, Conceptualization, Data curation, Writing—review and editing; MW, GJK, Conceptualization, Funding acquisition, Writing—review and editing

Competing interests: SS: Reviewing editor, eLife. The other authors declare that no competing interests exist.

Received 08 February 2017
Accepted 30 June 2017
Published 06 July 2017

Funding

| Funder | Grant reference number | Author |
|---|---|---|
| Medical Research Council | MR/L007835 | Gerard J Kleywegt |
| European Commission | 284209 | Gerard J Kleywegt |
| Wellcome Trust | 104948 | Gerard J Kleywegt |
| National Institutes of Health | R01 GM079429 | Gerard J Kleywegt |
| Medical Research Council | MR/N009614/1 | Martyn Winn |

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

References

Bennett A, Liu J, Van Ryk D, Bliss D, Arthos J, Henderson RM, Subramaniam S. 2007. Cryoelectron tomographic analysis of an HIV-neutralizing protein and its complex with native viral gp120. Journal of Biological Chemistry 282:27754–27759. doi: 10.1074/jbc.M702025200, PMID: 17599917
Bennett AE, Narayan K, Shi D, Hartnell LM, Gousset K, He H, Lowekamp BC, Yoo TS, Bliss D, Freed EO, Subramaniam S. 2009. Ion-abrasion scanning electron microscopy reveals surface-connected tubular conduits in HIV-infected macrophages. PLoS Pathogens 5:e1000591. doi: 10.1371/journal.ppat.1000591, PMID: 19779568
Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. 1977. The protein data bank: A computer-based archival file for macromolecular structures. Journal of Molecular Biology 112:535–542. doi: 10.1016/S0022-2836(77)80200-3, PMID: 875032
Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, He Y, Osumi-Sutherland D, Ruttenberg A, Sarntivijai S, Van Slyke CE, Vasilevsky NA, Haendel MA, Blake JA, Mungall CJ. 2016. The Cell Ontology 2016: Enhanced content, modularization, and ontology interoperability. Journal of Biomedical Semantics 7:44. doi: 10.1186/s13326-016-0088-7, PMID: 27377652
Gene Ontology Consortium. 2008. The Gene Ontology project in 2008. Nucleic Acids Research 36:D440–D444. doi: 10.1093/nar/gkm883, PMID: 17984083
Hale VL, Watermeyer JM, Hackett F, Vizcay-Barrena G, van Ooij C, Thomas JA, Spink MC, Harkiolaki M, Duke E, Fleck RA, Blackman MJ, Saibil HR. 2017. Parasitophorous vacuole poration precedes its rupture and rapid host erythrocyte cytoskeleton collapse in Plasmodium falciparum egress. PNAS 114:3439–3444. doi: 10.1073/pnas.1619441114, PMID: 28292906
Iudin A, Korir PK, Salavert-Torres J, Kleywegt GJ, Patwardhan A. 2016. EMPIAR: A public archive for raw electron microscopy image data. Nature Methods 13:387–388. doi: 10.1038/nmeth.3806, PMID: 27067018
Jupp S, Burdett T, Leroy C, Parkinson H. 2015. A New Ontology Lookup Service at EMBL-EBI. 8th International Conference on Semantic Web Applications and Tools for Life Sciences. Cambridge: CEUR-WS.org.
Jupp S, Malone J, Burdett T, Heriche JK, Williams E, Ellenberg J, Parkinson H, Rustici G. 2016. The cellular microscopy phenotype ontology. Journal of Biomedical Semantics 7:28. doi: 10.1186/s13326-016-0074-0, PMID: 27195102
Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. 2016. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Research 44:D457–D462. doi: 10.1093/nar/gkv1070, PMID: 26476454
Kremer JR, Mastronarde DN, McIntosh JR. 1996. Computer visualization of three-dimensional image data using IMOD. Journal of Structural Biology 116:71–76. doi: 10.1006/jsbi.1996.0013, PMID: 8742726
Li S, Sun Z, Pryce R, Parsy ML, Fehling SK, Schlie K, Siebert CA, Garten W, Bowden TA, Strecker T, Huiskonen JT. 2016. Acidic pH-induced conformations and LAMP1 binding of the Lassa virus glycoprotein spike. PLOS Pathogens 12:e1005418. doi: 10.1371/journal.ppat.1005418, PMID: 26849049
Liu J, Bartesaghi A, Borgnia MJ, Sapiro G, Subramaniam S. 2008. Molecular architecture of native HIV-1 gp120 trimers. Nature 455:109–113. doi: 10.1038/nature07159, PMID: 18668044
Malone J, Holloway E, Adamusiak T, Kapushesky M, Zheng J, Kolesnikov N, Zhukova A, Brazma A, Parkinson H. 2010. Modeling sample variables with an Experimental Factor Ontology. Bioinformatics 26:1112–1118. doi: 10.1093/bioinformatics/btq099, PMID: 20200009
Malone J, Stevens R, Jupp S, Hancocks T, Parkinson H, Brooksbank C. 2016. Ten simple rules for selecting a bio-ontology. PLOS 12:e1004743. doi: 10.1371/journal.pcbi.1004743, PMID: 26867217
Maurer UE, Zeev-Ben-Mordehai T, Pandurangan AP, Cairns TM, Hannah BP, Whitbeck JC, Eisenberg RJ, Cohen GH, Topf M, Huiskonen JT, Grünewald K. 2013. The structure of herpesvirus fusion glycoprotein B-bilayer complex reveals the protein-membrane and lateral protein-protein interaction. Structure 21:1396–1405. doi: 10.1016/j.str.2013.05.018, PMID: 23850455
Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. 2012. Uberon, an integrative multi-species


anatomy ontology. Genome Biology 13:R5. doi: 10.1186/gb-2012-13-1-r5, PMID: 22293552
Müller A, Beeby M, McDowall AW, Chow J, Jensen GJ, Clemons WM. 2014. Ultrastructure and complex polar architecture of the human pathogen Campylobacter jejuni. MicrobiologyOpen 3:702–710. doi: 10.1002/mbo3.200, PMID: 25065852
Natale DA, Arighi CN, Blake JA, Bult CJ, Christie KR, Cowart J, D’Eustachio P, Diehl AD, Drabkin HJ, Helfer O, Huang H, Masci AM, Ren J, Roberts NV, Ross K, Ruttenberg A, Shamovsky V, Smith B, Yerramalla MS, Zhang J, et al. 2014. Protein Ontology: A controlled structured network of protein entities. Nucleic Acids Research 42:D415–D421. doi: 10.1093/nar/gkt1173, PMID: 24270789
Orloff DN, Iwasa JH, Martone ME, Ellisman MH, Kane CM. 2013. The cell: an image library-CCDB: A curated repository of microscopy data. Nucleic Acids Research 41:D1241–D1250. doi: 10.1093/nar/gks1257, PMID: 23203874
Patwardhan A, Carazo JM, Carragher B, Henderson R, Heymann JB, Hill E, Jensen GJ, Lagerstedt I, Lawson CL, Ludtke SJ, Mastronarde D, Moore WJ, Roseman A, Rosenthal P, Sorzano CO, Sanz-García E, Scheres SH, Subramaniam S, Westbrook J, Winn M, et al. 2012. Data management challenges in three-dimensional EM. Nature Structural & Molecular Biology 19:1203–1207. doi: 10.1038/nsmb.2426, PMID: 23211764
Patwardhan A, Ashton A, Brandt R, Butcher S, Carzaniga R, Chiu W, Collinson L, Doux P, Duke E, Ellisman MH, Franken E, Grünewald K, Heriche JK, Koster A, Kühlbrandt W, Lagerstedt I, Larabell C, Lawson CL, Saibil HR, Sanz-García E, et al. 2014. A 3D cellular context for the macromolecular world. Nature Structural & Molecular Biology 21:841–845. doi: 10.1038/nsmb.2897, PMID: 25289590
Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. 2004. UCSF Chimera–A visualization system for exploratory research and analysis. Journal of Computational Chemistry 25:1605–1612. doi: 10.1002/jcc.20084, PMID: 15264254
Pintilie GD, Zhang J, Goddard TD, Chiu W, Gossard DC. 2010. Quantitative analysis of cryo-EM density map segmentation by watershed and scale-space filtering, and fitting of structures by alignment to regions. Journal of Structural Biology 170:427–438. doi: 10.1016/j.jsb.2010.03.007, PMID: 20338243
Salavert-Torres J, Iudin A, Lagerstedt I, Sanz-García E, Kleywegt GJ, Patwardhan A. 2016. Web-based volume slicer for 3D electron-microscopy data from EMDB. Journal of Structural Biology 194:164–170. doi: 10.1016/j.jsb.2016.02.012, PMID: 26876163
Santarella-Mellwig R, Pruggnaller S, Roos N, Mattaj IW, Devos DP. 2013. Three-dimensional reconstruction of bacteria with a complex endomembrane system. PLoS Biology 11:e1001565. doi: 10.1371/journal.pbio.1001565, PMID: 23700385
Tagari M, Newman R, Chagoyen M, Carazo JM, Henrick K. 2002. New electron microscopy database and deposition system. Trends in Biochemical Sciences 27:589. doi: 10.1016/S0968-0004(02)02176-X, PMID: 12417136
UniProt Consortium. 2013. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Research 41:D43–47. doi: 10.1093/nar/gks1068, PMID: 23161681
