Complex Portal 2018: Extended Content and Enhanced Visualization Tools for Macromolecular Complexes Birgit H.M
Total Page:16
File Type:pdf, Size:1020Kb
D550–D558 Nucleic Acids Research, 2019, Vol. 47, Database issue Published online 24 October 2018 doi: 10.1093/nar/gky1001 Complex Portal 2018: extended content and enhanced visualization tools for macromolecular complexes Birgit H.M. Meldal 1,*, Hema Bye-A-Jee1, Luka´sˇ Gajdosˇ 2, Zuzana Hammerova´ 2, Aneta Hora´ckovˇ a´ 2, Filip Melicher2, Livia Perfetto1, Daniel Pokorny´ 2, Milagros Rodriguez Lopez1,Alzbˇ etaˇ Turkov¨ a´ 2, Edith D. Wong3, Zengyan Xie4, Elisabeth Barrera Casanova 1, Noemi del-Toro1, Maximilian Koch1,5, Pablo Porras1, Henning Hermjakob1,6 and Sandra Orchard1 1European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK, 2Department of Biochemistry and National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic, 3Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305-5477, USA, 4Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Road, Nan’an District, Chongqing 400065, China, 5Novartis Institutes for BioMedical Research (NIBR), Basel, Canton of Basel-Stadt, Switzerland and 6State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China Received September 13, 2018; Revised October 05, 2018; Editorial Decision October 08, 2018; Accepted October 09, 2018 ABSTRACT INTRODUCTION The Complex Portal (www.ebi.ac.uk/complexportal) All biological processes rely on the interaction of proteins is a manually curated, encyclopaedic database that with other cellular components and in many cases these pro- collates and summarizes information on stable, teins function as part of macromolecular complexes. These complexes may be composed only of proteins or also con- macromolecular complexes of known function. It / captures complex composition, topology and func- tain nucleic acids and or small molecules. Complexes can be stable or transient in nature and vary in their function tion and links out to a large range of domain-specific and composition depending on the tissue type, developmen- resources that hold more detailed data, such as PDB tal stage and biological process. Many complexes have been or Reactome. We have made several significant im- well characterized by experimental methods while the exis- provements since our last update, including improv- tence or composition of other complexes has been inferred ing compliance to the FAIR data principles by pro- from related evidences. viding complex-specific, stable identifiers that in- The Complex Portal (www.ebi.ac.uk/complexportal)isa clude versioning. Protein complexes are now avail- manually curated, encyclopaedic resource that summarizes able from 20 species for download in standards- all functional aspects of stable macromolecular complexes compliant formats such as PSI-XML, MI-JSON and (1). Since our last update, we have improved the data struc- ComplexTAB or can be accessed via an improved ture by providing more data download formats, new iden- REST API. A component-based JS front-end frame- tifiers and expanded our data content in terms of both, the number of complexes curated and the species represented in work has been implemented to drive a new website the resource. We have significantly improved interoperabil- and this has allowed the use of APIs from linked ity with other resources, such as the Gene Ontology (2), and services to import and visualize information such updated the website to provide better data access and more as the 3D structure of protein complexes, its role visualization options, all making the Complex Portal more in reactions and pathways and the co-expression of fully compliant with the FAIR principles of Findability, Ac- complex components in the tissues of multi-cellular cessibility, Interoperability and Reusability (3). organisms. A first draft of the complete complex- The characterization of the complex composition in a ome of Saccharomyces cerevisiae is now available model organism can be used to predict the existence of sim- to browse and download. ilar complexes in other species (4). Such inferences can con- tribute to our understanding of evolutionary development *To whom correspondence should be addressed. Tel: +44 1223 494107; Email: [email protected] C The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research, 2019, Vol. 47, Database issue D551 at the molecular level. We are currently developing a tool Quality control that can predict orthologous complexes in other species as All entries in the Complex Portal are manually annotated well as find homologous complexes that have already been and carefully checked by a second curator before publica- curated in the Complex Portal. tion. Entries are standards compliant, for example using stable identifiers (UniProt for proteins, ChEBI for small CONTENT molecules (9), RNAcentral for ncRNAs (10) and controlled Curation update vocabularies such as the Gene Ontology and PSI-MI ontol- ogy (11). Curators work to a detailed curation manual that The working definition of a protein complex is that it should is constantly updated and reviewed as we discover new data contain two or more macromolecules for which there is evi- concepts. A summary of our curation practices is available dence (experimental or inferred) that these molecules stably via the <Documentation> tile on the public website; this interact with each other and have a demonstrated molec- enables users to understand the annotation process. We have ular function. Each entry (Figure 1) describes the compo- implemented new syntax checks that are run before each sition, topology and function of a single complex from a release cycle, making the data more robust to human error. defined species and links out to external databases that pro- Following strict rules and quality control checks means that vide domain-specific information of the complex, such as our entries are consistent and of high quality. structural details from PDB (5) or the role of the complex in metabolic reactions or signalling pathways in Reactome (6). A complex is uniquely defined by its protein components us- Complex identifiers and versions ing canonical UniProt identifiers, i.e. the protein sequence The new complex-specific stable identifiers have a version selected as the reference sequence in a UniProtKB/Swiss- number that can be incrementally increased to indicate a Prot entry. Isoform-specific complex variants are only cre- major change to the underlying record, thus making the re- ated when the function of the complex is specific for a given source more fully compliant with the FAIR standard. Iden- isoform. The Cyclin L-CDK11 complexes provide interest- tifiers are of the type: CPX-xxxx.y, where xxxx is the run- ing examples where this is the case (CPX-341, CPX-343, ning number and y the version. The previously published CPX-344, CPX-347). Complex participants are described EBI-xxxxxxx identifiers are retained as secondary identi- using the UniProtKB identifiers for post-processed chains fiers and can still be retrieved as they remain in the down- when complexes are formed only by a specific cleavage load files and are indexed in the webservice. The new iden- product, for example the amyloid-beta protein 40/42 het- tifiers also provide an easy distinction from IntAct exper- erodimer (CPX-1062). Post-transcriptional modifications imental evidences with which they previously shared an are described only if they are essential to the formation of identifier space. A complex version will be incrementally in- the complex as, for example, in the case of the formation of creased if the composition of the complex has changed, for the SMAD1 homotrimer (CPX-144). example additional participants have been added. A change The Complex Portal currently focuses on curating stable in stoichiometry will currently not trigger reversioning. protein complexes from a limited number of model organ- If a complex has been obsoleted without replacement, the isms to provide a reference from which complexes in other accession number (AC) will no longer be searchable in the species can be inferred and the evolutionary development of current release files or the website but will remain accessi- a conserved functional unit can be traced. As of release 219 ble from the archive on our ftp site (ftp://ftp.ebi.ac.uk/pub/ (8 October 2018), 2472 complexes have been curated and databases/intact/complex/current). If a complex is split into released. several new entries the original AC will be retained as a sec- Recent curation priorities have included: ondary AC on the new complexes and, similarly, if two com- plexes are merged, the original AC from one complex be- a. Mammalian complexes with available 3D structures. comes a secondary AC on the retained and updated com- This is a long-term program in collaboration with plex. The website only displays the current AC whilst pre- CEITEC and PDBe. 297 curated mammalian complexes vious entry versions and obsoleted ACs are stored in the have crystal structures, 248