Mobidb: Intrinsically Disordered Proteins in 2021
Total Page:16
File Type:pdf, Size:1020Kb
Published online 25 November 2020 Nucleic Acids Research, 2021, Vol. 49, Database issue D361–D367 doi: 10.1093/nar/gkaa1058 MobiDB: intrinsically disordered proteins in 2021 Damiano Piovesan 1, Marco Necci1, Nahuel Escobedo2, Alexander Miguel Monzon 1, Andras´ Hatos 1,IvanMicetiˇ c´ 1, Federica Quaglia1, Lisanna Paladin 1, Pathmanaban Ramasamy3,4,5,6,7, Zsuzsanna Dosztanyi´ 8, Wim F. Vranken 3,4,5, Norman E. Davey9, Gustavo Parisi2, Monika Fuxreiter1 and Silvio C.E. Tosatto 1,* 1Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy, 2Dept. of Science and Technology, Universidad Nacional de Quilmes, Buenos Aires, Argentina, 3Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium, 4Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium, 5Centre for Structural Biology, VIB, Pleinlaan 2, 1050 Brussels, Belgium, 6VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium, 7Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium, 8Dept. of Biochemistry, ELTE Eotv¨ os¨ Lorand´ University, Budapest, Hungary and 9Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London, SW3 6JB, UK Received September 15, 2020; Revised October 16, 2020; Editorial Decision October 19, 2020; Accepted November 19, 2020 ABSTRACT INTRODUCTION The MobiDB database (URL: https://mobidb.org/) Intrinsically disordered regions (IDRs) of proteins do not provides predictions and annotations for intrinsically adopt a highly populated structure in isolation, but sample disordered proteins. Here, we report recent devel- a wide range of conformations. IDR-mediated interactions opments implemented in MobiDB version 4, regard- are implicated in a variety of cellular processes from signal ing the database format, with novel types of anno- transduction and liquid-liquid phase transition (1–5). IDRs are subjected to extensive pre- and post-translational reg- tations and an improved update process. The new ulation to modulate protein function in response to cellu- website includes a re-designed user interface, a more lar stimuli (6–9). Many functions of IDRs, such as entropic effective search engine and advanced API for pro- springs, flexible linkers or spacers are directly associated grammatic access. The new database schema gives with their structural attributes (10,11). more flexibility for the users, as well as simplify- Proteins containing intrinsically disordered regions are ing the maintenance and updates. In addition, the present in a considerable fraction of the proteome of new entry page provides more visualisation tools in- eukaryotic organisms (e.g. 43.6% in humans based on cluding customizable feature viewer and graphs of MobiDB-lite predictions). In the human proteome, for ex- the residue contact maps. MobiDB v4 annotates the ample, IDRs are highly enriched in interaction interfaces binding modes of disordered proteins, whether they and post-translational modification sites, which serve as undergo disorder-to-order transitions or remain dis- regulatory switches of a variety of biochemical pathways (12). Short linear motifs in the IDRs of viruses lead to func- ordered in the bound state. In addition, disordered tional promiscuity and enable compact viral proteomes to regions undergoing liquid-liquid phase separation or extensively rewire their host cells (13). Given the lack of a post-translational modifications are defined. The in- requirement for a stable globular fold, interfaces in IDRs tegrated information is presented in a simplified in- also appear to have higher evolutionary plasticity allowing terface, which enables faster searches and allows them to proliferate in a proteome by ex nihilo evolution. large customized datasets to be downloaded in TSV, As a result, IDR interfaces are possibly more widespread Fasta or JSON formats. An alternative advanced in- than ordered interfaces (14). Despite their central roles in terface allows users to drill deeper into features of key cellular regulatory processes, only a small fraction of interest. A new statistics page provides information disordered regions have been experimentally characterised at database and proteome levels. The new MobiDB (15,16). While difficulties in protein expression, purification version presents state-of-the-art knowledge on dis- and structural characterisation hamper experimental char- acterization, assigning functional modules to dynamic con- ordered proteins and improves data accessibility for formational ensembles presents a technical problem for a both computational and experimental users. database. *To whom correspondence should be addressed. Tel: +39 049 827 6269; Email: [email protected] C The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. D362 Nucleic Acids Research, 2021, Vol. 49, Database issue Manually curated databases, such as DisProt (17,18)and annotation coverage of different types of annotations and IDEAL (19), collect annotations from the literature and compare the annotation content at the proteome level for standardise that information. These databases assign the all reference proteomes. position of IDRs in the sequence, and aim to assemble disorder-related functional annotations. Other databases PROGRESS AND NEW FEATURES focus on functional features: DIBS (15) and MFIB (20) col- lect regions undergoing folding upon binding, and ELM Database content (16) annotates linear motifs. IDRs forming fuzzy complexes MobiDB data serves both experimental scientists, who are are collected in the FuzDB database (21), which also aims interested in different aspects of disordered protein regions to establish links between structural properties and func- as well as bioinformaticians, who develop softwares for tion. Proteins reported to undergo liquid-liquid phase sep- analysis and prediction of disordered regions. In the new aration (LLPS) typically contain IDRs, which are available version of the database, several new data resources and in PhasePro (22), PhaSepDB (23) and LLPSDB (24). Alto- features were added to increase the coverage and usabil- gether, these specialized datasets exhibit a limited overlap, ity of information about protein disorder. Figure 1 gives which presents a bottleneck for comprehensive characteri- an overview of the annotations and services provided by sation of the IDRs contained therein (25). MobiDB. All curated disordered regions are mapped across Large-scale identification of IDRs often depends on pri- homologs obtained from GeneTree alignments (39) with mary deposition databases of structural data, like the PDB >80% of sequence similarity and alignment lengths of 10 (26), which can be used to indirectly derive disorder infor- residues. This expands the curated set of proteins ten fold. mation from missing and mobile residues. Although an- Structural and functional properties of disordered regions notation based on this information is less reliable com- are based on third party databases and a set of prediction pared to curated resources, it provides a larger set of ex- methods. amples (27,28). A complementary source of information The full list of software and resources used in MobiDB is provided by sequence based predictors, which typically is provided in Tables 1 and 2. The different types of anno- exploit amino acid biases in regions of proteins to iden- tations and predictions are assembled to provide the user tify IDRs (29). Deriving information from these methods a comprehensive view of properties of disordered regions requires a critical assessment of the different techniques, at the residue level. As compared to the previous releases, which has been implemented in independent blind tests MobiDB v4 provides annotations on novel functional as- and in the Critical Assessment of Intrinsic protein Disor- pects of disordered proteins, such as binding modes (36), der (CAID). These efforts, in particular CAID, enable inte- involvement in post-translational modifications and phase grating large-scale predicted information on disordered re- separation (3,4,7). gions into other core data resources such as InterPro (30), Disorder predictions are provided via the MobiDB-lite UniProtKB (31) and PDBe (26) via MobiDB-lite (32). software (32) over the entire UniProtKB set of protein se- In contrast to structural data, functional annotations re- quences. In addition, MobiDB calculates annotations from lated to disordered regions are underrepresented in public PDB structures, based on manually curated annotations databases and ontologies (33). This presents a bottleneck from specific third party databases and propagating manual for large-scale functional assignment of disordered regions. curation by homology. MobiDB-lite is also integrated for Functional predictions are in essence limited to regions that example into InterProScan (30) which propagates its pre- fold upon binding, e.g. ANCHOR (34), DISOPRED3 (35) dictions onto several other EBI resources like UniProtKB, and MoRFCHiBi (66). Recently, prediction of regions that InterPro and PDBe. remain disordered upon binding (36) and of regions that