Computational Characterization of Moonlighting Proteins
Biochemical Society Transactions (2014) Volume 42, part 6
The Biological and Biomedical Consequences of Protein Moonlighting

Ishita K. Khan* and Daisuke Kihara*†1
*Department of Computer Science, Purdue University, West Lafayette, IN 47907, U.S.A.
†Department of Biological Science, Purdue University, West Lafayette, IN 47907, U.S.A.
1 To whom correspondence should be addressed.

Biochem. Soc. Trans. (2014) 42, 1780–1785; doi:10.1042/BST20140214
© The Authors Journal compilation © 2014 Biochemical Society
www.biochemsoctrans.org

Abstract

Moonlighting proteins perform multiple independent cellular functions within one polypeptide chain. Moonlighting proteins switch functions depending on various factors including the cell type in which they are expressed, cellular location, oligomerization status and the binding of different ligands at different sites. Although an increasing number of moonlighting proteins have been experimentally identified in recent years, the quantity of known moonlighting proteins is insufficient to elucidate their overall landscape. Moreover, most moonlighting proteins have been identified serendipitously. Hence, characterization of moonlighting proteins using bioinformatics approaches can have a significant impact on the overall understanding of protein function. In this work, we provide a short review of existing computational approaches for illuminating the functional diversity of moonlighting proteins.

Key words: bioinformatics, computational analysis, moonlighting protein, multitasking protein, protein function.

Abbreviations: BLOSUM, Blocks Substitution Matrix; BP, biological process; ESG, Extended Similarity Group; GO, Gene Ontology; IDP, intrinsically disordered protein; MF, molecular function; PFP, Protein Function Prediction; PPI, protein–protein interaction; PSI-BLAST, Position-Specific Iterated-Basic Local Alignment Search Tool.

Introduction

With the increase in the number of functionally well-characterized proteins, as well as the advancement of large-scale proteomics studies, more and more proteins have been observed to exhibit more than one cellular function. These proteins were first named 'moonlighting' proteins by Jeffery [1]. A moonlighting protein demonstrates multiple autonomous and usually unrelated functions. The diversity of dual functions of these proteins is, in principle, not a consequence of gene fusions, splice variants, multiple proteolytic fragments, homologous but non-identical proteins or varying post-transcriptional modifications. Moonlighting proteins are not limited to a certain type of organism or protein family, nor do they have common switching mechanisms through which they moonlight. The known mechanisms for switching functions include the cell type in which the protein is expressed, cellular localization, oligomerization state and identity of the binding ligand [1].

Crystallin, a structural protein in the eye lens of several species, was found to also have enzymatic activity [2]; this was one of the first examples of multifunctional proteins. Many known moonlighting proteins were originally recognized as enzymes, but there are also others that are known as receptors, channel proteins, chaperone proteins, ribosomal proteins and scaffold proteins [1,3,4]. The secondary or moonlighting functions of these proteins include transcriptional regulation, receptor binding, involvement in apoptosis and other regulatory functions. So far, moonlighting proteins have been identified by experiments, and reviews of these proteins exist in the literature [1,3–6]. Studies suggest significant effects of moonlighting proteins in diseases and disorders [7–9]. Despite the potential abundance of moonlighting proteins in various genomes and their important roles in pathways and disease development, the number of currently confirmed moonlighting proteins is still too small to obtain a comprehensive picture of the cellular mechanisms underlying their functional diversity. This quantitative insufficiency is, in large part, due to the tendency for the additional function of these proteins to be found serendipitously in the course of unrelated experiments. Hence, a systematic bioinformatics approach could make substantial contributions in identifying novel moonlighting proteins and also in elucidating functional characteristics of moonlighting proteins.

In the present article, we review existing computational analyses on moonlighting proteins. First, we discuss two studies that investigated whether existing sequence-based function prediction methods can identify distinct dual functions of moonlighting proteins [10,11]. Secondly, we review another study by Gómez et al. [12] on analysis of protein–protein interactions (PPIs) of moonlighting proteins, in which they examined whether the interacting partners of moonlighting proteins disclose the moonlighting function. Thirdly, we analyse the study by Hernández et al. [13], in which they explored structural aspects of known moonlighting proteins to identify whether the promiscuous functionality of these proteins is caused by the conformational fluctuations in their structures. Then we introduce recently developed databases of moonlighting proteins [14]. Lastly, we discuss the current situation of Gene Ontology (GO) [15] annotations of known moonlighting proteins in the UniProt database [16].

Moonlighting proteins pose a challenge in bioinformatics research

A review by Jeffery [4] discusses moonlighting proteins in the context of systems-level proteomics studies and presents challenges for computational analyses. Most sequence-based function prediction methods are based on homology searches or motif/domain identifications. Moonlighting proteins can complicate this approach since there are cases in which orthologous proteins in different organisms do not share the moonlighting functions. Moreover, the possibility of a moonlighting function would change how we treat motifs that have been identified in the protein with less confidence, since those hits may explain the moonlighting function of the protein. From a structural point of view, moonlighting proteins could be identified by discovery of multiple ligand-binding sites. Last but not least, moonlighting functions have implications in the discovery of drug targets and biomarkers, since knowledge of all functions of a target protein is necessary to design drugs that only affect the desired function of the target.

Sequence-based function prediction for moonlighting proteins

Conventional sequence-based functional annotation methods are based on the concept of homology [17,18] or conserved motifs/domains [19–21]. Two studies have investigated how well current sequence-based methods identify the distinct dual functions of moonlighting proteins. In the first, we benchmarked the performance of three sequence-based function prediction methods, the Protein Function Prediction (PFP) algorithm [22,23], the Extended Similarity Group (ESG) algorithm [24] and Position-Specific Iterated-Basic Local Alignment Search Tool (PSI-BLAST) [25], on a set of experimentally known moonlighting proteins [5]. PFP extends a traditional PSI-BLAST search by extracting and scoring GO annotations from distantly similar sequences and then applying contextual associations of GO terms observed

[...] the default BLOSUM62 (Blocks Substitution Matrix 62) in order to consider more distantly related sequences. Recall by PSI-BLAST improved using BLOSUM45. In the head-to-head comparison against PFP, PSI-BLAST with BLOSUM45 showed a higher recall than PFP for eight proteins whereas PFP had a higher recall in 10 cases (one protein had a tie). PSI-BLAST with BLOSUM30 failed to predict any GO terms at an E-value cut-off of 0.01 for 12 proteins. Overall, PFP and PSI-BLAST with BLOSUM45 showed higher recall than the rest of the methods.

These results highlighted the advantage of PFP in predicting the diverse functions of moonlighting proteins with high recall. Incorporating the BLOSUM45 matrix greatly improved the recall of PSI-BLAST, which provides another indication that considering weakly similar sequences enhances the prediction of moonlighting functions of proteins.

The second work, by Gómez et al. [11], compared the performance of homology-based and motif/domain-based methods in retrieving sequences with primary and/or moonlighting functions using a dataset of 46 moonlighting proteins. They compared PSI-BLAST and ten motif/domain-based methods. The authors ran the 11 methods on each protein in the dataset and retrieved all the sequences that matched the query protein above a certain standard score cut-off. If any of the retrieved sequences had the primary or secondary function of the query moonlighting protein, it was considered a 'positive match' for that function. For example, for the moonlighting protein FtsH (primary function: protease, moonlighting function: chaperone), the PSI-BLAST output contained two matched sequences (both with an E-value of 0.0): gi5231279 and gi12724524, which are a proteinase and a heat-shock protein respectively. In this case, both sequences were considered a 'positive match', the first for the primary function and the second for the moonlighting function. Among the methods tested, PSI-BLAST outperformed the others in finding positive matches for both functions of the moonlighting proteins. Among the ten motif/domain-based methods, ProDom performed