Performance of Secondary Structure Prediction Methods on Proteins Containing Structurally Ambivalent Sequence Fragments

K. Mani Saravanan, Samuel Selvaraj
Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620024, Tamil Nadu, India

Received 3 July 2012; revised 4 September 2012; accepted 23 September 2012
Published online 19 October 2012 in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/bip.22178

ABSTRACT:
Several approaches for predicting secondary structures from sequences have been developed and have reached fair accuracy. One of the most rigorous tests for these prediction methods is their ability to correctly predict identical fragments of protein sequences that adopt different secondary structures in unrelated proteins. In our previous work, we obtained 30 identical octapeptide sequence fragments adopting different backbone conformations. It is of interest to find whether the presence of structurally ambivalent fragments in proteins affects the accuracy of secondary structure prediction methods. Hence, in this work, we have made a systematic comparative analysis of secondary structure prediction results for 30 identical octapeptide pairs and 52 identical heptapeptide pairs adopting different conformations, with the aid of the segment overlap measure. The results reveal the better performance of profile-based methods such as PSIpred and JPred and misprediction by classical rule-based methods such as the Garnier Osguthorpe Robson Method and the Double Prediction Method. The results discussed here indicate that modern secondary structure prediction methods are better able to discriminate conformationally ambivalent peptide fragments. © 2012 Wiley Periodicals, Inc. Biopolymers (Pept Sci) 100: 148-153, 2013.

Keywords: octapeptides; secondary structure prediction; segment overlap measure; structurally ambivalent fragments; protein folding

This article was originally published online as an accepted preprint. The "Published Online" date corresponds to the preprint version. You can request a copy of the preprint by emailing the Biopolymers editorial office at [email protected]

Additional Supporting Information may be found in the online version of this article.
Correspondence to: Samuel Selvaraj, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620024, Tamil Nadu, India; e-mail: [email protected]
Contract grant sponsor: Council of Scientific and Industrial Research (CSIR), Government of India
Contract grant number: 09/475(0157)/2010-EMR-I

INTRODUCTION

The general observation that protein sequences with detectable similarity will share a common structural topology has yielded fairly accurate three-dimensional structures by means of comparative modeling and threading procedures [1-3]. In contrast, there are certain reported examples in which similar sequences adopt different structures to perform important biological functions. One such example is a pair of designed proteins in which an IgG-binding, 4β+α fold can be transformed into an albumin-binding, 3-α fold via a mutational pathway where neither function nor native structure is completely lost [4]. Conversely, it was also shown that distant sequences can adopt similar structures: the crystal structures of intestinal fatty acid binding protein and Manduca sexta fatty acid binding protein share 19% sequence identity but align structurally to a 1.62-Å root mean square deviation of their Cα atoms [5].

The idea behind the designed proteins with such high sequence identity but different structure and function comes from the classical work by Kabsch and Sander [6], in which they reported that identical pentapeptides in unrelated proteins adopt different secondary structures. Wilson et al. [7] also observed that common sequences of up to eight residues do occur in unrelated proteins and that sequence-specific antibodies can be generated to test binding to identical sequences contained in unrelated proteins.

Secondary structure predictions play a key role in finding important fragments of protein sequences that are involved in maintaining the folding and stability of the three-dimensional protein structure. Over the years, many secondary structure prediction approaches have been implemented and tested by various research groups [8-11]. Gromiha and Selvaraj [12] assessed the performance of several secondary structure prediction methods in different structural classes of globular proteins and suggested that all the methods predict the all-alpha class of proteins accurately. Jacoboni et al. [13] made a stringent test of secondary structure prediction methods by predicting identical short sequences that are known to adopt different conformations in unrelated proteins. Such secondary structure predictions may be used to guide the design of site-directed mutagenesis studies to locate potential functionally important residues [14].

In our previous work, we made an identical octapeptide search and found 30 identical octapeptide fragments with different secondary structures [15]. These fragments of protein sequences may undergo conformational rearrangements to bind other molecules and perform their specific biological role. Considering such examples, protein scientists want to make use of these surprising protein fragments, which have similar sequences but adopt different secondary structures, to understand protein folding and stability. Our focus is on the effects of structurally ambivalent octapeptide segments in proteins on structure prediction outcomes. Hence, we have benchmarked several secondary structure prediction methods on 30 octapeptides of alternating structure in terms of the segment overlap measure. We have also compared the secondary structure prediction accuracy of structurally ambivalent octapeptides with that of structurally ambivalent heptapeptides.

MATERIALS AND METHODS

Dataset and Secondary Structure Prediction

In our previous work [15], we searched for identical fragments in unrelated Protein Data Bank [16] sequences. From our search, we found 52 structurally ambivalent heptapeptides and 30 structurally ambivalent octapeptide pairs with different secondary structures, which form the source of our study. Each protein sequence (not only the ambivalent peptide) containing the conformationally ambivalent fragments was subjected to consensus secondary structure prediction by using the NPS web server [17], SYMPRED (http://www.ibi.vu.nl/programs/sympredwww/), and the JPred secondary structure prediction server [18]. From the secondary

Computation of Segment Overlap Measure

To measure the prediction accuracy of the different secondary structure prediction methods, we have computed the segment overlap measure between the DSSP-assigned secondary structure and the predicted secondary structures of each of the methods incorporated in the NPS web server, SYMPRED web server, and JPred server by using the program obtained from Zemla et al. [19], who proposed the segment overlap measure to assess the accuracy of secondary structure prediction methods. The segment overlap measure can be defined as follows. Use s1 and s2 to denote segments of secondary structure in conformational state i (i.e., H, E, or C). Segments s1 (DSSP-assigned secondary structure) and s2 (predicted secondary structures) correspond to the two secondary structure assignments being compared. The first assignment is considered as a reference and is typically based on experiment, and the second assignment is the one being evaluated. The two assignments are further referred to as "observed" and "predicted," respectively. Let (s1, s2) denote a pair of overlapping segments and S(i) the set of all the overlapping pairs of segments (s1, s2) in state i:

$$S(i) = \{(s_1, s_2) : s_1 \cap s_2 \neq \emptyset\}$$

Here, s1 and s2 are both in conformational state i.

For state i (H, E, and C), the segment overlap measure is defined as follows:

$$\mathrm{Sov}(i) = 100 \times \left[ \frac{1}{N} \sum_{i \in \{H,E,C\}} \sum_{S(i)} \frac{\mathrm{minov}(s_1, s_2) + \delta(s_1, s_2)}{\mathrm{maxov}(s_1, s_2)} \times \mathrm{len}(s_1) \right]$$

Here, s1 and s2 are the observed and predicted secondary structure segments in the particular secondary structural state, respectively; S(i) is the set of all segment pairs (s1, s2), where s1 and s2 have at least one residue in a secondary structural state in common; minov(s1, s2) is the length of the actual overlap of s1 and s2; and maxov(s1, s2) is the length of the total extent for which either of the segments s1 or s2 has a residue in the particular secondary structural state. N_i is the total number of amino acid residues observed in the ith secondary structural conformation. The definition of δ(s1, s2) is as follows:

$$\delta(s_1, s_2) = \min\left\{ \mathrm{maxov}(s_1, s_2) - \mathrm{minov}(s_1, s_2);\ \mathrm{minov}(s_1, s_2);\ \mathrm{int}(\mathrm{len}(s_1)/2);\ \mathrm{int}(\mathrm{len}(s_2)/2) \right\}$$

The segment overlap measure has been defined to evaluate the correctness of segment prediction with respect to a reference assignment (Sov observed). The converse measure is calculated with s1 standing for predicted segments and s2 for observed segments and corresponds to Sov predicted [19]. We have not used the Q3 measure to benchmark the prediction accuracy because the segment overlap measure takes segment-level structural information into account, whereas the Q3 measure does not [20]. All the computations have been automated and performed on a SUN ULTRA 40M2 workstation.
