A New Paradigm for Beta Structural Motif Recognition with Application to Recognizing Beta Trefoils

Wrap-and-Pack: A New Paradigm for Beta Structural Motif Recognition with Application to Recognizing Beta Trefoils [Extended Abstract]

Matthew Menke ∗, Eben Scanlon ∗, Jonathan King †, Bonnie Berger ‡¶, Lenore Cowen §¶

∗ Computer Science and Artificial Intelligence Lab, MIT, Cambridge, MA 02139, {mmenke, eben}@mit.edu
† Department of Biology, MIT, Cambridge, MA 02139, [email protected]
‡ Department of Mathematics, and Computer Science and Artificial Intelligence Lab, MIT, Cambridge, MA 02139, [email protected]
§ Department of Computer Science, Tufts University, Medford, MA 02155, [email protected]
¶ Corresponding author

ABSTRACT

A method is presented that uses β-strand interactions at both the sequence and the atomic level, to predict the beta-structural motifs in protein sequences. A program called Wrap-and-Pack implements this method, and is shown to recognize β-trefoils, an important class of globular β-structures, in the Protein Data Bank with 92% specificity and 92.3% sensitivity in cross-validation. It is demonstrated that Wrap-and-Pack learns each of the ten known SCOP β-trefoil families, when trained primarily on β-structures that are not β-trefoils, together with 3D structures of known β-trefoils from outside the family. Wrap-and-Pack also predicts many proteins of unknown structure to be β-trefoils. The computational method used here may generalize to other β-structures for which strand topology and profiles of residue accessibility are well conserved.

Categories and Subject Descriptors

J.3.1 [Computer Applications]: Life and Medical Sciences - Biology and Genetics

General Terms

Algorithms, Experimentation.

Keywords

Protein Structure Prediction, Beta Structures, Motif Recognition, Beta Trefoils, Rotamer Libraries, Threading.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. RECOMB'04, March 27–31, 2004, San Diego, California, USA. Copyright 2004 ACM 1-58113-755-9/04/0003 ...$5.00.

1. INTRODUCTION

Hand-curated hierarchical classification systems, such as SCOP [17] and CATH [18], divide the set of known globular protein folds into groups depending on the overall topology of the fold. At the top level of the hierarchy, protein folds are divided into such classes as "mainly alpha," "mainly beta," "alpha/beta," etc. according to the predominant type of secondary structure motifs in the protein; the classification specializes within these classes into fold, superfamily, and family levels of the hierarchy. The structural motif recognition problem is the following prediction problem: given only the target amino acid sequence for a protein, and a fold or superfamily, predict whether the protein folds into a 3D structure which is a member of that fold, or superfamily, or not.

The structural motif recognition problem is more easily solved when there is sufficient sequence similarity between protein sequences in the structural motif, because proteins whose sequences are sufficiently similar fold into similar structures. For such a motif, membership questions can be solved by simply running standard sequence matching tools such as BLAST [1]. However, there exist many protein motifs where, while the 3D structures of the proteins are very close, there is insufficient sequence similarity to determine from sequence alone if an unsolved protein sequence is a member of the motif. We call such motifs sequence heterogeneous.

It has proved to be a difficult challenge to devise structural motif recognizers for mainly-beta structures that are sequence heterogeneous. In fact, even the best (local) secondary structure predictors are better at correctly placing α-helices than β-strands [19, 22]. It has been our experience that general secondary structure predictors do not suffice even to correctly determine the number of β-strands in a sequence that folds into one of these motifs; never mind find the ends of the strands accurately. Rather we have found that to recognize such motifs, we must search for secondary structure and super-secondary structure at the same time. This was the approach taken by our first structural motif recognizer for a sequence-heterogeneous mainly-beta fold, BetaWrap, which predicts the right-handed parallel β-helix motif [6, 9].

BetaWrap used a structural template approach to look for the conserved elements of super-secondary structure in the β-helix motif. Given a sequence of unknown structure, the program would "wrap" the sequence into a parallel β-helix with conserved regions of β-strands and variable-length turn regions. For each possible wrap, pairwise statistical preferences of which amino acids prefer to stack on top of each other in the β-sheets was calculated, and compared to a database of stacking preferences in amphipathic β-sheets (from general non-β-helix β-structural motifs that had such β-sheets). This approach can detect more global interactions than a local secondary structure predictor, allowing for the capture of relationships between residues that are close in space, but may be far, and an irregular distance apart, in sequence. With some additional complexity, such as adding a secondary structure filter to rule out false positives with too much global α-helical content, BetaWrap was able to completely separate the true β-helices from the non-β-helices in a 2000 non-redundant version of the PDB, in a leave-family-out cross validation.

The purpose of this paper is to introduce a new method for solving structural motif recognition problems that arise in sequence-heterogeneous β-structural superfamilies, that we call Wrap-and-Pack. As the name indicates, the method we employ has two phases, a wrapping phase and a packing phase. The wrapping phase is conceptually similar to what BetaWrap does for the β-helices: it tries to parse the structure onto an abstract template that captures the conserved elements of super-secondary structure, and screens for favorable pairwise correlations between adjacent residues in the putative β-sheets; it also incorporates bonuses and penalties into the score, such as the β-propensity of residues in the putative β-strands, according to PSIPRED [13]. When we create a wrapping phase and apply it alone to the β-trefoil fold (Figure 1), we find that it does fairly well at identifying the correct regions of conserved secondary structure in the true β-trefoils; however, unlike BetaWrap and the β-helices, there are non-trefoil sequences that the program indicates could form β-trefoils. To help screen these out, we go on to the packing phase.

The packing phase incorporates, for the first time, 3D energetic information into our structural template by way of a backbone dependent rotamer library [20]. In particular, the most favorable wraps are fed into the SCWRL program of Canutescu et al., which then threads the wrap onto a small set of β-trefoil backbones, resolves steric clashes, and reports an energy score. The energy score is used to help discriminate the trefoils from the non-trefoils.

Another way to think of Wrap-and-Pack is as a two-tiered threader. A threader is a method that tries to map a target sequence of unknown structure onto the backbone of a known 3D structure or backbone template of multiple 3D structures (see [8, 15, 21] and [16] for a recent survey). [...]tational issues, this can lead to false positives, both because the energy scores are only approximations, and also false positive sequences can score well by chance in some possible threading, when the number of allowed different threadings is large. Both the wrapping phase and the packing phase of Wrap-and-Pack can be seen as threaders: the first with a pairwise potential energy function based on pairwise residue correlations with no energetics. Then the small number of high-scoring alignments of the target sequence wrapped onto the template structure are passed to a threader that uses the more sensitive 3D packing molecular dynamics constraints (in our case, we use SCWRL on a conserved portion of the trefoil cap strand, but other threaders could also be substituted). The key is by using a structural template and an initial structure-based wrapping phase, we can drastically reduce the number of different ways each sequence is threaded onto the backbone in the packing phase. Then, if one of these small number of sequence threadings produces a low-energy score using SCWRL, as we did, or some other 3D sidechain placement algorithm, we are more confident that it is a true positive example.

There are 122 solved β-trefoil proteins according to SCOP version 1.63 in six superfamilies, and ten families. The β-trefoils serve as neurotoxins, inhibitors, and receptors. β-trefoil proteins have also been implicated in inducing the inflammatory response in rheumatoid arthritis, as well as playing roles in embryonic development, and tumorigenesis.

Figure 1: The β-trefoil consists of three leaves around an axis of three-fold symmetry. In this figure a single leaf is shown in dark gray (left). Each of the leaves consists of four β-strands, B1, B2, B3, and B4, separated by turn regions T1, T2, and T3. B2-T2-B3 form a β-hairpin. T1 and T2 both contain β-turns. The B1 and B4 strands of all three leaves form a six-stranded antiparallel β-barrel. The three β-hairpins form a cap on one end of the barrel (on the bottom in this figure).
