A Brighter Future for Protein Structure Prediction

Meeting review

Patrice Koehl and Michael Levitt

Nature Structural Biology, volume 6, number 2, February 1999, page 108

The most recent critical assessment of structure prediction meeting (CASP3) revealed significant progress in predicting the three-dimensional folds of proteins with unknown structures.

In recent alternating years, protein structure predictors have met in Asilomar, California, where their predictions and methods have been assessed. The most recent critical assessment of structure prediction (CASP) meeting* was held this past December, and the results presented at this conference reveal clear progress in the areas of accurate alignments, sensitive fold recognition, reasonable ab initio structure modeling and improved secondary structure prediction. These advances give hope that protein structure prediction is indeed a tractable problem. They also have implications for ongoing efforts to assign structure and function to the thousands of novel genes that are sequenced each year.

Predicting a protein’s structure from its amino acid sequence is a ‘holy grail’ for the structural biology community. In spite of decades of effort, it remains an extremely difficult problem both because the folded three-dimensional structure of a protein is complicated and because the structure is defined by many degrees of freedom. All theoretical and computational approaches must be validated by testing. This is usually accomplished by attempts to reproduce the known folds of a number of small proteins. Such tests can be criticized in that they ‘remember’ the known structures that are being ‘predicted’. For truly objective assessment, one needs a system of blind prediction. Five years ago, such a scheme was devised by John Moult (CARB, University of Maryland Biotechnology Institute, USA), who initiated the CASP meetings.

CASP works as follows. Protein crystallographers and NMR spectroscopists are solicited for proteins whose structures are likely to be completed before the next CASP meeting. The sequences of these target proteins are made available on a web server. Researchers interested in taking part in the CASP experiment then submit up to five predictions for each target before a given deadline. Assessors chosen by the organizers critically analyze these predictions. The results are presented at Asilomar and papers by the three assessors and the 18 groups chosen to speak are published in a special issue of Proteins: Structure, Function & Genetics.

The first meeting (CASP1, see ref. 1) took place in 1994, assessing 135 predictions made by 35 groups for 33 different protein targets. The second meeting (CASP2, see ref. 2), held in 1996, was marked by a dramatic increase in the interest of the community (947 predictions made by 76 groups for 42 targets). Recently, some 250 anxious predictors gathered again in Asilomar for the CASP3 meeting, organized by John Moult, Tim Hubbard (Sanger Center, UK), Krzysztof Fidelis (Lawrence Livermore, USA) and Jan Pedersen (Acadia Pharmaceuticals, Denmark). This time, 3,807 predictions, including 1,256 three-dimensional models, were made by 98 groups for 43 targets. Experimental structures for 36 of the 43 targets were solved in time for the meeting. Here we summarize what was revealed during the four days of CASP3. Complete information can be found at the Prediction Center web site (http://predictioncenter.llnl.gov/casp3).

CASP considers three categories of structure prediction: (i) comparative modeling using a known template structure; (ii) fold recognition using a library of known protein folds; and (iii) ab initio prediction using principles of atomic interactions and protein architecture. Previously, each target was initially assigned to a prediction category but at CASP3, categories were defined at the meeting. Such categorization is somewhat arbitrary and it is fortunate that the 36 solved structures spanned the full range of difficulty and challenged all categories of predictors (Table 1).

Comparative modeling

Alwyn Jones (Uppsala, Sweden), a pioneer of interactive modeling software and an experienced X-ray crystallographer, was assessor together with Gerard Kleywegt (Uppsala) for the comparative modeling category. The goal of comparative modeling is to build a structural model of a protein on the basis of close sequence similarity to a template protein of known structure. At least one group built a high quality model (Cα r.m.s. deviation < 2 Å for a substantial fraction of the structure) for every target for which a homolog with a known structure could be detected by sequence comparison methods like FASTA or PSI-BLAST. This was even true for cases with sequence identity below 25%, where accurate modeling was possible for 60-75% of the structure (Table 1). Getting the alignment correct at such low levels of sequence identity is a difficult problem. It was solved in different ways by the speakers for the selected groups including: Sternberg (Imperial Cancer Research Fund, UK); Blundell (Biochemistry, Cambridge, UK); Fidelis (Lawrence Livermore); Yang (Honig Lab, Columbia University); Fischer (Ben Gurion University, Israel); and Dunbrack (Fox Chase Cancer Center). These groups commonly made use of multiple sequence alignments, alignment of actual and predicted secondary structures and hand-adjustments.

Once a known structure has been identified as a homolog of the target, a model is built by copying backbone elements from this template. A wide variety of methods are used for loop building and side chain reconstruction. While it is hard to evaluate the quality of the loop building due to insufficient data, side chain modeling reached a high level of accuracy for buried side chains: from >80% correct (to within 30°) for T47, T58 and T60 (‘T’ stands for ‘target’; the targets are numbered consecutively from CASP2) to ~45% correct for T57, T68 and T70. Some of the consistently successful methods for placing side chains include backbone-dependent rotamer libraries (Dunbrack), segment matching followed by energy minimization (Levitt) and self-consistent mean field optimization (Sternberg).

Methods for comparative modeling were criticized at earlier CASP meetings for their inability to provide a final model of the target closer to the experimental structure than the template structure from which it is built. At CASP3, most models were not refined. While this led to better models, it skirts a central embarrassment of molecular mechanics, namely that energy minimization or molecular dynamics generally leads to a model that is less like the experimental structure. Keeping the template fixed works badly for proteins in the calmodulin family, which show a great deal of structural variability for very similar sequences (T74 and T76; Table 1).

Knowing the structure of a protein is useful for predicting, analyzing and designing its function, but it is expensive to determine experimentally the structure of every protein. Structural genomics focuses structure determination of a well-chosen subset of proteins that should put all other protein sequences within the range of comparative modeling. This was discussed at a recent meeting held in Avalon, New Jersey and reviewed in Nature Structural Biology by Andrej Sali (ref. 3). The results of CASP3 are directly relevant to structural genomics: they confirm it is possible to build a reasonable model when [...]

[...] predictors in this category at CASP2, the team formed by Alexey Murzin and Alex Bateman (MRC, Cambridge) did best. This was a surprise as they relied much more on careful analysis of sequences, a unique knowledge of protein structure and study of the functional literature, than on massive computation.

At CASP3, 17 targets could be assigned to the fold-recognition category in that they were structurally similar to other proteins defined as superfamilies or folds in the scop database (ref. 4). The correct folds for 13 of these targets were recognized by at least one predictor. This is impressive as none of the 17 folds could have been recognized by the best standard sequence comparison methods, such as FASTA or PSI-BLAST (Table 1). Three of the speakers used their programming systems: Procyon (Sippl, Salzburg, Austria), Threader (Jones, Warwick, UK), and the NCBI threading program (Bryant, NCBI, USA). Human expertise plays a major role in some of these highly computerized approaches. This was clearly demonstrated by the success [...]

[...] target protein without using a specific template structure. There was considerable overlap between the ab initio and fold recognition categories and Orengo assessed 15 targets. In fact, the best ab initio models were clearly better than any fold-recognition models for three targets with folds already represented in the database (T61, T67 and T77).

Four groups selected to speak used the predicted secondary structure in their ab initio predictions. (i) Baker et al. (University of Washington, Seattle) assembled fragments of known structures chosen to match both sequence and secondary structure and then minimized a knowledge-based energy function by simulated annealing with moves that change fragments. (ii) Samudrala et al. (Levitt Lab, Stanford) generated all possible folds of the chain using a simplified three-state lattice model. The set of structures was pruned by repeated applications of knowledge-based scoring functions, proceeding from a simple point residue representation to an all-atom model with the pre- [...]
