PROTEINS: Structure, Function, and Genetics 53:334–339 (2003)

Critical Assessment of Methods of Protein Structure Prediction (CASP)-Round V

John Moult,1 Krzysztof Fidelis,2 Adam Zemla,2 and Tim Hubbard3

1 Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland
2 Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California
3 Sanger Institute, Wellcome Trust Genome Campus, Cambridgeshire, United Kingdom

Grant sponsor: Lawrence Livermore Laboratory; Grant sponsor: Department of Energy; Grant sponsor: National Library of Medicine; Grant number: LM07085; Grant sponsor: National Institutes of Health; Grant number: GM/DK61967; Grant sponsor: HP; Grant sponsor: Amgen; Grant sponsor: GlaxoSmithKline; Grant sponsor: IBM.

*Correspondence to: John Moult, Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Dr., Rockville, MD 20850, E-mail: [email protected]

Received: 21 June 2003; Accepted: 24 June 2003

ABSTRACT

This article provides an introduction to the special issue of the journal Proteins dedicated to the fifth CASP experiment to assess the state of the art in protein structure prediction. The article describes the conduct, the categories of prediction, and the evaluation and assessment procedures of the experiment. A brief summary of progress over the five CASP experiments is provided. Related developments in the field are also described. Proteins 2003;53:334–339. © 2003 Wiley-Liss, Inc.

Key words: protein structure prediction; communitywide experiment; CASP

INTRODUCTION

This issue of Proteins is devoted to articles reporting the outcome of the fifth communitywide experiment to assess methods of protein structure prediction (CASP5) and related activities. Four previous CASP experiments in 1994, 1996, 1998, and 2000 were reported in previous special issues of Proteins [1–4] as well as elsewhere [5–13]. Independent discussions of CASP5 have also appeared [14,15].

The primary goals of CASP are to establish the capabilities and limitations of current methods of modeling protein structure from sequence, to determine where progress is being made, and to determine where the field is held back by specific bottlenecks. With ~10 years of effort now recorded, these latter factors (progress or the lack of it) have assumed increasing importance. Methods are assessed on the basis of the analysis of a large number of blind predictions of protein structure.

This article outlines the structure and conduct of the experiment and is followed by a description of the CASP5 target proteins. There are sections of the special issue for each of the main CASP prediction categories: Comparative Modeling, Fold Recognition, and New Fold methods. These sections begin with an article by the assessment team in that area and continue with contributions from the prediction groups the assessors considered to have done the most interesting work. The number of predictor articles in each category varies: there are only three in comparative modeling, four in fold recognition, and seven in the new folds category. The small number of articles in comparative modeling reflects the fact that there again appears to have been little progress in this area since the last CASP. The assessors' articles are probably the most important in the whole issue and describe the state of the art as they found it in CASP5.

The role and importance of automated servers in the structure prediction field continue to grow. Another main section of the issue deals with this topic. The first of these articles describes the CAFASP3 experiment. The goal of CAFASP is to assess the state of the art in automatic methods of structure prediction [16]. Whereas CASP allows any combination of computational and human methods, CAFASP captures predictions directly from fully automatic servers. CAFASP makes use of the CASP target distribution and prediction collection infrastructure, but is otherwise independent. The results of the CAFASP3 experiment were also evaluated by the CASP assessors, providing a comparison of fully automatic and hybrid methods. Full information is available at the Web site (http://www.cs.bgu.ac.il/~dfischer/CAFASP3/). This first article is followed by three that report some of the more interesting CAFASP results.

Large-scale benchmarking of prediction server performance is reported in the following two articles: one for Livebench [16], and one for EVA [17]. In contrast to CASP and CAFASP, the benchmarking experiments run continuously. Both Livebench and EVA operate by sending the sequences of just released PDB entries to automatic prediction servers and collating and analyzing the results over time. Livebench focuses primarily on fold recognition and EVA primarily on secondary structure predictions. Livebench and EVA are entirely independent of CASP, and we are grateful to the organizers for their participation in the CASP5 meeting and in contributing to this issue of Proteins. Benchmarking experiments complement CASP, particularly by clarifying issues of the statistical significance of the results.

Prediction of disorder in protein structures was included for the first time in CASP5. An assumption behind the prediction of protein structure is that, under specified environmental conditions, every protein molecule has essentially a single functional three-dimensional structure. A number of experimental studies have established that this does not always appear to be the case [18]. Thus, the ability to predict disorder is of considerable importance. The section on this topic contains three articles: one providing an evaluation of the predictions and two describing some of the more interesting results.

Two additional articles complete the issue. The first was chosen by a vote on the FORCASP Web site (see below) as representing work done with particularly interesting methods. It is hoped that in future CASPs it will be possible to include more methods-related material. The last article is an evaluation of progress since CASP4 using similar methods to those for progress evaluation after CASPs 3 [19] and 4 [20].

The CASP5 Experiment

The structure of the experiment was very similar to that of the earlier ones and consisted of three steps:

1. Information about "soon to be solved" structures was collected from the experimental community and passed on to the prediction community. Target information was made available through the CASP Web site and sent directly to registered CAFASP servers.
2. Prediction teams deposited models of the structures before the experimental results were public. For CASP, deposition was required by a specified deadline. For CAFASP, servers were required to respond within 48 hours.
3. The models were compared with experiment by using numerical evaluation techniques and human assessment, and a meeting was held to discuss the significance of the results.

Management and Organization

CASP has a multilevel structure, intended to ensure substantial input from the prediction community:

A. Organizers. The authors of this article, responsible for all aspects of the organization of the experiment and meeting.
B. Consultancy groups. Three groups of ~10 veteran CASP predictors each, one for each of the three primary prediction categories. These groups, first [...]

[...] analysis data generated with approved methods, and may also add their own numerical methods.
E. Protein Structure Prediction Center at Lawrence Livermore Laboratory. The prediction center is responsible for all data management aspects of the experiment, including the distribution of target information, collection of predictions, generation of numerical evaluation of predictions, collection of numerical evaluations from other workers, and maintenance of a Web site where all data are available. Details of these aspects of the experiment are described in Ref. 21.
F. The FORCASP Web site (www.FORCASP.org). As discussed below, FORCASP provides a forum where members of the prediction community may discuss aspects of the CASP experiment.

Collection of Targets

X-ray crystallographers and NMR spectroscopists were solicited to provide information about structures that were either expected to be solved shortly or that were already solved but had not yet been discussed in public. U.S. structural genomics projects were also asked to contribute prediction targets, and more than half of the targets came from that source. Target information was made available to predictors through a Web interface. Details of 67 structures were obtained, and of these, 60 were solved in time to be included in the assessment. A number of these targets were divided into two or more domains for assessment purposes. The total again falls short of what we had hoped for (~100), but because of the participation of the structural genomics projects, it is about 50% larger than in CASP4.

Categories of Prediction

The quality of a structure model depends on how much information from already known structures can be used. At one extreme, models competitive with experiment can be produced for proteins with sequences very similar to that of a known structure. At the other, models for proteins with no detectable sequence or structure relationship to one of known structure are still at best very approximate.

In all the CASPs so far, targets have been divided into three broad categories, reflecting how extensively models could be based on knowledge of other structures. There is