Application of Bioinformatics on Protein Structure Prediction

Total Page:16

File Type:pdf, Size:1020Kb

Application of Bioinformatics on Protein Structure Prediction 2015 3rd AASRI Conference on Computational Intelligence and Bioinformatics (CIB 2015) ISBN: 978-1-60595-308-3 Application of Bioinformatics on Protein Structure Prediction Xiaodan Liu Jilin Agriculture Science and Technology College, Jilin, China ABSTRACT: With the advent of the information age, bioinformatics has been rapid development in many fields to be fully applied. This paper describes the establishment and development of common database of protein, protein structure analysis, protein secondary and tertiary structure prediction; The application of bioinformatics in protein structure analysis is comprehensively summarized, which provides theoretical basis for the wide application of bioinformatics in protein engineering. 1. INTRODUCTION 2. PROTEIN STRUCTURE DATABASE Bioinformatics provides a shortcut for the protein According to the data in the database based on the function prediction and structural analysis by analysis of the structure and function of the natural directly using gene or protein sequences. Which is protein, the spatial structure and biological function formed by the interaction of biological and computer of a certain amino acid sequence can be predicted, science and applied mathematics. It is processed by and the amino acid sequence and spatial structure of the data acquisition, processing, storage, retrieval the protein can be designed according to the specific and analysis, to explain the biological significance biological function. of the data. As the genome and proteome research The basic three-dimensional structure of the provides a very rich data, we need to make a high protein database is Protein Data Bank((PDB; degree of automation of these data management, http://www.pdb.org/), founded by Brookhaven National establish a strict database and write the Laboratory in New York in 1971,which is the corresponding software. This work greatly promoted database of the most important biological the development of bioinformatics, and macromolecules (proteins, nucleic acids and sugars) bioinformatics will also play a special role in the three- dimensional structure (Sussman et al.,1998; research of structure prediction (Hagen, 2000). Berman et al., 2000; Westbrook et al., 2003).The The emergence of bioinformatics has made the database is accurate coordinate data collected in the protein project into a new era. In the future, more protein structure by X-ray diffraction and nuclear detailed knowledge of protein structure and function, magnetic resonance (NMR) experiment measured. as well as advancements in high throughput Protein structure data in PDB are widely used in technology, may greatly expand the capabilities of studies of protein function and evolution, and they protein structure prediction. serve as a basis for protein structure prediction (He Protein structure prediction is an important task et al., 2014). in bioinformatics. To a large extent, the biological PDB Finder database (http://www.cmbi.kun.nl/ function of protein depends on its spatial structure, swift/pdbfinder/) is built on the basis of PDB, DSSP, so it is very important to predict the structure of HSSP, the two level library, which contains the PDB proteins. Sometimes a possible new gene cannot be sequence, the author, R factor, resolution, Secondary found by searching for any homologous sequence, Structure, etc. these information is not easy to read namely orphan gene, and for proteins encoded by directly from the PDB, with the PDB library every those genes, bioinformatics technology can be used time a new version, PDBFinder automatically to search for homologous genes based on the generated in EBI. homologous comparison of structure or predict its Homology Derived Secondary Structure of Proteins advanced structure to infer its function directly. (HSSP;http://www.sander.embl-heidelberg.de/hssp/) is the secondary structure database based on the homology of the two protein structure. Each PDB project has a corresponding HSSP file (Dodge et al., 1998). The database also provides the homology of all protein sequences in the SWISS-PROT database. 109 The Structural Classification of Proteins data- assigning regions of the amino acid sequence as base(SCOP; http://scop.mrc-lmb.cam.ac.uk/scop/)is a likely alpha helices, beta strands (often noted as comprehensive ordering of all proteins of known "extended" conformations), or turns. The different structure, according to their evolutionary and amino acid residues have different tendencies for the structural relationships. which is obtained by manual formation of different secondary structures. The classification, in which the proteins of known success of a prediction is determined by comparing structure are hierarchical classified (Murzin et al., it to the results of the DSSP algorithm (or similar e.g. 1995; Andreeva et al., 2004).This resource allows STRIDE) applied to the crystal structure of the users to analyze whether the query protein and protein. Specialized algorithms have been developed known structural protein have similarities. Protein for the detection of specific well-defined patterns domains in SCOP are hierarchically classified into such as transmembrane helices and coiled coils in families, superfamilies, folds and classes. proteins (Mount, 2004). The prediction accuracy of Molecular Modeling Database (MMDB; a single sequence of secondary structure is about http://www.ncbi.nih.gov/Structure/MMDB/mmdb. shtml), 60% ~80% (Petersen et al., 2000). The best modern This is a three-dimensional structure database used methods of secondary structure prediction in by the Entrez search tool (Wang et al., 2000). ASN,1 proteins reach about 80% accuracy (Pirovano format, which reflects the structure and sequence &Heringa,2010); this high accuracy allows the use data of PDB Library. Meanwhile, a complete set of of the predictions as feature improving fold 3D structure display program Cn3D is provided by recognition and ab initio protein structure prediction, NCBI. classification of structural motifs, and refinement of MUFOLD-DB(http://mufold.org/mufolddb.php), a sequence alignments.10 kinds of important and web-based database, to collect and process the effective methods were tested and analyzed using weekly PDB files thereby providing users with non- the unified standard, the comparison results show: redundant, cleaned and partially predicted structure the prediction effect of APSSP2, SsPro2.0 and data. For each of the non-redundant sequences, the PSIPRED method is better, the accuracy of the SCOP domain classification and predict structures prediction is more than 80% (Zhang et al.,2003). of missing regions are annotated by loop modelling. So far, more than 20 different secondary structure In addition, evolutional information, secondary stru- prediction methods have been developed. First cture, disorder region, and processed three- algorithms was Chou-Fasman method, which relies dimensional structure are computed and visualized predominantly on probability parameters determined tohelp users better understand the protein(He et al., from relative frequencies of each amino acid's 2014). appearance in each type of secondary structure(Chou et al.,1974). The next notable program was the GOR method, is an information 3. PROTEIN STRUCTURE PREDICTION theory-based method. Which employs the more powerful probabilistic technique of Bayesian Protein structure prediction is the prediction of the inference(Garnier et al.,1978).Another big step three-dimensional structure of a protein from its forward, was using machine learning methods. First amino acid sequence-that is, the prediction of its artificial neural networks methods were used. As a folding and its secondary, tertiary, and quaternary training sets they use solved structures to identify structure from its primary structure. Structure common sequence motifs associated with particular prediction is fundamentally different from the arrangements of secondary structures. These inverse problem of protein design. Protein structure methods are over 70% accurate in their predictions, prediction is one of the most important goals although beta strands are still often underpredicted pursued by bioinformatics and theoretical chemistry; due to the lack of three-dimensional structural it is highly important in medicine (for example, in information that would allow assessment of drug design) and biotechnology (for example, in the hydrogen bonding patterns that can promote design of novel enzymes) (https://en.wikipedia. formation of the extended conformation required for org/wiki/Protein_structure_pre-diction). the presence of a complete beta sheet (Mount, 2004). PSIPRED and JPRED are some of the most known Protein secondary structure prediction 3.1 programs based on neural networks for protein Protein secondary structure prediction is a set of secondary structure prediction (Cuff, 2000). techniques in bioinformatics that aim to predict the The predicted secondary structural states are not local secondary structures of proteins based only on cross validated by any of the existing servers. Hence, knowledge of their amino acid sequence only. information on the level of accuracy for every Which plays a vital role and acts as an intermediate sequence is not reported by the existing servers. This in solving tertiary structures; which provides an was overcome by NNvPDB, which not only insight in to protein function (Kloczkowski et reported greater Q3 but also validates every al.,2002). For proteins, a prediction consists of prediction with the homologous PDB entries.
Recommended publications
  • CASP)-Round V
    PROTEINS: Structure, Function, and Genetics 53:334–339 (2003) Critical Assessment of Methods of Protein Structure Prediction (CASP)-Round V John Moult,1 Krzysztof Fidelis,2 Adam Zemla,2 and Tim Hubbard3 1Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland 2Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California 3Sanger Institute, Wellcome Trust Genome Campus, Cambridgeshire, United Kingdom ABSTRACT This article provides an introduc- The role and importance of automated servers in the tion to the special issue of the journal Proteins structure prediction field continue to grow. Another main dedicated to the fifth CASP experiment to assess the section of the issue deals with this topic. The first of these state of the art in protein structure prediction. The articles describes the CAFASP3 experiment. The goal of article describes the conduct, the categories of pre- CAFASP is to assess the state of the art in automatic diction, and the evaluation and assessment proce- methods of structure prediction.16 Whereas CASP allows dures of the experiment. A brief summary of progress any combination of computational and human methods, over the five CASP experiments is provided. Related CAFASP captures predictions directly from fully auto- developments in the field are also described. Proteins matic servers. CAFASP makes use of the CASP target 2003;53:334–339. © 2003 Wiley-Liss, Inc. distribution and prediction collection infrastructure, but is otherwise independent. The results of the CAFASP3 experi- Key words: protein structure prediction; communi- ment were also evaluated by the CASP assessors, provid- tywide experiment; CASP ing a comparison of fully automatic and hybrid methods.
    [Show full text]
  • EVA: Continuous Automatic Evaluation of Protein Structure Prediction Servers Volker A
    EVA: continuous automatic evaluation of protein structure prediction servers Volker A. Eyrich 1, Marc A. Martí-Renom 2, Dariusz Przybylski 3, Mallur S. Madhusudhan2, András Fiser 2, Florencio Pazos 4, Alfonso Valencia 4, Andrej Sali 2 & Burkhard Rost 3 1 Columbia Univ., Dept. of Chemistry, 3000 Broadway MC 3136, New York, NY 10027, USA 2 The Rockefeller Univ., Lab. of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, 1230 York Avenue, New York, NY 10021-6399, USA 3 CUBIC Columbia Univ, Dept. of Biochemistry and Molecular Biophysics, 650 West 168th Street, New York, N.Y. 10032, USA 4 Protein Design Group, CNB-CSIC, Cantoblanco, Madrid 28049, Spain Corresponding author: B Rost, [email protected], Tel +1-212-305-3773, Fax +1-212-305-7932 Running title: EVA: evaluation of prediction servers Document type: Application note Statistics: Abstract 153 words; Text 1129 words; 5 References Abstract Summary: Evaluation of protein structure prediction methods is difficult and time- consuming. Here, we describe EVA, a web server for assessing protein structure prediction methods, in an automated, continuous and large-scale fashion. Currently, EVA evaluates the performance of a variety of prediction methods available through the internet. Every week, the sequences of the latest experimentally determined protein structures are sent to prediction servers, results are collected, performance is evaluated, and a summary is published on the web. EVA has so far collected data for more than 3000 protein chains. These results may provide valuable insight to both developers and users of prediction methods. Availability: All EVA web pages are accessible at {http://cubic.bioc.columbia.edu/eva}.
    [Show full text]
  • EVA: Continuous Automatic Evaluation of Protein Structure Prediction Servers Volker A
    Vol. 17 no. 12 2001 BIOINFORMATICS APPLICATIONS NOTE Pages 1242–1243 EVA: continuous automatic evaluation of protein structure prediction servers Volker A. Eyrich 1, Marc A. Mart´ı-Renom 2, Dariusz Przybylski 3, Mallur S. Madhusudhan 2, Andras´ Fiser 2, Florencio Pazos 4, Alfonso Valencia 4, Andrej Sali 2 and Burkhard Rost 3,∗ 1Columbia University, Department of Chemistry, 3000 Broadway MC 3136, New York, NY 10027, USA, 2The Rockefeller University, Laboratory of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, 1230 York Avenue, New York, NY 10021-6399, USA, 3CUBIC Columbia University, Department of Biochemistry and Molecular Biophysics, 650 West 168th Street, New York, NY 10032, USA and 4Protein Design Group, CNB-CSIC, Cantoblanco, Madrid 28049, Spain Received on February 20, 2001; revised on May 28, 2001; accepted on July 4, 2001 ABSTRACT How well do experts predict protein structure? The Summary: Evaluation of protein structure prediction CASP experiments attempt to address the problem of over- methods is difficult and time-consuming. Here, we de- estimated performance (Zemla et al., 2001). Although scribe EVA, a web server for assessing protein structure CASP resolves the bias resulting from using known pro- prediction methods, in an automated, continuous and tein structures as targets, it has limitations. (1) The meth- large-scale fashion. Currently, EVA evaluates the perfor- ods are ranked by human assessors who have to evaluate mance of a variety of prediction methods available through thousands of predictions in 1–2 months (∼ 10 000 from the internet. Every week, the sequences of the latest 160 groups for CASP4; Zemla et al., 2001).
    [Show full text]
  • BIOINFORMATICS ORIGINAL PAPER Doi:10.1093/Bioinformatics/Bti770
    Vol. 22 no. 2 2006, pages 195–201 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/bti770 Structural bioinformatics The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling Konstantin Arnold1,2, Lorenza Bordoli1,2,Ju¨ rgen Kopp1,2 and Torsten Schwede1,2,Ã 1Biozentrum Basel, University of Basel, Switzerland and 2Swiss Institute of Bioinformatics, Basel, Switzerland Received on August 25, 2005; revised and accepted on November 7, 2005 Advance Access publication November 13, 2005 Associate Editor: Dmitrij Frishman ABSTRACT limitation. Since the number of possible folds in nature appears to be Motivation: Homology models of proteins are of great interest for plan- limited (Chothia, 1992) and the 3D structure of proteins is better ning and analysing biological experiments when no experimental three- conserved than their sequences, it is often possible to identify a dimensional structures are available. Building homology models homologous protein with a known structure (template) for a given requires specialized programs and up-to-date sequence and structural protein sequence (target). In these cases, homology modelling has databases.Integrating all required tools,programs anddatabasesinto a proven to be the method of choice to generate a reliable 3D model of single web-based workspace facilitates access to homology modelling a protein from its amino acid sequence as impressively shown in from a computer with web connection without the need of downloading several meetings of the bi-annual CASP experiment (Tramontano Downloaded from and installing large program packages and databases. and Morea, 2003; Moult, 2005). Homology modelling is routinely Results:SWISS-MODEL workspaceis a web-based integrated service used in many applications, such as virtual screening, or rationalizing dedicated to protein structure homology modelling.
    [Show full text]
  • CAFASP3 in the Spotlight of EVA
    PROTEINS: Structure, Function, and Genetics 53:548–560 (2003) CAFASP3 in the Spotlight of EVA Volker A. Eyrich,1* Dariusz Przybylski,1,2,4 Ingrid Y.Y. Koh,1,2 Osvaldo Grana,5 Florencio Pazos,6 Alfonso Valencia,5 and Burkhard Rost1,2,3* 1CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 2Columbia University Center for Computational Biology and Bioinformatics (C2B2), New York, New York 3North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 4Department of Physics, Columbia University, New York, New York 5Protein Design Group, Centro Nacional de Biotecnologia (CNB-CSIC), Cantoblanco, Madrid, Spain 6ALMA Bioinformatica, Contoblanco, Madrid, Spain ABSTRACT We have analysed fold recogni- targets to servers as well as data collection and evaluation tion, secondary structure and contact prediction are fully automated and are capable of scaling up to much servers from CAFASP3. This assessment was car- larger numbers of prediction targets than CASP: despite ried out in the framework of the fully automated, the immense increase in the number of targets at CASP5, web-based evaluation server EVA. Detailed results over 100 times more targets have been handled by EVA are available at http://cubic.bioc.columbia.edu/eva/ between CASP4 and CASP5 than at CASP5 alone. Details cafasp3/. We observed that the sequence-unique tar- on the mechanism of data acquisition and the presentation gets from CAFASP3/CASP5 were not fully represen- of results are available from the web and our previous tative for evaluating performance. For all three publications;1,3,2 forthcoming publications will describe categories, we showed how careless ranking might the evaluation procedure in more detail.
    [Show full text]
  • Protein Structure Prediction
    Protein structure prediction Comparative/homology modeling What is the reason for protein structure prediction? 1. Solving protein structures experimentally is hard (sometimes impossible) 2. Many predicted structures can be close to solved structures in accuracy 3. Protein structure can provide important clues to protein function – Functional sites (e.g., enzyme active sites and specificity determinants) – Protein-protein interaction – Docking studies (e.g., for drug interaction/design studies) – See Baker and Sali paper for other uses 2 Assigned Reading for this section From Pevsner text: • Chapter 6 Multiple Sequence Alignment 205-227, 234 (Read-by: 9/27) • Chapter 13 Protein Structure 589-625 (Read-by: 9/29) “Protein Structure Prediction and Structural Genomics”, David Baker and Andrej Sali, Science 2001. (Read-by: 9/29) 3 Sources for this lecture Park et al, “Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods.” JMB 1998 David Baker and Andrej Sali, “Protein Structure Prediction and Structural Genomics” Science 2001 Chothia and Lesk, “The relation between the divergence of sequence and structure in proteins”, EMBO Journal 1986 Andrej Sali and Andras Fiser – selected slides from their seminars Bioinformatics (Baxevanis and Ouellette, previous course text) Chapter 8: Predictive methods using protein sequences (Ofran and Rost) 198-219 Chapter 9: Protein structure prediction and analysis (Wishart) 224-247 Chapter 12: Creation and analysis of protein multiple sequence alignments
    [Show full text]
  • "Comparative Protein Structure Modeling Using MODELLER". In
    Comparative Protein Structure Modeling UNIT 2.9 Using MODELLER Narayanan Eswar,1 Ben Webb,1 Marc A. Marti-Renom,2 M.S. Madhusudhan,1 David Eramian,1 Min-yi Shen,1 Ursula Pieper,1 and Andrej Sali1 1University of California at San Francisco, San Francisco, California 2Centro de Investigacion´ Pr´ıncipe Felipe (CIPF), Valencia, Spain ABSTRACT Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. Curr. Protoc. Protein Sci. 50:2.9.1-2.9.31. C 2007 by John Wiley & ! Sons, Inc. Keywords: Modeller protein structure comparative modeling structure prediction rprotein fold r r r Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein.
    [Show full text]
  • Basic Protein Structure Prediction for the Biologist: a Review M. Mihăşan
    Arch. Biol. Sci., Belgrade, 62 (4), 857-871, 2010 DOI:10.2298/ABS1004857M BASIC PROTEIN STRUCTURE PREDICTION FOR THE BIOLOGIST: A REVIEW M. MIHĂŞAN “Alexandru Ioan Cuza” University, Faculty of Biology, Department of Molecular and Experimental Biology, 6600, Iaşi, Romania Abstract - As the field of protein structure prediction continues to expand at an exponential rate, the bench-biologist might feel overwhelmed by the sheer range of available applications. This review presents the three main approaches in computational structure prediction from a non-bioinformatician’s point of view and makes a selection of tools and servers freely available. These tools are evaluated from several aspects, such as number of citations, ease of usage and quality of the results. Finally, the applications of models generated by computational structure prediction are discussed. Keywords: Protein structure prediction, protein mode application UDC 57.08.088.6 INTRODUCTION of which more than half were published in the past three years alone, the rest being published between Knowledge of the three-dimensional structure of a 1993 and 2006. Because of this proliferation, it is protein can provide invaluable hints about its func- difficult for a biologist to know which server or pro- tional and evolutionary features and, in addition, the gram to use, as it is hard to answer frequent questions structural information is useful in drug design efforts. such as: how do I know if I can trust the result; what Genome-scale sequencing projects have already pro- does output mean; should I use more than one server; duced more than 108 million individual sequences how much time will it take to get results? This review (Benson et al., 2010), but due to the inherently time- will address these questions with particular emphasis consuming and complicated nature of structure on the evaluation of the currently available free pro- determination techniques, only around 53,000 of these grams and web-servers from a biologist’s point of view.
    [Show full text]
  • Protein Structure Prediction Ii Introduction
    PROTEIN STRUCTURE PREDICTION II Jeffrey Skolnick1 ,2 Yang Zhang1 Because the molecular function of a protein depends on its three dimensional structure, which is often unknown, protein structure prediction is an essential tool in proteomics. The state of the art of protein structure prediction is reviewed with emphasis on knowledge-based comparative modeling/threading approaches that exploit the observation that the protein data bank (PDB) is complete for low- resolution, single domain proteins. The recently developed TASSER structure prediction algorithm is described; its ability to produce structures closer to the native state than to the initial template structure demonstrated, along with encouraging results for membrane protein tertiary structure prediction and its ability to predict NMR quality structures in 1/5 of the cases. The quality of predicted structures required for the inference of biochemical function and describe structure-based approaches to predict protein-protein interactions is also examined. Applications to a number of proteomes are presented. The weaknesses and strengths of existing approaches are summarized with an emphasis on the importance of large scale benchmarking. Finally, the outlook for future progress is reviewed. INTRODUCTION Over the past decade, the success of genome sequence efforts has brought about a paradigm shift in biology [1]. There is increasing emphasis on the large-scale, high- throughput examination of all genes and gene products of an organism, with the aim of assigning their functions[2]. Of course, biological function is multifaceted, ranging from molecular/biochemical to cellular or physiological to phenotypical [3]. In practice, knowledge of the DNA sequence of an organism and the identification of its open reading frames (ORFs) does not directly provide functional insight.
    [Show full text]
  • Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives
    Review Article Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives V. K. VYAS*, R. D. UKAWALA, M. GHATE AND C. CHINTHA Department of Pharmaceutical Chemistry, Institute of Pharmacy, Nirma University, Ahmedabad-382 481, India Vyas, et al.: Homology Modeling and Drug Discovery Major goal of structural biology involve formation of protein-ligand complexes; in which the protein molecules act energetically in the course of binding. Therefore, perceptive of protein-ligand interaction will be very important for structure based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with protein. With increasing in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, distant homologues, modeling of loops and side chains as well as detecting errors in a model contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focused on the features and a role of homology modeling in predicting protein structure and described current developments in this field with victorious applications at the different stages of the drug design and discovery.
    [Show full text]
  • CAFASP3 in the Spotlight of EVA
    PROTEINS: Structure, Function, and Genetics 53:548–560 (2003) CAFASP3 in the Spotlight of EVA Volker A. Eyrich,1* Dariusz Przybylski,1,2,4 Ingrid Y.Y. Koh,1,2 Osvaldo Grana,5 Florencio Pazos,6 Alfonso Valencia,5 and Burkhard Rost1,2,3* 1CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 2Columbia University Center for Computational Biology and Bioinformatics (C2B2), New York, New York 3North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 4Department of Physics, Columbia University, New York, New York 5Protein Design Group, Centro Nacional de Biotecnologia (CNB-CSIC), Cantoblanco, Madrid, Spain 6ALMA Bioinformatica, Contoblanco, Madrid, Spain ABSTRACT We have analysed fold recogni- targets to servers as well as data collection and evaluation tion, secondary structure and contact prediction are fully automated and are capable of scaling up to much servers from CAFASP3. This assessment was car- larger numbers of prediction targets than CASP: despite ried out in the framework of the fully automated, the immense increase in the number of targets at CASP5, web-based evaluation server EVA. Detailed results over 100 times more targets have been handled by EVA are available at http://cubic.bioc.columbia.edu/eva/ between CASP4 and CASP5 than at CASP5 alone. Details cafasp3/. We observed that the sequence-unique tar- on the mechanism of data acquisition and the presentation gets from CAFASP3/CASP5 were not fully represen- of results are available from the web and our previous tative for evaluating performance. For all three publications;1,3,2 forthcoming publications will describe categories, we showed how careless ranking might the evaluation procedure in more detail.
    [Show full text]
  • Comparative Protein Structure Prediction
    Comparative Protein Structure Prediction Marc A. Marti-Renom, Božidar Jerković and Andrej Sali Laboratories of Molecular Biophysics Pels Family Center for Biochemistry and Structural Biology The Rockefeller University, 1230 York Ave, New York, NY 10021, USA KEYWORDS: Short Title: Comparative Modeling, MODELLER, ModWeb, ModBase, ModView Correspondence to Andrej Sali, The Rockefeller University, 1230 York Ave, New York, NY 10021, USA. tel: (212) 327 7550; fax: (212) 327 7540 e-mail: [email protected] Version: March 2002 1 Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. The number of protein sequences that can be modeled and the accuracy of the predictions are increasing steadily because of the growth in the number of known protein structures and because of the improvements in the modeling software. It is currently possible to model with useful accuracy significant parts of approximately one half of all known protein sequences (Pieper et al., 2002). Despite progress in ab initio protein structure prediction (Baker, 2000;Bonneau, Baker, 2001), comparative modeling remains the only method that can reliably predict the 3D structure of a protein with an accuracy comparable to a low-resolution experimentally determined structure (Marti-Renom et al., 2000).
    [Show full text]