Lattice Protein Folding with Two and Four-Body Statistical Potentials

Total Page:16

File Type:pdf, Size:1020Kb

Lattice Protein Folding with Two and Four-Body Statistical Potentials PROTEINS:Structure,Function,andGenetics43:161–174(2001) LatticeProteinFoldingWithTwoandFour-BodyStatistical Potentials HinHarkGan,1 AlexanderTropsha,2 andTamarSchlick1* 1DepartmentofChemistryandCourantInstituteofMathematicalSciences,NewYorkUniversityandtheHowardHughes MedicalInstitute,NewYork,NewYork 2LaboratoryforMolecularModeling,SchoolofPharmacy,UniversityofNorthCarolina,ChapelHill,NorthCarolina ABSTRACT Thecooperativefoldingofpro- cooperativeinteractionsinnativeproteins—becomesfea- teinsimpliesadescriptionbymultibodypotentials. sible. Suchmultibodypotentialscanbegeneralizedfrom Whiletwo-bodypotentialsareadequateformostcon- commontwo-bodystatisticalpotentialsthrougha densedsystems,theimportanceofmultibodypotentials relationtoprobabilitydistributionsofresidueclus- increasesfordensemolecularsystemssuchascompact tersviatheBoltzmanncondition.Inthisexplor- nativeproteinstructures.Multibodypotentialsmayhelp atorystudy,wecompareafour-bodystatistical improveourunderstandingofthecooperativityofprotein potential,definedbytheDelaunaytessellationof foldingprocessandtheregularityofproteinstructures. proteinstructures,totheMiyazawa–Jernigan(MJ) Recentproteinfoldingstudieswithmultibodypotential potentialforproteinstructureprediction,usinga termsshowthattheyplayaroleinstabilizingproteinfolds latticechaingrowthalgorithm.Weusethefour- andinenhancingcooperativityofthefolding/unfolding bodypotentialasadiscriminatoryfunctionfor process.Forexample,Liwoetal.4,11 haveintroducedthree conformationalensemblesgeneratedwiththeMJ andfour-bodycorrelationtermsintheirunited-residue potentialandexamineperformanceonasetof22 potentials,whicharisefromtheexpansionofmeanpoten- proteinsof30–76residuesinlength.Wefindthatthe tials.Four-bodyhydrogenbondingcorrelationtermshave four-bodypotentialyieldscomparableresultstothe alsobeenintroducedphenomenologicallybyKolinskiand two-bodyMJpotential,namely,anaveragecoordi- Skolnick8 intheirlatticeMonteCarloproteinstructure nateroot-mean-squaredeviation(cRMSD)valueof8 predictionalgorithm. ␣ Åforthelowestenergyconfigurationsofall- pro- Togainabetterunderstandingofmultibodypotentials, teins,andsomewhatpoorercRMSDvaluesforother weformulatemultibodymeanpotentialsintermsofthe proteinclasses.Forbothtwoandfour-bodypoten- potentialsofmeanforceinstatisticalmechanics.12 This tials,superpositionsofsomepredictedandnative formulationimpliesthatthemultibodycontactenergyand structuresshowaroughoverallagreement.Formu- probabilityofobservingclustersofresiduesaregenerally latingthefour-bodypotentialusinglargerdatasets relatedbytheBoltzmanncondition.Furthermore,lower- anddirect,butcostly,generationofconformational ordermeanpotentialscanbederivedfromthehigher- ensembleswithmultibodypotentialsmayofferfur- orderexpressions.Followingthemethodologyofstatistical therimprovements.Proteins2001;43:161–174. functions,3,13 suchmeanpotentialscanbeapproximately ©2001Wiley-Liss,Inc. derivedfromtheproteinstructuraldatabase.Sincethey yieldagreateramountofinformationaboutmany-body Keywords:latticemodel;statisticalpotential;multi- correlationsincompactmolecularsystems,multibody bodypotentials;chaingrowthalgo- potentialsmaybettercharacterizenativeproteins.How- rithm;MonteCarlo ever,derivingthesepotentialsrequireslargeproteinstruc- INTRODUCTION turaldatabasesforaccurateformulations. Inthiswork,weexamineafour-bodypotentialdevel- Statisticalpotentialsderivedfromsimplifiedrepresenta- 14 tionsofproteinresiduesarewidelyusedinproteinstruc- opedbyTropshaandVaismanandcoworkers. Byusinga tureprediction.Becauseofrapidgrowthofproteinstruc- threadingtechnique,theseresearchersshowedthattheir turaldatabases,manyofthecurrentresiduepotentials statisticalpotentialcandiscerncorrectsequence/structure areeitherderiveddirectlyfromtheproteindatabase1–3 or indirectlybyhavingtheparametersofthechosenfunc- 4 Grantsponsor:NationalInstitutesofHealth;Grantnumber: tionalformsdeterminedbythecrystalstructures. These GM55164;Grantnumber:RR08102;Grantsponsor:NationalScience statisticalpotentialshavefoundwide-rangingapplica- Foundation;Grantnumber:BIR-94-23827ER;Grantnumber:ASC- tionsintheevaluationofstructure/sequencecompatibility 9704681. 5,6 7 *Correspondenceto:TamarSchlick,DepartmentofChemistryand ofproteins, homologymodeling, andproteinfolding CourantInstituteofMathematicalSciences,NewYorkUniversityand 8–10 simulations. Currently,moststatisticalpotentialsare theHowardHughesMedicalInstitute,251MercerStreet,NewYork, two-body,suchastheMiyazawa–Jernigan1,2 (MJ)and NY10012.E-mail:[email protected] Sippl3 potentials.Asalargerstructuraldatabaseemerges, Received4August2000;Accepted8December2000 thedevelopmentofmultibodypotentials—tomodelthe Publishedonline00Month2001 ©2001WILEY-LISS,INC. 162 H.H. GAN ET AL. matches better than two-body statistical potentials.15,16 In in native proteins are related to their observed frequency the present investigation, we discuss the evaluation of this in a representative structural database. Since energies are potential and its implementation for lattice protein folding computed from a set of folded structures, they are more using a chain growth algorithm. The evaluation is per- appropriately interpreted as a mean, rather than as bare formed by means of a statistical geometrical method in interaction energies. These mean energies are related to which the four-body residue neighbors are systematically potentials of mean force in statistical mechanics, obtained enumerated using the Delaunay tessellation technique. by ensemble averaging over equilibrium states.13,18 In Four-body energies are then related to the frequencies of structure-derived potentials, ensemble averaging is effec- observed four-residue clusters in protein structures via the tively replaced by averaging over a set of representative Boltzmann relation. Our evaluation shows that while most protein structures. Critical assessment of this interpreta- terms have attained their saturation values, some have tion using a simple two-residue hydrophobic–hydrophilic not converged, indicating that a larger structural database (HP) protein model shows that it yields a correct ranking of is needed to determine this potential more accurately. contact energies, but their absolute values are imprecise.18 We compare the performance of the four-body potential The following discussion presents two-body and multibody in protein structure prediction to the two-body MJ poten- potentials as potentials of mean force. tial using a lattice C␣ protein model. On the (311) cubic For a protein chain with N residue interaction centers, lattice, we generate configurational ensembles for proteins we define the n-body potential of mean force w(n) for the 17 with our recently implemented chain growth algorithm. residue-cluster (i1i2...in)as We have shown that this variant is effective for calculating ͑n͒ ϭ Ϫ ͓ ͑n͒ ͔ conformational and thermodynamic properties of several wi1i2 ...in kBT ln Fi1i2 ...in/Ri1i2 ...in (1) test proteins.17 In this article, we examine the quality of where R is the reference state (defined below) and the two and four-body potentials using coordinate root- i1i2...in (n) mean-square deviation (cRMSD) between native and pre- F is the probability density of finding the residues in a dicted structures and energy/cRMSD scatter plots. As the cluster: computational cost in generating ensembles using the four-body potential is currently prohibitive, we can only ͑ ͒ ͑ Ϫ ͒ ͑ Ϫ ͒ F n ϭ ͵ dV N n exp͑Ϫ␤E ͒/͵ dV N 1 exp͑Ϫ␤E ͒ (2) apply it to the conformational ensembles generated with i1i2 ...in N N the two-body potential. We find that the shapes of the ␤ ϭ energy/cRMSD plots for the two and four-body potentials where the temperature parameter 1/(kBT), the vol- (N Ϫ n) ϭ are correlated for low-energy and low cRMSD configura- ume element dV (dV1dV2...dVN)/ (dV dV ...dV ), and E is the total energy of the protein. tions. The two and four-body energies are generally weakly i1 i2 in N correlated. The n-body potential is defined with respect to a “reference state” R determined by the specific problem of inter- For a set of 22 proteins, the predicted cRMSD values by i1i2...in the MJ potential are about8Åfor␣ proteins and some- est. Moreover, what poorer, around 9 Å, for ␤ and ␣/␤ protein classes. ͑n͒ ϭ Superpositions of predicted and native structures show Fi1i2 ...in Ri1i2 ...in (3) rough overall agreements for some proteins. Our four-body when the mean potential w(n) ϭ 0. potential yields comparable results. Improving the four- i1i2...in body potential requires larger protein data sets than used It is useful to separate correlated from uncorrelated as- in the present study. Furthermore, a more sophisticated pects of many-body interactions. The reference state is often protein model that uses finer representation for each chosen to be the uncorrelated state where there are no residue may be required. Thus, although the present interactions between residues. If the interaction energy EN vanishes, F(n) in eq. 2 becomes a product of the uncorre- four-body potential may be adequate for protein fold i1i2...in recognition, further improvements are necessary for more lated, single residue properties. Mathematically, we write R ϳ ͟n R , where R is the frequency of occurrence of accurate ab initio structure prediction. i1i2...in a ia ia individual residues. Consequently, the mean potential w(n) i1i2...in STATISTICAL POTENTIALS is interpreted as a measure of the nonrandom nature of residue distributions, or contacts, in protein structures. The We begin by presenting the modified MJ potential, the correlations among the residues in the cluster (i i
Recommended publications
  • Algorithms & Experiments for the Protein
    ALGORITHMS & EXPERIMENTS FOR THE PROTEIN CHAIN LATTICE FITTING PROBLEM DALLAS THOMAS Bachelor of Science, University of Lethbridge, 2004 Bachelor of Science, Kings University College, 2002 A Thesis Submitted to the School of Graduate Studies of the University of Lethbridge in Partial Fulfillment of the Requirements for the Degree MASTER OF SCIENCE Department of Mathematics and Computer Science University of Lethbridge LETHBRIDGE, ALBERTA, CANADA c Dallas Thomas, 2006 Abstract This study seeks to design algorithms that may be used to determine if a given lattice is a good approximation to a given rigid protein structure. Ideal lattice models discovered using our techniques may then be used in algorithms for protein folding and inverse protein folding. In this study we develop methods based on dynamic programming and branch and bound in an effort to identify “ideal” lattice models. To further our understanding of the concepts behind the methods we have utilized a simple cubic lattice for our analysis. The algorithms may be adapted to work on any lattice. We describe two algorithms. One for aligning the protein backbone to the lattice as a walk. This algorithm runs in polynomial time. The sec- ond algorithm for aligning a protein backbone as a path to the lattice. Both the algorithms seek to minimize the CRMS deviation of the alignment. The second problem was recently shown to be NP-Complete, hence it is highly unlikely that an efficient algorithm exists. The first algorithm gives a lower bound on the optimal solution to the second problem, and can be used in a branch and bound procedure.
    [Show full text]
  • Resource-Efficient Quantum Algorithm for Protein Folding
    www.nature.com/npjqi ARTICLE OPEN Resource-efficient quantum algorithm for protein folding ✉ Anton Robert1,2, Panagiotis Kl. Barkoutsos 1, Stefan Woerner 1 and Ivano Tavernelli 1 Predicting the three-dimensional structure of a protein from its primary sequence of amino acids is known as the protein folding problem. Due to the central role of proteins’ structures in chemistry, biology and medicine applications, this subject has been intensively studied for over half a century. Although classical algorithms provide practical solutions for the sampling of the conformation space of small proteins, they cannot tackle the intrinsic NP-hard complexity of the problem, even when reduced to the simplest Hydrophobic-Polar model. On the other hand, while fault-tolerant quantum computers are beyond reach for state-of- the-art quantum technologies, there is evidence that quantum algorithms can be successfully used in noisy state-of-the-art quantum computers to accelerate energy optimization in frustrated systems. In this work, we present a model Hamiltonian with OðN4Þ scaling and a corresponding quantum variational algorithm for the folding of a polymer chain with N monomers on a lattice. The model reflects many physico-chemical properties of the protein, reducing the gap between coarse-grained representations and mere lattice models. In addition, we use a robust and versatile optimization scheme, bringing together variational quantum algorithms specifically adapted to classical cost functions and evolutionary strategies to simulate the folding of the 10 amino acid Angiotensin on 22 qubits. The same method is also successfully applied to the study of the folding of a 7 amino acid neuropeptide using 9 qubits on an IBM 20-qubit quantum computer.
    [Show full text]
  • Doctor of Philosophy
    Protein structure recognition: From eigenvector analysis to structural threading method by Haibo Cao A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Condensed Matter Physics Program of Study Committee: Kai-Ming Bo, Major Professor Drena Dobbs Amy Andreotti Alan Goldman Bruce Harmon Joerg Schmalian Iowa State University Ames, Iowa 2003 .. 11 Graduate College Iowa State University This is to certify that the doctoral dissertation of Haibo Cao has met the dissertation requirements of Iowa State University Major Professor For the Major Program ... 111 TABLE OF CONTENTS ... Acknowledgments ................................... vm Abstract and Organization of Dissertation ..................... X CHAPTER 1. Protein Structure And Building Blocks ............. 1 Proteins. .......................................... 1 AminoAcids ........................................ 3 Secondary Structure ................................... 4 aHelix ...................................... 4 psheet ......................................... 5 Bibliography ........................................ 11 CHAPTER 2 . Protein Folding Problem ...................... 13 Levinthal’s Paradox .................................... 13 Interactions ......................................... 15 MJmatrix ....................................... 16 LTW parameterization ................................ 19 Bibliography ........................................ 25 CHAPTER 3 . Simple Exact Model .......................
    [Show full text]
  • Combinatorial Algorithms for Protein Folding in Lattice Models: a Survey of Mathematical Results
    Combinatorial Algorithms for Protein Folding in Lattice Models: A Survey of Mathematical Results Sorin Istrail∗ Center for Computational Molecular Biology Department of Computer Science Brown University Providence, RI 02912 Fumei Lam† Center for Computational Molecular Biology Department of Computer Science Brown University Providence, RI 02912 Dedicated to Michael Waterman’s 67th Birthday June 17, 2009 “ ... a very nice step forward in the computerology of proteins.” Ken Dill 1995[1] Abstract We present a comprehensive survey of combinatorial algorithms and theorems about lattice protein folding models obtained in the almost 15 years since the publication in 1995 of the first protein folding approximation algorithm with mathematically guar- anteed error bounds [60]. The results presented here are mainly about the HP-protein folding model introduced by Ken Dill in 1985 [37]. The main topics of this survey include: approximation algorithms for linear-chain and side-chain lattice models, as well as off-lattice models, NP-completeness theorems about a variety of protein folding models, contact map structure of self-avoiding walks and HP-folds, combinatorics and algorithmics of side-chain models, bi-sphere packing and the Kepler conjecture, and the protein side-chain self-assembly conjecture. As an appealing bridge between the hybrid of continuous mathematics and discrete mathematics, a cornerstone of the mathemat- ical difficulty of the protein folding problem, we show how work on 2D self-avoiding walks contact-map decomposition [56] can build upon the exact RNA contacts counting formula by Mike Waterman and collaborators [96] which lead to renewed hope for ana- lytical closed-form approximations for statistical mechanics of protein folding in lattice models.
    [Show full text]
  • Arxiv:2004.01118V2 [Quant-Ph] 18 May 2021
    Investigating the potential for a limited quantum speedup on protein lattice problems Carlos Outeiral1;2, Garrett M. Morris1, Jiye Shi3, Martin Strahm4, Simon C. Benjamin2;∗, Charlotte M. Deane1;y 1Department of Statistics, University of Oxford, 24-29 St Giles', Oxford OX1 3LB, United Kingdom 2Department of Materials, University of Oxford, Parks Road, Oxford OX1 3PH, United Kingdom 3Computer-Aided Drug Design, UCB Pharma, 216 Bath Road, Slough SL1 3WE, United Kingdom 4Pharma Research & Early Development, F. Hoffmann-La Roche, Grenzacherstrasse 4058, Basel, Switzerland ∗To whom correspondence shall be addressed: [email protected] yTo whom correspondence shall be addressed: [email protected] Abstract. Protein folding is a central challenge in computational biology, with important applications in molecular biology, drug discovery and catalyst design. As a hard combinatorial optimisation problem, it has been studied as a potential target problem for quantum annealing. Although several experimental implementations have been discussed in the literature, the computational scaling of these approaches has not been elucidated. In this article, we present a numerical study of quantum annealing applied to a large number of small peptide folding problems, aiming to infer useful insights for near-term applications. We present two conclusions: that even na¨ıve quantum annealing, when applied to protein lattice folding, has the potential to outperform classical approaches, and that careful engineering of the Hamiltonians and schedules involved can deliver notable relative improvements for this problem. Overall, our results suggest that quantum algorithms may well offer improvements for problems in the protein folding and structure prediction realm. arXiv:2004.01118v2 [quant-ph] 18 May 2021 Keywords: quantum annealing, protein folding, quantum optimisation, biophysics 1.
    [Show full text]
  • Learning Protein Constitutive Motifs from Sequence Data
    Manuscript SUBMITTED TO eLife Learning PROTEIN CONSTITUTIVE MOTIFS FROM SEQUENCE DATA Jérôme Tubiana, Simona Cocco, Rémi Monasson LaborATORY OF Physics OF THE Ecole Normale Supérieure, CNRS & PSL Research, 24 RUE Lhomond, 75005 Paris, FrANCE AbstrACT Statistical ANALYSIS OF Evolutionary-rELATED PROTEIN SEQUENCES PROVIDES INSIGHTS ABOUT THEIR STRUCTURe, function, AND HISTORY. WE SHOW THAT Restricted Boltzmann Machines (RBM), DESIGNED TO LEARN COMPLEX high-dimensional DATA AND THEIR STATISTICAL FEATURes, CAN EffiCIENTLY MODEL PROTEIN FAMILIES FROM SEQUENCE information. WE HERE APPLY RBM TO TWENTY PROTEIN families, AND PRESENT DETAILED RESULTS FOR TWO SHORT PROTEIN domains, Kunitz AND WW, ONE LONG CHAPERONE PRotein, Hsp70, AND SYNTHETIC LATTICE PROTEINS FOR benchmarking. The FEATURES INFERRED BY THE RBM ARE BIOLOGICALLY INTERPRetable: THEY ARE RELATED TO STRUCTURE (such AS Residue-rESIDUE TERTIARY contacts, EXTENDED SECONDARY MOTIFS ( -helix AND -sheet) AND INTRINSICALLY DISORDERED Regions), TO FUNCTION (such AS ACTIVITY AND LIGAND SPECIficity), OR TO PHYLOGENETIC IDENTITY. IN addition, WE USE RBM TO DESIGN NEW PROTEIN SEQUENCES WITH PUTATIVE PROPERTIES BY COMPOSING AND TURNING UP OR DOWN THE DIffERENT MODES AT will. Our WORK THEREFORE SHOWS THAT RBM ARE A VERSATILE AND PRACTICAL TOOL TO UNVEIL AND EXPLOIT THE genotype-phenotype RELATIONSHIP FOR PROTEIN families. Sequencing OF MANY ORGANISM GENOMES HAS LED OVER THE RECENT YEARS TO THE COLLECTION OF A HUGE NUMBER OF PROTEIN sequences, GATHERED IN DATABASES SUCH AS UniPrOT OR PFAM Finn ET al. (2013). Sequences WITH A COMMON ANCESTRAL origin, DEfiNING A FAMILY (Fig. 1A), ARE LIKELY TO CODE FOR PROTEINS WITH SIMILAR FUNCTIONS AND STRUCTURes, HENCE PROVIDING A UNIQUE WINDOW INTO THE RELATIONSHIP BETWEEN GENOTYPE (sequence content) AND PHENOTYPE (biological FEATURes).
    [Show full text]
  • Protein Preliminaries and Structure Prediction Fundamentals For
    AN ARXIV PREPRINT 1 Protein preliminaries and structure prediction fundamentals for computer scientists Mahmood A. Rashid, Firas Khatib, and Abdul Sattar Abstract—Protein structure prediction is a chal- I. PROTEIN FUNDAMENTALS lenging and unsolved problem in computer science. Proteins are the sequence of amino acids connected Proteins are the principal constituents of the together by single peptide bond. The combinations of protoplasm of all cells. Proteins essentially con- the twenty primary amino acids are the constituents sist of amino acids, which are themselves bonded of all proteins. In-vitro laboratory methods used in by peptide linkages. In this section, we give an this problem are very time-consuming, cost-intensive, overview of protein preliminaries. and failure-prone. Thus, alternative computational methods come into play. The protein structure predic- tion problem is to find the three-dimensional native A. Amino acids structure of a protein, from its amino acid sequence. The native structure of a protein has the minimum Amino acids, also known as protein monomers free energy possible and arguably determines the or residues, are the molecules containing an amino function of the protein. In this study, we present the group, a carboxyl group, and a side-chain. This preliminaries of proteins and their structures, protein side-chain is the only component that varies be- structure prediction problem, and protein models. We tween different amino acids. Figure 1 shows the also give a brief overview on experimental and compu- tational methods used in protein structure prediction. generic structure of an amino acid. An amino This study will provide a fundamental knowledge to acid has the generic formula H2NCHROOH.
    [Show full text]
  • Protein Folding Optimization Using Differential Evolution
    Protein Folding Optimization using Differential Evolution Extended with Local Search and Component Reinitialization Borko Boˇskovi´c · Janez Brest This version, created on May 8, 2018, is an electronic reprint of the original article published in the Information Sciences Journal, 2018, Vol. 454-455, pp. 178{199, doi: 10.1016/j.ins.2018.04.072. This reprint differs from the original in pagination and typographic detail. Abstract This paper presents a novel Differential Evo- 1 Introduction lution algorithm for protein folding optimization that is applied to a three-dimensional AB off-lattice model. The protein structure prediction represents the prob- The proposed algorithm includes two new mechanisms. lem of how to predict the native structure of a pro- A local search is used to improve convergence speed and tein from its amino acid sequence. This problem is one to reduce the runtime complexity of the energy calcula- of the more important challenges of this century [17] tion. For this purpose, a local movement is introduced and, because of its nature, it attracts scientists from within the local search. The designed evolutionary algo- different fields, such as Physics, Chemistry, Biology, rithm has fast convergence speed and, therefore, when Mathematics, and Computer Science. Within the pro- it is trapped into the local optimum or a relatively good tein structure prediction, the Protein Folding Optimiza- solution is located, it is hard to locate a better similar tion (PFO) represents a computational problem for sim- solution. The similar solution is different from the good ulating the protein folding process and finding a native solution in only a few components.
    [Show full text]
  • A Multi-Objective Ab-Initio Model for Protein Folding Prediction at an Atomic Conformation Level
    NATIONAL UNIVERSITY OF COLOMBIA A Multi-objective Ab-initio Model for Protein Folding Prediction at an Atomic Conformation Level by David Camilo Becerra Romero A thesis submitted in partial fulfillment for the degree of Msc. System Engineering and Computation in the Engineering Computer Science February 2010 Declaration of Authorship The following articles were published during my period of research in the master program. A Novel Ab-Initio Genetic Based Approach for Protein Folding Prediction, In Proceedings of the 2007 Conference on Genetic and Evolutionary Computation (GECCO'07) (London, UK, July 7 - 11, 2007), pages 393-400, ACM Press. A Novel Method for Protein Folding Prediction Based on Genetic Algorithms and Hashing, In Proceedings of the II Computation Colombian Congress (2CCC) (Bogot´a,Colombia, April 18 - 20, 2007). A Model for Resource Assignment to Transit Routes in Bogota Transportation Sys- tem Transmilenio, In Proceedings of the III Computation Colombian Congress (3CCC) (Medell´ın,Colombia, April 23 - 25, 2008). An Association Rule Based Model For Information Extraction From Protein Sequence Data, In Proceedings of the III Computation Colombian Congress (3CCC) (Medelln, Colombia, April 23 - 25, 2008). Un Modelo de Asignacion de Recursos a Rutas en el Sistema de Transporte Masivo Trans- milenio, Avances en Sistemas e Inform´atica. Special Volume Vol. 5 Number. 2, June 2008, ISBN 1657-7663. An Association Rule Based Model For Information Extraction From Protein Sequence Data, Avances en Sistemas e Inform´atica.Special Volume Vol. 5 Number. 2, June 2008, ISBN 1657-7663. An Artificial Network Topology Based on Association Rules for Protein Secondary Struc- ture Prediction (Best paper award), III Biotechnology Colombian Congress (Bogot´a, Colombia, July 29 August 01, 2008).
    [Show full text]
  • On Lattice Protein Structure Prediction Revisited Ivan Dotu Manuel Cebrian´ Pascal Van Hentenryck Peter Clote
    1 On Lattice Protein Structure Prediction Revisited Ivan Dotu Manuel Cebrian´ Pascal Van Hentenryck Peter Clote Abstract—Protein structure prediction is regarded as a highly challenging problem both for the biology and for the computational communities. Many approaches have been developed in the recent years, moving to increasingly complex lattice models or even off-lattice models. This paper presents a Large Neighborhood Search (LNS) to find the native state for the Hydrophobic-Polar (HP) model on the Face Centered Cubic (FCC) lattice or, in other words, a self-avoiding walk on the FCC lattice having a maximum number of H-H contacts. The algorithm starts with a tabu-search algorithm, whose solution is then improved by a combination of constraint programming and LNS. This hybrid algorithm improves earlier approaches in the literature over several well-known instances and demonstrates the potential of constraint-programming approaches for ab initio methods. Index Terms—Protein Structure, Constraint Programming, Local Search. ! 1 INTRODUCTION successful drug.Indeed, it has been stated that: “Prediction of protein structure in silico has thus been the ‘holy grail’ of In 1973, Nobel laureat C.B. Anfinsen [3] denatured the 124 computational biologists for many years” [53]. Despite the residue protein, bovine ribonuclease A, by the addition of urea. quantity of work on this problem over the past 30 years, Upon removal of the denaturant, the ribonuclease, an enzyme, and despite the variety of methods developed for structure was determined to be fully functional, thus attesting the prediction, no truly accurate ab initio methods exist to pre- successful reformation of functional 3-dimensional structure.
    [Show full text]
  • Protein Structure Prediction with Lattice Models
    1 Protein Structure Prediction with Lattice Models 1.1 Introduction:::::::::::::::::::::::::::::::::::::::::::: 1-1 1.2 Hydrophobic-Hydrophilic Lattice Models :::::::::: 1-3 1.3 Computational Intractability :::::::::::::::::::::::: 1-5 Initial Results ² Robust Results ² Finite-Alphabet Results 1.4 Performance-Guaranteed Approximation Algorithms ::::::::::::::::::::::::::::::::::::::::::::: 1-8 HP Model ² HP Model with Side-Chains ² Off-Lattice HP Model ² Robust Approximability for HP Models on General Lattices ² Accessible Surface Area Lattice William E. Hart Model Sandia National Laboratories 1.5 Exact Methods :::::::::::::::::::::::::::::::::::::::: 1-18 Enumeration of Hydrophobic Cores ² Constraint Alantha Newman Programming ² Integer Programming Massachusetts Institute of Technology 1.6 Conclusions :::::::::::::::::::::::::::::::::::::::::::: 1-21 1.1 Introduction A protein is a complex biological macromolecule composed of a sequence of amino acids. Proteins play key roles in many cellular functions. Fibrous proteins are found in hair, skin, bone, and blood. Membrane proteins are found in cells’ membranes, where they mediate the exchange of molecules and information across cellular boundaries. Water-soluble globular proteins serve as enzymes that catalyze most cellular biochemical reactions. Amino acids are joined end-to-end during protein synthesis by the formation of peptide bonds (see Figure 1.1). The sequence of peptide bonds forms a “main chain” or “backbone” for the protein, off of which project the various side chains. Unlike the structure of other biological macromolecules, proteins have complex, irregular structures. The sequence of residues in a protein is called its primary structure. Proteins exhibit a variety of motifs that reflect common structural elements in a local region of the polypeptide chain: ®- helices, ¯-strands, and loops—often termed secondary structures. Groups of these secondary structures usually combine to form compact globular structures, which represent the three- dimensional tertiary structure of a protein.
    [Show full text]
  • Fioretto-Master-Thesis.Pdf
    A CONSTRAINT PROGRAMMING APPROACH FOR THE ANALYSIS OF PROTEINS CONFORMATIONS VIA FRAGMENT ASSEMBLY BY FERDINANDO FIORETTO A thesis submitted to the Graduate School in partial fulfillment of the requirements for the degree Master of Science Subject: Computer Science New Mexico State University Las Cruces, New Mexico November 2011 Copyright c 2011 by Ferdinando Fioretto \A Constraint Programming approach for the analysis of proteins conformations via Fragment Assembly," a thesis prepared by Ferdinando Fioretto in partial ful- fillment of the requirements for the degree, Master of Science, has been approved and accepted by the following: Linda Lacey Dean of the Graduate School Enrico Pontelli Chair of the Examining Committee Date Committee in charge: Dr. Enrico Pontelli, Chair Dr. Champa Sengupta-Gopalan Dr. Enrico Pontelli Dr. Son Cao Tran ii DEDICATION This Thesis is dedicated to my family: my mother, Anna, my father, Bia- gio, and my sister Ilaria. iii ACKNOWLEDGMENTS I thank Enrico Pontelli, my advisor, for his guidance and encouragement in my research topics and for making this Thesis possible. Enrico allowed me to work freely on what I present in this Thesis, and supported me in numberless discussions with a fresh, outside look, making my Master studies a great experi- ence. My great gratitude goes to Alessandro Dal Pal´u,that originally offered me to join this project, and literally guided me into such research area, sharing with an uncountable number of ideas and suggestions. To him I owe nearly everything I know about computational biology. The Knowledge representation, Logic, and Advanced Programming Laboratory at New Mexico State University has been an inspiring and gratifying working environment for the past year, giving me the opportunity to meet extraordinary people.
    [Show full text]