Protein Structure Prediction

Bioinformatics Algorithms
Protein structure prediction
David Hoksza
http://siret.ms.mff.cuni.cz/hoksza

Motivation
• Sequence → structure → function
• The number of available (protein) sequences grows much faster than the number of available 3D structures
• Given a protein sequence, we want to determine its structure
• Inverse problem to protein structure design where, given a structure, we want to find a sequence which codes for it

Structure → function
• Inferring function from structure
  • Detection of local structural motifs with functional roles
  • Analysis of surface clefts → catalytic sites
  • Conservation analysis
  • Quaternary structure (beware of false positives due to crystallization)
  • Buried and solvent-exposed residues
• Issues
  • Moonlighting proteins
    • Multiple functions carried out by a single domain
  • Conformational change of shape upon binding
    • Ligand-bound state (holo structure) vs. unbound state (apo structure)
  • Intrinsically disordered proteins (IDP)
    • Natively unfolded proteins

Sequence → structure
[Figure: Size of common cores as a function of protein homology. If two proteins of length $n_1$ and $n_2$ have $c$ residues in the common core, the fractions of each sequence in the common core are $c/n_1$ and $c/n_2$. These values are plotted, connected by a bar, against the residue identity of the core.]
[Figure: The relation of residue identity and the r.m.s. deviation of the backbone atoms of the common cores of 32 pairs of homologous proteins.]
Source: Chothia, Cyrus, and Arthur M. Lesk. "The relation between the divergence of sequence and structure in proteins." The EMBO Journal 5.4 (1986): 823.

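The two quantities shown in the Chothia and Lesk figures, the fractions $c/n_1$ and $c/n_2$ of each sequence in the common core and the r.m.s. deviation of core atoms, can be sketched in a few lines. This is only an illustration: the function names and the toy numbers are hypothetical, and the r.m.s. deviation here assumes the two sets of coordinates are already superposed (no fitting is performed).

```python
import math

def core_fractions(c, n1, n2):
    """Fractions of each sequence that lie in the common core: c/n1 and c/n2."""
    if not 0 <= c <= min(n1, n2):
        raise ValueError("core size must be between 0 and the shorter length")
    return c / n1, c / n2

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumption: the structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical values: a 90-residue core shared by proteins of 120 and 150 residues
f1, f2 = core_fractions(c=90, n1=120, n2=150)

# Two toy "backbones" that differ by a 1 Å shift along z
backbone_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
backbone_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(backbone_a, backbone_b)
```

In practice the superposition step (for example a least-squares fit) matters as much as the formula itself, which is why it is named as an assumption rather than hidden inside the function.
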
Protein structure prediction tasks
• Secondary structure prediction
  • Assign each amino acid one of three (or more) states (helix, sheet, loop)
• Tertiary structure prediction
  • Assign each amino acid/atom its position in 3D space
• Interaction sites prediction
  • Tertiary structure (intra-molecular) contacts
  • Protein-protein/DNA/RNA sites prediction (inter-molecular quaternary structure contacts)
  • Protein-ligand (active sites/pockets) prediction

Protein structure determination
• X-ray crystallography (89%)
• NMR spectroscopy (8%)
• 3D (cryo) electron microscopy (EM) (2%)

X-ray crystallography
• A crystallized protein is exposed to an X-ray beam; the electrons scatter the beam, and the scattered waves interfere with each other, forming a diffraction pattern which is observed
• Electron density of the crystal is determined by the positions of electrons (atoms) ↔ magnitudes and phases of the X-ray diffraction waves = diffraction pattern of the crystal
• Fourier transformation is used to estimate the electron density at each position
• Works only for proteins which form a crystal → suitable for rigid proteins but unsuitable for flexible proteins
Source: https://www.nature.com/news/cryo-electron-microscopy-wins-chemistry-nobel-1.22738

X-ray crystallography – quality measures
• Resolution
  • 3 Å → secondary structure
  • 2.5 Å → side chains
  • <1 Å → hydrogen atoms
• R-factor
  • After structure reconstruction, a theoretical diffraction pattern can be computed → the difference between the real and theoretical patterns, expressed as a percentage (how well the model back-predicts the data)
  • Rule of thumb: a good structure should have an R-factor lower than resolution/10 (≤ 0.3 for 3 Å resolution)
• R(free)-factor
  • The same comparison when set-aside data is used for the real pattern
• B-factor (temperature factor)
  • Thermal motion is present even in a crystal → extent to which the electron density is spread out for each atom
  • $B = 8\pi^2 U^2$
[Figure: Electron density maps at 3.7 Å, 2.4 Å, 1.5 Å and 0.8 Å resolution]
Source: Finding the best data for your needs in the PDB archive (EBI webinar, YouTube)

NMR spectroscopy
• Purified protein in solution is placed in a strong magnetic field and probed with radio waves; the observed resonances (each atom has a characteristic resonance in the magnetic field based on its surroundings) are analyzed to build a model of atomic nuclei and bonded atoms
• Resonances indicate which atoms are close to each other → list of restraints to build the model
• An NMR structure commonly includes an ensemble of structures which fit the restraints → diverse regions correspond to flexible parts
• Proteins in solution → works also for flexible proteins which can't be locked in a crystal
• Works for small to medium-sized proteins
[Figures: PDB ID: 6F0Y, PDB ID: 5MN3, PDB ID: 6BNH]

NMR spectroscopy – quality measures
• Completeness of resonance assignments
  • Percentage of atoms for which the resonances were measured
• Statistically unusual resonances
• Random coil index
  • How well the resonances fit usual protein conformations such as secondary structure
Source: Finding the best data for your needs in the PDB archive (EBI webinar, YouTube)

3D cryo-EM
• A beam of electrons and a system of electron lenses are used to image the biomolecule directly
• Cryo-EM
  • Vitrification: the protein solution is cooled so rapidly that water molecules do not have time to crystallize → thin layer of non-crystalline ice
  • Thousands of 2D projection images → 3D density map → fitting an atomic model to the map
• Nobel Prize in Chemistry 2017: Jacques Dubochet, Joachim Frank and Richard Henderson
• Ability to analyze large, complex and flexible structures
• Works for proteins in their native state
• Often breaks the 3 Å resolution barrier
[Figure: PDB ID: 3j3q]

Source: "Protein Data Bank: the single global archive for 3D macromolecular structure data." Nucleic Acids Research 47, no. D1 (2018): D520-D528.

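The X-ray quality measures listed above can be made concrete with a small sketch. The slides only describe the R-factor verbally; the formula used here is the conventional definition, R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|, which is an assumption about what the slide intends. The function names and the structure-factor amplitudes are hypothetical, and $U^2$ is taken to be the mean-square atomic displacement in Å².

```python
import math

def r_factor(f_obs, f_calc):
    """Conventional R-factor over matching reflections (assumed definition):
    sum of | |F_obs| - |F_calc| | divided by the sum of |F_obs|."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def passes_rule_of_thumb(r, resolution_angstrom):
    """Rule of thumb from the slide: R-factor should not exceed resolution/10."""
    return r <= resolution_angstrom / 10.0

def b_factor(mean_square_displacement):
    """B = 8 * pi^2 * U^2, with U^2 given in square angstroms."""
    return 8.0 * math.pi ** 2 * mean_square_displacement

# Hypothetical observed and model-derived structure-factor amplitudes
observed = [100.0, 50.0, 25.0, 25.0]
calculated = [90.0, 55.0, 30.0, 20.0]

r = r_factor(observed, calculated)
ok_at_2A = passes_rule_of_thumb(r, resolution_angstrom=2.0)
b = b_factor(0.25)
```

R(free) would use the same function, applied only to the reflections that were set aside and never used to build the model.
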
Protein folding
• Folding is the process through which a protein obtains its three-dimensional structure
• The protein tends to fold into the most thermodynamically favorable state, i.e. the state with the lowest free energy
• Folding is a thermodynamic process driven (mostly) by the protein's amino acid sequence
  • Anfinsen's dogma

Anfinsen's dogma
• All information needed to fold the native structure of a protein is contained in its amino acid sequence
• Experiment with ribonuclease A (RNase A), a 124-residue extracellular enzyme with 4 disulfide bonds
• Observation
  1. SS bonds reduced using mercaptoethanol → denaturation with 8 M urea → inactive protein, flexible random polymer
  2. Removal of urea → oxidation of –SH groups back to SS bonds → regain of 90% of activity
• Control (proving that the protein was unfolded)
  • Change of the order of steps in the second phase (oxidation before removal of urea) → 1-2% of activity and random assortment of SS bonds

Levinthal's paradox
• Reaching the native folded state of a protein by a random search among all possible configurations can take an enormously long time
  • An unfolded polypeptide chain has many degrees of freedom
  • Even a small number of allowed $\phi$ and $\psi$ combinations leads to an astronomically large number of structures
• Proteins fold in seconds at most, which is a paradox → there must be a pathway or set of pathways leading to an energetically favorable conformation
• Biased search
  • When some conformations are considered stabilizing and preferred (energy bias), the folding time becomes reasonable [Zwanzig et al. "Levinthal's paradox." PNAS 89.1 (1992): 20-22]

Structure prediction

Template existence dependency of tertiary structure prediction approaches
[Figure: sequence identity axis with a twilight zone at 20% – 30%. Night side: combinatorial exploration of the folding space with the goal to find the state with the lowest energy. Day side: utilization of existing structures.]
Source: Krieger, E., Nabuurs, S. B. and Vriend, G. (2003) Homology Modeling, in Structural Bioinformatics

Model scoring

Energy/scoring functions
• The native structure is the lowest free energy conformation → need for a function capable of assessing the energy/quality of a proposed structure
• Approaches
  • Potential energy
    • Atom-level resolution
    • Based on energy terms
  • Knowledge-based scoring functions
    • Residue-level resolution
    • Recognizing good folds from existing knowledge (PDB)

Potential energy function
• A potential energy function defines the potential energy of a system as a function of the positions of all its atoms
• Behavior of a molecule can be described by the Schrödinger equation, which in general describes the behavior of a dynamic system
  • We need to consider not only the atoms of the molecule, but also the surrounding water molecules
  • Were we able to compute the energy of the system, we could use it to score our predictions
  • To solve the equation we need to consider all nuclei and electrons of the system and their interactions → impossible to solve for systems of more than a few atoms → potential energy function / force field
• Molecular mechanics force field
  • Consists of energetic contributions of covalent (bonded) and electrostatic (non-bonded) interactions
  • Each contribution consists of a functional part and its parametrization
  • Atoms are represented by their centers only, although this depends on the type of energy function

Potential energy function – covalent interactions
• Bond-length potential
  • Treating the bond as a spring and describing its energy by Hooke's law
  • Bonds between chemically similar atoms have similar lengths, thus we can assume the observed equilibrium is the one with minimum potential energy
  • $E_{bond} = K_r (r - r_{eq})^2$, where $K_r$ is the spring constant, $r$ the bond length and $r_{eq}$ the equilibrium bond length
• Bond-angle potential
  • Same as bonds
  • $E_{angle} = K_\theta (\theta - \theta_{eq})^2$
• Dihedral angle potential
  • Dihedral angles do not have a single energy minimum
  • $E_{dihedral} = \frac{V_n}{2} [1 + \cos(n\phi - \gamma)]$, where $V_n$ is the barrier height, $n$ the given number of energy minima and $\gamma$ the angular offset
  • Not sufficient to represent the energy of a dihedral angle and often combined with the electrostatic energy between the first and last atom of the atoms involved in the dihedral angle
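
The three covalent terms above translate directly into code. The following is a minimal sketch rather than any particular force field's implementation: the function names and all parameter values are hypothetical, angles are taken in radians, and no units are enforced.

```python
import math

def bond_energy(r, r_eq, k_r):
    """Bond-length term: E_bond = K_r * (r - r_eq)^2 (Hooke's law)."""
    return k_r * (r - r_eq) ** 2

def angle_energy(theta, theta_eq, k_theta):
    """Bond-angle term: E_angle = K_theta * (theta - theta_eq)^2."""
    return k_theta * (theta - theta_eq) ** 2

def dihedral_energy(phi, v_n, n, gamma):
    """Dihedral term: E_dihedral = V_n / 2 * [1 + cos(n * phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))

# Hypothetical parameters, chosen only to exercise the formulas
e_bond = bond_energy(r=1.6, r_eq=1.5, k_r=300.0)
e_angle = angle_energy(
    theta=math.radians(112.0), theta_eq=math.radians(109.5), k_theta=50.0
)

# With n = 3 and gamma = 0 the term has three minima per full rotation
e_dihedral_min = dihedral_energy(phi=math.radians(60.0), v_n=2.0, n=3, gamma=0.0)
e_dihedral_max = dihedral_energy(phi=0.0, v_n=2.0, n=3, gamma=0.0)

covalent_energy = e_bond + e_angle + e_dihedral_min
```

The dihedral term swings between 0 and $V_n$, which is why $V_n$ is read as the barrier height, while the bond and angle terms grow without bound as the geometry departs from equilibrium.
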