Advances in Protein Structure Prediction and Design

REVIEWS | TECHNOLOGIES AND TECHNIQUES

Brian Kuhlman1,2* and Philip Bradley3,4*

1Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC, USA.
2Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA.
3Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
4Institute for Protein Design, University of Washington, Seattle, WA, USA.
*e-mail: bkuhlman@email.unc.edu; pbradley@fredhutch.org
https://doi.org/10.1038/s41580-019-0163-x
Nature Reviews | Molecular Cell Biology | Volume 20 | November 2019 | 681

Abstract | The prediction of protein three-dimensional structure from amino acid sequence has been a grand challenge problem in computational biophysics for decades, owing to its intrinsic scientific interest and also to the many potential applications for robust protein structure prediction algorithms, from genome interpretation to protein function prediction. More recently, the inverse problem (designing an amino acid sequence that will fold into a specified three-dimensional structure) has attracted growing attention as a potential route to the rational engineering of proteins with functions useful in biotechnology and medicine. Methods for the prediction and design of protein structures have advanced dramatically in the past decade. Increases in computing power and the rapid growth in protein sequence and structure databases have fuelled the development of new data-intensive and computationally demanding approaches for structure prediction. New algorithms for designing protein folds and protein–protein interfaces have been used to engineer novel high-order assemblies and to design from scratch fluorescent proteins with novel or enhanced properties, as well as signalling proteins with therapeutic potential. In this Review, we describe current approaches for protein structure prediction and design and highlight a selection of the successful applications they have enabled.

The stunning diversity of molecular functions performed by naturally evolved proteins is made possible by their finely tuned three-dimensional structures, which are in turn determined by their genetically encoded amino acid sequences. A predictive understanding of the relationship between amino acid sequence and protein structure would therefore open up new avenues, both for the prediction of function from genome sequence data and also for the rational engineering of novel protein functions through the design of amino acid sequences with specific structures. The past decade has seen dramatic improvements in our ability to predict and design the three-dimensional structures of proteins, with potentially far-reaching implications for medicine and our understanding of biology. New machine-learning algorithms have been developed that analyse the patterns of correlated mutations in protein families, to predict structurally interacting residues from sequence information alone1,2. Improved protein energy functions3,4 have for the first time made it possible to start with an approximate structure prediction model and move it closer to the experimentally determined structure by an energy-guided refinement process5,6. Advances in protein conformational sampling and sequence optimization have permitted the design of novel protein structures and complexes7,8, some of which show promise as therapeutics9.

These advances in protein structure prediction and design have been fuelled by technological breakthroughs as well as by a rapid growth in biological databases. Protein-modelling algorithms (Box 1) are computationally demanding both to develop and to apply. The rapid increase in computing power available to researchers (both CPU-based and, increasingly, GPU-based computing power) facilitates rapid benchmarking of new algorithms and enables their application to larger molecules and molecular assemblies. At the same time, next-generation sequencing has fuelled a dramatic increase in protein sequence databases as genomic and metagenomic sequencing efforts have expanded10. Advances in software and automation have increased the pace of experimental structure determination, speeding the growth of the database of experimentally determined protein structures (the Protein Data Bank (PDB))11, which now contains close to 150,000 macromolecular structures. Deep-learning algorithms12 that have revolutionized image processing and speech recognition are now being adopted by protein modellers seeking to take advantage of these expanded sequence and structural databases.

In this Review, we highlight a selection of recent breakthroughs that these technological advances have enabled. We describe current approaches to the prediction and design of protein structures, focusing primarily on template-free methods that do not require an

Box 1 | Navigating protein energy landscapes

Protein conformational energy landscapes are complex, high-dimensional surfaces with many local minima. Navigating these landscapes in order to locate low-energy basins for prediction and design requires efficient sampling methods and accurate energy functions. In gradient-based optimization approaches (see the figure, upper left panel), the derivatives of the energy function with respect to the flexible degrees of freedom (e.g. the atomic coordinates or backbone torsion angles) are calculated in order to proceed in the direction in which the energy decreases most rapidly. Gradient-based optimization is effective at finding the nearest local minimum in the energy landscape, but it will not generally locate the global minimum.
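
To make the local-minimum behaviour of gradient-based optimization concrete, the following is a minimal sketch on a one-dimensional toy landscape. The tilted double-well energy, the step size and the function names are illustrative assumptions and are not taken from the Review; real protein energy functions have thousands of degrees of freedom.

```python
def toy_energy(x: float) -> float:
    """Hypothetical tilted double-well energy (not a protein energy function).

    It has a shallow local minimum near x = +0.96 and a deeper, global
    minimum near x = -1.04.
    """
    return (x * x - 1.0) ** 2 + 0.3 * x


def toy_gradient(x: float) -> float:
    """Analytic derivative dE/dx of toy_energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def gradient_descent(gradient, x0, step=0.01, tol=1e-8, max_iter=100_000):
    """Follow the negative gradient until it (almost) vanishes.

    Each update moves in the direction in which the energy decreases most
    rapidly, so the search ends in the basin in which it started.
    """
    x = x0
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


if __name__ == "__main__":
    for start in (1.5, -1.5):
        x_min = gradient_descent(toy_gradient, start)
        print(f"start={start:+.1f} -> x={x_min:+.4f}, E={toy_energy(x_min):+.4f}")
```

Starting on the right-hand side of the barrier, the minimizer settles in the shallower well and never reaches the deeper one, which is the limitation described above.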
Monte Carlo sampling approaches employ randomly selected conformational moves and occasional uphill steps to escape local minima (see the figure, lower panels). In Metropolis Monte Carlo44, sampling moves are accepted (green arrows) or rejected (red arrows) on the basis of the change in energy: downhill moves that decrease the energy are accepted with probability 1, whereas uphill moves (dashed arrows) are accepted with a probability P that exponentially decreases as a function of the energy change. Examples of the move sets used for Monte Carlo simulations include fragment-replacement moves, in which a continuous backbone segment in the current conformation is replaced with an alternative conformation from a fragment library, and side-chain rotamer substitutions. A popular alternative to Monte Carlo sampling is molecular dynamics simulation (see the figure, upper right panel), in which the conformational sampling is dictated by Newton’s laws of motion applied to the potential energy function of the molecular system. Given a starting set of atomic positions and velocities, the force acting on each atom is calculated by taking the gradient of the potential energy, and a resulting acceleration is derived from Newton’s second law (F = ma). A very small step forwards in time is taken (typically of the order of a few femtoseconds), and new positions and velocities are calculated on the basis of the size of the time step and the old positions, velocities and accelerations. With an accurate energy function and sufficiently small time steps, a long molecular-dynamics simulation provides broad sampling of the energy landscape and also gives a realistic picture of how individual molecules evolve over time. 
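
The two sampling schemes described above can be sketched in a few lines each. This is a simplified illustration under stated assumptions, not the implementation used by any particular modelling package: the one-dimensional toy energy, the random-displacement move set, the temperature factor kT = 0.6 and the choice of the velocity Verlet integrator are all assumptions made here. With kT = 0.6, energy changes of 2.1, 0.33 and 1.6 give acceptance probabilities of about 0.03, 0.58 and 0.07, which appears consistent with the values annotated in the Box 1 figure.

```python
import math
import random


def toy_energy(x: float) -> float:
    """Hypothetical tilted double-well energy with its global minimum near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def metropolis_probability(delta_e: float, kt: float = 0.6) -> float:
    """Metropolis criterion: downhill moves are accepted with probability 1,
    uphill moves with a probability that decays exponentially with delta_e."""
    return 1.0 if delta_e <= 0.0 else math.exp(-delta_e / kt)


def metropolis_sample(energy, x0, n_steps, kt=0.6, max_move=0.5, seed=0):
    """Random-displacement Monte Carlo; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_move, max_move)  # trial move
        e_new = energy(x_new)
        if rng.random() < metropolis_probability(e_new - e, kt):
            x, e = x_new, e_new  # accept the move
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def velocity_verlet(force, x, v, mass, dt, n_steps):
    """Integrate Newton's second law (F = ma) in small time steps.

    New positions and velocities are computed from the old positions,
    velocities and accelerations. Returns the list of (x, v) states.
    """
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    for delta_e in (2.1, 0.33, 1.6):
        print(f"dE={delta_e}: P={metropolis_probability(delta_e):.2f}")
    print("lowest state found by Monte Carlo:", metropolis_sample(toy_energy, 1.0, 5000))
    # Harmonic oscillator (force = -x) as a stand-in for a molecular force field.
    x_end, v_end = velocity_verlet(lambda x: -x, 1.0, 0.0, 1.0, 0.01, 1000)[-1]
    print(f"MD after 1000 steps: x={x_end:.3f}, v={v_end:.3f}")
```

Unlike a pure minimizer, the Monte Carlo walk can cross the barrier out of the shallow well because uphill moves are occasionally accepted. The integrator, by contrast, follows a deterministic trajectory whose accuracy depends on the size of the time step.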
The challenge of modelling approaches based on molecular dynamics is that anywhere from millions to trillions of time steps must be conducted to reach biologically relevant time scales, requiring high-performance software and, in some cases, even special-purpose supercomputers170,171.

[Box 1 figure: Exploring the energy landscape. Panels: gradient-based minimization (from the start to the nearest energy minimum); molecular dynamics (F = ma, v = dx/dt; force field calculations for each atom determine time progression in femtosecond steps); Metropolis Monte Carlo (accepted and rejected moves towards the global minimum, annotated with energy changes ΔE = 2.1, 0.33 and 1.6 and acceptance probabilities P = 0.03, 0.58, 0.07 and 1); Monte Carlo moves (fragment replacement and rotamer substitution). Axes: energy versus conformation.]

experimentally determined structure as a template. The strengths and weaknesses of these modelling approaches, as well as their current and potential applications, will be discussed. Finally, we comment on the broader practical implications of these developments for the fields of biology and medicine.

Protein energy functions: Functions that correspond to a mathematical model of the molecular forces that determine protein structures and interactions. The choice of an energy function defines a map from structures onto energy values, referred to as an energy landscape, which can guide structure prediction and design simulations. Typical protein energy functions are linear combinations of multiple terms, each term capturing a distinct energetic contribution (van der Waals interactions, electrostatics, desolvation), with the weights and atomic parameters for these terms chosen by a parameterization procedure that seeks to optimize the agreement between the quantities predicted from the energy function and the corresponding values derived from experiments or from quantum chemistry calculations on small

in the shape of the energy landscape of the polypeptide: the native state is the one with the lowest free energy16,17. This hypothesis forms the basis for a general approach to protein structure prediction that combines sampling of alternative conformations with scoring to rank them by energy
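
The idea of scoring candidate conformations with an energy function built as a linear combination of terms, and then ranking them, can be sketched as follows. The term names follow the contributions named in the text, but the weights, the per-term values and the model names are made up for illustration and do not come from any published energy function.

```python
from typing import Dict, List, Tuple

# Hypothetical weights; in practice these come from a parameterization procedure.
WEIGHTS: Dict[str, float] = {
    "van_der_waals": 1.0,
    "electrostatics": 0.5,
    "desolvation": 0.75,
}


def total_energy(terms: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the individual energy terms of one conformation."""
    return sum(weights[name] * value for name, value in terms.items())


def rank_by_energy(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> List[Tuple[str, float]]:
    """Score every sampled conformation and sort from lowest to highest energy."""
    scored = [(name, total_energy(terms, weights)) for name, terms in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Illustrative per-term values for three sampled conformations (arbitrary units).
    sampled = {
        "model_a": {"van_der_waals": -10.0, "electrostatics": -2.0, "desolvation": 4.0},
        "model_b": {"van_der_waals": -12.0, "electrostatics": -1.0, "desolvation": 4.0},
        "model_c": {"van_der_waals": -6.0, "electrostatics": -4.0, "desolvation": 2.0},
    }
    for name, energy in rank_by_energy(sampled):
        print(f"{name}: {energy:.2f}")
```

In this sketch the lowest-scoring model is taken as the prediction, mirroring the hypothesis that the native state is the one with the lowest free energy. Changing the weights can change the ranking, which is why the parameterization procedure matters.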
