Introduction of a Polar Core Into the De Novo Designed Protein Top7

Total Page:16

File Type:pdf, Size:1020Kb

Introduction of a Polar Core Into the De Novo Designed Protein Top7 Introduction of a polar core into the de novo designed protein Top7 Benjamin Basanta,1,2,3 Kui K. Chan,4 Patrick Barth,5,6,7 Tiffany King,8 Tobin R. Sosnick,8,9 James R. Hinshaw,10 Gaohua Liu,11,12 John K. Everett,11,12 Rong Xiao,11,12 Gaetano T. Montelione,11,12,13 and David Baker1,2,14* 1Department of Biochemistry, University of Washington, Seattle, Washington 98195 2Institute for Protein Design, University of Washington, Seattle, Washington 98195 3Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, Washington 98195, USA 4Enzyme Engineering, EnzymeWorks, California 92121 5Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, Houston, Texas 77030 6Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030 7Department of Pharmacology Baylor College of Medicine, Houston, Texas 77030 8Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637 9Institute for Biophysical Dynamics, University of Chicago, Chicago, Illinois 60637 10Department of Chemistry, University of Chicago, Chicago, Illinois 60637 11Department of Molecular Biology and Biochemistry, Center of Advanced Biotechnology and Medicine, The State University of New Jersey, Piscataway, New Jersey 08854 12Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854 13Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, the State University of New Jersey, Piscataway, New Jersey 08854 14Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195 Received 2 October 2015; Revised 4 February 2016; Accepted 8 February 2016 DOI: 10.1002/pro.2899 Published online 00 Month 2016 proteinscience.org Abstract: Design of polar interactions is a current challenge for protein design. The de novo designed protein Top7, like almost all designed proteins, has an entirely nonpolar core. Here we describe the replacing of a sizable fraction (5 residues) of this core with a designed polar hydrogen bond network. The polar core design is expressed at high levels in E. coli, has a folding free energy of 10 kcal/mol, and retains the multiphasic folding kinetics of the original Top7. The NMR structure of the design shows that conformations of three of the five residues, and the designed hydrogen bonds between them, are very close to those in the design model. The remaining two residues, which are more solvent exposed, sample a wide range of conformations in the NMR ensemble. These results show that hydrogen bond networks can be designed in protein cores, but also high- light challenges that need to be overcome when there is competition with solvent. Additional Supporting Information may be found in the online version of this article. Broader Audience Statement: Natural proteins have primarily hydrophobic cores and polar exteriors. With the long-term goal of mak- ing “inside out” proteins with polar cores, we explore the properties of a designed protein with a polar hydrogen bond network in its core Benjamin Basanta and Kui K. Chan contributed equally to this work Grant sponsor: US National Institutes of Health; Grant number: U54GM094597 (to G.T.M.); Grant sponsor: NIH; Grant number: GM055694 (to T.R.S); Grant sponsor: Howard Hughes Medical Institute. *Correspondence to: David Baker, University of Washington, Molecular Engineering and Sciences, Box 351655, Seattle, WA 98195- 1655. E-mail: [email protected] Published by Wiley-Blackwell. VC 2016 The Protein Society PROTEIN SCIENCE 2016 VOL 00:00—00 1 Keywords: protein design; hydrogen bonds; protein folding; protein NMR; protein core polar interactions Introduction mized by minimization in the Rosetta forcefield; the De novo protein design has advanced considerably lowest energy of the refined designs is shown in Fig- in recent years. New protein structures, functions, ure 1(C). and interactions have been successfully designed.1–5 In a first round of experiments, we attempted to Both the new structures and new interactions are express the two lowest energy designed “inside-out” stabilized primarily by nonpolar interactions; there proteins in Escherichia coli with the hope that they has been less success with hydrogen bonding and would form inclusion bodies that could then be solu- polar interaction networks. bilized in organic solvent. However, there was little The de novo designed Top7 protein has an or no expression of these rather unusual proteins. entirely hydrophobic core and is extremely stable.6 To evaluate the designed polar core independent of Characterization of the folding kinetics of Top7 such a drastic surface redesign, we replaced the sur- showed the process was considerably more complex face residues on the two lowest energy polar-core than that of most native proteins with the same Top7 designs with those of the original water-soluble size, with multiple distinct kinetic phases. Analysis design. As we anticipated that the polar core designs of fragments of Top7 showed that many had appreci- would be considerably destabilized, we decided to able structure in isolation. The high stability and follow the example of crambin, a small protein with low folding cooperativity of Top7 suggests that there a quite polar core,9 and incorporate a disulfide bond may have been more selection for folding cooperativ- into the design. We searched for pairs of positions at ity than stability during evolution.7 which substituted cysteines could form a disulfide With the long-range goal of designing an “inside bond by introducing cysteines at every residue and out” protein—with a polar core and a nonpolar exte- using the Rosetta all atom energy function to iden- rior—that could be stable in organic solvent, we tify residue pairs with low energy disulfide bonds. explored the replacement of a portion of the Top7 We identified three pairs of positions where disulfide hydrophobic core with a polar hydrogen bond net- bonds with good geometry could be introduced [Fig. work. The polar-core Top 7 (Top7_PC) is folded and 1(D), only the model for the best behaved protein is moderately stable, and structural characterization shown]. shows that the majority of the hydrogen bonds in We tested the designed disulfides initially in the the designed network are intact. context of the original nonpolar core Top7. The three disulfides were introduced by mutagenesis, and the Results proteins expressed and purified. One of the three With the goal of designing an “inside out” protein, introduced disulfides—between residues 12 and 45— we adapted the RosettaMembrane all-atom energy was formed quantitatively, as assessed using Ell- function which utilizes an implicit model of the lipid man’s reagent. Guanidine hydrochloride (GuHCl) surrounding the transmembrane portions of mem- denaturation experiments tracking far-UV circular brane proteins.8 We modified the energy function dichroism (CD) showed the introduced disulfide such that the protein environment was treated as a shifted the denaturation midpoint, Cm, of Top7, an uniform implicit apolar solvent instead of a hetero- already very stable protein, from 6.5M to 8M geneous lipid membrane. We first entirely rede- GuHCl (Top7_CC, Supporting Information Fig. 1). signed Top7 using the apolar potential, keeping the Because little or no unfolded baseline could be meas- backbone conformation fixed, and allowing all 20 ured, we could not determine the increase in folding amino acids at all 94 positions to assess whether free energy by fitting the full unfolding transition; networks of polar residues with low energy configu- instead we estimate from the shift in midpoint and rations could be designed in its core. The calcula- the m-value of Top7 that the disulfide increases sta- tions converged on very similar low energy designed bility by 3 kcal/mol. structures, with most core positions of the original We next introduced the 12 to 45 disulfide bond Top7 recapitulated except for five positions substi- into the protein having the designed core hydrogen tuted with polar residues forming a connected bond network, keeping the original Top7 surface res- hydrogen bond network with each residue stabilized idues at all other positions [Fig. 1(E)]. The resulting by two hydrogen bonds [Fig. 1(A,B)]. The polar net- protein “Top7_PC” expressed at high levels in E. coli work bridges several secondary structural elements and could be readily purified. The CD spectrum of in the Top7 core including the first and third beta Top7_PC was very similar to that of the wild-type strands and the C-terminal helix. The backbone and protein [Supporting Information Figs. 4 and 2(A) of side-chain dihedral angles of the designs were opti- Ref. 6]. Rotational correlation time (sc) estimates 2 PROTEINSCIENCE.ORG Polar Core in De Novo Designed Protein Top7 Figure 1. Top7_PC and Top7 models. Close up view comparing the core region of Top7 before (B) and after (A) hydrogen- bond network incorporation. (C) Hydrogen-bond network in the context of the whole structure of one of the initial “inverted” Top7 models. (D) Model of the disulfide-bonded variant of Top7. (E) Model of disulfide-bonded Top7 with core hydrogen-bond network. based on 15N NMR nuclear relaxation measure- teins are similar. Both proteins are stabilized ments (T1/T2) indicate that Top7_PC is predomi- entropically at low temperature but enthalpically at nantly monomeric in solution (Supporting
Recommended publications
  • Recent Advances in Automated Protein Design and Its Future
    EXPERT OPINION ON DRUG DISCOVERY 2018, VOL. 13, NO. 7, 587–604 https://doi.org/10.1080/17460441.2018.1465922 REVIEW Recent advances in automated protein design and its future challenges Dani Setiawana, Jeffrey Brenderb and Yang Zhanga,c aDepartment of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; bRadiation Biology Branch, Center for Cancer Research, National Cancer Institute – NIH, Bethesda, MD, USA; cDepartment of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA ABSTRACT ARTICLE HISTORY Introduction: Protein function is determined by protein structure which is in turn determined by the Received 25 October 2017 corresponding protein sequence. If the rules that cause a protein to adopt a particular structure are Accepted 13 April 2018 understood, it should be possible to refine or even redefine the function of a protein by working KEYWORDS backwards from the desired structure to the sequence. Automated protein design attempts to calculate Protein design; automated the effects of mutations computationally with the goal of more radical or complex transformations than protein design; ab initio are accessible by experimental techniques. design; scoring function; Areas covered: The authors give a brief overview of the recent methodological advances in computer- protein folding; aided protein design, showing how methodological choices affect final design and how automated conformational search protein design can be used to address problems considered beyond traditional protein engineering, including the creation of novel protein scaffolds for drug development. Also, the authors address specifically the future challenges in the development of automated protein design. Expert opinion: Automated protein design holds potential as a protein engineering technique, parti- cularly in cases where screening by combinatorial mutagenesis is problematic.
    [Show full text]
  • Structure-Function Relationship of Archaeal Rhodopsin Proteins Analyzed by Continuum Electrostatics
    Structure-Function Relationship of Archaeal Rhodopsin Proteins analyzed by Continuum Electrostatics DISSERTATION submitted to the Faculty of Biology, Chemistry and Geoscience of the University of Bayreuth, Germany for obtaining the degree of Doctor of Natural Sciences presented by Edda Kloppmann born in Luneburg,¨ Germany Bayreuth 2010 Vollstandiger¨ Abdruck der von der Fakultat¨ fur¨ Biologie, Chemie und Geowissenschaften der Universitat¨ Bayreuth genehmigten Disserta- tion zur Erlangung des akademischen Grades Doktor der Naturwis- ” senschaften (Dr. rer. nat.)“. Die vorliegende Arbeit wurde im Zeitraum von November 2002 bis November 2003 am IWR Heidelberg und von Dezember 2003 bis April 2007 am Lehrstuhl fur¨ Biopolymere an der Universitat¨ Bayreuth unter der Leitung von Professor G. Matthias Ullmann erstellt. Hiermit erklare¨ ich, dass ich die vorliegende Arbeit selbststandig¨ ver- fasst und keine anderen als die von mir angegebenen Quellen und Hilfsmittel verwendet habe. Ferner erklare¨ ich, dass ich nicht bereits anderweitig mit oder ohne Er- folg versucht habe, eine Dissertation einzureichen oder mich der Dok- torprufung¨ zu unterziehen. Bayreuth, den 13.01.2010 Prufungsausschuss:¨ 1. Gutachter: Prof. Dr. G. Matthias Ullmann 2. Gutachter: Prof. Dr. Franz X. Schmid Prufungsvorsitz:¨ Prof. Dr. Thomas Hellweg Prof. Dr. Benedikt Westermann Einreichung des Promotionsgesuchs: 13.01.2010 Kolloqium: 04.05.2010 Structure-Function Relationship of Archaeal Rhodopsin Proteins analyzed by Continuum Electrostatics ABSTRACT Rhodopsin proteins perform two cellular key functions: signaling of external stimuli and ion transport. Examples of both functional types are found in the family of archaeal rho- dopsins, namely the proton pump bacteriorhodopsin, the chloride pump halorhodopsin and the photoreceptor sensory rhodopsin II. For these three membrane proteins, high- resolution X-ray structures are available, allowing a theoretical investigation in atomic detail.
    [Show full text]
  • Advances in Rosetta Protein Structure Prediction on Massively Parallel Systems
    UC San Diego UC San Diego Previously Published Works Title Advances in Rosetta protein structure prediction on massively parallel systems Permalink https://escholarship.org/uc/item/87g6q6bw Journal IBM Journal of Research and Development, 52(1) ISSN 0018-8646 Authors Raman, S. Baker, D. Qian, B. et al. Publication Date 2008 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Advances in Rosetta protein S. Raman B. Qian structure prediction on D. Baker massively parallel systems R. C. Walker One of the key challenges in computational biology is prediction of three-dimensional protein structures from amino-acid sequences. For most proteins, the ‘‘native state’’ lies at the bottom of a free- energy landscape. Protein structure prediction involves varying the degrees of freedom of the protein in a constrained manner until it approaches its native state. In the Rosetta protein structure prediction protocols, a large number of independent folding trajectories are simulated, and several lowest-energy results are likely to be close to the native state. The availability of hundred-teraflop, and shortly, petaflop, computing resources is revolutionizing the approaches available for protein structure prediction. Here, we discuss issues involved in utilizing such machines efficiently with the Rosetta code, including an overview of recent results of the Critical Assessment of Techniques for Protein Structure Prediction 7 (CASP7) in which the computationally demanding structure-refinement process was run on 16 racks of the IBM Blue Gene/Le system at the IBM T. J. Watson Research Center. We highlight recent advances in high-performance computing and discuss future development paths that make use of the next-generation petascale (.1012 floating-point operations per second) machines.
    [Show full text]
  • Precise Assembly of Complex Beta Sheet Topologies from De Novo 2 Designed Building Blocks
    1 Precise Assembly of Complex Beta Sheet Topologies from de novo 2 Designed Building Blocks. 3 Indigo Chris King*†, James Gleixner†, Lindsey Doyle§, Alexandre Kuzin‡, John F. Hunt‡, Rong Xiaox, 4 Gaetano T. Montelionex, Barry L. Stoddard§, Frank Dimaio†, David Baker† 5 † Institute for Protein Design, University of Washington, Seattle, Washington, United States 6 ‡ Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, New York, United 7 States 8 x Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Northeast Struc- 9 tural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States 10 § Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States 11 12 ABSTRACT 13 Design of complex alpha-beta protein topologies poses a challenge because of the large number of alternative packing 14 arrangements. A similar challenge presumably limited the emergence of large and complex protein topologies in evo- 15 lution. Here we demonstrate that protein topologies with six and seven-stranded beta sheets can be designed by inser- 16 tion of one de novo designed beta sheet containing protein into another such that the two beta sheets are merged to 17 form a single extended sheet, followed by amino acid sequence optimization at the newly formed strand-strand, strand- 18 helix, and helix-helix interfaces. Crystal structures of two such designs closely match the computational design models. 19 Searches for similar structures in the SCOP protein domain database yield only weak matches with different beta sheet 20 connectivities. A similar beta sheet fusion mechanism may have contributed to the emergence of complex beta sheets 21 during natural protein evolution.
    [Show full text]
  • I S C B N E W S L E T T
    ISCB NEWSLETTER FOCUS ISSUE {contents} President’s Letter 2 Member Involvement Encouraged Register for ISMB 2002 3 Registration and Tutorial Update Host ISMB 2004 or 2005 3 David Baker 4 2002 Overton Prize Recipient Overton Endowment 4 ISMB 2002 Committees 4 ISMB 2002 Opportunities 5 Sponsor and Exhibitor Benefits Best Paper Award by SGI 5 ISMB 2002 SIGs 6 New Program for 2002 ISMB Goes Down Under 7 Planning Underway for 2003 Hot Jobs! Top Companies! 8 ISMB 2002 Job Fair ISCB Board Nominations 8 Bioinformatics Pioneers 9 ISMB 2002 Keynote Speakers Invited Editorial 10 Anna Tramontano: Bioinformatics in Europe Software Recommendations11 ISCB Software Statement volume 5. issue 2. summer 2002 Community Development 12 ISCB’s Regional Affiliates Program ISCB Staff Introduction 12 Fellowship Recipients 13 Awardees at RECOMB 2002 Events and Opportunities 14 Bioinformatics events world wide INTERNATIONAL SOCIETY FOR COMPUTATIONAL BIOLOGY A NOTE FROM ISCB PRESIDENT This newsletter is packed with information on development and dissemination of bioinfor- the ISMB2002 conference. With over 200 matics. Issues arise from recommendations paper submissions and over 500 poster submis- made by the Society’s committees, Board of sions, the conference promises to be a scientific Directors, and membership at large. Important feast. On behalf of the ISCB’s Directors, staff, issues are defined as motions and are discussed EXECUTIVE COMMITTEE and membership, I would like to thank the by the Board of Directors on a bi-monthly Philip E. Bourne, Ph.D., President organizing committee, local organizing com- teleconference. Motions that pass are enacted Michael Gribskov, Ph.D., mittee, and program committee for their hard by the Executive Committee which also serves Vice President work preparing for the conference.
    [Show full text]
  • Bioengineering Professor Trey Ideker Wins 2009 Overton Prize
    Bioengineering Professor Trey Ideker Wins 2009 Overton Prize March 13, 2009 Daniel Kane University of California, San Diego bioengineering professor Trey Ideker-a network and systems biology pioneer-has won the International Society for Computational Biology's Overton Prize. The Overton prize is awarded each year to an early-to-mid-career scientist who has already made a significant contribution to the field of computational biology. Trey Ideker is an Associate Professor of Bioengineering at UC San Diego's Jacobs School of Engineering, Adjunct Professor of Computer Science, and member of the Moores UCSD Cancer Center. He is a pioneer in using genome-scale measurements to construct network models of cellular processes and disease. His recent research activities include development of software and algorithms for protein network analysis, network-level comparison of pathogens, and genome-scale models of the response to DNA-damaging agents. "Receiving this award is a wonderful honor and helps to confirm that the work we have been doing for the past several years has been useful to people," said Ideker. "This award also provides great recognition to UC San Diego which has fantastic bioinformatics programs both at the undergraduate and graduate level. I could never have done it without the help of some really first-rate bioinformatics and bioengineering graduate students," said Ideker. Ideker is on the faculty of the Jacobs School of Engineering's Department of Bioengineering, which ranks 2nd in the nationfor biomedical engineering, according to the latest US News rankings. The bioengineering department has ranked among the top five programs in the nation every year for the past decade.
    [Show full text]
  • Precise Assembly of Complex Beta Sheet Topologies from De Novo Designed Building Blocks
    SHORT REPORT Precise assembly of complex beta sheet topologies from de novo designed building blocks Indigo Chris King1*†, James Gleixner1, Lindsey Doyle2, Alexandre Kuzin3, John F Hunt3, Rong Xiao4, Gaetano T Montelione4, Barry L Stoddard2, Frank DiMaio1, David Baker1 1Institute for Protein Design, University of Washington, Seattle, United States; 2Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, United States; 3Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, United States; 4Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, United States Abstract Design of complex alpha-beta protein topologies poses a challenge because of the large number of alternative packing arrangements. A similar challenge presumably limited the emergence of large and complex protein topologies in evolution. Here, we demonstrate that protein topologies with six and seven-stranded beta sheets can be designed by insertion of one de novo designed beta sheet containing protein into another such that the two beta sheets are merged to form a single extended sheet, followed by amino acid sequence optimization at the newly formed strand-strand, strand-helix, and helix-helix interfaces. Crystal structures of two such *For correspondence: chrisk1@ uw.edu designs closely match the computational design models. Searches for similar structures in the SCOP protein domain database yield only weak matches with different beta sheet connectivities. A † Present address: Molecular similar beta sheet fusion mechanism may have contributed to the emergence of complex beta Engineering and Sciences, sheets during natural protein evolution. University of Washington, DOI: 10.7554/eLife.11012.001 Seattle, United States Competing interests: The authors declare that no competing interests exist.
    [Show full text]
  • A Glance Into the Evolution of Template-Free Protein Structure Prediction Methodologies Arxiv:2002.06616V2 [Q-Bio.QM] 24 Apr 2
    A glance into the evolution of template-free protein structure prediction methodologies Surbhi Dhingra1, Ramanathan Sowdhamini2, Frédéric Cadet3,4,5, and Bernard Offmann∗1 1Université de Nantes, CNRS, UFIP, UMR6286, F-44000 Nantes, France 2Computational Approaches to Protein Science (CAPS), National Centre for Biological Sciences (NCBS), Tata Institute for Fundamental Research (TIFR), Bangalore 560-065, India 3University of Paris, BIGR—Biologie Intégrée du Globule Rouge, Inserm, UMR_S1134, Paris F-75015, France 4Laboratory of Excellence GR-Ex, Boulevard du Montparnasse, Paris F-75015, France 5DSIMB, UMR_S1134, BIGR, Inserm, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis F-97715, France Abstract Prediction of protein structures using computational approaches has been explored for over two decades, paving a way for more focused research and development of algorithms in com- parative modelling, ab intio modelling and structure refinement protocols. A tremendous suc- cess has been witnessed in template-based modelling protocols, whereas strategies that involve template-free modelling still lag behind, specifically for larger proteins (> 150 a.a.). Various improvements have been observed in ab initio protein structure prediction methodologies over- time, with recent ones attributed to the usage of deep learning approaches to construct protein backbone structure from its amino acid sequence. This review highlights the major strategies undertaken for template-free modelling of protein structures while discussing few tools devel- arXiv:2002.06616v2 [q-bio.QM] 24 Apr 2020 oped under each strategy. It will also briefly comment on the progress observed in the field of ab initio modelling of proteins over the course of time as seen through the evolution of CASP platform.
    [Show full text]
  • Deep Learning-Based Advances in Protein Structure Prediction
    International Journal of Molecular Sciences Review Deep Learning-Based Advances in Protein Structure Prediction Subash C. Pakhrin 1,†, Bikash Shrestha 2,†, Badri Adhikari 2,* and Dukka B. KC 1,* 1 Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA; [email protected] 2 Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA; [email protected] * Correspondence: [email protected] (B.A.); [email protected] (D.B.K.) † Both the authors should be considered equal first authors. Abstract: Obtaining an accurate description of protein structure is a fundamental step toward under- standing the underpinning of biology. Although recent advances in experimental approaches have greatly enhanced our capabilities to experimentally determine protein structures, the gap between the number of protein sequences and known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed a lot of advances due to Deep Learning (DL)-based approaches as evidenced by the suc- cess of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progresses in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of protein structure prediction pipeline viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction approaches. Additionally, as there have been some recent DL-based advances in protein structure determination Citation: Pakhrin, S.C.; Shrestha, B.; using Cryo-Electron (Cryo-EM) microscopy based, we also highlight some of the important progress Adhikari, B.; KC, D.B.
    [Show full text]
  • IG-VAE: Generative Modeling of Immunoglobulin Proteins by Direct
    bioRxiv preprint doi: https://doi.org/10.1101/2020.08.07.242347; this version posted August 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. IG-VAE: GENERATIVE MODELING OF IMMUNOGLOBULIN PROTEINS BY DIRECT 3D COORDINATE GENERATION Raphael R. Eguchi Namrata Anand Christian A. Choe Department of Biochemistry Department of Bioengineering Department of Bioengineering Stanford University Stanford University Stanford University [email protected] [email protected] [email protected] Po-Ssu Huang Department of Bioengineering Stanford University [email protected] ABSTRACT While deep learning models have seen increasing applications in protein science, few have been implemented for protein backbone generation—an important task in structure-based problems such as active site and interface design. We present a new approach to building class-specific backbones, using a variational auto-encoder to directly generate the 3D coordinates of immunoglobulins. Our model is torsion- and distance-aware, learns a high-resolution embedding of the dataset, and generates novel, high-quality structures compatible with existing design tools. We show that the Ig-VAE can be used to create a computational model of a SARS-CoV2-RBD binder via latent space sampling. We further demonstrate that the model’s generative prior is a powerful tool for guiding computational protein design, motivating a new paradigm under which backbone design is solved as constrained optimization problem in the latent space of a generative model. 1 Introduction Over the past two decades, the field of protein design has made steady progress, with new computational methods providing novel solutions to challenging problems such as enzyme catalysis, viral inhibition, de novo structure generation and more[1, 2, 3, 4, 5, 6, 7, 8, 9].
    [Show full text]
  • THE FUTURE POSTPONED 2.0 Why Declining Investment in Basic Research Threatens a U.S
    THE FUTURE POSTPONED 2.0 Why Declining Investment in Basic Research Threatens a U.S. Innovation Deficit MIT Washington Office 820 First St. NE, Suite 610 Washington, DC 20002-8031 Tel: 202-789-1828 futurepostponed.org THE FUTURE POSTPONED 2.0 Why Declining Investment in Basic Research Threatens a U.S. Innovation Deficit The Future Postponed 2.0. October 2016. Cambridge, Massachusetts. This material may be freely reproduced. Cover image: Leigh Prather/Shutterstock Contents UNLEASHING THE POWER OF SYNTHETIC PROTEINS 4 Potential Breakthroughs in Medicine, Energy, and Technology DAVID BAKER, Professor of Biochemistry, University of Washington Investigator, Howard Hughes Medical Institute PREDICTING THE FUTURE OF EARTH’S FORESTS 9 Forests absorb carbon dioxide and are thus an important buffer against climate change, for now. Understanding forest dynamics would enable both better management of forests and the ability to assess how they are changing. STUART J. DAVIES, Frank H. Levinson Chair in Global Forest Science; Director, Forest Global Earth Observatory-Center for Tropical Forest Science, Smithsonian Tropical Research Institute RESETTING THE CLOCK OF LIFE 13 We know that the circadian clock keeps time in every living cell, controlling biological processes such as metabolism, cell division, and DNA repair, but we don’t understand how. Gaining such knowledge would not only offer fundamental insights into cellular biochemistry, but could also yield practical results in areas from agriculture to medicine to human aging. NING ZHENG, Howard Hughes Medical Institute, Department of Pharmacology University of Washington; MICHELE PAGANO, M.D., Howard Hughes Medical Institute, Department of Pathology and NYU Cancer Institute, New York University School of Medicine CREATING A CENSUS OF HUMAN CELLS 16 For the first time, new techniques make possible a systematic description of the myriad types of cells in the human body that underlie both health and disease.
    [Show full text]
  • Alzheimer's Disease Ask Document
    UW MEDICINE INSTITUTE FOR PROTEIN DESIGN TACKLING THE FOUNDATION OF ALZHEIMER’S DISEASE The University of Washington Institute for Protein Design LZHEIMER’S DISEASE IS A DEVASTATING CONDITION. This relentlessly Aprogressive neurodegenerative disorder has no cure, it can’t be prevented, and the mean life expectancy of a person who has been diagnosed is a short seven years. The disease imposes heavy social, psychological, physical and economic burdens on caregivers, and it is costly for society: the U.S. now spends an estimated $100 billion on it every year. And costs are steadily rising with the aging of the population. Unfortunately, despite investing billions of dollars and many years into more than 400 potential drug treatments, there are, as yet, no solutions for Alzheimer’s disease. Given that the number of people affected worldwide could quadruple to more than 100 million by 2050, we think it is time for a totally new approach: protein design. Scientists at the University of Washington’s Institute for Protein Design have already taken the first steps to change the paradigm for Alzheimer’s disease research and treatment. We invite you to join us. Normal nerve cells are Mis-folding Proteins and Alzheimer’s Disease shown on the top. Accumulation of amyloid Proteins are the body’s workhorses: the major functional components of living cells. plaques and formation However, over a person’s lifetime, otherwise normal proteins may malfunction — and may of neurofibrillary tangles even cause toxic activity that harms the cell. Many diseases, including Alzheimer’s disease, (shown on bottom) are hallmarks of result from a protein malfunction known as mis-folding.
    [Show full text]