The Origin and Evolution of the Genetic Code: Statistical and Experimental Investigations
Total Page:16
File Type:pdf, Size:1020Kb
The Origin and Evolution of the Genetic Code: Statistical and Experimental Investigations Robin Douglas Knight A DISSERTATION PRESENTED TO THE FACULTY OF PRINCETON UNIVERSITY IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY RECOMMENDED FOR ACCEPTANCE BY THE DEPARTMENT OF ECOLOGY AND EVOLUTIONARY BIOLOGY June 2001 © Copyright by Robin Douglas Knight, 2001. All rights reserved. Abstract The structure of the genetic code has puzzled researchers since codon assignments were first elucidated, but recent genome and aptamer sequences allow quantitative testing of hypotheses about the code’s origin, evolution, and subsequent diversification. My major findings can briefly be categorized by stage of evolution: 1. There is strong statistical support for two major theories about the evolution of the canonical code. The code is highly resistant to errors relative to random codes, and there is a strong statistical association between at least some codons and the RNA binding sites for their cognate amino acids. I found no support for specific coevolutionary models, although the code presumably evolved from a simpler form. 2. The exact time of code optimization remains unclear, but measures of amino acids that would have been more important early in evolution generally make the code appear more optimal. Thus codon assignments were perhaps determined early, in an RNA world. Aptamer data supports this idea, though equivocally. 3. Modern variant codes are probably recently-derived, nearly neutral mutants of the standard code. They may be adapted to the specific mitochondrial environments where most of them occur, although I did not find any supporting evidence. There is no support for the idea that variant codes are optimized for reduced tRNA number. There is strong support for the idea that code evolution proceeds through an ambiguous intermediate. The molecular basis for certain code changes can be elucidated: in ciliates, the mutations in the release factor eRF1 that prevent stop codon recognition can be recaptured by statistical techniques applied to phylogenies. 4. The genetic code is not just the pattern of codon assignments: it is also the frequency with which each codon is used. Surprisingly, the vast interspecific variation in codon and amino acid usage can largely be recaptured by a neutral model that assumes only purifying selection. Despite these advances, many fundamental questions — such as the order in which amino acids were added to the code, and the extent to which stereochemistry and natural selection contributed to modern codon assignments — are still to be resolved. iii Acknowledgements Science is fundamentally a collaborative activity: no finding makes sense except in its social and historical context. This thesis is no exception. I thank my advisor, Laura Landweber, for all her help and advice during the four years I spent in her lab, and for bringing together such an exciting group of researchers. Much of my research was done in collaboration with Steve Freeland, who brought a completely different perspective on code evolution: this thesis would have been very different (and much worse!) without Steve’s input. Thanks also to Cathy Lozupone for her tireless cloning, sequencing, and phylogeny building, to Tam Horton and Vernadette Simon for advice and discussion, and to Drew Ronneberg, Pete Simon, Laura Katz, Ed Curtis, Irina Pokrovskaya, Dirk Faulhammer, Anthony Cukras, Tai-Chih Kuo, Amy Horton, Wei-Jen Chang, Christina Burch, Man-Kyoon Shin, and other past and present members of the Landweber lab for interesting conversation and fun times. Thanks also to my thesis committee, Henry Horn, Hope Hollocher, Lee Silver, and Michael Hecht, and to Steve Pacala, the director of graduate studies, for advice, direction and guidance. With interdisciplinary projects, outside collaborations are essential. Here I especially thank Mike Yarus, who put up with me for several months in his lab at the University of Colorado, Boulder, while I learned how to do SELEX experiments correctly, and who contributed greatly to the site/codon association work and to the tests of hypotheses about the evolution of variant codes. Thanks also to members of the Yarus lab: Anastasia Khrorova, Vasant Jadhav, Krishna Kumar, Alexandre Vlassov, Jason Bobe, and, especially, to Irene Majerfeld and Mali Illangasekare, whose patience and helpfulness are simply amazing. While in Boulder, I also had the pleasure of interacting with Noboru Sueoka, Leslie Leinwand, Norm Pace, and Ravinder Singh (and members of the Pace and Singh labs), and I look forward to interesting times in the MCDB department at Colorado. I also thank Jean Lobry at the Universite Claude Bernard, Guy Sella and Michael Lachmann at the Weismann Institute, Stephen Sowerby and David Liberles at the University of Stockholm, Erik Schultes at the Whitehead Institute, Warren Tate at the University of Otago, Michael New at NASA Ames, Dawn Brooks and Jacques Fresco here in Princeton for useful comments and discussion. I’d also like to acknowledge the many good friends I’ve made while at Princeton. Special thanks go to the roommates who’ve put up with me over the years, and who helped make five years in New Jersey tolerable: Amanda Birmingham, Jeremy Peirce, Jared Anderson, David Nadler, Bob Derose, Kiran Kedlaya, and Kyle Morrison. Thanks also to my cohort (also support group) of EEB grad students: Ethan Pride, Beth Macdougal-Shackleton, Kate Rodriguez-Clark, Rae Winfree, Jen Gee, and Nipada Ruankaew; and to other people in the department including (but by no means limited to!) Todd Vision, Diane O’Brien, Mark Roberts, Philippe Tortell, Jon-Paul Rodriguez , Jason Wilder, Stuart Sandin, Eric Dyreson, Tatjana Good, and Jayatri Das. The EEB administrative staff were absolutely critical for getting things done. Thanks especially to Mona Fazio, Christine Samburg, and Trish Turner-Gillespie for processing innumerable purchase orders and reimbursement forms, and to Mary Guimond for sorting out teaching and funding and for her encyclopedic knowledge of university procedure. Thanks also to Richard Smith for guiding me through the minefield of requirements for finals submission of the thesis, and to the people at the Visa Office for helping me not get deported. Finally, I’d like to thank my parents for their constant encouragement and support, and my teachers and lecturers in Dunedin, especially Laurie McAuley and Stuart McConnell, who fostered my interest in biology; David Fenby, who rekindled my interest in chemistry, Paul Griffiths, who got me interested in the idea of genetic information; and Warren Tate, who introduced me to the RNA World. iv Table of Contents Abstract ........................................................................................................................................... iii Acknowledgements ........................................................................................................................ iv Table of Contents............................................................................................................................. v Preface ........................................................................................................................................... vi 1 Introduction................................................................................................................................... 1 1.1 The Genetic Code ............................................................................................................ 1 1.2 Early Metabolism and the Origin of Coded Information ............................................ 8 1.3 The RNA World .............................................................................................................. 12 1.4 The Early Evolution of the Genetic Code................................................................... 18 1.5 Selection, Chemistry, and History: Three Faces of the Genetic Code .................. 23 2 Evolution of the Canonical Genetic Code............................................................................... 31 2.1 Stereochemistry and the Origin of the Genetic Code .............................................. 32 2.2 Rhyme or Reason: RNA-Arginine Interactions and the Genetic Code.................. 46 2.3 Guilt By Association: The Arginine Case Revisited................................................. 53 2.4 The Adaptive Evolution of the Genetic Code ............................................................ 66 2.5 When Did the Genetic Code Adapt to its Environment? ......................................... 83 3 Diversification of Modern Genetic Codes............................................................................. 100 3.1 Rewiring the Keyboard: Evolvability of the Genetic Code.................................... 101 3.2 How Mitochondria Redefine the Code...................................................................... 112 3.3 The Molecular Basis of Nuclear Genetic Code Change in Ciliates ...................... 134 3.4 A Simple Model Based On Mutation and Selection Explains Compositional Trends Within and Across Genomes........................................................................ 145 4 Conclusions/Research Summary...........................................................................................159 References .................................................................................................................................... 162 v Preface This thesis is organized into three main sections. The