
In vitro Evolution of a Processive Clamping RNA Polymerase Ribozyme with Promoter Recognition

by Razvan Cojocaru

BSc, Simon Fraser University, 2014

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

in the Department of Molecular Biology and Biochemistry Faculty of Science

© Razvan Cojocaru 2021 SIMON FRASER UNIVERSITY Summer 2021

Copyright in this work is held by the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.

Declaration of Committee

Name: Razvan Cojocaru

Degree: Doctor of Philosophy

Title: In vitro Evolution of a Processive Clamping RNA Polymerase Ribozyme with Promoter Recognition

Committee: Chair: Lisa Craig, Professor, Molecular Biology and Biochemistry

Peter Unrau, Supervisor, Professor, Molecular Biology and Biochemistry

Dipankar Sen, Committee Member, Professor, Molecular Biology and Biochemistry

Michel Leroux, Committee Member, Professor, Molecular Biology and Biochemistry

Mani Larijani, Internal Examiner, Associate Professor, Molecular Biology and Biochemistry

Gerald Joyce, External Examiner, Professor, Jack H. Skirball Center for Chemical Biology and Proteomics, Salk Institute for Biological Studies

Date Defended/Approved: August 12, 2021

Abstract

The RNA World hypothesis proposes that the early evolution of life began with RNAs that could serve both as carriers of genetic information and as catalysts. Later in evolution, these functions were gradually taken over by DNA and protein enzymes in cellular biology. I begin by reviewing the naturally occurring catalytic RNAs, or ribozymes, which play many important roles in biology today. These ribozymes are central to protein synthesis and the regulation of gene expression, creating a landscape that strongly supports an early RNA World. Ribozymes have also been produced in the laboratory using artificial rather than natural selection. As phosphoryl transfer reactions are central to the energy balance of all organisms in modern biology, I explore artificially selected kinase, glycosidic bond forming, capping, ligase, and polymerase ribozymes, highlighting the importance of phosphoryl transfer reactions, from nucleotide and nucleoside metabolism to the assembly and replication of RNA molecules, in an RNA World setting.

RNA replicases capable of general and self-replication are thought to have been essential early in evolution. However, how such sophisticated polymerases evolved to enable processive gene expression remains largely unexplored. I used a complex selection strategy that screened ~10¹³ pool variants to isolate a three-domain holopolymerase ribozyme containing a class I ligase catalytic core, an NTP-positioning accessory domain, and a processivity clamping domain. This ribozyme uses a sigma factor–like specificity primer first to recognize an RNA promoter sequence and then, in a second step, rearranges into a processive clamped elongation form. When correctly assembled, the clamped complex increases extension by more than one order of magnitude, synthesizing duplexes of 50–107 base pairs. The polymerase can also synthesize part of its own specificity primer, programming itself to polymerize from certain RNA promoters and not others. This demonstrates how RNA polymerase ribozymes could have preferentially replicated their own genomes and associated genes while avoiding replicative parasites in a primordial RNA World. The clamp-like mechanism of my selected polymerase could eventually enable strand displacement and improve fidelity, both critical requirements for replication in the early evolution of life.

Keywords: RNA World; in vitro evolution; ribozyme; RNA polymerase; promoter recognition; processivity

Acknowledgements

First and foremost, I would like to thank my senior supervisor, Dr. Peter Unrau, for taking me under his supervision and providing me with the support and guidance needed to become the scientist I am today. Thank you for all the opportunities you have given me over the years and the many thought-provoking conversations we have shared about science and career. Thank you to my committee members, Dr. Dipankar Sen and Dr. Michel Leroux, for their encouragement and for taking the time and interest to be involved in my projects, providing invaluable feedback and perspectives. I would also like to thank my internal examiner, Dr. Mani Larijani, and my external examiner, Dr. Gerald Joyce, for taking the time to read my thesis and to participate in my defence.

I would like to thank all the collaborators I have had the pleasure of working with over these past years. Thank you to Dr. Michael Ryckelynck and his lab for their hospitality. Thank you to the research team at Lumex Instruments, as well as to Dr. Ryan Morin and Dr. Adrian Ferré-D'Amaré and their laboratories, for the fruitful collaborations. Thank you to the MBB office and teaching staff for all their help and support. I would also like to thank fellow graduate students and friends I have journeyed with throughout the years in Dr. Unrau’s lab: Dr. Sunny Jeng, Dr. Shyam Panchapakesan, Amir Abdolahzadeh, Iqra Yaseen, Florian Weissenboeck, Kristen Kong, Christopher Bonar, Dr. Lena Dolgosheina, Dr. Mariana Oviedo, Lyssa Martin and all the undergraduate and high school students passing through, for moral support and meaningful discussions.

I would like to thank my family and friends for their continuous love and support, as well as for the many philosophical conversations shared over fermented beverages. A special thank you to my parents, Gabriela and Aurel Cojocaru, and sister, Anca, for their infinite supply of love, patience and encouragement throughout my life. Without you I would not be where I am today. Lastly, I would like to thank my partner, Caelie Stewart, for always being by my side and all the joy, love and support she has provided me throughout this journey. Thank you for being ready at a moment’s notice to celebrate the smallest successes and lift me up from the inevitable falls of graduate studies.

Preface: Short Summary of Contributions not Included in the Thesis

Over the course of my PhD, my research has spanned four significant RNA-based projects: 1. exploring the origin of life by using in vitro evolution to develop a processive RNA polymerase ribozyme; 2. validating a microchip real-time PCR technology for detection of SARS-CoV-2 in clinical samples that uses 10-fold fewer reagents than the current CDC-approved PCR detection tests; 3. developing and characterizing RNA Mango fluorogenic aptamers; and 4. structurally investigating the NFKBIZ gene 3' untranslated region, mutations in which are linked to diffuse large B-cell lymphoma. However, since the majority of my time and focus has been on the origin of life, my supervisors and I decided to focus the thesis entirely on this subject. Nevertheless, I will briefly summarize all the contributions not included in this thesis.

Validation of a microchip RT-PCR technology for SARS-CoV-2 detection

1. Cojocaru, R., Yaseen, I., Unrau, P.J., Lowe, C.F., Ritchie, G., Romney, M.G., Sin, D.D., Gill, S., Slyadnev, M., 2021. Microchip RT-PCR Detection of Nasopharyngeal SARS-CoV-2 Samples. J Mol Diagn 23, 683–690.

Fast, accurate, and reliable diagnostic tests have been critical for controlling the spread of coronavirus disease 2019 (COVID-19), associated with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. The current gold standard for testing is real-time PCR; however, during the pandemic, supplies of testing kits and reagents have been limited. We reported the validation of a rapid (30 minute), user-friendly, and accurate microchip real-time PCR assay for detection of SARS-CoV-2 from nasopharyngeal swab RNA extracts. Microchips preloaded with COVID-19 primers and probes for the N gene accommodate 1.2 μL reaction volumes, lowering the required reagents 10-fold compared with tube-based real-time PCR. We validated our assay using contrived reference samples and 21 clinical samples from patients in Canada, determining a limit of detection of 1 copy per reaction. The microchip real-time PCR provides a significantly lower-resource alternative to the Centers for Disease Control and Prevention (CDC) approved real-time RT-PCR assays with comparable sensitivity, showing 100% positive and negative predictive agreement on clinical samples. I contributed to this project by performing half of the experiments and by contributing to manuscript writing.

Development and characterization of RNA Mango fluorogenic aptamers

2. Trachman, R.J., Abdolahzadeh, A., Andreoni, A., Cojocaru, R., Knutson, J.R., Ryckelynck, M., Unrau, P.J., Ferré-D’Amaré, A.R., 2018. Crystal Structures of the Mango-II RNA Aptamer Reveal Heterogeneous Fluorophore Binding and Guide Engineering of Variants with Improved Selectivity and Brightness. Biochemistry 57, 3544–3548.

Several RNA aptamers that bind small molecules and enhance their fluorescence have been selected in vitro and used successfully to tag and track RNAs in vivo. Recently, combined SELEX and microfluidic fluorescence sorting yielded three aptamers that activate the fluorescence of TO1-Biotin: Mango-II, Mango-III, and Mango-IV (Autour et al., 2018). Of the three, Mango-II binds TO1-Biotin with the highest affinity, ∼1 nM, and enhances its fluorescence by >1500-fold. We determined the crystal structures of Mango-II in complex with two fluorophores, TO1-Biotin (Em: 535 nm) and the more strongly red-shifted TO3-Biotin (Em: 658 nm), and found that despite their high affinity, the ligands adopt multiple distinct conformations, indicative of a binding pocket with modest stereoselectivity. Mutational analysis of the binding pocket led to a Mango-II mutant (A22U) that retains high affinity for TO1-Biotin but discriminates >5-fold against TO3-Biotin. Moreover, the fluorescence enhancement of TO1-Biotin increases by 18%, while that of TO3-Biotin decreases by 25%, providing potential for two-color imaging. My contribution included a set of experiments characterizing the effects of mutations on an important motif of the RNA, as well as synthesizing various thiazole orange derivatives to determine Mango-II binding and fluorescence enhancement.

3. Trachman, R.J., Autour, A., Jeng, S.C.Y., Abdolahzadeh, A., Andreoni, A., Cojocaru, R., Garipov, R., Dolgosheina, E.V., Knutson, J.R., Ryckelynck, M., Unrau, P.J., Ferré-D’Amaré, A.R., 2019. Structure and functional reselection of the Mango-III fluorogenic RNA aptamer. Nature Chemical Biology 15, 472.

The ~30-nucleotide Mango-III is also notable for having the brightest fluorescence of the three aptamers (quantum yield 0.55). Unique among related aptamers, Mango-III exhibits biphasic thermal melting, characteristic of molecules with tertiary structure. We reported crystal structures of TO1-Biotin complexes of Mango-III, a structure-guided mutant Mango-III(A10U), and a functionally reselected mutant, iMango-III. The structures reveal a globular architecture arising from an unprecedented pseudoknot-like connectivity between a G-quadruplex and an embedded non-canonical duplex. The fluorophore is restrained into a planar conformation by the G-quadruplex, a lone, long-range trans Watson–Crick pair (whose A10U mutation increases quantum yield to 0.66), and a pyrimidine perpendicular to the nucleobase planes of those motifs. The improved iMango-III and Mango-III(A10U) fluoresce ~50% more brightly than enhanced green fluorescent protein, making them suitable tags for live-cell RNA visualization. My contribution involved performing fluorescence binding experiments with the reselected iMango-III and a series of structure-guided iMango-III mutants to determine the importance of particular residues in binding and fluorescence. Additionally, I examined the effect of D2O on iMango-III, as D2O is known to affect fluorescence lifetimes through a kinetic isotope effect on proton transfer rates and/or through changes in solvent viscosity.

4. Trachman, R.J., Cojocaru, R., Wu, D., Piszczek, G., Ryckelynck, M., Unrau, P.J., Ferré-D’Amaré, A.R., 2020. Structure-Guided Engineering of the Homodimeric Mango-IV Fluorescence Turn-on Aptamer Yields an RNA FRET Pair. Structure 28, 776-785.

Additionally, fluorescent RNA aptamers have been used in cells as biosensor reporters and as tags for tracking transcripts. Of the three recently selected aptamers, Mango-IV performed best at imaging RNAs in both fixed and live mammalian cells. To understand how Mango-IV achieves activity in cells, we determined its crystal structure in complex with TO1-Biotin. The structure reveals a domain-swapped homodimer with two independent G-quadruplex fluorophore binding pockets. Structure-based analyses indicate that the Mango-IV core has relaxed fluorophore specificity and a tendency to reorganize binding pocket residues. These molecular properties may endow it with robustness in the cellular milieu. Based on the domain-swapped structure, heterodimers between Mango-IV and the fluorescent aptamer iSpinach, joined by Watson-Crick base pairing, were constructed. These exhibited FRET between their respective aptamer-activated fluorophores, advancing fluorescent aptamer technology toward multi-color, RNA-based imaging of RNA coexpression and colocalization. I contributed to this project by performing fluorescence enhancement measurements with Mango-I through Mango-IV and the Mango-II(A22U) and Mango-III(A10U) mutants on several variant fluorophores.

Characterization of the NFKBIZ gene 3' untranslated region

5. Arthur, S.E., Jiang, A., Grande, B.M., Alcaide, M., Cojocaru, R., Rushton, C.K., Mottok, A., Hilton, L.K., Lat, P.K., Zhao, E.Y., Culibrk, L., Ennishi, D., Jessa, S., Chong, L., Thomas, N., Pararajalingam, P., Meissner, B., Boyle, M., Davidson, J., Bushell, K.R., Lai, D., Farinha, P., Slack, G.W., Morin, G.B., Shah, S., Sen, D., Jones, S.J.M., Mungall, A.J., Gascoyne, R.D., Audas, T.E., Unrau, P., Marra, M.A., Connors, J.M., Steidl, C., Scott, D.W., Morin, R.D., 2018. Genome-wide discovery of somatic regulatory variants in diffuse large B-cell lymphoma. Nat Commun 9, 4001.

6. Arthur, S. E., Gao, J., Healy, S., Rushton, C., Thomas, N., Hilton, L., Dreval, K., Tang, J., Alcaide, M., Cojocaru, R., Mottok, A., Telenius, A., Unrau, P. J., Wilson, W., Staudt, L., Scott, D., Hodson, D., Steidl, C., Morin, R. D., 2021. Non-coding NFKBIZ 3′ UTR mutations promote cell growth and resistance to targeted therapeutics in diffuse large B-cell lymphoma. Cancer Discovery (*submitted).

Diffuse large B-cell lymphoma (DLBCL) is an aggressive cancer originating from mature B-cells. Prognosis is strongly associated with molecular subgroup, although the driver mutations that distinguish the two main subgroups remain poorly defined. Through an integrative analysis of whole genomes, exomes, and transcriptomes, we uncovered genes and noncoding loci that are commonly mutated in DLBCL. Our analysis identified novel cis-regulatory sites and implicates recurrent mutations in the 3′ UTR of NFKBIZ as a novel mechanism of oncogene deregulation and NF-κB pathway activation in the activated B-cell (ABC) subgroup. We confirmed the prevalence and pattern of NFKBIZ 3′ UTR mutations in additional cohorts. Although the role of NFKBIZ amplifications in activating NF-κB signaling is well known, the effects of 3′ UTR mutations on IκB-ζ activity, and the mechanism by which they act, have been relatively unexplored. We described a functional characterization of these mutations, demonstrating that elevated expression of IκB-ζ confers growth advantages in DLBCL cell lines and primary germinal center cells, and nominated novel IκB-ζ target genes with potential therapeutic implications. The limited benefits of targeted treatments for DLBCL, particularly those targeting the NF-κB axis, led us to investigate and confirm that NFKBIZ 3′ UTR mutations affect response to therapeutics. My contribution to these projects included in vitro cleavage assays of NFKBIZ 3′ UTR mutants with the endoribonuclease Regnase-1, a post-transcriptional regulator of NFKBIZ. Additionally, I performed RNA probing experiments using RNase T1 to construct a model of the NFKBIZ 3′ UTR secondary structure.

Table of Contents

Declaration of Committee
Abstract
Acknowledgements
Preface: Short Summary of Contributions not Included in the Thesis
Table of Contents
List of Tables
List of Figures
List of Acronyms

Chapter 1. Introduction
1.1. The RNA World hypothesis and discovery of ribozymes
1.2. Ribozyme chemistry and divalent cations
1.3. The natural ribozymes
1.3.1. Classes of natural ribozymes
1.3.2. The self-cleaving ribozymes
The hammerhead ribozyme (HHR)
The hairpin ribozyme
Hepatitis delta virus (HDV) ribozyme
Varkud satellite (VS) ribozyme
Glucosamine-6-phosphate synthase (glmS) ribozyme
Twister
Twister sister, pistol and hatchet
1.3.3. Trans-cleaving ribozymes
Group I introns
Group II introns and the spliceosome
Lariat capping (LC) ribozyme
Ribonuclease P (RNase P)
The ribosome
1.4. RNA and the origins of life
1.5. In vitro evolution of RNA
1.5.1. Systematic evolution of ligands by exponential enrichment (SELEX)
1.5.2. Selection in vitro for catalytic RNAs
1.5.3. Evolution in vitro
1.6. Artificial ribozymes
1.6.1. Rediscovery of naturally occurring ribozymes by artificial selection
1.6.2. Catalytic diversity of artificial ribozymes
1.6.3. Kinase ribozymes
1.6.4. Glycosidic bond forming ribozymes
1.6.5. Capping ribozymes
1.6.6. Ligase ribozymes
1.6.7. Polymerase ribozymes

Chapter 2. Processive RNA Polymerization and Promoter Recognition in an RNA World
2.1. Introduction
2.2. Selection of a promoter-specific RNA polymerase ribozyme
2.3. Clamping domain characterization
2.4. The clamping domain confers long-range extension and promoter selectivity
2.5. The clamping domain confers polymerization efficiency
2.6. The clamped complex is stable and allows extension at multiple primed sites
2.7. Programmable promoter recognition by the holo-polymerase
2.8. Discussion
2.9. Materials and Methods
2.9.1. Oligonucleotides
2.9.2. Pool construction and modification
2.9.3. In vitro selection schemes
2.9.4. RT of selection rounds
2.9.5. Pool PCR amplification
2.9.6. Covalent circularization of RNA templates
2.9.7. 3' Biotinylation of linear RNA templates
2.9.8. High-throughput sequencing
2.9.9. Characterization of the clamping polymerase (CP) ribozyme
2.9.10. PBS sequence modification of CP for self-templated primer synthesis
2.9.11. Polymerization assays
2.9.12. Polymerization Assays with multiple primers on cT1
2.9.13. Extension efficiency titrations and order of addition experiment
2.9.14. Off-rates of primers and templates from bead immobilized CPOPEN complexes
2.9.15. Primer extensions on beads with immobilized CP complexes
2.9.16. Quantification of primer extensions
2.10. Supplementary Tables

Chapter 3. Conclusions and Future Research Directions
3.1. Evolution of an autocatalytic system
3.1.1. Strand displacement
3.1.2. Fidelity
3.1.3. Regulation of replication
3.2. New polymerase functionalities: RNA tailing
3.2.1. Methods: self-extension in vitro selection scheme
3.3. Conclusion

References

List of Tables

Table 2.1. Oligonucleotides for the construction of the DNA selection pool
Table 2.2. Selection schemes and conditions
Table 2.3. Sequences of RNA primers and modified DNA oligos
Table 2.4. Sequences of RNA templates and corresponding complementary RNA templates
Table 2.5. Ribozyme sequences
Table 2.6. Modified CP ribozymes constructs
Table 2.7. Oligonucleotides for CP ribozyme blocking experiments
Table 3.1. Self-extension selection conditions

List of Figures

Figure 1.1. Self-cleaving and trans-cleaving chemistry of natural ribozymes
Figure 1.2. Secondary structure of the hammerhead ribozyme (HHR)
Figure 1.3. Secondary structure of the hairpin ribozyme
Figure 1.4. Secondary structure of the hepatitis delta virus ribozyme (HDV)
Figure 1.5. Secondary structure of the Varkud satellite ribozyme (VS)
Figure 1.6. Secondary structure of the glucosamine-6-phosphate synthase ribozymes (glmS)
Figure 1.7. Secondary structure of the twister ribozyme
Figure 1.8. Secondary structure of the twister sister, pistol and hatchet ribozymes
Figure 1.9. The group I self-splicing intron
Figure 1.10. The group II self-splicing intron
Figure 1.11. The lariat capping ribozyme (LC)
Figure 1.12. RNase P is a trans acting two-domain ribozyme
Figure 1.13. The peptide chain forming chemistry of the ribosome is unique among natural ribozymes
Figure 1.14. Artificial in vitro selection methodology
Figure 1.15. Kinase ribozymes
Figure 1.16. Glycosidic bond-forming ribozymes
Figure 1.17. Ribozyme mediated self-capping
Figure 1.18. Protein mediated capping
Figure 1.19. Protein mediated RNA ligation
Figure 1.20. Ribozyme mediated RNA ligation
Figure 1.21. Self-replication with ligase ribozymes
Figure 1.22. The artificial evolution of an RNA polymerase ribozyme
Figure 1.23. Template dependent RNA and DNA polymerization is catalyzed by RNA dependent RNA or DNA polymerase (RdRP or RdDP) ribozymes
Figure 2.1. The clamping RNA dependent RNA polymerase (CP RdRP) ribozyme and DNA dependent RNA polymerase (DdRP) transcriptional initiation processes
Figure 2.2. The Clamping Polymerase (CP) progenitor, B6.61
Figure 2.3. Schematic in vitro selection schemes
Figure 2.4. Selection schemes and pool modifications implemented by round of selection
Figure 2.5. 3' extension by the Round 23 pool
Figure 2.6. Pool diversity and the emergence of polymerization
Figure 2.7. Clamping domain minimization preserves activity while domain removal strongly inhibits polymerization
Figure 2.8. Clamping domain transplantation onto a lower activity Family 4 ribozyme
Figure 2.9. Clamping domain mutations lower extension activity
Figure 2.10. Promoter specific polymerization by CP on random sequence templates
Figure 2.11. Long range primer extension by the CP ribozyme
Figure 2.12. The order of addition of the polymerase, specificity primer and template influences extension, but not folding of the polymerase core
Figure 2.13. Extension efficiency of the CP ribozyme
Figure 2.14. Primer extensions for Extension Ratios
Figure 2.15. Higher extension efficiency on circular templates by the CP ribozyme
Figure 2.16. ER with no PBS specificity primer hybridization sequence
Figure 2.17. Magnesium dependence of the CP ribozyme and its progenitor B6.61
Figure 2.18. Immobilized open form polymerase (P1:CPOPEN) forms a stable complex with long templates but not short
Figure 2.19. CP processively extends multiple primers on the same promoter-template
Figure 2.20. Extension of P1 and P1+n primers is correlated only when on the same templates
Figure 2.21. Streptavidin bead immobilized polymerase template complex is active and exhibits highly correlated primer extension
Figure 2.22. CP Ribozyme self-templated primer extension
Figure 2.23. Promoter selectivity resulting from specificity-primer synthesis templated by the polymerase ribozyme’s PBS sequence
Figure 2.24. Untemplated extension by the open form of the polymerase is purine rich
Figure 2.25. Linearization of cT1 by alkaline hydrolysis
Figure 2.26. Biotinylation efficiency
Figure 2.27. CP activity is unaffected by bead binding conditions
Figure 3.1. Strand displacement for non-enzymatic primer extension
Figure 3.2. Symmetric and asymmetric rolling-circle replication of viroids and HDV
Figure 3.3. CP ribozyme shows no strand displacement activity
Figure 3.4. Proposed strand invasion of an evolved CP ribozyme
Figure 3.5. Misincorporation of nucleotides during polymerization
Figure 3.6. 3' tailing activity by selection rounds
Figure 3.7. Schematic of 3' tailing in vitro selection schemes

List of Acronyms

aa-tRNA: Aminoacyl transfer RNA
AK: Adenylate kinase
AMP: Adenosine monophosphate
ATP: Adenosine-5'-triphosphate
cDNA: Complementary DNA
CP: Clamping polymerase
CPEB3: Cytoplasmic polyadenylation element-binding protein 3
CTP: Cytidine-5'-triphosphate
CTPB: Biotin-11-cytidine-5'-triphosphate
DdRP: DNA-dependent RNA polymerase
DNA: Deoxyribonucleic acid
DNase: Deoxyribonuclease
dNTP: Deoxynucleotide triphosphate
ds: Double-stranded
dTMPK: Thymidylate kinase
EF-G: Elongation factor G
EF-Tu: Elongation factor thermo unstable
ER: Extension ratio
FAD: Flavin adenine dinucleotide
GIR1: Group I-like ribozyme 1
glmS: Glucosamine-6-phosphate synthase
GlcN6P: Glucosamine-6-phosphate
GTP: Guanosine-5′-triphosphate
GUK: Guanylate kinase
HDV: Hepatitis delta virus
HEG: Homing endonuclease gene
HHR: Hammerhead ribozyme
hPNK: Human polynucleotide kinase
IEP: Intron-encoded proteins
LC: Lariat-capping
LUCA: Last universal common ancestor
mRNA: Messenger RNA
NAD: Nicotinamide adenine dinucleotide
NDP: Nucleotide diphosphate
NDPK: Nucleoside diphosphate kinase
NK: Nucleotide kinase
NMN: Nicotinamide mononucleotide
NMP: Nucleotide monophosphate
NMPK: Nucleoside monophosphate kinase
nt: Nucleotide
NTP: Nucleotide triphosphate
PAP: Poly(A) polymerase
PBS: Primer-binding site
PCR: Polymerase chain reaction
PNK: Polynucleotide kinase
PRPP: 5-Phosphoribosyl-1-pyrophosphate
pRNA: RNA
PTC: Peptidyl transferase center
R5P: Ribose-5-phosphate
RdDP: RNA dependent DNA polymerase
RdRP: RNA dependent RNA polymerase
RNA: Ribonucleic acid
RNAP: RNA polymerase
RNase P: Ribonuclease P
rRNA: Ribosomal RNA
RT: Reverse transcription (transcriptase)
SELEX: Systematic evolution of ligands by exponential enrichment
snRNA: Small nuclear RNA
ss: Single stranded
TENT: Terminal nucleotidyltransferase
TMP: Trimetaphosphate
tRNA: Transfer RNA
UMP-CMPK: Uridylate-cytidylate kinase
UTP: Uridine-5′-triphosphate
WT: Wild type
VS: Varkud satellite

Chapter 1. Introduction

This chapter is based on the following manuscripts:

Cojocaru, R., Unrau, P. J., Phosphoryl Transfer Ribozymes. In: Ribozymes: Principles, Methods, Applications, S. Müller, B. Masquida, W. Winkler, Eds. (Wiley, ed. 1, 2021).

Cojocaru, R., Unrau, P. J., Ribozymes and Evolution. In: Encyclopedia of Biochemistry III, J. M. Jez, Ed. (Elsevier, ed. 3, 2021).

1.1. The RNA World hypothesis and discovery of ribozymes

For most of molecular biology’s history, RNA was regarded as an intermediary between DNA, the store of heritable genetic information, and proteins, which carry out most cellular functions. This view changed with the discovery that RNA, like proteins, can catalyze chemical reactions, for which Cech and Altman were awarded the 1989 Nobel Prize in Chemistry. Cech’s discovery that RNA splicing, an important regulatory process, is catalyzed by RNA (Kruger et al., 1982), together with Altman’s nearly simultaneous discovery that RNase P, responsible for tRNA maturation, is at its core an RNA enzyme (Guerrier-Takada et al., 1983), revolutionized the RNA field. These discoveries were subsequently extended by the demonstration that the ribosome, a universal cellular machine responsible for nearly all protein synthesis on this planet, relies on catalytic RNA for peptide elongation (Nissen et al., 2000). Additional discoveries have helped demonstrate that catalytic RNAs are essential in modern biology, powering not only critical aspects of protein synthesis in all three branches of life but also serving many important regulatory roles (Nissen et al., 2000; Teixeira et al., 2004; Winkler et al., 2004).

The discovery of naturally occurring catalytic RNAs, also known as ribozymes, bolstered the RNA World hypothesis, initially proposed in the late 1960s (Crick, 1968; Orgel, 1968; Woese, 1968) and later named by Gilbert in 1986 (Gilbert, 1986). According to this hypothesis, RNA played a key role early in evolution in the establishment of life by serving both as the carrier of genetic information and as a catalyst, functions that were gradually taken over by the more stable and efficient DNA and by catalytically diverse protein enzymes. The hypothesis is further supported by the fact that hundreds of metabolic protein enzymes use ribonucleotide-based cofactors (White, 1976). These include not only the activated nucleotides adenosine-5'-triphosphate (ATP) (177 enzymatic reactions) and guanosine-5′-triphosphate (GTP) (27 reactions), but also the cap-like redox cofactors nicotinamide adenine dinucleotide (NAD) (446 reactions) and flavin adenine dinucleotide (FAD) (41 reactions) (numbers from the KEGG Database (Kanehisa and Goto, 2000; Kanehisa et al., 2016, 2017)). Finally, the metabolic organization of nucleotide synthesis in modern biology suggests that RNA synthesis could have predated DNA synthesis (Reichard, 1988). Together, this evidence suggests that RNA may have preceded both the genetic polymer DNA and today's protein-catalyzed metabolism in an earlier, RNA-dominated period of evolution.

In this introductory chapter, I discuss the 14 different types of natural ribozymes discovered to date and the mechanisms by which they function. The landscape created by these natural ribozymes strongly supports a scenario early in the evolution of life in which RNA functioned both as genetic material and as a catalyst. Remnants of this early world are still being discovered within modern life forms. In addition, the invention of artificial in vitro nucleic acid selection (also termed Systematic Evolution of Ligands by Exponential Enrichment, or SELEX) by Gold, Joyce, and Szostak (Tuerk and Gold, 1990; Robertson and Joyce, 1990; Ellington and Szostak, 1990) has provided further support for the RNA World hypothesis through the isolation and characterization of ribozymes able to promote small-molecule chemistry and RNA polymerization (Chen et al., 2007; Wilson et al., 2016). Similar ribozymes would have been essential in an early RNA World, during which RNA, not proteins, would have powered cellular metabolism and replication. Together, the discovery of natural ribozymes and the invention of in vitro selection increased the popularity of the RNA World hypothesis and motivated a continuing effort to develop ribozymes able ultimately to achieve self-sustained replication and evolution in the laboratory. Such a system would share many aspects of modern living systems and, if achieved, would allow unprecedented experimental study of ideas directly relevant to early evolution. Therefore, after discussing the naturally occurring ribozymes, I will focus on a subset of artificial RNA enzymes, and their protein equivalents, that are most relevant to a primordial RNA World and that have the potential to evolve into a self-sustained replicative system built entirely from RNA.

1.2. Ribozyme chemistry and divalent cations

Whether protein or RNA, catalysts accelerate chemical reactions by lowering the activation energy required to perform a specific reaction, facilitating the transition from reactants to products (Wisniak, 2010). For catalysis, protein enzymes can draw on 20 amino acid side chains that contribute diverse functional groups, including polar, charged, aliphatic and aromatic residues, for the construction of the active site, thus providing a versatile chemical landscape for evolution to act upon. Ribozymes, however, are composed of only four nucleotides with no stabilizing positively charged residues at neutral pH, limiting the catalytic potential of these aperiodic polymers. Despite this limitation, ribozymes use some of the same catalytic strategies as protein enzymes, namely in-line orientation of reactants, stabilization of the transition state, and general acid-base catalysis (Fedor and Williamson, 2005). RNA folds into three-dimensional structures with catalytic pockets that precisely align and orient substrates relative to each other and to the catalytic residues of the ribozyme. Through its phosphodiester backbone, RNA particularly excels at positioning phosphates. Additionally, using base pairing, base stacking and hydrogen-bond interactions, the active site of a ribozyme can stabilize the transition state by binding to it with high affinity (Rupert et al., 2002; Cochrane and Strobel, 2008). The transition state is further stabilized by acid-base catalysis, during which covalent intermediates can form between the substrate and the ribozyme.
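The link between a lowered activation barrier and faster catalysis can be made concrete with transition-state theory. The sketch below assumes the Eyring pre-exponential factor is the same for the catalyzed and uncatalyzed reactions, so the rate enhancement depends only on the barrier reduction ΔΔG‡ through k_cat/k_uncat = exp(ΔΔG‡/RT). The function names `rate_enhancement` and `barrier_reduction_for` are hypothetical, and the fold values are illustrative rather than measurements from the studies cited here.

```python
import math

# Molar gas constant in J/(mol*K)
R = 8.314462618


def rate_enhancement(ddg_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Fold rate increase when the activation free energy drops by ddg_kj_per_mol.

    Assumes the Eyring pre-exponential term (k_B*T/h) is identical for the
    catalyzed and uncatalyzed reactions, so it cancels in the ratio.
    """
    return math.exp(ddg_kj_per_mol * 1000.0 / (R * temperature_k))


def barrier_reduction_for(fold: float, temperature_k: float = 298.15) -> float:
    """Barrier reduction (kJ/mol) needed to reach a given fold rate enhancement."""
    return R * temperature_k * math.log(fold) / 1000.0


if __name__ == "__main__":
    # Each ~5.7 kJ/mol of barrier lowering at 25 °C corresponds to a 10-fold speed-up.
    for fold in (10, 1e3, 1e6):
        print(f"{fold:>9.0e}-fold requires {barrier_reduction_for(fold):5.1f} kJ/mol")
```

Because the relation is exponential, modest barrier reductions translate into large rate enhancements: each additional ~5.7 kJ/mol at 25 °C multiplies the rate by roughly another factor of ten.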

Divalent cations, notably magnesium, are critically important in many biological functions, and ribozymes are no exception: these ions supply some of the chemical versatility that RNA lacks (Steitz and Steitz, 1993; Hanna and Doudna, 2000; Andreini et al., 2008; Donghi and Schnabl, 2011). Divalent cations stabilize the folding of RNA into three-dimensional structures by neutralizing the negative charges of the phosphodiester backbone. Metal ions can also complement the catalytic strategies of ribozymes. In phosphoryl transfer reactions, they help align the reactants, stabilize the negative charge that develops in the transition state, and stabilize the oxyanion leaving group. Further, when bound to water, metal ions can aid acid-base catalysis by acting as proton donors and acceptors.

1.3. The natural ribozymes

1.3.1. Classes of natural ribozymes

Since the discovery of the first ribozyme in 1982, 14 distinct types of naturally occurring ribozymes have been identified. All naturally occurring ribozymes, with the single exception of the ribosome, catalyze phosphoryl-transfer reactions, cleaving or ligating the phosphodiester backbone of RNA. Based on the reaction catalyzed, these ribozymes can be further classified as self-cleaving (Fig. 1.1A) or trans-cleaving (Fig. 1.1B). Nine classes of self-cleaving RNAs are known so far: the hammerhead (Hutchins et al., 1986; Prody et al., 1986), hairpin (Buzayan et al., 1986), human hepatitis delta virus (Kuo et al., 1988), Varkud satellite (Saville and Collins, 1990), glmS (Winkler et al., 2004), twister (Roth et al., 2014), twister sister, hatchet and pistol ribozymes (Weinberg et al., 2015). In a reversible reaction, these ribozymes (Fig. 1.1A) use the 2′-oxygen of the ribose at the cleavage site to perform nucleophilic attack on the adjacent 3′-phosphate, cleaving the phosphodiester backbone. This yields a 5′ product ending in a 2′,3′-cyclic phosphate and a 3′ product beginning with a 5′-hydroxyl. Trans-cleaving ribozymes also catalyze phosphodiester cleavage, but use a range of nucleophiles (hydroxyl-containing R groups), for example water in the case of ribonuclease P (RNase P, Fig. 1.1B) (Guerrier-Takada et al., 1983). This results in a 5′ product with a 3′-hydroxyl and a 3′ product covalently attached to the attacking R group via a phosphodiester linkage. Among the trans-cleaving ribozymes, the self-splicing group I and group II introns (Kruger et al., 1982; Michel et al., 1982) and the spliceosome perform two-step reversible chemistry with strand exchange, while the lariat-capping (LC) ribozyme (Nielsen et al., 2005) performs single-step chemistry to form a small, protective cap-like lariat.
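The difference between the two product types can be made concrete with a small data model. The Python sketch below is purely schematic: the Fragment class, the function names and the example sequences are hypothetical, and only the end chemistry described above is tracked. Sequences are written 5′ to 3′, and the cleavage site is given as the index of the first nucleotide of the 3′ product. For trans-cleavage with water as the nucleophile (as in RNase P), the attached R group is simply a hydrogen, so the 3′ product carries a 5′-phosphate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fragment:
    seq: str          # written 5' -> 3'
    five_prime: str   # chemistry at the 5' end
    three_prime: str  # chemistry at the 3' end


def _check_site(seq: str, site: int) -> None:
    if not 0 < site < len(seq):
        raise ValueError("cleavage site must lie between two nucleotides")


def self_cleave(seq: str, site: int) -> Tuple[Fragment, Fragment]:
    """Internal 2'-OH attack: 2',3'-cyclic phosphate + 5'-OH products."""
    _check_site(seq, site)
    upstream = Fragment(seq[:site], "unchanged", "2',3'-cyclic phosphate")
    downstream = Fragment(seq[site:], "5'-OH", "unchanged")
    return upstream, downstream


def trans_cleave(seq: str, site: int, nucleophile: str) -> Tuple[Fragment, Fragment]:
    """External R-OH attack: 3'-OH on the 5' product, R group on the 3' product."""
    _check_site(seq, site)
    upstream = Fragment(seq[:site], "unchanged", "3'-OH")
    if nucleophile == "water":
        # R = H: the phosphate left on the 3' product is a 5'-phosphate monoester
        five_end = "5'-phosphate"
    else:
        five_end = f"phosphodiester to {nucleophile}"
    downstream = Fragment(seq[site:], five_end, "unchanged")
    return upstream, downstream


if __name__ == "__main__":
    print(self_cleave("GGCUAGCA", 4))                  # self-cleaving ribozyme
    print(trans_cleave("AAAGCGGA", 3, "water"))        # RNase P-like
    print(trans_cleave("AAAGCGGA", 3, "guanosine"))    # group I-like first step
```

The point of the model is that the two classes are distinguished by where the phosphate ends up: on the 5′ product as a cyclic phosphate in self-cleavage, or on the 3′ product in trans-cleavage.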


Figure 1.1. Self-cleaving and trans-cleaving chemistry of natural ribozymes. (A) Self-cleaving ribozymes form a cyclic phosphate. When correctly positioned and deprotonated by a ribozyme’s active site, the 2′-hydroxyl adjacent to the 3′-phosphate acts as a nucleophile, attacking the phosphorus to form a pentavalent intermediate that can resolve into strand cleavage. Cleavage results in a 5′ strand containing a 2′,3′-cyclic phosphate and a 3′ strand with a free 5′-hydroxyl. This chemistry is fully reversible, allowing ligation of cleaved products, a fact exploited by many RNA viral replication strategies. (B) Trans-cleaving ribozymes have evolved to act on RNA substrates. By chemistry similar to that of the self-cleaving ribozymes, a range of hydroxyl-containing R groups (indicated by R-OH), namely water (RNase P), the 3′-OH of guanosine (group I self-splicing intron), the 2′-OH of an internal adenosine (the group II self-splicing intron and the spliceosome) or the 2′-OH of an internal uridine (the lariat-capping (LC) ribozyme), attack a phosphodiester linkage to form a pentavalent transition state. In this case, however, cleavage results in a 5′ strand with a 3′-hydroxyl and a 3′ strand covalently attached to the attacking R group via a phosphodiester linkage.

1.3.2. The self-cleaving ribozymes

The hammerhead ribozyme (HHR)

The first of the self-cleaving ribozymes to be discovered, the hammerhead ribozyme (HHR) was reported independently in the satellite RNA of Tobacco Ringspot virus (Prody et al., 1986) and in the Avocado Sunblotch viroid (Hutchins et al., 1986). This small (~50 nucleotide (nt)) ribozyme resembles the shape of a hammerhead, with a highly conserved 15-nt catalytic core surrounded by three double helices (I to III), and is further classified into three cyclic permutations (I to III) depending on which helix is open-ended (Fig. 1.2). To reach high activity under physiological conditions, loop-loop interactions between helices I and II are required, forming a network of non-canonical base pairs across the major groove of the RNA helices (De la Peña et al., 2003; Khvorova et al., 2003). Through its self-cleavage activity, the HHR processes the multimeric transcripts produced during rolling-circle replication of the circular RNA genomes in which it is present into monomers (Symons et al., 1987). However, because it acts exclusively in cis, the natural HHR is consumed in the reaction and cannot be considered a true catalyst. The ribozyme can, however, be divided into separate catalytic and substrate strands, which makes it capable of multiple-turnover reactions at high magnesium concentrations and converts it into a true catalyst (Uhlenbeck, 1987). Since their original identification, HHRs have also been found in the genomes of Bacteria, Chromalveolata, Plantae and Metazoa, where their roles remain largely unknown but likely involve the propagation of retrotransposons (De la Peña and García-Robles, 2010).

Figure 1.2. Secondary structure of the hammerhead ribozyme (HHR). Schematic of the three cyclic permutations (obtained by opening the sequence at each of the dotted lines) of the hammerhead ribozyme (HHR). Double stranded helical regions are shown by parallel lines spaced by perpendicular bars and are given their conventional names. Sites of RNA cleavage are indicated by the arrowheads and the 5′ and 3′ ends of each ribozyme are indicated.

The hairpin ribozyme

Like the hammerhead motif, the hairpin catalytic motif was first discovered in the satellite RNA of Tobacco Ringspot virus (Buzayan et al., 1986) and was later found in the satellite RNAs of Chicory Yellow Mottle Virus and Arabis Mosaic Virus (DeYoung et al., 1995). Like its hammerhead cousin, the hairpin ribozyme processes rolling-circle replication products by catalyzing self-cleavage and ligation. However, whereas the hammerhead ribozyme favors cleavage over ligation by ~100-fold (Hertel et al., 1994), the hairpin ribozyme favors ligation by ~10-fold (Hegg and Fedor, 1995). The hairpin ribozyme contains four helical stems (A-D) joined by a four-way junction, two of which contain the catalytically essential unpaired internal loops A and B (Fig. 1.3) (Hampel and Tritz, 1989). For activity, the ribozyme rearranges with the aid of divalent cations to bring internal loops A and B into close proximity, forming the active site; loop A contains the scissile phosphate (Murchie et al., 1998). Mutagenesis and modification analyses revealed that the loop sequences have the greatest effect on catalytic activity; in particular, the ribozyme requires a guanosine immediately following the scissile phosphate (Fedor, 2000). Further, a minimal hairpin ribozyme consisting only of the A and B helix-loop-helix segments can be constructed. This artificially created ribozyme, however, is less stable, and its activity is shifted towards the cleavage reaction (Hampel and Tritz, 1989; Fedor, 1999).
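If the ~100-fold and ~10-fold preferences above are read as internal equilibrium constants for the ribozyme-bound complex, with K_int = k_cleave/k_lig (an interpretive assumption, not a value taken from the cited studies), they correspond to modest free-energy differences but very different product distributions. Assuming 25 °C (RT ≈ 0.593 kcal/mol), a worked estimate is:

```latex
\begin{aligned}
K_{\mathrm{int}} &= \frac{k_{\mathrm{cleave}}}{k_{\mathrm{lig}}}, \qquad
\Delta G^{\circ}_{\mathrm{cleave}} = -RT\ln K_{\mathrm{int}}, \qquad
f_{\mathrm{cleaved}} = \frac{K_{\mathrm{int}}}{1 + K_{\mathrm{int}}} \\[4pt]
\text{HHR},\ K_{\mathrm{int}} \approx 100: \quad
\Delta G^{\circ}_{\mathrm{cleave}} &\approx -(0.593)(4.61) \approx -2.7\ \mathrm{kcal\,mol^{-1}}, \qquad
f_{\mathrm{cleaved}} \approx 0.99 \\[4pt]
\text{hairpin},\ K_{\mathrm{int}} \approx 0.1: \quad
\Delta G^{\circ}_{\mathrm{cleave}} &\approx -(0.593)(-2.30) \approx +1.4\ \mathrm{kcal\,mol^{-1}}, \qquad
f_{\mathrm{cleaved}} \approx 0.09
\end{aligned}
```

On this reading, a ~10-fold preference is worth only about 1.4 kcal/mol, yet it leaves roughly nine out of ten bound hairpin molecules in the ligated state at internal equilibrium, whereas the hammerhead ends up almost entirely cleaved.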

Figure 1.3. Secondary structure of the hairpin ribozyme. Schematic of the hairpin ribozyme. Sites of RNA cleavage are indicated by the arrowheads and the 5′ and 3′ ends of each ribozyme are indicated.

Hepatitis delta virus (HDV) ribozyme

Dependent on co-infection with hepatitis B virus, hepatitis delta virus propagates via a double rolling-circle mechanism in which genomic and antigenomic multimeric RNA transcripts are produced (Lai, 1995). Self-cleaving HDV ribozymes were discovered in both the genomic and antigenomic strands and were found to process the double rolling-circle replication products into monomeric transcripts (Kuo et al., 1988). Both genomic and antigenomic forms of the ribozyme are composed of five coaxially stacked helical regions (P1.1 and P1 to P4) organized by a nested double pseudoknot, with the cleavage site at the base of the P1 helix (Fig. 1.4) (Ferré-D’Amaré et al., 1998; Wadkins et al., 1999). The two forms of the ribozyme are structurally very similar, sharing conserved, catalytically essential joining regions between stems and differing mainly in less critical features: the length of the P2 helix and the sequence of the P4 helix. Since the original discovery, in vitro selection and structure-based searches have found similar, structurally and biochemically related catalytic motifs, termed HDV-like ribozymes. The first of these to be discovered outside a virus, the human cytoplasmic polyadenylation element-binding protein 3 (CPEB3) ribozyme, was identified in a human genomic library using in vitro selection (Salehi-Ashtiani et al., 2006; Webb and Lupták, 2011). Additionally, structure-based searches have found similar motifs widely spread in the genomes of mammals, insects, fish, plants, bacteria and other viruses (Riccitelli and Lupták, 2013). How HDV-like ribozymes evolved, however, is not yet clear.

Figure 1.4. Secondary structure of the hepatitis delta virus ribozyme (HDV). Schematic of the hepatitis delta virus ribozyme (HDV). Sites of RNA cleavage are indicated by the arrowheads and the 5′ and 3′ ends of each ribozyme are indicated.

Varkud satellite (VS) ribozyme

The VS ribozyme was discovered in the Varkud satellite RNA found in the mitochondria of some Neurospora strains, where it catalyzes self-cleavage and ligation of multimeric products in the replication cycle of the VS RNA (Saville and Collins, 1990; Collins, 2002). This ribozyme is unique in its mode of substrate recognition, requiring several conformational changes to adopt its active form. It comprises six helices (I to VI), with the scissile phosphate located between G620 and A621, a dinucleotide that adopts a splayed conformation (Fig. 1.5) (Dagenais et al., 2017). Efficient catalysis requires a conformational shift in stem-loop (SL) I, in which a symmetrical 6-nt internal loop changes to an asymmetrical 5-nt internal loop to form a functional active site (Andersen and Collins, 2000). Additionally, a magnesium-dependent kissing-loop interaction between SL-I and SL-V is needed to bring the internal loops of SL-I and SL-VI together to form the active site (Bouchard and Legault, 2014). Each of these internal loops contains a nucleobase with a key catalytic role, A756 (SL-VI) and G638 (SL-I) (Lipfert et al., 2008; Wilson et al., 2010).

Figure 1.5. Secondary structure of the Varkud satellite ribozyme (VS). Schematic of the Varkud satellite ribozyme (VS). Sites of RNA cleavage are indicated by the arrowheads and the 5′ and 3′ ends of each ribozyme are indicated.

Glucosamine-6-phosphate synthase (glmS) ribozyme

Discovered in many Gram-positive bacteria, the glmS ribozyme is a conserved element in the 5′ untranslated region of the mRNA of the glmS gene, which encodes glutamine-fructose-6-phosphate amidotransferase. This enzyme generates glucosamine-6-phosphate (GlcN6P) from fructose-6-phosphate and glutamine (Winkler et al., 2004). This unique ribozyme, which is also a riboswitch, uses GlcN6P as a cofactor necessary for catalysis. Through its self-cleaving activity, the ribozyme functions in a negative feedback mechanism, suppressing glmS gene expression in response to cellular GlcN6P concentrations (McCarthy et al., 2005). Crystal structures show that the ribozyme adopts three coaxially stacked helices and a defined metabolite-binding pocket, with a doubly pseudoknotted catalytic core and the scissile phosphate located at the bottom of the P2.2 helix between two splayed-apart nucleotides (Fig. 1.6) (Klein and Ferré-D’Amaré, 2006). Despite needing the GlcN6P cofactor for activity, the ribozyme’s active site is quite rigid overall, with no major conformational changes upon metabolite binding. Current evidence implicates the amine group of GlcN6P as a general acid in catalysis; consistent with this, a metabolite analog lacking the amine group inhibits self-cleavage (Viladoms and Fedor, 2012). Interestingly, three key mutations can convert the ribozyme into a GlcN6P-independent self-cleaving ribozyme with the same overall structural fold as the wild type (Lau and Ferré-D’Amaré, 2013). The discovery of this ribozyme suggests that, in an early RNA World, riboswitches could have functioned both as metabolite sensors and as regulators of gene expression, further speaking to the catalytic versatility of RNA.

Figure 1.6. Secondary structure of the glucosamine-6-phosphate synthase ribozymes (glmS). Schematic of the glucosamine-6-phosphate synthase (glmS) ribozyme. Sites of RNA cleavage are indicated by the arrowheads and the 5′ and 3′ ends of each ribozyme are indicated.

Twister

Using comparative genomic analysis, 2700 twister ribozymes have been identified in both bacteria and eukaryotes (Roth et al., 2014). Representatives of these ribozymes were capable of reversible RNA cleavage as uni- or bimolecular complexes in both in vitro and in vivo assays, although no biological role has yet been linked to them. The twister ribozymes comprise two pseudoknots, three essential stems (P1, P2, P4) and up to three optional stems (P0 at the 5′ end of P1, and P3 and P5 at the junction of P2 and P4), interspersed with loops (Fig. 1.7). The ribozymes can be classified by the stem at which the termini are located (type P1, P3 or P5). A number of highly conserved nucleotides important for catalytic activity are present in Loops 1 and 4, which are brought together by a double pseudoknot interaction; Loop 1 contains the scissile phosphate adjacent to a conserved adenine. Mutation of these conserved nucleotides or disruption of stem base-pairing decreased cleavage activity (Roth et al., 2014). Further, crystal structures of four twister ribozymes implicate a key guanine residue, along with divalent ions, in the cleavage mechanism (Gebetsberger and Micura, 2017).

Figure 1.7. Secondary structure of the twister ribozyme. Schematic of the minimal domains of the twister ribozyme. Sites of RNA cleavage are indicated by the arrowheads and the 5′ and 3′ ends of each ribozyme are indicated.

Twister sister, pistol and hatchet

In a further search for conserved RNA structural motifs, the Breaker group used comparative genomic analysis to discover three new self-cleaving ribozyme classes, termed twister sister, pistol and hatchet (Weinberg et al., 2015). Although no biological role has been linked to these ribozymes, they are distinct in sequence, structure and cleavage site from previously discovered ribozymes. The twister sister ribozyme adopts a three- or four-way junctional fold (without or with P3, respectively, at the P2 to P4 junction), interspersed with internal and terminal loops (Fig. 1.8A). It is named for the highly conserved key loop nucleotides it shares with the twister ribozyme; however, it does not adopt a pseudoknot fold, and cleavage occurs at a C-A dinucleotide on the opposite side of an internal loop that resembles the catalytic loop of the twister ribozyme (Weinberg et al., 2015). Notably, crystal structures of two-stranded three- and four-way junctional complexes reveal that the conserved C-A cleavage site is splayed apart and anchored in place within the fold of the ribozyme (Liu et al., 2017; Zheng et al., 2017). The pistol ribozyme has a more compact fold containing three stems (P1-P3), a hairpin loop and an internal loop, with a 6-nt pseudoknot forming between the P1 loop and the P2 to P3 internal loop (Fig. 1.8B) (Harris et al., 2015; Ren et al., 2016). The ribozyme contains 10 highly conserved nucleotides, mostly within the internal loop and the P1 to P2 linker, the latter also containing a highly conserved AAA trinucleotide. Crystal structures revealed that cleavage occurs at a modestly conserved, splayed-apart G-U dinucleotide between the P2 and P3 stems, anchored through intercalation between bases (Nguyen et al., 2017). Lastly, the hatchet ribozyme, for which only biochemical data and no structure have been reported, appears to adopt a four-stem (P1-P4) structure, with an additional, functionally dispensable P0 hairpin located 5′ of P1 (Fig. 1.8C). It contains 13 highly conserved residues distributed between the P1-P2 connection and the internal bulges between P2 and P3, with cleavage occurring at the base of P1 (Li et al., 2015).

Figure 1.8. Secondary structure of the twister sister, pistol and hatchet ribozymes. Schematic of the minimal domains of (A) the twister sister, (B) the pistol, and (C) the hatchet ribozymes. Sites of RNA cleavage are indicated by the arrowheads and the 5′ and 3′ ends of each ribozyme are indicated.

1.3.3. Trans-cleaving ribozymes

Group I introns

The first ribozymes to be discovered (Kruger et al., 1982), group I introns are 250–500 nt in size and are widespread in the ribosomal RNA (rRNA), transfer RNA (tRNA) and messenger RNA (mRNA) genes of bacteriophages, bacteria, plants, lower eukaryotes and some eukaryotic viruses (Haugen et al., 2005). They catalyze their own excision from the flanking exons and the subsequent exon ligation through two sequential transesterification reactions with the aid of three metal ions (Fig. 1.9) (Shan et al., 1999). In the first transesterification reaction, the 3′-hydroxyl of an exogenous guanosine acts as the nucleophile to attack the 5′ splice site, yielding a free 5′ exon with a 3′-OH and an intron whose 5′ end is attached to the guanosine cofactor by a newly formed 3′-5′ phosphodiester bond (Cech, 1990). In the second step, the 3′-OH of the free 5′ exon acts as the nucleophile to attack the 3′ splice site, resulting in two ligated exons and an excised linear intron (Fig. 1.9) (Cech, 1990). Group I introns contain nine base-paired elements (P1-P9) organized into three helical domains, P1-P2 (P1), P4-P5-P6 (P4-P6) and P3-P7-P9 (P3-P9), with a highly conserved U as the last nucleotide of the 5′ exon and a G as the last nucleotide of the intron (Kim and Cech, 1987; Michel and Westhof, 1990). Broadly, P1 is the substrate domain containing the 5′ splice site, P4-P6 acts as structural support for the P3-P9 domain, and P3-P9 is the catalytic domain, containing many of the active-site residues and the guanosine-cofactor binding site (Michel et al., 1989; Michel and Westhof, 1990). Besides self-splicing, there is evidence that group I introns also act as mobile elements, a process facilitated by the homing endonuclease genes (HEGs) that they often encode (Dujon, 1989).
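The two transesterification steps can be traced as simple sequence operations. The Python sketch below is schematic: it tracks only which segments end up joined, not the underlying chemistry, and the function name and example sequences are hypothetical. It does check for the conserved U at the end of the 5′ exon and the terminal G of the intron described above.

```python
from typing import Tuple


def group_i_splice(exon1: str, intron: str, exon2: str,
                   cofactor: str = "G") -> Tuple[str, str]:
    """Schematic group I self-splicing; sequences are written 5' -> 3'.

    Returns (ligated exons, excised linear intron).
    """
    if not exon1.endswith("U") or not intron.endswith("G"):
        raise ValueError("expected a U at the end of exon 1 and a G at the end of the intron")

    precursor = exon1 + intron + exon2

    # Step 1: the exogenous guanosine's 3'-OH attacks the 5' splice site.
    # Exon 1 is released with a 3'-OH; the guanosine is joined to the intron's 5' end.
    free_exon1 = precursor[:len(exon1)]
    g_intron_exon2 = cofactor + precursor[len(exon1):]

    # Step 2: the 3'-OH of exon 1 attacks the 3' splice site (after the intron's terminal G).
    splice_3 = len(cofactor) + len(intron)
    ligated_exons = free_exon1 + g_intron_exon2[splice_3:]
    linear_intron = g_intron_exon2[:splice_3]
    return ligated_exons, linear_intron


if __name__ == "__main__":
    exons, intron = group_i_splice("AAACUCU", "AAUGCCAUUG", "UAAGG")
    print(exons)   # ligated exon1-exon2
    print(intron)  # excised linear intron carrying the added 5' G
```

Note that the excised intron is one nucleotide longer than the intron sequence in the precursor, reflecting the guanosine cofactor that became covalently attached in the first step.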


Figure 1.9. The group I self-splicing intron. The group I self-splicing intron positions the 3′-hydroxyl of guanosine by binding the nucleoside cofactor into a binding pocket. This nucleophile can then attack the phosphodiester bond at the upstream exon1-intron boundary. In a second step, the liberated 5′ exon attacks the phosphodiester linkage found at the downstream intron-exon2 boundary. This results in an excised intron and a fused exon1-exon2.

Group II introns and the spliceosome

Group II introns are large (>200 nt) ribozymes found in all three domains of life, where they catalyze their own excision from the surrounding exons and the subsequent joining of those exons (Peebles et al., 1986). These ribozymes are closely related to the spliceosome, which is essential for correct gene expression in most eukaryotes (Fica et al., 2013; Smathers and Robart, 2019). Self-splicing can proceed through two pathways that can occur in parallel: branching or hydrolysis. Both involve two sequential, reversible transesterification reactions that require two divalent metal ions for catalysis (Fig. 1.10) (Daniels et al., 1996). In the first reaction of the branching pathway, the 2′-OH of an unpaired adenosine acts as the nucleophile to attack the 5′ splice site, forming a 2′-5′ phosphodiester bond and yielding a free 5′ exon (Fig. 1.10) (Peebles et al., 1986); in the hydrolysis pathway, a water molecule acts as the nucleophile instead (Jarrell et al., 1988). In both pathways, the second transesterification proceeds with the free 5′ exon acting as a nucleophile to attack the 3′ splice site, resulting in two ligated exons and either an intron lariat (branching pathway) or a linear intron (hydrolysis pathway) (Jarrell et al., 1988). Because both transesterification reactions are reversible, group II introns can also undergo reverse splicing (retrotransposition), in which free introns insert themselves into new target sites, a function believed to be responsible for their widespread occurrence across all domains of life (Chin and Pyle, 1995). During both splicing and retrotransposition, group II introns often recruit proteins that they encode, the intron-encoded proteins (IEPs), to help them fold or to enhance their function in vivo (Lambowitz and Belfort, 2015). There are three major classes of group II introns (IIA, IIB, IIC), which differ slightly in sequence and structure and most notably in their mode of exon recognition (Zimmerly and Semper, 2015). All group II introns share a conserved structure of six domains (D1-D6) radiating from a central wheel-like structure, with conserved 5′ GUGYG and 3′ AY ends (Y = pyrimidine) (Galej et al., 2018). Broadly, domain 1 (D1) provides exon recognition and, during reverse splicing, target recognition; D2 and D3 play structural roles, positioning the D2-D3 junction in the active site; and D4 contains the open reading frame encoding the IEPs and binds them (Pyle, 2016). D5 contains the most conserved nucleotides, interacts with D1 to form the catalytic core and contains the nucleotides that bind the catalytic divalent metal ions (Pyle, 2016). D6 contains the conserved unpaired adenosine that serves as the nucleophile in the branching pathway (Pyle, 2016).
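The branching and hydrolysis pathways differ only in the first nucleophile, so they can be compared with a similar schematic. In the hypothetical Python sketch below, supplying a branch-point index selects the branching pathway (the index must point to an adenosine, standing in for the unpaired D6 adenosine), while omitting it selects hydrolysis; the conserved 5′ GUGYG and 3′ AY ends are checked with regular expressions. The IntronProduct class and the example sequences are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IntronProduct:
    seq: str                    # full intron, 5' -> 3'
    lariat_loop: Optional[str]  # nucleotides closed into the loop (None if linear)
    tail: Optional[str]         # nucleotides 3' of the branch point (None if linear)


def group_ii_splice(exon1: str, intron: str, exon2: str,
                    branch_index: Optional[int] = None) -> Tuple[str, IntronProduct]:
    """Schematic group II splicing; returns (ligated exons, excised intron)."""
    if not re.match(r"GUG[CU]G", intron) or not re.search(r"A[CU]$", intron):
        raise ValueError("intron lacks the conserved 5' GUGYG / 3' AY ends")

    if branch_index is None:
        # Hydrolysis pathway: water attacks the 5' splice site -> linear intron
        product = IntronProduct(intron, None, None)
    else:
        if intron[branch_index] != "A":
            raise ValueError("the branch-point nucleotide must be an adenosine")
        # Branching pathway: the branch A's 2'-OH attacks the 5' splice site,
        # joining the intron's 5' G to the A by a 2'-5' bond (closing the loop)
        product = IntronProduct(intron,
                                intron[:branch_index + 1],
                                intron[branch_index + 1:])

    # Step 2 is the same in both pathways: exon 1's 3'-OH attacks the 3' splice site
    return exon1 + exon2, product


if __name__ == "__main__":
    intron = "GUGUGCCAUAGCUACAC"  # hypothetical intron sequence
    print(group_ii_splice("CCGAU", intron, "AUGC", branch_index=13))  # lariat
    print(group_ii_splice("CCGAU", intron, "AUGC"))                   # linear
```

In both cases the ligated exons are identical; only the form of the released intron (lariat with a short 3′ tail, or linear) records which pathway was taken.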


Figure 1.10. The group II self-splicing intron. The group II self-splicing intron and the eukaryotic spliceosome position the 2′-hydroxyl of an internal adenosine to attack the exon1-intron boundary phosphodiester linkage, forming a lariat structure. In a second step, the 3′-hydroxyl of the liberated upstream exon1 attacks the downstream intron-exon2 phosphodiester to excise the intron and form a fused exon1-exon2.

In eukaryotes, splicing is carried out by the spliceosome, a large ribonucleoprotein complex containing over 200 proteins but with a catalytic RNA core composed of five small nuclear RNAs (snRNAs) (Fica et al., 2013). The spliceosome performs the same two transesterification reactions as group II introns, excising a lariat intron and ligating two exons (Will and Lührmann, 2011). Structural and mechanistic similarities between the two systems, together with the retrotransposition activity of group II introns, strongly suggest an evolutionary connection between them, making the spliceosome yet another RNA World remnant that has been adapted for use in modern biological systems (Fica et al., 2013). The spliceosome’s active site shows structural parallels to that of group II introns, with the spliceosomal U2, U5 and U6 snRNAs adopting arrangements that resemble the structures of D1, D5 and D6 (Smathers and Robart, 2019). In some cases, group II intron domains can replace spliceosomal snRNA components, and vice versa, while maintaining splicing function (Hetzer et al., 1997; Shukla and Padgett, 2002). Additionally, group II intron-encoded proteins (IEPs) were found to bind and function similarly to the splicing protein Prp8, further strengthening the case for a common evolutionary origin (Galej et al., 2013).

Lariat capping (LC) ribozyme

The lariat capping (LC) ribozyme (~180 nt), originally termed the Group I-like Ribozyme 1 (GIR1), was discovered as part of a group I twin-ribozyme intron in the extrachromosomal ribosomal DNA of Didymium iridis. The extrachromosomal ribosomal DNA in this organism encodes a regular group I splicing ribozyme followed by an LC ribozyme and a homing endonuclease gene (HEG) in a peripheral domain (Decatur et al., 1995). The ribozyme catalyzes cleavage through a branching transesterification reaction analogous to the first transesterification reaction of group II introns. In this reaction, the 2′-OH of an internal U residue acts as the nucleophile to attack a phosphodiester bond two nucleotides away, yielding a free 5′ fragment with a 3′-OH and a 3′ fragment carrying a 3-nt lariat cap closed by a 2′-5′ phosphodiester bond (Fig. 1.11) (Nielsen et al., 2005). LC ribozymes are also found in some Naegleria strains, and the lariat cap they form appears to increase the half-life of the HEG mRNA (Johansen and Vogt, 1994; Nielsen et al., 2005). Notably, the LC ribozyme’s sequence and secondary structure closely resemble those of eubacterial group I introns, suggesting an evolutionary connection most likely facilitated by the reverse-splicing and homing capabilities of group I introns (Johansen et al., 2002; Haugen et al., 2005). Both ribozymes share a three-domain structure, tertiary interactions that stabilize the catalytic core, and short peripheral elements. The main differences lie in the smaller size of the LC ribozyme and its different substrate domain (Beckert et al., 2008; Meyer et al., 2014).


Figure 1.11. The lariat capping ribozyme (LC). The LC ribozyme uses the 2′-hydroxyl of an internal uridine to form a small 3-nt lariat cap.

Ribonuclease P (RNase P)

One of the first ribozymes to be discovered, RNase P is a large ribonucleoprotein complex with multiple-turnover enzymatic activity (Guerrier-Takada et al., 1983). The ribozyme is universal, found in all three domains of life, where it is best known for cleaving precursor tRNA (pre-tRNA), although it also processes other RNA substrates such as mRNA, rRNA and long noncoding RNAs (Jarrous, 2017). During tRNA maturation, RNase P uses divalent metal ions to activate a nucleophilic water molecule that hydrolyzes the phosphodiester bond between the mature tRNA and the 5′ leader sequence, removing the leader (Fig. 1.12A). In all three domains, catalytic activity is attributed to the RNA core, and weak activity can still be observed in the absence of proteins under certain conditions (Guerrier-Takada et al., 1983; Pannucci et al., 1999; Kikovska et al., 2007). In bacteria, the RNA comprises two functional domains: the catalytic (C) domain, essential for catalysis, and the specificity (S) domain, responsible for substrate recognition (Kazantsev et al., 2011) (Fig. 1.12B). The protein subunits aid in substrate recognition and stabilization of the RNA structure rather than in catalysis, and in E. coli they ensure that different pre-tRNAs are processed at similar rates (Sun et al., 2006; Guenther et al., 2013). In bacteria, the ribonucleoprotein consists of a single RNA component and one protein subunit; in contrast, the complex harbors four and six protein subunits in Archaea and Eukaryotes, respectively (Ellis and Brown, 2009). Despite the divergence in sequence and structure of RNase P enzymes across domains, the RNA catalytic center has remained broadly similar, and structural elements of the four archaeal proteins are homologous to four of the core eukaryotic proteins (Frank et al., 2000; Hartmann and Hartmann, 2003). This is not the case, however, for RNase P in human mitochondria and spinach chloroplasts, in which the RNA component has been fully replaced by proteins (Thomas et al., 2000; Holzmann et al., 2008). Furthermore, in certain organisms, the need for RNase P activity has been bypassed entirely (Randau et al., 2008).


Figure 1.12. RNase P is a trans-acting two-domain ribozyme. (A) RNase P positions divalent metal ions (M²⁺) and a water molecule proximal to the pre-tRNA cleavage site (indicated by the black arrow). RNase P cleavage results in displacement of the pre-tRNA leader sequence. In bacteria, a protein cofactor ensures that all pre-tRNAs are cleaved at approximately the same rate by binding in distinct patterns to the leader sequence of the pre-tRNA (interaction shown by red oval), while the RNase P RNA recognizes the 3′ tail of the maturing tRNA (blue ovals). (B) The A-type E. coli RNase P RNA is divided into two independently folding structural domains. The specificity domain (S-domain), shown in black, is responsible for pre-tRNA recognition, while the catalytic domain (C-domain), shown in red, is responsible for catalyzing pre-tRNA cleavage.

The ribosome

Found in all domains of life, the ribosome is a megadalton ribonucleoprotein complex capable of multiple-turnover enzymatic activity. Composed of at least three RNA molecules and over 50 proteins, the ribosome is at its core a ribozyme that catalyzes the synthesis of proteins whose sequences are encoded by messenger RNA (mRNA) (Nissen et al., 2000). Unlike the other natural ribozymes, which catalyze phosphoryl transfer reactions, the ribosome catalyzes amide-bond formation: the amine nucleophile of one aminoacyl transfer RNA (aa-tRNA) attacks the carbonyl carbon of the ester bond linking the amino acid (or growing peptide) to a second tRNA, joining the amino acids through an amide (peptide) bond (Fig. 1.13A) (Green and Lorsch, 2002). Ribosomes are composed of two subunits: the large subunit, which contains the peptidyl transferase center (PTC), and the small subunit, which contains the decoding site where codon-anticodon base pairing occurs between the mRNA and tRNA. The PTC is formed entirely of RNA and is highly conserved (Ban et al., 2000). Supporting the idea that the ribosome is a ribozyme, peptide-bond formation activity was retained in vitro after removal of most of the proteins of the large ribosomal subunit (Noller et al., 1992). The ribosome has three tRNA binding sites. The substrate aa-tRNA arrives at the A (aminoacyl) site in complex with EF-Tu (elongation factor thermo unstable) and GTP. Following GTP hydrolysis, the aa-tRNA reacts with the peptidyl-tRNA in the P (peptidyl) site, resulting in a deacylated tRNA in the P site and an extended peptidyl-tRNA in the A site (Ramakrishnan, 2002) (Fig. 1.13B). EF-G (elongation factor G) then translocates the two resulting tRNAs from the A and P sites to the P and E (exit) sites, respectively, releasing the deacylated tRNA from the E site and providing an empty A site for the next aa-tRNA (Ramakrishnan, 2002) (Fig. 1.13B). Crystal structures of the bacterial ribosome show that the conserved CCA 3′ ends of the tRNAs are held in place and oriented near the active site by interactions with the ribosomal RNA in both the A and P sites (Schmeing et al., 2005; Selmer, 2006). Besides positioning and orienting the substrates in the catalytic site, the ribosome appears to provide a favorable electrostatic environment for the reaction, increasing the rate of peptide bond formation by 10⁶–10⁷-fold (Beringer and Rodnina, 2007).
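The elongation cycle described above is essentially a fixed sequence of state changes at the A, P and E sites, which the hypothetical Python sketch below tracks. It is a bookkeeping model only: EF-Tu, EF-G and GTP hydrolysis are folded into the method names, and, as a simplification, a deacylated tRNA is ejected from the E site when the next translocation pushes another tRNA into it. The class and method names are illustrative and not taken from any software package; the first cycle mirrors the Met-Ile example of Figure 1.13A.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TRNA:
    identity: str                                   # amino acid it was charged with
    chain: List[str] = field(default_factory=list)  # residues esterified to its 3' end


@dataclass
class Ribosome:
    a_site: Optional[TRNA] = None
    p_site: Optional[TRNA] = None
    e_site: Optional[TRNA] = None

    def accommodate(self, amino_acid: str) -> None:
        """An aminoacyl-tRNA (delivered by EF-Tu with GTP) enters the empty A site."""
        if self.a_site is not None:
            raise RuntimeError("A site is occupied")
        self.a_site = TRNA(amino_acid, [amino_acid])

    def transfer_peptide(self) -> None:
        """The A-site amine attacks the P-site ester; the chain moves to the A-site tRNA."""
        if self.a_site is None or self.p_site is None or not self.p_site.chain:
            raise RuntimeError("need an aminoacyl-tRNA in A and a charged tRNA in P")
        self.a_site.chain = self.p_site.chain + self.a_site.chain
        self.p_site.chain = []  # the P-site tRNA is now deacylated

    def translocate(self) -> Optional[TRNA]:
        """EF-G moves tRNAs A -> P and P -> E; returns any tRNA ejected from E."""
        ejected = self.e_site
        self.e_site, self.p_site, self.a_site = self.p_site, self.a_site, None
        return ejected


if __name__ == "__main__":
    ribosome = Ribosome(p_site=TRNA("M", ["M"]))  # initiator Met-tRNA in the P site
    for aa in "IK":
        ribosome.accommodate(aa)
        ribosome.transfer_peptide()
        ribosome.translocate()
    print("".join(ribosome.p_site.chain))  # MIK
```

After each full cycle the growing chain always sits on the P-site tRNA and the A site is empty, which is the invariant that allows the cycle to repeat for every codon.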


Figure 1.13. The peptide-chain-forming chemistry of the ribosome is unique among natural ribozymes. (A) The chemistry of peptide bond formation catalyzed by the ribosome. The amine group (red) of an incoming aminoacylated tRNA charged with isoleucine (Ile, I) acts as a nucleophile, attacking the carbonyl carbon of the ester bond of an aminoacylated tRNA charged with methionine (Met, M) to form a peptide bond. This results in an MI dipeptide attached to the original isoleucine tRNA via an aminoacyl linkage. (B) Schematic of the peptidyl transfer mechanism in the ribosome. The ribosome, a megadalton ribonucleoprotein complex, consists of a large subunit responsible for peptide bond formation and a small subunit responsible for mRNA decoding. An aminoacylated tRNA (circle indicates amino acid) is loaded into the A site of the ribosome (left panel) by EF-Tu (not shown). Subsequently, the aminoacyl linkage attaching the peptide chain (string of circles) to the P-site tRNA is attacked by the primary amine of the A-site amino acid, resulting in a peptide chain longer by one amino acid (middle panel). Next, EF-G (not shown) translocates the mRNA and the anticodon stems of the tRNAs to the left by 3 nt, allowing the removal of the deacylated tRNA from the E site and providing an empty A site for the next aminoacylated tRNA to enter the ribosome (right panel).

1.4. RNA and the origins of life

Judging from the phylogenetic distribution of natural ribozymes and the central nature of the reactions that they catalyze, catalytic RNA was very likely present in the last universal common ancestor (LUCA) of contemporary life (Benner et al., 1989). The distribution of the catalytically critical and complex ribosome and RNase P ribozymes across all domains of life, together with their high degree of conservation, strongly suggests an evolutionary connection between the catalytic RNAs of all organisms. Interestingly, recent structural and mechanistic studies demonstrate that the catalytic cores of the group II self-splicing intron and the spliceosome are highly conserved, suggesting that the eukaryotic spliceosome derives from an ancient invasion of the eukaryotic genome by group II introns, which are mobile genetic elements (Keating et al., 2010; Fica et al., 2013; Nguyen et al., 2015). Further, with increasing computational power in recent years, comparative genomics is starting to provide a much broader picture of the distribution of naturally occurring ribozymes, finding thousands of examples of catalytic RNA motifs in all domains of life.

In addition to the inference that ribozymes were a feature of the LUCA, there is much evidence, and corresponding speculation, that RNA, in particular catalytic RNA, was critical to the earliest life forms on Earth. RNA, or an RNA-like molecule, could even have been the basis for the origin of life itself (Gilbert, 1986; Joyce, 2002; Robertson and Joyce, 2012). Initially, this postulate was based on the observation that RNA was the only known polymer to simultaneously possess a genotype and a phenotype. Subsequently, more direct evidence of the antiquity of RNA has come to light. This evidence includes the discovery that the ribosome is a ribozyme, that RNA can catalyze a diverse array of reactions, that an artificial ligase-derived RNA can catalyze rudimentary RNA polymerization, that RNA can catalyze recombination, and, importantly, that populations of RNA can evolve in a test tube and respond to environmental pressures in a manner analogous to natural organisms (Wright and Joyce, 1997). Moreover, much is now known about plausible abiotic synthetic routes to critical RNA precursors such as purines, pyrimidines, carbohydrates, and phosphorylated nucleotides (Powner et al., 2009; Stairs et al., 2017; Whitaker and Powner, 2018; Xu et al., 2020). Equally important, abiotic chemistry appears capable of producing RNA polymers: activated RNA-like monomers can be converted into RNA chains of 50 or more nucleotides on cationic clays such as montmorillonite (Huang and Ferris, 2006).

1.5. In vitro evolution of RNA

1.5.1. Systematic evolution of ligands by exponential enrichment (SELEX)

SELEX is a powerful method developed to isolate RNA sequences that bind to particular substrates without necessarily reacting with them (Ellington and Szostak, 1990; Tuerk and Gold, 1990). Random pools of RNA, typically 40–100 nt in length, are generated such that up to 10¹⁵ distinct sequences can be easily represented. These random pools are challenged to bind a target substrate, such as a particular protein or a small molecule, and successful binders are separated, reverse transcribed into complementary DNA (cDNA) by reverse transcriptase, amplified at the DNA level by the polymerase chain reaction (PCR), and then transcribed back into RNA by a DNA-dependent RNA polymerase. Repeating this process 5–10 times often yields a small subset of RNA sequences, called aptamers, that bind the target substrate with high affinity and specificity.
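The pool sizes quoted above can be put in perspective with a short calculation. The minimal Python sketch below compares a 10¹⁵-molecule pool with the 4^N possible sequences of an N-nucleotide random region. It assumes, as an idealization, that every molecule in the pool carries a distinct sequence; the function names are illustrative only.

```python
def sequence_space(n: int) -> int:
    """Number of distinct RNA sequences of length n over the alphabet A, C, G, U."""
    return 4 ** n


def fraction_sampled(pool_size: float, n: int) -> float:
    """Upper bound on the fraction of sequence space represented by a pool,
    assuming (idealistically) that every molecule has a distinct sequence."""
    return min(1.0, pool_size / sequence_space(n))


if __name__ == "__main__":
    pool = 1e15  # upper bound on pool diversity quoted in the text
    for n in (20, 25, 40, 100):
        print(f"N = {n:3d}: 4^N = {sequence_space(n):.2e}, "
              f"fraction sampled <= {fraction_sampled(pool, n):.2e}")
```

Under this idealization a 10¹⁵ pool can cover every possible 20-mer, but it samples fewer than one in a billion possible 40-mers, so selections from longer random regions explore only a tiny subset of sequence space.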

The generation of aptamers against target substrates is analogous in many ways to the mammalian immune system, which uses a high intrinsic diversity of potential antibodies to “raise” antibodies that target specific pathogens. It is estimated that a human can express 10¹⁵ distinct antibodies (Briney et al., 2019), a value similar in magnitude to the diversity of artificial RNA sequences that can be generated in a laboratory. Because both antibodies and aptamers bind specific target substrates, both have many applications as research tools for the study of biology and in the treatment of disease (Yan and Levy, 2009; Meyer et al., 2011; Sun et al., 2014).

1.5.2. Selection in vitro for catalytic RNAs

Selection in vitro follows the same basic strategy as SELEX, except that RNAs are isolated by their ability to catalyze a target chemical reaction (Fig. 1.14). Like aptamers, these RNAs first bind one or more substrates; they then promote a reaction that modifies, or “tags”, the catalytic RNA for amplification. This tag allows the desired catalytic RNA molecules (ribozymes) to be purified and amplified as in the selection of RNA aptamers. The initial RNA pool can consist either of random RNA sequences or of mutants based on a particular known wild-type sequence.


Figure 1.14. Artificial in vitro selection methodology. Polymerase chain reaction (PCR) is used to create an initial DNA library that can be transcribed (typically using T7 RNA polymerase) into an RNA pool with sequence diversities typically ranging from 10¹² to 10¹⁶ sequences. The library is subjected to a selective step that allows purification of functional (active) ribozymes away from non-functional (inactive) sequences. The stringency of this biochemical step is key to a successful in vitro selection. Active sequences are recovered, reverse transcribed into cDNA and then PCR amplified to generate a second-round pool population. This selection cycle is repeated until the desired functional ribozymes are found. To further increase sequence diversity, PCR can be replaced with mutagenic or recombination-based methods.
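Why stringency matters, and why a handful of cycles usually suffices, can be illustrated with a deterministic toy model of the selection cycle in Fig. 1.14. The sketch assumes that active molecules are recovered each round with a fixed probability, that inactive molecules leak through with a fixed background probability, and that amplification is unbiased; all parameter values are hypothetical.

```python
def fraction_active(f0: float, active_recovery: float,
                    background_recovery: float, rounds: int) -> float:
    """Fraction of the pool that is active after `rounds` selection cycles.

    Toy model: each round, active molecules survive the selective step with
    probability `active_recovery` and inactive molecules with probability
    `background_recovery`; RT-PCR and transcription are assumed unbiased.
    """
    active = f0 * active_recovery ** rounds
    inactive = (1.0 - f0) * background_recovery ** rounds
    return active / (active + inactive)


if __name__ == "__main__":
    # Hypothetical values: 1 active molecule in 10^10, 50% recovery of
    # actives, 0.1% background carry-over of inactive sequences per round.
    for r in range(7):
        print(r, f"{fraction_active(1e-10, 0.5, 1e-3, r):.3g}")
```

With these assumed numbers each round enriches actives 500-fold (0.5 / 0.001), and the active fraction passes 50% in the fourth round. Halving the background carry-over doubles the per-round enrichment, which is one way to see why the stringency of the selective step is so important.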

Fundamentally, both RNA aptamers and ribozymes must adopt complex three-dimensional structures to become functional. This folding generates binding pockets for ligands or reactants, allowing ribozymes to mediate chemistry by precisely positioning substrates relative to metal ions and proton donor and acceptor groups at the site of chemistry. Linear RNA polymers can achieve such complex folding through a variety of mechanisms. Just as double-stranded B-form DNA pairs A with T and C with G, RNA forms Watson-Crick base pairs (A with U and C with G) that convert local regions of linear sequence into antiparallel, A-form double-stranded regions. Such helices are important structural elements in all known natural and artificial ribozymes. The helices are typically separated by single-stranded regions that form loops and the junctions connecting them (see the natural ribozymes in Figs. 1.2–1.8). These loops and junctions often fold into complex three-dimensional structures that, through their interactions with helical elements, confer function on the RNA molecule.

1.5.3. Evolution in vitro

Evolution in vitro follows the strategy of selection in vitro, except that in each round mutations are deliberately introduced into the RNA population to provide additional sequence variation on which selection can operate (Fig. 1.14). Mutations can be introduced by error-prone protein polymerases through mutagenic PCR (Cadwell and Joyce, 1994), and recombination can be introduced through sexual PCR (Stemmer, 1994). Evolution in vitro can in principle be carried out for an indefinite number of rounds, but convergence on a particular catalytic solution, if one exists, typically occurs after 10–20 rounds, in general accordance with the principles of population genetics.
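The effect of a mutagenic step on a pool can be sketched as independent point substitutions at a fixed per-nucleotide rate. The Python below is a minimal illustration, not a model of the actual mutagenic PCR chemistry: it assumes substitutions are unbiased among the three alternative bases, and the parent sequence and the 0.7% rate are hypothetical values chosen for illustration.

```python
import random

BASES = "ACGU"


def mutagenize(seq: str, rate: float, rng: random.Random) -> str:
    """Copy `seq`, replacing each position with probability `rate` by one of
    the three other bases (chosen uniformly, a simplifying assumption)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def fraction_unmutated(length: int, rate: float) -> float:
    """Expected fraction of copies that carry no mutation at all."""
    return (1.0 - rate) ** length


if __name__ == "__main__":
    rng = random.Random(42)
    parent = "GGAUCCGAAAGGAUCC" * 6  # hypothetical 96-nt parent sequence
    rate = 0.007                     # hypothetical per-nucleotide mutation rate
    pool = [mutagenize(parent, rate, rng) for _ in range(1000)]
    observed = sum(seq == parent for seq in pool) / len(pool)
    print(f"expected unmutated: {fraction_unmutated(len(parent), rate):.2f}, "
          f"observed: {observed:.2f}")
```

At this assumed rate roughly half of the 96-nt copies remain unmutated after one round, so most of the parental activity is retained while the other half of the pool explores nearby sequence space.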

Ribozymes and aptamers selected from random sequence often contain extraneous sequence that is not required for function. This sequence can be removed by systematically deleting nucleotides from either the 5′ or 3′ end of the RNA until function is compromised. Similarly, RNAs containing large loops can be shortened by replacing each loop with a minimal RNA tetraloop (for example, a GNRA tetraloop), provided that the loop sequence is not involved in function (Ekland et al., 1995). The resulting minimal motifs are often much easier to study and to use.

Evolutionary methods that streamline this procedure have been developed by several laboratories. In one such approach, the DNA encoding a ribozyme is cleaved into short segments using DNase I and then randomly re-ligated, generating a large number of rearranged sequence permutations. RNA libraries derived from this cohort can then be efficiently evolved into the minimal functional form (Wang and Unrau, 2005; Nomura and Yokobayashi, 2019). This approach yields ribozymes with all nonessential sequence removed while simultaneously allowing the accumulation of useful point mutations, effectively isolating the minimal functional domain of an artificial ribozyme.
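The fragmentation and random re-ligation step described above can be caricatured in a few lines of Python. This is a crude sketch under stated assumptions, not the published protocol: DNase I digestion is modelled as independent cuts between adjacent nucleotides, and re-ligation as joining randomly drawn fragments (with replacement, in random order). The functions `fragment` and `religate` and the parent sequence are hypothetical.

```python
import random


def fragment(seq: str, mean_len: float, rng: random.Random) -> list:
    """Cut `seq` between adjacent positions independently with probability
    1/mean_len, a crude stand-in for random DNase I digestion."""
    p_cut = 1.0 / mean_len
    pieces, start = [], 0
    for i in range(1, len(seq)):
        if rng.random() < p_cut:
            pieces.append(seq[start:i])
            start = i
    pieces.append(seq[start:])
    return pieces


def religate(pieces: list, n_pieces: int, rng: random.Random) -> str:
    """Join `n_pieces` fragments drawn at random (with replacement, in random
    order), mimicking random re-ligation into rearranged variants."""
    return "".join(rng.choice(pieces) for _ in range(n_pieces))


if __name__ == "__main__":
    rng = random.Random(7)
    ribozyme = "GGAAGCUCGAAAGAGCUUCCGCGUAAGCGGAUCC"  # hypothetical parent
    pieces = fragment(ribozyme, mean_len=8, rng=rng)
    library = [religate(pieces, rng.randint(2, 5), rng) for _ in range(5)]
    print(pieces)
    print(library)
```

Variants that happen to drop nonessential fragments while keeping the catalytic core would survive subsequent selection, which is the intuition behind using such rearranged libraries to find minimal functional domains.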

An ultimate goal of these RNA evolution techniques is to mimic a prebiological milieu in which all components required for evolution (the RNA, the nucleotides, the salts, the oligonucleotides, and the protein enzymes, if any) are simultaneously present in a single test tube (Wright and Joyce, 1997). In this fashion, several rounds of evolution occur autonomously before equilibrium is reached. Serial dilution and fresh addition of reagents can result in extremely rapid evolution of RNA populations and could, with external oscillations in temperature, result in an evolving RNA system (Lincoln and Joyce, 2009). Should RNA polymerase ribozymes capable of replicating an RNA genome be developed, true Darwinian evolution within a test tube might become possible, dramatically advancing our understanding of early evolution.
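The logic of serial dilution with fresh reagents can be captured by a simple condition: a replicating population persists only if its amplification during each incubation at least matches the dilution factor. The sketch below is a toy model with hypothetical parameters (doubling time, incubation time, dilution factor, and a substrate-limited ceiling `n_max`); it is not a description of any specific published experiment.

```python
def serial_transfer(n0: float, doubling_time: float, incubation: float,
                    dilution: float, n_max: float, transfers: int) -> list:
    """Population size at the end of each incubation in a serial-transfer
    experiment. Growth is exponential with the given doubling time but is
    capped at n_max (substrate exhaustion); after each incubation the
    population is diluted `dilution`-fold into fresh reagents."""
    history = []
    n = n0
    for _ in range(transfers):
        n = min(n * 2 ** (incubation / doubling_time), n_max)
        history.append(n)
        n /= dilution
    return history


if __name__ == "__main__":
    # Hypothetical values: 1 h doubling time, 10-fold dilution per transfer.
    for hours in (3.0, 5.0):
        final = serial_transfer(1e6, 1.0, hours, 10.0, 1e12, 20)[-1]
        print(f"{hours} h incubations: final population {final:.2e}")
```

With a 3 h incubation the population grows only 8-fold per transfer and is washed out by 10-fold dilution, whereas a 5 h incubation (32-fold growth) lets it climb to the substrate-limited ceiling; in a real experiment, variants that replicate faster would be favored under the same conditions.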

1.6. Artificial ribozymes

1.6.1. Rediscovery of naturally occurring ribozymes by artificial selection

An intriguing aspect of artificial RNA selections is that they have shed light on the origins of two naturally occurring ribozymes. In vitro selection has demonstrated that the natural hammerhead ribozyme is the most efficient self-cleaving ribozyme that can be found in completely random sequence libraries (Salehi-Ashtiani and Szostak, 2001). This explains how RNA viruses can make use of this ribozyme to rapidly process long linear RNA copies of either the sense or antisense (+ or – strand) into individual genomic copies. Because the complexity of the hammerhead motif is low (Sabeti et al., 1997), it is the most effective ribozyme that evolution can easily discover, which accounts for its use by many RNA viral systems.

A second in vitro selection, also by Szostak’s research group, tells an interesting story about the source of the HDV ribozyme. This ribozyme is much more sophisticated than the hammerhead, cleaving RNA more rapidly with a more complex core motif. The source of this ribozyme had long been mysterious: it seemed improbable that a virus could have spontaneously evolved such a complex RNA fold. This mystery was resolved by an in vitro selection that used, instead of random sequence, short segments of the entire human genome as a starting library. Artificial selection found several RNA sequences within the human genome capable of RNA cleavage, the fastest being an HDV-like ribozyme (Salehi-Ashtiani et al., 2006). Why the human genome contains an HDV-like ribozyme is still unclear, but this artificial selection suggests how hepatitis delta virus (HDV) was able to rapidly adopt such a ribozyme: it stole it.

1.6.2. Catalytic diversity of artificial ribozymes

Using SELEX, in vitro selection, and in vitro evolution, many additional catalytic RNA sequences have been discovered, most of which promote chemistries not yet seen in natural ribozymes. The existence of these ribozymes demonstrates not only that RNA has a wide repertoire of catalytic capabilities, but also that evolutionary forces are quite adept at molding RNA-based catalysis. For example, copper-dependent Diels–Alderase ribozymes able to perform carbon–carbon bond chemistry have been isolated by in vitro selection, although it was necessary to derivatize one of the four bases, uridine, with a pyridyl moiety (Seelig and Jäschke, 1999). From an evolutionary perspective, ribozymes containing fewer than the four standard RNA residues of A, C, G and U have also been evolved in the laboratory. Ribozymes have been selected that use only three RNA nucleotides (i.e., A, G, and U) (Rogers and Joyce, 1999). Further, using only U and 2,6-diaminopurine, a two-base ligase ribozyme could be constructed (Reader and Joyce, 2002). Many other artificial ribozymes exist, and more are being developed. These ribozymes can catalyze amide-bond formation, N-alkylation, porphyrin metalation, thioester bond formation, and a host of other chemical reactions, with catalytic centers as small as five nucleotides.

Of particular interest for achieving self-sustained replication and evolution of RNA in the laboratory are the in vitro selected phosphoryl transfer ribozymes. Phosphoryl transfer enzymes play fundamental roles in modern metabolism, and over the last two decades, a set of phosphoryl transfer ribozymes has been isolated that provides insight into how RNA, rather than protein, might have sustained many of these centrally important reactions. One function of phosphoryl transfer reactions in modern metabolism is to maintain the balance of energy inside cells by interconverting mono-, di- and triphosphate nucleotides (catalyzed by mono- and diphosphate kinases) (Manning et al., 2002). The high chemical potential energy stored in the phosphoanhydride bonds of di- and triphosphate nucleotides is used by multiple cellular functions to drive otherwise thermodynamically unfavorable reactions (Knowles, 1980). Because phosphoryl transfer reactions involve complex, often multistep chemistry, it has been unclear how well RNA could have promoted similar chemistry early in the evolution of life.

Surprisingly, RNA is remarkably good at such chemistry, as highlighted in more detail in the sections that follow. I will therefore introduce the different mechanisms by which ribozyme phosphoryl transfer reactions occur and compare them with their protein equivalents. Taking a bottom-up approach, I will place ribozyme-mediated phosphoryl transfer reactions into the context of nucleotide and nucleoside metabolism, the activation of RNA oligomers, and the assembly and replication of RNA molecules.

1.6.3. Kinase ribozymes

One major set of phosphoryl transfer reactions is catalyzed by kinase enzymes. Kinases carry out phosphorylation by transferring phosphate groups from high-energy donor molecules, such as ATP (pppA), to specific substrates. The human genome contains 518 protein kinase genes (Manning et al., 2002); among the kinases that act on small molecules, nucleoside monophosphate kinases (NMPKs) and nucleoside diphosphate kinases (NDPKs) are two critical enzyme families that are essential for the synthesis and homeostasis of nucleotides. This dynamic balance is required to maintain cellular metabolic processes and to synthesize and repair the RNA and DNA polymers central to information transfer and storage, respectively (Manning et al., 2002). NMPKs and nucleotide kinases (NKs) catalyze the transfer of the terminal γ-phosphoryl group from a nucleoside triphosphate onto an attacking nucleophile in a single step.

Similarly, polynucleotide kinase (PNK) type enzymes phosphorylate the termini of RNA or DNA strands, allowing ligases to join nucleic acid strands together (Karimi-Busheri et al., 1999), which can play critical roles in viral infection or in the repair of DNA damage. The attack of either a hydroxyl group (Eq. (1)) or a monophosphate (Eq. (2)) is achieved by precise positioning of the relevant nucleophile inside the enzyme active site. In contrast, the divalent metal ion-dependent NDPK enzymes catalyze the phosphorylation of nucleoside diphosphates to nucleoside triphosphates via a two-step mechanism involving a phospho-enzyme intermediate (Eqs. (3) and (4)). This process is intrinsically highly symmetrical, as the products of the overall reaction differ from the substrates only in the identities of N1 and N2 (Berg et al., 2002; Henzler-Wildman et al., 2007).

PNK: pppN1 + 5ʹ-HO-RNA ↔ ppN1 + pRNA (1)

NMPK: pppN1 + pN2 ↔ ppN1 + ppN2 (2)

pppN1 + NDPK ↔ ppN1 + NDPKp (3)

NDPKp + ppN2 ↔ NDPK + pppN2 (4)
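The bookkeeping behind Eqs. (1) to (4) can be checked mechanically. The Python sketch below assumes the document's shorthand, in which each lowercase "p" denotes one phosphoryl group; it verifies that each equation conserves phosphoryl groups and sums the two NDPK half-reactions to show that the enzyme and its phospho-enzyme intermediate cancel. The helper names are hypothetical.

```python
from collections import Counter


def phosphates(species: str) -> int:
    """Count phosphoryl groups using the shorthand of Eqs. (1)-(4), in which
    each lowercase 'p' stands for one phosphate (pppN1, ppN2, NDPKp, pRNA)."""
    return species.count("p")


def is_balanced(reactants, products) -> bool:
    """True if a reaction conserves the number of phosphoryl groups."""
    return sum(map(phosphates, reactants)) == sum(map(phosphates, products))


def net_reaction(*steps):
    """Sum elementary steps and cancel species that appear on both sides,
    e.g. an enzyme intermediate formed in one step and consumed in the next."""
    lhs, rhs = Counter(), Counter()
    for reactants, products in steps:
        lhs.update(reactants)
        rhs.update(products)
    shared = lhs & rhs
    return lhs - shared, rhs - shared


EQ1 = (["pppN1", "5'-HO-RNA"], ["ppN1", "pRNA"])  # PNK
EQ2 = (["pppN1", "pN2"], ["ppN1", "ppN2"])        # NMPK
EQ3 = (["pppN1", "NDPK"], ["ppN1", "NDPKp"])      # NDPK, step 1
EQ4 = (["NDPKp", "ppN2"], ["NDPK", "pppN2"])      # NDPK, step 2

if __name__ == "__main__":
    for eq in (EQ1, EQ2, EQ3, EQ4):
        assert is_balanced(*eq)
    lhs, rhs = net_reaction(EQ3, EQ4)
    print(" + ".join(sorted(lhs)), "<->", " + ".join(sorted(rhs)))
```

The net NDPK reaction, pppN1 + ppN2 ↔ ppN1 + pppN2, is the substrate set with the labels N1 and N2 exchanged, which is the symmetry described in the text.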

While NDPKs do not show high specificity towards particular nucleoside bases, NMPKs do, with human tissues containing a thymidylate kinase (dTMPK), a uridylate-cytidylate kinase (UMP-CMPK), five isozymes of adenylate kinase (AK) named AK1–5, and several guanylate kinases (GUKs) (Van Rompay et al., 2000). This suggests that the two-step mechanism and the intrinsic symmetry of the NDPK type enzymes might be responsible for their lack of specificity.

Given the central role of kinases in modern metabolism, could kinase ribozymes have been used in an early RNA World to activate RNA through phosphorylation? The Tetrahymena group I intron provided an early indication that RNA could catalyze kinase-like reactions (Kruger et al., 1982): a shortened form of the intron acts as a ribozyme, catalyzing the transfer of a 3ʹ-phosphate from one oligonucleotide to the 3ʹ-terminal hydroxyl of another (Zaug and Cech, 1986). Later, using in vitro selection from highly diverse random-sequence pools followed by laboratory evolution, kinase ribozymes were isolated that catalyze self-thiophosphorylation and self-phosphorylation of 5ʹ-terminal or 2ʹ-internal hydroxyls (Fig. 1.15, A and B) (Lorsch and Szostak, 1994; Curtis and Bartel, 2005; Saran et al., 2005; Biondi et al., 2010, 2012). All these kinase ribozymes underwent a similar selection process, in which RNA libraries were incubated with either ATPγS or GTPγS, and self-thiophosphorylating ribozymes were isolated by taking advantage of the chemical specificity conferred by the sulfur modification. Although selected for thiophosphoryl transfer, these kinase ribozymes were also able to catalyze phosphoryl transfer reactions, albeit at reduced rates.

As a starting point, the first approach towards selecting kinase ribozymes from random sequence used a pre-existing (in vitro selected) RNA aptamer domain that binds ATP (Sassanfar and Szostak, 1993), flanked by regions of random sequence (Lorsch and Szostak, 1994). This approach yielded seven classes of kinase ribozymes, five acting as 5ʹ-kinases and two as internal 2ʹ-hydroxyl kinases. An engineered variant of one of these ribozymes, Kin.46, was also capable of catalyzing the transfer of γ-thiophosphate from ATPγS to the 5ʹ-hydroxyl of an exogenous RNA oligomer held in the enzyme’s active site through Watson-Crick base pairing (Lorsch and Szostak, 1995). The catalyzed reaction has a mechanism similar to that of the reaction catalyzed by human PNK (hPNK), with multiple-turnover chemistry and a kcat of up to 0.17 min⁻¹. ATP and GTPγS were also substrates for Kin.46; however, kcat was ~100-fold and ~650-fold lower, respectively, than with ATPγS, showing high specificity for the donor with which the ribozyme was selected (Lorsch and Szostak, 1995).
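The fold differences quoted for Kin.46 translate into absolute turnover numbers with simple arithmetic. The sketch below assumes that the 0.17 min⁻¹ value refers to ATPγS as the donor and treats the "~100-fold" and "~650-fold" reductions as exact, so the results are order-of-magnitude estimates rather than measured values.

```python
KCAT_REF = 0.17  # min^-1; kcat quoted for Kin.46 (assumed to refer to ATPγS)
FOLD_LOWER = {"ATP": 100, "GTPγS": 650}  # approximate fold reductions from the text


def implied_kcat(kcat_ref: float, fold_lower: float) -> float:
    """kcat implied for an alternative donor that is `fold_lower`-fold slower."""
    return kcat_ref / fold_lower


if __name__ == "__main__":
    for donor, fold in FOLD_LOWER.items():
        k = implied_kcat(KCAT_REF, fold)
        print(f"{donor}: kcat ~ {k:.1e} min^-1 "
              f"(about {1 / k:.0f} min per turnover)")
```

Under these assumptions, ATP supports roughly one turnover every 10 hours (about 1.7 × 10⁻³ min⁻¹) and GTPγS roughly one every 64 hours (about 2.6 × 10⁻⁴ min⁻¹), compared with about one every 6 minutes for ATPγS.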

Similarly, using an aminoacylase parent ribozyme as a starting point (Illangasekare et al., 1995), another 23 distinct kinase ribozymes were isolated with 5ʹ-terminal or 2ʹ-internal phosphorylation activity. Using GTPγS as the donor molecule, these ribozymes had catalytic rates ranging from 8 × 10⁻⁶ to 6 × 10⁻⁴ min⁻¹. The evolution of kinase ribozymes continued with the selection of the ribozyme 2PTmin3.2. Although its catalytic rate is 5–10-fold lower than those of the ribozymes selected by Lorsch and Szostak (1994), this 2ʹ-internal hydroxyl kinase is the first demonstration of a trans-acting ribozyme able to phosphorylate the 2ʹ-hydroxyl of an unpaired exogenous substrate (Saran et al., 2005). A special feature of this ribozyme is its ability to execute multiple cycles of de-thiophosphorylation and re-thiophosphorylation driven by ATPγS, allowing it to act as a primitive energy-transducing molecular motor (Saran et al., 2006).
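Rate constants spanning nearly two orders of magnitude are easier to compare as half-lives. The sketch below assumes that the self-phosphorylation reactions behave as first-order processes, so that t½ = ln 2 / k; it simply converts the range quoted above.

```python
import math


def half_life(k: float) -> float:
    """Half-life of a first-order reaction with rate constant k (same time units)."""
    return math.log(2) / k


if __name__ == "__main__":
    for k in (8e-6, 6e-4):  # min^-1, the range quoted in the text
        t = half_life(k)
        print(f"k = {k:.0e} min^-1: t1/2 = {t:,.0f} min = "
              f"{t / 60:,.1f} h = {t / 1440:,.1f} days")
```

On this first-order assumption, the slowest of these ribozymes would need about 60 days to react to half completion and the fastest about 19 hours.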

Divalent cations play a critical role in many essential biological ribozymes (Steitz and Steitz, 1993; Andreini et al., 2008; Butcher, 2011), helping to stabilize negative charges during nucleophilic attack and increasing the electrophilicity of phosphates. The kinase ribozymes discussed so far are all pH independent between pH 5 and 9 and are divalent metal ion dependent, with magnesium being the main metal ion cofactor. One of the most recently selected kinase ribozymes, and the most divergent in behavior from its predecessors, is K28(1-77)C (Biondi et al., 2010, 2012, 2013). This ribozyme is able to thiophosphorylate itself at two widely separated residues. However, unlike the previously described Mg²⁺-dependent ribozymes, K28(1-77)C requires, in addition to Mg²⁺, a tightly bound Cu²⁺–GTPγS complex in its active site and a relatively high pH of 8.5 for optimal catalytic activity (Biondi et al., 2012, 2013). The diverse metal ion behavior observed amongst the selected kinase ribozymes therefore highlights the capacity of RNA to evolve complex chemical strategies in an RNA World.


Figure 1.15. Kinase ribozymes. Kinase ribozymes have been characterized that catalyze phosphoryl transfer from either an adenosine or guanosine triphosphate to the (A) terminal 5′-hydroxyl or (B) internal 2′-hydroxyl of an RNA molecule. (C) Triphosphorylation ribozymes catalyze the triphosphorylation of RNA 5′-hydroxyl groups with trimetaphosphate (TMP).

While phosphorylation of a nucleotide hydroxyl is essential in modern metabolism for the subsequent activation of a nucleotide or polymer to a diphosphate or triphosphate, alternative ribozyme-based activation mechanisms have been explored using cyclic trimetaphosphate (TMP) as an energy source (Fig. 1.15C). Nucleoside triphosphates can be generated directly from nucleosides and TMP at pH 12 (Etaix and Orgel, 1978), as TMP is one of the most reactive polyphosphorylating reagents among the polyphosphates (Feldmann, 1967) and can be generated through the self-reaction of the termini of linear polyphosphates (Thilo et al., 1953). Although the rapid hydrolysis of TMP at high pH (Kura, 1987) is problematic, a ribozyme that facilitates the triphosphate activation of RNA at neutral pH might have been useful in an RNA World. Using in vitro selection, over a dozen ribozyme families were isolated that are able to triphosphorylate their 5ʹ-hydroxyl termini using TMP (Moretti and Müller, 2014). Kinetic analysis of eight individual ribozymes showed catalytic rate constants between 0.01 and 0.03 min⁻¹ under selective conditions (50 mM TMP, 100 mM MgCl2, pH 8.3). One of the ribozymes, TPR1, with a minimal functional motif of 96 nucleotides, was capable of triphosphorylating RNAs both in cis and in trans. This ribozyme has a kcat of 0.16 min⁻¹ and a KM for TMP of 30 mM under optimal buffer conditions (100 mM TMP, 400 mM MgCl2, pH 8.1), making the catalyzed reaction ~10⁷-fold faster than the uncatalyzed reaction in the absence of magnesium.
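The TPR1 constants can be plugged into the Michaelis-Menten equation to see how close the ribozyme runs to its maximal rate at a given TMP concentration. The sketch below assumes simple Michaelis-Menten behavior with the kcat and KM reported under optimal buffer conditions (400 mM MgCl2), so values at other conditions are extrapolations; the rough uncatalyzed rate is back-calculated from the ~10⁷-fold enhancement, not measured.

```python
import math

KCAT = 0.16    # min^-1, TPR1 (value from the text)
KM_TMP = 30.0  # mM, TPR1 KM for TMP (value from the text)


def mm_turnover(kcat: float, km: float, s: float) -> float:
    """Michaelis-Menten rate per enzyme, v/[E]total = kcat*[S]/(KM + [S])."""
    return kcat * s / (km + s)


if __name__ == "__main__":
    for tmp in (10.0, 30.0, 100.0):  # mM TMP
        v = mm_turnover(KCAT, KM_TMP, tmp)
        print(f"[TMP] = {tmp:5.1f} mM: v/[E] = {v:.3f} min^-1")
    # Rough uncatalyzed rate implied by the ~10^7-fold enhancement in the text.
    k_uncat = KCAT / 1e7
    years = math.log(2) / k_uncat / (60 * 24 * 365)
    print(f"implied k_uncat ~ {k_uncat:.1e} min^-1 (half-life ~ {years:.0f} years)")
```

At 100 mM TMP the ribozyme would operate at about 77% of kcat (100/130), and the 10⁷-fold enhancement implies an uncatalyzed rate near 1.6 × 10⁻⁸ min⁻¹, corresponding to a half-life on the order of 80 years.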

This set of artificially selected kinase ribozymes demonstrates that in an early RNA World, RNA catalysis could have been versatile enough to catalyze the phosphorylation and direct activation of a variety of RNA oligomers. However, kinase ribozymes have so far been shown to phosphorylate only 5ʹ-terminal or internal 2ʹ-hydroxyls, and none has yet been reported to phosphorylate free nucleosides. Selecting such ribozymes would reproduce reactions that were likely important in an early RNA World, and the resulting ribozymes would serve as important tools for evolving an artificial RNA metabolism in the laboratory. The diversity of functional sequences found in kinase ribozyme selections to date indicates the potential of this area for future research. Likewise, it appears likely that selections could produce mono- and diphosphate kinase ribozymes with mechanisms similar to those found in modern metabolism. Whereas existing kinase ribozymes phosphorylate their substrates in a single step, it remains to be determined whether diphosphate kinase ribozymes will require a two-step mechanism (Eqs. (3) and (4)) to overcome the difficult catalytic challenge of charge stabilization in these reactions. While such selections remain to be performed, RNA appears likely to be up to this challenge, as a number of ribozymes have been selected that promote two-step chemistry closely resembling that needed for diphosphate kinase chemistry (see Eqs. (3) and (4), and sections 1.6.5: Capping Ribozymes; 1.6.6: Ligase Ribozymes).

1.6.4. Glycosidic bond forming ribozymes

The de novo and salvage pathways that lead to the synthesis of nucleotides use 5-phosphoribosyl-1-α-pyrophosphate (PRPP) as a substrate. Although the phosphorylation of purine and pyrimidine mononucleotides ultimately allows the production of RNA and DNA polymers and is of fundamental importance in core metabolism, the reactions that consume PRPP are not catalyzed by phosphoryl transfer enzymes; rather, they formally belong to the pentosyltransferase family of enzymes. Artificially selected glycosidic bond forming ribozymes are nevertheless discussed here, as they have important implications for how ribozyme-mediated phosphoryl transfer pathways might have evolved in an RNA World. Such phosphoryl transfer reactions would have been essential in an early RNA World but have yet to be demonstrated by laboratory RNA selections.

In order to efficiently produce the β-D-ribofuranose sugar found in RNA nucleotides, metabolism must first control three abundant cyclic forms of ribose: α-D-ribopyranose (59% abundance), β-D-ribopyranose (20% abundance) and β-D-ribofuranose (20% abundance), together with small but nevertheless highly reactive amounts of the linear aldehyde form of this sugar (Drew et al., 1998). Modern metabolism achieves this control through two key phosphoryl transfer reactions that focus ribose chemistry towards nucleotide synthesis. First, ribokinase converts ribose (R) to ribose 5-phosphate (pR or R5P; Eq. (5)).

pppA + R → pR + ppA (5)

Because its C5 hydroxyl group is phosphorylated, R5P cannot close into a pyranose ring, which precludes the formation of α-D-ribopyranose and β-D-ribopyranose, the forms that together account for nearly 80% of ribose in solution. Converting R5P to PRPP via ribose-phosphate diphosphokinase then produces the activated PRPP substrate (Eq. (6)) with the correct initial stereochemistry for the synthesis of β nucleotides.

pppA + pR → pRpp + pA (6)

Because PRPP is locked in the α-D-ribofuranose form, it can neither ring-open nor form the otherwise very reactive linear aldehyde form of ribose. Thus, it appears likely that, as in modern metabolism, RNA enzymes required these two phosphoryl transfer reactions early in evolution in order to produce nucleotides. Based on the 5ʹ kinase ribozymes discussed above, there seems little doubt that a ribokinase ribozyme could have existed early in evolution. Although a PRPP-forming ribozyme would be more challenging, the recent discovery of riboswitches that recognize PRPP as a ligand (Sherlock et al., 2018) suggests that ribose-phosphate diphosphokinase ribozymes could have existed as well. Interestingly, in vitro selection experiments for a purine nucleotide synthesis ribozyme also provide tangential support for the early existence of a PRPP synthase type ribozyme.

Nucleotide synthesis in an RNA World context has been explored in the laboratory by the in vitro selection of two PRPP-dependent nucleotide synthesis ribozymes (Unrau and Bartel, 1998; Lau et al., 2004). In core metabolism, PRPP plays a central role in pyrimidine synthesis, either via de novo synthesis through orotidine 5ʹ-monophosphate or via salvage pathways. Likewise, de novo and salvage purine nucleotide synthesis also use PRPP. Both ultimately result in the synthesis of β-nucleotides. As judged by a range of kinetic isotope effect experiments, such chemistry ranges from highly concerted to nearly fully dissociative (Berti, 1999). Selection for a 4-thiouridine nucleotide synthase ribozyme (Fig. 1.16A), using PRPP tethered to the 3ʹ terminus of a high diversity RNA pool, yielded pyrimidine nucleotide synthase ribozymes (Unrau and Bartel, 1998) that were later shown to use a highly dissociative mechanism (Unrau and Bartel, 2003), in which the pyrophosphate is nearly fully displaced before the N1 atom of 4-thiouracil attacks. This pyrimidine chemistry, which is thermodynamically much less favourable than purine synthesis, was complemented by the selection of much more efficient 6-thioguanosine nucleotide synthase ribozymes, again using PRPP tethered to the 3ʹ termini of a high diversity RNA pool (Fig. 1.16B) (Lau et al., 2004). Both ribozyme families showed very high substrate specificity, with a marked preference for their respective 4-thiouracil or 6-thioguanine substrates, strongly indicating that RNA has no intrinsic difficulty in recognizing small-molecule substrates.

As just argued, PRPP serves a dual purpose in modern metabolism. First, it activates a single furanose sugar isomer for nucleotide synthesis. Second, it suppresses the formation of the other ribose isomers, namely the pyranose and linear aldehyde forms. This presents an evolutionary conundrum, given the unstable nature of PRPP itself: such an activated sugar could not easily have accumulated to useful concentrations in a prebiotic world. Thus, while PRPP might have been available prebiotically in small amounts (Akouche et al., 2017), it seems unlikely that this molecule could have been a core feature of an RNA World until after the evolution of an RNA metabolism robust enough to reliably harness the metabolic energy stored in polyphosphates such as ATP. Conversely, selective pressure early in evolution would have been strongly driven by life's exponentially increasing demand for both nucleotides and a reliable metabolic energy currency. The exponentially rapid consumption of complex prebiotically synthesized nucleotides (Powner and Sutherland, 2011; Stairs et al., 2017) and other prebiotic materials that enabled the transition from abiotic chemistry to organized biological processes therefore strongly implies that the metabolic synthesis of activated compounds such as PRPP would have become essential very early in the evolution of an RNA World.

But how could the multistep metabolic pathways linking ribose to nucleotide synthesis have most naturally emerged early in evolution, when selection pressure would have demanded the almost immediate emergence of such apparently complex catalysts (Hayden et al., 2018)? As the earliest waves of biology strip-mined prebiotic sources of complex metabolites (such as nucleotides), might RNA have intrinsically possessed the promiscuity required to rapidly build a complex core metabolism capable of using progressively simpler prebiotic compounds? Specifically, could the critical PRPP-forming phosphoryl transfer reactions (Eq. (6)) and the PRPP-dependent nucleotide synthesis reactions (Fig. 1.16) so essential for modern metabolism have been derived from promiscuous ribozyme-mediated chemistries?


Figure 1.16. Glycosidic bond-forming ribozymes. Glycosidic bond-forming ribozymes catalyze the synthesis of (A) 4-thiouridine monophosphate (4SUMP) and (B) 6-thioguanosine monophosphate (6SGMP) from RNA-tethered 5-phosphoribosyl 1(α)-pyrophosphate (PRPP) and 4-thiouracil (4SUra) or 6-thioguanine (6SGua), respectively. (C) A promiscuous ribozyme reacts a nucleobase with R5P to produce a carbon linker similar to that of flavin mononucleotide. (D) FAD synthetase (FADS) attaches an adenine nucleotide to flavin mononucleotide, which contains a flexible five-carbon ribitol linker.

A third in vitro selection for nucleotide synthesis suggests that such ideas may have validity. A selection for nucleotide synthesis using tethered R5P rather than PRPP found that the linear aldehyde form of R5P could readily react with 6-thioguanine (Lau and Unrau, 2009). Although the linear ribose chain product of this reaction was not completely characterized, the C-N linkage between sugar and nucleobase (Fig. 1.16C) bears a marked resemblance to the linear linkage between the flavin ring and the ribitol chain in flavin mononucleotide, which is used to synthesize FAD in modern metabolism (Fig. 1.16D). Thus, reactive ribose aldehyde chemistry can easily be imagined to have led to the covalent tethering of redox nucleobase cofactors to RNA catalysts early in an RNA World (see section 1.6.5: Capping Ribozymes for additional strategies), when redox catalytic diversity was presumably quite limited in the absence of redox-active cofactors.

Quite unexpectedly, simply replacing R5P with PRPP converted this R5P-dependent 6-thioguanine tethering ribozyme into an effective nucleotide synthesis ribozyme (Lau and Unrau, 2009). As this promiscuous ribozyme supports both the chemistry of the linear aldehyde form of ribose found in R5P and that of the cyclic furanose form of ribose found in PRPP, the following evolutionary scenario might have been possible in an early RNA World. Early metabolism would have required redox chemistry, just as modern metabolism does today. As demonstrated by the artificially selected 6-thioguanine tethering ribozyme, promiscuous ribozymes able to tether redox cofactors to RNA using R5P could easily have evolved in an early RNA metabolism. Such ribozymes might have played a key role in the early emergence of the critical flavin-dependent biochemistry that is essential today for a broad range of carbon-carbon, carbon-nitrogen and carbon-oxygen chemistry. Under these assumptions, the early appearance of a single new ribozyme activity able to convert R5P to PRPP (Eq. (6)) would have rapidly led to the evolution of ribozyme-mediated nucleotide synthesis. This process would have been driven by the depletion of prebiotic supplies of nucleotides by an expanding RNA metabolism. This potential link between the tethering of redox factors and the emergence of modern PRPP-dependent nucleotide synthesis early in metabolism is largely untested, and much experimental work remains to fully explore this conjecture. Nevertheless, RNA has proven to be surprisingly versatile at mediating the types of chemistry that are directly relevant to the emergence of centrally important metabolic reactions found in modern life, and ribozyme research has given tantalizing clues as to how the earliest steps in the emergence of an RNA-based metabolism could have occurred.

1.6.5. Capping ribozymes

The enzymatic capping of mRNAs is a multistep process in modern eukaryotic metabolism that plays a key role in allowing the cell to test mRNA quality prior to translation (Jackson et al., 2010; Hinnebusch, 2014). Capping in prokaryotic organisms occurs relatively rarely and is mechanistically distinct from eukaryotic capping, involving the incorporation of metabolic dinucleotide cofactors into the 5ʹ terminus of certain mRNAs in a fashion that is speculated to alter mRNA lifetimes as nutrient conditions change (Chen et al., 2009; Luciano and Belasco, 2015; Jäschke et al., 2016; Höfer and Jäschke, 2018). Throughout the kingdoms of life, it is striking that metabolism requires many dinucleotide 5ʹ-5ʹ cap analogs, such as NAD and FAD, which are essential protein enzyme redox cofactors. This suggests that capping in an RNA World would have been an important way to diversify the functionality of RNA catalysts early in evolution.

Figure 1.17. Ribozyme mediated self-capping. RNA capping ribozymes catalyze the transfer of RNA oligomers or nucleotides (N = A, U, G, C) with varying numbers of phosphates (n = 1, 2, 3, 4) to the 5ʹ α-phosphate of an RNA sequence to form an RNA cap (N(5ʹ)pnpRNA) through 5ʹ-5ʹ phosphate-phosphate linkages. This pathway involves an intermediate in which the 5ʹ α-phosphate of an uncapped (A) or capped (B) RNA molecule forms a covalent bond with a downstream (3ʹ) nitrogen or oxygen (X). (C) The intermediate is then attacked by RNA oligomers or nucleotides to form an RNA cap or to exchange an existing cap.

Two independent in vitro selections have found RNA capping ribozymes (Huang and Yarus, 1997a; Zaher et al., 2006). Neither selection was initially focused on the discovery of capping activity, but careful characterization nevertheless indicated that capping is a straightforward phosphoryl transfer reaction for RNA to mediate. This was surprising, as it might reasonably have been expected that stabilizing the charges found on each of the substrates required for capping would be difficult to achieve. In the first selection, the Yarus laboratory was searching for ribozymes that form a mixed carboxylate-phosphate anhydride of the sort normally produced by aminoacyl-tRNA synthetases (Huang and Yarus, 1997a, 1997b). Instead, they discovered a remarkably general capping ribozyme, Iso6. In a similar fashion, the Unrau laboratory was trying to select for an RNA polymerase using a high-diversity random pool annealed to a poly(A) template and incubated with 4SUTP. Instead of the desired polymerization activity, they too found a versatile capping ribozyme, called 6.17. Characterization of this ribozyme led to the discovery that both ribozymes had very broad substrate requirements (Zaher et al., 2006). Initially activated with a 5ʹ triphosphate, both ribozymes could be capped with a range of nucleotides differing in either phosphate number (Eqs. (7) and (8), Fig. 1.17) or base type. As the two ribozymes had distinct selective histories, secondary structures and metal ion requirements, it was perhaps not surprising that their apparent kcat values differed by approximately 60-fold. Curiously, apart from this difference in rate, the two ribozymes shared many enzymatic similarities. First, both ribozymes became faster at low pH. This contrasts with ligase ribozymes, which become faster at high pH because the attacking hydroxyl in these ligases must be deprotonated prior to nucleophilic attack (see section 1.6.6: Ligase Ribozymes). Second, as the phosphate number of the capping substrate changed (NMP, NDP, NTP), the catalytic rates of the two ribozymes changed in a highly correlated fashion. Both ribozymes had nearly the same relative speeds using NTP and NDP substrates, but both became about 20-fold slower when NMP was used to generate a cap (n = 1, Eq. (8)). Third, both ribozymes activated the α-phosphate of the triphosphate originally found on the ribozyme (Fig. 1.17, Eq. (7)), which was followed in a second step by nucleotide capping (Eq. (8)). Fourth, both ribozymes could exchange cap structures, implying a high level of symmetry in the reaction mechanism (Eq. (9)). These commonalities suggested a similar mechanism for the two distinct ribozyme isolates, as indicated in Eqs. (7)-(9) and Figure 1.17.

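$$\mathrm{pppRNA} \rightarrow \mathrm{{<}pRNA} + \mathrm{PP_i} \tag{7}$$

$$\mathrm{p}_n\mathrm{N} + \mathrm{{<}pRNA} \rightarrow \mathrm{N(5')p}_n\mathrm{pRNA} \tag{8}$$

$$\mathrm{p}_m\mathrm{N'} + \mathrm{N(5')p}_n\mathrm{pRNA} \rightarrow \mathrm{N'(5')p}_m\mathrm{pRNA} + \mathrm{p}_n\mathrm{N} \tag{9}$$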

where n, m = 1, 2, 3, 4; N, Nʹ = A, U, G or C; and < indicates a lariat-type activated 5ʹ phosphate.
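To make the pnN and N(5ʹ)pnpRNA notation of Eqs. (7)-(9) concrete, the species can be written as short strings and each step checked for phosphate conservation. This is a minimal bookkeeping sketch under stated assumptions: the string encoding and the helper names (`activate`, `cap`, `exchange`, `balanced`) are hypothetical, and internal RNA backbone phosphates are deliberately left out because none of the three reactions changes them.

```python
# Hypothetical string encoding of the species in Eqs. (7)-(9): each lowercase
# "p" is one phosphate and "PPi" (pyrophosphate) carries two. Internal RNA
# backbone phosphates are unchanged by these reactions and are not written.

Reaction = tuple[list[str], list[str]]  # (reactants, products)


def p(k: int) -> str:
    """A run of k phosphates, e.g. p(3) == 'ppp'."""
    return "p" * k


def nucleotide(base: str, n: int) -> str:
    """pnN: a nucleotide carrying n 5' phosphates (n = 1..4)."""
    if base not in {"A", "U", "G", "C"} or not 1 <= n <= 4:
        raise ValueError("base must be A/U/G/C and n must be 1-4")
    return p(n) + base


def activate() -> Reaction:
    """Eq. (7): pppRNA -> <pRNA + PPi ('<' marks the lariat-activated phosphate)."""
    return ["pppRNA"], ["<pRNA", "PPi"]


def cap(base: str, n: int) -> Reaction:
    """Eq. (8): pnN + <pRNA -> N(5')pnpRNA."""
    return [nucleotide(base, n), "<pRNA"], [f"{base}(5')" + p(n) + "pRNA"]


def exchange(old: str, n: int, new: str, m: int) -> Reaction:
    """Eq. (9): pmN' + N(5')pnpRNA -> N'(5')pmpRNA + pnN."""
    capped = f"{old}(5')" + p(n) + "pRNA"
    recapped = f"{new}(5')" + p(m) + "pRNA"
    return [nucleotide(new, m), capped], [recapped, nucleotide(old, n)]


def phosphates(species: str) -> int:
    """Count the phosphates written in one species string."""
    return 2 if species == "PPi" else species.count("p")


def balanced(reaction: Reaction) -> bool:
    """True if the phosphate count is conserved between reactants and products."""
    reactants, products = reaction
    return sum(map(phosphates, reactants)) == sum(map(phosphates, products))
```

For example, the product of `cap("G", 2)` is `G(5')pppRNA`, the triphosphate-linked cap that protein capping enzymes also make, whereas capping with GTP (n = 3) gives the tetraphosphate-linked `G(5')ppppRNA`.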

By synthesizing Rp and Sp α-capped variants of the 6.17 capping ribozyme (Zaher and Unrau, 2006), it was demonstrated that cap exchange retained stereochemistry rather than inverting it. Although the covalent intermediate implied by this retaining mechanism was not found, this experiment indicated that both ribozymes performed capping using a two-step mechanism. If initially activated with a triphosphate, both ribozymes would, in their first step, attack the 5ʹ-terminal α-phosphate found on the ribozyme to form a lariat structure (Fig. 1.17A), displacing pyrophosphate and inverting the stereochemistry of the reaction center. If activated with a cap, these ribozymes would first displace the nucleotide found immediately proximal to the α-phosphate, again forming a lariat (Fig. 1.17B). While the nucleophile responsible for forming this lariat has yet to be identified, it is likely to be an amine from a base or a 2ʹ hydroxyl of a ribose sugar in the catalytic core of each ribozyme. In the second step, the incoming capping substrate attacks this intermediate, inverting the stereochemistry a second time so that the overall reaction proceeds with retention (Fig. 1.17C), which explains the symmetry observed with cap exchange in this ribozyme. Such a two-step mechanism would also explain the extensive commonalities between the two ribozymes and is fundamentally identical to that of protein-based eukaryotic RNA capping and of DNA and RNA ligation (Shuman and Lima, 2004). Both ribozymes appear to recognize only the phosphates immediately proximal to the site of lariat formation. This, together with the two-step mechanism, explains the otherwise perplexing finding that forming a cap with NMP is less efficient than with NDP or NTP for both ribozymes.
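The stereochemical argument above is a parity argument: each in-line displacement at phosphorus inverts the reacting center, so an odd number of displacements gives net inversion and an even number gives net retention. The sketch below illustrates only that bookkeeping. It assumes every step is an in-line displacement, and it uses Rp/Sp as stand-ins for the two spatial arrangements; real Rp/Sp labels can also change when substituent priorities change, which this sketch ignores. The function names are hypothetical.

```python
# Illustrative parity model: every in-line nucleophilic displacement at the
# reacting phosphorus inverts its configuration. Rp/Sp stand in for the two
# spatial arrangements; label changes caused by substituent priority changes
# are ignored.

INVERT = {"Rp": "Sp", "Sp": "Rp"}


def displace(config: str) -> str:
    """One in-line displacement inverts the phosphorus center."""
    return INVERT[config]


def run_mechanism(config: str, n_displacements: int) -> str:
    """Apply n successive in-line displacements to a starting configuration."""
    for _ in range(n_displacements):
        config = displace(config)
    return config


def predicted_outcome(n_displacements: int) -> str:
    """Net stereochemical outcome expected for a given number of displacements."""
    return "retention" if n_displacements % 2 == 0 else "inversion"


# A direct, single-step cap exchange versus the proposed two-step lariat pathway.
direct = run_mechanism("Rp", 1)    # 'Sp': inversion
two_step = run_mechanism("Rp", 2)  # 'Rp': retention, as observed for 6.17
```

Under this model, the observed retention rules out a single direct displacement and is consistent with any even number of steps, of which the two-step lariat pathway is the simplest.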

Protein-mediated capping instead activates the 5ʹ α-phosphate of the incoming GTP nucleotide, effectively reversing the order of this otherwise equivalent chemistry. First, the protein catalyst attacks the GTP α-phosphate with a lysine, displacing pyrophosphate to form a covalent intermediate that is directly analogous to the ribozyme lariat structure (compare Figs. 1.17A, 1.17B and 1.18A). In the second step, the capping enzyme allows an mRNA bearing a terminal 5ʹ diphosphate to attack this reactive intermediate, forming the final GpppRNA triphosphate cap (Fig. 1.18B; compare Fig. 1.17C with n = 2). From an evolutionary perspective, this protein strategy has several obvious regulatory advantages. Namely, premature hydrolysis of the GpE intermediate does not inactivate capping, as a fresh GTP can still react, in contrast to the ribozymes, whose own activated 5ʹ phosphate is lost if their lariat intermediate is hydrolyzed. Likewise, using ppRNA in the second step allows capping to be maximally regulated by biological processes. Otherwise, from a mechanistic perspective, ribozyme-mediated capping is identical to eukaryotic capping and to NDPK-dependent phosphoryl transfer. While the ribozymes studied to date have no mechanism to break the symmetry between their initial and final states (allowing them to continuously exchange caps until hydrolysis occurs (Zaher and Unrau, 2006)), further in vitro selection could be used to evolve capping and NDPK-like ribozymes. Capping with redox factors such as nicotinamide mononucleotide (NMN) could have allowed an early RNA World to rapidly evolve redox-dependent metabolic pathways using RNA as a catalyst. More interestingly, the future demonstration of NDPK-like ribozymes would have direct ramifications for the role of RNA in the early emergence of a nucleotide-based energy currency, as such chemistry would be fundamentally similar to that used by NDPK enzymes in modern metabolism (Eqs. (3) and (4)).


Figure 1.18. Protein mediated capping. Protein-mediated capping proceeds in the reverse order to the ribozyme capping mechanism. Here, the enzyme (E) attacks a reactive 5ʹ α-phosphate, but this time on the incoming cap nucleotide, forming a covalent N(5ʹ)pE nucleotide intermediate (A), which in turn is attacked by a ppRNA to form a phosphate-phosphate linkage (B).

1.6.6. Ligase ribozymes

RNA ligases are found in all domains of life and catalyze the joining of RNA molecules through phosphodiester bond formation. These reactions play critical roles in RNA repair, splicing and editing pathways (Amitsur et al., 1987; Abelson et al., 1998; Schnaufer et al., 2001). Protein ligases, such as the two branches of bacteriophage T4 ligases, RNA ligase 1 (Rnl1) and RNA ligase 2 (Rnl2) (Wang et al., 2003), ligate 5ʹ-phosphorylated RNA termini (5ʹ-pRNA) to a 3ʹ-hydroxyl RNA terminus via a three-step mechanism, the first two steps of which are chemically analogous to RNA capping (Figs. 1.19A, B). The enzymes first form an activated adenylate intermediate (A(5ʹ)p-E) through attack of an active-site lysine residue on the α-phosphate of pppA, displacing pyrophosphate (PPi) (Fig. 1.19A).

Analogously, some DNA ligases, notably those of bacteria, use the dinucleotide NAD+ in place of pppA as the adenylate donor, releasing NMN (Pascal, 2008). The first step confers specificity for purine nucleotides by burying the purine base in a hydrophobic pocket of conserved residues while exposing the α-phosphate of the nucleotide on the enzyme surface (Shuman and Lima, 2004). In the second step, the adenylate is transferred onto the 5ʹ α-phosphate of the 5ʹ-pRNA to form an A(5ʹ)ppRNA cap (Fig. 1.19B). In the third and final step, the enzyme coordinates nucleophilic attack by the 3ʹ hydroxyl of an opposing RNA strand on the A(5ʹ)ppRNA capped RNA, forming the final phosphodiester linkage and displacing pA (Fig. 1.19C). This mechanism has the ultimate advantage of activating an otherwise quiescent 5ʹ phosphate to enable phosphodiester bond formation. As just discussed in section 1.6.5: Capping Ribozymes, ribozymes are capable of two-step capping chemistry, but it remains to be seen whether the more complex three-step chemistry performed by protein RNA ligases can be achieved by a ribozyme in the laboratory.


Figure 1.19. Protein mediated RNA ligation. Protein-mediated RNA ligation proceeds via a three-step mechanism, the first two steps of which are directly analogous to capping. (A) The enzyme (E) attacks the reactive 5ʹ α-phosphate of an incoming ATP, forming a covalent A(5ʹ)pE nucleotide intermediate and releasing pyrophosphate. (B) In turn, this activated phosphate is attacked by a 5ʹ-pRNA1 to form an AppRNA1 cap structure. (C) The enzyme then coordinates attack by the 3ʹ hydroxyl of an upstream RNA2 strand on AppRNA1, forming the phosphodiester linkage and displacing pA.

In contrast to protein-catalyzed ligation, all ribozyme ligases found to date proceed via a single-step mechanism. Two such ribozymes are the naturally occurring hairpin and hammerhead ribozymes discussed previously, which catalyze reversible, sequence-specific cleavage of the RNA backbone: nucleophilic attack of a 2ʹ-hydroxyl on the adjacent 3ʹ-phosphodiester produces a 2ʹ,3ʹ-cyclic phosphate and a free 5ʹ-hydroxyl terminus (Buzayan et al., 1986; Hutchins et al., 1986; Prody et al., 1986). Favoring ligation over cleavage, hairpin ribozymes derived from the (-) strand of Tobacco ringspot virus satellite RNA show ligation yields of up to 57% (Fig. 1.20A) (Balke et al., 2014). Conversely, although hammerhead ribozymes favor cleavage, they are capable of efficient ligation, as observed for the hammerhead ribozyme of Schistosoma, which yields up to 23% ligated product when starting from fully cleaved substrate (Fig. 1.20A) (Canny et al., 2007). The interconversion between a cleaved cyclic phosphate and a ligated phosphodiester is nearly thermodynamically neutral. Although viruses exploit this to great effect, the susceptibility of the cyclic phosphate to hydrolysis and the intrinsic reversibility of the reaction likely make it a poor ligation mechanism where more complex genetic information must be robustly preserved.
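The claim that cleavage and ligation are nearly thermodynamically neutral can be made semi-quantitative with ΔG = -RT ln K. The sketch below treats the reported end-point yields as if they described a simple two-state equilibrium between cleaved and ligated forms at an assumed 25 °C. This is a simplification: bimolecular ligation yields also depend on substrate concentration, and reaction end points need not be true equilibria, so the numbers are order-of-magnitude indicators only. The function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1
T_25C = 298.15     # assumed temperature in K (25 °C)


def apparent_dG(fraction_ligated: float, temperature: float = T_25C) -> float:
    """Apparent ΔG (kcal/mol) for cleaved -> ligated, treating the end-point
    yield as a two-state equilibrium (an illustrative simplification)."""
    if not 0.0 < fraction_ligated < 1.0:
        raise ValueError("fraction_ligated must be strictly between 0 and 1")
    k_app = fraction_ligated / (1.0 - fraction_ligated)
    return -R_KCAL * temperature * math.log(k_app)


hairpin = apparent_dG(0.57)     # about -0.17 kcal/mol
hammerhead = apparent_dG(0.23)  # about +0.72 kcal/mol
```

Both apparent values fall within about 1 kcal/mol of zero, consistent with a nearly neutral interconversion that can be tipped toward either cleavage or ligation by modest structural or environmental changes.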

A major requirement of an RNA World is the ability of RNA molecules to sustain life through RNA replication. Of particular interest is an RNA replication system that can either polymerize sequential nucleotides onto a growing strand in a template-dependent manner or assemble RNA oligomers via RNA-catalyzed ligation reactions. The first in vitro selection for ligase ribozymes aimed to isolate catalysts whose chemistry is mechanistically indistinguishable from that of RNA polymerases (Bartel and Szostak, 1993). This selection used the 5ʹ triphosphate found on RNA transcripts as activation, yielding seven ribozyme families that fall into three structural classes, each showing a 10³- to 10⁴-fold increase in catalytic rate over the uncatalyzed reaction. Of these three classes, the class I ligase is unique in catalyzing the formation of a 3ʹ-5ʹ phosphodiester linkage (Fig. 1.20B), whereas the class II and III ligases catalyze 2ʹ-5ʹ linkage formation (Fig. 1.20C) (Ekland et al., 1995). Further in vitro evolution and engineering of the class I ligase yielded an improved ribozyme, 210t, which differs from its parent by only a three-base-pair rearrangement in its catalytic core. This new ribozyme achieved a catalytic rate comparable to that of naturally occurring ribozymes such as the Tetrahymena group I intron and RNase P RNA, with a kcat of 360 min⁻¹ at pH 9.0, compared with a kcat of 16 min⁻¹ at pH 8.0 for the parent ribozyme (Bergman et al., 2000). In addition to high catalytic rates, in vitro selected ligase ribozymes are quite versatile. Ikawa et al. demonstrated this by combining multiple functional RNA motifs into an artificial RNA ligase, cis-DSL-1S, a ribozyme that can carry out two successive 3ʹ-5ʹ nucleotidyl transfer reactions of the polymerization type (Ikawa et al., 2004). Furthermore, a distinct selection scheme using ATPγS yielded the ribozyme C06, which catalyzes 2ʹ-5ʹ phosphodiester bond formation with a variety of purine mono-, di- and trinucleotides (Kang and Suga, 2007).
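Because the two kcat values above were measured at different pH, and ligase rates rise with pH as the attacking hydroxyl must be deprotonated, they are not a like-for-like comparison. A minimal sketch of why, assuming (for illustration only) that the rate is proportional to the deprotonated fraction of the nucleophile given by the Henderson-Hasselbalch relation, with a hypothetical pKa placeholder rather than any measured value:

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch: fraction of the nucleophile present in the
    deprotonated (alkoxide) form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def rate_ratio(pH_from: float, pH_to: float, pKa: float = 13.0) -> float:
    """Predicted k(pH_to) / k(pH_from) if the rate is simply proportional to
    the deprotonated fraction. pKa = 13.0 is a hypothetical placeholder, not a
    measured value for any ribozyme."""
    return fraction_deprotonated(pH_to, pKa) / fraction_deprotonated(pH_from, pKa)


up_one_unit = rate_ratio(8.0, 9.0)    # close to 10-fold faster at pH 9.0
down_one_unit = rate_ratio(9.0, 8.0)  # close to 10-fold slower at pH 8.0
```

Whenever the pH is well below the pKa, this model predicts a log-linear pH-rate profile with roughly a tenfold change per pH unit, so under these assumptions a rate measured at pH 9.0 would be expected to drop about tenfold at pH 8.0. Real ribozymes can deviate from this simple model.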

Compared with modern protein ligases, the ligase ribozymes selected and discussed so far catalyze a single-step rather than a three-step ligation mechanism. Because the multistep process fundamentally allows ligation to proceed from an unactivated state (i.e., from nucleic acid strands initially lacking activation at either the 5ʹ or 3ʹ terminus), single-step ligation forgoes the advantages that multistep ligation offers for nucleic acid repair. The question thus arises whether, in the setting of an early RNA World, ribozymes could have evolved to carry out a similar multistep ligation process. As we have seen, phosphorylation is entirely possible (see section 1.6.3: Kinase Ribozymes), and all three steps found in protein-mediated ligation reactions can be performed by ribozymes; however, each step is currently performed by a distinct ribozyme. The capping ribozymes described earlier in this chapter (see section 1.6.5: Capping Ribozymes) can catalyze the first and second steps of the protein RNA ligase mechanism by transferring a nucleotide cap onto the 5ʹ α-phosphate of an RNA sequence. Furthermore, Hager and Szostak used a pre-existing ATP-binding RNA aptamer domain flanked by random sequences to isolate ligase ribozymes capable of 3ʹ-5ʹ template-directed ligation of AMP-activated RNA (A(5ʹ)ppRNA), with a catalytic rate enhancement of ~5 × 10⁵ over the uncatalyzed reaction (Fig. 1.20D) (Hager and Szostak, 1997). This set of reactions therefore strongly suggests that a ribozyme combining all three mechanistic steps of a modern protein ligase could be selected in the future. More broadly, there is no reason to believe that RNA could not have readily manipulated and repaired genetic information in an RNA World.


Figure 1.20. Ribozyme mediated RNA ligation. Ligase ribozymes catalyze the ligation of an RNA molecule through (A) a 5ʹ hydroxyl nucleophilic attack on a 2ʹ-3ʹ cyclic phosphate, as found in the reverse reaction of the hairpin and hammerhead ribozymes, (B) nucleophilic attack of a 3ʹ hydroxyl or (C) a 2ʹ hydroxyl on the α-phosphate of a 5ʹ- triphosphate on a downstream RNA molecule and (D) 3ʹ hydroxyl nucleophilic attack on an AppRNA promoted by ribozymes (see next page).


Figure 1.20. (Continued)

Ligase ribozymes could also have played a key role in early RNA replication by assembling smaller, less information-rich oligomers into complex functional RNA molecules. Here, the manipulation of genetic information in an RNA World is taken to its logical endpoint: complex RNA catalysts could have been constructed by ligating smaller RNA polymers into larger functional elements. Using the in vitro selected RNA ligase R3C (Rogers and Joyce, 2001) as a starting point, Paul and Joyce developed a self-replicating RNA enzyme (Paul and Joyce, 2002). The R3C ribozyme has a simple secondary structure and catalyzes formation of a 3ʹ-5ʹ phosphodiester linkage between two RNA molecules at a rate of 0.32 min⁻¹. This ribozyme was reconfigured to act as a catalyst for its own assembly, joining two RNA substrates to produce an identical copy of itself in a template-dependent manner. The two component molecules, A and B, are partially complementary to the ribozyme (R1), which catalyzes their ligation to form a second, identical ribozyme (R2) (Fig. 1.21A). This first iteration of the system showed inefficient self-replication, with a doubling time of ~17 h; however, directed evolution guided by kinetic studies (Ferretti and Joyce, 2013) has greatly improved the system, achieving doubling times of ~5 min with exponential growth rates of ~0.14 min⁻¹ (Robertson and Joyce, 2014). The R3C ligase was also converted to a cross-catalytic format, in which two different ribozymes (R and Rʹ) synthesize copies of each other from four component molecules (A, Aʹ, B and Bʹ) (Fig. 1.21B) (Kim and Joyce, 2004). Both self- and cross-replicating ribozyme systems are capable of indefinite exponential amplification, provided they have an ongoing supply of substrates.
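The doubling times and exponential growth rates quoted above are two views of the same quantity: for N(t) = N₀e^(kt), the doubling time is ln 2 / k. The short sketch below checks that the quoted ~0.14 min⁻¹ and ~5 min agree and converts the original ~17 h doubling time into a rate constant. The one-hour amplification figure additionally assumes that substrates never run out, which real reactions do not guarantee; the function names are hypothetical.

```python
import math


def doubling_time(growth_rate: float) -> float:
    """Doubling time for exponential growth N(t) = N0 * exp(k * t)."""
    return math.log(2) / growth_rate


def growth_rate(doubling: float) -> float:
    """Exponential rate constant k for a given doubling time."""
    return math.log(2) / doubling


t_d_improved = doubling_time(0.14)   # ~4.95 min, matching the ~5 min quoted
k_original = growth_rate(17 * 60)    # ~17 h doubling -> ~6.8e-4 min^-1
fold_per_hour = math.exp(0.14 * 60)  # ~4.4e3-fold in 1 h, if substrate never runs out
```

The comparison makes the improvement concrete: roughly a 200-fold increase in the exponential rate constant between the first and the evolved self-replicating systems.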
It has been proposed that, in an early RNA World, the first enzymatic replication system relied on trans-acting ligase ribozymes that propagated genetic information by assembling themselves and other catalysts or templates (Vaidya et al., 2012), and that these were only later replaced by RNA-dependent RNA polymerase ribozymes (Levy and Ellington, 2001).


Figure 1.21. Self-replication with ligase ribozymes. Self-replication schematic (A). The ribozyme, R1, ligates substrates A and B to form an identical ribozyme R2. Cross-catalytic replication schematic (B). The ribozyme R (blue) ligates substrates Aʹ and Bʹ (red) to form ribozyme Rʹ (red) (B(i)). In a reverse mechanism, the ribozyme Rʹ (red) ligates substrates A and B (blue) to form ribozyme R (blue) (B(ii)). Lines indicate partial hybridization. Adapted from (Kim and Joyce, 2004).

1.6.7. Polymerase ribozymes

Two major strategies have defined attempts to create self-replicating systems out of RNA. The first focuses on how complex networks of cooperative molecules can emerge spontaneously under laboratory conditions. One example is the in vitro evolved ligase ribozymes described above, which catalyze the assembly of oligonucleotide substrates on complementary templates (Lincoln and Joyce, 2009; Sczepanski and Joyce, 2014). When optimized, these systems can undergo indefinite exponential amplification as long as a supply of substrates is maintained. Other systems exploit the catalytic cores of the naturally occurring Tetrahymena, sunY and Azoarcus group I intron ribozymes, which can likewise assemble their own strands from RNA oligonucleotides (Doudna and Szostak, 1989; Doudna et al., 1991; Green and Szostak, 1992; Hayden and Lehman, 2006; Draper et al., 2008; Hayden et al., 2008, 2018). Although capable of self-sustained replication, these systems were designed to use oligonucleotides of at least 6-8 residues as building blocks, making it difficult to replicate from random-sequence oligonucleotide pools and to evolve the ribozymes in response to selective pressures (Doudna et al., 1993). Nevertheless, these reactions demonstrate how networks of catalysis could have emerged early in evolution and reinforce how the multistep catalytic processes sustained by modern metabolism could have been preceded by RNA.
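Why 6- to 8-nt building blocks are hard to draw from random-sequence pools follows from simple counting: there are 4^k distinct k-mers, so any one specific fragment makes up only 4^-k of an unbiased random pool. The sketch below assumes uniform, independent base composition, which real prebiotic or synthetic pools need not have; the helper names are hypothetical.

```python
def sequence_space(length: int, alphabet: int = 4) -> int:
    """Number of distinct oligonucleotides of the given length."""
    return alphabet ** length


def fraction_of_pool(length: int) -> float:
    """Fraction of an unbiased random pool made up by one specific sequence."""
    return 1.0 / sequence_space(length)


def expected_copies(pool_size: float, length: int) -> float:
    """Expected copies of one specific k-mer among pool_size random k-mers
    (uniform, independent base composition assumed)."""
    return pool_size * fraction_of_pool(length)


six_mers = sequence_space(6)    # 4096 distinct 6-mers
eight_mers = sequence_space(8)  # 65536 distinct 8-mers
```

Every additional nucleotide in a required building block thus dilutes it a further fourfold in a random pool, which is why mononucleotide-based polymerization is the more general route to replication.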

The second strategy explicitly demonstrates how one catalytic activity, namely ligation, could have evolved into the more advanced catalytic function of polymerization. As modern metabolism uses polymerases for all fundamental forms of information storage and transfer, several laboratories have focused on developing RNA-dependent RNA polymerase ribozymes capable of replicating arbitrary template sequences. One system took advantage of the ligation chemistry of a modified sunY group I intron to extend a primer:template by three nucleotides at a time (Doudna et al., 1993). Despite providing the basis for a more general RNA replication system, this ribozyme suffered from low fidelity, limited extension and undesired side reactions that led to complicated mixtures of reaction products. Another system used the hc ligase ribozyme, which was selected from a random RNA pool embedded into the Tetrahymena group I ribozyme (Jaeger et al., 1999), to evolve a primitive RNA polymerase ribozyme, R18-2. Using mononucleotides, this ribozyme was capable of extending an RNA primer by up to three templated G residues (McGinness and Joyce, 2002, 2003). However, the most successful polymerase ribozymes to date have been developed from the class I ligase discovered by Bartel and Szostak, which uses the chemistry required for RNA polymerization to rapidly ligate an RNA primer to itself (Bartel and Szostak, 1993) (Fig. 1.22A). Using magnesium ions for catalysis, this ribozyme positions the 3ʹ hydroxyl of the RNA primer close to the α-phosphate of the 5ʹ triphosphate at the 5ʹ end of the class I ligase ribozyme. When truncated and sequence-optimized, this ribozyme became the fastest known RNA ligase ribozyme, with a ligation rate of 360 min⁻¹, ~10⁷-fold faster than the uncatalyzed reaction (Ekland et al., 1995; Ekland and Bartel, 1995; Bergman et al., 2000).

Figure 1.22. The artificial evolution of an RNA polymerase ribozyme. (A) The class I ligase (blue) selected by Bartel and Szostak (1993) is the fastest known ligase ribozyme, able to join a P1 primer (red) to its 5ʹ terminus via attack of the primer's 3ʹ hydroxyl on the α-phosphate of the terminal triphosphate, with consequent displacement of pyrophosphate. (B) The ligase ribozyme's core function was extended by removing some of its sequence elements and appending a random sequence pool to it. In vitro selection yielded an RNA polymerase ribozyme able to extend a primer:template substrate in a template-dependent fashion (Johnston et al., 2001). The accessory domain (green) and ligase domain (blue) are essential for nucleoside triphosphate (NTP) recognition and polymerization, respectively (Wang et al., 2011).

An RNA polymerase uses chemistry identical to that of the class I ligase, but rather than joining two RNA strands, it must allow the 3ʹ hydroxyl of the primer to attack the α-phosphate of incoming nucleoside triphosphates (NTPs) to continuously extend the primer by polymerization (Fig. 1.23A). To select for RNA polymerization, the class I ligase core was fused to a random 76-nt sequence library, and in vitro selection was used to isolate a new functional domain capable of positioning incoming NTPs with respect to the class I ligase's active site so as to extend a primer:template (Johnston et al., 2001) (Fig. 1.22B). Functional molecules were selected for their ability to extend a covalently linked RNA primer with 4-thioUTP and were then isolated on APM (N-acryloylaminophenylmercuric acetate) gels. The resulting RNA polymerase ribozyme, R18 (from round 18), consisted of two functional domains: one responsible for catalysis and the other for correctly positioning the incoming nucleotides (Wang et al., 2011; Akoopie and Müller, 2018). The 3.0 Å crystal structure of the class I ligase resembles a tripod (Shechner et al., 2009), and the second RNA domain drapes over the vertex of the ligase core tripod structure (Wang et al., 2011) (Fig. 1.22). Interestingly, many naturally occurring ribozymes are built from multiple domains, each with a distinct function. For example, the ribosome consists of two main RNA complexes: one forming the large subunit responsible for peptide bond synthesis and the other forming the small subunit, where mRNA is decoded via interactions with the anticodon region of tRNAs (Ramakrishnan, 2002). Likewise, in bacteria, RNase P is defined by two functional domains: one responsible for the chemistry of pre-tRNA cleavage and the other, a specificity domain, responsible for binding the pre-tRNA substrate (Fig. 1.12B) (Ellis and Brown, 2009). In this respect, naturally occurring RNAs are similar to protein enzymes, which also make use of multiple functional domains. For example, in all living cells, DNA-dependent RNA polymerases are multisubunit protein complexes with functional domains conserved across Bacteria, Archaea and Eukaryotes (Werner and Grohmann, 2011). These domains include the catalytic core, a DNA-binding cleft, a template-securing clamp, the template-interacting “jaws”, and a stalk involved in the recruitment and function of transcription factors.
Just as naturally occurring ribozymes combine multiple functional and structural domains, in vitro evolution has been used to select new functional domains that endow artificial ribozymes with increasingly sophisticated properties. The strong parallels between artificial and natural evolution in the development of multidomain ribozymes are currently providing much insight into the complexity of RNA evolution in an RNA World, and the class I ligase-derived RNA-dependent RNA polymerase ribozymes evolved to date are a prime example.

Further laboratory evolution using compartmentalization methods (Zaher and Unrau, 2007) and anchoring of the polymerase to the RNA template (Wochner et al., 2011; Attwater et al., 2013) resulted in a series of R18 polymerase ribozyme variants capable of synthesizing hundreds of nucleotides on highly repetitive templates.

Additionally, selections aimed at synthesizing full-length aptamers (Horning and Joyce, 2016) or functional hammerhead ribozymes (Tjhung et al., 2020) yielded polymerase ribozymes able to synthesize more complex structured RNAs, such as an entire tRNA, the hammerhead ribozyme and even the class I ligase core from three RNA fragments. Interestingly, when provided with dNTPs, one ribozyme also displayed reverse transcriptase activity (Samanta and Joyce, 2017), strongly suggesting that DNA genomes could have emerged much earlier in evolution than previously anticipated.

While laboratories have focused on RNA polymerization in the 5ʹ to 3ʹ direction (Fig. 1.23A), 5ʹ to 3ʹ and 3ʹ to 5ʹ extension are fundamentally chemically equivalent (Fig. 1.23B). Although 3ʹ-activated NTPs are untenable because of the neighboring reactive 2ʹ hydroxyls, extension with 5ʹ-activated NTPs can in principle proceed in either direction. Interestingly, the Joyce laboratory has observed that, when given the opportunity, polymerase ribozymes are indeed capable of extension in either direction (McGinness et al., 2002). Modern metabolism presumably favors 5ʹ to 3ʹ extension because attack of the 3ʹ hydroxyl of a growing RNA strand on the α-phosphate of an incoming nucleotide (Eq. (10)) is intrinsically resistant to inadvertent NTP hydrolysis.

pppNn + … Nn−2pNn−1 → … Nn−2pNn−1pNn + PPi (10)

An active RNA metabolism capable of nucleotide synthesis and phosphorylation would naturally regenerate NTPs from hydrolyzed nucleosides or nucleotides, so NTP hydrolysis during 5ʹ to 3ʹ extension merely consumes substrate. In contrast, extension in the 3ʹ to 5ʹ direction (Eq. (11)), although mechanistically equivalent, would be highly sensitive to hydrolysis of the 5ʹ triphosphate on the growing strand, because such hydrolysis leaves the strand unable to accept further nucleotides; this mode of extension would therefore be strongly selected against.

pppNn + pppNn−1pNn−2 … → pppNnpNn−1pNn−2 + PPi (11)
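The asymmetry between Eqs. (10) and (11) can be illustrated with a deliberately simple model in which each reactive triphosphate is hydrolyzed, with probability h, before it can react. This is a sketch under stated assumptions, not a kinetic description of any real ribozyme: h is a hypothetical per-step probability, and for 5ʹ to 3ʹ extension the NTP pool is assumed to be unlimited and continuously regenerated by metabolism.

```python
def completion_probability(steps: int, h: float, direction: str) -> float:
    """Probability that a strand completes `steps` additions when any reactive
    triphosphate is hydrolysed, with probability h, before it can react.

    "5to3": the triphosphate belongs to the incoming NTP (Eq. (10)); a
            hydrolysed NTP is replaced by another from an assumed unlimited,
            continuously regenerated pool, so the strand is never dead-ended.
    "3to5": the triphosphate sits on the growing strand (Eq. (11)); once it is
            hydrolysed, no further addition is possible.
    """
    if not 0.0 <= h < 1.0:
        raise ValueError("h must be in [0, 1)")
    if direction == "5to3":
        return 1.0
    if direction == "3to5":
        return (1.0 - h) ** steps
    raise ValueError("direction must be '5to3' or '3to5'")


def expected_ntps_hydrolysed(steps: int, h: float) -> float:
    """5'->3' cost: expected NTPs lost per completed strand (geometric number
    of failures before each successful addition)."""
    return steps * h / (1.0 - h)


# Hypothetical example: 100 additions with a 1% hydrolysis chance per step.
full_length_3to5 = completion_probability(100, 0.01, "3to5")  # ~0.37
ntp_cost_5to3 = expected_ntps_hydrolysed(100, 0.01)           # ~1.01 NTPs
```

In this model, 5ʹ to 3ʹ synthesis pays for hydrolysis only in recyclable substrate (about one extra NTP per 100-nt strand at h = 0.01), whereas 3ʹ to 5ʹ synthesis loses the strand itself, and the surviving fraction decays exponentially with length.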


Figure 1.23. Template-dependent RNA and DNA polymerization is catalyzed by RNA-dependent RNA or DNA polymerase (RdRP or RdDP) ribozymes. (A) RdRP or RdDP ribozymes catalyze 5ʹ to 3ʹ extension by forming a phosphodiester linkage between the 3ʹ hydroxyl of a template-bound RNA primer and the 5ʹ α-phosphate of either a ribonucleoside or deoxyribonucleoside triphosphate (pppN or pppdN). (B) RdRP ribozymes catalyze 3ʹ to 5ʹ extension by forming a phosphodiester linkage between the 5ʹ α-phosphate of a template-bound RNA primer and the 3ʹ hydroxyl of a ribonucleoside triphosphate (pppN). N indicates the nucleosides adenosine (A), cytidine (C), guanosine (G) and uridine (U) in pppN, and A, C, G and thymidine (T) in pppdN (where dN represents a deoxyribonucleoside).

While these polymerase ribozymes are important demonstrations of polymerization activity, they lack many essential features of protein polymerases that are fully integrated into metabolism. Most importantly, these ribozymes have low processivity, which precludes strand displacement, a capability required for replicating RNA genomes into functional RNA genes (Szostak et al., 2001; Joyce, 2002; Cheng and Unrau, 2010; Joyce and Szostak, 2018). Lawrence and Bartel analyzed the processivity of the R18 RNA polymerase ribozyme (Johnston et al., 2001) in detail and attributed its low processivity to the weak affinity of the polymerase for the primer-template (Lawrence and Bartel, 2003). Surprisingly, once the ribozyme achieved accurate primer-template binding and alignment, it was quite processive, extending rapidly and adding multiple nucleotides before dissociating. Consistent with this, tethering the primer or template to the polymerase resulted in a ~300-fold increase in polymerization rate (Wang et al., 2011; Wochner et al., 2011; Attwater et al., 2013; Horning and Joyce, 2016; Tjhung et al., 2020), and concentration-enhancing micelles increased product yield by up to 20-fold (Müller and Bartel, 2008). In contrast, the structure of protein RNA polymerases (RNAPs) resembles a “crab claw” with two “pincers” that form a clamping domain around a DNA-binding cleft, facilitating high processivity (Griesenbeck et al., 2017). In bacteria, the clamp domain is found in an open state in free RNAP, with an opening large enough to accommodate double-stranded DNA (dsDNA). Upon loading and unwinding of the dsDNA template, the clamp domain undergoes a conformational change, closing over the DNA-binding channel and securing the template, so that the channel can now accommodate only single-stranded DNA (ssDNA) (Chakraborty et al., 2012).

Equally important, these polymerase ribozymes have no way to specifically initiate or terminate their copying function. Such activities are built into all protein-based DNA-dependent RNA polymerases in modern metabolism, as these functions allow cells to change their transcriptional programs to match changing environmental conditions (Griesenbeck et al., 2017). In bacteria, initiation involves promoter-specific sigma factors (σ), which transiently bind the RNAP core and recruit it to a specific promoter and transcription start site (Young et al., 2002). Following initiation, the enzyme undergoes NTP-dependent structural rearrangements to a processive elongation form, clearing the promoter and releasing the specificity factor (Murakami and Darst, 2003). Upon reaching the end of a transcription unit, termination is mediated either by the termination factor Rho (ρ) or intrinsically by a hairpin formed by a palindromic sequence in the RNA (Washburn and Gottesman, 2015).

Developing improved processivity and mechanisms of initiation, elongation, and termination similar to those found in protein enzymes is therefore fundamental to demonstrating the potential of RNA ribozymes to mediate the replication and evolution of an early RNA World metabolism. Over the course of my PhD, I used in vitro evolution to add a third, clamping domain to the previously developed two-domain polymerases. This new domain allows the polymerase to search for an RNA promoter, which, when found, causes the polymerase to structurally rearrange into a processive elongation complex capable of extending a broad range of random sequence templates, functioning much like modern protein RNA polymerases in gene expression (Cojocaru and Unrau, 2021) (Chapter 2). This stepwise acquisition of function through the addition of new domains may well mirror the earliest evolution of ribozymes at the dawn of life, bringing us one step closer to true Darwinian evolution within a test tube.

Chapter 2. Processive RNA Polymerization and Promoter Recognition in an RNA World

This chapter is an extended version of the following manuscript:

Cojocaru, R., Unrau, P.J. (2021). Processive RNA polymerization and promoter recognition in an RNA World. Science 371, 1225–1232.

2.1. Introduction

The RNA World hypothesis posits that at the dawn of evolution, RNA played a key role in the establishment of life (Gilbert, 1986). Central to this hypothesis is the existence of an RNA replicase ribozyme capable of copying its own genome using a supply of prebiotically synthesized nucleotide monomers and RNA polymers (Szostak et al., 2001; Joyce, 2002). Ever since the class I ligase ribozyme was isolated from a high-diversity RNA pool (Bartel and Szostak, 1993), there has been a sustained effort to produce highly processive polymerase ribozymes (Johnston et al., 2001; Zaher and Unrau, 2007; Müller and Bartel, 2008; Wang et al., 2011; Wochner et al., 2011; Attwater et al., 2013; Sczepanski and Joyce, 2014; Horning and Joyce, 2016). As the affinity of these polymerases for their RNA templates is weak, with KM values in the millimolar range (Lawrence and Bartel, 2003), the most successful strategies to date have colocalized polymerase ribozymes with their substrates, either using concentration-enhancing micelles (Müller and Bartel, 2008) or by anchoring the RNA template (Wochner et al., 2011; Attwater et al., 2013) or the RNA primer to be extended (Wang et al., 2011; Horning and Joyce, 2016) to the polymerase ribozyme. These strategies create a high local concentration of primer-template with respect to the polymerase, but because they depend on tethering to enhance polymerization, they do not yield a truly processive polymerase.
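
To see why tethering helps a polymerase whose KM for primer-template lies in the millimolar range, one can estimate the effective molarity of a single tethered molecule confined to a sphere around the active site. This is a textbook back-of-the-envelope calculation rather than a measurement, and the confinement radius is a hypothetical parameter.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def effective_molarity(radius_nm: float) -> float:
    """Molar concentration of one molecule confined to a sphere.

    radius_nm is a hypothetical confinement radius around the active site;
    it is not a measured dimension of any ribozyme.
    """
    radius_dm = radius_nm * 1e-8              # 1 nm = 1e-8 dm, and 1 dm^3 = 1 L
    volume_l = (4.0 / 3.0) * math.pi * radius_dm ** 3
    return 1.0 / (AVOGADRO * volume_l)        # mol / L


if __name__ == "__main__":
    for r in (2.0, 5.0, 10.0):
        print(f"r = {r:4.1f} nm -> {effective_molarity(r) * 1e3:6.2f} mM")
```

For a 5 nm radius this gives roughly 3 mM, on the order of the millimolar KM values reported for these polymerases, which suggests why tethering can compensate for weak binding without conferring true processivity.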

Here, we report a natural linkage between the emergence of processivity and promoter selectivity in an RNA polymerase ribozyme. We hypothesized that an RNA polymerase ribozyme could be partially hybridized to a sigma factor-like specificity-primer. This 'open' clamp form (Fig. 2.1, A and D) would be able to search for a single-stranded RNA promoter. Strand invasion would then allow template sequences containing a promoter to strip the specificity-primer away from the primer-binding site (PBS) of the polymerase (Fig. 2.1B), triggering a structural rearrangement to a processive 'closed' clamp form (Fig. 2.1, C and E). Such a mechanism is analogous to that used by extant bacterial DNA-dependent RNA polymerases (DdRPs), which have evolved to recognize promoters via a two-step process involving sigma-factor-dependent promoter recognition and NTP-dependent structural rearrangement to a final processive elongation form (Fig. 2.1, A to C, bottom panels) (Maeda et al., 2000; Young et al., 2002; Gruber and Gross, 2003; Murakami and Darst, 2003). All extant DdRPs, including the bacteriophage polymerases (Cheetham and Steitz, 1999; Yin and Steitz, 2002; Tahirov et al., 2002; Durniak et al., 2008), use a variation of this two-step process. Thus, we selected for a ribozyme with a similar mechanism to explore the potential connection between promoter recognition and processivity.

Figure 2.1. Transcriptional initiation by the clamping RNA-dependent RNA polymerase (CP RdRP) ribozyme and by DNA-dependent RNA polymerase (DdRP). (A) RNA specificity-primer-activated 'open' form P1:CPOPEN (top) and DdRP holoenzyme (bottom). (B) The specificity-primer localizes CP to a ssRNA promoter (top), while a sigma factor localizes the DdRP to a DNA promoter (bottom). (C) In both cases, a clamped 'closed' state forms, enhancing polymerization. (D) Secondary structure of the minimal P1:CPOPEN form and (E) the 'closed' form, CPCLOSED. Colors indicate the ligase core (blue), accessory domain (green) and minimal clamping domain (orange). Boxes mark up mutations designed into the selection (red), rediscovered up mutations (teal), and newly discovered mutations (yellow, panel D).

2.2. Selection of a promoter-specific RNA polymerase ribozyme

To investigate this hypothesis, we started with the two-domain RNA polymerase ribozyme B6.61 (Zaher and Unrau, 2007), which consists of a catalytic ligase core and an accessory domain that confers NTP extension ability via its AJ3/4 element (Wang et al., 2011) (Fig. 2.2). We engineered three changes into this parental ribozyme: we appended a primer-binding site (PBS) to its 5' end, inserted random sequence libraries at three distinct sites to generate a high-diversity pool of 10¹³ sequence variants, and removed sequence from the B6.61 accessory domain known to be redundant (Wang et al., 2011) (details in section 2.9.2 of Materials and Methods and Table 2.1).
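
The size of a 10¹³-member pool can be put in perspective with a Poisson estimate of how much of a random sequence space it samples. The sketch assumes uniform, unbiased synthesis; the random-region lengths used below are illustrative only, since the actual insert lengths are listed in Table 2.1.

```python
import math


def fraction_sampled(pool_size: float, random_nt: int) -> float:
    """Expected fraction of the 4**random_nt possible sequences that occur at
    least once among `pool_size` molecules (Poisson approximation, uniform
    base composition assumed)."""
    sequence_space = 4.0 ** random_nt
    return 1.0 - math.exp(-pool_size / sequence_space)


if __name__ == "__main__":
    pool = 1e13
    for n in (20, 22, 25, 30):  # hypothetical total random-region lengths
        print(f"{n} nt: {fraction_sampled(pool, n):.2e} of sequence space sampled")
```

Beyond roughly 22 random positions such a pool can represent only a minority of sequence space, so the selection samples rather than exhausts the possible clamping domains.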

Three selection schemes (a negative selection, a clamping selection, and a processivity selection; Fig. 2.3) were alternated for 30 rounds to select for functional ribozymes (Fig. 2.4, Table 2.2). The negative selection removed pool molecules that could hybridize to a linear, randomly generated selection template (T1) immobilized on streptavidin magnetic beads (Fig. 2.3A). In the clamping selection (Fig. 2.3B), P1:PoolOPEN molecules were first formed by incubating pool molecules with the P1 specificity-primer (Table 2.3), which is fully complementary to the 22-nucleotide (nt) promoter found within the T1 template (Table 2.4). To retain pool molecules that could make the transition to the PoolCLOSED state, P1:PoolOPEN molecules were added to circularized T1 (cT1) immobilized on streptavidin beads. Correctly clamped 'closed' pool molecules retained on cT1 were then recovered by adding fresh specificity-primer, which re-formed the 'open' state and released them from the circular template. This cycle of transitioning from 'open' to 'closed' and back to 'open' was performed either once or twice during clamping rounds of selection to increase selective pressure.

Figure 2.2. The Clamping Polymerase (CP) progenitor, B6.61. Secondary structure of the B6.61 ribozyme, adapted from (Zaher and Unrau, 2007; Wang et al., 2011). Black arrowheads indicate the locations of random sequence library insertions used in constructing the initial selection pool library. The catalytic ligase core is shown in blue, the accessory domain in green.

The processivity selection scheme incorporated polymerization activity into the clamping selection (Fig. 2.3C). Here, activated P1:PoolOPEN complexes were added to free circular template and incubated with ATP, GTP and UTP (4 mM each) together with biotin-11-cytidine-5'-triphosphate (1 mM, CTPB). Templates encoding CTP incorporation at the first, third or tenth extension position were used to select for polymerization via the incorporation of CTPB (Table 2.2). Following this incubation, the pool-primer-template mixture was added to streptavidin beads and washed. As before, captured pool ribozymes were recovered by adding specificity-primer, re-forming the 'open' state and allowing the recovery of clamping polymerase ribozymes with significant polymerization activity. After every round of selection, recovered pool RNA was reverse transcribed, PCR-amplified and transcribed for the next round of in vitro selection (see sections 2.9.5 and 2.9.6 of Materials and Methods).
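
The logic of alternating capture-and-release selections can be illustrated with a standard enrichment model, in which each round multiplies the odds of functional versus non-functional molecules by the ratio of their survival probabilities. The survival probabilities below are hypothetical, and the model ignores mutation, PCR bias and selection artifacts; amplification is assumed to restore pool size without changing composition.

```python
def enrich(fraction_active: float, keep_active: float, keep_background: float,
           rounds: int) -> list[float]:
    """Fraction of functional molecules after each round of selection.

    keep_active / keep_background: hypothetical per-round probabilities that a
    functional or non-functional molecule survives capture, washing and release.
    Amplification between rounds is assumed to preserve composition.
    """
    history = [fraction_active]
    x = fraction_active
    for _ in range(rounds):
        kept_active = x * keep_active
        kept_background = (1.0 - x) * keep_background
        x = kept_active / (kept_active + kept_background)
        history.append(x)
    return history


if __name__ == "__main__":
    # Start at 1 functional molecule per million; 100-fold enrichment per round.
    for r, x in enumerate(enrich(1e-6, 0.5, 0.005, 5)):
        print(r, f"{x:.3g}")
```

Because the per-round enrichment factor is keep_active / keep_background, steps that lower background survival, such as the negative selection against T1 binders, can matter as much as steps that raise retention of functional molecules.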

Figure 2.3. In vitro selection schemes. (A) Negative selection scheme. 'Closed' RNA pool molecules were incubated with 3' biotinylated T1 (T1-B) immobilized on streptavidin magnetic beads (S) to remove molecules capable of hybridizing to the T1 template. (B) Binding selection scheme. Primer-bound 'open' pool molecules were incubated with cT1 immobilized on streptavidin magnetic beads (S) via a biotinylated DNA oligo (Table 2.3, 50.15). Functional ribozymes able to localize to the template and undergo a conformational change from an 'open' to a 'closed' form remain trapped on cT1, while non-functional molecules are washed away. Functional molecules are recovered by the addition of fresh P1 primer, which triggers a conformational change back to the 'open' form. Pool molecules were put through this selection scheme once or twice in a row for increased stringency, as indicated in Table 2.2 and Fig. 2.4. (C) Processivity selection scheme. 'Open' pool molecules were incubated with cT1, 4 mM each of ATP, GTP and UTP, and 1 mM biotinylated CTP (B-CTP). Functional molecules able to clamp onto the template and promote polymerization were captured on streptavidin magnetic beads (S) and washed. As before, functional molecules were released from the template by the addition of excess fresh P1 primer.

Figure 2.4. Selection schemes and pool modifications implemented by round of selection. At Round 16, three sub-pools were mixed in equal parts with the selection pool. At Round 23, a new RT primer was used to add an additional A residue to the 3' terminus of the pool to suppress an observed artifact.

After 16 rounds of selection, the pool exhibited the anticipated clamping activity but only minor polymerization ability relative to the B6.61 progenitor. The selection pool was therefore mixed with three new sub-pools in which the existing clamping domain diversity was preserved but the ligase and accessory core of the polymerase were modified to contain: (i) nine high-frequency mutations found throughout the cloned pool, (ii) 11 point mutations and a deletion found previously by other groups (Wochner et al., 2011; Horning and Joyce, 2016), or (iii) the combined mutations of sub-pools (i) and (ii). To further increase diversity, the combined pools were subjected to mutagenic PCR. After 23 rounds of selection, the pool was found to add CTPB to the 3' terminus of its own molecules. This artifact was suppressed by extending the 3' terminus of the pool by a single A residue (Fig. 2.5, Table 2.1C).

Figure 2.5. 3' extension by the Round 23 pool. 3' extension of the R23 pool when incubated with α-32P-UTP in the absence of primer and template. This extension was reduced to 2% or 14% by appending a terminal A or U residue, respectively, to the pool molecules, or to 7% by mutating the terminal nucleotide of the pool from U to C.

After 25 rounds of selection, pool diversity decreased dramatically, with the final five rounds of selection dominated by five major ribozyme polymerase families (Fig. 2.6A). This loss of diversity was directly correlated with the emergence of significant polymerization on the cT1 template (Fig. 2.6B). One ribozyme from Family 1, the Clamping Polymerase (CP, Table 2.5), was characterized further. Relative to the progenitor B6.61 ribozyme, CP contained seven point mutations and deletions in the ligase core and 17 point mutations or insertions in the accessory domain. Of these 24 mutations, 14 are activity-enhancing mutations that have been identified previously (Wochner et al., 2011; Horning and Joyce, 2016); 11 of these 14 were deliberately designed into the selection pool at Round 16, and three evolved independently (Fig. 2.1D).


Figure 2.6. Pool diversity and the emergence of polymerization. (A) Pool diversity from Rounds 23 to 30. Families were defined as sequences within a pairwise distance d ≤ 2. Arrow indicates 2% mutagenesis of the R28 DNA pool. (B) Extension activity of P1:PoolOPEN on cT1 by selection round. Reaction conditions: P1 specificity-primer (0.1 μM) was mixed with ribozyme pools (0.12 μM) in 100 mM MgCl2, 100 mM KCl, 100 mM Tris-HCl at pH 8.5 for 20 min at room temperature. Reactions were started by the addition of 4 mM of each NTP and cT1 template (0.14 μM). Reactions were stopped by adding an equal volume of 80% formamide, 200 mM EDTA, 0.025% xylene cyanol, 0.025% bromophenol blue and a 10-fold excess of an RNA oligonucleotide complementary to cT1, then heating at 95°C for 5 min prior to loading on 10% PAGE.
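
Grouping cloned sequences into families at a pairwise distance d ≤ 2 can be done by single-linkage clustering. The sketch below assumes the distance is a Levenshtein (edit) distance, which accommodates the insertions and deletions seen in the pool; the metric is not specified here, and a Hamming distance would be a drop-in alternative for equal-length reads. The function names are illustrative.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def families(seqs: list[str], max_d: int = 2) -> list[list[str]]:
    """Single-linkage families: sequences are grouped if connected by a chain
    of pairs that are each within max_d edits."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if levenshtein(seqs[i], seqs[j]) <= max_d:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return sorted(groups.values(), key=len, reverse=True)  # largest first
```

Single linkage can merge two distinct families through a chain of intermediates, so for heavily mutagenized pools a stricter rule (for example, requiring every member to lie within d of a family centroid) may be preferable.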

2.3. Clamping domain characterization

Removal of the newly selected 3' clamping domain abolished polymerization activity (Fig. 2.7, Table 2.6, Const. 5), while transplanting the clamping domain from the Family 1 CP onto a lower-activity Family 4 ribozyme (12% of CP activity) raised its activity to CP's level (Fig. 2.8), implicating this new domain in processive polymerization. Truncation analysis (Table 2.6, Const. 1-18) and secondary structure prediction of the 3' clamping domain revealed a minimal 45-nt domain comprising two stem-loop structures, C1-CL1 and C2-CL2, separated by a junction sequence, CJ1/2, shown in the predicted P1:CPOPEN form (Fig. 2.1D). In the CPCLOSED form, the C1 helix shortens by up to three base pairs, allowing the CJ1/2 junction to form a 7-bp non-continuous helix with the 5' PBS sequence as the specificity-primer transfers to the template sequence (Fig. 2.1E). The closed form of the minimal clamping domain is highly structured, which naturally precludes base-pairing interactions with template sequence.

Figure 2.7. Clamping domain minimization preserves activity while domain removal strongly inhibits polymerization. (A) Schematic representation of the full-length CP, the CP ribozyme with the 45-nt minimal 3' clamping domain as shown in Fig. 2.1D and 2.1E (CP min clamp, Table 2.6, Const. 12) and the CP ribozyme with the entire 3' clamping domain removed (CP no clamp, Table 2.6, Const. 5). (B) P1 primer extension on cT1 by the three ribozyme variants shown in (A).


Figure 2.8. Clamping domain transplantation onto a lower activity Family 4 ribozyme. Open form P1 primer extensions on cT1 by the Family 1 CP ribozyme (CPF1) and the lower activity Family 4 Polymerase ribozyme (PF4) before and after transplanting the 3' CPF1 clamping domain onto the Family 4 ligase and accessory domain core (PΔF4:F1, Table 2.5).

Removing the C1-CL1 stem-loop destroyed polymerization activity, while replacing the stem sequence, or replacing the loop sequence with a GCAA tetraloop, had minimal effect on activity (Table 2.6, Const. 19-25). Hybridization of a DNA oligonucleotide to this region also suppressed activity (Table 2.7, DNA 1). Thus, the C1-CL1 stem-loop plays an important mechanistic role in forming the active form of the clamped polymerase. The C2-CL2 stem-loop was less critical, as removing or mutating it had only an intermediate effect on activity.

The CJ1/2 region, which hybridizes to the PBS in the closed state (Fig. 2.1, D and E), had the largest effect on activity when blocked or mutated (Table 2.6, Const. 26-51; Table 2.7, DNA 2-7). Introducing a G5:C208U wobble mutation in the PBS:CJ1/2 helix increased extension on cT1 by 19% relative to WT (Const. 36). Weakening this stem further impaired activity, with G8:C205U and U6:A207G lowering extension to 43% and 50%, respectively, while combining the two mutations lowered activity to 11% (Const. 34-35, 38). Changing 7 nt of the CJ1/2 sequence, from ...A GGC AAC CAC G... to ...A CGG CCA AAA G... (predicted pairing with the PBS shown in Fig. 2.9A), was predicted to preserve net hybridization between the PBS and the CJ1/2 in the clamped helix, and this variant indeed retained 53% activity, indicating that base-pair formation rather than sequence in this region is essential for forming a correctly closed clamp (Const. 26). Conversely, strengthening hybridization in the PBS:CJ1/2 stem via A11:A202U or C3:A210G mutations lowered extension to 34% or 47%, respectively (Const. 29-30), while strengthening it by 3 bp via A11:A202U, G9:G204C and C3:A210G mutations dropped extension to only 2% (Const. 31). Notably, Const. 31 prevented P1 hybridization and formation of the P1:CPOPEN complex (Fig. 2.9, B and C). As both stabilization and destabilization of the clamping helix can lower polymerization activity, a thermodynamic balance between the 'open' primer-bound form and the 'closed' form of the polymerase is required for correct promoter-dependent polymerase function.
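
The "7-nt" change in Const. 26 can be checked directly from the two CJ1/2 excerpts quoted above. Positions are counted within the excerpt only, not in the ribozyme's own numbering, and the helper name is illustrative.

```python
def changed_positions(before: str, after: str) -> list[int]:
    """1-based positions at which two equal-length sequences differ."""
    if len(before) != len(after):
        raise ValueError("sequences must be the same length")
    return [i for i, (x, y) in enumerate(zip(before, after), start=1) if x != y]


wild_type = "AGGCAACCACG"  # ...A GGC AAC CAC G... (spaces removed)
const_26 = "ACGGCCAAAAG"   # ...A CGG CCA AAA G...
diffs = changed_positions(wild_type, const_26)
print(len(diffs), diffs)   # prints: 7 [2, 4, 5, 6, 7, 8, 10]
```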

Figure 2.9. Clamping domain mutations lower extension activity. (A) Predicted hybridization between the PBS and the wild-type CP CJ1/2 or the Const. 26 CJ1/2 mutant (Table 2.6). Underline indicates mutated residues. (B) Schematic representation of specificity-primer (P1) hybridization to the CP ribozyme, and the lack of hybridization to the CP ribozyme with a 3-bp increase in clamping strength between the PBS and the clamping domain. (C) Native gel shift assay of 0.1 µM 5' 32P end-labeled P1 incubated with 0.1 µM CP min clamp or CP+3 bp min clamp (Table 2.6, Const. 12 and 31, respectively) for 20 min at room temperature prior to being run on a 5% native gel.

2.4. The clamping domain confers long-range extension and promoter selectivity

In addition to the cT1 template, we created a second template called cT2 (Table 2.4). This template, also generated from random sequence, is distinct from cT1 and contains a new promoter region complementary to a 26-nt P2 specificity-primer (Table 2.3). P2 shares 10 nt with P1 at its 3' end, enabling it to hybridize to the PBS just as P1 does (Fig. 2.1D). When incubated for 24 h, P1:CPOPEN extended P1 by 85 nt on cT1, while P2:CPOPEN extended P2 by 26 nt on cT2. In contrast, the progenitor B6.61 ribozyme extended only 5 nt on cT1 and showed no observable extension on cT2 (Fig. 2.10A). The long-range extension on cT1 saturated at ~85 nt after 3 days of incubation, while extension on cT2 reached up to ~40 nt (Fig. 2.11). Notably, the holo-complexes formed from either P1 or P2 were template-specific, with P1:CPOPEN being specific for the cT1 promoter and P2:CPOPEN being specific for the cT2 promoter, even when mixtures of cT1 and cT2 were presented simultaneously to either holo-polymerase (Fig. 2.10B).

Figure 2.10. Promoter-specific polymerization by CP on random sequence templates. (A) P1:CPOPEN and P2:CPOPEN extension relative to the progenitor, B6.61, on cT1 and cT2 promoter-templates. (B) Promoter-mediated template selectivity by CP. P1:CPOPEN or P2:CPOPEN extensions with either cT1, cT2 or both templates simultaneously.

Figure 2.11. Long-range primer extension by the CP ribozyme. P1 and P2 primer extensions by the CP ribozyme on two templates, cT1 and cT2, analyzed by 8% PAGE. Contrast-enhanced extension products longer than 50 nt are shown in the black-bordered inset panels.

2.5. The clamping domain confers polymerization efficiency

A correctly clamped CP ribozyme should ideally stay localized to the primer-template complex that triggered the formation of its 'closed' elongation form, while a less processive ribozyme might dissociate. However, just as in extant biology, the CP ribozyme should initiate elongation only when its 'open' holo-polymerase form is presented to a single-stranded RNA promoter-template, and should not polymerize efficiently when its 'closed' form is presented to a primer-promoter-template complex. We found this to be the case, with P1:CPOPEN extension on cT1 being significantly better than CPCLOSED extension on P1:cT1 (Fig. 2.12, A and B). We additionally confirmed that this difference in extension was not due to the CP polymerase being activated by the removal of the P1 specificity-primer from the P1:CPOPEN complex (Fig. 2.12C).

Figure 2.12. The order of addition of the polymerase, specificity-primer and template influences extension, but not folding of the polymerase core. (A) P1:CPOPEN added to cT1 shows superior extension (lanes 1-6) compared with CPCLOSED added to P1:cT1 (lanes 7-12) or with CPCLOSED and cT1 mixed simultaneously with P1 (lanes 13-18). cT1 was kept constant at 1 µM and all incubations were carried out for 4 h. (B) Quantified and normalized P1 extensions (for extension > 15 nt) under the above order-of-addition conditions, graphed logarithmically. (C) The CP polymerase is not activated by removing the P1 specificity-primer from the P1:CPOPEN complex. P1:CPOPEN (0.01 µM P1, 0.012 µM CP) was mixed with cT1 (1 µM) and extension was performed for 4 h (lanes 1 and 2). The CPCLOSED complex (0.012 µM) mixed with P1:cT1 (0.01 µM P1, 1 µM cT1) gives much less extension (lanes 3 and 4). P1 of the P1:CPOPEN complex (0.01 µM P1, 0.012 µM CP) was then stripped off the polymerase by the addition of an oligo with the reverse complement sequence of P1 (rcP1, 0.1 µM, Table 2.3), and the resulting complex was added to P1:cT1 as before (0.01 µM P1, 0.012 µM CP; lanes 5 and 6).

To quantify clamp-driven extension efficiency, we defined and measured an extension ratio (ER) for a range of promoter-templates (Fig. 2.13A). The numerator is the percent extension past a particular RNA product size when the open holo-polymerase ribozyme is added to a promoter-template (P1:CPOPEN + Template); the denominator is the corresponding extension when the promoter-template is first hybridized to the P1 specificity-primer, prior to the addition of the closed polymerase ribozyme (CPCLOSED + P1:Template, Fig. 2.13B). An ER above 1 therefore indicates that extension benefits from the polymerase clamping onto the promoter from its 'open' form.
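
The extension ratio can be computed from gel band intensities in two steps: the percentage of lane signal beyond a length threshold, then the ratio of the 'open' to 'closed' values. This is a sketch under stated assumptions: the lane data in the example are invented for illustration, the function names are hypothetical, and the thresholds actually used (for example >15 nt for T1 and cT1) are listed in the Fig. 2.13 legend.

```python
def percent_extended(lane: dict[int, float], min_extension: int) -> float:
    """Percent of a lane's signal in products extended by more than min_extension nt.

    lane maps nucleotides added to the primer (0 = unextended primer) to a
    background-corrected band intensity.
    """
    total = sum(lane.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    extended = sum(v for n, v in lane.items() if n > min_extension)
    return 100.0 * extended / total


def extension_ratio(open_lane: dict[int, float], closed_lane: dict[int, float],
                    min_extension: int) -> float:
    """ER = extension(P1:CPOPEN + template) / extension(CPCLOSED + P1:template)."""
    denominator = percent_extended(closed_lane, min_extension)
    if denominator == 0:
        raise ZeroDivisionError("no extension past threshold in the 'closed' lane")
    return percent_extended(open_lane, min_extension) / denominator


if __name__ == "__main__":
    # Invented intensities, for illustration only.
    open_lane = {0: 50.0, 10: 20.0, 20: 30.0}
    closed_lane = {0: 90.0, 10: 7.0, 20: 3.0}
    print(extension_ratio(open_lane, closed_lane, min_extension=15))  # prints 10.0
```

The same percent_extended helper also covers the circular-versus-linear efficiency E, which compares two 'open' lanes instead of an 'open' and a 'closed' lane.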

Figure 2.13. Extension efficiency of the CP ribozyme. (A) Linear, circular, short, and short T1 flanked by poly(A) template sequence were tested. (B) Extension Ratio (ER) defined by extension of P1:CPOPEN added to a template (cT1 shown), divided by extension of CPCLOSED when added to prehybridized P1:template. (C) ER, as defined in panel (B), shown for the templates defined in (A). B6.61 with the cT1 template is shown as solid squares. Extensions >15-nt for T1 and cT1; >8-nt for sT1, sT1-A, A-sT1 and cA-sT1; and >2-nt for the progenitor B6.61 ribozyme on cT1 were quantified.

We measured ER while maintaining a one-to-one stoichiometric ratio of the P1 primer and CP ribozyme, which was titrated over two orders of magnitude of concentration on fixed 1 μM templates (Fig. 2.13C and 2.14). As expected, the B6.61 progenitor ribozyme had a concentration-independent ER value of ~1, consistent with this polymerase lacking a clamping domain (Fig. 2.13C, solid squares). However, CP showed a ~12-fold higher ER at low primer-polymerase concentrations with both linear and circular T1 (Fig. 2.13C, solid and empty circles). Truncating the linear T1 to the much shorter sT1 construct resulted in a low ER value that was completely independent of polymerase concentration (Fig. 2.13C, open squares). This loss of polymerization efficiency could be rescued by adding oligo(A) template sequence to either the 5' (A59sT1, open diamonds) or 3' (sT1A69, open triangles) terminus of sT1, with the 3' rescue being more pronounced (Fig. 2.13C). Similarly, circularizing the sT1 construct with oligo(A) fully rescued polymerization efficiency (cA128sT1, solid diamonds, Fig. 2.13C).

Figure 2.14. Primer extensions for Extension Ratios. Primer extensions for determining the Extension Ratio (ER) of the CP ribozyme on a set of cT1 template variants. P1 extensions by the CP ribozyme and its progenitor B6.61 on various templates (Fig. 2.13A). A holo-polymerase 'open' complex was prepared by mixing primer (P1) and CP ribozyme at near-equimolar concentration and serially diluted into a fixed 1 μM template. Alternatively, using exactly the same concentrations, the dilution was carried out with P1 first hybridized to the template prior to addition of the CP or B6.61 ribozyme. Both conditions were carried out on the aforementioned templates. Detailed experimental conditions are given in section 2.9.14 of Materials and Methods. ER was determined with the CP ribozyme on the following templates: (A) short T1 (sT1), (B) circularized sT1 with oligo(A) (cA128sT1), (C) 5' oligo(A) extension (A59sT1), (D) 3' oligo(A) extension (sT1A69), (F) linear T1 (T1) and (G) circular T1 (cT1), and (E) with the B6.61 ribozyme on cT1.

Template extension efficiency was about threefold higher on circular cT1 and cT2 templates than on their linear counterparts, where efficiency was defined as E = Extension(P1:CPOPEN + cTemplate)/Extension(P1:CPOPEN + Template) (Fig. 2.15). To further test the importance of correct clamping for efficient polymerization, we deleted the 5' primer-binding region within the PBS, preventing both the P1:CPOPEN and the CPCLOSED states from forming. As expected, this deletion reduced the extension ratio to ~1 (Fig. 2.16). These data are consistent with a promoter triggering correct clamping of the polymerase, forming a processive elongation complex able to extend a range of templates, provided they have sufficiently long sequence flanking the specificity-primer:promoter duplex.

Figure 2.15. Higher extension efficiency on circular templates by the CP ribozyme. Three-fold enhanced extension efficiency on circular vs linear templates.


Figure 2.16. ER with no PBS specificity primer hybridization sequence. Extension Ratio of the CP ribozyme and CPΔ1-11 lacking the PBS primer hybridization sequence (construct 4, Table 2.6). (A) Holo-Polymerase 'open' complex extensions on cT1 were compared with 'closed' complex extensions on pre-formed P1:cT1 complexes. Reactions were carried out at 10 nM P1, 12 nM CP or CPΔ1-11 ribozyme (Table 2.6, Const. 4) and 1 μM cT1. (B) A histogram comparing the ER for the intact CP ribozyme to the PBS truncated variant. ER was determined for extension > 15-nt.

Relative to the B6.61 progenitor, the CP ribozyme also required less Mg2+ to become fully functional, with polymerization emerging at 50 mM Mg2+ and saturating at 75 to 100 mM. In contrast, B6.61 polymerization showed no such saturation, with extension doubling from 75 to 100 mM and then tripling from 100 to 200 mM Mg2+ (Fig. 2.17, A and B). Consistent with the mutational analysis of the PBS:CJ1/2 helix of the clamping domain (Fig. 2.1, D and E), the sensitivity of the clamp to magnesium concentration highlights the fine thermodynamic balance between the 'open' primer-bound form and the 'closed' form of the polymerase. The ER value of CP increased by more than a factor of two when magnesium ions were titrated from 50 to 200 mM, whereas that of B6.61 did not (Fig. 2.17C), indicating that the clamping structure contains magnesium-sensitive elements.


Figure 2.17. Magnesium dependence of the CP ribozyme and its progenitor B6.61. P1 extensions on cT1 by the CP ribozyme (A) and its progenitor B6.61 (B) as a function of magnesium concentration. Extensions were carried out by first preparing a holo-polymerase 'open' complex (P1:CPOPEN) by mixing primer (P1, 0.04 μM) and CP or B6.61 ribozyme (0.05 μM), prior to their addition to cT1 (0.2 μM); these lanes are denoted by black bars. Alternatively, using the same concentrations, P1 was first hybridized to the template prior to the addition of the CP or B6.61 ribozymes (lanes lacking a black bar). (C) Extension Ratio (ER), determined by dividing the extension obtained with the correctly formed 'open' holo-polymerase complex by the extension observed upon adding the 'closed' complex to the pre-hybridized primer:template duplex, plotted as a function of magnesium ion concentration. ER was calculated for polymerization >15 nt for CP (circles) and >2 nt for the progenitor B6.61 ribozyme (squares).

2.6. The clamped complex is stable and allows extension at multiple primed sites

Extant DNA polymerases use multimeric clamp proteins to facilitate high polymerization rates and processivity (Benkovic et al., 2001). A close relative of the B6.61 polymerase ribozyme, which lacks the clamping domain of the CP ribozyme, has been reported to have negligible processivity: on average, only ~50% of nucleotide extensions result in a second nucleotide being added by the same polymerase on the same template (Lawrence and Bartel, 2003). To explore the mechanism of processivity further, we immobilized the P1:CPOPEN complex on streptavidin magnetic beads by hybridizing the ribozyme’s 3' terminus to a biotinylated DNA oligonucleotide. The P1:CPOPEN complex was then incubated with a range of templates, and the off-rate of either the radiolabeled P1 or the template was measured by scintillation counting (Fig. 2.18). The reverse complement of P1 (rcP1) and the sT1 template both stripped P1 nearly quantitatively from the immobilized P1:CPOPEN complex (Fig. 2.18, C to E). All other templates were retained together with P1 and the immobilized CP ribozyme (Fig. 2.18, C to E). Addition of cT1 in two-fold excess retained 64% of P1, while addition of sT1 retained only 8% after 1 h of stringent washing (Fig. 2.18D). Likewise, after 4 h of washing, 41-53% of the cA128sT1, sT1A69 and A59sT1 templates were retained, while only 1% of sT1 remained bound (Fig. 2.18E). As these templates differ from sT1 only by the addition of oligo(A) residues, the CP clamp must intrinsically operate in a sequence-independent fashion to retain the immobilized complex.
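
Two back-of-the-envelope calculations help interpret these numbers; both are sketches with hypothetical helper names, not analyses performed in this work. The first treats processivity as a geometric process in which, after each addition, the polymerase adds another nucleotide with probability p_continue (about 0.5 for the clamp-less relative cited above) or dissociates. The second converts the fraction of P1 retained after washing into an apparent first-order off-rate, which assumes single-exponential dissociation; the time courses in Fig. 2.18D need not follow this form, so the values are indicative only.

```python
import math


def p_at_least(p_continue: float, k: int) -> float:
    """Probability that one binding event adds at least k nucleotides.

    Geometric model: after each addition the polymerase adds another nucleotide
    with probability p_continue, otherwise it dissociates.
    """
    return p_continue ** (k - 1)


def mean_run_length(p_continue: float) -> float:
    """Expected nucleotides added per binding event under the same model."""
    return 1.0 / (1.0 - p_continue)


def apparent_koff(fraction_retained: float, minutes: float) -> float:
    """Apparent first-order off-rate (per min), assuming single-exponential loss."""
    return -math.log(fraction_retained) / minutes


if __name__ == "__main__":
    p = 0.5  # ~50% chance of a second addition (clamp-less relative)
    print("mean nt per binding event:", mean_run_length(p))
    print("P(>= 10 nt in one event):", p_at_least(p, 10))
    for label, retained in (("cT1", 0.64), ("sT1", 0.08)):  # after 60 min washing
        k = apparent_koff(retained, 60.0)
        print(f"{label}: k_off ~ {k:.4f} /min, half-life ~ {math.log(2) / k:.0f} min")
```

Under the geometric model, a 50% continuation probability gives on average only two nucleotides per binding event and about 1 in 500 events reaching 10 nt, which illustrates why preventing dissociation is so important for long-range extension.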

Mechanistically, the formation of the clamped state allows the polymerase to reach and extend primers located a substantial distance from the promoter binding site. A set of primers (P1+n, with n = 5, 40, 80, 121, 156, where n indicates the position of each primer's 5' terminus downstream of the P1 specificity-primer promoter start site on cT1) could all be significantly extended by the polymerase (Fig. 2.19). As expected, the P1 specificity-primer could be extended by only 5 nt and 40 nt before being blocked by the P1+5 and P1+40 primers, respectively (Fig. 2.19B).


Figure 2.18. The immobilized open form polymerase (P1:CPOPEN) forms a stable complex with long templates but not with short ones. The P1:CPOPEN complex was hybridized to a biotinylated DNA oligo at the polymerase’s 3' terminus (Table 2.3, 30.57) and immobilized on streptavidin magnetic beads before washing. A range of templates was then added, and time courses were performed to monitor the off-rate of either radiolabeled primer or radiolabeled template by scintillation counting. (A) Schematic representation of P1:sT1 leaving the P1:CPOPEN complex upon addition of short T1 (sT1). (B) Schematic of the same situation using circular T1 (cT1). (C) Percentage of P1 displaced from the magnetic beads after 20 min incubation with reverse complement P1 (rcP1), or with short, circular or poly(A)-flanked short T1 template sequences; P2 off-rate with the cT2 template is also shown. (D) Off-rate time course using 5' radiolabeled P1 with cT1 and sT1 templates. At each time point (0, 5, 10, 20, 40 and 60 min), the beads were pelleted and the supernatant was collected before resuspension; symbols as in panel (C). (E) Off-rate time course using radiolabeled templates sT1, sT1A69, A59sT1 and cA128sT1, with time points at 0, 2, 4, 8, 16, 32, 64, 128 and 256 min.


Figure 2.19. CP processively extends multiple primers on the same promoter-template. The open form CP polymerase (P1:CPOPEN) is able to extend a range of prehybridized primers distributed around the cT1 template. (A) Location of tested primers relative to the P1:cT1 promoter duplex region. The transcriptional arrow indicates the promoter binding site and the start of polymerization. (B) Extension assay of P1:CPOPEN and P1+n (n = 5, 40, 80, 121, 156) primers added to cT1. Either P1 or the P1+n primers were radiolabeled as indicated (star).

In a further experiment, P1:CPOPEN was mixed with cT1 prehybridized to the P1+5 or P1+40 primers, and the resulting complex was diluted into a 100-fold excess of cT1 (Fig. 2.20). Under these conditions, P1 and the P1+5 or P1+40 primer were extended simultaneously to a much greater extent than when P1 and the P1+n primers were on distinct templates (Fig. 2.20, A to C). Extension of P1+5 and P1+40 was 8- and 11-fold higher, respectively, when correctly clamped (Fig. 2.20D), similar to the ratios observed for P1 extension (Fig. 2.13C). Further, immobilizing P1:CPOPEN on streptavidin beads (Fig. 2.21A) gave similar extension of templates carrying P1+n primers, with 75-90% of the correctly clamped complexes remaining on the beads after 4 h of polymerization (Fig. 2.21B). The three-component complex (CPCLOSED-P1:cT1) is therefore a stable and processive polymerase complex.

Figure 2.20. Extension of P1 and P1+n primers is correlated only when they are on the same template. (A) To prepare primers in a correlated fashion, a CPCLOSED-P1:cT1:P1+n complex (n = 5 or 40) was pre-formed by mixing P1:CPOPEN (0.1 µM), P1+n (0.2 µM) and cT1 (0.1 µM), and then rapidly diluted 10-fold into cT1 (final concentration: 1 µM, CT ↑). (B) To prepare uncorrelated mixtures, pre-formed CPCLOSED-P1:cT1 complex was diluted into high cT1 (1 µM) containing prehybridized P1+n (0.02 µM). (C) Correlated (panel A) and uncorrelated (panel B) 4 h extensions with P1+5 (left panel) and P1+40 (right panel). (D) Ratio of quantified correlated and uncorrelated extensions (>5 nt for P1 and >2 nt for P1+n).
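Panel D of Fig. 2.20 reports the ratio of correlated to uncorrelated extension using length thresholds: more than 5 nt added for P1 and more than 2 nt for P1+n. The minimal Python sketch below computes that kind of ratio. It assumes each lane has already been reduced to background-corrected band intensities, indexed by the number of nucleotides added. The helper names and example intensities are hypothetical and are not taken from the gels. The quantification actually used is described in section 2.9.16.

```python
from typing import Sequence

def fraction_extended(bands: Sequence[float], min_added: int) -> float:
    """Fraction of lane signal in products with MORE than `min_added` nucleotides added.

    bands[i] is the background-corrected intensity of the band carrying i added
    nucleotides (bands[0] is the unextended primer).
    """
    total = sum(bands)
    if total <= 0:
        raise ValueError("lane must contain positive signal")
    return sum(bands[min_added + 1:]) / total

def extension_ratio(correlated: Sequence[float], uncorrelated: Sequence[float],
                    min_added: int) -> float:
    """Correlated / uncorrelated extended fraction for one primer and threshold."""
    return fraction_extended(correlated, min_added) / fraction_extended(uncorrelated, min_added)

# Hypothetical lanes for a P1+n primer (threshold: more than 2 nt added).
correlated_lane = [40, 10, 10, 10, 10, 10, 5, 5]
uncorrelated_lane = [90, 5, 1, 1, 1, 1, 0.5, 0.5]
print(f"extension ratio ~ {extension_ratio(correlated_lane, uncorrelated_lane, min_added=2):.1f}")
```

A threshold above zero excludes short, possibly untemplated or distributive additions from the "extended" fraction. With these hypothetical lanes, the ratio is 10.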

Figure 2.21. The streptavidin bead-immobilized polymerase-template complex is active and exhibits highly correlated primer extension. (A) Immobilized P1:CPOPEN was prepared on streptavidin magnetic beads, and an excess of template was added before washing for 1 h; the washed beads were then incubated with NTPs for 4 h. (B) After incubation, the total suspension (T), the beads (B) and the supernatant (S) were each denatured in formamide loading dye with a 10-fold excess of a competing RNA oligonucleotide complementary to T1, then loaded onto a 10% denaturing gel. The following were added to the immobilized P1:CPOPEN: cT1 (lanes 1-3); cT1 pre-hybridized to unlabeled P1+5 (lanes 4-6) or to radiolabeled P1+5 (lanes 7-9); and cT1 pre-hybridized to unlabeled P1+40 (lanes 10-12) or to radiolabeled P1+40 (lanes 13-15). Note the nearly quantitative blocking of P1 extension in lanes 4-6.

In aggregate, these data indicate that the CP ribozyme uses a sequence-independent topological clamp of the form shown in Fig. 2.1. However, we cannot rule out a “gripping”-type clamp, of the kind known to form when DNA-binding domains are fused to Taq polymerase (Wang et al., 2004). In either case, an organized transition exists from the 'open' holo-polymerase to the 'closed' clamped form. This transition allows the active site of the polymerase to extend a broad set of RNA templates, often by multiple helical turns. Simultaneously, but only when correctly triggered by a specificity-primer, the clamp confers the ability to find and extend primers at widely spaced locations within a single RNA template.

2.7. Programmable promoter recognition by the holo-polymerase

The CP ribozyme can use its PBS sequence as a template to extend shortened specificity-primers. To find the minimal primer that the CP ribozyme can extend, we partially hydrolyzed P1 and 5' end-labeled the products. A 7-nt 3' truncation of P1 (P1-7) was the shortest primer that the CP ribozyme could efficiently extend (Fig. 2.22A), consistent with the proposed hybridization mechanism of P1 to the CP ribozyme’s PBS domain (Fig. 2.1D). We then created three new CP variants by removing the 5' G of the CP ribozyme and appending a unique 5' terminal dinucleotide to each. The variants were activated with a shortened universal primer, PS (Fig. 2.22B), designed to allow two nucleotides of self-templated extension (Fig. 2.23A; Tables 2.3 and 2.5). After 24 h of incubation, the CP1 variant (5' UG template) extended PS nearly quantitatively, while the CP2 (5' GU template) and CP3 (5' CA template) variants showed lower extension (Fig. 2.23B). This extension behavior was robust, with all three polymerase constructs also extending the truncated P1-7 primer (Fig. 2.23B). Notably, all polymerase variants added at least one untemplated purine residue (Figs. 2.23B and 2.24). Many protein repair polymerases behave similarly; Taq, for example, primarily adds an untemplated A to blunt-ended duplex DNA (Hu, 1993).

Such holo-polymerase-dependent extension of the universal primer sequence allows some promoter-template RNAs to be copied but not others, depending on the sequence of the polymerase itself (Fig. 2.23C). We replaced the P1 and P2 promoters in the cT1 and cT2 constructs with three new promoter sequences corresponding to the primer sequences synthesized by the CP1, CP2 and CP3 ribozymes. After the PS:CP1OPEN holo-polymerase was formed, the PS primer was extended by incubation for 3 h (Fig. 2.23D). The holo-polymerase ribozymes containing the extended PS primer (PS-ext:CP1OPEN) were then incubated with the newly constructed promoter-templates, and primer extension was measured. Polymerization by CP1 was ~12-fold higher on the cT1 template with the CP1 promoter (cT1.1) than on the same template carrying a CP3 promoter (cT1.3, Fig. 2.23D). Likewise, extension on the cT2 template with a CP1 promoter (cT2.1) was ~4-fold higher than on the same template with a CP2 promoter (cT2.2, Fig. 2.23D). Other combinations of CP derivatives and promoters were discriminated less strongly. Nevertheless, these permutations demonstrate that self-templated primer synthesis by the polymerase itself can markedly affect which templates it selectively polymerizes.
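The selectivity logic can be illustrated with a short Python sketch that assumes strict Watson-Crick pairing and ignores the untemplated purine addition noted above. Under that assumption, copying the 5' terminal dinucleotide of the PBS appends its reverse complement to the primer 3' end. CP1 (5' UG) therefore adds CA, CP2 (5' GU) adds AC and CP3 (5' CA) adds UG. The extended primer then pairs fully only with the promoter built to match it. The PS sequence, the idealized promoters and the helper names below are hypothetical placeholders, not the sequences in Fig. 2.22B or Table 2.5.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))

def self_templated_addition(pbs_5prime: str) -> str:
    """Nucleotides (5'->3') added to the primer when it copies the PBS 5' terminus."""
    return reverse_complement(pbs_5prime)

def paired_from_3prime_end(primer: str, site: str) -> int:
    """Contiguous Watson-Crick pairs counted from the primer 3' end.

    `site` is the promoter segment (5'->3') the primer anneals to; the primer's
    3'-terminal nucleotide pairs with the 5'-most nucleotide of `site`.
    """
    perfect_site = reverse_complement(primer)
    paired = 0
    for expected, observed in zip(perfect_site, site.upper()):
        if expected != observed:
            break
        paired += 1
    return paired

PS = "GACCUA"  # hypothetical placeholder for the universal primer
variants = {"CP1": "UG", "CP2": "GU", "CP3": "CA"}  # 5' PBS dinucleotides from the text
primers = {name: PS + self_templated_addition(pbs) for name, pbs in variants.items()}
promoters = {name: reverse_complement(p) for name, p in primers.items()}  # idealized

for promoter_name, site in promoters.items():
    print(promoter_name, "promoter:",
          {name: paired_from_3prime_end(p, site) for name, p in primers.items()})
```

Any mismatch at the primer 3' terminus drops this contiguous-pair count to zero. This all-or-none picture is only a cartoon: the measured discrimination is partial (~4- to 12-fold).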


Figure 2.22. CP ribozyme self-templated primer extension. Self-templated extension of a P1 hydrolysis ladder by the CP ribozyme. (A) A hydrolysis ladder of P1 was CIP treated and then 5' radiolabeled before incubation with the CP ribozyme and NTPs (4 mM each) to allow primer extension. After 16 h, cT1 template was added, and trans-template extension of the P1 hydrolysis products was monitored. (B) Sequences of primers P1, PS and P1-7.


Figure 2.23. Promoter selectivity resulting from specificity-primer synthesis templated by the polymerase ribozyme’s PBS sequence. (A) Template extension of the universal primer PS by CP derivatives CP1, CP2 and CP3, each containing two unique nucleotides in the PBS. (B) PS and P1-7 extensions for each CP variant; * marks an untemplated purine residue. (C) Extension of PS in the PS:CPOPEN complex should allow efficient extension of only the cT1.1 promoter-template. (D) Left panel: PS:CP1OPEN self-extension (lanes 1, 2) fed into promoter-template extension of cT1.1, cT2.2, cT2.1 and cT1.3 (lanes 3-6). Right panel: PS1:CP1OPEN extension of the left-panel promoter-templates (PS1: synthetic product of the expected PS:CP1OPEN self-extension).


Figure 2.24. Untemplated extension by the open form of the polymerase is purine-rich. The PS1:CP1OPEN complex (full-length PS1 primer; top panel, Table 2.3, Fig. 2.23) and the PS3:CP3OPEN complex (full-length PS3 primer; bottom panel, Table 2.3) were incubated with either all NTPs (4 mM each) or 4 mM of a single NTP for 1 or 18 h.

2.8. Discussion

The ability of a polymerase to recognize a promoter presents a fundamental evolutionary tension: molecular recognition of a promoter is a static process, while processive polymerization is a dynamic one. Through in vitro evolution, we have found an RNA polymerase that searches for a promoter in two steps. It first forms a functional 'open' holo-polymerase complex and then rearranges into a processive elongation form. Correct assembly of this CPCLOSED complex increases extension by more than an order of magnitude. The resulting extension is directly comparable to that of the best RNA polymerase ribozymes isolated to date, which synthesize 75 to 203 nt on highly repetitive tethered templates (Wochner et al., 2011; Attwater et al., 2013; Horning and Joyce, 2016).

RNA replication results in long stretches of duplex RNA. Thus, just as in modern biology, the ability of an RNA polymerase ribozyme to invade duplex RNA would have been of fundamental importance in early evolution. The CP ribozyme is incapable of strand invasion (Figs. 2.19B and 2.21B). Even so, its processivity and correlated primer extension indicate that, when correctly clamped, it entrains templates via a ‘sticky’ topological clamp (Fig. 2.1). The CP can synthesize duplexes of ~50 to 107 bp, a linear extent of 175 to 360 Å, which is threefold to sixfold larger than its class I ligase catalytic core (Shechner et al., 2009). The polymerase must therefore move along the template without disengaging from it, because polymerization can occur while the immobilized processive complex is being washed. Precedent for such a sticky clamp exists in modern RNA biology. The ribosome creates a topological clamp by assembling the large and small subunits around an mRNA. This clamp is stable (Steitz, 1969) but allows robust movement of the mRNA 3 nt at a time during translation. Similarly, coupling the force generated by NTP incorporation to the embryonic CP clamp could lead to a polymerase ribozyme with ratchet-like and strand invasion capabilities (Cheng and Unrau, 2010).

The CP ribozyme can also synthesize part of its own specificity-primer. This provides evidence that a replicase in an RNA World could have avoided replicative parasites by a strategy akin to the genomic tag hypothesis of Weiner and Maizels (Weiner and Maizels, 1987). Compartmentalization has long been recognized as a key element in the solution to this problem (Joyce and Szostak, 2018), but early evolution may have passed through a period in which replicating systems existed without cellularization. In that situation, a replicase able to synthesize all or part of its own specificity-primer could, through mutations to its own sequence, have rapidly evolved a sense of ‘self’ that protected it from replicative parasites.

While many outstanding challenges remain to producing a self-evolving system in the laboratory, including increased polymerization rate, fidelity and most importantly strand displacement, the development of a promoter-dependent RNA polymerase ribozyme with processive clamping ability offers many insights into the dilemmas faced by life in the earliest periods of evolution on this planet.

2.9. Materials and Methods

2.9.1. Oligonucleotides

All oligonucleotides used in this project are described in detail in Table 2.1 and Tables 2.3 to 2.7. Synthetic DNA and RNA oligonucleotides (Syn) were purchased from Integrated DNA Technologies (IDT). Before use, they were resuspended in 45% formamide, 10 mM EDTA, heated (95°C for 5 min), purified by denaturing PAGE, eluted in 0.3 M NaCl and precipitated in 70% ethanol. For constructs transcribed in vitro with T7 RNAP (IVT), double-stranded DNA (dsDNA) templates were generated either by PCR or by filling in two partially complementary synthetic DNA oligonucleotides (5 µM each) under PCR conditions. In vitro transcriptions were carried out in a mixture of 1 μM dsDNA template, 8 mM GTP, 2 mM UTP and 5 mM each of CTP and ATP in T7 buffer (2.5 mM spermidine, 26 mM MgCl2, 10 mM DTT, 0.01% Triton X-100, 40 mM Tris-HCl pH 8.0 at 25 °C) with 1.5 U/μL T7 RNA polymerase (Applied Biological Materials). Reaction mixtures were incubated at 37°C for 2 h. DNase I (ThermoFisher) was then added to 0.4 U/μL, and incubation continued for 15 min at 37 °C. Transcribed products were brought to 45% formamide, 10 mM EDTA, heated (95°C for 5 min), purified by denaturing PAGE, eluted in 0.3 M NaCl and precipitated in 70% ethanol before use.
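The transcription recipe above can be converted into a pipetting plan using C1·V1 = C2·V2. In the Python sketch below, the final concentrations are those listed in the text, with "5 mM CTP and ATP" read as 5 mM each. Every stock concentration is a hypothetical assumption chosen only for illustration, and the function names are also hypothetical.

```python
def stock_volume(final_conc: float, stock_conc: float, final_volume_ul: float) -> float:
    """Volume of stock (uL) such that C_stock * V_stock = C_final * V_final."""
    if stock_conc < final_conc:
        raise ValueError("stock is less concentrated than the target")
    return final_conc * final_volume_ul / stock_conc

# component: (final concentration from the text, HYPOTHETICAL stock concentration); same units per row
RECIPE = {
    "dsDNA template (uM)": (1.0, 20.0),
    "GTP (mM)": (8.0, 100.0),
    "UTP (mM)": (2.0, 100.0),
    "CTP (mM)": (5.0, 100.0),
    "ATP (mM)": (5.0, 100.0),
    "T7 buffer (x)": (1.0, 10.0),
    "T7 RNA polymerase (U/uL)": (1.5, 50.0),
}

def pipetting_plan(final_volume_ul: float) -> dict:
    """Stock volume for every component plus the water needed to reach final_volume_ul."""
    plan = {name: stock_volume(final, stock, final_volume_ul)
            for name, (final, stock) in RECIPE.items()}
    water = final_volume_ul - sum(plan.values())
    if water < 0:
        raise ValueError("stocks too dilute for this final volume")
    plan["water"] = water
    return plan

for component, volume in pipetting_plan(100.0).items():
    print(f"{component:26s} {volume:5.1f} uL")
```

Replacing the hypothetical stock values with the concentrations of the reagents actually on hand is the only change needed for a real setup. DNase I is omitted because it is added after the 2 h transcription.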

2.9.2. Pool construction and modification

To create the starting selection pool, Round 0, the ligase core of the B6.61 ribozyme was amplified and modified using PCR primers containing an Ear I restriction site, so as to allow the ligation of three distinct accessory domains having a matching restriction site and random sequence libraries inserted into three different locations (Table 2.1).

A 5' primer-binding site (PBS) was appended to the ligase core. The PBS hybridizes to the 10 nt at the 3' terminus of specificity primers (e.g. P1, P2), allowing primer hybridization and formation of the P1:PoolOPEN form of the clamping domain. The PBS also includes a 4-nt stem capped by a GNRA tetraloop, which introduces a 180° turn that correctly orients the primer-complementary sequence of the PBS. When the primer is stripped from the polymerase onto a single-stranded RNA promoter element of a template, this turn positions the primer-template complex optimally relative to the catalytic core of the ribozyme (data not shown).
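GNRA denotes a tetraloop with the sequence G, any nucleotide (N), a purine (R = A or G), then A. The minimal Python sketch below scans a sequence for this motif. It reports only sequence matches and cannot tell whether a match actually folds as a hairpin loop. The function name and example sequence are hypothetical.

```python
import re

# G, then any nucleotide (N), then a purine (R = A or G), then A.
# The zero-width lookahead lets overlapping candidates be reported.
GNRA = re.compile(r"(?=(G[ACGU][AG]A))")

def find_gnra(rna: str) -> list:
    """Return (0-based start, motif) for every GNRA match in an RNA or DNA sequence."""
    seq = rna.upper().replace("T", "U")
    return [(match.start(), match.group(1)) for match in GNRA.finditer(seq)]

print(find_gnra("GGCGAAAGCC"))
```

Confirming that a candidate tetraloop actually closes a stem would require a secondary-structure prediction, which this sketch does not attempt.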

Using published structural data on the B6.61 accessory domain (Wang et al., 2011), we built three accessory domains containing random sequence insertions. The insertions were designed to preserve function while maximizing potential interactions with the engineered 5' PBS sequence of the ligase core (Fig. 2.2, black arrows; Table 2.1A). For Accessory Domain 1, the A2, A5 and AL5 structures were removed and replaced with a 76-nt random sequence. For Accessory Domain 2, the A1 and AL1 structures were replaced with a 76-nt random sequence, and the A2, A5 and AL5 structures were replaced with a 25-nt random sequence. Lastly, for Accessory Domain 3, the A1, AL1 and JL/A structures were removed and replaced with a 76-nt random sequence, while the A2, A5 and AL5 structures were removed and replaced with a 35-nt random sequence. To build the three accessory domains, pairs of partially complementary DNA oligonucleotides containing the random sequence and an Ear I restriction site (Table 2.1A) were end-filled using Taq DNA polymerase under PCR conditions for one cycle of extension.

Ear I restriction digestion of the newly built ligase core and three accessory domains allowed their subsequent ligation with T4 DNA ligase (NEB) to produce three full-length DNA Round 0 pools. The three pool populations constructed in this fashion were then mixed in equimolar amounts and PCR amplified together for 7 further cycles using the 64.14 and 17.119 primers (Table 2.1A).

The quality of these pool molecules was determined by cloning into E. coli using the TOPO-TA cloning kit (ThermoFisher). Transformed cells were plated and grown overnight at 37 °C on LB agar plates containing 0.1 mg/mL ampicillin. Fifty individual colonies were PCR amplified and sequenced by Eurofins Scientific to verify full-length products, proper random sequence insertion and an even distribution of the three accessory domains. After this verification, the pool was transcribed in vitro, purified by denaturing PAGE and fed into the first round of selection, with an RNA pool diversity of ~10^13 and a pool copy number of ~50.
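A few lines of arithmetic show the scale of the starting library. The Python sketch below compares the ~10^13 distinct sequences in Round 0 with the 4^76 possible sequences of a single 76-nt random region. It also estimates the amount of RNA implied by ~50 copies of each sequence. The sketch deliberately ignores the additional 25-nt and 35-nt random regions of Accessory Domains 2 and 3, which would only enlarge the accessible sequence space. The function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def sequence_space(length: int, alphabet: int = 4) -> int:
    """Number of distinct sequences of a fully randomized region."""
    return alphabet ** length

def pool_moles(diversity: float, copies_per_sequence: float) -> float:
    """Moles of RNA needed to represent `diversity` sequences at the given copy number."""
    return diversity * copies_per_sequence / AVOGADRO

space = sequence_space(76)
print(f"4^76 ~ 10^{math.log10(space):.1f} possible 76-nt inserts")
print(f"fraction sampled by 1e13 sequences ~ 10^{math.log10(1e13 / space):.1f}")
print(f"RNA for 1e13 sequences x 50 copies ~ {pool_moles(1e13, 50) * 1e9:.2f} nmol")
```

The pool therefore samples only about one part in 10^33 of a single 76-nt random region. Representing each sequence ~50 times requires on the order of 0.8 nmol of RNA.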

After 16 rounds of selection, the pool showed clamping activity, but only a minor polymerization activity increase relative to the B6.61 progenitor. The R16 pool was transformed into E. coli as in Round 0, and 50 clones were analyzed by sequencing.

Seven families emerged: Family 1 (10/50 clones), Family 2 (6/50), Family 3 (6/50), Family 4 (3/50), Family 5 (2/50), Family 6 (2/50) and Family 7 (2/50), with 19/50 clones having a unique sequence. Nine point mutations were observed at high frequency across the ligase core and accessory domain of the clones. Three of these (C90U, G147A and A170C; Fig. 2.1D, teal boxes) have previously been reported as activity enhancing, highlighting the effectiveness of the polymerization selection step (Wochner et al., 2011; Horning and Joyce, 2016). To increase polymerization activity going into Round 17, four variants of the Round 16 pool (R16-1, R16-2, R16-3, R16-4) were engineered using pairs of PCR primers and mixed in equimolar amounts (Table 2.1B). R16-1 was the unchanged selection pool. R16-2 consolidated the nine high-frequency point mutations observed in the ligase core and accessory domain across the 50 cloned sequences. R16-3 introduced 11 point mutations and an 18-nt deletion in the ligase core and accessory domain that were previously reported as activity enhancing (Wochner et al., 2011; Horning and Joyce, 2016). R16-4 combined the R16-2 and R16-3 mutations. To further increase pool diversity, all four R16 pools underwent mutagenic PCR (as described in section 2.9.5: Pool PCR amplification).

Lastly, after 23 rounds of selection, an undesired selection artifact appeared: the pool molecules were undergoing unprimed and untemplated 3' end extension (Fig. 2.5). To block this activity, the reverse PCR primer was modified to add an additional terminal A to the pool (Table 2.1C, 18.70), which decreased the observed artifact activity 50-fold.

2.9.3. In vitro selection schemes

The RNA pools underwent three selection schemes: “Negative”, “Clamping” and “Processivity” (Fig. 2.3). After every round, the isolated active RNAs were reverse transcribed and PCR amplified (see sections 2.9.4: RT of selection rounds and 2.9.5: Pool PCR amplification) to obtain a DNA library enriched in active sequences, which fed into the subsequent round of selection. Detailed selection steps and conditions are summarized by round in Table 2.2 and Fig. 2.4.

1. Negative selection. In seven of the selection rounds (Fig. 2.3A), a negative selection step was implemented to remove any pool molecules able to hybridize to or interact with the selection template in a sequence-specific manner. Streptavidin magnetic beads (Dynabeads M-270 Streptavidin, ThermoFisher) were used in all three selection schemes. The beads were prepared by washing twice for 2 min with Buffer A (50 mM NaCl, 100 mM NaOH), followed by equilibration with two 2 min washes of Polymerization Buffer supplemented with heparin and BSA (PB+HB buffer: 100 mM KCl, 100 mM MgCl2, 100 mM Tris-HCl pH 8.5 at 25°C, 0.1 mg/mL heparin, 1 mg/mL BSA). All reactions and washes in the selection were carried out at a 1:1 volume ratio with the beads. 3' biotinylated T1 or T1.4 templates (T1-B or T1.4-B, Table 2.4) at 0.15 µM were bound to the prepared beads for 20 min, followed by two 2 min washes with PB+HB buffer. To verify template immobilization, the T1-B or T1.4-B solution before bead binding, the supernatant and the washes were collected. At the end of the round, these fractions were run on a native gel with excess 5' 32P-radiolabeled probes able to hybridize to the template. Internally radiolabeled (32P) 'Closed' Pool (PoolCLOSED) molecules, at 0.1 µM, were subsequently incubated on the T1-B or T1.4-B beads for 20 min. The supernatant, containing pool molecules that did not bind the template, was collected and incubated with 0.12 µM primer P1 (for T1-B) or P1.4 (for T1.4-B). This formed 'Open' Pool complexes (PoolOPEN) for use in the Clamping Selection Scheme of the same round. Pool molecules immobilized on the T1-B or T1.4-B beads were discarded. At every step in the selection scheme, the radiolabeled pool was tracked by scintillation counting.

2. Clamping selection. The clamping selection scheme was implemented for 11 rounds. It selected pool molecules for their ability to form a primer-bound P1:PoolOPEN or P1.4:PoolOPEN state, to localize to a circular template by recognizing a promoter element, and then to undergo structural rearrangement to a PoolCLOSED state (Fig. 2.3B). Streptavidin magnetic beads were prepared as in the Negative Selection scheme, except that bead immobilization used a mixture of 0.12 µM cT1 (or cT1.4) and 0.12 µM of a template-complementary biotinylated DNA binding oligo (50.15, Table 2.3). Binding was followed by two 2 min washes with PB+HB buffer to remove any unbound template molecules. As in the Negative Selection scheme, template immobilization was monitored at the end of the round with a diagnostic native gel using complementary 5' 32P-radiolabeled probes. Internally radiolabeled (32P) PoolOPEN complexes formed with P1 or P1.4 were then incubated on the bead-immobilized circular templates for 20 min. These complexes were either prepared fresh or carried over from the Negative Selection. The beads were then washed with PB+HB buffer (as specified in Table 2.2) to remove pool molecules unable to switch to a correct, template-entraining PoolCLOSED state.

To recover functional PoolCLOSED molecules, correctly entrained pool molecules were converted back to the 'Open' state by incubating the washed beads with 0.1 µM P1 or P1.4 in PB+HB buffer. For added selective pressure, conditions were changed between rounds, with longer and more stringent washes, multiple 'on' and 'off' events, and shorter primer recovery incubations (Table 2.2). The radiolabeled pool was tracked by scintillation counting at every step in the selection scheme. Recovered functional molecules were ethanol precipitated, reverse transcribed and PCR amplified as described in sections 2.9.4: RT of selection rounds and 2.9.5: Pool PCR amplification.

3. Processivity selection. To select for polymerization activity and functional clamping, 19 rounds of the processivity selection scheme were carried out (Fig. 2.3C). Internally radiolabeled (32P) RNA pool (0.1 µM) was first incubated with RNA specificity primer (0.11 µM P1 or P1.4) for 20 min at room temperature to obtain the primer-bound P1:PoolOPEN conformation. P1:PoolOPEN complexes were then incubated on circular templates (cT1, cT1.4, cT1.5) with ATP, GTP and UTP (4 mM each) and biotin-11-cytidine-5'-triphosphate (1 mM, CTPB) in PB buffer. Template synthesis is described in section 2.9.6: Covalent circularization of RNA templates. After this incubation, the pool-primer-template mixture was passed through PB buffer-equilibrated Performa DTR Gel Filtration Cartridges (EdgeBio) to remove unincorporated CTPB. The mixture was then incubated for 20 min on streptavidin magnetic beads prepared as in the above schemes. Elongated primer-circular template-ribozyme complexes were retained on the beads, while unelongated complexes were washed away. For Rounds 9-15, bead-captured pool molecules were reverse transcribed on the beads, and the supernatant was collected, PCR amplified and transcribed in vitro for the next round of selection (sections 2.9.4: RT of selection rounds and 2.9.5: Pool PCR amplification). For Rounds 16 and above, functional molecules were recovered from the beads by incubation with 0.12 µM P1 or P1.4 in PB+HB buffer. The recovered molecules were then ethanol precipitated, reverse transcribed, PCR amplified and transcribed in vitro for the next round of selection as above. For additional selective pressure, polymerization times were shortened across rounds (from overnight to 30 min), the promoter element was changed, and the first templated CTPB additions were moved further into the transcribed sequence of the template (Table 2.2). The radiolabeled pool was tracked by scintillation counting at every step in the selection scheme.

2.9.4. RT of selection rounds

Collected functional pool molecules were ethanol precipitated with 20 µg of glycogen as a carrier. They were resuspended in a mixture of dNTPs (0.6 mM each), 1 μM RT primer and ddH2O, heated to 65°C for 5 min and slowly cooled to room temperature on the bench. The reaction mixture was then completed with RT buffer (75 mM KCl, 3 mM MgCl2, 10 mM DTT, 50 mM Tris-HCl pH 8.3 at 25 °C) and 20 U/µL Maxima Reverse Transcriptase (ThermoFisher). This reaction was incubated at 50°C for 1 h. KOH (from a 1 M stock) was then added to a final concentration of 100 mM, and the mixture was heated to 90°C for 15 min to degrade the RNA and inactivate the RT. The reaction mixture was neutralized with 1 M Tris-HCl to pH 8.3-8.4 before being used in a PCR reaction.

2.9.5. Pool PCR amplification

PCR of the neutralized RT reaction was carried out with 0.5 μM each of forward and reverse primers and 0.2 mM of each dNTP in PCR buffer (50 mM KCl, 1.5 mM MgCl2, 0.01% gelatin, 10 mM Tris-HCl pH 8.3 at 25°C), with 1.25 U/μL Taq DNA polymerase (NEB). Reaction mixtures were cycled at 94°C (45 s), 50°C (85 s) and 72°C (115 s) on a PTC-100™ Peltier Thermal Cycler (MJ Research). Product size was verified on 2% agarose gels before ethanol precipitation and use in in vitro transcriptions. For mutagenic PCR with a 2% mutational rate, reaction mixtures were supplemented with 0.5 mM MnCl2 and 5.5 mM MgCl2 (Cadwell and Joyce, 1994).
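Two assumptions allow a direct estimate of the expected mutational load per molecule. The first is that the 2% mutational rate is a per-nucleotide probability accumulated over the mutagenic PCR. The second is that mutations occur independently. Neither is stated in the protocol. The Python sketch below uses a hypothetical 200-nt amplicon; the real pool length should be substituted. The function name is hypothetical.

```python
import math

def mutation_load(rate_per_nt: float, length_nt: int) -> dict:
    """Expected mutations per molecule and the fraction left unmutated,
    assuming independent mutations at a fixed per-nucleotide rate."""
    if not 0.0 <= rate_per_nt < 1.0:
        raise ValueError("rate must be in [0, 1)")
    mean = rate_per_nt * length_nt
    return {
        "mean_mutations": mean,
        "p_unmutated_exact": (1.0 - rate_per_nt) ** length_nt,  # binomial
        "p_unmutated_poisson": math.exp(-mean),                 # Poisson approximation
    }

load = mutation_load(0.02, 200)  # 200 nt is a HYPOTHETICAL amplicon length
print({key: round(value, 4) for key, value in load.items()})
```

Under these assumptions, a 200-nt molecule carries about four mutations on average, and only about 2% of molecules escape mutation entirely.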

2.9.6. Covalent circularization of RNA templates

RNA templates for circularization by T4 RNA ligase were transcribed in vitro using a modified NTP mixture containing 10 mM GMP, 2 mM GTP, 2 mM UTP and 5 mM each of CTP and ATP (5:1 GMP:GTP), then purified by denaturing PAGE. Reaction mixtures containing 5 μM 5' monophosphorylated (GMP-initiated) templates and 50 μM ATP in T4 RNA ligase buffer (1 mM DTT, 10 mM MgCl2, 50 mM Tris-HCl pH 7.5 at 25°C), with 0.5 U/μL T4 RNA Ligase 1 (NEB), were incubated at room temperature for 2 h. Reactions were stopped with 45% formamide, 10 mM EDTA, heated (95°C for 5 min) and then purified by denaturing PAGE.

Circularization was verified by partial alkaline hydrolysis of 32P internally labelled circular templates in 50 mM NaHCO3 at 90°C over a 1-5 min time course. On 5% denaturing PAGE, a gel mobility shift was observed because a single hydrolysis event within the circle linearizes the template (Fig. 2.25).

Figure 2.25. Linearization of cT1 by alkaline hydrolysis. (A) A 5-minute alkaline hydrolysis of internally radiolabeled RNA resolved on 5% denaturing PAGE: T1 as a control (lane 1) and a T1 + cT1 mixture (lanes 2 to 7). (B) Quantified RNA degradation.

2.9.7. 3' Biotinylation of linear RNA templates

Reaction mixtures containing 100 µM RNA templates were incubated with freshly prepared 10 mM sodium periodate (NaIO4) at room temperature (RT) for 20 min. Excess periodate was then quenched with 10 mM sodium thiosulphate (Na2S2O3) for 20 min at RT, followed by ethanol precipitation. The pellet was resuspended in 100 mM NaOAc (pH 4) containing 4 mM EZ Link Biotin LC (long chain) Hydrazide (ThermoFisher) and incubated at 37°C for 60 min. To stabilize the newly formed hydrazone bond, freshly prepared 16 µM sodium cyanoborohydride (NaBH3CN, dissolved in anhydrous acetonitrile) was added, and the mixture was incubated at 37°C for 30 min. The resulting mixture was butanol precipitated with 9 volumes of n-butanol. Three phenol extractions, one chloroform extraction and a final ethanol precipitation followed to remove excess biotin. Biotinylation efficiency was determined by a 5% native PAGE streptavidin gel shift assay, which typically showed ~85-90% biotinylation (Fig. 2.26).


Figure 2.26. Biotinylation efficiency. The ~88% efficient 3' biotinylation of an internally radiolabeled T1 (1 μM), marked by mobility shift on a 5% native gel in the presence of Streptavidin (2 μM).

2.9.8. High-throughput sequencing

Double-stranded DNA libraries of rounds 23, 25, 26, 27, 28, 29 and 30 were generated by RT-PCR. They were cleaned up using a QIAquick PCR Purification Kit (QIAGEN) to remove primers, nucleotides, enzymes and salts, and their size was verified on a 2% agarose gel. Cleaned-up DNA libraries were quantified on a Qubit 2.0 Fluorometer (ThermoFisher) and submitted for high-throughput Amplicon-EZ sequencing (GENEWIZ). Raw sequencing reads were processed using Geneious software: paired-end reads were merged, low-quality reads were removed, low-quality bases were trimmed, and adapters were identified and removed. Sequences were then grouped into families within a pairwise distance of either zero (to determine the number of unique sequences, d = 0) or up to two (d ≤ 2), and their relative abundance was determined. All sorted sequenced DNA libraries are available with the publication at https://science.sciencemag.org/content/371/6535/1225. Python code for grouping and sorting sequences is publicly available with instructions at https://github.com/ehan1990/dna-sequence-grouping. Typically, 5,000-10,000 cleaned-up sequences were obtained for each round of selection.
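The family grouping described above can be sketched in Python as a greedy clustering. Unique sequences are visited in order of decreasing read count. Each one joins the first existing family whose seed lies within distance d; otherwise it seeds a new family. This is an illustrative reimplementation under stated assumptions, not the published code in the linked repository. In particular, the text does not specify the pairwise distance metric, so Levenshtein (edit) distance is assumed here. The function names are hypothetical.

```python
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]

def group_families(reads, max_distance: int):
    """Greedy grouping: each unique sequence, in order of decreasing abundance, joins the
    first family whose seed is within max_distance, otherwise it seeds a new family.
    Returns (seed, fraction of all reads) pairs sorted by decreasing fraction."""
    counts = Counter(reads)
    families = []  # list of [seed_sequence, read_count]
    for seq, n in counts.most_common():
        for family in families:
            if edit_distance(seq, family[0]) <= max_distance:
                family[1] += n
                break
        else:
            families.append([seq, n])
    total = sum(counts.values())
    return sorted(((seed, n / total) for seed, n in families), key=lambda item: -item[1])

reads = ["ACGU"] * 5 + ["ACGA"] * 3 + ["UUUU"] * 2
print(group_families(reads, 0))
print(group_families(reads, 2))
```

With d = 0, each distinct sequence forms its own family, which counts unique sequences. With d ≤ 2, close variants are absorbed into the family of their most abundant neighbor. The greedy scheme depends on visiting order, and the published code may resolve ties differently.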

2.9.9. Characterization of the clamping polymerase (CP) ribozyme

Three strategies were used to determine the essential sequence of the 5' PBS and 3' Clamping Domain. 1. Using PCR amplification and primer walking, the 5' PBS primer hybridization region was removed, or the 3' Clamping Domain was systematically truncated down to the minimal functional sequence (Table 2.6, Const. 4-18). 2. Again using PCR primers, mutations were introduced to disrupt, substitute and/or rescue predicted secondary structures and 5'-to-3' clamping hybridization regions (Table 2.6, Const. 1-3 and 19-51). 3. Complementary RNA or DNA oligonucleotides, in 10-fold excess, were hybridized to the ribozyme to block the 5' PBS primer hybridization region or regions of the minimal 3' clamping domain and thereby inhibit their potential function (Table 2.7, Const. RNA 1-2 and DNA 1-7). Hybridization of all oligos to the ribozyme was confirmed by 5% native PAGE analysis.

2.9.10. PBS sequence modification of CP for self-templated primer synthesis

GMP-spiked in vitro transcriptions, as described in section 2.9.6: Covalent circularization of RNA templates, were used to generate 5' monophosphate CP ribozymes lacking the first 12 nt of the PBS. DNA-splinted RNA ligations using T4 DNA ligase were then performed to add back new 5' sequence (Moore and Sharp, 1992). Reaction mixtures contained 10 μM truncated monophosphate ribozymes, 20 μM DNA splint oligonucleotide and 25 μM new 5' PBS RNA oligonucleotides (for synthesizing CP1, CP2 and CP3; Table 2.5). These components were in T4 buffer (10 mM MgCl2, 1 mM ATP, 10 mM DTT, 50 mM Tris-HCl pH 7.5 at 25 °C) with 200 U/μL T4 DNA Ligase (NEB). The mixtures were incubated at 16°C for 16 h and then purified by denaturing PAGE, giving a ~25% ligation yield.

2.9.11. Polymerization assays

Standard primer extension assays were carried out with three individual components: 5'-radiolabeled RNA Specificity Primer, Ribozyme and Template. The buffer contained 100 mM MgCl2, 100 mM KCl, 100 mM Tris-HCl pH 8.5 and 4 mM of each NTP, and reactions were run at 22°C. Unless otherwise mentioned, the RNA Primer (0.1 μM) was first allowed to hybridize to the Ribozyme (0.12 μM) in buffered solution for 20 min at room temperature. The reaction was then started by adding NTPs and the Template (0.14 μM). Reactions were stopped by adding 1 volume of stop mix (80% formamide, 200 mM EDTA, 0.025% xylene cyanol and 0.025% bromophenol blue) containing a 10-fold excess of a competing RNA oligonucleotide complementary to the template strand (Table 2.4), followed by heating to 95°C for 5 min. Products were resolved on 8-20% sequencing PAGE, and the gels were analyzed using a GE Healthcare Amersham Typhoon scanner. Extension products were quantified as described in section 2.9.16: Quantification of primer extensions.

2.9.12. Polymerization Assays with multiple primers on cT1

Primer extension assays with multiple primers on cT1 were carried out by first forming the P1:CPOPEN complex, in which the RNA Primer (0.2 μM) was allowed to hybridize to the Ribozyme (0.24 μM) in buffered solution (100 mM MgCl2, 100 mM KCl, 100 mM Tris-HCl at pH 8.5) for 20 min at room temperature. In parallel, P1+n primers (0.4 μM, where n = 5, 40, 80, 121, 156) were prehybridized to cT1 (0.2 μM) in buffered solution containing 8 mM of each NTP for 20 min at room temperature. Primer extensions were initiated by mixing the P1:CPOPEN complex with the prehybridized P1+n:cT1 complex and incubating at 22°C (Fig. 2.19B). The effective final concentrations were: P1 (0.1 μM), CP (0.12 μM), P1+n (0.2 μM), cT1 (0.1 μM) and 4 mM of each NTP. To monitor extension of both P1 and P1+n primers, each assay was performed in parallel with either radiolabeled P1 or radiolabeled P1+n. Reactions were stopped and quantified as in section 2.9.11: Polymerization assays above.
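The pre-mix and final concentrations above are consistent with combining the two pre-incubated solutions in equal volumes, which halves every component. The short Python sketch below checks that arithmetic. The 1:1 volume ratio is inferred from the reported numbers rather than stated in the protocol, and the `mix` helper is hypothetical.

```python
# Sketch: final concentrations after combining pre-incubated solutions.
# Assumption: the P1:CP_OPEN and P1+n:cT1 solutions are mixed 1:1 by volume
# (inferred from the reported pre-mix and final concentrations).


def mix(solutions, volumes):
    """Return final concentrations after combining solutions.

    solutions: list of dicts mapping component name -> concentration
    volumes:   list of volumes in any consistent unit, one per solution
    """
    total = sum(volumes)
    final = {}
    for conc, vol in zip(solutions, volumes):
        for name, c in conc.items():
            # each component is diluted by the factor vol / total
            final[name] = final.get(name, 0.0) + c * vol / total
    return final


p1_cp_open = {"P1": 0.2, "CP": 0.24}                 # µM, pre-hybridized 20 min
p1n_ct1 = {"P1+n": 0.4, "cT1": 0.2, "NTP_mM": 8.0}   # µM; NTPs in mM

final = mix([p1_cp_open, p1n_ct1], [1.0, 1.0])
for name, c in sorted(final.items()):
    print(f"{name}: {c:g}")
```

The same helper applies to unequal mixes (for example, adding NTPs and template to a primer:ribozyme solution) by changing the `volumes` list.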

2.9.13. Extension efficiency titrations and order of addition experiment

To determine extension efficiency, concentration-dependent extension assays were carried out with a fixed, high template concentration (1 μM) while titrating the Primer (10, 20, 40, 80, 160 and 320 nM) and the Ribozyme (12, 24, 48, 96, 192 and 384 nM) in buffer containing 100 mM MgCl2, 100 mM KCl, 100 mM Tris-HCl (pH 8.5 at 25 °C) and 4 mM of each NTP, for 4 h at 22°C. Two assays were carried out: 1. the Primer was first incubated with the Ribozyme for 20 min at room temperature to form the correct holo-polymerase 'open' conformation before the template was added, or 2. the Primer was first incubated with the template for 20 min at room temperature before the ribozyme was added in the 'closed' conformation. Assays were carried out with Primer 1 (P1) and the Clamping Polymerase (CP) ribozyme on a series of templates, or with P1 and the progenitor B6.61 ribozyme on circular template 1 (cT1), at the concentrations above. Primer extensions were quantified as described in section 2.9.16: Quantification of primer extensions, and an Extension Ratio (ER) was determined for every tested condition by dividing the primer extension in assay 1 by that in assay 2 (Fig. 2.13, 2.14 and 2.17). To measure the effect of order of addition on extension, a third assay was carried out alongside assays 1 and 2 with P1, CP and cT1, in which the 'closed' conformation Ribozyme was first incubated with the template for 20 min at room temperature before the primer was added (Fig. 2.12, A and B). As in the extension ratio assays, the template was held at a high concentration (1 μM) and the Primer and Ribozyme were titrated at the concentrations above. Primer extensions were quantified, normalized and plotted on a logarithmic scale (Fig. 2.12B).
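The titration series and the Extension Ratio can be written out explicitly. In this minimal Python sketch, the primer concentration doubles from 10 to 320 nM and the ribozyme is held at 1.2 times the primer concentration. `extension_ratio` is a hypothetical helper that divides the percent extension of assay 1 by that of assay 2, as described above. It returns `None` where assay 2 shows no measurable extension, because the ratio is undefined there.

```python
# Titration series used for the extension-ratio assays (values from the text).
primer_nM = [10 * 2**i for i in range(6)]            # 10 -> 320 nM, doubling
ribozyme_nM = [round(1.2 * p) for p in primer_nM]    # 1.2x primer: 12 -> 384 nM
TEMPLATE_nM = 1000                                   # fixed, high template (1 µM)


def extension_ratio(ext_open, ext_closed):
    """Per-condition ER = % extension in assay 1 / % extension in assay 2.

    ext_open:   % extension when the primer was pre-incubated with the ribozyme
    ext_closed: % extension when the primer was pre-incubated with the template
    Returns None where assay 2 shows no extension (ratio undefined).
    Hypothetical helper; the thesis states only the division itself.
    """
    return [a1 / a2 if a2 > 0 else None for a1, a2 in zip(ext_open, ext_closed)]


for p, r in zip(primer_nM, ribozyme_nM):
    print(f"primer {p:>3} nM : ribozyme {r:>3} nM : template {TEMPLATE_nM} nM")
```

An ER above 1 indicates that forming the primer:ribozyme complex first gives more extension than pre-hybridizing the primer to the template.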

2.9.14. Off-rates of primers and templates from bead immobilized CPOPEN complexes

To immobilize the P1:CPOPEN complex, streptavidin magnetic beads were prepared as described in section 2.9.3: In vitro selection schemes and incubated for 30 min at room temperature with a prehybridized mixture of a biotinylated DNA oligonucleotide complementary to CP (0.1 μM, Table 2.3, 30.57), CP (0.12 μM) and P1 or P2 (0.15 μM) in buffer containing 100 mM KCl, 100 mM MgCl2, 100 mM Tris-HCl pH 8.5 at 25°C, 0.1 mg/mL heparin and 1 mg/mL BSA. Unbound molecules were washed away with the same buffer. CP’s polymerization activity was unaffected by the biotinylated DNA probe, heparin and BSA, which were required for bead binding (Fig. 2.27). After a range of templates was added to the immobilized P1:CPOPEN complexes, the off-rates of radiolabeled primers or templates were measured over time. At each time point, the beads were collected, the supernatant was removed and measured by scintillation counting, and the beads were resuspended in fresh buffer (Fig. 2.18). Templates were added at 0.2 μM when measuring primer off-rates and at 0.1 μM when measuring template off-rates.
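The text does not state how off-rates were extracted from the scintillation counts. One common approach is shown here as an assumption-laden sketch. It treats release from the beads as irreversible first-order dissociation, which is reasonable when beads are resuspended in fresh buffer at each time point and rebinding is therefore minimal. Cumulative supernatant counts are converted into the fraction still bound, and ln f(t) = -k_off·t is fitted. All function names are hypothetical, and the bead-bound counts at time zero are assumed to be known.

```python
import math


def fraction_remaining(bound_counts_t0, supernatant_counts):
    """Fraction of labeled strand still bead-bound after each time point.

    bound_counts_t0:    counts on the beads at the start (assumed measured)
    supernatant_counts: counts in each successive supernatant; because the
                        beads are resuspended in fresh buffer each time,
                        released counts accumulate across time points.
    """
    released = 0.0
    fractions = []
    for counts in supernatant_counts:
        released += counts
        fractions.append((bound_counts_t0 - released) / bound_counts_t0)
    return fractions


def fit_koff(times_min, fractions):
    """Least-squares slope of ln f(t) = -k_off * t, forced through f(0) = 1.

    Points with f <= 0 (everything released) are skipped because ln is
    undefined there. Returns k_off in inverse units of times_min.
    """
    pts = [(t, f) for t, f in zip(times_min, fractions) if f > 0]
    num = sum(t * math.log(f) for t, f in pts)
    den = sum(t * t for t, _ in pts)
    return -num / den


def half_life(koff):
    """t_1/2 = ln 2 / k_off, in the same time unit as 1 / k_off."""
    return math.log(2) / koff
```

For example, a strand whose bound fraction falls to one half after 20 min corresponds to k_off ≈ 0.035 min⁻¹ (ln 2 / 20). If release appears biphasic, a single-exponential fit like this one would be an oversimplification.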


Figure 2.27. CP activity is unaffected by bead binding conditions. CP polymerase activity is not affected by hybridizing a biotinylated DNA probe to its 3' terminus or by adding the heparin and BSA required for bead binding. P1:CPOPEN complex (0.1 µM of CP and P1 each) was incubated with cT1 (0.12 µM) and NTPs (4 mM each) to allow primer extension. (A) Addition of a biotinylated DNA probe (Table 2.3, 30.57) to the 3' terminus of the CP ribozyme has a negligible effect on polymerase activity. (B) Addition of heparin (0.1 mg/mL), BSA (1 mg/mL) or both simultaneously had no effect on CP’s extension of cT1. Heparin and BSA are used in streptavidin magnetic bead binding experiments to prevent non-specific binding.

2.9.15. Primer extensions on beads with immobilized CP complexes

Immobilized P1:CPOPEN complexes were formed on streptavidin magnetic beads as described above. The immobilized complexes were incubated for 20 min at room temperature with either cT1 (0.1 μM) or prehybridized P1+n:cT1 (0.4 μM P1+n, 0.1 μM cT1, where n = 5, 40) in buffer containing 100 mM KCl, 100 mM MgCl2, 100 mM Tris-HCl pH 8.5 at 25°C, 0.1 mg/mL heparin and 1 mg/mL BSA. Unbound templates were then removed and the beads were washed for a further 1 h, with the buffer replaced at 0, 5, 10, 20, 40 and 60 min, before the beads were incubated with buffered NTPs for 4 h. After incubation, the total suspension (T), the beads (B) and the supernatant (S) were loaded onto a 10% denaturing gel (Fig. 2.21).

2.9.16. Quantification of primer extensions

Primer extensions were quantified using ImageQuant TL software by determining the percentage of extended versus unextended primer. Rolling-ball background subtraction was applied to remove large spatial variations in background intensity before pixel counts were determined for lanes and bands of equal area. In most cases, long-range extension of >15 nt (processive extension) was quantified as a percentage of total available primer. However, to accurately measure the lower-activity B6.61 progenitor and the short poly(A) Template 1 variants, extensions of >2 nt and >8 nt, respectively, were quantified instead. Where multiple lanes were compared, lanes were normalized for loading counts.
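The thresholding rule above can be expressed as a short function. This sketch assumes background-subtracted band volumes (for example, after rolling-ball subtraction) indexed by the number of nucleotides added to the primer. The helper name and input format are hypothetical; the thresholds are the ones given in the text.

```python
# Minimum number of added nucleotides counted as "extended" (from the text).
MIN_ADDED_NT = {
    "processive extension (default)": 15,
    "B6.61 progenitor": 2,
    "short poly(A) Template 1 variants": 8,
}


def percent_extended(band_intensities, min_added_nt):
    """% of total primer signal in bands extended by more than min_added_nt.

    band_intensities maps the number of added nucleotides (0 = unextended
    primer) to a background-subtracted band volume. Hypothetical format.
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("lane has no primer signal")
    extended = sum(v for n, v in band_intensities.items() if n > min_added_nt)
    return 100.0 * extended / total
```

Because the result is expressed as a fraction of the total primer signal in the lane, it is largely insensitive to how much labeled primer was loaded.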

2.10. Supplementary Tables

Table 2.1. Oligonucleotides for the construction of the DNA selection pool. IVT indicates in vitro transcribed and Syn indicates purchased synthetic oligonucleotides. (A) DNA oligonucleotides for adding a set primer-binding site sequence (PBS) and appending random nucleotide sequences to the B6.61 ribozyme to synthesize three types of the DNA Round 0 selection pool. (Continued on next page).


Table 2.1. (Continued). (B) DNA oligonucleotides for synthesizing four types of the DNA Round 16 pool, combining mutations from the first 16 rounds of selection with recently published activity-enhancing mutations (Horning and Joyce, 2016; Wochner et al., 2011). (C) Round 23 reverse PCR primers modified to add a 3' A or U to the DNA pool, or to mutate the terminal nucleotide of the pool from U to C.

Conditions of Selection Rounds
Selective Schemes Negative Circular Inc. on Fresh Primer Final % % Pool Bound Round Clamping Processivity Primer Wash (min) Mut Selection Templates Templates (min) Recovery (min) Pool Recovery to Beads
1 - + 1x - P1 cT1 10 2x 2 20 - - -
2 - + 1x - P1 cT1 10 2x 2 20 - - -
3 - + 2x - P1 cT1 10 2x 2 10 - - -
4 - + 2x - P1 cT1 10 6x 5 5 - - -
5 + + 2x - P1 cT1 10 6x 5 5 - - -
6 + + 2x - P1 cT1 10 6x 5 5 0.58 - -
7 + + 2x - P1 cT1 2 6x 5 1 0.19 - -
8 + + 2x - P1 cT1 2 6x 5 1 0.14 - -
9 - - + P1 cT1 1,320 6x 5 - - - -
10 - - + P1 cT1 1,050 6x 5 - - - -
11 - - + P1 cT1 1,050 6x 5 - - - -
12 - - + P1 cT1 1,140 6x 5 - - 0.7 -
13 - - + P1 cT1 1,170 6x 5 - - 0.7 -
14 - - + P1 cT1 30 6x 5 - - 4.7 -
15 - - + P1 cT1 30 6x 5 - - 6.1 -
16 - - + P1 cT1 30 6x 5 1 0.9 6.4 +
17 + + 2x - P1.4 cT1.4 5 2x 2 5 - - +
18 - - + P1.4 cT1.4 240 2x 2 2 - - +
19 + + 2x - P1.4 cT1.4 5 2x 2 5 0.5 - -
20 - - + P1.4 cT1.4 120 3x 2 5 0.16 - -
21 + + 2x - P1.4 cT1.4 5 2x 2 5 3.1 - -
22 - - + P1.4 cT1.4 120 3x 2 5 0.2 - -
23 - - + P1.4 cT1.4 120 3x 2 10 0.2 - -
24 - - + P1.4 cT1.4 120 2x 2 10 - - -
25 - - + P1.4 cT1.4 120 3x 2 10 0.21 - -
26 - - + P1.4 cT1.4 120 3x 2 10 0.32 - -
27 - - + P1.4 cT1.4 120 3x 2 10 3.42 - +
28 - - + P1.4 cT1.5 120 3x 2 10 3.37 - -
29 - - + P1.4 cT1.5 15 3x 2 10 0.75 - -
30 - - + P1.4 cT1.5 15 3x 2 10 0.94 - -

Table 2.2. Selection schemes and conditions. (-) indicates that the specified selective condition was not used, that values were not measured, or that values were within background.


Table 2.3. Sequences of RNA primers and modified DNA oligos


Table 2.4. Sequences of RNA templates and corresponding complementary RNA templates. Underline indicates promoter sequence. Colored nucleotides indicate changes from the original sequence to created template variants.


Table 2.4. (Continued).


Table 2.5. Ribozyme sequences. With the exception of B6.61, colored nucleotides indicate changes relative to the original sequence of the CP ribozyme.


Table 2.6. Modified CP ribozyme constructs. Modified CP constructs for clamping domain characterization. Red nucleotides indicate changes relative to the WT CP sequence and (-) indicates deletions.


Table 2.6. (Continued).


Table 2.6. (Continued).


Table 2.7. Oligonucleotides for CP ribozyme blocking experiments. RNA and DNA oligonucleotides complementary to the CP ribozyme, used in blocking experiments for clamping domain characterization.

Chapter 3. Conclusions and Future Research Directions

3.1. Evolution of an autocatalytic system

3.1.1. Strand displacement

Natural and artificial ribozymes have demonstrated the ability to catalyze a broad range of reactions directly relevant to the replication and expression of RNA in an early RNA World. The potential for a self-consistent RNA metabolism is increasingly supported as multidomain RNA polymerases of the type presented here are developed further; however, many obstacles remain. One main challenge is strand displacement (Szostak et al., 2001; Joyce, 2002; Cheng and Unrau, 2010; Joyce and Szostak, 2018). In an RNA World, template-directed RNA replication requires two polymerization reactions. The first generates a daughter strand complementary to the template, which in turn must serve as the template for a second polymerization reaction that generates a product with the same sequence as the original template. To carry out full cycles of replication and create functional single-stranded RNA genes, RNA polymerases must find a way to overcome the highly stable duplex products that result from both polymerization reactions.

To address strand displacement, several prebiotically plausible non-enzymatic processes have been explored to facilitate nucleic acid replication. Using solvent viscosity (via glycholine) and temperature cycling, the Hud group was able to alter the mobility and annealing kinetics of nucleic acids (He et al., 2017). With this method, they demonstrated template-directed assembly of eleven 32-nt RNA oligonucleotides by greatly slowing the re-annealing of the 545-nt full-length duplex template. However, since this method relies on size differences between oligonucleotides, it is not compatible with shorter templates or with the larger RNA polymerase ribozymes. Fluctuations in pH (Mariani et al., 2018) and in salt concentration through miniaturized water cycles (Ianeselli et al., 2019) have also been shown to drive RNA strand separation; however, no RNA replication has yet been demonstrated under these conditions. In a unique approach, the Szostak group showed template-directed non-enzymatic primer extension using partial strand separation (Zhou et al., 2019). Through a toehold/branch migration process, short oligonucleotides (invaders) rescued primer extension by partially stripping off blockers that lie ahead of the primer and inhibit its extension (Fig. 3.1). With multiple invaders, this method allows the primer to be extended by multiple nucleotides, albeit in a limited fashion. Although it offers a potential solution to strand inhibition, a full replication cycle by this method would require a large supply of multiple sequence-specific short invader oligonucleotides complementary to both (+) and (-) strands, which could form unproductive invader-invader complexes.

Figure 3.1. Strand displacement for non-enzymatic primer extension. An RNA blocker (green) inhibits primer extension by hybridizing to a complementary region of the template (blue) ahead of the primer (brown). A short oligonucleotide, the invader (red), sequesters the blocker, rescuing primer extension. An activated 5'-5'-phosphorimidazolium-bridged dinucleotide then extends the primer by one nucleotide. Source: Zhou et al. (2019).

In early evolution, it is highly likely that RNA replicases would have used strategies similar to present-day mechanisms. Protein DNA-dependent RNA polymerases (DdRPs) form stable transcription bubbles within multiprotein complexes, in which the DNA template is locally melted to allow RNA synthesis on a single-stranded template (Griesenbeck et al., 2017). When the DNA-RNA hybrid reaches a certain length, the RNA transcript separates from the DNA and enters the RNA exit channel, yielding a single-stranded RNA product (Andrecka et al., 2008). By using the energy of phosphodiester bond formation, these polymerases perform strand displacement of the RNA product as it is being made. It is conceivable that a polymerase ribozyme with similar properties could be developed. However, such a ribozyme would require two novel functionalities not seen in any current in vitro selected polymerase ribozyme (Cheng and Unrau, 2010). First, the ribozyme must be able to recognize and initiate polymerization from a duplex RNA, similar to how DdRPs use dsDNA promoters. Second, the ribozyme must possess helicase activity to melt the RNA duplex, exposing a single-stranded template. With these two properties, a polymerase ribozyme would be capable of asymmetric replication of one duplex strand, producing single-stranded RNA products.

Nature, however, provides a simpler example in the replication of viroids and hepatitis delta virus (HDV). Viroids are small (~250-400 nt), single-stranded, circular noncoding RNAs that infect and replicate in plants (Wang, 2021), while the HDV RNA is larger (~1700 nt), encodes a single protein and depends on hepatitis B virus for transmission (Lasda and Parker, 2014). The two viroid families, Pospiviroidae and Avsunviroidae, are distinguished by their replication mechanisms. Both viroid families and HDV replicate by recruiting host DdRPs to a defined transcription initiation site to perform RNA-templated replication. Pospiviroidae replicate via an asymmetric single rolling-circle mechanism, in which a circular (+) RNA template is used to generate linear oligomeric (-) RNA intermediates, which in turn template the synthesis of more oligomeric (+) RNAs (Flores et al., 2009) (Fig. 3.2, bottom). The resulting oligomeric (+) products are cleaved into unit lengths and ligated into circles by host enzymes. Avsunviroidae and HDV replicate via a symmetric double rolling-circle mechanism, in which the circular (+) RNA genomes are first transcribed into oligomeric (-) RNAs, which are then cleaved into unit lengths and ligated into circles. The circular (-) RNAs then template the transcription of oligomeric (+) RNAs that are cleaved and ligated into monomeric circular genomes (Flores et al., 2011) (Fig. 3.2, top). In contrast to Pospiviroidae, cleavage of the oligomeric copies in Avsunviroidae and HDV is carried out by the encoded cis-acting hammerhead and HDV ribozymes, respectively. While the ligation mechanism is unclear and might involve autocatalytic ligation by the encoded ribozymes, it is generally accepted that some of these RNAs require host enzymes (Nohales et al., 2012).

Figure 3.2. Symmetric and asymmetric rolling-circle replication of viroids and HDV. Symmetric, double rolling-circle pathway (top) used by the Avsunviroidae viroid family and HDV. Asymmetric, single rolling-circle pathway (bottom) used by the Pospiviroidae viroid family. Solid lines represent (+) strands and open lines represent (-) strands. Processing of oligomeric products is indicated by arrowheads, where ribozyme cleavage is indicated by RZ and host enzyme cleavage by HE. Source: Flores et al. (2011).

Given the replication strategies of viroids and HDV, it is conceivable that RNA replicases, and especially the CP ribozyme, could be evolved to use a similar strategy. In an RNA World scenario, circular templates offer several advantages. The circular nature of the template ensures replication of the full genome regardless of the initiation start site and can facilitate rearrangement of the genetic information within the RNA genome (Lasda and Parker, 2014). Additionally, circular templates can constrain RNA folding, enhancing polymerization on single-stranded templates by minimizing stable RNA secondary structures. The latter is consistent with observations from the CP ribozyme, whose extension efficiency is three-fold higher on circular than on linear templates (Fig. 2.15). The challenge, however, lies in evolving strand displacement capabilities in the CP ribozyme. Currently, the CP ribozyme is incapable of strand invasion (Fig. 2.19B, 2.21B and 3.3). Fully or partially hybridized blocking DNA oligonucleotides positioned zero or three nucleotides from the P1 specificity-primer promoter start site on cT1 completely inhibited primer extension (Fig. 3.3). This inability to unwind an upstream duplex would prevent rolling-circle replication.

To achieve strand displacement, the current embryonic CP topological clamp requires further in vitro evolution to maintain a more tightly closed state on the template. Current experiments show that the closed CP state (CPCLOSED) can slide over duplex RNA (Fig. 2.19 and 2.21), suggesting that the polymerase can slide in both directions on a circular template. In bacteria, only the open clamp domain is large enough to accommodate dsDNA, while the closed clamp domain can accommodate only ssDNA (Chakraborty et al., 2012). It is conceivable that mutations in the current CP clamp, or the selection of a clamp-enhancing domain, could give the CP ribozyme the same template specificity. Once a promoter region is recognized by the CP ribozyme and the specificity primer is stripped off, the enhanced clamp domain could close ahead of the double-stranded promoter onto the single-stranded RNA template (Fig. 3.4, left). Specificity for single-stranded RNA would force the polymerase to become unidirectional (moving forward only), facilitating symmetric rolling-circle replication as seen in Avsunviroidae viroids and HDV. As in these systems, the CP ribozyme would require two polymerization reactions to fully replicate a template. The first reaction would proceed around a single-stranded circular (+) RNA template, producing a highly stable duplex (Fig. 3.4, middle). If it could distinguish between a single-stranded template and an extended duplex, the polymerase might be evolved to push against the duplex using the force generated by NTP incorporation, facilitating strand invasion (Fig. 3.4, right). The disruption of upstream duplex base pairs would be thermodynamically compensated by the formation of a downstream base pair as the polymerase moves forward, resulting in displaced linear oligomeric (-) RNA products. To process these products, the circular genome could contain a hammerhead or hairpin ribozyme motif to facilitate cleavage and ligation. The resulting monomeric circular (-) RNAs could then template the transcription of oligomeric (+) RNAs, which in turn could be cleaved and ligated by ribozyme motifs into the original monomeric circular genomes. Coupled with the CP ribozyme’s ability to initiate replication, an enhanced clamping domain could facilitate rolling-circle replication of both (+) and (-) strands, allowing for an autocatalytic system that needs only primers and nucleotides, provided that the ribozyme’s fidelity is high enough to copy the templates accurately.

Figure 3.3. CP ribozyme shows no strand displacement activity. (A) Sequence of complementary DNA blocking oligonucleotides and their hybridization region on the template (cT1). (B) P1 (0.1 μM) extensions by the CP ribozyme (0.1 μM) on cT1 (0.12 μM) with blocking DNA oligonucleotides (0.6 μM) in polymerization buffer (100 mM MgCl2, 100 mM KCl, 100 mM Tris-HCl at pH 8.5).

Figure 3.4. Proposed strand invasion by an evolved CP ribozyme. After the specificity primer directs the polymerase ribozyme to the template, the clamping domain closes tightly around the single-stranded template 3' of the primer end about to be extended (left). The first copying reaction generates a complementary daughter strand in a stable duplex with the original template (middle). After a full rotation around the template, the improved clamping domain reaches the duplex and pushes against it to facilitate strand invasion (right).

3.1.2. Fidelity

Bacterial and eukaryotic RNAPs have an estimated error rate of less than 10⁻⁵ (Sydow and Cramer, 2009). Ribozyme-catalyzed template-dependent RNA polymerization, on the other hand, is plagued by low fidelity. The maximum genome length that can be maintained during evolution is inversely proportional to the per-nucleotide error rate of the polymerase ribozyme (Eigen, 1971). If the genome exceeds this length, the error threshold is surpassed and the encoded information is lost through accumulated mutations. However, the error threshold can be relaxed when the functional phenotype is maintained through neutral and compensatory mutations (Kun et al., 2005).

The primary objective over the last two decades has been to use directed evolution to produce highly processive polymerase ribozymes capable of long-range extension, with fidelity taking a secondary role. However, as polymerization activity and template generality increase, fidelity remains a challenge for transcribing active ribozymes from an RNA template. Depending on the template sequence, current polymerase ribozymes show a fidelity of ~70-97% (Johnston et al., 2001; Zaher and Unrau, 2007; Wochner et al., 2011; Attwater et al., 2013; Horning and Joyce, 2016; Tjhung et al., 2020), which at best allows an RNA genome of 30-35 nucleotides to be maintained through replication (Eigen, 1971). A template sequence selection highlighted the preference of polymerase ribozyme variants for templates with little secondary structure, high C content and low G content (Attwater et al., 2013).
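Eigen's error threshold is often written as ν_max ≈ ln σ / (1 − q), where q is the per-nucleotide copying fidelity and σ is the selective superiority of the master sequence. Taking ln σ ≈ 1 (an assumption made here for illustration) gives ν_max ≈ 1/(1 − q). For q = 0.97, this yields about 33 nt, consistent with the 30-35 nt limit quoted above. The minimal Python sketch below evaluates this approximation over fidelities mentioned in this section.

```python
import math


def max_genome_length(fidelity, superiority=math.e):
    """Eigen's error threshold, nu_max ~= ln(sigma) / (1 - q).

    fidelity:    per-nucleotide copying fidelity q (0 < q < 1)
    superiority: selective advantage sigma of the master sequence; the default
                 e (an illustrative assumption) makes ln(sigma) = 1, giving the
                 common approximation nu_max ~= 1 / (1 - q).
    """
    if not 0.0 < fidelity < 1.0:
        raise ValueError("fidelity must lie strictly between 0 and 1")
    return math.log(superiority) / (1.0 - fidelity)


# Fidelities quoted for current polymerase ribozymes (~70-97%, 91% -> 97.4%)
for q in (0.70, 0.91, 0.97, 0.974):
    print(f"q = {q:.3f} -> about {max_genome_length(q):.0f} nt")
```

Because ν_max scales with 1/(1 − q), halving the error rate doubles the maintainable genome length. This is why even small fidelity gains near the top of the range matter.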

Additionally, errors were concentrated towards the 3' ends of products, with misincorporations often terminating further extension. Recent in vitro evolution efforts have focused on selecting RNA polymerase ribozymes that can synthesize functional RNA molecules, including hammerhead ribozymes, RNA aptamers, tRNA and the polymerase’s own catalytic subunit, the class I ligase (Wochner et al., 2011; Horning and Joyce, 2016; Tjhung et al., 2020). The most successful attempt at improving fidelity selected RNAs based on their ability to synthesize a catalytically active motif of the hammerhead ribozyme (Tjhung et al., 2020). Applying adaptive pressure for fidelity also proved successful in increasing the per-position fidelity from ~91% to 97.4% for a polymerase ribozyme that uses trinucleotide triphosphates as substrates (Attwater et al., 2018). In addition, an RNA polymerase ribozyme was evolved to incorporate 6-thioguanosine triphosphate (6SGTP) more efficiently than canonical nucleotides, showcasing the ability of ribozymes to adapt to new substrates (Akoopie and Müller, 2018). Despite these advances, the fidelity of RNA polymerase ribozymes remains modest, and debilitating mutations limit both the yield and the specific activity of the synthesized functional RNAs.

Fidelity was outside the scope of the CP ribozyme selection; it was therefore not surprising that the round 28 pool misincorporated nucleotides when only one or two nucleotides were present during polymerization (Fig. 3.5). In particular, U and G could be misincorporated to produce low yields of 15- to 20-nt products. Modern RNAPs form stable transcription bubbles, precisely positioning the template, incoming nucleotides and magnesium ions for rapid and accurate transcription (Griesenbeck et al., 2017). The current methods used to evolve RNA polymerase ribozymes with improved fidelity tether the ribozyme to the template or primer, creating a high local concentration of primer-template relative to the polymerase in order to mimic processivity. While these methods work for synthesizing long RNA products, they do not overcome the low affinity of the ribozyme for the primer-template duplex (Lawrence and Bartel, 2003). Thus, polymerization occurs in short stretches, through multiple associations of the ribozyme with the primer-template. Each association requires proper orientation of the primer-template duplex in the catalytic site of the ribozyme, offering many opportunities for nucleotide misincorporation. A unidirectional CP ribozyme with improved clamping, as described in Section 3.1.1, could mimic modern biology by holding the primer-template duplex in the catalytic site without releasing it between nucleotide additions. While possible, this may require further evolution, with direct selective pressure on the polymerase to synthesize more accurate and complex products, for fidelity to improve substantially.

Figure 3.5. Misincorporation of nucleotides during polymerization. (A) Sequence of template being copied. (B) P1.4 (0.1 μM, Table 2.3) extensions by the Round 28 selection pool (0.1 μM) on cT1.4 (0.12 μM, Table 2.4) with NTPs as indicated (4 mM/each) in polymerization buffer (100 mM MgCl2, 100 mM KCl, 100 mM Tris-HCl at pH 8.5).

3.1.3. Regulation of replication

The CP ribozyme’s ability to synthesize part of its own specificity primer provides a mechanism by which the ribozyme could have avoided replicative parasites, by initiating polymerization from promoter-templates based on its own sequence (Fig. 2.22 and 2.23). However, could this functionality also evolve to serve as a regulator of replication? In all branches of eubacteria, a small non-coding RNA, 6S RNA, is used as a regulator of transcription (Barrick et al., 2005). Under low-nutrient conditions, 6S RNA resembles an open promoter bubble and competes with DNA promoters for binding to the housekeeping holoenzyme form of E. coli RNA polymerase (Eσ70) (Wassarman and Storz, 2000). By sequestering and inhibiting the holoenzyme, 6S RNA facilitates efficient use of cellular resources in poor nutrient environments. Under nutrient-rich conditions, the normally DNA-dependent RNA polymerase uses 6S RNA as a template to synthesize a short product RNA (pRNA), which in turn mediates release of the RNA polymerase from 6S RNA, returning it to normal transcription (Wassarman, 2018). Likewise, it is conceivable that the primer-binding site of the CP ribozyme (Fig. 2.1) could be evolved to act as a regulator of template-directed polymerization. Currently, the CP ribozyme can efficiently extend a 7-nt 3'-truncated primer, tagging it with the ribozyme’s sequence, which in turn allows it to control promoter-based extension (Fig. 2.23). In a primordial RNA World, the starting primer fragment could perhaps be much shorter and made abiotically. The Holliger group has demonstrated that a polymerase ribozyme variant evolved to use trinucleotide triphosphates (pppNNN) as substrates was able to initiate RNA synthesis from a single triplet (Attwater et al., 2018). Under low-NTP conditions, an ideal CP ribozyme would be found in a closed, inactive form, analogous to the 6S-sequestered Eσ70, unable to synthesize its own specificity primer. As nutrient conditions change and NTPs become available, the ribozyme could fully synthesize its specificity primer, switching to an open, active form that is again able to initiate replication of RNA templates. Such a system could facilitate efficient use of environmental resources at a time when control over the replication of one’s genome and other metabolically relevant RNAs would be essential for the propagation of an autocatalytic system.

3.2. New polymerase functionalities: RNA tailing

In eukaryotes, as part of RNA maturation, almost all RNA species undergo 3' processing, known as tailing. In this process, canonical poly(A) polymerases (PAPs) or terminal nucleotidyltransferases (TENTs) add non-templated poly(A), poly(U) or mixed tails (guanosines incorporated with adenosines) to the 3' end of RNAs (Laishram, 2014). RNA tailing is an important mechanism of post-transcriptional gene regulation, as tails can facilitate nuclear export, increase translation rate, enhance stability or promote degradation of RNA transcripts (Yu and Kim, 2020). Tailing of exogenous RNA has also been found to suppress viruses and other pathogens, making it an important defense mechanism (Warkocki et al., 2018).

Given how critical RNA tailing is in eukaryotes, it is interesting to speculate whether, in an RNA World, RNA polymerase ribozymes could have evolved to modify the 3' ends of RNA products, adding to their functionality. Such ribozymes might employ one of two mechanisms. First, like extant PAPs, they might function in a completely untemplated manner, with catalytic site residues and buried water molecules facilitating alignment of the incoming ATP and the subsequent chemistry (Balbo and Bohm, 2007). Alternatively, the ribozymes could contain short internal templates that template poly(N) additions while transiently binding and positioning RNA substrates in the catalytic site. Supporting evidence for the latter comes from the Müller group: in a cis system, an evolved polymerase ribozyme was able to tag its own 3' end with 6SGTP from an internal template (Akoopie and Müller, 2018). Likewise, the self-templated primer synthesis of the CP ribozyme provides an example of such an internally templated mechanism operating in trans.

Another primitive example comes from the selection pool of the CP ribozyme. After 23 rounds of selection, an undesired selection artifact was observed: the 3' ends of pool molecules were being extended in the absence of primer and template (Figs. 2.5 and 3.6). During the selection, a subset of RNA molecules was tailed with CTPB, which allowed their capture on streptavidin magnetic beads and their subsequent propagation from round to round. To further explore this activity, six additional selection rounds targeted at improving 3' tailing were carried out, starting from the round 23 pool (Fig. 3.7, Table 3.1, Section 3.2.1). For two rounds, the RNA pools were deliberately incubated with CTPB alone (Table 3.1). From round 23 to round 25, the fraction of pool molecules captured on streptavidin nearly doubled, from 6.6% to 12.8%, while α-32P-UTP extension assays showed an 85% improvement in 3' end extension (Fig. 3.6). To add selective pressure and to select for possible poly(C) tailing, four more rounds were carried out with CTPB in the presence of a 40-fold excess of CTP, giving a total improvement in α-32P-UTP incorporation of 120% relative to round 23 (Fig. 3.6). The resulting round 29 pool remains to be characterized; it is currently unknown whether this pool can add multiple terminal nucleotides, or whether it can incorporate nucleotides other than U and C. Nevertheless, this 3' tailing branch of the selection shows that two types of RNA polymerase ribozymes with distinct functions can evolve from the same selection pool. The CP ribozyme selection yielded general, primer-directed template extension, while the 3' tailing selection yielded rapid 3' self-tagging activity. It is therefore conceivable that, early in evolution, multiple populations of RNA polymerase ribozymes with different functions coexisted in the same environment, much as we see in modern biology. With further evolution of the 3' tailing polymerase ribozymes, it may be possible to develop a true poly(N) polymerase, illustrating how RNA maturation could have arisen early in an RNA World.
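The rationale for running the last four rounds in a 40-fold excess of CTP can be made concrete with a simple probability sketch. It rests on two assumptions that were not measured here. First, the ribozymes use CTP and CTPB with equal efficiency. Second, each added cytidine is chosen independently. Under these assumptions, the chance that any one added cytidine is biotinylated is 0.1/(4 + 0.1), or about 2.4%. A tail of n cytidines then carries at least one biotin with probability 1 - (1 - p)^n. Longer tails are therefore more likely to be captured on streptavidin. The function names below are illustrative only.

```python
def biotin_fraction(ctp_mM: float, ctpb_mM: float) -> float:
    """Expected fraction of incorporated cytidines carrying biotin.

    Assumes (hypothetically) that CTP and CTPB are incorporated with
    equal efficiency, so the fraction simply tracks their mole ratio.
    """
    return ctpb_mM / (ctp_mM + ctpb_mM)


def capture_probability(tail_length: int, p_biotin: float) -> float:
    """Probability that a tail of `tail_length` cytidines contains at
    least one biotinylated residue, assuming independent additions."""
    if tail_length < 0:
        raise ValueError("tail_length must be non-negative")
    return 1.0 - (1.0 - p_biotin) ** tail_length


if __name__ == "__main__":
    # Rounds 26SE-29SE in Table 3.1: 4 mM CTP + 0.1 mM CTPB
    p = biotin_fraction(4.0, 0.1)
    print(f"per-residue biotin fraction: {p:.3f}")
    for n in (1, 5, 10, 20, 50):
        print(f"tail of {n:>2} C: P(capture) = {capture_probability(n, p):.3f}")
```

The independence and equal-usage assumptions are the main simplifications. Any discrimination between CTP and the bulkier CTPB would shift these numbers, but not the qualitative trend that longer tails are captured more often.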

Figure 3.6. 3' tailing activity by selection round. Ribozyme pools (0.1 μM) were mixed with α-32P-UTP (0.3 μM) in 100 mM MgCl2, 100 mM KCl, 100 mM Tris-HCl at pH 8.5, and allowed to extend for 2 hours at 22°C. Products were resolved by 5% PAGE. The pools contained RNA molecules of two sizes, differing by ~20 nt, which give rise to the two observed bands; their relative abundance changed as selective pressure was applied.

3.2.1. Methods: self-extension in vitro selection scheme

To improve the 3' tailing activity observed at round 23, six additional rounds of selection were performed (Fig. 3.7, Table 3.1). Internally radiolabeled (32P) RNA pools (0.1 µM) were incubated for two hours at 22°C in PB (100 mM KCl, 100 mM MgCl2, 100 mM Tris-HCl pH 8.5 at 25°C) with one of two substrate mixes. The first was Biotin-11-cytidine-5'-triphosphate alone (1 mM, CTPB) (Fig. 3.7A). The second was CTP (4 mM) plus trace CTPB (0.1 mM) (Fig. 3.7B). Following this incubation, reaction mixtures were passed through buffer-equilibrated Performa DTR Gel Filtration Cartridges (EdgeBio) to remove unincorporated CTPB. They were then incubated for 20 min with streptavidin magnetic beads (prepared as in section 2.9.3: In vitro selection schemes). Bead-captured pool molecules were washed and reverse transcribed on the beads. The supernatant was then collected, PCR amplified, and in vitro transcribed to be fed into the next round of selection (as described in sections 2.9.4: RT of selection rounds and 2.9.5: Pool PCR amplification). The radiolabeled pool was tracked by scintillation counting at every step of the selection scheme.
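The pool was tracked by scintillation counting, and Table 3.1 reports a "% Pool Bound to Beads" value for each round. The exact formula is not stated. A natural reading is assumed in the sketch below: bead-retained counts divided by input counts, optionally after subtracting an instrument background from both readings. The function name, the background correction, and the example counts are all hypothetical.

```python
def percent_bound(input_cpm: float, bead_cpm: float, background_cpm: float = 0.0) -> float:
    """Percent of the radiolabeled pool retained on streptavidin beads.

    Hypothetical helper: subtracts an instrument background from both the
    input and the bead-associated counts, then expresses the bead counts as
    a percentage of the input. Negative net bead counts are clipped to zero.
    """
    net_input = input_cpm - background_cpm
    if net_input <= 0:
        raise ValueError("input counts must exceed background")
    net_bound = max(bead_cpm - background_cpm, 0.0)
    return 100.0 * net_bound / net_input


if __name__ == "__main__":
    # Made-up counts chosen only to illustrate the calculation
    print(f"{percent_bound(50_000, 3_300):.1f}% bound")
    print(f"{percent_bound(10_100, 1_380, background_cpm=100):.1f}% bound")
```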


Figure 3.7. Schematic of the 3' tailing in vitro selection schemes. (A) RNA pool molecules were incubated with 1 mM biotinylated CTP (CTPB). Functional molecules tagged with CTPB were captured on streptavidin magnetic beads (S) and washed. (B) To add selective pressure and to select for possible poly(C) tailing, RNA pool molecules were incubated with 4 mM CTP and 0.1 mM CTPB. As before, functional molecules were captured on streptavidin magnetic beads and washed.

Conditions of Selection Rounds
Round | Substrates | Incubation (h) | PBK buffer wash (n × min) | ddH2O wash (n × min) | % pool bound to beads
23 | ATP (4 mM) + GTP (4 mM) + UTP (4 mM) + CTPB (1 mM) | 2 | 3 × 2 | - | 6.6
24SE | CTPB (1 mM) | 2 | 4 × 2 | - | 8.4
25SE | CTPB (1 mM) | 2 | 2 × 2 | 2 × 2 | 12.8
26SE | CTP (4 mM) + CTPB (0.1 mM) | 2 | 2 × 2 | 2 × 2 | 0.5
27SE | CTP (4 mM) + CTPB (0.1 mM) | 2 | 2 × 2 | 2 × 2 | 0.6
28SE | CTP (4 mM) + CTPB (0.1 mM) | 2 | 2 × 2 | 2 × 2 | 1.1
29SE | CTP (4 mM) + CTPB (0.1 mM) | 2 | 2 × 2 | 2 × 2 | 1.1

Table 3.1. Self-extension selection conditions. SE indicates that the selection round followed the self-extension selection scheme; a dash (-) indicates that the specified condition was not used.
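Table 3.1 reports the fraction of the pool captured in each round, so the enrichment described in the text can be recomputed directly. The short Python sketch below transcribes the table's last column and computes fold changes between rounds. The dictionary and helper names are illustrative and not part of the original analysis. The sketch reproduces the near doubling from round 23 to 25SE (12.8/6.6, about 1.94-fold). It also shows that capture in the CTP-excess rounds, although much lower in absolute terms, rose about 2.2-fold between 26SE and 29SE (0.5% to 1.1%). Capture values from the two schemes are not directly comparable, because the CTPB concentration and the CTP competitor differ between them.

```python
# % pool bound to beads per round, transcribed from Table 3.1
PERCENT_BOUND: dict[str, float] = {
    "23": 6.6,
    "24SE": 8.4,
    "25SE": 12.8,
    "26SE": 0.5,
    "27SE": 0.6,
    "28SE": 1.1,
    "29SE": 1.1,
}


def fold_change(values: dict[str, float], start: str, end: str) -> float:
    """Ratio of the capture fraction in round `end` to that in round `start`."""
    return values[end] / values[start]


if __name__ == "__main__":
    # CTPB-only rounds (scheme A), compared with the round 23 starting pool
    print(f"23 -> 25SE: {fold_change(PERCENT_BOUND, '23', '25SE'):.2f}-fold")
    # CTP-excess rounds (scheme B), compared within the same scheme
    print(f"26SE -> 29SE: {fold_change(PERCENT_BOUND, '26SE', '29SE'):.2f}-fold")
```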

3.3. Conclusion

Our understanding of naturally occurring ribozymes in modern biology is rapidly evolving. It is becoming clear that RNA cleavage and ligation play important roles in RNA-mediated gene regulation and in the replicative life cycles of a number of plant and other eukaryotic RNA viruses. Centrally important ribozymes such as the ribosome still exist in modern metabolism and are responsible for all extant protein synthesis, which points to the central importance of RNA early in evolution. Through tools such as in vitro selection and evolution, ribozymes have been shown to catalyze a broad range of phosphoryl transfer reactions. These reactions are of direct relevance to the replication and manipulation of RNA in an early RNA World. The potential for a self-sustaining RNA metabolism is therefore increasingly supported by laboratory-selected ribozymes capable of metabolically relevant chemistry.

The existence of multidomain ribozymes such as RNase P and the ribosome further suggests that multidomain functionality was important for ribozymes early in the RNA World. The multidomain RNA polymerase ribozymes artificially evolved in the laboratory lend further credence to this hypothesis. Over the course of my PhD, I have created a polymerase ribozyme (CP) that is self-priming, capable of promoter recognition, and has integrally coupled processivity. This ribozyme helps demonstrate how RNA polymerase ribozymes could have preferentially recognized their own genomes and replicated specific gene targets in a primordial RNA World. Despite the progress made to date, many outstanding challenges remain in the field, including strand displacement, fidelity, and regulatory mechanisms. With the new functionalities of the CP ribozyme, we are one step closer to mimicking the origin of life by evolving RNA in a test tube.

References

Abelson, J., Trotta, C.R., Li, H., 1998. tRNA Splicing. J. Biol. Chem. 273, 12685–12688. Akoopie, A., Müller, U.F., 2018. The NTP binding site of the polymerase ribozyme. Nucleic Acids Research. Akouche, M., Jaber, M., Maurel, M.-C., Lambert, J.-F., Georgelin, T., 2017. Phosphoribosyl Pyrophosphate: A Molecular Vestige of the Origin of Life on Minerals. Angewandte Chemie International Edition 56, 7920–7923. Amitsur, M., Levitz, R., Kaufmann, G., 1987. Bacteriophage T4 anticodon nuclease, polynucleotide kinase and RNA ligase reprocess the host lysine tRNA. The EMBO Journal 6, 2499–2503. Andersen, A.A., Collins, R.A., 2000. Rearrangement of a stable RNA secondary structure during VS ribozyme catalysis. Mol Cell 5, 469–478. Andrecka, J., Lewis, R., Brückner, F., Lehmann, E., Cramer, P., Michaelis, J., 2008. Single-molecule tracking of mRNA exiting from RNA polymerase II. Proc Natl Acad Sci U S A 105, 135–140. Andreini, C., Bertini, I., Cavallaro, G., Holliday, G.L., Thornton, J.M., 2008. Metal ions in biological catalysis: from enzyme databases to general principles. J Biol Inorg Chem 13, 1205–1218. Attwater, J., Raguram, A., Morgunov, A.S., Gianni, E., Holliger, P., 2018. Ribozyme-catalysed RNA synthesis using triplet building blocks. eLife Sciences 7, e35255. Attwater, J., Wochner, A., Holliger, P., 2013. In-ice evolution of RNA polymerase ribozyme activity. Nat Chem 5, 1011–1018. Autour, A., Jeng, S.C.Y., Cawte, A.D., Abdolahzadeh, A., Galli, A., Panchapakesan, S.S.S., Rueda, D., Ryckelynck, M., Unrau, P.J., 2018. Fluorogenic RNA Mango aptamers for imaging small non-coding RNAs in mammalian cells. Nat Commun 9, 656. Balbo, P.B., Bohm, A., 2007. Mechanism of Poly(A) Polymerase: Structure of the enzyme-MgATP–RNA ternary complex and kinetic analysis. Structure 15, 1117–1131. Balke, D., Zieten, I., Strahl, A., Müller, O., Müller, S., 2014. Design and Characterization of a Twin Ribozyme for Potential Repair of a Deletion Mutation within the Oncogenic CTNNB1-ΔS45 mRNA. 
ChemMedChem 9, 2128–2137. Ban, N., Nissen, P., Hansen, J., Moore, P.B., Steitz, T.A., 2000. The Complete Atomic Structure of the Large Ribosomal Subunit at 2.4 Å Resolution. Science 289, 905–920. Barrick, J.E., Sudarsan, N., Weinberg, Z., Ruzzo, W.L., Breaker, R.R., 2005. 6S RNA is a widespread regulator of eubacterial RNA polymerase that resembles an open promoter. RNA 11, 774–784. Bartel, D.P., Szostak, J.W., 1993. Isolation of New Ribozymes from a Large Pool of Random Sequences. Science 261, 1411–1418. Beckert, B., Nielsen, H., Einvik, C., Johansen, S.D., Westhof, E., Masquida, B., 2008. Molecular modelling of the GIR1 branching ribozyme gives new insight into evolution of structurally related ribozymes. EMBO J 27, 667–678. Benkovic, S.J., Valentine, A.M., Salinas, F., 2001. Replisome-Mediated DNA Replication. Annual Review of Biochemistry 70, 181. Benner, S.A., Ellington, A.D., Tauer, A., 1989. Modern metabolism as a palimpsest of the RNA world. PNAS 86, 7054–7058.

Berg, J.M., Tymoczko, J.L., Stryer, L., 2002. Nucleoside Monophosphate Kinases: Catalyzing Phosphoryl Group Exchange between Nucleotides Without Promoting Hydrolysis. Biochemistry. 5th edition. Bergman, N.H., Johnston, W.K., Bartel, D.P., 2000. Kinetic Framework for Ligation by an Efficient RNA Ligase Ribozyme. Biochemistry 39, 3115–3123. Beringer, M., Rodnina, M.V., 2007. The Ribosomal Peptidyl Transferase. Molecular Cell 26, 311–321. Berti, P.J., 1999. Determining transition states from kinetic isotope effects. Meth. Enzymol. 308, 355–397. Biondi, E., Maxwell, A.W.R., Burke, D.H., 2012. A small ribozyme with dual-site kinase activity. Nucleic Acids Res 40, 7528–7540. Biondi, E., Nickens, D.G., Warren, S., Saran, D., Burke, D.H., 2010. Convergent donor and acceptor substrate utilization among kinase ribozymes. Nucleic Acids Res 38, 6785–6795. Biondi, E., Poudyal, R.R., Forgy, J.C., Sawyer, A.W., Maxwell, A.W.R., Burke, D.H., 2013. Lewis acid catalysis of phosphoryl transfer from a copper(II)-NTP complex in a kinase ribozyme. Nucleic Acids Res 41, 3327–3338. Bouchard, P., Legault, P., 2014. A remarkably stable kissing-loop interaction defines substrate recognition by the Neurospora Varkud Satellite ribozyme. RNA 20, 1451–1464. Briney, B., Inderbitzin, A., Joyce, C., Burton, D.R., 2019. Commonality despite exceptional diversity in the baseline human antibody repertoire. Nature 566, 393–397. Butcher, S.E., 2011. The spliceosome and its metal ions. Met Ions Life Sci 9, 235–251. Buzayan, J.M., Gerlach, W.L., Bruening, G., 1986. Non-enzymatic cleavage and ligation of RNAs complementary to a plant virus satellite RNA. Nature 323, 349–353. Cadwell, R.C., Joyce, G.F., 1994. Mutagenic PCR. Genome Res. 3, S136–S140. Canny, M.D., Jucker, F.M., Pardi, A., 2007. Efficient Ligation of the Schistosoma Hammerhead Ribozyme. Biochemistry 46, 3826–3834. Cech, T.R., 1990. Self-Splicing of Group I Introns. Annual Review of Biochemistry 59, 543–568. 
Chakraborty, A., Wang, D., Ebright, Y.W., Korlann, Y., Kortkhonjia, E., Kim, T., Chowdhury, S., Wigneshweraraj, S., Irschik, H., Jansen, R., Nixon, B.T., Knight, J., Weiss, S., Ebright, R.H., 2012. Opening and Closing of the Bacterial RNA Polymerase Clamp. Science 337, 591–595. Cheetham, G.M.T., Steitz, T.A., 1999. Structure of a Transcribing T7 RNA Polymerase Initiation Complex. Science 286, 2305–2309. Chen, X., Li, N., Ellington, A.D., 2007. Ribozyme Catalysis of Metabolism in the RNA World. Chemistry & Biodiversity 4, 633–655. Chen, Y.G., Kowtoniuk, W.E., Agarwal, I., Shen, Y., Liu, D.R., 2009. LC/MS analysis of cellular RNA reveals NAD-linked RNA. Nat Chem Biol 5, 879–881. Cheng, L.K.L., Unrau, P.J., 2010. Closing the Circle: Replicating RNA with RNA. Cold Spring Harbor Perspectives in Biology 2, a002204. Chin, K., Pyle, A.M., 1995. Branch-point attack in group II introns is a highly reversible transesterification, providing a potential proofreading mechanism for 5’-splice site selection. RNA 1, 391–406. Cochrane, J.C., Strobel, S.A., 2008. Catalytic Strategies of Self-Cleaving Ribozymes. Acc. Chem. Res. 41, 1027–1035. Cojocaru, R., Unrau, P.J., 2021. Processive RNA polymerization and promoter recognition in an RNA World. Science 371, 1225–1232.

Collins, R.A., 2002. The Neurospora Varkud satellite ribozyme. Biochemical Society Transactions 30, 1122–1126. Crick, F.H.C., 1968. The origin of the genetic code. Journal of Molecular Biology 38, 367–379. Curtis, E.A., Bartel, D.P., 2005. New catalytic structures from an existing ribozyme. Nature Structural & Molecular Biology 12, 994–1000. Dagenais, P., Girard, N., Bonneau, E., Legault, P., 2017. Insights into RNA structure and dynamics from recent NMR and X‐ray studies of the Neurospora Varkud satellite ribozyme. Wiley Interdiscip Rev RNA 8. Daniels, D.L., Michels Jr, W.J., Pyle, A.M., 1996. Two Competing Pathways for Self-splicing by Group II Introns: A Quantitative Analysis of in Vitro Reaction Rates and Products. Journal of Molecular Biology 256, 31–49. De la Peña, M., Gago, S., Flores, R., 2003. Peripheral regions of natural hammerhead ribozymes greatly increase their self-cleavage activity. EMBO J 22, 5561–5570. De la Peña, M., García-Robles, I., 2010. Ubiquitous presence of the hammerhead ribozyme motif along the tree of life. RNA 16, 1943–1950. Decatur, W.A., Einvik, C., Johansen, S., Vogt, V.M., 1995. Two group I ribozymes with different functions in a nuclear rDNA intron. EMBO J 14, 4558–4568. DeYoung, M.B., Siwkowski, A.M., Lian, Y., Hampel, A., 1995. Catalytic Properties of Hairpin Ribozymes Derived from Chicory Yellow Mottle Virus and Arabis Mosaic Virus Satellite RNAs. Biochemistry 34, 15785–15791. Donghi, D., Schnabl, J., 2011. Multiple roles of metal ions in large ribozymes. Met Ions Life Sci 9, 197–234. Doudna, J., Couture, S., Szostak, J., 1991. A multisubunit ribozyme that is a catalyst of and template for complementary strand RNA synthesis. Science 251, 1605–1608. Doudna, J.A., Szostak, J.W., 1989. RNA-catalysed synthesis of complementary-strand RNA. Nature 339, 519–522. Doudna, J.A., Usman, N., Szostak, J.W., 1993. Ribozyme-catalyzed primer extension by trinucleotides: A model for the RNA-catalyzed replication of RNA. Biochemistry 32, 2111–2115. 
Draper, W.E., Hayden, E.J., Lehman, N., 2008. Mechanisms of covalent self-assembly of the Azoarcus ribozyme from four fragment oligonucleotides. Nucleic Acids Res 36, 520–531. Drew, K.N., Zajicek, J., Bondo, G., Bose, B., Serianni, A.S., 1998. 13C-labeled aldopentoses: detection and quantitation of cyclic and acyclic forms by heteronuclear 1D and 2D NMR spectroscopy. Carbohydrate Research 307, 199–209. Dujon, B., 1989. Group I introns as mobile genetic elements: Facts and mechanistic speculations - a review. Gene 82, 91–114. Durniak, K.J., Bailey, S., Steitz, T.A., 2008. The Structure of a Transcribing T7 RNA Polymerase Complex Captured During Its Transition from Initiation to Elongation. Science 322, 553–557. Eigen, M., 1971. Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58, 465–523. Ekland, E.H., Bartel, D.P., 1995. The secondary structure and sequence optimization of an RNA ligase ribozyme. Nucleic Acids Res 23, 3231–3238. Ekland, E.H., Szostak, J.W., Bartel, D.P., 1995. Structurally complex and highly active RNA ligases derived from random RNA sequences. Science 269, 364–370. Ellington, A.D., Szostak, J.W., 1990. In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818–822.

Ellis, J.C., Brown, J.W., 2009. The RNase P family. RNA Biology 6, 362–369. Etaix, E., Orgel, L.E., 1978. Phosphorylation of nucleosides in aqueous solution using trimetaphosphate: Formation of nucleoside triphosphates. J. Carbohydrates Nucleosides Nucleotides 5, 91–110. Fedor, M.J., 1999. Tertiary Structure Stabilization Promotes Hairpin Ribozyme Ligation. Biochemistry 38, 11040–11050. Fedor, M.J., 2000. Structure and function of the hairpin ribozyme. Journal of Molecular Biology 297, 269–291. Fedor, M.J., Williamson, J.R., 2005. The catalytic diversity of RNAs. Nature Reviews Molecular Cell Biology 6, 399–412. Feldmann, W., 1967. Das Trimetaphosphat als Triphosphorylierungsmittel für Alkohole und Kohlenhydrate in wäßriger Lösung. Seine Sonderstellung unter den kondensierten Phosphaten. Chem. Ber. 3850–3860. Ferré-D’Amaré, A.R., Zhou, K., Doudna, J.A., 1998. Crystal structure of a hepatitis delta virus ribozyme. Nature 395, 567–574. Ferretti, A.C., Joyce, G.F., 2013. Kinetic Properties of an RNA Enzyme that Undergoes Self-Sustained Exponential Amplification. Biochemistry 52, 1227–1235. Fica, S.M., Tuttle, N., Novak, T., Li, N.-S., Lu, J., Koodathingal, P., Dai, Q., Staley, J.P., Piccirilli, J.A., 2013. RNA catalyzes nuclear pre-mRNA splicing. Nature 503, 229–234. Flores, R., Gas, M.-E., Molina-Serrano, D., Nohales, M.-Á., Carbonell, A., Gago, S., De la Peña, M., Daròs, J.-A., 2009. Viroid Replication: Rolling-Circles, Enzymes and Ribozymes. Viruses 1, 317–334. Flores, R., Grubb, D., Elleuch, A., Nohales, M.-Á., Delgado, S., Gago, S., 2011. Rolling-circle replication of viroids, viroid-like satellite RNAs and hepatitis delta virus: Variations on a theme. RNA Biology 8, 200–206. Frank, D.N., Adamidi, C., Ehringer, M.A., Pitulle, C., Pace, N.R., 2000. Phylogenetic-comparative analysis of the eukaryal ribonuclease P RNA. RNA 6, 1895–1904. Galej, W.P., Oubridge, C., Newman, A.J., Nagai, K., 2013. 
Crystal structure of Prp8 reveals active site cavity of the spliceosome. Nature 493, 638–643. Galej, W.P., Toor, N., Newman, A.J., Nagai, K., 2018. Molecular Mechanism and Evolution of Nuclear Pre-mRNA and Group II Intron Splicing: Insights from Cryo-Electron Microscopy Structures. Chem. Rev. 118, 4156–4176. Gebetsberger, J., Micura, R., 2017. Unwinding the twister ribozyme: from structure to mechanism. Wiley Interdiscip Rev RNA 8. Gilbert, W., 1986. Origin of life: The RNA world. Nature 319, 618. Green, R., Lorsch, J.R., 2002. The Path to Perdition Is Paved with Protons. Cell 110, 665–668. Green, R., Szostak, J., 1992. Selection of a ribozyme that functions as a superior template in a self-copying reaction. Science 258, 1910–1915. Griesenbeck, J., Tschochner, H., Grohmann, D., 2017. Structure and Function of RNA Polymerases and the Transcription Machineries. In: Harris, J.R., Marles-Wright, J. (Eds.), Macromolecular Protein Complexes: Structure and Function, Subcellular Biochemistry. Springer International Publishing, Cham, pp. 225–270. Gruber, T.M., Gross, C.A., 2003. Multiple Sigma Subunits and the Partitioning of Bacterial Transcription Space. Annu. Rev. Microbiol. 57, 441–466. Guenther, U.-P., Yandek, L.E., Niland, C.N., Campbell, F.E., Anderson, D., Anderson, V.E., Harris, M.E., Jankowsky, E., 2013. Hidden specificity in an apparently non-specific RNA-binding protein. Nature 502, 385–388. Guerrier-Takada, C., Gardiner, K., Marsh, T., Pace, N., Altman, S., 1983. The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell 35, 849–857.

Hager, A.J., Szostak, J.W., 1997. Isolation of novel ribozymes that ligate AMP-activated RNA substrates. Chemistry & Biology 4, 607–617. Hampel, A., Tritz, R., 1989. RNA catalytic properties of the minimum (-)sTRSV sequence. Biochemistry 28, 4929–4933. Hanna, R., Doudna, J.A., 2000. Metal ions in ribozyme folding and catalysis. Current Opinion in Chemical Biology 4, 166–170. Harris, K.A., Lünse, C.E., Li, S., Brewer, K.I., Breaker, R.R., 2015. Biochemical analysis of pistol self-cleaving ribozymes. RNA 21, 1852–1858. Hartmann, E., Hartmann, R.K., 2003. The enigma of ribonuclease P evolution. Trends in Genetics 19, 561–569. Haugen, P., Simon, D.M., Bhattacharya, D., 2005. The natural history of group I introns. Trends in Genetics 21, 111–119. Hayden, E., Lehman, N., Unrau, P., 2018. Transitions: RNA and Ribozymes in the Development of Life. In: Astrobiology Handbook. Hayden, E.J., Lehman, N., 2006. Self-Assembly of a Group I Intron from Inactive Oligonucleotide Fragments. Chemistry & Biology 13, 909–918. Hayden, E.J., von Kiedrowski, G., Lehman, N., 2008. Systems Chemistry on Ribozyme Self-Construction: Evidence for Anabolic Autocatalysis in a Recombination Network. Angewandte Chemie International Edition 47, 8424–8428. He, C., Gállego, I., Laughlin, B., Grover, M.A., Hud, N.V., 2017. A viscous solvent enables information transfer from gene-length nucleic acids in a model prebiotic replication cycle. Nature Chem 9, 318–324. Hegg, L.A., Fedor, M.J., 1995. Kinetics and thermodynamics of intermolecular catalysis by hairpin ribozymes. Biochemistry 34, 15813–15828. Henzler-Wildman, K.A., Thai, V., Lei, M., Ott, M., Wolf-Watz, M., Fenn, T., Pozharski, E., Wilson, M.A., Petsko, G.A., Karplus, M., Hübner, C.G., Kern, D., 2007. Intrinsic motions along an enzymatic reaction trajectory. Nature 450, 838–844. Hertel, K.J., Herschlag, D., Uhlenbeck, O.C., 1994. A kinetic and thermodynamic framework for the hammerhead ribozyme reaction. Biochemistry 33, 3374–3385. 
Hetzer, M., Wurzer, G., Schweyen, R.J., Mueller, M.W., 1997. Trans-activation of group II intron splicing by nuclear U5 snRNA. Nature 386, 417–420. Hinnebusch, A.G., 2014. The scanning mechanism of eukaryotic translation initiation. Annu. Rev. Biochem. 83, 779–812. Höfer, K., Jäschke, A., 2018. Epitranscriptomics: RNA Modifications in Bacteria and Archaea. Microbiology Spectrum 6. Holzmann, J., Frank, P., Löffler, E., Bennett, K.L., Gerner, C., Rossmanith, W., 2008. RNase P without RNA: Identification and Functional Reconstitution of the Human Mitochondrial tRNA Processing Enzyme. Cell 135, 462–474. Horning, D.P., Joyce, G.F., 2016. Amplification of RNA by an RNA polymerase ribozyme. PNAS 113, 9786–9791. Hu, G., 1993. DNA polymerase-catalyzed addition of nontemplated extra nucleotides to the 3’ end of a DNA fragment. DNA Cell Biol. 12, 763–770. Huang, F., Yarus, M., 1997a. 5′-RNA Self-Capping from Guanosine Diphosphate. Biochemistry 36, 6557–6563. Huang, F., Yarus, M., 1997b. Versatile 5′ phosphoryl coupling of small and large molecules to an RNA. PNAS 94, 8965–8969. Huang, W., Ferris, J.P., 2006. One-Step, Regioselective Synthesis of up to 50-mers of RNA Oligomers by Montmorillonite Catalysis. J. Am. Chem. Soc. 128, 8914–8919.

Hutchins, C.J., Rathjen, P.D., Forster, A.C., Symons, R.H., 1986. Self-cleavage of plus and minus RNA transcripts of avocado sunblotch viroid. Nucleic Acids Research 14, 3627–3640. Ianeselli, A., Mast, C.B., Braun, D., 2019. Periodic Melting of Oligonucleotides by Oscillating Salt Concentrations Triggered by Microscale Water Cycles Inside Heated Rock Pores. Angewandte Chemie International Edition 58, 13155–13160. Ikawa, Y., Tsuda, K., Matsumura, S., Inoue, T., 2004. De novo synthesis and development of an RNA enzyme. Proc Natl Acad Sci U S A 101, 13750–13755. Illangasekare, M., Sanchez, G., Nickles, T., Yarus, M., 1995. Aminoacyl-RNA synthesis catalyzed by an RNA. Science 267, 643–647. Jackson, R.J., Hellen, C.U.T., Pestova, T.V., 2010. The Mechanism of Eukaryotic Translation Initiation and Principles of its Regulation. Nat Rev Mol Cell Biol 11, 113–127. Jaeger, L., Wright, M.C., Joyce, G.F., 1999. A complex ligase ribozyme evolved in vitro from a group I ribozyme domain. Proc Natl Acad Sci U S A 96, 14712–14717. Jarrell, K.A., Peebles, C.L., Dietrich, R.C., Romiti, S.L., Perlman, P.S., 1988. Group II intron self-splicing. Alternative reaction conditions yield novel products. J Biol Chem 263, 3432–3439. Jarrous, N., 2017. Roles of RNase P and Its Subunits. Trends in Genetics 33, 594–603. Jäschke, A., Höfer, K., Nübel, G., Frindert, J., 2016. Cap-like structures in bacterial RNA and epitranscriptomic modification. Current Opinion in Microbiology, Cell regulation 30, 44–49. Johansen, S., Einvik, C., Nielsen, H., 2002. DiGIR1 and NaGIR1: naturally occurring group I-like ribozymes with unique core organization and evolved biological role. Biochimie 84, 905–912. Johansen, S., Vogt, V.M., 1994. An intron in the nuclear ribosomal DNA of Didymium iridis codes for a group I ribozyme and a novel ribozyme that cooperate in self-splicing. Cell 76, 725–734. Johnston, W.K., Unrau, P.J., Lawrence, M.S., Glasner, M.E., Bartel, D.P., 2001. 
RNA-Catalyzed RNA Polymerization: Accurate and General RNA-Templated Primer Extension. Science 292, 1319–1325. Joyce, G.F., 2002. The antiquity of RNA-based evolution. Nature 418, 214–221. Joyce, G.F., Szostak, J.W., 2018. Protocells and RNA Self-Replication. Cold Spring Harb Perspect Biol 10. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., Morishima, K., 2017. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361. Kanehisa, M., Goto, S., 2000. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., Tanabe, M., 2016. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462. Kang, T.J., Suga, H., 2007. In vitro selection of a 5′-purine ribonucleotide transferase ribozyme. Nucleic Acids Res 35, 4186–4194. Karimi-Busheri, F., Daly, G., Robins, P., Canas, B., Pappin, D.J.C., Sgouros, J., Miller, G.G., Fakhrai, H., Davis, E.M., Beau, M.M.L., Weinfeld, M., 1999. Molecular Characterization of a Human DNA Kinase. J. Biol. Chem. 274, 24187–24194. Kazantsev, A.V., Rambo, R.P., Karimpour, S., SantaLucia, J., Tainer, J.A., Pace, N.R., 2011. Solution structure of RNase P RNA. RNA 17, 1159–1171. Keating, K.S., Toor, N., Perlman, P.S., Pyle, A.M., 2010. A structural analysis of the group II intron active site and implications for the spliceosome. RNA 16, 1–9.

Khvorova, A., Lescoute, A., Westhof, E., Jayasena, S.D., 2003. Sequence elements outside the hammerhead ribozyme catalytic core enable intracellular activity. Nat Struct Biol 10, 708–712. Kikovska, E., Svärd, S.G., Kirsebom, L.A., 2007. Eukaryotic RNase P RNA mediates cleavage in the absence of protein. Proc Natl Acad Sci U S A 104, 2062–2067. Kim, D.-E., Joyce, G.F., 2004. Cross-Catalytic Replication of an RNA Ligase Ribozyme. Chemistry & Biology 11, 1505–1512. Kim, S.H., Cech, T.R., 1987. Three-dimensional model of the active site of the self-splicing rRNA precursor of Tetrahymena. Proc Natl Acad Sci U S A 84, 8788–8792. Klein, D.J., Ferré-D’Amaré, A.R., 2006. Structural Basis of glmS Ribozyme Activation by Glucosamine-6-Phosphate. Science 313, 1752–1756. Knowles, J.R., 1980. Enzyme-Catalyzed Phosphoryl Transfer Reactions. Annual Review of Biochemistry 49, 877–919. Kruger, K., Grabowski, P.J., Zaug, A.J., Sands, J., Gottschling, D.E., Cech, T.R., 1982. Self-splicing RNA: autoexcision and autocyclization of the ribosomal RNA intervening sequence of Tetrahymena. Cell 31, 147–157. Kun, A., Santos, M., Szathmáry, E., 2005. Real ribozymes suggest a relaxed error threshold. Nat Genet 37, 1008–1011. Kuo, M.Y., Sharmeen, L., Dinter-Gottlieb, G., Taylor, J., 1988. Characterization of self-cleaving RNA sequences on the genome and antigenome of human hepatitis delta virus. J Virol 62, 4439–4444. Kura, G., 1987. Alkaline Hydrolysis of Inorganic cyclo-Polyphosphates. Bulletin of the Chemical Society of Japan 60, 2857–2860. Lai, M.M., 1995. The molecular biology of hepatitis delta virus. Annu Rev Biochem 64, 259–286. Laishram, R.S., 2014. Poly(A) polymerase (PAP) diversity in gene expression – Star-PAP vs canonical PAP. FEBS Lett 588, 2185–2197. Lambowitz, A.M., Belfort, M., 2015. Mobile Bacterial Group II Introns at the Crux of Eukaryotic Evolution. Microbiol Spectr 3, MDNA3-0050–2014. Lasda, E., Parker, R., 2014. Circular RNAs: diversity of form and function. 
RNA 20, 1829–1842. Lau, M.W.L., Cadieux, K.E.C., Unrau, P.J., 2004. Isolation of fast purine nucleotide synthase ribozymes. J. Am. Chem. Soc. 126, 15686–15693. Lau, M.W.L., Ferré-D’Amaré, A.R., 2013. An in vitro evolved glmS ribozyme has the wildtype fold but loses coenzyme dependence. Nat Chem Biol 9, 805–810. Lau, M.W.L., Unrau, P.J., 2009. A Promiscuous Ribozyme Promotes Nucleotide Synthesis in Addition to Ribose Chemistry. Chemistry & Biology 16, 815–825. Lawrence, M.S., Bartel, D.P., 2003. Processivity of Ribozyme-Catalyzed RNA Polymerization. Biochemistry 42, 8748–8755. Levy, M., Ellington, A.D., 2001. The descent of polymerization. Nature Structural & Molecular Biology 8, 580–582. Li, S., Lünse, C.E., Harris, K.A., Breaker, R.R., 2015. Biochemical analysis of hatchet self-cleaving ribozymes. RNA 21, 1845–1851. Lincoln, T.A., Joyce, G.F., 2009. Self-sustained Replication of an RNA Enzyme. Science 323, 1229–1232. Lipfert, J., Ouellet, J., Norman, D.G., Doniach, S., Lilley, D.M.J., 2008. The complete VS ribozyme in solution studied by small-angle X-ray scattering. Structure 16, 1357–1367. Liu, Y., Wilson, T.J., Lilley, D.M.J., 2017. The structure of a nucleolytic ribozyme that employs a catalytic metal ion. Nat Chem Biol 13, 508–513.

Lorsch, J.R., Szostak, J.W., 1994. In vitro evolution of new ribozymes with polynucleotide kinase activity. Nature 371, 31–36. Lorsch, J.R., Szostak, J.W., 1995. Kinetic and Thermodynamic Characterization of the Reaction Catalyzed by a Polynucleotide Kinase Ribozyme. Biochemistry 34, 15315–15327. Luciano, D.J., Belasco, J.G., 2015. NAD in RNA: unconventional headgear. Trends Biochem Sci 40, 245–247. Maeda, H., Fujita, N., Ishihama, A., 2000. Competition among seven Escherichia coli σ subunits: relative binding affinities to the core RNA polymerase. Nucleic Acids Res 28, 3497–3503. Manning, G., Whyte, D.B., Martinez, R., Hunter, T., Sudarsanam, S., 2002. The protein kinase complement of the human genome. Science 298, 1912–1934. Mariani, A., Bonfio, C., Johnson, C.M., Sutherland, J.D., 2018. pH-Driven RNA Strand Separation under Prebiotically Plausible Conditions. Biochemistry 57, 6382–6386. McCarthy, T.J., Plog, M.A., Floy, S.A., Jansen, J.A., Soukup, J.K., Soukup, G.A., 2005. Ligand Requirements for glmS Ribozyme Self-Cleavage. Chemistry & Biology 12, 1221–1226. McGinness, K.E., Joyce, G.F., 2002. RNA-Catalyzed RNA Ligation on an External RNA Template. Chemistry & Biology 9, 297–307. McGinness, K.E., Joyce, G.F., 2003. In Search of an RNA Replicase Ribozyme. Chemistry & Biology 10, 5–14. McGinness, K.E., Wright, M.C., Joyce, G.F., 2002. Continuous In Vitro Evolution of a Ribozyme that Catalyzes Three Successive Nucleotidyl Addition Reactions. Chemistry & Biology 9, 585–596. Meyer, C., Hahn, U., Rentmeister, A., 2011. Cell-Specific Aptamers as Emerging Therapeutics. J Nucleic Acids 2011. Meyer, M., Nielsen, H., Oliéric, V., Roblin, P., Johansen, S.D., Westhof, E., Masquida, B., 2014. Speciation of a group I intron into a lariat capping ribozyme. Proc Natl Acad Sci U S A 111, 7659–7664. Michel, F., Hanna, M., Green, R., Bartel, D.P., Szostak, J.W., 1989. The guanosine binding site of the Tetrahymena ribozyme. Nature 342, 391–395. 
Michel, F., Jacquier, A., Dujon, B., 1982. Comparison of fungal mitochondrial introns reveals extensive homologies in RNA secondary structure. Biochimie 64, 867–881. Michel, F., Westhof, E., 1990. Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. Journal of Molecular Biology 216, 585–610. Moore, M.J., Sharp, P.A., 1992. Site-specific modification of pre-mRNA: the 2’-hydroxyl groups at the splice sites. Science 256, 992–997. Moretti, J.E., Müller, U.F., 2014. A ribozyme that triphosphorylates RNA 5′-hydroxyl groups. Nucleic Acids Res 42, 4767–4778. Müller, U.F., Bartel, D.P., 2008. Improved polymerase ribozyme efficiency on hydrophobic assemblies. RNA 14, 552–562. Murakami, K.S., Darst, S.A., 2003. Bacterial RNA polymerases: the wholo story. Current Opinion in Structural Biology 13, 31–39. Murchie, A.I.H., Thomson, J.B., Walter, F., Lilley, D.M.J., 1998. Folding of the Hairpin Ribozyme in Its Natural Conformation Achieves Close Physical Proximity of the Loops. Molecular Cell 1, 873–881. Nguyen, L.A., Wang, J., Steitz, T.A., 2017. Crystal structure of Pistol, a class of self-cleaving ribozyme. Proc Natl Acad Sci U S A 114, 1021–1026.

Nguyen, T.H.D., Galej, W.P., Bai, X., Savva, C.G., Newman, A.J., Scheres, S.H.W., Nagai, K., 2015. The architecture of the spliceosomal U4/U6.U5 tri-snRNP. Nature 523, 47–52. Nielsen, H., Westhof, E., Johansen, S., 2005. An mRNA Is Capped by a 2’, 5’ Lariat Catalyzed by a Group I-Like Ribozyme. Science 309, 1584–1587. Nissen, P., Hansen, J., Ban, N., Moore, P.B., Steitz, T.A., 2000. The Structural Basis of Ribosome Activity in Peptide Bond Synthesis. Science 289, 920–930. Nohales, M.-Á., Molina-Serrano, D., Flores, R., Daròs, J.-A., 2012. Involvement of the Chloroplastic Isoform of tRNA Ligase in the Replication of Viroids Belonging to the Family Avsunviroidae. J Virol 86, 8269–8276. Noller, H.F., Hoffarth, V., Zimniak, L., 1992. Unusual resistance of peptidyl transferase to protein extraction procedures. Science 256, 1416–1419. Nomura, Y., Yokobayashi, Y., 2019. Systematic minimization of RNA ligase ribozyme through large-scale design-synthesis-sequence cycles. Nucleic Acids Res 47, 8950–8960. Orgel, L.E., 1968. Evolution of the genetic apparatus. Journal of Molecular Biology 38, 381–393. Pannucci, J.A., Haas, E.S., Hall, T.A., Harris, J.K., Brown, J.W., 1999. RNase P RNAs from some Archaea are catalytically active. Proc Natl Acad Sci U S A 96, 7803–7808. Pascal, J.M., 2008. DNA and RNA ligases: structural variations and shared mechanisms. Current Opinion in Structural Biology, Folding and Binding / Protein-nucleic acid interactions 18, 96–105. Paul, N., Joyce, G.F., 2002. A self-replicating ligase ribozyme. PNAS 99, 12733–12740. Peebles, C.L., Perlman, P.S., Mecklenburg, K.L., Petrillo, M.L., Tabor, J.H., Jarrell, K.A., Cheng, H.L., 1986. A self-splicing RNA excises an intron lariat. Cell 44, 213–223. Powner, M.W., Gerland, B., Sutherland, J.D., 2009. Synthesis of activated pyrimidine ribonucleotides in prebiotically plausible conditions. Nature 459, 239–242. Powner, M.W., Sutherland, J.D., 2011. Prebiotic chemistry: a new modus operandi. 
Philos Trans R Soc Lond B Biol Sci 366, 2870–2877. Prody, G.A., Bakos, J.T., Buzayan, J.M., Schneider, I.R., Bruening, G., 1986. Autolytic Processing of Dimeric Plant Virus Satellite RNA. Science 231, 1577–1580. Pyle, A.M., 2016. Group II Intron Self-Splicing. Annual Review of Biophysics 45, 183– 205. Ramakrishnan, V., 2002. Ribosome structure and the mechanism of translation. Cell 108, 557–572. Randau, L., Schröder, I., Söll, D., 2008. Life without RNase P. Nature 453, 120–123. Reader, J.S., Joyce, G.F., 2002. A ribozyme composed of only two different nucleotides. Nature 420, 841–844. Reichard, P., 1988. Interactions between deoxyribonucleotide and dna synthesis. Annu. Rev. Biochem. 57, 349–374. Ren, A., Vušurović, N., Gebetsberger, J., Gao, P., Juen, M., Kreutz, C., Micura, R., Patel, D.J., 2016. Pistol ribozyme adopts a pseudoknot fold facilitating site- specific in-line cleavage. Nat Chem Biol 12, 702–708. Riccitelli, N., Lupták, A., 2013. Chapter Four - HDV Family of Self-Cleaving Ribozymes. In: Soukup, G.A. (Ed.), Progress in Molecular Biology and Translational Science, Catalytic RNA. Academic Press, pp. 123–171. Robertson, D.L., Joyce, G.F., 1990. Selection in vitro of an RNA enzyme that specifically cleaves single-stranded DNA. Nature 344, 467–468. Robertson, M.P., Joyce, G.F., 2012. The Origins of the RNA World. Cold Spring Harb Perspect Biol 4.

Robertson, M.P., Joyce, G.F., 2014. Highly Efficient Self-Replicating RNA Enzymes. Chem Biol 21, 238–245.
Rogers, J., Joyce, G.F., 1999. A ribozyme that lacks cytidine. Nature 402, 323–325.
Rogers, J., Joyce, G.F., 2001. The effect of cytidine on the structure and function of an RNA ligase ribozyme. RNA 7, 395–404.
Roth, A., Weinberg, Z., Chen, A.G.Y., Kim, P.B., Ames, T.D., Breaker, R.R., 2014. A widespread self-cleaving ribozyme class is revealed by bioinformatics. Nat Chem Biol 10, 56–60.
Rupert, P.B., Massey, A.P., Sigurdsson, S.T., Ferré-D’Amaré, A.R., 2002. Transition State Stabilization by a Catalytic RNA. Science 298, 1421–1424.
Sabeti, P.C., Unrau, P.J., Bartel, D.P., 1997. Accessing rare activities from random RNA sequences: the importance of the length of molecules in the starting pool. Chem Biol 4, 767–774.
Salehi-Ashtiani, K., Lupták, A., Litovchick, A., Szostak, J.W., 2006. A Genomewide Search for Ribozymes Reveals an HDV-Like Sequence in the Human CPEB3 Gene. Science 313, 1788–1792.
Salehi-Ashtiani, K., Szostak, J.W., 2001. In vitro evolution suggests multiple origins for the hammerhead ribozyme. Nature 414, 82–84.
Samanta, B., Joyce, G.F., 2017. A reverse transcriptase ribozyme. eLife 6, e31153.
Saran, D., Held, D.M., Burke, D.H., 2006. Multiple-turnover thio-ATP and phospho-enzyme intermediate formation activities catalyzed by an RNA enzyme. Nucleic Acids Res 34, 3201–3208.
Saran, D., Nickens, D.G., Burke, D.H., 2005. A Trans Acting Ribozyme that Phosphorylates Exogenous RNA. Biochemistry 44, 15007–15016.
Sassanfar, M., Szostak, J.W., 1993. An RNA motif that binds ATP. Nature 364, 550–553.
Saville, B.J., Collins, R.A., 1990. A site-specific self-cleavage reaction performed by a novel RNA in Neurospora mitochondria. Cell 61, 685–696.
Schmeing, M.T., Huang, K.S., Strobel, S.A., Steitz, T.A., 2005. An induced-fit mechanism to promote peptide bond formation and exclude hydrolysis of peptidyl-tRNA. Nature 438, 520–524.
Schnaufer, A., Panigrahi, A.K., Panicucci, B., Igo, R.P., Salavati, R., Stuart, K., 2001. An RNA Ligase Essential for RNA Editing and Survival of the Bloodstream Form of Trypanosoma brucei. Science 291, 2159–2162.
Sczepanski, J.T., Joyce, G.F., 2014. A Cross-chiral RNA Polymerase Ribozyme. Nature 515, 440–442.
Seelig, B., Jäschke, A., 1999. A small catalytic RNA motif with Diels-Alderase activity. Chem Biol 6, 167–176.
Selmer, M., 2006. Structure of the 70S Ribosome Complexed with mRNA and tRNA. Science 313, 1935–1942.
Shan, S., Yoshida, A., Sun, S., Piccirilli, J.A., Herschlag, D., 1999. Three metal ions at the active site of the Tetrahymena group I ribozyme. Proc Natl Acad Sci U S A 96, 12299–12304.
Shechner, D.M., Grant, R.A., Bagby, S.C., Koldobskaya, Y., Piccirilli, J.A., Bartel, D.P., 2009. Crystal Structure of the Catalytic Core of an RNA-Polymerase Ribozyme. Science 326, 1271–1275.
Sherlock, M.E., Sudarsan, N., Stav, S., Breaker, R.R., 2018. Tandem riboswitches form a natural Boolean logic gate to control purine metabolism in bacteria. eLife 7.
Shukla, G.C., Padgett, R.A., 2002. A Catalytically Active Group II Intron Domain 5 Can Function in the U12-Dependent Spliceosome. Molecular Cell 9, 1145–1150.

Shuman, S., Lima, C.D., 2004. The polynucleotide ligase and RNA capping enzyme superfamily of covalent nucleotidyltransferases. Current Opinion in Structural Biology 14, 757–764.
Smathers, C.M., Robart, A.R., 2019. The mechanism of splicing as told by group II introns: Ancestors of the spliceosome. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, RNA structure and splicing regulation 1862, 194390.
Stairs, S., Nikmal, A., Bučar, D.-K., Zheng, S.-L., Szostak, J.W., Powner, M.W., 2017. Divergent prebiotic synthesis of pyrimidine and 8-oxo-purine ribonucleotides. Nat Commun 8.
Steitz, J.A., 1969. Polypeptide chain initiation: nucleotide sequences of the three ribosomal binding sites in bacteriophage R17 RNA. Nature 224, 957–964.
Steitz, T.A., Steitz, J.A., 1993. A general two-metal-ion mechanism for catalytic RNA. PNAS 90, 6498–6502.
Stemmer, W.P., 1994. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391.
Sun, H., Zhu, X., Lu, P.Y., Rosato, R.R., Tan, W., Zu, Y., 2014. Oligonucleotide Aptamers: New Tools for Targeted Cancer Therapy. Mol Ther Nucleic Acids 3, e182.
Sun, L., Campbell, F.E., Zahler, N.H., Harris, M.E., 2006. Evidence that substrate-specific effects of C5 protein lead to uniformity in binding and catalysis by RNase P. EMBO J 25, 3998–4007.
Sydow, J.F., Cramer, P., 2009. RNA polymerase fidelity and transcriptional proofreading. Current Opinion in Structural Biology, Catalysis and regulation / Proteins 19, 732–739.
Symons, R.H., Hutchins, C.J., Forster, A.C., Rathjen, P.D., Keese, P., Visvader, J.E., 1987. Self-Cleavage of RNA in the Replication of Viroids and Virusoids. J Cell Sci 1987, 303–318.
Szostak, J.W., Bartel, D.P., Luisi, P.L., 2001. Synthesizing life. Nature 409, 387–390.
Tahirov, T.H., Temiakov, D., Anikin, M., Patlan, V., McAllister, W.T., Vassylyev, D.G., Yokoyama, S., 2002. Structure of a T7 RNA polymerase elongation complex at 2.9 Å resolution. Nature 420, 43–50.
Teixeira, A., Tahiri-Alaoui, A., West, S., Thomas, B., Ramadass, A., Martianov, I., Dye, M., James, W., Proudfoot, N.J., Akoulitchev, A., 2004. Autocatalytic RNA cleavage in the human β-globin pre-mRNA promotes transcription termination. Nature 432, 526–530.
Thilo, E., Schulz, G., Wichmann, E.-M., 1953. Die Konstitution des Grahamschen und Kurrolschen Salzes. Zeitschrift für Anorganische und Allgemeine Chemie 182–200.
Thomas, B.C., Li, X., Gegenheimer, P., 2000. Chloroplast ribonuclease P does not utilize the ribozyme-type pre-tRNA cleavage mechanism. RNA 6, 545–553.
Tjhung, K.F., Shokhirev, M.N., Horning, D.P., Joyce, G.F., 2020. An RNA polymerase ribozyme that synthesizes its own ancestor. PNAS 117, 2906–2913.
Tuerk, C., Gold, L., 1990. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505–510.
Uhlenbeck, O.C., 1987. A small catalytic oligoribonucleotide. Nature 328, 596–600.
Unrau, P.J., Bartel, D.P., 1998. RNA-catalysed nucleotide synthesis. Nature 395, 260–263.
Unrau, P.J., Bartel, D.P., 2003. An oxocarbenium-ion intermediate of a ribozyme reaction indicated by kinetic isotope effects. PNAS 100, 15393–15397.

Vaidya, N., Manapat, M.L., Chen, I.A., Xulvi-Brunet, R., Hayden, E.J., Lehman, N., 2012. Spontaneous network formation among cooperative RNA replicators. Nature 491, 72–77.
Van Rompay, A.R., Johansson, M., Karlsson, A., 2000. Phosphorylation of nucleosides and nucleoside analogs by mammalian nucleoside monophosphate kinases. Pharmacology & Therapeutics 87, 189–198.
Viladoms, J., Fedor, M.J., 2012. The glmS Ribozyme Cofactor is a General Acid-Base Catalyst. J Am Chem Soc 134, 19043–19049.
Wadkins, T.S., Perrotta, A.T., Ferré-D’Amaré, A.R., Doudna, J.A., Been, M.D., 1999. A nested double pseudoknot is required for self-cleavage activity of both the genomic and antigenomic hepatitis delta virus ribozymes. RNA 5, 720–727.
Wang, L.K., Ho, C.K., Pei, Y., Shuman, S., 2003. Mutational Analysis of Bacteriophage T4 RNA Ligase 1. Different functional groups are required for the nucleotidyl transfer and phosphodiester bond formation steps of the ligation reaction. J. Biol. Chem. 278, 29454–29462.
Wang, Q.S., Cheng, L.K.L., Unrau, P.J., 2011. Characterization of the B6.61 polymerase ribozyme accessory domain. RNA 17, 469–477.
Wang, Q.S., Unrau, P.J., 2005. Ribozyme motif structure mapped using random recombination and selection. RNA 11, 404–411.
Wang, Y., 2021. Current view and perspectives in viroid replication. Current Opinion in Virology 47, 32–37.
Wang, Y., Prosen, D.E., Mei, L., Sullivan, J.C., Finney, M., Vander Horn, P.B., 2004. A novel strategy to engineer DNA polymerases for enhanced processivity and improved performance in vitro. Nucleic Acids Res 32, 1197–1207.
Warkocki, Z., Liudkovska, V., Gewartowska, O., Mroczek, S., Dziembowski, A., 2018. Terminal nucleotidyl transferases (TENTs) in mammalian RNA metabolism. Philos Trans R Soc Lond B Biol Sci 373.
Washburn, R.S., Gottesman, M.E., 2015. Regulation of Transcription Elongation and Termination. Biomolecules 5, 1063–1078.
Wassarman, K.M., 2018. 6S RNA, A Global Regulator of Transcription. Microbiol Spectr 6.
Wassarman, K.M., Storz, G., 2000. 6S RNA Regulates E. coli RNA Polymerase Activity. Cell 101, 613–623.
Webb, C.-H.T., Lupták, A., 2011. HDV-like self-cleaving ribozymes. RNA Biol 8, 719–727.
Weinberg, Z., Kim, P.B., Chen, T.H., Li, S., Harris, K.A., Lünse, C.E., Breaker, R.R., 2015. New classes of self-cleaving ribozymes revealed by comparative genomics analysis. Nat Chem Biol 11, 606–610.
Weiner, A.M., Maizels, N., 1987. tRNA-like structures tag the 3’ ends of genomic RNA molecules for replication: implications for the origin of protein synthesis. Proc Natl Acad Sci U S A 84, 7383–7387.
Werner, F., Grohmann, D., 2011. Evolution of multisubunit RNA polymerases in the three domains of life. Nat Rev Microbiol 9, 85–98.
Whitaker, D., Powner, M.W., 2018. Prebiotic nucleic acids need space to grow. Nat Commun 9.
White, H.B., 1976. Coenzymes as fossils of an earlier metabolic state. J. Mol. Evol. 7, 101–104.
Will, C.L., Lührmann, R., 2011. Spliceosome Structure and Function. Cold Spring Harb Perspect Biol 3.

Wilson, T.J., Li, N.-S., Lu, J., Frederiksen, J.K., Piccirilli, J.A., Lilley, D.M.J., 2010. Nucleobase-mediated general acid-base catalysis in the Varkud satellite ribozyme. Proc Natl Acad Sci U S A 107, 11751–11756.
Wilson, T.J., Liu, Y., Lilley, D.M.J., 2016. Ribozymes and the mechanisms that underlie RNA catalysis. Front. Chem. Sci. Eng. 10, 178–185.
Winkler, W.C., Nahvi, A., Roth, A., Collins, J.A., Breaker, R.R., 2004. Control of gene expression by a natural metabolite-responsive ribozyme. Nature 428, 281–286.
Wisniak, J., 2010. The History of Catalysis. From the Beginning to Nobel Prizes. Educación Química 21, 60–69.
Wochner, A., Attwater, J., Coulson, A., Holliger, P., 2011. Ribozyme-Catalyzed Transcription of an Active Ribozyme. Science 332, 209–212.
Woese, C.R., 1968. The fundamental nature of the genetic code: prebiotic interactions between polynucleotides and polyamino acids or their derivatives. Proc Natl Acad Sci U S A 59, 110–117.
Wright, M.C., Joyce, G.F., 1997. Continuous in Vitro Evolution of Catalytic Function. Science 276, 614–617.
Xu, J., Chmela, V., Green, N.J., Russell, D.A., Janicki, M.J., Góra, R.W., Szabla, R., Bond, A.D., Sutherland, J.D., 2020. Selective prebiotic formation of RNA pyrimidine and DNA purine nucleosides. Nature 582, 60–66.
Yan, A.C., Levy, M., 2009. Aptamers and aptamer targeted delivery. RNA Biol 6, 316–320.
Yin, Y.W., Steitz, T.A., 2002. Structural basis for the transition from initiation to elongation transcription in T7 RNA polymerase. Science 298, 1387–1395.
Young, B.A., Gruber, T.M., Gross, C.A., 2002. Views of Transcription Initiation. Cell 109, 417–420.
Yu, S., Kim, V.N., 2020. A tale of non-canonical tails: gene regulation by post-transcriptional RNA tailing. Nat Rev Mol Cell Biol 21, 542–556.
Zaher, H.S., Unrau, P.J., 2006. A General RNA-Capping Ribozyme Retains Stereochemistry during Cap Exchange. J. Am. Chem. Soc. 128, 13894–13900.
Zaher, H.S., Unrau, P.J., 2007. Selection of an improved RNA polymerase ribozyme with superior extension and fidelity. RNA 13, 1017–1026.
Zaher, H.S., Watkins, R.A., Unrau, P.J., 2006. Two independently selected capping ribozymes share similar substrate requirements. RNA 12, 1949–1958.
Zaug, A.J., Cech, T.R., 1986. The Tetrahymena intervening sequence ribonucleic acid enzyme is a phosphotransferase and an acid phosphatase. Biochemistry 25, 4478–4482.
Zheng, L., Mairhofer, E., Teplova, M., Zhang, Y., Ma, J., Patel, D.J., Micura, R., Ren, A., 2017. Structure-based insights into self-cleavage by a four-way junctional twister-sister ribozyme. Nat Commun 8.
Zhou, L., Kim, S.C., Ho, K.H., O’Flaherty, D.K., Giurgiu, C., Wright, T.H., Szostak, J.W., 2019. Non-enzymatic primer extension with strand displacement. eLife 8.
Zimmerly, S., Semper, C., 2015. Evolution of group II introns. Mob DNA 6.
