
CONSTRUCTING ARTIFICIAL GENETIC SYSTEMS: A NEW NUCLEOTIDE METABOLISM

By

MARIKO MATSUURA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

© 2017 Mariko Matsuura

This dissertation is dedicated to my family: mother Fumiko; father Yuji; brother Junya; sister Nanako; grandparents Isamu and Hisano Totsuka; and Kojiro and Hiroko Matsuura.

ACKNOWLEDGMENTS

I would like to thank Dr. Steven Benner for the guidance, patience, and research opportunities he has given me. I am also indebted to all the current and former members and visiting scholars of the Foundation for Applied Molecular Evolution and Firebird Biomolecular Sciences LLC, especially Dr. Ryan W. Shaw, Ms. Jennifer D. Moses, Dr. Christian B. Winiger, Dr. Dietlind L. Gerloff, Dr. Shuichi Hoshika, Dr. Hyo-Joong Kim, Dr. Myong-Jung Kim, Dr. Myong-Sang Kim, Dr. Nilesh Karalkar, Dr. Nicole A. Leal, Dr. Yoshihiro Furukawa, and Dr. Fei Chen, for their guidance, support, and help. I would also like to thank my research collaborators Dr. Stefan Lutz (Emory Univ.), Dr. Ashley B. Daugherty (Emory Univ.), Dr. David Baker (U. of Washington), Dr. Per Jr Greisen (U. of Washington), Dr. JoAnne Stubbe (MIT), Dr. Khalil A. Abboud (UF), and Mr. Daisuke Takahashi (UF). I would also like to express my appreciation to my current and former committee members, Dr. Steven D. Bruner, Dr. Jamie S. Foster, Dr. Jon D. Stewart, Dr. Weihong Tan, and Dr. Nigel Richards, and especially to my committee chair, Dr. Bruner, for his help. I also thank Dr. Nicole A. Horenstein and Dr. Kari B. Basso for giving me opportunities and support for my research; Dr. Ben Smith, Ms. Lori Clark, Ms. Lori Ball, and Ms. Cassandra Watkins for their administrative assistance; and the Bruner group members for sharing their research talks and discussions with me. I would also like to thank the TA supervisors and members for their guidance, and all the students in my classes, who made me a better teacher and scientist.

I want to express special gratitude to Dr. Shigeyuki Yokoyama and Dr. Eiko Seki from RIKEN, Japan, who kindly accepted me as a research intern. An additional thank you goes to the Department of Chemistry, the Foundation for Applied Molecular Evolution, NASA Astrobiology, the National Science Foundation, the Defense Advanced Research Projects Agency, and the Templeton World Charity Foundation, which provided essential educational and financial support, without which my studies would not have been possible.

I would also like to thank the people who supported me in many ways throughout the Ph.D. program. First, I would like to express my sincere appreciation to my counselors, Dr. Linda A. Lewis and Dr. Jaime Jasser, and to all the members of the counseling group. I thank Ms. Megan Williams, Ms. Lauren P. Jadotte, Ms. Lauren McCarthy, and Mr. Robert Ross for their friendship and for their support in improving my English language skills. I would also like to thank Ms. Stephanie Seguin, Ms. Tiffany Bagby, Ms. Rachel Damiani, and Mr. David Honeycutt for their help with the English language, and all the members of the Gator Toastmasters Club for their help with public speaking. I would like to specifically thank Dr. Eunhui Yoon, Dr. Jules Gliesche, Dr. Shanshan Wang, and Ms. Yuting Wang for their friendship, and D. Marcus Garcia for his love, support, and understanding. Lastly, I would like to thank my family, especially my mother Fumiko, for her love and support.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTER

1 BACKGROUND
  Designing Systems
  Design in This Study
  Metabolic and Protein Engineering
  Molecules and Analogues
  Nucleotides
  Artificial Genetic Systems
  Nucleoside/nucleotide Synthesis
  Kinases
  Nucleoside kinase (NK)
  Nucleoside monophosphate kinases (NMPK)
  Nucleoside diphosphate kinase (NDPK)

2 RESEARCH OBJECTIVES

3 GENERAL METHODS AND PROCEDURES
  Materials
  Nucleobase and Nucleoside/tide Synthesis
  Oligonucleotides Synthesis
  Standard Methods
  Cloning/Synthesis of Genes and Construction of Plasmids
  Transformation
  Preparation

4 ASSAY DEVELOPMENT WITH WILD-TYPE
  Materials and Methods
  Preparation of Enzymes
  Kinase Assays
  Enzyme-coupled assay
  TLC assay
  Luciferase assay
  Modified 5′ nuclease assay
  Results
  Assay Development
  Activities of Wild-Type Kinases
  Drosophila melanogaster deoxynucleoside kinase (DmdNK)
  Guanylate kinases
  E. coli Nucleoside diphosphate kinase (Ndk)
  Kinetic Parameters
  Discussion
  Assay Comparison
  Kinase Specificities
  Ribonucleoside Triphosphate Synthesis

5 KINASE VARIANTS SCREENING
  Materials and Methods
  Enzyme Design and Preparation
  D. melanogaster deoxynucleoside kinase (DmdNK)
  E. coli cytidylate kinase (EcCmk)
  E. coli thymidylate kinase (EcTmk)
  E. coli adenylate kinase (EcAdk)
  E. coli guanylate kinase (EcGmk)
  Library protein purification
  Kinetic Parameters
  Results
  Kinase Variants
  Kinase Variant Activities
  D. melanogaster deoxynucleoside kinase variant Q81E
  E. coli cytidylate kinase (EcCmk) variants
  E. coli thymidylate kinase (EcTmk) variants
  E. coli adenylate kinase (EcAdk) variants
  E. coli guanylate kinase (EcGmk) variants
  Discussion
  DmdNK Q81E Accepts dZ, Z, and dP
  Monophosphate Kinase Variants

6 Z NUCLEOBASE CRYSTAL STRUCTURE ANALYSIS
  Background
  Materials and Methods
  Crystallization
  ¹H NMR
  Data Collection and Refinement
  Results
  ¹H NMR
  Crystal Structure of Z nucleobase
  Discussion

7 CONCLUSIONS
  Assay Development
  Completed Steps
  Ribonucleoside pathway
  Kinase Specificities
  Next Steps for ZP phosphorylation
  Steps towards Artificial Metabolisms

APPENDIX

A NUCLEOBASE SYNTHESIS
B PRIMER SETS USED IN THIS STUDY
C E. coli CODON USAGE
D CRYSTALLOGRAPHIC DATA

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

1-1 Diversification and conservation of nucleoside/tide kinases in all three domains
4-1 DNA sequences of primer/probe/template used for the modified 5′ nuclease assay
4-2 Rf values of GACTZP nucleotides
4-3 pKa values of GACTZ nucleotides
4-4 List of wild-type kinases tested in this study
4-5 Pros and cons of the assays used in this study
5-1 Primer sets used for cmk gene mutagenesis
5-2 Primer set for mutagenesis PCR of adk gene designed for dPMP
5-3 Primer set for mutagenesis PCR of adk gene designed for dZMP
5-4 Kinase variants tested in this study
5-5 Kinetic parameters of DmdNK Q81E for substrates
5-6 The results of kinase library sequencing
5-7 Missing theoretical variants of kinases and ideal library size
6-1 A list of crystallization conditions
6-2 Crystal data and structure refinement
6-3 Selected bond lengths (Å) and angles (°) for Z-Sod
6-4 Selected hydrogen bonds for Z-Sod (Å and °)
6-5 Selected bond lengths (Å) and angles (°) for Z-Am
6-6 Selected hydrogen bonds for Z-Am (Å and °)
7-1 Intracellular nucleotide concentrations
B-1 Primer sets for cmk gene cloning
B-2 Primer sets for tmk gene cloning and mutagenesis
B-3 Primer sets for adk gene cloning
B-4 Primer set for gmk gene mutagenesis
C-1 The codon usage of E. coli K12
D-1 Bond lengths (Å) and angles (°) for Z-Sod
D-2 Hydrogen bonds for Z-Sod (Å and °)
D-3 Atomic coordinates and equivalent isotropic displacement parameters (Å²) for Z-Sod
D-4 Anisotropic displacement parameters (Å²) for Z-Sod
D-5 Bond lengths (Å) and angles (°) for Z-Am
D-6 Hydrogen bonds for Z-Am (Å and °)
D-7 Atomic coordinates and equivalent isotropic displacement parameters (Å²) for Z-Am
D-8 Anisotropic displacement parameters (Å²) for Z-Am

LIST OF FIGURES

1-1 Designed pathway for an E. coli strain that replicates and maintains plasmids containing Z and P
1-2 Artificially Expanded Genetic Information System (AEGIS)
1-3 Z' and G mismatch under a basic condition. Z becomes a Watson-Crick complement for G when it is deprotonated (Z')
1-4 Some enzymes used for nucleotide de novo and salvage pathways
4-1 Forward and reverse reactions used for the luciferase assay
4-2 Scheme for the modified 5′ nuclease assay
4-3 High background signals with dPTP detection template containing JOE fluorophore (modified 5′ nuclease assay)
4-4 TLC with the samples containing DmdNK wild-type and P
4-5 TLC with guanylate kinases (EcGmk and TmGMPK) and PMP
4-6 Luciferase assay (forward) with guanylate kinases (EcGmk, TmGMPK) and PMP
4-7 TLC with Ndk, dNMPs, and dNDPs
4-8 Luciferase assay (reverse: ATP generation) with Ndk and dNTPs
4-9 Modified 5′ nuclease assay with Ndk, dNMPs, and dNDPs
4-10 TLC with Ndk and riboZP nucleotides
4-11 Luciferase assay (reverse: ATP generation) with Ndk, ZTP, and PTP
4-12 The phosphorylation steps completed by wild-type kinases
4-13 Enzyme-coupled assay with DmdNK and P
4-14 Enzyme-coupled assay with guanylate kinases (EcGmk and TmGMPK) and PMP
4-15 Signal decrease observed in a negative control (dGDP) containing only nucleotide (no enzyme) by enzyme-coupled assay
4-16 Signal decrease observed in a negative control (enzyme only) by enzyme-coupled assay
4-17 Background signal with dNDP and dATP detected by the luciferase assay (reverse)
4-18 The kinetic mechanism of Ndk (ping-pong Bi Bi)
5-1 Crystal structure of DmdNK complexed with dT and dC, and the designed variant Q81E to phosphorylate dZ
5-2 Crystal structure of EcCmk complexed with CMP
5-3 Crystal structure of EcTmk complexed with TP5A
5-4 Crystal structure of EcAdk complexed with AMP and phosphoaminophosphonic acid-adenylate ester (my personal design)
5-5 Crystal structure of EcAdk complexed with AMP and phosphoaminophosphonic acid-adenylate ester (Rosetta design)
5-6 Crystal structure of EcGmk complexed with GMP
5-7 TLC with DmdNK Q81E and dNs
5-8 Enzyme-coupled assay with DmdNK Q81E and dNs
5-9 Inhibition assay with DmdNK Q81E
5-10 TLC with DmdNK Q81E and Z
5-11 TLC with Cmk variant, dCMP, and dZMP
5-12 Enzyme-coupled assay with variant EcCmk, dCMP, and dPMP
5-13 Luciferase assay (reverse) with EcTmk wild-type
5-14 Luciferase assay (reverse) with EcTmk variants
5-15 The enzyme-coupled assay results with EcTmk variants mixed with dTMP
5-16 Luciferase assay (reverse) with EcAdk wild-type
5-17 The enzyme-coupled assay results with EcAdk variants designed to accept dPMP
5-18 Luciferase assay (reverse) with EcAdk variants and dZDP
5-19 TLC with selected EcAdk variants and dZMP
5-20 Luciferase assay (reverse) with EcGmk variants and dPDP
5-21 Enzyme-coupled assay with selected EcGmk variants and dGMP
5-22 Crystal structures of DmdNK wild-type complexed with dT, dC, and dGTP
6-1 ¹H NMR spectrum of the Z nucleobase crystal dissolved in DMSO-d6
6-2 Asymmetric unit of Z-Sod and Z-Am
6-3 Molecular packing (Z-Sod)
6-4 Molecular packing (Z-Am)
7-1 The phosphorylation steps that have been completed
A-1 Scheme for Z nucleobase synthesis

LIST OF ABBREVIATIONS

AEGIS artificially expanded genetic information system

(d)N (deoxy)ribonucleoside

(d)NDP (deoxy)ribonucleoside diphosphate

(d)NMP (deoxy)ribonucleoside monophosphate

(d)NTP (deoxy)ribonucleoside triphosphate

DmdNK Drosophila melanogaster deoxynucleoside kinase

EcAdk Escherichia coli adenylate kinase

EcCmk Escherichia coli cytidylate kinase

EcGmk Escherichia coli guanylate kinase

EcTmk Escherichia coli thymidylate kinase

Gsk Escherichia coli inosine-guanosine kinase

Ndk Escherichia coli nucleoside diphosphate kinase

NDPK Nucleoside diphosphate kinase (any organism)

NK Nucleoside kinase (any organism)

NMPK Nucleoside monophosphate kinase (any organism)

P 2-amino-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one

T5 dNMPK T5 deoxynucleoside monophosphate kinase

TmCMPK Thermotoga maritima cytidylate kinase

TmGMPK Thermotoga maritima guanylate kinase

Udk Escherichia coli uridine/cytidine kinase

Z 2-amino-3-nitropyridin-6-one


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CONSTRUCTING ARTIFICIAL GENETIC SYSTEMS: A NEW NUCLEOTIDE METABOLISM

By

Mariko Matsuura

May 2017

Chair: Steven D. Bruner
Major: Chemistry

One of the most ambitious long-term goals in synthetic biology is to construct living cells whose central biopolymers (DNA, RNA, and proteins) are built from components whose molecular structures differ from those in the biopolymers of all known life forms on Earth. Toward that goal, starting 25 years ago, the Benner group created one of the first examples of an Artificially Expanded Genetic Information System (AEGIS). AEGIS is built from nucleotides that carry heterocycles resembling natural pyrimidines and purines, but with different hydrogen-bonding patterns. RNA polymerases, DNA polymerases, and other enzymes accept the AEGIS nucleotides. The Benner group then developed amplification and sequencing methods for DNA and RNA containing the AEGIS nucleotides.

Thus, the remaining step toward living cells that use AEGIS-based biopolymers was to create metabolic pathways that biosynthesize AEGIS nucleoside triphosphates in vivo.

In this work, experiments were performed to understand how this step might be taken. These included studies of enzymes that might, within a living cell, create AEGIS nucleoside triphosphates from AEGIS nucleosides. The work also sought to better understand the molecular features of AEGIS nucleosides that might cause them to be rejected by natural kinases, polymerases, and other enzymes required for this step. This, in turn, required that we develop and improve assays that detect AEGIS nucleotide formation and can be applied to kinase variant library screening. Nucleoside/nucleotide kinase variants were designed to phosphorylate AEGIS nucleosides/nucleotides, and the resulting variants were screened and tested with these assays.

As outcomes, this work delivered several enzymes that phosphorylate AEGIS nucleosides and nucleotides. For one AEGIS species (P), these enzymes make it possible to assemble a complete artificial metabolic path that converts P via its mono- and diphosphates to PTP.

To understand the challenges of obtaining enzymes that handle AEGIS Z, crystal structures of the Z nucleobase in two forms were obtained. Finally, in a collaboration with the group of Shigeyuki Yokoyama that included a six-week visit to their laboratory at RIKEN (Yokohama), the work examined processes that might allow expanded genetic alphabets to encode proteins with more than the standard 20 encoded amino acids. These results provide further insight into the interactions between unnatural nucleobases and enzymes to guide design and metabolic engineering experiments. Together, the results represent a step toward the goal of creating an AEGIS-based metabolism for a semi-synthetic organism.


CHAPTER 1 BACKGROUND

Designing Systems

Synthetic biology designs and creates new biological systems to produce useful molecules, to add functions to cells, and to understand biology. There are two ways to design and create a new biological system. One uses entirely natural materials, rearranging standard nucleotides and amino acids to give oligonucleotides and proteins with new sequences. The other requires deeper design to use unnatural materials, including non-standard nucleotides and amino acids that are not found in nature. Natural materials are, of course, more abundant and available. This allows new oligonucleotides and proteins to be made with well-developed methods in synthesis (solid-phase synthesis, PCR, large-DNA assembly) and bioengineering (digestion, ligation, mutagenesis). Accordingly, many groups have been creating new biological systems in the nearly half century since recombinant DNA technology was first introduced.

Using unnatural materials is more difficult, as they are less available. Arguably, however, one learns more about the connection between chemistry and biology in general by using unnatural systems. In practice, a compromise is sought that does not require the biodesigner to do everything new "from scratch". Most efforts therefore combine unnatural materials with natural materials. This combination is often called the "semi-synthesis" of new life forms.

This chapter is reproduced in part with permission from [Matsuura, M.F., Shaw, R.W., Moses, J.D., Kim, H.J., Kim, M.J., Kim, M.S., Hoshika, S., Karalkar, N. and Benner, S.A. Assays To Detect the Formation of Triphosphates of Unnatural Nucleotides: Application to Escherichia coli Nucleoside Diphosphate Kinase. ACS Synth. Biol. 2016, 5, 234-240. doi: 10.1021/acssynbio.5b00172] and [Matsuura, M.F., Winiger, C.B., Shaw, R.W., Kim, M.-J., Kim, M.-S., Daugherty, A.B., Chen, F., Moussatche, P., Moses, J.D., Lutz, S. and Benner, S.A. A Single Deoxynucleoside Kinase Mutant from Drosophila melanogaster synthesizes the Monophosphates of Nucleosides that are Components of an Expanded Genetic System. ACS Synth. Biol., in press. doi: 10.1021/acssynbio.6b00228]. Copyright 2016 American Chemical Society. This chapter is also reproduced in part with permission of the International Union of Crystallography [Matsuura, M. F., Kim, H.-J., Takahashi, D., Abboud, K. A., and Benner, S. A. Crystal Structures of Deprotonated Nucleobases from an Expanded DNA Alphabet. Acta Crystallogr. 2016, C72, 952-959. doi: 10.1107/S2053229616017071].

Design in This Study

This study attempted to create parts of a semi-synthetic Darwinian system that uses unnatural nucleic acids of the type called an Artificially Expanded Genetic Information System (AEGIS). AEGIS increases the number of replicable nucleotides in DNA and RNA by rearranging the hydrogen-bond donor and acceptor groups that hold nucleobases together in pairs with standard Watson-Crick geometry. Our long-term goal is to obtain a semi-synthetic strain of Escherichia coli that replicates and maintains plasmid DNA containing AEGIS nucleotides. We began by trying to implement, with new enzymes, a designed metabolism (Figure 1-1) that focused on two AEGIS nucleobases, 2-amino-3-nitropyridin-6-one (trivially Z) and 2-amino-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one (trivially P).

Figure 1-1. Designed pathway for an E. coli strain that replicates and maintains plasmids containing Z and P, 2-amino-3-nitropyridin-6-one (Z) and 2-amino-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one (P). B = Z and P.


Metabolic and Protein Engineering

As a brief review, metabolic engineering1 is an approach to changing cellular properties by modifying specific biological reactions or introducing new ones. For this purpose, recombinant DNA techniques are used, and metabolic fluxes are measured to quantify the success of the engineering. As part of metabolic engineering, the genes that encode enzymes involved in specific pathways are mutated to produce new functions.

Designed enzymes, or designer enzymes,2 are generated using different methods (simulations, evolutionary information, crystal structure analyses). The mutations are often made in active-site residues, but sometimes at distal sites. The resulting protein libraries are then screened with specific enzyme activity assays to select proteins with the desired characteristics and/or functions. In this dissertation, we use metabolic engineering strategies to create pathways that manage AEGIS components.
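Because screening capacity limits how many positions can be randomized at once, a rough estimate of library size is useful when planning such screens. The following is a minimal sketch, assuming NNK degenerate codons (32 codons per randomized position) and equal representation of every variant; neither assumption is taken from this study, and the function names are hypothetical.

```python
import math


def codon_library_size(n_positions: int, codons_per_position: int = 32) -> int:
    """Distinct codon combinations when n positions are randomized.

    The default of 32 codons per position assumes NNK degeneracy
    (N = A/C/G/T, K = G/T); this is an illustrative assumption.
    """
    return codons_per_position ** n_positions


def transformants_needed(library_size: int, coverage: float = 0.95) -> int:
    """Clones to screen so that a given variant is present with probability `coverage`.

    Assumes all variants are equally likely, so the chance that one particular
    variant is missing after T clones is about exp(-T / V). Solving
    exp(-T / V) = 1 - coverage gives T = -V * ln(1 - coverage).
    """
    return math.ceil(-library_size * math.log(1.0 - coverage))


if __name__ == "__main__":
    for n in (1, 2, 3):
        v = codon_library_size(n)
        print(f"{n} position(s): {v} codon variants, "
              f"~{transformants_needed(v)} clones for 95% coverage")
```

Under these assumptions, one randomized position needs about 96 clones (roughly threefold oversampling), while three positions need more than 98,000. This is one reason the number of simultaneously randomized sites is usually kept small.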

Molecules and Analogues

Nucleotides

As a brief reminder, nucleotides consist of three parts: 1) a sugar, 2) a nucleobase, and 3) a phosphate group. RNA nucleotides contain the sugar ribose, while DNA nucleotides contain deoxyribose. Purine nucleobases are larger than pyrimidine nucleobases. Thus, in RNA and DNA, purines pair with pyrimidines to form a size-complementary Watson-Crick base pair. The base pair is joined by hydrogen bonds (H-bond complementarity).

Artificial Genetic Systems

Watson-Crick base pairing is neither perfect nor complete in natural RNA and DNA. For example, T:A and U:A pairs are joined by only two H-bonds, reflecting imperfect H-bond complementarity. Interestingly, some genomes complete the hydrogen-bonding pattern by using diaminopurine,3,4 which can form three hydrogen bonds, instead of adenine. This result has recently been confirmed by Philippe Marlière in France (personal communication via Dr. Benner). This imperfect H-bond complementarity causes differences in the melting temperatures of DNA duplexes rich in A:T pairs relative to those rich in G:C pairs, which makes sequence design for methods such as multiplexed PCR challenging.5
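The difference between two- and three-hydrogen-bond pairs can be made concrete with a toy model. Each nucleobase edge is written as a string of donor (D) and acceptor (A) groups, read from the major-groove side to the minor-groove side, with "-" marking a position that has no hydrogen-bonding group. This simplified sketch ignores geometry and bond strength. The pattern strings are standard textbook assignments, and the function name is hypothetical.

```python
# Hydrogen-bonding edges read from the major-groove side to the minor-groove side.
# D = donor, A = acceptor, "-" = no hydrogen-bonding group at that position.
PATTERNS = {
    "T": "ADA",    # O4 acceptor, N3-H donor, O2 acceptor
    "C": "DAA",    # N4-H2 donor, N3 acceptor, O2 acceptor
    "A": "DA-",    # N6-H2 donor, N1 acceptor, C2-H (no group)
    "G": "ADD",    # O6 acceptor, N1-H donor, N2-H2 donor
    "DAP": "DAD",  # 2,6-diaminopurine: adenine plus a 2-amino donor
}


def count_hbonds(pyrimidine: str, purine: str) -> int:
    """Count positions where a donor on one base faces an acceptor on the other."""
    return sum(
        1
        for x, y in zip(PATTERNS[pyrimidine], PATTERNS[purine])
        if {x, y} == {"D", "A"}
    )


for pair in (("T", "A"), ("C", "G"), ("T", "DAP")):
    print(f"{pair[0]}:{pair[1]} -> {count_hbonds(*pair)} H-bonds")
```

The model gives 2 for T:A and 3 for both C:G and T:diaminopurine. The extra 2-amino donor of diaminopurine fills the empty minor-groove position that adenine leaves unpaired.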

The Watson-Crick pairing scheme is also incomplete in its use of hydrogen bonding. To complete the set of Watson-Crick base pairs available in DNA, the Benner group created AEGIS. AEGIS contains a complete set of hydrogen-bonding patterns, creating four additional nucleotide pairs that, in duplexes, pair with standard Watson-Crick size and H-bond complementarity6 (Figure 1-2). Several of these new pairs, especially the Z:P pair, proved to be accepted by many natural polymerases and other enzymes, although often less efficiently than standard nucleotides.7-10
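Complete hydrogen-bond complementarity can be sketched with a toy notation. Each base edge is a string of donor (D) and acceptor (A) groups, read from the major-groove side to the minor-groove side. A partner must present an acceptor wherever the base presents a donor, and vice versa. This is a simplified illustration, not the design procedure used by the Benner group. The assignments C = DAA, G = ADD, Z = DDA (the text describes Z as a pyrimidine donor-donor-acceptor species), and P = AAD are stated assumptions. The enumeration is only a combinatorial count and does not model which patterns are chemically practical.

```python
from itertools import product

SWAP = {"D": "A", "A": "D"}


def complement(pattern: str) -> str:
    """Pattern a partner must present to pair at every position (D <-> A)."""
    return "".join(SWAP[group] for group in pattern)


# Every three-position edge built only from donors and acceptors.
all_patterns = ["".join(p) for p in product("DA", repeat=3)]

print(len(all_patterns))   # 8
print(complement("DAA"))   # C (DAA) needs ADD, the pattern assumed for G
print(complement("DDA"))   # Z (DDA) needs AAD, the pattern assumed for P
```

Each of the eight patterns has exactly one complement, so patterns pair off one-to-one. Which of these pairs are implemented as actual nucleotides is a separate chemical question.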

The GACTZP AEGIS system, with polymerases that replicate it, supports chemical Darwinism in the laboratory. Here, libraries of GACTZP DNA evolve under selection pressure to generate functional DNA molecules that bind to liver cancer cells, breast cancer cells, proteins expressed on cell surfaces from genes introduced by researchers, and anthrax toxin.11-14 These achievements motivate the development of a semi-synthetic living system that uses a Z:P pair inside Escherichia coli cells. In this design, the Z and P nucleosides are supplied in the growth medium. They are then transported into the cell and phosphorylated using (we hoped) existing nucleotide biosynthesis pathways.10,15


Figure 1-2. Artificially Expanded Genetic Information System (AEGIS).

Here, this work examined some of the physical properties of Z and P to better understand the Z:P system. We began by noting that some implementations of the unnatural hydrogen-bonding patterns in AEGIS can exaggerate the acid-base (protonation/deprotonation) properties of the natural nucleotides. For example, when implemented on a nitropyridine heterocycle, the pyrimidine donor-donor-acceptor species 6-amino-3-(2′-deoxy-D-ribofuranosyl)-5-nitro-1H-pyridin-2-one (Z) has a pKa of approximately 8, about two pKa units below those of T/U and G as acids. In its deprotonated form, Z' is a Watson-Crick complement of G (Figure 1-3). Accordingly, many polymerases slowly lose Z:P pairs by first misincorporating dGTP opposite a deprotonated template Z', and this misincorporation increases with increasing pH. Conversely, Z:P pairs can be slowly gained by the reverse process, in which an incoming deprotonated dZ'TP is mismatched opposite a template G.
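The pH dependence follows from the Henderson-Hasselbalch relation. The following minimal sketch estimates the fraction of each base present in its deprotonated form. It uses the approximate pKa of 8 given for Z and assumes a pKa of about 10 for T/U and G (two units higher, as stated above). Both values are approximations used only for illustration.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of a weak acid present as its conjugate base.

    Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa), hence
    fraction = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PKA_Z = 8.0      # approximate value quoted for Z
PKA_T_G = 10.0   # assumed: about two units higher, for T/U and G

for ph in (7.0, 8.0, 9.0):
    z_anion = fraction_deprotonated(PKA_Z, ph)
    natural_anion = fraction_deprotonated(PKA_T_G, ph)
    print(f"pH {ph}: Z' {z_anion:.3f}, T/G anion {natural_anion:.4f}")
```

With these assumed values, roughly 9% of Z is deprotonated at pH 7, half at pH 8, and about 91% at pH 9. The natural bases stay at or below about 9% over the same range. This trend is consistent with misincorporation opposite Z' increasing with pH.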

These observations may suggest that a genetic system containing both C:G and Z:P nucleobase pairs is not possible, at least not with this particular nitropyridine implementation of the Z hydrogen-bonding pattern. However, such a conclusion would be premature. Enzymes have often evolved to distinguish two structurally similar species, one charged and the other not. If natural polymerases have not evolved to perfectly distinguish the matched (uncharged) C:G and Z:P pairs from the (anionic) Z':G mismatch, this is likely not because they are unable to make that distinction, but because they have not been under selection pressure to do so. The crystal structures of the deprotonated Z nucleobase examined in this work bear on this question.

Figure 1-3. Deprotonated Z' and G mismatch under basic conditions, with Watson-Crick geometry. Z is a Watson-Crick complement for G when it is deprotonated (Z').
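The pattern change behind the Z':G mismatch can be illustrated in a donor (D)/acceptor (A) notation, where each base edge is read from the major-groove side to the minor-groove side. Losing a proton converts a donor into an acceptor. This is a toy model of the pattern change, not a structural calculation. It assumes the edge assignments Z = DDA, P = AAD, and G = ADD. As a further simplification, it assumes the acidic ring N-H of Z sits at the middle position. The function names are hypothetical.

```python
SWAP = {"D": "A", "A": "D"}


def complement(pattern: str) -> str:
    """Pattern a partner must present to pair at every position."""
    return "".join(SWAP[group] for group in pattern)


def deprotonate(pattern: str, position: int) -> str:
    """Remove a proton at `position`: the donor there becomes an acceptor."""
    if pattern[position] != "D":
        raise ValueError("only a donor position can lose a proton")
    return pattern[:position] + "A" + pattern[position + 1:]


Z, P, G = "DDA", "AAD", "ADD"   # assumed edges, major to minor groove
z_anion = deprotonate(Z, 1)     # assumed: the acidic ring N-H is the middle donor

print(complement(Z) == P)        # True: neutral Z matches P
print(z_anion)                   # DAA, the same edge pattern as C
print(complement(z_anion) == G)  # True: Z' presents a G-complementary edge
```

In this notation, Z' presents the same edge pattern as C, which is why a Watson-Crick-geometry mismatch with G becomes possible.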

Nucleoside/nucleotide Synthesis

In assembling a semi-synthetic "constructive" biology, we might begin at different starting points. Here, we proposed to start with fully assembled nucleosides. We might, however, consider a constructive metabolism that starts with smaller precursors and assembles the sugar, the heterocycle, and other pieces stepwise. This concept for AEGIS biosynthesis resembles standard nucleotide biosynthesis.

Reviewing basic biosynthetic pathways, canonical de novo nucleotide biosynthesis from simpler precursors consists of the following steps16,17 (Figure 1-4). Here, nucleotides are synthesized from scratch (ammonia, CO2, amino acids, ATP, etc.). The synthesis of purine nucleotides starts from amino acids, CO2, and ATP, and proceeds by: 1) synthesis of inosine monophosphate (IMP) by IMP cyclohydrolase; 2) synthesis of xanthosine monophosphate (XMP) and succinyladenosine monophosphate (S-AMP) by IMP dehydrogenase (IMPDH) and adenylosuccinate synthase (ASS); 3) synthesis of GMP and AMP by GMP synthase (GMPS) and adenylosuccinate lyase (ASL); 4) synthesis of GDP and ADP by the corresponding nucleoside monophosphate kinases (NMPKs: GMK and ADK); and 5) synthesis of GTP and ATP by nucleoside diphosphate kinase (NDPK).
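The purine branch can be viewed as a small directed graph of substrate-to-product steps, each labeled by its enzyme. The following sketch encodes the steps from IMP listed above, with the NDPK steps written as GDP to GTP and ADP to ATP. It then uses breadth-first search to recover the enzyme sequence that reaches a target. The edge list is a simplified reading of this paragraph, not a complete metabolic model, and `enzyme_route` is a hypothetical helper.

```python
from collections import deque
from typing import Optional

# Purine branch from IMP as (substrate, enzyme, product) edges.
EDGES = [
    ("IMP", "IMPDH", "XMP"),
    ("XMP", "GMPS", "GMP"),
    ("GMP", "GMK", "GDP"),
    ("GDP", "NDPK", "GTP"),
    ("IMP", "ASS", "S-AMP"),
    ("S-AMP", "ASL", "AMP"),
    ("AMP", "ADK", "ADP"),
    ("ADP", "NDPK", "ATP"),
]


def enzyme_route(start: str, target: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest enzyme sequence from start to target."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for substrate, enzyme, product in EDGES:
        graph.setdefault(substrate, []).append((enzyme, product))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        metabolite, path = queue.popleft()
        if metabolite == target:
            return path
        for enzyme, product in graph.get(metabolite, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, path + [enzyme]))
    return None  # target not reachable from start


print(enzyme_route("IMP", "GTP"))  # ['IMPDH', 'GMPS', 'GMK', 'NDPK']
print(enzyme_route("IMP", "ATP"))  # ['ASS', 'ASL', 'ADK', 'NDPK']
```

Representing the pathway as edges makes it straightforward to add salvage steps or hypothetical AEGIS-specific steps later and query the route again.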

The synthesis of standard pyrimidine nucleotides starts from ammonia, amino acids, ATP, and CO2, followed by UTP synthesis: 1) synthesis of orotidine 5′-monophosphate (OMP) by orotate phosphoribosyltransferase; 2) synthesis of UMP by orotidine 5′-phosphate decarboxylase (or UMP synthetase, UMPS); 3) synthesis of UDP by UMK; and 4) synthesis of UTP by NDPK. Then, dTMP (the precursor of dTTP) and CTP are synthesized by thymidylate synthase (TS) and CTP synthase (CTPS) from dUMP and UTP, respectively (these are not the only routes). Ribonucleoside diphosphate reductase (RNR) reduces ribonucleoside diphosphates (or, in some organisms, triphosphates) to the corresponding deoxyribonucleotides, which lead to the deoxynucleoside triphosphates.

Nucleotides can also be obtained by recycling nucleobases and nucleosides (salvage pathways). Here, 1) nucleoside kinase (NK) phosphorylates recycled nucleosides, and 2) phosphoribosyltransferase (PRT) adds activated ribose-5-phosphate (phosphoribosyl pyrophosphate, PRPP) to recycled nucleobases to regenerate nucleoside monophosphates (Figure 1-4, pink arrows). The monophosphates are then converted to the corresponding triphosphates by NMPKs and NDPK.

Clearly, developing a biosynthesis for AEGIS nucleosides that follows this strategy would be quite difficult. In particular, the forging of the heterocyclic rings would need to be entirely novel. Z is a pyridine, not a pyrimidine. The ring system of P contains the same numbers of nitrogen and carbon atoms as the ring systems of A and G (4 and 5, respectively), but they are arranged differently, which would require an entirely new biosynthetic strategy.


Figure 1-4. Some enzymes used for nucleotide de novo and salvage synthesis pathways. Orange: NKs, blue: NMPKs, black thick arrows: NDPK.

Kinases

Therefore, we opted for a simpler approach that began with the AEGIS nucleoside fully formed and used stepwise phosphorylation to convert it to its monophosphate, diphosphate, and triphosphate. This relied on recruiting natural kinases, which act on standard nucleosides/nucleotides, to catalyze these phosphorylation steps.
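The three-step route can be written as a short pipeline. The following sketch lists each enzyme class, substrate, and product for a given nucleoside and tallies the ATP consumed. It assumes that ATP is the phosphate donor at every step, as stated later in this chapter for the kinases used here. Names such as "dZMP" follow the abbreviations used in this dissertation, while the function names are hypothetical.

```python
# Enzyme class, substrate suffix, and product suffix for each phosphorylation step.
STEPS = [
    ("NK", "", "MP"),      # nucleoside -> monophosphate
    ("NMPK", "MP", "DP"),  # monophosphate -> diphosphate
    ("NDPK", "DP", "TP"),  # diphosphate -> triphosphate
]


def phosphorylation_route(nucleoside: str) -> list[tuple[str, str, str]]:
    """Return (enzyme class, substrate, product) for each step, e.g. dZ -> dZMP."""
    return [(enzyme, nucleoside + s, nucleoside + p) for enzyme, s, p in STEPS]


def atp_consumed(nucleoside: str) -> int:
    """One ATP -> ADP per step, assuming ATP is the phosphate donor throughout."""
    return len(phosphorylation_route(nucleoside))


for enzyme, substrate, product in phosphorylation_route("dZ"):
    print(f"{substrate} --{enzyme}, ATP->ADP--> {product}")
print("ATP consumed per triphosphate:", atp_consumed("dZ"))
```

Under this assumption, each AEGIS triphosphate costs three ATP. A failure at any single step (for example, an NMPK that rejects the monophosphate) blocks the entire route.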

Among available natural kinases, both salvage and standard nucleoside and nucleotide kinases are possible recruitment candidates. They play important roles in all steps of triphosphate synthesis, transferring phosphoryl groups from high-energy phosphoanhydride bonds. From the perspective of semi-synthetic organisms, they present different recruitment problems. Some kinases have broad substrate specificities, especially those that phosphorylate diphosphates (NDPK); we found these to be easily recruited to handle AEGIS nucleoside diphosphates.

Unfortunately, other kinases, especially those that phosphorylate nucleosides and nucleoside monophosphates (NK, NMPK), have quite narrow substrate specificities (Table 1-1). These proved difficult to recruit for AEGIS nucleosides/nucleotides.

As seen in Table 1-1, the reactions catalyzed by the kinases are similar in two domains of life (eukaryotes and eubacteria), but not in archaea. Unfortunately, fewer experimental results are available for archaeal enzymes18 (most inferences about archaeal enzymes are based on unreliable genome annotations transferred from non-archaea), making them more difficult to assess as recruitment candidates. Viral kinases often have complex natural histories (their genes having been captured multiple times19) compared with those of organisms in the three domains. This makes them interesting sources of kinases for recruitment, as they may handle unnatural substrate analogs differently from cellular kinases.

Table 1-1. Diversification and conservation of nucleoside/tide kinases in all three domains.

NK (N→NMP) NMPK (NMP→NDP) NDPK (NDP→NTP) Base Archaea Eukaryote* Archaea Bacteria* Eukaryote* Archaea Bacteria Eukaryote A + + + + + + (broad T specificity + + + + + nucleoside Not yet G + + + + + kinase and found + + C specific + + kinases) + + + U + +

*Some bacterial and eukaryotic species have broad-specificity (deoxy)ribonucleoside/tide kinases.

Since our semi-synthetic organism would be based on E. coli, we started with its kinases. E. coli has the following known kinases. NK: uridine/cytidine kinase,20 deoxythymidine kinase,21 and inosine-guanosine kinase.22 NMPK: guanylate kinase,23 adenylate kinase,24 cytidylate kinase,25 thymidylate kinase,26 and uridylate kinase.27 NDPK: nucleoside diphosphate kinase.28 These, therefore, are the first enzymes that we might recruit to create a semi-synthetic organism. E. coli and other kinases used in this study are described below. All the kinases use ATP as their primary phosphate donor.
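One way to organize the list above is to tag each E. coli kinase with the step it catalyzes and the class of its natural substrate, then look up candidates for each AEGIS base. In the following sketch, the purine/pyrimidine tags follow from the enzyme names. Treating Z as a pyrimidine analog and P as a purine analog is an assumption made for illustration. The lookup only filters by name-based substrate class and predicts nothing about actual activity on AEGIS substrates. The names `ECOLI_KINASES` and `candidates` are hypothetical.

```python
# (step, E. coli kinase, class of natural substrate) for the kinases listed above.
ECOLI_KINASES = [
    ("NK", "uridine/cytidine kinase", "pyrimidine"),
    ("NK", "deoxythymidine kinase", "pyrimidine"),
    ("NK", "inosine-guanosine kinase", "purine"),
    ("NMPK", "guanylate kinase", "purine"),
    ("NMPK", "adenylate kinase", "purine"),
    ("NMPK", "cytidylate kinase", "pyrimidine"),
    ("NMPK", "thymidylate kinase", "pyrimidine"),
    ("NMPK", "uridylate kinase", "pyrimidine"),
    ("NDPK", "nucleoside diphosphate kinase", "any"),
]

# Illustrative assumption: Z is handled like a pyrimidine, P like a purine.
ANALOG_CLASS = {"Z": "pyrimidine", "P": "purine"}


def candidates(aegis_base: str, step: str) -> list[str]:
    """E. coli kinases for `step` whose natural substrate class matches the analog."""
    wanted = ANALOG_CLASS[aegis_base]
    return [name for s, name, cls in ECOLI_KINASES
            if s == step and cls in (wanted, "any")]


for base in ("Z", "P"):
    for step in ("NK", "NMPK", "NDPK"):
        print(base, step, candidates(base, step))
```

The broad-specificity NDPK appears as a candidate for both bases. The NK and NMPK steps each offer only a few narrow-specificity candidates, which matches the recruitment difficulty described above.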


Nucleoside kinase (NK)

Nucleoside kinases have distinct specificities in different organisms. Some bacteria and eukaryotes have deoxynucleoside kinases; archaea, however, do not.29 Moreover, some eukaryotes have broad-specificity "deoxyribo"nucleoside kinases,30-33 while some archaea have broad-specificity "ribo"nucleoside kinases.34-36 Further, herpes simplex virus type 1 has a unique bifunctional thymidine/TMP kinase.29,37 Here, we review the recruitment candidates that we chose and explain our choices.

DmdNK (EC 2.7.1.145): This kinase was selected because it is one of the most commonly used deoxynucleoside kinases for phosphorylating deoxynucleoside analogues, especially in gene therapy applications. It is naturally found in Drosophila melanogaster and is known to phosphorylate a range of nucleoside analogues,38,39 including 2-chloro-2′-deoxyadenosine, arabinofuranosylcytosine, and other drugs. The kinase was first isolated in 1998. It has broad substrate specificity, with kcat/KM values ranging from 2.9 × 10⁴ to 1.6 × 10⁷, and it prefers pyrimidines.32

This kinase belongs to a family that includes other deoxythymidine kinases, the bifunctional thymidine/TMP kinase from herpes simplex virus type 1,29 and others, including the NadR family, which has ribosylnicotinamide kinase,40 nicotinamide-nucleotide adenylyltransferase,41 242 involved in NAD biosynthesis and other physiological processes, including nucleotide phosphorylation.43,44

The substrate specificity of DmdNK can be altered with only a few amino acid substitutions,33 including changes that shift its preference toward nucleoside analogues, although such variants have lower kcat/KM values than the wild-type enzyme. For example, Liu and coworkers showed that DmdNK variants carrying four to eight amino acid replacements preferred nucleoside analogues (3′-deoxythymidine (ddT) and a fluorescent ddT derivative) to natural nucleosides (dT, dC, dA, dG),39,45 again with lower kcat/KM than the wild type. Crystal structures of this enzyme are available.46

Considering these factors, we examined DmdNK variants in which Gln81 was replaced by various amino acids, hoping to obtain one that accepts AEGIS nucleosides. According to the crystal structure, this residue appears to contact the nucleobase,38 making it likely that mutating it will change the specificity with respect to Z and P.

Uridine/cytidine kinase (Udk) (EC 2.7.1.48): This kinase was chosen for Z phosphorylation because this E. coli kinase phosphorylates both uridine and cytidine; we hoped that this broad substrate specificity would extend to AEGIS pyrimidine analogs. The kinase was first isolated in 1978, and its KM values for cytidine and uridine are reported as 0.13 mM and 0.35 mM, respectively.20

This kinase belongs to the UCPP family, which includes pantothenate kinase47 and .48 The enzymes that belong to this family all catalyze , but have quite different structures. They all have a sequence motif known as the Walker B motif, of the form hhhhEG (h, hydrophobic residue).29

Inosine-guanosine kinase (Gsk) (EC 2.7.1.73): This E. coli kinase was chosen as a candidate to phosphorylate the AEGIS purine analog P, since it phosphorylates two natural purine nucleosides, inosine and guanosine. The kinase was first isolated in 1995.22 It belongs to the PfkB/RK family, which contains ribokinase (RK) and 1-phosphofructokinase (Pfk), among other enzymes. These enzymes share two highly conserved motifs. The first contains two consecutive Gly residues and forms a hinge between the lid and  domain. The second contains the catalytic base Asp; it is located in the C-terminal region and is involved in ATP binding and in the formation of an anion hole.49,50


Nucleoside monophosphate kinases (NMPK)

In all three domains of life, NMPKs are generally specific for nucleotides bearing a particular nucleobase; most accept only one. As exceptions, yeast and bacteriophages have nucleoside monophosphate kinases whose specificities cover more than one nucleotide51-55 (EC 2.7.4.13). We therefore considered these as possible recruitment candidates.

These considerations applied to our examination of Ura6p (EC 2.7.4.13). Although the E. coli genome encodes a uridylate kinase27,56 (Umk, EC 2.7.4.22), it is believed to accept only UMP. In contrast, yeast uridylate kinase (Ura6p) is reported to phosphorylate multiple nucleotides, including both pyrimidines and purines.52,53 We therefore chose the yeast kinase as a recruitment candidate.

Cytidylate kinase (CMK, EC 2.7.4.25): Cytidylate kinase from E. coli (EcCmk)25 phosphorylates the pyrimidine nucleotides CMP and dCMP. We therefore examined this kinase as a candidate to phosphorylate dZMP, an AEGIS pyrimidine analog. Cytidylate kinase is ubiquitous in archaea and bacteria but absent from eukaryotes;29 instead, eukaryotes have CMP/UMP kinases that accept both uridine and cytidine monophosphates (EC 2.7.4.14).

Cytidylate kinase was first described in 1995.25 The cytidylate kinase forms a distinct evolutionary group, but it has resemblances to archaeal adenylate kinase and 2-.29

Cytidylate kinase from Thermotoga maritima (TmCMPK), which has evidently not been studied in sufficient detail to generate a publication (although the structure of thymidylate kinase from T. maritima is available57), was another choice to phosphorylate Z nucleotides. We learned about this enzyme from our collaborator, Dr. Stefan Lutz from Emory University, who has been studying it for gene therapy applications.


Thymidylate kinase (TMK, EC 2.7.4.9): Thymidylate kinase from E. coli (EcTmk) phosphorylates dTMP and can use ATP, dATP, and GTP as phosphate donors. This kinase was chosen to phosphorylate dZMP because it phosphorylates a pyrimidine nucleotide. E. coli thymidylate kinase was first reported in 1996,26 and it belongs to the same group as the other deoxyribonucleoside kinases mentioned above.29 Along with adenylate kinase, it is one of the two monophosphate kinases found in all three domains (guanylate kinase has not been identified in archaea29,58).

Adenylate kinase (ADK, EC 2.7.4.3): Based on its substrate preference (the enzyme accepts nucleotides bearing the purine nucleobase adenine), adenylate kinase was proposed to be the native kinase most likely to phosphorylate dPMP, whose AEGIS nucleobase is purine-like. The kinase was first isolated in 1985,24 and it was shown to catalyze the conversion of (d)AMP to (d)ADP in the presence of magnesium ion.24,59,60 Crystal structures of this enzyme are available.61 Interestingly, adenylate kinase also works as a diphosphate kinase (converting (d)NDP to (d)NTP), apparently operating physiologically when the nucleoside diphosphate kinase is deleted.62 Some adenylate kinases can also use multiple triphosphates as phosphate donors17 (GTP, CTP, TTP, UTP, ITP), and E. coli adenylate kinase is reported to be able to use GTP and ADP.63,64

The adenylate kinase family is characterized by the insertion of Gly after the P-loop (Walker A) Lys, an hhhDGFPR motif (another example of the Walker B motif), and an RxDD motif at the N-terminus. The Lys and Arg residues are believed to be important through their interactions with the phosphate groups of nucleotides.61,65-67
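The motif shorthand used here (h for a hydrophobic residue, x for any residue) translates directly into regular expressions. The Python sketch below is a hypothetical illustration of scanning a protein sequence for such motifs; the hydrophobic residue set, the helper names, and the toy sequence are assumptions, not definitions taken from the cited classification.

```python
import re

# Assumed hydrophobic set for "h"; published motif definitions may differ.
HYDROPHOBIC = "AVLIMFWYC"


def motif_to_regex(motif):
    """Translate motif notation (h = hydrophobic, x = any residue) into a regex."""
    parts = []
    for ch in motif:
        if ch == "h":
            parts.append("[" + HYDROPHOBIC + "]")
        elif ch == "x":
            parts.append("[A-Z]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def find_motif(sequence, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # Wrapping the pattern in a zero-width lookahead lets overlapping hits be reported.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Toy sequence (not a real kinase) containing hhhDGFPR-like and RxDD-like stretches.
toy = "MKRIILLDGFPRAQRLDDV"
print(find_motif(toy, "hhhDGFPR"))
print(find_motif(toy, "RxDD"))
```

Because the search uses a lookahead, overlapping matches are all reported; a real analysis would also need to decide how strictly "h" should be defined.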

Guanylate kinase (GMK, EC 2.7.4.8): Guanylate kinase from E. coli (EcGmk) was isolated in 1993.23 Homologous forms of this kinase are found in bacteria and eukaryotes; however, no homologous enzyme has been identified in archaea.29,58 Besides guanylate kinase itself, the guanylate kinase group includes two other families: the membrane-associated guanylate kinase-like proteins68 and PhnN (ribose 1,5-bisphosphate phosphokinase).69 Proteins in this group have two conserved Arg residues and a conserved second acidic residue after the Asp of the Walker B motif.29

Guanylate kinase from Thermotoga maritima (TmGMPK) has not yet been studied to the point where a publication is available. Again, we learned about it from our collaborator, Stefan Lutz at Emory. It offered us another choice to phosphorylate P nucleotides.

T5 dNMPK: The deoxynucleoside monophosphate kinase from bacteriophage T5 (EC 2.7.4.13) accepts all four natural deoxynucleoside monophosphates.54,70 Because of this broad specificity toward the standard nucleobases, we considered it a candidate to phosphorylate both Z and P. It is one of only four multi-substrate nucleoside monophosphate kinases identified so far;71 the others are from bacteriophage T4,51 Streptomyces bacteriophage ϕC31,55 and yeast (Ura6p).52,53

Nucleoside diphosphate kinase (NDPK)

Nucleoside diphosphate kinase (EC 2.7.4.6) may be the best-studied nucleotide kinase in terms of its biological and pathological characterization, in part because it is a target for tumor therapy. NDPK associates with many other proteins and is involved in various metabolic and developmental pathways in both bacteria and eukaryotes.72-74

Nucleoside diphosphate kinase is found in organisms from all three domains. Most unicellular organisms have a single copy of the NDPK gene, whereas most multicellular organisms have multiple copies resulting from gene duplications.75 No nucleobase-specific nucleoside diphosphate kinase has been found so far; the natural enzymes all appear to accept all five standard nucleobases.


Since our semi-synthetic organism is to be based on E. coli, we chose E. coli NDPK (Ndk) as the first to examine, to learn whether it could phosphorylate dPDP and dZDP. In E. coli, this kinase serves as the primary nucleoside diphosphate kinase for all five natural nucleotides (both ribo- and deoxyribonucleotides).17 The kinase was first isolated in 1995 and catalyzes the conversion of (d)NDP to (d)NTP in the presence of magnesium ion.28 A structure of this enzyme is available, but without any substrate bound in the active site.76 This enzyme is known to use a histidine residue to accept the transferred phosphate group in a ping-pong mechanism.28,77,78
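The ping-pong mechanism mentioned above has a characteristic steady-state rate law. As a hedged illustration, the Python sketch below evaluates the standard two-substrate ping-pong equation (without substrate-inhibition terms) and shows its textbook diagnostic: double-reciprocal plots of 1/v against 1/[donor] at different acceptor concentrations are parallel, with slope Km(donor)/Vmax. The parameter values are hypothetical and were not measured for Ndk.

```python
def ping_pong_rate(vmax, km_a, km_b, a, b):
    """Initial rate for a bi-bi ping-pong mechanism (no inhibition terms).

    a: concentration of the phosphate donor (e.g., ATP)
    b: concentration of the acceptor (e.g., a dNDP)
    km_a, km_b: Michaelis constants in the same units as a and b
    """
    return vmax * a * b / (km_b * a + km_a * b + a * b)


def reciprocal_slope(vmax, km_a, km_b, b, a1=1.0, a2=2.0):
    """Slope of 1/v versus 1/[A] at a fixed [B], estimated from two points."""
    y1 = 1.0 / ping_pong_rate(vmax, km_a, km_b, a1, b)
    y2 = 1.0 / ping_pong_rate(vmax, km_a, km_b, a2, b)
    return (y1 - y2) / (1.0 / a1 - 1.0 / a2)


# Hypothetical parameters: the slope stays at km_a / vmax = 0.05 for every [B].
for b in (0.1, 1.0, 10.0):
    print(b, reciprocal_slope(vmax=10.0, km_a=0.5, km_b=0.2, b=b))
```

Rearranging the rate law gives 1/v = (Km,A/Vmax)(1/[A]) + (1/Vmax)(1 + Km,B/[B]), so only the intercept depends on [B]; that is why the lines come out parallel.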


CHAPTER 2 RESEARCH OBJECTIVES

Goals

The objective of my research was to create the AEGIS triphosphate synthesis pathways needed to support the semi-synthetic organism that we hope to construct to replicate AEGIS DNA (Figure 1-1). The key steps in the pathway are: 1) transport (uptake) of nucleosides, 2) three phosphorylation steps, and 3) replication of plasmids containing AEGIS bases. Preliminary work in our group has shown that E. coli can take up some AEGIS nucleosides fed to the cell.

Furthermore, we already have polymerases that replicate ZP-containing plasmids. My work therefore focused on using protein engineering to create kinases that convert nucleosides to the corresponding triphosphates in vitro. This dissertation presents and discusses the following:

1. Developing kinase activity assays.
2. Testing wild-type kinase activities toward ZP nucleosides/tides.
3. Engineering deoxyribonucleoside/tide kinases.
4. Testing and screening the kinase variants using the assays.
5. Measuring kinetic parameters of the kinases.
6. Obtaining crystal structures of the Z nucleobase to understand its mispairing tendency and potential discrimination by enzymes.


CHAPTER 3 GENERAL METHODS AND PROCEDURES

Materials

All chemicals used in this study are from Sigma Aldrich (St. Louis, MO, USA) unless otherwise stated.

Nucleobase and Nucleoside/tide Synthesis

Z nucleobase was synthesized by Dr. Hyo-Joong Kim following a scheme from the Foundation for Applied Molecular Evolution (Appendix A). Other nucleosides and nucleotides were synthesized by Dr. Hyo-Joong Kim, Dr. Myong-Jung Kim, and Dr. Myong-Sang Kim from the Foundation for Applied Molecular Evolution and Firebird Biomolecular Sciences LLC.

Oligonucleotides Synthesis

Oligonucleotides containing the unnatural nucleobases Z and P were synthesized by Dr. Shuichi Hoshika from the Foundation for Applied Molecular Evolution using the automated solid-phase phosphoramidite procedure described previously79 and were purified by HPLC.

Standard Methods

Cloning/Synthesis of Genes and Construction of Plasmids

Genes encoding the nucleoside/tide kinases from E. coli were cloned/amplified using the corresponding primer sets, with E. coli strain MG1655 (E. coli Genetic Stock Center, Yale University, CT, USA) as the source. Each gene was fused to a HAT tag, which contains histidine residues, at the N-terminus and placed behind a tetracycline promoter. The plasmids carry ampicillin resistance as a selection marker. Plasmids were purified using the Zyppy® plasmid extraction kit (Zymo Research, Irvine, CA, USA), and each kinase gene sequence was confirmed by Sanger sequencing.80


Transformation

Plasmids carrying kinase genes were transformed into E. coli by electroporation. E. coli (DH5α) competent cells and purified PCR products were mixed in a PCR tube, transferred to a pre-chilled electroporation cuvette (1 mm gap), pulsed once (~1.8 kV, ~4.8 ms), diluted with cold recovery medium, and mixed thoroughly. The pulsed cells were then transferred to a tube containing SOC medium and incubated for 1 h at 37 °C to allow them to recover. A portion of the culture was spread on a pre-warmed LB agar plate containing the selection antibiotic and incubated overnight at 37 °C until colonies formed.

Enzyme Preparation

E. coli cells carrying the kinase expression plasmids were grown in LB medium at 37 °C with shaking until the OD at 600 nm reached 0.6. Kinase expression was induced with anhydrotetracycline (200 ng/mL), and the cells were then incubated for 4.5 h at 30 °C to reduce inclusion body formation. The cells were collected by centrifugation at 4000 ×g for 10 min and resuspended in ~10 mL of HAT tag binding/wash buffer (50 mM Tris-HCl, pH 7.5, 300 mM NaCl, 0.03% IGEPAL CA-600, 10 mM imidazole). The cells were lysed by sonication on ice (three 20 s pulses at 10 W, with 1 min rest intervals). The supernatant containing the crude protein was recovered by centrifugation at 16,000 ×g for 20 min and applied to a cobalt column (HisPur Cobalt Resin, Thermo Fisher Scientific, Waltham, MA, USA) using FPLC. The column was washed with the HAT tag wash buffer until the absorbance at 280 nm returned to baseline, and the kinase was eluted with elution buffer (wash buffer supplemented with 1 mM DTT and 200 mM imidazole). The size and purity of the purified kinases were confirmed by SDS-PAGE81 with gels stained by the Fairbanks method,82 and the concentration was determined by the Bradford assay83 and/or absorbance at 280 nm.84
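Concentration from absorbance at 280 nm follows the Beer-Lambert law, c = A/(ε·l), with ε usually estimated from the protein sequence. The Python sketch below assumes the commonly used residue absorptivities tabulated by Pace and coworkers (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹); whether reference 84 uses the same values is not stated here. The function names, the toy sequence, and the molecular weight are hypothetical, and for tagged kinases the tag residues would need to be included in the sequence.

```python
# Residue molar absorptivities at 280 nm (M^-1 cm^-1), as commonly tabulated by
# Pace and coworkers; this is a sequence-based estimate, not a measurement.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125


def extinction_coefficient_280(sequence, n_disulfides=0):
    """Estimate the molar extinction coefficient at 280 nm from a protein sequence."""
    seq = sequence.upper()
    return seq.count("W") * EPS_TRP + seq.count("Y") * EPS_TYR + n_disulfides * EPS_CYSTINE


def concentration_from_a280(a280, epsilon, path_cm=1.0, mw_g_per_mol=None):
    """Beer-Lambert: c = A / (epsilon * l).

    Returns the molar concentration, or (molar, mg/mL) if a molecular weight is given.
    """
    molar = a280 / (epsilon * path_cm)
    if mw_g_per_mol is None:
        return molar
    return molar, molar * mw_g_per_mol  # g/L is numerically equal to mg/mL


# Hypothetical example: a toy sequence with 2 Trp and 10 Tyr, MW assumed to be 20 kDa.
eps = extinction_coefficient_280("MWW" + "Y" * 10)
print(eps, concentration_from_a280(0.259, eps, mw_g_per_mol=20000))
```

A blank of the elution buffer should be subtracted before A280 is used, since imidazole absorbs slightly in the UV.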


CHAPTER 4 ASSAY DEVELOPMENT WITH WILD-TYPE ENZYMES

Materials and Methods

Preparation of Enzymes

All kinases were purified using the standard histidine-tag purification method described in the previous chapter. A His or HAT tag was attached to the N- or C-terminus of each kinase. Only DmdNK carried a His tag; the other kinases carried the HAT tag, a tag derived from a native protein that contains residues such as Lys, Asp, and Asn in addition to histidine, to promote protein solubility.

For DmdNK, one group has reported that the C-terminal segment of the enzyme is important for its catalytic activity and specificity.85 It has also been reported that an N-terminal truncation does not affect the activity of human thymidine kinase 2,86 which is related to DmdNK.87 The His tag was therefore attached to the N-terminus of DmdNK; this was tested and found to have no discernible effect on the kinase's activity.

Kinase Assays

In this study, four assays were developed: an enzyme-coupled assay, a TLC assay, a luciferase assay, and a modified 5′ nuclease assay. Each assay was designed with two criteria in mind: its ability to measure the desired activity and its suitability for high-throughput screening.

Enzyme-coupled assay

This assay allows continuous measurement of the reaction as it proceeds; it is, however, an indirect method of confirming kinase activity. The enzyme-coupled assay involves three reactions: 1) an active kinase phosphorylates the nucleoside using ATP as the phosphate donor, generating ADP; 2) rabbit pyruvate kinase uses the resulting ADP to generate pyruvate from phosphoenolpyruvate; 3) lactate dehydrogenase converts pyruvate and NADH, which absorbs at 340 nm, to lactate and NAD+. Because NAD+ does not absorb at 340 nm, kinase activity is read out as a decrease in absorbance at 340 nm.

This chapter is reproduced in part with permission from [Matsuura, M.F., Shaw, R.W., Moses, J.D., Kim, H.J., Kim, M.J., Kim, M.S., Hoshika, S., Karalkar, N. and Benner, S.A. Assays To Detect the Formation of Triphosphates of Unnatural Nucleotides: Application to Escherichia coli Nucleoside Diphosphate Kinase. ACS Synth. Biol. 2016, 5, 234-240. doi: 10.1021/acssynbio.5b00172]. Copyright 2016 American Chemical Society.

To confirm kinase activity, a mixture (100 µL) containing 50 mM Tris-HCl (pH 7.5), 100 mM KCl, 2.5 mM MgCl2, 0.18 mM NADH, 0.21 mM PEP, 1 mM ATP, 1 mM DTT, 0.1 mg/mL BSA, 30 U pyruvate kinase (Roche, Basel, Switzerland), and 33 U lactate dehydrogenase (Roche) was placed in a clear 96-well plate. The kinase of interest (pg to ng range, 1 µL) was added, and the reaction was initiated by adding 0.5 mM nucleoside/nucleotide substrate. Absorbance at 340 nm was monitored for 60 min at room temperature. Mixtures lacking either the kinase or the nucleoside substrate were used as negative controls. Any ATPase activity interferes with the assay, because it generates ADP from ATP by transferring the terminal phosphate group to water rather than to a nucleoside/tide.
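The A340 trace can be converted into a kinase rate because one NADH is oxidized for each ADP released, provided the coupling enzymes are not rate-limiting. The Python sketch below assumes the standard NADH molar absorptivity at 340 nm (6220 M⁻¹ cm⁻¹) and a user-supplied optical path length; the function names and the example trace are hypothetical. The path length of 100 µL in a 96-well plate is well under 1 cm and would have to be measured or corrected for.

```python
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1; NADH absorbs at 340 nm, NAD+ does not


def slope(xs, ys):
    """Ordinary least-squares slope of ys versus xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def adp_formation_rate_uM_per_min(times_min, a340, path_cm, background_slope=0.0):
    """Convert the decline in A340 into a rate of ADP formation (uM/min).

    Assumes one NADH is oxidized per ADP (coupling enzymes in excess).
    background_slope: dA340/min from a no-kinase or no-substrate control,
    which also captures ADP generated by contaminating ATPase activity.
    """
    d_a_per_min = slope(times_min, a340) - background_slope
    return -d_a_per_min / (EPSILON_NADH_340 * path_cm) * 1e6


# Hypothetical linear trace, 1 cm path for simplicity.
rate = adp_formation_rate_uM_per_min([0, 1, 2, 3], [1.0, 0.9, 0.8, 0.7], path_cm=1.0)
print(f"{rate:.1f} uM/min")
```

With 0.18 mM NADH in the mixture (an initial A340 of about 1.1 over a 1 cm path), at most about 0.18 mM ADP can be reported before NADH runs out, so only the initial, linear part of the trace is suitable for rate estimates.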

TLC assay

This assay uses classical thin-layer chromatography (an end-point assay) and is a direct method: the characteristic TLC mobility of the phosphorylated products provides direct evidence for their formation. Enzyme, nucleoside, and γ-32P-labeled ATP were incubated in a reaction buffer, and the resulting nucleotide products were separated. Cellulose plates coated with polyethyleneimine (positively charged at pH 3.5) were used as the solid phase, so that negatively charged species are retained and separated according to their mobility. Product spots were identified by running reference compounds and checking their mobility under UV light; the retention factors (Rf) calculated for each reference compound were used for product confirmation.
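Rf is the distance travelled by a spot divided by the distance travelled by the solvent front, and assignment amounts to finding reference compounds with matching Rf. The Python sketch below uses a subset of the Rf values reported in Table 4-2 (0.75 M KH2PO4); the matching tolerance and helper names are assumptions for illustration only.

```python
def retention_factor(spot_distance_cm, solvent_front_cm):
    """Rf = distance travelled by the spot / distance travelled by the solvent front."""
    if solvent_front_cm <= 0:
        raise ValueError("solvent front distance must be positive")
    return spot_distance_cm / solvent_front_cm


# Subset of Table 4-2 (PEI-cellulose, 0.75 M KH2PO4).
REFERENCE_RF_KH2PO4 = {
    "ATP": 0.27,
    "dPMP": 0.66, "dPDP": 0.63, "dPTP": 0.44,
    "dZMP": 0.55, "dZDP": 0.35, "dZTP": 0.15,
}


def candidate_assignments(rf, references, tolerance=0.03):
    """Reference compounds whose Rf lies within an (assumed) tolerance, closest first."""
    hits = sorted(
        (abs(rf - ref_rf), name)
        for name, ref_rf in references.items()
        if abs(rf - ref_rf) <= tolerance
    )
    return [name for _, name in hits]


rf = retention_factor(4.4, 10.0)
print(rf, candidate_assignments(rf, REFERENCE_RF_KH2PO4))
print(candidate_assignments(0.64, REFERENCE_RF_KH2PO4))
```

The second call returns two candidates (dPDP and dPMP), which illustrates why co-run reference compounds, and occasionally a second dimension of TLC, are needed when Rf values lie close together.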

The disadvantages of the TLC assay are: 1) it uses a radiolabeled substrate; 2) it requires more time and attention (half a day to a full day to get results); and 3) when a longer incubation period is needed because substrate conversion is incomplete, radiolabeled inorganic phosphate appears and interferes with the analysis.

For samples containing nucleoside kinases, phosphoryl transfer was performed in a reaction mixture (20 μL) containing 16.7 nM [γ-32P]ATP (PerkinElmer, Waltham, MA, USA), 0.5 mM deoxynucleoside substrate (G, A, C, T, Z, or P), 50 mM Tris-HCl (pH 7.6), 5 mM MgCl2, 100 mM KCl, 0.5 mg/mL BSA, 1 mM DTT, 1 mM unlabeled ATP, and nucleoside kinase (ng range). The samples were incubated for 30 min at 37 °C and then mixed rapidly with cold formic acid (11 M, 2 μL) to terminate the reaction. Inorganic phosphate was then removed by adding a mixture (2 μL) of 400 mM sodium tungstate, 500 mM tetraethylammonium chloride, and 500 mM procaine HCl in a 5:4:1 ratio.88 The inorganic phosphate complex was precipitated by centrifugation, and the supernatant (2 μL) was spotted on polyethyleneimine (PEI) cellulose F thin-layer chromatography sheets (EMD Millipore, Billerica, MA, USA). The nucleotides were separated in solvent "Aa" (1 M acetic acid, pH adjusted to 3.5 with NH4OH).89 The sheets were autoradiographed using phosphor imaging screens (Bio-Rad, Hercules, CA, USA), and the TLC images were captured with a Personal Molecular Imager™ (PMI) System (Bio-Rad). Each compound was identified by comparing its mobility with those of reference compounds visualized by UV shadowing.
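Phosphorimager spot volumes from this end-point assay can be turned into an approximate amount of product. The Python sketch below is hypothetical and rests on stated assumptions: labeled and unlabeled ATP are used interchangeably (so the fraction of ³²P label found in the product equals the fraction of total ATP consumed), spot volumes are background-corrected, and label lost as inorganic phosphate is ignored, which biases the estimate upward. The function name and example counts are invented for illustration.

```python
def product_formed_mM(product_counts, atp_counts, background=0.0, total_atp_mM=1.0):
    """Estimate nucleotide product formed from phosphorimager spot volumes.

    product_counts: signal of the labeled product spot (e.g., a dNMP)
    atp_counts: signal of the remaining [gamma-32P]ATP spot
    background: per-spot background to subtract (assumed uniform)
    total_atp_mM: total ATP (1 mM unlabeled plus a trace of labeled ATP)
    """
    product = max(product_counts - background, 0.0)
    atp = max(atp_counts - background, 0.0)
    if product + atp == 0:
        raise ValueError("no signal above background")
    label_fraction = product / (product + atp)
    return label_fraction * total_atp_mM


# Hypothetical spot volumes: 25% of the label in the product spot.
print(product_formed_mM(2600, 7600, background=100))
```

In this example 0.25 mM product corresponds to half of the 0.5 mM nucleoside substrate being phosphorylated.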

For samples containing nucleotide kinases, phosphoryl transfer was performed in the same reaction buffer (20 μL), but with nucleotide substrates and the corresponding kinases (ng range). The samples were treated by the same method as for nucleoside kinases, but the products were separated using a solvent containing 0.75 M KH2PO4 (pH adjusted to 3.5 with H3PO4).90 Formation of dATP (which has an Rf similar to that of ATP) was confirmed by two-dimensional TLC on a PEI cellulose plate, using solvent "Aa" (1 M acetic acid, pH 3.5) in the first dimension and solvent "Sa" (74 g of (NH4)2SO4 and 0.4 g of NH4HSO4 in 100 mL of water) in the second.89

Luciferase assay

This assay measures either the forward or the reverse reaction of the enzymes (Figure 4-1). ATP (forward) or ADP (reverse) is added to a reaction mixture. If a kinase is active with the substrates presented, it catalyzes the forward or reverse reaction and consumes (forward) or produces (reverse) ATP. The remaining ATP is then detected by luciferase, which, in the presence of oxygen, uses ATP and luciferin to produce luminescence; this is the same reaction used by fireflies.

Mixtures containing only enzymes or substrates were used as negative controls.

Figure 4-1. Forward and reverse reactions used for the luciferase assay. Forward reaction consumes ATP, and results in signal decrease if the kinase is active. Reverse reaction generates ATP, and results in signal increase if the kinase is active.

The ATPlite luminescence assay system (PerkinElmer, Waltham, MA, USA) was used to detect ATP consumption or generation resulting from the forward or reverse kinase reaction. Equimolar or higher amounts of nucleoside or nucleotide substrates (0.5-100 μM) and ADP or ATP (0.5-10 μM) were mixed with kinases in a buffer (50 μL) containing Tris-HCl (50 mM, pH 7.5) and 5 mM MgCl2. The reaction mixture was incubated for 5 min at room temperature (82 °C for TmCMPK and TmGMPK). A substrate solution (25 μL, containing luciferase and luciferin, supplied with the kit) was then added, and the mixture was incubated in the dark for at least 5 min. The luminescence intensity of each sample was measured with a SpectraMax M5 (Molecular Devices, Sunnyvale, CA, USA).
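Luminescence readings can be normalized to express the forward reaction as the fraction of ATP consumed and the reverse reaction as ATP generated. The Python sketch below assumes that luminescence is proportional to ATP over the concentration range used; the no-ATP background wells and the ATP standard-curve slope are hypothetical additions, since the method above lists only enzyme-only and substrate-only controls.

```python
from statistics import mean


def fraction_atp_consumed(sample_rlu, reference_rlu, background_rlu):
    """Forward reaction: fraction of the added ATP consumed by the kinase.

    sample_rlu: replicate readings for ATP + kinase + substrate
    reference_rlu: replicate readings for a no-kinase control containing ATP
    background_rlu: replicate readings for a (hypothetical) well without ATP
    Assumes luminescence is proportional to ATP over the range used.
    """
    signal = mean(sample_rlu) - mean(background_rlu)
    full = mean(reference_rlu) - mean(background_rlu)
    return 1.0 - signal / full


def atp_generated_uM(sample_rlu, background_rlu, rlu_per_uM_atp):
    """Reverse reaction: ATP produced, using a slope from an ATP standard curve."""
    return (mean(sample_rlu) - mean(background_rlu)) / rlu_per_uM_atp


# Hypothetical triplicate readings.
print(fraction_atp_consumed([400, 420, 380], [1000, 990, 1010], [0, 0, 0]))
print(atp_generated_uM([5200, 5000, 4800], [200, 200, 200], rlu_per_uM_atp=500.0))
```

Replicate means are used because the figures in this chapter report standard deviations of three independent trials; the same lists can be passed to `statistics.stdev` to reproduce error bars.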

Modified 5′ nuclease assay

This assay is a modified version of the 5′ nuclease assay (also known as the TaqMan® assay) introduced by Wilson and coworker in 2011.91 It was further modified by Shaw et al.92 and adapted for this study. This version of the assay detects a more complete reaction series: it starts with the AEGIS nucleoside diphosphates, converts them to triphosphates by kinase-catalyzed phosphorylation, and incorporates the triphosphates into a DNA oligonucleotide by polymerase-catalyzed primer extension. Incorporation is monitored through a fluorescence signal that is released only when the desired triphosphates are produced and used to extend the primer. This extension leads to cleavage of the DNA probe by the 5′ nuclease activity of the DNA polymerase, which releases the quencher-bearing probe fragments that otherwise mask the fluorophore. The assay is continuous, and all reactions occur in one pot (Figure 4-2).

In this assay, two kinds of primer/probe/template sets are used (Table 4-1). One architecture, used to detect the G, A, C, T, and Z triphosphates, places a fluorophore (JOE) on the template and a quencher on the probe, and uses E. coli Pol I to extend the DNA strand.92 This architecture moves the fluorophore from the 5′-end of the probe, where it sits in the original architecture,91 to the template. The change is designed to reduce background noise resulting from nonspecific hydrolysis of the probe 5′-end by the polymerase, allowing the assay to be used with polymerases that have strong 5′-nuclease activity, such as E. coli Pol I.


The original architecture91 was used for dPTP detection, because the architecture using a JOE fluorophore (JOE/GGAGTGAGTGTGAGGTGAATGGTTTGTTZGGCGGTGGAGGCGGT) showed a higher background signal (Figure 4-3). We interpret the false-positive signal observed in dPTP detection (Figure 4-3) as the result of misincorporation of other triphosphates opposite Z by E. coli Pol I. As dGTP was not present in the reaction mixture when dPTP was sought, misincorporation of the purine dATP opposite template Z is suspected. Since changing the pH and buffer components other than the polymerase did not reduce the background, the choice of polymerase appears to be important when performing the assay with a template containing Z.

Figure 4-2. The scheme for the 5′ nuclease assay. Triphosphates are produced from the corresponding diphosphates by the kinase (the figure shows Ndk/dZDP as an example), and are incorporated into the DNA strand by a DNA polymerase. Then the DNA polymerase extends the strand and cleaves the probe with its 5′ endonuclease activity, releasing quenchers to give fluorescence.


Table 4-1. DNA sequences of primer/probe/template used for the 5′ nuclease assay.
Name | Classification | Sequence (5′-3′) | Size (nt)
NDP-1 | Primer | CCG CCT CCA CCG CC | 14
3'Q-dGTP probe | Probe | ACC ATT CAC CTC ACA CTC ACT CA/BHQ2 | 23
3'Q-dATP probe | Probe | TGG TCC GTG GCT TGT GCG TGC GT/BHQ2 | 23
3'Q-dCTP probe | Probe | AGG ATT GAG GTA AGA GTG AGT GG/BHQ2 | 23
3'Q-dTTP probe | Probe | AGG ACC GAG GCA AGA GCG AGC GA/BHQ2 | 23
FAM-dATP probe | Probe | 6-FAM/TGG TCC GTG /ZEN/GCT TGT GCG TGC GT/3IABkFQ | 23
dGTP-detect | Template | JOE/TGA GTG AGT GTG AGG TGA ATG GTG TTG TTC GGC GGT GGA GGC GGT | 45
dATP-detect | Template | JOE/ACG CAC GCA CAA GCC ACG GAC CAC AAC AAT GGC GGT GGA GGC GGA | 45
dCTP-detect | Template | JOE/CCA CTC ACT CTT ACC TCA ATC CTC TTC TTG GGC GGT GGA GGC GGT | 45
dTTP-detect | Template | JOE/TCG CTC GCT CTT GCC TCG GTC CTG TTG TTA GGC GGT GGA GGC GGT | 45
dZTP-detect | Template | JOE/CCA CTC ACT CTT ACC TCA ATC CTC TTC TTP GGC GGT GGA GGC GGT | 44
dPTP-detect | Template | ACG CAC GCA CAA GCC ACG GAC CAA AZA AAG GCG GTG GAG GCG GA | 44
Letters in italics denote the type and location of probe modifications (fluorophores: 6-FAM and JOE; quenchers: ZEN, 3IABkFQ (3′ Iowa Black FQ), and BHQ2 (Black Hole Quencher-2)).
*The base in bold and italics is the template base opposite which the limiting dNTP pairs.

Figure 4-3. High background signals with dPTP detection template containing JOE fluorophore (5′ nuclease assay). The X-axis shows time in minutes and the Y-axis shows fluorescence intensity. Nucleotide substrate, ATP (phosphate donor), Pol I, Ndk, non- limiting dNTP mix (dATP, dCTP, and dTTP), and annealed primer/template/probe mixture were mixed and incubated. Negative control, which only contains buffer and Pol I, showed high background signal.

Primers, labeled probes, and labeled templates (Table 4-1) were purchased from IDT (Coralville, IA, USA). Templates containing Z and P nucleotides were synthesized by the solid-phase phosphoramidite method described previously (Chapter 3). In Table 4-1, letters in bold and italics denote the type and location of probe modifications [fluorophores: 6-FAM and JOE; quenchers: ZEN, 3IABkFQ (3′ Iowa Black FQ), and BHQ2 (Black Hole Quencher-2), all from IDT].

Nucleobases in bold and italics represent the template nucleobase opposite the nucleotide that limits the polymerase elongation reaction. The dZTP-detect template is designed to pair with the 3′Q-dCTP probe, and the dPTP-detect template pairs with the FAM-dATP probe. Primers, probes, and templates were annealed by incubation in a reaction mixture (100 μL) containing 7 mM Tris-HCl (pH 7.5), 70 mM NaCl, 0.7 mM EDTA (pH 8.0), 6.7 μM primer, 6.7 μM probe, and 7 μM template. The annealing program was 95 °C for 5 min, followed by cooling to 25 °C at 1 °C per 10 s. Nucleoside triphosphate products were detected in a reaction mixture (50 μL) containing 0.5 mM ATP, 30 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 100 mM KCl, 50 mM trehalose, 0.1 mM EGTA, 0.1 mg/mL BSA, 1 mM DTT, Pol I or Taq DNA polymerase (dPTP detection), Ndk, 0.1 mM non-limiting dNTP mix (an equimolar mixture of deoxyribonucleoside triphosphates: A, C, and T for dGTP detection; G, C, and T for dATP detection; G, A, and T for dCTP and dZTP detection; G, A, and C for dTTP detection; T for dPTP detection), and 134 nM annealed oligonucleotide mixture (1 μL of the annealed mixture prepared above). The reaction was monitored for two hours in a spectrophotometer (SPECTROstar Omega, BMG LABTECH, Ortenberg, Germany) at either 37 °C (with a high-fidelity variant of E. coli Pol I (K601I, A726V) developed by Dr. Ryan W. Shaw and Ms. Jennifer D. Moses92) or 45 °C (with Taq DNA polymerase).
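The 134 nM figure follows from diluting 1 μL of the annealed mixture (6.7 μM primer and probe) into the 50 μL reaction; the same C1V1 = C2V2 arithmetic gives 140 nM template, a slight excess over primer and probe. The short Python check below reproduces these numbers and the length of the annealing ramp; the function names are illustrative only.

```python
def final_concentration_nM(stock_uM, volume_added_uL, final_volume_uL):
    """C1*V1 = C2*V2, returning the final concentration in nM."""
    return stock_uM * 1000.0 * volume_added_uL / final_volume_uL


def ramp_duration_s(start_c, end_c, step_c=1.0, seconds_per_step=10.0):
    """Duration of a stepwise cooling ramp (1 degree C per 10 s by default)."""
    return (start_c - end_c) / step_c * seconds_per_step


# Values from the methods above.
primer_nM = final_concentration_nM(6.7, 1.0, 50.0)    # primer and probe
template_nM = final_concentration_nM(7.0, 1.0, 50.0)  # template
ramp_min = ramp_duration_s(95.0, 25.0) / 60.0
print(primer_nM, template_nM, ramp_min)
```

The cooling ramp from 95 °C to 25 °C therefore takes 700 s, or a little under 12 min, after the 5 min denaturation step.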

Results

Assay Development

Four assays were developed in this study. Their results with wild-type kinases (carrying affinity tags) and their substrates are shown below, together with the retention factor (Rf) values of reference compounds measured for the TLC assay.


Rf for TLC assay. To determine Rf values for the G, A, C, T, Z, and P mono-, di-, and triphosphates, TLC was performed with unlabeled nucleotides using PEI-cellulose as the solid phase and 0.75 M KH2PO4 or 1 M acetic acid as the mobile phase.89,90 The measured Rf values are listed in Table 4-2.

As expected, the Rf values decrease as the number of negative charges increases, because the solid phase is positively charged at pH ≤ 3.5 (anion-exchange chromatography). G, Z (and A) nucleotides have lower Rf values than the other nucleotides. We have no firm explanation for this, but it may be related to their pKa values (Table 4-3). The Z nucleobase is negatively charged at pH above 7.8 and may be attracted to PEI more strongly than other nucleotides.

ATP moves very little in the acetic acid solvent system, although it does move in KH2PO4, the solvent used to separate di- and triphosphates (Table 4-2). The orders of mobility were dG < dZ, Z < dA < P, dP < dC, dT (all phosphorylation states, KH2PO4) and dZMP, ZMP < dGMP < dTMP, dAMP, dPMP, PMP < dCMP (acetic acid). Interestingly, Z and G nucleotides migrate similarly in both TLC solvents, acetic acid and KH2PO4.


Table 4-2. Rf values of GACTZP nucleotides. Solid phase: PEI-cellulose. Mobile phase: 0.75 M KH2PO4 (used for separating di- and triphosphates) or 1 M acetic acid (used for separating monophosphates). -: not listed.
Compound | Rf (0.75 M KH2PO4) | Rf (1 M acetic acid)
ATP | 0.27 | 0.0081
dGMP | 0.44 | 0.060
dAMP | 0.56 | 0.24
dCMP | 0.71 | 0.42
dTMP | 0.74 | 0.22
dZMP | 0.55 | 0.028
dPMP | 0.66 | 0.23
ZMP | 0.55 | 0.040
PMP | 0.58 | 0.31
dGDP | 0.33 | -
dADP | 0.45 | -
dCDP | 0.68 | -
dTDP | 0.70 | -
dZDP | 0.35 | -
dPDP | 0.63 | -
ZDP | 0.34 | -
PDP | 0.50 | -
dGTP | 0.15 | -
dATP | 0.26 | -
dCTP | 0.49 | -
dTTP | 0.61 | -
dZTP | 0.15 | -
dPTP | 0.44 | -
ZTP | 0.12 | -
PTP | 0.34 | -


Table 4-3. pKa values of GACTZ nucleotides. The GACT values are from Blackburn;16 the value for Z is from Hutter et al.93 The values were obtained at approximately 20 °C and zero salt concentration.
Base (site) | pKa
Z (N-1) | 7.8
Guanine (N-7) | 2.4
Guanine (N-1) | 9.4
Adenine (N-1) | 3.7
Cytosine (N-3) | 4.6
Thymine (N-3) | 10.0
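The pKa values in Table 4-3 can be turned into the fraction of each site that is charged at a given pH with the Henderson-Hasselbalch equation. The Python sketch below distinguishes acidic sites, which become anionic on deprotonation (Z N-1, guanine N-1, thymine N-3), from basic sites, which become cationic on protonation (adenine N-1, cytosine N-3, guanine N-7). It treats each site independently and uses the tabulated values as given (about 20 °C, zero salt), so it is only an approximation for nucleotides on a PEI plate in 1 M acetic acid or 0.75 M phosphate.

```python
def fraction_deprotonated(ph, pka):
    """Acidic site (HA <-> A- + H+), e.g. Z N-1, guanine N-1, thymine N-3."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def fraction_protonated(ph, pka):
    """Basic site (BH+ <-> B + H+), e.g. adenine N-1, cytosine N-3, guanine N-7."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


# pKa values from Table 4-3, evaluated at the TLC pH of 3.5.
print("Z N-1 deprotonated:", fraction_deprotonated(3.5, 7.8))
print("A N-1 protonated:", fraction_protonated(3.5, 3.7))
print("C N-3 protonated:", fraction_protonated(3.5, 4.6))
print("G N-7 protonated:", fraction_protonated(3.5, 2.4))
```

At pH 3.5 these expressions give roughly 5 × 10⁻⁵ for Z N-1 deprotonation, about 0.61 for adenine N-1 protonation, about 0.93 for cytosine N-3 protonation, and about 0.07 for guanine N-7 protonation.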

Activities of Wild-Type Kinases

The activities of all wild-type kinases toward their natural substrates and AEGIS substrates were tested using the assays. Table 4-4 lists the kinases and substrates tested in this study (no: does not accept the substrate; yes: accepts the substrate; -: not tested).

Table 4-4. List of wild-type kinases tested in this study.
Kinase | Natural nucleoside/tide substrates | dZ | Z | dP | P
DmdNK | dT, dC, dU, dA, dG, analogs | no | no | yes | yes
Udk | uridine, cytidine | no | no | no | no
Gsk | inosine, guanosine | no | no | no | no
EcGmk | dGMP, GMP | no | no | no | yes
TmGMPK | dGMP, GMP | no | no | no | yes
EcAdk | dAMP, AMP | no | no | no | no
EcCmk | dCMP, CMP | no | - | no | -
TmCMPK | dCMP, CMP | no | no | no | no
EcTmk | dTMP, TMP | no | no | no | no
Ura6p | rUMP, dUMP, CMP, dCMP, AMP | no | no | no | no
T5 dNMPK | dGMP, dAMP, dCMP, dTMP | no | no | no | no
Ndk | All (d)NDP* | yes | yes | yes | yes
*N: G, A, C, T, U


Drosophila melanogaster deoxynucleoside kinase (DmdNK)

As shown in Figure 4-4, wild-type DmdNK accepts P, producing a TLC spot assigned as PMP; it does not accept Z. Activity toward dZ and dP was tested by Dr. Fei Chen; both proved to be poor substrates for wild-type DmdNK (dP: KM 122 ± 27 µM, kcat 15.4 ± 1.1 s⁻¹, kcat/KM 1.26 × 10⁵ M⁻¹ s⁻¹; dZ: KM > 790 µM, kcat > 0.27 s⁻¹).
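The specificity constant for dP can be checked directly from the reported kcat and KM (15.4 s⁻¹ divided by 122 µM). The Python sketch below also attaches an approximate uncertainty by first-order propagation of relative errors in quadrature; that step assumes the kcat and KM errors are independent, which is not generally true for parameters fitted jointly to one data set, so it is a rough, hypothetical estimate rather than the analysis used for the reported values.

```python
import math


def specificity_constant(kcat_per_s, kcat_sd, km_uM, km_sd_uM):
    """Return kcat/KM (M^-1 s^-1) and an approximate standard deviation.

    Relative errors are combined in quadrature, assuming independent errors
    in kcat and KM (an assumption; jointly fitted values are often correlated).
    """
    value = kcat_per_s / (km_uM * 1e-6)
    rel_sd = math.sqrt((kcat_sd / kcat_per_s) ** 2 + (km_sd_uM / km_uM) ** 2)
    return value, value * rel_sd


# dP with wild-type DmdNK, values from the text above.
value, sd = specificity_constant(15.4, 1.1, 122.0, 27.0)
print(f"kcat/KM = {value:.3g} +/- {sd:.2g} M^-1 s^-1")
```

The calculation reproduces the reported 1.26 × 10⁵ M⁻¹ s⁻¹, with an uncertainty of roughly 0.3 × 10⁵ M⁻¹ s⁻¹ dominated by the error in KM. For dZ, where only lower bounds on KM and kcat are available, no such ratio can be bounded.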

Figure 4-4. TLC with the samples containing wild-type DmdNK and P. Solid phase: PEI- cellulose and mobile phase: 1 M acetic acid (pH 3.5). dCMP (positive control) and PMP were formed and detected on the plate. The spots were confirmed by comparing their mobilities with the standards’ Rf values.

Guanylate kinases

The activities of EcGmk and TmGMPK were confirmed by TLC and by the luciferase assay (Figures 4-5 and 4-6). For TLC, the kinase reaction was run at 37 °C and the downstream TLC separation at room temperature, because longer incubation at higher temperature degraded the substrates/products. Both assays confirmed phosphorylation of PMP to PDP. Judging from the luminescence intensity after 5 min of incubation, the kcat of TmGMPK for PMP at 82 °C (its optimum temperature94) appears to be as high as that for its natural substrate dGMP.

Figure 4-5. TLC with wild-type guanylate kinases (EcGmk and TmGMPK) and PMP. Solid phase: PEI-cellulose and mobile phase: 0.75 M KH2PO4 (pH 3.5). dGDP (positive control) and PDP were formed and detected on the plate. The spots were confirmed by comparing their mobilities with the standards’ Rf values.

Figure 4-6. Luciferase assay (forward: ATP consumption) with wild-type guanylate kinases (EcGmk, TmGMPK) and PMP. The figures show the luminescence signal of each sample. From left to right: negative control 1 (buffer only); negative control 2 (buffer + enzyme); negative control 3 (buffer + monophosphate); buffer + enzyme + monophosphate (dGMP or PMP). The Y-axis shows the luminescence signal after 5 min incubation in the reaction mixture. The error bars indicate the standard deviations of three independent trials.

E. coli nucleoside diphosphate kinase (Ndk)

The activity of wild-type Ndk was tested using the TLC, luciferase, and modified 5′ nuclease assays (Figures 4-7, 4-8, and 4-9). Ndk accepts both (d)ZDP and (d)PDP. Deoxyribonucleotide substrates were tested first. As seen in Figure 4-7, Ndk phosphorylates all deoxynucleoside diphosphates tested to form the corresponding triphosphates. Ndk was also found to accept the natural monophosphate dCMP as a substrate. This activity was also confirmed by the modified 5′ nuclease assay.

Figure 4-7. TLC with wild-type Ndk, dNMPs, and dNDPs. Solid phase: PEI-cellulose and mobile phase: 0.75 M KH2PO4 (pH 3.5). dNTPs were formed in samples containing Ndk, dNDPs, and dCMP. Lanes 1 and 2: negative controls; Lanes 3–14: Ndk + substrates. The spots were confirmed by comparing their mobilities with the standards’ Rf values.

The luciferase assay (Figure 4-8) confirmed that Ndk catalyzes the reverse reaction (formation of ATP from ADP using dNTP as a phosphate donor) for all GACTZP triphosphates.

Comparison with the ATP standards showed that approximately 20% of the initial ADP substrate (0.5 μM) was converted to ATP. One possible reason is that the reaction (ADP + dNTP ⇌ ATP + dNDP) reaches equilibrium at an ADP:ATP ratio of ~4:1 under these conditions. Another possibility is that luciferase or one of its products (such as AMP) inhibits Ndk.
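The ~20% conversion and the ~4:1 ratio quoted above describe the same result. Converting 20% of the ADP leaves 80% as ADP, and 80:20 is 4:1. The sketch below converts a luminescence reading to an ATP concentration using a linear ATP standard curve, then expresses it as percent conversion of the 0.5 µM ADP. It assumes that luminescence is linear in ATP over the standard range. The function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def percent_converted(signal, standards_uM, standard_signals, adp0_uM=0.5):
    """Convert a luminescence reading to [ATP] via the standard curve,
    then express it as a percentage of the initial ADP concentration."""
    slope, intercept = linear_fit(standards_uM, standard_signals)
    atp_uM = (signal - intercept) / slope
    return 100.0 * atp_uM / adp0_uM


def adp_to_atp_ratio(percent):
    """Remaining ADP : ATP formed, e.g. 20 % conversion -> 80:20 = 4:1."""
    return (100.0 - percent) / percent
```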

Figure 4-8. Luciferase assay (reverse: ATP generation) with wild-type Ndk and dNTPs. ATP formation (dNTP consumption) catalyzed by Ndk was confirmed. The error bars indicate the standard deviations of three independent trials.

Consistent with the TLC assay (Figure 4-7), the modified 5′ nuclease assay (Figure 4-9) also confirmed that Ndk forms triphosphates from GACTZP diphosphates and from dCMP. Ribonucleotide substrates were tested separately, once the compounds became available. As shown in Figures 4-10 and 4-11, Ndk accepts both ZDP and PDP, and the corresponding triphosphates were detected on the TLC plate.

Figure 4-9. Modified 5′ nuclease assay with wild-type Ndk, dNMPs, and dNDPs. The results show the incorporation of deoxyribonucleoside triphosphate products into each DNA template. The X-axis shows time in minutes and the Y-axis shows fluorescence intensity. Nucleotide substrates were mixed with ATP (phosphate donor), Pol I or Taq DNA polymerase (dPTP detection), Ndk, 0.1 mM non-limiting dNTP mix (equimolar mixture of deoxyribonucleoside triphosphates lacking the limiting dNTP), and annealed primer/template/probe mixture. The reaction was performed at 37 °C (high-fidelity variant of E. coli Pol I) or 45 °C (Taq DNA polymerase).

Figure 4-10. TLC with wild-type Ndk and ZP nucleotides. ZTP and PTP formation was observed in the samples containing Ndk, ZDP, and PDP. The spots were confirmed by comparing their mobilities with the standards’ Rf values.

Figure 4-11. Luciferase assay (reverse: ATP generation) with wild-type Ndk, ZTP, and PTP. ATP formation (NTP consumption) catalyzed by Ndk was confirmed. The error bars indicate the standard deviations of three independent trials.

Figure 4-12 summarizes the phosphorylation steps that were completed. Arrows indicate the reactions catalyzed by kinases. The P pathway, which generates PTP, was completed using wild-type enzymes alone. However, nucleoside kinases and nucleoside monophosphate kinases were still missing (dashed arrows) from the Z, dZ, and dP pathways. To complete these pathways, variants of each kinase were generated and tested (see Chapter 5).

Figure 4-12. The phosphorylation steps completed by wild-type kinases. Wild-type DmdNK, EcGMK, TmGMPK, and Ndk accept ZP nucleosides and nucleotides to produce the corresponding ZP nucleotides. PTP can be synthesized using only wild-type kinases.

Kinetic Parameters

Attempts were made to measure the kinetic parameters of wild-type DmdNK, EcGMK, TmGMPK, and Ndk. However, this proved not to be possible, for two reasons. First, P and PMP were poor substrates: KM was high and/or kcat was low under the conditions tested. Second, the enzyme-coupled assay could not be used, because pyruvate kinase accepts nucleoside diphosphates in addition to ADP as phosphate acceptors from PEP, giving false positive signals.

As shown in Figure 4-13, P was a poor substrate for wild-type DmdNK: no signal decrease was observed even with > 1 mM P, although the formation of PMP was observed by TLC (Figure 4-4). Moreover, the KM values of EcGMK and TmGMPK for PMP are high (Figure 4-14). Measuring their kinetic parameters was therefore not feasible, because it would have required highly concentrated PMP solutions. In addition, the highest temperature available in the spectrophotometer was 45 °C, far below the predicted optimum temperature of TmGMPK (82 °C).94

Further, many nucleoside diphosphates are substrates for pyruvate kinase, which is present in the enzyme-coupled assay mixture. As a result, the absorbance at 340 nm fell even without Ndk when only the diphosphate was present (the negative control; Figure 4-15).
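The coupled assay reads kinase activity indirectly. Pyruvate kinase uses each ADP (or other NDP) to convert PEP to pyruvate, and lactate dehydrogenase then reduces that pyruvate while oxidizing one NADH. The rate of absorbance loss at 340 nm can therefore be converted to a molar rate using the NADH molar absorptivity (about 6220 M⁻¹ cm⁻¹). Any diphosphate, or any ADP produced by a contaminating ATPase, feeds the same chain. Subtracting the slope of a matched negative control is therefore essential. The sketch below assumes a 1 cm path length (a cuvette; plate wells differ) and linear traces over the fitted window. The function names are hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # M^-1 cm^-1, molar absorptivity of NADH at 340 nm


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def corrected_rate_uM_per_min(times_min, a340_sample, a340_blank, path_cm=1.0):
    """Blank-corrected NADH oxidation rate in uM/min.

    With pyruvate kinase and lactate dehydrogenase in excess, one NADH is
    oxidized per ADP (or other NDP) used by pyruvate kinase, so this is the
    apparent rate of (N)DP production in the sample beyond the control.
    """
    sample_slope = ols_slope(times_min, a340_sample)  # negative while A340 falls
    blank_slope = ols_slope(times_min, a340_blank)    # e.g. diphosphate-only control
    return -(sample_slope - blank_slope) / (NADH_EPSILON_340 * path_cm) * 1e6
```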

Figure 4-13. Enzyme-coupled assay with wild-type DmdNK and P. Changes in absorbance at 340 nm were measured in a spectrophotometer. Phosphorylation and enzyme-coupled reaction were performed in a buffer containing P, ATP (phosphate donor), NADH, PEP, pyruvate kinase, lactate dehydrogenase, and DmdNK. The samples were incubated at room temperature.

Figure 4-14. Enzyme-coupled assay with wild-type guanylate kinases (EcGmk and TmGMPK) and PMP. PDP formation was confirmed by detecting the signal decrease at 340 nm.

Figure 4-15. Signal decrease observed in a negative control (dGDP) containing only nucleotide substrate (no enzyme) by enzyme-coupled assay.

Discussion

Assay Comparison

This chapter described the four assays developed in this work to support the creation of a pathway for use in a semi-synthetic organism. These assays can be used to develop kinases that phosphorylate unnatural nucleosides and their phosphates. Beyond their value as tools for developing kinases for AEGIS, the assays are likely to apply to other unnatural nucleoside/tide systems, including those of Kool, Hirao, and Romesberg.95-97

The enzyme-coupled assay is one of the most commonly used nucleoside/tide kinase assays.85,94,98 Because it can be adapted to a high-throughput format, it was the first assay chosen for this study. However, it turned out to have some disadvantages. One is that activity toward (d)Z nucleosides/tides could not be measured, because (d)Z nucleosides/tides also absorb strongly at 340 nm (λmax = 380 nm).15,93 This made it difficult to measure the loss of NADH, the key observable in this assay.

Further, this study revealed that contaminating ATPase activity produces a false positive signal, even when no substrate is present in the reaction mixture (Figure 4-16). Unfortunately, the product of ATPase activity is ADP, the very substrate that pyruvate kinase uses in the first step of the coupled enzyme assay. As a result, undesired ATPase activity could be confused with the desired substrate phosphorylation activity.

ATPase contamination might arise in at least two ways: 1) incomplete protein purification and 2) site-directed mutagenesis that produces a damaged nucleoside/tide kinase.

Contaminating ATPase activity can be substantially reduced by careful protein purification.

Unfortunately, "site directed damage" of a nucleoside/tide kinase can be expected, at least some of the time, to disrupt the coordinated transfer of high energy phosphate to a substrate, allowing phosphate to be transferred to water.

These issues motivated us to develop several assays based on different principles. Table 4-5 summarizes the pros and cons of each, and the following paragraphs discuss them in detail.

Table 4-5. Pros and cons of the assays used in this study.

Assay                  Pros                                                Cons
Enzyme-coupled         Continuous assay; common kinase assay in the        Less sensitive; backgrounds with diphosphates
                       literature
Luciferase             Endpoint, high-throughput assay                     Possible interference by luciferase; backgrounds with kinases and nucleotides
TLC                    Endpoint; directly detects products                 Labor intensive; expensive; radioactive material
Modified 5′ nuclease   Proves incorporation of triphosphates into DNA      Possible interference by polymerases

Figure 4-16. Signal decrease at 340 nm was observed in a negative control (enzyme only) by enzyme-coupled assay.

First, the luciferase assay was developed to detect the activity of Ndk using nucleoside triphosphates as substrates. Triphosphate substrates are more accessible than diphosphates in terms of both cost and synthesis. Moreover, like the enzyme-coupled assay, this assay can be run in 96-well plates and is suitable for high-throughput library screening. Its main disadvantage is that luciferase also accepts a deoxyribonucleoside triphosphate other than ATP, namely dATP.99 As a result, it produces high background signals in reactions that contain dATP (and some dNDPs) (Figure 4-17). Background signals are generally below 100 units, but dGDP, dADP, and dATP gave luminescence signals of ~2700, 250, and 7500 units, respectively. This problem was overcome by detecting the signal decrease (ATP consumption instead of generation; Figures 4-1 and 4-6).
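Reading the forward assay as a signal decrease means expressing the luminescence lost relative to an enzyme-free control. The sketch below does this, assuming that luminescence is proportional to the remaining ATP and that the buffer-only reading is the background. The function and argument names are hypothetical.

```python
from statistics import mean, stdev


def percent_atp_consumed(sample_rlu, no_enzyme_rlu, buffer_rlu):
    """Percent of ATP consumed, relative to the enzyme-free control.

    sample_rlu    : luminescence of enzyme + ATP + nucleotide replicates
    no_enzyme_rlu : luminescence of the same mixture without kinase (full ATP)
    buffer_rlu    : luminescence of buffer only (reagent background)
    Returns (mean, standard deviation) over the sample replicates.
    """
    background = mean(buffer_rlu)
    full_signal = mean(no_enzyme_rlu) - background
    if full_signal <= 0:
        raise ValueError("enzyme-free control is not above background")
    per_replicate = [100.0 * (1.0 - (s - background) / full_signal) for s in sample_rlu]
    return mean(per_replicate), stdev(per_replicate)
```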

Figure 4-17. Background signal with dNDPs and dATP detected by the luciferase assay (reverse). High background signals were detected in the samples containing dGDP, dADP, and dATP (no enzyme). These are the results of a single experiment.

This assay's flexibility (it works for both the forward and the reverse reactions) raised a question: does it actually detect phosphorylation of the desired substrate, rather than ATP consumption or generation by other mechanisms? The question is especially important for Ndk, which follows a ping-pong Bi Bi mechanism (Figure 4-18). In the forward direction, Ndk first accepts the phosphate from ATP and only later transfers it to the nucleotide. ATP consumption alone therefore does not prove that the nucleotide was phosphorylated, and ATP can be destroyed in many ways, including nonenzymatically. The question is thus less important when Ndk is used to generate ATP (the reverse reaction). Both the forward and reverse luciferase assays can be used for kinases that follow other kinetic mechanisms, such as ordered and random sequential mechanisms. These include most of the kinases in this study whose mechanisms are known.

Figure 4-18. The kinetic mechanism of Ndk (ping-pong Bi Bi).
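One textbook signature of a ping-pong Bi Bi mechanism can be checked numerically. Under initial-rate conditions with no products present, the steady-state rate law is v = Vmax[A][B]/(KmB[A] + KmA[B] + [A][B]). Double-reciprocal plots of 1/v against 1/[A] at different fixed [B] are therefore parallel lines with slope KmA/Vmax. For Ndk, A would be the donor triphosphate that phosphorylates the enzyme first, and B the acceptor diphosphate. The sketch below evaluates this standard rate law. Any parameter values used with it are hypothetical, not measurements from this work.

```python
def ping_pong_rate(a, b, vmax, km_a, km_b):
    """Initial rate for a ping-pong Bi Bi mechanism with no products present."""
    return vmax * a * b / (km_b * a + km_a * b + a * b)


def double_reciprocal_slope(b, a_values, vmax, km_a, km_b):
    """Slope of 1/v against 1/[A] at fixed [B].

    For this rate law, 1/v = (km_a / vmax)(1/[A]) + (1/vmax)(1 + km_b/[B]),
    which is exactly linear in 1/[A], so two points determine the slope.
    """
    xs = [1.0 / a for a in a_values]
    ys = [1.0 / ping_pong_rate(a, b, vmax, km_a, km_b) for a in a_values]
    return (ys[-1] - ys[0]) / (xs[-1] - xs[0])
```

Calling `double_reciprocal_slope` at two different acceptor concentrations returns the same value, KmA/Vmax. That is the parallel-line pattern expected for a ping-pong mechanism.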

Second, the TLC assay was developed to detect phosphorylated AEGIS nucleoside products directly. The assay is straightforward. However, it requires a radiolabeled compound and is labor intensive, taking half a day to a whole day to obtain results. It is also expensive: one TLC plate costs ~$7 and can analyze only ~10 samples.
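The spot assignments in the TLC figure captions rely on Rf values. Rf is the distance a spot travels divided by the distance the solvent front travels, and it is compared with standards run under the same conditions. The sketch below assigns a spot to the nearest standard within a tolerance. The 0.05 tolerance and the function names are assumptions, not values used in this work.

```python
def retention_factor(spot_mm: float, front_mm: float) -> float:
    """Rf = distance moved by the spot / distance moved by the solvent front,
    both measured from the origin."""
    if front_mm <= 0:
        raise ValueError("solvent-front distance must be positive")
    return spot_mm / front_mm


def assign_spot(spot_rf: float, standard_rfs: dict, tol: float = 0.05):
    """Return the standard whose Rf is closest to spot_rf, or None if no
    standard lies within tol (an assumed tolerance)."""
    name, ref = min(standard_rfs.items(), key=lambda item: abs(item[1] - spot_rf))
    return name if abs(ref - spot_rf) <= tol else None
```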

Finally, the modified 5′ nuclease assay was developed to confirm that the deoxynucleotide products are incorporated into DNA. This assay, of course, requires many steps to work together. That is a strength rather than a weakness, because we want those same steps to work together in our semi-synthetic organism.

These features make each assay valuable for a different purpose. The TLC and modified 5′ nuclease assays provide solid evidence of kinase activity. The enzyme-coupled and luciferase assays are suitable for nucleoside kinase library screening (see Chapter 5).

Kinase Specificities

Wild-type DmdNK accepts both P and dP, although both are poor substrates. Its ability to accept AEGIS nucleosides may not be that surprising, since DmdNK accepts substrates having various natural sugars and natural nucleobases.85

Acceptance of PMP by the guanylate kinases (EcGmk and TmGMPK) may also not be surprising. The P nucleobase differs from the G nucleobase only in lacking the hydrogen on N1 and in having a carbon instead of a nitrogen at position 7 of the purine ring. Thus, P may be viewed as a good structural analog of G.

Further, it may not be surprising that Ndk accepts AEGIS nucleotide analogs, as Ndk has a broad specificity, accepting all five natural nucleotides (both deoxyribo- and ribonucleotides).

However, Ndk gave a surprising outcome. Although Ndk is named "nucleoside diphosphate kinase", it was able to accept a monophosphate, dCMP, as a substrate (Figures 4-7 and 4-9). This is the first report showing conclusively that Ndk can phosphorylate a nucleoside monophosphate. The crystal structure of E. coli Ndk is available.76 However, it does not show any specific contacts between the protein and the nucleobase, because it contains no bound nucleotide ligand. The confirmation of this monophosphate kinase activity, together with the known phosphorylation of histidine protein kinases by Ndk,100 shows the extraordinary flexibility of this enzyme.

Ribonucleoside Triphosphate Synthesis

The results revealed that wild-type kinases accept AEGIS ribonucleosides/tides better than deoxyribonucleosides/tides. This is especially true for monophosphate kinases. E. coli and other monophosphate kinases tend to accept ribonucleoside monophosphates better than deoxyribonucleoside monophosphates, with lower KM and/or higher kcat for the ribonucleotides.59,71,101

This makes sense in light of the major de novo pathway for deoxyribonucleotide synthesis. Ribonucleotide reductases (RNRs) synthesize dNDPs from rNDPs, which in turn are made from rNMPs by monophosphate kinases. Therefore, it may be easier to synthesize ribonucleoside diphosphates in vivo and convert them to deoxyribonucleoside diphosphates with RNR. Ndk would then phosphorylate these to the triphosphates.

CHAPTER 5 KINASE VARIANTS SCREENING

Materials and Methods

Enzyme Design and Preparation

Kinase variants were designed by two routes. One was a semi-rational approach using crystal structure and evolutionary information. The other used the Rosetta software,102 applied by our collaborator Dr. Per Jr Greisen of the Baker laboratory at the University of Washington. The variants created are listed in the Results section.

All kinases were purified using the standard histidine-tag protein purification method described in Chapter 3. A His or HAT tag was attached to the N- or C-terminus of each kinase. Only DmdNK carries a His tag; the other kinases carry a HAT tag.

D. melanogaster deoxynucleoside kinase (DmdNK)

The plasmids containing DmdNK Q81E were a gift from the Lutz lab at Emory University. The variant was purified by the method described in Chapter 3.

The DmdNK variant was designed based on the enzyme's crystal structures (Figure 5-1). Based on a previous site-directed mutagenesis study,38 glutamine 81 was targeted to create a dZ kinase. DmdNK prefers pyrimidine nucleosides. Replacing this glutamine with asparagine (Q81N) shifts its specificity toward a purine nucleoside (dG), although the variant's kinetic parameters are not as good as those of the wild-type enzyme. Crystal structures show that the wild-type enzyme accepts dC (PDB ID: 2VP5) and dT (PDB ID: 1OT3) by flipping the glutamine side chain. In this study, the DmdNK Q81E variant was designed to accept dZ by replacing this glutamine with glutamate.

This chapter is reproduced in part with permission from [Matsuura, M.F., Winiger, C.B., Shaw, R.W., Kim, M.-J., Kim, M.-S., Daugherty, A.B., Chen F., Moussatche, P., Moses J.D., Lutz, S. and Benner, S.A., A Single Deoxynucleoside Kinase Mutant from Drosophila melanogaster synthesizes the Monophosphates of Nucleosides that are Components of an Expanded Genetic System. ACS Synth. Biol., in press. doi: 10.1021/acssynbio.6b00228]. Copyright 2016 American Chemical Society.

Figure 5-1. Crystal structure of DmdNK complexed with dT and dC, and the designed variant Q81E to phosphorylate dZ. Crystal structures are from PDB ID: 1OT3 (dT) and PDB ID: 2VP5 (dC).

E. coli cytidylate kinase (EcCmk)

Because Cmk is a pyrimidine kinase and Z is a pyrimidine analog, the variants were designed for dZMP with the Rosetta software, in collaboration with the Baker lab at the University of Washington. The cmk gene was cloned (Appendix Table B-1), amplified, and tagged with the mutagenesis primer sets listed in Table 5-1. Five residues (Figure 5-2: S36, R110, D132, M133, and A191) were mutated by mutagenesis PCR.

Mutagenesis PCR was performed in a 50 μL reaction mixture containing:
25 mM Tris-SO4 (pH 8.5);
2 mM magnesium acetate;
5 mM potassium acetate;
0.5 μM of each mutagenesis primer;
0.05% Tween 20;
0.01 U/μL Taq DNA polymerase;
0.02 U/μL Q5 DNA polymerase;
1 mM dNTP mix;
template plasmid (the plasmid containing the cmk gene).

The reaction mixture was incubated at 95 °C for 2 min, followed by 30 cycles of 95 °C for 30 s, 50 °C for 30 s, and 60 °C for 30 s, and a final extension at 68 °C for 8 min. PCR products were confirmed by gel electrophoresis. The products were then digested with DpnI (2 h at 37 °C) to remove the methylated template plasmid. They were then purified, ethanol precipitated (30 mM EDTA, 65% EtOH), and centrifuged (15,000 ×g, 15 min). The collected plasmids were transformed into the XJa Autolysis™ strain (Zymo Research). The resulting variants were sequenced to confirm the mutation sites.

Figure 5-2. Crystal structure of EcCmk complexed with CMP (PDB ID: 1KDO). The figure shows the substrate and modified amino acid residues (pink). Heteroatoms are colored blue for nitrogen, red for oxygen, and yellow for sulfur.

Table 5-1. Primer sets used for cmk gene mutagenesis.

Primer set     Direction   Sequence (5' to 3')
S36A           Forward     GCTGGACGCGGGTGCAATTTATCGC
S36T           Forward     GCTGGACACGGGTGCAATTTATCGC
R110M          Reverse     CGCAATAATGCTTCCATAACGCGTGGGAATGCCG
R110K          Reverse     CGCAATAATGCTTCTTTAACGCGTGGGAATGCCG
D132E/M133A    Forward     CCGATGGCCGCGAAGCGGGAACGGTGGTATT
A191S          Reverse     AACCAGTGGCGCTACCGATCGGTTACGATCGCGGTC

E. coli thymidylate kinase (EcTmk)

The variants were designed for dZMP by Dr. Ryan W. Shaw using crystal structure information. The tmk gene was cloned, amplified, and tagged with the mutagenesis primer sets listed in Appendix Table B-2. The variants were created by Ms. Jennifer D. Moses. Two residues (Figure 5-3: R78 and T105) were each replaced with a set of non-aromatic amino acids (R78P/T/A/H/E/N/K/D/Q/R/S/G, T105P/T/A/H/E/N/K/D/Q/R/S/G). Both residues form hydrogen bonds to the carbonyl group of carbon 6, where Z has an amino group. The mutated tmk genes were then cloned into plasmids by Gibson assembly.103 The plasmids were transformed into the XJb Autolysis™ strain (Zymo Research) to express and purify the Tmk variants.

Figure 5-3. Crystal structure of EcTmk complexed with TP5A (PDB ID: 4TMK). The figure shows the substrate and modified amino acid residues (pink). Heteroatoms are colored blue for nitrogen, red for oxygen.

E. coli adenylate kinase (EcAdk)

The sites to mutate in EcAdk were selected to create candidates that accept dPMP as a substrate. Based on an analysis of the crystal structure, two residues (Figure 5-4: L58 and V59) were chosen for replacement by smaller residues. This makes room for the amino group of P at C4 (C2 for A), where A has no functional group. T31 was also chosen for replacement by smaller hydrophobic residues, since P has a carbon at position 7 instead of a nitrogen.

EcAdk variants were also designed for dZMP with the Rosetta software, the same method used for the EcCmk design. The result suggested that replacing three residues (Figure 5-5: T31, V59, and V64) might shift the substrate specificity of the kinase toward dZMP.

Figure 5-4. Crystal structure of EcAdk complexed with AMP and phosphoaminophosphonic acid-adenylate ester (PDB ID: 1ANK). The figure shows the substrate and modified amino acid residues (pink; designed by the author). Heteroatoms are colored blue for nitrogen, red for oxygen.

Figure 5-5. Crystal structure of EcAdk complexed with AMP and phosphoaminophosphonic acid-adenylate ester (PDB ID: 1ANK). The figure shows the substrate and modified amino acid residues (pink, designed by Rosetta). Heteroatoms are colored blue for nitrogen, red for oxygen.

The adk gene was cloned, amplified, and tagged with the mutagenesis primer sets listed in Table 5-2, Table 5-3, and Appendix Table B-3. Ms. Jennifer D. Moses then cloned the amplicons into plasmids and transformed them into the DH5 E. coli strain.

Table 5-2. Primer set for mutagenesis PCR of the adk gene designed for dPMP.

Primer set             Direction   Sequence (5' to 3')
Adk-T31SBS-F           Forward     GTATTCCGCAAATCTCCSBSGGCGATATGCTGC
Adk-L58KSY V59KSY-R    Reverse     CAGTTCGTCGGTRSMRSMTTTGCCAGCATCCATAATG

*Mixed bases: S = C, G; B = C, G, T; R = A, G; M = A, C

Table 5-3. Primer set for mutagenesis PCR of the adk gene designed for dZMP.

Primer set             Direction   Sequence (5' to 3')
Adk-T31KSY-F           Forward     GTATTCCGCAAATCTCCKSYGGCGATATGCTGCGTGCT
Adk-V59WCB V64KMY-R    Reverse     TTTAACCAGCGCGATRKMCAGTTCGTCGGTVGWCAGTTTGCCAGCATCCAT

*Mixed bases: K = G, T; S = C, G; Y = C, T; W = A, T; B = C, G, T; M = A, C; R = A, G; V = A, C, G
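Tables 5-2 and 5-3 encode amino-acid sets with IUPAC mixed-base codons. Each reverse primer carries the reverse complement of the codon named in the primer. For example, RSM in Adk-L58KSY V59KSY-R is the reverse complement of KSY. The sketch below expands a degenerate codon using the standard IUPAC codes and translates it with the standard genetic code. This is a convenient way to check that a primer encodes the intended substitutions. It was written for this text as an illustration and is not the tool used to design the primers.

```python
from itertools import product

# IUPAC nucleotide codes for incompletely specified bases
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R", "S": "S",
    "W": "W", "K": "M", "M": "K", "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
GENETIC_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: GENETIC_CODE[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Reverse complement, including mixed-base codes (e.g. RSM -> KSY)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def expand(codon: str) -> list:
    """All concrete codons represented by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in codon.upper()))]


def encoded_amino_acids(codon: str) -> set:
    """One-letter amino acids (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand(codon)}
```

Run on SBS, KSY, WCB, and KMY, the function returns the T31 (dPMP design), L58/V59, V59 (dZMP design), and V64 substitution sets listed in Table 5-4. KSY also gives the A/C/G/S set used for T31 in the dZMP design.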

E. coli guanylate kinase (EcGmk)

The sites for mutation in EcGmk to create a variant able to accept dPMP (Figure 5-6) were suggested mostly by Dr. Dietlind L. Gerloff at the Foundation for Applied Molecular Evolution. She used a combination of crystal structure information and bioinformatic analyses. Among her suggestions, E73 was chosen first. Its glutamate side chain directly contacts the nucleobase at N1 and N2 through hydrogen bonds, and the P heterocycle differs from the G heterocycle in the hydrogen at N1. It was therefore hypothesized that changing this residue to either asparagine or glutamine would help shift the substrate specificity toward P.

In addition to modifying direct protein:nucleobase hydrogen bonding interactions, a second strategy was applied that modifies residues in the second shell. Dr. Dietlind L. Gerloff designed the following mutation sites. First, replacements of T84 were designed based on the crystal structure to accommodate P. Then L72, S85, and Q106 were chosen as sites for replacement because they are heterotachous.104 Heterotachous sites are sites where the rate of amino acid replacement differs between episodes of natural history. Sometimes the replacements are rapid; sometimes they are slow.

The amino acids at heterotachous sites are thought to be related to changes in the function of the enzyme. When the function of the enzyme changes, amino acid replacement at heterotachous sites is fast. When the function of the enzyme is not changing, amino acid replacement at heterotachous sites is slow. Thus, to alter the function of an enzyme (here, to get it to accept an AEGIS substrate), one changes the amino acids at its heterotachous sites. The rationale is that one is doing what natural history did, but for unnatural adaptive reasons.

In contrast, if a site is variable throughout natural history, we take this as evidence that the site is far from the active site. Its natural history follows the "neutral theory of molecular evolution" introduced by Motoo Kimura.105 We therefore hypothesize that changes at such sites are unlikely to influence the ability of the enzyme to accept AEGIS substrates, and that these sites can be ignored.

Finally, if a site is conserved throughout natural history, we take this as evidence that the amino acid at that site is essential for a core function of the protein, including its ability to fold. We therefore hypothesize that changes at such sites are likely to kill the enzyme, and we make no changes at them in our engineering effort.
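The site-selection logic of the three preceding paragraphs can be written as a toy decision rule. The sketch below assumes that per-site substitution rates have already been estimated separately in two lineages (or two episodes of history). The thresholds `low` and `fold` are hypothetical and would have to be chosen for a real alignment. This is not the method Dr. Gerloff used; real heterotachy analyses rely on phylogenetic models rather than a comparison of two numbers.

```python
from enum import Enum


class SiteClass(Enum):
    CONSERVED = "conserved: leave unchanged"
    VARIABLE = "variable throughout: likely neutral, ignore"
    HETEROTACHOUS = "heterotachous: candidate for mutation"
    UNCLASSIFIED = "unclassified"


def classify_site(rate_1: float, rate_2: float, low: float = 0.1, fold: float = 4.0) -> SiteClass:
    """Toy rule on per-site substitution rates estimated in two lineages.

    low  : hypothetical rate below which a lineage counts as 'slow'
    fold : hypothetical fast/slow ratio that counts as a rate shift
    """
    slow, fast = sorted((rate_1, rate_2))
    if fast < low:
        return SiteClass.CONSERVED          # slow everywhere
    if slow == 0 or fast / slow >= fold:
        return SiteClass.HETEROTACHOUS      # fast in one episode, slow in another
    if slow >= low:
        return SiteClass.VARIABLE           # similarly fast everywhere
    return SiteClass.UNCLASSIFIED
```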

Figure 5-6. Crystal structure of EcGmk complexed with GMP (PDB ID: 2ANB). The figure shows the substrate and modified amino acid residues (pink). Heteroatoms are colored blue for nitrogen, red for oxygen.

The gmk gene was cloned, amplified, mutated, and tagged with primer sets (Appendix Table B-4) designed by Dr. Ryan W. Shaw. Ms. Jennifer D. Moses then cloned the mutated gmk genes into plasmids by Gibson assembly103 and transformed them into the DH5 E. coli strain.

Library protein purification

All plasmids containing kinase libraries were transformed into the E. coli XJ Autolysis™ strain (Zymo Research).

The E. coli strains carrying the plasmids were grown in 96-well plates (MASTERBLOCK®, Greiner Bio-One, Frickenhausen, Germany) in LB/carbenicillin medium (500 µL per well). Growth was for a few hours in a 37 °C incubator with shaking (250 rpm). Plates were sealed with sterile AeraSeal™ film (Sigma-Aldrich, St. Louis, MO, USA). Cultures whose OD600 reached 1.2 were diluted twofold with LB/carbenicillin medium containing 20 mM glucose, 3 mM arabinose, 2 mM MgSO4, and 200 ng/mL anhydrotetracycline (final culture volume: 1000 µL). The cultures were then incubated for another 4.5 hours in a 30 °C incubator with shaking (250 rpm).

The bacteria were harvested by centrifugation (5000 ×g, 15 min, 4 °C), and the medium was decanted. Lysis/bind/wash buffer (200 µL of 50 mM Tris-HCl (pH 7.4) and 5 mM imidazole) was added to each pellet, and the plate was vortexed at speed setting 7 for 2 min. Resuspended pellets were stored at -70 °C until use. The cells were thawed at room temperature with occasional mixing and then centrifuged (5000 ×g, 15 min, 4 °C). The supernatant was collected for protein purification.

TALON™ Metal Affinity Resin (150 µL; Clontech, Mountain View, CA, USA) was added to each well of a 96-well filter plate (MultiScreen® filter plate, Merck Millipore, Darmstadt, Germany). The plate was covered with sealing foil (LightCycler® 480 Sealing Foil, Roche, Basel, Switzerland) and placed on a 96-well white plate (Costar® 96-Well White Solid Plate, Cole-Parmer, Vernon Hills, IL, USA). It was then centrifuged (4000 ×g, 1 min, 4 °C), and the flow-through was discarded.

Cell lysate was transferred to the filter plate containing the resin and vortexed twice (vortex 1 min, incubate 1 min, vortex 1 min). The plate was held at 4 °C for a few minutes and centrifuged (4000 ×g, 1 min, 4 °C), and the flow-through was discarded. In the washing step, lysis/bind/wash buffer (200 µL) was added to each well, and the plate was vortexed and centrifuged (4000 ×g, 1 min, 4 °C). This washing step was repeated once.

After washing, 50–95 µL of elution buffer (50 mM Tris-HCl (pH 7.4) and 100 mM imidazole) was added to each well and incubated for 5 min at room temperature. The filter plate was placed on a 96-well plate (Nunc® flat-bottom 96-well plate, Thermo Scientific, Waltham, MA, USA) and centrifuged (4000 ×g, 1 min, 4 °C). The eluate was then ready for the spectrophotometer-based kinase activity assay.

Kinetic Parameters

A 250 µL reaction mixture was placed in a quartz cuvette. It contained:
50 mM Tris-HCl (pH 7.5);
100 mM KCl;
2.5 mM MgCl2;
0.18 mM NADH;
0.21 mM PEP;
1 mM ATP;
1 mM DTT;
0.1 mg/mL BSA;
30 U pyruvate kinase;
33 U lactate dehydrogenase.

DmdNK Q81E (125 ng) was added, and the reaction was initiated by adding substrate (0.125–400 µM). The cuvette was placed in a DU® 640 spectrophotometer (Beckman Coulter, Brea, CA, USA), and the absorbance at 340 nm was monitored for 45 min at room temperature. Mixtures containing only DmdNK Q81E or only the nucleoside substrate were used as negative controls.

dG was not soluble in DMSO (the solvent of the stock solution added to the reaction mixture) above 250 mM. This was a challenge because the KM for dG proved to be higher than for the other substrates. Therefore, the dG assay was performed over a final concentration range of 50–1600 µM in the reaction mixture.

Enzyme-coupled assays were run in triplicate. Kinetic parameters were determined by fitting the data to the Michaelis-Menten equation using non-linear regression in SciDAVis (scidavis.sourceforge.net/index.html).
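The sketch below illustrates what a Michaelis-Menten non-linear least-squares fit does. It is not a description of SciDAVis internals. It minimizes the sum of squared residuals of v = Vmax[S]/(KM + [S]). For any fixed KM the best Vmax has a closed form, so only KM needs a one-dimensional search, done here by golden-section search on log KM. The fit assumes that the error surface has a single minimum inside the bracket [km_lo, km_hi]. kcat then follows as Vmax divided by the enzyme concentration.

```python
import math


def mm(s, vmax, km):
    """Michaelis-Menten rate."""
    return vmax * s / (km + s)


def best_vmax(s_values, v_values, km):
    """For fixed KM the model is linear in Vmax, so least-squares Vmax is closed-form."""
    g = [s / (km + s) for s in s_values]
    return sum(v * gi for v, gi in zip(v_values, g)) / sum(gi * gi for gi in g)


def sse(s_values, v_values, km):
    """Sum of squared residuals with Vmax profiled out."""
    vmax = best_vmax(s_values, v_values, km)
    return sum((v - mm(s, vmax, km)) ** 2 for s, v in zip(s_values, v_values))


def fit_michaelis_menten(s_values, v_values, km_lo, km_hi, tol=1e-10):
    """Golden-section search over log(KM); returns (Vmax, KM)."""
    a, b = math.log(km_lo), math.log(km_hi)
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if sse(s_values, v_values, math.exp(c)) < sse(s_values, v_values, math.exp(d)):
            b, d = d, c                      # minimum lies in [a, d]
            c = b - invphi * (b - a)
        else:
            a, c = c, d                      # minimum lies in [c, b]
            d = a + invphi * (b - a)
    km = math.exp((a + b) / 2.0)
    return best_vmax(s_values, v_values, km), km
```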

Results

Kinase Variants

The variants listed in Table 5-4 were designed, created, and tested in this study. For EcCmk, the plan was to introduce the S36T/A, R110K/M, D132E, M133A, and A191S mutations into the cmk gene. However, the mutagenesis PCR was only partly successful, and only the following mutations were generated: S36T (no A), R110M (no K), D132E, M133A, and A191S.

Table 5-4. Kinase variants tested in this study.

Kinase   Designed by     Designed for   Mutated position(s)                                              Clones tested (number)
DmdNK    Semi-rational   Z              Q81E                                                             1
EcCmk    Rosetta         Z              S36T, R110M, D132E, M133A, A191S                                 1
EcTmk    Semi-rational   Z              R78P/T/A/H/E/N/K/D/Q/R/S/G, T105P/T/A/H/E/N/K/D/Q/R/S/G           188
EcAdk    Semi-rational   P              T31G/A/V/L/P/R, L58A/G/S/C, V59A/G/S/C                           188
EcAdk    Rosetta         Z              T31A/C/G/S, V59S/T, V64A/D/S/Y                                   94
EcGmk    Semi-rational   P              L72I/M/T/V/A, E73H/Q/N/K, T84F/L/S/V/A, S85S/A, Q106I/N/V/D     94
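The number of clones screened can be compared with library size using the standard sampling estimate. Suppose each clone is drawn at random from V equally likely variants. Then N clones are expected to cover a fraction 1 - (1 - 1/V)^N of the variants. Applying this to Table 5-4 requires two assumptions that the text does not state: that each library is fully combinatorial, and that all amino-acid combinations are equally represented. The function names are hypothetical. For the dPMP-directed EcAdk library, SBS and KSY map their 12 and 8 codons onto 6 and 4 amino acids (two codons each). Uniform codons would therefore give uniform amino acids there.

```python
import math


def expected_coverage(n_variants: int, n_clones: int) -> float:
    """Expected fraction of distinct variants seen at least once when each
    clone is drawn uniformly at random from n_variants variants."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_clones


def clones_for_coverage(n_variants: int, target: float) -> int:
    """Smallest number of clones whose expected coverage reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / n_variants))
```

Under these assumptions, the dPMP-directed EcAdk library (6 × 4 × 4 = 96 combinations) screened with 188 clones would be about 86% covered. The EcTmk library (12 × 12 = 144 combinations) screened with 188 clones would be about 73% covered.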

Kinase Variant Activities

D. melanogaster deoxynucleoside kinase variant Q81E

The activity of DmdNK Q81E was tested with the enzyme-coupled assay and the TLC assay, and kinetic parameters were calculated for this enzyme. As seen in Figure 5-7, DmdNK Q81E phosphorylates all deoxynucleosides tested: the four standard deoxynucleosides and the two AEGIS deoxynucleosides (dZ and dP).

Monophosphate products were detected in all cases using PEI-cellulose chromatography in 1 M acetic acid (pH adjusted to 3.5 with NH4OH).

These activities were confirmed with the enzyme-coupled assay (Figure 5-8). This assay also allowed estimation of the Michaelis-Menten kinetic parameters for dP and the natural deoxynucleosides (Table 5-5). The kinetic parameters could not be obtained precisely for dZ, because its nitro-amino pyridine heterocycle absorbs strongly at 300–400 nm (Chapter 4), creating a high blank.

Figure 5-7. TLC with DmdNK Q81E and dNs. Phosphorylated dNMP products were detected and confirmed by comparing their mobilities with the standards’ Rf values.

To assess whether the monophosphate products inhibited the phosphorylation reaction, the respective products were added to the assay mixture at up to 1600 µM. No inhibition was observed with any of the monophosphates (Figures 5-8 and 5-9). Thus, for the most active substrates (where the reaction was followed to ~100% conversion), these results exclude the possibility that product inhibition perturbed the kinetic parameters.

These experiments found that the DmdNK Q81E variant accepts the Z and P deoxynucleosides. The kinetic parameters for dP suggested that this enzyme would be useful as part of an in vivo pathway to make dPTP in a semi-synthetic organism. For dP, the Michaelis constant (KM) was ~10 µM, the turnover number (kcat) was 4.9 s⁻¹, and kcat/KM was 5.0 × 10⁵ M⁻¹ s⁻¹. These values are similar to those obtained with the natural nucleosides. For kcat, the order of increasing activity is dG << dT/dC < dA/dP. This means that dP is quite a good substrate for the enzyme variant, and the engineering exercise was successful. Beyond this activity toward deoxynucleosides, TLC later confirmed that DmdNK Q81E also accepts Z to produce ZMP (Figure 5-10).

71

Table 5-5. Kinetic parameters of DmdNK Q81E for substrates.
Substrate   kcat (s⁻¹)   KM (M)                       kcat/KM (M⁻¹ s⁻¹)
dG          1.9 ± 0.6    1.4 × 10⁻³ ± 7.2 × 10⁻⁴      1.4 × 10³
dA          5.7 ± 0.2    3.9 × 10⁻⁶ ± 5.5 × 10⁻⁷      1.5 × 10⁶
dC          4.5 ± 1.3    1.0 × 10⁻⁵ ± 5.5 × 10⁻⁶      4.4 × 10⁵
dT          4.5 ± 0.6    5.8 × 10⁻⁶ ± 1.8 × 10⁻⁷      7.8 × 10⁵
dP          4.9 ± 0.2    9.9 × 10⁻⁶ ± 1.3 × 10⁻⁶      5.0 × 10⁵
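The specificity constants in Table 5-5 follow directly from kcat and KM. The Python sketch below recomputes them from the rounded table entries, which agree with the reported column to within rounding (about 5.0 × 10⁵ M⁻¹ s⁻¹ for dP). It also uses the Michaelis-Menten saturation term [S]/(KM + [S]) to show why dG was assayed at 1600 µM: with KM ≈ 1.4 mM, even that concentration gives only about half-saturation, whereas 500 µM dP is essentially saturating. The helper names are illustrative.

```python
# Kinetic constants transcribed from Table 5-5 (DmdNK Q81E): (kcat in s^-1, KM in M).
TABLE_5_5 = {
    "dG": (1.9, 1.4e-3),
    "dA": (5.7, 3.9e-6),
    "dC": (4.5, 1.0e-5),
    "dT": (4.5, 5.8e-6),
    "dP": (4.9, 9.9e-6),
}


def specificity_constant(kcat: float, km: float) -> float:
    """Specificity constant kcat/KM in M^-1 s^-1."""
    return kcat / km


def fraction_of_vmax(substrate_m: float, km: float) -> float:
    """Michaelis-Menten saturation: v / Vmax = [S] / (KM + [S])."""
    return substrate_m / (km + substrate_m)


for name, (kcat, km) in TABLE_5_5.items():
    print(f"{name}: kcat/KM = {specificity_constant(kcat, km):.1e} M^-1 s^-1")

# Saturation at the nucleoside concentrations used in the coupled assay (Figure 5-8)
print(f"dP at 500 uM:  v/Vmax = {fraction_of_vmax(500e-6, TABLE_5_5['dP'][1]):.2f}")
print(f"dG at 1600 uM: v/Vmax = {fraction_of_vmax(1600e-6, TABLE_5_5['dG'][1]):.2f}")
```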

Figure 5-8. Enzyme-coupled assay with DmdNK Q81E and dNs. Each nucleoside (500 µM; 1600 µM for dG, due to its high KM) was added to the enzyme-coupled assay buffer. Phosphorylation of all dNs was confirmed by detecting the signal decrease at 340 nm.

Figure 5-9. Product inhibition assay with DmdNK Q81E. The enzyme-coupled assay was performed by adding both the substrates and products (500 µM; 1600 µM for dG and dGMP) to the reaction buffer containing DmdNK Q81E. No product inhibition was observed with any dNMP.


Figure 5-10. TLC with DmdNK Q81E and Z. ZMP was detected in the samples. The spots were confirmed by comparing their mobilities with the standards’ Rf values.

E. coli cytidylate kinase (EcCmk) variants

The activity of the Cmk variant designed by Rosetta was tested using the TLC assay (dZMP, Figure 5-11) and the enzyme-coupled assay (dPMP, Figure 5-12). The variant accepted neither dZMP nor dPMP. Moreover, it had lost its activity towards dCMP, its natural substrate. Here, the engineering design was not successful.

Figure 5-11. TLC with Cmk variant, dCMP, and dZMP. No dNDP was detected in the samples. Spots were identified by comparing their mobilities with the standards’ Rf values.


Figure 5-12. Enzyme-coupled assay with EcCmk variant, dCMP, and dPMP. No signal decrease (phosphorylation) was observed.

E. coli thymidylate kinase (EcTmk) variants

The activities of E. coli kinase variants against various monophosphates were tested using the luciferase assay, in which the gain of ATP (and hence the gain of luminescence) was followed in the reaction dTDP + ADP → dTMP + ATP → light. First, the background signal was checked with the wild-type enzyme and the natural substrate dTDP. As seen in Figure 5-13, luciferase generated light with a good signal-to-noise ratio when 50 µM dTDP was added to the reaction mixture, but not in the negative controls lacking dTDP, dZDP, and/or EcTmk.

Then, 188 Tmk variants were screened using the luciferase assay. None appeared to catalyze the reaction dZDP + ADP → dZMP + ATP → light; no luminescence signal above ~50 units (the level of the negative controls) was observed. Therefore, none of the variants accepts dZDP (Figure 5-14) and, by microscopic reversibility, none accepts dZMP. Unfortunately, the enzyme-coupled assay with dTMP also indicated that most Tmk variants had lost their ability to catalyze this reaction with the natural dTMP/dTDP (Figure 5-15). Approximately five variants per 96-well plate continued to show a little activity as dTMP kinases; however, the signal decrease was very small, possibly reflecting low enzyme activity or small amounts of correctly folded enzyme.
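In this screen, variant wells were judged against the negative controls (~50 luminescence units) by inspection. One way to make such a call explicit is a threshold rule. The Python sketch below flags wells whose mean replicate signal exceeds the control mean plus k standard deviations. The k = 3 cut-off is a common screening convention assumed here, not the criterion used in this work, and `call_hits` is a hypothetical helper.

```python
from statistics import mean, stdev


def call_hits(sample_replicates, control_signals, k=3.0):
    """Return indices of wells whose mean luminescence exceeds
    mean(controls) + k * SD(controls).

    sample_replicates: list of replicate readings per well (e.g. three trials).
    control_signals: readings from negative-control wells (no kinase).
    The default k = 3 is an assumed convention, not a value from this study.
    """
    if len(control_signals) < 2:
        raise ValueError("need at least two control readings to estimate SD")
    threshold = mean(control_signals) + k * stdev(control_signals)
    return [i for i, reps in enumerate(sample_replicates) if mean(reps) > threshold]
```

Averaging the three independent trials before thresholding mirrors how the bars and error bars in the luciferase figures are reported.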


Figure 5-13. Luciferase assay (reverse) with wild-type EcTmk. Left panel: negative control containing buffer + luciferase; negative control with buffer + Tmk + luciferase; buffer + Tmk + dTDP + luciferase. Right panel: negative controls, buffer + luciferase + dNDP without Tmk. The buffer contained ADP and luciferin. Here, a positive result is indicated by the formation of ATP, giving a high bar. The activity of EcTmk was confirmed by detecting ATP resulting from the reverse reaction catalyzed by EcTmk. This is the result of a single experiment.

Figure 5-14. Luciferase assay with EcTmk variants in an assay where ATP is formed (reverse, positive result is a high bar). The graphs show low (< 50, consistent only with background) luminescence signals (the Y-axis) of the samples (the X-axis) containing Tmk variants (plate 1 and 2, each plate contains 94 variants and two negative controls), dZDP and ADP. This indicated there were no variants in the samples that phosphorylate dZMP. This is a panel of largely inactive enzymes. Bars colored blue are negative controls without Tmk. The error bars indicate the standard deviations of three independent trials.


Figure 5-15. The enzyme-coupled assay results with Tmk variants acting on dTMP. Some Tmk variants might accept dTMP (if the slow decrease in absorbance is interpreted as "activity"), but only poorly.

E. coli adenylate kinase (EcAdk) variants

The enzyme-coupled assay was used to test the ability of the EcAdk variants to accept dPMP because the P heterocycle, unlike the Z heterocycle, does not absorb strongly in the UV. Here, the reaction sought is dPMP + ATP → dPDP + ADP (dAMP + ATP → dADP + ADP for the natural substrate); the diphosphates then accept phosphate from PEP, forming pyruvate, whose reduction to lactate causes a corresponding loss of NADH.

Further, the luciferase assay was problematic with this enzyme. This assay seeks dADP + ADP → dAMP + ATP → light. However, a background signal (~250 units) was seen when ADP and dADP were mixed without kinase. The source of this background is not known (luciferase possibly accepts dADP and dGDP as substrates, as it accepts dATP99), but it was observed only with adenosine and guanosine derivatives.

Figure 5-17 shows that the Adk variants made following my own design for dPMP do not accept dPMP as a substrate. However, most of the variants still retain their activity towards the natural substrate dAMP.


Figure 5-16. Luciferase assay with the reaction dADP + ADP → dAMP + ATP → light (reverse), using wild-type E. coli Adk. Left panel: negative control with buffer + luciferase; negative control with buffer + Adk + luciferase; reaction with buffer + Adk + dADP + luciferase. Right panel: further negative controls, with buffer + dADP but no enzyme other than luciferase, and buffer + dZDP + luciferase. The buffer contained ADP and luciferin. This shows that the enzyme is active against its natural substrate. Note the background of ~250 units with dADP, which motivated us to use the coupled assay for P nucleotides. This is the result of a single experiment.

Figure 5-17. The enzyme-coupled assay results with Adk variants designed to accept dPMP. Left panel: the reaction sought is dAMP + ATP → dADP + ADP; the diphosphates then accept phosphate from PEP, forming pyruvate, which is reduced to lactate. The reaction is initiated by adding dAMP (vertical arrow in the left panel). The different traces are from different variants. The left panel shows that some, but not all, of the variants retain activity against the natural substrate. Right panel: the reaction sought is dPMP + ATP → dPDP + ADP, coupled in the same way through PEP, pyruvate, and lactate. The data show that dPMP is not accepted by any variant. The number of variants in the two panels is the same; the traces overlap more in the right panel. The darkest points are the results of a negative control (without Adk variant, indicated by the horizontal arrows).


The activities of Adk variants designed for dZMP were tested with the luciferase assay (Figure 5-18), since the UV absorbance of the Z heterocycle was problematic in the coupled assay. Here, samples containing only the kinase variants were used to measure the negative-control signals. Although some variant samples showed high luciferase signals, the negative controls containing only the Adk variants showed even higher luminescence. Thus, it was concluded that no Adk variant accepts dZDP.

Figure 5-18. Luciferase assay with Adk variants and dZDP. The graphs show luminescence signals of the samples containing Adk variants (94 variants and two negative controls), dZDP, and ADP. The reverse reaction of the Adk variants was monitored (the signal increases if a variant accepts dZDP and produces ATP as a result). All observed signals were below those of the negative controls (top, Adk only), indicating that no Adk variant in the samples phosphorylates dZMP. Bars colored blue are additional negative controls without Adk. The error bars indicate the standard deviations of three independent trials.

Eight selected Adk variants were tested for activity towards dAMP by TLC (Figure 5-19). The luciferase assay was not used for this purpose because of the background produced by dADP (see Chapter 4). The results indicated that most Adk variants retain their activity towards their natural substrate dAMP. This activity appeared to vary among the variants, which may simply reflect differences in the concentrations of the purified Adk variants or the effects of the mutations.

Figure 5-19. TLC with eight selected Adk variants and dZMP. Most Adk variants retain their natural kinase activity towards dAMP to produce dADP. The products were detected and confirmed by comparing their mobilities with the standards’ Rf values.

E. coli guanylate kinase (EcGmk) variants

The activities of 94 different Gmk variants designed to accept dPMP were measured using the luciferase assay (Figure 5-20). In addition to the samples containing kinases and dPDP, samples containing only the variant kinases were included as negative controls. The negative controls and the samples (EcGmk variants + dPDP) gave similar luminescence signals. Thus, it was concluded that no Gmk variant accepts dPDP. A set of 12 Gmk variants was tested for activity towards dGMP using the enzyme-coupled assay (Figure 5-21). The luciferase assay was not used for this purpose because of the background produced by dGDP (see Chapter 4), which again is not fully understood. The results indicated that most Gmk variants retain their activity towards their natural substrate dGMP. Theoretically, 800 variants can be created. However, library screening was stopped after 94 variants had been screened, as this library generation method did not seem to be productive (see Discussion).


Figure 5-20. Luciferase assay with Gmk variants and dPDP. The graphs show luminescence signals of Gmk variants (94 variants and two negative controls) with or without the diphosphate substrate (dPDP). The formation of ATP (high bars) gives a positive signal. The reverse reaction of the Gmk variants was monitored (the signal increases if a variant accepts dPDP and produces ATP as a result). None of the observed signals exceeded those of the negative controls (top, Gmk only), indicating that no Gmk variant in the samples phosphorylates dPMP. Bars colored blue are additional negative controls without Gmk variants. The error bars indicate the standard deviations of three independent trials.

Figure 5-21. Enzyme-coupled assay with selected Gmk variants and dGMP. Most Gmk variants retain their natural kinase activities towards dGMP.


Discussion

DmdNK Q81E Accepts dZ, Z, and dP

These results show that the Q81E mutation allowed DmdNK to accept dZ and dP, with activity towards dP equivalent to that towards the natural substrates. All kinetic parameters were similar for dP and the standard nucleosides except dG (Table 5-5). Except for dG, KM values ranged from 3.9 to 10 µM, kcat values from ~4.5 to 5.7 s⁻¹, and specificity constants (kcat/KM) from 4.4 to 15 × 10⁵ M⁻¹ s⁻¹. Thus, we have effectively broadened the substrate specificity by mutation, except for dG. dP turned out to be a fairly good substrate compared with other unnatural nucleosides.106

Parallel experiments with the native DmdNK showed that dP and dZ were poor substrates compared with the natural substrates (dP: KM 122 ± 27 µM, kcat 15.4 ± 1.1 s⁻¹, kcat/KM 1.26 × 10⁵ M⁻¹ s⁻¹; dZ: KM > 790 µM, kcat > 0.27 s⁻¹). The variant kinase showed increased activity for dA and dP compared with the wild-type enzyme, but not for dG. This result is quite surprising. Although dP differs from dG in the atoms at purine ring positions 5 and 7, and has a different inter-strand hydrogen-bonding pattern, it otherwise resembles dG. Perhaps naively, we might expect a variant that accepts dP better to also accept dG better.

Crystal structures are available for the kinase complexed with dT (PDB ID: 1OT3)107 and with dC (PDB ID: 2VP5),46 but not with dA or dG. In these structures, the glutamine (Q81) interacts with the hydrogen-bonding units on the Watson-Crick "edge" of the pyrimidines46,107 (Figure 5-22). In one orientation, the "donor-acceptor" pattern of the amide interacts with the "acceptor-donor" pattern of dT; rotating the amide by 180° allows its "acceptor-donor" pattern to form two hydrogen bonds to the "donor-acceptor" pattern of dC. For purines, a crystal structure containing dGTP is available46 (PDB ID: 2VP2). It shows that the "donor" of the amide's "donor-acceptor" pattern interacts with the carbonyl group of guanine.


On the basis of these crystal structures, we hypothesize that Q81 interacts with positions 3 and 4 of the pyrimidine nucleobases. Thus, the Q81E variant may be able to accept (d)Z because 1) glutamic acid has the same carbon-chain length as glutamine and 2) its "acceptor-acceptor" hydrogen-bonding pattern matches the "donor-donor" pattern of the Z base. However, this hydrogen-bonding pattern does not explain why the Q81E variant accepts dP. If we assume that the P base binds to Q81E in the same orientation as G does (Figure 5-22), no hydrogen bond would form: the glutamate side chain provides no hydrogen ("acceptor-acceptor"), and P presents only its carbonyl group (an acceptor) to it. This problem would be solved if the glutamic acid were uncharged ("donor-acceptor"), which seems unlikely at pH 7. Moreover, an uncharged side chain would not interact well with dZ. It is possible that the dGTP-DmdNK complex does not resemble the dG-DmdNK complex, and that the hydrogen-bond interaction(s) with the purine occur at different position(s) of the purine base.

Also, it is always possible that the crystal structures do not accurately reflect the fold of the enzymes in their catalytically relevant state. This might also be why the crystal structures fail to account for the acceptance of dP and dZ by the variant DmdNK.

For the P nucleobase, we can speculate that P would be a better substrate than G because P lacks nitrogen-7. This replacement might reduce the repulsive interaction between the side-chain C=O oxygen of residue 81 and nitrogen-7 of the purine ring of G.

These considerations notwithstanding, this engineered kinase provides the enzymology needed for the first step in constructing a biosynthetic pathway to the triphosphates of the Z and P nucleosides. The one remaining step that we need to complete is therefore the second phosphorylation, catalyzed by nucleoside monophosphate kinases.


Figure 5-22. Crystal structures of DmdNK wildtype complexed with dT (A; PDB ID: 1OT3), dC (B; PDB ID: 2VP5), and dGTP (C; PDB ID: 2VP2). Possible hydrogen bond acceptors and donors are colored in red and blue, respectively.

Monophosphate Kinase Variants

None of the monophosphate kinase variants tested had activity towards the Z or P monophosphates. This may be because 1) the theories used to engineer the enzymes were inadequate, 2) the mutations were not efficiently incorporated, and/or 3) the number of clones tested did not cover all of the variants that needed to be tested.


The first possibility is always a concern in science, and failure is key to improving theories. In that sense, this study provides useful examples for improving the theories used to engineer kinases with desired specificities.

To evaluate the second possibility (inefficient incorporation of mutations at the target sites), the genes from 96 of the screened clones were sequenced, 16 or 32 clones from each library (EcGmk, EcAdk (Rosetta design), EcAdk (semi-rational design), and EcTmk); the samples were submitted as a plate containing the purified plasmids mixed with the forward primer. The sequencing reactions were done by Eurofins Genomics (Luxembourg, Luxembourg). Table 5-6 shows the sequencing results. The reference sequences were aligned with the library sequences using the LALIGN program (http://www.ch.embnet.org/software/LALIGN_form.html).

Of the 10 analyzable sequences from the 16 samples obtained from the EcGmk library, all were wild type; this library generation therefore failed. More successfully, 12.5% of the sequences (4 sequences) obtained from the "semi-rationally" altered EcAdk gene encoded amino acid replacements, with Gly, Pro, and Val substitutions at position 31; no variants with changes at positions 58 and 59 were found in this sample. Five of the 16 clones sampled from the variants guided by Rosetta had changes at the targeted sites 31, 59, and 64; this covered all of the sites targeted by Dr. Per Jr Greisen in the Baker laboratory, but missed some of his suggested amino acid replacements. Only two variants were found in the 16 clone samples from EcTmk; both had changes at site 78, one of the two sites that we had hoped to alter.
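The library sequences were aligned to the reference with LALIGN, and amino acid changes at the targeted residues were then read off (Table 5-6). The Python sketch below illustrates only that second step, under simplifying assumptions: the clone and reference sequences are already aligned, gap-free, and in frame from codon 1. `translate_codon` and `call_substitutions` are hypothetical helper names; the codon table is the standard genetic code, and the example sequences are toy inputs, not data from this study.

```python
# Standard genetic code, indexed by codon with bases ordered T, C, A, G.
CODON_BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """Translate one DNA codon; raises ValueError for ambiguous bases such as N."""
    if len(codon) != 3:
        raise ValueError(f"expected a 3-base codon, got {codon!r}")
    i, j, k = (CODON_BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def call_substitutions(reference: str, clone: str, positions):
    """List amino acid changes (e.g. 'T31G') at 1-based residue positions.

    Assumes both sequences are already aligned, gap-free, and in frame from codon 1.
    """
    calls = []
    for pos in positions:
        start = 3 * (pos - 1)
        ref_aa = translate_codon(reference[start:start + 3])
        clone_aa = translate_codon(clone[start:start + 3])
        if ref_aa != clone_aa:
            calls.append(f"{ref_aa}{pos}{clone_aa}")
    return calls


# Toy example (hypothetical sequences): codon 2 changed from ACC (Thr) to GGC (Gly)
print(call_substitutions("ATGACCGTT", "ATGGGCGTT", [2, 3]))
```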

Thus, the screen was large enough to evaluate only about 1% of the variants that were sought. These results are typical of mutagenesis studies in general and are consistent with the fact that most variants still retained their activities towards the natural substrates (Figures 5-15, 5-17, 5-19, and 5-21). A much larger screening effort would be required to examine all of the targeted variants if they were generated by directed mutagenesis. Of course, all of them could be examined if a specific gene were created individually for each variant of interest.

Table 5-6. The results of kinase library sequencing.
Kinase library         Clones submitted   Readable sequences   Clones with desired mutation(s)   Mutated positions (no. of clones)
EcGmk                  16                 10                   0                                 None
EcAdk: semi-rational   32                 20                   4                                 T31G (1), T31P (3), T31V (1)
EcAdk: Rosetta         16                 16                   5                                 T31A (1), T31G (2), V59T (1), V64D (1)
EcTmk                  32                 5                    2                                 R78A (2)

To evaluate the third possibility, that the clone coverage of the library was insufficient, the number of variants that might have been missed was calculated using the following equation:108

$$\lambda = n\left(1 - \frac{1}{n}\right)^{m}$$

where λ = number of missing theoretical variants, n = number of expected variants, and m = number of variants screened.

This equation applies when all outcomes are equally probable. The probabilities of the possible amino acid substitutions encoded by the codons are almost equal, even taking E. coli codon usage into account109 (Appendix Table C-1). Table 5-7 shows the results of the calculation. According to the calculation, the numbers of theoretical variants that might have been missed were 711.3 (EcGmk, screening stopped), 13.4 (EcAdk: semi-rational), 1.6 (EcAdk: Rosetta), and 54.5 (EcTmk). To bring λ (missing theoretical variants) below 1.0, the number of screened variants would have had to be increased to > 5310 for Gmk, > 436 for Adk (semi-rational), > 109 for Adk (Rosetta), and > 707 for Tmk.
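The coverage estimate can be evaluated directly from the equation above, and solving n(1 − 1/n)^m = 1 for m gives the library size at which λ reaches 1. The Python sketch below does both. It reproduces the λ values listed for the EcGmk and EcAdk libraries (711.3, 13.4, 1.6) and thresholds of about 436 and 109 clones for the two EcAdk libraries. The function names are illustrative, and, like the equation, the sketch assumes all variants are equally likely.

```python
import math


def expected_missing(n: int, m: int) -> float:
    """lambda = n (1 - 1/n)^m: expected number of the n equally likely
    variants that never appear among m randomly picked clones."""
    return n * (1.0 - 1.0 / n) ** m


def clones_needed(n: int, target_lambda: float = 1.0) -> float:
    """Number of clones m at which lambda falls to target_lambda
    (solves n (1 - 1/n)^m = target_lambda for m)."""
    return math.log(target_lambda / n) / math.log(1.0 - 1.0 / n)


# (library, n expected variants, m clones screened), as listed in Table 5-7
for label, n, m in [("EcGmk", 800, 94),
                    ("EcAdk, semi-rational", 96, 188),
                    ("EcAdk, Rosetta", 32, 94)]:
    print(f"{label}: lambda = {expected_missing(n, m):.1f}")

for label, n in [("EcAdk, semi-rational", 96), ("EcAdk, Rosetta", 32)]:
    print(f"{label}: lambda < 1 requires more than {clones_needed(n):.1f} clones")
```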

This result indicates that many more variants must be screened to cover all of the desired mutations. However, it is also true that if no change in activity towards the desired substrates is seen after screening ~100 clones, the likelihood that mutations at the chosen residues affect the enzyme activity may be low.

Table 5-7. Missing theoretical variants of kinases and ideal library size.
Library                                      Variants screened (m)   Missing theoretical variants (λ)   Expected variants (n)
EcGmk, screened                              94                      711.3                              800
EcGmk, ideal library size                    5310                    1.0                                800
EcAdk (semi-rational), screened              188                     13.4                               96
EcAdk (semi-rational), ideal library size    436                     1.0                                96
EcAdk (Rosetta), screened                    94                      1.6                                32
EcAdk (Rosetta), ideal library size          109                     1.0                                32
EcTmk, screened                              188                     54.5                               144
EcTmk, ideal library size                    707                     1.0                                144

Thus, efficient and promising enzyme engineering requires efficient enzyme design methods (for example, making small libraries110 by using the evolutionary information of the enzymes, as in the Reconstructed Evolutionary Adaptive Path (REAP) approach111) and an efficient, rapid method for incorporating mutations at the target sites. Moreover, it is important to analyze and optimize the mutation incorporation rates and conditions before testing the variants. Once the best conditions are determined, enough clones to cover all of the possible variants should be screened.


CHAPTER 6 Z NUCLEOBASE CRYSTAL STRUCTURE ANALYSIS

Background

As noted above, Z is >50% deprotonated at pH values above 8.0. In its deprotonated form, Z forms a pair with Watson-Crick geometry with G. This mismatch appears to be a critical cause of polymerase misincorporation and, possibly, of discrimination by kinases. Accordingly, a crystallographic study was conducted to learn more about this nucleobase.
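Whether Z is mainly neutral or anionic at a given pH follows the Henderson-Hasselbalch relation for a monoprotic acid. The Python sketch below computes the anionic fraction. The pKa of 8.0 is an illustrative assumption read from the statement above, so it is kept as a parameter rather than presented as a measured value. Under that assumption, about 61% of Z would be anionic at pH 8.2, the pH of the Z-Am crystallization solution, so both forms are substantially populated there.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a monoprotic acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Illustrative assumption: pKa = 8.0, read from "Z is >50% deprotonated above pH 8.0"
ASSUMED_PKA_Z = 8.0

for ph in (7.0, 7.5, 8.0, 8.2, 9.0):
    print(f"pH {ph}: {100 * fraction_deprotonated(ph, ASSUMED_PKA_Z):.0f}% anionic Z")
```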

Materials and Methods

Crystallization

In preliminary studies seeking useful crystallization solvents, the Z nucleobase (50 mg) was heated in each of a set of solvents (5 mL) (Table 6-1) near its boiling point. The solutions were then filtered through nylon (0.45 µm) to remove undissolved material. Only DMSO, dimethylformamide, and water showed substantial ability to dissolve Z.

Accordingly, the Z material (100 mg) that had been recovered from an NaOH solution by precipitation with HCl was dissolved directly in hot dimethyl sulfoxide (5 mL). Slow cooling to room temperature gave fine needle-like crystals (Z-Sod), later shown to contain sodium.

Separately, the material (100 mg) was dissolved in water (10 mL) containing ammonium hydroxide (650 mM). The solution was placed in an open beaker in a desiccator, to which was added separately a second open beaker containing sulfuric acid (18 M). The desiccator was sealed at atmospheric pressure and allowed to stand while vapor diffusion slowly removed the ammonia, forming fine needle-like crystals (Z-Am), later shown to contain ammonium. The pH of the solvent containing the crystals was 8.2.

This chapter is reproduced with permission of the International Union of Crystallography [Matsuura, M. F., Kim, H.-J., Takahashi, D., Abboud, K. A., and Benner, S. A. Crystal Structures of Deprotonated Nucleobases from an Expanded DNA Alphabet, Acta Crystallogr. 2016, C72, 952-959. doi: 10.1107/S2053229616017071].


Table 6-1. A list of crystallization conditions.
Solvent                   b.p. (K)   m.p. (K)   Solubility
Dimethyl sulfoxide        462        292        ++++ (dissolved all)
Dimethylformamide         427        212        ++++ (dissolved all)
Water                     373        273        ++++
Ethylene glycol           470        260        +++ (white precipitate)
Ethanol                   352        159        ++
Pyridine                  389        231        ++
Acetonitrile              354        228        ++
Ethyl acetate             350        189        + (white precipitate)
Toluene                   384        178        -
Benzene                   353        278.5      -
Dichloromethane           313        176        -
Diethyl ether             308        157        -
Tetrahydrofuran           339        165        -
Hexanes                   341        179        -
Chlorobenzene             404        228        -
Chloroform                334        209        -
tert-Butyl methyl ether   328        164        -
1,4-Dioxane               374        285        -

1H NMR

The lyophilized Z nucleobase crystals were dissolved in DMSO-d6 to measure their 1H NMR signals. This analysis was performed by Dr. Myong-Sang Kim from the Foundation for Applied Molecular Evolution. The observed signals agreed with the signals of the Z nucleobase (Figure 6-1). 1H NMR (300 MHz, DMSO-d6) δ 8.524 (s, 1H), 7.953 (d, J = 9.9 Hz, 1H), 5.650 (d, J = 6.9 Hz, 1H).


Data Collection and Refinement

Crystal data collection and structure refinement were performed by Dr. Khalil A. Abboud and Mr. Daisuke Takahashi of the Center for X-ray Crystallography, Department of Chemistry, University of Florida. The details are summarized in Table 6-2. The non-H atoms were refined with anisotropic thermal parameters, and the H atoms bound to C atoms were calculated in idealized positions and refined riding on their parent atoms.

In solution, Z-Sod molecules exist in a dynamic equilibrium between the neutral form (with a proton bound to ring N1) and a deprotonated (anionic) form in which the ring nitrogen, lacking the proton, carries a negative charge. The molecules crystallize as dimers across inversion centers, each made up of a neutral and an anionic molecule: the hydrogen on the ring nitrogen can reside on one of the molecules but not on its symmetry equivalent. Crystallographically, the proton cannot exist on both ring nitrogen atoms of the two molecules because of the short distance between them, 2.924(2) Å. Thus, the proton was refined with a 50% occupancy factor, and the charges are balanced by the 50% occupancy of the Na positions located on 2-fold symmetry elements. In addition to the strong hydrogen bond between H1A and N1 of the paired molecules, this type of dimer crystallization is also driven by the strong dual hydrogen bonding between the amino hydrogens of N3 and O1.

In the Z-Am crystal, the asymmetric unit consists of two crystallographically independent molecules in which the same equilibrium exists; each was refined with a hydrogen on N1A or N1B, with a 50% occupancy factor for each hydrogen atom. These two hydrogens cannot coexist in adjacent molecules of the asymmetric unit because of the short distance between their heteroatoms (2.909 Å). Each of the two molecules is in fact an average of a neutral molecule and an anion, with the charge carried by either N1A or N1B. The counterion in this case is ammonium, which is disordered and refined in two positions, N2 and N2'. All four ammonium H atoms were obtained from a difference Fourier map and refined with constraints to maintain equivalent bonds and isotropic displacement parameters. The lattice water molecule is also disordered, but only in the oxygen atom position, which was refined as O1 and O1'. Its hydrogens, H1C and H1D, are not disordered and were refined freely. The occupancy factors of all disordered parts were allowed to refine initially; as they all refined to values very close to 0.5, they were fixed at 0.5 for the final refinement model.

Table 6-2. Crystal data and structure refinement.

Crystal data                              Z-Sod                                  Z-Am
Identification code                       Z-Sod                                  Z-Am
Empirical formula                         [Na(C5H4N3O3)(H2O)]·C5H5N3O3·H2O       NH4⁺·C5H4N3O3⁻·C5H5N3O3·H2O
Mr                                        368.26                                 345.29
Crystal system, space group               Monoclinic, C2/c                       Monoclinic, P2₁
Temperature (K)                           100                                    100
a, b, c (Å)                               20.8644(14), 3.5981(2), 20.2402(14)    10.8724(10), 3.7149(3), 17.3901(15)
β (°)                                     105.8625(13)                           93.012(2)
V (Å³)                                    1461.61(16)                            701.41(11)
Z                                         4                                      2
Radiation type                            Mo Kα                                  Mo Kα
μ (mm⁻¹)                                  0.17                                   0.14
Crystal size (mm)                         0.40 × 0.05 × 0.03                     0.29 × 0.09 × 0.07

Data collection
Diffractometer                            Bruker APEX-II DUO                     Bruker APEX-II DUO
Absorption correction                     Analytical (SADABS, Bruker 2014)       Analytical (SADABS, Bruker 2014)
Tmin, Tmax                                0.964, 0.997                           0.976, 0.993
No. of measured, independent and
observed [I > 2σ(I)] reflections          7572, 1682, 1329                       7278, 3174, 2934
Rint                                      0.029                                  0.021
(sin θ/λ)max (Å⁻¹)                        0.650                                  0.650

Refinement
R[F² > 2σ(F²)], wR(F²), S                 0.032, 0.091, 1.05                     0.032, 0.082, 1.05
No. of reflections                        1682                                   3174
No. of parameters                         133                                    248
No. of restraints                         0                                      31
H-atom treatment                          H atoms treated by a mixture of independent and constrained refinement (both crystals)
Δρmax, Δρmin (e Å⁻³)                      0.34, −0.23                            0.26, −0.22
Absolute structure                        -                                      Flack x determined using 1135 quotients [(I+)−(I−)]/[(I+)+(I−)]112
Absolute structure parameter              -                                      0.0(6)

Computer programs: Bruker APEX2,113 Bruker APEX2 SAINT,114 SHELXTL2014/7,115 and XP116 in SHELXTL-Plus.
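Two quantities in Table 6-2 can be cross-checked against the others. For a monoclinic cell, V = abc sin β, and the calculated density follows from Dx = Z·Mr/(N_A·V). The Python sketch below (helper names are illustrative) recomputes V for both crystals, agreeing with the tabulated 1461.61 and 701.41 Å³ to within about 0.01 Å³. It also derives Dx, which is not listed in the table, so that value should be treated as a derived illustration rather than a reported result.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def monoclinic_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit-cell volume (Å^3) of a monoclinic cell: V = a b c sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def calculated_density(z: int, mr: float, volume_a3: float) -> float:
    """Calculated density (g/cm^3): Dx = Z * Mr / (N_A * V), with 1 Å^3 = 1e-24 cm^3."""
    return z * mr / (AVOGADRO * volume_a3 * 1e-24)


# Cell parameters, Z, and Mr transcribed from Table 6-2
v_sod = monoclinic_volume(20.8644, 3.5981, 20.2402, 105.8625)
v_am = monoclinic_volume(10.8724, 3.7149, 17.3901, 93.012)
print(f"Z-Sod: V = {v_sod:.2f} Å^3, Dx = {calculated_density(4, 368.26, v_sod):.3f} g/cm^3")
print(f"Z-Am:  V = {v_am:.2f} Å^3, Dx = {calculated_density(2, 345.29, v_am):.3f} g/cm^3")
```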


Results

1H NMR

Figure 6-1. 1H NMR spectrum of the Z nucleobase crystal dissolved in DMSO-d6. Residual DMSO-d5 was used as the reference for chemical shifts (2.500 ppm). The 1H NMR spectrum was recorded on an Oxford NMR spectrometer (Oxfordshire, UK) operating at a 1H frequency of 300 MHz. Values on the signals are chemical shifts of the five protons of the NH groups and the aromatic CH groups of the Z nucleobase, water (δ 3.387 (s, 2H)), and DMSO-d5 (δ 2.500 (s, 1H)).

Crystal Structure of Z Nucleobase

As seen in Figure 6-2, Z crystallized as a pair complexed with sodium ions in the organic solvent (DMSO; Z-Sod), or as a hemihydrate with ammonium under slightly basic conditions (pH of the solvent ~8.2; Z-Am; Figures 6-2 and 6-4). Crystallographic parameters are collected in Tables 6-3, 6-4, 6-5, and 6-6 (see Appendix for all data), and the molecular packing is shown in Figures 6-3 and 6-4.

Tables 6-3, 6-4, 6-5, and 6-6 show bond lengths, angles, and hydrogen bonds. The bond lengths and angles of the Z nucleobases are similar to the values observed in cytosine.117,118 The only difference is that the bond angle between the amino group and the ring (C4-C5-N3, ~122°) is wider than that of cytosine. This could be due to the presence of the adjacent nitro group, which might cause a steric clash. A similar widening of the angle (to more than 120°) was also reported between the nitro group and the ring (N2-C4-C5, ~122-125°).

In both crystals, Z and Z' are paired through three hydrogen bonds: two between their amino and carbonyl groups, and one between the protonated and deprotonated ring imino nitrogens. The hydrogen-bond lengths (H…A) of Z-Sod and Z-Am fall within the accepted range119,120 (1.5–2.2 Å; 1.67–1.84 Å). The D…A distances are also similar to those of the natural Watson-Crick pairs120,121 (~2.7–2.9 Å), Z:P pairs12,122-124 (~2.7–3.0 Å), other pyrimidine-pyrimidine hydrogen bonds such as C:U125 (N (cytosine amino group)…O (uracil), 2.78 and 2.81 Å) and U:U126 (N…O, 2.661, 3.223, 3.402, and 3.491 Å), and C+:C pairs127 (N…O and N…N ~2.7–2.9 Å).

Both crystals belong to the same crystal system (monoclinic), and in both, two Z nucleobases form a deprotonated:protonated pair with a single cation. As seen in Figures 6-2, 6-3, and 6-4, the Z nucleobases in both crystals are fairly planar, including all functional groups attached to the ring.

This observation matches the previously reported structure of the Z nucleobase in both A- and B-form DNA,122 but differs from that of Z in a DNA aptamer and in RNA,12,123 where the nitro group is rotated.

Each crystal has unique packing features. The cations (sodium or ammonium ions) interact with the nucleobases either directly or through a water molecule. In Z-Sod, each sodium ion is coordinated by two O atoms from nitro groups and four O atoms from water molecules (Table 6-3). The sodium ions line up between the tilted pairs (Figure 6-3), as in other nitro-organic crystals containing sodium ions.128,129 The packing also forms zigzag chains, as seen in 2-amino-3-nitropyridinium trichloroacetate130 and in many complexes of metal ions with ring compounds.131 In Z-Am, the ammonium ions line up and interact with the nucleobases and water so that these molecules form a large circular hydrogen-bonding network (Figure 6-4).


Table 6-3. Selected bond lengths (Å) and angles (°) for Z-Sod.
Bond or angle          Value (Å or °)
N(2)-C(4)              1.3920(17)
O(2)-N(2)              1.2542(14)
O(3)-N(2)              1.2625(15)
N(1)-H(1A)             0.76(3)
Na(1)-O(2)v            2.4749(10)
Na(1)-O(2)             2.4749(10)
Na(1)-O(4)i            2.3788(12)
Na(1)-O(4)ii           2.3788(12)
Na(1)-O(4)iii          2.3859(12)
Na(1)-O(4)iv           2.3859(12)
O(4)-Na(1)viii         2.3788(12)
O(4)-Na(1)ix           2.3859(12)
N(3)-C(5)-C(4)         124.77(12)
N(2)-C(4)-C(5)         122.57(12)
Symmetry codes: (i) x−1/2, y−1/2, z; (ii) −x+1/2, y−1/2, −z+1/2; (iii) −x+1/2, y+1/2, −z+1/2; (iv) x−1/2, y+1/2, z; (v) −x, y, −z+1/2; (vi) x, y+1, z; (vii) x, y−1, z; (viii) x+1/2, y+1/2, z; (ix) x+1/2, y−1/2, z.

Table 6-4. Selected hydrogen bonds for Z-Sod (Å and °).
D-H...A                  d(D-H)      d(H...A)    d(D...A)     ∠(DHA)
N(3)-H(3B)...O(1)i       0.841(18)   1.958(18)   2.7975(15)   176.5(17)
N(1)-H(1A)...N(1)i       0.76(3)     2.17(3)     2.924(2)     174(3)
N(3)-H(3C)...O(3)        0.895(18)   2.036(18)   2.6570(15)   125.4(14)
Symmetry codes: (i) −x+1/2, −y+1/2, −z; (ii) x+1/2, y+1/2, z.

Table 6-5. Selected bond lengths (Å) and angles (°) for Z-Am.
Bond or angle          Value (Å or °)
N(2A)-C(4A)            1.405(2)
O(2A)-N(2A)            1.251(2)
O(3A)-N(2A)            1.249(2)
N(1A)-H(1A)            0.8440
N(2B)-C(4B)            1.386(2)
O(3B)-N(2B)            1.258(2)
O(2B)-N(2B)            1.261(2)
N(1B)-H(1B)            0.8541
N(3A)-C(5A)-C(4A)      125.32(17)
N(3B)-C(5B)-C(4B)      124.65(17)
N(2A)-C(4A)-C(5A)      122.30(17)
N(2B)-C(4B)-C(5B)      121.96(17)

Table 6-6. Selected hydrogen bonds for Z-Am (Å and °).
D-H...A                      d(D-H)   d(H...A)   d(D...A)   ∠(DHA)
N(3A)-H(3A'')...O(1B)        0.87     1.94       2.805(2)   171
N(3B)-H(3B'')...O(1A)        0.96     1.85       2.805(2)   172
N(1A)-H(1A)...N(1B)          0.84     2.09       2.909(2)   163
N(1B)-H(1B)...N(1A)          0.85     2.06       2.909(2)   172
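In each row of Tables 6-4 and 6-6, the donor, hydrogen, and acceptor form a triangle, so d(D...A) is fixed by d(D-H), d(H...A), and the D-H...A angle through the law of cosines. The Python sketch below uses this as a consistency check on the transcribed values. The recomputed D...A distances agree with the reported ones to within about 0.005 Å, the residual reflecting the rounding of the tabulated distances and angles. The helper name is illustrative.

```python
import math


def donor_acceptor_distance(d_dh: float, d_ha: float, angle_dha_deg: float) -> float:
    """D...A distance from d(D-H), d(H...A), and the D-H...A angle (law of cosines)."""
    theta = math.radians(angle_dha_deg)
    return math.sqrt(d_dh ** 2 + d_ha ** 2 - 2.0 * d_dh * d_ha * math.cos(theta))


# (label, d(D-H), d(H...A), angle, reported d(D...A)) transcribed from Tables 6-4 and 6-6
ROWS = [
    ("Z-Sod N(3)-H(3B)...O(1)", 0.841, 1.958, 176.5, 2.7975),
    ("Z-Sod N(1)-H(1A)...N(1)", 0.76, 2.17, 174.0, 2.924),
    ("Z-Sod N(3)-H(3C)...O(3)", 0.895, 2.036, 125.4, 2.6570),
    ("Z-Am N(3A)-H(3A'')...O(1B)", 0.87, 1.94, 171.0, 2.805),
    ("Z-Am N(3B)-H(3B'')...O(1A)", 0.96, 1.85, 172.0, 2.805),
    ("Z-Am N(1A)-H(1A)...N(1B)", 0.84, 2.09, 163.0, 2.909),
    ("Z-Am N(1B)-H(1B)...N(1A)", 0.85, 2.06, 172.0, 2.909),
]

for label, dh, ha, angle, reported in ROWS:
    calc = donor_acceptor_distance(dh, ha, angle)
    print(f"{label}: calculated {calc:.3f} Å, reported {reported:.3f} Å")
```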


Figure 6-2. Asymmetric unit of Z-Sod (left) and Z-Am (right) showing the hydrogen bonding pattern and the disorder in the water molecule and the sodium/ammonium cation. Hydrogen bonds are indicated by gray dashed lines. Displacement ellipsoids are drawn at the 50% probability level.


Figure 6-3. Molecular packing of Z-Sod, showing the hydrogen-bonded layers parallel to the crystallographic (001) plane. Hydrogen bonds are indicated by dashed lines. (a) A perspective view along the a axis, and (b) a perspective view along the b axis. The purple regions refer to Na coordination spheres.


Figure 6-4. Molecular packing of Z-Am viewed along the b axis, showing the hydrogen-bonded layers parallel to the crystallographic (010) plane. Hydrogen bonds are indicated by dashed lines.

Discussion

These results contained several surprises. First, we expected that the material that precipitated upon addition of HCl to an alkaline solution would be neutral 6-amino-5-nitropyridin-2-one (see Chapter 3). This expectation was based on routine experience in organic chemistry: in general, organic acids dissolve in water when they are deprotonated, and are extracted from water into ether (or precipitate) when acid is added to restore the proton and remove the charge.

Remarkably, however, the material precipitating upon acidification of the alkaline solution of Z' was not the uncharged aminonitropyridine. Rather, as shown by the crystal obtained from dimethyl sulfoxide (DMSO), the material was a co-crystal built from one cation (Na+), one deprotonated, anionic aminonitropyridine molecule (Z'), and one neutral aminonitropyridine molecule (Z) (Figure 6-2). Each cation interacts with at least one nitro group of the heterocycle.

The protonated and deprotonated heterocycles formed a Watson-Crick-like pair (in the sense that it was joined by three hydrogen bonds on the Watson-Crick edge of the heterocycle), but a "skinny"132 one (in the sense that it paired a small pyrimidine with another small pyrimidine).

Further, the structure obtained after crystallization from DMSO was consistent with an intramolecular hydrogen bond between one of the hydrogens of the exocyclic amino group and one of the oxygens of the nitro group, closing a six-membered ring. The other oxygen of the nitro group formed the closest contact to the Na+ cation (Figure 6-3).

To examine the influence of the cation on the crystal structure, a second crystal was obtained from aqueous ammonia, with the ammonia removed by vapor diffusion.

This structure also contained a similar "skinny" pair formed between one protonated and one deprotonated heterocycle. Here, the nitro-group oxygens interact with hydrogens from an amino group, a water molecule, and an ammonium ion. The ammonium ion also forms hydrogen bonds to the C=O carbonyl groups of the heterocycles, out of the plane of the pair.

These results show that the Z:Z' interaction is remarkably robust, persisting even when the material is crystallized from water (Z-Am). This is notable because, for free nucleobases and nucleosides, pyrimidine-pyrimidine association (C, U, T) is the least favorable in water.133

This implies that, in the crystal, the deprotonated heterocycle prefers a protonated heterocycle over water as its solvation partner. Further, the pair is evidently so energetically favorable that it forms regardless of the counterion, at least in these two cases.

In DNA134,135 or RNA136 duplexes, the natural pyrimidine-pyrimidine pairs (except U:U) are less stable than certain purine-purine pairs. The known natural pyrimidine-pyrimidine pairs (C:C, T:T, C:T) are stabilized by coordination with metal ions such as silver137,138 and mercury,139,140 or in intercalated forms127,141,142 (C+:C). The Z:Z' pairs observed in our crystals closely resemble intercalated cytosine pairs, except for their charge (positively charged cytosine vs. negatively charged Z). Intercalated cytosine pairs are often found in the C-rich strand of telomeric DNA, opposite the G-rich sequences143-145 at the ends of chromosomes, and these cytosine-rich structures are stable at acidic pH.127,146 These observations suggest that Z:Z' pairs may offer more diverse binding or stabilization, by adding to the genetic code not only an extra hydrogen-bonding pattern but also an extra charge (protonation state) that can change near physiological pH (pKa ~7.8). This self-pairing feature is also observed for P,147 the Watson-Crick pairing partner of Z; P resembles the G nucleobase except for one hydrogen and the positions of its nitrogen atoms. Beyond our unnatural nucleobases, other strong unnatural pyrimidine-pyrimidine pairs have been reported, both with148 and without149 metal-ion support. These reports are consistent with the unusual stability of the Z:Z' pair in the absence of metal ions, and with its possible use in DNA wires.150,151
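The practical meaning of a pKa near 7.8 can be made concrete with the Henderson-Hasselbalch relation. The Python sketch below treats free Z as a simple monoprotic acid with the pKa quoted above (an idealization that ignores how a duplex, an active site, or the medium might shift the pKa) and computes the fraction present as the anion Z' at a few pH values.

```python
def fraction_deprotonated(ph: float, pka: float = 7.8) -> float:
    """Fraction of a monoprotic acid present as its conjugate base at a given pH.

    Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa), so the anionic fraction is
    1 / (1 + 10**(pKa - pH)). The default pKa is the ~7.8 value quoted for Z.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

for ph in (6.8, 7.4, 7.8, 8.8):
    print(f"pH {ph}: {100 * fraction_deprotonated(ph):.0f}% present as Z'")
```

Under this idealization, about 9% of free Z is anionic at pH 6.8, about 28% at pH 7.4, half at pH 7.8, and about 91% at pH 8.8, so a change of one pH unit around physiological pH substantially shifts the balance between Z and Z'.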

Further, the structure of the sodium-Z nucleobase complex suggests that the Z nucleoside might be transported by the sodium/pyrimidine nucleoside cotransporters found in nature.152,153 Such transport might allow an E. coli strain to take up the Z nucleoside for intracellular replication of AEGIS-containing plasmids.10

The structures also show that the negative charge that might formally be placed on the ring nitrogen is, in fact, distributed across the entire molecule, especially onto the nitro group. This is hardly surprising; the yellow color of the heterocycle is attributed to this across-ring resonance. Indeed, the polarizable nature of the nitro group may explain why it confers "general binding potential" to matrices (such as nitrocellulose), a potential that has been invoked to account for the binding of aptamers containing the Z nucleobase.11-13 This negative charge distribution may also contribute to the stability of the Z:Z' pair, as reported for the Z:P pair.122-124


Further, the negative charge distribution suggests a general mechanism by which polymerases might discriminate against the deprotonated Z':G mismatch in their active sites. In general, the major groove of a DNA duplex makes no close contacts with the DNA polymerase protein.

From an evolutionary perspective, this is interpreted as a consequence of the four standard nucleobases being quite different in the major groove, which precludes direct contacts there. If, however, the electron density facing the major groove were increased by strategic amino acid replacement in the polymerase (for example, by placing an aspartate or glutamate side chain protruding into that region), it might discourage deprotonated dZ'TP from entering the active site as a partner for any base, including G. Work is ongoing in our laboratories to develop polymerase variants that test this hypothesis.8


CHAPTER 7 CONCLUSIONS

Assay Development

Kinase assays were developed to assess the wild-type kinase activities, and later, to screen the kinase libraries for their ability to phosphorylate AEGIS nucleosides and nucleotides.

Table 4-5 summarizes the assays developed in this study. Because each assay has advantages and disadvantages, the assays were used for different purposes. The enzyme-coupled assay and the luciferase assay are easy to perform, and the luciferase assay can be run in a high-throughput format; it was used for most kinase screens. The enzyme-coupled assay suffers from background caused by phosphatase activity, and is insensitive when the AEGIS component, such as Z, absorbs strongly at 340 nm. The luciferase assay gives background with dGDP, dADP, and dATP, making it less useful with those substrates. In these cases, TLC and the modified 5′ nuclease assay were used as alternatives. TLC directly shows the formation of the phosphorylated substrate, which the enzyme-coupled and luciferase assays do not (they detect a byproduct of the phosphorylation). The modified 5′ nuclease assay confirms not only the phosphorylation but also the incorporation of the resulting deoxynucleoside triphosphates by the polymerase being used. Together, these four assays cover most purposes, and all four can be applied to detect kinase activity toward other unnatural nucleoside and nucleotide substrates.
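The 340 nm readout of the enzyme-coupled assay can be converted into a rate with the Beer-Lambert law. The Python sketch below assumes that the assay follows NADH oxidation (molar absorptivity 6220 M⁻¹ cm⁻¹ at 340 nm), as in a typical pyruvate kinase/lactate dehydrogenase coupling; the coupling scheme and its NADH-to-product stoichiometry are described in Chapter 4 and are assumptions here, and the absorbance slope in the example is hypothetical.

```python
EPSILON_NADH_340 = 6220.0  # molar absorptivity of NADH at 340 nm, M^-1 cm^-1

def nadh_rate_um_per_min(delta_a340_per_min: float, path_length_cm: float = 1.0) -> float:
    """Rate of NADH consumption (uM/min) from the slope of A340 versus time.

    Beer-Lambert law A = epsilon * c * l, so dc/dt = (dA/dt) / (epsilon * l).
    The sign of the slope is ignored because NADH oxidation lowers A340.
    """
    molar_per_min = abs(delta_a340_per_min) / (EPSILON_NADH_340 * path_length_cm)
    return molar_per_min * 1e6  # M/min -> uM/min

# Hypothetical reading: A340 falls by 0.0622 per minute in a 1 cm cuvette.
print(f"{nadh_rate_um_per_min(-0.0622):.1f} uM/min")
```

Here the slope corresponds to about 10 μM NADH oxidized per minute; converting that to a phosphorylation rate requires the stoichiometry of the particular coupling scheme. The same relation shows why a strongly absorbing substrate such as Z is problematic: it raises the starting absorbance and narrows the range over which the NADH signal can be measured reliably.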

Completed Phosphorylation Steps

Using the assays described above, wild-type and variant kinases were tested. Figure 7-1 summarizes the results described in Chapters 4 and 5. Together, the wild-type and variant kinases completed most of the phosphorylation steps necessary for Z and P nucleoside triphosphate synthesis. For P, complete phosphorylation was achieved with existing wild-type kinases alone. In principle, therefore, mixing these kinases with P should generate PTP in vitro. At the time of writing, this prediction is being tested in the laboratory.

Interestingly, the combined enzymes have not yet been shown to yield the triphosphate from the added nucleoside in one step. PMP is easily generated, but PDP and PTP have not yet been observed. This is attributed to mismatched fluxes through the system, in particular the slow conversion of the monophosphate to the diphosphate. This second step remains the difficult step in the pathway.
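The flux-mismatch explanation can be illustrated with a toy kinetic model. The Python sketch below treats nucleoside → NMP → NDP → NTP as three irreversible first-order steps and integrates them with a simple Euler scheme. The rate constants are hypothetical, not measured values; they are chosen only to show that when the second (monophosphate kinase) step is much slower than the other two, nearly all material accumulates as the monophosphate while the diphosphate stays close to zero, qualitatively matching the observation for PMP.

```python
def simulate_cascade(k1, k2, k3, t_end, dt=0.01):
    """Euler integration of N -> NMP -> NDP -> NTP with first-order rate constants (1/min).

    Starts from 1.0 (arbitrary units) of nucleoside; returns (N, NMP, NDP, NTP) at t_end.
    """
    n, nmp, ndp, ntp = 1.0, 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        f1 = k1 * n * dt    # nucleoside kinase step
        f2 = k2 * nmp * dt  # monophosphate kinase step (the putative bottleneck)
        f3 = k3 * ndp * dt  # nucleoside diphosphate kinase step
        n -= f1
        nmp += f1 - f2
        ndp += f2 - f3
        ntp += f3
    return n, nmp, ndp, ntp

# Hypothetical rate constants: fast first and third steps, a 1000-fold slower second step.
n, nmp, ndp, ntp = simulate_cascade(k1=1.0, k2=0.001, k3=1.0, t_end=60.0)
print(f"N={n:.3f}  NMP={nmp:.3f}  NDP={ndp:.4f}  NTP={ntp:.3f}")
```

Raising k2 to the same value as k1 and k3 in this model converts essentially all of the nucleoside to the triphosphate within the same time, which is why the monophosphate kinase step is the natural target for enzyme engineering.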

Figure 7-1. The phosphorylation steps that have been completed. Kinases that phosphorylate dZMP, ZMP, and dPMP are missing.

Thus, for the construction of an artificial metabolism in vitro, the limiting problem remains the lack of nucleoside monophosphate kinases (NMPKs) that convert dZMP, dPMP, and ZMP to dZDP, dPDP, and ZDP, respectively. Design tools, including Rosetta, have generally failed to produce such kinases. In the future, we hope to turn to combinatorial approaches to broaden our screen of variants.

Ribonucleoside Pathway

Initially, we planned to phosphorylate only dZ and dP, giving the corresponding dZTP and dPTP by a three-step phosphorylation pathway that avoids the reduction step required for ribonucleotide substrates (rNDP to dNDP, followed by phosphorylation of dNDP to give dNTP). However, we began to investigate the phosphorylation of the Z and P ribonucleosides and ribonucleotides because Ndk also accepts ZDP and PDP (confirmed by the reverse luciferase assay). Moreover, as mentioned in Chapter 4, the literature reports that most monophosphate kinases accept ribonucleoside monophosphates better than deoxyribonucleoside monophosphates (lower KM or higher kcat). These results suggested that we pursue ribonucleotide pathways (creating dZTP and dPTP from Z and P).

ZDP and PDP have not been shown experimentally to be reduced by ribonucleotide reductase (RNR). However, RNR is likely to accept both substrates, because it is a broad-specificity enzyme that accepts all four natural nucleotides.154 When experiments with RNR are performed, attention may need to be paid to its active and inactive forms. E. coli RNR switches between an active form (α2β2), induced in the presence of ATP, dGTP, dTTP, or low concentrations of dATP, and an inactive form (α4β4), induced by high concentrations of dATP or by combinations of ATP with dGTP or dTTP.155

Although the substrates for Z- and P-containing DNA (dZTP, dPTP) have not yet been synthesized by kinases, it is promising that the AEGIS RNA monomer PTP can be synthesized using the kinases introduced in this study. If these kinases produce PTP in vivo, it may be possible to generate AEGIS-containing RNA in cells from an AEGIS-containing DNA template using T7 RNA polymerase.9


Kinase Specificities

As mentioned in Chapter 1 (Table 1-1), the first and second phosphorylation steps are usually catalyzed by more specific kinases than the third, which is catalyzed by nucleoside diphosphate kinase (dNDP to dNTP). Some organisms have broad-specificity kinases that catalyze the first30-36 or second51-55 phosphorylation, but such kinases are uncommon.

Nucleotide concentrations, especially those of (d)NTPs, must be carefully and rapidly controlled because (1) triphosphate concentrations influence the error rate of DNA replication,156-158 and (2) the synthesis and consumption of ATP and GTP must be dynamic, given their roles as energy sources and as substrates in signal transduction pathways.

As seen in Table 7-1, nucleotides are maintained within characteristic concentration ranges in cells. Nucleoside triphosphates are maintained at higher concentrations (roughly 65 to 3560 μM in Table 7-1) than the corresponding nucleoside monophosphates and diphosphates,89,159 and ATP is maintained at the highest concentration (roughly 3000 to 3600 μM). This may be because ATP serves not only as an energy carrier but also as a phosphate donor for many other compounds, including nucleosides and nucleotides.

To maintain these nucleotide concentrations, having only one broad-specificity nucleoside diphosphate kinase could be advantageous for cells, because the cells then need to control the expression of only one enzyme in response to rapid, dynamic changes in (d)NTP concentrations.

This logic could also apply to the kinases involved in the first and second phosphorylations; however, those kinases need not be as broad as Ndk, since the concentrations of (d)N, (d)NMP, and (d)NDP may not change as rapidly as those of (d)NTPs. Moreover, having multiple broad-specificity kinases in one pathway would prevent cells from regulating the biosynthesis of each (d)NTP independently through changes in enzyme expression (concentration) and/or feedback inhibition of the nucleoside/tide kinases.17,160 This is especially true if broad-specificity kinases were used for all three phosphorylation steps (first, second, and third). This may be why no known organism has broad-specificity kinases for all three phosphorylation steps, but only for the first and third, or the second and third.

Table 7-1. Intracellular nucleotide concentrations
Approximate intracellular concentration (μM); "-" = not reported
Nucleotide  Bochner and Ames89  Buckstein et al.159
dATP  175  181
ATP  3000  3560
ADP  250  116
AMP  105  -
dGTP  122  92
GTP  923  1660
GDP  128  203
GMP  20  -
dCTP  65  184
CTP  515  325
CDP  81  -
CMP  70  -
dTTP  77  256
dTMP  41  -
UTP  894  667
UDP  93  54
UMP  142  -

In summary, cells may precisely regulate phosphorylation, and hence the ratio of the nucleotides, by using specific kinases and feedback inhibition of those kinases (mainly at the first and second phosphorylation steps). The broad-specificity Ndk then enables cells to rapidly synthesize (d)NTPs in response to their consumption.


Next Steps for ZP Phosphorylation

Z phosphorylation turned out to be quite difficult. This may be due to the negative charge of the deprotonated form of Z, which is distributed onto its nitro group (see Chapter 6). Kinases, especially monophosphate kinases, may recognize this negative charge and discriminate against Z nucleotides.

However, the nitro group may be an advantage for nucleoside transport. As mentioned in Chapter 6, there are nucleoside transporters that transport the nucleoside together with a sodium ion. We already know that the Z nucleobase forms a pair and crystallizes in the presence of sodium ions.161 Thus, such a transporter may be able to transport nucleosides that contain the Z nucleobase.

To complete the remaining monophosphate phosphorylation steps, we plan to engineer multi-substrate monophosphate kinases starting from T5 dNMPK. Ura6p was not selected because of a crystal structure analysis conducted by Dr. Dietlind L. Gerloff of the Foundation for Applied Molecular Evolution, which indicated that replacing one or more residues in its active site with other amino acids is likely to destroy its activity.

Another way to overcome the monophosphate phosphorylation bottleneck is to create a kinase with dual phosphorylation activity toward (d)N and (d)NMP, like the thymidine kinase from herpes simplex virus type 1.29,37,162

Furthermore, we can clone the multi-substrate NMPKs from T4 bacteriophage51 and Streptomyces bacteriophage ϕC31,55 which are the two remaining multi-substrate NMPKs that we have not yet tried.

Second, we may need to verify the structures of the synthesized deoxynucleoside monophosphates. They were synthesized and analyzed by MS and NMR, but were not checked by other analytical methods, so it is still possible that the phosphate of dPMP and dZMP is actually attached to the 3′ rather than the 5′ hydroxyl of the sugar (although 3′ and 5′ nucleotides were separated by HPLC). The ribonucleotides should not have this problem, because both of their other hydroxyl groups were protected during synthesis.

Third, RNR activity will be tested with ZDP and PDP when active RNR becomes available. If RNR accepts PDP, dPTP could be generated from P using RNR and the kinases tested in this study.

Fourth, after all phosphorylation pathways are completed, we will analyze triphosphate formation in vivo. In vitro triphosphate formation in a one-step reaction was attempted with tritium-labeled P as the substrate, but it has not yet succeeded.

Finally, genome editing (for example, with lambda Red (MAGE)163,164 or CRISPR-Cas9165) is also being considered as a way to obtain a strain that stably expresses the kinase variants and produces Z and P nucleotides.

Steps towards Artificial Metabolisms

Several AEGIS nucleoside and nucleotide phosphorylation steps were completed in this study, including the complete phosphorylation of P to PTP using existing kinases. In addition, crystal structures of one AEGIS nucleobase (Z) were obtained. These results give further insight into the interactions between unnatural nucleobases and enzymes, which can guide efficient enzyme design and metabolic engineering. More importantly, they represent a step toward the goal of creating an AEGIS-based metabolism, which could potentially be used to create enzymes built from up to 64 different amino acids.


APPENDIX A NUCLEOBASE SYNTHESIS

The Z nucleobase was synthesized by Dr. Hyo-Joong Kim of the Foundation for Applied Molecular Evolution using the following scheme. To a mixture of the Z nucleobase precursor (2-amino-6-chloro-3-nitropyridine, Atlantic SciTech Group, Linden, NJ, USA; 5 g, 28.8 mmol) in ethanol (100 mL) and water (30 mL) was added a solution of NaOH (7.2 g, 180 mmol) in water (20 mL) at room temperature. The mixture was heated at reflux for 30 min and then cooled to room temperature. Concentrated HCl was added dropwise until precipitation was complete. The solid was recovered by filtration, washed, and dried to give 3.4 g of a yellow solid93,166 (Figure A-1).
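The quantities in this procedure can be cross-checked with simple stoichiometry. The Python sketch below, using rounded standard atomic weights, recovers the stated 28.8 mmol of precursor and 180 mmol of NaOH (about 6.25 equivalents of base), and computes a nominal maximum mass assuming the product were isolated entirely as neutral Z. That last figure is only a rough reference point: as discussed in Chapter 6, the solid that precipitates on acidification can crystallize as a sodium-containing co-crystal of Z and Z', so a yield calculated against neutral Z should not be taken at face value.

```python
# Standard atomic weights (g/mol), rounded.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Na": 22.990, "Cl": 35.45}

def molar_mass(formula):
    """Molar mass (g/mol) from a dict of element counts."""
    return sum(ATOMIC_WEIGHTS[element] * count for element, count in formula.items())

PRECURSOR = {"C": 5, "H": 4, "Cl": 1, "N": 3, "O": 2}  # 2-amino-6-chloro-3-nitropyridine
NAOH = {"Na": 1, "O": 1, "H": 1}
NEUTRAL_Z = {"C": 5, "H": 5, "N": 3, "O": 3}           # 6-amino-5-nitropyridin-2-one

mmol_precursor = 5.0 / molar_mass(PRECURSOR) * 1000.0
mmol_naoh = 7.2 / molar_mass(NAOH) * 1000.0
equivalents_naoh = mmol_naoh / mmol_precursor
# Nominal maximum mass if all precursor were isolated as neutral Z (an idealization).
max_mass_neutral_z = mmol_precursor / 1000.0 * molar_mass(NEUTRAL_Z)

print(f"precursor: {mmol_precursor:.1f} mmol; NaOH: {mmol_naoh:.0f} mmol ({equivalents_naoh:.2f} equiv)")
print(f"nominal maximum as neutral Z: {max_mass_neutral_z:.2f} g")
```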

Figure A-1. Scheme for Z nucleobase synthesis.

This appendix is reproduced with permission of the International Union of Crystallography [Matsuura, M. F., Kim, H.-J., Takahashi, D., Abboud, K. A., and Benner, S. A. Crystal Structures of Deprotonated Nucleobases from an Expanded DNA Alphabet, Acta Crystallogr. 2016, C72, 952-959. doi: 10.1107/S2053229616017071].


APPENDIX B PRIMER SETS USED IN THIS STUDY

The following cloning and mutagenesis primer sets were designed by Dr. Ryan W. Shaw of the Foundation for Applied Molecular Evolution, who also cloned the cmk, tmk, and adk genes. The mutation sites for the E. coli adk gene were designed by Dr. Dietlind L. Gerloff, and the primer sets for the gmk gene were designed by Dr. Ryan W. Shaw; both are from the Foundation for Applied Molecular Evolution.

Table B-1. Primer sets for cmk gene cloning.
Primer set  Sequence (5' to 3')
pRS1-cmk-F1 (Forward)  TAATTTTGTTTAACTTTAAGAAGGACAATTGCATATGACGGCAATTGCCCCGGT
pRS1-cmk-R1 (Reverse)  ATTCGAGCTCGGTACCCGGGGATCCTTATGCGAGAGCCAATTTCTGG

Table B-2. Primer sets for tmk gene cloning and mutagenesis.
Primer set  Sequence (5' to 3')
tmk-F-tailVec (Forward)  ATTTTGTTTAACTTTAAGAAGGACAATGCGCAGTAAGTATATCGTCATTGA
tmk-R-tail-HAT (Reverse)  GCACATTATGAATCAGATGATCTTTTGCGTCCAACTCCTTCACCCAG
tmk-R78VVW-F (Forward)  ATGTTTTATGCCGCGVVWGTTCAACTGGTAGAAACG
tmk-T105VVW-R (Reverse)  ATACGCCTGWBBGGAGAGATCGTGGCGATC
*Mixed base: V = A, C, G; W = A, T; B = C, G, T

Table B-3. Primer sets for adk gene cloning.
Primer set  Sequence (5' to 3')
pRS1-adk-F1 (Forward)  TAATTTTGTTTAACTTTAAGAAGGACAATTGCATATGCGTATCATTCTGCTTGG
pRS1-adk-R1 (Reverse)  ATTCGAGCTCGGTACCCGGGGATCCTTAGCCGAGGATTTTTTCCA

Table B-4. Primer sets for gmk gene mutagenesis.
Primer set  Sequence (5' to 3')
gmk-F-N-Hat (Forward)  GAAGAACACGCGCATGCGCACAATAAAGGGCATATGGCTCAAGGCACGCTTTATATTG
gmk-R-pRS (Reverse)  GTCCCGAATTCGAGCTCGGTACCTCAGTCTGCCAACAATTTGCTGATTAAA
gmk-L72(RYN) E73(MAN) (Forward)  AGCAGAGATGCGTTCRYNMANCACGCAGAAGTTTTTGGT
gmk-T84(KYN) S85(KCN) (Forward)  GGTAATTACTATGGCKYNKCNCGTGAGGCCATTGAGCAAG
gmk-Q106(RWY) (Reverse)  TGCTGCGCGCCRWYCCAGTCGATATCGAGAAA
*Mixed base: R = A, G; Y = C, T; N = A, C, G, T; M = A, C; K = G, T; W = A, T
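The mixed-base codons in these primers define the diversity of each mutant library. The Python sketch below expands a degenerate triplet into its concrete codons and translates them with the standard genetic code. It shows, for example, that VVW (named in the tmk position 78 and 105 primers; the reverse primer carries its reverse complement, WBB) gives 18 codons encoding 12 amino acids and no stop codon. The degenerate codons are read from the primer names in Tables B-2 and B-4; the helper functions are illustrative and not part of the original primer design workflow.

```python
from itertools import product

# IUPAC mixed-base codes that appear in Tables B-2 and B-4.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT",
         "B": "CGT", "V": "ACG", "N": "ACGT"}

# Standard genetic code, codons ordered T, C, A, G at each position ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate(codon):
    """Translate one DNA codon with the standard genetic code."""
    i, j, k = (BASES.index(base) for base in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]

def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate (IUPAC) triplet."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate_codon))]

# Degenerate codons named in the mutagenesis primers (sense-strand orientation).
for code in ("VVW", "RYN", "MAN", "KYN", "KCN", "RWY"):
    codons = expand(code)
    residues = sorted({translate(c) for c in codons})
    print(f"{code}: {len(codons)} codons -> {''.join(residues)}")
```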


APPENDIX C E. coli CODON USAGE

Table C-1. The codon usage of E. coli K12.109

Codon  Amino acid  Ratio  Frequency per thousand (from ref.)
UUU  Phe (F)  0.57  19.7
UUC  Phe (F)  0.43  15.0
UUA  Leu (L)  0.15  15.2
UUG  Leu (L)  0.12  11.9
UCU  Ser (S)  0.11  5.7
UCC  Ser (S)  0.11  5.5
UCA  Ser (S)  0.15  7.8
UCG  Ser (S)  0.16  8.0
UAU  Tyr (Y)  0.54  16.8
UAC  Tyr (Y)  0.46  14.6
UAA  Stop  0.64  1.8
UAG  Stop  0.00  0.0
UGU  Cys (C)  0.42  5.9
UGC  Cys (C)  0.58  8.0
UGA  Stop  0.36  1.0
UGG  Trp (W)  1.00  10.7
CUU  Leu (L)  0.12  11.9
CUC  Leu (L)  0.10  10.5
CUA  Leu (L)  0.05  5.3
CUG  Leu (L)  0.46  46.9
CCU  Pro (P)  0.17  8.4
CCC  Pro (P)  0.13  6.4
CCA  Pro (P)  0.14  6.6
CCG  Pro (P)  0.56  26.7
CAU  His (H)  0.55  15.8
CAC  His (H)  0.45  13.1
CAA  Gln (Q)  0.30  12.1
CAG  Gln (Q)  0.70  27.7
CGU  Arg (R)  0.36  21.1
CGC  Arg (R)  0.44  26.0
CGA  Arg (R)  0.07  4.3
CGG  Arg (R)  0.07  4.1
AUU  Ile (I)  0.58  30.5
AUC  Ile (I)  0.35  18.2
AUA  Ile (I)  0.07  3.7
AUG  Met (M), start  1.00  24.8
ACU  Thr (T)  0.16  8.0
ACC  Thr (T)  0.47  22.8
ACA  Thr (T)  0.13  6.4
ACG  Thr (T)  0.24  11.5
AAU  Asn (N)  0.47  21.9
AAC  Asn (N)  0.53  24.4
AAA  Lys (K)  0.73  33.2
AAG  Lys (K)  0.27  12.1
AGU  Ser (S)  0.14  7.2
AGC  Ser (S)  0.33  16.6
AGA  Arg (R)  0.02  1.4
AGG  Arg (R)  0.03  1.6
GUU  Val (V)  0.25  16.8
GUC  Val (V)  0.18  11.7
GUA  Val (V)  0.17  11.5
GUG  Val (V)  0.40  26.4
GCU  Ala (A)  0.11  10.7
GCC  Ala (A)  0.31  31.6
GCA  Ala (A)  0.21  21.1
GCG  Ala (A)  0.38  38.5
GAU  Asp (D)  0.65  37.9
GAC  Asp (D)  0.35  20.5
GAA  Glu (E)  0.70  43.7
GAG  Glu (E)  0.30  18.4
GGU  Gly (G)  0.29  21.3
GGC  Gly (G)  0.46  33.4
GGA  Gly (G)  0.13  9.2
GGG  Gly (G)  0.12  8.6
Ratio represents the abundance of the codon relative to all the codons that code for the same amino acid.
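The Ratio column follows directly from the frequency column: each codon's frequency is divided by the summed frequency of all synonymous codons. The Python sketch below reproduces the ratios for a few complete synonymous families (Phe, Leu, Lys, Trp) taken from Table C-1; the function name is illustrative.

```python
from collections import defaultdict

# (codon, amino acid, frequency per thousand) for complete synonymous families in Table C-1.
CODON_ROWS = [
    ("UUU", "Phe", 19.7), ("UUC", "Phe", 15.0),
    ("UUA", "Leu", 15.2), ("UUG", "Leu", 11.9), ("CUU", "Leu", 11.9),
    ("CUC", "Leu", 10.5), ("CUA", "Leu", 5.3), ("CUG", "Leu", 46.9),
    ("AAA", "Lys", 33.2), ("AAG", "Lys", 12.1),
    ("UGG", "Trp", 10.7),
]

def synonymous_ratios(rows):
    """Frequency of each codon divided by the summed frequency of all codons for its amino acid."""
    totals = defaultdict(float)
    for _, amino_acid, freq in rows:
        totals[amino_acid] += freq
    return {codon: freq / totals[amino_acid] for codon, amino_acid, freq in rows}

for codon, ratio in synonymous_ratios(CODON_ROWS).items():
    print(codon, f"{ratio:.2f}")
```

Because the ratio is normalized within each family, all codons for an amino acid must be included; omitting one would inflate the ratios of the others.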


APPENDIX D CRYSTALLOGRAPHIC DATA

Table D-1. Bond lengths (Å) and angles (°) for Z-Sod.

Bond lengths (Å)
Na(1)-O(4)i  2.3788(12)
Na(1)-O(4)ii  2.3788(12)
Na(1)-O(4)iii  2.3859(12)
Na(1)-O(4)iv  2.3859(12)
Na(1)-O(2)v  2.4749(10)
Na(1)-O(2)  2.4749(10)
Na(1)-Na(1)vi  3.5981(2)
Na(1)-Na(1)vii  3.5981(2)
O(1)-C(1)  1.2496(16)
O(2)-N(2)  1.2542(14)
O(3)-N(2)  1.2625(15)
N(1)-C(5)  1.3559(17)
N(1)-C(1)  1.3700(17)
N(1)-H(1A)  0.76(3)
N(2)-C(4)  1.3920(17)
N(3)-C(5)  1.3212(17)
N(3)-H(3B)  0.841(18)
N(3)-H(3C)  0.895(18)
C(1)-C(2)  1.4445(17)
C(2)-C(3)  1.3454(19)
C(2)-H(2A)  0.9500
C(3)-C(4)  1.4224(18)
C(3)-H(3A)  0.9500
C(4)-C(5)  1.4331(17)
O(4)-Na(1)viii  2.3788(12)
O(4)-Na(1)ix  2.3859(12)
O(4)-H(4A)  0.83(2)
O(4)-H(4B)  0.85(2)

Angles (°)
O(4)i-Na(1)-O(4)ii  82.07(6)
O(4)i-Na(1)-O(4)iii  179.85(4)
O(4)ii-Na(1)-O(4)iii  98.08(3)
O(4)i-Na(1)-O(4)iv  98.08(3)
O(4)ii-Na(1)-O(4)iv  179.85(4)
O(4)iii-Na(1)-O(4)iv  81.77(5)
O(4)i-Na(1)-O(2)v  93.71(4)
O(4)ii-Na(1)-O(2)v  80.32(4)
O(4)iii-Na(1)-O(2)v  86.32(4)
O(4)iv-Na(1)-O(2)v  99.66(4)
O(4)i-Na(1)-O(2)  80.32(4)
O(4)ii-Na(1)-O(2)  93.71(4)
O(4)iii-Na(1)-O(2)  99.66(4)
O(4)iv-Na(1)-O(2)  86.32(4)
O(2)v-Na(1)-O(2)  172.14(7)
O(4)i-Na(1)-Na(1)vi  138.96(3)
O(4)ii-Na(1)-Na(1)vi  138.96(3)
O(4)iii-Na(1)-Na(1)vi  40.89(3)
O(4)iv-Na(1)-Na(1)vi  40.89(3)
O(2)v-Na(1)-Na(1)vi  93.93(3)
O(2)-Na(1)-Na(1)vi  93.93(3)
O(4)i-Na(1)-Na(1)vii  41.04(3)
O(4)ii-Na(1)-Na(1)vii  41.04(3)
O(4)iii-Na(1)-Na(1)vii  139.11(3)
O(4)iv-Na(1)-Na(1)vii  139.11(3)
O(2)v-Na(1)-Na(1)vii  86.07(3)
O(2)-Na(1)-Na(1)vii  86.07(3)
Na(1)vi-Na(1)-Na(1)vii  180.0
N(2)-O(2)-Na(1)  137.35(9)
C(5)-N(1)-C(1)  123.46(11)
C(5)-N(1)-H(1A)  124(2)
C(1)-N(1)-H(1A)  113(2)
O(2)-N(2)-O(3)  120.46(11)
O(2)-N(2)-C(4)  119.07(11)
O(3)-N(2)-C(4)  120.47(11)
C(5)-N(3)-H(3B)  116.3(12)
C(5)-N(3)-H(3C)  121.0(11)
H(3B)-N(3)-H(3C)  122.6(16)
O(1)-C(1)-N(1)  119.09(11)
O(1)-C(1)-C(2)  122.27(12)
N(1)-C(1)-C(2)  118.64(12)
C(3)-C(2)-C(1)  119.69(12)
C(3)-C(2)-H(2A)  120.2
C(1)-C(2)-H(2A)  120.2
C(2)-C(3)-C(4)  120.78(11)
C(2)-C(3)-H(3A)  119.6
C(4)-C(3)-H(3A)  119.6
N(2)-C(4)-C(3)  118.16(11)
N(2)-C(4)-C(5)  122.57(12)
C(3)-C(4)-C(5)  119.27(12)
N(3)-C(5)-N(1)  117.12(11)
N(3)-C(5)-C(4)  124.77(12)
N(1)-C(5)-C(4)  118.11(12)
Na(1)viii-O(4)-Na(1)ix  98.08(3)
Na(1)viii-O(4)-H(4A)  134.4(13)
Na(1)ix-O(4)-H(4A)  107.9(13)
Na(1)viii-O(4)-H(4B)  95.4(16)
Na(1)ix-O(4)-H(4B)  113.5(16)
H(4A)-O(4)-H(4B)  106.9(19)

Symmetry codes: (i) x−1/2, y−1/2, z; (ii) −x+1/2, y−1/2, −z+1/2; (iii) −x+1/2, y+1/2, −z+1/2; (iv) x−1/2, y+1/2, z; (v) −x, y, −z+1/2; (vi) x, y+1, z; (vii) x, y−1, z; (viii) x+1/2, y+1/2, z; (ix) x+1/2, y−1/2, z.

This appendix is reproduced with permission of the International Union of Crystallography [Matsuura, M. F., Kim, H.-J., Takahashi, D., Abboud, K. A., and Benner, S. A. Crystal Structures of Deprotonated Nucleobases from an Expanded DNA Alphabet, Acta Crystallogr. 2016, C72, 952-959. doi: 10.1107/S2053229616017071].


Table D-2. Hydrogen bonds for Z-Sod (Å and °).
D-H...A  d(D-H)  d(H...A)  d(D...A)  <(DHA)
N(1)-H(1A)...N(1)x  0.76(3)  2.17(3)  2.924(2)  174(3)
N(3)-H(3B)...O(1)x  0.841(18)  1.958(18)  2.7975(15)  176.5(17)
N(3)-H(3C)...O(3)  0.895(18)  2.036(18)  2.6570(15)  125.4(14)
O(4)-H(4A)...O(1)  0.83(2)  1.88(2)  2.7025(14)  169.2(19)
O(4)-H(4B)...O(2)viii  0.85(2)  2.64(2)  3.1313(16)  118.4(18)
O(4)-H(4B)...O(3)viii  0.85(2)  2.07(2)  2.9129(14)  172(2)
Symmetry codes: (viii) x+1/2, y+1/2, z; (x) −x+1/2, −y+1/2, −z.

Table D-3. Atomic coordinates and equivalent isotropic displacement parameters (Å²) for Z-Sod.
Atom  x  y  z  Uiso*/Ueq  Occ. (<1)
Na(1)  0.0000  0.8127 (2)  0.2500  0.0189 (2)
O(1)  0.33936 (5)  0.5410 (3)  0.09450 (5)  0.0234 (3)
O(2)  0.08687 (5)  0.7655 (3)  0.18809 (5)  0.0273 (3)
O(3)  0.04132 (5)  0.5154 (3)  0.08790 (5)  0.0253 (3)
N(1)  0.22868 (6)  0.4386 (3)  0.05437 (5)  0.0152 (3)
N(2)  0.09214 (6)  0.6320 (3)  0.13258 (5)  0.0192 (3)
N(3)  0.11780 (6)  0.3242 (4)  0.00782 (5)  0.0178 (3)
C(1)  0.28274 (7)  0.5701 (4)  0.10369 (6)  0.0161 (3)
C(2)  0.27211 (7)  0.7394 (4)  0.16454 (6)  0.0158 (3)
C(3)  0.21023 (7)  0.7547 (4)  0.17251 (6)  0.0159 (3)
C(4)  0.15483 (7)  0.6131 (4)  0.12102 (6)  0.0155 (3)
C(5)  0.16537 (7)  0.4567 (4)  0.05971 (6)  0.0145 (3)
O(4)  0.45554 (5)  0.8140 (3)  0.17415 (5)  0.0196 (2)
H(1A)  0.2380 (16)  0.350 (9)  0.0244 (16)  0.018*  0.5
H(3B)  0.1303 (9)  0.223 (5)  −0.0240 (9)  0.028 (5)*
H(3C)  0.0751 (9)  0.328 (6)  0.0086 (9)  0.032 (5)*
H(2A)  0.3087  0.8396  0.1988  0.019*
H(3A)  0.2034  0.8614  0.2130  0.019*
H(4A)  0.4176 (11)  0.754 (5)  0.1513 (10)  0.036 (5)*
H(4B)  0.4778 (11)  0.862 (7)  0.1459 (12)  0.057 (7)*
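Interatomic distances such as those in Table D-1 are computed from fractional coordinates, a symmetry operation, and the unit-cell metric. The Python sketch below shows that calculation: a symmetry code is applied as x' = Rx + t, and the distance is sqrt(Δxᵀ G Δx), where G is the metric tensor of the cell. The unit-cell parameters of Z-Sod are not reproduced in this appendix, so the distance example uses a hypothetical 10 Å cubic cell purely for illustration; only the O(4) coordinates and symmetry code (ii) are taken from Tables D-3 and D-1.

```python
import math

def metric_tensor(a, b, c, alpha, beta, gamma):
    """Metric tensor G (Å^2) from cell edges (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    return [[a * a, a * b * cg, a * c * cb],
            [a * b * cg, b * b, b * c * ca],
            [a * c * cb, b * c * ca, c * c]]

def apply_symmetry(frac, rotation, translation):
    """Apply a symmetry operation x' = R x + t to fractional coordinates."""
    return [sum(rotation[i][j] * frac[j] for j in range(3)) + translation[i] for i in range(3)]

def distance(frac1, frac2, g):
    """Interatomic distance (Å) between two fractional positions: sqrt(dx^T G dx)."""
    dx = [frac2[i] - frac1[i] for i in range(3)]
    return math.sqrt(sum(dx[i] * g[i][j] * dx[j] for i in range(3) for j in range(3)))

# Symmetry code (ii) of Table D-1, -x+1/2, y-1/2, -z+1/2, applied to O(4) from Table D-3.
O4 = [0.45554, 0.8140, 0.17415]
CODE_II = ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], [0.5, -0.5, 0.5])
print("O(4)ii fractional coordinates:", [round(v, 5) for v in apply_symmetry(O4, *CODE_II)])

# Toy distance with a hypothetical 10 Å cubic cell (not the Z-Sod cell).
G_TOY = metric_tensor(10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
print(f"toy distance: {distance([0.0, 0.0, 0.0], [0.1, 0.2, 0.2], G_TOY):.3f} Å")
```

Using the metric tensor rather than converting to Cartesian coordinates handles non-orthogonal (for example, monoclinic) cells directly; with the published cell parameters substituted for the toy cell, the same functions would give the distances in Table D-1.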


Table D-4. Anisotropic displacement parameters (Å²) for Z-Sod.
Atom  U11  U22  U33  U12  U13  U23
Na(1)  0.0211 (4)  0.0186 (4)  0.0173 (4)  0.000  0.0056 (3)  0.000
O(1)  0.0165 (5)  0.0351 (6)  0.0188 (5)  −0.0039 (4)  0.0052 (4)  −0.0082 (4)
O(2)  0.0272 (6)  0.0364 (7)  0.0231 (5)  −0.0015 (5)  0.0149 (4)  −0.0097 (4)
O(3)  0.0173 (5)  0.0368 (7)  0.0222 (5)  −0.0036 (5)  0.0061 (4)  −0.0048 (4)
N(1)  0.0164 (6)  0.0173 (6)  0.0120 (5)  −0.0008 (5)  0.0040 (4)  −0.0010 (4)
N(2)  0.0220 (7)  0.0196 (6)  0.0180 (5)  −0.0002 (5)  0.0088 (5)  −0.0010 (5)
N(3)  0.0153 (6)  0.0244 (7)  0.0144 (5)  −0.0023 (5)  0.0050 (4)  −0.0043 (5)
C(1)  0.0184 (7)  0.0158 (7)  0.0140 (6)  −0.0004 (5)  0.0043 (5)  0.0026 (5)
C(2)  0.0197 (7)  0.0150 (7)  0.0108 (5)  −0.0025 (5)  0.0011 (5)  −0.0005 (5)
C(3)  0.0243 (7)  0.0128 (7)  0.0112 (6)  0.0010 (5)  0.0058 (5)  0.0001 (4)
C(4)  0.0184 (7)  0.0137 (7)  0.0154 (6)  −0.0001 (5)  0.0066 (5)  0.0002 (5)
C(5)  0.0177 (7)  0.0122 (6)  0.0139 (6)  0.0009 (5)  0.0049 (5)  0.0024 (5)
O(4)  0.0163 (5)  0.0258 (6)  0.0158 (5)  −0.0031 (4)  0.0029 (4)  −0.0014 (4)


Table D-5. Bond lengths (Å) and angles (°) for Z-Am.

Bond lengths (Å)
O(1A)-C(1A)  1.265(2)
O(2A)-N(2A)  1.251(2)
O(3A)-N(2A)  1.249(2)
N(1A)-C(1A)  1.361(3)
N(1A)-C(5A)  1.365(2)
N(1A)-H(1A)  0.8440
N(2A)-C(4A)  1.405(2)
N(3A)-C(5A)  1.321(3)
N(3A)-H(3A")  0.8691
N(3A)-H(3A')  0.8972
C(2A)-C(3A)  1.352(3)
C(2A)-C(1A)  1.433(3)
C(2A)-H(2A)  0.9300
C(3A)-C(4A)  1.407(3)
C(3A)-H(3A)  0.9300
C(4A)-C(5A)  1.427(3)
O(3B)-N(2B)  1.258(2)
O(2B)-N(2B)  1.261(2)
O(1B)-C(1B)  1.250(2)
N(2B)-C(4B)  1.386(2)
N(1B)-C(5B)  1.354(2)
N(1B)-C(1B)  1.373(3)
N(1B)-H(1B)  0.8541
N(3B)-C(5B)  1.316(3)
N(3B)-H(3B')  0.8762
N(3B)-H(3B")  0.9588
C(2B)-C(3B)  1.347(3)
C(2B)-C(1B)  1.447(3)
C(2B)-H(2B)  0.9300
C(4B)-C(3B)  1.415(3)
C(4B)-C(5B)  1.438(3)
C(3B)-H(3B)  0.9300
O(1)-H(1C)  0.83(2)
O(1)-H(1D)  0.84(2)
O(1')-H(1C)  0.83(2)
O(1')-H(1D)  0.83(2)
N(2)-H(1)  0.97(3)
N(2)-H(2)  0.97(3)
N(2)-H(3)  0.97(3)
N(2)-H(4)  0.97(3)
N(2')-H(1')  0.93(3)
N(2')-H(2')  0.93(3)
N(2')-H(3')  0.93(3)
N(2')-H(4')  0.93(3)

Angles (°)
C(1A)-N(1A)-C(5A)  123.07(17)
C(1A)-N(1A)-H(1A)  118.4
C(5A)-N(1A)-H(1A)  117.3
O(3A)-N(2A)-O(2A)  121.23(16)
O(3A)-N(2A)-C(4A)  120.34(16)
O(2A)-N(2A)-C(4A)  118.43(18)
C(5A)-N(3A)-H(3A")  121.2
C(5A)-N(3A)-H(3A')  118.8
H(3A")-N(3A)-H(3A')  120.0
C(3A)-C(2A)-C(1A)  119.13(19)
C(3A)-C(2A)-H(2A)  120.4
C(1A)-C(2A)-H(2A)  120.4
O(1A)-C(1A)-N(1A)  118.82(18)
O(1A)-C(1A)-C(2A)  121.88(19)
N(1A)-C(1A)-C(2A)  119.30(17)
C(2A)-C(3A)-C(4A)  121.17(19)
C(2A)-C(3A)-H(3A)  119.4
C(4A)-C(3A)-H(3A)  119.4
N(2A)-C(4A)-C(3A)  118.26(17)
N(2A)-C(4A)-C(5A)  122.30(18)
C(3A)-C(4A)-C(5A)  119.44(17)
N(3A)-C(5A)-N(1A)  116.82(17)
N(3A)-C(5A)-C(4A)  125.32(18)
N(1A)-C(5A)-C(4A)  117.86(17)
O(3B)-N(2B)-O(2B)  119.65(16)
O(3B)-N(2B)-C(4B)  121.41(16)
O(2B)-N(2B)-C(4B)  118.96(17)
C(5B)-N(1B)-C(1B)  123.38(17)
C(5B)-N(1B)-H(1B)  123.7
C(1B)-N(1B)-H(1B)  112.9
C(5B)-N(3B)-H(3B')  116.3
C(5B)-N(3B)-H(3B")  124.0
H(3B')-N(3B)-H(3B")  119.0
C(3B)-C(2B)-C(1B)  119.58(19)
C(3B)-C(2B)-H(2B)  120.2
C(1B)-C(2B)-H(2B)  120.2
O(1B)-C(1B)-N(1B)  119.01(17)
O(1B)-C(1B)-C(2B)  122.48(19)
N(1B)-C(1B)-C(2B)  118.51(18)
N(2B)-C(4B)-C(3B)  118.99(17)
N(2B)-C(4B)-C(5B)  121.96(18)
C(3B)-C(4B)-C(5B)  119.02(17)
C(2B)-C(3B)-C(4B)  121.18(18)
C(2B)-C(3B)-H(3B)  119.4
C(4B)-C(3B)-H(3B)  119.4
N(3B)-C(5B)-N(1B)  117.04(17)
N(3B)-C(5B)-C(4B)  124.65(18)
N(1B)-C(5B)-C(4B)  118.31(17)
H(1C)-O(1)-H(1D)  107(3)
H(1C)-O(1')-H(1D)  107(3)
H(1)-N(2)-H(2)  112(2)
H(1)-N(2)-H(3)  113(6)
H(2)-N(2)-H(3)  112(2)
H(1)-N(2)-H(4)  112(2)
H(2)-N(2)-H(4)  97(5)
H(3)-N(2)-H(4)  111(2)
H(1')-N(2')-H(2')  102.9(19)
H(1')-N(2')-H(3')  119(6)
H(2')-N(2')-H(3')  102.9(19)
H(1')-N(2')-H(4')  102.8(19)
H(2')-N(2')-H(4')  128(6)
H(3')-N(2')-H(4')  102.6(19)


Table D-6. Hydrogen bonds for Z-Am (Å and °).
D-H...A  d(D-H)  d(H...A)  d(D...A)  <(DHA)
N1A-H1A...N1B  0.84  2.09  2.909(2)  163
N3A-H3A"...O1B  0.87  1.94  2.805(2)  171
N3A-H3A'...O3A  0.90  2.02  2.663(2)  128
N3A-H3A'...O3Ai  0.90  2.29  3.020(2)  138
N1B-H1B...N1A  0.85  2.06  2.909(2)  172
N3B-H3B'...O3B  0.88  2.01  2.652(2)  130
N3B-H3B'...O3Bii  0.88  2.49  3.087(2)  126
N3B-H3B"...O1A  0.96  1.85  2.805(2)  172
O1-H1C...O2Aiii  0.83(2)  2.18(2)  2.743(4)  125(2)
O1'-H1C...O2Aiii  0.83(2)  2.18(2)  2.951(4)  154(3)
O1-H1D...O1Biv  0.84(2)  2.10(2)  2.869(4)  153(2)
O1'-H1D...O1Biv  0.83(2)  2.10(2)  2.636(4)  122(2)
N2-H2...O1Avi  0.97(3)  1.84(4)  2.741(5)  154(5)
N2'-H5'...O1'  0.93(3)  1.61(4)  2.465(6)  152(5)
N2-H1...O3Bii  0.97(3)  2.33(4)  3.197(4)  148(4)
N2-H1...O2Bii  0.97(3)  2.34(5)  2.968(5)  122(4)
N2-H1...N2Bii  0.97(3)  2.68(4)  3.502(4)  144(4)
N2'-H1'...O2Bv  0.93(3)  2.32(5)  2.906(5)  121(4)
N2-H4...O1A  0.97(3)  1.71(4)  2.674(5)  175(6)
N2'-H2'...O1A  0.93(3)  2.31(4)  3.099(6)  138(4)
N2-H3...O1  0.97(3)  2.04(4)  3.100(6)  150(3)
N2'-H3'...O1'vi  0.93(3)  1.90(3)  2.800(6)  161(4)
Symmetry codes: (i) −x−2, y+1/2, −z−1; (ii) −x−1, y−1/2, −z; (iii) −x−1, y+1/2, −z−1; (iv) x+1, y−1, z; (v) −x−1, y−3/2, −z; (vi) x, y−1, z.


Table D-7. Atomic coordinates and equivalent isotropic displacement parameters (Å²) for Z-Am.
Atom  x  y  z  Uiso*/Ueq  Occ. (<1)
O(1A)  −0.51782 (14)  0.5944 (6)  −0.25542 (8)  0.0287 (4)
O(2A)  −0.74782 (14)  0.4261 (6)  −0.58249 (8)  0.0277 (4)
O(3A)  −0.89863 (13)  0.7106 (5)  −0.53151 (8)  0.0229 (4)
N(1A)  −0.69744 (16)  0.7205 (5)  −0.32140 (9)  0.0168 (4)
N(2A)  −0.79419 (16)  0.5702 (6)  −0.52580 (9)  0.0184 (4)
N(3A)  −0.88118 (15)  0.8778 (5)  −0.38244 (9)  0.0164 (4)
C(2A)  −0.53538 (19)  0.4210 (6)  −0.38606 (11)  0.0182 (4)
C(1A)  −0.58169 (18)  0.5796 (7)  −0.31828 (11)  0.0183 (4)
C(3A)  −0.60718 (18)  0.4192 (6)  −0.45191 (11)  0.0164 (4)
C(4A)  −0.72559 (18)  0.5732 (6)  −0.45506 (11)  0.0149 (4)
C(5A)  −0.77185 (18)  0.7266 (6)  −0.38711 (11)  0.0139 (4)
O(3B)  −0.60860 (13)  0.8425 (5)  0.03922 (8)  0.0200 (4)
O(2B)  −0.75985 (14)  1.1212 (5)  0.09095 (8)  0.0217 (4)
O(1B)  −0.93970 (13)  1.2439 (5)  −0.24812 (8)  0.0215 (4)
N(2B)  −0.70885 (16)  1.0112 (5)  0.03188 (9)  0.0158 (4)
N(1B)  −0.77706 (16)  1.0180 (5)  −0.17746 (9)  0.0158 (4)
N(3B)  −0.60902 (15)  0.7782 (5)  −0.11268 (10)  0.0172 (4)
C(2B)  −0.94121 (19)  1.3154 (6)  −0.11296 (11)  0.0165 (4)
C(1B)  −0.88818 (19)  1.1941 (6)  −0.18301 (11)  0.0169 (4)
C(4B)  −0.76586 (18)  1.0752 (6)  −0.03994 (10)  0.0140 (4)
C(3B)  −0.88016 (19)  1.2581 (6)  −0.04467 (11)  0.0153 (4)
C(5B)  −0.71398 (18)  0.9528 (6)  −0.10969 (11)  0.0136 (4)
O(1)  −0.1624 (4)  0.6432 (12)  −0.2800 (2)  0.0205 (9)*  0.5
O(1')  −0.1591 (4)  0.5247 (12)  −0.2800 (2)  0.0196 (9)*  0.5
N(2)  −0.3466 (5)  0.1076 (12)  −0.2124 (2)  0.0189 (8)*  0.5
N(2')  −0.2977 (5)  0.0886 (13)  −0.2219 (2)  0.0208 (9)*  0.5
H(1A)  −0.7302  0.7669  −0.2796  0.020*  0.5
H(3A")  −0.9038  0.9709  −0.3396  0.020*
H(3A')  −0.9314  0.8900  −0.4250  0.020*
H(2A)  −0.4569  0.3207  −0.3848  0.022*
H(3A)  −0.5778  0.3144  −0.4959  0.020*
H(1B)  −0.7525  0.9521  −0.2211  0.019*  0.5
H(3B')  −0.5627  0.7708  −0.0699  0.021*
H(3B")  −0.5760  0.6939  −0.1594  0.021*
H(2B)  −1.0170  1.4321  −0.1151  0.020*
H(3B)  −0.9137  1.3403  0.0002  0.018*
H(1C)  −0.174 (4)  0.584 (5)  −0.3259 (15)  0.099 (11)*
H(1D)  −0.089 (3)  0.597 (5)  −0.268 (2)  0.099 (11)*
H1  −0.337 (5)  0.102 (14)  −0.157 (2)  0.053 (10)*  0.5
H2  −0.386 (5)  −0.109 (13)  −0.233 (3)  0.053 (10)*  0.5
H3  −0.270 (4)  0.155 (13)  −0.236 (4)  0.053 (10)*  0.5
H4  −0.410 (4)  0.275 (15)  −0.230 (3)  0.053 (10)*  0.5
H1'  −0.292 (4)  0.111 (17)  −0.1686 (18)  0.039 (9)*  0.5
H2'  −0.382 (3)  0.095 (18)  −0.234 (3)  0.039 (9)*  0.5
H3'  −0.268 (4)  0.274 (12)  −0.252 (3)  0.039 (9)*  0.5
H4'  −0.243 (4)  −0.098 (12)  −0.231 (2)  0.039 (9)*  0.5

Table D-8. Anisotropic displacement parameters (Å²) for Z-Am.

Atom    U11          U22          U33          U12           U13           U23
O(1A)   0.0156 (7)   0.0539 (12)  0.0166 (7)   0.0042 (8)    0.0002 (6)    0.0035 (8)
O(2A)   0.0267 (8)   0.0404 (11)  0.0163 (7)   −0.0017 (8)   0.0047 (6)    −0.0113 (8)
O(3A)   0.0166 (8)   0.0334 (9)   0.0184 (7)   0.0004 (7)    −0.0021 (6)   −0.0011 (7)
N(1A)   0.0152 (8)   0.0215 (9)   0.0137 (8)   −0.0011 (7)   0.0023 (6)    0.0039 (7)
N(2A)   0.0184 (9)   0.0213 (10)  0.0158 (8)   −0.0047 (8)   0.0033 (7)    −0.0028 (8)
N(3A)   0.0146 (8)   0.0225 (9)   0.0123 (7)   0.0024 (8)    0.0014 (6)    −0.0017 (7)
C(2A)   0.0142 (9)   0.018 (1)    0.0228 (10)  0.0016 (9)    0.0055 (8)    0.0045 (9)
C(1A)   0.015 (1)    0.0240 (11)  0.0158 (9)   −0.0043 (9)   0.0011 (8)    0.0045 (9)
C(3A)   0.0194 (10)  0.0131 (10)  0.0173 (9)   −0.0011 (9)   0.0071 (8)    −0.0011 (8)
C(4A)   0.0161 (9)   0.0152 (9)   0.0135 (8)   −0.0037 (8)   0.0024 (7)    0.0006 (8)
C(5A)   0.0144 (9)   0.0118 (9)   0.0156 (9)   −0.0024 (8)   0.0024 (7)    0.0013 (8)
O(3B)   0.0177 (7)   0.0248 (9)   0.0173 (7)   0.0007 (7)    −0.0030 (6)   0.0005 (6)
O(2B)   0.0250 (8)   0.0297 (9)   0.0108 (6)   −0.0030 (7)   0.0043 (5)    −0.0020 (6)
O(1B)   0.0197 (8)   0.0287 (9)   0.0160 (7)   0.0025 (7)    −0.0008 (6)   0.0010 (7)
N(2B)   0.0176 (8)   0.0159 (10)  0.0139 (8)   −0.0042 (7)   0.0014 (6)    −0.0005 (7)
N(1B)   0.0163 (8)   0.0191 (9)   0.0119 (7)   −0.0019 (7)   0.0010 (6)    0.0005 (7)
N(3B)   0.0150 (8)   0.0211 (10)  0.0154 (8)   0.0015 (7)    0.0001 (6)    −0.0018 (7)
C(2B)   0.0149 (10)  0.0144 (10)  0.0204 (10)  0.0006 (8)    0.0039 (8)    −0.0001 (8)
C(1B)   0.0173 (10)  0.0175 (11)  0.0159 (9)   −0.0052 (9)   0.0006 (8)    0.0003 (8)
C(4B)   0.0163 (10)  0.0133 (9)   0.0123 (8)   −0.0025 (8)   0.0009 (7)    0.0007 (8)
C(3B)   0.0177 (10)  0.0139 (10)  0.0148 (9)   −0.0029 (9)   0.0054 (7)    −0.0015 (8)
C(5B)   0.0134 (9)   0.0117 (10)  0.0156 (9)   −0.0039 (8)   0.0004 (7)    0.0005 (8)
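
Table D-7 gives a single Ueq for each non-hydrogen atom, while Table D-8 gives the six Uij components from which it is derived through Ueq = (1/3) Σi Σj Uij ai* aj* (ai·aj). The symmetry codes of Table D-6 (for example (ii) −x−1, y−1/2, −z) look like 2₁ screw operations along b, which suggests, although this section does not state it, a monoclinic cell with b as the unique axis. In that setting the U12 and U23 terms drop out and the sum reduces to Ueq = (1/3)[U22 + (U11 + U33 + 2·U13·cos β)/sin²β]. The Python sketch below assumes that setting. The cell angle β is not listed here, so `BETA_DEG` is a placeholder; at β = 90° the expression collapses to the plain mean of U11, U22 and U33, which already reproduces the Ueq of O(1A).

```python
import math


def u_eq_monoclinic(u11, u22, u33, u13, beta_deg):
    """Equivalent isotropic displacement parameter (A^2) from the Uij of a
    monoclinic cell with b unique (alpha = gamma = 90 deg).

    Implements Ueq = (1/3) * sum_ij Uij a_i* a_j* (a_i . a_j); U12 and U23
    do not contribute because a.b = b.c = 0 in this cell.
    """
    beta = math.radians(beta_deg)
    sin2 = math.sin(beta) ** 2
    return (u22 + (u11 + u33 + 2.0 * u13 * math.cos(beta)) / sin2) / 3.0


# O(1A) from Table D-8, standard uncertainties dropped.
# BETA_DEG is a placeholder (assumption): the real cell angle of Z-Am is not
# given in this section; 90 deg reduces the formula to the mean of U11, U22, U33.
BETA_DEG = 90.0
u = u_eq_monoclinic(0.0156, 0.0539, 0.0166, 0.0002, BETA_DEG)
print(f"Ueq(O(1A)) = {u:.4f} A^2 (Table D-7 lists 0.0287)")
```

With the actual β substituted, the U13 term matters most for atoms with a sizeable U13, such as C(3A).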

LIST OF REFERENCES

(1) Stephanopoulos, G. Metab. Eng. 1999, 1, 1-11.

(2) Robertson, M. P.; Scott, W. G. Nature 2007, 448, 757-758.

(3) Kirnos, M.; Khudyakov, I. Y.; Alexandrushkina, N.; Vanyushin, B. Nature 1977, 270, 369-370.

(4) Khudyakov, I. Y.; Kirnos, M.; Alexandrushkina, N.; Vanyushin, B. Virology 1978, 88, 8-18.

(5) Merritt, K. K.; Bradley, K. M.; Hutter, D.; Matsuura, M. F.; Rowold, D. J.; Benner, S. A. Beilstein J. Org. Chem. 2014, 10, 2348-2360.

(6) Yang, Z.; Hutter, D.; Sheng, P.; Sismour, A. M.; Benner, S. A. Nucleic Acids Res. 2006, 34, 6095-6101.

(7) Yang, Z.; Chen, F.; Alvarado, J. B.; Benner, S. A. J. Am. Chem. Soc. 2011, 133, 15105-15112.

(8) Laos, R.; Shaw, R.; Leal, N. A.; Gaucher, E.; Benner, S. Biochemistry 2013, 52, 5288-5294.

(9) Leal, N. A.; Kim, H.-J.; Hoshika, S.; Kim, M.-J.; Carrigan, M. A.; Benner, S. A. ACS Synth. Biol. 2014, 4, 407-413.

(10) Matsuura, M. F.; Shaw, R. W.; Moses, J. D.; Kim, H.-J.; Kim, M.-J.; Kim, M.-S.; Hoshika, S.; Karalkar, N.; Benner, S. A. ACS Synth. Biol. 2016, 5, 234-240.

(11) Sefah, K.; Yang, Z.; Bradley, K. M.; Hoshika, S.; Jiménez, E.; Zhang, L.; Zhu, G.; Shanker, S.; Yu, F.; Turek, D. Proc. Natl. Acad. Sci. U.S.A. 2014, 111, 1449-1454.

(12) Zhang, L.; Yang, Z.; Sefah, K.; Bradley, K. M.; Hoshika, S.; Kim, M.-J.; Kim, H.-J.; Zhu, G.; Jiménez, E.; Cansiz, S. J. Am. Chem. Soc. 2015, 137, 6734-6737.

(13) Zhang, L.; Yang, Z.; Le Trinh, T.; Teng, I.; Wang, S.; Bradley, K. M.; Hoshika, S.; Wu, Q.; Cansiz, S.; Rowold, D. J.; McLendon, C.; Kim, M.-S.; Wu, Y.; Cui, C.; Liu, Y.; Hou, W.; Stewart, K.; Wan, S.; Liu, C.; Benner, S. A.; Tan, W. Angew. Chem. Int. Ed. 2016, 55, 12372-12375.

(14) Biondi, E.; Lane, J. D.; Das, D.; Dasgupta, S.; Piccirilli, J. A.; Hoshika, S.; Bradley, K. M.; Krantz, B. A.; Benner, S. A. Nucleic Acids Res. 2016, 44, 9565-9577.

(15) Matsuura, M. F.; Winiger, C. B.; Shaw, R. W.; Kim, M.-J.; Kim, M.-S.; Daugherty, A. B.; Chen, F.; Moussatche, P.; Moses, J. D.; Lutz, S.; Benner, S. A. ACS Synth. Biol., in press. [Online early access]. DOI: 10.1021/acssynbio.6b00228. Published Online: Dec 9, 2016. http://pubs.acs.org/doi/full/10.1021/acssynbio.6b00228 (accessed December 25, 2016).

(16) Blackburn, G. M. Nucleic Acids in Chemistry and Biology. Royal Society of Chemistry: London, 2006.

(17) Henderson, J. F.; Paterson, A. R. P. Nucleotide metabolism: an introduction. Academic Press: Salt Lake City, 2014.

(18) Stuer-Lauridsen, B.; Nygaard, P. J. Bacteriol. 1998, 180, 457-463.

(19) Koonin, E. V.; Senkevich, T. G. Virus Genes 1993, 7, 289-295.

(20) Valentin-Hansen, P. Methods Enzymol. 1978, 51, 308-314.

(21) Okazaki, R.; Kornberg, A. J. Biol. Chem. 1964, 239, 269-284.

(22) Mori, H.; Iida, A.; Teshiba, S.; Fujio, T. J. Bacteriol. 1995, 177, 4921-4926.

(23) Gentry, D.; Bengra, C.; Ikehara, K.; Cashel, M. J. Biol. Chem. 1993, 268, 14316-14321.

(24) Brune, M.; Schumann, R.; Wittinghofer, F. Nucleic Acids Res. 1985, 13, 7139-7151.

(25) Fricke, J.; Neuhard, J.; Kelln, R.; Pedersen, S. J. Bacteriol. 1995, 177, 517-523.

(26) Reynes, J.-P.; Tiraby, M.; Baron, M.; Drocourt, D.; Tiraby, G. J. Bacteriol. 1996, 178, 2804-2812.

(27) Serina, L.; Blondin, C.; Krin, E.; Sismeiro, O.; Danchin, A.; Sakamoto, H.; Gilles, A.-M.; Barzu, O. Biochemistry 1995, 34, 5066-5074.

(28) Almaula, N.; Lu, Q.; Delgado, J.; Belkin, S.; Inouye, M. J. Bacteriol. 1995, 177, 2524-2529.

(29) Leipe, D. D.; Koonin, E. V.; Aravind, L. J. Mol. Biol. 2003, 333, 781-815.

(30) Clausen, A. R.; Girandon, L.; Knecht, W.; Survery, S.; Andreasson, E.; Munch-Petersen, B.; Piškur, J. Nucleic Acids Symp. Ser. 2008, 52, 489-490.

(31) Knecht, W.; Petersen, G. E.; Sandrini, M. P. B.; Søndergaard, L.; Munch‐Petersen, B.; Piškur, J. Nucleic Acids Res. 2003, 31, 1665-1672.

(32) Munch-Petersen, B.; Piskur, J.; Søndergaard, L. J. Biol. Chem. 1998, 273, 3926-3931.

(33) Knecht, W.; Sandrini, M. P.; Johansson, K.; Eklund, H.; Munch‐Petersen, B.; Piškur, J. EMBO J. 2002, 21, 1873-1880.

(34) Hansen, T.; Arnfors, L.; Ladenstein, R.; Schönheit, P. Extremophiles 2007, 11, 105-114.

(35) Ota, H.; Sakasegawa, S.-i.; Yasuda, Y.; Imamura, S.; Tamura, T. FEBS J. 2008, 275, 5865-5872.

(36) Elkin, S. R.; Kumar, A.; Price, C. W.; Columbus, L. Proteins 2013, 81, 568-582.

(37) Chen, M. S.; Prusoff, W. H. J. Biol. Chem. 1978, 253, 1325-1327.

(38) Solaroli, N.; Bjerke, M.; Amiri, M. H.; Johansson, M.; Karlsson, A. Eur. J. Biochem. 2003, 270, 2879-2884.

(39) Liu, L.; Li, Y.; Liotta, D.; Lutz, S. Nucleic Acids Res. 2009, 37, 4472-4481.

(40) Kurnasov, O. V.; Polanuyer, B. M.; Ananta, S.; Sloutsky, R.; Tam, A.; Gerdes, S. Y.; Osterman, A. L. J. Bacteriol. 2002, 184, 6906-6917.

(41) Zhai, R. G.; Rizzi, M.; Garavaglia, S. Cell. Mol. Life Sci. 2009, 66, 2805-2818.

(42) Gangaiah, D.; Liu, Z.; Arcos, J.; Kassem, I. I.; Sanad, Y.; Torrelles, J. B.; Rajashekara, G. PLoS One 2010, 5, e12142.

(43) Sureka, K.; Sanyal, S.; Basu, J.; Kundu, M. Mol. Microbiol. 2009, 74, 1187-1197.

(44) Motomura, K.; Hirota, R.; Okada, M.; Ikeda, T.; Ishida, T.; Kuroda, A. Appl. Environ. Microbiol. 2014, 80, 2602-2608.

(45) Liu, L.; Murphy, P.; Baker, D.; Lutz, S. Chem. Commun. 2010, 46, 8803-8805.

(46) Mikkelsen, N. E.; Munch-Petersen, B.; Eklund, H. FEBS J. 2008, 275, 2151-2160.

(47) Ivey, R. A.; Zhang, Y.-M.; Virga, K. G.; Hevener, K.; Lee, R. E.; Rock, C. O.; Jackowski, S.; Park, H.-W. J. Biol. Chem. 2004, 279, 35622-35629.

(48) Miziorko, H. M. Adv. Enzymol. Relat. Areas Mol. Biol. 2000, 74, 95-127.

(49) Bork, P.; Sander, C.; Valencia, A. Protein Sci. 1993, 2, 31-40.

(50) Park, J.; Gupta, R. Cell. Mol. Life Sci. 2008, 65, 2875-2896.

(51) Allen, J.; Lasser, G.; Goldman, D.; Booth, J.; Mathews, C. J. Biol. Chem. 1983, 258, 5746-5753.

(52) Ma, J. J.; Huang, S.; Jong, A. Y. J. Biol. Chem. 1990, 265, 19122-19127.

(53) Jong, A.; Yeh, Y.; Ma, J. Arch. Biochem. Biophys. 1993, 304, 197-204.

(54) Mikoulinskaia, G. V.; Gubanov, S. I.; Zimin, A. A.; Kolesnikov, I. V.; Feofanov, S. A.; Miroshnikov, A. I. Protein Expr. Purif. 2003, 27, 195-201.

(55) Mikoulinskaia, G.; Zimin, A.; Feofanov, S.; Miroshnikov, A. A new broad specificity deoxyribonucleoside monophosphate kinase encoded by gene 52 of phage ϕC31, In Doklady Biochemistry and Biophysics, Springer: New York, 2007; pp 15-17.

(56) Briozzo, P.; Evrin, C.; Meyer, P.; Assairi, L.; Joly, N.; Bârzu, O.; Gilles, A.-M. J. Biol. Chem. 2005, 280, 25533-25540.

(57) Segura-Peña, D.; Lutz, S.; Monnerjahn, C.; Konrad, M.; Lavie, A. J. Mol. Biol. 2007, 36, 129-141.

(58) Grochowski, L. L.; Censky, K.; Xu, H.; White, R. H. Arch. Microbiol. 2012, 194, 141-145.

(59) Saint Girons, I.; Gilles, A.; Margarita, D.; Michelson, S.; Monnot, M.; Fermandjian, S.; Danchin, A.; Barzu, O. J. Biol. Chem. 1987, 262, 622-629.

(60) Tan, Y. W.; Hanson, J. A.; Yang, H. J. Biol. Chem. 2009, 284, 3306-3313.

(61) Berry, M. B.; Bae, E.; Bilderback, T. R.; Glaser, M.; Phillips, G. N., Jr. Proteins 2006, 62, 555-556.

(62) Lu, Q.; Inouye, M. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 5720-5725.

(63) Willemoës, M.; Kilstrup, M. Arch. Biochem. Biophys. 2005, 444, 195-199.

(64) Gigliobianco, T.; Lakaye, B.; Makarchikov, A. F.; Wins, P.; Bettendorff, L. BMC Microbiol. 2008, 8, 16.

(65) Reinstein, J.; Schlichting, I.; Wittinghofer, A. Biochemistry 1990, 29, 7451-7459.

(66) Yan, H.; Shi, Z.; Tsai, M. D. Biochemistry 1990, 29, 6385-6392.

(67) Müller, C. W.; Schulz, G. E. J. Mol. Biol. 1992, 224, 159-177.

(68) Woods, D. F.; Bryant, P. J. Cell 1991, 66, 451-464.

(69) Hove-Jensen, B.; Rosenkrantz, T. J.; Haldimann, A.; Wanner, B. L. J. Bacteriol. 2003, 185, 2793-2801.

(70) Mikoulinskaia, G. V.; Zimin, A. A.; Feofanov, S. A.; Miroshnikov, A. I. Protein Expr. Purif. 2004, 33, 166-175.

(71) Chang, A.; Schomburg, I.; Placzek, S.; Jeske, L.; Ulbrich, M.; Xiao, M.; Sensen, C. W.; Schomburg, D. Nucleic Acids Res. 2014, gku1068.

(72) Choi, G.; Yi, H.; Lee, J.; Kwon, Y.-K.; Soh, M. S.; Shin, B.; Luka, Z.; Hahn, T.-R.; Song, P.-S. Nature 1999, 401, 610-613.

(73) Krishnan, K.; Rikhy, R.; Rao, S.; Shivalkar, M.; Mosko, M.; Narayanan, R.; Etter, P.; Estes, P. S.; Ramaswami, M. Neuron 2001, 30, 197-210.

(74) Chakrabarty, A. M. Mol. Microbiol. 1998, 28, 875-882.

(75) Ishikawa, N.; Shimada, N.; Takagi, Y.; Ishijima, Y.; Fukuda, M.; Kimura, N. J. Bioenerg. Biomembr. 2003, 35, 7-18.

(76) Moynié, L.; Giraud, M. F.; Georgescauld, F.; Lascu, I.; Dautant, A. Proteins 2007, 67, 755-765.

(77) Levit, M. N.; Abramczyk, B. M.; Stock, J. B.; Postel, E. H. J. Biol. Chem. 2002, 277, 5163-5167.

(78) Schneider, B.; Babolat, M.; Xu, Y. W.; Janin, J.; Véron, M.; Deville‐Bonne, D. Eur. J. Biochem. 2001, 268, 1964-1971.

(79) Jurczyk, S. C.; Kodra, J. T.; Rozzell, J. D.; Benner, S. A.; Battersby, T. R. Helv. Chim. Acta 1998, 81, 793-811.

(80) Sanger, F.; Nicklen, S.; Coulson, A. R. Proc. Natl. Acad. Sci. U.S.A. 1977, 74, 5463-5467.

(81) Laemmli, U. K. Nature 1970, 227, 680-685.

(82) Fairbanks, G.; Steck, T. L.; Wallach, D. Biochemistry 1971, 10, 2606-2617.

(83) Bradford, M. M. Anal. Biochem. 1976, 72, 248-254.

(84) Noble, J. E.; Bailey, M. J. Methods Enzymol. 2009, 463, 73-95.

(85) Munch-Petersen, B.; Knecht, W.; Lenz, C.; Søndergaard, L.; Piškur, J. J. Biol. Chem. 2000, 275, 6673-6679.

(86) Wang, L.; Munch-Petersen, B.; Sjöberg, A. H.; Hellman, U.; Bergman, T.; Jörnvall, H.; Eriksson, S. FEBS Lett. 1999, 443, 170-174.

(87) Sandrini, M. P.; Piškur, J. Trends Biochem. Sci. 2005, 30, 225-228.

(88) Bochner, B. R.; Ames, B. N. Anal. Biochem. 1982a, 122, 100-107.

(89) Bochner, B. R.; Ames, B. N. J. Biol. Chem. 1982b, 257, 9759-9769.

(90) Chopra, P.; Singh, A.; Koul, A.; Ramachandran, S.; Drlica, K.; Tyagi, A. K.; Singh, Y. Eur. J. Biochem. 2003, 270, 625-634.

(91) Wilson, P. M.; LaBonte, M. J.; Russell, J.; Louie, S.; Ghobrial, A. A.; Ladner, R. D. Nucleic Acids Res. 2011, gkr350.

(92) Shaw, R. W.; Moses, J. D.; Moussatche, P.; Hoshika, S.; Benner, S. A. unpublished work.

(93) Hutter, D.; Benner, S. A. J. Org. Chem. 2003, 68, 9839-9842.

(94) Lutz, S.; Lichter, J.; Liu, L. J. Am. Chem. Soc. 2007, 129, 8714-8715.

(95) Malyshev, D. A.; Dhami, K.; Lavergne, T.; Chen, T.; Dai, N.; Foster, J. M.; Corrêa, I. R.; Romesberg, F. E. Nature 2014, 509, 385-388.

(96) Guckian, K. M.; Morales, J. C.; Kool, E. T. J. Org. Chem. 1998, 63, 9652-9656.

(97) Kimoto, M.; Yamashige, R.; Matsunaga, K.-i.; Yokoyama, S.; Hirao, I. Nat. Biotechnol. 2013, 31, 453-457.

(98) Blondin, C.; Serina, L.; Wiesmuller, L.; Gilles, A.-M.; Barzu, O. Anal. Biochem. 1994, 220, 219-221.

(99) Moyer, J. D.; Henderson, J. F. Anal. Biochem. 1983, 131, 187-189.

(100) Lu, Q.; Park, H.; Egger, L. A.; Inouye, M. J. Biol. Chem. 1996, 271, 32886-32893.

(101) Schultz, C. P.; Ylisastigui-Pons, L.; Serina, L.; Sakamoto, H.; Mantsch, H. H.; Neuhard, J.; Bârzu, O.; Gilles, A.-M. Arch. Biochem. Biophys. 1997, 340, 144-153.

(102) Das, R.; Baker, D. Annu. Rev. Biochem. 2008, 77, 363-382.

(103) Gibson, D. G.; Young, L.; Chuang, R.-Y.; Venter, J. C.; Hutchison, C. A.; Smith, H. O. Nat. Methods 2009, 6, 343-345.

(104) Lopez, P.; Casane, D.; Philippe, H. Mol. Biol. Evol. 2002, 19, 1-7.

(105) Kimura, M. Nature 1968, 217, 624-626.

(106) Wu, Y.; Fa, M.; Tae, E. L.; Schultz, P. G.; Romesberg, F. E. J. Am. Chem. Soc. 2002, 124, 14626-14630.

(107) Mikkelsen, N. E.; Johansson, K.; Karlsson, A.; Knecht, W.; Andersen, G.; Piškur, J.; Munch-Petersen, B.; Eklund, H. Biochemistry 2003, 42, 5706-5712.

(108) Denault, M.; Pelletier, J. N. Protein library design and screening. In Protein engineering protocols; Arndt, K. M., Müller, K. M., Eds.; Humana Press Inc.: Totowa, 2007; Vol. 352, pp 127-154.

(109) Nakamura, Y.; Gojobori, T.; Ikemura, T. Nucleic Acids Res. 2000, 28, 292.

(110) Lutz, S. Curr. Opin. Biotechnol. 2010, 21, 734-743.

(111) Cox, V. E.; Gaucher, E. A. Directed Evolution Library Creation: Methods and Protocols. In Methods in Molecular Biology; Gillam, E. M. J., Copp, J. N., Ackerley, D., Eds.; Springer Science+Business Media: New York, 2014; Vol. 1179, pp 353-363.

(112) Parsons, S.; Flack, H. D.; Wagner, T. Acta Crystallogr. 2013, B69, 249-259.

(113) APEX2 and SADABS, Bruker AXS Inc., Madison, Wisconsin, USA. 2014.

(114) SAINT, Bruker AXS Inc., Madison, Wisconsin, USA. 2009.

(115) Sheldrick, G. M. Acta Crystallogr. 2015, A71, 3-8.

(116) XP, Bruker AXS Inc., Madison, Wisconsin, USA. 1998.

(117) Barker, D.; Marsh, R. Acta Crystallogr. 1964, 17, 1581-1587.

(118) Jeffrey, G. A.; Kinoshita, Y. Acta Crystallogr. 1963, 16, 20-28.

(119) Jeffrey, G. A. An introduction to hydrogen bonding. Oxford University Press: New York, 1997; Vol. 12.

(120) Fonseca Guerra, C.; Bickelhaupt, F. M.; Snijders, J. G.; Baerends, E. J. J. Am. Chem. Soc. 2000, 122, 4117-4128.

(121) Arnott, S. Prog. Biophys. Mol. Biol. 1970, 21, 265-319.

(122) Georgiadis, M. M.; Singh, I.; Kellett, W. F.; Hoshika, S.; Benner, S. A.; Richards, N. G. J. Am. Chem. Soc. 2015, 137, 6947-6955.

(123) Hernandez, A. R.; Shao, Y.; Hoshika, S.; Yang, Z.; Shelke, S. A.; Herrou, J.; Kim, H. J.; Kim, M. J.; Piccirilli, J. A.; Benner, S. A. Angew. Chem. Int. Ed. 2015, 127, 9991-9994.

(124) Chawla, M.; Credendino, R.; Chermak, E.; Oliva, R.; Cavallo, L. J. Phys. Chem. B 2016, 120, 2216-2224.

(125) Cruse, W.; Saludjian, P.; Biala, E.; Strazewski, P.; Prange, T.; Kennard, O. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 4160-4164.

(126) Baeyens, K. J.; De Bondt, H. L.; Holbrook, S. R. Nat. Struct. Mol. Biol. 1995, 2, 56-62.

(127) Gehring, K.; Leroy, J. L.; Guéron, M. Nature 1993, 363, 561-565.

(128) Stouten, P. F.; Prins, T. G.; Duisenberg, A. J.; Kanters, J. A.; Poonia, N. S. J. Crystallogr. Spectrosc. Res. 1991, 21, 553-557.

(129) Muthuraman, M.; Bagieu-Beucher, M.; Masse, R.; Nicoud, J.-F.; Desiraju, G. R. J. Mater. Chem. 1999, 9, 1471-1479.

(130) Song, C.; Zhou, Y.; Sun, Z.; Chen, T.; Zhang, S.; Luo, J. Cryst. Res. Technol. 2015, 50, 866-872.

(131) García-Terán, J. P.; Castillo, O.; Luque, A.; García-Couceiro, U.; Beobide, G.; Román, P. Dalton Trans. 2006, 7, 902-911.

(132) Hoshika, S., Benner, S. A. unpublished work.

(133) Ts'o, P. O. P.; Melvin, I. S.; Olson, A. C. J. Am. Chem. Soc. 1963, 85, 1289.

(134) Ikuta, S.; Takagi, K.; Wallace, R. B.; Itakura, K. Nucleic Acids Res. 1987, 15, 797-811.

(135) Peyret, N.; Seneviratne, P. A.; Allawi, H. T.; SantaLucia, J. Biochemistry 1999, 38, 3468-3477.

(136) Kierzek, R.; Burkard, M. E.; Turner, D. H. Biochemistry 1999, 38, 14214-14223.

(137) Ono, A.; Cao, S.; Togashi, H.; Tashiro, M.; Fujimoto, T.; Machinami, T.; Oda, S.; Miyake, Y.; Okamoto, I.; Tanaka, Y. Chem. Commun. 2008, 39, 4825-4827.

(138) Urata, H.; Yamaguchi, E.; Nakamura, Y.; Wada, S.-i. Chem. Commun. 2011, 47, 941-943.

(139) Miyake, Y.; Togashi, H.; Tashiro, M.; Yamaguchi, H.; Oda, S.; Kudo, M.; Tanaka, Y.; Kondo, Y.; Sawa, R.; Fujimoto, T. J. Am. Chem. Soc. 2006, 128, 2172-2173.

(140) Johannsen, S.; Paulus, S.; Düpre, N.; Müller, J.; Sigel, R. K. J. Inorg. Biochem. 2008, 102, 1141-1151.

(141) Langridge, R.; Rich, A. Nature 1963, 198, 725-728.

(142) Akinrimisi, E. O.; Sander, C.; Ts'o, P. Biochemistry 1963, 2, 340-344.

(143) Blackburn, E. H. Nature 1991, 350, 569-573.

(144) Ahmed, S.; Kintanar, A.; Henderson, E. Nat. Struct. Mol. Biol. 1994, 1, 83-88.

(145) Cai, L.; Chen, L.; Raghavan, S.; Rich, A.; Ratliff, R.; Moyzis, R. Nucleic Acids Res. 1998, 26, 4696-4705.

(146) Chen, L.; Cai, L.; Zhang, X.; Rich, A. Biochemistry 1994, 33, 13540-13546.

(147) Laos, R., Lampropoulos, C., Benner, S. A. unpublished work.

(148) Switzer, C.; Shin, D. Chem. Commun. 2005, 10, 1342-1344.

(149) Geyer, C. R.; Battersby, T. R.; Benner, S. A. Structure 2003, 11, 1485-1498.

(150) Eley, D. D.; Spivey, D. I. Trans. Faraday Soc. 1962, 58, 411-415.

(151) Toomey, E.; Xu, J.; Vecchioni, S.; Rothschild, L.; Wind, S. J.; Fernandes, G. E. J. Phys. Chem. C 2016, 120, 7804–7809.

(152) Vijayalakshmi, D.; Belt, J. A. J. Biol. Chem. 1988, 263, 19419-19423.

(153) Huang, Q.-Q.; Yao, S.; Ritzel, M.; Paterson, A.; Cass, C. E.; Young, J. D. J. Biol. Chem. 1994, 269, 17757-17760.

(154) Nordlund, P.; Reichard, P. Annu. Rev. Biochem. 2006, 75, 681-706.

(155) Hofer, A.; Crona, M.; Logan, D. T.; Sjöberg, B.-M. Crit. Rev. Biochem. Mol. Biol. 2012, 47, 50-63.

(156) Kunkel, T. A.; Loeb, L. A. J. Biol. Chem. 1979, 254, 5718-5725.

(157) Bebenek, K.; Roberts, J. D.; Kunkel, T. J. Biol. Chem. 1992, 267, 3589-3596.

(158) Kumar, D.; Abdulovic, A. L.; Viberg, J.; Nilsson, A. K.; Kunkel, T. A.; Chabes, A. Nucleic Acids Res. 2011, 39, 1360-1371.

(159) Buckstein, M. H.; He, J.; Rubin, H. J. Bacteriol. 2008, 190, 718-726.

(160) Ives, D. H.; Morse, P. A.; Potter, V. R. J. Biol. Chem. 1963, 238, 1467-1474.

(161) Matsuura, M. F.; Kim, H.-J.; Takahashi, D.; Abboud, K. A.; Benner, S. A. Acta Crystallogr. 2016, C72, 952-959.

(162) Liu, Y. Engineering kinases for dual thymidine and thymidylate kinase activity. Ph.D. Thesis, Emory University, 2011.

(163) Wang, H. H.; Isaacs, F. J.; Carr, P. A.; Sun, Z. Z.; Xu, G.; Forest, C. R.; Church, G. M. Nature 2009, 460, 894-898.

(164) Mosberg, J. A.; Gregg, C. J.; Lajoie, M. J.; Wang, H. H.; Church, G. M. PLoS One 2012, 7, e44638.

(165) Jiang, Y.; Chen, B.; Duan, C.; Sun, B.; Yang, J.; Yang, S. Appl. Environ. Microbiol. 2015, 81, 2506-2514.

(166) Kim, H.-J.; Chen, F.; Benner, S. A. J. Org. Chem. 2012, 77, 3664-3669.

BIOGRAPHICAL SKETCH

Mariko Matsuura was born in Yokohama City, Kanagawa Prefecture, Japan. She attended Kohoku Elementary School in Yokohama and Gakushuin Girls' Junior and Senior High School in Tokyo. She developed an interest in biology while taking science classes in junior high school. During her high school years, she attended several summer and winter science schools, where she became especially interested in biotechnology and set her sights on becoming a researcher. She received a B.S. in biological sciences from Tokyo Metropolitan University in 2011. There she founded and led a team that competed in the 2010 iGEM (International Genetically Engineered Machine) competition at MIT, where the team won a bronze medal. After graduation, she entered the Department of Chemistry at the University of Florida and began her Ph.D. research under Dr. Steven A. Benner at the Foundation for Applied Molecular Evolution. Mariko qualified for Ph.D. candidacy in the biochemistry division of the Department of Chemistry. She plans to continue her academic career as a postdoctoral research fellow.
