Florida State University Libraries

Electronic Theses, Treatises and Dissertations The Graduate School

2017

Insights into the Complex Formation between Nucleoside Diphosphate Kinase and a Highly Polymorphic DNA G-Quadruplex

Mykhailo Kopylov

FLORIDA STATE UNIVERSITY

COLLEGE OF ARTS AND SCIENCES

INSIGHTS INTO THE COMPLEX FORMATION BETWEEN

NUCLEOSIDE DIPHOSPHATE KINASE AND A HIGHLY POLYMORPHIC

DNA G-QUADRUPLEX

By

MYKHAILO KOPYLOV

A Dissertation submitted to the Department of Molecular Biophysics in partial fulfillment of the requirements for the degree of Doctor of Philosophy

2017

Mykhailo Kopylov defended this dissertation on October 16, 2017. The members of the supervisory committee were:

M. Elizabeth Stroupe Professor Directing Dissertation

Karen M. McGinnis University Representative

Hank W. Bass Committee Member

Wei Yang Committee Member

Hong Li Committee Member

The Graduate School has verified and approved the above-named committee members, and certifies that the dissertation has been approved in accordance with university requirements.


This dissertation is dedicated to my beloved wife, Elina. Thank you for the endless support, motivation, encouragement, patience and understanding. This would not be possible without you.

ACKNOWLEDGEMENTS

First of all, I would like to thank my PI, Beth Stroupe. Thank you for all your support and guidance throughout the years. Thank you for allowing a lot of independence for my projects, while keeping me on track and getting necessary experiments done. I would like to thank all of my committee members, Hank Bass, Hong Li, Wei Yang and Karen McGinnis, for their invaluable input and critique during my committee meetings. I would like to acknowledge all of the IMB staff, but especially Soma Sundaram, Claudius Mundoma, Joan Hare and Jana Sevcikova for all their help with multiple projects. My huge thanks and warmest regards to Lyn Kittle, for making Tallahassee a home away from home. I would also like to acknowledge my friends and colleagues with whom I had the honor to travel along this fun journey towards my PhD degree (in no particular order): Joe Pennington, Trevia Jackson, Travis Hand, John Spear, Matt Johnson, James McGivern, Isabel Askenasy, Alex Noble, Cory Hearn and many others. Finally, huge thanks to my parents, Galina and Serhiy Kopylov, who were always very supportive and engaged in my research.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
CHAPTER 1. INTRODUCTION
  1.1 Preface
  1.2 Discovery of G4s
  1.3 G-quadruplex structures and properties
  1.4 G4 distribution in genomes
  1.5 Protein G4-interactions
  1.6 Methods for in-vitro G4 investigation
  1.7 G4 distribution in the maize genome
CHAPTER 2. THE MAIZE (ZEA MAYS L.) NUCLEOSIDE DIPHOSPHATE KINASE1 (ZMNDPK1) GENE ENCODES A HUMAN NM23-H2 HOMOLOG THAT BINDS AND STABILIZES G-QUADRUPLEX DNA
  2.1 Introduction
  2.2 Experimental Procedures
    2.2.1 Phage library screen
    2.2.2 Protein expression and purification
    2.2.3 Point variant generation and purification
    2.2.4 Structure determination
    2.2.5 Maize extract pulldown
    2.2.6 Kd and stoichiometry determination
    2.2.7 Competition experiments
    2.2.8 Activity assays
    2.2.9 G4 DNA folding experiments
    2.2.10 Additional methods
  2.3 Results
    2.3.1 The ZmNDPK1 gene encodes a protein with G4-binding activity
    2.3.2 Structure determination of ZmNDPK1
    2.3.3 Native and recombinant ZmNDPK1 bind to G4 element hex4_A5U
    2.3.4 ZmNDPK1 and NM23-H2 have different G4 binding properties
    2.3.5 ZmNDPK1 binding to G4 DNA is specific and they bind with defined stoichiometry
    2.3.6 ZmNDPK1 G4 binding, nucleotide binding, and nucleoside kinase activity
    2.3.7 Lys149 is important for G4 binding
    2.3.8 ZmNDPK1 binds to folded G4 DNA
  2.4 Discussion
    2.4.1 G4s in maize
    2.4.2 Binding mode of G4 to NDPK
    2.4.3 Structural basis for G4 binding by NDPKs
    2.4.4 Implications for in vivo activity
    2.4.5 Conclusions
CHAPTER 3. BULGED AND CANONICAL G-QUADRUPLEX CONFORMATIONS FORMED BY A SINGLE G-RICH DNA SEQUENCE CO-EXIST IN SOLUTION AND DETERMINE PROTEIN BINDING SPECIFICITY
  3.1 Introduction
  3.2 Materials and methods
    3.2.1 Oligonucleotide and protein preparation
    3.2.2 Absorption spectrophotometry
    3.2.3 Circular dichroism spectrophotometry
    3.2.4 Dimethyl sulfate footprinting
    3.2.5 Nitrocellulose filter binding assays for ZmNDPK1/G4 DNA binding affinity analysis
    3.2.6 Analytical ultracentrifugation
    3.2.7 Fluorescence Resonance Energy Transfer (FRET)
    3.2.8 Electron microscopy
  3.3 Results
    3.3.1 hex4_A5U adopts a G4 conformation in the presence of cations
    3.3.2 hex4_A5U oligonucleotide is not limited to a single G4 conformation
    3.3.3 Locked hex4_A5U variants form G4s with distinct properties
    3.3.4 ZmNDPK1 requires two consecutive G-tracts with a single one base loop for efficient binding
    3.3.5 ZmNDPK1 binds and stabilizes intermolecular G4s
    3.3.6 ZmNDPK1 and trim_A5U form a heterogeneous complex
  3.4 Discussion
    3.4.1 G4 formation by the hex4_A5U oligonucleotide
    3.4.2 hex4_A5U and its truncated variant trim_A5U are highly polymorphic G4-forming sequences
    3.4.3 The G4-binding protein ZmNDPK1 recognizes a subset of conformations adopted by hex4_A5U DNA and forms filamentous structures upon binding
    3.4.4 Generalization of the extended model
CHAPTER 4. SUMMARY AND FUTURE DIRECTIONS
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF FIGURES

Figure 1.1: Electron donors and acceptors of a neutral guanine molecule

Figure 1.2: Structure of a G-quadruplex

Figure 1.3: G-quadruplex structural diversity

Figure 1.4: Location and folding of select G4 motifs (from Andorf et al., 2014)

Figure 2.1: G4 formation by the oligonucleotides used in this study is salt and sequence dependent

Figure 2.2: ZmNDPK1-specific antibody 133 detects native ZmNDPK1

Figure 2.3: Representative nitrocellulose retention assay

Figure 2.4: ZmNDPK1 is a canonical NDPK

Figure 2.5: Phylogenetic tree of proteins similar to ZmNDPK1 and ZmNDPK2

Figure 2.6: ZmNDPK1 hexamer binds two G4s

Figure 2.7: Hexameric NDPKs have high structural homology

Figure 2.8: ZmNDPK1 is a hexamer, even at low concentration, but can also form a dodecamer

Figure 2.9: hex4_A5U binds to native ZmNDPK1

Figure 2.10: ZmNDPK1 and NM23-H2 bind folded G4 oligonucleotides with nM affinity

Figure 2.11: Competition of ZmNDPK1 binding to hex4_A5U by a series of alternative oligonucleotides summarized in Table 2.1 measured by the retention of biotinylated hex4_A5U in the presence of 1, 10, or 100-fold excess unbiotinylated competitor

Figure 2.12: Nucleotides do not inhibit equilibrium ZmNDPK1 binding

Figure 2.13: G4 binding does not affect NDPK activity

Figure 2.14: Point variants of ZmNDPK1 show different DNA binding activities

Figure 2.15: ZmNDPK1 and NM23-H2 both bind to folded G4 DNA, whether pre-folded in KCl or not pre-folded in LiCl

Figure 2.16: FRET signals and PF trends are specific to energy transfer between the 5’ and 3’ ends of the folded oligonucleotides

Figure 3.1: Spectroscopic analysis of a G-quadruplex (G4) formation by hex4_A5U

Figure 3.2: Analytical ultracentrifugation of hex4_A5U annealed in KCl shows formation of a compact structure

Figure 3.3: CD melts show reversible structural transition of a G4 formed by hex4_A5U with different cations

Figure 3.4: Dimethyl sulfate (DMS) footprinting of hex4_A5U and its variations reveals guanines involved in G4 core formation

Figure 3.5: Preliminary mutagenesis of the trim_A5U oligonucleotide

Figure 3.6: Extended model of G4 formation by trim_A5U allowing one bulge in G-tract

Figure 3.7: Spectroscopic analysis of a G-quadruplex (G4) formation by trim_A5U locked variants

Figure 3.8: G4 binding protein ZmNDPK1 preferentially binds to the parallel locked variants of trim_A5U

Figure 3.9: ZmNDPK1 binds to and stabilizes intermolecular and intramolecular G4s

Figure 3.10: Electron microscopy of the complex between ZmNDPK1 and trim_A5U

Figure 3.11: Possible topologies that can be adopted by the trim_A5U oligonucleotides

LIST OF TABLES

Table 2.1: List of DNA aptamers used in this study

Table 2.2: Binding affinities of NM23-H2 and ZmNDPK1 to Pu44 and hex4_A5U oligonucleotides

Table 2.3: Crystallographic statistics for ZmNDPK1 X-ray data 1VYA

Table 2.4: Statistical analysis of FRET results for NDPK-G4 pairs in KCl or LiCl

Table 3.1: Summary of the properties of trim_A5U locked variants

ABSTRACT

Non-canonical forms of DNA like the guanine quadruplex (G4) play important roles in regulating transcription and translation through interactions with their protein partners. G4s comprise a class of nucleic acid structures formed by stacking of guanine base quartets in a quadruple helix. This G4 DNA can form within or across single-stranded DNA molecules and is mutually exclusive with duplex B-form DNA. The core of a G4 is formed in G-rich stretches of DNA by Hoogsteen base-paired guanines that assemble as planar stacks, stabilized by a central cation like K+. These structures are reversible and structurally diverse, which makes them highly versatile genetic structures, as demonstrated by their roles in various functions including DNA replication, transcription, translation and telomere metabolism. Structural information on protein-G4 complexes remains scarce, even more so in the plant kingdom. In the present study, we addressed the following aims to tackle this deficiency:

1. Identify plant G4-binding proteins by expression library screening.
2. Analyze the structural heterogeneity of a polymorphic G4-forming oligonucleotide, hex4_A5U.
3. Structurally characterize complex formation between the hex4_A5U G4 and a G4-binding protein, ZmNDPK1, using cryo-electron microscopy (cryoEM).

G4-forming sequences were first identified in telomeres and then recognized in other genomic loci. To investigate their potential roles in a large-genome model plant species, we computationally identified 149,988 canonical non-telomeric putative G4s in maize, 29 percent of which were in non-repetitive genomic regions. Putative G4 hotspots exhibited non-random enrichment in genes at three locations: one on the antisense strand in the 5’ UTR (A5U class); a second, also on the antisense strand, at the 5’ end of the first intron (A5I class); and a third on the sense strand adjacent to the transcription start site (ATG class). The maize hexokinase4 gene has one G4 from each class (hex4_A5U, hex4_A5I and hex4_AUG), which we have shown to form G4s in vitro. Overall, the G4 motifs were prevalent in key regulatory genes associated with hypoxia, oxidative stress, and energy status pathways.

Putative G4 elements have been identified in, or near, genes from species as diverse as bacteria, mammals, and plants, but little is known about how they might function as cis-regulatory elements or as binding sites for trans-acting protein partners. In fact, until now, no G4 binding partners have been identified in the plant kingdom. Here, we report on the identification, cloning and characterization of the first plant-kingdom gene known to encode a G4-binding protein, maize (Zea mays L.) Nucleoside Diphosphate Kinase1 (ZmNDPK1). Structural characterization by X-ray crystallography reveals that it is a homohexamer, akin to other known NDPKs like the human homolog NM23-H2. Further probing into the G4-binding properties of both NDPK homologs shows that ZmNDPK1 possesses properties distinct from those of NM23-H2, which is known to interact with a G-rich sequence element upstream of the c-myc gene and, in doing so, modulate its expression. We also demonstrate that the G4-binding activity of ZmNDPK1 is independent of nucleotide binding and kinase activity, suggesting that the G4-binding region and the enzyme active site are separate. Together, these findings establish a broad evolutionary conservation of some NDPKs as G4-DNA binding enzymes, but with potentially distinct biochemical properties that may reflect divergent evolution or species-specific deployment of these elements in gene regulatory processes.

A single G4-forming sequence can adopt a variety of 3D structures depending on strand order and orientation (parallel, antiparallel), the number of tetrads in the core (two, three, four), the identity of the central cation (K+, Na+) and the presence of bulges in G-tracts. Here I investigate the conformational heterogeneity of hex4_A5U. This sequence adopts extensive polymorphic G4 conformations, including non-canonical bulged G4s, that co-exist in solution. The nature of this polymorphism depends, in part, on the incorporation of different sets of adjacent guanines into the G4 core, which allows formation of the different conformations. Additionally, I show that ZmNDPK1 specifically recognizes and promotes formation of a subset of these conformations.

CHAPTER 1

INTRODUCTION

1.1. Preface

Chapter 1 of this dissertation is a general introduction to G4 structure and function, genomic distribution, protein-G4 interactions and methods for G4 investigation. The chapter ends with a summary of a bioinformatic investigation of G4 distribution in the maize genome (a collaboration with Dr. Hank Bass's laboratory at FSU), which served as my entry point into the G4 field. This study is published in full in Journal of Genetics and Genomics (Andorf et al., 2014). My main contribution to the study was confirming that predicted G4 motifs can form G4s, at least in vitro. One of the sequences identified in this study, hex4_A5U, is critical for the investigations described in chapters 2 and 3.

In chapter 2 I explore the properties of a newly identified G4-binding protein, ZmNDPK1, and compare them to the properties of its human homolog NM23-H2. This chapter is published in full in the journal Biochemistry (Kopylov et al., 2015). Reuse of materials is authorized by the editor. Supplementary figures and materials and methods are incorporated into the main text, and figures and figure references are renumbered to accommodate the FSU dissertation formatting requirements; otherwise the whole publication is presented here unaltered. This work was supported by National Science Foundation award MCB1149763 to MES and Florida State University Council on Research and Creativity Planning Grant awards to MES (OMNI-0000035907) and HWB (OMNI-0000025471). The coordinates for ZmNDPK1 have been deposited in the PDB with PDB ID 1VYA. The sequence of the full-length cDNA was deposited to NCBI as GenBank Accession number KM347972.

In chapter 3 of this dissertation I investigate the G4-forming properties of the hex4_A5U oligonucleotide. This chapter is in preparation for submission to the journal Nucleic Acids Research.

Finally, in chapter 4 I provide an overall summary of my research, draw conclusions, discuss recent preliminary data and speculate about possible future directions.

1.2. Discovery of G4s

The properties of G4s depend primarily on the properties of guanine itself. The atomic structure of guanine was determined by Emil Fischer in the early 20th century, revealing a heterocyclic double-ring planar structure that makes the molecule highly hydrophobic at neutral pH. Additionally, guanine can form a number of hydrogen bonds, as many as seven per molecule (Figure 1.1). These two properties, hydrophobicity and an abundance of hydrogen donors and acceptors, determine the propensity of guanine to self-assemble.


Figure 1.1: Electron donors and acceptors of a neutral guanine molecule. A neutral guanine molecule can form up to 8 hydrogen bonds. Lone pairs of electrons on N3, N7 and C6-O are electron donors (outgoing arrows). N1, C2-N and N9 are electron acceptors (incoming arrows).

A beautiful example of guanine self-assembly is the formation of guanine nanocrystals. In these crystals each guanine forms seven hydrogen bonds with adjacent guanines to form a planar 2-D lattice, which in turn forms 3-D stacks stabilized by hydrophobic interactions between the ring systems of adjacent molecules (Denton and Land, 1971). Their high anisotropic refractive index is used in nature to produce some of the most vivid and radiant colors. For example, the pearlescent glitter of fish scales (Denton and Land, 1971), the silvery hue of tropical spiders (Levy-Lior et al., 2008) and the color-changing abilities of chameleons (Teyssier et al., 2015) all depend on guanine stacking in nanocrystals.

The self-assembling property of guanine is preserved in its derivatives such as guanosine and guanosine monophosphate (GMP), one of the building blocks of nucleic acids. Over a century ago, Levene and Jacobs demonstrated formation of a gel after cooling a solution of pure guanosine (Levene et al., 1909). Around the same time, Ivor Bang, in his study on guanylic acid, observed similar temperature-dependent ‘gelatinization’ (Bang, 1910). Fifty years later, the atomic nature of Bang’s gel was determined in a fiber X-ray diffraction experiment that indicated a helical nature of the diffracting material. The helical unit was four GMP molecules arranged in an almost perfect planar quartet (G-quartet) with their N7 guanines pointing towards the middle channel and phosphate groups pointing outwards (Figure 1.2A) (Gellert et al., 1962). This guanine arrangement was subsequently observed in experiments with synthetic polyguanylic acid (polyG), which forms a tetra-helical structure stabilized by G-quartets (Zimmerman et al., 1975).

Figure 1.2: Structure of a G-quadruplex. A. G-quartet formed by Hoogsteen base-pairing of four guanines. The central pore is typically occupied by a monovalent cation (e.g. K+). B. Diagram of a parallel G4, showing the stacking of three G-quartets to form a G4 core. Each G-quartet has one guanine from each of the four G-tracts. These tracts are connected by loops L1, L2 and L3.

The first naturally occurring G4s were discovered in telomeric DNA of eukaryotes (Henderson et al., 1987, Sundquist and Klug, 1989), first shown for the Tetrahymena telomeric sequence repeat: (TTGGGG)4. Oligonucleotides containing this repeat readily formed G-quadruplexes under physiological conditions. Later the structure was determined (Wang and Patel, 1993), showing the stacking of G-quartets and localization of stabilizing cations. Further, G4 formation by the human telomeric sequence repeat (TTAGGG)4 was demonstrated and fueled the interest of the community in these secondary DNA structures (Williamson et al., 1989).

1.3. G-quadruplex structures and properties

G4s are secondary structures formed in guanine-rich regions of DNA or RNA molecules. The general properties of DNA G4s and RNA G4s are very similar because they depend on the nature of the base, not the nucleotide sugar; unless specified otherwise, 'G4' will therefore refer to a DNA G4. The characteristic element of a G4 is its core, formed by planar stacking of guanine quartets. Each G-quartet consists of four guanines that come from four different G-tracts, stabilized by Hoogsteen hydrogen bonds (Figure 1.2B). These interactions lead to the formation of a right-handed, four-stranded helix, although left-handed G4s are also possible (Chung et al., 2015). The central pore formed by G-quartets is occupied by a cation that is coordinated by partial negative charges on the oxygens of the guanines.

Despite the relatively simple arrangement of the core, G4s are a highly diverse class of secondary DNA structures. G4s differ in sequence, strand directionality, central cation identity, number of strands, presence of bulges, and number of quartets (Figure 1.3). Additionally, the same sequence of nucleotides can form an array of G4 structures depending on environmental conditions, or exist as an equilibrium of several structures (Cogoi et al., 2008, Schonhoft et al., 2010).

Figure 1.3: G-quadruplex structural diversity. The majority of known G4s form a right-handed helix (1KF1, 1JPQ, 1S45 etc.), but left-handed G4s are also possible (2MS9). G4s can be classified according to the number of DNA strands that participate in G4 formation as unimolecular (1KF1, 143D, 2JPZ), bimolecular (2KBP, 1JPQ, 2AQY) or tetramolecular (1S45). G4s can also be classified according to the relative strand directionality as parallel (2MS9, 1KF1, 2KBP, 1S45), antiparallel (143D, 1JPQ) and mixed (2JPZ, 2AQY). G4s can also differ in the number of G-quartets: 2-stack (2MS9), 3-stack (1KF1) and 4-stack (1S45).

Based on in vitro observation of multiple G4s, the general prediction was made that an oligonucleotide of the sequence d(G3+N1-7G3+N1-7G3+N1-7G3+) will form a G4 (Huppert and Balasubramanian, 2005). In this notation G3+ denotes a tract of three or more guanines and N1-7 denotes a loop of one to seven nucleotides of any identity. G4 motifs that comply with this formula are often referred to as canonical G4s (Figure 1.2B). Here, four contiguous G-tracts that will form the core of a G4 are separated by three loops that can be no longer than seven bases. This formula was used as a basis to map the distribution of G4 motifs in the genomes of many species, such as human (Huppert and Balasubramanian, 2006), mouse and rat (Verma et al., 2008), Arabidopsis (Mullen et al., 2010), and maize (Andorf et al., 2014).

Despite having high predictive capabilities, the canonical formula does not identify non-canonical G4-capable sequences, such as G4s with interruptions in G-tracts (Mukundan and Phan, 2013) or longer loops (Guédin et al., 2010). G-tracts can have single or double bulges with different nucleotide composition, while the middle loop can be as long as 21 nucleotides (Guédin et al., 2010). Additionally, some identified sequences that conform to the formula are false positives: sequences that fail to form a stable G4 at physiological conditions (Guedin et al., 2009). Although other algorithms for G4 prediction exist, such as hidden Markov models (Yano and Kato, 2014) and G4Hunter (Bedrat et al., 2016), G4 formation still has to be validated empirically using biochemical and biophysical methods.

G4s can be tetra-, bi- and unimolecular (Figure 1.3). In tetramolecular G4s each G-tract is located on an individual DNA strand. These G4s are usually formed by annealing of four copies of a G-rich sequence with the same directionality of strands. This allows formation of long and stable G4 structures, with three or more G-quartet stacks (Gray et al., 2014). Bimolecular G4s form by interaction of two hairpins, each containing two G-tracts, and can form from strands of different directionality. Locations of potential cross-strand G4s were recently mapped for the human genome (Kudlicki, 2016). Unimolecular G4s have all four G-tracts on a single DNA chain. Such non-telomeric unimolecular G4s are the most likely to form in the context of a genome during cellular events that lead to formation of single-stranded DNA regions during transcription and replication.

Unimolecular and bimolecular G4 topologies can be classified based on their strand directionality as parallel (propeller), anti-parallel (chair, basket) and mixed (hybrid). In parallel G4s, all four G-tracts have the same polarity (e.g. all of them are 5’→3’), the three loops that connect G-tracts are lateral (propeller) and all guanine glycosidic angles are in an anti conformation. In anti-parallel G4s, two or three G-tracts have the same polarity, whereas the remaining ones have the opposite polarity (Burge et al., 2006). G-tracts can be connected by lateral, edge or diagonal loops, and guanine glycosidic angles will be a combination of syn and anti orientations.
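The canonical formula described earlier in this section maps naturally onto a regular expression. The following Python sketch is only an illustration of that idea and is not the pipeline used in the genome-wide studies cited above; the names CANONICAL_G4, reverse_complement and find_canonical_g4 are hypothetical. It assumes an uppercase A/C/G/T alphabet, reports non-overlapping matches only, and searches the reverse complement so that motifs on the opposite strand are also found.

```python
import re

# Canonical formula G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ as a regular expression:
# a tract of >= 3 guanines followed by three (loop of 1-7 nt + G-tract) units.
CANONICAL_G4 = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_canonical_g4(seq):
    """Return sorted (start, end, strand, motif) tuples for canonical G4 motifs.

    Coordinates are 0-based, end-exclusive, and refer to the input strand.
    The motif is always written 5'->3' on the strand where it was found.
    Only non-overlapping matches are reported (hypothetical helper).
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    # Motifs on the given ("+") strand.
    for m in CANONICAL_G4.finditer(seq):
        hits.append((m.start(), m.end(), "+", m.group()))
    # Motifs on the opposite ("-") strand, mapped back to input coordinates.
    for m in CANONICAL_G4.finditer(reverse_complement(seq)):
        hits.append((n - m.end(), n - m.start(), "-", m.group()))
    return sorted(hits)
```

Applied to four copies of the human telomeric repeat, find_canonical_g4("TTAGGGTTAGGGTTAGGGTTAGGG") returns a single plus-strand hit spanning positions 3 to 24. Because the loop class [ACGT] also admits guanines, a pattern like this cannot say which guanines actually enter the G4 core, and, as noted above, it neither detects bulged or long-loop G4s nor excludes false positives, so any hit remains a candidate for experimental validation.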

Unlike the familiar B-form DNA that is stabilized by divalent cations interacting with a phosphate backbone, G4 DNA is stabilized by cations positioned in the G4 central pore formed by G-quartets and coordinated by the carbonyl groups of guanines. The interactions of the cations with the phosphate backbone of the G4 provide a minor contribution to the overall G4 stability, which means G4 formation is quite sequence specific (Kim et al., 2013). The G4 central pore is limited in diameter by the hydrogen bonds between guanines; therefore, not all cations can stabilize G4s. The most common are the monovalent cations K+ and Na+, which are generally abundant physiological cations. Li+ is too small, whereas Cs+ is too large to fit into the pore, although it can provide a certain degree of stabilization, further discussed in chapter 2 of this dissertation. Bivalent cations such as Sr2+, Ba2+, Ca2+ and Pb2+ can also stabilize G4s (Chen, 1992, Venczel and Sen, 1993, Shafer and Smirnov, 2000), whereas Mn2+, Ni2+ and Co2+ can have a destabilizing effect (Blume et al., 1997, Hardin et al., 2000). Overall, the ability of a cation to stabilize G4s can be ranked as follows: Sr2+ > Ba2+ > K+ > Ca2+ > Na+, NH4+ > Rb+ > Mg2+ > Li+ ≥ Cs+ (Largy et al., 2016). Other examples of G4 structural heterogeneity include introduction of adenine or cytosine into the G-quartets (Lim et al., 2009, Kocman and Plavec, 2017); formation of five-, six-, seven-, and eight-member planar stacks (Zhang et al., 2001, Mashima et al., 2009, Matsugami et al., 2001, Borbone et al., 2011); and incorporation of extra G-tracts from flanking sequences (Fleming et al., 2015). Finally, G4s with G-tracts composed of four or more guanines can form multiple topologies that co-exist in solution (Harkness and Mittermaier, 2016b).
In our studies we have observed that such dynamics are not limited to contiguous G-tracts, but also occur through the formation of non-canonical G4s containing bulges in G-tracts and loops longer than 7 nt, further discussed in chapter 3.

1.4. G4 distribution in genomes

G4s are found in the genomic DNA of all living organisms and some viruses, but not all organisms have a G4-rich genome (Davis and Maizels, 2011). Some genomes, especially those of bacteria, have a very low G4 count. For example, the whole genome of E. coli has only 52 unique canonical G4s, which corresponds to six G4s per megabase. The whole genome of Arabidopsis thaliana has about a thousand G4s (ten G4s per megabase), distributed uniformly across the genome (Mullen et al., 2010). At the same time, large genomes, such as maize and human, have hundreds of G4s per megabase located in hotspots (Huppert and Balasubramanian, 2005, Andorf et al., 2014).

One G4 hotspot shared among all eukaryotes is located within telomeric DNA. Located at the end of chromosomes, telomeric DNA consists of GC-rich short repeats and ends with a single-stranded G-rich overhang. Such repeats are conserved among most eukaryotes: plants (TTTAGGG); most insects (TTAGG); roundworms (TTAGGC); ciliates (TTGGGG in Tetrahymena or TTTTGGGG in Oxytricha); yeast (TGTGGGTGTGGTG in Saccharomyces cerevisiae). Even the unusual retrotransposon-based telomeres of Drosophila have G-rich repeats in the 3’ repeat region of the HeT-A element that form G4s (Abad and Villasante, 1999). The telomeric repeat of all vertebrates is TTAGGG, and in humans the repeated region can be over a thousand bases long. In addition to telomeres, G4s are also found in other repeating areas of eukaryotic DNA, such as long terminal repeats of some families of plant retrotransposons (Lexa et al., 2014) and micro- and minisatellite repeats (Ogloblina et al., 2015, Amrane et al., 2012).

The total number of putative canonical G4s in the human genome is estimated to be ~400,000 (Maizels and Gray, 2013, Huppert and Balasubramanian, 2005). More recent bioinformatic analysis with G4Hunter predicts anywhere between 300 thousand and 7 million putative genomic canonical and non-canonical G4s, depending on the threshold setting (Bedrat et al., 2016). In another study a sequencing approach was used to assess the number of G4s that form in the human genome in vitro, identifying 716 thousand distinct G4s, both canonical and non-canonical (Chambers et al., 2015).

The distribution of G4s in the genomes of eukaryotes is non-random. Putative genomic G4s in eukaryotes are enriched in regulatory regions of DNA such as gene promoters, exon-intron boundaries, transcription start sites and 5’ UTRs (Maizels and Gray, 2013, Andorf et al., 2014, Brooks et al., 2010).
Other major G4 hotspots include gene enhancers (Du et al., 2009), origins of replication (Valton et al., 2014, Besnard et al., 2012), and mitochondrial DNA (Bedrat et al., 2016). Over 40% of human genes have a putative canonical G4 in a promoter (Huppert and Balasubramanian, 2006), suggesting a role for these structures in regulating gene expression. Of particular interest is the frequent occurrence of G4s in promoters of human oncogenes, proto-oncogenes and other genes involved in tumor progression (Brooks et al., 2010) and development of cancer (Kumarasamy et al., 2015, Phang, 2013, Miller et al., 2012, Quante et al., 2012, Vinod Prabhu et al., 2012, Wu et al., 2008). Among these genes are c-myc, c-myb, VEGF, KRAS, mcl-2 and hTERT, to name a few. One of the most well-studied examples is the c-myc gene, which encodes a transcription factor. Overexpression of c-myc alters cancer cell energy metabolism by upregulating metabolic enzymes and stimulating ribosome and mitochondria biogenesis (Dang et al., 2009). c-myc is overexpressed in the majority of human cancers and its level of transcription is directly dependent on the formation of a G4 in its nuclease-hypersensitive element (NHE) III1 located in the c-myc promoter region (Siddiqui-Jain et al., 2002).

Non-random distribution of G4s, regulation of telomere metabolism, clustering in key regulatory regions of a genome, and prevalence in promoters of human proto-oncogenes make G4 regulation a topic of significant interest in science and medicine. Small-molecule drugs aimed at stabilizing G4s are in development (Il'inskii et al., 2014). Some of them, such as derivatives of TMPyP4 and acridine, were shown to be active in vivo (Gunaratnam et al., 2009, Liu et al., 2014, Calvo and Wasserman, 2016). In cells, however, the formation and resolution of G4s is most likely regulated by protein factors.

1.5. Protein G4-interactions

There is a vast and growing number of functionally important proteins that are reported to have specific G4 binding activity (Qiu et al., 2015, Brázda et al., 2014). The regulatory functions of G4s in cells are most likely mediated by these proteins through changes in G4 stability and conformation. Proteins that recognize G4 structures are involved in a wide range of cell functions that are predominantly related to nucleic acid metabolism, such as telomere length regulation, DNA replication, DNA repair, transcription, translation and splicing (Rhodes and Lipps, 2015).
Intriguingly, some G4-binding proteins perform cell functions unrelated to nucleic acid metabolism, such as thrombin (Macaya et al., 1993), insulin (Connor et al., 2006) and nucleoside diphosphate kinases (Postel et al., 1993, Hildebrandt et al., 1995, Kopylov et al., 2015), but nonetheless recognize G4s with high affinity and specificity. G4-interacting proteins can be separated into two functionally distinct groups: G4-unfolding or destabilizing proteins and G4-folding or stabilizing proteins. G4s exist in equilibrium between their folded state and a single-stranded DNA state. Unfolding is accomplished, at least in part, by ATP-dependent G4-specific RNA and DNA helicases like Pif1, XPD and RHAU. In contrast, proteins like POT1 and RPA1 bind and stabilize the unfolded state of a G-quadruplex in an ATP-independent manner. Examples of G4-folding or stabilizing proteins are less numerous and include FMRP, nucleolin (Gonzalez et al., 2009), nucleophosmin, and the main subject of this study, ZmNDPK1.

Helicases are critical for preventing the genetic instability caused by formation of G4s (Paeschke et al., 2013). In fact, Pif1 helicases are conserved from bacteria to humans and unwind dsDNA as well as G4 DNA (Sanders, 2010). Here G4s potentially serve as recruiting sites for Pif1 (Duan et al., 2015). Another helicase, the transcription-associated XPD, has 40% of its genomic binding sites overlapping with G4 sequences (Gray et al., 2014). In contrast, RHAU (also called G4-resolvase-1) specifically recognizes G4s and is released from DNA after unfolding the G4 (Chen et al., 2015). Other G4-unfolding proteins, such as POT1, trap a single-stranded DNA sequence in an unfolded state and prevent the formation of G4s (Zaug et al., 2005). Similarly, RPA was shown to unfold G4s with an efficiency that depended on G4 topology, number of G-quartets and loop length (Qureshi et al., 2012a, Ray et al., 2013). G4-promoting proteins play a unique role in the cell by stabilizing structures that represent physical impediments to DNA replication and transcription and that have to be resolved for the normal progression of these processes. These proteins typically have very narrow specificity for a particular G4 sequence and topology. For example, FMRP (fragile X mental retardation protein) affects splicing by binding to RNA G4s with high affinity in vitro and in vivo (Blice-Baum and Mihailescu, 2014, Zhang et al., 2014b). Nucleolin binds to a G4 in the promoter region of c-myc and stabilizes the folded conformation (Gonzalez et al., 2009, Gonzalez and Hurley, 2010). Nucleolin also stabilizes a G4 in the long terminal repeat promoter of HIV-1, decreasing viral transcription (Tosoni et al., 2015). Intrinsically disordered regions of nucleophosmin recognize intermolecular G4s (Banuelos et al., 2013, Scognamiglio et al., 2014). ZmNDPK1 binds and stabilizes parallel G4s, as discussed in chapter 2 (Kopylov et al., 2015).
There is no single class of proteins or particular structural feature that can be categorized as a G4-binding or G4-recognition motif. As a result, the G4 interactome has so far been impervious to bioinformatic prediction, and its investigation requires direct biochemical and biophysical experiments. Currently available structural information is scarce and there are no known rules of G4:protein recognition. Some conclusions can be drawn from the structures of the apo-states of G4-binding proteins. For instance, RecA has a cavity that matches the size of a G4 (Kuryavyi et al., 2012) and the DNA-binding domain of nucleophosmin fits into a G4 groove (Gallo et al., 2012). Several incomplete structures shed light on the molecular mechanism of binding. A G4-binding peptide from the DEAH-box helicase RHAU interacts with the G4 through electrostatic interactions with the phosphate backbone (Heddi et al., 2015), while the RGG motif of FMRP interacts with

the RNA G4 through the duplex-G4 junction, highlighting the importance of flanking sequences in G4 recognition by proteins (Vasilyev et al., 2015). At the time of writing this dissertation, there are no atomic-resolution protein:G4 structures in which the G4 is a canonical, folded three-quartet quadruplex that interacts with specific protein residues.

1.6. Methods for in-vitro G4 investigation

The secondary structure of a bioinformatically predicted G4 has to be tested in vitro. Commonly used methods for G4 characterization are: UV-visible (UV-vis) spectroscopy, including thermal difference spectroscopy (TDS) and thermal melt (TM) analysis; circular dichroism (CD) spectroscopy; gel electrophoresis, including electrophoretic mobility shift assay (EMSA) and dimethyl sulfate (DMS) footprinting; Förster resonance energy transfer (FRET); and analytical ultracentrifugation (AUC). These techniques can be used to analyze individual G4s as well as G4s in complex with small-molecule ligands or proteins. Structural methods such as nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography provide atomic-resolution three-dimensional structures of G4s, but they often require a sample of high purity and low heterogeneity, which is not possible in many cases, including the one described in chapter 3.

UV-vis spectroscopy is one of the oldest and most commonly used tools for studying DNA. This method relies on the specific absorption spectrum of the molecule in the near-UV region. The DNA absorbance spectrum depends on the sequence of the DNA as well as on its secondary structure (Basu and Das Gupta, 1969, Basu and Dasgupta, 1967). Individual contributions of the nucleotides to a UV-vis spectrum are mostly independent of temperature. However, the secondary structure of DNA is disrupted as temperature increases, causing a significant change in the shape of the spectrum (Mergny et al., 2005, Mergny and Lacroix, 2003).
Subtraction of the absorbance spectrum at a low temperature from the absorbance spectrum at a high temperature gives the thermal difference spectrum. This method of analysis is called thermal difference spectroscopy (TDS). Wavelengths at which difference peaks are pronounced can be used in thermal melt (TM) experiments, where the change in absorbance at a particular wavelength is recorded as the temperature increases (Mergny et al., 1998). This results in structure-specific melt curves that can be used to extract thermodynamic parameters (Mergny and Lacroix, 2009). G4s have a prominent negative peak at 295 nm in a TDS spectrum, but there are other DNA secondary structures, such as Z-DNA, that can produce this peak too (Mergny et al., 2005).
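The subtraction described above can be illustrated with a minimal sketch. It assumes two absorbance spectra sampled at the same wavelengths; the wavelength grid, the absorbance values, and the choice to scale the difference by its largest absolute value are all hypothetical and chosen only for illustration.

```python
def thermal_difference_spectrum(abs_low_temp, abs_high_temp):
    """Return the high-minus-low difference spectrum, scaled to max |value| = 1.

    The scaling convention is an assumption made for this sketch.
    """
    if len(abs_low_temp) != len(abs_high_temp):
        raise ValueError("spectra must be sampled at the same wavelengths")
    diff = [hi - lo for lo, hi in zip(abs_low_temp, abs_high_temp)]
    scale = max(abs(d) for d in diff)
    if scale == 0:
        raise ValueError("spectra are identical; no difference spectrum")
    return [d / scale for d in diff]


def value_near(wavelengths_nm, values, target_nm):
    """Return the value at the sampled wavelength closest to target_nm."""
    idx = min(range(len(wavelengths_nm)),
              key=lambda i: abs(wavelengths_nm[i] - target_nm))
    return values[idx]


# Hypothetical spectra: folded sample (20 C) and melted sample (90 C).
wavelengths_nm = [240, 255, 273, 295, 310]
abs_20C = [0.30, 0.50, 0.55, 0.20, 0.02]
abs_90C = [0.36, 0.56, 0.63, 0.12, 0.02]

tds = thermal_difference_spectrum(abs_20C, abs_90C)

# A negative value near 295 nm is the G4-like signature discussed in the text.
has_g4_like_signature = value_near(wavelengths_nm, tds, 295) < 0
```

Because Z-DNA can give the same negative peak, a flag of this kind is only a first screen and would be combined with CD or footprinting data.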

CD spectroscopy complements UV-vis spectroscopy and provides additional structural insight into the topology of G4s. Parallel G4s are characterized by a major positive peak at 262 nm, whereas antiparallel G4s have a major positive peak at 292 nm (Giraldo et al., 1994, Balagurumoorthy et al., 1992). The different signals depend on the relative orientation of guanines in the stacks (Randazzo et al., 2013, Paramasivan et al., 2007). CD can also be combined with a temperature gradient to monitor the formation or dissolution of secondary structures. Melting profiles can be obtained by following the maximal peaks and used to extract thermodynamic parameters (Gray and Chaires, 2011).

Gel electrophoresis separates molecules by size, shape and charge. DNA is uniformly charged, so during electrophoresis it is separated by shape and size. A single G4-forming sequence can adopt different shapes and oligomeric states, which changes its electrophoretic mobility (Sun and Hurley, 2010). This technique, EMSA, can also be used to detect the formation of a complex between DNA and protein. Treatment of G4 DNA with dimethyl sulfate followed by piperidine cleavage (DMS footprinting) and resolution by gel electrophoresis to single-base precision reveals guanines that are buried from solvent and constitute the G4 core (Sun and Hurley, 2010).

FRET utilizes a pair of fluorophores in which the emission spectrum of one, the donor, overlaps with the absorption spectrum of the other, the acceptor. If emission from the acceptor is detected after excitation of the donor, we can conclude that the two fluorophores are within 2-6 nm of each other. The higher the FRET efficiency, the closer the two fluorophores are (Didenko et al., 2006). This method can be used to monitor the relative distances between the fluorescent labels under different conditions, such as G4-folded and unfolded states.
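The inverse relation between FRET efficiency and fluorophore separation can be made concrete with the standard Förster expression, E = 1 / (1 + (r/R0)^6), where R0 is the distance at which transfer is 50% efficient. The sketch below is illustrative only: the value R0 = 5.0 nm is a hypothetical placeholder, not a measured value for any dye pair used in this work.

```python
def fret_efficiency(distance_nm, r0_nm):
    """Foerster relation: E = 1 / (1 + (r / R0)^6)."""
    if distance_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency, r0_nm):
    """Invert the Foerster relation: r = R0 * (1/E - 1)^(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0_NM = 5.0  # hypothetical Foerster radius, for illustration only

# Efficiency falls steeply as the labels move apart (e.g. upon unfolding).
folded_like = fret_efficiency(3.0, R0_NM)
unfolded_like = fret_efficiency(7.0, R0_NM)
```

The sixth-power dependence is why the method is most sensitive to distance changes close to R0, which for common dye pairs falls in the few-nanometer range quoted above.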
Analytical ultracentrifugation (AUC) separates molecules by their sedimentation coefficient in physiological conditions: constant temperature, no electric field applied, no extra modifications of the molecule, neutral pH, low ionic strength and a viscosity modifier to simulate the crowded environment inside the cells. AUC is probably the least destructive to the sample of the techniques listed. AUC can separate a heterogeneous population into individual species and extract the molecular mass and the overall shape of the molecule for each (Lebowitz et al., 2002).

1.7. G4 distribution in the maize genome

Only a few of the genome-wide computational screens for G4 motifs include representatives of the plant kingdom (Mullen et al., 2010, Lexa et al., 2014, Takahashi et al., 2012)

and biochemical data are even more scarce. Bioinformatic and biochemical investigations in plants allow cross-kingdom comparisons in search of conserved pathways regulated by G4s, and have the potential to identify novel plant-specific regulatory pathways that may have a direct impact on food, fuel, and biomass production. Maize is one of the most important agricultural products worldwide, with a yearly production of almost one billion tons (United States. Department of Agriculture. Economics and Statistics Service. et al.). Maize is also a well-characterized model plant organism, with a fully sequenced genome and a rich history of genetic discoveries. Historically, maize has been an organism of choice for investigations of telomere functions (Blackburn et al., 2006), mobile DNA elements (Ravindran, 2012), epigenetics (Urnov and Wolffe, 2001), and genetic diversity (Coe, 2001). We used the Quadparser algorithm to determine the G4 distribution in the genome of maize, Zea mays B73 (Andorf et al., 2014). Analysis of 43,000 identified putative canonical G4s revealed that these sequences are widespread, enriched in genes, and more frequent in genes with functions in hypoxic responses, energy metabolism, and inositol phosphate metabolism. Prominent among these were enzymes for the altered carbohydrate metabolism and redox reactions that would be important for plant cell acclimation under hypoxia. Such hypoxic conditions can occur naturally, for example in some cells of phloem, endosperm, and anthers, or as a result of environmental conditions such as flooding. Systematic examination of the locations of G4 motifs in maize genes revealed that, for any given gene, there were two major hotspots at sites we designated A5U (located in the 5’ untranslated region of a gene) and A5I (located near the first intron/exon boundary). Together these two positions accounted for more than 90% of all the gene-associated G4 motifs.
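A motif scan of this kind can be sketched with a regular expression. The pattern below is the commonly cited canonical G4 motif, four runs of at least three guanines separated by loops of one to seven nucleotides; whether the genome-wide analysis used exactly these parameters is an assumption here, and the function name and return format are hypothetical.

```python
import re

# Assumed canonical motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
G4_ON_GIVEN_STRAND = re.compile(
    r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")
# The same motif on the complementary strand appears as C-runs.
G4_ON_OPPOSITE_STRAND = re.compile(
    r"C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}")


def find_g4_motifs(sequence):
    """Return sorted (start, end, strand) tuples for non-overlapping motifs.

    Coordinates are 0-based and end-exclusive on the given sequence.
    '+' means the G-runs are on the given strand, '-' on its complement.
    """
    seq = sequence.upper()
    hits = [(m.start(), m.end(), "+")
            for m in G4_ON_GIVEN_STRAND.finditer(seq)]
    hits += [(m.start(), m.end(), "-")
             for m in G4_ON_OPPOSITE_STRAND.finditer(seq)]
    return sorted(hits)


human_telomere = "TTAGGG" * 4          # motif on the given strand
telomere_complement = "CCCTAA" * 4     # motif on the opposite strand
g_tract_mutant = "TTAGAG" * 4          # G-runs disrupted, no motif

telomere_hits = find_g4_motifs(human_telomere)
```

Reporting the strand alongside each hit is what makes it possible to ask strand-specific questions, such as whether motifs near a transcription start site lie on the template or non-template strand.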
Given the locations of the A5U and A5I elements relative to transcription start sites (TSSs) and the first exon-intron boundary, we speculate that these maize G4 motif elements play a role in transcription. More specifically, we predict that they impact the processivity of RNA polymerases. We also noted that maize and humans share G4 hotspot locations, but on opposite strands. Specifically, in humans, most of the TSS and first-intron G4 motifs are on the sense (coding/non-template) strand, whereas in maize, the TSS- and first-intron-associated G4 motifs are almost entirely on the antisense (template) strand. The biological significance of this difference is not known, but it may reflect a divergent evolutionary deployment of G4 motifs for genetic functions in different phylogenetic taxa. These G4 motifs could also provide strand-independent cis-regulatory functions as structure-specific nucleation or recruitment sites for molecules involved in other processes such as DNA replication, strand-unwinding, DNA repair, epigenetic marking, chromatin-remodeling, or RNA metabolism. We tested several oligonucleotides with genic G4 motifs to see whether they could fold into G4 structures in vitro, as summarized in Figure 1.4.

Figure 1.4: Location and folding of select G4 motifs (from Andorf et al., 2014). Each schematic shows 750 bases after the TSS and the last 250 bases of the canonical transcript for each given gene model. The arrow on the top strand denotes the TSS, the arrow on the bottom strand denotes AUG, the locations of G4 motifs are depicted as a three-sheet stack on the appropriate strand, coding regions are wide black boxes, UTR regions are narrow black boxes, and introns are arrowed lines. A. Maize telomere (TTTAGGG repeat); B. shrunken1, sucrose synthase with an A5U overlapping the TSS; C. Maize hexokinase4, a hexokinase domain protein with three quadruplexes: A5U near the TSS, AUG overlapping the start codon, and an A5I1 in the first intron; D. Maize hre3 (GRMZM2G148333, hypoxia responsive ERF homologous3) with one A5U G4 motif and one G4 motif on the template strand immediately after an alternative transcript coding stop site; E. Maize hrap2 (GRMZM2G171179, hypoxia responsive RAP2 homologous2), with three tandem A5U G4 motifs between the TSS and start codon. F. Normalized UV absorbance thermal difference spectra for selected synthetic oligonucleotides: human telomeric repeat, maize telomeric repeat, and shrunken1 A5U. The human telomeric repeat was used in this experiment as a positive control. A G4-characteristic TDS profile and prominent negative peak at A295 were obtained only for WT sequences annealed in the presence of 100 mmol/L potassium (filled triangles) but not for those annealed in TBA phosphate buffer alone (open triangles). Mutant sequences (mut) did not show the G4-characteristic signature. The G4-characteristic signature is denoted by an arrow and bracketed region in the first panel, H.s. telomere, and is seen in the other panels for the “WT + K” samples (filled triangles). G. Normalized UV absorbance thermal difference spectra for three different G4 motif oligos in maize hexokinase4.

Single-stranded synthetic oligonucleotides were incubated under G4-forming conditions, and thermal difference spectra showed a diagnostic G4-specific negative peak at 295 nm (Figure 1.4). This A295 signature was observed in a positive control sample of human telomere repeat DNA, (TTAGGG)4, and also in a plant telomere oligonucleotide sample, (TTTAGGG)4, under the same conditions (Figure 1.4). The locations of several genic G4 motifs in the shrunken1 and hexokinase4 genes are shown. Oligonucleotides with mutations that altered the G-tracts, or wild-type oligonucleotides in the absence of potassium, failed to show spectra indicative of G4 structures in these assays. These results were corroborated using CD spectroscopy. Together, these results confirmed our expectation that computationally predicted G4 motifs can adopt quadruplex structures in vitro.

CHAPTER 2

THE MAIZE (ZEA MAYS L.) NUCLEOSIDE DIPHOSPHATE KINASE1 (ZMNDPK1)

GENE ENCODES A HUMAN NM23-H2 HOMOLOG THAT BINDS AND STABILIZES

G-QUADRUPLEX DNA

2.1. Introduction

Guanine quartets occur when four guanines in a single-stranded DNA or RNA form Hoogsteen base quartets with one another, surrounding a central monovalent cation that is typically a potassium ion (Huppert, 2008). Sequential quartets can stack, forming an extended secondary structure where loops of single-stranded nucleic acid connect the quadruplexed guanines in a variety of topologies (Burge et al., 2006, Zhang et al., 2014a). In vivo, these G-quadruplex (G4) motifs occur in G-rich regions of DNA, such as those found on the 3’-terminal strand in the telomeres of linear chromosomes (Juranek and Paeschke, 2012, Lipps and Rhodes, 2009). Additionally, a preponderance of G4 motifs are found at genes, in particular at promoter regions, the 5’ ends of first introns, and 5’ and 3’ untranslated regions (UTRs) of some mRNAs, as well as in the long terminal repeats (LTRs) of some classes of plant retrotransposons (Mullen et al., 2010, Maizels and Gray, 2013, Huppert and Balasubramanian, 2007, Lexa et al., 2014, Murat and Balasubramanian, 2014, Andorf et al., 2014). G4 elements are implicated in DNA replication, telomere metabolism, and genome rearrangements; evidence for the specific involvement of DNA G4 structures, whose presence is affected by protein binding partners, in regulating these activities comes from nuclear staining of G4s and from functional disruption of both DNA and protein elements (Maizels and Gray, 2013, Gray et al., 2014). Further, Escherichia coli G4s encoded in the antisense strand at promoters result in decreased gene expression whereas G4s encoded in the antisense strand of 5’ UTRs result in enhanced gene expression (Holder and Hartig, 2014).
Together, these discoveries reveal a remarkably broad and diverse deployment of G4s in nature, promoting the experimentally supported hypothesis that G4s participate in control of gene expression, including transcriptional initiation, splice site selection, or protein translation (Siddiqui-Jain et al., 2002, Lam et al., 2013, Bochman et al., 2012). G4 formation by short, G-rich oligonucleotides in vitro is strongly stabilized (Shen et al., 2005) by the addition of K+ (Lane et al., 2008). In vivo, formation and function of G4s are likely influenced by trans-acting protein factors that stabilize or destabilize the G4 secondary structure,

identified so far in Archaea, yeast, and metazoan species (Biffi et al., 2013, Henderson et al., 2014, Lam et al., 2013, Hwang et al., 2014, Ray et al., 2014, Paeschke et al., 2005, Gray et al., 2014). Early on, the Puf transcription factor was identified as binding the G-rich region of DNA that is upstream of the c-myc promoter (Postel et al., 1989). More recently, a number of G4-binding proteins with diverse functions have been identified, illustrated by, but not limited to, the Xeroderma Pigmentosum B (XPB) and Xeroderma Pigmentosum D (XPD) helicases (Gray et al., 2014); hnRNP A1 and its derivative Unwinding protein 1 (Up1) (Cogoi et al., 2008); and the telomeric binding proteins Protection of Telomere-1 (POT1) (Ray et al., 2014) and Repressor-activator protein 1 (RAP1) (Giraldo et al., 1994). No universal mechanism for G4 modulation by protein partners has emerged, perhaps because each protein-nucleic acid binding partner pair is unique, which also suggests that more work is needed to fully define how G4 elements and their protein binding partners interact. Additionally, studies to date on genic G4-binding protein partners are primarily limited to bacteria, Archaea, yeast, and metazoan species, leaving open the question we address here: do members of the plant kingdom have protein partners that interact with their predicted genic G4s?

An emerging idea is that nucleoside diphosphate kinase (NDPK, EC 2.7.4.6), which occurs ubiquitously, serves as a master regulator of diverse pathways in the cell, whereas it was once thought to be responsible only for maintaining the nucleoside triphosphate pool throughout the cell, including within organelles. Surprisingly, after being recognized as important in c-myc expression, the Puf transcription factor was subsequently identified as NM23-H2, a member of the non-metastatic 23 (NM23) family that binds to the G-rich c-myc nuclease hypersensitive element III1 (Postel et al., 1993). The ten members of the human NM23 family are defined by sequence conservation and can be divided into two groups based on whether or not they exhibit kinase activity (Bilitou et al., 2009). Those that are kinases possess the characteristic residues that drive the enzyme’s double-displacement (ping-pong) catalytic mechanism, first described for the yeast NDPK (Garces and Cleland, 1969). A wide range of characteristics has been attributed to NDPK proteins in addition to trans-phosphorylation of nucleoside diphosphates, supporting the hypothesis that these enzymes serve a broader regulatory role in the cell. For example, NDPK has been found to bind G-rich single-stranded oligonucleotides (Postel, 2003); oligomerize upon binding GTP to form microtubule-like bundles (Morin-Leisk and Lee, 2008); act as a protein histidine kinase (Freije et al., 1997); activate (Shen et al., 2008); or alter membrane

dynamics by providing a steady stream of GTP to the molecular motors (Francois-Moutal et al., 2013, Boissan et al., 2014). Additional functions for NDPKs in cell proliferation, signaling, and development are well documented for both fungal and animal species (Bilitou et al., 2009, Lombardi et al., 2000). Plant NDPKs are known to facilitate analogous housekeeping and regulatory cellular functions, as well as to play a role in abiotic stress response (Shen et al., 2005, Finan et al., 1994, Dorion et al., 2006, Nomura et al., 1992, Dancer et al., 1990, Kim et al., 2011, Kim et al., 2009, Wang et al., 2014), but their DNA-binding properties have not been described.

The locations of G4 sequence motifs in the maize genome were recently identified computationally, and these G4 motifs are enriched at promoters and introns of thousands of genes, many of which are coupled to energy stress signaling (Andorf et al., 2014). To understand the potential roles of these elements in this major crop and model plant species, we undertook an unbiased ligand-binding screen to identify proteins that interact with maize G4-forming sequences. From this screen, we identified maize cDNA clones for a gene encoding ZmNDPK1, representing the first biochemically defined G4-binding protein from plants. Here, we biochemically and structurally characterized ZmNDPK1 and its G4 interactions, with comparative analysis of its human homolog, NM23-H2, allowing us to propose a model for NDPK interactions with G4 structures.

2.2. Experimental Procedures

2.2.1. Phage library screen

A library of maize cDNAs was previously generated from meiosis-enriched tassels (library 11, inbred line W23, a gift from J. M. Gardiner, University of Arizona, Tucson) and cloned in the Lambda Zap II expression vector (Agilent Technologies, Santa Clara, CA).
Phage λ harboring this library were subsequently used to infect Y1090a Escherichia coli for protein expression and ligand-binding screening according to the published method (Singh et al., 1988). Plaques were lifted onto nitrocellulose and then the membranes were blocked using 5% bovine serum albumin fraction V (BSA), 50 mM Tris-HCl (pH 7.4), 50 mM NaCl, 1 mM EDTA-NaOH (pH 8), and 1 mM DTT. To reduce detection of endogenous non-specific biotinylation from the E. coli host, the membranes were next blocked with 2 μg/mL avidin in 12 mM potassium/sodium phosphate (pH 7), 137 mM NaCl, 2.7 mM KCl, and 0.05% Tween-20 (PBST), washed in PBST, and blocked again in 100 nM biotin in PBST. 60 nM biotinylated G4 folded oligonucleotide from the antisense strand of the 5’ UTR of the maize hexokinase4 gene (the oligonucleotide is hereafter called hex4_A5U, “A5U”

for antisense 5’ UTR; Figure 2.1 and Table 2.1) in 10 mM Tris-HCl (pH 7.4), 10 mM KCl, and 40 mM NaCl was applied and then washed with the same buffer minus probe. All G4 or G4 mutant oligonucleotides were annealed according to the same procedure: oligonucleotides were mixed in KCl- or LiCl-containing buffers, as designated, heated to 95 °C for 15 minutes, and cooled to room temperature overnight. The G4 state of all oligonucleotides was verified with thermal difference spectroscopy (TDS, Figure S1 and Table 2.1). Membranes were crosslinked in a Stratalinker 1800 (Agilent, Santa Clara, CA) and the oligonucleotide-binding plaques were identified by detection of biotin with NeutrAvidin-conjugated horseradish peroxidase (HRP, Thermo Scientific, Rockford, IL), developed with chloronaphthol- and diaminobenzidine-based (CN/DAB) colorimetric stain (Thermo Scientific, Rockford, IL). Initial hits were plaque purified and the ZmNDPK1 gene was identified from 3 independent clones using polymerase chain reaction (PCR) with M13 primers followed by sequencing of the PCR product. A full-length cDNA clone for the ZmNDPK1 gene (GenBank accession number KM347972) was used as a template for PCR amplification using sequence-specific primers (complementary regions are underlined, forward: GCTTAGCATATGGAGAGCACCTTCATC and reverse: CTTATCGAATCCTTACTTCTCGTAGATCCAGG) and the PCR product was subsequently subcloned into the pET28a (EMD Millipore, Darmstadt, Germany) vector as a hexa-histidine fusion. The resulting N-terminal six-histidine-tagged fusion protein was used for this study. All oligonucleotides for G4 studies and cloning were purchased from Eurofins Genomics, Huntsville, AL.

2.2.2. Protein expression and purification

BL21(DE3) E. coli cells were transformed with ZmNDPK1 pET28a or NM23-H2 pET28a and cells were grown to an A600 of 0.4-0.6 at 37 °C before induction with 0.5 mM IPTG for overnight expression. The protein purification for both homologs was identical.
Cells were harvested and resuspended in 50 mM sodium HEPES (pH 7.5) and 100 mM KCl buffer (buffer A) before being lysed with a microfluidizer. Lysate was clarified through centrifugation at 16,000 × g for 30 minutes at 4 °C in an Eppendorf F-34-6-38 fixed angle rotor (Hamburg, Germany). Clarified lysate was then passed over nickel-nitrilotriacetic acid (NTA) (GE Healthcare, Piscataway, NJ), washed in 50 mM imidazole + buffer A, washed in 50 mM imidazole + buffer A + 1 M KCl, re-equilibrated into 50 mM imidazole + buffer A, and then recovered by imidazole gradient elution around 400 mM imidazole. Eluate was dialyzed against buffer A before being concentrated for size

exclusion chromatography over a Sephacryl S100 column (GE Healthcare, Piscataway, NJ). All protein concentrations were determined from the absorbance at 280 nm (A280) using the BCA-determined extinction coefficient and are expressed in terms of the concentration of the monomer.

2.2.3. Point variant generation and purification

H115A and K149A point mutants were generated in the ZmNDPK1 expression construct using the following QuikChange (Agilent Technologies, Santa Clara, CA) primers, where the variant codons are in italics: (H115A: CATTGGCAGGAATGTCATTGCTGGAAGTGACAGCATTGAGAGTGC plus reverse complement; K149A: CCCTGGATCTACGAGGCGTAAGGATTCGATCGAGCTCCGTCG plus reverse complement). Proteins were expressed and purified as described above for the wild-type proteins.

2.2.4. Structure determination

ZmNDPK1 crystals were initially identified from the Hampton Crystal Screen 1 condition 40 (Hampton Research, Aliso Viejo, CA) and subsequently refined to 100 mM sodium cacodylate (pH 6.5), 12% PEG 8000, and 200 mM CaCl2. Crystals were washed in 20% PEG 8000 before flash freezing in liquid nitrogen. X-ray diffraction data to 2.0-Å resolution were collected on Beam Line 22-BM at the Advanced Photon Source within Argonne National Laboratory (Argonne, IL). Data were indexed and reduced in HKL2000 (Otwinowski and Minor, 1997). Phases were determined using molecular replacement in PHENIX (Adams et al., 2010) against an all-alanine model of AtNDPK1 (modified from PDB ID 1U8W (Im et al., 2004)) and the structure was refined with simulated annealing and energy minimization in PHENIX (Adams et al., 2010).

2.2.5. Maize extract pulldown

Seeds from the B73 cultivar of Z. mays were ground in a coffee grinder. Seed powder was hydrated in buffer B (25 mM sodium HEPES (pH 7.5), 25 mM NaCl, 100 mM KCl, 4.5 mM MgCl2, 5 mM EDTA, 10% weight/volume glycerol, 0.2 mM PMSF, and 0.02% protease inhibitor cocktail (Sigma-Aldrich, St. Louis, MO)) and extracted in a Dounce homogenizer. Solids were removed by centrifugation and the extract was separated with heparin affinity chromatography. ZmNDPK1-containing fractions were identified with Western blot analysis using a new ZmNDPK1-specific antibody (Figure 2.2). Magnetic Dynabeads MyOne Streptavidin C1 beads (Life Technologies, Carlsbad, CA) were prepared by blocking with buffer A plus 0.5% BSA and 10 μg/mL heparin (Alfa Aesar, Ward Hill, MA) before addition of biotinylated G4 folded hex4_A5U

or biotinylated unfoldable hex4_A5Um sequences (Figure 2.1 and Table 2.1) with an additional 30 nt linker to biotin. ZmNDPK1-containing fractions from the heparin affinity separation were mixed with the oligonucleotide-bound beads and incubated for 1 hour at 4 °C with rotation. Bound proteins were washed with buffer B and eluted in buffer B + 1 M KCl. Proteins were separated by SDS-PAGE, transferred to nitrocellulose membrane, and analyzed with the same ZmNDPK1-specific antibody.

2.2.6. Kd and stoichiometry determination

We determined the apparent binding affinity of ZmNDPK1 for folded hex4_A5U using a modified slot-blot binding assay (Ryder et al., 2008, Woodbury and von Hippel, 1983, Czerwinski et al., 2005). Probe, 0.1 nM biotinylated G4 folded hex4_A5U DNA (Figure 2.1 and Table 2.1), was mixed at 22 °C with increasing ZmNDPK1 protein concentrations from 0 to 55 nM in buffer C (10 mM sodium HEPES (pH 7.5) and 20 mM KCl supplemented with 10 μg/mL heparin). The mixture was then applied to a slot blot apparatus that passed the solution through stacked Hybond-C Extra nitrocellulose, 0.45 μm pore size (GE Healthcare Life Sciences, Piscataway, NJ) and negatively charged Nytran N nylon (GE Healthcare Life Sciences, Piscataway, NJ) membranes via capillary action. The binding experiments were designed to show specificity within the salt tolerance of the experiment: increasing salt causes non-specific nucleic acid binding to nitrocellulose (Thomas, 1980). Notably, addition of up to 1,000-fold excess random ssDNA in addition to the heparin did not affect the shift of the oligonucleotide to nitrocellulose, although it did quench the biotin signal on the nylon membrane (data not shown). At 0.1 nM oligonucleotide, no DNA binds nitrocellulose in the absence of protein, but DNA is retained on nitrocellulose in the presence of increasing amounts of protein (Figure 2.3). Both membranes were UV-crosslinked before blocking. Nitrocellulose membranes were blocked for one hour in 2% BSA in 50 mM Tris (pH 7.4) and 150 mM NaCl (TBS) whereas nylon membranes were blocked in 4% dry milk in TBS. The presence of biotinylated oligonucleotide was detected with NeutrAvidin-conjugated HRP (Thermo Scientific, Rockford, IL) that was developed with CN/DAB colorimetric stain (Thermo Scientific, Rockford, IL). Membranes were scanned on a flatbed scanner and the integrated intensities of the slots were determined in ImageJ (Schneider et al., 2012).
This protocol was followed for ZmNDPK1 and biotinylated hex4_A5U in buffer C with LiCl rather than KCl, spanning protein concentrations of 0 to 2200 nM. Slot blots were repeated with NM23-H2 and biotinylated Pu44, the well-characterized human G4 element (Dexheimer et al., 2009) (Figure 2.1 and Table 2.1), in buffer C with KCl or LiCl, spanning protein concentrations of 0 to 18 nM for both. We also performed the experiments titrating ZmNDPK1 from 0 to 2000 nM against biotinylated Pu44 in KCl or LiCl and NM23-H2 from 0 to 500 nM against biotinylated hex4_A5U in KCl or LiCl. We then titrated ZmNDPK1 point variants against biotinylated hex4_A5U. H115A spanned the concentration range 0 to 250 nM in KCl and 0 to 2000 nM in LiCl. K149A spanned the concentration range 0 to 2000 nM in both KCl and LiCl.
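Titration data of this kind are reduced to an apparent Kd and Hill coefficient using the modified Hill equation given as equation 1 in this section, Fraction bound = 1 / (1 + (Kd/[Pt])^h). The sketch below is not the in-house script used in this work; it is a minimal illustration that assumes a linearized Hill plot, log10(f/(1-f)) = h·log10([Pt]) - h·log10(Kd), fitted by ordinary least squares. The function names and the synthetic data are hypothetical.

```python
import math


def hill_fraction_bound(protein_nM, kd_nM, h):
    """Modified Hill equation: 1 / (1 + (Kd / [Pt])^h)."""
    if protein_nM <= 0:
        return 0.0
    return 1.0 / (1.0 + (kd_nM / protein_nM) ** h)


def fit_hill_linearized(protein_nM, fraction_bound):
    """Estimate (Kd, h) from a Hill plot by linear least squares.

    Points with [Pt] <= 0 or a fraction outside (0, 1) cannot be
    log-transformed and are skipped.
    """
    xs, ys = [], []
    for p, f in zip(protein_nM, fraction_bound):
        if p > 0 and 0.0 < f < 1.0:
            xs.append(math.log10(p))
            ys.append(math.log10(f / (1.0 - f)))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    h = sxy / sxx                      # slope of the Hill plot
    intercept = mean_y - h * mean_x    # equals -h * log10(Kd)
    kd = 10.0 ** (-intercept / h)
    return kd, h


# Synthetic, noise-free titration (hypothetical values, nM).
concentrations = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 55.0]
fractions = [hill_fraction_bound(p, kd_nM=5.0, h=1.5) for p in concentrations]
kd_fit, h_fit = fit_hill_linearized(concentrations, fractions)
```

The linearized form weights points near saturation heavily when real data are noisy, so a nonlinear least-squares fit of the untransformed equation is generally preferable for experimental data.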

The Kd was calculated by fitting the modified Hill equation, equation 1, to three independently determined plots of increasing protein concentration against the fraction of protein- bound DNA using an in house Python script (Enthought, Austin, TX). In equation 1, Pt is the protein concentration in terms of monomers, calculated from its A280 and experimentally determined extinction coefficient, and h is the Hill coefficient (Table 2.2). 1 퐹푟푎푐푡푖표푛 푏표푢푛푑 = (1) 퐾 ℎ 1+( 푑 ) [푃푡] The stoichiometry of binding was determined by titrating protein against 5 or 10 nM biotinylated G4 folded hex4_A5U in buffer C. Crosslinking, blocking, and detection was performed as described above. 2.2.7. Competition experiments Biotinylated G4 folded hex4_A5U at 1 nM was mixed with KCl-annealed competitor (Figure 2.1 and Table 2.1) at 1, 10, and 100-fold molar excess competitor over biotinylated probe and then 3 nM ZmNDPK1 was added. The mixture was incubated for 15 or 60 minutes before being applied to the slot blot apparatus. Each competitor oligonucleotide was treated in the same way. Nitrocellulose retention was measured in triplicate using the NeutrAvidin/CN/DAB protocol described above. Percent retention was calculated against a zero competitor control. Nucleotide competition was performed in a similar way using identical protein and DNA concentrations but using 0-1 mM ATP, ADP, GTP, or GTPγS and 2.5 mM MgCl2 in place of the competing oligonucleotide. 2.2.8. Activity assays The standard assay for this activity is an enzyme-coupled assay in which NDPK converts ATP + TDP to ADP + TTP. Pyruvate kinase (PK), in turn, transfers a phosphate from phosphoenolpyruvate (PEP) to ADP, regenerating ATP and making pyruvate. Finally, lactate dehydrogenase (LD) converts pyruvate and β-NADH to the spectroscopically inactive NAD+ and

this decrease in absorbance is monitored at 340 nm (Bergmeyer, 1974). ZmNDPK’s kinase activity was measured as free enzyme at 2.5 nM concentration or in the presence of saturating (250 nM) hex4_A5U in either 20 mM KCl or 20 mM LiCl, that is, with either G4-folded or single-stranded G-rich oligonucleotide. Buffer conditions included 100 mM Tris-HCl (pH 7.5), 10 mM

MgCl2, 0.7 mM TDP, 6 mM ATP, 0.4 mM NADPH, 4 mM PEP, 10 U PK, and 10 U LD. Standard activities for wild type and point variant (H115A or K149A) ZmNDPK1 were measured in 100 mM KCl. Activities were measured in the kinetics mode of an Agilent 8453 spectrophotometer at 25 °C. Activity measurements of NM23-H2 were also made using the same assay, as free enzyme and with G4-folded or single-stranded G-rich Pu27 oligonucleotide (a shortened form of Pu44 (Dexheimer et al., 2009), Table 2.1).

2.2.9. G4 DNA folding experiments

Förster resonance energy transfer (FRET) probes were designed based on the hex4_A5U or Pu27 G4 oligonucleotide sequences (Table 2.1) but labeled 5’ with 6-carboxyfluorescein (6-FAM) and 3’ with carboxytetramethylrhodamine (TAMRA) (Eurofins Genomics, Huntsville, AL). Single-label oligonucleotides were also generated as controls for FRET efficiency. Reactions were set up in triplicate in black 96-well Nunclon plates (Thermo Fisher Scientific) and each reaction contained 200 μl of 200 nM FRET probe in 10 mM tetrabutylammonium phosphate (TBA) buffer (pH 7.5) supplemented with either 20 mM KCl or LiCl. Protein was added at increasing concentrations from 100 to 900 nM (0.5- to 9-fold excess) and plates were incubated for 1 hour at 4 °C before data collection. Data were collected on a Spectramax M5 multi-mode plate reader (Molecular Devices) using a 475 nm excitation wavelength and a 515 nm cut-off. Emission spectra were collected at 21 °C from 500 to 640 nm with a 2 nm step and 10 reads per well. FRET efficiency was characterized by computing the parameter P (Salas et al., 2006), which we refer to as PF:

$$P_F = \frac{I_d}{I_d + I_a} \qquad (2)$$

where Id is the intensity of the FRET donor (FAM) at 520 nm and Ia is the intensity of the FRET acceptor (TAMRA) at 584 nm.

2.2.10. Additional methods

Thermal Difference Spectroscopy (TDS). A modified protocol¹ was used to test the folded state of the putative G4-forming oligonucleotides. Briefly, oligonucleotide in 10 mM tetrabutylammonium phosphate (TBA) buffer with either 20 mM KCl or 20 mM LiCl was annealed by

heating to 95 °C for 15 minutes and slow cooling overnight. A UV/visible spectrum was measured of 2.5 μM annealed oligonucleotide at 25 °C. Oligonucleotide was again heated to 95 °C to melt potential tertiary structure and a second spectrum was acquired. The first spectrum (i.e. annealed) is subtracted from the second (i.e. melted) and the resulting difference spectrum normalized to a maximum absorbance of 1 (Figure 2.1). A negative dip at 295 nm is indicative of G4 formation in the cold sample that melts upon heating. The absence of that signal suggests the oligonucleotide was single stranded after annealing.

ZmNDPK1 antibody. A polyclonal antibody was raised in New Zealand White – SPF rabbits against an HPLC-purified, amide-coupled synthetic peptide derived from the C-terminal region of ZmNDPK1 (antibody 133 from peptide EGPADWQSSQH, Figure 2.4) (New England Peptides, Gardner, MA). Antibodies were affinity purified by New England Peptides against their respective peptide. The antibody specifically recognizes both the native and recombinant maize homologs of NDPK but not the recombinant human NM23-H2 (Figure 2.2).

Representative slot blot. Mixtures of biotinylated G4 DNA (0.1 nM) and increasing concentrations of ZmNDPK (0-55 nM) were applied to a slot blot apparatus and passed through nitrocellulose then nylon membranes. The presence of DNA was detected using horseradish peroxidase-coupled streptavidin and developed with a colorimetric dye (Figure 2.3; see Methods). The intensities of the slots were measured, background corrected, and the Kd determined by plotting the % retained on the nitrocellulose (Figure 2.5; see Methods).

Phylogenetic analysis. Proteins similar to ZmNDPK1 or ZmNDPK2 were selected from the top-scoring results of tBLASTn (http://blast.ncbi.nlm.nih.gov/) sequence similarity searches. Protein sequences for NDPK1 or NDPK2 were used to query the NCBI non-redundant nucleotide (nr/nt) databases.
A group of 33 top-scoring BLAST hits, plus three others (Arabidopsis NDPK1, and two human NDPKs – NM23-H1 and NM23-H2) were aligned as protein sequences by Clustal Omega². The alignment was used to produce a Clustal Omega phylogenetic tree (Figure 2.5A) using default parameters (neighbor-joining, without distance corrections) and plotted using Treedyn³. GenBank Accession numbers and additional information about the protein sequences are included in Figure 2.5B.

Native gel analysis. Blue native polyacrylamide gel electrophoresis (bnPAGE, Life Technologies, Grand Island, NY) was performed according to the manufacturer’s specification with the following modifications. ZmNDPK1 was labeled by taking advantage of its ability to

autophosphorylate at H115 by mixing it with 30 nM γ32P-ATP. Either 25 μM or 20 nM labeled ZmNDPK1 alongside 30 nM γ32P-ATP as a control were electrophoresed and shown to run as hexamers or dodecamers, but not dimers or tetramers (Figure 2.6). The gel was exposed to a storage phosphor screen (GE Lifesciences, Pittsburgh, PA) overnight and scanned on a STORM imager (GE Lifesciences, Pittsburgh, PA).

2.3. Results

2.3.1. The ZmNDPK1 gene encodes a protein with G4-binding activity

ZmNDPK1 was identified from a G4 ligand-binding screen of a maize meiosis-enriched tassel cDNA phage expression library, modified from a dsDNA ligand screen (Singh et al., 1988). A biotinylated oligonucleotide corresponding to a G4 motif (G4v2-53046) (Andorf et al., 2014) from the antisense strand of the 5’ UTR of the maize hexokinase4 gene (hex4_A5U) was pre-folded in KCl into the G4 conformation and used as the ligand. The oligonucleotides used in this study, and their names, are given in Table 2.1. G4 formation was verified with thermal melt difference spectroscopy (TDS) (Figure 2.1) and shown to be K+ (Figure 2.1A and B) and sequence (Figure 2.1C and D) specific. From approximately one million phage screened with a plaque lift filter binding assay, we purified several independent cDNA clones, each corresponding to a previously uncharacterized maize gene predicted to encode an NDPK domain (PFAM PF00334 and Figure 2.4). This gene is from one of at least nine loci in the maize genome that code for proteins with an NDPK domain. The predicted protein from our maize cultivar W23 full-length cDNA clone, GenBank accession number KM347972, is 149 amino acids (aa) in length, beginning with MES and ending with YEK and corresponding to transcript “T03” from gene model GRMZM2G178576 of the reference genome cultivar B73. Here we name this maize gene nucleoside diphosphate kinase1, ZmNDPK1, and its protein product ZmNDPK1 (UniGene Zm.92368, with reference sequence: XP_008658458.1, Figure 2.4A).
RNA from this locus is detected in many different maize tissues, with particularly high levels in developing and matured seed (Sekhon et al., 2011). Compared to NDPKs from other eukaryotes, this clone shares 83% aa identity with Arabidopsis thaliana NDPK1 (AtNDPK1), 58% aa identity with AtNDPK2, 54% aa identity with AtNDPK3, and 63% aa identity with Homo sapiens NM23-H2 (Figure 2.4B). Based on this sequence identity pattern, we note that ZmNDPK1 has higher sequence identity to human NM23-H2 (also called NME2) than to some other plant NDPKs (Figure 2.4). Further phylogenetic analysis of protein sequence similarity searches revealed that the NDPKs with a terminal lysine

residue (referred to here by their three terminal amino acids, YEK) were found in several other grass species, all of which were in the Panicoideae subfamily of the Poaceae family of monocotyledonous (grass) plant species (Figure 2.5).

The full-length cDNA was subcloned into a bacterial overexpression vector (pET28a) to produce an N-terminal hexa-histidine (his-tag) fusion. We also generated the equivalent expression construct for NM23-H2. Both proteins overexpressed strongly and were purified by nickel affinity chromatography with a high salt wash to remove non-specific DNA. Size exclusion chromatography further purified the protein to high homogeneity. Protein concentration of the purified protein was measured by the bicinchoninic acid (BCA) colorimetric assay (Thermo Scientific, Rockford, IL) and the extinction coefficient at 280 nm was calculated to be 35,530 cm⁻¹ M⁻¹. In this study, all concentrations are expressed in relation to the protein monomer concentration. Full-length, purified, recombinant ZmNDPK1 was stored in 50 mM HEPES (pH 7.5)/100 mM KCl and used for all subsequent experiments.

2.3.2. Structure determination of ZmNDPK1

Crystals in the space group P2₁, with two hexamers in the asymmetric unit and a unit cell of 66.9 Å × 179.1 Å × 99.5 Å, diffracted to 2 Å resolution. The structure refined to an Rfree of

20.1% and Rwork of 16% (Table 2.3). ZmNDPK1 shares the general structural features of other hexameric NDPKs, including the active site histidine that participates in the transphosphorylation reaction (Figures 1 and 2). Indeed, ZmNDPK1 maintains its hexameric assembly state even at 20 nM monomer concentration but can form dodecamers, as seen in the asymmetric unit, at higher concentrations (Figure 2.8). The known nucleotide binding active site is a long, positively charged cleft on the surface of the protein that lies between the core α/β fold and the hairpin turn that connects α-helices 3 and 4 (Figure 2.7A and C). Despite strong sequence and structural conservation among this NDPK, two Arabidopsis homologs and one human homolog (Figure 2.4), ZmNDPK1 has a distinctive surface charge distribution determined by specific amino acids throughout the protein, several of which have been previously shown to be important for DNA binding in NM23-H2 (Postel et al., 1996) (Figure 2.7A and B). One of these non-conserved stretches, Ala53 to Phe57, localizes to the hairpin turn that contributes to the active site as well as to both the three-fold and two-fold faces of the hexamer. The second non-conserved stretch, Pro132 to Lys149, creates a long loop that sits at the surface of the dimer interface. This

stretch ends at the C-terminal lysine (Lys149), which is unique to ZmNDPK1 and contributes to a positively charged patch at the edge of the trimer and dimer faces (Figure 2.7). The divergent hairpin forms the upper wall of the active site cleft, sitting at the edge of the trimer and dimer interfaces (Figure 2.7A). Interestingly, this hairpin also forms one interface between successive hexamers in the dodecamer (Figure 2.6B). Consequently, the loop of the divergent hairpin in the interface subunit shows the only structural divergence from the common hexameric NDPK structure shared by NM23-H2, ZmNDPK1, and other eukaryotic NDPKs. Specifically, when the monomer subunits are superimposed, the average root mean square distance (RMSD) between the 145 Cαs of the 11 most similar subunits is 0.3 ± 0.1 Å whereas the 11 Lys55 Cαs from the same 11 subunits are within 0.4 ± 0.2 Å of one another. In contrast, the average RMSD for the 145 Cαs of the subunit that contains the divergent active site loop aligned to the other 11 subunits is 0.5 ± 0.04 Å but Lys55 Cα from the divergent loop is 2.8 ± 0.3 Å away from the other Lys55 Cαs.

2.3.3. Native and recombinant ZmNDPK1 bind to G4 element hex4_A5U

To confirm that the interaction between hex4_A5U and ZmNDPK1 also occurs with native ZmNDPK1 from maize, we mixed biotinylated, folded hex4_A5U oligonucleotide or a biotinylated unfoldable mutant hex4_A5Um (Figure 2.1 and Table 2.1) with heparin-fractionated maize seed extract and precipitated the proteins that interact with either oligonucleotide. A Western blot probed with an antibody raised against a unique ZmNDPK1 peptide (Figures 1 and S2) showed that native, endogenous ZmNDPK1 co-precipitates with the folded hex4_A5U oligonucleotide but not the unfoldable mutant (Figure 2.9).

2.3.4. ZmNDPK1 and NM23-H2 have different G4 binding properties

Given the qualitative similarities between ZmNDPK1 and NM23-H2, we compared the binding properties of the homologs using a straightforward slot blot binding assay. First, we folded either hex4_A5U or Pu44 (Dexheimer et al., 2009) oligonucleotide by heating and slow annealing in the presence of 20 mM KCl (Figure 2.1A and Table 2.1). Second, we measured the retention of the protein:DNA complexes (either ZmNDPK1:hex4_A5U or NM23-H2:Pu44) on nitrocellulose and the binding of free DNA to nylon through a slot apparatus using detection of DNA-coupled biotin (Figure 2.3). Under these conditions, ZmNDPK1 bound pre-folded hex4_A5U with an apparent Kd of 6.8 ± 0.3 nM and a Hill coefficient of 2.1 ± 0.1 whereas NM23-H2 bound pre-folded Pu44 with an apparent Kd of 1.1 ± 0.1 nM and a Hill coefficient of 2.0 ± 0.2 (Figure 2.10A and B, Table 2.2).

Next, we characterized ZmNDPK1 or NM23-H2 binding to its cognate oligonucleotide when the DNA is heated and annealed in the presence of Li+, which does not support G4 formation

(Figure 2.1B). ZmNDPK1 bound unfolded hex4_A5U with an apparent Kd of 288 ± 1 nM and a Hill coefficient of 4.0 ± 0.1 whereas NM23-H2 bound Pu44 with an apparent Kd of 1.6 ± 0.1 nM and a Hill coefficient of 0.7 ± 0.2 (Figure 2.10C and D and Table 2.2). These observations reveal an important and unexpected difference between the G4-binding activity of maize and human NDPK: ZmNDPK1 binds about 40-fold more strongly to pre-folded than to unfolded G4 DNA whereas NM23-H2 binds to both pre-folded and unfolded G4 DNA with similar affinity. Finally, we asked whether the difference in G4 binding by the maize and human NDPKs arose from the protein homolog or from the G4 DNA (Figure 2.4 and Table 2.1). To answer this question, we performed reciprocal binding titrations, i.e. ZmNDPK1 against Pu44 and NM23-H2 against hex4_A5U, using the same slot blot assay. We calculated that ZmNDPK1 binds pre-folded

Pu44 G4 DNA (in KCl) with an apparent Kd of 91 ± 13 nM and a Hill coefficient of 1.0 ± 0.1 but binds unfolded Pu44 G4 DNA (in LiCl) with an apparent Kd of 476 ± 44 nM and a Hill coefficient of 3.9 ± 0.7. In contrast, NM23-H2 binds prefolded hex4_A5U G4 DNA (in KCl) with an apparent

Kd of 33 ± 2 nM and a Hill coefficient of 1.6 ± 0.1 but binds unfolded hex4_A5U G4 DNA (in

LiCl) with an apparent Kd of 24 ± 1 nM and a Hill coefficient of 2.1 ± 0.1.

2.3.5. ZmNDPK1 binding to G4 DNA is specific and they bind with defined stoichiometry

We further tested ZmNDPK1 binding preference using a panel of nucleotide variants to compete for hex4_A5U binding (Table 2.1 and Figures S1 and S7). The unbiotinylated hex4_A5U oligonucleotide was the only DNA sequence tested that competed for binding at 10-fold excess and completely at 100-fold excess whether incubated for 15 minutes or 60 minutes (Figure 2.11A and D). Two other G4-capable oligonucleotides, Pu27 (Dexheimer et al., 2009) and the RNA G4 hex4_AUG (a genic, sense-stranded G4 from the hexokinase4 gene (Andorf et al., 2014)), competed less efficiently than hex4_A5U, suggesting that ZmNDPK1 possesses some general G4 binding ability. Specifically, at 15 minutes hex4_AUG competed with about 50% efficiency at 100-fold excess and Pu27 with about 75% efficiency at 100-fold excess (Figure 2.11B). After 60 minutes, hex4_AUG competed with about 75% efficiency at 10-fold excess but never completely, even at 100-fold excess (Figure 2.11E). That these oligonucleotides behaved differently after

different time points suggests that the system was not at equilibrium at 15 minutes and may not be at 60 minutes. Lower competition by Pu27 compared to hex4_A5U is consistent with the 5-fold weaker binding of ZmNDPK1 to the related Pu44 oligonucleotide (Figure 2.10 and Table 2.2). The RNA hex4_AUG oligonucleotide folds into a G4 even without KCl (Figure 2.1C), which is expected because RNA quadruplexes are more stable than DNA quadruplexes (Arora and Maiti, 2009, Joachimi et al., 2009, Sacca et al., 2005, Zhang et al., 2010). Similarly, the G4-capable oligonucleotide hex4_A5I (another genic G4 from the hexokinase4 gene (Andorf et al., 2014)), hex4_A5Uds, hex4_A5Um, and MS2_A5U2 partially competed only at 100-fold excess after 60 minutes incubation but not after 15 minutes incubation, suggesting either nonspecific oligonucleotide binding or weak binding of the single stranded molecule (Figure 2.11). Unfoldable single stranded mutants (Pu27m or PA) and two stem loop mimics (MS2_A5U1 or MS2_A5U3) did not compete, even at 100-fold excess, under either incubation scheme (Figure 2.11C and F).

Finally, the binding stoichiometry was determined by titrating the protein against saturating G4-folded oligonucleotide. The resulting binding titration showed that one hex4_A5U bound to three ZmNDPK1 monomers (Figure 2.6), suggesting that if ZmNDPK1 is a hexamer, two G4s bind per hexamer.

2.3.6. ZmNDPK1 G4 binding, nucleotide binding, and nucleoside kinase activity

A specific model for G4 binding has already been suggested to explain binding of NM23-H2 to the unfolded, G-rich c-myc nuclease hypersensitive element III1, predicting that the tandem Gs bind sequentially to the three nucleotide binding sites on the trimer face (Dexheimer et al., 2009). However, additional amino acids previously shown to be important for DNA binding by NM23-H2 lie on the two-fold symmetry face of the hexamer (Postel et al., 1996), supporting an alternative model that predicts oligonucleotide binding around the edge of the two faces of the protein (Raveh et al., 2001). The more recent model that places the G-rich oligonucleotide binding at the active sites (Dexheimer et al., 2009) leads to the prediction that saturating nucleotide (nucleotides bind with 10-20 μM affinity (Kandeel and Kitade, 2010)) would sterically compete for oligonucleotide binding. To test this steric hindrance prediction, we examined equilibrium binding of ZmNDPK1 to hex4_A5U in the presence of several nucleotides (Figure 2.12). Up to 1 mM ATP, ADP, GTP, or GTPγS failed to block G4 hex4_A5U binding to ZmNDPK1. Rather, GTPγS may slightly strengthen the interaction between protein and DNA. Concentrations of

around 1 μM ATP appeared to slightly weaken the binding by about 0.7-fold, but that effect goes away with 1 mM nucleotide. Given that neither effect is strong, we interpreted these results as evidence that ZmNDPK1 binds G4 DNA specifically and strongly but not at its active site, perhaps allowing ZmNDPK1 to act as a kinase even in the G4-bound state.

Given the presence of the conserved active site histidine (His115, Figure 2.4A), we predicted that ZmNDPK1 would be an active nucleoside diphosphate kinase, allowing for additional examination of the relationship between kinase and G4-binding activities. Indeed, ZmNDPK1 is an active kinase, although its specific activity is 50% lower than that of NM23-H2 when the kinase assay is performed in 20 mM KCl or 20 mM LiCl (Figure 2.13). When the same kinase assays were performed in the presence of pre-folded (in KCl) or unfolded (in LiCl) G4 motif oligonucleotides, we found that neither the maize nor the human NDPK enzymatic activity was substantially altered (Figure 2.13). These observations, like those from the G4-binding assays in the presence of excess nucleotides (Figure 2.12), support a model in which G4 DNA does not bind at the active site of NDPKs from maize or human.

To further test the hypothesis that the intact active site is not required for G4 binding by ZmNDPK1, we produced the H115A active site point variant and tested it for both kinase activity and G4 DNA binding. Under standard salt conditions (100 mM KCl) the wild-type ZmNDPK1 has a standard activity of 384 ± 5 units/mg. As expected from previous observations with NM23-H2 (Postel et al., 2002, Postel and Ferrone, 1994), ZmNDPK1 has no measurable activity for phosphate transfer without its catalytic histidine (11 ± 12 units/mg).
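For readers unfamiliar with coupled assays, the conversion from a measured absorbance slope to a specific activity can be sketched as follows. This is only an illustration under stated assumptions, not the calculation used in this work: it assumes that one NADH is oxidized per phosphate transferred, that one unit corresponds to 1 μmol of product per minute (the unit definition is not given in the text), a 1 cm path length, and the commonly used NADH extinction coefficient at 340 nm of 6220 M⁻¹ cm⁻¹. The function name and all input values are hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # M^-1 cm^-1, commonly used value for NADH at 340 nm (assumption)


def specific_activity(delta_a340_per_min, volume_ml, enzyme_mg,
                      path_cm=1.0, epsilon=NADH_EPSILON_340):
    """Convert the rate of A340 decrease into units/mg.

    Assumes 1 unit = 1 umol substrate converted per minute and a 1:1
    stoichiometry between NADH oxidized and phosphate transferred.
    """
    if enzyme_mg <= 0:
        raise ValueError("enzyme mass must be positive")
    # Beer-Lambert: concentration change (mol/L per min) = dA / (epsilon * path)
    molar_rate = delta_a340_per_min / (epsilon * path_cm)
    # mol/L/min * L -> mol/min; * 1e6 -> umol/min. With volume in mL this is * 1e3.
    umol_per_min = molar_rate * volume_ml * 1e3
    return umol_per_min / enzyme_mg


# Hypothetical example: A340 falls by 0.0622 per minute in a 1 mL reaction
# containing 0.0001 mg of enzyme.
example = specific_activity(0.0622, volume_ml=1.0, enzyme_mg=1e-4)
print(f"{example:.1f} units/mg")
```

In practice the background rate of NADH oxidation without NDPK would be subtracted from the measured slope before the conversion.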
We then calculated that H115A binds to pre-folded hex4_A5U G4 DNA (KCl) with an apparent Kd of 6.9 ± 0.7 nM and to unfolded hex4_A5U G4 DNA (LiCl) with an apparent Kd of 287 ± 14 nM, akin to wild-type binding (Figure 2.14A and B and Table 2.2). These results further establish the independence of kinase and G4-DNA binding activities in ZmNDPK1.

2.3.7. Lys149 is important for G4 binding

To test the hypothesis that the divergent C-terminus, which contributes to the unique charged surface of ZmNDPK1, is important for G4 DNA binding, we generated the K149A point variant and tested it for hex4_A5U binding. The kinase activity of the K149A variant is not impaired (specific activity in 100 mM KCl of 485 ± 31 units/mg), but hex4_A5U G4 DNA binding in KCl was weakened by about 5-fold, to an apparent Kd of 31 ± 2 nM (Figure 2.14C and Table 2.2). Binding of the K149A variant to unfolded hex4_A5U G4 DNA in LiCl was essentially abrogated such that even at 2 μM

protein (20,000-fold excess protein over oligonucleotide) binding to the nitrocellulose membrane had not saturated (Table 2.2).

2.3.8. ZmNDPK1 binds to folded G4 DNA

Finally, we used FRET fluorescence spectroscopy to determine if ZmNDPK1 binds to folded or to unfolded G4 DNA. We performed binding titrations using FRET-labeled G4 probes, either hex4_A5U or Pu27, against either ZmNDPK1 or NM23-H2 in KCl or LiCl. Interaction between the 5’ and 3’ ends of the oligonucleotide is measured by an increase in FRET acceptor emission (TAMRA at 585 nm) after excitation of the donor (6-FAM at 475 nm). When the oligonucleotide 5’ and 3’ ends are far apart, as they are in unfolded single stranded G-rich DNA, FRET is not detected. When the oligonucleotide 5’ and 3’ ends are close, as in the G4 folded state, FRET signals are measured. An increase in acceptor emission relative to the total donor and acceptor emission is measured by PF (equation 2). PF is close to 1 if the FRET probes are independent and the value decreases with increasing FRET. If NDPK were to stabilize the unfolded form of the quadruplex-forming motif, we predicted that protein binding would diminish FRET and that PF would increase with protein titration. If NDPK were to stabilize the folded form of the quadruplex-forming motif, then we predicted that protein binding would increase FRET and that

PF would decrease with protein titration. No change in PF would occur without the presence of both the donor and acceptor fluorophores.
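The direction of these predictions can be checked with a small numerical sketch of equation 2. The intensities below are hypothetical and chosen only to illustrate the sign of the change in PF; the function names are not from the original analysis.

```python
def p_f(i_donor, i_acceptor):
    """Proximity-type ratio from equation 2: Id / (Id + Ia)."""
    total = i_donor + i_acceptor
    if total <= 0:
        raise ValueError("summed donor and acceptor intensity must be positive")
    return i_donor / total


def delta_p_f(before, after):
    """Change in PF between two (Id, Ia) intensity pairs; negative means more FRET."""
    return p_f(*after) - p_f(*before)


# Hypothetical intensities (arbitrary units) at the donor and acceptor peaks.
no_protein = (1000.0, 250.0)    # PF = 0.80
with_protein = (900.0, 300.0)   # PF = 0.75, i.e. relatively more acceptor emission
change = delta_p_f(no_protein, with_protein)
print(f"PF without protein: {p_f(*no_protein):.2f}")
print(f"PF with protein:    {p_f(*with_protein):.2f}")
print(f"change in PF:       {change:+.2f}")
```

A donor-only probe has Ia near zero, so PF stays near 1 regardless of protein, which is why a single-labeled control is expected to show no change.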

$$P_F = \frac{I_d}{I_d + I_a} \qquad (2)$$

ZmNDPK1 mixed with FRET-labeled hex4_A5U and NM23-H2 mixed with FRET-labeled Pu27 both resulted in increased FRET in G4s that were pre-folded with KCl (Figure 2.15A and B). FRET increase was visualized by normalizing the emission spectra to the donor emission peak at

520 nm and measured by changes in PF of -0.07 ± 0.02 for ZmNDPK1/hex4_A5U and -0.03 ±

0.02 for NM23-H2/Pu27 (Figure 2.15 and Table 2.4). By T-test analysis, these changes in PF are statistically significant at P-values < 0.05 (Table 2.4). Moreover, both proteins induced strong FRET when mixed with oligonucleotides that were not pre-folded but were in LiCl-containing buffers (Figure 2.15C and D). Specifically, when ZmNDPK1 was mixed with hex4_A5U that was not pre-folded, PF changed by -0.17 ± 0.03 and when NM23-H2 was mixed with Pu27 that was not pre-folded, PF changed by -0.08 ± 0.01 (Table 2.4). T-tests also show P-values < 0.05 for these changes with protein titration. Importantly, T-test analysis shows statistical significance for the stronger

effect of mixing protein with oligonucleotides that were annealed in LiCl relative to mixing protein with oligonucleotides that were pre-folded in KCl (Table 2.4). Similar behavior was seen when the proteins were mixed with their non-cognate pairs (Figure 2.15E-H and Table 2.4). No combination of protein with single-labeled oligonucleotide (FAM or TAMRA alone) caused FRET emission at 585 nm when illuminated at 475 nm (Figure 2.16), confirming that the measured FRET signals were dependent on the double-labeled oligonucleotides. No combination of protein with FRET-labeled oligonucleotide caused a decrease in FRET or coupled increase in PF (Table 2.4), indicating that all combinations of protein, oligonucleotide, and salt resulted in a bound G4 oligonucleotide conformation.

2.4. Discussion

2.4.1. G4s in maize

Maize is a staple crop across the globe as a source of nutrition and energy that fuels economic development. Additionally, maize is a historic, model genetic organism for studying fundamental mechanisms of eukaryotic genome composition (Schnable et al., 2009). For example, transposable DNA elements (McClintock, 1930) and the capping properties that characterize telomere function (McClintock, 1939) are among the historic discoveries in maize. Interestingly, G4 motifs are clustered in known regulatory regions of some genes, in particular those that are involved in stress response metabolic pathways in maize (Andorf et al., 2014). Moreover, in E. coli, antisense G4 motifs found at promoter regions decrease gene expression whereas antisense G4 motifs found in 5’UTRs increase gene expression (Holder and Hartig, 2014), suggesting that differences in G4 position relative to transcription start sites might have functional consequences in the cell.
As such, understanding the fundamental mechanisms of G4 binding proteins in maize, a model grass species with a large and complex genome, affords new opportunities to advance understanding of G4 biology while gaining information that could be useful for crop improvement. G4 motifs are found distributed throughout the maize genome, as they are in the human genome (Huppert and Balasubramanian, 2007), but in maize they are more likely found within the transcribed gene than in upstream promoter regions (Andorf et al., 2014). Given the functional significance of a G4 motif in regulating c-myc gene expression, we predict that the positioning of the G4 motif within the 5’UTR of maize genes may also represent a similarly broad, conserved mechanism for transcriptional regulation. In fact, our discovery that ZmNDPK1 binds to

hex4_A5U bolsters this prediction and suggests that G4 binding by NDPKs may represent an ancient and conserved regulatory mechanism in plants and animals.

2.4.2. Binding mode of G4 to NDPK

We propose that each hexamer of ZmNDPK1 or NM23-H2 binds tightly and cooperatively to two G-rich oligonucleotides and stabilizes a folded G4 conformation. In the case of a pre-folded G-rich oligonucleotide, this model explains both the straightforward binding titrations measured by protein:DNA retention on nitrocellulose (Figure 2.10) and the fluorescence experiments that show stabilization of the folded conformation, measured by increased FRET efficiency between the 5’ and 3’ ends of FAM/TAMRA-labeled oligonucleotides (Figure 2.15). Mechanistically, the two homologs may bind the G4 somewhat differently because the three amino acids that are implicated in NM23-H2 DNA binding (Arg34, Asn69, and Lys135) (Postel et al., 1996) are not conserved in ZmNDPK1 (Figure 2.4). Nonetheless, this simple model explains our data as concerns NDPK binding to pre-folded G4 DNA.

Binding of oligonucleotides that are not pre-folded could be explained by two variations of this model. In one variation, NDPK may function as a folding chaperone, binding weakly to a form of the unfolded oligonucleotide that has some features of the G4 but is not tightly folded and, in doing so, stabilizing G4 formation for tight binding. We know from TDS measurements that neither hex4_A5U nor Pu27 adopts a folded G4 in LiCl (Figure 2.1). Nonetheless, the unfolded oligonucleotide is likely a structurally heterogeneous ligand in solution. For example, tandem Gs could transiently form a motif like a Hoogsteen base pair or a looped structure that is recognized by NDPK, and, upon binding, the intramolecular G4 folded conformation would be stabilized. In the other variation, NDPK binds to the single-stranded oligonucleotide that then recruits a second oligonucleotide, inducing an increase in intermolecular FRET.
Both model variations are supported by our observation that unfolded single stranded G-rich DNA binding by either protein results in an increase of FRET efficiency (Figure 2.15). The first variation is supported by the observation that NM23-H2 appears to bind to Pu27 or hex4_A5U in LiCl-containing buffer with about the same affinity as it does in KCl-containing buffer (Figure 2.10 and Table 2.2). The second variation is supported by the observation that ZmNDPK1 binds more weakly, and with a steeper binding curve (higher Hill coefficient), to single-stranded G-rich DNA than to G4-folded DNA (Figure 2.10 and Table 2.2). Competition for G4 binding by other variants of the sequence supports the idea that ZmNDPK1 has some nonspecific DNA binding activity, but is quite specific for folded G4s like

hex4_A5U, hex4_AUG RNA, and Pu27 (Table 2.1 and Figure 2.11). Indeed, other similarly folded G4s would likely interact with ZmNDPK1 with high affinity but hex4_A5I does not compete well for binding, indicating that the binding includes some degree of specificity for particular G4 structures, and not a generic structural feature common to different G4s (Table 2.1 and Figure 2.11). Importantly, under no condition that we tested did either NDPK homolog support unfolding of the G4 or stabilization of the unfolded conformation (Figure 2.15). Rather, NDPK appears to function as a stable G4-binding protein, akin to XPB and distinct from other G4-interacting proteins like XPD or Up1 that possess G4 helicase or unfolding activities (Gray et al., 2014, Cogoi et al., 2008). Our findings further suggest that ZmNDPK1 and NM23-H2 have important biochemical differences in binding single-stranded G-rich DNA (Figures 4 and 8). Additionally, the high affinity of NM23-H2 for pre-folded G4s (Figures 4 and 8) raises questions about the proposed model in which NM23-H2 functions in stabilizing single stranded G-rich DNA (Dexheimer et al., 2009, Raveh et al., 2001, Thakur et al., 2009).

2.4.3. Structural basis for G4 binding by NDPKs

Given the sequence and structural similarities between NDPK homologs (Figures 1 and 2), the difference between binding modes of these two eukaryotic proteins for single stranded G-rich DNA is surprising. The only tertiary structural difference between the maize and human proteins lies in the apparent flexibility of the sequence-divergent active site loop that, in ZmNDPK1, moderates further oligomerization of the hexamer (Figures 1, 2 and S5). Despite otherwise high homology, however, additional differences in the C-terminus are sufficient to generate a homolog-specific antibody and might explain the altered behavior (Figures 1, 2 and S2). The divergent C-terminus is far from the active site, clustered at the edge between the trimer and dimer faces of the protein and contributing strong basic patches that are specific to the maize homolog (Figure 2.7). Neither an open nor a functional active site is required for G4 binding (Figures 5A and B, 6, and 7), supporting the idea that this distant region could be important. Similarly, PA, the adenine-rich oligonucleotide with interspersed G’s, does not compete for G4 binding (Table 2.1 and Figure 2.11B) as it would if an oligonucleotide bound with sequential G’s in three active sites of a trimer as previously proposed (Dexheimer et al., 2009).

From these observations, we predicted that nucleic acids bind to ZmNDPK1 at the basic patches located at the edge of the trimer-dimer interface, centered at the C-terminal-most residue, Lys149 (Figures 1 and 2). Interestingly, NM23-H2 lacks such a C-terminal basic residue, ending instead

at a conserved glutamate (NM23-H2 Glu152 or ZmNDPK1 Glu148), raising the possibility that ZmNDPK1 Lys149 may contribute to the recruitment of single stranded G-rich DNA. In fact, altering Lys149 to an alanine (K149A) weakens pre-folded hex4_A5U G4 binding by about 5-fold, interestingly to the same level at which NM23-H2 binds to the same pre-folded oligonucleotide (Figures 4 and 7 and Table 2.2). The K149A point variant is deficient in binding to hex4_A5U when it is not pre-folded into G4 (Table 2.2). Clearly, this single amino acid defines a unique property of the plant NDPK homolog that we discovered.

Indeed, the effect of the terminal lysine on G4-binding properties may have important biological implications because of its distinct phylogenetic distribution within some of the grass species within the plant kingdom (Figure 2.5). Most plant species encode one or more small NDPKs (~150 aa in length), but the ZmNDPK1-type (identified as the “YEK” group in Figure 2.5) appears to be a derived variant found in members of the panicoid group of the Poaceae family, not limited to maize. Plant species with this YEK-type NDPK include maize, sorghum, sugarcane, foxtail millet, and switchgrass, some of the world’s leading food, feed, and biofuel producing grass species. That these species have maintained this specific amino acid variant despite strong sequence conservation of the protein as a whole, and given the presence of another small NDPK (ZmNDPK2 in the case of maize, Figure 2.5), suggests that its unique properties may be evolutionarily important. We speculate that those properties could play a role in Poaceae-species diversification or co-evolution with G4 elements in the genomes, an interesting avenue of further investigation.

2.4.4. Implications for in vivo activity

Given the conservation of G4 structures in prokaryotes and eukaryotes as well as the conservation of G4-compatible telomere repeats in plants and mammals, it is not surprising that plants would use similar genic G4s to modulate genome dynamics. There is no clear functional c-myc ortholog in maize that controls the cell cycle (Loulergue et al., 1998), so direct comparisons of biological functions in regard to myc activation are limited. Even so, we have discovered intriguing similarities and differences between the two homologs that may illuminate a range of functions attributed to NDPK/G4 interactions at different genic locations in a species-specific way. NM23-H2 binds both folded and unfolded G4s with similar affinities and in a manner that promotes close contact of the 5’ and 3’ ends of the oligonucleotide, presumably by stabilizing a folded G4 conformation, together suggesting the protein binds both ligands in a similar

34 conformation (Figures 4 and 8). NM23-H2 binds this nuclease sensitive region of DNA, whether pre-folded or not, that is far upstream from the c-myc promoter, enhancing c-myc expression (Siddiqui-Jain et al., 2002). In contrast, the hex4_A5U G-rich region we used as a probe for ZmNDPK1 binding is found just downstream of the transcription start site, within the 5’UTR of hexokinase4, a region of DNA that must be opened for transcription to occur. ZmNDPK1 binds more weakly to single stranded G-rich DNA than it does to folded G4, but then stabilizes a folded form of the G4, whether inter- or intramolecular (Figures 4 and 8). It is possible that ZmNDPK1 stabilizes intramolecular G4s from the template DNA strand or intermolecular-type G4s from adjacent motifs, sister chromatids, or transcripts. Given the prominence of G4 motifs in the anti-sense strand of 5’-UTRs of maize genes (Andorf et al., 2014), it is also possible that NDPK could impact nascent RNA formation or progression of the transcription bubble. Similar overrepresentation of AUG G4 motifs at start codons in the sense strand of maize genes (Andorf et al., 2014) suggests that interaction of ZmNDPK1 with an RNA motif on the nascent transcript might also have downstream implications for translation. Alternatively, an A5U-type G4 element could contribute to RNA polymerase pausing or blocking, while localizing active NDPK to maintain a high ribonucleotide triphosphate pool. Indeed, NM23-H2 was recently shown to tether to the C-terminal domain of dynamin to increase GTP pools at membranes, so there is precedent for the targeted localization of NDPK enzymatic activity via interaction with another molecule (Boissan et al., 2014). Certainly the mechanisms by which NM23-H2 exerts control over c-myc or ZmNDPK1 over genes like hexokinase4 are likely complex, given the multi-factorial and dynamic interactions between DNA isoforms, transcription factors and chromatin proteins. 
Our study shows that not all aspects of NDPK function translate across taxonomic boundaries and that such comparative analyses will continue to shed light on the biologically relevant mechanism of NDPK DNA binding activities.

2.4.5. Conclusions

In conclusion, we have characterized the first plant G4-binding protein, which was discovered through a ligand-binding screen of a cDNA expression library. This G4 binding protein is a nucleoside diphosphate kinase, akin to human NM23-H2. Biochemical experiments confirmed the specificity of ZmNDPK1 for G4 structures and identified differences in the binding properties of human and maize NDPK homologs to single stranded, G-rich DNA sequences that are not pre-folded into G4 structures, highlighting potentially different biological roles of these NDPKs.


Figure 2.1: G4 formation by the oligonucleotides used in this study is salt and sequence dependent. A. Thermal Difference Spectroscopy (TDS) of hex4_A5U, hex4_A5I, and Pu27 in the presence of 20 mM K+ all show the diagnostic dip at 295 nm that indicates the melting of G4 structure upon heating. B. In the presence of Li+ rather than K+, G4 superstructure does not form. C. Sequence variants of hex4_A5U and Pu27 do not show G4 signature. D. Sequence variants of hex4_A5U and Pu27 in Li+ rather than K+ also do not show G4 formation.


Figure 2.2: ZmNDPK1-specific antibody 133 detects native ZmNDPK1. Antibody 133 detects native ZmNDPK1 from seed extract and recombinant ZmNDPK1 (rZm) but not recombinant NM23-H2 (rHs). rZm runs slightly higher than native ZmNDPK1 because of its N-terminal histidine tag (see Methods). Molecular masses are marked in kDa.

Figure 2.3: Representative nitrocellulose retention assay. Increasing concentrations of ZmNDPK1 cause retention of biotinylated hex4_A5U on nitrocellulose (NC) rather than nylon (NY) membranes.


Figure 2.4: ZmNDPK1 is a canonical NDPK. A. Sequence alignment by Clustal Omega (Sievers and Higgins, 2014) of ZmNDPK with three A. thaliana (At) and the closest human (Hs) homologs shows the high homology of this conserved enzyme. Peptides at the C-terminus are the most strongly divergent, demarked by an underline showing the peptide used to raise the antibody against ZmNDPK1. Divergent aa’s are in italics. The active site histidine is in bold. * marks aa sequence identity; : marks strong aa sequence similarity; . marks weak aa sequence similarity. The dark black line highlights the PFAM PF00334 NDPK domain. B. The percent identity matrix shows the highest aa identity between ZmNDPK and AtANPK1.


Figure 2.5: Phylogenetic tree of proteins similar to ZmNDPK1 and ZmNDPK2. A. Human NM23-H1, NM23-H2 (analyzed in this study) and Arabidopsis NDPK1 were added to 33 proteins from BLAST searches, the 36 protein sequences were aligned by Clustal Omega (Sievers and Higgins, 2014), and a phylogenetic tree was diagrammed using Treedyn (Chevenet et al., 2006). Plant proteins are marked as eudicot (circles) or monocot (squares). Additional annotation delineation within the grasses is indicated at left with symbol keys for different taxonomic groups. Protein short names indicate Genus species (first three letters) and the C-terminal aa residues (last three letters). The NDPKs ending with a terminal lysine residue, shown here to contribute to affinity of NDPK1 for G4 DNA, are limited to the panicoid subfamily (red boxes) of the Poaceae. Members of the so-called “YEK” group include maize (Zma), sorghum (Sbi), sugarcane (Sof), foxtail millet (Sit), and switchgrass (Pvi). Proteins used in the G4 DNA-binding assays in this study are indicated with an asterisk. The branch lengths are proportional to the number of substitutions per site (scale bar). B. Additional information about the protein sequences used in this analysis is in alphabetical order by genus.


Figure 2.6: ZmNDPK1 hexamer binds two G4s. Stoichiometric titration of ZmNDPK1 with oligonucleotide shows that ZmNDPK1 binds 1 oligonucleotide to 3 monomers (2 oligonucleotides per hexamer).

Figure 2.7: Hexameric NDPKs have high structural homology. A. ZmNDPK1 is a canonical hexameric NDPK whose active site sits at a deep cleft, marked by a red *. Three active sites are related by a three-fold symmetry axis, which is perpendicular to the plane of the page. Turning the hexamer by 90° highlights the two-fold symmetry axis. Blue balls demark the most divergent amino acids (italics in Figure 2.4). The active site loop and C-terminus, the most divergent regions of the structure, are highlighted in purple. Lys149 sits at the junction of the three-fold and two-fold axes. B. ZmNDPK1 (gray) and NM23-H2 (yellow) are highly homologous. C. The charged surface of ZmNDPK1 shows strongly basic regions (blue) at the active site cleft, marked by a red *, and in a patch at the junction of the two- and three-fold symmetry axes. D. NM23-H2 does not show as strongly charged a surface as ZmNDPK1. Molecular graphics and analyses were performed with the UCSF Chimera package. Chimera is developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIGMS P41-GM103311) (Pettersen et al., 2004).


Figure 2.8: ZmNDPK1 is a hexamer, even at low concentration, but can also form a dodecamer. A. Lane 1: Native Mark standards (Life Technologies, Grand Island, NY). Lane 2: 25 μM ZmNDPK1 labeled with 30 nM γ32P-ATP. Lane 3: 20 nM ZmNDPK1 labeled with 30 nM γ32P-ATP. Lane 4: 30 nM γ32P-ATP. Lanes 5 and 6 show lanes 2 and 3 with a longer exposure to the storage phosphor screen. B. ZmNDPK1 assembles as a dodecamer in the asymmetric unit of the crystal and the interface is formed by the non-conserved active site loop that reveals the only structurally divergent aspect of the ZmNDPK1 hexamer.

Figure 2.9: hex4_A5U binds to native ZmNDPK1. Western blotting using a specific ZmNDPK1 antibody against recovered proteins from a maize seed-extract pulldown shows that native ZmNDPK1 does not bind an unfoldable form of hex4_A5U (hex4_A5Um) but does bind folded hex4_A5U. Lane 1: 17 kDa marker. Lane 2: ZmNDPK1-containing fraction from heparin separated maize seed extract (input). Lane 3: elution from pulldown using unfoldable hex4_A5Um. Lane 4: elution from pulldown using folded hex4_A5U.


Figure 2.10: ZmNDPK1 and NM23-H2 bind folded G4 oligonucleotides with nM affinity. Increasing protein concentration increases the fraction of oligonucleotide that is retained on the nitrocellulose. A. hex4_A5U oligonucleotide titration with ZmNDPK1 in 20 mM KCl. B. Pu44 oligonucleotide titration with NM23-H2 in 20 mM KCl. C. hex4_A5U oligonucleotide titration with ZmNDPK1 in 20 mM LiCl. D. Pu44 oligonucleotide titration with NM23-H2 in 20 mM LiCl. E. Pu44 oligonucleotide titration with ZmNDPK1 in 20 mM KCl. F. hex4_A5U oligonucleotide titration with NM23-H2 in 20 mM KCl. G. Pu44 oligonucleotide titration with ZmNDPK1 in 20 mM LiCl. H. hex4_A5U oligonucleotide titration with NM23-H2 in 20 mM LiCl.


Figure 2.11: Competition of ZmNDPK1 binding to hex4_A5U by a series of alternative oligonucleotides summarized in Table 2.1, measured by the retention of biotinylated hex4_A5U in the presence of 1-, 10-, or 100-fold excess unbiotinylated competitor. A. hex4_A5U variants (15 minutes incubation). B. hex4_AUG and Pu27 variants (15 minutes incubation). C. Predicted hex4_A5U G4 loop variants (15 minutes incubation). D. hex4_A5U variants (60 minutes incubation). E. hex4_AUG and Pu27 variants (60 minutes incubation). F. Predicted hex4_A5U G4 loop variants (60 minutes incubation).


Figure 2.12: Nucleotides do not inhibit equilibrium ZmNDPK1 binding. ZmNDPK1 was mixed with folded hex4_A5U and increasing concentrations of A. ATP, B. ADP, C. GTP, and D. GTPγS before hex4_A5U retention on nitrocellulose was measured. For each, even at 1 mM nucleotide, ZmNDPK1 still strongly binds hex4_A5U.

Figure 2.13: G4 binding does not affect NDPK activity. A. ZmNDPK1 activity is not affected significantly by the presence of folded or unfolded hex4_A5U DNA in KCl or LiCl. B. NM23-H2 activity is not affected significantly by the presence of folded or unfolded Pu27 DNA in KCl or LiCl. One unit is defined as μmol/(min·mg).

Figure 2.14: Point variants of ZmNDPK1 show different DNA binding activities. A. Increasing H115A (active site) variant ZmNDPK1 concentration increases the fraction of hex4_A5U G4 oligonucleotide that is retained on the nitrocellulose in 20 mM KCl. The H115A variant binds to pre-folded hex4_A5U DNA in KCl with about the same affinity as wild-type ZmNDPK1. B. Increasing H115A (active site) variant ZmNDPK1 concentration increases the fraction of hex4_A5U G4 oligonucleotide that is retained on the nitrocellulose in 20 mM LiCl. The H115A variant binds to unfolded hex4_A5U DNA in LiCl with about the same affinity as wild-type ZmNDPK1. C. Increasing K149A (C-terminal) variant ZmNDPK1 concentration increases the fraction of hex4_A5U G4 oligonucleotide that is retained on the nitrocellulose in 20 mM KCl. The K149A variant binds to pre-folded hex4_A5U DNA in KCl with 5-fold weaker affinity.


Figure 2.15: ZmNDPK1 and NM23-H2 both bind to folded G4 DNA, whether pre-folded in KCl or not pre-folded in LiCl. All spectra are normalized to the 520 nm maximum, for simplicity. All sets of spectra are drawn with increasingly dark lines with increasing protein concentration, showing every-other point in the titration for clarity (the legend is at bottom). A 1000 nM BSA control (dotted line) was performed to show that the increase in FRET is NDPK-specific because the BSA control is identical to the 0 nM NDPK point both in fluorescence spectrum and PF. Each inset plots PF as protein concentration in nM on the x-axis and PF on the y-axis, showing each point along the titration. A. ZmNDPK mixed with FRET-labeled pre-folded hex4_A5U (in KCl) shows a small but significant increase in FRET efficiency at 585 nm with a coupled decrease in PF (inset) with titrating protein. B. NM23-H2 mixed with FRET-labeled pre-folded Pu27 (in KCl) shows a small but significant increase in FRET efficiency at 585 nm with a coupled decrease in PF (inset) with titrating protein. C. ZmNDPK mixed with FRET-labeled unfolded hex4_A5U (in LiCl) shows a large and significant increase in FRET efficiency at 585 nm with a coupled decrease in PF (inset) with titrating protein. D. NM23-H2 mixed with FRET-labeled unfolded Pu27 (in LiCl) shows a moderate but significant increase in FRET efficiency at 585 nm with a coupled decrease in PF (inset) with titrating protein. E. ZmNDPK mixed with FRET-labeled pre-folded Pu27 (in KCl) shows a small but significant increase in FRET efficiency at 585 nm with a coupled decrease in PF (inset) with titrating protein. F. NM23-H2 mixed with FRET-labeled pre-folded hex4_A5U (in KCl) shows a significant increase in FRET efficiency at 585 nm with a coupled decrease in PF (inset) with titrating protein. G. 
ZmNDPK mixed with FRET-labeled unfolded Pu27 (in LiCl) shows a large and significant increase in FRET efficiency at 585 nm with a coupled decrease in PF (inset) with titrating protein. H. NM23-H2 mixed with FRET-labeled unfolded hex4_A5U (in LiCl) shows a significant increase in FRET efficiency at 585 nm with a coupled decrease in PF (inset) with titrating protein.


Figure 2.16: FRET signals and PF trends are specific to energy transfer between the 5’ and 3’ ends of the folded oligonucleotides. A. Single-labeled TAMRA or FAM hex4_A5U oligonucleotides show no systematic changes in FRET with protein titration in KCl or LiCl. Both TAMRA and FAM spectra are normalized to the FAM emission at 520 nm for clarity, as in Figure 2.15. B. Single-labeled TAMRA or FAM Pu27 oligonucleotides show no systematic changes in FRET with protein titration in KCl or LiCl. Both TAMRA and FAM spectra are normalized to the FAM emission at 520 nm for clarity, as in Figure 2.15. C. PF values calculated from ZmNDPK1 titration into either single-labeled TAMRA or FAM hex4_A5U oligonucleotides show no systematic changes in either KCl or LiCl. D. PF values calculated from NM23-H2 titration into either single-labeled TAMRA or FAM Pu27 oligonucleotides show no systematic changes in either KCl or LiCl.

Table 2.1. List of DNA aptamers used in this study

G4 variant | Name | Sequence | Forms G4? | Competes for binding?
hexokinase4 antisense 5’UTR | hex4_A5U | CGGGGGTGTTGAAAGGGAGGAGGAGGGAGGGG | Yes | ++++
hexokinase4 antisense 5’ intron 1 | hex4_A5I | TGGGGTGGGGGGGAGCGGG | Yes | +
hexokinase4 sense AUG (RNA) | hex4_AUG | CGGGGGGAUGGGGCGGGUCGGG | Yes | ++
hexokinase4 antisense 5’UTR point mutant | hex4_A5U | CGacGGTGTTGAAAGcGAGGAGGAGcGAGcGG | No | +
double stranded hexokinase4 antisense 5’UTR | hex4_A5U | CGGGGGTGTTGAAAGGGAGGAGGAGGGAGGGG / CCCCTCCCTCCTCCTCCCTTTCAACACCCCCG | No | +
c-myc nuclease hypersensitive element III1 | Pu44 | CGCTTATGGGGAGGGTGGGGAGGGTGGGGAAGGTGGGGAGGAG | Yes | not tested
c-myc nuclease hypersensitive element III1 | Pu27 | TGGGGAGGGTGGGGAGGGTGGGGAAGG | Yes | ++
poly-adenine mutant | PA | AAAAGAAAAAAAAGAAAAAAAAGAAAA | No | -
c-myc nuclease hypersensitive element III1 point mutant | Pu27m | TGGcGAGcGTGGcGAGcGTGGcGAAGG | No | -
hex4_A5U loop 1 MS2-stemloop | MS2_A5U1 | CGTACACCTGTTGGTGTACG | No | -
hex4_A5U loop 2 MS2-stemloop | MS2_A5U2 | CGTACACCTGAAGGTGTACG | No | +
hex4_A5U loop 3 MS2-stemloop | MS2_A5U3 | CGTACACCGGAGGGTGTACG | No | -

Table 2.2. Binding affinities of NM23-H2 and ZmNDPK1 to Pu44 and hex4_A5U oligonucleotides

NDPK homolog | Oligonucleotide (salt) | Kd (nM) | Hill’s coefficient
ZmNDPK1 | hex4_A5U (KCl) | 6.8 ± 0.3 | 2.1 ± 0.1
ZmNDPK1 | hex4_A5U (LiCl) | 289 ± 1 | 4.0 ± 0.1
NM23-H2 | Pu44 (KCl) | 1.2 ± 0.2 | 2.0 ± 0.2
NM23-H2 | Pu44 (LiCl) | 1.6 ± 0.1 | 1.7 ± 0.2
ZmNDPK1 | Pu44 (KCl) | 91 ± 13 | 1 ± 0.1
ZmNDPK1 | Pu44 (LiCl) | 476 ± 44 | 3.9 ± 0.7
NM23-H2 | hex4_A5U (KCl) | 33 ± 2 | 1.6 ± 0.1
NM23-H2 | hex4_A5U (LiCl) | 24 ± 1 | 2.1 ± 0.1
ZmNDPK1 H115A | hex4_A5U (KCl) | 6.9 ± 0.7 | 1.8 ± 0.1
ZmNDPK1 H115A | hex4_A5U (LiCl) | 287 ± 14 | 1.9 ± 0.1
ZmNDPK1 K149A | hex4_A5U (KCl) | 31 ± 2 | 1.6 ± 0.2
ZmNDPK1 K149A | hex4_A5U (LiCl) | >2000 | N/A
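The Kd and Hill coefficient values in Table 2.2 can be read through a standard Hill binding curve. The sketch below is illustrative only: it assumes the reported Kd acts as the half-saturation protein concentration in the usual Hill form, which this section does not state explicitly, and the function name `hill_fraction_bound` is a hypothetical helper rather than part of the original analysis.

```python
def hill_fraction_bound(protein_nM, kd_nM, hill_n):
    """Fraction of oligonucleotide bound under a Hill model.

    Assumption: theta = [P]^n / (Kd^n + [P]^n), with Kd treated as the
    protein concentration that gives half-maximal retention.
    """
    if protein_nM < 0 or kd_nM <= 0 or hill_n <= 0:
        raise ValueError("concentration must be >= 0; Kd and n must be > 0")
    p_term = protein_nM ** hill_n
    return p_term / (kd_nM ** hill_n + p_term)


# Values copied from Table 2.2 (hex4_A5U in KCl): (Kd in nM, Hill coefficient).
WT_KCL = (6.8, 2.1)
K149A_KCL = (31.0, 1.6)

# Ratio of apparent Kd values, the basis of the "about 5-fold" comparison.
fold_change = K149A_KCL[0] / WT_KCL[0]  # about 4.6

# Predicted fraction bound at 10 nM protein for each variant under this model.
wt_at_10nM = hill_fraction_bound(10.0, *WT_KCL)
k149a_at_10nM = hill_fraction_bound(10.0, *K149A_KCL)
```

Under this model the wild-type curve is already past half-saturation at 10 nM protein while the K149A curve is still well below it, which is one way to visualize the reported difference in affinity.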

Table 2.3: Crystallographic statistics for ZmNDPK1

X-ray data | 1VYA

Data collection
Refined resolution range (Å) | 39.8-2.1
Space group | P21
Unit cell dimensions (Å) | 66.9 x 179.1 x 99.5
Completeness | 99.5 (99.6)
Rmerge (%) | 9.3 (47.9)
I/σ(I) | 19.8 (3.6)
Redundancy | 7.3 (6.9)

Refinement statistics
Number of atoms | 16,251
Reflections | 123,697 (7593)
Rworking (%) | 16.0 (20.1)
Rfree (%) | 20.1 (25.7)
RMS (bonds) (Å) | 0.007
RMS (angles) (°) | 1.01
Average B value (Å²) | 20.7
B value for protein | 23.9
B value for waters | 33.1

Ramachandran statistics
Preferred regions (%) | 98.2
Allowed regions (%) | 1.0
Outliers (%) | 0.8

Table 2.4. Statistical analysis of FRET results for NDPK-G4 pairs in KCl or LiCl.

NDPK homolog | Oligonucleotide (salt) | ΔPF | P-value (t-test) | P-value salt pair
ZmNDPK1 | hex4_A5U (KCl) | -0.07 ± 0.02 | 0.010 | 0.03
ZmNDPK1 | hex4_A5U (LiCl) | -0.17 ± 0.03 | 0.006 |
NM23-H2 | Pu27 (KCl) | -0.03 ± 0.02 | 0.040 | 0.02
NM23-H2 | Pu27 (LiCl) | -0.08 ± 0.01 | 0.002 |
ZmNDPK1 | Pu27 (KCl) | -0.06 ± 0.01 | 0.006 | 0.005
ZmNDPK1 | Pu27 (LiCl) | -0.20 ± 0.01 | 0.001 |
NM23-H2 | hex4_A5U (KCl) | -0.11 ± 0.01 | 0.001 | 0.38
NM23-H2 | hex4_A5U (LiCl) | -0.10 ± 0.02 | 0.004 |

CHAPTER 3

BULGED AND CANONICAL G-QUADRUPLEX CONFORMATIONS FORMED BY A SINGLE G-RICH DNA SEQUENCE CO-EXIST IN SOLUTION AND DETERMINE PROTEIN BINDING SPECIFICITY

3.1. Introduction

Traditionally, DNA is thought of as the genetic storage unit held in a double-stranded helical conformation. The famous double helix structure (Watson and Crick, 1953) falls short in light of the observations that guanine bases (Gs) in G-rich regions of DNA or RNA can form Hoogsteen base pairs with one another to create a planar G-quartet (Gellert et al., 1962, Zimmerman et al., 1975). Sequential G-quartets can stack to form G-quadruplexes (G4s) (Figure 3.1A). These secondary structures have been identified in eukaryotic nuclei using G4-specific antibody staining (Biffi et al., 2013). Further, functional roles in regulating transcription and replication continue to be identified from bacteria to mammals (Farhath et al., 2015, Lemmens et al., 2015, Kanoh et al., 2015, Wu et al., 2008, Paeschke et al., 2011, Smestad and Maher, 2015). In short, DNA G4s are now recognized as a biologically relevant structural form of DNA.

G4s are identified throughout plant and mammalian genomes at similar, but not identical, loci. For example, in the maize genome, G4s tend to occur just downstream of transcription start sites (TSSs) in the antisense strand (called “A5U”-type G4s for antisense 5’-untranslated region) (Andorf et al., 2014). In humans, G4s are enriched just upstream of TSSs as well as in introns near intron-exon boundaries. Unlike in maize, human G4s are more commonly found in the sense strand and thus are transcribed into mRNA (Smestad and Maher, 2015). In maize, putative G4s are overrepresented in promoter regions of genes associated with energy status pathways, oxidative stress response, and hypoxia, suggesting a regulatory role for these elements (Andorf et al., 2014). Presumably the analogy between these divergent eukaryotic genomes carries over into the protein factors that regulate G4 folding and unfolding in the nucleus and/or cytoplasm.
In fact, a number of mammalian proteins play a role in unfolding (Hudson et al., 2014, Johnson et al., 2010, Khateb et al., 2004, London et al., 2008, Qureshi et al., 2012b, Safa et al., 2014) or stabilizing (Dempsey et al., 1999, Hanakahi et al., 1999, Quante et al., 2012) G4s. Interestingly, we recently identified a nucleoside diphosphate kinase (NDPK) from maize, ZmNDPK1, that is similar to the human NDPK homolog NM23-H2 (Kopylov et al., 2015); both bind and stabilize the folded form of G4 DNA. However, the nature of their interaction is not completely understood and there is no three-dimensional structure of an NDPK:G4 DNA complex.

With the acceptance of the importance of genic DNA G4s come questions about how they function. In this realm, biochemistry and biophysics shed light on the physical properties that lay the basis for defining these activities. To this end, various spectrophotometric and spectroscopic assays are available to assess the structure of model oligonucleotides in solution. Nonetheless, these techniques individually fall short in defining a single state of the oligonucleotide. Structure determination is complicated by the innate polymorphism of even the simplest G4-forming sequences (Webba da Silva, 2007), formation of stable structures with bulges in G-tracts (Mukundan and Phan, 2013), topological interconversion (Li et al., 2005, Dailey et al., 2010), and G-tract slippage (Seenisamy et al., 2004).

UV spectrophotometry, CD spectrophotometry and DMS footprinting are some of the most widely used methods for characterizing G4 structures. UV spectrophotometry relies on the increase in absorbance of an oligonucleotide in the UV region around 295 nm upon G4 formation. The exact mechanism that causes this hyperchromism is unknown, but it is attributed to the formation of Hoogsteen base pairs in a G-quartet (Miannay et al., 2009). CD spectrophotometry is another commonly used technique that can provide structural insights into G4 topology. Glycosidic bond orientation of stacked Gs in G-tracts gives rise to two types of spectra upon G4 formation: one with a major positive peak at 262 nm attributed to the formation of a parallel G4 and another with a major positive peak at 292 nm attributed to an anti-parallel G4 (Randazzo et al., 2013). Lastly, DMS footprinting is used to determine the involvement of particular Gs in G-quartet formation.
DMS methylation and piperidine backbone cleavage occur only if the N7 of a G is solvent accessible and not involved in Hoogsteen hydrogen bond formation (Sun and Hurley, 2010). Here, we combine these three techniques to systematically characterize the G4-forming DNA oligonucleotide hex4_A5U, which is derived from the 5' untranslated region of the maize hexokinase4 gene (Andorf et al., 2014). We then apply our analysis to characterizing the interaction between ZmNDPK1 and the G4 DNA, which form a high-affinity complex with a subset of the potential hex4_A5U G4 conformers (Kopylov et al., 2015). Our analysis shows an unprecedented level of polymorphism in the hex4_A5U sequence that can be described by topological isomers and the recently formulated G-register exchange concept (Harkness and Mittermaier, 2016a), extended to include the non-canonical bulged G4 conformations.

3.2. Materials and methods

3.2.1. Oligonucleotide and protein preparation

All oligonucleotides were purchased from Eurofins MWG Operon LLC (Huntsville, AL) as salt free (non-labeled oligonucleotides) or HPLC purified (fluorescently labeled oligonucleotides) and used without further purification. Base positions in oligonucleotide variants are numbered according to the positions in the hex4_A5U sequence (Kopylov et al., 2015). Unless indicated otherwise, oligonucleotides were annealed by heating to 95 °C and slowly cooling overnight to room temperature in 10 mM tetrabutylammonium phosphate buffer (TBA, pH 7.5) with or without 100 mM salt (KCl, LiCl, CsCl or NaCl) or in water alone. Recombinant ZmNDPK1 protein was purified as previously described (Kopylov et al., 2015).

3.2.2. Absorption spectrophotometry

Non-labeled oligonucleotides were annealed at 10 μM concentration and diluted to 2.5 μM before data collection. All UV-Visible experiments were performed on a Cary 300 Bio UV/Vis spectrophotometer equipped with a Peltier temperature controller (Agilent Technology, Santa Clara, CA). For thermal difference spectroscopy (TDS), a first spectrum was collected at 25 °C, samples were heated to 95 °C and a second spectrum was collected. TDS was calculated by subtracting the 25 °C spectrum from the 95 °C spectrum and normalizing the maximum peak to an absorbance of 1 and the absorbance at 330 nm to zero. For thermal denaturation experiments, the absorbance at 295 nm was monitored in the temperature range from 25 to 95 °C at a heating rate of 0.5 °C/min. Data were normalized to a maximum of 1.

3.2.3. Circular dichroism spectrophotometry

Non-labeled oligonucleotides were annealed at 10 μM concentration and used without further dilution. Circular Dichroism (CD) spectra were collected on an Aviv 202 CD spectrometer (Aviv Biomedical, Lakewood, NJ). Single temperature experiments were performed at 25 °C over a 200–330 nm range with 3 s averaging time.
The same parameters were used for thermal denaturation experiments in which measurements were made between 10 and 95 °C with a 5 °C increment between measurements after a 10 min equilibration. All spectra were background corrected against blank buffer and normalized to have zero ellipticity at 330 nm.

3.2.4. Dimethyl sulfate footprinting

Oligonucleotides with a 5' 6-carboxyfluorescein (FAM) modification were annealed at 10 μM concentration and diluted to 500 nM prior to DMS treatment. Samples were treated with 1% DMS for 5 minutes at 25 °C and the 100 µl reaction was stopped by adding 25 μl of quench solution (1.5 M sodium acetate pH 7.0, 1 M BME and 100 μg/ml calf thymus DNA). DNA was ethanol precipitated and pellets were resuspended in 100 μl of 1 M piperidine, incubated for 15 minutes at 95 °C and dried in a rotary centrifuge. Dried samples were washed with distilled water, resuspended in alkaline sequencing dye (80% formamide, 10 mM NaOH, 0.005% bromophenol blue), and heated to 95 °C for 3 minutes. Cleavage products were resolved on a 17.5% polyacrylamide denaturing gel (4 M urea, 0.5x TBE, 0.4 mm thick, 33x39 cm, 29:1 acrylamide:bisacrylamide) run for 1.5 hours at a constant 50 W power. Glass plates were separated and the gel was imaged on a GE Typhoon scanner (GE Healthcare Bio-Sciences, Pittsburgh, PA) in fluorescence mode using 488 nm excitation wavelength and 520 nm band pass filter.

3.2.5. Nitrocellulose filter binding assays for ZmNDPK1/G4 DNA binding affinity analysis

For binding affinity determination we used a modified slot-blot binding assay as previously described (Kopylov et al., 2015), substituting a 5’ carboxyfluorescein label for the 5’ biotin label. We used the same approach to determine the efficiency of ZmNDPK1 binding to labeled oligonucleotides in the presence of competitor oligonucleotides. All oligonucleotides were annealed in 10 mM TBA (pH 7.5) and 20 mM KCl. Labeled oligonucleotide at 1 nM was incubated with 100 nM competitor oligonucleotide and 5 nM ZmNDPK1 for 60 minutes. After the incubation, reactions were applied to the slot blot apparatus, where the solution first passes through a negatively charged nitrocellulose membrane (Hybond-C Extra 0.45 μm pore size, GE Healthcare Life Sciences, Piscataway, NJ) that retains protein and protein/DNA complex. Unbound DNA was then captured by a positively charged nylon membrane (Nytran N 0.45 μm pore size, GE Healthcare Life Sciences, Piscataway, NJ).
Membranes were dried and scanned on a GE Typhoon scanner in fluorescence mode using 488 nm excitation wavelength and 520 nm band pass filter. Images were background corrected and the intensities of the bands were determined in ImageJ. Competition efficiency was calculated from the percent retention of the fluorescent probe on nitrocellulose against zero competitor control.

3.2.6. Analytical ultracentrifugation

Sedimentation experiments were carried out in a Beckman Coulter ProteomeLab XL-I analytical ultracentrifuge using an AN60-Ti rotor and double sector quartz cells. 420 μl of annealed oligonucleotides at 1 μM were loaded into sample sectors and 430 μl of corresponding annealing buffers were loaded into reference sectors. Initial scans and rotor calibrations were performed at 3000 rpm and 260 nm wavelength. Data were collected at 58000 rpm and analyzed using Ultrascan III software (Demeler, 2005).

3.2.7. Fluorescence Resonance Energy Transfer (FRET)

hex4_A5U oligonucleotides were labeled with either 5' 6-carboxyfluorescein (hex4_A5U-5F) or 3' carboxytetramethylrhodamine (hex4_A5U-3T) or both fluorophores (hex4_A5U-5F3T). Reactions were set up in triplicate in 96-well Nunclon plates (Thermo Fisher Scientific, Waltham, MA) and contained 200 nM of either hex4_A5U-5F3T or a mix of 100 nM hex4_A5U-5F + 100 nM hex4_A5U-3T annealed in 10 mM TBA (pH 7.5) + 100 mM KCl or 100 mM LiCl. Protein was added at 0, 200 nM, 500 nM or 1000 nM concentrations and incubated for 1 h at 4 °C before data collection. Data were collected on a Spectramax M5e Multi-Mode Microplate Reader (Molecular Devices, Sunnyvale, CA) and processing was performed as previously described (Kopylov et al., 2015). Labeling and data collection for A5UR20 oligonucleotides were done as described for hex4_A5U.

3.2.8. Electron microscopy

The NDPK:G4 complex was assembled by mixing 3 µM ZmNDPK1 and 6 µM hex4_A5U in a buffer containing 10 mM Hepes pH 7.5 and 50 mM KCl. For negative staining the mixture was applied to plasma-cleaned CF200-Cu carbon-coated copper grids (Electron Microscopy Sciences, Hatfield, PA), incubated for 60 s, washed 3x with distilled water and stained for 60 s with 1% uranyl acetate. Images were collected on a FEI/Philips CM120 Biotwin electron microscope (Thermo Fisher Scientific, Waltham, MA) at 40,000x magnification (2.8 Å/px). For cryo-electron microscopy (cryoEM) the mixture was applied to the carbon side of plasma-cleaned Quantifoil R2/2 grids (Electron Microscopy Sciences, Hatfield, PA) and plunged into liquid ethane using a FEI Vitrobot (Thermo Fisher Scientific, Waltham, MA).
Plunge-frozen grids were imaged on a FEI Titan Krios (Thermo Fisher Scientific, Waltham, MA) equipped with a DE20 direct electron detector camera (Direct Electron, San Diego, CA), at 37,000x magnification and 0.99 Å pixel size. Automatic data acquisition was set up using Leginon software (Suloway et al., 2005). Images were collected with a 1.5–3.5 µm defocus range. Particles were manually picked from the images using the Leginon particle picker. Particle coordinates were used to create a particle stack of ~30,000 particles. The particle stack was 2D-classified in cryoSPARC (Punjani et al., 2017) into 30 classes, to obtain on average 1000 particles per class.
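The thermal difference spectrum (TDS) calculation described in section 3.2.2 can be sketched in a few lines. This is a minimal illustration and not the analysis code used for this work: the function name, the example absorbance values, and the order of operations (baseline shift at 330 nm first, then scaling of the maximum to 1) are assumptions made for the sake of the example.

```python
def thermal_difference_spectrum(wavelengths_nm, abs_25c, abs_95c, baseline_nm=330.0):
    """Return a normalized TDS: (95 °C spectrum) minus (25 °C spectrum).

    Assumed normalization: the difference at the wavelength closest to
    baseline_nm is set to zero, then the maximum peak is scaled to 1.
    """
    if not (len(wavelengths_nm) == len(abs_25c) == len(abs_95c)):
        raise ValueError("all inputs must have the same length")
    diff = [hot - cold for hot, cold in zip(abs_95c, abs_25c)]
    # Index of the measured wavelength nearest the baseline wavelength.
    base_idx = min(range(len(wavelengths_nm)),
                   key=lambda i: abs(wavelengths_nm[i] - baseline_nm))
    shifted = [d - diff[base_idx] for d in diff]
    peak = max(shifted)
    if peak <= 0:
        raise ValueError("no positive peak to normalize against")
    return [s / peak for s in shifted]


# Hypothetical absorbance readings, for illustration only.
wavelengths = [240.0, 260.0, 275.0, 295.0, 330.0]
cold = [0.30, 0.50, 0.40, 0.20, 0.02]   # spectrum at 25 °C
hot = [0.38, 0.62, 0.52, 0.14, 0.02]    # spectrum at 95 °C
tds = thermal_difference_spectrum(wavelengths, cold, hot)
```

With these made-up numbers the normalized value at 295 nm is negative, which is the kind of feature the text describes as the diagnostic dip for G4 melting.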

3.3. Results

3.3.1. hex4_A5U adopts a G4 conformation in the presence of cations

We first tested the influence of different monovalent cations on the formation of G4 by the hex4_A5U sequence using UV-Vis and CD spectrophotometry. A characteristic G4 TDS spectrum with a negative peak at 295 nm was obtained only in the presence of K+ ions (Figure 3.1B). AUC showed that most of the DNA was folded into a compact globular structure with an average molecular weight of 10.8 kDa (expected 10.2 kDa) and an average f/f0 of 1.56 (Figure 3.2). TDS spectra of oligonucleotides annealed with Na+, Cs+ and Li+ ions were more negative at 295 nm than those determined in the absence of cation but none had a prominent negative peak. We next monitored the thermal denaturation of G4 structures by recording the change in absorbance at 295 nm (Figure 3.1C). A hypochromic shift at this wavelength with increasing temperature is associated with G4 melting (Mergny and Lacroix, 2003, Mergny et al., 1998). In contrast, single stranded DNA (ssDNA) experiences a hyperchromic shift at 295 nm upon increasing temperature due solely to denaturation of any transitory secondary structures such as ssDNA helix (Uzman, 2001). As expected, in the presence of K+ we observed a sigmoidal decrease in absorbance at 295 nm with increasing temperature, revealing a midpoint of transition (T1/2) at 58 °C. In the presence of Na+, Li+, or Cs+ ions, we saw an initial decrease in absorbance at 295 nm that suggested melting of a G4-like structure, followed by an increase in absorbance that we attributed to ssDNA helix denaturation. In the absence of cations, the absorbance at 295 nm steadily increased with increasing temperature. The formation of G4-like structures in the presence of cations other than K+ was further evidenced by CD spectrophotometry.
Samples annealed in the absence of cations had a positive peak maximum at 255 nm and did not undergo structural transitions with an increase in temperature (Figure 3.3A, B). At 25 °C, CD spectra of hex4_A5U annealed in the presence of any monovalent cation were similar to one another, whereas they were dramatically different from the spectra of hex4_A5U in the absence of cations. Negative ellipticity at 242 nm and positive ellipticity at 262 nm, as observed for cation-annealed samples, are the hallmarks of a parallel G4 (Balagurumoorthy et al., 1992). K+-annealed samples melted as a single species with an increase in temperature (Figure 3.3C), whereas samples annealed in Na+, Li+ and Cs+ displayed a structural transition evidenced by a gradual shift of the maximum positive peak from 262 nm to 255 nm (Figure 3.3D, E, F).
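The midpoint of transition (T1/2) described above can be illustrated with a minimal sketch. It is not the analysis procedure used in this work: it builds a synthetic two-state melting curve (not measured data) with its midpoint placed at 58 ºC, and recovers T1/2 as the temperature at which the signal is halfway between its first and last values. The function name `melt_midpoint` and all curve parameters are hypothetical.

```python
import math

def melt_midpoint(temps, signal):
    """Estimate T1/2 as the temperature where the signal crosses the
    halfway point between its first and last values (linear interpolation)."""
    half = (signal[0] + signal[-1]) / 2.0
    for i in range(len(temps) - 1):
        s0, s1 = signal[i], signal[i + 1]
        if s0 != s1 and (s0 - half) * (s1 - half) <= 0:
            frac = (half - s0) / (s1 - s0)
            return temps[i] + frac * (temps[i + 1] - temps[i])
    return None  # no transition found in the scanned range

# Synthetic (illustrative) absorbance at 295 nm: a sigmoidal decrease
# centered at 58 C with an arbitrary width of 4 C. Not experimental data.
temps = list(range(20, 92, 2))
a295 = [1.0 / (1.0 + math.exp((t - 58.0) / 4.0)) for t in temps]

t_half = melt_midpoint(temps, a295)
print(f"Estimated T1/2: {t_half:.1f} C")
```

A halfway-crossing estimate like this only makes sense for a single sigmoidal transition; the biphasic curves described for Na+, Li+ and Cs+ would need their baselines treated separately.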

3.3.2. hex4_A5U oligonucleotide is not limited to a single G4 conformation

We performed DMS footprinting followed by piperidine cleavage (Figure 3.4) to identify the Gs involved in G4 formation in K+ and to characterize the G4-like structures that form in the presence of non-K+ cations. G4 prediction by the Quadparser algorithm (Huppert and Balasubramanian, 2005) flagged G4–G6, G14–G16, G24–G26 and G28–G30 as the four continuous G-tracts in the hex4_A5U sequence involved in G-tetrad formation (Andorf et al., 2014). A distinct footprinting pattern marked by missing products that correspond to Gs protected by G4 formation was seen only in K+-annealed samples (Figure 3.4A, lane 1). Specifically, bands corresponding to cleavage at G3–G5 (G-tract I from the Quadparser model), G25–G26 (partial G-tract III) and G28–G30 (G-tract IV) were missing, indicating that those Gs were strongly protected from being DMS labeled. Low intensity bands corresponding to cleavage at G6 and G24 suggested weaker protection. In contrast, bands corresponding to cleavage at G14–G16 (G-tract II) and other discontinuous Gs (G8, G11, G18, G19, G21 and G22) were strong, indicating those Gs were not protected. Thus, we identified only two complete and one partial G-tract out of four G-tracts assigned by Quadparser for samples annealed in the presence of G4-inducing K+.

There was a similar, but much less prominent, footprinting pattern in the Na+-, Li+- or Cs+-annealed samples (Figure 3.4A, lanes 2, 3 and 4) visible only in the 3' region (compare intensity of the G31 band to G28–G30). In contrast, there was no protection in the absence of cations (Figure 3.4A, lane 5) and most of the oligonucleotide was degraded in water alone (Figure 3.4A, lane 6). The hex4_A5U DMS footprinting patterns in different cations agree with CD and UV-Vis melting experiments, which showed that K+, and to a lesser degree Na+, Li+ and Cs+, supported G4 formation whereas no secondary structure was detectable in the absence of cations.

Despite unambiguous spectroscopic evidence of G4 formation in K+, our DMS footprinting did not show an expected protection pattern in G-tract II or III Gs (Figure 3.4A). This raised a question about the role of the middle Gs in the G4 structure and the possibility of heterogeneous G4 structures. We first created a shorter construct, trim_A5U, with a 3-base truncation at the 5' end and a 1-base truncation at the 3' end, to simplify our analysis and eliminate the structures that would arise due to the G-register exchange (Figure 3.4B). All further analysis was done in the trim_A5U background. DMS footprinting of the trim_A5U construct showed a different footprinting pattern, where only G24–G25 and G28–G30 at the 3’ end were clearly protected. Some difference in the degree of

digestion was observed for G25 vs G26, G18 vs G19 and G14 vs G15–G16 (Figure 3.4B lane 2). Additionally, G4–G6 were less digested in KCl vs LiCl, indicating their involvement in G4 formation (Figure 3.4B lanes 1 and 2). To verify that a canonical G4 can still form, we further modified the trim_A5U construct by replacing G8, G18, G19, G21 and G22 with thymidines, giving rise to a ‘locked’ canonical construct, A5UAH. DMS footprinting of this locked variant clearly showed protection of 12 guanines in KCl but not LiCl. These guanines formed a G4 core, whereas overdigestion of G11 showed that it was not involved in core formation.

Next, we employed rational mutagenesis to define the apparent heterogeneity of trim_A5U in G-tracts I, II and III. Despite the predictions of middle G-tract involvement in G4 formation, deletion of the middle G-tract (G14–G16) or substitution of those Gs with adenines had no effect on G4 formation in K+, as assessed by TDS (Figure 3.5A). We already established that G18–G19 and G21–G22 could not be exclusively involved as a bulged G-tract II, since their substitution by thymidines resulted in a sequence that was still capable of G4 formation (Figures 3.4B, 3.5A). Point mutations that simultaneously disrupted continuous G-tracts I, III and IV (G4, G25, G30) resulted in a sequence that did not form a G4 (Figure 3.5B). To test the possibility of intermolecular G4 formation from an oligonucleotide containing two G-tracts, we made an A5UR20 (random 20) construct where the 5' sequence upstream of the GGGAGGG hairpin was replaced with 20 random non-G bases. This oligonucleotide did not form a stable G4 (Figure 3.5B), although it had a CD spectrum indicative of parallel G4s when annealed with K+ or Li+ (Figure 3.5C).

From this analysis, we further hypothesized that the trim_A5U sequence exists as a mix of G4 conformers in G-tracts II and III, including variants where continuous G-tracts were interrupted by non-G bases, forming a bulge (Mukundan and Phan, 2013). According to this extended model, G-tracts I and IV are fixed but G-tracts II and III are formed by six bases from a G-slide region of 10 Gs with or without one-base bulges (Figure 3.6A). Based on these assumptions we identified 13 possible variants (Figure 3.6B), named according to the participating G triplets that form G-tracts II and III (A–H). To test our model, we designed “locked” sequences that were intended to limit the oligonucleotide to adopting a single conformation (Figure 3.6B, Table 3.1).

3.3.3. Locked hex4_A5U variants form G4s with distinct properties

We proceeded to characterize the ability of locked trim_A5U variants to form G4s in K+ through our series of spectroscopic assays. TDS showed that all locked variants A5UAD–A5UEH form G4s, albeit with variable amplitude of the negative 295 nm peak (Figure 3.7A). CD spectra

revealed additional differences between the variants (Figure 3.7B). Specifically, A5UAD, A5UAH, A5UBH, A5UCF, A5UCH, A5UDH and A5UEH had signatures of parallel G4s; A5UAE, A5UAF, A5UAG, A5UBF and A5UBG had signatures of antiparallel G4s; A5UCG had a mixed spectrum. Thermal denaturation experiments showed that A5UAD, A5UCF and A5UCG formed the weakest G4 structures, followed by A5UBF, A5UBG and A5UBH, which together formed a group of G4 variants with T1/2 <30 ºC (Figure 3.7C). The remaining seven variants (A5UAE, A5UAF, A5UAG, A5UAH, A5UCH, A5UDH and A5UEH) formed G4s that were stable at room temperature. All but A5UAH contained one or two bulged G-tracts. Taken together, these data show that all 13 locked trim_A5U variants formed G4s but varied in their topology and thermal stability (Table 3.1).

3.3.4. ZmNDPK1 requires two consecutive G-tracts with a single one-base loop for efficient binding

Previously, we demonstrated that ZmNDPK1 binds to wild-type hex4_A5U G4 DNA with high affinity (22). We determined that ZmNDPK1 also binds with high affinity to trim_A5U (Kd = 16.6 nM), as well as to the locked canonical variant A5UAH (Kd = 14.4 nM), but not to another locked variant, A5UAE (Kd = 194 nM) (Figure 3.8A, B, C). We further tested which locked variants competed with wild-type hex4_A5U for binding to ZmNDPK1 to assess their binding specificity. At 100-fold excess of competitor, locked variants showed varying degrees of competition efficiency (Figure 3.8D, Table 3.1). Although each of the 13 variants competed for binding, only those that are classified according to CD measurements as parallel competed with greater than 50% efficiency (Figure 3.8D). The common feature shared by the strong competitors (A5UAD, A5UAH, A5UBH, A5UCF, A5UCH, A5UDH and A5UEH) was the presence of two G-tracts connected by a single adenosine: GGGAGGG (or GGGAGAGG with a bulge in the second G-tract as in A5UAD) (Table 3.1).

3.3.5. ZmNDPK1 binds and stabilizes intermolecular G4s

ZmNDPK1 binds to G4s that are annealed in Li+ and promotes G4 folding upon binding (Figure 3.9A) (Kopylov et al., 2015). To further explore the binding properties of ZmNDPK1 with G-rich DNA that is not pre-formed into an intramolecular G4 conformation, we tested whether or not ZmNDPK1 could bring together two separate DNA strands. When 50 nM of each of 5’ FAM-labeled hex4_A5U and 3’ TAMRA-labeled hex4_A5U oligonucleotides were mixed and then annealed either in K+ or Li+, neither sample exhibited FRET in the absence of protein (Figure 3.9B). When ZmNDPK1 was added to the K+-annealed oligonucleotide there was no change to the FRET signal. In contrast, the single-labeled oligonucleotides pre-annealed in Li+ exhibited FRET when mixed with ZmNDPK1 (Figure 3.9B). Interestingly, when the A5UR20 oligonucleotide (20 random bases ending with the 3' GGGAGGG hairpin) was used instead of hex4_A5U, no FRET was observed despite A5UR20 showing a signature of parallel G4 in CD experiments (Figure 3.9C).

3.3.6. ZmNDPK1 and trim_A5U form a heterogeneous complex

To gain insight into the mechanism of complex formation between ZmNDPK1 and trim_A5U we used negative staining electron microscopy (Figure 3.10). For ZmNDPK1 alone, we saw uniformly distributed globular protein molecules of the expected size (Figure 3.10A). After ZmNDPK1 was incubated with trim_A5U we saw formation of filamentous structures of uniform ~6 nm thickness but various lengths and shapes (Figure 3.10B). The ZmNDPK1:trim_A5U complex was then plunge-frozen and images were collected in vitrified ice under cryogenic conditions (Figure 3.10C). We saw that filaments were well preserved in ice and uniformly distributed. 2D-classification of the particles picked from cryoEM data confirmed that the complex has a distinct but highly heterogeneous structure (Figure 3.10D).

3.4. Discussion

G4s are now recognized as important elements in the regulation of intracellular processes related to replication, transcription, translation, splicing and telomere maintenance (Rhodes and Lipps, 2015). In fact, G4 formation in the promoter of a gene can either inhibit (Balasubramanian et al., 2011, Cogoi and Xodo, 2006) or facilitate its transcription (Catasti et al., 1996, Farhath et al., 2015). In vivo, G4s exist in the context of the double-stranded genome and are regulated through interaction with G4-binding proteins like XPB and telomere end-binding proteins (Rhodes and Lipps, 2015, Brázda et al., 2014, Gray et al., 2014, Paeschke et al., 2005).
In vitro, G4 formation is largely driven by the presence of K+ or Na+, so in addition to possible coordination by proteins G4 formation is also sensitive to the ionic environment of the cell.

3.4.1. G4 formation by the hex4_A5U oligonucleotide

UV-Vis spectrophotometry, CD spectrophotometry and DMS footprinting are commonly used techniques to verify G4 formation by a given nucleotide sequence; each has unique strengths but none is able to unambiguously assess the G4 conformation. TDS is a qualitative technique based on UV-Vis spectrophotometry that relies on the hyperchromicity of G4s at 295 nm, but the signal changes qualitatively with the base composition of the nucleic acid (Mergny et al., 2005). hex4_A5U shows a distinct G4 TDS profile only in the presence of K+ ions (Figure 3.1B), whereas the TDS profile of hex4_A5U in Na+, Li+ and Cs+ is intermediate between K+ and the absence of cations, suggesting formation of a weak G4. Aside from TDS, UV-Vis spectrophotometry can be used to monitor the stability of the G4 by measuring the change in absorbance at 295 nm with increasing temperature (Mergny and Lacroix, 2009). In 100 mM K+, hex4_A5U has a T1/2 of 58 ºC, showing that it is stable at physiologically relevant temperatures (Figure 3.1C). We also observe an initial decrease in absorbance at 295 nm for all other cations, suggesting a transient G4 is melting, but not in the absence of cations. We conclude that the hex4_A5U oligonucleotide forms a G4 only if stabilized by a cation, where K+ >> Na+ > Li+ > Cs+.

CD spectrophotometry is commonly used to assess the properties of oligonucleotides to give clues about the secondary structure. hex4_A5U has CD spectra characteristic of a parallel G4 conformation (Balagurumoorthy et al., 1992) in K+, Na+, Li+ and Cs+ (Figure 3.1D). In the absence of any small cation, CD spectra indicate that the oligonucleotide is disordered (Figure 3.1D). Interestingly, CD thermal denaturation experiments show that G4s in K+ melt as a single species, whereas in Na+, Li+ and Cs+ there is a structural transition evidenced by the shift of the peak maxima from 262 nm to 255 nm (Figure 3.3). After the transition, melting profiles for Na+, Li+ and Cs+ resemble that of the oligonucleotides annealed in the absence of cations. The temperature at which this transition occurs is cation-dependent and matches the G4 stability order for cations determined in UV-Vis thermal denaturation experiments: K+ >> Na+ > Li+ > Cs+.

Lastly, DMS footprinting provides additional insight into G4 topology by analyzing solvent-accessible Gs.
DMS footprinting shows strong protection of Gs only in K+ (Figure 3.4A lane 1), whereas protection in Na+, Li+ and Cs+ is limited to the GGGAGGG hairpin at the 3' end of the sequence (Figure 3.4A lanes 2-4). All Gs are completely unprotected in the absence of cations, representing a fully unfolded state. We attribute the partial protection in Na+, Li+ and Cs+ to the formation of a weak intermolecular G4 that forms by cation stabilization of the 3' GGGAGGG hairpins from two DNA molecules. Indeed, the A5UR20 oligonucleotide, which has 20 random non-G bases followed by a GGGAGGG hairpin on the 3' end, does not have a characteristic G4 TDS spectrum (Figure 3.5B), but it has a parallel G4 CD spectrum in the presence of cations (Figure 3.5C). We conclude that hex4_A5U forms an intramolecular parallel G4 only in the presence of K+, whereas it forms a weak intermolecular G4 in the presence of Na+, Li+ or Cs+.

3.4.2. hex4_A5U and its truncated variant trim_A5U are highly polymorphic G4-forming sequences

G4 polymorphism is common, complicating structure prediction based on sequence alone. Examples of polymorphism include extra G-tracts that can act as a "spare tire" (Fleming et al., 2015); formation of an ensemble of structures with different topologies (Dai et al., 2007, Guédin et al., 2010); variation in the number of strands (one, two or four) and tetrads (two or more); presence of bulges (Mukundan and Phan, 2013); and loops longer than seven nucleotides (Guédin et al., 2010). hex4_A5U was initially predicted to form from four uninterrupted G-tracts of three sequential Gs (Figure 3.6A). Instead, DMS footprinting reveals that only two G-tracts are fully protected, whereas G-tract II is not protected and G-tract III is only partially protected (Figure 3.4A, lane 1). Further, G-tract II is not strictly required for G4 formation in K+ (Figures 3.4A, 3.5A). To explain this mismatch we hypothesize that adjacent Gs can be substituted into G-tract II, forming a series of structures with bulged G-tracts that co-exist in solution (Figure 3.6A). Such a dynamic system combines G-register exchange (Harkness and Mittermaier, 2016a) with the formation of bulged variants and leads to the apparent absence of protection in G-tract II and only partial protection in G-tract III over the course of DMS labeling. DMS footprinting of trim_A5U revealed that in this truncated construct strong protection of guanines is observed only for tracts III and IV (Figure 3.4B, lane 2). The only time we observe complete protection of all 12 guanines involved in G4 core formation is in DMS footprinting of the A5UAH construct (Figure 3.4B, lane 4), where all extra guanines were substituted by thymidines.

An expanded definition of G4-forming sequences emerges that allows G-tracts to be interrupted by a one-base bulge, connected into a continuous region that we call a "G-slide" (Figure 3.6A). This guanine-rich region can also be mathematically described as a 10 choose 6 combinatorics problem that results in 210 combinations, of which we explored only 13 variants by limiting ourselves to single-bulge interruptions of G-tracts. From our formulation, trim_A5U can form at least 13 different conformers, isolated structurally by point mutagenesis (Figure 3.6B, Table 3.1). By all measurements, each resulting variant behaves in a sequence-specific manner that is ultimately predictive of its fold and determines its interaction with the G4-binding protein ZmNDPK1 (Figures 3.7, 3.8). The CD spectrum of the trim_A5U sequence has a minor contribution of the antiparallel signal when compared to the locked variants (Figures 3.7B, 3.8B). This suggests that predominantly antiparallel (A5UAE, A5UAF, A5UAG, A5UBF and A5UBG) as well as unstable variants (A5UAD, A5UCF and A5UCG) constitute a small fraction of solution conformations. Therefore, co-existing parallel G4s with variable G-slide picks (A5UAH, A5UBH, A5UCH, A5UDH and A5UEH) represent the majority of conformational states of trim_A5U. Overall, the wild-type conformation is likely determined by the relative stability of the fold and the presence of a GGGAGGG hairpin that favors the formation of a parallel G4 (Figure 3.7B, Table 3.1) (Tippana et al., 2014).

3.4.3. The G4-binding protein ZmNDPK1 recognizes a subset of conformations adopted by hex4_A5U DNA and forms filamentous structures upon binding

DNA is associated with protein binding partners within the nucleus. ZmNDPK1, a plant homolog of human NM23-H2, interacts with hex4_A5U with high affinity and specificity (Kopylov et al., 2015). Despite the analogy between plant and human NDPKs binding to G-rich DNA sequences (Hildebrandt et al., 1995, Postel et al., 1993, Boissan and Lacombe, 2011), we do not know how they interact or, until now, what structural motifs direct binding. ZmNDPK1 does not have a single preferred G4 conformation but binds more specifically to parallel G4s that contain the GGGAGGG motif with or without bulges (Figure 3.8, Table 3.1). Additionally, ZmNDPK1 recognizes the structural element that gives rise to weak G4 signals in suboptimal G4-promoting ions (i.e. Li+), perhaps a transitory guanine hairpin (Yafe et al., 2005), and then facilitates bimolecular G4 formation (Figure 3.9A, B).

Electron microscopy of the ZmNDPK1:trim_A5U complex revealed its assembly into filamentous structures (Figure 3.10B, C). These structures differed in their lengths, but not thickness. 2D-classification of particles from cryoEM images provided a low-resolution look into the organization of this complex (Figure 3.10D). We can see that the complex is highly flexible and poorly resolved, which made it impossible to distinguish between protein and DNA densities in our 2D classes.
One thing was clear: the complex of ZmNDPK1:trim_A5U was not as simple as two G4s per hexamer, as we had predicted from the stoichiometry determined biochemically in solution.

3.4.4. Generalization of the extended model

Numerous examples in well-studied human G4s including c-myc (Seenisamy et al., 2004), RET (Shin et al., 2015, Kumarasamy et al., 2015, Guo et al., 2007), VEGF (Sun et al., 2008), and BCL-2 (Dexheimer et al., 2006) suggest that formation of multiple structures by a single sequence is a common feature of heterogeneous G4s. Indeed, a minimal version composed of four Gs in a single G-tract where the 5’ or 3’ G can swap into the three-G stretch of the slide region has been described as a slippage of the G-tract in c-myc (Seenisamy et al., 2004). Similarly, a specific instance of the slide can be seen in the oxidative protection mechanism described as the spare tire, where a fifth terminal G-tract can slide into place, positioning the fourth G-tract in a long loop that allows repair of oxidatively damaged Gs (Fleming et al., 2015). Here, we have generalized these specific examples into an extended G4 model that allows Gs from long, non-continuous G stretches to slide into the G4 stack, creating a range of non-canonical G4 conformations that have unique properties and specific responses to a G4-binding protein (Figure 3.11).


Figure 3.1: Spectroscopic analysis of G-quadruplex (G4) formation by hex4_A5U. Oligonucleotides were annealed in water (black), 10 mM TBA buffer pH 7.5 (gray), or 10 mM TBA buffer supplemented with 100 mM of KCl (red), NaCl (brown), LiCl (blue), or CsCl (green). A. Schematic representation of a canonical parallel unimolecular G4. Four tracts of three consecutive guanines (spheres) form three stacked G-quartets (cyan) stabilized by a monovalent cation. L1, L2 and L3 are lateral loops (magenta). B. Normalized thermal difference spectra show formation of G4s indicated by a negative peak at 295 nm. A prominent negative peak is observed only in KCl and is absent in water or buffer, with intermediate values observed for NaCl, LiCl and CsCl. C. Thermal melting measures the cation-dependent stability of the G4 structures. G4s formed in KCl are most stable with T1/2 of 58°C, followed by 50°C for NaCl, 42°C for LiCl and <30°C for CsCl. A linear increase in absorbance at 295 nm for oligonucleotides annealed in the absence of cation indicates that no G4 structures are formed. D. CD spectra show the formation of parallel G4s in the presence of cations. Peak maxima at 262 nm and minima at 242 nm are the hallmarks of parallel G4s and are observed in KCl, NaCl, LiCl and CsCl.
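As a rough illustration of how a normalized thermal difference spectrum such as the one in panel B can be computed, the sketch below subtracts a low-temperature absorbance spectrum from a high-temperature one and divides by the largest positive difference. The normalization convention is an assumption (the caption does not state which one was used), the absorbance values are made-up placeholders rather than measured data, and `thermal_difference_spectrum` is a hypothetical helper name.

```python
def thermal_difference_spectrum(a_high, a_low):
    """Return the high-minus-low temperature difference spectrum,
    scaled so that its largest positive value equals +1."""
    if len(a_high) != len(a_low):
        raise ValueError("spectra must share the same wavelength grid")
    diff = [hi - lo for hi, lo in zip(a_high, a_low)]
    peak = max(diff)
    if peak <= 0:
        raise ValueError("no positive difference to normalize against")
    return [d / peak for d in diff]

# Placeholder absorbance values on a shared wavelength grid (nm).
wavelengths = [245, 260, 275, 295]
a_unfolded = [0.52, 0.80, 0.62, 0.10]  # high temperature (illustrative)
a_folded = [0.50, 0.70, 0.55, 0.16]    # low temperature (illustrative)

tds = thermal_difference_spectrum(a_unfolded, a_folded)
for wl, value in zip(wavelengths, tds):
    print(f"{wl} nm: {value:+.2f}")
```

With these placeholder numbers the folded sample absorbs more at 295 nm than the unfolded one, so the difference at 295 nm comes out negative, which is the signature the caption describes.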


Figure 3.2: Analytical ultracentrifugation of hex4_A5U annealed in KCl shows formation of a compact structure. A. Raw sedimentation scans (yellow) overlaid with the calculated fit (red). B. Residuals between the fit and the model plot showing their random nature. C. Relative concentration and distribution of the species with different sedimentation coefficients. D. Summary table of the contents of the solution after genetic algorithm analysis as implemented in Ultrascan 3 (Cao and Demeler, 2008, Demeler, 2005).


Figure 3.3: CD melts show reversible structural transition of a G4 formed by hex4_A5U with different cations. Oligonucleotides were annealed in 10 mM TBA buffer supplemented with 100 mM of KCl (A), NaCl (B), LiCl (C), CsCl (D), buffer alone (E) or water (F). In all conditions, we observed the overall decrease in CD signal intensity with an increase in temperature. Melting G4s annealed in KCl results in a sigmoidal curve with T1/2 of 58°C. Melting in NaCl, LiCl and CsCl reveals two-state behavior indicative of a structural transition from a G4 to ssDNA: a sigmoidal phase is followed by a linear phase. Melts for water and TBA buffer alone are linear and represent unstacking of the ssDNA bases. Thermal unfolding of the secondary structure is reversible, indicated by the dashed black line that corresponds to spectra collected immediately after the samples were cooled to 20°C. Insets: plots of molar ellipticity at 262 nm versus temperature.


Figure 3.4: Dimethyl sulfate (DMS) footprinting of hex4_A5U and its variations reveals guanines involved in G4 core formation. A. Missing bands on a gel indicate guanines protected from DMS labeling. A distinct footprinting pattern is observed only for the KCl sample (lane 1). In NaCl, LiCl and CsCl partial protection is observed for the GGGAGGG hairpin at the 3’ end of the oligonucleotide (lanes 2-4). In TBA all guanines are digested evenly and in water alone the sample is overdigested. Circles (left) indicate guanines that are protected (○), partially protected (◗) or overdigested (●) when treated in KCl. B. hex4_A5U was trimmed by removing bases 1, 2, 3 and 31, resulting in the trim_A5U construct. trim_A5U was further altered by substituting G8, G18, G19, G21 and G22 with thymidines, resulting in the A5UAH construct. Both trim_A5U and A5UAH oligonucleotides were subjected to DMS footprinting in KCl or LiCl.


Figure 3.5: Preliminary mutagenesis of the trim_A5U oligonucleotide. Oligonucleotides were annealed in a buffer containing 10 mM TBA pH 7.5 supplemented with 100 mM KCl or LiCl. A. trim_A5U variants with the deletion or substitution of tract II guanines with adenines form G4s. B. The trim_G4-25-30T oligonucleotide, with point mutations in tracts I, III and IV, and the A5UR20 oligonucleotide, with randomized sequence upstream of G-tract II, do not form stable G4s. C. A G4-characteristic CD spectrum is observed only when A5UR20 oligonucleotides are annealed in the presence of cations, but not in the TBA buffer alone.

Figure 3.6: Extended model of G4 formation by trim_A5U allowing one bulge in G-tract. A. Comparison between the canonical model and the extended G4 model. Extended model allows longer loops and a one-base-bulge interruption of G-tracts. Under the canonical model there is only one possible fold that can be adopted by trim_A5U to form a G4 core—using tracts that do not contain bulges—tract II (A) and tract III (H). Under the extended model trim_A5U can potentially form 13 different G4 core folds (including a canonical fold A5UAH) with fixed tracts I and IV and the potential of one-base bulges in tracts II and III. B. Guanines that can be involved in formation of the G4 core in the extended model are highlighted.
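The count of 13 folds can be reproduced with a short enumeration. The sketch below is one possible reading of the extended model, not a procedure taken from this work: it assumes that the ten G-slide guanines are G14–G16, G18, G19, G21, G22 and G24–G26 (hex4_A5U numbering, as named in the text), that triplets A–H are windows of three successive slide guanines, and that tracts II and III must be separated by at least one nucleotide. That last rule is an assumption introduced here, and all identifiers are hypothetical. The final line evaluates the binomial coefficient for an unrestricted choice of six guanines out of ten.

```python
from itertools import combinations
from math import comb

# G-slide guanine positions in hex4_A5U numbering, as named in the text.
SLIDE = [14, 15, 16, 18, 19, 21, 22, 24, 25, 26]

# Triplets A-H: every window of three successive slide guanines.
triplets = {
    letter: tuple(SLIDE[i:i + 3])
    for i, letter in enumerate("ABCDEFGH")
}

def max_gap(triplet):
    """Largest number of non-G bases between neighboring Gs of a tract."""
    return max(b - a - 1 for a, b in zip(triplet, triplet[1:]))

# Each window spans at most a one-base bulge.
assert all(max_gap(t) <= 1 for t in triplets.values())

variants = []
for (name2, tract2), (name3, tract3) in combinations(triplets.items(), 2):
    # Assumed rule: tract III starts at least two positions after tract II
    # ends, leaving a loop of one or more nucleotides between them.
    if tract3[0] - tract2[-1] >= 2:
        variants.append(name2 + name3)

print(len(variants), variants)
print("Unrestricted picks of 6 Gs out of 10:", comb(10, 6))
```

Under these assumptions the enumeration yields the same two-letter names used for the locked variants in Table 3.1; pairs such as BE and DG drop out because they would leave no nucleotide between tracts II and III.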


Figure 3.7: Spectroscopic analysis of G-quadruplex (G4) formation by trim_A5U locked variants. trim_A5U variants based on the extended model (derived from Figure 3.6B) were tested for their ability to form G4s. Guanines were substituted with thymidines to preclude their involvement in G4 core formation. A. With the exception of A5UAD, A5UCF and A5UCG, all trim_A5U variants have a prominent negative peak at 295 nm, indicating G4 formation. B. CD spectra show that locked variants form G4s with different topologies. Variants A5UAD, A5UAH, A5UBH, A5UCF, A5UCH, A5UDH and A5UEH form parallel G4s (major peak at 262 nm); A5UAE, A5UAF, A5UAG, A5UBF and A5UBG form antiparallel G4s (major peak at 292 nm); A5UCG has a mixed spectrum with similar ellipticity at 262 and 292 nm. C. Thermal melts monitored at 295 nm show that all locked variants form G4s with different stabilities; however, A5UAD, A5UBF, A5UBG, A5UBH, A5UCF and A5UCG form weak G4s with T1/2 <30°C.
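The topology assignment described for the CD spectra (major peak at 262 nm for parallel, at 292 nm for antiparallel, similar ellipticity at both for mixed) can be written as a small decision rule. This is only a sketch: the function name `classify_cd`, the 20% tolerance used to decide that two peaks are "similar", and the example ellipticities are assumptions for illustration and do not come from this work.

```python
def classify_cd(ellipticity_262, ellipticity_292, tolerance=0.2):
    """Label a CD spectrum from its ellipticity at 262 nm and 292 nm.

    `tolerance` is an assumed relative difference below which the two
    peaks are treated as similar; it is not a value from the source.
    """
    larger = max(abs(ellipticity_262), abs(ellipticity_292))
    if larger == 0:
        return "undetermined"
    if abs(ellipticity_262 - ellipticity_292) / larger <= tolerance:
        return "mixed"
    return "parallel" if ellipticity_262 > ellipticity_292 else "antiparallel"

# Placeholder ellipticities in arbitrary units (not measured values).
examples = {"dominant 262 nm": (10.0, 2.0),
            "dominant 292 nm": (3.0, 9.0),
            "similar peaks": (6.0, 6.5)}
for label, (e262, e292) in examples.items():
    print(f"{label}: {classify_cd(e262, e292)}")
```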


Figure 3.8: G4 binding protein ZmNDPK1 preferentially binds to the parallel locked variants of trim_A5U. A. Binding of ZmNDPK1 to the trim_A5U causes the retention of the fluorescent label on nitrocellulose. B. Binding of ZmNDPK1 to the trim_AH causes the retention of the fluorescent label on nitrocellulose. C. Binding of ZmNDPK1 to the trim_AE causes the retention of the fluorescent label on nitrocellulose. D. Competition efficiency was calculated as the amount of the probe retained on the nitrocellulose compared to the no competitor control. Only parallel G4 locked variants compete with high efficiency: A5UAD, A5UAH, A5UBH, A5UCF, A5UCH, A5UDH and A5UEH.
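To give a feel for what the dissociation constants reported in section 3.3.4 imply, the sketch below evaluates a simple 1:1 binding isotherm, fraction bound = [P] / (Kd + [P]). This model is an assumption made for illustration only: it treats free protein as equal to total protein and ignores the hexameric state of ZmNDPK1, and it is not necessarily the model used to fit the filter-binding data. The protein concentration of 50 nM is arbitrary and `fraction_bound` is a hypothetical helper.

```python
def fraction_bound(protein_nM, kd_nM):
    """Fraction of probe bound for a simple 1:1 binding isotherm,
    assuming free protein is approximately equal to total protein."""
    return protein_nM / (kd_nM + protein_nM)

# Dissociation constants reported in section 3.3.4 (nM).
KD = {"trim_A5U": 16.6, "A5UAH": 14.4, "A5UAE": 194.0}

protein = 50.0  # nM, arbitrary illustrative concentration
for name, kd in KD.items():
    bound = fraction_bound(protein, kd)
    print(f"{name}: {bound:.2f} bound at {protein:.0f} nM protein")
```

By construction, the fraction bound is one half when the protein concentration equals Kd, so a roughly 13-fold higher Kd means a correspondingly higher protein concentration is needed to reach half-saturation.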


Figure 3.9: ZmNDPK1 binds to and stabilizes intermolecular and intramolecular G4s. Fluorescence emission data were collected by exciting the FAM fluorophore and the resulting plots were normalized to a peak maximum of 1 to better visualize the changes. A. hex4_A5U_5F3T: dual-labeled oligonucleotides. When annealed in KCl, the FRET signal changes little with increasing protein concentration. When annealed in LiCl, the FRET signal increased with increasing protein concentration. B. hex4_A5U_5F/3T: two single-labeled oligonucleotides. When annealed in KCl, FRET did not change with added protein. When annealed in LiCl, the FRET signal increased with increasing protein concentration. C. A5UR20 5F/3T: two single-labeled oligonucleotides, with 20 random non-G bases ending with a GGGAGGG hairpin. When annealed in either KCl or LiCl, the FRET signal did not change with added protein.


Figure 3.10: Electron microscopy of the complex between ZmNDPK1 and trim_A5U. A. Image of negatively stained ZmNDPK1 alone. B. Image of negatively stained ZmNDPK1 in complex with trim_A5U. C. CryoEM image of ZmNDPK1 in complex with trim_A5U in vitreous ice. D. Results of 2D-classification of 30,000 filament segments picked from the cryoEM images.

Figure 3.11: Possible topologies that can be adopted by the trim_A5U oligonucleotides. Out of 13 conformation possibilities predicted by the extended model only one is canonical - A5UAH while twelve others contain a bulge in G-tract II, III or both G-tracts. ZmNDPK1 binds to the variants with the conserved GGGAGGG hairpin (contrasted models).


Table 3.1: Summary of the properties of trim_A5U locked variants. Gs that participate in the G4 formation are bolded and G-tracts are underlined and bold. Mutated residues are in lowercase.

Oligonucleotide     Sequence                      CD             Tm (°C)   Competition %
A5UAD               GGGTtTTGAAGGGAGGAGtAtttAGGG   parallel       <30       73
A5UAE               GGGTtTTGAAGGGAtGAGGAtttAGGG   antiparallel   37        16
A5UAF               GGGTtTTGAAGGGAttAGGAGttAGGG   antiparallel   ~30       15
A5UAG               GGGTtTTGAAGGGAttAtGAGGtAGGG   antiparallel   40        40
A5UAH (canonical)   GGGTtTTGAAGGGAttAttAGGGAGGG   parallel       42        77
A5UBF               GGGTtTTGAAtGGAGtAGGAGttAGGG   antiparallel   <30       23
A5UBG               GGGTtTTGAAtGGAGtAtGAGGtAGGG   antiparallel   <30       50
A5UBH               GGGTtTTGAAtGGAGtAttAGGGAGGG   parallel       <30       85
A5UCF               GGGTtTTGAAttGAGGAGGAGttAGGG   parallel       <30       62
A5UCG               GGGTtTTGAAttGAGGAtGAGGtAGGG   mixed          <30       36
A5UCH               GGGTtTTGAAttGAGGAttAGGGAGGG   parallel       ~30       76
A5UDH               GGGTtTTGAAtttAGGAGtAGGGAGGG   parallel       ~30       77
A5UEH               GGGTtTTGAAtttAtGAGGAGGGAGGG   parallel       35        81
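The sequences in Table 3.1 can be checked against the canonical G4 motif used by Quadparser-style searches: four runs of three or more Gs separated by loops of one to seven nucleotides. The sketch below is an illustrative regular-expression scan written for this purpose, not the Quadparser program itself; the names `CANONICAL_G4` and `LOCKED` are hypothetical. Sequences are copied from the table and upper-cased before matching, since lowercase only marks mutated residues.

```python
import re

# Canonical motif: four runs of three or more Gs separated by loops of
# one to seven nucleotides.
CANONICAL_G4 = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

# Locked variant sequences from Table 3.1 (lowercase = mutated residue).
LOCKED = {
    "A5UAD": "GGGTtTTGAAGGGAGGAGtAtttAGGG",
    "A5UAE": "GGGTtTTGAAGGGAtGAGGAtttAGGG",
    "A5UAF": "GGGTtTTGAAGGGAttAGGAGttAGGG",
    "A5UAG": "GGGTtTTGAAGGGAttAtGAGGtAGGG",
    "A5UAH": "GGGTtTTGAAGGGAttAttAGGGAGGG",
    "A5UBF": "GGGTtTTGAAtGGAGtAGGAGttAGGG",
    "A5UBG": "GGGTtTTGAAtGGAGtAtGAGGtAGGG",
    "A5UBH": "GGGTtTTGAAtGGAGtAttAGGGAGGG",
    "A5UCF": "GGGTtTTGAAttGAGGAGGAGttAGGG",
    "A5UCG": "GGGTtTTGAAttGAGGAtGAGGtAGGG",
    "A5UCH": "GGGTtTTGAAttGAGGAttAGGGAGGG",
    "A5UDH": "GGGTtTTGAAtttAGGAGtAGGGAGGG",
    "A5UEH": "GGGTtTTGAAtttAtGAGGAGGGAGGG",
}

# Variants whose sequence contains the canonical motif.
canonical = [
    name for name, seq in LOCKED.items()
    if CANONICAL_G4.search(seq.upper())
]
print(canonical)  # ['A5UAH']
```

Only A5UAH carries four uninterrupted GGG runs, so a canonical-motif search would miss the other twelve variants even though all of them form G4s according to the data summarized in the table.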

CHAPTER 4

SUMMARY AND FUTURE DIRECTIONS

G4s are diverse secondary structures of nucleic acids. Unlike the classic dsDNA double helix, G4s come in a variety of shapes and sizes and therefore differ in their properties. In this respect G4s resemble proteins: their secondary structure, not just their sequence, determines their function. Another similarity to proteins is the G4 folding core, a central hydrophobic core of the kind found in many proteins. Because of these similarities, many techniques established for proteins are directly applicable to studying G4s: UV and CD spectroscopy, thermal melts, analytical ultracentrifugation, FRET, X-ray crystallography and NMR. G4s can also be studied with the generic techniques established for nucleic acids, such as TDS, EMSA, nitrocellulose filter binding assays and DMS footprinting.

Putative G4s are found in the genomes of all living organisms, from bacteria and archaea to plants and mammals. In eukaryotic genomes G4s are non-randomly distributed: they are overrepresented in or near gene promoters, transcription and translation start sites and first intron/exon boundaries.

We identified the first G4-binding protein ZmNDPK1 and characterized its complex formation with a G4-forming oligonucleotide, hex4_A5U, derived from the maize hexokinase4 gene. ZmNDPK1 is a typical hexameric NDPK and, like its human homolog NM23-H2, it is a G4-DNA binder. ZmNDPK1 binds parallel G4s with high affinity (Kd = 8 nM) only in the presence of K+ ions, which are the optimal cations for G4 formation by hex4_A5U. In the presence of a suboptimal cation, such as Li+, the binding affinity of ZmNDPK1 for hex4_A5U is 20-fold weaker. Interestingly, upon binding, ZmNDPK1 promotes G4 formation even in the presence of a suboptimal cation. We conclude that both ZmNDPK1 and NM23-H2 specifically recognize G4 structures and stabilize G4s upon binding.

We investigated the heterogeneity of the hex4_A5U quadruplex using an array of biochemical and biophysical approaches. We concluded that the hex4_A5U oligonucleotide adopts a number of conformations in solution that can potentially interconvert. We synthesized locked variants of the hex4_A5U sequence in which we limited the number of guanines that can be involved in G4-core formation. We showed that these locked variants can form stable G4s of different topologies, with or without bulged G-tract(s). This finding further emphasizes the highly polymorphic nature of G4-forming sequences. We also determined that ZmNDPK1 binds only a subset of the possible conformations adopted by hex4_A5U. Using electron microscopy, we observed that the ZmNDPK1:hex4_A5U complex organizes into heterogeneous filamentous structures with consistent thickness but variable length. Further analysis showed that these filaments have a very short persistence length and no apparent symmetry, which complicated structural analysis.

The number of known G4-interacting proteins is very limited and, so far, no common G4-binding motif has been identified. We demonstrated that G4-binding activity is conserved between the distantly related maize and human NDPKs, suggesting that G4 binding is an evolutionarily conserved feature of this protein. Analysis of the DNA-binding properties of NDPKs with known structures (such as NM23-H1, NM23-H3, NM23-H4, AWD, eNDK, AtNDPK1&2 etc.) could perhaps shed light on the recognition motif used by these proteins. More broadly, work should continue on the identification of novel G4-binding proteins and their cognate G4s. Expression library screening is an underutilized high-throughput method for discovering new G4-binding proteins. We showed that this method can be used to discover G4-specific binding proteins in an unbiased manner. To increase the throughput of the experiment described in section 2.2.1, fluorescently labeled oligonucleotides could be used instead of biotin-labeled ones. With this approach, three different G4-forming oligonucleotides (one parallel, one antiparallel and one mixed) could each be labeled with a different fluorophore and tested against the library simultaneously.

The heterogeneity in G4 topologies complicates analysis of their structures and of the way they interact with proteins. Such heterogeneity is largely dependent on the number of available guanines that can form the G4 core.
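
The dependence of heterogeneity on the number of available guanines can be illustrated with a simple count. The Python sketch below is not part of the analyses in this dissertation. It assumes a three-layer G4 core built from four G-tracts, treats any run of three or more guanines as a tract, and ignores bulges, loop-length limits and strand topology. The function names and the example sequences are hypothetical and were chosen only for illustration.

```python
import re
from itertools import combinations, product


def g_tracts(seq, min_len=3):
    """Return (start, length) for every run of at least min_len guanines."""
    pattern = "G{%d,}" % min_len
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, seq.upper())]


def enumerate_cores(seq, layers=3):
    """List every choice of four G-tracts and one register per tract.

    Each core is a tuple of four start positions; every start marks
    `layers` consecutive guanines that would form one column of the core.
    Assumption: bulges and topological constraints are not modeled.
    """
    tracts = g_tracts(seq, layers)
    cores = []
    for chosen in combinations(tracts, 4):
        # A tract longer than `layers` offers several registers.
        registers = [range(start, start + length - layers + 1)
                     for start, length in chosen]
        cores.extend(product(*registers))
    return cores


# Hypothetical example sequences (not hex4_A5U).
canonical = "GGGTGGGTGGGTGGG"             # four tracts of exactly three Gs
polymorphic = "GGGGTGGGTGGGTGGGGTGGG"     # five tracts, two with an extra G

print(len(enumerate_cores(canonical)))    # 1
print(len(enumerate_cores(polymorphic)))  # 16
```

Under these assumptions a sequence with exactly four GGG tracts has a single possible core, whereas adding a fifth tract and one extra guanine to two of the tracts raises the count to 16, before bulges or alternative topologies are even considered.
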
Others have shown that a fifth G-tract can lead to the formation of multiple G4 conformations that use different tracts (‘spare tire’) (Fleming et al., 2015) and that extra Gs in G-tracts can be used interchangeably, leading to structures with different loop lengths (G-register exchange) (Harkness and Mittermaier, 2016a). Additionally, we have shown that the formation of bulged non-canonical G4s should be taken into account when predicting putative G4-forming nucleotide sequences. The hex4_A5U oligonucleotide is a model sequence that forms a highly polymorphic G4, with examples of each of these competing sources of heterogeneity: a fifth G-tract, extra Gs in G-tracts and non-canonical bulged G4s. Most investigations of G4 polymorphism are done in dilute solutions with an optimal cation. The structural effect of competing cations, such as K+ (a G4 stabilizer) and Mg2+ (a double-stranded DNA stabilizer), is poorly understood, yet such conditions would be more representative of the physiological environment. Further, the effect of molecular crowding, which mimics the intracellular environment, should be addressed because crowded environments can favor structurally compact topologies that would be unstable in dilute solutions.

As of the writing of this dissertation, there are no known biologically relevant atomic-resolution structures of protein:G4 complexes. Structural information would provide a close look at the specific interactions that confer stability and specificity on protein:G4 complex formation. Structural data would allow us to identify the structural motifs that proteins use for G4 recognition and thereby improve the identification of putative G4-binding proteins by bioinformatic methods. Static structural data in combination with molecular dynamics simulations would allow for rational drug design aimed at stabilization or disruption of protein:G4 complexes. The NM23-H2:Pu27(cMyc) complex has been studied for over two decades, but no structures have emerged. We were also unable to crystallize either the ZmNDPK1:hex4_A5U or the NM23-H2:Pu27 complex. We hypothesize that these complexes may be non-crystallizable because they form heterogeneous supramolecular assemblies like the filaments we observed under the microscope (Figure 3.10). CryoEM may ultimately provide 3D structures of protein:G4 complexes: our 2D analysis of cryoEM images shows the feasibility of the approach but also reveals an underlying sample heterogeneity that confounds both X-ray crystallography and high-resolution cryoEM.
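
For a sense of scale, the affinities summarized in this chapter (Kd = 8 nM in K+ and 20-fold weaker binding in Li+) can be converted into fractions of G4 bound. The sketch below is an illustration only. It assumes a simple 1:1 hyperbolic binding isotherm with protein in large excess over DNA, which may not describe a hexameric protein that forms filaments, and the protein concentrations are hypothetical.

```python
def fraction_bound(protein_nM, kd_nM):
    """Fraction of DNA bound at equilibrium.

    Assumption: 1:1 binding with free protein approximately equal to
    total protein (DNA concentration far below Kd).
    """
    return protein_nM / (kd_nM + protein_nM)


KD_K_NM = 8.0                         # Kd in K+, as reported in the text
FOLD_WEAKER_LI = 20.0                 # affinity loss in Li+, as reported in the text
KD_LI_NM = KD_K_NM * FOLD_WEAKER_LI   # implied Kd in Li+: 160 nM

# Hypothetical protein concentrations, in nM, chosen for illustration.
for protein in (8.0, 50.0, 160.0):
    in_k = fraction_bound(protein, KD_K_NM)
    in_li = fraction_bound(protein, KD_LI_NM)
    print(f"{protein:6.1f} nM protein: {in_k:.2f} bound in K+, {in_li:.2f} bound in Li+")
```

Under these assumptions a 20-fold weaker affinity corresponds to a Kd of 160 nM, so half-saturation would require 160 nM protein in Li+ compared with 8 nM in K+.
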

REFERENCES

Abad, J. P. and Villasante, A. (1999) 'The 3' non-coding region of the Drosophila melanogaster HeT-A telomeric retrotransposon contains sequences with propensity to form G-quadruplex DNA', FEBS Lett, 453(1-2), pp. 59-62.

Adams, P. D., Afonine, P. V., Bunkoczi, G., Chen, V. B., Davis, I. W., Echols, N., Headd, J. J., Hung, L. W., Kapral, G. J., Grosse-Kunstleve, R. W., McCoy, A. J., Moriarty, N. W., Oeffner, R., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C. and Zwart, P. H. (2010) 'PHENIX: a comprehensive Python-based system for macromolecular structure solution', Acta Crystallogr D Biol Crystallogr, 66(Pt 2), pp. 213-21.

Amrane, S., Adrian, M., Heddi, B., Serero, A., Nicolas, A., Mergny, J. L. and Phan, A. T. (2012) 'Formation of pearl-necklace monomorphic G-quadruplexes in the human CEB25 minisatellite', J Am Chem Soc, 134(13), pp. 5807-16.

Andorf, C. M., Kopylov, M., Dobbs, D., Koch, K. E., Stroupe, M. E., Lawrence, C. J. and Bass, H. W. (2014) 'G-Quadruplex (G4) Motifs in the Maize (Zea mays L.) Genome Are Enriched at Specific Locations in Thousands of Genes Coupled to Energy Status, Hypoxia, Low Sugar, and Nutrient Deprivation', J Genet Genomics, 41(12), pp. 627-47.

Arora, A. and Maiti, S. (2009) 'Differential biophysical behavior of human telomeric RNA and DNA quadruplex', J Phys Chem B, 113(30), pp. 10515-20.

Balagurumoorthy, P., Brahmachari, S. K., Mohanty, D., Bansal, M. and Sasisekharan, V. (1992) 'Hairpin and parallel quartet structures for telomeric sequences', Nucleic acids research, 20(15), pp. 4061-7.

Balasubramanian, S., Hurley, L. H. and Neidle, S. (2011) 'Targeting G-quadruplexes in gene promoters: a novel anticancer strategy?', Nature Reviews Drug Discovery, 10(4), pp. 261-275.

Bang, I. (1910) 'Untersuchungen über die Guanylsäure', Biochemische Zeitschrift, 26, pp. 293-311.

Banuelos, S., Lectez, B., Taneva, S. G., Ormaza, G., Alonso-Marino, M., Calle, X. and Urbaneja, M. A. (2013) 'Recognition of intermolecular G-quadruplexes by full length nucleophosmin. Effect of a leukaemia-associated mutation', FEBS Lett, 587(14), pp. 2254-9.

Basu, S. and Das Gupta, N. N. (1969) 'Spectrophotometric investigation of DNA in the ultraviolet. II', Biochim Biophys Acta, 174(1), pp. 174-82.

Basu, S. and Dasgupta, N. N. (1967) 'Spectrophotometric investigation of DNA in the ultraviolet', Biochim Biophys Acta, 145(2), pp. 391-7.

Bedrat, A., Lacroix, L. and Mergny, J. L. (2016) 'Re-evaluation of G-quadruplex propensity with G4Hunter', Nucleic Acids Res, 44(4), pp. 1746-59.

Bergmeyer, H. U. (ed.) (1974) Methods of Enzymatic Analysis.

Besnard, E., Babled, A., Lapasset, L., Milhavet, O., Parrinello, H., Dantec, C., Marin, J. M. and Lemaitre, J. M. (2012) 'Unraveling cell type-specific and reprogrammable human replication origin signatures associated with G-quadruplex consensus motifs', Nat Struct Mol Biol, 19(8), pp. 837-44.

Biffi, G., Tannahill, D., McCafferty, J. and Balasubramanian, S. (2013) 'Quantitative visualization of DNA G-quadruplex structures in human cells', Nat Chem, 5(3), pp. 182-6.

Bilitou, A., Watson, J., Gartner, A. and Ohnuma, S. (2009) 'The NM23 family in development', Mol Cell Biochem, 329(1-2), pp. 17-33.

Blackburn, E. H., Greider, C. W. and Szostak, J. W. (2006) 'Telomeres and telomerase: the path from maize, Tetrahymena and yeast to human cancer and aging', Nat Med, 12(10), pp. 1133-8.

Blice-Baum, A. C. and Mihailescu, M. R. (2014) 'Biophysical characterization of G-quadruplex forming FMR1 mRNA and of its interactions with different fragile X mental retardation protein isoforms', RNA, 20(1), pp. 103-14.

Blume, S. W., Guarcello, V., Zacharias, W. and Miller, D. M. (1997) 'Divalent transition metal cations counteract potassium-induced quadruplex assembly of oligo(dG) sequences', Nucleic Acids Res, 25(3), pp. 617-25.

Bochman, M. L., Paeschke, K. and Zakian, V. A. (2012) 'DNA secondary structures: stability and function of G-quadruplex structures', Nat Rev Genet, 13(11), pp. 770-80.

Boissan, M. and Lacombe, M. L. (2011) 'Learning about the functions of NME/NM23: Lessons from knockout mice to silencing strategies', Naunyn-Schmiedeberg's Archives of Pharmacology, 384, pp. 421-431.

Boissan, M., Montagnac, G., Shen, Q., Griparic, L., Guitton, J., Romao, M., Sauvonnet, N., Lagache, T., Lascu, I., Raposo, G., Desbourdes, C., Schlattner, U., Lacombe, M. L., Polo, S., van der Bliek, A. M., Roux, A. and Chavrier, P. (2014) 'Membrane trafficking. Nucleoside diphosphate kinases fuel dynamin superfamily proteins with GTP for membrane remodeling', Science, 344(6191), pp. 1510-5.

Borbone, N., Amato, J., Oliviero, G., D'Atri, V., Gabelica, V., De Pauw, E., Piccialli, G. and Mayol, L. (2011) 'd(CGGTGGT) forms an octameric parallel G-quadruplex via stacking of unusual G(:C):G(:C):G(:C):G(:C) octads', Nucleic Acids Res, 39(17), pp. 7848-57.

Brooks, T. A., Kendrick, S. and Hurley, L. (2010) 'Making sense of G-quadruplex and i-motif functions in oncogene promoters', FEBS J, 277(17), pp. 3459-69.

Brázda, V., Hároníková, L., Liao, J. C. C. and Fojta, M. (2014) 'DNA and RNA quadruplex-binding proteins', International journal of molecular sciences, 15(10), pp. 17493-517.

Burge, S., Parkinson, G. N., Hazel, P., Todd, A. K. and Neidle, S. (2006) 'Quadruplex DNA: sequence, topology and structure', Nucleic Acids Res, 34(19), pp. 5402-15.

Calvo, E. P. and Wasserman, M. (2016) 'G-Quadruplex ligands: Potent inhibitors of telomerase activity and cell proliferation in Plasmodium falciparum', Mol Biochem Parasitol, 207(1), pp. 33-8.

Cao, W. and Demeler, B. (2008) 'Modeling analytical ultracentrifugation experiments with an adaptive space-time finite element solution for multicomponent reacting systems', Biophys J, 95(1), pp. 54-65.

Catasti, P., Chen, X., Moyzis, R. K., Bradbury, E. M. and Gupta, G. (1996) 'Structure-function correlations of the insulin-linked polymorphic region', Journal of molecular biology, 264(3), pp. 534-45.

Chambers, V. S., Marsico, G., Boutell, J. M., Di Antonio, M., Smith, G. P. and Balasubramanian, S. (2015) 'High-throughput sequencing of DNA G-quadruplex structures in the human genome', Nat Biotechnol, 33(8), pp. 877-81.

Chen, F. M. (1992) 'Sr2+ facilitates intermolecular G-quadruplex formation of telomeric sequences', Biochemistry, 31(15), pp. 3769-76.

Chen, M. C., Murat, P., Abecassis, K., Ferre-D'Amare, A. R. and Balasubramanian, S. (2015) 'Insights into the mechanism of a G-quadruplex-unwinding DEAH-box helicase', Nucleic Acids Res, 43(4), pp. 2223-31.

Chevenet, F., Brun, C., Banuls, A. L., Jacq, B. and Christen, R. (2006) 'TreeDyn: towards dynamic graphics and annotations for analyses of trees', BMC Bioinformatics, 7, pp. 439.

Chung, W. J., Heddi, B., Schmitt, E., Lim, K. W., Mechulam, Y. and Phan, A. T. (2015) 'Structure of a left-handed DNA G-quadruplex', Proc Natl Acad Sci U S A, 112(9), pp. 2729-33.

Coe, E. H., Jr. (2001) 'The origins of maize genetics', Nat Rev Genet, 2(11), pp. 898-905.

Cogoi, S., Paramasivam, M., Spolaore, B. and Xodo, L. E. (2008) 'Structural polymorphism within a regulatory element of the human KRAS promoter: formation of G4-DNA recognized by nuclear proteins', Nucleic Acids Res, 36(11), pp. 3765-80.

Cogoi, S. and Xodo, L. E. (2006) 'G-quadruplex formation within the promoter of the KRAS proto-oncogene and its effect on transcription', Nucleic Acids Research, 34(9), pp. 2536-2549.

Connor, A. C., Frederick, K. A., Morgan, E. J. and McGown, L. B. (2006) 'Insulin capture by an insulin-linked polymorphic region G-quadruplex DNA oligonucleotide', J Am Chem Soc, 128(15), pp. 4986-91.

Czerwinski, J. D., Hovan, S. C. and Mascotti, D. P. (2005) 'Quantitative nonisotopic nitrocellulose filter binding assays: bacterial manganese superoxide dismutase-DNA interactions', Anal Biochem, 336(2), pp. 300-4.

Dai, J., Carver, M., Punchihewa, C., Jones, R. A. and Yang, D. (2007) 'Structure of the Hybrid-2 type intramolecular human telomeric G-quadruplex in K+ solution: insights into structure polymorphism of the human telomeric sequence', Nucleic acids research, 35(15), pp. 4927-40.

Dailey, M. M., Clarke Miller, M., Bates, P. J., Lane, A. N. and Trent, J. O. (2010) 'Resolution and characterization of the structural polymorphism of a single quadruplex-forming sequence', Nucleic Acids Research, 38(14), pp. 4877-4888.

Dancer, J., Neuhaus, H. E. and Stitt, M. (1990) 'Subcellular compartmentation of uridine nucleotides and nucleoside-5'-diphosphate kinase in leaves', Plant Physiol, 92(3), pp. 637-41.

Dang, C. V., Le, A. and Gao, P. (2009) 'MYC-induced cancer cell energy metabolism and therapeutic opportunities', Clin Cancer Res, 15(21), pp. 6479-83.

Davis, L. and Maizels, N. (2011) 'G4 DNA: at risk in the genome', EMBO J, 30(19), pp. 3878-9.

Demeler, B. (2005) 'UltraScan - A Comprehensive Data Analysis Software Package for Analytical Ultracentrifugation Experiments', in Scott, D.J., Harding, S.E. & Rowe, A.J. (eds.). Cambridge: Royal Society of Chemistry, pp. 210-230.

Dempsey, L. A., Sun, H., Hanakahi, L. A. and Maizels, N. (1999) 'G4 DNA binding by LR1 and its subunits, nucleolin and hnRNP D, A role for G-G pairing in immunoglobulin switch recombination', The Journal of biological chemistry, 274(2), pp. 1066-71.

Denton, E. J. and Land, M. F. (1971) 'Mechanism of Reflexion in Silvery Layers of Fish and Cephalopods'.

Dexheimer, T. S., Carey, S. S., Zuohe, S., Gokhale, V. M., Hu, X., Murata, L. B., Maes, E. M., Weichsel, A., Sun, D., Meuillet, E. J., Montfort, W. R. and Hurley, L. H. (2009) 'NM23-H2 may play an indirect role in transcriptional activation of c-myc gene expression but does not cleave the nuclease hypersensitive element III(1)', Mol Cancer Ther, 8(5), pp. 1363-77.

Dexheimer, T. S., Sun, D. and Hurley, L. H. (2006) 'Deconvoluting the structural and drug-recognition complexity of the G-quadruplex-forming region upstream of the bcl-2 P1 promoter', Journal of the American Chemical Society, 128(16), pp. 5404-15.

Didenko, V. V. (ed.) (2006) Fluorescent Energy Transfer Nucleic Acid Probes: Designs and Protocols. Methods in Molecular Biology. Totowa, N.J.: Humana Press. Available at: http://www.library.yorku.ca/eresolver/?id=1238486 and http://www.library.yorku.ca/eresolver/?id=1238487.

Dorion, S., Matton, D. P. and Rivoal, J. (2006) 'Characterization of a cytosolic nucleoside diphosphate kinase associated with cell division and growth in potato', Planta, 224(1), pp. 108-24.

Du, Z., Zhao, Y. and Li, N. (2009) 'Genome-wide colonization of gene regulatory elements by G4 DNA motifs', Nucleic Acids Res, 37(20), pp. 6784-98.

Duan, X. L., Liu, N. N., Yang, Y. T., Li, H. H., Li, M., Dou, S. X. and Xi, X. G. (2015) 'G-quadruplexes significantly stimulate Pif1 helicase-catalyzed duplex DNA unwinding', J Biol Chem, 290(12), pp. 7722-35.

Farhath, M. M., Thompson, M., Ray, S., Sewell, A., Balci, H. and Basu, S. (2015) 'G-Quadruplex-Enabling Sequence within the Human Tyrosine Hydroxylase Promoter Differentially Regulates Transcription', Biochemistry, 54(36), pp. 5533-5545.

Finan, P. M., White, I. R., Redpath, S. H., Findlay, J. B. and Millner, P. A. (1994) 'Molecular cloning, sequence determination and heterologous expression of nucleoside diphosphate kinase from Pisum sativum', Plant Mol Biol, 25(1), pp. 59-67.

Fleming, A. M., Zhou, J., Wallace, S. S. and Burrows, C. J. (2015) 'A Role for the Fifth G-Track in G-Quadruplex Forming Oncogene Promoter Sequences during Oxidative Stress: Do These "Spare Tires" Have an Evolved Function?', ACS Cent Sci, 1(5), pp. 226-233.

Francois-Moutal, L., Maniti, O., Marcillat, O. and Granjon, T. (2013) 'New insights into lipid-Nucleoside Diphosphate Kinase-D interaction mechanism: protein structural changes and membrane reorganisation', Biochimica et biophysica acta, 1828(2), pp. 906-15.

Freije, J. M., Blay, P., MacDonald, N. J., Manrow, R. E. and Steeg, P. S. (1997) 'Site-directed mutation of Nm23-H1. Mutations lacking motility suppressive capacity upon transfection are deficient in histidine-dependent protein phosphotransferase pathways in vitro', J Biol Chem, 272(9), pp. 5525-32.

Gallo, A., Lo Sterzo, C., Mori, M., Di Matteo, A., Bertini, I., Banci, L., Brunori, M. and Federici, L. (2012) 'Structure of nucleophosmin DNA-binding domain and analysis of its complex with a G-quadruplex sequence from the c-MYC promoter', J Biol Chem, 287(32), pp. 26539-48.

Garces, E. and Cleland, W. W. (1969) 'Kinetic studies of yeast nucleoside diphosphate kinase', Biochemistry, 8(2), pp. 633-40.

Gellert, M., Lipsett, M. N. and Davies, D. R. (1962) 'Helix formation by guanylic acid', Proc Natl Acad Sci U S A, 48, pp. 2013-8.

Giraldo, R., Suzuki, M., Chapman, L. and Rhodes, D. (1994) 'Promotion of parallel DNA quadruplexes by a yeast telomere binding protein: a circular dichroism study', Proc Natl Acad Sci U S A, 91(16), pp. 7658-62.

Gonzalez, V., Guo, K., Hurley, L. and Sun, D. (2009) 'Identification and characterization of nucleolin as a c-myc G-quadruplex-binding protein', J Biol Chem, 284(35), pp. 23622-35.

Gonzalez, V. and Hurley, L. H. (2010) 'The C-terminus of nucleolin promotes the formation of the c-MYC G-quadruplex and inhibits c-MYC promoter activity', Biochemistry, 49(45), pp. 9706-14.

Gray, L. T., Vallur, A. C., Eddy, J. and Maizels, N. (2014) 'G quadruplexes are genomewide targets of transcriptional helicases XPB and XPD', Nat Chem Biol, 10(4), pp. 313-8.

Gray, R. D. and Chaires, J. B. (2011) 'Analysis of multidimensional G-quadruplex melting curves', Curr Protoc Nucleic Acid Chem, Chapter 17, pp. Unit17.4.

Guedin, A., Alberti, P. and Mergny, J. L. (2009) 'Stability of intramolecular quadruplexes: sequence effects in the central loop', Nucleic Acids Res, 37(16), pp. 5559-67.

Gunaratnam, M., Green, C., Moreira, J. B., Moorhouse, A. D., Kelland, L. R., Moses, J. E. and Neidle, S. (2009) 'G-quadruplex compounds and cis-platin act synergistically to inhibit cancer cell growth in vitro and in vivo', Biochem Pharmacol, 78(2), pp. 115-22.

Guo, K., Pourpak, A., Beetz-Rogers, K., Gokhale, V., Sun, D. and Hurley, L. H. (2007) 'Formation of pseudosymmetrical G-quadruplex and i-motif structures in the proximal promoter region of the RET oncogene', Journal of the American Chemical Society, 129(33), pp. 10220-8.

Guédin, A., Gros, J., Alberti, P. and Mergny, J.-L. (2010) 'How long is too long? Effects of loop size on G-quadruplex stability', Nucleic acids research, 38(21), pp. 7858-68.

Hanakahi, L. A., Sun, H. and Maizels, N. (1999) 'High affinity interactions of nucleolin with G-G-paired rDNA', The Journal of biological chemistry, 274(22), pp. 15908-12.

Hardin, C. C., Perry, A. G. and White, K. (2000) 'Thermodynamic and kinetic characterization of the dissociation and assembly of quadruplex nucleic acids', Biopolymers, 56(3), pp. 147-94.

Harkness, R. W. and Mittermaier, A. K. (2016a) 'G-register exchange dynamics in guanine quadruplexes', Nucleic acids research, 44(8), pp. 3481-94.

Harkness, R. W. t. and Mittermaier, A. K. (2016b) 'G-register exchange dynamics in guanine quadruplexes', Nucleic Acids Res, 44(8), pp. 3481-94.

Heddi, B., Cheong, V. V., Martadinata, H. and Phan, A. T. (2015) 'Insights into G-quadruplex specific recognition by the DEAH-box helicase RHAU: Solution structure of a peptide-quadruplex complex', Proc Natl Acad Sci U S A, 112(31), pp. 9608-13.

Henderson, A., Wu, Y., Huang, Y. C., Chavez, E. A., Platt, J., Johnson, F. B., Brosh, R. M., Jr., Sen, D. and Lansdorp, P. M. (2014) 'Detection of G-quadruplex DNA in mammalian cells', Nucleic Acids Res, 42(2), pp. 860-9.

Hildebrandt, M., Lacombe, M. L., Mesnildrey, S. and Véron, M. (1995) 'A human NDP-kinase B specifically binds single-stranded poly-pyrimidine sequences', Nucleic acids research, 23(19), pp. 3858-64.

Holder, I. T. and Hartig, J. S. (2014) 'A Matter of Location: Influence of G-Quadruplexes on Escherichia coli Gene Expression', Chem Biol, 21(11), pp. 1511-1521.

Hudson, J. S., Ding, L., Le, V., Lewis, E. and Graves, D. (2014) 'Recognition and binding of human telomeric G-quadruplex DNA by unfolding protein 1', Biochemistry, 53(20), pp. 3347-56.

Huppert, J. L. (2008) 'Four-stranded nucleic acids: structure, function and targeting of G-quadruplexes', Chem Soc Rev, 37(7), pp. 1375-84.

Huppert, J. L. and Balasubramanian, S. (2005) 'Prevalence of quadruplexes in the human genome', Nucleic Acids Res, 33(9), pp. 2908-16.

Huppert, J. L. and Balasubramanian, S. (2006) 'G-quadruplexes in promoters throughout the human genome', Nucleic Acids Research, 35(2), pp. 406-413.

Huppert, J. L. and Balasubramanian, S. (2007) 'G-quadruplexes in promoters throughout the human genome', Nucleic Acids Res, 35(2), pp. 406-13.

Hwang, H., Kreig, A., Calvert, J., Lormand, J., Kwon, Y., Daley, J. M., Sung, P., Opresko, P. L. and Myong, S. (2014) 'Telomeric Overhang Length Determines Structural Dynamics and Accessibility to Telomerase and ALT-Associated Proteins', Structure, 22(6), pp. 842-53.

Il'inskii, N. S., Varizhuk, A. M., Beniaminov, A. D., Puzanov, M. A., Shchelkina, A. K. and Kaliuzhnyi, D. N. (2014) '[G-quadruplex ligands: mechanisms of anticancer action and target binding]', Mol Biol (Mosk), 48(6), pp. 891-907.

Im, Y. J., Kim, J. I., Shen, Y., Na, Y., Han, Y. J., Kim, S. H., Song, P. S. and Eom, S. H. (2004) 'Structural analysis of Arabidopsis thaliana nucleoside diphosphate kinase-2 for phytochrome-mediated light signaling', J Mol Biol, 343(3), pp. 659-70.

Joachimi, A., Benz, A. and Hartig, J. S. (2009) 'A comparison of DNA and RNA quadruplex structures and stabilities', Bioorg Med Chem, 17(19), pp. 6811-5.

Johnson, J. E., Cao, K., Ryvkin, P., Wang, L.-S. and Johnson, F. B. (2010) 'Altered gene expression in the Werner and Bloom syndromes is associated with sequences having G-quadruplex forming potential', Nucleic acids research, 38(4), pp. 1114-22.

Juranek, S. A. and Paeschke, K. (2012) 'Cell cycle regulation of G-quadruplex DNA structures at telomeres', Curr Pharm Des, 18(14), pp. 1867-72.

Kandeel, M. and Kitade, Y. (2010) 'Substrate specificity and nucleotides binding properties of NM23H2/nucleoside diphosphate kinase homolog from Plasmodium falciparum', J Bioenerg Biomembr, 42(5), pp. 361-9.

Kanoh, Y., Matsumoto, S., Fukatsu, R., Kakusho, N., Kono, N., Renard-Guillet, C., Masuda, K., Iida, K., Nagasawa, K., Shirahige, K. and Masai, H. (2015) 'Rif1 binds to G quadruplexes and suppresses replication over long distances', Nature structural & molecular biology, 22(11), pp. 889-97.

Khateb, S., Weisman-Shomer, P., Hershco, I., Loeb, L. A. and Fry, M. (2004) 'Destabilization of tetraplex structures of the fragile X repeat sequence (CGG)n is mediated by homolog-conserved domains in three members of the hnRNP family', Nucleic acids research, 32(14), pp. 4145-54.

Kim, B. G., Shek, Y. L. and Chalikian, T. V. (2013) 'Polyelectrolyte effects in G-quadruplexes', Biophys Chem, 184, pp. 95-100.

Kim, Y. H., Kim, M. D., Choi, Y. I., Park, S. C., Yun, D. J., Noh, E. W., Lee, H. S. and Kwak, S. S. (2011) 'Transgenic poplar expressing Arabidopsis NDPK2 enhances growth as well as oxidative stress tolerance', Plant Biotechnol J, 9(3), pp. 334-47.

Kim, Y. H., Lim, S., Yang, K. S., Kim, C. Y., Kwon, S. Y., Lee, H. S., Wang, X., Zhou, Z., Ma, D., Yun, D. J. and Kwak, S. S. (2009) 'Expression of Arabidopsis NDPK2 increases antioxidant enzyme activities and enhances tolerance to multiple environmental stresses in transgenic sweetpotato plants', Mol Breeding, 24, pp. 233-244.

Kocman, V. and Plavec, J. (2017) 'Tetrahelical structural family adopted by AGCGA-rich regulatory DNA regions', Nat Commun, 8, pp. 15355.

Kopylov, M., Bass, H. W. and Stroupe, M. E. (2015) 'The Maize (Zea mays L.) Nucleoside Diphosphate Kinase1 (ZmNDPK1) Gene Encodes a Human NM23-H2 Homologue That Binds and Stabilizes G-Quadruplex DNA', Biochemistry, 54, pp. 1743-1757.

Kumarasamy, V. M., Shin, Y.-J., White, J. and Sun, D. (2015) 'Selective repression of RET proto-oncogene in medullary thyroid carcinoma by a natural alkaloid berberine', BMC cancer, 15, pp. 599.

Kuryavyi, V., Cahoon, L. A., Seifert, H. S. and Patel, D. J. (2012) 'RecA-binding pilE G4 sequence essential for pilin antigenic variation forms monomeric and 5' end-stacked dimeric parallel G-quadruplexes', Structure, 20(12), pp. 2090-102.

Lam, E. Y., Beraldi, D., Tannahill, D. and Balasubramanian, S. (2013) 'G-quadruplex structures are stable and detectable in human genomic DNA', Nat Commun, 4, pp. 1796.

Lane, A. N., Chaires, J. B., Gray, R. D. and Trent, J. O. (2008) 'Stability and kinetics of G-quadruplex structures', Nucleic Acids Res, 36(17), pp. 5482-515.

Largy, E., Marchand, A., Amrane, S., Gabelica, V. and Mergny, J.-L. (2016) 'Quadruplex Turncoats: Cation-Dependent Folding and Stability of Quadruplex-DNA Double Switches'.

Lebowitz, J., Lewis, M. S. and Schuck, P. (2002) 'Modern analytical ultracentrifugation in protein science: a tutorial review', Protein Sci, 11(9), pp. 2067-79.

Lemmens, B., van Schendel, R. and Tijsterman, M. (2015) 'Mutagenic consequences of a single G-quadruplex demonstrate mitotic inheritance of DNA replication fork barriers', Nature Communications, 6, pp. 8909.

Levene, P. A. and Jacobs, W. A. (1909) 'Über Guanylsäure', European Journal of Inorganic Chemistry, 42(2), pp. 2469-2473.

Levy-Lior, A., Pokroy, B., Levavi-Sivan, B., Leiserowitz, L., Weiner, S. and Addadi, L. (2008) 'Biogenic Guanine Crystals from the Skin of Fish May Be Designed to Enhance Light Reflectance'.

Lexa, M., Kejnovsky, E., Steflova, P., Konvalinova, H., Vorlickova, M. and Vyskot, B. (2014) 'Quadruplex-forming sequences occupy discrete regions inside plant LTR retrotransposons', Nucleic Acids Res, 42(2), pp. 968-78.

Li, J., Correia, J. J., Wang, L., Trent, J. O. and Chaires, J. B. (2005) 'Not so crystal clear: the structure of the human telomere G-quadruplex in solution differs from that present in a crystal', Nucleic acids research, 33(14), pp. 4649-59.

Lim, K. W., Amrane, S., Bouaziz, S., Xu, W., Mu, Y., Patel, D. J., Luu, K. N. and Phan, A. T. (2009) 'Structure of the human telomere in K+ solution: a stable basket-type G-quadruplex with only two G-tetrad layers', J Am Chem Soc, 131(12), pp. 4301-9.

Lipps, H. J. and Rhodes, D. (2009) 'G-quadruplex structures: in vivo evidence and function', Trends Cell Biol, 19(8), pp. 414-22.

Liu, H., Lv, C., Ding, B., Wang, J., Li, S. and Zhang, Y. (2014) 'Antitumor activity of G-quadruplex-interactive agent TMPyP4 with photodynamic therapy in ovarian carcinoma cells', Oncol Lett, 8(1), pp. 409-413.

Lombardi, D., Lacombe, M.-L. and Paggi, M. G. (2000) 'nm23: Unraveling Its Biological Function in Cell Differentiation', Journal of Cellular Physiology, 182(2), pp. 144-149.

London, T. B. C., Barber, L. J., Mosedale, G., Kelly, G. P., Balasubramanian, S., Hickson, I. D., Boulton, S. J. and Hiom, K. (2008) 'FANCJ is a structure-specific DNA helicase associated with the maintenance of genomic G/C tracts', The Journal of biological chemistry, 283(52), pp. 36132-9.

Loulergue, C., Lebrun, M. and Briat, J. F. (1998) 'Expression cloning in Fe2+ transport defective yeast of a novel maize MYC transcription factor', Gene, 225(1-2), pp. 47-57.

Macaya, R. F., Schultze, P., Smith, F. W., Roe, J. A. and Feigon, J. (1993) 'Thrombin-binding DNA aptamer forms a unimolecular quadruplex structure in solution', Proc Natl Acad Sci U S A, 90(8), pp. 3745-9.

Maizels, N. and Gray, L. T. (2013) 'The G4 genome', PLoS Genet, 9(4), pp. e1003468.

Mashima, T., Matsugami, A., Nishikawa, F., Nishikawa, S. and Katahira, M. (2009) 'Unique quadruplex structure and interaction of an RNA aptamer against bovine prion protein', Nucleic Acids Res, 37(18), pp. 6249-58.

Matsugami, A., Ouhashi, K., Kanagawa, M., Liu, H., Kanagawa, S., Uesugi, S. and Katahira, M. (2001) 'An intramolecular quadruplex of (GGA)(4) triplet repeat DNA with a G:G:G:G tetrad and a G(:A):G(:A):G(:A):G heptad, and its dimeric interaction', J Mol Biol, 313(2), pp. 255-69.

McClintock, B. (1930) 'A Cytological Demonstration of the Location of an Interchange between Two Non-Homologous Chromosomes of Zea Mays', Proc Natl Acad Sci U S A, 16(12), pp. 791-6.

McClintock, B. (1939) 'The Behavior in Successive Nuclear Divisions of a Chromosome Broken at Meiosis', Proc Natl Acad Sci U S A, 25(8), pp. 405-16.

Mergny, J.-L. and Lacroix, L. (2003) 'Analysis of thermal melting curves', Oligonucleotides, 13, pp. 515-537.

Mergny, J.-L. and Lacroix, L. (2009) 'UV Melting of G-Quadruplexes', Curr Protoc Nucleic Acid Chem, Chapter 17, pp. Unit17.1.

Mergny, J.-L., Li, J., Lacroix, L., Amrane, S. and Chaires, J. B. (2005) 'Thermal difference spectra: a specific signature for nucleic acid structures', Nucleic acids research, 33(16), pp. e138.

Mergny, J. L., Phan, A. T. and Lacroix, L. (1998) 'Following G-quartet formation by UV-spectroscopy', FEBS letters, 435(1), pp. 74-8.

Miannay, F.-A., Banyasz, A., Gustavsson, T. and Markovitsi, D. (2009) 'Excited States and Energy Transfer in G-Quadruplexes', The Journal of Physical Chemistry C, 113(27), pp. 11760-11765.

Miller, D. M., Thomas, S. D., Islam, A., Muench, D. and Sedoris, K. (2012) 'c-Myc and cancer metabolism'.

Morin-Leisk, J. and Lee, T. H. (2008) 'Nucleotide-dependent self-assembly of Nucleoside Diphosphate Kinase (NDPK) in vitro', Biochim Biophys Acta, 1784(12), pp. 2045-51.

Mukundan, V. T. and Phan, A. T. (2013) 'Bulges in G-quadruplexes: broadening the definition of G-quadruplex-forming sequences', J Am Chem Soc, 135(13), pp. 5017-28.

Mullen, M. A., Olson, K. J., Dallaire, P., Major, F., Assmann, S. M. and Bevilacqua, P. C. (2010) 'RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: prevalence and possible functional roles', Nucleic Acids Res, 38(22), pp. 8149-63.

Murat, P. and Balasubramanian, S. (2014) 'Existence and consequences of G-quadruplex structures in DNA', Curr Opin Genet Dev, 25C, pp. 22-29.

Nomura, T., Yatsunami, K., Honda, A., Sugimoto, Y., Fukui, T., Zhang, J., Yamamoto, J. and Ichikawa, A. (1992) 'The amino acid sequence of nucleoside diphosphate kinase I from spinach leaves, as deduced from the cDNA sequence', Arch Biochem Biophys, 297(1), pp. 42-5.

Ogloblina, A. M., Bannikova, V. A., Khristich, A. N., Oretskaya, T. S., Yakubovskaya, M. G. and Dolinnaya, N. G. (2015) 'Parallel G-Quadruplexes Formed by Guanine-Rich Microsatellite Repeats Inhibit Human Topoisomerase I', Biochemistry (Mosc), 80(8), pp. 1026-38.

Otwinowski, Z. and Minor, W. (1997) 'Processing of X-ray Diffraction Data Collected in Oscillation Mode', in Carter, C. W., Jr. and Sweet, R. M. (eds.) Methods in Enzymology: Macromolecular Crystallography, part A. New York: Academic Press, pp. 307-326.

Paeschke, K., Bochman, M. L., Garcia, P. D., Cejka, P., Friedman, K. L., Kowalczykowski, S. C. and Zakian, V. A. (2013) 'Pif1 family helicases suppress genome instability at G-quadruplex motifs', Nature, 497(7450), pp. 458-62.

Paeschke, K., Capra, J. A. and Zakian, V. A. (2011) 'DNA Replication through G-Quadruplex Motifs Is Promoted by the Saccharomyces cerevisiae Pif1 DNA Helicase', Cell, 145(5), pp. 678-691.

Paeschke, K., Simonsson, T., Postberg, J., Rhodes, D. and Lipps, H. J. (2005) 'Telomere end-binding proteins control the formation of G-quadruplex DNA structures in vivo', Nat Struct Mol Biol, 12(10), pp. 847-54.

Paramasivan, S., Rujan, I. and Bolton, P. H. (2007) 'Circular dichroism of quadruplex DNAs: applications to structure, cation effects and ligand binding', Methods, 43(4), pp. 324-31.

Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C. and Ferrin, T. E. (2004) 'UCSF Chimera--a visualization system for exploratory research and analysis', J Comput Chem, 25(13), pp. 1605-12.

Phang, W. L. a. J. M. (2013) Oncogene and Cancer - From Bench to Clinic.

Postel, E. H. (2003) 'Multiple biochemical activities of NM23/NDP kinase in gene regulation', J Bioenerg Biomembr, 35(1), pp. 31-40.

Postel, E. H., Abramczyk, B. A., Gursky, S. K. and Xu, Y. (2002) 'Structure-based mutational and functional analysis identify human NM23-H2 as a multifunctional enzyme', Biochemistry, 41(20), pp. 6330-7.

Postel, E. H., Berberich, S. J., Flint, S. J. and Ferrone, C. A. (1993) 'Human c-myc transcription factor PuF identified as nm23-H2 nucleoside diphosphate kinase, a candidate suppressor of tumor metastasis', Science, 261(5120), pp. 478-480.

Postel, E. H. and Ferrone, C. A. (1994) 'Nucleoside diphosphate kinase enzyme activity of NM23-H2/PuF is not required for its DNA binding and in vitro transcriptional functions', J Biol Chem, 269(12), pp. 8627-30.

Postel, E. H., Mango, S. E. and Flint, S. J. (1989) 'A nuclease-hypersensitive element of the human c-myc promoter interacts with a transcription initiation factor', Mol Cell Biol, 9(11), pp. 5123-33.

Postel, E. H., Weiss, V. H., Beneken, J. and Kirtane, A. (1996) 'Mutational analysis of NM23-H2/NDP kinase identifies the structural domains critical to recognition of a c-myc regulatory element', Proceedings of the National Academy of Sciences of the United States of America, 93, pp. 6892-6897.

Punjani, A., Rubinstein, J. L., Fleet, D. J. and Brubaker, M. A. (2017) 'cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination', Nat Methods, 14(3), pp. 290-296.

Qiu, J., Wang, M., Zhang, Y., Zeng, P., Ou, T. M., Tan, J. H., Huang, S. L., An, L. K., Wang, H., Gu, L. Q., Huang, Z. S. and Li, D. (2015) 'Biological Function and Medicinal Research Significance of G-Quadruplex Interactive Proteins', Curr Top Med Chem, 15(19), pp. 1971-87.

Quante, T., Otto, B., Brázdová, M., Kejnovská, I., Deppert, W. and Tolstonog, G. V. (2012) 'Mutant p53 is a transcriptional co-factor that binds to G-rich regulatory regions of active genes and generates transcriptional plasticity', Cell cycle (Georgetown, Tex.), 11(17), pp. 3290-303.

Qureshi, M. H., Ray, S., Sewell, A. L., Basu, S. and Balci, H. (2012a) 'Replication protein A unfolds G-quadruplex structures with varying degrees of efficiency', J Phys Chem B, 116(19), pp. 5588-94.

Qureshi, M. H., Ray, S., Sewell, A. L., Basu, S. and Balci, H. (2012b) 'Replication Protein A Unfolds G-Quadruplex Structures with Varying Degrees of Efficiency', The Journal of Physical Chemistry B, 116(19), pp. 5588-5594.

Randazzo, A., Spada, G. P. and da Silva, M. W. (2013) 'Circular dichroism of quadruplex structures', Topics in current chemistry, 330, pp. 67-86.

Raveh, S., Vinh, J., Rossier, J., Agou, F. and Veron, M. (2001) 'Peptidic determinants and structural model of human NDP kinase B (Nm23-H2) bound to single-stranded DNA', Biochemistry, 40(20), pp. 5882-93.

Ravindran, S. (2012) 'Barbara McClintock and the discovery of jumping genes', Proc Natl Acad Sci U S A, 109(50), pp. 20198-9.

Ray, S., Bandaria, J. N., Qureshi, M. H., Yildiz, A. and Balci, H. (2014) 'G-quadruplex formation in telomeres enhances POT1/TPP1 protection against RPA binding', Proc Natl Acad Sci U S A, 111(8), pp. 2990-5.

Ray, S., Qureshi, M. H., Malcolm, D. W., Budhathoki, J. B., Celik, U. and Balci, H. (2013) 'RPA-mediated unfolding of systematically varying G-quadruplex structures', Biophys J, 104(10), pp. 2235-45.

Rhodes, D. and Lipps, H. J. (2015) 'G-quadruplexes and their regulatory roles in biology', Nucleic Acids Research, 43(18), pp. 8627-8637.

Ryder, S. P., Recht, M. I. and Williamson, J. R. (2008) 'RNA-Protein Interaction Protocols', 488, pp. 99-115.

Sacca, B., Lacroix, L. and Mergny, J. L. (2005) 'The effect of chemical modifications on the thermal stability of different G-quadruplex-forming oligonucleotides', Nucleic Acids Res, 33(4), pp. 1182-92.

Safa, L., Delagoutte, E., Petruseva, I., Alberti, P., Lavrik, O., Riou, J.-F. and Saintomé, C. (2014) 'Binding polarity of RPA to telomeric sequences and influence of G-quadruplex stability', Biochimie, 103, pp. 80-8.

Salas, T. R., Petruseva, I., Lavrik, O., Bourdoncle, A., Mergny, J. L., Favre, A. and Saintome, C. (2006) 'Human replication protein A unfolds telomeric G-quadruplexes', Nucleic Acids Res, 34(17), pp. 4857-65.

Sanders, C. M. (2010) 'Human Pif1 helicase is a G-quadruplex DNA-binding protein with G-quadruplex DNA-unwinding activity', Biochem J, 430(1), pp. 119-28.

Schnable, P. S., Ware, D., Fulton, R. S., Stein, J. C., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T. A., et al. (2009) 'The B73 maize genome: complexity, diversity, and dynamics', Science, 326(5956), pp. 1112-5.

Schneider, C. A., Rasband, W. S. and Eliceiri, K. W. (2012) 'NIH Image to ImageJ: 25 years of image analysis', Nat Methods, 9(7), pp. 671-5.

Schonhoft, J. D., Das, A., Achamyeleh, F., Samdani, S., Sewell, A., Mao, H. and Basu, S. (2010) 'ILPR repeats adopt diverse G-quadruplex conformations that determine insulin binding', Biopolymers, 93(1), pp. 21-31.

Scognamiglio, P. L., Di Natale, C., Leone, M., Poletto, M., Vitagliano, L., Tell, G. and Marasco, D. (2014) 'G-quadruplex DNA recognition by nucleophosmin: new insights from protein dissection', Biochim Biophys Acta, 1840(6), pp. 2050-9.

Seenisamy, J., Rezler, E. M., Powell, T. J., Tye, D., Gokhale, V., Joshi, C. S., Siddiqui-Jain, A. and Hurley, L. H. (2004) 'The dynamic character of the G-quadruplex element in the c-MYC promoter and modification by TMPyP4', Journal of the American Chemical Society, 126(28), pp. 8702-9.

Sekhon, R. S., Lin, H., Childs, K. L., Hansey, C. N., Buell, C. R., de Leon, N. and Kaeppler, S. M. (2011) 'Genome-wide atlas of transcription during maize development', Plant J, 66(4), pp. 553-63.

Shafer, R. H. and Smirnov, I. (2000) 'Biological aspects of DNA/RNA quadruplexes', Biopolymers, 56(3), pp. 209-27.

Shen, Y., Han, Y. J., Kim, J. I. and Song, P. S. (2008) 'Arabidopsis nucleoside diphosphate kinase-2 as a plant GTPase activating protein', BMB Rep, 41(9), pp. 645-50.

Shen, Y., Kim, J. I. and Song, P. S. (2005) 'NDPK2 as a signal transducer in the phytochrome-mediated light signaling', J Biol Chem, 280(7), pp. 5740-9.

Shin, Y. J., Kumarasamy, V., Camacho, D. and Sun, D. (2015) 'Involvement of G-quadruplex structures in regulation of human RET gene expression by small molecules in human medullary thyroid carcinoma TT cells', Oncogene, 34(10), pp. 1292-9.

Siddiqui-Jain, A., Grand, C. L., Bearss, D. J. and Hurley, L. H. (2002) 'Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription', Proc Natl Acad Sci U S A, 99(18), pp. 11593-8.

Sievers, F. and Higgins, D. G. (2014) 'Clustal Omega, accurate alignment of very large numbers of sequences', Methods Mol Biol, 1079, pp. 105-16.

Singh, H., LeBowitz, J. H., Baldwin, A. S., Jr. and Sharp, P. A. (1988) 'Molecular cloning of an enhancer binding protein: isolation by screening of an expression library with a recognition site DNA', Cell, 52(3), pp. 415-23.

Smestad, J. A. and Maher, L. J. (2015) 'Relationships between putative G-quadruplex-forming sequences, RecQ helicases, and transcription', BMC Medical Genetics, 16(1), pp. 91-91.

Suloway, C., Pulokas, J., Fellmann, D., Cheng, A., Guerra, F., Quispe, J., Stagg, S., Potter, C. S. and Carragher, B. (2005) 'Automated molecular microscopy: The new Leginon system', Journal of Structural Biology, 151(1), pp. 41-60.

Sun, D. and Hurley, L. H. (2010) 'Biochemical techniques for the characterization of G-quadruplex structures: EMSA, DMS footprinting, and DNA polymerase stop assay', Methods in molecular biology (Clifton, N.J.), 608, pp. 65-79.

Sun, D., Liu, W.-J., Guo, K., Rusche, J. J., Ebbinghaus, S., Gokhale, V. and Hurley, L. H. (2008) 'The proximal promoter region of the human vascular endothelial growth factor gene has a G-quadruplex structure that can be targeted by G-quadruplex-interactive agents', Molecular cancer therapeutics, 7(4), pp. 880-9.

Takahashi, H., Nakagawa, A., Kojima, S., Takahashi, A., Cha, B. Y., Woo, J. T., Nagai, K., Machida, Y. and Machida, C. (2012) 'Discovery of novel rules for G-quadruplex-forming sequences in plants by using bioinformatics methods', J Biosci Bioeng, 114(5), pp. 570-5.

Teyssier, J., Saenko, S. V., van der Marel, D. and Milinkovitch, M. C. (2015) 'Photonic crystals cause active colour change in chameleons', Nature Communications, published online 10 March 2015, doi:10.1038/ncomms7368.

Thakur, R. K., Kumar, P., Halder, K., Verma, A., Kar, A., Parent, J. L., Basundra, R., Kumar, A. and Chowdhury, S. (2009) 'Metastases suppressor NM23-H2 interaction with G-quadruplex DNA within c-MYC promoter nuclease hypersensitive element induces c-MYC expression', Nucleic Acids Res, 37(1), pp. 172-83.

Thomas, P. S. (1980) 'Hybridization of denatured RNA and small DNA fragments transferred to nitrocellulose', Proc Natl Acad Sci U S A, 77(9), pp. 5201-5.

Tippana, R., Xiao, W. and Myong, S. (2014) 'G-quadruplex conformation and dynamics are determined by loop length and sequence', Nucleic acids research, 42(12), pp. 8106-14.

Tosoni, E., Frasson, I., Scalabrin, M., Perrone, R., Butovskaya, E., Nadai, M., Palu, G., Fabris, D. and Richter, S. N. (2015) 'Nucleolin stabilizes G-quadruplex structures folded by the LTR promoter and silences HIV-1 viral transcription', Nucleic Acids Res, 43(18), pp. 8884-97.

United States. Department of Agriculture. Economics and Statistics Service., United States. Foreign Agricultural Service., United States. World Food and Agricultural Outlook and Situation Board., United States. Department of Agriculture. Economic Research Service. and United States. World Agricultural Outlook Board. World agricultural supply and demand estimates. Washington, D.C.: The Dept. Supt. of Docs., U.S. G.P.O.

Urnov, F. D. and Wolffe, A. P. (2001) 'Above and within the genome: epigenetics past and present', J Mammary Gland Biol Neoplasia, 6(2), pp. 153-67.

Uzman, A. (2001) 'Molecular Cell Biology (4th edition) Harvey Lodish, Arnold Berk, S. Lawrence Zipursky, Paul Matsudaira, David Baltimore and James Darnell; Freeman & Co., New York, NY, 2000, 1084 pp., list price $102.25, ISBN 0-7167-3136-3', Biochemistry and Molecular Biology Education, 29(3), pp. 126-128.

Valton, A. L., Hassan-Zadeh, V., Lema, I., Boggetto, N., Alberti, P., Saintome, C., Riou, J. F. and Prioleau, M. N. (2014) 'G4 motifs affect origin positioning and efficiency in two vertebrate replicators', EMBO J, 33(7), pp. 732-46.

Vasilyev, N., Polonskaia, A., Darnell, J. C., Darnell, R. B., Patel, D. J. and Serganov, A. (2015) 'Crystal structure reveals specific recognition of a G-quadruplex RNA by a beta-turn in the RGG motif of FMRP', Proc Natl Acad Sci U S A, 112(39), pp. E5391-400.

Venczel, E. A. and Sen, D. (1993) 'Parallel and antiparallel G-DNA structures from a complex telomeric sequence', Biochemistry, 32(24), pp. 6220-8.

Verma, A., Halder, K., Halder, R., Yadav, V. K., Rawal, P., Thakur, R. K., Mohd, F., Sharma, A. and Chowdhury, S. (2008) 'Genome-wide computational and expression analyses reveal G-quadruplex DNA motifs as conserved cis-regulatory elements in human and related species', J Med Chem, 51(18), pp. 5641-9.

Vinod Prabhu, V., Siddikuzzaman, Berlin Grace, V. M. and Guruvayoorappan, C. (2012) 'Targeting tumor metastasis by regulating Nm23 gene expression'.

Wang, Y. and Patel, D. J. (1993) 'Solution structure of a parallel-stranded G-quadruplex DNA', J Mol Biol, 234(4), pp. 1171-83.

Wang, Z., Li, H., Ke, Q., Jeong, J. C., Lee, H. S., Xu, B., Deng, X. P., Lim, Y. P. and Kwak, S. S. (2014) 'Transgenic alfalfa plants expressing AtNDPK2 exhibit increased growth and tolerance to abiotic stresses', Plant Physiol Biochem, 84C, pp. 67-77.

Watson, J. D. and Crick, F. H. (1953) 'Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid', Nature, 171(4356), pp. 737-8.

Webba da Silva, M. (2007) 'Geometric formalism for DNA quadruplex folding', Chemistry (Weinheim an der Bergstrasse, Germany), 13(35), pp. 9738-45.

Williamson, J. R., Raghuraman, M. K. and Cech, T. R. (1989) 'Monovalent cation-induced structure of telomeric DNA: the G-quartet model', Cell, 59(5), pp. 871-80.

Woodbury, C. P., Jr. and von Hippel, P. H. (1983) 'On the determination of deoxyribonucleic acid-protein interaction parameters using the nitrocellulose filter-binding assay', Biochemistry, 22(20), pp. 4730-7.

Wu, Y., Shin-ya, K. and Brosh, R. M. (2008) 'FANCJ Helicase Defective in Fanconia Anemia and Breast Cancer Unwinds G-Quadruplex DNA To Defend Genomic Stability', Molecular and Cellular Biology, 28(12), pp. 4116-4128.

Yafe, A., Etzioni, S., Weisman-Shomer, P. and Fry, M. (2005) 'Formation and properties of hairpin and tetraplex structures of guanine-rich regulatory sequences of muscle-specific genes', Nucleic acids research, 33(9), pp. 2887-900.

Yano, M. and Kato, Y. (2014) 'Using hidden Markov models to investigate G-quadruplex motifs in genomic sequences', BMC Genomics, 15 Suppl 9, pp. S15.

Zaug, A. J., Podell, E. R. and Cech, T. R. (2005) 'Human POT1 disrupts telomeric G-quadruplexes allowing telomerase extension in vitro', Proc Natl Acad Sci U S A, 102(31), pp. 10864-9.

Zhang, D. H., Fujimoto, T., Saxena, S., Yu, H. Q., Miyoshi, D. and Sugimoto, N. (2010) 'Monomorphic RNA G-quadruplex and polymorphic DNA G-quadruplex structures responding to cellular environmental factors', Biochemistry, 49(21), pp. 4554-63.

Zhang, N., Gorin, A., Majumdar, A., Kettani, A., Chernichenko, N., Skripkin, E. and Patel, D. J. (2001) 'V-shaped scaffold: a new architectural motif identified in an A x (G x G x G x G) pentad-containing dimeric DNA quadruplex involving stacked G(anti) x G(anti) x G(anti) x G(syn) tetrads', J Mol Biol, 311(5), pp. 1063-79.

Zhang, S., Wu, Y. and Zhang, W. (2014a) 'G-quadruplex structures and their interaction diversity with ligands', ChemMedChem, 9(5), pp. 899-911.

Zhang, Y., Gaetano, C. M., Williams, K. R., Bassell, G. J. and Mihailescu, M. R. (2014b) 'FMRP interacts with G-quadruplex structures in the 3'-UTR of its dendritic target Shank1 mRNA', RNA Biol, 11(11), pp. 1364-74.

Zimmerman, S. B., Cohen, G. H. and Davies, D. R. (1975) 'X-ray fiber diffraction and model-building study of polyguanylic acid and polyinosinic acid', Journal of molecular biology, 92(2), pp. 181-92.

BIOGRAPHICAL SKETCH

Mykhailo Kopylov CURRICULUM VITAE

PERSONAL PROFILE STATEMENT:

I am a PhD candidate in the Molecular Biophysics program at Florida State University, trained in biochemistry and structural biology with an emphasis on transmission electron microscopy. My main project centers on cryo-electron microscopy characterization of complexes formed by a nucleoside diphosphate kinase. While working on this project I have learned single-particle reconstruction as well as cryo-electron tomography methods. Additionally, I have been working closely with the manager of the Biological Facility Imaging Resource here at FSU as a collaborator on several other projects. I enjoy working in the dynamic environment of the imaging facility and being constantly exposed to new specimens and new challenges.

EDUCATION:

- Florida State University, Institute of Molecular Biophysics: 2012–current
  PhD candidate, Molecular Biophysics
  Project: Structural investigation of protein:DNA complex formed by ZmNDPK1 and a G-quadruplex forming aptamer hex4_A5U.
  Projected graduation: Fall 2017
  Advisor: Dr. M. Elizabeth Stroupe

- CUNY Brooklyn College, Brooklyn, NY: 2010–2012
  MA Biology
  Thesis: Biosynthesis of Mycobacterial cell wall components
  Advisor: Dr. Luis E. N. Quadri

- Bohdan Khmelnitsky National University at Cherkasy, Ukraine: 2001–2005
  BS with honors, Biology
  Thesis: “Biochemical aspects of developing stress conditions”.
  Advisors: Boechko FF, Boechko LO

TRANSMISSION ELECTRON MICROSCOPY (TEM) EXPERIENCE:

Sample and grid preparation for TEM:

- Preparation of TEM support grids including: plastic-supported grids with evaporated carbon, floated-carbon grids, cryo-well grids and graphene-oxide coated grids;
- Plasma-cleaning (glow discharging);
- Preparation of negatively stained specimens;
- Preparation of vitrified specimens by plunging into liquid ethane using FEI Vitrobot;
- Grid clipping for FEI Titan Krios.

TEM operation:

- Performing routine maintenance and imaging on a conventional cryogenic TEM (i.e., FEI CM120): cryo-cycling, astigmatism correction, aperture centering, flat-field correction, resolving camera communication issues, etc.;
- Performing all necessary calibration steps to ensure high-quality data acquisition on the FEI Titan Krios TEM, such as: cryo-cycling, direct alignment of the beam, pivot point correction, image shift adjustment, flat-field correction, dose matching, coma-free alignment, etc.

Data acquisition:

- Grid screening and data collection on FEI CM120 in low-dose and diffraction modes;
- Setup of automatic data acquisition using Leginon software in single-particle and tomography modes on FEI Titan Krios.

Data handling and processing:

- Working knowledge of Leginon/Appion data processing workflow including frame alignment, CTF estimation, particle picking, stack creation, particle sorting and 2D and 3D classification;
- Particle picking and particle stack processing followed by 2D and 3D classification and reconstruction using RELION, EMAN2 and cryoSPARC software packages;
- Tilt-series alignment and reconstruction using Protomo (as implemented in Appion) and TomoJ;
- Tilt-series CTF correction using TOMOCTF and dose compensation;
- Basic knowledge of subtomogram averaging and classification using Dynamo;
- Managing large volumes of data and maintaining data backup on external storage.

COMPUTER SKILLS

- Assembling custom high-performance computer workstations;
- Installation and administration of Linux-based operating systems, setting up multi-user environments and data processing software packages;
- Basic knowledge of bash and awk scripting;
- Installation of Windows operating system and all required software;
- Proficient with MS Word, Excel and PowerPoint, Adobe Photoshop and Illustrator.

OTHER RESEARCH SKILLS ACQUIRED:

- Circular Dichroism of proteins and DNA to characterize their secondary structures;
- Dynamic Light Scattering to investigate the higher order assembly of macromolecular complexes;
- Isothermal Titration Calorimetry, Electrophoretic Mobility Shift Assay and Nitrocellulose Filter Binding Assay methods for protein-DNA binding constant determination;
- Analytical Ultracentrifugation (using Ultrascan software) for determining the oligomerization and folding state of DNA aptamers;

- Size Exclusion Chromatography-Multi Angle Light Scattering to assess the quality of samples;
- X-ray crystallography to determine the atomic structure of the NDPK protein (PDB: 1VYA);
- Various protein, DNA, and RNA purification techniques such as affinity chromatography, size exclusion chromatography, ion exchange, gel purification, etc.

PERSONAL AND COMMUNICATION SKILLS:

Teaching:

- FSU Institute of Molecular Biophysics: 2012–current
  Teaching assistant (2 semesters)
  Course: General Biology I laboratory (BSC 2010L)

- CUNY Brooklyn College: 2011–2012
  Adjunct lecturer
  Course: General Biology laboratory

Other:

- Participated in the organization of the 2014 Natural Sciences Graduate Symposium at FSU;
- Vice-president of the ‘Students for Effective Communication in Science’ graduate student organization, 2014–2015;
- President of the ‘Students for Effective Communication in Science’ organization, 2015–2016;
- Organized a visit of cartoonist Jorge Cham (author of PHD comics) to the FSU campus in 2016;
- Directly mentored two Honors in the Major undergraduate students in the Stroupe laboratory;
- Participated in the Young Scholars Program in Summer 2013 and Summer 2014, which focused on teaching high-school students the scientific method, formulating hypotheses, and designing and performing hands-on experiments;
- Led a cryo-EM lab tour in 2017 for freshman undergraduate students who were members of the Women In Mathematics Science and Engineering (WIMSE) student organization;
- Fluent in English, Russian and Ukrainian.

PREVIOUS WORK EXPERIENCE:

Custom kitchen assembly shop supervisor (Masterpiece LLC, Brooklyn, NY, 2008-2011)

- Ensuring uninterrupted process of the furniture assembly line;
- Ordering supplies and arranging deliveries;
- Forklift operation;
- Receiving project updates from clients and incorporating them on-the-fly into the work process;
- Providing customer support regarding repairs and replacements;
- Hiring new personnel.

Construction worker – renovations (Brooklyn, NY, 2006-2008)

- Demolition;
- Carpentry, including framing, drywall installation, doors and windows installation;
- Painting and drywall repair;
- Floor and wall tiling.

ACADEMIC ACCOMPLISHMENTS:

Publications:

1. Vergnolle, O., Chavadi, S. S., Edupuganti, U. R., Mohandas, P., Chan, C., Zeng, J., Kopylov, M., Angelo, N. G., Warren, J. D., Soll, C. E., et al. (2015) Biosynthesis of mycobacterial cell-envelope-associated phenolic glycolipids in Mycobacterium marinum. J. Bacteriol., 197, JB.02546-14.
2. Andorf, C. M., Kopylov, M., Dobbs, D., Koch, K. E., Stroupe, M. E., Lawrence, C. J. and Bass, H. W. (2014) G-Quadruplex (G4) Motifs in the Maize (Zea mays L.) Genome Are Enriched at Specific Locations in Thousands of Genes Coupled to Energy Status, Hypoxia, Low Sugar, and Nutrient Deprivation. J. Genet. Genomics, 41, 627–647.
3. Kopylov, M., Bass, H. W. and Stroupe, M. E. (2015) The Maize (Zea mays L.) Nucleoside Diphosphate Kinase1 (ZmNDPK1) Gene Encodes a Human NM23-H2 Homologue That Binds and Stabilizes G-Quadruplex DNA. Biochemistry, 54, 1743–1757.
4. Bucur, C. B., Jones, M., Kopylov, M., Spear, J. and Muldoon, J. (2017) Inorganic–organic layer by layer hybrid membranes for lithium–sulfur batteries. Energy Environ. Sci., 10, 905–911.

Presentations:

- ‘G-Quadruplex polymorphism’ presentation, FSU Biochemistry Seminar series, January 2016;
- ‘Structural investigation of ordered assembly formation by NDPKs in presence of nucleotides’ poster presentation, Gordon Research Conference ‘Nucleotides, Nucleosides and Oligonucleotides’, Newport (RI), July 2015;
- ‘Default settings high-resolution reconstructions on a local workstation’ poster presentation, Gordon Research Conference ‘3-dimensional electron microscopy’, Switzerland, July 2017.
