The Pennsylvania State University

The Graduate School

Department of Chemistry

ARABIDOPSIS THALIANA UNDER STRESS: FUNCTIONAL IMPLICATIONS OF

G-QUADRUPLEX AND NON-CODING RNA STRUCTURE

A Dissertation in

Chemistry

by

Melissa A. Mullen

2011 Melissa A. Mullen

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

August 2011

i

The dissertation of Melissa A. Mullen was reviewed and approved* by the following:

Philip C. Bevilacqua Professor of Chemistry Dissertation Co-Advisor Co-Chair of Committee

Sarah M. Assmann Waller Professor of Biology Dissertation Co-Advisor Co-Chair of Committee

Christine D. Keating Associate Professor of Chemistry

David D. Boehr Assistant Professor of Chemistry

Paul Babitzke Professor of Biochemistry and Molecular Biology

Barbara J. Garrison Shapiro Professor of Chemistry Head of the Department of Chemistry

*Signatures are on file in the Graduate School

iii

ABSTRACT

RNA can adopt diverse secondary and tertiary structures including duplexes, hairpins, quadruplexes, and pseudoknots. Changes in RNA structures have been shown to affect expression in all domains of life. In plant systems survival is dependent on stress responses to changing environmental conditions, including drought stress. Increases in ionic strength, K+, and osmolyte concentrations in response to stress affect RNA folding thus RNA structure that can refold under such conditions has the potential to have a regulatory effect. The objective of this work is to investigate RNA folding in the model plant system Arabidopsis thaliana under normal and stress conditions and to evaluate the potential for RNA-mediated gene regulation using bioinformatics, biochemistry, and biology.

G-quadruplex sequences (GQS), which require K+ for formation, have been shown to regulate gene expression in H. sapiens. Using computational methods, I found that GQS of varying motifs are present in Arabidopsis thaliana and that the distribution of GQS varies with sequence motif. Using a Markov modeled genome we determined that GQS with three G-tracts

(G3) are significantly underrepresented in the genome. Functional analysis was performed on

GQS present in RNA indicated that GQS are overrepresented in encoding of certain functional categories, including enzyme activity. I also found that genes differentially regulated by drought were significantly more likely to contain a GQS. Circular dichrosim (CD)- detected K+ titrations and UV thermal denaturation performed on representative RNAs verified that quadruplexes formed at physiological K+ concentrations. Results from this study indicate that

GQS are present in unique locations in A. thaliana and that folding of RNA GQS may play important roles in regulating gene expression.

iv

In vitro folding of GQS under normal and drought stress conditions was investigated using biochemical techniques. We found that RNA GQS form with folding parameters that depend on loop length, sequence and G-tract length. Folding parameters for RNA GQS were determined using CD-detected K+ titrations and indicated that RNA sequences with G3 had a modest dependence on K+ concentration and folded with negative cooperativity. Native gel analysis and RNase T1 protection assays revealed that G3 GQS are associated with populated intermediate folding states. GQS with two G-tracts (G2) had a steep K+ dependence, folded with a high cooperativity factor, and did not significantly populate intermediate states. In a plant system under unstressed conditions, the more stable G3 sequences were predicted to fold, however the less stable G2 sequences were only predicted to have a favorable folding free energy at high K+ concentrations, which are found during drought stress. GQS in the RNA have the potential to switch structure in response to high K+ conditions and thus regulate gene expression as a drought stress response.

Some heavy metals, including Cu2+, Zn2+, and Fe2+ are essential for plant metabolism and normal function; however they are toxic in high concentrations. Heavy metal interactions with

DNA GQS were monitored by CD. Results from the analysis indicate that Cu2+ interacts with

DNA GQS more strongly than other heavy metal ions and destroys the quadruplex structure.

Additionally, RT-PCR results indicated that expression of some copper transport genes that contain a GQS is up-regulated when in plants grown on Cu2+-supplemented media. This response could be a mechanism to survive an excess of heavy metal ions in the local environment.

Finally, a study of the structure of the 3‘UTR of the abscisic acid (ABA) catabolism gene

CYP707A4, was conducted to characterize structure and potential binding interactions with ABA.

ABA, a plant hormone that mediates the drought stress response in Arabidopsis, increases in concentration to µM levels under drought stress. The AU-rich 3‘UTR of CYP707A4 has an ion- dependent structure and undergoes an RNA refolding event with Mg2+ and K+, as determined

v using UV-detected thermal denaturation, CD, and hydroxyl radical footprinting. Isothermal titration calorimetry, circular dichroism, and equilibrium dialysis were used to test for ABA binding to the 3‘UTR of CYPY707A4. Results from these techniques suggest at most a weak and non-specific interaction with ABA.

vi

TABLE OF CONTENTS

LIST OF FIGURES ………………………………………………………………………… x

LIST OF TABLES ………………………………………………………………………….. xvii

ABBREVIATIONS …………………………………………………………………………. xix

ACKNOWLEGEMENTS …………………………………………………………………... xxi

Chapter 1 Introduction ...... 1

1.1 Nucleic Acid Structure ...... 1 1.2 G-quadruplex and i-motif ...... 5 1.2.1 Structure ...... 5 1.2.2 Location ...... 9 1.2.3 Function ...... 10 1.2.4 GQS formation in vivo ...... 13 1.3 Functional RNAs in plants ...... 14 1.3.1 Small RNAs...... 16 1.3.2 Thiamine pyrophosphate (TPP) riboswitch ...... 17 1.4 Plant stress physiology ...... 18 1.5 Thesis objectives ...... 19 1.6 References ...... 19

Chapter 2 RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: Prevalence and possible functional roles ...... 29

2.1 Abstract ...... 29 2.2 Introduction ...... 30 2.3 Materials and methods ...... 33 2.3.1 Data sets searched ...... 33 2.3.2 Programs used ...... 33 2.3.3 Oligonucleotide preparation ...... 36 2.3.4 Circular dichroism ...... 37 2.3.5 UV thermal denaturation ...... 38 2.4 Results ...... 39 2.4.1 Prevalence and significance of GQS in Arabidopsis ...... 39 2.4.2 Location of GQS in the Arabidopsis genome ...... 42 2.4.3 Characterization of GQS found in the intergenic region ...... 46 2.4.4 Characterization of GQS found in the genic region ...... 48 2.4.5 Potential biological functions of GQS in Arabidopsis ...... 49 2.4.6 Experimental evidence for G-quadruplex formation ...... 53 2.5 Discussion ...... 57 2.5.1 Formation and function of GQS in DNA and RNA ...... 57 2.6 Conclusion ...... 60 2.7 Acknowledgements ...... 60 2.8 References ...... 61

vii

Chapter 3 Potential for a binary gene response: RNA G-quadruplexes with fewer quartets fold with higher cooperativity ...... 69

3.1 Abstract ...... 69 3.2 Cooperativity of RNA GQS folding ...... 69 3.3 Acknowledgements ...... 80 3.4 References ...... 80

Chapter 4 Effects of heavy metal ions on G-quadruplex folding and copper ion transport genes in Arabidopsis thaliana ...... 83

4.1 Abstract ...... 83 4.2 Introduction ...... 83 4.2.1 Heavy metal ion toxicity in plants ...... 83 4.2.2 Heavy metal ion transport in plants ...... 84 4.2.3 Nucleic acid-metal ion interaction ...... 89 4.3 Materials and methods ...... 90 4.3.1 Bioinformatics ...... 90 4.3.2 Oligonucleotide preparation ...... 91 4.3.3 Native gels ...... 93 4.3.4 UV-detected thermal denaturation ...... 93 4.3.5 Circular dichroism ...... 94 4.3.6 Plant growth for RT-PCR ...... 95 4.3.7 Tissue harvest and TRIzol total RNA isolation ...... 95 4.3.8 mRNA isolation and DNase treatment ...... 96 4.3.9 RT-PCR ...... 97 4.3.10 RT-PCR primers ...... 97 4.3.11 T-DNA insertion mutants and genotyping ...... 98 4.4 Results ...... 100 4.4.1 Bioinformatics ...... 100 4.4.2 Circular dichroism of model DNA oligonucleotides ...... 103 4.4.3 WT plant growth on copper-containing media ...... 108 4.4.4 RT-PCR of oligos of interest ...... 111 4.4.5 T-DNA mutant genotyping results ...... 112 4.5 Discussion ...... 113 4.6 Future Directions ...... 114 4.7 Acknowledgements ...... 115 4.8 References ...... 115

Chapter 5 The 3‘UTR of an abscisic acid catabolism gene has an ion-dependent tertiary structure ...... 122

5.1 Abstract ...... 122 5.2 Introduction ...... 122 5.2.1 Primary and secondary metabolites in plants ...... 122 5.2.2 Plant stress response hormone abscisic acid ...... 124 5.2.3 Abscisic acid biosynthesis and catabolism ...... 126 5.2.4 Riboswitches as RNA regulators ...... 130 5.3 Materials and methods ...... 133

viii

5.3.1 Bioinformatics ...... 133 5.3.2 RNAs studied ...... 133 5.3.3 Preparation of RNA ...... 134 5.3.3 Circular dichroism ...... 136 5.3.4 UV detected thermal denaturation ...... 137 5.3.5 Isothermal titration calorimetry ...... 137 5.3.6 Structure Mapping ...... 138 5.3.7.Equilibrium Dialysis ...... 139 5.4 Results ...... 140 5.4.1 Bioinformatics ...... 140 5.4.2. Mg2+ and K+ dependent folding ...... 143 5.4.3.ABA Binding...... 147 5.4.4 Poly(A) tailed constructs ...... 151 5.5 Discussion ...... 152 5.6 Future Directions ...... 153 5.7 References ...... 154

Chapter 6 Conclusions and Future Directions ...... 162

6.1 Conclusions from this work ...... 162 6.2 Future directions ...... 163 6.2.1 Preliminary studies ...... 163 6.2.2 Ongoing work ...... 173 6.2.3 Additional projects of potential interest ...... 174 6.3 References ...... 175

Appendix A. Chapter 2 Supplemental Information ...... 179

A.1 Grep scripts used for Markov model analysis and whole genome GQS searches .... 181 A.1.1 For general GQS search in each ...... 181 A.1.2 For Markov modeled genomes ...... 183 A.1.3. For general GQS search in genomic regions ...... 183 A.2 Ushuffle: Alternative calculation of GQS significance in the Arabidopsis genome ...... 184 A.3 Functional analysis of genes ...... 187 A.4 References ...... 196

Appendix B. Chapter 3 Supplemental Information ...... 198

B.1 Materials and Methods ...... 198 B.1.1 RNA preparation ...... 198 B.1.2 Non-denaturing (native) gel electrophoresis ...... 199 B.1.3 Circular dichroism (CD) ...... 200 B.1.4 UV-detected thermal denaturation (melts) ...... 201 B.1.5 RNase T1 protection assays ...... 202 B.2 Supplemental figures ...... 203

Appendix C. Chapter 4 Supplemental Information ...... 207

ix

C.1 Functional analysis results ...... 208 C.2 Database of literature-annotated copper ion-related genes and sequences ...... 212

Appendix D. Chapter 6 Supplemental Information ...... 226

x

LIST OF FIGURES

Figure 1.1: RNA base pairs with the Watson-Crick face highlighted in blue and Hoogsteen face highlighted in orange. (a) Watson-Crick base pairs G C and A U. (b) Non Watson-Crick interactions are exemplified in a G U wobble (left) and an A U reverse Hoogsteen interaction (right). (c) Non Watson-Crick interactions involving the same nucleotide C+ C (left) and G G (right)...... 2

Figure 1.2: Secondary (a,b) and tertiary (c) nucleic acid structures. Secondary: (a) duplex (b) hairpin, internal loop, bulge (L to R). Tertiary (c): helical junctions, kissing hairpin loops, pseudoknot (L to R) ...... 4

Figure 1.3: G-Quadruplex structure. (a) G-quartet base pairing structure, showing interactions between guanine residues at the Watson-Crick and Hoogsteen faces. The central dehydrated monovalent ion is necessary for formation and interacts with the O6 residues. Unimolecular (b) parallel and (c) antiparallel topologies. Dark lines follow the nucleic acid strand, arrowheads denote strand directionality, and gray boxes denote quadruplexes. The examples drawn here are for sequences having three quartets. Numbers by loops indicate different types of loops: 1 lateral, 2 diagonal, 3 double chain reversal...... 6

Figure 1.4: G-quadruplex topologies.37 (a) Parallel ―propeller‖ structure with all double + 38 chain reversal loops. Human telomeric DNA: d[A(G3TTA)3G3] in K . (b) Antiparallel ―chair‖ structure with all lateral loops. Thrombin Binding Aptamer 39 (TBA): d(G2TTG2TGTG2TTG2). (c) Antiparallel ―mixed‖ structure with lateral, lateral, and double chain reversal lops. BCL2 promoter GQS: 40 d(G3CGCG3AG2AATTG3CG3). (d) Antiparallel structure with double chain 41 reversal, lateral, lateral loops. Human telomeric DNA: d[TTA(G3TTA)3G3A]. (e) Antiparallel ―basket‖ structure with lateral, diagonal, lateral loops. Human 42 telomeric DNA: d[AG3TTA)3G3]. (f) Antiparallel structure with diagonal, double chain reversal, and diagonal loops. DNA quadruplex: 43 d(G2TTTTG2CAG3TTTTG2T). Guanines are shown in green, thymine as blue, cytosine as yellow, and adenine as red panels...... 7

Figure 1.5: Imotif structure. (a) C+ C base pair, showing hydrogen bonding and orientation of the residues. Protonation of N3 of cytosine is possible at low or neutral pH. I-motif formation as a (b) tetramolecular [d(TCC)]4, (c) dimer [d(5mCCTTTACC)]2, and (d) monomer d(5mCCTTTCCTTTACCTTTCC). Different strands are color coded with varying shades of gray and arrows indicate direction of each strand. Circle and triangle size are indicative of depth; larger circles are closer to the reader and smaller circles are further into the paper. Adapted from Gueron, et al.48 ...... 10

Figure 1.6: GQS function. (a) Transcription regulation by GQS found in promoter regions (grey boxes). Formation of GQS (and i-motif) inhibit transcription. Stabilization of GQS with GQS stabilization ligamd, such as TMPyP4, also inhibits transcription. Transcription is up-regulated by binding of single stranded DNA binding proteins CNBP and hmRNP K, which allow RNA polymerase to function. (b) Translation

xi

regulation by GQS found in the 5‘UTR of mRNA. Without formation of GQS (top), translation is on. Formation of GQS in 5‘UTR leads to translation repression...... 11

Figure 1.7: Models for GQS folding in DNA from Phan, et al.88 (a) GQS and i-motif formation at the same location on the duplex, (b) i-motif formation, (c), GQS formation, and (d) GQS and i-motif formation at different locations on the duplex...... 15

Figure 2.1: G-quartet and G-quadruplex structures and topologies. (a) G-quartet structure, showing Hoogsteen-to-Watson-Crick face hydrogen bonds and the central dehydrated monovalent ion integral to formation and stabilization. Unimolecular (b) parallel and (c) antiparallel G-quadruplex topologies. Adapted from1,6,10,11. Dark lines follow the nucleic acid strand, arrowheads denote strand directionality, and gray boxes denote quadruplexes. The examples drawn here are for sequences having three quartets...... 31

Figure 2.2: Density of GQS in different organisms. Density of GQS in each genome is represented by a bar and given in GQS/Mb. GQS here is defined as G3L1-7. GQS density is provided for each organism at the right-hand end of the bar. All numbers are from this study except H. sapiens which is from Huppert and coworkers21 and Todd and coworkers22. Common names: Homo sapiens (Human), Drosophila melanogaster (fruitfly), Mus musculus (mouse), Arabidopsis thaliana (mouse ear cress), Arabidopsis lyrata (lyrate rock cress), Manihot esculenta (cassava), Lotus japonicus (Lotus japonicus), Glycine max (soybean), Oryza sativa indica (rice – indica), Zea mays (corn), and Physcomitrella patens (Physcomitrella patens, a moss)...... 40

Figure 2.3: Ratio of RNA GQS density to non-genic DNA GQS density for different GQS motifs. DNA GQS density includes G and C sequences in the intergenic region only (Table 2.2, column 3), since this will not lead to GQS in RNA. RNA GQS density includes only the G sequences found in the genic regions (Table 2.5, column 3). For example, G3+L1-7, RNA GQS Density is 2.9 and the DNA intergenic region GQS density is 16.7, leading to a ratio of 0.17...... 46

Figure 2.4: Formation of GQS RNA oligonucleotides (a) CD spectra of RNA GQS in 10 o mM LiCacodylate (pH 7.0) and 150 mM KCl at 20 C: G3L221 (black), G2L111 (red), G3L444 (gold), G3L444 + FLANK (green), and G2L444 (blue). The positive peak at 260 nm and the negative peak at 240 nm suggest that the RNA GQS adopt a parallel conformation. The G2L111 RNA most likely has some antiparallel character due to the shoulder in the spectrum extending to 300 nm. See Materials & Methods for full + sequences. (b) Sample K titration. Titration is of G3L444 RNA GQS with KCl additions from 0 mM to 700 mM. The arrows indicate increasing K+ concentrations. Also included are UV thermal detaturation of G3L444 RNA in 100 mM LiCl (black), NaCl (red) and KCl (blue) at (c) 260 nm and (d) 295 nm. Absorbances are normalized to the highest absorbance...... 56

Figure 3.1: Fraction folded plots for representative G3 and G2 GQS. Data were fit to a + two state Hill equation. K 1/2 and n values are provided in Table 3.1...... 72

xii

Figure 3.2: Native gels of RNA GQS oligonucleotides. (a) G3A2 and G2A2 native gel mobility traces for different K+ concentration gels, normalized to G2L1mut mobility. (b) Native gel images at different K+ for G3A (1), G2N1-2 (2), G3A2 (3), G2AUA (4), G3N4 (5), G3N4flank (6), G2L1mut (7), G2A (8), G2A2 (9), G2UA (10), G2AUA (11), G2N4 (12)...... 75

Figure 3.3: RNase T1 protection assay of (a) G2A2 and (b) G3N4. Black arrows indicate protected nucleotides. The open arrow indicates position of differential proction. The red arrow indicates decreased protection...... 76

Figure 3.4: Simple model for RNA GQS folding for (a) G3 GQS and (b) G2 GQS. Blue circles represent K+ ions. Gray boxes indicate a formed quartet...... 77

Figure 3.5: Free energy as a function of K+ concentration. ΔG of RNA GQS oligonucleotides with increasing K+ concentrations as calculated from CD K+ titrations. Red: G2 GQS, Black: G3 GQS. The light gray shading represents the K+ concentrations that typically prevail in plant cells under unstressed conditions and the dark gray shading represents K+ concentrations under water stress (200-700 mM). The dashed line at ΔG = 0 separates strongly folding (ΔG < 0) and unfolding (ΔG > 0) conditions...... 79

Figure 4.1: Copper/heavy metal ion transport in Arabidopsis thaliana cell (a) and whole plant (b). Copper/heavy metal ion transport proteins are shown in red, chaperones in green, and other copper binding proteins in blue. Adapted from 2,22. In the cell (a), the HMA family is heavily involved in transport across membranes of different organelles. Copper chapterones CCH and CCS assist in copper transport between proteins. Transport pathways in the whole plant (b) are not as well characterized. Uptake of copper ions by COPT1 across the epidermis is well studied2 as are transport proteins that load the xylem (HMA5 for Cu+, HMA2 and HMA4 for Zn2+, and FRD3 for citrate). However, unloading from the xylem to other tissues and loading into the phloem for transport to other tissues is unknown; The IRT3, OPT3, and theYSL family are implicated in this function. At numbers for the genes encoding the proteins shown above are as follows (in alphabetical order): ATM3: At2g15570, ATX1: At1g66240, CCH: At3g56240, CCS: At1g12520, COPT1: At5g59030, COPT2: At3g46900, COPT3: At5g59040, COPT4: At2g37925, COPT5: At5g20650, CSD1: At1g08830, CSD2: At2g28190, FRD3: At3g08040, HMA1: At4g37270, HMA2: At4g30110, HMA3: At4g30120, HMA4: At2g19110, HMA5: At1g63440, HMA6/PAA1: At4g33520, HMA7: At5g44790, HMA8/PAA2: At5g21930, IRT1: At4g19690, MTP1: At2g46800, MTP3: At3g58810, NRAMP3: At2g23150, NRAMP4: At5g67330, OPT3: At4g16370, VIT1: At2g01770, YSL1: At4g24120, YSL2: At5g24380, YSL3: At5g53550, YSL4: At5g53550, YSL5: At3g17650, YSL6: At3g27020, YSL7: At1g65730, YSL8: At1g48370, ZIP1: At3g12750, ZIP4: At1g10970, ZIP5: At1g05300, ZIP7: At2g04032, ZIP10: At1g31260, ZIP11: At1g55910, ZIP12: At5g62160...... 87

Figure 4.2: At2g15780 competing hairpin structure. Mfold predicted secondary structure of At2g15780 GQS with ten nucleotide flanking sequences. Nucleotides involved in GQS formation are colored in red...... 92

xiii

Figure 4.3: CD titrations of SKS5 DNA with heavy metal ions show a strong Cu2+ effect. SKS5 DNA in 10 mM LiCacodylate, pH 7 and 100 mM KCl was titrated with different heavy metal ions. (a) 4 µM DNA with CuSO4 additions, with a resulting 2+ Cu 1/2 of 59.6 µM 5.7 µM and n = 4.1 1.4 at 263 nm. (b) 0.2 µM DNA with 2+ CuSO4 additions, with a resulting Cu 1/2 of 34.0 µM 0.3 µM and n = 0.9 0.07 at 265 nm. (c) Control titration of 4 µM DNA with MgSO4 additions, and (d) Control titration of 4 µM DNA with ZnSO4 additions and a final spike of CuSO4...... 104

2+ Figure 4.4: SKS5 Cu 1/2 and Hill coefficient values are concentration dependent. Fits of 2+ the SKS5 DNA titrations with Cu 1/2 as described in Materials and Methods and shown in (a) Figure 4.3a and (b) Figure 4.3b. (a) Poor fit of 4 µM DNA in 10 mM 2+ LiCacodylate, pH 7.0 and 100 mM KCl with CuSO4 additions. Cu 1/2 is 59.6 µM 5.7 µM and n = 4.1 1.4 at 263 nm. (b) Fit of 0.2 µM DNA in 10 mM 2+ LiCacodylate, pH 7.0 and 100 mM KCl with CuSO4 additions. Cu 1/2 is 34.0 µM 0.3 µM and n is 0.9 0.07 at 265 nm...... 105

Figure 4.5: SKS5 GQS folds independently of DNA concentration. (a) First derivative of UV thermal denaturation curves at 0.4 µM (black), 2 µM (blue), and 4 µM (green) SKS5 DNA. Melts were completed in 10 mM LiCacodylate, pH 7.0 with 100 mM KCl and as described in Materials and Methods. (b) Non-denaturing polyacrylamide gel electrophoresis of SKS5 DNA at 0.4 µM, 2 µM, and 4 µM. Native gels was run with 0.5xTBE and 100 mM KCl as described in Materials and Methods...... 106

Figure 4.6: At2g15780 folds into a complex GQS structure that is destroyed by Cu2+. At2g15780 DNA in 10 mM LiCacodylate, pH 7 and 100 mM KCl was titrated with 2+ 2+ (a) CuSO4 to an estimated Cu 1/2 greater than 500 µM and (b) ZnSO4 to a Zn 1/2 greater than 1 mM...... 107

Figure 4.7: CdCl2 switches DNA GQS from antiparallel to parallel structure. At2g15780.1 in 10 mM LiCacodylate, pH 7.0 and 100 mM KCl was titrated with 2+ CdCl2 to a Cd concentration of 1 mM...... 108

Figure 4.8: Cu2+ inhibits seedling growth and shortens root length. Images of 8 day old Col-0 wild type seedlings grown on copper-supplemented media under 16x magnification. Seeds were sown on 0.5xMS media supplemented with (a) no CuSO4, (b) 30 µM CuSO4, (c) 50 µM CuSO4, and (d) 100 µM CuSO4. (e) Root length as a function of CuSO4 added to growth media. Root lengths were measured by hand and are the averages of at least 15 plants. Error bars represent standard deviations of the averages...... 109

Figure 4.9: Expression of copper transport genes increases when Col-0 plants are grown on media containing Cu2+. RT-PCR results for Arabidopsis thaliana Col-0 plants grown on media with no added CuSO4, 30 µM CuSO4, 50 µM CuSO4, and 100 µM CuSO4. RT-PCR products are shown for HMA5, HMA6, HMA8, COPT5, MTPB1 and ACT2 (Actin2). Gel quantification data can be seen in Table 4.7...... 111

Figure 5.1: Some common secondary metabolites in plants. Plant hormones: gibberelin G1, indole-3-acetic acid (IAA), cytokinin, abscisic acid (ABA), jasmonic Acid (JA), and brassinolide. Other secondary metabolites: Sinapolymalate (a

xiv

phenylpropanoid), 1,8-Cineole (a terpene), vanillic acid (a benzenoid), Kaempferol (a flavonoid)...... 123

Figure 5.2: ABA biosynthesis pathway (adapted from Nambara, et al14). The first two steps of the pathway (labeled with blue numbers), conversion of zeaxanthin to antheraxanthin and antheraxanthin to violaxanthin, are catalyzed by zeaxanthin epoxidase (ZEP). In step 3, violaxanthin de-epoxidase (VDE) can catalyze the reverse reaction in the chloroplast under high light conditions. Violaxanthin is most likely converted to neoxanthin by neoxanthin synthase (NSY). Isomers of violaxanthin and neoxanthin are synthesized by an unknown isomerase in steps 4 and 5. A family of 9-cis-epoxycarotenoid dioxygenases (NCEDs) is required to form xanthoxin in step 6, which is then converted to abscisic aldehyde by ABA2, a short-chain alcohol dehydrogenase (step 7). Abscisic aldehyde is then oxidized by abscisic acid oxidase 3 (AAO3) with the help of a molybdenum cofactor (MoCo), to form abscisic acid (step 8)...... 128

Figure 5.3: ABA catabolism pathway (adapted from Nambara, et al14). The major pathway for hydroxylation is through 8‘- hydroxy ABA (central path) and is catalyzed by an ABA 8‘ hydroxylase encoded by CYP707A genes. 8‘ –hydroxyl ABA is readily converted into phaseic acid, which can then by catabolized into the dihydrophaseic acid (DPA). In some organisms and tissues, 9‘- hydroxy ABA is abundant and readily converted to neo phaseic acid (neo PA). The minor catabolite of ABA is 7‘- hydroxy ABA. All of these hydroxylated compounds can be further conjugated at hydroxy groups...... 129

Figure 5.4: Arabidopsis thaliana TPP Riboswitch functions in alternative splicing, 3‘end processing, and mRNA stability.59 ...... 131

Figure 5.5: Putative secondary structure and pseudoknots of the 3‘UTRs of CYP707A4 and AAO3 as determined from DynAlign and manual sequence alignment. Red arrows with nucleotide numbers indicate the 5‘ and 3‘ ends of shortened constructs, as described in the text...... 142

Figure 5.6: Hydroxyl radical footprinting (HRF) of CYP707A4 RNA shows increased protection with increasing Mg2+. Each band is normalized to the total number of 2+ counts in each lane. From the fit, the Mg 1/2 is 0.29 0.06 mM and n is 4.8 4.1...... 144

Figure 5.7: Tm of CYP707A4 3‘UTR increases with Mg2+ concentration. RNA was prepared to 0.35 µM in 10 mM MES, 0.1 mM EDTA, and 100 mM NaCl for all 2+ melts. Unfolding was monitored at 260 nm. Tm‘s: 0 mM Mg , 15 C; 0.5 mM Mg2+, 30 C; 4 mM Mg2+, 42 C...... 145

Figure 5.8: Shortened putative secondary structures of the CYP707A4 3‘UTR (a) 20/158 and (b) 32/140...... 146

Figure 5.9: 32/140 RNA changes structure upon addition of Mg2+ or K+. Circular dichroism titrations of 0.4 µM 32/140 RNA with (a) MgCl2 to 5 mM or (b) KCl to 700 mM. Both titrations were in the background of 50 mM MES, pH 6.0. The Mg2+

xv

2+ titration also had a background of 100 mM KCl. Estimated Mg 1/2 is around 0.04 + mM and estimated K 1/2 is around 300 - 400 mM...... 147

Figure 5.10: ITC titrations of CYP707A4 RNA with ABA show a small heat release at lower pH values that is slightly greater than the IAA control. (a) 4 µM CYP707A4 RNA in 10 mM Na-MOPS, pH 7.5 with 5 mM Mg2+ titrated with 30 µM ( )ABA shows a very weak signal. (b) 2 µM CYP707A4 RNA in 10 mM MES, pH 5.5 with 5 mM Mg2+ titrated with 30 µM ( )ABA. (c) 2 µM CYP707A4 RNA in 10 mM MES, pH 6.0 with 5 mM Mg2+ titrated with 30 µM ( )ABA or 30 µM IAA. The interaction between the RNA and ABA is noticeably stronger at pH 6.0 than at pH 7.5. (d) The control titration using 30 µM of IAA instead of ABA under the same conditions as panel (c) indicated a minimal interaction between the control ligand and the RNA...... 148

Figure 5.11: Circular dichroism indicates a change in RNA structure with ABA additions. (a) 0.2 µM 32/140 RNA in 50 mM MES, pH 6.0, 100 mM KCl, and 5 mM MgCl2 with ABA additions to 50 µM. Estimated ABA1/2 is around 1 µM with a percent change of 13.7%. (b) 0.4 µM 32/140 RNA in 50 mM MES, pH 6.0, 100 mM KCl, and 5 mM MgCl2 with IAA additions to 50 µM. Estimated IAA1/2 is around 0.5 µM with a total percent change of 7.4%. (c) Control titration with 0.4 µM yeast tRNAPhe in the background of 50 mM MES, pH 6.0, 150 mM NaCl, and 5 mM MgCl2. (d) Control titration with 1 µM MGA RNA in 10 mM NaCacodylate, pH 5.9, 10 mM KCl, and 10 mM MgCl2...... 150

Figure 6.1: Ionic conditions affect G3L1-2 folding parameters. CD titration data and fits for GGGUCGGGUUGGGCGGG in (top to bottom): 10 mM LiCacodylate (pH 7.0), 100 mM LiCl and 10 mM LiCacodylate (pH 7.0), and 2 mM MgCl2 and 10 mM LiCacodylate (pH 7.0). Fit parameters can be seen in Figure 6.2...... 169

Figure 6.2: K+ dependent cleavage of central loop U residues in GQS with AUA loops. Cleavage has been seen with G3AUA (left) and G2AUA (right) RNA oligonucleotides in (a) in line probing assays and (b) RNase T1 protection assays. Red arrows indicate sites of cleavage. Salt front seen on the bottom of the gels is due to the presence of cacodylate buffer...... 171

Figure 6.3: G3AUA ILP timecourse after renaturation. Cleavage is seen at all U residues for RNA with 100 mM K+ at all timepoints. No cleavage is seen for 0 and 1 mM K+. The sample for 100 mM K+ at 1 h spilled into the 0 mM K+ at 5 h lane which accounts for cleavage bands seen in the latter. Red lines show positions of G residues and blue boxes indicate U positions. No treatment (NT), hydroxyl ladder (OH), and RNase T1 ladder (T1) lanes are also included...... 172

Figure A.1: Native gels of GQS RNA oligonucleotides. A. RNA GQS oligonucleotides of interest, renatured identically as for Circular Dichroism samples, and run in 100 + mM K . B. RNA GQS ‗ladder‘ oligonucleotides. Lane 1: AG2A, Lane 2: A(G2A)2, Lane 3: A(G2A)3, Lane 4: A(G2A)4. Oligonucleotides with increasing numbers of G2A repeats until a full GQS is achieved, as indicated by the faster migration of the oligonucleotide (seen in Lane 4). Lane 5: AG2AGAAG2AG2A. Mutant version of

xvi

the oligonucleotide in Lane 4. Disruption of the G pattern disrupts GQS formation, resulting in slower migration...... 179

Figure A.2: G3L1-7 GQS density as a function of GC content. 15 plant species and 3 non- plant eukaryotes were analyzed. All plant species are in green and fit to a linear regression curve. Non-plant eukaryotes are shown in gray. GC content was determined as described in Materials and Methods...... 180

Figure A.3: Distribution of G3L1-4 GQS in the coding sequence. The number of GQS starting positions that fall within each 100 nt segment of CDS is shown above with bars. The plot is cropped at a CDS length of 4,000; it incorporates 17,497 GQS out of a total of 17,619 GQS (>99%) that are found in RNA CDS. The dashed black line represents the distribution of the length of all coding sequences from Arabidopsis...... 180

Figure B.1: G3 GQS CD K+ titrations and fraction folded plots. Each sequence is provided with G residues in bold text...... 203

Figure B.2: G2 GQS CD K+ titrations and fraction folded plots. Each sequence is provided with G residues in bold text...... 204

Figure B.3: G3A (a) CD titration and (b) fit to two transitions...... 205

Figure B.4: Native gel traces, prepared as described in Materials and Methods (Section B.1)...... 206

Figure C.1: Cu2+ inhibits seedling root growth. Images of 8 day old Col-0 wild type seedlings grown on copper-supplemented media. Seeds were sown on 0.5xMS media supplemented with no CuSO4, 10 µM CuSO4, 30 µM CuSO4, 50 µM CuSO4 and, 100 µM CuSO4 as labeled above. The picture contrast, color, and brightness were adjusted to visualize the roots. Also included is a 16X magnification of the seedling grown on 100 µM CuSO4...... 207

xvii

LIST OF TABLES

Table 2.1: Distribution of GQS motifs in the Arabidopsis genome ...... 34

Table 2.2: Density (D)a and Enrichment (E)b of GQS in various Arabidopsis genomic regions ...... 39

Table 2.3: Number of patterns in Arabidopsis and Markov simulated genome...... 42

Table 2.4: Distribution and Density (D) of GQS motifs in Arabidopsis intergenic regions .... 47

Table 2.5: Distribution and Density (D) of GQS motifs in Arabidopsis genic RNA ...... 50

Table 2.6: Functional analysis of genes with at least one G2L1-4 GQS present in the RNA .... 52

Table 2.7: Experimental Values for G-quartet formation in Arabidopsis RNA ...... 55

Table 3.1: GQS folding parameters ...... 73

Table 4.1: RT-PCR Primers ...... 98

Table 4.2: Genotyping Primers ...... 99

Table 4.3: Functional analysis of genes with at least one G3+L1-7 GQS present in the RNA ... 101

Table 4.4: G3+L1-7 GQS sequences from overrepresented GO term ―copper ion binding‖ ...... 101

Table 4.5: G2+L1-4 GQS in genes annotated for copper ion binding/transport ...... 102

Table 4.6: Candidate genes for in vivo experiments ...... 110

Table 4.7: RT-PCR Quantification ...... 112

Table 5.1: PCR primers for 3‘UTRs of interest ...... 135

Table 5.2: Equilibrium dialysis shows no evidence for ABA binding ...... 151

Table 6.1: Loop sequences of G3L1-7 GQS in genic regions with loops of the same size ...... 166

Table 6.2: G3L221 folding parameters ...... 168

Table A.1: Genome sequence information ...... 181

Table A.2: Z-scores for G3L1-7 and G2L1-4 GQS in the Arabidopsis genome...... 186

Table A.3: Unique gene models with a GQS in the RNA ...... 187

Table A.4: Functional analysis of genes with at least one G2L1-4 GQS present in the RNA ... 188

Table A.5: Functional analysis of genes with at least one G2L1-2 present in the RNA ...... 192

xviii

Table A.6: Functional analysis of genes with at least one G2L1 GQS present in the RNA ..... 194

Table A.7: Functional analysis of genes with at least one G3L1-7 GQS present in the RNA ... 195

Table C.1: G2L1-4 GQS sequences from overrepresented GO term ―copper ion binding‖ ...... 208

Table D.1: Common loop combinations of 5‘UTR G3L1-7 GQS ...... 226

Table D.2: Common loop combinations of 3‘UTR G3L1-7 GQS ...... 226

Table D.3 Common loop combinations of intronic G3L1-7 GQS ...... 227

Table D.4: Common loop combinations of Coding sequence G3L1-7 GQS ...... 228

Table D.5: Common loop combinations of intergenic region G3L1-7 GQS ...... 230

Table D.6: miRNA target genes that contain a GQS in the RNA. For example, At1g30120 is a target gene for miR159/319 and also contains a GQS. The GQS is 142 nt away from the miRNA target binding site. Out interest in this was the possibility of GQS affecting miRNA-miRNA target interactions...... 233

Table D.7: Alternatively spliced loci with G2L1-4 GQS in some gene models. For example, AT1G03850 has two splice variants, one with a G2 GQS in the mature RNA and one that does not. Our interest in this was the possibility of GQS controlling alternative splicing...... 237

xix

ABBREVIATIONS

, ellipticity EDTA, ethylenediaminetetraacetic acid

AAO3, abscisic aldehyde oxidase EGTA, ethyleneglycoltetraacetic acid

ABA, abscisic acid ESR1, estrogen receptor

ABA-GE, abscisic acid glycosyl ester G2, G-quadruplex with two G-tracts

ABRC, Arabidopsis Biological Resource G3, G-quadruplex with three G-tracts Center GA, Gibberellin AGO, Argonaute GB, glycine betaine ASRP, Arabidopsis Small RNA Project GO, BP, biological process GQS, G-quadruplex sequence CC, cellular component hnRNP K, heterogeneous nuclear CCH, copper chaperone ribonucleoprotein K

CCS, copper chaperone for superoxide HRF, hydroxyl radical footprinting dismutase IAA, indole-3-acetic acid CD, circular dichroism IGR, intergenic region CDS, coding sequence ITC, isothermal titration calorimetry CNBP, cellular nucleic acid binding JA, jasmonic acid Col-0, Columbia-0 ecotype JGI, US Department of Energy Joint CSD, copper/zinc superoxide dismutase Genome Institute

CTC1, conserved telomere maintenance 1 L, loop

D, density Mb, megabase

DCL, Dicer like MES, 2-(N-morpholino) ethanesulfonic acid

DPA, (-) – dihydrophaseic acid MF, molecular function dsRNA, double stranded RNA MGA, malachite green aptamer

E, enrichment miRNA, micro RNA

xx

MoCo, molybdenum cofactor snoRNA, small nucleolar RNA

MOPS, 2-(N-morpholino) propanesulfonic sRNA, small RNA acid TAIR, The Arabidopsis Information MS, Murashige and Skoog Resource n, Hill coefficient ta-siRNA, trans acting small interfering RNA NCED, 9-cis-epoxycarotenoid dioxygenase TBA, Thrombin binding aptamer NPX1, nuclear protein X1 1xTBE, 100 mM Tris, 80 mM Boric acid, 1 NSY, neoxanthin synthase mM EDTA nt, nucleotide T-DNA, transfer DNA

ORF, open reading frame 1xTE, 10 mM Tris, 1 mM EDTA

PA, (-) – phaseic acid Tm, melting temperature

PAGE, polyacrylamide gel electrophoresis TMPyP4, 5,10,15,20-tetra(N-methyl-4- pyridyl) prophine chloride PCR, polymerase chain reaction TPP, thiamine pyrophosphate RISC, RNA-induced silencing complex TPS, template preparation solution (100 RNAi, RNA interference mM TrisHCl pH 9.5, 1 M KCl, 10 mM EDTA) ROS, reactive oxygen species Tris, tris(hydroxymethyl)aminomethane rRNA, ribosomal RNA tRNA, transfer RNA RT-PCR, reverse transcription-polymerase chain reaction TU, transcriptional unit

SAM, S-adenosyl methionine UTR, untranslated region

SKS5, SKU similar 5 VDE, violaxanthin epoxidase sRNA, small RNA WT, wild type siRNA, small interefering RNA ZEP, zeaxanthin epoxidase

xxi

ACKNOWLEDGEMENTS

I would like to thank my friends, family, professors, and colleagues for their support.

The work in this thesis was supported by the Human Frontier in Science Program (HFSP)

RGP0002/2009-C and the National Science Foundation (MCB-0527102 and MCB-03-45251).

At Colby College, I would like to thank my professors and advisors Dr. Dasan

Thamattoor, Dr. D. Whitney King, Dr. Steven Dunham and Dr. Shari Dunham for guidance. At

Penn State, I would like to acknowledge the professors who have served on my committee throughout the years: Dr. Christine Keating, Dr. David Boehr, Dr. Paul Babitzke, and Dr. Blake

Peterson (now at the University of Kansas).

My labmates from the Bevilacqua and Assmann labs have been invaluable over the past six years. I appreciate all of the help and advice they have provided. I have to thank Dr. Andrea

Szakal, Dr. Rieko Yajima, Dr. Subbarao Nallagatla, and Dr. Durga Ghosh from the Bevilacqua lab. From the Assmann lab, I have to thank Dr. Sarah Nilson, Tim Gookin, Dr. Laura Ullrich, and Dr. Yiliang Ding from the Assmann lab for helping a chemist learn plant molecular biology.

To Dr. Nate Siegfried, Dr. Joshua Blose, Dr. Rebecca Toroney, Dr. Josh Sokoloski,

Jenny Wilcox, and Dr. Joshua Boyer: you have helped to make the long years fun, in good times and stressful, while on conference road trips, and during lab dance parties with some sweet latin beats. To the young grad students in the lab, Chris Strulson, Pallavi Thaplyal, Betsy Whitman,

Chun Kit Kwok, and Chelsea Hull and the undergraduate researchers: thank you for your enthusiasm and optimism which has helped me to remember days before stress and thesis writing.

I have been blessed with two extraordinary advisors, Dr. Philip Bevilacqua and Dr. Sarah

Assmann. Phil, you have been a wonderful mentor. Thank you for guiding me through the

xxii challenges of research and helping me to become a better scientist. I have always been impressed and inspired by your scientific intellectual curiosity. Sally, I am grateful for having you as a co- advisor. Thank you for your guidance, insight, and for always reminding me that we can‘t forget the biological implications of the work, which are the motivations and the goals for the research.

This work would not have been possible without the love and support from family and friends. Huge thanks goes to Carl and Stacey, the best roommates ever, who were supportive, understanding, and always willing to grab a beer and a cheesesteak if the occasion called for it. I am grateful for my Jersey girls, Colleen, Trish, Fletch, and Kate, who helped me to take each day at a time and to live life to the fullest. I am indebted to my parents, Dave and Sharon Mullen, for their endless love and support, for believing I could do anything I set my mind to, and for instilling in me the work ethic to achieve it. Most of all, I thank Michael Davis who helped me through the tough times, shared the joy of the good times, and through it all was always able to make me smile.

Life is not easy for any of us. But what of that? We must have perseverance and above all confidence in ourselves. We must believe that we are gifted for something and that this thing must be attained. – Marie Curie

El camino es el que nos enseña la mejor forma de llegar y nos enriquece mientras lo estamos cruzando. – Paolo Coelho

1

Chapter 1

Introduction

RNA was once relegated to an intermediate role as the go-between for the DNA that stores genetic material and the proteins that perform cellular functions. However, RNA has a rich structural and functional diversity that cannot be overlooked. RNA folds into structures much more varied than a double stranded helix, and the range of structures contributes to the variety of functions it can perform. RNA can regulate gene expression through RNA folding,1 metabolite binding in riboswitches,2-4 and targeted RNA binding by small RNAs including miRNAs and siRNAs.5,6 Additionally, RNA can catalyze splicing and self cleavage reactions in ribozymes.7-9

This chapter will highlight RNA structure and folding and how they relate to gene regulation in the model plant species Arabidopsis thaliana.

1.1 Nucleic Acid Structure

Nucleic acids are comprised of four nucleobases: two purines, Adenine (A) and Guanine

(G), and two pyrimidines, Cytosine (C), and Thymine (T) in DNA and Uracil (U) in RNA. Bases are attached to ribose sugar rings and are connected with a negatively charged phosphate backbone. RNA has a 2‘OH group on the ribose which is responsible for decreased chemical stability via increased susceptibility to alkaline cleavage.10

Bases interact with each other through hydrogen bond interactions and stacking within the helical structure. The canonical Watson-Crick base pairs, A U (RNA) (or A T (DNA)) and

G C, are characterized by two and three hydrogen bond interactions, respectively (Figure 1.1a).

These interactions involve the Watson-Crick faces of the bases, highlighted in blue in Figure 1.1.

2

Figure 1.1: RNA base pairs with the Watson-Crick face highlighted in blue and Hoogsteen face highlighted in orange. (a) Watson-Crick base pairs G C and A U. (b) Non Watson-Crick interactions are exemplified in a G U wobble base pair (left) and an A U reverse Hoogsteen interaction (right). (c) Non Watson-Crick interactions involving the same nucleotide C+ C (left) and G G (right).

3

In addition to these pairs, non-canonical base pairs (Figure 1.1b) can incorporate different base orientations and different functional groups of the base, such as those on the Hoogsteen face of the purine bases (highlighted in orange). Examples of alternative base pairings, G U wobble and reverse A U Hoogsteen base pair, are shown in Figure 1.1b.11 Alternative base pairings within an RNA helical structure may or may not distort the structure from the standard A-form helix and, in some cases, are known to act as recognition motifs for small molecule or protein binding.12

Base pairs between the same base, such as C+ C and G G, shown in Figure 1.1c, are also possible13,14 but cause even greater distortions in the helical structure, as the interactions are between two purine bases or two pyrimidine bases and are larger or smaller, respectively, than the canonical Watson-Crick base pairs.15 While these interactions are disruptive to the RNA backbone, they contribute positively to the stability of some alternative structures, including the

G-quadruplex and i-motif structures discussed below. Further base pair interactions are possible, especially at low pH, when the N3 of C and the N1 of A can become protonated, changing the functional groups and hydrogen bond interactions for some base pairs, including C+ C and

A+ C.13,16,17

DNA is mostly found in a double stranded B-form helix, which has a narrow minor groove, a wide major groove, and a flat orientation of the base pairs. RNA, however, has a rich structural diversity because it is mostly single stranded and can form a wide variety of secondary and tertiary structures. When double stranded, RNA adopts an A-form helix which differs from

DNA by having a wide shallow minor grove with 2‘OH groups and the sugar-phosphate backbone, a deep narrow major groove with access to the bases, and a tilted orientation of the base pairs.10 RNA is often folded in imperfect helical or hairpin secondary structures, as can be

4 seen in Figure 1.2.10,18 Internal loops, bulges, and non-canonical base pairs that distort the A- form helical structure are common and expose the base to convey sequence information.19-22

Figure 1.2: Secondary (a,b) and tertiary (c) nucleic acid structures. Secondary: (a) duplex (b) hairpin, internal loop, bulge (L to R). Tertiary (c): helical junctions, kissing hairpin loops, pseudoknot (L to R)

Secondary structures can interact to form higher order tertiary structures (Figure 1.2c) such as coaxial stacking of two helical regions, kissing hairpins which interact at the loop region,23 or pseudoknots, where two single stranded regions within a secondary structure interact.24 Tertiary interactions contribute to the global structure of the RNA: coaxial stacking can manifest in an extended structured RNA while pseudoknots can fold the RNA into a more globular structure. Higher order complexes, such as the ribosome and spliceosome, form by

5 utilizing the RNA secondary and tertiary structures mentioned above for RNA-RNA and RNA- protein interactions. The eukaryotic ribosome, the complex responsible for protein synthesis, is comprised of a 60S subunit with 5S, 28S, and 5.8S rRNA and 49 proteins and a 40S subunit with

18S rRNA and 33 proteins.25 The spliceosome, which catalyzes pre-mRNA splicing, is another example of a large RNA-protein complex, that contains 5 snRNAs and about 150 proteins.

1.2 G-quadruplex and i-motif

1.2.1 Structure

In addition to the common secondary and tertiary structures, certain sequences of nucleic acids can adopt less common alterative structures such as the G-quadruplex (Figures 1.3 and 1.4) and i-motif (Figure 1.5). Following the identification of the quartet structure from the self assembly of 5‘guanosine monophosphates in 1962,28 and the discovery of the quadruplex structure in the telomeric regions at the ends of the ,29,30 interest in G-quadruplex structures has increased tremendously.

G-quadruplex structures (GQS) are formed in DNA or RNA sequences that contain tandem stretches of guanine residues in the general formulas: G3(L1-7)3G3, or G2(L1-4)G2, referred to herein as G3 and G2, respectively, since they can form three quartets or two quartets, although more than three is also possible.31,32 The loop region (L) can be any base. The guanine residues interact at the Watson-Crick and Hoogsteen faces to form a G-quartet structure (Figure 1.3a).

These quartets stack two or more to form a G-quadruplex that is stabilized by K+ or Na+ ions, which are found between the planes of the quartets for K+, or in the center of each quartet for

Na+.33 While GQS are stabilized by either Na+ or K+, they are typically more stable in the presence of K+, due to its larger ionic radius which is associated with a smaller free energy of

6 dehydration.34 The lone pairs on the O6 of the guanines complex with the dehydrated metal ion, which provides most of the stability to the structure. Without the presence of K+ or Na+, the

Figure 1.3: G-Quadruplex structure. (a) G-quartet base pairing structure, showing interactions between guanine residues at the Watson-Crick and Hoogsteen faces. The central dehydrated monovalent ion is necessary for formation and interacts with the O6 residues. Unimolecular (b) parallel and (c) antiparallel topologies. Dark lines follow the nucleic acid strand, arrowheads denote strand directionality, and gray boxes denote quadruplexes. The examples drawn here are for sequences having three quartets. Numbers by loops indicate different types of loops: 1 lateral, 2 diagonal, 3 double chain reversal.

structure is not formed, although some sequences have been shown to adopt a pre-GQS structure, where the bases are oriented similarly to the fully folded structure but are not parallel and locked into place until addition of K+ or Na+.35

RNA GQS prefer a parallel quadruplex structure as shown in Figure 1.3b, wherein all of the strands have the same orientation.36 In a parallel quadruplex, all of the bases adopt an anti conformation of the glycosidic bond since RNA does not readily form syn bases and the associated C2‘endo sugar puckers. As a result, the loop regions are oriented in a propeller structure on the exterior of the quartets, and can be tightly wound (Figure 1.4a).37 Sometimes described as a ―plate,‖ parallel quadruplexes can have a flat topology due to the orientation of the loops, as opposed to antiparallel structures (Figure 1.3c, Figure 1.4b-f) which can be more globular.

7

Figure 1.4: G-quadruplex topologies.37 (a) Parallel ―propeller‖ structure with all double chain + 38 reversal loops. Human telomeric DNA: d[A(G3TTA)3G3] in K . (b) Antiparallel ―chair‖ 39 structure with all lateral loops. Thrombin Binding Aptamer (TBA): d(G2TTG2TGTG2TTG2). (c) Antiparallel ―mixed‖ structure with lateral, lateral, and double chain reversal lops. BCL2 40 promoter GQS: d(G3CGCG3AG2AATTG3CG3). (d) Antiparallel structure with double chain 41 reversal, lateral, lateral loops. Human telomeric DNA: d[TTA(G3TTA)3G3A]. (e) Antiparallel ―basket‖ structure with lateral, diagonal, lateral loops. Human telomeric DNA: 42 d[AG3TTA)3G3]. (f) Antiparallel structure with diagonal, double chain reversal, and diagonal 43 loops. DNA quadruplex: d(G2TTTTG2CAG3TTTTG2T). Guanines are shown in green, thymine as blue, cytosine as yellow, and adenine as red panels.

8

DNA GQS can adopt even more diverse structures than RNA GQS. DNA GQS can form parallel or antiparallel structures with alternative loop orientations, and loop regions can present in lateral, diagonal, or double chain reversal orientations, as labeled in Figure 1.3.37

Computational studies have indicated that greater than 26 DNA GQS structures are possible, however only six have been determined by crystallography or solution NMR approaches (Figure

1.4). Different combinations of the loops orientations form one parallel structure and five possible antiparallel GQS (Figure 1.4b-f): chair, mixed, 3+1, basket, and DCR2D. All loops in a parallel quadruplex have a double chain reversal structure (Figure 1.3b), and some DNA GQS have a loop that adopts this conformation, especially when connecting two parallel G-strands within the quadruplex, such as in the mixed, 3+1, and DCR2D structures. Lateral loops connect two adjacent antiparallel G-strands, as in the all-lateral chair structure and mixed, 3+1, and basket structures. Diagonal loops, which cross the quadruplex to connect two opposite sides, require at least 3-4 bases to span the distance and can be seen in the basket and DCR2D structures.33

As mentioned above, all of the guanine bases in a parallel quadruplex adopt an anti conformation of the glycosidic bond. In antiparallel GQS, bases can adopt syn or anti conformations of the glycosidic bonds, depending on the orientation of the strands themselves.

For example, in a single quartet, if two guanines are on strands oriented in the same direction, they will have the same glycosidic conformation. If the two guanines are on strands in opposites directions, they will have opposite glycosidic conformations. For a G4 quartet within the

Oxytricha nova (a ciliated protozoan) telomeric repeat d(G4T4)3G4 that has alternating strand orientations, there is an alternating syn anti syn anti pattern.44

Vertically, between quartets, glycosidic bond conformation is even more varied. For G4 sequences, such as the O. nova sequence, guanines follow a 5’ syn anti syn anti pattern. G2

GQS all have a 5’syn anti pattern and avoid the unstable orientations anti syn and syn syn

9 whenever possible. G3 quadruplexes have greater complexity. Most, but not all G3 GQS adopt a

‘3+1’ pattern, where 3 strands form the stable syn anti anti pattern, and one strand forms the more unstable anti syn syn conformation.44 Due to variation in strand orientation, glycosidic bond formation, and structure, DNA GQS are notoriously complex nucleic acid structures. It is common to find a mixture of conformations in solution and obtaining a single conformation can be extremely difficult.

Not as well known, the i-motif structure forms in sequences with tandem stretches of cytosine residues in DNA.45,46 The i-motif structure, shown in Figure 1.5, is stabilized by intercalated C+ C base pairs, typically with one of the cytosine residues protonated at the N3 position, which allows for three hydrogen bonds per base pair.45 Because of the added stability from the protonated base, i-motifs form in neutral or low pH solutions, and are driven by crowding conditions.47 These structures can form as tetramers, dimers, or a monomer structures, which all adopt an antiparallel orientation; two strands are oriented in each direction. The C+ C base pairs form between parallel strands in each direction. For example, in Figure 1.5d, C+ C pairs C2-C13 and C7-C18 form in parallel strands, which are antiparallel to each other.48

1.2.2 Location

In the H. sapiens genome, there are greater than 375,000 G-quadruplex sequences of the pattern

32,49 (G3L1-7)3G3 or (C3L1-7)3C3, which accounts for GQS formation in either strand of the DNA.

These sequences are underrepresented in the genome, but about 40% of promoter regions contain at least one GQS.32,50 Promoter region GQS have been implicated in control of transcription, discussed in more detail below. Additionally, there is a general correlation between GQS and other regulatory elements in the .51,52 GQS have also been found in

10

Figure 1.5: Imotif structure. (a) C+ C base pair, showing hydrogen bonding and orientation of the residues. Protonation of N3 of cytosine is possible at low or neutral pH. I-motif formation as a (b) tetramolecular [d(TCC)]4, (c) dimer [d(5mCCTTTACC)]2, and (d) monomer d(5mCCTTTCCTTTACCTTTCC). Different strands are color coded with varying shades of gray and arrows indicate direction of each strand. Circle and triangle size are indicative of depth; larger circles are closer to the reader and smaller circles are further into the paper. Adapted from Gueron, et al.48

other higher eukaryotes, such as Pan troglodytes (chimpanzee),53 Gallus gallus (chicken),54 Mus musculus (mouse),53 and Rattus norvegicus (rat),53 and others55 with less detailed analysis. Plant genomes, including Arabidopsis thaliana, Zea mays (maize), Oryza sativa (rice), and others also contain GQS and are discussed in detail in Chapter 2. S. cerevisiae and several prokaryotes have also been shown to have a higher frequency of GQS motifs in upstream and other gene regulatory regions.53,56,57

1.2.3 Function

As mentioned above, GQS are prevalent in promoter regions of human genes, including several protooncogenes: c-MYC, VEGF, HIF-1 , Ret, KRAS, BCL2, c-Kit, PDGF-A, and c-

MYB.58 GQS in these promoter regions are implicated in transcriptional control of the associated

11

Figure 1.6: GQS function. (a) Transcription regulation by GQS found in promoter regions (grey boxes). Formation of GQS (and i-motif) inhibit transcription. Stabilization of GQS with GQS stabilization ligamd, such as TMPyP4, also inhibits transcription. Transcription is up-regulated by binding of single stranded DNA binding proteins CNBP and hmRNP K, which allow RNA polymerase to function. (b) Translation regulation by GQS found in the 5‘UTR of mRNA. Without formation of GQS (top), translation is on. Formation of GQS in 5‘UTR leads to translation repression.

12 genes, as shown in Figure 1.6a). Down-regulation of genes upon stabilization of GQS has been shown in c-MYC, KRAS, and PDGF promoter regions using a G-quartet binding ligand, such as the cationic porphyrin TMPyP4 (5,10,15,20-tetra(N-methyl-4-pyridyl) prophine chloride), guanidine-modified phthalocyanines, or the mammalian protein nucleolin (Figure 1.6a).59-64 This repression has been shown to be reversible in several cases: transcription of c-MYC and KRAS is up-regulated by GQS binding-proteins or polyamine binding to the quadruplex in the promoter regions, including CNBP (cellular nucleic-acid binding protein),65 NM23-H2/nucleoside diphosphate kinase B,66 hnRNP A1 and Up1,62 MAZ (myc-associated zinc finger),64 PARP-1

(poly(ADP-ribose) polymerase 1),64 and the polyamines spermidine and spermine.67

GQS regulation of translation (Figure 1.6b) has also been studied in E. coli model systems and specific human genes ESR1 (estrogen receptor ), NRAS, MT3, and ZIC-1. In vitro translation assays for NRAS68,69 and ESR170 show that the presence of a GQS decreases translation efficiency compared to a mutated RNA unable to form a GQS. In 2007, the first in vivo study to show that a GQS can repress translation utilized a GQS engineered at the ribosome binding site (near the Shine Dalgarno sequence) of a bacterial reporter gene, which showed a correlation between the amount of translational repression and the stability of the GQS.71,72 Since then, Arora and coworkers have demonstrated translational repression by a GQS in the 5‘UTR of

Zic-1 in eukaryotic HeLa cells using a western blot73 and Morris, et al. showed a similar result for a GQS in the 5‘UTR of MT3 matrix metalloproteinase with a dual luciferase assay.74

G-quadruplexes have been shown to bind to certain proteins. The DNA thrombin binding aptamer, which has a G2 motif (d(G2T2G2TGTG2T2G2)), and was shown to bind and inhibit thrombin.75-78 Fragile X mental retardation protein (FMRP) binds its own mRNA at a binding site in the 5‘UTR that contains a GQS motif, and inhibits translation in vivo as a feedback inhibition mechanism.79,80 In addition to those mentioned above, there are other proteins and small

13 molecules that are able to bind to DNA and RNA quadruplex structures. Some helicases, such as the DNA helicase PiF1 and the RNA helicase DHX36, have been implicated in binding and unwinding GQS structures.81-83 Small molecules are studied as possible drug targets, and vary in type and size. Natural products, such as berberine, telomestatin, and cryptolepine bind GQS.

Additionally, at least eight types of synthetic ligands have also been developed, including pyridine, PIPER (a small molecule), pyridine, phenanthroline decarboxamides, and bi- and tri- substituted acridines.84-87

1.2.4 GQS formation in vivo

Genomic DNA is stably double stranded and is the predominant form in vivo, however, formation of G-quadruplex and i-motif structures are possible at high temperatures,88 low pH,88 or with stabilizing co-factors.59-61 Models for GQS folding in duplex DNA can be seen in Figure

1.7. Formation of G-quadruplexes in telomeric regions of the DNA is well accepted,29,38,88-91 and i-motif formation in the alternative strand would not be unexpected. The single stranded DNA overhang on the 3‘end of telomeres could also form a GQS, without any competition from a duplex structure. Quadruplexes and i-motifs could also form in DNA during replication, transcription, or recombination, when DNA is temporarily single stranded. There is less of a barrier for GQS and i-motif formation in RNA, which is mostly single stranded with various secondary and tertiary structures discussed above. GQS and i-motif structures would compete with other possible secondary structures in RNA, but not with a fully double stranded duplex, as in DNA.

Despite the prevalence of sequences that can fold into GQS in genomes of all types of organisms, and examples of regulation by GQS as mentioned above, there is still skepticism as to whether these unique structures form in vivo. To date, a few studies have successfully shown that

14 these structures can form in the cellular environment. Antibodies specific for antiparallel GQS were developed and used to show that GQS form in telomeres in ciliated protozoa92 and

Stylonychia lemnae macronuclei.93 GQS-binding fluorescent molelcules have also been utilized to directly visualize quadruplex formation in vivo. Human telomeres were visualized using a cyanine dye complex94 as well as in another study which utilized a ruthenium (II) polypyridyl complex.95 Using molecules that can bind to GQS to probe formation of structure within a cell can be problematic, as binding of the molecule to DNA can induce formation of the structure, or can alter the equilibrium to favor the quadruplex over competing alternative structures. At the very least, fluorescence assays show that it is possible for GQS to form in vivo, and other assays are necessary to prove that they do.

1.3 Functional RNAs in plants

Gene regulation by RNAs in plant systems differs from that occurring in prokaryotic organisms; there are no Shine Dalgarno sequences for ribosome binding and transcription termination does not proceed through terminator/antiterminator stem loops. Mechanisms for control of transcription, translation, splicing, and mRNA degradation are not as well established.

Small regulatory RNAs are found in all domains of life including plants as well as mammalian species. However, some of the mechanisms of biogenesis and function differ. This section will highlight some examples of unique functional RNAs in plant systems.

15

Figure 1.7: Models for GQS folding in DNA from Phan, et al.88 (a) GQS and i-motif formation at the same location on the duplex, (b) i-motif formation, (c), GQS formation, and (d) GQS and i- motif formation at different locations on the duplex.

16

1.3.1 Small RNAs

Small RNAs (sRNAs) are numerous in sequence, widely distributed among eukaryotes, and affect many biological processes. sRNAs include microRNAs (miRNAs), small interfering

RNAs (siRNAs), and trans-acting siRNAs (ta-siRNAs). Each type of sRNA is unique with respect to length, function, and the plant proteins that are involved in its pathway.

miRNAs are sRNAs, typically 21 nucleotides (nt) in length, that often direct silencing of target mRNA(s) which are distinct from the miRNA locus itself.96 After processing from an imperfect hairpin structure by DCL1 (Dicer-like 1), the miRNA is loaded into the RNA-induced silencing complex (RISC) where it directs cleavage of the mostly complementary target RNA by

Argonaute 1 (AGO1). Once the target is cleaved, another target RNA can be cleaved by RISC in a multiple turnover fashion.96,97 In 2006, there were 42 known families of miRNAs in

Arabidopsis thaliana.96 In 2011, according to the Arabidopsis Small RNA Project (ASRP), there are at least 86 known small RNA families in this species and greater than 100 loci98 and many miRNA families target multiple mRNAs for gene silencing.96 Targets for miRNA directed silencing include many transcription factors involved in developmental patterning and stem cell identity.99 There is also evidence that AGO and Dicer like (DCL) proteins are targets, and miRNAs can regulate their own biogenesis.100,101

The RNAi (RNA interference) pathway is used to regulate gene expression and to defend against viral attack. siRNAs were first identified in plants and are the most common small RNAs in Arabidopsis.102 They are 21-24 nt in length and are processed by DCL and AGO from long, double-stranded RNAs before targeting genes or transposons for silencing. Some siRNAs target their precursor genes or those that are closely related functionally. SiRNAs are are implicated in post-transcriptional regulation, transgene and transposon silencing, and defense against viruses.103 ta-siRNAs are 21 nt long and are processed through dsRNA by DCL4 and RISC. These phased

17 siRNAs are processed using miRNA directed cleavage and are RNA dependent RNA polymerase

6 (RDR6) dependent. They target genes that are complementary at the ta-siRNA recognition site, and have been shown to function in leaf polarity and auxin response.104

1.3.2 Thiamine pyrophosphate (TPP) riboswitch

Riboswitches are natural RNA aptamers that bind small molecules and undergo RNA refolding events that control gene expression.105 In prokaryotes, these highly structured RNAs are typically found in the 5‘untranslated regions (UTRs) of transcripts involved in the biosynthesis or transport of metabolites and contain a selective binding site for the small molecule. A change in the secondary structure of the RNA occurs during ligand binding that affects transcription or translation.105,106

Although two percent of prokaryotic gene regulation is controlled by riboswitches, including that of S-adenosyl methionine4, glycine2, and lysine,3 to date only one riboswitch has been found in eukaryotes.107 The thiamine pyrophosphate (TPP) riboswitch was discovered in the model plant species, Arabiodpsis thaliana, and has been found in the 3‘UTRs of THIC genes in at least 22 plant species, including monocots, dicots and mosses.108 Plant TPP riboswitches were found by searching the respective genomes for sequence and structure homology to the known prokaryotic and fungal TPP riboswitches.107,108 Interestingly, the plant aptamers are located in the 3‘UTR of the mRNA rather than the 5‘UTR typical of prokaryotes. The Arabidopsis TPP riboswitch has been shown to function as a feedback mechanism, affecting 3‘end processing by 3‘UTR length and THIC expression as a function of TPP concentration.108 When the cellular concentration of

TPP is low, the 5‘splice site within the 3‘UTR interacts with the empty TPP binding site, and the transcript is processed at a transcription processing site. Conversely, when the concentration of

TPP is high, the 5‘ splice site is available and splicing occurs, removing the transcription

18 processing site. The resulting transcript has a longer 3‘UTR which is targeted for degradation and decreased expression of THIC.

1.4 Plant stress physiology

Plants in unfavorable environments must adapt to abiotic and biotic stresses, such as drought, cold, salinity, heavy metal ions, and pathogens, to survive. During any type of stress, hundreds of genes are up- or down-regulated at the level of transcript abundance109-113 and complex pathways and mechanisms to overcome the stresses are initiated.

During drought stress, the turgor pressure throughout the plant decreases. The major plant hormone abscisic acid (ABA) increases in concentration and simulates stomatal closure, which restricts water vapor loss through leaves. The movement of ions and metabolites carried through the xylem decreases because of the decrease in transpiration, thus forcing the plant to rely on reserved metabolites for function. Additionally, leaves do not expand, roots do not elongate, and the rate of photosynthesis decreases. There is also an increase in wax deposition on the outside of the leaves, resulting in a thicker cuticle, to prevent water loss.114

While some solutes are concentrated in the cytosol as a result of the cell volume decrease, there are some solutes that increase in concentration to counteract the changes in cell volume. Such osmolytes include glycine betaine (GB), proline, sorbitol, and mannitol, which increase concentrations 10 fold, from about 3 mM to 30 mM.115 K+ concentrations also increase from around 75 mM up to 700 mM.116,117 Increased osmolyte concentrations are known to stabilize proteins in the cell. With respect to RNA, it is known that proline and GB destabilize secondary structures but that GB can stabilize tertiary structures.118 However, the overall effects of the increased osmolyte and K+ concentrations on RNA structure are yet unknown, especially what effects they might have on G-quadruplex structures. Drought stress, salinity, and heavy metals in

19 the soil can also create oxidative stress through formation of reactive oxygen species (ROS), which can in turn bind sulfhydryl groups in proteins and cause oxidative damage throughout the cell.119,120

1.5 Thesis objectives

The goal of my thesis research has been to investigate RNA folding in the model plant system Arabidopsis thaliana under normal and stress conditions and to evaluate the potential for

RNA-mediated gene regulation. Chapter 2 focuses on a bioinformatics approach to determine the prevalence of G-quadruplex sequences and structures in Arabidopsis thaliana and other plant species. This chapter highlights the location and significance of GQS in plant genomes, and explores potential biological functions in drought stress response. Chapter 3 continues the focus on GQS under drought stress conditions from a biophysical and biochemical perspective. The

GQS folding pathway is studied as a function of G-tract length. Chapter 4 explores the relationship between GQS and heavy metal ions, from in vitro studies with model oligonucleotides to the effect of heavy metal ions on gene expression in Arabidopsis thaliana, focusing on genes that contain at least one GQS and are involved in heavy metal ion transport.

Chapter 5 examines a different aspect of stress physiology, focusing on genes involved in ABA metabolism. The ion-dependent folding of a non-GQS AU rich region of these genes is studied.

1.6 References

1. Lee, F. and Yanofsky, C., Transcription termination at the trp operon attenuators of Escherichia coli and Salmonella typhimurium: RNA secondary structure and regulation of termination. Proc. Natl. Acad. Sci. U.S.A., 1977. 74: 4365-4369.

20

2. Mandal, M., Boese, B., Barrick, J.E., Winkler, W.C. and Breaker, R.R., Riboswitches Control Fundamental Biochemical Pathways in Bacillus subtilis and Other Bacteria. Cell, 2003. 113: 577-586.

3. Sudarsan, N., Wickiser, J.K., Nakamura, S., Ebert, M.S. and Breaker, R.R., An mRNA structure in bacteria that controls gene expression by binding lysine. Genes Dev., 2003. 17: 2688-2697.

4. Winkler, W.C., Nahvi, A., Sudarsan, N., Barrick, J.E. and Breaker, R.R., An mRNA structure that controls gene expression by binding S-adenosylmethionine. Nat. Struct. Biol., 2003. 10: 701-707.

5. Bartel, D.P., MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 2004. 116: 281-297.

6. Filipowicz, W., Jaskiewicz, L., Kolb, F.A. and Pillai, R.S., Post-transcriptional gene silencing by siRNAs and miRNAs. Curr. Opin. Struct. Biol., 2005. 15: 331-341.

7. Stark, B.C., Kole, R., Bowman, E.J. and Altman, S., Ribonuclease P: an enzyme with an essential RNA component. Proc. Natl. Acad. Sci. U.S.A., 1978. 75: 3717-3721.

8. Cech, T.R., Zaug, A.J. and Grabowski, P.J., In vitro splicing of the ribosomal RNA precursor of Tetrahymena: involvement of a guanosine nucleotide in the excision of the intervening sequence. Cell, 1981. 27: 487-496.

9. Lilley, D.M., Catalysis by the nucleolytic ribozymes. Biochem. Soc. Trans., 2011. 39: 641-646.

10. Bloomfield, V., Crothers, D.M. and Tinoco, I., Jr., Nucleic acids: Structures, properties, and functions. 2000, Sausalito, California: University Science Books.

11. Nagaswamy, U., Larios-Sanz, M., Hury, J., Collins, S., Zhang, Z., Zhao, Q. and Fox, G.E., NCIR: a database of non-canonical interactions in known RNA structures. Nucleic Acids Res., 2002. 30: 395-397.

12. Hermann, T. and Westhof, E., Non-Watson-Crick base pairs in RNA-protein recognition. Chem. Biol., 1999. 6: R335-343.

13. SantaLucia, J., Jr., Kierzek, R. and Turner, D.H., Stabilities of consecutive A.C, C.C, G.G, U.C, and U.U mismatches in RNA internal loops: Evidence for stable hydrogen- bonded U.U and C.C+ pairs. Biochemistry, 1991. 30: 8242-8251.

14. Dieckmann, T., Suzuki, E., Nakamura, G.K. and Feigon, J., Solution structure of an ATP- binding RNA aptamer reveals a novel fold. RNA, 1996. 2: 628-640.

15. Morse, S.E. and Draper, D.E., Purine-purine mismatches in RNA helices: evidence for protonated G.A pairs and next-nearest neighbor effects. Nucleic Acids Res., 1995. 23: 302-306.

21

16. Pan, B., Mitra, S.N. and Sundaralingam, M., Crystal structure of an RNA 16-mer duplex R(GCAGAGUUAAAUCUGC)2 with nonadjacent G(syn).A+(anti) mispairs. Biochemistry, 1999. 38: 2826-2831.

17. Siegfried, N.A., O'Hare, B. and Bevilacqua, P.C., Driving forces for nucleic acid pK(a) shifting in an A(+).C wobble: effects of helix position, temperature, and ionic strength. Biochemistry, 2010. 49: 3225-3236.

18. Varani, G., Exceptionally stable nucleic acid hairpins. Annu. Rev. Biophys. Biomol. Struct., 1995. 24: 379-404.

19. Rice, J.A. and Crothers, D.M., DNA bending by the bulge defect. Biochemistry, 1989. 28: 4512-4516.

20. Bhattacharyya, A., Murchie, A.I. and Lilley, D.M., RNA bulges and the helical periodicity of double-stranded RNA. Nature, 1990. 343: 484-487.

21. Turner, D.H., Bulges in nucleic acids. Curr. Opin. Struct. Biol., 1992. 2: 334-337.

22. Kierzek, R., Burkard, M.E. and Turner, D.H., Thermodynamics of single mismatches in RNA duplexes. Biochemistry, 1999. 38: 14214-14223.

23. Chang, K.Y. and Tinoco, I., Jr., Characterization of a "kissing" hairpin complex derived from the human immunodeficiency virus genome. Proc. Natl. Acad. Sci. U.S.A., 1994. 91: 8705-8709.

24. Staple, D.W. and Butcher, S.E., Pseudoknots: RNA structures with diverse functions. PLoS Biol., 2005. 3: e213.

25. Nelson, D.L. and Cox, M.M., Lehninger principles of biochemistry. Fourth Edition ed. 2005, New York: W.H. Freemand and Company.

26. Valadkhan, S., Role of the snRNAs in spliceosomal active site. RNA Biol., 2010. 7: 345- 353.

27. Will, C.L. and Luhrmann, R., Spliceosome Structure and Function. Cold Spring Harbor Persp. in Biol., 2010.

28. Gellert, M., Lipsett, M.N. and Davies, D.R., Helix formation by guanylic acid. Proc. Natl. Acad. Sci. U.S.A., 1962. 48: 2013-2018.

29. Henderson, E., Hardin, C.C., Walk, S.K., Tinoco, I., Jr. and Blackburn, E.H., Telomeric DNA oligonucleotides form novel intramolecular structures containing guanine-guanine base pairs. Cell, 1987. 51: 899-908.

30. Williamson, J.R., Raghuraman, M.K. and Cech, T.R., Monovalent cation-induced structure of telomeric DNA: the G-quartet model. Cell, 1989. 59: 871-880.

22

31. Mullen, M.A., Olson, K.J., Dallaire, P., Major, F., Assmann, S.M. and Bevilacqua, P.C., RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: prevalence and possible functional roles. Nucleic Acids Res., 2010. 38: 8149-8163.

32. Huppert, J.L. and Balasubramanian, S., Prevalence of quadruplexes in the human genome. Nucleic Acids Res., 2005. 33: 2908-2916.

33. Burge, S., Parkinson, G.N., Hazel, P., Todd, A.K. and Neidle, S., Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res., 2006. 34: 5402-5415.

34. Hud, N.V., Smith, F.W., Anet, F.A. and Feigon, J., The selectivity for K+ versus Na+ in DNA quadruplexes is dominated by relative free energies of hydration: a thermodynamic analysis by 1H NMR. Biochemistry, 1996. 35: 15383-15390.

35. Gray, R.D. and Chaires, J.B., Kinetics and mechanism of K+- and Na+-induced folding of models of human telomeric DNA into G-quadruplex structures. Nucleic Acids Res., 2008. 36: 4191-4203.

36. Joachimi, A., Benz, A. and Hartig, J.S., A comparison of DNA and RNA quadruplex structures and stabilities. Bioorg. Med. Chem., 2009. 17: 6811-6815.

37. Lane, A.N., Chaires, J.B., Gray, R.D. and Trent, J.O., Stability and kinetics of G- quadruplex structures. Nucleic Acids Res., 2008. 36: 5482-5515.

38. Parkinson, G.N., Lee, M.P. and Neidle, S., Crystal structure of parallel quadruplexes from human telomeric DNA. Nature, 2002. 417: 876-880.

39. Schultze, P., Macaya, R.F. and Feigon, J., Three-dimensional solution structure of the thrombin-binding DNA aptamer d(GGTTGGTGTGGTTGG). J. Mol. Biol., 1994. 235: 1532-1547.

40. Dai, J., Chen, D., Jones, R.A., Hurley, L.H. and Yang, D., NMR solution structure of the major G-quadruplex structure formed in the human BCL2 promoter region. Nucleic Acids Res., 2006. 34: 5133-5144.

41. Luu, K.N., Phan, A.T., Kuryavyi, V., Lacroix, L. and Patel, D.J., Structure of the human telomere in K+ solution: an intramolecular (3 + 1) G-quadruplex scaffold. J. Am. Chem. Soc., 2006. 128: 9963-9970.

42. Wang, Y. and Patel, D.J., Solution structure of the human telomeric repeat d[AG3(T2AG3)3] G-tetraplex. Structure, 1993. 1: 263-282.

43. Kuryavyi, V., Majumdar, A., Shallop, A., Chernichenko, N., Skripkin, E., Jones, R. and Patel, D.J., A double chain reversal loop and two diagonal loops define the architecture of a unimolecular DNA quadruplex containing a pair of stacked G(syn)-G(syn)-G(anti)- G(anti) tetrads flanked by a G-(T-T) Triad and a T-T-T triple. J. Mol. Biol., 2001. 310: 181-194.

23

44. Cang, X., Sponer, J. and Cheatham, T.E., 3rd, Explaining the varied glycosidic conformational, G-tract length and sequence preferences for anti-parallel G- quadruplexes. Nucleic Acids Res., 2011.

45. Leroy, J.L., Gehring, K., Kettani, A. and Gueron, M., Acid multimers of oligodeoxycytidine strands: stoichiometry, base-pair characterization, and proton exchange properties. Biochemistry, 1993. 32: 6019-6031.

46. Chen, L., Cai, L., Zhang, X. and Rich, A., Crystal structure of a four-stranded intercalated DNA: d(C4). Biochemistry, 1994. 33: 13540-13546.

47. Rajendran, A., Nakano, S. and Sugimoto, N., Molecular crowding of the cosolutes induces an intramolecular i-motif structure of triplet repeat DNA oligomers at neutral pH. Chem. Commun. (Cambridge, U.K.), 2010. 46: 1299-1301.

48. Gueron, M. and Leroy, J.L., The i-motif in nucleic acids. Curr. Opin. Struct. Biol., 2000. 10: 326-331.

49. Todd, A.K., Johnston, M. and Neidle, S., Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res., 2005. 33: 2901-2907.

50. Huppert, J.L. and Balasubramanian, S., G-quadruplexes in promoters throughout the human genome. Nucleic Acids Res., 2007. 35: 406-413.

51. Eddy, J. and Maizels, N., Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res., 2006. 34: 3887-3896.

52. Du, Z., Zhao, Y. and Li, N., Genome-wide colonization of gene regulatory elements by G4 DNA motifs. Nucleic Acids Res., 2009. 37: 6784-6798.

53. Yadav, V.K., Abraham, J.K., Mani, P., Kulshrestha, R. and Chowdhury, S., QuadBase: genome-wide database of G4 DNA--occurrence and conservation in human, chimpanzee, mouse and rat promoters and 146 microbes. Nucleic Acids Res., 2008. 36: D381-385.

54. Du, Z., Kong, P., Gao, Y. and Li, N., Enrichment of G4 DNA motif in transcriptional regulatory region of chicken genome. Biochem. Biophys. Res. Commun., 2007. 354: 1067-1070.

55. Zhao, Y., Du, Z. and Li, N., Extensive selection for the enrichment of G4 DNA motifs in transcriptional regulatory regions of warm blooded animals. FEBS Lett., 2007. 581: 1951-1956.

56. Rawal, P., Kummarasetti, V.B., Ravindran, J., Kumar, N., Halder, K., Sharma, R., Mukerji, M., Das, S.K. and Chowdhury, S., Genome-wide prediction of G4 DNA as regulatory motifs: role in Escherichia coli global regulation. Genome Res., 2006. 16: 644-655.

24

57. Hershman, S.G., Chen, Q., Lee, J.Y., Kozak, M.L., Yue, P., Wang, L.S. and Johnson, F.B., Genomic distribution and functional analyses of potential G-quadruplex-forming sequences in Saccharomyces cerevisiae. Nucleic Acids Res., 2008. 36: 144-156.

58. Qin, Y. and Hurley, L.H., Structures, folding patterns, and functions of intramolecular DNA G-quadruplexes found in eukaryotic promoter regions. Biochimie, 2008. 90: 1149- 1171.

59. Siddiqui-Jain, A., Grand, C.L., Bearss, D.J. and Hurley, L.H., Direct evidence for a G- quadruplex in a promoter region and its targeting with a small molecule to repress c- MYC transcription. Proc. Natl. Acad. Sci. U.S.A., 2002. 99: 11593-11598.

60. Qin, Y., Rezler, E.M., Gokhale, V., Sun, D. and Hurley, L.H., Characterization of the G- quadruplexes in the duplex nuclease hypersensitive element of the PDGF-A promoter and modulation of PDGF-A promoter activity by TMPyP4. Nucleic Acids Res., 2007. 35: 7698-7713.

61. Gonzalez, V., Guo, K., Hurley, L. and Sun, D., Identification and characterization of nucleolin as a c-myc G-quadruplex-binding protein. J. Biol. Chem., 2009. 284: 23622- 23635.

62. Paramasivam, M., Membrino, A., Cogoi, S., Fukuda, H., Nakagama, H. and Xodo, L.E., Protein hnRNP A1 and its derivative Up1 unfold quadruplex DNA in the human KRAS promoter: implications for transcription. Nucleic Acids Res., 2009. 37: 2841-2853.

63. Thakur, R.K., Kumar, P., Halder, K., Verma, A., Kar, A., Parent, J.L., Basundra, R., Kumar, A. and Chowdhury, S., Metastases suppressor NM23-H2 interaction with G- quadruplex DNA within c-MYC promoter nuclease hypersensitive element induces c- MYC expression. Nucleic Acids Res., 2009. 37: 172-183.

64. Cogoi, S., Paramasivam, M., Membrino, A., Yokoyama, K.K. and Xodo, L.E., The KRAS promoter responds to Myc-associated zinc finger and poly(ADP-ribose) polymerase 1 proteins, which recognize a critical quadruplex-forming GA-element. J. Biol. Chem., 2010. 285: 22003-22016.

65. Borgognone, M., Armas, P. and Calcaterra, N.B., Cellular nucleic-acid-binding protein, a transcriptional enhancer of c-Myc, promotes the formation of parallel G-quadruplexes. Biochem. J., 2010. 428: 491-498.

66. Dexheimer, T.S., Carey, S.S., Zuohe, S., Gokhale, V.M., Hu, X., Murata, L.B., Maes, E.M., Weichsel, A., Sun, D., Meuillet, E.J., et al., NM23-H2 may play an indirect role in transcriptional activation of c-myc gene expression but does not cleave the nuclease hypersensitive element III1. Mol. Cancer Ther., 2009.

67. Kumar, N., Basundra, R. and Maiti, S., Elevated polyamines induce c-MYC overexpression by perturbing quadruplex-WC duplex equilibrium. Nucleic Acids Res., 2009. 37: 3321-3331.

25

68. Kumari, S., Bugaut, A., Huppert, J.L. and Balasubramanian, S., An RNA G-quadruplex in the 5' UTR of the NRAS proto-oncogene modulates translation. Nat. Chem. Biol., 2007. 3: 218-221.

69. Kumari, S., Bugaut, A. and Balasubramanian, S., Position and stability are determining factors for translation repression by an RNA G-quadruplex-forming sequence within the 5' UTR of the NRAS proto-oncogene. Biochemistry, 2008. 47: 12664-12669.

70. Balkwill, G.D., Derecka, K., Garner, T.P., Hodgman, C., Flint, A.P. and Searle, M.S., Repression of translation of human estrogen receptor alpha by G-quadruplex formation. Biochemistry, 2009. 48: 11487-11495.

71. Wieland, M. and Hartig, J.S., RNA quadruplex-based modulation of gene expression. Chem. Biol., 2007. 14: 757-763.

72. Halder, K., Wieland, M. and Hartig, J.S., Predictable suppression of gene expression by 5'-UTR-based RNA quadruplexes. Nucleic Acids Res., 2009. 37: 6811-6817.

73. Arora, A., Dutkiewicz, M., Scaria, V., Hariharan, M., Maiti, S. and Kurreck, J., Inhibition of translation in living eukaryotic cells by an RNA G-quadruplex motif. RNA, 2008. 14: 1290-1296.

74. Morris, M.J. and Basu, S., An unusually stable G-quadruplex within the 5'-UTR of the MT3 matrix metalloproteinase mRNA represses translation in eukaryotic cells. Biochemistry, 2009. 48: 5313-5319.

75. Ellington, A.D. and Szostak, J.W., In vitro selection of RNA molecules that bind specific ligands. Nature, 1990. 346: 818-822.

76. Bock, L.C., Griffin, L.C., Latham, J.A., Vermaas, E.H. and Toole, J.J., Selection of single-stranded DNA molecules that bind and inhibit human thrombin. Nature, 1992. 355: 564-566.

77. Macaya, R.F., Schultze, P., Smith, F.W., Roe, J.A. and Feigon, J., Thrombin-binding DNA aptamer forms a unimolecular quadruplex structure in solution. Proc. Natl. Acad. Sci. U.S.A., 1993. 90: 3745-3749.

78. Wang, K.Y., McCurdy, S., Shea, R.G., Swaminathan, S. and Bolton, P.H., A DNA aptamer which binds to and inhibits thrombin exhibits a new structural motif for DNA. Biochemistry, 1993. 32: 1899-1904.

79. Darnell, J.C., Jensen, K.B., Jin, P., Brown, V., Warren, S.T. and Darnell, R.B., Fragile X mental retardation protein targets G quartet mRNAs important for neuronal function. Cell, 2001. 107: 489-499.

80. Schaeffer, C., Bardoni, B., Mandel, J.L., Ehresmann, B., Ehresmann, C. and Moine, H., The fragile X mental retardation protein binds specifically to its mRNA via a purine quartet motif. EMBO J., 2001. 20: 4803-4813.

26

81. Sanders, C.M., Human Pif1 helicase is a G-quadruplex DNA-binding protein with G- quadruplex DNA-unwinding activity. Biochem. J., 2010. 430: 119-128.

82. Paeschke, K., Capra, J.A. and Zakian, V.A., DNA Replication through G-Quadruplex Motifs Is Promoted by the Saccharomyces cerevisiae Pif1 DNA Helicase. Cell, 2011. 145: 678-691.

83. Sexton, A.N. and Collins, K., The 5' guanosine tracts of human telomerase RNA are recognized by the G-quadruplex binding domain of the RNA helicase DHX36 and function to increase RNA accumulation. Mol. and Cell. Biol., 2011. 31: 736-743.

84. De Cian, A. and Mergny, J.L., Quadruplex ligands may act as molecular chaperones for tetramolecular quadruplex formation. Nucleic Acids Res., 2007. 35: 2483-2493.

85. Patel, D.J., Phan, A.T. and Kuryavyi, V., Human telomere, oncogenic promoter and 5'- UTR G-quadruplexes: diverse higher order DNA and RNA targets for cancer therapeutics. Nucleic Acids Res., 2007. 35: 7429-7455.

86. Ragazzon, P. and Chaires, J.B., Use of competition dialysis in the discovery of G- quadruplex selective ligands. Methods, 2007. 43: 313-323.

87. Balasubramanian, S. and Neidle, S., G-quadruplex nucleic acids as therapeutic targets. Curr. Opin. Chem. Biol., 2009. 13: 345-353.

88. Phan, A.T. and Mergny, J.L., Human telomeric DNA: G-quadruplex, i-motif and Watson- Crick double helix. Nucleic Acids Res., 2002. 30: 4618-4625.

89. Azzalin, C.M., Reichenbach, P., Khoriauli, L., Giulotto, E. and Lingner, J., Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends. Science, 2007. 318: 798-801.

90. Williamson, J.R., G-quartet structures in telomeric DNA. Annu. Rev. Biophys. Biomol. Struct., 1994. 23: 703-730.

91. Phan, A.T., Human telomeric G-quadruplex: structures of DNA and RNA sequences. FEBS J., 2009. 277: 1107-1117.

92. Prescott, D.M., The DNA of ciliated protozoa. Michrobiol. Rev., 1994. 58: 233-267.

93. Schaffitzel, C., Berger, I., Postberg, J., Hanes, J., Lipps, H.J. and Pluckthun, A., In vitro generated antibodies specific for telomeric guanine-quadruplex DNA react with Stylonychia lemnae macronuclei. Proc. Natl. Acad. Sci. U.S.A., 2001. 98: 8572-8577.

94. Yang, Q., Xiang, J., Yang, S., Zhou, Q., Li, Q., Tang, Y. and Xu, G., Verification of specific G-quadruplex structure by using a novel cyanine dye supramolecular assembly: I. recognizing mixed G-quadruplex in human telomeres. Chem. Commun. (Cambridge, U.K.), 2009: 1103-1105.

27

95. Gill, M.R., Garcia-Lara, J., Foster, S.J., Smythe, C., Battaglia, G. and Thomas, J.A., A ruthenium(II) polypyridyl complex for direct imaging of DNA structure in living cells. Nature Chem., 2009. 1: 662-667.

96. Jones-Rhoades, M.W., Bartel, D.P. and Bartel, B., MicroRNAS and their regulatory roles in plants. Annu. Rev. Plant Biol., 2006. 57: 19-53.

97. Xie, Z., Allen, E., Fahlgren, N., Calamar, A., Givan, S.A. and Carrington, J.C., Expression of Arabidopsis MIRNA genes. Plant Physiol., 2005. 138: 2145-2154.

98. Fahlgren, N., Howell, M.D., Kasschau, K.D., Chapman, E.J., Sullivan, C.M., Cumbie, J.S., Givan, S.A., Law, T.F., Grant, S.R., Dangl, J.L., et al., High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PloS One, 2007. 2: e219.

99. Rhoades, M.W., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B. and Bartel, D.P., Prediction of plant microRNA targets. Cell, 2002. 110: 513-520.

100. Xie, Z., Kasschau, K.D. and Carrington, J.C., Negative feedback regulation of Dicer- Like1 in Arabidopsis by microRNA-guided mRNA degradation. Curr. Biol., 2003. 13: 784-789.

101. Vaucheret, H., Vazquez, F., Crete, P. and Bartel, D.P., The action of ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev., 2004. 18: 1187-1197.

102. Hamilton, A.J. and Baulcombe, D.C., A species of small antisense RNA in posttranscriptional gene silencing in plants. Science, 1999. 286: 950-952.

103. Baulcombe, D., RNA silencing in plants. Nature, 2004. 431: 356-363.

104. Allen, E. and Howell, M.D., miRNAs in the biogenesis of trans-acting siRNAs in higher plants. Sem. Cell Dev. Biol., 2010. 21: 798-804.

105. Winkler, W.C. and Breaker, R.R., Genetic control by metabolite-binding riboswitches. ChemBioChem, 2003. 4: 1024-1032.

106. Winkler, W.C. and Breaker, R.R., Regulation of bacterial gene expression by riboswitches. Annu. Rev. Microbiol., 2005. 59: 487-517.

107. Batey, R.T., Gilbert, S.D. and Montange, R.K., Structure of a natural guanine-responsive riboswitch complexed with the metabolite hypoxanthine. Nature, 2004. 432: 411-415.

108. Wachter, A., Tunc-Ozdemir, M., Grove, B.C., Green, P.J., Shintani, D.K. and Breaker, R.R., Riboswitch control of gene expression in plants by splicing and alternative 3' end processing of mRNAs. Plant Cell, 2007. 19: 3437-3450.

28

109. Kreps, J.A., Wu, Y., Chang, H.S., Zhu, T., Wang, X. and Harper, J.F., Transcriptome changes for Arabidopsis in response to salt, osmotic, and cold stress. Plant Physiol., 2002. 130: 2129-2141.

110. Seki, M., Narusaka, M., Ishida, J., Nanjo, T., Fujita, M., Oono, Y., Kamiya, A., Nakajima, M., Enju, A., Sakurai, T., et al., Monitoring the expression profiles of 7000 Arabidopsis genes under drought, cold and high-salinity stresses using a full-length cDNA microarray. Plant J., 2002. 31: 279-292.

111. Zhu, J.K., Salt and drought stress signal transduction in plants. Annu. Rev. Plant Biol., 2002. 53: 247-273.

112. Lee, B.H., Henderson, D.A. and Zhu, J.K., The Arabidopsis cold-responsive transcriptome and its regulation by ICE1. Plant Cell, 2005. 17: 3155-3175.

113. Matsui, A., Ishida, J., Morosawa, T., Mochizuki, Y., Kaminuma, E., Endo, T.A., Okamoto, M., Nambara, E., Nakajima, M., Kawashima, M., et al., Arabidopsis transcriptome analysis under drought, cold, high-salinity and ABA treatment conditions using a tiling array. Plant Cell Physiol., 2008. 49: 1135-1149.

114. Taiz, L. and Zeiger, E., Plant physiology. Fourth Edition ed. 2006, Sunderland, M.A.: Subayer Associates, Inc.

115. Yoshiba, Y., Kiyosue, T., Nakashima, K., Yamaguchi-Shinozaki, K. and Shinozaki, K., Regulation of levels of proline as an osmolyte in plants under water stress. Plant Cell Physiol., 1997. 38: 1095-1102.

116. Greenway, H. and Munns, R., Mechanisms of salt tolerance in nonhalophytes. Annu. Rev. Plant Physiol., 1980. 31: 149-190.

117. Zhang, H.X. and Blumwald, E., Transgenic salt-tolerant tomato plants accumulate salt in foliage but not in fruit. Nat. Biotechnol., 2001. 19: 765-768.

118. Lambert, D. and Draper, D.E., Effects of osmolytes on RNA secondary and tertiary structure stabilities and RNA-Mg2+ interactions. J. Mol. Biol., 2007. 370: 993-1005.

119. Babu, T.S., Akhtar, T.A., Lampi, M.A., Tripuranthakam, S., Dixon, D.G. and Greenberg, B.M., Similar stress responses are elicited by copper and ultraviolet radiation in the aquatic plant Lemna gibba: implication of reactive oxygen species as common signals. Plant Cell Physiol., 2003. 44: 1320-1329.

120. Puig, S. and Thiele, D.J., Molecular mechanisms of copper uptake and distribution. Curr. Opin. Chem. Biol., 2002. 6: 171-180.

29

Chapter 2

RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: Prevalence and possible functional roles

[Published as a paper entitled “RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: prevalence and possible functional roles” by Melissa A. Mullen, Kalee J. Olson, Paul

Dallaire, Francois Major, Sarah M. Assmann, and Philip C. Bevilacqua in Nucleic Acids Research

2010 38 (22): 8149-8163. M.A.M., F.M., P.C.B. and S.M.A. designed research; M.A.M., K.J.O. and P.D. performed research; M.A.M., P.C.B. and S.M.A. analyzed data; M.A.M., P.C.B and

S.M.A wrote paper.]

2.1 Abstract

Tandem stretches of guanines can associate in hydrogen-bonded arrays to form G- quadruplexes, which are stabilized by K+ ions. Using computational methods, we searched for G-

Quadruplex Sequence (GQS) patterns in the model plant species Arabidopsis thaliana. We found

~1,200 GQS with a G3 repeat sequence motif, most of which are located in the intergenic region.

Using a Markov modeled genome, we determined that GQS are significantly underrepresented in the genome. Additionally, we found ~43,000 GQS with a G2 repeat sequence motif; notably,

80% of these were located in genic regions, suggesting that these sequences may fold at the RNA level. Gene Ontology (GO) functional analysis revealed that GQS are overrepresented in genes encoding proteins of certain functional categories, including enzyme activity. Conversely, GQS are underrepresented in other categories of genes, notably those for non-coding RNAs such as tRNAs and rRNAs. We also find that genes that are differentially regulated by drought are significantly more likely to contain a GQS. CD-detected K+ titrations performed on

30 representative RNAs verified formation of quadruplexes at physiological K+ concentrations.

Overall, this study indicates that GQS are present at unique locations in Arabidopsis and that folding of RNA GQS may play important roles in regulating gene expression.

2.2 Introduction

Both DNA and RNA can form G-quartets and G-quadruplexes,1,2 which consist of four guanine bases associated in a planar orientation and stabilized by Hoogsteen-to-Watson-Crick interactions and centrally positioned K+ or Na+ ions (Figure 2.1a).3-5 Several conformations can be adopted by G-quadruplexes, including parallel and antiparallel topologies, which can be formed in either intramolecular or intermolecular strands. Parallel G-quadruplexes are characterized by parallel orientation of all four strands (Figure 2.1b) and have anti conformation of all four guanosines, while antiparallel G-quadruplexes are characterized by alternating orientation of strands (Figure 2.1c) and have 2 anti and 2 syn guanosines per quartet.

Conformation of the quadruplex is controlled by strand orientation, loop length, and ions present in solution.6,7 Parallel quadruplex structures are preferred by RNA,8 which has been explained by

RNA avoiding topologies that would require the syn conformation of the nucleosides and their associated C2‘-endo sugar puckers.9-11

Raising the concentration of monovalent cations lowers the free energy of quadruplex formation. While GQS form in the presence of both K+ and Na+ ions, they are typically more stable in the presence of K+ owing to its larger ionic radius, which is associated with a smaller free energy of dehydration.1, 12, 13 In addition, molecular crowding by osmolytes, including both ions and organic solutes, favors quadruplex stability, owing to the quadruplex‘s compact and less hydrated structure, and simultaneously disfavors competing Watson-Crick structures.14-18 Given

31 these behaviors, formation of GQS might be favored during plant stress wherein concentrations of

K+ and various osmolytes increase (see below).

Figure 2.1: G-quartet and G-quadruplex structures and topologies. (a) G-quartet structure, showing Hoogsteen-to-Watson-Crick face hydrogen bonds and the central dehydrated monovalent ion integral to formation and stabilization. Unimolecular (b) parallel and (c) antiparallel G-quadruplex topologies. Adapted from1,6,10,11. Dark lines follow the nucleic acid strand, arrowheads denote strand directionality, and gray boxes denote quadruplexes. The examples drawn here are for sequences having three quartets.

RNA and DNA folding events have the potential to alter gene expression in vivo,19,20 and one such event is G-quadruplex formation. Using bioinformatics search methods such as the program ‗Quadparser,‘ ~375,000 putative G-Quartet Sequences (GQS) were identified in the human genome,21,22 where G-quadruplexes are found primarily in the intergenic regions and are enriched in telomeres.11,23 A large percentage (>40%) of human promoters contain at least one

GQS sequence, implicating GQS in potential regulatory functions.24,25 In addition, several prokaryotes have also been shown to have a higher frequency of GQS motifs in upstream regulatory regions.26,27

Genes containing GQS have been implicated in the control of transcription, translation, stability, and structure. Transcription of several oncogenes can be controlled by the promoter region GQS and binding of several ligands. A human c-MYC oncogene and a KRAS proto-

32 oncogene are activated by protein binding to the GQS and subsequent destabilization of the structure.28-30 The same genes, as well as PDGF, can be inhibited by other binding factors.31-36

The potential for genome-wide response to regulatory region DNA GQS binding ligands and protein factors has also been examined.37, 38 Control of translation by GQS is evidenced by the

ERS1 mRNA,39 the human NRAS proto-oncogene,39-41 the MT3 metalloproteinase mRNA,2 and

ZIC-1RNA,42 all of which have repressed translation when a stable GQS forms in their 5‘UTRs.

G-quadruplex motifs have also been implicated in stability and post-transcription processing, for example in IGFII (Insulin-like growth factor II) mRNA, which has a GQS in its 3‘-UTR.43 GQS can bind to proteins, such as the DNA thrombin aptamer44-46 and FMRP (Fragile X Mental

Retardation Protein), which binds to its own mRNA at a binding site that contains a GQS.47-49 In addition, an intermolecular G-quadruplex forms in HIV-1 genomic RNA dimers,50,51 and many

GQS can bind small molecules, typically of a planar, aromatic nature.52,53

Besides H. sapiens, GQS have been reported for several other eukaryotic genomes, including D. melanogaster and M. musculus54, however plant genomes have not been evaluated.

Plants are of particular interest to GQS formation because under drought stress, K+ ion concentrations in the cell can increase, which has the potential to drive G-quadruplex formation.

For example, under drought or in high salinity soils, plants increase cytosolic K+ concentrations up to 700 mM in order to avoid cellular dehydration.55,56

The model plant Arabidopsis thaliana, with a completely sequenced genome and tractable genetics, lends itself well to investigation of G-quadruplex formation. Herein, we searched the genome of Arabidopsis thaliana for GQS of various motifs and found that the prevalence and distribution of GQS in the genome varies according to GQS pattern. Functional analysis of genes with GQS suggests putative roles in modulating gene expression. Formation and relative stabilities of representative Arabidopsis GQS were verified experimentally using circular dichroism (CD) K+ titrations and UV-detected thermal denaturation.

33

2.3 Materials and methods

2.3.1 Data sets searched

Genomic sequences were obtained from the TAIR9 version of the Arabidopsis genome, released June 19, 200957. We searched for the presence of GQS in specific components of the genome, including 5‘-UTRs, 3‘-UTRs, coding sequences, introns, intergenic regions, and upstream promoter sequences (defined as 1 kb upstream of the 5‘-UTR or transcription start site).

We utilized the major annotation provided by TAIR. Full chromosome sequences from the

TAIR9 version of the Arabidopsis genome were also searched. For comparisons, unmasked genomic sequences for 14 plant species, as well as D. melanogaster and M. musculus were evaluated. The following genomes were analyzed (see Table A.1 for details of common name, assembly, release data, and server): Arabidopsis lyrata, Brachypodium distachyon 58, Drosophila melanogaster59, Glycine max60, Lotus japonicus61, Manihot esculenta, Medicago truncatula62,63,

Mimulus guttatus, Mus musculus64, Nicotiana tabacum65, Oryza sativa ssp. indica66,67, Oryza sativa ssp. japonica68,69, Populus trichocarpa 70, Pyscomitrella patens 71, Sorghum bicolor 72,

Vitis vinifera73, and Zea mays74. Genomic region sequences for Oryza sativa ssp. japonica

(intergenic, coding sequences, exons, introns, and UTRs) were also analyzed.

2.3.2 Programs used

Searches for G-quadruplex-forming sequences were performed using the program

Quadparser21, which scans input sequences for a given pattern of nucleotides. Several folding motifs were used, following the form d(GX+L1-N)3+GX+ where ‗X+‘ was defined as 2+ or 3+

(depending on the search), and L was 1, 1-2, 1-3, 1-4, or 1-7. These definitions correspond to G-

34 quartets of varying stabilities, with larger values of X and smaller values of N generally corresponding to more stable sequences.

The GQS reported here are non-overlapping, such that any continuous stretch of GQS sequence will count as one, no matter the number of registers. In addition, we note that the choice of Arabidopsis datasets searched slightly affects the number of GQS found, which contributes to small differences in total GQS number. For example, searching different regions of the genome, such as intergenic regions, genic regions, and coding sequences, separately produced

1,187 GQS (Table 2.1). However, searching whole chromosome sequences yielded 1,219 GQS.

These small differences can be attributed to GQS that span the different regions of the genome.

As such, both search methods were used for this study: chromosome searches enabled testing for significance (see below), while searching the genome regions allowed comparison of GQS across different components of the genome.

Table 2.1: Distribution of GQS motifs in the Arabidopsis genome

GQS Genomea Intergenicb Genicc Codingd Genic:Intergenic Motif G3+ L1-7 1,187 827 (70%) 360 (30%) 263 (22%) 0.4 G3+ L1-3 237 163 (69%) 74 (31%) 41 (17%) 0.4

G2+ L1-4 43,117 8,561 (20%) 34,556 (80%) 30,555 (71%) 4.0 G2+ L1-2 12,340 1,824 (15%) 10,516 (85%) 9,415 (76%) 5.8 G2+ L1 8,188 901 (11%) 7,287 (89%) 6,633 (81%) 8.1

Numbers and percentages of G-quartet forming sequences (GQS) in different regions of the Arabidopsis genome. aGenome is comprised of bIntergenic and cGenic, while dCoding is a subset of cGenic and includes all gene models. Quadparser search parameters included G and C patterns to account for both sense and antisense strands.

Searches for G-quadruplex-forming sequences also included C-rich sequences, d(CX+L1-N)3+CX+, which would form GQS in the complementary strand of the genomic DNA, although not in the transcript, or might form the i-motif in the C-rich strand75. It should be

35 mentioned that, as a result of the search parameters, some overlap in counted sequences occurred across GQS motifs: output sequences from less restrictive GQS criteria include sequences that follow stricter criteria; for example, sequences of the type G3+L1-3 occur within the G3+L1-7 and

G2+L1-4 searches.

Functional analysis of the gene products was conducted using the BiNGO 2.3 plugin76 for the Cytoscape 2.6.0 visualization program77,78. BiNGO assesses over- or underrepresentation of genes in a user-defined category as compared to the entire Arabidopsis annotation. Here, the

GQS-containing GO annotated genes were compared against the GO annotation for the entire

Arabidopsis thaliana transcriptome. A full GO analysis was performed for genic regions, using

GO definitions as of 8/10/2008 and all of the GO terms: Molecular Function (MF), Biological

Process (BP), and Cellular Component (CC)79. Only loci with at least one GQS were included and no locus was included twice regardless of the number of GQS it contained. BiNGO uses the hypergeometrical statistical test, which is equivalent to an exact Fisher test, along with a

Benjamini & Hochberg False Discovery Rate (FDR) correction at a significance level of 0.05.

To assess the significance of GQS patterns in Arabidopsis, we employed a windowed

Markov model to generate simulated chromosomes. The mock chromosomes were generated in windows of fixed sizes, where nucleotide composition or di-nucleotide frequencies were modeled about corresponding windows along the corresponding real genomes. This corresponds to

Bernoulli (or categorical random distribution) and Markov chains respectively. The process of generating a mock genome considers in turn every chromosome in the genome. For each chromosome, data are collected in a window of some fixed size, and used to generate a segment of quasi-random chromosome of the same length. The process is iteratively repeated one window length downstream until the end of the real chromosome is reached. We wrote the computer software using the C++ language. The programs are available for Mac OS X and Linux. For this instance, pattern hits were counted using the software grep available on Linux computers. Scripts

36 used are shown in Appendix A.1. Significance was also tested using a larger window size with

Ushuffle; methods and results are described in Appendix A, Section A.2 and Table A.2.

2.3.3 Oligonucleotide preparation

RNA oligonucleotides with the following sequences were purchased from Dharmacon (G-quartet forming stretches are underlined):

G3L221: 5‘-GGGUCGGGUUGGGCGGG

G3L444: 5‘-GGGUUUUGGGACAUGGGCUUGGGG

G3L444+FLANK: 5‘-

UGGGUCCUUUAAGUGUUUCUCCUAUGGGUUUUGGGACAUGGGCUUGGGGU

G2L111: 5‘-GGAGGAGGAGGA

G2L444: 5‘-GGAGCCGGAGUCGGAAUGGG

As an example of the shorthand notation adopted, G3L221 indicates three Gs interspersed by loops of 2, 2, and 1. These represent sequences from Arabidopsis genes as follows: G3L221:

At1g07180 (NDA1, non-phosphorylating NAD(P)H dehyrogenase); G3L444: At5g53580

(MNC6.12, aldo/keto reductase family protein); G2L111: At2g39320 (T16B24.4, OTU-like cysteine protease family protein); G2L444: At1g44020 (F9C16.23, DC1 domain-containing protein). These sequences were chosen because they represented different types of candidate

GQS motifs; in addition the G3L444 sequence was chosen because of the strong flanking sequence that forms a predicted alternative base paired structure with a free energy of –15.6 kcal/mol, which was included in the oligonucleotide G3L444+FLANK. RNA oligonucleotides were first dialyzed against 100 mM LiCl, which does not support quadruplex formation, for 8 h to remove

37 associated cations and then dialyzed against distilled and autoclaved water for 4 h to remove excess LiCl. Finally, RNA oligonucleotides were dialyzed overnight against 10 mM Li

Cacodylate (pH 7.0). All dialysis was performed using a six well microdialysis apparatus from

Gibco-BRL Life Technologies with a flow rate of 25 mL/min. To favor monomeric species,

RNAs were renatured in the absence of potassium ions at 85°C for 1 min and then allowed to cool at room temperature before performing experiments. Native gels confirmed that RNA oligonucleotides of the above mentioned concentration and renaturation conditions form largely unimolecular structures (Figure A.1).

2.3.4 Circular dichroism

Circular dichroism spectroscopy was performed using a Jasco CD J810

Spectropolarimeter and analyzed with Jasco Spectra Manager Suite software. RNA samples were prepared as described above to a concentration ~5 µM. Spectra were acquired at 20 C over a wavelength range of 210-320 nm, with data collected every nanometer at a bandwidth of 1 nm.

Reported spectra are an average of 3 scans at a response time of 4 s/nm. Data are buffer subtracted, normalized to provide molar residue ellipticity values, and smoothed over 5 nm.80

Molar residue ellipticity is reported in order to normalize for concentration differences and oligonucleotide length.

The amount of K+ necessary to drive quadruplex formation was of interest. To determine

+ K 1/2 values, ellipticity data (ε) were fit with KaleidaGraph v. 3.5 (Synergy software) according to the two-state Hill equation in which K+ ions are taken up in the U to F transition:

(Eq. 1) U F F n 1 ([K ]/[K 1/ 2 ])

38 where εF is the normalized CD signal corresponding to fully folded GQS, εU is the normalized

+ + signal for the unfolded GQS, [K ] is the potassium ion concentration, [K 1/2] is the potassium ion concentration needed to fold half the RNA, and n is the Hill coefficient. Data are consistent with a two-state model (see Results).

2.3.5 UV thermal denaturation

RNA samples were prepared as described above at a concentration of ~5 µM in 10 mM

Li Cacodylate (pH 7.0), with monovalent salts added before renaturation. Native gels confirmed formation of one monomeric species, as described above. Thermal denaturation experiments

(‗melts‘) were performed on a Gilford Response II spectrophotometer, with absorbances recorded every 0.5°C over the temperature range of 5°-95°C and 95°-5°C. Similar profiles were obtained for forward and reverse melts consistent with reversibility of the unfolding transition.

Absorbance was detected either at 260 nm to observe the standard increase in A with temperature, or at 295 nm to observe the quadruplex-specific decrease in absorbance with temperature (a so- called ‗inverse‘ melt) 81. Cuvettes with a 0.5 cm pathlength were used for 260 nm melts, while cuvettes with a 1 cm pathlength were used for 295 nm melts, owing to the smaller extinction coefficient at this wavelength. Data were fit with KaleidaGraph v. 3.5 (Synergy Software) using a Marquadt algorithm for non-linear curve fitting.

39

2.4 Results

2.4.1 Prevalence and significance of GQS in Arabidopsis

We began our search for G-quadruplex sequences in the Arabidopsis genome using the

21,22 standard definitions of a G-quadruplex forming sequence, (G3+L1-7)3+G3+ and (C3+L1-7)3+C3+ , referred to herein as ‗G3L1-7‘ or simply ‗G3‘, and identified 1,187 GQS (Table 2.1), which corresponds to a genomic density of 9.3 GQS/Mb (Table 2.2). Of the approximately 1,200 GQS found in Arabidopsis, 329 have multiple registers, capable of forming more than one quadruplex structure. The importance of multiple registers is unclear, but it may contribute to overall stability, kinetics, and regulation.

Table 2.2: Density (D)a and Enrichment (E)b of GQS in various Arabidopsis genomic regions

GQS Motif Genome Intergenic Genic Coding D D E D E D E (GQS/Mb) (GQS/Mb) (GQS/Mb) (GQS/Mb) c c G3+L1-7 9.3 16.7 1.8 4.6 0.5 6.5 0.7 G3+L1-3 1.9 3.3 1.8 1.0 0.5 1.0 0.5

G2+L1-4 339.4 172.9 0.5 445.8 1.3 752.6 2.2 G2+L1-2 97.1 36.8 0.4 135.7 1.4 231.9 2.4 G2+L1 64.4 18.2 0.3 94.0 1.5 163.4 2.5 Provided are density and enrichment of G-quartet forming sequences (GQS) in different regions of the Arabidopsis genome from all gene models. Genome, Intergenic, Genic, and Coding are defined in Table 2.1. aGQS density is defined as the total number of GQS per Mbase in the specified region; number of Mbases per region: whole genome 124.7 Mb, intergenic region 50.07 Mb, genic region 74.65 Mb, and coding sequence 39.59 Mb. bEnrichment values are calculated as the GQS density of a region divided by c the GQS density of the genome. These calculations are with (G3T3A)3G3 sequences (see text). As with Table 2.1, Quadparser search parameters included G and C patterns to account for both sense and antisense strands.

In comparison to the eukaryotes H. sapiens, D. melanogaster, and M. musculsus, which have GQS densities of 115 GQS/Mb24,25, 88.7 GQS/Mb54 and 189.4 GQS/Mb, respectively,

Arabidopsis has a lower number and density of G3L1-7 GQS (Figure 2.2). We also compared

G3L1-7 GQS density in Arabidopsis to GQS density in14 different plant species having a range of

40 phylogenetic distances from Arabidopsis, including eudicot, monocot, and moss species. We find that the GQS density of A. thaliana is similar to those of other dicot species, especially A. lyrata and M. esculenta, and to the moss P. patens. Monocots, such as Z. mays, O. sativa, and B. distachyon have significantly higher GQS densities (Figure 2.2, Figure A.2 and Table A.1). A comparison of GC content and GQS density across the different plant genomes shows that GQS levels correlate reasonably well with the GC content of each genome (Figure A.2, green symbols,

R2 = 0.76). However, the same correlation is not true for all genomes. In particular, H. sapiens and D. melanogaster follow the same trend as plant genomes, but M. musculus does not.

Figure 2.2: Density of GQS in different organisms. Density of GQS in each genome is represented by a bar and given in GQS/Mb. GQS here is defined as G3L1-7. GQS density is provided for each organism at the right-hand end of the bar. All numbers are from this study except H. sapiens which is from Huppert and coworkers21 and Todd and coworkers22. Common names: Homo sapiens (Human), Drosophila melanogaster (fruitfly), Mus musculus (mouse), Arabidopsis thaliana (mouse ear cress), Arabidopsis lyrata (lyrate rock cress), Manihot esculenta (cassava), Lotus japonicus (Lotus japonicus), Glycine max (soybean), Oryza sativa indica (rice – indica), Zea mays (corn), and Physcomitrella patens (Physcomitrella patens, a moss).

41

Tandem repeats of two guanines (G2) have also been reported to form G-quartet structures, albeit in a less stable form. For example, Hartig and coworkers studied differences in

RNA GQS with G2 and G3 sequences and showed that G2 sequences formed GQS but had melting temperatures about 25 °C lower than equivalent G3 sequences; in addition, all G2 sequences had

82 weaker circular dichroism (CD) signatures than G3 sequences . Nonetheless, G2 sequences can form G-quartets that can be quite stable, especially in the presence of high salt concentrations.82

The thrombin DNA aptamer, with the sequence G2T2G2TGTG2T2G2 is an example of a well

44-46 characterized DNA G2 quartet. We therefore also evaluated the prevalence of GQS for stretches of two or more guanines and a maximum loop size of four (G2L1-4). This simplification of the search motif dramatically increased the number of hits, from ~1,200 to ~43,000 (Table

2.1), providing a genomic density of 339 GQS/Mb (Table 2.2).

To assess the significance of the number of GQS found in Arabidopsis, we applied a windowed Markov Model, maintaining dyad frequency, to generate randomized genomes, similar to the method of Huppert and Balasubramanian.21 Results of various window sizes are shown in

Table 2.3. The AT patterns were used as a control to ensure that the generated random genome was a good representation of the real genome. Markov window sizes that are too small do not sufficiently randomize the genome, producing more hits for both the AT pattern control and the

GC pattern of interest. On the other hand, larger window sizes yield smaller numbers of patterns, as GC-rich regions become diluted in the window. A Markov window of 100 was determined to be an appropriate mimic of the real genome, as the number of AT patterns (147,451) matched most closely that of the real genome (147,266) (Table 2.3). We find that GC patterns are underrepresented by a factor of 2.3 in the real genome when compared to the random genome

(2,875 expected, 1,232 found). These sequences are even more underrepresented than in the

21 human genome, where G3 GQS are only underrepresented by a factor of 1.4.

42

The G2 patterns were examined using the Markov Model as well. Again, a window size of 100 was optimal, however, the G2 pattern was not underrepresented as the G3 patterns were, with 44,168 G2 patterns expected, and 43,117 found. Nonetheless, G2 patterns may play important biological roles, as revealed by our functional analysis and drought stress analysis on these motifs, described below. Next, we investigated where these GQS sequences are located within the genome.

Table 2.3: Number of patterns in Arabidopsis and Markov simulated genome.

Window X3 L1-7 X3 L1-7 X2 L1-4 X2 L1-4 GC AT GC AT Real 1,232 147,266 43,117 746,324

Markov 50 5,838 195,259 63,307 816,536 Markov 75 3,776 165,195 50,827 771,304 Markov 100 2,875 147,451 44,168 743,265 Markov 150 1,977 125,727 36,554 708,312 Markov 200 1,509 113,870 32,461 687,647 Markov 400 847 91,457 25,208 646,856 Markov 1000 421 73,006 19,076 608,637 Markov 2000 282 64,350 16,112 588,057 Markov 4000 191 58,097 14,113 572,833 Number of GC and AT patterns in the real Arabidopsis genome and a windowed Markov model simulated genome. See Materials and Methods for more details. The window size that accurately simulated the AT pattern of the Arabidopsis genome is 100 and is in bold text.

2.4.2 Location of GQS in the Arabidopsis genome

We found that the distribution of GQS throughout the genome is not uniform. First, the prevalence of G3L1-7 GQS in genic and intergenic regions of the genome was compared. As described above, the density of G3L1-7 GQS in the Arabidopsis genome is 9.3 GQS/Mb (Table

2.2). Seventy percent of these G3L1-7 GQS are present in the intergenic regions, yielding an enrichment value (intergenic density/whole genome density) of 1.8. It follows that G3L1-7 GQS are depleted in the genic regions, with a lower GQS density than the remainder of the genome.

43

Here, the corresponding values are 30% (Table 2.1) and 0.5. (Table 2.2), and the ratio of DNA

G3L1-7 GQS in the genic to intergenic regions is 0.4 (Table 2.1, right-most column).

As mentioned above, we found a correlation between genomic GQS density and GC content for 15 plant species and a few of the non-plant eukaryotes. However, this correlation did not extend to sub-regions of the Arabidopsis genome. When genic and intergenic regions of the genome are considered separately, the G3 GQS densities are opposite of the GC content. The genic region has a GC content of 38.9% and the intergenic region has a GC content of 31.1%, while the genic region has a G3 GQS density of 4.6 GQS/Mb, much lower than the intergenic region value of 16.7 GQS/Mb.

We examined the genomic regions of the well-annotated monocot O. sativa ssp. japonica to see if this trend extended to other plant species. In O. sativa, even though the GC content and

GQS density is higher than Arabidopsis, we observe the same inverse correlation between the genic and intergenic regions. The genic region has a GC content of 45% and a GQS density of

82.6 GQS/Mbase, while the intergenic region has a lower GC content of 41.5% but a higher GQS density of 127.9 GQS/Mbase.

Approximately 160 (20%) of the intergenic GQS were found to correspond to the

Arabidopsis telomeric sequence, (G3T3A)3+G3. These sequences were found to be located not only at chromosome ends but also in interstitial sites near the centromeric regions. This positioning of the telomeric sequence has been previously explained by chromosomal rearrangements, such as the evolutionary combination of two chromosomes, Robertson fusions, or arm inversions.83-85 To determine if the large numbers of telomeric sequences accounted for the GQS enrichment in the intergenic region, these sequences were removed from the GQS density calculation; nonetheless, the intergenic region was still enriched in GQS, with an enrichment value of 1.6 relative to the whole genome, similar to the value of 1.8 presented above. Since the effect of telomeric

44 sequences on GQS density and enrichment values was not large, we used all GQS sequences for subsequent comparisons and calculations.

Because we were interested in different GQS motifs and relative stabilities of the

Arabidopsis GQS, we next tallied GQS statistics considering different loop lengths and numbers of guanines. In particular, we looked at whether certain loop lengths are more common than others in the Arabidopsis genome. First, we confined the loop to a maximum of three nucleotides,

G3L1-3, which decreased the number of GQS found for the entire genome to just 237 (Table 2.1).

These sequences, which should be very stable, are infrequent in the genome, with only 1.9 sequences per Mb (Table 2.2). Even though there are almost four-fold fewer unique GQS than for G3L1-7, the relative enrichments, 1.8 for the intergenic region and 0.5 for the genic region, are the same as for G3L1-7 sequences (Table 2.2).

Next, we changed the GQS definition to G2L1-4, which includes stretches of two or more guanines and a maximum loop size of four in the DNA region. As described, expanding the definition increased the number of GQS found to ~43,000 (Table 2.1), corresponding to a GQS density of 339 GQS/Mb (Table 2.2). Remarkably, upon decreasing the number of Gs in the search motif, the density of GQS shifted from favoring intergenic to favoring the genic region

(enrichment value of 1.3 in G2L1-4) (Table 2.2). In fact, whereas just 30% of G3L1-7 sequences are predicted to reside in genes, the vast majority of G2L1-4 sequences (80%) are predicted to be genic

(Table 2.1) and of those, nearly all (88%, or 30,555/34,556) are located in coding sequences.

Apparently, location in the genome depends on GQS type (see Discussion). Unlike the negative correlation between GQS density and GC content with G3 sequences, GQS density and GC content have a positive correlation for G2 sequences.

Next, we limited the size of the loop of the G2 motif to a maximum of two. As expected, this led to the identification of fewer GQS, with 12,340 sequences found (Table 2.1). While the number of GQS decreased, distribution in the genome was affected only slightly: there was a

45 modest increase in percentages for genic (80% to 85%) and coding sequences (71% to 76%)

(Table 2.1). Lastly, we constrained the loop size of the G2 motif to just one nucleotide. As expected, this further decreased the number of GQS in the genome, to 8,188 (Table 2.1), but it maintained approximately the same GQS distribution across the genome, with further small increases in the percentage of genic (85% to 89%) and coding (76% to 81%) sequences. With this definition, 91% (6,633 out of 7,287) of the GQS were located in coding sequences. The prevalence of G2 sequences with short loops in genic and coding sequences is also reflected in enrichment values (Table 2.2): in going from L1-4 to L1-2 to L1, enrichments increased from 1.3 to

1.4 to 1.5 for the genic region and from 2.2 to 2.4 to 2.5 for the coding sequence. Moreover, the ratio of DNA GQS in genic to intergenic ratios increased from 0.4 to 8.1 as the number of Gs was decreased and the loop was shortened (Table 2.1, right-most column). Thus, the G2 motif is enriched in genic regions, especially the coding sequences, and this enrichment is further enhanced for GQS with shorter, and somewhat more stable, loops.

We were especially interested in whether the shorter GQS might be more prevalent in

RNA as compared to DNA. Comparison of genic RNA GQS to intergenic DNA GQS was therefore made (Figure 2.3). In this accounting, ‗RNA‘ is defined as genic, which includes coding sequences, untranslated regions (UTRs), and introns, but only G-rich sequences, while

‗DNA‘ is defined as intergenic and includes both G- and C-rich sequences. The RNA:DNA GQS density ratio highlights the differences in genomic distribution between G3 and G2 GQS definitions. The G3 sequences are found mostly in intergenic regions and rarely in RNA, with

RNA:DNA GQS density values of 0.17 and 0.18 for loops of 1-7 and 1-3, respectively (Figure

2.3). In contrast, G2 sequences are more prevalent in RNA than intergenic regions, with

RNA:DNA density ratios of 1.5 or greater; this holds despite the fact that both DNA strands (i.e.

G and C patterns) are included in the accounting for the intergenic region. We also note that the

46

RNA:DNA density ratio increases monotonically within the G2 motif as L decreases from L1-4 to

L1-2 to L1, with values of 1.5, 1.9, and 2.5, respectively (Figure 2.3).

Figure 2.3: Ratio of RNA GQS density to non-genic DNA GQS density for different GQS motifs. DNA GQS density includes G and C sequences in the intergenic region only (Table 2.2, column 3), since this will not lead to GQS in RNA. RNA GQS density includes only the G sequences found in the genic regions (Table 2.5, column 3). For example, G3+L1-7, RNA GQS Density is 2.9 and the DNA intergenic region GQS density is 16.7, leading to a ratio of 0.17.

2.4.3 Characterization of GQS found in the intergenic region

Recent studies on Arabidopsis have revealed that some RNAs are transcribed from intergenic regions. From whole genome tiling arrays, it has been found that ~19-23% of the

Arabidopsis intergenic region is transcribed, in so-called ‗intergenic transcriptional units.‘86, 87

We therefore determined where the intergenic GQS occur in relation to these transcriptional units

(TU). A summary can be found in Table 4. For G3L1-7 GQS, of the 827 intergenic GQS (Table

2.1), only 22 (3%) fall in an intergenic TU, corresponding to a GQS density of just 2.3 GQS/Mb.

This leaves a large non-transcribed intergenic GQS density of 20 GQS/Mb (Table 2.4). In other words, the density of G3L1-7 GQS is a striking 8.7-fold higher in non-transcribed versus TU

47 intergenic regions. Moreover, the GQS density of the intergenic TU of 2.3 is two-fold less than the genic region density of 4.6 per Mb (Table 2.2).

Table 2.4: Distribution and Density (D) of GQS motifs in Arabidopsis intergenic regions

GQS Intergenica Transcribed Unitsb non-Transcribed Unitsc D non-TU/ Motif D TU GQS Dd %e GQS Dd %e G3 L1-7 827 22 2.3 3 805 2097 8.7 G2 L1-4 8,561 686 72.0 8 7,875 19492 2.7 Provided are distribution and density of G-quartet forming sequences (GQS) in transcribed (TU) and non- transcribed (non-TU) regions of the intergenic region. aIntergenic region is comprised of btranscribed units and cnon-transcribed units. Raw numbers (GQS) are provided; dGQS density (D) is defined as the total number of GQS per Mbase in the specified region; ePercentages (%) were calculated relative to the intergenic region. Number of Mbases per region: intergenic region 50.070 Mb, TU 9.53 Mb, non-TU 40.54 Mb. Quadparser search parameters were set to include both G- and C-patterns.

Regarding the G2L1-4 GQS, a larger number of these motifs (8,561) are found in the intergenic region (Table 2.1) compared to the G3L1-7, as expected, with 686 (8%) overlapping with transcriptional units (Table 2.4). This corresponds to an intergenic region TU GQS density of 72 GQS/Mb. The remaining 7,875 G2L1-4 GQS (92%) are located in the non-transcribed intergenic region, with a density of 194 GQS/Mb. Thus, the density of G2L1-4 GQS is also higher in non-transcribed versus TU intergenic regions, albeit not as much as for the G3L1-7 motif, with density ratios of 2.7- and 8.7-fold, respectively. These ratios suggest that GQS motifs may play a role in repressing transcription in intergenic regions (see Discussion).

Next, we examined whether GQS are preferentially localized to promoters. In the human genome, GQS are localized to promoter regions, with a six-fold enrichment value compared to total genomic DNA, and at least one GQS present in 42.7% of promoters.24 We therefore assessed whether similar trends hold for Arabidopsis. In contrast, Arabidopsis G3L1-7 GQS are not over-represented upstream of genes. For example, only 317 GQS were found in promoter regions, corresponding to a density of just 9.5 GQS/Mb (using 33.20 Mb as the total length of all promoter regions), lower than the overall Arabidopsis intergenic region GQS density of 16.7

48

GQS/Mb (Table 2.2), and much less than the human promoter GQS density of 770 GQS/Mb.24

G2L1-4 GQS are more prevalent than G3L1-7 in promoter regions, with 8,306 G2L1-4 GQS, corresponding to a density of 250 GQS/Mb, which is somewhat greater than the overall intergenic region density of 173 GQS/Mb (Table 2.2). Approximately 20% (8,620) of all genes have at least one of these two types of GQS in the promoter region, suggesting a possible role of G2 GQS in regulating via the promoter.

2.4.4 Characterization of GQS found in the genic region

As motivation for characterizing GQS in the genic region, we first asked, for any given

GQS definition, how many genes or loci contain at least one GQS in the corresponding mRNA.

(Gene models are named uniquely within a given open reading frame (ORF), thus a given locus can have more than one associated gene model, e.g., if there are alternatively spliced variants for a gene.) Of the 39,640 gene models in Arabidopsis, 215 gene models were found to contain ≥1

G3 GQS in their RNA, which represents 0.5% of the total number of genes in Arabidopsis (Table

A.3). In contrast, ≥1 G2+L1-4 GQS were found in about one third of all gene models (33%) and loci (31%), suggesting the potential for widespread formation (Table A.3). A complete list of gene models that contain a GQS can be found online at

(http://nar.oxfordjournals.org/content/38/22/8149/suppl/DC1).

Regulation of translation by G3 GQS in RNA transcripts has been demonstrated in E. coli

2,82,88 and eukaryotic cells. Given the large number of G2 quartets in Arabidopsis sequences corresponding to mRNAs, the location of these sequences within the genic region was determined. To examine GQS that would appear in the RNA, we limited searching to G patterns from the genic regions (i.e. C patterns were excluded), and divided results between coding sequences (cds), 5‘- and 3‘-untranslated regions (UTRs), and introns (Table 2.5). In the

49

Arabidopsis genome, the majority (90%) of G2 RNA GQS are in the cds, which has a ~4 fold higher density than the non-coding UTRs (443, 110, and 105.5, for cds, 5‘UTR, and 3‘UTR, respectively, Table 5). Within the cds, GQS are located towards the 5‘ end (Figure A.3). For the

G3 sequences, the cds has a 2-fold higher GQS density than the non-coding UTRs (4.3, 2.6, and

2.2 for cds, 5‘UTR, and 3‘UTR, respectively). Lastly, the intronic regions for both G2 and G3 motifs have much lower GQS density, with a coding/intronic density ratio of 13 for G2L1-4 and

4.3 for G3L1-7 (Table 2.5).

2.4.5 Potential biological functions of GQS in Arabidopsis

Because cellular K+ concentrations often increase under drought stress, we considered if any of the genes containing G2 GQS are differentially expressed when exposed to drought stress conditions. We used previously published tilling array data by Matsui and coworkers86 that reported drought-responsive loci. As mentioned above, 31% of all loci in Arabidopsis contain at least one G2L1-4 GQS (10,382/33,518). Matsui and coworkers report that 5,508 loci are drought responsive, which corresponds to about 16% of all loci in Arabidopsis. Of those loci, 45%

(2,474) have at least one GQS. By chi-square ( 2) analysis of these values—31% of all loci have a GQS versus 45% of all drought-responsive loci have a GQS—we determined that drought- regulated loci are indeed significantly more likely to have a GQS than when considering all loci,

(p-value <0.0001). The reverse analysis, inquiring if loci that have a GQS are more likely to be drought responsive than loci without GQS, is also true—16% of all loci are drought responsive versus 24% of GQS loci are drought responsive (p-value <0.0001).

50

Table 2.5: Distribution and Density (D) of GQS motifs in Arabidopsis genic RNA

GQS Motif Genica Codingb 5‘ UTRc 3‘ UTRd Introne GQS Df GQS Df %g GQS Df %g GQS Df %g GQS Df %g G3+L1-7 225 2.9 174 4.3 77 10 2.6 4 14 2.2 6 27 1.0 12 G3+L1-3 43 0.6 28 0.7 65 4 1.1 9 4 0.6 9 7 0.3 16

G2+L1-4 19,98 257.8 17,989 443.1 90 417 110.0 2 657 105.5 3 922 34.3 5 5 G2+L1-2 5,435 71.1 4,897 120.6 90 124 32.7 2 128 20.5 3 286 10.6 5 G2+L1 3,496 45.1 3,174 78.2 91 74 19.5 2 63 10.1 2 185 6.9 5 Provided are distribution and density of G-quartet forming sequences (GQS) in different regions of the genes from all gene models. aGenic region is comprised of bCoding, c5‘UTR, d3‘UTR, and eIntron. fGQS density is defined as the total number of GQS per Mbase in the specified region. Number of Mbases per region: genic region 74.65 Mb, CDS 39.59 Mb, 5‘UTR 3.62 Mb, 3‘UTR 6.02 Mb, and intron 25.43 Mb gPercentages were calculated relative to the genic region. Raw numbers (GQS), GQS densities (D), and percentages (%) of GQS. Quadparser search parameters were set to include only G- patterns, which will be found in RNA, and exclude C-patterns.

51

Given the enrichment of G2 sequences in the genic portion of the genome, we next investigated whether these sequences are overrepresented in certain functional classes of loci.

The functions of gene products encoded by genes containing at least one GQS were examined using Gene Ontology (GO) codes as analyzed with the program BiNGO.76 A number of GO terms were found to be overrepresented in proteins encoded by G2-containing genes. A sample of unique, over- and underrepresented GO terms for the G2+L1-4 GQS definition and the respective p- values, all < 1E-8, are provided in Table 2.6, and a complete list of over- and underrepresented

GO terms can be found in Table S4. In addition, GO analysis was performed with other G2 and

G3 GQS motifs, providing over- and underrepresented GO terms with less statistical significance owing to the smaller sample sizes (Tables A.4-A.7).

The GO term with the greatest statistical significance for the G2L1-4 motif is ―catalytic activity‖, with an exceptionally low p-value of 9E-65 (Table 2.6). This term corresponds to genes that code for enzymes. According to this analysis, 45% of ―catalytic activity‖-annotated loci have one or more G2L1-4 GQS, which is much larger statistically than the 33% of all loci that have such a GQS. Other highly significant GO terms include nucleotide binding and multicellular organismal development, as well as specific catalytic activities such as post-translational protein modification, kinase activity, transferase activity, and helicase activity, all of which have p-values

<7E-17. One possibility is that during stress in Arabidopsis G-quartet structure formation is enhanced, which represses expression of these genes allowing metabolism to decrease (see

Discussion).

52

Table 2.6: Functional analysis of genes with at least one G2L1-4 GQS present in the RNA

GO Catb GO termc GQS All genese % GQS genesf p-valueg GO IDa genesd Overrepresented 0003824 MF catalytic activity 2894h 6393i 45% 9E-65 0006468 BP protein aa phosphorylation 506 798 63% 1E-53 0016740 MF transferase activity 1118 2176 51% 2E-49 0016301 MF kinase activity 661 1151 57% 2E-48 0043687 BP post-translational mod. 579 985 59% 2E-46 0005478 MF transporter activity 504 993 51% 1E-19 0000166 MF nucleotide binding 490 980 50% 2E-17 0048856 BP anatomical structure dev. 359 681 53% 5E-17 0007275 BP multicellular organismal dev. 441 871 51% 7E-17 0016020 CC membrane 1011 2266 45% 3E-16 0022414 BP reproductive process 275 502 55% 8E-16

Underrepresented 0000496 MF base pairing 0 631 0% <1E-99 0006412 BP translation 174 1129 15% 4E-51 0010467 BP gene expression 293 1487 20% 1E-41 0003723 MF RNA binding 179 983 18% 7E-30 0000154 BP rRNA modification 1 70 0.01% 3E-9 Provided are overrepresented and underrepresented gene ontologya (GO) ID numbers, bGO categories (Cat), and cGO term for gene products encoded by d pre-mRNA with at least one G2+L1-4 GQS. Included are the number of genes (scored if GQS is in CDS, 5‘UTR, 3‘UTR, or introns) with a GQS that are annotated for the listed GO term, and ethe total number of genes in Arabidopsis with the listed GO term. Also included are fthe percentage of genes with GQS with a given GO term and gthe appropriate p-value, as determined using the BiNGO program. hThe total number of GO-annotated genes with a GQS i in G2L1-4 is 9,097. The total number of GO-annotated genes in A. thaliana is 25,179. Table is sorted in order of increasing p-value. Some GO terms are sub-categories of others. Complete list is provided in Table A.2.

53

Underrepresented GO terms were also determined with BiNGO. Interestingly, genes in a number of GO categories were found to have very significant depletion in G2L1-4 GQS. In particular, genes with GO terms for base pairing, translation, and rRNA modification had p- values for underrepresentation of <1E-99, 4.2E-51, and 3.0E-9, respectively; in the case of base pairing, none of the 631 genes had a GQS. The genes in these activities correspond to non- coding RNAs such as tRNAs, rRNAs, and snoRNAs whose function depends on specific RNA secondary or tertiary structures. We hypothesize that GQS are underrepresented in these RNAs because G-quartet structures would disrupt base pairing and therefore function (see Discussion).

About 14% of the genes in Arabidopsis, or 4,646, are known to have alternative splice variants.89 Environmental conditions and stresses,90, 91 developmental stages92, or tissue localization can produce an alternatively spliced transcript.93 We searched the known alternative splice variants for GQS and found that over 8,000 gene models (including the individual splice variants) have at least one G2 GQS. Moreover, we found 108 genes that have at least one splice variant that contains a GQS and one that either does not have a GQS or one that has a GQS located in a different genic region. For example, CTC1 (Conserved Telomere Mainenance

Component 1) has two splice variants, one without a GQS, and one with a G2 GQS located in

Intron 15. Additionally, SPL10 has 4 splice variants, one of which does not contain a GQS, one which has a GQS in its 5‘UTR and two with GQS in Intron 1. Although we have not found any readily apparent correlation between GQS location and alternative splicing, the possibility remains that GQS play a functional role in the alternative splicing or expression of some genes.

2.4.6 Experimental evidence for G-quadruplex formation

To verify that the GQS identified by bioinformatics actually form GQS, folding and thermal denaturation experiments were performed on select RNA sequences in vitro. We chose

54 short RNA oligonucleotides that correspond to specific GQS folding motifs that are present in specific Arabidopsis genes. The unimolecular nature of the folding transition and the ability to form a G-quadruplex were assessed by native PAGE (Figure A.1). As shown in Figure A.1a, all representative oligonucleotides ran primarily as single bands and in the order expected, with the exception of G2L111 (see below) and G3L444+FLANK (last lane), which likely has a contribution from a fold involving the flanking nucleotides. Also, the ability of a G2 motif to form a G- quadruplex structure is confirmed in Figure A.1b, where adding an increasing number of repeats leads to a faster mobility species once all 4 repeats are reached (lane 4), which is further confirmed by an oligonucleotide of the same length but a G to A single mutation that migrates with normal mobility (lane 5). Quadruplex formation is further confirmed by circular dichroism

(CD) spectra and UV-melts (see below).

CD-detected K+ titrations were used to judge whether a quadruplex formed and to determine ion affinity, and UV-detected thermal denaturations were used to assess quadruplex thermal stability. G-quartets have a unique CD signature that depends on topology: parallel GQS have a positive peak at 265 nm and a negative peak at 240 nm, while antiparallel GQS have a positive peak at 295 nm and a negative peak at 260 nm.1,7

Sequences chosen and their associated gene IDs are provided in Materials and Methods, and relevant thermodynamic parameters are provided in Table 2.7. Circular dichroism spectra for sequences representative of specific folding motifs of interest are provided in Figure 2.4a, normalized to concentration and oligonucleotide length.94 Height of the normalized CD peak gives an indication of population. Peak heights in Figure 2.4a are in the order

G3L221>G2L111>G3L444>G3L444+FLANK>G2L444. A native flanking sequence of 25 5‘-nucleotides was added to the G3L444 oligo (herein notated G3L444 + FLANK) to provide a more biological context and to allow for the possibility of competing secondary structure. This flanking sequence interacted with the nucleotides that would otherwise form the GQS to give a predicted95 free

55

energy of –15.7 kcal/mol. Almost all of the sequences tested appear to have a fully parallel GQS

structure, except for G2L111 which has both parallel and antiparallel character, exemplified by the

shoulder in the CD spectrum at 290-300 nm. This requires further analysis, and evidence for

another structure can be seen in the native gels (Figure A.1).

Table 2.7: Experimental Values for G-quartet formation in Arabidopsis RNA

a + c GQS Motif Gene ID K 1/2 Tm ( C) (mM)b Li+ Na+ K+ G3 L221 At1g07180 8.0 ± 0.7 54 62 >85 G3 L444 At5g53580 42 ± 5 30 45 74 d e G3 L444 + FLANK At5g53580 220 ± 70 ND ND ND

G2 L111 At2g39320 30 ± 2 32 43 >85 d G2 L444 At1g44020 316 ± 1 ND ND 62 Provided are thermodynamic parameters for G-quartet formation for RNA oligonucleotides from representative Arabidopsis genes. See Materials & Methods for full sequences. aGene ID identifies the b + particular gene from the Arabidopsis genome and sequences are provided in Materials and Methods. K 1/2 (K+ concentration needed to fold half the RNA) values were determined by CD titrations and using Eq. 1. cTm (melting temperature) values were determined by UV thermal melts at 100 mM monovalent salt concentration using the chloride salt. dMelts for these oligonucleotides were performed at 1M salt + e concentration owing to their higher K 1/2 values. ND indicates that the Tm value could not be determined due to absence of a well defined folding transition.

To verify GQS formation and assess thermal stability, we performed UV melts on these

RNAs in the background of 100 mM or 1000 mM Li+, Na+, or K+. Data were acquired at 260 nm,

as well as at 295 nm, which gives a quartet-specific inverse melt in which absorbance decreases

with temperature which is associated with unfolding of a G-quadruplex81 (see Figure 2.4c and

2.4d for sample melts). (Inverse melts were observed for G2 sequences as well (data not shown),

which along with the data in Figure A.1b supports their ability to form quadruplexes.) All RNA

GQS were most stable in K+, followed by Na+ and then Li+, as expected for GQS unfolding. The

strong K+ preference of these structures is also illustrated by the drastic increase in Tm of 25 to

+ + 40 C in the presence of K versus Na (Table 2.7). In addition, G2L444 only had a well-defined

+ + + folding transition in K but not Li or Na . Melting temperatures of G3L221 and G2L111

56 oligonucleotides in K+ were especially high, ≥ 85°C (Table 2.7). Overall, these data show at least partial formation of GQS at 100 mM K+ ion concentration (see discussion).

Figure 2.4: Formation of GQS RNA oligonucleotides (a) CD spectra of RNA GQS in 10 mM o LiCacodylate (pH 7.0) and 150 mM KCl at 20 C: G3L221 (black), G2L111 (red), G3L444 (gold), G3L444 + FLANK (green), and G2L444 (blue). The positive peak at 260 nm and the negative peak at 240 nm suggest that the RNA GQS adopt a parallel conformation. The G2L111 RNA most likely has some antiparallel character due to the shoulder in the spectrum extending to 300 nm. + See Materials & Methods for full sequences. (b) Sample K titration. Titration is of G3L444 RNA GQS with KCl additions from 0 mM to 700 mM. The arrows indicate increasing K+ concentrations. Also included are UV thermal detaturation of G3L444 RNA in 100 mM LiCl (black), NaCl (red) and KCl (blue) at (c) 260 nm and (d) 295 nm. Absorbances are normalized to the highest absorbance.

57

2.5 Discussion

In this study, GQS in the Arabidopsis genome were tallied by computational approaches and characterized by in vitro experiments. We categorized G3 and G2 sequences with different loop lengths. The G3 sequences were more common in intergenic than genic regions, especially in the non-transcribed intergenic units, while the G2 sequences were common in genic regions.

Overall, G3 sequences in Arabidopsis were even less common than in the human genome, especially in the promoter region, perhaps because the higher K+ concentrations present during stress in plants provide a negative selective pressure. Within the genic regions, RNA GQS were mostly in the coding region, with 1/3 of all Arabidopsis genes containing at least one G2 motif.

Remarkably, these RNAs were overrepresented in certain classes of genes, especially those with catalytic activity, and underrepresented in other classes of genes, especially those that require base pairing for their function. Introns also had low GQS density. In addition, genes that are differentially regulated by drought stress are significantly more likely to contain a GQS than the genome as a whole. Experiments on representative RNAs from Arabidopsis confirmed that they form G-quartets under physiological conditions.

2.5.1 Formation and function of GQS in DNA and RNA

Formation of GQS in DNA was scored with both G and C patterns, allowing for putative formation of G-quartets in the sense and antisense strands of the genomic DNA. This was done because in vitro studies with self-complementary oligonucleotides have shown that quadruplex structures can exist in equilibrium with the DNA duplex96 and that the i-motif in the C-rich strand might be favored by macromolecular crowding75. In addition, during transcription, a single- stranded transcription bubble of 7-12 nucleotides is formed, breaking the duplex structure and

58 allowing for easier formation of a quadruplex in either strand of DNA.97 These DNA GQS thus have the potential for regulation at the level of transcription.98

In addition to DNA, G-quartet sequences can form in RNA, where they have the potential for transcriptional, translational, or mRNA stability regulation.39-42 The RNA quadruplex structure is expected to exist in equilibrium with alternative structures such as hairpins involving flanking sequence, which can control GQS stability. Equilibrium between the quadruplex and base-paired structures could be controlled by cellular conditions such as K+ concentration.16,18

We report a correlation between GC content and GQS density for 15 plant species, including monocots, dicots, and a moss (Figure A.2). However, the correlation is not maintained when some non-plant eukaryotes such as M. musculus are included, nor when regions of the

Arabidopsis and O. sativa genomes are analyzed separately. The anti-correlation between GC content of the genic and intergenic regions and their G3 GQS density suggests possible evolutionary bias away from these potentially disrupting sequences in the coding sequence in both dicot and monocot species. It is also valid to note that guanine repeats in the genome may be associated with additional biological phenomena. For example, repeats of glycine (GGG and

GGX codons) and valine, alanine, glutamate, arginine, and tryptophan) (XGG or GXG codons) will have a GQS in the mRNA. Thus, GQS regions can have multiple functions.

Our study suggests that one possible function of GQS in Arabidopsis is regulation of large numbers of genes. The most overrepresented GO term was ‗Catalytic Activity‘, with nearly half of the 6,393 gene products annotated for catalysis encoded by genes containing at least one

G2L1-4 motif (Table 2.6). One possibility is that these GQS motifs provide a way to decrease metabolism during stress:99 in response to stressors that result in increases in cytosolic K+ concentrations, these motifs could fold into G-quadruplexes and limit transcription and translation and thereby limit expression of enzymes. In fact, we do find that genes that are differentially expressed, either up- or downregulated when exposed to drought stress, are significantly more

59 likely to contain a GQS sequence. This suggests GQS formation may be one of numerous mechanisms plants use to adjust to changes in environmental conditions. The most underrepresented GO terms were for tRNA, rRNA, and snoRNAs, whose functions depend on proper intra- and intermolecular base pairing. The presence of GQS in these RNAs could interfere with base pairing and proper folding. One intriguing possibility is that absence of GQS could be used as a criterion in searches to identify non-coding RNAs. In addition, under different environmental conditions or in the presence of stressors, GQS formation has the potential to regulate splicing.

The G3 GQS are enriched in the intergenic regions and depleted in the genic regions

(enrichment 0.5), with the shorter loop motif, G3L1-3, depleted even further in the cds (Table 2.2).

Increasing the number of G repeats to G4L1-7 increases the potential stability of the sequence, and increases the intergenic region enrichment value from 1.8 to 2.1 while depleting the enrichment value of the genic regions from 0.5 to 0.3 (data not shown). Given that DNA GQS reported in the literature inhibit transcription 31, the enrichments observed in the non-transcribed regions of the

Arabidopsis genome suggest that stable GQS in these regions might be functioning to repress unwanted transcription. This idea is further supported by GQS distribution within the intergenic region, where the density of G3L1-7 in non-transcribed regions is 8.7-fold higher than in intergenic transcriptional units (Table 2.4). The G3L1-7 GQS are exceptionally stable, having melting temperatures of greater than 85 oC in physiological K+ concentrations, supporting their ability to stay folded and thus potentially block transcription. In contrast, the G2 GQS are enriched in the genic regions and especially the cds. These sequences have lower thermal stability and melting temperatures. One possibility is that these GQS may be more plastic, allowing switching between quadruplex and non-quadruplex structures in response to cellular conditions.

In Arabidopsis under unstressed conditions, the cellular K+ concentration is around 100–

100,101 150 mM. At this concentration, the most stable GQS, such as the G3L221 and G3L444, which

60

+ have K 1/2 values less than 50 mM, have the intrinsic ability to form stable quadruplex structures.

Addition of flanking sequences, however, can modulate quadruplex folding if there is a competing secondary structure, as demonstrated for the G3L444+FLANK RNA oligonucleotide.

Less stable GQS, such as the G2 sequences, will most likely not be fully formed under unstressed conditions (see Table 2.7). However, they may form during water-limiting conditions, where cellular K+ concentrations can reach 700 mM. The potential switch in RNA structure due to cellular K+ concentration increases could potentially affect the RNA secondary structure of a large number of genes, as up to 10,000 genes contain a G2 GQS. The possibility for a change in

RNA structure leading to a change in gene expression as a response to stress is thus present.

2.6 Conclusion

We have found that G-quartet sequences of varying motifs are present in Arabidopsis thaliana. Their distribution varies with sequence motif, with G3L1-7 sequences preferentially located in intergenic regions and G2L1-4 sequences preferentially in genic regions. GQS located in RNA have the potential to regulate transcription and translation, perhaps as modulated by environmental and physiological conditions.

2.7 Acknowledgements

We would like to thank Professors David Lilley and Paul Babitzke for helpful comments and suggestions, and Prof. Yu Zhang for helpful discussions about statistics. This work was supported by the National Science Foundation [MCB-0527102 to P.C.B.]; the National Science

Foundation [MCB-03-45251 to S.M.A and P.C.B.]; and by the Human Frontier Science Program

(HFSP) [RGP0002/2009-C to P.C.B, S.M.A., and F.M.].

61

2.8 References

1. Smargiasso, N., Rosu, F., Hsia, W., Colson, P., Baker, E.S., Bowers, M.T., De Pauw, E. and Gabelica, V., G-quadruplex DNA assemblies: loop length, cation identity, and multimer formation. J. Am. Chem. Soc., 2008. 130: 10208-10216.

2. Morris, M.J. and Basu, S., An unusually stable G-quadruplex within the 5'-UTR of the MT3 matrix metalloproteinase mRNA represses translation in eukaryotic cells. Biochemistry, 2009. 48: 5313-5319.

3. Gellert, M., Lipsett, M.N. and Davies, D.R., Helix formation by guanylic acid. Proc. Natl. Acad. Sci. U.S.A., 1962. 48: 2013-2018.

4. Williamson, J.R., Raghuraman, M.K. and Cech, T.R., Monovalent cation-induced structure of telomeric DNA: the G-quartet model. Cell, 1989. 59: 871-880.

5. Burge, S., Parkinson, G.N., Hazel, P., Todd, A.K. and Neidle, S., Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res., 2006. 34: 5402-5415.

6. Hazel, P., Huppert, J., Balasubramanian, S. and Neidle, S., Loop-length-dependent folding of G-quadruplexes. J. Am. Chem. Soc., 2004. 126: 16405-16415.

7. Paramasivan, S., Rujan, I. and Bolton, P.H., Circular dichroism of quadruplex : applications to structure, cation effects and ligand binding. Methods, 2007. 43: 324-331.

8. Joachimi, A., Benz, A. and Hartig, J.S., A comparison of DNA and RNA quadruplex structures and stabilities. Bioorg. Med. Chem., 2009. 17: 6811-6815.

9. Saenger, W., Principles of nucleic acid structure. Springer Advanced Texts in Chemistry, ed. C.R. Cantor. 1984, New York, NY: Springer-Verlag.

10. Shafer, R.H. and Smirnov, I., Biological aspects of DNA/RNA quadruplexes. Biopolymers, 2000. 56: 209-227.

11. Tang, C.F. and Shafer, R.H., Engineering the quadruplex fold: nucleoside conformation determines both folding topology and molecularity in guanine quadruplexes. J. Am. Chem. Soc., 2006. 128: 5966-5973.

12. Hud, N.V., Smith, F.W., Anet, F.A. and Feigon, J., The selectivity for K+ versus Na+ in DNA quadruplexes is dominated by relative free energies of hydration: a thermodynamic analysis by 1H NMR. Biochemistry, 1996. 35: 15383-15390.

13. Hazel, P., Parkinson, G.N. and Neidle, S., Predictive modelling of topology and loop variations in dimeric DNA quadruplex structures. Nucleic Acids Res., 2006. 34: 2117- 2127.

14. Miyoshi, D., Nakao, A. and Sugimoto, N., Molecular crowding regulates the structural switch of the DNA G-quadruplex. Biochemistry, 2002. 41: 15017-15024.

15. Miyoshi, D., Matsumura, S., Nakano, S. and Sugimoto, N., Duplex dissociation of telomere DNAs induced by molecular crowding. J. Am. Chem. Soc., 2004. 126: 165-169.

62

16. Kumar, N. and Maiti, S., The effect of osmolytes and small molecule on Quadruplex-WC duplex equilibrium: a fluorescence resonance energy transfer study. Nucleic Acids Res., 2005. 33: 6723-6732.

17. Miyoshi, D., Karimata, H. and Sugimoto, N., Hydration regulates thermodynamics of G- quadruplex formation under molecular crowding conditions. J. Am. Chem. Soc., 2006. 128: 7957-7963.

18. Kumar, N. and Maiti, S., Role of molecular crowding in perturbing quadruplex-Watson Crick duplex equilibrium. Nucleic Acids Symp. Ser., 2008: 157-158.

19. Gollnick, P., Babitzke, P., Antson, A. and Yanofsky, C., Complexity in regulation of tryptophan biosynthesis in Bacillus subtilis. Annu. Rev. Genet., 2005. 39: 47-68.

20. Tucker, B.J. and Breaker, R.R., Riboswitches as versatile gene control elements. Curr. Opin. Struct. Biol., 2005. 15: 342-348.

21. Huppert, J.L. and Balasubramanian, S., Prevalence of quadruplexes in the human genome. Nucleic Acids Res., 2005. 33: 2908-2916.

22. Todd, A.K., Johnston, M. and Neidle, S., Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res., 2005. 33: 2901-2907.

23. Azzalin, C.M., Reichenbach, P., Khoriauli, L., Giulotto, E. and Lingner, J., Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends. Science, 2007. 318: 798-801.

24. Huppert, J.L. and Balasubramanian, S., G-quadruplexes in promoters throughout the human genome. Nucleic Acids Res., 2007. 35: 406-413.

25. Huppert, J.L., Bugaut, A., Kumari, S. and Balasubramanian, S., G-quadruplexes: the beginning and end of UTRs. Nucleic Acids Res., 2008. 36: 6260-6268.

26. Yadav, V.K., Abraham, J.K., Mani, P., Kulshrestha, R. and Chowdhury, S., QuadBase: genome-wide database of G4 DNA--occurrence and conservation in human, chimpanzee, mouse and rat promoters and 146 microbes. Nucleic Acids Res., 2008. 36: D381-385.

27. Rawal, P., Kummarasetti, V.B., Ravindran, J., Kumar, N., Halder, K., Sharma, R., Mukerji, M., Das, S.K. and Chowdhury, S., Genome-wide prediction of G4 DNA as regulatory motifs: role in Escherichia coli global regulation. Genome Res., 2006. 16: 644-655.

28. Thakur, R.K., Kumar, P., Halder, K., Verma, A., Kar, A., Parent, J.L., Basundra, R., Kumar, A. and Chowdhury, S., Metastases suppressor NM23-H2 interaction with G- quadruplex DNA within c-MYC promoter nuclease hypersensitive element induces c- MYC expression. Nucleic Acids Res., 2009. 37: 172-183.

29. Borgognone, M., Armas, P. and Calcaterra, N.B., Cellular nucleic-acid-binding protein, a transcriptional enhancer of c-Myc, promotes the formation of parallel G-quadruplexes. Biochem. J., 2010. 428: 491-498.

63

30. Cogoi, S., Paramasivam, M., Membrino, A., Yokoyama, K.K. and Xodo, L.E., The KRAS promoter responds to Myc-associated zinc finger and poly(ADP-ribose) polymerase 1 proteins, which recognize a critical quadruplex-forming GA-element. J. Biol. Chem., 2010. 285: 22003-22016.

31. Siddiqui-Jain, A., Grand, C.L., Bearss, D.J. and Hurley, L.H., Direct evidence for a G- quadruplex in a promoter region and its targeting with a small molecule to repress c- MYC transcription. Proc. Natl. Acad. Sci. U.S.A., 2002. 99: 11593-11598.

32. Qin, Y., Rezler, E.M., Gokhale, V., Sun, D. and Hurley, L.H., Characterization of the G- quadruplexes in the duplex nuclease hypersensitive element of the PDGF-A promoter and modulation of PDGF-A promoter activity by TMPyP4. Nucleic Acids Res., 2007. 35: 7698-7713.

33. Cogoi, S., Paramasivam, M., Filichev, V., Geci, I., Pedersen, E.B. and Xodo, L.E., Identification of a new G-quadruplex motif in the KRAS promoter and design of pyrene- modified G4-decoys with antiproliferative activity in pancreatic cancer cells. J. Med. Chem., 2009. 52: 564-568.

34. Gonzalez, V., Guo, K., Hurley, L. and Sun, D., Identification and characterization of nucleolin as a c-myc G-quadruplex-binding protein. J. Biol. Chem., 2009. 284: 23622- 23635.

35. Paramasivam, M., Membrino, A., Cogoi, S., Fukuda, H., Nakagama, H. and Xodo, L.E., Protein hnRNP A1 and its derivative Up1 unfold quadruplex DNA in the human KRAS promoter: implications for transcription. Nucleic Acids Res., 2009. 37: 2841-2853.

36. Membrino, A., Paramasivam, M., Cogoi, S., Alzeer, J., Luedtke, N.W. and Xodo, L.E., Cellular uptake and binding of guanidine-modified phthalocyanines to KRAS/HRAS G- quadruplexes. Chem. Commun. (Cambridge, U.K.), 2010. 46: 625-627.

37. Du, Z., Zhao, Y. and Li, N., Genome-wide colonization of gene regulatory elements by G4 DNA motifs. Nucleic Acids Res., 2009. 37: 6784-6798.

38. Verma, A., Yadav, V.K., Basundra, R., Kumar, A. and Chowdhury, S., Evidence of genome-wide G4 DNA-mediated gene expression in human cancer cells. Nucleic Acids Res., 2009. 37: 4194-4204.

39. Fernando, H., Reszka, A.P., Huppert, J., Ladame, S., Rankin, S., Venkitaraman, A.R., Neidle, S. and Balasubramanian, S., A conserved quadruplex motif located in a transcription activation site of the human c-kit oncogene. Biochemistry, 2006. 45: 7854- 7860.

40. Kumari, S., Bugaut, A., Huppert, J.L. and Balasubramanian, S., An RNA G-quadruplex in the 5' UTR of the NRAS proto-oncogene modulates translation. Nat. Chem. Biol., 2007. 3: 218-221.

41. Kumari, S., Bugaut, A. and Balasubramanian, S., Position and stability are determining factors for translation repression by an RNA G-quadruplex-forming sequence within the 5' UTR of the NRAS proto-oncogene. Biochemistry, 2008. 47: 12664-12669.

64

42. Arora, A., Dutkiewicz, M., Scaria, V., Hariharan, M., Maiti, S. and Kurreck, J., Inhibition of translation in living eukaryotic cells by an RNA G-quadruplex motif. RNA, 2008. 14: 1290-1296.

43. Christiansen, J., Kofod, M. and Nielsen, F.C., A guanosine quadruplex and two stable hairpins flank a major cleavage site in insulin-like growth factor II mRNA. Nucleic Acids Res., 1994. 22: 5709-5716.

44. Bock, L.C., Griffin, L.C., Latham, J.A., Vermaas, E.H. and Toole, J.J., Selection of single-stranded DNA molecules that bind and inhibit human thrombin. Nature, 1992. 355: 564-566.

45. Macaya, R.F., Schultze, P., Smith, F.W., Roe, J.A. and Feigon, J., Thrombin-binding DNA aptamer forms a unimolecular quadruplex structure in solution. Proc. Natl. Acad. Sci. U.S.A., 1993. 90: 3745-3749.

46. Wang, K.Y., McCurdy, S., Shea, R.G., Swaminathan, S. and Bolton, P.H., A DNA aptamer which binds to and inhibits thrombin exhibits a new structural motif for DNA. Biochemistry, 1993. 32: 1899-1904.

47. Darnell, J.C., Jensen, K.B., Jin, P., Brown, V., Warren, S.T. and Darnell, R.B., Fragile X mental retardation protein targets G quartet mRNAs important for neuronal function. Cell, 2001. 107: 489-499.

48. Schaeffer, C., Bardoni, B., Mandel, J.L., Ehresmann, B., Ehresmann, C. and Moine, H., The fragile X mental retardation protein binds specifically to its mRNA via a purine quartet motif. EMBO J., 2001. 20: 4803-4813.

49. Khateb, S., Weisman-Shomer, P., Hershco-Shani, I., Ludwig, A.L. and Fry, M., The tetraplex (CGG)n destabilizing proteins hnRNP A2 and CBF-A enhance the in vivo translation of fragile X premutation mRNA. Nucleic Acids Res., 2007. 35: 5775-5788.

50. Sundquist, W.I. and Heaphy, S., Evidence for interstrand quadruplex formation in the dimerization of human immunodeficiency virus 1 genomic RNA. Proc. Natl. Acad. Sci. U.S.A., 1993. 90: 3393-3397.

51. Shen, W., Gao, L., Balakrishnan, M. and Bambara, R.A., A recombination hot spot in HIV-1 contains guanosine runs that can form G-quartet structure and promote strand transfer in vitro. J. Biol. Chem., 2009. 284: 33883-33893.

52. Bugaut, A., Jantos, K., Wietor, J.L., Rodriguez, R., Sanders, J.K. and Balasubramanian, S., Exploring the differential recognition of DNA G-quadruplex targets by small molecules using dynamic combinatorial chemistry. Angew. Chem., Int. Ed., 2008. 47: 2677-2680.

53. Monchaud, D. and Teulade-Fichou, M.P., A hitchhiker's guide to G-quadruplex ligands. Org. Biomol. Chem., 2008. 6: 627-636.

54. Kikin, O., Zappala, Z., D'Antonio, L. and Bagga, P.S., GRSDB2 and GRS_UTRdb: databases of quadruplex forming G-rich sequences in pre-mRNAs and mRNAs. Nucleic Acids Res., 2008. 36: D141-148.

65

55. Greenway, H. and Munns, R., Mechanisms of salt tolerance in nonhalophytes. Annu. Rev. Plant Physiol., 1980. 31: 149-190.

56. Zhang, H.X. and Blumwald, E., Transgenic salt-tolerant tomato plants accumulate salt in foliage but not in fruit. Nat. Biotechnol., 2001. 19: 765-768.

57. Swarbreck, D., Wilks, C., Lamesch, P., Berardini, T.Z., Garcia-Hernandez, M., Foerster, H., Li, D., Meyer, T., Muller, R., Ploetz, L., et al., The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res., 2008. 36: D1009- 1014.

58. Vogel, J.P., Garvin, D.F., Mockler, T.C., Schmutz, J., Rokhsar, D., Bevan, M.W., Barry, K., Lucas, S., Harmon-Smith, M., Lail, K., et al., Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature, 2010. 463: 763-768.

59. Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.A., Gocayne, J.D., Amanatides, P.G., Scherer, S.E., Li, P.W., Hoskins, R.A., Galle, R.F., et al., The genome sequence of Drosophila melanogaster. Science, 2000. 287: 2185-2195.

60. Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L., Song, Q., Thelen, J.J., Cheng, J., et al., Genome sequence of the palaeopolyploid soybean. Nature, 2010. 463: 178-183.

61. Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Kato, T., Nakao, M., Sasamoto, S., Watanabe, A., Ono, A., Kawashima, K., et al., Genome structure of the legume, Lotus japonicus. DNA Res., 2008. 15: 227-239.

62. Young, N.D., Cannon, S.B., Sato, S., Kim, D., Cook, D.R., Town, C.D., Roe, B.A. and Tabata, S., Sequencing the genespaces of Medicago truncatula and Lotus japonicus. Plant Physiol., 2005. 137: 1174-1181.

63. Cannon, S.B., Sterck, L., Rombauts, S., Sato, S., Cheung, F., Gouzy, J., Wang, X., Mudge, J., Vasdewani, J., Schiex, T., et al., Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc. Natl. Acad. Sci. U.S.A., 2006. 103: 14959-14964.

64. Wade, C.M., Kulbokas, E.J., 3rd, Kirby, A.W., Zody, M.C., Mullikin, J.C., Lander, E.S., Lindblad-Toh, K. and Daly, M.J., The mosaic structure of variation in the genome. Nature, 2002. 420: 574-578.

65. Opperman, C., M, B. and SA, L., Sequencing and analysis of the Nicotiana tabacum genome. Recent Adv. Tob. Sci., 2007. 33: 5-14.

66. Yu, J., Hu, S., Wang, J., Wong, G.K., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., et al., A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science, 2002. 296: 79-92.

67. Zhao, W., Wang, J., He, X., Huang, X., Jiao, Y., Dai, M., Wei, S., Fu, J., Chen, Y., Ren, X., et al., BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics. Nucleic Acids Res., 2004. 32: D377-382.

66

68. Matsumoto, T., Wu, J.Z., Kanamori, H., Katayose, Y., Fujisawa, M., Namiki, N., Mizuno, H., Yamamoto, K., Antonio, B.A., Baba, T.et al. The map-based sequence of the rice genome. Nature, 2005. 436: 793-800.

69. Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F., Malek, R.L., Lee, Y., Zheng, L., et al., The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res., 2007. 35: D883-887.

70. Tuskan, G.A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., et al., The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 2006. 313: 1596-1604.

71. Rensing, S.A., Lang, D., Zimmer, A.D., Terry, A., Salamov, A., Shapiro, H., Nishiyama, T., Perroud, P.F., Lindquist, E.A., Kamisugi, Y., et al., The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science, 2008. 319: 64- 69.

72. Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., et al., The Sorghum bicolor genome and the diversification of grasses. Nature, 2009. 457: 551-556.

73. Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al., The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 2007. 449: 463-467.

74. Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T.A., et al., The B73 maize genome: complexity, diversity, and dynamics. Science, 2009. 326: 1112-1115.

75. Rajendran, A., Nakano, S. and Sugimoto, N., Molecular crowding of the cosolutes induces an intramolecular i-motif structure of triplet repeat DNA oligomers at neutral pH. Chem. Commun. (Cambridge, U.K.), 2010. 46: 1299-1301.

76. Maere, S., Heymans, K. and Kuiper, M., BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics, 2005. 21: 3448-3449.

77. Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B. and Ideker, T., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res., 2003. 13: 2498-2504.

78. Cline, M.S., Smoot, M., Cerami, E., Kuchinsky, A., Landys, N., Workman, C., Christmas, R., Avila-Campilo, I., Creech, M., Gross, B., et al., Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc., 2007. 2: 2366-2382.

79. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 2000. 25: 25-29.

67

80. Sosnick, T.R., Characterization of tertiary folding of RNA by circular dichroism and urea, in Current protocols in nucleic acid chemistry / edited by Serge L. Beaucage ... [et al. 2001:11.15.11-11.15.10.

81. Mergny, J.L., Phan, A.T. and Lacroix, L., Following G-quartet formation by UV- spectroscopy. FEBS Lett., 1998. 435: 74-78.

82. Wieland, M. and Hartig, J.S., RNA quadruplex-based modulation of gene expression. Chem. Biol., 2007. 14: 757-763.

83. Simoens, C.R., Gielen, J., Van Montagu, M. and Inze, D., Characterization of highly repetitive sequences of Arabidopsis thaliana. Nucleic Acids Res., 1988. 16: 6753-6766.

84. Richards, E.J., Goodman, H.M. and Ausubel, F.M., The centromere region of Arabidopsis thaliana contains telomere-similar sequences. Nucleic Acids Res., 1991. 19: 3351-3357.

85. Lee, C., Sasi, R. and Lin, C.C., Interstitial localization of telomeric DNA sequences in the Indian muntjac chromosomes: further evidence for tandem chromosome fusions in the karyotypic evolution of the Asian muntjacs. Cytogenet. Cell Genet., 1993. 63: 156-159.

86. Matsui, A., Ishida, J., Morosawa, T., Mochizuki, Y., Kaminuma, E., Endo, T.A., Okamoto, M., Nambara, E., Nakajima, M., Kawashima, M., et al., Arabidopsis transcriptome analysis under drought, cold, high-salinity and ABA treatment conditions using a tiling array. Plant Cell Physiol., 2008. 49: 1135-1149.

87. Yamada, K., Lim, J., Dale, J.M., Chen, H., Shinn, P., Palm, C.J., Southwick, A.M., Wu, H.C., Kim, C., Nguyen, M., et al., Empirical analysis of transcriptional activity in the Arabidopsis genome. Science, 2003. 302: 842-846.

88. Halder, K., Wieland, M. and Hartig, J.S., Predictable suppression of gene expression by 5'-UTR-based RNA quadruplexes. Nucleic Acids Res., 2009. 37: 6811-6817.

89. English, A.C., Patel, K.S. and Loraine, A.E., Prevalence of alternative splicing choices in Arabidopsis thaliana. BMC Plant Biol., 2010. 10: 102.

90. Lazar, G. and Goodman, H.M., The Arabidopsis splicing factor SR1 is regulated by alternative splicing. Plant Mol. Biol., 2000. 42: 571-581.

91. Palusa, S.G., Ali, G.S. and Reddy, A.S., Alternative splicing of pre-mRNAs of Arabidopsis serine/arginine-rich proteins: regulation by hormones and stresses. Plant J., 2007. 49: 1091-1107.

92. Eckardt, N.A., Alternative splicing and the control of flowering time. Plant Cell, 2002. 14: 743-747.

93. Reddy, A.S., Alternative splicing of pre-messenger RNAs in plants in the genomic era. Annual review of plant biology, 2007. 58: 267-294.

94. Cantor, C.R. and Schimmel, P.R., Biophysical chemistry, part II: Techniques for the study of biological structure and function. Their Biophysical Chemistry Series. Vol. 2. 1980, San Francisco, CA: W.H. Freeman.

68

95. Zuker, M., Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 2003. 31: 3406-3415.

96. Deng, H. and Braunlin, W.H., Duplex to quadruplex equilibrium of the self- complementary oligonucleotide d(GGGGCCCC). Biopolymers, 1995. 35: 677-681.

97. Pal, M., Ponticelli, A.S. and Luse, D.S., The role of the transcription bubble and TFIIB in promoter clearance by RNA polymerase II. Mol. Cell, 2005. 19: 101-110.

98. Yonaha, M. and Proudfoot, N.J., Specific transcriptional pausing activates polyadenylation in a coupled in vitro system. Mol. Cell, 1999. 3: 593-600.

99. Achard, P., Cheng, H., De Grauwe, L., Decat, J., Schoutteten, H., Moritz, T., Van Der Straeten, D., Peng, J. and Harberd, N.P., Integration of plant responses to environmentally activated phytohormonal signals. Science, 2006. 311: 91-94.

100. Leigh, R.A. and Wyn Jones, R.G., A hypothesis relating critical potassium concentrations for growth to the distribution and functions of this ion in the plant cell. New Phytol., 1984. 97: 1-13.

101. Leigh, R.A., Potassium homeostasis and membrane transport. J. Plant Nutr. Soil Sci., 2001. 164: 193-198.

69

Chapter 3

Potential for a binary gene response: RNA G-quadruplexes with fewer quartets fold with higher cooperativity

3.1 Abstract

Numerous genes have been shown to be regulated by conformational changes in RNA structures. Among these structures is the guanine quadruplex, which generally represses gene expression. Cooperativity of folding of the RNA affects how sensitive formation of the structure is to changing cellular conditions, which could be especially important in plants wherein intracellular potassium ion concentration often increases during cellular stress. I show here that

RNA G-quadruplex sequences (GQS) with tracts containing 3 or more Gs (G3) have a modest dependence on K+ concentration, fold with no or even negative cooperativity, and are associated with populated intermediate folding states. In contrast, GQS with tracts containing only 2 Gs

(G2) have a steeper dependence on K+ concentration for folding and fold with positive cooperativity without significantly populating intermediate states. In plant cells, we postulate that the more stable G3 sequences fold under unstressed conditions; however, the less stable G2 sequences are anticipated to have an overall favorable folding free energy only at the higher K+ concentrations found during stress where they are predicted to respond sharply to changes in K+ concentration.

3.2 Cooperativity of RNA GQS folding

A significant number of G-quadruplex sequences (GQS) are found in non-telomeric regions of genomes of many different organisms. Recent studies have explored these sequences

70 in humans,1,2 chimps, mouse,3,4 prokaryotes including E. coli,4,5 S. cerevisiae,6 and the model plant species, Arabidopsis thaliana.7 The potential biological functions of GQS in humans include regulation of transcription by GQS in promoter regions8 and regulation of translation by

RNA GQS in 5‘UTR regions of genes.9-13 Plants are of particular interest for study of GQS, as intracellular K+ concentrations can increase from 100 to 700 mM under drought stress conditions.14-16 However, the function of GQS in plant species, including binding affinities and quartet folding cooperativity, have yet to be elucidated.

Sequence stability and thermodynamics of DNA GQS folding with K+ and Na+ have been investigated17-28 but there has been comparatively little research on RNA GQS. Studies of RNA

GQS thermostability have shown that RNA GQS with short loops are more stable in K+ than equivalent DNA sequences, and RNA GQS with long loops are less stable than DNA GQS, most likely because of the parallel topology of RNA GQS.29 Additionally, RNA GQS with three G- tracts (G3) have a higher stability than those with two G-tracts (G2) with equivalent loop sequences.10,29 While DNA GQS can fold into at least six possible topologies of antiparallel and parallel structures, RNA GQS fold nearly exclusively into a propeller-type parallel topology in which the bases are all in the anti conformation.24,29 Circular dichrosim (CD) spectroscopy,

NMR, and x-ray crystallography data confirm that model RNA oligonucleotides and long telomeric RNA prefer the parallel conformation, avoiding the more restrictive antiparallel structure that requires syn base conformations.10,29,30

Both RNA and DNA GQS are stabilized by K+ and Na+ ions, although K+ ions are more effective because of their smaller penalty for dehydration. Dehydrated K+ ions bind between quartets in a quadruplex structure, adopting an almost-octahedral coordination, while dehydrated

Na+ bind in the same plane as a quartet.18 As such, it is possible for a G3 GQS to bind two K+ ions or three Na+ ions, and for a G2 GQS to bind one K+ ion or two Na+ ions, not including potential interactions with the loop regions.20,31 The thermodynamics of GQS folding and ion

71 binding have been studied in some detail with DNA oligonucleotides and the number of ions released from a DNA GQS upon unfolding has been calculated from UV denaturation profiles and differential scanning calorimetry (DSC): Olson and coworkers concluded that G2 and G3

DNA GQS structures release more K+ ions than expected from binding of the phosphates alone,32

+ 31 and studies of d(G3T)4 show that three K ions bind to the structure, although Rachwal and coworkers suggest that GQS with short loops fold with about two K+ ions.20

It has been found that in some instances DNA GQS do not fold in a simple two-state fashion. If a transition is not two-state, it becomes difficult to determine the number of ions released per GQS structure upon unfolding, especially if alternative folds associate with a different number of ions than the fully folded structure. Studies of GQS folding pathways have demonstrated that DNA sequences can fold into multiple conformations and that some proceed through a folding intermediate. Gray and coworkers analyzed d[GGG(TTAGGG)3] and similar oligonucleotides, and found one or more intermediate structures in the folding pathway upon folding with K+.23 Additionally, NMR studies of the DNA thrombin binding aptamer (TBA) show that it adopts an antiparallel chair-type structure with a visible folding intermediate with one

K+ ion bound, before binding of the second K+.33

Despite the above advances in DNA GQS folding, there has been no systematic investigation of the cooperativity of RNA GQS folding with respect to ion concentration.

Because of the aforementioned propensity of RNA to form parallel structures, there is less diversity in possible folds. Propeller-type parallel structures are not prone to ion binding in loop regions, as the loops are not near the end of the quartets.20 We have investigated RNA G- quadruplex folding as a function of G-tract length for various loop sequences using circular dichroism (CD), non-denaturing polyacrylamide gel electrophoresis, and RNase T1 protection assays. I show herein that G3 sequences fold through intermediate structures and exhibit negative

72 cooperativity in K+ binding; in contrast, there is no evidence for intermediate structure formation with most G2 sequences.

We first used CD to analyze the dependence of RNA GQS folding on K+ concentration

(Figure 3.1). Titrations were performed on model RNA oligonucleotides chosen from

Arabidopsis genes encoding RNAs with G3 or G2 motifs with varying loop sequences. A list of oligonucleotides used can be found in Section B.1. Oligonucleotides are described as G2 or G3 followed by loop length or sequence. For example, the sequence (G2A2)3G2 is herein referred to

+ as G2A2. Data were fit to a two-state Hill equation to obtain K 1/2 and Hill coefficient (n) values.

+ A lower K 1/2 value and higher n value corresponds to a more stable GQS according to the

+ + 34,35 equation: ΔGobs=nRTln(K /K 1/2). This equation does not correspond to the intrinsic free energy for folding but an observed free energy in K+ that couples folding and ion binding free

Figure 3.1: Fraction folded plots for representative G3 and G2 GQS. Data were fit to a two state + Hill equation. K 1/2 and n values are provided in Table 3.1.

73 energies.36 It is the observed free energy rather than the intrinsic free energy that is relevant to

+ biological function. For the RNA GQS tested herein, K 1/2 values ranged from micromolar to high millimolar, corresponding to a wide range of GQS stabilities (Table 3.1). In general, given

+ identical loop sequences, G3 GQS have lower K 1/2 values by ~ 15-fold than G2s.

Sequences with G3 had n values around 1.0 or less, whereas G2 sequences had n values around 2 or greater for all sequences analyzed (Table 3.1). For instance, G3 oligonucleotides

G3N1-2, G3A2, and G3AUA fold with n values of 0.69 0.03, 0.59 0.08, and 0.85 0.04, respectively, and the n value for G3N4 is 1.1 0.1. G2 oligonucleotides G2A2, G2AUA, and

G2N4 have significantly higher n values of 2.5 0.4, 2.7 0.1, and 2.2 0.4. A larger Hill coefficient indicates a steeper dependence of folding on K+ concentration which signifies an increase in folding cooperativity (Figure 1). Lower Hill coefficients, on the other hand, reflect no folding cooperativity, or negative folding cooperativity possibly due to the presence of intermediate structures. The difference in Hill coefficients can be seen in the different slopes of the binding curves shown in Figure 1 (as well as Figures B.1 and B.2). For instance, in Figure

3.1, the G2 oligonucleotides G2A2 (purple), G2AUA (blue) and G2N4 (green) have a much steeper folding transition than the

Table 3.1: GQS folding parameters

+ GQS Motif K 1/2 (mM) n ΔGobs, 100 mM (kcal/mol) G3 A (tr 1) 0.003 0.0007 1.0 0.1 -6.2 0.9 G3 A (tr 2) 24.7 2.1 1.5 0.2 -1.2 0.1 G3 N1-2 1.4 0.7 0.69 0.03 -1.0 0.1 G3 A2 0.73 0.08 0.59 0.08 -1.7 0.2 G3 AUA 3.98 0.82 0.85 0.04 -1.6 0.1 G3 N4 43.8 2.5 1.1 0.1 -0.6 0.1 G2 A 0.80 0.18 1.7 0.4 -1.2 0.3 G2 A2 13.8 1.0 2.5 0.4 -2.9 0.4 G2 AUA 117.4 27.8 2.7 0.1 -0.25 0.4 G2 N4 314.6 2.7 2.2 0.4 +1.5 0.3

74

G3 oligonucleotides G3A2 (red), G3AUA (orange) or G3N4 (yellow). For some sequences, two transitions were clearly visible. For example, for G3A, we were able to isolate two separate

+ + + folding transitions, one with a K 1/2 around 3 µM K and the other around 25 mM K (Table 3.1 and Figure B.3). Clear identification of folding intermediates in this instance supports the interpretation that the lower n values generally found in G3 transitions might be due to folding intermediates.

In general, low folding cooperativity is associated with the presence of populated intermediates. Indeed, a recent communication from the Herschlag lab demonstrated that sample heterogeneity, which could include intermediate structures, decreases the bulk cooperativity parameter when compared to single molecule studies.37 To investigate the Hill coefficient trend further, we used non-denaturing (native) gel analysis to visualize any intermediate structures. We evaluated RNA GQS using native gels with K+ concentrations ranging 0 to 100 mM (Figure 3.2 and B.4). For G3 oligos, native gel traces showed a large number of bands per lane between the unfolded (no K+) and fully or mostly folded (100 mM) bands. For example, at no K+ and 100 mM K+ concentrations, G3A2 traces show two main bands corresponding to unfolded and fully or mostly folded structures, respectively. However, at intermediate K+ concentrations of 1 mM and

10 mM K+, intermediate peaks appear between the two main peaks, most likely corresponding to intermediate structures. In contrast, most G2 oligos exhibit only individual bands with a lower

K+-dependent change in mobility compared to the G3 GQS. For example, G2A2 traces have sharp, well resolved bands that decrease in mobility with K+ concentration. These differences in native gels results between the G2 and G3 sequences could indicate a lack of intermediate structures in the folding transition for the G2s. The one exception is G2A, which does show evidence for intermediate structures.

75

Figure 3.2: Native gels of RNA GQS oligonucleotides. (a) G3A2 and G2A2 native gel mobility traces for different K+ concentration gels, normalized to G2L1mut mobility. (b) Native gel images at different K+ for G3A (1), G2N1-2 (2), G3A2 (3), G2AUA (4), G3N4 (5), G3N4flank (6), G2L1mut (7), G2A (8), G2A2 (9), G2UA (10), G2AUA (11), G2N4 (12).

76 Using RNase T1 protection assays, we investigated the structure of the folding intermediates on a nucleotide level (Figure 3.3). With G2 GQS RNAs, no protection was observed at low K+, however, Gs involved in the GQS structure were protected at higher K+

+ + concentrations. The larger the K 1/2 value determined from CD titrations, the greater the K concentration required to fully form a GQS structure and thus protect the G residues (not shown).

Most G residues were protected at high K+ concentrations with G3N4 (Figure 3.3). At least one

G (nt 17, open symbol) showed preferential protection at 10 mM K+. Adiditonally, a loop G residue (nt 5) showed variable protection with increasing K+ concentration: cleavage is seen at 1 mM and 10 mM K+ and further protection at high K+. This further supports the hypothesis that there are intermediate structures in the G3 GQS folding pathway.

Figure 3.3: RNase T1 protection assay of (a) G2A2 and (b) G3N4. Black arrows indicate protected nucleotides. The open arrow indicates position of differential proction. The red arrow indicates decreased protection.

77 A simple folding model (Figure 3.4) illustrates possible folding schemes for G2 and G3

GQS. Based on native gels and T1 protection assays, G3 GQS are proposed to adopt intermediate structures. In one possible scenario, a K+ ion can bind between the upper two or lower two quartets. This would leave one exterior quartet in a more flexible conformation, thus allowing for cleavage at the exterior Gs in the T1 protection assay described above. A given GQS could be largely trapped in the intermediate structure, with the binding of the second K+ ion hindered by electrostatic repulsion. Similar behavior has been noted of potassium ion channels.38 At higher

K+ concentrations, a second ion binds and the GQS is fully formed and all residues are protected from cleavage by RNase T1. Conversely, G2 GQS do not seem to fold through intermediate structures, perhaps requiring only one K+ ion to bind between the quartets to promote folding.

Figure 3.4: Simple model for RNA GQS folding for (a) G3 GQS and (b) G2 GQS. Blue circles represent K+ ions. Gray boxes indicate a formed quartet.

78 The GQS samples for K+ titrations were prepared by first dialyzing against LiCl in order to neutralize the backbone but prevent GQS folding, similar to previously described7. Thus questions arise as to the physical meaning, if any, of the Hill coefficients arrived at herein. It is clear that the Hill coefficient for folding does not reflect ions bound in the quartets. For example, the Hill coefficient for the G2 sequences is 2.3 to 2.7, yet only one ion can bind. It is possible that the value reflects all ions taken up in the folding transition, which could also include loop- bound ions, and diffuse ions although this should be accomplished by binding or exchange of Li+.

In any case, the Hill equation provides a phenomenologically and physiologically important description of how the RNA folds in the presence of increasing K+; in other words, it describes

+ where the folding transition is centered (i.e. K 1/2 value) and how steep the transition is with respect to K+ concentration (i.e. n value).

During plant stress, K+ concentrations within plant cells can increase from around100 mM to as high as ~700 mM.14,15,39 To visualize the dependence of free energy on K+ concentration, we plotted the observed free energy of folding (ΔGobs in kcal/mol) as a function of

+ + K concentration (Figure 3.5). The x-intercept for each line is the K 1/2 value. Under well- watered conditions when cellular K+ concentration is between 75-125 mM,15,39 the most stable

+ GQS sequences, such as the G3A2, G3N1-2 and G3N4, which have K 1/2 values less than 50 mM, have the intrinsic ability to begin to form stable quadruplex structures, illustrated by having negative ΔGobs under ‗normal‘ conditions (Figure 3.5). Less stable GQS, such as the G2 sequences, on the other hand, will not be appreciably formed under such conditions. They may, however, form during water-limiting conditions, when cellular K+ concentrations can reach 700 mM, illustrated by their crossing the dashed line of zero free energy in the ‗drought stress‘ range

(Figure 3.5). In addition, because of the higher Hill coefficients of the G2 GQS, they fold with a steeper K+ dependence and the relative order of GQS stabilities changes with K+ concentration,

79 illustrated by their steeper slopes in Figure 3.5. Under the higher K+ concentrations, the G2 sequences become

Figure 3.5: Free energy as a function of K+ concentration. ΔG of RNA GQS oligonucleotides with increasing K+ concentrations as calculated from CD K+ titrations. Red: G2 GQS, Black: G3 GQS. The light gray shading represents the K+ concentrations that typically prevail in plant cells under unstressed conditions and the dark gray shading represents K+ concentrations under water stress (200-700 mM). The dashed line at ΔG = 0 separates strongly folding (ΔG < 0) and unfolding (ΔG > 0) conditions.

particularly stable: G2A2 becomes the most stable of all sequences shown and G2N4 begins to fold. The potential switch in RNA structure due to cellular K+ concentration increases could potentially affect the RNA secondary structure of genes with GQS. The possibility for a change in RNA structure leading to a change in gene expression as a response to stress is thus present.

Moreover, the more digital nature of the G2 sequences has general implications for computation with DNA and for engineering genetic circuits.40

80 3.3 Acknowledgements

We would like to thank Prof. Scott Showalter for helpful discussions. This study was supported by the National Science Foundation grant MCB-0527102 to P.C.B, the National

Science Foundation grant MCB-03-45251 to S.M.A. and P.C.B., and the Human Frontier Science

Program (HFSP) grant RGP0002/2009-C to P.C.B. and S.M.A.

3.4 References

1. Huppert, J.L. and Balasubramanian, S., Prevalence of quadruplexes in the human genome. Nucleic Acids Res., 2005. 33: 2908-2916. 2. Todd, A.K., Johnston, M. and Neidle, S., Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res., 2005. 33: 2901-2907. 3. Kikin, O., Zappala, Z., D'Antonio, L. and Bagga, P.S., GRSDB2 and GRS_UTRdb: databases of quadruplex forming G-rich sequences in pre-mRNAs and mRNAs. Nucleic Acids Res., 2008. 36: D141-148. 4. Yadav, V.K., Abraham, J.K., Mani, P., Kulshrestha, R. and Chowdhury, S., QuadBase: genome-wide database of G4 DNA--occurrence and conservation in human, chimpanzee, mouse and rat promoters and 146 microbes. Nucleic Acids Res., 2008. 36: D381-385. 5. Rawal, P., Kummarasetti, V.B., Ravindran, J., Kumar, N., Halder, K., Sharma, R., Mukerji, M., Das, S.K. and Chowdhury, S., Genome-wide prediction of G4 DNA as regulatory motifs: role in Escherichia coli global regulation. Genome Res., 2006. 16: 644-655. 6. Hershman, S.G., Chen, Q., Lee, J.Y., Kozak, M.L., Yue, P., Wang, L.S. and Johnson, F.B., Genomic distribution and functional analyses of potential G-quadruplex-forming sequences in Saccharomyces cerevisiae. Nucleic Acids Res., 2008. 36: 144-156. 7. Mullen, M.A., Olson, K.J., Dallaire, P., Major, F., Assmann, S.M. and Bevilacqua, P.C., RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: prevalence and possible functional roles. Nucleic Acids Res., 2010. 38: 8149-8163. 8. Qin, Y. and Hurley, L.H., Structures, folding patterns, and functions of intramolecular DNA G-quadruplexes found in eukaryotic promoter regions. Biochimie, 2008. 90: 1149- 1171. 9. Kumari, S., Bugaut, A., Huppert, J.L. and Balasubramanian, S., An RNA G-quadruplex in the 5' UTR of the NRAS proto-oncogene modulates translation. Nat. Chem. Biol., 2007. 3: 218-221.

81 10. Wieland, M. and Hartig, J.S., RNA quadruplex-based modulation of gene expression. Chem. Biol., 2007. 14: 757-763. 11. Arora, A., Dutkiewicz, M., Scaria, V., Hariharan, M., Maiti, S. and Kurreck, J., Inhibition of translation in living eukaryotic cells by an RNA G-quadruplex motif. RNA, 2008. 14: 1290-1296. 12. Balkwill, G.D., Derecka, K., Garner, T.P., Hodgman, C., Flint, A.P. and Searle, M.S., Repression of translation of human estrogen receptor alpha by G-quadruplex formation. Biochemistry, 2009. 48: 11487-11495. 13. Morris, M.J. and Basu, S., An unusually stable G-quadruplex within the 5'-UTR of the MT3 matrix metalloproteinase mRNA represses translation in eukaryotic cells. Biochemistry, 2009. 48: 5313-5319. 14. Greenway, H. and Munns, R., Mechanisms of salt tolerance in nonhalophytes. Annu. Rev. Plant Physiol., 1980. 31: 149-190. 15. Walker, D.J., Leigh, R.A. and Miller, A.J., Potassium homeostasis in vacuolate plant cells. Proc. Natl. Acad. Sci. U.S.A., 1996. 93: 10510-10514. 16. Leigh, R.A., Potassium homeostasis and membrane transport. J. Plant Nutr. Soil Sci., 2001. 164: 193-198. 17. Hazel, P., Huppert, J., Balasubramanian, S. and Neidle, S., Loop-length-dependent folding of G-quadruplexes. J. Am. Chem. Soc., 2004. 126: 16405-16415. 18. Burge, S., Parkinson, G.N., Hazel, P., Todd, A.K. and Neidle, S., Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res., 2006. 34: 5402-5415. 19. Rachwal, P.A., Brown, T. and Fox, K.R., Sequence effects of single base loops in intramolecular quadruplex DNA. FEBS Lett., 2007. 581: 1657-1660. 20. Rachwal, P.A., Brown, T. and Fox, K.R., Effect of G-tract length on the topology and stability of intramolecular DNA quadruplexes. Biochemistry, 2007. 46: 3036-3044. 21. Gray, R.D. and Chaires, J.B., Kinetics and mechanism of K+- and Na+-induced folding of models of human telomeric DNA into G-quadruplex structures. Nucleic Acids Res., 2008. 36: 4191-4203. 22. Guedin, A., De Cian, A., Gros, J., Lacroix, L. and Mergny, J.L., Sequence effects in single-base loops for quadruplexes. Biochimie, 2008. 90: 686-696. 23. Kumar, N. and Maiti, S., A thermodynamic overview of naturally occurring intramolecular DNA quadruplexes. Nucleic Acids Res., 2008. 36: 5610-5622. 24. Lane, A.N., Chaires, J.B., Gray, R.D. and Trent, J.O., Stability and kinetics of G- quadruplex structures. Nucleic Acids Res., 2008. 36: 5482-5515. 25. Smargiasso, N., Rosu, F., Hsia, W., Colson, P., Baker, E.S., Bowers, M.T., De Pauw, E. and Gabelica, V., G-quadruplex DNA assemblies: loop length, cation identity, and multimer formation. J. Am. Chem. Soc., 2008. 130: 10208-10216.

82 26. Fujimoto, T., Miyoshi, D., Tateishi-Karimata, H. and Sugimoto, N., Thermal stability and hydration state of DNA G-quadruplex regulated by loop regions. Nucleic Acids Symp. Ser., 2009: 237-238. 27. Guedin, A., Gros, J., Alberti, P. and Mergny, J.L., How long is too long? Effects of loop size on G-quadruplex stability. Nucleic Acids Res., 2010. 38: 7858-7868. 28. Cang, X., Sponer, J. and Cheatham, T.E., 3rd, Explaining the varied glycosidic conformational, G-tract length and sequence preferences for anti-parallel G- quadruplexes. Nucleic Acids Res., 2011. 29. Joachimi, A., Benz, A. and Hartig, J.S., A comparison of DNA and RNA quadruplex structures and stabilities. Bioorg. Med. Chem., 2009. 17: 6811-6815. 30. Phan, A.T., Human telomeric G-quadruplex: structures of DNA and RNA sequences. FEBS J., 2009. 277: 1107-1117. 31. Jing, N., Rando, R.F., Pommier, Y. and Hogan, M.E., Ion selective folding of loop domains in a potent anti-HIV oligonucleotide. Biochemistry, 1997. 36: 12498-12505. 32. Olsen, C.M., Gmeiner, W.H. and Marky, L.A., Unfolding of G-quadruplexes: energetic, and ion and water contributions of G-quartet stacking. J. Phys. Chem. B, 2006. 110: 6962-6969. 33. Marathias, V.M. and Bolton, P.H., Structures of the potassium-saturated, 2:1, and intermediate, 1:1, forms of a quadruplex DNA. Nucleic Acids Res., 2000. 28: 1969-1977. 34. Fang, X., Pan, T. and Sosnick, T.R., A thermodynamic framework and cooperativity in the tertiary folding of a Mg2+-dependent ribozyme. Biochemistry, 1999. 38: 16840- 16846. 35. Silverman, S.K. and Cech, T.R., Energetics and cooperativity of tertiary hydrogen bonds in RNA structure. Biochemistry, 1999. 38: 8691-8702. 36. Draper, D.E., A guide to ions and RNA structure. RNA, 2004. 10: 335-343. 37. Solomatin, S.V., Greenfeld, M. and Herschlag, D., Implications of molecular heterogeneity for the cooperativity of biological macromolecules. Nat. Struct. Mol. Biol., 2011. 18: 732-734. 38. Doyle, D.A., Morais Cabral, J., Pfuetzner, R.A., Kuo, A., Gulbis, J.M., Cohen, S.L., Chait, B.T. and MacKinnon, R., The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science, 1998. 280: 69-77. 39. Leigh, R.A., Potassium homeostasis and membrane transport. J. Plant Nutr. Soil Sci., 2001. 164: 193-198. 40. Fu, P., Biomolecular computing: is it ready to take off? Biotechnol. J. 2007. 2: 91-101.

83

Chapter 4

Effects of heavy metal ions on G-quadruplex folding and copper ion transport genes in Arabidopsis thaliana

4.1 Abstract

Heavy metals such as Cu, Zn, and Fe are essential for plant metabolism and normal function, however they are toxic in high concentrations, such as experienced by plants growing in soils with heavy metal ion pollution. Using circular dichroism, we show that Cu2+ interacts with

DNA G-quadruplex structures more strongly than other transition metal ions, and destroys the quadruplex structure. Additionally, we show that the expression of some copper transport genes that contain a G-quadruplex motif increases when grown on media supplemented with Cu2+. I explore the hypothesis that G-quadruplex structures are involved in the regulation of copper transport gene expression.

4.2 Introduction

4.2.1 Heavy metal ion toxicity in plants

Heavy metal ion pollution is prevalent in all areas of the world and is caused by industrial operations (mining, smelting, refining, chemical and petrochemical manufacturing), solvent or fuel spills, agriculture (pesticides), and military operations (explosives, chemical weapons).1

Heavy metal contaminates do not degrade over time, and can accumulate in plants and other higher organisms, where they can have toxic effects. Some heavy metal ions, including Cu, Fe,

Mn and Zn, are essential at low concentrations for plant metabolism and growth. These metals

84 function as protein cofactors for enzymes involved in photosynthesis, respiration, and N2 fixation; they are often involved in redox reactions. However, at high concentrations, these essential elements can be toxic. Uptake of heavy metal ions into plant tissues is not limited to the essential elements. Non-essential elements, such as Cd, Al, Pb and Hg can accumulate in plants and are toxic even at low concentrations.1

Copper is one of the essential elements, is required for photosynthesis in the enzyme plastocyanin, and is also found in cytochrome oxidase and superoxide dismutase.2 Experiments in yeast cells showed that while the total concentration of copper in cells is around 70 µM, the free concentration of copper ions in the cell is around 1x10-18 M, which means that all of the copper ions are bound to copper binding proteins, chaperones, and chelators in the cell (see below).3 A deficiency in Cu can lead to necrotic spots on the leaves, loss of leaves, and plant death. However, when plants are grown on soil and/or media with an excess of Cu2+, physiological effects on plants include stunted growth, chlorosis, and inhibition of root growth.4-6

Additionally, there is membrane damage,7 cell death,8 and inhibition of photosynthesis.9

4.2.2 Heavy metal ion transport in plants

Uptake of nutrients from soil is an essential mechanism for copper uptake into the plant.

In soils, copper is mostly found in the 2+ oxidation state, and is reduced to 1+ before import into the plant.2,10 In dicots, such as Arabidopsis thaliana, Cu2+ is reduced and transported to the epidermis by COPT1.2 In order for metal ions such as copper to move to other tissues, they must be loaded into the xylem, by FRD3 and the HMA family proteins, to be transported via the transpiration pathway to shoot tissues, oftentimes chelated to nicotianamine (NA) or a similar chelator.2 Some plants can transport ions through the phloem, where the YSL family of proteins is suggested as being responsible for loading and transport of heavy metal ions.2 The COPT,

85 IRT, and ZIP families of proteins have also been implicated in transport of heavy metal ions.2,11

A model for intercellular heavy metal ion transport can be seen in Figure 4.1b.

Intracelullar transport of copper ions is mediated by copper binding proteins including P- type ATPases, metallothioneins,12 and chaperones, such as copper chaperone (CCH).13,14 The pathways for heavy metal ion transport in plants are incompletely known and while some of the pathways and transport proteins have been identified, the mechanisms of transport regulation are mostly undiscovered. Some of the intracellular heavy metal ion transport pathways can be seen in Figure 4.1a. Cu2+ uptake into the cytosol from the apoplast is controlled by copper transporters

CTR and COPT, and by copper chaperone CCH through the plasmodesmata.15 Once in the cytosol, type IV P-type ATPases--heavy metal transport proteins found in prokaryotes, archea, plants and higher eukaryotes16--transport Cu, Zn, Cd, and Pb ions across membranes of organelles.17

In plants, P-type ATPase families HMA and PAA are implicated in transporting important and toxic metals. HMAs 2 through 4 are thought to transport Cu2+, Zn2+, Cd2+, or Pb2+, and HMAs 5 through 8 are implicated in Cu+/Ag+ transport.16,18-20 HMA1 is involved in copper transport to the chloroplast across the chloroplast envelope, HMA3 to the vacuole, HMA5 to the

Golgi apparatus and HMA7 to the nucleus.2,21-23 HMA6/PAA1 and HMA8/PAA2 are both required for copper delivery to plastocyanin in the chloroplast, and they function sequentially.

HMA6 transports copper across the plastid envelope inner membrane where it can be transferred through the thylakoid membrane to plastocyanin by HMA8. Alternatively, copper in the stroma can be complexed with copper chaperone for superoxide dismutase (CCS) and targeted to copper/zinc superoxide dismutase 2 (CSD2).24,25

86

87

Figure 4.1: Copper/heavy metal ion transport in Arabidopsis thaliana cell (a) and whole plant (b). Copper/heavy metal ion transport proteins are shown in red, chaperones in green, and other copper binding proteins in blue. Adapted from 2,22. In the cell (a), the HMA family is heavily involved in transport across membranes of different organelles. Copper chapterones CCH and CCS assist in copper transport between proteins. Transport pathways in the whole plant (b) are not as well characterized. Uptake of copper ions by COPT1 across the epidermis is well studied2 as are transport proteins that load the xylem (HMA5 for Cu+, HMA2 and HMA4 for Zn2+, and FRD3 for citrate). However, unloading from the xylem to other tissues and loading into the phloem for transport to other tissues is unknown; The IRT3, OPT3, and theYSL family are implicated in this function. At numbers for the genes encoding the proteins shown above are as follows (in alphabetical order): ATM3: At2g15570, ATX1: At1g66240, CCH: At3g56240, CCS: At1g12520, COPT1: At5g59030, COPT2: At3g46900, COPT3: At5g59040, COPT4: At2g37925, COPT5: At5g20650, CSD1: At1g08830, CSD2: At2g28190, FRD3: At3g08040, HMA1: At4g37270, HMA2: At4g30110, HMA3: At4g30120, HMA4: At2g19110, HMA5: At1g63440, HMA6/PAA1: At4g33520, HMA7: At5g44790, HMA8/PAA2: At5g21930, IRT1: At4g19690, MTP1: At2g46800, MTP3: At3g58810, NRAMP3: At2g23150, NRAMP4: At5g67330, OPT3: At4g16370, VIT1: At2g01770, YSL1: At4g24120, YSL2: At5g24380, YSL3: At5g53550, YSL4: At5g53550, YSL5: At3g17650, YSL6: At3g27020, YSL7: At1g65730, YSL8: At1g48370, ZIP1: At3g12750, ZIP4: At1g10970, ZIP5: At1g05300, ZIP7: At2g04032, ZIP10: At1g31260, ZIP11: At1g55910, ZIP12: At5g62160.

88

HMA5 transcript is specifically induced by growth on copper media and, in addition to transporting Cu+ to the Golgi apparatus, functions in copper detoxification of the roots, removing metal ions from the cytosol or transporting them to the xylem. hma5 mutants are more sensitive to Cu2+ than to Fe2+, Zn2+, and Cd2+ and accumulate copper in the roots at a greater level than wild type plants, because the ions cannot be exported to the xylem for transport. hma5 mutants also exhibited shortened roots.21 Conversely, inhibition of COPT1 expression in plants was shown to increase root length, which was reversed by the addition of copper.26 This is opposite to the hma5 result and further suggests that COPT1 and HMA5 have opposite functions for copper transport:

COPT1 in import and HMA5 in export.21, 26

Despite advancement, there is still much that needs to be discovered about transport of heavy metal ions throughout the plant. Regulation of transport could function at the level of transcription or transcript abundance by affecting mRNA stability, alternative splicing, or transcript initiation. Alternatively, transport regulation could function at the level of translation or transcript abundance by translation initiation, targeting, or stability. Many copper transport genes in other organisms are regulated at the transcriptional level by extracellular metal ion concentrations that affect transcription factors such as Mac1p in yeast.27 Mac1p regulates yeast copper transporter genes CTR1 and CTR3, a homolog of COPT1 in Arabidopsis, by recognizing and binding to specific DNA sequences in the absence of copper. Under high copper conditions,

Mac1p is down-regulated and therefore reduces transcription of CTR1 and CTR3. In Arabidopsis, not much is known about regulation of copper transport genes, however, miRNAs have been implicated in regulation of expression of copper binding proteins copper/zinc superoxide dismutase 1 and 2 (CSD1 and CSD2). Under low copper conditions, miR398 is up-regulated and targets CSD1 and CSD2 for degradation.28 It is possible that copper transport genes could be regulated at the nucleic acid level by miRNAs or DNA directly or affecting binding.

89 4.2.3 Nucleic acid-metal ion interaction

Because of the negatively charged backbone, nucleic acids have a strong affinity for cations, especially Mg2+. These ions associate with the backbone phosphate ions to neutralize the negative charge and in some cases initiate RNA folding. Magnesium ions are required for the formation of many RNA tertiary structures,29 and for function in the Mg2+ -sensing riboswitch30 and ribozymes.

Some metal ions preferentially associate with the base itself instead of the phosphates.

Transition metal ions such as Zn2+, Cu2+, and Cd2+ are more likely to coordinate with the N7 of purines and the N3 of pyrimidines than the phosphate backbone. There is also some evidence that Cu2+ can interact with O6 and N1 of guanine.31-33 The relative affinity of metal ions for the base over the phosphate is: Mg2+

DNA and RNA sequences with four or more tandem stretches of guanines can form G- quadruplex structures, which are characterized by stacked guanine quartets. These structures are stabilized by K+ and Na+ binding between quartets and by hydrogen bonding between the guanines at the Watson Crick and Hoogsteen faces. There have been many studies investigating the Na+ and K+ binding interactions with different G-quadruplex RNA and DNA sequences35-38 as well as divalent metal ions with DNA G-quadruplex structures. For example, the Oxytricha telomere DNA sequence d(G4T4G4), which forms an antiparallel dimer structure in the presence of 100 mM Na+, was analyzed with divalent ions Mg2+, Ca2+, Mn2+ and Co2+. The divalent ions used in the study induced a structural change in the quadruplex: the antiparallel structure was

39 destabilized and a parallel structure formed. The switch in d(G4T4G4) structure from antiparallel to parallel GQS with Ca2+ was also characterized by Sugimoto and coworkers. The antiparallel

90 structure is thought to be a dimer of two d(G4T4G4) oligos and the parallel structure could be 4

RNA strands associated in a parallel manner or a stacking of the aforementioned dimers into a G- wire structure. They suggest that Ca2+ could bind to a slightly destabilized GQS and associate in the middle or between two destabilized guanine quartets.40

We have shown that G-quadruplex sequences (GQS) are prevalent in the model plant species Arabidopsis thaliana, with approximately 42,000 such sequences found in the genome.

Greater than 13,000 gene models contain at least one GQS.41 In this chapter, through bioinformatics, folding experiments with model oligonucleotides, and RT-PCR, we investigated the binding of Cu2+ and other heavy metal ions to quadruplex DNA and the potential biological relevance of these interactions. We focused our investigations on the model plant species

Arabidopsis thaliana, which benefits from a fully sequenced genome, good gene annotation, and a short life cycle.

4.3 Materials and methods

4.3.1 Bioinformatics

Genomic sequences were obtained from the TAIR8 version of the Arabidopsis genome, released on April 25, 2008.42 We searched regions of the genome, including the 5‘UTRs,

3‘UTRs, coding sequences (CDS), introns, and intergenic regions (IGR). For gene-specific searches, we compiled a database of literature-annotated copper ion-related genes (Appendix C,

Section C.2). Databases and sequences were searched for G-quadruplex-forming sequences using the program Quadparser43 as previously described.41 We searched for sequences of the following motifs: d(GX+L1-N)3+GX+ with combinations including G3L1-7, G2L1-7 and G2L1-4. G and C patterns were searched to account for GQS formation in both strands of the DNA. Functional analysis of

91 genes containing G-quadruplex sequences was performed using the BiNGO 2.3 plugin44 for the

Cytoscape 2.6 visualization program.45,46 GO definitions were as of August 10, 2008. Further details on GQS searching and functional analysis are provided in Chapter 2, Section 2.3.2.

4.3.2 Oligonucleotide preparation

DNA oligonucleotides were chosen from the list of GQS-containing genes overrepresented for the GO term ―copper ion binding‖ from the BiNGO analysis (Tables 4.3 and

4.4). Oligonucleotides were ordered from IDT with ten native nucleotide sequences flanking both the 5‘ and 3‘ ends of the GQS sequence. Guanine residues involved in the putative GQS structure are underlined. At2g15780 was ordered as a wild type (WT) form, a mutant version that cannot form a GQS (At2g15780mut), and a mutant version that cannot form the competing hairpin structure shown in Figure 4.2 (At2g15780strong). Base changes are highlighted with bold, red text. The DNA oligonucleotides from Arabidopsis genes were given the following ID tags, AT number, and sequences:

At1g76160 aka ―SKS5 (SKU Similar 5)‖ 5‘-TATCTCCGCGGGATCAGGGAAAGGGACCGGGATTCTAGGAC G3L433

At2g15780 5‘-GGCTCAGGCTGGGGTTGGGGTTGGGGCGGGGTTCCCAACAA G4L221

At2g15780mut 5‘-GGCTCAGCCTGCACTTCCGGTTGGGCAGAGCTTCCCAACAA

At2g15780strong 5‘-GATATTCGGTGGGGTTGGGGTTGGGGCGGGGTTATGATCAA

Other DNA sequences, given below, were chosen from the list of genes with GQS as determined from the Quadparser results of the literature-proven copper ion annotated genes

(Table 4.5, Section C.2):

92 At5g53550 YSL3 5‘- CATTGCAGTTGGAGGTGGGTTTGGTTCATACCCTT G2232L113

At5g20650 COPT5 5‘- GGAAGAGCGAGGCGGTGGAGGAGCACGACGG G2L111

Figure 4.2: At2g15780 competing hairpin structure. Mfold predicted secondary structure of At2g15780 GQS with ten nucleotide flanking sequences. Nucleotides involved in GQS formation are colored in red.

Oligonucleotides were first dialyzed against 100 mM LiCl, which does not support quadruplex formation, for 6 h to remove associated cations and then dialyzed against distilled and autoclaved water for 4 h to remove excess LiCl. Finally, DNA oligonucleotides were dialyzed overnight against 10 mM Li Cacodylate (pH 7.0). All dialysis was performed using an eight well microdialysis apparatus from Gibco-BRL Life Technologies with a flow rate of 25 mL/min. To favor monomeric species, DNAs were renatured in the absence of K+ at 75 °C for 5 min and then allowed to cool at room temperature before performing experiments.

93 4.3.3 Native gels

A 12% polyacrylamide gel was prepared with 0.5X TBE (50 mM Tris, 40 mM Boric

Acid, and 0.5 mM EDTA, pH 8) and 100 mM KCl to allow quadruplex formation; 0.5X TBE and

100 mM KCl was also used as the running buffer. At least 0.5 nM 5‘end labeled DNA was renatured without K+ at 85 C for 1 min then allowed to cool at room temperature for 10 min. 100 mM KCl was added from a 500 mM 5X stock and samples were incubated at room temperature for 45 minutes. 2X Glycerol Loading Buffer was added (1X = 20% glycerol, 0.5X TBE, 100 mM

KCl. Gels were run at a constant 200 V for approximately 3 h. Buffers from the upper and lower wells were mixed every hour and the wells were refilled. The gels were dried, exposed to a phosphor plate, and scanned on a PhosphorImager (Molecular Dynamics).

4.3.4 UV-detected thermal denaturation

DNA was prepared to a concentration of 4 µM, 2 µM, and 0.4 µM in 10 mM

LiCacodylate, pH 7.0. Samples were renatured at 85 C for 1 min then allowed to cool to room temperature. KCl was added to a final concentration of 100 mM. Thermal denaturation experiments were performed on a Gilford Response II spectropolarimeter. Data was collected every 0.5 C over the temperature ranges 5-95 C and 95-5 C. Reversible transitions were confirmed by overlapping forward and reverse melts. Unfolding transitions were monitored at

260 nm with 1 cm, 0.5 cm cuvettes and 0.1 cm pathlength cuvettes, depending on the DNA concentration. Data were analyzed with KaleidaGraph v.3.5 (Synergy Software).

94 4.3.5 Circular dichroism

Circular dichroism spectroscopy was performed using a Jasco CD J810

Spectropolarimeter and analyzed with Jasco Spectra Manager Suite software. Nucleic acid samples were prepared as described above to desired concentrations for each experiment. Spectra were acquired at 20 C over a wavelength range of 210-320 nm, with data collected every nanometer at a bandwidth of 1 nm. Reported spectra are an average of 3 scans at a response time of 4 s/nm. Data were buffer subtracted, normalized to provide molar residue ellipticity values

(Eq. 1), and smoothed over 5 nm. Molar residue ellipticity is reported in order to normalize for concentration (C) differences, cuvette path length (L) and oligonucleotide length (N):

(Eq. 1) 32,980 C L N

DNA GQS oligonucleotides were titrated with KCl, CuSO4, ZnSO4, MgSO4, or CdCl2 in the background of 100 mM KCl. Sulfide salts of the heavy metal ions Cu2+ and Zn2+ are used in plant experiments and were chosen for in vitro use. MgSO4 was used as the control to maintain the anion used in the titrations. CdCl2 was used because CdSO4 was not readily available. The pH of the samples was tested before and after each titration to ensure that pH 7.0 of the 10 mM

LiCacodylate buffer was maintained. The amount of metal ions necessary to drive quadruplex

2+ formation or unfold quadruplex structures was of interest. To determine Cu 1/2 values or, in

m+ general, M 1/2, ellipticity data were fit with KaleidaGraph v. 3.5 (Synergy software) according to the two-state Hill equation in which Mm+ ions are taken up in the U to F transition:

U F F m m n (Eq. 2) 1 ([M ]/[M 1/2 ])

95 where is the molar ellipticity, F is the normalized CD signal corresponding to fully folded GQS,

U is the signal for the unfolded GQS, [Mm+] is the metal ion concentration which is varied in the

m+ experiment, M 1/2 is the metal ion concentration needed to fold or unfold half the nucleic acid which is a constant, and n is the dreaded Hill coefficient.

4.3.6 Plant growth for RT-PCR

Plants were grown vertically on 1% agar plates with 0.5X (2.15 g/L) Murashige and

Skoog (MS) media, brought to pH 5.8 with KOH, with 1% added sucrose. Some plates had added salt, including 10-100 µM CuSO4, and 30 µM of any of the following: FeSO4, ZnSO4, or

CdSO4. In a laminar flow hood, seeds were surface sterilized before plating by incubation in 70% ethanol for 5 min followed by 95% ethanol for 1 min and drying on filter paper. Seeds were added to the agar plates using flamed forceps and placed in a straight line near the top of the plate. Plates were sealed with 2 layers of parafilm (for 7 day growth) or Millipore tape (>7day growth). Then, plates were incubated at 4 C for 48 hours (stratification) and placed vertically in a growth chamber for a long day, 16/8 hour photoperiod (16 h light, 8 h dark), at 21 C/19 C.

4.3.7 Tissue harvest and TRIzol total RNA isolation

At least 100 mg of seedling tissue grown as described above was harvested and ground into a fine white powder using a pool of liquid nitrogen and a disposable plastic mortar 1.7 mL

Eppendorf tube and pestle. The tube was kept out of the liquid nitrogen for no more than 1 min while grinding. Additionally, no more than 3 samples were ground at a time, to avoid thawing.

When tissue was finely ground, 250 µL of TRIzol (Invitrogen) was added to the eppendorf tube and pestle. When the TRIzol began to thaw, the plant powder was suspended in the added TRIzol

96 by gentle mixing. Once the plant powder was completely suspended, an additional 250 µL of

TRIzol was added. The sample was vortexed immediately for 15-30 sec. Tubes were incubated at room temperature for 5 min while gently rocking. 100 µL of chloroform was added and tubes were mixed by inversion. Samples were incubated again at room temperature for 3-5 min. Tubes were spun at 12K rpm for 15 min at 4 C. The top aqueous phase was kept and transferred to a new tube. Glycogen (5 µg) was added to each tube as a co-precipitant before adding 500 µL of isopropanol and incubating the tubes at room temperature for 10 min to help precipitate RNA.

Tubes were spun at 12K rpm for 15 min at 4 C. The supernatant was removed and 1 mL of 70% ethanol was added to rinse the RNA pellet. The tubes were spun again at 12 k rpm for 15 min at

4 C. The supernatant was removed and the tubes were allowed to dry (but not to completion).

RNA pellets were resuspended in 20 µL of 10 mM Tris/1 mM EDTA pH 7.5 (TE). The concentration and purity of the RNA samples were determined by the NanoDrop

Spectrophotometer at the Penn State Genomics Core Facility before continuing.

4.3.8 mRNA isolation and DNase treatment

Due to low expression levels of the genes of interest, an mRNA isolation step was added after the total RNA isolated above using an oligo (dT) polystyrene bead kit and following the manufacturer‘s instructions (Sigma). To concentrate the mRNA after isolation, the mRNA was ethanol precipitated and resuspended it in 8 µL of 1xTE. To remove any DNA contamination, the mRNA (or Poly(A)+ RNA) was treated with 1 U of RQ1 DNase (Promega) followed by inactivation by a final concentration of 2 mM EGTA and 10 min incubation at 65 C.

97 4.3.9 RT-PCR

400 ng of Poly(A)+ RNA was used for the reverse transcriptase reaction, which was performed using an Invitrogen SuperScriptIII RT Kit with oligo(dT) primers at 50 C for 50 min.

The RT reaction was inactivated at 85 C for 5 min. followed by treatment with 1 µL of RNase H at 37 C for 20 min. PCR was performed with WT template dilutions of 1/10 and 1/100. PCR controls included: Actin2 primers for a positive control, 18S primers for a negative control, and a no template control to check for contamination. PCR was performed on a Biometra Temperature

Gradient Thermocycler with the following conditions: 95 C for 5 min followed by 35 cycles of

95 C for 1 min, 60 C for 1 min, and 72 C for 1 min, and included a final annealing step of 10 min at 72 C. Results were analyzed using a 2% low melt Nusieve agarose gel. The gels were visualized on a UV light box in a dark room or quantified using a PhosphorImager (Molecular

Dynamics). When quantified, band intensity was normalized to the intensity of the Actin control for a given Cu2+ concentration to account for differences in cDNA concentration in the PCR template.

4.3.10 RT-PCR primers

Primers were designed to span an intron in order to detect any genomic DNA contamination. This was done by having the primers anneal just outside of splice sites. In this way, PCR products from genomic DNA templates will be longer than PCR products from cDNA templates. Forward primers, reverse primers, and expected product lengths from cDNA templates are included in Table 4.1. Additional strategies in designing these primers were to have the PCR products be larger than 150-200 base pairs and to have similar melting temperatures.

98 Table 4.1: RT-PCR Primers

Gene Forward Primer Reverse Primer Product Length HMA521 GGCTATCAAGCGTCTCCCTGG CCTCCAATAGTCTATCATAGG 344 HMA6 GGTGACAGAGAATTTCTTCAAAG GCTTGCCTGCAGGACCAAGC 224 HMA6.2 CGCTATGCAGACATGGAGCCA CGCATCAAGCAACTGCATGGAA 463 HMA8 GGTATGGGAGTGGCGGAG GCTCCAACGGCCAAACCAC 230 COPT5 GCACATGACCTTCTACTGGG CAGCGTATCCGGCGGTTAATC 373 MTPB1 GATTTCTGAAGAGGACAGCTC GGCTGCTAGTCTTCTAGTC 244 18S47 CGGCTACCACATCCAAGGAA GCTGGAATTACCGCGGCT 186 ACT248 CCACGAGACAACCTATAACTCAAT AACGACCTTAATCTTCATGCTGC 168

4.3.11 T-DNA insertion mutants and genotyping

In planta mutational experiments are possible using T-DNA (transfer DNA) insertion mutants. Using Agrobacterium tumefaciens, segments of DNA are inserted into the Arabidopsis genome and can create knockout or knockdown lines for genes where the insertion was transferred. T-DNA insertion mutants were ordered from the Arabidopsis Biological Resource

Center (ABRC) at The Ohio State University for knock-down lines for the following genes:

HMA5, SALK_040252C; HMA6.1, SALK_072581C; HMA6.2, SALK_109629; HMA8,

SALK_052655C; MTPB1.1, SALK_107793; MTPB1.2, SALK_152229.49,50

Seeds were sterilized in the same method as described above in Section 4.3.5, and were sown on 0.8% agar plates with 1% sucrose at pH 5.8 for vertical growth. After stratification at

4 C for 48 hours, plates were incubated in an 8/16 photoperiod at 21 C/19 C for 10 days or until robust first true leaves were produced. The seedlings were transplanted to soil and allowed to incubate in an 8/16 photoperiod, 21 C/19 C for 2 weeks. One whole leaf was harvested for genotyping as described below. The plants were then transferred to long days (16/8 photoperiod,

21 C/19 C) to accelerate bolting and seed collection.

DNA was extracted from a 3 mm diameter piece of leaf tissue. 20 µL of template preparation solution (100 mM Tris-HCl, pH 9.5, 1M KCl, 10 mM EDTA) was added to the tissue

99 and ground with a pipet tip. The solution was then heated to 96 C for 10 min. A 1/10 dilution of this solution was used as the template solution for genotyping PCR. The PCR reaction was as follows: 95 C for 5 min followed by 35 cycles of 95 C for 30 sec, 55 C for 30 sec, and 72 C for

1 min, followed by 72 C for 7 min.

Because the ARBC gets T-DNA lines from donations, seeds are not reliably homozygous for the T-DNA insertion and must be genotyped using PCR. Two sets of primers were used for

PCR: one with the set of right and left gene specific primers and one with a left gene-specific primer and the T-DNA specific primer. Primers sequences were obtained from the SALK T-

DNA verification primer design to match the specific T-DNA lines ordered.49 T-DNA insertion homozygous plants produced a band only for the PCR reaction with the T-DNA specific primer, heterozygous plants produced bands for both PCR reactions, and WT plants produced bands only for the PCR reaction with the gene-specific primers. T-DNA primers were ordered from IDT

DNA and the sequences can be found in Table 4.2. LBb1.3 (5‘-ATTTTGCCGATTTCGGAAC) was used as the T-DNA specific primer.49,50

Table 4.2: Genotyping Primers

Construct Left Gene Specific Primer Right Gene Specific Primer HMA5 ACCAATGACAAACTGGACAGG AAGCTTTTGTCGCTTACATGC HMA6-1 CCAGACTAAAAAGAGCAGGGG CTCACGGCTCTTGTTTGTTTC HMA6-2 TGTATCAGGCGGAATTGAGAG ACCATCACTTGCATTAGCACC HMA8 GCAATCGGATATTGCAATAGC TGCCATATAATTGGCACCTC MTPB1-1a TGATGGGGGATAGTGATCAAC TTCCAGAGCTGTCCTCTTCAG MTPB1-1b GTACAGCTTTTATTGGGCCTG TTCCAGAGCTGTCCTCTTCAG MTPB1-2 TGATGGGGGATAGTGATCAAC TCCATTTCTTCCTCATCATCG

100 4.4 Results

4.4.1 Bioinformatics

Using the program Quadparser43 as described in Chapter 2, we searched the Arabidopsis thaliana genome for potential G-quadruplex forming sequences with G and C patterns. For the

GQS motif (G3L1-7)3G3, we performed a functional analysis with BiNGO and show that G3 GQS are overrepresented in the GO term for copper ion binding with a p-value of 8.8e-4 (Table 4.3).

A list of genes and GQS sequences included in the overrepresented copper ion binding term can be found in Table 4.4. Also determined using BiNGO, the GO term copper ion binding was

-4 overrepresented in G/C2+L1-4 GQS, with a smaller corrected p-value of 1.0x10 . A list of these genes and sequences can be found in Table C.1.

The sequences listed above are mostly C-rich and would form a GQS structure in the template strand of the DNA and not in the RNA. The three G-rich sequences have T-rich loops.

Additionally, four of the sequences contain concatenated GQS, with at least 5 stretches of Gs.

All but one of the sequences listed above are located in the CDS, the other found in the first intron. Two are found less than 150 nucleotides into the CDS and all but one are located less than

1,000 nucleotides from the transcription start site. Only one RNA, At2g15780, contains two

GQS.

GO annotations are sometimes inaccurate and as such, we examined literature-validated copper related genes for the presence of GQS motifs. We constructed a database of some literature-annotated copper and heavy-metal related genes (Section C.2).2,11,13,15,16,18,21,22,24-26,51-60

Quadparser was used to search this database for GQS with a (G2L1-4)3G2motif. We found 17 genes with at least one GQS motif in the DNA (see Table 4.5 for results) and 33 GQS sequences in total, the majority of which were G patterns.

101

Table 4.3: Functional analysis of genes with at least one G3+L1-7 GQS present in the RNA

GO GO termc GQS All % GQS p-valueg GO IDa Catb genesd genese genesf overrepresented 0005507 MF copper ion binding 9h 109i 8% 8.8E-4 0009790 BP embryonic development 11 277 4% 3.3E-2 0007275 BP multicellular organismal dev. 20 871 2% 8.8E-2

underrepresented 0009059 BP macromolecule biosynth process 0 1399 0 3.7E-3 0009058 BP biosynthetic process 3 2149 0.1% 3.7E-3 0010467 BP gene expression 1 1487 0.1% 1.2E-2 0044249 BP cellular biosynthetic process 2 1719 0.1% 1.2E-2 0006416 BP Translation 0 1129 0 1.9E-2 Provided are aoverrepresented and underrepresented gene ontology (GO) ID numbers, bGO categories (Cat), and cGO term for gene products encoded by d transcripts with at least one G2+L1-4 GQS. Included are the number of genes (including CDS, 5‘UTR, 3‘UTR, and intronic regions) with a GQS that are annotated for the listed GO term, and ethe total number of genes in Arabidopsis with the listed GO term. Also included are fthe percentage of genes with GQS with a given GO term and gthe appropriate p-value, as determined using the BiNGO program. hThe total number of GO-annotated genes with a GQS i in G2L1-4 is 246. The total number of GO-annotated genes in A. thaliana is 25,179. Table is sorted in order of increasing p-value. Some GO terms are sub- categories of others.

Table 4.4: G3+L1-7 GQS sequences from overrepresented GO term ―copper ion binding‖

Gene ID GeneName Location GQS Sequence At1g31710 Intron 1 CCCAAACCCGAAATACCCGACCCGAATACCC At1g76160 SKS5 CDS: 449-470 CCCGGTCCCTTTCCCTGATCCC At1g75790 SKS18 CDS: 446-467 CCCAGTCCCATTCCCTAAGCCC At2g15780 CDS: 82-103 GGGGCTTTGGGTGGGGTTGGGG CDS: 253-273 GGGGTTGGGGTTGGGGCGGGG At2g15770 CDS: 102-131 GGGTCGGGTTGGGGTTGGGGATCTGACGGG At4g28090 SKS10 CDS: 440-470 CCCTCGCATCCCGTCCCTTTCCCTGAACCC At4g38420 SKS9 CDS: 446-476 CCCTCGGCATCCCCGTCCCTTTCCCTGAACCC At5g58910 LAC16 CDS: 323-344 CCCATACCCTTTCCCTAAGCCC At5g60020 LAC17 CDS: 1273-1303 CCCAAAATTCCCATGGAGTCCCATTGTCCC

102

Table 4.5: G2+L1-4 GQS in genes annotated for copper ion binding/transport

CDS Gene ID GeneName GQS Sequence Location At1g56210 CCH-related 77-112 CCTTCCTCCTCCGACCATACCACCTCTTCCCTCC 617-630 GGCTCCGGCGGCGG 960-970 CCTCCACCACC At2g30110 HMA2 1760-1778 GGCTATGGTAGGGGACGGG At2g19110 HMA4 323-336 CCTCCTATCCTTCC 3014-3039 GGTGGAGAAGGAAGGTTTGGACATGG 3485-3504 CCACCACCACCACCATCACC At1g63440 HMA5 96-106 GGCGGCGGAGG 1889-1906 GGTTGGTACTGGGGTTGG 2582-2599 GGCAATGGTAGGTGACGG At4g33520 HMA6/PAA1 312-335 GGCGGCGGTTCTGGATTTGGGGGG 342-375 GGTGGAAGCGGCGGTGGTGGTGGCGGAGGATCGG 2414-2425 GGTGGGGGATGG 2463-2493 GGAGTGGCCATGGGTGGTGGTGCAGGAGCGG At5g21930 HMA8/PAA2 24-37 CCTCTTCCTCCGCC 365-390 GGAGGTTGAGGTGACGGCGGATACGG 449-465 GGGTATGGGAGTGGCGG 2139-2160 GGGGACAGGGAAGGGGCAGTGG At2g43750 ACS1 737-760 GGTTGCGGGGATTGGAACTGGTGG At2g28190 CDS2 54-70 CCTCCTTCCTCCAATCC At4g03560 TPC1 33-54 GGTGGTGGTGGTACGGATCGGG At4g24120 YSL1 1904-1917 GGTTCCGGCGGTGG At5g24380 YSL2 1340-1354 GGCGGCACTGGCAGG 1365-1383 GGAGTGGTGGCTGGGATGG At5g53550 YSL3 369-382 GGAGGTGGGTTTGG At2g46800 MTP1 564-583 GGACATGGGCATGGCCACGG At2g29410 MTPB1 548-570 CCATAGCCACCACCACCACGACC 620-630 GGAGGTGGTGG 638-648 GGAGGAGGAGG At3g46900 COPT2 27-40 CCACCACCATCACC At5g59040 COPT3 362-380 GGCTGGTTTTGGATTAGGG At5g20650 COPT5 180-190 CCTCCACCGCC 232-248 CCGGTACCAGATCCGCC GQS results from G2L1-4 Quadparser search of copper and metal related genes found in Appendix X. Listed are AT number, Common gene name, location of GQS in the CDS, and the GQS sequence in G-rich or C-rich form.

103 4.4.2 Circular dichroism of model DNA oligonucleotides

Even though GQS were overrepresented in genes associated with the GO term ―copper ion binding‖, there is no evidence in the literature that there is any interaction between GQS and copper. However, some divalent metal ions are known to interact with DNA GQS, even switching the structural conformation between parallel and antiparallel (see above).39, 40 To test the interaction between GQS and Cu2+ ions, we evaluated metal ion interactions with GQS- containing DNA oligonucleotides from the BiNGO results. Nucleotide sequences and At numbers are listed in Materials and Methods (section 4.3.2).

We tested the SKS5 GQS oligonucleotide for quadruplex formation with CD in the background of 100 mM KCl, which should induce GQS formation. We also titrated the folded oligonucleotide with CuSO4 to test for binding. At 4 µM DNA concentration, the oligonucleotide adopts a parallel quadruplex signature in the presence of 100 mM KCl and in the absence of any heavy metal ions (Figure 4.3a black line). Upon addition of CuSO4, the signal decreases until it does not show a GQS signature, indicating that DNA has unfolded out of the GQS structure

2+ (Figure 4.3a). After fitting poorly to a two state Hill equation, the resulting Cu 1/2 is 59.6 µM

5.7 µM and n is 4.1 1.4 at 263 nm (Figure 4.4). The higher Hill coefficient of 4 suggests at least four Cu2+ ions associate with the SKS5 DNA during the folding transitions. However, after the 5 mM CuSO4 addition (not shown), there was a visible precipitate at the bottom of the tube.

2+ This suggests that a possible aggregate state of DNA and Cu is forming at higher CuSO4 and

DNA concentrations. This precipitate could be affecting the value of the Hill coefficient and the apparent number of Cu2+ ions bound.

To remove the possibility of DNA precipitation, we repeated the titration at a 25-fold lower concentration of SKS5 DNA (Figure 4.3b). The titration with 0.2 µM SKS5 DNA has lower signal to noise and a lower CD signal intensity than the titration with higher DNA

104

2+ concentrations. After fitting the titration curve at 265 nm, the resulting Cu 1/2 was significantly lower at 34.0 µM 0.3 µM as was the Hill coefficient with n being 0.9 0.07 (Figure 4.4).

Figure 4.3: CD titrations of SKS5 DNA with heavy metal ions show a strong Cu2+ effect. SKS5 DNA in 10 mM LiCacodylate, pH 7 and 100 mM KCl was titrated with different heavy metal 2+ ions. (a) 4 µM DNA with CuSO4 additions, with a resulting Cu 1/2 of 59.6 µM 5.7 µM and n = 2+ 4.1 1.4 at 263 nm. (b) 0.2 µM DNA with CuSO4 additions, with a resulting Cu 1/2 of 34.0 µM 0.3 µM and n = 0.9 0.07 at 265 nm. (c) Control titration of 4 µM DNA with MgSO4 additions, and (d) Control titration of 4 µM DNA with ZnSO4 additions and a final spike of CuSO4.

105 The Hill coefficient ~1 was closer to what we were expecting for Cu2+ binding to GQS DNA, suggesting that at least one Cu2+ ion associates with one DNA molecule. Control titrations with

MgSO4 up to 1 mM show no effect on SKS5 GQS structure (Figure 4.3c).

2+ Figure 4.4: SKS5 Cu 1/2 and Hill coefficient values are concentration dependent. Fits of the SKS5 2+ DNA titrations with Cu 1/2 as described in Materials and Methods and shown in (a) Figure 4.3a and (b) Figure 4.3b. (a) Poor fit of 4 µM DNA in 10 mM LiCacodylate, pH 7.0 and 100 mM KCl 2+ with CuSO4 additions. Cu 1/2 is 59.6 µM 5.7 µM and n = 4.1 1.4 at 263 nm. (b) Fit of 0.2 2+ µM DNA in 10 mM LiCacodylate, pH 7.0 and 100 mM KCl with CuSO4 additions. Cu 1/2 is 34.0 µM 0.3 µM and n is 0.9 0.07 at 265 nm.

2+ Addition of ZnSO4, which has similar properties to CuSO4, has no effect up to 1 mM Zn , however, at 5 mM Zn2+, there is a decrease in CD signal similar to that seen with addition of Cu2+

(Figure 4.3d). However, Cu2+ has a much stronger affect on the GQS signal, evidenced by the

2+ fact that addition of 50 µM CuSO4 after 5 mM Zn shows a further decrease in CD signal (Figure

4.3d).

To determine if the SKS5 fully folded GQS structure is unimolecular, bimolecular, or tetramolecular, we performed concentration dependent melts and native gels to test the stoichiometry of folding. In 10 mM LiCacodylate buffer at pH 7.0, melts over a 10-fold

106 concentration range of DNA from 0.4 µM to 4 µM showed no change in melting temperature

(43.0 C) (Figure 4.5a) and native gels containing 100 mM KCl at the same DNA concentrations appeared to be the same, with no formation of higher order species at high DNA concentrations

(Figure 4.5b). Therefore, the oligos likely form a unimolecular quadruplex structure, independent of DNA concentration.

Figure 4.5: SKS5 GQS folds independently of DNA concentration. (a) First derivative of UV thermal denaturation curves at 0.4 µM (black), 2 µM (blue), and 4 µM (green) SKS5 DNA. Melts were completed in 10 mM LiCacodylate, pH 7.0 with 100 mM KCl and as described in Materials and Methods. (b) Non-denaturing polyacrylamide gel electrophoresis of SKS5 DNA at 0.4 µM, 2 µM, and 4 µM. Native gels was run with 0.5xTBE and 100 mM KCl as described in Materials and Methods.

DNA GQS are able to fold into at least six different conformations of antiparallel and parallel structures.61 As such, a single DNA oligonucleotide sequence can fold into a mixture of different conformations, making the CD spectrum difficult to interpret, especially if there are mixtures of antiparallel and parallel quadruplexes, which have different CD signatures. The ratio of each conformation will affect the shape and size of the CD peaks at 260 nm and 295 nm. For instance, At2g15780 showed a mixture of antiparallel and parallel character in 100 mM KCl, by having maxima at 295 and 260 nm (Figure 4.6).

107

Figure 4.6: At2g15780 folds into a complex GQS structure that is destroyed by Cu2+. At2g15780 DNA in 10 mM LiCacodylate, pH 7 and 100 mM KCl was titrated with (a) CuSO4 to an 2+ 2+ estimated Cu 1/2 greater than 500 µM and (b) ZnSO4 to a Zn 1/2 greater than 1 mM.

2+ 2+ The peak at 295 nm decreased with increasing amounts of Cu , with an estimated Cu 1/2

2+ 2+ of 40-50 µM and decreased with Zn to a much lesser extent, with a Zn 1/2 greater than 1 mM

(Figure 4.6). However, when At2g15780 was titrated with CdCl2, a change in relative peak heights was seen, with the antiparallel 295 nm peak decreasing and the parallel-signature 260 nm peak increasing, suggesting a switch from antiparallel to parallel structure (Figure 4.7).

Mfold structures of the At2g15780 sequences and flanking sequences showed the possibility for an alternative structure with a stable hairpin with a six base pair stem (Figure 4.2).

Control oligonucleotides with a GQS sequence only or with a hairpin fold only were also tested for GQS folding with CD. Preliminary results showed that the oligonucleotide with the hairpin only (At2g15780mut) had a lower intensity CD peak around 280 nm, a common spectrum for a non-GQS nucleic acid, possibly folded into the hairpin structure. Titration of Cu2+ had a slight

2+ effect on the structure, with a smaller change in CD signal and an estimated Cu 1/2 greater than

500 µM. Thus, Cu2+ does not substantially interfere with formation of the hairpin. The DNA oligonucleotide with the GQS only (At2g15780strong) had a shifted CD spectrum, with a

108 maximum around 260 nm, where one would expect a GQS signature peak. This peak also decreased when titrated with Cu2+. Copper-annotated DNA oligonucleotides COPT5 and YSL3 also had GQS signature CD spectra, whose peaks decreased when titrated with Cu2+ with

2+ estimated Cu 1/2 of 70 µM and 50 µM, respectively (not shown).

Figure 4.7: CdCl2 switches DNA GQS from antiparallel to parallel structure. At2g15780.1 in 10 2+ mM LiCacodylate, pH 7.0 and 100 mM KCl was titrated with CdCl2 to a Cd concentration of 1 mM.

4.4.3 WT plant growth on copper-containing media

We focused our attention on genes that have a GQS and have been shown to have a Cu2+ response in vivo or that are predicted to have a Cu2+ response from the literature. Table 4.6 shows a list of candidate genes for in vivo experiments. These candidates are annotated for Cu and/or

Zn protein function in the literature or TAIR and have at least one GQS sequence, either G-rich or

C-rich, accounting for GQS formation in both strands of the DNA. For each of these genes, a point mutation is possible in at least one GQS that would disrupt the GQS sequence, but would maintain the protein sequence. T-DNA mutant seeds are available for each, and while each belongs to a family of genes, some of the mutants do have phenotypes described in the literature

109 (see Table 4.5) that could be tested. Two of the candidates, COPT5 and MTPB1, have never been studied in planta.

Wild type Col-0 plants were grown on regular and Cu2+-containing media, as described in

Section 4.3.6 to detect any changes in transcript level for our genes of interest. Changes in plant development as a result of Cu2+-containing media were expected21 and can be seen in Figure 4.8.

Plants from seeds sown on media containing at least 30 µM Cu2+ were smaller, had smaller cotyledons that were not fully extended, showed signs of chlorosis (yellowing), had smaller, if any, true leaves, which lacked trichomes, had shorter root lengths and had fewer lateral roots.

Figure 4.8: Cu2+ inhibits seedling growth and shortens root length. Images of 8 day old Col-0 wild type seedlings grown on copper-supplemented media under 16x magnification. Seeds were sown on 0.5xMS media supplemented with (a) no CuSO4, (b) 30 µM CuSO4, (c) 50 µM CuSO4, and (d) 100 µM CuSO4. (e) Root length as a function of CuSO4 added to growth media. Root lengths were measured by hand and are the averages of at least 15 plants. Error bars represent standard deviations of the averages.

110

Table 4.6: Candidate genes for in vivo experiments

Gene ID Gene Name Annotated GQSa CDS GQS Point T-DNA Family Mutant? Phenotype Function Locationb Mutation?c seeds?e Member? Cu transport G *** 96 nt No hypersensitiv to xylem; Yes: G * 1889 nt Yes (1) Yes non- e to Cu, At1g63440 HMA5 expression 5 lines 8 total lethal21 especially induced by available G * 2582 nt Yes (1) root length21 excess Cu G * 312 nt Yes (2) high Cu transport G * 342 nt Yes (1) Yes: chlorophyll to plastid Yes non- At4g33520 HMA6/PAA1 8 lines fluorescence inner G **/*** 2414 nt Yes (2) 8 total lethal25 available in double membrane Yes G * 2463 nt mutant25 (2)/Nod Cu transport C * 24 nt No high through Yes: chlorophyll G * 365 nt Yes (6) Yes non- At5g21930 HMA8/PAA2 thylakoid 3 lines fluorescence G */** 449 nt Yes (3) 8 total lethal25 membrane to available in double plastocyanin G */** 2139 nt Yes (4) mutant25 C *** 180 nt No Yes: Yes At5g20650 COPT5 Cu transport 3 lines unknown never studied C * 232 nt Yes (6) 5 total available C * 548 nt Yes (6) Yes: Zn transport Yes At2g29410 MTPB1 G *** 620 nt Yes (4) 3 lines unknown never studied family 4 total G *** 638 nt Yes (4) available Candidates were chosen based on their annotated function, if they contained a GQS and if that sequence could be mutated while maintaining the native protein sequence, and if there were available TDNA lines with non-lethal phenotypes. a G or C pattern GQS (to account for GQS formation in either strand of DNA) and estimated GQS stability based on sequence: * weak, ** intermediate, *** strong. Longer G-tracts and smaller loops are indicative of a more stable GQS. b Location of the start of the GQS from the start of the coding sequence (CDS). c We asked if it were possible to make a point mutation and not affect the amino acid sequence using synonymous codons. d This sequence is concatenated. Even though point mutations can disrupt guanine residues involved in GQS formation, a GQS can still form using other stretches of guanine residues. e number of SALK lines available for each mRNA of interest.

111

4.4.4 RT-PCR of oligos of interest

We used RT-PCR to assay for any changes in transcript levels for the genes of interest as a function of Cu2+ levels in the media, as described in 4.3.7 – 4.3.10. As determined from transcriptome data visualization tools Genevestigator62-65 and AtGenExpress,66 normal expression levels for HMA5, HMA6, HMA8, and MTPB1 are low. (COPT5 is the exception, which has a higher expression level). For this reason, an mRNA isolation step was added to the RT-PCR protocol (see Materials and Methods), with improved results.

From RT-PCR analysis, we found that all genes of interest had increased transcript levels when grown on Cu2+-containing media (Figure 4.9 and Table 4.7).

Figure 4.9: Expression of copper transport genes increases when Col-0 plants are grown on media containing Cu2+. RT-PCR results for Arabidopsis thaliana Col-0 plants grown on media with no added CuSO4, 30 µM CuSO4, 50 µM CuSO4, and 100 µM CuSO4. RT-PCR products are shown for HMA5, HMA6, HMA8, COPT5, MTPB1 and ACT2 (Actin2). Gel quantification data can be seen in Table 4.7.

112 Table 4.7: RT-PCR Quantification

mRNA 0 µM Cu2+ 30 µM Cu2+ 50 µM Cu2+ 100 µM Cu2+ fold change ACT2 1.00 1.00 1.00 1.00 0.0 HMA5 0.02 0.06 0.11 0.19 9.5 HMA6 0.02 0.04 0.06 0.09 4.5 HMA8 0.06 0.14 0.23 0.18 3.0 COPT5 0.90 0.77 1.20 1.58 1.8 MTPB1 0.05 0.10 0.15 0.20 4.0 Quantified RT-PCR results with the agarose gel band intensity normalized to the intensity of the Actin2 (ACT2) control for a given Cu2+ concentration to account for differences in cDNA concentration in the PCR template. Fold change calculated from no added CuSO4 to 100 µM CuSO4.

COPT5 and HMA8 only increased slightly: 1.8 and 3.0 fold, respectively. Additionally, the

PhosphorImager scan of HMA8 RT-PCR results indicated the presence of faint alternative bands.

As expected, HMA5 had a large (9.5 fold) increase in expression when grown with 100 µM

Cu2+.21 Additionally, MTPB1 and HMA6 had 4.5 and 4.0 fold increases in expression in seedlings grown in the presence of 100 µM Cu2+, and the bands for these transcripts were barely visible when no additional Cu2+ had been added to the media.

4.4.5 T-DNA mutant genotyping results

At least one T-DNA insertion mutant was ordered for each of the candidate genes listed in Table 4.6 except for COPT5, and a list of the T-DNA insertion mutant stock numbers can be found in section 4.3.11. The T-DNA insertion mutants were grown and phenotyped. All of the plants for the HMA5 insertion line were homozygous. Several plants from both HMA6 lines,

HMA8, and MTPB1-1 lines were homozygous for the T-DNA insertion but only one plant from

MTPB1-2 line was homozygous. All of the T-DNA mutant lines for all of the genes had at least one homozygous plant; seeds from those plants were collected.

113 4.5 Discussion

From the CD experiments shown above, we have shown that Cu2+ can disrupt a GQS structure, most likely by coordinating to the N7 of guanine (see Section 4.2.3). Non-GQS structures that involve canonical Watson-Crick base pairing would leave the N7 of Guanine open for interaction and that structure would be less affected by or even stabilized by Cu2+. Similarly, it is possible for Cu2+ to bind specifically to a GQS and unfold it accordingly. Any change in nucleic acid structure can affect regulation of transcription, splicing, and translation.

Genes annotated for copper ion binding are overrepresented in GQS with the G3L1-7 motif. The proteins encoded by these genes are mostly copper and heavy metal transport proteins and other copper binding proteins and chaperones. It is possible for positive feedback mechanisms to exist, where an excess of copper increases the RNA transcript level, which therefore increases the protein levels needed to transport the copper ions across membranes for use, storage, or phytoremediation.

Functionally, GQS in DNA can halt transcription, specifically when located in the promoter region;67,68 removing or destabilizing a GQS structure can thus increase transcript levels.69-71 In the case of genes coding for copper transport proteins, GQS are located in the coding sequences, and as such, mechanisms for regulation are not clear. It is possible that the formation of a GQS structure in the gene under normal conditions could inhibit transcription and maintain the low normal transcript levels. An increase in Cu2+ concentrations could disrupt a

GQS structure by Cu2+ coordination to the GQS, or through a cofactor, and allow transcription to proceed, increasing transcript levels in vivo. This would be consistent with the observation that

Cu2+ at lower levels increased levels of all the transcripts examined (Figure 4.6). It is known that under high Cu2+ conditions, Cu2+ will accumulate in the nuclei of root cells72 and therefore will be available to interact with DNA.

114 Alternatively, GQS found in RNA would most likely affect translation, splicing, or mRNA stability. Analogous to regulation of transcription described above, GQS structure in

RNA could inhibit transcription, and Cu2+ binding to RNA could unfold the GQS structure and allow transcription to continue.

4.6 Future Directions

While RT-PCR results shown above suggest there is an increase of transcript levels for copper transport proteins grown on media containing high Cu2+ concentrations, real time RT-PCR would allow better quantification of any increase in transcript level for different plant growth conditions. No mechanistic clues for GQS involvement in the increase in transcript level can be gained from real time or standard RT-PCR. In planta studies, specifically the growth of T-DNA knockout lines that have been complemented either with wild-type cDNA sequences or with corresponding cDNAs harboring mutations in the GQS of interest, are required to show regions of the gene that are required for function. At least three generations of plants (Arabidopsis thaliana Col-0 background) are required to produce homozygous plants with desired mutations.

At that point, experiments could include: testing for copper-related phenotypes and if those are rescued by growth on copper-containing media, real time PCR to analyze transcript levels, or even atomic absorption spectroscopy to obtain copper concentrations in different regions of the plant. Transient expression assays to test for transcript level with a mutated gene sequence are possible; however these will not show any phenotypic effects as would the mutant lines.

Specificity of these genes for copper can be tested by comparison with with Zn2+, Fe2+,

Cd2+, and Ag+. Some of the heavy metal ion transport proteins and chaperones have the ability to coordinate different divalent or monovalent ions. Heavy metal P-type ATPases are found in

115 almost all organisms, from Homo sapiens to prokaryotes.16 Conservation of these genes, the GQS sequences, and their upregulation upon exposure to Cu2+ would be of interest.

Finally, the oxidation state of copper and its effect on nucleic acid binding has not been studied. Cu2+ was used in the studies listed above because it is the predominant species in an oxygen rich environment and also would be the oxidation state of copper found in soils where plants grow.2,10 However, some copper chaperones and copper binding proteins, including

COPT1, catechol oxidase and superoxide dismutase (CuZnSOD) can bind the Cu+ species.73,74

The possibility for oxidation-reduction reactions and the reducing cellular environment suggest that we cannot discount Cu+ as a potential ligand for nucleic acids. In vitro experiments with Cu+ require an anaerobic environment with special experimental conditions, and involve using cuvettes with septa or a glove box.

4.7 Acknowledgements

I would like to thank Liza Wilson, Tim Gookin, and Dr. Sarah Nilson for help with plant growth protocols and RT-PCR procedure techniques and the Ewing lab for use of their dissecting microscope and camera.

4.8 References

1. Pilon-Smits, E., Phytoremediation. Annu. Rev. Plant Biol., 2005. 56: 15-39.

2. Palmer, C.M. and Guerinot, M.L., Facing the challenges of Cu, Fe and Zn homeostasis in plants. Nat. Chem. Biol., 2009. 5: 333-340.

3. Rae, T.D., Schmidt, P.J., Pufahl, R.A., Culotta, V.C. and O'Halloran, T.V., Undetectable intracellular free copper: the requirement of a copper chaperone for superoxide dismutase. Science, 1999. 284: 805-808.

116 4. Foy, C.D., Chaney, R.L. and White, M.C., The physiology of metal toxicity in plants. 1978. 511-566.

5. Clarkson, D.T. and Hanson, J.B., The mineral nutrition of higher plants. 1980. 239-298.

6. Hunter, R., A reversible phase of copper toxicity in maize roots. J. Plant Nutr., 1981. 3: 375-386.

7. Kennedy, C. and Gonsalves, F., The action of divalent zinc, cadmium, mercury, copper and lead on the trans-root potential and H+, efflux of excized roots. J. Exp. Bot., 1987. 38: 800-817.

8. Meyer, S.L.F. and Heath, M.C., A comparison of the death induced by fungal invasion or toxic chemicals in cowpea epidermal cells. II. Responses induced by Erysiphe cichoracearum. Can. J. Bot., 1988. 66: 613-623.

9. Lanaras, T., Moustakas, M., Symeonidis, L., Diamantoglou, S. and Karataglis, S., Plant metal content, growth responses and some photosynthetic measurements on field- cultivated wheat growing on ore bodies enriched in Cu. Physiol. Plant, 1993. 88: 307- 314.

10. Puig, S., Mira, H., Dorcey, E., Sancenon, V., Andres-Colas, N., Garcia-Molina, A., Burkhead, J.L., Gogolin, K.A., Abdel-Ghany, S.E., Thiele, D.J., et al., Higher plants possess two different types of ATX1-like copper chaperones. Biochem. Biophys. Res. Commun., 2007. 354: 385-390.

11. Waters, B.M. and Grusak, M.A., Whole-plant mineral partitioning throughout the life cycle in Arabidopsis thaliana ecotypes Columbia, Landsberg erecta, Cape Verde Islands, and the mutant line ysl1ysl3. New Phytol, 2008. 177: 389-405.

12. Zhou, J. and Goldsbrough, P.B., Functional homologs of fungal metallothionein genes from Arabidopsis. Plant Cell, 1994. 6: 875-884.

13. Himelblau, E., Mira, H., Lin, S.J., Culotta, V.C., Penarrubia, L. and Amasino, R.M., Identification of a functional homolog of the yeast copper homeostasis gene ATX1 from Arabidopsis. Plant Physiol., 1998. 117: 1227-1234.

14. O'Halloran, T.V. and Culotta, V.C., Metallochaperones, an intracellular shuttle service for metal ions. J. Biol. Chem., 2000. 275: 25057-25060.

15. Sancenon, V., Puig, S., Mira, H., Thiele, D.J. and Penarrubia, L., Identification of a copper transporter family in Arabidopsis thaliana. Plant Mol. Biol., 2003. 51: 577-587.

16. Hussain, D., Haydon, M.J., Wang, Y., Wong, E., Sherson, S.M., Young, J., Camakaris, J., Harper, J.F. and Cobbett, C.S., P-type ATPase heavy metal transporters with roles in essential zinc homeostasis in Arabidopsis. Plant Cell, 2004. 16: 1327-1339.

17. Palmgren, M.G. and Axelsen, K.B., Evolution of P-type ATPases. Biochim. Biophys. Acta, 1998. 1365: 37-45.

117 18. Axelsen, K.B. and Palmgren, M.G., Inventory of the superfamily of P-type ion pumps in Arabidopsis. Plant Physiol., 2001. 126: 696-706.

19. Arguello, J.M., Mandal, A.K. and Mana-Capelli, S., Heavy metal transport CPx-ATPases from the thermophile Archaeoglobus fulgidus. Ann. NY Acad. Sci., 2003. 986: 212-218.

20. Mills, R.F., Krijger, G.C., Baccarini, P.J., Hall, J.L. and Williams, L.E., Functional expression of AtHMA4, a P1B-type ATPase of the Zn/Co/Cd/Pb subclass. Plant J., 2003. 35: 164-176.

21. Andres-Colas, N., Sancenon, V., Rodriguez-Navarro, S., Mayo, S., Thiele, D.J., Ecker, J.R., Puig, S. and Penarrubia, L., The Arabidopsis heavy metal P-type ATPase HMA5 interacts with metallochaperones and functions in copper detoxification of roots. Plant J., 2006. 45: 225-236.

22. Pilon, M., Abdel-Ghany, S.E., Cohu, C.M., Gogolin, K.A. and Ye, H., Copper cofactor delivery in plant cells. Curr. Opin. Plant Biol., 2006. 9: 256-263.

23. Seigneurin-Berny, D., Gravot, A., Auroy, P., Mazard, C., Kraut, A., Finazzi, G., Grunwald, D., Rappaport, F., Vavasseur, A., Joyard, J., et al., HMA1, a new Cu-ATPase of the chloroplast envelope, is essential for growth under adverse light conditions. J. Biol. Chem., 2006. 281: 2882-2892.

24. Shikanai, T., Muller-Moule, P., Munekage, Y., Niyogi, K.K. and Pilon, M., PAA1, a P- type ATPase of Arabidopsis, functions in copper transport in chloroplasts. Plant Cell, 2003. 15: 1333-1346.

25. Abdel-Ghany, S.E., Muller-Moule, P., Niyogi, K.K., Pilon, M. and Shikanai, T., Two P- type ATPases are required for copper delivery in Arabidopsis thaliana chloroplasts. Plant Cell, 2005. 17: 1233-1251.

26. Sancenon, V., Puig, S., Mateu-Andres, I., Dorcey, E., Thiele, D.J. and Penarrubia, L., The Arabidopsis copper transporter COPT1 functions in root elongation and pollen development. J. Biol. Chem., 2004. 279: 15348-15355.

27. Radisky, D. and Kaplan, J., Regulation of transition metal transport across the yeast plasma membrane. J. Biol. Chem., 1999. 274: 4481-4484.

28. Yamasaki, H., Abdel-Ghany, S.E., Cohu, C.M., Kobayashi, Y., Shikanai, T. and Pilon, M., Regulation of copper homeostasis by micro-RNA in Arabidopsis. J. Biol. Chem., 2007. 282: 16369-16378.

29. Misra, V.K. and Draper, D.E., On the role of magnesium ions in RNA stability. Biopolymers, 1998. 48: 113-135.

30. Dann, C.E., 3rd, Wakeman, C.A., Sieling, C.L., Baker, S.C., Irnov, I. and Winkler, W.C., Structure and mechanism of a metal-sensing regulatory RNA. Cell, 2007. 130: 878-892.

31. Barton, J.K. and Lippard, S.J., Heavy metal interactions with nucleic acids, in Metal Ions in Biology, T.J. Spiro, Editor. 1980, John Wiley and Sons: New York:31-114.

118 32. Saenger, W., Metal ion binding to nucleic acids, in Principles of nucleic acid structure. 1984, Springer-Verlag: New York:201.

33. Ochoa, P.A., Rodriguez-Tapiador, M.I., Alexandre, S.S., Pastor, C. and Zamora, F., Structural models for the interaction of Cd(II) with DNA: trans-[Cd(9-RGH- N7)2(H2O)4]2+. J. Inorg. Biochem., 2005. 99: 1540-1547.

34. Butzow, J.J. and Eichhorn, G.L., Interaction of metal ions with nucleic acids and related compounds. XVII. On the mechanism of degradation of polyribonucleotides and oligoribonucleotides by zinc(II) ions. Biochemistry, 1971. 10: 2019-2027.

35. Sen, D. and Gilbert, W., A sodium-potassium switch in the formation of four-stranded G4-DNA. Nature, 1990. 344: 410-414.

36. Burge, S., Parkinson, G.N., Hazel, P., Todd, A.K. and Neidle, S., Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res., 2006. 34: 5402-5415.

37. Olsen, C.M., Gmeiner, W.H. and Marky, L.A., Unfolding of G-quadruplexes: energetic, and ion and water contributions of G-quartet stacking. J. Phys. Chem. B, 2006. 110: 6962-6969.

38. Smargiasso, N., Rosu, F., Hsia, W., Colson, P., Baker, E.S., Bowers, M.T., De Pauw, E. and Gabelica, V., G-quadruplex DNA assemblies: loop length, cation identity, and multimer formation. J. Am. Chem. Soc., 2008. 130: 10208-10216.

39. Miyoshi, D., Nakao, A., Toda, T. and Sugimoto, N., Effect of divalent cations on antiparallel G-quartet structure of d(G4T4G4). FEBS Lett., 2001. 496: 128-133.

40. Miyoshi, D., Nakao, A. and Sugimoto, N., Structural transition of d(G4T4G4) from antiparallel to parallel G-quartet induced by divalent cations. Nucleic Acids Res., 2001: 259-260.

41. Mullen, M.A., Olson, K.J., Dallaire, P., Major, F., Assmann, S.M. and Bevilacqua, P.C., RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: prevalence and possible functional roles. Nucleic Acids Res., 2010. 38: 8149-8163.

42. Swarbreck, D., Wilks, C., Lamesch, P., Berardini, T.Z., Garcia-Hernandez, M., Foerster, H., Li, D., Meyer, T., Muller, R., Ploetz, L., et al., The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res., 2008. 36: D1009- 1014.

43. Huppert, J.L. and Balasubramanian, S., Prevalence of quadruplexes in the human genome. Nucleic Acids Res., 2005. 33: 2908-2916.

44. Maere, S., Heymans, K. and Kuiper, M., BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics, 2005. 21: 3448-3449.

119 45. Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B. and Ideker, T., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res., 2003. 13: 2498-2504.

46. Cline, M.S., Smoot, M., Cerami, E., Kuchinsky, A., Landys, N., Workman, C., Christmas, R., Avila-Campilo, I., Creech, M., Gross, B., et al., Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc., 2007. 2: 2366-2382.

47. Waters, B.M., Chu, H.H., Didonato, R.J., Roberts, L.A., Eisley, R.B., Lahner, B., Salt, D.E. and Walker, E.L., Mutations in Arabidopsis yellow stripe-like1 and yellow stripe- like3 reveal their roles in metal ion homeostasis and loading of metal ions in seeds. Plant Physiol., 2006. 141: 1446-1458.

48. Bove, J., Kim, C.Y., Gibson, C.A. and Assmann, S.M., Characterization of wound- responsive RNA-binding proteins and their splice variants in Arabidopsis. Plant Mol. Biol., 2008. 67: 71-88.

49. Alonso, J.M., Stepanova, A.N., Leisse, T.J., Kim, C.J., Chen, H., Shinn, P., Stevenson, D.K., Zimmerman, J., Barajas, P., Cheuk, R., et al., Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science, 2003. 301: 653-657.

50. Yamada, K., Lim, J., Dale, J.M., Chen, H., Shinn, P., Palm, C.J., Southwick, A.M., Wu, H.C., Kim, C., Nguyen, M., et al., Empirical analysis of transcriptional activity in the Arabidopsis genome. Science, 2003. 302: 842-846.

51. Kampfenkel, K., Kushnir, S., Babiychuk, E., Inze, D. and Van Montagu, M., Molecular characterization of a putative Arabidopsis thaliana copper transporter and its yeast homologue. J. Biol. Chem., 1995. 270: 28479-28486.

52. Hirayama, T., Kieber, J.J., Hirayama, N., Kogan, M., Guzman, P., Nourizadeh, S., Alonso, J.M., Dailey, W.P., Dancis, A. and Ecker, J.R., RESPONSIVE-TO- ANTAGONIST1, a Menkes/Wilson disease-related copper transporter, is required for ethylene signaling in Arabidopsis. Cell, 1999. 97: 383-393.

53. Balandin, T. and Castresana, C., AtCOX17, an Arabidopsis homolog of the yeast copper chaperone COX17. Plant Physiol., 2002. 129: 1852-1857.

54. Delhaize, E., Kataoka, T., Hebb, D.M., White, R.G. and Ryan, P.R., Genes encoding proteins of the cation diffusion facilitator family that confer manganese tolerance. Plant Cell, 2003. 15: 1131-1142.

55. Abdel-Ghany, S.E., Burkhead, J.L., Gogolin, K.A., Andres-Colas, N., Bodecker, J.R., Puig, S., Penarrubia, L. and Pilon, M., AtCCS is a functional homolog of the yeast copper chaperone Ccs1/Lys7. FEBS Lett., 2005. 579: 2307-2312.

56. Chu, C.C., Lee, W.C., Guo, W.Y., Pan, S.M., Chen, L.J., Li, H.M. and Jinn, T.L., A copper chaperone for superoxide dismutase that confers three types of copper/zinc superoxide dismutase activity in Arabidopsis. Plant Physiol., 2005. 139: 425-436.

120 57. Desbrosses-Fonrouge, A.G., Voigt, K., Schroder, A., Arrivault, S., Thomine, S. and Kramer, U., Arabidopsis thaliana MTP1 is a Zn transporter in the vacuolar membrane which mediates Zn detoxification and drives leaf Zn accumulation. FEBS Lett., 2005. 579: 4165-4174.

58. Kramer, U., MTP1 mops up excess zinc in Arabidopsis cells. Trends Plant Sci., 2005. 10: 313-315.

59. Williams, L.E. and Mills, R.F., P(1B)-ATPases--an ancient family of transition metal pumps with diverse functions in plants. Trends Plant Sci., 2005. 10: 491-502.

60. Kramer, U., Talke, I.N. and Hanikenne, M., Transition metal transport. FEBS Lett., 2007. 581: 2263-2272.

61. Lane, A.N., Chaires, J.B., Gray, R.D. and Trent, J.O., Stability and kinetics of G- quadruplex structures. Nucleic Acids Res., 2008. 36: 5482-5515.

62. Zimmermann, P., Hirsch-Hoffmann, M., Hennig, L. and Gruissem, W., GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiol., 2004. 136: 2621-2632.

63. Zimmermann, P., Hennig, L. and Gruissem, W., Gene-expression analysis and network discovery using Genevestigator. Trends Plant Sci., 2005. 10: 407-409.

64. Laule, O., Hirsch-Hoffmann, M., Hruz, T., Gruissem, W. and Zimmermann, P., Web- based analysis of the mouse transcriptome using Genevestigator. BMC Bioinf., 2006. 7: 311.

65. Hruz, T., Laule, O., Szabo, G., Wessendorp, F., Bleuler, S., Oertle, L., Widmayer, P., Gruissem, W. and Zimmermann, P., Genevestigator v3: a reference expression database for the meta-analysis of transcriptomes. Adv. Bioinformatics, 2008. 2008: 420747.

66. Schmid, M., Davison, T.S., Henz, S.R., Pape, U.J., Demar, M., Vingron, M., Scholkopf, B., Weigel, D. and Lohmann, J.U., A gene expression map of Arabidopsis thaliana development. Nat. Genet., 2005. 37: 501-506.

67. Siddiqui-Jain, A., Grand, C.L., Bearss, D.J. and Hurley, L.H., Direct evidence for a G- quadruplex in a promoter region and its targeting with a small molecule to repress c- MYC transcription. Proc. Natl. Acad. Sci. U.S.A., 2002. 99: 11593-11598.

68. Paramasivam, M., Membrino, A., Cogoi, S., Fukuda, H., Nakagama, H. and Xodo, L.E., Protein hnRNP A1 and its derivative Up1 unfold quadruplex DNA in the human KRAS promoter: implications for transcription. Nucleic Acids Res., 2009. 37: 2841-2853.

69. Thakur, R.K., Kumar, P., Halder, K., Verma, A., Kar, A., Parent, J.L., Basundra, R., Kumar, A. and Chowdhury, S., Metastases suppressor NM23-H2 interaction with G- quadruplex DNA within c-MYC promoter nuclease hypersensitive element induces c- MYC expression. Nucleic Acids Res., 2009. 37: 172-183.

121 70. Borgognone, M., Armas, P. and Calcaterra, N.B., Cellular nucleic-acid-binding protein, a transcriptional enhancer of c-Myc, promotes the formation of parallel G-quadruplexes. Biochem. J., 2010. 428: 491-498.

71. Cogoi, S., Paramasivam, M., Membrino, A., Yokoyama, K.K. and Xodo, L.E., The KRAS promoter responds to Myc-associated zinc finger and poly(ADP-ribose) polymerase 1 proteins, which recognize a critical quadruplex-forming GA-element. J. Biol. Chem., 2010. 285: 22003-22016.

72. Silverberg, B.A., Stokes, P.M. and Fernstein, L.B., Intranuclear complexes in a copper- tolerant green alga. J. Cell Biol., 1976. 69: 210-214.

73. Kliebenstein, D.J., Monde, R.A. and Last, R.L., Superoxide dismutase in Arabidopsis: an eclectic enzyme family with disparate regulation and protein localization. Plant Physiol., 1998. 118: 637-650.

74. Gerdemann, C., Eicken, C. and Krebs, B., The crystal structure of catechol oxidase: new insight into the function of type-3 copper proteins. Acc. Chem. Res., 2002. 35: 183-191.

122

Chapter 5

The 3’UTR of an abscisic acid catabolism gene has an ion-dependent tertiary structure

5.1 Abstract

The major plant hormone abscisic acid (ABA) mediates drought stress response in higher plants. The first step in the catabolism pathway, CYP707A4, encodes an 8‘ABA hydroxylase, and is a prime candidate for ABA regulation. The 3‘UTR of CYP707A4 undergoes an RNA refolding event with Mg2+ and K+. The putative structure, with a stable duplex region and U-rich loop, has the potential to form a pseudoknot and stable tertiary structure. We used UV-detected thermal denaturation, circular dichroism, and structure mapping to show that the 3‘UTR of

CYP707A4 does adopt an ion-dependent structure. Isothermal titration calorimetry, circular dichroism, and equilibrium dialysis used to test for ABA binding to the 3‘UTR of CYPY707A4, suggest at most a weak and non-specific interaction with ABA.

5.2 Introduction

5.2.1 Primary and secondary metabolites in plants

Plants synthesize and utilize >200,000 different small molecules for a wide variety of functions.1 Primary metabolites, such as amino acids, are essential for plant life.2 There are more than 1,000 secondary metabolites that have been profiled in the model plant species Arabidopsis thaliana,3 and a few examples are shown in Figure 5.1. While these molecules are not essential for basic plant function, they are required for optimal function and are used in plant defense

123 against pathogens and herbivores,4 abiotic stress responses,3,5 signaling, regulation of primary metabolic pathways, and plant development.6 Secondary metabolites have complex biosynthesis, transport, and transduction pathways.7

Figure 5.1: Some common secondary metabolites in plants. Plant hormones: gibberelin G1, indole-3-acetic acid (IAA), cytokinin, abscisic acid (ABA), jasmonic Acid (JA), and brassinolide. Other secondary metabolites: Sinapolymalate (a phenylpropanoid), 1,8-Cineole (a terpene), vanillic acid (a benzenoid), Kaempferol (a flavonoid).

124 There are at least five classes of plant secondary metabolites:7 (1) nitrogen-containing compounds, such as indole-acetic acid (IAA), which are involved in plant growth8 (2) phenylpropanoids, such as sinapoylmalate, which are involved in defense against pathogens,9 (3) benzenoids, such as vanillic acid, which are found in seeds (4) flavonoids, including kaempferol, which are implicated in UV protection, auxin transport, and seed dormancy,10-12 and (5) terpenes, including 1,8-cineole, suspected to be involved in defense against plant pathogens and herbivores.

Some secondary metabolites are plant hormones, which function in stress responses, pathogen defense, and developmental regulation. Auxins (IAA), gibberellins (GA), cytokinin (trans-

Zeatin), ethylene, and (s)-cis-abscisic acid (ABA) are the major plant hormones.

5.2.2 Plant stress response hormone abscisic acid

Abscisic acid (ABA) is a plant hormone that is essential for water stress responses and seed development, and its structure can be seen in Figure 5.1. ABA concentrations increase under drought stress. As such, cellular concentrations can range from low nM under normal conditions to µM under drought stress conditions.13

Abscisic acid is synthesized in vascular tissues and to a lesser extent in chloroplasts of mesophyll cells.14 Free ABA is often localized in the cytosol, which has a neutral pH around 7.3.

Under optimal conditions, the pH of xylem sap and the apoplast, the region outside the cell membrane, is around 5.5-6.0.15 However, under drought stress conditions, the pH of the xylem sap and apoplast increases to a value greater than 7.15 Drought thus changes the distribution of the two protonation states of ABA, which has a pKa of 4.8, decreasing the amount of the protonated state of ABA (ABAH) present outside the cell membrane by about 10 fold.

125

Scheme 5.1: Ionization equilibrium of ABA

The pH change also affects ABA distribution in the plant: ABA is distributed according to the anion trap concept, whereby ABA is concentrated in alkaline regions (cytosol). As the pH of xylem sap and the apoplast increases, so do ABA concentrations in those regions due to the flattening of the pH gradient.16,17 ABA has a chiral center, and as such, can exist as S-(+)-ABA and R-(-)-ABA.18 When ABA is extracted from the plant, only (+) ABA is obtained, suggesting that this is the natural form of the molecule.19 However, both forms of ABA are biologically active, although (-) ABA only has about half the activity level of the natural (+) ABA.20

During drought or dehydration stress, ABA mediates stomatal closing and inhibits stomatal opening by affecting ion transport in the guard cells.21,22 ABA also works with other plant hormones, including GA, brassinolide, and auxin to regulate seed development and growth.23,24 For example, early in development, ABA promotes embryo growth.25 However, later in development, ABA can block premature embryo growth and germination if GA levels increase too quickly.26,27 ABA also functions in guard cell signaling, glucose and ethylene signaling, seedling development, fertility, the size of cotyledons, leaves, stems, roots, and siliques, and seed maturation, dormancy, desiccation response, and germination.14,25

A complex signal transduction pathway is required for ABA perception, transport, and signaling.28 ABA-binding proteins include the G-proteins GTG1 and GTG2,29 the magnesium cheletase subunit (ChlH),30,31 and the ABA receptor/pyrabactin resistance protein 1 and PYR-like proteins (RCAR/PYR/PYL).32 Recently identified transporters ABCG25 and ABCB14 transport

126 ABA into and out of cells, the former in vascular tissue, the latter in guard cells.33,34 ABA- induced gene expression is promoted by the ABA responsive promoter elements (ABRE) which are targeted by ABA-responsive element binding proteins (AREB), and ABA-responsive element binding factors (ABF). Other proteins involved in intracellular ABA signaling cascades and gene regulation include OST1, SnRK, CIPKs, and CPKs.28,32

5.2.3 Abscisic acid biosynthesis and catabolism

The ABA biosynthetic pathway, shown in Figure 5.2, starts with zeaxanthin, which is derived from -carotene. The first two steps of the pathway, conversion of zeaxanthin to antheraxanthin and of antheraxanthin to violaxanthin, are catalyzed by zeaxanthin epoxidase

(ZEP). Violaxanthin de-epoxidase (VEP) is responsible for catalyzing the reverse reactions, which are possible only under high light conditions in the chloroplast.35 Violaxanthin can be converted to neoxanthin by neoxanthin synthase (NSY) in step 3. Violaxanthin and neoxanthin can be isomerized to 9-cis-violaxanthin (step 4) and 9-cis-neoxanthin (step 5), respectively.

These can then be converted to xanthoxin by members of the 9-cis-epoxycarotenoid dioxygenase

(NCED) family.14 Five proteins of the NCED family are suggested to be involved in ABA biosynthesis. Step 6 was proposed as the first committed step in ABA synthesis and as the main regulatory step.36 Next, ABA2, a short-chain alcohol dehyrogenase, catalyzes the conversion of xanthoxin to abscisic aldehyde (step 7). Lastly, with the aid of a molybdenum cofactor, abscisic aldehyde oxidase oxidizes abscisic aldehyde to abscisic acid (step 8).14

127

128 Figure 5.2: ABA biosynthesis pathway (adapted from Nambara, et al14). The first two steps of the pathway (labeled with blue numbers), conversion of zeaxanthin to antheraxanthin and antheraxanthin to violaxanthin, are catalyzed by zeaxanthin epoxidase (ZEP). In step 3, violaxanthin de-epoxidase (VDE) can catalyze the reverse reaction in the chloroplast under high light conditions. Violaxanthin is most likely converted to neoxanthin by neoxanthin synthase (NSY). Isomers of violaxanthin and neoxanthin are synthesized by an unknown isomerase in steps 4 and 5. A family of 9-cis-epoxycarotenoid dioxygenases (NCEDs) is required to form xanthoxin in step 6, which is then converted to abscisic aldehyde by ABA2, a short-chain alcohol dehydrogenase (step 7). Abscisic aldehyde is then oxidized by abscisic acid oxidase 3 (AAO3) with the help of a molybdenum cofactor (MoCo), to form abscisic acid (step 8).

Regulation of the ABA biosynthesis pathway can be affected by upstream pathways, such as the carotenoid biosynthesis pathway37 and the xanthophyll cycle (regulation of ZEP)38 that can affect levels of the zeaxanthin starting molecule. As mentioned above, the concentration of ABA increases under dehydration and drought stress conditions. This stress induces NCED3, AAO3,

ABA3, and ZEP expression14, which leads to increased ABA biosynthesis. NCED has been proposed as the regulatory enzyme in the biosynthetic pathway, as its expression correlates with the concentration of ABA, and ABA concentrations increase when NCED is overexpressed.36,39-41

ABA biosynthesis occurs primarily in vascular tissues, where expression of ABA2, AAO4, and the NCED genes, have been verified.25,42 ABA is transported from vascular tissues to the apoplast and guard cells, where it is required for stomatal closure.43,44 In addition to long distance transport, there is evidence that ABA can be synthesized in guard cells: AAO3 expression is induced in guard cells under dehydration stress,42 and NCED2 and NCED3 are also expressed in guard cells.45

The ABA catabolism pathway is shown in Figure 5.3. ABA can be hydrolyzed into 7‘,

8‘, or 9‘hydroxy ABA, all of which have some biological activity.46 The most predominant catabolism pathway is through 8‘ hydroxyl ABA and is catalyzed by ABA 8‘-hydroxylases, which are encoded by CYP707A1-4 genes.47,48 8‘hydroxyl ABA is isomerized to (-)-phaseic acid

(PA), and 98% of 8‘-hydroxy ABA exists as PA in aqueous solution.49 PA can be further

129 converted to (-)-dihydrophaseic acid (DPA). 9‘hydroxy ABA is also an abundant catabolite, but

7‘ hydroxy ABA is the minor catabolite. All hydroxylation products can be conjugated to glucose at hydroxyl groups. ABA glucosyl ester (ABA-GE) is the most common conjugate and has been implicated in long distance transport of ABA.50,51

Figure 5.3: ABA catabolism pathway (adapted from Nambara, et al14). The major pathway for hydroxylation is through 8‘- hydroxy ABA (central path) and is catalyzed by an ABA 8‘ hydroxylase encoded by CYP707A genes. 8‘ –hydroxyl ABA is readily converted into phaseic acid, which can then by catabolized into the dihydrophaseic acid (DPA). In some organisms and tissues, 9‘- hydroxy ABA is abundant and readily converted to neo phaseic acid (neo PA). The minor catabolite of ABA is 7‘- hydroxy ABA. All of these hydroxylated compounds can be further conjugated at hydroxy groups.

130 In the catabolism pathway, the CYP707A genes encoding for ABA 8‘hydroxylase are suggested as regulatory enzymes.14 CYP707A genes are induced by high humidity,52 osmotic stress,47 GA, and brassinolide.53 A decrease in the levels of ABA is often associated with an increase in the major catabolites PA and DPA.47 It has been suggested that ABA can affect its own regulation, because many of the ABA biosynthesis and catabolism genes are up-regulated after treatment with ABA.25,47,48,54-56 A metabolic pathway is often regulated at the first committed step for synthesis or catabolism. In the case of ABA, the NCED and CYP707A genes are proposed as the first committed steps in the biosynthesis and catabolism, respectively.

5.2.4 Riboswitches as RNA regulators

As described in Chapter 1, riboswitches are natural RNA aptamers that bind small molecules and undergo an RNA refolding event to control gene expression. Only one riboswitch has been identified in higher plants.57,58 The thiamine pyrophosphate (TPP) riboswitch was discovered in Arabiodpsis by searching the genome for sequence and structure homology to the known prokaryotic TPP riboswitch.57,59 The plant aptamer is located in the 3‘UTR of a thiamine biosynthetic gene, THIC, and alters 3‘-processing of the gene, affecting mRNA degradation, expression of THIC, and biosynthesis of TPP.59 The mechanism is provided in Figure 5.4. In the presence of low concentrations of TPP, the splice site is obstructed and a short 3‘UTR allows for high THIC expression and TPP biosynthesis. When high concentrations of TPP are present, splicing is permitted, and a longer 3‘UTR leads to transcript degradation and a decrease in TPP biosynthesis.59

131

Figure 5.4: Arabidopsis thaliana TPP Riboswitch functions in alternative splicing, 3‘end processing, and mRNA stability.59

Riboswitches have been discovered by four main approaches: observations of feedback inhibition in vivo, comparative genomics across species near regulatory regions, sequence or structural homology to known riboswitches, and aptamer selection. Observations of feedback inhibition genes in prokaryotes led to discovery of riboswitches for lysine,60 coenzyme B12

(AdoCbl),61 and thiamine pyrpophosphase (TPP).62,63 Transcription of biosynthetic genes was repressed by high concentrations of the small molecules in vivo. In vitro studies led to the discovery of regions of the prokaryotic 5‘ UTRs that bound to the small molecules and inhibited ribosome binding and transcription initiation.60-62 Once a consensus sequence was found for the aptamer motif, it was used to search for riboswitches in other species. Using sequence and structure homology searches, the TPP riboswitch was found in 22 species of plants and nine species of fungi.59,64,65 S-adenosylmethionine (SAM) type II riboswitches were discovered from structural similarity to the SAM I binding motif.66 Other riboswitches, such as the riboflavin

(RFN), glmS, queuosine, and glycine riboswitches were found by comparative genomics or bioinformatics searches for sequence conservation in bacterial regulatory regions.67-71 In some

132 cases, a small molecule can affect expression of multiple genes in the same regulatory pathway, such as with the TPP aptamers and riboswitches in N. crassa.64

Aptamer sequences for small molecules can also be found using in vitro selection followed by a search for the consensus sequence in the organism(s) of interest. This is usually done by attaching the target small molecule to a solid support before running multiple rounds of selection.72,73 However, several structural studies show that the aptamer domain of a riboswitch completely envelops the ligand when it binds.57,74-76 Disrupting a functional group on the small molecule to attach it to a solid support for multiple round selections could thus remove a functional group that would otherwise be used for binding.73 New methods involving capillary electrophoresis have selected apatamers for small molecules without the use of a solid

77-79 support.

We started this study by investigating non-coding regions of genes in the ABA metabolism pathways (biosynthesis and catabolism) for ABA binding. These were prime candidates for an ABA riboswitch in the model plant species Arabidopsis thaliana. As mentioned above, exogenous treatment of ABA up-regulated metabolic genes, almost all known riboswitches are found in non-coding regions of genes, and ABA contains functional groups that could bind to RNA. Unfortunately, the ABA binding studies were inconclusive and showed at best weak and non-specific binding of ABA at low pH. However, we did find an AU-rich 3‘UTR that could fold into an ion-dependent tertiary structure.

133 5.3 Materials and methods

5.3.1 Bioinformatics

Non-coding portions of ABA-related genes were investigated, with a special focus on the

3‘UTR regions of ABA biosynthesis and catabolism genes. In an attempt to find homology between genes, BLAST for primary sequence searches and CLUSTALX for primary sequence alignments80-82 were used. Secondary structure alignments were carried out using DynAlign in

RNAStructure83-85 for non-coding regions of ABA metabolism genes. The gene candidates for these studies can be seen in Figures 5.2 and 5.3. Secondary structures were analyzed by manual sequence alignment with the particular goal of finding potential pseudoknot regions in the structures, as these have been implicated in formation of tertiary structures.

5.3.2 RNAs studied

Full length 3‘UTRs were used for the study and their sequences are shown below. For the CYP707A4 oligonucleotide, there are 20 additional nucleotides upstream of the 3‘UTR that are included in the RNA studied. This was a result of finding the best primer for cloning purposes. The underlined portion of CYP707A4 in the text below shows the nucleotides included in the secondary structure of Figure 5.5.

AAO3 (At2g27150): GGAAAGAUCAAAGGACAAUAAAGGAAAGCAAGUAAGCAACUCUCGUUGCUGCUAA GUUACAGUGUUACAUAUACUAAAGUUAUUCAGAUUUUAUAUACAGAAAAAUAAU UGCUUUGUGCAUAACUAAUGGUUUCAAAACCAAAUCAAUAAAUCAAAUAUUGAA UGUUAAUGAUUAGAGUGUCACCAAU

AAO4 (At1g04580): GGAGAAUGGAAAACAUAUCCUUAAAAUAACAGAGUCACUCAGAGAUGAUGGAAA AAUCAAAAUAAUAUUUUUCUUAUGAUAUAAAUAUAUUAUGCAAAUUUAUGUACU AUAUGCAUCAAAAGGCUUGUACCA

134 CYP707A4 (At3g19270): AGUUCCUUAAACCUUUGUAGUAAUCUUUGUUGUAGUUAGCCAAAUCUAAUCCAAA UUCGAUAUAAAAAAUCCCCUUUCUAUUUUUUUUUAAAAUCAUUGUUGUAGUCUU GAGGGGGUUUAACAUGUAACAACUAUGAUGAAGUAAAAUGUCGAUUCCGGUGGT ATTATAAATTACAGTCG

Shortened constructs of regions of CYP707A4, listed below, were also prepared based on the predicted secondary structures and as described below. The underlined portions show those nucleotides involved in the putative secondary structures in Figures 5.5 and 5.8. The 5‘ and 3‘ ends of the shortened constructs are indicated on Figure 5.5 with red arrows and nucleotide numbers.

20/158: AGUUCCUUAAACCUUUGUAGUAAUCUUUGUUGUAGUUAGCCAAAUCUAAUCCAAA UUCGAUAUAAAAAAUCCCCUUUCUAUUUUUUUUUAAAAUCAUUGUUGUAGUCUU GAGGGGGUUUAACAUGUAACAACUAUGAUGAAGUAAAAUGUCGAUUCCGGU

32/140: AGUUAGCCAAAUCUAAUCCAAAUUCGAUAUAAAAAAUCCCCUUUCUAUUUUUUUU UAAAAUCAUUGUUGUAGUCUUGAGGGGGUUUAACAUGUAACAACUAUGAUGAAG

CYP707A4 shortened constructs with a 20 nt Poly(A) tail were also prepared to examine potential interactions between the U-rich loop found at nt 79 to 88 and the poly(A) tail:

20 Poly(A) AAUCUUUGUUGUAGUUAGCCAAAGCCAAAUCUAAUCCAAAUUCGAUAUAAAAAAU CCCCUUUCUAUUUUUUUUUAAAAUCAUUGUUGUAGUCUUGAGGGGGUUUAACAU GUAACAACUAUGAUGAAGUAAAAUGUCGAUUCCGGUA20

32 Poly(A) AGUUAGCCAAAUCUAAUCCAAAUUCGAUAUAAAAAAUCCCCUUUCUAUUUUUUUU UAAAAUCAUUGUUGUAGUCUUGAGGGGGUUUAACAUGUAACAACUAUGAUGAAG A20

5.3.3 Preparation of RNA

Genomic DNA was extracted from a 1-2 mm2 sized piece of leaf from Arabidopsis thaliana, Columbia-0 ecotype. The leaf was ground in Template Preparation Solution (TPS: 100

135 mM Tris-HCl, pH 9.5, 1M KCl, and 10 mM EDTA) and heated to 96 C for 10-15 min. A 10- fold dilution was prepared to be used as PCR template. PCR was performed using gene specific primers, which are given in Table 1. A T7 Polymerase transcription site was added with the primers as well as restriction enzyme sites for EcoRI and BamHI. The PCR products were run on a 2% agarose gel to confirm PCR results.

Table 5.1: PCR primers for 3‘UTRs of interest 5‘- GGCCGAATTCTAATACGACTCACTATA GGAAAGATCA Upstream AAO3 AAGGACAATAA -3‘ Downstream 5‘- GGCCGGATCCATTGGTGACACTCTAATCATTA -3‘ 5‘- GGCCGAATTCTAATACGACTCACTATAGGAGAATGGA Upstream AAO4 AAACATATCCTTAA -3‘ Downstream 5‘- GGCC GGATCC GTGGTACAAGCCTTTTGAT -3‘ 5‘- GGCCGAATTCTAATACGACTCACTATAGGATCGTCGAC Upstream CYP707A4 ATTCTCTTTAG -3‘ Downstream 5‘- GGCCGGATCCCATTTTCGACTGTAATTTATAAT -3‘ 5‘- GCGCGAATTCTAATACGACTCACTATAGGAGTTCCTTA Upstream 20/158 AACCTTTGTAGTAA -3‘ Downstream 5‘- GCGCGGATCCACCGGAATCGACATTTTACTTC -3‘ 5‘- GCGCGAATTCTAATACGACTCACTATAGGAGTTAGCCA Upstream 32/140 AATCTAATCC -3‘ Downstream 5‘- GCGCGGATCCCTTCATCATAGTTGTTACATG -3‘ 20/158 Downstream 5‘- GCGCGGATCCT ACCGGAATCGACATTTTACTTC -3‘ Poly(A) 20 32/140 Downstream 5‘- GCGCGGATCCT CTTCATCATAGTTGTTACATG -3‘ Poly(A) 20

Gold = Base pair seal; Blue = EcoRI; Green = BamHI; Red = T7 Pol Site

For most RNAs, PCR products were purified from a 2% agarose gel using a Qiagen

QIAquick gel extraction kit. The PCR products/DNA inserts were double digested with EcoRI and BamHI and ligated into double digested and calf intestinal phosphatase (CIP)-treated pUC19 plasmid. Plasmids were transformed into DH5 cells and grown on LB-agar plates with 100 mg/mL ampicillin. Mini-, Midi-, and Maxi- preps were performed to isolate plasmid DNA.

136 Sequences were confirmed through sequencing at the Penn State Genomics Core Facility. To prepare RNA, BamHI-digested plasmid was used as the template for pilot transcriptions with homemade T7 RNA polymerase to obtain the best conditions for each construct. Large scale transcriptions were preformed to obtain larger amounts of RNA, which were necessary for ITC experiments, described below. Transcribed RNA was purified on a 6 % polyacrylamide gel, extracted, and ethanol precipitated. For oligonucleotides with an A20 tail, transcription of polyA oligonucleotides was performed with PCR product as the template without cloning.

5.3.3 Circular dichroism

Circular dichroism spectroscopy was performed using a Jasco CD J810

Spectropolarimeter and analyzed with Jasco Spectra Manager Suite software. RNAs were prepared as described above to desired concentrations for each experiment. Spectra were acquired at 20 C over a wavelength range of 210-320 nm, with data collected every nm at a bandwidth of 1 nm. Reported spectra are an average of 3 scans at a response time of 4 s/nm.

RNA oligonucleotides were titrated with MgCl2, KCl, and ( )ABA (OlChemIm). Data were buffer subtracted, normalized to provide molar residue ellipticity values (Eq. 1), and smoothed over a window size of five datapoints.86 Molar residue ellipticity is reported in order to normalize for concentration (C) differences, cuvette path length (L) and oligonucleotide length (N):87

(Eq. 1) 32,980 C L N

137 5.3.4 UV detected thermal denaturation

RNA was prepared to a concentration of ~0.35 µM in 10 mM MES (pH 6.0), 0.1 mM

EDTA, 100 mM NaCl. Samples were renatured at 85 C for 1 min then allowed to cool to room

2+ temperature. MgCl2 was added to a final concentration of 0.5 or 4 mM. A no Mg -added control was included. Samples were then heated to 50 C for 5 min and then cooled at room temperature before loading into the instrument. Thermal denaturation experiments were performed on a Gilford Response II spectropolarimeter. Data were collected every 0.5 C over the temperature range 5-95 C. Data for the reverse transition were not collected owing to the presence of Mg2+. Unfolding transitions were monitored at 260 nm with 1 cm cuvettes. Data were analyzed with KaleidaGraph v.3.5 (Synergy Software).

5.3.5 Isothermal titration calorimetry

RNA was prepared for ITC by dialysis against the desired buffer. Samples were first dialyzed using a 10,000 MWCO membrane against 2 L of 0.1 M NaCl to ensure the counterion

+ was Na , then against 5L of sterile H2O to remove excess NaCl. Finally, samples were dialyzed against 1x Buffer overnight. Because of low solubility in aqueous solutions, ( )ABA and a control ligand, indole-3-acetic acid (IAA), stock solutions were prepared in an 100% ethanol solution. Because of the high heat release for ethanol dilution in aqueous solutions, the ABA stock solutions cannot be used in a direct injection. Instead, a calculated amount of ABA or IAA stock solution was transferred into an Eppendorf tube, dried in the dark, and resuspended in 1x

Buffer. 1x Buffers used included: 10 mM Na-MOPS (pH 7.5), 100 mM NaCl or 10 mM MES

(pH 5.9), 100 mM NaCl,. 4 µM CYP707A4 RNA was renatured without MgCl2 at 80 C for 1 min and allowed to cool. MgCl2 was added to a final concentration of 5 mM before heating to 50 C

138 for 5 min. The sample was allowed to cool to 25 C before degassing the RNA and ligand solutions. Titrations were performed as follows: 4 µM CYP707A4 RNA was titrated with 30 µM

IAA followed by 30 µM ABA (OlChemIm) for screening purposes. The reference power was set to 15 (mcal/sec), with 10 µL injections, a spacing of 300s, and a stirring speed of 270 rpm.

Some additions were made with (+) ABA (OlChemIm) in the same manner as described above.

5.3.6 Structure Mapping

Hydroxyl radical footprinting was performed using two methods, those of Tullius88 and

Herschlag,89 to verify results.

Tullius Method: 5‘-end labeled RNA in 10 mM MES, pH 6.0, was renatured at 85 C for 1 min, followed by cooling at room temperature for 15 min. MgCl2 and ABA were added to the appropriate concentration for 10 µL reactions. Tubes were allowed to equilibrate at 50 C for 20 min. The 10 µL cleavage reaction was prepared by adding the following solutions to the sides of each tube: 1 µL of 0.1 mM Fe-EDTA, 1 µL of 10 mM sodium ascorbate, and 1 µL of 0.3% H2O2.

The reaction was initiated by one sharp shake downward to the tube, to mix all reagents at the bottom. Tubes were incubated for 2 min at room temperature before quenching with 1 µL of 100 mM thiourea. 2X EDTA-formamide loading buffer was added before running samples on a 6% sequencing gel.

Herschlag Method: 5‘-end labeled RNA in 10 mM MES, pH 6.0, was renatured at 85 C for 1 min, followed by cooling at room temperature for 15 min. MgCl2 and ( )ABA were added to the appropriate concentration for 10 µL reactions. Tubes were allowed to equilibrate at 50 C for 20

139 min. 4 µL of a 2.5X Fenton Master Mix (FMM) was added to each reaction (2.5X FMM: 0.25 mM ferrous ammonium sulfate, 0.155 mM EDTA, 12.5 mM sodium ascorbate) before incubating at room temperature for 60 min. Reactions were quenched with 1 µL of 100 mM thiourea and 2X

EDTA-formamide loading buffer before loading onto a 6% sequencing gel.

The main differences between these two methods were the reagents used to generate hydroxyl radicals and the resulting incubation time of the samples. The Tullius method uses

H2O2 and a 2 min incubation time while the Herschlag method does not utilize H2O2 and requires a longer incubation time of 60 min. Both yielded similar results.

5.3.7.Equilibrium Dialysis

Equilibrium dialysis was performed using a Hoefer Equilibrium Dialyzer (EMD101,

Babitzke lab) with 12-14,000 MWCO membranes and circular sample wells that held 100 µL sample volume on each side. 3H-ABA (Amersham, UK) was provided as 4 µM ( )ABA with a specific activity of 200 µCi/mL. Unlabeled RNA was prepared in vast excess to the trace amounts of 3H-ABA used. High concentrations of RNA are necessary to obtain an acceptable partition ratio of the labeled small molecule given the high concentrations of ABA in vivo ( M).

If the RNA concentration is equal to the Kd, the partition ratio will be 2. Because the binding affinity of ABA to the RNAs of interest was unknown but expected to be similar to the concentrations of ABA in the cell, RNA concentrations of 20 µM - 100 µM were used. 3H-ABA was prepared by drying the appropriate amount of 3H-ABA to remove the methanol solvent, then resusupending in 50 mM MES, 100 mM KCl, pH 6.0, and 5 mM MgCl2 to maintain ionic strength with the RNA solution. At 100 µM concentrations, some of the RNAs precipitated out of solution upon addition of concentrated MgCl2 or after renaturation at 50 C and cooling to

140 room temperature. To achieve high concentrations of RNA and prevent precipitation upon addition of MgCl2, RNA was prepared as a dilute solution of 5-10 µM before heating to 65 C for

1 min. 500 µL of 1 mM MgCl2 was added and mixed well before heating to 50 C for 5 min.

RNA was then concentrated with an ultrafiltration spin column (VivaSpin500) to a final volume of 100 µL.

The equilibrium dialysis apparatus was prepared with two control sets of 3H-ABA dialyzed against buffer, to ensure proper movement of 3H-ABA across a 13-14 kDa molecular weight cut off membrane. Sets of RNA versus 3H-ABA were prepared for RNAs of interest and

Phe yeast tRNA as a negative control. Solutions were allowed to equilibrate overnight (12-14 h).

After equilibration, 10 µL of each sample was removed from each well and counted in a scintillation counter (Beckman Coulter) to quantify 3H-ABA in each side of the membrane. The ratios of the 3H counts in the RNA and ABA wells were calculated. A binding event would yield a ratio greater than one and no binding should result in a ratio around 1 or lower due to the

Donnan effect.90

5.4 Results

5.4.1 Bioinformatics

All known riboswitches have been found in non-coding regions of the RNA.76 Because of this, we started our search with non-coding regions (5‘UTRs, 3‘UTRs, introns) from the ABA biosynthetic and catabolic genes listed in Figures 5.1 and 5.2. The TPP riboswitch in Arabidopsis is located in the 3‘UTR of a biosynthetic gene for a small molecule,62,63 which prompted a closer examination of 3‘UTRs for the ABA-related genes. Interestingly, NCED2, NCED6, and NCED9 do not have annotated 5‘ or 3‘UTR regions. Bioinformatics methods including BLAST91,92 and

141 CLUSTALX80-82 were used in an attempt to find between some of the genes.

Secondary structure alignments were carried out using Dynalign83 for non-coding regions of all known ABA biosynthetic and catabolic genes. Based on the bioinformatics searches, potential candidates were identified for an ABA riboswitch.

In doing this search, we were inspired by characterized mechanisms which utilize multiple binding events to control biological function. First, ferritin, an iron storage protein found in almost all living organisms, exhibits a molecular switch mechanism. High iron levels can increase ferritin mRNA for iron storage and decrease mRNA levels for RNA of the transferrin receptor, which controls iron uptake.93 Second, the branched amino acid biosynthesis pathway of leucine, valine, and isoleucine contains multiple points of regulation, where a combination of the three amino acids can inhibit their own biosynthesis at three different points.

94 Recently, the glmS riboswitch ribozyme has been shown to be activated or inhibited by two different types of ligands; glucosamine-6-P activates the riboswitch to decrease glmS mRNA abundance and hexose ligands inhibit riboswitch cleavage to increase glmS mRNA expression.95

In Arabidopsis, ABA biosynthetic gene AAO3 (At2g27150) and catabolic gene

CYP707A4 (At3g19270), specifically located at the last and first steps in their respective pathways,14 are two potential candidates for an ABA riboswitch. Even though AAO3 is not involved in the first committed step for ABA biosynthesis, which is proposed to be controlled by the NCEDs,14 AAO3 is specific to ABA biosynthesis and its secondary structure aligned with

CYP707A4. Moreover, it remains possible that abscisic aldehyde is a precursor to another, as yet unidentified, secondary metabolite. The 3‘UTRs of these two genes are around 200 nt, which is similar in length to known riboswitches.58 The secondary structure alignment of the RNAs by

DynAlign revealed a surprisingly strong stem loop in CYP707A4, especially considering the AU- rich nature of the 3‘UTR; four G C base pairs help to stabilize the stem (Figure 5.5). The AAO3 stem is not as strong as the CYP707A4 stem but

142

Figure 5.5: Putative secondary structure and pseudoknots of the 3‘UTRs of CYP707A4 and AAO3 as determined from DynAlign and manual sequence alignment. Red arrows with nucleotide numbers indicate the 5‘ and 3‘ ends of shortened constructs, as described in the text.

143 still has the potential to form two G C and several A U base pairs. The crossover was included in the structures to orient the sequences on paper. In both structures, the large loops at the end of the stems are very AU-rich and have little to no potential for self structure. Stable stem loops are typical of riboswitch structures, such as in the TPP riboswitch, shown in Figure 5.4.

Pseudoknots are not predicted using secondary structure prediction software, such as

Mfold or DynAlign, and programs designed to look for psudoknots can be unreliable. We proposed tertiary contacts and potential psudoknot interactions as discovered by manual sequence alignment (Figure 5.5). Loop-loop interactions and pseudoknots are common in riboswitches that oftentimes envelop the bound ligand. An example of these interactions can be seen in the TPP riboswitch structure in Figure 5.4. The AAO3 and CYP707A4 3‘UTRs shown in Figure 5.5 were used for folding and binding experiments discussed below. Most of the experiments were completed with CYPY707A4 and related constructs because of a better transcription yield.

5.4.2. Mg2+ and K+ dependent folding

Hydroxyl radical footprinting (HRF) experiments were performed to determine whether there is a tertiary structure for CYP707A4 RNA, and if any such structure would change in the presence of ABA. It is known that RNA nucleotides with only secondary structure are cleaved by hydroxyl radicals and that formation of tertiary structure prevents backbone cleavage. Due to the small size of the hydroxyl radical, protection from cleavage requires tight backbone packing.88,96,97 HRF experiments indicated a Mg2+ dependence for RNA folding and cleavage protection (Figure 5.6). Each band was normalized to the total number of counts in each lane to account for variation in loading. A general protection with increasing Mg2+ was seen, and fit to a

2+ two-state Hill equation to find a Mg 1/2 of 0.29 mM 0.06 mM and a Hill coefficient (n) of 4.8

144 4.1. These numbers, despite the high error in n, are similar to those for similar experiments on a folded RNA comparable in size.97

Figure 5.6: Hydroxyl radical footprinting (HRF) of CYP707A4 RNA shows increased protection with increasing Mg2+. Each band is normalized to the total number of counts in each lane. From 2+ the fit, the Mg 1/2 is 0.29 0.06 mM and n is 4.8 4.1.

Footprinting was also conducted in the presence of ( )ABA and a control plant hormone

IAA, however, results were not conclusive. The Mg2+ dependence of CYP707A4 RNA folding was confirmed by UV thermal denaturation, seen in Figure 5.7. Without Mg2+, the RNA has a low Tm of 15 C. However with addition of 0.5 mM Mg2+, the Tm increases to 30 C and in the presence of 4 mM Mg2+, the RNA had a Tm of 42 C, which is 27 C greater than the RNA without Mg2+.

145

Figure 5.7: Tm of CYP707A4 3‘UTR increases with Mg2+ concentration. RNA was prepared to 0.35 µM in 10 mM MES, 0.1 mM EDTA, and 100 mM NaCl for all melts. Unfolding was 2+ 2+ 2+ monitored at 260 nm. Tm‘s: 0 mM Mg , 15 C; 0.5 mM Mg , 30 C; 4 mM Mg , 42 C.

We employed circular dichroism (CD) to determine any changes in RNA structure upon addition of Mg2+, as suggested from the HRF and UV melts. The full length CYP707A4 RNA showed some small changes in structure in the CD with addition of Mg2+ and K+ but did not exhibit a substantial change in magnitude (data not shown). We then prepared shortened constructs for the 3‘UTR of CYP707A4 (20/158 and 32/140) as described above and shown in

Figure 5.8 to see if a minimal binding region would have a larger fractional change in structure.

The 20/158 construct removed 20 nts at the 3‘ end that were not involved in the putative secondary structure shown in Figure 5.5. If there were any alternative pairings possible with the deleted nucleotides, they would also be removed with the deletion. The 32/140 construct removed 34 nts at the 5‘end that are base pairing contacts and subsequently weakened the putative duplex region. It also removed 18 nts at the 3‘end that were involved in tertiary structure, while leaving the U-rich region of the loop unpaired.

146

Figure 5.8: Shortened putative secondary structures of the CYP707A4 3‘UTR (a) 20/158 and (b) 32/140.

The 20/158 construct showed minimal change upon addition of Mg2+ and K+ (data not shown), however the 32/140 construct showed a significant fractional change in CD signal of

7.7% with Mg2+ and 26.7% with K+, as shown in Figure 5.9. With the addition of Mg2+ in the background of 100 mM K+ (Figure 5.9a), the CD signal increased slightly to a maximum around

1-5 mM. When additional K+ was added to 32/140, there was a large decrease in CD signal through 600-700 mM K+ (Figure 5.9b).

147

Figure 5.9: 32/140 RNA changes structure upon addition of Mg2+ or K+. Circular dichroism titrations of 0.4 µM 32/140 RNA with (a) MgCl2 to 5 mM or (b) KCl to 700 mM. Both titrations were in the background of 50 mM MES, pH 6.0. The Mg2+ titration also had a background of 2+ + 100 mM KCl. Estimated Mg 1/2 is around 0.04 mM and estimated K 1/2 is around 300 - 400 mM.

5.4.3.ABA Binding

Isothermal titration calorimetry (ITC) was performed by titrating the 3‘UTR of

CYP707A4 with ( )ABA as described in Materials and Methods Section 5.3.5. To probe the pH dependence of ABA binding, titrations were completed at pH values of 5.5, 6.0, and 7.5, paralleling pH changes known to occur in vascular tissues and the apoplast during drought stress.

Background salt concentrations of 5 mM Mg2+ and 100 mM Na+ were used. Interestingly, the interaction between ABA and CYP707A4 was stronger at pH 6.0 compared to pH 7.5 or 5.5.

Figure 5.10 contains the ITC data at all three pH values. An exothermic reaction was seen for the

ITC titration with ABA, with apparent saturation of binding around 1.5 molar equivalents. To test the binding specificity, a control titration was performed with IAA, which showed

148

Figure 5.10: ITC titrations of CYP707A4 RNA with ABA show a small heat release at lower pH values that is slightly greater than the IAA control. (a) 4 µM CYP707A4 RNA in 10 mM Na- MOPS, pH 7.5 with 5 mM Mg2+ titrated with 30 µM ( )ABA shows a very weak signal. (b) 2 µM CYP707A4 RNA in 10 mM MES, pH 5.5 with 5 mM Mg2+ titrated with 30 µM ( )ABA. (c) 2 µM CYP707A4 RNA in 10 mM MES, pH 6.0 with 5 mM Mg2+ titrated with 30 µM ( )ABA or 30 µM IAA. The interaction between the RNA and ABA is noticeably stronger at pH 6.0 than at pH 7.5. (d) The control titration using 30 µM of IAA instead of ABA under the same conditions as panel (c) indicated a minimal interaction between the control ligand and the RNA.

149 a smaller interaction. Control titrations of ABA and IAA into buffer were performed to ensure that results seen were not due to dilution of ABA into the RNA solution. To test if only the biologically active form of ABA bound, we performed titrations of -2/158 and 32/140 RNA oligonucleotides with (+) ABA. Results suggested a weak interaction and no specific conclusions were drawn (data not shown).

CD was also used to look for a change in RNA structure upon ABA binding. All

CYP707A4 constructs tested showed a decrease in CD signal with increasing ABA concentrations. The titration of 32/140 RNA with ABA can be seen in Figure 5.11a. A control titration with 32/140 RNA and IAA also showed a decrease in CD signal. We tested yeast tRNAPhe and the malachite green aptamer (MGA) for ABA binding and saw no significant change in CD signal through 5 µM and 1µM, respectively.

Equilibrium dialysis using radiolabeled 3H-ABA was used to further test ABA binding to

AAO3, CYP707A4, shortened constructs, and poly(A) constructs. Table 5.2 contains the experimental RNA concentrations and resulting ratios from equilibrium dialysis experiments. All of the RNAs tested showed no evidence for binding. Some of the ratios in Table 5.2 are less than one. This is explained by the Gibbs-Donnan effect. Equilibrium dialysis is governed by both electrical and chemical driving forces, and the resulting distribution of species across the membrane is a balance of the two forces. The large concentration of negatively charged RNA on one side of the membrane can inhibit equal distribution of the negatively charged ABA. The ligand will not be able to cross the membrane to a final 1:1 ratio because of the uneven electrical charge.

150

Figure 5.11: Circular dichroism indicates a change in RNA structure with ABA additions. (a) 0.2 µM 32/140 RNA in 50 mM MES, pH 6.0, 100 mM KCl, and 5 mM MgCl2 with ABA additions to 50 µM. Estimated ABA1/2 is around 1 µM with a percent change of 13.7%. (b) 0.4 µM 32/140 RNA in 50 mM MES, pH 6.0, 100 mM KCl, and 5 mM MgCl2 with IAA additions to 50 µM. Estimated IAA1/2 is around 0.5 µM with a total percent change of 7.4%. (c) Control titration with 0.4 µM yeast tRNAPhe in the background of 50 mM MES, pH 6.0, 150 mM NaCl, and 5 mM MgCl2. (d) Control titration with 1 µM MGA RNA in 10 mM NaCacodylate, pH 5.9, 10 mM KCl, and 10 mM MgCl2.

151 Table 5.2: Equilibrium dialysis shows no evidence for ABA binding

RNA Ratio 10 mM MES, pH 6.0a 1.02, 0.97, 0.95, 1.08, 0.99 1.25 µM CYP707A4 RNA 0.99 75 µM 20/158 0.90 100 µM 32/140 0.86 30 µM 20 Poly(A) 0.84 10.7 µM 32 Poly(A) 1.07 72.5 µM yeast tRNAPhe 1.05 0.4 µM AAO3 RNA 0.95

aBuffer-ABA was used as a control for all equilibrium dialysis experiments. Listed are five representative ratios.

While preparing concentrated samples for equilibrium dialysis, we observed precipitation of several RNAs upon addition of a concentrated MgCl2 solution or in some cases, after renaturation to 50 C and subsequent cooling to room temperature. This was most likely due to the large effective concentration of Mg2+ when injected, but could also indicate a structured RNA.

Precipitation was alleviated by addition of a more dilute solution of Mg2+ and then concentrating to the desired concentration, as described above.90,98

5.4.4 Poly(A) tailed constructs

We prepared constructs with 20 nt poly(A) tails for 20/158 and 32/140 oligonucleotides as described in Materials and Methods; they are labeled 20 Poly(A) and 32 Poly(A), respectively.

In the cell, mRNAs are polyadenylated at the 3‘end with anywhere from 20 to greater than 200

As or A-rich sequence.99-102 Studies from the Steitz lab have shown that poly(A) tails can hybridize to U-rich internal loops within an mRNA and stabilize the transcript.103 As we also have a secondary structure with a U-rich loop towards the end of the 3‘UTR, we wanted to investigate the contribution of poly(A) tails to our secondary structures.

152 CD titrations of 20 poly(A) with Mg2+ additions showed no change in signal. Subsequent addition of ( )ABA showed a slight decrease in signal, similar to the 20/158 construct. 32

Poly(A) showed an increase in CD signal upon addition of Mg2+ similar to the 32/140 construct without the poly(A) at the 3‘end (data not shown). A decrease in CD signal with addition of ( )

ABA was also seen. Titrations were not performed with K+.

5.5 Discussion

We started this study searching for an ABA riboswitch in Arabidopsis thaliana. While binding studies showed no indication of a strong and specific binding interaction between ABA and the 3‘UTR of CYP707A4 or AAO3, there is still evidence for a folded RNA in a non-coding region of CYP707A4. The CYP707A4 3‘UTR and shortened constructs show evidence for a

Mg2+- dependent structure based on HRF and CD experiments. The 32/140 shortened sequence showed a K+-dependent folding using CD. Additionally, some of these RNAs precipitated at high concentrations with the addition of Mg2+, which can be characteristic of folded RNAs.

Preliminary binding studies with CD and ITC showed promising evidence for ABA binding to candidate RNAs, with greater effects than control ligands and control RNAs.

Equilibrium dialysis, however, showed that there was no specific binding interaction between

ABA and the RNAs of interest in the given concentration ranges which would be biologically relevant. It nonetheless remains possible that expression of ABA metabolic genes are controlled by an ABA binding protein that binds the tertiary structure identified herein, or targets other

RNAs in the biosynthesis or catabolism pathways. It is also possible that the CYP707A4 RNA binds to an ABA-related metabolite, such as hydroxy-ABA, PA, or conjugated ABA such as

ABA-GE. One other possibility is that the tertiary structure identified herein contributes to

153 mRNA stability. As seen in the plant TPP riboswitch, stable structures in the 3‘UTR can affect

3‘end processing and degradation.

In our study, there was some difficulty due to limitations in the plant system. There are not many well-sequenced plant genomes that have good gene annotation data, which makes a comparative genomics study difficult. In prokaryotes, riboswitches are located adjacent to the

Shine Dalgarno sequence, a critical component of the ribosome binding site or terminator and antiterminator stem loops. In plants, such regulatory sequences do not exist. Since most riboswitches are found adjacent or overlapping to regulatory sequences or structures, it would be suggested to look for strong secondary structures near alternative splice sites, such as with the

TPP riboswitch,59,75 or near the AU-rich elements in the 3‘UTRs that are target mRNAs for degradation.104

5.6 Future Directions

As the above study shows, there is at most a weak and non-specific interaction between

ABA and the RNAs of interest. Thus, the story lies with characterization of folding of these AU- rich non-coding RNA regions. There is evidence for secondary and/or tertiary structure from CD and HRF that can be characterized further : small angle x-ray scattering (SAXS) can be utilized to obtain the global structure of the CYP707A4 RNA in the absence of ions, and with Mg2+ and K+.

Significant changes in global structure upon binding to these ions would increase interest in characterizing the folds. Structure mapping to determine the RNA structure can be performed to build up an experimental secondary structure to test the model in Figure 5.5.

154 5.7 References

1. Fiehn, O., Metabolomics--the link between genotypes and phenotypes. Plant Mol. Biol., 2002. 48: 155-171.

2. Hanada, K., Sawada, Y., Kuromori, T., Klausnitzer, R., Saito, K., Toyoda, T., Shinozaki, K., Li, W.H. and Hirai, M.Y., Functional compensation of primary and secondary metabolites by duplicate genes in Arabidopsis thaliana. Mol. Biol. Evol., 2010. 28: 377- 382.

3. von Roepenack-Lahaye, E., Degenkolb, T., Zerjeski, M., Franz, M., Roth, U., Wessjohann, L., Schmidt, J., Scheel, D. and Clemens, S., Profiling of Arabidopsis secondary metabolites by capillary liquid chromatography coupled to electrospray ionization quadrupole time-of-flight mass spectrometry. Plant Physiol., 2004. 134: 548- 559.

4. Hahlbrock, K., Bednarek, P., Ciolkowski, I., Hamberger, B., Heise, A., Liedgens, H., Logemann, E., Nurnberger, T., Schmelzer, E., Somssich, I.E., et al., Non-self recognition, transcriptional reprogramming, and secondary metabolite accumulation during plant/pathogen interactions. Proc. Natl. Acad. Sci. U.S.A., 2003. 100 Suppl 2: 14569- 14576.

5. Jin, H., Cominelli, E., Bailey, P., Parr, A., Mehrtens, F., Jones, J., Tonelli, C., Weisshaar, B. and Martin, C., Transcriptional repression by AtMYB4 controls production of UV- protecting sunscreens in Arabidopsis. EMBO J., 2000. 19: 6150-6161.

6. Kutchan, T.M., Ecological arsenal and developmental dispatcher. The paradigm of secondary metabolism. Plant Physiol., 2001. 125: 58-60.

7. D'Auria, J.C. and Gershenzon, J., The secondary metabolism of Arabidopsis thaliana: growing like a weed. Curr. Opin. Plant Biol., 2005. 8: 308-316.

8. Tan, J., Bednarek, P., Liu, J., Schneider, B., Svatos, A. and Hahlbrock, K., Universally occurring phenylpropanoid and species-specific indolic metabolites in infected and uninfected Arabidopsis thaliana roots and leaves. Phytochem., 2004. 65: 691-699.

9. Wittstock, U. and Halkier, B.A., Glucosinolate research in the Arabidopsis era. Trends Plant Sci., 2002. 7: 263-270.

10. Brown, D.E., Rashotte, A.M., Murphy, A.S., Normanly, J., Tague, B.W., Peer, W.A., Taiz, L. and Muday, G.K., Flavonoids act as negative regulators of auxin transport in vivo in arabidopsis. Plant Physiol., 2001. 126: 524-535.

11. Ryan, K.G., Swinny, E.E., Winefield, C. and Markham, K.R., Flavonoids and UV photoprotection in Arabidopsis mutants. Z. Naturforsch C., 2001. 56: 745-754.

12. Debeaujon, I., Nesi, N., Perez, P., Devic, M., Grandjean, O., Caboche, M. and Lepiniec, L., Proanthocyanidin-accumulating cells in Arabidopsis testa: regulation of differentiation and role in seed development. Plant Cell, 2003. 15: 2514-2531.

155 13. Harris, M.J., Outlaw, W.H., Mertens, R. and Weiler, E.W., Water-stress-induced changes in the abscisic acid content of guard cells and other cells of Vicia faba L. leaves as determined by enzyme-amplified immunoassay. Proc. Natl. Acad. Sci. U.S.A., 1988. 85: 2584-2588.

14. Nambara, E. and Marion-Poll, A., Abscisic acid biosynthesis and catabolism. Annu. Rev. Plant Biol., 2005. 56: 165-185.

15. Wilkinson, S. and Davies, W.J., Xylem sap pH increase: A drought signal received at the apoplastic face of the guard cell that involves the suppression of saturable abscisic acid uptake by the epidermal symplast. Plant Physiol., 1997. 113: 559-573.

16. Hartung, W. and Slovik, S., Tansley Review No. 35. Physicochemical Properties of Plant Growth Regulators and Plant Tissues Determine Their Distribution and Redistribution: Stomatal Regulation by Abscisic Acid in Leaves. New Phytol., 1991. 119: 361-382.

17. Hartung, W., Wilkinson, S. and Davies, W.J., Factors that regulate abscisic acid concentrations at the primary site of action at the guard cell. 1998. 361-367.

18. Roberts, D.L., Heckman, R.A., Hege, B.P. and Bellin, S.A., Synthesis of (RS)-abscisic acid. J. Org. Chem., 1968. 33: 3566-3569.

19. Oritani, T. and Kiyota, H., Biosynthesis and metabolism of abscisic acid and related compounds. Nat. Prod. Rep., 2003. 20: 414-425.

20. Balsevich, J.J., Cutler, A.J., Lamb, N., Friesen, L.J., Kurz, E.U., Perras, M.R. and Abrams, S.R., Response of cultured maize cells to (+)-abscisic acid, (-)-abscisic acid, and their metabolites. Plant Physiol., 1994. 106: 135-142.

21. Levchenko, V., Konrad, K.R., Dietrich, P., Roelfsema, M.R. and Hedrich, R., Cytosolic abscisic acid activates guard cell anion channels without preceding Ca2+ signals. Proc. Natl. Acad. Sci. U.S.A., 2005. 102: 4203-4208.

22. Siegel, R.S., Xue, S., Murata, Y., Yang, Y., Nishimura, N., Wang, A. and Schroeder, J.I., Calcium elevation-dependent and attenuated resting calcium-dependent abscisic acid induction of stomatal closure and abscisic acid-induced enhancement of calcium sensitivities of S-type anion and inward-rectifying K channels in Arabidopsis guard cells. Plant J., 2009. 59: 207-220.

23. Achard, P., Cheng, H., De Grauwe, L., Decat, J., Schoutteten, H., Moritz, T., Van Der Straeten, D., Peng, J. and Harberd, N.P., Integration of plant responses to environmentally activated phytohormonal signals. Science, 2006. 311: 91-94.

24. Zhang, S., Cai, Z. and Wang, X., The primary signaling outputs of brassinosteroids are regulated by abscisic acid signaling. Proc. Natl. Acad. Sci. U.S.A., 2009. 106: 4543- 4548.

25. Cheng, W.H., Endo, A., Zhou, L., Penney, J., Chen, H.C., Arroyo, A., Leon, P., Nambara, E., Asami, T., Seo, M., et al., A unique short-chain dehydrogenase/reductase

156 in Arabidopsis glucose signaling and abscisic acid biosynthesis and functions. Plant Cell, 2002. 14: 2723-2743.

26. White, C.N., Proebsting, W.M., Hedden, P. and Rivin, C.J., Gibberellins and seed development in maize. I. Evidence that gibberellin/abscisic acid balance governs germination versus maturation pathways. Plant Physiol., 2000. 122: 1081-1088.

27. Raz, V., Bergervoet, J.H. and Koornneef, M., Sequential steps for developmental arrest in Arabidopsis seeds. Development, 2001. 128: 243-252.

28. Raghavendra, A.S., Gonugunta, V.K., Christmann, A. and Grill, E., ABA perception and signalling. Trends Plant Sci., 2010. 15: 395-401.

29. Pandey, S., Nelson, D.C. and Assmann, S.M., Two novel GPCR-type G proteins are abscisic acid receptors in Arabidopsis. Cell, 2009. 136: 136-148.

30. Shen, Y.Y., Wang, X.F., Wu, F.Q., Du, S.Y., Cao, Z., Shang, Y., Wang, X.L., Peng, C.C., Yu, X.C., Zhu, S.Y., et al., The Mg-chelatase H subunit is an abscisic acid receptor. Nature, 2006. 443: 823-826.

31. Wu, F.Q., Xin, Q., Cao, Z., Liu, Z.Q., Du, S.Y., Mei, C., Zhao, C.X., Wang, X.F., Shang, Y., Jiang, T., et al., The magnesium-chelatase H subunit binds abscisic acid and functions in abscisic acid signaling: new evidence in Arabidopsis. Plant Physiol., 2009. 150: 1940- 1954.

32. Weiner, J.J., Peterson, F.C., Volkman, B.F. and Cutler, S.R., Structural and functional insights into core ABA signaling. Curr. Opin. Plant Biol., 2010. 13: 495-502.

33. Kang, J., Hwang, J.U., Lee, M., Kim, Y.Y., Assmann, S.M., Martinoia, E. and Lee, Y., PDR-type ABC transporter mediates cellular uptake of the phytohormone abscisic acid. Proc. Natl. Acad. Sci. U.S.A., 2010. 107: 2355-2360.

34. Kuromori, T., Miyaji, T., Yabuuchi, H., Shimizu, H., Sugimoto, E., Kamiya, A., Moriyama, Y. and Shinozaki, K., ABC transporter AtABCG25 is involved in abscisic acid transport and responses. Proc. Natl. Acad. Sci. U.S.A., 2010. 107: 2361-2366.

35. Rockholm, D.C. and Yamamoto, H.Y., Violaxanthin de-epoxidase. Plant Physiol., 1996. 110: 697-703.

36. Schwartz, S.H., Qin, X. and Zeevaart, J.A., Elucidation of the indirect pathway of abscisic acid biosynthesis by mutants, genes, and enzymes. Plant Physiol., 2003. 131: 1591-1601.

37. Lindgren, L.O., Stalberg, K.G. and Hoglund, A.S., Seed-specific overexpression of an endogenous Arabidopsis phytoene synthase gene results in delayed germination and increased levels of carotenoids, chlorophyll, and abscisic acid. Plant Physiol., 2003. 132: 779-785.

157 38. Frey, A., Audran, C., Marin, E., Sotta, B. and Marion-Poll, A., Engineering seed dormancy by the modification of zeaxanthin epoxidase gene expression. Plant Mol. Biol., 1999. 39: 1267-1274.

39. Qin, X. and Zeevaart, J.A., The 9-cis-epoxycarotenoid cleavage reaction is the key regulatory step of abscisic acid biosynthesis in water-stressed bean. Proc. Natl. Acad. Sci. U.S.A., 1999. 96: 15354-15361.

40. Thompson, A.J., Jackson, A.C., Symonds, R.C., Mulholland, B.J., Dadswell, A.R., Blake, P.S., Burbidge, A. and Taylor, I.B., Ectopic expression of a tomato 9-cis-epoxycarotenoid dioxygenase gene causes over-production of abscisic acid. Plant J., 2000. 23: 363-374.

41. Iuchi, S., Kobayashi, M., Taji, T., Naramoto, M., Seki, M., Kato, T., Tabata, S., Kakubari, Y., Yamaguchi-Shinozaki, K. and Shinozaki, K., Regulation of drought tolerance by gene manipulation of 9-cis-epoxycarotenoid dioxygenase, a key enzyme in abscisic acid biosynthesis in Arabidopsis. Plant J., 2001. 27: 325-333.

42. Koiwai, H., Nakaminami, K., Seo, M., Mitsuhashi, W., Toyomasu, T. and Koshiba, T., Tissue-specific localization of an abscisic acid biosynthetic enzyme, AAO3, in Arabidopsis. Plant Physiol., 2004. 134: 1697-1707.

43. Cornish, K. and Zeevaart, J.A., Movement of Abscisic Acid into the Apoplast in Response to Water Stress in Xanthium strumarium L. Plant Physiol., 1985. 78: 623-626.

44. Christmann, A., Weiler, E.W., Steudle, E. and Grill, E., A hydraulic signal in root-to- shoot signalling of water shortage. Plant J., 2007. 52: 167-174.

45. Tan, B.C., Joseph, L.M., Deng, W.T., Liu, L., Li, Q.B., Cline, K. and McCarty, D.R., Molecular characterization of the Arabidopsis 9-cis epoxycarotenoid dioxygenase gene family. Plant J., 2003. 35: 44-56.

46. Zhou, R., Cutler, A.J., Ambrose, S.J., Galka, M.M., Nelson, K.M., Squires, T.M., Loewen, M.K., Jadhav, A.S., Ross, A.R., Taylor, D.C., et al., A new abscisic acid catabolic pathway. Plant Physiol., 2004. 134: 361-369.

47. Kushiro, T., Okamoto, M., Nakabayashi, K., Yamagishi, K., Kitamura, S., Asami, T., Hirai, N., Koshiba, T., Kamiya, Y. and Nambara, E., The Arabidopsis cytochrome P450 CYP707A encodes ABA 8'-hydroxylases: key enzymes in ABA catabolism. EMBO J., 2004. 23: 1647-1656.

48. Saito, S., Hirai, N., Matsumoto, C., Ohigashi, H., Ohta, D., Sakata, K. and Mizutani, M., Arabidopsis CYP707As encode (+)-abscisic acid 8 '-hydroxylase, a key enzyme in the oxidative catabolism of abscisic acid. Plant Physiol., 2004. 134: 1439-1449.

49. Todoroki, Y., Hirai, N. and Ohigashi, H., Analysis of Isomerization Process of 8'- Hydroxyabscisic Acid and its 3'-Fluorinated Analog in Aqueous Solutions. Tetrahedron, 2000. 56: 1649-1653.

50. Hartung, W., Sauter, A. and Hose, E., Abscisic acid in the xylem: where does it come from, where does it go to? J. Exp. Bot., 2002. 53: 27-32.

158 51. Wilkinson, S. and Davies, W.J., ABA-based chemical signalling: the co-ordination of responses to stress in plants. Plant Cell Environ, 2002. 25: 195-210.

52. Okamoto, M., Tanaka, Y., Abrams, S.R., Kamiya, Y., Seki, M. and Nambara, E., High humidity induces abscisic acid 8'-hydroxylase in stomata and vasculature to regulate local and systemic abscisic acid responses in Arabidopsis. Plant Physiol., 2009. 149: 825-834.

53. Saito, S., Hirai, N., Matsumoto, C., Ohigashi, H., Ohta, D., Sakata, K. and Mizutani, M., Arabidopsis CYP707As encode (+)-abscisic acid 8'-hydroxylase, a key enzyme in the oxidative catabolism of abscisic acid. Plant Physiol., 2004. 134: 1439-1449.

54. Xiong, L., Gong, Z., Rock, C.D., Subramanian, S., Guo, Y., Xu, W., Galbraith, D. and Zhu, J.K., Modulation of abscisic acid signal transduction and biosynthesis by an Sm-like protein in Arabidopsis. Dev. Cell, 2001. 1: 771-781.

55. Xiong, L., Lee, H., Ishitani, M. and Zhu, J.K., Regulation of osmotic stress-responsive gene expression by the LOS6/ABA1 locus in Arabidopsis. J. Biol. Chem., 2002. 277: 8588-8596.

56. Xu, Z.J., Nakajima, M., Suzuki, Y. and Yamaguchi, I., Cloning and characterization of the abscisic acid-specific glucosyltransferase gene from adzuki bean seedlings. Plant Physiol., 2002. 129: 1285-1295.

57. Batey, R.T., Gilbert, S.D. and Montange, R.K., Structure of a natural guanine-responsive riboswitch complexed with the metabolite hypoxanthine. Nature, 2004. 432: 411-415.

58. Winkler, W.C. and Breaker, R.R., Genetic control by metabolite-binding riboswitches. ChemBioChem, 2003. 4: 1024-1032.

59. Wachter, A., Tunc-Ozdemir, M., Grove, B.C., Green, P.J., Shintani, D.K. and Breaker, R.R., Riboswitch control of gene expression in plants by splicing and alternative 3' end processing of mRNAs. Plant Cell, 2007. 19: 3437-3450.

60. Sudarsan, N., Wickiser, J.K., Nakamura, S., Ebert, M.S. and Breaker, R.R., An mRNA structure in bacteria that controls gene expression by binding lysine. Genes Dev., 2003. 17: 2688-2697.

61. Nahvi, A., Sudarsan, N., Ebert, M.S., Zou, X., Brown, K.L. and Breaker, R.R., Genetic control by a metabolite binding mRNA. Chem. Biol., 2002. 9: 1043.

62. Winkler, W., Nahvi, A. and Breaker, R.R., Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature, 2002. 419: 952-956.

63. Mironov, A.S., Gusarov, I., Rafikov, R., Lopez, L.E., Shatalin, K., Kreneva, R.A., Perumov, D.A. and Nudler, E., Sensing small molecules by nascent RNA: a mechanism to control transcription in bacteria. Cell, 2002. 111: 747-756.

64. Cheah, M.T., Wachter, A., Sudarsan, N. and Breaker, R.R., Control of alternative RNA splicing and gene expression by eukaryotic riboswitches. Nature, 2007. 447: 497-500.

159 65. Bocobza, S.E. and Aharoni, A., Switching the light on plant riboswitches. Trends Plant Sci., 2008. 13: 526-533.

66. Corbino, K.A., Barrick, J.E., Lim, J., Welz, R., Tucker, B.J., Puskarz, I., Mandal, M., Rudnick, N.D. and Breaker, R.R., Evidence for a second class of S-adenosylmethionine riboswitches and other regulatory RNA motifs in alpha-proteobacteria. Genome Biol., 2005. 6: R70.

67. Vitreschak, A.G., Rodionov, D.A., Mironov, A.A. and Gelfand, M.S., Regulation of riboflavin biosynthesis and transport genes in bacteria by transcriptional and translational attenuation. Nucleic Acids Res., 2002. 30: 3141-3151.

68. Barrick JE, C.K., Winkler WC, Nahvi A, Mandal M, Collins J, Lee M, Roth A, Sudarsan N, Jona I, Wickiser JK, BReaker RR, New RNA motifs suggest an expanded scope for riboswitches in bacterial genetic control. Proc. Natl. Acad. Sci. U.S.A., 2004. 101: 6421- 6426.

69. Mandal, M. and Breaker, R.R., Adenine riboswitches and gene activation by disruption of a transcription terminator. Nat. Struct. Mol. Biol., 2004. 11: 29-35.

70. Winkler, W.C., Nahvi, A., Roth, A., Collins, J.A. and Breaker, R.R., Control of gene expression by a natural metabolite-responsive ribozyme. Nature, 2004. 428: 281-286.

71. Roth, A., Winkler, W.C., Regulski, E.E., Lee, B.W., Lim, J., Jona, I., Barrick, J.E., Ritwik, A., Kim, J.N., Welz, R., et al., A riboswitch selective for the queuosine precursor preQ1 contains an unusually small aptamer domain. Nat. Struct. Mol. Biol., 2007. 14: 308-317.

72. Ellington, A.D. and Szostak, J.W., In vitro selection of RNA molecules that bind specific ligands. Nature, 1990. 346: 818-822.

73. Silverman, S.K., Artificial functional nucleic acids: aptamers, ribozymes, and deoxyribozymes identified by in vitro selection, in Functional nucleic acids for sensing and other analytical applications, Y. Lu and Y. Li, Editors. 2007, Springer: New York.

74. Montange, R.K. and Batey, R.T., Structure of the S-adenosylmethionine riboswitch regulatory mRNA element. Nature, 2006. 441: 1172-1175.

75. Thore, S., Leibundgut, M. and Ban, N., Structure of the Eukaryotic Thiamine Pyrophosphate Riboswitch with Its Regulatory Ligand. Science, 2006. 312: 1208-1211.

76. Edwards, T.E., Klein, D.J. and Ferre-D'Amare, A.R., Riboswitches: small-molecule recognition by gene regulatory RNAs. Curr. Opin. Struct. Biol., 2007.

77. Mendonsa, S.D. and Bowser, M.T., In vitro evolution of functional DNA using capillary electrophoresis. J. Am. Chem. Soc., 2004. 126: 20-21.

78. Berezovski, M., Drabovich, A., Krylova, S.M., Musheev, M., Okhonin, V., Petrov, A. and Krylov, S.N., Nonequilibrium capillary electrophoresis of equilibrium mixtures: a universal tool for development of aptamers. J. Am. Chem. Soc., 2005. 127: 3165-3171.

160 79. Drabovich, A.P., Berezovski, M., Okhonin, V. and Krylov, S.N., Selection of smart aptamers by methods of kinetic capillary electrophoresis. Anal. Chem., 2006. 78: 3171- 3178.

80. Higgins, D.G. and Sharp, P.M., CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 1988. 73: 237-244.

81. Thompson, J.D., Higgins, D.G. and Gibson, T.J., CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position- specific gap penalties and weight matrix choice. Nucleic Acids Res., 1994. 22: 4673- 4680.

82. Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T.J., Higgins, D.G. and Thompson, J.D., Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res., 2003. 31: 3497-3500.

83. Mathews, D.H. and Turner, D.H., Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol., 2002. 317: 191-203.

84. Uzilov, A.V., Keegan, J.M. and Mathews, D.H., Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinf., 2006. 7: 173.

85. Harmanci, A.O., Sharma, G. and Mathews, D.H., Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign. BMC Bioinf., 2007. 8: 130.

86. Sosnick, T.R., Characterization of tertiary folding of RNA by circular dichroism and urea, in Current protocols in nucleic acid chemistry / edited by Serge L. Beaucage ... [et al. 2001:11.15.11-11.15.10.

87. Cantor, C.R. and Schimmel, P.R., Biophysical chemistry, part II: Techniques for the study of biological structure and function. Their Biophysical Chemistry Series. Vol. 2. 1980, San Francisco, CA: W.H. Freeman.

88. Tullius, T.D., Dombroski, B.A., Churchill, M.E. and Kam, L., Hydroxyl radical footprinting: a high-resolution method for mapping protein-DNA contacts. Methods Enzymol., 1987. 155: 537-558.

89. Lipfert, J., Das, R., Chu, V.B., Kudaravalli, M., Boyd, N., Herschlag, D. and Doniach, S., Structural transitions and thermodynamics of a glycine-dependent riboswitch from Vibrio cholerae. J. Mol. Biol., 2007. 365: 1393-1406.

90. Bloomfield, V., Crothers, D.M. and Tinoco, I., Jr., Nucleic acids: Structures, properties, and functions. 2000, Sausalito, California: University Science Books.

91. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J., Basic local alignment search tool. J. Mol. Biol., 1990. 215: 403-410.

161 92. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997. 25: 3389-3402.

93. Harrell, C.M., McKenzie, A.R., Patino, M.M., Walden, W.E. and Theil, E.C., Ferritin mRNA: interactions of iron regulatory element with translational regulator protein P-90 and the effect on base-paired flanking regions. Proc. Natl. Acad. Sci. U.S.A., 1991. 88: 4166-4170.

94. Singh, B.K. and Shaner, D.L., Biosynthesis of branched chain amino acids: From test tube to field. Plant Cell, 1995. 7: 935-944.

95. Watson, P.Y. and Fedor, M.J., The glmS riboswitch integrates signals from activating and inhibitory metabolites in vivo. Nat. Struct. Mol. Biol., 2011. 18: 359-363.

96. Celander, D.W. and Cech, T.R., Iron(II)-ethylenediaminetetraacetic acid catalyzed cleavage of RNA and DNA oligonucleotides: similar reactivity toward single- and double-stranded forms. Biochemistry, 1990. 29: 1355-1361.

97. Kieft, J.S., Zhou, K., Jubin, R., Murray, M.G., Lau, J.Y. and Doudna, J.A., The hepatitis C virus internal ribosome entry site adopts an ion-dependent tertiary fold. J. Mol. Biol., 1999. 292: 513-529.

98. Bevilacqua, P.C. and Turner, D.H., Comparison of binding of mixed ribose-deoxyribose analogues of CUCU to a ribozyme and to GGAGAA by equilibrium dialysis: evidence for ribozyme specific interactions with 2' OH groups. Biochemistry, 1991. 30: 10632-10640.

99. Hirsch, M. and Penman, S., Post-transcriptional addition of polyadenylic acid to mitochondrial RNA by a cordycepin-insensitive process. J. Mol. Biol., 1974. 83: 131-142.

100. Guilford, P.J., Beck, D.L. and Forster, R.L., Influence of the poly(A) tail and putative polyadenylation signal on the infectivity of white clover mosaic potexvirus. Virology, 1991. 182: 61-67.

101. Proudfoot, N.J., Genetic dangers in poly(A) signals. EMBO Rep., 2001. 2: 891-892.

102. Gagliardi, D., Stepien, P.P., Temperley, R.J., Lightowlers, R.N. and Chrzanowska- Lightowlers, Z.M., Messenger RNA stability in mitochondria: different means to an end. Trends Genet., 2004. 20: 260-267.

103. Conrad, N.K., Shu, M.D., Uyhazi, K.E. and Steitz, J.A., Mutational analysis of a viral RNA element that counteracts rapid RNA decay by interaction with the polyadenylate tail. Proc. Natl. Acad. Sci. U.S.A., 2007. 104: 10412-10417.

104. Chen, C.Y. and Shyu, A.B., Au-rich elements: characterization and importance in mRNA degradation. Trends Biochem. Sci., 1995. 20: 465-470.

162

Chapter 6

Conclusions and Future Directions

6.1 Conclusions from this work

The mechanisms to which plants respond to stress are widely studied. Stress responses involve plant hormones, other secondary metabolites, and signal transduction pathways with hundreds of proteins and cofactors.1-5 RNA folding is almost certainly affected by plant stress responses given their association with changes in ionic strength and osmolyte concentrations.

RNA structure that can respond or refold to stresses such as drought or heavy metal stress has the potential to have a regulatory effect.

Under drought stress, plant cellular K+ concentrations drastically increase. G-quadruplex sequences require K+ for formation and the K+ dependence varies by GQS sequence. With the same loop sequence, I demonstrated that G3 GQS are more stable than G2 GQS, having higher

+ Tms and lower K 1/2 values. In addition, G3 GQS are underrepresented in Arabidopsis thaliana, yet there are still around 1,200 G3 and 42,000 G2 GQS in the genome (Chapter 2). We found that

GQS distribution in the genome varies by folding motif: G3 sequences are enriched in the intergenic region (IGR) and G2 sequences are prevalent in the genic region. GQS in the IGR could additionally function in repression of unwanted transcription, as supported by their location in non-transcribed units of the IGR. The GQS in the genic region has the potential to regulate transcription or translation, which can be modified by environmental or physiological conditions.

As determined by mining whole genome tiling arrays, I determined that genes that contain a GQS are more likely to be up- or down-regulated in drought stress conditions.

163

The folding pathways of GQS are dependent on G-tract length, with G3 GQS folding in a non or even slightly anti-cooperative manner, through folding intermediates. We found that the

G2 GQS, however, fold with no evidence for a folding intermediate and with a steeper potassium dependence at higher K+ concentrations. Thus, there is potential for a GQS to switch in structure sharply in response to high K+ conditions in the plant during drought stress.

Heavy metal ion stress from contaminated soils also triggers a stress response in plants.

In Chapter 4, we showed that Cu2+ and other heavy metal ions can disrupt or change a GQS structure. Additionally, copper transport genes that contain a GQS are up-regulated when grown on Cu2+-enriched media. This response could be a mechanism to survive an excess of heavy metal ions in the local environment.

ABA, a plant hormone that mediates drought stress response in Arabidopsis, increases in concentration from nM to µM levels under drought stress. We provide evidence that the AU-rich

3‘UTR of CYP707A4, a gene involved in the ABA catabolism pathway, has an ion-dependent structure. The 3‘UTR of CYP707A4 folds with Mg2+ and K+ but does not have a strong binding interaction with the plant hormone abscisic acid.

6.2 Future directions

6.2.1 Preliminary studies

6.2.1.1 GQS loop lengths and sequences

During the bioinformatics study of GQS in the Arabidopsis genome, many questions were asked and only partially studied. We examined G-tract length and some loop variations but a more detailed study of GQS loop lengths and sequences is of interest.6 Loop length

164 combinations were compiled (Tables D.1-D.5) for different regions of the Arabidopsis genome.

We observed that loops of the same length within a GQS were common in genic regions and represented greater than 10% of all genic region GQS. These combinations were examined in more detail in Table 6.1. There is an abundance of T-rich loops in short loop sequences, especially GQS with 3-3-3 loops. Additionally, 15/41 GQS have ≥ 50% Ts. The longer loops of

5, 6, and 7 are more A-rich. All loops have at least one A and for the loops of 7, all have at least

3 As. Also, the loops of 6 and 7 have some similar sequences; loops of 6 commonly end in AT, and positions 2 and 3 are more likely to have a G than other positions. For loops of 7, many start with CA or AA and most end in T. This small subset of sequence data should be expanded to look at all of the GQS loop sequences in the RNA, not just those with equal loop lengths. A more comprehensive study of loop sequences can be performed to look for conserved sequences.

In the IGR, the most common loop length combination was 4-4-4, which was found in about 20% of all IGR GQS (Table D.5). Upon closer examination, these sequences were found to be equal to or within 3 mutations of the Arabidopsis telomeric repeat sequence (TTTA or, in the case of C-rich patterns: TAAA). The overwhelming majority of the 4-4-4 loops GQS in the IGR are C sequences (149/159). Also, 53% (85/159) have the exact telomeric repeat sequence and

33% (53/159) are within one nt of the telomeric repeat sequence. In the IGR, 27 intergenic regions have at least one 4-4-4 GQS, out of a total of ~31,000. Four of these border telomere regions and there is a very large cluster of 108 GQS at the centromere of chromosome 1. As mentioned in Chapter 2, the prevalence of GQS at the centromere is most likely explained by chromosomal rearrangements, such as the combination of two chromosomes.7-9

There are 50 genes that flank the 27 IGR regions with 4-4-4 GQS, either upstream or downstream. 34/50 (68%) are transposable gene elements (transposons); 20/27 intergenic regions that have at least one 4-4-4 GQS are bordered by a transposon. Any function of the 4-4-4 GQS with respect to transposons is yet unknown and unstudied. As speculated in Chapter 2, GQS in

165 the IGR might be repressing transcription in the intergenic region. However, the function of the

GQS in the IGR has not been studied.

6.2.1.2 miRNA targets and GQS

MicroRNAs (miRNA) target genes for miRNA-directed cleavage, translational repression, or transcriptional silencing.10 In plants, miRNA targets are found in the coding

11 regions of protein-coding genes in addition to the 3‘UTR. Since G2 GQS were found to be enriched in coding regions in the Arabidopsis genome, we asked if there was any correlation between miRNA target location and GQS because formation of a GQS in a miRNA target site could impede miRNA binding and in turn affect its function.

We investigated validated miRNA targets and their location relative to GQS in the same gene using data from Arabidopsis thaliana Small RNA Project (ASRP) from the Carrington lab at

Oregon State University12,13. Table D.6 contains a list of all genes that contain both a validated miRNA target and a GQS; validated targets without a GQS are not listed. 96 target genes of 25 miRNA families contain a GQS. Location of the target site from the GQS sequence ranges from

23 nt to greater than 3,000 nt away, upstream or downstream. Interestingly, there are also 15 target mRNAs for the miR400 family where the target mRNA sequence contains the GQS (Table

D.6 ―overlap‖). Investigation into potential function of GQS and miR400 targets as well as general distance correlation with GQS could be pursued. These target genes include argonaute2

(AGO2), which is involved in small RNA processing, and suggests that this miRNA can regulate its own processing. Other target genes include auxin response transcription factors (ARF10 and

ARF16), and transcription factors involved in growth response (GRF2 and GRF8).

166

Table 6.1: Loop sequences of G3L1-7 GQS in genic regions with loops of the same size

Gene Model Location Pattern Loop 1 Loop 2 Loop 3 other loops At4g32030.1 3‘UTR C A A A 0 At5g3340.1-10 Intron G T T G 0 At1g51090.1 CDS C CT CT CT 2 At2g15770.1 CDS G TC TT TT 1 At3g18540.1 CDS G AT AT AT 10 At3g47800.1 3‘UTR G TC TC TC 3 At5g06390.1 3‘UTR G GG GG GG 0 At5g26805.1 3‘UTR G GG GG GG 0 At1g04150 CDS C TCT TCC TCC 0 At1g28290.1 CDS C TGA TGA TGA 1 At1g28290.2 CDS C TGA TGA TGA 2 At1g47655.1 CDS G TTT TAC TAC 0 At1g52030.1 CDS C CGC CGC CGC 0 At1g52030.2 CDS C CGC CGC CGC 0 At1g57670.1-2 Intron C TTT TTT TTT 0 At1g60200.1 CDS C TTT TTT TGT 3 At1g72390.1 CDS G TAT TAT CAT 3 At2g18360.1-2 Intron C TTC TTC TTC 1 At2g27260.1 CDS C TAT TAT TAT 0 At2g42010.1 CDS C TAT TAA TAT 0 At3g07560.1 CDS G TAT AAC AAT 4 At3g19070.1 CDS C TTT ATT TTA 0 At4g28530.1 5‘UTR C AGA AGA AGA 0 At4g28530.2 5‘UTR C AGA AGA AGA 0 At5g55440.1 3‘UTR G TTA TTT TTT 0 At4g11730.1-1 Intron G TTTA TTTA TTTA 0 At1g55760.1 5‘UTR C AGAGA AGAGA AGAGA 1 At1g09070.1 CDS G TGACAT TGACAT TGACAT 0 At2g33210.1 CDS G CGGCAT TGGCAT CGGTAT 1 At4g02450.1 CDS G AGGAAT AGGAAT CGGCAT 4 At4g02450.1 CDS G TGGAAT TGGAAT TGGAAT 0 At4g02450.1 CDS G AGGCAT TGGAAT AGGCAT 1 At5g15780.1 CDS C ATTAGT ATTGGA AACTTG 2 At5g15780.1 CDS C ATGCCA AGTCCA ACCGGA >10 At5g32590.1 CDS C TGCTTC TTCCAT ATCACT 0 At1g20860.1 CDS G TAAGCTT CTATTGT CACTGTT 1 At1g67290.1 CDS C TAGTTAC CACCACC AAGTTGA 0 At2g06908.1-1 Intron C CACGAAT CAAAGAT AAAAGTT 0 At3g31950.1 CDS G CAAGAAC CAAGTAT CAAGAAT 2 At5g34860.1-1 Intron C CAAGAAT CACGCAAT CAAAGAT 1 At5g34870.1 CDS G CAAGAAT CACGAAT CAAAGAT 1

Compiled loop sequences for loops that are the same size within a GQS (highlighted in grey in Tables D.1- D.4). The table above includes the location of the GQS (5‘UTR, 3‘UTR, Intron, or CDS). Specifies if the GQS was found as a G- or C-pattern. The loop sequences listed are corrected for reverse complementarity of C-rich patterns. Some GQS have more than four G-tracts and consequently, more than 3 loop sequences. The last column indicates how many additional loops are present in the GQS. These do not necessarily have the same loop lengths or sequences as the loops listed in the table.

167 It is of note that only verified targets were used to compile Table D.6. Several webservers promote miRNA target prediction, including miRU, TAPIR, RNAhybrid, and target- align. These can be beneficial as an initial search, however verification of miRNAs and targets are absolutely necessary. Indeed, a 2008 communication to the miRNA community in the journal

Plant Cell described the necessity for strict criteria for plant miRNA annotation, as there are an increasing number of papers identifying new miRNAs and targets.14

6.2.1.3 Alternative splicing and GQS

As mentioned in Chapter 2, 14% of genes in Arabidopsis have alternative splice variants

(4,626). Over 8,000 gene models (including individual splice variants) have at least one G2 GQS.

Additionally, 108 genes have at least one splice variant with a GQS and one without a GQS or a

GQS that is located in a different genic region. A list of these gene models, their locations, and putative functions can be found in Table D.7. Of these, GQS that are near splice sites or span junctions could be studied to determine if there is a functional role for GQS in alternative splicing. Of particular potential interest are genes conserved telomere maintenance component 1

(CTC1) and nuclear protein X1 (NPX1), which are implicated in telomere maintenance and abscisic acid response, respectively. CTC1 has two splice variants: one with a G2 GQS in Intron

15 and one without a GQS. NPX1 has three splice variant: one with a G2 GQS in Intron 6 and two without GQS.

6.2.1.4 GQS folding in a high salt background

We wanted to investigate how background salt could affect GQS folding parameters.

The G3L221 RNA (GGGUCGGGUUGGGCGGG) was used as the model oligonucleotide for these

168 experiments. Under standard conditions, which are 10 mM LiCacodylate (pH 7.0), the CD

+ titration of G3L221 with K shows an increase in CD signal around 260 nm and a decrease at 240 nm, typical of a parallel GQS structure. A fit of the data at 262 nm, the lambda max, provided a

+ 2+ + K 1/2 of 0.7 0.1 mM and n of 0.7 0.1. We found that in a 2 mM Mg or 100 mM Li

+ background, the K 1/2 value decreased (Figure 6.1 and Table 6.2): in the background of 100 mM

+ LiCl with standard buffer conditions, the K 1/2 decreased to 0.09 0.01 mM with a similar n of

+ 0.75 0.05. In the background of 2 mM MgCl2 and before addition of K , the maximum peak shifted to a slightly lower wavelength (255 nm). Addition of K+ shifted the maximum to higher

+ wavelengths similar to the previous titrations, around 262 nm. The resulting K 1/2 is 0.46 0.03

+ mM and n is 1.2 0.1. The K 1/2 is lower than the no added salt value but higher than the 100 mM LiCl background value. Additionally, n is slightly higher than the other titrations. Further studies on the effect of background salts and more in vivo like conditions are required for

+ analysis. These results suggest that K 1/2 values will decrease and increase GQS stability.

Table 6.2: G3L221 folding parameters

+ Initial Ionic Conditions K 1/2 (mM) n 10 mM LiCacodylate, pH 7.0 8.0 0.7 0.7 0.1 100 mM LiCl 0.09 0.01 0.75 0.05 10 mM LiCacodylate, pH 7.0 2 mM MgCl2 0.46 0.03 1.2 0.1 10 mM LiCacodylate, pH 7.0

169

Figure 6.1: Ionic conditions affect G3L1-2 folding parameters. CD titration data and fits for GGGUCGGGUUGGGCGGG in (top to bottom): 10 mM LiCacodylate (pH 7.0), 100 mM LiCl and 10 mM LiCacodylate (pH 7.0), and 2 mM MgCl2 and 10 mM LiCacodylate (pH 7.0). Fit parameters can be seen in Figure 6.2.

170 6.2.1.5 The quadruzyme?

While performing preliminary RNase T1 protection assays and in line probing assays

(ILP), we noticed that several residues from RNA oligonucleotides (G2AUA)3G2 and

(G3AUA)3G3 (herein referred to as G2AUA and G3AUA) had high levels of background cleavage, with three major cleavage bands each corresponding to reaction in a quadruplex loop; in fact the greater intensity of the lower band is likely due to the fact that there have been multiple cleavages in given strands. Upon closer examination, we noticed cleavage was occurring after the U residues in each of the GQS loops (Figure 6.2). Cleavage is K+-dependent, with more occurring at higher K+ concentrations, when GQS formation is expected. This cleavage pattern is seen during ILP assays (Figure 6.2a) and RNase T1 protection assays (Figure 6.2b), most likely during the renaturation of the sample (see below). This self-cleavage activity has been seen while using 5’end labeled RNA from different kinase labeling reactions, different RNA preparations, and different K+ stock solutions, suggesting that it is robust.

The source or reason for cleavage of U resides at the central position in loops is yet unknown. More investigation into this unique observation is clearly necessary before conclusions are made. One lofty explanation for cleavage is that these GQS are ribozymes, or as we affectionately call them, quadruzymes. RNA GQS form parallel quadruplex structures with double chain reversal loops (Figure 1.3) that wrap along the side of the quadruplex. One could image the central nucleotide being positioned in such a way to promote cleavage by the flanking

As. Control experiments with other mutant RNA oligonucleotides and different buffer solutions would be necessary. There is, however, precedence for small self-cleaving RNA, such as a 5 bp minimal ribozyme.15

171

Figure 6.2: K+-dependent cleavage of central loop U residues in GQS with AUA loops. Cleavage has been seen with both G3AUA (left) and G2AUA (right) RNA oligonucleotides in (a) in-line probing assays and (b) RNase T1 protection assays. Red arrows indicate sites of cleavage. Salt + front seen on the bottom of the gels is due to the presence of cacodylate buffer. The K 1/2 value for G3AUA is 117 28 and G2AUA is 4.0 0.8.

A timecourse assay was performed for G3AUA with the in-line probing (ILP) assay. For this assay, two identical sets of reactions were prepared for the G3AUA RNA at each K+ concentration. The RNA was renatured in 10 mM LiCacodylate, pH 7.0 at 85 C for 1 min followed by incubation at room temperature for 5 min. (Note that cleavage could not have happened during this time, as cleavage is K+ dependent.) KCl was added to the final concentration (no K+, 1 mM, and 100 mM K+) and the reactions were allowed to pre-incubate at room temperature for 45 minutes. At that time, the pH was increased to 8.3 with Tris-HCl (50 mM final concentration) and MgCl2 was added to a final concentration of 20 mM. The first

172 reaction was used for timpoints 0, 0.5, 1, and 5h. The second reaction was used for timepoints

10h, 20h, 30h, and 40h. All reactions were quenched with one volume of EDTA Formamide loading buffer (EFLB) and frozen until loading onto a 12% PAGE gel (Figure 6.3).

Figure 6.3: G3AUA ILP timecourse after renaturation. Cleavage is seen at all U residues for RNAs with 100 mM K+ at all timepoints. No cleavage is seen for 0 and 1 mM K+. The sample for 100 mM K+ at 1 h spilled into the 0 mM K+ at 5 h lane which accounts for cleavage bands seen in the latter. Red lines show positions of G residues and blue boxes indicate U positions. No treatment (NT), hydroxyl ladder (OH), and RNase T1 ladder (T1) lanes are also included.

Cleavage of loop residues is seen at 100 mM K+ for all timepoints collected, including

0h. No cleavage is seen at no and 1 mM K+. This confirms that K+ is required for cleavage, as mentioned above. Additionally, cleavage for RNAs at all timepoints is similar, including 0h and

40h. A larger percentage of cleavage was seen in previous gels (Figure 6.2). Given that there is no change in cleavage with the timepoints, cleavage most likely occurs during the renaturation process itself, after addition of K+. Timepoints were started after addition of the ILP buffer, 50 mM Tris-HCl, pH 8.4 and 20 mM MgCl2. It is possible that the higher pH (8.3 vs 7.0) or addition of MgCl2 (20 vs 0 mM) was able to quench the reaction, inhibiting further site specific cleavage;

173 lower pH could drive protonation of As that could facilitate the reaction. An additional study with timepoints taken during renaturation could give a better idea of the amount of time required

2+ + 2+ for cleavage. The effect of Mg and other ions such as Na and Ca during this period of time can also be performed.

To be seriously considered for classification as a ribozyme – at least one with biological potential – the RNAs should be able to cleave to completion in a short time scale (less than 24 hours). It is also possible that uncleaved fractions are in an alternative fold that will never cleave.

Purification of uncleaved fractions followed by renaturation could promote additional cleavage.

In addition to the experiments, the location of these sequences in the Arabidopsis genome or other genomes has not yet been determined. Prevalence of these sequences and their locations could help investigate potential functions if this is actually a self-cleaving RNA.

6.2.2 Ongoing work

In addition to items mentioned above, there are several ongoing projects and collaborations in the lab that continue some of the work on G-quadruplexes in Arabidopsis thaliana. GQS formation in vivo is an ever present question. Ongoing work in the lab uses structure mapping techniques and SHAPE to study GQS structure in vitro. Graduate student

Chun Kit Kwok and post doctoral researcher Dr. Yiliang Ding (Bevilacqua and Assmann Labs) are working towards an in vivo assay that can map RNA folding in plant cells. Researchers in the

Weeks lab have successfully mapped the secondary structure of the HIV-I genome.16 A similar result in plant cells could show formation of GQS in cellular conditions. Additionally, RNA structure can be mapped under stress conditions to investigate any changes in structure with changing environmental and cellular conditions.

174 The biological relevance of GQS in Arabidopsis is also being studied. GQS in promoter regions of H.sapiens have been shown to affect transcription,17-20 and GQS in the 5‘UTR regions of prokaryotic and mammalian cell lines can affect translation,21-24 as described in Chapter 1. Are these mechanisms similar in Arabidopsis? There are also GQS located in other genomic regions, including the coding sequences, introns, and 3‘UTRs. Do these have any regulatory function?

A collaboration is ongoing with Prof. Reka Albert (Penn State) and postdoctoral researcher Dr. Ruisheng Wang to perform a detailed computational analysis of available microarray data and GQS. Greater than ten microarray datasets for abscisic acid treatment (and drought stress) are available from different labs. These datasets can be mined to focus on genes that contain GQS and see how their expression changes with different treatments.

6.2.3 Additional projects of potential interest

There are also other related projects that could be of interest. In vitro studies of RNA

GQS folding can also be pursued. Under drought stress, in addition to increasing K+ concentrations, other osmolytes increase in concentration. Small molecules such as proline, glycine betaine (GB), sorbitol, and mannitol increase concentrations about 10 fold. The presence of these osmolytes could affect GQS folding. Indeed, a 2007 paper from the Draper lab highlighted the effects of osmolytes on RNA secondary and tertiary structures but did not include a GQS RNA.25 A similar study with various RNA GQS could be performed. Polyethelene glycol

(PEG) has been used to mimic crowding conditions 26-31 Because of PEG-specific effects seen, studies with PEG alone would not be recommended.

The effect of flanking sequences on GQS stability could be of interest. Stable alternative structures can inhibit GQS formation and modulate apparent GQS stabilities. Alternatively, lack of alternative structures provides no barrier to GQS formation. Co-transcriptional folding could

175 also be investigated. During transcription, the RNA will fold as it is synthesized, which can differ from the folding of a fully prepared oligonucleotide. Many studies have been performed on the duplex-quadruplex equilibrium, examining alternative duplex formation in DNA.32 Similar investigations of RNA GQS and alternative secondary structures have not been as extensive.

Several helicases have been shown to bind and even target DNA and RNA GQS. For example, the WRN, BLM, and FANCJ helicases target DNA GQS and can unwind the structures to regulate transcription.33,34 Additionally, Pif1 DNA helicase has been shown to bind to DNA

GQS in vivo and promote DNA replication by unwinding GQS.35 RNA GQS, however, can be unwound by an RNA helicase known as RHAU, G4 resolvase 1 (G4R1), or DHX36. It can bind to tetramolecular RNA and DNA GQS and also recognize the G-tracts in human telomerase

RNA.36,37 It could be of interest to investigate any GQS binding helicases in Arabidopsis thaliana that could target and/or unwind GQS and their potential functions.

6.3 References

1. Kreps, J.A., Wu, Y., Chang, H.S., Zhu, T., Wang, X. and Harper, J.F., Transcriptome changes for Arabidopsis in response to salt, osmotic, and cold stress. Plant Physiol., 2002. 130: 2129-2141.

2. Seki, M., Narusaka, M., Ishida, J., Nanjo, T., Fujita, M., Oono, Y., Kamiya, A., Nakajima, M., Enju, A., Sakurai, T., et al., Monitoring the expression profiles of 7000 Arabidopsis genes under drought, cold and high-salinity stresses using a full-length cDNA microarray. Plant J., 2002. 31: 279-292.

3. Zhu, J.K., Salt and drought stress signal transduction in plants. Annu. Rev. Plant Biol., 2002. 53: 247-273.

4. Lee, B.H., Henderson, D.A. and Zhu, J.K., The Arabidopsis cold-responsive transcriptome and its regulation by ICE1. Plant Cell, 2005. 17: 3155-3175.

5. Matsui, A., Ishida, J., Morosawa, T., Mochizuki, Y., Kaminuma, E., Endo, T.A., Okamoto, M., Nambara, E., Nakajima, M., Kawashima, M., et al., Arabidopsis transcriptome analysis under drought, cold, high-salinity and ABA treatment conditions using a tiling array. Plant Cell Physiol., 2008. 49: 1135-1149.

176 6. Todd, A.K., Johnston, M. and Neidle, S., Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res., 2005. 33: 2901-2907.

7. Simoens, C.R., Gielen, J., Van Montagu, M. and Inze, D., Characterization of highly repetitive sequences of Arabidopsis thaliana. Nucleic Acids Res., 1988. 16: 6753-6766.

8. Richards, E.J., Goodman, H.M. and Ausubel, F.M., The centromere region of Arabidopsis thaliana chromosome 1 contains telomere-similar sequences. Nucleic Acids Res., 1991. 19: 3351-3357.

9. Lee, C., Sasi, R. and Lin, C.C., Interstitial localization of telomeric DNA sequences in the Indian muntjac chromosomes: further evidence for tandem chromosome fusions in the karyotypic evolution of the Asian muntjacs. Cytogenet. Cell Genet., 1993. 63: 156-159.

10. Jones-Rhoades, M.W., Bartel, D.P. and Bartel, B., MicroRNAS and their regulatory roles in plants. Annu. Rev. Plant Biol., 2006. 57: 19-53.

11. He, L. and Hannon, G.J., MicroRNAs: small RNAs with a big role in gene regulation. Nature reviews, 2004. 5: 522-531.

12. Gustafson, A.M., Allen, E., Givan, S., Smith, D., Carrington, J.C. and Kasschau, K.D., ASRP: the Arabidopsis Small RNA Project Database. Nucleic Acids Res., 2005. 33: D637-640.

13. Backman, T.W., Sullivan, C.M., Cumbie, J.S., Miller, Z.A., Chapman, E.J., Fahlgren, N., Givan, S.A., Carrington, J.C. and Kasschau, K.D., Update of ASRP: the Arabidopsis Small RNA Project database. Nucleic Acids Res., 2008. 36: D982-985.

14. Meyers, B.C., Axtell, M.J., Bartel, B., Bartel, D.P., Baulcombe, D., Bowman, J.L., Cao, X., Carrington, J.C., Chen, X., Green, P.J., et al., Criteria for annotation of plant MicroRNAs. Plant Cell, 2008. 20: 3186-3190.

15. Turk, R.M., Illangasekare, M., and Yarus, M. Catalyzed and spontaneous reactions on ribozyme ribose. J. Am. Chem. Soc., 2011. 133: 6044-6050.

16. Wilkinson, K.A., Gorelick, R.J., Vasa, S.M., Guex, N., Rein, A., Mathews, D.H., Giddings, M.C. and Weeks, K.M., High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states. PLoS Biol., 2008. 6: e96.

17. Siddiqui-Jain, A., Grand, C.L., Bearss, D.J. and Hurley, L.H., Direct evidence for a G- quadruplex in a promoter region and its targeting with a small molecule to repress c- MYC transcription. Proc. Natl. Acad. Sci. U.S.A., 2002. 99: 11593-11598.

18. Qin, Y., Rezler, E.M., Gokhale, V., Sun, D. and Hurley, L.H., Characterization of the G- quadruplexes in the duplex nuclease hypersensitive element of the PDGF-A promoter and modulation of PDGF-A promoter activity by TMPyP4. Nucleic Acids Res., 2007. 35: 7698-7713.

177 19. Paramasivam, M., Membrino, A., Cogoi, S., Fukuda, H., Nakagama, H. and Xodo, L.E., Protein hnRNP A1 and its derivative Up1 unfold quadruplex DNA in the human KRAS promoter: implications for transcription. Nucleic Acids Res., 2009. 37: 2841-2853.

20. Cogoi, S., Paramasivam, M., Membrino, A., Yokoyama, K.K. and Xodo, L.E., The KRAS promoter responds to Myc-associated zinc finger and poly(ADP-ribose) polymerase 1 proteins, which recognize a critical quadruplex-forming GA-element. J. Biol. Chem., 2010. 285: 22003-22016.

21. Kumari, S., Bugaut, A., Huppert, J.L. and Balasubramanian, S., An RNA G-quadruplex in the 5' UTR of the NRAS proto-oncogene modulates translation. Nat. Chem. Biol., 2007. 3: 218-221.

22. Wieland, M. and Hartig, J.S., RNA quadruplex-based modulation of gene expression. Chem. Biol., 2007. 14: 757-763.

23. Arora, A., Dutkiewicz, M., Scaria, V., Hariharan, M., Maiti, S. and Kurreck, J., Inhibition of translation in living eukaryotic cells by an RNA G-quadruplex motif. RNA, 2008. 14: 1290-1296.

24. Balkwill, G.D., Derecka, K., Garner, T.P., Hodgman, C., Flint, A.P. and Searle, M.S., Repression of translation of human estrogen receptor alpha by G-quadruplex formation. Biochemistry, 2009. 48: 11487-11495.

25. Lambert, D. and Draper, D.E., Effects of osmolytes on RNA secondary and tertiary structure stabilities and RNA-Mg2+ interactions. J. Mol. Biol., 2007. 370: 993-1005.

26. Miyoshi, D., Karimata, H. and Sugimoto, N., Hydration regulates the thermodynamic stability of DNA structures under molecular crowding conditions. Nucleos. Nucleot. Nucl., 2007. 26: 589-595.

27. Fujimoto, T., Miyoshi, D., Tateishi-Karimata, H. and Sugimoto, N., Thermal stability and hydration state of DNA G-quadruplex regulated by loop regions. Nucleic Acids Symp. Ser., 2009: 237-238.

28. Zhang, D.H., Fujimoto, T., Saxena, S., Yu, H.Q., Miyoshi, D. and Sugimoto, N., Monomorphic RNA G-quadruplex and polymorphic DNA G-quadruplex structures responding to cellular environmental factors. Biochemistry, 2010. 49: 4554-4563.

29. Zhang, D.H. and Zhi, G.Y., Structure monomorphism of RNA G-quadruplex that is independent of surrounding condition. J. Biotechnol., 2010. 150: 6-10.

30. Heddi, B. and Phan, A.T.n., Structure of Human Telomeric DNA in Crowded Solution. J. Am. Chem. Soc., 2011: epub ahead of print.

31. Xu, L., Feng, S. and Zhou, X., Human telomeric G-quadruplexes undergo dynamic conversion in a molecular crowding environment. Chem. Commun. (Cambridge, U.K.), 2011. 47: 3517-3519.

178 32. Kumar, N. and Maiti, S., Role of molecular crowding in perturbing quadruplex-Watson Crick duplex equilibrium. Nucleic Acids Symp. Ser., 2008: 157-158.

33. Wu, Y., Shin-ya, K. and Brosh, R.M., Jr., FANCJ helicase defective in Fanconia anemia and breast cancer unwinds G-quadruplex DNA to defend genomic stability. Mol. Cell. Biol., 2008. 28: 4116-4128.

34. Johnson, J.E., Cao, K., Ryvkin, P., Wang, L.S. and Johnson, F.B., Altered gene expression in the Werner and Bloom syndromes is associated with sequences having G- quadruplex forming potential. Nucleic Acids Res., 2010. 38: 1114-1122.

35. Paeschke, K., Capra, J.A. and Zakian, V.A., DNA Replication through G-Quadruplex Motifs Is Promoted by the Saccharomyces cerevisiae Pif1 DNA Helicase. Cell, 2011. 145: 678-691.

36. Creacy, S.D., Routh, E.D., Iwamoto, F., Nagamine, Y., Akman, S.A. and Vaughn, J.P., G4 resolvase 1 binds both DNA and RNA tetramolecular quadruplex with high affinity and is the major source of tetramolecular quadruplex G4-DNA and G4-RNA resolving activity in HeLa cell lysates. J. Biol. Chem., 2008. 283: 34626-34634.

37. Sexton, A.N. and Collins, K., The 5' guanosine tracts of human telomerase RNA are recognized by the G-quadruplex binding domain of the RNA helicase DHX36 and function to increase RNA accumulation. Mol. Cell. Biol., 2011. 31: 736-743.

179

Appendix A.

Chapter 2 Supplemental Information

[Published as Supporting Online Material for a paper entitled “RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: prevalence and possible functional roles” by Melissa

A. Mullen, Kalee J. Olson, Paul Dallaire, Francois Major, Sarah M. Assmann, and Philip C.

Bevilacqua in Nucleic Acids Research 2010 38 (22): 8149-8163.]

Figure A.1: Native gels of GQS RNA oligonucleotides. A. RNA GQS oligonucleotides of interest, renatured identically as for Circular Dichroism samples, and run in 100 mM K+. B. RNA GQS ‗ladder‘ oligonucleotides. Lane 1: AG2A, Lane 2: A(G2A)2, Lane 3: A(G2A)3, Lane 4: A(G2A)4. Oligonucleotides with increasing numbers of G2A repeats until a full GQS is achieved, as indicated by the faster migration of the oligonucleotide (seen in Lane 4). Lane 5: AG2AGAAG2AG2A. Mutant version of the oligonucleotide in Lane 4. Disruption of the G pattern disrupts GQS formation, resulting in slower migration.

180

Figure A.2: G3L1-7 GQS density as a function of GC content. 15 plant species and 3 non-plant eukaryotes were analyzed. All plant species are in green and fit to a linear regression curve. Non-plant eukaryotes are shown in gray. GC content was determined as described in Materials and Methods.

2000 GQ S 1500

1000

500

0

Numberof Sequences

100

300 500 700 900

-

- - - -

1900 1100 1300 1500 1700 2100 2300 2500 2700 2900 3100 3300 3500 3700 3900

1

------

201 401 601 801

1801 1001 1201 1401 1601 2001 2201 2401 2601 2801 3001 3201 3401 3601 3801 Coding Sequence Length

Figure A.3: Distribution of G3L1-4 GQS in the coding sequence. The number of GQS starting positions that fall within each 100 nt segment of CDS is shown above with bars. The plot is cropped at a CDS length of 4,000; it incorporates 17,497 GQS out of a total of 17,619 GQS (>99%) that are found in RNA CDS. The dashed black line represents the distribution of the length of all coding sequences from Arabidopsis.

181 Table A.1: Genome sequence information

Speciesa Common Nameb Assemblyc Release Dated Servere Arabidopsis thaliana1 mouseear cress TAIR9 06/09 TAIR Arabidopsis lyrata lyrate rockcress Araly1 12/08 JGIf Brachypodium distachyon 2 a wild grass Brachy1.0 01/09 Ensembl Drosophila melanogaster 3 fruit fly BDGP, 5 3/06 Ensembl Glycine max 4 soybean v 1.0 12/08 JGI Lotus japonicus5 a model legume v 1.0 05/09 KDRI Manihot esculenta cassava v. 1.0 JGIf Medicago truncatula 6,7 barrelclover v 3.0 12/09 MSGC Mimulus guttatus seep monkeyflower v 1.0 01/10 JGIf Mus musculus 8 mouse NCBI m37 04/07 Ensembl Nicotiana tabacum 9 tobacco 11/08 PNGG Oryza sativa – indica 10,11 rice – indica 2005-01-BGI 01/05 Ensembl Oryza sativa – japonica 12,13 rice - japonica MSU6 01/09 Ensembl Populus trichocarpa 14 black cottonwood JGI2.0 12/04 Ensembl Pyscomitrella patens15 moss v 1.1 03/07 JGI Sorghum bicolor16 sorghum Sbi1 12/07 Ensembl Vitis vinifera17 wine grape IGGP_12x 12/07 Ensembl Zea mays 18 corn AGP1 4a.53 03/09 MaizeSeq

Genome sequence information for all downloaded sequences. Provided for each are the aofficial species name, including subspecies for O. sativa, baccepted common name, cgenome assembly drelease date for the assembly, and eserver where the genomes were obtained. fThese sequence data were produced by the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov) in collaboration with the user community.

A.1 Grep scripts used for Markov model analysis and whole genome GQS searches

The scripts and process used by Paul Dallaire to prepare Markov model simulated genomes and to search chromosomes for GQS using the software grep available on Linux computers.

A.1.1 For general GQS search in each chromosome

(1) Collect fasta files from each chromosome to a single directory:

mkdir fasta

182 (2) Transform fasta to char stream for each chromosome:

for f in `ls *.fa`; do cat $f | grep -v '>' | eoln2sp | sed 's/

//g' > `basename $f .fa`.stream; done

(3) Count occurrences of G patterns:

"for f in `ls *.stream`; do cat $f | egrep -b -o ""(G{3,}[ACGT]{1,7}){3,}G{3,}"" | wc -l; done | eoln2sp | sed 's/ /\+/g' | sed 's/\+$/\n/' | bc"

(4) Count occurrences of C, A, and T patterns:

"Count occurences of C-Patterns using regexp : (C{3,}[ACGT]{1,7}){3,}C{3,}"

"Count occurences of A-Patterns using regexp : (A{3,}[ACGT]{1,7}){3,}A{3,}"

"Count occurences of T-Patterns using regexp : (T{3,}[ACGT]{1,7}){3,}T{3,}"

(5) Correction for alternative assembly:

cat hs_chr7.fa | fsagrep '^.*contig$' > hs_chr7_noalternateAssembly.fa

"cat hs_chr7.fa | egrep -b -o ""(G{3,}[ACGT]{1,7}){3,}G{3,}"" | wc -l"\

"cat hs_chr7_noalternateAssembly.fa | egrep -b -o ""(G{3,}[ACGT]{1,7}){3,}G{3,}"" | wc -l"

##bc old value in table - the first count + the second count.

"same for A,C,T."

183 A.1.2 For Markov modeled genomes

(1) Collect fasta files from each chromosome and convert to char stream:

fasta files for each chromosome converted to streams using: grep -v '>' | sed ':a;N;$!ba;s/\n//g'

(2) Generate artificial chromosomes on template chromosomes and count patterns using computer program genRanDNA3:

"for W in 50 75 100 400 2000 ; do for CHR in `ls /u/dallaire/Desktop/genomic/arabidopsis/tair/tair8_full_chromosomes/OLD /chr*.fas`; do for p in G A C T; do for rep in 1 2 3 4 5; do echo -n $W $CHR $p $rep "" -> ""; cat $CHR | genRanDna4 $W MARKOV | egrep -b -o ""($p{3,}[ACGT]{1,7}){3,}$p{3,}"" | wc -l ; done; done; done | tee tair8-20100312-Window_${W}_MARKOV.txt; done"

"for W in 150 200 1000 4000 ; do for CHR in `ls /u/dallaire/Desktop/genomic/arabidopsis/tair/tair8_full_chromosomes/OLD /chr*.fas`; do for p in G A C T; do for rep in 1 2 3 4 5; do echo -n $W $CHR $p $rep "" -> ""; cat $CHR | genRanDna4 $W MARKOV | egrep -b -o ""($p{3,}[ACGT]{1,7}){3,}$p{3,}"" | wc -l ; done; done; done | tee tair8-20100312-Window_${W}_MARKOV.txt; done"

(3) Collect data from Markov Modeled chromosomes:

"for W in 50 75 100 150 200 400 1000 2000 4000 ; do for p in A C G T; do echo -n $W "" "" $p "" ""; for rep in 1 2 3 4 5 6 7 8 9 10; do grep ""$p $rep "" tair8-20100312-Window_${W}_MARKOV.txt | sed 's/.*-> \([0- 9]*\)$/\1/' | eoln2sp | sed 'y/ /+/' | sed 's/\+$/\n/' | bc; done | eoln2sp ; echo """"; done; done"

A.1.3. For general GQS search in genomic regions

(1) Count the pattern hits in a genomic region (e.g. cds) for G,C,A, and T patterns (G patterns showed below):

TAIR9_blastsets]$ cat TAIR9_cds_20090619 | egrep -b -o "(G{3,}[ACGT]{1,7}){3,}G{3,}" | wc -l

184 (2) Count the Markov patterns for a window size 75 in a genomic region (e.g. cds) for G,C,A, and T patters (G patterns showed below):

cat TAIR8_cds_20090619 | genRanDna4 75 MARKOV | egrep -b -o "(G{3,}[ACGT]{1,7}){3,}G{3,}" | wc -l

A.2 Ushuffle: Alternative calculation of GQS significance in the Arabidopsis genome

Before utilizing the Markov Model to assess significance of the number of GQS found in

Arabidopsis, we employed the program Ushuffle.19 This program was used to shuffle chromosome sequences while maintaining base ratios and nearest neighbors in the genome.

Dinucleotide preferences in DNA sequences exist and are thought to contribute to DNA structure.20. These preferences should be taken into account when randomizing a nucleic acid sequence;21 for example, maintenance of nearest neighbors in the genome was shown to be important in searching for micro RNAs (miRNAs).22 We split each chromosome into 6-10 equal parts, depending on the size of each chromosome, which resulted in chromosome segments containing approximately 3 million bases. The segments were shuffled ten times using the Java

Applet version of Ushuffle, preserving dyad frequency by specifying a k-let value of two. Over- or underrepresentation of the G and C patterns in the genome were evaluated with a Z-score:

X (Eq. 1) Z

where X is the number of GQS in the native genomic sequence, µ is the average number of GQS in the shuffled sequences, and σ is the standard deviation of the shuffled sequences. Z-scores greater than 2 or 3 are considered significant assuming a normal (Gaussian) distribution. We performed five and ten shuffles; similar Z-scores were obtained for both, suggesting that the number of shuffles used was sufficient for the highly significant Z-scores obtained herein.

185

Shuffled chromosomes were then searched for G3L1-7 and G2L1-4 motifs, and a Z-score was calculated to determine the extent of over- or underrepresentation of the G and C patterns in the genome (Table 2.3). Unlike the result with the Markov simulated genome, we found that, for both GQS definitions, shuffled chromosomes contained many fewer GQS than the native

(unshuffled) Arabidopsis genome. For the G3L1-7 definition, there were between ~150 and 500

GQS in a given native chromosome, however in the shuffled chromosomes there were only ~15

GQS per chromosome (Table A.2). The significant difference in GQS between randomized and native genomes is represented in chromosome Z-scores that range from 43 to 107 (Table A.2), all

23 of which are well above the threshold for significance of ~3. For the G2L1-4 definition, the native chromosomes have much larger numbers of GQS, ranging from 7,000-10,000 while the shuffled sequences had ~25% of those values, resulting in Z-scores between 63 and 117 (Table

A.2). These Z-scores indicate that, despite having a low GQS density compared to some other eukaryotes, the G and C GQS patterns are greatly overrepresented in the Arabidopsis genome.

The windowed Markov Model is the accepted method for determining significance of

GQS in a genomic sequence but whether this is the most accurate method is debated in the literature.24 Using the UShuffle program effectively shuffled the chromosomes with a window size of 3 million. This dilutes GC-rich regions of the genome and is not an appropriate simulation of a genome sequence. According to Csüros and coworkers, the Markov Model is also not an accurate method to simulate the genome.24 As this issue is still under debate, it is not appropriate to interpret function from significance data.

186

Table A.2: Z-scores for G3L1-7 and G2L1-4 GQS in the Arabidopsis genome

GQS Motifa Chrb GQSc GQS Shuffled Standard deviationd Z-Scored G3L1-7 1 488 15.6 4.4 107.3 2 155 11.1 2.2 64.4 3 170 14.3 3.6 43.7 4 187 14.4 3.6 48.4 5 219 15.3 4.0 50.6 FULL 1,219 70.7 8.0 143.8

G2L1-4 1 10,525 2,501 84 95.5 2 7,010 1,711 50 105.5 3 8,406 2,141 54 116.9 4 6,923 1,636 49 107.0 5 9,271 2,274 111 62.9 FULL 42,135 10,263 144 221.4 Provided are Z-scores afor two different GQS motifs. bChromosome number. cNumber of GQS found in each chromosome. ―FULL‖ is the sum over all 5 chromosomes. dGQS found in shuffled sequences. This value is the average number of GQS from 10 shuffles. Also provided are the standard deviation and Z- score based on 10 shuffles (see Materials and Methods for Z-score determination). As with Tables 2.1 and 2.2, Quadparser search parameters both included G and C patterns to account for both sense and antisense strands.

187

A.3 Functional analysis of genes

Table A.3: Unique gene models with a GQS in the RNA

GQS Genic Coding 5‘ UTR 3‘ UTR Intron Motif Gene Loci Gene Loci Gene Loci Gene Loci Gene Loci models models models models models G3+ L1-7 215 177 166 142 10 10 14 11 26 19 G3+ L1-3 43 38 28 26 4 4 4 4 7 5

G2+ L1-4 13,156 10,382 11,816 9,699 371 310 588 473 782 630 G2+ L1-2 4,026 3,185 3,610 2,932 119 100 123 102 211 165 G2+ L1 2,428 1,913 2,190 1,781 70 60 62 54 114 183 Number of unique gene models and loci with at least one GQS. Genic, Coding, 5‘ UTR, 3‘ UTR and Intron are defined in Table 2.4. Quadparser search parameters were set to include G-patterns, which will be found in RNA, and exclude C-patterns. In Arabidopsis, there are 39,640 gene models and 33,518 loci, of which 27,379 are protein coding. The non-coding loci correspond to rRNAs, tRNAs, and other annotated ncRNAs Note that GQS from gene models were used for Tables 2.1-2.5.

188

Table A.4: Functional analysis of genes with at least one G2L1-4 GQS present in the RNA

GO GO termc GQS genesd All genese % GQS genesf p-valueg GO IDa Catb Overrepresented 0003824 MF catalytic activity 2894h 6393i 45% 9E-65 0006468 BP protein aa phosphorylation 506 798 63% 1E-53 0016310 BP phosphorylation 528 852 62% 1E-51 0016740 MF transferase activity 1118 2176 51% 2E-49 0006796 BP phosphate metabolic process 543 894 61% 2E-29 0006793 BP phosphorus metabolic process 543 895 61% 3E-49 0016301 MF kinase activity 661 1151 57% 2E-48 0043687 BP post-translational protein mod. 579 985 59% 2E-46 0016772 MF transferase act, phos. groups 708 1284 55% 2E-43 0006464 BP protein modification process 612 1107 55% 1E-37 0043412 BP biopolymer modification 639 1229 52% 7E-29 0005524 MF ATP binding 337 607 56% 2E-20 0032559 MF adenyl ribonucleotide binding 339 613 55% 3E-20 0030554 MF adenyl nucleotide binding 351 642 55% 7E-20 0005478 MF transporter activity 504 993 51% 1E-19 0032501 BP multicellular organismal process 462 906 51% 2E-18 0017111 MF nucleoside-triphosphatase act. 237 408 58% 2E-17 0000166 MF nucleotide binding 490 980 50% 2E-17 0048856 BP anatomical structure dev. 359 681 53% 5E-17 0032502 BP developmental process 493 992 50% 6E-17 0007275 BP multicellular organismal dev. 441 871 51% 7E-17 0015646 MF transmembrane transporter act. 366 701 52% 2E-16 0032555 MF purine ribonucleotide binding 387 751 52% 3E-16 0032443 MF ribonucleotide binding 387 751 52% 3E-16 0016020 CC membrane 1011 2266 45% 3E-16 0017076 MF purine nucleotide binding 399 781 51% 4E-16 0016817 MF hydrolase act, on acid anhydrides 244 432 57% 5E-16 0016818 MF hydrolase act, acid anhyd. wit h P 243 431 56% 7E-16 0016462 MF pyrophosphatase activity 242 429 56% 7E-16 0022804 MF active transmem. transporter act. 241 427 56% 7E-16 0022414 BP reproductive process 275 502 55% 8E-16 0043227 CC membrane-bound organelle 1929 4637 42% 1E-15 0050876 BP reproduction 277 509 54% 2E-15

189

0043231 CC intracell. mem-bound organelle 1924 4630 42% 2E-15 0016887 MF ATPase activity 183 312 59% 4E-14 0043226 CC organelle 2092 5111 41% 1E-13 0043229 CC intracellular organelle 2091 5110 41% 1E-13 0042623 MF ATPase activity, coupled 134 214 63% 3E-13 0044424 CC intracellular part 2331 5754 41% 3E-13 0005622 CC intracellular 2477 6146 40% 3E-13 0016787 MF hydrolase activity 975 2231 44% 7E-13 0004386 MF helicase activity 88 126 70% 1E-12 0044464 CC cell part 4378 11331 39% 3E-12 0005623 CC cell 4378 11331 39% 3E-12 0003006 BP reproductive dev. process 226 420 54% 6E-12 0048608 BP reproductive structure dev. 226 420 54% 6E-12 0051234 BP establishment of localization 588 1279 46% 6E-12 0022892 MF substrate-specific transporter act. 328 657 50% 1E-11 0051179 BP localization 592 1294 46% 1E-11 0006810 BP transport 583 1273 46% 2E-11 0048731 BP system development 190 348 55% 9E-11 0048513 BP organ development 190 348 55% 9E-11 MF secondary active transmembrane 126 212 60% 3E-10 0015290 transporter act. MF substrate-specific transmembrane 271 537 51% 3E-10 0022891 transporter act. 0008026 MF ATP-dependent helicase activity 56 76 74% 2E-9 0009857 BP recog or rejection of self pollen 29 31 94% 2E-9 0008151 BP cellular process 2642 6706 39% 3E-9 0009791 BP post-embryonic development 152 275 55% 4E-9 0008037 BP cell recognition 29 32 91% 1E-8 0015075 MF ion transmem. transporter act. 201 392 51% 3E-8 0050222 MF protein kinase activity 225 448 50% 3E-8 005169 BP response to stimulus 854 2016 42% 6E-8 0005737 CC cytoplasm 1671 4157 40% 9E-8 0009875 BP pollen-pistil interaction 29 34 85% 3E-7 0044444 CC cytoplasmic part 1542 3831 40% 3E-7 MF phosphotransferase act., alcohol 262 546 48% 4E-7 0016773 group as acceptor 0008152 BP metabolic process 2446 6261 39% 8E-7

190

0009790 BP embryonic development 145 277 52% 1E-6 0003674 MF molecular function 8070 21958 37% 2E-6 MF transferase act., transferring glycosyl 195 395 50% 2E-6 0016932 groups 0006950 BP response to stress 526 1210 44% 2E-6 0015293 MF symporter activity 61 96 64% 2E-6 0003677 MF DNA binding 717 1699 42% 3E-6 0050791 BP regulation of biological process 891 2149 42% 3E-6 0048519 BP negative reg. of biological process 80 136 59% 3E-6 0015294 MF solute:cation symporter activity 60 95 63% 4E-6 0005634 CC nucleus 642 1510 43% 4E-6 0065007 BP biological regulation 955 2320 41% 4E-6 0048316 BP seed development 144 280 51% 5E-6 0005402 MF cation:sugar symporter activity 49 74 66% 6E-6 0005403 MF sugar:hydrogen symporter act. 49 74 66% 6E-6 0015295 MF solute:hydrogen symporter act. 49 74 66% 6E-6 0046527 MF glucosyltransferase activity 55 86 64% 6E-6 0051119 MF sugar transmem. transporter act. 54 85 64% 6E-6 BP embryonic development ending in 133 256 52% 6E-6 0009793 seed dormancy 0051706 BP multi-organism process 180 367 49% 1E-5 0005515 MF protein binding 778 1873 42% 1E-5 0009628 BP response to abiotic stimulus 279 607 46% 1E-5 0048523 BP negative reg. of cellular process 59 96 62% 2E-5 MF carbohydrate transmembrane 58 94 62 2E-5 0015144 transporter activity 0008324 MF cation transmem. transporter act. 151 302 50% 2E-5 0005200 CC struc. constituent of cytoskeleton 22 26 85% 2E-5 0044425 CC membrane part 406 926 44% 2E-5 0006952 BP defense response 233 49 47% 3E-5 0048409 BP flower development 68 116 59% 3E-5 0051244 BP regulation of cellular process 830 2018 41% 3E-5 MF hydrolase activity, acting on acid 72 125 58% 3E-5 0016820 anhydrides… 0009536 CC plastid 800 1941 41% 3E-5 0044238 BP primary metabolic process 2011 5159 39% 4E-5 0043492 MF ATPase activity, coupled to 71 124 58% 5E-5

191

movement of substances MF ATPase activity, coupled to 71 124 58% 5E-5 0042626 transmembrane movement 0044237 BP cellular metabolic process 1994 5120 39% 5E-5 0044459 CC plasma membrane part 60 101 59% 6E-5 0030246 MF carbohydrate binding 63 108 58% 7E-5 0006855 BP multidrug transport 35 51 69% 8E-5 0008238 MF exopeptidase activity 49 79 62% 9E-5 0050973 BP reg. of developmental process 90 168 54% 9E-5

Underrepresented 0000496 MF base pairing 0 631 0% <1E-99 0000498 MF base pairing with RNA 0 631 0% <1E-99 0000499 MF base pairing with mRNA 0 631 0% <1E-99 0060090 MF molecular adaptor activity 0 631 0% <1E-99 0030533 MF triplet codon-AA adaptor activity 0 631 0% <1E-99 0006455 BP translational elongation 18 662 0% 4E-98 0043284 BP biopolymer biosynthetic process 103 874 12% 6E-59 0006412 BP translation 174 1129 15% 4E-51 0010467 BP gene expression 293 1487 20% 1E-41 0009059 BP macromolecule biosynth. process 280 1399 20% 7E-39 0003723 MF RNA binding 179 983 18% 7E-30 0044249 MF cellular biosynthetic process 429 1719 25% 3E-22 0009058 BP biosynthetic process 602 2149 28% 2E-14 0000154 BP rRNA modification 1 70 0.01% 3E-9 0048046 CC apoplast 8 102 8% 1E-8 CC small nucleolar ribonucleoprotein 8 94 9% 2E-7 0005732 complex 0005576 CC extracellular region 28 172 16% 2E-6 Provided are aoverrepresented and underrepresented gene ontology (GO) ID numbers, bGO categories (Cat), and cGO term for gene products encoded by d transcripts with at least one G2+L1-4 GQS. Included are the number of genes (CDS, 5‘UTR, 3‘UTR, and introns) with a GQS that are annotated for the listed GO term, and ethe total number of genes in Arabidopsis with the listed GO term. Also included are fthe percentage of genes with GQS with a given g h GO term and the appropriate p-value, as determined using the BiNGO program. The total number of GO-annotated genes with a GQS in G2L1-4 is 9,097. iThe total number of GO-annotated genes in A. thaliana is 25,179. Table is sorted in order of increasing p-value. Some GO terms are sub-categories of others.

192

Table A.5: Functional analysis of genes with at least one G2L1-2 present in the RNA

GO GO termc GQS genesd All genese % GQS genesf p-valueg GO IDa Catb Overrepresented 0003677 MF DNA binding 297h 1699i 17% 1E-12 0030528 BP transcription regulator activity 247 1441 17% 2E-9 BP reg. of nucleobase, nucleoside, & 202 1153 18% 2E-8 0019219 nucleic acid metabolic process 0050791 BP regulation of biological process 335 2149 16% 3E-8 0003700 BP transcription factor activity 208 1207 17% 3E-8 0009889 BP regulation of biosynthetic process 202 1164 17% 3E-8 0045449 BP regulation of transcription 198 1139 17% 3E-8 0004386 MF helicase activity 41 126 33% 3E-8 0031323 BP reg. of cellular metabolic process 208 1216 17% 3E-8 BP reg. of macromolecule biosynthetic 199 1152 17% 3E-8 0010556 process 0051244 BP regulation of cellular process 315 2018 16% 3E-8 0010468 BP regulation of gene expression 209 1232 17% 5E-8 0043227 CC membrane-bounded organelle 641 4636 14% 5E-8 BP regulation of macromolecule 200 1171 17% 6E-8 0060255 metabolic process 0043231 CC intracell mem-bounded organelle 639 4629 14% 6E-8 0019222 BP regulation of metabolic process 209 1238 17% 6E-8 0065007 BP biological regulation 350 2320 15% 1E-7 0032501 BP multicellular organismal process 159 906 18% 5E-7 0008026 MF ATP-dependent helicase activity 28 76 37% 5E-7 0005622 CC intracellular 809 6145 13% 1E-6 0043229 CC intracellular organelle 685 5109 13% 1E-6 0043226 CC organelle 685 5110 13% 1E-6 0016887 MF ATPase activity 69 312 22% 2E-6 0007275 BP multicellular organismal dev 151 871 17% 2E-6 0005634 CC nucleus 236 1510 16% 3E-6 0044424 CC intracellular part 752 5753 13% 1E-5 0051234 BP establishment of localization 202 1279 16% 1E-5 0006810 BP transport 201 1273 16% 2E-5 0048856 BP anatomical structure dev. 120 681 18% 2E-5 0051179 BP localization 203 1294 16% 2E-5

193

0042623 MF ATPase activity, coupled 50 214 23% 2E-5 0003676 MF nucleic acid binding 428 3078 14% 2E-5 0017111 MF nucleoside-triphosphate activity 80 408 20% 3E-5 0005488 MF binding 881 6895 13% 3E-5 0032502 BP developmental process 161 992 16% 4E-5 MF hydrolase act, acting on p-containing 2 431 0.5% 6E-5 0001681 acid anhydrides MF hydrolase act, acting on acid 82 432 19% 7E-5 0016817 anhydrides 0016462 MF pyrophosphatase activity 81 429 19% 9E-5

Underrepresented 0000496 MF base pairing 0 631 0% 1E-30 0000498 MF base pairing with RNA 0 631 0% 1E-30 0000499 MF base pairing with mRNA 0 631 0% 1E-30 0030533 MF triplet codon-AA adaptor activity 0 631 0% 1E-30 0060090 MF molecular adaptor activity 0 631 0% 1E-30 0006455 BP translational elongation 13 662 2% 5E-17 0006416 BP translation 53 1129 5% 3E-12 0043284 BP biopolymer biosynthetic process 38 874 4% 2E-10 0010467 BP gene expression 89 1487 6% 6E-10 0044249 BP cellular biosynthetic process 109 1719 6% 6E-10 0009059 BP macromolecule biosynth. process 82 1399 6% 7E-10 0012505 CC endomembrane system 316 3859 8% 9E-9 0009058 BP biosynthetic process 154 2149 7% 2E-8 Provided are aoverrepresented and underrepresented gene ontology (GO) ID numbers, bGO categories (Cat), and cGO term for gene products encoded by d transcripts with at least one G2+L1-4 GQS. Included are the number of genes (CDS, 5‘UTR, 3‘UTR, and introns) with a GQS that are annotated for the listed GO term, and ethe total number of genes in Arabidopsis with the listed GO term. Also included are fthe percentage of genes with GQS with a given g h GO term and the appropriate p-value, as determined using the BiNGO program. The total number of GO-annotated genes with a GQS in G2L1-4 is 2,811. iThe total number of GO-annotated genes in A. thaliana is 25,179. Table is sorted in order of increasing p-value. Some GO terms are sub-categories of others.

194

Table A.6: Functional analysis of genes with at least one G2L1 GQS present in the RNA

GO GO termc GQS genesd All genese % GQS genesf p-valueg GO IDa Catb Overrepresented 0030528 BP transcription regulator activity 191h 1441i 13% 8E-17 0003677 MF DNA binding 214 1699 13% 9E-17 0003700 MF transcription factor activity 161 1207 13% 2E-14 BP reg. of nucleobase, nucleoside, & 150 1153 13% 1E-12 0019219 nucleic acid metabolic process 0009889 BP regulation of biosynthetic process 151 1164 13% 1E-12 0045449 BP regulation of transcription 148 1139 13% 2E-12 0031323 BP reg. of cellular metabolic process 154 1216 13% 4E-12 BP reg. of macromolecule biosynthetic 148 1152 13% 4E-12 0010556 process 0019222 BP regulation of metabolic process 155 1238 13% 6E-12 0003676 MF nucleic acid binding 311 3078 10% 6E-12 BP regulation of macromolecule 148 1171 13% 9E-12 0060255 metabolic process 0010468 BP regulation of gene expression 152 1232 12% 3E-11 0050791 BP regulation of biological process 230 2149 11% 7E-11 0051244 BP regulation of cellular process 218 2018 11% 1E-10 0065007 BP biological regulation 236 2320 10% 5E-9 0005488 MF binding 579 6895 8% 2E-8 0005634 CC nucleus 151 1510 10% 6E-5

Underrepresented 0060528 MF transcription regulator activity 191 1441 13% 8E-17 0003677 MF DNA binding 214 1699 13% 9E-17 0003700 MF transcription factor activity 161 1207 13% 1E-14 BP regulation of nucleobase… and 150 1153 13% 1E-12 0019219 nucleic acid metabolic process 0009889 BP regulation of biosynthetic process 151 1164 13% 1E-12 0045449 BP regulation of transcription 148 1139 13% 2E-12 0031323 BP reg. of cellular metabolic process 154 1216 13% 3E-12 BP reg. of macromolecule biosynthetic 148 1152 13% 3E-12 0010556 process 0019222 BP regulation of metabolic process 155 1238 13% 6E-12 0003676 MF nucleic acid binding 311 3078 10% 6E-12

195

BP regulation of macromolecule 148 1171 13% 9E-12 0060255 metabolic process 0010468 BP regulation of gene expression 152 1232 12% 3E-11 0050791 BP regulation of biological process 230 2149 11% 7E-11 0051244 BP regulation of cellular process 218 2018 11% 1E-10 0065007 BP biological regulation 236 2320 10% 5E-9 0005488 MF binding 59 6895 8% 2E-8 0006534 CC nucleus 151 1510 10% 6E-5 Provided are aoverrepresented and underrepresented gene ontology (GO) ID numbers, bGO categories (Cat), and cGO term for gene products encoded by d transcripts with at least one G2+L1-4 GQS. Included are the number of genes (CDS, 5‘UTR, 3‘UTR, and introns) with a GQS that are annotated for the listed GO term, and ethe total number of genes in Arabidopsis with the listed GO term. Also included are fthe percentage of genes with GQS with a given g h GO term and the appropriate p-value, as determined using the BiNGO program. The total number of GO-annotated genes with a GQS in G2L1-4 is 1,697. iThe total number of GO-annotated genes in A. thaliana is 25,179. Table is sorted in order of increasing p-value. Some GO terms are sub-categories of others.

Table A.7: Functional analysis of genes with at least one G3L1-7 GQS present in the RNA

GO GO termc GQS genesd All genese % GQS genesf p-valueg GO IDa Catb Overrepresented 0005507 MF copper ion binding 9h 109i 8% 8.8E-4 0009790 BP embryonic development 11 277 4% 3.3E-2 0007275 BP multicellular organismal dev. 20 871 2% 8.8E-2

Underrepresented BP macromolecule biosynthetic 0 1399 0 3.7E-3 0009059 process 0009058 BP biosynthetic process 3 2149 0.1% 3.7E-3 0010467 BP gene expression 1 1487 0.1% 1.2E-2 0044249 BP cellular biosynthetic process 2 1719 0.1% 1.2E-2 0006416 BP translation 0 1129 0 1.9E-2 Provided are aoverrepresented and underrepresented gene ontology (GO) ID numbers, bGO categories (Cat), and cGO term for gene products encoded by d transcripts with at least one G2+L1-4 GQS. Included are the number of genes (CDS, 5‘UTR, 3‘UTR, and introns) with a GQS that are annotated for the listed GO term, and ethe total number of genes in Arabidopsis with the listed GO term. Also included are fthe percentage of genes with GQS with a given g h GO term and the appropriate p-value, as determined using the BiNGO program. The total number of GO-annotated genes with a GQS in G2L1-4 is 246. iThe total number of GO-annotated genes in A. thaliana is 25,179. Table is sorted in order of increasing p-value. Some GO terms are sub-categories of others.

196

A.4 References

1. Swarbreck, D., Wilks, C., Lamesch, P., Berardini, T.Z., Garcia-Hernandez, M., Foerster, H., Li, D., Meyer, T., Muller, R., Ploetz, L., et al., The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res., 2008. 36: D1009- 1014.

2. Vogel, J.P., Garvin, D.F., Mockler, T.C., Schmutz, J., Rokhsar, D., Bevan, M.W., Barry, K., Lucas, S., Harmon-Smith, M., Lail, K., et al., Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature, 2010. 463: 763-768.

3. Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.A., Gocayne, J.D., Amanatides, P.G., Scherer, S.E., Li, P.W., Hoskins, R.A., Galle, R.F., et al., The genome sequence of Drosophila melanogaster. Science, 2000. 287: 2185-2195.

4. Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L., Song, Q., Thelen, J.J., Cheng, J., et al., Genome sequence of the palaeopolyploid soybean. Nature, 2010. 463: 178-183.

5. Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Kato, T., Nakao, M., Sasamoto, S., Watanabe, A., Ono, A., Kawashima, K., et al., Genome structure of the legume, Lotus japonicus. DNA Res., 2008. 15: 227-239.

6. Young, N.D., Cannon, S.B., Sato, S., Kim, D., Cook, D.R., Town, C.D., Roe, B.A. and Tabata, S., Sequencing the genespaces of Medicago truncatula and Lotus japonicus. Plant Physiol., 2005. 137: 1174-1181.

7. Cannon, S.B., Sterck, L., Rombauts, S., Sato, S., Cheung, F., Gouzy, J., Wang, X., Mudge, J., Vasdewani, J., Schiex, T., et al., Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc. Natl. Acad. Sci. U.S.A., 2006. 103: 14959-14964.

8. Wade, C.M., Kulbokas, E.J., 3rd, Kirby, A.W., Zody, M.C., Mullikin, J.C., Lander, E.S., Lindblad-Toh, K. and Daly, M.J., The mosaic structure of variation in the laboratory mouse genome. Nature, 2002. 420: 574-578.

9. Opperman, C., M, B. and SA, L., Sequencing and analysis of the Nicotiana tabacum genome. Recent Adv. Tob. Sci., 2007. 33: 5-14.

10. Yu, J., Hu, S., Wang, J., Wong, G.K., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., et al., A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science, 2002. 296: 79-92.

11. Zhao, W., Wang, J., He, X., Huang, X., Jiao, Y., Dai, M., Wei, S., Fu, J., Chen, Y., Ren, X., et al., BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics. Nucleic Acids Res., 2004. 32: D377-382.

197 12. Matsumoto, T., Wu, J.Z., Kanamori, H., Katayose, Y., Fujisawa, M., Namiki, N., Mizuno, H., Yamamoto, K., Antonio, B.A., Baba, T., et al., The map-based sequence of the rice genome. Nature, 2005. 436: 793-800.

13. Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F., Malek, R.L., Lee, Y., Zheng, L., et al., The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res., 2007. 35: D883-887.

14. Tuskan, G.A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., et al., The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 2006. 313: 1596-1604.

15. Rensing, S.A., Lang, D., Zimmer, A.D., Terry, A., Salamov, A., Shapiro, H., Nishiyama, T., Perroud, P.F., Lindquist, E.A., Kamisugi, Y., et al., The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science, 2008. 319: 64- 69.

16. Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., et al., The Sorghum bicolor genome and the diversification of grasses. Nature, 2009. 457: 551-556.

17. Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al., The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 2007. 449: 463-467.

18. Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T.A., et al., The B73 maize genome: complexity, diversity, and dynamics. Science, 2009. 326: 1112-1115.

19. Jiang, M., Anderson, J., Gillespie, J. and Mayne, M., uShuffle: a useful tool for shuffling biological sequences while preserving the k-let counts. BMC Bioinf., 2008. 9: 192.

20. Nussinov, R., Nearest neighbor nucleotide patterns. Structural and biological implications. J. Biol. Chem., 1981. 256: 8458-8462.

21. Burge, C., Campbell, A.M. and Karlin, S., Over- and under-representation of short oligonucleotides in DNA sequences. Proc. Natl. Acad. Sci. U.S.A., 1992. 89: 1358-1362.

22. Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P. and Burge, C.B., Prediction of mammalian microRNA targets. Cell, 2003. 115: 787-798.

23. Anderson, T.W. and Sclove, S.L., The statistical analysis of data. Second ed. 1986, Palo Alto, CA: The Scientific Press.

24. Csuros, M., Noe, L. and Kucherov, G., Reconsidering the significance of genomic word frequencies. Trends Genet., 2007. 23: 543-546.

198

Appendix B.

Chapter 3 Supplemental Information

B.1 Materials and Methods

B.1.1 RNA preparation

RNA oligonucleotides were purchased from Dharmacon with the following sequences:

Oligo Name Sequence Gene ID Gene Annotation G3A GGGAGGGAGGGAGGG NDA1, non- G3N1-2 GGGUCGGGUUGGGCGGG At1g07180 phosphorylating NAD(P)H dehydrogenase transduction/WD40 repeat- G3A2 GGGAAGGGAAGGGAAGGG At2g18900 like superfamily protein G3AUA GGGAUAGGGAUAGGGAUAGGG between At3g63530 and At3g63540* MNC6.12, aldo/keto G3N4 GGGUUUUGGGACAUGGGCUUGGGG At5g52580 reductase family protein G2A AGGAGGAGGAGGA At2g39320 T16B24.4 G2Amut AGGAGAAGGAGGA potential natural antisense G2A2 GGAAGGAAGGAAGG At1g33415 gene G2AUA GGAUAGGAUAGGAUAGG G2AUAmut GGAUAGAAUAGAAUAGG F9C16.23, DC1 domain- G2N4 GGAGCCGGAGUCGGAAUGGG At1g44020 containing protein *G3AUA sequence is found in the intergenic region between the two listed genes

These sequences were chosen because they represented different GQS motifs from varying loop lengths. G3A was chosen to compelement G2A and G2AUA was chosen to complement G3AUA to compare G-tract length with similar loop sequences. RNA oligonucleotides were prepared by dialysis using a six well microdialysis apparatus (Gibco-BRL

Life Technologies) at a flow rate of 25 mL/min. The RNAs were first dialyzed against 100 mM

LiCl for 8 hours to remove associated cations and then dialyzed against distilled and autoclaved

199 water for 5 hours to remove excess LiCl. Lastly, RNA oligonucleotides were dialyzed against 10 mM LiCacodylate, pH 7.0 for at least 12 hours.

For native gel electrophoresis and structure mapping, RNA oligonucleotides were 5‘end labeled with T4 Polynucleotide Kinase (NEB) and gel purified. RNA was extracted from the gel with 10 mM LiCacodylate, pH 7.0, 1 mM EDTA, 250 mM LiCl while rotating overnight at 4 C.

The RNA ethanol precipitated using Li+ as the counterion instead of Na+ to prevent GQS formation.

B.1.2 Non-denaturing (native) gel electrophoresis

12% polyacrylamide gels were prepared with 0.5X TBE (50 mM Tris, 40 mM Boric

Acid, and 0.5 mM EDTA) and the K+ concentration of interest (0 mM, 0.25 mM, 1 mM, 10 mM,

50 mM, or 100 mM). The running buffer used was 0.5X TBE and K+ concentration of interest.

At least 0.5 nM 5‘end labeled RNA was renatured without K+ at 85 C for 1 min then allowed to cool at room temperature for 10 min. K+ of interest was added as a 5X stock and samples were incubated at room temperature for 45 minutes. 2X Glycerol Loading Buffer was added to each sample (1X = 20% glycerol, 0.5X TBE, K+ concentration of interest).

Gels were run at a constant 4 watts for 3-12 hours, until the xylene cyanol dye had traveled 3 cm from the bottom of the wells. Gel running times increased with increasing salt concentrations. Consequently, the bands on the high salt gels appear more diffuse in gel images than low salt gels. Buffers from the upper and lower wells were mixed every 30 minutes and the wells were refilled. The gels were dried, exposed to a phosphor plate, and scanned on a

PhosphorImager (Molecular Dynamics).

Gels were quantified with ImageQuant5 software. Line traces were taken for each lane on a gel to measure distance and intensity of each band. For each gel, band mobility was

200 normalized to the G2L1mut oligo mobility (by taking the ratio of each point relative to the mobility of G2L1mut). The intensity of each lane was normalized to a 0 to 1 scale to account for differences in loading or exposure times. Traces were smoothed with a window size of 9.

B.1.3 Circular dichroism (CD)

CD spectroscopy was performed using a Jasco CD J810 Spectropolarimeter and analyzed with Jasco Spectra Manager Suite software. RNA oligonucleotides were prepared to a concentration of 5 µM RNA in 10 mM LiCacodylate, pH 7.0 buffer. RNA was renatured at 85 C for 1 min and allowed to cool to room temperature. CD spectra were acquired every nm from

210-320 nm with a bandwidth of 1 nm. Each spectrum is an average of 3 scans with a response time of 4s/nm. Data are buffer subtracted, normalized to provide molar residue ellipticity values, and smoothed over 5 nm [Sosnick. 2001e]

Titrations were performed with KCl to determine the amount of K+ necessary for G-

+ quadruplex formation. To determine K 1/2 values, ellipticity data were fit with kaleidaGraph v.

3.5 (Synergy software) according to the two-state Hill equation:

U F (Eq. B.1) F n 1 ([K ]/[K 1/2 ])

where F is the normalized CD signal corresponding to the fully folded GQS, U is signal for the unfolded

+ + GQS, [K ] is the potassium ion concentration, K 1/2 is the potassium ion concentration needed to fold half the RNA, and n is the Hill coefficient. In addition, data were fit with a fraction folded plot using the following two-state equation:

n ([K ] [K 1/2 ]) f F n (Eq. B.2) ([K ] [K 1/2 ]) 1

201

The standard free energy (ΔG) of the folding transition of a given GQS can be calculated from the relationship: [F] G RTln( ) (Eq. B.3) obs [U]

where R is the gas constant and T is the temperature in K. This expression can be evaluated at any given

+ potassium ion concentration using the K ½ and Hill coefficient (n).

[K ] (Eq. B.4) G obs nRTln( ) [K 1 ] 2

B.1.4 UV-detected thermal denaturation (melts)

RNA was prepared to a concentration of 5µM or 10 µM in 10 mM LiCacodylate, pH 7.0.

Samples were renatured at 85 C for 1 min then allowed to cool to room temperature. Monovalent salt (LiCl, NaCl, or KCl) was added for a final concentration of 100 mM. Thermal denaturation experiments were performed on a Gilford Response II spectropolarimeter. Data was collected every 0.5 C over the temperature range 5-95 C and 95-5 C. Reversible transitions were confirmed by forward and reverse melts. Unfolding transitions were monitored at 260 nm with 5

µM RNA in 0.5 cm cuvettes and at 295 nm with 10 µM RNA in 1 cm pathlength cuvettes. Data were fit with KaleidaGraph v. 3.5 (Synergy Softward) using a Marquadt algorithm for non-linear curve fitting.

202 B.1.5 RNase T1 protection assays

5‘end labeled RNA, prepared as above, was used for the RNase T1 protection assays.

Hydroxyl ladders were prepared by incubating labeled RNA with hydrolysis buffer (50 mM

Na2CO3/NaHCO3 pH 9.0, 1 mM EDTA) for 5 min. at 90 C before adding EDTA-Formamide loading buffer (EFLB) and freezing on dry ice. A denaturing ladder was prepared for each RNA, by incubating 5‘end labeled RNA with 0.5 U/µL RNase T1 (Ambion) in T1 Buffer (1X = 6.5M

Urea, 20 mM sodium citrate pH 3.5, 1 mM EDTA) at 50 C for 15 min followed by addition of

EFLB and freezing. Ladders were heated to 90 C for 2 min before loading onto the gel.

Native T1 reactions prepared with 5‘end labeled RNA, which was renatured in 2 mM

LiCacodylate, 0.5 mM EDTA at 85 C for 1 min and then allowed to cool to room temperature.

K+ was added with a 10X stock to the desired concentration and allowed to incubate for 10 min. at room temperature before adding RNase T1 to a final concentration of 0.0005 U/µL. Reactions were incubated at room temperature for 1 hour before quenching with EFLB and loading immediately onto a sequencing gel. Samples were analyzed on a 12% polyacrylamide sequencing gel with 8.4M Urea and 1xTBE (10 mM Tris, 8.3 mM Boric Acid, and 0.1 mM

EDTA). The gels were dried, exposed to a phosphor plate, and scanned on a PhosphorImager

(Molecular Dynamics).

203 B.2 Supplemental figures

Figure B.1: G3 GQS CD K+ titrations and fraction folded plots. Each sequence is provided with G residues in bold text.

204

Figure B.2: G2 GQS CD K+ titrations and fraction folded plots. Each sequence is provided with G residues in bold text.

205

Figure B.3: G3A (a) CD titration and (b) fit to two transitions.

206

Figure B.4: Native gel traces, prepared as described in Materials and Methods (Section B.1).

207 Appendix C.

Chapter 4 Supplemental Information

Figure C.1: Cu2+ inhibits seedling root growth. Images of 8 day old Col-0 wild type seedlings grown on copper-supplemented media. Seeds were sown on 0.5xMS media supplemented with no CuSO4, 10 µM CuSO4, 30 µM CuSO4, 50 µM CuSO4 and, 100 µM CuSO4 as labeled above. The picture contrast, color, and brightness were adjusted to visualize the roots. Also included is a 16X magnification of the seedling grown on 100 µM CuSO4.

208 C.1 Functional analysis results

BiNGO functional analysis of G/C2+L1-4 (full) revealed slight overrepresentation of the GO term copper ion binding with a p-value of 1.0e-4. Table C.1 contains a list of genes and GQS that are overrepresented in copper ion binding.

Table C.1: G2L1-4 GQS sequences from overrepresented GO term ―copper ion binding‖

Gene ID Gene Name Location GQS Sequence At1g02410 Intron 1 GGAGGTTGCGGAAGGG At1g06330 CDS CCACGCCGCACCAGTCCC CDS CCGCCACCACCC

At1g17800 CDS CCAACGCCGTGACCTATGCCC At1g18140 LAC1 (Laccase 1) CDS GGGAGGATGGGCGG At1g21850 SKS8 (SKU5 Similar 8) CDS CCCCGTTCCATTCCCAGCCCCC At1g21860 SKS7 (SKU5 Similar 7) CDS GGCGGATTCGGAGG CDS CCCCGTCCCCTTCCCCACTCCGGCC

At1g22480 CDS GGAGGTGGTGGTGG LPR1 At1g23010 CDS GGCGGCAGGATTGGAGG (Low Phosphate Root1) At1g31670 CDS GGGATGGTTTGGTGAGG CDS CCACCATCACCACC

CDS CCCACCGCCTCACCC

CDS CCACCATGTCCCTTGCC

At1g31690 CDS GGAGGAGTTGGTGG CDS CCCACCTCCTCACCCC

CDS CCACCATGTCCCATGCC

At1g31710 CDS GGTTTGGGCAGGTGG CDS CCACCACCACGCC

Intron 1 CCCGAACCCAAACCGAACCCGACCC

At1g45063.1 CDS CCGAGTCCTCTACCACTACC At1g45063.2 CDS CCGAGTCCTCTACCACTACC At1g48940 CDS CCTCCACCTTCC At1g55560 SKS14 (SKU5 Similar 14) CDS GGAAAGGTTGGAGG CDS GGTGGTTTTGGTGG

At1g55570 SKS12 (SKU5 Similar 12) CDS GGTGGCTTTGGTGG At1g62810 CDS GGGTTTTGGTTCCGGGTCGG CDS GGATGGTTTGGATCGGAGG

CDS CCCAATTCCAATTCCCAAAGCCTCC

At1g64600 CDS GGTGCTGGCACTGGTTCAGG CDS GGGAGGCGGATGGGG

At1g64640 CDS GGTGGTGGTGG LPR2 At1g71040 CDS CCGTCCGCCGCC (Low Phosphate Root2) At1g75790 SKS18 (SKU5 Similar 18) CDS CCCAGTCCCATTCCCTAAGCCC At1g76100 PETE1 CDS GGGTGCTGGTATGGTTGGG (PLASTOCYANIN 1) CDS CCACTTCCGCCACC

At1g76160 SKS5 (SKU5 Similar 5) CDS GGTGGATTTGGTGG CDS GGAATGGATGGTGG

CDS CCCGGTCCCTTTCCCTGATCCCGCC

209

At1g79800 CDS GGCGGCGGAAGG CDS GGACTCGGTGATGGAGG

CDS CCACCAACACCATGACC

CDS CCATGCCTCCATCC

CDS CCTCCTCCCTGCC

At2g07695 CDS CCCCGACCCCTTCCAACACCTCC At2g15770 CDS GGCGTGGAGCTGGAGTTGG CDS GGGTCGGGTTGGGGTTGGGG

GGCTCTGGTTGGGGTTGGGGCACGG CDS G CDS GGCTGGAGTTGGAGTTGGGG

CDS GGTTCGGATTCGGGCTCGGGTTTGG

At2g15780 CDS GGGGCTTTGGGTGGGGTTGGGGTGG At2g23630 SKS16 (SKU5 Similar 16) CDS CCAGTTCCCTGGTCCTCC CDS CCCACACCAGGACCCTCC

CCTAACCACCCAAAGCCAGGACCAG At2g23990.1 CDS CC CCAGCCCCAGCTCCACCCACTCCTT CDS CTCC CDS CCGGCGCCAGCTCCGGCC

CCTAACCACCCAAAGCCAGGACCAG At2g23990.2 CDS CC CCAGCCCCAGCTCCACCCACTCCTT CDS CTCC CDS CCGGCGCCAGCTCCGGCC

At2g26720 CDS CCGGTTCCTGGACCGGTCC At2g27035 CDS CCTTCCTCAACCGCCACC CDS CCTCAACCACCACTCC

CCAAAGCCACCTCCGGAGCCAAAAC At2g28660 CDS CC CDS GGTGACGGTGGTCGG

At2g29130 LAC2 (Laccase 2) CDS GGAGTGGTTGGGCGGATGG CDS GGTGGGTCAAGGCTTTGG

CDS CCTTACCCTTTTCCCAAACCC

CDS CCCGGGCCAAACCACC

CDS CCATCCGCTTCCTTGCC

CDS CCTTCCTCCACCC

At2g30210 LAC3 (Laccase 3) CDS GGTCGGGTCAGGTTTCGG CDS GGTGGTTGGGTGG

CDS GGGATGGGGTTTGGCTATGG

CDS CCGCCCGTTCCACC

CCGGTTCCTGGACCCGTCCGACCAC At2g31050 CDS C At2g40370 LAC5 (Laccase 5) CDS GGTGTAGGCGGCGGAAGTGG CDS CCTATCCTCCCTGCC

At2g42490 CDS GGGGAGGATGGTCTCGGG CDS CCCACCGTTCCAGCC

CDS CCACTCCCCCCACCTGACCC

At2g46570 LAC6 (Laccase 6) CDS GGAGGTTGGGCGG CDS CCTCCTCCTCC

At3g01070 CDS GGTGGACGGTGGG CDS GGAGCTGGAAGGGACTTGG

210

CDS GGTAATGGTGGTGG

CDS CCGCCTCCTCCACC

5'UTR CCCATCCCCCCCC

At3g09220 LAC7 (Laccase 7) CDS CCCAATCCAGCCTGGCC CDS CCCGCCGTCCCTTTCCC

CDS CCAGACCAGCCTCC

CDS CCACCTTGCCACCGCCACCACC

MT2A At3g09390 (METALLOTHIONEIN CDS GGCAACGGTTGTGGAGG 2A) At3g13390 SKS11 (SKU5 Similar 11) CDS GGTGGATTCGGTGG At3g13400 SKS13 (SKU5 Similar 13) CDS GGTGGTTTCGGTGG CDS GGAAAGGTCAGGTTTGG

At3g18590 CDS CCTCCTCCATCC At3g20570 CDS CCGCCGTCCCCAGCCCCAGCTCCC CDS CCGCTCCTTCTCCACC

At3g27200 CDS GGTTCAGGTTCGGGTTCGG CDS CCAAATCCGCTTCCTCTCC

At3g28958 CDS CCATTCCTCCACCACCACC At3g43670 CDS GGTTGGTATGGACCGG At3g53330 CDS CCGATCCCTCCACCACC CDS CCTTCACCACCACCACC

CDS CCTTCACCACCACCACC

CDS CCTCCACCACCACCACCACC

CDS CCTTCACCACCACCACC

At3g60270 uclacyanin, putative CDS CCGCCGCCACC CDS CCTCCTCCTCCTCCTCC

CDS CCTCCACCGTCTCC

CDS CCATCGCCGTCTCCATCGCC

At4g01380 CDS CCAAAGCCGTACCTGTCCCC At4g12880 CDS CCTCCTCCTCCTCCTCC CDS CCTCCTCCTCC

At4g22010 SKS4 (SKU5 Similar 4) CDS GGTCGCGGCTGGGGAGG CDS GGTGGAAACGGAGGG

CDS GGTATGGATGGAGG

At4g27520 CDS CCACCGGCACCACC At4g27590.2 CDS GGTGACGGTGTGGGGG At4g28090 SKS10 (SKU5 Similar 10) CDS GGCCGGTGGTTATGG CDS CCCTTCCCTCGCCGTCC

CDS CCCCGTCCCTTTCCCTGAACCC

GGCCGGTTTGGTTGGTCCGGGGATG At4g28365 CDS GTGG CDS CCACCGTCCACCACAGCC

CDS CCTCTCCCTCCCCGTCTCCGGTCC

At4g30590 CDS GGTTGGTTTGGCTATGG CDS CCAGCTCCAGCTCCCGGTCC

At4g32490 CDS CCCTCTCCTTCCCCTTCCCCC CDS CCAGCGCCTATTCCGGTCC

CDS CCGGCTCCAGGGCCCGCC

At4g37160 SKS15 (SKU5 Similar 15) CDS GGATACGGGTCAGGGACATGG At4g38420 SKS9 (SKU5 Similar 9) CDS GGTGGTTTTGGCGG CDS CCGTATCCTCAGCCGCCC

CDS CCCCGTCCCTTTCCCTGAACCC

211

At4g39830 CDS CCTAACCACTCCTCC CCGTCCCCGATCCGGCCGTCCTTAC At5g01040 LAC8 (Laccase 8) CDS CCTTTTCCTAAACCC CDS CCGCCGACCAGGCC

CDS CCCAACCAACCACC

CCGTCCCCGATCCGGCCGTCCTTAC At5g01050 CDS CCTTTCCCTAAACC CDS CCCCCGACACCAAACCCACC

CDS CCCGACCAACCACC

At5g01190 LAC10 (Laccase 10) CDS GGTTGGTTTAGGAACTGG CDS GGTGGTTGGGCGG

CDS CCTTACCCGTTCCCTAAACC

CDS CCACCACCGCC

CDS CCGCCACCGCTACC

CDS CCGCCACACCCACCAAAACC

MT2B At5g02380 (METALLOTHIONEIN CDS GGCAATGGTTGCGGAGG 2B) At5g03260 LAC11 (Laccase 11) CDS GGTGGATGCGGCGG CDS CCGTATCCGTTCCCACAGCCATACC

CDS CCCAGACCGACCACCC

At5g05390 LAC12 (Laccase 12) CDS CCCAATGCCCGATCCGACCC CCTTTCCCTAAACCGGACCGCCAAA CDS CCGCCC Intron 3 CCACCATACCATACC

At5g07130 LAC13 (Laccase 13) CDS GGAGGTGAGGAACGG CDS CCCAGTGCCCAATCCAACC

CDS CCTCCTACACCAAACC

CDS CCCTCCAACGCCTCC

At5g09360 LAC14 (Laccase 14) CDS CCACCACCACAGCC At5g14970 CDS GGAGGAGGAGG CDS GGTGGTGATGGAACTGG

CDS CCTTCACCAACCACCACC

At5g15350 CDS CCTCCTCCACC At5g21100 CDS GGTGGTTGTGGTGG CDS GGGTTTTGGGTTACGGTGAAGGG

CDS CCGCCGCCGAATCC

At5g21105.1 CDS GGGTTTTGGGTTACGGTGACGGG CDS CCACCGCCGTTTCC

At5g21105.2 CDS GGGTTTTGGGTTACGGTGACGGG CDS CCACCGCCGTTTCC

At5g24580.1 CDS GGCGGCTGAGGAGG CDS GGAGAGGAAGGTGGTGG

CDS GGCGGCGGCTACGGAGG

CDS GGCGGCTGAGGAGG

CCGCAACCACCACCACCGCCTCCAC CDS CC CDS CCCCAACCTGACCCGGAACC

At5g24580.2 CDS GGAGAGGAAGGTGGTGG CDS GGCGGCGGCTACGGAGG

CDS GGCGGCTGAGGAGG

CCGCAACCACCACCACCGCCTCCAC CDS CC

212

CDS CCCCAACCTGACCCGGAACC

At5g24580.3 CDS GGAGAGGAAGGTGGTGG CDS GGCGGCGGCTACGGAGG

CCGCAACCACCACCACCGCCTCCAC CDS CC CDS CCCCAACCTGACCCGGAACC

At5g25090 CDS CCTTCTCCTGCACCATCTCC At5g26330 mavicyanin, putative CDS CCAACCACGGCCACC CDS CCTGGCCATTGCCTCGCC

CDS CCTTCTCCCTGCCTCCTCCACGCC

CDS CCCTGGCCCTTCTCCTAGCC

At5g48100 TT10 (TRANSPARENT CDS GGTTGGTGGTGG TESTA 10) CDS GGTTGGAGTAGGGTTCGGG

At5g48450 SKS3 (SKU5 Similar 3) CDS GGTGGTTATGGAGGG CDS CCGGAGCCCTAGCCGCCGATCCC

CDS CCACTCCCTGATCCTCC

At5g53870 CDS CCGGCCACTCCATCTCC At5g58910 LAC16 (Laccase 16) CDS CCCATACCCTTTCCCTAAGCCC CDS CCACTGCCACCCTCC

CDS CCACCGTTTCCACCTCC

At5g66920 SKS17 (SKU5 Similar 17) CDS CCAATCCCTTACCCTCTCCCAACC

C.2 Database of literature-annotated copper ion-related genes and sequences

>AT4G37270.1 | HMA1 (Heavy metal ATPase 1); Copper-exporting ATPase

ATGGAACCTGCAACTCTTACTCGTTCTTCCTCTCTTACTAGATTCCCTTATCGTCGTGGTTTATCCACTCT CCGACTCGCTCGAGTCAACTCGTTCTCAATTCTTCCACCTAAAACTCTTCTCCGTCAAAAACCGCTTCGTA TCTCTGCTTCCCTTAATCTTCCACCACGGTCGATTCGTCTACGTGCTGTCGAAGATCACCATCACGATCAT CATCACGATGACGAGCAAGATCATCACAACCACCATCATCATCACCATCAACACGGATGCTGTTCTGTGGA ATTGAAAGCGGAGAGTAAGCCTCAGAAGATGTTGTTCGGATTCGCTAAAGCTATCGGATGGGTTAGATTGG CCAATTACCTCAGAGAGCATCTTCATCTTTGCTGCTCCGCCGCTGCAATGTTCCTCGCTGCCGCCGTCTGT CCTTACCTTGCTCCTGAACCTTACATTAAGTCTCTTCAGAACGCATTCATGATTGTTGGTTTTCCTCTTGT TGGAGTTTCAGCATCTCTCGACGCACTTATGGATATAGCTGGAGGAAAAGTGAACATCCATGTCTTGATGG CACTTGCGGCTTTTGCATCTGTGTTTATGGGAAATGCTTTGGAAGGAGGATTGCTTCTAGCTATGTTCAAT CTTGCTCATATTGCTGAGGAGTTCTTTACTAGTCGATCAATGGTGGATGTCAAAGAATTGAAAGAGAGTAA TCCAGATTCTGCATTGTTGATCGAAGTACACAATGGCAATGTTCCAAATATATCTGATTTGTCATACAAAA GCGTTCCTGTGCACAGCGTAGAAGTTGGATCCTATGTTTTGGTTGGAACTGGTGAGATTGTGCCTGTAGAT TGCGAAGTCTATCAAGGTAGTGCTACAATTACAATTGAGCACTTGACTGGGGAAGTCAAGCCGTTGGAGGC AAAAGCTGGAGATAGAGTGCCTGGTGGTGCAAGAAATTTGGATGGCAGAATGATTGTAAAGGCTACAAAGG CATGGAATGATTCGACGCTTAACAAGATTGTACAGCTGACCGAGGAAGCACATTCTAATAAACCCAAACTT CAGAGATGGCTGGATGAGTTTGGCGAGAATTACAGCAAGGTTGTCGTTGTTTTGTCACTTGCAATTGCCTT CCTAGGTCCATTTTTGTTCAAGTGGCCTTTTCTCAGCACCGCAGCATGTAGAGGATCTGTTTACAGAGCAT TGGGACTTATGGTGGCCGCATCACCATGTGCTCTGGCCGTAGCTCCATTGGCTTATGCTACTGCTATTAGT TCCTGTGCAAGAAAGGGAATATTGCTGAAAGGTGCACAGGTTCTAGATGCTCTTGCGTCTTGCCATACTAT TGCTTTTGACAAAACTGGTACCTTAACAACCGGCGGCCTTACTTGTAAAGCAATTGAACCCATTTATGGGC ACCAAGGAGGAACTAATTCAAGTGTAATAACTTGCTGCATTCCAAATTGTGAAAAAGAAGCTCTTGCAGTT GCGGCTGCCATGGAGAAGGGCACCACGCATCCTATTGGAAGAGCTGTTGTAGATCACAGTGTGGGTAAGGA TCTTCCTTCTATTTTTGTTGAAAGCTTCGAATATTTTCCTGGTAGAGGCCTTACTGCTACTGTCAACGGTG

213 TTAAGACAGTAGCTGAAGAGAGTAGATTACGAAAAGCATCACTTGGTTCTATAGAGTTCATTACCTCACTT TTCAAATCTGAAGATGAATCTAAACAGATCAAGGATGCTGTAAACGCGTCTTCGTACGGAAAGGACTTCGT TCATGCTGCTCTTTCTGTTGATCAAAAGGTAACATTGATTCACCTCGAAGATCAGCCTCGTCCAGGGGTGT CAGGAGTTATAGCAGAACTTAAAAGCTGGGCCAGACTCCGAGTAATGATGTTAACTGGGGACCATGATTCA AGTGCTTGGAGAGTTGCAAACGCAGTGGGTATTACCGAAGTCTACTGCAACCTAAAGCCAGAGGATAAGTT AAATCATGTAAAGAACATTGCTCGGGAAGCAGGTGGAGGTTTAATTATGGTAGGAGAAGGGATTAATGATG CTCCAGCTCTAGCAGCTGCAACAGTGGGGATTGTTCTTGCTCAACGAGCGAGTGCCACTGCGATTGCCGTT GCTGACATCTTACTGCTTCGAGACAACATCACCGGTGTTCCGTTCTGTGTCGCTAAATCCCGCCAGACAAC ATCATTGGTCAAGCAAAACGTAGCTCTTGCATTAACATCGATATTCTTGGCCGCTCTTCCTTCAGTTTTAG GGTTTGTCCCATTGTGGTTGACGGTACTTCTACATGAAGGCGGGACTCTTCTGGTGTGTCTAAACTCAGTA CGTGGTCTAAACGATCCATCATGGTCGTGGAAACAAGACATAGTTCATCTAATCAACAAGTTACGCTCACA AGAACCAACCAGTAGCAGCAGCAACAGTTTAAGCTCTGCACATTAG

>AT4G30110.1 | HMA2 (Heavy metal ATPase 2); Cadmium-transporting ATPase

ATGGCGTCGAAGAAGATGACCAAGAGTTACTTCGATGTTCTTGGAATTTGCTGTACATCGGAGGTTCCGTT GATCGAGAATATTCTCAATTCCATGGACGGTGTTAAAGAATTCTCTGTTATTGTTCCGTCAAGAACCGTCA TCGTCGTCCACGACACTCTCATCCTCTCGCAGTTCCAAATCGTTAAAGCACTGAACCAAGCACAGTTAGAA GCAAATGTGAGGGTAACCGGAGAAACCAATTTCAAGAATAAATGGCCAAGTCCCTTTGCGGTGGTTTCCGG CATACTTCTCCTCCTCTCCTTCTTCAAATACCTTTACTCTCCCTTCCGTTGGCTCGCTGTCGCAGCCGTTG TCGCTGGCATTTATCCGATATTAGCCAAAGCTGTAGCATCTCTTGCAAGGTTTAGAATCGACATCAACATT CTTGTTGTTGTAACCGTGGGAGCAACAATTGGCATGCAAGATTACACAGAAGCTGCAGTAGTTGTCTTCTT GTTCACAATCGCTGAATGGCTCCAATCTAGAGCTAGCTATAAGGCAAGTGCAGTGATGCAGTCTTTGATGA GTTTAGCTCCACAAAAGGCCGTGATTGCAGAGACTGGAGAAGAAGTTGAAGTGGATGAACTCAAGACTAAC ACAGTTATAGCTGTCAAAGCTGGTGAAACCATACCAATTGATGGAGTTGTTGTTGATGGAAACTGTGAAGT TGATGAGAAAACTTTGACCGGTGAAGCATTCCCCGTGCCTAAACTGAAAGATTCGACGGTTTGGGCTGGAA CCATCAATCTAAATGGTTACATAACTGTGAACACCACTGCTCTAGCTGAGGATTGCGTGGTTGCGAAGATG GCAAAACTTGTAGAAGAAGCTCAGAACAGCAAAACCGAAACTCAGAGATTTATAGATAAATGTTCCAAGTA CTATACTCCAGCGATCATCCTGATATCAATCTGTTTTGTGGCTATCCCATTTGCATTGAAGGTTCACAACC TGAAACATTGGGTTCACTTAGCACTAGTTGTGTTAGTAAGTGCTTGTCCTTGTGGACTTATCCTCTCCACA CCAGTAGCCACTTTCTGTGCACTCACAAAGGCAGCCACATCAGGACTTTTGATCAAAGGAGCTGATTACCT TGAAACCTTAGCCAAGATCAAGATTGTTGCTTTTGATAAAACCGGGACTATCACTAGAGGTGAGTTCATCG TCATGGATTTCCAGTCACTTTCTGAAGATATAAGCCTGCAGAGTCTGCTTTACTGGGTATCAAGCACTGAG AGCAAGTCAAGCCATCCAATGGCTGCTGCAGTTGTGGACTATGCAAGATCTGTCTCTGTTGAACCTAAGCC TGAAGCGGTCGAAGACTACCAGAACTTTCCAGGAGAAGGGATTTATGGGAAGATTGATGGTAAAGAAGTCT ATATCGGTAACAAAAGGATTGCTTCTAGAGCTGGGTGTTTATCAGTTCCAGATATTGATGTTGATACTAAA GGAGGAAAGACAATTGGATATGTATATGTTGGAGAAACACTAGCTGGTGTTTTCAATCTCTCAGATGCTTG TAGATCTGGTGTAGCTCAAGCCATGAAAGAACTGAAATCGTTGGGAATTAAGATTGCAATGCTTACCGGAG ATAATCATGCCGCTGCAATGCATGCTCAAGAACAGCTAGGAAATGCTATGGATATTGTTAGAGCAGAGCTT CTTCCAGAAGATAAATCTGAAATCATAAAACAGTTAAAGAGAGAAGAAGGGCCAACGGCTATGGTAGGGGA CGGGTTGAATGATGCACCAGCTTTAGCTACAGCGGATATTGGTATCTCAATGGGTGTTTCCGGCTCTGCAC TTGCAACAGAGACTGGTAATATAATTCTGATGTCCAATGATATCAGAAGAATACCGCAGGCGATAAAGCTT GCAAAAAGAGCGAAAAGGAAAGTGGTGGAGAATGTGGTCATATCAATCACTATGAAAGGAGCTATTCTTGC ATTGGCATTTGCTGGTCATCCGCTGATTTGGGCAGCTGTTCTTGCAGATGTAGGAACTTGTCTGCTTGTGA TCCTTAATAGCATGTTGTTGCTCAGTGATAAACATAAAACTGGAAACAAATGTTATAGGGAGTCTTCTTCT TCCTCGGTTTTAATCGCGGAGAAACTTGAAGGTGATGCTGCTGGGGACATGGAAGCAGGCTTATTACCAAA GATCAGTGATAAGCATTGCAAGCCGGGCTGTTGTGGAACGAAGACTCAAGAGAAAGCGATGAAACCAGCTA AAGCCAGTTCGGACCATAGTCACTCTGGTTGTTGTGAGACGAAACAGAAAGACAATGTAACGGTTGTGAAA AAGAGCTGTTGTGCAGAACCTGTTGATTTAGGACATGGCCATGACTCAGGGTGTTGTGGTGACAAGAGTCA ACAACCACACCAACACGAGGTGCAAGTGCAACAAAGCTGTCATAACAAGCCATCTGGCCTGGACTCAGGGT GTTGTGGTGGCAAGAGTCAACAGCCACACCAACACGAGTTGCAACAAAGCTGTCATGACAAGCCATCTGGC CTGGACATTGGAACTGGCCCAAAACACGAAGGTTCTTCAACATTGGTCAATCTGGAGGGGGATGCGAAAGA GGAACTGAAAGTCTTGGTCAATGGTTTTTGTTCAAGTCCTGCTGATCTCGCGATAACTAGCTTGAAAGTGA

214 AGAGCGATAGCCATTGCAAGTCTAACTGTAGCAGCAGAGAGAGGTGCCACCATGGCAGCAACTGTTGTAGG AGCTACGCTAAAGAGAGTTGCAGTCACGATCACCACCACACAAGGGCTCATGGCGTTGGAACCTTGAAAGA GATTGTGATTGAATAG

>AT4G30120.1 | HMA3 (Heavy metal ATPase 3); ATPase, coupled to transmembrane movement of ions, phosphorylative mechanism

ATGGCGGAAGGTGAAGAGTCAAAGAAGATGAATTTACAGACAAGTTACTTCGACGTCGTTGGAATCTGCTG TTCATCGGAGGTTTCTATCGTAGGTAACGTTCTCCGTCAAGTGGACGGCGTTAAAGAATTCTCAGTCATCG TTCCTTCTAGAACCGTCATCGTTGTCCACGATACTTTTTTGATTTCTCCACTTCAAATCGTCAAGGCTCTG AATCAAGCAAGACTAGAAGCAAGTGTTAGACCATACGGAGAAACAAGCTTGAAGAGTCAATGGCCAAGCCC TTTCGCAATAGTTTCTGGTGTACTGCTAGTTCTCTCCTTCTTCAAATACTTTTATAGTCCGCTTGAATGGC TCGCTATTGTTGCCGTGGTGGCTGGAGTTTTCCCCATTCTTGCTAAAGCTGTTGCTTCAGTCACAAGGTTC AGGCTTGACATCAACGCTCTCACTCTAATTGCTGTGATAGCAACGCTATGTATGCAGGATTTCACAGAAGC TGCTACAATTGTGTTTTTATTCTCAGTTGCAGATTGGCTTGAGTCTAGTGCTGCTCATAAGGCAAGCATAG TAATGTCATCACTGATGAGCTTAGCGCCACGGAAGGCAGTGATCGCGGATACTGGACTAGAAGTTGATGTT GATGAGGTTGGGATCAACACCGTTGTTTCAGTTAAAGCTGGAGAAAGTATACCGATCGATGGAGTTGTGGT GGATGGAAGCTGTGATGTGGATGAGAAAACATTGACAGGAGAATCATTCCCTGTCTCCAAACAGAGAGAGT CAACTGTTATGGCTGCAACCATAAATCTTAATGGTTATATAAAGGTGAAAACTACAGCTCTAGCCCGGGAC TGCGTGGTTGCGAAAATGACTAAGCTTGTAGAAGAAGCTCAAAAAAGCCAAACCAAAACTCAAAGGTTTAT AGACAAATGCTCTCGCTACTACACTCCAGCTGTTGTTGTGTCAGCAGCATGTTTTGCGGTAATCCCGGTAC TGTTAAAGGTTCAAGACCTTAGCCATTGGTTTCACTTAGCACTTGTAGTGTTAGTAAGTGGTTGTCCCTGT GGTCTTATCCTATCCACACCTGTTGCTACCTTTTGTGCTCTCACTAAGGCAGCCACGTCAGGGTTTCTGAT CAAAACTGGTGATTGTCTAGAGACATTAGCAAAGATCAAGATTGTTGCTTTTGACAAAACAGGAACTATTA CAAAGGCTGAGTTCATGGTCTCGGATTTTAGGTCTCTTTCTCCCAGTATCAACCTGCACAAGTTGCTTTAC TGGGTCTCAAGCATTGAGTGCAAGTCAAGCCATCCGATGGCAGCTGCGCTTATTGACTACGCAAGATCAGT TTCTGTTGAGCCTAAGCCTGATATAGTCGAGAACTTTCAGAACTTTCCAGGAGAAGGAGTTTATGGGAGAA TAGATGGTCAAGATATCTACATTGGAAACAAAAGAATAGCACAGAGAGCTGGATGCTTAACAGATAATGTT CCGGATATTGAAGCTACTATGAAGCGAGGTAAGACCATTGGTTACATATACATGGGAGCAAAACTGACCGG AAGTTTCAACCTTTTGGATGGTTGTCGATATGGGGTTGCTCAAGCTCTTAAGGAACTCAAATCTTAG

>AT2G19110.1 | HMA4 (Heavy metal ATPase 4); Cadmium-transporting ATPase

ATGGCGTTACAAAACAAAGAAGAAGAGAAAAAGAAAGTGAAGAAGTTGCAAAAGAGTTACTTCGATGTTCT CGGAATCTGTTGTACATCGGAAGTTCCTATAATCGAGAATATTCTCAAGTCACTTGACGGCGTTAAAGAAT ATTCCGTCATCGTTCCCTCGAGAACCGTGATTGTTGTTCACGACAGTCTCCTCATCTCTCCCTTCCAAATT GCTAAGGCACTAAACGAAGCTAGGTTAGAAGCAAACGTGAGGGTAAACGGAGAAACTAGCTTCAAGAACAA ATGGCCGAGCCCTTTCGCCGTAGTTTCCGGCTTACTTCTCCTCCTATCCTTCCTAAAGTTTGTCTACTCGC CTTTACGTTGGCTCGCCGTGGCAGCAGTTGCCGCCGGTATCTATCCGATTCTTGCCAAAGCCTTTGCTTCC ATTAAAAGGCCTAGGATCGACATCAACATATTGGTCATAATAACCGTGATTGCAACACTTGCAATGCAAGA TTTCATGGAGGCAGCAGCAGTTGTGTTCCTATTCACCATATCCGACTGGCTCGAAACAAGAGCTAGCTACA AGGCGACCTCGGTAATGCAGTCTCTGATGAGCTTAGCTCCACAAAAGGCTATAATAGCAGAGACTGGTGAA GAAGTTGAAGTAGATGAGGTTAAGGTTGATACAGTTGTAGCAGTTAAAGCTGGTGAAACCATACCAATTGA TGGAATTGTGGTGGATGGAAACTGTGAGGTAGACGAGAAAACCTTAACGGGCGAAGCATTTCCTGTGCCTA AACAGAGAGATTCTACCGTTTGGGCTGGAACCATTAATCTAAATGGTTACATATGTGTGAAAACAACTTCT TTAGCGGGTGATTGTGTGGTTGCGAAAATGGCTAAGCTAGTAGAAGAAGCTCAGAGCAGTAAAACCAAATC TCAGAGACTAATAGACAAATGTTCTCAGTACTATACTCCAGCAATCATCTTAGTATCAGCTTGCGTTGCCA TTGTCCCGGTTATAATGAAGGTCCACAACCTTAAACATTGGTTCCACCTAGCATTAGTTGTGTTAGTCAGT GGTTGTCCCTGTGGTCTTATCCTCTCTACACCAGTTGCTACTTTCTGTGCACTTACTAAAGCGGCAACTTC AGGGCTTCTGATCAAAAGTGCTGATTATCTTGACACACTCTCAAAGATCAAGATTGTTGCTTTCGATAAAA CTGGGACTATTACAAGAGGAGAGTTCATTGTCATAGATTTCAAGTCACTCTCTAGAGATATAAACCTACGC

215 AGCTTGCTTTACTGGGTATCAAGTGTTGAAAGCAAATCAAGTCATCCAATGGCAGCAACAATCGTGGATTA TGCAAAATCTGTTTCTGTTGAGCCTAGGCCTGAAGAGGTTGAGGATTACCAGAACTTTCCAGGTGAAGGAA TCTACGGGAAGATTGATGGTAACGATATCTTCATTGGGAACAAAAAGATAGCTTCTCGAGCTGGTTGTTCA ACAGTTCCAGAGATTGAAGTTGATACCAAAGGCGGGAAGACTGTTGGATACGTCTATGTAGGTGAAAGACT AGCTGGATTTTTCAATCTTTCTGATGCTTGTAGATCTGGTGTTTCTCAAGCAATGGCAGAACTGAAATCTC TAGGAATCAAAACCGCAATGCTAACGGGAGATAATCAAGCCGCGGCAATGCATGCTCAAGAACAGCTAGGG AATGTTTTAGATGTTGTACATGGAGATCTTCTTCCAGAAGATAAGTCCAGAATCATACAAGAGTTTAAGAA AGAGGGACCAACCGCAATGGTAGGGGACGGTGTGAATGATGCACCAGCTTTAGCTACAGCTGATATTGGTA TCTCCATGGGAATTTCTGGCTCTGCTCTTGCAACACAAACTGGTAATATTATTCTGATGTCTAATGATATA AGAAGGATACCACAAGCGGTGAAGCTAGCGAGAAGAGCACGACGCAAAGTTGTTGAAAACGTGTGTCTATC GATCATTTTAAAAGCAGGAATACTCGCTTTGGCATTTGCTGGTCATCCTTTGATTTGGGCTGCGGTTCTTG TTGATGTAGGGACTTGTCTGCTTGTGATTTTCAATAGTATGTTGCTGCTGCGAGAGAAGAAAAAGATTGGG AACAAAAAGTGTTACAGGGCTTCTACATCTAAGTTGAATGGTAGGAAACTTGAAGGCGATGATGATTATGT TGTGGACTTAGAAGCAGGCTTGTTAACAAAGAGCGGGAATGGTCAATGCAAATCAAGCTGTTGTGGAGATA AGAAAAATCAAGAGAATGTTGTGATGATGAAACCAAGTAGTAAAACCAGTTCTGATCATTCTCACCCTGGT TGTTGTGGCGATAAGAAGGAAGAAAAAGTGAAGCCGCTTGTGAAAGATGGCTGTTGCAGTGAGAAAACTAG GAAATCAGAGGGAGATATGGTTTCATTGAGCTCATGTAAGAAGTCTAGTCATGTCAAACATGACCTGAAAA TGAAAGGTGGTTCAGGTTGTTGTGCTAGCAAAAATGAGAAAGGGAAGGAAGTAGTGGCAAAGAGCTGTTGT GAGAAACCCAAACAGCAGGTGGAGAGTGTTGGAGACTGCAAGTCTGGTCATTGCGAGAAGAAGAAGCAAGC TGAAGACATTGTTGTCCCGGTGCAGATTATTGGTCATGCATTAACGCATGTGGAGATCGAGTTGCAGACAA AGGAAACCTGCAAAACAAGCTGTTGTGACAGTAAAGAAAAGGTTAAGGAGACAGGTTTGCTGCTTTCTAGT GAGAACACACCTTACCTGGAGAAAGGAGTGCTGATTAAAGATGAAGGAAACTGCAAGTCTGGCAGCGAGAA CATGGGGACAGTGAAACAAAGCTGCCATGAGAAGGGCTGCAGCGATGAAAAACAAACCGGGGAAATAACTC TTGCTTCGGAGGAAGAGACAGATGATCAAGATTGCTCCTCGGGATGTTGTGTGAACGAGGGAACAGTGAAA CAAAGCTTCGATGAGAAGAAGCATTCTGTGTTGGTGGAGAAGGAAGGTTTGGACATGGAAACTGGTTTCTG TTGTGATGCCAAGCTGGTTTGTTGTGGAAACACAGAAGGTGAAGTGAAGGAGCAATGTCGTCTGGAGATAA AGAAAGAAGAACATTGCAAGTCTGGTTGCTGCGGCGAGGAAATACAAACCGGAGAAATCACTCTGGTTTCA GAGGAAGAGACAGAGAGCACGAATTGTTCCACGGGTTGTTGTGTGGACAAAGAAGAAGTGACACAAACCTG TCATGAGAAGCCTGCTAGCTTGGTGGTATCAGGCTTGGAAGTGAAGAAGGATGAGCATTGTGAGAGCTCAC ACAGAGCCGTCAAGGTAGAGACCTGTTGCAAAGTGAAGATTCCAGAGGCTTGCGCATCAAAATGTAGGGAC AGAGCGAAGCGTCACAGTGGTAAAAGCTGTTGCAGGAGTTATGCAAAAGAGTTATGCAGCCACCGCCATCA TCATCACCACCACCACCACCATCACCATGTGAGTGCTTGA

>AT1G63440.1 | HMA5 (HEAVY METAL ATPASE 5); ATPase, coupled to transmembrane movement of ions, phosphorylative mechanism

ATGGCGACGAAGCTTTTGTCGCTTACATGCATAAGGAAAGAGAGATTCAGCGAGCGTTACCCTCTGGTGCG GAAGCACCTTACCAGGTCTCGCGACGGCGGCGGAGGATCATCGTCGGAGACGGCGGCTTTTGAGATCGATG ATCCGATTTCCAGGGCGGTTTTTCAAGTGTTAGGAATGACTTGCTCTGCTTGCGCTGGATCTGTTGAAAAG GCTATCAAGCGTCTCCCTGGAATTCACGATGCTGTCATCGATGCTCTGAATAATCGTGCTCAGATTCTGTT TTACCCTAACTCTGTCGATGTGGAGACAATTCGTGAGACTATTGAGGATGCTGGATTTGAAGCATCGTTGA TCGAAAATGAAGCGAATGAGAGGTCGAGACAAGTTTGTAGAATAAGAATTAATGGTATGACTTGTACTTCG TGTTCTTCAACAATCGAAAGAGTTTTGCAATCGGTTAATGGTGTACAAAGAGCACATGTGGCTTTAGCAAT TGAAGAAGCTGAGATTCATTATGATCCTAGACTCTCGAGCTATGATAGACTATTGGAGGAGATAGAAAATG CTGGTTTTGAAGCTGTGCTTATAAGTACAGGCGAGGATGTGAGCAAGATTGATTTGAAGATTGATGGTGAG CTTACTGATGAATCCATGAAGGTTATTGAAAGGTCGCTTGAAGCACTTCCTGGTGTTCAAAGTGTTGAGAT CAGCCATGGAACTGATAAGATATCTGTATTGTACAAACCTGATGTGACGGGGCCGAGGAATTTCATTCAGG TGATTGAGTCTACTGTCTTTGGTCATAGTGGTCACATCAAGGCAACAATATTCTCAGAGGGAGGGGTAGGC AGAGAATCTCAAAAGCAAGGGGAGATTAAGCAGTATTATAAGTCGTTTCTCTGGAGTTTGGTTTTTACAGT ACCGGTGTTTTTGACAGCCATGGTCTTTATGTATATCCCTGGAATTAAAGATTTGCTCATGTTTAAGGTCA TCAATATGCTCACCGTTGGAGAAATCATAAGGTGTGTTTTGGCTACTCCTGTCCAGTTTGTCATTGGTTGG AGATTTTATACTGGCTCTTACAAGGCTTTACGCCGAGGATCGGCTAATATGGATGTTCTGATTGCTCTGGG GACAAATGCAGCTTATTTCTATTCATTATACACAGTGTTGAGAGCTGCAACATCTCCTGATTTCAAGGGAG

216 TAGATTTCTTTGAGACTAGTGCCATGCTCATTTCCTTTATCATACTAGGAAAATATTTGGAGGTTATGGCT AAAGGAAAAACATCTCAAGCGATCGCAAAGCTTATGAACTTGGCACCCGACACTGCGATATTGTTGAGTTT GGACAAGGAAGGGAATGTGACTGGTGAAGAGGAGATTGATGGTCGATTGATACAGAAGAATGACGTGATCA AAATTGTTCCTGGTGCTAAAGTAGCTTCAGATGGTTATGTCATATGGGGACAAAGTCATGTGAATGAAAGT ATGATCACTGGAGAGGCTAGGCCAGTGGCAAAGAGAAAGGGTGATACAGTTATAGGAGGCACTTTGAACGA GAATGGTGTTCTGCATGTTAAGGTTACAAGGGTTGGTTCAGAGAGTGCTCTTGCTCAGATTGTTCGACTTG TTGAGTCCGCACAACTGGCCAAAGCTCCAGTACAGAAGTTGGCTGATCGGATTTCCAAGTTCTTTGTTCCT CTTGTAATTTTCCTCTCGTTCTCAACTTGGCTTGCCTGGTTCTTAGCTGGGAAACTGCATTGGTACCCTGA ATCCTGGATACCTTCTTCGATGGATAGCTTTGAGCTAGCTCTTCAGTTCGGGATCTCTGTCATGGTCATAG CTTGTCCATGTGCTCTTGGGCTGGCTACTCCAACTGCTGTTATGGTTGGTACTGGGGTTGGTGCATCCCAA GGTGTGCTGATAAAGGGTGGCCAAGCCCTAGAAAGAGCACACAAGGTAAATTGCATTGTCTTTGACAAGAC AGGAACTCTCACGATGGGGAAGCCCGTTGTTGTGAAAACAAAGCTCCTGAAAAACATGGTACTTCGAGAGT TCTATGAACTTGTTGCTGCAACTGAGGTAAACAGTGAGCATCCGTTAGCAAAGGCCATTGTCGAGTACGCG AAGAAATTCAGAGACGACGAAGAGAACCCTGCCTGGCCAGAAGCCTGTGATTTTGTGTCCATCACTGGAAA GGGAGTGAAAGCCACCGTTAAAGGTAGAGAGATTATGGTGGGGAACAAGAATCTGATGAATGATCATAAAG TTATTATTCCAGATGATGCTGAAGAGTTGCTAGCTGACTCTGAAGATATGGCCCAGACCGGAATTCTTGTC TCCATAAACAGTGAACTGATTGGAGTTTTGTCTGTTTCGGATCCTCTAAAACCGAGTGCTCGAGAAGCCAT CTCCATTCTAAAATCCATGAATATCAAAAGCATCATGGTAACTGGTGACAACTGGGGAACAGCAAACTCAA TTGCTAGAGAAGTCGGTATCGACTCTGTTATCGCAGAAGCTAAACCTGAGCAAAAAGCAGAGAAAGTCAAG GAATTACAGGCTGCGGGACATGTTGTGGCAATGGTAGGTGACGGAATCAATGACTCACCGGCTCTCGTGGC AGCGGATGTAGGTATGGCGATAGGTGCAGGAACAGACATTGCTATAGAAGCAGCGGATATAGTTCTGATGA AAAGCAACTTAGAAGATGTGATCACAGCCATTGATCTTTCAAGGAAAACGTTCTCAAGAATCCGTCTCAAC TACGTATGGGCTCTCGGGTATAACCTCATGGGGATACCGATCGCTGCGGGAGTGCTTTTCCCAGGGACACG TTTCAGGTTGCCTCCATGGATTGCAGGTGCTGCAATGGCTGCTTCTTCTGTTAGTGTTGTGTGTTGCTCTC TCTTGCTTAAGAACTACAAGCGACCTAAGAAGCTTGATCATCTGGAGATTCGGGAGATTCAGGTGGAGCGA GTTTAA

>AT4G33520.2 | HMA6, PAA1; Metal-transporting P-type ATPase 1

ATGGAGTCTACACTCTCAGCTTTCTCAACCGTCAAGGCAACAGCGATGGCTCGAAGTAGTGGTGGTCCTTC ATTACCTCTCCTCACTATTTCTAAAGCACTTAACCGCCACTTTACCGGCGCCAGACACCTCCATCCTTTAT TGCTCGCTCGCTGTTCTCCCTCCGTGCGACGTCTTGGTGGGTTTCATGGCAGTCGATTTACTTCGTCGAAT TCAGCTTTGAGAAGCTTGGGAGCTGCTGTTTTGCCGGTAATCCGACACCGTTTAGAGTGTTTGTCGAGCTC ATCGCCGTCTTTCAGGAGTATCTCAAGTGGCGGCGGTTCTGGATTTGGGGGGTATAATGGTGGAAGCGGCG GTGGTGGTGGCGGAGGATCGGAGAGTGGTGATTCCAAGTCAAAACTGGGTGCTAATGCAAGTGATGGTGTC TCTGTTCCGTCGTCAGATATCATTATTCTCGATGTTGGAGGGATGACATGTGGAGGGTGTTCAGCAAGTGT GAAGAAGATACTGGAAAGCCAACCTCAAGTTGCTTCAGCTAGTGTTAATCTCACCACCGAGACGGCAATTG TGTGGCCTGTACCGGAAGCTAAAAGTGTACCTGATTGGCAAAAGAGTTTAGGCGAGACGCTTGCAAACCAT CTGACCAATTGCGGGTTTCAGTCTACTCCTCGAGATTTGGTGACAGAGAATTTCTTCAAAGTTTTCGAAAC AAAGACAAAAGACAAGCAGGCTCGCTTAAAAGAGAGTGGCCGCGAGCTTGCTGTTTCATGGGCCCTATGTG CTGTATGCTTGGTTGGCCATTTAACTCACTTTCTAGGGGTTAATGCTCCCTGGATTCATGCGATCCATTCA ACGGGGTTTCATGTGTCTTTATGTTTGATTACGTTGCTTGGTCCTGGACGCAAGCTAGTACTTGATGGTAT CAAGAGTCTTTTAAAAGGTTCCCCAAACATGAACACGCTAGTTGGTCTTGGTGCTTTGTCATCATTTTCTG TTAGCTCACTGGCAGCCATGATTCCAAAATTGGGCTGGAAGACATTTTTTGAGGAACCAGTTATGTTAATA GCTTTTGTATTGCTGGGAAGGAATCTTGAACAGAGAGCTAAAATAAAAGCCACCAGTGATATGACTGGCCT TCTGAGTGTTTTGCCTTCAAAAGCTCGCCTTCTGCTTGATGGTGATCTGCAGAATTCAACTGTTGAGGTTC CTTGTAATAGTCTCTCTGTTGGTGATCTAGTCGTCATACTACCAGGAGATCGAGTTCCAGCAGATGGTGTG GTCAAGTCAGGTAGAAGCACCATTGATGAGTCAAGTTTCACAGGTGAACCATTGCCAGTGACAAAAGAGTC AGGGAGTCAAGTCGCAGCAGGATCTATAAACCTGAACGGAACTCTTACTGTCGAAGTGCATAGATCAGGGG GTGAAACTGCAGTTGGGGACATTATTCGTCTGGTTGAAGAGGCTCAGAGTAGAGAAGCACCAGTACAGCAG TTGGTTGACAAGGTCGCAGGGCGCTTCACGTATGGAGTGATGGCACTCTCTGCAGCTACATTTACATTCTG GAATCTATTTGGTGCACATGTTCTTCCTTCTGCCTTGCATAATGGGAGCCCAATGTCTTTGGCCCTTCAAC TATCTTGCAGCGTTCTGGTTGTGGCTTGTCCTTGCGCCCTTGGGTTGGCTACTCCAACGGCAATGCTGGTT

217 GGTACATCATTAGGAGCAAGAAGAGGATTACTTCTTCGTGGTGGTGATATTCTTGAAAAGTTTTCCTTGGT TGATACTGTTGTCTTCGACAAAACTGGGACACTGACGAAGGGACACCCTGTTGTGACTGAAGTTATTATTC CTGAAAATCCAAGACACAATTTGAATGATACTTGGTCAGAAGTAGAGGTTTTGATGTTAGCAGCTGCTGTT GAGTCAAACACTACTCACCCTGTTGGGAAAGCGATTGTAAAAGCAGCCCGGGCTCGTAATTGTCAAACAAT GAAGGCAGAGGATGGAACATTTACTGAAGAACCAGGGTCTGGTGCTGTTGCTATTGTCAACAACAAAAGGG TTACAGTTGGGACTCTAGAATGGGTCAAGAGACATGGAGCCACCGGGAACTCATTGCTTGCATTAGAAGAG CATGAAATTAACAACCAGTCTGTTGTGTATATTGGGGTGGATAATACACTTGCTGCAGTCATCCGTTTTGA AGATAAAGTCAGGGAGGATGCTGCTCAAGTTGTTGAGAATTTGACTCGCCAGGGAATTGATGTGTACATGC TGTCTGGTGACAAGAGGAATGCTGCCAATTATGTAGCTTCTGTCGTAGGAATTAATCATGAACGGGTTATA GCAGGAGTTAAACCAGCTGAGAAAAAGAATTTTATCAATGAACTTCAAAAAAATAAGAAGATTGTTGCAAT GGTGGGGGATGGAATAAACGATGCTGCTGCTCTGGCATCATCTAATGTTGGAGTGGCCATGGGTGGTGGTG CAGGAGCGGCCAGTGAGGTGTCTCCTGTTGTCTTGATGGGCAATCGGTTAACACAGTTGCTTGATGCGATG GAGCTAAGTCGGCAAACCATGAAGACCGTGAAGCAAAACCTTTGGTGGGCGTTCGGATACAACATTGTAGG AATCCCAATTGCGGCTGGAGTGTTGCTACCATTGACTGGAACCATGTTGACTCCATCAATGGCGGGGGCTC TCATGGGTGTAAGCTCTCTCGGTGTCATGACTAATTCCTTACTTCTAAGGTACAGATTCTTCTCGAACCGA AACGACAAAAATGTCAAACCGGAACCAAAAGAGGGAACAAAACAGCCACATGAGAACACAAGATGGAAGCA AAGCTCTTAG

>AT5G44790.1 | HMA7, RAN1 (Responsive-to-Antagonist1); ATPase, coupled to transmembrane movement of ions, phosphorylative mechanism

ATGGCGCCGAGTAGACGGGATTTACAGCTTACTCCGGTCACCGGAGGTTCTTCTTCCCAGATCAGTGATAT GGAAGAAGTTGGTCTTCTCGATTCGTATCACAACGAAGCGAACGCCGATGATATTTTGACTAAAATCGAAG AAGGAAGAGATGTTTCCGGTTTAAGGAAGATTCAGGTCGGAGTCACCGGTATGACTTGTGCTGCTTGTTCT AATTCCGTTGAAGCAGCTTTGATGAACGTTAATGGCGTCTTCAAAGCCTCTGTTGCTTTGTTACAGAATCG AGCCGATGTCGTCTTCGACCCTAATTTAGTCAAGGAAGAGGATATTAAAGAAGCTATAGAGGATGCTGGAT TTGAGGCTGAGATATTGGCTGAAGAACAGACACAAGCTACTTTGGTGGGTCAATTTACAATTGGAGGTATG ACTTGTGCAGCTTGTGTTAACTCTGTTGAAGGTATTCTAAGAGATCTTCCTGGAGTGAAAAGAGCAGTGGT AGCTTTGTCTACATCATTAGGAGAAGTTGAGTATGATCCAAATGTTATTAACAAGGATGATATTGTTAATG CGATTGAAGATGCTGGGTTTGAAGGTTCGCTTGTGCAGAGTAATCAGCAGGATAAGCTTGTTTTAAGGGTG GATGGGATTTTGAATGAGCTTGATGCTCAGGTTTTAGAAGGGATACTTACTAGATTGAATGGGGTTAGACA GTTTCGTCTTGACAGGATATCTGGAGAGCTTGAGGTTGTGTTTGATCCGGAAGTTGTTAGTTCGAGATCTT TGGTTGATGGGATTGAAGAAGATGGATTTGGGAAGTTTAAGTTGCGTGTTATGAGCCCTTATGAAAGGTTG TCTTCGAAAGATACTGGAGAGGCCTCAAATATGTTTCGGCGTTTTATCTCCAGTCTTGTTCTTAGTATTCC TCTCTTCTTCATCCAAGTAATCTGCCCCCACATTGCTCTTTTTGATGCCTTACTGGTATGGAGATGTGGAC CCTTCATGATGGGTGATTGGTTGAAATGGGCCTTGGTAAGTGTTATTCAGTTTGTTATCGGCAAGCGATTC TATGTTGCAGCATGGAGAGCTCTACGAAATGGTTCAACTAACATGGATGTGCTCGTCGCTCTGGGCACGTC TGCTTCTTACTTCTACTCTGTTGGGGCTCTTTTATATGGAGCAGTCACTGGATTTTGGTCTCCAACATACT TTGATGCAAGCGCCATGCTGATAACTTTTGTCTTACTGGGAAAATATTTAGAGTCTCTTGCAAAGGGGAAA ACCTCGGATGCTATGAAGAAACTAGTACAACTTACGCCAGCAACAGCGATTTTACTAACTGAAGGCAAAGG AGGAAAGCTTGTAGGGGAAAGGGAGATAGATGCATTATTGATTCAGCCTGGTGATACATTAAAAGTTCATC CCGGTGCAAAGATCCCTGCAGATGGTGTTGTGGTGTGGGGTTCAAGTTACGTTAATGAAAGTATGGTGACT GGTGAATCAGTTCCTGTATCAAAAGAGGTTGATTCACCAGTAATCGGGGGTACTATTAATATGCATGGTGC TCTTCACATGAAAGCTACAAAGGTGGGATCTGATGCAGTTCTGAGCCAGATTATTAGTTTGGTCGAGACCG CCCAAATGTCAAAAGCTCCTATTCAGAAATTTGCGGATTATGTTGCAAGTATTTTTGTTCCGGTAGTGATT ACTTTGGCATTGTTCACCTTAGTAGGCTGGTCAATTGGTGGAGCTGTTGGAGCTTACCCAGACGAATGGCT TCCGGAGAACGGGACTCACTTTGTTTTCTCGCTTATGTTCTCAATATCCGTTGTGGTAATTGCTTGTCCTT GTGCACTTGGCTTGGCAACACCGACTGCTGTTATGGTGGCAACAGGAGTTGGAGCAACCAATGGTGTATTG ATCAAAGGAGGTGATGCATTAGAGAAAGCTCACAAAGTGAAGTATGTAATCTTTGACAAAACAGGCACCTT AACTCAAGGAAAAGCTACTGTGACAACCACAAAGGTCTTCTCAGAGATGGACCGTGGAGAATTCCTTACAC TTGTTGCGTCTGCTGAGGCTAGCAGTGAACACCCATTGGCAAAAGCTATAGTTGCGTACGCTCGGCATTTC CACTTCTTTGATGAATCCACTGAAGATGGCGAAACAAATAACAAAGATTTGCAAAACTCTGGATGGTTACT TGATACCTCGGATTTCTCTGCTCTTCCCGGAAAAGGGATTCAGTGCTTAGTCAATGAGAAGATGATTCTGG

218 TGGGAAACCGTAAACTTATGTCAGAAAATGCCATAAACATCCCAGACCATGTCGAGAAATTTGTGGAAGAT CTTGAAGAAAGCGGAAAGACGGGAGTCATTGTAGCCTATAATGGGAAATTAGTTGGTGTGATGGGAATCGC GGATCCTCTAAAGCGAGAAGCCGCTTTGGTTGTGGAAGGTCTCCTTAGAATGGGTGTTCGACCCATAATGG TTACAGGTGATAACTGGAGAACAGCAAGAGCAGTCGCCAAAGAGGTTGGTATCGAAGATGTGAGAGCAGAA GTAATGCCAGCTGGTAAAGCTGACGTTATACGTTCGCTGCAAAAGGACGGAAGCACAGTAGCCATGGTAGG AGACGGAATCAATGACTCTCCCGCTCTAGCCGCTGCAGATGTGGGAATGGCAATTGGTGCAGGAACAGATG TAGCAATAGAAGCAGCAGATTACGTTCTGATGAGAAACAACTTAGAAGACGTTATAACAGCCATCGATCTC TCTCGAAAAACCTTAACACGAATCCGATTAAATTACGTCTTCGCAATGGCGTACAATGTAGTGTCAATCCC GATAGCAGCAGGAGTGTTTTTCCCGGTTCTGCGGGTGCAATTACCGCCTTGGGCAGCCGGAGCATGTATGG CACTCTCTTCGGTTAGTGTTGTTTGCTCTTCTTTGCTTCTTAGAAGATACAAGAAGCCAAGACTTACCACT GTTTTGAAAATCACCACGGAGTAA

>AT5G21930.1 | HMA8/PAA2 (P-TYPE ATPASE OF ARABIDOPSIS 2); ATPase, coupled to transmembrane movement of ions, phosphorylative mechanism / copper ion transmembrane transporter

ATGGCGAGCAATCTTCTCCGATTTCCTCTTCCTCCGCCATCGAGTCTTCATATTCGTCCAAGTAAGTTCCT CGTGAACCGATGCTTTCCAAGACTGCGCCGTTCTCGTATCCGGCGGCACTGTTCGCGACCTTTTTTTTTAG TATCAAACTCGGTCGAGATAAGTACTCAATCATTTGAATCTACTGAATCTTCAATCGAATCTGTGAAATCC ATTACGAGCGACACGCCTATTCTCCTCGATGTGAGTGGGATGATGTGTGGTGGCTGCGTCGCCCGAGTTAA ATCGGTTTTGATGTCTGATGATCGAGTCGCGTCTGCTGTGGTTAACATGCTGACTGAAACCGCGGCTGTGA AGTTTAAACCGGAGGTTGAGGTGACGGCGGATACGGCAGAGAGTTTAGCTAAGAGATTGACAGAGAGTGGT TTTGAGGCTAAGAGGAGAGTCTCGGGTATGGGAGTGGCGGAGAATGTGAAGAAGTGGAAGGAGATGGTAAG TAAGAAAGAGGACTTGCTTGTTAAGAGCAGGAACCGTGTGGCGTTTGCGTGGACATTGGTGGCTCTGTGTT GCGGGTCTCACACTTCGCATATTCTTCATTCCTTGGGAATTCACATTGCTCATGGAGGGATTTGGGATTTG CTACACAACTCTTACGTGAAAGGTGGTTTGGCCGTTGGAGCTTTGTTGGGACCAGGAAGAGAATTGCTGTT TGATGGCATAAAAGCTTTCGGGAAAAGATCACCTAATATGAACTCGTTAGTTGGATTGGGATCCATGGCTG CATTTTCCATCAGTTTGATCTCACTAGTTAATCCAGAACTGGAGTGGGATGCTTCCTTCTTCGATGAACCG GTCATGCTGCTTGGCTTTGTGCTCCTTGGTCGTTCTTTGGAAGAAAGAGCAAAGCTTCAAGCATCTACCGA TATGAATGAACTATTGTCACTCATCTCCACTCAATCAAGACTTGTGATTACTTCATCAGACAATAACACCC CAGTGGATTCTGTACTTTCCTCTGATTCAATTTGCATCAATGTGTCAGTTGATGATATCCGAGTTGGAGAC TCGCTATTGGTTTTGCCTGGTGAGACATTTCCTGTCGATGGGAGTGTCCTAGCTGGAAGAAGTGTTGTAGA TGAATCTATGCTGACAGGAGAGTCACTTCCGGTGTTTAAAGAAGAAGGCTGCTCAGTTTCAGCGGGAACAA TTAACTGGGATGGCCCTCTACGGATCAAAGCTTCTTCTACTGGTTCCAACTCAACGATATCTAAAATTGTC AGAATGGTTGAGGATGCACAAGGTAATGCAGCTCCTGTACAGAGGCTGGCAGATGCAATAGCTGGACCCTT CGTCTATACTATAATGTCTTTATCTGCAATGACGTTTGCCTTCTGGTATTATGTCGGTTCACACATATTCC CAGATGTTTTACTCAATGATATTGCTGGACCTGATGGAGATGCTTTGGCCTTAAGCTTAAAACTGGCTGTG GATGTCTTGGTAGTTTCCTGCCCCTGTGCACTGGGCCTTGCAACACCAACTGCCATATTAATTGGCACCTC TCTTGGAGCAAAGCGGGGATATCTTATCAGAGGAGGAGATGTTTTAGAACGTCTGGCATCCATAGATTGTG TGGCTTTAGACAAGACAGGTACTCTTACTGAAGGAAGACCTGTCGTCTCTGGTGTTGCTTCTCTAGGATAC GAAGAACAAGAAGTTCTTAAAATGGCTGCTGCGGTGGAGAAGACAGCAACACACCCAATAGCAAAGGCTAT TGTAAACGAGGCTGAATCTCTGAATTTGAAAACTCCAGAAACAAGAGGCCAATTGACAGAACCAGGCTTTG GTACTTTGGCAGAAATAGACGGACGTTTTGTTGCAGTGGGTTCACTTGAGTGGGTTTCTGATCGTTTCCTC AAAAAGAACGACTCCTCGGACATGGTGAAGCTGGAAAGTTTATTGGATCATAAATTATCAAACACATCATC AACGTCTAGATACTCAAAGACTGTTGTTTACGTTGGCCGTGAAGGAGAAGGGATCATTGGTGCTATTGCAA TATCCGATTGCTTACGCCAAGATGCTGAATTTACTGTAGCCAGGCTGCAGGAGAAGGGCATCAAAACAGTC CTCTTATCAGGGGACAGGGAAGGGGCAGTGGCAACTGTGGCAAAGAATGTGGGGATTAAGAGTGAATCAAC CAATTATTCCTTGTCTCCTGAAAAGAAATTTGAGTTTATATCTAATCTTCAATCCTCTGGACATCGTGTTG CTATGGTGGGTGATGGCATAAACGATGCTCCCTCGTTGGCTCAAGCTGATGTTGGAATAGCTCTAAAGATC GAGGCACAAGAAAACGCTGCATCAAATGCAGCATCAGTCATACTTGTCCGCAACAAACTTTCACATGTTGT GGATGCACTGAGTCTTGCACAAGCAACTATGTCGAAAGTATATCAGAATCTAGCGTGGGCAATTGCGTATA ATGTCATCTCAATCCCTATAGCTGCAGGTGTCTTGCTTCCTCAGTATGATTTCGCCATGACACCTTCGTTA

219 TCTGGTGGACTAATGGCATTGAGCTCGATCTTTGTCGTTTCAAATTCATTACTTCTGCAGCTCCATAAGTC TGAAACAAGTAAAAATAGCTTGTGA

>AT2G43750.1 | ACS1, CPACS1, ATCS-B, OASB (O-ACETYLSERINE (THIOL) LYASE B); cysteine synthase

ATGGCGGCGACATCTTCCTCTGCTTTTCTCCTTAATCCGTTGACTTCTCGCCACCGTCCTTTCAAATACTC GCCAGAGCTCTCTTCTCTCTCCTTATCCTCTCGAAAGGCTGCTGCTTTCGATGTTTCCTCAGCTGCTTTCA CGCTCAAGAGACAGAGCCGGAGTGATGTTGTGTGCAAGGCTGTATCTATCAAGCCAGAAGCTGGTGTTGAA GGGCTCAATATCGCCGATAACGCCGCTCAGCTTATTGGGAAAACTCCGATGGTGTACTTGAACAATGTAGT CAAGGGCTGTGTTGCAAGTGTTGCTGCTAAGCTTGAGATCATGGAACCATGTTGCAGTGTCAAGGATAGGA TTGGGTACAGTATGATTACTGATGCTGAAGAGAAAGGACTTATAACACCTGGAAAGAGTGTTCTTGTGGAA TCTACGAGTGGGAACACAGGGATTGGCCTTGCATTCATTGCTGCTTCAAAAGGCTATAAACTTATCTTGAC GATGCCTGCGTCCATGAGTTTGGAAAGGCGGGTTCTTTTGAGGGCATTTGGAGCTGAGCTTGTGTTAACTG AACCTGCAAAAGGTATGACTGGAGCAATTCAGAAGGCTGAAGAAATCTTGAAAAAAACTCCCAATTCCTAC ATGCTCCAACAGTTTGACAACCCTGCCAATCCCAAGATTCATTATGAGACGACTGGTCCTGAGATTTGGGA AGATACAAGAGGCAAAATCGATATATTGGTTGCGGGGATTGGAACTGGTGGAACTATCACTGGTGTTGGTC GATTCATTAAAGAAAGAAAACCTGAATTGAAGGTTATTGGTGTCGAACCCACGGAAAGTGCTATACTTTCT GGTGGAAAACCCGGACCTCACAAGATTCAAGGAATTGGAGCTGGATTTGTACCCAAGAATTTGGATCTGGC TATTGTAGATGAATACATAGCGATTTCCAGTGAGGAAGCTATTGAAACCTCGAAGCAACTAGCTCTCCAGG AAGGCTTGTTGGTTGGTATATCTTCTGGAGCTGCTGCTGCTGCTGCAATCCAGGTTGCAAAGAGACCTGAA AATGCCGGGAAACTCATAGCCGTTGTGTTCCCGAGCTTCGGGGAACGTTACCTCTCGACCCAGCTTTTCCA GTCGATTCGAGAAGAGTGCGAGCAAATGCAGCCCGAGCTTTGA

>AT2G28190.1 | CZSOD2, CSD2 (COPPER/ZINC SUPEROXIDE DISMUTASE 2); copper, zinc superoxide dismutase

ATGGCTGCCACCAACACAATCCTCGCATTCTCATCTCCTTCTCGTCTTCTCATTCCTCCTTCCTCCAATCC TTCAACTCTCCGTTCCTCTTTCCGCGGCGTCTCTCTCAACAACAACAATCTCCACCGTCTCCAATCTGTTT CCTTCGCCGTTAAAGCTCCGTCGAAAGCGTTGACAGTTGTTTCCGCGGCGAAGAAGGCTGTTGCAGTGCTT AAAGGTACTTCTGATGTCGAAGGAGTTGTTACTTTGACCCAAGATGACTCAGGTCCTACAACTGTGAATGT TCGTATCACTGGTCTCACTCCAGGGCCTCATGGATTTCATCTCCATGAGTTTGGTGATACAACTAATGGAT GTATCTCAACAGGACCACATTTCAACCCTAACAACATGACACACGGAGCTCCAGAAGATGAGTGCCGTCAT GCGGGTGACCTGGGAAACATAAATGCCAATGCCGATGGCGTGGCAGAAACAACAATAGTGGACAATCAGAT TCCTCTGACTGGTCCTAATTCTGTTGTTGGAAGAGCCTTTGTGGTTCACGAGCTTAAGGATGACCTCGGAA AGGGTGGCCATGAGCTTAGTCTGACCACTGGAAACGCAGGCGGGAGATTGGCATGTGGTGTGATTGGCTTG ACGCCGCTCTAA

>AT1G12520.1 | ATCCS, CCS1 (copper chaperone for superoxide dismutase 1); superoxide dismutase copper chaperone

ATGGCATCAATTCTCAGGTCAGTGGCAACGACTTCAGCCGTCGTAGCCGCCGCCTCTGCGATTCCCATCGC GATCGCCTTCTCTTCTTCTTCTTCTTCTTCCTCTACCAACCCCAAATCTCAATCTCTCAATTTCTCCTTCC TCTCCCGATCGTCTCCACGTCTCTTGGGACTTTCACGAAGCTTCGTTAGCTCTCCGATGGCGACTGCTCTC ACTTCTGATCGTAATCTCCATCAGGAGGATCGAGCCATGCCTCAGCTTCTTACTGAGTTTATGGTGGATAT GACATGTGAGGGTTGTGTTAATGCTGTGAAGAACAAATTAGAAACCATTGAAGGTATTGAGAAAGTGGAGG TGGATTTGTCCAATCAAGTTGTGAGGATACTTGGATCTTCCCCTGTTAAGGCTATGACTCAAGCTTTGGAG CAAACTGGTCGAAAAGCTCGCTTAATTGGACAAGGTGTTCCTCAAGATTTCTTAGTCTCAGCTGCAGTAGC AGAATTCAAAGGCCCTGACATTTTCGGGGTAGTCCGATTTGCTCAGGTCAGTATGGAACTTGCAAGAATCG

220 AAGCCAACTTTACCGGTTTGTCACCCGGAACCCACAGCTGGTGTATCAACGAGTATGGAGATCTCACAAAC GGAGCAGCCAGCACAGGGAGCTTGTACAATCCATTTCAAGATCAAACTGGCACAGAGCCATTGGGAGACCT GGGAACACTAGAGGCAGACAAAAACGGAGAGGCCTTTTATTCGGGAAAGAAAGAGAAACTCAAGGTTGCAG ACCTTATCGGGAGAGCCGTCGTGGTGTACAAGACCGATGATAATAAGTCAGGCCCCGGATTGACCGCCGCA GTGATTGCTAGAAGCGCAGGGGTTGGAGAAAACTACAAAAAACTTTGTTCTTGTGATGGTACTGTCATATG GGAAGCTACAAACAGCGATTTCGTGGCCAGTAAGGTTTAA

>AT4G03560.1 | TPC1, ATCCH1, FOU2, ATTPC1 (TWO-PORE CHANNEL 1); calcium channel/ voltage-gated calcium channel

ATGGAAGACCCGTTGATTGGTAGAGATAGTCTTGGTGGTGGTGGTACGGATCGGGTTCGTCGATCAGAAGC TATCACGCATGGTACTCCGTTTCAGAAAGCAGCTGCACTCGTTGATCTGGCTGAAGATGGTATTGGTCTTC CTGTGGAAATACTTGATCAGTCGAGTTTCGGGGAGTCTGCTAGGTATTACTTCATCTTCACACGTTTGGAT CTGATCTGGTCACTCAACTATTTCGCTCTGCTTTTCCTTAACTTCTTCGAGCAACCATTGTGGTGTGAAAA AAACCCTAAACCGTCTTGCAAAGATAGAGATTACTATTACCTGGGAGAGTTACCGTACTTGACCAATGCAG AATCCATTATCTATGAGGTGATTACCCTGGCTATACTCCTTGTACATACTTTCTTCCCGATATCCTATGAA GGTTCCCGAATCTTTTGGACTAGTCGCCTGAATCTAGTGAAGGTTGCTTGCGTGGTAATTTTGTTTGTTGA TGTGCTGGTTGACTTTCTGTATCTGTCTCCACTGGCTTTCGACTTTCTCCCTTTTAGAATCGCCCCATACG TGAGAGTTATCATATTCATCCTCAGCATAAGGGAACTTCGGGACACCCTTGTCCTTCTGTCTGGAATGCTT GGCACATACTTGAATATCTTGGCTCTATGGATGCTGTTCCTTCTATTTGCCAGTTGGATTGCTTTTGTTAT GTTTGAGGACACGCAGCAGGGCCTCACGGTCTTCACTTCATATGGTGCAACTCTTTACCAGATGTTTATTT TGTTCACAACATCCAACAATCCTGATGTCTGGATTCCTGCATACAAGTCTTCTCGCTGGTCTTCGGTGTTC TTCGTGCTCTACGTGCTAATTGGCGTCTACTTTGTCACAAACTTGATTCTTGCCGTTGTTTATGACAGTTT CAAAGAACAGCTCGCAAAGCAAGTATCTGGAATGGATCAAATGAAGAGAAGAATGTTGGAGAAAGCCTTTG GTCTTATAGACTCAGACAAAAACGGGGAGATTGATAAGAACCAATGCATTAAGCTCTTTGAACAGTTGACT AATTACAGGACGTTGCCGAAGATATCAAAAGAAGAATTCGGATTGATATTTGATGAGCTTGACGATACTCG TGACTTTAAGATAAACAAGGATGAGTTTGCTGACCTCTGCCAGGCCATTGCTTTAAGATTCCAAAAGGAGG AAGTACCGTCTCTCTTTGAACATTTTCCGCAAATTTACCATTCCGCCTTATCACAACAACTGAGAGCCTTT GTTCGAAGCCCCAACTTTGGCTACGCTATTTCTTTCATCCTCATTATCAATTTCATTGCTGTCGTTGTTGA AACAACGCTTGATATCGAAGAAAGCTCGGCTCAGAAGCCATGGCAGGTTGCCGAGTTTGTCTTTGGTTGGA TATATGTGTTGGAGATGGCTCTGAAGATCTATACATATGGATTTGAGAATTATTGGAGAGAGGGTGCTAAC CGATTTGATTTTCTAGTCACATGGGTCATAGTTATTGGGGAAACAGCTACCTTCATAACTCCAGACGAGAA TACTTTCTTCTCAAATGGAGAATGGATCCGGTACCTTCTCCTGGCGAGAATGTTAAGACTGATAAGGCTTC TTATGAACGTCCAGCGATACCGAGCATTTATTGCGACGTTCATAACTCTTATTCCAAGTTTGATGCCATAT TTAGGGACCATTTTCTGCGTGCTGTGTATCTACTGCTCTATTGGCGTACAGGTCTTTGGAGGGCTTGTGAA TGCTGGGAACAAAAAGCTCTTTGAAACCGAATTGGCTGAGGATGACTACCTTTTGTTCAACTTCAATGACT ACCCCAATGGAATGGTCACACTCTTCAATCTGCTAGTTATGGGTAACTGGCAAGTATGGATGGAGAGCTAC AAAGATTTGACGGGCACGTGGTGGAGCATTACATATTTCGTCAGTTTCTATGTCATCACTATTTTACTTCT GTTGAATTTGGTTGTTGCCTTTGTCTTGGAGGCGTTCTTTACTGAGCTGGATCTTGAAGAAGAAGAAAAAT GTCAAGGACAGGATTCTCAAGAAAAAAGAAACAGGCGTCGATCTGCAGGGTCGAAGTCTCGGAGTCAGAGA GTTGATACACTTCTTCATCACATGTTGGGTGATGAACTCAGCAAACCAGAGTGTTCCACTTCTGACACATA A

>AT3G15352.1 | COX17, ATCOX17 (Arabidopsis thaliana cytochrome c oxidase 17)

ATGACTGATCAGCCAGCACAAAATGGATTGATTCCTCCACCCACTTCAGAGCCAAGCAAGGCTGCTGCCTC AGCAGAGACGAAACCAAAGAAGAGGATATGTTGTGCTTGCCCTGATACCAAGAAGCTGAGAGATGAGTGCA TTGTCGAGCATGGTGAATCGGCTTGCACTAAATGGATCGAGGCTCATAAAATTTGTCTCCGAGCAGAAGGT TTCAACGTCTGA

221 >AT1G66240.1 | ATX1; Metal ion binding

ATGCTTAAAGACTTGTTCCAAGCCGTATCCTATCAAAACACTGCAAGTCTCTCTCTCTTTCAAGCCTTGTC GGTGGTTGAATCGAAAGCTATGTCTCAGACCGTTGTTCTCAGAGTGGCCATGACATGTGAGGGATGTGTTG GAGCTGTGAAAAGAGTTCTTGGGAAAATGGAAGGCGTGGAGTCATTTGACGTTGATATAAAGGAGCAGAAA GTGACGGTGAAAGGCAACGTGCAGCCAGACGCAGTTTTACAGACCGTGACGAAAACCGGAAAGAAAACGGC TTTTTGGGAAGCTGAAGGTGAAACTGCTAAGGCTTAA

>AT4G24120.1 | YSL1 (YELLOW STRIPE LIKE 1); Oligopeptide transporter

ATGGAAATAGAGCAAAGAAGGATCATGAAGAGAGAAGGAGAAGAAGAAGAAGACAACAATCAACTTTCACT GCAAGAAGAAGAACCAGATACAGAGGAAGAGATGTCTGGGAGGACAATCGAACCGTGGACGAAGCAGATAA CGGTGAGAGGAGTGTTCGTGAGCATAGTGATCGGAGTTGTGTTCAGTGTGATTGCTCAGAAGCTAAATCTC ACGACAGGAATTGTTCCAAATCTCAACAGCTCTGCAGCTTTACTTGCTTTTGTCTTTGTCCAGACTTGGAC TAAGATTCTCAAGAAATCAGGATTTGTTGCGAAACCATTCACAAGACAAGAGAACACAATGATTCAGACAT CTGCTGTTGCTTGTTACGGCATCGCTGTCGGAGGTGGGTTTGCTTCATATCTTCTGGGGTTAAACCATAAG ACATATGTGTTGTCTGGTGTGAACTTGGAAGGTAACTCTCCAAAGAGTGTGAAAGAACCTGGCCTTGGTTG GATGACTGCTTATCTCTTTGTTGTCTGTTTCATCGGTCTTTTTGTCCTAATCCCTCTCCGAAAGGTTATGA TTGTTGACCTTAAATTAACATATCCGAGTGGTTTAGCTACTGCGGTTCTCATCAATGGCTTCCACACACAA GGAGATGCACAGGCCAAGAAACAAGTGCGTGGTTTCATGAAATACTTCTCATTTAGTTTCTTGTGGGGTTT CTTCCAGTGGTTTTTCTCTGGTATTGAAGATTGTGGCTTTGCTCAATTCCCAACCTTTGGTTTGAAAGCTT GGAAACAAACGTTCTTCTTTGATTTCAGCATGACATTTGTGGGAGCAGGAATGATTTGTTCACATTTGGTT AACCTTTCTTTGCTTTTAGGAGCTATCCTCTCTTATGGCTTAATGTGGCCTCTTCTTGATAAACTTAAGGG CTCTTGGTTCCCTGATAATCTCGACGAACACAACATGAAGAGCATTTACGGCTACAAAGTCTTCTTATCCG TAGCTCTAATCCTCGGCGACGGTCTTTACACTTTCGTTAAGATCCTCTTTGTGACCATTGCCAATGTCAAC GCAAGATTGAAGAACAAACCTAATGATCTAGATGACGTAGGTCACAAGAAACAACGGAAAGATCTCAAGGA AGATGAGAATTTCCTCAGAGATAAAATCCCAATGTGGTTCGCAGTTTCCGGATATCTTACATTCGCTGCGG TCTCAACCGTCGTGGTTCCTCTGATATTTCCTCAGCTCAAATGGTATTACGTTATTGTAGCTTACATTTTC GCGCCTTCTCTGGCCTTCTGTAACGCTTATGGAGCTGGACTTACAGACATTAACATGGCTTATAACTACGG CAAAATTGGTCTTTTTGTCATCGCGGCTGTGACGGGAAGAGAGAACGGAGTTGTAGCCGGACTCGCCGGTT GTGGACTGATCAAATCCGTTGTTTCGGTTTCTTGTATTTTGATGCAAGATTTCAAGACGGCTCATTACACG ATGACGTCACCTAAGGCTATGTTTGCTAGCCAAATGATTGGAACGGTCGTTGGATGCATCGTGACGCCGTT AAGTTTCTTTTTGTTCTACAAAGCGTTCGACATTGGAAACCCTAACGGAGAATTCAAGGCTCCTTACGCTT TGATTTACAGAAATATGGCGATTCTTGGGGTGCAAGGCTTCTCTGCTCTGCCTCTTCACTGTCTCCAAATG TGTTACGGGTTTTTCGGGTTTGCTGTTTTGGTCAACGTCGTCAGAGATCTTACTCCGGCGAAGATTGGAAG ATTCATGCCACTTCCGACGGCGATGGCTGTTCCGTTTCTTGTCGGAGCTTATTTCGCAATCGACATGTGTG TTGGGACTTTGATTGTGTTTGTTTGGGAGAAGATGAATCGGAAGAAAGCAGAGTTTATGGTTCCGGCGGTG GCTTCAGGGCTAATCTGTGGCGAAGGGCTTTGGACTTTACCTGCGGCGGTGCTTGCCCTCGCCGGAGTAAA ACCTCCGATATGTATGAAGTTCTTAGCTTCATAG

>AT5G24380.1 | ATYSL2, YSL2 (YELLOW STRIPE LIKE 2); Oligopeptide transporter

ATGGAAAACGAAAGGGTTGAGAGAGAACAGAGCCAATTTCAGGAAGACGAGTTTATCGATTCGAGAAAACC ACCGCCATGGAGGAAACAGATCACGGTTCGAGCGATCGTCGCGAGTTTATTGATAGGGATTGTCTACAGTG TGATCTGTCTGAAGCTGAATCTAACGACAGGACTCGTACCGAATCTCAACATCTCATCAGCTCTCTTGGCT TTTGTTTTCCTCAAATCATGGACCAAAGTTCTTCAAAAAGCCGGAATCGCCACGACGCCTTTCACACGTCA AGAGAACACTATTGCTCAAACTTGTGCTGTTGCTTGTTACAGCATCTCTCTTGCAGGTGGATTTGCATCGT ATTTGTTGGGTTTGAATAGACGAACTTACGAGGAGACAGGAGTGAACACCGAAGGAAACAATCCTCGTGGT ATAAAAGAGCCCGGTGTTGGATGGATGACATCATTCCTCTTCGTTACTAGCTTCATTGGACTTGTTGTATT GGTACCTCTTCGAAAGGTGATGATTATAGACTACAAGCTAACTTATCCTAGTGGAACGGCAACAGCTGTTC TTATAAATGGATTTCATACTAGCAAAGGAGATAAGACAGCCAAGAAACAGATTCGCGGGTTTATAAAATCG

222 TTTGGTTTGAGTTTCTTTTGGGCATTCTTTGGATGGTTCTATTCAGGTGGTGAAAAATGCGGATTCTCTCA GTTCCCTACATTTGGGTTACAAGCTTTGGACAAAACATTCTACTTTGACTTCAGTATGACATATGTTGGAG CTGGGATGATTTGTTCTCATTTGGTGAATCTGTCGTTGCTCTTCGGTGCCATCCTCTCGTGGGGAATCATG TGGCCTCTCATAGCCCGGCTTAAAGGCGAATGGTTCCCTGCAACATTGAAAGATAATAGTATGCAGGGCTT AAACGGCTACAAGGTGTTTATATGCATTGCACTGATCTTAGGAGATGGGCTTTATAACTTCGTTAAGATAC TCTTTTTCACTGGAAGAAGCTTCCACTCTAGACTCTCCAAAACCAACAGCATCAGCACATTGGTAGAAGTT CCAGAAGATAGCACGAAAGAATCCGATAACCTGAAACGAGAGAATGAAGTGTTCGTTCGAGAGAGCATTCC GCTATGGATGGCTTGTGTCGGATACTTATTCTTCTCCCTTGTCTCCATCATTGCCATCCCTCTGATGTTCC CTCAGCTGAAATGGTACTTCGTCCTCGTAGCTTACCTCCTCGCCCCGTCTCTCAGCTTCTGTAACGCCTAT GGAGCTGGCTTAACGGATATGAACATGGCTTACAACTATGGTAAAGCGGCGCTCTTTGTGATGGCGGCACT GGCAGGAAAAAACGATGGAGTGGTGGCTGGGATGGTTGCTTGTGGTCTCATCAAATCAATTGTCTCTGTCT CAGCTGACCTGATGCATGATTTCAAGACGGGACATCTTACGCAAACCTCGCCCCGCTCAATGCTTGTTGCA CAAGCCATAGGGACAGCGATAGGCTGCGTGGTTGCGCCACTCACCTTCTTCCTCTTCTACAAGGCGTTTGA TGTTGGAAACCAAAATGGTGAATACAAAGCTCCTTATGCTATGATATACAGAAACATGGCTATTATCGGTG TTCAAGGTCCCTCCGCTCTCCCCAAGCATTGCCTAGAGCTTTGCTACGGATTCTTTGCCTTTGCGGTTGCT GCCAACTTGGCAAGAGACCTCTTACCAGATAAGCCAGGGAAGTGGATCCCACTCCCGATGGCAATGGCTGT ACCGTTCCTGGTGGGTGGGTCGTTCGCAATCGATATGTGTATCGGGAGCTTAGTGGTATACGTTTGGAAAA AGGTGAATCGTAAGAAGGCAGATGTGATGGTTCCAGCTGTTGCATCAGGTTTGATATGTGGAGATGGTCTC TGGATTCTGCCTTCTTCTTTGCTGGCTCTGGCTAAGGTTAGACCTCCTATCTGTATGAACTTCACTGCGGC TCATTAA

>AT5G53550.1 | YSL3 (YELLOW STRIPE LIKE 3); Oligopeptide transporter

ATGAGGAGTATGATGATGGAGAGAGAGGGAAGGAATGAGATAGAAAGAGAAGTAATAGATGACTTGGAAGA GACGCAAAACGAAGGAGATGATTTCAAGTCAATACCTCCATGGAAGGAACAAATCACTTTCAGAGGAATTG TTGCAAGTTTAATCATTGGTATAATCTACAGTGTGATCGTGATGAAACTAAACCTAACAACAGGTTTGGTC CCAAACCTAAATGTCTCTGCAGCACTTTTAGCCTTTGTCTTCCTTAGAAGCTGGACCAAGCTGTTGACCAA AGCCGGGATTGTGACTAAACCGTTCACTAAACAAGAGAACACTGTTGTCCAAACATGTGCTGTTGCTTGTT ACAGCATTGCAGTTGGAGGTGGGTTTGGTTCATACCTTCTTGGTTTGAACAGAATTACTTATGAACAGTCA GGAGGAACTCACACTGATGGGAATTATCCGGAAGGCACGAAAGAGCCTGGAATCGGTTGGATGACCGCTTT CTTGTTCTTTACTTGCTTTGTTGGTCTTTTAGCATTGGTTCCTCTAAGAAAGATCATGATCATAGACTACA AGCTGACATATCCAAGTGGAACAGCTACCGCGGTTTTGATCAACGGTTTCCACACTCCTAAAGGCAATAAA ATGGCCAAGAAACAAGTGTTTGGGTTTGTGAAGTACTTCTCATTTAGCTTCATTTGGGCTTTCTTCCAATG GTTCTTCTCTGGTGGTACAGAGTGCGGTTTCATTCAGTTTCCAACTTTCGGGTTAGAAGCTTTGAAGAACA CATTCTACTTCGACTTTAGCATGACATACGTTGGAGCAGGAATGATCTGTCCCCATATTGTCAATATATCT TTGCTTTTTGGCGCGGTTCTGTCTTGGGGAATCATGTGGCCACTCATTAAAGGTCTTAAAGGAGATTGGTT CCCATCAACTCTTCCTGAAAACAGCATGAAGAGTCTCAATGGTTACAAGGTGTTTATATCAATCTCATTGA TCCTCGGAGACGGGCTTTACCAATTCATCAAGATACTTTTTAAGACAGGAATAAACATGTACGTCAAGTTA AACAATCGCAACTCTGGGAAATCTAATTCGGAGAAAGATAAGCAATCTATTGCAGATCTTAAAAGAGATGA GATCTTTGTAAGAGACAGCATTCCATTATGGGTTGCAGCAGTAGGATACGCAGCGTTCTCTGTTGTCTCGA TCATCGCGATCCCTATAATGTTCCCCGAGCTGAAATGGTACTTCATAGTCGTAGCTTACATGTTAGCTCCA TCGTTAGGTTTCAGTAACGCTTATGGAGCAGGGCTAACAGATATGAACATGGCTTATAACTATGGTAAAGT CGCTCTGTTTATCTTAGCCGCTATGGCAGGGAAACAAAATGGTGTAGTCGCGGGACTTGTCGGATGCGGGT TGATAAAATCGATTGTATCGATTTCTTCTGACCTAATGCACGATTTCAAGACAGGACATTTGACTCTGACT TCACCTAGGTCGATGCTTGTGAGTCAAGCGATCGGTACAGCGATCGGATGCGTTGTGGCGCCTCTAACTTT CTTCTTGTTTTATAAAGCTTTCGATGTCGGGAACCAGGAGGGAGAGTACAAAGCTCCTTACGCTTTGGTAT ACAGAAACATGGCAATTCTTGGAGTTGAAGGTTTCTCTGCTTTGCCTCAACATTGTTTACAGCTTTGTTAC GGGTTTTTCGCATTCGCGGTGGCGGCAAATCTCGTTAGGGATAGGTTACCGGATAAGATAGGGAATTGGGT TCCATTACCGATGGCAATGGCGGTTCCGTTTCTTGTTGGAGGGTACTTTGCTATTGATATGTGTGTGGGAA GTTTGATTGTGTTTGCTTGGAATATGAGAGATCGAGTTAAAGCCGGTTTAATGGTACCGGCGGTTGCTTCC GGTTTGATATGTGGAGATGGTCTATGGATTTTGCCGTCGTCGGTTCTTGCTTTGGCCGGCGTTAGACCTCC TATATGTATGGGCTTCATGCCGAGTAAATATTCGAGTTAA

223

>AT2G46800.1 | ATMTP1, MTP1, ZAT1, ZAT (ZINC TRANSPORTER OF ARABIDOPSIS THALIANA); Zinc ion transmembrane transporter

ATGGAGTCTTCAAGTCCCCACCATAGTCACATTGTTGAGGTTAATGTTGGAAAATCTGATGAAGAGAGAAT AATTGTGGCGAGTAAAGTCTGTGGAGAAGCACCATGTGGGTTTTCAGATTCTAAGAATGCTTCCGGGGATG CTCACGAACGCTCTGCTTCTATGCGGAAGCTTTGTATCGCCGTCGTGCTGTGTCTAGTGTTCATGAGTGTT GAAGTTGTTGGTGGGATTAAAGCCAATAGTTTAGCTATATTAACCGATGCAGCTCATTTGCTCTCTGACGT TGCTGCCTTTGCTATCTCCCTCTTCTCATTGTGGGCTGCTGGCTGGGAAGCGACTCCTAGGCAGACTTACG GGTTCTTCAGGATTGAGATTTTGGGTGCTCTTGTATCTATCCAGCTCATTTGGTTGCTCACGGGTATTCTG GTTTATGAAGCGATTATCAGAATTGTTACAGAGACCAGTGAGGTTAATGGATTCCTCATGTTTCTGGTTGC TGCCTTTGGTCTAGTGGTGAACATCATAATGGCTGTTCTGCTAGGGCATGATCATGGTCACAGTCATGGAC ATGGGCATGGCCACGGCCATGACCATCACAATCATAGCCATGGGGTGACTGTTACCACTCATCACCATCAT CACGATCATGAACATGGCCATAGTCATGGTCATGGAGAGGACAAGCATCATGCTCATGGGGATGTTACTGA GCAATTGTTGGACAAATCGAAGACTCAAGTCGCAGCAAAAGAGAAAAGAAAGAGAAACATCAATCTCCAAG GAGCTTATCTGCATGTCCTTGGGGATTCCATCCAGAGTGTTGGTGTTATGATTGGAGGAGCTATCATTTGG TACAATCCGGAATGGAAGATAGTGGATCTGATCTGCACACTTGCCTTTTCGGTTATTGTCCTAGGAACAAC CATCAACATGATTCGCAACATTCTAGAAGTATTGATGGAGAGTACACCCAGAGAGATTGACGCCACAAAGC TCGAAAAGGGTTTGCTCGAAATGGAAGAAGTGGTGGCTGTTCATGAGCTCCACATATGGGCTATCACAGTG GGAAAAGTGCTATTGGCTTGCCATGTCAATATCAGACCAGAAGCAGATGCAGATATGGTGCTCAACAAGGT AATTGATTACATCCGCAGGGAGTACAACATTAGTCATGTCACGATACAAATCGAGCGCTAA

>AT2G39450.1 | ATMTP11/MTP11; Cation transmembrane transporter/ manganese ion transmembrane transporter/ manganese:hydrogen antiporter

ATGGTTGAGCCAGCTAGTCCAGATTCCGACGAAGGGATCTCGCTGCTCGAGTTTCACGGTAACGGTGACCG TTCCTGGCAGTTAAATTTCGATGATTTCCAGGTTTCACCGGAACACAAGGAGAAGAAATCTCCGAGTAAAC TACACAACTGTCTTGGTTGTTTGGGTCCGGAAGACAATGTGGCAGATTATTACCAGCAGCAAGTAGAGATG CTTGAGGGCTTTACTGAAATGGATGAACTTGCAGAACGTGGCTTTGTTCCTGGAATGTCAAAGGAAGAGCA GGATAATTTGGCTAAAAGCGAGACATTGGCGATTAGAATATCAAACATTGCAAACATGCTTCTTTTTGCTG CTAAAGTCTATGCTTCTGTCACTAGTGGCTCTTTAGCTATCATTGCCTCTACATTGGACTCTCTTCTTGAT CTTCTTTCTGGCTTCATCCTCTGGTTTACCGCCTTCTCCATGCAGACACCCAACCCGTATCAGTATCCCAT TGGCAAGAAACGCATGCAACCACTGGGAATCCTAGTCTTTGCATCAGTGATGGCAACACTTGGATTGCAGA TTATCTTGGAATCTCTTCGCACAATGTTATCCAGCCACAAGGAGTTCAACCTAACAAAAGAGCAAGAGAGT TGGGTAGTTGGGATCATGCTTTCTGTTACATTGGTCAAACTGCTTCTGGTTCTTTACTGCAGATCCTTCAC TAACGAGATCGTTAAAGCTTATGCTCAAGATCATTTCTTCGACGTCATCACAAACATCATTGGACTCATTG CAGTAATCCTGGCCAATTACATTGATTATTGGATTGATCCAGTTGGAGCTATCATTCTTGCATTATACACA ATACGGACATGGTCAATGACGGTCTTGGAGAACGTTAACTCTCTTGTTGGGAAATCAGCTAGACCAGAGTA TCTGCAGAAACTAACTTACCTGTGTTGGAACCACCATAAAGCCATTAGGCACATTGACACAGTGAGGGCAT ACACATTTGGCTCTCATTATTTTGTGGAGGTTGATATTGTTCTCCCAGCTGACATGCCTCTGCAAGTGGCT CACGACATTGGAGAATCGCTGCAAGAGAAGCTCGAGCTACTAGAGGAGATCGAACGGGCTTTTGTGCATCT TGATTATGAGTACACTCACAAACCTGAGCACGCTAGATCCCACTGTTAG

>AT2G29410.1 | ATMTPB1, MTPB1; efflux transmembrane transporter/ zinc ion transmembrane transporter

ATGGAATTGGAGCAAATCTGCATTTTGAAACCCGATGATGAGGAAGAAATGGAGTCACCATCACCATCGAA AACAGAGGAAAATTTAGGCGTGGTTCCTCTGTCTTGCGCGTTTACGAGACAAGAACATTGCGTTTCCGAGA CGAAAGAACGCGAAGAATCGACTAGAAGACTAAGCAGCCTTATTTTTCTTTACCTCATCGTTATGTCCGTA CAGATTGTGGGAGGTTTTAAGGCGAATAGTCTTGCGGTTATGACAGATGCTGCGCATTTGTTGTCGGATGT GGCGGGTTTGTGTGTCTCTTTGTTGGCGATCAAGGTCTCGAGCTGGGAAGCTAATCCGCGAAACTCGTTCG

224 GGTTTAAACGGCTTGAAGTTTTAGCGGCGTTCTTGTCTGTTCAGCTCATATGGCTTGTCTCTGGAGTGATA ATCCATGAAGCCATTCAAAGACTACTTAGCAGAAGCAGAGAGGTTAATGGAGAAATCATGTTTGGGATTTC GGCTTTCGGGTTTTTCATGAACTTGGTTATGGTCCTTTGGTTAGGACATAACCATAGCCACCACCACCACG ACCATCATCACCATCATCATAATCATAAACATCAACATCAACATCATCACAAGGAGGTGGTGGCAGAAGAG GAGGAGGAGGAAATGAATCCTTTAAAAGGTGAGAAATCGTCTTCAAAGGAGATGAACATTAACATACAAGG AGCTTATCTTCACGCAATGGCTGATATGATTCAGTCACTAGGTGTTATGATTGGTGGAGGTATCATTTGGG TAAAACCAAAATGGGTTTTGGTTGATTTGATTTGCACTCTTGTTTTCTCTGCTTTTGCTCTTGCTGCTACT CTACCAATTCTCAAGAACATATTTGGAATACTGATGGAACGCGTACCGCGTGATATGGACATAGAGAAGCT CGAAAGAGGACTTAAGCGAATCGATGGAGTTAAGATTGTTTATGATCTCCATGTATGGGAGATAACAGTAG GGAGAATCGTATTGTCCTGTCATATCTTACCAGAGCCTGGAGCTAGTCCTAAAGAGATCATTACTGGTGTT AGAAATTTCTGTAGGAAGTCTTATGGGATTTATCATGCAACTGTTCAGGTTGAATCAGAGTAG

>AT3G61940.1 | ATMTPA1, MTPA1; efflux transmembrane transporter/ zinc ion transmembrane transporter

ATGGATTCAAGAAGAAGTAAAGTTTGTGGAGAAACAGCTTGTGGGTTTTCAACATCTTCAAGTGATGCTAA GAAACGAGCTGCTTCAATGCGAAAGCTCTGTTTCGTTGTGGTGTTATGTCTTTTGTTCATGAGCATAGAAG TTGTTTGTGGCATTAAGGCTAACAGTTTGGCTATATTAGCAGATGCAGCTCATTTGCTCACTGATGTTGGT GCTTTTGCTATCTCTATGCTCTCCTTGTGGGCTTCTTCTTGGGAAGCAAATCCGCGACAGAGTTATGGGTT CTTCAGGATAGAGATTTTAGGCACTCTTGTCTCAATCCAGCTCATTTGGTTACTCACTGGGATTTTGGTCT ACGAAGCCGTAACTAGACTCGTTCAAGAGACCAATGATGATGTTGATGGTTTCTTTATGGTTCTTGTTGCT GCTTTTGGTTTAGTTGTAAACATCATAATGATTGTTGTGCTTGGACATGATCACGGACATGGGCATGATCA TGGTCACAGTCATGATCATGGTCATAGTTATGGGGAAAGAGCTGAGCAGTTGTTGGAGAAATCCAAGGAAA TAAGGAACATTAATGTGCAAGGGGCTTATCTTCATGTTCTTGGGGATTTAATTCAGAGCATTGGGGTAATG ATTGGAGGGGGTATGATTTGGTATAACCCTAAATGGAAAGTAATTGACTTGATATGCACGTTATTCTTTTC AGTTATTGTTTTAGGGACAACCATTAAGATGCTTCGAAGCATACTTGAAGTTCTAATGGAGAGCACACCGA GAGAGATAGACGCGAGGCAGCTCGAAAAGGGATTGATGGAAATTGAGGAAGTTGTGGATGTTCATGAGCTG CACATATGGGCAATTACAGTTGGAAAAGCTCTGTTTTCTTGTCATGTCAAGGTTAGACCAGAAGCTGGTGA TGAAATGGTGCTGAACAAAGTGATCGATTACATCTGGAGAGAGTATCGTATCAGTCACGTTACGATACAAA TCGAGCGTTAA

>AT5G59030.1 | COPT1 (COPPER TRANSPORTER 1); copper ion transmembrane transporter

ATGGATCATGATCACATGCATGGAATGCCTCGACCATCATCATCATCATCATCATCACCATCATCCATGAT GAACAATGGATCCATGAACGAAGGAGGAGGACATCACCACATGAAGATGATGATGCACATGACCTTCTTTT GGGGCAAGAACACGGAGGTTCTCTTCTCCGGTTGGCCCGGAACAAGCTCCGGCATGTATGCTCTTTGTCTT ATCTTTGTCTTCTTTCTCGCCGTCCTCACCGAATGGCTTGCTCATTCCTCTCTCCTCCGTGGCTCCACCGG AGATTCGGCTAATCGTGCCGCCGGGTTAATCCAAACCGCCGTGTATACCCTCCGTATCGGCCTCGCCTATC TAGTCATGCTCGCCGTCATGTCGTTTAACGCCGGTGTGTTTCTCGTCGCTCTGGCCGGTCATGCCGTTGGT TTCATGTTGTTCGGAAGCCAAACTTTCCGGAATACCTCCGATGACCGGAAAACTAACTATGTTCCTCCCTC AGGTTGTGCTTGTTGA

>AT3G46900.1 | COPT2 (Copper transporter 2); copper ion transmembrane transporter

ATGGATCATGATCACATGCATGACATGCCACCACCATCACCATCATCATCATCGATGTCGAATCATACAAC ACCACACATGATGATGATGCACATGACCTTCTTTTGGGGTAAGAACACGGAGGTTCTCTTCTCCGGATGGC CCGGGACAAGCTCCGGCATGTACGCTCTCTGTCTCATTGTTATATTCCTCCTCGCCGTAATTGCCGAGTGG CTCGCTCATTCACCGATCCTACGTGTCAGTGGCTCAACCAATCGAGCCGCCGGGCTCGCTCAAACCGCTGT

225 GTACACACTCAAGACTGGCCTTTCGTATTTGGTGATGCTCGCTGTTATGTCCTTTAACGCAGGTGTTTTCA TCGTCGCTATCGCCGGCTATGGCGTTGGTTTCTTTCTCTTTGGGAGCACTACTTTCAAGAAACCCTCCGAT GACCAGAAAACCGCCGAGCTTCTTCCGCCGTCTTCAGGCTGCGTTTGTTGA

>AT5G59040.1 | COPT3 (Copper transporter 3); copper ion transmembrane transporter

ATGAACGGCATGAGTGGATCTTCTCCGGCGGCTCCGGCTCCTTCACCATCATCGTTCTTCCAACATCGCCA CCGTCACGGTGGTATGATGCATATGACTTTCTTTTGGGGAAAAACCACGGAGGTTCTATTCGACGGATGGC CAGGGACCAGTCTGAAAATGTATTGGGTCTGTCTCGCCGTCATTTTTGTTATTTCCGCTTTCTCCGAGTGT CTATCGCGGTGCGGTTTCATGAAATCCGGTCCGGCTAGCTTGGGAGGCGGGCTATTACAGACTGCGGTTTA CACCGTGAGGGCTGCTTTGTCTTATTTGGTCATGCTGGCTGTGATGTCGTTCAATGGAGGAGTCTTCGTGG CTGCGATGGCTGGTTTTGGATTAGGGTTTATGATTTTTGGAAGTAGGGCTTTTAGGGCCACTAGCAGTAAC AGTCACACCGAGGTTCAATCACATTGTTGA

>AT2G37925.1 | COPT4 (copper transporter 4); copper ion transmembrane transporter

ATGTTGTCGAGCAAAAACGTTGTCGTAGTAGAGGCTTGGAACACCACCACCACAACGCAAACGCAAACGCC ACATAGACCGTCACTGTTACACCCAACATTCTATTGGGGTTACAACTGCCAAGTGCTCTTCTCAGGCTGGC CTGGTTCTGACCGTGGGATGTATGCACTCGCACTTATCTTCGTCTTCTTTCTGGCGTTCTTAGCTGAGTGG CTTGCTCGGTGCTCTGACGCGTCATCCATCAAACAGGGTGCCGATAAGCTTGCTAAAGTTGCATTTCGCAC GGCTATGTATACCGTCAAGTCCGGCTTTAGCTACCTAGTGATTCTGGCTGTTGTTTCCTTCAACGGTGGTG TTTTCCTAGCCGCGATTTTCGGGCATGCTCTTGGGTTCGCTGTTTTCCGTGGGCGGGCGTTTAGGAATAGG GATATTCAATAA

>AT5G20650.1 | COPT5 (copper transporter 5); copper ion transmembrane transporter

ATGATGCACATGACCTTCTACTGGGGAATCAAAGCCACAATTCTCTTCGATTTCTGGAAAACTGACTCATG GCTTAGTTACATCCTCACTTTAATCGCTTGCTTCGTCTTCTCCGCTTTCTATCAATACCTCGAGAATCGCC GCATCCAATTCAAATCCCTTTCTTCCTCCCGTCGTGCTCCTCCACCGCCTCGCTCTTCCTCCGGCGTCTCC GCGCCTCTTATCCCTAAATCCGGTACCAGATCCGCCGCTAAAGCTGCTTCGGTTCTTCTTTTCGGCGTCAA CGCAGCGATCGGTTACTTGCTGATGCTTGCAGCTATGTCTTTCAACGGAGGTGTTTTCATCGCGATTGTCG TCGGATTAACCGCCGGATACGCTGTTTTTAGATCTGATGACGGCGGTGCTGATACCGCCACGGATGATCCA TGTCCATGTGCTTGA

226

Appendix D.

Chapter 6 Supplemental Information

The following tables contain compiled loop length combinations for GQS found in different regions of the Arabidopsis genome. We observed that loops of the same length within a GQS were common in genic regions and these combinations were examined in more detail in Table

6.1. In all below, loop lengths as determined from Quadparser and the total number of GQS with the loop length combination. Grey highlight shows those where all loops are the same length.

Table D.1: Common loop combinations of Table D.2: Common loop combinations of 5‘UTR G3L1-7 GQS 3‘UTR G3L1-7 GQS

5’UTR 3’UTR Loop 1 Loop 2 Loop 3 Number Loop 1 Loop 2 Loop 3 Number 3 3 3 2 2 2 2 3 5 7 4 2 1 1 1 2 6 7 4 2 2 2 7 2 1 1 3 1 4 3 2 2 1 1 7 1 7 7 7 2 1 2 2 1 1 1 6 1 1 3 3 1 3 3 3 1 1 7 1 1 4 3 5 1 2 4 2 1 4 2 1 1 3 3 5 1 5 1 6 1 3 4 7 1 5 5 3 1 3 7 6 1 6 4 2 1 4 1 1 1 6 7 2 1 4 4 3 1 7 7 3 1 4 7 4 1 Total: 20 5 3 3 1 5 7 6 1 6 2 2 1 6 4 6 1 6 6 4 1 Total: 23

227

Table D.3 Common loop combinations of intronic G3L1-7 GQS

INTRON Loop 1 Loop 2 Loop 3 Number 1 1 3 5 1 1 1 2 1 4 1 2 2 2 7 2 2 3 5 2 2 6 2 2 3 3 3 2 3 6 2 2 4 5 7 2 7 7 7 2 1 1 4 1 1 2 2 1 1 4 2 1 2 1 1 1 2 2 1 1 2 7 3 1 3 1 1 1 3 2 2 1 3 2 6 1 3 3 2 1 3 5 4 1 3 7 5 1 4 2 5 1 4 3 7 1 4 4 4 1 5 2 5 1 5 4 4 1 5 5 2 1 5 7 6 1 7 6 1 1 7 6 5 1 Total: 44

228

Table D.4: Common loop combinations of Coding sequence G3L1-7 GQS

Coding sequence Coding sequence Loop 1 Loop 2 Loop 3 Number Loop 1 Loop 2 Loop 3 Number 3 3 3 13 3 5 4 2 6 6 6 8 3 6 6 2 3 3 2 7 3 7 7 2 3 3 4 5 4 1 1 2 2 1 7 4 4 1 7 2 7 4 2 4 4 2 7 2 7 6 7 4 4 4 6 2 7 7 7 4 4 5 6 2 1 1 2 3 5 1 6 2 1 4 3 3 5 2 2 2 2 2 1 3 5 3 4 2 2 2 2 3 5 4 1 2 2 2 4 3 5 4 6 2 2 6 1 3 5 4 7 2 3 2 5 3 5 5 4 2 3 3 7 3 5 6 7 2 4 1 2 3 5 7 3 2 4 3 3 3 6 2 3 2 4 7 3 3 6 2 5 2 5 3 6 3 6 3 4 2 5 5 2 3 6 7 4 2 5 7 7 3 7 3 5 2 6 1 7 3 7 3 6 2 6 3 1 3 7 5 1 2 7 7 6 3 7 6 2 2 1 3 1 2 7 6 3 2 1 3 4 2 1 1 1 1 1 5 4 2 1 2 4 1 1 6 6 2 1 2 6 1 2 1 2 2 1 3 2 1 2 2 3 2 1 4 5 1 2 5 3 2 1 5 7 1 2 5 6 2 1 6 2 1 2 6 7 2 1 6 3 1 3 1 4 2 1 6 7 1 3 3 6 2 1 7 1 1 3 4 5 2 2 1 6 1

229

Coding sequence Coding sequence Loop 1 Loop 2 Loop 3 Number Loop 1 Loop 2 Loop 3 Number 2 2 5 1 5 1 2 1 2 2 6 1 5 2 4 1 2 2 7 1 5 2 5 1 2 3 6 1 5 2 6 1 2 4 1 1 5 3 1 1 2 4 3 1 5 3 3 1 2 4 4 1 5 3 7 1 2 4 5 1 5 4 2 1 2 4 7 1 5 4 3 1 2 5 4 1 5 6 2 1 2 5 5 1 5 6 4 1 2 6 2 1 5 6 6 1 2 6 6 1 6 3 3 1 2 7 4 1 6 3 5 1 2 7 5 1 6 4 2 1 2 7 7 1 6 5 3 1 3 1 2 1 6 5 4 1 3 1 5 1 6 5 5 1 3 2 3 1 6 5 6 1 3 4 2 1 6 5 7 1 3 4 4 1 6 7 5 1 3 5 5 1 6 7 6 1 3 6 7 1 7 1 5 1 3 7 1 1 7 2 2 1 4 2 1 1 7 2 3 1 4 2 3 1 7 4 7 1 4 2 4 1 7 5 2 1 4 2 5 1 7 5 5 1 4 2 6 1 7 5 6 1 4 3 1 1 7 5 7 1 4 3 4 1 7 6 6 1 4 3 7 1 7 7 3 1 4 4 3 1 7 7 4 1 4 6 2 1 Total 258 4 6 4 1 4 6 5 1 4 7 6 1 5 1 1 1

230

Table D.5: Common loop combinations of intergenic region G3L1-7 GQS

Intergenic region Intergenic region Loop 1 Loop 2 Loop 3 Number Loop 1 Loop 2 Loop 3 Number 4 4 4 159 7 3 2 4 4 5 3 65 1 1 4 3 4 5 4 47 1 3 1 3 4 3 4 45 1 4 1 3 3 4 4 34 1 6 5 3 2 5 2 32 1 7 1 3 2 6 2 21 2 1 2 3 1 1 1 18 2 1 3 3 7 7 7 16 2 2 1 3 1 1 2 15 2 6 1 3 2 2 2 14 3 3 5 3 1 1 3 13 3 4 5 3 3 3 3 10 4 1 1 3 4 3 5 10 4 1 4 3 5 2 6 10 5 3 4 3 3 7 2 9 5 7 7 3 5 3 6 8 6 3 5 3 7 7 5 8 6 5 5 3 4 2 4 6 7 6 5 3 1 2 3 5 1 1 5 2 1 5 5 5 1 4 5 2 2 1 1 5 1 6 1 2 2 7 2 5 1 6 2 2 2 7 3 5 1 7 5 2 3 1 6 5 2 2 5 2 3 3 2 5 2 2 7 2 4 4 3 5 2 3 2 2 5 5 2 5 2 5 5 2 6 3 3 5 2 6 3 2 1 2 2 4 3 1 1 2 2 3 3 4 3 2 2 2 3 1 7 4 3 4 1 2 3 3 6 4 3 5 4 2 3 6 1 4 4 2 3 2 4 3 3 4 4 2 7 2 5 5 6 4 4 4 5 2

231

Intergenic region Intergenic region Loop 1 Loop 2 Loop 3 Number Loop 1 Loop 2 Loop 3 Number 4 6 1 2 3 6 3 1 5 5 5 2 3 7 3 1 5 6 7 2 4 1 3 1 6 2 2 2 4 2 1 1 6 2 6 2 4 2 6 1 7 1 2 2 4 3 1 1 7 2 2 2 4 4 6 1 7 2 7 2 4 6 5 1 7 3 7 2 4 7 2 1 7 6 3 2 4 7 3 1 7 7 6 2 4 7 4 1 1 2 1 1 5 1 1 1 1 2 7 1 5 1 5 1 1 3 3 1 5 2 7 1 1 4 6 1 5 3 1 1 1 5 3 1 5 3 2 1 1 5 6 1 5 4 3 1 1 6 3 1 5 4 4 1 1 7 2 1 5 5 1 1 2 1 4 1 5 6 5 1 2 1 7 1 5 6 6 1 2 2 3 1 5 7 6 1 2 3 1 1 6 1 3 1 2 3 4 1 6 2 3 1 2 3 6 1 6 3 1 1 2 3 7 1 6 4 3 1 2 4 4 1 6 4 7 1 2 5 4 1 6 7 1 1 2 7 1 1 6 7 5 1 2 7 4 1 6 7 6 1 2 7 7 1 6 7 7 1 3 1 2 1 7 1 1 1 3 1 3 1 7 1 3 1 3 2 3 1 7 2 5 1 3 2 5 1 7 3 5 1 3 2 6 1 7 3 6 1 3 5 6 1 7 4 3 1 3 6 2 1 7 4 5 1

232

Intergenic region Loop 1 Loop 2 Loop 3 Number 7 4 7 1

7 5 5 1 7 6 6 1 7 6 7 1 7 7 4 1 TOTAL: 802

* 4-4-4 loops: These loops were found to be equal to or within 3 mutations of the Arabidopsis telomeric repeat sequence (TTTA or, in the case of C-rich patterns: TAAA). Four of these border telomere regions and there is a very large cluster of 108 GQS at the centromere of chromosome 1. The prevalence of GQS at the centromere is most likely explained by chromosomal rearrangements, such as the combination of two chromosomes.

233

Table D.6: miRNA target genes that contain a GQS in the RNA. For example, At1g30120 is a target gene for miR159/319 and also contains a GQS. The GQS is 142 nt away from the miRNA target binding site. Out interest in this was the possibility of GQS affecting miRNA-miRNA target interactions.

Distance Distance Distance Gene between between between miRNA family Gene Sequence Name target and target and target and GQS 2nd GQS 3rd GQS miR159/miR319 AT1G30210 5'AGGGGGACCCTTCAGTCCAA3' 142 miR159/miR319 AT1G53230 TCP3 5'AGAGGGGTCCCCTTCAGTCCAT3' 794 miR159/miR319 AT3G15030 TCP4 5'AGAGGGGTCCCCTTCAGTCCAG3' 905 104 76 miR160 AT2G28350 ARF10 5'AGGAATACAGGGAGCCAGGCA3' 315 miR160 AT4G30080 ARF16 5'GGGTTTACAGGGAGCCAGGCA3' 717 175 miR161.1/.2 AT1G63070 5'TAGTCTTTTTCAACGCATTGA3' 510 5'ACCCCAATGTTGTTACTTTCAATGC miR161.1/.2 AT1G63150 831 ATTGA3' miR162 AT1G01040 DCL1 5'CTGGATGCAGAGGTATTATCGA3'- 1897 miR163 AT1G66720 5'ATCGAGTTCCAGGTCCTCTTCAA3' 327 miR164 AT1G56010 NAC1 5'AGCACGTACCCTGCTTCTCCA3' 538 miR164 AT3G15170 CUC1 5'AGCACGTGTCCTGTTTCTCCA3' 1192 miR164 AT5G53950 CUC2 5'AGCACGTGTCCTGTTTCTCCA3' 215 224 5'CTGGAATGAAGCCTGGTCCGG3' 2701 miR166/miR165 AT1G52150 ATHB-15 in CDS 5'CTGGGATGAAGCCTGGTCCGG3' 535 miR166/miR165 AT5G60690 REV in CDS miR169 AT1G54160 5'ACGGGAAGTCATCCTTGGCTA3' 186 5'AGGGATATTGGCGCGGCTCAATCA3 miR170/miR171 AT3G60630 756 ' 5'GGGGATATTGGCGCGGCTCAATCA3 miR170/miR171 AT4G00150 426 ' miR172 AT2G28550 TOE1 5'CAGCAGCATCATCAGGATTCT3' 1580 miR172 AT4G36920 AP2 5'CTGCAGCATCATCAGGATTCT3' 74 miR172 AT5G60120 TOE2 5'ATGCAGCATCATCAGGATTCT3' 2335 miR393 AT3G62980 TIR1 5'GGAGACAATGCGATCCCTTTGGA3' 2294 967

234 miR395 AT5G43780 APS4 5'GAGTTCCTCCAAACACTTCAT3' 87 477 828 miR396 AT2G22840 AtGRF1 5'TCGTTCAAGAAAGCCTGTGGAA3' 596 miR396 AT4G24150 AtGRF8 5'TCGTTCAAGAAAGCATGTGGAA3' 925 533 miR396 AT4G37740 AtGRF2 5'TCGTTCAAGAAAGCCTGTGGAA3' 652 miR397 AT2G29130 5'AATCAATGCTGCACTCAATGA3' 788 753 miR398 AT3G15640 5'AAGGTGTGACCTGAGAATCACA3' 720 miR400 AT1G62590 5'GTTACATATAATACTCTCATA3' 329 miR400 AT1G62910 5'GTGACTTACAGTACTCTTATA3' 329 miR400 AT1G62930 5'GTGACTTACAATACTCTTATA3' 329 miR400 AT1G63070 5'GTGGCTTATAATACGCTTATA3' 329 miR400 AT1G63080 5'GTAACTTATAATACACTTATA3' 26 miR400 AT1G63130 5'GTGACTTACAATACTCTTATA3' 329 miR400 AT1G63400 5'GTGACTTACAATACTCTTATT3' 329 miR400 AT4G19440 5'GTCGACGTATAATACTCTCATA3' 898 miR400 AT5G16640 5'GTAACTTATAGTATTCTCATT3' 314 miR400 AT4G34060 5'CTGAGGTTTAATAGGCCTCGGA3' 23 miR400 AT1G31280 AGO2 5'GGAGTTTGTGCGTGAATCTAA3' 3363 3330 3206 miR400 AT2G30210 5'ACCAGTGAAGAGGCTGTGCAG3' 1487 1584 1774 miR400 AT3G45090 5'TGACAAACATCTCGTTCCTAA3' 834 miR400 AT5G60760 5'TGACAAACATCTCGTCCCCAA3' 2275 miR400 AT1G12210 RFL1 5'GGTATGGGTGGAGTAGGCAAAA3' overlap miR400 AT1G12220 RPS5 5'GGTATGGGGGGAGTAGGCAAAA3' overlap 124 miR400 AT1G12280 5'GGCATGGGTGGAGTAGGCAAAA3' overlap miR400 AT1G12290 5'GGCATGGGTGGAGTAGGAAAAA3' 93 overlap 770 miR400 AT1G15890 5'GGTATGGGGGGAGTAGGTAAAA3' 42 overlap miR400 AT1G51480 5'GGTATGGGGGGAGTAGGAAAAA3' overlap miR400 AT1G62630 5'GGTATGGGCGGTGTAGGGAAGA3' overlap miR400 AT1G63360 5'GGTATGGGCGGTGTAGGGAAGA3' overlap 590 miR400 AT4G10780 5'GGTATGGGGGGAGTAGGCAAAA3' overlap miR400 AT4G27190 5'GGCATGGGAGGAGTAGGTAAAA3' overlap miR400 AT5G05400 5'GGTATGGGGGGAGTAGGCAAGA3' 93 overlap miR400 AT5G43730 5'GGTATGGGGGGAATAGGAAAAA3' overlap 1685

235 miR400 AT5G43740 5'GGTATGGGGGGAGTAGGAAAAA3' 42 overlap miR400 AT5G47260 5'GGTAGGGGTGGAGTAGGCAAAA3' overlap miR400 AT5G63020 5'GGTATGGGTGGAGTAGGTAAAA3' overlap miR403 AT1G31280 AGO2 5‘GGAGTTTGTGCGTGAATCTAA3‘ 3364 miR408 AT2G02850 ARPN 5‘GCCAAGGGAAGAGGCAGUGCAU3‘ ND miR408 AT2G30210 5‘ACCAGUGAAGAGGCUCUGCAG3‘ ND miR447 AT3G45090 5‘UGACAAACAUCUCGUUCCUAA3‘ ND miR447 AT5G60760 5‘UGACAAACAUCUCGUCCCCAA3‘ ND miR472 AT1G12210 RFL1 5'GGUAUGGGUGGAGUAGGCAAAA3' ND miR472 AT1G12220 RPS5 5'GGUAUGGGGGGAGUAGGCAAAA3' ND miR472 AT1G12280 5'GGCAUGGGUGGAGUAGGCAAAA3' ND miR472 AT1G15890 5'GGUAUGGGGGGAGUAGGUAAAA3' ND miR472 AT1G51480 5'GGUAUGGGGGGAGUAGGAAAAA3' ND miR472 AT1G62630 5'GGUAUGGGCGGUGUAGGGAAGA3' ND miR472 AT1G63360 5'GGUAUGGGCGGUGUAGGGAAGA3' ND miR472 AT4G10780 5'GGUAUGGGGGGAGUAGGCAAAA3' ND miR472 AT4G27190 5'GGCAUGGGAGGAGUAGGUAAAA3' ND miR472 AT5G05400 5'GGUAUGGGGGGAGUAGGCAAGA3' ND miR472 AT5G43730 5'GGUAUGGGGGGAAUAGGAAAAA3' ND miR472 AT5G43740 5'GGUAUGGGGGGAGUAGGAAAAA3' ND miR472 AT5G47269 5'GGUAGGGGUGGAGUAGGCAAAA3' ND miR472 AT5G63020 5'GGUAUGGGUGGAGUAGGUAAAA3' ND miR771 AT1G76810 5'TGAAAGTTACTACAGAGGCTCG3' ND miR778 AT5G53890 5'AATGTGCTGCAACATTGCAGAA3' 1073 miR842 AT5G38550 JR/MBP 5'GGGATGATGGATCTGAACATGA3' 1855 miR846 At1g52050 JR/MBP 5'GATGCAAGCACTCCAATTCAA3' 436 miR846 At1g52060 JR/MBP 5'GATGCAAGCACTTGAATTCAA3' 390 172 miR846 At1g52070 JR/MBP 5'GATGCAAGCACTTGAATTCAA3' 380 784 miR846 At1g52130 JR/MBP 5'CATTCAAGCACTGCAATTCAA3' 341 773 miR846 At1g60110 JR/MBP 5'GATTAAAGGACTTCAATTCAA3' 172 1269 miR846 At5g28520 JR/MBP 5'GATCCAAGCACTTAAATTCAA3' 196

236

miR846 At5g49850 JR/MBP 5'CATTCAAGCACTTCGATTCAG3' 338 711 miR859 At1g24793 F-Box 5'TTTGATTTTACAACAGAGACA3' ND miR859 At1g24880 F-Box 5'TTTGATTTTACAACAGAGACA3' ND miR859 At1g25054 F-Box 5'TTTGATTTTACAACAGAGACA3' ND miR859 At1g25141 F-Box 5'TTTGATTTTACAACAGAGACA3' 735 miR859 At1g25210 F-Box 5'TTTGATTTTACAACAGAGACA3' ND miR859 At1g32140 F-Box 5'TTTGATTTTACAACCGAGAGA3' ND miR859 At2g18780 F-Box 5'TTTGATTTCACAACGGAGAGA3' 653 miR859 At2g27520 F-Box 5'TTTGATTTCACAACAGAGACA3' 107 miR859 At3g16820 F-Box 5'TTTGATTTTACAACAGAGAGA3' 684 miR859 At3g20710 F-Box 5'TTTGATTTTACAAGAGAGAGA3' 623 miR859 At3g21170 F-Box 5'TTTGATTTCTCAACAGAGAGA3' 683 ―overlap‖ indicates that the target contains at least part of the GQS. ND = not determined. The last three columns give the distance from the GQS to the target, regardless of direction (5‘ or 3‘).

237

Table D.7: Alternatively spliced loci with G2L1-4 GQS in some gene models. For example, AT1G03850 has two splice variants, one with a G2 GQS in the mature RNA and one that does not. Our interest in this was the possibility of GQS controlling alternative splicing.

# without Gene Name or Loci # with GQS Location GQS Putative Function AT1G03850 1 Intron1 1 Glutaredoxin family protein AT1G04080 1 CDS 1 PRP39 AT1G05400 1 CDS 1 - AT1G05410 1 CDS 1 - ATP-dep Clp protease/ crotonase AT1G06550 1 CDS 1 family protein HPD, PSD1 (phytoene AT1G06570 3 CDS, 3‘UTR 1 desaturation 1) Similar to 2-oxoglytarate- AT1G06640 3 5‘UTR 1 dependent dioxygenase AT1G07170 1 CDS 1 PHF5-like protein ATP-dependent Clp protease AT1G07200 1 CDS 1 related AT1G08320 2 CDS 1 bZIP family transcription factor Ribosomal L19p/L5e family AT1G08845 1 CDS 1 protein AT1G09490 1 CDS 1 - AT1G11120 1 CDS 1 - Adenine nt alpha hydrol.-like AT1G11360 1 CDS 1 family protein AT1G17090 1 5‘UTR 1 - BTB/POZ domain-containing AT1G21780 1 5‘UTR 1 protein UDP-3-O-acyl N- AT1G24793 2 CDS, 3‘UTR 1 acetylglycosamine deacetylase family protein AT1G25500 2 CDS 1 Choline transporter related SPL10 (Squamosa promoter AT1G27370 3 5‘UTR, Intron1 1 binding protein-like 10) AT1G27810 2 CDS, Intron1 1 Transposable element gene AT1G29040 2 CDS, 3‘UTR 2 - 5‘UTR, CDS, Leucine-rich repeat transmem. AT1G29720 2 1 3‘UTR protein kinase ARF6 (Auxin response AT1G30330 1 5‘UTR 1 factor 6) AT1G45248 4 Introns 1 and 2 1 - P-loop containing nucleos. AT1G50140 1 Intron6 1 triphosphate hydrolases superfamily protein AT1G51270 3 CDS, Intron7 1 PIRF1 (Phytochrome interacting AT1G52240 1 CDS 1 ROPGEF1)

238

# without Gene Name or Loci # with GQS Location GQS Putative Function AT1G52320 1 CDS 1 - Plant invertase/pectin AT1G56100 2 3‘UTR 1 methylesterase inhibitor superfamily protein AT1G58520 1 Intron5 1 RXW8 AT1G65040 2 Introns 2 and 3 1 RING/U-box superfamily protein COX19-1 (cytochrome c oxidase AT1G66590 1 CDS 1 19-1) HISN6B (Histidine biosynthesis AT1G71920 3 Introns 1 and 2 1 6B) AT1G73920 1 5‘UTR 1 Lipase family protein Encodes protein with KDOP AT1G79500 5 5‘UTR, CDS 1 synthase activity AT1G79990 3 CDS, Intron23 2 - AT2G03630 1 Intron23 1 5‘UTR, CDS, AT2G04170 5 1 TRAF-like family protein Intron1 SCPL43 (serine carboxypeptidase- AT2G12480 1 CDS 1 like 43) Non-cat. Subunit of nuclear DNA- AT2G15400 1 CDS 1 dep RNA Pol V AT2G16365 3 Intron3 1 F-box family protein AT2G24490 1 3‘UTR 1 RPA2 (Replicon Protein A2) Glycosyl hydrolase family 17 AT2G27500 2 CDS, 3‘UTR 1 protein NAT12 (Nucleobase-ascorbate AT2G27810 2 CDS 1 transporter 12) Disease resistance-responsive AT2G28670 2 5‘UTR, CDS 1 family protein NAD(P)-binding Rossmann –fold AT2G29290 1 5‘UTR 1 superfamily protein Encodes RNA pol I(A) and III(C) AT2G29540 1 5‘UTR 2 14kDa subunit API5 ( inhibitory AT2G34040 1 CDS 1 protein 5) AT2G34100 1 Intron2 1 - Encodes a member of SCAR AT2G34150 1 CDS 1 family AT2G35080 1 CDS 1 Fatty acid hydroxylase AT2G37700 1 CDS 1 superfamily AT2G42510 2 CDS 1 - AT2G43910 2 CDS 1 HOL1 (Harmless to ozone layer 1) AT2G43920 2 CDS 1 HOL2 (Harmless to ozone layer 2) Major facilitator superfamily AT3G01930 1 CDS 1 protein

239

# without Gene Name or Loci # with GQS Location GQS Putative Function AT3G10915 1 5‘UTR, CDS 4 Reticulon family protein AP2/B3-like transcriptional factor AT3G11580 1 CDS 1 family protein Metallo-hydrolase/ oxidored. AT3G13800 1 5‘UTR 1 superfam. protein UGT88A1 (UDP-glucosyl AT3G16520 2 3‘UTR 1 transferase 88A1) AT3G17860 1 Intron1 2 JAI3 (Jasmonate-insensitive 3) MYB15 (MYB domain protein AT3G23250 1 5‘UTR 1 15); R2R3 factor gene family Eukaryotic aspartyl protease AT3G25700 1 CDS 1 family protein Encodes protein phosphat. 2A AT3G26020 1 Intron1 3 (PP2A) beta subunit Transductin/WD-40 repeat family AT3G33530 1 CDS 1 protein HSD2 (hydroxysteroid AT3G47350 1 CDS 1 dehydrogenase 2) basic HLH DNA binding AT3G56770 1 CDS 1 superfamily protein AT4G01400 2 CDS 1 - EDA35 (Embryo sac development AT4G05440 1 CDS 1 arrest 35) CTC1 (Conserved telomere AT4G09680 1 Intron15 1 maintenance component 1) Integrase-type DNA binding AT4G13040 2 CDS, Intron1 1 superfamily protein Bifunc. inhib/ lipid-transfer AT4G15160 1 CDS 1 protein/seed storage 2S albumin superfam. protein LSD1 (lesion stimulating disease AT4G20380 7 Introns 3 and 4 1 1) NAD(P)-binding Rossmann –fold AT4G20460 1 CDS 1 superfamily protein Galactosyltransferase family AT4G21060 1 CDS 1 protein OSBP (Oxysterol binding protein) AT4G22540 1 CDS, Intron1 2 related protein 2A AT4G29020 2 CDS, Intron1 1 - AT4G29810 1 Intron1 1 MKK2 (MAP kinase kinase 2) UDP-N-acetylglucosamine (UAA) AT4G31600 1 3‘UTR 1 transporter family ATP binding microtubule motor AT4G38950 1 5‘UTR 1 family protein MYB33 (MYB domain protein AT5G06100 1 3‘UTR 2 33) AT5G06420 1 5‘UTR 1 Zn finger family protein Member of the SYP13 gene AT5G08080 1 CDS 1 family

240

# without Gene Name or Loci # with GQS Location GQS Putative Function NAC082 (NAC domain AT5G09330 3 5‘UTR, Intron1 1 containing protein 82) CDS, Introns 6 DEAD/DEAH box RNA helicase AT5G11200 3 1 and 8 family protein Transductin/WD40 repeat-like AT5G12920 2 CDS, Intron2 1 superfamily protein Encodes 5-formyltetra hydrofolate AT5G13050 1 CDS 1 cycloligase Alpha/beta-Hydrolases AT5G18630 2 3‘UTR 1 superfamily protein 5‘UTR, CDS, Heavy metal transport/ detox. AT5G19090 3 1 Introns 4 and 5 superfamily protein AT5G19280 2 5‘UTR, CDS 1 RAG1 (Root attenuated growth 1) CDS, Introns 9 CLT1 (Chloroquine-resistance AT5G19380 2 1 and 11 transporter 1) Zn-dep. Exopeptidases AT5G20660 1 CDS 1 superfamily protein AT5G22040 1 Intron1 1 - DNAJ heat shock N-term. AT5G23590 1 5‘UTR 1 domain-containing protein AT5G35603 1 CDS 1 - AT5G40500 1 CDS 1 - AT5G40810 2 5‘UTR, CDS 1 Cytochrome C1 family Nucleotide-sugar transporter AT5G41760 1 3‘UTR 1 family protein AT5G42560 1 CDS 2 ABA-responsive family protein Heavy metal transport/ detox. AT5G50740 2 CDS 1 superfamily protein AT5G51460 1 Intron1 2 - Ypt/Rab-GQP domain of gyp1p AT5G53570 2 5‘UTR, CDS 1 superfamily protein Leucine-rich repeat protein kinase AT5G58300 1 Intron1 1 family protein NPX1 (Nuclear protein X1) ; AT5G63320 1 Intron6 2 regulates ABA responses Haloacid dehalogenase-like AT5G65140 1 CDS 1 hydrolase (HAD) family AT5G66600 2 CDS 1 - AT5G66930 1 CDS 1 - Casein kinase II catalytic subunit AT5G67380 1 CDS 1 alpha The GQS location listed includes all locations for gene models that contain a GQS to conserve space. Gene models from the same loci with a GQS could have them in different regions. For example, At4g13040 has three gene models: X.1 does not have a GQS, X.2 has a GQS in Intron 1 and X.3 has GQS in the CDS and Intron 1.

241

VITA Melissa A. Mullen

EDUCATION Ph.D., Chemistry, The Pennsylvania State University (August 2011) B.A., Chemistry and Spanish, Colby College (May 2005), graduated Magna Cum Laude

AWARDS AND HONORS Apple Fellowship (2010-2011) Women in the Sciences and Engineering Travel Grant (2010) RNA Society of North Carolina RNA Symposium Poster Award (2009) Department of Chemistry Graduate Student Leadership and Service Award (2009) Honorable Mention, NSF Graduate Research Fellowship (2006)

PUBLICATIONS Bove, J., Hord, C.L.H., Mullen, M.A. The blossoming of RNA Biology: Novel Insights from Plant Systems. 2006 RNA 12: 2035-2046.

Mullen, M.A., Olson, K.J., Dallaire, P., Major, F., Assmann, S.M., Bevilacqua, P.C. RNA G- quadruplexes in the model plant species Arabidopsis thaliana: Prevalence and possible functional roles. 2010 Nucleic Acids Res. 38: 8149-8163.

Mullen, M.A., Assmann, S.M., Bevilacqua, P.C. Potential for a binary gene response: RNA G- quadruplexes with fewer quartets fold with higher cooperativity. 2011 In preparation.

PRESENTATIONS Mullen, M.A., Olson, K.J., Dallaire, P., Major, F., Assmann, S.M., Bevilacqua, P.C. G- quadruplexes in Arabidopsis thaliana: Folding cooperativity and implications for drought stress regulation. RustBelt RNA Meeting. 2010, Cleveland, OH (Talk).

Mullen, M.A., Olson, K.J., Dallaire, P., Major, F., Assmann, S.M., Bevilacqua, P.C. Prevalence and biological significance of G-quadruplexes in Arabidopsis thaliana. Zing Nucleic Acids Conference. 2010, Puerto Morelos, Mexico (Poster).

Mullen, M.A., Olson, K.J., Assmann, S.M., Bevilacqua, P.C. G-quadruplexes in Arabidopsis thaliana: Prevalence and possible roles in RNA during stress. Symposium on RNA Biology VIII: RNA Tool and Target. 2009, Research Triangle Park, NC (Poster).

PROFESSIONAL ASSOCIATIONS American Chemical Society (ACS) RNA Society Graduate Women in Science (GWIS)