The Pennsylvania State University

The Graduate School

Eberly College of Science

THE IMPACT OF PRIMARY MICRORNA STRUCTURE ON RECOGNITION BY THE

MICROPROCESSOR COMPLEX IN MICRORNA MATURATION

A Dissertation in

Chemistry

by

Kaycee Andrea Quarles

© 2015 Kaycee Andrea Quarles

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

May 2015

The dissertation of Kaycee Andrea Quarles was reviewed and approved* by the following:

Scott A. Showalter Associate Professor of Chemistry Dissertation Advisor Chair of Committee

Philip C. Bevilacqua Professor of Chemistry

Christine D. Keating Professor of Chemistry

Katsuhiko Murakami Associate Professor of Biochemistry and

Barbara J. Garrison Shapiro Professor of Chemistry Head of the Department of Chemistry

*Signatures are on file in the Graduate School.

ABSTRACT

Since their discovery over a decade ago, thousands of microRNAs (miRNAs) have been found across all multicellular organisms. These, in combination with small interfering RNAs (siRNAs), make up the RNA silencing pathway, also called the RNA interference (RNAi) pathway. Mature miRNAs are ~22-nucleotide-long, single-stranded non-coding RNAs that participate in various cellular, developmental, and differentiation processes via post-transcriptional regulation of gene expression for more than 90% of human genes. Therefore, these key RNAs have been linked to several disease states including cancer, neurodegenerative diseases, cardiac disease, diabetes, and numerous viral diseases. Recently, they have become a key target for the medical RNA therapeutics community.

The human canonical miRNA maturation pathway involves a series of cleavage steps beginning in the nucleus and ending in the cytoplasm, where the final mature miRNA down-regulates gene expression via “silencing” messenger RNA translation. Therefore, these small RNAs down-regulate the expression of every protein within an organism, thus potentially controlling all bodily processes. The first processing step in the nucleus involves cleavage of the miRNA precursor by the Microprocessor complex, consisting minimally of the RNase III enzyme Drosha and the double-stranded RNA (dsRNA) binding protein DGCR8. The second processing step involves an analogous complex consisting minimally of the RNase III enzyme Dicer and the dsRNA binding protein TRBP. This pathway is unique because all of these proteins contain dsRNA binding domains (dsRBDs) that help to recognize the miRNA precursors in the cell.

However, much is still unknown about the processing of these miRNA precursors into their final mature forms. Although it is known that both Drosha and DGCR8 are required for Microprocessor activity, the molecular mechanism of RNA substrate recognition by these proteins is still not fully known. In particular, the means by which Drosha locates and recognizes the exact cleavage site on the RNA plays a critical role in Microprocessor efficiency and must be known to fully understand miRNA biogenesis.

Literature suggests that the recognition and cleavage of miRNA precursors is in part based on unique structural characteristics of the RNA, which guide the proteins to their cut-site locations. However, there are currently no experimentally-determined structures of entire miRNA precursors. Therefore, it is necessary to biochemically determine native solution structures of these RNAs as they would be seen by the Microprocessor if the maturation process at the molecular level is to be characterized. Atomic resolution methods for RNA structure determination pose several challenges that are not yet overcome; however, a variety of RNA secondary structure mapping and RNA modeling techniques have proven successful for determining structures of other large RNAs comparable to those encountered in this pathway. Therefore, a primary aim for this thesis was to combine these approaches to analyze structurally diverse miRNA precursors, which revealed a possible Microprocessor recognition site on the RNA.

In addition to the recognition site, RNA structure mapping yielded a consistent display of structural deformations periodically placed along the miRNA precursor. Surprisingly, these periodic deformations along miRNA precursors correlate to the binding surface required for dsRBDs, such as those found in both protein components of the Microprocessor complex: Drosha contains only a single dsRBD, whereas DGCR8 has two in tandem. dsRBDs are characterized as binding to RNA with little sequence specificity; therefore, it is reasonable to hypothesize that DGCR8 function is dependent on the recognition of specific structural features in the miRNA precursor. This thesis utilizes a variety of binding techniques to fully characterize the binding of these dsRBDs with RNA, taking into account the different structural features natively present in miRNA precursors. Interestingly, I found that the dsRBD located in Drosha is not capable of binding RNA at all, leaving DGCR8 as the primary RNA binding protein. Furthermore, DGCR8 showed little sensitivity to the presence of structural deformations within miRNA precursors, leading us to believe that its tandem dsRBDs are capable of cooperatively binding around them. In the end, while DGCR8 is necessary for dsRNA binding and recruitment to the Microprocessor, it is not sufficient on its own in directing the exact Drosha cut-site position on miRNA precursors.

As mentioned, the dsRBDs from Drosha and DGCR8 exhibit very different binding affinities for RNA. The differences seen in RNA binding by dsRBDs from these various proteins call for an explanation. These dsRBDs display low sequence conservation, which may result in small differences in their folded domains. Therefore, an ongoing aim for this thesis is to examine the structure of dsRBDs bound to dsRNA in order to determine key regions governing the binding interaction as well as portions of the dsRBD important for maintaining the folded domain. However, due to solubility issues, DGCR8 was not amenable to studying this interaction at the atomic level. Instead, results from NMR and X-ray crystallography data of the dsRBD from Dicer (the cytoplasmic processing enzyme) bound to dsRNA will serve as the comparison to bound structures in the Protein Data Bank of dsRBDs with different amino acid sequences.

TABLE OF CONTENTS

LIST OF FIGURES ...... xii

LIST OF TABLES ...... xvii

ACKNOWLEDGEMENTS ...... xviii

Chapter 1: Introduction ...... 1

1.1 MicroRNA Maturation Pathway in RNA Interference ...... 1

1.1.1 Primary MicroRNA Structural Recognition by the Microprocessor Complex ...... 4

1.1.2 The Microprocessor Complex of DGCR8 and Drosha ...... 8

1.2 RNA Structure Determination ...... 12

1.2.1 Atomic Resolution Methods ...... 14

1.2.2 Chemical and Enzymatic Probing Methods ...... 16

1.2.3 MC-Pipeline Modeling ...... 18

1.3 Double-stranded RNA Binding Domains ...... 21

1.4 Methods Used for Studying dsRBD Binding ...... 23

1.4.1 Electrophoretic Mobility Shift Assays ...... 25

1.4.2 Fluorescence Polarization ...... 26

1.4.3 Isothermal Titration Calorimetry ...... 26

1.4.4 Analytical Ultracentrifugation ...... 27

1.4.5 Circular Dichroism ...... 28

1.5 Dissertation Outline ...... 28

1.6 Acknowledgements ...... 29

1.7 References ...... 30

Chapter 2: The Use of SHAPE Chemistry to Determine RNA Structure in Solution ...... 34

2.1 Introduction ...... 34

2.2 Materials and Methods ...... 40

2.2.1 RNA Preparation ...... 40

2.2.2 SHAPE Reagent Preparation ...... 40

2.2.3 RNA Modification by SHAPE Reagents ...... 41

2.2.4 Primer Extension ...... 41

2.2.5 Processing of SHAPE Data ...... 42

2.3 Results and Discussion ...... 42

2.3.1 Thermodynamically Predicted Primary MicroRNA Structures ...... 42

2.3.2 Secondary Structures Determined Under Native Conditions ...... 43

2.3.3 Buffer Compatibility with SHAPE Reagents ...... 49

2.3.4 pH Variability in SHAPE Reactions ...... 51

2.3.5 Monitoring RNA Unfolding with Temperature ...... 53

2.3.6 Magnesium Dependence ...... 55

2.3.7 SHAPE Reagents ...... 56

2.4 Conclusion ...... 59

2.5 Acknowledgements ...... 60

2.6 Appendix ...... 61

2.7 References ...... 63

Chapter 3: Ensemble Analysis of Primary MicroRNA Structure Reveals an Extensive Capacity to Deform Near the Drosha Cleavage Site ...... 66

3.1 Abstract ...... 66

3.2 Introduction ...... 67

3.3 Materials and Methods ...... 69

3.3.1 RNA Preparation ...... 69

3.3.2 RNA Modification by 1M7 ...... 69

3.3.3 Primer Extension ...... 70

3.3.4 Processing of SHAPE Data ...... 70

3.3.5 Structure Mapping ...... 71

3.3.6 MC-Fold and MC-Sym Simulation...... 71

3.3.7 Drosha Processing Assays ...... 73

3.4 Results ...... 73

3.4.1 SHAPE-Derived Primary miRNA Structures ...... 73

3.4.2 Ribonuclease Cleavage Structure Mapping ...... 80

3.4.3 Secondary Structure Refinement by MC-Fold ...... 82

3.4.4 Global Features of 3D Structure Modeling Using MC-Sym ...... 84

3.4.5 Primary miRNA Structural Heterogeneity ...... 86

3.4.6 Drosha Processing of Primary miRNAs ...... 92

3.5 Discussion ...... 95

3.6 Acknowledgements ...... 98

3.7 References ...... 99

3.8 Supporting Information ...... 103

Chapter 4: Deformability in the Cleavage Site of Primary MicroRNA is Not Sensed by the Double-Stranded RNA Binding Domains in the Microprocessor Component DGCR8 ...... 111

4.1 Abstract ...... 111

4.2 Introduction ...... 112

4.3 Materials and Methods ...... 116

4.3.1 Protein Preparation ...... 116

4.3.2 Primary MicroRNA Preparation ...... 117

4.3.3 5´-End Labeling of RNA...... 117

4.3.4 Native Gel RNA Purification ...... 118

4.3.5 Electrophoretic Mobility Shift Assays ...... 118

4.3.6 Drosha Processing and Competition Processing Assays ...... 119

4.4 Results ...... 119

4.4.1 Binding to Primary MicroRNAs ...... 120

4.4.2 Binding to Perfect Watson-Crick Duplexes ...... 122

4.4.3 Binding to Duplexes with Flanking Single Strands ...... 129

4.4.4 Binding to RNA Bearing Terminal Loop Structures ...... 132

4.4.5 Binding to Pri-miRNA with Reduced Stem Flexibility ...... 135

4.4.6 RNA Binding by DGCR8’s Heme-binding Domain ...... 138

4.5 Discussion ...... 142

4.6 Acknowledgements ...... 145

4.7 References ...... 146

4.8 Supporting Information ...... 150

4.8.1 Supporting Methods ...... 150

4.8.2 Supporting Figures ...... 151

Chapter 5: Structural Characterization of Dicer’s dsRBD Complexed with dsRNA ...... 169

5.1 Abstract ...... 169

5.2 Introduction ...... 169

5.3 Materials and Methods ...... 171

5.3.1 Protein Preparation ...... 171

5.3.2 RNA Preparation ...... 171

5.3.3 Analytical Ultracentrifugation ...... 171

5.3.4 Protein:RNA Complex Formation ...... 172

5.3.5 NMR Methods ...... 172

5.3.6 X-Ray Crystallography ...... 173

5.3.7 Crystal Looping ...... 174

5.4 Results and Discussion ...... 174

5.4.1 Analytical Ultracentrifugation ...... 175

5.4.2 Nuclear Magnetic Resonance ...... 176

5.4.3 X-Ray Crystallography ...... 187

5.5 Conclusion ...... 190

5.6 Acknowledgements ...... 190

5.7 References ...... 191

Chapter 6: Perspectives in RNA Recognition by Proteins in the MicroRNA ...... 192

Appendix: The Application of Biochemical Techniques to Monitor dsRBD Binding ...... 196

A.1 Introduction ...... 196

A.2 Materials and Methods ...... 196

A.2.1 RNA Preparation ...... 196

A.2.2 Protein Preparation ...... 197

A.3 Results ...... 197

A.3.1 Isothermal Titration Calorimetry ...... 197

A.3.2 Circular Dichroism ...... 199

A.3.3 Analytical Ultracentrifugation ...... 201

A.3.4 Biolayer Interferometry ...... 203

A.3.5 dsRBD Footprinting Using In-Line Probing ...... 205

A.3.6 dsRBD Footprinting Using SHAPE Chemistry ...... 207

A.3.7 Mass Spectrometry Analysis of UV-induced Cross-linked Complexes ...... 209

A.4 Acknowledgements ...... 212

A.5 References ...... 213

LIST OF FIGURES

Figure 1-1. Schematic of the mammalian canonical miRNA maturation pathway...... 2

Figure 1-2. X-ray structure of pre-mir-30a bound to the Exportin-5:RanGTP complex ...... 4

Figure 1-3. Structural features commonly found in hairpin pri-miRNAs ...... 6

Figure 1-4. Proposed bending model of pri-miRNA recognition by DGCR8-Core currently suggested in the literature ...... 11

Figure 1-5. Structure of A-form double-stranded RNA ...... 13

Figure 1-6. DSC shows that pri-miRNAs shift from a hairpin monomer to a dimer with increasing salt concentration ...... 15

Figure 1-7. The MC-Pipeline software package contains two algorithms ...... 20

Figure 1-8. Sequence alignment for all mammals with known dsRBD sequence within the miRNA maturation pathway ...... 22

Figure 1-9. Electrostatic interactions governing dsRBD:dsRNA binding ...... 23

Figure 1-10. Due to the non-sequence specificity of dsRBDs, binding to dsRNA is like binding to a lattice of possible identical binding sites ...... 24

Figure 2-1. SHAPE chemistry reaction using 1M7 (1-methyl-7-nitroisatoic anhydride) as an example, yielding the 2´-O-adduct on the RNA ...... 36

Figure 2-2. Outline of SHAPE chemistry procedure ...... 38

Figure 2-3. SHAPE cassette linkers added onto the RNA to fully map the pri-miRNAs ...... 39

Figure 2-4. Predicted RNA secondary structures as determined by mfold for the pri-miRNAs used in this study ...... 43

Figure 2-5. SHAPE reactions of pri-mir-16-1, pri-mir-107, and pri-mir-30a ...... 46

Figure 2-6. SHAPE reactions of DGCR8’s 5´-UTR show re-folding of the native structure induced by the SHAPE cassette ...... 48

Figure 2-7. The 1M7 anhydride SHAPE reagent does not modify the RNA in the presence of cacodylate buffer ...... 51

Figure 2-8. Using lower pH in the SHAPE reaction typically yields lower SHAPE reactivities, which can be somewhat recovered with an increase in reaction time ...... 52

Figure 2-9. UV melt of pri-mir-107 showing the absorbance change with temperature as the RNA unfolds ...... 53

Figure 2-10. SHAPE chemistry used to monitor the unfolding of RNA ...... 55

Figure 2-11. SHAPE chemistry shows that pri-miRNAs do not exhibit tertiary structure ...... 56

Figure 2-12. Various SHAPE reagents can be used to investigate the RNA’s different time-dependent dynamics ...... 58

Figure 3-1. SHAPE-constrained MC-Fold calculations yield secondary structures with embedded estimation of conformational dynamics ...... 77

Figure 3-2. Ribonuclease structure mapping is consistent with the most probable secondary structure resulting from the SHAPE-constrained MC-Fold calculations for pri-mir-16-1, pri-mir-30a, and pri-mir-107 ...... 82

Figure 3-3. Ensemble representation of the top five SHAPE-constrained models generated by MC-Sym for pri-mir-16-1, pri-mir-30a, and pri-mir-107 ...... 86

Figure 3-4. Secondary structures of pri-miRNA molecules harbor multiple dynamic bulges and internal loops ...... 88

Figure 3-5. Secondary structures of pri-miRNA molecules harbor multiple non-Watson-Crick mismatches that are predicted to be well-ordered by SHAPE reactivity ...... 91

Figure 3-6. Drosha processing of pri-miRNAs to pre-miRNAs in vitro confirms the necessity of hot spot flexibility for efficient cleavage ...... 95

Figure S3-1. Denaturing polyacrylamide gels of SHAPE reactions for pri-miR-16-1, pri-miR-30a, and pri-miR-107 ...... 103

Figure S3-2. Ribonuclease structure mapping of pri-miR-30a is consistent with the most probable secondary structure resulting from the SHAPE-constrained MC-Fold calculations ...... 104

Figure S3-3. Ribonuclease structure mapping of pri-miR-107 is consistent with the most probable secondary structure resulting from the SHAPE-constrained MC-Fold calculations ..... 105

Figure S3-4. Secondary structures of pri-miR-16-1 derived from SHAPE and MC-Fold probabilities ...... 106

Figure S3-5. Secondary structures of pri-miR-30a derived from SHAPE and MC-Fold probabilities ...... 107

Figure S3-6. Secondary structures of pri-miR-107 derived from SHAPE and MC-Fold probabilities ...... 108

Figure S3-7. Secondary structures of pri-miR-16-1 HS mut derived from SHAPE and MC-Fold probabilities ...... 109

Figure S3-8. Secondary structures of pri-miR-107 HS mut derived from SHAPE and MC-Fold probabilities ...... 110

Figure 4-1. Bending model for pri-miRNA recognition by DGCR8-Core currently supported in the literature ...... 115

Figure 4-2. Secondary structures of in vitro transcribed RNA models for native pri-miRNAs and pre-mir-16-1; and non-native pri-mir-16-1 stem-loop constructs ...... 121

Figure 4-3. Secondary structures of Watson-Crick duplexes derived from pri-mir-16-1 ...... 123

Figure 4-4. Electrophoretic mobility shift assays used to examine binding by DGCR8 to varying lengths of perfect Watson-Crick RNA duplexes ...... 125

Figure 4-5. Competition processing assays were used to corroborate the EMSA results in a more biological context ...... 128

Figure 4-6. Secondary structures of flanking and terminal loop duplexes from pri-mir-16-1 .... 131

Figure 4-7. Drosha processing assays show that the secondary structure of pri-miRNAs is an important determinant of Microprocessor cleavage efficiency in vitro ...... 134

Figure 4-8. Secondary structures of constructs mimicking the imperfections found in pri-mir-16-1: in the context of full-length pri-mir-16-1 and in the context of short duplexes ...... 136

Figure 4-9. Drosha processing assays show that the cysteine residue C352 is an important determinant of Microprocessor cleavage efficiency in vitro ...... 141

Figure S4-1. EMSA data showing that Drosha’s dsRBD does not bind pri-mir-16-1 ...... 151

Figure S4-2. EMSA data for DGCR8-Core binding native pri-miRNAs and pre-mir-16-1 ...... 152

Figure S4-3. EMSA data for DGCR8-dsRBD1 binding various native pri-miRNAs ...... 153

Figure S4-4. Filter binding assays give weaker binding affinities than EMSAs for DGCR8-Core binding ...... 154

Figure S4-5. EMSA data for DGCR8-Core binding perfect duplex RNA of varying lengths .... 155

Figure S4-6. EMSA data for DGCR8-dsRBD1 binding perfect duplex RNA ...... 156

Figure S4-7. Competition processing data for the three shorter perfect RNA duplexes ...... 157

Figure S4-8. Competition processing data for the longer perfect RNA duplexes ...... 158

Figure S4-9. EMSA data for DGCR8-Core binding various flanking duplexes ...... 159

Figure S4-10. EMSA data for both DGCR8-Core and DGCR8-dsRBD1 binding ssRNA ...... 160

Figure S4-11. EMSA data for DGCR8-dsRBD1 binding various flanking duplexes ...... 161

Figure S4-12. EMSA data for DGCR8-Core binding ds16 with varying terminal loops ...... 162

Figure S4-13. EMSA data for DGCR8-dsRBD1 binding ds16 with varying terminal loops ..... 163

Figure S4-14. EMSA data for DGCR8-Core binding pri-mir-16-1 mutants ...... 164

Figure S4-15. EMSA data for DGCR8-Core binding ds22 and similar length duplexes harboring imperfections found natively in the hot spot and secondary imperfections of pri-mir-16-1 ...... 165

Figure S4-16. EMSA data for DGCR8-HBD-Core binding pri-mir-16-1 ...... 166

Figure S4-17. EMSA data for DGCR8-HBD-Core (C352A) binding pri-mir-16-1 ...... 167

Figure S4-18. UV-visible absorption spectrum of DGCR8-HBD-Core with and without the C352A mutation ...... 168

Figure 5-1. AUC analysis suggests that 2 Dicer-dsRBDs are capable of binding to a single ds22 molecule ...... 176

Figure 5-2. NMR titration of 15N-Dicer-dsRBD with ds33...... 178

Figure 5-3. Native gel showing the complexes formed between Dicer-dsRBD and various lengths of dsRNA ...... 179

Figure 5-4. NMR 15N-HSQC spectra of 15N-Dicer-dsRBD bound to dsRNA at 298K ...... 181

Figure 5-5. NMR 15N-HSQC spectra of 15N-Dicer-dsRBD bound to dsRNA at 308K ...... 182

Figure 5-6. NMR 15N-HSQC spectra of 15N-Dicer-dsRBD in its unbound state ...... 184

Figure 5-7. Thermal denaturation midpoint of Dicer-dsRBD in its free state ...... 185

Figure 5-8. HSQC overlay of Dicer-dsRBD in its free state and bound to ds16 ...... 186

Figure 5-9. Overlay of HSQCs produced from NOE-off and NOE-on effects for Dicer-dsRBD bound to ds16 at 308K ...... 187

Figure 5-10. Crystals of Dicer-dsRBD bound to ds10 ...... 188

Figure 5-11. Diffraction patterns for crystals of Dicer-dsRBD bound to ds10 ...... 189

Figure A-1. ITC data for TRBP-ΔC binding ds22 and for DGCR8-Core binding ds33 ...... 198

Figure A-2. CD data for TRBP-ΔC binding ds33 and for DGCR8-Core binding ds33 ...... 200

Figure A-3. AUC data for Dicer-dsRBD binding ds22 and DGCR8-Core binding ds22 ...... 202

Figure A-4. BLI data of dsRBDs interacting with either the streptavidin tips or free biotin ...... 204

Figure A-5. In-line probing results are ambiguous for determining regions of dsRBD binding 206

Figure A-6. Depiction of possible SHAPE reactivity results upon DGCR8 binding...... 208

Figure A-7. dsRBD footprinting using SHAPE chemistry ...... 209

Figure A-8. UV-induced cross-linking of DGCR8-Core with pri-mir-16-1 ...... 210

Figure A-9. Mass spectrometry analysis of DGCR8-Core cross-linked to dsRNA ...... 212

LIST OF TABLES

Table 3-1. Complete Annotation of Secondary Structure Defects in the Stem Region of Wild-Type pri-mir-16-1, pri-mir-30a, and pri-mir-107 ...... 79

Table 4-1. EMSA best-fit parameters for DGCR8-Core and DGCR8-dsRBD1 binding to the native pri- and pre-miRNA constructs ...... 122

Table 4-2. EMSA best-fit parameters for DGCR8-Core and DGCR8-dsRBD1 binding to the indicated duplexed RNA constructs ...... 126

Table 4-3. Competition processing assays reporting inhibition of pri-mir-16-1 cleavage by the Microprocessor ...... 129

Table 4-4. EMSA best-fit parameters for DGCR8-Core binding to pri-mir-16-1, bearing the indicated mutations ...... 137

Table 4-5. EMSA best-fit parameters for DGCR8-HBD-Core (with and without the C352A mutation) binding to pri-mir-16-1, pri-mir-16-1-TL, and ds44 ...... 140

ACKNOWLEDGEMENTS

This dissertation is one of my proudest achievements to date. When I started college, I never thought I would have been here. I received my Bachelor’s degree in Chemical Engineering from Georgia Tech. This graduate school route all started because Georgia Tech had recently changed the Chemical Engineering curriculum to provide a Biotechnology option, which included a variety of biochemistry and cell biology courses. Surprisingly, I thoroughly enjoyed the DNA replication and RNA transcription portion of my freshman biology course, and decided to do undergraduate research in a biochemistry lab. Luckily, Dr. Nicholas Hud took a chance on an engineering student, and gave me my first research position. After only one year of working in the lab, I fell in love with biological processes. Towards the end of college, while all of my colleagues were busy taking job offers, I decided to apply for graduate school and switched from being a Chemical Engineer to being a Biochemist. Since then, I have fought my way through graduate school and have several people to thank along the way for making this possible.

First, I would like to thank my advisor, Dr. Scott Showalter, who has been instrumental in developing me as a chemist, while breaking my engineering training. While he was extremely tough on me, I know that it was always in my best interest and has only made me stronger. And although he has taught me many things, I know I have been just as indispensable in his development as an advisor. As for my committee members, thank you so much for attending my meetings and being there for me when I had any questions. Thanks to Dr. Christine Keating for teaching me to consider my audience when I give talks at conferences. Thanks to Dr. Katsu Murakami for crystallization help and taking my samples with him to the synchrotron. And a special thanks to Phil Bevilacqua, who has been like my second advisor, and accepted me as an “honorary lab member” in his own lab. I would also like to thank my mentor Dr. Howard Carman from Eastman Chemical Company as well as my undergraduate research advisor Dr. Nick Hud at Georgia Tech, who have both continued to write numerous recommendation letters for me so that I can get the best job possible. Also, I really have to thank Nick for helping me get into graduate school. I would also like to thank Michelle Hastings and Mallory Havens at RFUMS for the fruitful collaboration with cell culture and processing assays.

My fellow labmates have been extremely involved in keeping me sane every day. These include all past and present members: Chris Wostenberg, Chad Lawrence, Alain Bonny, Ellen Forsyth, Rico Acevedo, Monique Bastidas, Deb Sahu, Brittney Nagle, Eric Gibbs, Tony Zidell, Durga Chadalavada, Josh Kranick, Declan Evans, and Taylor Conrad. In particular, I would like to thank Chris, Chad, Alain, and the Bevilacqua Lab for helping to get me started in the lab during my first year when we were a small lab needing lots of advice. In addition, I have to thank Rico and Monique (a.k.a. “The Terrible Trio”) for making the last three years in lab a blast and renewing my love for soccer. We have had a lot of good times and working anywhere else will never be the same without them. I would also like to thank my undergraduate student Ellen Forsyth who worked with me for three years. She helped me to become a better mentor and also lessened my workload by carrying out some of the experiments for my first paper, and doing so spectacularly. I would also like to thank our administrative assistant Sabrina Glasgow for helping me with scheduling rooms, flights, and mailing plasmids, and also for our random counseling sessions.

Lastly, I have to thank my friends and family who have been there for me every time I questioned why I am putting myself through the torture of graduate school. My college roommate, Megan Brodale, provided emotional support while I endured chemical engineering classes and told me that it is okay to be a firecracker. My roommate and best friend through grad school, Jessica Nichol, allowed me to vent to her every day and was there for me during the worst of times. She even went to the grocery store and made dinner for me during second-year orals. Lastly, I would like to thank my boyfriend Todd Lurain for enduring my stress over the past couple of years and for making me endless dishes of meat and potatoes because I did not have time to cook while thesis writing.

Of course I would not be here without my parents. In my opinion, they have done so well at raising me and turning me into a well-rounded individual. I love you both dearly and have missed our pastimes just chit-chatting on the front porch. In particular, I would like to thank my father who was very strict with me and kept me away from bad influences; otherwise, I probably would not be where I am today. And I would like to thank my mother who originally sparked my interest in math and science, because she is a chemical engineer herself. She also transformed my view of learning by telling me in college, “Your professors are not teaching you how to perform solid-liquid-vapor separations on a distillation column because they think you will need to know it for your career, they are teaching you how to use all of your resources to solve a difficult problem.”

Lastly, I would like to share a quote from one of the most influential books I ever read that provided inspiration to me when I encountered failures in my experiments (and if you read my Appendix section, you will see that I encountered them frequently):

“It’s an experience like no other experience I can describe, the best thing that can happen to a scientist, realizing that something that’s happened in his or her mind exactly corresponds to something that happens in nature. It’s startling every time it occurs. One is surprised that a construct of one’s own mind can actually be realized in the honest-to-goodness world out there. A great shock, and a great, great joy.”

- Leo Kadanoff

Chapter 1

Introduction

1.1 MicroRNA Maturation Pathway in RNA Interference

Since their discovery over a decade ago, thousands of microRNAs (miRNAs) have been found in worms, flies, plants, and mammals.1 Found in all multicellular organisms and encoded by some viruses,2-4 mature miRNAs are ~22-nucleotide-long, single-stranded non-coding RNAs that participate in various cellular, developmental, and differentiation processes via post-transcriptional regulation of gene expression.5 Due to their role in developmental processes, misregulated miRNA expression has been linked to several disease states including cancer,6, 7 neurodegenerative diseases,8, 9 cardiac disease,10 diabetes,11 and numerous viral diseases.12 Therefore, miRNAs have become a promising target for RNA interference (RNAi)-based therapeutics.13-15

As part of RNAi, the human canonical miRNA maturation pathway involves a series of cleavage steps beginning in the nucleus and ending in the cytoplasm where the final mature miRNA is incorporated into the RNA induced silencing complex (RISC) (Fig. 1-1). Briefly, in the first processing step in the nucleus, the newly transcribed miRNA transcripts of ~100 nucleotides, known as primary miRNAs (pri-miRNAs), are cleaved by the Microprocessor complex. The Microprocessor complex consists minimally of the RNase III enzyme Drosha and the double-stranded RNA (dsRNA) binding protein DGCR8. Cleavage results in precursor miRNAs (pre-miRNAs) of ~70 nucleotides containing a two-nucleotide-long 3´-overhang at the Drosha cut site.16 The pre-miRNAs are then exported into the cytoplasm by Exportin-5 (XPO5) where they are processed by the minimal complex of Dicer and TRBP, in which the RNase III enzyme Dicer cleaves off the terminal loop to yield the ~22-nucleotide miRNA duplex.16 The helicase domain of Dicer then unwinds the miRNA duplex so that the single-stranded mature miRNA, incorporated into RISC containing the essential Argonaute2 protein, may base pair with the targeted messenger RNA (mRNA) and down-regulate its translation.5

Figure 1-1. Schematic of the mammalian canonical miRNA maturation pathway, beginning in the nucleus and ending in the cytoplasm. The first processing step in the nucleus involves the Microprocessor complex consisting of Drosha and DGCR8. The second processing step in the cytoplasm consists minimally of Dicer and TRBP, followed by incorporation into RISC containing Argonaute 2 (AGO2).

Despite the potential aid that understanding the mechanisms involved in miRNA maturation would provide to the medical community, much is still unknown about the processing of these RNAs into their final mature forms. In particular, very little is known about the maturation process of the initial pri-miRNA at the molecular or atomic level, and the role its secondary structure plays in targeting it to the Microprocessor complex for maturation is only qualitatively defined. This lack of mechanistic understanding is partially due to the existence of only one known structure of a miRNA precursor, which is of a pre-miRNA bound to the Exportin-5:RanGTP complex (Fig. 1-2),17 not of a pri-miRNA substrate. In terms of unbound pri-miRNA structures, there are none, and the knowledge of experimentally-derived structures of pri-miRNAs is a necessity for structure-based Microprocessor recognition predictions.

Figure 1-2. X-ray structure of pre-mir-30a bound to the Exportin-5:RanGTP complex (PDB 3A6P). The structure shows that Exportin-5 (gold) envelops most of the pre-miRNA (black) by burying the 2-nucleotide 3´-overhang (Ran protein in green). As can be seen, there is no diffraction for the disordered terminal loop and the UC bulge (positions indicated by dotted red circles), demonstrating that even this pre-miRNA structure is incomplete.

1.1.1 Primary MicroRNA Structural Recognition by the Microprocessor Complex

Although both Drosha and DGCR8 are required for Microprocessor activity, the molecular mechanism of RNA substrate recognition by these proteins and the determinants of specificity in their interactions with pri-miRNAs are still not fully known. Furthermore, the ability of the Microprocessor to distinguish pri-miRNAs from other RNA hairpins in the nucleus is also not known; this differentiation between hairpins is believed to be based on unique structural characteristics of the RNA. In fact, the only well-understood element of the pri-miRNA processing step is the cleavage mechanism of the RNase III domains in Drosha;18 however, how these domains select the cut site within the RNA substrate has not been established.

The native pri-miRNA substrate for DGCR8 is described as a long ~100-nucleotide hairpin RNA harboring multiple imperfections, a terminal loop, and an unpaired flanking region connected by the single-stranded RNA:double-stranded RNA junction (termed the ssRNA-dsRNA junction); see Figure 1-3 for structural features common to pri-miRNAs. Multiple studies have investigated the importance and occurrence of imperfections within the double-stranded stem of the pri-miRNA, yet no consistent features have been seen that would serve as a recognition site for the Microprocessor. While these studies support the importance of a ssRNA-dsRNA junction, it is still disputed whether a large, flexible terminal loop is required for Drosha processing.19-21


Figure 1-3. Structural features commonly found in hairpin pri-miRNAs. Each RNA nucleobase is represented as a blue peg, which are all connected by the phosphodiester backbone (red). When two nucleobases interact across from one another via hydrogen bonding, they make a base pair; otherwise, they can exist in a variety of alternative arrays such as single-stranded in the terminal loop, an imperfection (as a bulged base or part of an internal loop), or in the flanking tail. The ssRNA-dsRNA junction is also pointed out for clarity.

In order to determine the important secondary structure characteristics of the pri-miRNA necessary for efficient processing by the Microprocessor, several studies have been done examining the consequences of sequence and secondary structure within the RNA.16, 19-22 These studies are currently dominated by in vitro Drosha processing assays, which essentially monitor the amount of pre-miRNA generated from Drosha cleavage within the Microprocessor complex.

These assays show that the nucleotide sequence within the stem and loop regions appears not to be critical for pre-miRNA production, nor is the creation or deletion of bulges in the middle of the stem.16, 21 In contrast, the creation or deletion of imperfections surrounding the Drosha cleavage site and the removal of the single-stranded flanking strands played important roles in Microprocessor cleavage efficiency.16, 19, 21, 23

These key studies, however, differ in their conclusions about the terminal loop size required for efficient Microprocessor activity. Zeng et al. showed that mature miRNA production was inhibited by decreasing the loop size to four nucleotides.20, 21 At odds with this result, Han et al. showed that pre-miRNA production is not appreciably inhibited by the presence of a tetraloop.19 However, these studies were performed with two different RNAs (pri-mir-30a and pri-mir-16-1, respectively) and under different conditions that monitored different end-products (in vivo and in vitro, respectively), which may be the reason for the opposing results. In particular, the discrepancy could be due to slight differences in the structure of the pri-miRNA used, as pri-mir-30a is suggested to have secondary structure formed in its flanking tails past the ssRNA-dsRNA junction that would impede its processing.19 Regardless, the majority of structure-based conclusions about pri-miRNAs from Microprocessor cleavage studies depend only on the secondary structure predicted by mfold or similar computer programs; however, structure mapping of hairpin RNAs has shown that computer-predicted structures oftentimes turn out to be much different from what is experimentally derived.24

As mentioned, several studies have investigated the locations and local structures of imperfections within the double-stranded stem between the ssRNA-dsRNA junction and the terminal loop of pri-miRNAs. These imperfections are important for multiple reasons: they dictate the amount of RNA duplex available for binding by proteins and also impart a degree of flexibility within the pri-miRNA, which can not only provide a recognition site for the cleavage enzyme Drosha, but also allow overall bending of the pri-miRNA for recognition by DGCR8.

Han et al.19 showed that the length of the RNA stem between the Drosha cleavage site and the ssRNA-dsRNA junction is important for Microprocessor efficiency by demonstrating that changing the length correspondingly alters the location of the cut site, such that approximately one turn of RNA helix is maintained between the junction and the cut site. This suggests that sufficient binding surface on the RNA is needed for protein recognition before encountering a disruption in the helix, such as an imperfection. In correlation with this, bioinformatics analyses suggest that a decrease in RNA stability in the form of helical imperfections occurs every 10-12 base pairs (approximately one turn of A-form RNA helix) on the pri-miRNA, which in turn creates a surface that is distinctly recognized by the Microprocessor.25

Bioinformatics studies have also shown that the majority of pre-miRNAs have a 2-nucleotide 3´-overhang,25 which is consistent with the characteristic feature of RNase III enzyme products.18, 26 However, atypical overhangs from Drosha cleavage have been noted at the sites of asymmetric internal loops. This finding suggests that recognition of an asymmetric internal loop at the cleavage site results in improper RNA recognition by Drosha, yielding atypical overhangs post-cleavage. In order to determine the reason for this atypical cleavage effect, we need to first confirm secondary structures of pri-miRNAs biochemically. Overall, the means by which Drosha locates and recognizes the cleavage site plays a critical role in Microprocessor efficiency and must be known to fully understand miRNA biogenesis. Therefore, it is necessary to biochemically determine native solution structures of pri-miRNAs as they would be seen by the Microprocessor if the maturation process at the molecular level is to be characterized.

1.1.2 The Microprocessor Complex of DGCR8 and Drosha

Fully understanding Microprocessor recognition of RNA is not only dependent on the pri-miRNA structure, but also on the proteins that recognize these RNAs and how these proteins work together to carry out their function. It has been exhaustively reported in the literature that both DGCR8 and Drosha are required for processing of pri-miRNAs.19, 27, 28 It is hypothesized that DGCR8 acts as a “molecular anchor” for pri-miRNA processing by forming a “pre-cleavage complex”, which then allows Drosha to bind and cleave at the desired cut site.19 As a molecular anchor, DGCR8 would bind the pri-miRNA first, allowing Drosha to re-orient itself to cleave ~11 base pairs from the ssRNA-dsRNA junction, indicating that Drosha recognizes the single-stranded region without a ssRNA binding domain. The possibility that Drosha and DGCR8 do not complex with each other before interaction with the pri-miRNA is plausible because both of these proteins have been seen to participate in their own independent RNA processing pathways. For example, DGCR8 is seen to independently interact with a plethora of RNAs in the nucleus,29 while Drosha processes simtrons (splicing-independent mirtron-like miRNAs) in a DGCR8-independent manner.30 Taking into account these findings, it remains to be determined how the two proteins collectively recognize and specifically cleave pri-miRNAs in the nucleus.

Of extreme interest in the miRNA maturation pathway is the prevalence of double-stranded RNA binding domains (dsRBDs) within its proteins. In fact, in the mammalian canonical miRNA maturation pathway, there are five key proteins that contain dsRBDs: DGCR8, Drosha, TRBP, PACT, and Dicer. These dsRBD-containing proteins all do their individual part in binding and processing RNAs both in the nucleus and in the cytoplasm; for example, some proteins have evolved to deal with imperfections within the stem of the RNA (e.g., DGCR8 and Drosha together cleave pri-miRNA in the nucleus) while others bind both perfectly and imperfectly duplexed RNA (e.g., TRBP, PACT, and Dicer collectively cleave both small interfering RNA and microRNA precursors). Because these dsRBD-containing proteins all interact with different RNA structures, it is currently unknown how these proteins are capable of distinguishing their preferred RNA targets from other RNAs in the cell.

Interestingly, while Dicer and Drosha each contain a single dsRBD, their cofactor proteins DGCR8, TRBP, and PACT each contain multiple dsRBDs, suggesting that these cofactors may gain a functional advantage from arranging multiple dsRBDs in a single polypeptide chain, possibly yielding the key to RNA discrimination. Despite the likely importance of tandem dsRBD-containing proteins in RNA interference, the characterization of these proteins’ interactions with RNA is still incomplete. Just as there is a lack of pri-miRNA structures, there are also no structures of tandem dsRBDs complexed with RNA. Therefore, other methods must be relied upon for characterizing these interactions.

In the investigation of how the tandem-dsRBD construct of DGCR8 (called DGCR8-Core) interacts with pri-miRNAs, Narry Kim’s group demonstrated that the affinity of a single dsRBD from DGCR8 is moderately weak in comparison to the binding demonstrated upon addition of the second dsRBD.31 Therefore, it is likely that DGCR8 binds in a positively cooperative manner, possibly to enable recognition of longer RNAs. The crystal structure of DGCR8-Core shows two possible dsRBD-binding sites for the pri-miRNA to engage the protein molecule. Therefore, based on the positioning of dsRNA binding faces in the crystal structure, the pri-miRNA would have to undergo extreme bending in order to accommodate binding by both of DGCR8’s dsRBDs (depicted in Fig. 1-4). Although the extent of pri-miRNA bending in this model seems somewhat unreasonable, studies have shown that DGCR8-Core may undergo conformational rearrangement itself in order to assist in binding. First, our lab has shown through NMR and molecular dynamics that the dsRBDs of DGCR8 have correlated motions with each other that likely allow for a dynamic range of RNA binding.32 In addition, Mirko Hennig’s group published NMR evidence that revealed changes in the chemical environment for both RNA binding faces of the dsRBDs and the interface formed between the domains, when complexed with RNA.22

Figure 1-4. Proposed bending model of pri-miRNA recognition by DGCR8-Core currently suggested in the literature.31 The DGCR8-Core crystal structure (PDB 2YT4, with loops built-in as previously described32) is shown with dsRBD1 in red and dsRBD2 in blue. Approximately one turn of idealized A-form dsRNA (tan) was superimposed onto each dsRBD’s binding face of DGCR8 with an additional turn of dsRNA shown in between the two dsRBD-bound RNA duplexes, all of which together with flanking tails and a terminal loop (dotted lines) make up a theoretical pri-miRNA bound to DGCR8.

Whereas the previous model of recognition by DGCR8 would result in a 1:1 complex of DGCR8-Core with the pri-miRNA, other studies have proposed different modes of binding. In the same study performed by Narry Kim’s group, fluorescence studies demonstrated the possibility of 2:1 DGCR8:pri-miRNA binding.31 In conjunction with this result, Roth et al.22 suggested that pri-miRNA recognition is facilitated by pre-formed DGCR8-Drosha heterodimers, rather than DGCR8 recognition of the pri-miRNA first, followed by Drosha interaction, as in the molecular anchor model. This hypothesis correlates with results shown by Feng Guo’s group who proposed that DGCR8 has the ability to bind with tighter affinity upon dimerization by its WW-domain via heme binding followed by Drosha interaction.33 Regardless, all of these studies differ in the DGCR8 interactions with both the cofactor enzyme Drosha and the pri-miRNA.

Although other functional domains exist in the proteins of the Microprocessor complex, the dsRBDs and RNase III domains are believed to be the most critical folded domains in this protein complex for miRNA biogenesis. Specifically, DGCR8 contains two tandem dsRBDs, dsRBD1 and dsRBD2, and Drosha contains a dsRBD and an RNase III domain. The conundrum is that the interaction of these domains yields pre-miRNA that are consistently cleaved ~11 base pairs from the ssRNA-dsRNA junction, yet Drosha’s RNase III enzyme is known to cleave dsRNA non-specifically in isolation.26, 34 Therefore, it is necessary that the dsRBDs located in DGCR8 or in Drosha work alone or collectively to not only bind the correct RNA target, but also guide the specific cleavage of the pri-miRNA. Although multiple studies have begun to clarify DGCR8’s interactions with Drosha in the Microprocessor complex, it is still unknown how these proteins select pri-miRNAs as the ideal target for the Microprocessor complex amongst all other RNAs present in the nucleus. Therefore, the aim of this thesis is to elucidate the controversial binding mechanism of DGCR8 with pri-miRNA by combining pri-miRNA structural data with Microprocessor binding and processing assays.

1.2 RNA Structure Determination

As mentioned previously, there are currently no confirmed structures of pri-miRNAs. Therefore, conclusions made about the structural features preferred by the Microprocessor complex from computer-predicted structures are, at times, speculative. However, these conclusions can be supported and refined by experimentally-derived pri-miRNA structures.

Similar to DNA’s standard B-form helix, RNA can also exist in a double-helical structure, albeit with a tighter twist known as A-form (Fig. 1-5). RNA helices are dominated by Watson-Crick G/C and A/U base pairs in addition to G/U wobble base pairs. In both helical forms, the major groove carries the genetic nucleobase information, which is where proteins recognize RNA sequence; however, in an A-form helix, this groove is too deep and therefore typically inaccessible to proteins. Instead, the major groove is dominated by the exposure of the phosphodiester backbone, whereas the minor groove displays the 2´-hydroxyls of the sugars. Although it appears that the nucleobases are somewhat displayed in the minor groove, no sequence-specific interactions occur here because the minor groove faces of the purines and of the pyrimidines are predominantly the same.

Figure 1-5. Structure of A-form double-stranded RNA. Slightly longer than one turn of A-form helix is displayed. The phosphates from the phosphodiester backbone are exposed to solvent in the major groove, with the hydroxyls from the sugars being exposed in the minor groove. The negatively charged groups are indicated in red.


Unlike DNA, RNA can exist in multiple structures other than A-form, giving rise to the variety of features shown in Figure 1-3. Although more than half of the nucleotides in a typical RNA are found in Watson-Crick base pairs, a number of non-canonical structural motifs are commonly embedded within A-form helical structures.35 In particular, it has been posed that imperfections within the RNA can widen the native major groove allowing amino acid side chains to make sequence-specific contacts with the nucleobases.36, 37 Given that pri-miRNAs harbor multiple imperfections within their double-stranded stem, these RNA substrates could be an exception to the non-sequence specific binding case. Therefore, it is necessary that structures of pri-miRNAs be determined in order to examine this possibility.

1.2.1 Atomic Resolution Methods

Common methods used for atomic resolution determination of RNA structure are nuclear magnetic resonance spectroscopy (NMR) and X-ray crystallography. NMR gives the advantage of monitoring the structure of the RNA in its solution state, whereas crystallography requires the RNA to be in a possibly non-native crystal-packed state. A common problem for both methods is the monomer-dimer equilibrium of hairpin RNAs: the high concentrations required for these techniques shift the monomer hairpin to its dimeric state. This is demonstrated in Figure 1-6 for a pri-miRNA at low RNA concentration by adding salt. Salt will induce a shift to the dimeric state just as an increase in RNA concentration would, indicating that even at low concentrations of salt, high concentrations of RNA such as those required in NMR or crystallography will favor the dimeric state. Therefore, a thermostable tetraloop is typically introduced to the hairpin RNA to stabilize the monomeric state; however, this is an unattractive solution because it replaces the native terminal loop with a mutated, non-native one.

Figure 1-6. DSC shows that pri-miRNAs shift from a hairpin monomer to a dimer with increasing salt concentration, which occurs more rapidly with increasing RNA concentration. The concentration of pri-mir-16-1 in this DSC experiment is 10 µM, whereas near-millimolar concentrations are used for NMR and crystallography. For pri-mir-16-1, the monomer melts at 49°C and the dimer melts at 61°C.
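
The concentration dependence of the monomer-dimer equilibrium can be illustrated with a minimal sketch. It assumes a simple two-state model, 2 M ⇌ D, with a dissociation constant Kd = [M]^2/[D]; the Kd value used here is purely hypothetical (it was not measured in this work), and the effect of salt is not modeled.

```python
import math

def dimer_fraction(total_rna_molar, kd_dimer_molar):
    """Fraction of RNA strands in the dimer for the two-state model 2 M <=> D.

    With Kd = [M]^2 / [D] and total strands C = [M] + 2[D], the monomer
    concentration solves (2/Kd)[M]^2 + [M] - C = 0.
    """
    ratio = 8.0 * total_rna_molar / kd_dimer_molar
    monomer = kd_dimer_molar * (math.sqrt(1.0 + ratio) - 1.0) / 4.0
    return 1.0 - monomer / total_rna_molar

# Hypothetical dissociation constant, chosen only for illustration.
KD_ASSUMED = 1e-3  # molar

for label, conc in [("DSC-like (10 uM)", 10e-6), ("NMR-like (1 mM)", 1e-3)]:
    print(f"{label}: fraction of strands in dimer = {dimer_fraction(conc, KD_ASSUMED):.3f}")
```

Under this assumed Kd, only a small fraction of strands is dimeric at 10 µM, whereas half of the strands are dimeric at 1 mM, which is the qualitative trend described above.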

For NMR, several options for determining the structure and dynamics of the RNA are available, albeit at the high cost of producing an isotopically enriched RNA sample.38, 39 Once the RNA is isotopically enriched with NMR-active carbon (13C) and nitrogen (15N) isotopes, a variety of experiments are viable for determining the canonical and non-canonical base pairs within the RNA, in addition to their base-pairing dynamics. Furthermore, the resulting NMR spectra can point to specific structural motifs within the RNA such as bulges and internal loops. A popular feature of NMR is the use of residual dipolar couplings (RDCs) for determining long-range global RNA dynamics. For example, RDCs have been used extensively for the study of the bending angles of the 3-nucleotide bulge located in HIV-1 TAR RNA, enabling the computational design of small molecules that bind across this imperfection.40, 41 In the end, determination of RNA structure using NMR is typically limited to approximately 50 nucleotides due to spectral overlap. Therefore, NMR is not a tractable method for structural determination of pri-miRNAs due to their large size (~100 nucleotides).

The second technique, X-ray crystallography, poses its own limitations for RNA structure determination. This technique is innately dependent on the ability of the RNA to crystallize, which is typically impossible for RNAs with high amounts of flexibility, as would be the case for pri-miRNAs. As previously mentioned, there is currently one crystal structure of a pre-miRNA (Fig. 1-2);17 however, this RNA is depleted of single-stranded tails and almost completely enveloped by the Exportin-5 protein, essentially locking the RNA into one conformation. Even with the increased rigidity, the lack of diffraction data present for the terminal loop and UC bulge in the pre-mir-30a structure supports the reason why crystallography is not amenable to determining pri-miRNA structure. Alternatively, small-angle X-ray scattering (SAXS) has become an attractive technique for monitoring the global structure of RNA,42, 43 but does not yield atomic resolution data.

1.2.2 Chemical and Enzymatic Probing Methods

Due to the barriers to determining pri-miRNA structure with the previously mentioned atomic resolution methods, a variety of other readily available structure mapping techniques44, 45 that allow probing of the RNA structure in solution have been widely applied. These include, but are not limited to, DMS modification, endonuclease cleavage, hydroxyl radical probing, SHAPE chemistry, and in-line probing. Some of these techniques only probe the secondary structure of specific nucleotides, whereas others are capable of monitoring the structure of every nucleotide in the RNA. Therefore, due to the possible bias introduced with particular techniques, it is oftentimes necessary to use multiple techniques to complement and cross-verify the structure mapping results of each method.

In DMS mapping, DMS (dimethyl sulfate) is used to methylate the RNA bases, but only the N1 of adenosine, N7 of guanosine, and N3 of cytosine; therefore this method does not probe for the structure of uridine.46 In this method, these bases are only modified if they are single-stranded and not base paired. The modified bases are then read out using primer extension to generate cDNA fragments, which are analyzed with gel electrophoresis. The second technique, endonuclease cleavage, involves several RNA endonucleases that can be used to cleave the RNA at either specific single-stranded nucleotides or specific double-stranded nucleotides.47, 48 For example, two such endonucleases are Ribonuclease V1, which cleaves at any double-stranded nucleotide, and Ribonuclease T1, which cleaves only at single-stranded guanine nucleotides. These RNA fragments are then analyzed directly with gel electrophoresis.

The last three mentioned techniques are capable of probing every nucleotide within the RNA, independent of their sequence. The first of these, hydroxyl radical probing, employs the use of a hydroxyl radical produced from the reaction of Fe(II)-EDTA, ascorbic acid, and hydrogen peroxide to cleave the RNA.49 The radical cleaves the phosphodiester backbone at a rate based on the accessibility of that nucleotide to the radical, which is then analyzed directly on a polyacrylamide gel. Another structure-mapping technique, which was used extensively throughout this thesis, is SHAPE chemistry, or selective 2´-hydroxyl acylation analyzed by primer extension.50, 51 In a SHAPE experiment, the conformationally flexible 2´-hydroxyls of the RNA react selectively with an electrophile to form a 2´-O-adduct (further described in Chapter 2). The resulting reactivities are dependent upon the nucleotide’s constraints by base pairing or other interactions; i.e., nucleotides participating in base pairs are unreactive, whereas those in loops or bulges are typically reactive. SHAPE chemistry is also valuable because of the availability of several reagents that are capable of monitoring different time-scale dynamics in the RNA based on their hydrolysis half-life.52 Lastly, in-line probing is unique because it allows monitoring the structure of every nucleotide in the RNA without the use of a chemical or enzymatic probe. This method utilizes the natural degradation of the RNA over time through hydrolysis, which would occur more rapidly at unstructured sites in the RNA.53 The degraded products are then analyzed on a polyacrylamide gel. Moreover, these last three techniques are particularly useful for determining regions of protection by the RNA itself, such as in tertiary interactions, or a bound ligand. In the end, the combination of these structure mapping techniques should give valuable insight into the structural characteristics that the Microprocessor is able to recognize on a pri-miRNA.
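
As a minimal illustration of how per-nucleotide SHAPE reactivities might be turned into a qualitative structural readout, the sketch below bins normalized reactivities into three classes. The function name and the cutoff values (0.3 and 0.7) are illustrative assumptions, not the thresholds or analysis pipeline used in this thesis; appropriate cutoffs depend on the normalization scheme and reagent.

```python
def classify_shape(reactivities, low=0.3, high=0.7):
    """Bin normalized SHAPE reactivities into qualitative classes.

    low/high are assumed, illustrative cutoffs: values below `low` are
    treated as constrained (likely base paired), values at or above
    `high` as flexible (likely in loops or bulges).
    """
    labels = []
    for value in reactivities:
        if value < low:
            labels.append("constrained")
        elif value < high:
            labels.append("intermediate")
        else:
            labels.append("flexible")
    return labels

# Made-up reactivities for a short stretch of nucleotides.
example = [0.05, 0.10, 0.45, 1.20, 0.90, 0.02]
for position, label in enumerate(classify_shape(example), start=1):
    print(position, label)
```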

1.2.3 MC-Pipeline Modeling

Modeling allows refinement of an RNA structure by using experimental constraints in addition to available structural data from atomic resolution structures of other determined RNAs. A variety of RNA modeling programs exist such as RNAfold,54 RNAstructure,55 and MC-Pipeline,56 each with its own advantages and disadvantages. In this thesis, the MC-Pipeline software package was used to develop models of pri-miRNAs because of its extensive use of previously determined RNA structures. In doing so, MC-Pipeline goes beyond customary RNA prediction programs such as mfold, because it uses the RNA structures deposited in the Protein Data Bank (PDB) to predict the secondary structure of RNA, allowing both canonical and non-canonical base pairs to form. This software package is divided into multiple algorithms that use knowledge-based potentials to predict the structure, two of which were used for this thesis: MC-Fold and MC-Sym.

Figure 1-7 outlines how the structure of the RNA is predicted using MC-Pipeline. First, the sequence of the RNA is used as an input for MC-Fold, yielding an initial secondary structure predicted using a pseudo-potential energy minimization. Then, a nucleotide cyclic motif library (NCML) of compiled thermodynamic data from atomic resolution RNA structures is used to iteratively break down the RNA piece by piece by looking at opposing bases within a duplexed region, while taking into account neighboring nucleotides. This will determine whether those opposing bases could exist in some sort of base-pairing conformation while being stabilized by their neighbors, such as by stacking. From this, multiple secondary structures are predicted, which can be used to calculate the overall probability of each nucleotide being base paired, to give a measure of the base-pairing dynamics. As with most modeling programs, experimental constraints can also be used to refine the RNA structural predictions in MC-Fold. Any number of these secondary structure predictions can then be used as an input for MC-Sym. Again, the NCML is used to predict the structure of the RNA, but in three-dimensional space. Based on the structures in the PDB, this algorithm is used to predict the types of base pairs that are formed in addition to whether nucleotides are being flipped out of the helix or stacked inside. An ensemble of 3D models of the RNA structure is then generated to give an overall depiction of the RNA in space, with a view of the local base-pairing interactions formed throughout the structure.
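
The idea of computing a per-nucleotide base-pairing probability from a set of predicted secondary structures can be sketched as follows. This is not MC-Fold's own implementation: the dot-bracket input format, the function name, and the optional Boltzmann weighting by a pseudo-energy are assumptions made for illustration only.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)

def paired_probability(structures, energies_kcal=None, temp_k=310.15):
    """Per-nucleotide probability of being base paired across an ensemble.

    structures: equal-length dot-bracket strings, where '(' or ')' marks a
    paired nucleotide and '.' an unpaired one.
    energies_kcal: optional (pseudo-)energies; if given, structures are
    Boltzmann weighted, otherwise all structures count equally.
    """
    if energies_kcal is None:
        weights = [1.0] * len(structures)
    else:
        lowest = min(energies_kcal)
        weights = [math.exp(-(e - lowest) / (GAS_CONSTANT * temp_k))
                   for e in energies_kcal]
    total = sum(weights)
    probabilities = [0.0] * len(structures[0])
    for structure, weight in zip(structures, weights):
        for index, symbol in enumerate(structure):
            if symbol in "()":
                probabilities[index] += weight / total
    return probabilities

# Three made-up predictions for a 7-nucleotide hairpin, weighted equally.
ensemble = ["((...))", "(.....)", "......."]
print([round(p, 2) for p in paired_probability(ensemble)])
```

Nucleotides that are paired in every predicted structure receive a probability near 1, while those that pair in only some structures receive intermediate values, giving a rough measure of base-pairing dynamics.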

Figure 1-7. The MC-Pipeline software package contains two algorithms, MC-Fold and MC-Sym, which utilize a nucleotide cyclic motif library (NCML) of deposited RNA structures in the PDB to develop RNA models. First, MC-Fold is used to predict the secondary structure of RNA by iteratively breaking down the structure piece by piece to determine if each nucleotide is base paired or not. This output can then be used as input for MC-Sym to generate a 3D model of the RNA piece-wise as MC-Fold does. Experimental constraints can also be used as an additional input for MC-Fold to obtain a more refined secondary structure of the RNA.


1.3 Double-stranded RNA Binding Domains

The miRNA maturation pathway requires recognition of RNA by multiple proteins containing double-stranded RNA binding domains (dsRBDs) at various steps until the mature miRNA is base paired with its target mRNA. Although the sequence of a particular dsRBD is highly conserved across species, dsRBD sequences are not conserved between the different dsRBD-containing proteins (Fig. 1-8). Furthermore, several proteins have evolved to contain multiple dsRBDs within a single polypeptide chain, most likely due to gene duplication,57 but these domains are not typically sequence-conserved from dsRBD to dsRBD within the same chain. Therefore, evolution is believed to have allowed proteins to develop multiple dsRBDs in tandem perhaps in order to achieve highly efficient binding capabilities,58 as seen for the tandem dsRBD-containing PKR.59 dsRBDs have also been seen to function in protein-protein interactions, such as the third C-terminal dsRBD in TRBP and PACT,60 which may be another functional outcome of the sequence variation in dsRBDs.


Figure 1-8. Sequence alignment for all mammals with known dsRBD sequence within the miRNA maturation pathway. The third dsRBD of PACT and TRBP are not shown because these are known to not bind RNA. Amino acids with at least 50% similarity are blue and those completely conserved are red. The green amino acid at position 35 is important for dsRNA binding in Region 2 (see Fig. 1-9).

The recognition of dsRNA by dsRNA binding proteins has profound regulatory effects in vivo owing to the non-sequence specific interactions by the protein’s dsRBD(s). Due to the nature of RNA’s A-form helix as mentioned previously, dsRBDs bind with no sequence specificity to the RNA, but instead through electrostatic interactions from three main regions within its canonical αβββα fold: 1) helix 1 (H1), 2) loop between strands 1 (B1) and 2 (B2), and 3) the N-terminal base of helix 2 (H2) (Fig. 1-9).58, 61 Studies have shown that other regions of the dsRBD may also play important roles in dsRNA recognition; however, variations in these regions are less detrimental to RNA binding than those in the protein’s binding face due to the resulting impact on electrostatic interactions with the RNA.31, 62

Figure 1-9. Electrostatic interactions governing dsRBD:dsRNA binding (PDB 1DI2).61 The alpha helices (H) and beta strands (B) are indicated for the dsRBD (yellow). The side-chains of the amino acid residues that directly interact with the phosphates and hydroxyls of the RNA backbone are shown. On the protein, the positively charged groups of the residues are in blue with the negatively charged groups in red; the negatively charged groups interact via water-mediated associations. The overall binding interaction is governed by three regions: 1) helix 1 (H1), 2) loop between strands 1 and 2 (between B1 and B2), and 3) at the N-terminal base of helix 2 (H2).

1.4 Methods Used for Studying dsRBD Binding

Although there are solved structures for dsRBDs bound to dsRNA (albeit most are not from the miRNA maturation pathway), I am extremely interested in thermodynamically characterizing this interaction. Additionally, I am interested in examining the binding of tandem dsRBD-containing proteins, such as DGCR8, to RNA, because there are no determined structures of these proteins bound to RNA. Due to the non-sequence specificity in dsRBD binding, the dsRBD:dsRNA interaction can be viewed as binding to a lattice of possible identical binding sites (see Fig. 1-10 for illustration). Complicating matters, due to the three-dimensional nature of dsRNA, dsRBDs can also overlap on the RNA by binding opposite sides of the lattice. This thesis focuses on determining the thermodynamic parameters that govern dsRBD:dsRNA interactions in the miRNA maturation pathway, in hopes of using this knowledge to elucidate how the dsRBD-containing complexes in this pathway carry out function on their RNA targets.

Figure 1-10. Due to the non-sequence specificity of dsRBDs, binding to dsRNA is like binding to a lattice of possible identical binding sites. (A) Hypothetically, if some length of dsRNA is represented by 9 blocks, in which 3 blocks correspond to a binding site, then 3 dsRBDs can saturate this dsRNA lattice. (B) However, due to the lack of sequence specificity, the dsRBDs can slide along the lattice and bind in different locations, which would allow only 2 dsRBDs to saturate the lattice. Knowing how many dsRBDs commonly saturate a dsRNA lattice can be used to complement binding affinities in thermodynamic calculations. Figure adapted from Rico Acevedo.
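
The lattice picture in Figure 1-10 can be checked with a small brute-force sketch. It assumes a one-dimensional lattice in which ligands cannot overlap, and it calls an arrangement "saturating" when no remaining gap is large enough to hold another ligand; binding on opposite faces of the helix is deliberately ignored. The function name is hypothetical.

```python
from itertools import combinations

def saturating_counts(lattice_len, site_size):
    """Return the ligand counts that can saturate a 1D lattice.

    An arrangement is saturating when ligands do not overlap and every
    free gap (before, between, and after ligands) is shorter than one
    binding site.
    """
    counts = set()
    starts = range(lattice_len - site_size + 1)
    for k in range(1, lattice_len // site_size + 1):
        for combo in combinations(starts, k):
            # Skip arrangements in which neighboring ligands overlap.
            if any(b - a < site_size for a, b in zip(combo, combo[1:])):
                continue
            gap_starts = [0] + [s + site_size for s in combo]
            gap_ends = list(combo) + [lattice_len]
            if all(end - start < site_size
                   for start, end in zip(gap_starts, gap_ends)):
                counts.add(k)
                break  # one saturating arrangement is enough for this k
    return counts

# The 9-block lattice with a 3-block binding site from Figure 1-10.
print(sorted(saturating_counts(9, 3)))
```

For the 9-block lattice with a 3-block site, the enumeration finds saturating arrangements with either 2 or 3 ligands, matching panels (B) and (A) of the figure, respectively.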


A variety of methods are available for examining dsRBD binding, as discussed below. These methods provide thermodynamic parameters such as binding affinity (Kd), binding stoichiometry (n), and thermodynamic energies of interaction (ΔG, ΔH, ΔS). Knowing these parameters can help determine the types of characteristics that are recognized by dsRBDs, the level of cooperativity between multiple dsRBDs binding a single RNA, and possibly how these proteins release the RNA post-cleavage.
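
The relationships among these parameters can be made concrete with a short sketch using the standard expressions ΔG = RT ln(Kd) (with Kd expressed relative to a 1 M standard state) and ΔG = ΔH − TΔS. The numerical inputs below are hypothetical and are not results from this thesis.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)

def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard free energy of binding (kcal/mol) from a dissociation constant."""
    return GAS_CONSTANT * temp_k * math.log(kd_molar)

def binding_entropy(delta_g, delta_h, temp_k=298.15):
    """Entropy change (kcal/(mol*K)) from delta_G = delta_H - T * delta_S."""
    return (delta_h - delta_g) / temp_k

# Hypothetical inputs: a 1 uM Kd and an enthalpy of -12 kcal/mol.
dg = binding_free_energy(1e-6)
ds = binding_entropy(dg, delta_h=-12.0)
print(f"dG = {dg:.2f} kcal/mol, -T*dS = {-298.15 * ds:.2f} kcal/mol")
```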

1.4.1 Electrophoretic Mobility Shift Assays

Electrophoretic mobility shift assays (EMSAs) are used to derive thermodynamic parameters such as binding affinity (Kd), better described as an apparent binding affinity (Kd,app),63 and Hill coefficient (nH) from the fitted binding curves. The Hill coefficient is correlated to cooperativity in binding; however, it is commonly misinterpreted as stoichiometry. Rather, the Hill coefficient is simply used to assess if multiple ligands are interacting with a macromolecule.64

In EMSAs, increasing amounts of protein are added to a limiting amount of labeled RNA, and the formed complexes and free RNA are separated on a non-denaturing polyacrylamide gel. The idea is that the formed complexes are heavier than the free RNA, resulting in differences in their migration on the gel. In some cases, multiple formed complexes result in several band shifts, suggesting that each successive band shift reflects the accumulation of additional protein molecules on the RNA. However, one alternative possibility is that multiple band shifts represent a single protein molecule binding to a single RNA, but in discrete locations that lead to differential gel mobilities. Therefore, because differently migrating bands could be protein-bound complexes of similar molecular weight but with varying shapes, we were disinclined to assign a particular complex stoichiometry to the individual bands. Of note, due to the low concentrations required for this technique, EMSAs are the primary method used in this thesis.
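The Hill analysis described above can be sketched in a few lines. This is a minimal illustration, assuming the standard Hill form θ = [P]^nH / (Kd,app^nH + [P]^nH) for the fraction of RNA bound; the function names and the synthetic titration values are hypothetical, and a linearized Hill plot is used only to keep the example dependency-free (a nonlinear least-squares fit of the raw fractions would normally be preferred for real gel data).

```python
import math

def hill_fraction_bound(protein_conc, kd_app, n_hill):
    """Fraction of RNA bound at a given protein concentration (Hill model)."""
    return protein_conc ** n_hill / (kd_app ** n_hill + protein_conc ** n_hill)

def fit_hill_linearized(concs, fractions):
    """Estimate (kd_app, n_hill) from a Hill plot:
    log10(theta / (1 - theta)) = n_hill * log10([P]) - n_hill * log10(kd_app).
    Points with theta <= 0 or theta >= 1 are skipped (logit undefined)."""
    xs, ys = [], []
    for conc, theta in zip(concs, fractions):
        if 0.0 < theta < 1.0:
            xs.append(math.log10(conc))
            ys.append(math.log10(theta / (1.0 - theta)))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Ordinary least-squares slope and intercept.
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    kd_app = 10 ** (-intercept / slope)
    return kd_app, slope

# Synthetic, noise-free titration (hypothetical values; concentrations in µM).
concs = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]
fractions = [hill_fraction_bound(c, kd_app=1.5, n_hill=2.0) for c in concs]
kd_fit, n_fit = fit_hill_linearized(concs, fractions)
print(f"Kd,app = {kd_fit:.2f} µM, nH = {n_fit:.2f}")
```

A fitted nH greater than 1 indicates positive cooperativity but, as noted above, should not be read as the number of proteins bound.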

1.4.2 Fluorescence Polarization

Similar to EMSAs, fluorescence polarization (FP) is another quantitative technique available for monitoring dsRBD:dsRNA binding. FP measures the change in tumbling time of a fluorescently-labeled macromolecule upon the addition of a ligand; either biomolecule can be labeled with the fluorophore.65 From these data, fitted binding curves yield apparent binding affinities and Hill coefficients similar to those from EMSAs.

An inherent disadvantage of this technique is the necessity of attaching a fluorophore to one of the biomolecules, since the fluorescent probe itself could indirectly affect binding. One constraint for this technique is that the fluorescently-labeled molecule is typically smaller than 10 kDa and the titrated molecule is larger, which allows for a larger change in signal upon binding. Therefore, the dsRBD rather than the pri-miRNA would need to be labeled with the fluorescent probe, resulting in an excess of RNA by the end of the titration. However, the literature suggests that multiple dsRBDs bind the pri-miRNA, which would be very difficult to examine in the presence of excess RNA. Due to these limitations, FP was not attempted in this thesis.

1.4.3 Isothermal Titration Calorimetry

A very effective technique that yields all thermodynamic binding parameters is isothermal titration calorimetry (ITC). ITC is also valuable because the experiment is carried out at equilibrium, requires no addition of probes to the molecules, and is extremely quantitative. This technique measures the amount of heat required to maintain a constant temperature between a reference cell and a sample cell, to match the change in heat generated while a ligand is titrated into the sample cell containing another macromolecule.66 The heats of injection are then fit to a sigmoidal binding curve, which in this case yields the quantitative binding dissociation constant (Kd) and stoichiometry (n). In addition, the heats of injection directly yield the binding enthalpy (ΔH), which can be used in combination with the binding affinity (Ka = 1/Kd) to give the free energy (ΔG) and entropy (ΔS) of binding from these two equations: ΔG = −RT ln Ka and ΔG = ΔH − TΔS. However, due to the high concentrations required for this technique, the limited solubility of DGCR8’s dsRBDs rendered ITC impossible to use for monitoring binding (see Appendix for results).
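As a worked illustration of the two equations above, the following sketch converts a dissociation constant and a measured enthalpy into ΔG and ΔS. The input values are hypothetical and are not results from this thesis; the function name and the choice of kcal/mol units are likewise assumptions made for the example.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def binding_thermodynamics(kd_molar, delta_h_kcal, temp_k=298.15):
    """Return (delta_g, delta_s) from Kd and delta_H using
    delta_G = -RT ln(Ka) with Ka = 1/Kd, and delta_G = delta_H - T*delta_S.
    delta_g is in kcal/mol; delta_s is in kcal/(mol K)."""
    ka = 1.0 / kd_molar
    delta_g = -R_KCAL * temp_k * math.log(ka)
    delta_s = (delta_h_kcal - delta_g) / temp_k
    return delta_g, delta_s

# Hypothetical example: Kd = 1 µM and delta_H = -10 kcal/mol at 25 °C.
dG, dS = binding_thermodynamics(kd_molar=1e-6, delta_h_kcal=-10.0)
print(f"dG = {dG:.2f} kcal/mol, -T*dS = {-298.15 * dS:.2f} kcal/mol")
```

With these inputs ΔG is roughly −8.2 kcal/mol, so the hypothetical interaction would be enthalpically driven with an unfavorable entropic term.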

1.4.4 Analytical Ultracentrifugation

In contrast to the previous methods, analytical ultracentrifugation (AUC) is more commonly used for determining the binding stoichiometry (n). AUC allows the monitoring of complexes formed using either sedimentation velocity or sedimentation equilibrium experiments.67 Sedimentation velocity experiments give the average size (in Svedbergs) of complexed molecules in a sample, which can be used to determine how many ligands are bound to a macromolecule. In sedimentation equilibrium, one can determine the fractional amounts of differently sized molecules (i.e., 1:1 protein:RNA, 2:1 protein:RNA, etc.) in a sample at equilibrium. The binding stoichiometry can then be used to better fit the binding affinity in EMSA and FP experiments, rather than the estimated Hill coefficient. Although these AUC methods are capable of yielding a binding affinity, this measurement is semi-quantitative.

Because sedimentation velocity experiments are easy to perform, they were the focus of the AUC work in this thesis. In these experiments, a number of absorbance scans are recorded at a particular wavelength (260 nm due to RNA’s high extinction coefficient), while the complexed molecules “sediment” in a cell during high-speed centrifugation. These absorbance curves are then fit using software to determine the average sedimentation coefficient (S) of the formed complexes; the fitted curve can also be deconvoluted to give an estimated size of the multiple complexes within a single sample. Again, due to the limited solubility of DGCR8, this method was not capable of monitoring binding (see Appendix for results); however, this method was used to monitor binding with Dicer’s dsRBD (see Chapter 5).

1.4.5 Circular Dichroism

Similar to AUC, circular dichroism (CD) yields only the binding stoichiometry of protein:RNA interactions. This is done by titrating protein into RNA and monitoring the change in circular dichroism at 260 nm; the stoichiometry is determined at saturation, when there is no longer a change in signal. It is believed that a change in circular dichroism is observed because the protein induces some sort of conformational change in the RNA upon binding, but this effect is still under investigation.68, 69 As with ITC and AUC, CD was incapable of investigating binding due to DGCR8’s limited solubility (see Appendix for results).

1.5 Dissertation Outline

As stated, very little is known about the maturation process of the initial pri-miRNA at the molecular or atomic level, especially the role that its secondary structure plays in targeting it to the Microprocessor complex. Therefore, the aim of this thesis was to elucidate the binding mechanisms of the dsRBDs from both DGCR8 and Drosha with pri-miRNA by combining pri-miRNA structural data with Microprocessor binding and processing assays. From the research conducted in this thesis, I propose that consistent periodic deformations in pri-miRNAs are unique to this class of RNAs and target them to the Microprocessor for cleavage, with one of these deformations (called the “hot spot”) serving as the Drosha recognition site. Furthermore, I propose that DGCR8 serves as an indiscriminate dsRNA binding protein, with its tandem dsRBDs serving to bind pri-miRNAs with high affinity along the entirety of the RNA, insensitive to the native imperfections present in these substrates. These central hypotheses are tested in the following chapters.

The second chapter of this thesis is a procedural insight into the use of SHAPE chemistry for determining RNA structure, involving buffer, pH, and reagent preferences. The third chapter then utilizes this RNA structure mapping method in addition to endonuclease structure mapping to develop structural constraints usable for RNA modeling in MC-Pipeline to determine the structure of pri-miRNAs. These methods point to the presence of a possible recognition site for the Microprocessor complex, the “hot spot”, not described before in the literature.

The next step was to determine whether the structural deformations found by structure mapping impacted recognition by the dsRBDs in the Microprocessor. Improper recognition by a dsRBD can result in heightened amounts of precursor RNAs and/or incorrect mRNA levels during RNA interference, resulting in a multitude of diseases and cancers.70 Therefore, the fourth chapter uses EMSAs to thoroughly investigate the pri-miRNA structural features preferred by DGCR8’s dsRBDs, suggesting that the tandem-dsRBD protein can bind around the imperfections native to pri-miRNAs. Lastly, the fifth chapter describes an ongoing study to determine the structure and dynamics of the dsRBD from Dicer binding to dsRNA using X-ray crystallography, NMR, and AUC. This chapter should give insight into how both the dsRBD and dsRNA undergo conformational rearrangements upon binding.

1.6 Acknowledgements

This work was supported by the US National Institutes of Health grant R01GM098451 and start-up funds from the Pennsylvania State University to SAS. In addition, this dissertation work would not have been possible without help from various facilities at Penn State: Shared Fermentation Facility, NMR Facility, Crystallography Facility, and Mass Spectrometry Facility.


1.7 References

1. Kozomara, A.; Griffiths-Jones, S., miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 2011, 39 (Database issue), D152-7.
2. Lin, H. R.; Ganem, D., Viral microRNA target allows insight into the role of translation in governing microRNA target accessibility. Proc Natl Acad Sci U S A 2011, 108 (13), 5148-53.
3. Pfeffer, S.; Zavolan, M.; Grasser, F. A.; Chien, M. C.; Russo, J. J.; Ju, J. Y.; John, B.; Enright, A. J.; Marks, D.; Sander, C.; Tuschl, T., Identification of virus-encoded microRNAs. Science 2004, 304 (5671), 734-736.
4. Castel, S. E.; Martienssen, R. A., RNA interference in the nucleus: roles for small RNAs in transcription, epigenetics and beyond. Nat Rev Genet 2013, 14 (2), 100-12.
5. He, L.; Hannon, G. J., MicroRNAs: Small RNAs with a big role in gene regulation (vol 5, pg 522 2004). Nat Rev Genet 2004, 5 (8), 522.
6. Raveche, E. S.; Salerno, E.; Scaglione, B. J.; Manohar, V.; Abbasi, F.; Lin, Y. C.; Fredrickson, T.; Landgraf, P.; Ramachandra, S.; Huppi, K.; Toro, J. R.; Zenger, V. E.; Metcalf, R. A.; Marti, G. E., Abnormal microRNA-16 locus with synteny to human 13q14 linked to CLL in NZB mice. Blood 2007, 109 (12), 5079-5086.
7. Aqeilan, R. I.; Calin, G. A.; Croce, C. M., miR-15a and miR-16-1 in cancer: discovery, function and future perspectives. Cell Death Differ 2010, 17 (2), 215-220.
8. Wang, W. X.; Rajeev, B. W.; Stromberg, A. J.; Ren, N.; Tang, G.; Huang, Q.; Rigoutsos, I.; Nelson, P. T., The expression of microRNA miR-107 decreases early in Alzheimer's disease and may accelerate disease progression through regulation of beta-site amyloid precursor protein-cleaving enzyme 1. J Neurosci 2008, 28 (5), 1213-23.
9. Xu, B.; Karayiorgou, M.; Gogos, J. A., MicroRNAs in psychiatric and neurodevelopmental disorders. Brain Res 2010, 1338, 78-88.
10. Zhao, Y.; Samal, E.; Srivastava, D., Serum response factor regulates a muscle-specific microRNA that targets Hand2 during cardiogenesis. Nature 2005, 436 (7048), 214-220.
11. Poy, M. N.; Eliasson, L.; Krutzfeldt, J.; Kuwajima, S.; Ma, X.; Macdonald, P. E.; Pfeffer, S.; Tuschl, T.; Rajewsky, N.; Rorsman, P.; Stoffel, M., A pancreatic islet-specific microRNA regulates insulin secretion. Nature 2004, 432 (7014), 226-30.
12. Dostie, J. E.; Mourelatos, Z.; Yang, M.; Sharma, A.; Dreyfuss, G., Numerous microRNPs in neuronal cells containing novel microRNAs. RNA 2003, 9 (2), 180-186.
13. Zangi, L.; Lui, K. O.; von Gise, A.; Ma, Q.; Ebina, W.; Ptaszek, L. M.; Spater, D.; Xu, H.; Tabebordbar, M.; Gorbatov, R.; Sena, B.; Nahrendorf, M.; Briscoe, D. M.; Li, R. A.; Wagers, A. J.; Rossi, D. J.; Pu, W. T.; Chien, K. R., Modified mRNA directs the fate of heart progenitor cells and induces vascular regeneration after myocardial infarction. Nat Biotechnol 2013, 31 (10), 898-907.
14. Li, Z.; Rana, T. M., Therapeutic targeting of microRNAs: current status and future challenges. Nat Rev Drug Discov 2014.
15. Bora, R. S.; Gupta, D.; Mukkur, T. K.; Saini, K. S., RNA interference therapeutics for cancer: challenges and opportunities (review). Mol Med Rep 2012, 6 (1), 9-15.
16. Zeng, Y.; Cullen, B. R., Efficient processing of primary microRNA hairpins by drosha requires flanking nonstructured RNA sequences. J Biol Chem 2005, 280 (30), 27595-27603.
17. Okada, C.; Yamashita, E.; Lee, S. J.; Shibata, S.; Katahira, J.; Nakagawa, A.; Yoneda, Y.; Tsukihara, T., A high-resolution structure of the pre-microRNA nuclear export machinery. Science 2009, 326 (5957), 1275-9.
18. Nicholson, A. W., Ribonuclease III mechanisms of double-stranded RNA cleavage. Wiley Interdisciplinary Reviews-RNA 2014, 5 (1), 31-48.
19. Han, J. J.; Lee, Y.; Yeom, K. H.; Nam, J. W.; Heo, I.; Rhee, J. K.; Sohn, S. Y.; Cho, Y. J.; Zhang, B. T.; Kim, V. N., Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 2006, 125 (5), 887-901.
20. Zhang, X. X.; Zeng, Y., The terminal loop region controls microRNA processing by Drosha and Dicer. Nucleic Acids Res 2010, 38 (21), 7689-7697.
21. Zeng, Y.; Cullen, B. R., Sequence Requirements for Micro RNA Processing and Function in Human Cells. RNA 2003, 9, 112-123.
22. Roth, B. M.; Ishimaru, D.; Hennig, M., The core microprocessor component DiGeorge syndrome critical region 8 (DGCR8) is a nonspecific RNA-binding protein. J Biol Chem 2013, 288 (37), 26785-99.
23. Beisel, C. L.; Chen, Y. Y.; Culler, S. J.; Hoff, K. G.; Smolke, C. D., Design of small molecule-responsive microRNAs based on structural requirements for Drosha processing. Nucleic Acids Res 2011, 39 (7), 2981-2994.
24. Krol, J.; Sobczak, K.; Wilczynska, U.; Drath, M.; Jasinska, A.; Kaczynska, D.; Krzyzosiak, W. J., Structural features of microRNA (miRNA) precursors and their relevance to miRNA biogenesis and small interfering RNA/short hairpin RNA design. J Biol Chem 2004, 279 (40), 42230-9.
25. Warf, M. B.; Johnson, W. E.; Bass, B. L., Improved annotation of C. elegans microRNAs by deep sequencing reveals structures associated with processing by Drosha and Dicer. RNA 2011, 17 (4), 563-77.
26. Carmell, M. A.; Hannon, G. J., RNase III enzymes and the initiation of gene silencing. Nat Struct Mol Biol 2004, 11 (3), 214-8.
27. Gregory, R. I.; Yan, K. P.; Amuthan, G.; Chendrimada, T.; Doratotaj, B.; Cooch, N.; Shiekhattar, R., The Microprocessor complex mediates the genesis of microRNAs. Nature 2004, 432 (7014), 235-40.
28. Zeng, Y.; Yi, R.; Cullen, B. R., Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J 2005, 24 (1), 138-148.
29. Macias, S.; Plass, M.; Stajuda, A.; Michlewski, G.; Eyras, E.; Caceres, J. F., DGCR8 HITS-CLIP reveals novel functions for the Microprocessor. Nat Struct Mol Biol 2012, 19 (8), 760-766.
30. Havens, M. A.; Reich, A. A.; Duelli, D. M.; Hastings, M. L., Biogenesis of mammalian microRNAs by a non-canonical processing pathway. Nucleic Acids Res 2012, 40 (10), 4626-4640.
31. Sohn, S. Y.; Bae, W. J.; Kim, J. J.; Yeom, K. H.; Kim, V. N.; Cho, Y., Crystal structure of human DGCR8 core. Nat Struct Mol Biol 2007, 14 (9), 847-853.
32. Wostenberg, C.; Noid, W. G.; Showalter, S. A., MD simulations of the dsRBP DGCR8 reveal correlated motions that may aid pri-miRNA binding. Biophys J 2010, 99 (1), 248-56.
33. Faller, M.; Toso, D.; Matsunaga, M.; Atanasov, I.; Senturia, R.; Chen, Y.; Zhou, Z. H.; Guo, F., DGCR8 recognizes primary transcripts of microRNAs through highly cooperative binding and formation of higher-order structures. RNA 2010, 16 (8), 1570-83.
34. Tomari, Y.; Zamore, P. D., MicroRNA biogenesis: drosha can't cut it without a partner. Curr Biol 2005, 15 (2), R61-4.
35. Nagaswamy, U.; Larios-Sanz, M.; Hury, J.; Collins, S.; Zhang, Z. D.; Zhao, Q.; Fox, G. E., NCIR: a database of non-canonical interactions in known RNA structures. Nucleic Acids Res 2002, 30 (1), 395-397.
36. Weeks, K. M.; Crothers, D. M., Major Groove Accessibility of RNA. Science 1993, 261 (5128), 1574-1577.
37. Masliah, G.; Barraud, P.; Allain, F. H., RNA recognition by double-stranded RNA binding domains: a matter of shape and sequence. Cell Mol Life Sci 2013, 70 (11), 1875-95.
38. Furtig, B.; Richter, C.; Wohnert, J.; Schwalbe, H., NMR spectroscopy of RNA. Chembiochem 2003, 4 (10), 936-962.
39. Rinnenthal, J.; Buck, J.; Ferner, J.; Wacker, A.; Furtig, B.; Schwalbe, H., Mapping the Landscape of RNA Dynamics with NMR Spectroscopy. Acc Chem Res 2011, 44 (12), 1292-1301.
40. Al-Hashimi, H. M.; Gosser, Y.; Gorin, A.; Hu, W. D.; Majumdar, A.; Patel, D. J., Concerted motions in HIV-1 TAR RNA may allow access to bound state conformations: RNA dynamics from NMR residual dipolar couplings. J Mol Biol 2002, 315 (2), 95-102.
41. Bailor, M. H.; Sun, X. Y.; Al-Hashimi, H. M., Topology Links RNA Secondary Structure with Global Conformation, Dynamics, and Adaptation. Science 2010, 327 (5962), 202-206.
42. Gajda, M. J.; Zapien, D. M.; Uchikawa, E.; Dock-Bregeon, A. C., Modeling the Structure of RNA Molecules with Small-Angle X-Ray Scattering Data. Plos One 2013, 8 (11).
43. Yang, S. C.; Parisien, M.; Major, F.; Roux, B., RNA Structure Determination Using SAXS Data. J Phys Chem B 2010, 114 (31), 10039-10048.
44. Ehresmann, C.; Baudin, F.; Mougel, M.; Romby, P.; Ebel, J. P.; Ehresmann, B., Probing the structure of RNAs in solution. Nucleic Acids Res 1987, 15 (22), 9109-28.
45. Weeks, K. M.; Mauger, D. M., Exploring RNA Structural Codes with SHAPE Chemistry. Acc Chem Res 2011, 44 (12), 1280-1291.
46. Tijerina, P.; Mohr, S.; Russell, R., DMS footprinting of structured RNAs and RNA-protein complexes. Nat Protoc 2007, 2 (10), 2608-23.
47. Myers, R. M.; Larin, Z.; Maniatis, T., Detection of single base substitutions by ribonuclease cleavage at mismatches in RNA:DNA duplexes. Science 1985, 230 (4731), 1242-6.
48. Shen, L. X.; Basilion, J. P.; Stanton, V. P., Jr., Single-nucleotide polymorphisms can cause different structural folds of mRNA. Proc Natl Acad Sci U S A 1999, 96 (14), 7871-6.
49. Karaduman, R.; Fabrizio, P.; Hartmuth, K.; Urlaub, H.; Luhrmann, R., RNA structure and RNA-protein interactions in purified yeast U6 snRNPs. J Mol Biol 2006, 356 (5), 1248-62.
50. Low, J. T.; Weeks, K. M., SHAPE-directed RNA secondary structure prediction. Methods 2010, 52 (2), 150-158.
51. Wilkinson, K. A.; Merino, E. J.; Weeks, K. M., Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat Protoc 2006, 1 (3), 1610-6.
52. Gherghe, C. M.; Mortimer, S. A.; Krahn, J. M.; Thompson, N. L.; Weeks, K. M., Slow conformational dynamics at C2'-endo Nucleotides in RNA. J Am Chem Soc 2008, 130 (28), 8884.
53. Toroney, R.; Nallagatla, S. R.; Boyer, J. A.; Cameron, C. E.; Bevilacqua, P. C., Regulation of PKR by HCV IRES RNA: importance of domain II and NS5A. J Mol Biol 2010, 400 (3), 393-412.
54. Hofacker, I. L., Vienna RNA secondary structure server. Nucleic Acids Res 2003, 31 (13), 3429-3431.
55. Bellaousov, S.; Reuter, J. S.; Seetin, M. G.; Mathews, D. H., RNAstructure: web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res 2013, 41 (W1), W471-W474.
56. Parisien, M.; Major, F., The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature 2008, 452 (7183), 51-55.
57. Tian, B.; Mathews, M. B., Phylogenetics and functions of the double-stranded RNA-binding motif: a genomic survey. Prog Nucleic Acid Res Mol Biol 2003, 74, 123-58.
58. Tian, B.; Bevilacqua, P. C.; Diegelman-Parente, A.; Mathews, M. B., The double-stranded-RNA-binding motif: interference and much more. Nat Rev Mol Cell Biol 2004, 5 (12), 1013-23.
59. Tian, B.; Mathews, M. B., Functional characterization of and cooperation between the double-stranded RNA-binding motifs of the protein kinase PKR. J Biol Chem 2001, 276 (13), 9936-9944.
60. Laraki, G.; Clerzius, G.; Daher, A.; Melendez-Pena, C.; Daniels, S.; Gatignol, A., Interactions between the double-stranded RNA-binding proteins TRBP and PACT define the Medipal domain that mediates protein-protein interactions. RNA Biol 2008, 5 (2), 92-103.
61. Ryter, J. M.; Schultz, S. C., Molecular basis of double-stranded RNA-protein interactions: structure of a dsRNA-binding domain complexed with dsRNA. EMBO J 1998, 17 (24), 7505-13.
62. Bevilacqua, P. C.; Cech, T. R., Minor-groove recognition of double-stranded RNA by the double-stranded RNA-binding domain from the RNA-activated protein kinase PKR. Biochemistry 1996, 35 (31), 9983-94.
63. Hellman, L. M.; Fried, M. G., Electrophoretic mobility shift assay (EMSA) for detecting protein-nucleic acid interactions. Nat Protoc 2007, 2 (8), 1849-61.
64. Weiss, J. N., The Hill equation revisited: uses and misuses. Faseb Journal 1997, 11 (11), 835-841.
65. Pagano, J. M.; Clingman, C. C.; Ryder, S. P., Quantitative approaches to monitor protein-nucleic acid interactions using fluorescent probes. RNA 2011, 17 (1), 14-20.
66. Patel, S.; Blose, J. M.; Sokoloski, J. E.; Pollack, L.; Bevilacqua, P. C., Specificity of the Double-Stranded RNA-Binding Domain from the RNA-Activated Protein Kinase PKR for Double-Stranded RNA: Insights from Thermodynamics and Small-Angle X-ray Scattering. Biochemistry 2012, 51 (46), 9312-9322.
67. Lebowitz, J.; Lewis, M. S.; Schuck, P., Modern analytical ultracentrifugation in protein science: a tutorial review. Protein Sci 2002, 11 (9), 2067-79.
68. Gilligan, T. J.; Schwarz, G., Self-Association of Adenosine-5'-Triphosphate Studied by Circular-Dichroism at Low Ionic Strengths. Biophysical Chemistry 1976, 4 (1), 55-63.
69. Cole, J. L., Analysis of PKR activation using analytical ultracentrifugation. Macromol Biosci 2010, 10 (7), 703-13.
70. van Kouwenhove, M.; Kedde, M.; Agami, R., MicroRNA regulation by RNA-binding proteins and its implications for cancer. Nat Rev Cancer 2011, 11 (9), 644-56.


Chapter 2

The Use of SHAPE Chemistry to Determine RNA Structure in Solution

2.1 Introduction

At the time of writing this thesis, there were no experimentally confirmed structures of pri-miRNAs; only the Drosha cleavage product pre-mir-30a had been crystallized, yet even then parts of the RNA were missing from the diffraction map due to disorder (Fig. 1-2).1 Although computer-predicted structures have shed some light on the structural characteristics preferred by the Microprocessor, there is a pressing need for experimental structures of miRNA precursors. In order to determine the important secondary structure characteristics of pri-miRNAs necessary for efficient processing by the Microprocessor, several studies have examined various sequences and secondary structures within the terminal loop, double-stranded stem, and flanking strands.2-4 However, most structure-function conclusions from Microprocessor studies are based on the RNA secondary structure predicted by mfold5 or similar computer programs. Calling these conclusions into question, structure mapping of hairpin RNAs has shown that computer-predicted structures oftentimes deviate significantly from what is experimentally derived.6 Because the structural imperfections in pri-miRNA transcripts have direct consequences for processing and function in RNA interference (RNAi), it is necessary that they be accurately annotated. Therefore, it is crucial that solution structures of pri-miRNAs be determined with the best resolution possible if RNA maturation by the Microprocessor is to be characterized at the molecular level.

Because RNA is naturally transcribed as single-stranded, it can exist in multiple structures other than its A-form duplex, giving rise to a variety of features. More than half of the nucleotides in a typical RNA are found in Watson-Crick base pairs, but a number of non-canonical structural motifs are commonly embedded within A-form secondary structures.7 One of the most prevalent motifs is the single-nucleotide mismatch (or 1 × 1 internal loop), which is almost isosteric with canonical Watson-Crick pairs.8 Also common are single-nucleotide bulges and small asymmetric loops, which often serve as recognition motifs for proteins.9 A recent bioinformatic study showed that these structural motifs are common at both Drosha and Dicer cleavage sites in miRNA maturation.10 Therefore, structural characterization of pri-miRNAs has been a key focal point in the Showalter Laboratory.

The scientific community has thus far been unsuccessful in discovering how the proteins within the Microprocessor recognize the structures of pri-miRNAs and how this class of RNAs is disambiguated from others within the nucleus. One approach to solving this problem is to recover biochemical structures of pri-miRNAs. However, most RNA solution structure mapping techniques are faced with the problem of either not yielding structural information for every nucleotide within the RNA or requiring the use of a difficult chemical reaction for modification. Therefore, we turned to SHAPE chemistry for RNA structure mapping due to its ability to probe the structure of every nucleotide within the RNA with the use of a simple reagent.

Developed by the Weeks Laboratory in 2005,11-13 SHAPE (Selective 2´-Hydroxyl Acylation analyzed by Primer Extension) chemistry has been successfully used to determine the solution secondary structure of a multitude of RNAs. SHAPE chemistry has emerged as a powerful method for defining base-pairing status of RNAs with single-nucleotide resolution.14 Furthermore, hairpin RNAs similar to pri-miRNAs have been studied using SHAPE chemistry, yielding results that compare well to their previously determined secondary structures using methods such as X-ray crystallography and NMR.15-17 Intriguingly, SHAPE chemistry was also used to map the structure of the HIV-1 genomic RNA (~1 kb), providing a connection between RNA structure and translational regulation of the virus.18

Figure 2-1. SHAPE chemistry reaction using 1M7 (1-methyl-7-nitroisatoic anhydride) as an example, yielding the 2´-O-adduct on the RNA. SHAPE reactivity gives a measure of how flexible or constrained a nucleotide is by base pairing or stacking interactions with other nucleotides, ligands, or proteins. Simultaneously, the SHAPE reagent is also undergoing a competing hydrolysis reaction.

SHAPE experiments can be performed within a couple of hours with the use of one of several commercially available reagents. In a SHAPE experiment, the conformationally flexible 2´-hydroxyls on the sugar of the RNA nucleotide react selectively with an electrophile to form a 2´-O-adduct (Fig. 2-1).12, 17, 19, 20 The resulting reactivities are diminished if the nucleotide’s dynamics are dampened by base pairing or other stacking interactions; i.e., nucleotides participating in canonical base pairs are unreactive, while those in unstructured loops or bulges are typically reactive. Following the reaction, the extent of modification at the nucleotide level is detected via reverse transcription, because reverse transcription is aborted at the site of the SHAPE-modified RNA nucleotide. The resulting cDNA fragments are then separated on a denaturing polyacrylamide gel, and the intensity of each band is quantified using image processing software, such as the SAFA (Semi-Automated Footprinting Analysis) program,21 and correlated to the sequence of the RNA. The SHAPE procedure is briefly outlined in Figure 2-2. It is important to note that the cDNA fragments can also be separated and quantified using capillary electrophoresis. Once the reactivity trace is defined on a per-nucleotide basis, these reactivities can be used to refine the secondary structure of the RNA using a multitude of software packages, such as RNAstructure22 or MC-Pipeline.23


Figure 2-2. Outline of SHAPE chemistry procedure. In the first step, (1) RNA nucleotides that are not constrained by base pairing or stacking are modified by the SHAPE reagent. (2) These modified nucleotides (red) are then detected via reverse transcription to generate cDNA fragments (blue lines). (3) The cDNA fragments are then separated on a denaturing polyacrylamide gel and (4) their intensities are quantified.

Because information is lost near the sites of reverse transcription initiation and full-length cDNA formation, linkers are added to each end of the RNA-encoding DNA template (Fig. 2-3).19 The inclusion of these linkers as part of a “SHAPE cassette” allows nucleotide information to be recovered near the 5´- and 3´-ends of RNAs shorter than 200 nucleotides, as is the case for in vitro transcribed pri-miRNAs. In some cases, the hairpin tetraloops used for the cassette linkers can interfere with the native folding of the inserted RNA of study; this was seen for DGCR8’s UTR (Fig. 2-6). In these cases, a simple change of sequence within the linkers can resolve the problem.

Figure 2-3. SHAPE cassette linkers were added onto the RNA in order to fully map the pri-miRNAs. The linkers were adapted from Merino et al.19 Promoter sites were added for both transcription and reverse transcription, in addition to an inverted BsaI site to linearize the plasmid before transcription. Restriction enzyme sites were added internal to the linkers such that any RNA-encoding sequence could be added to the cassette plasmid.

In studying RNA structure, it is desirable to monitor the types of dynamic changes that the RNA exhibits, such as breathing, base pairing, and tertiary folding. A unique feature of SHAPE chemistry is the availability of several reagents that are able to gauge these different time-scale changes in the RNA.24 Among the various SHAPE reagents used in this study are multiple anhydrides [1-methyl-7-nitroisatoic anhydride (1M7), 4-nitroisatoic anhydride (4NIA), N-methylisatoic anhydride (NMIA), and isatoic anhydride (IA)] and a cyanide reagent, benzoyl cyanide (BzCN) (structures and hydrolysis half-lives shown in Fig. 2-12). Due to their different hydrolysis rates, these reagents are capable of reacting with nucleotides in the RNA that undergo time-varying dynamics before the reagent is fully hydrolyzed, again on a per-nucleotide basis.

Here, we applied SHAPE chemistry methods to three structurally diverse pri-miRNAs, revealing deviations from canonical A-form structure in each. In this chapter, a variety of buffers, pH levels, temperatures, and SHAPE reagents were used to investigate the structure and folding of these RNAs. The resulting RNA structures should give valuable insight into the structural characteristics present in pri-miRNAs and possible recognition sites for the Microprocessor.

2.2 Materials and Methods

2.2.1 RNA Preparation

All DNAs were purchased from Geneart. Template DNA for SHAPE reactions was inserted into a SHAPE cassette19 (described in Results). The SHAPE cassette containing the template DNA was then cloned into pUC19 (New England Biolabs) and transformed into DH5α competent cells, which were grown in LB medium at 37°C to an approximate OD600 of 3.75. The preparation of plasmid DNA, transcription by T7 RNA polymerase, and purification of the transcribed RNA were all performed as previously described.25

2.2.2 SHAPE Reagent Preparation

All reagents were commercially available except 1-methyl-7-nitroisatoic anhydride (1M7). Synthesis of 1M7 was carried out by a former student, Christopher Wostenberg, as described in Beutner et al.26 All powdered reagents were aliquoted into ~1-10 mg amounts under inert atmosphere (the exact amount recorded per aliquot), parafilmed, and stored in a desiccator.

2.2.3 RNA Modification by SHAPE Reagents

RNA (4 pmol) in 5 μL of sterile water was heated at 85°C for 1 min and cooled to 4°C for 2 min prior to the addition of 5 μL of folding buffer [final 50 mM HEPES (pH 7.6) or cacodylate (pH 6.5) and 50 mM KCl, unless noted otherwise], incubation at 37°C for 10 min, and treatment with 1 μL of SHAPE reagent (400 mM BzCN/100 mM 1M7/200 mM 4NIA/100 mM NMIA/100 mM IA in anhydrous DMSO) at 37°C for five hydrolysis half-lives (t½ = 0.25 s/14 s/1.6 min/8.3 min/15 min, respectively). Control and sequencing reaction mixtures were treated with 1 μL of anhydrous DMSO. All reaction mixtures were cooled to 4°C, and the RNA was recovered by ethanol precipitation (1 μL of 5 M NaCl, 0.5 μL of 20 mg/mL glycogen, 12.5 μL of sterile water, and 75 μL of ethanol added to each reaction mixture).
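The incubation times implied by "five hydrolysis half-lives" can be tabulated with a short calculation. The sketch below uses the stock concentrations and half-lives quoted above; it assumes simple dilution of 1 μL of reagent into the 10 μL folded RNA sample and first-order loss of reagent by hydrolysis alone, and the names used are hypothetical.

```python
# Stock concentrations (mM) and hydrolysis half-lives (s) as quoted in the text.
REAGENTS = {
    "BzCN": {"stock_mM": 400.0, "half_life_s": 0.25},
    "1M7": {"stock_mM": 100.0, "half_life_s": 14.0},
    "4NIA": {"stock_mM": 200.0, "half_life_s": 1.6 * 60},
    "NMIA": {"stock_mM": 100.0, "half_life_s": 8.3 * 60},
    "IA": {"stock_mM": 100.0, "half_life_s": 15.0 * 60},
}

def reaction_plan(n_half_lives=5, sample_volume_ul=10.0, reagent_volume_ul=1.0):
    """Return incubation time, initial reagent concentration, and the fraction
    of reagent left unhydrolyzed for each SHAPE reagent."""
    # Assumption: the reagent is simply diluted into the sample volume.
    dilution = reagent_volume_ul / (sample_volume_ul + reagent_volume_ul)
    # Assumption: first-order decay, so half remains after each half-life.
    remaining = 0.5 ** n_half_lives
    plan = {}
    for name, props in REAGENTS.items():
        plan[name] = {
            "incubation_s": n_half_lives * props["half_life_s"],
            "initial_mM": props["stock_mM"] * dilution,
            "fraction_unhydrolyzed": remaining,
        }
    return plan

plan = reaction_plan()
for name, row in plan.items():
    print(f"{name}: incubate {row['incubation_s']:.2f} s, "
          f"initial {row['initial_mM']:.1f} mM")
```

Under these assumptions, about 3% of each reagent (1/32) remains after five half-lives, so the reaction is essentially complete regardless of which reagent is used.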

2.2.4 Primer Extension

All samples were mixed with 0.5 μL of Cy5 fluorescently-labeled DNA primer (10 μM in sterile water, 5′-Cy5-GAACCGGACCGAAGCCCGATTTGG-3′, purchased from Sigma, HPLC-purified). The primers were annealed to the RNA by being heated at 65°C for 5 min and cooled at 35°C for 5 min prior to addition of 3 μL of reverse transcription buffer [167 mM Tris-HCl (pH 8.3), 250 mM KCl, 1.67 mM dNTP mix, 17 mM DTT, and 10 mM MgCl2]. After the sample was heated to 52°C for 3 min, 1 μL of 5 mM ddNTP was added to the sequencing reaction mixtures, and 1 μL of sterile water was added to all other reaction mixtures. Primer extension was initiated by the addition of 0.5 μL of Superscript III (Invitrogen) and incubation at 52°C for 10 min. Immediately thereafter, 0.5 μL of 2 M NaOH was added, and the solutions were heated to 95°C for 7 min prior to being cooled at 4°C for 5 min. cDNA was treated with 9 μL of a neutralizing gel loading solution [100 mM Tris-HCl (pH 7.5) and 20 mM EDTA in formamide], heated to 90°C for 90 s, and separated on a denaturing polyacrylamide gel (8% 29:1 acrylamide:bisacrylamide, 8.4 M urea, 1× TBE; 95 W, 1 hr). The gels were visualized on a Typhoon imager.

2.2.5 Processing of SHAPE Data

Gels were quantified using SAFA.21 All lanes were normalized using SAFA and the background reactions were subtracted from the SHAPE-modified reactions. The resulting SHAPE reactivities were rescaled in the range of 0−1.
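The arithmetic behind this processing step can be sketched as follows. This is only an illustration of background subtraction followed by rescaling, not a reimplementation of SAFA; the function name is hypothetical, and the specific rescaling (negative differences set to zero, then division by the maximum) is an assumption, since the exact procedure is not specified here.

```python
from typing import List


def shape_reactivity(plus: List[float], minus: List[float]) -> List[float]:
    """Background-subtract band intensities and rescale to the 0-1 range.

    plus  : band intensities from the +SHAPE reagent lane
    minus : band intensities from the DMSO-only (background) lane
    ASSUMPTION: negative differences are set to 0 and the result is divided
    by its maximum; the text does not state the exact rescaling used.
    """
    if len(plus) != len(minus):
        raise ValueError("lanes must cover the same nucleotides")
    diff = [max(p - m, 0.0) for p, m in zip(plus, minus)]
    top = max(diff, default=0.0)
    if top == 0.0:
        # No band rises above background, so every reactivity is zero.
        return [0.0 for _ in diff]
    return [d / top for d in diff]
```

For example, `shape_reactivity([10, 4, 2], [2, 2, 3])` gives reactivities of 1.0, 0.25, and 0.0 for the three positions.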

2.3 Results and Discussion

2.3.1 Thermodynamically Predicted Primary MicroRNA Structures

SHAPE chemistry was used to determine the secondary structures of a panel of pri-miRNAs, the results of which were compared to the secondary structures predicted in previous studies4, 27 and by mfold. The RNAs used in this study include pri-mir-16-1, pri-mir-107, pri-mir-30a, and DGCR8’s 5´-UTR because of their predicted variety of structural features in addition to their involvement in tumor suppression,28 Alzheimer’s disease,29 cell autophagy,30 and DGCR8 regulation,27 respectively (mfold-predicted structures shown in Fig. 2-4A). When the structures of the pri-miRNAs were predicted by mfold,5 all exhibited several alternative structures within the 10% suboptimal parameter range. These alternative structures displayed differences in the size and location of internal loops and/or bulges, in addition to the size of the terminal loop. The top two predicted structures for pri-mir-16-1 are shown in Figure 2-4B, displaying differences in the terminal loop as well as the neighboring stem imperfections which alter the base pairing register of this region. These differences are notable for two reasons: 1) the terminal loop is suggested to play a role in Microprocessor efficiency,2, 31 thus differences in the loop could interfere with Drosha processing; and 2) they integrate into the Dicer cleavage site, which could potentially alter how Dicer perceives this region of the pre-miRNA. Therefore, it is necessary that the structures of these pri-miRNAs be accurately determined.


Figure 2-4. (A) Predicted RNA secondary structures as determined by mfold for the pri-miRNAs used in this study. The mature miRNA, as annotated in miRBase,32 is in bold. (B) The top two structures predicted by mfold for pri-mir-16-1, showing differences in the terminal loop and the neighboring end of the stem. Structure 2 is the same as that shown in panel A.

2.3.2 Secondary Structures Determined Under Native Conditions

This panel of RNAs was investigated once sufficient SHAPE reactivity-producing conditions were optimized (discussed in subsequent sections). Surprisingly, normalized SHAPE reactivity profiles indicate that many of the small helical imperfections in the pri-miRNA stems are not disruptive to the A-form helix (Fig. 2-5). Internal to the double-stranded stem of the pri-miRNA, high normalized SHAPE reactivities indicate a lack of base pairing and/or base stacking at the site of multiple-nucleotide bulges and asymmetric internal loops, as expected. However, it was seen that approximately half of the single-nucleotide bulges and mismatches predicted by mfold were not detected across the three pri-miRNAs. It was unclear at this point whether the remaining single-nucleotide deformations were not being detected by the SHAPE reagent or whether these deformations existed in a non-canonical base-pairing interaction. However, corroboration for non-canonical base-pairing interactions diminishing attack by the SHAPE reagent is available in the literature.33-37 In contrast, the terminal loop and single-stranded flanking tails served as positive controls for high levels of modification by the SHAPE reagent. In particular, much larger terminal loops are estimated from the SHAPE reactivities than what is predicted by mfold, especially in the case of pri-mir-107 and pri-mir-30a.


Figure 2-5. SHAPE reactions of pri-mir-16-1 (top), pri-mir-107 (middle), and pri-mir-30a (bottom). (A) The secondary structure of the pri-miRNA predicted by mfold. (B) The denaturing gel for the sequencing and SHAPE reactions in the presence of HEPES buffer (pH 7.6) using the 1M7 reagent at 37°C. (C) The absolute SHAPE reactivity trace as a function of nucleotide position with corresponding unconstrained regions indicated. (D) The SHAPE-derived secondary structure, with unpaired nucleotides indicated in blue as reported from the high SHAPE reactivities. Non-canonical base pairing interactions are indicated with dots connecting the bases, rather than straight lines. Cassette linker regions have been deleted for better visualization.

These SHAPE-derived secondary structure results suggest that most symmetric internal loops do not cause the average overall structure of pri-miRNAs to deviate significantly from an A-form helix, which is consistent with findings in the literature.38, 39 While 1 × 1 loops were rarely modified by the SHAPE reagent, single-nucleotide bulges were often modified to a detectable extent, suggesting that single-nucleotide bulges are capable of bending the overall structure of the RNA at that site. For example, each of the three U bulges – one located in each pri-miRNA – was significantly modified. The only single-nucleotide bulge not modified above background was the A bulge in pri-mir-30a, which is likely to be participating in purine-purine stacking on both sides. Overall, the resulting normalized SHAPE reactivity for each pri-miRNA indicates that the majority of the symmetric internal loops do not provide the SHAPE reagent with sufficient access to the 2′-hydroxyl, implying that these defects do not significantly alter the A-form helix of the double-stranded stem.

An advantage of the SHAPE technique is that structural information can be obtained for every nucleotide in the RNA; the absence of such comprehensive coverage is a common limitation for most other solution structure mapping techniques. However, as can be seen from the denaturing gels and in the SHAPE reactivity traces in Figure 2-5, nucleotide resolution at the 5´-end is very poor due to insufficient separation of bands on the gel. Insufficient separation can be partially avoided by double-loading a gel; i.e., loading a sample on the same gel at two different time points in different lanes. In addition, the SHAPE technique tends to be very noisy, particularly at the 5´-end of the RNA, which is typically overcome by running several replicates (e.g., 21 replicates were used to determine the secondary structures studied in Chapter 3).

Of note is the re-folding of the native structure of the DGCR8 5´-UTR (Fig. 2-6) induced by the SHAPE cassette; therefore, this RNA was not studied further. One option for troubleshooting this obstacle could be to modify the cassette sequence to minimize its base pairing complementarity with the UTR sequence, thereby minimizing interference with the native UTR structure. The SHAPE reactivity trace in Figure 2-6C shows the positioning of the new stem loop in the re-folded structure, in comparison to where the stem loop is predicted to be in the native UTR structure, which is depicted in panel D. In particular, it appears that the cut sites that are located adjacent to the cassette linkers are the cause of the re-folding.


Figure 2-6. SHAPE reactions of DGCR8’s 5´-UTR show re-folding of the native structure induced by the SHAPE cassette. (A) The secondary structure of the RNA predicted by mfold. (B) The denaturing gel for the sequencing and SHAPE reactions in the presence of HEPES buffer (pH 7.6) using the 1M7 reagent at 37°C. (C) The absolute SHAPE reactivity trace as a function of nucleotide position with linker and stem loop regions indicated. (D) The re-folded secondary structure induced by the cassette, showing the native stem loop and the new stem loop due to re-folding.

The following sections describe the optimizations required to yield the previously shown data, encompassing buffer and pH conditions. All of the following experiments were performed using pri-mir-107 as the model pri-miRNA, the results of which will be compared to the SHAPE-derived secondary structure in Figure 2-5D. Thereafter, unfolding of the pri-miRNA with temperature was monitored using SHAPE chemistry, as well as the possibility of tertiary interactions forming within the pri-miRNA in the presence of magnesium. Lastly, a variety of SHAPE reagents were used to monitor the different time-scale dynamic changes within the RNA.

2.3.3 Buffer Compatibility with SHAPE Reagents

In the early SHAPE literature, HEPES buffer (pH 7.6) was typically used, which is a choice we replicated throughout the early stages of this study. However, in an effort to mimic buffer conditions used in EMSA binding studies in the lab with DGCR8, cacodylate buffer was tested in the SHAPE reactions as well. Initially, 1M7 was used as the modification reagent, but as Figure 2-7B shows, 1M7 is inactive in the presence of cacodylate buffer. This was tested at different pH levels for both HEPES and cacodylate buffers to ensure that pH was not the cause of inactivity. In an effort to find a SHAPE reagent compatible with cacodylate buffer, we next tested benzoyl cyanide (BzCN). The denaturing gel in Figure 2-7C shows that the BzCN SHAPE reagent is active and capable of modifying the RNA at near neutral pH for both buffers. The SHAPE reactivity profiles generated from the gel data are generally similar for pri-mir-107 irrespective of the buffer used, with the exception of low-reactive nucleotides on the 5´-arm of the pri-miRNA, which may be a result of the pH difference (vide infra).

While it is possible that this buffer compatibility problem is widespread for all anhydride reagents with cacodylate buffers, this was not tested. It is clear from this limited study, however, that due to buffer compatibility issues, it is wise to screen a variety of SHAPE reagents with a particular buffer to ensure that the SHAPE reactivity profile is accurate before scaling up to a full series of experiments.

Figure 2-7. The 1M7 anhydride SHAPE reagent does not modify the RNA in the presence of cacodylate buffer, whereas the cyanide SHAPE reagent modifies in both buffers tested. (A) The secondary structure of pri-mir-107 derived from SHAPE reactivities. (B) The denaturing gel for the sequencing and SHAPE reactions in the presence of HEPES and cacodylate buffers using the 1M7 reagent at 37°C, showing that 1M7 is not capable of modifying the RNA at either pH level tested in cacodylate buffer. (C) The denaturing gel for the sequencing and SHAPE reactions in the presence of HEPES and cacodylate buffers using the BzCN reagent at 37°C. (D) The absolute SHAPE reactivity trace as a function of nucleotide position with corresponding unconstrained regions indicated from the gel in panel C.

2.3.4 pH Variability in SHAPE Reactions

As mentioned previously, small differences in SHAPE reactivity profiles may be due to differences in pH, causing variability in the hydrolysis mechanism of the SHAPE reagent. Therefore, the structure of pri-mir-107 was investigated using HEPES buffer over its entire buffering pH range. Due to the possibility that more acidic pH levels diminish the rate of reagent hydrolysis, two time points were also used: 1 minute, which equals the approximate time for five hydrolysis half-lives at the reported hydrolysis rate; and 1 hour, which was long enough to ensure full hydrolysis.

As predicted, Figure 2-8 shows that different pH levels typically correlate with a difference in timing required for complete hydrolysis of the SHAPE reagent. At the higher pH 8.2, the SHAPE reactivity profiles are almost identical for both time points. But as the pH is decreased, the SHAPE reaction yields different reactivities for the highly modified nucleotides over time. At pH 6.8, even by 1 hour, the average SHAPE reactivity never reaches those of the higher pH, when normalized equally during analysis. These results suggest that although a lower pH in the SHAPE reaction typically yields lower SHAPE reactivities, signal strength can be recovered to an extent by increasing the reaction duration. If such a course of action is to be taken, a time trial is recommended for the chosen combination of buffer and pH, particularly if the pH is to be mildly acidic.
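The reasoning behind the two time points can be illustrated with a simple decay model. The sketch below assumes first-order reagent hydrolysis (an assumption made for illustration) and uses the 14 s half-life of 1M7 quoted in Section 2.2.3; the ten-fold slower half-life is purely hypothetical and is included only to show how a reduced hydrolysis rate would leave the 1 minute reaction incomplete while the 1 hour reaction still runs to completion.

```python
def fraction_hydrolyzed(t_s: float, half_life_s: float) -> float:
    """Fraction of reagent consumed after t_s seconds of first-order decay."""
    return 1.0 - 2.0 ** (-t_s / half_life_s)


if __name__ == "__main__":
    reported = 14.0          # 1M7 half-life (s) quoted in Section 2.2.3
    slowed = 10 * reported   # HYPOTHETICAL slower hydrolysis at acidic pH
    for label, t_half in (("reported", reported), ("hypothetical", slowed)):
        for t in (60.0, 3600.0):
            print(f"{label:>12} t1/2={t_half:6.1f} s, t={t:6.0f} s: "
                  f"{fraction_hydrolyzed(t, t_half):.3f} hydrolyzed")
```

A time trial of the kind recommended above amounts to measuring this curve empirically for the chosen buffer and pH rather than relying on the reported half-life.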

Figure 2-8. Using lower pH in the SHAPE reaction typically yields lower SHAPE reactivities, which can be somewhat recovered with an increase in reaction time. (A) The secondary structure of pri-mir-107 derived from SHAPE reactivities. (B) The denaturing gel for the sequencing and SHAPE reactions in the presence of HEPES buffer at various pH levels using the 1M7 reagent for both 1 minute and 1 hour at 37°C. (C) The absolute SHAPE reactivity traces at all pH levels as a function of nucleotide position with corresponding unconstrained regions indicated.

2.3.5 Monitoring RNA Unfolding with Temperature

The SHAPE technique can also be used to monitor the unfolding of RNA by performing the SHAPE reaction at multiple temperatures. RNA molecules typically begin to unfold at the most unstructured regions first, followed by the double-stranded stem. Monitoring the unfolding of particular regions in RNAs using SHAPE chemistry is particularly useful for RNAs exhibiting more complex tertiary folded structures, as has been shown in the literature.15, 40 Here, the unfolding of pri-mir-107 was monitored in the range of 20°C up to 60°C; the melting temperature seen from UV melting is 57°C (Fig. 2-9). The gel shows that the majority of the RNA is being modified by the SHAPE reagent by 55°C (Fig. 2-10B), which correlates well with the UV-determined melting temperature. At 50°C, the SHAPE reactivity profiles show that portions of the RNA begin to yield SHAPE reactivities quantitatively similar to those of the unstructured terminal loop, indicating that these regions are becoming unstructured and unfolded at this temperature.

Figure 2-9. UV melt of pri-mir-107 showing the absorbance change with temperature as the RNA unfolds (260 nm). The first derivative (red) shows the melting temperature at 57°C.
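Reading a melting temperature from the first derivative of a melt curve can be sketched as below. This is a minimal illustration using central differences on a synthetic two-state sigmoid; the curve, its parameters, and the function names are assumptions made for demonstration and do not represent the measured pri-mir-107 data or the software used for the analysis.

```python
import math
from typing import List, Tuple


def melting_temperature(temps: List[float], absorbance: List[float]) -> float:
    """Return the temperature at the maximum of dA/dT (central differences)."""
    best_t, best_slope = temps[0], float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (absorbance[i + 1] - absorbance[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


def synthetic_melt(tm: float, width: float = 3.0) -> Tuple[List[float], List[float]]:
    """SYNTHETIC two-state sigmoid, for illustration only (not measured data)."""
    temps = [20.0 + 0.5 * i for i in range(141)]  # 20-90 °C in 0.5 °C steps
    absorb = [0.5 + 0.1 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]
    return temps, absorb


if __name__ == "__main__":
    temps, absorb = synthetic_melt(tm=57.0)
    print(f"Estimated Tm: {melting_temperature(temps, absorb):.1f} °C")
```

Real absorbance data are noisier than this idealized curve, so smoothing before differentiation is usually needed in practice.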


Figure 2-10. SHAPE chemistry used to monitor the unfolding of RNA. (A) The secondary structure of pri-mir-107 derived from SHAPE reactivities. The melting temperature determined from UV melting is 57°C. (B) The denaturing gel for the sequencing and SHAPE reactions in the presence of HEPES buffer (pH 7.6) at various temperatures (40-60°C) using the 1M7 reagent. (C) The absolute SHAPE reactivity traces at all temperatures as a function of nucleotide position with corresponding unconstrained regions indicated. Otherwise, all SHAPE reactions were customarily performed at the biological temperature of 37°C throughout this chapter.

2.3.6 Magnesium Dependence

It is well-known that RNAs require the presence of a divalent cation to fold into their proper tertiary structures. Although many loci encoding multiple pri-miRNAs have been documented, and in one case interactions between the stems have been shown,41 individual pri-miRNAs do not generally exhibit tertiary structure. Even so, SHAPE was performed with and without the addition of 5 mM divalent magnesium cations to confirm the absence of tertiary interactions for the pri-miRNAs investigated here (Fig. 2-11). The SHAPE reactivity profiles are generally consistent across the entire pri-miRNA structure for all three SHAPE reagents tested, verifying that the investigated pri-miRNAs do not contain tertiary interactions that would fold the RNA into a different structure in the presence of divalent cations.

Figure 2-11. SHAPE chemistry shows that pri-miRNAs do not exhibit tertiary structure. (A) The secondary structure of pri-mir-107 derived from SHAPE reactivities. (B) The absolute SHAPE reactivity traces for three different SHAPE reagents with and without 5 mM Mg2+ as a function of nucleotide position with corresponding unconstrained regions indicated. All reactions were performed in the presence of HEPES buffer (pH 7.6) at 37°C.

2.3.7 SHAPE Reagents

Another facet of SHAPE chemistry is the availability of several reagents, typically anhydrides, that are capable of monitoring different time-scale conformational changes in the RNA.24 Among the reagents that I have used for pri-mir-107 modification are benzoyl cyanide (BzCN), 1-methyl-7-nitroisatoic anhydride (1M7), 4-nitroisatoic anhydride (4NIA), N-methylisatoic anhydride (NMIA), and isatoic anhydride (IA) (Fig. 2-12B, right). Due to their different hydrolysis rates, these reagents are capable of reacting with nucleotides in the RNA that undergo time-varying dynamic changes before the reagent is fully hydrolyzed (e.g., breathing, sugar pucker switching, non-canonical base pairing, unstructured loops). For example, IA has the longest hydrolysis half-life and has been shown in the literature to report on ribose sugar pucker conformations of GA tandem mismatches, providing the base-pairing context of these non-canonical interactions.24

Figure 2-12 shows that the different reagents all result in similar SHAPE reactivity profiles for pri-mir-107, with BzCN displaying lower reactivities for the majority of the unconstrained nucleotides due to its very short half-life. No statistically significant difference is seen for the longest reacting SHAPE reagent (IA) in comparison to the others. However, it is possible that the U nucleobase from the non-canonical U/C base pair is being modified more than the other canonical base pairs (Fig. 2-12B, bottom panel); more replicates would need to be done to confirm this result. Overall, these results suggest that the Microprocessor may be recognizing a more perfect duplex rather than several imperfections in the hairpin, independent of the possibly dynamic conformations being sampled over different time-scales.

Figure 2-12. Various SHAPE reagents can be used to investigate the RNA’s different time-dependent dynamics. Whereas BzCN is a cyanide molecule, the other four reagents are all anhydrides. (A) The secondary structure of pri-mir-107 derived from SHAPE reactivities. (B) The absolute SHAPE reactivity traces for all SHAPE reagents as a function of nucleotide position with corresponding unconstrained regions indicated. All reactions were performed in the presence of HEPES buffer (pH 7.6) at 37°C for five hydrolysis half-lives. The SHAPE reagent structure and half-life42 are indicated on the right.

2.4 Conclusion

SHAPE chemistry has emerged as an effective and accurate technique for determining the secondary structure of RNAs. In this study, SHAPE showed that for pri-miRNAs, only approximately half of the single-nucleotide bulges and mismatches are detected by the SHAPE reagent (e.g., A/A mismatch in pri-mir-16-1 and U bulge in pri-mir-107). Although literature precedent exists for non-canonical base pairs, it is unclear at this point whether the remaining single-nucleotide deformations are not being detected by the SHAPE reagent or whether these deformations are acting as perfect dsRNA in an A-form helix. Therefore, it is necessary to perform other structure mapping techniques such as DMS modification43 or endonuclease cleavage44, 45 to this end.

The implementation of the SHAPE constraints altered the hairpin structure to different degrees for each RNA in comparison to those structures predicted by mfold. One explanation for these variations in predicted and experimentally-derived structures could be that the SHAPE reagent is not well-suited for detecting small imperfections; however, there is strong support of these results in the literature (see Chapter 3). Therefore, the SHAPE reactivities of the three pri-miRNAs from this study were used as experimental constraints implemented into MC-Pipeline to yield refined secondary structures of pri-miRNAs that enabled the discovery of a possible recognition site for Microprocessor cleavage, the “hot spot” (see Chapter 3).46

Since the introduction of SHAPE chemistry, other applications have been developed for protein and ligand footprinting, such as in the case of a rRNA with a ribosomal protein showing that both molecules undergo conformational rearrangement upon binding.47 Furthermore, a high-throughput method has been developed, called SHAPE-Seq,48 which allows rapid characterization of both secondary and tertiary interactions within highly complex RNAs with nucleotide resolution.


2.5 Acknowledgements

This work was supported by the US National Institutes of Health grant R01GM098451 and start-up funds from the Pennsylvania State University to SAS. We thank Philip Bevilacqua and Kit Kwok for helpful discussion while developing the experimental protocols, and Chris Wostenberg for synthesizing the 1M7 SHAPE reagent.


2.6 Appendix

SHAPE Protocol:

1. Heat total volume of 1.0 µM RNA (4 pmol RNA in 5 µL water) in a PCR tube to 85°C for 1 min and place on ice for 2 min using program “SHAPE 1M7” on thermocycler.
   a. Make sure to make a large enough mix for all samples (+/- SHAPE reagent and sequencing reactions).
2. Add an equivalent volume of 2X folding buffer (100 mM HEPES pH 7.6, 100 mM KCl) and mix thoroughly.
3. Incubate tube at 37°C for 10 min (also on program).
   a. While incubating, aliquot solution into 10 µL aliquots in PCR tubes.
   b. If doing protein footprinting: Incubate for 5 min using “SHAPE Bind” program.
      i. Add protein to appropriate tubes and sit additional 25 min for binding to reach equilibrium.
4. Add 1 µL of 10X 1M7 (100 mM) in anhydrous DMSO to the +SHAPE tubes and 1 µL DMSO to background (-SHAPE) and sequencing tubes.
5. Incubate the tubes at 37°C for five hydrolysis half-lives (t½ for 1M7 is 14 s).
6. Recover RNA by EtOH precipitation:
   a. Add 1 µL 5 M NaCl, 0.5 µL 20 mg/mL glycogen, 12.5 µL water, and 75 µL EtOH.
      i. If doing footprinting: Remove protein by adding 40 µL 0.33 M NaOAc, 2% SDS.
         1. Phenol/chloroform extract (chloroform only once).
         2. Add 2.5 µL 5 M NaCl, 0.5 µL glycogen, 150 µL EtOH.
   b. Incubate on powdered dry ice for 10 min.
   c. Centrifuge at 11,500 rpm at 4°C for 30 min.
   d. Pipette off EtOH supernatant.
   e. Wash pellet with 70% EtOH.
   f. Speedvac for approximately 10-15 min until dry.
7. Resuspend each sample in 4.5 µL water.
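Step 1a calls for a master mix large enough for all samples. The helper below is a rough sketch of that scaling, based on the per-tube amounts in steps 1 to 3 (5 µL of RNA solution containing 4 pmol, plus 5 µL of 2X folding buffer); the function name and the 10% overage default are hypothetical and are not part of the protocol.

```python
def master_mix(n_tubes: int, overage: float = 0.10) -> dict:
    """Volumes (µL) of RNA solution and 2X folding buffer for n 10-µL aliquots.

    Per tube: 5 µL RNA solution (4 pmol) + 5 µL 2X folding buffer (steps 1-3).
    overage is a HYPOTHETICAL pipetting allowance, not given in the protocol.
    """
    if n_tubes < 1:
        raise ValueError("need at least one tube")
    scale = n_tubes * (1.0 + overage)
    rna_ul = 5.0 * scale
    return {
        "rna_ul": round(rna_ul, 2),
        "buffer_2x_ul": round(rna_ul, 2),   # equal volume of 2X buffer
        "rna_pmol": round(4.0 * scale, 2),
        "total_ul": round(2 * rna_ul, 2),
    }


if __name__ == "__main__":
    # Example: +SHAPE, -SHAPE, and four sequencing lanes = 6 tubes.
    print(master_mix(6))
```

The tube count in the usage example (one +SHAPE, one background, and four sequencing reactions) is likewise only an illustration.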

SHAPE Reverse Transcription (RT) Protocol:

1. Add 0.5 µL of 10 µM fluorescently labeled primer (Cy5-labeled oligo; see Materials and Methods) to the 4.5 µL of RNA.
2. Anneal the primer to the RNA solution by heating at 65°C for 5 min and cooling to 35°C for 5 min using thermocycler program “SHAPE RT SSIII”.
   a. Pause program if needed.
3. Add 3 µL SSIII enzyme mix made from Invitrogen kit to each RNA solution.
   a. SSIII enzyme mix: 4 parts SSIII FS buffer, 1 part 0.1 M DTT, 1 part 10 mM dNTP mix.
4. Add 1 µL of 5 mM ddNTP to sequencing tubes and 1 µL water to background (-SHAPE) and +SHAPE tubes.
5. Incubate at 52°C for 3 min in thermocycler using the same program.
   a. Pause thermocycler at end of 3 min.
6. Add 0.5 µL SSIII enzyme (200 U/µL) to each tube.
7. Incubate at 52°C for 10 min in thermocycler using same program.
8. At the end of 10 min, immediately add 0.5 µL of 2 M NaOH, heat at 95°C for 7 min, and let cool to 4°C for 5 min in thermocycler.
9. Add an equivalent amount (9 µL) of 2X loading dye (100 mM Tris pH 7.5, 20 mM EDTA, remaining formamide) and boil samples at 90°C for 1.5 min to destroy all RNA.
10. Load 5 µL of each sample on a denaturing gel and run at 100 W for ~1.5 hours.
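As a consistency check, the DTT and dNTP concentrations of the SSIII enzyme mix in step 3a can be worked out from the stock concentrations and the 4:1:1 mixing ratio. The sketch below does this arithmetic; the function and variable names are illustrative only. The results agree with the 17 mM DTT and 1.67 mM dNTP mix quoted for the reverse transcription buffer in Section 2.2.4.

```python
def mix_concentration(stock_mM: float, parts: float, total_parts: float) -> float:
    """Concentration (mM) of a stock after dilution into a mix by parts."""
    return stock_mM * parts / total_parts


# SSIII enzyme mix from step 3a: 4 parts buffer, 1 part DTT, 1 part dNTP mix.
TOTAL_PARTS = 4 + 1 + 1
dtt_mM = mix_concentration(100.0, 1, TOTAL_PARTS)   # 0.1 M DTT stock
dntp_mM = mix_concentration(10.0, 1, TOTAL_PARTS)   # 10 mM dNTP stock

if __name__ == "__main__":
    print(f"DTT in mix:  {dtt_mM:.1f} mM")
    print(f"dNTP in mix: {dntp_mM:.2f} mM")
```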

2.7 References

1. Okada, C.; Yamashita, E.; Lee, S. J.; Shibata, S.; Katahira, J.; Nakagawa, A.; Yoneda, Y.; Tsukihara, T., A high-resolution structure of the pre-microRNA nuclear export machinery. Science 2009, 326 (5957), 1275-9.
2. Zeng, Y.; Cullen, B. R., Sequence Requirements for Micro RNA Processing and Function in Human Cells. RNA 2003, 9, 112-123.
3. Zeng, Y.; Cullen, B. R., Efficient processing of primary microRNA hairpins by drosha requires flanking nonstructured RNA sequences. J Biol Chem 2005, 280 (30), 27595-27603.
4. Han, J. J.; Lee, Y.; Yeom, K. H.; Nam, J. W.; Heo, I.; Rhee, J. K.; Sohn, S. Y.; Cho, Y. J.; Zhang, B. T.; Kim, V. N., Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 2006, 125 (5), 887-901.
5. Zuker, M., Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 2003, 31 (13), 3406-3415.
6. Krol, J.; Sobczak, K.; Wilczynska, U.; Drath, M.; Jasinska, A.; Kaczynska, D.; Krzyzosiak, W. J., Structural features of microRNA (miRNA) precursors and their relevance to miRNA biogenesis and small interfering RNA/short hairpin RNA design. J Biol Chem 2004, 279 (40), 42230-9.
7. Nagaswamy, U.; Larios-Sanz, M.; Hury, J.; Collins, S.; Zhang, Z. D.; Zhao, Q.; Fox, G. E., NCIR: a database of non-canonical interactions in known RNA structures. Nucleic Acids Res 2002, 30 (1), 395-397.
8. Davis, A. R.; Kirkpatrick, C. C.; Znosko, B. M., Structural characterization of naturally occurring RNA single mismatches. Nucleic Acids Res 2011, 39 (3), 1081-94.
9. Hermann, T.; Patel, D. J., RNA bulges as architectural and recognition motifs. Structure with Folding & Design 2000, 8 (3), R47-R54.
10. Warf, M. B.; Johnson, W. E.; Bass, B. L., Improved annotation of C. elegans microRNAs by deep sequencing reveals structures associated with processing by Drosha and Dicer. RNA 2011, 17 (4), 563-77.
11. Low, J. T.; Weeks, K. M., SHAPE-directed RNA secondary structure prediction. Methods 2010, 52 (2), 150-158.
12. Mortimer, S. A.; Weeks, K. M., A fast-acting reagent for accurate analysis of RNA secondary and tertiary structure by SHAPE chemistry. J Am Chem Soc 2007, 129 (14), 4144-5.
13. Mortimer, S. A.; Weeks, K. M., Time-Resolved RNA SHAPE Chemistry. J Am Chem Soc 2008, 130 (48), 16178.
14. Weeks, K. M.; Mauger, D. M., Exploring RNA Structural Codes with SHAPE Chemistry. Acc Chem Res 2011, 44 (12), 1280-1291.
15. Bindewald, E.; Wendeler, M.; Legiewicz, M.; Bona, M. K.; Wang, Y.; Pritt, M. J.; Le Grice, S. F. J.; Shapiro, B. A., Correlating SHAPE signatures with three-dimensional RNA structures. RNA 2011, 17 (9), 1688-1696.
16. Gherghe, C. M.; Shajani, Z.; Wilkinson, K. A.; Varani, G.; Weeks, K. M., Strong correlation between SHAPE chemistry and the generalized NMR order parameter (S-2) in RNA. J Am Chem Soc 2008, 130 (37), 12244.
17. Deigan, K. E.; Li, T. W.; Mathews, D. H.; Weeks, K. M., Accurate SHAPE-directed RNA structure determination. Proc Natl Acad Sci U S A 2009, 106 (1), 97-102.
18. Wilkinson, K. A.; Gorelick, R. J.; Vasa, S. M.; Guex, N.; Rein, A.; Mathews, D. H.; Giddings, M. C.; Weeks, K. M., High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states. PLoS Biol 2008, 6 (4), 883-899.
19. Merino, E. J.; Wilkinson, K. A.; Coughlan, J. L.; Weeks, K. M., RNA structure analysis at single nucleotide resolution by selective 2'-hydroxyl acylation and primer extension (SHAPE). J Am Chem Soc 2005, 127 (12), 4223-4231.
20. Wilkinson, K. A.; Merino, E. J.; Weeks, K. M., Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat Protoc 2006, 1 (3), 1610-6.
21. Das, R.; Laederach, A.; Pearlman, S. M.; Herschlag, D.; Altman, R. B., SAFA: Semi-automated footprinting analysis software for high-throughput quantification of nucleic acid footprinting experiments. RNA 2005, 11 (3), 344-354.
22. Reuter, J. S.; Mathews, D. H., RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 2010, 11.
23. Parisien, M.; Major, F., The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature 2008, 452 (7183), 51-55.
24. Gherghe, C. M.; Mortimer, S. A.; Krahn, J. M.; Thompson, N. L.; Weeks, K. M., Slow conformational dynamics at C2'-endo Nucleotides in RNA. J Am Chem Soc 2008, 130 (28), 8884.
25. Wostenberg, C.; Quarles, K. A.; Showalter, S. A., Dynamic origins of differential RNA binding function in two dsRBDs from the miRNA "microprocessor" complex. Biochemistry 2010, 49 (50), 10728-36.
26. Beutner, G. L.; Kuethe, J. T.; Yasuda, N., A practical method for preparation of 4-hydroxyquinolinone esters. J Org Chem 2007, 72 (18), 7058-7061.
27. Triboulet, R.; Chang, H. M.; Lapierre, R. J.; Gregory, R. I., Post-transcriptional control of DGCR8 expression by the Microprocessor. RNA 2009, 15 (6), 1005-1011.
28. Aqeilan, R. I.; Calin, G. A.; Croce, C. M., miR-15a and miR-16-1 in cancer: discovery, function and future perspectives. Cell Death Differ 2010, 17 (2), 215-220.
29. Wang, W. X.; Rajeev, B. W.; Stromberg, A. J.; Ren, N.; Tang, G.; Huang, Q.; Rigoutsos, I.; Nelson, P. T., The expression of microRNA miR-107 decreases early in Alzheimer's disease and may accelerate disease progression through regulation of beta-site amyloid precursor protein-cleaving enzyme 1. J Neurosci 2008, 28 (5), 1213-23.
30. Zhu, H.; Wu, H.; Liu, X.; Li, B.; Chen, Y.; Ren, X.; Liu, C. G.; Yang, J. M., Regulation of autophagy by a beclin 1-targeted microRNA, miR-30a, in cancer cells. Autophagy 2009, 5 (6), 816-23.
31. Zhang, X. X.; Zeng, Y., The terminal loop region controls microRNA processing by Drosha and Dicer. Nucleic Acids Res 2010, 38 (21), 7689-7697.
32. Kozomara, A.; Griffiths-Jones, S., miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 2011, 39 (Database issue), D152-7.
33. Turner, D. H.; Hammond, N. B.; Tolbert, B. S.; Kierzek, R.; Kennedy, S. D., RNA Internal Loops with Tandem AG Pairs: The Structure of the 5 ' G(UG)under-barU/3 ' U(GA)under-barG Loop Can Be Dramatically Different from Others, Including 5 ' A(AG)under-barU/3 ' U(GA)under-barA. Biochemistry 2010, 49 (27), 5817-5827.
34. Wu, M.; Turner, D. H., Solution structure of (rGCGGACGC)2 by two-dimensional NMR and the iterative relaxation matrix approach. Biochemistry 1996, 35 (30), 9677-89.
35. Ciesiolka, J.; Michalowski, D.; Wrzesinski, J.; Krajewski, J.; Krzyzosiak, W. J., Patterns of cleavages induced by lead ions in defined RNA secondary structure motifs. J Mol Biol 1998, 275 (2), 211-20.
36. Ban, N.; Nissen, P.; Hansen, J.; Moore, P. B.; Steitz, T. A., The complete atomic structure of the large ribosomal subunit at 2.4 A resolution. Science 2000, 289 (5481), 905-20.
37. Cruse, W. B. T.; Saludjian, P.; Biala, E.; Strazewski, P.; Prange, T.; Kennard, O., Structure of a Mispaired Rna Double Helix at 1.6-a Resolution and Implications for the Prediction of Rna Secondary Structure. Proc Natl Acad Sci USA 1994, 91 (10), 4160-4164.
38. Bevilacqua, P. C.; George, C. X.; Samuel, C. E.; Cech, T. R., Binding of the protein kinase PKR to RNAs with secondary structure defects: role of the tandem A-G mismatch and noncontiguous helixes. Biochemistry 1998, 37 (18), 6303-16.
39. Hammond, N. B.; Tolbert, B. S.; Kierzek, R.; Turner, D. H.; Kennedy, S. D., RNA internal loops with tandem AG pairs: the structure of the 5'GAGU/3'UGAG loop can be dramatically different from others, including 5'AAGU/3'UGAA. Biochemistry 2010, 49 (27), 5817-27.
40. Wilkinson, K. A.; Merino, E. J.; Weeks, K. M., RNA SHAPE chemistry reveals nonhierarchical interactions dominate equilibrium structural transitions in tRNA(Asp) transcripts. J Am Chem Soc 2005, 127 (13), 4659-4667.
41. Chaulk, S. G.; Xu, Z. Z.; Glover, M. J. N.; Fahlman, R. P., MicroRNA miR-92a-1 biogenesis and mRNA targeting is modulated by a tertiary contact within the miR-17 similar to 92 microRNA cluster. Nucleic Acids Res 2014, 42 (8), 5234-5244.
42. Gherghe, C. M.; Leonard, C. W.; Ding, F.; Dokholyan, N. V.; Weeks, K. M., Native-like RNA Tertiary Structures Using a Sequence-Encoded Cleavage Agent and Refinement by Discrete Molecular Dynamics. J Am Chem Soc 2009, 131 (7), 2541-2546.
43. Tijerina, P.; Mohr, S.; Russell, R., DMS footprinting of structured RNAs and RNA-protein complexes. Nat Protoc 2007, 2 (10), 2608-23.
44. Myers, R. M.; Larin, Z.; Maniatis, T., Detection of single base substitutions by ribonuclease cleavage at mismatches in RNA:DNA duplexes. Science 1985, 230 (4731), 1242-6.
45. Shen, L. X.; Basilion, J. P.; Stanton, V. P., Jr., Single-nucleotide polymorphisms can cause different structural folds of mRNA. Proc Natl Acad Sci U S A 1999, 96 (14), 7871-6.
46. Quarles, K. A.; Sahu, D.; Havens, M. A.; Forsyth, E. R.; Wostenberg, C.; Hastings, M. L.; Showalter, S. A., Ensemble analysis of primary microRNA structure reveals an extensive capacity to deform near the drosha cleavage site. Biochemistry 2013, 52 (5), 795-807.
47. Mayerle, M.; Bellur, D. L.; Woodson, S. A., Slow Formation of Stable Complexes during Coincubation of Minimal rRNA and Ribosomal Protein S4. J Mol Biol 2011, 412 (3), 453-465.
48. Lucks, J. B.; Mortimer, S. A.; Trapnell, C.; Luo, S. J.; Aviran, S.; Schroth, G. P.; Pachter, L.; Doudna, J. A.; Arkin, A. P., Multiplexed RNA structure characterization with selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc Natl Acad Sci USA 2011, 108 (27), 11063-11068.

66

Chapter 3

Ensemble Analysis of Primary MicroRNA Structure Reveals an Extensive Capacity to Deform Near the Drosha Cleavage Site

Published: Quarles, K.A., Sahu, D., Havens, M.A., Forsyth, E.R., Wostenberg, C., Hastings, M.L., and Showalter, S.A. Biochemistry 2013, 52 (5), 795-807.1

3.1 Abstract

Most noncoding RNAs function properly only when folded into complex three-dimensional (3D) structures, but the experimental determination of these structures remains challenging. Understanding of primary microRNA (miRNA) maturation is currently limited by a lack of determined structures for nonprocessed forms of the RNA. SHAPE chemistry efficiently determines RNA secondary structural information with single-nucleotide resolution, providing constraints suitable for input into MC-Pipeline for refinement of 3D structure models. Here we combine these approaches to analyze three structurally diverse primary microRNAs, revealing deviations from canonical double-stranded RNA structure in the stem adjacent to the Drosha cut site for all three. The necessity of these deformable sites for efficient processing is demonstrated through Drosha processing assays. The structure models generated herein support the hypothesis that deformable sequences spaced roughly once per turn of A-form helix, created by non-canonical structure elements, combine with the necessary single-stranded RNA:double-stranded RNA junction to define the correct Drosha cleavage site.


3.2 Introduction

Many noncoding RNAs undergo enzymatic processing to achieve their functional states, through mechanisms that are coupled to their three-dimensional structures. Therefore, generating atomic-resolution structure models is necessary for complete characterization of noncoding RNA processing, though this is currently a daunting task for most RNAs if attempted by crystallographic or nuclear magnetic resonance (NMR) methods. For many smaller noncoding RNAs, such as microRNAs (miRNAs), adequate models are realized through computational prediction of secondary structure, underscoring the need for computational approaches that yield highly accurate and biochemically validated structure models. However, available structure-mapping techniques face limitations when rare and undercharacterized motifs are encountered.2

Here we show that structure models of primary miRNA (pri-miRNA) transcripts, suitable for generating testable mechanistic hypotheses, are created by combining secondary structure mapping through selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry3 with structure calculation in MC-Pipeline.4 We apply these methods to three structurally diverse pri-miRNAs, revealing deviations from canonical A-form structure in each; we then demonstrate that these structural deformations significantly influence in vitro processing efficiency by the RNase III enzyme Drosha.

Mature miRNAs, of which more than 1,000 have been annotated in humans,5 regulate development and tissue differentiation through their role in the RNA silencing pathway.6

Canonical pri-miRNA transcripts adopt imperfect stem-loop structures embedded within single-stranded regions.6 In the canonical miRNA biogenesis pathway, pri-miRNAs are excised co-transcriptionally from longer RNAs by the Microprocessor complex, consisting minimally of Drosha and the double-stranded RNA (dsRNA) binding protein DGCR8, in a process that is tightly regulated.6, 7 Cleavage results in precursor miRNA (pre-miRNA) that is approximately 70 nucleotides in length and typically characterized by a two-nucleotide 3′-overhang at the cut site.8

To date, no atomic resolution structures of pri-miRNA models have been reported; with regard to pre-miRNAs, only the structure of pre-mir-30a bound to Exportin-5 has been determined.9

Consensus mechanistic proposals emphasize a role for the single-stranded RNA (ssRNA)-dsRNA junction and pri-miRNA structural heterogeneity, from bulges and internal loops, in Microprocessor positioning and cut-site recognition.8, 10, 11 In a recent bioinformatic study, Warf et al.12 predicted that most pri-miRNAs harbor a helical distortion at the Drosha cleavage site, with a majority of the distortions being symmetric internal loops of two nucleotides (i.e., single-nucleotide mismatches).

Generating a complete mechanistic model for miRNA processing requires the determination of structures of representative pri-miRNAs.13 SHAPE chemistry has emerged as a powerful method for defining base-pairing status with single-nucleotide resolution.3 Hairpin RNAs similar to pri-miRNAs have been studied using SHAPE chemistry, yielding results that compare well to their previously determined secondary structures.14 However, these previous studies made no attempt to generate SHAPE-constrained three-dimensional structure models.

Here, we analyze the structures of three RNAs: pri-mir-16-1, pri-mir-30a, and pri-mir-107. Of the three, pri-mir-16-1 and pri-mir-30a were chosen because of their extensive prior use as models for in vitro processing studies.8-11, 15-17 We analyzed pri-mir-107, which contains a 1 × 3 asymmetric loop at the cleavage site,18 because inclusion of the scissile bonds in bulges and internal loops is predicted to produce pre-miRNA molecules of inconsistent length.19 Secondary structure constraints were generated by SHAPE, and the data were then incorporated into MC-Pipeline calculations,4 producing low-resolution atomic structure models of the RNA stem-loop structures in a relatively high-throughput manner. Surprisingly, normalized SHAPE reactivity profiles indicate that many of the small helical imperfections in the RNA stems are not disruptive to the A-form helix, results that are corroborated by ribonuclease cleavage assays. In all three pri-miRNAs, the MC-Pipeline structure ensembles feature an extensive ability to deform the dsRNA stem between the ssRNA-dsRNA junction and the Drosha cut site. Drosha processing assays performed in vitro confirm that the presence of these deformable “hot spots” near the cut site enhances cleavage efficiency. Overall, we have developed an approach for generating structure models of small RNAs and applied it to pri-miRNAs to reveal an important structural aspect of Drosha processing.

3.3 Materials and Methods

3.3.1 RNA Preparation

All DNAs were purchased from Geneart. Template DNA for SHAPE reactions was inserted into a SHAPE cassette20 with an inverted BsaI cut site at the 3′-end. All DNAs were cloned into pUC19 (New England Biolabs) and transformed into DH5α competent cells, which were grown in LB medium at 37°C to an approximate OD600 of 3.75. Template DNA for ribonuclease structure mapping was prepared identically, except that the SHAPE cassette sequences were not present. The preparation of template DNA, transcription by T7 RNA polymerase, and purification of the transcribed RNA were all performed as previously described.21

3.3.2 RNA Modification by 1M7

RNA (4 pmol) in 5 μL of sterile water was heated at 85°C for 1 min and cooled to 4°C for 2 min prior to the addition of 5 μL of folding buffer [50 mM HEPES (pH 7.6) and 50 mM KCl], incubation at 37°C for 10 min, and treatment with 1 μL of 1M7 (100 mM 1M7 in anhydrous DMSO) at 37°C for 70 s. Control and sequencing reaction mixtures were treated with 1 μL of anhydrous DMSO. All reaction mixtures were cooled to 4°C, and the RNA was recovered by ethanol precipitation (1 μL of 5 M NaCl, 0.5 μL of 20 mg/mL glycogen, 12.5 μL of sterile water, and 75 μL of ethanol added to each reaction mixture).
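As a quick check on the modification conditions above, the final reagent and RNA concentrations in the reaction can be estimated with the standard dilution relation C1·V1 = C2·V2. The sketch below is illustrative only: the function name is hypothetical, and it assumes the three volumes are simply additive (5 μL RNA, 5 μL folding buffer, 1 μL 1M7 stock), which the text does not state explicitly.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Dilution relation C1*V1 = C2*V2, solved for C2 (units follow the inputs)."""
    return stock_conc * stock_volume / total_volume

# Volumes taken from the protocol text; additivity of volumes is an assumption.
rna_volume_ul = 5.0
buffer_volume_ul = 5.0
reagent_volume_ul = 1.0
total_volume_ul = rna_volume_ul + buffer_volume_ul + reagent_volume_ul

# 1M7 stock is 100 mM in anhydrous DMSO.
one_m7_final_mm = final_concentration(100.0, reagent_volume_ul, total_volume_ul)

# 4 pmol of RNA in the final volume; pmol/uL is numerically equal to uM.
rna_final_um = 4.0 / total_volume_ul

print(f"1M7: {one_m7_final_mm:.2f} mM, RNA: {rna_final_um:.3f} uM")
```

Under these assumptions the reagent is present at roughly 9 mM, in large molar excess over the sub-micromolar RNA.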

3.3.3 Primer Extension

All reaction samples were mixed with 0.5 μL of Cy5 fluorescently labeled DNA primer (10 μM in sterile water, 5′-Cy5-GAACCGGACCGAAGCCCGATTTGG-3′, purchased from Sigma, HPLC-purified). The primers were annealed to the RNA by being heated at 65°C for 5 min and cooled at 35°C for 5 min prior to addition of 3 μL of reverse transcription buffer [167 mM Tris-HCl (pH 8.3), 250 mM KCl, dNTPs (1.67 mM each), 17 mM DTT, and 10 mM MgCl2]. After the sample had been heated to 52°C for 3 min, 1 μL of 5 mM ddNTP was added to the sequencing reaction mixtures, and 1 μL of sterile water was added to all other reaction mixtures. Primer extension was initiated by the addition of 0.5 μL of Superscript III (Invitrogen) and incubation at 52°C for 10 min. Immediately thereafter, 0.5 μL of 2 M NaOH was added, and the solutions were heated to 95°C for 7 min prior to being cooled at 4°C for 5 min. cDNA was treated with 9 μL of a neutralizing gel loading solution [100 mM Tris-HCl (pH 7.5) and 20 mM EDTA in formamide], heated to 90°C for 90 s, and separated on a denaturing polyacrylamide gel (8% 29:1 acrylamide:bisacrylamide, 8.4 M urea, 1X TBE; 95 W, 1 h). The gels were visualized on a Typhoon imager.

3.3.4 Processing of SHAPE Data

Each of the SHAPE reactions was conducted over 21 replicates and three background reactions. Gels were quantified using SAFA.22 After log-likelihood-based processing, normalizing, and background subtraction, the average and standard deviation of the SHAPE reactivity were calculated using MATLAB scripts from HiTRACE.23, 24 Mean SHAPE reactivities with standard deviations larger than their respective means were excluded from further analysis, along with any negative values. Positive SHAPE reactivities were rescaled to the range 0−1.
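The filtering and rescaling steps described above can be sketched in a few lines. This is not the HiTRACE/MATLAB code used in the study; it is a minimal Python illustration in which the function name, the input layout (one list of background-subtracted reactivities per replicate), and the choice to rescale by dividing by the largest retained mean are all assumptions. A mean of exactly zero is treated here as not positive and is discarded.

```python
from statistics import mean, stdev

def normalize_shape(replicates):
    """Filter and rescale per-nucleotide SHAPE reactivities.

    replicates: list of replicate measurements, each a list with one
    background-subtracted reactivity per nucleotide (needs >= 2 replicates).
    Returns one value per nucleotide: a rescaled reactivity in (0, 1],
    or None where the measurement was discarded.
    """
    n_nucleotides = len(replicates[0])
    kept = []
    for i in range(n_nucleotides):
        values = [rep[i] for rep in replicates]
        m, sd = mean(values), stdev(values)
        # Discard non-positive means and means smaller than their standard deviation.
        kept.append(m if (m > 0 and sd <= m) else None)

    retained = [m for m in kept if m is not None]
    if not retained:
        return kept
    # Assumed rescaling: divide by the maximum so that the top reactivity is 1.
    top = max(retained)
    return [None if m is None else m / top for m in kept]

# Toy data (three replicates, four nucleotides); values are invented for illustration.
toy_replicates = [
    [0.9, 0.1, -0.2, 0.5],
    [1.1, 0.5, -0.1, 0.5],
    [1.0, 0.0, -0.3, 0.5],
]
toy_result = normalize_shape(toy_replicates)
print(toy_result)
```

In the toy data the second nucleotide is dropped because its standard deviation exceeds its mean, and the third because its mean is negative.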

3.3.5 Ribonuclease Structure Mapping

The RNAs were 5′-end-labeled with [γ-32P]ATP using T4 polynucleotide kinase (New England Biolabs), and the RNA concentration was determined by liquid scintillation. The 5′-32P-end-labeled RNA (0.05 pmol of total RNA, 4 μL in sterile water) was renatured at 60°C for 10 min followed by 25°C for 10 min prior to the addition of 1 μL of folding buffer [50 mM HEPES (pH 7.6) and 50 mM KCl]. The RNA was digested in the presence of 1 μL of single-strand-specific (0.11 ng/mL RNase A or 0.25 unit/μL RNase T1) or double-strand-specific (1.7 × 10−3 unit/μL RNase V1) nucleases (Ambion) under native conditions at 37°C for 30 min. The T1 ladder was generated by incubating the labeled RNA (0.05 pmol, 2 μL in sterile water) under denaturing conditions in 3.5 μL of T1 digestion buffer [9.4 M urea, 28.3 mM sodium citrate (pH 3.5), and 1.4 mM EDTA] and 0.5 μL of 5 units/μL T1 at 50°C for 5 min. The hydrolysis ladder was generated by incubating the labeled RNA (0.03 pmol, 1.5 μL in sterile water) and 2 μL of hydrolysis buffer [100 mM Na2CO3/NaHCO3 (pH 9.0) and 2 mM EDTA] at 90°C for 5 min. All reactions were quenched by addition of an equal volume of 100 mM Tris-HCl (pH 7.5)/20 mM EDTA/formamide loading buffer, and the mixtures were boiled before being fractionated on a denaturing polyacrylamide gel (both 8% and 12% 29:1 acrylamide:bisacrylamide, 8.4 M urea, 1X TBE; 100 W, 1 h). The gels were visualized on a Typhoon imager.

3.3.6 MC-Fold and MC-Sym Simulation

MC-Fold and MC-Sym were used to simulate pri-miRNA structures, supplemented by the information provided by SHAPE. Initially, the 3′- and 5′-tails were constrained to be single-stranded in the MC-Fold simulations. MC-Fold output consisted of 1,000 dot-bracket solutions generated to explore 15% of the suboptimal structures, and the solutions were ordered on the basis of a total energy calculation. The single-stranded probability of each nucleotide was calculated from the number of occurrences of a dot versus a bracket at that position in the solution set, and this probability was used to color code the most stable secondary structure prediction from MC-Fold in the text. In summary, the MC-Fold output was used to create the secondary structure maps as these results are directly related to the base pairing probabilities.25, 26
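The per-nucleotide single-stranded probability described above can be illustrated with a short sketch. It is not part of MC-Fold; the function name is hypothetical, and two simplifications are assumed: every solution in the set is weighted equally (no energy weighting), and the probability is taken as the fraction of solutions in which a position is a dot.

```python
def single_stranded_probability(structures):
    """Fraction of dot-bracket solutions in which each position is unpaired ('.').

    structures: list of equal-length dot-bracket strings, e.g. the
    suboptimal solutions returned for one RNA.
    """
    if not structures:
        raise ValueError("at least one structure is required")
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("all structures must have the same length")

    dot_counts = [0] * length
    for structure in structures:
        for position, symbol in enumerate(structure):
            if symbol == ".":
                dot_counts[position] += 1
    return [count / len(structures) for count in dot_counts]

# Toy ensemble of three solutions for an 11-nucleotide hairpin (invented data).
toy_ensemble = [
    "..((...))..",
    ".(((...))).",
    "..((...))..",
]
toy_probabilities = single_stranded_probability(toy_ensemble)
print([round(p, 2) for p in toy_probabilities])
```

Positions that are unpaired in every solution score 1, fully paired positions score 0, and positions that pair in only some solutions receive intermediate values, which is the quantity mapped to color in the secondary structure diagrams.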

SHAPE reactivity data were provided as input to MC-Pipeline to further refine the single-strandedness predictions. Non-negative SHAPE reactivity was rescaled to range from 0 (no reactivity) to 1 (maximal reactivity) and subdivided into three categories. Highly reactive nucleotides, with normalized SHAPE reactivities of >0.22 (except for pri-mir-107 mut2, where the high cutoff was 0.37), were assigned a “medium” constraint strength in MC-Fold; this category reflects residues in the top 5% of SHAPE reactivity. All other statistically significant reactivity, defined by a normalized SHAPE reactivity between 0.03 and 0.22, was assigned a “low” constraint strength in MC-Fold. Any nucleotide with a reactivity of <0.03 was deemed not to have statistically significant reactivity and therefore was not assigned a reactivity-based structure constraint in MC-Fold. We found in early iterations that assigning “high” constraint strengths to the strongest SHAPE hits overweighted the SHAPE data, which seemed to overwhelm the influence of MC-Fold on the final output; we therefore chose the more conservative constraint strengths reported.
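The three-way binning of normalized reactivities can be written compactly. The sketch below only reproduces the cutoffs quoted in the text; the function name and the returned labels are illustrative, values exactly at a cutoff are assumed to fall in the “low” bin, and no attempt is made to reproduce the actual MC-Fold input syntax.

```python
def constraint_strength(reactivity, high_cutoff=0.22, low_cutoff=0.03):
    """Map a normalized SHAPE reactivity (0 to 1) to a constraint label.

    Returns "medium" above high_cutoff, "low" from low_cutoff up to and
    including high_cutoff, and None (no constraint) below low_cutoff or
    when the reactivity was discarded (None).
    """
    if reactivity is None or reactivity < low_cutoff:
        return None
    if reactivity > high_cutoff:
        return "medium"
    return "low"

# Invented reactivities for illustration; None marks a discarded measurement.
toy_reactivities = [0.50, 0.10, 0.01, None, 0.30]
default_labels = [constraint_strength(r) for r in toy_reactivities]

# pri-mir-107 mut2 used a higher cutoff of 0.37 for the top category.
mut2_labels = [constraint_strength(r, high_cutoff=0.37) for r in toy_reactivities]
print(default_labels)
print(mut2_labels)
```

Keeping the cutoffs as parameters makes the pri-mir-107 mut2 exception a one-argument change rather than a separate code path.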

The most probable secondary structure model from MC-Fold analysis with low-resolution SHAPE constraints was used to generate the input for the MC-Sym model generation. MC-Sym was configured to use the September 2011 snapshot of the fragment library to generate the three-dimensional (3D) structures. The top 10 structures explored by MC-Sym were configured to use probabilistic exploration of tertiary structure with the model diversity set to 1.0 Å. In all MC-Sym runs, a total of 1,000 suboptimal structures were generated to provide adequate sampling of three-dimensional space prior to terminating sampling. The top five most probable models generated, based on free energy calculation, were selected for further analysis.

3.3.7 Drosha Processing Assays

RNA substrates for in vitro assays were transcribed as previously described.17 FLAG-Drosha17 and FLAG-DGCR8 (AddGene) (termed Microprocessor) or FLAG-GFP were overexpressed in HEK-293T cells, and FLAG-tagged proteins were isolated on M2-FLAG beads (Sigma). Briefly, 6 μg of plasmid was transfected into cells using Lipofectamine 2000 (Invitrogen) following the manufacturer’s instructions. Mock-treated cells were exposed to Lipofectamine, but no plasmid was transfected into them. Approximately 48 h after transfection, cells were washed with phosphate-buffered saline, harvested, and lysed via sonication in lysis buffer [20 mM Tris-HCl (pH 8.0), 100 mM KCl, and 0.2 mM EDTA]. The lysate was combined with 10 fmol of labeled RNA, 10X reaction buffer (64 mM MgCl2), and RNasin (Promega). The reaction mix was incubated at 37°C for 5 min. Products were extracted with phenol and chloroform and precipitated with ethanol. RNA was analyzed on 12% denaturing polyacrylamide gels.

3.4 Results

3.4.1 SHAPE-Derived Primary miRNA Structures

More than half of the nucleotides in a typical RNA are found in Watson-Crick base pairs, but a number of non-canonical structural motifs are commonly embedded within A-form secondary structures.27 One of the most prevalent motifs is the single-nucleotide mismatch, or 1 × 1 internal loop, which is often (nearly) isosteric with canonical Watson-Crick pairs.28 Also common are single-nucleotide bulges and small asymmetric loops, which often serve as recognition motifs for proteins.29 Deep sequencing in Caenorhabditis elegans shows an average of two 1 × 1 loops in a typical pre-miRNA, along with an average of two single-nucleotide bulges.12

The three human pri-miRNAs in our study contain a diverse set of both symmetric internal loops and small bulges. When the structures of pri-mir-16-1 and pri-mir-107 were predicted by mfold,30 both exhibited several alternative structures within the 10% suboptimal parameter range. These alternative structures displayed differences in the size and location of internal loops and/or bulges, which could lead to heterogeneous final miRNA lengths,19 and variation in the terminal loop size, which has implications for processing efficiency.16

Because the structural imperfections in pri-miRNA transcripts have direct consequences for processing and function, it is necessary that they be accurately annotated. We subjected each of the three pri-miRNAs in our panel to modification by the SHAPE reagent 1-methyl-7-nitroisatoic anhydride (1M7) and analyzed the results by fractionating fluorescently labeled cDNA products on denaturing gels to identify RNA secondary structure with single-nucleotide resolution. Protection from modification by a SHAPE reagent is expected for a site that is canonically base paired or for mismatch sites that are constrained at the level of secondary structure.20 The resulting normalized SHAPE reactivity for each pri-miRNA indicates that the majority of the symmetric internal loops do not provide the SHAPE reagent with access to the 2′-hydroxyl, implying that these defects do not significantly alter the A-form helix of the double-stranded stem (Fig. 3-1, left panels). As positive controls, we also looked at the reactivity of bulges and asymmetric loops in the same sequences. In contrast to the findings described above, these motifs did react with the SHAPE reagent, albeit to different extents.

Symmetric internal loops, particularly 1 × 1 loops, are the most common base-pairing defect in pri-miRNAs. A total of six 1 × 1 internal loops are represented in this data set (Table 3-1), but only the A/A mismatch in pri-mir-16-1 was modified above background by the SHAPE reagent. Consistent with this result, thermodynamic analysis indicates low helical stability for A/A mismatches as compared with most 1 × 1 loops.31 Chemical and enzymatic probing has also confirmed this result for the same mismatch in pre-mir-16-1.15 Another symmetric internal loop of interest in our data set is the 2 × 2 AG·GA internal loop (i.e., GA tandem mismatch) located proximal to the Drosha cleavage site in pri-mir-16-1, for which Krol et al. propose a dramatically refolded double-loop structure based on Ca2+ cleavage assays.15 This loop is not modified above background by the SHAPE reagent in the neighborhood of nucleotides 19-22, consistent with both an NMR structure reporting an AG·GA internal loop in a similar sequence context32 and thermodynamic data indicating high loop stability.33 It is noteworthy that there is a small but statistically significant amount of SHAPE reactivity in the complementary strand, spanning nucleotides 92-95. On the basis of our SHAPE results and their consistency with findings in the literature, we conclude that most symmetric internal loops do not cause the average structure of pri-miRNAs to deviate significantly from A-form.


Figure 3-1. SHAPE-constrained MC-Fold calculations yield secondary structures with embedded estimation of conformational dynamics. SHAPE reactivity traces (left) identify single-stranded nucleotides for (A) pri-mir-16-1, (B) pri-mir-30a, and (C) pri-mir-107. In the SHAPE reactivity traces, bar heights indicate the normalized mean reactivity constructed from 21 independent reactions. Blue filled bars indicate nucleotides with a positive mean reactivity and a magnitude greater than the uncertainty of the measurement. Gray bars indicate that the reactivity is negative or has a mean magnitude below the uncertainty and is therefore considered insignificant. Addition of SHAPE-derived single-stranded constraints to MC-Fold calculations yields the combined probability of the nucleotide being single-stranded, which is mapped onto the most probable secondary structure (right). These probabilities are indicated in color as annotated in the color bar, ranging from most likely double-stranded (blue) to most likely single-stranded (red). Regions of high single-stranded probability divide the stems into three segments, labeled as H1-H3. Nucleotide numbering starts at 1 for the pri-miRNA; nucleotides of the SHAPE cassette linkers (see Materials and Methods) are numbered below 1 or above the pri-miRNA length. Nucleotides corresponding to the mature miRNAs (as annotated in miRBase) are indicated by a line adjacent to the secondary structure diagram, which has been oriented such that the Drosha cut site is on the left in all three cases. Secondary structure diagrams were generated with VARNA.55

While 1 × 1 loops were rarely modified by the SHAPE reagent, single-nucleotide bulges were often modified to a detectable extent. For example, each of the three U bulges was significantly modified (Table 3-1). There is also a weak SHAPE reactivity for the C bulge located near the terminal loop of pri-mir-107. The only single-nucleotide bulge not modified above background was the A bulge in pri-mir-30a (Fig. 3-1B, left), which is likely to be participating in purine-purine stacking on both sides. This result is consistent with previous studies monitoring protection from metal-induced catalytic cleavage at this site.34

High normalized SHAPE reactivity indicates a lack of base pairing and/or base stacking at the site of multiple-nucleotide bulges and asymmetric internal loops. Both two-nucleotide bulges present in the data set were strongly modified by the SHAPE reagent (Table 3-1), although one nucleotide in each pri-miRNA was modified much more heavily than the other (Fig. 3-1B, C, left). In the case of the 1 × 3 internal loop of pri-mir-107, both strands of the loop were heavily modified. Note that the single-nucleotide C strand in this asymmetric loop was modified more heavily than any single-nucleotide bulge (Fig. 3-1C, left). The extensive SHAPE reactivity in this site is especially noteworthy, as similar patterns have been shown to correlate strongly with NMR observations of large amplitude conformational dynamics.35 As the predicted Drosha cleavage site is contained within this 1 × 3 loop, dynamic deformations in this region could inhibit the cleavage reaction. However, Drosha activity data shown below reveal that this is not the case.


Table 3-1. Complete Annotation of Secondary Structure Defects in the Stem Region of Wild-Type pri-mir-16-1, pri-mir-30a, and pri-mir-107

Helical Imperfection                Location      Sequence in RNA (5′ arm / 3′ arm)  SHAPE Detection  Single-stranded Detection
----------------------------------  ------------  ---------------------------------  ---------------  -------------------------
U bulge                             pri-mir-16-1  34-C--G-35 / 80-GUC-78             yes              no
U bulge                             pri-mir-30a   8-U--G-9 / 84-AUC-82               yes              no
U bulge                             pri-mir-107   25-CUU-27 / 76-G--G-75             yes              no
A bulge                             pri-mir-30a   9-GAG-11 / 82-C--C-81              no               no
C bulge                             pri-mir-107   41-CCU-43 / 61-G--A-60             no (small hit)   no
2-nt bulge                          pri-mir-30a   25-CCUC-28 / 67-G----G-66          yes              yes
2-nt bulge                          pri-mir-107   16-U----G-17 / 89-ACAC-86          yes              no (small hit)
1 × 1 symmetric internal loop (a)   pri-mir-16-1  23-CCU-25 / 91-GAA-89              no               no
1 × 1 symmetric internal loop       pri-mir-16-1  36-UAA-38 / 77-AAU-75              yes              yes
1 × 1 symmetric internal loop       pri-mir-16-1  45-GCG-47 / 68-CUC-66              no               no
1 × 1 symmetric internal loop       pri-mir-16-1  47-GUU-49 / 66-CUA-64              no               no
1 × 1 symmetric internal loop (a)   pri-mir-30a   13-GAC-15 / 79-UCG-77              no               no
1 × 1 symmetric internal loop       pri-mir-107   29-UUU-31 / 73-ACA-71              no (only C hit)  no (only C hit)
2 × 2 symmetric internal loop       pri-mir-16-1  18-CAGU-21 / 96-GGAA-93            no               no for T1, yes for A
1 × 3 asymmetric internal loop (a)  pri-mir-107   21-U--C--A-23 / 82-ACUAC-78        yes              yes

(a) Helical imperfections that surround the Drosha cleavage site.


3.4.2 Ribonuclease Cleavage Structure Mapping

Chemical modification with SHAPE reagents rapidly showed, with single-nucleotide resolution, that some loops and bulges are minimally disruptive to A-form helical structure in pri-miRNA. The functional significance of this finding prompted us to verify the SHAPE results through limited digestions with a panel of ribonucleases specific for single-stranded and double-stranded RNA. Figure 3-2 shows ribonuclease mapping under native conditions for pri-mir-16-1, pri-mir-30a, and pri-mir-107. The cleavage patterns for all RNAs are largely consistent with the SHAPE results (compare columns 4 and 5 in Table 3-1). Substantial single-strand-specific cleavage is only indicated for bulges and asymmetric internal loops larger than a single nucleotide and the A/A mismatch located in pri-mir-16-1, which were also sensitive to the SHAPE reagent. Interestingly, RNase A cleaves all of the surrounding nucleotides of the CU bulge located in pri-mir-30a (Fig. 3-2B), suggesting that this bulge is highly dynamic, although this could simply reflect RNase A’s tendency to cleave nucleotides adjacent to loops. The propensity for the CU bulge to also exist as an alternative UC bulge is consistent with prior metal-catalyzed cleavage assays15 and with the crystal structure of pre-mir-30a bound to Exportin-5.9


Figure 3-2. Ribonuclease structure mapping is consistent with the most probable secondary structure resulting from the SHAPE-constrained MC-Fold calculations for (A) pri-mir-16-1, (B) pri-mir-30a, and (C) pri-mir-107. For each RNA, a denaturing 12% polyacrylamide gel used in the analysis is shown, with lanes as follows: C, a control sample (no nuclease); OH−, a limited alkaline digest; T1, A, and V1, limited digests with ribonucleases specific for single-stranded G, single-stranded C and U, and 5′ to double-stranded or well-stacked single-stranded regions, respectively. The reactions in lanes 2 and 3 were performed under RNA denaturing conditions (denoted Den.) to provide a ladder correlating the position in the gel with the nucleotide sequence, while the reactions in lanes 4-6 were performed under RNA native conditions (denoted Nat.). Helical and loop regions of the RNA are indicated to the right of the gel. The highest-probability secondary structure (see Fig. 3-1) with positions of cleavage by ribonucleases under native conditions indicated by symbols as described in the legend is displayed below each gel. Symbol size is proportional to cleavage intensity. In these secondary structure maps, proposed Drosha cleavage sites are identified with red arrows; regions near Drosha cleavage sites displaying high single-strand probability in our MC-Fold and SHAPE analysis are enclosed in gray boxes.

3.4.3 Secondary Structure Refinement by MC-Fold

MC-Pipeline is a modeling program that utilizes a nucleotide cyclic motif library containing base pair contextual information, derived from crystal structures deposited in the Protein Data Bank (PDB), to ultimately predict 3D folds of RNA structures.4 SHAPE chemistry provides structural constraints suitable for input into MC-Pipeline, which has the potential to improve structure quality when rare sequence motifs are encountered or when ambiguous base pairing is possible. Therefore, we generated secondary structure predictions in MC-Fold by identifying the joint probability that a nucleotide is single-stranded based on the MC-Pipeline database and our experimental SHAPE constraints (as described in Materials and Methods). An alternative approach in which a large number of decoy structures is generated in MC-Pipeline and then scored against low-resolution experimental constraints to generate a subensemble consistent with the data has recently been reported36 but was not used here. The most probable secondary structure resulting from our combined SHAPE and MC-Fold analysis is reported for each pri-miRNA in Figure 3-1 on the right and color-coded to indicate the combined single-stranded probability (the 10 most probable secondary structures for each pri-miRNA are displayed in Figs. S3-1 – S3-3 of the Supporting Information).

Importantly, some nucleotides provide only weak indications of ssRNA character by either SHAPE or MC-Fold alone; however, combining the two sources of data yields clear indications of base pairing status (results from MC-Fold analysis performed without reference to the SHAPE data are summarized in Figs. S3-4 – S3-6 of the Supporting Information). When SHAPE and MC-Fold analysis are combined, the single-stranded probabilities of nucleotides near the Drosha cleavage site on the stem closer to the ssRNA-dsRNA junction of all three RNAs (labeled as the H1 stem in Fig. 3-1, right) are considerably larger than those of other nucleotides in the stem. Helical disruptions near the Drosha cut site have previously been predicted by bioinformatics in C. elegans,12 and our data provide the first experimental validation of this structural feature in pri-miRNAs of human origin.

Another structural feature of pri- and pre-miRNA hairpins that has been heavily studied is the terminal loop of the hairpin. In some prior studies, thermodynamic algorithms (e.g., mfold) have been observed to predict excess base pairing near terminal loops; more than 70% of the pre-miRNAs evaluated in a recent study were predicted by mfold to have short loops inconsistent with experimental results.15 Application of MC-Fold to the present set of pri-miRNA sequences also produces minimally sized terminal loops when SHAPE data are not added as an input constraint (Figs. S3-4 – S3-6 of the Supporting Information). Enforced shortening of the terminal loop in pri-mir-16-1 through mutagenesis decreased processing efficiency,16 and replacement with a thermostable tetraloop eliminated processing completely,16 thus establishing the necessity of properly defining loop size. Analysis of the SHAPE-constrained MC-Fold secondary structures in Figure 3-1 confirms the presence of large terminal loops for all three RNAs studied, suggesting that their loops should not impair Drosha processing.

3.4.4 Global Features of 3D Structure Modeling Using MC-Sym

Developing mechanistic insight into biomolecular function often requires visualization of three-dimensional atomic structures. For example, coaxial stacking of helical segments to either side of disruptive motifs in the pri-miRNA stem, or its absence, influences the overall shape of the substrate recognized by the Microprocessor. The data discussed so far do not address these overall geometric features. Therefore, the highest-scoring SHAPE-constrained MC-Fold secondary structures were provided as input for MC-Sym.4 The single-stranded flanking tails were excluded from the MC-Sym input because their inclusion resulted in stalling of the algorithm. It is noteworthy that the highly dynamic nature of such regions often results in their exclusion from high-resolution structure models (i.e., those derived from X-ray crystallography or NMR) as well. For similar reasons, caution is also required in attributing functional significance to the quantitative atomic positions in the terminal loops of the generated models, although the positioning of the single-stranded−double-stranded junction is robust. The five most probable structures of each pri-miRNA are superimposed in Figure 3-3, where they are aligned along the main stem of the RNA. The Drosha cleavage site is located between the regions labeled H1 and H2, directly above the portion of H1 with high single-stranded probability (red in Fig. 3-3), making the location of the cut site easily identifiable.

RNase III enzymes are expected to cleave most efficiently in the interior of A-form helical regions. In all three structure bundles, the regions with low SHAPE reactivity (colored blue) overlap well, supporting their nondynamic nature. In contrast, the regions with high SHAPE reactivity (shown in the range from yellow to red) have a large spatial distribution because of their dynamic nature. The models predominantly favor a coaxial stack of the stem regions with looped-out bulges. Again, the higher dynamics and lower base pairing probability of nucleotides at the H1-H2 boundary of all three bundles identify the basal stem adjacent to the Drosha cut site.

It is especially noteworthy that while this region is likely to be more deformed than the majority of the RNA, coaxial stacking of H1 and H2 minimizes disruption of the A-form helix, even in the case of pri-mir-107. This result is consistent with the expectation of proper cut-site recognition by the RNase III domains of the Drosha enzyme.
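The qualitative contrast between well-overlapped and widely distributed regions can be put on a numerical footing by computing a per-nucleotide positional spread across an aligned ensemble. The following is a minimal sketch, not the analysis performed in this work: it assumes that the models have already been superimposed on the main stem and that one representative coordinate per nucleotide (for example, the phosphorus atom) has been extracted. The function name `per_nucleotide_spread` and the toy coordinates are hypothetical.

```python
import math

def per_nucleotide_spread(models):
    """Root-mean-square deviation of each nucleotide about its ensemble mean.

    models: list of models; each model is a list of (x, y, z) tuples, one per
    nucleotide, assumed to be superimposed in a common frame already.
    """
    n_models = len(models)
    n_nt = len(models[0])
    if any(len(m) != n_nt for m in models):
        raise ValueError("all models must contain the same number of nucleotides")
    spreads = []
    for i in range(n_nt):
        # Mean position of nucleotide i over the ensemble
        mean = [sum(m[i][k] for m in models) / n_models for k in range(3)]
        # Mean squared displacement from that mean position
        msd = sum(
            sum((m[i][k] - mean[k]) ** 2 for k in range(3)) for m in models
        ) / n_models
        spreads.append(math.sqrt(msd))
    return spreads

# Made-up coordinates (angstroms) for two nucleotides in three models:
# the first nucleotide is identical in all models, the second one varies.
toy_models = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 3.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, -3.0, 0.0)],
]
toy_spread = per_nucleotide_spread(toy_models)
```

In such a calculation, nucleotides with low SHAPE reactivity would be expected to give small values and the dynamic regions larger ones; the result depends on the alignment frame chosen.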

Figure 3-3. Ensemble representation of the top five SHAPE-constrained models generated by MC-Sym for (A) pri-mir-16-1, (B) pri-mir-30a, and (C) pri-mir-107. All models are aligned along the main stem of the RNA with the sugar-phosphate backbone indicated by a blue ribbon. Planks representing the nucleotides are colored according to the probability of being single-stranded as reported in Figure 3-1 and indicated by the color bar, ranging from most likely double-stranded (blue) to most likely single-stranded (red). Regions of high single-stranded probability divide the stems into three segments, labeled as H1-H3. Inclusion of extended single-stranded tails renders MC-Sym calculations unstable; therefore, the expected tails are not represented in the models shown.

3.4.5 Primary miRNA Structural Heterogeneity

In addition to the increased single-stranded probability adjacent to the Drosha cut site, combined SHAPE and MC-Pipeline analysis indicates at least one other region of lower base pairing probability at the H2-H3 boundary in the labeling scheme of Figures 3-1 – 3-3. Warf et al. propose that periodic structural distortions in the RNA stem, colocalized on a single side of the molecule, could allow the formation of a strong bend in the pri-miRNA structure spanning three turns of helix.12 A bend of this nature is necessary for the DGCR8 binding model proposed by Sohn et al.37 Analysis of conformationally heterogeneous sites in the SHAPE-constrained MC-Sym ensembles provides insight into the mechanism of deformation necessary to support this hypothesis and suggests a need for future molecular mechanics calculations to define the dynamics uniting the reported pri-miRNA conformational landscapes.

In pri-mir-16-1, an A/A mismatch found 11 nucleotides from the Drosha cut site (Fig. 3-4A), along with a nearby U bulge (Fig. 3-4B), is responsible for creating a second region of enhanced conformational variation (seen as the H2-H3 boundary in Fig. 3-3A). Although both A nucleobases are stacked within the helix in each of the five most probable structures, base pairing between them is poor. While the 5′UAA/3′AAU sequence containing the A/A mismatch found in pri-mir-16-1 was not present in the PDB at the time of writing, substantial evidence supports the dynamic nature of A/A mismatches in other sequence contexts. For example, when flanked by thermodynamically stable G/A sheared base pairs, the A/A mismatch was found to be highly dynamic by NMR.38 In addition, when flanked by Watson-Crick base pairs, the A/A mismatch was found to be thermodynamically unstable.39 Moreover, the nearby U bulge in pri-mir-16-1 is flanked by a pyrimidine on its 5′-side, therefore yielding unfavorable 5′-pyrimidine-pyrimidine stacking34 and further enhanced destabilization of this region.

Asymmetric elements, such as the two-nucleotide bulge in pri-mir-30a and the 1 × 3 asymmetric loop in pri-mir-107, can feature either intercalative stacking or extrusion of the asymmetric base(s) from the duplex.29 The nucleobases of the two-nucleotide bulge in pri-mir-30a are extruded from the helix in all five structures, but deformation of the backbone imparts a bend to the overall structure (Fig. 3-4C). Consistent with the high SHAPE reactivity in pri-mir-107, the A and U nucleobases of the three-nucleotide strand in the asymmetric loop are extruded from the stack in all five structures (Fig. 3-4D), resulting in the introduction of a bend to the overall structure of the stem.

Figure 3-4. Secondary structures of pri-miRNA molecules harbor multiple dynamic bulges and internal loops. Expanded views of areas within the MC-Sym models that are highly dynamic are shown for (A) the A/A mismatch in pri-mir-16-1, (B) the U bulge in pri-mir-16-1, (C) the CU bulge in pri-mir-30a, and (D) the 1 × 3 asymmetric internal loop in pri-mir-107. The nucleotides involved in the imperfections are colored orange (their position in the nucleotide sequence is also annotated), and the most probable structure is shown otherwise in solid blue, with the models from the other four members of the ensemble reported in Figure 3-3 shown in transparent blue. All models are aligned to the nearest stable Watson-Crick base pair neighboring the imperfection in the most probable model.

In contrast, many single-nucleotide mismatches do not significantly distort the RNA helix;40 thermodynamically, many do not even disrupt helix stability.41 For example, A/C mismatches are isosteric with G·U wobble pairs, although their stability is influenced by nearest-neighbor identity.42 In pri-mir-16-1, the A/C mismatch is not reactive to the SHAPE reagent or RNase A cleavage (Table 3-1); it is also not predicted to be highly disordered by MC-Fold, which is consistent with the structure of an A/C mismatch from the large ribosomal subunit found in an identical sequence context.43 In pri-mir-30a, the A/C mismatch is similarly not reactive to the SHAPE reagent and RNase A, but the single-stranded probability from MC-Fold is high, suggesting deformability in the Drosha cut site (Fig. 3-1B, right). Analysis of the structure database shows that the strength of A/C base pairing in similar sequence contexts is highly variable.28 The MC-Sym bundle for this mismatch in pri-mir-30a shows the most disorder of any noncanonical element with low SHAPE reactivity (Fig. 3-5A), suggesting weak base pairing and a contribution to instability in this region.

The U/C (or C/U) mismatch is also found in two of the three pri-miRNA sequences studied. Crystal structures of RNA stems harboring U/C mismatches show that, with the assistance of bridging waters, a hydrogen bonding network capable of promoting a good fit of the nucleobases within the RNA helix is achievable.44, 45 In both pri-mir-107 (Fig. 3-5B) and pri-mir-16-1 (Fig. 3-5C), MC-Sym ensembles display well-ordered stacking of these mismatches, which supports the possibility of water-mediated hydrogen bonding between the nucleobases. It is noteworthy that the C of the pri-mir-107 U/C mismatch was somewhat sensitive to both the SHAPE reagent and RNase A (Table 3-1), and a slight distortion in the backbone along the 3′-side of the mismatch is observed in the MC-Sym bundles.

Unique to the set of pri-miRNAs, a 2 × 2 AG·GA internal loop is found in pri-mir-16-1 in the basal H1 stem, adjacent to the Drosha cut site (Fig. 3-5D). GA tandem mismatches are generally thermodynamically stable,46 although both the thermodynamics of loop closure and the structures they adopt are highly sequence dependent.46-50 Although the PDB contains no examples of the exact 5′CAGU/3′GGAA sequence found in pri-mir-16-1, the Watson-Crick face base pairing observed in the MC-Sym bundles (Fig. 3-5D) is consistent with the general literature trend.48, 50 Despite the relatively high level of order seen in the MC-Sym bundles for the AG·GA internal loop and nearby A/C mismatch, these non-canonical elements collectively contribute to a high probability of single strandedness in the H1 stem that identifies the Drosha cut site.

Figure 3-5. Secondary structures of pri-miRNA molecules harbor multiple non-Watson-Crick mismatches that are predicted to be well-ordered by SHAPE reactivity. Expanded views of areas within the MC-Sym models representing these mismatches are shown for (A) the A/C mismatch in pri-mir-30a, (B) the U/C mismatch in pri-mir-107, (C) the C/U mismatch in pri-mir-16-1, and (D) the AG·GA internal loop in pri-mir-16-1. The nucleotides involved in the imperfections are colored orange (their position in the nucleotide sequence is also annotated), and the most probable structure is shown otherwise in solid blue, with the models from the other four members of the ensemble reported in Figure 3-3 shown in transparent blue. All models are aligned to the nearest stable Watson-Crick base pair neighboring the imperfection in the most probable model.

3.4.6 Drosha Processing of Primary miRNAs

To test the hypothesis that imperfections in the A-form helix near the Drosha cut site are necessary for efficient cleavage by the Microprocessor, in vitro Drosha processing assays were conducted with wild-type pri-mir-16-1 and pri-mir-107. For both, mutants were designed to modify the flexible region near the Drosha cut site. Of note, Drosha activity is dependent on the presence of Mg2+ in the reaction buffer, whereas our SHAPE reactions were performed in the absence of divalent metals to minimize the risk of metal-catalyzed cleavage causing false positives. As specific metal binding is most likely to affect tertiary structure, which does not exist in our single-hairpin transcripts, we expect minimal risk that the structures are altered significantly by Mg2+ in the cleavage assays. Control SHAPE reactions with pri-mir-107 in the presence of Mg2+ showed cleavage profiles identical to those reported in Figure 3-1 (see Chapter 2). On the whole, our results support a correlation between cut-site flexibility and processing efficiency.

For pri-mir-16-1, the flexible hot spot adjacent to the Drosha cut site was eliminated by mutating both the AG·GA internal loop and the A/C mismatch to Watson-Crick base pairs (labeled pri-mir-16-1-HS mut in Fig. 3-6). The hot spot region was successfully made rigid, as confirmed by combined SHAPE and MC-Fold analysis (Fig. 3-6A). While wild-type pri-mir-16-1 was processed efficiently in the in vitro assays, the extent of processing of the cut-site mutant was reduced 2-fold (Fig. 3-6E). As a negative control for processing, we created another mutant of pri-mir-16-1 in which the large terminal loop was replaced with a UUCG tetraloop (labeled pri-mir-16-1-TL mut in Fig. 3-6). Consistent with previous studies in which similar tetraloops were shown to weaken processing,10, 16 the level of pri-mir-16-1-TL processing was reduced approximately 4-fold (Fig. 3-6E). Thus, deformability of the stem near the cut site is not the sole factor important for cleavage by the Microprocessor, but its presence significantly improves processing efficiency.

Having confirmed that flexibility near the Drosha cut site is favorable for efficient processing, we next determined whether asymmetric loops, which are flexible, mitigate these positive effects. Despite the presence of the 1 × 3 internal loop at the cut site, wild-type pri-mir-107 was processed just as efficiently as pri-mir-16-1 in our assay. To determine the effect of an asymmetric loop versus a symmetric internal loop on the overall cleavage efficiency, we created a mutant (labeled pri-mir-107-HS in Fig. 3-6) with a two-base deletion on the three-nucleotide side of the loop, leaving a symmetric 1 × 1 loop (i.e., a C/C mismatch). The SHAPE reactivity profile and MC-Pipeline results for pri-mir-107-HS confirm the formation of a C/C mismatch and show that the mutant pri-mir-107 can still be deformed near the Drosha cut site (Fig. 3-6B). The cleavage efficiency was slightly decreased for the mutant pri-mir-107 in comparison to that of the wild-type (Fig. 3-6E). On the basis of the migration of the pre-mir-107 band for the mutant, which migrates as a species two nucleotides shorter than the wild-type (compare WT and HS mut in Fig. 3-6D), creating a symmetric 1 × 1 loop eliminated two nucleotides from the predicted pre-mir-107 product but otherwise did not change the position of Drosha cleavage. In addition to this mutant, another mutant of pri-mir-107 was created to test the impact of a complete lack of deformability at the cleavage site as was done with pri-mir-16-1. The second mutant for pri-mir-107 had the asymmetric internal loop replaced with a Watson-Crick base pair (labeled pri-mir-107-HS2 in Fig. 3-6), whose structure was again confirmed by the combined SHAPE−MC-Pipeline analysis (Fig. 3-6C). As with pri-mir-16-1, a decrease in Drosha cleavage efficiency was seen when all flexibility at the cleavage site was abolished (Fig. 3-6E). Therefore, we conclude that the presence of an asymmetric loop near the Drosha cut site is less disruptive to processing than the total absence of flexibility in the stem.
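The n × m loop nomenclature used above simply counts the unpaired nucleotides on each strand of an internal loop. As an illustrative sketch only, with hypothetical helper names that are not part of any analysis software used here, the effect of the two-base deletion in pri-mir-107-HS on loop symmetry can be expressed as follows.

```python
def describe_internal_loop(short_side, long_side):
    """Label a loop by the number of unpaired nucleotides on each strand."""
    if short_side < 0 or long_side < 0:
        raise ValueError("nucleotide counts must be non-negative")
    if short_side == 0 and long_side == 0:
        return "fully paired"
    if short_side == 0 or long_side == 0:
        kind = "bulge"
    elif short_side == long_side:
        kind = "symmetric internal loop"
    else:
        kind = "asymmetric internal loop"
    return f"{short_side} x {long_side} {kind}"

def delete_from_long_side(short_side, long_side, n_deleted):
    """Remove n_deleted unpaired nucleotides from the longer strand of the loop."""
    if n_deleted > long_side:
        raise ValueError("cannot delete more nucleotides than the strand contains")
    return short_side, long_side - n_deleted

wild_type_loop = (1, 3)  # 1 x 3 loop in wild-type pri-mir-107
hs_mutant_loop = delete_from_long_side(*wild_type_loop, n_deleted=2)

wild_type_label = describe_internal_loop(*wild_type_loop)
hs_mutant_label = describe_internal_loop(*hs_mutant_loop)
```

The sketch tracks only loop size and symmetry; it says nothing about base identity, stacking, or stability, which are the properties probed experimentally.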

Figure 3-6. Drosha processing of pri-miRNAs to pre-miRNAs in vitro confirms the necessity of hot spot flexibility for efficient cleavage. (A) The mutant pri-mir-16-1-HS has significantly reduced hot spot flexibility compared with that of the wild-type, as established by combined SHAPE and MC-Fold analysis. (B) The mutant pri-mir-107-HS has the asymmetric 1 × 3 internal loop near the Drosha cleavage site replaced by a flexible C/C non-canonical base pair, (C) while the mutant pri-mir-107-HS2 has only Watson-Crick base pairs at the cleavage site, as established in both cases by combined SHAPE and MC-Fold analysis. (D) Denaturing gels for the processing of wild-type pri-mir-16-1 (WT), hot spot mutant (HS mut), and tetraloop mutant (TL mut) constructs (from left to right, respectively), in addition to wild-type pri-mir-107 (WT), hot spot mutant (HS mut), and second hot spot mutant (HS mut2) constructs. In all six assays, lanes represent RNA collected prior to addition of the Microprocessor (RNA), exposed to FLAG beads with addition of cell lysate that did not express FLAG-tagged proteins for 5 min (Mock), exposed to GFP for 5 min (GFP), and 5 min after exposure to the purified Microprocessor (Micro.). (E) Percentages of pri-miRNAs cleaved by the Microprocessor in vitro after 5 min averaged over three independent experiments. Cleavage is calculated as the sum of the intensities of the pre-miRNA product and the cleaved flanking tails divided by the sum of the intensities of the product, tails, and the remaining pri-miRNA substrate.
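The cleavage calculation described for panel E can be written out explicitly. The sketch below uses made-up band intensities and a hypothetical function name; it is meant only to illustrate the arithmetic and does not reproduce the gel quantification software used for the figure.

```python
def percent_cleaved(product, tails, substrate):
    """Percentage of pri-miRNA cleaved, from band intensities on a gel.

    product:   intensity of the pre-miRNA product band
    tails:     summed intensity of the cleaved flanking tails
    substrate: intensity of the remaining pri-miRNA substrate band
    """
    cleaved = product + tails
    total = cleaved + substrate
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * cleaved / total

# Made-up intensities in arbitrary units, for illustration only
example_percent = percent_cleaved(product=30.0, tails=20.0, substrate=50.0)
```

Fold changes such as those quoted in the text would then follow as the ratio of the wild-type percentage to the mutant percentage, each averaged over replicate experiments.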

3.5 Discussion

Previous conclusions about pri-miRNA structure-function relationships have been based on the RNA secondary structure either predicted by mfold,30 derived from deep sequencing analysis,12 or in some cases based on biochemical data.10, 16, 19 There are currently no determined atomic-resolution pri-miRNA structures available, which impedes further progress. In this study, we have shown that providing SHAPE-derived base pairing constraints to MC-Pipeline produces pri-miRNA atomic structure models that are consistent with existing biochemical data and support prevailing mechanistic hypotheses. This study employed pri-miRNA sequences of human origin and is broadly consistent with C. elegans deep sequencing analysis, both validating the generalization of previous conclusions and encouraging future in vivo genetics studies utilizing this attractive model organism. Moreover, the helical distortions quantified in our analysis plausibly resolve inconsistencies in accepted thermodynamic secondary structure models, while accounting for the structural heterogeneity predicted from bioinformatic analysis.

In the prevailing model for Dicer cut-site recognition, the 3′-overhang left by Drosha cleavage is bound by the Dicer PAZ domain, and simple steric measurement of the expected A-form helix length from that reference point provides a “molecular ruler”, identifying the cleavage site for Dicer.51, 52 Demonstrating an equivalent molecular ruler for Drosha has been elusive.

As can be seen in the SHAPE-constrained MC-Fold secondary structures in Figure 3-1 (right panels), there is a high degree of conformational variation in the H1 stem adjacent to the Drosha cut site. These deformable hot spots arise in the data because of a combination of SHAPE reactivity and the probability of being single-stranded as determined directly by MC-Fold. The deformable hot spot region located adjacent to the Drosha cut site was shown to enhance processing via in vitro Drosha cleavage assays (Fig. 3-6). This result suggests that the degree of base pairing surrounding the Drosha cut site affects how the Microprocessor locates and/or cleaves the pri-miRNA.

The ssRNA-dsRNA junction at the base of the stem, one turn of A-form helix removed from the Drosha cut site, has previously been identified as a necessary structural element for cut-site recognition by DGCR8.10 Warf et al. proposed the existence of an unstable face on the pri-miRNA stem featuring strong structural distortions at the cleavage site that may serve as a marker, negating the need for a molecular ruler in Drosha capable of measuring from a fixed reference.12 We propose that our MC-Sym-predicted models support previous biochemical models in which DGCR8 binds proximal to the ssRNA-dsRNA junction,6 identifying the basal stem in analogy to the role of the PAZ domain in Dicer, and senses the periodic deformable sites that occur approximately once per turn of A-form helix, as evident in Figure 3-3. The periodic deformable sites therefore could permit bending of the pri-miRNA, which is needed to engage both DGCR8-dsRBD binding faces in the model of Sohn et al.37 Collectively, the Microprocessor is able to specifically identify this set of features in only those nuclear dsRNAs that require cutting by Drosha. Our Drosha processing assays support the general hypothesis that cut-site deformability contributes to processing efficiency; future testing of this model through DGCR8 and Drosha binding assays will clarify the molecular mechanism of this effect.
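The idea of deformable sites recurring about once per helical turn can be made concrete with a small calculation. The sketch below assumes idealized A-form RNA parameters of roughly 11 base pairs per turn and a rise of about 2.8 Å per base pair; these textbook values and the function names are assumptions introduced for illustration and are not measurements taken from the MC-Sym models.

```python
# Idealized A-form RNA helix parameters (assumed textbook values)
BP_PER_TURN = 11.0      # base pairs per helical turn
RISE_PER_BP = 2.8       # axial rise per base pair, in angstroms

def helical_turns(n_bp):
    """Number of A-form helical turns spanned by n_bp base pairs."""
    return n_bp / BP_PER_TURN

def axial_length(n_bp):
    """Approximate length along the helix axis, in angstroms."""
    return n_bp * RISE_PER_BP

# The A/A mismatch in pri-mir-16-1 lies 11 nucleotides from the Drosha cut site,
# which corresponds to about one turn under these assumptions.
turns_to_mismatch = helical_turns(11)
length_to_mismatch = axial_length(11)
```

Under these assumptions, imperfections spaced about 11 base pairs apart would fall on the same face of the helix, which is the geometric requirement for the unstable face described by Warf et al.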

A potential consequence of inaccurate cut-site identification by the Microprocessor is variability in pre-miRNA length and, ultimately, mature miRNA sequence, which can have dramatic effects if the seed sequence is altered. Warf et al. suggest that Drosha is capable of producing both typical (e.g., two-nucleotide 3′-overhang) and atypical overhangs based on the observation of variable overhang lengths in C. elegans pre-miRNAs.12 Improper overhang lengths may be the result of improper Drosha positioning caused by large imperfections at the cleavage site, as could be the case for pri-mir-107, a hypothesis supported by the numerous mature miRNA sequences reported from deep sequencing reads for miR-107 in miRBase. On the other hand, an intriguing recent study by Park et al. shows that Dicer may be able to use 5′-end binding as an alternative to 3′-end recognition in anomalous cases like this.53 While the models in our MC-Sym structure bundles show a remarkable extent of coaxial stacking of the H1 and H2 stems surrounding the Drosha cut site in this molecule, it remains possible that looping out of the three-nucleotide side of the bulge will produce an atypical overhang length. Moreover, the asymmetric loop in pri-mir-107 may exert a stronger influence on the kinetics of processing by Drosha than on the ultimate sequence composition of the mature miRNA.

Resolving the remaining questions surrounding the miRNA maturation process will require experimental determination of atomic-resolution structure models for pri-miRNAs and pre-miRNAs, both in isolation and when bound to their processing complexes. Combining SHAPE chemistry with analysis in MC-Pipeline allows the generation of high-quality structure models that are both predictive and capable of unifying previous biochemical hypotheses. Minor imperfections within the A-form helix central in the pri-miRNA stem result in minimal disruption of the A-form structure but impart significant deformability that may be sufficient to guide the Microprocessor to the appropriate cleavage site. Combination of our methodology with recent advances in high-throughput SHAPE analysis, so-called SHAPE-Seq methods,54 offers the possibility of generating hundreds of comparable structure bundles in the future from which the findings presented here can be generalized. Application to other problems in RNA metabolism, such as the evaluation of structures adopted by riboswitches and other regulatory elements, should be straightforward.

3.6 Acknowledgements

We thank Phil Bevilacqua, Durga Chadalavada, and Kit Kwok for helpful discussion while developing the experimental protocols.

3.7 References

1. Quarles, K. A.; Sahu, D.; Havens, M. A.; Forsyth, E. R.; Wostenberg, C.; Hastings, M. L.; Showalter, S. A., Ensemble analysis of primary microRNA structure reveals an extensive capacity to deform near the Drosha cleavage site. Biochemistry 2013, 52 (5), 795-807.
2. Mathews, D. H.; Moss, W. N.; Turner, D. H., Folding and Finding RNA Secondary Structure. Cold Spring Harbor Perspect in Biol 2010, 2 (12).
3. Weeks, K. M.; Mauger, D. M., Exploring RNA Structural Codes with SHAPE Chemistry. Acc Chem Res 2011, 44 (12), 1280-1291.
4. Parisien, M.; Major, F., The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature 2008, 452 (7183), 51-55.
5. Griffiths-Jones, S.; Saini, H. K.; van Dongen, S.; Enright, A. J., miRBase: tools for microRNA genomics. Nucleic Acids Res 2008, 36, D154-D158.
6. Kim, V. N.; Han, J.; Siomi, M. C., Biogenesis of small RNAs in animals. Nat Rev Mol Cell Biol 2009, 10 (2), 126-139.
7. Morlando, M.; Ballarino, M.; Gromak, N.; Pagano, F.; Bozzoni, I.; Proudfoot, N. J., Primary microRNA transcripts are processed co-transcriptionally. Nat Struct Mol Biol 2008, 15 (9), 902-909.
8. Zeng, Y.; Cullen, B. R., Efficient processing of primary microRNA hairpins by Drosha requires flanking nonstructured RNA sequences. J Biol Chem 2005, 280 (30), 27595-27603.
9. Okada, C.; Yamashita, E.; Lee, S. J.; Shibata, S.; Katahira, J.; Nakagawa, A.; Yoneda, Y.; Tsukihara, T., A high-resolution structure of the pre-microRNA nuclear export machinery. Science 2009, 326 (5957), 1275-9.
10. Han, J. J.; Lee, Y.; Yeom, K. H.; Nam, J. W.; Heo, I.; Rhee, J. K.; Sohn, S. Y.; Cho, Y. J.; Zhang, B. T.; Kim, V. N., Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 2006, 125 (5), 887-901.
11. Zeng, Y.; Cullen, B. R., Sequence Requirements for Micro RNA Processing and Function in Human Cells. RNA 2003, 9, 112-123.
12. Warf, M. B.; Johnson, W. E.; Bass, B. L., Improved annotation of C. elegans microRNAs by deep sequencing reveals structures associated with processing by Drosha and Dicer. RNA 2011, 17 (4), 563-77.
13. Sashital, D. G.; Doudna, J. A., Structural insights into RNA interference. Curr Opin Struct Biol 2010, 20 (1), 90-97.
14. Bindewald, E.; Wendeler, M.; Legiewicz, M.; Bona, M. K.; Wang, Y.; Pritt, M. J.; Le Grice, S. F. J.; Shapiro, B. A., Correlating SHAPE signatures with three-dimensional RNA structures. RNA 2011, 17 (9), 1688-1696.
15. Krol, J.; Sobczak, K.; Wilczynska, U.; Drath, M.; Jasinska, A.; Kaczynska, D.; Krzyzosiak, W. J., Structural features of microRNA (miRNA) precursors and their relevance to miRNA biogenesis and small interfering RNA/short hairpin RNA design. J Biol Chem 2004, 279 (40), 42230-9.
16. Zhang, X. X.; Zeng, Y., The terminal loop region controls microRNA processing by Drosha and Dicer. Nucleic Acids Res 2010, 38 (21), 7689-7697.
17. Havens, M. A.; Reich, A. A.; Duelli, D. M.; Hastings, M. L., Biogenesis of mammalian microRNAs by a non-canonical processing pathway. Nucleic Acids Res 2012, 40 (10), 4626-4640.

18. Wang, W. X.; Rajeev, B. W.; Stromberg, A. J.; Ren, N.; Tang, G.; Huang, Q.; Rigoutsos, I.; Nelson, P. T., The expression of microRNA miR-107 decreases early in Alzheimer's disease and may accelerate disease progression through regulation of beta-site amyloid precursor protein-cleaving enzyme 1. J Neurosci 2008, 28 (5), 1213-23.
19. Starega-Roslan, J.; Krol, J.; Koscianska, E.; Kozlowski, P.; Szlachcic, W. J.; Sobczak, K.; Krzyzosiak, W. J., Structural basis of microRNA length variety. Nucleic Acids Res 2011, 39 (1), 257-268.
20. Merino, E. J.; Wilkinson, K. A.; Coughlan, J. L.; Weeks, K. M., RNA structure analysis at single nucleotide resolution by selective 2′-hydroxyl acylation and primer extension (SHAPE). J Am Chem Soc 2005, 127 (12), 4223-4231.
21. Wostenberg, C.; Quarles, K. A.; Showalter, S. A., Dynamic origins of differential RNA binding function in two dsRBDs from the miRNA "microprocessor" complex. Biochemistry 2010, 49 (50), 10728-36.
22. Das, R.; Laederach, A.; Pearlman, S. M.; Herschlag, D.; Altman, R. B., SAFA: Semi-automated footprinting analysis software for high-throughput quantification of nucleic acid footprinting experiments. RNA 2005, 11 (3), 344-354.
23. Kladwang, W.; VanLang, C. C.; Cordero, P.; Das, R., Understanding the errors of SHAPE-directed RNA structure modeling. Biochemistry 2011, 50 (37), 8049-56.
24. Yoon, S.; Kim, J.; Hum, J.; Kim, H.; Park, S.; Kladwang, W.; Das, R., HiTRACE: high-throughput robust analysis for capillary electrophoresis. Bioinformatics 2011, 27 (13), 1798-805.
25. Mathews, D. H.; Disney, M. D.; Childs, J. L.; Schroeder, S. J.; Zuker, M.; Turner, D. H., Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci USA 2004, 101 (19), 7287-7292.
26. Mathews, D. H., Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 2004, 10 (8), 1178-1190.
27. Nagaswamy, U.; Larios-Sanz, M.; Hury, J.; Collins, S.; Zhang, Z. D.; Zhao, Q.; Fox, G. E., NCIR: a database of non-canonical interactions in known RNA structures. Nucleic Acids Res 2002, 30 (1), 395-397.
28. Davis, A. R.; Kirkpatrick, C. C.; Znosko, B. M., Structural characterization of naturally occurring RNA single mismatches. Nucleic Acids Res 2011, 39 (3), 1081-94.
29. Hermann, T.; Patel, D. J., RNA bulges as architectural and recognition motifs. Struct Fold Des 2000, 8 (3), R47-R54.
30. Zuker, M., Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 2003, 31 (13), 3406-3415.
31. Kierzek, R.; Burkard, M. E.; Turner, D. H., Thermodynamics of single mismatches in RNA duplexes. Biochemistry 1999, 38 (43), 14214-23.
32. Wu, M.; Turner, D. H., Solution structure of (rGCGGACGC)2 by two-dimensional NMR and the iterative relaxation matrix approach. Biochemistry 1996, 35 (30), 9677-89.
33. Wu, M.; McDowell, J. A.; Turner, D. H., A Periodic-Table of Symmetrical Tandem Mismatches in RNA. Biochemistry 1995, 34 (10), 3204-3211.
34. Ciesiolka, J.; Michalowski, D.; Wrzesinski, J.; Krajewski, J.; Krzyzosiak, W. J., Patterns of cleavages induced by lead ions in defined RNA secondary structure motifs. J Mol Biol 1998, 275 (2), 211-20.
35. Gherghe, C. M.; Leonard, C. W.; Ding, F.; Dokholyan, N. V.; Weeks, K. M., Native-like RNA Tertiary Structures Using a Sequence-Encoded Cleavage Agent and Refinement by Discrete Molecular Dynamics. J Am Chem Soc 2009, 131 (7), 2541-2546.

36. Parisien, M.; Major, F., Determining RNA three-dimensional structures using low-resolution data. J Struct Biol 2012, 179 (3), 252-260.
37. Sohn, S. Y.; Bae, W. J.; Kim, J. J.; Yeom, K. H.; Kim, V. N.; Cho, Y., Crystal structure of human DGCR8 core. Nat Struct Mol Biol 2007, 14 (9), 847-853.
38. Chen, G.; Kennedy, S. D.; Qiao, J.; Krugh, T. R.; Turner, D. H., An alternating sheared AA pair and elements of stability for a single sheared purine-purine pair flanked by sheared GA pairs in RNA. Biochemistry 2006, 45 (22), 6889-6903.
39. Davis, A. R.; Znosko, B. M., Thermodynamic characterization of single mismatches found in naturally occurring RNA. Biochemistry 2007, 46 (46), 13425-13436.
40. Nagaswamy, U.; Voss, N.; Zhang, Z. D.; Fox, G. E., Database of non-canonical base pairs found in known RNA structures. Nucleic Acids Res 2000, 28 (1), 375-376.
41. Schroeder, S. J.; Burkard, M. E.; Turner, D. H., The energetics of small internal loops in RNA. Biopolymers 1999, 52 (4), 157-67.
42. Tran, T.; Disney, M. D., Molecular recognition of 6'-N-5-hexynoate kanamycin A and RNA 1x1 internal loops containing CA mismatches. Biochemistry 2011, 50 (6), 962-9.
43. Ban, N.; Nissen, P.; Hansen, J.; Moore, P. B.; Steitz, T. A., The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 2000, 289 (5481), 905-20.
44. Cruse, W. B. T.; Saludjian, P.; Biala, E.; Strazewski, P.; Prange, T.; Kennard, O., Structure of a Mispaired RNA Double Helix at 1.6-Å Resolution and Implications for the Prediction of RNA Secondary Structure. Proc Natl Acad Sci USA 1994, 91 (10), 4160-4164.
45. Holbrook, S. R.; Cheong, C. J.; Tinoco, I.; Kim, S. H., Crystal-Structure of an RNA Double Helix Incorporating a Track of Non-Watson-Crick Base-Pairs. Nature 1991, 353 (6344), 579-581.
46. SantaLucia, J., Jr.; Kierzek, R.; Turner, D. H., Effects of GA mismatches on the structure and thermodynamics of RNA internal loops. Biochemistry 1990, 29 (37), 8813-9.
47. Christiansen, M. E.; Znosko, B. M., Thermodynamic characterization of the complete set of sequence symmetric tandem mismatches in RNA and an improved model for predicting the free energy contribution of sequence asymmetric tandem mismatches. Biochemistry 2008, 47 (14), 4329-36.
48. Walter, A. E.; Wu, M.; Turner, D. H., The stability and structure of tandem GA mismatches in RNA depend on closing base pairs. Biochemistry 1994, 33 (37), 11349-54.
49. Tolbert, B. S.; Kennedy, S. D.; Schroeder, S. J.; Krugh, T. R.; Turner, D. H., NMR structures of (rGCUGAGGCU)2 and (rGCGGAUGCU)2: probing the structural features that shape the thermodynamic stability of GA pairs. Biochemistry 2007, 46 (6), 1511-22.
50. Hammond, N. B.; Tolbert, B. S.; Kierzek, R.; Turner, D. H.; Kennedy, S. D., RNA internal loops with tandem AG pairs: the structure of the 5'GAGU/3'UGAG loop can be dramatically different from others, including 5'AAGU/3'UGAA. Biochemistry 2010, 49 (27), 5817-27.
51. MacRae, I. J.; Zhou, K. H.; Li, F.; Repic, A.; Brooks, A. N.; Cande, W. Z.; Adams, P. D.; Doudna, J. A., Structural basis for double-stranded RNA processing by dicer. Science 2006, 311 (5758), 195-198.
52. Lau, P. W.; Guiley, K. Z.; De, N.; Potter, C. S.; Carragher, B.; MacRae, I. J., The molecular architecture of human Dicer. Nat Struct Mol Biol 2012, 19 (4), 436-440.
53. Park, J. E.; Heo, I.; Tian, Y.; Simanshu, D. K.; Chang, H.; Jee, D.; Patel, D. J.; Kim, V. N., Dicer recognizes the 5′ end of RNA for efficient and accurate processing. Nature 2011, 475 (7355), 201-U107.

54. Lucks, J. B.; Mortimer, S. A.; Trapnell, C.; Luo, S. J.; Aviran, S.; Schroth, G. P.; Pachter, L.; Doudna, J. A.; Arkin, A. P., Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc Natl Acad Sci USA 2011, 108 (27), 11063-11068.
55. Darty, K.; Denise, A.; Ponty, Y., VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics 2009, 25, 1974-1975.

3.8 Supporting Information

Figure S3-1. Denaturing polyacrylamide gels of SHAPE reactions for (a) pri-miR-16-1, (b) pri-miR-30a, and (c) pri-miR-107. The two left lanes are sequencing lanes; the remaining are no SHAPE reagent (-) and plus SHAPE reagent (+).

Figure S3-2. Ribonuclease structure mapping of pri-miR-30a is consistent with the most probable secondary structure resulting from the SHAPE-constrained MC-Fold calculations. (a) A denaturing 12% polyacrylamide gel used in the analysis with lanes as follows: C is a control sample (no nuclease); OHˉ is a limited alkaline digest; and T1, A, and V1 are limited digests with ribonucleases specific for single-stranded G, single-stranded C and U, and 5′ to double-stranded regions, respectively. The reactions in lanes 2 and 3 were performed under RNA-denaturing conditions (denoted ‘Den.’), while the reactions in lanes 4-6 were performed under RNA-native conditions (denoted ‘Nat.’). Helical and loop regions of the RNA are indicated to the right of the gel. (b) The highest probability secondary structure (see Fig. 3-1) with positions of cleavage by ribonucleases indicated by symbols as described in the legend. Symbol size is proportional to cleavage intensity.

Figure S3-3. Ribonuclease structure mapping of pri-miR-107 is consistent with the most probable secondary structure resulting from the SHAPE-constrained MC-Fold calculations. (a) A denaturing 12% polyacrylamide gel used in the analysis with lanes as follows: C is a control sample (no nuclease); OHˉ is a limited alkaline digest; and T1, A, and V1 are limited digests with ribonucleases specific for single-stranded G, single-stranded C and U, and 5′ to double-stranded regions, respectively. The reactions in lanes 2 and 3 were performed under RNA-denaturing conditions (denoted ‘Den.’), while the reactions in lanes 4-6 were performed under RNA-native conditions (denoted ‘Nat.’). Helical and loop regions of the RNA are indicated to the right of the gel. (b) The highest probability secondary structure (see Fig. 3-1) with positions of cleavage by ribonucleases indicated by symbols as described in the legend. Symbol size is proportional to cleavage intensity.

Figure S3-4. Secondary structures of pri-miR-16-1 derived from SHAPE (top) and MC-Fold probabilities (bottom). In the secondary structures, the top panel shows the nucleotides colored by SHAPE reactivity and the bottom panel shows the nucleotides colored by the probability that the nucleotide is single-stranded as determined by MC-Fold. The combined probability (from top and bottom panels) of the nucleotide being single-stranded is located in Figure 3-1A (right). The colors are indicated by the color bar. Secondary structure diagrams were generated in VARNA.

Figure S3-5. Secondary structures of pri-miR-30a derived from SHAPE (top) and MC-Fold probabilities (bottom). In the secondary structures, the top panel shows the nucleotides colored by SHAPE reactivity and the bottom panel shows the nucleotides colored by the probability that the nucleotide is single-stranded as determined by MC-Fold. The combined probability (from top and bottom panels) of the nucleotide being single-stranded is located in Figure 3-1B (right). The colors are indicated by the color bar. Secondary structure diagrams were generated in VARNA.

Figure S3-6. Secondary structures of pri-miR-107 derived from SHAPE (top) and MC-Fold probabilities (bottom). In the secondary structures, the top panel shows the nucleotides colored by SHAPE reactivity and the bottom panel shows the nucleotides colored by the probability that the nucleotide is single-stranded as determined by MC-Fold. The combined probability (from top and bottom panels) of the nucleotide being single-stranded is located in Figure 3-1C (right). The colors are indicated by the color bar. Secondary structure diagrams were generated in VARNA.

Figure S3-7. Secondary structures of pri-miR-16-1 HS mut derived from SHAPE and MC-Fold probabilities. (a) SHAPE reactivity trace with description same as Figure 3-1. (b) In the secondary structures, the top panel shows the nucleotides colored by SHAPE reactivity, the middle panel shows the nucleotides colored by the probability that the nucleotide is single-stranded as determined by MC-Fold, and the bottom panel shows the combined probability (from top and middle panels) of the nucleotide being single-stranded. The colors are indicated by the color bar. Secondary structure diagrams were generated in VARNA.

Figure S3-8. Secondary structures of pri-miR-107 HS mut derived from SHAPE and MC-Fold probabilities. (a) SHAPE reactivity trace with description same as Figure 3-1. (b) In the secondary structures, the top panel shows the nucleotides colored by SHAPE reactivity, the middle panel shows the nucleotides colored by the probability that the nucleotide is single-stranded as determined by MC-Fold, and the bottom panel shows the combined probability (from top and middle panels) of the nucleotide being single-stranded. The colors are indicated by the color bar. Secondary structure diagrams were generated in VARNA.

Chapter 4

Deformability in the Cleavage Site of Primary MicroRNA is Not Sensed by the Double-Stranded RNA Binding Domains in the Microprocessor Component DGCR8

Submitted for Publication: Quarles, K.A. and Showalter, S.A.

4.1 Abstract

The prevalence of double-stranded RNA (dsRNA) in eukaryotic cells has only recently been appreciated. Of interest here, RNA silencing begins with dsRNA substrates that are bound by the double-stranded RNA binding domains (dsRBDs) of their processing proteins. Specifically, processing of microRNA (miRNA) in the nucleus minimally requires the enzyme Drosha and its dsRBD-containing cofactor protein, DGCR8. The smallest recombinant construct of DGCR8 that is sufficient for in vitro dsRNA binding, referred to as DGCR8-Core, consists of its two dsRBDs and a C-terminal tail. Because dsRBDs rarely recognize the nucleotide sequence of dsRNA, it is reasonable to hypothesize that DGCR8 function is dependent on recognition of specific structural features in the miRNA precursor. Previously, we demonstrated that non-canonical structural elements that promote RNA flexibility within the stem of miRNA precursors are necessary for efficient in vitro cleavage by reconstituted Microprocessor complexes. Here we combine gel shift assays with in vitro processing assays to demonstrate that neither the N-terminal dsRBD of DGCR8 in isolation nor the DGCR8-Core construct is sensitive to the presence of non-canonical structural elements within the stem of miRNA precursors, or to single-stranded segments flanking the stem. Extending DGCR8-Core to include an N-terminal heme-binding region does not change our conclusions. Thus, our data suggest that while the DGCR8-Core region is necessary for dsRNA binding and recruitment to the Microprocessor, it is not sufficient to establish the previously observed connection between RNA flexibility and processing efficiency.

4.2 Introduction

One of the most significant recent breakthroughs in biology, especially for the therapeutic community,1, 2 has been the discovery of RNA interference (RNAi), which is involved in a wide range of developmental, immunity, and regulatory networks.3, 4 As part of RNAi, the canonical microRNA (miRNA) maturation pathway includes a series of steps beginning in the nucleus with cleavage by the Microprocessor complex, progressing to the cytoplasm with cleavage by Dicer, and ending with incorporation into the RNA-induced silencing complex (RISC).5 Despite the centrality of this pathway to the eukaryotic gene regulation program, much is still unknown about the molecular mechanism of miRNA processing. Specifically, the complexity of RNA structure in cellular pools and the prevalence of double-stranded RNA (dsRNA) are both far more pronounced than previously appreciated.6-8 Improper substrate recognition within the miRNA maturation pathway can result in the accumulation of unprocessed miRNAs and/or misregulated mRNA levels, culminating in a multitude of clinical effects.9 Therefore, miRNA processing proteins face a crucial, complex task in selecting primary miRNA (pri-miRNA) targets from a pool of diverse dsRNA structures. For these reasons, gaining mechanistic insight into the molecular-scale rules for RNA selection by miRNA processing complexes remains a high priority.

Wherever dsRNA is encountered, the double-stranded RNA binding domain (dsRBD) is typically employed for dsRNA binding.10, 11 In the canonical metazoan miRNA maturation pathway, there are five key proteins that contain dsRBDs: DGCR8, Drosha, TRBP, PACT, and Dicer. In this study, we focus on the pair of dsRBDs found in the Microprocessor component DGCR8, which is involved in the initial stage of miRNA processing. At the level of amino acid sequence, the RNA binding face of a given dsRBD motif is typically evolutionarily conserved across species, although this conservation does not generally extend across orthologous dsRBD-containing proteins.12 For example, DGCR8 contains two dsRBDs that share only 25% sequence identity with one another, although the sequences of each domain are over 98% conserved among mammals. The 3D fold of the dsRBD is structurally similar to the fold of the single-stranded RNA recognition motif (RRM), featuring a mixed α/β topology arranged in the tertiary structure to produce α-helical and β-sheet rich faces, but the binding mode observed for dsRBDs is mechanistically distinct from that of RRMs.13 Structures of dsRBDs bound to dsRNA reveal that the α-helical face of the domain engages the dsRNA through predominantly electrostatic interactions that are nearly always insensitive to the nucleotide sequence of the RNA;14-18 although exceptions have been noted.19 This tendency of dsRBDs to bind dsRNA without sequence specificity suggests that proteins like DGCR8 must recognize specific structural features in their RNA targets for function.

It is widely believed that structural features common to pri-miRNAs, but rare in non-target RNAs, are a key determinant for recognition by the Microprocessor complex and for resulting high processing efficiency. The typical pri-miRNA contains a long dsRNA stem in the context of a hairpin loop structure, disrupted by multiple non-canonical structural features (i.e., loops and bulges), and adjoined to flanking regions with strong single-stranded character.20-23 Previously, we have shown that disorder-promoting non-canonical structural elements within the stem of pri-miRNAs are necessary for efficient in vitro cleavage by reconstituted Microprocessor complexes.24 The likely origin of these pri-miRNA structural requirements is in the RNA-binding function of DGCR8, which motivates detailed biochemical characterization of this key dsRNA binding protein.

Interestingly, while the enzymes Dicer and Drosha each contain a single dsRBD, their cofactor proteins, including DGCR8, each contain multiple dsRBDs, suggesting that these cofactors may gain a functional advantage by arranging multiple dsRBDs in a single polypeptide chain. The most thoroughly studied fragment of DGCR8 that is sufficient for in vitro dsRNA binding, referred to as DGCR8-Core, consists only of its two dsRBDs and a short C-terminal tail, which contains an α-helix that the two dsRBDs pack against.25 Intriguingly, the spatial arrangement of the two dsRBDs within DGCR8-Core is incompatible with their simultaneous binding to pri-miRNA, unless the stem of the pri-miRNA undergoes substantial bending (depicted in Figure 4-1);25 otherwise, DGCR8-Core must undergo a (possibly RNA-dependent) global conformational transition. Supporting the possibility of structural rearrangement within DGCR8-Core, molecular dynamics calculations performed by our group led to the hypothesis that correlated bending about the pseudo two-fold symmetry axis running through the interface between the two dsRBDs induces a conformational change that may facilitate pri-miRNA binding.26 Recent experimental evidence to support this hypothesis was provided by NMR lineshape analysis, which revealed a substantial change to the chemical environment for the RNA binding face of both dsRBDs, accompanied by changes in environment spanning the interface formed between the domains, in the context of RNA-bound DGCR8-Core.27 Taken together, these observations suggest that DGCR8-Core must undergo some degree of structural rearrangement, even in a binding mechanism featuring strong bending of the stem of pri-miRNA.

Figure 4-1. Bending model for pri-miRNA recognition by DGCR8-Core currently supported in the literature.25 The DGCR8-Core crystal structure (PDB 2YT4, with loops built-in as previously described26) is shown with dsRBD1 in red and dsRBD2 in blue. Approximately one turn of idealized A-form dsRNA (tan) has been modeled in contact with each dsRNA binding face of DGCR8, with an additional turn of dsRNA shown in between to bridge the space separating the dsRBDs. The flanking tails, flexible regions in the stem, and terminal loop are implied by dotted lines to suggest the full make-up of a pri-miRNA bound to DGCR8-Core. This model suggests that the pri-miRNA must undergo extreme bending in order to accommodate binding by DGCR8, which may occur at the hot spot and secondary imperfection sites (labeled). Approximate Drosha and Dicer cut sites are also labeled.

In a previous study,24 we demonstrated a need for flexibility within the pri-miRNA stem for efficient cleavage by Drosha, but attribution of this result to a measurable impact on dsRNA binding by DGCR8-Core was not attempted. Additional work by others has demonstrated a need for flanking single-stranded RNA and for a large, flexible terminal loop on the hairpin;21, 28 similarly, these studies did not aim to demonstrate that the effects observed were directly DGCR8-mediated. In the present study, we focus on DGCR8’s interactions with a variety of dsRNAs that contain features commonly found in native pri-miRNA substrates (i.e., ssRNA-dsRNA junction, dsRNA stem, imperfections within the stem, and the terminal loop). Our results show that neither the N-terminal dsRBD of DGCR8 in isolation nor the DGCR8-Core construct is sensitive to the presence of non-canonical structural elements within the stem of pri-miRNA, or to the composition of the RNA flanking the stem. Significantly, we show that periodic (i.e., once-per-turn) flexibility along the RNA stem is not necessary for high-affinity DGCR8-Core binding, which suggests that the strong RNA-bending model may be inaccurate. Even extending DGCR8-Core to include the N-terminal heme-binding region does not alter the binding behavior of DGCR8. To summarize, our data suggest that while the DGCR8-Core region is necessary for RNA binding and recruitment to the Microprocessor, it is not sufficient to establish the previously observed connection between RNA flexibility and processing efficiency.

4.3 Materials and Methods

4.3.1 Protein Preparation

A synthetic DGCR8-Core (505-720) gene was purchased from Geneart, and DGCR8-dsRBD1 (505-583) was amplified by PCR. The expression and purification of the protein were performed as previously described.29 The protein was buffer exchanged into 50 mM cacodylate, pH 6.0, 50 mM KCl, and 0.35 µg/mL β-mercaptoethanol. Final concentration of the sample was determined via guanidine hydrochloride denaturation by UV absorption using ε = 22,400 M⁻¹ cm⁻¹ (DGCR8-Core) and ε = 4,200 M⁻¹ cm⁻¹ (DGCR8-dsRBD1), both at 278 nm.

The DGCR8-HBD-Core construct (276-720) was prepared by amplifying DNA through PCR from the human DGCR8 gene, purchased from ATCC, which was then inserted into pET24 vector. Following similar expression and purification protocols as for the other DGCR8 constructs, but without 3C protease cleavage of the his-tag, the recombinant protein was buffer exchanged into 50 mM sodium cacodylate, pH 6.0, 50 mM KCl, 0.35 μg/mL β-mercaptoethanol, and 5% (v/v) glycerol. Final concentration of the sample was determined via guanidine hydrochloride denaturation followed by UV absorption measurement, using ε = 44,800 M⁻¹ cm⁻¹ at 278 nm.

4.3.2 Primary MicroRNA Preparation

All DNAs containing a primary miRNA sequence were purchased from Geneart and contained a T7 promoter sequence at the 5´-end, as well as an inverted BsaI cut site at the 3´-end.

Preparation of the template DNA, transcription by T7 RNA polymerase, and purification of the transcribed RNA were all performed as previously described.24

4.3.3 5´-End Labeling of RNA

All RNAs were 5´-end labeled with 32P. In order to remove the 5´-triphosphate, the RNA was first treated with calf intestinal alkaline phosphatase (New England Biolabs), phenol/chloroform extracted, and ethanol precipitated. The RNA was then 5´-end labeled with T4 polynucleotide kinase (New England Biolabs). The concentration of labeled RNA was determined using a liquid scintillation counter (Beckman).

4.3.4 Native Gel RNA Purification

For the RNA duplexes, 5´-32P-end-labeled top strand RNA was mixed with 20-fold excess unlabeled bottom strand RNA (RNAs ordered from Dharmacon) and the EMSA 5X buffer to a final concentration of 20% (v/v) (see Electrophoretic Mobility Shift Assay Methods). In the case of the duplexes containing terminal loops, the 5´-32P-end-labeled RNA was mixed with the EMSA 5X buffer to a final concentration of 20% (v/v). The mixture was denatured at 85°C for 3 minutes followed by renaturing at 1°C for 5 minutes. Subsequently, the RNA mixtures were run on a 0.25X TBE, 10% acrylamide gel at 12 V cm⁻¹ at 4°C for 4 hours. The gel was then exposed on film for 30 minutes and the developed film was used to cut out the corresponding duplex or monomer terminal loop bands from the gel. The gel pieces were soaked overnight at 4°C in a TEN250 solution, and the RNA was then purified from the supernatant by ethanol precipitation. The concentration of labeled RNA was determined using a liquid scintillation counter (Beckman).

4.3.5 Electrophoretic Mobility Shift Assays

For reactions involving pri-miRNA constructs, the 5´-32P-end-labeled RNA was renatured immediately prior to use by heating to 85°C for 3 minutes followed by renaturing at 1°C for 5 minutes. Prior to mixing with protein, the binding reactions were incubated at room temperature for 30 minutes to ensure full equilibration in the presence of 50 mM cacodylate, pH 6.0, 50 mM KCl, 5% glycerol, 100 μg/mL bovine serum albumin, 1 mM dithiothreitol, and 0.1 mg/mL herring sperm DNA (to prevent the complex from sticking in the wells). Subsequently, the binding reactions were run on a 0.25X TBE, 10% acrylamide gel at 12 V cm⁻¹ at 4°C for 3 h, with each lane containing 20 μCi. Signal from the gels was quantified on a Typhoon-9410 imager, and the resulting images were processed in ImageQuant. The fraction bound was calculated as the intensity of all protein-bound species over the sum of the protein-bound species and the free RNA. A more detailed protocol is located in the Supporting Methods.
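The fraction-bound calculation described above can be illustrated with a short Python sketch. The band intensities, the function names, and the use of a Hill isotherm with a linearized (Hill plot) fit are assumptions made for illustration only; the tables in this chapter report a Kd and an exponent n, but the fitting procedure actually used is the one given in the Supporting Methods, not this sketch.

```python
import math

def fraction_bound(bound_intensities, free_intensity):
    """Fraction bound = all protein-bound bands / (bound bands + free RNA band)."""
    bound = sum(bound_intensities)
    return bound / (bound + free_intensity)

def hill(protein_uM, kd_uM, n):
    """Hill isotherm (assumed model): fraction bound at a given protein concentration."""
    return protein_uM ** n / (kd_uM ** n + protein_uM ** n)

def fit_hill_plot(protein_uM, frac):
    """Estimate (Kd,app, n) from a Hill plot.

    Uses log10(f / (1 - f)) = n*log10[P] - n*log10(Kd) and an ordinary
    least-squares line; points with f <= 0 or f >= 1 are skipped.
    """
    pts = [(math.log10(p), math.log10(f / (1.0 - f)))
           for p, f in zip(protein_uM, frac) if p > 0 and 0.0 < f < 1.0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    intercept = my - slope * mx
    return 10 ** (-intercept / slope), slope

# Hypothetical, noise-free titration generated from the model itself.
protein = [0.5, 1.0, 2.0, 4.0, 8.0]          # µM
frac = [hill(p, 2.0, 3.0) for p in protein]
kd_fit, n_fit = fit_hill_plot(protein, frac)
```

With real gel data, a nonlinear least-squares fit of the untransformed fractions is generally preferable, because the logarithmic transform amplifies noise near zero and full saturation.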

4.3.6 Drosha Processing and Competition Processing Assays

RNA substrates for in vitro assays were transcribed as previously described.30 FLAG-Drosha30 and FLAG-DGCR8 (AddGene) (collectively referred to as Microprocessor) were overexpressed in HEK-293T cells and FLAG-tagged proteins were isolated on M2-FLAG beads (Sigma) as previously described.24 Lysate (whole cell or immunopurified Microprocessor) was combined with 10 fmol of labeled RNA, 10X reaction buffer (64 mM MgCl2), RNasin (Promega), and with (competition processing) or without (regular processing) increasing amounts of purified competitor duplex RNA. The reaction mix was incubated at 37°C for 15 min. Products were phenol/chloroform extracted and ethanol precipitated, and labeled RNA was analyzed on 12% denaturing PAGE gels. Signal from the gels was quantified on a Typhoon-9410 imager, and the resulting images were processed in ImageQuant. The percent processed was calculated as the intensity of all cleaved species (pre-miRNA and cleaved single-stranded tails) over the sum of all species (cleaved species and substrate pri-miRNA).

4.4 Results

Native pri-miRNA substrates for DGCR8 are long (>80 nucleotides) hairpin RNAs, characterized structurally by multiple base pairing imperfections in their ~30 base pair stems, a terminal loop, and an unpaired flanking region (termed the ssRNA-dsRNA junction throughout this chapter). Multiple investigators have studied the prevalence and significance of base pairing imperfections within the double-stranded stem of the pri-miRNA, leading to the identification of two broadly conserved features: the region located near the Drosha cleavage site (referred to here as the “hot spot”);24 and the secondary imperfection, located approximately halfway between the Drosha and Dicer cleavage sites (see Figure 4-1).23, 31 While it is widely accepted that an ssRNA-dsRNA junction is necessary for Drosha processing, it is still disputed whether a large, flexible terminal loop is required.21, 32, 33 Motivated by the need to clarify the relative importance of each of these pri-miRNA structural features for efficient miRNA processing, we chose to begin the present study with an investigation of DGCR8-Core’s ability to recognize each of these structural elements in controlled in vitro binding reactions.

4.4.1 Binding to Primary MicroRNAs

Recognition of pri-miRNA by the tandem-dsRBD containing DGCR8-Core has previously been investigated by electrophoretic mobility shift assays (EMSAs), suggesting that non-specific association between DGCR8-Core and pri-miRNA is characterized by an apparent dissociation constant (Kd,app) of ~2 μM under a range of binding conditions.25, 27, 29 Alternatively, the single N-terminal dsRBD from DGCR8 (DGCR8-dsRBD1) binds pri-miRNA with Kd,app of ~9 μM,29 while binding by the C-terminal dsRBD (DGCR8-dsRBD2) alone has not been reported, due to limited solubility of this domain in isolation. It is noteworthy that Drosha also contains a dsRBD, but it has no demonstrated dsRNA binding activity (Supporting Fig. S4-1).

Under solution conditions exactly matching those used throughout this study, we selected the panel of pri-miRNA molecules presented in Figure 4-2A for EMSA analysis to establish the affinities of DGCR8-Core and DGCR8-dsRBD1. Representative gels for DGCR8-Core and DGCR8-dsRBD1 binding assays are reported in Supporting Figures S4-2 and S4-3, respectively. Investigation of binding using filter binding assays was also attempted, but rapidly abandoned because the binding intermediates appeared not to be sufficiently stable for retention on the membranes, leading to systematic errors in the apparent dissociation constants (Supporting Fig. S4-4). Inspection of the apparent dissociation constants in Table 4-1 reveals that, while the secondary structures of these three pri-miRNAs vary significantly within the stem region, the affinity of DGCR8-Core for all three is nearly invariant (Kd,app ranges between 1 and 2 μM). Interestingly, the affinity of DGCR8-Core for the Drosha cleavage product pre-mir-16-1 is also indistinguishable from its measured affinity for any of the three pri-miRNAs employed in the study. In contrast, DGCR8-dsRBD1 displays an approximately 3-fold variation in affinity among these three pri-miRNA molecules, but binds all three approximately 10-fold more weakly than DGCR8-Core interacting with the same pri-miRNAs.

Figure 4-2. Secondary structures of in vitro transcribed RNA models for (A) native pri-miRNAs and pre-mir-16-1; and (B) non-native pri-mir-16-1 stem-loop constructs (region of mutation boxed). Secondary structures shown are as predicted experimentally by combined SHAPE/MC-Pipeline analysis.24 For all constructs, the mature miRNA is indicated in bold.

Table 4-1. EMSA best-fit parameters for DGCR8-Core and DGCR8-dsRBD1 binding to the native pri- and pre-miRNA constructs, with uncertainties based on two independent replicates. N/D: Binding not determined/tested.

                  DGCR8-Core                   DGCR8-dsRBD1
RNA Construct     Kd (µM)        n             Kd (µM)        n
pri-mir-16-1      1.05 ± 0.03    3.9 ± 0.4     9.7 ± 0.6      2.3 ± 0.1
pri-mir-107       1.50 ± 0.07    6 ± 1         6.6 ± 0.4      2.3 ± 0.2
pri-mir-30a       1.86 ± 0.09    5.6 ± 0.3     16.7 ± 0.1     1.49 ± 0.01
pre-mir-16-1      1.70 ± 0.09    6.0 ± 0.4     N/D
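As a quick arithmetic check on the comparisons drawn from Table 4-1, the following Python sketch computes the fold differences directly from the tabulated best-fit Kd values (uncertainties are ignored, and the variable and function names are illustrative only).

```python
# Best-fit Kd values (µM) copied from Table 4-1: (DGCR8-Core, DGCR8-dsRBD1).
table_4_1 = {
    "pri-mir-16-1": (1.05, 9.7),
    "pri-mir-107": (1.50, 6.6),
    "pri-mir-30a": (1.86, 16.7),
}

def fold_weaker(core_kd, dsrbd1_kd):
    """How many times weaker dsRBD1 binds than Core (ratio of Kd values)."""
    return dsrbd1_kd / core_kd

# Per-RNA ratio of dsRBD1 Kd to Core Kd.
ratios = {rna: fold_weaker(*kds) for rna, kds in table_4_1.items()}

# Spread of Kd across the three pri-miRNAs for each protein construct.
core_kds = [core for core, _ in table_4_1.values()]
dsrbd1_kds = [dsrbd1 for _, dsrbd1 in table_4_1.values()]
spread_core = max(core_kds) / min(core_kds)
spread_dsrbd1 = max(dsrbd1_kds) / min(dsrbd1_kds)

for rna, ratio in ratios.items():
    print(f"{rna}: dsRBD1 binds {ratio:.1f}-fold more weakly than Core")
print(f"Kd spread, Core: {spread_core:.2f}-fold; dsRBD1: {spread_dsrbd1:.2f}-fold")
```

From these values, the dsRBD1-to-Core ratios fall between roughly 4-fold (pri-mir-107) and 9-fold (pri-mir-16-1 and pri-mir-30a), and the spread among the three pri-miRNAs is about 2.5-fold for DGCR8-dsRBD1 versus under 2-fold for DGCR8-Core.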

4.4.2 Binding to Perfect Watson-Crick Duplexes

The relatively uniform dissociation constants measured in our pri-miRNA binding studies suggested that either DGCR8-Core is not sensitive to variations in the structural features of pri-miRNA targets, or that the complexity of their structures masks DGCR8-Core’s sensitivity to the presence of individual structural elements. Therefore, we designed a series of model RNA constructs to continue our binding studies and to provide an unambiguous assessment of DGCR8-Core’s sensitivity to specific structural features. As a baseline for this study, we first characterized DGCR8’s ability to bind perfect Watson-Crick (WC) RNA duplexes, with lengths increasing from 12 base pairs through 44 base pairs (secondary structures shown in Figure 4-3). The lengths of WC-dsRNAs used in this study were 1) 12 base pairs (ds12), representing the smallest binding site size of a single dsRBD;14, 15, 17, 19, 34 2) 16 base pairs (ds16), representing the smallest binding footprint seen for a tandem dsRBD construct (i.e., proteins similar to DGCR8-Core);35 3) 22 base pairs (ds22), representing the length of a miRNA:miRNA* duplex; 4) 33 base pairs (ds33), representing the approximate length of a pre-miRNA; and 5) 44 base pairs (ds44), representing the approximate length of a pri-miRNA. It is worth noting that, with the exception of the additional sampling created by the ds16 construct, all duplexes differed in length by approximately one turn of A-form RNA duplex, which allows for examination of binding on a per-helical turn basis.
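The per-turn spacing of the duplex series can be checked with a few lines of Python. The helical repeat of about 11 base pairs per turn for A-form RNA is a standard textbook value supplied here as an assumption; it is not stated in this chapter.

```python
# Approximate helical repeat of A-form RNA (assumed textbook value).
BP_PER_TURN = 11

duplex_bp = {"ds12": 12, "ds16": 16, "ds22": 22, "ds33": 33, "ds44": 44}

def helical_turns(base_pairs):
    """Approximate number of A-form helical turns spanned by a duplex."""
    return base_pairs / BP_PER_TURN

# The main length series, excluding the extra ds16 sampling point.
series = ["ds12", "ds22", "ds33", "ds44"]
steps_bp = [duplex_bp[b] - duplex_bp[a] for a, b in zip(series, series[1:])]

for name in series:
    print(f"{name}: ~{helical_turns(duplex_bp[name]):.1f} turns")
print("length increments (bp):", steps_bp)
```

Each increment in the main series is 10 or 11 base pairs, that is, close to one A-form turn per step.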

Figure 4-3. Secondary structures of perfect Watson-Crick duplexes derived from pri-mir-16-1 (which is displayed at the top for reference). The mature miRNA strand is shown in bold and the dotted line indicates where each construct’s sequence aligns relative to pri-mir-16-1; these are consistent for all RNA model constructs throughout.

EMSAs were performed in which either DGCR8-Core or DGCR8-dsRBD1 was titrated into each of the RNA duplexes described above. Representative gels are shown in Figures 4-4A and 4-4C, respectively, for binding to ds44, with representative gels for DGCR8-Core and DGCR8-dsRBD1 binding to all duplexes reported in Supporting Figures S4-5 and S4-6, respectively. As the length of the RNA duplex increased, the affinity of DGCR8-Core for the duplex also increased modestly, from Kd,app ~ 8 μM for binding to ds12 to Kd,app ~ 3 μM for binding to ds44 (Fig. 4-4B; best-fit parameters in Table 4-2). Note that at high DGCR8-Core concentrations, particularly for titrations into the longer RNA duplexes, high molecular weight complexes began to stick in the wells of the gel, which may have been the result of high stoichiometry in the complexes36 or of DGCR8 self-assembly.37 In contrast, DGCR8-dsRBD1 shows a much larger change in affinity as RNA duplex length increases, with Kd,app decreasing from more than 200 μM to 6 μM as the duplex length increases from 12 to 44 base pairs (Fig. 4-4D; best-fit parameters in Table 4-2). Note that saturation of the binding event was not possible for DGCR8-dsRBD1 binding to ds12 or ds16, due to limitations in the solubility of the protein, and so the reported dissociation constants for these two RNAs should be interpreted as lower limits. In summary, the data from this study of interactions between DGCR8-Core or DGCR8-dsRBD1 with Watson-Crick RNA duplexes suggest that juxtaposing two dsRBDs in a single polypeptide largely eliminates DGCR8’s sensitivity to duplex length for RNAs with lengths similar to Microprocessor substrates and products.


Figure 4-4. Electrophoretic mobility shift assays used to examine binding by DGCR8 to varying lengths of perfect Watson-Crick RNA duplexes. Representative gels are shown for both (A) DGCR8-Core and (C) DGCR8-dsRBD1 binding to ds44. The leftmost lane in the gels contains RNA, but no protein (labeled “RNA” above the gel). In all other lanes, the concentration of protein increases from left to right (triangle above gel). Bands corresponding to free and bound RNA are indicated to the left of the gels. (B,D) The corresponding fits to the EMSA data for all lengths of duplex (for best-fit parameters, see Table 4-2; fitting procedure is described in the Materials and Methods). Representative gels for all dsRNA lengths contributing to panels B and D are provided in Supporting Figures S4-5 and S4-6.

Table 4-2. EMSA best-fit parameters for DGCR8-Core and DGCR8-dsRBD1 binding to the indicated duplexed RNA constructs, with uncertainties based on two independent replicates. N/B: No binding was seen between 16TS and DGCR8-dsRBD1. N/D: Binding not determined/tested.

                                 DGCR8-Core                    DGCR8-dsRBD1
RNA Construct                    Kd (µM)        n              Kd (µM)       n
ds12                             7.4 ± 0.9      5 ± 2          > 200         ~1
ds16                             5.8 ± 0.8      6 ± 1          > 90          ~1
ds22                             4.1 ± 0.5      8 ± 2          21 ± 1        2.1 ± 0.2
ds33                             3.7 ± 0.3      6 ± 2          8.8 ± 0.2     1.7 ± 0.1
ds44                             3.3 ± 0.2      12.1 ± 0.7     5.9 ± 0.1     2.66 ± 0.09
ds16 + native flanking           3.41 ± 0.06    5.7 ± 0.2      > 25          ~1
ds16 + non-native flanking       4.9 ± 0.2      4.3 ± 0.2      > 35          ~1
ds16 + flanking TS               4.5 ± 0.7      5 ± 1          > 55          ~1
ds16 + native flanking BS        4.3 ± 0.6      4.8 ± 0.7      > 125         ~1
ds16 + non-native flanking BS    4.3 ± 0.2      4.0 ± 0.2      > 50          ~1
16TS                             20 ± 2         2.09 ± 0.06    N/B
ds16 + TL                        6.0 ± 0.1      6 ± 1          > 90          ~1
ds16 + polyU4                    6.4 ± 0.2      6.2 ± 0.3      N/D
ds16 + polyU6                    6.4 ± 0.2      4.1 ± 0.1      N/D
ds16 + polyU8                    4.9 ± 0.3      3.8 ± 0.4      > 65          ~1
HS duplex                        4.76 ± 0.02    2.26 ± 0.01    N/D
miR:miR*                         4.67 ± 0.01    2.26 ± 0.01    N/D
dsAmis                           2.35 ± 0.05    3.67 ± 0.08    N/D
dsUbulge                         6.4 ± 0.1      4.3 ± 0.1      N/D

Previous studies have shown that DGCR8 binding affinities correlate well with Drosha processing efficiencies21 and that Drosha cleaves remarkably efficiently when in complex with the RNA-binding region of DGCR8 (amino acids 484-750 of the human sequence, which includes the “Core” region).38 These findings suggest that the trends in dsRNA binding affinity we observed in our EMSAs should be predictive of the efficiency with which our dsRNA duplexes will inhibit pri-miRNA cleavage by the Microprocessor in a standard in vitro RNA processing assay. Therefore, in an effort to corroborate, in a more biological context, the trends we observed in DGCR8 affinity for RNAs of varying length, we conducted competition binding assays with the same panel of Watson-Crick RNA duplexes (referred to throughout as competition processing assays). A representative gel displaying titration of ds22 to inhibit pri-mir-16-1 processing is shown in Figure 4-5A, which yielded an IC50 of ~ 9 μM (gels for all duplex inhibitors are shown in Supporting Figures S4-7 and S4-8). As summarized in Table 4-3, the decrease in IC50 values with increasing dsRNA length, derived from the competition processing assays, parallels the trend observed for Kd,app in the EMSA experiments. While the quantitative similarity between the IC50 and Kd,app values may be a coincidence, these results corroborate the finding that DGCR8 binds dsRNA in the size range of miRNA precursors well, with minimally higher efficiency for binding to longer RNA duplexes. To show that the trends observed were a result of duplex length, and not due to the chosen nucleotide sequence, we also tested a 33 base pair duplex with a randomized alternative sequence (ds33-alt) that, within experimental uncertainty, yielded the same IC50 as ds33 (Table 4-3). Based on these results, we conclude that the EMSAs accurately report the trends in DGCR8 binding to RNA in the context of the Microprocessor.

Figure 4-5. Competition processing assays were used to corroborate the EMSA results (Fig. 4-4) in a more biological context. (A) A representative gel using ds22 as the competitor is shown with the concentration of competitor increasing from left to right (triangle above gel). The leftmost lanes report a ladder, pri-mir-16-1 processing in the absence of transfected Microprocessor (Mock), and pri-mir-16-1 processed in the presence of transfected Microprocessor but the absence of any competitor duplex (-Comp). The positions of the pri-miRNA substrate and cleaved pre-miRNA and flanking tails are indicated to the right of the gels. (B) The corresponding fits to a single exponential decay model for all competitors are shown with solid lines (IC50 values reported in Table 4-3); ds12 and ds16 estimated fits are displayed as dashed lines because these assays only yielded lower limits for IC50. Representative competition processing gels for all RNA constructs are provided in Supporting Figures S4-7 and S4-8.
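To make the link between the percent-processed measurement and an IC50 concrete, the Python sketch below fits a single exponential decay of the form P(c) = P0·exp(−k·c), for which the half-maximal concentration is ln(2)/k. The absence of a baseline offset, the log-linear least-squares fit, the function names, and the synthetic data are all assumptions for illustration; the exact functional form and fitting routine used for Figure 4-5B are not specified in this section.

```python
import math

def percent_processed(cleaved_intensities, substrate_intensity):
    """Percent processed = cleaved species / (cleaved species + substrate) * 100."""
    cleaved = sum(cleaved_intensities)
    return 100.0 * cleaved / (cleaved + substrate_intensity)

def fit_exponential_decay(conc_uM, processed):
    """Fit P(c) = P0 * exp(-k * c) by least squares on ln(P) versus c.

    Returns (P0, k). Assumes every processed value is positive and that the
    curve decays to zero (no baseline term).
    """
    ys = [math.log(p) for p in processed]
    n = len(conc_uM)
    mx = sum(conc_uM) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(conc_uM, ys))
             / sum((x - mx) ** 2 for x in conc_uM))
    k = -slope
    p0 = math.exp(my - slope * mx)
    return p0, k

def ic50_from_rate(k):
    """Competitor concentration at which processing falls to half of P0."""
    return math.log(2) / k

# Hypothetical, noise-free titration generated with IC50 = 8.9 µM.
true_k = math.log(2) / 8.9
conc = [0.0, 2.0, 4.0, 8.0, 16.0]                      # µM competitor
processed = [60.0 * math.exp(-true_k * c) for c in conc]
p0_fit, k_fit = fit_exponential_decay(conc, processed)
ic50_fit = ic50_from_rate(k_fit)
```

When inhibition never reaches half of the uninhibited signal within the accessible concentration range, as reported for ds12 and ds16, such a fit is poorly constrained and only a lower limit on IC50 can be quoted.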


Table 4-3. Competition processing assays reporting inhibition of pri-mir-16-1 cleavage by the Microprocessor in the presence of the indicated competitor duplexes as IC50 values, with uncertainties based on two independent replicates. EMSA-derived dissociation constants for DGCR8-Core binding to the same RNA duplexes from Table 4-2 are also provided for comparison.

RNA Construct    EMSAs (Kd, µM)    Competition (IC50, µM)
ds12             7.4 ± 0.9         > 12
ds16             5.8 ± 0.8         > 10
ds22             4.1 ± 0.5         8.9 ± 0.7
ds33             3.7 ± 0.3         4 ± 1
ds33-alt         N/D               4.7 ± 0.8
ds44             3.3 ± 0.2         1.3 ± 0.2

4.4.3 Binding to Duplexes with Flanking Single Strands

Drosha processing assays have confirmed that the presence of an ssRNA-dsRNA junction approximately one turn of A-form helix removed from the Drosha cleavage site is required for processing.20, 21 As a result, it is common to draw schematics of the RNA-bound Microprocessor with DGCR8 covering the ssRNA-dsRNA junction,2, 5, 39, 40 while Drosha engages the pri-miRNA cleavage site; however, direct binding of DGCR8 to the junction has never been demonstrated experimentally. To assess whether DGCR8 binding is sensitive to the presence of an ssRNA-dsRNA junction, we constructed a set of duplex RNA constructs, based on our ds16 duplex, which contain an additional 16 nucleotides of single-stranded tail(s) flanking the duplex (secondary structures are presented in Figure 4-6A). The first construct contained both 5´- and 3´-flanking tails, matching those found in native pri-mir-16-1 (ds16 + native flanking). Because the native pri-mir-16-1 sequence contains some Watson-Crick complementarity in the flanking region, we also designed a construct with both 5´- and 3´-flanking tails, in which the bottom strand is mutated to destroy all base pair complementarity outside of the ds16 stem (ds16 + non-native flanking). Finally, we also annealed each of the three 32-mer flanking strands to their complementary 16-mer strands, in order to generate constructs with only a 5´-tail (ds16 + native flanking TS) or a 3´-tail (ds16 + native flanking BS; ds16 + non-native flanking BS). As a control, we also assayed DGCR8 interactions with 16-mer ssRNA (the top strand from the ds16 duplex) to test whether DGCR8-Core or DGCR8-dsRBD1 possesses an intrinsic affinity for ssRNA.

Figure 4-6. Secondary structures of (A) flanking and (B) terminal loop duplexes derived from pri-mir-16-1 (top). The various terminal loops are boxed to highlight the extent of the mutations.

For DGCR8-Core, the EMSA-derived binding affinities for each of the five single-stranded flanking constructs were more similar to the measured affinity for ds22 than to the measured affinity for ds16 (Table 4-2; see Supporting Figure S4-9 for gels), which suggests that the addition of an ssRNA-dsRNA junction does increase DGCR8-Core’s binding affinity, albeit minimally. Significantly, the EMSA for DGCR8-Core interacting with the single-stranded 16-mer did show very weak binding (Kd,app ~ 20 μM, Table 4-2; Supporting Fig. S4-10). As expected, the binding affinity of DGCR8-dsRBD1 across all flanking duplexes is much weaker than the affinity seen for DGCR8-Core (Table 4-2; see Supporting Figure S4-11 for gels), although quantitative assessment is difficult because saturation could not be reached. Taken together, our results support the possibility that DGCR8 interacts with single-stranded RNA in the context of the Microprocessor, although it appears to be unlikely that DGCR8 is targeted to the ssRNA-dsRNA junction specifically.

4.4.4 Binding to RNA Bearing Terminal Loop Structures

Processing assays conducted by multiple groups have demonstrated that when the native terminal loops of pri-miRNA models are replaced by small terminal loops (especially thermostable UUCG tetraloops), Drosha processing is moderately to significantly diminished.21, 24, 32, 33 Intriguingly, inhibition of pri-miRNA processing by binding of small molecules to the terminal loop has recently been demonstrated, implying that the terminal loop mediates important interactions with the Microprocessor.41 Other data have led to the hypothesis that large, flexible terminal loops enhance RNA release and enzyme turnover after Drosha cleavage.32, 42 Given that we observed a weak interaction between DGCR8 and ssRNA in tails flanking ds16, we next tested for a direct impact on DGCR8 binding mediated by terminal loops. For this experiment, we designed another set of RNAs in which various terminal loops were used to close a 16 base pair stem-loop secondary structure (structures shown in Figure 4-6B). This set included a stem-loop terminated by a thermally stable UUCG tetraloop (TL) and a series of three flexible polyU loops containing 4, 6, and 8 nucleotides (polyU4, polyU6, and polyU8, respectively).

Inspection of the EMSA results using these stem-loop RNAs as binding partners for DGCR8 reveals a slight increase in binding affinity for the large polyU octaloop construct, but no statistically significant change in affinity for either of the tetraloops (Table 4-2; see Supporting Figures S4-12 and S4-13 for gels). Motivated by this finding, we also designed pri-mir-16-1 variants in which the native terminal loop was replaced by the four experimental loop sequences described above (secondary structures are depicted in Figure 4-2B). Even though pri-mir-16-1-TL has previously been shown to inhibit Drosha processing substantially,24 it yielded a DGCR8-Core binding constant that was almost identical to that seen for wild-type pri-mir-16-1 (Table 4-4; see Supporting Figure S4-14 for a representative gel), suggesting that DGCR8-Core does not interact with the terminal loop strongly enough to impact function.

Intrigued by these findings, we next carried out in vitro Drosha processing assays on the pri-mir-16-1 mutants bearing each of the four investigated terminal loops, in order to establish how well the binding affinities observed by EMSAs correlate with processing efficiency. For each construct, processing assays were performed under two different conditions: with the immunopurified Microprocessor (IP Micro) and with whole cell extract generated from cells overexpressing both DGCR8 and Drosha (WCE). In all four cases, the same trend was observed irrespective of the method used, as summarized in Figure 4-7. As expected based on our prior experience,24 processing of the UUCG tetraloop mutant was significantly reduced. The extent of inhibition observed with the polyU4 loop was much less than for the pri-mir-16-1-TL construct, suggesting that the length of the loop alone is not the origin of the strong effect observed with the thermostable UUCG tetraloop construct. In fact, processing efficiency was diminished by the same amount for all three of the polyU4, polyU6, and polyU8 constructs. Overall, the structure and nucleotide composition of the terminal loop are seen to have a significant impact on Drosha cleavage efficiency. Taken together with the EMSA studies, in which we observed that DGCR8 binding was largely insensitive to the presence or size of the terminal loop, we conclude that the impact of loop composition on processing cannot be attributed to direct recognition by DGCR8’s dsRBDs.

Figure 4-7. Drosha processing assays show that the secondary structure of pri-miRNAs is an important determinant of Microprocessor cleavage efficiency in vitro. (A) Denaturing gels for the processing of native pri-mir-16-1 and its mutants: pri-mir-16-1-WT (WT), thermally stable tetraloop mutant (TL mut), polyU4 mutant (polyU4), polyU6 mutant (polyU6), polyU8 mutant (polyU8), “hot spot” mutant (HS mut), and secondary mutant (Sec mut). In each assay, lanes represent RNA exposed to FLAG beads with addition of cell lysate that did not express FLAG-tagged proteins (Mock), exposed to the FLAG-tagged immunopurified Microprocessor (IP Micro), and exposed to whole cell extract containing overexpressed Microprocessor (WCE). The positions of the pri-miRNA substrate, cleaved pre-miRNA, and flanking tails are indicated to the right of the gel. (B) Percentages of pri-miRNAs cleaved by the Microprocessor in vitro averaged over three independent experiments. Processing efficiencies are graphed for the immunopurified Microprocessor (left axis) and the whole cell extract (right axis), with error bars to one standard deviation (see Materials and Methods for details).
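The averaging described in panel B can be sketched as follows. This is a hedged illustration only: the normalization (product intensity divided by the sum of substrate and product intensities), the use of the sample standard deviation, the function name, and the band intensities are all assumptions introduced here. The quantification actually applied is the one defined in Materials and Methods.

```python
from statistics import mean, stdev


def percent_cleaved(substrate, products):
    """Percent of pri-miRNA cleaved from band intensities in one lane.

    Assumed normalization: products / (substrate + products).
    """
    total = substrate + products
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * products / total


# Hypothetical (substrate, summed product) intensities for three replicates
replicates = [(60.0, 40.0), (55.0, 45.0), (65.0, 35.0)]

values = [percent_cleaved(s, p) for s, p in replicates]
avg = mean(values)
sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
print(f"{avg:.1f} +/- {sd:.1f} % cleaved")
```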

4.4.5 Binding to Pri-miRNA with Reduced Stem Flexibility

Bioinformatic analysis has shown a tendency for pri-miRNAs to harbor non-Watson-Crick helical defects near the Drosha cleavage site and approximately halfway between the Drosha and Dicer cleavage sites, which we label the “hot spot” and secondary imperfection sites, respectively (Fig. 4-1).22 Recently, we demonstrated that these helical defects enhance the local flexibility of the stem and that flexibility near the Drosha cleavage site in particular is necessary for high efficiency pri-miRNA processing.24 Motivated by these findings, we designed this final set of experiments to address whether DGCR8 binding is impacted by the presence of flexibility-inducing non-Watson-Crick defects in the context of the dsRNA stems of pri-miRNA and our model duplexes. For this study, we used both our previously generated “hot spot” mutant, in which the Drosha cleavage site of pri-mir-16-1 is mutated to be fully Watson-Crick complementary (pri-mir-16-1-HS), and a new construct in which the secondary imperfection between the Drosha and Dicer cleavage sites was mutated to also be fully Watson-Crick complementary (pri-mir-16-1-sec). Secondary structures for both of these RNAs are presented in Figure 4-8A. Similar to the EMSA results reported above for the panel of native pri-miRNAs, no significant difference in apparent dissociation constant for DGCR8-Core binding to these two stem-mutants was observed (Table 4-4; see Supporting Figure S4-14 for representative gels).

Figure 4-8. Secondary structures of constructs mimicking the imperfections found in pri-mir-16-1: (A) in the context of full-length pri-mir-16-1 and (B) in the context of short duplexes. The regions of mutation are boxed.

Table 4-4. EMSA best-fit parameters for DGCR8-Core binding to pri-mir-16-1, bearing the indicated mutations, with uncertainties based on two independent replicates.

                    DGCR8-Core
RNA Construct       Kd (µM)        n
pri-mir-16-1-WT     1.05 ± 0.03    3.9 ± 0.4
pri-mir-16-1-TL     1.0 ± 0.1      3.1 ± 0.4
pri-mir-16-1-HS     2.70 ± 0.03    6.1 ± 0.1
pri-mir-16-1-Sec    1.2 ± 0.1      2.5 ± 0.2
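To show how a (Kd, n) pair such as those in Table 4-4 translates into a binding curve, the sketch below evaluates the standard Hill expression, fraction bound = [P]^n / (Kd^n + [P]^n). Treating n as a Hill coefficient follows the Discussion, but the exact fitting equation is the one given in Materials and Methods. The functional form, the function name, and the chosen protein concentrations are therefore assumptions for illustration.

```python
def hill_fraction_bound(protein_um, kd_um, n):
    """Fraction of RNA bound at a given protein concentration (Hill model)."""
    if protein_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    p_n = protein_um ** n
    return p_n / (kd_um ** n + p_n)


# Best-fit values for wild-type pri-mir-16-1 taken from Table 4-4
kd_wt, n_wt = 1.05, 3.9

# Illustrative protein concentrations in µM (hypothetical titration points)
for protein in (0.5, 1.05, 2.0):
    bound = hill_fraction_bound(protein, kd_wt, n_wt)
    print(f"[P] = {protein:.2f} uM -> fraction bound = {bound:.3f}")
```

By construction the curve passes through one half at [P] = Kd, and a larger n makes the transition around Kd steeper.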

In order to provide a more controlled assessment of the impact that these stem-stabilizing mutants may have on DGCR8 binding, we also assessed DGCR8 affinity in the context of otherwise ideal 22 base pair duplexes (secondary structures shown in Figure 4-8B). The first duplex resembled the Drosha cut site (HS duplex), containing both the GA tandem mismatch and A/C mismatch. For the secondary site, three duplexes were generated to assess the impact of combining symmetric and asymmetric defects: 1) miR:miR*, so named because of its similarity to the fully Drosha and Dicer processed native duplex, with both the A/A mismatch and the U bulge; 2) dsAmis, containing only the A/A mismatch; and 3) dsUbulge, containing only the U bulge.

For the dsUbulge duplex, the apparent DGCR8 binding affinity was decreased in comparison to the perfect ds22 duplex (Table 4-2; see Supporting Figure S4-15 for representative gels), most likely due to bending of the RNA duplex at the bulge. In contrast, no significant change in binding affinity was observed for the duplexes containing symmetric internal loops; only a slight increase in affinity was seen for the dsAmis duplex. This result suggests that the mismatches in these RNAs do not significantly alter the average A-form geometry, as is common for symmetric internal loops.43 These results are also similar to those recorded in our laboratory for TRBP interactions with similar duplexes.44

We have previously shown with immunopurified Microprocessor that Drosha processing is inhibited by the pri-mir-16-1-HS mutation,24 which rigidifies the stem near the Drosha cut site. Continuing the in vitro processing assays first discussed for the stem-loop pri-miRNA mutants, this experiment was completed by assessing the impact of removing stem defects on Drosha cleavage efficiency, using both immunopurified Microprocessor and whole cell extracts from cells overexpressing Drosha and DGCR8. Under both conditions, pri-mir-16-1-HS cleavage was significantly reduced, as compared to wild-type pri-mir-16-1 (Fig. 4-7). In contrast, replacing the helical defects in the secondary site with Watson-Crick base pairs had no effect on Drosha processing efficiency under either IP or whole cell extract conditions. Together with the EMSAs, these results show that Drosha processing is sensitive to the presence of helical defects near the cut site, but that this effect cannot be attributed directly to DGCR8 binding.

4.4.6 RNA Binding by DGCR8’s Heme-binding Domain

Recently, the group of Feng Guo has introduced the hypothesis that a region of DGCR8 N-terminal to the “Core” element, which bears sequence homology to other proteins in the WW domain fold family, is essential for establishing high-affinity RNA binding by DGCR8 and, furthermore, that this activity is heme-binding dependent.37, 45-47 Significantly, this hypothesis has recently been substantiated by the work of Roth et al.,27 who report enhanced dsRNA binding affinity measured by in vitro assays similar to those we utilize here. Therefore, we carried out binding assays using DGCR8-Core lengthened at its N-terminus to contain the heme-binding domain (DGCR8-HBD-Core, residues 276-720). Furthermore, the crystal structure of the heme-binding domain of DGCR8 (276-353) suggests that C352 directly ligates the iron of a heme molecule, which dimerizes DGCR8.48 Therefore, we also performed binding assays with the DGCR8-HBD-Core construct mutated to convert the critical cysteine to alanine (C352A). Through these studies, we aim to evaluate whether the findings we have reported in this study are likely to be general or limited in scope to the context of the DGCR8-Core construct only.

As the binding constants summarized in Table 4-5 show, there is no statistically significant difference in DGCR8-Core and DGCR8-HBD-Core binding affinity for pri-mir-16-1, the pri-mir-16-1-TL mutant, or ds44 (see Supporting Figures S4-16 and S4-17 for representative gels). The pri-mir-16-1-TL result is especially striking, because Quick-Cleveland et al. have recently published the hypothesis that the HBD engages the open terminal loop of pri-miRNA as a necessary feature of high affinity binding.47 It is noteworthy that our results contrast with those of the Guo and Hennig groups, where an enhanced RNA binding affinity is observed in the context of HBD-containing constructs that extend to the C-terminal residue of DGCR8 (276-773), which suggests that the disordered tail region between residues 720 and 773 is a necessary part of this affinity-switching mechanism. Also, we note that heme binding in the wild-type DGCR8-HBD-Core construct was confirmed by UV-visible spectroscopy, as evidenced by the split Soret spectrum indicating a bis(thiolate) hyperporphyrin structure surrounding the iron center,49 which disappeared upon introduction of the C352A mutation (Supporting Fig. S4-18). Therefore, our results demonstrate that heme interaction with DGCR8 is not sufficient to enhance the RNA binding affinity of DGCR8, suggesting that the heme-binding domain does not contribute directly to pri-miRNA binding.

Our binding results for the DGCR8-HBD-Core construct were surprising, because introduction of a C352A mutation into the heme-binding domain has previously been shown to negatively impact pri-miRNA processing by Drosha,45 although outright deletion of the heme-binding domain has had contradictory impacts on pri-miRNA processing in the past.47 Therefore, we tested Drosha processing efficiency in the background of the C352A mutant DGCR8, using pri-mir-16-1 as a substrate for consistency with the remainder of the work reported here. Additionally, we tested processing in the background of two structural mutants previously introduced here: pri-mir-16-1-TL and pri-mir-16-1-HS. Consistent with previous literature, we observe a decrease in processing efficiency for wild-type pri-mir-16-1 when the Microprocessor contains the DGCR8-C352A mutant construct (Fig. 4-9). Significantly, we observe a similar quantitative reduction in processing efficiency for both the pri-mir-16-1-TL and pri-mir-16-1-HS mutants, indicating that whatever functional role for the heme-binding domain is disrupted by inhibiting heme-ligation is independent from the mechanism that establishes the pri-miRNA structural preferences we have documented here.

Table 4-5. EMSA best-fit parameters for DGCR8-HBD-Core (with and without the C352A mutation) binding to pri-mir-16-1, pri-mir-16-1-TL, and ds44, with uncertainties based on two independent replicates.

                   DGCR8-HBD-Core            DGCR8-HBD-Core (C352A)
RNA Construct      Kd (µM)      n            Kd (µM)      n
pri-mir-16-1       2.1 ± 0.1    4.1 ± 0.1    3.1 ± 0.1    3.0 ± 0.1
pri-mir-16-1-TL    2.2 ± 0.1    4.6 ± 0.1    3.1 ± 0.2    4.1 ± 0.2
ds44               4 ± 1        8 ± 1        4.7 ± 0.1    5 ± 1

Figure 4-9. Drosha processing assays show that the cysteine residue C352 is an important determinant of Microprocessor cleavage efficiency in vitro. (A) Denaturing gels for the processing of native pri-mir-16-1 and its mutants: pri-mir-16-1-WT (WT), thermostable tetraloop mutant (TL mut), and “hot spot” mutant (HS mut). In each assay, lanes represent RNA exposed to FLAG beads with addition of cell lysate that did not express FLAG-tagged proteins (Mock), exposed to either the FLAG-tagged immunopurified Microprocessor or to whole cell extract containing overexpressed Microprocessor for both the wild-type Microprocessor (IP Micro and WCE Micro, respectively) and for the Microprocessor containing the C352A mutation in DGCR8 (IP C352A and WCE C352A, respectively). The positions of the pri-miRNA substrate, cleaved pre-miRNA, and flanking tails are indicated to the right of the gel. (B) Percentages of pri-miRNAs cleaved by the Microprocessor in vitro averaged over three independent experiments. Processing efficiencies are graphed for the immunopurified Microprocessor (left axis) and the whole cell extract (right axis), with error bars to one standard deviation (see Materials and Methods for details).

4.5 Discussion

It has been firmly established that DGCR8 and Drosha form the minimal complex required for efficient processing of pri-miRNAs during the first stage of miRNA maturation.50 Interest in fully defining the mechanism of cleavage by these proteins is high, because dysfunctional miRNA processing has been linked to the etiology of multiple diseases.9 It is hypothesized that DGCR8 acts as a “molecular anchor” by forming a pre-cleavage complex with pri-miRNA, promoting Drosha binding and establishing the appropriate cut site within the pri-miRNA.21, 51 Complicating matters, DGCR8 has been observed to interact with a plethora of RNAs in the nucleus in addition to pri-miRNAs,52 suggesting that the rules for RNA selection and identification as pri-miRNA by DGCR8 may be more complex than previously appreciated. Therefore, understanding how the dsRBDs of DGCR8 collectively recognize their targets in the nucleus is a necessary step towards developing a molecular mechanism to describe pri-miRNA processing in the context of the miRNA maturation pathway.

Several dsRNA binding proteins contain multiple copies of the dsRBD fold, suggesting that placing multiple dsRBDs in tandem promotes high efficiency binding and possibly increased selectivity. One possible reason for a protein like DGCR8 to harbor two dsRBDs would be to increase binding affinity over that expected for a single dsRBD; yet our data show this simple model is not sufficient to explain the structure of the protein. While DGCR8-Core does bind each of the RNAs we investigated more strongly than DGCR8-dsRBD1 does, we have previously measured interactions between the isolated dsRBD from Dicer and the same panel of dsRNAs, reporting binding affinities that are essentially indistinguishable from the DGCR8-Core affinities reported here.53 Thus, DGCR8 could have achieved the observed dual-dsRBD affinities with a single dsRBD of different sequence. As suggested by Faller et al.,37 and by the high Hill coefficients from the EMSAs in this study, an alternative explanation is that DGCR8-Core binds dsRNA cooperatively, while the isolated dsRBDs are not able to do so.

The EMSA results in the present study demonstrate that the isolated N-terminal dsRBD of DGCR8 binds inefficiently to short duplexes (< 22 base pairs) or to duplexes possessing significant single-stranded character. In contrast, DGCR8-Core shows marginal changes in Kd,app across the duplex length range tested and is more resilient to the incorporation of ssRNA into the duplex. This suggests that the arrangement of two dsRBDs in DGCR8-Core may have evolved to increase DGCR8’s resilience to the structural diversity commonly encountered across the set of hundreds of pri-miRNAs it must assist in processing. Overall, our binding results are highly consistent with those of Roth et al.,27 while providing broader coverage of RNA structural motifs that strengthens our confidence in the general nature of our collective findings.

The current model for Drosha substrate selection by DGCR8 suggests that identification of pri-miRNAs is achieved, in part, by direct interactions between DGCR8 and the ssRNA-dsRNA junction at the base of the pri-miRNA stem.2, 40 Crystallography of the unbound DGCR8-Core particle, supported by FRET measurements, has also led to the hypothesis that DGCR8 wraps the pri-miRNA stem around its two dsRBDs.25 The observed formation of a stable interface between the two dsRBDs of DGCR8-Core in the unbound state is unique and contrasts with the solution structures of the dual-dsRBD proteins PKR54 and TRBP,55 and yet extensive NMR spectroscopic data support the presence of this interface in solution,27, 56 lending credence to the RNA-wrapping model. However, the proposals that DGCR8-Core interacts with the ssRNA-dsRNA junction and that it wraps the RNA stem around itself in the bound state are not mutually exclusive, as the wrapping model can still place the ssRNA-dsRNA junction in contact with DGCR8 for reasonable stem-lengths (see the model in Figure 4-1, for example). Evidence in support of both elements of this model has been provided by Drosha processing assays, which show reduced efficiency when the ssRNA-dsRNA junction or flexible sites internal to the stem are eliminated.20, 21, 24, 33 The results of the present study confirm that the dsRBD-containing region of DGCR8 is capable of high affinity binding to dsRNA, consistent with it being the minimal region necessary for recruitment of RNA to the Microprocessor; but binding of dsRNA to DGCR8-Core is not enhanced by the addition of single-stranded flanking regions, or by the presence of a hairpin-forming stem-loop. Furthermore, elimination of the two regions of enhanced flexibility within the stem did not inhibit DGCR8-Core binding; nor did elimination of secondary site flexibility inhibit in vitro processing. Therefore, we conclude that interactions mediated by DGCR8-Core alone cannot be responsible for generating the previously observed criteria for efficient cleavage of RNAs recruited to the Microprocessor complex.

Given that the “Core” binding region of DGCR8 is not responsible for recognition of specific structural features within pri-miRNAs, identifying the molecular mechanism of specific target recognition remains a high priority. To this end, we investigated the previously hypothesized role of DGCR8’s heme-binding region47 in target RNA selection. Our data support the conclusion that the heme-binding domain does not contribute to the recognition of specific structural features within pri-miRNA targets. Previous studies have shown an increase in RNA binding affinity for full-length DGCR8 (less the nuclear localization signal) in comparison to the “Core” binding region; however, our data suggest that the heme-binding region is not responsible for this change, at least in the context of the DGCR8-HBD-Core construct. This conclusion is further supported by sufficient in vitro processing of pri-miRNAs using Microprocessor reconstituted with DGCR8 without the heme-binding domain.37, 38, 45, 47

At first glance, the collective findings reported here seem negative: no strong connection is observed between DGCR8-Core binding affinity and the inclusion of any structural features in the dsRNA targets. This result is unexpected, because DGCR8-Core matches well to the minimal fragment of DGCR8 that is necessary to promote in vitro pri-miRNA processing, suggesting that the job of selecting pri-miRNAs from the diverse cellular RNA pool may be dependent on a region of DGCR8 not necessary for efficient in vitro processing. Recent studies suggest that this latter point has merit.47 However, taking the converse view of these data suggests a new hypothesis. We propose that the unique spatial arrangement of two dsRBDs within DGCR8-Core promotes efficient cleavage for a broad set of substrates by disconnecting recruitment efficiency from the vagaries of the unique structural features seen in individual pri-miRNA substrates and, furthermore, that this insensitivity is necessary to support DGCR8’s Drosha-independent functions. It appears likely that DGCR8 binds to nearly all of the dsRNAs it encounters in the nucleus and that identification of some RNAs as pri-miRNAs may be a function of Drosha itself.

4.6 Acknowledgements

This work was supported by US National Institutes of Health grant R01GM098451 to SAS. We thank Philip Bevilacqua and Durga Chadalavada for helpful discussion while developing the experimental protocols. We also thank Nick Lanz and Jamie Arnold for help in molecular biology protocols. Lastly, we would like to thank the Huck Institutes Shared Fermentation Facility for providing cell culture equipment.

4.7 References

1. Zangi, L.; Lui, K. O.; von Gise, A.; Ma, Q.; Ebina, W.; Ptaszek, L. M.; Spater, D.; Xu, H.; Tabebordbar, M.; Gorbatov, R.; Sena, B.; Nahrendorf, M.; Briscoe, D. M.; Li, R. A.; Wagers, A. J.; Rossi, D. J.; Pu, W. T.; Chien, K. R., Modified mRNA directs the fate of heart progenitor cells and induces vascular regeneration after myocardial infarction. Nat Biotechnol 2013, 31 (10), 898-907.
2. Li, Z.; Rana, T. M., Therapeutic targeting of microRNAs: current status and future challenges. Nat Rev Drug Discov 2014.
3. Bora, R. S.; Gupta, D.; Mukkur, T. K.; Saini, K. S., RNA interference therapeutics for cancer: challenges and opportunities (review). Mol Med Rep 2012, 6 (1), 9-15.
4. Castel, S. E.; Martienssen, R. A., RNA interference in the nucleus: roles for small RNAs in transcription, epigenetics and beyond. Nat Rev Genet 2013, 14 (2), 100-12.
5. Ha, M.; Kim, V. N., Regulation of microRNA biogenesis. Nat Rev Mol Cell Biol 2014, 15 (8), 509-24.
6. Ding, Y.; Tang, Y.; Kwok, C. K.; Zhang, Y.; Bevilacqua, P. C.; Assmann, S. M., In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 2014, 505 (7485), 696-700.
7. Rouskin, S.; Zubradt, M.; Washietl, S.; Kellis, M.; Weissman, J. S., Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 2014, 505 (7485), 701-5.
8. Talkish, J.; May, G.; Lin, Y.; Woolford, J. L., Jr.; McManus, C. J., Mod-seq: high-throughput sequencing for chemical probing of RNA structure. RNA 2014, 20 (5), 713-20.
9. van Kouwenhove, M.; Kedde, M.; Agami, R., MicroRNA regulation by RNA-binding proteins and its implications for cancer. Nat Rev Cancer 2011, 11 (9), 644-56.
10. Tian, B.; Bevilacqua, P. C.; Diegelman-Parente, A.; Mathews, M. B., The double-stranded-RNA-binding motif: interference and much more. Nat Rev Mol Cell Biol 2004, 5 (12), 1013-23.
11. Masliah, G.; Barraud, P.; Allain, F. H., RNA recognition by double-stranded RNA binding domains: a matter of shape and sequence. Cell Mol Life Sci 2013, 70 (11), 1875-95.
12. Tian, B.; Mathews, M. B., Phylogenetics and functions of the double-stranded RNA-binding motif: a genomic survey. Prog Nucleic Acid Res Mol Biol 2003, 74, 123-58.
13. Nagai, K., RNA-protein complexes. Curr Opin Struct Biol 1996, 6 (1), 53-61.
14. Ryter, J. M.; Schultz, S. C., Molecular basis of double-stranded RNA-protein interactions: structure of a dsRNA-binding domain complexed with dsRNA. EMBO J 1998, 17 (24), 7505-13.
15. Ramos, A.; Grunert, S.; Adams, J.; Micklem, D. R.; Proctor, M. R.; Freund, S.; Bycroft, M.; St Johnston, D.; Varani, G., RNA recognition by a Staufen double-stranded RNA-binding domain. EMBO J 2000, 19 (5), 997-1009.
16. Blaszczyk, J.; Gan, J.; Tropea, J. E.; Court, D. L.; Waugh, D. S.; Ji, X., Noncatalytic assembly of ribonuclease III with double-stranded RNA. Structure 2004, 12 (3), 457-66.
17. Yang, S. W.; Chen, H. Y.; Yang, J.; Machida, S.; Chua, N. H.; Yuan, Y. A., Structure of Arabidopsis HYPONASTIC LEAVES1 and its molecular implications for miRNA processing. Structure 2010, 18 (5), 594-605.

18. Wang, Z.; Hartman, E.; Roy, K.; Chanfreau, G.; Feigon, J., Structure of a yeast RNase III dsRBD complex with a noncanonical RNA substrate provides new insights into binding specificity of dsRBDs. Structure 2011, 19 (7), 999-1010.
19. Stefl, R.; Oberstrass, F. C.; Hood, J. L.; Jourdan, M.; Zimmermann, M.; Skrisovska, L.; Maris, C.; Peng, L.; Hofr, C.; Emeson, R. B.; Allain, F. H., The solution structure of the ADAR2 dsRBM-RNA complex reveals a sequence-specific readout of the minor groove. Cell 2010, 143 (2), 225-37.
20. Zeng, Y.; Cullen, B. R., Efficient processing of primary microRNA hairpins by drosha requires flanking nonstructured RNA sequences. J Biol Chem 2005, 280 (30), 27595-27603.
21. Han, J. J.; Lee, Y.; Yeom, K. H.; Nam, J. W.; Heo, I.; Rhee, J. K.; Sohn, S. Y.; Cho, Y. J.; Zhang, B. T.; Kim, V. N., Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 2006, 125 (5), 887-901.
22. Warf, M. B.; Johnson, W. E.; Bass, B. L., Improved annotation of C. elegans microRNAs by deep sequencing reveals structures associated with processing by Drosha and Dicer. RNA 2011, 17 (4), 563-77.
23. Sperber, H.; Beem, A.; Shannon, S.; Jones, R.; Banik, P.; Chen, Y.; Ku, S.; Varani, G.; Yao, S.; Ruohola-Baker, H., miRNA sensitivity to Drosha levels correlates with pre-miRNA secondary structure. RNA 2014, 20 (5), 621-31.
24. Quarles, K. A.; Sahu, D.; Havens, M. A.; Forsyth, E. R.; Wostenberg, C.; Hastings, M. L.; Showalter, S. A., Ensemble analysis of primary microRNA structure reveals an extensive capacity to deform near the drosha cleavage site. Biochemistry 2013, 52 (5), 795-807.
25. Sohn, S. Y.; Bae, W. J.; Kim, J. J.; Yeom, K. H.; Kim, V. N.; Cho, Y., Crystal structure of human DGCR8 core. Nat Struct Mol Biol 2007, 14 (9), 847-853.
26. Wostenberg, C.; Noid, W. G.; Showalter, S. A., MD simulations of the dsRBP DGCR8 reveal correlated motions that may aid pri-miRNA binding. Biophys J 2010, 99 (1), 248-56.
27. Roth, B. M.; Ishimaru, D.; Hennig, M., The core microprocessor component DiGeorge syndrome critical region 8 (DGCR8) is a nonspecific RNA-binding protein. J Biol Chem 2013, 288 (37), 26785-99.
28. Starega-Roslan, J.; Krol, J.; Koscianska, E.; Kozlowski, P.; Szlachcic, W. J.; Sobczak, K.; Krzyzosiak, W. J., Structural basis of microRNA length variety. Nucleic Acids Res 2011, 39 (1), 257-268.
29. Wostenberg, C.; Quarles, K. A.; Showalter, S. A., Dynamic origins of differential RNA binding function in two dsRBDs from the miRNA "microprocessor" complex. Biochemistry 2010, 49 (50), 10728-36.
30. Havens, M. A.; Reich, A. A.; Duelli, D. M.; Hastings, M. L., Biogenesis of mammalian microRNAs by a non-canonical processing pathway. Nucleic Acids Res 2012, 40 (10), 4626-4640.
31. Burke, J. M.; Kelenis, D. P.; Kincaid, R. P.; Sullivan, C. S., A central role for the primary microRNA stem in guiding the position and efficiency of Drosha processing of a viral pri-miRNA. RNA 2014, 20 (7), 1068-77.
32. Zhang, X. X.; Zeng, Y., The terminal loop region controls microRNA processing by Drosha and Dicer. Nucleic Acids Res 2010, 38 (21), 7689-7697.
33. Zeng, Y.; Cullen, B. R., Sequence Requirements for Micro RNA Processing and Function in Human Cells. RNA 2003, 9, 112-123.

34. Fu, Q.; Yuan, Y. A., Structural insights into RISC assembly facilitated by dsRNA- binding domains of human RNA helicase A (DHX9). Nucleic Acids Res 2013, 41 (5), 3457-70. 35. Bevilacqua, P. C.; Cech, T. R., Minor-groove recognition of double-stranded RNA by the double-stranded RNA-binding domain from the RNA-activated protein kinase PKR. Biochemistry 1996, 35 (31), 9983-94. 36. Parker, G. S.; Maity, T. S.; Bass, B. L., dsRNA binding properties of RDE-4 and TRBP reflect their distinct roles in RNAi. J Mol Biol 2008, 384 (4), 967-979. 37. Faller, M.; Toso, D.; Matsunaga, M.; Atanasov, I.; Senturia, R.; Chen, Y.; Zhou, Z. H.; Guo, F., DGCR8 recognizes primary transcripts of microRNAs through highly cooperative binding and formation of higher-order structures. RNA 2010, 16 (8), 1570-83. 38. Yeom, K. H.; Lee, Y.; Han, J. J.; Suh, M. R.; Kim, V. N., Characterization of DGCR8/Pasha, the essential cofactor for Drosha in primary miRNA processing. Nucleic Acids Res 2006, 34 (16), 4622-4629. 39. Ameres, S. L.; Zamore, P. D., Diversifying microRNA sequence and function. Nat Rev Mol Cell Biol 2013, 14 (8), 475-88. 40. Rottiers, V.; Naar, A. M., MicroRNAs in metabolism and metabolic disorders. Nat Rev Mol Cell Biol 2012, 13 (4), 239-50. 41. Diaz, J. P.; Chirayil, R.; Chirayil, S.; Tom, M.; Head, K. J.; Luebke, K. J., Association of a peptoid ligand with the apical loop of pri-miR-21 inhibits cleavage by Drosha. RNA 2014, 20 (4), 528-39. 42. Zeng, Y.; Yi, R.; Cullen, B. R., Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J 2005, 24 (1), 138-148. 43. Bevilacqua, P. C.; George, C. X.; Samuel, C. E.; Cech, T. R., Binding of the protein kinase PKR to RNAs with secondary structure defects: role of the tandem A-G mismatch and noncontiguous helixes. Biochemistry 1998, 37 (18), 6303-16. 44. Acevedo, R.; Orench-Rivera, N.; Quarles, K. A.; Showalter, S. 
A., Helical Defects in MicroRNA Influence Protein Binding by TAR RNA Binding Protein. Plos One 2015, 10 (1), e0116749. 45. Faller, M.; Matsunaga, M.; Yin, S.; Loo, J. A.; Guo, F., Heme is involved in microRNA processing. Nat Struct Mol Biol 2007, 14 (1), 23-9. 46. Barr, I.; Smith, A. T.; Senturia, R.; Chen, Y.; Scheidemantle, B. D.; Burstyn, J. N.; Guo, F., DiGeorge critical region 8 (DGCR8) is a double-cysteine-ligated heme protein. J Biol Chem 2011, 286 (19), 16716-25. 47. Quick-Cleveland, J.; Jacob, J. P.; Weitz, S. H.; Shoffner, G.; Senturia, R.; Guo, F., The DGCR8 RNA-binding heme domain recognizes primary microRNAs by clamping the hairpin. Cell Rep 2014, 7 (6), 1994-2005. 48. Senturia, R.; Faller, M.; Yin, S.; Loo, J. A.; Cascio, D.; Sawaya, M. R.; Hwang, D.; Clubb, R. T.; Guo, F., Structure of the dimerization domain of DiGeorge critical region 8. Protein Sci 2010, 19 (7), 1354-65. 49. Dawson, J. H.; Sono, M., Cytochrome P-450 and chloroperoxidase: thiolate-ligated heme enzymes. Spectroscopic determination of their active-site structures and mechanistic implications of thiolate ligation. Chem Rev 1987, 87, 1255-1276. 50. Gregory, R. I.; Yan, K. P.; Amuthan, G.; Chendrimada, T.; Doratotaj, B.; Cooch, N.; Shiekhattar, R., The Microprocessor complex mediates the genesis of microRNAs. Nature 2004, 432 (7014), 235-40. 51. Nicholson, A. W., Ribonuclease III mechanisms of double-stranded RNA cleavage. Wiley Interdiscip Rev RNA 2014, 5 (1), 31-48. 149

52. Macias, S.; Plass, M.; Stajuda, A.; Michlewski, G.; Eyras, E.; Caceres, J. F., DGCR8 HITS-CLIP reveals novel functions for the Microprocessor. Nat Struct Mol Biol 2012, 19 (8), 760-766. 53. Wostenberg, C.; Lary, J. W.; Sahu, D.; Acevedo, R.; Quarles, K. A.; Cole, J. L.; Showalter, S. A., The role of human Dicer-dsRBD in processing small regulatory RNAs. PLoS One 2012, 7 (12), e51829. 54. Ucci, J. W.; Kobayashi, Y.; Choi, G.; Alexandrescu, A. T.; Cole, J. L., Mechanism of interaction of the double-stranded RNA (dsRNA) binding domain of Protein Kinase R with short dsRNA sequences. Biochemistry 2007, 46, 55-65. 55. Benoit, M. P.; Plevin, M. J., Backbone resonance assignments of the micro-RNA precursor binding region of human TRBP. Biomol NMR Assign 2013, 7 (2), 229-33. 56. Roth, B. M.; Hennig, M., Backbone H-1(N), C-13, and N-15 resonance assignments of the tandem RNA-binding domains of human DGCR8. Biomol NMR Assign 2013, 7 (2), 183-186.


4.8 Supporting Information

4.8.1 Supporting Methods

Electrophoretic Mobility Shift Assays. Prior to mixing with protein, the RNA was denatured at 95°C for 1 min and renatured at 1°C for 1 min (only in the case of the pri-miRNAs). The binding reactions were incubated at room temperature for 30 minutes to ensure full equilibration in the presence of 50 mM cacodylate, pH 6.0, 50 mM KCl, 5% glycerol, 100 μg/mL bovine serum albumin, 1 mM dithiothreitol, and 0.1 mg/mL herring sperm DNA (to prevent the complex from sticking in the wells). Subsequently, the binding reactions were run on a 0.25X TBE, 10% acrylamide gel at 12 V cm⁻¹ and 4°C for 3 h, with each lane containing 20 μCi. Signal from the gels was quantified on a Typhoon-9410 imager, and the resulting images were processed in ImageQuant. Briefly, boxes were drawn around the free and bound RNA in each lane, and the signal was integrated to determine the fraction of bound RNA. The fraction bound was calculated as the intensity of all protein-bound species divided by the sum of the protein-bound species and the free RNA. The data points reported in the titration curves represent the average fraction bound from two gels, with error bars representing the uncertainty in the mean to two standard deviations. The resulting fraction-bound curves were fit to the general Hill equation binding model shown below, with data fitting performed using the Levenberg-Marquardt algorithm as implemented in Matlab (MathWorks). In the equation below, F is the fraction bound, Fmax and Fmin are the upper and lower baselines, x is the concentration of protein, n is the Hill coefficient, and Kd,app is the apparent dissociation constant.
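A standard form of the general Hill equation that is consistent with the symbols defined above is given here as a reconstruction. Raising Kd,app to the power n, so that Kd,app is the protein concentration at half-saturation, is an assumed convention:

```latex
\[
F = F_{\min} + \left(F_{\max} - F_{\min}\right)\frac{x^{n}}{K_{d,\mathrm{app}}^{\,n} + x^{n}}
\]
```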

Filter Binding Assays. The binding reactions were prepared under conditions similar to those used for the EMSAs, and were incubated at room temperature for 30 minutes to ensure full equilibration in the presence of 50 mM cacodylate, pH 6.0, 50 mM KCl, 5% glycerol, 100 μg/mL bovine serum albumin, 1 mM dithiothreitol, and 0.1 mg/mL herring sperm DNA. Subsequently, the binding reactions were run on a filter binding apparatus containing both nylon and nitrocellulose membranes, pre-soaked in 50 mM cacodylate, pH 6.0 and 50 mM KCl. The membranes were washed twice with 50 mM cacodylate, pH 6.0 and 50 mM KCl and then dried. Signal from the membranes was quantified on a Typhoon-9410 imager, and the resulting images were processed in ImageQuant. Briefly, a box was drawn around each well on both the free and bound RNA membranes, and the signal was integrated to determine the fraction of bound RNA. The data points reported in the titration curves represent the average fraction bound from two filter binding assays, with error bars representing the uncertainty in the mean to two standard deviations. The resulting fraction-bound curves were fit to the general Hill equation binding model, with data fitting performed using the Levenberg-Marquardt algorithm as implemented in Matlab.
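The quantification and fitting steps described in these Supporting Methods can be sketched in Python as follows. This is an illustrative sketch only, not the Matlab routine used in this work. The function names are hypothetical, the baselines are fixed at 0 and 1 for brevity, the error bar is assumed to mean twice the standard error of the mean, and the Levenberg-Marquardt loop is a minimal hand-written version with a numerical Jacobian. The example titration is synthetic, noise-free data.

```python
import math
import statistics


def hill(x, kd, n, fmin=0.0, fmax=1.0):
    """General Hill model; algebraically x**n / (kd**n + x**n), for x, kd > 0."""
    t = n * math.log(kd / x)
    t = max(min(t, 700.0), -700.0)  # keep exp() within floating-point range
    return fmin + (fmax - fmin) / (1.0 + math.exp(t))


def fraction_bound(bound_signals, free_signal):
    """Signal of all protein-bound species over bound plus free signal."""
    bound = sum(bound_signals)
    return bound / (bound + free_signal)


def mean_and_error(replicates):
    """Mean of replicates and twice the standard error (assumed error-bar definition)."""
    mean = statistics.mean(replicates)
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return mean, 2.0 * sem


def fit_hill(xs, ys, kd0, n0, iterations=200):
    """Minimal Levenberg-Marquardt fit of Kd,app and n (baselines fixed at 0 and 1)."""
    p = [math.log(kd0), n0]  # fit log(Kd) so that Kd stays positive

    def residuals(q):
        kd = math.exp(max(min(q[0], 50.0), -50.0))
        return [y - hill(x, kd, q[1]) for x, y in zip(xs, ys)]

    r = residuals(p)
    cost = sum(v * v for v in r)
    lam, h = 1e-3, 1e-6
    for _ in range(iterations):
        # Forward-difference Jacobian of the residuals, one column per parameter
        cols = []
        for j in range(2):
            q = list(p)
            q[j] += h
            cols.append([(a - b) / h for a, b in zip(residuals(q), r)])
        a11 = sum(c * c for c in cols[0])
        a22 = sum(c * c for c in cols[1])
        a12 = sum(c * d for c, d in zip(cols[0], cols[1]))
        g1 = sum(c * v for c, v in zip(cols[0], r))
        g2 = sum(c * v for c, v in zip(cols[1], r))
        # Damped normal equations: (J^T J + lam * diag(J^T J)) d = -J^T r
        m11, m22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = m11 * m22 - a12 * a12
        if det == 0.0:
            break
        d1 = (-g1 * m22 + g2 * a12) / det
        d2 = (-g2 * m11 + g1 * a12) / det
        if abs(d1) + abs(d2) < 1e-12:  # step is negligible: treat as converged
            break
        trial = [p[0] + d1, p[1] + d2]
        r_trial = residuals(trial)
        cost_trial = sum(v * v for v in r_trial)
        if cost_trial < cost:  # accept the step and relax the damping
            p, r, cost = trial, r_trial, cost_trial
            lam = max(lam / 10.0, 1e-12)
        else:  # reject the step and increase the damping
            lam *= 10.0
    return math.exp(p[0]), p[1]


# Hypothetical noise-free titration (protein concentrations in µM)
xs = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
ys = [hill(x, kd=2.0, n=1.5) for x in xs]
kd_fit, n_fit = fit_hill(xs, ys, kd0=1.0, n0=1.0)
```

Fitting log(Kd,app) rather than Kd,app keeps the dissociation constant positive without explicit bounds. Real data would also require fitting Fmin and Fmax.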

4.8.2 Supporting Figures

Figure S4-1. EMSA data showing that Drosha’s dsRBD does not bind pri-mir-16-1. A representative gel is shown, with the concentration of protein increasing from left to right. The leftmost lane is just RNA with no protein (RNA).

Figure S4-2. EMSA data for DGCR8-Core binding various native pri-miRNAs and pre-mir-16-1. A representative gel is shown for each, with the concentration of protein increasing from left to right. The leftmost lane is just RNA with no protein (RNA). The data points (dots) reported in the titration curves on the right represent the average fraction bound from two independent titrations, with error bars representing the uncertainty in the mean to two standard deviations. The resulting fraction-bound curves were fit to the general Hill equation binding model, with the best-fit line produced from the determined Kd and n values shown, also reported in Table 4-1. This description applies to all EMSA figures that follow.

Figure S4-3. EMSA data for DGCR8-dsRBD1 binding various native pri-miRNAs. The determined Kd and n values are also reported in Table 4-1.


Figure S4-4. Filter binding assays give weaker binding affinities than EMSAs for DGCR8-Core binding. Representative filters for free and bound RNA are shown for each, with the concentration of protein increasing from left to right and top to bottom. For the protocol, see Filter Binding Assays in Supporting Methods. The data points (dots) reported in the titration curves on the right represent the average fraction bound from two independent titrations, with error bars representing the uncertainty in the mean to two standard deviations. The resulting fraction-bound curves were fit to the general Hill equation binding model, with the best-fit line produced from the determined Kd and n values shown. It appears that the intermediate complexes seen in the EMSAs are not detected in filter binding; therefore, EMSAs rather than filter binding assays were chosen to examine DGCR8 binding.

Figure S4-5. EMSA data for DGCR8-Core binding perfect duplex RNA of varying lengths. The determined Kd and n values are also reported in Table 4-2.


Figure S4-6. EMSA data for DGCR8-dsRBD1 binding perfect duplex RNA of varying lengths. The determined Kd and n values are also reported in Table 4-2.

Figure S4-7. Competition processing data for the three shorter perfect RNA duplexes. A representative gel is shown for each, with the concentration of competitor duplex increasing from left to right. The leftmost lanes are pri-mir-16-1 processed in the presence of no transfected Microprocessor (Mock) and processed in the presence of Microprocessor but no competitor duplex (-Comp). To the right are the fits to a single exponential decay model produced from the determined IC50 value shown, also reported in Table 4-3. The data points (dots) represent the average percentage of pri-miRNA processed from two independent titrations, with error bars representing the uncertainty in the mean to two standard deviations. For both ds12 and ds16 as competitor duplexes, the data could not be fit with a single exponential decay model; however, IC50 values could be estimated from the data.

Figure S4-8. Competition processing data for the longer perfect RNA duplexes. A representative gel is shown for each, with the concentration of competitor duplex increasing from left to right. The leftmost lanes are pri-mir-16-1 processed in the presence of no transfected Microprocessor (Mock) and processed in the presence of Microprocessor but no competitor duplex (-Comp). To the right are the fits to a single exponential decay model produced from the determined IC50 value shown, also reported in Table 4-3. The data points (dots) represent the average percentage of pri-miRNA processed from two independent titrations, with error bars representing the uncertainty in the mean to two standard deviations. A perfect 33-bp duplex with a sequence different from that of ds33 (ds33-alt) was also used to show that sequence does not affect the competition processing results.

Figure S4-9. EMSA data for DGCR8-Core binding various flanking duplexes. The determined Kd and n values are also reported in Table 4-2.

Figure S4-10. EMSA data for both DGCR8-Core and DGCR8-dsRBD1 binding ssRNA (16TS). The determined Kd and n values are also reported in Table 4-2. No binding was detected for DGCR8-dsRBD1.

Figure S4-11. EMSA data for DGCR8-dsRBD1 binding various flanking duplexes. The determined Kd and n values are also reported in Table 4-2.


Figure S4-12. EMSA data for DGCR8-Core binding ds16 capped with varying terminal loops. The determined Kd and n values are also reported in Table 4-2.

Figure S4-13. EMSA data for DGCR8-dsRBD1 binding ds16 capped with varying terminal loops. The determined Kd and n values are also reported in Table 4-2.


Figure S4-14. EMSA data for DGCR8-Core binding pri-mir-16-1 mutants. The determined Kd and n values are also reported in Table 4-4.


Figure S4-15. EMSA data for DGCR8-Core binding ds22 and similar length duplexes harboring imperfections found natively in the hot spot and secondary imperfections of pri-mir-16-1. The determined Kd and n values are also reported in Table 4-2.

Figure S4-16. EMSA data for DGCR8-HBD-Core binding pri-mir-16-1 with either the large native terminal loop (WT) or the small thermostable tetraloop (TL) and ds44. The determined Kd and n values are also reported in Table 4-5.

Figure S4-17. EMSA data for DGCR8-HBD-Core (C352A) binding pri-mir-16-1 with either the large native terminal loop (WT) or the small thermostable tetraloop (TL) and ds44. The determined Kd and n values are also reported in Table 4-5.


Figure S4-18. UV-visible absorption spectrum of DGCR8-HBD-Core with and without the C352A mutation and the flow-through from the nickel resin containing remaining E. coli proteins. The wild-type DGCR8-HBD-Core protein exhibits a split Soret spectrum (peaks at 367 nm and 450 nm), which is characteristic of bis(thiolate)- or thiolate/phosphine-coordinated ferric systems, whereas the C352A-mutated protein does not exhibit any Soret peaks. The flow-through from the nickel column for the wild-type protein is also shown as a negative control to confirm that the yellowish color of the wild-type protein is not a remnant of the flow-through, which was also yellow.

Chapter 5

Structural Characterization of Dicer’s dsRBD Complexed with dsRNA

5.1 Abstract

dsRBDs from various proteins bind RNA with a range of affinities, perhaps due to small differences in their folded domains arising from their diverse sequences. We have previously published Dicer-dsRBD affinities for model miRNA and siRNA precursors, suggesting that the functional role of Dicer’s dsRBD is not to discriminate between these precursors, but to bind across the stretches of dsRNA located within them. Although there are multiple solved structures of dsRBDs bound to RNA in the literature, only one of them is of a dsRBD from the miRNA maturation pathway, that of TRBP. Therefore, to further our understanding of the involvement of dsRBDs in miRNA maturation, we must increase the number of solved structures of these dsRBDs bound to RNA. This chapter encompasses ongoing work aimed at characterizing the structure of Dicer’s dsRBD bound to dsRNA using NMR and X-ray crystallography.

5.2 Introduction

Cytoplasmic processing in RNAi involves processing of both microRNA (miRNA) and small interfering RNA (siRNA) to produce short single-stranded RNAs capable of base pairing with mRNA to down-regulate translation. In this processing step, Dicer is required to cleave precursors of both of these classes of RNAs, even though their structures are quite distinct from one another. MicroRNA precursors are imperfect hairpins of approximately 35 base pairs, while siRNA precursors are long, perfect duplexes longer than 100 base pairs. Therefore, it has been suggested that the functional role of Dicer’s dsRBD is not to discriminate between these precursors, but to bind across the stretches of dsRNA within them, as reported in a recent publication from our lab.1

Extensive work by the Showalter Lab has gone into studying binding by the dsRBDs in the miRNA maturation pathway, including dsRBDs from DGCR8, TRBP, Drosha, and Dicer. As discussed in Chapter 4, DGCR8’s N-terminal dsRBD (DGCR8-dsRBD1) binds dsRNA with a dissociation constant of 6 µM when examining dsRNA lengths comparable to the pri-miRNA substrate. However, TRBP’s N-terminal dsRBD binds almost 10-fold tighter, at approximately 0.8 µM, to the same length of dsRNA.2 As for the other isolated dsRBDs in the miRNA maturation pathway, Dicer’s dsRBD exhibits a dissociation constant of approximately 2.5 µM when bound to the same dsRNA,1 whereas Drosha’s dsRBD does not bind dsRNA of any length.3 As can be seen from these dissociation constants, dsRBDs bind to RNA with a variety of affinities, perhaps due to small differences in their folded domains brought about by their diverse sequences (see Fig. 1-8 for dsRBD sequence alignment).

Although the majority of the dsRBDs involved in the miRNA maturation pathway have solved structures of their apo state, only one of these dsRBDs currently has a solved structure bound to RNA: TRBP-dsRBD2 bound to dsRNA.4 Therefore, to further our understanding of the involvement of dsRBDs in miRNA maturation, we must increase the number of solved structures of these dsRBDs bound to RNA. This chapter encompasses ongoing work to examine the bound structure of Dicer’s dsRBD with dsRNA. The aim is to characterize the bound structure at atomic resolution using X-ray crystallography and NMR. With crystallography, one goal is to determine whether the dsRBD induces a conformational change in the dsRNA upon binding, as has been suggested by previously determined crystal structures of other dsRBDs bound to dsRNA.5, 6 Furthermore, NMR studies will complement the bound crystal structure by measuring the conformational changes induced in the dsRBD upon binding.

5.3 Materials and Methods

5.3.1 Protein Preparation

A synthetic Dicer gene was purchased from Geneart, and Dicer-dsRBD (1850-1922) was amplified by PCR and inserted into pET47b (Novagen). Expression and purification of the protein were performed as previously described,3 all at 4°C. The protein was buffer exchanged into 50 mM Tris-HCl, pH 7.5, 100 mM KCl. The final concentration of the sample was determined via FTIR using MW = 8,300 Da. The concentration of 13C,15N-Dicer-dsRBD was determined by UV absorption following guanidine hydrochloride denaturation, using ε = 2,800 M⁻¹ cm⁻¹ at 278 nm.
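The UV-based concentration measurement follows the Beer-Lambert law, which can be sketched as below. The extinction coefficient is the value quoted above. The absorbance reading, 1 cm path length, and 10-fold dilution are hypothetical numbers chosen only for illustration, and the function name is not taken from any particular package.

```python
def concentration_molar(absorbance, epsilon=2800.0, path_cm=1.0, dilution=1.0):
    """Beer-Lambert law, c = A / (epsilon * l), scaled back by the dilution factor."""
    return absorbance / (epsilon * path_cm) * dilution


# Hypothetical reading: A278 = 0.14 for a sample diluted 10-fold into denaturant
stock = concentration_molar(0.14, dilution=10.0)
print(f"stock concentration: {stock * 1e6:.0f} uM")
```

With such a small extinction coefficient, even a sample of several hundred micromolar gives a modest absorbance after dilution, so the accuracy of the dilution factor matters.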

5.3.2 RNA Preparation

Single-stranded RNA oligomers (designed as self-complementary, alternating GC sequences of 10, 12, 16, and 22 nt) were ordered from Dharmacon. These RNAs were deprotected with acetic acid following Dharmacon’s protocol and resuspended in water to 1 mM. Because the oligomers are self-complementary, the duplex was formed by simply heating an aliquot of the ssRNA to 85°C for 1 minute and renaturing at 4°C for 3 minutes.

5.3.3 Analytical Ultracentrifugation

Sedimentation velocity experiments were performed using methods similar to those previously described.7 Briefly, varying amounts of Dicer-dsRBD and 2 µM ds22 were mixed and buffer-exchanged into 50 mM Tris-HCl, pH 7.5, and 50 mM KCl using spin columns. The samples were loaded into two-channel aluminum-epon double-sector cells equipped with quartz windows. Data were collected using absorbance optics at 260 nm (280 nm for apo-protein) in a Beckman Coulter XL-I analytical ultracentrifuge. The experiments were performed at 20°C using a rotor speed of 50,000 rpm. The data were analyzed using DCDT+.8

5.3.4 Protein:RNA Complex Formation

For NMR and crystallography, complexes were formed by mixing renatured RNA duplex with the desired molar amount of Dicer-dsRBD and equilibrating on ice for at least 10 minutes to allow full binding. The complex was then concentrated in a small 400 µL, 3K spin column to the desired final volume, without buffer exchange because the RNA duplex has a small tendency to elute through the membrane. As a result, the final buffer concentration is ~35 mM Tris-HCl, pH 7.5, 70 mM KCl. The concentrated complex was then allowed to sit overnight at 4°C before any experiment was performed with it.
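The final buffer concentration quoted above is consistent with simple dilution of the protein buffer (50 mM Tris-HCl, 100 mM KCl) by RNA stock resuspended in water. A minimal sketch of that arithmetic follows. The 70:30 volume ratio is a hypothetical mixing ratio chosen to reproduce the quoted values, not a reported one, and concentrating in a spin column is assumed to leave small-solute concentrations unchanged.

```python
def diluted(stock_mM, stock_volume, added_volume):
    """Solute concentration after mixing a buffered stock with an unbuffered volume."""
    return stock_mM * stock_volume / (stock_volume + added_volume)


# Hypothetical mix: 70 µL protein in buffer plus 30 µL RNA duplex in water
tris = diluted(50.0, 70.0, 30.0)
kcl = diluted(100.0, 70.0, 30.0)
print(f"Tris-HCl: {tris:.0f} mM, KCl: {kcl:.0f} mM")
```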

5.3.5 NMR Methods

The complex of dsRNA and 15N-Dicer-dsRBD was monitored using standard 15N-HSQC experiments carried out on a Bruker Avance III 600 MHz NMR spectrometer equipped with a TCI Cryoprobe, with the sample temperature maintained at either 298K or 308K. Conditions for apo-Dicer-dsRBD were 300 µM protein in ~40 mM Tris-HCl, pH 7.5, 80 mM KCl, 10% D2O in 400 µL in a 4 mm tube. Conditions for the complexes were 350 µM dsRNA with 200 µM Dicer-dsRBD in ~35 mM Tris-HCl, pH 7.5, 70 mM KCl, 10% D2O in 250 µL in a Shigemi tube. For the relaxation NOE study, the experiment was carried out at 308K and measurements were taken 3 times at 600 MHz in order to increase signal-to-noise. The NOE sample contained 850 µM ds16 and 485 µM Dicer-dsRBD in ~35 mM Tris-HCl, pH 7.5, 70 mM KCl, 10% D2O in 400 µL in a 4 mm tube.

Backbone resonance assignment of 13C,15N-Dicer-dsRBD in the complex was attempted using standard triple-resonance NMR techniques9, 10 on a Bruker Avance III 600 MHz spectrometer. Conditions for the complex were 612 µM dsRNA with 350 µM Dicer-dsRBD in ~35 mM Tris-HCl, pH 7.5, 70 mM KCl, 10% D2O in 250 µL in a Shigemi tube, and data were collected at 308K. However, the spectra were not of high enough quality to analyze due to the intermediate exchange dynamics of the complex.

5.3.6 X-Ray Crystallography

Multiple ratios of Dicer-dsRBD:dsRNA (2:1, 1:1, and 1:2 at 900µM:450µM, 700µM:700µM, and 450µM:900µM Dicer-dsRBD:dsRNA, respectively) were used to set up sitting drop crystal trays. In the initial 96-well attempts, the suite’s conditions (Hampton’s Natrix or Qiagen’s Classics) were added to every reservoir (50 µL) and every well (0.5 µL). The complex sample was then added to every well (0.5 µL). The tray was then kept at 20°C for multiple days and monitored for crystal production. The Natrix condition F2 produced the best crystals and was used for optimization. The condition contained: 40 mM sodium cacodylate, pH 6.0, 80 mM NaCl, 12 mM KCl, 20 mM MgCl2, 30% v/v (+/–)-2-methyl-2,4-pentanediol (MPD), and 12 mM spermine tetrahydrochloride. This condition was also optimized in the presence of BaCl2 rather than MgCl2, because large crystals were also seen using this divalent cation. Of note, the 1:1 Dicer-dsRBD:dsRNA ratio rarely produced sizeable crystals under any condition and was not tested further.

The optimal condition from the Natrix suite was used for the 96-well Additives screen by adding 50 µL to each reservoir and adding 5 µL of each additive to its respective reservoir (final ~10% additive for each condition). A 0.5 µL aliquot of the resulting mixture was added to each well with 0.5 µL of the complex sample. Again, the tray was kept at 20°C for multiple days and monitored for crystal production.

Lastly, the optimal conditions produced from the Additives screen were optimized in a 24-well format. The following variables were tested: Mg2+ versus Ba2+, 1:2 versus 2:1 Dicer-dsRBD:dsRNA ratios, and hanging versus sitting drops. In addition, the formation of the drops was varied for the hanging drop tray by either adding the buffer condition directly to the complex sample drop or drawing a bridge between the buffer and sample drops using a pipette tip. All drops were 2 µL total, consisting of 1 µL buffer condition and 1 µL complex sample.

5.3.7 Crystal Looping

When sufficiently sized crystals were produced (> 50 µm), they were looped with nylon loops of various sizes (0.05 – 0.1 mm or 0.1 – 0.2 mm). Once the crystal was contained in the loop, the loop was dipped in a cryoprotectant solution containing the respective buffer condition (without protein or RNA) plus 20% glycerol. This step washes excess protein or RNA from the surface of the crystal, while the glycerol protects the crystal during freezing. The loop was then put directly into a vial submerged in liquid nitrogen for storage.

5.4 Results and Discussion

Most dsRBDs studied in our lab show length-dependent binding to dsRNAs when examining RNA lengths representative of those encountered in the miRNA pathway. In particular, Dicer’s dsRBD shows a dsRNA length-dependence with moderate affinity compared to dsRBDs from the other proteins in the pathway. Using EMSAs, we have shown that Dicer’s dsRBD binds with a dissociation constant ranging from 2 µM to 16 µM as the dsRNA length decreases from 44 bps to 12 bps.1 Because Dicer’s dsRBD adequately binds dsRNA of shorter lengths, I sought to characterize the structure of Dicer’s dsRBD bound to dsRNA at atomic resolution using NMR and X-ray crystallography, while using AUC to determine the binding stoichiometries. Crystal structures in the PDB of dsRBDs bound to dsRNA typically utilize a self-complementary RNA of 10 bps with alternating GC bases.5, 6 Of interest, these structures routinely show the dsRBD bound across the junction formed by two RNA helices within a pseudo-infinite lattice in the packed crystal state. It is unknown whether the dsRBD prefers to bind at this somewhat deformable site or whether this is an induced effect of the crystal packing. In addition, the binding footprint for a dsRBD is approximately 16 bps on the RNA,4, 11 leading to the possible requirement that the dsRBD must bind across the junction in this fashion. Furthermore, molecular dynamics simulations of TRBP-dsRBD2 binding suggest that the dsRBD induces a bend in the dsRNA (35 bps), resulting in a widened minor groove opposite the bound dsRBD.12 Therefore, I attempted both crystallography and NMR with a variety of dsRNA lengths from 10 bps up to 22 bps (ds10, ds12, ds16, and ds22) to determine if this junction-binding phenomenon is seen in Dicer-dsRBD binding due to induced conformational changes.

5.4.1 Analytical Ultracentrifugation

When our original study of Dicer binding was performed, an analytical ultracentrifuge was not available for use; therefore, AUC was performed with Dicer-dsRBD binding to ds16 by Jim Cole at the University of Connecticut. The AUC data showed that the Dicer-dsRBD saturates the ds16 lattice with a 1:1 stoichiometry.1 Since then, an ultracentrifuge has become available for use on Penn State’s campus and we have begun performing our own experiments.

Assuming the stoichiometry for ds10 and ds12 would be similar to that for ds16, AUC experiments have only been done for Dicer’s dsRBD binding to ds22. Only preliminary experiments have been performed to date; therefore, the saturating stoichiometry has not yet been determined. However, Figure 5-1 shows the possibility of at least 2 dsRBDs binding to a single ds22 molecule at sub-saturating conditions. This result suggests that binding of a dsRBD along the center of the longer RNA lattice may be possible, rather than at the junction as in the previously determined crystal structures.

Figure 5-1. AUC analysis suggests that 2 Dicer-dsRBDs are capable of binding to a single ds22 molecule. The sum curve (teal) is deconvoluted to yield multiple species (blue and green). The ds22 RNA has a sedimentation coefficient of 3.5 S (blue) and that of Dicer-dsRBD is 1.0 S (run separately). These preliminary results at this sub-saturating concentration (4.3 µM Dicer-dsRBD: 2.0 µM ds22) suggest the formation of a small fraction of complex with a 2:1 mole ratio of Dicer-dsRBD:ds22, as indicated by the ~2 S shift in peak position. The residuals are plotted below in red. Binding was performed at 20°C (293K) and samples were sedimented at 50,000 rpm. Experiments were performed by Joshua Kranick.

5.4.2 Nuclear Magnetic Resonance

We have previously published NMR data in which a longer dsRNA (ds33) was titrated into Dicer-dsRBD to monitor the dynamics of the amino acids involved in RNA binding.1 As can be seen in Figure 5-2, the loss of Dicer resonances due to intermediate exchange precludes residue-specific chemical shift assignment for all Dicer residues in the Dicer:ds33 complex. Intermediate exchange dynamics are expected for a system binding in the low micromolar range, and are likely to be exacerbated by the non-sequence-specific binding nature of dsRBDs.13 This is evidenced by the disappearance of multiple peaks as a molar excess of ds33 is added to the protein (compare Fig. 5-2A versus Fig. 5-2C, rightmost panel). In support of widespread interaction within the dsRBD, Figure 5-2B shows that the majority of the amino acids in the dsRBD participate in binding the dsRNA via either direct interaction or correlated motions within the domain to accommodate binding.


Figure 5-2. NMR titration of 15N-Dicer-dsRBD with ds33. (A) Representative 15N-HSQC spectra of Dicer-dsRBD collected in the unbound state. (B) Ratio of individual peak intensities in the presence of 0.02:1 mole ratio ds33:Dicer-dsRBD versus the apo-state spectrum displayed in (A). (C) Representative spectra for the ds33 titration of mole ratios of 0.02:1, 0.20:1, and 2.0:1 ds33:Dicer-dsRBD as labeled. All spectra were collected at 298K in the presence of 50 mM Tris, pH 7.5, 100 mM KCl, 10% D2O on a spectrometer operating at 600 MHz field strength. Figure reprinted from “The Role of Human Dicer-dsRBD in Processing Small Regulatory RNAs” by Wostenberg, et al. licensed under CC BY 3.0.

In my more recent work, a variety of dsRNA lengths were used to complex with Dicer-dsRBD in the hope of preventing intermediate exchange dynamics, as seen in the presence of ds33, from confounding our efforts towards NMR-based structure characterization. In order to visualize the formed complexes, the samples were run on a native polyacrylamide gel, which was stained by two different methods: SYBR Gold to detect RNA and Coomassie blue to detect protein (Fig. 5-3). The gels showed that discrete complexes were formed in the presence of ds10, ds12, and ds16; however, the smear seen for ds22 suggested that the dsRBD binds in multiple locations on the RNA lattice. These results correlate well with the AUC results seen for binding to ds16 and ds22. Of note, the formed complexes migrate opposite to what would be expected as the dsRNA length increases. In the case of ds10 and ds12 binding, this could be due to the recruitment of multiple RNA molecules to a single dsRBD, which would increase the mass of the complex.

Figure 5-3. Native gel showing the complexes formed between Dicer-dsRBD and various lengths of dsRNA. The left panel shows the gel stained with SYBR Gold to detect RNA. The right panel shows the gel stained with Coomassie blue to detect protein; Dicer-dsRBD in its free state (Dicer-apo) does not run properly on the gel due to the charge of the free protein. The complexes and free dsRNA are indicated to the left.

The same complexes that were run on the gel were then monitored using NMR at 298K (Fig. 5-4), with Dicer-dsRBD being isotopically labeled. In all cases, the dsRNA was in a molar excess over the dsRBD at a ratio of 1.75:1 to ensure that all protein was in the bound state. These complexes were equilibrated in the presence of 35 mM Tris-HCl, pH 7.5, 70 mM KCl rather than the higher salt concentration used for the previous titration with ds33. Lower salt concentrations typically produce tighter binding between proteins and nucleic acids due to the lower entropic penalty faced when displacing fewer salt ions from around the protein and nucleic acid upon binding.14 Amongst the four dsRNAs tested, the most favorable exchange dynamics are observed for the Dicer-dsRBD:ds16 complex, yet the majority of the peaks are still missing for Dicer’s dsRBD (see Fig. 5-6A for the number of peaks present in the apo state).
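How fully a 1.75:1 excess of dsRNA saturates the protein can be estimated from the quadratic solution for simple 1:1 binding, as sketched below. The concentrations are those of the NMR complex samples (200 µM Dicer-dsRBD, 350 µM dsRNA). The dissociation constants bracket the 2 µM to 16 µM range measured by EMSA under different buffer conditions. Treating each duplex as a single binding site is a simplifying assumption that ignores lattice binding and any multi-RNA complexes.

```python
import math


def fraction_protein_bound(p_total, r_total, kd):
    """Fraction of protein in complex for 1:1 binding (all inputs in the same units)."""
    b = p_total + r_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * r_total)) / 2.0
    return complex_conc / p_total


# Concentrations in µM; Kd values bracket the EMSA-derived range
for kd in (2.0, 16.0):
    bound = fraction_protein_bound(200.0, 350.0, kd)
    print(f"Kd = {kd:4.1f} uM -> fraction of protein bound = {bound:.2f}")
```

Under these assumptions the protein is roughly 91% to 99% bound, so a small free population may remain for the weakest-binding duplexes.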


Figure 5-4. NMR 15N-HSQC spectra of 15N-Dicer-dsRBD bound to various lengths of dsRNA at 298K: (A) ds10, (B) ds12, (C) ds16, (D) ds22. In all cases, the dsRNA was in a molar excess over the dsRBD at a ratio of 1.75:1. All spectra were collected in the presence of 35 mM Tris, pH 7.5, 70 mM KCl, 10% D2O on a spectrometer operating at 600 MHz field strength.

In an attempt to push the exchange dynamics into fast exchange, the same complexes were monitored at a higher temperature, 308K (Fig. 5-5). For all dsRNAs, the exchange dynamics are improved; in fact, almost all peaks are recovered in the presence of ds16. In order to ensure that the protein did not dissociate from the dsRNA at this higher temperature, apo-dsRBD was also monitored at this temperature (Fig. 5-6B). The spectrum suggests that the free dsRBD begins to unfold at this higher temperature, as evidenced by the unresolved peaks clustered near the center of the spectrum. This is supported by the thermal denaturation midpoint at 41°C reported from DSC (Fig. 5-7). The presence of resolved peaks for the complexes at the higher temperature also suggests that the domain’s fold is stabilized by the binding of dsRNA.

Figure 5-5. NMR 15N-HSQC spectra of 15N-Dicer-dsRBD bound to various lengths of dsRNA at 308K: (A) ds10, (B) ds12, (C) ds16, (D) ds22. Sample and buffer conditions were exactly the same as at 298K in Figure 5-4.

Of note, the buffer and salt concentrations are slightly higher for samples containing Dicer in the free state versus the bound state because Dicer’s dsRBD does not properly fold at lower salt concentrations (Fig. 5-6C). A unique feature of the overlay shown in Figure 5-6C is the shift in peak positions when decreasing the salt concentration (compare the clusters of peaks at 112 ppm). This shift matches that seen in the overlay between free and bound Dicer-dsRBD (Fig. 5-8A), which experienced the same change in salt concentrations, albeit also at different temperatures. Therefore, we can likely attribute the shift in most peak positions to a change in salt concentration, and not temperature.

Figure 5-6. NMR 15N-HSQC spectra of 15N-Dicer-dsRBD in its unbound state at (A) 298K and (B) 308K. Both spectra were collected in the presence of 40 mM Tris, pH 7.5, 80 mM KCl, 10% D2O (high salt) on a spectrometer operating at 600 MHz field strength. (C) Overlay of apo-Dicer-dsRBD at 298K in the presence of 40 mM Tris-HCl, pH 7.5, 80 mM KCl (high salt, red) and 20 mM Tris-HCl, pH 7.5, 30 mM KCl (low salt, black).

Figure 5-7. Thermal denaturation midpoint of Dicer-dsRBD in its free state at approximately 41°C, as shown by DSC. Conditions were 120 µM Dicer-dsRBD in 50 mM Tris, pH 7.5, 100 mM KCl.

Figure 5-8A shows an overlay of the two spectra of Dicer-dsRBD in its apo state (at 298K) and bound to ds16 (at 308K). The overlay confirms that several amino acids are interacting with the dsRNA, as was shown with binding to ds33. Although the two spectra were collected under differing salt concentrations and different temperatures, the movement is not consistent in a particular direction for all peaks as would be the case for salt or pH differences. To demonstrate which peaks are moving in response to dsRNA binding and not temperature or salt concentration, Figure 5-8B has the bound spectrum shifted to the right to align the peaks at 112 ppm. Therefore, these data corroborate those seen for binding to ds33,1 and suggest that there is correlated movement in binding to dsRNA for Dicer’s dsRBD; now, however, we should be able to assign dynamics for every amino acid in the dsRBD.

Figure 5-8. HSQC overlay of (A) Dicer-dsRBD in its free state (at 298K, red; from Fig. 5-6A) and bound to ds16 (at 308K, blue; from Fig. 5-5C). (B) The overlay from panel A with the bound spectrum shifted to align the peaks at 112 ppm.

I recently collected relaxation data for the Dicer-dsRBD:ds16 complex. However, although this complex exhibits better exchange dynamics, they are not sufficient for T1 and T2 relaxation measurements; therefore, I have only been able to collect NOE data of suitable quality for (semi-)quantitative analysis. The Nuclear Overhauser Effect (NOE) is described as a through-space polarization transfer between nuclear dipoles within close proximity, and its magnitude shows a strong dependence on the amplitude of ps-ns timescale conformational fluctuations that reorient the pair of nuclear spins. Therefore, the NOE data will demonstrate whether the protein is undergoing different dynamic changes in the bound state versus the free state; furthermore, these data will give an idea of the types of rearrangements that the dsRBD is undergoing in response to dsRNA binding.15 The overlay in Figure 5-9 shows that the majority of the peak intensities become extremely weakened after the delay period, again corroborating the conclusion that widespread effects are occurring in the dsRBD upon dsRNA binding. The next step in analyzing the NOE data would be to assign the peaks in the HSQC using 3D NMR experiments for a double-labeled Dicer-dsRBD:ds16 sample, but this was not possible due to the loss of intensity of several peaks in the 3D spectra. Of note, I found that pushing the concentration of the complex above 350 µM:600 µM Dicer-dsRBD:ds16 induced unwanted aggregation in the sample, making it even more difficult to assign the spectra in the 3D NMR experiments because the labeled protein concentration was too low.
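
For reference, the steady-state heteronuclear NOE is usually reported per peak as the ratio of the intensity with proton saturation (NOE-on) to the intensity without it (NOE-off). The Python sketch below shows that arithmetic with simple error propagation from the spectral noise; the function name and the intensities are hypothetical, since per-residue values are not tabulated here.

```python
import math


def het_noe(i_on, i_off, noise_on, noise_off):
    """Return (NOE ratio, propagated uncertainty) for a single HSQC peak.

    i_on / i_off: peak intensities from the NOE-on and NOE-off spectra.
    noise_on / noise_off: baseline noise estimates of the two spectra.
    """
    ratio = i_on / i_off
    rel_err = math.sqrt((noise_on / i_on) ** 2 + (noise_off / i_off) ** 2)
    return ratio, abs(ratio) * rel_err


# Hypothetical intensities (arbitrary units) for two illustrative peaks.
peaks = {
    "peak_1": (80.0, 100.0),  # (NOE-on, NOE-off)
    "peak_2": (35.0, 100.0),
}
for name, (on, off) in peaks.items():
    noe, err = het_noe(on, off, noise_on=2.0, noise_off=2.0)
    print(f"{name}: NOE = {noe:.2f} +/- {err:.2f}")
```

As a general guide, ratios of roughly 0.8 are typical of rigid backbone amides at this field strength, while markedly lower ratios point to larger-amplitude ps-ns motion.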

Figure 5-9. Overlay of HSQCs produced from NOE-off (green) and NOE-on (magenta) effects for Dicer-dsRBD bound to ds16 at 308K. The dsRNA was in a molar excess over the dsRBD at a ratio of 1.75:1, with experiments performed in 35 mM Tris, pH 7.5, 70 mM KCl, 10% D2O on a spectrometer operating at 600 MHz field strength.

5.4.3 X-Ray Crystallography

My latest efforts have also been in crystallizing Dicer-dsRBD bound to dsRNA. As was done with the NMR studies, crystallography was attempted with the various lengths of dsRNA. Although fast exchange dynamics were seen by NMR for ds12 and ds16 in complex with the dsRBD, no crystals were obtained with these dsRNAs. However, crystals were seen with the other dsRNAs. Complexes with ds22 only yielded microcrystals, while much larger crystals have been generated in the presence of ds10.

Initial optimization of the buffer, pH, and salt yielded ~40 µm crystals for Dicer-dsRBD bound to ds10 (Fig. 5-10A). Building on these optimizations, the crystal size increased slightly when the mole ratio of protein to RNA was varied; however, various additives have recently yielded significantly larger crystals of up to ~200 µm (Fig. 5-10B, using urea as the additive). Upon analyzing these crystals at the synchrotron, two different possible space groups for the crystal lattice were obtained: P21212 and P3. Upon further analysis using molecular replacement modeling, the highly symmetric P21212 space group data of 2.1 Å resolution (Fig. 5-11A) was found to contain only ds10, no Dicer-dsRBD.

Figure 5-10. Crystals of Dicer-dsRBD bound to ds10. (A) Initial attempts at crystallizing the complex with just buffer, pH, and salt optimization yielded ~40 µm crystals. (B) Crystal obtained with a urea additive, although other additives have proven beneficial.

As for the crystals exhibiting the P3 space group, optimization is still ongoing to obtain high-resolution data for molecular replacement analysis. Although crystals diffracting to below 4 Å have been obtained for the P3 space group (Fig. 5-11B), a full data set has only been collected for a crystal with 5.3 Å resolution. Molecular replacement efforts for this data set suggest that both Dicer-dsRBD and ds10 are present in the crystal; therefore, only conditions yielding this P3 space group are being optimized. Matthews coefficient calculations also support the formation of complexes containing both Dicer-dsRBD and ds10 in this P3 space group. A Matthews coefficient is defined as the crystal volume per unit of the macromolecule’s molecular weight. Comparing this value against the distribution observed for crystal structures deposited in the PDB gives the probability that a particular sample (single or multiple molecules) accounts for the contents of a given crystal form.16 For the P3 space group, this calculation predicts multiple copies (5-12) of the complex, depending on the mole ratio of dsRBD to dsRNA formed in the crystal lattice. In contrast, for the P21212 space group the calculation predicts that only ds10 (4 copies) is likely to exist in the crystal lattice, which is corroborated by the molecular replacement analysis.
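
The Matthews coefficient calculation can be sketched as the unit cell volume divided by the total molecular weight in the cell (number of symmetry operators times copies per asymmetric unit times molecular weight per copy). The Python example below is a minimal sketch: the cell dimensions and the molecular weight are hypothetical placeholders because the measured values are not reported in this section, and the probability estimate of Kantardjieff and Rupp (ref. 16) is not reproduced.

```python
import math


def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic angstroms from cell edges (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def matthews_coefficient(cell_volume, n_symops, copies_per_asu, mw_da):
    """Matthews coefficient V_M in cubic angstroms per dalton."""
    return cell_volume / (n_symops * copies_per_asu * mw_da)


# Hypothetical trigonal cell (P3 has 3 symmetry operators) and a placeholder
# molecular weight for one dsRBD:ds10 complex.
volume = unit_cell_volume(60.0, 60.0, 90.0, gamma=120.0)
mw_complex = 15000.0  # Da, placeholder only
for copies in range(1, 5):
    vm = matthews_coefficient(volume, n_symops=3, copies_per_asu=copies, mw_da=mw_complex)
    print(f"{copies} copies per asymmetric unit -> V_M = {vm:.2f} A^3/Da")
```

Each candidate copy number gives one V_M value; the copy numbers whose V_M falls in the range commonly observed for deposited structures are the plausible ones.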

Figure 5-11. Diffraction patterns for crystals of Dicer-dsRBD bound to ds10. (A) Screenshot of a crystal found to contain only ds10 and no protein, with P21212 space group resolving to 2.1 Å. The ring indicates the positioning of 2.0 Å data. (B) Screenshot of a crystal believed to contain both Dicer-dsRBD and ds10, with P3 space group resolving to 3.8 Å.

5.5 Conclusion

Through the use of NMR spectroscopy and X-ray crystallography, I have begun to show how the dsRBD of Dicer interacts with dsRNA in both solution and static environments. The current results look promising for characterizing the structure of Dicer’s dsRBD bound to dsRNA using crystallography; however, exchange dynamics have limited the capabilities with NMR. I will continue to optimize crystal conditions for the Dicer-dsRBD:ds10 complex for molecular replacement studies. I hope to gain from the crystal structure insight into whether the dsRBD distorts the dsRNA upon binding as seen in previous dsRBD co-crystal structures, in addition to any differences in binding of Dicer’s dsRBD versus other mammalian dsRBDs exhibiting alternative sequences.

5.6 Acknowledgements

This work was supported by the US National Institutes of Health grant R01GM098451. I would like to thank Katsu Murakami, Amie Boal, and Andrew Mitchell for help with the crystallography set-up and analysis, as well as sharing their synchrotron time. I also would like to thank Tracy Nixon and Amanda Applegate for helpful discussion while developing the AUC protocol, and Josh Kranick for performing the AUC experiments. Lastly, I would like to thank the NMR Facility and Crystallography Facility for providing use of their equipment.

5.7 References

1. Wostenberg, C.; Lary, J. W.; Sahu, D.; Acevedo, R.; Quarles, K. A.; Cole, J. L.; Showalter, S. A., The role of human Dicer-dsRBD in processing small regulatory RNAs. PLoS One 2012, 7 (12), e51829.
2. Acevedo, R.; Orench-Rivera, N.; Quarles, K. A.; Showalter, S. A., Helical Defects in MicroRNA Influence Protein Binding by TAR RNA Binding Protein. PLoS One 2015, 10 (1), e0116749.
3. Wostenberg, C.; Quarles, K. A.; Showalter, S. A., Dynamic origins of differential RNA binding function in two dsRBDs from the miRNA "microprocessor" complex. Biochemistry 2010, 49 (50), 10728-36.
4. Yang, S. W.; Chen, H. Y.; Yang, J.; Machida, S.; Chua, N. H.; Yuan, Y. A., Structure of Arabidopsis HYPONASTIC LEAVES1 and its molecular implications for miRNA processing. Structure 2010, 18 (5), 594-605.
5. Ryter, J. M.; Schultz, S. C., Molecular basis of double-stranded RNA-protein interactions: structure of a dsRNA-binding domain complexed with dsRNA. EMBO J 1998, 17 (24), 7505-13.
6. Fu, Q.; Yuan, Y. A., Structural insights into RISC assembly facilitated by dsRNA-binding domains of human RNA helicase A (DHX9). Nucleic Acids Res 2013, 41 (5), 3457-70.
7. Wong, C. J.; Launer-Felty, K.; Cole, J. L., Analysis of PKR-RNA Interactions by Sedimentation Velocity. Method Enzymol 2011, 59-79.
8. Philo, J. S., Improved methods for fitting sedimentation coefficient distributions derived by time-derivative techniques. Anal Biochem 2006, 354 (2), 238-246.
9. Kay, L. E., NMR studies of protein structure and dynamics. J Magn Reson 2005, 173 (2), 193-207.
10. Kanelis, V.; Forman-Kay, J. D.; Kay, L. E., Multidimensional NMR methods for protein structure determination. IUBMB Life 2001, 52 (6), 291-302.
11. Ramos, A.; Grunert, S.; Adams, J.; Micklem, D. R.; Proctor, M. R.; Freund, S.; Bycroft, M.; St Johnston, D.; Varani, G., RNA recognition by a Staufen double-stranded RNA-binding domain. EMBO J 2000, 19 (5), 997-1009.
12. Vukovic, L.; Koh, H. R.; Myong, S.; Schulten, K., Substrate Recognition and Specificity of Double-Stranded RNA Binding Proteins. Biochemistry 2014, 53 (21), 3457-3466.
13. Dominguez, C.; Schubert, M.; Duss, O.; Ravindranathan, S.; Allain, F. H. T., Structure determination and dynamics of protein-RNA complexes by NMR spectroscopy. Prog Nucl Magn Reson Spectrosc 2011, 58 (1-2), 1-61.
14. Hall, K. B.; Stump, W. T., Interaction of N-terminal domain of U1A protein with an RNA stem/loop. Nucleic Acids Res 1992, 20 (16), 4283-90.
15. Foot, L. N.; Feracci, M.; Dominguez, C., Screening protein - Single stranded RNA complexes by NMR spectroscopy for structure determination. Methods 2014, 65 (3), 288-301.
16. Kantardjieff, K. A.; Rupp, B., Matthews coefficient probabilities: Improved estimates for unit cell contents of proteins, DNA, and protein-nucleic acid complex crystals. Protein Sci 2003, 12 (9), 1865-1871.

Chapter 6

Perspectives in RNA Recognition by Proteins in the MicroRNA Maturation Pathway

Thousands of microRNAs (miRNAs) have been found in all multicellular organisms to play a key role in their RNA silencing pathway (also called RNA interference, or RNAi). Viruses have even found ways to hijack our own RNA silencing pathway in order to carry out their viral functions. In humans, these non-coding miRNAs regulate the expression of more than 90% of genes and have thus been linked to several disease states including cancer, neurodegenerative diseases, cardiac disease, and diabetes. Multiple companies have already begun utilizing these small RNAs as therapeutics for controlling the onset of these life-threatening diseases, but more research is needed to fully develop these therapeutics. Therefore, the long-term aim for my research is to apply the knowledge gained here as a possible entry point for regulating the RNAi pathway.

Despite the potential aid that understanding the mechanisms involved in miRNA maturation would provide to the medical community, much is still unknown about the processing of these RNAs into their final mature forms. The processing step involving Microprocessor interactions with miRNA precursors is just one area in RNAi that warrants further study. In particular, understanding this interaction is greatly impeded by the inability to determine solution structures of these RNA precursors biochemically. In fact, accurate prediction of RNA structures is a pressing need for the entire RNA community, because the experimental determination and prediction of RNA structures remain challenging with current atomic-resolution methods. Here, SHAPE chemistry was successfully used to gather RNA secondary structural information with single-nucleotide resolution, providing constraints suitable for input into MC-Pipeline for refinement of RNA structure models. This technique revealed that miRNA precursors contain deformable sites that are necessary for efficient Microprocessor recognition, which was further confirmed with Drosha processing assays.

Literature suggests that the recognition and cleavage of miRNA precursors are in part based on unique structural characteristics of the RNA. The structure models generated herein support this hypothesis in that RNA imperfections combine with the necessary single-stranded RNA:double-stranded RNA junction to define the correct Drosha cleavage site. Aside from the identification of this Drosha recognition site, these structure models suggested that primary miRNAs are largely an elongated A-form helix (apart from the terminal loop and flanking tails) with a consistent display of two dynamic sites (termed the “hot spot” and secondary imperfections) within them. I believe that these unique features are what set primary miRNAs apart from other RNAs in the nucleus, providing a standard for Microprocessor recognition in RNAi.

However, identifying important structural features on miRNA precursors is only part of the battle in understanding Microprocessor recognition. Excision of these precursors from newly transcribed RNA requires DGCR8 to recognize the proper RNA substrates and facilitate placement of double-stranded cuts by Drosha in the correct locations. Improper recognition by a dsRBD within these proteins can therefore result in heightened amounts of precursor RNAs and/or incorrect mRNA levels during RNA interference, resulting in a multitude of diseases and cancers. Because we showed that Drosha’s dsRBD is not capable of binding RNA, proper substrate selection is therefore the responsibility of the dsRBDs located within DGCR8. Furthermore, the smallest recombinant construct of DGCR8 that is sufficient for in vitro RNA binding, referred to as DGCR8-Core, consists solely of these two dsRBDs. The necessity of these tandem dsRBDs was corroborated by competition processing assays, showing a similarity in binding affinity between the shortened tandem-dsRBD construct and full-length DGCR8 purified from mammalian cells. Although the literature suggests that other domains in DGCR8 play a key role in RNA recognition, this was not investigated in my thesis.

Because dsRBDs rarely recognize the nucleotide sequence of RNA, it is reasonable to hypothesize that their function is dependent on recognition of specific structural features in miRNA precursors. Throughout the course of my research, I found that neither the N-terminal dsRBD of DGCR8 in isolation nor the tandem DGCR8-Core construct is sensitive to the presence of deformations within the stem of miRNA precursors. This suggests that DGCR8’s lack of sensitivity to the presence of imperfections within these precursors could be due to its tandem dsRBDs being capable of cooperatively binding around them. Despite the research that has gone into studying the Microprocessor’s interactions with RNA, we still do not know the mechanism by which the two proteins interact with each other or entirely understand how DGCR8 interacts with the primary miRNA for targeted cleavage.

In general, due to the prevalence of double-stranded RNA binding domains (dsRBDs) in RNAi, the Showalter Lab has been very involved in studying these domains. In fact, when comparing the binding results seen for DGCR8 to those of other dsRBD-containing proteins in the miRNA maturation pathway, a variety of binding affinities are seen. In particular, the isolated dsRBD of Dicer displays a binding affinity very similar to that of the tandem-dsRBD construct of DGCR8, whereas the dsRBD from Drosha does not bind RNA at all. What then is the evolutionary purpose of tandem dsRBDs if those in DGCR8 bind just as efficiently as Dicer’s isolated dsRBD?

Studies show that eukaryotic Drosha and Dicer are similar to bacterial and prokaryotic RNase III enzymes because they all contain a single dsRBD in conjunction with RNase III domains, although the bacterial and prokaryotic enzymes contain only a single RNase III domain. Is it possible that the dsRBDs from both Drosha and Dicer were originally capable of binding and cleaving miRNA precursors on their own? Over time, DGCR8 and TRBP may then have evolved as highly efficient binding partners for the RNase III enzymes. This may have come about because, during the course of their evolution, these binding proteins developed tandem dsRBDs to impart their specificity in miRNA precursor selection, thereby negating the need for Drosha and Dicer to efficiently bind their RNA substrates, especially in the case of Drosha because its dsRBD does not bind RNA.

When just considering isolated dsRBDs, a variety of affinities for RNA is still seen among various proteins, which could in part be due to small differences in their folded domains brought about by their diverse sequences. Molecular dynamics simulations and NMR data have shown that lengthened loops within the dsRBD fold partially contribute to binding activity. Recent work by Joshua Kranick, a fellow graduate student in the lab, has shown that binding activity in Drosha’s dsRBD can be restored by mutating amino acids at the RNA binding interface. Therefore, ongoing work in the lab is aimed at characterizing the structure and conformational dynamics that various dsRBDs undergo in binding dsRNA using atomic resolution methods. The results from these studies will be used for comparison to other bound structures of dsRBDs that consist of alternative amino acid sequences, which should prove useful in uncovering why dsRBDs have evolved such diverse sequences.

In the end, fully understanding the interactions of proteins with miRNA precursors in RNAi requires detailed characterization of both the protein and RNA molecules involved. Only then can the binding interactions between these molecules be properly analyzed to further explore the intricate mechanisms that regulate the entire pathway. One day, contributions from fundamental research, such as those detailed in this thesis, will culminate in a complete and comprehensive understanding of these mechanistic pathways and yield breakthroughs for the development of drugs and therapeutics in medicine.

Appendix

The Application of Biochemical Techniques to Monitor dsRBD Binding

A.1 Introduction

As discussed in Chapter 1, multiple techniques are available for investigating the binding of dsRBDs to RNA. However, due to the decreased solubility of DGCR8-Core, the complex typically crashes out of solution, rendering these techniques impossible to use. Furthermore, the low binding affinities seen for the isolated N-terminal dsRBD of DGCR8 pose problems for the detection limits of these techniques. For footprinting techniques, observing dsRBD binding is ambiguous due to the non-sequence-specific interactions of these domains. This lack of specificity gives rise to dynamic and diffuse binding along the length of the RNA, resulting in averaging of signal across the RNA and no conclusive region of dsRBD interaction. This appendix shows the limitations I have faced using multiple methods for monitoring DGCR8 binding to RNA. For some of the techniques, data for another dsRBD are shown for comparison to demonstrate that the technique is feasible for studying other dsRBDs with increased solubility.

A.2 Materials and Methods

A.2.1 RNA Preparation

All DNAs containing a primary miRNA sequence were purchased from Geneart and contained a T7 promoter sequence at the 5´-end, as well as an inverted BsaI cut site at the 3´-end. Preparation of the template DNA, transcription by T7 RNA polymerase, and purification of the transcribed RNA were all performed as previously described.1 The RNAs for SHAPE were prepared as described in Chapter 2. Double-stranded RNAs were ordered from Dharmacon and prepared as described in Chapter 4. 32P-labeled RNAs were prepared as described in Chapter 3.

A.2.2 Protein Preparation

A synthetic DGCR8-Core (505-720) gene was purchased from Geneart. The expression and purification of the protein were performed as previously described.2 The protein was buffer exchanged into 50 mM cacodylate, pH 6.0, 50 mM KCl, and 0.35 µg/mL β-mercaptoethanol. Final concentration of the sample was determined via guanidine hydrochloride denaturation by UV absorption using ε = 22,400 M-1 cm-1 at 278 nm.
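
The concentration determination above is an application of the Beer-Lambert law, c = A / (ε l). A minimal Python sketch follows, using the extinction coefficient quoted above; the absorbance reading, the 1 cm path length, and the dilution factor are hypothetical.

```python
def molar_concentration(absorbance, epsilon, path_cm=1.0, dilution=1.0):
    """Concentration (M) of the undiluted stock from the Beer-Lambert law.

    absorbance: measured absorbance of the (possibly diluted) sample.
    epsilon: molar extinction coefficient in M^-1 cm^-1.
    path_cm: cuvette path length in cm.
    dilution: fold dilution applied before measuring (1.0 = undiluted).
    """
    return absorbance * dilution / (epsilon * path_cm)


EPSILON_278 = 22400.0  # M^-1 cm^-1, value quoted in the text

# Hypothetical reading: A278 = 0.448 after a 10-fold dilution into denaturant.
stock = molar_concentration(0.448, EPSILON_278, path_cm=1.0, dilution=10.0)
print(f"Stock concentration = {stock * 1e6:.0f} uM")
```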

A.3 Results

A.3.1 Isothermal Titration Calorimetry

A very valuable technique that yields all thermodynamic binding parameters is isothermal titration calorimetry (ITC). This technique measures the amount of heat required to maintain a constant temperature between a reference cell and a sample cell, to match the change in heat generated while a ligand is titrated into the sample cell containing another macromolecule.3 However, due to the high concentration of complex formed in this technique, the limited solubility of DGCR8-Core in complex with RNA made it impossible to use ITC for measuring binding.

Figure A-1. ITC data for (A) TRBP-ΔC binding ds22 (RA and SAS, submitted) and for (B) DGCR8-Core binding ds33. The conditions for each experiment are indicated, with both performed at 8°C. Based on the data, I believe that the DGCR8 complex began to crash out in the cell by the second injection.

All binding was monitored using the VP-ITC located in the Showalter Lab. Binding for TRBP-ΔC is shown because the conditions for monitoring its binding event were similar to those tested for DGCR8-Core (Fig. A-1). However, TRBP is highly soluble and binds dsRNA with very tight affinity, both of which differ significantly from DGCR8-Core. Therefore, it is not necessary that the protein and RNA be as concentrated in the case of TRBP binding. Although decreasing the concentrations of both components would help to prevent the DGCR8 complex from crashing out of solution, the heats of injection would not be detectable. Based on the ITC trace, I believe that the complex began to crash out by the second injection; the rest of the experiment likely monitored aggregation of precipitate. In an attempt to prevent precipitation in the cell, several variables were tested for monitoring DGCR8’s binding: 1) lower temperature (8°C), 2) increased duration of injections, which would add DGCR8 to the dsRNA in the sample cell more slowly, and 3) increased stir speed, which would mix the protein into the RNA faster. However, none of these optimizations gave a better result.
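
The trade-off described above, in which lowering the concentrations avoids precipitation but shrinks the injection heats, can be illustrated with a single-site binding model. The Python sketch below is only a simplified estimate: it assumes 1:1 binding, ignores the volume displaced from the cell and heats of dilution, and every number in it (Kd, ΔH, cell volume, injection volume, concentrations) is a hypothetical placeholder rather than a value from these experiments.

```python
import math


def complex_conc(m_total, l_total, kd):
    """Concentration of 1:1 complex (same units as the inputs)."""
    b = m_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * m_total * l_total)) / 2.0


def injection_heats(m_cell, l_syringe, kd, dh, v_cell, v_inj, n_inj):
    """Approximate heat of each injection (units of dh times mol).

    m_cell: RNA concentration in the cell (M); l_syringe: protein in syringe (M).
    dh: binding enthalpy per mole; v_cell, v_inj: volumes in liters.
    Dilution of the cell contents by each injection is ignored.
    """
    heats = []
    previous = 0.0
    for i in range(1, n_inj + 1):
        l_total = l_syringe * v_inj * i / v_cell
        current = complex_conc(m_cell, l_total, kd)
        heats.append(dh * v_cell * (current - previous))
        previous = current
    return heats


# Placeholder parameters: Kd = 2 uM, dH = -10 kcal/mol, 1.4 mL cell, 10 uL injections.
common = dict(kd=2e-6, dh=-10000.0, v_cell=1.4e-3, v_inj=10e-6, n_inj=5)
high = injection_heats(m_cell=20e-6, l_syringe=300e-6, **common)
low = injection_heats(m_cell=2e-6, l_syringe=30e-6, **common)
print("first injection, higher concentrations (ucal):", round(high[0] * 1e6, 2))
print("first injection, 10-fold lower concentrations (ucal):", round(low[0] * 1e6, 2))
```

In this toy model a 10-fold dilution of both components reduces the first injection heat by more than 10-fold, because the fraction of injected protein that binds also drops.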

A.3.2 Circular Dichroism

Circular dichroism (CD) only yields the binding stoichiometry of protein:RNA interactions. This is done by titrating protein into RNA and monitoring the change in circular dichroism at 260 nm; the stoichiometry is reached at saturation when there is no longer a change in signal. It is believed that a change in circular dichroism is observed because the protein induces some sort of conformational change in the RNA upon binding, but this effect is still under investigation.4, 5 However, CD was also unable to report on RNA binding due to DGCR8’s poor solubility.
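
One simple way to read a stoichiometry from such a titration is to intersect a line fitted to the initial rise with the plateau level. The Python sketch below does this on synthetic data; the function names and data points are hypothetical, and the approach assumes binding tight enough to give a sharp break in the curve.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def stoichiometry_breakpoint(ratios, signal, n_initial, n_plateau):
    """Protein:RNA ratio where the initial linear rise meets the plateau.

    n_initial: number of leading points used for the rising line.
    n_plateau: number of trailing points averaged as the plateau.
    """
    slope, intercept = linear_fit(ratios[:n_initial], signal[:n_initial])
    plateau = sum(signal[-n_plateau:]) / n_plateau
    return (plateau - intercept) / slope


# Synthetic titration: change in CD signal (arbitrary units) vs. protein:RNA ratio.
ratios = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0]
n = stoichiometry_breakpoint(ratios, signal, n_initial=4, n_plateau=3)
print(f"Estimated stoichiometry: {n:.1f} protein per RNA")
```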

Figure A-2. CD data for (A) TRBP-ΔC binding ds33 (RA and SAS, submitted) and for (B) DGCR8-Core binding ds33. The conditions for each experiment are indicated, with both performed at 8°C. For DGCR8, the complex began to visibly crash out in the cell by the third injection.

All binding was monitored using the CD instrument located in the Bevilacqua Lab. Binding for TRBP-ΔC is shown because the conditions for monitoring its binding event were similar to those tested for DGCR8-Core (Fig. A-2). Even though the concentrations required for CD are much lower than those in ITC, the complex still crashed out due to DGCR8’s poor solubility. Decreasing the concentration of RNA to help prevent the DGCR8 complex from crashing out of solution rendered the CD signal undetectable. For a typical CD titration, the CD signal should increase as soon as protein is added to the RNA in the cell, as seen in TRBP binding (Fig. A-2A). However, in the case of DGCR8, the CD signal never increased and then began to decrease (Fig. A-2B), indicating that the complex crashed out of solution, which was visible by the third injection. Even lowering the temperature to 8°C did not prevent the complex from crashing out.

A.3.3 Analytical Ultracentrifugation

Analytical ultracentrifugation (AUC) is also used for determining the binding stoichiometry (n). AUC allows the monitoring of complexes formed using either sedimentation velocity or sedimentation equilibrium experiments.6 Sedimentation velocity experiments give the average size (in Svedbergs) of complexed molecules in a sample, which can be used to determine how many ligands are bound to a macromolecule. In sedimentation velocity experiments, a number of absorbance scans are recorded at a particular wavelength (260 nm due to RNA’s high extinction coefficient) while the complexed molecules “sediment” in a cell during high-speed centrifugation. These absorbance curves are then fit using software to determine the average sedimentation coefficient (S) of the formed complexes; this fitted curve can then be deconvoluted to give an estimated size of multiple complexes within a single sample (shown below). Again, due to the limited solubility of DGCR8 in complex with RNA, this method could not be used to monitor binding.

Figure A-3. AUC data for (A) Dicer-dsRBD binding ds22 (reproduced from Chapter 5), and DGCR8-Core binding ds22 with (B) DGCR8 in excess and (C) ds22 in excess. The conditions for each experiment are indicated. The Dicer experiment was run at 20°C and sedimented at 50,000 rpm. The DGCR8 experiments were run at 10°C and sedimented at 58,000 rpm. For DGCR8, the complex crashed out in the cell only if the protein was in excess as in panel B.

All AUC experiments were performed on the Beckman Coulter XL-I located in South Frear. Of note, ds22 has a sedimentation coefficient of 3.5 S under the buffer conditions in Dicer binding, and 2.5 S under buffer conditions tested in DGCR8 binding. This difference is most likely due to assumptions made of the solvent density during analysis, which will be confirmed when the Dicer titration is completed (preliminary analysis in Chapter 5). As opposed to the tight binding affinity seen for TRBP, Dicer-dsRBD exhibits a similar binding affinity to that of DGCR8-Core. Therefore, one would suspect that this technique would be amenable to monitoring DGCR8 binding if success is seen for Dicer. However, a titration of DGCR8-Core into ds22 showed that as soon as the concentration of protein exceeded that of the RNA, only the sedimentation coefficient of DGCR8-Core (1.7 S) was seen, suggesting that the formed complex is crashing out of solution in the cell during high-speed centrifugation and is thus not detected in AUC. This is evidenced in the AUC traces by comparing Figures A-3B (protein in excess) and A-3C (RNA in excess). When excess protein is present, all of the RNA is bound and the complex begins to precipitate, yielding sedimentation coefficients matching those of free DGCR8-Core (Fig. A-3B). However, when excess RNA is present, sedimentation coefficients matching both free ds22 and a small fraction of a larger component (believed to be DGCR8-Core:ds22 complex) are recovered (Fig. A-3C).
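
Because the apparent sedimentation coefficient depends on the solvent density and viscosity assumed during analysis, values measured in different buffers and at different temperatures are usually compared after correction to standard conditions (water at 20°C). The Python sketch below applies the standard s20,w correction; the partial specific volume and the buffer density and viscosity are hypothetical placeholders, not values used in this work, and the temperature dependence of the partial specific volume is ignored.

```python
ETA_20W = 1.002    # viscosity of water at 20 C, centipoise
RHO_20W = 0.99823  # density of water at 20 C, g/mL


def s20w(s_obs, eta_buffer, rho_buffer, vbar):
    """Correct an observed sedimentation coefficient to water at 20 C.

    s_obs: observed coefficient (S) in the experimental buffer and temperature.
    eta_buffer: buffer viscosity (cP) at the experimental temperature.
    rho_buffer: buffer density (g/mL) at the experimental temperature.
    vbar: partial specific volume of the solute (mL/g).
    """
    viscosity_term = eta_buffer / ETA_20W
    buoyancy_term = (1.0 - vbar * RHO_20W) / (1.0 - vbar * rho_buffer)
    return s_obs * viscosity_term * buoyancy_term


# Placeholder inputs: vbar = 0.53 mL/g is an assumed typical value for RNA;
# the buffer viscosity and density pairs below are illustrative only.
for eta, rho in ((1.002, 0.99823), (1.32, 1.005), (1.35, 1.010)):
    corrected = s20w(2.5, eta, rho, vbar=0.53)
    print(f"eta = {eta:.3f} cP, rho = {rho:.5f} g/mL -> s20,w = {corrected:.2f} S")
```

The loop shows how strongly the corrected value depends on the solvent parameters supplied, which is why mismatched density or viscosity assumptions can make the same RNA appear to sediment differently.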

A.3.4 Biolayer Interferometry

Biolayer interferometry (BLI) can be used to determine the kinetics of binding. This technique measures the interference of white light reflected from an immobilized surface of tips over time.7 These tips are first saturated with a biomolecule that binds to the tips, followed by the addition of another molecule that has an affinity for the first biomolecule previously attached to the tips. To ensure proper binding detection, the tips are washed after each step with buffer to remove excess biomolecules from their surface. During the course of an experiment, an increase in interference signal indicates that the biomolecule has attached to the tips, from which kinetic rates of interaction can be measured.
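
For a simple 1:1 interaction, the association and dissociation phases of a BLI trace are commonly described by single exponentials, from which the on- and off-rates are extracted. The Python sketch below generates such idealized curves; the rate constants, analyte concentration, and maximum response are hypothetical.

```python
import math


def association_signal(t, conc, k_on, k_off, r_max):
    """Response at time t (s) during association for 1:1 binding.

    conc: analyte concentration (M); k_on in M^-1 s^-1; k_off in s^-1.
    """
    k_obs = k_on * conc + k_off
    r_eq = r_max * k_on * conc / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_signal(t, r0, k_off):
    """Response at time t (s) after moving the tip into buffer, starting at r0."""
    return r0 * math.exp(-k_off * t)


# Placeholder kinetics: k_on = 1e5 M^-1 s^-1, k_off = 0.1 s^-1 (Kd = 1 uM).
k_on, k_off, r_max, conc = 1e5, 0.1, 1.0, 1e-6
for t in (0.0, 5.0, 20.0, 60.0):
    print(f"t = {t:5.1f} s  association response = {association_signal(t, conc, k_on, k_off, r_max):.3f}")
r_end = association_signal(60.0, conc, k_on, k_off, r_max)
for t in (0.0, 5.0, 20.0):
    print(f"t = {t:5.1f} s  dissociation response = {dissociation_signal(t, r_end, k_off):.3f}")
```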

Figure A-4. BLI data of dsRBDs interacting with either the streptavidin tips or free biotin. The increase in signal indicates addition of molecules to the tips.

The experiments were performed on the ForteBio Octet BLI instrument located in the Crystallography Facility in Althouse. When I noticed no increase in signal after the addition of DGCR8 to the tips pre-saturated with ds33, I began searching for the cause. I designed the experiment such that DGCR8 should bind to ds33, which is pre-bound to streptavidin tips via a biotin label on the RNA. However, I found that the dsRBDs have a high preference for either the streptavidin tips or free biotin that was masking any binding between the dsRBD and dsRNA. To this end, two controls were performed (Fig. A-4): addition of DGCR8-Core to pre-coated biotin tips (green trace) and addition of Dicer-dsRBD to the free streptavidin tips (magenta trace). In the case of DGCR8, the protein is either interacting with biotin or with the leftover free streptavidin tips not coated with biotin from the first step. As for Dicer, the protein is clearly interacting with the free streptavidin tips. In either case, the interactions of the dsRBDs with either the streptavidin tips or the biotin that would be used to attach the dsRNA are interfering with the desired dsRBD:dsRNA binding interaction. I found no other suitable tips to use from ForteBio’s collection of BLI tips; therefore, this technique was abandoned for monitoring dsRBD binding.

A.3.5 dsRBD Footprinting Using In-Line Probing

Although RNA structure mapping techniques8, 9 are typically used for probing RNA structure in the free state, these are also very useful for determining regions of protection by a bound ligand, such as a dsRBD, a technique called footprinting.10 These include, but are not limited to, DMS modification, endonuclease cleavage, hydroxyl radical probing, SHAPE chemistry, and in-line probing. In-line probing is unique because it allows monitoring the structure of every nucleotide in the RNA without the use of a chemical or enzymatic probe. This method utilizes the natural degradation of the RNA over time through hydrolysis, which is then analyzed on a polyacrylamide gel. Therefore, high-intensity detection of bands on the gel designates positions of high degradation, indicating regions of unstructured RNA. Upon the addition of protein, particular regions on the RNA become protected from the natural hydrolysis degradation, decreasing the intensity of these bands on the gel.

Figure A-5. In-line probing results are ambiguous for determining regions of binding by dsRBDs. The “Control” is pri-mir-16-1 monitored at 0 and 3 days without the addition of protein. The T1 and OH- ladders give reference to sequence within the pri-miRNA. The remaining lanes represent the pri-miRNA incubated in the presence of the designated protein at 0 and 3 days. The full-length pri-miRNA, stemloop, and stems are indicated.

Figure A-5 shows in-line probing performed on pri-mir-16-1 using a variety of dsRBDs available in the lab. The results proved very difficult to interpret for dsRBD binding. In most cases, the full-length pri-miRNA band almost completely disappears, suggesting that the RNA was removed along with the protein during the phenol/chloroform extraction step before running the gel. Of note, Proteinase K treatment was also attempted to digest the protein at the end of the 3 days before phenol/chloroform extraction; however, this provided little improvement, as can be seen from the full-length pri-miRNA band still missing on the gel. Although some dsRBDs showed a decrease in band intensity for the stem regions with an increase in intensity in the stemloop region, as would be expected, too much of the overall RNA is missing on the gel to confidently deem these accurate footprinting results. Surprisingly, the RNA is essentially missing in the presence of Drosha-dsRBD, which we have shown with EMSAs does not bind RNA. Therefore, after multiple attempts to recover the RNA from the protein (via proteinase treatment, addition of SDS, and boiling), this technique was abandoned for determining dsRBD footprinting regions on RNA.

A.3.6 dsRBD Footprinting Using SHAPE Chemistry

As discussed extensively in Chapters 2 and 3, SHAPE chemistry is a useful technique for determining the solution structure of RNA on a per-nucleotide basis. Briefly, the conformationally flexible 2´-hydroxyls of the RNA react selectively with an electrophile to form a 2´-O-adduct. The resulting reactivities depend on how each nucleotide is constrained by base pairing or other interactions; i.e., nucleotides participating in base pairs are unreactive, whereas those in loops or bulges are typically reactive. As with in-line probing, one can monitor SHAPE reactivities of the RNA when bound to protein, with decreased reactivities expected for those nucleotides protected by the protein. Therefore, I attempted to use SHAPE to show (1) whether the pri-miRNA re-folds to accommodate binding by DGCR8-Core or remains constrained (depicted in Fig. A-6), and (2) where the DGCR8 dsRBDs bind on the RNA.

Figure A-6. Depiction of possible SHAPE reactivity results upon DGCR8 binding to pri-miRNA. If the dsRBDs are simply binding to the pri-miRNA and inducing no conformational change of the RNA, then only a decrease in reactivities at the binding sites will be seen. However, if the pri-miRNA is undergoing a conformational change upon binding, then increased SHAPE reactivities will be seen at the affected sites, in addition to decreased reactivities at the binding sites.
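The two outcomes depicted in Fig. A-6 can be expressed as a simple comparison of per-nucleotide reactivities with and without protein. The sketch below is illustrative only: the reactivity values, the function names, and the 0.3 cutoff for calling a change are hypothetical and are not taken from the experiments in this work.

```python
def classify_shape_changes(free, bound, cutoff=0.3):
    """Label each nucleotide by its change in SHAPE reactivity upon binding.

    'protected' : reactivity drops by at least `cutoff` (candidate binding site)
    'exposed'   : reactivity rises by at least `cutoff` (candidate re-folding)
    'unchanged' : anything in between
    The cutoff is an arbitrary illustrative value (assumption).
    """
    if len(free) != len(bound):
        raise ValueError("profiles must cover the same nucleotides")
    labels = []
    for f, b in zip(free, bound):
        delta = b - f
        if delta <= -cutoff:
            labels.append("protected")
        elif delta >= cutoff:
            labels.append("exposed")
        else:
            labels.append("unchanged")
    return labels


def interpret(labels):
    """Map the per-nucleotide labels onto the two scenarios of Fig. A-6."""
    if "exposed" in labels:
        return "binding with conformational change"
    if "protected" in labels:
        return "binding without conformational change"
    return "no detectable change"


# Hypothetical normalized reactivities for four nucleotides.
free_profile = [0.1, 0.9, 0.8, 0.1]
bound_profile = [0.1, 0.2, 0.8, 0.1]
print(interpret(classify_shape_changes(free_profile, bound_profile)))
```

An interpretation of this kind assumes that decreases in reactivity arise only from specific binding; a non-binding control protein should therefore yield "no detectable change".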

DGCR8 protection assays using SHAPE chemistry for a panel of pri-miRNAs suggest that DGCR8 may be interacting with the entire pri-miRNA (Fig. A-7A – A-7C). Although it is possible that DGCR8-Core is coating the entire stem of the RNA and possibly the terminal loop of pri-miRNAs, controls were performed by examining footprinting on tRNA with DGCR8-Core (Fig. A-7D). The SHAPE reactivity profile for tRNA appears to corroborate the pri-miRNA footprinting results, but with a smaller decrease in reactivity upon DGCR8 protection. In addition, another control was performed with pri-mir-30a in the presence of Drosha’s dsRBD (Fig. A-7E). Even though we know that Drosha-dsRBD does not bind RNA at all, extreme protection of the pri-miRNA was observed. Because apparent protection appears even with a non-binding domain, the decreases in reactivity cannot be attributed to specific binding; therefore, footprinting using SHAPE chemistry was abandoned.

Figure A-7. dsRBD footprinting using SHAPE chemistry. On the left are the RNAs and the proteins used for footprinting; the right shows the SHAPE reactivity profiles with (dark blue) and without (light blue) protein. A decrease in reactivity in the presence of protein suggests that these nucleotides are bound to the protein and thus being protected from the SHAPE reagent.

A.3.7 Mass Spectrometry Analysis of UV-induced Cross-linked Complexes

Cross-linking is a viable means of covalently attaching biomolecules to each other, especially when the binding interaction is highly dynamic. Although several methods exist for cross-linking molecules, I have only attempted UV-induced cross-linking. This method involves allowing a mixture of protein and RNA to reach equilibrium, followed by exposing the solution to 254 nm UV light for some period of time.11 The exposure time depends on how long it takes to yield sufficient cross-linking without degrading the molecules. Figure A-8 shows pri-mir-16-1 cross-linked to DGCR8-Core after 20 minutes of UV exposure. To prevent random cross-linking, herring sperm DNA was added to the mix as a competitor, which did seem to increase the specificity of the binding interaction (Fig. A-8A, last lane). As controls, the gel shows that the RNA does not cross-link to itself or to the DNA, only to DGCR8; I also saw that DGCR8 did not cross-link to itself (not visible on gel due to lack of radioactive tag). In addition, I was able to determine the optimum amount of DGCR8-Core to add to the binding reaction before cross-linking (Fig. A-8B). Interestingly, this amount lies around the dissociation constant for pri-mir-16-1 (Kd ~ 1 µM).
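One way to put the Kd value in context is to estimate the fraction of RNA bound at a given protein concentration. The sketch below assumes simple 1:1 binding with protein in large excess over trace radiolabeled RNA; both are assumptions made for illustration, since the stoichiometry of the complex is not established here, and the function name is hypothetical.

```python
def fraction_bound(protein_uM, kd_uM=1.0):
    """Fraction of RNA bound under a simple 1:1 binding model.

    Assumes the free protein concentration is approximately the total
    protein concentration (trace RNA), so
        fraction bound = [P] / ([P] + Kd).
    The default Kd of 1 uM is the approximate value quoted in the text.
    """
    if protein_uM < 0 or kd_uM <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return protein_uM / (protein_uM + kd_uM)


# At [P] = Kd, half of the RNA is bound under this model.
for conc in (0.1, 1.0, 9.0):
    print(f"{conc:4.1f} uM DGCR8-Core -> {fraction_bound(conc):.2f} bound")
```

Under these assumptions, adding protein at a concentration near Kd leaves roughly half of the RNA bound at equilibrium before UV exposure.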

Figure A-8. UV-induced cross-linking of DGCR8-Core with pri-mir-16-1 shown with autoradiography (32P-pri-mir-16-1). (A) Combinations of the RNA, competitor herring sperm DNA, and DGCR8-Core were mixed and cross-linked (XL) using 254 nm UV light for 20 minutes. (B) The RNA was mixed with increasing amounts of DGCR8-Core and cross-linked, in the presence of DNA. ‘Ctrl’ is not cross-linked. The free pri-miRNA and cross-linked complexes are indicated.

Cross-linked complexes can also be analyzed with mass spectrometry to determine the regions of interaction on either the protein or the RNA.12 Once cross-linked, the sample is purified on a denaturing gel, which does not destroy the complex because its components are now covalently linked. The gel pieces then undergo in-gel digestion by proteases and RNases to break up the complex into smaller analyzable fragments. The digested fragments are extracted from the gel and prepared for analysis on a mass spectrometer. However, the difficulty in analyzing the fragments lies in knowing the exact mass of the digested RNA fragments, because RNA undergoes natural hydrolysis at a variety of positions along the phosphodiester backbone, each producing a fragment of distinct mass.

Although UV-induced cross-linking was extremely successful for DGCR8, the mass spectrometry analysis was very ambiguous. For my experiments, I used the Orbitrap located in the Proteomics and Mass Spectrometry Facility, with the help of Director Tatiana Laremore. Of note, due to the large size of pri-miRNAs, we cross-linked DGCR8 to a 16 bp dsRNA containing a 4-thio-uridine (4SU) in the middle of the duplex to enhance cross-linking efficiency. Upon cross-linking, we saw that multiple regions of DGCR8 were no longer recognized on the mass spectrometer, suggesting that these regions may have been modified with cross-linked portions of the RNA (Fig. A-9). Notably, the majority of the regions that were no longer recognized overlap with the key residues involved in RNA binding within α-helix 1, the loop between β-strands 1 and 2, and the base of α-helix 2 (Fig. A-9, dotted lines; see Chapter 1 for RNA binding regions within dsRBDs). We attempted to match these mass modifications to the probable nucleotide fragments generated from RNase digestion, but were unsuccessful. Therefore, due to the ambiguity of the results, this technique was abandoned. Of note, according to Tatiana Laremore, intact protein analysis can be performed on complexes between 40 and 100 kDa. This is done by omitting the enzymatic digestion and instead analyzing the cross-linked complex directly on the mass spectrometer to determine the molecular weight of the entire complex. This could potentially yield the stoichiometry of the dsRBD:dsRNA complex; however, this technique was never attempted.
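The way an intact mass could report on stoichiometry can be sketched as a search over small copy numbers of protein and RNA. Because this measurement was never attempted, every number below is a hypothetical placeholder (rounded masses of about 24 kDa for the protein and 10 kDa for the duplex, a 500 Da tolerance), the function name is invented for illustration, and any mass change caused by the cross-link itself is ignored.

```python
def candidate_stoichiometries(observed_da, protein_da, rna_da,
                              max_copies=4, tol_da=500.0):
    """List (n_protein, n_rna, expected_mass) combinations matching a mass.

    Every combination of 1..max_copies protein and RNA molecules is tried,
    and those whose summed mass lies within `tol_da` of the observed
    intact mass are returned. Mass changes from the cross-link chemistry
    are ignored (assumption).
    """
    matches = []
    for n in range(1, max_copies + 1):
        for m in range(1, max_copies + 1):
            expected = n * protein_da + m * rna_da
            if abs(expected - observed_da) <= tol_da:
                matches.append((n, m, expected))
    return matches


# Hypothetical intact mass of 58.1 kDa with placeholder component masses.
for n, m, mass in candidate_stoichiometries(58100.0, 24000.0, 10000.0):
    print(f"{n} protein : {m} RNA (expected {mass:.0f} Da)")
```

With real data, the tolerance would need to reflect instrument accuracy, and more than one combination could match if the component masses are near multiples of each other.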

Figure A-9. Mass spectrometry analysis of DGCR8-Core (aa 505-720) cross-linked to dsRNA. ‘Control’ is DGCR8 with no cross-linking; ‘XL’ is DGCR8 cross-linked to ds16-4SU. The green color indicates that masses correlating to these regions in DGCR8 were recovered on the mass spectrometer. The absence of green color suggests that these masses were too small to be unambiguously matched to the DGCR8 sequence or that they contained a modification, such as a cross-linked RNA nucleotide. The α helices and β strands of DGCR8 are indicated above and the amino acid positions are indicated below. The five key residues believed to be involved in dsRNA binding for each dsRBD are indicated with dotted lines. Done in collaboration with Tatiana Laremore.

A.4 Acknowledgements

This work was supported by the US National Institutes of Health grant R01GM098451 and start-up funds from the Pennsylvania State University to SAS. I thank Philip Bevilacqua, Tatiana Laremore, Melissa Mullen, Durga Chadalavada, Monique Bastidas, Amanda Applegate, and Kit Kwok for helpful discussion while developing the experimental protocols. I would also like to thank the Bevilacqua Lab, Crystallography Facility, and Mass Spectrometry Facility for use of their equipment.


A.5 References

1. Quarles, K. A.; Sahu, D.; Havens, M. A.; Forsyth, E. R.; Wostenberg, C.; Hastings, M. L.; Showalter, S. A., Ensemble analysis of primary microRNA structure reveals an extensive capacity to deform near the Drosha cleavage site. Biochemistry 2013, 52 (5), 795-807.

2. Wostenberg, C.; Quarles, K. A.; Showalter, S. A., Dynamic origins of differential RNA binding function in two dsRBDs from the miRNA "microprocessor" complex. Biochemistry 2010, 49 (50), 10728-36.

3. Patel, S.; Blose, J. M.; Sokoloski, J. E.; Pollack, L.; Bevilacqua, P. C., Specificity of the Double-Stranded RNA-Binding Domain from the RNA-Activated Protein Kinase PKR for Double-Stranded RNA: Insights from Thermodynamics and Small-Angle X-ray Scattering. Biochemistry 2012, 51 (46), 9312-9322.

4. Cole, J. L., Analysis of PKR activation using analytical ultracentrifugation. Macromol Biosci 2010, 10 (7), 703-13.

5. Gilligan, T. J.; Schwarz, G., Self-Association of Adenosine-5'-Triphosphate Studied by Circular-Dichroism at Low Ionic Strengths. Biophys Chem 1976, 4 (1), 55-63.

6. Lebowitz, J.; Lewis, M. S.; Schuck, P., Modern analytical ultracentrifugation in protein science: a tutorial review. Protein Sci 2002, 11 (9), 2067-79.

7. Concepcion, J.; Witte, K.; Wartchow, C.; Choo, S.; Yao, D. F.; Persson, H.; Wei, J.; Li, P.; Heidecker, B.; Ma, W. L.; Varma, R.; Zhao, L. S.; Perillat, D.; Carricato, G.; Recknor, M.; Du, K.; Ho, H.; Ellis, T.; Gamez, J.; Howes, M.; Phi-Wilson, J.; Lockard, S.; Zuk, R.; Tan, H., Label-Free Detection of Biomolecular Interactions Using BioLayer Interferometry for Kinetic Characterization. Comb Chem High T Scr 2009, 12 (8), 791-800.

8. Ehresmann, C.; Baudin, F.; Mougel, M.; Romby, P.; Ebel, J. P.; Ehresmann, B., Probing the structure of RNAs in solution. Nucleic Acids Res 1987, 15 (22), 9109-28.

9. Weeks, K. M.; Mauger, D. M., Exploring RNA Structural Codes with SHAPE Chemistry. Acc Chem Res 2011, 44 (12), 1280-1291.

10. Toroney, R.; Nallagatla, S. R.; Boyer, J. A.; Cameron, C. E.; Bevilacqua, P. C., Regulation of PKR by HCV IRES RNA: importance of domain II and NS5A. J Mol Biol 2010, 400 (3), 393-412.

11. Harris, M. E.; Christian, E. L., RNA Crosslinking Methods. Method Enzymol 2009, 468, 127-146.

12. Kramer, K.; Hummel, P.; Hsiao, H. H.; Luo, X.; Wahl, M.; Urlaub, H., Mass-spectrometric analysis of proteins cross-linked to 4-thio-uracil- and 5-bromo-uracil-substituted RNA. Int J Mass Spectrom 2011, 304 (2-3), 184-194.

Curriculum Vita

Kaycee Andrea Quarles [email protected]

Education
• Pennsylvania State University, University Park, PA, May 2015
  Doctor of Philosophy in Chemistry (with a focus in biophysical chemistry), 3.74 GPA
• Georgia Institute of Technology, Atlanta, GA, May 2009
  Bachelor of Science in Chemical and Biomolecular Engineering, Biotechnology Focus, 3.43 GPA

Work Experience
• Pennsylvania State University, University Park, PA
  Research Assistant to Dr. Scott Showalter
• Georgia Institute of Technology, Atlanta, GA
  Undergraduate Research Assistant to Dr. Nicholas Hud

Publications
• Quarles, K.A. and Showalter, S.A. Deformability in the Cleavage Site of Primary MicroRNA is Not Sensed by the Double-Stranded RNA Binding Domains in the Microprocessor Component DGCR8. Submitted.
• Acevedo, R., Orench-Rivera, N., Quarles, K.A., and Showalter, S.A. Helical Defects in MicroRNA Influence Protein Binding by TAR RNA Binding Protein. PLoS ONE. 2015, 10(1).
• Quarles, K.A., Sahu, D., Havens, M., Forsyth, E., Wostenberg, C.W., Hastings, M., and Showalter, S.A. Ensemble Analysis of Primary miRNA Structure Reveals an Extensive Capacity to Deform Near the Drosha Cleavage Site. Biochemistry. 2013, 52.
• Wostenberg, C.W., Lary, J.W., Sahu, D., Acevedo, R., Quarles, K.A., Cole, J.L., and Showalter, S.A. The Role of Human Dicer-dsRBD in Processing Small Regulatory RNAs. PLoS ONE. 2012, 7(12).
• Wostenberg, C.W., Quarles, K.A., and Showalter, S.A. Dynamic Origins of Differential RNA Binding Function in Two dsRBDs from the miRNA “Microprocessor” Complex. Biochemistry. 2010, 49(50).
• Horowitz, E.D., Engelhart, A.E., Chen, M.C., Quarles, K.A., Smith, M.W., Lynn, D.G., and Hud, N.V. Intercalation as a Means to Suppress Cyclization and Promote Polymerization of Base-pairing Oligonucleotides in a Prebiotic World. Proceedings of the National Academy of Sciences. 2009, 107(12).

Presentations and Posters
• “Utilizing Biophysical Techniques to Investigate RNA:Protein Interactions in the RNA Interference Pathway” Poster Presenter at: Materials Day. University Park, PA, October 15, 2013. Biophysical Society Pennsylvania Network Meeting. University Park, PA, October 4, 2013.
• “RNA Recognition and Binding by the Microprocessor” Poster Presenter at ACS National Meeting Sci-Mix Session. Indianapolis, IN, September 9, 2013.
• “Primary MicroRNA Structure and its Recognition by the Microprocessor” Oral Presenter at ACS National Meeting. New Orleans, LA, April 7, 2013.
• “Biochemical Analysis of Primary miRNA Structure Reveals an Extensive Capacity to Deform Near the Drosha Cut Site” Oral Presenter at RNA Society Meeting. Ann Arbor, MI, June 1, 2012.
• “RNA Structures that Unlock the Cure to Diseases” Poster Presenter at PSU Graduate Exhibition. University Park, PA, March 25, 2012.
• “Pri-miRNA Structure: Likes and Dislikes of the Microprocessor” Poster Presenter at Rustbelt RNA Meeting. Dayton, OH, October 21, 2011.

Awards and Honors
Penn State Eberly College of Science Graduate Student Braucher Award: August 2013
Penn State Graduate Exhibition 2nd Place in Physical Sciences & Mathematics: April 2012
Dalalian Graduate Fellowship Travel Award: January 2012, January 2013, August 2013
Roberts Fellowship: August 2009