Restoring and enhancing Argonaute2-catalyzed cleavage

by

Grace R. Chen

B.A., Molecular and and English Rutgers University, 2010

Submitted to the Department of Biology in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2018

© 2018 Grace R. Chen. All rights reserved

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author: ______Grace R. Chen Department of Biology March 9, 2018

Certified by: ______David P. Bartel Professor of Biology Thesis Supervisor

Accepted by: ______Amy E. Keating Professor of Biology Co-Chair, Biology Graduate Committee

Restoring and enhancing Argonaute2-catalyzed cleavage

by

Grace R. Chen

Submitted to the Department of Biology on March 9, 2018 in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biology

ABSTRACT

The core of the RNA silencing pathway relies on an Argonaute protein in complex with a small RNA. Together, this complex targets transcript RNAs through sequence complementarity to induce the destruction of the transcript RNA either through Argonaute2-mediated slicing or mRNA destabilization and decay. The RNAi pathway provides innate immunity against foreign sequences, such as viruses and transposons, and has been harnessed as an efficient gene-knockdown tool in many eukaryotic species, but curiously, not in zebrafish. We discovered that RNAi is less effective in zebrafish at least partly because Argonaute2-catalyzed mRNA slicing is impaired. This defect can be traced to two conserved mutations that arose in an ancestor of most teleost fish almost 300 million years ago, implying that most fish lack effective RNAi. Despite lacking efficient slicing activity, these fish have retained the ability to produce miR-451, a microRNA generated by a cleavage reaction analogous to slicing. This ability is due to a G–G mismatch within the fish miR-451 precursor, which substantially enhances its cleavage. This led to the surprising discovery that an analogous G–G mismatch (or sometimes also a G–A mismatch) enhances target slicing, despite disrupting seed pairing important for target binding. These results provide a strategy for restoring RNAi to zebrafish and reveal unanticipated opposing effects of a seed mismatch with implications for mechanism and guide-RNA design.

Thesis supervisor: David P. Bartel Title: Professor of Biology

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my PI and mentor, Dave Bartel. Dave, thank you for taking a chance on me as a first year when I knew nothing about RNA. And thank you for every year since for your guidance, support, and belief in my science and in me. These past years have not exactly been the easiest, but you have always been there with a “come in, Grace” to my three hard knocks. You have taught me what it means to not only be an exceptional scientist, presenter, and writer, but also an extraordinary mentor and human being. Thank you for always having my back.

To the faculty at MIT, especially my committee members, Hazel, Phil, and Tom. Thank you for your advice and input throughout the years. I would like to thank Hazel for welcoming me into your lab and letting me use your fish. To my other mentor and life coach, Frank Solomon. I don’t have the words to express how much you have helped me through these last few years. Thank you from the bottom of my heart.

To the entire Bartel lab, past and present – you guys are an unrivaled bunch. You have all taught me so much, and I would not be the scientist I am today without the Bartel lab. I have had the extreme honor to overlap with Laura, Asia, Lori, Wendy, David W., David K., David G., Vikram, Vincent, Stephen, Matt, Alex, Sean, Jeff, Tim, Charlie, Kathy, Danny, Elena, Justin, Thy, Glenn, Ben, Olivia, Lena, Igor, Katrin, Wenwen, Junjie, Sue-Jean, Coffee, Jamie, Namita, Xuebing, Dan, and Jarrett. I owe you all so much.

To my baymate for life, Ben, I don’t know what I would have done without you. Thank you for being my person every single day, through the good and the bad, for my entire graduate school career. I knew, no matter what, I could turn around and you would be there for me. Thank you and Anne, Elly, and Nils for adopting me into your family. To my big sister, Katrin, just simply, I love you. To my vigilante defender and enthusiastic supporter, Namita, thanks for everything bro. To the person that knows everything, Jeff, I’m going to miss all of our chats.

To my Camp Casco family – you guys crinkle my heart. Karat, you brought Camp Casco into my life, and for that, I will be forever grateful. I love you and your hot pink cursor. Stag team for life. Sparkles, thank you thank you thank you for bringing Camp Casco into all of our lives. You are an inspiration. Scout, Lo, Merms, Rails – camp friends are the best friends. Our one week in August is my favorite week of the year. You guys bring so much joy and happiness into my life, so thank you for showing me on a daily basis that there is good in the world.

To my BJJ family – you just can’t repeatedly, nightly, attempt to choke or break arms without becoming best friends with the owners of those necks and arms. To everyone at Broadway, and especially Jill and Erica, thank you for showing me how to be brave and train hard, not like a coward, every day, on and off the mats. I could not ask for better friends and roommates to struggle with and share every night with. To Brendan, you are one of the very best people I know, and I am so lucky to have you in my life. You mean the world to me.

To my friends – Cory and Kaitlin, sister wives for life. Stacie and Ben, my other wife and hubby, may we cabin and hammock forever. Drew, Janet, Sue, and Barry, the old guard that got me through everything. Liz and Elliot, my first friends outside of grad school. Matt Poss, we’re almost done! Mike, you’ve been like my family here. My bowling group, the bravest group of women I have ever known. The seven – Jess, Jen, Casey, Becky, Marissa, and Taylor – so this one time at band camp, I made some best friends. Finally, Amanda, my first lab wife and sister and roommate and best friend. I love you so much. To all these excellent human beings in my life – I have been so lucky to call you guys my friends and family. I would not be where I am today without you. You guys have been my cheering section and my support system, keeping me sane and in touch with the world, reminding me that there is more to life than lab. I love you all so much.

Finally, to my wonderful, adorable, and supportive family. To my parents and grandparents and brothers, who have always believed in me and just wanted me to be happy. I love you.

TABLE OF CONTENTS

Abstract
Acknowledgements
Table of contents
Chapter I. Introduction
    Part I. RNA silencing pathways
        siRNAs
        miRNAs
        piRNAs
    Part II. The role of Argonaute in RNA silencing pathways
        A brief history of RISC
        Structure of Argonaute proteins
        Target recognition and binding
        miRNA-mediated gene silencing
        Argonaute2-catalyzed slicing
        Mammalian Argonaute proteins
    Part III. Pre-miR-451
    Part IV. RNAi in zebrafish
    References
Chapter II. A seed mismatch enhances Argonaute2-catalyzed cleavage and partially rescues severely impaired cleavage found in fish
    Abstract
    Introduction
    Results
        Inefficient slicing in zebrafish
        Two substitutions in a teleost ancestor explain the loss of efficient slicing
        An ancestral G–G mismatch within pre-miR-451 enhances cleavage
        Specific mismatches at miRNA position 6 enhance slicing of bound target
    Discussion
    Acknowledgements
    Author contributions
    Methods
    Supplemental information
    References
Chapter III. Discussion and future directions
    Implications of an efficient and cleavage competent zebrafish Argonaute2
    Effects of seed mismatches on target slicing
    Identifying determinants of target slicing rates
    Concluding remarks
    References
Appendix A. The bromodomain protein Brd4 insulates chromatin from DNA damage signaling
Appendix B. Poly(A)-tail profiling reveals an embryonic switch in translational control
Curriculum vitae

CHAPTER I

INTRODUCTION

Part I: RNA silencing pathways

RNA interference (RNAi) is believed to be the ancestor of all RNA silencing pathways that have since diverged and evolved over time. In the simplest of these pathways, a long double-stranded RNA (dsRNA) is cleaved into small interfering RNAs (siRNAs) that then associate with an Argonaute protein to guide the Argonaute-catalyzed slicing of transcript RNAs with extensive pairing to the siRNA. Ancestral RNAi seems to have functioned primarily as a defense mechanism against viruses, through cytoplasmic Argonaute-centric pathways, and transposons, through nuclear, Piwi-centric pathways (Shabalina and Koonin, 2008). This pathway was present in the last common ancestor of modern eukaryotes and has been maintained in various forms in most eukaryotic lineages. Modern RNAi has branched into three major pathways – 1) siRNA-mediated pathways that function primarily in viral defense and transposon silencing, 2) microRNA-mediated pathways in the regulation of eukaryotic gene expression, and 3) PIWI-interacting RNA-mediated pathways in transposon silencing and maintaining genome integrity in the germline (Shabalina and Koonin, 2008).

siRNAs

Small interfering RNAs (siRNAs) are a class of short 21–25-nt RNAs that led to the discovery of a novel set of pathways that transformed our understanding of gene regulation. The silencing of complementary targets by antisense RNAs was first observed in C. elegans in 1991 (Fire et al., 1991), but it was not until several years later that dsRNAs were found to be a far more potent and effective trigger for inducing gene silencing (Fire et al., 1998). The work of several groups demonstrated that long, perfectly base-paired dsRNAs could be processed into small 21–25-nt RNAs that would then go on to specifically silence target RNAs through sequence complementarity (Zamore et al., 2000).

siRNAs were initially thought to be primarily exogenous in origin and derived from invasive species - viruses, transgenes, or transposons, suggestive of their role in genome defense (Shabalina and Koonin, 2008). This defensive pathway could exploit invading genes by co-opting the foreign sequences, inserting them into the RNAi pathway, and suppressing their expression. In addition, endogenous dsRNAs have also been found to give rise to endogenous siRNAs (endo-siRNAs). These include convergent mRNA transcripts and RNA-dependent RNA polymerase (RdRP) substrates (Malone and Hannon, 2009).

miRNAs

MicroRNAs (miRNAs) are an abundant class of endogenous small ~22–24-nt RNAs that have been found to influence nearly all developmental processes and diseases (Bartel, 2018). The first miRNA was identified in 1993 when scientists discovered that the C. elegans gene lin-4 produced a short, noncoding RNA ~22 nt in length rather than coding for a protein (Lee et al., 1993). lin-4 was then found to have multiple sites of complementarity to the 3′ UTR of the lin-14 gene, and binding of the lin-4 small RNA to these sites in lin-14 drastically reduced the levels of LIN-14 protein without reducing the levels of lin-14 mRNA (Lee et al., 1993; Wightman et al., 1993). This was the first indication that, through sequence-specific complementarity to sites within the 3′ UTR, small RNAs could regulate and mediate translational repression. Seven years after this first discovery, the second miRNA, let-7, was identified and found to be conserved across bilaterian animals (Pasquinelli et al., 2000; Reinhart et al., 2000). At the time, as both lin-4 and let-7 were involved in regulating developmental timing, these two miRNAs were considered the first of a novel class of RNAs, temporal control genes involved only in development and initially referred to as small temporal RNAs. It was not until the first small-RNA cloning experiments from worms, flies, and mammals revealed hundreds of these small RNAs that the scope of this novel class of RNAs was fully appreciated, opening the doors of this mode of regulation across species (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001; Ruvkun, 2001).

Metazoan miRNAs are initially transcribed from endogenous non-protein-coding regions by RNA Polymerase II (Pol II) as a long primary RNA transcript called a pri-miRNA (Cai et al., 2004; Lee et al., 2004) (Figure 1). These pri-miRNAs can form hairpin structures, becoming substrates for Microprocessor, a heterotrimeric complex comprised of one Drosha endonuclease and two DGCR8 proteins. Microprocessor is responsible for the initial selection of which hairpins are true pri-miRNAs, acting as the gatekeeper of the miRNA biogenesis pathway (Fang and Bartel, 2015). Drosha, which contains two RNase III domains, asymmetrically cuts near the base of the pri-miRNA, generating a ~60-nt stem-loop structure called the miRNA precursor or pre-miRNA. Typical of RNase III enzymes, Drosha cleaves the pri-miRNA in a manner that leaves the stem of the hairpin with a 5′ monophosphate and a 2-nt 3′ overhang (Nguyen et al., 2015). This pre-miRNA hairpin is then exported to the cytoplasm by Exportin 5, a Ran-GTP-dependent nuclear-cytoplasmic transporter (Yi et al., 2003; Bohnsack et al., 2004), where it is further processed by Dicer, another endonuclease with two RNase III domains (Bernstein et al., 2001). Dicer recognizes the 5′ phosphate and 3′ overhang at the base of the stem to precisely position itself to make the second cut roughly two helical turns away from the base. In this manner, Drosha and Dicer exactly define the two ends of the miRNA duplex, with both ends containing a 5′ monophosphate and a 2-nt 3′ overhang. The cleaved product, which ranges from 21–25 nucleotides in length, becomes the mature miRNA duplex, comprised of the miRNA guide paired to its passenger strand.

The duplex is then loaded into an Argonaute protein with assistance from chaperone proteins Hsc70/Hsp90 in an ATP-dependent manner (Iwasaki et al., 2010). Argonaute undergoes a conformational change to allow for the binding of the small RNA duplex and forms the pre-RNA induced silencing complex (pre-RISC). The orientation of initial duplex loading determines which strand ultimately becomes the miRNA, or guide strand, of the silencing complex, and which strand is removed and degraded. The strand with the less thermodynamically stable 5′-terminal pairing is generally selected as the guide strand (Khvorova et al., 2003; Schwarz et al., 2003), with Argonaute preferring a 5′-terminal pU or pA (Frank et al., 2010; Suzuki et al., 2015). Once the duplex is loaded, the passenger strand is removed. The exact mechanism of passenger strand removal is still unclear, though one mechanism involves the nicking of the passenger strand by Argonaute2 and the subsequent unwinding and removal of the cleaved products (Matranga et al., 2005; Miyoshi et al., 2005; Rand et al., 2005), leaving a mature RISC comprised of an Argonaute protein loaded with a guide RNA.

The guide RNA then provides sequence specificity through direct base pairing with target RNA transcripts. Unlike siRNAs, miRNAs often do not have extensive complementarity to their target mRNAs but rather have partial pairing between the miRNA seed (miRNA nucleotides 2–7) and sites within the 3′ UTR of target mRNAs. These regulatory sites often have an additional match to miRNA nucleotide 8 or an A across from miRNA nucleotide 1, or both, making 7- or 8-nt sites, respectively (Lewis et al., 2005). Some have additional pairing to the 3′ region of the miRNA (miRNA nucleotides 13–16), supplementing pairing to the seed region and aptly termed the supplemental region. Hundreds of miRNAs have since been identified and characterized, and many have been found to be conserved across species. This widespread and dominant RNA silencing pathway directs the posttranscriptional repression of hundreds of mRNA targets and is integral to nearly all biological processes and disease.
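The site types described above can be illustrated with a short sketch. The Python function below is not part of the thesis; it simply scans a 3′ UTR for matches to guide nucleotides 2–7 and checks for the additional match to nucleotide 8 and the A across from nucleotide 1. The labels 6mer, 7mer-m8, 7mer-A1, and 8mer are the names commonly used for these sites and are assumed here, as is the let-7a sequence, which is included only for illustration and should be checked against a miRNA database before any real use. The UTR is a made-up test string.

```python
def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def classify_seed_sites(guide, utr):
    """Return (position, site_type) for every seed match in utr.

    position is the 0-based index in utr of the 6-nt match to guide
    nucleotides 2-7. Both sequences are written 5'-to-3'.
    """
    seed_match = reverse_complement(guide[1:7])  # pairs to guide nt 2-7
    m8_base = reverse_complement(guide[7])       # pairs to guide nt 8
    sites = []
    start = utr.find(seed_match)
    while start != -1:
        end = start + 6
        # A match to guide nt 8 sits immediately 5' of the seed match.
        has_m8 = start > 0 and utr[start - 1] == m8_base
        # The A across from guide nt 1 sits immediately 3' of the seed match.
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            site_type = "8mer"
        elif has_m8:
            site_type = "7mer-m8"
        elif has_a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        sites.append((start, site_type))
        start = utr.find(seed_match, start + 1)
    return sites


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # assumed illustrative guide sequence
    toy_utr = "AAGCUACCUCAUUGGUACCUCAGGCUACCUCGUUUUACCUCG"  # hypothetical
    for position, site_type in classify_seed_sites(let7a, toy_utr):
        print(position, site_type)
```

The toy UTR was built to contain one site of each type, so the sketch should report an 8mer, a 7mer-A1, a 7mer-m8, and a 6mer in that order. Real target prediction weighs many further features (site context, conservation, supplemental pairing) that this sketch ignores.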

piRNAs

PIWI-interacting RNAs (piRNAs) are a class of broadly conserved and mechanistically distinct small RNAs that are typically 24–31 nt in length and expressed almost exclusively in the germline (Iwasaki et al., 2015). This class of small RNA molecules was first discovered in Drosophila in an analysis of how Stellate, a repetitive protein-coding gene, was silenced in the male germline (Aravin et al., 2001). Later RNA-profiling studies of Drosophila testes and early embryos expanded this class of piRNAs (Aravin et al., 2003). These piRNAs were found to associate with PIWI proteins and form the piRNA-induced silencing complex (piRISC) to silence transposable elements in the germline (Aravin et al., 2006; Girard et al., 2006; Lau et al., 2006).

Unlike siRNAs or miRNAs, piRNAs are processed from single-stranded precursor transcripts expressed from intergenic regions known as piRNA clusters by a Dicer-independent pathway (Vagin et al., 2006). These clusters harbor a large number of transposable elements and are the basis of targeting by piRNAs via two biogenesis pathways – the primary pathway and the secondary or ping-pong pathway (Siomi et al., 2011). The primary pathway functions in both germ cells and the surrounding somatic cells, whereas the ping-pong pathway exists solely in the germline. In the primary pathway, long primary transcripts are exported from the nucleus, processed into intermediates, trimmed to their mature length, 2′-O-methylated at the 3′ end, and loaded into a PIWI protein to produce the mature PIWI-piRNA complex or piRISC (Li et al., 2009; Malone et al., 2009; Saito et al., 2009). These complexes are then transported back into the nucleus to transcriptionally regulate target genes. In the ping-pong pathway, two different PIWI proteins, Aubergine and AGO3, work together to simultaneously silence transposons and amplify piRNAs (Brennecke et al., 2007; Gunawardane et al., 2007). The piRNA pathway is essential for gametogenesis, gonadal development, and male and, to a lesser degree, female fertility (Siomi et al., 2011; Iwasaki et al., 2015).

Part II: The role of Argonaute in RNA silencing pathways

A brief history of RISC

RISC was initially characterized in Drosophila S2 cell extracts as the enzyme complex responsible for RNAi (Hammond et al., 2000). When extracts from S2 cells transfected with long dsRNA (but not sense or antisense single-stranded RNA) corresponding to either cyclin E or lacZ were incubated with synthetic mRNAs for cyclin E or lacZ, only extracts from cells transfected with cyclin E dsRNA were capable of degrading the cyclin E transcript, and likewise for the lacZ transcripts (Hammond et al., 2000). Further studies demonstrated that pre-incubation of dsRNA in lysates enhanced its ability to inhibit gene expression, suggesting the need for these dsRNAs to be converted into an active form. This conversion was discovered to be the processing of both strands of dsRNA into RNA segments of 21–23 nucleotides in length (Tuschl et al., 1999; Zamore et al., 2000). These early studies proposed that the dsRNA-dependent gene silencing mechanism of RNAi was accomplished by a sequence-specific nuclease complex, which incorporated small RNAs as guides to target specific mRNAs based on sequence recognition. This complex was termed RISC. Though Argonaute was identified as a component of RISC through biochemical fractionation experiments, in which Argonaute proteins were found to co-fractionate with sequence-specific nuclease activity (Hammond et al., 2000), it was not until later that it was revealed to be the active ‘Slicer’ component of RISC (Liu et al., 2004; Song et al., 2004).

Structure of Argonaute proteins

Argonaute proteins are composed of four conserved domains: the N-terminal, PAZ (Piwi-Argonaute-Zwille), MID, and PIWI domains. Early structural work showed that the N, PIWI, and MID domains form a crescent-shaped base with the PAZ domain above the crescent (Song et al., 2004). The N domain participates in RISC assembly and functions as a wedge, prying the two strands of the loaded duplex apart, leading to the unwinding of the duplex (Kwak and Tomari, 2012). The MID domain adopts a Rossmann fold and has a basic 5′ binding pocket that recognizes and binds the 5′ end of the guide RNA (Song et al., 2004). The PAZ domain has a deviant oligonucleotide/oligosaccharide-binding (OB) fold with a central cleft lined with aromatic residues and binds the single-stranded 3′ end of guide RNAs (Lingel et al., 2003; Ma et al., 2004). The PIWI domain, at the carboxyl terminus of Argonaute, sits in the center of the base and shares structural homology with the RNase H family of enzymes (Song et al., 2004). RNase H enzymes are DNA-guided, Mg2+-dependent nucleases that cleave the RNA strand of DNA-RNA hybrids using the conserved active site tetrad, Asp-Glu-Asp-Asp (DEDD), to produce both a 5′ product with a 3′ hydroxyl and a 3′ product with a 5′ monophosphate. RISC, though RNA-guided, is also Mg2+-dependent and produces the same products. Argonaute was also found to use the conserved RNase H motif (DEDD in yeast and DEDH in mammals) to coordinate metal ion binding and catalysis (Nakanishi et al., 2012). The glutamate (E) of this motif was discovered only recently; a conformational change by Argonaute is required to insert it into the active site and complete the catalytic tetrad.

Target recognition and binding

Within Argonaute, the 5′ end of the miRNA guide is flipped out and buried within the MID domain, unable to bind to the target, whereas the 3′ end is captured and bound by the PAZ domain (Ma et al., 2005; Parker et al., 2005) (Figure 2). Structural studies have revealed that Argonaute has a binding pocket that specifically binds an A in the target across from this first guide position (Schirle et al., 2015), and the preference for an A across from the first nucleotide of the guide is observed regardless of miRNA sequence (Lewis et al., 2005). Guide nucleotides 2–5 are splayed out in a pre-organized, stacked helical conformation with their Watson–Crick faces exposed. Past nucleotide 5, α-helix 7 of Argonaute introduces a kink to its guide RNA by inserting a hydrophobic residue between guide nucleotides 6 and 7 that creates a steric block (Schirle and MacRae, 2012; Schirle et al., 2014). Thus, target recognition and binding first occurs through Watson–Crick pairing between the target and seed nucleotides 2–5 (Figure 2, step 1). Argonaute and its guide then work cooperatively to recognize target RNAs – α-helix 7 shifts and relaxes the kink, which allows guide nucleotides 6 and 7 to adopt an A-form conformation. This conformational change allows pairing to propagate to nucleotides 6–8 and opens up and stabilizes the N-PAZ channel (Figure 2, step 2).

It is proposed that pairing then skips from the seed region to a stretch of nucleotides in the 3′ end of the miRNA, known as the supplemental region (nucleotides 13–16). The nucleotides in this region are again exposed and available for target recognition. In this model, pairing initiates in the seed region, at which point the target RNA loops out of Argonaute to pair with the supplemental region (Figure 2, step 3). Pairing would then propagate in both directions from this second nucleation site. This allows the complex to avoid the topological constraints associated with twice wrapping the guide and target RNA around each other within the central binding channel. Here, the seed-paired region would remain fixed while the supplementary-paired region rotates and propagates pairing 3′-to-5′ along the miRNA back toward the seed. In the opposite 5′-to-3′ direction, as the helix rotates, the 3′ end of the guide is pulled from its binding pocket in the PAZ domain and base pairing is extended to the 3′ end of the guide (Figure 2, step 4). Once target pairing is complete, Argonaute can either precisely slice the target or recruit accessory proteins to promote other types of repression.

More recent studies have delved further into the distinct steps surrounding Argonaute-catalyzed cleavage – binding and product release. Using total internal reflection fluorescence microscopy, studies on the effect of dinucleotide mismatches along the length of the guide RNA confirmed the importance of seed binding and revealed seed subdomains; specifically, seed nucleotides do not all contribute equally to target binding. Dinucleotide mismatches at guide positions 2–5 reduce the on-rate, or kon, 6- to 10-fold, whereas mismatches at positions 6–8 only reduce kon 1.3-fold, as compared to a fully seed-matched target (Salomon et al., 2015). Thus, pairing of nucleotides 2–5 contributes more to the initial binding of RISC to target RNAs than does pairing to nucleotides 6–8. This pre-organization of the seed for binding is exploited by Argonaute to accelerate target finding. In the absence of Argonaute, nucleic acid hybridization is mainly influenced by complementarity – the greater the complementarity, the greater the number of possible nucleation sites. However, in the presence of Argonaute, targets complementary to just the seed (2–8), targets complementary to the seed and 3′ supplemental region (2–8 and 13–16), and targets fully complementary to the guide (2–21) all bind equally well, with similar diffusion-limited on-rates, suggesting that seed pairing is the initial catalyst for target binding and that once the seed has been bound within RISC, target finding is accelerated. Mismatches within the seed region and targets that lack seed pairing thus significantly lower the rate at which RISC binds target RNAs.

miRNA-mediated gene silencing

In humans and other bilaterian animals, the miRNA pathway mediates the posttranscriptional silencing of complementary target mRNAs mainly by accelerating target mRNA decay and degradation and in some cases by decreasing translational efficiency. Once RISC has bound its target by base pairing to partially complementary sites in the 3′ UTR, Argonaute recruits the highly unstructured scaffolding protein, TNRC6 (Jonas and Izaurralde, 2015). The unstructured nature of TNRC6, along with its numerous tryptophan-glycine repeats, facilitates this interaction, whereby two tryptophan residues on TNRC6 are inserted into tandem tryptophan-binding pockets on the surface of the PIWI domain of Argonaute (Schirle and MacRae, 2012). TNRC6 simultaneously associates with the poly(A)-tail-bound poly(A)-binding protein (PABPC) through its PABP-interacting motif 2 (PAM2) domain and with the deadenylase complexes, PAN2-PAN3 through PAN3 and CCR4-NOT through a subunit of the complex, NOT9. PAN2-PAN3 is thought to initiate the early phase of deadenylation, in which poly(A) tails are shortened in a distributive manner until the tails have reached a certain length, at which point CCR4-NOT takes over to deadenylate the tails in a processive manner (Yamashita et al., 2005; Wahle and Winkler, 2013). Following deadenylation, the mRNA is decapped by the decapping enzyme, decapping protein 2 (DCP2), which requires and is enhanced by the cofactors DCP1, EDC3, EDC4, PATL1, and DEAD box protein 6 (DDX6). DCP2 hydrolyzes the cap structure, leaving a 5′ monophosphorylated mRNA, the ideal substrate for the 5′-to-3′ exoribonuclease 1 (XRN1), which rapidly degrades the deadenylated and decapped mRNA (Jonas and Izaurralde, 2015).

In most cellular contexts, the deadenylase complexes shorten the poly(A) tail, which subsequently causes the destabilization of the mRNA through decapping and 5′-to-3′ exonucleolytic decay. However, in some cases, tail shortening instead decreases translational efficiency. Thus, though miRNA-mediated gene silencing initially acts to shorten poly(A) tail lengths, the consequences of tail shortening can differ. A prominent example of these two regimes was uncovered during embryonic development, where in pre-gastrulating, transcriptionally inactive embryos, tail shortening predominantly decreases translational efficiency, whereas in post-gastrulating, transcriptionally active embryos, tail shortening predominantly decreases mRNA stability, hinting at a switch in the mode of translational control during development (Subtelny et al., 2014). Global analyses indicate that mRNA decay accounts for ~66–90% of repression in post-embryonic cells, with translational repression accounting for ~6–26% (Eichhorn et al., 2014). The molecular mechanism of the translational repression occurring in post-embryonic cells is still unclear; however, it is generally thought that miRNAs inhibit cap-dependent translational initiation by interfering with the eukaryotic initiation factor 4F (eIF4F) complex (Jonas and Izaurralde, 2015).

Argonaute2-catalyzed slicing

The vast majority of miRNA sites in mammalian mRNAs have little more than seed pairing and utilize this binding mode for miRNA-mediated repression. However, for the few targets that have extensive complementarity to the miRNA, repression can be mediated through Argonaute-catalyzed slicing rather than mRNA decay or translational repression. For example, with the exception of a G–U wobble at guide nucleotide position 5, miR-196a base pairs perfectly to a site within the 3′ UTR of the HOXB8 mRNA and directs its cleavage (Yekta et al., 2004). The sequences of both miR-196a and HOXB8 are highly conserved and provide an instance of endogenous miRNA-directed cleavage. However, few mammalian RNAs have such a high degree of complementarity to their miRNAs; indeed, to date, only 21 cleavage targets of mammalian miRNAs have been identified (Davis et al., 2005; Shin et al., 2010; Hansen et al., 2011).

In these rare cases, Argonaute slices the target at the phosphodiester bond positioned opposite guide nucleotides 10–11. This precise positioning and cleavage is achieved by the initial synergistic recognition and binding of the target by both Argonaute and its guide. Once the target has bound the guide at both nucleation sites and pairing begins to propagate, the 3′ end of the guide is released from the PAZ domain. Argonaute then undergoes a conformational change which precisely positions the target and inserts an active site glutamate into the catalytic core to complete the conserved DEDH catalytic tetrad. Similar to RNase H enzymes, Argonaute-catalyzed cleavage produces two products, a 5′ product with a 3′ hydroxyl and a 3′ product with a 5′ monophosphate. Once the slicing reaction is complete, the two cleaved products are released from Argonaute and degraded by the 3′-to-5′ exosome ribonuclease complex and the 5′-to-3′ exoribonuclease 1 (XRN1), respectively. Kinetic analyses with limiting programmed RISC and excess target RNA revealed that RISC is a true multiple-turnover enzyme, able to catalyze multiple rounds of RNA cleavage (Hutvagner and Zamore, 2002; Haley and Zamore, 2004).
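The position of the scissile bond can be made concrete with a small sketch. The Python below is an illustration rather than a method from the thesis: it assumes a target containing a site that is perfectly complementary to the whole guide (including guide nucleotide 1), locates that site, and splits the target at the bond opposite guide nucleotides 10–11. The function names and the example sequences are hypothetical.

```python
def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def slice_products(guide, target):
    """Split target at the bond opposite guide nucleotides 10 and 11.

    Assumes the target contains a site fully complementary to the guide.
    Returns (five_prime_product, three_prime_product), or None when no
    fully complementary site is found. Sequences are written 5'-to-3'.
    """
    site = reverse_complement(guide)
    site_start = target.find(site)
    if site_start == -1:
        return None
    # The last 10 nt of the site pair with guide nt 10..1, so the cut
    # falls 10 nt upstream of the site's 3' end.
    cut = site_start + len(guide) - 10
    return target[:cut], target[cut:]


if __name__ == "__main__":
    guide = "UGAGGUAGUAGGUUGUAUAGUU"  # assumed illustrative guide sequence
    target = "GGG" + reverse_complement(guide) + "AAA"  # hypothetical target
    five_prime, three_prime = slice_products(guide, target)
    print("5' product:", five_prime)
    print("3' product:", three_prime)
```

In this sketch the 3′ product begins with the 10 nucleotides that pair to guide nucleotides 1–10, which is why the 3′ product retains complementarity to the seed.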

After each successive round of catalysis, Argonaute must release both its 5′ and 3′ cleaved products before it can search for a new target. Tethering studies have found that though sequence identity plays a role in product dissociation, the 5′ product is nearly always released first, followed by the slow dissociation of the 3′ product, which is complementary to the seed (Salomon et al., 2015). The rates of dissociation and subsequent product release are similarly influenced by seed sequence and pairing. RISC dissociated from targets containing a dinucleotide mismatch to the seed 70 to 3,200 times faster than from a fully seed-matched target; even a single mismatch to nucleotide 8, converting a 7-nt match to a 6-nt match, increased the koff 24-fold (Salomon et al., 2015). These data support a model in which seed pairing is critical for both initial binding and subsequent release, where Argonaute proteins not only bind seed-mismatched targets more slowly but also remain bound to them for less time than to fully seed-matched targets.

Interestingly, dissociation rates varied between the different types of targets (seed

matched, seed plus 3′ supplemental, and fully complementary) for fly and mouse Argonaute,

perhaps reflecting the different modes of repression used by these two organisms (Wee et al.,

2012). For the invertebrate fly, RISC dissociated slowly from a fully complementary target (half-life, t1/2 ~2.2 hr). Thus, it is likely that every target bound by fly Ago2-RISC will be sliced before

it dissociates, as the calculated kcat, with a t1/2 ~11 sec, is much faster than the rate of

dissociation. Targets that had only seed pairing or seed plus 3′ supplementary pairing

dissociated far more quickly from RISC, t1/2 ~15 sec and ~19 sec, respectively. For the mammalian mouse, RISC dissociated equally slowly from targets that were paired only to the seed region (t1/2 ~23 min), targets paired to the seed plus 3′ supplementary region (t1/2, ~25 min), and targets fully complementary to the entire guide RNA (t1/2, ~15 min). Given the

relatively slow kcat for mouse AGO2-RISC (t1/2 ~14 min), a fully complementary target is equally

likely to be sliced or released by RISC (Wee et al., 2012). These differences between fly and mammalian Argonaute dynamics could reflect the different means of gene silencing: siRNA-directed endonucleolytic cleavage in plants and invertebrates versus miRNA-mediated repression in mammals.
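The comparison of catalytic and dissociation half-lives above can be made concrete with a short calculation. This is a simplified sketch, not the analysis performed by Wee et al. (2012): it assumes that slicing and dissociation of a bound, fully complementary target are two competing first-order processes, so the fraction sliced is kcat / (kcat + koff). The function names are illustrative.

```python
import math


def rate_from_half_life(t_half_s: float) -> float:
    """Convert a first-order half-life (seconds) into a rate constant (per second)."""
    return math.log(2) / t_half_s


def fraction_sliced(t_half_cat_s: float, t_half_off_s: float) -> float:
    """Fraction of bound targets sliced before they dissociate.

    Assumes two competing first-order processes (slicing vs. release).
    """
    k_cat = rate_from_half_life(t_half_cat_s)
    k_off = rate_from_half_life(t_half_off_s)
    return k_cat / (k_cat + k_off)


# Half-lives quoted in the text for a fully complementary target.
fly = fraction_sliced(t_half_cat_s=11, t_half_off_s=2.2 * 3600)
mouse = fraction_sliced(t_half_cat_s=14 * 60, t_half_off_s=15 * 60)
print(f"fly: {fly:.3f}, mouse: {mouse:.3f}")  # fly: 0.999, mouse: 0.517
```

Under this simplification, nearly every target bound by fly Ago2-RISC would be sliced, whereas a target bound by mouse AGO2-RISC would be sliced roughly half the time, matching the qualitative conclusions stated above.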

Separate in vitro biochemical studies revealed that dinucleotide mismatches in the seed region have a greater negative effect under single-turnover conditions ([E] > [S]) than under multiple-turnover conditions ([E] << [S]), suggesting that the mismatches enhanced a step that matters only when RISC catalyzes multiple rounds of target cleavage, presumably product release or enzyme regeneration (Wee et al., 2012). In other words, mismatches that inhibit binding and catalysis could enhance product release and enzyme turnover.

Mammalian Argonaute proteins

Mammals have eight Argonaute proteins that can be split into two subfamilies: the Ago subfamily and the Piwi subfamily. Proteins of the Piwi subclade are expressed mainly in germ cells as core components of the piRNA pathway, which silences transposons and plays essential roles in spermatogenesis (Siomi et al., 2011). The other four mammalian Argonaute proteins, in the Ago subclade, are ubiquitously expressed and act as effector proteins in the miRNA pathway, the dominant RNA silencing pathway of somatic cells. Of the four mammalian Argonaute proteins (AGO1, AGO2, AGO3, and AGO4), only Argonaute2 was consistently found to possess endonucleolytic cleavage activity. Purified RISC complexes containing individual Argonaute proteins were tested for slicing activity, and only Argonaute2-containing RISC was able to catalyze cleavage, despite all proteins being expressed at similar levels and binding similar amounts of transfected siRNA (Liu et al., 2004). In vitro reconstitution assays confirmed that purified AGO2 alone could be programmed with siRNAs to cleave a complementary substrate. Further, this specific endonuclease activity is critical for development; loss of AGO2 in mice is embryonic lethal. Despite the overlapping expression levels and patterns of the other Argonaute proteins, AGO2-deficient mice were severely developmentally delayed, with defects in neural tube closure and placental development and with enlarged hearts that resulted in cardiac failure, perhaps indicating a specialized role for Argonaute2 or slicing activity in mammals (Liu et al., 2004; Cheloufi et al., 2010; Jee et al., 2018).

All four Argonaute proteins are highly similar in their primary sequences; however, Argonaute2 is the only mammalian Argonaute protein capable of consistent endonuclease activity. This activity depends mainly on the conserved catalytic tetrad, DEDH, within the PIWI domain. AGO1 and AGO4 both have incomplete tetrads, DEDR and DEGR, respectively; interestingly, AGO3 has an intact catalytic tetrad but does not possess consistent slicing activity, suggesting that a complete tetrad is not sufficient for slicing activity. Recently, however, AGO3 purified from insect cells has been shown to have some selective slicer activity (Park et al., 2017). For the remaining Argonaute proteins, several labs have identified key features needed to activate these inactive slicer enzymes. These studies involved large domain swap experiments, substituting each functional domain of AGO2 with the equivalent domain of the other AGO protein, to narrow down the regions of Argonaute2 that contribute to activity (Figure 3).

For the simplest of the Argonaute proteins, AGO3, which already contained an intact catalytic tetrad, it was discovered that wild-type protein has retained some endogenous cleavage activity, though slicing is highly dependent on guide RNA sequence (Park et al., 2017).

In the absence of these specific substrate requirements, AGO3 is unable to catalyze the cleavage of targets it encounters. To robustly activate AGO3 slicing activity, chimeric experiments were performed, and slicing activity was found to be maintained when the PIWI domain of AGO3 was simply swapped into AGO2. However, the reciprocal swap, the AGO2 PIWI domain in AGO3, was insufficient to activate slicing activity. Thus, though the AGO3 PIWI domain itself was sufficient for cleavage activity, something in the remainder of the protein was inhibiting cleavage activity. Further chimeric experiments revealed that the N-terminal domain of

AGO3 contained inhibitory sequences and identified two short sequence elements, NTI and

NTII, within this region (amino acids 1–64 and 137–160 or amino acids 44–48 and 134–166 depending on the study) that were responsible for inactivating slicing activity (Hauptmann et al., 2013; Schurmann et al., 2013) (Figure 3). Thus, replacing the sequences of AGO3 NTI and NTII with the corresponding AGO2 sequence activates AGO3 slicing activity.

Human AGO1, despite also being relatively well conserved and displaying no obvious structural changes, required several additional changes to turn it into an active slicer. Similar to

AGO3, the AGO1 PIWI domain, with the restored catalytic tetrad (DEDR to DEDH), when swapped into the AGO2 protein was sufficient to restore slicing activity, but the reciprocal swap was not, again suggesting other inhibitory sequences in the rest of the protein that abrogated slicing activity. Studies revealed two AGO1-specific clusters (cl1 and cl2) in the PIWI domain

close to the catalytic residues (Hauptmann et al., 2013). Replacing the sequence of AGO1-cl1

with the corresponding sequence of AGO2-cl1 had no effect on slicing activity; however,

replacing cl2 restored some function, but only when the AGO1 PIWI domain with these changes

was placed in the AGO2 backbone. To identify other regions of AGO1 that were inhibiting

activity, studies turned to the two motifs found in the N domain that had proved crucial in

restoring activity to AGO3. One group showed that an AGO1 containing amino acids 1–64 (NTI)

and cl2 from AGO2 as well as an active catalytic tetrad was sufficient to catalyze slicing

(Hauptmann et al., 2013). A separate group narrowed down cl2 to a single amino acid change,

L674F (Faehnle et al., 2013) (Figure 3). Structural comparison studies modeling the AGO1

PIWI domain onto the AGO2 structure revealed that the AGO1-specific cluster, cl2, changes a

helix in AGO2 to an unstructured loop in AGO1, perhaps mis-orienting an active site residue

and abolishing slicing (Faehnle et al., 2013; Hauptmann et al., 2013).

AGO4 is the least conserved of the four Argonaute proteins, as it is missing two of the four catalytic residues (DEGR rather than DEDH). Comparative sequence analysis and chimeric AGO2-AGO4 proteins were used to uncover the determinants for restoring activity to AGO4. The prior requirements for AGO1 and AGO3 were also necessary for AGO4, including a complete catalytic tetrad and the AGO2 sequences from both N-terminal motifs (NTI and NTII) and cl2 in the PIWI domain. However, the deletion of an AGO4-specific insertion (amino acids 631–640) was also necessary for slicing activity (Hauptmann et al., 2014) (Figure 3). The insertion is predicted to form two β strands connected by an unstructured loop adjacent to the catalytic glutamate, perhaps affecting the ability of the glutamate to be inserted into the active site or the conformational change required by AGO.

Taken together, the four mammalian Argonaute subfamily members revealed four distinct features of Argonaute slicing activity. First, cleavage requires a complete catalytic tetrad composed of DEDH, which is already present in AGO2 and AGO3 but must be restored in AGO1 and AGO4. Second, two structural motifs in the N domain (NTI and NTII), amino acids 1–64 and 137–160 or amino acids 44–48 and 134–166, depending on the study, inhibit slicing activity (AGO1 only requires replacement of the first; AGO3 and AGO4 require both). Third, a short sequence cluster, termed cl2, in the PIWI domain converts an α-helix into an unstructured loop, likely mis-orienting the catalytic center (AGO1, AGO3, and AGO4). Fourth, a short insertion composed of two β strands near the catalytic glutamate must be deleted, as it also likely affects the active site (AGO4).

Part III: Pre-miR-451

Many studies have indicated that the four mammalian Argonaute proteins share similar expression patterns and a similar pool of bound RNAs; however, only Argonaute2 was shown to be essential during development (Cheloufi et al., 2010). It was unclear what the basis for this requirement was – whether it was due to the loss of catalytic cleavage activity or to the unique

expression pattern of Argonaute2, the sole Argonaute protein expressed in extra-embryonic

tissues, during development (Cheloufi et al., 2010). Mutant mouse studies with a catalytically

inactive Argonaute2 demonstrated that while cleavage competence was not required for development, it was required for viability. Neonates with catalytically dead Argonaute2 were born with no gross morphological defects but became pale after birth, indicative of anemia, and died within a few hours (Cheloufi et al., 2010). This anemic phenotype prompted a full blood cell count, which revealed that these mutant mice had an increase in proerythroblasts (pro-E) but a ~50% reduction of peripheral red blood cells, hinting at a block in erythrocyte maturation, specifically at the transition of pro-E cells into basophilic erythroblasts (Cheloufi et al., 2010). Similarly, mutant zebrafish lacking both maternal and zygotic copies of Argonaute2

(MZago2) showed no morphological defects during gastrulation and had normal brain and heart

development but showed a significant reduction in hemoglobinized erythrocytes (Cifuentes et

al., 2010). These phenotypes indicated a role for Argonaute2 catalytic activity in erythropoiesis

and led researchers to profile miRNA-directed, Argonaute2 slicing products. RNA sequencing data from mice with a catalytically dead Argonaute2 and 48-hr MZago2 zebrafish embryos showed that nearly all miRNAs were present at identical levels between wild-type and mutant animals with the exception of one – miR-451, a miRNA conserved across vertebrates and previously implicated in the development of pro-E into basophilic erythroblasts (Cheloufi et al.,

2010; Cifuentes et al., 2010).

miR-451 and miR-144 form a miRNA cluster and are abundantly co-expressed from the same primary transcript in erythroid cells (Figure 4). Both primary miRNAs are processed by Drosha to generate their pre-miRNA structures; however, that is where the similarities end. miR-144 has the canonical pre-miRNA structure and is processed by Dicer to generate the mature miRNA duplex, where the guide strand maps to the stem of the hairpin structure (Figure 4A, left). The annotated sequence of the mature miR-451 guide strand, on the other hand, spans the loop region of its hairpin and extends into the stem, seemingly incompatible with Dicer-catalyzed cleavage (Figure 4A, right). Moreover, pre-miR-451 is a 42-nt hairpin with an unusually short, 17-nt stem (Dicer substrates typically require a stem length of > 19 nt), and miR-451 was one of the few miRNAs refractory to the loss of Dicer in both mice and zebrafish. Together, these data were all suggestive of a Dicer-independent and Argonaute-dependent pathway (Cheloufi et al., 2010; Cifuentes et al., 2010).

Going a step beyond the genetic and sequencing data, in vitro cleavage assays monitoring 5′ end-labeled pre-miR-451 incubated with affinity-purified wild-type and inactive

Argonaute2 complexes confirmed that Argonaute2 was in fact responsible for the cleavage of

pre-miR-451 (Yang et al., 2010). Only wild-type Argonaute2 was able to produce the expected

product with band fragments consistent with Argonaute2-catalyzed slicing. When a mismatch to

the predicted cleavage site was introduced, slicing was abolished (Cheloufi et al., 2010). It was reported several years later that, after Argonaute2 makes the initial cut, poly(A)-specific ribonuclease (PARN) is the enzyme that trims the cleaved pre-miR-451 in the 3′-to-5′ direction to the appropriate length (Yoda et al., 2013).

Structural studies that varied aspects of pre-miR-451, including loop length, stem length, and sequence composition, revealed that it was the secondary structure, rather than the specific sequence, of pre-miR-451 that was critical for Argonaute2 loading and cleavage (Yang et al., 2012). Pre-miR-451 is surprisingly well base paired through its stem and seems to require most of this pairing for function. Mismatches or unpaired bases along the stem drastically impaired its ability to be sliced, indicative of a need for a high degree of hairpin structure (Yang et al., 2012). A notable exception to this extensive base pairing involves guide nucleotide 6, an invariant G, and nucleotide 35, which lies opposite guide position 6. Position 35 is highly variable and has mutated throughout evolution (Figure 4B). Ancestrally, and still in fish and amphibian species, this nucleotide is a G, creating a G–G mismatch with position 6. However, G35 mutated to a C in an amniote ancestor that gave rise to reptiles, birds, and mammals, changing the ancestral G–G mismatch to a G–C Watson–Crick match. It has since mutated again, to a U in some species, including humans, or back to the ancestral G in others. Primates possess all three possibilities, with humans acquiring the U, gibbons retaining the C, and Old World monkeys reverting to the G (Figure 4B). Notably lacking is an A at this position. This hypervariable nucleotide was shown to play a role in the processing of miR-451, with both the G–G and G–A mismatches enhancing activity, or more specifically, promoting more efficient resection of cleaved pre-miR-451 (Yang et al., 2012).
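The pairing states at positions 6 and 35 described above can be tabulated with a small sketch. The classification rules are standard RNA pairing conventions (G–C and A–U as Watson–Crick pairs, G–U as a wobble pair) rather than anything specific to the cited studies, and the function name is illustrative.

```python
# Illustrative sketch: classify the pair formed by the invariant G at guide
# position 6 with each possible nucleotide at position 35.
# Assumption: standard RNA pairing rules; anything else is called a mismatch.

WATSON_CRICK = {frozenset("GC"), frozenset("AU")}
WOBBLE = {frozenset("GU")}


def classify_pair(nt_a: str, nt_b: str) -> str:
    """Return 'Watson-Crick', 'wobble', or 'mismatch' for two RNA nucleotides."""
    pair = frozenset((nt_a, nt_b))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


for nt35 in "GCUA":
    print(f"G6-{nt35}35: {classify_pair('G', nt35)}")
```

Of the four possibilities, only the two mismatches (G–G and G–A) are the states reported in the text to enhance processing.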

Though the mature miRNA sequence extended into the loop, neither loop length nor loop sequence was essential or affected pre-miR-451 processing. Stem length, however, did

play a role. Pre-miR-451 has a 17-nt stem and is cleaved by Argonaute2 between nucleotides

30 and 31, which lie across from guide nucleotides 10–11, the typical

Argonaute2 cleavage site for both passenger strand and target cleavage. Extending the stem by

1–3 base pairs still results in efficient slicing by Argonaute2, and perhaps surprisingly, even

stems extended by 5–8 base pairs were processed, though these were suitable substrates for

both Argonaute2 and Dicer. Thus, while Argonaute2 may prefer shorter substrates, it is also

capable of slicing longer Dicer substrates. These structural pre-miR-451 studies hinted at a

novel method of introducing and loading siRNAs into Argonaute2. Reprogramming the backbone of miR-451 by replacing its sequence with that of other miRNAs while maintaining its secondary structure generated a similar amount of cleaved and loaded Argonaute2, equally capable of slicing and repressing targets (Cheloufi et al., 2010; Cifuentes et al., 2010; Yang et al., 2010; Yang et al., 2012), providing a potential alternative to the traditional shRNA approaches.
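As a quick check of the hairpin numbering used in this section, the sketch below derives the cleavage site from the one explicitly stated pair (guide nucleotide 6 opposite nucleotide 35). It assumes an uninterrupted stem in which paired positions always sum to the same constant; the function names are illustrative.

```python
from typing import Tuple

# Assumption: the stem is one uninterrupted helix, so paired positions i and j
# always sum to the same constant. The constant is taken from the one pair
# stated in the text (guide nucleotide 6 opposite nucleotide 35).
ANCHOR_PAIR: Tuple[int, int] = (6, 35)


def opposite(position: int, anchor: Tuple[int, int] = ANCHOR_PAIR) -> int:
    """Return the hairpin position lying across from `position`."""
    return sum(anchor) - position


def bond_opposite(guide_a: int = 10, guide_b: int = 11) -> Tuple[int, int]:
    """Return the two 3'-arm nucleotides flanking the bond opposite two guide nucleotides."""
    low, high = sorted((opposite(guide_a), opposite(guide_b)))
    return low, high


print(bond_opposite())  # (30, 31)
```

Under this assumption, the positions across from guide nucleotides 10 and 11 are 31 and 30, in agreement with the reported cleavage between nucleotides 30 and 31.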

Part IV: RNAi in zebrafish

Since the discovery of RNAi, scientists have leveraged this system to exogenously and specifically knock down genes of interest. This technology has transformed the way researchers study gene function and has greatly benefited the field of biomedical research. It was quickly adapted in worms, flies, mice, and mammalian cell culture using both plasmid-based and chemically synthesized RNAs.

Despite the widespread use of RNAi across species, this knockdown technique has proved recalcitrant in zebrafish (Kelly and Hurlstone, 2011). The first attempts at targeted gene knockdown in zebrafish involved microinjecting dsRNA fragments of various lengths into one-cell embryos (Wargelius et al., 1999; Li et al., 2000). These initial tests observed gene knockdown, as detected by phenotypic analysis, with dsRNAs corresponding to the genes floating head and no tail, chosen because their loss produces easily recognizable mutant phenotypes. However, an equal percentage of general defects was also observed regardless of the dsRNA sequence injected (Wargelius et al., 1999). These reports were followed by a series of studies that claimed the injection of dsRNAs led to global and non-specific defects in zebrafish. Injection of dsRNA corresponding to the T-box gene tbx1/spadetail (spt) caused rapid loss of tbx1/spt mRNA and mimicked the phenotype of the spt mutant; however, the loss of mRNAs from genes unrelated to spt, including β-catenin, stat3, and no tail, was also observed via in situ hybridization. In fact, blind studies comparing the effects of injecting dsRNA from the zebrafish tbx/spt, nieuwkoid/bozozok, and no tail genes and the bacterial lacZ gene could not distinguish between the various dsRNAs (Oates et al., 2000). Further studies reported that RNAi caused non-specific and abnormal development of zebrafish embryos as early as the 32-cell stage (Zhao et al., 2001). Together, these studies hinted at the large-scale degradation of endogenous mRNAs without sequence specificity as well as the impracticality of RNAi in zebrafish.

One possible explanation for the ineffectiveness of RNAi in zebrafish could be the unique

miRNA environment of the early zebrafish embryo. The miRNA pathway is essential for

zebrafish development; zebrafish lacking their zygotic copy of Dicer (Zdicer) and thus the ability

to generate new mature miRNAs (these animals still retain their maternal copy of Dicer) arrest

at 10 days post fertilization (Wienholds et al., 2003). Zebrafish lacking both maternal and zygotic

copies of Dicer (MZdicer) completely lacked mature miRNAs and displayed

abnormal morphogenesis during gastrulation, somitogenesis, and brain and heart development

(Giraldez et al., 2005). In an effort to identify which miRNAs were essential during this

developmental time period, small RNAs were cloned and sequenced from early embryos, which

led to the discovery of the miR-430 family of miRNAs, the most abundant family of miRNAs

expressed during early zebrafish development, with more than 90 copies of the miRNAs within a

stretch of 120 kb (Giraldez et al., 2005). miR-430 is initially expressed at ~2.5 hours post

fertilization (hpf) at the onset of zygotic genome activation and accumulates during the

maternal-to-zygotic transition (MZT), a conserved process whereby the embryo activates the

zygotic genome and no longer solely relies on maternally deposited products (Schier, 2007;

Giraldez, 2010). In fact, ~99% of all miRNAs expressed during this time are miR-430,

suggestive of their immense role during MZT and confirmed by transcriptional profiling studies

which revealed that miR-430 is responsible for eliminating maternally deposited transcripts

(Giraldez et al., 2006).

The discovery of this mode of clearance, whereby the large pool of miR-430 acts to clear maternal mRNAs during MZT, hinted at a possible reason why RNAi was so ineffective in zebrafish. It was proposed that the addition of a large quantity of small RNAs competed with, saturated, and inhibited the endogenous miRNA pathway necessary for development: rather than miR-430 binding Argonaute and clearing maternal mRNAs, an exogenous siRNA would instead be loaded into Argonaute, which would disrupt MZT, resulting in the persistence of maternal mRNAs and causing developmental delays. Despite this theory and attempts to circumvent the problem by co-injecting additional miR-430, RNAi has remained elusive in zebrafish.

An alternative approach bypassed the complications of the early embryo and sought to inhibit gene expression at later stages in zebrafish development. Small hairpin RNAs (shRNAs) that mimicked the endogenous miR-30e backbone were found to mediate effective RNAi when injected into zebrafish embryos or expressed from transgenic zebrafish (Dong et al., 2009; De

Rienzo et al., 2012). Using an F0 transgenic approach, for two genes, wnt5b and zDisc1, shRNAs could efficiently decrease endogenous RNA levels as assayed by qRT-PCR and produce phenotypes that resembled those of their respective mutants and morphants (De

Rienzo et al., 2012). These results suggest that engineering siRNAs within endogenous miRNA backbones, miR-30e or even pre-miR-451, could provide the platform for introducing siRNAs into zebrafish and eliciting an RNAi response.

REFERENCES

Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N., Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T., et al. (2006). A novel class of small RNAs bind to MILI protein in mouse testes. Nature 442, 203-207.

Aravin, A.A., Lagos-Quintana, M., Yalcin, A., Zavolan, M., Marks, D., Snyder, B., Gaasterland, T., Meyer, J., and Tuschl, T. (2003). The small RNA profile during Drosophila melanogaster development. Dev Cell 5, 337-350.

Aravin, A.A., Naumova, N.M., Tulin, A.V., Vagin, V.V., Rozovsky, Y.M., and Gvozdev, V.A. (2001). Double-stranded RNA-mediated silencing of genomic tandem repeats and transposable elements in the D. melanogaster germline. Curr Biol 11, 1017-1027.

Bartel, D.P. (2018). Metazoan MicroRNAs. Cell 173, 20-51.

Bernstein, E., Caudy, A.A., Hammond, S.M., and Hannon, G.J. (2001). Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409, 363-366.

Bohnsack, M.T., Czaplinski, K., and Gorlich, D. (2004). Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA 10, 185-191.

Brennecke, J., Aravin, A.A., Stark, A., Dus, M., Kellis, M., Sachidanandam, R., and Hannon, G.J. (2007). Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089-1103.

Cai, X., Hagedorn, C.H., and Cullen, B.R. (2004). Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA 10, 1957-1966.

Cheloufi, S., Dos Santos, C.O., Chong, M.M., and Hannon, G.J. (2010). A dicer-independent miRNA biogenesis pathway that requires Ago catalysis. Nature 465, 584-589.

Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E., Mane, S., Hannon, G.J., Lawson, N.D., et al. (2010). A novel miRNA processing pathway independent of Dicer requires Argonaute2 catalytic activity. Science 328, 1694-1698.

Davis, E., Caiment, F., Tordoir, X., Cavaille, J., Ferguson-Smith, A., Cockett, N., Georges, M., and Charlier, C. (2005). RNAi-mediated allelic trans-interaction at the imprinted Rtl1/Peg11 locus. Curr Biol 15, 743-749.

De Rienzo, G., Gutzman, J.H., and Sive, H. (2012). Efficient shRNA-mediated inhibition of gene expression in zebrafish. Zebrafish 9, 97-107.

Dong, M., Fu, Y.F., Du, T.T., Jing, C.B., Fu, C.T., Chen, Y., Jin, Y., Deng, M., and Liu, T.X. (2009). Heritable and lineage-specific gene knockdown in zebrafish embryo. PLoS One 4, e6125.

Eichhorn, S.W., Guo, H., McGeary, S.E., Rodriguez-Mias, R.A., Shin, C., Baek, D., Hsu, S.H., Ghoshal, K., Villen, J., and Bartel, D.P. (2014). mRNA destabilization is the dominant effect of mammalian microRNAs by the time substantial repression ensues. Mol Cell 56, 104-115.

Faehnle, C.R., Elkayam, E., Haase, A.D., Hannon, G.J., and Joshua-Tor, L. (2013). The making of a slicer: activation of human Argonaute-1. Cell Rep 3, 1901-1909.

Fang, W., and Bartel, D.P. (2015). The Menu of Features that Define Primary MicroRNAs and Enable De Novo Design of MicroRNA Genes. Mol Cell 60, 131-145.

Fire, A., Albertson, D., Harrison, S.W., and Moerman, D.G. (1991). Production of antisense RNA leads to effective and specific inhibition of gene expression in C. elegans muscle. Development 113, 503-514.

Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., and Mello, C.C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806-811.

Frank, F., Sonenberg, N., and Nagar, B. (2010). Structural basis for 5'-nucleotide base-specific recognition of guide RNA by human AGO2. Nature 465, 818-822.

Giraldez, A.J. (2010). microRNAs, the cell's Nepenthe: clearing the past during the maternal-to-zygotic transition and cellular reprogramming. Curr Opin Genet Dev 20, 369-375.

Giraldez, A.J., Cinalli, R.M., Glasner, M.E., Enright, A.J., Thomson, J.M., Baskerville, S., Hammond, S.M., Bartel, D.P., and Schier, A.F. (2005). MicroRNAs regulate brain morphogenesis in zebrafish. Science 308, 833-838.

Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright, A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75-79.

Girard, A., Sachidanandam, R., Hannon, G.J., and Carmell, M.A. (2006). A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442, 199-202.

Gunawardane, L.S., Saito, K., Nishida, K.M., Miyoshi, K., Kawamura, Y., Nagami, T., Siomi, H., and Siomi, M.C. (2007). A slicer-mediated mechanism for repeat-associated siRNA 5' end formation in Drosophila. Science 315, 1587-1590.

Haley, B., and Zamore, P.D. (2004). Kinetic analysis of the RNAi enzyme complex. Nat Struct Mol Biol 11, 599-606.

Hammond, S.M., Bernstein, E., Beach, D., and Hannon, G.J. (2000). An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature 404, 293-296.

Hansen, T.B., Wiklund, E.D., Bramsen, J.B., Villadsen, S.B., Statham, A.L., Clark, S.J., and Kjems, J. (2011). miRNA-dependent gene silencing involving Ago2-mediated cleavage of a circular antisense RNA. EMBO J 30, 4414-4422.

Hauptmann, J., Dueck, A., Harlander, S., Pfaff, J., Merkl, R., and Meister, G. (2013). Turning catalytically inactive human Argonaute proteins into active slicer enzymes. Nat Struct Mol Biol 20, 814-817.

Hauptmann, J., Kater, L., Loffler, P., Merkl, R., and Meister, G. (2014). Generation of catalytic human Ago4 identifies structural elements important for RNA cleavage. RNA 20, 1532-1538.

Hutvagner, G., and Zamore, P.D. (2002). A microRNA in a multiple-turnover RNAi enzyme complex. Science 297, 2056-2060.

Iwasaki, S., Kobayashi, M., Yoda, M., Sakaguchi, Y., Katsuma, S., Suzuki, T., and Tomari, Y. (2010). Hsc70/Hsp90 chaperone machinery mediates ATP-dependent RISC loading of small RNA duplexes. Mol Cell 39, 292-299.

Iwasaki, Y.W., Siomi, M.C., and Siomi, H. (2015). PIWI-Interacting RNA: Its Biogenesis and Functions. Annu Rev Biochem 84, 405-433.

Jee, D., Yang, J.S., Park, S.M., Farmer, D.T., Wen, J., Chou, T., Chow, A., McManus, M.T., Kharas, M.G., and Lai, E.C. (2018). Dual Strategies for Argonaute2-Mediated Biogenesis of Erythroid miRNAs Underlie Conserved Requirements for Slicing in Mammals. Mol Cell 69, 265-278 e266.

Jonas, S., and Izaurralde, E. (2015). Towards a molecular understanding of microRNA-mediated gene silencing. Nat Rev Genet 16, 421-433.

Kelly, A., and Hurlstone, A.F. (2011). The use of RNAi technologies for gene knockdown in zebrafish. Brief Funct Genomics 10, 189-196.

Khvorova, A., Reynolds, A., and Jayasena, S.D. (2003). Functional siRNAs and miRNAs exhibit strand bias. Cell 115, 209-216.

Kwak, P.B., and Tomari, Y. (2012). The N domain of Argonaute drives duplex unwinding during RISC assembly. Nat Struct Mol Biol 19, 145-151.

Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858.

Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and Kingston, R.E. (2006). Characterization of the piRNA complex from rat testes. Science 313, 363-367.

Lee, R.C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864.

Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854.

Lee, Y., Kim, M., Han, J., Yeom, K.H., Lee, S., Baek, S.H., and Kim, V.N. (2004). MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23, 4051-4060.

Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20.

Li, C., Vagin, V.V., Lee, S., Xu, J., Ma, S., Xi, H., Seitz, H., Horwich, M.D., Syrzycka, M., Honda, B.M., et al. (2009). Collapse of germline piRNAs in the absence of Argonaute3 reveals somatic piRNAs in flies. Cell 137, 509-521.

Li, Y.X., Farrell, M.J., Liu, R., Mohanty, N., and Kirby, M.L. (2000). Double-stranded RNA injection produces null phenotypes in zebrafish. Dev Biol 217, 394-405.

Lingel, A., Simon, B., Izaurralde, E., and Sattler, M. (2003). Structure and nucleic-acid binding of the Drosophila Argonaute 2 PAZ domain. Nature 426, 465-469.

Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J., Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. (2004). Argonaute2 is the catalytic engine of mammalian RNAi. Science 305, 1437-1441.

Ma, J.B., Ye, K., and Patel, D.J. (2004). Structural basis for overhang-specific small interfering RNA recognition by the PAZ domain. Nature 429, 318-322.

Ma, J.B., Yuan, Y.R., Meister, G., Pei, Y., Tuschl, T., and Patel, D.J. (2005). Structural basis for 5'-end-specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature 434, 666-670.

Malone, C.D., Brennecke, J., Dus, M., Stark, A., McCombie, W.R., Sachidanandam, R., and Hannon, G.J. (2009). Specialized piRNA pathways act in germline and somatic tissues of the Drosophila ovary. Cell 137, 522-535.

Malone, C.D., and Hannon, G.J. (2009). Small RNAs as guardians of the genome. Cell 136, 656-668.

Matranga, C., Tomari, Y., Shin, C., Bartel, D.P., and Zamore, P.D. (2005). Passenger-strand cleavage facilitates assembly of siRNA into Ago2-containing RNAi enzyme complexes. Cell 123, 607-620.

Miyoshi, K., Tsukumo, H., Nagami, T., Siomi, H., and Siomi, M.C. (2005). Slicer function of Drosophila Argonautes and its involvement in RISC formation. Genes Dev 19, 2837-2848.

Nakanishi, K., Weinberg, D.E., Bartel, D.P., and Patel, D.J. (2012). Structure of yeast Argonaute with guide RNA. Nature 486, 368-374.

Nguyen, T.A., Jo, M.H., Choi, Y.G., Park, J., Kwon, S.C., Hohng, S., Kim, V.N., and Woo, J.S. (2015). Functional Anatomy of the Human Microprocessor. Cell 161, 1374-1387.

Oates, A.C., Bruce, A.E., and Ho, R.K. (2000). Too much interference: injection of double-stranded RNA has nonspecific effects in the zebrafish embryo. Dev Biol 224, 20-28.

Park, M.S., Phan, H.D., Busch, F., Hinckley, S.H., Brackbill, J.A., Wysocki, V.H., and Nakanishi, K. (2017). Human Argonaute3 has slicer activity. Nucleic Acids Res 45, 11867-11877.

Parker, J.S., Roe, S.M., and Barford, D. (2005). Structural insights into mRNA recognition from a PIWI domain-siRNA guide complex. Nature 434, 663-666.

Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B., Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., et al. (2000). Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89.

Rand, T.A., Petersen, S., Du, F., and Wang, X. (2005). Argonaute2 cleaves the anti-guide strand of siRNA during RISC activation. Cell 123, 621-629.

Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. (2000). The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906.

Ruvkun, G. (2001). Molecular biology. Glimpses of a tiny RNA world. Science 294, 797-799.

Saito, K., Inagaki, S., Mituyama, T., Kawamura, Y., Ono, Y., Sakota, E., Kotani, H., Asai, K., Siomi, H., and Siomi, M.C. (2009). A regulatory circuit for piwi by the large Maf gene traffic jam in Drosophila. Nature 461, 1296-1299.

Salomon, W.E., Jolly, S.M., Moore, M.J., Zamore, P.D., and Serebrov, V. (2015). Single-Molecule Imaging Reveals that Argonaute Reshapes the Binding Properties of Its Nucleic Acid Guides. Cell 162, 84-95.

Schier, A.F. (2007). The maternal-zygotic transition: death and birth of RNAs. Science 316, 406-407.

Schirle, N.T., and MacRae, I.J. (2012). The crystal structure of human Argonaute2. Science 336, 1037-1040.

Schirle, N.T., Sheu-Gruttadauria, J., Chandradoss, S.D., Joo, C., and MacRae, I.J. (2015). Water-mediated recognition of t1-adenosine anchors Argonaute2 to microRNA targets. Elife 4.

Schirle, N.T., Sheu-Gruttadauria, J., and MacRae, I.J. (2014). Structural basis for microRNA targeting. Science 346, 608-613.

Schurmann, N., Trabuco, L.G., Bender, C., Russell, R.B., and Grimm, D. (2013). Molecular dissection of human Argonaute proteins by DNA shuffling. Nature structural & molecular biology 20, 818-826.

Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. (2003). Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199-208.

Shabalina, S.A., and Koonin, E.V. (2008). Origins and evolution of eukaryotic RNA interference. Trends in ecology & evolution 23, 578-587.

Shin, C., Nam, J.W., Farh, K.K., Chiang, H.R., Shkumatava, A., and Bartel, D.P. (2010). Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38, 789-802.

Siomi, M.C., Sato, K., Pezic, D., and Aravin, A.A. (2011). PIWI-interacting small RNAs: the vanguard of genome defence. Nat Rev Mol Cell Biol 12, 246-258.

Song, J.J., Smith, S.K., Hannon, G.J., and Joshua-Tor, L. (2004). Crystal structure of Argonaute and its implications for RISC slicer activity. Science 305, 1434-1437.

Subtelny, A.O., Eichhorn, S.W., Chen, G.R., Sive, H., and Bartel, D.P. (2014). Poly(A)-tail profiling reveals an embryonic switch in translational control. Nature 508, 66-71.

Suzuki, H.I., Katsura, A., Yasuda, T., Ueno, T., Mano, H., Sugimoto, K., and Miyazono, K. (2015). Small-RNA asymmetry is directly driven by mammalian Argonautes. Nature structural & molecular biology 22, 512-521.

Tuschl, T., Zamore, P.D., Lehmann, R., Bartel, D.P., and Sharp, P.A. (1999). Targeted mRNA degradation by double-stranded RNA in vitro. Genes Dev 13, 3191-3197.

Vagin, V.V., Sigova, A., Li, C., Seitz, H., Gvozdev, V., and Zamore, P.D. (2006). A distinct small RNA pathway silences selfish genetic elements in the germline. Science 313, 320-324.

Wahle, E., and Winkler, G.S. (2013). RNA decay machines: deadenylation by the Ccr4-not and Pan2-Pan3 complexes. Biochim Biophys Acta 1829, 561-570.

Wargelius, A., Ellingsen, S., and Fjose, A. (1999). Double-stranded RNA induces specific developmental defects in zebrafish embryos. Biochem Biophys Res Commun 263, 156-161.

Wee, L.M., Flores-Jasso, C.F., Salomon, W.E., and Zamore, P.D. (2012). Argonaute divides its RNA guide into domains with distinct functions and RNA-binding properties. Cell 151, 1055-1067.

Wienholds, E., Koudijs, M.J., van Eeden, F.J., Cuppen, E., and Plasterk, R.H. (2003). The microRNA-producing enzyme Dicer1 is essential for zebrafish development. Nat Genet 35, 217-218.

Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.

Yamashita, A., Chang, T.C., Yamashita, Y., Zhu, W., Zhong, Z., Chen, C.Y., and Shyu, A.B. (2005). Concerted action of poly(A) nucleases and decapping enzyme in mammalian mRNA turnover. Nature structural & molecular biology 12, 1054-1063.

Yang, J.S., Maurin, T., and Lai, E.C. (2012). Functional parameters of Dicer-independent microRNA biogenesis. Rna 18, 945-957.

Yang, J.S., Maurin, T., Robine, N., Rasmussen, K.D., Jeffrey, K.L., Chandwani, R., Papapetrou, E.P., Sadelain, M., O'Carroll, D., and Lai, E.C. (2010). Conserved vertebrate mir-451 provides a platform for Dicer-independent, Ago2-mediated microRNA biogenesis. Proc Natl Acad Sci U S A 107, 15163-15168.

Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594-596.

Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. (2003). Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev 17, 3011-3016.

Yoda, M., Cifuentes, D., Izumi, N., Sakaguchi, Y., Suzuki, T., Giraldez, A.J., and Tomari, Y. (2013). Poly(A)-specific ribonuclease mediates 3'-end trimming of Argonaute2-cleaved precursor microRNAs. Cell reports 5, 715-726.

Zamore, P.D., Tuschl, T., Sharp, P.A., and Bartel, D.P. (2000). RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101, 25-33.

Zhao, Z., Cao, Y., Li, M., and Meng, A. (2001). Double-stranded RNA injection produces nonspecific defects in zebrafish. Dev Biol 229, 215-223.

CHAPTER II

A seed mismatch enhances Argonaute2-catalyzed cleavage and partially rescues severely impaired cleavage found in fish

Grace R. Chen1,2,3, Hazel Sive2,3, and David P. Bartel1,2,3

1Howard Hughes Medical Institute, Cambridge, MA 02142, USA
2Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
3Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

G.R.C. and D.P.B. conceived the project, designed the study, and wrote the manuscript. G.R.C. performed the experiments. H.S. provided guidance and expertise.

Published as: Chen, G.R., H. Sive, D.P. Bartel. 2017. A seed mismatch enhances Argonaute2-catalyzed cleavage and partially rescues severely impaired cleavage found in fish. Mol Cell 68:1095-1107.

ABSTRACT

The RNAi pathway provides both innate immunity and efficient gene-knockdown tools in many eukaryotic species, but curiously, not in zebrafish. We discovered that RNAi is less effective in zebrafish at least partly because Argonaute2-catalyzed mRNA slicing is impaired. This defect is due to two mutations that arose in an ancestor of most teleost fish, implying that most fish lack effective RNAi. Despite lacking efficient slicing activity, these fish have retained the ability to produce miR-451, a microRNA generated by a cleavage reaction analogous to slicing. This ability is due to a G–G mismatch within the fish miR-451 precursor, which substantially enhances its cleavage. An analogous G–G mismatch (or sometimes also a G–A mismatch) enhances target slicing, despite disrupting seed pairing important for target binding. These results provide a strategy for restoring RNAi to zebrafish and reveal unanticipated opposing effects of a seed mismatch with implications for mechanism and guide-RNA design.

INTRODUCTION

Diverse RNA-silencing pathways play important roles in transposon silencing, viral defense, heterochromatin formation, and posttranscriptional repression of cellular genes (Tomari and Zamore, 2005; Malone and Hannon, 2009). In the simplest of these pathways, RNA interference (RNAi), a Dicer endonuclease cleaves long, double-stranded RNA (dsRNA) into small interfering RNAs (siRNAs) that associate with an Argonaute (Ago) protein to guide the Ago-catalyzed slicing of transcripts with extensive pairing to the siRNA. The RNAi pathway arose early in eukaryotic evolution and has been retained by most eukaryotic lineages (Shabalina and Koonin, 2008). Some lineages also have derivative silencing pathways that are more elaborate and involve other types of guide RNAs, such as Piwi-interacting RNAs (piRNAs), which derive from single-stranded RNA rather than dsRNA (Weick and Miska, 2014; Iwasaki et al., 2015), or microRNAs (miRNAs), which derive from short hairpins rather than long dsRNA (Bartel, 2004).

Despite their differences, the RNA-silencing pathways have each retained at their core a silencing complex that contains a short (20–32-nt) RNA associated with an Argonaute homolog. Within this complex, the RNA provides sequence specificity through direct pairing with target transcripts, and the Argonaute homolog either slices the target precisely between the nucleotides that pair to residues 10 and 11 of the guide RNA (Tuschl, 2001; Song et al., 2004) or recruits other proteins to promote other types of repression (Weick and Miska, 2014; Iwasaki et al., 2015; Jonas and Izaurralde, 2015).
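The position of the scissile bond described above can be made concrete with a short sketch. The Python below is only an illustration: the guide sequence is an arbitrary, hypothetical 22-nt RNA (not a real miRNA), the function names are invented here, and the target is assumed to contain one perfectly complementary site.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna))


def slice_products(guide, target):
    """Split a target at the bond between the nucleotides that pair to
    guide positions 10 and 11 (guide positions counted from its 5' end).

    Returns (5' fragment, 3' fragment), or None if the target has no
    perfectly complementary site.
    """
    site = reverse_complement(guide)
    start = target.find(site)
    if start == -1:
        return None
    # Pairing is antiparallel: guide position i (1-based) pairs with
    # target index start + len(guide) - i (0-based).
    cut = start + len(guide) - 10  # first nucleotide of the 3' fragment pairs to g10
    return target[:cut], target[cut:]


# Hypothetical guide and target, for illustration only
guide = "UACGGAUCCGAUUAGCCAUGCA"
target = "GGG" + reverse_complement(guide) + "AAA"
five_prime, three_prime = slice_products(guide, target)
print(five_prime, three_prime)
```

In this sketch the 3' fragment begins with the ten target nucleotides that pair to guide positions 10 through 1, which is why its length does not depend on the guide length.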

The miRNA pathway is the dominant RNA-silencing pathway of mammalian somatic cells. Indeed, most cellular mRNAs are conserved regulatory targets of conserved mammalian miRNAs (Friedman et al., 2009). The miRNA silencing complex targets these mRNAs at sites that fall primarily in 3' untranslated regions (3' UTRs) and perfectly pair to nucleotides 2–7 of the miRNA, known as the miRNA seed (Bartel, 2009). Pairing to the seed region is insufficient to trigger slicing of the mRNA target, and repression is instead achieved primarily through the recruitment of factors that accelerate poly(A)-tail shortening (Jonas and Izaurralde, 2015). Thus, in this non-slicing mode of repression, the dominant effect of miRNAs depends on the consequences of the tail shortening, which change during development; i.e., in pre-gastrulation embryos, this tail shortening decreases translational efficiency, whereas later in development, it decreases mRNA stability (Subtelny et al., 2014).
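As a minimal sketch of what "perfectly pair to nucleotides 2–7" means in sequence terms, the following Python scans a UTR for matches to the miRNA seed. The miRNA and UTR sequences are hypothetical and the function names are invented for this illustration; real target prediction considers additional features that are not modeled here.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna))


def seed_match_positions(mirna, utr):
    """Return 0-based UTR positions of 6-nt sites that perfectly pair to
    miRNA nucleotides 2-7 (the seed)."""
    seed = mirna[1:7]                 # nucleotides 2-7, 1-based
    site = reverse_complement(seed)   # what the mRNA must contain
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Hypothetical sequences, for illustration only
mirna = "UACGGAUCCGAUUAGCCAUGCA"
utr = "CCAUCCGUGGGAUCCGUC"
print(seed_match_positions(mirna, utr))
```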

Although the vast majority of miRNA regulatory sites in mammalian mRNAs have little more than seed pairing, some have the extensive complementarity required for Argonaute-catalyzed slicing. For example, in mouse embryos, miR-196 directs the cleavage of the HoxB8 3' UTR at a site that has near-perfect complementarity to miR-196 and is conserved throughout most vertebrate species, including zebrafish (Yekta et al., 2004). To date, however, only 21 cleavage targets of mammalian miRNAs have been found (Davis et al., 2005; Shin et al., 2010; Hansen et al., 2011). Moreover, despite a high degree of homology among the four mammalian Argonaute proteins (Ago1, Ago2, Ago3, and Ago4), only Ago2 has retained slicing activity (Liu et al., 2004; Meister et al., 2004). Restoring activity to the other three paralogs requires a combination of changes that either restore the residues of the DEDH catalytic tetrad within the PIWI domain (Ago1 and Ago4), restore two structural elements (NTI and NTII) in the N domain (Ago1, Ago3, and Ago4), restore a short sequence cluster in the PIWI domain (Ago1), or remove a short insertion close to the glutamate of the catalytic center (Ago4) (Faehnle et al., 2013; Hauptmann et al., 2013; Nakanishi et al., 2013; Hauptmann et al., 2014).

In addition to its requirement for slicing of rare, extensively paired miRNA targets, the ability of Ago2 to cleave RNA is required for the unusual biogenesis of miR-451, a miRNA conserved among vertebrates (Cheloufi et al., 2010; Cifuentes et al., 2010; Yang et al., 2010).

Most metazoan miRNAs are produced from the successive cleavage by Drosha and Dicer, two endonucleases with dual RNase III domains (Kim, 2005). Drosha first cleaves both strands near the base of the stem to liberate the pre-miRNA hairpin from the primary transcript, and then Dicer cleaves both strands near the loop to generate the miRNA duplex, which contains the mature miRNA paired with 2-nt 3' overhangs to an RNA segment from the other arm of the hairpin. This duplex is then loaded into Ago such that the miRNA strand ultimately becomes the guide RNA, and the other strand is discarded. In mammals, fish, and presumably other vertebrate species, miR-451 biogenesis is unusual in that the pre-miR-451 hairpin, with a stem of only 17 bp, is too short to be cleaved by Dicer and is instead loaded into Ago, which cleaves the strand opposite the miRNA strand in an activity analogous to mRNA slicing (Cheloufi et al., 2010; Cifuentes et al., 2010; Yang et al., 2010). Following this cleavage, 3' exonucleolytic resection generates the mature miR-451 miRNA (Yoda et al., 2013). Because miR-451 activity is required for proper erythropoiesis (Patrick et al., 2010; Rasmussen et al., 2010), mice with Ago2 mutations that abrogate slicing are anemic, as are fish lacking the full-length protein (Cheloufi et al., 2010; Cifuentes et al., 2010).

In some mammals, Ago2-catalyzed slicing also plays a critical role in the RNAi pathway. Although this pathway is not typically found in somatic cells, endogenous siRNAs are observed in certain mouse cells, including oocytes, embryonic stem cells, and male germ cells (Babiarz et al., 2008; Tam et al., 2008; Watanabe et al., 2008; Song et al., 2011). The pathway is important in mouse oocytes, in that disrupting the Ago2 active site desilences transposon expression, causing meiotic defects and female sterility (Stein et al., 2015). However, whether RNAi plays such a critical role in other mammals is unclear, as the Dicer isoform primarily responsible for the production of murine siRNAs does not appear to be present outside of the Muridae family (Flemr et al., 2013).

Regardless of why Ago2 has retained slicing activity, be it to cleave a few miRNA targets, to enable miR-451 biogenesis, or to perform RNAi-mediated transposon control, the widespread presence of this activity in mammalian cells has greatly benefited biomedical research. Indeed, the ability of artificial siRNAs to direct mRNA slicing, discovered 16 years ago (Elbashir et al., 2001a), has transformed the way that biologists study mammalian gene function. The reason that these artificial siRNA duplexes are so effective is that they resemble endogenous miRNA duplexes and thereby become incorporated into the Ago2 silencing complex to direct the slicing of target mRNAs. For previously unknown reasons, however, RNAi is not generally an effective tool for gene-knockdown experiments in zebrafish (Oates et al., 2000; Mangos et al., 2001; Zhao et al., 2001; Gruber et al., 2005; Kelly and Hurlstone, 2011). Perhaps as a result, morpholino antisense reagents have been a much more popular choice for posttranscriptional gene knockdown in fish.

We discovered one reason why RNAi is generally ineffective in zebrafish: two point substitutions that apparently occurred in a teleost ancestor ~0.3 billion years ago greatly diminish the slicing activity of zebrafish Ago2 (drAgo2). The crippling effect of these substitutions raised the question of how these fish are able to produce sufficient miR-451, which requires Ago2-catalyzed cleavage for its biogenesis. When answering this question, we found that a G–G mismatch involving position 6 of the miRNA substantially enhances both the cleavage of fish pre-miR-451 and the slicing of bound target transcripts. Our results indicate how RNAi might be restored to zebrafish and reveal an unanticipated feature of guide-RNA pairing, showing that non-Watson–Crick seed geometry is optimal for slicing bound target.

RESULTS

Inefficient Slicing in Zebrafish

Although miR-196–directed slicing at the extensively paired site within the HoxB8 mRNA is readily detected in mouse embryos (Yekta et al., 2004), analogous efforts to detect slicing at the orthologous site within HoxB8a were unsuccessful in zebrafish embryos (S. Yekta & D.P.B., unpublished data). When considering this result together with the ineffectiveness of RNAi as a gene-knockdown tool in zebrafish, we decided to investigate the slicing ability of zebrafish Argonaute2 (drAgo2). We first assayed for miR-430–directed slicing in zebrafish embryos (Figure 1A). Capped RNA with a single site perfectly complementary to the dominant isoform of miR-430 was injected into single-cell zebrafish embryos. Embryos were then harvested at 4 hours post-fertilization (4 hpf), a stage at which miR-430 dominates the endogenous miRNA pool, and total RNA was extracted and analyzed on RNA blots. No slicing was detected, even when an mRNA encoding additional drAgo2 was co-injected into the one-cell embryo (Figure 1B). In contrast, slicing was readily detected when mRNA for human Ago2 (hsAGO2) was co-injected, provided that the mRNA did not have mismatches at the cleavage site (mismatch), confirming that the conditions within the embryo were conducive to authentic slicing (Figure 1B). Additional experiments confirmed that another Ago paralog had not taken over slicing activity in zebrafish (Figure S1A).

Although our assay within zebrafish embryos had the advantage of examining slicing under physiological conditions, the in vivo setting of the developing embryo, with its large number of seed-matched miR-430 targets, its unknown amount of loaded Ago2 that varied over time, and its dynamic nuclease activities, prevented quantitative analysis of slicing kinetics. Therefore, to supplement the in vivo analyses, we adapted the protocol of Flores-Jasso et al. (2013) to purify the different Ago2 proteins loaded with miR-430 and then measured their ability to slice a cap-labeled substrate in vitro (Figure 1C). To isolate the slicing step from the substrate-binding and product-release steps, we monitored single-turnover reactions in which miR-430–programmed human or zebrafish Ago2 was in 10-fold excess over the slicing substrate (1.0 and 0.1 nM, respectively) and in even greater excess over the enzyme–substrate dissociation constant (KD), expected to be in the low-picomolar range (Wee et al., 2012). In this more sensitive assay, drAgo2-catalyzed slicing was detectable but only ~1% as rapid as hsAGO2-catalyzed slicing (Figure 1D).
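The logic of the single-turnover design can be illustrated with a small calculation. The sketch below is not the analysis used in this study: it assumes simple 1:1 equilibrium binding and first-order cleavage of bound substrate, takes the 1.0 nM enzyme and 0.1 nM substrate concentrations from the text, and uses an assumed KD of 0.01 nM (10 pM) as a stand-in for "low-picomolar". The rate constants are hypothetical values chosen only to show a 100-fold difference.

```python
import math


def fraction_substrate_bound(enzyme_nM, substrate_nM, kd_nM):
    """Equilibrium fraction of substrate in the enzyme-substrate complex.

    Uses the exact quadratic solution for 1:1 binding, so it remains valid
    when enzyme, substrate, and KD are of comparable magnitude.
    """
    total = enzyme_nM + substrate_nM + kd_nM
    complex_nM = (total - math.sqrt(total ** 2 - 4 * enzyme_nM * substrate_nM)) / 2
    return complex_nM / substrate_nM


def fraction_cleaved(k_per_min, t_min):
    """Fraction of bound substrate cleaved after t_min, assuming first-order
    kinetics with rate constant k_per_min."""
    return 1.0 - math.exp(-k_per_min * t_min)


# Concentrations from the text; KD = 0.01 nM is an assumption for illustration
bound = fraction_substrate_bound(enzyme_nM=1.0, substrate_nM=0.1, kd_nM=0.01)
print(f"fraction of substrate bound: {bound:.3f}")

# Hypothetical rate constants differing 100-fold (the slower being ~1% as rapid)
for label, k in [("fast enzyme", 1.0), ("slow enzyme", 0.01)]:
    print(label, [round(fraction_cleaved(k, t), 3) for t in (1, 10, 60)])
```

Under these assumptions nearly all of the substrate is bound before cleavage, so the observed time course would report mainly on the slicing step rather than on binding or product release.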

These results mirrored our earlier, unsuccessful efforts to detect endogenous miR-196–directed slicing at the extensively paired HoxB8a mRNA in zebrafish, mentioned above. However, they seemed at odds with previous reports of slicing activity observed in zebrafish embryos injected with a miR-1 duplex and complementary substrate (Giraldez et al., 2005; Cifuentes et al., 2010). To explore this potential discrepancy, we replicated the previous miR-1 experiment using the same reagents and approach. Capped RNA with three perfectly paired miR-1 sites was co-injected with miR-1 duplex into one-cell embryos, and then at 4 hpf, RNA was extracted and analyzed on RNA blots. Our results were consistent with those previously reported, in that a miR-1–dependent product appeared, which migrated at a size expected for cleavage at one of the three sites (Figure S1B). This product accumulation substantially increased when hsAGO2 mRNA was also co-injected, which confirmed that the product was of the size expected for Ago2-catalyzed slicing.

We do not know which features of this previous experimental regime enabled detection of endogenous drAgo2-catalyzed slicing in embryos. Perhaps it helped to have three miRNA sites rather than one, or relatively large quantities of injected miRNA and target, or perhaps inherent differences between the two miRNAs, miR-430 and miR-1, made this setup less sensitized to differences in slicing activity. To test whether the large difference between the activities of the human and zebrafish proteins was still observed when using the miR-1 guide, we purified miR-1–programmed hsAGO2 and miR-1–programmed drAgo2 and compared their activities in the single-turnover in vitro slicing assay (Figure 1C). With the miR-1 guide, drAgo2-catalyzed slicing was ~3% as rapid as hsAGO2-catalyzed slicing (Figure 1E, Figure S1C). Thus, regardless of the guide RNA, zebrafish Ago2 slicing activity was substantially reduced compared to that of the human protein. In addition to revealing this difference between the two proteins, the quantitative in vitro assays revealed a difference between the two miRNA–target pairs, showing that miR-1–directed slicing of its target was ≥10-fold more rapid than miR-430–directed slicing of its target. This type of difference helps to explain why drAgo2-catalyzed slicing of a miR-1 target was readily detected in fish embryos, whereas drAgo2-catalyzed slicing of a miR-430 target was not.

Two Substitutions in a Teleost Ancestor Explain the Loss of Efficient Slicing

A search for differences that might explain the loss of efficient drAgo2-catalyzed slicing started with the observation that drAgo2 and hsAGO2 differ mainly in their amino-terminal (N) domains and the knowledge that changes in the N domains explained the loss of slicing activity of hsAGO1, hsAGO3, and hsAGO4 (Faehnle et al., 2013; Hauptmann et al., 2013; Hauptmann et al., 2014). However, when we swapped the N domains of drAgo2 and hsAGO2 and examined the activity of these chimeric proteins using our assay for miR-430–guided slicing, drAgo2 with a human N domain was not substantially better at slicing, and hsAGO2 with the zebrafish N domain had no worse activity, indicating that differences outside the drAgo2 N domain were inhibiting slicing activity (Figure S2A–B).

To search for differences outside the N domain that might explain the loss of efficient slicing, we compared Ago2 sequences from 11 vertebrate species. Assuming the most parsimonious evolutionary scenario, in which 1) efficient slicing was lost only once in the vertebrate clade, and 2) mammals have retained the ancestral slicing activity that is present in invertebrates and throughout most eukaryotes, we surmised that efficient slicing must have been lost at some point in the jawed-fish lineage that gave rise to zebrafish, after the common ancestor of humans and zebrafish (Figure 2A, highlighted lineage of cladogram). By this reasoning, any substitution that compromised slicing in zebrafish would be at a residue that is identical in lamprey and the Sarcopterygii (coelacanth and tetrapods) and different in zebrafish.
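The filtering criterion in this paragraph can be sketched as a scan over alignment columns. The Python below is a toy illustration, not the procedure used in the study: the aligned sequences are made-up six-residue strings, the species set is reduced, and the function name is invented here. The E-to-D and F-to-Y changes in the toy data are placed arbitrarily and do not reflect real alignment coordinates.

```python
def candidate_substitutions(alignment, reference_species, focal_species):
    """Return (position, ancestral, derived) for alignment columns in which
    all reference species share one residue and the focal species differs.

    Positions are 1-based; columns with a gap ('-') in either state are skipped.
    """
    focal = alignment[focal_species]
    hits = []
    for i in range(len(focal)):
        residues = {alignment[sp][i] for sp in reference_species}
        if len(residues) != 1:
            continue  # reference species disagree, so ancestry is ambiguous
        ancestral = next(iter(residues))
        derived = focal[i]
        if "-" in (ancestral, derived) or derived == ancestral:
            continue
        hits.append((i + 1, ancestral, derived))
    return hits


# Hypothetical toy alignment (not real Ago2 sequence)
toy_alignment = {
    "lamprey":    "MKEFLA",
    "coelacanth": "MKEFLA",
    "human":      "MKEFLS",
    "chicken":    "MREFLA",
    "zebrafish":  "MKDYLT",
}
references = ["lamprey", "coelacanth", "human", "chicken"]
print(candidate_substitutions(toy_alignment, references, "zebrafish"))
```

In the toy data, the last column is excluded even though zebrafish differs there, because the reference species do not agree on a single residue.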

These criteria narrowed the number of candidate substitutions to 20, all of which imparted conservative amino acid changes, and most of which were at residues on the surface of the protein. Of the three at interior residues, the two best candidates for explaining the loss of efficient slicing were near the active site (Figure 2B). Indeed, one changed the active-site glutamate (E), previously found to complete a DEDH catalytic tetrad (Nakanishi et al., 2012); in zebrafish and other representatives of the teleost clade, this E changed to an aspartate (D). The second changed a nearby phenylalanine (F) to tyrosine (Y), and this change also occurred in all representatives of the teleost clade.

To test the effect of these two substitutions, we made mRNAs that encoded drAgo2 proteins with the D and Y reverted back to their ancestral identities, confirmed comparable expression of these proteins in injected embryos (Figure S3A), and examined the ability of these proteins to slice a miR-430 target in embryos. Each of the single reversions (drAgo2D–E and drAgo2Y–F) conferred detectable slicing activity to the zebrafish protein, and the double reversion (drAgo2DY–EF) imparted activity approaching that of the human protein (Figure 2C). Moreover, the reciprocal human-to-zebrafish substitutions within hsAGO2 eliminated detectable miR-430–guided slicing activity (Figure S3B). Together, these results showed that the E-to-D and F-to-Y substitutions both contributed to the loss of efficient slicing in zebrafish embryos.

To provide more quantitative measurements of slicing activities, we turned to the in vitro slicing assay, monitoring single-turnover reactions with purified miR-430–programmed Ago2 variants. As in the embryo, each of the single reversions (drAgo2D–E and drAgo2Y–F) had improved slicing activity, and drAgo2 slicing activity approached that of hsAGO2 only when both key residues were reverted to their ancestral identities (drAgo2DY–EF) (Figure 2D). Confirming that activity depended on the drAgo2 active site, slicing was abolished when the first aspartate of the DEDH catalytic tetrad was changed to alanine (drAgo2D–A) (Figure 2D).

The E-to-D and F-to-Y substitutions, which together imparted this substantial diminution in slicing activity, are broadly distributed among teleost fish, which comprise most of the extant fish species. With the exception of a presumed D-to-E reversion in the Cichlidae family of Euteleosteomorpha, both the E-to-D and F-to-Y substitutions were present in all 29 teleost species examined (Figure S3C). These 29 species included all teleosts with sequenced genomes and fell within the three teleost subgroups that encompass the vast majority of the extant teleost species (Broughton et al., 2013). Because these substitutions did not extend to more basal jawed fish, represented by gar (Figure 2A), they presumably occurred approximately 300 million years ago, in a common ancestor of most extant teleosts.

An Ancestral G–G Mismatch Within Pre-miR-451 Enhances Cleavage

The inefficiency of drAgo2-catalyzed slicing helps explain why endogenous slicing products have not been reported in zebrafish and why RNAi is described as an ineffective tool for knocking down gene expression in zebrafish (Oates et al., 2000; Mangos et al., 2001; Zhao et al., 2001; Gruber et al., 2005; Kelly and Hurlstone, 2011). The inefficiency of drAgo2-catalyzed slicing also suggested that the activity observed for shRNA expressed from a miR-30e backbone in zebrafish (Dong et al., 2009; De Rienzo et al., 2012) occurs primarily through the mRNA deadenylation pathway typical of miRNA-mediated repression (Jonas and Izaurralde, 2015) rather than through Ago2-catalyzed slicing. This inefficient slicing was nonetheless unexpected because genetic analyses indicate that a reaction analogous to slicing is required for miR-451 biogenesis in zebrafish (Cifuentes et al., 2010), which prompted us to examine the ability of drAgo2 to process pre-miR-451. Accordingly, we developed an assay for pre-miR-451 binding and cleavage, in which mRNA for FLAG-tagged Ago2 was co-injected with 5′ end-labeled pre-miR-451 into single-cell embryos, and RNA co-purifying with Ago2 was then isolated and analyzed on a denaturing gel (Figure 3A). Consistent with the genetic results (Cifuentes et al., 2010), wild-type drAgo2 was able to bind and cleave pre-miR-451, although cleavage was not as efficient as that observed for hsAGO2 (Figure 3B). Interestingly, some mature miR-451 was also detected with Ago2 active-site mutants (Ago2D–A), perhaps the result of cleavage within the loop of the injected pre-miRNA by another endonuclease. However, the amount of cleaved pre-miR-451 associated with wild-type drAgo2 was substantially greater, indicating that drAgo2 was indeed able to cleave the pre-miRNA, albeit at lower efficiency than that observed for hsAGO2 (Figure 3B). As observed for target slicing, this lower efficiency of drAgo2-mediated pre-miR-451 cleavage was attributed to the E-to-D and F-to-Y substitutions found in zebrafish and other teleosts, in that activity for the constructs with single and double reversions of these substitutions approached that of hsAGO2 (Figure 3C, Figure S2C).

The >30-fold difference between the miR-430–guided slicing activities of hsAGO2 and drAgo2 (Figures 1–2) seemed much greater than the difference between their respective pre-miR-451 cleavage activities (Figure 3B–C). Although we could not rule out the possibility that the smaller apparent difference for pre-miR-451 cleavage might be attributable to differences in the assays (as would be the case if, for instance, the results for the more efficient constructs were beyond the dynamic range of the pre-miR-451 cleavage assay), we explored the more interesting possibility that the smaller apparent difference for pre-miR-451 cleavage might be attributable to differences between the substrates of the two types of reactions. Apart from the loop in pre-miR-451, the most prominent structural difference between the two substrates was at miRNA position 6, which was perfectly paired in the slicing substrate but formed a G–G mismatch in the zebrafish pre-miR-451 hairpin (Figure 3D).

Examination of the whole-genome alignments (Tyner et al., 2017) revealed that the G–G mismatch within pre-miR-451 has been conserved among the fish and amphibian species. However, G35 mutated to a C in an amniote ancestor that gave rise to reptiles, birds, and mammals (Figure S4), thereby changing the ancestral G–G mismatch to a G–C Watson–Crick match (Figure 3D), as occurs in our slicing substrate. Although C35 has been retained within most amniote lineages, it mutated again in some lineages, most often to the U transition (at least nine times) but sometimes transverting back to a G (at least three times) (Figure S4). For example, all three possibilities were observed within primates, with humans and most other apes acquiring the U, gibbons retaining the C, and old-world monkeys reverting back to the G. The variability of this position among mammalian pre-miR-451 sequences has been previously noted, and structure-function studies of human pre-miR-451 show that changing U35 to either a G or an A slightly enhances miR-451 activity in HeLa cells (Yang et al., 2012). This increased activity corresponds to increased miR-451 accumulation, which is attributable to more efficient resection of cleaved pre-miR-451 (Yang et al., 2012). These experiments comparing the human G–U wobble to the G–G and G–A mismatches at positions 6 and 35, respectively, establish an interesting tolerance for mismatches at this position, raising the question of how the G–C Watson–Crick match, which is found in most amniotes but untested in previous studies, might compare with the ancestral G–G mismatch.
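For readers keeping track of the pairing states at positions 6 and 35, the following Python sketch classifies each combination discussed above. It encodes only the standard RNA pairing rules (A–U and G–C as Watson–Crick, G–U as wobble); the function name and the species labels in the example are shorthand introduced here for illustration.

```python
WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = {frozenset("GU")}


def classify_pair(nt_a, nt_b):
    """Classify two opposing RNA nucleotides as 'Watson-Crick', 'wobble',
    or 'mismatch'."""
    pair = frozenset((nt_a, nt_b))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


# Position 6 of miR-451 is a G; position 35 varies among lineages (see text)
for label, nt35 in [("ancestral", "G"), ("most amniotes", "C"),
                    ("human", "U"), ("tested variant", "A")]:
    print(f"G6-{nt35}35 ({label}): {classify_pair('G', nt35)}")
```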

To answer this question, we tested ancestral and amniote pre-miR-451 structures in our assay for pre-miR-451 binding and cleavage and found that drAgo2 had a surprising preference for the ancestral G–G mismatch structure (Figure 3E). Indeed, although drAgo2 could bind the G–C structure, cleavage did not exceed the background level observed for the D-to-A active-site mutants (Figures 3B, E). Similar results were observed for the hsAGO2 with the zebrafish substitutions, hsAGO2EF–DY (Figure 3E), and when comparing to the pre-miR-451 hairpin with a C–G rather than a G–C pair at this position (Figure S5). Thus, for these slicing-impaired enzymes possessing the teleost E-to-D and F-to-Y substitutions, the benefit of the G–G mismatch appeared binary: either activity with the G–G mismatch or merely background activity with the G–C match. In contrast, for both repaired drAgo2 (drAgo2DY–EF) and wild-type hsAGO2, no advantage of the mismatch was observed, perhaps reflecting a limited dynamic range of this assay (Figure 3E).

The adequate cleavage of fish pre-miR-451 despite inefficient slicing of perfectly matched targets leads us to speculate that in most teleost fish the only biological role for Ago2 catalytic activity is to produce miR-451. In this scenario, the reason that the catalytically

impaired Ago2 of most teleosts has been able to play this role is the presence of an ancestral

G–G mismatch within pre-miR-451, which helps compensate for the impaired cleavage activity,

allowing these fish to produce enough miR-451 to avoid erythropoiesis defects.

Specific Mismatches at miRNA Position 6 Enhance Slicing of Bound Target.

Intrigued by the strong benefit of the G–G mismatch for drAgo2-mediated pre-miR-451

cleavage, we tested whether an analogous mismatch might also enhance target slicing. To

isolate the slicing step from substrate binding and product release, we started with single-turnover reactions, in which miR-430–programmed Ago2 was in 10-fold excess over the slicing substrate and in large excess over the expected KD values (Figure 4A–B). hsAGO2 sliced the bound

G–G substrate 3.1-fold more rapidly than it sliced the G–C substrate, whereas drAgo2 sliced the

G–G substrate 4.8-fold more rapidly. When testing the other two possibilities across from a G at position 6 of the miR-430 guide, the G–G mismatch was also preferred over the G–A mismatch

(3.3 and 7.8 fold for hsAGO2 and drAgo2, respectively) and the G–U wobble (3.0 and 3.7 fold for hsAGO2 and drAgo2, respectively), indicating that for this guide RNA slicing enhancement is

specific to a G–G mismatch in both species (Figure 4A–B).

To examine whether the preference for G–G mismatch was specific to position 6 of the guide, which is known to tolerate an abasic guide residue (Lee et al., 2015), we tested the same matched and mismatched possibilities to position 4 of the miR-430 guide, which is also a G. At this position, bound substrate with the G–C match was sliced most rapidly—at least 2-fold more rapidly than substrates with either a wobble or mismatch (Figure S6). Thus, the benefit of a G–

G mismatch does not apply throughout the seed region.

To test whether the benefit of a G–G mismatch at position 6 occurs in other miRNA contexts, we examined substrates of miR-451–programmed hsAGO2. miR-451 and miR-430 have similar seed regions, with one difference being that the nucleotides immediately flanking position 6 are swapped (positions 5–7 of miR-451 are CGU, whereas those of miR-430 are

UGC). As observed for miR-430–guided slicing, bound substrate with the G–G mismatch at position 6 was sliced more rapidly than those with the G–C match or G–U wobble (Figure 4C).

The difference was that the bound substrate with the G–A mismatch was sliced as rapidly as that with the G–G mismatch. Thus, in some nearest-neighbor contexts, a G–A mismatch at position 6 can impart the same benefit as a G–G mismatch.

Previous kinetic analyses of Ago-catalyzed slicing reactions show that mismatches to the guide RNA enhance the rate of product release, thereby enhancing the rate of multiple-turnover slicing of substrates for which release of the fully matched product is rate limiting (Wee

et al., 2012). To investigate whether the G–G mismatch to miR-430 might confer this additional, post-slicing rate enhancement, we examined its effect on hsAGO2-catalyzed multiple-turnover slicing. Indeed, as expected for rate-limiting product release, accumulation of product was biphasic, with an initial burst of rapid slicing corresponding to slicing of a stoichiometric amount of enzyme-bound substrate followed by a second phase corresponding to rate-limiting product release for subsequent enzyme turnover (Figure 5A, Figure S7). For the G–G substrate, the rate constant for the second, slower phase (k2) was 2.1-fold faster than that for the G–C

substrate (k2 = 0.074 min⁻¹ and 0.035 min⁻¹, respectively), consistent with the idea that following

slicing, dissociation of the 3' cleavage product (which differed for the two substrates) was at

least partially rate limiting, and the G–G mismatch enhanced this dissociation rate constant. As expected from the single-turnover results, the initial burst for the G–G substrate was also faster than that for the G–C substrate (Figure 5A). We observed a 3.6-fold enhancement for the G–G

mismatch substrate during the initial burst (k1 = 0.94 min⁻¹ and 0.26 min⁻¹ for the G–G and G–C substrates, respectively), which resembled the 3.1-fold enhancement observed in the single-turnover reactions (Figure 4A and Figure 5A). To the extent that the effect of the G–G

mismatch was somewhat greater in this burst phase than in the single-turnover regime, the

difference can be attributed to the fundamental rate constants of the reaction and the relative

concentrations of enzyme and substrate used in the single- versus multiple-turnover assays

(Figure S7). These considerations imply that the 3.6-fold enhancement observed in the first

phase of the multiple-turnover reaction with hsAGO2 and the 4.8-fold enhancement observed in

the single-turnover reaction with drAgo2 best represent the degree to which the G–G mismatch

confers enhanced slicing of bound substrate.
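The fold enhancements quoted in this paragraph follow directly from the ratios of the reported rate constants. The following is a minimal arithmetic check, not part of the original analysis: the variable and function names are ours, and the half-times are derived values that the text does not report (computed as ln 2 / k, assuming each phase behaves as a first-order process).

```python
import math

# Rate constants reported in the text for hsAGO2 multiple-turnover slicing (min^-1).
k1 = {"G-G": 0.94, "G-C": 0.26}     # initial burst phase
k2 = {"G-G": 0.074, "G-C": 0.035}   # second, slower phase

def fold(rates):
    """Ratio of the G-G rate constant to the G-C rate constant."""
    return rates["G-G"] / rates["G-C"]

def half_time(k):
    """Half-time (min) of a first-order process with rate constant k (assumption)."""
    return math.log(2) / k

burst_fold = fold(k1)     # rounds to 3.6
second_fold = fold(k2)    # rounds to 2.1

print(f"burst-phase enhancement:  {burst_fold:.1f}-fold")
print(f"second-phase enhancement: {second_fold:.1f}-fold")
for site in ("G-G", "G-C"):
    print(f"{site}: burst half-time of about {half_time(k1[site]):.2f} min")
```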

Having found a surprising benefit of a G–G mismatch at miRNA position 6 for slicing of bound substrate (Figures 4 and 5A) and having confirmed that this mismatch also enhances release of bound product (Figure 5A), we turned to substrate association, a step in the slicing-reaction pathway for which a mismatch within the seed region is expected to be detrimental

(Salomon et al., 2015). To confirm and quantify the presumed detrimental effect of the G–G mismatch on substrate association, we developed a competitive-cleavage assay, in which excess long (168-nt) and short (80-nt) cap-labeled slicing substrates (one with a perfectly paired site, the other with a G–G-mismatched site) were incubated with limiting miR-430–programmed Ago2 (Figure 5B). In this assay, the two substrates compete for Ago2 binding and complex formation, and any differences in their association rates lead to differences in sliced product. As controls, short and long substrates with the same sites were tested. These controls revealed a slight preference for the longer version of each substrate and reiterated the observation of faster slicing of G–G substrates in the second phase of miR-430–guided multiple-turnover slicing (Figure 5C, left half of gel). In the experimental lanes, in which the two site types competed with each other for limiting programmed hsAGO2, the G–C site was bound and sliced 2.4-fold more efficiently when it resided in the longer substrate and 1.9-fold more efficiently when it resided in the shorter substrate, indicating a 2.2-fold overall preference for the

G–C site over the G–G site (Figure 5C, right half of gel). Because slicing is much faster than substrate release (Wee et al., 2012), and because the first time point provided ample time for slicing (30 min for reactions proceeding at ≥ 0.2 min⁻¹), most of the molecules that bound to programmed hsAGO2 were also sliced, and thus differences in slicing rates had a negligible effect on the proportions sliced at the time points of this experiment. Moreover, differences in product dissociation rates were also inconsequential because dissociation was not required for the product to be detected on the gel, and under the competitive conditions, any programmed hsAGO2 that is freed upon product release will choose the next substrate molecule based on its relative association rate, irrespective of either the identity or dissociation rate of the recently released product. Thus, the observed 2.2-fold preference for slicing of the G–C site in this competitive assay was primarily attributable to correspondingly more rapid association of the G–C substrate compared to the G–G substrate.
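The statement that 30 min is ample time for slicing at ≥ 0.2 min⁻¹ can be checked with the single-exponential form used for the single-turnover fits, F(t) = 1 − e^(−kt). The sketch below is illustrative only: the function name is ours, and it assumes that slicing of bound substrate is a simple first-order process at the stated lower-bound rate.

```python
import math

def fraction_sliced(k_per_min, minutes):
    """Fraction of bound substrate sliced after `minutes`, assuming first-order slicing."""
    return 1.0 - math.exp(-k_per_min * minutes)

# Lower-bound slicing rate (0.2 min^-1) and first time point (30 min) from the text.
at_first_time_point = fraction_sliced(0.2, 30)
print(f"fraction sliced by 30 min: {at_first_time_point:.4f}")
```

Under these assumptions more than 99% of bound substrate would be sliced by the first time point, which is why differences in slicing rate contribute little to the product ratios in this assay.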

In sum, our kinetic analyses of hsAGO2-catalyzed slicing show that particular mismatches to the seed of the miRNA can have opposing effects on binding and slicing. In addition to the anticipated effects on binding, in which a G–G mismatch at position 6 of the guide RNA slows substrate association and enhances product release, this mismatch has an unanticipated effect on slicing, in which it enhances slicing of bound substrate. In the context of our sequences and conditions, the effects on substrate association and product release were each ~2-fold, and the effect on slicing of bound substrate was 3–5-fold (Figure 5D).

DISCUSSION

Our discovery that zebrafish lacks efficient slicing demonstrates that a vertebrate species can persist in the wild despite lacking effective RNAi, a powerful gene-silencing pathway that many other eukaryotic species deploy to silence viruses and transposons (Tomari and Zamore, 2005;

Malone and Hannon, 2009). Indeed, the two point substitutions that confer this loss of effective

RNAi appear to have occurred almost 300 million years ago in a common ancestor of the sequenced teleost fish. This lineage includes most of the extant fish species and, indeed, most of the vertebrate species currently inhabiting the planet, yet this vertebrate lineage that has lost effective RNAi has not only persisted but thrived.

Perhaps the possession of alternative pathways to combat viruses and transposons has allowed the RNAi pathway to be lost without consequence. Alternatively, the cost of losing the pathway in teleosts might have been offset by a benefit. This type of cost–benefit tradeoff explains why the presence of RNAi is so variable among fungi: Losing RNAi imparts a cost of decreased protection against transposons but also imparts a benefit, in that it enables the acquisition and retention of Killer, a dsRNA element that encodes a toxin that kills neighboring cells that lack Killer (Drinnenberg et al., 2009; Drinnenberg et al., 2011). For fish, we can only speculate on the potential benefits of losing efficient slicing. One possibility is that it would confer resistance to polyoma viruses or other DNA viruses that produce miRNAs that direct

slicing of complementary mRNAs transcribed from the opposite viral strand (Grundhoff and

Sullivan, 2011).

Whether losing RNAi was essentially neutral or conferred a net benefit to the teleost

lineage, the lack of efficient RNAi is clearly not a benefit for the use of zebrafish as a model

organism to study the molecular basis of vertebrate development and physiology. Our

identification of the two point substitutions that conferred the loss of efficient slicing in teleosts

suggests how, with the use of modern gene-editing methods, this activity might be restored to

zebrafish. The generation of a zebrafish line that possesses efficient slicing activity might enable

RNAi-based gene-knockdown tools in this model organism and would also reveal the

consequences of regaining efficient slicing in a lineage that has not experienced it in 0.3 billion

years.

The cost of losing efficient Ago2 catalytic activity was attenuated in teleosts because

they retained the G–G mismatch within pre-miR-451, the precursor of a miRNA required for

proper erythrocyte development. We found that this mismatch to position 6 of the miRNA

enabled drAgo2-mediated pre-miR-451 cleavage; without the mismatch, pre-miR-451 cleavage

was essentially abolished. Although maturation of the G–G mismatched pre-miR-451 within

drAgo2 was not as rapid as that observed within hsAGO2, it did appear to be sufficient for

adequate miR-451 to be produced within the timeframe of erythropoiesis. The unanticipated

advantage of this mismatch to a seed nucleotide was also observed during Ago2-catalyzed

slicing of bound target transcripts, which occurred 3–5-fold more rapidly for bound substrates

containing the mismatch compared to those that were perfectly matched.

Many lines of evidence point to the strict preference for perfect Watson–Crick pairing to the miRNA seed during target binding (Bartel, 2009), and with no evidence to the contrary, this seed pairing, together with pairing to the midsection of the guide RNA, has been assumed to be also preferred for slicing of bound target. Our results reveal that in fact there is a tradeoff between the preferences for binding and those of the subsequent conformational and chemical

steps required for slicing. Moreover, for miR-430–directed slicing, the post-chemistry advantage of the G–G mismatch, with its 2-fold more favorable product release, essentially negates the 2-fold disadvantage that this mismatch imparts on target association. Similar results are

anticipated for this mismatch in the context of other guide RNAs, although more needs to be

learned about the influence of neighboring base-pair identity, as illustrated by the effects of the

position-6 G–A mismatch, which enhances slicing of the bound miR-451 substrate but not that

of miR-430.

siRNAs and artificial miRNAs are important research tools for gene-knockdown studies, and siRNAs are showing promise in the clinic (Bobbin and Rossi, 2016). In the current design of these gene-knockdown tools, pairing to the last few nucleotides of the guide is considered unimportant (Elbashir et al., 2001b), as is pairing to the first nucleotide of the guide, which is bound to Ago2 in a configuration that prevents pairing to the mRNA (Ma et al., 2005; Parker et al., 2005). However, the remainder of the guide is typically designed to pair perfectly to the target mRNA. Knowledge that mismatches between the guide and target can enhance product dissociation rates and thereby potentially increase the multiple-turnover rate of slicing (Wee et al., 2012) is not typically exploited to improve these reagents, presumably out of concern that the benefits to multiple turnover would be offset by less efficient target binding and slicing. Our results revealing the tradeoff between pairing preferences for binding and slicing suggest that, depending on the relative importance of target association, target slicing, and product release, a suitable mismatch at position 6 might impart an overall benefit. This strategy for enhancing slicing and product release might be particularly useful for improving siRNAs with nucleotide modifications that protect them from nucleases, as these modifications often also enhance pairing stability, which could shift the balance with respect to the relative importance of target association, predicted to become less of a concern, and product release, predicted to become more of a concern.

The discovery that a G–G mismatch at position 6 of the guide enhances both pre-miR-451 cleavage and target slicing raises the question of whether other mismatches or wobbles at other seed positions might also enhance these activities. Thus far, we have not found a mismatch that confers a benefit at position 4, but within the context of miR-451, a G–A mismatch at position 6 also imparts a benefit. Another key question is how the G–G (or G–A) mismatch confers its benefit. The perturbed geometry or increased flexibility imparted by this mismatch presumably favors either the transition-state geometry of the active site or an on-pathway pre-chemistry conformational change. Potentially related to this question is that of how the identity of the guide RNA can have such a large influence on the slicing rate of bound substrate, which differed by >100 fold, depending upon whether the substrate was paired to miR-1 or to miR-451 (hsAGO2-catalyzed slicing rates of 2.2 min⁻¹ and 0.019 min⁻¹, respectively;

Figure S1C and Figure 4C). Now that these differences and the unanticipated tradeoff between

the pairing preferences for binding and slicing are known, systematic biochemical and

biophysical studies can be designed to take aim at these questions.

ACKNOWLEDGMENTS

We thank B. Kleaveland and N. Bisaria for helpful discussions, and S. McGeary for

experimental and technical advice. This research was supported by NIH grants GM061835 and

GM118135 (to D.P.B.). D.P.B. is an investigator of the Howard Hughes Medical Institute. D.P.B.

is a member of the scientific advisory board of Alnylam Pharmaceuticals. A patent has been

filed on this work.

AUTHOR CONTRIBUTIONS

G.R.C. and D.P.B. conceived the project, designed the study, and wrote the manuscript. G.R.C.

performed the experiments. H.S. provided guidance and expertise.

STAR METHODS

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and reagents should be directed to and will be

fulfilled by the Lead Contact, David Bartel ([email protected]).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

HEK293T cells were cultured in DMEM (VWR) with 10% FBS (Clontech). All cells were

cultured at 37°C with 5% CO2. Cell lines were of female origin. All zebrafish procedures were

approved by the MIT Committee on Animal Care.

METHOD DETAILS

Plasmids

All plasmids generated for this study are listed in the Key Resources Table and available at

Addgene with maps and sequences. To construct Ago2 plasmids, the coding sequence of

human or zebrafish Ago2 was inserted into the pCS2+ vector (RZPD). The sequence encoding

the 3X-FLAG tag was then added downstream of the start codon by PCR-mediated insertion to

generate pCS2+-FLAG3-hsAgo2 and pCS2+-FLAG3-drAgo2. To construct domain-swap plasmids, both the coding sequence of the N domain as well as the remaining domains of human or zebrafish Ago2 were separately amplified by PCR and then spliced together by overlap extension PCR. Point substitutions were introduced by PCR-based mutagenesis using

QuikChange Lightning Multi Site-Directed Mutagenesis (Agilent) to generate plasmids encoding

FLAG-tagged mutant Ago2 proteins. To construct plasmids used to generate the injected miR-

430 target RNAs, the coding sequence of Zeocin was inserted into the pCS2+ vector. A single miR-430 site (perfect, mismatch, or G–G mismatch) was inserted downstream of the Zeocin

sequence using QuikChange Lightning Multi Site-Directed Mutagenesis. To construct plasmids

used to generate purified programed Ago2 complexes, the coding sequence of human or

zebrafish Ago2 was inserted into the pcDNA3 vector (Invitrogen) and then the 3X-FLAG tag and

point substitutions were introduced as described above.

In vivo slicing assay

Ago2 mRNAs and target RNAs were transcribed in vitro using mMESSAGE mMACHINE SP6

according to the manufacturer’s instructions (Life Technologies). The transcribed RNA was

purified with RNeasy Mini (QIAGEN) according to the manufacturer’s instructions, ethanol

precipitated, and stored in water at –80°C. One-cell embryos were injected with target RNA containing one miR-430 site (10 pg/embryo) with or without additional Ago2 mRNA (100

pg/embryo) in a volume of 1 nL (PLI-100 Plus Pico-Injector, Harvard Apparatus). For each

condition, 50 embryos were injected. Embryos that developed to the sphere stage (which was

approximately 4 hpf and when endogenous miR-430 peaks) were manually de-chorionated,

pooled by condition, and placed in 1 mL TRI Reagent (Life Technologies) for RNA isolation.

Isolated RNA was chloroform extracted, ethanol precipitated, and stored at –80°C prior to

analysis on small-RNA blots. The in vivo assays for miR-1–guided slicing were performed as

described (Giraldez et al., 2005; Cifuentes et al., 2010).
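The per-embryo doses above imply the RNA concentrations needed in the injection mix. The following is a minimal sketch of that unit conversion. The function name is ours, and it assumes that the full dose is delivered in the stated 1 nL with no loss.

```python
def mix_concentration_ng_per_ul(dose_pg, injection_volume_nl):
    """Concentration (ng/uL) needed to deliver dose_pg in injection_volume_nl.

    pg/nL and ng/uL are the same unit (both terms scale by 1000), so no factor is needed.
    """
    return dose_pg / injection_volume_nl

target_rna = mix_concentration_ng_per_ul(10, 1)    # target RNA: 10 pg per embryo
ago2_mrna = mix_concentration_ng_per_ul(100, 1)    # Ago2 mRNA: 100 pg per embryo
print(target_rna, ago2_mrna)  # 10.0 100.0
```

When both RNAs are co-injected in the same 1 nL, the mix would therefore need to contain roughly 10 ng/µL of target RNA and 100 ng/µL of Ago2 mRNA.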

Small-RNA blot

Total RNA (2 µg) from each condition was denatured and resolved on a 5% urea-polyacrylamide gel. RNA was then electroblotted onto an Amersham Hybond-NX nylon

membrane (VWR) and UV cross-linked at 254 nm. Membranes were pre-incubated with

ULTRAhyb Ultrasensitive hybridization buffer (Life Technologies) at 68°C under rotation for 1

hour and then hybridized under the same conditions overnight with a body-labeled RNA probe complementary to the 5′ cleavage product. This RNA probe was transcribed in vitro using

MAXIscript T7 RNA polymerase according to the manufacturer’s instructions (Life

Technologies), replacing the UTP with [α-32P]UTP (PerkinElmer), and desalted with Micro Bio-Spin P-30 gel columns (BioRad). Radiolabeled RNA was purified on a 4% urea-polyacrylamide

gel, eluted from gel slices in 0.3 M NaCl overnight at 4°C, and ethanol precipitated prior to

incubation with the membrane. Membranes were then washed twice with low-stringency buffer

(2X SSC and 0.1% SDS) for 5 min under rotation at 68°C, and once with high-stringency buffer

(0.1X SSC and 0.1 % SDS) for 30 min under rotation at 68°C. The blots were then exposed to a

phosphorimaging screen for 1–14 days. Signal was detected using the Typhoon FLA 7000

phosphorimager (GE Healthcare Life Sciences) and analyzed using the MultiGauge software

(FujiFilm). RNA blots used to monitor miR-1–guided slicing were prepared following the

NorthernMax-Gly Kit (Life Technologies) according to the manufacturer’s instructions. Total

RNA (1.5 µg) from each condition was incubated with Glyoxal Loading Dye for 30 min at 50°C and resolved on a 1.5% agarose gel. RNA was then transferred onto a nylon membrane using a

Whatman Nytran SuPerCharge TurboBlotter (Sigma) for 2 hours and UV cross-linked at 254 nm. Membranes were pre-incubated with ULTRAhyb-Oligo hybridization buffer (Life

Technologies) at 42°C under rotation for 1 hour and then hybridized under the same conditions overnight with an end-labeled DNA probe complementary to the 5′ cleavage product. This DNA probe (IDT) (Table S1) was purified on a urea-polyacrylamide gel, phosphorylated with [γ-

32P]ATP using T4 polynucleotide kinase (New England BioLabs), desalted with Micro Bio-Spin

P-30 Gel Columns (BioRad), and gel purified again. All subsequent steps were as above.

miRNA Duplexes

Synthetic RNA oligonucleotides (IDT) representing the miR-430b and miR-1 duplex (Table S1) were purified on a 15% urea-polyacrylamide gel and incubated in 2X Annealing buffer (60 mM

Tris-HCl pH 7.5, 200 mM NaCl, 2 mM EDTA) at 90°C for 2 min and then slow cooled to room temperature over >3 hours. Annealed RNA was separated from ssRNAs on a native 15%

polyacrylamide gel, and duplex was eluted from gel slices in 0.3 M NaCl overnight at 4°C, ethanol precipitated, and stored in 1X Annealing buffer at –80°C.

miR-430–, miR-1–, and miR-451–programed Ago2

HEK293T cells were cultured in DMEM (VWR) supplemented with 10% heat-inactivated fetal bovine serum at 37°C with 5% CO2 and split every second or third day at approximately 90% confluency. Cells grown in 150 mm plates were co-transfected with pcDNA3-FLAG3-Ago2 and pMAX-GFP (control) using Lipofectamine 2000 (Life Technologies) at approximately 50% confluency, according to the manufacturer’s instructions. At 24 hours post-transfection, cells were transferred to 254 mm square plates and allowed to grow for another 48 hours. Cells were harvested, and S100 extracts were prepared as described (Wee et al., 2012), except that the cells were lysed with a 23G needle and syringe. miR-430 or miR-1 duplex or pre-miR-451 was incubated in 1 mL extract at a final concentration of 50 nM for 2 hours at 25°C, and then programmed Ago2 was affinity purified using a protocol modified from that of Flores-Jasso et al.

(2013). Assembled Ago2-RISC was first captured with a 3′ biotinylated 2′–O–methyl-modified oligonucleotide that paired with nucleotides 2–8 of the miRNA (Table S1) and displaced with a competitor DNA oligonucleotide (Table S1) as described (Wee et al., 2012; Flores-Jasso et al.,

2013), except the complex was displaced from the capture oligonucleotide in two successive rounds of elution, each with 100 µL of elution solution (10 µM competitor oligo in 18 mM HEPES pH 7.4, 1 mM potassium acetate, 3 mM magnesium acetate, 0.01% NP-40, 0.2 mg/mL BSA,

0.01 mg/mL yeast tRNA) for 2 hours. To remove complex that had formed with endogenous

Agos from the extract, the complex with the ectopically expressed Ago2 was immunopurified based on affinity to the FLAG tag. Anti-FLAG M2 Magnetic Beads (Sigma) were equilibrated with binding buffer (18 mM HEPES pH 7.4, 100 mM potassium acetate, 1 mM magnesium acetate, 0.01 mg/mL yeast tRNA, 0.01% NP-40, 0.2 mg/mL BSA) and incubated with the pooled elution fractions for 2 hours, shaking at 1100 rpm on a ThermoMixer (Eppendorf) at 25°C. The

beads were washed with binding buffer three times and eluted with FLAG peptide (Sigma) in

binding buffer. The eluted protein in storage buffer [binding buffer supplemented with glycerol

and DTT (13% and 1 mM final concentrations, respectively), which diluted the binding buffer by

25%] was flash frozen in liquid nitrogen and stored at –80°C. To measure the binding capacity

of the complex and thereby determine its concentration, complex was incubated with excess

radiolabeled target RNA that contained a phosphorothioate linkage flanked by 2′–O–methyl-

ribose at positions 10 and 11 to block cleavage (Table S1), and layered nitrocellulose–nylon

filter-binding assays were performed to quantitate bound and unbound RNA.

RNA targets for in vitro slicing assays

Targets for in vitro slicing assays were transcribed in vitro with T7 RNA polymerase, treated with

TURBO DNase (Life Technologies) and purified on a urea-polyacrylamide gel. Purified RNA was capped in two batches to generate RNA with high and low specific activity, using the

Vaccinia capping system (New England BioLabs) according to the manufacturer’s directions.

RNA of high specific activity was prepared by incubating 10 picomol RNA with only [α-32P]GTP

for 45 min in a 10 µl reaction, before adding 0.5 nanomol GTP for another 45 min, and RNA of

low specific activity was prepared by using a 140:1 molar ratio of GTP:[α-32P]GTP. Capped RNA

was gel purified, phenol-chloroform and chloroform extracted, ethanol precipitated, resuspended in water, quantified by UV absorbance (NanoDrop) and stored at –80°C. RNA of high specific activity was used for the single-turnover assays, and mixtures of RNA of high and low specific activity were used for the multiple-turnover assays, using the high-specific-activity RNA to optimize the amount of radioactivity but including mostly low-specific-activity RNA to maintain accurate concentrations.

In vitro slicing assay

Slicing assays were performed in the 37°C warm room. Pre-mixed, cap-labeled target RNA was

incubated in reaction buffer (18 mM HEPES pH 7.4, 100 mM potassium acetate, 1 mM

magnesium acetate, 0.01 mg/mL yeast tRNA, 0.01% NP-40, 5 mM DTT) for 15 min at 37°C,

and then miR-430–, miR-1–, or miR-451–programed Ago2 was added to initiate the slicing

reaction. Reactions were incubated at 37°C, and 2 µL aliquots were removed at each indicated

time point and quenched with Gel Loading Buffer II (95% formamide, 18 mM EDTA, 0.025%

SDS, 0.025% xylene cyanol, and 0.025% bromophenol blue; Life Technologies). To monitor

slicing, RNAs were resolved on a urea-polyacrylamide gel, and radiolabeled target and product were visualized on the Typhoon FLA 7000 phosphorimager.

pre-miR-451 binding and cleavage assay

Synthetic pre-miR-451 RNAs (IDT) (Table S1) were purified on a urea-polyacrylamide gel,

phosphorylated with [γ-32P]ATP using T4 polynucleotide kinase (New England BioLabs), desalted with Micro Bio-Spin P-30 gel columns (BioRad), and gel purified again. One-cell embryos were co-injected with end-labeled pre-miR-451 (10 pg/embryo) and Ago2 mRNA (100

pg/embryo), injecting 250–300 embryos for each condition (i.e., each lane on the gel). At 4 hpf,

injected embryos were manually de-chorionated in the presence of 1 mg/mL pronase (EMD

Millipore), washed 3 times with E3 buffer (5 mM NaCl, 0.17 mM KCl, 0.33 mM CaCl2, and 0.33

mM MgSO4), and transferred to a 0.6 mL Eppendorf tube. To break the yolk sac, embryos were gently pipetted with 400 µL of de-yolking buffer [55 mM NaCl, 1.8 mM KCl, 1.25 mM NaHCO3, protease inhibitor cocktail tablet (cOmplete, mini, EDTA-free, Sigma; one tablet per 10 mL buffer)]. The embryos were then shaken at 1100 rpm on a ThermoMixer for 5 min and centrifuged at 300 g for 1 min to separate the yolk from the cells. The yolk-containing supernatant was removed, and this de-yolking process was repeated 3 times. Embryo lysis buffer (25 mM Tris pH 7.5, 2 mM EDTA, 150 mM NaCl, 10% glycerol, 1% Triton X-100) was then added, and the sample was vortexed with 0.5 mm glass beads (BioSpec) for 5 seconds every 30 seconds for 4 min at 4°C. The 0.6 mL Eppendorf tube was punctured at its bottom with a 26G needle and placed inside a 1.5 mL Eppendorf tube. The lysate was separated from the glass beads and

clarified by centrifugation at 21,130 g for 10 min at 4°C. Clarified lysates were flash frozen in

liquid nitrogen and stored at –80°C prior to FLAG immunoprecipitation. To immunoprecipitate

FLAG-tagged Ago2, the lysate was incubated with Anti-FLAG M2 Magnetic Beads (Sigma)

under rotation at 4°C overnight in binding buffer (25 mM Tris pH 7.5, 2 mM EDTA, 150 mM

NaCl, 10% glycerol, 1% Triton X-100). The lysate was removed, and the beads were washed

three times with wash buffer (25 mM Tris pH 7.5, 2 mM EDTA, 1 M NaCl, 10% glycerol, 1%

Triton X-100) for 10 min each, under rotation, at 4°C. 1 mL TRI Reagent (Life Technologies)

was added directly to the beads to extract total RNA from the immunoprecipitated material. After

chloroform extraction and ethanol precipitation, extracted RNA was resolved on urea-

polyacrylamide gels, visualized on the Typhoon FLA 7000 phosphorimager, and analyzed using

the MultiGauge software.

Immunoblotting

Lysate from ~10 embryos from each condition was boiled and denatured in SDS loading dye

(125 mM Tris pH 6.8, 4% SDS, 20% glycerol, bromophenol blue, 5% β-mercaptoethanol) for 10

min at 90°C and resolved on a NuPAGE 4–12% Bis-Tris protein gel (Life Technologies). Protein was then blotted onto a PVDF membrane (Life Technologies) at 110 mA for 2 hours. Blots probed for FLAG were blocked in PBST (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8 mM

KH2PO4, 0.1% Tween-20) containing 5% milk for 1 hour and probed with mouse anti-FLAG

antibody (Sigma, diluted 1:5000 in PBST), rocking for 1 hr. After washing with PBST for 10 min

three times, blots were probed for the primary antibody using HRP-conjugated donkey anti-

mouse IgG (GE Healthcare Life Sciences, diluted 1:10,000) for 1 hour. Blots probed for GAPDH

were blocked with 1% BSA in PBST, probed with the primary antibody, rabbit anti-GAPDH

(Abcam, diluted 1:5000 in PBST), and the primary antibody was probed using HRP-conjugated

donkey anti-rabbit IgG (GE Healthcare Life Sciences, diluted 1:10,000). After washing with

PBST for 10 min three times, HRP detection was through enhanced chemiluminescence using Amersham ECL prime western blotting detection reagent (Thermo Fisher Scientific),

according to the manufacturer’s instructions, with light detected on Amersham Hyperfilm ECL

(Thermo Fisher Scientific).

QUANTIFICATION AND STATISTICAL ANALYSIS

Analysis of in vitro slicing assays

Single-turnover slicing data were analyzed using the MultiGauge software to calculate percent cleaved. The data were fit in MATLAB to the exponential equation,

$$F(t) = 1 - e^{-kt},$$

where F(t) is target cleaved over time, and k is the rate constant according to the following scheme,

$$E + S \xrightarrow{k} E + P.$$

Multiple-turnover slicing data were fit in MATLAB to the burst and steady-state equation (Wee et al., 2012),

$$F(t) = E \times \frac{a^{2}}{(a+b)^{2}} \left(1 - e^{-(a+b)t}\right) + E \times \frac{ab}{a+b}\, t,$$

where F(t) is target cleaved over time, E is the enzyme concentration, and a and b are rate constants (reported as k1 and k2, respectively) according to the following scheme,

$$E + S \xrightarrow{a} E \cdot P \xrightarrow{b} E + P.$$
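For readers who want to experiment with the two fitting models, the sketch below restates them in Python. It is not the MATLAB code used for the analysis: the function names, the grid-search fitting routine, and the synthetic time points are all our own assumptions, chosen only to show that the equations behave as described (no product at t = 0, and a linear steady-state phase with slope E·ab/(a+b) once the burst is complete).

```python
import math

def single_turnover(t, k):
    """F(t) = 1 - exp(-k t): fraction of target cleaved under single-turnover conditions."""
    return 1.0 - math.exp(-k * t)

def burst_steady_state(t, E, a, b):
    """Burst plus steady-state product formation for E + S -a-> E.P -b-> E + P."""
    burst = E * (a / (a + b)) ** 2 * (1.0 - math.exp(-(a + b) * t))
    steady = E * (a * b / (a + b)) * t
    return burst + steady

def fit_single_turnover(times, fractions, k_lo=1e-3, k_hi=10.0, steps=20000):
    """Crude least-squares estimate of k by log-spaced grid search.

    A stand-in for a proper nonlinear fit; hypothetical helper, not the authors' method.
    """
    best_k, best_sse = None, float("inf")
    for i in range(steps + 1):
        k = k_lo * (k_hi / k_lo) ** (i / steps)
        sse = sum((single_turnover(t, k) - f) ** 2 for t, f in zip(times, fractions))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k

# Synthetic (not experimental) data generated from k = 0.26 min^-1.
times = [0.5, 1, 2, 4, 8, 16]
data = [single_turnover(t, 0.26) for t in times]
k_fit = fit_single_turnover(times, data)
print(f"recovered k is about {k_fit:.3f} min^-1")

# Late-time slope of the burst model, using the reported k1 and k2 for the G-G substrate.
slope = burst_steady_state(1001, 1.0, 0.94, 0.074) - burst_steady_state(1000, 1.0, 0.94, 0.074)
print(f"steady-state slope per unit enzyme: {slope:.4f} min^-1")
```

Note that in this model the late-time slope is E·ab/(a+b) rather than E·b, so it approaches E·b only when a is much larger than b.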

DATA AND SOFTWARE AVAILABILITY

Raw experimental data from this study have been deposited to Mendeley Data and can be

found at https://data.mendeley.com/datasets/fjg3b788k5/draft?a=8a5b69b5-2580-498e-b03a-

cdd8492a183f.

KEY RESOURCE TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Antibodies
Monoclonal ANTI-FLAG M2 antibody | Sigma | F9291-1MG
Anti-GAPDH antibody | Abcam | ab9485
Amersham ECL Mouse IgG, HRP-linked Ab (from sheep) | GE Healthcare Life Sciences | NXA931-1ml

Bacterial and Virus Strains
One Shot TOP10 chemically competent E. coli | Life Technologies | C404006

Biological Samples
DMEM | VWR | 45000-304
Opti-MEM | Life Technologies | 31985062
FBS | Clontech | 631367

Chemicals, Peptides, and Recombinant Proteins
[γ-32P]ATP | PerkinElmer | NEG035C001MC
[α-32P]UTP | PerkinElmer | NEG007H001MC
[α-32P]GTP | PerkinElmer | NEG006H500UC
Pronase | EMD Millipore | 537088-50KU
NotI-HF | New England Biolabs | R3189L
cOmplete, Mini, EDTA-free protease inhibitor cocktail tablets | Sigma | 11836170001
TRI Reagent solution | Life Technologies | AM9738
Phenol:chloroform:isoamyl alcohol 25:24:1 | Sigma | P2069-100ML
Chloroform | J.T.Baker Analytical | 9180-01
SUPERase-In | Life Technologies | AM2696
TURBO DNase | Life Technologies | AM2239
Yeast tRNA | Life Technologies | 15401011
Dynabeads MyOne Streptavidin C1 | Life Technologies | 65002
ANTI-FLAG M2 magnetic beads | Sigma | M8823
3X-FLAG peptide | Sigma | F4799-4MG
GE Healthcare Amersham Hybond-XL membranes | Thermo Fisher Scientific | 45001147
GE Healthcare Amersham Hybond-NX membrane | VWR | 95038-412
Whatman Protran nitrocellulose membrane | Sigma | Z670898
PVDF pre-cut blotting membrane, 0.2 µm pore size | Life Technologies | LC2002
GE Healthcare Amersham Hyperfilm ECL | Thermo Fisher Scientific | 45001504
GE Healthcare Amersham ECL prime western blotting detection reagent | Thermo Fisher Scientific | 45002401
Gel loading buffer II (denaturing PAGE) | Life Technologies | AM8547
Whatman Nytran SuPerCharge (SPC) TurboBlotter | Sigma | WHA10416302

Critical Commercial Assays
mMESSAGE mMACHINE SP6 | Life Technologies | AM1340
RNeasy Mini Kit | QIAGEN | 74104
QuikChange Lightning Multi Site-Directed Mutagenesis | Agilent | 210515
Phusion High-Fidelity DNA polymerase | New England Biolabs | M0530L
T4 PNK | New England Biolabs | 101228-172
MAXIscript T7 RNA polymerase | Life Technologies | AM1312
Vaccinia capping system | New England Biolabs | M2080S
Lipofectamine 2000 | Life Technologies | 11668019
NuPAGE Novex 4-12% Bis-Tris Gel 1.0 mm | Life Technologies | NP0321BOX
NuPAGE MOPS SDS running buffer | Life Technologies | NP0001
NuPAGE transfer buffer | Life Technologies | NP00061
Micro Bio-Spin P-30 gel columns | Bio-Rad | 7326250
NorthernMax-Gly kit | Life Technologies | AM1946
ULTRAhyb Ultrasensitive hybridization buffer | Life Technologies | AM8670
ULTRAhyb-Oligo hybridization buffer | Life Technologies | AM8663

Deposited Data
Raw data | This paper, Mendeley Data | https://data.mendeley.com/datasets/fjg3b788k5/draft?a=f466501d-4dac-4b86-8d57-4cecb1fc3376

Experimental Models: Cell Lines
Human: HEK293T | ATCC | CRL-3216

Experimental Models: Organisms/Strains
Zebrafish | ZIRC | AB

Oligonucleotides
See Table S1 | This study | Table S1

Recombinant DNA
pCS2+-eGFP | This study | N/A
pCS2+-drAgo2 | This study | N/A
pCS2+-FLAG3-drAgo2 | This study | N/A
D–A pCS2+-FLAG3-drAgo2 | This study | N/A
D–E pCS2+-FLAG3-drAgo2 | This study | N/A
Y–F pCS2+-FLAG3-drAgo2 | This study | N/A
DY–EF pCS2+-FLAG3-drAgo2 | This study | N/A
pCS2+-FLAG3-hsN-drAgo2 | This study | N/A
pCS2+-hsAgo2 | This study | N/A
pCS2+-FLAG3-hsAgo2 | This study | N/A
D–A pCS2+-FLAG3-hsAgo2 | This study | N/A
E–D pCS2+-FLAG3-hsAgo2 | This study | N/A
F–Y pCS2+-FLAG3-hsAgo2 | This study | N/A
EF–DY pCS2+-FLAG3-hsAgo2 | This study | N/A
pCS2+-FLAG3-drN-hsAgo2 | This study | N/A
pCS2+-Zeocin | This study | N/A
pCS2+-Zeocin-miR-430 | This study | N/A
pCS2+-Zeocin-miR-430-10–11 mm | This study | N/A
pCS2+-Zeocin-miR-430-G–G mm | This study | N/A
pMAX GFP | Amaxa | VDF-1012
pcDNA3-drAgo2 | This study | N/A
pcDNA3-FLAG3-drAgo2 | This study | N/A
D–E pcDNA3-FLAG3-drAgo2 | This study | N/A
Y–F pcDNA3-FLAG3-drAgo2 | This study | N/A
DY–EF pcDNA3-FLAG3-drAgo2 | This study | N/A
D–A pcDNA3-FLAG3-drAgo2 | This study | N/A
pcDNA3-hsAgo2 | This study | N/A
pcDNA3-FLAG3-hsAgo2 | This study | N/A
GFP-3xPTmiR-1 | Giraldez lab | N/A

Software and Algorithms
MATLAB | Mathworks |
Multi Gauge 2.2 | Fujifilm |
GraphPad Prism | GraphPad Software |

Other

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures and one table.

Table S1. Oligonucleotides used in this study, Related to STAR Methods

DNA
Name | Sequence
Chimera N_for | GCTACTTGTTCTTTTTGCAGGATCC
drAgo2 N_rev | CCTCATGGATGGCAAGTGCCTCATGACAACATCCAGAGCC
hsAGO2 N_rev | CCTCATAGAGGGCAAATGTCTCATGACCACGTCCAGGGCC
drAgo2 body_for | TGGACGGCTACCAAACATCC
hsAGO2 body_for | TGCCTAGCGTCCCTTTTGAG
Chimera body_rev | TGGTTTGTCCAAACTCATCAA
drAgo2 D683A_sense | CATCATCTACTACAGAGCCGGCATCTCTGAAGGCC
drAgo2 D683A_antisense | GGCCTTCAGAGATGCCGGCTCTGTAGTAGATGATG
drAgo2 D651E_sense | CAGCACCGGCAGGAGATCATTCAGGATCTG
drAgo2 D651E_antisense | CAGATCCTGAATGATCTCCTGCCGGTGCTG
drAgo2 Y680F_sense | CCAACACGCATCATCTTCTACAGAGACGGCATC
drAgo2 Y680F_antisense | GATGCCGTCTCTGTAGAAGATGATGCGTGTTGG
hsAGO2 D669A_sense | TCATCTTCTACCGCGCCGGTGTCTCTGAAGG
hsAGO2 D669A_antisense | CCTTCAGAGACACCGGCGCGGTAGAAGATGA
hsAGO2 E637D_sense | GCAGCACCGGCAGGATATCATACAAGACCTG
hsAGO2 E637D_antisense | CAGGTCTTGTATGATATCCTGCCGGTGCTGC
hsAGO2_F666Y_sense | CCCACCCGCATCATCTATTACCGCGACGGTGT
hsAGO2 F666Y_antisense | ACACCGTCGCGGTAATAGATGATGCGGGTGGG
Zeocin target_miR-430_for | ACTGACTCGAGCCTCTAGAAATAAGCTACCCCAACTTGATAGCACTTTATAAGCTATAGTGAGTCGTATTACG
Zeocin target_miR-430_rev | CGTAATACGACTCACTATAGCTTATAAAGTGCTATCAAGTTGGGGTAGCTTATTTCTAGAGGCTCGAGTCAGT
Zeocin_miR-430_10–11 mm_for | ACTGACTCGAGCCTCTAGAAATAAGCTACCCCAACTTCTTAGCACTTTATAAGCTATAGTGAGTCGTATTACG
Zeocin miR-430_10–11 mm_rev | CGTAATACGACTCACTATAGCTTATAAAGTGCTAAGAAGTTGGGGTAGCTTATTTCTAGAGGCTCGAGTCAGT
Zeocin_miR-430_G–G_for | ACTGACTCGAGCCTCTAGAAATAAGCTACCCCAACTTGATAGGACTTTATAAGCTATAGTGAGTCGTATTACG
Zeocin_miR-430_G–G_rev | CGTAATACGACTCACTATAGCTTATAAAGTCCTATCAAGTTGGGGTAGCTTATTTCTAGAGGCTCGAGTCAGT
Zeocin probe_for | ATTCAGGATCATGGCCAAGTTG
Zeocin probe_rev | TTCTAATACGACTCACTATAGGGAGAAGGAGGTTTCTAGAGGCTCGAGTCAGTC
GFP probe | GTCCAGCTCGACCAGGATGGGCACCACCCCGGTGAACAGCTCCTCGCCCTTGCTCACCAT
168-nt target_for | GCGTAATACGACTCACTATAGGGTCACATCTCATCTACCTCC
168-nt target_miR-430_perfect_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAAGTGCTATCAAGTTGGGGTAGATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-430_10–11 mm_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAAGTGCTAAGAAGTTGGGGTAGATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-430_G–G_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAAGTCCTATCAAGTTGGGGTAGATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-430_G–A_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAAGTTCTATCAAGTTGGGGTAGATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-430_G–U_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAAGTACTATCAAGTTGGGGTAGATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-430_C4_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAAGTGCTATCAAGTTGGGGTAGATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-430_G4_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAACTGCTATCAAGTTGGGGTAGATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-430_A4_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAATTGCTATCAAGTTGGGGTAGATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-430_U4_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAAATGCTATCAAGTTGGGGTAGATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-451_C_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAACCGTTACCATTACTGAGTTATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-451_G_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAACCCTTACCATTACTGAGTTATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-451_A_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAACCTTTACCATTACTGAGTTATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-451_U_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTAAACCATTACCATTACTGAGTTATCCAGAGGAATTCATTATCAGTG
168-nt target_miR-1_rev | CCCATTTACATCGCGTTGAGTGTAGAACGGTTGTATAAAAGGTTGGAATGTAAAGAAGTATGTATATCCAGAGGAATTCATTATCAGTG
80-nt target_miR-430 | TTGTTGTTGTTGTTGTTGTTAAAGTGCTATCAAGTTGGGGTAGTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGCTCCCTATAGTGAGTCGTATTAGAA
80-nt target_miR-430_G–G | TTGTTGTTGTTGTTGTTGTTAAAGTCCTATCAAGTTGGGGTAGTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGCTCCCTATAGTGAGTCGTATTAGAA
miR-430 capture | mUmCmUmUmCmCmUmCmCmGmCmAmCmCmAmCmAmCmAmGmCmAmCmUmUmAmAmCmCmUmUmAmCmAmCmAmC/3Bio/
miR-430 competitor | AAGGTTAAGTGCTGTGTGGTGCGGAGGAAGA
miR-451 capture | mUmCmUmUmCmCmUmCmCmGmCmAmCmCmAmCmAmCmAmAmCmGmGmUmUmAmAmCmCmUmUmAmCmAmCmAmC/3Bio/
miR-451 competitor | AAGGTTAACCGTTGTGTGGTGCGGAGGAAGA
miR-1 capture | mUmCmUmUmCmCmUmCmCmGmCmAmCmCmAmCmAmCmAmCmAmUmUmCmCmAmAmCmCmUmUmAmCmAmCmAmC/3Bio/
miR-1 competitor | AAGGTTGGAATGTGTGTGGTGCGGAGGAAGA

RNA
Name | Sequence
miR-430b guide | AAAGUGCUAUCAAGUUGGGGUAG
miR-430b passenger | ACCCUAACUUUAGCAUCUUUCU
pre-miR-451 (ancestral) | AAACCGUUACCAUUACUGAGUUUAGUAAUGGUAAGGGUUCUG
pre-miR-451 (amniote) | AAACCGUUACCAUUACUGAGUUUAGUAAUGGUAACGGUUCUG
miR-1 guide | UGGAAUGUAAAGAAGUAUGUAUdTdT
miR-1 passenger | AUACAUACUUCUUUACAUUCGAdTdT

REFERENCES

Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. (2008). Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev 22, 2773-2785.

Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.

Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233.

Bernardi, G., Wiley, E.O., Mansour, H., Miller, M.R., Orti, G., Haussler, D., O'Brien, S.J., Ryder, O.A., and Venkatesh, B. (2012). The fishes of Genome 10K. Mar Genomics 7, 3-6.

Betancur, R.R., Broughton, R.E., Wiley, E.O., Carpenter, K., Lopez, J.A., Li, C., Holcroft, N.I., Arcila, D., Sanciangco, M., Cureton II, J.C., et al. (2013). The tree of life and a new classification of bony fishes. PLoS currents 5.

Bobbin, M.L., and Rossi, J.J. (2016). RNA Interference (RNAi)-Based Therapeutics: Delivering on the Promise? Annu Rev Pharmacol Toxicol 56, 103-122.

Broughton, R.E., Betancur, R.R., Li, C., Arratia, G., and Orti, G. (2013). Multi-locus phylogenetic analysis reveals the pattern and tempo of bony fish evolution. PLoS currents 5.

Cheloufi, S., Dos Santos, C.O., Chong, M.M., and Hannon, G.J. (2010). A dicer-independent miRNA biogenesis pathway that requires Ago catalysis. Nature 465, 584-589.

Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E., Mane, S., Hannon, G.J., Lawson, N.D., et al. (2010). A novel miRNA processing pathway independent of Dicer requires Argonaute2 catalytic activity. Science 328, 1694-1698.

Davis, E., Caiment, F., Tordoir, X., Cavaille, J., Ferguson-Smith, A., Cockett, N., Georges, M., and Charlier, C. (2005). RNAi-mediated allelic trans-interaction at the imprinted Rtl1/Peg11 locus. Curr Biol 15, 743-749.

De Rienzo, G., Gutzman, J.H., and Sive, H. (2012). Efficient shRNA-mediated inhibition of gene expression in zebrafish. Zebrafish 9, 97-107.

Dong, M., Fu, Y.F., Du, T.T., Jing, C.B., Fu, C.T., Chen, Y., Jin, Y., Deng, M., and Liu, T.X. (2009). Heritable and lineage-specific gene knockdown in zebrafish embryo. PLoS One 4, e6125.

Drinnenberg, I.A., Fink, G.R., and Bartel, D.P. (2011). Compatibility with killer explains the rise of RNAi-deficient fungi. Science 333, 1592.

Drinnenberg, I.A., Weinberg, D.E., Xie, K.T., Mower, J.P., Wolfe, K.H., Fink, G.R., and Bartel, D.P. (2009). RNAi in budding yeast. Science 326, 544-550.

Elbashir, S.M., Harborth, J., Lendeckel, W., Yalcin, A., Weber, K., and Tuschl, T. (2001a). Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411, 494-498.

Elbashir, S.M., Lendeckel, W., and Tuschl, T. (2001b). RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Dev 15, 188-200.

Faehnle, C.R., Elkayam, E., Haase, A.D., Hannon, G.J., and Joshua-Tor, L. (2013). The making of a slicer: activation of human Argonaute-1. Cell reports 3, 1901-1909.

Flemr, M., Malik, R., Franke, V., Nejepinska, J., Sedlacek, R., Vlahovicek, K., and Svoboda, P. (2013). A retrotransposon-driven dicer isoform directs endogenous small interfering RNA production in mouse oocytes. Cell 155, 807-816.

Flores-Jasso, C.F., Salomon, W.E., and Zamore, P.D. (2013). Rapid and specific purification of Argonaute-small RNA complexes from crude cell lysates. RNA 19, 271-279.

Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome research 19, 92-105.

Giraldez, A.J., Cinalli, R.M., Glasner, M.E., Enright, A.J., Thomson, J.M., Baskerville, S., Hammond, S.M., Bartel, D.P., and Schier, A.F. (2005). MicroRNAs regulate brain morphogenesis in zebrafish. Science 308, 833-838.

Gruber, J., Manninga, H., Tuschl, T., Osborn, M., and Weber, K. (2005). Specific RNAi mediated gene knockdown in zebrafish cell lines. RNA Biol 2, 101-105.

Grundhoff, A., and Sullivan, C.S. (2011). Virus-encoded microRNAs. Virology 411, 325-343.

Hansen, T.B., Wiklund, E.D., Bramsen, J.B., Villadsen, S.B., Statham, A.L., Clark, S.J., and Kjems, J. (2011). miRNA-dependent gene silencing involving Ago2-mediated cleavage of a circular antisense RNA. EMBO J 30, 4414-4422.

Hauptmann, J., Dueck, A., Harlander, S., Pfaff, J., Merkl, R., and Meister, G. (2013). Turning catalytically inactive human Argonaute proteins into active slicer enzymes. Nature structural & molecular biology 20, 814-817.

Hauptmann, J., Kater, L., Loffler, P., Merkl, R., and Meister, G. (2014). Generation of catalytic human Ago4 identifies structural elements important for RNA cleavage. RNA 20, 1532-1538.

Iwasaki, Y.W., Siomi, M.C., and Siomi, H. (2015). PIWI-Interacting RNA: Its Biogenesis and Functions. Annu Rev Biochem 84, 405-433.

Jonas, S., and Izaurralde, E. (2015). Towards a molecular understanding of microRNA-mediated gene silencing. Nat Rev Genet 16, 421-433.

Kelly, A., and Hurlstone, A.F. (2011). The use of RNAi technologies for gene knockdown in zebrafish. Briefings in functional genomics 10, 189-196.

Kim, V.N. (2005). MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol 6, 376-385.

Lee, H.S., Seok, H., Lee, D.H., Ham, J., Lee, W., Youm, E.M., Yoo, J.S., Lee, Y.S., Jang, E.S., and Chi, S.W. (2015). Abasic pivot substitution harnesses target specificity of RNA interference. Nat Commun 6, 10154.

Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J., Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. (2004). Argonaute2 is the catalytic engine of mammalian RNAi. Science 305, 1437-1441.

Ma, J.B., Yuan, Y.R., Meister, G., Pei, Y., Tuschl, T., and Patel, D.J. (2005). Structural basis for 5'-end-specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature 434, 666-670.

Malone, C.D., and Hannon, G.J. (2009). Small RNAs as guardians of the genome. Cell 136, 656-668.

Mangos, S., Vanderbeld, B., Krawetz, R., Sudol, K., and Kelly, G.M. (2001). Ran binding protein RanBP1 in zebrafish embryonic development. Mol Reprod Dev 59, 235-248.

Meister, G., Landthaler, M., Patkaniowska, A., Dorsett, Y., Teng, G., and Tuschl, T. (2004). Human Argonaute2 mediates RNA cleavage targeted by miRNAs and siRNAs. Mol Cell 15, 185-197.

Nakanishi, K., Ascano, M., Gogakos, T., Ishibe-Murakami, S., Serganov, A.A., Briskin, D., Morozov, P., Tuschl, T., and Patel, D.J. (2013). Eukaryote-specific insertion elements control human ARGONAUTE slicer activity. Cell reports 3, 1893-1900.

Nakanishi, K., Weinberg, D.E., Bartel, D.P., and Patel, D.J. (2012). Structure of yeast Argonaute with guide RNA. Nature 486, 368-374.

Oates, A.C., Bruce, A.E., and Ho, R.K. (2000). Too much interference: injection of double-stranded RNA has nonspecific effects in the zebrafish embryo. Dev Biol 224, 20-28.

Parker, J.S., Roe, S.M., and Barford, D. (2005). Structural insights into mRNA recognition from a PIWI domain-siRNA guide complex. Nature 434, 663-666.

Patrick, D.M., Zhang, C.C., Tao, Y., Yao, H., Qi, X., Schwartz, R.J., Jun-Shen Huang, L., and Olson, E.N. (2010). Defective erythroid differentiation in miR-451 mutant mice mediated by 14-3-3zeta. Genes Dev 24, 1614-1619.

Rasmussen, K.D., Simmini, S., Abreu-Goodger, C., Bartonicek, N., Di Giacomo, M., Bilbao-Cortes, D., Horos, R., Von Lindern, M., Enright, A.J., and O'Carroll, D. (2010). The miR-144/451 locus is required for erythroid homeostasis. J Exp Med 207, 1351-1358.

Salomon, W.E., Jolly, S.M., Moore, M.J., Zamore, P.D., and Serebrov, V. (2015). Single-Molecule Imaging Reveals that Argonaute Reshapes the Binding Properties of Its Nucleic Acid Guides. Cell 162, 84-95.

Shabalina, S.A., and Koonin, E.V. (2008). Origins and evolution of eukaryotic RNA interference. Trends in ecology & evolution 23, 578-587.

Shin, C., Nam, J.W., Farh, K.K., Chiang, H.R., Shkumatava, A., and Bartel, D.P. (2010). Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38, 789-802.

Song, J.J., Smith, S.K., Hannon, G.J., and Joshua-Tor, L. (2004). Crystal structure of Argonaute and its implications for RISC slicer activity. Science 305, 1434-1437.

Song, R., Hennig, G.W., Wu, Q., Jose, C., Zheng, H., and Yan, W. (2011). Male germ cells express abundant endogenous siRNAs. Proc Natl Acad Sci U S A 108, 13159-13164.

Stein, P., Rozhkov, N.V., Li, F., Cardenas, F.L., Davydenko, O., Vandivier, L.E., Gregory, B.D., Hannon, G.J., and Schultz, R.M. (2015). Essential Role for endogenous siRNAs during meiosis in mouse oocytes. PLoS genetics 11, e1005013.

Subtelny, A.O., Eichhorn, S.W., Chen, G.R., Sive, H., and Bartel, D.P. (2014). Poly(A)-tail profiling reveals an embryonic switch in translational control. Nature 508, 66-71.

Tam, O.H., Aravin, A.A., Stein, P., Girard, A., Murchison, E.P., Cheloufi, S., Hodges, E., Anger, M., Sachidanandam, R., Schultz, R.M., et al. (2008). Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453, 534-538.

Tomari, Y., and Zamore, P.D. (2005). Perspective: machines for RNAi. Genes Dev 19, 517-529.

Tuschl, T. (2001). RNA interference and small interfering RNAs. Chembiochem 2, 239-245.

Tyner, C., Barber, G.P., Casper, J., Clawson, H., Diekhans, M., Eisenhart, C., Fischer, C.M., Gibson, D., Gonzalez, J.N., Guruvadoo, L., et al. (2017). The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 45, D626-D634.

Watanabe, T., Totoki, Y., Toyoda, A., Kaneda, M., Kuramochi-Miyagawa, S., Obata, Y., Chiba, H., Kohara, Y., Kono, T., Nakano, T., et al. (2008). Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453, 539-543.

Wee, L.M., Flores-Jasso, C.F., Salomon, W.E., and Zamore, P.D. (2012). Argonaute divides its RNA guide into domains with distinct functions and RNA-binding properties. Cell 151, 1055-1067.

Weick, E.M., and Miska, E.A. (2014). piRNAs: from biogenesis to function. Development 141, 3458-3471.

Yang, J.S., Maurin, T., and Lai, E.C. (2012). Functional parameters of Dicer-independent microRNA biogenesis. RNA 18, 945-957.

Yang, J.S., Maurin, T., Robine, N., Rasmussen, K.D., Jeffrey, K.L., Chandwani, R., Papapetrou, E.P., Sadelain, M., O'Carroll, D., and Lai, E.C. (2010). Conserved vertebrate mir-451 provides a platform for Dicer-independent, Ago2-mediated microRNA biogenesis. Proc Natl Acad Sci U S A 107, 15163-15168.

Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594-596.

Yoda, M., Cifuentes, D., Izumi, N., Sakaguchi, Y., Suzuki, T., Giraldez, A.J., and Tomari, Y. (2013). Poly(A)-specific ribonuclease mediates 3'-end trimming of Argonaute2-cleaved precursor microRNAs. Cell reports 5, 715-726.

Zhao, Z., Cao, Y., Li, M., and Meng, A. (2001). Double-stranded RNA injection produces nonspecific defects in zebrafish. Dev Biol 229, 215-223.

CHAPTER III

DISCUSSION AND FUTURE DIRECTIONS

Our studies have uncovered previously unappreciated nuances of Argonaute2-catalyzed cleavage activity. By studying both teleost and mammalian Argonaute2, we identified enzyme and substrate requirements that not only restore efficient Argonaute2-catalyzed cleavage activity to zebrafish but also enhance the slicing activity of both zebrafish and human Argonaute2 proteins (Chen et al., 2017). These findings have implications for both the zebrafish community and the broader RNA silencing field.

We identified two highly conserved amino acid substitutions that are most likely responsible for the substantial decrease in slicing activity of zebrafish Argonaute2 (Chen et al., 2017). Based on sequence alignment and conservation analysis, these two point substitutions seem to have occurred approximately 300 million years ago in a common ancestor of the sequenced teleost fish. This unanticipated discovery that zebrafish, and indeed most extant fish species, lack efficient slicing suggests that a vertebrate species can persist and thrive without effective RNAi. This lack of efficient slicing activity perhaps hints at why RNAi has been an ineffective tool in zebrafish, as RNAi relies on Argonaute2-catalyzed slicing of target mRNAs, and it suggests a strategy for restoring this activity to zebrafish.

These studies have also shifted our fundamental understanding of the importance of seed pairing for target slicing. Seed pairing has always been considered crucial to guide-target binding, with mismatches in this region disrupting pairing (Wee et al., 2012; Salomon et al., 2015). However, we have demonstrated that a specific mismatch in the seed region, while hindering binding, can in fact enhance target slicing (Chen et al., 2017). This surprising discovery provides insights into the multi-step process of Argonaute2-catalyzed cleavage reactions and the nuances of guide-target-Argonaute interactions, with potential consequences for guide RNA design for RNA-silencing tools.

Implications of an efficient and cleavage-competent zebrafish Argonaute2

Surprisingly, the target slicing ability of zebrafish Argonaute2 had not been previously reported. In vitro functional assays of Argonaute2 activity had only been performed using the human or mouse protein, even in zebrafish studies (Cifuentes et al., 2010), presumably because the human or mouse protein was readily available and the zebrafish protein was assumed to be well conserved. Indeed, the catalytic tetrad of zebrafish Argonaute2 is DDDH, only one residue away from the mammalian DEDH. This single difference did not appear decisive, because the mammalian tetrad itself resembles, but does not exactly match, the DEDD motif found in RNase H domains and yeast Argonaute proteins (Nakanishi et al., 2012). In fact, an aspartate is significantly more similar in structure to a glutamate than to a histidine. These seemingly minor differences may be why the well-conserved zebrafish Argonaute2 was believed to also possess efficient endonuclease activity.

Since our discovery of two conserved amino acid changes (DEDH•F to DDDH•Y) that resulted in the loss of efficient slicing activity by zebrafish Argonaute2 (Chen et al., 2017), we have speculated upon the impact of this loss. At a glance, zebrafish, and in fact most extant teleost fish and thus the majority of vertebrate species on the planet, do not seem to have suffered from the absence of effective RNAi, a pathway that so many other species have exploited to defend against viruses and transposons (Tomari and Zamore, 2005; Malone and Hannon, 2009).

Although losing this mechanism of defense seems to have had a neutral effect on the teleost lineage, it has greatly and negatively impacted our use of zebrafish as a model organism. It does not seem unreasonable to speculate that the difficulty of using RNAi as a gene-knockdown tool in zebrafish could be entirely due to the lack of efficient target slicing by Argonaute. We found that zebrafish Argonaute2 slices targets almost 100-fold more slowly than its human counterpart (Chen et al., 2017). This vastly underperforming enzyme could potentially explain why, although general defects are observed when siRNAs are injected into zebrafish embryos, some scientists have nonetheless detected some specific phenotypes (Wargelius et al., 1999).

Our findings also raise the possibility of restoring this pathway to zebrafish. We have demonstrated that with two amino acid changes, zebrafish Argonaute2 is capable of slicing targets as effectively as human Argonaute2. With the emergence of novel gene-editing tools such as CRISPR-Cas9, it is now feasible to generate a zebrafish with a copy of Argonaute2 that possesses the ancestral, active residues. It will be interesting to see whether this repaired and enhanced zebrafish with a fully competent Argonaute2 will now be capable of RNAi or whether this tool remains elusive still.

As exciting as it would be to solve the problem of RNAi in zebrafish, we predict that RNAi will still remain intractable in the early embryo. Though an ineffective Argonaute2 seems like a convenient explanation, it is only one component of a plethora of events occurring in the early embryo. The accumulation of miR-430 and, more importantly, its role in eliminating maternally deposited transcripts as the zygotic genome becomes activated constitute a crucial step in embryonic development, one that cannot be avoided or understated. It has been speculated that the influx of exogenous siRNAs disrupts this developmental transition by overwhelming the population of endogenous Argonaute, resulting in the persistence of maternal transcripts and causing developmental defects. A method to circumvent this has been proposed in which additional miR-430 is co-injected along with the siRNA, which has been shown to alleviate some developmental defects; however, it does not fully eliminate the off-target effects (Zhao et al., 2008). In light of our research, it would seem that even if additional miR-430 were co-injected, the inability of endogenous Argonaute2 to effectively slice targets would still hinder the effectiveness of RNAi as a gene-knockdown tool.

If this were now performed in the background of a zebrafish with a cleavage-competent Argonaute2, it is more likely that the injected siRNAs would bind and direct the slicing of their endogenous targets. However, even in this context, the injected siRNA must compete with the overwhelming population of miR-430 for available Argonaute2. It is unclear what the stoichiometric ratio of miR-430 to endogenous Argonaute2 is in the early embryo, and thus whether there is plenty of Argonaute to bind miR-430 or whether Argonaute2 is limiting. Regardless, this competition suggests a need for additional miR-430 duplex as well as Argonaute2 protein to accommodate the increased load. This then raises the question of whether the early embryo is equipped to handle such large quantities of exogenous RNA and protein if additional miR-430 and Argonaute2 had to be co-injected along with the siRNA of interest.

The idea of a zebrafish with a fully functional Argonaute2 also prompts the question of why, from an evolutionary standpoint, this activity was lost in the first place. Could an effective Argonaute2 slicer be somehow detrimental to the teleost lineage? Perhaps it was necessary for the teleost lineage to lose efficient endonuclease activity. Currently, the presence of an active Argonaute2 protein, either human or zebrafish, does not seem to negatively affect zebrafish development up to 48 hours post fertilization, as determined by gross morphological observations and detection of the full-length protein by immunoblots (Chen, Sive, and Bartel, unpublished data). It will be interesting to see whether an Argonaute2 capable of efficient slicing has any effect on the fish.

Previous studies have shown that fish tolerate the loss of Argonaute2, although they exhibit a decrease in hemoglobinized erythrocytes (Cifuentes et al., 2010). These genetic studies identified a role for Argonaute2 catalytic activity in the processing of pre-miR-451, the precursor of a miRNA involved in erythrocyte maturation (Cifuentes et al., 2010). It was curious to us that zebrafish Argonaute2 was seemingly incapable of target slicing, yet necessary and just sufficient to cleave enough pre-miR-451 to avoid erythropoiesis defects. Our studies revealed that the impact of this loss of efficient slicing activity was mitigated by the presence and retention of an ancestral G–G mismatch within the pre-miR-451 hairpin. We found that a position 6 G–G mismatch enabled zebrafish Argonaute2-mediated pre-miR-451 cleavage, and without the mismatch, pre-miR-451 cleavage was essentially abolished. In other words, zebrafish were able to lose efficient Argonaute2 slicing activity only because of the presence of a G–G mismatch in pre-miR-451.

From an evolutionary standpoint, the realization that the majority of vertebrate species on the planet might not be able to use RNAi as a defense mechanism is quite astonishing. RNAi has long been thought to be the ancestral system of defense against invading viruses and transposons. How, then, do teleosts protect themselves against these foreign sequences? One possible coping mechanism, at least during germ cell and embryonic development, when myriad transposons are present, could be the piRNA pathway. Zebrafish Piwi proteins (Ziwi), expressed in both the male and female gonad, could use piRNAs, the majority of which map to transposable elements (TEs), to induce their cleavage and degradation (Houwing et al., 2007; Lau, 2010). Zebrafish oocytes lacking Ziwi exhibited elevated TE transcripts, which were proposed to cause genomic damage leading to apoptosis (Houwing et al., 2007). These studies indicated a clear role for both Ziwi and piRNAs in the silencing of repetitive elements in vertebrates. Thus, perhaps this piRNA pathway, which dominates in the zebrafish germline, diminishes the need for a robust RNAi pathway.

Interestingly, the loss of RNAi is not unprecedented. Several lineages of unicellular eukaryotes, including S. cerevisiae, T. cruzi, L. major, C. merolae, and P. falciparum, appear to have completely and independently lost RNAi (Shabalina and Koonin, 2008), showing that RNAi is nonessential. In S. cerevisiae, the re-introduction of RNAi results in toxin susceptibility. In fact, the lack of RNAi enables these yeast to acquire and maintain Killer, a dsRNA element that encodes a toxin that kills neighboring cells lacking Killer (Drinnenberg et al., 2011). Thus, in this context, RNAi-competence imparts a selective disadvantage. Whether something similar occurs in fish is as yet undetermined, especially as fish have not completely lost the ability to slice targets but merely have an attenuated version.

Regardless of the costs or benefits of losing RNAi, it will be fascinating from an evolutionary perspective, as well as beneficial for the use of zebrafish as a model organism, to restore full cleavage competence to a species that has lacked it for approximately 300 million years.

Effects of seed mismatches on target slicing

Our observation that a G–G mismatch in pre-miR-451 enabled Argonaute2-mediated cleavage prompted us to investigate whether a similar mismatch between a guide and a target would also enable target slicing. This idea ran counter to all previously published work, as mismatches in the seed have been shown to disrupt pairing, slow on rates, and increase off rates (Bartel, 2009; Wee et al., 2012; Salomon et al., 2015). To our surprise, a G–G mismatch at position 6 between a guide and a target enhanced target slicing by 3–5-fold (Chen et al., 2017). We demonstrated that, in the context of two different miRNAs that have a G at position 6, miR-430 and miR-451, a target G opposite that position enhanced the rate of target slicing.
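The geometry of a position 6 mismatch can be made concrete with a short sketch. The Python below builds the site that pairs perfectly with the miR-430b guide listed in Table S1 and then places a G opposite guide position 6. The function names are hypothetical and introduced only for illustration; the final check assumes that the Zeocin_miR-430_G–G_for oligonucleotide from Table S1 reads in the same orientation as the target transcript.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def perfect_target_site(guide: str) -> str:
    """Return the RNA site (5'->3') that pairs perfectly with the guide."""
    return "".join(COMPLEMENT[nt] for nt in reversed(guide))


def target_with_mismatch(guide: str, position: int, target_nt: str) -> str:
    """Place target_nt opposite a guide position (1-based from the guide 5' end)."""
    site = list(perfect_target_site(guide))
    # Guide position i pairs with target index len(guide) - i (0-based, 5'->3').
    site[len(guide) - position] = target_nt
    return "".join(site)


# miR-430b guide, as listed in Table S1.
MIR430B_GUIDE = "AAAGUGCUAUCAAGUUGGGGUAG"

# Zeocin_miR-430_G–G_for, as listed in Table S1 (DNA).
ZEOCIN_GG_FOR = (
    "ACTGACTCGAGCCTCTAGAAATAAGCTACCCCAACTTGATAGGACTTTATAA"
    "GCTATAGTGAGTCGTATTACG"
)

perfect_site = perfect_target_site(MIR430B_GUIDE)
gg_site = target_with_mismatch(MIR430B_GUIDE, position=6, target_nt="G")

print("perfect site :", perfect_site)
print("G-G mismatch :", gg_site)
# The mismatched site, written as DNA, should occur within the reporter oligo.
print("found in oligo:", gg_site.replace("U", "T") in ZEOCIN_GG_FOR)
```

The two sites differ at a single nucleotide: the C that would pair with guide G6 is replaced by a G, which leaves pairing at seed positions 2–5 and 7–8 untouched.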

Our studies separated the distinct steps of Argonaute2-catalyzed cleavage into binding, catalysis, and product release and found that this specific mismatch conferred a 3–5-fold advantage on two of the steps, catalysis and product release. This finding implies that these discrete steps can be separated and have different requirements. We speculate that these differences lie in their specific interactions within the body of Argonaute2. Prior studies have revealed how guide-target-Argonaute2 interactions remodel the binding properties of each component (Salomon et al., 2015); thus, it is plausible that these interactions could also affect how efficiently a target is sliced. One can imagine two different scenarios: 1) Argonaute2-centric: the G–G mismatch within the seed region forces Argonaute2 into a conformation that is closer to its catalytically active conformation or 2) target-centric: the G–G mismatch between the guide and the target aligns the duplex differently within the binding channel of Argonaute2 such that it brings the duplex in closer proximity to the active site. It is difficult to judge which of these possibilities is closer to the truth without a crystal structure of Argonaute2 bound to a guide and target with this specific mismatch.
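Why an advantage at both catalysis and product release matters can be illustrated with a minimal steady-state sketch. It assumes a simplified scheme that is not taken from the experiments themselves: at saturating target, the enzyme cycles through an irreversible slicing step and an irreversible product-release step, so the turnover number is k_slice·k_release/(k_slice + k_release). The rate constants below are hypothetical and chosen only for illustration.

```python
import math


def multiple_turnover_kcat(k_slice: float, k_release: float) -> float:
    """Steady-state turnover number for ES -> EP -> E + P at saturating target.

    Assumes both steps are irreversible and that rebinding of fresh target
    is fast, so only slicing and product release limit turnover.
    """
    return k_slice * k_release / (k_slice + k_release)


# Hypothetical rate constants (per minute); release is assumed to be slower.
K_SLICE = 1.0
K_RELEASE = 0.1
FOLD = 4.0  # within the 3-5-fold range reported for the mismatch

baseline = multiple_turnover_kcat(K_SLICE, K_RELEASE)
slicing_only = multiple_turnover_kcat(FOLD * K_SLICE, K_RELEASE)
both_steps = multiple_turnover_kcat(FOLD * K_SLICE, FOLD * K_RELEASE)

print(f"faster slicing only : {slicing_only / baseline:.2f}-fold")
print(f"faster slicing and release: {both_steps / baseline:.2f}-fold")
assert math.isclose(both_steps / baseline, FOLD)
```

Under these assumptions, speeding up slicing alone barely changes turnover when release is limiting, whereas speeding up both steps by the same factor increases turnover by that factor.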

The currently available crystal structure of human Argonaute2 shows that guide nucleotide position 6 is partially occluded by Argonaute2 (Schirle et al., 2014). In the absence of target RNA, Argonaute proteins kink their guide RNAs at the end of the seed region by inserting a hydrophobic residue on α-helix 7 between guide nucleotides 6 and 7. Thus, for target binding to occur, Argonaute2 must undergo a conformational change that shifts helix 7 to avoid steric clashes with target nucleotides 6 and 7. This shift also relaxes the guide RNA, allowing guide nucleotides 6 and 7 to adopt an A-form conformation for target pairing and the subsequent steps and conformational changes leading to full pairing. With these structural data in mind, a G–G mismatch at position 6 introduces an additional and unexpected kink. Presumably, the initial binding of nucleotides 2–5 is similar to that between a fully paired guide and target; what happens after that, we can only speculate. From our binding and cleavage assays, we know that a target containing this mismatch at position 6 binds approximately 2-fold more slowly than a perfectly paired target. Hypothetically, the additional steric clash of a G–G mismatch could slow the movement of helix 7 and distort the A-form conformation, leading to slower binding, which is what we and others have observed. Alternatively, once nucleotides 2–5 bind, the added steric hindrance of a G–G mismatch at position 6 could cause pairing to skip this position and the central region completely and instead re-nucleate at the supplemental region, which is open and exposed for target pairing (Figure 2 from Chapter 1; skipping step 2 and moving directly from step 1 to step 3). In both of these scenarios, Argonaute2 must position the bound target in its active site. Again, how a G–G mismatch enhances this step is unclear.

In our studies, which looked at target slicing under single-turnover conditions, the difference in binding became irrelevant, and only differences in slicing were observed. We speculate that as a guide and target become fully paired within Argonaute2, Argonaute2 must again undergo a separate conformational change to generate the active site, to properly position the target in the active site, or both. Here, the G–G mismatch could either promote or stabilize this active conformation, leading to faster slicing. Crystal structures of Argonaute2 bound to a guide and target containing the mismatch at the various steps of binding and cleavage would be instrumental in furthering our mechanistic understanding of how this mismatch is able to enhance target slicing.

This enhancement of Argonaute2-catalyzed cleavage activity was initially observed with respect to pre-miR-451 in zebrafish embryos. Pre-miR-451 is extensively base paired along its stem, with the exception of the G–G mismatch at position 6, and thus closely resembles a guide bound to a target, apart from the terminal loop. However, a critical difference between these two substrates is the manner in which they are loaded into Argonaute2: pre-miR-451 is loaded as a duplex, meaning Argonaute2 must immediately contend with the presence of the G–G mismatch upon encountering pre-miR-451, whereas a target must bind the pre-bound guide within Argonaute2, meaning Argonaute2 must somehow accommodate and incorporate the mismatch into the binding process. Whether this initial one-step (pre-miR-451) or two-step (target) process affects the manner in which the two substrates are positioned within the active site remains to be seen. However, since we predict that binding and cleavage are two separable steps, there may be no difference once both are properly bound. Again, it would be beneficial to have crystal structures of Argonaute2 bound to pre-miR-451 and to a guide and target with a G–G mismatch at position 6 to compare the two.

A natural question that arose after discovering this curious benefit is whether other mismatches at position 6 also confer the same enhancement in activity. Interestingly, for a target complementary to miR-430, only a G–G mismatch resulted in an advantage in target slicing, suggesting that there was something special about a G–G mismatch. In other words, it was not the case that any mismatch or any distortion in the duplex could impart this benefit; it was specifically a G–G mismatch. However, in the context of a miR-451 target, both a G–G and a G–A mismatch at position 6 enhanced slicing activity. Interestingly, the seeds of miR-430 and miR-451 are very similar, with the nucleotides immediately flanking position 6 swapped: positions 5–7 of miR-430 are UGC, whereas those of miR-451 are CGU. This raises the possibility that sequence context and nearest neighbors may influence the overall effect of a mismatch on target slicing. One could also imagine a scenario in which, depending on the sequence context, rather than forming a G–G mismatch, one G flips out and the other forms a G–U wobble with a neighboring U. Nevertheless, this discovery complicated our findings and widened the potential scope of mismatch effects. Contrary to our initial belief that a G–G mismatch was somehow unique, it seems that other mismatches can also confer this rate enhancement. Again, this raises the question of what properties of these mismatches allow for this enhancement. In which ways is a G–G mismatch in the context of a UGC sequence similar to or different from a G–G and a G–A mismatch in the context of a CGU sequence? Further, does the position of the mismatch matter? We tested a G–G mismatch, as well as a G–A mismatch and G–U wobble, at position 4 in the context of miR-430 and saw no rate enhancement as compared to the fully paired G–C match. Perhaps there are positions along the seed that are more able than others to tolerate mismatches, for instance nucleotides 6–8 rather than 2–5. To answer these questions and determine which mismatches enhance Argonaute2-catalyzed cleavage, every single mismatch along the seed would have to be tested in combination with every nearest-neighbor pair. Single-mismatch studies have rarely been conducted, as effects were thought to be minor for such extensively paired targets, and dinucleotide-mismatch studies have only looked at the effect on binding rather than the effect on slicing. These slicing studies could transform our notion of mismatches as an inherently negative property and reveal novel insights into how their structure could in fact enhance function and activity.
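To give a sense of the scale of such a comprehensive test, the design space can be enumerated. The following is only a rough sketch under stated assumptions: the seed is taken as guide positions 2–8, every non-Watson–Crick target base (including G–U wobbles) is counted as a "mismatch", and both flanking guide nucleotides are varied independently. The function name and tuple layout are hypothetical and do not describe an experiment performed in this work.

```python
from itertools import product

BASES = "ACGU"
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
SEED_POSITIONS = range(2, 9)  # guide positions 2-8 (assumed seed definition)


def single_mismatch_contexts():
    """Yield (position, 5' neighbour, guide base, 3' neighbour, target base).

    Only non-Watson-Crick guide:target combinations are kept, so G-U and
    U-G wobbles are counted alongside true mismatches (an assumption).
    """
    for pos, five, guide, three in product(SEED_POSITIONS, BASES, BASES, BASES):
        for target in BASES:
            if target != WATSON_CRICK[guide]:
                yield pos, five, guide, three, target


contexts = list(single_mismatch_contexts())
print(len(contexts))  # 7 positions x 4 x 4 x 4 x 3 = 1344
```

Under these assumptions, the two cases discussed above correspond to the tuples `(6, "U", "G", "C", "G")` for the miR-430 G–G mismatch and `(6, "C", "G", "U", "A")` for the miR-451 G–A mismatch, which illustrates how sparsely the space has been sampled so far.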

Identifying determinants of target slicing rates

Another intriguing discovery was that different miRNAs direct the slicing of their respective targets at different rates. Our studies found a >100-fold difference in slicing rates between three otherwise identical 168-nt targets that varied only with respect to the miRNA sequence (miR-1, miR-430, and miR-451). Human Argonaute2 programmed with miR-1 catalyzed slicing at a rate of 2.2 min⁻¹, with miR-430 at 0.253 min⁻¹, and with miR-451 at 0.019 min⁻¹, indicating that miR-1 directs the cleavage of its targets approximately 9- and 117-fold faster than miR-430 and miR-451, respectively (Figure 1). These differences are also seen in, and exaggerated by, the much less efficient zebrafish Argonaute2, with which miR-1 directs the cleavage of its targets approximately 28-fold faster than miR-430 (0.072 min⁻¹ and 0.0026 min⁻¹, respectively). Similarly, human AGO3 was shown to cleave miR-20a targets but not let-7a, miR-19b, or miR-16 targets (Park et al., 2017). Although different miRNAs might be expected to have different properties and thus exhibit variable repression, it is nonetheless unexpected to see differences as large as 9- to >100-fold.
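The fold differences quoted above can be recomputed directly from the reported rate constants. The short sketch below does only that arithmetic; the dictionary and function names are illustrative. Note that the rounded rates give roughly 8.7-, 116-, and 28-fold, so the small gap from the 117-fold stated in the text presumably reflects rounding of the reported rates.

```python
# Single-turnover slicing rates (min^-1) as quoted in the text.
human_rates = {"miR-1": 2.2, "miR-430": 0.253, "miR-451": 0.019}
zebrafish_rates = {"miR-1": 0.072, "miR-430": 0.0026}


def fold_over(rates, reference):
    """Return how many fold faster `reference` is than each other guide."""
    ref = rates[reference]
    return {name: ref / k for name, k in rates.items() if name != reference}


human_fold = fold_over(human_rates, "miR-1")
zebrafish_fold = fold_over(zebrafish_rates, "miR-1")

for name, fold in human_fold.items():
    print(f"human Argonaute2: miR-1 vs {name}: {fold:.1f}-fold")
for name, fold in zebrafish_fold.items():
    print(f"zebrafish Argonaute2: miR-1 vs {name}: {fold:.1f}-fold")
```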

All of our rates were determined under single-turnover conditions; thus, loading and binding should not play a role, and we should be observing only differences in slicing rates. This leads to the question: how does Argonaute2 distinguish between these different targets post-binding in a manner that results in different slicing rates? And are there sequence determinants that govern slicing? Each individual nucleotide is subtly different, and we know that sequence composition matters. For instance, a high GC content generally means more stability, and separately, the strand with the less thermodynamically stable 5′ end is generally chosen as the miRNA guide. However, whether and how sequence composition plays a role in slicing is as yet undetermined.
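To make the single-turnover rates more tangible, they can be converted into half-times. The sketch below assumes a simple first-order model in which the fraction of target cleaved at time t is 1 − exp(−kt); this single-exponential form is an illustrative assumption and is not necessarily the fitting procedure used to obtain the reported rates. Under that assumption, the human Argonaute2 rates correspond to half-times of roughly 0.3 min, 2.7 min, and 36 min for miR-1, miR-430, and miR-451, respectively.

```python
import math

# Reported single-turnover slicing rates for human Argonaute2 (min^-1).
rates_per_min = {"miR-1": 2.2, "miR-430": 0.253, "miR-451": 0.019}


def fraction_cleaved(k, t):
    """Fraction of target cleaved after t minutes, assuming first-order decay."""
    return 1.0 - math.exp(-k * t)


def half_time(k):
    """Time (min) at which half of the bound target has been cleaved."""
    return math.log(2) / k


half_times = {name: half_time(k) for name, k in rates_per_min.items()}

for name, t_half in half_times.items():
    print(f"{name}: t1/2 = {t_half:.2f} min")
```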

Hypothetically, if we use GC content as a determinant, we can postulate the following: miR-1 has the lowest GC content of the three miRNAs tested (27%, compared to 43% and 38% for miR-430 and miR-451, respectively), and miR-1 targets are also cleaved by far the most rapidly. Perhaps the lower the GC content, the faster a target is cleaved. However, that theory breaks down once miR-430 and miR-451 are compared, as miR-451 has a lower GC content than miR-430 but its targets are cleaved almost 13-fold more slowly. If the critical region is narrowed down to the two nucleotides between which Argonaute2 slices, positions 10 and 11, there is a noticeable difference between the three miRNAs in terms of GC content: 0% for miR-1 (AA), 50% for miR-430 (UC), and 100% for miR-451 (CC). Again, how Argonaute2 differentiates between these is unclear. Perhaps the binding channels and active site of Argonaute2 can recognize the nuanced differences between the nucleotides. We now know from our studies that even the highly conservative change of a glutamate to an aspartate can have a significant impact on the ability of Argonaute2 to cleave its substrate, so perhaps it is not unreasonable to think that the subtle differences between nucleotides could affect how well the substrate is positioned within the active site and recognized by the enzyme. One can imagine that each guide-target pair lies slightly differently within Argonaute2, and it is this tertiary interaction that influences slicing rates.
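The GC comparison at the cleavage site can be reproduced with a few lines of code. This is a minimal sketch: the helper `gc_percent` is a hypothetical name, and only the dinucleotides at positions 10 and 11 quoted in the text are used, because the full-length miRNA sequences are not reproduced in this section.

```python
def gc_percent(seq):
    """Return the percentage of G and C residues in an RNA or DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Guide nucleotides at positions 10 and 11, as given in the text.
cleavage_site = {"miR-1": "AA", "miR-430": "UC", "miR-451": "CC"}
site_gc = {name: gc_percent(dinuc) for name, dinuc in cleavage_site.items()}

for name, pct in site_gc.items():
    print(f"{name}: {pct:.0f}% GC at positions 10-11")
```

The same helper could be applied to full-length guide sequences to revisit the whole-miRNA percentages, provided the exact isoforms used in the assays are supplied.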

A compelling experiment would be to test the slicing rates of different miRNAs within the same family, as they would all possess identical seeds but have different 3′ ends. In a similar vein, one could compare miRNAs with similar 3′ ends but different seeds. This could potentially narrow down the region contributing to differences in slicing, whether the critical region is in the seed, central, or 3′ end of the miRNA. These experiments could identify minute differences between miRNAs. Once we identify the factors that contribute to more rapid cleavage, it will be interesting to see if we can artificially increase or decrease the rate of slicing. For instance, can we increase the rate of miR-451 target slicing or decrease the rate of miR-1 target slicing? Currently, these are all speculations, and we anticipate that many additional slicing assays with many different miRNAs will be needed before we can come to a consensus about the relative importance of sequence contribution to slicing. Practically, these studies could change the way we design siRNAs and reporters in the future.

From an evolutionary standpoint, very few endogenous targets are substrates for Argonaute2-catalyzed cleavage, as the vast majority possess only seed pairing and lack the necessary sequence complementarity. Of the targets that are sliced by Argonaute2, it will be fascinating to see whether those endogenous targets adhere to the same properties that we identify.

Another compelling question pivots away from Argonaute2-catalyzed cleavage activity and instead focuses on the primary action of Argonaute, i.e., miRNA-mediated repression. For the targets that undergo miRNA-mediated repression, do the properties that dictate slicing rate also apply to target repression? In other words, does miR-1 repress targets more efficiently than miR-430? These studies could inform our views and understanding of the inherent differences between various miRNAs.

Concluding remarks

The basal RNA interference pathway was long thought to be a pervasive defense mechanism against foreign sequences. Our research demonstrated otherwise: a vertebrate species, zebrafish, and potentially the majority of vertebrate species on the planet, have persisted and prospered without effective RNAi since acquiring two conserved amino acid substitutions almost 300 million years ago. The discovery and characterization of this phenomenon led to yet another surprising finding concerning the effect of mismatches on Argonaute2-catalyzed cleavage activity: mismatches between a guide and target RNA that hinder binding can in fact enhance the rates of target slicing and product release. These unanticipated properties suggest that seemingly minor differences can have a significant influence on the overall function of the enzyme, and expanding upon these studies will further inform our limited understanding of the nuances of guide-target-Argonaute interactions and functions.

REFERENCES

Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233.

Chen, G.R., Sive, H., and Bartel, D.P. (2017). A Seed Mismatch Enhances Argonaute2-Catalyzed Cleavage and Partially Rescues Severely Impaired Cleavage Found in Fish. Mol Cell 68, 1095-1107 e1095.

Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E., Mane, S., Hannon, G.J., Lawson, N.D., et al. (2010). A novel miRNA processing pathway independent of Dicer requires Argonaute2 catalytic activity. Science 328, 1694-1698.

Drinnenberg, I.A., Fink, G.R., and Bartel, D.P. (2011). Compatibility with killer explains the rise of RNAi-deficient fungi. Science 333, 1592.

Houwing, S., Kamminga, L.M., Berezikov, E., Cronembold, D., Girard, A., van den Elst, H., Filippov, D.V., Blaser, H., Raz, E., Moens, C.B., et al. (2007). A role for Piwi and piRNAs in germ cell maintenance and transposon silencing in Zebrafish. Cell 129, 69-82.

Lau, N.C. (2010). Small RNAs in the animal gonad: guarding genomes and guiding development. Int J Biochem Cell Biol 42, 1334-1347.

Malone, C.D., and Hannon, G.J. (2009). Small RNAs as guardians of the genome. Cell 136, 656-668.

Nakanishi, K., Weinberg, D.E., Bartel, D.P., and Patel, D.J. (2012). Structure of yeast Argonaute with guide RNA. Nature 486, 368-374.

Park, M.S., Phan, H.D., Busch, F., Hinckley, S.H., Brackbill, J.A., Wysocki, V.H., and Nakanishi, K. (2017). Human Argonaute3 has slicer activity. Nucleic Acids Res 45, 11867-11877.

Salomon, W.E., Jolly, S.M., Moore, M.J., Zamore, P.D., and Serebrov, V. (2015). Single-Molecule Imaging Reveals that Argonaute Reshapes the Binding Properties of Its Nucleic Acid Guides. Cell 162, 84-95.

Schirle, N.T., Sheu-Gruttadauria, J., and MacRae, I.J. (2014). Structural basis for microRNA targeting. Science 346, 608-613.

Shabalina, S.A., and Koonin, E.V. (2008). Origins and evolution of eukaryotic RNA interference. Trends in ecology & evolution 23, 578-587.

Tomari, Y., and Zamore, P.D. (2005). Perspective: machines for RNAi. Genes Dev 19, 517-529.

Wargelius, A., Ellingsen, S., and Fjose, A. (1999). Double-stranded RNA induces specific developmental defects in zebrafish embryos. Biochem Biophys Res Commun 263, 156-161.

Wee, L.M., Flores-Jasso, C.F., Salomon, W.E., and Zamore, P.D. (2012). Argonaute divides its RNA guide into domains with distinct functions and RNA-binding properties. Cell 151, 1055-1067.

Zhao, X.F., Fjose, A., Larsen, N., Helvik, J.V., and Drivenes, O. (2008). Treatment with small interfering RNA affects the microRNA pathway and causes unspecific defects in zebrafish embryos. FEBS J 275, 2177-2184.

LETTER doi:10.1038/nature12147

The bromodomain protein Brd4 insulates chromatin from DNA damage signalling

Scott R. Floyd1,2, Michael E. Pacold1,3,4, Qiuying Huang1, Scott M. Clarke1, Fred C. Lam1, Ian G. Cannell1, Bryan D. Bryson1, Jonathan Rameseder1, Michael J. Lee1, Emily J. Blake1, Anna Fydrych1, Richard Ho1, Benjamin A. Greenberger1, Grace C. Chen1, Amanda Maffa1, Amanda M. Del Rosario1, David E. Root5, Anne E. Carpenter5, William C. Hahn5,6, David M. Sabatini4,5, Clark C. Chen6,7, Forest M. White1,8, James E. Bradner5,6 & Michael B. Yaffe1,5,7,8,9

DNA damage activates a signalling network that blocks cell-cycle progression, recruits DNA repair factors and/or triggers senescence or programmed cell death1. Alterations in chromatin structure are implicated in the initiation and propagation of the DNA damage response2. Here we further investigate the role of chromatin structure in the DNA damage response by monitoring ionizing-radiation-induced signalling and response events with a high-content multiplex RNA-mediated interference screen of chromatin-modifying and -interacting genes. We discover that an isoform of Brd4, a bromodomain and extra-terminal (BET) family member, functions as an endogenous inhibitor of DNA damage response signalling by recruiting the condensin II chromatin remodelling complex to acetylated histones through bromodomain interactions. Loss of this isoform results in relaxed chromatin structure, rapid cell-cycle checkpoint recovery and enhanced survival after irradiation, whereas functional gain of this isoform compacted chromatin, attenuated DNA damage response signalling and enhanced radiation-induced lethality. These data implicate Brd4, previously known for its role in transcriptional control, as an insulator of chromatin that can modulate the signalling response to DNA damage.

Detection and repair of damaged DNA is integral for cell survival and accurate transmission of genetic information to progeny. Defects in the DNA damage response (DDR) contribute to oncogenesis and genomic instability in tumours3,4 and render tumour cells sensitive to DNA-damaging cancer therapy5. Early signalling events that trigger and transduce the DDR occur in the context of chromatin, and it is likely that modulation of chromatin structure plays a role in DDR signalling2. Histone proteins are known targets of DDR post-translational modification2,6, but a detailed understanding of the role of chromatin modulation in the DDR is lacking.

To explore the role of chromatin modulation in the DDR, we developed a high-throughput, high-content quantitative microscopy assay multiplexed for early and late DDR endpoints, and applied this to an RNA-mediated interference (RNAi) library focused on proteins that interact with and modify chromatin (see Methods)7,8. For each time point, cells were co-stained with γH2AX antibodies to measure early signalling events in the DDR, Hoechst 33342 to monitor cell-cycle progression and phospho-histone H3 (pHH3) to measure mitotic entry. At the latest time point, cleaved caspase-3 (CC3) was substituted for pHH3 to measure apoptotic cell death. The screening assay was validated with small molecule inhibitors of DDR signalling as well as RNAi directed against known components of the DDR pathway (Supplementary Figs 1–4).

The most pronounced increase in γH2AX foci number, size and intensity after ionizing radiation was observed at 1 and 6 h after knockdown of Brd4; this remained elevated at 24 h (Fig. 1a, b and Supplementary Fig. 4). Eight hairpins directed against Brd4 showed this effect, making off-target effects unlikely (Fig. 1a and Supplementary Fig. 4). Neither Brd4 knockdown in the absence of irradiation (Fig. 1b) nor knockdown of other bromodomain-containing proteins (Figs 1b and Supplementary Fig. 4) significantly altered γH2AX. Increased ionizing-radiation-induced γH2AX after Brd4 loss was further confirmed using short interfering RNA (siRNA) oligonucleotides targeting additional independent Brd4 sequences (Fig. 1f and Supplementary Fig. 5).

Brd4 encodes three splice isoforms (A, B and C in Fig. 1c). Each isoform contains two amino (N)-terminal bromodomains (BD1 and BD2) that bind acetylated lysine, and an extra-terminal (ET) domain recently reported to interact with several chromatin-binding proteins9. The A isoform contains a carboxy (C)-terminal domain (CTD) that functions as a transcriptional co-activator with the pTEFb complex10,11. This region is notably absent in the B and C isoforms, and in the B isoform it is replaced with a divergent short 75 amino-acid segment. All three Brd4 isoforms are expressed in U2OS cells, and the short hairpin RNAs (shRNAs) used in our screen targeted all three isoforms (Supplementary Table 1). We confirmed that a single distinct siRNA that was active against all Brd4 isoforms replicated the Brd4 loss-of-function phenotype of elevated ionizing-radiation-induced γH2AX (Supplementary Fig. 5).

To establish the relative effects of the isoforms on the DDR, we performed gain-of-function experiments. Overexpression of Brd4 isoform B most potently suppressed ionizing-radiation-induced γH2AX foci (Fig. 1d). We designed isoform-specific siRNAs to reduce expression of isoform A or B messenger RNA (mRNA) (Fig. 1e) and protein (Supplementary Fig. 5) selectively; selective targeting of isoform C was not technically possible owing to complete coding sequence overlap with isoforms A and B. We observed that selective depletion of Brd4 isoform B, but not isoform A, increased H2AX phosphorylation over a wide range of ionizing radiation doses (Fig. 1f).

To investigate whether elevated γH2AX levels observed in Brd4-deficient cells resulted from increased production of ionizing-radiation-induced DNA double-strand breaks or from faulty double-strand break repair, we used pulsed-field gel electrophoresis to quantify double-strand breaks in control and Brd4 knockdown cells. As shown in Fig. 2a, Brd4 knockdown had minimal effects on the generation and repair kinetics of double-strand breaks. These observations, together with our finding that individual γH2AX foci were larger and more intense in irradiated Brd4 knockdown cells (Fig. 1b, Supplementary Fig. 4 and Supplementary Tables 1 and 2), indicate that there is enhanced signalling from damaged DNA in the absence of Brd4, rather than an increase in the amount of damage or repair deficiency.

1Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. 2Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Boston, Massachusetts 02215, USA. 3Department of Radiation Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA. 4Whitehead Institute, Cambridge, Massachusetts 02139, USA. 5Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA. 6Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA. 7Department of Surgery, Beth Israel Deaconess Medical Center, Boston, Massachusetts 02215, USA. 8Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. 9Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

246 | NATURE | VOL 498 | 13 JUNE 2013 ©2013 Macmillan Publishers Limited. All rights reserved

Figure 1 | Brd4 isoform B suppresses H2AX phosphorylation after ionizing radiation. a, Rank of hairpins from shRNA screen ordered by integrated γH2AX foci intensity at 1 h after 10 Gy ionizing radiation (details of screening assay in Supplementary Figs 1–4). b, γH2AX foci size (upper panel) and mean γH2AX foci per nucleus (lower panel) after 10 Gy ionizing radiation (IR) from cells expressing indicated shRNAs (bars show mean and two standard deviations of control values). RFP, red fluorescent protein. c, Domain structure of Brd4 isoforms showing conserved tandem bromodomains (BRD), extra-terminal (ET) domain, siRNA and antibody target sequences, and unique isoform B exon. aa, amino acid; Iso., isoform. d, H2AX phosphorylation in cells expressing Flag-tagged Brd4 isoform B (arrowheads) or A and C (arrows) at 1 h after 10 Gy IR. Left: representative images. Middle: quantification of 10 fields from two independent experiments with mean H2AX signal normalized to untransfected cells. Right: immunoblot of isoform expression levels in whole-cell lysates and anti-Flag immunoprecipitates. Con., control. e, Isoform-specific Brd4 knockdown in cells transfected with the indicated siRNA and analysed by quantitative real-time PCR with reverse transcription (n = 3). f, H2AX phosphorylation levels 1 h after indicated ionizing radiation exposure in cells transfected with isoform-specific siRNA (n = 3). Inset shows representative immunoblot for triplicate samples. Data are from U2OS cells. Error bars, s.e.m.; P values were determined using Student’s t-test in this and all subsequent figures unless otherwise indicated.

Changes in overall chromatin structure can affect H2AX phosphorylation, probably by controlling the accessibility of signalling molecules to DNA damage sites12,13. Interestingly, γH2AX foci form more readily in ‘open’ areas of euchromatin14, histone acetylation has been linked to the ‘open’ chromatin state and histone deacetylase inhibitors are known to increase H2AX phosphorylation15. We speculated that a bromodomain protein could influence H2AX phosphorylation through interaction with acetylated histones and effects on global chromatin structure, and therefore performed micrococcal nuclease susceptibility experiments. Knockdown of Brd4 isoform B increased digestion by micrococcal nuclease, indicating a more ‘open’ overall chromatin structure, whereas knockdown of isoform A had minimal effects (Fig. 2b). Furthermore, we observed that cells transfected with Brd4 isoform B showed a distinct nuclear 4′,6-diamidino-2-phenylindole (DAPI) staining pattern, indicating a change in chromatin structure (Fig. 2c). As shown in Fig. 2d, e, quantification of the nuclear staining texture showed a more heterogeneous DAPI intensity pattern, and significantly lower pixel-to-pixel correlation of DAPI staining in cells overexpressing isoform B, indicative of isoform-B-mediated alterations in global chromatin structure. Expression of isoform A had no effect on DAPI staining, whereas overexpression of isoform C had smaller effects than those observed with isoform B.

Figure 2 | Brd4 isoform B limits H2AX phosphorylation through bromodomain-acetyl lysine-mediated effects on chromatin structure. a, Pulsed-field electrophoresis analysis of DNA from stable cell lines expressing indicated shRNA after 10 Gy IR (n = 3). b, Left: micrococcal nuclease assay of control or Brd4 knockdown cells. Right: line traces of representative gel lanes. c, Chromatin structure from cells expressing Flag-tagged Brd4 isoform B (arrowheads) or A and C (arrows) shown by DAPI staining. d, Three-dimensional representation of nuclear DAPI staining intensity from cells in c as indicated by coloured frames. e, DAPI pixel correlation from Brd4 isoform A, B, C and untransfected control cells (n = 3). f, Immunoblots (top) and quantification (bottom) of H2AX phosphorylation after 250 nM DMSO, or active (+) and inactive (−) JQ1 at 1 h after 10 Gy ionizing radiation (n = 3). g, γH2AX signal 1 h after 10 Gy IR in cells expressing green fluorescent protein (GFP)–wild-type Brd4 isoform B (arrowheads), isoform B with mutations that abrogate acetyl lysine binding of bromodomain 1 (BD1) or 2 (BD2) (arrows), or wild-type Brd4 isoform B in the presence of 250 nM (−) JQ1 (inactive) or (+) JQ1 as indicated.

Our finding that Brd4 isoform B expression affects global chromatin structure and attenuates H2AX phosphorylation in response to DNA damage led us to investigate the subcellular localization of isoform B in response to ionizing radiation. Immunofluorescence experiments showed that ionizing radiation did not grossly alter Brd4 isoform B nuclear localization, which tightly mirrored DNA patterns shown by DAPI staining (Supplementary Fig. 6a). Interestingly, subcellular fractionation of U2OS cells and extraction of chromatin-bound proteins demonstrated that irradiation caused enhanced isoform B association with the high salt-extractable chromatin fraction (Supplementary Fig. 6b, c), indicating increased association of isoform B with chromatin after DNA damage.

Bromodomains recognize epigenetic marks on chromatin by binding to acetyl-lysine16. We therefore tested the contribution of Brd4 bromodomain interactions to alterations in γH2AX phosphorylation using JQ1, a small molecule inhibitor of BET bromodomains17. Only the active enantiomer of JQ1 caused increased H2AX phosphorylation after irradiation in U2OS cells (Fig. 2f), similar to the effects observed after Brd4 isoform-B-specific knockdown. Furthermore, JQ1 treatment or Brd4 isoform B knockdown did not significantly alter total histone levels or levels of histone acetylation (Supplementary Figs 7 and 8). Interestingly, overexpression of Brd4 isoform B led to alteration in the nuclear staining pattern of acetyl-lysine, closely mirroring the DAPI staining pattern induced by expression of isoform B (Supplementary Fig. 7b).

The concentration of JQ1 that we used (250 nM) is consistent with the reported in vitro half-maximum inhibitory concentration for Brd4 bromodomains 1 (BD1, 77 nM) and 2 (BD2, 33 nM)17. To evaluate directly the role of each bromodomain in isoform B, we performed gain-of-function experiments using wild-type Brd4 in the absence or presence of JQ1, or constructs harbouring mutations that abrogate acetyl lysine binding by BD1 or BD2. Mutations in BD1, or addition of the active enantiomer of JQ1, potently reversed the γH2AX-suppressive effects of isoform B expression (Fig. 2g). Notably, mutations that abrogate BD1 binding to acetyl-lysine also rescued the ionizing-radiation-induced cell death phenotype observed with Brd4 isoform B gain-of-function (see below), implicating BD1 in the mechanism of DNA damage inhibition (cf. Fig. 4b).

Figure 3 | Brd4 isoform B interaction with the condensin complex affects H2AX phosphorylation. a, Mass spectrometry identification of co-immunoprecipitated proteins from Flag-tagged Brd4 isoform-B-expressing cells. b, Identification of candidate Brd4 interactors by ranking chromatin modifier shRNAs from screen for elevated H2AX foci intensity, area and number at 1 and 6 h after 10 Gy IR. Dashed red lines indicate top quartile. c, Intersection of two independent mass-spectrometry experiments (a) with the top quartile of candidates in b. Overlapping set includes Brd4, SMC2 and NCAPD3. d, Network representation of SMC proteins and relation to DNA damage signalling with protein–protein and kinase–substrate interactions collated from the literature. Protein–protein and kinase–substrate interactions shown by solid and dotted lines, respectively. Colours indicate condensin complex (blue), cohesin complex (pink), other SMC protein complexes (green), cell-cycle regulators (orange) and DNA damage signalling machinery (mint). Diamonds show mass spectrometry and high-content screening hits from a and b. Border colours denote overlap of screens from c. The new interaction of Brd4 with the condensin complex is indicated by the red line. e, Validation of isoform-B–condensin interaction with blotting immunoprecipitates from cells transfected with indicated Flag-tagged constructs. f, Immunoblot verification of SMC2 knockdown from cells transfected with SMC2 siRNA. g, Nuclear γH2AX signal from cells transfected with indicated combinations of control DNA, Brd4 isoform B and/or SMC2 siRNA. Data were quantified from ten fields of two independent experiments normalized to control cells. h, H2AX phosphorylation 1 h after 10 Gy IR in cells simultaneously expressing isoform B and control (arrows) or SMC2 siRNA (arrowheads). i, Chromatin staining pattern in cells simultaneously expressing isoform B and control (red frame) or SMC2 (blue frame) siRNA. j, Mean nuclear γH2AX signal in GFP–isoform-B-expressing cells with or without SMC2 knockdown. Data are from ten fields of two independent experiments, as in h, normalized to control untransfected cells.

To probe further the role of lysine acetylation on γH2AX-Brd4 effects, we examined the combined effects of histone deacetylase

inhibitors and Brd4 knockdown. We found that when Brd4 isoform B knockdown was combined with exposure to 50 nM LBH589, an inhibitor of histone deacetylases 1–3 and 6 (ref. 18), H2AX phosphorylation was enhanced to a greater extent than with either treatment alone (Supplementary Fig. 9). This effect could be observed even in unirradiated cells, although the total amount of H2AX phosphorylation remained lower than that seen in irradiated cells. Taken together, these findings indicate that Brd4 isoform B binding to acetylated regions of chromatin alters chromatin structure and limits H2AX phosphorylation.

Figure 4 | Brd4 isoform B affects DNA [...] ionizing-radiation-induced cell-cycle checkpoints and survival. a, Loss of DNA damage signalling in cells expressing Brd4 isoform B. Left: representative images stained for indicated DDR proteins 1 h after 10 Gy IR. Arrowheads indicate isoform-B-expressing cells. Right: quantification of ten representative fields from two independent experiments normalized to untransfected cells. b, Cell death 24 h after 10 Gy IR in cells expressing wild-type or bromodomain-1-mutant isoform B (Isoform B mut. BD1) scored for cleaved caspase 3 by flow cytometry (n = 3). c, Ionizing-radiation-induced cell-cycle arrest and recovery in Brd4 isoform knockdown cells assayed by propidium iodide staining and flow cytometry. d, Cell survival after irradiation in Brd4 isoform knockdown cells measured by colony formation. e, JQ1 effect on γH2AX in several human cancer cell types commonly treated with radiotherapy. No IR, no ionizing radiation. f, Radiation survival effects of JQ1 in glioma cell lines measured at 72 h by CellTiterGlo (n = 3). g, Model for Brd4 effects on DNA damage signalling.

Brd4 also has a defined role in transcriptional modulation, largely through interactions of isoform A with the pTEFb transcriptional complex (refs 10, 11). To investigate the contribution of Brd4-driven transcriptional changes to the suppression of DNA damage signalling, we profiled mRNA expression patterns of cells stably expressing control or Brd4 shRNAs. Only one DDR-associated transcript, CHEK2, showed a differential expression change of twofold or more (Supplementary Fig. 10a). Importantly, transient Brd4 knockdowns with siRNA, or short-term inhibition with JQ1, both of which increased γH2AX foci formation after irradiation (Supplementary Fig. 5a and Fig. 2f), caused no change in CHEK2 mRNA levels (Supplementary Fig. 10b, c), and neither long-term nor short-term Brd4 knockdown affected the protein levels of several DDR molecules, including Chk2 (Supplementary Fig. 10d). Moreover, the suppression of DDR signalling by Brd4 isoform B overexpression was insensitive to transcription and translation inhibition with α-amanitin and cycloheximide, respectively (Supplementary Fig. 11).

As interactions between Brd4 and other protein complexes involved in modulating chromatin structure were probably responsible for the DDR effects we observed, we identified proteins co-immunoprecipitated with isoform B after DNA damage using mass spectrometry (Fig. 3a and Supplementary Fig. 12). From two independent experiments, we obtained a common set of 57 interacting proteins (Supplementary Tables 3 and 4). Because the DDR-relevant Brd4-binding proteins presumably function in the same pathway as Brd4, we reasoned that loss of these proteins should show a phenotype similar to Brd4 loss-of-function. We therefore used our existing high-content screening data to create a list of the top quartile of genes ranked by increased γH2AX foci intensity, number and size at 1 and 6 h after irradiation (Fig. 3b). The overlap of this list with the list of isoform-B-interacting proteins showed two members of the condensin II complex, SMC2 and CAPD3 (Fig. 3c, d). This finding was intriguing as the condensin II complex has a known role in chromatin compaction in both mitotic and interphase cells, and has been linked to DNA damage repair (ref. 19). We performed immunoprecipitation experiments after DNA damage, and found that the SMC2 and SMC4 components of the condensin II complex co-immunoprecipitated with Brd4 isoform B, whereas Brd4 isoform A had minimal co-association (Fig. 3e). To verify the role of this interaction on the γH2AX effects we observed, we performed combined isoform B and SMC2 knockdown and assayed H2AX phosphorylation 24 h after siRNA transfection, when knockdown of each protein is sub-maximal. We found that H2AX phosphorylation was enhanced with combined knockdown over knockdown of either protein alone (Fig. 3f, g). Furthermore, in cells overexpressing isoform B, SMC2 knockdown could abrogate the suppressive effects of Brd4 on γH2AX, demonstrating a functional interaction between isoform B and the condensin II complex in modulating γH2AX (Fig. 3h, j). Finally, we noted that the effects of isoform B on the DAPI staining pattern of chromatin were abrogated by co-transfection of SMC2 siRNA, indicating that the Brd4-condensin II interaction is involved in chromatin structure alterations (Fig. 3i).

We next investigated isoform B effects on other components of the DDR. We found that isoform B gain-of-function inhibited ionizing-radiation-induced foci formation of several other known DDR signalling components including 53BP1, phosphorylated ATM and several DDR signalling molecules containing the phospho-SQ DDR kinase substrate motif (Fig. 4a). In addition, overexpression of isoform B resulted in increased cell death after irradiation, an effect that was significantly diminished by mutation of BD1 (Fig. 4b). The cell death observed in Brd4 isoform B overexpressing cells seems to result from mitotic catastrophe, consistent with a loss of DDR signalling that results in failed cell-cycle arrest (Supplementary Fig. 13). We also investigated the effect of isoform B knockdown on DDR-induced cell-cycle arrest and survival. Interestingly, isoform B loss-of-function allowed increased cell survival with more rapid and efficient recovery from cell-cycle arrest after irradiation, complementing the inverse findings observed with isoform B gain-of-function (Fig. 4c, d).

Given the effects of Brd4 isoform B on ionizing-radiation-induced DDR signalling and survival, we considered that isoform B might have a role in tumour responses to irradiation. We screened a panel of established cell lines from several human tumour types commonly treated with radiotherapy for γH2AX effects using the JQ1 inhibitor. Several cell types showed increased ionizing-radiation-induced H2AX phosphorylation with JQ1 treatment, including breast, prostate and particularly glioma cancer cell lines (Fig. 4e). Just as we had observed with U2OS cells, irradiation had the expected killing effect on dimethylsulphoxide (DMSO)-treated glioma cells; however, this killing effect was markedly reduced in JQ1-treated glioma cells, consistent with our finding of increased DDR signalling and radioresistance with decreased Brd4 function (Fig. 4f). Conversely, overexpression of Brd4 isoform B in glioma cells inhibited H2AX phosphorylation, consistent with decreased DDR signalling upon Brd4 gain-of-function (Supplementary Fig. 14).

We conclude that structural alterations in chromatin mediated by Brd4 acetyl lysine binding function to attenuate the DNA damage signalling response to ionizing radiation. These effects on DDR signalling are consistent with the induction of a chromatin structure that is inhibitory to the formation of γH2AX in the case of higher levels of Brd4 isoform B expression, or a more ‘open’ chromatin structure that facilitates γH2AX foci formation when Brd4 expression is reduced, or after pharmacological inhibition of bromodomain binding (shown schematically in Fig. 4g).

Our data indicate that Brd4 affects DDR signalling through mechanisms distinct from known transcriptional interactions with the P-TEFb transcriptional complex. The relevant Brd4 isoform that modulates the DDR, isoform B, lacks the pTEFb-interacting region. In addition, chemical inhibition of transcription/translation had no effect on the ability of Brd4 to suppress DDR-induced γH2AX. This finding is in line with the recent identification of other chromatin-interacting proteins such as KAP-1 and Brg1 that have roles in DNA damage signalling that do not seem to arise directly from the transcriptional activity that these molecules also possess (refs 13, 20). Rather, the enhancement of several parameters of γH2AX foci after Brd4 knockdown, including their size and intensity, in addition to their number, point to a role for Brd4 in limiting the propagation of DDR signalling after ionizing radiation. This effect seems to involve the recruitment of a chromatin-condensing complex to sites of acetylation, a new role for Brd4. In agreement with this, overexpression of Brd4, even in the absence of damage, resulted in alterations of chromatin structure and nuclear acetylation patterns, consistent with a model of Brd4 isoform B binding to and occluding acetyl-lysine sites on chromatin and recruiting chromatin compaction machinery. These findings implicate bromodomain-mediated interactions in modulating specific chromatin structures that inhibit the propagation of DDR signalling in chromatin (refs 12, 15), and indicate that Brd4 isoform B alters the threshold response of γH2AX to DNA damage.

METHODS SUMMARY

Image-based high-content screening was performed in 384-well plate format using an arrayed lentiviral shRNA library from The RNAi Consortium. Screen images were acquired with a Cellomics microscope (Thermo Scientific) and quantified using CellProfiler software. siRNAs and antibodies were from commercial sources. We used Affymetrix U133 Plus 2.0 arrays for expression profiling. Mass spectrometry data from Brd4 immunoprecipitates after SDS–PAGE was acquired with an Orbitrap XL instrument (Thermo Scientific), and data analysed with Mascot software. Interactions for network analysis were hand-curated from primary literature using the keywords ‘DNA damage’, ‘cell cycle checkpoint’, ‘chromatin structure’, ‘ATM/ATR’, ‘Chk1/Chk2’ and ‘SMC proteins’. Further details are provided in the Methods.

Full Methods and any associated references are available in the online version of the paper.

Received 19 July 2011; accepted 3 April 2013. Published online 2 June 2013.

1. Jackson, S. P. & Bartek, J. The DNA-damage response in human biology and disease. Nature 461, 1071–1078 (2009).
2. Misteli, T. & Soutoglou, E. The emerging role of nuclear architecture in DNA repair and genome maintenance. Nature Rev. Mol. Cell Biol. 10, 243–254 (2009).
3. Gorgoulis, V. G. et al. Activation of the DNA damage checkpoint and genomic instability in human precancerous lesions. Nature 434, 907–913 (2005).
4. Bartkova, J. et al. DNA damage response as a candidate anti-cancer barrier in early human tumorigenesis. Nature 434, 864–870 (2005).
5. Kastan, M. B. & Bartek, J. Cell-cycle checkpoints and cancer. Nature 432, 316–323 (2004).
6. Polo, S. E. & Jackson, S. P. Dynamics of DNA damage response proteins at DNA breaks: a focus on protein modifications. Genes Dev. 25, 409–433 (2011).
7. Moffat, J. et al. A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell 124, 1283–1298 (2006).
8. Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).
9. Rahman, S. et al. The Brd4 extraterminal domain confers transcription activation independent of pTEFb by recruiting multiple proteins, including NSD3. Mol. Cell. Biol. 31, 2641–2652 (2011).
10. Yang, Z. et al. Recruitment of P-TEFb for stimulation of transcriptional elongation by the bromodomain protein Brd4. Mol. Cell 19, 535–545 (2005).
11. Jang, M. K. et al. The bromodomain protein Brd4 is a positive regulatory component of P-TEFb and stimulates RNA polymerase II-dependent transcription. Mol. Cell 19, 523–534 (2005).
12. Murga, M. et al. Global chromatin compaction limits the strength of the DNA damage response. J. Cell Biol. 178, 1101–1108 (2007).
13. Ziv, Y. et al. Chromatin relaxation in response to DNA double-strand breaks is modulated by a novel ATM- and KAP-1 dependent pathway. Nature Cell Biol. 8, 870–876 (2006).
14. Cowell, I. G. et al. γH2AX foci form preferentially in euchromatin after ionising-radiation. PLoS ONE 2, e1057 (2007).
15. Kim, J. A., Kruhlak, M., Dotiwala, F., Nussenzweig, A. & Haber, J. E. Heterochromatin is refractory to γ-H2AX modification in yeast and mammals. J. Cell Biol. 178, 209–218 (2007).
16. Filippakopoulos, P. et al. Histone recognition and large-scale structural analysis of the human bromodomain family. Cell 149, 214–231 (2012).
17. Filippakopoulos, P. et al. Selective inhibition of BET bromodomains. Nature 468, 1067–1073 (2010).
18. Bradner, J. E. et al. Chemical phylogenetics of histone deacetylases. Nature Chem. Biol. 6, 238–243 (2010).
19. Wu, N. & Yu, H. The Smc complexes in DNA damage response. Cell Biosci. 2, 5 (2012).
20. Lee, H.-S., Park, J.-H., Kim, S.-J., Kwon, S.-J. & Kwon, J. A cooperative activation loop among SWI/SNF, γ-H2AX and H3 acetylation for DNA double-strand break repair. EMBO J. 29, 1434–1445 (2010).

Supplementary Information is available in the online version of the paper.

Acknowledgements We thank H. Le, T.R. Jones and M. Vokes for assistance with screening and image analysis. We thank C. Whittaker, S. Hoersch and M. Moran for computing and data analysis assistance; C. Reinhardt, C. Ellson and A. Gardino for manuscript editing; and P. Filippakopoulos and S. Knapp for discussions. This work was partially supported by the Koch Institute and Center for Environmental Health Sciences National Institutes of Health Core Grants P30-CA14051 and ES-002109; and by grants R01-ES15339, 1-U54-CA112967-04 and R21-NS063917; a SPARC grant to M.B.Y.; and a Holman Pathway Research Resident Seed Grant, American Society for Radiation Oncology Junior Faculty Career Research Training Award, Klarman Scholar, Koch Institute Clinical Investigator Award, and Burroughs Wellcome Career Award for Medical Scientists to S.R.F.

Author Contributions S.R.F. and M.B.Y. designed the study, supervised the experiments, analysed the data and wrote the manuscript. D.E.R., W.C.H. and D.M.S. were involved in the design and preparation of the lentiviral shRNA library. S.R.F., M.E.P. and E.B. performed the image-based high-content screen and initial analysis. A.E.C. aided in digital image analysis. S.R.F., Q.H., S.M.C., F.C.L., I.G.C., M.J.L., A.F., R.H., B.A.G., G.C.C. and A.M. performed biochemical, cell biological and molecular biological experiments. B.D.B., A.M.D. and F.M.W. performed mass spectrometry experiments and analysis. J.R. performed bioinformatics analysis. J.E.B. contributed JQ1 compounds and cell lines. S.R.F. and M.B.Y. designed and supervised the experiments. C.C.C., J.E.B. and F.M.W. contributed to the intellectual development of the study and technical writing of the manuscript. All authors contributed to editing the manuscript.

Author Information The expression profiling Affymetrix u133 plus dataset has been deposited in the NCBI Gene Expression Omnibus database under accession number GSE30700. Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Correspondence and requests for materials should be addressed to M.B.Y. ([email protected]).
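The candidate selection summarized in Fig. 3b, c (rank the screen hits, keep the top quartile, then intersect with proteins recovered in both mass-spectrometry experiments) can be sketched as follows. This is a minimal illustration in Python and not the authors' pipeline: the function names, the use of a single combined score per gene, and the toy identifiers in the usage note are all assumptions made for the example.

```python
import math


def top_quartile(scores):
    """Return the genes ranked in the top 25% by score (higher = stronger phenotype).

    `scores` maps a gene identifier to one combined screen score; collapsing
    foci intensity, area and number into a single value is an assumption.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = math.ceil(len(ranked) / 4)
    return set(ranked[:keep])


def candidate_interactors(screen_scores, ms_run_1, ms_run_2):
    """Intersect top-quartile screen hits with proteins found in both MS runs."""
    reproducible = set(ms_run_1) & set(ms_run_2)
    return top_quartile(screen_scores) & reproducible
```

With eight hypothetical genes scored 9 down to 2, the top quartile holds the two highest-scoring genes, and only those also present in both mass-spectrometry sets are returned.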

250 | NATURE | VOL 498 | 13 JUNE 2013 ©2013 Macmillan Publishers Limited. All rights reserved

METHODS

Antibodies and stains. Mouse monoclonal antibodies against γH2AX were from Upstate/Millipore (catalogue number 05636), Actin (Sigma, catalogue number A5441), phospho-ATM Serine 1981 (Rockland, catalogue number 200-301-400), Flag (Sigma, catalogue number F3165), ornithine decarboxylase (Abcam, catalogue number ab66067), RAD50 (GeneTex, catalogue number GTX70228), NBS1 (Abcam, catalogue number ab49958), MDC1 (Novus, catalogue number NB100-396) and Lamin (Millipore, catalogue number 05-714). Rabbit polyclonal and monoclonal antibodies against Brd4 were from Abcam (catalogue number Ab46199) and Pan-Brd4 from Sigma (catalogue number AV39076), 53BP1 (Novus, catalogue number NB100-304), CHEK2 (Cell Signaling Technology, catalogue number 2662), total H2AX (Abcam, catalogue number ab11175), phospho-SQ (Cell Signaling Technology, catalogue number 2851), MRE11 (Novus, catalogue number NB100-142), cleaved caspase 3 (Cell Signaling Technology, catalogue number 9664), SMC2 (Cell Signaling Technology, catalogue number 5329), SMC4 (Cell Signaling Technology, catalogue number 5547), phospho-histone H3 (Upstate/Millipore, catalogue number 06570 and BD/Pharmingen catalogue number 559565). DNA stains were Hoechst 33342 (Invitrogen, catalogue number H1399), propidium iodide (Invitrogen, catalogue number P1304MP) and ethidium bromide (Invitrogen, catalogue number 15585011). Fluorescent antibodies were from Invitrogen: goat anti-rabbit and goat anti-mouse Alexa 488, 555 and 647 (catalogue numbers A11001, A21422, A21235, A21238, A21428 and A21244).

Small molecule inhibitors. Brd4 bromodomain inhibitor (+)-JQ1 and its inactive enantiomer (−)-JQ1 were synthesized as described (ref. 17) and were used at 250 nM. α-Amanitin (catalogue number A2263) and cycloheximide (catalogue number C4859) were from Sigma and were used at concentrations as indicated (α-amanitin 1–16 μM; cycloheximide 35–560 μM). UCN01 was from Sigma (catalogue number U6508) and was used at concentrations of 0.003–10 μM. Caffeine was from Sigma (catalogue number C0750) and was used at concentrations of 10–25 mM. LBH589 was a gift from J. Bradner.

RNAi library. shRNA was applied to cells using a high-titre arrayed lenti-viral library maintained in the pLKO_TRC001 vector as described (ref. 7).

Image-based screens. For shRNA screens and small molecule tests, human U2OS osteosarcoma cells (ATCC HTB-96) were grown in DMEM + Pen/Strep + 10% v/v FBS (complete media) at 37 °C in a 5% CO2 atmosphere. All screens were performed at passage 10–15. Cells were tested for mycoplasma by PCR before seeding and infection. U2OS cells were seeded with a MicroFill (Biotek) in 384-well black, clear bottom plates (Greiner) at a density of 300 (shRNA) cells per well in 50 μl of media, and allowed to attach overnight at 37 °C in a 5% CO2 atmosphere. For shRNA screens, the media was exchanged the following day to complete media with 8 μg ml−1 polybrene using a JANUS workstation (PerkinElmer). Virus infection was performed on an EP3 workstation (PerkinElmer) with 1.5 μl of high-titre retrovirus. All plates had two wells infected with 1.5 μl of control virus with shRNA directed against H2AX. Plates were centrifuged in a swinging-bucket rotor at 1180g for 30 min after infection and returned to the incubator overnight. The plates were then selected with 2.5 μg ml−1 puromycin for 48 h, and allowed to proliferate in complete media for another 48 h, with media exchanges performed on the JANUS or RapidPlate (Qiagen) liquid handling workstations. Eight wells in each plate were not selected with puromycin. For small molecule testing, cells were plated at 500 cells per well in 384-well plates. The day after plating, small molecules at different concentrations in 100 nl DMSO were pin transferred to cells with a CyBio robot, and cells were propagated for 16 h. For both small molecule and shRNA screens, four plates were created in replicate for the time points outlined below. Four wells were left untreated in each plate, and received 25 mM caffeine in complete media 1 h before irradiation. All plates were treated with 10 Gy of 667 keV X-rays from a 137Cs source in a Gammacell irradiator (Atomic Energy of Canada). A 0 h control plate was not irradiated. The plates were returned to the incubator and fixed with 4.4% w/v paraformaldehyde in phosphate-buffered saline (PBS) at 1, 6 and 24 h after irradiation. Plates were stored in PBS at 4 °C before staining. Fixed plates were washed three times with PBS and blocked with 24 μl of GSDB (0.15% goat serum, 8.33% goat serum, 120 mM sodium phosphate, 225 mM NaCl) for 30 min. The 0, 1 and 6 h plates were incubated with 1:300 dilutions in GSDB of primary mouse monoclonal anti-γH2AX (Ser 139), and rabbit polyclonal anti-pHH3 antibody. For the 24 h plates, we substituted 1:300 rabbit polyclonal anti-cleaved caspase 3 for the pHH3 antibody. All plates were incubated overnight at 4 °C, washed and stained with a secondary antibody mix containing 10 μg ml−1 Hoechst 33342, 1:300 goat anti-mouse polyclonal-Alexa Fluor 488 and goat anti-rabbit polyclonal-Alexa Fluor 555 in GSDB. After a second overnight incubation at 4 °C, the plates were washed three times in PBS and stored in 50 μl per well 50 mM Trilox (Sigma) in PBS at 4 °C.

Imaging and image analysis. Plates were allowed to equilibrate to room temperature for 30 min and imaged on a Cellomics ArrayScan VTI automated microscope with a ×20 objective lens. The acquisition parameters were the same for each shRNA or chemical inhibitor. Six fields per well were imaged, with three channels/field (DAPI, fluorescein and rhodamine) for a total of 18 acquired images per well. Images were segmented and analysed with CellProfiler cell image analysis software. The imaging pipeline used to segment the images is available on request. Cell morphology and intensity data were acquired on a per image and per cell basis, and exported into a mySQL database. The data were visualized with SpotFire (TIBCO) and CellProfiler Analyst.

Immunofluorescence microscopy. U2OS cells were plated on number 1 glass coverslips (VWR) and were cultured in DMEM + Pen/Strep + 10% v/v FBS (complete media) at 37 °C in a 5% CO2 atmosphere, then exposed to 10 Gy ionizing radiation from a 137Cs source in a Gammacell irradiator (Atomic Energy of Canada), fixed in methanol and processed for immunofluorescence using the antibodies indicated above. Images were captured on a Zeiss Axiophot II microscope with a Hamamatsu CCD (charge-coupled device) camera and processed with OpenLab/Volocity software. We used CellProfiler (www.CellProfiler.org) or ImageJ software (http://rsb.info.nih.gov/nihimageJ) for quantitative image analysis.

RT–PCR. Total RNA was extracted from 10^6 U2OS cells expressing either control or Brd4-directed shRNA, with an RNeasy kit (Qiagen). Complementary DNA was generated with oligo(dT) primers with SuperScript reverse transcriptase (Invitrogen) according to the manufacturer’s instructions. These complementary DNAs were used as templates for linear-range PCR amplification or quantitative real-time PCR with SYBR green master mix on an Applied Biosystems 7500 with the following primers: forward 5′-CTC CTC CTA AAA AGA CGA AGA-3′ and reverse (pan-Brd4 isoform) 5′-TTC GGA GTC TTC GCT GTC AGA GGA G-3′, (Brd4 isoform A) 5′-GCC CCT TCT TTT TTG ACT TCG GAG C-3′, (Brd4 isoform B) 5′-GCC CTG GGG ACA CGA AGT CTC CAC T-3′, (Brd4 isoform C) 5′-CCG TTT TAT TAA GAG TCC GTG TCC A-3′, (CHEK2) forward 5′-ACAGATAAATACCGAACATACAGC-3′ and reverse 5′-GACGGCGTTTTCCTTTCCCTACAA-3′, and using (GAPDH) primers forward 5′-GATGCCCTGGAGGAAGTGCT-3′ and reverse 5′-AGCAGGCACAACACCACGTT-3′ as control for normalization.

Expression profiling and analysis. Total RNA was collected from stable U2OS cells expressing Brd4 or control shRNA using RNeasy (Qiagen), labelled and analysed on the Affymetrix U133 Plus 2.0 array. Unsupervised clustering of expression data was performed using the R package pvclst. LIMMA (ref. 21) was used to identify important changes in expression between Brd4 knockdown and control cells. Data were deposited in the US National Institutes of Health Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30700).

Subcellular fractionation. U2OS cells expressing Flag-tagged Brd4 isoforms were lysed in hypotonic conditions (10 mM Hepes, 10 mM NaCl, 25 mM KCl, 1 mM MgCl2, 0.1 mM EDTA, pH 7.4 with protease inhibitors) and subjected to flash freezing in liquid nitrogen 1 h after mock treatment or exposure to 10 Gy of ionizing radiation with a 137Cs source in a Gammacell irradiator (Atomic Energy of Canada). Cells were thawed at room temperature and spun down at 10,000g for 10 min. The supernatant was saved as the cytoplasmic fraction and concentrated down using trichloroacetic acid precipitation and reconstituted in 2× Laemmli buffer. The pellet was re-suspended in high salt buffer (20 mM Hepes, 0.5 mM DTT, 1.5 mM MgCl2, 0.1% Triton X-100, 1 M NaCl, pH 7.4 with protease inhibitors) and left on ice for 30 min followed by a high-speed spin at 100,000g for 30 min. The supernatant was saved as the high salt fraction and concentrated down using trichloroacetic acid precipitation and reconstituted in 2× Laemmli buffer. Sulphuric acid (0.4 N) was added to the high-speed pellet and left on ice for 30 min, followed by a high-speed spin at 14,000g for 10 min. The supernatant was saved as the acid fraction and concentrated down using trichloroacetic acid precipitation and reconstituted in 2× Laemmli buffer.

Western blotting and immunoprecipitation. Cells were treated with 10 Gy ionizing radiation with a 137Cs source in a Gammacell irradiator (Atomic Energy of Canada). For whole cell lysates, cells were trypsinized and lysed in LB (4% SDS, 120 mM Tris, pH 6.8) with protease and phosphatase inhibitors (Complete mini EDTA-free and PhosSTOP, Roche Applied Science). For chromatin isolation, cells were trypsinized, re-suspended in low salt buffer (LSB: 10 mM Hepes, 10 mM NaCl, 25 mM KCl, 1.0 mM MgCl2, 0.1 mM EDTA, pH 7.4 + protease inhibitors, as above), flash-frozen in liquid N2, thawed, pelleted at 10,000g for 10 min, re-suspended in high salt buffer (HSB: 20 mM Hepes, 1.0 M NaCl, 0.5 mM DTT, 1.5 mM MgCl2, 0.1% Triton X-100 + protease inhibitors) for 45 min on ice, pelleted at 100,000g for 30 min, and proteins from the supernatant were precipitated with trichloroacetic acid. For immunoprecipitation, U2OS cells expressing Flag-tagged Brd4 isoforms were lysed in low salt buffer (50 mM Tris HCl, pH 7.4, 150 mM NaCl, 1 mM EDTA, 0.5% NP-40 with protease inhibitors) and subjected to flash freezing in liquid nitrogen 1 h after mock treatment or irradiation. Cells were thawed at room temperature and spun down at 10,000g for 10 min. The supernatant was removed and saved as the pre-immunoprecipitation cytoplasmic fraction. The nuclear pellet was re-suspended in low salt buffer, tip sonicated at 4 °C (35% amplitude, pulse 5 s on and off for three cycles), and spun down at


14,000g for 10 min. The supernatant was collected as starting material for immunoprecipitation using M2 Flag beads (Sigma Aldrich) overnight at 4 °C. The beads were then spun down and the first supernatant saved as the unbound fraction. The beads were washed five times with low salt buffer and proteins were solubilized in 2× Laemmli buffer and boiled at 95 °C for 3 min before loading onto SDS–PAGE. Samples were processed after SDS–PAGE for gel band cutting and in gel tryptic digestion for mass spectrometry or western blotting to detect pulldown of the condensin II complex (SMC2 and SMC4 proteins) with Brd4 isoforms. SDS–PAGE and western blot was according to the methods of Laemmli and Towbin using either a Li-cor Odyssey scanner or horseradish-peroxidase-coupled secondary antibodies (Bio-Rad) and Western Lightning enhanced chemiluminescence (PerkinElmer) for visualization of bands.

Pulsed-field gel electrophoresis and micrococcal nuclease assay. For pulsed-field gel analysis, control and BRD4 knockdown cells were plated at 1 × 10^6 cells per plate, exposed to 10 Gy ionizing radiation with a 137Cs source in a Gammacell irradiator (Atomic Energy of Canada) and collected at 0.5, 1, 2, 3 and 5 h. Cells were trypsinized, diluted to 2 × 10^6 cells and embedded in agarose plugs. The agarose plugs were exposed to Proteinase K (1 mg ml−1) in 500 mM EDTA, 1% N-lauryl Sarcosyl, pH 8.0, for 48 h, washed 3 × 1 h with TE buffer, loaded onto a 0.675% agarose gel and separated under pulsed-field conditions with a Rotaphor 6.0 (Biometra). Nuclei from control and Brd4 knockdown cells were isolated by hypotonic lysis and micrococcal nuclease assays performed as described by Carey and Smale (ref. 22).

Flow cytometry. U2OS cells were plated and transiently transfected GFP transgenes or siRNA as indicated, exposed to varying doses of ionizing radiation from a 137Cs Gammacell irradiator source (Atomic Energy of Canada) and collected at varying times as indicated by fixation with 4% formaldehyde (cell-death measurements) or directly extracted with 100% ethanol (cell-cycle measurements), and processed for flow cytometry using the antibodies listed above. Data were analysed using FlowJo (www.flowjo.com) software.

Colony formation assays. Control and BRD4 knockdown cells were exposed to the indicated doses of ionizing radiation from a 137Cs source in a Gammacell irradiator (Atomic Energy of Canada), or left untreated, trypsinized, counted and re-plated using serial dilutions. Colonies were propagated to the 10- to 15-cell stage (3–7 days), stained with Wright stain (Sigma) and counted with CellProfiler software or by averaging counts of ten fields from three independent observers using a dissection microscope to identify colonies of more than 15 cells.

Constructs, shRNA and siRNA, and transfection. Full-length constructs of Brd4 Isoform A (accession number NM_058243), B (accession number BC035266) and C (accession number NM_014299.2) were cloned into pEGFP-C1 (Clontech) and pFLAG-CMV2 (Sigma) by PCR. Bromodomain mutations were introduced using quickchange (Stratagene) using PCR primers: 5′-AAA TTG TTA CAT CGC CAA CAA GCC TGG AGA TGA CGC AGT CTT AAT GGC AG-3′ and 5′-CTG CCA TTA AGA CTG CGT CAT CTC CAG GCT TGT TGG CGA TGT AAC AAT TT-3′. Cells were transfected with Fugene 6 (Roche) according to the manufacturer’s instructions. shRNA directed against Brd4 were from the TRC library (see Supplementary Table 1), or created in the mir30-based pMLP vector (a gift from M. Hemann) with primer 5′-TGC TGT TGA CAG TGA GCG AAG ACA CA-3′ for Brd4. U2OS cell lines stably expressing this shRNA or control hairpins (ineffective hairpins directed against human sequences of BAD and PUMA) were created using puromycin selection at 2 μg ml−1. STEALTH siRNA against pan-isoform BRD4, SMC2 and control were purchased from Invitrogen. Custom Brd4 isoform-specific siRNA were synthesized from Dharmacon using the following sequences: isoform A specific 5′-GGG AGA AAG AGG AGC GUG AUU-3′ and isoform B specific 5′-GCA CCA GUG GAG ACU UCG UUU-3′. siRNA against SMC2 was from Dharmacon. For siRNA experiments, cells were transfected with Lipofectamine RNAiMax (Invitrogen) according to the manufacturer’s instructions.

Mass spectrometry. Proteins from the Brd4 co-immunoprecipitation were examined after SDS–PAGE by staining with Coomassie blue. Gel bands were excised, de-stained and processed for digestion with trypsin (Promega; 12.5 ng ml−1 in 50 mM ammonium bicarbonate, pH 8.9). Peptides were loaded directly onto a column packed with C18 beads. The column was placed in-line with a tapered electrospray column packed with C18 beads on an Orbitrap XL mass spectrometer (Thermo Scientific). Peptides were eluted using a 120-min gradient (0–70% acetonitrile in 0.2 M acetic acid; 50 nl min−1). Data were collected using the mass spectrometer in data-dependent acquisition mode to collect tandem mass spectra and examined using Mascot software (Matrix Science).

Network analysis. Protein–protein and kinase–substrate interactions relevant to DNA damage signalling were hand curated from primary literature available in PubMed using the initial keywords ‘DNA damage’, ‘cell cycle checkpoint’, ‘chromatin structure’, ‘ATM/ATR’, ‘Chk1/Chk2’ and ‘SMC proteins’, and following reference lists.

21. Smyth, G. K. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, Article3 (2004).
22. Carey, M. & Smale, S. T. Micrococcal nuclease-Southern blot assay: I. MNase and restriction digestions. CSH Protoc. 2007, http://dx.doi.org/10.1101/pdb.prot4890 (2007).


©2013 Macmillan Publishers Limited. All rights reserved

ARTICLE
doi:10.1038/nature13007

Poly(A)-tail profiling reveals an embryonic switch in translational control

Alexander O. Subtelny1,2,3,4*, Stephen W. Eichhorn1,2,3*, Grace R. Chen1,2,3, Hazel Sive2,3 & David P. Bartel1,2,3

Poly(A) tails enhance the stability and translation of most eukaryotic messenger RNAs, but difficulties in globally measuring poly(A)-tail lengths have impeded greater understanding of poly(A)-tail function. Here we describe poly(A)-tail length profiling by sequencing (PAL-seq) and apply it to measure tail lengths of millions of individual RNAs isolated from yeasts, cell lines, Arabidopsis thaliana leaves, mouse liver, and zebrafish and frog embryos. Poly(A)-tail lengths were conserved between orthologous mRNAs, with mRNAs encoding ribosomal proteins and other ‘housekeeping’ proteins tending to have shorter tails. As expected, tail lengths were coupled to translational efficiencies in early zebrafish and frog embryos. However, this strong coupling diminished at gastrulation and was absent in non-embryonic samples, indicating a rapid developmental switch in the nature of translational control. This switch complements an earlier switch to zygotic transcriptional control and explains why the predominant effect of microRNA-mediated deadenylation concurrently shifts from translational repression to mRNA destabilization.

Most eukaryotic mRNAs end with poly(A) tails, which are added by a nuclear poly(A) polymerase following cleavage of the primary transcript during transcriptional termination1. These tails are then shortened by deadenylases2,3, although in some contexts (for example, animal oocytes or early embryos, or at neuronal synapses), they can be re-extended by cytoplasmic poly(A) polymerases4,5. In the cytoplasm, the poly(A) tail promotes translation and inhibits decay2,5.

Although poly(A) tails must exceed a minimal length to promote translation, an influence of tail length beyond this minimum is largely unknown. The prevailing view is that longer tails generally lead to increased translation5,6. This idea partly stems from the known importance of cytoplasmic polyadenylation in activating certain genes in specific contexts4,5, and the increased translation observed in Xenopus oocytes and Drosophila embryos when appending synthetic tails of increasing length onto an mRNA7,8. Support for a more general coupling of tail length and translation comes from studies of yeast extracts9 and yeast cells10,11. However, the general relationship between tail length and translational efficiency has not been reported outside of yeast, primarily because transcriptome-wide measurements have been unfeasible for longer-tailed mRNAs.

Poly(A)-tail length profiling by sequencing (PAL-seq)

We developed a high-throughput sequencing method that accurately measures individual poly(A) tails of any physiological length (Fig. 1a). After generating sequencing clusters and before sequencing, a primer hybridized immediately 3′ of the poly(A) sequence is extended using a mixture of dTTP and biotin-conjugated dUTP as the only nucleoside triphosphates and conditions that were optimized to yield full-length extension products without terminal mismatches (Extended Data Fig. 1a). This key step quantitatively marks each cluster with biotin in proportion to the length of the poly(A) tail (Fig. 1a). After sequencing the 36 nucleotides immediately 5′ of the poly(A) site, the flow cell is incubated with fluorophore-tagged streptavidin, which binds the biotin incorporated during primer extension to impart fluorescence intensity proportional to the poly(A)-tract length. To account for the density of each cluster, this raw intensity is normalized to that of the fluorescent bases added during sequencing by synthesis12, thereby yielding a normalized fluorescence intensity for the poly(A) tail of each transcript, paired with a sequencing read that identifies its poly(A) site and thus the gene of origin.

Each starting sample was spiked with a cocktail of mRNA-like standards of known tail lengths (Extended Data Fig. 1b) to produce a standard curve for converting normalized fluorescence intensities to poly(A)-tail lengths (Fig. 1b). We refer to each of these tail-length measurements paired with its identifying sequence as a poly(A) tag. Although recovery of tags from the standards varied somewhat, it did not vary systematically with tail length, which indicated that length-related biases were not an issue (Extended Data Fig. 1c). Additional analyses indicated that mRNA degradation did not bias against longer poly(A) tails (Extended Data Fig. 2a).

Because alternative start sites or alternative splicing can generate different transcripts with the same poly(A) site, we considered our results with respect to unique gene models (abbreviated as ‘genes’) rather than to transcripts (even though polyadenylation occurs on transcripts, not genes). Moreover, tags for alternative poly(A) sites of the same gene were pooled, unless stated otherwise. With this pipeline, analysis of RNA from NIH3T3 mouse fibroblasts (3T3 cells) yielded at least one tag from 10,094 unique protein-coding genes (including 97% of the 9,976 genes with at least one mRNA molecule per cell, as determined by RNA-seq) and ≥100 tags from 2,873 genes, coverage typical of most samples (Supplementary Table 1).

Tail-length diversity within each species

Median tail lengths in mammalian cells (range, 67–96 nucleotides) exceeded those in A. thaliana leaves and Drosophila melanogaster S2 cells (51 and 50 nucleotides, respectively), which exceeded those in budding (Saccharomyces cerevisiae) and fission (Schizosaccharomyces pombe) yeast (27 and 28 nucleotides, respectively) (Fig. 2a). Similar differences between mammalian, fly, plant and yeast cells were observed when comparing tail-length averages for individual genes (Fig. 2b). For genes within each species, mean tail lengths varied, with the 10th and 90th

1Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. 2Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA. 3Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. 4Harvard-MIT Division of Health Sciences and Technology, Cambridge, Massachusetts 02139, USA. *These authors contributed equally to this work.

66 | NATURE | VOL 508 | 3 APRIL 2014 ©2014 Macmillan Publishers Limited. All rights reserved

percentiles differing by 1.4- to 1.6-fold. Variation was also observed for different mRNA transcripts from the same gene (Fig. 2c). For most genes the distributions were unimodal, with the mode approaching the mean (Fig. 2d). Poly(A)-tail lengths increased when progressing through the cleavage, blastula and gastrula stages of zebrafish embryonic development (2, 4 and 6 h post-fertilization (hpf), respectively) and the analogous stages of frog development (Fig. 2a, b, d). Processed data reporting tail lengths for all genes detected in each sample are provided in the Gene Expression Omnibus (accession number GSE52809).

[Figure 1 graphic. a, PAL-seq workflow. Library preparation: total RNA (plus standards); ligate using splint oligo; partially digest (RNase T1); size select; bind to streptavidin beads, wash; phosphorylate 5′ end, wash; ligate adaptor, wash; reverse transcribe; liberate cDNA (alkali treatment); size select. On Illumina flow cell: generate clusters; anneal sequencing primer; extend with dTTP and biotin-dUTP; sequence poly(A)-proximal region; flow in fluorescent streptavidin; measure fluorescence intensity. b, Standard curve for two sets of standards (Set 1 and Set 2), each standard comprising a common sequence, a 10-nt barcode and a poly(A) tail; x axis, poly(A) length (nt); y axis, fluorescence intensity; Pearson R = 0.99.]

Figure 1 | Global measurement of poly(A)-tail lengths. a, Outline of PAL-seq. For each cluster, the fluorescence intensity reflects the tail length of the cDNA that seeded the cluster. Although the probability of incorporating a biotin-conjugated dU opposite each tail nucleotide is uniform, stochastic incorporation results in a variable number of biotins for each molecule within a cluster. b, Median streptavidin fluorescence intensities for two sets of mRNA-like molecules with indicated poly(A)-tail lengths, which were added to 3T3 (circle), HEK293T (triangle) and HeLa (square) samples for tail-length calibration.

Comparison of tail lengths for orthologous genes in human (HeLa and HEK293T) and mouse (3T3 and liver) cells revealed moderately strong correlations, indicating that tail lengths are conserved (Extended Data Table 1, Spearman R (Rs) as high as 0.46). When searching for gene classes that tended to have longer or shorter tails, the most striking and pervasive enrichment was for ribosomal protein and other ‘housekeeping’ genes among the short-tailed genes (Extended Data Table 2). This enrichment was strong in yeast, despite previous reports that ribosomal-protein genes tend to have long tails10,11. To address this and other discrepancies with previous yeast studies (Extended Data Fig. 3a, b), we used an independent method to measure the poly(A)-tail lengths of eight yeast genes, including four ribosomal protein genes. The results were much more consistent with our measurements than with the previous measurements (Extended Data Figs 3 and 4). Both previous reports used the polyadenylation state microarray (PASTA) method, which fractionates RNAs by stepwise thermal elution from poly(U)-Sepharose. Although studies have successfully used poly(U)-Sepharose fractionation to detect tail-length changes for the same genes in different contexts13–15, detecting differences between different genes in the same context is more challenging. Our results suggest that PASTA, as previously implemented in yeasts10,11, is less suitable than PAL-seq for intergenic comparisons, although we cannot exclude the possibility that the discrepancies arose from different growth conditions.

The types of genes with shorter or longer tails differed between the embryonic samples and the other samples (Extended Data Table 2). Genes in the early embryo might not have the same tail lengths as their orthologues do in other contexts because before the maternal-to-zygotic transition (MZT), which occurs at ~3 hpf in zebrafish16 and at approximately stage 8 in Xenopus laevis17, transcription is not yet active, and some maternal transcripts are masked for later use, whereas others are subject to cytoplasmic polyadenylation5. At 6 hpf in zebrafish, ribosomal protein mRNAs had switched from being enriched in shorter-tailed genes to being enriched in longer-tailed genes (Extended Data Table 2), perhaps because these were mostly newly synthesized transcripts, which tended to have longer tails at this stage (Extended Data Fig. 5).

[Figure 2 graphic. a, Histograms of frequency against single-mRNA poly(A)-tail length (nt); median tail lengths: S. cerevisiae 26.7, S. pombe 28.2, Arabidopsis leaf 50.7, Drosophila S2 50.4, mouse liver 66.6, HeLa 67.5, HEK293T 75.3, 3T3 95.7; zebrafish embryo 2 hpf 23.1, 4 hpf 42.2, 6 hpf 58.1; X. laevis embryo stages 3–4 21.7, stage 9 35.4, stages 12–12.5 44.5. b, Histograms of frequency against mean poly(A)-tail length (nt); median average tail lengths: S. cerevisiae 33.1, S. pombe 32.8, Arabidopsis leaf 58.1, Drosophila S2 63.3, mouse liver 75.3, HeLa 82.6, HEK293T 88.4, 3T3 107.8; zebrafish embryo 2 hpf 29.4, 4 hpf 44.0, 6 hpf 58.9; X. laevis embryo stages 3–4 25.4, stage 9 38.1, stages 12–12.5 49.2. c, Frequency against single-mRNA poly(A)-tail length (nt) for Emg1, Lsm3, Eif4h, Vps29, Cct8, Vim, Myc, Rpl37a, Ctsb and 1500012F01Rik. d, Heat maps for each sample; x axis, tail length (nt, 8 to 256); y axis, genes; colour scale, fraction (0 to 1.0).]

Figure 2 | Poly(A)-tail lengths in yeast, plant, fly and vertebrate cells. a, Bulk tail-length distributions. For each sample, histograms tally tail-length measurements for all poly(A) tags mapping to annotated 3′ UTRs (bin size = 5 nucleotides). Leftmost bin includes all measurements <0 nucleotides. Median tail lengths are in parentheses. b, Intergenic tail-length distributions. For each sample, histograms tally average tail lengths for protein-coding genes with ≥50 tags (yeasts, zebrafish and Xenopus) or ≥100 tags (other samples). Median average tail lengths are in parentheses. c, Intragenic tail-length distributions for 10 genes sampling the spectrum of average tail lengths in 3T3 cells. d, Intragenic tail-length distributions. Heat maps show the frequency distribution of tail lengths for each gene tallied in b. The colour intensity indicates the fraction of the total for the gene. Genes are ordered by average tail length (dashed line). Results from the S. cerevisiae total-RNA sample are reported in this figure.

Because deadenylation is an important early step in eukaryotic mRNA decay2,3,18, we examined the relationship between poly(A)-tail length and published mRNA stability values (Extended Data Table 1). Tail length and half-life were slightly negatively correlated in HeLa and 3T3 cells (Rs = −0.048 and −0.16, respectively) and variably correlated in yeast, depending on the source of the half-life measurements (Rs from −0.44 to 0.23). The weak relationships in HeLa and 3T3 cells would be expected if mRNAs with different half-lives have similar steady-state tail-length distributions, with the less stable mRNAs transiting through the distributions more quickly.

No strong, easily interpretable correlations between tail length and mRNA features (length of 3′ untranslated region (3′ UTR), length of open reading frame (ORF), total length, splice-site number, splice-site density) or expression (steady-state accumulation and nuclear-to-cytoplasmic ratio) were observed (Extended Data Table 1). Of these, the strongest correlations were between tail length and steady-state accumulation (Rs from −0.44 to 0.25), and between tail length and mRNA length (Rs from −0.12 to 0.36) or features related to mRNA length. Support for the latter relationship was also observed in intragenic comparisons, which revealed a weak positive relationship between tail length and the length of tandem 3′-UTR isoforms (Extended Data Fig. 6a). In early zebrafish embryos this relationship between 3′-UTR isoforms was even more pronounced when a predicted cytoplasmic polyadenylation element (CPE)4,19 was present in the region unique to the longer isoform (Extended Data Fig. 6b).

Decoupling of tail length and translation

Most reports of increased translation of longer-tailed mRNAs have used oocytes and early embryos4,5. To examine whether this phenomenon reported in early embryos for a few genes applies transcriptome-wide, we performed ribosome footprint profiling and RNA-seq to measure translational efficiencies20 from the embryonic samples used to measure tail lengths. We found that in early embryos (cleavage and blastula stages) of both fish and frog, mean poly(A)-tail length correlated strongly with translational efficiency (Fig. 3a, Rs from 0.62 to 0.77). No other mRNA feature has been reported to correlate so well with translational efficiency in any system.

In these early embryonic stages, a twofold increase in tail length corresponded to a large increase in translational efficiency: greater than 6-fold when doubling the tail from 20 to 40 nucleotides in 2 hpf zebrafish (Fig. 3a). Although longer-tailed mRNAs were more likely to contain a CPE, the relationship between tail length and translational efficiency for CPE-containing mRNAs was no different from that of other mRNAs (Extended Data Fig. 7a). In theory, this coupling might not be causal, or it might be causal but strictly due to either translational inhibition causing tail shortening or translational activity preventing tail shortening. Alternatively, all or at least some of the coupling might result from longer tail length causing more efficient translation in the early embryo. We favour this last possibility because it agrees with the known importance of cytoplasmic polyadenylation for activating genes in maturing oocytes8,21,22 and early embryos23,24 of Xenopus and in certain other vertebrate contexts25–30. Even more importantly, it agrees with the increased translation observed in Xenopus oocytes when appending prosthetic poly(A) tails of increasing length onto an mRNA8.

The strong coupling observed in the blastula largely disappeared in gastrulating embryos (Fig. 3a; Rs = 0.13 for both fish and frog). This disappearance was not because of the more restricted tail-length range observed at gastrulation (Extended Data Fig. 7b). Moreover, we observed no positive correlation of a meaningful magnitude between mean poly(A)-tail length and translational efficiency in HeLa cells, HEK293T cells, 3T3 cells, mouse liver, S. cerevisiae or S. pombe (Rs = −0.10, 0.07, −0.04, 0.00, −0.12 and −0.15, respectively) (Fig. 3b). Our results in yeasts differed from those reported earlier10,11, which we again attribute to the limitations of previous methods. In 3T3 cells, metabolic labelling has been used to infer protein-synthesis rates31, which correlated with our translational efficiencies (Rs = 0.44, P < 10^−158) and did not correlate positively with tail lengths (Rs = −0.20, P < 10^−16). Taken together, our results indicate that beginning at gastrulation, translational control undergoes a mechanistic change that uncouples translational efficiency from poly(A)-tail length.

Intragenic comparison of tail length and translation

The simplest interpretation of the weak or negative correlations we observed between tail length and translational efficiency in yeast and mammalian cells is that increasing average tail length over the physiological range does not enhance translation in these contexts. However, our comparisons of average tail length and average translational efficiency

between genes (Fig. 3b) might have missed a relationship that would be observed when looking at differentially translated mRNAs from the same gene. To address this possibility, we fractionated 3T3 cell lysate to isolate mRNAs associated with different numbers of ribosomes and measured the tail lengths in each fraction (Fig. 4a). To learn how poly(A)-tail length related to ribosome density for individual genes, we plotted mean tail-length values as a function of the number of bound ribosomes and fit the data for each gene with a straight line (Fig. 4b). The slopes of these lines were generally small, and most were slightly negative (Fig. 4b); positive slopes would have been expected if longer tails enhanced translation. Thus, the increase in median length observed between the lightest and heaviest fractions when considering bulk tail lengths (Fig. 4a; 66 and 82 nucleotides, respectively) did not indicate a relationship between longer tails and enhanced translation but instead might have reflected the positive correlation between ORF length and tail length observed in 3T3 cells (Extended Data Table 1; Rs = 0.36). The trend of mostly negative slopes prevailed even when excluding data from mRNA not associated with any ribosomes (Extended Data Fig. 7c), or when examining subsets of genes with longer or shorter mean tail lengths, or with higher or lower translational efficiencies (Extended Data Fig. 7d). This global intragenic analysis (Fig. 4b) supports the conclusion drawn from intergenic analyses (Fig. 3), that in all yeast and mammalian contexts examined (and presumably in most other cellular contexts), mRNAs with longer poly(A) tails are not more efficiently translated.

[Figure 3 graphic. Scatter plots of TE (log2) against mean poly(A)-tail length (nt). a, Zebrafish 2 hpf (Rs = 0.77, P = 0, n = 2,812), 4 hpf (Rs = 0.62, P < 10^−292, n = 2,812) and 6 hpf (Rs = 0.13, P < 10^−11, n = 2,812); X. laevis stages 3–4 (Rs = 0.63, P < 10^−211, n = 1,942), stage 9 (Rs = 0.65, P < 10^−231, n = 1,942) and stages 12–12.5 (Rs = 0.13, P < 10^−8, n = 1,942). b, S. cerevisiae (Rs = −0.12, P < 10^−9, n = 2,569), S. pombe (Rs = −0.15, P < 10^−14, n = 2,791), mouse liver (Rs = 0.00, P = 0.87, n = 2,484), 3T3 (Rs = −0.04, P = 0.04, n = 2,751), HEK293T (Rs = 0.07, P < 10^−5, n = 4,509) and HeLa (Rs = −0.10, P < 10^−5, n = 2,266).]

Figure 3 | Transient coupling between poly(A)-tail length and translational efficiency. a, Relationship between mean tail length and translational efficiency (TE) for genes with ≥50 poly(A) tags from embryonic samples at the indicated developmental stages. For each stage, tail lengths and translational efficiencies were obtained from the same sample. MGC116473 and DDX24 fell outside the plot for X. laevis, stages 3–4, and LOC100049092 fell outside the plot for X. laevis, stages 12–12.5. b, Relationship between mean tail length and translational efficiency in the indicated cells, for genes with ≥50 (yeasts) or ≥100 (others) tags. With the exception of HeLa35, tail lengths and translational efficiencies were from the same samples. S. cerevisiae YBR196C, YLR355C and YDL080C, S. pombe SPCC63.04.1, mouse liver NM_007881 and NM_145470, HEK293T NM_001007026, NM_021058 and NM_003537, and HeLa NM_001007026 fell outside their respective plots.

A shift in the ultimate effects of miRNAs

MicroRNAs (miRNAs) are small RNAs that pair to sites in mRNAs to target these messages for post-transcriptional repression32. Global measurements indicate that miRNA targeting causes mostly mRNA destabilization, with translational repression comprising a detectable but minor component of the overall repression33–36. The only known exception is the transient translational repression observed in early zebrafish embryos36. At 4 hpf miR-430 targeting causes mostly translational repression with very little mRNA destabilization, whereas by 6 hpf the outcome shifts to mostly mRNA destabilization36. Because miR-430 is induced only ~1.5 h before the 4-hpf stage, these results are interpreted as revealing the dynamics of miRNA action, in which an early phase of translational repression gives way to a later phase in which destabilization dominates36. When considering that miRNA targeting promotes poly(A)-tail shortening through the recruitment of deadenylase complexes37, our results suggest an alternative mechanism for the shift in miRNA regulatory outcomes. In this mechanism, miRNAs mediate tail shortening at both 4 and 6 hpf, but because of the switch in the nature of translational control (as well as destabilization of short-tailed mRNAs at later stages), tail shortening has very different consequences in the two stages: at 4 hpf, tail shortening predominantly decreases translational efficiency, whereas at 6 hpf, it predominantly decreases mRNA stability.

To integrate miRNA-mediated repression with effects on tail length, we injected one-cell zebrafish embryos with miRNAs that are normally not present in the early embryo and examined the influence of these injected miRNAs on ribosome-protected fragments, mRNA levels and poly(A)-tail lengths at 2, 4 and 6 hpf. Injecting miR-155 caused ribosome-protected fragments from many of its predicted targets to decrease relative to ribosome-protected fragments from no-site control mRNAs (Fig. 5a). Despite the decrease in ribosome-protected fragments, target mRNA levels did not change relative to the controls at 2 and 4 hpf, indicating that at these stages miR-155 targeting caused mostly translational repression. In contrast, decreases in ribosome-protected fragments were accompanied by nearly commensurate mRNA reductions at 6 hpf, indicating that by this stage the outcome of repression had shifted to mostly mRNA destabilization (Fig. 5a). Thus, the shift in miRNA regulatory outcome that occurs between 4 and 6 hpf is not specific to miR-430 or its targets. With respect to mechanism, the observation of this shift between 4 and 6 hpf, even though the injected miR-155 was present and active much earlier than was miR-430, indicated that the shift reflected a transition from the unusual regulatory regime operating in pre-gastrulation embryos (in which translational efficiency is sensitive to tail length) more than it reflected the dynamics of miRNA action.

The tail-length results further supported a mechanism involving shifting consequences of tail-length shortening. Predicted miR-155 targets had shortened tails at 2 and 4 hpf (Fig. 5b), which explained most of the miRNA-induced translational repression observed at these stages (Fig. 5c). By 6 hpf, the tail-length decreases observed at 4 hpf had mostly abated for predicted miR-155 targets (Fig. 5b), and these mRNAs were instead less abundant (Fig. 5a), in concordance with their extent of deadenylation at 4 hpf (Extended Data Fig. 8a). These observations agreed with the idea that tail shortening destabilizes mRNAs at later developmental stages and indicated that the miRNA-mediated deadenylation occurring during the earlier developmental stages promotes decay later. With shorter tails no longer associated with reduced translation (Fig. 3a) and instead associated with reduced mRNA levels, the ultimate consequence of miRNA-mediated repression shifted from translational repression to mRNA destabilization (Fig. 5c). Analogous results were obtained after injecting a different miRNA, miR-132 (Fig. 5c, Extended Data Fig. 8).

Because tail length was no longer strongly coupled with translational efficiency (Fig. 3a), tail-length changes did not explain the decrease in mean translational efficiency observed at 6 hpf for predicted miR-132 targets (Fig. 5c). We conclude that when poly(A)-tail length is uncoupled from translational efficiency, the translational repression often detected as a minor component of the overall repression33–35 arises from a mechanism different from the one that dominates pre-gastrulation.

Our results provide a compelling explanation for miRNA-mediated translational repression in the pre-gastrulation zebrafish embryo: miRNAs induce poly(A) shortening, which decreases translational efficiency at this developmental period. They also explain why the pre-gastrulation zebrafish embryo is the only known context for which translational repression is the dominant outcome of miRNA-mediated regulation; in all other contexts examined, tail-length shortening causes mRNA destabilization with little or no effect on translational efficiency.

Two gene-regulatory regimes

Our results from yeast, cultured mammalian cells and mouse liver refute the prevailing view that poly(A)-tail length broadly influences

[Figure 4 graphic. a, Absorbance trace (A260nm) over box plots of tail length (nt) for each gradient fraction; x axis, mean bound ribosomes (0, 1, 2, 3, 4, 5.5, 7.5, 11). b, Left, log-log plots of mean tail length against mean bound ribosomes for Dcbld2, Dync1li2, Ccdc104, Mcfd2, Soat1, Ndufa11, Cox4i1 and Ap2a2; right, box plot of slope (Δ tail length/ribosomes), median = −0.037, P = 1.0.]

Figure 4 | No detectable intragenic coupling between poly(A)-tail length and translational efficiency. a, Global analysis of tail lengths across a polysome profile for 3T3 cells. The absorbance trace indicates mean number of ribosomes bound per mRNA for each fraction from the sucrose gradient (top, fractions demarcated with vertical dashed lines). Box plots show distributions of bulk tail lengths in each fraction for all tags mapping to annotated 3′ UTRs (bottom). Box plot percentiles are line, median; box, 25th and 75th percentiles; whiskers, 10th and 90th percentiles. The horizontal line indicates the overall median of the median tail lengths. b, Relationship between tail lengths and ribosomes bound per mRNA for mRNAs from the same gene. For each gene, the data from a were used to plot the mean tail length as a function of bound ribosomes. Log-log plots for 8 randomly selected genes with ≥50 poly(A) tags in ≥6 fractions are shown (left), with lines indicating linear least-squares fits to the data (adding a pseudocount of 0.5 ribosomes to the fraction with 0 ribosomes). The box plot shows the distribution of slopes for all genes with ≥50 poly(A) tags in ≥4 fractions (right; n = 4,079; one-sided, one-sample Wilcoxon test; box plot percentiles as in a).

translational efficiency. In doing so, they add to the known differences between the regulatory regime operating in these cells and that operating in early metazoan embryos.

[Figure 5 graphic. Panels a and b: RPF change (log2) versus mRNA change (log2) and versus tail-length change (log2) at 2, 4 and 6 hpf, for genes with ≥1 site and genes with no site. Panel c: mean change (log2) in mRNA, RPF and TE for miR-155 and miR-132 at 2, 4 and 6 hpf, with the TE change predicted by tail-length change.]

Figure 5 | The influence of miR-155 on ribosomes, mRNA abundance and tails in the early zebrafish embryo. a, Relationship between changes in ribosome-protected fragments (RPFs) and changes in mRNA levels after injecting miR-155. Changes observed between miRNA- and mock-injected embryos are plotted at the indicated stages for predicted miR-155 target genes (red, genes with ≥ 1 miR-155 site in their 3′ UTR) and control genes (grey, genes that have no miR-155 site, yet resemble the predicted targets with respect to 3′-UTR length). To ensure that differences observed between 4 and 6 hpf were not the result of examining different genes, only site-containing genes and no-site control genes detected at both 4 and 6 hpf are shown for these stages. Lines indicate mean changes for the respective gene sets, with statistically significant differences between the sets indicated (*P ≤ 0.05; **P < 10^−4, one-tailed Kolmogorov–Smirnov test). Because injected miRNAs partially inhibited miR-430-mediated repression, genes with miR-430 sites were not considered. Data were normalized to the median changes observed for the controls. b, Relationship between changes in ribosome-protected fragments and changes in mean tail lengths after injecting miR-155. Tail lengths were determined using PAL-seq, otherwise as in a. c, A developmental switch in the dominant mode of miRNA-mediated repression. The schematic (left) depicts the components of the bar graphs, showing how the changes in ribosome-protected fragments (RPFs) comprise both mRNA and translational efficiency (TE) changes. The compound bar graphs show the fraction of repression attributed to mRNA degradation (blue) and translational efficiency (green) for the indicated stage, depicting the overall impact of miR-155 (centre; plotting results from a and b for genes with sites) and miR-132 (right, plotting results from Extended Data Fig. 8b for genes with sites). Slight, statistically insignificant increases in mRNA for predicted targets resulted in blue bars extending above the axis. For samples from stages at which tail length and translational efficiency were strongly coupled, a bracket adjacent to the compound bar indicates the fraction of repression attributable to shortened tails. Significant changes for each component are indicated with asterisks of the corresponding colour (*P ≤ 0.05; **P < 10^−4, one-tailed Kolmogorov–Smirnov test).

This absence or presence of coupling between poly(A)-tail length and translational efficiency can be rationalized in light of the potential interplay among regulatory options available in the two regulatory regimes. Our yeast, mammalian and gastrulation-stage cells were transcriptionally active, which offers ample opportunities for nuclear control of gene expression. Moreover, active transcription enables unstable mRNAs to be replaced if required, thereby expanding the contexts in which differential mRNA stability can be exploited for gene control. Thus, an additional layer of control in which translational efficiency depends on poly(A)-tail length is dispensable. More importantly, because this type of coupling would lower output from older mRNA molecules that, in the absence of cytoplasmic polyadenylation, would often have shorter poly(A) tails, the utility of gene regulation through mRNA stability would be compromised. In this conventional regulatory regime, long-lived mRNAs would have less value if they were translated less efficiently because of their shorter tails.

For fish and frog embryos at the cleavage stage, the regulatory regime was very different. These embryos were transcriptionally inactive, which not only precludes the use of transcription and other nuclear processes to alter gene expression programs but also limits the use of differential mRNA stability, because degraded mRNAs cannot be replaced until zygotic transcription begins. Perhaps as a consequence, many mRNAs with short tails were observed (Fig. 2a), consistent with the known stability of short-tailed mRNAs in early embryos19,38. In these circumstances, early embryonic cells apparently harness differential tail length for global gene control. This result expands the known behaviour of individual genes in Xenopus embryos23,24 and the observation that early embryonic cells have robust cytoplasmic polyadenylation4, which increases the utility of a tail-length regulatory mechanism. Compared to metazoan cells subject to the standard regulatory regime (for example, 6-hpf zebrafish embryos and the mammalian cells examined), cleavage-stage embryos had more uniform intragenic tail lengths and more variable intergenic lengths (Fig. 2d), as required for efficient harnessing of the tail-length regulatory regime. With their tail-length distribution also shifted towards shorter tails (Fig. 2b), cleavage-stage embryos can most efficiently exploit the tail-length differences with the greatest impact (Fig. 3a).

The transition between these two very different gene-regulatory regimes was rapid but not immediate. Despite their zygotic transcription, late-blastula embryos still coupled tail length with translation. Indeed, to the extent that newly transcribed zygotic mRNAs tended to have longer tails than did the maternally inherited mRNAs (Extended Data Fig. 5), the continued coupling observed in this hybrid regime would act to increase the relative output from these newly minted mRNAs, thereby sharpening the MZT.

We suspect that the tail-length regulatory regime observed in early embryos operates in other systems in which transcription is repressed (or occurs at a distant location) and cytoplasmic polyadenylation is active, such as early embryos of other metazoan species, maturing oocytes and neuronal synapses5. The ability to measure poly(A)-tail lengths at single-mRNA resolution should provide important insights in these systems.

METHODS SUMMARY

Cytoplasmically enriched lysates were prepared from HEK293T, 3T3, mouse liver, X. laevis, S. pombe and zebrafish samples, as well as one of the S. cerevisiae samples, and divided into three portions, one each for PAL-seq, RNA-seq and ribosome profiling. PAL-seq was performed as outlined in Fig. 1a. RNA-seq and ribosome profiling were performed essentially as described previously35. Poly(A) tags were mapped to a reference genome (or transcriptome) of the species, carrying forward those that mapped uniquely to the genome (or transcriptome) and also overlapped with the 3′ UTR of a transcript model chosen to represent a gene. Ribosome-protected fragments and RNA-seq tags were mapped to ORFs, as described previously35, except tags mapping within the first 50 nucleotides of an ORF were discarded (to exclude signal from ribosomes that might have initiated after cycloheximide was added). Each mRNA with a 3′ UTR that had at least one 7-nucleotide site matching the miRNA seed region32 was predicted to be a target of that miRNA. Genes that had no 6-nucleotide seed match anywhere within their mature transcript were classified as no-site genes, from which a set of no-site control genes was selected such that its 3′-UTR length distribution matched that of the predicted targets.
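The target-prediction rule in the Methods Summary can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, and the choice of the two 7-nucleotide site types (a match to miRNA nucleotides 2–8, and a match to nucleotides 2–7 followed by an A) is an assumption based on the common seed-site convention rather than something stated in this section. Selection of length-matched no-site controls is not shown.

```python
# Hypothetical helper names; a sketch of seed-based target classification.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str) -> dict:
    """Build the mRNA-side motifs that pair to the miRNA seed.

    Assumption: the 7-nt sites are 7mer-m8 (pairs to miRNA nt 2-8)
    and 7mer-A1 (pairs to nt 2-7, followed by an A).
    """
    mirna = mirna.upper().replace("T", "U")
    six_mer = reverse_complement(mirna[1:7])  # pairs to miRNA nt 2-7
    return {
        "6mer": six_mer,
        "7mer-m8": reverse_complement(mirna[1:8]),
        "7mer-A1": six_mer + "A",
    }


def classify_gene(utr3: str, transcript: str, mirna: str) -> str:
    """Classify a gene as 'target', 'no-site' or 'other'.

    target : 3' UTR contains at least one 7-nt seed-matched site
    no-site: no 6-nt seed match anywhere in the mature transcript
    other  : everything else (not used as target or control)
    """
    sites = seed_sites(mirna)
    utr3 = utr3.upper().replace("T", "U")
    transcript = transcript.upper().replace("T", "U")
    if sites["7mer-m8"] in utr3 or sites["7mer-A1"] in utr3:
        return "target"
    if sites["6mer"] not in transcript:
        return "no-site"
    return "other"


MIR155 = "uuaaugcuaaucgugauaggggu"  # miR-155 sequence as given in the Methods
print(seed_sites(MIR155))
```

Under these assumptions, a gene whose only seed match lies in its ORF falls into neither class, which mirrors the text's requirement that no-site genes lack a 6-nucleotide match anywhere in the transcript.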

Online Content Any additional Methods, Extended Data display items and Source Data are available in the online version of the paper; references unique to these sections appear only in the online paper.

Received 25 July; accepted 23 December 2013.
Published online 29 January 2014.

1. Moore, M. J. & Proudfoot, N. J. Pre-mRNA processing reaches back to transcription and ahead to translation. Cell 136, 688–700 (2009).
2. Goldstrohm, A. C. & Wickens, M. Multifunctional deadenylase complexes diversify mRNA control. Nature Rev. Mol. Cell Biol. 9, 337–344 (2008).
3. Chen, C. Y. & Shyu, A. B. Mechanisms of deadenylation-dependent decay. Wiley Interdiscip. Rev. RNA 2, 167–183 (2011).
4. Richter, J. D. Cytoplasmic polyadenylation in development and beyond. Microbiol. Mol. Biol. Rev. 63, 446–456 (1999).
5. Weill, L., Belloc, E., Bava, F. A. & Mendez, R. Translational control by changes in poly(A) tail length: recycling mRNAs. Nature Struct. Mol. Biol. 19, 577–585 (2012).
6. Eckmann, C. R., Rammelt, C. & Wahle, E. Control of poly(A) tail length. Wiley Interdiscip. Rev. RNA 2, 348–361 (2011).
7. Sallés, F. J., Lieberfarb, M. E., Wreden, C., Gergen, J. P. & Strickland, S. Coordinate initiation of Drosophila development by regulated polyadenylation of maternal messenger RNAs. Science 266, 1996–1999 (1994).
8. Barkoff, A., Ballantyne, S. & Wickens, M. Meiotic maturation in Xenopus requires polyadenylation of multiple mRNAs. EMBO J. 17, 3168–3175 (1998).
9. Preiss, T., Muckenthaler, M. & Hentze, M. W. Poly(A)-tail-promoted translation in yeast: implications for translational control. RNA 4, 1321–1331 (1998).
10. Beilharz, T. H. & Preiss, T. Widespread use of poly(A) tail length control to accentuate expression of the yeast transcriptome. RNA 13, 982–997 (2007).
11. Lackner, D. H. et al. A network of multiple regulatory layers shapes gene expression in fission yeast. Mol. Cell 26, 145–155 (2007).
12. Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nature Biotechnol. 29, 659–664 (2011).
13. Rosenthal, E. T., Tansey, T. R. & Ruderman, J. V. Sequence-specific adenylations and deadenylations accompany changes in the translation of maternal messenger RNA after fertilization of Spisula oocytes. J. Mol. Biol. 166, 309–327 (1983).
14. Palatnik, C. M., Wilkins, C. & Jacobson, A. Translational control during early Dictyostelium development: possible involvement of poly(A) sequences. Cell 36, 1017–1025 (1984).
15. Paynton, B. V., Rempel, R. & Bachvarova, R. Changes in state of adenylation and time course of degradation of maternal mRNAs during oocyte maturation and early embryonic development in the mouse. Dev. Biol. 129, 304–314 (1988).
16. Kane, D. A. & Kimmel, C. B. The zebrafish midblastula transition. Development 119, 447–456 (1993).
17. Newport, J. & Kirschner, M. A major developmental transition in early Xenopus embryos: II. Control of the onset of transcription. Cell 30, 687–696 (1982).
18. Decker, C. J. & Parker, R. A turnover pathway for both stable and unstable mRNAs in yeast: evidence for a requirement for deadenylation. Genes Dev. 7, 1632–1643 (1993).
19. Aanes, H. et al. Zebrafish mRNA sequencing deciphers novelties in transcriptome dynamics during maternal to zygotic transition. Genome Res. 21, 1328–1338 (2011).
20. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
21. McGrew, L. L., Dworkin-Rastl, E., Dworkin, M. B. & Richter, J. D. Poly(A) elongation during Xenopus oocyte maturation is required for translational recruitment and is mediated by a short sequence element. Genes Dev. 3, 803–815 (1989).
22. Paris, J. & Richter, J. D. Maturation-specific polyadenylation and translational control: diversity of cytoplasmic polyadenylation elements, influence of poly(A) tail size, and formation of stable polyadenylation complexes. Mol. Cell. Biol. 10, 5634–5645 (1990).
23. Paris, J. & Philippe, M. Poly(A) metabolism and polysomal recruitment of maternal mRNAs during early Xenopus development. Dev. Biol. 140, 221–224 (1990).
24. Simon, R., Tassan, J. P. & Richter, J. D. Translational control by poly(A) elongation during Xenopus development: differential repression and enhancement by a novel cytoplasmic polyadenylation element. Genes Dev. 6, 2580–2591 (1992).
25. Vassalli, J. D. et al. Regulated polyadenylation controls mRNA translation during meiotic maturation of mouse oocytes. Genes Dev. 3, 2163–2171 (1989).
26. Gebauer, F., Xu, W., Cooper, G. M. & Richter, J. D. Translational control by cytoplasmic polyadenylation of c-mos mRNA is necessary for oocyte maturation in the mouse. EMBO J. 13, 5712–5720 (1994).
27. Wu, L. et al. CPEB-mediated cytoplasmic polyadenylation and the regulation of experience-dependent translation of α-CaMKII mRNA at synapses. Neuron 21, 1129–1139 (1998).
28. Oh, B., Hwang, S., McLaughlin, J., Solter, D. & Knowles, B. B. Timely translation during the mouse oocyte-to-embryo transition. Development 127, 3795–3803 (2000).
29. Burns, D. M. & Richter, J. D. CPEB regulation of human cellular senescence, energy metabolism, and p53 mRNA translation. Genes Dev. 22, 3449–3460 (2008).
30. Novoa, I., Gallego, J., Ferreira, P. G. & Mendez, R. Mitotic cell-cycle progression is regulated by CPEB1 and CPEB4-dependent translational control. Nature Cell Biol. 12, 447–456 (2010).
31. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).
32. Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233 (2009).
33. Baek, D. et al. The impact of microRNAs on protein output. Nature 455, 64–71 (2008).
34. Hendrickson, D. G. et al. Concordant regulation of translation and mRNA abundance for hundreds of targets of a human microRNA. PLoS Biol. 7, e1000238 (2009).
35. Guo, H., Ingolia, N. T., Weissman, J. S. & Bartel, D. P. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835–840 (2010).
36. Bazzini, A. A., Lee, M. T. & Giraldez, A. J. Ribosome profiling shows that miR-430 reduces translation before causing mRNA decay in zebrafish. Science 336, 233–237 (2012).
37. Braun, J. E., Huntzinger, E. & Izaurralde, E. A molecular link between miRISCs and deadenylases provides new insight into the mechanism of gene silencing by microRNAs. Cold Spring Harb. Perspect. Biol. 4, a012328 (2012).
38. Audic, Y., Omilli, F. & Osborne, H. B. Postfertilization deadenylation of mRNAs in Xenopus laevis embryos is sufficient to cause their degradation at the blastula stage. Mol. Cell. Biol. 17, 209–218 (1997).

Supplementary Information is available in the online version of the paper.

Acknowledgements We thank D. Weinberg, V. Auyeung, I. Ulitsky, C. Jan, J.-W. Nam, A. Shkumatava, S.-J. Hong, Y. Erlich and the Whitehead Genome Technology Core (V. Dhanapal, L. Francis, S. Gupta and T. Volkert) for discussions; J.-W. Nam, I. Ulitsky and D. Weinberg for assistance with transcript annotation; C. Bresilla, X. Guo, S.-J. Hong and A. Rothman for experimental assistance; and D. Weinberg for comments on the manuscript. Supported by NIH grant GM067031 (D.P.B.) and NIH Medical Scientist Training Program fellowship T32GM007753 (A.O.S.). D.P.B. is an investigator of the Howard Hughes Medical Institute.

Author Contributions A.O.S. developed PAL-seq, generated tail-length measurements, and performed associated analyses. S.W.E. performed ribosome profiling, RNA-seq and associated analyses. G.R.C. performed zebrafish injections and assisted with staging. D.P.B. supervised with help from H.S. All authors helped to design the study and write the manuscript.

Author Information Sequencing data and the processed data for each gene are available at the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession number GSE52809. Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Correspondence and requests for materials should be addressed to D.P.B. ([email protected]).

116 3 APRIL 2014 | VOL 508 | NATURE | 71 ©2014 Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE

METHODS

PAL-seq. Total RNA or RNA from cytoplasmically enriched lysate (~1–50 μg) was supplemented with two mixes of tail-length standards and trace marker RNA containing an internal 32P-label (*) (5′-ugagguaguagguuguauagu*caauccuaaucauuccaauccuaaucauucaaaaaaaaaa-3′, IDT), which was used to monitor subsequent ligation, partial-digestion and capture steps. Polyadenylated ends in the mixture of cellular RNA and standards were ligated to a 3′-biotinylated adaptor DNA oligonucleotide (5′-pAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGACACATAC-biotin-3′, IDT) in the presence of a splint DNA oligonucleotide (5′-TTCCGATCTTTTTTTTT-3′, IDT) using T4 Rnl2 (NEB) in an overnight reaction at 18 °C. The RNA was partially digested with RNase T1 (Ambion) as described39, extracted with phenol-chloroform, ethanol precipitated and then purified on a denaturing polyacrylamide gel (selecting 104–750-nucleotide fragments), which removed residual unreacted 3′ adaptor. Splinted-ligation products were captured on streptavidin M-280 Dynabeads (Invitrogen) and, when still bound to the beads, 5′ phosphorylated with 3′-phosphatase–deficient T4 polynucleotide kinase (NEB) and ligated to a 5′ adaptor oligonucleotide (5′-C3.spacer-CAAGCAGAAGACGGCATACGAGTTCAGAGTTCTAcaguccgacgauc-3′, IDT; uppercase, DNA; lowercase, RNA) using T4 Rnl1 (NEB) in an overnight reaction at 22 °C. Following reverse transcription using SuperScript II (Invitrogen) and a primer oligonucleotide (5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACG-3′, IDT), complementary DNA (cDNA) was liberated from the beads by base hydrolysis and purified on a denaturing polyacrylamide gel (selecting 166–790-nucleotide length DNA). Purified cDNA was denatured at room temperature in 5–100 mM NaOH, neutralized with addition of HT1 hybridization buffer (Illumina) and applied to an Illumina flow cell (at a typical concentration of 1.0–1.2 pM). Standard cluster generation, linearization, 3′-end blocking and primer hybridization were performed on a cBot cluster generation system (Illumina).

After transferring the flow cell to a Cluster Station designed for an Illumina Genome Analyzer, the sequencing primer was extended using a reaction mix containing 100 U ml−1 Klenow polymerase (NEB), 200 nM dTTP and either 10 nM (yeast samples) or 4 nM (other samples) biotin-16-dUTP (Roche). Extension was for 30 min at 37 °C, flowing a fresh aliquot (≥ 50 μl) of reaction mix every 2 min to replenish dNTPs. Following primer extension, the flow cell was placed on a Genome Analyzer II sequencer (Illumina) for 36 cycles of standard sequencing-by-synthesis. After three additional cycles of cleavage to remove any residual sequencing fluorophores, the flow cell was washed with buffer (40.25 mM phosphate buffered saline (PBS), pH 7.4, 0.1% Tween), blocked with streptavidin-binding buffer (300 μg ml−1 bovine serum albumin (NEB), 40.25 mM PBS, pH 7.4, 0.1% Tween), washed with buffer again, and then imaged, as carried out previously12. This cycle of wash, block, wash, image was then repeated with a binding step inserted after the blocking step, in which the flow cell was incubated with 30 nM Alexa Fluor 532 streptavidin (Invitrogen) in streptavidin-binding buffer for 10 min at 20 °C. This expanded cycle was then repeated two more times, but with 100 nM streptavidin included at the binding step. Fluorescence was captured in the T and G channels because the wavelength of the excitation laser for these channels (532 nm) was identical to the fluorophore excitation wavelength. The sequential imaging confirmed that the second 100 nM streptavidin incubation did not increase mean cluster intensity (monitored in real time as part of the 'first base report'), which indicated saturation of available biotin.

Calculation of poly(A)-tail lengths. Raw images taken during sequencing-by-synthesis and after binding of fluorescent streptavidin were processed with Firecrest image-analysis and Bustard base-calling software (Illumina, version 1.9.0, using default parameters) to generate a read (FASTQ) file and another file containing the position of each cluster, the read sequence, the quality score for each base, and the base intensities in all four channels for every cycle of sequencing and streptavidin binding. Reads were aligned to a reference genome (hg18 for human, mm9 for mouse, dm3 for fly, danRer7 for fish, TAIR10 for A. thaliana, Spombe1 for S. pombe, and sacCer3 for S. cerevisiae) or a reference transcriptome (curated from Unigene mRNA sequences for X. laevis) using the Bowtie program for short-read mapping and the parameters '-l 25 -n 2 -m 1 -3 z', in which z was the number of streptavidin-binding cycles plus one. Reads containing ambiguous base calls (as indicated by characters 'N' or '.') at any position in the first 36 nucleotides were discarded, as were reads mapping to multiple genomic loci. Reads that did not map to the genome were aligned to Bowtie indexes corresponding to the tail-length standards. For the remaining reads, mapping to the genome and standards was repeated, accounting for the possibility that the read failed to map because the sequence extended past the poly(A)-proximal fragment of the transcript and into the 5′ adaptor. This mapping was reiterated for 16 rounds (to capture tags of ≥ 20 nucleotides, with each round considering previously unmapped reads in which the 5′ adaptor sequence started a nucleotide closer to the beginning of the read (requiring a perfect match to only the final 6 nucleotides of the adaptor after the fifth round)). Before each round of mapping, the adaptor sequence was stripped by adjusting the Bowtie '-3' parameter.

For the A. thaliana sample, the first sequenced base was of low quality, and raw images from the first cycle of sequencing were excluded for image analysis and base calling. Consequently, sequence reads were 35 nucleotides long, and only 15 rounds of iterative adapter trimming/sequence mapping were performed.

For each read carried forward as mapping to a single locus (of either the genome or the length standards), the cluster fluorescence intensity in the T channel after the first 100 nM streptavidin flow-in was recorded as the raw streptavidin fluorescence intensity. From this raw intensity, the intensity after the 0 nM flow-in was subtracted as background, and the resulting background-subtracted intensity was divided by the relative cluster intensity observed during sequencing-by-synthesis, which normalized for the density of molecules within the cluster. The relative cluster intensity was calculated by first dividing the fluorescence intensity of every sequenced base in the read by the median intensity for that base among all clusters with the same base at the same position, and then taking the average of the resulting values over the length of the read. Normalized streptavidin intensities were transformed to poly(A)-tail lengths using linear regression parameters derived from the median intensities of the standards and their mode poly(A)-tail lengths. For yeast, Arabidopsis, Drosophila, Xenopus and zebrafish samples, only the standards with tails of 10, 50 and 100 nucleotides were used in the linear regression. For the other samples, all of the standards were used except for one with a 324-nucleotide tail (barcode sequence = 5′-CUCACUAUAC-3′), which was typically not sufficiently abundant for accurate measurement of its tail length. Each tail length was then paired with the genomic (or standard) coordinates to yield a poly(A) tag.

Assigning poly(A) tags to genes. Reference transcript annotations were downloaded (in refFlat format) from the UCSC Genome Browser or another database (Ensembl for zebrafish, Unigene for X. laevis, TAIR for A. thaliana and PomBase for S. pombe). For human, mouse, zebrafish and fly, transcript 3′ ends were re-annotated using poly(A) sites identified by 3P-seq39 and the workflow described previously40. For each gene, a representative transcript model was chosen as the one that had the longest ORF and the longest 3′ UTR corresponding to that ORF. These reference transcript databases and a file with the sequences of the internal standards are available for anonymous download (http://web.wi.mit.edu/bartel/pub/publication.html). S. cerevisiae representative transcript models were from Gene Expression Omnibus accession GSE53268. Poly(A) tags that overlapped the 3′ UTR of the representative transcript model by at least one nucleotide were assigned to that gene. Tags with tail-length measurements < −50 and > 1,000 nucleotides (which included < 0.0009% of the tags in any sample) were excluded from all analyses. Mean tail-length measurements < 1 nucleotide (which included measurements from 11 analysed genes) were replaced with a value of 1.0 nucleotide in the intragenic analysis across the polysome gradient (Fig. 4b). When considering the depth of a representative PAL-seq data set from 3T3 cells, we considered 1.0 reads per kilobase per million reads (RPKM) as the RNA-seq level indicating an average of one mRNA molecule per cell. This estimate was conservative, in that a comparison to published mRNA abundances in 3T3 cells31 indicated that 1.0 RPKM from our experiment corresponded to about 0.2 mRNA molecules per 3T3 cell.

RNA preparation for PAL-seq. For libraries made from S. pombe, HEK293T, 3T3, mouse liver, X. laevis, and mock-, miR-132- and miR-155-injected zebrafish samples, as well as the S. cerevisiae sample analysing cytoplasmically enriched RNA, RNA was extracted from a portion of the lysate prepared for ribosome profiling and RNA-seq. These cleared lysates were enriched in cytoplasm. For libraries made from HeLa and polysome-gradient samples, RNA was extracted from similar cytoplasmically enriched lysates. For the polysome gradient fractionation (Fig. 4a), lysate preparation and centrifugation were performed as for ribosome profiling, but without nuclease digestion before fractionation. For other libraries, total RNA was used. The correlation observed when comparing PAL-seq results from HeLa cytoplasmically enriched RNA and total RNA resembled that observed between biological replicates (Extended Data Fig. 2b; Rs = 0.84 and 0.83, respectively). The measured lengths in both types of RNA preparation were similar, despite the possibility that total RNA might have included more long-tailed mRNAs due to a population of nascent mRNAs that had full-length tails and were awaiting export to the cytoplasm. However, not all nuclear mRNAs are expected to have full-length tails (as some would still be in the process of being polyadenylated at the time of sample collection), and the nuclear population of mRNAs awaiting export presumably comprised a small fraction of the cellular mRNAs.

Tail-length standards. The common 5′ region of each standard and the unique 3′ region, consisting of the standard-specific barcode and poly(A)-tail (Fig. 1b), were synthesized separately and then ligated together to make full-length standards. To generate each 3′ region, a 5′-phosphate-bearing RNA oligonucleotide (IDT) consisting of the barcode segment followed by a 10-nucleotide poly(A) segment was extended with E. coli poly(A) polymerase (NEB), with ATP concentration and reaction time adjusted to yield tails of the desired length. To narrow the tail-length distribution, extension products were sequentially purified on two

denaturing polyacrylamide gels, excising products with tails of the desired length range and reducing the variability of tailed RNA to be mostly within ~5–25 nucleotides, depending on the length of tail added (Extended Data Fig. 1b). The 5′ region of the standards was synthesized by in vitro transcription of a template containing Renilla luciferase sequence followed by that of a modified HDV ribozyme41. After gel purification of the 5′ product of HDV self-cleavage, the 2′,3′-cyclic phosphate at its 3′ end was removed with T4 polynucleotide kinase (NEB; 3,000 μl reaction containing 30,000 U of enzyme and 100 mM MES-NaOH, pH 5.5, 10 mM MgCl2, 10 mM β-mercaptoethanol, 300 mM NaCl, 37 °C, 6 h). After another gel purification, the dephosphorylated product was joined to the poly(A)-tailed barcode oligonucleotide by splinted ligation using T4 Rnl2 (NEB) and a DNA bridge oligonucleotide with 10 nucleotides of complementarity to each side of the ligation junction. Ligation products were gel purified and mixed in desired ratios before being added to RNA samples for PAL-seq.

Ribosome footprint profiling. Immediately before sample collection, cultured mammalian cells were incubated with media containing 100 μg ml−1 cycloheximide for 10 min at 37 °C to stop translation elongation. Cells were washed twice with ice-cold 9.5 mM PBS, pH 7.3, containing 100 μg ml−1 cycloheximide, and lysed by adding lysis buffer (10 mM Tris-HCl, pH 7.4, 5 mM MgCl2, 100 mM KCl, 2 mM dithiothreitol, 100 μg ml−1 cycloheximide, 1% Triton X-100, 500 U ml−1 RNasin Plus, and protease inhibitor (1x complete, EDTA-free, Roche)) and triturating four times with a 26-gauge needle. After centrifuging the crude lysate at 1,300g for 10 min at 4 °C, the supernatant was removed and flash-frozen in liquid nitrogen. Cultured S. pombe cells were grown to mid-log phase and then harvested (without cycloheximide pre-treatment) by filtering off the media and flash freezing the remaining paste, which was then manually ground into a fine powder with a mortar and pestle while being bathed in liquid nitrogen. The powder was thawed on ice, resuspended in lysis buffer and processed as described for the other lysates. Zebrafish embryos were enzymatically dechorionated and then incubated in 100 μg ml−1 cycloheximide in E3 buffer (5 mM NaCl, 0.17 mM KCl, 0.33 mM CaCl2, 0.33 mM MgSO4) for 5 min at room temperature. The embryos were then transferred into lysis buffer and flash frozen. Soon after fertilization the jelly membranes of X. laevis embryos were chemically removed, and at the desired stages, embryos were flash frozen in lysis buffer without cycloheximide pre-treatment. Once thawed, these samples were clarified as above and then processed in the same manner as other lysates. Prior to dissecting liver, a 6-week-old, male C57BL/6 mouse was killed by cervical dislocation. The liver was excised, flash frozen, and manually ground and processed as described for S. pombe. Ribosome profiling and RNA-seq were performed on cleared lysates essentially as described35, using RiboMinus-treated RNA for the S. pombe RNA-seq sample, and poly(A)-selected RNA for all others, with a detailed protocol available at http://bartellab.wi.mit.edu/protocols.html. S. cerevisiae RPF and RNA-seq data were from GSE53268 and were derived from the same sample as the S. cerevisiae PAL-seq sample analysing cytoplasmically enriched RNA.

RPF and RNA-seq tags were mapped to the ORFs, as described previously35 (using the assemblies and transcript models used for PAL-seq), except reads overlapping the first 50 nucleotides of each ORF were disregarded. This was done to minimize a bias from ribosomes accumulating at or shortly after the start codon, which results from translation initiation events continuing in the face of cycloheximide-inhibited elongation20. Because of this bias, genes with shorter ORFs have artefactually higher translational efficiencies if all the bound ribosomes are considered (as in conventional polysome gradient analysis). This cycloheximide effect might have distorted the translational efficiency measurements in studies that calculated ribosome densities using polysome gradient fractionation followed by microarray analysis (including those reporting a positive correlation between ribosome density and poly(A)-tail length10,11), but could not have influenced the conclusions of our polysome-gradient experiment, because our analysis focused on intragenic comparisons (Fig. 4b). Translational efficiencies were considered only for genes exceeding a cutoff of 10 RPM (reads per million uniquely mapped reads) in the RNA-seq library. When calculating sequencing depth (the 'M' of RPM), all uniquely mapped reads that overlapped the mRNA primary or mature transcript were counted for all samples except the X. laevis samples; only the uniquely mapped

quantities, and these tests do not make assumptions about equal variance between groups. Mammalian cell lines were obtained from ATCC, and S2 cells were the same as in ref. 42 (that is, adapted to growth in serum-free media). The BY4741 strain was used for S. cerevisiae, 972 for S. pombe, Columbia for A. thaliana, and AB for zebrafish. All animal experiments were performed in accordance with a protocol approved by the MIT Committee on Animal Care.

Zebrafish injections. Zebrafish embryos were injected at the one-cell stage with 1 nl of 10 μM miRNA duplex (miR-155 or miR-132) or buffer alone using a PLI-100 Plus Pico-Injector. Duplexes were made by combining RNAs (IDT) corresponding to either miR-132 (5′-uaacagucuacagccauggucg-3′) and miR-132* (5′-accguggcauuagauuguuacu-3′) or miR-155 (5′-uuaaugcuaaucgugauaggggu-3′) and miR-155* (5′-accuaugcuguuagcauuaauc-3′) in annealing buffer (30 mM Tris-HCl, pH 7.5, 100 mM NaCl, 0.1 mM EDTA), heating to 90 °C for 1 min, and slow cooling to room temperature over several hours. Injected embryos were incubated in E3 buffer at 28 °C until time of sample collection.

Predicted miRNA targets. MicroRNA target genes were predicted using the reference transcript database used to assign zebrafish poly(A) tags. Each gene with a 3′ UTR that had at least one 7-nucleotide site matching the miRNA seed region32 was predicted to be a target of that miRNA. Genes that had no 6-nucleotide miRNA seed match anywhere within their transcript were classified as no-site genes, from which a set of no-site control genes was selected such that its 3′-UTR length distribution matched that of the predicted targets.

Calculation of the relationship between poly(A)-tail length and translational efficiency. For experiments in which zebrafish embryos were mock-injected or injected with miR-132 or miR-155, least-squares second-order polynomial regression was performed to determine the change in log2 translational efficiency for each change in log2 poly(A)-tail length. To prevent microRNA effects on translational efficiency and/or tail length from influencing any relationship, the regression analyses were performed after excluding genes for which the mRNAs contained a perfect match to either the seed (nucleotides 2–7 of the miRNA) of miR-430 (the predominant endogenous miRNA at 4 and 6 hpf) or the seed of the injected miRNA. These regression results were used to estimate the translational efficiency change attributable to tail-length change for each gene.

Poly(A)-tail measurements on RNA blots. Single-gene poly(A)-tail lengths were measured on RNA blots after directed RNase H cleavage of the interrogated mRNA. Standard methods43 were modified to enable higher resolution for shorter tails (<50 nucleotides), such as those found on yeast mRNAs. Total RNA (3–20 μg) was heat-denatured for 5 min at 65 °C in the presence or absence of a (dT)18 oligonucleotide (IDT, 33 pmol oligonucleotide μg−1 total RNA), and in the presence of 25 pmol of a DNA oligonucleotide (or gapmer oligonucleotide, which had 16 DNA nucleotides flanked on each side by five 2′-O-methyl RNA nucleotides) that was complementary to a segment within the 3′-terminal region of the interrogated mRNA. After snap-cooling on ice, the RNA was treated with RNase H (Invitrogen) for 30 min at 37 °C in a 20 μl reaction according to the manufacturer's instructions. The reaction was stopped by addition of gel loading buffer (95% formamide, 18 mM EDTA, 0.025% SDS, dyes) and then analysed on RNA blots resembling those used for small-RNA detection44 (detailed RNA blot protocol available at http://bartellab.wi.mit.edu/protocols.html). Briefly, after separation of the RNA on a denaturing polyacrylamide gel and transfer onto a Hybond-NX membrane (GE Healthcare), the blot was treated with EDC (N-(3-dimethylaminopropyl)-N′-ethylcarbodiimide; Sigma-Aldrich), which crosslinked the 5′ phosphate of the 3′-terminal RNase H cleavage product to the membrane45. The blot was then hybridized to a probe designed to pair to the region spanning the RNase H cleavage site and the poly(A) site. Comparison of these 3′-terminal fragments with and without poly(A) tails revealed the length of the tails.

39. Jan, C. H., Friedman, R. C., Ruby, J. G. & Bartel, D. P. Formation, regulation and evolution of Caenorhabditis elegans 3′UTRs. Nature 469, 97–101 (2011).
40. Ulitsky, I. et al. Extensive alternative polyadenylation during zebrafish development. Genome Res. 22, 2054–2066 (2012).
41. Schürer, H., Lang, K., Schuster, J. & Morl, M.
A universal method to produce reads overlapping ORFs were counted for X. laevis. For the analysis of miRNA in vitro transcripts with homogeneous 39 ends. Nucleic Acids Res. 30, e56 (2002). effects, only genes exceeding a cutoff of 10 RPM in the mock-injected RNA-seq and 42. Ruby, J. G., Jan, C. H. & Bartel, D. P. Intronic microRNA precursors that bypass RPF libraries, and $ 50 PAL-seq tags in the mock-injected and miRNA-injected Drosha processing. Nature 448, 83–86 (2007). samples were considered. 43. Salle´s, F. J., Richards, W. G. & Strickland, S. Assaying the polyadenylation state of Statistics, reagents and animal models. All statistical tests were two-sided unless mRNAs. Methods 17, 38–45 (1999). indicated otherwise. No power testing was done to anticipate the sample size needed 44. Lau, N. C., Lim, L. P., Weinstein, E. G. & Bartel, D. P. An abundant class of tiny RNAs for adequate statistical power. No randomization or blinding was used for miRNA with probable regulatory roles in Caenorhabditis elegans. Science 294, 858–862 (2001). injection experiments. Features of mRNAs (for example, poly(A)-tail length, 45. Pall, G. S., Codony-Servat, C., Byrne, J., Ritchie, L. & Hamilton, A. Carbodiimide- mRNA length, expression level, and so on) were not normally distributed, nor mediated cross-linking of RNA to nylon membranes improves the detection were changes in expression due to miRNA-mediated repression. Therefore, non- of siRNA, miRNA and piRNA by northern blot. Nucleic Acids Res. 35, e60 parametric measures or tests were used when making comparisons involving such (2007). 118

©2014 Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE

46. Meijer, H. A. et al. A novel method for poly(A) fractionation reveals a large population of mRNAs with a short poly(A) tail in mammalian cells. Nucleic Acids Res. 35, e132 (2007).
47. Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).
48. Holstege, F. C. et al. Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717–728 (1998).
49. Wang, Y. et al. Precision and functional specificity in mRNA decay. Proc. Natl Acad. Sci. USA 99, 5860–5865 (2002).
50. Grigull, J., Mnaimneh, S., Pootoolal, J., Robinson, M. D. & Hughes, T. R. Genome-wide analysis of mRNA stability using transcription inhibitors and microarrays reveals posttranscriptional control of ribosome biogenesis factors. Mol. Cell. Biol. 24, 5534–5547 (2004).
51. Shalem, O. et al. Transient transcriptional responses to stress are generated by opposing effects of mRNA production and degradation. Mol. Syst. Biol. 4, 223 (2008).
52. Munchel, S. E., Shultzaberger, R. K., Takizawa, N. & Weis, K. Dynamic profiling of mRNA turnover reveals gene-specific and system-wide regulation of mRNA decay. Mol. Biol. Cell 22, 2787–2795 (2011).
53. Sun, M. et al. Comparative dynamic transcriptome analysis (cDTA) reveals mutual feedback between mRNA synthesis and degradation. Genome Res. 22, 1350–1359 (2012).
54. Haimovich, G. et al. Gene expression is circular: factors for mRNA degradation also foster mRNA synthesis. Cell 153, 1000–1011 (2013).
55. Sun, M. et al. Global analysis of eukaryotic mRNA degradation reveals Xrn1-dependent buffering of transcript levels. Mol. Cell 52, 52–62 (2013).
56. Larsson, E., Sander, C. & Marks, D. mRNA turnover rate limits siRNA and microRNA efficacy. Mol. Syst. Biol. 6, 433 (2010).
57. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
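The target-prediction rule in the Methods (a gene is a predicted target if its 3′ UTR has at least one 7-nucleotide seed-matched site, and a no-site gene if its transcript lacks any 6-nucleotide seed match) can be sketched roughly as follows. This is only an illustration, not the pipeline used in the study: the function names and the example miRNA sequence are hypothetical, and treating a 7-nucleotide site as either a match to miRNA nucleotides 2–8 or a match to nucleotides 2–7 followed by an A is an assumption about which site types were counted.

```python
# Hypothetical sketch of seed-based target classification (RNA alphabet, 5'->3').
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna):
    """Return the 6-nt site and the two assumed 7-nt site types for a miRNA."""
    six_mer = reverse_complement(mirna[1:7])    # pairs to miRNA nucleotides 2-7
    seven_m8 = reverse_complement(mirna[1:8])   # pairs to miRNA nucleotides 2-8
    seven_a1 = six_mer + "A"                    # 2-7 match followed by an A (assumption)
    return six_mer, (seven_m8, seven_a1)


def classify_gene(mirna, utr3, transcript):
    """Classify a gene as 'target', 'no-site', or 'unclassified'."""
    six_mer, seven_mers = seed_sites(mirna)
    if any(site in utr3 for site in seven_mers):
        return "target"
    if six_mer not in transcript:
        return "no-site"
    return "unclassified"  # has a 6-nt match somewhere but no 7-nt 3'-UTR site


# Made-up guide sequence, used only to exercise the functions.
EXAMPLE_MIRNA = "UACGUACGGAUCCUAGCUAGCU"
print(classify_gene(EXAMPLE_MIRNA, "AAACGUACGUAAA", "GGGAAACGUACGUAAA"))
```

Genes in the third category are neither predicted targets nor eligible as no-site controls, which is why the sketch returns a separate label for them.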


Extended Data Figure 1 | Development and characterization of the PAL-seq method. a, Optimization of the primer-extension reaction. A 5′-radiolabelled primer was annealed to a single-stranded DNA template containing a (dA)25 tract immediately upstream of the primer-binding site (top schematic). Two templates (I and II), which differed at the segment immediately 5′ of the poly(dA) tract, were used in the experiments shown. Primer extension was performed with either Klenow fragment (K, NEB), Klenow fragment lacking 3′-to-5′ exonuclease activity (K−, NEB), or T4 DNA polymerase (T4, NEB). Reactions contained the recommended buffer and enzyme concentrations and a 50:1 molar mixture of dTTP:biotin-16-dUTP at the dTTP concentrations indicated. In one experiment (centre left), the dTTP concentration was kept constant, and the concentration of the primer-template duplex was varied instead. Reactions were incubated for 5 min, unless stated otherwise (bottom two panels), at the indicated temperature (temp; room temperature, r.t.), then stopped and in most cases supplemented with a gel-mobility standard (St), which was a 32P-labelled synthetic oligonucleotide that had four extra dT residues appended to the intended full-length primer-extension product (P). Products were resolved on denaturing polyacrylamide gels, alongside a size ladder (L), which was a mixture of 32P-labelled oligonucleotides that differed from the full-length primer-extension product by −1, 0, +1, +2, +3 and +4 dTs (three of these are indicated as +1, +2, +3), and visualized using a phosphorimager. Full-length extension without additional untemplated nucleotides was favoured by using Klenow fragment at 37 °C with very low dTTP concentrations (upper right panel and bottom two panels). Under these conditions the product did not change with prolonged reaction times (bottom). b, Poly(A)-tail lengths of the synthetic standards. Poly(A) tails >10 nucleotides retained some length heterogeneity generated during their enzymatic synthesis. To determine the actual poly(A) lengths of the barcode-poly(A) RNAs used to generate the standards, each RNA was 33P-labelled at its 5′ terminus and analysed on denaturing polyacrylamide gels under conditions that enabled single-nucleotide resolution. The values to the right of each panel indicate the modes and approximate ranges of the poly(A) tail lengths (after accounting for the 10-nucleotide barcodes). Also shown are marker lanes with 33P-labelled Century Plus ladder (C, Ambion), 33P-labelled Decade ladder (D, Ambion) and a partial base-hydrolysis ladder of the labelled barcode-poly(A) RNA used to make the 324-nucleotide standard of mix 2 (OH). c, The relative PAL-seq yield of each poly(A)-length standard. For each standard in the indicated mix, the yield of poly(A) tags relative to that of the A10 standard is plotted, after normalizing to the starting ratio determined from analysis of 5′-labelled mix on a denaturing polyacrylamide gel. Box plots show the distribution of yields for 32 PAL-seq libraries (line, median; box, 25th and 75th percentiles; whiskers, 10th and 90th percentiles).


Extended Data Figure 2 | Validation of PAL-seq performance. a, Evidence against non-specific RNA degradation. Plotted are nucleotide identities at the positions immediately upstream of poly(A) tags that both mapped uniquely to the genome (or standards) and ranged from 22–30 nucleotides in length (a range chosen to be long enough to enable mostly unique mapping to the genome, yet short enough to include enough 5′ adaptor nucleotides in a 36-nucleotide read to clearly identify the 5′ end of the tag). Frequencies were normalized to the aggregate nucleotide composition of positions 23–31 in either uniquely genome- or standard-mapping tags that extended the full length of the reads (36 nucleotides). Because RNase T1 cuts after Gs, the nucleotide preceding each 22–30-nucleotide tag was expected to be G, unless the mRNA had been cut for some other reason. The high frequency of G indicated that most mRNA fragments had not been cut for other reasons, which also implied that for these samples the poly(A) tails had also remained intact. We are unable to explain the high signal for an upstream U or C in some samples. Nonetheless, the frequency of an upstream A was low, which indicated that there had been little cleavage after As, again implying that the poly(A) tails had remained intact. In the A. thaliana leaf analysis, for which the raw reads had the first base removed, estimation of RNA integrity was performed with length ranges shortened by one nucleotide (for example, informative poly(A) tags were 21–29 nucleotides long). b, Consistent results from similar samples or biological replicates. Plotted are the relationships between average poly(A)-tail lengths generated using HeLa total RNA or RNA from a cytoplasmically enriched lysate (left), between average poly(A)-tail lengths generated using S. cerevisiae total RNA or RNA from a cytoplasmically enriched lysate (sample 1 and 2, respectively; middle), and between average poly(A)-tail lengths generated using cytoplasmically enriched lysates from two different 3T3 cell lines (right). Although the 3T3 lines were each engineered to express a miRNA (either miR-1 or miR-155), the miRNA was not induced in the cells used for this comparison. NM_001007026 fell outside the plot for HeLa, and YDL080C fell outside the plot for yeast.


Extended Data Figure 3 | Discrepancies between the results of PAL-seq and those of previous methods. a, Comparison of S. cerevisiae poly(A)-tail lengths measured by PAL-seq on total RNA to the previous results from PASTA analysis10. Plotted are mean poly(A)-tail lengths measured by PAL-seq for genes previously classified as having either short or long tails (PASTA-short and PASTA-long, respectively)10. The vertical dashed lines indicate the mean for each group as measured by PAL-seq. b, Comparison between PAL-seq measurements and either PASTA-derived poly(A)-tail ranks in fission yeast11 (left), or results of a related method reporting log ratios of short- and long-tail fractions in actively dividing 3T3 cells46 (right). c, Schematic of tail-length measurements using RNA blots. A DNA oligonucleotide or a gapmer (chimaeric oligonucleotide with DNA flanked by 2′-O-methyl-RNA) was designed to pair near the 3′ end of the mRNA. This oligonucleotide directed RNase H cleavage, thereby generating 3′-terminal mRNA fragments with lengths suitable for high-resolution analysis on RNA blots. Some of each sample was also incubated with oligo(dT), which directed RNase H removal of most of the poly(A) tail. Cleavage fragments were resolved on RNA blots and detected by probing for the inter-oligo region of the mRNA. The average poly(A)-tail length was calculated as the difference in the average sizes of the oligo(dT)-minus and oligo(dT)-plus fragments, plus the average number of residual adenosine residues that remained because of incomplete digestion of the poly(A) tail (residual As). For each reaction guided by a gene-specific DNA oligo, the average number of residual adenosines was estimated as half the difference between the known length of the inter-oligo region and the observed length of the oligo(dT)-plus fragment. For the two reactions guided by a gene-specific gapmer (RPL28 and RPS9B), the inter-oligo region extended through the residues pairing to one of the 2′-O-methyl-RNA segments, and cleavage was assumed to occur across from the most poly(A)-proximal DNA residue. Thus, the average number of residual adenosines was estimated as the difference between the length of the inter-oligo region and the observed length of the oligo(dT)-plus fragment. d, RNA blots used to measure poly(A)-tail lengths, as described in panel c, with the length information determined by PAL-seq (on total RNA) and PASTA10 indicated below each blot for comparison. For each lane, the range of high signal predicted based on PAL-seq results (Extended Data Fig. 4) is shown as a line next to the blot (with and without oligo(dT), red and blue, respectively). These predicted sizes took into account the residual nucleotides flanking the inter-oligo region, using the migration of the oligo(dT)-plus fragment to estimate the residual nucleotides on one or both ends as described in panel c. Genes chosen for analysis were required to be adequately expressed and to have a relatively homogeneous cleavage and poly(A) site, as determined by 3P-seq (data not shown). Nonetheless, some genes, such as RPL28, had frequently used alternative cleavage and poly(A) sites, as reflected by the two ranges marked in red. A preference was also given to ribosomal protein genes and genes with contradictory poly(A)-tail lengths when comparing the results of PAL-seq and PASTA.
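The tail-length arithmetic described in panel c can be written out as a short worked example. The sketch below is illustrative only: the function names and the fragment sizes are hypothetical, and it assumes the oligo(dT)-plus fragment migrates longer than the inter-oligo region, so the difference is taken as observed length minus known length.

```python
# Hypothetical worked example of the RNA-blot tail-length calculation (panel c).

def residual_adenosines(inter_oligo_len, plus_fragment_len, gapmer=False):
    """Estimate the residual As left on the oligo(dT)-plus fragment.

    DNA-oligo reactions: half the difference (residual nucleotides on both ends).
    Gapmer reactions: the full difference (residual nucleotides on the poly(A) end only).
    """
    difference = plus_fragment_len - inter_oligo_len
    return difference if gapmer else difference / 2


def mean_tail_length(minus_fragment_len, plus_fragment_len, inter_oligo_len, gapmer=False):
    """Average tail length = (minus - plus) fragment size + residual As."""
    residual = residual_adenosines(inter_oligo_len, plus_fragment_len, gapmer)
    return (minus_fragment_len - plus_fragment_len) + residual


# Made-up sizes in nucleotides: 140-nt oligo(dT)-minus fragment,
# 106-nt oligo(dT)-plus fragment, 100-nt inter-oligo region.
print(mean_tail_length(140, 106, 100))               # DNA-oligo reaction
print(mean_tail_length(140, 106, 100, gapmer=True))  # gapmer reaction
```

With these made-up sizes the DNA-oligo estimate is 34 + 3 = 37 nucleotides and the gapmer estimate is 34 + 6 = 40 nucleotides.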


Extended Data Figure 4 | The signal distributions for the RNA blots (Extended Data Fig. 3d) compared with those predicted using PAL-seq. Predicted traces from PAL-seq accounted for the estimated number of residual nucleotides flanking the inter-oligo region after RNase H cleavage, as described (Extended Data Fig. 3c). The offsets added to account for these residual nucleotides are indicated below each plot. The horizontal dashed lines above each plot indicate the range of the signal determined by visual inspection of the RNA blots in Extended Data Fig. 3d (oligo(dT)-plus and minus, red and blue, respectively). Vertical dashed lines indicate the migration of Decade markers (Ambion). The vertical axes are in arbitrary units (a.u.). The range of the high signal predicted based on PAL-seq data (signal exceeding 33% of the maximum) was determined using these plots and shown on Extended Data Fig. 3d as vertical lines next to the RNA blots. For some genes, poly(A)-site heterogeneity caused the signal exceeding 33% to map to noncontiguous segments.
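The 33%-of-maximum rule for the predicted high-signal range can be illustrated with a small sketch. The function name and the example trace are hypothetical; the code simply reports every run of positions whose signal exceeds the cutoff, which also shows how a trace can yield noncontiguous segments.

```python
# Hypothetical sketch: find the ranges of a predicted trace that exceed
# a fraction (here 33%) of its maximum signal.

def high_signal_ranges(signal, threshold=0.33):
    """Return (start, end) index pairs (inclusive) where signal > threshold * max."""
    cutoff = threshold * max(signal)
    ranges = []
    start = None
    for index, value in enumerate(signal):
        if value > cutoff and start is None:
            start = index                      # a high-signal run begins
        elif value <= cutoff and start is not None:
            ranges.append((start, index - 1))  # the run ended at the previous position
            start = None
    if start is not None:                      # run extends to the end of the trace
        ranges.append((start, len(signal) - 1))
    return ranges


# Made-up trace in arbitrary units, indexed by fragment length offset.
trace = [0, 1, 5, 10, 4, 1, 0, 6, 7, 2]
print(high_signal_ranges(trace))  # two separate segments for this trace
```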


Extended Data Figure 5 | Relationship between poly(A)-tail length and changes in gene expression during zebrafish embryogenesis. Changes in gene expression between the indicated embryonic stages, as measured by RNA-seq, are plotted in relation to the mean poly(A)-tail length at the latter stage.


Extended Data Figure 6 | Poly(A)-tail lengths of tandem alternative 3′-UTR isoforms. a, Comparison of average poly(A)-tail lengths for proximal (short) and distal (long) isoforms in the indicated cell lines. Results are plotted for isoforms that were each represented by ≥25 poly(A) tags and had alternative poly(A) sites ≥500 nucleotides apart. For genes with more than one isoform pair meeting these criteria, the pair with poly(A) sites farthest apart was selected. Points for NM_001007026 and NM_003913 fell outside the boundaries of the plot for HeLa. P values, χ2 test evaluating whether the relationship between isoform length and tail length differs from that expected by chance. b, Average poly(A)-tail lengths for proximal and distal 3′-UTR isoforms in 2 hpf zebrafish embryos, comparing results for genes that either contain (red circles), or do not contain (open circles), a CPE anywhere within the region unique to the distal isoform. A CPE was defined as U12, permitting a single non-U anywhere within the 12 nucleotides19. For a CPE found in the unique region to be counted as present, a canonical poly(A) signal (AAUAAA) also had to exist in the last 30 nucleotides of the distal isoform19,24. For each gene with a CPE within the region unique to the distal isoform, five genes with unique distal regions of comparable length (±10%) but lacking a CPE are also shown. Poly(A) tags from three zebrafish 2 hpf PAL-seq libraries (mock-, miR-132–, and miR-155–injected) were combined before calculating average tail length for each isoform. Tandem isoform pairs with a target site for miR-132 or miR-155 in the region unique to the distal isoform were not considered. Only genes for which both tandem isoforms had ≥25 poly(A) tags, and for which the alternative poly(A) sites were 50–500 nucleotides apart, are plotted. For genes for which isoform choice affected inclusion of a CPE, the isoform pair representing that gene was chosen as the two isoforms with the most 5′-proximal poly(A) sites that flanked a CPE and satisfied the above criteria. For the pool of genes from which controls were chosen, two adjacent isoforms were picked randomly. P value, Fisher’s exact test, comparing genes with a CPE in the unique region to controls.
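The CPE criterion in panel b (a 12-nucleotide window of Us with at most one non-U, counted only if AAUAAA lies within the last 30 nucleotides of the distal isoform) can be sketched as below. This is a minimal illustration under stated assumptions: the function name, argument names and example sequences are hypothetical, and sequences are assumed to be upper-case RNA strings written 5′ to 3′.

```python
# Hypothetical sketch of the CPE definition used for panel b.

def has_cpe(unique_region, distal_isoform, window=12, pas="AAUAAA", pas_window=30):
    """Return True if the unique region has a CPE that counts as present.

    A CPE is a `window`-nt stretch with at most one non-U, and it is counted
    only if the poly(A) signal occurs in the last `pas_window` nt of the
    distal isoform.
    """
    if pas not in distal_isoform[-pas_window:]:
        return False
    for start in range(len(unique_region) - window + 1):
        if unique_region[start:start + window].count("U") >= window - 1:
            return True
    return False


# Made-up sequences for illustration only.
region_with_cpe = "ACGUUUUUUCUUUUUGAC"       # contains UUUUUUCUUUUU (one non-U)
region_without_cpe = "ACGUUUUCCUUUUUUGAC"    # best 12-nt window has two non-U
distal = "GGGGCCCC" + region_with_cpe + "CCAAUAAAGCAUCGAUCG"
print(has_cpe(region_with_cpe, distal))
```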


Extended Data Figure 7 | Relationship between poly(A)-tail length and translational efficiency, classifying genes based on CPE content, tail length or translational efficiency. a, The same data as in Fig. 3a, except genes were classified based on whether their 3′ UTR contained no CPE (grey), one CPE (blue), or two or more non-overlapping CPEs (red). b, Evidence that the more restricted tail-length range observed at gastrulation did not substantially impact the coupling between tail length and translational efficiency. The zebrafish 4 hpf data from Fig. 3a were sampled with replacement so as to have the same distribution of tail lengths observed at 6 hpf (left). Likewise, the X. laevis stage 9 data were sampled with replacement so as to have the same distribution of tail lengths observed at stage 12–12.5 (right). c, Box plot as in Fig. 4b for the same set of genes, with slopes calculated omitting data from the fraction without bound ribosomes. d, Box plots as in Fig. 4b, creating four equal bins of genes based on either overall mean poly(A)-tail length (left) or translational efficiency (right). The same slopes were used as in Fig. 4b, but considering only genes with a determined translational efficiency value and ≥100 poly(A) tags in the actively dividing 3T3 sample.
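One way to picture the resampling in panel b is sketched below. The legend states only that the earlier-stage data were sampled with replacement to match the later-stage tail-length distribution; the binning scheme, bin width, function name and example values here are all assumptions made for illustration, not the procedure used in the study.

```python
# Hypothetical sketch: resample (tail_length, translational_efficiency) pairs
# with replacement so their tail-length distribution matches a target sample.
import random
from collections import defaultdict


def resample_to_match(source, target_lengths, bin_width=5, seed=0):
    """For each target tail length, draw one source gene from the same length bin."""
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for tail, efficiency in source:
        by_bin[int(tail // bin_width)].append((tail, efficiency))
    sampled = []
    for tail in target_lengths:
        candidates = by_bin.get(int(tail // bin_width))
        if candidates:  # target bins with no source genes are skipped
            sampled.append(rng.choice(candidates))
    return sampled


# Made-up values: earlier-stage genes and later-stage tail lengths.
early = [(12, 0.1), (13, 0.2), (31, 0.5), (48, 0.9)]
late_lengths = [11, 14, 12, 33]
resampled = resample_to_match(early, late_lengths)
print(len(resampled))
```

Because sampling is with replacement, the same earlier-stage gene can be drawn more than once, and genes whose tail lengths fall outside the later-stage range (such as the 48-nucleotide example) are never drawn.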


Extended Data Figure 8 | The influence of miRNAs on ribosomes, mRNA abundance and tails in the early zebrafish embryo. a, The relationship between changes in tail length at 4 hpf (as determined by PAL-seq) and changes in mRNA abundance at 6 hpf (as determined by RNA-seq), after injecting miR-155 (left) or miR-132 (right). Changes observed between miRNA- and mock-injected embryos are plotted for predicted miRNA target genes (red, genes with ≥1 cognate miRNA site in their 3′ UTR) and control genes (grey, genes that have no cognate miRNA site yet resemble the targets with respect to 3′-UTR length). Lines indicate mean changes for the respective gene sets; statistically significant differences between the gene sets for each of the two parameters are indicated (*P ≤ 0.05; **P < 10−4, one-tailed Kolmogorov–Smirnov test). Because injected miRNAs partially inhibited miR-430–mediated repression, genes with a site complementary to nucleotides 2–7 of miR-430 were not considered. All data were normalized to the median changes observed for the controls. b, The relationship between changes in ribosome-protected fragments (RPFs) and changes in mRNA levels (top), and between changes in RPFs and changes in tail lengths (bottom) after injecting miR-132. At 2, 4 and 6 hpf, embryos were analysed using ribosome profiling, RNA-seq and PAL-seq. Plots are as in Fig. 5a.


Extended Data Table 1 | Relationships between poly(A)-tail lengths of orthologous genes in samples from different species (or the same gene, when the samples are from the same species), and relationships between poly(A)-tail length and the indicated mRNA features

When calculating splice-site density, a pseudocount of one was added to the number of splice sites in an mRNA. For the comparisons between poly(A)-tail length and expression level, mRNA abundances were measured by RNA-seq; data for HeLa were from ref. 35. For the relationship between poly(A)-tail length and mRNA nuclear-to-cytoplasmic abundance ratio, measurements of nuclear and cytoplasmic mRNA abundance in HeLa cells were from ref. 47. mRNA half-lives for S. cerevisiae, HeLa and 3T3 mRNAs were from refs 48–55, ref. 56 and ref. 31, respectively.


Extended Data Table 2 | Gene ontology (GO) categories enriched in shorter- or longer-tail genes, as determined by gene set enrichment analysis (GSEA)57

For each sample, GSEA was performed on genes ranked based on their mean poly(A)-tail length. The normalized enrichment score (NES) and false-discovery rate (Q value) are indicated in parentheses next to each enriched GO category. A negative NES indicates a category enriched in the shorter-tail genes, whereas a positive value indicates a category enriched in the longer-tail genes. Enriched GO categories were manually curated to eliminate redundant or uninformative categories.


GRACE CHEN
57 Hinckley St, Apt 3 | Somerville, MA 02145
732.742.1331 | [email protected]

EDUCATION

Massachusetts Institute of Technology | 2011–present
Ph.D. – Biology | Cambridge, MA
• Thesis title: Restoring and enhancing Argonaute2-catalyzed cleavage activity
• Research advisor: Dr. David P. Bartel

Rutgers University | 2006–2010
B.A. – Molecular Biology & Biochemistry, English | New Brunswick, NJ
• Magna cum laude
• Thesis title: Chromatin regulation during transcription and DNA damage repair
• Research advisor: Dr. Thomas Kusch
• Notable awards: Highest Honors in Molecular Biology & Biochemistry, Phi Beta Kappa academic honor society

RESEARCH EXPERIENCE

Graduate research | 2012–present
Laboratory of Dr. David P. Bartel, Whitehead Institute, MIT | Cambridge, MA
• Identified and restored two key amino acid substitutions that largely abolished the catalytic cleavage activity of zebrafish Argonaute2, and discovered a critical seed mismatch that could enhance Argonaute2-catalyzed cleavage activity, resulting in a first-author publication in Molecular Cell
• Performed key zebrafish injection experiments that resulted in a publication in Nature
• Developed a novel binding and cleavage assay and protein lysis protocols for early, pre-gastrulation zebrafish embryos
• Performed protein and RNA biochemistry assays to investigate both in vivo and in vitro Argonaute2-catalyzed cleavage activity
• Experienced in molecular biology, protein and RNA biochemistry, enzyme kinetics, radioactive labeling, zebrafish models, mammalian cell culture, and yeast cultures
• Generated small RNA libraries for small RNA sequencing
• Mentored three rotation graduate students and trained incoming lab members

MIT/Vertex internship | September 2017–January 2018
Group of Dr. Fatih Ozsolak, Human Biology, Vertex Pharmaceuticals Incorporated | Boston, MA
• Designed RNA FISH experiments to test for the presence of RNA foci accumulation in human ALS lymphoblast lines
• Designed an immunofluorescence assay to test for the presence of dipeptide repeats in human ALS lymphoblast lines
• Worked independently and with members of the exploratory human biology group

Research assistant | 2010–2011
Laboratory of Dr. Andrew K. Vershon, Waksman Institute, Rutgers University | Piscataway, NJ
• Investigated the role of a non-coding RNA in the regulation of the IME1 gene
• Designed, constructed, and assayed gene knock-out mutants to assess functionality
• Experienced in yeast genetics

Undergraduate research | 2007–2010
Laboratory of Dr. Thomas Kusch, Rutgers University | New Brunswick, NJ
• Studied chromatin regulation by the dSet1 and dLID complexes in transcription and DNA damage repair
• Cloned, expressed, and purified recombinant proteins for biochemical assays and antibody production
• Constructed D. melanogaster cell lines for mass spectrometry
• Performed biochemical assays to study protein-protein interactions, histone methyl- and acetyl-transferase activity, and histone peptide binding interactions
• Performed immunohistochemistry on cultured S2 cells and D. melanogaster ovaries

LEADERSHIP EXPERIENCE

Camp Casco | 2015–present
Grants manager, Overnight camp activities director, Activities counselor | Cambridge and Becket, MA
• Wrote and edited grants to fund Camp Casco, a free, week-long, overnight summer camp for local pediatric cancer patients and survivors
• Awarded >$58,000 in numerous grants – Expect Miracles Foundation, New England BioLabs, b. good family foundation, Ronald McDonald House Charities, DCU for Kids, KOA Cancer Care Camps
• Led a team of 8 counselors in planning, organizing, and leading creative, fun, and inclusive themed camp activities for 30 children with cancer (e.g. real life Mario Kart racing, real life Angry Birds, snowball fights, fossil digs, Lights Out Lunch)
• Worked within a 4-person leadership team and a 26-person dream team of volunteer counselors and medical team to give 30 children a magical week of fun every summer
• Volunteer at year-round family socials to offer continued support throughout the year for our campers and their entire families

TEACHING EXPERIENCE

Department of Biology, MIT | January–May 2015
Teaching assistant for Cell biology (7.06)
• Instructed and provided weekly office hours for 50 undergraduate students from varying academic backgrounds on the fundamental concepts of cell biology
• Prepared and developed content for weekly recitations, problem sets, and exams

Department of Biology, MIT | January–May 2013
Teaching assistant for Quantitative biology (7.57)
• Instructed 35 biology, MD-PhD, and microbiology graduate students on the fundamental concepts and tools of quantitative approaches to molecular and cellular biology using MATLAB
• Prepared and developed content for weekly recitations, problem sets, and exams

Waksman Student Scholars Program, Waksman Institute, Rutgers University | 2010–2011
Staff and instructor
• Taught and supervised high school students and their teachers from 44 different high schools across New Jersey in performing basic molecular biology experiments – bacterial cultures, DNA extraction, PCR, and restriction digest
• Analyzed and corrected responses from over 900 students to the online bioinformatics portion of the course on the analysis of DNA sequencing data
• Organized and distributed molecular biology reagents to 32 high schools in NJ, PA, MD, TN, and TX
• Performed a longitudinal study on the impact of the scholars program on alumni’s choices of college, undergraduate research, and STEM careers

KEY ACHIEVEMENTS

Camp Casco grants
• 2018: Northwestern Mutual, $1,500
• 2017: Miracle Makers Leadership Council, Expect Miracles Foundation, $10,000
• 2017: KOA Cancer Care Camps, $10,000
• 2017: DCU for Kids, $5,000
• 2017: Ronald McDonald House Charities, $6,000
• 2016: Miracle Makers Leadership Council, Expect Miracles Foundation, $10,000
• 2016: KOA Cancer Care Camps, $9,150
• 2016: New England BioLabs, $2,000
• 2016: b.good family foundation, $5,000

Research awards
• 2016: Best PhD seminar, 11th Microsymposium on Small RNA Biology, IMBA
• 2010: magna cum laude with Highest Honors in Molecular Biology & Biochemistry, Rutgers University
• 2010: First place, MBB Undergraduate Poster Forum, Rutgers University
• 2009: Division of Life Sciences Summer Undergraduate Research Fellowship, Rutgers University
• 2008: Aresty Undergraduate Research Fellowship, Rutgers University

PRESENTATIONS

“The loss of RNAi but not pre-miR-451 processing in zebrafish.” 11th Microsymposium on small RNA biology. May 2, 2016. IMBA. Vienna, Austria

PUBLICATIONS

Chen, G.R., H. Sive, D.P. Bartel. 2017. A seed mismatch enhances Argonaute2-catalyzed cleavage and partially rescues severely impaired cleavage found in fish. Mol Cell 68:1095-1107.

Subtelny, A.O., S.W. Eichhorn, G.R. Chen, H. Sive, and D.P. Bartel. 2014. Poly(A)-tail profiling reveals an embryonic switch in translational control. Nature 508:66-71.

Floyd, S.R., M.E. Pacold, Q. Huang, S.M. Clarke, F.C. Lam, I.G. Cannell, B.D. Bryson, J. Rameseder, M.J. Lee, E.J. Blake, A. Fydrych, R. Ho, B.A. Greenberger, G.R. Chen, A. Maffa, A.M. Del Rosario, D.E. Root, A.E. Carpenter, W.C. Hahn, D.M. Sabatini, C.C. Chen, F.M. White, J.E. Bradner, M.B. Yaffe. 2013. The bromodomain protein Brd4 insulates chromatin from DNA damage signalling. Nature 498: 246-50.

HOBBIES

• Choking out my friends on a nightly basis at my Brazilian jiu-jitsu gym
• Making elaborate costumes every year for each themed session of Camp Casco (parrot, dragon, Ms. Frizzle, Sully, penguin, octopus, hammerhead shark)
• Passionately cheering on F.C. Barcelona every weekend (and some week days)

REFERENCES

David P. Bartel, Ph.D.
Whitehead Institute for Biomedical Research/HHMI
Massachusetts Institute of Technology, Department of Biology
Email: [email protected]

Frank Solomon, Ph.D.
Koch Institute for Integrative Cancer Research
Massachusetts Institute of Technology, Department of Biology
Email: [email protected]

Erin Fletcher Stern, S.M., C.E.O.
Camp Casco
Email: [email protected]
