6.047/6.878 Lecture 12: Small RNA

Guest Lecture by David Bartel (MIT/Whitehead/HHMI). Scribed by Boyang Zhao (2011). September 10, 2012

Contents

1 Introduction
1.1 ncRNA classifications
1.2 Small ncRNA
1.3 Long ncRNA

2 RNA Interference
2.1 History of discovery
2.2 Biogenesis pathways
2.3 Functions and silencing mechanism


List of Figures

1 siRNA and miRNA biogenesis pathways
2 Protein and mRNA changes following miR-223 loss

1 Introduction

Large-scale analyses of expressed sequence tags in the 1990s estimated that the human genome encodes a total of 35,000 – 100,000 genes. However, the complete sequencing of the human genome surprisingly revealed that the number of protein-coding genes is likely to be ∼20,000 – 25,000 [12]. While these genes represent <2% of the total genome sequence, whole-genome and transcriptome sequencing and tiling-resolution genomic microarrays suggest that more than 90% of the genome is actively transcribed [8], largely as non-protein-coding RNAs (ncRNAs). Although these transcripts were initially speculated to be non-functional transcriptional noise inherent in the transcription machinery, there is growing evidence that ncRNAs play important roles in cellular processes and in the manifestation and progression of disease. These findings challenge the canonical view of RNA as serving only as the intermediate between DNA and protein.

1.1 ncRNA classifications

The increasing focus on ncRNAs in recent years, along with advances in sequencing technologies (e.g. Roche 454, Illumina/Solexa, and SOLiD; refer to [16] for more details on these methods), has led to an explosion in the identification of diverse groups of ncRNAs. Although there is not yet a consistent nomenclature, ncRNAs can be grouped into two major classes based on transcript size: small ncRNAs (<200 nucleotides) and long ncRNAs (lncRNAs) (≥200 nucleotides) (Table 1) [6, 8, 13, 20, 24]. Among these, the roles of the small ncRNAs microRNA (miRNA) and small interfering RNA (siRNA) in RNA silencing have been the most thoroughly documented. As such, much of the remainder of this chapter focuses on the roles of these small ncRNAs. But first, we briefly describe the other diverse classes of ncRNAs.
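The size-based grouping can be written as a one-line rule. The following is a minimal sketch; the function name `classify_by_size` and the constant name are hypothetical, and only the 200-nucleotide cut-off comes from the text.

```python
# Cut-off between small and long ncRNAs, as stated in the text (nucleotides).
SMALL_NCRNA_MAX_NT = 200


def classify_by_size(length_nt: int) -> str:
    """Assign a transcript to a size class using the 200-nt convention."""
    if length_nt <= 0:
        raise ValueError("transcript length must be a positive number of nucleotides")
    # Small ncRNAs are strictly shorter than 200 nt; everything else is long.
    return "small ncRNA" if length_nt < SMALL_NCRNA_MAX_NT else "long ncRNA"


print(classify_by_size(22))    # a miRNA-sized transcript
print(classify_by_size(2000))  # a lncRNA-sized transcript
```

Size alone is a coarse criterion: some housekeeping RNAs listed in Table 1 (for example snoRNAs and snRNAs) span the 200-nt boundary, so a rule like this would need class-specific annotation in practice.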

1.2 Small ncRNA

Over the past decades, a number of small non-coding RNA species have been well studied. All of these species are involved either in translation (transfer RNA (tRNA)) or in RNA modification and processing (small nucleolar RNA (snoRNA) and small nuclear RNA (snRNA)). In particular, snoRNAs (grouped into two broad classes, C/D box and H/ACA box, involved in methylation and pseudouridylation, respectively) are localized in the nucleolus and participate in rRNA processing and modification. Another group of small ncRNAs, the snRNAs, interact with proteins and with each other to form spliceosomes for RNA splicing. Remarkably, these snRNAs are themselves modified (by methylation and pseudouridylation) by another set of small ncRNAs, the small Cajal body-specific RNAs (scaRNAs), which are similar to snoRNAs (in sequence, structure, and function) and are localized in the Cajal body in the nucleus. In yet another class of small ncRNAs, guide RNAs (gRNAs) have been shown, predominantly in trypanosomatids, to be involved in RNA editing. Many other classes have also been proposed recently (see Table 1), although their functional roles remain to be determined. Perhaps the most widely studied ncRNAs in recent years are microRNAs (miRNAs), which are involved in the regulation of more than 60% of protein-coding genes [6]. Given the extensive work that has focused on RNAi and the wide range of RNAi-based applications that have emerged in the past years, the next section (RNA Interference) is devoted entirely to this topic.

1.3 Long ncRNA

Long ncRNAs (lncRNAs) make up the largest portion of ncRNAs [6]. However, the study of long ncRNAs has received emphasis only in recent years. As a result, the terminology for this family of ncRNAs is still in its infancy and is often inconsistent in the literature. This is complicated in part by cases where lncRNAs also serve as precursor transcripts for the generation of short RNAs. In light of this confusion, as discussed in the previous chapter, lncRNAs have been arbitrarily defined as ncRNAs 200 nt or longer (based on the cut-off in RNA purification protocols) and can be broadly categorized as sense, antisense, bidirectional, intronic, or intergenic [19]. For example, one particular class of lncRNAs, long intergenic ncRNAs (lincRNAs), is found exclusively in intergenic regions and possesses chromatin modifications indicative of active transcription (e.g. H3K4me3 at the transcriptional start site and H3K36me3 throughout the gene region) [8].

Despite the recent rise of interest in lncRNAs, the discovery of the first lncRNAs (XIST and H19), based on searching cDNA libraries, dates back to the 1980s and 1990s, before the discovery of miRNAs [3, 4]. Later studies demonstrated the association of lncRNAs with polycomb group proteins, suggesting potential roles for lncRNAs in epigenetic gene silencing and activation [19]. Another lncRNA, HOX Antisense Intergenic RNA (HOTAIR), was recently found to be highly upregulated in metastatic breast tumors [11]. The association of HOTAIR with the polycomb complex again supports a potential unified role for lncRNAs in chromatin remodeling and epigenetic regulation (acting either in cis, as for XIST and H19, or in trans, as for HOTAIR) and in disease etiology. Recent studies have also identified the lncRNA HULC and the pseudogene PTENP1 (a pseudogene is a transcript that resembles a real gene but contains mutations that prevent its translation into a functional protein), which may function as decoys that bind miRNAs and thereby reduce their overall effectiveness [18, 25]. Other potential roles of lncRNAs remain to be explored. Nevertheless, it is becoming clear that lncRNAs are unlikely to be the result of transcriptional noise and may instead serve critical roles in the control of cellular processes.

Table 1: ncRNA classifications (based on [6, 8, 13, 20, 24])

| Name | Abbreviation (size) | Function |
|---|---|---|
| Housekeeping RNAs | | |
| Ribosomal RNA | rRNA | translation |
| Transfer RNA | tRNA | translation |
| Small nucleolar RNA | snoRNA (∼60-220 nt) | rRNA modification |
| Small Cajal body-specific RNA | scaRNA | spliceosome modification |
| Small nuclear RNA | snRNA (∼60-300 nt) | RNA splicing |
| Guide RNA | gRNA | RNA editing |
| Small ncRNAs (<200 nt) | | |
| MicroRNA | miRNA (∼19-24 nt) | RNA silencing |
| Small interfering RNA | siRNA (∼21-22 nt) | RNA silencing |
| Piwi-interacting RNA | piRNA (∼26-31 nt) | Transposon silencing, epigenetic regulation |
| Tiny transcription initiation RNA | tiRNA (∼17-18 nt) | Transcriptional regulation? |
| Promoter-associated short RNA | PASR (∼22-200 nt) | unknown |
| Transcription start site antisense RNA | TSSa-RNA (∼20-90 nt) | Transcriptional maintenance? |
| Termini-associated short RNA | TASR | not clear |
| Antisense termini-associated short RNA | aTASR | not clear |
| Retrotransposon-derived RNA | RE-RNA | not clear |
| 3′ UTR-derived RNA | uaRNA | not clear |
| x-ncRNA | x-ncRNA | not clear |
| Small NF90-associated RNA | snaR | not clear |
| Unusually small RNA | usRNA | not clear |
| Vault RNA | vtRNA | not clear |
| Human Y RNA | hY RNA | not clear |
| Long ncRNAs (≥200 nt) | | |
| Large intergenic ncRNA | lincRNA | regulation |
| Transcribed ultraconserved regions | T-UCR | miRNA regulation? |
| Pseudogenes | none | miRNA regulation? |
| Promoter upstream transcripts | PROMPT | Transcriptional activation? |
| Telomeric repeat-containing RNA | TERRA | telomeric heterochromatin maintenance |
| GAA-repeat containing RNA | GRC-RNA | not clear |
| Enhancer RNA | eRNA | not clear |
| Long intronic ncRNA | none | not clear |
| Antisense RNA | aRNA | not clear |
| Promoter-associated long RNA | PALR | not clear |
| Stable excised intron RNA | none | not clear |
| Long stress-induced non-coding transcripts | LSINCT | not clear |

Note on Table 1 (scribe TODO): ncRNAs whose function is labeled "not clear" have not yet been searched for extensively in the literature; recent studies may suggest functional roles for them.

2 RNA Interference

RNA interference has been one of the most significant and exciting discoveries in recent history. The impact of this discovery is enormous with applications ranging from knockdown and loss-of-function studies to the generation of better animal models with conditional knockdown of desired gene(s) to large-scale RNAi-based screens to aid drug discovery.

2.1 History of discovery

The discovery of gene silencing dates back to 1990, when Napoli and Jorgensen demonstrated the down-regulation of chalcone synthase following the introduction of an exogenous transgene in plants [17]. Similar suppression was subsequently observed in other systems [10, 22]. In a separate, unrelated line of work at the time, Lee et al. identified in a genetic screen that the endogenous gene lin-4 expresses a non-protein-coding product that is complementary to the lin-14 gene and controls the timing of larval development (from the first to the second larval stage) in C. elegans [15]. We now know this as the first miRNA to be discovered. In 2000, another miRNA, let-7, was discovered in the same organism and was found to promote the late-larval to adult transition [21].

The seminal work by Fire and Mello in 1998 (for which they were awarded the Nobel Prize in 2006) demonstrated that the introduction of exogenous dsRNA into C. elegans specifically silenced genes via RNA interference, explaining the suppression phenomenon previously observed in plants [7]. Subsequent studies found that dsRNA is converted into siRNA in the RNAi pathway. In 2001, the term miRNA and the link between miRNA and RNAi were described in three papers in Science [23]. With this, we have come to realize that the gene regulatory machinery is composed predominantly of two classes of small RNAs, with miRNAs involved in the regulation of endogenous genes and siRNAs involved in defense against viral nucleic acids, transposons, and transgenes [5]. Later work revealed the downstream effectors: Dicers (which excise the small RNAs from their precursors) and Argonaute proteins (part of the RNA-induced silencing complex, which performs the actual silencing), completing our current understanding of the RNA silencing pathways. The details of the mechanism and the differences among species are discussed further below.

2.2 Biogenesis pathways

siRNA-mediated and miRNA-mediated silencing share a common theme. In the biogenesis of both siRNA and miRNA, double-stranded precursors are cleaved by an RNase III endonuclease into short ∼22 nt fragments. One of the strands (the guide strand) is loaded into an Argonaute protein, a central component of the larger ribonucleoprotein complex RISC, which facilitates target RNA recognition and silencing. The mechanism of silencing is either cleavage of the target mRNA or translational repression. Aside from this common theme, the proteins involved in these processes differ among species, and miRNA processing involves additional steps prior to maturation and incorporation into RISC (Figure 1).

For the biogenesis of siRNA, the precursors are dsRNAs, often from exogenous sources such as viruses or transposons. However, recent studies have also found endogenous siRNAs [9]. Regardless of the source, these dsRNAs are processed by the RNase III endonuclease, Dicer, into ∼22 nt siRNAs. This RNase III-catalyzed cleavage leaves the characteristic 5′ phosphates and 2 nt 3′ overhangs [2]. It is worth noting that different species have evolved different numbers of Dicer paralogs. This becomes important because, as discussed later, the miRNA biogenesis pathway also uses Dicer to process miRNA precursors (more specifically, pre-miRNAs). Species such as D. melanogaster have two distinct Dicer proteins, and as a result the precursors are typically processed preferentially by one or the other (e.g. Dicer-1 for miRNA cleavage and Dicer-2 for siRNA cleavage) [5]. In contrast, mammals and nematodes have only a single Dicer protein, so both biogenesis pathways converge on the same processing step [5]. In subsequent steps of the siRNA biogenesis pathway, one of the strands of the siRNA duplex is loaded into RISC to silence target RNAs (Figure 1C).
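The geometry of the cleavage products described above (∼22 nt strands with 2 nt 3′ overhangs) can be illustrated with a toy model. This is a sketch under strong simplifying assumptions, not a model of the enzyme: it assumes a perfectly paired dsRNA, fixed 22 nt phasing from one end, and it ignores the 5′ phosphates; the function names `dice` and `reverse_complement` are hypothetical.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(sense: str, length: int = 22, overhang: int = 2):
    """Cut a perfectly paired dsRNA into duplexes with 3' overhangs.

    `sense` is one strand written 5'->3'; the other strand is assumed to be
    its exact reverse complement. Each returned pair holds two strands of
    `length` nt (both written 5'->3') whose cuts are staggered by `overhang` nt.
    """
    duplexes = []
    # Start `overhang` nt in, so the partner strand's 3' overhang is inside the dsRNA.
    for i in range(overhang, len(sense) - length + 1, length):
        top = sense[i:i + length]
        # The partner pairs with the first (length - overhang) nt of `top`
        # and extends `overhang` nt past the 5' end of `top`.
        bottom = reverse_complement(sense[i - overhang:i + length - overhang])
        duplexes.append((top, bottom))
    return duplexes


# Hypothetical 50-nt dsRNA (sense strand shown).
for top, bottom in dice("AUGGCUACGU" * 5):
    print(top, bottom)
```

In each pair, the first 20 nt of one strand are paired with the first 20 nt of the other, which leaves the last 2 nt at each 3′ end unpaired.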

Figure 1: siRNA and miRNA biogenesis pathways. (A) Biogenesis of plant miRNA. (B) Biogenesis of animal miRNA. (C) Biogenesis of animal siRNA. Adapted from Bartel, 2004 (ref [2]). Copyright © 2004 Cell Press.

In the miRNA biogenesis pathway, the majority of the precursors are Pol II transcripts of intronic regions, some of which encode multiple miRNAs in clusters. These precursors, which contain a stem-loop structure, are named pri-miRNAs. The pri-miRNAs are first cleaved in the nucleus by an RNase III endonuclease (Drosha in animals and Dcl1 in plants) into ∼60-70 nt stem-loop intermediates, termed pre-miRNAs [2]. In animals, the pre-miRNA is then exported into the cytoplasm by Exportin-5. Dicer then cleaves the pre-miRNA intermediate to remove the loop. One of the strands of the resulting mature miRNA duplex is loaded into RISC, as described for siRNA biogenesis (Figure 1B). Interestingly, in plants, the pri-miRNA is processed into mature miRNA through two cleavages by the same enzyme, Dcl1, in the nucleus before export into the cytoplasm for loading (Figure 1A).


2.3 Functions and silencing mechanism

The classical view of miRNA function, based on the early discoveries of miRNAs, was analogous to a binary switch in which a miRNA represses translation of a few key mRNA targets to initiate a developmental transition. However, subsequent studies have greatly broadened this definition. In plants, most miRNAs bind to the coding region of the mRNA with near-perfect complementarity. Animal miRNAs, on the other hand, bind with only partial complementarity (apart from a seed region, residues 2-8) to the 3′ UTR of the mRNA. As such, a single miRNA in animals potentially has hundreds of targets rather than just a few [1]. In addition, in mammals, only a small portion of the predicted targets are involved in development, with the rest predicted to cover a wide range of molecular and biological processes [2]. Lastly, miRNA silencing acts through both translational repression and mRNA cleavage, as shown for example by Bartel and coworkers for the miR-196-directed cleavage of HOXB8 [26], and also through mRNA destabilization, as discussed below. Taken together, the modern view is that a miRNA dampens the expression of many mRNA targets to optimize expression, reinforce cell identity, and sharpen transitions.

The mechanism by which miRNAs mediate the silencing of target mRNAs is still an area of active research. As previously discussed, RNA silencing can take the form of cleavage, destabilization (leading to subsequent degradation of the mRNA), or translational repression. In plants, the predominant mode of RNA silencing has been found to be Argonaute-catalyzed cleavage. However, the contribution of these different modes of silencing has been less clear in animals. Recent global analyses from the Bartel group, in collaboration with the Gygi group and with Ingolia and Weissman, shed light on this question.

In a 2008 study, the Bartel and Gygi groups used mass spectrometry to examine global changes in protein levels following miRNA introduction or deletion [1]. Their results revealed that individual miRNAs repress hundreds of genes and, more importantly, that mRNA destabilization accounts for the majority of the repression of highly repressed targets (Figure 2).

Figure 2: Protein and mRNA changes following miR-223 loss, for messages with at least one 8-mer 3′ UTR site (blue) or at least one 7-mer site (orange). Adapted from Baek et al., 2008 (ref [1]). Copyright © 2008 Macmillan Publishers Limited.
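The seed-pairing rule for animal miRNAs (residues 2-8) and the 7-mer and 8-mer site classes named in Figure 2 can be illustrated with a small scanner. This is a minimal sketch: the 8-mer definition used here (a seed match followed by an A opposite miRNA position 1) is a commonly used convention that these notes do not spell out, and the function name `find_seed_sites`, the miRNA sequence, and the UTR are hypothetical inputs chosen for illustration.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna: str, utr: str):
    """List (position, site_type) for every seed match in a 3' UTR.

    Both sequences are written 5'->3'. A site is a perfect match to miRNA
    residues 2-8; it is called "8mer" when the next UTR base is an A
    (assumed convention), and "7mer" otherwise.
    """
    seed = mirna[1:8]                  # residues 2-8 in 1-based numbering
    target = reverse_complement(seed)  # what the mRNA reads 5'->3' at a site
    sites = []
    start = utr.find(target)
    while start != -1:
        after = start + len(target)    # UTR base opposite miRNA position 1
        is_8mer = after < len(utr) and utr[after] == "A"
        sites.append((start, "8mer" if is_8mer else "7mer"))
        start = utr.find(target, start + 1)
    return sites


example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # illustrative 22-nt miRNA
example_utr = "AAGCUACCUCAGGGCUACCUCUUU"   # hypothetical UTR fragment
print(find_seed_sites(example_mirna, example_utr))
# prints [(3, '8mer'), (14, '7mer')]
```

Because only seven or eight bases need to match, sites of this kind occur in many UTRs, which is consistent with the point above that a single animal miRNA can have hundreds of targets.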


The conclusion that mRNA destabilization dominates is further supported by a subsequent study that combined RNA-seq with ribosome profiling, a novel technique first demonstrated by Ingolia and Weissman in 2009 that enables the interrogation of global translation activity with sub-codon resolution [14]. The results showed that destabilization of target mRNA is the predominant mechanism through which miRNAs reduce protein output.
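One way to make the comparison between RNA-seq and ribosome-profiling measurements concrete is to decompose repression on a log scale. The sketch below assumes that ribosome-footprint abundance is a proxy for protein output and that the change in footprints splits additively (in log2) into a change in mRNA level and a change in translational efficiency. This framing, the function name `repression_breakdown`, and all numbers are illustrative assumptions, not values or methods taken from [1] or [14].

```python
import math


def repression_breakdown(mrna_ctrl, mrna_mir, rpf_ctrl, rpf_mir):
    """Split repression into mRNA-level and translational components.

    mrna_*: mRNA abundance (e.g. from RNA-seq) without / with the miRNA.
    rpf_*:  ribosome-footprint abundance without / with the miRNA.
    All four inputs are hypothetical positive numbers in arbitrary units.
    """
    log2_mrna = math.log2(mrna_mir / mrna_ctrl)  # change from destabilization
    log2_rpf = math.log2(rpf_mir / rpf_ctrl)     # total change in output proxy
    log2_te = log2_rpf - log2_mrna               # change in translational efficiency
    # Fraction of the total change explained by mRNA loss (undefined if no change).
    fraction = log2_mrna / log2_rpf if log2_rpf != 0 else None
    return {
        "log2_mrna": log2_mrna,
        "log2_te": log2_te,
        "log2_total": log2_rpf,
        "fraction_from_mrna": fraction,
    }


# Hypothetical target: mRNA falls 4-fold, footprints fall 8-fold.
result = repression_breakdown(mrna_ctrl=64, mrna_mir=16, rpf_ctrl=64, rpf_mir=8)
print(result)
```

In this made-up example two thirds of the log-scale repression comes from the drop in mRNA level, which is the kind of pattern the text describes as destabilization being the predominant mechanism.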

References

[1] Daehyun Baek, Judit Villén, Chanseok Shin, Fernando D Camargo, Steven P Gygi, and David P Bartel. The impact of microRNAs on protein output. Nature, 455(7209):64–71, September 2008.

[2] David P Bartel. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116(2):281–97, January 2004.

[3] M S Bartolomei, S Zemel, and S M Tilghman. Parental imprinting of the mouse H19 gene. Nature, 351(6322):153–5, May 1991.

[4] C J Brown, A Ballabio, J L Rupert, R G Lafreniere, M Grompe, R Tonlorenzi, and H F Willard. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature, 349(6304):38–44, January 1991.

[5] Richard W Carthew and Erik J Sontheimer. Origins and Mechanisms of miRNAs and siRNAs. Cell, 136(4):642–55, February 2009.

[6] Manel Esteller. Non-coding RNAs in human disease. Nature Reviews Genetics, 12(12):861–874, November 2011.

[7] A Fire, S Xu, M K Montgomery, S A Kostas, S E Driver, and C C Mello. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature, 391(6669):806–11, February 1998.

[8] Ewan A Gibb, Carolyn J Brown, and Wan L Lam. The functional role of long non-coding RNA in human carcinomas. Molecular Cancer, 10(1):38, January 2011.

[9] Daniel E Golden, Vincent R Gerbasi, and Erik J Sontheimer. An inside job for siRNAs. Molecular Cell, 31(3):309–12, August 2008.

[10] S Guo and K J Kemphues. par-1, a gene required for establishing polarity in C. elegans embryos, encodes a putative Ser/Thr kinase that is asymmetrically distributed. Cell, 81(4):611–20, May 1995.

[11] Rajnish A Gupta, Nilay Shah, Kevin C Wang, Jeewon Kim, Hugo M Horlings, David J Wong, Miao-Chih Tsai, Tiffany Hung, Pedram Argani, John L Rinn, Yulei Wang, Pius Brzoska, Benjamin Kong, Rui Li, Robert B West, Marc J van de Vijver, Saraswati Sukumar, and Howard Y Chang. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature, 464(7291):1071–6, April 2010.

[12] Masahira Hattori. Finishing the euchromatic sequence of the human genome. Nature, 431(7011):931–45, October 2004.

[13] Christopher L Holley and Veli K Topkara. An introduction to small non-coding RNAs: miRNA and snoRNA. Cardiovascular Drugs and Therapy, 25(2):151–159, 2011.

[14] Nicholas T Ingolia, Sina Ghaemmaghami, John R S Newman, and Jonathan S Weissman. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science, 324(5924):218–23, April 2009.

[15] R C Lee, R L Feinbaum, and V Ambros. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell, 75(5):843–54, December 1993.

[16] Michael L Metzker. Sequencing technologies - the next generation. Nature Reviews Genetics, 11(1):31–46, January 2010.

[17] C Napoli, C Lemieux, and R Jorgensen. Introduction of a Chimeric Chalcone Synthase Gene into Petunia Results in Reversible Co-Suppression of Homologous Genes in trans. The Plant Cell, 2(4):279–289, April 1990.

[18] Laura Poliseno, Leonardo Salmena, Jiangwen Zhang, Brett Carver, William J Haveman, and Pier Paolo Pandolfi. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature, 465(7301):1033–8, June 2010.

[19] Chris P Ponting, Peter L Oliver, and Wolf Reik. Evolution and functions of long noncoding RNAs. Cell, 136(4):629–41, February 2009.

[20] J R Prensner and A M Chinnaiyan. The Emergence of lncRNAs in Cancer Biology. Cancer Discovery, 1(5):391–407, October 2011.

[21] B J Reinhart, F J Slack, M Basson, A E Pasquinelli, J C Bettinger, A E Rougvie, H R Horvitz, and G Ruvkun. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature, 403(6772):901–6, February 2000.

[22] N Romano and G Macino. Quelling: transient inactivation of gene expression in Neurospora crassa by transformation with homologous sequences. Molecular Microbiology, 6(22):3343–53, November 1992.

[23] G Ruvkun. Molecular biology. Glimpses of a tiny RNA world. Science, 294(5543):797–9, October 2001.

[24] Ryan J Taft, Ken C Pang, Timothy R Mercer, Marcel Dinger, and John S Mattick. Non-coding RNAs: regulators of disease. The Journal of Pathology, 220(2):126–39, January 2010.

[25] Jiayi Wang, Xiangfan Liu, Huacheng Wu, Peihua Ni, Zhidong Gu, Yongxia Qiao, Ning Chen, Fenyong Sun, and Qishi Fan. CREB up-regulates long non-coding RNA, HULC expression through interaction with microRNA-372 in liver cancer. Nucleic Acids Research, 38(16):5366–83, September 2010.

[26] Soraya Yekta, I-Hung Shih, and David P Bartel. MicroRNA-directed cleavage of HOXB8 mRNA. Science, 304(5670):594–6, April 2004.
