SR proteins in microRNA/mRNA biogenesis

by

Han Wu

Department of Cell Biology Duke University

Date:______Approved:

______Jun Zhu, Supervisor

______Brigid L.M. Hogan, Chair

______Christopher B. Newgard

______Uwe Ohler

______Kenneth D. Poss

______Yuan Zhuang

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Cell Biology in the Graduate School of Duke University

2011

ABSTRACT

SR proteins in microRNA/mRNA biogenesis

by

Han Wu

Department of Cell Biology Duke University

Date:______Approved:

______Jun Zhu, Supervisor

______Brigid L.M. Hogan, Chair

______Christopher B. Newgard

______Uwe Ohler

______Kenneth D. Poss

______Yuan Zhuang

An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Cell Biology in the Graduate School of Duke University

2011

Copyright by Han Wu 2011

Abstract

Serine/arginine‐rich proteins or SR proteins are well‐known factors involved in splicing regulation. Despite the tremendous efforts that have been made in characterizing their functions, several fundamental questions still remain: how are the expression levels of SR proteins regulated; what are the molecular mechanisms underlying SR protein‐mediated regulation; and what are the physiological targets of SR proteins in vivo. In my dissertation study, I employed a number of genomic and molecular approaches to study the functional involvement of two SR proteins, SF2/ASF and SRp20, in regulating microRNA/mRNA biogenesis.

Negative feedback regulation has been shown to be a common mechanism for maintaining SR protein homeostasis (e.g., SC35 and SRp20). In the first part of my thesis, I set out to examine the potential involvement of miRNA‐mediated feedback regulation in maintaining the steady‐state level of SF2/ASF. MicroRNA deep sequencing was employed to identify miRNAs that are differentially expressed when the SF2/ASF level is elevated in an inducible cell line system. The sequencing data were further integrated with miRNA target predictions, which allowed me to identify a putative SF2/ASF‐miR‐7 negative feedback loop. A series of molecular and cellular techniques were then employed to validate the circuit structure. To our knowledge, this is the first negative feedback circuit reported between an SR protein (SF2/ASF) and a miRNA (miR‐7), and the paradigm may be broadly applicable in regulating the homeostasis of other SR proteins.

This initial study was then extended to characterize the mechanism underlying SF2/ASF‐enhanced miR‐7 expression. Notably, one of the miR‐7 primary transcripts, miR‐7‐1, is embedded in an alternatively spliced intron of the hnRNPK gene. It provides a unique system to simultaneously study the involvement of SF2/ASF in splicing and miRNA biogenesis. Through a series of mutagenesis assays in a reporter system, I have uncovered a novel splicing‐independent function of SF2/ASF in regulating miRNA biogenesis. Direct interaction between SF2/ASF and primary‐miR‐7 (pri‐miR‐7) was demonstrated by cross‐linking and immunoprecipitation assay (CLIP) as well as by RNA affinity purification assay. Furthermore, I showed that this interaction is required for efficient miR‐7 processing in vivo. Finally, an in vitro pri‐miRNA processing assay was employed, and the results showed that SF2/ASF promotes the Drosha cleavage step of pri‐miR‐7 in a manner that depends on the predicted binding site. Taken together, this was the first study showing the direct involvement of an SR protein in miRNA biogenesis.

In order to study the global involvement of SR proteins in RNA biogenesis, one important stepping stone is to identify their targets in vivo. To this end, I focused on SRp20, another classic SR protein. PAR‐CLIP (Photoactivatable‐Ribonucleoside‐Enhanced Cross‐linking and immunoprecipitation assay) combined with Illumina sequencing was employed to systematically identify potential SRp20 binding targets, a subset of which were randomly selected and validated. As expected, our results showed that SRp20 primarily targets exonic regions for splicing regulation, and such interactions are likely to be mediated by a CWWCW motif. Surprisingly, extensive interactions between SRp20 and the 3’ UTRs of mRNAs were also observed. Such interactions presumably affect the choices between alternative polyadenylation sites. Thus, SRp20 may be one of the master regulators of both splicing and 3’ end processing.

In summary, my thesis study was centered on functional characterization of two SR proteins, SF2/ASF and SRp20, in post‐transcriptional gene regulation. I have identified a negative feedback circuit between SF2/ASF and miR‐7: SF2/ASF directly binds to pri‐miR‐7 and promotes its maturation in a splicing‐independent manner; while mature miR‐7 negatively regulates SF2/ASF at the translational level. Furthermore, PAR‐CLIP assay identified genome‐wide interactions between SRp20 and its potential targets, which may be linked to splicing regulation as well as alternative 3’ end processing. Altogether, this study has provided novel insights into how SR protein homeostasis can be achieved through negative feedback regulation, and the regulatory role of SR proteins in miRNA/mRNA biogenesis.

Contents

Abstract

List of Tables

List of Figures

List of Abbreviations

Acknowledgements

1. Introduction

1.1 Pre‐mRNA splicing

1.2 The SR family of splicing factors

1.3 SR proteins in splicing regulation

1.4 Other regulatory functions of SR proteins in gene expression

1.5 SR protein‐mediated gene regulation: functional significance

1.6 The substrate specificities of SR proteins

1.7 SR protein homeostasis

1.8 MicroRNA biogenesis

1.9 MicroRNA functions

1.10 MicroRNA‐mediated gene regulatory networks

2. Methods

2.1 Cell culture and transfection

2.2 Plasmids

2.3 Profiling miRNA expression by deep sequencing

2.3.1 Construction of miRNA sequencing library

2.3.2 Analysis of miRNA deep sequencing data

2.4 Dual‐Luciferase reporter assay

2.5 Semi‐quantitative PCR

2.6 Cross‐linking and Immunoprecipitation (CLIP)

2.6.1 CLIP‐PCR

2.6.2 PAR‐CLIP

2.6.3 Analysis of PAR‐CLIP sequencing data

2.7 Northern blotting

2.8 Western blotting

2.9 RNA affinity purification

2.10 RNA interference

2.11 In vitro pri‐miRNA processing assay

3. Negative feedback regulation between SFRS1 and miR‐7

3.1 Generation of a stable cell line system to recapitulate SF2/ASF feedback

3.2 Identification of the SF2/ASF‐miR‐7 feedback circuit

3.3 SF2/ASF is required for efficient production of miR‐7

3.4 SF2/ASF can be targeted by miR‐7

3.5 Potential biological significance of the SF2/ASF‐miR‐7 circuit

4. A splicing‐independent function of SF2/ASF in miR‐7 biogenesis

4.1 The domain requirement of SF2/ASF for promoting miR‐7 expression

4.2 A splicing‐independent function of SF2/ASF in pri‐miR‐7 processing

4.3 SF2/ASF directly binds to pri‐miR‐7 in vivo

4.4 SF2/ASF is involved in the Drosha cleavage step of miR‐7 maturation

4.5 Broad involvement of SF2/ASF in miRNA biogenesis

5. Genome‐wide identification of SRp20 targets in vivo

5.1 PAR‐CLIP to identify the putative targets of SRp20 in vivo

5.2 Global landscape of RNA transcripts bound by SRp20

5.3 Experimental validation of potential SRp20 binding clusters

5.4 De novo Motif discovery

5.5 Crosstalk between SRp20 and other SR proteins

5.6 SRp20 in regulating the expression of SR proteins

6. Conclusions and Discussions

6.1 Negative feedback circuit between SFRS1 and miR‐7

6.2 A splicing‐independent function of SF2/ASF in miRNA processing

6.3 SR proteins may be broadly involved in miRNA biogenesis

6.4 SR proteins may regulate their downstream gene expression in concert with miRNAs

6.5 Landscape and biological significance of the SRp20‐RNA interactions

6.6 SRp20 in 3’ end processing

6.7 SRp20 in the regulation of alternative splicing coupled NMD

6.8 SRp20 and SF2/ASF in tumorigenesis

Appendix A: Oligonucleotides used for Luciferase reporter assay

Appendix B: Oligonucleotides used for semi‐quantitative PCR

Appendix C: Oligonucleotides used for RIP‐RT‐PCR validation of PAR‐CLIP clusters

Appendix D: Oligonucleotides used for functional validation of PAR‐CLIP clusters

Appendix E: GO analysis for the host genes with SRp20 binding clusters

References

Biography

List of Tables

Table 1: Mappability of miRNA deep sequencing library

Table 2: Summary of PAR‐CLIP clusters in SR protein coding genes

List of Figures

Figure 1: The microRNAs predicted to target the SFRS1 gene

Figure 2: SF2/ASF autoregulation in the stable cell line with inducible SF2/ASF expression

Figure 3: Genomic mapping of raw reads from miRNA sequencing libraries

Figure 4: Differentially expressed miRNAs upon SF2/ASF induction

Figure 5: Potential negative feedback between SF2/ASF and miR‐7

Figure 6: SF2/ASF is required for efficient miR‐7 maturation

Figure 7: SFRS1 is a target of miR‐7

Figure 8: Inhibition of endogenous miR‐7 increases the level of SF2/ASF protein

Figure 9: Potential involvement of miR‐7 in regulating the MAPK and PI3K pathways

Figure 10: A minigene reporter to recapitulate hnRNPK alternative splicing and SF2/ASF‐regulated miR‐7 expression

Figure 11: The domain requirement of SF2/ASF for promoting miRNA production and alternative splicing

Figure 12: A splicing‐independent function of SF2/ASF in promoting miR‐7 expression

Figure 13: SF2/ASF directly interacts with the primary miR‐7‐1 transcripts

Figure 14: Direct interaction between SF2/ASF and the stem‐loop region of miR‐7‐1

Figure 15: SF2/ASF promotes miR‐7 maturation at the Drosha cleavage step in a sequence‐dependent manner

Figure 16: Both precursor and mature miR‐7 levels are increased by elevated SF2/ASF expression

Figure 17: SF2/ASF is broadly involved in miRNA biogenesis

Figure 18: Inducible SRp20 expression in a MEF/3T3 stable cell line

Figure 19: Dependency of 4‐thiouridine (4‐SU) and UV cross‐linking for the PAR‐CLIP procedure

Figure 20: Identification of SRp20 targets by PAR‐CLIP

Figure 21: Locations of SRp20 PAR‐CLIP clusters in pre‐mRNA transcripts

Figure 22: Validation of candidate SRp20 PAR‐CLIP clusters by quantitative RT‐PCR

Figure 23: Identification of putative SRp20 binding motif

Figure 24: Functional validation of candidate SRp20 binding sites

Figure 25: SRp20 regulates the expression of the SFRS1 gene through direct binding

List of Abbreviations

4‐SU 4‐thiouridine

5ʹ/3ʹ ss 5ʹ/3ʹ splice site

ADAR Adenosine Deaminase

AGO Argonaute

APA Alternative polyadenylation

bGH Bovine growth hormone

BIN1 Bridging Integrator 1

BP Branch point

CaMKIIδ Ca2+/calmodulin‐dependent protein kinase IIδ

Cdc25B Cell division cycle 25 homolog B

cERMIT Evidence‐ranked motif identification

CLIP Cross‐linking and immunoprecipitation assay

Cluster2 Cluster analysis and display of genome‐wide expression patterns

CTD C‐terminal domain

DM Distal 3ʹ splice‐site mutant

DsiRNA Dicer substrate small interfering RNA

DTT Dithiothreitol

E7.5 Embryonic day 7.5

EIF4E Eukaryotic initiation factor 4E

ESE Exonic splicing enhancer

ESS Exonic splicing silencer

EST Expressed sequence tags

FBS Fetal bovine serum

Fn1 Fibronectin 1

FoxM1 Forkhead box transcription factor M1

GESA Gene Set Enrichment Analysis

GRN Gene regulatory network

hESC Human Embryonic Stem Cell

hnRNP Heterogeneous ribonucleoprotein particle

HPV Human papillomavirus

ISE Intronic splicing enhancer

ISS Intronic splicing silencer

KDE Kernel density estimation

KSRP KH‐type Splicing Regulatory Protein

MAP3K9 Mitogen‐Activated Protein Kinase Kinase Kinase 9

MAPK4 Mitogen‐Activated Protein Kinase 4

MKNK1 MAP kinase interacting serine/threonine kinase 1

mLin41 Mouse homologue of lin‐41

MNK2 MAP kinase interacting serine/threonine kinase 2

NGS Next‐generation sequencing

NMD Nonsense‐mediated mRNA decay

NOVA Neuro‐Oncological Ventral Antigen

NR2F2 Nuclear Receptor subfamily 2, group F, member 2

NRS Nuclear retention signal

OCT4 Octamer binding transcription factor 4

PANTHER Protein ANalysis THrough Evolutionary Relationships

PAR‐CLIP Photoactivatable‐Ribonucleoside‐Enhanced Cross‐linking and Immunoprecipitation

PIK3CD PhosphoInositide‐3‐Kinase, Catalytic, Delta polypeptide

Plk1 Polo‐like kinase 1

PM Proximal 3ʹ splice‐site mutant

Pol II RNA polymerase II

poly(A) Polyadenylation

PPT Polypyrimidine tract

pre‐miRNA miRNA precursor

pri‐miRNA miRNA primary transcript

PRKCB1 Protein Kinase C, Beta

PSCD3 Pleckstrin homology, Sec7 and Coiled‐coil Domains 3

PTC Premature termination codon

RAF1 v‐RAF‐1 murine leukemia viral oncogene homolog 1

RBP RNA‐binding protein

RIP Ribonucleoprotein immunoprecipitation assay

RISC RNA‐induced silencing complex

RRM RNA recognition motif

RSV Rous Sarcoma Virus

S6K1 Ribosomal protein S6 Kinase polypeptide 1

SELEX Systematic Evolution of Ligands by Exponential Enrichment

SF1/BBP Splicing Factor 1/Branch Binding Protein

SMA Spinal Muscular Atrophy

SMN1 Survival of motor neuron 1

SMN2 Survival of motor neuron 2

snRNA Small nuclear RNA

snRNP Small nuclear ribonucleoproteins

Sox9 SRY box transcription factor 9

SR protein Serine/arginine rich protein

SRF Serum Response Factor or c‐fos Serum Response element‐binding transcription Factor

SRPK SR protein kinase

SWAP Suppressor‐of‐white‐apricot

TNRC6A Trinucleotide repeat containing 6A

Tra Transformer

Tra‐2 Transformer‐2

TRBP TAR (HIV‐1) RNA‐binding protein

U2AF U2 snRNP auxiliary factor

UCR UltraConserved Region

XRN1 5’→3’ exonuclease 1

Acknowledgements

I would like to take this special opportunity to acknowledge all my friends, colleagues and family, who have helped and supported me for the past few years. First and foremost, I am particularly grateful to my fabulous advisor and sincere friend Dr. Jun Zhu for his patience, encouragement and mentorship throughout my Ph.D. training, and for his critical reading of this dissertation. His knowledge and enthusiasm for science have always inspired me when I was overwhelmed by the project. I would like to express my sincere gratitude to my committee members Drs. Brigid L.M. Hogan, Christopher B. Newgard, Uwe Ohler, Kenneth D. Poss, and Yuan Zhuang, for their kind suggestions and insightful guidance. I thank the other members of the Zhu lab for their valuable discussion, especially Drs. Ting Ni and Kang Tu for their technical support. I also want to thank Mr. Matt Gemberling for his contribution in constructing some of the mutant minigene reporters in the SF2/ASF‐miR‐7 project.

I am especially grateful to our collaborators Drs. Shuying Sun, Adrian R. Krainer (Cold Spring Harbor Laboratory), Zhi‐Ming Zheng, and Rong Jia (National Cancer Institute) for sharing cell lines and for providing valuable reagents critical for my thesis projects. I would like to thank Dr. Yuan Gao (Johns Hopkins University School of Medicine) for helping me perform the deep sequencing experiments and Dr. Zhong Wang (DOE Joint Genome Institute) for his valuable advice on computational analysis. I want to thank Dr. Neelanjan Mukherjee (Duke Institute for Genome Sciences & Policy) and Dr. Qiongyi Zhao (Shanghai Institutes for Biological Sciences) for their generous and valuable help in analysis of the deep sequencing data.

Lastly, I would like to dedicate this dissertation to all my family members, especially my beloved Mom and Dad for their endless support. Without them, nothing is possible.

1. Introduction

One key feature that distinguishes prokaryotic and eukaryotic cells is the separation of nuclear and cytoplasmic compartments. More sophisticated gene expression regulation can therefore take place at the post‐transcriptional level, such as pre‐messenger RNA splicing. It is believed that post‐transcriptional gene regulation is largely mediated by RNA‐binding proteins or RBPs. In Caenorhabditis elegans and Drosophila melanogaster, approximately 2% of known genes encode RBPs (Glisovic et al., 2008). As for the human genome, there are approximately 400 protein‐coding genes with known RNA binding properties (Galante et al., 2009). The list of RBPs is continuing to expand, suggesting their potential importance in gene regulation. Despite the tremendous advances in characterizing RBPs during the past two decades, we are still at an early stage in understanding the complexity of post‐transcriptional gene regulatory networks. During my thesis study, I set out to characterize how SF2/ASF and SRp20, two well‐known members of the SR family of splicing factors, are involved in microRNA/mRNA biogenesis and how they exert their regulatory roles in shaping the mammalian transcriptomes.

1.1 Pre-mRNA splicing

Pre‐mRNA splicing is a crucial step in eukaryotic gene expression, and was initially discovered in 1977 by virologists working on adenovirus gene expression (Berget et al., 1977; Chow et al., 1977). The removal of intervening sequences (or introns) from protein‐coding transcripts is a prerequisite for the production of functional proteins. While some introns are constitutively spliced, others can be alternatively regulated. A recent genome‐wide survey estimated that more than 90% of human genes are alternatively spliced (Wang et al., 2008). In fact, alternative splicing is a major contributor to mammalian proteome diversity (Lander et al., 2001; Maniatis and Tasic, 2002). A well‐known example is the CD44 gene, which encodes a cell surface antigen. This gene contains ten alternatively spliced exons and can give rise to more than 1000 putative splicing variants (Naor et al., 1997). It has been shown that different CD44 variants display distinct expression patterns among tissues and cell types, and can serve as potential markers for cancer diagnosis, prognosis, metastasis and drug resistance (Cain et al., 2011; da Cunha et al., 2010; Klingbeil et al., 2009).
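As a rough consistency check on the CD44 figure, and assuming (purely for illustration) that each of the ten alternatively spliced CD44 exons can be included or skipped independently of the others, the number of possible exon combinations would be:

$$
2^{10} = 1024
$$

This is in line with the estimate of more than 1000 putative variants, although the number of isoforms actually expressed in a given tissue is expected to be much smaller.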

Pre‐mRNA splicing is catalyzed by the spliceosome, a macromolecular complex that consists of five snRNAs (small nuclear RNAs, U1, U2, U4, U5 and U6) and approximately 300 associated proteins (Matlin and Moore, 2007; Wahl et al., 2009). In metazoans and plants, there is also a minor spliceosome, which contains functionally analogous U11/U12 and U4atac/U6atac snRNPs (small nuclear ribonucleoproteins), with a common U5 snRNP (Patel and Steitz, 2003).

The splicing reaction is carried out by the basal splicing machinery in a step‐wise manner. Spliceosome assembly starts with the interaction between U1 snRNP and the 5’ splice site (ss) in an ATP‐independent manner, followed by the binding of Splicing Factor 1/Branch Point Binding Protein (SF1/BBP) and U2 snRNP auxiliary factor (U2AF) to the branch point (BP) and the polypyrimidine tract (PPT), respectively. The 35kDa subunit of U2AF is also recruited by interacting with the AG dinucleotide of the 3’ splice site. As a result, the spliceosomal E complex is formed. During the next step, U2 snRNA interacts with the BP in an ATP‐dependent manner to form the A complex. The U4/U6 and U5 snRNPs are then recruited as a tri‐snRNP complex to form the B complex, which remains catalytically inactive at this stage. The activated spliceosome (the B* complex) is then generated by the release of U1 and U4 snRNPs due to conformational rearrangement. Finally, two trans‐esterification reactions take place, by which two exons are ligated together and the intron is released in lariat form (Matlin and Moore, 2007; Wahl et al., 2009).

The spliceosome is a highly dynamic molecular machine, and many protein‐RNA interactions are relatively weak but reinforced by auxiliary factors (Wahl et al., 2009). In terms of alternative splicing, splice site signals (5’ ss, 3’ ss and BP) are usually not sufficient for determining the splicing specificity. Additional protein‐RNA interactions are therefore required, including those between cis‐regulatory elements (e.g. ESE or exonic splicing enhancer) (Zheng, 2004) and trans‐acting factors (e.g. SR proteins and hnRNPs) (see later sections).

1.2 The SR family of splicing factors

During the past three decades, numerous studies have been performed to characterize protein factors important for splicing regulation. One of the most established protein families is the SR protein family, which stands for serine/arginine rich protein. Each SR protein has a characteristic RS domain (rich in arginine and serine dipeptides) at the C‐terminus and one or more RNA recognition motifs (RRMs) at the N‐terminus. Whereas the RRMs are mainly responsible for sequence‐specific RNA binding, the RS domain serves as an “activation” domain through protein‐protein interactions.

The RS domain‐containing proteins were initially discovered in Drosophila as splicing regulators, including SWAP (suppressor‐of‐white‐apricot), Tra (transformer) and Tra‐2 (transformer‐2) (Amrein et al., 1988; Boggs et al., 1987; Chou et al., 1987). The first mammalian SR splicing factor, SF2/ASF, was discovered by two independent groups using biochemical and genetic approaches, respectively (Ge and Manley, 1990; Krainer et al., 1990). The mammalian SR proteins display similar structure and function to their fly counterparts (Fu and Maniatis, 1990; Ge et al., 1991; Krainer et al., 1991). In fact, SR proteins share several common biochemical or functional properties, which greatly facilitate the systematic identification of candidate SR proteins in metazoans. These properties include a common phosphoepitope recognized by the mAb104 antibody, the method for biochemical purification, and the ability to complement splicing‐deficient cytoplasmic S100 extracts (Fu, 1995).

Since the discovery of SF2/ASF, many SR and SR‐like proteins have been identified in metazoans. The nomenclature and definition of SR proteins have become confusing due to the lack of a general rule. Drs. James Manley and Adrian Krainer have recently proposed to define SR proteins entirely based on their sequence properties: one or two RRMs at the N‐terminus, followed by a downstream RS domain of at least 50 amino acids with over 40% RS contents. After applying these stringent criteria, 12 SR proteins were identified by searching the Uniprot database, and named accordingly as SRSF1 to SRSF12 (Manley and Krainer, 2010). To be consistent with our previous publication and the UCSC gene annotations, I will use the conventional nomenclature throughout my dissertation. Notably, SFRS1 (gene encoding SF2/ASF) or SFRS3 (gene encoding SRp20) described in the dissertation are equivalent to SRSF1 or SRSF3 in the updated nomenclature.
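The sequence‐based criteria above (an RS domain of at least 50 amino acids with over 40% RS content, downstream of the RRMs) lend themselves to a simple computational check. The following is a minimal sketch only, not the procedure used by Manley and Krainer (2010). It assumes that "RS content" means the fraction of residues lying within an RS or SR dipeptide, and that the end position of the last RRM is already known from a domain annotation. The function names `rs_content` and `has_rs_domain`, their parameters, and the toy sequences are all hypothetical.

```python
def rs_content(segment: str) -> float:
    """Fraction of residues in `segment` that lie in an RS or SR dipeptide.

    Assumption: this is one plausible reading of "RS content"; the exact
    counting rule behind the published criteria is not given in the text.
    """
    seg = segment.upper()
    if not seg:
        return 0.0
    in_dipeptide = [False] * len(seg)
    for i in range(len(seg) - 1):
        if seg[i:i + 2] in ("RS", "SR"):
            in_dipeptide[i] = in_dipeptide[i + 1] = True
    return sum(in_dipeptide) / len(seg)


def has_rs_domain(protein: str, last_rrm_end: int,
                  min_length: int = 50, min_content: float = 0.40) -> bool:
    """Check for a candidate RS domain downstream of the last RRM.

    `last_rrm_end` is the 0-based index just past the last RRM; it is a
    hypothetical input that the caller must supply.
    Simplification: only windows of exactly `min_length` residues are scanned.
    """
    tail = protein[last_rrm_end:]
    for start in range(len(tail) - min_length + 1):
        if rs_content(tail[start:start + min_length]) > min_content:
            return True
    return False


# Toy sequences (not real proteins): 100 filler residues, then a tail.
toy_sr = "M" * 100 + "RS" * 30
toy_non_sr = "M" * 100 + "A" * 60
print(has_rs_domain(toy_sr, last_rrm_end=100))
print(has_rs_domain(toy_non_sr, last_rrm_end=100))
```

A real screen would also need to confirm the presence of one or two N‐terminal RRMs, for example from existing domain annotations, which this sketch takes as given.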

1.3 SR proteins in splicing regulation

Enormous efforts have been made to elucidate the molecular mechanisms underlying SR protein‐mediated splicing regulation. It is known that the interactions between basal splicing machinery and splice site signals (5’ ss, 3’ ss and BP) are often not sufficient to define splicing specificity (Matlin and Moore, 2007; Wahl et al., 2009). Exonic/intronic splicing enhancers (ESEs and ISEs) and splicing silencers (ESSs and ISSs) provide an additional layer of specificity to reinforce splicing accuracy (Ladd and Cooper, 2002; Zheng, 2004). The involvement of SR proteins in splicing regulation is largely mediated by specific binding to the enhancer elements (Ibrahim et al., 2005). Such interactions may either strengthen the interaction between U2AF and the 3’ ss (Graveley et al., 2001), or antagonize the negative splicing regulators (e.g., hnRNPs or heterogeneous nuclear ribonucleoproteins) that bind to ESSs (Zheng et al., 1998; Zhu et al., 2001). It has been suggested that the strength of splicing promotion is determined by the relative activity of ESE‐bound SR proteins, the number of SR proteins involved, and the distance between an ESE and the nearby intron (Graveley et al., 1998). In addition, the RS domain of SR proteins can help recruit basal and auxiliary spliceosome components through protein‐protein interactions (Roscigno and Garcia‐Blanco, 1995). Lastly, SF2/ASF has been reported to directly interact with the branch point of certain primary transcripts to facilitate spliceosome assembly (Shen and Green, 2004; Shen et al., 2004). Nevertheless, these mechanisms might not be mutually exclusive, and the relative contribution of individual mechanisms may vary in a context‐dependent manner.

In addition to splicing activation, SR proteins have been reported to repress splicing when binding to intronic sequences. For instance, during adenovirus infection, SF2/ASF inhibits the recruitment of U2 snRNP to the branch point of L1 pre‐mRNA by binding to an intronic repressor element (Kanopka et al., 1996). SRp38, when dephosphorylated, could act as a general splicing repressor (Shin et al., 2004; Shin and Manley, 2002).

It is believed that splicing fidelity is achieved by coordinated actions between positive and negative regulatory signals. For example, SF2/ASF and 9G8 are known to synergistically promote the splicing of bovine growth hormone (bGH) pre‐mRNA in an ESE‐dependent manner (Li et al., 2000). A recent genome‐wide survey has identified hundreds of overlapping mRNA targets between SRp20 and SRp75 (Anko et al., 2010). In contrast, it is well documented that SF2/ASF and hnRNP A1 antagonize each other’s function in alternative 5’ splice site selection (Caceres et al., 1994; Eperon et al., 2000; Mayeda and Krainer, 1992). Similarly, individual SR proteins may have opposite effects on the splicing of certain introns, as in the case of SF2/ASF and SC35 in regulating β‐tropomyosin splicing (Gallego et al., 1997). Thus, it is conceivable that tissue‐specific splicing depends on the relative concentrations of splicing activators and repressors in individual cell types. Genome‐wide data on substrate specificity of individual splicing factors as well as their effects on splicing choices are therefore critical for a better understanding of splicing regulatory networks.

1.4 Other regulatory functions of SR proteins in gene expression

Coupling between transcription and pre‐mRNA splicing is a well‐established phenomenon (Bauren and Wieslander, 1994; Beyer and Osheim, 1988). The C‐terminal domain (CTD) of elongating Pol II can help recruit SR proteins to nascent transcripts (Misteli et al., 1997; Yuryev et al., 1996). Such interactions could directly influence splice site usage, as manifested in the alternative splicing of fibronectin transcripts (de la Mata and Kornblihtt, 2006). Conversely, SC35, one of the canonical SR proteins, has been shown to promote Pol II elongation in a gene‐specific manner (Lin et al., 2008).

It is worth pointing out that a subset of SR proteins, such as SF2/ASF, SRp20 and 9G8, shuttle back and forth between the nuclear and cytoplasmic compartments, suggesting their potential importance in other steps of the mRNA lifecycle (Caceres et al., 1998). Indeed, several groups showed that SR proteins are involved in mRNA export through direct interaction with the nuclear export receptor TAP (Hautbergue et al., 2008; Huang et al., 2003; Huang and Steitz, 2001). This interaction is regulated by phosphorylation of SR proteins themselves (Huang et al., 2004).

With a few known exceptions, the majority of Pol II transcripts are cleaved at the 3’ end, followed by the addition of a poly(A) tail. 3’ end processing is mediated by complex interactions between cis‐regulatory elements and the corresponding trans‐acting factors (Mandel et al., 2008). Early studies have shown that splicing of the terminal intron is coupled with 3’‐end processing (Niwa and Berget, 1991; Niwa et al., 1990). However, the detailed mechanisms are still controversial. Interestingly, SR proteins have previously been shown to directly interact with the 3’ end processing machinery to either promote or inhibit polyadenylation in a context‐dependent manner. For example, NPL3, a yeast SR protein, and SF2/ASF can directly bind to the G/U‐rich element within the 3’ UTRs of mRNAs (Deka et al., 2008; McPhillips et al., 2004). SRp75 can block poly(A) tail addition when bound upstream of the AAUAAA motif (Ko and Gunderson, 2002). SR proteins SF2/ASF, 9G8 and SRp20 can stimulate polyadenylation of RSV (Rous Sarcoma Virus) transcripts in vitro (Maciolek and McNally, 2007). Similarly, stable interaction between 9G8 and HIV pre‐mRNA upstream of its poly(A) signal is required for efficient 3’ end processing (Valente et al., 2009). Therefore, SR proteins may be key regulators in coordinating the splicing and the 3’ end processing of pre‐mRNAs.

In addition to mRNA maturation, SR proteins have been implicated in translational control either directly or indirectly. For indirect regulation, it has been shown that SF2/ASF affects the alternative splicing choices of the protein kinase MNK2 (MAP kinase interacting serine/threonine kinase 2), which is a key regulator of translation initiation (Karni et al., 2007). Interestingly, SF2/ASF can also promote translation initiation by suppressing the activity of 4E‐BP, a general translation repressor that interacts with EIF4E (Eukaryotic translation Initiation Factor 4E) (Michlewski et al., 2008b). In terms of direct regulation, SF2/ASF is known to be associated with the polyribosome fraction and to enhance the translation of an ESE‐containing luciferase reporter (Sanford et al., 2004). Several SR proteins may play similar roles in translation, including SRp20 (Bedard et al., 2007) and 9G8 (Swartz et al., 2007).

Lastly, several SR proteins have been reported to be involved in nonsense‐mediated mRNA decay (NMD), a process to degrade mRNAs containing premature termination codons (PTCs) (Zhang and Krainer, 2004). Thus, SR proteins have been proposed as master regulators of gene expression, and are intimately involved in the entire mRNA lifecycle, including transcription elongation, pre‐mRNA splicing, mRNA export, mRNA quality control and translation (Long and Caceres, 2009).

1.5 SR protein-mediated gene regulation: functional significance

Genetic studies have shown that SR proteins are important for normal development in metazoans. For example, SR protein B52/SRp55 is essential for Drosophila development as early as the first larval stage (Gabut et al., 2007; Kraus and Lis, 1994; Peng and Mount, 1995; Ring and Lis, 1994). Depletion of CeSF2/ASF, a C. elegans counterpart of mammalian SF2/ASF, resulted in embryonic lethality (Kawano et al., 2000; Longman et al., 2000). Several SR proteins, such as SC35 (Ding et al., 2004), SF2/ASF (Xu et al., 2005) and SRp38 (Feng et al., 2009), have been proven critical for heart development in mammals.

Since splicing is crucial to gene expression, dysregulated splicing may contribute to human diseases. It has been estimated that approximately 15% of all disease‐causing point mutations may lead to splicing defects, including mutations in the 5’ or 3’ splice site, or branch point (Cartegni et al., 2002; Cooper and Krawczak, 1990; Krawczak et al.,

1992). Notably, such estimations did not include splicing enhancer or silencer elements, and the actual number may be even higher. In fact, emerging evidence suggests that splicing defects are very common in human diseases (Kim et al., 2008; Orengo and

Cooper, 2007; Tazi et al., 2009). One well‐known example is Spinal Muscular Atrophy

(SMA), which is characterized by degeneration of motor neurons. SMA is usually caused by the lack of functional SMN1 (survival of motor neuron 1) gene expression. The human genome contains a second locus, SMN2, which is almost identical to SMN1 except for a C > T transition in exon 7. This single nucleotide variation leads to substantial exon 7 skipping, and the resulting mRNA variant encodes a non‐functional protein. The


splicing difference may be caused by either the loss of an ESE for SF2/ASF or gain of an

ESS for hnRNP A1 (Cartegni and Krainer, 2002; Kashima and Manley, 2003). Thus, several therapies aimed at correcting SMN2 splicing to produce functional SMN protein have been proposed. Recently, one such study successfully rescued SMA in a mouse model using an antisense oligonucleotide strategy (Hua et al., 2010).

Defective expression of oncogenes or tumor suppressor genes is often observed in human tumor samples. Such alterations can at least in part take place at the splicing level, and are often caused by dysregulated SR protein expression. Elevated expression of several SR proteins has been documented in cancers, such as breast cancer (Stickeler et al., 1999) and ovarian cancer (Fischer et al., 2004). Notably, SF2/ASF itself may function as a novel proto‐oncogene by regulating the splicing of BIN1 (Bridging Integrator 1),

MNK2 (MAP kinase interacting serine/threonine kinase 2) and S6K1 (Ribosomal protein

S6 Kinase polypeptide 1) genes (Karni et al., 2007). Similarly, SRp20, another SR protein, has been implicated in tumor development and may function as an oncogene (Jia et al.,

2010). Besides cancers, SR proteins may be important regulators in other pathological conditions such as viral infection (Dowling et al., 2008; Zheng, 2010).

In summary, results from early studies have clearly demonstrated that SR proteins are important regulators of normal development and human diseases.

However, most studies were based on gene‐by‐gene approaches or focused on the


general phenotypes after overexpression/depletion of individual SR proteins. As a result, the general landscape of SR protein‐mediated gene regulation, and the in vivo targets of SR proteins directly responsible for individual phenotypes are largely unknown. Moreover, it is already known that different SR proteins can regulate splicing in a synergistic or antagonistic manner. Another unresolved question is the potential coordination and/or antagonism of different SR proteins at the systems level. These gaps are most likely due to the lack of appropriate methods for genome‐wide analysis of alternative splicing and RNA‐protein interactions. The technical advances and challenges in this area will be discussed in the following section.

1.6 The substrate specificities of SR proteins

Although SR proteins share several common structural and biochemical properties (Fu, 1995), numerous studies have suggested functional specificities for individual SR proteins. For example, overexpression of SF2/ASF or SC35 leads to distinct alternative splicing patterns for an adenovirus E1A pre‐mRNA reporter (Wang and

Manley, 1995). In Caenorhabditis elegans, knocking down individual SR proteins resulted in no obvious phenotype, except for SF2/ASF, whose depletion led to embryonic lethality (Kawano et al.,

2000; Longman et al., 2000). Targeted deletion of SF2/ASF in the chicken B‐cell line DT40 showed its essential role in cell viability, which cannot be rescued by other SR proteins,


such as SC35 and SRp40 (Wang et al., 1996). Functional studies of SR proteins in mouse models also suggested that they have overlapping but distinct roles (Moroy and Heyd, 2007).

In heart development for example, tissue‐specific deletion of SF2/ASF leads to lethality around 6‐8 weeks after birth due to heart failure. A splicing defect for CaMKIIδ

(Ca2+/calmodulin‐dependent kinase IIδ) is likely to be the cause (Xu et al., 2005). In contrast to SF2/ASF, heart‐specific deletion of SC35 resulted in normal viability with a normal splicing pattern of the CaMKIIδ gene. Dilated cardiomyopathy was only evident with increasing age (Ding et al., 2004). Altogether, these data suggest that individual

SR proteins have different substrate specificities.

To investigate the target specificities of SR proteins, several in vitro methods have been developed. One widely used method is SELEX (Systematic Evolution of Ligands by

Exponential Enrichment), which enriches for high‐affinity binding sequences out of a randomized RNA pool (Tuerk and Gold, 1990). This method has been successfully used to identify binding motifs for a variety of SR proteins, including SF2/ASF, SC35, 9G8 and

SRp20 (Cavaloc et al., 1999; Tacke and Manley, 1995). In addition, several functional

SELEX strategies were developed to pinpoint functional motifs, which can promote in vitro or in vivo splicing (Coulter et al., 1997; Liu et al., 2000; Liu et al., 1998). Analysis of experimental data was aided by bioinformatic tools to identify potential binding sites,


including several web‐based software programs, such as ESEfinder (Cartegni et al.,

2003), RESCUE‐ESE (Fairbrother et al., 2002) and PESX (Zhang and Chasin, 2004).

Although a collection of potential binding motifs has been identified for SR proteins in vitro, it is not clear whether these sequence motifs are functional in vivo.

To solve this puzzle, several methods have been developed, aiming to identify the binding targets of RBPs in vivo. One of them is RIP (ribonucleoprotein immunoprecipitation assay), which takes advantage of reversible cross‐linking by formaldehyde (Niranjanakumari et al., 2002). The RNA fragments isolated from specific

RNP complexes can then be further analyzed by microarray (Keene et al., 2006). Based on a similar principle, another technology termed CLIP (Cross Linking and

Immunoprecipitation) was later developed by using a more specific UV cross‐linking strategy (Ule et al., 2003). Moreover, this method is compatible with the NGS (next‐generation sequencing) platform to allow for genome‐wide target identification. For instance, CLIP was recently performed for the Ago protein, which interacts with both miRNAs and their target mRNAs. The results helped establish genome‐wide interaction maps for individual miRNAs (Chi et al., 2009). CLIP‐Seq has been widely used to characterize the in vivo targets of RBPs, including Nova (Ule et al., 2003), SF2/ASF

(Sanford et al., 2008), FOX2 (Yeo et al., 2009) and a viral ORF57 (Kang et al., 2011;

Majerciak et al., 2006). In order to further enhance the performance of CLIP, PAR‐CLIP


(Photoactivatable‐Ribonucleoside‐Enhanced Cross‐linking and Immunoprecipitation) has recently been developed by taking advantage of photo‐reactive ribonucleoside analogs (Hafner et al., 2010a). Compared with the traditional CLIP‐seq approach, the UV cross‐linking between the modified ribonucleoside and the protein is more specific and more efficient, and the crosslinked positions are marked by T > C mutations within or in close proximity to the real RBP binding sites (Hafner et al., 2010a).

In summary, a large collection of RNA‐protein interaction data is expected to shed light on the mechanisms underlying regulated RNA biogenesis as well as the biological significance of such interactions in diverse cellular processes.

1.7 SR protein homeostasis

Given their functional importance in gene regulation, the steady‐state level of individual SR proteins needs to be tightly controlled (Grosso et al., 2008; Karni et al.,

2007). SR protein homeostasis can be achieved transcriptionally and/or post‐transcriptionally. Similar to other regulatory factors in mammalian genomes, the transcription of SR genes is under the control of transcription factors. For example, E2F1 can transcriptionally activate the expression of SC35, leading to a switch in the splicing pattern of several apoptotic genes (Merdzhanova et al., 2008; Merdzhanova et al., 2010).

Transcription of SR proteins can also be regulated under pathological conditions such as


HPV (human papillomavirus) infection. It has been shown that HPV transcription/replication factor E2 specifically upregulates the expression of several SR proteins, including SF2/ASF, SRp20 and SC35 (McFarlane and Graham, 2010; Mole et al.,

2009).

Negative feedback regulation is another common mechanism to maintain the steady‐state levels of SR proteins. For instance, excessive expression of SRp20 (Jumaa and Nielsen, 1997) and SC35 (Sureau et al., 2001) can specifically feed back on their own splicing, leading to the production of unstable protein or mRNA. In the case of SF2/ASF, autoregulation has been proposed to occur at multiple levels, including unproductive splicing of its transcripts (Lareau et al., 2007; Ni et al., 2007) and inhibition of translation initiation (Sun et al., 2010).

In addition to regulation at the expression level, SR proteins can also be regulated at the activity level by protein modifications. The RS domain of an SR protein is usually responsible for its localization (Caceres et al., 1997). Phosphorylation on the serine residues of the RS domain is critical for regulating both subcellular localizations and activities of SR proteins (Lin and Fu, 2007; Mermoud et al., 1994). It is well known that some SR proteins are stored in nuclear compartments termed speckles (Spector, 1993).

Upon transcriptional activation, they are redistributed to nascent transcripts possibly due to different phosphorylation status (Jimenez‐Garcia and Spector, 1993; Misteli et al.,


1997). Phosphorylation of the RS domain of SF2/ASF directly influences its relative abundance in nuclear and cytoplasmic compartments, and its ability to bind RNA substrates (Sanford et al., 2005; Xiao and Manley, 1997). One striking example is that phosphorylation can switch SRp38 from a general splicing repressor to a sequence‐specific activator (Feng et al., 2008). Three protein kinase families have been reported to phosphorylate SR proteins: SRPK (SR protein kinase) (Gui et al., 1994), Clk/Sty protein kinase (Colwill et al., 1996), and DNA topoisomerase I (Rossi et al., 1996). Similar to phosphorylation, a recent study showed that arginine methylation also contributes to the regulation of localization and activities of SF2/ASF (Sinha et al., 2010).

In summary, the expression level and activity of individual SR proteins are extensively regulated at multiple levels to achieve homeostasis, and the final output varies in a tissue‐ and cell type‐specific manner.

1.8 MicroRNA biogenesis

MicroRNAs (or miRNAs) are a class of small non‐coding RNAs important for post‐transcriptional gene regulation. With a few known exceptions, the majority of miRNAs are transcribed by RNA Pol II as long primary transcripts (pri‐miRNAs). Pri‐miRNAs are first processed in the nucleus by an RNase III‐like enzyme, Drosha, in complex with DGCR8 (DiGeorge syndrome Critical Region gene 8) to liberate ~70nt pre‐miRNAs (miRNA precursors). Pre‐miRNAs are then exported to the cytoplasm by

Exportin 5, where they are further cleaved by Dicer to produce mature miRNAs. Mature miRNAs are subsequently loaded into an RNA‐induced silencing complex (RISC) to guide downstream gene repression (Bartel, 2004).

Although most early studies focused on the biological functions of miRNAs, emerging evidence suggests that miRNA processing itself is a highly regulated event.

Similar to protein‐coding genes, regulation of miRNA expression can happen at multiple levels, including transcriptional control, post‐transcriptional modification and possibly miRNA stability after maturation.

Since pri‐miRNAs are transcribed by Pol II machinery, it is expected that the promoters of miRNA genes are subject to transcriptional controls similar to protein‐coding genes. Indeed, the proto‐oncogene c‐Myc can both activate (miR‐17‐92 cluster and miR‐7) and repress (miR‐22 and miR‐29c) miRNA expression (Chang et al., 2008).

Similar to co‐transcriptional pre‐mRNA splicing, it has been shown that Drosha cleavage is a co‐transcriptional event, and retention of pri‐miRNAs at the transcriptional sites enhances their conversion to pre‐miRNAs (Pawlicki and Steitz, 2008). Although up to

82% of miRNA genes may reside in intronic regions, preliminary studies showed that intronic miRNAs can be processed before the splicing reaction (Kim and Kim, 2007). It will be interesting to test if the miRNA processing machinery and the splicing


machinery collaborate or interfere with each other in intronic miRNA processing

(Kataoka et al., 2009).

In addition, early studies showed that multiple pri‐miRNAs are detected in tumor cells but are not processed into precursor or mature miRNAs, suggesting additional regulation at the post‐transcriptional level (Thomson et al., 2006). Several

RNA‐binding proteins have been implicated in regulating pri‐miRNA maturation, including hnRNP A1 and KSRP (KH‐type Splicing Regulatory Protein) (Guil and

Caceres, 2007; Trabucchi et al., 2009). Interestingly, a recent study showed that hnRNP

A1 and KSRP have antagonistic roles in the post‐transcriptional regulation of let‐7a expression, where they compete for a binding site in the terminal loop of pri‐let‐7a‐1

(Michlewski and Caceres, 2010). As for pri‐miR‐142, it is subjected to RNA editing by

ADAR1 and ADAR2 (Adenosine Deaminase) in hematopoietic tissues, resulting in suppression of its processing by Drosha (Yang et al., 2006). At the pre‐miRNA level,

Lin28 has been shown to induce uridylation of the pre‐let‐7 transcripts and promote their degradation (Heo et al., 2008).

After maturation, individual miRNAs are also subject to tight control at multiple levels to regulate their local concentrations. For example, mature miR‐29b has been shown to be predominantly localized in the nucleus, and this localization depends on a hexanucleotide motif (Hwang et al., 2007). Although the


detailed mechanism is not clear, this suggests that the transportation of miRNAs is highly regulated. In Caenorhabditis elegans, mature miRNAs can be degraded by 5’→3’ exoribonuclease XRN‐2 (Chatterjee and Grosshans, 2009). However, it is not clear if this is also the case in mammals.

It has been shown that several proteins play critical roles in miRNA processing, including the Drosha‐DGCR8 complex, Exportin 5 and Dicer. After maturation, miRNAs have to be associated with the RISC complex in order to function properly. Besides stringent control on miRNAs themselves, the individual protein factors involved are subject to extensive regulation as well. One interesting example is the reciprocal regulation between Drosha and DGCR8: the Drosha‐DGCR8 complex cleaves the hairpin structures embedded in DGCR8 mRNA and thereby decreases its stability, while

DGCR8 protein helps stabilize Drosha protein via protein‐protein interaction (Han et al.,

2009). Similarly, Dicer activity is dependent on its partner TRBP (TAR (HIV‐1) RNA‐binding protein) (Gatignol et al., 1991). Diminished TRBP leads to Dicer destabilization and defects in miRNA processing (Chendrimada et al., 2005; Melo et al., 2009; Paroo et al., 2009). AGO (Argonaute) proteins and GW182 (TNRC6A, trinucleotide repeat containing 6A) are core components of the RISC complex. Of the two AGO proteins identified in Drosophila, dAGO1 is primarily associated with the miRNA pathway, whereas dAGO2 mainly functions in the RNAi pathway (Czech et al., 2009; Ghildiyal et al., 2010;


Okamura et al., 2009). However, all four vertebrate AGO proteins seem to have overlapping functions in the miRNA pathway (Azuma‐Mukai et al., 2008). Several mechanisms have been proposed for regulating AGO proteins. For example, Argonaute complexes are stabilized by HSP90 (Heat Shock Protein 90) in human cells (Johnston et al., 2010). One let‐7 target, mLin41 (mouse homologue of lin‐41), promotes the ubiquitylation of Ago2 and its subsequent proteasomal degradation (Rybak et al., 2009).

While accumulating data suggest that miRNA biogenesis is regulated at multiple levels, additional factors especially RNA‐binding proteins may be involved directly or indirectly. As stated before, several splicing factors can regulate miRNA maturation through direct binding to pri‐miRNAs. Similar to splicing regulation, KSRP and hnRNP

A1 can even have antagonistic roles during let‐7a maturation. It will be interesting to test if miRNA biogenesis is generally regulated by splicing regulatory proteins, how intronic miRNAs are processed together with their host genes, and whether coordination or antagonism between different RBPs is a common mechanism in regulating miRNA maturation.

1.9 MicroRNA functions

With few known exceptions (Vasudevan et al., 2007), miRNAs typically bind to the 3’ UTRs of protein‐coding genes and repress their expression by translational


inhibition and/or promoting mRNA degradation. Translational repression may take place at both initiation and post‐initiation steps, and several distinct mechanisms may be involved (Huntzinger and Izaurralde, 2011). One interesting observation is that the RISC complex can help to recruit eIF6, a ribosome inhibitory protein known to prevent productive assembly of the 80S ribosome (Chendrimada et al., 2007). As for target degradation, mRNA targets are usually deadenylated and/or decapped first, followed by 5’→3’ degradation with exonuclease XRN1 (Huntzinger and Izaurralde, 2011). The choice between translational repression and RNA degradation largely depends on the degree of complementarity between a miRNA and its target. Perfect matching usually leads to RNA degradation. Taken together, miRNA‐mediated gene repression may take place at the mRNA level, translational level or both, depending on the specific pathophysiologic conditions.

The minimum sequence requirement for miRNA targeting is thought to be the conserved Watson‐Crick pairing between the 5’ seed region of a miRNA and its target mRNA (Bartel, 2009). It is estimated that each miRNA can have hundreds of functional targets (Bartel, 2004). Therefore, miRNAs may be broadly involved in diverse cellular processes to establish and/or maintain cell identity (Alvarez‐Garcia and Miska, 2005).
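The seed‐pairing rule described above can be illustrated with a minimal sketch. It scans a 3’ UTR for sites that are perfectly Watson‐Crick complementary to the seed of a miRNA. The function names, the choice of miRNA positions 2‐8 as the seed window, and the example sequences are assumptions made for illustration only; they are not taken from any target prediction program used in this work.

```python
def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A-U and G-C pairs only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_seed_sites(mirna, utr, seed_start=2, seed_end=8):
    """Return 0-based UTR positions of perfect seed matches.

    The seed is taken as miRNA nucleotides seed_start..seed_end
    (1-based, inclusive). Positions 2-8 are an assumption of this sketch.
    DNA input is accepted: T is converted to U before matching.
    """
    seed = mirna.upper().replace("T", "U")[seed_start - 1:seed_end]
    site = reverse_complement_rna(seed)   # sequence expected in the mRNA
    utr = utr.upper().replace("T", "U")
    hits = []
    pos = utr.find(site)
    while pos != -1:                      # collect every occurrence, including overlaps
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


# Hypothetical miRNA and UTR sequences, for illustration only.
toy_mirna = "UACGGAUCUAGCCAUUAGCGUA"
toy_utr = "AAAGAUCCGUAAAAGAUCCGU"
print(find_seed_sites(toy_mirna, toy_utr))
```

A perfect seed match found this way is only a candidate site; site conservation and sequence context are not considered in this sketch.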

Alterations in miRNA expression often lead to severe pathological consequences and are frequently observed in human diseases (Esquela‐Kerscher and Slack, 2006; Lee and


Dutta, 2009). For instance, the miR‐17‐92 cluster has been shown to function as an oncogene in a mouse B‐cell lymphoma model (He et al., 2005).

Furthermore, it has been proposed that miRNA expression profiles might be even more informative than mRNA expression profiles in classifying poorly differentiated tumors (Lu et al.,

2005). Given the functional importance of miRNAs in pathophysiologic conditions, the underlying mechanisms of miRNA‐mediated gene regulation deserve further characterization.

Emerging evidence shows that miRNAs are key developmental regulators in metazoans. The first miRNA gene identified, lin‐4, was actually discovered as an essential gene for postembryonic development in C. elegans (Lee et al., 1993; Wightman et al., 1993). It is thought that embryonic development and the subsequent morphogenetic processes in C. elegans are achieved through regulation of local morphogen gradients and titration of transcriptional controls at discrete time windows.

Such regulation may be in part realized by the “fine‐tuning” of miRNAs as in the case of miRNA lin‐4 and its target lin‐14 (Wightman et al., 1993). In Drosophila, it has been shown that the reciprocal negative feedback between transcription factor Yan and miR‐

7, which serves as a bistable switch, is crucial for eye development (Li and Carthew,

2005).


MicroRNA‐mediated gene regulation is also important for mammalian development. Targeted deletion of Dicer in mice results in embryonic lethality as early as E7.5 (embryonic day 7.5) (Bernstein et al., 2003). At the tissue level, muscle‐specific miR‐1 is critical for cardiogenesis by targeting transcription factor Hand2 (Heart and neural crest derivatives expressed 2) (Zhao et al., 2005). Brain‐enriched miR‐124 is known to repress Sox9 (SRY box transcription factor 9) expression, which is important for the differentiation of stem cell lineage in the subventricular zone into neurons

(Cheng et al., 2009). Notably, a recent study from Blelloch’s group suggested that opposing miRNA families are involved in regulating the self‐renewal and/or differentiation of mouse embryonic stem cells (Melton et al., 2010). These studies clearly demonstrate the importance of miRNAs in mammalian development.

Although the regulatory functions of miRNAs are conserved from worms to humans, the effects of miRNAs on their targets are usually modest compared with transcriptional control. However, it is conceivable that miRNAs could function as critical developmental switches through fine‐tuning the expression of important regulatory proteins. For instance, miR‐9* and miR‐124 can target several key components in the chromatin‐remodeling complexes, which are critical for the mitotic exit of neural progenitors during the development of the vertebrate nervous system (Yoo et al., 2009).


Taken together, miRNAs are key regulators of gene expression during development and pathophysiologic conditions. The biological significance of miRNA functions needs to be considered at the systems level.

1.10 MicroRNA-mediated gene regulatory networks

Because each miRNA could have hundreds of potential targets, microRNA‐mediated gene regulation is likely to form intricate gene regulatory networks (GRNs), and to have broad impacts on gene expression. While miRNAs operate through a repressive mechanism, their functions in gene regulatory networks may not be simply repressive.

In fact, they may have diverse functions depending on the unique network context.

The best‐known miRNA‐containing circuit is the miR‐17/E2F1 circuit: whereas both E2F1 and miR‐17 are transcriptionally activated by c‐Myc during the cell cycle, miR‐17 can repress E2F1 expression (OʹDonnell et al., 2005). In this scenario, the miRNA functions as a negative feedback regulator to maintain the expression level of its target.

It has been suggested that miRNAs may act as key players in canalizing genetic programs, buffering stochastic perturbations and thereby conferring robustness during development (Hornstein and Shomron, 2006). Therefore, it is not surprising that circuits corresponding to negative transcriptional coregulation of miRNAs and


their targets are prevalent in the neuronal system, which may help maintain neuronal homeostasis (Tsang et al., 2007).

In the case of positive feedback, miRNAs can function to reinforce the upstream transcriptional decision. This may be very useful for acute responses to environmental or intracellular stimuli, such as the inflammatory response. For example, Kevin Struhl’s group recently showed a complex positive feedback loop consisting of Src, NF‐κB, Lin28, let‐7 miRNA and IL‐6. Src activation triggers an inflammatory response mediated by NF‐κB, resulting in active transcription of Lin28.

Lin28 rapidly reduces the level of let‐7, while let‐7 inhibits IL‐6 expression under normal conditions. A higher level of IL‐6 activates NF‐κB and another transcription factor, STAT3, which further represses let‐7 expression (Iliopoulos et al., 2009).

One potential caveat in the miRNA field is that most studies have focused solely on the miRNA‐target relationship. Since both miRNAs and their putative targets are embedded in complex regulatory networks, it may be hard to elucidate the true biological significance of such targeting events unless the local network structure can be resolved and considered at the systems level.

For example, hnRNP A1 is known to be involved in the splicing of a variety of pre‐mRNAs and the biogenesis of some miRNAs, including let‐7a and miR‐18a

(Michlewski and Caceres, 2010; Michlewski et al., 2008a). However, these two


regulations have only been investigated independently. It will be interesting to test whether hnRNP A1 and the miRNAs regulated by hnRNP A1 share common mRNA targets, and whether these two processes are coordinated in pathophysiologic conditions. Incorporating miRNAs into GRNs is expected to help us better understand how miRNA‐mediated gene regulation operates at the systems level.


2. Methods

2.1 Cell culture and transfection

HeLa, HEK293T and MEF/3T3 cells were cultured in DMEM supplemented with

10% FBS (fetal bovine serum). To generate the SF2/ASF stable cell line, HeLa cells were transfected with an STP retroviral vector containing the human SF2/ASF cDNA; stable transductants were selected under puromycin (2 μg/ml, Invitrogen). MEF/3T3 tet‐off cells stably transfected with a T7‐SRp20 vector or an empty vector were obtained as gifts from the Zheng Laboratory (Jia et al., 2010). Stable clones were maintained in media with Doxycycline (2 μg/ml, Sigma). After removal of Doxycycline, cells were induced and harvested at the indicated time points for SF2/ASF (0, 24, 48, 72‐hr) or for SRp20 (0 and

96‐hr) in TRIzol (Invitrogen). Transient transfection was performed with Lipofectamine‐

2000 (Invitrogen) or TriFECTin (IDT) following the manufacturer’s instructions.

2.2 Plasmids

T7‐tagged SF2/ASF, SC35, SRp20 and 9G8 were cloned into the pCGT7 vector

(Caceres et al., 1997). To construct the hnRNPK minigene reporter, an EGFP cDNA fragment was first cloned into the pTag2A vector (Stratagene). A genomic hnRNPK


fragment, which corresponds to the last intron and its neighboring exons, was amplified and cloned in frame with the EGFP gene with the primers 5’‐CGTCATGAGTCGGGAGCTTC‐3’ and 5’‐GCAGGACTCCTTCAGTTCTTCA‐3’. For the pCG‐miR‐7 construct, we first cloned EGFP into the pcDNA3.1+ vector (Invitrogen). The miR‐7‐1 precursor sequence was then cloned downstream of the EGFP gene with the primers 5’‐AAAACTGCTGCCAAAACCAC‐3’ and 5’‐GCTGCATTTTACAGCACCAA‐3’. All clones were verified by sequencing. Mutations in the putative SF2/ASF binding site were introduced with the primers 5’‐TAGAAGATTCATTGGATGTTGAACAAGATCTGTGTGGAAGACTAGTGA‐3’ and 5’‐TTGTCCTGTAGAGGCATGAACAGAGCCATATGGCAGACTGTGA‐3’.

2.3 Profiling miRNA expression by deep sequencing

2.3.1 Construction of miRNA sequencing library

Cells were harvested at the indicated time points after SF2/ASF induction. Total

RNAs were isolated using the mirVana miRNA isolation kit (Ambion) following the manufacturer’s instructions. MicroRNA sequencing libraries were constructed as previously described (Lau et al., 2001) with several minor modifications. Starting with the total RNAs, small RNAs (17‐27nt) were enriched by 15% TBE‐UREA gel


purification. Size‐fractionated RNAs were sequentially ligated to a 3’ adaptor (5’‐rAppTTTAACCGCGAATTCCAG/3ddC/‐3’) and a 5’ adaptor (5’‐TGGAATrUrCrUrCrGrGrGrCrArCrCrArArGrGrU‐3’). Linker‐ligated RNAs were purified using 10% TBE‐UREA gel and then reverse transcribed with Superscript III (Invitrogen). The resulting first‐strand cDNAs were subjected to 15 cycles of add‐on PCR using the primers 5’‐AATGATACGGCGACCACCGACACTCTTGGAATTCTCGGGCACCAAG‐3’ and 5’‐CAAGCAGAAGACGGCATACGAGCTCTTCGCTGGAATTCGCGGTTAAA‐3’. The PCR products were purified on a 10% polyacrylamide gel and sequenced with an Illumina/Solexa 1G Genome Analyzer. The sequencing reaction was initiated with a custom‐designed primer (5’‐CTCTTGGAATTCTCGGGCACCAAG‐3’). 35 sequencing cycles were carried out following the manufacturer’s procedure.

2.3.2 Analysis of miRNA deep sequencing data

Raw Illumina/Solexa reads were first consolidated by clustering reads with identical sequence. After both 3’ and 5’ adaptor sequences were trimmed, the remaining sequence tags were mapped back to the human genome using MEGABLAST

(http://blast.ncbi.nlm.nih.gov/Blast.cgi). Genomic positions of each sequence were compared with genome annotations downloaded from the UCSC Genome Browser

(http://hgdownload.cse.ucsc.edu/downloads.html), including UCSC genes, RNA genes


and RepMask 3.2.7. Known miRNAs were then identified based on perfect match with miRBase annotations (Release 12; http://microrna.sanger.ac.uk/). The relative expression level of each miRNA was determined by its count normalized to the total miRNA count of the corresponding library. To identify differentially expressed miRNAs upon SF2/ASF induction, Fisherʹs exact test was conducted to calculate the p value with the statistics module in the R package (http://cran.r-project.org/web/packages/statmod/index.html). The False Discovery Rate was estimated with a multiple test correction method. Candidate miRNAs with a fold change > 1.5 and q < 0.01 compared to the un‐induced condition were clustered with Cluster2 (Cluster analysis and display of genome‐wide expression patterns), and the TreeView package (http://rana.lbl.gov/EisenSoftware.htm) was used to visualize the z‐scores of fold changes (two proportion z‐test), which are shown in the respective heat maps. The raw sequencing data were uploaded to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) with the accession number SRA010840.5.
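The filtering logic of this section (library‐size normalization, Fisherʹs exact test, multiple test correction, and the fold change and q value cutoffs) can be sketched as follows. This is an illustrative Python re‐implementation under stated assumptions, not the R code used for the analysis: the Benjamini‐Hochberg procedure stands in for the unspecified multiple test correction method, a pseudocount of one read is added to avoid division by zero, a fold change is accepted in either direction, and all function names and read counts are hypothetical.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):  # hypergeometric log-probability of observing x in the top-left cell
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - _log_comb(n, col1)

    observed = log_p(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_p(x)
        if lp <= observed + 1e-7:  # tables at least as extreme as the observed one
            total += math.exp(lp)
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (q values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def call_differential(uninduced, induced, min_fold=1.5, max_q=0.01):
    """Return {miRNA: (fold_change, q)} for miRNAs passing both cutoffs."""
    total_u, total_i = sum(uninduced.values()), sum(induced.values())
    names = sorted(set(uninduced) | set(induced))
    pvals, folds = [], []
    for name in names:
        cu, ci = uninduced.get(name, 0), induced.get(name, 0)
        pvals.append(fisher_exact_two_sided(cu, total_u - cu, ci, total_i - ci))
        # Counts are normalized to library size; the +1 pseudocount is an assumption.
        folds.append(((ci + 1) / total_i) / ((cu + 1) / total_u))
    qvals = benjamini_hochberg(pvals)
    return {
        name: (fold, q)
        for name, fold, q in zip(names, folds, qvals)
        if q < max_q and (fold > min_fold or fold < 1 / min_fold)
    }


# Hypothetical read counts, for illustration only.
uninduced = {"miR-a": 100, "miR-b": 5000, "miR-c": 4900}
induced = {"miR-a": 1000, "miR-b": 4500, "miR-c": 4500}
print(sorted(call_differential(uninduced, induced)))
```

The log‐gamma formulation is used so that the hypergeometric probabilities remain computable for libraries containing millions of reads.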

2.4 Dual-Luciferase reporter assay

HEK293T cells were grown to ~50% confluence in 24‐well plates. For each transfection, two luciferase reporters, pRL‐SV40 (Renilla luciferase, Promega) and pcDNA‐Luc with or without the 3’ UTR region of the target gene (Firefly luciferase, see


above), were mixed at a 1:2.5 molar ratio. For SFRS1, a 3’ UTR mutant with the deletion of the miR‐7 seed match was also included. For genes with different splicing variants, only shared miR‐7 target sites were considered (see Appendix A for a complete list of the primer sequences). Synthetic miR‐7 precursors or control RNAs (Applied Biosystems) were co‐transfected at a final concentration of 25 nM. Firefly and Renilla luciferase activities were analyzed by a Dual‐Glo luciferase assay system (Promega).

2.5 Semi-quantitative PCR

15‐20 cycles of radiolabeled PCR were carried out to ensure that the amplifications were in the linear range. The resulting PCR products were resolved by

8% TBE‐PAGE gel and detected with a Storm 840 PhosphorImager. The signal was quantified by ImageQuant (Amersham).

Radiolabeled PCR reactions were performed with an individual primer set for each gene (see Appendix B for a complete list of the primer sequences).


2.6 Cross-linking and Immunoprecipitation (CLIP)

2.6.1 CLIP-PCR

CLIP analysis of SF2/ASF and SRp20 was performed as described (Ule et al.,

2003) with a few minor modifications. Briefly, HeLa or MEF/3T3 cells were cultured in

10‐cm dishes; UV (265nm) cross‐linking was carried out at 50 mJ/cm2. Crosslinked cells were collected and lysed in RIPA buffer (50 mM Tris‐HCl pH 7.4, 150 mM NaCl, 1 mM

EDTA, 1% Triton X‐100, 0.1% SDS, 1% Deoxycholate). After DNase treatment, the cell lysate was treated with RNase A (Promega) for 10 min at a final dilution of 1:1,000,000.

The reaction was stopped by adding 200U RNase inhibitor (Invitrogen).

Immunoprecipitation was carried out at 4 ℃ for 2 hrs with Protein A/G PLUS‐agarose beads (Santa Cruz) coupled with SF2/ASF monoclonal antibody AK96 (Hanamura et al.,

1998), or for 1 hr with Protein G Dynabeads (Invitrogen) coupled with T7 monoclonal antibody (Novagen). After extensive washing, SF2/ASF‐ or SRp20‐bound RNAs were released by Proteinase K treatment, followed by phenol extraction and ethanol precipitation. The resulting RNAs were treated with DNase I and reverse transcribed with Superscript II and random hexamers. Quantitative PCR was then performed as described before for SF2/ASF‐bound RNA samples. Regular realtime PCR was


performed for SRp20‐bound RNA samples (see Appendix C for a complete list of the primer sequences).

2.6.2 PAR-CLIP

The SRp20 stable cell line was induced four days before harvest. PAR‐CLIP was performed as described before (Hafner et al., 2010a, b) using T7 monoclonal antibody

(Novagen). Briefly, 24 hours before harvest, 4‐thiouridine was added into the cell culture media to a final concentration of 100 μM. UV (365nm) cross‐linking was performed at

150 mJ/cm2. Crosslinked cells were harvested and lysed in NP40 lysis buffer (50mM

HEPES pH 7.5, 150mM KCl, 2mM EDTA, 1mM NaF, 0,5% (v/v) NP40 and 0.5mM DTT) for 15 minutes on ice. Cell lysate was cleared by centrifugation and filtering. After low concentration of RNase T1 digestion (1 U/μl, Fermentas), immunoprecipitation was carried out at 4 ℃ for 1 hour with Protein G Dynabeads (Invitrogen) coupled with T7 monoclonal antibody (Novagen). On beads RNase digestion was performed at a final

RNase T1 concentration of 100 U/μl for 15 minutes. After dephosphorylation, RNP complexes were radiolabeled with γ‐32P‐ATP using T4 PNK (NEB), which was followed by SDS‐PAGE gel size fractionation. After exposing the gel, the band corresponding to

SRp20 RNP was cut and transferred to D‐tube Dialyzer Midi with a molecular cutoff of

3.5 kDa (Novagen) for electroelution. After proteinase K (Roche) digestion, RNAs were

35

recovered by ethanol precipitation and subjected to the library construction similar to miRNA deep sequencing described before. Sequencing was performed with an

Illumina/Solexa 1G Genome Analyzer, and 40 sequencing cycles were carried out following the manufacturer’s procedure.

2.6.3 Analysis of PAR-CLIP sequencing data

Data analysis was performed as described before (Mukherjee et al., manuscript under review). Single‐end reads were first trimmed at both the 3’ and 5’ ends to remove linker sequences. The remaining reads were aligned to the reference genome (mm9, NCBI Build 37) with Bowtie (Langmead et al., 2009), allowing up to 3 mismatches. All T > C mismatches between the sequencing reads and the genome were subtracted from the mismatch count for each mapped location. Then, reads that mapped to a single genomic location with the minimum number of mismatches were used for further analysis.

Overlapping reads were grouped together for the identification of potential binding sites. For all positions with over 5 nucleotides of read depth, separate kernel density estimates for the T > C conversions and the non‐converted Ts were calculated.

Then, clusters were defined as regions with higher conversion density than non‐conversion density for more than 5 consecutive nucleotides, and the cluster regions were extended until the read depth dropped below 5 for the particular groups. Clusters were further filtered based on the following criteria before motif analysis. (1) Each cluster should contain T > C conversions at more than one location, to reduce noise and exclude possible natural polymorphisms. (2) Clusters derived from intergenic regions or repetitive elements were excluded from further analysis.
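The read-selection step described above (discounting T > C mismatches, then keeping only reads with a single best genomic location) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline that was actually used: the record layout (`read_id`, `locus`, and a list of `(reference_base, read_base)` mismatch pairs), the function names and the example loci are all hypothetical, and mismatches are assumed to be reported on the transcribed strand so that a 4-thiouridine conversion always appears as T > C.

```python
from collections import defaultdict


def adjusted_mismatches(mismatches):
    """Count mismatches in one alignment, ignoring T > C conversions."""
    return sum(1 for ref, read in mismatches if (ref, read) != ("T", "C"))


def select_unique_best(alignments):
    """Keep reads whose lowest adjusted mismatch count occurs at one locus only.

    `alignments` is a list of dicts with keys "read_id", "locus" and
    "mismatches" (a hypothetical layout chosen for this example).
    Returns a dict mapping read_id -> locus for the reads that are kept.
    """
    hits_by_read = defaultdict(list)
    for aln in alignments:
        score = adjusted_mismatches(aln["mismatches"])
        hits_by_read[aln["read_id"]].append((score, aln["locus"]))

    kept = {}
    for read_id, hits in hits_by_read.items():
        best = min(score for score, _ in hits)
        best_loci = [locus for score, locus in hits if score == best]
        if len(best_loci) == 1:  # ambiguous reads are discarded
            kept[read_id] = best_loci[0]
    return kept


# Made-up alignments: r1 has one clear best hit, r2 is ambiguous.
example_alignments = [
    {"read_id": "r1", "locus": "chr1:100", "mismatches": [("T", "C"), ("T", "C")]},
    {"read_id": "r1", "locus": "chr5:900", "mismatches": [("A", "G")]},
    {"read_id": "r2", "locus": "chr2:300", "mismatches": []},
    {"read_id": "r2", "locus": "chr7:450", "mismatches": [("T", "C")]},
]
kept_reads = select_unique_best(example_alignments)
```

In this toy input, read `r1` is kept at `chr1:100` because its two T > C conversions do not count against it, whereas `r2` is dropped because two loci tie after the adjustment.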

For motif finding, the locations of PAR‐CLIP clusters relative to a known transcript (exon, intron, 3’ UTR and 5’ UTR) were determined based on UCSC Genes annotation. All clusters mapped to mRNA were ranked by the number of reads comprising a cluster (log2(reads)). Potential SRp20 binding motifs enriched in the top clusters were computed by cERMIT with the default parameters (Georgiev et al., 2010).

For clusters with potential motifs, functional classification analysis was performed with the PANTHER (Protein ANalysis THrough Evolutionary Relationships) classification system (Thomas et al., 2003). P‐values were calculated for each individual category with Bonferroni correction. All categories with a p‐value less than 0.01 are shown in Appendix E.
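As a reminder of what the Bonferroni correction does, the sketch below multiplies each raw p-value by the number of categories tested and keeps those that remain below 0.01. The category names and p-values are invented for illustration, and the function names are hypothetical; PANTHER performs this step internally.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return {name: min(1.0, p * n_tests) for name, p in p_values.items()}


def significant_categories(p_values, threshold=0.01):
    """Names of categories whose Bonferroni-adjusted p-value is below threshold."""
    adjusted = bonferroni_adjust(p_values)
    return sorted(name for name, p in adjusted.items() if p < threshold)


# Hypothetical raw p-values for three made-up categories.
raw_p_values = {"category_a": 0.0001, "category_b": 0.004, "category_c": 0.2}
reported = significant_categories(raw_p_values)
```

With three tests, `category_b` has a raw p-value below 0.01 but an adjusted value of 0.012, so only `category_a` would be reported.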

2.7 Northern blotting

Total RNAs were resolved on a 15% polyacrylamide TBE‐UREA gel and blotted onto a Hybond‐N+ membrane (Amersham). An LNA probe (5’‐A+CAA+CAA+AAT+CACTA+GTCTT+CCA‐3’; +N stands for LNA base), which is antisense to mature miR‐7, was labeled with γ‐32P‐ATP using T4 polynucleotide kinase (Invitrogen). After hybridization, the resulting membrane was exposed to a phosphorimaging screen (Amersham) for 1 hr to detect ectopically expressed miR‐7, or overnight to visualize the endogenous miR‐7. The signal intensity was analyzed with ImageQuant (Amersham). For the hnRNPK/EGFP minigene experiments, the levels of minigene mRNA and mature miR‐7 were first normalized to those of GAPDH mRNA and U6 snRNA, respectively. To normalize for transfection efficiency, gels were re‐loaded so that the minigene expression was kept relatively constant, such that the differences in miR‐7 levels could be better visualized.
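A quick way to see what the LNA probe detects is to take its reverse complement. The sketch below strips the `+` LNA markers from the probe sequence given above and derives the RNA sequence it would hybridize to; the function name is hypothetical and the code only illustrates the base-pairing logic.

```python
# Complement table for DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_target_rna(probe):
    """Return the RNA sequence (5' to 3') that a DNA/LNA probe is antisense to."""
    dna = probe.replace("+", "")  # "+" only marks the following base as LNA
    target_dna = dna.translate(DNA_COMPLEMENT)[::-1]  # reverse complement
    return target_dna.replace("T", "U")


MIR7_PROBE = "A+CAA+CAA+AAT+CACTA+GTCTT+CCA"
target = probe_target_rna(MIR7_PROBE)
print(target)  # UGGAAGACUAGUGAUUUUGUUGU
```

The result is a 23-nt sequence, as expected for a probe described as antisense to mature miR-7.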

2.8 Western blotting

Cells were directly lysed in Laemmli Sample Buffer (Bio‐Rad). Proteins were separated on 8% SDS‐PAGE gels and blotted with monoclonal antibodies against β‐catenin (Sigma), α‐tubulin (DM1A; Upstate), T7 epitope tag (Novagen), or SF2/ASF (AK96) (Hanamura et al., 1998) at 4 ℃ overnight. The signals were detected with Alexa Fluor 488 goat anti‐mouse IgG (H+L) antibody (Invitrogen) and quantified with a Storm 840 PhosphorImager.


2.9 RNA affinity purification

DNA templates for in vitro transcription were prepared by PCR from pCG‐miR‐7 and its SF2/ASF binding site mutant with the primers 5’‐TAATACGACTCACTATAGGGTAGAAGATTCATTGGATGTTGG‐3’ and 5’‐TTGTCCTGTAGAGGCATG‐3’. In vitro transcription was carried out with T7 RNA polymerase (NEB) following the manufacturer’s protocol. After gel purification, the affinity purification of miRNA binding factors from HeLa cell extract was performed as described before (Caputi et al., 1999). Briefly, in vitro transcribed RNA (500 pmol) was incubated in a mixture containing 0.1 M sodium acetate pH 5.0 and 5 mM sodium m‐periodate (Sigma) in the dark for 1 hr at room temperature. After ethanol precipitation, the RNA was resuspended in 500 μl of 0.1 M sodium acetate pH 5.0. A 400 μl aliquot of adipic acid dihydrazide agarose bead 50% slurry (Sigma) was washed four times with 0.1 M sodium acetate pH 5.0, and then mixed with the periodate‐treated RNA for 12 hrs at 4 ℃ with rotation. The beads coupled with RNA were washed three times with 2 M NaCl and three times with buffer D (20 mM HEPES‐KOH pH 7.6, 5% (v/v) glycerol, 0.1 M KCl, 0.2 mM EDTA, 0.5 mM dithiothreitol (DTT)). Then, the beads were incubated with 250 μl HeLa cell extract for 20 min at 30 ℃. After four washes with buffer D containing 4 mM MgCl2, protein factors associated with the immobilized RNAs were analyzed by Western blotting.


2.10 RNA interference

Dicer substrate small interfering RNA (DsiRNA) against SFRS1 (forward 5’‐CCAAGGACAUUGAGGACGUGUUCUA‐3’; reverse 5’‐UAGAACACGUCCUCAAUGUCCUUGGUU‐3’) was custom designed and synthesized (IDT). A total of 10⁶ HeLa cells were transfected with 100 pmol DsiRNA duplex using TriFECTin transfection reagent (IDT).

Cells were harvested 48 hrs after transfection. The knockdown efficiency was determined by RT‐PCR and Western blotting.
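The two DsiRNA strands listed above can be checked for complementarity with a few lines of code. This is only an illustrative sketch (the helper name is hypothetical): it confirms that the first 25 nt of the reverse strand are the reverse complement of the 25-nt forward strand, which leaves a 2-nt 3’ extension on the 27-nt strand.

```python
# Complement table for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


FORWARD = "CCAAGGACAUUGAGGACGUGUUCUA"    # 25 nt, as listed in the text
REVERSE = "UAGAACACGUCCUCAAUGUCCUUGGUU"  # 27 nt, as listed in the text

paired_region = reverse_complement_rna(FORWARD)
fully_paired = REVERSE.startswith(paired_region)
extension = REVERSE[len(paired_region):]
print(fully_paired, extension)  # True UU
```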

2.11 In vitro pri-miRNA processing assay

In vitro processing was performed as described before (Michlewski et al., 2008a).

Briefly, each reaction (30 μl) contained 20% (v/v) wild‐type or depleted HeLa cell extract, 0.5 mM ATP, 20 mM creatine phosphate (Sigma), 3.2 mM MgCl2, and 200,000 cpm (~100 fmol) of the in vitro transcribed pri‐miRNAs. The reactions were assembled on ice followed by incubation at 30 ℃ for 30 min. After phenol/chloroform extraction and ethanol precipitation, the RNA samples were resolved by 10% (w/v) TBE‐Urea gel electrophoresis and exposed overnight to X‐ray film at ‐80 ℃.


3. Negative feedback regulation between SFRS1 and miR-7

Negative feedback regulation is known to be important for maintaining the steady‐state level of regulatory proteins such as transcription factors (O’Donnell et al., 2005). In the case of splicing factors, several potential mechanisms have been proposed, as discussed in the previous sections. Since miRNAs are well known for fine‐tuning the expression of their targets, we hypothesized that miRNAs are also involved in the negative feedback regulation of splicing factors. During my thesis research, I focused on SF2/ASF, one of the best‐known members of the SR family of splicing factors in mammals, to study how its expression is regulated through a miRNA‐mediated negative feedback mechanism.

Surprisingly, we found that SF2/ASF is indeed involved in the maturation of a subset of miRNAs. One of them, miR‐7, can negatively regulate the expression of SF2/ASF at the translational level. To our knowledge, the negative feedback loop consisting of SF2/ASF and miR‐7 is the first example of such regulation identified between a splicing factor and a miRNA. This finding highlights the importance of RBPs in regulating miRNA maturation, and the broad involvement of miRNAs in fine‐tuning the expression of regulatory proteins.


3.1 Generation of a stable cell line system to recapitulate SF2/ASF feedback

SF2/ASF is a well‐known multifunctional RNA‐binding protein that plays diverse roles in post‐transcriptional gene regulation. This splicing factor is encoded by a single copy of the SFRS1 gene in the human genome. Several groups, including ours, have recently shown that SF2/ASF can negatively regulate its own expression, thereby keeping its expression level relatively stable in cells. Several mechanisms have been proposed by which SF2/ASF may repress its own expression at the splicing and translational steps (Lareau et al., 2007; Ni et al., 2007; Sun et al., 2010).

Using in vivo competition and deletion assays, Krainer’s group showed that the negative feedback is mainly mediated by the 3’ UTR of SFRS1, which is important for the polysome association and translation of its mRNA (Sun et al., 2010).

Since miRNAs are known to bind the 3’ UTRs of their target genes to repress gene expression via translation inhibition and/or mRNA degradation (Bartel, 2009), we suspected that miRNAs may be involved in the SFRS1 negative feedback. Indeed, based on TargetScan (Grimson et al., 2007; Lewis et al., 2005), 24 miRNAs could potentially target the 3’ UTR of SFRS1 (Figure 1). Thus, one plausible mechanism is that SF2/ASF may increase the expression of one or more miRNAs, which in turn repress the expression of the SFRS1 gene. To test this hypothesis, we established a stable HeLa cell line in which the expression of T7‐tagged SF2/ASF cDNA is under the control of an inducible Tet‐off promoter (Sun et al., 2010). While the overall SF2/ASF protein level was increased upon induction, the endogenous SF2/ASF protein was significantly reduced without any change in its mRNA level (Figure 2, A and B). These observations recapitulate the negative feedback regulation of SF2/ASF in vivo and serve as a foundation for my thesis study.


Figure 1: The microRNAs predicted to target the SFRS1 gene


Figure 2: SF2/ASF autoregulation in the stable cell line with inducible SF2/ASF expression

(A) Western blotting analysis of the endogenous and ectopically‐expressed (T7‐tagged SFRS1 cDNA) SF2/ASF proteins before and after SF2/ASF induction. β‐catenin was used as an internal control.

(B) The mRNA level of endogenous SFRS1 was quantified by radiolabeled RT‐PCR. GAPDH was used as an internal control.


3.2 Identification of the SF2/ASF-miR-7 feedback circuit

I next aimed to identify miRNAs differentially expressed in response to an elevated level of SF2/ASF. Since SF2/ASF may be broadly involved in miRNA biogenesis, as are hnRNP A1 and KSRP, we profiled miRNA expression genome‐wide instead of focusing on the 24 miRNAs potentially targeting SFRS1. Illumina/Solexa sequencing was used to monitor miRNA expression profiles along the time course of SF2/ASF induction (0, 24, 48 and 72‐hr). We obtained 1.37 million reads that could be perfectly mapped to 369 known human miRNAs (Table 1 and Figure 3). Among them, 40 miRNAs were differentially expressed (> 1.5‐fold change; Fisher’s exact test, q < 0.01) under one or more SF2/ASF‐induced conditions (Figure 4A). Four upregulated miRNAs (miR‐29b, miR‐221, miR‐222, and miR‐7) were validated by Northern blotting (Figure 4B and 5B).

Together, these data suggest that deep sequencing is a reliable method to detect differentially expressed miRNAs.
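The two criteria used to call a miRNA differentially expressed (fold change and Fisher’s exact test) can be sketched as below. This is a minimal illustration, not the analysis code used for the thesis. It assumes that each miRNA is tested with a 2x2 table of its read count versus all other mapped reads in the induced and control libraries; the text does not spell out how the table was built. The read counts are invented, the function names are hypothetical, and the multiple-testing step that converts p-values to q-values is not shown.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_total = log_choose(row1 + row2, col1)

    def log_prob(k):
        # Hypergeometric log-probability of k counts in the top-left cell.
        return log_choose(row1, k) + log_choose(row2, col1 - k) - log_total

    observed = log_prob(a)
    lowest, highest = max(0, col1 - row2), min(col1, row1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        exp(log_prob(k))
        for k in range(lowest, highest + 1)
        if log_prob(k) <= observed + 1e-7
    )
    return min(1.0, p_value)


def fold_change(count_a, total_a, count_b, total_b):
    """Ratio of library-size-normalized counts (A over B); count_b must be > 0."""
    return (count_a / total_a) / (count_b / total_b)


# Invented counts: one miRNA in an induced and a control library.
induced_count, induced_total = 400, 100_000
control_count, control_total = 200, 100_000

fc = fold_change(induced_count, induced_total, control_count, control_total)
p = fisher_exact_two_sided(
    induced_count, induced_total - induced_count,
    control_count, control_total - control_count,
)
passes_fold_filter = fc > 1.5 or fc < 1 / 1.5
```

Working in log space with `lgamma` keeps the hypergeometric probabilities computable for library sizes in the millions, where exact binomial coefficients would be impractically large.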

Table 1: Mappability of miRNA deep sequencing library

Category                                       Count        %
Total count in the sequencing data             3,248,250    100%
Total count of the sequences with linkers      2,199,067    68%
Mature miRNA sequences                         1,370,831    62%
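The percentages in Table 1 appear to use different denominators: the 68% is a fraction of all reads, whereas the 62% matches the fraction of linker-containing reads rather than of all reads. This reading is inferred from the numbers themselves and is not stated in the text. The arithmetic can be checked as follows (variable names are chosen for this example).

```python
total_reads = 3_248_250
reads_with_linkers = 2_199_067
mature_mirna_reads = 1_370_831

pct_with_linkers = 100 * reads_with_linkers / total_reads
pct_mirna_of_linker_reads = 100 * mature_mirna_reads / reads_with_linkers
pct_mirna_of_all_reads = 100 * mature_mirna_reads / total_reads

print(round(pct_with_linkers),           # 68
      round(pct_mirna_of_linker_reads),  # 62
      round(pct_mirna_of_all_reads))     # 42
```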


Figure 3: Genomic mapping of raw reads from miRNA sequencing libraries

After the removal of 3’ and 5’ linkers, the remaining sequences were mapped back to the human genome using MEGABLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Genomic positions of the sequences were further compared with the annotations from the UCSC Genome Browser, including UCSC genes, RNA genes and RepMask 3.2.7. Known miRNAs were then identified based on perfect matches with mature miRNA sequences from miRBase (Release 12; http://microrna.sanger.ac.uk/).


Figure 4: Differentially expressed miRNAs upon SF2/ASF induction

(A) Normalized counts of individual miRNAs at 24, 48 or 72‐hr after SF2/ASF induction were compared with that of the control cells without induction. Differentially expressed miRNAs (> 1.5‐fold change and q < 0.01) are shown and z‐scores were plotted in the heat map.

(B) Northern blotting analysis of three mature miRNAs (miR‐29b, miR‐221 and miR‐222) in the stable HeLa cell line along the time course of SF2/ASF induction. U6 snRNA was used as an internal control.


Among the 24 miRNAs that are predicted to target SF2/ASF, 22 showed detectable expression in at least one induction time point, and 18 miRNAs could be detected in all four time points. Whereas the majority of these miRNAs were not differentially expressed, miR‐7 clearly stood out in that it was consistently up‐regulated by ~2‐fold across all SF2/ASF induced conditions (Figure 5A). The increase was further confirmed by Northern blotting (Figure 5B). Together, these data suggest that SF2/ASF and miR‐7 could potentially form a negative feedback circuit (Figure 5C).


Figure 5: Potential negative feedback between SF2/ASF and miR‐7

(A) Of the 24 miRNAs that potentially target SFRS1 mRNA, 18 are expressed in all four conditions. The relative expression levels of these miRNAs under the SF2/ASF‐induced conditions (24, 48 and 72‐hr) were normalized to those of the control cells and the z‐scores of expression changes were plotted.

(B) Northern blotting analysis of mature miR‐7 in the stable cell line with inducible SF2/ASF expression. U6 snRNA was used as an internal control. The relative miR‐7 expression levels are shown at the bottom of the panel.

(C) A schematic diagram of a putative negative feedback circuit consisting of SF2/ASF and miR‐7.


3.3 SF2/ASF is required for efficient production of miR-7

To test whether SF2/ASF is directly required for miR‐7 maturation, we first examined the expression dynamics of miR‐7 in response to SF2/ASF induction at early time points between 0 and 24‐hr by Northern blotting (Figure 6A). SF2/ASF‐dependent upregulation of miR‐7 was observed as early as 6‐hr and peaked at about 12‐hr, implying that the regulation is likely to be direct.

In addition, loss‐of‐function analysis was carried out by knocking down the expression of endogenous SFRS1 using RNA interference in HeLa cells. Western blotting and quantitative RT‐PCR assays confirmed that endogenous SFRS1 expression was reduced to approximately 30% of its original level. While the internal control U6 was not affected, loss of SF2/ASF resulted in a 40% decrease in the mature miR‐7 level (p < 0.05, Figure 6B). Taken together, these results demonstrate that SF2/ASF is required for efficient production of mature miR‐7 in HeLa cells. However, since the endogenous pri‐miR‐7 and pre‐miR‐7 can hardly be detected, probably due to efficient miRNA processing, the step at which SF2/ASF functions remains uncertain.
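The statistics quoted for triplicate experiments (SEM error bars and a t‐test with n=3) can be sketched as below. The measurements are invented for illustration, and an unpaired Student’s t‐test with pooled variance is assumed; the thesis does not state which variant of the t‐test was used. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def sem(values):
    """Standard error of the mean, using the sample standard deviation."""
    return sqrt(variance(values) / len(values))


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic for two independent groups (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


# Two-tailed 5% critical value of Student's t for 4 degrees of freedom (n=3 + n=3).
T_CRITICAL_DF4 = 2.776

# Invented relative miR-7 levels from three control and three knockdown samples.
control = [1.00, 1.05, 0.95]
knockdown = [0.62, 0.58, 0.60]

t_value = pooled_t_statistic(control, knockdown)
significant = abs(t_value) > T_CRITICAL_DF4  # equivalent to p < 0.05
```

With only three replicates per group there are 4 degrees of freedom, so the t statistic must exceed 2.776 in absolute value to reach p < 0.05.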


Figure 6: SF2/ASF is required for efficient miR‐7 maturation

(A) Northern blotting analysis of mature miR‐7 in the stable cell line with SF2/ASF induction at the indicated time points between 0 and 24‐hr. U6 snRNA was used as a normalization control (bottom panel). The relative expression levels of miR‐7 from triplicate experiments were plotted in the top panel.

(B) HeLa cells were transfected with small interfering RNAs (siRNAs) against either Luciferase or SFRS1. Two days after transfection, radiolabeled RT‐PCR and Western blotting were used to monitor the mRNA (top two panels; GAPDH as an internal control) and protein (middle two panels; α‐tubulin as an internal control) levels of SFRS1, respectively. Bottom two panels: Northern blotting analysis of the endogenous miR‐7 level with U6 as an internal control. Results from triplicate experiments were plotted in the right panel. The * indicates p < 0.05 (t‐test, n=3). Error bars represent SEM.


3.4 SF2/ASF can be targeted by miR-7

We next examined whether SFRS1 mRNA is a bona fide target of miR‐7. A dual‐luciferase assay was initially used to confirm the target relationship. TargetScan predicts that the 3’ UTR of SFRS1 has a highly conserved seed match (m6+u1A) for miR‐7 (Figure 7A). We cloned the putative target site and its surrounding region into a firefly luciferase reporter (Figure 7B). The resulting construct or the empty vector control was co‐transfected into HeLa cells together with synthetic miR‐7 precursors or scrambled RNAs. A Renilla luciferase construct was also included to normalize for transfection efficiency. With the control plasmid, miR‐7 had little effect on the relative luciferase activity, compared to the scrambled RNAs. In contrast, the normalized luciferase activity of the SFRS1 reporter was reduced to half by miR‐7 (Figure 7C). The down‐regulation is dependent on the predicted miR‐7 seed match, because the luciferase activity was restored when the site was deleted (Figure 7C).
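The normalization behind a dual‐luciferase readout can be sketched as follows: the firefly signal is first divided by the co‐transfected Renilla signal, and the miR‐7 sample is then expressed relative to the scrambled‐RNA control for the same construct. The luminometer readings below are invented and the function names are hypothetical; this shows the arithmetic only.

```python
def normalized_activity(firefly, renilla):
    """Firefly signal corrected for transfection efficiency via Renilla."""
    return firefly / renilla


def relative_activity(mir7_reading, control_reading):
    """Normalized activity with miR-7 relative to the scrambled-RNA control.

    Each reading is a (firefly, renilla) pair from the same construct.
    """
    return normalized_activity(*mir7_reading) / normalized_activity(*control_reading)


# Invented readings for a reporter carrying the target site and for the empty vector.
reporter_relative = relative_activity(
    mir7_reading=(2000.0, 4000.0), control_reading=(3000.0, 3000.0)
)
empty_vector_relative = relative_activity(
    mir7_reading=(2900.0, 2900.0), control_reading=(3000.0, 3000.0)
)
print(reporter_relative, empty_vector_relative)  # 0.5 1.0
```

A value near 1 for the empty vector and near 0.5 for the reporter would correspond to the pattern described above, in which miR‐7 halves the activity of the SFRS1 reporter but leaves the control plasmid largely unaffected.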


Figure 7: SFRS1 is a target of miR‐7

(A) Base‐pairing between mature miR‐7 and its putative target site in the 3’ UTR of SFRS1.

(B) Schematic diagram of the luciferase reporter constructs. The putative miR‐7 target site is shown in black.

(C) Dual‐luciferase assays were performed in triplicate. For each construct, the relative luciferase activity was plotted by normalizing between cells transfected with control RNAs (dark gray) and miR‐7 precursors (light gray).

(D) HeLa cells were transfected with either control RNAs or synthetic miR‐7 precursors. Two days after transfection, the levels of mature miR‐7 and endogenous SF2/ASF protein were quantified by Northern (U6 as an internal control; top 2 panels) and Western blotting (α‐tubulin as an internal control; bottom 2 panels), respectively. Representative gels are shown. The relative levels of SF2/ASF protein from 3 separate transfections were plotted (t‐test: p < 0.05; middle panel). Quantitative RT‐PCR results of endogenous SFRS1 mRNA in the transfected cells are shown in the right panel, and GAPDH mRNA was used as an internal control.

Error bars represent SEM.


In addition, I transiently transfected HeLa cells with synthetic miR‐7 precursors to determine their effect on endogenous SF2/ASF expression. As a negative control, HeLa cells were transfected with control RNAs. The results showed that ectopic miR‐7 expression significantly reduced the level of endogenous SF2/ASF protein (p < 0.05) while the mRNA level of SFRS1 remained constant (Figure 7D). We did not detect significant upregulation of SF2/ASF protein when endogenous miR‐7 was blocked in HeLa cells (data not shown), possibly due to the relatively low level of miR‐7 in these cells. As an alternative, we used HEK293 cells, in which the mature miR‐7 level is considerably higher (Reddy et al., 2008). Knocking down endogenous miR‐7 led to a reproducible increase of SF2/ASF protein level by over 30% (Figure 8). Taken together, these results argue that SFRS1 is a physiological target of miR‐7, and that the repression occurs at the translational level.


Figure 8: Inhibition of endogenous miR‐7 increases the level of SF2/ASF protein.

HEK293T cells were transfected with control RNAs or 2’‐OMe oligonucleotides against mature miR‐7 using TriFECTin transfection reagent (IDT). The procedure was repeated one more time 24‐hr after the initial transfection. Double‐transfected cells were harvested 1‐day after the second transfection.

(A) Northern blotting was performed to examine the knock‐down efficiency for endogenous miR‐7 (top two panels). The levels of SF2/ASF protein in the control and miR‐7 knockdown cells were determined by Western blotting (bottom two panels). The * indicates phosphorylated SF2/ASF protein. Representative gels are shown.


(B) The protein level of SF2/ASF was normalized to that of α‐tubulin, and the results from triplicate experiments were plotted. The * indicates p < 0.05 (t‐test, n=3). Error bars represent SEM.


The reciprocal targeting between SF2/ASF and miR‐7 confirmed that they can potentially form a negative feedback loop. SF2/ASF promotes the production of miR‐7, which in turn targets SFRS1 mRNA to repress its translation. The circuit structure implies that miR‐7 may in part contribute to the negative feedback regulation observed for SF2/ASF. Conversely, SF2/ASF may play a role in maintaining the steady‐state level of mature miR‐7, an important regulatory molecule involved in diverse cellular processes (Li and Carthew, 2005; Li et al., 2009). In this scenario, SF2/ASF may serve as a “rheostat” to sense the cellular level of mature miR‐7. Fluctuations in miR‐7 level are expected to drive the expression of endogenous SF2/ASF in the opposite direction.

Because SF2/ASF is required for efficient miR‐7 maturation, the negative feedback loop in effect buffers the noise in miR‐7 expression to better maintain its steady state. This may be especially important for regulatory miRNAs during development, the concentrations of which are critical.


3.5 Potential biological significance of the SF2/ASF-miR-7 circuit

It has recently been shown that SFRS1 can function as a proto‐oncogene (Karni et al., 2007). Interestingly, miRNAs can also function as tumor suppressors or oncogenes by targeting oncogenes or tumor suppressors, respectively (Esquela‐Kerscher and Slack, 2006). Therefore, I am interested in studying the potential biological significance of the SF2/ASF‐miR‐7 circuit in tumor development.

PI3K‐mTOR is a well‐known tumorigenic pathway whose activity is elevated in many cancers. The PI3K‐mTOR signaling pathway contains several known oncogenes (e.g. PI3K) and tumor suppressors (e.g. PTEN) (Mamane et al., 2006; Sabatini, 2006; Shaw and Cantley, 2006). There are two different kinase complexes in the mTOR pathway: the mTORC1 complex inactivates the translation inhibitors 4E‐BP1/2 and activates ribosomal protein kinase S6K1 to enhance translation (Mamane et al., 2006; Sabatini, 2006; Shaw and Cantley, 2006), while the mTORC2 complex phosphorylates the oncoprotein Akt (Manning and Cantley, 2007; Rajasekhar et al., 2003). It has been shown that the oncogenic properties of the SFRS1 gene are at least partially mediated through activation of the PI3K‐mTOR pathway (Karni et al., 2007; Karni et al., 2008). Notably, SF2/ASF specifically activates the mTORC1 downstream targets S6K1 and 4E‐BP1 (Karni et al., 2008). In addition, another important downstream target of SF2/ASF is Mnk2 kinase, a MAPK interacting kinase (Karni et al., 2007). Strikingly, based on gene set enrichment analysis (GSEA), I found that predicted miR‐7 target genes are overrepresented in the MAPK and PI3K pathways among 689 canonical signaling pathways (http://www.broadinstitute.org/gsea/msigdb/). Thus, one interesting possibility is that miR‐7 may be involved in SF2/ASF‐mediated tumorigenesis.

To test this hypothesis, 12 putative miR‐7 targets in the MAPK or PI3K pathway were selected for validation. We cloned each putative target site and its surrounding regions into a firefly luciferase reporter. The resulting construct or the empty vector control was co‐transfected into HeLa cells together with synthetic miR‐7 precursors or scrambled RNAs. As expected, 8 out of the 12 genes were significantly downregulated by miR‐7 (p < 0.05, Figure 9A), indicating that they are likely to be bona fide targets of miR‐7. The potential targets include MAP3K9 (Mitogen‐Activated Protein Kinase Kinase Kinase 9), SRF (Serum Response Factor or c‐fos Serum Response element‐binding transcription Factor), PSCD3 (Pleckstrin homology, Sec7 and Coiled‐coil Domains 3), RAF1 (v‐RAF‐1 murine leukemia viral oncogene homolog 1), MKNK1 (MAP kinase interacting serine/threonine kinase 1), MAPK4 (Mitogen‐Activated Protein Kinase 4), PIK3CD (PhosphoInositide‐3‐Kinase, Catalytic, Delta polypeptide), and PRKCB1 (Protein Kinase C, Beta). Because multiple genes in the PI3K and MAPK pathways could be targeted by miR‐7, the finding is unlikely to be due to noise in the TargetScan predictions.


Overall, these results suggest that miR‐7 is not limited to the regulation of SFRS1 expression, but may also contribute to the negative regulation of SF2/ASF downstream functions (Figure 9B). During tumorigenesis, the SF2/ASF‐miR‐7 circuit may function as a “brake” to inhibit the oncogenic properties of SF2/ASF. However, given the relatively low expression level of miR‐7 in tumors, this is unlikely to be a dominant mechanism to antagonize SF2/ASF‐mediated tumor progression.


Figure 9: Potential involvement of miR‐7 in regulating the MAPK and PI3K pathways

(A) Dual‐luciferase assays were performed in triplicate as described in Figure 7. For each construct with wild‐type 3’ UTR of the indicated gene, the relative luciferase activity was plotted by normalizing between cells transfected with control RNAs and miR‐7 precursors. The * indicates p < 0.05 (t‐test, n=3). Error bars represent SEM.

(B) Schematic diagram of SF2/ASF and miR‐7 in regulating the PI3K and MAPK pathways. SF2/ASF and miR‐7 regulate each other through a negative feedback mechanism. SF2/ASF can directly activate the PI3K‐mTOR pathway through regulating S6K1 and 4E‐BP1. It may also affect the Ras‐MAPK pathway through Mnk2 (Karni et al., 2008). However, the effects of SF2/ASF on the PI3K and MAPK pathways can be antagonized by miR‐7.


4. A splicing-independent function of SF2/ASF in miR-7 biogenesis

One interesting observation from my earlier thesis work was that the expression of mature miR‐7 is up‐regulated by an elevated SF2/ASF level. The logical next step is to characterize the molecular mechanism underlying SF2/ASF‐enhanced miR‐7 production. It is worth pointing out that mature miR‐7 is encoded by three distinct loci in the human genome. One of them, hsa‐miR‐7‐1, is embedded in the last alternatively spliced intron of the hnRNPK gene. Since it is known that SF2/ASF may affect alternative splicing choice, one tempting possibility is that SF2/ASF may affect miR‐7 biogenesis through splicing regulation. Alternatively, SF2/ASF may promote miR‐7 maturation through direct binding. Indeed, it has been shown that hnRNP A1 and KSRP can promote miRNA expression by interacting with primary miRNA transcripts to relax their local secondary structures, resulting in more efficient Drosha cleavage (Michlewski et al., 2008a; Trabucchi et al., 2009). In this part of my thesis research, I used a range of molecular biology approaches to delineate the involvement of SF2/ASF in miRNA biogenesis.


4.1 The domain requirement of SF2/ASF for promoting miR-7 expression

The hsa‐miR‐7‐1 embedded intron is alternatively spliced via two duplicated 3’ splice sites (3’ss) (Figure 10A). The resulting hnRNPK variants are identical in their protein sequences, except for the last six amino acids (ADVEGF vs. SGKFF).

Interestingly, SF2/ASF is known to promote proximal splice‐site usage for pre‐mRNA substrates with duplicated 5’ or 3’ss (Fu et al., 1992). This is indeed the case for the hnRNPK locus, where ectopic expression of SF2/ASF significantly increased the ratio between the proximal and distal variants, without affecting the overall mRNA level (data not shown). Because splicing efficiency could affect the processing of intronic miRNAs (Pawlicki and Steitz, 2008), one attractive model is that SF2/ASF might enhance miR‐7 production through alternative splicing regulation.

To test this hypothesis, I constructed a minigene reporter in which the miR‐7‐containing intron and its flanking exons of the endogenous hnRNPK gene were inserted downstream of the EGFP open reading frame (Figure 10B). The use of a minigene avoids potential complications due to multiple endogenous miR‐7 loci and unforeseen transcriptional or post‐transcriptional regulation. When transfected into HeLa cells, the hnRNPK minigene produced both mature miR‐7 and the two expected splicing variants, of which the distal 3’ splice site was preferentially used (Figure 10C‐E).


Figure 10: A minigene reporter to recapitulate hnRNPK alternative splicing and SF2/ASF‐regulated miR‐7 expression

(A) Diagram of the two alternative splicing isoforms of the endogenous hnRNPK locus. The miR‐7‐1 precursor (vertical wavy line) is shown.

(B) Diagram of the hnRNPK minigene reporter. Proximal/distal splice sites and miR‐7‐1 precursor (dashed box) are shown.


(C) The hnRNPK minigene or a control plasmid (Flag‐EGFP without hnRNPK insertion) was transfected into HeLa cells. Cells were harvested 2 days after transfection. RT‐PCR analyses showed two expected splicing isoforms without any detectable cryptic splicing variant.

(D) Western blotting was performed with antibodies against the Flag tag and α‐tubulin. The results showed correct protein products with the expected sizes.

(E) Northern blotting detected mature miR‐7 expressed from the hnRNPK minigene. U6 was used as an internal control. Endogenous miR‐7 was barely detected due to the short (1‐hr) exposure time.


I next examined the effect of exogenous SF2/ASF expression on hnRNPK alternative splicing and miR‐7 expression. As expected, co‐transfection of SF2/ASF cDNA with the minigene increased the level of mature miR‐7 (1.7‐fold) as well as the ratio between proximal and distal splicing variants (Figure 11A, lanes 1 and 2), indicating that the minigene can recapitulate the effects of SF2/ASF at the endogenous hnRNPK locus. Since SF2/ASF is a multi‐functional protein broadly involved in RNA biogenesis, and the domain structure of SF2/ASF is well studied, we were able to manipulate the expression construct of SF2/ASF to separate its functions. In an SF2/ASF variant carrying a nuclear retention signal (SF2‐NRS), the shuttling of SF2/ASF between nucleus and cytoplasm is largely abolished (Cazalla et al., 2002). Like wild‐type SF2/ASF, this variant promoted miR‐7 expression, and its effect was stronger (2.4‐fold; Figure 11A and 11B, lanes 1‐3). Moreover, the level of miR‐7 precursors was also increased (see later section), which suggests that enhanced miR‐7 expression is likely to take place at the nuclear processing step (Drosha cleavage).

To further dissect the domain requirement of SF2/ASF for promoting miR‐7 expression, three mutants (SF2∆RS, SF2∆RRM1 and SF2∆RRM2) were examined, in which the RS domain or one of the two RNA recognition motifs (RRMs) was deleted, respectively (Caceres and Krainer, 1993). All mutants failed to promote proximal splice‐site usage, indicating that both the RS domain and RNA‐recognition motifs are required for the splicing regulation (Figure 11C). In contrast, SF2∆RS reproducibly repressed miR‐7 expression to two thirds of its normal level (p < 0.05), whereas deletion of RRM1 or RRM2 rendered SF2/ASF inactive in promoting miR‐7 production (Figure 11A, lanes 1 and 4‐6; Figure 11B). The dominant‐negative effect of SF2∆RS suggests a previously uncharacterized function of SF2/ASF in miRNA biogenesis, which is separate from its activities in splicing regulation.


Figure 11: The domain requirement of SF2/ASF for promoting miRNA production and alternative splicing

(A) The hnRNPK minigene was co‐transfected with EGFP control vector (ctrl), wild‐type/mutant SF2/ASF, SC35, or 9G8 cDNA into HeLa cells. Cells were harvested 2 days after transfection. The overall level of hnRNPK (top panel) and its alternative splice‐site usage (bottom panel) were monitored by radiolabeled RT‐PCR. The mature miR‐7 level was determined by Northern blotting (middle panel). Representative gels are shown.

The relative miR‐7 levels (normalized to total hnRNPK levels) and the alternative splicing ratios (proximal/distal) were plotted in (B) and (C), respectively (t‐test: p < 0.05; n=3). Error bars represent SEM.

To examine the specificity of SR proteins in miRNA biogenesis, two additional SR proteins, SFRS2 (SC35) and SFRS7 (9G8), were also tested. Although both SR proteins activated proximal splicing much more strongly than SF2/ASF, their activity in miR‐7 promotion was not correspondingly increased (Figure 11, lanes 7, 8). Notably, 9G8 had little effect on miR‐7 expression, suggesting that different SR proteins may have distinct substrate specificities in regulating miRNA expression. In agreement with our previous observations, these results demonstrated that enhanced miR‐7 expression is unlikely to be due to increased proximal splice‐site usage, indicating that SF2/ASF has two separate functions, one in alternative splicing and one in miRNA processing.

4.2 A splicing-independent function of SF2/ASF in pri-miR-7 processing

To determine whether SF2/ASF promotes miR‐7 maturation in an alternative splicing‐dependent manner, I next aimed to uncouple SF2/ASF’s functions in alternative splicing and miRNA processing. To this end, the proximal or distal splice site of the hnRNPK minigene was mutated (PM or DM, respectively). When transfected into HeLa cells, the two mutants gave rise to opposite splicing patterns as expected (Figure 12A, lanes 1, 3 and 5). Notably, we could detect both proximal and distal variants in all conditions. This is due to background expression of the endogenous hnRNPK gene. However, the endogenous locus contributes less than 5% of the total hnRNPK transcripts and can therefore be ignored in the miR‐7 expression analysis (Figure 12A and data not shown). As expected, co‐transfection of SF2/ASF no longer affected alternative splicing choice. In contrast, both precursor and mature miR‐7 levels were significantly increased by SF2/ASF for the PM and DM constructs (Figure 12B and later sections). The levels of miR‐7 enhancement were slightly different between the DM and PM constructs, although the difference did not reach statistical significance (p = 0.076, Figure 12B), hinting at possible context dependence. It is possible that the local conformations of intronic miRNAs differ depending on whether the proximal or distal splice site is used. Alternatively, spliceosome assembly at the proximal splice site may interfere with intronic miRNA processing.

Figure 12: A splicing‐independent function of SF2/ASF in promoting miR‐7 expression

(A) Wild‐type hnRNPK minigene, the proximal 3’ splice‐site mutant (PM) or the distal 3’ splice‐site mutant (DM) was transfected into HeLa cells together with a control vector or a cDNA expressing SF2/ASF. Alternative splicing of hnRNPK and miR‐7 expression were monitored by radiolabeled RT‐PCR and Northern blotting, respectively. Representative gels are shown. The levels of mature miR‐7 were normalized to the total hnRNPK levels. Results from triplicate experiments were plotted in (B) (t‐test: p < 0.05; n=3).

(C) The proximal 3’ splice‐site mutant (PM), the distal 3’ splice‐site mutant (DM) hnRNPK or double mutant was transfected into HeLa cells together with either a control vector or a cDNA expressing SF2/ASF. The splicing pattern of each hnRNPK construct was monitored by radiolabeled RT‐PCR.

(D) Intronless minigene expressing miR‐7 (pCG‐miR‐7) was transfected into HeLa cells together with a control vector or SF2/ASF cDNA. The levels of EGFP and miR‐7 were monitored by radiolabeled RT‐PCR and Northern blotting, respectively. The mature miR‐7 levels were normalized to EGFP, and the results from triplicate experiments were plotted in (E) (t‐test: p < 0.05; n=3).

Error bars represent SEM.

As a further step, I sought to eliminate all possible splicing in the minigene reporter. A mutant construct was made in which both the proximal and distal splice sites were mutated. Although the double mutations eliminated the normal splicing of the miR‐7‐embedded intron, they led to several cryptic splicing variants, which prevented further functional analysis (Figure 12C). As an alternative, I cloned the miR‐7‐1 precursor and its flanking regions into a heterologous context to eliminate all possible splicing (Figure 12D). The resulting construct was transfected into HeLa cells together with SF2/ASF cDNA or a control vector. Interestingly, the levels of both precursor and mature miR‐7 were increased by SF2/ASF overexpression, and the fold change was comparable to that in the endogenous context (Figure 12D and 12E). Taken together, these data demonstrated that enhanced miR‐7 expression is mediated by a splicing‐independent function of SF2/ASF.

4.3 SF2/ASF directly binds to pri-miR-7 in vivo

Two RNA‐binding proteins (hnRNP A1 and KSRP) have recently been shown to directly interact with primary miRNA transcripts and serve as auxiliary factors for more efficient Drosha cleavage (Michlewski et al., 2008a; Trabucchi et al., 2009). A genome‐wide survey found that SF2/ASF may bind in close proximity to the stem‐loop regions of three miRNAs in HEK293T cells, although the functional significance of such binding events remains unclear (Sanford et al., 2009). Therefore, one obvious direction was to examine whether SF2/ASF can directly bind to the primary miR‐7 transcript to promote its cropping. To this end, a UV cross‐linking and immunoprecipitation (CLIP) assay was employed, which can capture in vivo interactions between an RNA‐binding protein and its cognate RNAs (Ule et al., 2003).

Because the activities of SF2/ASF in splicing and miRNA processing converge at the endogenous hnRNPK locus, the intronless miR‐7‐expressing minigene (Figure 12D) was used for CLIP analysis to avoid the potential binding of SF2/ASF to neighboring exons. HeLa cells were transfected with a moderate amount of the pCG‐miR‐7 construct. UV cross‐linking was carried out 48 hr after transfection to stabilize RNA‐protein interactions, followed by partial RNase digestion and immunoprecipitation with a monoclonal antibody against SF2/ASF (Hanamura et al., 1998) (Figure 13A). The antibody is highly specific for SF2/ASF with an excellent IP efficiency (Figure 13B). RNA fragments associated with SF2/ASF were then analyzed by radiolabeled RT‐PCR against the EGFP or miR‐7 region. The primer pair specific for miR‐7 is located in the flanking regions of its stem loop, such that the primary transcripts rather than the miR‐7 precursors are probed. The EGFP region co‐transcribed with miR‐7, on the other hand, serves as an internal control to estimate the enrichment of SF2/ASF binding. PCR products of the expected sizes were detected only in the SF2/ASF CLIP but not in the mock immunoprecipitation (Figure 13C, lanes 3‐6, and data not shown). The radioactive signal corresponding to the RNA level of either EGFP or pri‐miR‐7 was then directly quantified with a PhosphorImager. Consistently, I observed a 2‐fold enrichment of the pri‐miR‐7‐1 fragment by CLIP compared to the EGFP amplicon (Figure 13D). Since random RNase digestion was used before IP, only sequences with high affinity to SF2/ASF should be protected and enriched after CLIP. Although we cannot rule out the possibility that the EGFP region might be bound by SF2/ASF, our data demonstrated that SF2/ASF preferentially binds to pri‐miR‐7‐1 RNA.
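
The enrichment described above can be illustrated with a short calculation. The sketch below assumes that enrichment is computed as the miR‐7/EGFP signal ratio after CLIP divided by the same ratio in the input; the function name and the band intensities are hypothetical and are not taken from the experiments.

```python
def clip_enrichment(target_ip, control_ip, target_input, control_input):
    """Fold enrichment of a target amplicon over a control amplicon.

    Assumed definition: (target/control after IP) / (target/control in input).
    All arguments are quantified band intensities in arbitrary units.
    """
    values = (target_ip, control_ip, target_input, control_input)
    if min(values) <= 0:
        raise ValueError("band intensities must be positive")
    ip_ratio = target_ip / control_ip
    input_ratio = target_input / control_input
    return ip_ratio / input_ratio


# Hypothetical intensities: pri-miR-7 and EGFP are equal in the input,
# but pri-miR-7 is twice as abundant as EGFP after SF2/ASF CLIP.
example = clip_enrichment(target_ip=400.0, control_ip=200.0,
                          target_input=1000.0, control_input=1000.0)
print(f"fold enrichment: {example:.1f}")
```

Normalizing to a co‐transcribed control amplicon in this way cancels differences in transfection efficiency and IP recovery between samples, since both amplicons derive from the same transcript.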

Figure 13: SF2/ASF directly interacts with the primary miR‐7‐1 transcripts

(A) Diagram of CLIP RT‐PCR procedure.

(B) CLIP assay was performed without antibody (Mock IP) or with a specific antibody against SF2/ASF (SF2/ASF IP). Western blotting was then carried out with equal amounts of protein obtained from whole‐cell lysate (Input), immunoprecipitation supernatant (Sup) and pellet (IP). IP efficiency was monitored by Western blotting with antibodies specifically against SF2/ASF and α‐tubulin (internal control), respectively.

(C) Radiolabeled RT‐PCR analysis of the RNA samples obtained from input and SF2/ASF‐specific immunoprecipitation. The locations of the two primer pairs, which were used to specifically amplify the EGFP and miR‐7 stem‐loop regions, are shown.

(D) Quantitative RT‐PCR results from 3 independent CLIP experiments were plotted. The enrichment of the miR‐7 stem‐loop region was determined by normalizing to the EGFP amplicon (p < 0.01, t‐test, n=3). Error bars represent SEM.

Since SR proteins are known to bind Exonic Splicing Enhancers (ESEs) in a sequence‐dependent manner, I next sought to map the potential SF2/ASF binding site(s) within the primary miR‐7 transcript. With the assistance of ESEfinder (Cartegni et al., 2003), I identified a putative SF2/ASF‐binding site, CACAGGC, within the stem‐loop region of miR‐7‐1 (Figure 14A). The motif was then mutated to prevent potential binding. In order to preserve the secondary structure of the miRNA precursor, we also made compensatory mutations in the stem‐loop region (Figure 14A). The CLIP experiment was repeated to examine the effects of the mutations on SF2/ASF binding. In contrast to the wild‐type minigene, the mutant construct showed similar ratios between the EGFP and miR‐7 amplicons before and after CLIP, suggesting that the predicted binding site is critical for the interaction between SF2/ASF and the pri‐miR‐7 transcript (Figure 14B, lanes 4 and 8; Figure 14C).
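
As a simple illustration of locating such a motif, the sketch below searches a sequence for exact occurrences of CACAGGC. This is a deliberate simplification: ESEfinder scores candidate sites with weight matrices rather than exact matching, and the toy sequence and function name here are hypothetical, not the real miR‐7‐1 stem‐loop.

```python
def find_motif(sequence, motif="CACAGGC"):
    """Return 0-based start positions of exact motif matches.

    DNA (T) and RNA (U) spellings are treated as equivalent, and
    overlapping matches are reported.
    """
    seq = sequence.upper().replace("T", "U")
    pattern = motif.upper().replace("T", "U")
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        start = seq.find(pattern, start + 1)
    return positions


# Hypothetical toy sequence carrying two copies of the motif.
toy_rna = "GGAUCACAGGCUUACACAGGCAA"
print(find_motif(toy_rna))
```

The same helper can be run on the mutant sequence to confirm that the introduced substitutions remove every exact match.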

Figure 14: Direct interaction between SF2/ASF and the stem‐loop region of miR‐7‐1

(A) The stem‐loop region of hsa‐miR‐7‐1: the high‐score SF2/ASF motif is shown in red (dashed box); mature miR‐7 and miR‐7* are shown in pink. The SF2/ASF motif was disrupted by mutations in 4 nucleotides (upper case, blue) with compensatory mutations in the opposite arm (upper case, green).

(B) Wild‐type or mutant pCG‐miR‐7 was transiently transfected into HeLa cells. RNA samples obtained from input and SF2/ASF‐specific CLIP were analyzed by radiolabeled RT‐PCR. Two primer pairs to specifically amplify the EGFP and miR‐7 stem‐loop regions are shown.

(C) The relative enrichment of the miR‐7 stem‐loop region before and after SF2/ASF CLIP was plotted for wild‐type and mutant pCG‐miR‐7 transcripts (t‐test: p < 0.05; n=3).

Error bars represent SEM.

(D) In vitro transcribed pri‐miR‐7‐1 RNA or its mutant was coupled to agarose beads and incubated with HeLa cell extract. The bound proteins were analyzed by Western blotting with α‐tubulin as a negative control. Approximately 1/200 of the input extract was loaded as positive control (Lanes 1 and 3). The relative SF2/ASF protein levels are shown at the bottom of the panel.

To further corroborate the CLIP result, an RNA affinity purification assay was employed, which examines in vitro RNA‐protein interactions in cell extract. The stem‐loop regions of either the wild‐type miR‐7‐1 transcript or miR‐7 with a mutant SF2/ASF binding site were first transcribed in vitro and then covalently coupled to agarose beads. Next, beads coupled with RNA were incubated with HeLa cell extract. Lastly, proteins specifically associated with the RNA probe were analyzed by Western blotting. Whereas SF2/ASF protein could be pulled down by both RNAs, the pull‐down efficiency was markedly reduced when the mutant RNA was used (Figure 14D). Extending the CLIP data, these results confirmed that the interaction between SF2/ASF and miR‐7 is mediated at least in part by the SF2/ASF‐binding site within the miR‐7 stem‐loop region.

4.4 SF2/ASF is involved in the Drosha cleavage step of miR-7 maturation

To investigate whether sequence‐dependent binding is required for SF2/ASF‐mediated miR‐7 biogenesis, either the wild‐type or mutant pCG‐miR‐7 minigene was transfected into HeLa cells. The basal levels of miR‐7 expression were comparable between the two constructs (Figure 15A). This is possibly due to the mutations (G‐C to A‐T base‐pairing), which might relax the local secondary structure and partially compensate for the loss of the SF2/ASF binding site. However, the two constructs differed substantially under SF2/ASF overexpression conditions. SF2/ASF‐enhanced miR‐7 expression was significantly reduced with the mutant construct (p = 0.017; Figure 15A), suggesting that the binding site is required for this enhancement.

Notably, the levels of both precursor and mature miR‐7 were upregulated by SF2/ASF in vivo for all the minigene constructs used (Figure 16), indicating that the Drosha cleavage step is likely to be involved. This is further supported by the observation that SF2‐NRS is more potent than wild‐type SF2/ASF in promoting miR‐7 expression (Figure 11A). As a more direct approach, an in vitro pri‐miRNA processing assay was performed, which monitors the conversion of pri‐miRNAs to pre‐miRNAs (Guil and Caceres, 2007). Whole cell extract was prepared from HeLa cells with or without knocking down SF2/ASF by RNA interference (about 20% of its original level) (Figure 15B). Consistent with the in vivo data, the processing of wild‐type pri‐miR‐7 was significantly reduced when SF2/ASF‐knockdown cell extract was used (Figure 15C, lanes 2 and 3). Adding back recombinant SF2/ASF protein restored the processing efficiency in a dose‐dependent manner (Figure 15C, lanes 4‐6). These results strongly argue that SF2/ASF indeed functions at the Drosha cleavage step. As a negative control, depletion and add‐back of SF2/ASF had little effect on the miR‐7 substrate with a mutated SF2/ASF binding site (Figure 15D). These data agreed well with the in vivo assay showing that the level of pre‐miR‐7 was not significantly changed by co‐transfection of the mutant pCG‐miR‐7 construct with an SF2/ASF expression vector (Figure 15A).

Figure 15: SF2/ASF promotes miR‐7 maturation at the Drosha cleavage step in a sequence‐dependent manner

(A) Wild‐type or mutant miR‐7 construct was co‐transfected into HeLa cells with a control vector or SF2/ASF‐expressing plasmid. Transfection efficiencies were normalized to the EGFP levels (radiolabeled RT‐PCR). Precursor and mature miR‐7 levels were monitored by Northern blotting. The relative activity of SF2/ASF in promoting miR‐7 expression is shown in the right panel (t‐test: p < 0.05; n=3). Error bars represent SEM.

(B) HeLa cells were transfected with either Luciferase or SFRS1 specific siRNA. Two days after transfection, cells were harvested to prepare whole‐cell extracts, for which the levels of SF2/ASF protein were analyzed by Western blotting.

In vitro pri‐miRNA processing assay was performed with wild‐type and mutant pri‐miR‐7‐1 substrates as described before (Michlewski et al., 2008a); representative gels are shown in (C) and (D), respectively.

Figure 16: Both precursor and mature miR‐7 levels are increased by elevated SF2/ASF expression

(A) Wild‐type hnRNPK minigene was transfected into HeLa cells with EGFP, wild‐type SF2/ASF, or SF2‐NRS construct. The levels of hnRNPK mRNA and miR‐7 were monitored by radiolabeled RT‐PCR and Northern blotting, respectively.

(B) HnRNPK minigene with a proximal or distal splice‐site mutation was transfected into HeLa cells with a vector expressing EGFP or wild‐type SF2/ASF. After normalizing the transfection efficiency, miR‐7 level was determined by Northern blotting.

(C) The intronless miR‐7‐expressing plasmid, pCG‐miR‐7, was co‐transfected with control or SF2/ASF expression vector. The levels of precursor and mature miR‐7 were monitored by Northern blotting. The level of EGFP mRNAs was determined by quantitative RT‐PCR to normalize the transfection efficiency.

Overall, our data demonstrated that SF2/ASF promotes the Drosha cleavage step of pri‐miR‐7 processing, although additional effects in post‐cropping steps cannot be ruled out. KSRP, for example, is an important component of the Drosha, Dicer and Exportin‐5 complexes and affects several steps of miRNA biogenesis. It is not yet clear whether SF2/ASF functions in a similar manner. Moreover, it is unclear whether different mechanisms apply to different miRNA substrates.

4.5 Broad involvement of SF2/ASF in miRNA biogenesis

In addition to miR‐7, miRNA deep sequencing identified three additional miRNAs (miR‐29b, miR‐221, and miR‐222) whose expression levels were consistently up‐regulated upon SF2/ASF induction (Figure 4A). These results were further validated by Northern blotting with the samples obtained from the same stable cell line (Figure 4B). For each miRNA, a minigene reporter was constructed using the intronless vector (analogous to pCG‐miR‐7), and co‐transfected into HeLa cells with SF2/ASF cDNA or a control vector (Figure 17A). Similar to the miR‐7 case, both the precursor and mature miRNAs were increased upon SF2/ASF overexpression for miR‐221 and miR‐222, suggesting that the Drosha cleavage step is likely to be involved. Interestingly, SF2/ASF overexpression increased the level of mature miR‐29b but not that of its precursor, implying that SF2/ASF may act at a post‐Drosha‐cleavage step (e.g., Dicer cleavage). The enhancement could only be observed for miR‐29b‐1 but not miR‐29b‐2. The latter construct serves as a negative control for substrate specificity. Taken together, these results suggest that SF2/ASF is broadly involved in miRNA biogenesis and that its function may not be limited to the Drosha cleavage step.

Figure 17: SF2/ASF is broadly involved in miRNA biogenesis

(A) HeLa cells were transfected with intronless miR‐29b‐1/2, miR‐221 and miR‐222 constructs, together with an empty or SF2/ASF‐expression vector. The levels of EGFP and the corresponding miRNAs were monitored by radiolabeled RT‐PCR and Northern blotting, respectively.

(B) Schematic diagram of the SF2/ASF/miR‐7 negative feedback loop. SF2/ASF directly binds to pri‐miR‐7 to promote its nuclear cropping, although effects on later steps (pre‐miRNA export and/or Dicer cleavage; dashed lines) cannot be ruled out. Additional factors (circle with a question mark) may also be involved in efficient pri‐miRNA maturation, possibly through direct interaction with the RS domain of SF2/ASF. In the cytoplasm, mature miR‐7 (yellow line) binds to the 3’ UTR of SFRS1 mRNA (rectangle box) and represses the production of SF2/ASF protein via translational inhibition.

5. Genome-wide identification of SRp20 targets in vivo

Early genetic studies have demonstrated that SR proteins are essential regulatory factors in normal development and general cellular processes. However, their physiological targets remain largely unknown. The field needs high‐throughput technologies that can systematically identify in vivo protein‐RNA interaction events at the genome scale. The resulting information will lead to a better understanding of the binding specificity of each SR protein as well as potential crosstalk between individual SR proteins. A pioneering study was performed with NOVA (Neuro‐Oncological Ventral Antigen), a neuron‐specific splicing factor. Using the CLIP method, the Darnell group successfully identified a large set of physiological targets for NOVA, which is intimately involved in synapse functions (Licatalosi et al., 2008; Ule et al., 2003; Ule et al., 2006). Further investigations led to the identification of a neuronal splicing regulatory network, which is coordinately regulated by NOVA and another neuronal splicing factor, Fox (Zhang et al., 2010). In this part of my thesis study, I employed PAR‐CLIP, which combines an improved CLIP strategy with a next‐generation sequencing platform, to (1) globally identify the in vivo targets of SRp20 (SFRS3), (2) determine the sequence motif(s) critical for the SRp20‐RNA interactions, and (3) characterize the potential roles of SRp20 in regulating its downstream genes. Notably, SRp20 is a well‐known member of the SR family of splicing factors that has recently been shown to be a putative proto‐oncogene (Jia et al., 2010). Results from this study provide a valuable resource for investigating the molecular mechanisms as well as the functional significance of SRp20 in post‐transcriptional gene regulation.

5.1 PAR-CLIP to identify the putative targets of SRp20 in vivo

Identified in the early 1990s, SRp20 is a canonical member of the SR family of splicing factors (Zahler et al., 1992). Similar to other SR proteins, SRp20 promotes splicing in vitro in the S100 complementation assay. In addition, SRp20 is known to shuttle between the nuclear and cytoplasmic compartments (Caceres et al., 1998), and has been shown to play an important role in mRNA export through interacting with the nuclear export receptor TAP (Hautbergue et al., 2008; Huang et al., 2003; Huang and Steitz, 2001). Several potential SRp20 binding motifs have been identified by SELEX (Cavaloc et al., 1999; Lou et al., 1998; Schaal and Maniatis, 1999). However, there are apparent discrepancies among the sequence motifs identified, likely due to variations in the overall designs of the experimental procedures. Therefore, it remains an open question whether the putative motifs are indeed bound by SRp20 in vivo and, if they are, what effects such interactions have on splicing and/or other steps of post‐transcriptional gene regulation.

To determine the binding specificity of SRp20, the PAR‐CLIP method was employed (Hafner et al., 2010a, b), which is a sequencing‐based approach designed for identifying protein‐RNA interactions at the genome scale. As mentioned in the Introduction section, this approach uses a specific antibody to enrich the RNP complex of interest, and the RNA portion of the complex is subsequently identified by high‐throughput sequencing. As a first attempt, I used SRp20 antibodies to pull down endogenous SRp20 RNP. This was not successful because the IP efficiency of all commercially available antibodies was very low (data not shown). To overcome this challenge, a stable MEF/3T3 cell line was generated, in which the expression of T7‐tagged SRp20 cDNA is under the control of an inducible Tet‐off promoter. The T7 tag allows for efficient pull‐down of SRp20 and its associated RNAs. The expression of exogenous SRp20 can be induced by removing Doxycycline from the cell culture media (Jia et al., 2010). 96 hr after induction, both the total mRNA and protein levels of SRp20 were increased about 2‐ to 3‐fold (Figure 18), and this minor induction is not expected to affect the normal function or binding specificity of SRp20 (Figure 18 and data not shown). Thus, this stable cell line provides a well‐controlled system to study the functions of SRp20 in wild‐type and overexpression conditions.

Figure 18: Inducible SRp20 expression in a MEF/3T3 stable cell line

(A) The mRNA levels of SRp20 (SFRS3) were quantified by real‐time PCR before and after induction (96 hr). GAPDH was used as an internal control (n=3). Error bars represent SEM.

(B) is the same as (A) except that the ectopically expressed (T7‐tagged) SRp20 protein was monitored by Western blotting analysis. α‐tubulin was used as an internal control.

Because the PAR‐CLIP procedure is relatively complex, I then evaluated several key steps to determine its specificity and efficiency. The immunoprecipitated SRp20 RNP complex was labeled with γ‐32P‐ATP. The resulting end‐labeled RNP complex was separated on an SDS‐PAGE gel and visualized by autoradiography. If the RNA fragments are associated with SRp20 proteins and the immunoprecipitation is specific, a distinct band corresponding to SRp20 RNPs should be observed. As expected, the results showed that SRp20 RNP can be reliably detected by radioactive signal (Figure 19, top panel, 20 kD band), which was further confirmed by Western blotting using a T7 antibody (Figure 19, bottom panel).

In the original PAR‐CLIP protocol, a photo‐reactive ribonucleoside analog (4‐thiouridine) was used to enhance the UV cross‐linking efficiency (Hafner et al., 2010a). Interestingly, even without 4‐SU and UV irradiation, I could still detect the radiolabeled band corresponding to SRp20, indicating that 32P can be incorporated into the SRp20 protein itself. Such an observation has been reported for another SR protein, SF2/ASF, where background incorporation of the 32P radiolabel is independent of the RNA moiety (Sanford et al., 2009). Among all combinations of 4‐SU addition and UV irradiation, the strongest 32P incorporation was observed only when both steps were present (Figure 19, lanes 1‐4), indicating their importance for efficient cross‐linking. In the control samples without induction, SRp20 RNPs were barely detected (Figure 19, lanes 5‐8), and the weak bands observed were probably due to the low leaky expression of T7‐SRp20 (Figure 18B and 19). Together, these results demonstrated that the PAR‐CLIP strategy can specifically enrich SRp20 RNP from a complex mixture.

Figure 19: Dependence of the PAR‐CLIP procedure on 4‐thiouridine (4‐SU) and UV cross‐linking

PAR‐CLIP was performed for cells with (lanes 1‐4) or without (lanes 5‐8) SRp20 induction using T7‐specific antibody as described in the Methods. SRp20 RNP complex was labeled by γ‐32P‐ATP and visualized by autoradiography on a phosphorimaging screen for 30 minutes (top panel). The same gel was subsequently examined by Western blotting using T7 specific antibody (bottom panel).

After optimization of the cross‐linking and immunoprecipitation conditions, an Illumina sequencing library was constructed using a standard library construction procedure. Briefly, the radiolabeled SRp20 RNP complex was retrieved from the PAGE gel, and SRp20‐bound RNAs were isolated after proteinase K treatment. Two sequence‐specific linkers were then sequentially ligated to the 3’ and 5’ ends of the RNA samples. Ligated products were reverse transcribed and PCR amplified. The final library was sequenced on the Illumina‐Solexa platform.

Figure 20: Identification of SRp20 targets by PAR‐CLIP

(A) Diagram of the PAR‐CLIP procedure: 72 hr after SRp20 induction, 4‐SU was added to the cell culture media at a final concentration of 100 μM. UV cross‐linking, which stabilizes the interactions between RBPs and the nascent RNA transcripts incorporated with 4‐SU (red star), was then carried out 24 hr after the pulse labeling. Cells were then lysed and further subjected to brief RNase digestion. SRp20‐specific RNPs were enriched by immunoprecipitation with T7‐specific antibody and recovered by PAGE gel fractionation. The resulting RNA fragments in the SRp20 RNP were isolated and served as input material for constructing the Illumina sequencing library. The final library was subjected to next‐generation sequencing using the Illumina platform.

(B) Overall flowchart of the experimental design for SRp20 PAR‐CLIP.

5.2 Global landscape of RNA transcripts bound by SRp20

In‐depth data analysis was carried out using PARalyzer, a kernel density estimation (KDE) based algorithm designed for PAR‐CLIP data analysis (Corcoran et al., manuscript under review). This software package has recently been used to analyze PAR‐CLIP data generated with HuR (Mukherjee et al. manuscript under review). Overall, 5,875,557 reads could be uniquely mapped to the mouse genome with a maximum of two mismatches, and these reads were then grouped by overlaps. The potential SRp20 binding clusters were derived based on read depth and T > C conversions. In total, 29,080 clusters were identified from the sequencing data. Among them, 12,933 clusters could be mapped to mRNA transcripts. The rest were mapped to intergenic regions, suggesting that (1) the annotation of the mouse transcriptome may be far from complete; and (2) SRp20 could potentially be involved in the regulation of novel genes.
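
The grouping and filtering logic described above can be sketched as follows. This is only a toy illustration and not PARalyzer's kernel density method: it merges overlapping reads into intervals and keeps those that pass assumed thresholds on read count and T > C conversions. The `Read` class, the `group_reads` function, the thresholds, and the example coordinates are all hypothetical, and strand is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Read:
    chrom: str
    start: int           # 0-based, inclusive
    end: int             # exclusive
    tc_conversions: int  # number of T>C mismatches in this read


def group_reads(reads, min_reads=5, min_conversions=1):
    """Merge overlapping reads and keep groups that pass both thresholds."""
    clusters = []
    current = None
    for read in sorted(reads, key=lambda r: (r.chrom, r.start, r.end)):
        overlaps = (current is not None
                    and read.chrom == current["chrom"]
                    and read.start < current["end"])
        if overlaps:
            # Extend the open group and accumulate its evidence.
            current["end"] = max(current["end"], read.end)
            current["reads"] += 1
            current["tc"] += read.tc_conversions
        else:
            if current is not None:
                clusters.append(current)
            current = {"chrom": read.chrom, "start": read.start,
                       "end": read.end, "reads": 1,
                       "tc": read.tc_conversions}
    if current is not None:
        clusters.append(current)
    return [c for c in clusters
            if c["reads"] >= min_reads and c["tc"] >= min_conversions]


# Hypothetical reads: only the first two overlap and carry a conversion.
toy_reads = [Read("chr1", 100, 130, 1), Read("chr1", 120, 150, 0),
             Read("chr1", 200, 230, 0), Read("chr2", 100, 130, 2)]
toy_clusters = group_reads(toy_reads, min_reads=2, min_conversions=1)
print(toy_clusters)
```

Requiring at least one T > C conversion reflects the idea that cross‐linked 4‐SU sites leave a characteristic mismatch, which helps separate genuine binding sites from background reads.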

To gain a better understanding of SRp20 binding events in annotated transcripts, I first determined the number of SRp20 clusters per pre‐mRNA (Figure 21A). In total, 6,664 transcripts are potentially bound by SRp20, with a median of one binding site per transcript and a maximum of 23 binding sites for Fn1 (Fibronectin 1). The result is consistent with previous reports showing the extensive involvement of SRp20 in regulating the alternative splicing of the fibronectin gene (Kuo et al., 2002; Lim and Sharp, 1998). Furthermore, the distribution of clusters among different regions of individual transcripts was determined. Exons, introns and 3’ UTRs were the regions predominantly targeted by SRp20 (4,936 clusters mapped to exonic regions, 5,377 to intronic regions, 2,325 to 3’ UTRs, and 295 to 5’ UTRs) (Figure 21B). It was somewhat unexpected that 42% of the clusters mapped to mRNA transcripts fell in intronic regions. This is likely because whole cell extract was used for PAR‐CLIP library preparation, in which both spliced and unspliced transcripts are present. Taken together, these data suggested that SRp20 preferentially targets a subset of genes, and that the binding events may not be limited to exons.
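
The regional breakdown can be checked with a few lines of arithmetic. The sketch below uses only the cluster counts reported in the text; the variable names are illustrative.

```python
# Cluster counts per transcript region, as reported in the text.
cluster_counts = {
    "exon": 4936,
    "intron": 5377,
    "3' UTR": 2325,
    "5' UTR": 295,
}

total = sum(cluster_counts.values())
fractions = {region: count / total for region, count in cluster_counts.items()}

print(f"total clusters in mRNA transcripts: {total}")
for region, fraction in fractions.items():
    print(f"{region}: {fraction:.1%}")
```

The four categories sum to the 12,933 clusters mapped to mRNA transcripts, and the intronic fraction rounds to the 42% quoted above.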

Figure 21: Locations of SRp20 PAR‐CLIP clusters in pre‐mRNA transcripts

(A) The histogram shows the distribution of SRp20 clusters per protein‐coding transcript.

(B) Each protein‐coding transcript was further divided into 5’ UTR, exon, intron, and 3’ UTR based on UCSC gene annotations. The numbers of clusters in each category are shown in the pie chart (left panel). Right panel: The proportion of densities for SRp20 clusters derived from each region of mRNA as a function of sequencing reads.

One advantage of PAR‐CLIP is the potential T > C conversions at the real binding sites (see Introduction). Thus, candidate clusters were discarded if they contain no conversion or can be mapped to repetitive elements. After filtering, I focused on 11,280 clusters mapped to mRNA transcripts for further analysis.

In order to investigate the potential functional enrichment of SRp20 targets, gene ontology (GO) analysis was performed for the putative genes bound by SRp20. Interestingly, cell cycle is one of the top biological processes targeted by SRp20 (p = 2.5E‐49). SRp20 target genes are also highly enriched in DNA‐binding (p = 1.45E‐23) and RNA‐binding (p = 3.56E‐18) proteins, which indicates that SRp20 may be generally involved in affecting regulatory genes. Last but not least, several canonical signaling pathways known to be important in development and pathophysiologic conditions are also among the top hits, including the EGF receptor signaling pathway (p = 2.05E‐11), Wnt signaling pathway (p = 4.58E‐11), PDGF signaling pathway (p = 2.64E‐10), VEGF signaling pathway (p = 7.48E‐8), FGF signaling pathway (p = 8.89E‐8), TGF‐beta signaling pathway (p = 6.02E‐6), Ras pathway (p = 4.4E‐11), PI3 kinase pathway (p = 3.25E‐6), and p53 pathway (p = 3.88E‐3) (Appendix E). Overall, these results highlighted the potential importance of SRp20 in the regulation of general cellular processes and possibly pathological progression.

5.3 Experimental validation of potential SRp20 binding clusters

In order to test the specificity of the PAR‐CLIP clusters identified, I set out to validate the SRp20‐RNA interactions using RNP immunoprecipitation combined with gene‐specific RT‐PCR (RIP‐RT‐PCR). Since SR proteins are known to bind ESEs to promote splicing (Long and Caceres, 2009), 24 exonic clusters were randomly selected from our PAR‐CLIP data. Three random exonic loci that are not included in our data set were also selected to serve as negative controls. Gene‐specific primers were designed outside the cluster regions with amplicons of ~50‐100 nt in size. Overall, 19 of the 24 clusters showed greater than 2‐fold enrichment compared with input controls, whereas those clusters were barely detectable in control IP samples (Figure 22, light gray bars). In contrast, none of the three negative control regions showed enrichment (Figure 22, dark gray bars). Similarly, 7 clusters mapped to 3’ UTRs were selected for validation. Among them, 4 clusters showed significant enrichment after IP (Figure 22, yellow bars). Taken together, 23 of the 31 randomly selected PAR‐CLIP clusters (74%) were validated by this method. These data demonstrate that the overall PAR‐CLIP strategy is reliable, and that the PAR‐CLIP clusters largely recapitulate the real interactions in vivo.
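The overall validation rate follows directly from the counts given above (19 of 24 exonic clusters and 4 of 7 3’ UTR clusters). A minimal sketch of the arithmetic is shown below; the helper name `passes_cutoff` is illustrative, and the 2‐fold threshold is the cutoff stated in the text.

```python
def passes_cutoff(fold_enrichment, cutoff=2.0):
    """A locus counts as validated if IP/Input enrichment exceeds the cutoff."""
    return fold_enrichment > cutoff


# (validated, tested) counts as reported in the text.
exonic = (19, 24)
utr3 = (4, 7)

total_validated = exonic[0] + utr3[0]
total_tested = exonic[1] + utr3[1]
overall = total_validated / total_tested

print(f"exonic:  {exonic[0] / exonic[1]:.1%}")
print(f"3' UTR:  {utr3[0] / utr3[1]:.1%}")
print(f"overall: {total_validated}/{total_tested} = {overall:.1%}")
```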

Figure 22: Validation of candidate SRp20 PAR‐CLIP clusters by quantitative RT‐PCR

(A) At 96 hr after induction, immunoprecipitation was done in the SRp20 stable cell line using T7 antibody. CLIP‐PCR was performed as described in the Methods. 31 positive loci (light gray: 24 exonic clusters; blue: 7 3’ UTR clusters) and 3 negative loci (dark gray) were randomly selected from the PAR‐CLIP clusters for validation. Relative enrichment of individual loci (comparing IP+ and Input control) from triplicate experiments is plotted. The line representing 2‐fold enrichment is shown as a cutoff. Error bars represent SEM.

(B) Representative gels for three SRp20 exonic clusters after RIP‐RT‐PCR analysis are shown. RNA extracted from mock IP, T7‐SRp20 specific IP or Input control was used as template for reactions with or without RT. Regular PCR was then performed using the same primers as for quantitative RT‐PCR.

5.4 De novo motif discovery

Several SRp20 motifs have been identified using in vitro SELEX methods, including GGUCCUCUUC (Lou et al., 1998), [AU]C[AU][AU]C (Cavaloc et al., 1999), and CUC[UG]UC[UC] (Schaal and Maniatis, 1999). The apparent discrepancies between these SRp20 motifs may arise from the different local sequence contexts of the individual assay designs. In order to test whether the in vivo motif is similar to those identified in vitro, we performed de novo motif discovery using cERMIT, an evidence‐ranked motif identification tool for genome‐wide motif discovery (Georgiev et al., 2010). Among the 11,280 clusters mapped to mRNA transcripts, 6,323 clusters (56.1%) contain a degenerate CWWCW motif (W: A or U) (Figure 23A). This motif resembles the CUCUUCU motif identified by in vitro assays (Lou et al., 1998; Schaal and Maniatis, 1999). Interestingly, SRp20 has recently been reported to control the early‐to‐late switch of papillomaviral transcripts (Jia et al., 2009). The switch is mediated by the binding of SRp20 to A/C‐rich elements in the viral transcripts. Once again, this motif partially overlaps with the consensus sequence identified by PAR‐CLIP. Taken together, these results suggest that the PAR‐CLIP strategy employed here is a reliable assay for identifying the in vivo binding motif of SRp20.
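Scanning cluster sequences for the degenerate CWWCW motif can be illustrated with a regular expression. The sketch below is not the cERMIT procedure (which discovers motifs de novo); it only shows how occurrences of an already defined CWWCW pattern might be counted. Accepting T as well as U, so that DNA‐space sequences also match, and counting overlapping occurrences separately are both assumptions made for this example.

```python
import re

# CWWCW with W = A or U; T is accepted too so DNA-space cluster sequences match.
# The zero-width lookahead lets overlapping occurrences be counted separately.
CWWCW = re.compile(r"(?=C[AUT][AUT]C[AUT])")


def count_motifs(seq):
    """Number of (possibly overlapping) CWWCW occurrences in a sequence."""
    return len(CWWCW.findall(seq.upper()))


def fraction_with_motif(seqs):
    """Fraction of sequences that contain at least one CWWCW occurrence."""
    hits = sum(1 for s in seqs if count_motifs(s) > 0)
    return hits / len(seqs)


# Toy sequences for illustration only.
print(count_motifs("CAUCU"))      # one occurrence
print(count_motifs("CAUCAUCA"))   # two overlapping occurrences
print(count_motifs("GGGG"))       # none

# Proportion reported in the text: 6,323 of 11,280 clusters.
print(f"{6323 / 11280:.1%}")
```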

To better understand the potential sequence context surrounding the motif, the distribution of the CWWCW motif in different regions within the mRNA transcripts was investigated (Figure 23B). In the classic model, SR proteins bind to the Exonic Splicing Enhancers (ESEs) to promote splicing (Long and Caceres, 2009). Consistent with the model, exons are the main regions targeted by SRp20. 78.6% of exonic clusters contain the CWWCW motif, and 59.7% of all motif‐containing clusters belong to this category. Notably, 72.8% of the clusters in 3’ UTRs and 63.3% of the clusters in 5’ UTRs also contain this motif. These three categories combined account for 86.6% of all motif‐containing clusters. Although 4,103 intronic clusters were included in the motif finding, only 20.6% of them contain the potential motif. Since no other significant motif was discovered for the intronic clusters, these results indicate that the experimental noise for intronic clusters may be higher than for other regions, or that a more degenerate motif might be responsible for the binding of SRp20 to intronic regions. Taken together, these data suggest that SRp20 primarily binds to the exons and 3’ UTRs of pre‐mRNAs (see Conclusions and Discussions).

Figure 23: Identification of putative SRp20 binding motif

(A) Left panel: The CWWCW motif identified by cERMIT. Right panel shows the proportion of SRp20 clusters with or without the CWWCW motif as a function of sequencing reads.

(B) Left panel: Each protein‐coding transcript was divided into 5’ UTR, exon, intron, and 3’ UTR based on UCSC gene annotations. The mosaic plot shows the proportion of SRp20 clusters with or without the CWWCW motif in each region of mRNA transcripts. The actual numbers of PAR‐CLIP clusters in individual categories are summarized in the right panel.

(C‐F) Histograms show the distribution of CWWCW motifs per region of mRNA for exon, 3’ UTR, 5’ UTR and intron.

For a particular cluster, several CWWCW motifs may exist. Thus, the distribution of SRp20 binding motifs per cluster was plotted according to cluster location (3’ UTR, 5’ UTR, exon and intron) (Figure 23C‐F). Similar to the distribution of raw clusters, there are clearly more occurrences of SRp20 binding sites in exons and 3’ UTRs, with means of 1.16 and 1 motifs per cluster, respectively. In contrast, the numbers drop to 0.87 and 0.28 for clusters in 5’ UTRs and introns. These results suggest that SRp20 may bind to exons and 3’ UTRs with higher affinity than to other regions of pre‐mRNA.

Taken together, our analysis strongly suggests that SRp20 predominantly binds to exonic regions. Presumably, these binding events regulate the splicing efficiency of nearby introns. In addition, numerous SRp20 binding events were identified in 3’ UTRs. These binding events could potentially affect RNA polyadenylation, export, translation and/or stability. It has been shown that SR proteins may be directly involved in regulating 3’ end processing (see Introduction). One intriguing hypothesis is that they might reinforce the proper 3’ end formation of their respective transcripts by directly or indirectly recruiting the polyadenylation machinery (see Conclusions and Discussions).

5.5 Crosstalk between SRp20 and other SR proteins

One major role of an SR protein is to promote splicing by directly binding to the ESEs, which helps recruit basal spliceosomal components or antagonize the negative splicing regulators (Roscigno and Garcia‐Blanco, 1995; Zhu et al., 2001). In the PAR‐CLIP experiment, we identified 4,936 exonic clusters potentially bound by SRp20. To examine the potential effects of such interactions on splicing, I focused on exon skipping events (based on annotations from UCSC genes or Refseq), which are the most common mode of alternative splicing (Sammeth et al., 2008). 13 alternative exons with exonic SRp20 binding clusters were randomly selected, and primers were designed in the flanking constitutive exons to detect the exon skipping events. Four of them were uninformative: two were not detected due to low expression levels, and no alternative splicing events were detected for the other two. The remaining 9 showed alternative splicing patterns, and 6 of them (66.7%) showed splicing alterations (> 20%) between the conditions with and without SRp20 induction (Figure 24A). Interestingly, both enhanced exon skipping and enhanced inclusion were observed rather than a general promotion of splicing, suggesting that SRp20 is also involved in suppressing the splicing of nearby introns. Together, our data suggest that SRp20 broadly interacts with exonic regions and potentially affects local splicing.
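One way to quantify a cassette‐exon change from the two RT‐PCR bands is to compare the inclusion fraction before and after induction. The sketch below is an assumption about how such a comparison could be made: the text does not specify how the 20% threshold was defined, so the absolute change in inclusion fraction is used here, and all band intensities are made‐up numbers.

```python
def inclusion_fraction(inclusion, skipping):
    """Fraction of product that includes the cassette exon, from band intensities."""
    total = inclusion + skipping
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return inclusion / total


def splicing_change(before, after, threshold=0.20):
    """Compare (inclusion, skipping) band pairs before and after induction.

    Returns (changed, delta), where delta is the change in inclusion fraction
    and changed is True when |delta| exceeds the threshold (assumed definition).
    """
    delta = inclusion_fraction(*after) - inclusion_fraction(*before)
    return abs(delta) > threshold, delta


# Hypothetical band intensities (arbitrary units), not measured data.
changed, delta = splicing_change(before=(50, 50), after=(80, 20))
print(changed, round(delta, 2))   # enhanced inclusion

changed2, delta2 = splicing_change(before=(60, 40), after=(30, 70))
print(changed2, round(delta2, 2)) # enhanced skipping
```

A positive `delta` corresponds to enhanced exon inclusion and a negative one to enhanced skipping, matching the two directions of change reported above.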

Another interesting observation is that about 18% of SRp20 clusters were mapped to the 3’ UTR regions of mRNAs. Strikingly, a study from Chris Burge’s group demonstrated that alternative polyadenylation may be more heavily regulated than alternative splicing in different tissues (Wang et al., 2008). To better understand the architecture of 3’ end processing, several groups including ours have recently developed sequencing‐based methods to investigate alternative polyadenylation (APA) at genome scale (Jan et al., 2011; Mangone et al., 2010). A recent tissue survey showed that more than 50% of protein‐coding genes have two or more alternative polyadenylation sites in the 3’ UTR regions (Ni et al., manuscript in preparation).

While the detailed analysis of PA‐seq (poly(A) site sequencing) will be reported in a separate study, it provided a high‐quality APA data set that allowed me to investigate the potential involvement of SRp20 in APA. To this end, I focused on transcripts that have tandem PA sites as well as PAR‐CLIP clusters identified in their 3’ UTRs. A total of 1,683 genes belong to this category. 13 genes were then randomly selected to examine whether the binding of SRp20 to its cognate sites could affect alternative polyadenylation. For each transcript, a multiplex PCR assay was performed with 3 primers (Figure 24B). The forward primer was designed in the common region of the last exon. Two reverse primers were included in the assay such that the resulting amplicons reflect APA site usage: the first is located upstream of the proximal poly(A) site and captures mRNAs with both long and short 3’ UTRs; the second is located between the proximal and distal poly(A) sites and captures only the mRNAs with long 3’ UTRs. By comparing the polyadenylation profiles before and after SRp20 induction, I found that, in 11 of the 13 cases, the usage of alternative poly(A) sites was significantly changed upon SRp20 induction (Figure 24B). These preliminary data strongly suggest that SRp20 may be involved in 3’ end processing through direct binding.
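The logic of the three‐primer assay can be made explicit with a small calculation. Because the common amplicon reports both isoforms and the second amplicon reports only the long isoform, their ratio gives a rough estimate of distal poly(A) site usage. This is a sketch under a strong assumption, namely that both amplicons amplify with equal efficiency, which a multiplex PCR does not guarantee; the function names and band intensities are hypothetical.

```python
def distal_fraction(long_only, common):
    """Rough fraction of transcripts using the distal poly(A) site.

    common:    signal from the amplicon upstream of the proximal site
               (short + long 3' UTR isoforms)
    long_only: signal from the amplicon between the proximal and distal sites
               (long 3' UTR isoform only)
    Assumes equal amplification efficiency for both amplicons.
    """
    if common <= 0:
        raise ValueError("common amplicon signal must be positive")
    return min(long_only / common, 1.0)


def apa_shift(before, after):
    """Change in distal-site usage; inputs are (long_only, common) pairs."""
    return distal_fraction(*after) - distal_fraction(*before)


# Hypothetical band intensities (arbitrary units), not measured data.
shift = apa_shift(before=(20, 100), after=(60, 100))
print(round(shift, 2))  # positive value: shift toward the distal site
```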

Since SRp20 is a well‐known splicing regulatory protein, it will be interesting to test whether SRp20 affects 3’ end processing in a splicing‐dependent or ‐independent manner. Moreover, SRp20 might be involved in other post‐transcriptional regulation (e.g., RNA export, stability and translation), and these hypotheses deserve further testing.

Figure 24: Functional validation of candidate SRp20 binding sites

(A) Top panel: Diagram of the splicing events for exon skipping or inclusion; alternative splicing is represented with dashed lines. The forward and reverse primers in the flanking exons are indicated by red and green triangles, respectively. Middle and bottom panels: Cassette exons with potential SRp20 binding were randomly chosen based on SRp20 PAR‐CLIP data. The alternative splicing of cassette exons was monitored by radiolabeled RT‐PCR or regular RT‐PCR. Ratios between the upper band (exon inclusion) and the lower band (exon skipping) are shown at the bottom of each panel.

(B) Top panel: Diagram of a 3’ UTR region with two alternative polyadenylation sites. The relative locations of three primers were shown: one forward primer in the last exon (red triangle) and two reverse primers before the two poly(A) sites (green triangles). Middle and bottom panels: Genes with 2 alternative poly(A) sites and potential SRp20 binding in the 3’ UTR regions were randomly chosen based on SRp20 PAR‐CLIP data. The alternative usage of each poly(A) site was monitored by regular RT‐PCR. Upper band: Amplicon from transcripts with only long 3’ UTRs. Lower band: The common region from transcripts with either short or long 3’ UTRs.

Stars indicate non‐specific bands.

Recently, I have performed PA‐seq in the SRp20 stable cell line before and after induction, and the data analysis is currently under way. It is anticipated that this new data set will provide insight into whether SRp20 is broadly involved in APA.

Taken together, our data suggest that SRp20 may be involved in multiple steps of post‐transcriptional gene regulation, including but not limited to alternative splicing and alternative polyadenylation. These findings highlight SRp20 as a multifunctional protein in shaping the mammalian transcriptome.

5.6 SRp20 in regulating the expression of SR proteins

As described in the Introduction, the expression of SR proteins is regulated in a tissue‐ or cell type‐specific manner through multiple mechanisms (Jumaa and Nielsen, 1997; Sureau et al., 2001). In Chapter 3, I reported a negative feedback loop between SF2/ASF and miR‐7. Another interesting mechanism is alternative splicing coupled NMD, which is usually associated with highly conserved DNA elements within SR gene loci (Jumaa and Nielsen, 1997; Lareau et al., 2007; Sureau et al., 2001). However, the underlying mechanism is largely unknown. Since the initial PAR‐CLIP results showed that SRp20 targets are enriched in RNA‐binding proteins, I set out to test whether other SR proteins can be targeted by SRp20 in vivo, and to investigate the potential mechanism underlying such crosstalk.

To this end, I focused on SRp20 clusters in canonical SR protein transcripts. Interestingly, the genes for 6 of the 12 SR proteins have SRp20 binding clusters within their genic regions: three exonic clusters, four 3’ UTR clusters and four intronic clusters (Table 2). These results strongly suggest that SRp20 is involved in regulating the expression of SR proteins.

Table 2: Summary of PAR‐CLIP clusters in SR protein coding genes

Cluster Sequence                          Reads  Exon  3’UTR  Intron  5’UTR  Gene
CTCCTAACATCTACATTCCCCTCATG                154    0     1      0       0      Sfrs1
ACATAACTCATCATTCCCCAG                     73     0     1      0       0      Sfrs1
CCTTTATTCTTCTACATCTCTCCTAAG               9      0     0      1       0      Sfrs3
ATCAACATTACATTCAACTAG                     8      0     0      1       0      Sfrs3
TCTTCACAACAACTACAG                        99     1     0      1       0      Sfrs3
AAAAATCCACAACACCCCCTTCG                   25     1     0      1       0      Sfrs3
AAACTGCAATCAATATTTACCTTACAACCTTTCCATTA    49     0     1      0       0      Sfrs6
TCAACTTTACTCCTATTCATTG                    8      0     0      1       0      Sfrs7
CCACACTACTTCTCTCCTTTCAG                   104    1     0      0       0      Sfrs9
TTTTTCCCCCAATAATTTATTTCACATCTCTACCAGTG    54     0     1      0       0      Sfrs11
ACTTAACCACCACCTTACACCAACA                 7      0     0      1       0      Sfrs11
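The tallies quoted in the text (6 genes; three exonic, four 3’ UTR and four intronic clusters) can be recomputed from the rows of Table 2. In the sketch below the read counts and region flags are transcribed from the table; the rule that a cluster flagged as both exon and intron is counted as exonic is an assumption made so that each cluster falls into exactly one category.

```python
from collections import Counter

# (gene, reads, exon, utr3, intron, utr5), transcribed from Table 2.
TABLE2 = [
    ("Sfrs1", 154, 0, 1, 0, 0),
    ("Sfrs1", 73, 0, 1, 0, 0),
    ("Sfrs3", 9, 0, 0, 1, 0),
    ("Sfrs3", 8, 0, 0, 1, 0),
    ("Sfrs3", 99, 1, 0, 1, 0),
    ("Sfrs3", 25, 1, 0, 1, 0),
    ("Sfrs6", 49, 0, 1, 0, 0),
    ("Sfrs7", 8, 0, 0, 1, 0),
    ("Sfrs9", 104, 1, 0, 0, 0),
    ("Sfrs11", 54, 0, 1, 0, 0),
    ("Sfrs11", 7, 0, 0, 1, 0),
]


def classify(exon, utr3, intron, utr5):
    """Assign one category per cluster; exon > 3' UTR > 5' UTR > intron (assumed)."""
    if exon:
        return "exon"
    if utr3:
        return "3' UTR"
    if utr5:
        return "5' UTR"
    if intron:
        return "intron"
    return "other"


genes = sorted({row[0] for row in TABLE2})
counts = Counter(classify(*row[2:]) for row in TABLE2)

print(len(TABLE2), "clusters in", len(genes), "genes")
print(dict(counts))
```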

By examining the location of the SRp20 binding sites in individual SR genes, I found that 4 of the 11 clusters showed evidence of alternative splicing coupled NMD (Lareau et al., 2007): two in the 3’ UTR of SFRS1 (SF2/ASF), and the other two in exon 4 of SFRS3 (SRp20). For SRp20, I did observe upregulation of the exon 4‐containing isoform upon SRp20 overexpression (data not shown), suggesting negative feedback regulation of SRp20 on its own expression. This is also consistent with the previous report showing that SRp20 can promote the inclusion of exon 4 in its own transcript, resulting in a truncated protein (Jumaa and Nielsen, 1997).

As for SFRS1, it has been shown that the human SFRS1 gene can give rise to six potential isoforms, partially due to alternative splicing in its 3’ UTR (Sun et al., 2010). Two highly conserved regions have been reported in the 3’ UTR of SFRS1, which are associated with alternative splicing (Lareau et al., 2007). Strikingly, the two clusters identified by PAR‐CLIP‐seq overlap with the two introns, and the second one is located very close to the 3’ splice site of the last intron (Figure 25A). Quantitative RIP‐RT‐PCR results showed 9.6‐ and 17‐fold enrichment, respectively, for these binding clusters after immunoprecipitation (Figure 25B). These data confirmed the PAR‐CLIP‐seq results and demonstrated the potential direct interactions between SRp20 and the 3’ UTR of SFRS1 transcripts.

In order to test whether those two alternative splicing events are affected by SRp20 overexpression, a pair of primers was designed to cover the two alternative introns and the two CLIP clusters (Figure 25A). Interestingly, when SRp20 was induced, the splicing of the second intron was dramatically decreased, while the protein‐producing isoform (the isoform with the full‐length 3’ UTR) was increased correspondingly (Figure 25C). As expected, the overall mRNA level of SFRS1 was slightly increased, possibly due to less NMD (Figure 25C). Moreover, this change in the alternative splicing of SF2/ASF mRNA indeed led to more SF2/ASF protein production in the stable cell line (Jia et al., unpublished data). Taken together, SRp20 is likely to be involved in the regulation of SF2/ASF expression. Such regulation may serve as a model mechanism to coordinate the expression of SR proteins.

Figure 25: SRp20 regulates the expression of the SFRS1 gene through direct binding

(A) Gene structure of the mouse SFRS1 locus is shown on the UCSC genome browser. Two putative SRp20 clusters identified by PAR‐CLIP and the PCR amplicon designed for alternative splicing analysis are shown at the top. Middle panel: Refseq and Ensembl gene annotations for SFRS1. The two introns within the 3’ UTR are shown. Bottom panel: Conservation across mammals.

(B) Quantitative RT‐PCR against the two individual SRp20 PAR‐CLIP clusters was performed as described before. Relative enrichment was determined by comparing IP+ and Input control samples. Results from triplicate experiments were plotted. A representative gel image is shown on the right.

(C) Left panel: Diagram of the splicing pattern in the 3’ UTR of SFRS1 is shown, and alternative splicing is represented with dashed lines. Middle panel: Splicing pattern in the 3’ UTR of the SFRS1 gene was monitored by radiolabeled RT‐PCR before and after SRp20 induction. Relative abundance of each isoform is shown. Right panel: The steady‐state level of SF2/ASF (SFRS1) was quantified by real‐time PCR before and after induction (96 hr). Actin was used as an internal control (n=3).

Error bars represent SEM.

6. Conclusions and Discussions

6.1 Negative feedback circuit between SFRS1 and miR-7

In the first part of my thesis, I showed that SF2/ASF and miR‐7 can potentially form a negative feedback circuit. SF2/ASF directly binds to the primary miR‐7 transcript to promote its maturation; mature miR‐7 in return represses the translation of SFRS1 mRNA by targeting its 3’ UTR (Figure 17B). Negative feedback has been shown to be a common mechanism for maintaining the steady‐state levels of SR proteins (Jumaa and Nielsen, 1997; Sureau et al., 2001). As for SF2/ASF, it is known that auto‐feedback regulation can happen at multiple levels, including alternative splicing and translation initiation (Lareau et al., 2007; Ni et al., 2007; Sun et al., 2010). The miR‐7‐mediated negative feedback loop therefore is expected to synergize with other mechanisms to precisely control the protein level of SF2/ASF in cells. The relative contribution of each mechanism may vary under different conditions, such as in different cell types or pathophysiological states. This may also explain the different phenotypes I observed when knocking down endogenous miR‐7 in HEK293T and HeLa cells. Because the regulation observed in the SFRS1/miR‐7 circuit is relatively modest (2‐fold effects in both directions), miR‐7‐mediated negative feedback does not appear to be a major contributor to the robust feedback regulation observed for SF2/ASF in HeLa cells. Interestingly, Dicer‐null ES cells showed decreased but detectable feedback of SF2/ASF on its own expression (Sun et al., 2010), consistent with the notion that both miRNA‐dependent and ‐independent mechanisms are involved.

Given the reciprocal nature of the SFRS1/miR‐7 circuit, another possibility is that SF2/ASF may be critical for the homeostasis of miR‐7, an important regulatory molecule orchestrating diverse cellular functions (Li et al., 2009). Besides the SFRS1/miR‐7 feedback loop, a circuit with similar architecture has been reported between the transcription factor E2F and the miR‐17‐92 cluster (O’Donnell et al., 2005; Woods et al., 2007). Mathematical modeling showed that miR‐17‐92 is essential for the E2F/Myc cancer network to balance between cell proliferation and apoptosis (Aguda et al., 2008). Since SFRS1 has been shown to serve as a proto‐oncogene (Karni et al., 2007), it will be interesting to further investigate the SFRS1/miR‐7 feedback loop from a network perspective in order to fully understand its functional significance.

Notably, negative feedback does not always lead to a stable steady state, as over‐correction and/or time delay could result in oscillation (Elowitz and Leibler, 2000). Because gene repression by miRNA is relatively modest, with little time delay, miRNA‐mediated negative feedback loops are advantageous for noise dampening, and have been shown as a recurrent circuit motif in mammalian gene regulatory networks (Tsang et al., 2007).

Interestingly, a similar negative feedback circuit was identified between SRp20 and miR‐186 (data not shown). Thus, it is tempting to propose that negative feedback circuits between RNA‐binding proteins and the miRNAs they regulate might be prevalent in mammalian genomes. One potential future direction is to investigate the biological significance of such miRNA‐embedded negative feedback loops in vivo using model organisms.

6.2 A splicing-independent function of SF2/ASF in miRNA processing

One key finding of my thesis study concerns the splicing‐independent function of SF2/ASF in pri‐miRNA processing. By separating splicing from miR‐7 production in the hnRNPK locus, I provided multiple lines of evidence showing that SF2/ASF promotes miR‐7 maturation at the Drosha cleavage step through direct binding to pri‐miR‐7, and that such regulation is independent of splicing. HnRNP A1, another well‐known alternative splicing factor, serves as an auxiliary factor in pri‐miR‐18a processing (Guil and Caceres, 2007). In fact, the binding of hnRNP A1 to pri‐miR‐18a introduces a conformational change in its terminal loop to allow more efficient Drosha cleavage. It is plausible that SF2/ASF may function in a similar manner during pri‐miR‐7 maturation, a possibility that can be tested with RNA footprinting. In addition, I observed a dominant‐negative effect of SF2∆RS on miR‐7 production. Since the RS domain of an SR protein is thought to be responsible for protein‐protein interactions, this result suggests that an additional factor is involved in enhancing pri‐miRNA maturation, which warrants further characterization.

Both alternative splicing and miRNA processing coincide in the last intron of the hnRNPK gene. This provides a unique opportunity to examine the potential cooperation and/or competition between the spliceosome and Microprocessor (Drosha/DGCR8 complex). It has been shown that processing of intronic miRNAs takes place before intron removal (Kim and Kim, 2007). Our results showed that SF2/ASF promotes the proximal 3’ splice‐site usage of the miR‐7‐containing intron. However, the ability of SF2/ASF to promote miR‐7 expression is slightly reduced when the proximal 3’ splice site is exclusively used (Figure 12, A and B), suggesting there is potential context dependency. It also implies that spliceosome assembly at the nearby splice sites may affect the processing of intronic miRNAs. One attractive model is that the spliceosome might compete with the miRNA processing machinery for common auxiliary factors (e.g., SF2/ASF). Thus, multi‐functional RNA‐binding proteins, such as SF2/ASF and hnRNP A1, may serve as key mediators for coupling intronic miRNA processing and pre‐mRNA splicing. Alternatively, intronic miRNAs might adopt different local conformations depending on alternative splice‐site usage. We therefore propose that the functions of SF2/ASF in mRNA and miRNA processing may not be mutually exclusive; instead they might modulate each other in a context‐dependent manner.

6.3 SR proteins may be broadly involved in miRNA biogenesis

The functional involvement of SF2/ASF in pri‐miRNA processing is not limited to miR‐7. Preliminary evidence showed that SF2/ASF is also involved in the maturation of miR‐221, miR‐222, and miR‐29b‐1 (Figure 17A). While enhanced Drosha cleavage is likely to be involved in the cases of miR‐221 and miR‐222, SF2/ASF might promote miR‐29b‐1 maturation (but not miR‐29b‐2) at a post‐cropping step (e.g., pre‐miRNA export and/or Dicer cleavage). Supporting this notion, SF2/ASF is a known shuttling protein that plays diverse roles in both the nucleus and cytoplasm (Caceres et al., 1998; Sanford et al., 2004). Furthermore, it has been shown that KSRP, a well‐known factor involved in alternative splicing and mRNA degradation, regulates the biogenesis of a subset of miRNAs at several distinct steps (Trabucchi et al., 2009). Therefore, miR‐29b may serve as an excellent substrate in future studies to test the potential involvement of SF2/ASF in other steps of miRNA maturation.

Conversely, our profiling results showed that the expression of a subset of miRNAs can be repressed by SF2/ASF. It will be interesting to find out whether SF2/ASF is directly involved and may play a negative role in pri‐miRNA processing in a substrate‐specific manner. Indeed, one RNA‐binding protein and splicing regulatory protein, hnRNP A1, has recently been shown to negatively regulate let‐7a biogenesis through competing with the binding of KSRP (Michlewski and Caceres, 2010).

In our minigene reporter assay, another SR protein, SC35, can also promote miR‐7 maturation to some extent (Figure 11A). As for SRp20, 23 miRNAs were differentially expressed upon SRp20 induction (data not shown). Taken together, it is conceivable that SR proteins may be broadly involved in positive and/or negative regulation of miRNA biogenesis at the post‐transcriptional level.

It is known that individual SR proteins favor distinct sequence motifs for splicing regulation. Substrate specificity may also apply to miRNA biogenesis. For example, both SF2/ASF and SC35 can promote miR‐7 maturation in our minigene system, while 9G8 cannot. Thus, it will be important to determine the substrate specificities of, as well as the potential cooperation/competition between, different RNA‐binding proteins in controlling tissue‐ and cell type‐specific miRNA expression. Indeed, a recent study showed that hnRNP A1 and KSRP can play antagonistic roles in let‐7a biogenesis (Michlewski and Caceres, 2010). Interestingly, miR‐7 expression can be upregulated by both SRp20 and SF2/ASF (data not shown). Since our preliminary data showed that SRp20 promotes SF2/ASF expression, it will be interesting to test whether SRp20 regulates miR‐7 maturation indirectly through SF2/ASF, or whether SRp20 is directly involved in miR‐7 expression.

6.4 SR proteins may regulate their downstream gene expression in concert with miRNAs

Both RNA‐binding proteins and miRNAs regulate the expression of a large number of protein‐coding genes. Therefore, they may share common downstream targets and/or signaling pathways. Such a “wiring” structure has been reported between transcription factors and their regulated miRNAs, and is a recurring motif in transcriptional gene networks (Lee et al., 2007; Shalgi et al., 2007). One well‐known example is the miR‐34 family of miRNAs, which are direct transcriptional targets of p53 and can act in concert with other p53 downstream effectors to inhibit inappropriate cell proliferation (Chang et al., 2007; He et al., 2007).

Notably, SF2/ASF can function as an oncoprotein by activating the PI3K‐mTOR pathway (Karni et al., 2008). Our preliminary data suggested that miR‐7 can target several genes in the same pathway, indicating that coordination may occur at the level of downstream gene modules. In addition, several SF2/ASF‐upregulated miRNAs (e.g., miR‐221 and miR‐222) have been implicated in tumorigenesis (Sun et al., 2009; Terasawa et al., 2009). It is possible that these miRNAs may either antagonize or synergize with SF2/ASF during tumorigenesis. One attractive model is that splicing regulation and miRNA‐mediated gene repression may be broadly coordinated in post‐transcriptional gene regulatory networks, a possibility that deserves more systematic characterization.

6.5 Landscape and biological significance of the SRp20-RNA interactions

SR proteins are known to be involved in multiple steps of RNA maturation, especially during constitutive and alternative splicing. Early studies suggested partially redundant but distinct functions for different SR proteins in splicing regulation. One step toward solving this puzzle is to experimentally identify the binding targets of individual SR proteins in vivo. Notably, PAR‐CLIP, a newly developed high‐throughput method, has been successfully used to profile global interactions between RBPs and their cognate RNAs (Hafner et al., 2010a, b). By performing genome‐wide PAR‐CLIP in a stable cell line with inducible SRp20 expression, I have identified 6,323 potential SRp20 binding sites (56.1% of the 11,280 raw clusters) with a CWWCW motif, similar to the CUCUUCU motif identified by others using an in vitro method (Schaal and Maniatis, 1999). Consistent with the literature, more than half of the binding events (3,775) reside in exonic regions, and may affect local splicing. Indeed, 6 of 9 randomly chosen cassette exons with SRp20 binding sites showed significant changes in exon skipping/inclusion after SRp20 induction. Since the target exons can be either included or skipped, this raises the interesting possibility that SRp20 may function as a splicing repressor on some occasions, and the underlying mechanisms deserve further characterization. Taken together, these data suggest that SRp20 predominantly targets exons for splicing regulation.

6.6 SRp20 in 3’ end processing

Polyadenylation is an indispensable step in eukaryotic gene expression and is critical for controlling the stability, subcellular localization and translation of mature mRNAs (Millevoi and Vagner, 2010). Proper 3’‐end formation has to be tightly regulated under pathophysiologic conditions, such as cell differentiation, cell cycle, tumorigenesis and viral infection. Defects in polyadenylation often lead to human diseases (Danckwardt et al., 2008; Ozsolak et al., 2010). In addition, polyadenylation can be alternatively regulated. It has been estimated that more than half of the genes in the human genome are alternatively polyadenylated (Tian et al., 2005). Interestingly, global shortening of 3’ UTRs through alternative polyadenylation (APA) has been implicated in tumor development (Sandberg et al., 2008). However, the identities of the key regulators involved in APA, other than the basal polyadenylation machinery, remain unclear. One potential candidate is the SR family of protein factors, which has been shown to directly interact with the 3’ end processing machinery (Ko and Gunderson, 2002; Valente et al., 2009). However, the general functions of SR proteins in 3’ end processing are largely unknown.

Our PAR‐CLIP experiment has shown extensive interactions between SRp20 and the 3’ UTRs of protein‐coding genes, suggesting the possible involvement of SRp20 in 3’ end processing. Assisted by our unpublished data on genome‐wide profiling of polyadenylation sites, genes with tandem poly(A) sites were selected to investigate whether SRp20 may globally affect the choice of polyadenylation site. Strikingly, in the majority of the cases tested (11 of 13), I found differential usage of alternative polyadenylation sites upon SRp20 induction. Although the general involvement of SRp20 in constitutive 3’ end processing cannot be ruled out, these data strongly suggest a novel function of SRp20 in regulating alternative 3’ end processing.

It has been shown that splicing of the terminal intron is often coupled with 3’ end processing (Niwa and Berget, 1991; Niwa et al., 1990). Mutations of the 3’ splice site or the pyrimidine tract of the last intron decrease polyadenylation efficiency. Conversely, mutation of the AAUAAA polyadenylation signal represses the splicing of the proximal intron in vitro (Azuma‐Mukai et al., 2008; Cooke and Alwine, 1996; Cooke et al., 1999; Niwa and Berget, 1991). In our study, I have identified the global involvement of SRp20 in both alternative splicing regulation and 3’ end processing. It is tempting to suggest that SRp20 may be one key factor coordinating splicing and 3’ end formation. Thus, it will be interesting to investigate whether SRp20 can affect 3’ end processing in coordination with splicing regulation. On the other hand, some SRp20 binding sites can be found far away from the nearby introns (e.g., > 3 kb in the case of Etv6), suggesting that splicing‐independent mechanisms may apply. In this scenario, SRp20 may be involved in direct recruitment of the polyadenylation machinery, similar to its role in splicing regulation. Interestingly, such a phenomenon has been observed for the neuron‐specific alternative splicing factor Nova (Licatalosi et al., 2008). Taken together, splicing regulatory proteins may be generally involved in both alternative splicing and 3’ end processing.

6.7 SRp20 in the regulation of alternative splicing coupled NMD

Even though only 1.1%‐1.5% of the human genome is exonic (Lander et al., 2001; Venter et al., 2001), distinct genomic regions with potential regulatory roles are highly conserved across mammals. A total of 481 ultraconserved regions (UCRs), each longer than 200 bp and 100% identical among the human, mouse and rat genomes, have been identified. Of these UCRs, 111 are located in exonic regions covering 93 known genes (Bejerano et al., 2004). Strikingly, these genes are highly enriched for RNA‐binding proteins, including splicing factors. A further study showed that UCRs in SR genes are usually associated with unproductive alternative splicing variants that introduce premature termination codons (PTCs) (Lareau et al., 2007). Since UCRs are usually considered non‐coding regions, which are not under selective pressure from protein‐coding constraints, one potential explanation for such conservation is a regulatory role in maintaining protein homeostasis.
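The UCR criterion quoted above (longer than 200 bp, 100% identity among human, mouse and rat) can be illustrated as a scan over a three‐way alignment. The sketch below is only an illustration of that criterion, not the procedure used by Bejerano et al.; the function name, the gapped‐string input format and the toy sequences are assumptions made for this example.

```python
from typing import List, Tuple


def ultraconserved_runs(aligned: List[str], min_len: int = 200) -> List[Tuple[int, int]]:
    """Return (start, end) column intervals, 0-based and end-exclusive, in which
    every aligned sequence carries the same base (no gaps) for MORE than
    min_len consecutive columns."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None
    # Iterate one column past the end so that a run touching the end is closed.
    for col in range(width + 1):
        identical = (
            col < width
            and aligned[0][col] != "-"
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if identical:
            if start is None:
                start = col
        else:
            if start is not None and col - start > min_len:
                runs.append((start, col))
            start = None
    return runs


# Toy alignment (hypothetical sequences); a small threshold keeps the example short.
human = "ACGTACGTTTGACA"
mouse = "ACGTACGTATGACA"  # differs from the other two at column 8
rat = "ACGTACGTTTGACA"
toy_runs = ultraconserved_runs([human, mouse, rat], min_len=4)
```

With the default `min_len=200` the same function would apply the length threshold stated in the text; the toy call uses a threshold of 4 only so that short sequences produce a visible result.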

SR genes are highly enriched for PTC‐containing exons, as in the case of SFRS3 (SRp20), SFRS4 (SRp75), SFRS5 (SRp40), SFRS6 (SRp55) and SFRS7 (9G8) (Lareau et al., 2007). Consistent with this observation, SRp20 overexpression promotes inclusion of exon 4 (a PTC‐containing exon) in our stable cell line system, leading to an unstable SRp20 transcript. This negative feedback may be critical for maintaining the steady‐state level of SRp20 expression. In the case of SFRS1, two alternative introns flanked by three UCRs have been found in the 3’ UTR. Because splicing of these introns introduces a new exon‐exon junction in the 3’ UTR, the resulting RNA variants are subject to NMD (Lareau et al., 2007). Interestingly, we found that two SRp20 binding clusters overlap with these UCRs; SRp20 binding at these sites inhibits splicing of the alternative intron and increases the level of the protein‐producing isoform (Figure 24C). Such crosstalk may explain the correlated expression of SRp20 and SF2/ASF proteins in diverse tissues (data not shown). To our knowledge, this is the first example showing that UCRs may be involved in the regulation of protein homeostasis via alternative splicing coupled NMD.
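The reasoning that an exon‐exon junction in the 3’ UTR exposes a transcript to NMD can be written as a simple positional check. The sketch below assumes the commonly cited rule of thumb that a junction lying more than about 50 nt downstream of the stop codon marks a transcript as a likely NMD target; that threshold, the function name and all coordinates are assumptions for illustration and are not values from this study.

```python
from typing import Iterable


def junction_downstream_of_stop(
    stop_end: int, junctions: Iterable[int], min_distance: int = 50
) -> bool:
    """Return True if any exon-exon junction lies more than min_distance nt
    downstream of the end of the stop codon.

    All positions are coordinates along the spliced mRNA. The 50-nt default
    is an assumed rule of thumb, not a measured value.
    """
    return any(junction - stop_end > min_distance for junction in junctions)


# Hypothetical coordinates, for illustration only.
stop_end = 900
productive_isoform = [150, 420, 780]         # all junctions upstream of the stop codon
utr_spliced_isoform = [150, 420, 780, 1100]  # extra junction inside the 3' UTR

productive_flag = junction_downstream_of_stop(stop_end, productive_isoform)
utr_spliced_flag = junction_downstream_of_stop(stop_end, utr_spliced_isoform)
```

Under these assumptions only the isoform with the additional 3’ UTR junction is flagged, which mirrors the argument made above for the SFRS1 variants.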

6.8 SRp20 and SF2/ASF in tumorigenesis

SRp20 has recently been identified as a potential proto‐oncogene that controls cell cycle progression and proliferation (Jia et al., 2010). To test whether SRp20 binding targets are enriched for certain gene categories or biological functions, gene ontology (GO) analysis was performed. Indeed, the results identified cell cycle (524 genes) as one of the major biological processes targeted by SRp20. In addition, RNA binding (161 genes) is one of the top molecular functions potentially regulated by SRp20. More specifically, we identified potential SRp20 binding in 6 of the 12 SR genes. These observations suggest that SRp20 may be broadly involved in regulating other regulatory proteins. In addition, SRp20 targets are enriched for a variety of well‐known signaling pathways, including EGF receptor, Ras, Wnt, VEGF, FGF and p53. Taken together, SRp20 might be one of the master regulators of both normal development and pathophysiologic conditions.
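The GO results in Appendix E report, for each category, the number of reference genes, the number of CLIP target genes, an expected count and a +/‐ trend. The sketch below shows one way these columns relate to each other, assuming the expected count is the reference count scaled by a single constant fraction. That fraction is inferred here from one Appendix E row (primary metabolic process: 1200.13 expected out of 9122 reference genes); the helper names are hypothetical, and the P‐values are not reproduced because the underlying test and correction are not restated here.

```python
# Fraction implied by one Appendix E row: Expected / Reference genes
# (primary metabolic process). Treating it as constant is an assumption.
CLIP_FRACTION = 1200.13 / 9122


def expected_count(reference_genes: int, fraction: float = CLIP_FRACTION) -> float:
    """Number of CLIP target genes expected in a category by chance."""
    return reference_genes * fraction


def trend(observed: int, expected: float) -> str:
    """'+' for over-representation, '-' for under-representation."""
    return "+" if observed > expected else "-"


# (reference genes, CLIP genes) taken from Appendix E.
rows = {
    "cell cycle": (2018, 524),
    "RNA binding": (575, 161),
    "response to pheromone": (440, 1),
}

summary = {}
for term, (reference, observed) in rows.items():
    expected = expected_count(reference)
    # Store: expected count, fold enrichment (observed / expected), trend sign.
    summary[term] = (expected, observed / expected, trend(observed, expected))
```

Under this assumption the cell cycle and RNA binding categories contain roughly twice as many CLIP target genes as expected, whereas response to pheromone is strongly depleted.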

In our study, both in vivo binding assays and functional data suggested that SRp20 is an upstream regulator of SF2/ASF expression. Notably, both SFRS1 (SF2/ASF) and SFRS3 (SRp20) have recently been reported as potential proto‐oncogenes (Jia et al., 2010; Karni et al., 2007). SF2/ASF controls the splicing of the tumor suppressor BIN1 (Bridging Integrator 1) and of the kinases MNK2 (MAP kinase interacting serine/threonine kinase 2) and S6K1 (Ribosomal protein S6 Kinase polypeptide 1) (Karni et al., 2007). SRp20 regulates the expression of FoxM1 (Forkhead box transcription factor M1) and its transcriptional targets, Plk1 (Polo‐like kinase 1) and Cdc25B (Cell division cycle 25 homolog B) (Jia et al., 2010). Therefore, one attractive possibility is that SF2/ASF is partially responsible for the tumorigenic properties of SRp20. Supporting this notion, the expression levels of SRp20 and SF2/ASF are often correlated in a wide range of tumor samples (Jia et al., unpublished data). It will be interesting to find out whether SF2/ASF and SRp20 regulate the same set of downstream targets or whether the two pathways operate independently in tumorigenesis. As gene expression profiles and protein‐RNA interaction data continue to accumulate, such studies can be extended to other RNA‐binding proteins whose involvement in tumor development and progression has not been characterized. These data can be further combined with transcriptional gene regulation to understand the molecular mechanisms underlying tumorigenesis. Such systems biology approaches may ultimately lead to a better understanding of gene regulatory networks under pathophysiologic conditions.

Appendix A: Oligonucleotides used for Luciferase reporter assay

SF2/ASF

F: TGGGCTAAAGTGTTGAATTGC; R: CAAGGATTGGGATCCAGTGA

MAP3K9

F: TCCAGCCCTACTTCTTGCAC; R: AACACTGGTGTTTGGGAAGC

SRF

F: AGGTCAGCCCTGTGTCTGTC; R: AGGTCAGCCCTGTGTCTGTC

ERBB4

F: ACACACCTGCTCCAATTTCC; R: CAACCCATGCAGAGAAATGA

PSCD3

F: GCCCAGTCCAGAGACAAAAC; R: ACACCTGGGTTGGTCAGAAG

RAF

F: CCTGATGTGGAGACACATGG; R: AAACCTGTATTCCTGGCTTCC

MKNK1

F: CTGCAGGGAGAAGCAAGAAG; R: GAAGTTCATGGGCTTGCAGT

MAPK4

F: GAAACCATCTGGCCCAACTA; R: GAAACGAAGATTTCCGACCA

RPS6KB1

F: CTGTGGCCCACTCTCTGTCT; R: CCCTTTCACCAGCTAAAGTCA

PIK3CD

F: AACAGCCATAAACGGAAACG; R: GCCAGGCACACAAGACTGTA

PTK2

F: TGGGTCGGGAACTAGCTGTA; R: ATCCCCCAACAAACTAAAGG

IRS2

F: TTTCTGCTCCCTCGACACTT; R: ATTCTCGTCTCCAGCAGCAT

PRKCB1

F: GTTGAGCCTGGGGTGTAAGA; R: TGATGAGTTTCTGGAAGTTTGG
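Primer lists such as the one above lend themselves to a quick automated consistency check. The following is a minimal sketch, not part of the original workflow, that reports primer length and GC fraction and flags any pair whose forward and reverse sequences read identically (as the SRF entry does in this list), so that such entries can be rechecked against the original records. The function names are hypothetical, and only three of the pairs are copied in for brevity.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def check_pair(forward: str, reverse: str) -> dict:
    """Basic descriptive checks for one primer pair."""
    return {
        "forward_len": len(forward),
        "reverse_len": len(reverse),
        "forward_gc": gc_fraction(forward),
        "reverse_gc": gc_fraction(reverse),
        # Identical forward and reverse primers cannot define an amplicon.
        "identical": forward.upper() == reverse.upper(),
    }


# Three pairs copied from Appendix A.
primer_pairs = {
    "SF2/ASF": ("TGGGCTAAAGTGTTGAATTGC", "CAAGGATTGGGATCCAGTGA"),
    "MAP3K9": ("TCCAGCCCTACTTCTTGCAC", "AACACTGGTGTTTGGGAAGC"),
    "SRF": ("AGGTCAGCCCTGTGTCTGTC", "AGGTCAGCCCTGTGTCTGTC"),
}

report = {name: check_pair(fwd, rev) for name, (fwd, rev) in primer_pairs.items()}
needs_review = [name for name, row in report.items() if row["identical"]]
```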

Appendix B: Oligonucleotides used for semi-quantitative PCR

SF2/ASF

F: 5’‐CTCACTGGATCCCAATCCTT‐3’; R: 5’‐GTAATCTGATCCCGCTCCAT‐3’

EGFP

F: 5’‐ACTACCTGAGCACCCAGTCC‐3’; R: 5’‐CTTGTACAGCTCGTCCATGC‐3’

Total hnRNPK

F: 5’‐AACAAATCCGTCATGAGTCG‐3’; R: 5’‐ATACTGCTTCACACTGTTCTGCA‐3’

hnRNPK splicing variants

F: 5’‐TCCGAAGATCGGATCATTACC‐3’; R: 5’‐GCAGGACTCCTTCAGTTCTTCA‐3’

GAPDH

F: 5’‐TCAAGAAGGTGGTGAAGCAG‐3’; R: 5’‐CGCTGTTGAAGTCAGAGGAG‐3’

MiR‐7 stem‐loop

F: 5’‐AAAACTGCTGCCAAAACCAC‐3’; R: 5’‐GCTGCATTTTACAGCACCAA‐3’

Appendix C: Oligonucleotides used for RIP-RT-PCR validation of PAR-CLIP clusters

Elovl5_exon7

F: 5’‐TGCTGACAATCATCCAGACG‐3’; R: 5’‐AAGAACAGCCACCCGAGAG‐3’

Hnrpll_exon3

F: 5’‐CCTGATGGATCGTCGGTATT‐3’; R: 5’‐CCAGCAGGCTTTCTTCAACT‐3’

Gsk3b_exon9

F: 5’‐CCGTCTGCTGGAGTACACAC‐3’; R: 5’‐ATGTGCACAAGCTTCCAGTG‐3’

Thbs1_exon19

F: 5’‐TGATGACTACGCTGGCTTTG‐3’; R: 5’‐CAGGCCTGAGTATCCCTGAG‐3’

Lmf2_exon6

F: 5’‐AGGCTCAGCAGACAGGTGAC‐3’; R: 5’‐CTGCTAACCCTGGTGCTCA‐3’

Papd5_exon11

F: 5’‐AAAAGGAAGCCCTTGGAAAA‐3’; R: 5’‐AGGAGGAGGACACTGGACCT‐3’

Bptf_exon21

F: 5’‐GCTGAGGACAGGATGGAGAT‐3’; R: 5’‐GTCACAGCCCCAGGTACAGT‐3’

Cul3_exon3

F: 5’‐CACCATGGCTGTTTGATGAT‐3’; R: 5’‐AACTTTCTTCAAACACTAAATCAAGC‐3’

Exoc5_exon17

F: 5’‐AGCATGCCACCCATACAACT‐3’; R: 5’‐ATGGATGGGAAGAATGTGGA‐3’

Dpy19l1_exon14

F: 5’‐CAAACTCGGCTGCACAAGTA‐3’; R: 5’‐TTGGCAACTTATTAACATCAAAATTC‐3’

Kif20b_exon30

F: 5’‐ACGGAGCCAAGCATCTACAG‐3’; R: 5’‐TGCTGCAAGAAGTCTCCAAA‐3’

Tm9sf3_exon9

F: 5’‐AAGAATGCAGTGCCACACAC‐3’; R: 5’‐CAGATGTTTATCGGGGCATT‐3’

5031439G07Rik_exon5

F: 5’‐TGGGGTAGCTCATCTTGGAC‐3’; R: 5’‐AGCAAACACCCCATGGAC‐3’

Sf3a1_exon4

F: 5’‐CCCCCTCCTGAGTTTGAGTT‐3’; R: 5’‐GTCAGAAACTGACGCCCATT‐3’

Sar1a_exon3

F: 5’‐GGGCAAAACCACTCTTCTTC‐3’; R: 5’‐ACGTGCTGGCCTAGTCTGTC‐3’

Mki67_exon10

F: 5’‐GGGCAGGCACACTTACTTTT‐3’; R: 5’‐TGCAAACTCTCCCTGTACCA‐3’

Rpp30_exon3

F: 5’‐ACGGCCTGATGAAGAATGAG‐3’; R: 5’‐GGCTAACAATGCCAGAGAGC‐3’

Wac_exon7

F: 5’‐TCTGGCCTGAACCCTACATC‐3’; R: 5’‐ATTGTGGAACGGGAGAAACA‐3’

Slc3a2_exon9

F: 5’‐AGGCAGGGGTGATGTTTTTA‐3’; R: 5’‐CGAATTCTTATACTGGGTGTTTTTG‐3’

Fto_exon9

F: 5’‐CGCGGCTGCACAGATATAA‐3’; R: 5’‐AGGGCAACTGAAGAATGGAG‐3’

Tead1_exon11

F: 5’‐CCTGGACATCTTGCTTAGCC‐3’; R: 5’‐GTAGCTGGACTCTGGGCATC‐3’

Zfp948_exon6

F: 5’‐AGGCAGGGGTGATGTTTTTA‐3’; R: 5’‐CGAATTCTTATACTGGGTGTTTTTG‐3’

Med21_exon4

F: 5’‐TGCTATGACTGTGTTGCCTTT‐3’; R: 5’‐TGAGTCAGTGGCAATTTCAGA‐3’

Pdzd8_exon5

F: 5’‐CATTGTGTTCTGTTTTCACTCTTG‐3’; R: 5’‐TGAGTCAGTGGCAATTTCAGA‐3’

Mpv17_exon8

F: 5’‐GCTCCAGTCCTGATGTGGTT‐3’; R: 5’‐TGTACTGATGCACCGTGACC‐3’

Taok1_exon20

F: 5’‐AAGTGACTCTCACAATGGGTGAT‐3’; R: 5’‐GCAAGACATTTCTTCAAACTCCA‐3’

Fmr1_exon17

F: 5’‐TTTGGGGAGATATGAAGGTCA‐3’; R: 5’‐AACCTGCTTTCAATGTTTCTCAG‐3’

Mpv17_exon8

F: 5’‐GCTCCAGTCCTGATGTGGTT‐3’; R: 5’‐TGTACTGATGCACCGTGACC‐3’

Rp15_exon6

F: 5’‐AAGCAAGGAGTTCAATGCAGA‐3’; R: 5’‐CATTCTGACCCATGATGTGC‐3’

Atp5g2_exon3

F: 5’‐TCAACTCCACTGCAGACAGC‐3’; R: 5’‐AGCACCTCTCTCAGGAGCAC‐3’

Hebp1_exon3

F: 5’‐GGAATCCGGAACCAGACTTT‐3’; R: 5’‐CCAATGAAGATGGCTCCCTA‐3’

Appendix D: Oligonucleotides used for functional validation of PAR-CLIP clusters

Ss18_exon8

F: 5’‐GCAGTACTCAGGCCAGGAAG‐3’; R: 5’‐GTGGGTATCCTTGCTGTGGA‐3’

Mga_exon13

F: 5’‐ACCCTGTTGCCTTCAACATC‐3’; R: 5’‐TAGGAGAACCCTGCTGCACT‐3’

Sirpa_exon3

F: 5’‐CCGTTCTGAACTGCACTTTG‐3’; R: 5’‐GGTAGAGAGCAGCCATCAGC‐3’

Cugbp2_exon15

F: 5’‐ACCCAGGCCTACTCAGGAAT‐3’; R: 5’‐AGGTTTGCTGTCGTTTTTGG‐3’

Abi1_exon7

F: 5’‐GCATTGGCATTCCTATTGCT‐3’; R: 5’‐CTGCAGCTTCCTCATCTTCA‐3’

Rusc1_exon3

F: 5’‐TAGTCTTCTCGCCCGACACT‐3’; R: 5’‐CCCAAAATGCGAGATGATTT‐3’

Slain2_exon7

F: 5’‐TGGAGGAATACCTCGAATGC‐3’; R: 5’‐CAGGTGAGGAGGTGGTTTGT‐3’

Hnrnpd_exon2

F: 5’‐ATGTCGGAGGAGCAGTTCG‐3’; R: 5’‐TCCCAGCTAAGGCCTCCTAT‐3’

Csda_exon6

F: 5’‐GAAAAACAACCCACGCAAGT‐3’; R: 5’‐CCTTCATCTCCCCAATCTCA‐3’

Ilf3_exon3

F: 5’‐TGGTAACACCAAGGAACTGTCA‐3’; R: 5’‐AGAATGCTTTGCCATCACG‐3’

Pum2_exon13

F: 5’‐CAGCAGCAACCAAGCACTAA‐3’; R: 5’‐AGCCGAGAAGGAGGAAAGAG‐3’

Map4k4_exon23

F: 5’‐CTGAGCAACGGTGAAACAGA‐3’; R: 5’‐GGAGGAAGATTTGTGTTTCTGG‐3’

Nfic_exon7

F: 5’‐CAACTCACCCACGAGTAGCA‐3’; R: 5’‐GTCCATTCTGCCCATCTGTT‐3’

Usp15_exon7

F: 5’‐GAGAAGGAGGCCAGACTGTG‐3’; R: 5’‐GGCTGAGTTCATGAAACACGTA‐3’

2310061C15Rik_UTR

F: 5’‐GAGGCTTTCTGACCCTCCAG‐3’; R1: 5’‐CCCTATGGTTCCGTTCTTCA‐3’; R2: 5’‐GGGAACTTTGGTATGGTTGG‐3’

Gadd45gip1_UTR

F: 5’‐GATTGAAAACTGGCGAAAGC‐3’; R1: 5’‐GCCTCCTTCTTCTGTCGTTG‐3’; R2: 5’‐CATGCAGTCTGTTTGGCTGT‐3’

Swap70_UTR

F: 5’‐CCTCCGAGGATGACACAGAT‐3’; R1: 5’‐TGGCAAATGTTTTCCAGTCA‐3’; R2: 5’‐TTCCGTCCAGACATTTCACA‐3’

Etv6_UTR

F: 5’‐CGAGTCTCAAGTGCTGGATG‐3’; R1: 5’‐GAAGGAGAAGCGAACGTGAC‐3’; R2: 5’‐GAGATGTCTGGCCTTCCTCA‐3’

Srp9_UTR

F: 5’‐GACTTATGGTGGCCAAGGAA‐3’; R1: 5’‐TCTAACTGGAAACCCCGACA‐3’; R2: 5’‐AAACACTGCGTACGTTGCAC‐3’

Heca_UTR

F: 5’‐TGTGCAGGAGAATGTGGAAA‐3’; R1: 5’‐TGTAGGGGGCATAAACCAAA‐3’; R2: 5’‐AGCCAAATTTTCCCTTCCAG‐3’

Bclaf1_UTR

F: 5’‐ATTGCAACAGGGAGCAGAAC‐3’; R1: 5’‐CCGAACATTTTCTACTTTATTTCAA‐3’; R2: 5’‐TGTCCTCCTTATGTTTGCTGAA‐3’

Appendix E: GO analysis for the host genes with SRp20 binding clusters

All categories with a p‐value less than 0.01 are shown.

Biological Process | Reference genes | CLIP genes | Expected | Trend | P-value
primary metabolic process | 9122 | 1780 | 1200.13 | + | 6.16E-91
metabolic process | 9603 | 1821 | 1263.41 | + | 3.25E-83
cellular process | 7133 | 1350 | 938.45 | + | 3.22E-52
nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 4156 | 888 | 546.78 | + | 4.16E-50
cell cycle | 2018 | 524 | 265.5 | + | 2.50E-49
protein metabolic process | 3979 | 841 | 523.49 | + | 3.24E-45
protein transport | 1687 | 445 | 221.95 | + | 3.77E-43
intracellular protein transport | 1687 | 445 | 221.95 | + | 3.77E-43
transport | 3009 | 670 | 395.88 | + | 9.85E-42
mitosis | 662 | 206 | 87.1 | + | 2.39E-28
intracellular signaling cascade | 1720 | 398 | 226.29 | + | 4.08E-27
response to pheromone | 440 | 1 | 57.89 | - | 2.65E-24
vesicle-mediated transport | 1202 | 287 | 158.14 | + | 1.90E-21
developmental process | 3296 | 625 | 433.63 | + | 7.69E-21
cellular component organization | 1582 | 345 | 208.13 | + | 1.48E-19
cell communication | 5033 | 873 | 662.16 | + | 8.24E-19
signal transduction | 4858 | 839 | 639.14 | + | 1.65E-17
system development | 2222 | 431 | 292.33 | + | 9.40E-16
nervous system development | 1378 | 293 | 181.3 | + | 2.31E-15
response to stress | 547 | 142 | 71.97 | + | 1.02E-13
ectoderm development | 1549 | 307 | 203.79 | + | 1.94E-12
apoptosis | 1035 | 221 | 136.17 | + | 5.24E-12
cellular component morphogenesis | 1205 | 248 | 158.53 | + | 9.16E-12
anatomical structure morphogenesis | 1205 | 248 | 158.53 | + | 9.16E-12
segregation | 217 | 71 | 28.55 | + | 1.32E-11
cell motion | 983 | 210 | 129.33 | + | 1.74E-11
organelle organization | 382 | 98 | 50.26 | + | 1.19E-09
endocytosis | 604 | 137 | 79.46 | + | 1.84E-09
cytokinesis | 245 | 71 | 32.23 | + | 2.09E-09
cell adhesion | 1441 | 273 | 189.58 | + | 2.55E-09
establishment or maintenance of chromatin architecture | 352 | 90 | 46.31 | + | 6.43E-09
mesoderm development | 1701 | 308 | 223.79 | + | 1.85E-08
cell-matrix adhesion | 171 | 53 | 22.5 | + | 2.71E-08
carbohydrate metabolic process | 1038 | 203 | 136.56 | + | 3.38E-08
cellular and derivative metabolic process | 388 | 93 | 51.05 | + | 6.75E-08
negative regulation of apoptosis | 296 | 74 | 38.94 | + | 3.11E-07
nuclear transport | 92 | 33 | 12.1 | + | 5.03E-07
exocytosis | 401 | 90 | 52.76 | + | 1.58E-06
immune system process | 2974 | 481 | 391.27 | + | 1.63E-06
homeostatic process | 143 | 41 | 18.81 | + | 6.02E-06
cell-cell adhesion | 890 | 166 | 117.09 | + | 8.32E-06
lipid metabolic process | 1266 | 223 | 166.56 | + | 1.08E-05
induction of apoptosis | 383 | 83 | 50.39 | + | 1.36E-05
embryonic development | 545 | 108 | 71.7 | + | 3.18E-05
nitrogen compound metabolic process | 71 | 23 | 9.34 | + | 1.11E-04
angiogenesis | 429 | 86 | 56.44 | + | 1.33E-04
phosphate metabolic process | 235 | 53 | 30.92 | + | 1.77E-04
sensory perception of sound | 125 | 33 | 16.45 | + | 2.04E-04
natural killer cell activation | 153 | 6 | 20.13 | - | 2.26E-04
endoderm development | 52 | 18 | 6.84 | + | 2.72E-04
nucleobase, nucleoside, nucleotide and nucleic acid transport | 142 | 35 | 18.68 | + | 4.59E-04
visual perception | 431 | 83 | 56.7 | + | 5.64E-04
amino acid transport | 85 | 24 | 11.18 | + | 5.67E-04
carbohydrate transport | 189 | 43 | 24.87 | + | 5.72E-04
meiosis | 281 | 58 | 36.97 | + | 7.76E-04
skeletal system development | 525 | 96 | 69.07 | + | 1.12E-03
immune response | 900 | 87 | 118.41 | - | 1.30E-03
hemopoiesis | 223 | 47 | 29.34 | + | 1.54E-03
cellular calcium ion homeostasis | 49 | 15 | 6.45 | + | 2.71E-03
regulation of liquid surface tension | 18 | 8 | 2.37 | + | 3.07E-03
cell surface receptor linked signal transduction | 2702 | 403 | 355.49 | + | 4.77E-03
localization | 113 | 26 | 14.87 | + | 5.45E-03
sulfur metabolic process | 113 | 26 | 14.87 | + | 5.45E-03
gamete generation | 1007 | 161 | 132.48 | + | 7.80E-03
response to interferon-gamma | 135 | 8 | 17.76 | - | 8.00E-03
nitric oxide biosynthetic process | 13 | 6 | 1.71 | + | 8.21E-03
peroxisomal transport | 26 | 9 | 3.42 | + | 8.57E-03

Molecular Function | Reference genes | Genes from CLIP | Expected | Trend | P-value
catalytic activity | 5953 | 1276 | 783.2 | + | 6.66E-80
binding | 7668 | 1466 | 1008.83 | + | 1.47E-61
nucleic acid binding | 4511 | 890 | 593.48 | + | 6.25E-37
transferase activity | 1771 | 427 | 233 | + | 8.88E-33
protein binding | 3396 | 690 | 446.79 | + | 4.03E-31
hydrolase activity | 2496 | 519 | 328.38 | + | 4.96E-25
DNA binding | 2493 | 512 | 327.99 | + | 1.45E-23
kinase activity | 728 | 200 | 95.78 | + | 2.13E-21
RNA binding | 575 | 161 | 75.65 | + | 3.56E-18
G-protein coupled receptor | 911 | 46 | 119.85 | - | 4.74E-15
transcription factor activity | 2139 | 412 | 281.42 | + | 1.25E-14
transcription regulator activity | 2139 | 412 | 281.42 | + | 1.25E-14
enzyme regulator activity | 1257 | 269 | 165.38 | + | 1.74E-14
helicase activity | 159 | 64 | 20.92 | + | 2.38E-14
small GTPase regulator activity | 484 | 131 | 63.68 | + | 5.37E-14
ligase activity | 644 | 156 | 84.73 | + | 1.27E-12
structural constituent of cytoskeleton | 1064 | 225 | 139.98 | + | 8.06E-12
hydrolase activity, acting on ester bonds | 731 | 166 | 96.17 | + | 3.26E-11
RNA splicing factor activity, transesterification mechanism | 291 | 85 | 38.29 | + | 3.93E-11
chromatin binding | 232 | 69 | 30.52 | + | 1.26E-09
cytoskeletal protein binding | 439 | 106 | 57.76 | + | 5.87E-09
acyltransferase activity | 191 | 58 | 25.13 | + | 1.24E-08
RNA helicase activity | 91 | 36 | 11.97 | + | 1.50E-08
transcription cofactor activity | 300 | 79 | 39.47 | + | 1.59E-08
transmembrane receptor protein kinase activity | 160 | 51 | 21.05 | + | 2.08E-08
ubiquitin-protein ligase activity | 388 | 94 | 51.05 | + | 3.58E-08
DNA helicase activity | 84 | 33 | 11.05 | + | 6.80E-08
glutamate receptor activity | 166 | 2 | 21.84 | - | 8.08E-08
phosphatase activity | 246 | 66 | 32.36 | + | 1.21E-07
transmembrane transporter activity | 944 | 184 | 124.2 | + | 1.84E-07
transmembrane receptor protein tyrosine kinase activity | 129 | 42 | 16.97 | + | 2.02E-07
transporter activity | 990 | 190 | 130.25 | + | 3.19E-07
structural constituent of ribosome | 512 | 32 | 67.36 | - | 1.05E-06
receptor binding | 1335 | 240 | 175.64 | + | 1.24E-06
GTPase activity | 295 | 71 | 38.81 | + | 1.97E-06
RNA-directed DNA polymerase activity | 96 | 0 | 12.63 | - | 3.20E-06
motor activity | 123 | 37 | 16.18 | + | 5.97E-06
structural molecule activity | 1865 | 309 | 245.37 | + | 2.66E-05
kinase regulator activity | 322 | 70 | 42.36 | + | 5.57E-05
translation factor activity, nucleic acid binding | 137 | 37 | 18.02 | + | 5.61E-05
cytokine receptor activity | 98 | 29 | 12.89 | + | 7.50E-05
antigen binding | 68 | 0 | 8.95 | - | 1.29E-04
transforming growth factor beta receptor activity | 14 | 9 | 1.84 | + | 1.29E-04
methyltransferase activity | 133 | 35 | 17.5 | + | 1.42E-04
ligand-gated ion channel activity | 249 | 14 | 32.76 | - | 1.78E-04
enzyme activator activity | 146 | 37 | 19.21 | + | 1.92E-04
translation regulator activity | 131 | 34 | 17.23 | + | 2.24E-04
translation initiation factor | 103 | 28 | 13.55 | + | 3.77E-04
guanyl-nucleotide exchange factor activity | 170 | 40 | 22.37 | + | 4.66E-04
microtubule motor activity | 70 | 21 | 9.21 | + | 5.75E-04
kinase activator activity | 108 | 28 | 14.21 | + | 7.69E-04
peptidase activity | 917 | 156 | 120.64 | + | 9.35E-04
transmembrane receptor protein serine/threonine kinase activity | 55 | 17 | 7.24 | + | 1.34E-03
extracellular matrix structural constituent | 142 | 33 | 18.68 | + | 1.67E-03
protein disulfide isomerase activity | 17 | 8 | 2.24 | + | 2.18E-03
calcium-dependent phospholipid binding | 139 | 32 | 18.29 | + | 2.24E-03
DNA-directed DNA polymerase activity | 35 | 12 | 4.6 | + | 2.86E-03
transferase activity, transferring glycosyl groups | 256 | 51 | 33.68 | + | 3.09E-03
isomerase activity | 235 | 47 | 30.92 | + | 4.06E-03
DNA-directed RNA polymerase activity | 53 | 15 | 6.97 | + | 5.48E-03
calcium ion binding | 522 | 90 | 68.68 | + | 7.25E-03
DNA topoisomerase activity | 9 | 5 | 1.18 | + | 7.33E-03
hydrogen ion transmembrane transporter activity | 56 | 15 | 7.37 | + | 8.78E-03

Pathway | Reference genes | Genes from CLIP | Expected | Trend | P-value
Angiogenesis | 193 | 79 | 25.39 | + | 9.44E-18
EGF receptor signaling pathway | 138 | 53 | 18.16 | + | 2.05E-11
Ras Pathway | 80 | 38 | 10.53 | + | 4.40E-11
Wnt signaling pathway | 348 | 96 | 45.78 | + | 4.58E-11
PDGF signaling pathway | 162 | 56 | 21.31 | + | 2.64E-10
Integrin signalling pathway | 185 | 60 | 24.34 | + | 6.70E-10
Alzheimer disease-presenilin pathway | 124 | 44 | 16.31 | + | 9.72E-09
VEGF signaling pathway | 76 | 31 | 10 | + | 7.48E-08
FGF signaling pathway | 125 | 42 | 16.45 | + | 8.89E-08
5HT2 type receptor mediated signaling pathway | 77 | 29 | 10.13 | + | 9.35E-07
Oxytocin receptor mediated signaling pathway | 68 | 26 | 8.95 | + | 2.54E-06
Inflammation mediated by chemokine and cytokine signaling pathway | 298 | 71 | 39.21 | + | 2.76E-06
PI3 kinase pathway | 110 | 35 | 14.47 | + | 3.25E-06
Cytoskeletal regulation by Rho GTPase | 101 | 33 | 13.29 | + | 3.56E-06
Ubiquitin proteasome pathway | 88 | 30 | 11.58 | + | 4.31E-06
Apoptosis signaling pathway | 141 | 41 | 18.55 | + | 4.34E-06
Thyrotropin-releasing hormone receptor signaling pathway | 71 | 26 | 9.34 | + | 5.39E-06
TGF-beta signaling pathway | 143 | 41 | 18.81 | + | 6.02E-06
Axon guidance mediated by semaphorins | 44 | 19 | 5.79 | + | 1.06E-05
Endothelin signaling pathway | 89 | 29 | 11.71 | + | 1.40E-05
Insulin/IGF pathway-protein kinase B signaling cascade | 83 | 27 | 10.92 | + | 2.79E-05
Hypoxia response via HIF activation | 28 | 14 | 3.68 | + | 3.17E-05
Interleukin signaling pathway | 160 | 42 | 21.05 | + | 3.49E-05
Huntington disease | 199 | 49 | 26.18 | + | 4.04E-05
Histamine H1 receptor mediated signaling pathway | 53 | 20 | 6.97 | + | 4.12E-05
Parkinson disease | 106 | 31 | 13.95 | + | 5.37E-05
5HT1 type receptor mediated signaling pathway | 48 | 18 | 6.32 | + | 1.05E-04
B cell activation | 87 | 26 | 11.45 | + | 1.47E-04
Alzheimer disease-amyloid secretase pathway | 73 | 23 | 9.6 | + | 1.65E-04
Muscarinic acetylcholine receptor 1 and 3 pathway | 69 | 22 | 9.08 | + | 1.92E-04
Transcription regulation by bZIP transcription factor | 58 | 19 | 7.63 | + | 3.66E-04
Beta2 adrenergic receptor signaling pathway | 49 | 17 | 6.45 | + | 3.87E-04
Beta1 adrenergic receptor signaling pathway | 49 | 17 | 6.45 | + | 3.87E-04
p53 pathway feedback loops 2 | 51 | 17 | 6.71 | + | 6.00E-04
Cadherin signaling pathway | 167 | 39 | 21.97 | + | 6.24E-04
5HT4 type receptor mediated signaling pathway | 38 | 14 | 5 | + | 6.90E-04
Alpha adrenergic receptor signaling pathway | 34 | 13 | 4.47 | + | 7.56E-04
T cell activation | 142 | 34 | 18.68 | + | 8.87E-04
Beta3 adrenergic receptor signaling pathway | 33 | 12 | 4.34 | + | 1.79E-03
Oxidative stress response | 62 | 18 | 8.16 | + | 1.93E-03
Axon guidance mediated by netrin | 29 | 11 | 3.82 | + | 1.98E-03
FAS signaling pathway | 38 | 13 | 5 | + | 2.00E-03
DNA replication | 25 | 10 | 3.29 | + | 2.13E-03
Toll receptor signaling pathway | 64 | 18 | 8.42 | + | 2.69E-03
General transcription regulation | 40 | 13 | 5.26 | + | 3.08E-03
Metabotropic glutamate receptor group II pathway | 55 | 16 | 7.24 | + | 3.27E-03
p53 pathway | 127 | 29 | 16.71 | + | 3.88E-03
Muscarinic acetylcholine receptor 2 and 4 signaling pathway | 67 | 18 | 8.81 | + | 4.28E-03
Cortocotropin releasing factor receptor signaling pathway | 37 | 12 | 4.87 | + | 4.42E-03
Histamine H2 receptor mediated signaling pathway | 28 | 10 | 3.68 | + | 4.68E-03
GABA-B_receptor_II_signaling | 43 | 13 | 5.66 | + | 5.54E-03
JAK/STAT signaling pathway | 20 | 8 | 2.63 | + | 5.69E-03
Nicotinic acetylcholine receptor signaling pathway | 98 | 23 | 12.89 | + | 6.86E-03
Metabotropic glutamate receptor group I pathway | 35 | 11 | 4.6 | + | 7.79E-03
Flavin biosynthesis | 1 | 2 | 0.13 | + | 7.93E-03
5HT3 type receptor mediated signaling pathway | 17 | 7 | 2.24 | + | 8.10E-03
Vasopressin synthesis | 13 | 6 | 1.71 | + | 8.21E-03
Insulin/IGF pathway-mitogen activated protein kinase kinase/MAP kinase cascade | 36 | 11 | 4.74 | + | 9.46E-03

References

Aguda, B.D., Kim, Y., Piper‐Hunter, M.G., Friedman, A., and Marsh, C.B. (2008). MicroRNA regulation of a cancer network: consequences of the feedback loops involving miR‐17‐92, E2F, and Myc. Proc Natl Acad Sci U S A 105, 19678‐19683.

Alvarez‐Garcia, I., and Miska, E.A. (2005). MicroRNA functions in animal development and human disease. Development 132, 4653‐4662.

Amrein, H., Gorman, M., and Nothiger, R. (1988). The sex‐determining gene tra‐2 of Drosophila encodes a putative RNA binding protein. Cell 55, 1025‐1035.

Anko, M.L., Morales, L., Henry, I., Beyer, A., and Neugebauer, K.M. (2010). Global analysis reveals SRp20‐ and SRp75‐specific mRNPs in cycling and neural cells. Nat Struct Mol Biol 17, 962‐970.

Azuma‐Mukai, A., Oguri, H., Mituyama, T., Qian, Z.R., Asai, K., Siomi, H., and Siomi, M.C. (2008). Characterization of endogenous human Argonautes and their miRNA partners in RNA silencing. Proc Natl Acad Sci U S A 105, 7964‐7969.

Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281‐297.

Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215‐233.

Bauren, G., and Wieslander, L. (1994). Splicing of Balbiani ring 1 gene pre‐mRNA occurs simultaneously with transcription. Cell 76, 183‐192.

Bedard, K.M., Daijogo, S., and Semler, B.L. (2007). A nucleo‐cytoplasmic SR protein functions in viral IRES‐mediated translation initiation. EMBO J 26, 459‐467.

Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W.J., Mattick, J.S., and Haussler, D. (2004). Ultraconserved elements in the human genome. Science 304, 1321‐1325.

Berget, S.M., Moore, C., and Sharp, P.A. (1977). Spliced segments at the 5ʹ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci U S A 74, 3171‐3175.

Bernstein, E., Kim, S.Y., Carmell, M.A., Murchison, E.P., Alcorn, H., Li, M.Z., Mills, A.A., Elledge, S.J., Anderson, K.V., and Hannon, G.J. (2003). Dicer is essential for mouse development. Nat Genet 35, 215‐217.

Beyer, A.L., and Osheim, Y.N. (1988). Splice site selection, rate of splicing, and alternative splicing on nascent transcripts. Genes Dev 2, 754‐765.

Boggs, R.T., Gregor, P., Idriss, S., Belote, J.M., and McKeown, M. (1987). Regulation of sexual differentiation in D. melanogaster via alternative splicing of RNA from the transformer gene. Cell 50, 739‐747.

Caceres, J.F., and Krainer, A.R. (1993). Functional analysis of pre‐mRNA splicing factor SF2/ASF structural domains. EMBO J 12, 4715‐4726.

Caceres, J.F., Misteli, T., Screaton, G.R., Spector, D.L., and Krainer, A.R. (1997). Role of the modular domains of SR proteins in subnuclear localization and alternative splicing specificity. J Cell Biol 138, 225‐238.

Caceres, J.F., Screaton, G.R., and Krainer, A.R. (1998). A specific subset of SR proteins shuttles continuously between the nucleus and the cytoplasm. Genes Dev 12, 55‐66.

Caceres, J.F., Stamm, S., Helfman, D.M., and Krainer, A.R. (1994). Regulation of alternative splicing in vivo by overexpression of antagonistic splicing factors. Science 265, 1706‐1709.

Cain, J.W., Hauptschein, R.S., Stewart, J.K., Bagci, T., Sahagian, G.G., and Jay, D.G. (2011). Identification of CD44 as a Surface Biomarker for Drug Resistance by Surface Proteome Signature Technology. Mol Cancer Res.

Caputi, M., Mayeda, A., Krainer, A.R., and Zahler, A.M. (1999). hnRNP A/B proteins are required for inhibition of HIV‐1 pre‐mRNA splicing. EMBO J 18, 4060‐4067.

Cartegni, L., Chew, S.L., and Krainer, A.R. (2002). Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3, 285‐298.

Cartegni, L., and Krainer, A.R. (2002). Disruption of an SF2/ASF‐dependent exonic splicing enhancer in SMN2 causes spinal muscular atrophy in the absence of SMN1. Nat Genet 30, 377‐384.

Cartegni, L., Wang, J., Zhu, Z., Zhang, M.Q., and Krainer, A.R. (2003). ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res 31, 3568‐3571.

Cavaloc, Y., Bourgeois, C.F., Kister, L., and Stevenin, J. (1999). The splicing factors 9G8 and SRp20 transactivate splicing through different and specific enhancers. RNA 5, 468‐483.

Cazalla, D., Zhu, J., Manche, L., Huber, E., Krainer, A.R., and Caceres, J.F. (2002). Nuclear export and retention signals in the RS domain of SR proteins. Mol Cell Biol 22, 6871‐6882.

Chang, T.C., Wentzel, E.A., Kent, O.A., Ramachandran, K., Mullendore, M., Lee, K.H., Feldmann, G., Yamakuchi, M., Ferlito, M., Lowenstein, C.J., et al. (2007). Transactivation of miR‐34a by p53 broadly influences gene expression and promotes apoptosis. Mol Cell 26, 745‐752.

Chang, T.C., Yu, D., Lee, Y.S., Wentzel, E.A., Arking, D.E., West, K.M., Dang, C.V., Thomas‐Tikhonenko, A., and Mendell, J.T. (2008). Widespread microRNA repression by Myc contributes to tumorigenesis. Nat Genet 40, 43‐50.

Chatterjee, S., and Grosshans, H. (2009). Active turnover modulates mature microRNA activity in Caenorhabditis elegans. Nature 461, 546‐549.

Chendrimada, T.P., Finn, K.J., Ji, X., Baillat, D., Gregory, R.I., Liebhaber, S.A., Pasquinelli, A.E., and Shiekhattar, R. (2007). MicroRNA silencing through RISC recruitment of eIF6. Nature 447, 823‐828.

Chendrimada, T.P., Gregory, R.I., Kumaraswamy, E., Norman, J., Cooch, N., Nishikura, K., and Shiekhattar, R. (2005). TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing. Nature 436, 740‐744.

Cheng, L.C., Pastrana, E., Tavazoie, M., and Doetsch, F. (2009). miR‐124 regulates adult neurogenesis in the subventricular zone stem cell niche. Nat Neurosci 12, 399‐408.

Chi, S.W., Zang, J.B., Mele, A., and Darnell, R.B. (2009). Argonaute HITS‐CLIP decodes microRNA‐mRNA interaction maps. Nature 460, 479‐486.

Chou, T.B., Zachar, Z., and Bingham, P.M. (1987). Developmental expression of a regulatory gene is programmed at the level of splicing. EMBO J 6, 4095‐4104.

Chow, L.T., Gelinas, R.E., Broker, T.R., and Roberts, R.J. (1977). An amazing sequence arrangement at the 5ʹ ends of adenovirus 2 messenger RNA. Cell 12, 1‐8.

Colwill, K., Pawson, T., Andrews, B., Prasad, J., Manley, J.L., Bell, J.C., and Duncan, P.I. (1996). The Clk/Sty protein kinase phosphorylates SR splicing factors and regulates their intranuclear distribution. EMBO J 15, 265‐275.

Cooke, C., and Alwine, J.C. (1996). The cap and the 3ʹ splice site similarly affect polyadenylation efficiency. Mol Cell Biol 16, 2579‐2584.

Cooke, C., Hans, H., and Alwine, J.C. (1999). Utilization of splicing elements and polyadenylation signal elements in the coupling of polyadenylation and last‐intron removal. Mol Cell Biol 19, 4971‐4979.

Cooper, D.N., and Krawczak, M. (1990). The mutational spectrum of single base‐pair substitutions causing human genetic disease: patterns and predictions. Hum Genet 85, 55‐74.

Coulter, L.R., Landree, M.A., and Cooper, T.A. (1997). Identification of a new class of exonic splicing enhancers by in vivo selection. Mol Cell Biol 17, 2143‐2150.

Czech, B., Zhou, R., Erlich, Y., Brennecke, J., Binari, R., Villalta, C., Gordon, A., Perrimon, N., and Hannon, G.J. (2009). Hierarchical rules for Argonaute loading in Drosophila. Mol Cell 36, 445‐456.

da Cunha, C.B., Oliveira, C., Wen, X., Gomes, B., Sousa, S., Suriano, G., Grellier, M., Huntsman, D.G., Carneiro, F., Granja, P.L., et al. (2010). De novo expression of CD44 variants in sporadic and hereditary gastric cancer. Lab Invest 90, 1604‐1614.

Danckwardt, S., Hentze, M.W., and Kulozik, A.E. (2008). 3ʹ end mRNA processing: molecular mechanisms and implications for health and disease. EMBO J 27, 482‐498.

de la Mata, M., and Kornblihtt, A.R. (2006). RNA polymerase II C‐terminal domain mediates regulation of alternative splicing by SRp20. Nat Struct Mol Biol 13, 973‐980.

Deka, P., Bucheli, M.E., Moore, C., Buratowski, S., and Varani, G. (2008). Structure of the yeast SR protein Npl3 and Interaction with mRNA 3ʹ‐end processing signals. J Mol Biol 375, 136‐150.

Ding, J.H., Xu, X., Yang, D., Chu, P.H., Dalton, N.D., Ye, Z., Yeakley, J.M., Cheng, H., Xiao, R.P., Ross, J., et al. (2004). Dilated cardiomyopathy caused by tissue‐specific ablation of SC35 in the heart. EMBO J 23, 885‐896.

Dowling, D., Nasr‐Esfahani, S., Tan, C.H., OʹBrien, K., Howard, J.L., Jans, D.A., Purcell, D.F., Stoltzfus, C.M., and Sonza, S. (2008). HIV‐1 infection induces changes in expression of cellular splicing factors that regulate alternative viral splicing and virus production in macrophages. Retrovirology 5, 18.

Elowitz, M.B., and Leibler, S. (2000). A synthetic oscillatory network of transcriptional regulators. Nature 403, 335‐338.

Eperon, I.C., Makarova, O.V., Mayeda, A., Munroe, S.H., Caceres, J.F., Hayward, D.G., and Krainer, A.R. (2000). Selection of alternative 5ʹ splice sites: role of U1 snRNP and models for the antagonistic effects of SF2/ASF and hnRNP A1. Mol Cell Biol 20, 8303‐8318.

Esquela‐Kerscher, A., and Slack, F.J. (2006). Oncomirs ‐ microRNAs with a role in cancer. Nat Rev Cancer 6, 259‐269.

Fairbrother, W.G., Yeh, R.F., Sharp, P.A., and Burge, C.B. (2002). Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007‐1013.

Feng, Y., Chen, M., and Manley, J.L. (2008). Phosphorylation switches the general splicing repressor SRp38 to a sequence‐specific activator. Nat Struct Mol Biol 15, 1040‐1048.

Feng, Y., Valley, M.T., Lazar, J., Yang, A.L., Bronson, R.T., Firestein, S., Coetzee, W.A., and Manley, J.L. (2009). SRp38 regulates alternative splicing and is required for Ca(2+) handling in the embryonic heart. Dev Cell 16, 528‐538.

Fischer, D.C., Noack, K., Runnebaum, I.B., Watermann, D.O., Kieback, D.G., Stamm, S., and Stickeler, E. (2004). Expression of splicing factors in human ovarian cancer. Oncol Rep 11, 1085‐1090.

Fu, X.D. (1995). The superfamily of arginine/serine‐rich splicing factors. RNA 1, 663‐680.

Fu, X.D., and Maniatis, T. (1990). Factor required for mammalian spliceosome assembly is localized to discrete regions in the nucleus. Nature 343, 437‐441.

Fu, X.D., Mayeda, A., Maniatis, T., and Krainer, A.R. (1992). General splicing factors SF2 and SC35 have equivalent activities in vitro, and both affect alternative 5ʹ and 3ʹ splice site selection. Proc Natl Acad Sci U S A 89, 11224‐11228.

Gabut, M., Dejardin, J., Tazi, J., and Soret, J. (2007). The SR family proteins B52 and dASF/SF2 modulate development of the Drosophila visual system by regulating specific RNA targets. Mol Cell Biol 27, 3087‐3097.

Galante, P.A., Sandhu, D., de Sousa Abreu, R., Gradassi, M., Slager, N., Vogel, C., de Souza, S.J., and Penalva, L.O. (2009). A comprehensive in silico expression analysis of RNA binding proteins in normal and tumor tissue: Identification of potential players in tumor formation. RNA Biol 6, 426‐433.

Gallego, M.E., Gattoni, R., Stevenin, J., Marie, J., and Expert‐Bezancon, A. (1997). The SR splicing factors ASF/SF2 and SC35 have antagonistic effects on intronic enhancer‐dependent splicing of the beta‐tropomyosin alternative exon 6A. EMBO J 16, 1772‐1784.

Gatignol, A., Buckler‐White, A., Berkhout, B., and Jeang, K.T. (1991). Characterization of a human TAR RNA‐binding protein that activates the HIV‐1 LTR. Science 251, 1597‐1600.

Ge, H., and Manley, J.L. (1990). A protein factor, ASF, controls cell‐specific alternative splicing of SV40 early pre‐mRNA in vitro. Cell 62, 25‐34.

Ge, H., Zuo, P., and Manley, J.L. (1991). Primary structure of the human splicing factor ASF reveals similarities with Drosophila regulators. Cell 66, 373‐382.

Georgiev, S., Boyle, A.P., Jayasurya, K., Ding, X., Mukherjee, S., and Ohler, U. (2010). Evidence‐ranked motif identification. Genome Biol 11, R19.

Ghildiyal, M., Xu, J., Seitz, H., Weng, Z., and Zamore, P.D. (2010). Sorting of Drosophila small silencing RNAs partitions microRNA* strands into the RNA interference pathway. RNA 16, 43‐56.

Glisovic, T., Bachorik, J.L., Yong, J., and Dreyfuss, G. (2008). RNA‐binding proteins and post‐transcriptional gene regulation. FEBS Lett 582, 1977‐1986.

Graveley, B.R., Hertel, K.J., and Maniatis, T. (1998). A systematic analysis of the factors that determine the strength of pre‐mRNA splicing enhancers. EMBO J 17, 6747‐6756.

Graveley, B.R., Hertel, K.J., and Maniatis, T. (2001). The role of U2AF35 and U2AF65 in enhancer‐dependent splicing. RNA 7, 806‐818.

Grimson, A., Farh, K.K., Johnston, W.K., Garrett‐Engele, P., Lim, L.P., and Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91‐105.

Grosso, A.R., Gomes, A.Q., Barbosa‐Morais, N.L., Caldeira, S., Thorne, N.P., Grech, G., von Lindern, M., and Carmo‐Fonseca, M. (2008). Tissue‐specific splicing factor gene expression signatures. Nucleic Acids Res 36, 4823‐4832.

Gui, J.F., Lane, W.S., and Fu, X.D. (1994). A serine kinase regulates intracellular localization of splicing factors in the cell cycle. Nature 369, 678‐682.

Guil, S., and Caceres, J.F. (2007). The multifunctional RNA‐binding protein hnRNP A1 is required for processing of miR‐18a. Nat Struct Mol Biol 14, 591‐596.

Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., Rothballer, A., Ascano, M., Jr., Jungkamp, A.C., Munschauer, M., et al. (2010a). Transcriptome‐wide identification of RNA‐binding protein and microRNA target sites by PAR‐CLIP. Cell 141, 129‐141.

Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., Rothballer, A., Ascano, M., Jungkamp, A.C., Munschauer, M., et al. (2010b). PAR‐CliP‐‐a method to identify transcriptome‐wide the binding sites of RNA binding proteins. J Vis Exp.

Han, J., Pedersen, J.S., Kwon, S.C., Belair, C.D., Kim, Y.K., Yeom, K.H., Yang, W.Y., Haussler, D., Blelloch, R., and Kim, V.N. (2009). Posttranscriptional crossregulation between Drosha and DGCR8. Cell 136, 75‐84.

Hanamura, A., Caceres, J.F., Mayeda, A., Franza, B.R., Jr., and Krainer, A.R. (1998). Regulated tissue‐specific expression of antagonistic pre‐mRNA splicing factors. RNA 4, 430‐444.

Hautbergue, G.M., Hung, M.L., Golovanov, A.P., Lian, L.Y., and Wilson, S.A. (2008). Mutually exclusive interactions drive handover of mRNA from export adaptors to TAP. Proc Natl Acad Sci U S A 105, 5154‐5159.

He, L., He, X., Lim, L.P., de Stanchina, E., Xuan, Z., Liang, Y., Xue, W., Zender, L., Magnus, J., Ridzon, D., et al. (2007). A microRNA component of the p53 tumour suppressor network. Nature 447, 1130‐1134.

He, L., Thomson, J.M., Hemann, M.T., Hernando‐Monge, E., Mu, D., Goodson, S., Powers, S., Cordon‐Cardo, C., Lowe, S.W., Hannon, G.J., et al. (2005). A microRNA polycistron as a potential human oncogene. Nature 435, 828‐833.

Heo, I., Joo, C., Cho, J., Ha, M., Han, J., and Kim, V.N. (2008). Lin28 mediates the terminal uridylation of let‐7 precursor MicroRNA. Mol Cell 32, 276‐284.

Hornstein, E., and Shomron, N. (2006). Canalization of development by microRNAs. Nat Genet 38 Suppl, S20‐24.

Hua, Y., Sahashi, K., Hung, G., Rigo, F., Passini, M.A., Bennett, C.F., and Krainer, A.R. (2010). Antisense correction of SMN2 splicing in the CNS rescues necrosis in a type III SMA mouse model. Genes Dev 24, 1634‐1644.

Huang, Y., Gattoni, R., Stevenin, J., and Steitz, J.A. (2003). SR splicing factors serve as adapter proteins for TAP‐dependent mRNA export. Mol Cell 11, 837‐843.

Huang, Y., and Steitz, J.A. (2001). Splicing factors SRp20 and 9G8 promote the nucleocytoplasmic export of mRNA. Mol Cell 7, 899‐905.

Huang, Y., Yario, T.A., and Steitz, J.A. (2004). A molecular link between SR protein dephosphorylation and mRNA export. Proc Natl Acad Sci U S A 101, 9666‐9670.

Huntzinger, E., and Izaurralde, E. (2011). Gene silencing by microRNAs: contributions of translational repression and mRNA decay. Nat Rev Genet 12, 99‐110.

Hwang, H.W., Wentzel, E.A., and Mendell, J.T. (2007). A hexanucleotide element directs microRNA nuclear import. Science 315, 97‐100.

Ibrahim, E.C., Schaal, T.D., Hertel, K.J., Reed, R., and Maniatis, T. (2005). Serine/arginine‐rich protein‐dependent suppression of exon skipping by exonic splicing enhancers. Proc Natl Acad Sci U S A 102, 5002‐5007.

Iliopoulos, D., Hirsch, H.A., and Struhl, K. (2009). An epigenetic switch involving NF‐kappaB, Lin28, Let‐7 MicroRNA, and IL6 links inflammation to cell transformation. Cell 139, 693‐706.

Jan, C.H., Friedman, R.C., Ruby, J.G., and Bartel, D.P. (2011). Formation, regulation and evolution of Caenorhabditis elegans 3ʹUTRs. Nature 469, 97‐101.

Jia, R., Li, C., McCoy, J.P., Deng, C.X., and Zheng, Z.M. (2010). SRp20 is a proto‐oncogene critical for cell proliferation and tumor induction and maintenance. Int J Biol Sci 6, 806‐826.

Jia, R., Liu, X., Tao, M., Kruhlak, M., Guo, M., Meyers, C., Baker, C.C., and Zheng, Z.M. (2009). Control of the papillomavirus early‐to‐late switch by differentially expressed SRp20. J Virol 83, 167‐180.

Jimenez‐Garcia, L.F., and Spector, D.L. (1993). In vivo evidence that transcription and splicing are coordinated by a recruiting mechanism. Cell 73, 47‐59.

Johnston, M., Geoffroy, M.C., Sobala, A., Hay, R., and Hutvagner, G. (2010). HSP90 protein stabilizes unloaded argonaute complexes and microscopic P‐bodies in human cells. Mol Biol Cell 21, 1462‐1469.

Jumaa, H., and Nielsen, P.J. (1997). The splicing factor SRp20 modifies splicing of its own mRNA and ASF/SF2 antagonizes this regulation. EMBO J 16, 5077‐5085.

Kang, J.G., Pripuzova, N., Majerciak, V., Kruhlak, M., Le, S.Y., and Zheng, Z.M. (2011). Kaposiʹs sarcoma‐associated herpesvirus ORF57 promotes escape of viral and human interleukin‐6 from microRNA‐mediated suppression. J Virol 85, 2620‐2630.

Kanopka, A., Muhlemann, O., and Akusjarvi, G. (1996). Inhibition by SR proteins of splicing of a regulated adenovirus pre‐mRNA. Nature 381, 535‐538.

Karni, R., de Stanchina, E., Lowe, S.W., Sinha, R., Mu, D., and Krainer, A.R. (2007). The gene encoding the splicing factor SF2/ASF is a proto‐oncogene. Nat Struct Mol Biol 14, 185‐193.

Karni, R., Hippo, Y., Lowe, S.W., and Krainer, A.R. (2008). The splicing‐factor oncoprotein SF2/ASF activates mTORC1. Proc Natl Acad Sci U S A 105, 15323‐15327.

Kashima, T., and Manley, J.L. (2003). A negative element in SMN2 exon 7 inhibits splicing in spinal muscular atrophy. Nat Genet 34, 460‐463.

Kataoka, N., Fujita, M., and Ohno, M. (2009). Functional association of the Microprocessor complex with the spliceosome. Mol Cell Biol 29, 3243‐3254.

Kawano, T., Fujita, M., and Sakamoto, H. (2000). Unique and redundant functions of SR proteins, a conserved family of splicing factors, in Caenorhabditis elegans development. Mech Dev 95, 67‐76.

Keene, J.D., Komisarow, J.M., and Friedersdorf, M.B. (2006). RIP‐Chip: the isolation and identification of mRNAs, microRNAs and protein components of ribonucleoprotein complexes from cell extracts. Nat Protoc 1, 302‐307.

Kim, E., Goren, A., and Ast, G. (2008). Alternative splicing and disease. RNA Biol 5, 17‐19.

Kim, Y.K., and Kim, V.N. (2007). Processing of intronic microRNAs. EMBO J 26, 775‐783.

Klingbeil, P., Marhaba, R., Jung, T., Kirmse, R., Ludwig, T., and Zoller, M. (2009). CD44 variant isoforms promote metastasis formation by a tumor cell‐matrix cross‐talk that supports adhesion and apoptosis resistance. Mol Cancer Res 7, 168‐179.

Ko, B., and Gunderson, S.I. (2002). Identification of new poly(A) polymerase‐inhibitory proteins capable of regulating pre‐mRNA polyadenylation. J Mol Biol 318, 1189‐1206.

Krainer, A.R., Conway, G.C., and Kozak, D. (1990). Purification and characterization of pre‐mRNA splicing factor SF2 from HeLa cells. Genes Dev 4, 1158‐1171.

Krainer, A.R., Mayeda, A., Kozak, D., and Binns, G. (1991). Functional expression of cloned human splicing factor SF2: homology to RNA‐binding proteins, U1 70K, and Drosophila splicing regulators. Cell 66, 383‐394.

Kraus, M.E., and Lis, J.T. (1994). The concentration of B52, an essential splicing factor and regulator of splice site choice in vitro, is critical for Drosophila development. Mol Cell Biol 14, 5360‐5370.

Krawczak, M., Reiss, J., and Cooper, D.N. (1992). The mutational spectrum of single base‐pair substitutions in mRNA splice junctions of human genes: causes and consequences. Hum Genet 90, 41‐54.

Kuo, B.A., Uporova, T.M., Liang, H., Bennett, V.D., Tuan, R.S., and Norton, P.A. (2002). Alternative splicing during chondrogenesis: modulation of fibronectin exon EIIIA splicing by SR proteins. J Cell Biochem 86, 45‐55.

Ladd, A.N., and Cooper, T.A. (2002). Finding signals that regulate alternative splicing in the post‐genomic era. Genome Biol 3, reviews0008.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860‐921.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory‐efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25.

Lareau, L.F., Inada, M., Green, R.E., Wengrod, J.C., and Brenner, S.E. (2007). Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446, 926‐929.

Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858‐862.

Lee, J., Li, Z., Brower‐Sinning, R., and John, B. (2007). Regulatory circuit of human microRNA biogenesis. PLoS Comput Biol 3, e67.

Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans heterochronic gene lin‐4 encodes small RNAs with antisense complementarity to lin‐14. Cell 75, 843‐854.

Lee, Y.S., and Dutta, A. (2009). MicroRNAs in cancer. Annu Rev Pathol 4, 199‐227.

Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15‐20.

Li, X., and Carthew, R.W. (2005). A microRNA mediates EGF receptor signaling and promotes photoreceptor differentiation in the Drosophila eye. Cell 123, 1267‐1277.

Li, X., Cassidy, J.J., Reinke, C.A., Fischboeck, S., and Carthew, R.W. (2009). A microRNA imparts robustness against environmental fluctuation during development. Cell 137, 273‐282.

Li, X., Shambaugh, M.E., Rottman, F.M., and Bokar, J.A. (2000). SR proteins Asf/SF2 and 9G8 interact to activate enhancer‐dependent intron D splicing of bovine growth hormone pre‐mRNA in vitro. RNA 6, 1847‐1858.

Licatalosi, D.D., Mele, A., Fak, J.J., Ule, J., Kayikci, M., Chi, S.W., Clark, T.A., Schweitzer, A.C., Blume, J.E., Wang, X., et al. (2008). HITS‐CLIP yields genome‐wide insights into brain alternative RNA processing. Nature 456, 464‐469.

Lim, L.P., and Sharp, P.A. (1998). Alternative splicing of the fibronectin EIIIB exon depends on specific TGCATG repeats. Mol Cell Biol 18, 3900‐3906.

Lin, S., Coutinho‐Mansfield, G., Wang, D., Pandit, S., and Fu, X.D. (2008). The splicing factor SC35 has an active role in transcriptional elongation. Nat Struct Mol Biol 15, 819‐826.

Lin, S., and Fu, X.D. (2007). SR proteins and related factors in alternative splicing. Adv Exp Med Biol 623, 107‐122.

Liu, H.X., Chew, S.L., Cartegni, L., Zhang, M.Q., and Krainer, A.R. (2000). Exonic splicing enhancer motif recognized by human SC35 under splicing conditions. Mol Cell Biol 20, 1063‐1071.

Liu, H.X., Zhang, M., and Krainer, A.R. (1998). Identification of functional exonic splicing enhancer motifs recognized by individual SR proteins. Genes Dev 12, 1998‐2012.

Long, J.C., and Caceres, J.F. (2009). The SR protein family of splicing factors: master regulators of gene expression. Biochem J 417, 15‐27.

Longman, D., Johnstone, I.L., and Caceres, J.F. (2000). Functional characterization of SR and SR‐related genes in Caenorhabditis elegans. EMBO J 19, 1625‐1637.

Lou, H., Neugebauer, K.M., Gagel, R.F., and Berget, S.M. (1998). Regulation of alternative polyadenylation by U1 snRNPs and SRp20. Mol Cell Biol 18, 4977‐4985.

Lu, J., Getz, G., Miska, E.A., Alvarez‐Saavedra, E., Lamb, J., Peck, D., Sweet‐Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A., et al. (2005). MicroRNA expression profiles classify human cancers. Nature 435, 834‐838.

Maciolek, N.L., and McNally, M.T. (2007). Serine/arginine‐rich proteins contribute to negative regulator of splicing element‐stimulated polyadenylation in Rous sarcoma virus. J Virol 81, 11208‐11217.

Majerciak, V., Yamanegi, K., Nie, S.H., and Zheng, Z.M. (2006). Structural and functional analyses of Kaposi sarcoma‐associated herpesvirus ORF57 nuclear localization signals in living cells. J Biol Chem 281, 28365‐28378.

Mamane, Y., Petroulakis, E., LeBacquer, O., and Sonenberg, N. (2006). mTOR, translation initiation and cancer. Oncogene 25, 6416‐6422.

Mandel, C.R., Bai, Y., and Tong, L. (2008). Protein factors in pre‐mRNA 3ʹ‐end processing. Cell Mol Life Sci 65, 1099‐1122.

Mangone, M., Manoharan, A.P., Thierry‐Mieg, D., Thierry‐Mieg, J., Han, T., Mackowiak, S.D., Mis, E., Zegar, C., Gutwein, M.R., Khivansara, V., et al. (2010). The landscape of C. elegans 3ʹUTRs. Science 329, 432‐435.

Maniatis, T., and Tasic, B. (2002). Alternative pre‐mRNA splicing and proteome expansion in metazoans. Nature 418, 236‐243.

Manley, J.L., and Krainer, A.R. (2010). A rational nomenclature for serine/arginine‐rich protein splicing factors (SR proteins). Genes Dev 24, 1073‐1074.

Manning, B.D., and Cantley, L.C. (2007). AKT/PKB signaling: navigating downstream. Cell 129, 1261‐1274.

Matlin, A.J., and Moore, M.J. (2007). Spliceosome assembly and composition. Adv Exp Med Biol 623, 14‐35.

Mayeda, A., and Krainer, A.R. (1992). Regulation of alternative pre‐mRNA splicing by hnRNP A1 and splicing factor SF2. Cell 68, 365‐375.

McFarlane, M., and Graham, S.V. (2010). Human papillomavirus regulation of SR proteins. Biochem Soc Trans 38, 1116‐1121.

McPhillips, M.G., Veerapraditsin, T., Cumming, S.A., Karali, D., Milligan, S.G., Boner, W., Morgan, I.M., and Graham, S.V. (2004). SF2/ASF binds the human papillomavirus type 16 late RNA control element and is regulated during differentiation of virus‐infected epithelial cells. J Virol 78, 10598‐10605.

Melo, S.A., Ropero, S., Moutinho, C., Aaltonen, L.A., Yamamoto, H., Calin, G.A., Rossi, S., Fernandez, A.F., Carneiro, F., Oliveira, C., et al. (2009). A TARBP2 mutation in human cancer impairs microRNA processing and DICER1 function. Nat Genet 41, 365‐370.

Melton, C., Judson, R.L., and Blelloch, R. (2010). Opposing microRNA families regulate self‐renewal in mouse embryonic stem cells. Nature 463, 621‐626.

Merdzhanova, G., Edmond, V., De Seranno, S., Van den Broeck, A., Corcos, L., Brambilla, C., Brambilla, E., Gazzeri, S., and Eymin, B. (2008). E2F1 controls alternative splicing pattern of genes involved in apoptosis through upregulation of the splicing factor SC35. Cell Death Differ 15, 1815‐1823.

Merdzhanova, G., Gout, S., Keramidas, M., Edmond, V., Coll, J.L., Brambilla, C., Brambilla, E., Gazzeri, S., and Eymin, B. (2010). The transcription factor E2F1 and the SR protein SC35 control the ratio of pro‐angiogenic versus antiangiogenic isoforms of vascular endothelial growth factor‐A to inhibit neovascularization in vivo. Oncogene 29, 5392‐5403.

Mermoud, J.E., Cohen, P.T., and Lamond, A.I. (1994). Regulation of mammalian spliceosome assembly by a protein phosphorylation mechanism. EMBO J 13, 5679‐5688.

Michlewski, G., and Caceres, J.F. (2010). Antagonistic role of hnRNP A1 and KSRP in the regulation of let‐7a biogenesis. Nat Struct Mol Biol 17, 1011‐1018.

Michlewski, G., Guil, S., Semple, C.A., and Caceres, J.F. (2008a). Posttranscriptional regulation of miRNAs harboring conserved terminal loops. Mol Cell 32, 383‐393.

Michlewski, G., Sanford, J.R., and Caceres, J.F. (2008b). The splicing factor SF2/ASF regulates translation initiation by enhancing phosphorylation of 4E‐BP1. Mol Cell 30, 179‐189.

Millevoi, S., and Vagner, S. (2010). Molecular mechanisms of eukaryotic pre‐mRNA 3ʹ end processing regulation. Nucleic Acids Res 38, 2757‐2774.

Misteli, T., Caceres, J.F., and Spector, D.L. (1997). The dynamics of a pre‐mRNA splicing factor in living cells. Nature 387, 523‐527.

Mole, S., McFarlane, M., Chuen‐Im, T., Milligan, S.G., Millan, D., and Graham, S.V. (2009). RNA splicing factors regulated by HPV16 during cervical tumour progression. J Pathol 219, 383‐391.

Moroy, T., and Heyd, F. (2007). The impact of alternative splicing in vivo: mouse models show the way. RNA 13, 1155‐1171.

Naor, D., Sionov, R.V., and Ish‐Shalom, D. (1997). CD44: structure, function, and association with the malignant process. Adv Cancer Res 71, 241‐319.

Ni, J.Z., Grate, L., Donohue, J.P., Preston, C., Nobida, N., OʹBrien, G., Shiue, L., Clark, T.A., Blume, J.E., and Ares, M., Jr. (2007). Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense‐mediated decay. Genes Dev 21, 708‐718.

Niranjanakumari, S., Lasda, E., Brazas, R., and Garcia‐Blanco, M.A. (2002). Reversible cross‐linking combined with immunoprecipitation to study RNA‐protein interactions in vivo. Methods 26, 182‐190.

Niwa, M., and Berget, S.M. (1991). Mutation of the AAUAAA polyadenylation signal depresses in vitro splicing of proximal but not distal introns. Genes Dev 5, 2086‐2095.

Niwa, M., Rose, S.D., and Berget, S.M. (1990). In vitro polyadenylation is stimulated by the presence of an upstream intron. Genes Dev 4, 1552‐1559.

OʹDonnell, K.A., Wentzel, E.A., Zeller, K.I., Dang, C.V., and Mendell, J.T. (2005). c‐Myc‐regulated microRNAs modulate E2F1 expression. Nature 435, 839‐843.

Okamura, K., Liu, N., and Lai, E.C. (2009). Distinct mechanisms for microRNA strand selection by Drosophila Argonautes. Mol Cell 36, 431‐444.

Orengo, J.P., and Cooper, T.A. (2007). Alternative splicing in disease. Adv Exp Med Biol 623, 212‐223.

Ozsolak, F., Kapranov, P., Foissac, S., Kim, S.W., Fishilevich, E., Monaghan, A.P., John, B., and Milos, P.M. (2010). Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 143, 1018‐1029.

Paroo, Z., Ye, X., Chen, S., and Liu, Q. (2009). Phosphorylation of the human microRNA‐generating complex mediates MAPK/Erk signaling. Cell 139, 112‐122.

Patel, A.A., and Steitz, J.A. (2003). Splicing double: insights from the second spliceosome. Nat Rev Mol Cell Biol 4, 960‐970.

Pawlicki, J.M., and Steitz, J.A. (2008). Primary microRNA transcript retention at sites of transcription leads to enhanced microRNA production. J Cell Biol 182, 61‐76.

Peng, X., and Mount, S.M. (1995). Genetic enhancement of RNA‐processing defects by a dominant mutation in B52, the Drosophila gene for an SR protein splicing factor. Mol Cell Biol 15, 6273‐6282.

Rajasekhar, V.K., Viale, A., Socci, N.D., Wiedmann, M., Hu, X., and Holland, E.C. (2003). Oncogenic Ras and Akt signaling contribute to glioblastoma formation by differential recruitment of existing mRNAs to polysomes. Mol Cell 12, 889‐901.

Reddy, S.D., Ohshiro, K., Rayala, S.K., and Kumar, R. (2008). MicroRNA‐7, a homeobox D10 target, inhibits p21‐activated kinase 1 and regulates its functions. Cancer Res 68, 8195‐8200.

Ring, H.Z., and Lis, J.T. (1994). The SR protein B52/SRp55 is essential for Drosophila development. Mol Cell Biol 14, 7499‐7506.

Roscigno, R.F., and Garcia‐Blanco, M.A. (1995). SR proteins escort the U4/U6.U5 tri‐snRNP to the spliceosome. RNA 1, 692‐706.

Rossi, F., Labourier, E., Forne, T., Divita, G., Derancourt, J., Riou, J.F., Antoine, E., Cathala, G., Brunel, C., and Tazi, J. (1996). Specific phosphorylation of SR proteins by mammalian DNA topoisomerase I. Nature 381, 80‐82.

Rybak, A., Fuchs, H., Hadian, K., Smirnova, L., Wulczyn, E.A., Michel, G., Nitsch, R., Krappmann, D., and Wulczyn, F.G. (2009). The let‐7 target gene mouse lin‐41 is a stem cell specific E3 ubiquitin ligase for the miRNA pathway protein Ago2. Nat Cell Biol 11, 1411‐1420.

Sabatini, D.M. (2006). mTOR and cancer: insights into a complex relationship. Nat Rev Cancer 6, 729‐734.

Sammeth, M., Foissac, S., and Guigo, R. (2008). A general definition and nomenclature for alternative splicing events. PLoS Comput Biol 4, e1000147.

Sandberg, R., Neilson, J.R., Sarma, A., Sharp, P.A., and Burge, C.B. (2008). Proliferating cells express mRNAs with shortened 3ʹ untranslated regions and fewer microRNA target sites. Science 320, 1643‐1647.

Sanford, J.R., Coutinho, P., Hackett, J.A., Wang, X., Ranahan, W., and Caceres, J.F. (2008). Identification of nuclear and cytoplasmic mRNA targets for the shuttling protein SF2/ASF. PLoS One 3, e3369.

Sanford, J.R., Ellis, J.D., Cazalla, D., and Caceres, J.F. (2005). Reversible phosphorylation differentially affects nuclear and cytoplasmic functions of splicing factor 2/alternative splicing factor. Proc Natl Acad Sci U S A 102, 15042‐15047.

Sanford, J.R., Gray, N.K., Beckmann, K., and Caceres, J.F. (2004). A novel role for shuttling SR proteins in mRNA translation. Genes Dev 18, 755‐768.

Sanford, J.R., Wang, X., Mort, M., Vanduyn, N., Cooper, D.N., Mooney, S.D., Edenberg, H.J., and Liu, Y. (2009). Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts. Genome Res 19, 381‐394.

Schaal, T.D., and Maniatis, T. (1999). Selection and characterization of pre‐mRNA splicing enhancers: identification of novel SR protein‐specific enhancer sequences. Mol Cell Biol 19, 1705‐1719.

Shalgi, R., Lieber, D., Oren, M., and Pilpel, Y. (2007). Global and local architecture of the mammalian microRNA‐transcription factor regulatory network. PLoS Comput Biol 3, e131.

Shaw, R.J., and Cantley, L.C. (2006). Ras, PI(3)K and mTOR signalling controls tumour cell growth. Nature 441, 424‐430.

Shen, H., and Green, M.R. (2004). A pathway of sequential arginine‐serine‐rich domain‐splicing signal interactions during mammalian spliceosome assembly. Mol Cell 16, 363‐373.

Shen, H., Kan, J.L., and Green, M.R. (2004). Arginine‐serine‐rich domains bound at splicing enhancers contact the branchpoint to promote prespliceosome assembly. Mol Cell 13, 367‐376.

Shin, C., Feng, Y., and Manley, J.L. (2004). Dephosphorylated SRp38 acts as a splicing repressor in response to heat shock. Nature 427, 553‐558.

Shin, C., and Manley, J.L. (2002). The SR protein SRp38 represses splicing in M phase cells. Cell 111, 407‐417.

Sinha, R., Allemand, E., Zhang, Z., Karni, R., Myers, M.P., and Krainer, A.R. (2010). Arginine methylation controls the subcellular localization and functions of the oncoprotein splicing factor SF2/ASF. Mol Cell Biol 30, 2762‐2774.

Spector, D.L. (1993). Macromolecular domains within the cell nucleus. Annu Rev Cell Biol 9, 265‐315.

Stickeler, E., Kittrell, F., Medina, D., and Berget, S.M. (1999). Stage‐specific changes in SR splicing factors and alternative splicing in mammary tumorigenesis. Oncogene 18, 3574‐3582.

Sun, S., Zhang, Z., Sinha, R., Karni, R., and Krainer, A.R. (2010). SF2/ASF autoregulation involves multiple layers of post‐transcriptional and translational control. Nat Struct Mol Biol 17, 306‐312.

Sun, T., Wang, Q., Balk, S., Brown, M., Lee, G.S., and Kantoff, P. (2009). The role of microRNA‐221 and microRNA‐222 in androgen‐independent prostate cancer cell lines. Cancer Res 69, 3356‐3363.

Sureau, A., Gattoni, R., Dooghe, Y., Stevenin, J., and Soret, J. (2001). SC35 autoregulates its expression by promoting splicing events that destabilize its mRNAs. EMBO J 20, 1785‐1796.

Swartz, J.E., Bor, Y.C., Misawa, Y., Rekosh, D., and Hammarskjold, M.L. (2007). The shuttling SR protein 9G8 plays a role in translation of unspliced mRNA containing a constitutive transport element. J Biol Chem 282, 19844‐19853.

Tacke, R., and Manley, J.L. (1995). The human splicing factors ASF/SF2 and SC35 possess distinct, functionally significant RNA binding specificities. EMBO J 14, 3540‐3551.

Tazi, J., Bakkour, N., and Stamm, S. (2009). Alternative splicing and disease. Biochim Biophys Acta 1792, 14‐26.

Terasawa, K., Ichimura, A., Sato, F., Shimizu, K., and Tsujimoto, G. (2009). Sustained activation of ERK1/2 by NGF induces microRNA‐221 and 222 in PC12 cells. FEBS J 276, 3269‐3276.

Thomas, P.D., Campbell, M.J., Kejariwal, A., Mi, H., Karlak, B., Daverman, R., Diemer, K., Muruganujan, A., and Narechania, A. (2003). PANTHER: a library of protein families and subfamilies indexed by function. Genome Res 13, 2129‐2141.

Thomson, J.M., Newman, M., Parker, J.S., Morin‐Kensicki, E.M., Wright, T., and Hammond, S.M. (2006). Extensive post‐transcriptional regulation of microRNAs and its implications for cancer. Genes Dev 20, 2202‐2207.

Tian, B., Hu, J., Zhang, H., and Lutz, C.S. (2005). A large‐scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res 33, 201‐212.

Trabucchi, M., Briata, P., Garcia‐Mayoral, M., Haase, A.D., Filipowicz, W., Ramos, A., Gherzi, R., and Rosenfeld, M.G. (2009). The RNA‐binding protein KSRP promotes the biogenesis of a subset of microRNAs. Nature 459, 1010‐1014.

Tsang, J., Zhu, J., and van Oudenaarden, A. (2007). MicroRNA‐mediated feedback and feedforward loops are recurrent network motifs in mammals. Mol Cell 26, 753‐767.

Tuerk, C., and Gold, L. (1990). Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505‐510.

Ule, J., Jensen, K.B., Ruggiu, M., Mele, A., Ule, A., and Darnell, R.B. (2003). CLIP identifies Nova‐regulated RNA networks in the brain. Science 302, 1212‐1215.

Ule, J., Stefani, G., Mele, A., Ruggiu, M., Wang, X., Taneri, B., Gaasterland, T., Blencowe, B.J., and Darnell, R.B. (2006). An RNA map predicting Nova‐dependent splicing regulation. Nature 444, 580‐586.

Valente, S.T., Gilmartin, G.M., Venkatarama, K., Arriagada, G., and Goff, S.P. (2009). HIV‐1 mRNA 3ʹ end processing is distinctively regulated by eIF3f, CDK11, and splice factor 9G8. Mol Cell 36, 279‐289.

Vasudevan, S., Tong, Y., and Steitz, J.A. (2007). Switching from repression to activation: microRNAs can up‐regulate translation. Science 318, 1931‐1934.

Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. (2001). The sequence of the human genome. Science 291, 1304‐1351.

Wahl, M.C., Will, C.L., and Luhrmann, R. (2009). The spliceosome: design principles of a dynamic RNP machine. Cell 136, 701‐718.

Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008). Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470‐476.

Wang, J., and Manley, J.L. (1995). Overexpression of the SR proteins ASF/SF2 and SC35 influences alternative splicing in vivo in diverse ways. RNA 1, 335‐346.

Wang, J., Takagaki, Y., and Manley, J.L. (1996). Targeted disruption of an essential vertebrate gene: ASF/SF2 is required for cell viability. Genes Dev 10, 2588‐2599.

Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin‐14 by lin‐4 mediates temporal pattern formation in C. elegans. Cell 75, 855‐862.

Woods, K., Thomson, J.M., and Hammond, S.M. (2007). Direct regulation of an oncogenic micro‐RNA cluster by E2F transcription factors. J Biol Chem 282, 2130‐2134.

Xiao, S.H., and Manley, J.L. (1997). Phosphorylation of the ASF/SF2 RS domain affects both protein‐protein and protein‐RNA interactions and is necessary for splicing. Genes Dev 11, 334‐344.

Xu, X., Yang, D., Ding, J.H., Wang, W., Chu, P.H., Dalton, N.D., Wang, H.Y., Bermingham, J.R., Jr., Ye, Z., Liu, F., et al. (2005). ASF/SF2‐regulated CaMKIIdelta alternative splicing temporally reprograms excitation‐contraction coupling in cardiac muscle. Cell 120, 59‐72.

Yang, W., Chendrimada, T.P., Wang, Q., Higuchi, M., Seeburg, P.H., Shiekhattar, R., and Nishikura, K. (2006). Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat Struct Mol Biol 13, 13‐21.

Yeo, G.W., Coufal, N.G., Liang, T.Y., Peng, G.E., Fu, X.D., and Gage, F.H. (2009). An RNA code for the FOX2 splicing regulator revealed by mapping RNA‐protein interactions in stem cells. Nat Struct Mol Biol 16, 130‐137.

Yoo, A.S., Staahl, B.T., Chen, L., and Crabtree, G.R. (2009). MicroRNA‐mediated switching of chromatin‐remodelling complexes in neural development. Nature 460, 642‐646.

Yuryev, A., Patturajan, M., Litingtung, Y., Joshi, R.V., Gentile, C., Gebara, M., and Corden, J.L. (1996). The C‐terminal domain of the largest subunit of RNA polymerase II interacts with a novel set of serine/arginine‐rich proteins. Proc Natl Acad Sci U S A 93, 6975‐6980.

Zahler, A.M., Lane, W.S., Stolk, J.A., and Roth, M.B. (1992). SR proteins: a conserved family of pre‐mRNA splicing factors. Genes Dev 6, 837‐847.

Zhang, C., Frias, M.A., Mele, A., Ruggiu, M., Eom, T., Marney, C.B., Wang, H., Licatalosi, D.D., Fak, J.J., and Darnell, R.B. (2010). Integrative modeling defines the Nova splicing‐regulatory network and its combinatorial controls. Science 329, 439‐443.

Zhang, X.H., and Chasin, L.A. (2004). Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev 18, 1241‐1250.

Zhang, Z., and Krainer, A.R. (2004). Involvement of SR proteins in mRNA surveillance. Mol Cell 16, 597‐607.

Zhao, Y., Samal, E., and Srivastava, D. (2005). Serum response factor regulates a muscle‐specific microRNA that targets Hand2 during cardiogenesis. Nature 436, 214‐220.

Zheng, Z.M. (2004). Regulation of alternative RNA splicing by exon definition and exon sequences in viral and mammalian gene expression. J Biomed Sci 11, 278‐294.

Zheng, Z.M. (2010). Viral oncogenes, noncoding RNAs, and RNA splicing in human tumor viruses. Int J Biol Sci 6, 730‐755.

Zheng, Z.M., Huynen, M., and Baker, C.C. (1998). A pyrimidine‐rich exonic splicing suppressor binds multiple RNA splicing factors and inhibits spliceosome assembly. Proc Natl Acad Sci U S A 95, 14088‐14093.

Zhu, J., Mayeda, A., and Krainer, A.R. (2001). Exon identity established through differential antagonism between exonic splicing silencer‐bound hnRNP A1 and enhancer‐bound SR proteins. Mol Cell 8, 1351‐1361.

Biography

Han Wu was born in the city of Anqing in Anhui Province of the People’s Republic of China on August 4, 1983. He entered the School of Life Sciences at Fudan University (Shanghai, China) in 2001 and received a Bachelor of Science degree in Biology in May 2005. He was then recruited by the Cell and Molecular Biology (CMB) program and joined Dr. Jun Zhu’s lab in the Department of Cell Biology in May 2006 to work on this thesis project.

Publications

1) Wu, H., Sun, S., Tu, K., Gao, Y., Xie, B., Krainer, A.R., and Zhu, J. (2010). A splicing‐independent function of SF2/ASF in microRNA processing. Mol Cell 38, 67‐77.

2) Ni, T., Wu, H., Song, S., Jelley, M., and Zhu, J. (2009). Selective gene amplification for high‐throughput sequencing. Recent Pat DNA Gene Seq 3, 29‐38.

3) Jima DD, Zhang J, Jacobs C, Richards KL, Dunphy CH, Choi WW, Au WY, Srivastava G, Czader MB, Rizzieri DA, Lagoo AS, Lugar PL, Mann KP, Flowers CR, Bernal‐Mizrachi L, Naresh KN, Evens AM, Gordon LI, Luftig M, Friedman DR, Weinberg JB, Thompson MA, Gill JI, Liu Q, How T, Grubor V, Gao Y, Patel A, Wu H, Zhu J, Blobe GC, Lipsky PE, Chadburn A, Dave SS. (2010). Deep sequencing of the small RNA transcriptome of normal and malignant human B cells identifies hundreds of novel microRNAs. Blood Aug 23.

4) Ni, T., Tu, K., Wang, Z., Song, S., Wu, H., Xie, B., Scott, K.C., Grewal, S.I., Gao, Y., and Zhu, J. (2010). The prevalence and regulation of antisense transcripts in Schizosaccharomyces pombe. PLoS One 5, e15271.

Honors and Awards

2001, 2003 People’s Scholarship

2001‐2002 Excellent Communist Youth Leaguer

2002 Certificate of Excellence (Microsoft University Education Program)

2005‐2007 Duke University Fellowship
