ARGLU1 is an RNA-Binding Protein that has a Fundamental Role in Modulating Global Alternative Splicing Patterns

by

Emma Zilberman

A thesis submitted in conformity with the requirements for the degree of Master of Science Pharmaceutical Sciences University of Toronto

© Copyright by Emma Zilberman 2017

ARGLU1 is an RNA-Binding Protein that has a Fundamental Role in Modulating Global Alternative Splicing Patterns

Emma Zilberman

Master of Science

Pharmaceutical Sciences University of Toronto

2017

Abstract

Alternative splicing (AS) is a key step of RNA maturation. Emerging evidence demonstrates that RNA synthesis and processing are coupled; thus, transcriptional coregulators are being studied in greater detail as mediators of both events. Nuclear receptors such as the glucocorticoid receptor (GR) are ligand-dependent transcription factors that depend on coregulator interactions. The roles of GR in metabolism and immune function in response to synthetic glucocorticoids, such as Dexamethasone, are well characterized. We identified a novel GR coactivator, arginine and glutamate rich 1 (ARGLU1), that potentiates GR transcriptional activity and interacts with splicing factors. ARGLU1 knockdown in a neuronal cell line, followed by analysis of mRNA, showed that ARGLU1 alters alternatively spliced events (ASEs). Dexamethasone-regulated ASEs were also identified and found to be largely dependent on ARGLU1 and GR. RNA immunoprecipitation found that ARGLU1 could bind pre-mRNA of ASEs. This thesis studied the role of ARGLU1 as a novel RNA-binding protein that modulates AS.

Acknowledgments

First of all, I am extremely grateful to my supervisor, Dr. Carolyn Cummins, for providing me the opportunity to work on a project so novel and unique that we both felt intrigued and sometimes so puzzled by the results. Beginning my thesis research was no easy task, and Carolyn’s passion for research helped ignite my own excitement for this work on splicing that I really grew to appreciate.

Secondly, my advisory committee, comprising Dr. Stephane Angers and Dr. Craig Smibert, has been the perfect blend of critical, creative and supportive. Thank you for withstanding my often lengthy committee meetings and providing the help I needed to complete this work.

To the Cummins lab, both past and present, you have made my time here fantastic! From the endless scientific advice to the hilarious, often very odd, discussions we all have. Lilia “my lab mom” – I cannot thank you enough for all the help you provided me throughout my MSc research, from teaching and explaining concepts to me repeatedly sometimes. Adil “my hilarious lab twin” – thank you for being so supportive both in the lab and out and helping teach me a lot of experimental techniques in the start. Paola, Cigdem, Michael and Wendy, you’re always around when I need a chat or coffee break. To past lab members, Ricky, Rucha and Jasmine – you were all great help whenever I needed advice. I would not be where I am without the help of my lab mates!

To mom and dad, Daniel, and grandparents – you guys have relentlessly asked me what I do here and I never have a good answer, so I hope you enjoy this thesis to finally get an answer to your question! Your love, support and friendship have been unwavering. Thank you for the constant words of encouragement and praise. I love you all so much from the bottom of my heart!

Dave, my love, my rock. You know me best of all, and your words of encouragement through all the frustrating experimental setbacks and late-night work were what really got me to the end. I don’t think I would have ever been able to grow as a person and scientist without your unwavering motivation, love and support.

Table of Contents

Acknowledgments ...... iii

Table of Contents ...... iv

List of Tables ...... vii

List of Figures ...... viii

List of Appendices ...... xi

List of Abbreviations ...... xii

Chapter 1 ...... 1

Introduction ...... 1

1.1 Nuclear receptors ...... 1

1.1.1 Glucocorticoid receptor ...... 3

1.1.2 NR coregulators ...... 6

1.2 Constitutive and alternative splicing ...... 6

1.2.1 Types of alternative splicing ...... 7

1.2.2 Spliceosome components and mechanism ...... 8

1.2.3 RNA sequences as a determinant of splicing ...... 12

1.2.4 Splicing factors as determinants of alternative splicing ...... 13

1.2.5 Positional context as a determinant of AS regulation ...... 15

1.2.6 Alternative splicing and nonsense-mediated decay (AS-NMD) ...... 16

1.2.7 Alternative splicing in the nervous system ...... 17

1.3 ARGLU1 ...... 18

1.3.1 Discovery of ARGLU1 ...... 18

1.3.2 ARGLU1 C-terminal domain is a GR coactivator ...... 21

1.3.3 ARGLU1 N-terminal domain interacts with splicing factors ...... 21

1.4 Co-transcriptional RNA processing ...... 22

1.4.1 Kinetic model of coupling ...... 24

1.4.2 Recruitment model of coupling ...... 24

1.4.3 NR coregulators: mediators of transcription and splicing coupling ...... 25

1.5 Objective of thesis research ...... 26

1.5.1 Specific aims ...... 26

1.5.2 Hypothesis ...... 27

Chapter 2 ...... 28

Methods ...... 28

2.1 Cell culture ...... 28

2.2 Transfection assays ...... 28

2.3 RNA analysis, cDNA synthesis and real-time quantitative PCR (qPCR) ...... 28

2.4 Protein extraction ...... 29

2.5 BCA and WB ...... 29

2.6 RNA-seq analysis and validation ...... 29

2.7 Pathway analysis ...... 33

2.8 One-Step RT-PCR ...... 33

2.9 RNA binding motif analysis ...... 35

2.10 RNA Immunoprecipitation (RIP) ...... 35

2.11 One-Step qPCR ...... 36

2.12 Statistical analysis ...... 37

Chapter 3 ...... 38

Results ...... 38

3.1 ARGLU1 modulates global patterns of expression ...... 38

3.1.1 RNA-seq gene expression validation ...... 41

3.2 ARGLU1 modulates global patterns of AS ...... 45

3.3 ARGLU1 can bind RNA ...... 55

3.3.1 ARGLU1 RNA-compete validation ...... 56

3.4 GR knockdown alters global gene expression and AS patterns in N2a cells ...... 59

3.4.1 GR knockdown alters gene expression ...... 59

3.4.2 GR knockdown alters AS ...... 63

3.4.3 GR RNA-seq ASE validation ...... 67

3.5 ARGLU1 and GR interplay in AS modulation ...... 70

Chapter 4 ...... 72

Discussion ...... 72

4.1 Future Directions ...... 77

4.2 Limitations ...... 78

Summary of key findings ...... 79

References or Bibliography ...... 82

Appendices ...... 95

List of Tables

Table 1 Gene expression qPCR primers used to validate the RNA-seq data sets...... 30

Table 2 One-Step RT-PCR master mix components and thermocycling conditions ...... 33

Table 3 Primer sequences for One-Step RT-PCR...... 34

Table 4 Primer sequences used for One-Step qPCR...... 37

List of Figures

Figure 1-1 Schematic of the general structure of nuclear receptors...... 2

Figure 1-2 Glucocorticoid receptor mechanisms of action...... 5

Figure 1-3 Schematic of alternative splicing (AS)...... 7

Figure 1-4 Types of alternative splicing...... 8

Figure 1-5 Necessary sequences for spliceosome recognition ...... 9

Figure 1-6 Two transesterification reactions of constitutive and alternative splicing...... 10

Figure 1-7 Spliceosome assembly ...... 11

Figure 1-8 Splicing regulation is determined by RNA sequence and protein factors...... 12

Figure 1-9 Schematic of ARGLU1 structure...... 19

Figure 1-10 ARGLU1 is highly conserved between species...... 20

Figure 1-11 RNAcompete derived consensus motifs for GST-hARGLU1 constructs...... 22

Figure 3-1 ARGLU1 is transiently knocked down in N2a cells...... 39

Figure 3-2 ARGLU1 and Dex cause significant changes in gene expression of N2a cells. ... 40

Figure 3-3 Basal regulation of gene expression following ARGLU1 knockdown...... 42

Figure 3-4 Ligand dependent gene expression validation following ARGLU1 knockdown. 43

Figure 3-5 Correlation of qPCR validation with RNA-seq...... 44

Figure 3-6 ARGLU1 regulates a small proportion of detectable ASEs...... 45

Figure 3-7 Basal regulation of alternative splicing in N2a cells following ARGLU1 knockdown...... 47

Figure 3-8 Ligand-dependent regulation of alternative splicing in N2a cells...... 48

Figure 3-9 Venn diagram showing overlap of AS events that have a ∆PSI ≥ 15 in response to Dex in the presence of ARGLU1 (siControl) or absence (siArglu1)...... 49

Figure 3-10 ARGLU1 knockdown in N2a cells leads to increased cassette exon inclusion. ... 50

Figure 3-11 ARGLU1 knockdown in N2a cells leads to increased cassette exon skipping. .... 51

Figure 3-12 Dexamethasone treatment alters AS events in N2a cells...... 52

Figure 3-13 ∆PSI correlation between RNA-seq and One-Step RT-PCR...... 53

Figure 3-14 ARGLU1 regulates transcription and splicing of distinct genes...... 54

Figure 3-15 ARGLU1 regulates genes involved in the pathway for “cell morphogenesis involved in neuron differentiation” at both the gene expression and alternative splicing levels...... 54

Figure 3-16 ARGLU1 regulates genes involved in the pathway for “chromatin binding” at both the gene expression and alternative splicing levels...... 55

Figure 3-17 The ARGLU1 RNA-binding motif is distributed in flanking introns...... 57

Figure 3-18 ARGLU1 can bind pre-mRNA of alternatively spliced genes...... 58

Figure 3-19 Glucocorticoid receptor was transiently knocked down in N2a cells...... 60

Figure 3-20 Basal and ligand-dependent regulation of gene expression after GR knockdown or Dex treatment in N2a cells...... 61

Figure 3-21 qPCR validation of GR target genes identified in RNA-seq of N2a cells...... 62

Figure 3-22 Correlation of qPCR validation with RNA-seq data after GR knockdown...... 63

Figure 3-23 GR regulates a minor proportion of detectable ASEs...... 64

Figure 3-24 Basal regulation of alternative splicing by GR in untreated N2a cells...... 65

Figure 3-25 Ligand-dependent regulation of alternative splicing in N2a cells...... 66

Figure 3-26 Venn diagram illustrating overlap of ASEs with ∆PSI ≥ 15 in response to Dex. . 67

Figure 3-27 Validation of AS events following GR knockdown and Dex treatment...... 68

Figure 3-28 RT-PCR and RNA-seq correlation of all ASEs validated from GR RNA-seq data...... 69

Figure 3-29 GR regulates transcription and splicing of distinct genes...... 69

Figure 3-30 Correlation of cRPKMs between two RNA-seq experiments...... 70

Figure 3-31 Correlation of PSIs between two RNA-seq experiments...... 71

Figure 5-1 Predicted mechanism of ARGLU1 to promote cassette exon alternative splicing...... 81

List of Appendices

Supplementary Figure 1 Additional qPCR validation of gene expression data obtained by RNA-seq after ARGLU1 knockdown...... 95

Supplementary Figure 2 Additional AS validation using multiple siRNA against ARGLU1 for selected ASEs...... 96

Supplementary Table S1 N2a RNA-seq gene expression data following ARGLU1 knockdown

Supplementary Table S2 N2a RNA-seq alternative splicing data following ARGLU1 knockdown

Supplementary Table S3 g:profiler results for basally-regulated genes with transcription and AS overlap following ARGLU1 knockdown

Supplementary Table S4 N2a RNA-seq gene expression data following GR knockdown

Supplementary Table S5 N2a RNA-seq alternative splicing data following GR knockdown

Supplementary Table S6 g:profiler results for basally-regulated genes with transcription and AS overlap following GR knockdown

List of Abbreviations

3’-UTR    3’ untranslated region
36B4    Ribosomal protein, large
AF    Activation function
ANOVA    Analysis of variance
AR    Androgen receptor
ARGLU1    Arginine and glutamate rich 1
AS    Alternative splicing
AS-NMD    Alternative splicing coupled to nonsense-mediated decay
ASE    Alternatively spliced events
ATP    Adenosine triphosphate
BAIAP2    Brain-specific angiogenesis inhibitor 1-associated protein 2
BCA    Bicinchoninic acid assay
BIN1    Bridging integrator 1
BPS    Branchpoint sequence
BPTF    Bromodomain PHD finger transcription factor
BRWD3    Bromodomain and WD repeat domain containing 3
CAPERα/β (RBM39)    RNA binding motif protein 39
CLIP    Crosslinking and immunoprecipitation
CLIP-seq    Crosslinking and immunoprecipitation followed by sequencing
CNS    Central nervous system
CoAA    Coactivator activator; hnRNP-like
COBRA1    Cofactor of BRCA1
CPM    Counts per million
cRPKM    Corrected reads per kilobase per million
Ct    Cycle threshold
CTD    C-terminal domain
DBD    DNA-binding domain
DDX/DHX    DEAD/DEAH box helicase
Dex    Dexamethasone, GR ligand
EEJ    Exon-exon junction
EIJ    Exon-intron junction
EJC    Exon junction complex
ER    Estrogen receptor
ESE    Exonic splicing enhancer
ESS    Exonic splicing silencer
FBS    Fetal bovine serum
FC    Fold change
FDR    False discovery rate
GATA2    GATA binding protein 2
GC    Glucocorticoid
GR    Glucocorticoid receptor
GRE    Glucocorticoid response element
h, hr    Hour
HITS-CLIP    High-throughput sequencing coupled with crosslinking and immunoprecipitation
hnRNP    Heterogeneous nuclear ribonucleoprotein
HR    Hinge region
HRE    Hormone response element
HSP    Heat shock protein
HTS    High-throughput sequencing
iCLIP    Individual-nucleotide resolution UV crosslinking and immunoprecipitation
IP    Immunoprecipitation
IR    Intron retention
IRE    Intron retention event
ISE    Intronic splicing enhancer
ISS    Intronic splicing silencer
KAT2B    K(lysine) acetyltransferase 2B
KD    Knockdown
KS    Kolmogorov-Smirnov
LBD    Ligand-binding domain
LPIN1    Lipin 1
LRP4    Low density lipoprotein receptor-related protein 4
MED1    Mediator complex subunit 1
MED23    Mediator complex subunit 23
MIC    Microexon
miRNA    Micro-RNA
MPDZ    Multiple PDZ domain protein
N2a    Neuroblastoma cell line
NCOA    Nuclear coactivator
NCOR1    Nuclear corepressor 1
NLS    Nuclear localization sequence
NMD    Nonsense-mediated decay
NR    Nuclear receptor
NRN1    Neuritin 1
nSR100    Neural-SR100
NTD    N-terminal domain
PAR-CLIP    Photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation
PAX5    Paired box 5
PBS    Phosphate buffered saline
PCR    Polymerase chain reaction
PGC-1α    Peroxisome proliferator receptor, gamma, coactivator 1 alpha
PI    Protease inhibitor
PMSF    Phenylmethylsulfonyl fluoride
Pol II    Polymerase II, RNA
PPT    Polypyrimidine tract
PRPF    Pre-mRNA processing factor
PSI    Percent spliced in
PTB    Polypyrimidine tract binding
PTBP1    Polypyrimidine tract binding protein 1
PTC    Premature termination codon
PUF60    Poly(U) binding splicing factor 60
qPCR/RT-qPCR    Quantitative real time polymerase chain reaction
RBD    RNA-binding domain
RBFOX    RNA-binding protein, fox-1 homolog
RBM    RNA-binding motif
RBP    RNA-binding protein
RIP    RNA immunoprecipitation
RIP-Chip    RNA immunoprecipitation followed by microarray
RIP-seq    RNA immunoprecipitation followed by sequencing
RIP140    Nuclear receptor interacting protein 140
RND2    Rho family GTPase 2
RNP    Ribonucleoprotein
RRM    RNA recognition motif
RT-PCR    Reverse transcription polymerase chain reaction
SCN1B    Sodium-channel, voltage-gated, type I, beta
SDS    Sodium dodecyl sulfate
SF    Splicing factor
siRNA    Small interfering RNA
snRNA    Small nuclear RNA
snRNP    Small nuclear ribonucleoprotein particle
SR protein    Serine-arginine protein
SRC    Steroid receptor coactivator
SRE    Splicing recognition element
SRSF    Serine arginine splicing factor
SS    Splice site
TF    Transcription factor
TIF2 (NCOA2)    Nuclear coactivator 2
TMTC1    Transmembrane and tetratricopeptide repeat containing 1
TNIK    TRAF2 and NCK interacting kinase
U2AF    U2 auxiliary factor
UCE    Ultraconserved element
UPF1    Up frameshift mutation 1
VDR    Vitamin D receptor
VEGF    Vascular endothelial growth factor
Veh    Vehicle

Chapter 1

Introduction

1.1 Nuclear receptors

Steroid hormones are known regulators of physiological and pathological states important in development, differentiation, reproduction, immune system regulation and metabolism (Aranda and Pascual, 2001; Mangelsdorf et al., 1995). Receptors that bind steroid hormones comprise a superfamily of ligand-dependent transcription factors, the nuclear receptors (NRs) (Mangelsdorf et al., 1995). NRs share an overall topology that consists of variable N- and C-terminal regions, a conserved central DNA-binding domain (DBD), a hinge region (HR) and a conserved ligand-binding domain (LBD) (Figure 1-1).

The variability and size of the N-terminal domain (NTD) mean that it is highly amenable to post-translational modifications, and its large surface is critical for coregulator interactions (discussed further in Section 1.1.2). For instance, phosphorylation of the NTD of glucocorticoid receptor stabilizes interactions with coregulators, which in turn can potentiate or repress actions of the receptor (Garza et al., 2010; Pawlak et al., 2012). The NTD also contains the first transactivation domain (AF1), which is important for ligand-independent activation.

The DBD consists of two zinc fingers, which target the receptor to hormone-response elements (HREs). More specifically, the first zinc finger directly interacts with the DNA while the second zinc finger specifies NR dimerization and stabilizes the NR-DNA interaction (Mangelsdorf et al., 1995; Pawlak et al., 2012). Each zinc finger is formed by four cysteine residues in the cysteine-rich portion of the DBD. HREs are conserved sequences within the genome that consist of two hexamer consensus sequences important for the receptor to recognize and bind DNA. These hexamers can take on the form of palindromic repeats, inverted repeats or direct repeats (Aranda and Pascual, 2001). NRs can bind HREs as monomers, homodimers or heterodimers. Each NR has a different range of affinities for its target HRE sequence. For example, homodimer receptors exclusively bind palindromic sequences (Aranda and Pascual, 2001).

The hinge region (HR) of the receptor can contain a nuclear localization sequence (NLS) and is important for maintaining flexibility of the NR. The flexibility of the HR assists in positioning the two neighboring DBD and LBD. The LBD contains the AF-2 domain, which is critical for interactions with the signature LXXLL motif present in coregulators (Heery et al., 1997). The LBD is important in initiating the NR signaling cascade, starting with dimerization and activation of the receptor (Mangelsdorf et al., 1995).

Figure 1-1 Schematic of the general structure of nuclear receptors.

All NRs share a similar structure consisting of a highly variable NTD, containing one of the activation function (AF) domains. The conserved DNA-binding domain (DBD) has Zn fingers important for binding to HREs and transcription factors (TFs). Both the DBD and LBD play a role in receptor dimerization. The NR is kept inactive by a chaperone complex binding to the LBD. After ligand binding to the LBD, the NR signaling cascade can begin with the AF2 domain critical for full transcriptional activation or repression via interactions with coregulators.

Receptors can be located in the cytosol, but kept inactive by a chaperone protein complex binding to their LBD, or already in the nucleus bound to DNA. Ligands for NRs are typically lipophilic molecules that diffuse through the membrane to reach the NR. Once the ligand binds, the receptor undergoes a conformational change to dissociate from the chaperone complex, and translocates to the nucleus to bind HREs via the now exposed DBD. This is termed a genomic mechanism of action and involves direct binding of the receptor to DNA to change gene expression. Moreover, NRs can act through other mechanisms, such as indirect binding to DNA or through non-genomic mechanisms to elicit gene expression changes. Nongenomic signaling includes changing membrane fluidity or the embedded ion channels to activate signaling cascades or generate secondary messengers (Falkenstein et al., 2000; Losel and Wehling, 2003).

Nongenomic mechanisms occur rapidly, within seconds to minutes, whereas the slower genomic mechanism involves transcriptional changes and can take hours to days (Falkenstein et al., 2000).

1.1.1 Glucocorticoid receptor

Glucocorticoids (GCs) are secreted by the adrenal cortex during conditions of stress or starvation and are under the control of the hypothalamic-pituitary-adrenal (HPA) axis (Oakley and Cidlowski, 2013; Vegiopoulos and Herzig, 2007). GCs have a broad range of physiological effects on development, differentiation, metabolism, reproduction, growth and immune function. Endogenous GCs include cortisol in humans and corticosterone in rodents. A potent synthetic ligand often used in laboratory research and as a drug treatment is Dexamethasone (Dex). GCs are widely prescribed for their immunosuppressant and anti-inflammatory properties; however, chronic administration of steroids has serious adverse effects, such as dysregulation of lipid and glucose metabolism causing hepatosteatosis, diabetes and obesity. Other adverse effects include osteoporosis, skin atrophy and hypertension. Dysregulation of GC metabolism can have severe complications and be life-threatening. For instance, Cushing’s disease is characterized by the excess production of GCs, leading to muscle wasting and bone loss, abdominal obesity, diabetes and immunosuppression (Revollo and Cidlowski, 2009). In contrast, Addison’s disease is characterized by the loss of GC production, and patients suffer from hypoglycemia, weight loss and electrolyte imbalance (Revollo and Cidlowski, 2009).

The physiological and pharmacological effects of GCs are mediated through the glucocorticoid receptor (GR). GR has the characteristic structure of the other NRs, which includes the central DBD and C-terminal LBD (Aranda and Pascual, 2001; Revollo and Cidlowski, 2009). GR has two activation function (AF) motifs, important in mediating interactions with basal transcriptional machinery and coregulators. A nuclear localization signal is also present in the C-terminal domain of the protein (Revollo and Cidlowski, 2009). GR is held in the cytosol as part of a complex with HSP90, HSP60 and P23. Upon ligand binding, the complex undergoes a conformational change to release GR such that it can translocate to the nucleus, homodimerize, and bind glucocorticoid response elements (GREs) in its target genes to alter gene expression. Genome-wide studies have found that GREs are not necessarily near the transcriptional start site, with most being present in intragenic regions or areas further away from the promoter. The importance of these binding sites was recently elucidated for GR (Kuznetsova et al., 2015), using chromatin interaction analysis by paired end sequencing (ChIA-PET) (Fullwood et al., 2009). This technique can detect distal enhancer interactions with the promoter of a specific gene, demonstrating the importance of long-range interactions in gene regulation. Kuznetsova et al. were able to ascribe function to a subset of distal binding sites for GR that were identified by ChIP-seq but initially thought to be nonproductive binding or “parking spots” for GR (Kuznetsova et al., 2015).

GR binds its cognate sequences as a homodimer in which each receptor monomer binds one of the two 6 bp half sites (Aranda and Pascual, 2001; Revollo and Cidlowski, 2009). The consensus GRE (GGAACAnnnTGTTCT) is an imperfect palindrome, and when GR binds it, transcription is activated. Negative regulation, in which transcription is repressed by GR, has also been noted. Repression occurs when GR binds the negative GRE (CTCCn(0-2)GGAGA). The negative GRE has variable spacing between the half sites, forcing GR to bind as a monomer to mediate repression (Hudson et al., 2013; Surjit et al., 2011).
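As an illustration only, the two consensus sequences quoted above can be written as regular expressions and used to scan a DNA string. This is a minimal sketch: the function name `find_sites` and the example sequence are hypothetical, and exact consensus matching is a simplifying assumption, since GR binding sites in the genome tolerate deviations from the consensus.

```python
import re

# Consensus sequences as quoted in the text; "n" is treated as any base.
GRE_PATTERN = re.compile(r"GGAACA[ACGT]{3}TGTTCT")   # two 6 bp half sites, 3 nt spacer
NGRE_PATTERN = re.compile(r"CTCC[ACGT]{0,2}GGAGA")   # half sites separated by 0-2 nt

def find_sites(sequence):
    """Return sorted (start, matched_sequence, kind) tuples for exact consensus matches."""
    seq = sequence.upper()
    hits = []
    for kind, pattern in (("GRE", GRE_PATTERN), ("nGRE", NGRE_PATTERN)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.group(), kind))
    return sorted(hits)

# Hypothetical sequence containing one GRE and one negative GRE.
example = "ttGGAACAtgaTGTTCTccgCTCCaGGAGAtt"
sites = find_sites(example)
```

Only the forward strand is scanned here; a fuller analysis would also scan the reverse complement and score degenerate sites, for example with a position weight matrix.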

There are three main ways in which GR regulates gene expression upon ligand binding. GR can bind directly to the DNA, tether itself to already bound transcription factors, or bind to the DNA and interact with bound transcription factors (Figure 1-2). Cis-regulation is the term used when GR dimers interact directly with GRE sequences, which can be present in the promoter or at distal sites, whereas trans-regulation occurs when GR associates with the DNA indirectly by interacting with DNA-bound transcription factors (Gross and Cidlowski, 2008; Zanchi et al., 2010). Cis-repression can occur through competitive and noncompetitive mechanisms (Stahn and Buttgereit, 2008). Competitive repression occurs when GR binding to GREs prevents activating transcription factors from binding the same site on the promoter. Noncompetitive repression involves negative GREs (Hudson et al., 2013). It is important to note that both cis- and trans-regulatory mechanisms can either activate or repress transcription. Glucocorticoids mediate their anti-inflammatory and immunosuppressive properties through transcriptional activation as well as repression. Overall, glucocorticoids increase the expression of anti-inflammatory genes while simultaneously repressing expression of pro-inflammatory genes.

Figure 1-2 Glucocorticoid receptor mechanisms of action.

GCs are lipophilic hormones that can traverse through the plasma membrane to bind the glucocorticoid receptor (GR). Once bound, GR is activated, dimerizes and dissociates from a heat shock protein (HSP) chaperone complex and can then translocate to the nucleus to bind GREs. The classic transactivation pathway involves the GR homodimer binding simple glucocorticoid response elements (GREs) to activate transcription. Alternatively, GR can activate transcription by tethering onto other transcription factors (TFs) without contacting the DNA or by forming a composite GRE by binding DNA along with other TFs. Negative regulation of transcription occurs most commonly by tethering actions of other TFs or via recruitment of corepressors (like NCoR, not shown here), but also by actions of monomeric GR on negative GREs, and by sequestration away from DNA binding sites.

1.1.2 NR coregulators

The actions of GR are dependent on the recruitment of coregulators to its AF-2 domain to modulate transcription (Edwards, 2000). Coregulatory proteins and chromatin-remodeling complexes interact with the receptor and basal transcriptional machinery to enhance transcriptional activation or repression. Initial studies identified the p160 family of nuclear coactivators (NCoAs) also known as the steroid receptor coactivators (SRC) (Edwards, 2000; McKenna and O'Malley, 2002). There are three classes of SRCs, which have conserved regions making up different functional domains. The NR interaction domain contains three LXXLL motifs that recognize the AF-2 domain (Edwards, 2000). SRCs also contain autonomous transcriptional activation domains (AD1, AD2), which interact with other coactivators. Chromatin remodeling proteins are key in allowing GR and other coregulators to bind to DNA. In an inaccessible state, the DNA is wrapped around nucleosomes and the HREs are inaccessible. An interesting finding demonstrated that GR binds to pre-accessible chromatin, which assists in determining the cell-specific actions of the receptor (Grontved et al., 2013; John et al., 2011). Therefore, the factors that determine chromatin accessibility to GR play an important role in the gene expression patterns in tissues.

1.2 Constitutive and alternative splicing

Pre-mRNA splicing is an essential and tightly regulated step of RNA processing that is mechanistically coupled to transcription and to the later steps of mRNA export and trafficking. Constitutive splicing involves the removal of intronic sequences from nascent transcripts and the joining together of exonic sequences in the order in which they are transcribed. Alternative splicing (AS) occurs when the same gene gives rise to different inclusion patterns of exons or introns, creating variant proteins that have different structures and functions and ultimately increasing proteome diversity. In humans 95% of all genes undergo AS to generate multiple mRNA variants from a single gene (Keren et al., 2010; Nilsen and Graveley, 2010; Pan et al., 2008). In mice about 63% of all genes undergo AS. Overall, more evolved organisms exhibit increased amounts of alternative splicing; thus, AS can be thought of as a critical component of evolution (Keren et al., 2010).

[Figure 1-3 image. Panels: constitutive splicing and alternative splicing, each showing DNA, pre-mRNA (Exon 1, Exon 2, Exon 3), mRNA and protein. The mRNA variants shown are Exon 1-Exon 2-Exon 3 and Exon 1-Exon 3.]

Figure 1-3 Schematic of alternative splicing (AS).

Constitutive splicing involves the removal of intronic sequences and joining together of exons in the order they appear (left panel). However, alternative splicing occurs when the pre-mRNA is processed (spliced) to include or exclude alternative exons, thus creating distinct mRNA isoforms and consequently different protein products that may exhibit diverse functions.
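The difference between the two outcomes can be sketched as a simple operation on an ordered list of exons. This is an illustrative toy model only: the function name `splice` and the exon sequences are hypothetical, and introns are assumed to have been removed already.

```python
def splice(exons, skipped=()):
    """Join exon sequences in transcribed order, omitting exons whose index is skipped."""
    return "".join(seq for index, seq in enumerate(exons) if index not in skipped)

# Hypothetical sequences for exon 1, exon 2 and exon 3.
exons = ["AUGGCU", "GAAUUC", "CCGUAA"]

variant_1 = splice(exons)               # constitutive pattern: all exons joined in order
variant_2 = splice(exons, skipped={1})  # alternative pattern: cassette exon 2 excluded
```

The same pre-mRNA therefore yields two distinct mRNA isoforms, which is the basis for the different protein products described in the caption.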

1.2.1 Types of alternative splicing

There are five main types of alternative splicing. The most common type is cassette exon skipping or inclusion, which comprises about 40% of all alternative splicing events (ASEs), whereas alternative 3’ and 5’ splice site (SS) usage comprise about 18% and 8%, respectively. Microexons (MIC) are defined as 3 – 27 nt long exons (Volfovsky et al., 2003) and were generally missing from past genome annotations due to a lack of sequencing depth. At present, it is difficult to ascertain what proportion of alternatively spliced events involve microexons. Until recently, intron retention (IR) was cited as the rarest subtype of alternative splicing, making up less than 5% of all events (Keren et al., 2010; Pan et al., 2008). However, with advances in the depth of high-throughput sequencing (HTS) approaches, various studies have demonstrated that intron retention events (IREs) are much more prevalent and may comprise up to 30 – 40% of events (Braunschweig et al., 2014; Irimia et al., 2014). Much less frequently, complex events can occur through the use of mutually exclusive exons, alternative promoter usage (for alternative transcriptional start sites) and alternative polyadenylation sites (Keren et al., 2010; Kim et al., 2007).

Figure 1-4 Types of alternative splicing.

Schematic of constitutive splicing and the five main types of alternative splicing events. Constitutive exons are shown in blue and alternative exons are shown in purple with alternative splicing events depicted by the solid lines above or below the exons.

1.2.2 Spliceosome components and mechanism

Both constitutive and alternative splicing are performed by the spliceosome, a large ribonucleoprotein (RNP) complex that includes five small nuclear ribonucleoprotein particles (snRNPs U1, U2, U4, U5, U6) and over a hundred auxiliary proteins (Keren et al., 2010; Kornblihtt et al., 2013; Will and Luhrmann, 2011). Humans also possess a minor spliceosome made up of different snRNPs (U11, U12, U4atac/U6atac and U5) for splicing a minor class of introns that do not have the typical GU-AG dinucleotides at the 5’ and 3’ splice sites, respectively (Steitz et al., 2008; Tarn and Steitz, 1996). This feature of the spliceosomal machinery will not be discussed further in this thesis. Each snRNP comprises a small nuclear RNA (snRNA) of 50-200 nucleotides, along with a few proteins (Lee and Rio, 2015). snRNAs are initially transcribed and processed similarly to protein-coding mRNAs, in that Pol II recognizes specific promoter sequences and recruits general transcription factors. Capping and polyadenylation occur co-transcriptionally as with mRNAs (Matera and Wang, 2014). snRNAs fall into two classes, Sm-class and Sm-like-class, and after transcription both types leave the nucleus for further processing in the cytoplasm. To form the mature snRNP, Sm proteins form a hexamer core and then assemble around the snRNA. The now mature snRNP is imported back into the nucleus for further processing in Cajal bodies until final storage in nuclear speckles (Lamond and Spector, 2003; Matera and Wang, 2014; Sleeman and Lamond, 1999).

There are four conserved sequences that delineate splice sites and enable recognition of the pre-mRNA by the spliceosome through base pairing. These are the 5’ splice site (5’SS), the 3’ splice site (3’SS), the branch point sequence (BPS), located about 15-40 base pairs upstream of the 3’SS, and the polypyrimidine tract (PPT), which lies between the branch site and the 3’SS (Chen and Manley, 2009; Keren et al., 2010). Splice site sequences that deviate from the consensus help dictate alternative splicing decisions, since the strength of the consensus sequence helps to attract spliceosomal components.

Figure 1-5 Necessary sequences for spliceosome recognition

There are 4 core conserved splicing signals necessary for recognition by the spliceosome. These include the 5’ and 3’ splice sites, the branch point and polypyrimidine tract sequences. Highly conserved residues are depicted in red. (SS – splice site, Y – pyrimidine, R – purine).

The spliceosome catalyzes the two consecutive transesterification reactions of splicing. In the first reaction, the 2’OH of the conserved branch site adenosine acts as the nucleophile and attacks the 5’SS at its conserved guanine, cleaving the junction between exon 1 and the intron. In the second reaction, the newly freed 3’OH of exon 1 attacks the 3’SS, downstream of its conserved guanine, which ligates the two exons into the mature mRNA and liberates the intron as a lariat structure (Figure 1-6).

Figure 1-6 Two transesterification reactions of constitutive and alternative splicing.

In the first transesterification reaction the 2’OH group of the branch site adenosine carries out a nucleophilic attack on the 5’SS. In the second reaction the two exons are ligated to form the mature mRNA, and the intron lariat is released. (E1 – exon 1, E2 – exon 2, p – phosphate, A – conserved adenosine of BPS).

The spliceosome assembles de novo on each intron to be spliced. Assembly begins with U1 snRNA recognizing and base pairing with the 5’SS, the U2AF heterodimer (U2AF65/U2AF35) binding the PPT and the 3’ terminal AG of the intron, and splicing factor 1 (SF1) binding the BPS. This is the early (E) complex, and its formation is ATP-independent. U1 binding to the 5’SS is fairly weak, and this interaction is stabilized by SR proteins (Cho et al., 2011). The next step, formation of the pre-spliceosomal A complex, is ATP-dependent and occurs when SF1 is replaced by U2 at the BPS. The B complex is formed after the U4-U6 dimer and U5 are recruited as a pre-assembled tri-snRNP. After extensive conformational changes and remodeling, the catalytic B* complex is created (Matera and Wang, 2014). Multiple RNA helicases are involved in every step of spliceosome formation, rearrangement and final activation. Helicases are necessary to unwind the U4-U6 dimer, which releases U4 and U1 and allows the active U2-U6 structure to form (Matlin and Moore, 2007). The splicing reaction is then catalyzed by the U2-U6 snRNAs (Matera and Wang, 2014), with Prp8 being critical for both steps of the splicing reaction (Schellenberg et al., 2013). After the B* complex performs the first step of the splicing reaction, the C complex is formed. The C complex undergoes further rearrangements prior to carrying out the second splicing step. Disassembly of the post-catalytic spliceosome is performed by several helicases (Cordin et al., 2012). Very little is known about the conformational changes that occur between the B and C complexes of the spliceosome. Early stages of assembly often occur around exons, specifically through factors that recognize the 5’SS and the upstream 3’SS. This is referred to as “exon definition” and commits that exon to being spliced. Exon definition commonly occurs in mammals, owing to the large size of introns. In such cases, exon definition must eventually be converted to intron definition, allowing for cross-intron interactions between U1 and U2 (De Conti et al., 2013; Matera and Wang, 2014). However, the conversion steps are not well understood.

Figure 1-7 Spliceosome assembly

In the early stages of spliceosome assembly the U1 snRNP recognizes and binds the 5’SS via RNA-RNA interactions. SF1 binds the BPS and U2AF65/35 binds the 3’SS. The pre-spliceosome complex is formed when U2 displaces SF1, and further recruitment of the U4/U6.U5 tri-snRNP forms the pre-catalytic spliceosome. After various rearrangements the catalytically active spliceosome is formed and splicing proceeds via two transesterification reactions to release mature mRNA and an intron lariat structure.

1.2.3 RNA sequences as a determinant of splicing

Both constitutive and alternative splicing are regulated by the RNA sequence elements and protein factors that recognize these sites. The occurrence of an alternative splicing event is dictated by splice site strength, or how a splice site motif differs from the consensus motif. Most studies on splicing regulation have focused on how trans-factors enhance or repress splice site recognition by U1 (in the case of the 5’SS) and U2AF (in the case of the 3’SS). Cognate sequences within the RNA perform cis-regulation and are characterized based on location (exonic or intronic) and effects on splicing (enhancers or silencers) giving four broad groups termed exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs) and intronic splicing silencers (ISSs). These splicing regulatory elements (SREs) are typically concentrated within 300 nucleotides (nt) of splice sites. A rule of thumb is that enhancers within exons function to regulate constitutive splicing, whereas enhancers and silencers that regulate AS can be within introns and the alternative exon (Barash et al., 2010).
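The two-way classification of splicing regulatory elements described above (location by effect) can be written down as a small lookup. The sketch below is only an illustration: the helper names and coordinate convention are hypothetical, while the four class names and the 300-nt window come from the text.

```python
# The four SRE classes named in the text, keyed by (location, effect).
SRE_CLASSES = {
    ("exonic", "enhancer"): "ESE",
    ("exonic", "silencer"): "ESS",
    ("intronic", "enhancer"): "ISE",
    ("intronic", "silencer"): "ISS",
}


def classify_sre(location: str, effect: str) -> str:
    """Return ESE, ESS, ISE or ISS for a given location and effect."""
    key = (location.strip().lower(), effect.strip().lower())
    if key not in SRE_CLASSES:
        raise ValueError(f"unknown location/effect combination: {key}")
    return SRE_CLASSES[key]


def within_sre_window(element_pos: int, splice_site_pos: int, window: int = 300) -> bool:
    """True if an element lies within `window` nt of a splice site.

    Positions are hypothetical coordinates along the pre-mRNA; the default
    of 300 nt reflects where SREs are typically concentrated.
    """
    return abs(element_pos - splice_site_pos) <= window
```

For example, `classify_sre("exonic", "silencer")` gives `"ESS"`, the class bound by hnRNPs in the silencing mechanism of Figure 1-8.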

Local RNA structure can also affect the binding availability of cis-regulatory sequences (McManus and Graveley, 2011). In addition to affecting SRE availability, secondary structures within the RNA can also hinder splice site accessibility (Wang and Burge, 2008). While most RBPs prefer to bind single stranded RNA, some display preferential binding for or against certain secondary structures (Ray et al., 2013). RNA helicases can modulate AS by unwinding RNA structures and facilitating splicing (Honig et al., 2002).

Figure 1-8 Splicing regulation is determined by RNA sequence and protein factors.

Schematic of various regulatory elements involved in the control of alternative splicing. (A) Illustration of a possible mechanism for silencing splicing. hnRNP proteins may bind to exonic splicing silencer (ESS) sites and sterically block the access of spliceosome components (U1, U2AF65/35) to their binding sites. RNA-binding proteins (RBP) may also bind to intronic silencer sites (ISS) to promote exon skipping. (B) SR proteins commonly bind exonic splicing enhancer (ESE) sites to facilitate recruitment of spliceosomal components to the 3’ and 5’ SS. RBPs may bind intronic splicing enhancer (ISE) elements within the RNA to promote exon inclusion.

1.2.4 Splicing factors as determinants of alternative splicing

Along with the cis-regulatory sequences, trans-acting factors such as the Serine/Arginine-rich (SR) protein family and the heterogeneous nuclear ribonucleoprotein (hnRNP) family are critical players in the determination and control of alternative splicing events. They can bind different elements in the RNA to activate or repress recognition by U1 and U2AF65/35 (Figure 1-8). SR proteins are characterized by at least one N-terminal RNA-recognition motif (RRM) and a downstream Arg-Ser (RS) repeat domain of at least 50 amino acids with >40% RS content (Busch and Hertel, 2012), and they preferentially bind purine-rich exonic sequences (Fu and Ares, 2014; Sanford et al., 2009). Earlier reports suggested that the RS repeats facilitate protein-protein interactions while the RRMs are more critical for RNA binding; however, more recent evidence suggests the converse can also occur, with the RS repeats facilitating RNA binding and the RRMs acting as a protein interaction domain (Shen and Green, 2006). SR proteins typically facilitate the “exon definition” type of interactions in early spliceosome assembly (Braunschweig et al., 2013). hnRNPs are a more heterogeneous group of proteins with broader classification and functionality (Busch and Hertel, 2012; Dreyfuss et al., 1993). They have widespread roles in RNA processing, including splicing, export, translation and stability (Han et al., 2010). Owing to the heterogeneous nature of this class, they may contain one or more RNA-binding domains (RBDs) such as RRMs, K homology domains, glycine-rich domains and RGG boxes. In general, SR proteins function as enhancers while hnRNPs repress splice site recognition by U1 and U2AF (Busch and Hertel, 2012; Chiou et al., 2013; Cho et al., 2011). Splicing inhibition can occur if the splicing silencer is close to a splice site, such that hnRNP binding sterically hinders access of snRNPs to their sites. Another mechanism causes extended pairing between snRNPs and their splice sites, preventing further spliceosome assembly and catalysis (Chiou et al., 2013). In cases where a silencer acts at a distance, it is proposed that an alternative exon can be “looped out” to prevent further spliceosome assembly (Chen and Manley, 2009; Spellman and Smith, 2006). hnRNPs have also been shown to enhance splicing by acting cooperatively to bring splice sites closer together (Martinez-Contreras et al., 2006).

1.2.4.1 Measuring RNA-protein interactions

The interactions between RNA-binding proteins (RBPs) and RNA are critical for all aspects of RNA metabolism, including AS regulation, as discussed above. Interactions can be measured using a variety of in vitro and in vivo techniques. In vitro techniques tend to be used to infer RNA binding specificity, while in vivo approaches provide a snapshot of RBP binding to RNA. Using both approaches together allows for meaningful characterization of the RBP-RNA interaction. Traditional techniques relied on RNA electrophoretic mobility shift assays; however, the drawbacks are that the RNA binding motif and target RNA need to be known, and that the interaction observed under the high-salt electrophoretic conditions may not accurately reflect what occurs in vivo (Hellman and Fried, 2007). RNA immunoprecipitation (RIP) allows for identification of multiple RNA targets in live cells, and combining it with a high-throughput approach (RIP-Seq or RIP-Chip) allows for identification of the global landscape of RNA binding for a particular RBP within cells at a particular instant (Keene et al., 2006; Selth et al., 2011; Wessels et al., 2016). RIPs are performed in live cells by first using formaldehyde to crosslink all protein-RNA complexes. It is important to keep in mind that chemical crosslinking with formaldehyde pulls down both direct and indirect interactions. RIP studies can give variable results because RNA species are very abundant in both the nucleus and cytoplasm, which raises the risk of nonspecific chemical crosslinking. Measurement of indirect interactions can be of interest when interrogating multicomponent RNA-protein complexes and for factors that play important roles in RNA metabolism without directly binding to RNA (Buratowski, 2009). The formaldehyde concentration and crosslinking time, as well as the lysis, IP and wash steps, also need optimization depending on the RBP tested. Alternatively, a “native” RIP (without formaldehyde crosslinking) can be performed, since formaldehyde crosslinking increases sequence bias and crosslinking with other molecules reduces RNA recovery. Native RIPs are useful for proteins that stay stably associated with RNA, that is, those that bind directly and/or with high affinity (Selth et al., 2011).

Various UV crosslinking and immunoprecipitation (CLIP) protocols have been created, originally to detect only direct interactions and avoid the indirect interactions that may be captured with chemical crosslinking. CLIP-based methods are evolving in order to map directly crosslinked RBP-RNA sites at single-nucleotide resolution (iCLIP and PAR-CLIP) and to lower the variability of some results seen with RIP (Ascano et al., 2012; Konig et al., 2011; Licatalosi et al., 2008; Sugimoto et al., 2012; Ule et al., 2003). Comparison of RIP and CLIP shows that RIP identifies more stably associated RNAs. CLIP methods are often tedious and require extensive optimization, controls and extra replicates to reduce background signal. Furthermore, the technical challenges surrounding crosslinking biases, unique mapping of short reads and abundance of certain transcripts need consideration. Overall, understanding the biochemistry behind the RNA-protein interaction is beneficial to creating a robust RIP or CLIP experiment. Knowledge of where the protein is localized and its abundance, as well as consideration of the RNA motif and structure, is important when creating an optimal protocol for an RBP of interest.

In vitro and bioinformatic approaches can complement the in vivo approaches described above. These tools often use algorithms to infer an RNA binding motif for an RBP of interest. Motif identification is critical for understanding how protein-RNA interactions occur. Similar to SREs used for AS regulation, RBP binding motifs are usually short, linear sequences seemingly present at thousands of sites in the genome. Furthermore, there are hundreds of RBPs encoded in the genome, yet motifs are not well characterized due to their degenerate nature. Like other bioinformatic approaches used for sequencing, the identified motifs need to be validated in real targets, thus a complementary RIP or CLIP experiment is usually performed. Methods such as RNAcompete have recently been established to characterize the binding specificity for an RBP of interest (Ray et al., 2009; Ray et al., 2013). While RNAcompete is an excellent approach to infer binding specificity based on relative abundance of pulled down molecules, it still does not allow for a proper Kd measurement, and inferred motifs should be validated by a complementary in vivo method. RNAcompete will be discussed in more detail in Section 1.3.3.

1.2.5 Positional context as a determinant of AS regulation

SREs are often context dependent and cell-specific (Fu and Ares, 2014). This means that, depending on their location in the pre-mRNA, they can recruit different protein factors in a cell- and tissue-dependent manner. Furthermore, the presence of other factors in various cell or tissue contexts can alter splicing (Fu and Ares, 2014; Pandit et al., 2013). The context-dependent actions of SREs can be thought of in two ways: as having location-dependent activity or gene-dependent activity (Wang and Burge, 2008). A global overview of exonic sequences established the location of many exonic SREs and how their position in a sequence dictated which RBPs would bind and how AS would proceed (Goren et al., 2006). Moreover, the same trans-acting proteins may have different effects on splicing depending on which SRE they bind (Wang and Burge, 2008).

Some proteins, such as members of the NOVA, RBFOX and hnRNP families, have been shown to act as activators or repressors depending on the location of their binding sites. For instance, NOVA binding motifs located downstream of alternatively spliced exons are associated with exon inclusion. Conversely, if the motifs are within or upstream of a target exon, the exon undergoes higher rates of skipping (Ule and Darnell, 2006; Ule et al., 2006). Other cases of positional context influencing splicing decisions have been shown for PTBP1 and PTBP2 (Licatalosi et al., 2012; Xue et al., 2009), RBFOX (Sun et al., 2012; Weyn-Vanhentenryck et al., 2014) and TDP-43 (Tollervey et al., 2011).
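The NOVA example can be expressed as a minimal position rule. The Python sketch below is a deliberate simplification for illustration: the function name and the coordinate convention are hypothetical, and it encodes only the qualitative trend stated above (downstream motifs associate with inclusion, motifs within or upstream of the exon with skipping), not a quantitative RNA splicing map.

```python
def nova_position_rule(motif_pos: int, exon_start: int, exon_end: int) -> str:
    """Predict the splicing trend from a NOVA motif position relative to an exon.

    Positions are hypothetical coordinates along the pre-mRNA, read 5'->3',
    with exon_start <= exon_end.
    """
    if exon_start > exon_end:
        raise ValueError("exon_start must not exceed exon_end")
    if motif_pos > exon_end:
        # Motif lies downstream of the alternative exon.
        return "inclusion"
    # Motif lies within or upstream of the alternative exon.
    return "skipping"
```

With an alternative exon spanning positions 400 to 500, a motif at position 550 would be scored as favouring inclusion, whereas motifs at positions 450 or 350 would be scored as favouring skipping.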

While the recognition of an SRE by an RBP, and the subsequent inclusion or exclusion of an alternative exon, may seem random, the use of RNA splicing maps has greatly contributed to understanding a few general rules (Witten and Ule, 2011). An RBP generally causes exon exclusion when binding near the BPS, near the splice sites or within exons, whereas binding downstream of exons generally causes exon inclusion. RBPs that bind near the BPS or splice sites likely cause exclusion by competing with spliceosomal components or SR proteins, while RBPs that bind downstream likely promote exon inclusion by stabilizing the U1-pre-mRNA interaction (Witten and Ule, 2011).

1.2.6 Alternative splicing and nonsense-mediated decay (AS-NMD)

Alternative splicing is critical for increasing proteome diversity by generating multiple mRNA variants from a single gene. However, AS can also regulate gene expression by introducing premature termination codons (PTCs), which target the transcript to the nonsense-mediated decay (NMD) pathway and ultimately downregulate gene expression. PTCs can be introduced in three main ways: i) by a frame shift caused by exon skipping or inclusion, ii) by inclusion of an ASE containing a PTC or iii) by an authentic stop codon being mistaken for a PTC when an intron is present in the 3’UTR more than 50-55 nt downstream of the stop codon (McGlincy and Smith, 2008). mRNAs are targeted to the NMD pathway to prevent abnormal expression of mutant or incompletely processed transcripts. A large exon junction complex (EJC) is deposited 20-24 nt upstream of exon-exon junctions (EEJs) during the splicing reaction. In the initial round of translation, the ribosome would normally displace the EJCs; however, if it encounters a PTC more than 50-55 nt upstream of the terminal EEJ, EJC components can interact with other proteins to trigger ribosome release and lead to mRNA decay (Braunschweig et al., 2013; Schoenberg and Maquat, 2012). With PTCs potentially present in a vast number of transcripts, Pan et al. (2006) assessed whether AS coupled to NMD (AS-NMD) is a global mechanism for gene downregulation. They found that knockdown of UPF1 (an essential helicase in the NMD pathway) did not significantly affect the levels of PTC-containing alternative isoforms, indicating that AS-NMD is not a mechanism for global gene regulation. However, AS-NMD is very important as an auto-regulatory mechanism used by SR proteins (Jumaa and Nielsen, 1997; Sureau et al., 2001), hnRNPs (Wollerton et al., 2004), and core spliceosomal components (Saltzman et al., 2008) to downregulate themselves. Ultraconserved elements are regions that are at least 200 bp long and 100% conserved between humans, mice and rats; interestingly, ASEs regulated by AS-NMD are often found within these sites (Bejerano et al., 2004; Saltzman et al., 2008). Furthermore, cross-regulation between different RBPs was also shown to be of high functional importance. Cross-regulation was demonstrated for RBFOX2 acting on many other RBPs (Jangi et al., 2014) and for PTB proteins in neural development (Calarco et al., 2011; Raj and Blencowe, 2015). Overall, AS-NMD is crucial for fine-tuning the expression of splicing factors and spliceosomal components to regulate overall gene expression.
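The positional rule for PTC recognition described in this section can be sketched as a simple distance check. In the Python sketch below, the function name, the transcript coordinate convention and the default threshold of 50 nt (the lower bound of the 50-55 nt range quoted above) are assumptions for illustration; actual NMD sensitivity depends on additional factors that are not modelled here.

```python
from typing import Sequence


def predicted_nmd_target(
    stop_codon_pos: int,
    exon_junctions: Sequence[int],
    min_distance: int = 50,
) -> bool:
    """Apply the 50-55 nt rule to a spliced transcript.

    `stop_codon_pos` and `exon_junctions` are hypothetical positions on the
    spliced mRNA, counted 5'->3'. A stop codon lying more than `min_distance`
    nt upstream of the last exon-exon junction is treated as premature.
    """
    if not exon_junctions:
        # An intronless transcript carries no EJC to mark a stop as premature.
        return False
    last_junction = max(exon_junctions)
    return last_junction - stop_codon_pos > min_distance
```

For a transcript with junctions at positions 150 and 400, a stop codon at position 300 lies 100 nt upstream of the last junction and would be flagged, whereas a stop codon at position 380 or anywhere in the last exon would not.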

1.2.7 Alternative splicing in the nervous system

A hallmark of neural development is the coupling and coordination of many regulatory pathways, such as those in neuronal migration, axon and dendritic outgrowth, and synaptic connection establishment. The fundamental role of AS in neurogenesis involves widespread crosstalk among many RBPs to create functional neural networks (Irimia et al., 2014; Raj and Blencowe, 2015; Raj et al., 2014). Extensive data suggest that AS in the nervous system is extremely prevalent and highly conserved; its increased incidence is thus a likely contributor to the complexity of the brain (Barbosa-Morais et al., 2012; Raj and Blencowe, 2015). Furthermore, AS is critical not only for neural development but also for the maintenance of mature neurons (Vuong et al., 2016). In fact, alternative exons conserved between species are often enriched in brain expression, and are involved in other components of RNA processing (Keren et al., 2010; Yeo et al., 2005).

RNA-binding proteins (RBPs) are usually widely expressed, yet there are some isoforms that have increased expression in a cell- and tissue-specific manner. Tissue-dependent regulation of AS is particularly important in the CNS, where tight regulation is critical to form proper neural networks. Understanding how RBPs recognize the landscape of cis-sequences is critical to the study of AS in the brain. The cis-features can be thought of as a “splicing code”, most of which lies within 300 nucleotides of splice sites (Barash et al., 2010; Xiong et al., 2015). Thus, further studies elucidating where RBPs bind could determine how AS is regulated. For instance, NOVA proteins were found to be associated with exons that maintain proper synapse formation and axon guidance (Ule et al., 2003; Ule et al., 2006; Ule et al., 2005). RBFOX proteins also display specific functions in the CNS, specifically in development of the cerebellum (Gehman et al., 2012) and in neural excitation networks (Gehman et al., 2011). PTB proteins, mentioned previously, can regulate splicing in a manner dependent on positional context. The specific action of PTBP1 in the regulation of neuronal differentiation has been characterized (Linares et al., 2015). Generally, PTB proteins function in suppressing AS in neural development, thus their downregulation in neural development is critical (Licatalosi et al., 2012). PTBP1 is expressed early in neural development and one of its roles is to promote exon skipping in PTBP2, targeting PTBP2 to the NMD pathway. Later in neuronal development, Ptbp1 is silenced by the action of miR-124 (Makeyev et al., 2007). The action of another neural-specific RBP, nSR100, then allows PTBP2 to be expressed (Calarco et al., 2011; Raj and Blencowe, 2015). This cross-regulatory mechanism by PTB proteins is an exquisite example of how AS-NMD plays an important role in the CNS.

1.3 ARGLU1

1.3.1 Discovery of ARGLU1

Arginine and glutamate rich 1 (ARGLU1) was identified in our lab as a GR coactivator in a screen of human brain cDNA for novel GR-interacting proteins using a GAL4-hGR cotransfection luciferase assay. ARGLU1 has also been shown to coactivate the estrogen receptor (Zhang et al., 2011), but otherwise bears no similarity to known NR coregulators. ARGLU1 is a small protein of 273 amino acids with a molecular weight of 33.2 kDa. Its structure includes a positively charged arginine-rich N-terminal domain (NTD) and a negatively charged glutamate-rich C-terminal domain (CTD), giving rise to its name ARGLU1 (Figure 1-9). Interestingly, the NTD contains numerous RS repeats characteristic of SR proteins, which were introduced in Section 1.2.4. An interesting feature of the CTD is that glutamic acid increases disordered regions of protein structure. Splicing factor 1 (SF1) is an example of a disordered protein that binds to U2AF65, and the disorder is actually critical for binding (Uversky and Dunker, 2013). ARGLU1 is conserved across species: the C. elegans protein displays 20% sequence homology and the zebrafish (D. rerio) protein 72% homology to the human protein, while other mammals display up to 99% sequence homology with humans (Figure 1-10). Highly conserved proteins usually have important functionality, with overlapping roles in those species that retain the protein. Initial attempts to generate whole-body knockouts in mice were unsuccessful, as mice died during embryogenesis (between E9.5 and E12.5). Work in zebrafish showed that the two isoforms of Arglu1 (Arglu1a and Arglu1b) were highly expressed in the developing brain. Knockdown of Arglu1 using morpholinos targeting both isoforms resulted in severe brain and heart defects that culminated in death when the animals were unable to feed. This phenotype was rescued by injection of mRNA expressing each isoform.

Figure 1-9 Schematic of ARGLU1 structure.

ARGLU1 is composed of 273 amino acids. The NTD of ARGLU1 is arginine-rich and contains many RS repeats. There is a putative nuclear localization sequence (NLS) in the N-terminus. The CTD is highly negatively charged and glutamate-rich.

A.

B.

C.elegans        ------msrrSRSR------SRSpkRDreERkRredrdRdre
D.melanogaster   MGsRSRtpSpSgkrrhhkSKHkKrSkShhdherpstrtdRdkSsevnnhgRhRerdRdre
D.rerio.arglu1a  MG-RSRSRSSSRSKH---SKHsrK-RSR------SkSkS------kKRSrSkEpK
D.rerio.arglu1b  MG-RSRSRSSSRSKHsKtSKHsKK-RSR------SRSRSRD-rER-krRSKSRESK
G.gallus         MG-RSRSRSSSRSKHTKSSKHNKKnRSR------SRSRSRe-KERaRKRSKSRESK
C.lupus.fa       MG-RSRSRSSSRSKHTKSSKHNKK-RSR------SRSRSRD-KERVRKRSKSRESK
H.sapiens        MG-RSRSRSSSRSKHTKSSKHNKK-RSR------SRSRSRD-KERVRKRSKSRESK
M.mulatta        MG-RSRSRSSSRSKHTKSSKHNKK-RSR------SRSRSRD-KERVRKRSKSRESK
B.taurus         MG-RSRSRSSSRSKHTKSSKHNKK-RSR------SRSRSRD-KERVRKRSKSRESK
M.musculus       MG-RSRSRSSSRSKHTKSSKHNKK-RSR------SRSRSRD-KERVRKRSKSRESK
R.norvegicus     MG-RSRSRSSSRSKHTKSSKHNKK-RSR------SRSRSRD-KERVRKRSKSRESK

C.elegans        RkRdRkdReRkRrhrsssS------EgsqaePhqLgsifReerrRrernEspKlpppp
D.melanogaster   RdRhRsdRhteRdyrhspSi---lksRkRsSSsssdsqyseqesqrskqkrsrfKkldEq
D.rerio.arglu1a  RNRRsrSRSgSR------RdRggSPPDRtDmFGRTlSKRnn-DEKQKREEEd
D.rerio.arglu1b  RNRRRESRSRSRS-NTAtSR----RDRERAaSPPeRIDIFGRalSKRSavDEKQKkEEEE
G.gallus         RNRRRESRSRSRSntapsSRrdREReRERASSPPDRIDIFGRTVSKRSSLDEKQKREEEE
C.lupus.fa       RNRRRESRSRSRSTNTAVSR--RERDRERASSPPDRIDIFGRTVSKRSSLDEKQKREEEE
H.sapiens        RNRRRESRSRSRSTNTAVSR--RERDRERASSPPDRIDIFGRTVSKRSSLDEKQKREEEE
M.mulatta        RNRRRESRSRSRSTNTAVSR--RERDRERASSPPDRIDIFGRTVSKRSSLDEKQKREEEE
B.taurus         RNRRRESRSRSRSTNTAVSR--RERDRERASSPPDRIDIFGRTVSKRSSLDEKQKREEEE
M.musculus       RNRRRESRSRSRSTNaAaSR----ReRERASSPPDRIDIFGRTVSKRSSLDEKQKREEEE
R.norvegicus     RNRRRESRSRSRSTNaAaSR----ReRERASSPPDRIDIFGRTVSKRSSLDEKQKREEEE

C.elegans        ppppsd-----ppvdtsipfdvstlnEpTkkwlEE----kivEqvsaRvhqlEammaeka
D.melanogaster   nqmqvERlaemeRqrRakElEqKtIEEEaAkRiEmLVkKRVEEELEKRrDEIEqEVnRRV
D.rerio.arglu1a  rrvEiER----QRKIRQQEIEErLIEEETARRVEELVArRVEEELEKRrDEIEhEVLRRV
D.rerio.arglu1b  KKvEmER----QRrIRQQEIEErLIEEETARRVEELVAKRVEEELEKRKDEIEREVLRRV
G.gallus         KKAEFER----QRKIRQQEIEEKLIEEETARRVEELVAKRVEEELEKRKDEIEREVLRRV
C.lupus.fa       KKAEssk----QeKfRQQEIEEKLIEEETARRVEELVAKRVEEELEKRKDEIEREVLRRV
H.sapiens        KKAEFER----QRKIRQQEIEEKLIEEETARRVEELVAKRVEEELEKRKDEIEREVLRRV
M.mulatta        KKAEFER----QRKIRQQEIEEKLIEEETARRVEELVAKRVEEELEKRKDEIEREVLRRV
B.taurus         KKAEFER----QRKIRQQEIEEKLIEEETARRVEELVAKRVEEELEKRKDEIEREVLRRV
M.musculus       KKAEFER----QRKIRQQEIEEKLIEEETARRVEELVAKRVEEELEKRKDEIEREVLRRV
R.norvegicus     KKAEFER----QRKIRQQEIEEKLIEEETARRVEELVAKRVEEELEKRKDEIEREVLRRV

C.elegans        tsArneMEKmLraqiEaemavELAecKkRdEEsRkKckqLEaeLErkvleaeEsrkKfeE
D.melanogaster   EtAKaeMEremmlELERrReqireeerrREEdEkqKREELEeILaENNRKIeEAQrKLAE
D.rerio.arglu1a  EEAKRIMEaQLLqELERQRQAELnAQKAREEEEksKRvELERILEENNRKIAdAQAKLAE
D.rerio.arglu1b  EEAKRIMEKQLLEELERQRhAELAAQKAREEEEksKREELEkILvdNNRKIAdAQAKLAE
G.gallus         EEAKRIMEKQLLEELERQRQAELAAQKAREEEERAKREELERILEENNRKIAEAQAKLAE
C.lupus.fa       EEAKRIMEKQLLEELERQRQAELAAQKAREEEERAKREELERILEENNRKIAEAQAKLAE
H.sapiens        EEAKRIMEKQLLEELERQRQAELAAQKAREEEERAKREELERILEENNRKIAEAQAKLAE
M.mulatta        EEAKRIMEKQLLEELERQRQAELAAQKAREEEERAKREELERILEENNRKIAEAQAKLAE
B.taurus         EEAKRIMEKQLLEELERQRQAELAAQKAREEEERAKREELERILEENNRKIAEAQAKLAE
M.musculus       EEAKRIMEKQLLEELERQRQAELAAQKAREEEERAKREELERILEENNRKIAEAQAKLAE
R.norvegicus     EEAKRIMEKQLLEELERQRQAELAAQKAREEEERAKREELERILEENNRKIAEAQAKLAE

C.elegans        drLamlEqksqlerdRaeLarqksdmkKnEQqaILnKsGnSRapikFkfgk---
D.melanogaster   ErLaIiEEQRlmdEERqrmrkEqekrvKEEQKVILGK-nnSRPKLSFSLKpgal
D.rerio.arglu1a  dQLRIVEEQRKIHEERMKLEQERQkQQKEEQKmILGK-GKSRPrLSFSLKate-
D.rerio.arglu1b  dQLRIVEEQRKIHEERMKLEQERQkQQKEEQKmILGK-GKSRPKLSFSLKase-
G.gallus         EQLkIVEEQRKIHEERMKLEQERQRQQKEEQKIILGK-GKSRPKLSFSLKsQD-
C.lupus.fa       EQLRIVEEQRKIHEERMKLEQERQRQQKEEQKIILGK-GKSRPKLSFSLKTQD-
H.sapiens        EQLRIVEEQRKIHEERMKLEQERQRQQKEEQKIILGK-GKSRPKLSFSLKTQD-
M.mulatta        EQLRIVEEQRKIHEERMKLEQERQRQQKEEQKIILGK-GKSRPKLSFSLKTQD-
B.taurus         EQLRIVEEQRKIHEERMKLEQERQRQQKEEQKIILGK-GKSRPKLSFSLKTQD-
M.musculus       EQLRIVEEQRKIHEERMKLEQERQRQQKEEQKIILGK-GKSRPKLSFSLKTQD-
R.norvegicus     EQLRIVEEQRKIHEERMKLEQERQRQQKEEQKIILGK-GKSRPKLSFSLKTQD-

Figure 1-10 ARGLU1 is highly conserved between species.

(A) Phylogeny tree depicting ARGLU1 conservation between species. Branch length is proportional to the number of substitutions per site. (B) ARGLU1 multiple species sequence alignment was performed using phylogeny.fr (Dereeper et al., 2008). Highly conserved residues are highlighted in light blue, “medium” conservation in grey and low conserved residues are not highlighted.

1.3.2 ARGLU1 C-terminal domain is a GR coactivator

ARGLU1 is ubiquitously expressed, with its mRNA expression strongly paralleling that of GR (Magomedova et al., 2016). ARGLU1 is localized in the nucleus, while GR is present in the cytoplasm in the unliganded state. However, when cells are treated with Dex, GR translocates to the nucleus and colocalizes with ARGLU1 (Magomedova et al., 2016). Interestingly, ARGLU1 was observed in nuclear foci, with a punctate pattern characteristic of nuclear speckles, a storage site for splicing factors (Spector and Lamond, 2011). Nuclear speckle localization of ARGLU1 still needs to be confirmed by co-localization with SRSF1 or SRSF2 (Girard et al., 2012; Tripathi et al., 2012), as detailed in the previous section. ARGLU1 was able to coactivate GR, potentiating its transcriptional activation over 500-fold (Magomedova et al., 2016). Interestingly, only the CTD of ARGLU1 displayed potent GR coactivation ability, while the NTD did not. NR coregulators are often recruited to act in a complex, and indeed, co-expression of ARGLU1 with other coactivators (SRC1, TIF2, PGC1a) demonstrated an additive response (Magomedova et al., 2016).

1.3.3 ARGLU1 N-terminal domain interacts with splicing factors

The NTD of ARGLU1 contains RS repeats, characterizing the protein as an SR-like protein. Using standard affinity purification followed by mass spectrometry, numerous interactions between ARGLU1, spliceosomal components (U2AF2, U2AF1) and splicing factors such as hnRNPs, SRSFs, RBMs, SFs, PRPFs (pre-mRNA processing factors) and DDX/DHX (DEAD/DEAH box helicases) proteins were uncovered (Magomedova et al., 2016).

Since we determined that ARGLU1 interacts with SFs, and numerous SFs are RNA-binding proteins, RNAcompete was performed to assess the putative RNA-binding preference of ARGLU1. The RNAcompete workflow involves incubating a GST-purified protein with two large pools of RNA k-mers. Using two pools provides internal cross-validation. The resultant RNA-protein complexes are pulled down and purified, then the relative enrichment of specific RNAs in the bound fraction is compared to the starting pool (Ray et al., 2009). The consensus motif is found by aligning the ten highest-scoring k-mer sequences. The consensus RNA binding motif for ARGLU1 was found to be a CGG(A/G)GG sequence (Figure 1-11). Interestingly, when the experiment was repeated for only the NTD or CTD, we found a similar G-rich motif for the NTD; however, no consensus motif was found for the CTD, indicating that the specificity of the RNA interaction comes from the NTD.
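To make the consensus motif concrete, the Python sketch below scans an RNA sequence for exact matches to CGG(A/G)GG using a regular expression. It is an illustration only: the function name and the example sequence are hypothetical, and an exact-match scan ignores the graded k-mer Z scores from which RNAcompete actually derives the motif.

```python
import re

# CGG(A/G)GG written as a regular expression. The lookahead lets the scan
# report overlapping matches as well as adjacent ones.
ARGLU1_MOTIF = re.compile(r"(?=(CGG[AG]GG))")


def find_motif_sites(rna: str) -> list:
    """Return (0-based start, matched 6-mer) for each exact motif occurrence."""
    seq = rna.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in ARGLU1_MOTIF.finditer(seq)]


# Hypothetical test sequence, not taken from any ARGLU1 target.
sites = find_motif_sites("AACGGAGGUUCGGGGGA")
```

For the hypothetical sequence above, the scan reports one site for each of the two permitted variants of the motif (CGGAGG and CGGGGG).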

Figure 1-11 RNAcompete derived consensus motifs for GST-hARGLU1 constructs.

GST-purified full-length, NTD and CTD constructs of ARGLU1 were used in the RNAcompete workflow described above. The final consensus motif for each GST-ARGLU1 construct is displayed at the top. The correlated 7-mer consensus motifs for Set A and Set B are displayed within the Z score scatterplots. The top ten 7-mer sequences and Z scores are presented in the tables below.

1.4 Co-transcriptional RNA processing

It has become evident that transcription is coupled spatially and temporally with the subsequent steps of RNA processing. Integration of different RNA processing steps occurs co- and post-transcriptionally. For instance, 5’-end capping protects the pre-mRNA from degradation, thus coupling of transcription to capping ensures that the nascent mRNA will reach maturity. Furthermore, tethering the transcriptional and RNA processing machinery allows for increased rate and specificity of all enzymatic reactions throughout the entire process of gene expression.

Mechanistically, RNA polymerase and transcription factors are involved in the cotranscriptional RNA processing steps of 5’ capping, 3’ polyadenylation and splicing (Maniatis and Reed, 2002; Orphanides and Reinberg, 2002). An important distinction should be made between cotranscriptional and posttranscriptional RNA processing, and it is important to note that some introns can be processed both ways. This is largely dependent upon the surrounding cis-regulatory elements that were discussed in Section 1.2.3 (Vargas et al., 2011). Interestingly, while most introns are spliced co-transcriptionally, alternatively spliced introns are more frequently spliced in a post-transcriptional manner (Bhatt et al., 2012; Braunschweig et al., 2013). As stated, all steps of RNA processing are coupled, including splicing and constitutive 3’ end cleavage and polyadenylation. For instance, U2AF and SR proteins may interact with cleavage factors at the terminal exon. Furthermore, well-known splicing factors such as NOVA and hnRNP H1 can also regulate alternative polyadenylation (Braunschweig et al., 2013).

Specifically, the carboxy-terminal domain (CTD) of the large subunit of RNA polymerase II (Pol II) and transcriptional elongation factors play a crucial role in coupling all processes. The CTD is not just a platform on which transcriptional machinery assembles; it also tethers the splicing machinery and the transcriptional elongation complex. This helps splicing factors recognize exons amongst the vast amount of intronic sequence immediately as they emerge from the polymerase (Maniatis and Reed, 2002). The CTD belongs to the largest subunit of Pol II and in humans and mice is composed of 52 YS2PTS5PS heptamer repeats (Bartolomei et al., 1988; Bentley, 2014). The amino acid composition of the CTD allows for a variety of post-translational modifications to take place. Specifically, phosphorylation of the serine and tyrosine residues has a role in modulating CTD function, and influences coupling of all pre-mRNA processing stages (transcription, capping, and splicing) (Buratowski, 2009; Hirose et al., 1999; Meinhart and Cramer, 2004). For instance, Ser-5 phosphorylation is critical for proper recruitment of capping factors and Ser-2 phosphorylation is important for proper 3’ end formation through recruitment of polyadenylation factors (Ahn et al., 2004; Glover-Cutter et al., 2008; Komarnitsky et al., 2000). Studies used affinity chromatography to identify SR proteins and U2AF bound to phosphorylated CTD, which provided a link for how the CTD of Pol II can stimulate splicing (Hsin and Manley, 2012).


1.4.1 Kinetic model of coupling

There are two primary mechanisms used to describe how transcription and splicing are coupled (Bentley, 2014; Braunschweig et al., 2013). The first is the kinetic model, also known as kinetic competition, in which the speed of RNA polymerase dictates splicing outcomes. Inclusion of alternative exons can be thought of as following a “first-come first-served” model. Since Pol II transcribes the 5’ end of the gene first, the 5’ portion of the pre-mRNA is exposed to the spliceosome first. Consequently, modulation of elongation rate affects the “window of opportunity” for weaker splice sites (non-consensus sequences) to be recognized (Kornblihtt et al., 2013; Munoz et al., 2010). Fast elongation causes recruitment of the spliceosome to a strong 3’SS of an upstream 5’ intron (since it is released from the polymerase first) and this favours exon skipping (Kornblihtt et al., 2013). Conversely, if the polymerase pauses, there is more time for splicing factors to assemble on an exon that displays a weaker splice site, and that exon is included more often in the final mRNA product. However, it is important to note that slow elongation also allows negative splicing regulators to assemble on their target sequences and promote exon skipping. Kinetic competition is thought to have a major influence on alternative splicing decisions. Interestingly, SRSF2, a member of the SR protein family, was shown to modulate elongation rate, thus influencing splicing decisions (Bentley, 2014; Lin et al., 2008).

1.4.2 Recruitment model of coupling

The second mechanism of coupling is the recruitment model, in which splicing factors are recruited to transcription sites by the C-terminal domain of Pol II and the transcriptional machinery (Bentley, 2014; Hirose et al., 1999). For instance, CTD phosphorylation has been shown to assist in recruitment of U2AF65 to the PPT and in SR protein recruitment (David et al., 2011; de la Mata and Kornblihtt, 2006; Gu et al., 2013). Mediator is a large multidomain complex conserved from yeast to humans that functions to bridge the gap between transcriptional activators and RNA polymerase. Mediator activates transcription via direct interactions with RNA polymerase and general transcription factors at the promoter as well as with activators at transcriptional enhancer sites (Conaway et al., 2005; Taatjes, 2010). The MED1 subunit plays an important role in specifically activating NR-mediated transcription by binding directly to ligand-bound NRs via its LXXLL motif (Jia et al., 2014). Interestingly, ARGLU1 was found to interact with MED1 and was required for ER-mediated transcription (Zhang et al., 2011). Overall, Mediator can increase the efficiency and/or rate of assembling the pre-initiation polymerase complex by affecting different assembly and recruitment steps (Conaway and Conaway, 2011; Conaway et al., 2005). MED23 has previously been shown to interact with hnRNP-L to influence a subset of AS decisions (Huang et al., 2012). Thus, these examples demonstrate how the Mediator complex can also play a role in the recruitment model of coupling.

Chromatin remodelers may also influence the recruitment of splicing factors to the pre-mRNA. This has been observed with the histone mark H3K36me3, which is able to recruit PTBP1 via an adapter protein and cause exon skipping (Luco et al., 2010). Another instance of spatial integration of multiple components of gene expression was demonstrated by Zhou et al. (2011), who showed that proteins that control mRNA stability can regulate AS by inducing histone acetylation, which increases Pol II elongation.

1.4.3 NR coregulators: mediators of transcription and splicing coupling

A number of studies implicate NR coregulators in coupling transcription with splicing (Auboeuf et al., 2007; Auboeuf et al., 2002; McKenna and O'Malley, 2002). The PPARγ coactivator, PGC1α, has an RS domain and an RNA-binding domain and was found to colocalize in the nucleus with splicing factors that promote splicing (Auboeuf et al., 2007; Monsalve et al., 2000). COBRA1 was identified as a subunit of the negative elongation factor complex, which functions to stall RNA polymerase, and this leads to alterations in splicing patterns (via the kinetic model of coupling) of androgen receptor-regulated genes (Sun et al., 2007). NCoA62/Ski-interacting protein is a vitamin D receptor (VDR) coactivator that also interacts with splicing factors and is able to block splicing of a growth hormone mini-gene cassette exon (Zhang et al., 2003). CAPERα and CAPERβ are steroid receptor coactivators as well as members of the U2AF protein family that couple hormone-dependent transcription with the splicing of VEGF (Dowhan et al., 2005). DDX5 and DDX17 are RNA helicases, components of the spliceosome, and coactivators for the estrogen receptor (ER) and androgen receptor (AR). One study demonstrated that they influence hormone-directed splicing by ER and AR, as well as control ER and AR expression by modulating splicing upstream of the two receptors (Samaan et al., 2014). A recent review highlighting the role of VDR in alternative splicing looked at how hnRNP-C, a known splicing factor, is also part of the VDR transcriptional complex and is thus able to couple transcription with alternative splicing (Zhou et al., 2015). MED23, a subunit of the Mediator complex and a transcriptional coregulator, has recently been shown to regulate splicing by acting as a master regulator of hnRNP-L, demonstrating a new role for Mediator (Huang et al., 2012; Kornblihtt et al., 2013). An early report described GR interacting with hnRNP-U, which represses GR transcriptional activation (Eggert et al., 2001; Eggert et al., 1997), but otherwise very little is known regarding GR-regulated cotranscriptional splicing. Many of these studies used mini-genes as a model system to study constitutive or alternative splicing. An issue with mini-genes is that they frequently do not recapitulate the splicing of endogenous genes, and it is hard to assess other regulatory elements outside of a real sequence context.

1.5 Objective of thesis research

The overarching aim of this research is to elucidate how nuclear receptors and their coregulators function in cells to dynamically orchestrate diverse patterns of gene expression, from transcription to alternative splicing. ARGLU1 was recently identified as a coactivator for the glucocorticoid receptor, and further studies on its structure uncovered similarities to well-known splicing factors, the SR proteins. Immunoprecipitation of ARGLU1 followed by tandem MS identified many splicing factors, components of the spliceosome and other proteins critical in modulating both transcription and splicing as binding partners. RNAcompete uncovered a G-rich RNA-binding motif for ARGLU1. Knockdown of ARGLU1 in N2a cells, followed by mRNA analysis using RNA-seq, found widespread changes in gene expression and AS. Interestingly, Dex treatment also led to many AS changes. Recent studies have demonstrated that transcription and splicing are coupled, and mediators of coupling can include nuclear receptor coregulators (described in Section 1.1.2). Taken together, these preliminary findings led to the first goal of my thesis: to characterize the role of ARGLU1 as a novel RNA-binding protein that is able to modulate alternative splicing. The second part of this project arose from the observation that Dex, a GR ligand, also seems to play a role in modulating AS. Thus, the second goal was to study the role of a nuclear receptor ligand (Dex) in mediating AS changes.

1.5.1 Specific aims

1. To characterize the role of ARGLU1 in modulating gene expression and AS within N2a cells (Section 3.1 and 3.2).
   a. Use qPCR to validate RNA-seq for gene expression changes (basal and ligand-dependent) after ARGLU1 knockdown and Dex treatment in N2a cells.
   b. Use One-Step RT-PCR to validate AS changes after ARGLU1 knockdown and Dex treatment in N2a cells.
2. Identify the RNA targets that host binding motifs for ARGLU1 (Section 3.3).
   a. Perform RNA immunoprecipitation (RIP) to identify, in cultured cell systems, the RNA targets bound by ARGLU1.
   b. Test truncation constructs to find the RNA binding domain of ARGLU1.
3. Establish whether GR is necessary in Dex-mediated AS, or if Dex is acting in a glucocorticoid receptor-independent manner (Section 3.4).
   a. Perform RNA-seq analyses of N2a cells ± siGR ± Dex.
4. To compare and contrast the roles of ARGLU1 and GR in mediating global changes in gene expression and alternative splicing (Section 3.5).

1.5.2 Hypothesis

ARGLU1 is a novel glucocorticoid receptor coregulatory protein that binds RNA to modulate alternative splicing, and some alternatively spliced events (ASEs) can be regulated by Dex, in an ARGLU1-dependent manner.

Chapter 2 Methods

2.1 Cell culture

N2a cells were grown on 100 mm culture dishes at 37ºC and 5% CO2. Cells were maintained in high-glucose Dulbecco’s Modified Eagle Medium (DMEM 5796, Sigma) supplemented with 10% fetal bovine serum (FBS, Gibco) and 1% Penicillin/Streptomycin (P/S). Cells were passaged every other day with 1 mL trypsin using a 1/5 or 1/10 split.

2.2 Transfection assays

N2a cells were transfected using Lipofectamine 3000 (Invitrogen) to overexpress Flag-CMX, Flag-hArglu1, Flag-hArglu1 (1-96), Flag-hArglu1 (97-273) or Flag-hGRα. Once cells reached 60-70% confluency, they were placed in OptiMEM media prior to transfection. The DNA to Lipofectamine 3000 ratio was kept at 1:2, with 10 µg of plasmid transfected when using 10 cm plates for RIP studies.

N2a cells were transfected with siRNA against mArglu1 (siGENOME SMARTpool M-057082-01-0010; siGENOME Pool#1 D-057082-01; siGENOME Pool#2 D-057082-01) or Gr (siGENOME mNr3c1 siRNA SMARTpool; M-045970-01-0010), or with the non-targeting control siRNA (siGENOME Non-Targeting siRNA Pool#2; D-001206-14-05), using RNAiMax reagents. The final concentration of siRNA per well was 12 nM, present at a ratio of 1:6 with the RNAiMax reagent. After 48 hrs, cells were treated with vehicle (ethanol) or 100 nM Dex in propagation media. After 4 hrs, cells were harvested for RNA extraction and subsequent RNA-seq analyses and One-Step RT-PCR validation, or for protein analyses.

2.3 RNA analysis, cDNA synthesis and real-time quantitative PCR (qPCR)

Cells were homogenized in RNA STAT-60 to extract total RNA, then DNase treated and reverse transcribed with random hexamers using the High Capacity Reverse Transcription System (Applied Biosystems). Real-time quantitative PCR (qPCR) reactions were performed in 384-well plates containing 12.5 ng cDNA, 150 nM of each gene-specific 1:1 primer mix and 5 µL SensiFast SYBR Hi-ROX Master Mix in a 10 µL total volume. For gene expression analyses, the comparative Ct (∆∆Ct) method was used to determine relative mRNA levels normalized to 36b4 mRNA. Analyses were performed using SDS 2.3 software after running the plates on the Applied Biosystems 7900HT machine.
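The comparative Ct calculation can be sketched as follows. This is an illustration rather than the SDS 2.3 implementation: the function and argument names are hypothetical, the Ct values in the usage note are invented, and the sketch assumes perfect amplification efficiency (a doubling of product per cycle), which the text does not state.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative mRNA level of a target gene by the comparative Ct method.

    Assumption (not from the source): 100% amplification efficiency,
    so each cycle difference corresponds to a 2-fold change.
    """
    # Normalize the target gene to the reference gene (e.g. 36b4) in each condition
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated condition to the control condition
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)
```

For example, with invented Ct values of 24.0 (target) and 18.0 (reference) in knockdown cells and 22.0 and 18.0 in control cells, ∆∆Ct is 2 and the target is expressed at 0.25 of its control level.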

2.4 Protein extraction

Cells were washed with ice-cold PBS and lysed in 400 µL (or 5x pellet volume) of RIPA buffer (50 mM Tris-HCl pH 7.4, 1% NP-40, 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 0.1% SDS, 0.5% sodium deoxycholate, 0.5 mM PMSF and 1x PIs (Roche, Canada)). Cells were incubated with lysis buffer on ice for 20 minutes with occasional plate agitation. The cells were collected by scraping and centrifuged at 21,000 x g for 15 minutes at 4°C. Supernatants were transferred into new tubes and stored at -80°C until use.

2.5 BCA and WB

Protein concentrations were measured using the Bicinchoninic Acid (BCA) assay kit (Pierce). Briefly, samples were diluted 1:10 in autoclaved H2O. 10 µL of BCA standards made in H2O and 10 µL of the samples were loaded into a regular 96-well plate. 200 µL of BCA working solution was added to the wells and incubated for 30 mins. The absorbance of the samples and standards was measured at 562 nm using a Victor plate reader.

15-30 µg of protein was resolved on a 4-20% gradient SDS gel and then transferred to a nitrocellulose membrane for detection. Membranes were incubated in 5% skim milk in TBS-T blocking buffer for 1 hr, followed by overnight incubation with primary anti-Flag M2 (1:1000, F1804, Sigma), anti-Arglu1 (1:1000, NPB1-87921, Novus Biologicals), anti-GR (1:2000, SC-1004, Santa Cruz), or anti-(α/β)-tubulin (1:1000, 2148S, Cell Signaling). Secondary horseradish peroxidase (HRP)-conjugated anti-mouse IgG antibody (1:5000, 7076, Cell Signaling) or anti-rabbit IgG (1:2000, 7074, Cell Signaling) was used for 1 hr. The BluEye prestained protein ladder (Genedirex) was run with each gel. Bands were visualized with ECL-Prime and X-ray film (GE Healthcare; Piscataway, NJ). The blots were quantified using ImageJ software.
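The BCA quantification and gel-loading arithmetic described in this section can be sketched as follows. The function names and all numeric values are hypothetical. The linear least-squares fit of the standard curve is an assumption, since the text does not say how sample concentrations were interpolated from the standards.

```python
def fit_standard_curve(concentrations, absorbances):
    """Least-squares line through the BCA standards: A562 = slope * conc + intercept.

    Assumption (not from the source): the standard curve is treated as linear.
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def lysate_concentration(absorbance, slope, intercept, dilution_factor=10):
    """Concentration of the undiluted lysate, correcting for the 1:10 dilution."""
    return (absorbance - intercept) / slope * dilution_factor


def loading_volume_ul(target_ug, concentration_ug_per_ul):
    """Volume of lysate (µL) needed to load a given protein mass on the gel."""
    return target_ug / concentration_ug_per_ul


# Hypothetical standards (µg/µL) and A562 readings, for illustration only
standards = [0.0, 0.5, 1.0, 2.0]
readings = [0.10, 0.35, 0.60, 1.10]
slope, intercept = fit_standard_curve(standards, readings)
conc = lysate_concentration(0.35, slope, intercept)  # diluted sample read 0.35
volume = loading_volume_ul(20, conc)                 # volume for a 20 µg load
```

With these invented readings, the diluted sample corresponds to 0.5 µg/µL, the undiluted lysate to 5 µg/µL, and a 20 µg load requires 4 µL of lysate.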

2.6 RNA-seq analysis and validation

N2a cells were transfected with siARGLU1 for ARGLU1 RNA-seq studies or with siGR for GR RNA-seq studies. Cells were treated with 100 nM Dexamethasone for 4 hrs and mRNA was extracted as described previously. mRNA was pooled by treatment group (n=3 per group) and mRNA-enriched Illumina TruSeq V2 RNA libraries were prepared. Samples were sequenced at the Donnelly Sequencing Centre (University of Toronto) on an Illumina HiSeq2500. Transcriptome-wide gene expression and alternative splicing were assessed using a previously reported pipeline (Irimia et al., 2014) that is publicly available online at https://github.com/vastgroup/vast-tools.

For expression analysis, RNA-seq reads were split into 50 nt read groups to increase the fraction of mapping junction reads within each sample. Reads were aligned to the mouse genome (Ensembl release 67) and transcript levels were quantified as reads per kilobase (of target gene) per million (of total reads), corrected for transcript length and sequence redundancy (cRPKM) (Labbe et al., 2012). Only the first 50 nt of the forward read from paired-end sequencing were considered in obtaining cRPKMs. The raw table was filtered to include only genes that had a sum of reads > 50. edgeR analysis was applied to determine differentially expressed genes, and a score metric was generated: Score = sign of the log2FC * (-log10(p value)), where FC is the fold change. This score ranks genes from the top up-regulated to the top down-regulated. The coverage threshold per gene was set so that at least one sample had a count per million (CPM) value of 0.05. CPM+1 was used as the input for edgeR. Dispersion parameters were estimated in edgeR by considering the pooled samples as replicates, since each sample followed the same distribution. Changes in gene expression (≥ 1.5-fold in either direction) were validated by standard qPCR techniques (Methods 2.3). The primers used for qPCR validation are listed in Table 1.
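The expression metrics described above can be illustrated with a short sketch. It is not the vast-tools or edgeR implementation. The function names and example values are hypothetical, the RPKM function omits the sequence-redundancy correction that distinguishes cRPKM, and the p value is assumed to be strictly positive.

```python
import math


def rpkm(read_count, transcript_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads (uncorrected)."""
    return read_count * 1e9 / (transcript_length_nt * total_mapped_reads)


def cpm(read_count, total_mapped_reads):
    """Counts per million mapped reads."""
    return read_count * 1e6 / total_mapped_reads


def signed_score(log2_fc, p_value):
    """Score = sign(log2FC) * (-log10(p value)); requires p_value > 0."""
    sign = (log2_fc > 0) - (log2_fc < 0)
    return sign * -math.log10(p_value)


def rank_genes(results):
    """Sort (gene, log2FC, p value) tuples from top up- to top down-regulated."""
    return sorted(results, key=lambda r: signed_score(r[1], r[2]), reverse=True)


def passes_fold_change(log2_fc, min_fold=1.5):
    """True if the change is at least min_fold in either direction."""
    return abs(log2_fc) >= math.log2(min_fold)


# Hypothetical differential expression results, for illustration only
example = [("geneA", 1.2, 1e-4), ("geneB", -2.0, 1e-6), ("geneC", 0.3, 0.2)]
ranked = [gene for gene, _, _ in rank_genes(example)]
```

Because the score carries the sign of the fold change, strongly significant down-regulated genes fall at the bottom of the ranking rather than next to strongly significant up-regulated ones.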

Table 1 Gene expression qPCR primers used to validate the RNA-seq data sets.

Gene      Accession No.   Forward (5' --> 3')         Reverse (5' --> 3')
36b4      NM_007475.5     CGTCCTCGTTGGAGTGACA         CGGTGCGTCAGGGATTG
Arglu1    NM_176849.3     ACGCATCATGGAAAAGCAGTT       TTCTGCGATTTTCCGGTTATTT
Gr        NM_008173.3     CATACATGCAGGGTAGAGTCATTCTT  GCAAGTGGAAACCTGCTATGC
Parp14    NM_001039530.3  GCTATGCTGGGAAGAACGCTACT     AGTGAGAACCCGCACATAATACATATA
Grap2     NM_010815.3     GCAGCGTTATCTGCAGCATTT       CCGCAGTGTCCATCGTTTATG
Igf2r     NM_010515.2     CTGGTCAAGTCCAGGTTTTGG       TTTCGAGTAAGTAACAATGACCGTTTC
Selenbp1  NM_009150.3     ATGGCTACAAAATGCACAAAGTG     CCTGTGTTCCGGTAAATGCAG
Emilin2   NM_145158.3     CCCCAACACTGGGATCTTCA        CAACACTGGCGTTGGAGACA
Cnn1      NM_009922.4     GGGATCATTCTTTGCGAATTTATC    AGTTGACTCATTGACCTTCTTCACA
Actg2     NM_009610.2     CTGCCCGGAGACCCTCTT          GTCAATGTCACACTTCATGATGGA
Myl9      NM_172118.1     AGGCCTCAGGCTTCATCCA         CACGTAGTTCAAGTTGCCCTTCT
Per1      NM_011065.4     CCCAGCTTTACCTGCAGAAG        ATGGTCGAAAGGAAGCCTCT
Pnmt      NM_008890.1     CTGCATGGCACAAGTCTTTG        CGGAGCCAATATCAATGAG
Trim63    NM_001039048.2  TCTTCCAGGCTGCGAATCC         ACGACCTCCAGACATGGACACT
Lrp4      NM_172668.3     TCTGCGCACACGGAATAGC         GCGCTCACCGCACATGT
Abcg8     NM_026180.3     GAGCTGCCCGGGATGATA          CCCGGAAGTCATTGGAAATCT
Gpat2     NM_001081089.2  GGTCCCTGGAGCAGCACAT         GAGGTTCTGCACCCCAGTGTT
St3gal1   NM_009177.4     CCTGGCAATGTGGATCCTATG       TTCATCCTCAGCACAAAGTCATG
Mfsd2a    NM_029662.2     GGCTCTGTGGTGGTCTCAGAA       CAGGTAAGCATTTTGCGTGTCT
Tmem27    NM_020626.2     CCCAACAGAGAAGCAACAGAAAT     TCCGTGACCACAAACCAGAA
Rnd2      NM_009708.1     ACAAGCGCCGCATTGAG           CAGAGGCCGGACATTGTCA
Baiap2    NM_001037754.3  ACCCTCAGAAGTACTCAGACAAGGA   GCTCGCCCTGCTTATTGC
Kat2b     NM_001190846.1  AGAAGAAGCCGCCATTTGAG        CGTTGTCTGCCTCTCTTTCGA
Nrn1      NM_153529.2     GGACGACAAGACGAACATCAAG      CCGTGCAGCTGTGGAAATC
Scn1b     NM_011322.3     ACAACACCAGCGTCGTCAAG        TCTGACACGATGGATGCCATA
Gata2     NM_008090.5     CACCTGTTGTGCAAATTGTCAGA     CCCTTCCTTCTTCATGGTCAGT
Pax5      NM_008782.2     TTCGGGTCAGCCATGGTT          CCCGGCTTGATGCTTCCT
Ncor1     NM_001252313    CGCTGCCAGAAGGCTTGA          AGCTTCATGTTTGCTTCCAAATG
IL-6      NM_031168.1     TCGTGGAAATGAGAAAAGAGTTG     AGTGCATCATCATCGTTGTTCATACA

For AS analysis, RNA-seq reads were split into 50 nt read groups to increase the fraction of junction reads mapped within each sample. Because sequencing was paired-end, reads from both ends were also pooled; to avoid counting the same sequence multiple times, one


random count per read group was considered. Unmapped reads were initially obta