STAR/GSG DOMAIN BIND TO BIPARTITE RNA MOTIFS

by

Andre Galarneau

Department of Medicine

Division of Experimental Medicine McGill University, Montreal, Quebec, Canada August 2007

A Thesis Submitted to the Faculty of Graduate Studies and Research in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy.

© Andre Galarneau, 2007

Library and Archives Canada, Published Heritage Branch

395 Wellington Street, Ottawa ON K1A 0N4, Canada

ISBN: 978-0-494-48375-6

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.

While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

ABSTRACT

Understanding the molecular biology of cells is a central field of research that is essential to all of the biological sciences. Messenger RNAs (mRNAs) are the central link between the information contained in nuclear DNA and the proteins synthesized in the cytoplasm. These mRNAs are subject to substantial modification and control, which are carried out or supervised by RNA binding proteins. RNA binding proteins are trans-acting factors that interact with cis-acting elements present in the RNA sequence. They play major roles in mRNA processing, including alternative splicing, translation, trafficking, localization and nonsense-mediated decay. The STAR (signal transduction and activator of RNA metabolism) family members are RNA binding proteins that can play multiple roles in RNA processing. This protein family includes, among others, Quaking (QKI), Src associated in mitosis (SAM68), SAM68-like mammalian proteins 1 (SLM-1) and 2 (SLM-2), splicing factor 1 (SF1) and germ-line development (GLD-1). All these family members contain a STAR/GSG domain that embeds an hnRNP K homology (KH) domain responsible for conferring their RNA binding properties.

Because genetic data providing insight into their physiological RNA targets are lacking, the function of these RNA binding proteins has been difficult to elucidate. Only a few mRNA targets are known across the whole STAR protein family. Our hypothesis is that STAR proteins bind a defined subset of mRNAs, and that identifying these RNAs and understanding how STAR proteins act on them would provide important evidence toward understanding the function of this family of RNA binding proteins. Using systematic evolution of ligands by exponential enrichment (SELEX), we defined the QKI response element (QRE) as a hexanucleotide sequence (ACUAAY) with an additional half-site (UAAY). The identification of a refined QRE allowed us to perform a bioinformatic search that led to the identification of 1,430 putative new mRNA targets. A large majority of these mRNAs are implicated in development and cellular differentiation, consistent with the phenotype of quaking viable (qkv) mice. Moreover, using the same RNA selection technique, we identified for the first time the SLM-2 response element (SRE).

This SRE is very similar to the one identified for QKI and consists of direct U(U/A)AA repeats. We also show that SAM68 binds the SRE but not the QRE. Both response elements are bipartite consensus motifs. These findings demonstrate that the STAR proteins QKI, GLD-1, SAM68 and SLM-2 recognize RNAs bearing direct repeats as bipartite motifs.
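The two bipartite consensus motifs summarized above can be written as simple pattern searches. The following Python sketch is only an illustration. It is not the bioinformatic search performed in this thesis. Several details are hypothetical: the spacer bounds, the assumption that the QRE half-site lies downstream of the core, and all names (`bipartite`, `scan`, `QRE`, `SRE`).

```python
import re

# Consensus elements from the text (Y = C or U).
QRE_CORE = "ACUAA[CU]"  # hexanucleotide ACUAAY
QRE_HALF = "UAA[CU]"    # half-site UAAY
SRE_UNIT = "U[UA]AA"    # one copy of U(U/A)AA

# Hypothetical spacer bounds (assumptions for illustration only).
QRE_GAP = (1, 20)
SRE_GAP = (0, 20)


def bipartite(first, second, gap):
    """Compile 'first, spacer, second'; the lookahead reports overlapping hits."""
    lo, hi = gap
    # The lazy spacer pairs each first element with the nearest second element.
    return re.compile("(?=(%s[ACGU]{%d,%d}?%s))" % (first, lo, hi, second))


QRE = bipartite(QRE_CORE, QRE_HALF, QRE_GAP)  # assumes half-site lies downstream
SRE = bipartite(SRE_UNIT, SRE_UNIT, SRE_GAP)  # direct repeat of U(U/A)AA


def scan(seq, pattern):
    """Return (0-based start, motif) pairs; DNA input is read as RNA (T -> U)."""
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]
```

For example, `scan("GGACUAACGCUAAUGG", QRE)` returns `[(2, 'ACUAACGCUAAU')]`. Real target searches would also have to weigh the matrix-based scoring and mRNA annotation described later in this thesis.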

RESUME

A thorough understanding of the molecular biology of the cell is a central field of research, essential to all of the biological sciences. Messenger RNA is the central link between the information in DNA, contained in the cell nucleus, and the proteins translated in the cytoplasm. These messenger RNAs are subject to a substantial amount of modification and control by RNA binding proteins. These proteins play several major roles in messenger RNA processing, including alternative splicing, translation, transport, localization and nonsense-mediated decay. The STAR (Signal Transduction and Activator of RNA metabolism) family consists of RNA binding proteins capable of playing several roles in RNA processing. This protein family includes Quaking, SAM68, SLM-1, SLM-2, SF1, GLD-1 and several others. All members of this family contain a STAR/GSG domain, which includes a smaller domain homologous to that of hnRNP K that confers the RNA binding properties.

The exact roles of this protein family remain difficult to identify, given the lack of genetic data available on their physiological messenger RNA targets. In fact, only a few messenger RNA targets have been identified across the entire STAR protein family. The hypothesis behind this study is that STAR proteins associate with certain defined messenger RNAs. Identifying these messenger RNAs and understanding how STAR proteins act on them should therefore provide relevant information for determining the exact role this protein family plays in the cell. Using a molecular biology technique called SELEX (Systematic Evolution of Ligands by EXponential enrichment), we identified the Quaking response element as a hexanucleotide sequence (ACUAAY) with an additional sequence called a "half-site" (UAAY). Identifying this sequence allowed us to use bioinformatics to recognize 1,430 potential new messenger RNA targets. A large majority of the identified targets are involved in development and cellular differentiation, which is consistent with the phenotype of qkv mice. Moreover, using SELEX, we identified for the first time the response element of SLM-2. This response element is very similar to that of Quaking and consists of a direct repeat of the sequence U(U/A)AA. We also showed that SAM68 is able to bind this sequence but is unable to bind that of Quaking. The two response elements identified in this study are bipartite consensus motifs. Our research shows that STAR family proteins recognize and bind RNA through bipartite motifs.

PREFACE

This Ph.D. thesis was written in accordance with the Guidelines for Thesis Preparation of the Faculty of Graduate Studies and Research of McGill University. I have exercised the option of writing the thesis as a manuscript-based thesis. For this, the guidelines state: "...Candidates have the option of including, as part of the thesis, the text of one or more papers submitted, or to be submitted, for publication, or the clearly-duplicated text (not the reprints) of one or more published papers. These texts must conform to the 'Guidelines for Thesis Preparation' with respect to font size, line spacing and margin sizes and must be bound together as an integral part of the thesis. The thesis must be more than a collection of manuscripts. All components must be integrated into a cohesive unit with a logical progression from one chapter to the next. In order to ensure that the thesis has continuity, connecting texts that provide logical bridges between the different papers are mandatory. The thesis must include the following: (a) a table of contents; (b) an abstract in English and French; (c) an introduction which clearly states the rationale and objectives of the research; (d) a comprehensive review of the literature (in addition to that covered in the introduction to each paper); (e) a final conclusion and summary. In general, when co-authored papers are included in a thesis the candidate must have made a substantial contribution to all papers included in the thesis. In addition, the candidate is required to make an explicit statement in the thesis as to who contributed to such work and to what extent. This statement should appear in a single section entitled 'Contributions of Authors' as a preface to the thesis."

As chapters of this thesis, I have included the texts and figures of two original research manuscripts that have been published or are being prepared for submission. Each of these chapters (Chapters 2 and 3) contains its own summary, introduction, materials and methods, results, discussion and references sections. In addition, a preface at the beginning of each chapter introduces the paper and provides the connecting text that bridges it to the others. A general introduction and literature review are presented in Chapter 1, whereas a final discussion is included in Chapter 4. The references for Chapters 1 and 4 are included at the end of the thesis.

PAPERS INCLUDED IN THIS THESIS:

Chapter 2: Andre Galarneau & Stephane Richard (2005). Target RNA motif and target mRNAs of the Quaking STAR protein. Nature Structural and Molecular Biology 12(8): 691-698.

Chapter 3: Andre Galarneau & Stephane Richard. The STAR/GSG RNA binding proteins GLD-1, QKI, SAM68 and SLM-2 bind bipartite RNA motifs. Manuscript in preparation.

CONTRIBUTION OF AUTHORS:

The candidate performed nearly all of the research presented in this thesis and wrote all of the included manuscripts with support from Dr. Stephane Richard. The contributions of other authors to this work are described below:

Dr. Stephane Richard did all the animal manipulations in accordance with a protocol approved by the Animal Care Committee at McGill University. In chapter 2, Michelle Scott performed the bioinformatic search but the candidate performed the analysis. The different studies were all conducted under the supervision of Dr. Stephane Richard.

CONTRIBUTIONS NOT INCLUDED IN THIS THESIS:

In addition to the papers included in this thesis, the candidate contributed to the following published papers during his graduate studies:

Larocque, D., Galarneau, A., Liu, H.N., Scott, M., Almazan, G. and Richard, S. (2004) p27Kip1 mRNA protection by QUAKING RNA binding proteins promote oligodendrocyte differentiation. Nature Neuroscience 8(1): 27-33.

Galarneau, A., Primeau, M., Trudeau, L.E. and Michnick, S.W. (2002) Beta-lactamase protein fragment complementation assays as in vivo and in vitro sensors of protein-protein interactions. Nature Biotechnology 20(6): 619-622.

CLAIM TO ORIGINALITY

This thesis was written entirely by me, Andre Galarneau, under the supervision of my thesis director, Dr. Stephane Richard. Chapters 1 and 4 were also proofread by a firm specializing in interpretation and translation (Interpretation Joseph Blain Inc.). The manuscript presented in Chapter 2 was published by our laboratory and constituted, at the time of publication, original and previously unpublished results. The manuscript presented in Chapter 3 is in preparation and will be submitted shortly. All the results presented in this manuscript are original and unpublished.

More precisely, the original contributions are: (i) the determination that the optimal Quaking response element (QRE) is a bipartite motif with a derived consensus sequence; (ii) a bioinformatic analysis identifying 1433 new putative mRNA targets; (iii) the mapping of the QREs within two known mRNA targets of QKI, MBP and EGR-2; (iv) the identification of the SLM-2 response element (SRE).

ACKNOWLEDGMENTS

I want to thank my supervisor Dr. Stephane Richard for leading me to the wonderful playground of molecular biology and granting me unlimited freedom with his enthusiasm, support and trust. Stephane has provided me with the unique opportunity of discovery. My experience in Dr. Richard's laboratory will benefit me for the rest of my life and I will always be grateful.

The completion of this thesis was only possible with the support and contributions of the people I was fortunate to work with: the present and past members of Dr. Richard's laboratory, and my friends Carolanne Chenard, Dr. Daniel Larocque, Dr. Julie Pilotte, Dr. Francois-Michel Boisvert, Dr. Jocelyn Cote, Dr. Marie-Chloe Boulanger and Dr. Enrique Lukong. I am particularly grateful to Jocelyn Cote for teaching me how to work with RNA molecules, which can be overwhelming, and to Carolanne Chenard, who is not only a brilliant scientist but also a real friend. I do not want to forget Francois-Michel Boisvert, Patrick Cleroux (Cleroux des bois) and Melanie Morel, who worked by my side and shared conversations with me for 5 years. I would also like to thank the National Cancer Institute of Canada for its financial support.

When I started my PhD, 6 years ago, I asked myself these questions: Where will I be in five years? What will I be doing? Well... after a marriage, the birth of three children and a new house, I can say that my PhD was a demanding one. I can truly say that this would not have been possible were it not for the presence in my life of an amazing woman. I want to give my deepest thanks to my wife Melanie, whose strength and support were an inspiration to me. She is the only one who knows... Merci Melanie.

TABLE OF CONTENTS

ABSTRACT

RESUME

PREFACE

PAPERS INCLUDED IN THIS THESIS

CONTRIBUTION OF AUTHORS

CONTRIBUTIONS NOT INCLUDED IN THIS THESIS

CLAIM TO ORIGINALITY

ACKNOWLEDGMENTS

TABLE OF CONTENTS

LIST OF FIGURES AND TABLES

LIST OF ABBREVIATIONS

1 GENERAL INTRODUCTION AND REVIEW OF THE LITERATURE

1.1 GENERAL INTRODUCTION

1.2 ROLE OF EUKARYOTIC MESSENGER RNA BINDING PROTEINS

1.2.1 5'-END CAPPING

1.2.2 ALTERNATIVE SPLICING

1.2.3 RNA TRAFFICKING

1.2.4 RNAI AND MICRORNA

1.2.5 TRANSLATION

1.2.6 MRNA TURNOVER AND DEGRADATION

1.2.7 NONSENSE-MEDIATED DECAY (NMD)

1.3 MEMBERS AND ROLE OF THE STAR DOMAIN PROTEIN FAMILY

1.3.1 SRC ASSOCIATED IN MITOSIS 68 (SAM68)

1.3.2 GRP33

1.3.3 GLD-1

1.3.4 QUAKING

1.3.5 SAM68 LIKE MAMMALIAN 1 AND 2 (SLM1 AND SLM2)

1.3.6 HELD OUT WINGS (HOW)

1.3.7 KH ENCOMPASSING PROTEIN (KEP) 1 AND SAM

1.3.8 SPLICING FACTOR 1 (SF1)

1.4 RNA BINDING DOMAINS

1.4.1 STAR DOMAIN - MAXI-KH DOMAIN

1.4.2 HNRNP K HOMOLOGY DOMAIN - KH DOMAIN

1.4.3 RNA RECOGNITION MOTIF (RRM)

1.4.4 DSRNA BINDING DOMAIN

1.4.5 DEAD BOX

1.5 RNA BINDING MOTIF

1.5.1 A2 RECOGNITION ELEMENT (A2RE)

1.5.2 AU RICH ELEMENT (ARE)

1.5.3 G-QUARTET

1.5.4 SPLICING MOTIF

1.5.5 TRA-2-GLI ELEMENT (TGE) AND QUAKING RESPONSE ELEMENT (QRE)

2 TARGET RNA MOTIF AND TARGET MRNAS OF THE QUAKING STAR PROTEIN

2.1 PREFACE

2.2 ABSTRACT

2.3 INTRODUCTION

2.4 METHODS

2.4.1 SELEX ASSAY

2.4.2 RNA PREPARATION AND PURIFICATION

2.4.3 EMSAS

2.4.4 IMMUNOPRECIPITATIONS OF QKI-5, QKI-6 AND SAM68 WITH MRNAS

2.4.5 BIOINFORMATICS

2.4.6 β-LACTAMASE ASSAY

2.4.7 ACCESSION CODES

2.5 RESULTS

2.5.1 IDENTIFYING THE QRE

2.5.2 DEFINING THE QRE

2.5.3 MAPPING THE QRES IN KNOWN MRNA TARGETS OF QKI

2.5.4 IDENTIFYING NOVEL PUTATIVE MRNA TARGETS OF QKI

2.6 DISCUSSION

2.7 REFERENCES OF CHAPTER 2

2.8 ACKNOWLEDGMENTS

2.9 SUPPLEMENTARY DATA

3 THE STAR/GSG RNA BINDING PROTEINS GLD-1, QKI, SAM68 AND SLM-2 BIND BIPARTITE RNA MOTIFS

3.1 PREFACE

3.2 ABSTRACT

3.3 INTRODUCTION

3.4 METHODS

3.4.1 SELEX ASSAY

3.4.2 RNA PREPARATION, PURIFICATION AND ELECTROMOBILITY SHIFT ASSAYS (EMSAS)

3.4.3 DNA AND PROTEIN PREPARATION

3.5 RESULTS

3.5.1 THE IDENTIFICATION OF THE SLM-2 RNA BINDING SITE BY USING SELEX

3.5.2 A DIRECT REPEAT OF U(U/A)AA DEFINES THE SLM-2 RNA BINDING CONSENSUS SEQUENCE

3.5.3 SAM68 BINDS THE SLM-2 RESPONSE ELEMENT

3.5.4 DEFINING THE BIPARTITE NATURE OF THE QKI RESPONSE ELEMENT WITHIN THE MRNAS OF MYELIN BASIC PROTEIN

3.5.5 GLD-1 BINDS A BIPARTITE RNA MOTIF CONTAINING THE HEXANUCLEOTIDE

3.6 DISCUSSION

3.6.1 DEFINING THE SLM-2 RNA BINDING SITE U(U/A)AA REPEATS

3.6.2 QUAKING: A REGULATOR OF MYELINATION

3.7 REFERENCES OF CHAPTER 3

3.8 ACKNOWLEDGMENTS

3.9 SUPPLEMENTARY DATA

4 GENERAL DISCUSSION

4.1 SELEX AS A GENERAL TOOL TO IDENTIFY RNA TARGETS

4.2 GSG/STAR DOMAIN RNA CONSENSUS: A COMPARISON STUDY

4.3 STAR DOMAIN CONTAINING PROTEINS AND POST-TRANSCRIPTIONAL REGULATION

4.3.1 RNA TRAFFICKING

4.3.2 TRANSLATION REPRESSION AND MRNA STABILIZATION

4.3.3 ALTERNATIVE SPLICING

4.4 STAR DOMAIN CONTAINING PROTEINS AND DISEASE

4.5 FUTURE DIRECTIONS

4.6 CONCLUDING REMARK

5 REFERENCES FOR CHAPTER 1 AND CHAPTER 4

LIST OF FIGURES AND TABLES

FIGURE 1.1 EUKARYOTIC MESSENGER RNA PROCESSING

FIGURE 1.2 OVERVIEW OF EUKARYOTIC MRNA SPLICING

TABLE 1.1 SPLICEOSOMAL RNA AND PROTEINS AND THEIR FUNCTION IN SPLICING

FIGURE 1.3 GENERAL MECHANISMS OF MRNA LOCALIZATION

FIGURE 1.4 GENERAL MECHANISMS OF MRNA TRANSLATION REGULATION

FIGURE 2.1 IDENTIFICATION OF QKI-SPECIFIC RNA SEQUENCES USING SELEX

FIGURE 2.2 RNAS THAT BOUND TO QKI-5 AND THEIR CONSENSUS SEQUENCE

FIGURE 2.3 DEFINING THE QRE

FIGURE 2.4 DEFINING OPTIMAL DISTANCE AND STRUCTURE BETWEEN DIRECT REPEATS IN THE QRE

FIGURE 2.5 MAPPING QRES WITHIN TWO KNOWN MRNA TARGETS OF QKI, MBP AND EGR-2

FIGURE 2.6 IDENTIFICATION OF NEW MRNA TARGETS FOR QUAKING

TABLE 2.1 MRNA TARGETS IDENTIFIED BASED ON THE IDENTIFIED CONSENSUS

TABLE 2.2 MATRIX BASED ON THE NUCLEOTIDES IDENTIFIED BY SELEX

TABLE 2.3 MRNA TARGETS IDENTIFIED BASED ON THE MATRIX (TABLE 2.2) AND A FIXED CORE OR HALF SITE SEQUENCE

TABLE 2.4 MRNA CLASSIFICATION OF EACH HIT BASED ON ANNOTATION

TABLE 2.5 SUBSTRATES USED FOR ALL T7 RNA MEGASHORTSCRIPT

TABLE 2.6 PRIMERS USED FOR RT-PCR

FIGURE 2.7 COMPARISON OF RNA BINDING AFFINITY FOR QKI ISOFORMS

FIGURE 2.8 QKI-5, QKI-6 AND SAM68 SPECIFIC IMMUNOPRECIPITATIONS

TABLE 3.1 SELECTED SLM-2 BOUND RNA LIGANDS

FIGURE 3.1 SLM-2 RNA LIGANDS IDENTIFIED

TABLE 3.2 BINDING AFFINITY OF SLM-2 FOR SELECTED AND MUTATED LIGANDS

FIGURE 3.2 DEFINING THE SLM-2 RESPONSE ELEMENT AS A BIPARTITE RNA MOTIF

FIGURE 3.3 DEFINING THE HIGH AFFINITY QRE WITHIN MBP MRNA

FIGURE 3.4 GLD-1 BINDING TO THE TRA2/GLI ELEMENT ANALYSIS

FIGURE 3.5 SAM68/SLM-2 TETRANUCLEOTIDE VERSUS QKI/GLD-1 HEXANUCLEOTIDE SEQUENCE REQUIREMENTS

FIGURE 3.6 THE CLASS II SRE-9, -10 AND -11 DO NOT ASSOCIATE WITH SLM-2

LIST OF ABBREVIATIONS

7-methyl-Gppp or m7GpppN: 7-methyl guanosine cap
A2RE: hnRNP A2 recognition element
Ago: Argonaute
ARE: AU rich element
AUF: ARE binding protein
Brk: Breast tumor kinase
CPE: Cytoplasmic polyadenylation element
CTD: C-terminal domain
Dcr: Dicer
DEAD: Aspartic acid-Glutamic acid-Alanine-Aspartic acid
DNA: Deoxyribonucleic acid
dsRNA: Double-stranded RNA
dsRNP: Double-stranded ribonucleoprotein
eIF: Eukaryotic initiation factor
EJC: Exon junction complex
ESE: Exon splicing enhancer
ESS: Exon splicing silencer
FMR: Fragile X mental retardation
FMRP: Fragile X mental retardation protein
FXS: Fragile X syndrome
GFP: Green fluorescent protein
GLD: Germ line development
Gli: Glioma-associated oncogene
GRP33: Glycine rich protein 33 kDa
GSG domain: GRP33-Sam68-Gld1 domain
hnRNP: Heterogeneous nuclear ribonucleoprotein
HOW: Held out wings
IGF-IR: Insulin-like growth factor-I receptor
IRES: Internal ribosome entry site
ISE: Intron splicing enhancer
ISS: Intron splicing silencer
KEP: KH encompassing protein
KH domain: hnRNP K homology domain
MAG: Myelin associated glycoprotein
MEFs: Mouse embryonic fibroblasts
miRNA: MicroRNA
mRNA: Messenger RNA
ncRNA: Non-coding RNA
NLS: Nuclear localization signal
NMD: Nonsense-mediated decay
NOVA: Onconeural antigen
NPC: Nuclear pore complex
Nt: Nucleotides
ORF: Open reading frame
PABP: Poly(A) binding protein
PARN: Poly(A) ribonuclease
PAZ domain: Piwi-Argonaute-Zwille domain
POMA: Paraneoplastic opsoclonus-myoclonus ataxia
Poly(A): Polyadenosine
pre-mRNA: Precursor mRNA
PRMT1: Protein arginine methyltransferase 1
PTC: Premature termination codon
QKI: Quaking
qkv: Quaking viable mice
RISC: RNA-induced silencing complex
RNA: Ribonucleic acid
RNAi: RNA interference
RNP: Ribonucleoprotein
RRM: RNA recognition motif
RS domain: Arginine/serine-rich domain
SAM68: Src associated in mitosis 68
SBE: STAR binding element
SELEX: Systematic evolution of ligands by exponential enrichment
SF1: Splicing factor 1
SH2, SH3 domain: Src homology 2, 3 domain
siRNA: Small interfering RNA
Slm-1, -2: Sam68-like mammalian 1 and 2
snRNP: Small nuclear ribonucleoprotein
SR proteins: Serine/arginine-rich proteins
STAR: Signal transduction and activator of RNA
Sxl: Sex lethal
Tra: Transformer
U2AF: U2 snRNP auxiliary factor
uAUG: Upstream AUG (start codon)
uORF: Upstream open reading frame
UPF: Up-frameshift
UTR: Untranslated region
UV: Ultraviolet
WW domain: Tryptophan-tryptophan domain
Xrn1p: Exoribonuclease
Zap-70: Zeta chain associated protein kinase 70

I dedicate this thesis to my children Thomas, … for making everything worthwhile. AG

"Think!!! Think!!! Think!!!"

A bear with a very little brain - Winnie The Pooh

1 GENERAL INTRODUCTION AND REVIEW OF THE LITERATURE

1.1 GENERAL INTRODUCTION

All cells share common fundamental properties, developed over the course of evolution: they employ DNA as their genetic material and rely on the same basic mechanisms to survive. Many organisms, such as bacteria and yeasts, consist of single cells capable of independent self-replication. More complex organisms are composed of an assortment of cells that form tissues and function in a coordinated manner. These specialized cells perform a wide array of specific functions, including memory, movement and digestion. Understanding the molecular biology of the cell not only provides important knowledge about the specific behaviour of individual cells, but can also lead to important medical applications in the prevention, prognosis and treatment of diseases. By improving our understanding of the cellular and molecular makeup of many diseases, such as cancer and diabetes, as well as neurodegenerative diseases such as Alzheimer's disease, multiple sclerosis or Parkinson's disease, we greatly improve our chances of finding cures for these illnesses.

Understanding the molecular biology of cells is an important sphere of research that is fundamental to all of the biological sciences. Molecular biology is mainly concerned with the mechanisms responsible for the transmission and expression of the genetic information that ultimately determines the shape and function of a particular cell. Transcription, the process of copying DNA into RNA, is the first step in the highly regulated process of gene expression, since it starts the sequence of events required to produce a functional RNA molecule (Cooper 2000). Newly synthesized RNAs must be modified in various ways to be converted into their functional, usable form, known as messenger RNA (mRNA). Regulation of these processing steps provides an extra level of control of gene expression. These processing steps are mainly regulated by mRNA binding proteins, which, in their simplest definition, are proteins able to bind and regulate mRNAs. This thesis focuses on the STAR domain RNA binding proteins, which regulate some of the processing steps in the complex lives of eukaryotic mRNAs.

1.2 ROLE OF EUKARYOTIC MESSENGER RNA BINDING PROTEINS

Messenger RNAs (mRNAs) are the central link between DNA information and protein. In eukaryotes, mRNAs are first synthesized in the nucleus as pre-mRNAs and are subject to extensive modification (Moore 2005). These modifications include 5'-end capping, splicing, 3'-end cleavage and polyadenylation (Figure 1.1). Once pre-mRNA processing is complete, the mature mRNAs are exported from the nucleus to the cytoplasm through the nuclear pore by a complex mechanism. They are then transported to the sites where they serve as templates for protein synthesis by ribosomes (Moore 2005). Finally, they are destroyed by specific RNA-degrading enzymes. In addition, mRNAs are subject to a specific quality control process that verifies that the required processing has been carried out correctly. Each of these phases in the life of an mRNA molecule involves a multitude of RNA binding proteins, and each step is itself highly regulated, again in part by RNA binding proteins. This first section focuses on the principal roles attributed to RNA binding proteins in mRNA processing.

FIGURE 1.1 EUKARYOTIC MESSENGER RNA PROCESSING. mRNA is transcribed in the nucleus as a precursor mRNA, or primary transcript, and is subsequently processed to form a mature mRNA that can be translated in the cytoplasm. The important steps of mRNA processing are underlined and include 5'-end capping, 3'-end cleavage, polyadenylation and splicing.

[Figure panels: gene → transcription and 5'-end capping → primary transcript (7-methyl-Gppp cap) → 3'-end cleavage and polyadenylation (AAAAAAAA(n)) → splicing → mature mRNA ready to be transported to the cytoplasm and translated.]

1.2.1 5'-END CAPPING

Eukaryotic messenger RNAs (mRNAs) carry a modified guanosine at their 5' end, called the 5' cap. This 7-methyl guanosine (m7GpppN) cap is linked to the first nucleotide of an mRNA transcript and plays an essential role in the life cycle of mRNAs (Gu and Lima 2005). The cap is required for other mRNA processing functions to occur, in particular efficient pre-mRNA splicing, export, stability/protection and cap-dependent translation initiation (Gu and Lima 2005). Cap formation, the first step in mRNA processing, requires three enzymatic activities (Bentley 2005). The cap is normally added co-transcriptionally, while the pre-mRNA is still being synthesized (Bentley 2005). The capping enzyme and the 7-methyltransferase, the two enzymes responsible for capping the 5' end of a messenger RNA, are the first mRNA processing factors recruited by the C-terminal domain (CTD) of RNA polymerase II as it transcribes the DNA (Bentley 2005). Because many RNA binding proteins bind the 5' cap structure to regulate mRNA processing, these proteins will be described further in the following sections on the roles of RNA binding proteins.
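The three enzymatic activities mentioned above are not named in this section. The standard textbook reaction scheme is shown below for illustration; it is not drawn from the sources cited here. In mammals, the first two activities are carried by the bifunctional capping enzyme and the third by the 7-methyltransferase, which is consistent with the two enzymes described above.

```latex
\begin{align*}
\text{pppN-RNA} &\xrightarrow{\text{RNA triphosphatase}} \text{ppN-RNA} + \text{P}_i \\
\text{ppN-RNA} + \text{GTP} &\xrightarrow{\text{guanylyltransferase}} \text{GpppN-RNA} + \text{PP}_i \\
\text{GpppN-RNA} + \text{AdoMet} &\xrightarrow{\text{guanine-N7 methyltransferase}} \text{m}^7\text{GpppN-RNA} + \text{AdoHcy}
\end{align*}
```

Here pppN is the 5'-triphosphate end of the nascent transcript, AdoMet (S-adenosylmethionine) is the methyl donor, and AdoHcy (S-adenosylhomocysteine) is its demethylated product.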

1.2.2 ALTERNATIVE SPLICING

Alternative splicing is a molecular process that allows an individual gene to produce multiple protein isoforms, and it is responsible for generating a complex proteome. A single gene can thus code for two or more proteins that are similar, or for different proteins with different functions. The splice isoforms of the quaking gene illustrate this process well. As described in section 1.3.4, the quaking pre-mRNA can be alternatively spliced into several isoforms that localize to different compartments of the cell: the QKI-5 isoform is nuclear, whereas QKI-6 and QKI-7 are mainly cytoplasmic (Hardy, Loushin et al. 1996; Pilotte, Larocque et al. 2001). Moreover, Pilotte et al. showed that the alternatively spliced isoform QKI-7 has a function distinct from the other two. It can induce apoptosis, whereas the other isoforms cannot (Pilotte, Larocque et al. 2001).

Pre-mRNA splicing is central to genetic regulation in eukaryotes for many cellular and developmental processes. Minor modifications in splice site choice, caused either by mutations in cis-regulatory sequences or by changes in trans-acting factors, can have major consequences for the encoded protein and the proteome. They can alter ligand binding, enzymatic activity, allosteric regulation and/or protein localization, and thereby give the isoform a different function. Alternative splicing can also change the stability of the mRNA or, through the addition or removal of regulatory elements, alter the function of the isoform (Hu and Fu 2007). Numerous diseases, such as cystic fibrosis, result from erroneous alternative splicing events. In these diseases, mutations or variations in cis-acting elements or in trans-acting factors involved in splicing regulation lead to aberrant splicing outcomes that contribute to the disease (Garcia-Blanco, Baraniak et al. 2004).

FIGURE 1.2 OVERVIEW OF EUKARYOTIC MRNA SPLICING. (A) General overview of lariat formation during splicing. Briefly, the spliceosome recognizes specific cis sequences within the pre-mRNA and permits the formation of a lariat structure, the subsequent ligation of both exons, and the release of the intron. (B) Spliceosome assembly proceeds through a series of short-lived intermediate stages, or complexes, called the E, A, B and C complexes. These complexes can be distinguished by their different snRNP compositions (adapted from Black 2003).

[Figure panels: the intron is bounded by the 5' splice site (GU) and the 3' splice site (AG), with the branch-point A and a polypyrimidine tract, A-(Py)n-AG, upstream of the 3' splice site.]

Pre-mRNA transcripts consist of exons, which make up the mature mRNA, interrupted by non-coding introns. To generate the mature transcript, the cell must remove the introns and join the flanking exons. Intron removal and exon joining are directed by special sequences at the intron/exon junctions called splice sites (Black 2003; Matlin, Clark et al. 2005). As shown in Figure 1.2, this is carried out by the spliceosome, a macromolecular ribonucleoprotein complex that assembles directly on the pre-mRNA (Matlin, Clark et al. 2005). Briefly, the spliceosome is an assembly of five small nuclear ribonucleoprotein (snRNP) particles (U1, U2, U4, U5 and U6). These particles are associated with other proteins, such as U2AF or the Sm proteins, depending on the complex formed and the regulatory partners involved. Each snRNP contains a small, stable RNA bound by several proteins. A vast number of proteins are implicated in splicing, either by regulating it or by directly carrying it out. Many examples are listed in Table 1.1, including the Sm proteins, which, as snRNP components, form part of the core of the spliceosome machinery (Black 2003). U2AF (U2 auxiliary factor) is a dimer of 65 and 35 kDa subunits that bind the polypyrimidine tract and the 3' splice site. SF1 (splicing factor 1) recognizes and interacts with the splicing branch point. The SR proteins are required for both constitutive and regulated splicing. They have a modular structure comprising one or two N-terminal RNA recognition motifs (RRMs) and a C-terminal arginine/serine-rich (RS) domain (Singh and Valcarcel 2005). They normally bind regulatory sequences called exonic or intronic splicing enhancers or silencers (ESE, ISE, ESS and ISS, discussed later in section 1.5.4), and they also contact U2AF (Matlin, Clark et al. 2005).

TABLE 1.1 SPLICEOSOMAL RNA AND PROTEINS AND THEIR FUNCTION IN SPLICING.

Function | Human RNA or protein names
snRNAs | U1, U2, U4, U5 and U6 snRNAs
Core snRNP proteins | SmB/B', SmD1, SmD2, SmD3, SmE, SmF, SmG, LSm2, LSm3, LSm4, LSm5, LSm6, LSm7, LSm8
U1 snRNP specific proteins | U1-70K, U1-A, U1-C
U2 snRNP specific proteins | U2-A', U2-B'', SF3a60, SF3a66, SF3a120, SF3b49, SF3b130, SF3b145, SF3b155
U5 snRNP specific proteins | PRP8, U5-200K, U5-116K, U5-102K, U5-100K, U5-40K, U5-15K
U4/U6 snRNP specific and tri-snRNP proteins | hPrp3, hPrp4, RY-1, USA-CyP, 15.5K
Cap binding proteins | CBP20, CBP80
Other splicing factors | U2AF65, U2AF35, SF1, ASF/SF2, UAP56, PRP5, TAT-SF1, PTB, FBP11, PRP19, PRP31, DDX16, PRP16, PRP17, PRP18, PRP22, EWS, PRP43, p14, PRP24, DDX3

Adapted from Black 2003. For a complete list of proteins involved in alternative splicing, see Black (2003).

A well-studied system that illustrates the importance of RNA binding proteins in the regulation of alternative splicing is the Sex-lethal (Sxl) and Transformer (Tra) gene pair of D. melanogaster (Lopez 1998; Black 2003). The sex determination pathway depends on the RNA binding protein Sxl. In females, Sxl is functional, whereas in males alternative splicing leads to the inclusion of stop codons, so that no functional protein is produced. Sxl is a splicing repressor that prevents the splicing patterns that would lead to male development. One of its targets is the Tra pre-mRNA. In males, the absence of Sxl produces a truncated, inactive Tra protein, whereas in females the presence of Sxl directs a different splicing pattern that removes a stop codon from the Tra mRNA and permits translation of active Tra protein. Tra, in turn, encodes a splicing activator that regulates other sex-specific pre-mRNAs. Sxl influences Tra splicing by interacting with an element within the 3' splice site of Tra exon 2, blocking recognition of this site by U2AF (Granadino, Penalva et al. 1997). This shifts 3' splice site usage to a downstream position. This simple example shows Sxl competing with an essential splicing component, U2AF, for its binding site, and illustrates one of the many ways in which RNA binding proteins influence alternative splicing and, through it, cell fate and development.

1.2.3 RNA TRAFFICKING

The protein machinery of the RNPs participates in pre-mRNA processing but is also important for mRNA nuclear export and localization. mRNA nuclear export is mediated by RNA binding proteins that act as export adaptors or receptors and target the mRNA to the nuclear pore complex (NPC) (Suntharalingam and Wente 2003). The NPC serves as the molecular gatekeeper for the movement of proteins and protein-RNA complexes between the nucleus and the cytoplasm (Moore 2005). Typically, an export adaptor such as Npl3p is an RNA binding protein that bridges the mRNA to a receptor protein, such as Mlp, that contacts components of the NPC (Rodriguez, Dargemont et al. 2004). Nuclear export factors thus help to link transcription to nuclear export and, subsequently, to translation (Sommer and Nehrbass 2005).

Localization restricts synthesis of the encoded protein to a specific subcellular compartment (Kindler, Wang et al. 2005) and is normally coupled with translational repression. mRNA localization is increasingly recognized as playing an important role in cells such as neurons and in development (Kindler, Wang et al. 2005). As shown in Figure 1.3, mechanisms for mRNA localization include active transport along the cytoskeleton by motor proteins, passive diffusion, and local anchoring. Diffusion and local anchoring of specific mRNAs in a precise subcellular compartment have important consequences for development. In D. melanogaster, for example, nanos mRNA is localized in oocytes by general diffusion followed by trapping at the posterior pole by a localized anchor (Wang, Dickinson et al. 1994). This anchoring depends on the prior localization of the Oskar, Vasa and Tudor proteins at the posterior pole, which bind the nanos mRNA and lead to its localized translation. The resulting mRNA gradient creates a gradient of protein expression across the oocyte. Because Nanos is itself an RNA binding protein that represses the translation of other mRNAs, it in turn creates an inverse gradient of the proteins it regulates (Forrest and Gavis 2003). These gradients in protein expression are, in part, the basis of anterior-posterior axis formation in D. melanogaster: mislocalizing these mRNAs in the fly egg induces the aberrant formation of a second abdomen in place of the head and thorax (Ephrussi, Dickinson et al. 1991; Forrest and Gavis 2003).

FIGURE 1.3 GENERAL MECHANISMS OF MRNA LOCALIZATION. (A) An mRNA is actively transported along a microtubule by a motor protein and an adaptor protein. (B) Local anchoring of mRNAs allows a gradient of translated protein to be produced. (C) Passive diffusion also creates a gradient of mRNAs that ultimately creates a gradient of translated protein.

[Figure 1.3 image: (A) mRNA bound by RNA binding proteins on an actin or microtubule track; (B) local anchoring; (C) mRNA diffusion.]

The trafficking pathway for most RNP granules is determined by specific cis-acting elements in the RNA sequence and by corresponding trans-acting factors in the cell. For example, myelin basic protein (MBP) mRNA trafficking is mediated by a cis-acting element in the 3'-untranslated region (3'-UTR) called the hnRNPA2 response element (A2RE), whose trans-acting binding protein is hnRNPA2. Sequence-specific binding of hnRNPA2 to the A2RE is required for RNA trafficking and is thought to represent a general trafficking pathway (cis-acting elements are further defined in section 1.5). This interaction results in the formation of RNP granules, which travel along cytoskeletal filaments. In addition to the RNA and the trans-acting RNA binding protein, granules also contain components of the translation machinery and molecular motors that move the granules along microtubules. Translation is suppressed during transport; at their destination, the transcripts are delivered and become ready for translation (Kindler, Wang et al. 2005). One such RNA granule component is the RNA binding protein Staufen, first identified because of its role in D. melanogaster axis formation (St Johnston, Beuchle et al. 1991). Staufen can link RNA complexes to actin- and microtubule-dependent mRNA localization (Kindler, Wang et al. 2005; Miki, Takano et al. 2005). Staufen contains conserved dsRBDs that can bind RNA stem-loops containing 12 uninterrupted base pairs (Ramos, Grunert et al. 2000). Because Staufen associates with RNA and is found within RNP granules, it was proposed to act as an adapter protein for mRNA transport along microtubules in neurons. Other roles have been attributed to Staufen; notably, it could act as a nuclear shuttling protein (Macchi, Brownawell et al. 2004; Kim, Furic et al. 2005; Martel, Macchi et al. 2006).
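The Staufen binding criterion above, a stem-loop with 12 uninterrupted base pairs, can be made concrete with a short Python sketch. It assumes that the two arms of the stem are supplied as RNA strings aligned at their outer ends with no bulges. It also assumes that only Watson-Crick pairs count, so G-U wobble pairs are excluded. The function names and the cutoff constant are hypothetical illustrations, not taken from Ramos et al. (2000).

```python
# Hypothetical helpers illustrating the "12 uninterrupted base pairs" criterion.
# Assumptions: arms are RNA strings (A, C, G, U) aligned at the outer end of
# the stem with no bulges; only Watson-Crick pairs count (G-U excluded).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
MIN_STAUFEN_STEM = 12  # uninterrupted base pairs, as stated in the text


def longest_uninterrupted_stem(arm5: str, arm3: str) -> int:
    """Return the longest run of consecutive Watson-Crick pairs in a hairpin stem.

    The first nucleotide of the 5' arm pairs with the last nucleotide of the
    3' arm, so the 3' arm is read in reverse.
    """
    best = run = 0
    for a, b in zip(arm5, reversed(arm3)):
        if (a, b) in WATSON_CRICK:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a mismatch or wobble pair interrupts the helix
    return best


def meets_staufen_criterion(arm5: str, arm3: str) -> bool:
    """True if the stem contains at least 12 contiguous Watson-Crick pairs."""
    return longest_uninterrupted_stem(arm5, arm3) >= MIN_STAUFEN_STEM


if __name__ == "__main__":
    perfect = ("GGGAAACCCUUU", "AAAGGGUUUCCC")      # 12 contiguous pairs
    interrupted = ("GGGAAGCCCUUU", "AAAGGGUUUCCC")  # 6th pair is G-U
    print(longest_uninterrupted_stem(*perfect))      # 12
    print(longest_uninterrupted_stem(*interrupted))  # 6
```

A single mismatch in the middle of an otherwise complementary 12-bp stem splits it into two shorter helices, so such a stem would fail this strict criterion.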

1.2.4 RNAi AND MICRORNA

The RNA interference (RNAi) and microRNA pathways are conserved and important regulators of gene expression. They have been implicated in immunology and cancer, and have been exploited in antiviral therapy (Zheng, Tang et al. 2005; Cullen 2006; Gartel and Kandel 2006; Tong 2006). Over the past 10 years there has been an explosion in the identification and characterization of small RNAs that negatively regulate RNA stability and mRNA translation. Short interfering RNAs (siRNAs) and microRNAs (miRNAs) are non-coding RNAs (ncRNAs) of about 22 nt that are processed from longer double-stranded RNA (dsRNA) during RNA interference (Zamore and Haley 2005). These ncRNAs hybridize with their mRNA targets and confer target specificity on the silencing complexes in which they reside. The targeted mRNAs are then either degraded by RNA cleavage or translationally repressed through specific protein interactions (Zamore and Haley 2005).

Using biochemical and genetic approaches, researchers identified the components and the mechanism of RNA interference. For the most part, these components are RNA binding proteins that belong to a complex called the RNA-induced silencing complex (RISC) (Filipowicz 2005). Most of these proteins were characterized in D. melanogaster. In mammalian cells, the factors involved in RNA interference belong to the Dicer (Dcr) and Argonaute (Ago) protein families. Dcr has RNase III activity, a DEAD motif, a PAZ (Piwi, Argonaute, Zwille) domain and a dsRNA binding domain (dsRBD), and it binds and processes long dsRNA. The DEAD domain unwinds RNA, whereas the PAZ domain is a nucleic acid binding structure conserved in members of the Dicer and Ago protein families (Lingel, Simon et al. 2003; Yan, Yan et al. 2003). The Ago protein family is characterized by the presence of PAZ and PIWI domains and binds short RNAs. The PIWI domain is a conserved structured domain similar to the ribonuclease H domain and is responsible for binding to short RNA sequences (Schwarz and Zamore 2002). Other RNA binding proteins were shown to play important roles in the regulation of RNA interference. For example, the fragile X mental retardation protein (FMRP) was shown to participate in RISC assembly and to interact directly with Ago2 to regulate the translation of mRNAs associated with an ncRNA (Carthew 2002; Caudy, Myers et al. 2002). FMRP binds mRNA through two KH domains and an arginine-glycine-glycine (RGG) box (the KH domain is discussed further in section 1.4). FMRP was also shown to play a significant role in mRNA transport and synaptic plasticity (Schaeffer, Beaulande et al. 2003; Bagni and Greenough 2005).

1.2.5 TRANSLATION

Translation is the process by which proteins are synthesized using the genetic information present in an mRNA molecule. Because translation uses proteins to read and decode an mRNA molecule into a functional polypeptide, it necessarily involves RNA binding proteins. Translational control of mRNAs, an important mechanism of gene regulation, also requires RNA binding proteins. Translation can be divided into four distinct steps: initiation, elongation, termination and recycling (Kapp and Lorsch 2004). All of these phases except recycling involve RNA binding proteins.

As with alternative splicing and RNA trafficking, the different phases of translation are regulated by specific cis-acting elements in the mRNA sequence and by trans-acting factors. As shown in Figure 1.4, most cis-acting control elements are present in the untranslated regions (UTRs). At the 5' end, the methylated cap is an important feature for translation because it is recognized by the cap-binding protein complex eIF4F (Gebauer and Hentze 2004). eIF4F is a protein complex that contains eIF4E, the subunit responsible for binding to the cap structure. Secondary structures such as hairpins block translation by preventing ribosome binding or by reducing the efficiency of ribosomal scanning. Upstream AUG start codons (uAUGs) and upstream open reading frames (uORFs) downregulate translation of the main ORF by providing alternative start sites. Finally, internal ribosome entry sites (IRES) promote cap-independent translation. In addition to these mechanisms, 5'UTRs can contain binding sites for regulatory RNA binding proteins that prevent the association of ribosomes with the mRNA. A good example is the ELAV protein HuR, which specifically binds the 5'UTR of the type I insulin-like growth factor receptor (IGF-IR) mRNA and differentially represses cap-dependent and IRES-mediated translation initiation (Meng, King et al. 2005).

FIGURE 1.4 GENERAL MECHANISMS OF MRNA TRANSLATION REGULATION. (A) Mechanisms acting at the 5' untranslated region. (B) Mechanisms acting at the 3' untranslated region.

[Figure 1.4 image: (A) 5'UTR with the m7G cap, uAUG/uORF, stem-loop, internal ribosome entry site (IRES) and trans-regulating proteins; (B) 3'UTR with cytoplasmic polyadenylation elements (CPE), the AAUAAA hexanucleotide, the poly(A) tail and trans-regulating proteins.]
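The uAUG/uORF mechanism described for the 5'UTR lends itself to a simple sequence scan. The Python sketch below lists every AUG in a 5'UTR and reports whether an in-frame stop codon (UAA, UAG or UGA) closes a uORF before the UTR ends. An AUG with no such stop is reported as open, which means its reading frame would continue into the main coding region. The function name and return format are hypothetical. The scan also ignores the sequence context around each AUG, which in cells also influences whether it is recognized as a start codon.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_upstream_orfs(utr5: str) -> List[Tuple[int, Optional[int]]]:
    """Scan a 5'UTR (RNA alphabet) for upstream AUGs.

    Returns (start, end) pairs: `start` is the 0-based index of each uAUG and
    `end` is the index just past its first in-frame stop codon, or None if the
    frame runs off the end of the UTR (i.e. towards the main ORF).
    """
    hits: List[Tuple[int, Optional[int]]] = []
    for i in range(len(utr5) - 2):
        if utr5[i:i + 3] != "AUG":
            continue
        end: Optional[int] = None
        # Walk codon by codon in the reading frame defined by this uAUG.
        for j in range(i + 3, len(utr5) - 2, 3):
            if utr5[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        hits.append((i, end))
    return hits


if __name__ == "__main__":
    print(find_upstream_orfs("GGAUGCCCUAAGG"))  # [(2, 11)]: a complete uORF
    print(find_upstream_orfs("CCAUGCCC"))       # [(2, None)]: frame stays open
```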

Similarly, 3'UTRs contain numerous binding sites for regulatory RNA binding proteins. These include cytoplasmic polyadenylation elements and the hexanucleotide AAUAAA, which are required to activate poly(A) tail lengthening (Wilkie, Dickson et al. 2003). Lengthening or shortening the poly(A) tail plays an important role in translation by stimulating or repressing it, respectively (Wilkie, Dickson et al. 2003). For example, when the poly(A) tail is long enough, poly(A) binding protein (PABP) binds to it and links eIF4G, which is part of the eIF4F complex, to the methylated cap structure through eIF4E. This PABP-eIF4G interaction is proposed to circularize the mRNA via PABP-eIF4G-eIF4E-cap interactions and to increase translation efficiency (Wilkie, Dickson et al. 2003; Gebauer and Hentze 2004). When the poly(A) tail is too short to be bound by PABP, translation does not occur (Wilkie, Dickson et al. 2003). Because some RNA binding proteins have dual functions, translation and translational control are closely connected with other mechanisms of gene regulation such as mRNA stability, alternative splicing and RNA trafficking (Gebauer and Hentze 2004).

1.2.6 MRNA TURNOVER AND DEGRADATION

mRNA turnover and degradation are part of a complex process called mRNA decay. mRNA turnover is the degradation and re-synthesis of mRNA molecules (Coller and Parker 2004). mRNA decay is an important control mechanism that determines the abundance of cellular mRNA transcripts and, by doing so, the level of translation.

mRNAs can be destroyed via three general mechanisms: deadenylation-dependent exonucleolytic digestion, endonucleolytic cleavage, and quality-control pathways such as the nonsense-mediated decay (NMD) pathway (discussed in section 1.2.7). There are two well-defined pathways by which polyadenylated mRNAs are degraded, and in both cases degradation begins with shortening of the poly(A) tail at the 3' end. These two pathways are 3'-to-5' exonucleolytic decay and decapping. Poly(A) shortening is performed by the RNA binding protein poly(A) ribonuclease (PARN); this process is controlled, and prevented, by the binding of poly(A) binding protein (PABP) to the mRNA poly(A) tail. Moreover, PABP inhibits not only PARN deadenylation activity but also the activity of decapping enzymes. When PABP is absent, PARN destabilizes the mRNA by degrading the poly(A) tail and by interacting with the cap structure, which prevents eIF4E from binding the cap (Mitchell and Tollervey 2000).

The 3'-to-5' exonucleolytic activity is performed by the exosome, a complex of at least eleven 3'-to-5' exonucleases. The decapping enzymes Dcp1p and Dcp2p remove the cap structure from mRNAs and allow the action of Xrn1p, the major cytoplasmic 5'-to-3' exonuclease.

The turnover of mRNAs is a tightly controlled process. This control is exerted by cis-acting elements that either promote (destabilizing elements) or inhibit (stabilizing elements) mRNA decay. One of the best characterized cis elements is the A-U-rich element (ARE) present in the 3'UTRs of some mRNAs (the ARE is discussed in further detail in section 1.5.5).
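The 3'UTR cis-acting elements mentioned above can be located with a plain motif scan. The Python sketch below reports every occurrence, overlaps included, of the AAUAAA hexanucleotide, which the text identifies as required for poly(A) tail lengthening. It also reports occurrences of the AUUUA pentamer. Treating AUUUA as the core unit of an ARE is an assumption made here for illustration, because the ARE itself is only defined later, in section 1.5.5. The motif table and function name are hypothetical, and a real analysis would also weigh how the motifs cluster and where they sit in the UTR.

```python
from typing import Dict, List

# Hypothetical motif table; AUUUA as an ARE core unit is an illustrative assumption.
MOTIFS = {
    "poly(A) signal": "AAUAAA",
    "ARE pentamer": "AUUUA",
}


def find_motifs(utr3: str, motifs: Dict[str, str] = MOTIFS) -> Dict[str, List[int]]:
    """Return the 0-based start positions of each motif, allowing overlaps."""
    found: Dict[str, List[int]] = {}
    for name, motif in motifs.items():
        positions: List[int] = []
        start = utr3.find(motif)
        while start != -1:
            positions.append(start)
            start = utr3.find(motif, start + 1)  # +1 so overlapping hits are kept
        found[name] = positions
    return found


if __name__ == "__main__":
    utr3 = "CUAUUUAUUUAGCAAUAAAGCC"
    print(find_motifs(utr3))  # {'poly(A) signal': [13], 'ARE pentamer': [2, 6]}
```

Restarting the search one nucleotide after each hit, rather than after the whole motif, is what lets the overlapping AUUUAUUUA stretch count as two pentamers.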

1.2.7 NON-SENSE MEDIATED DECAY (NMD)

Nonsense-mediated mRNA decay (NMD) is an mRNA quality-control mechanism that degrades abnormal mRNAs arising from routine mistakes in gene expression or produced from mutated genes (Maquat 2005). By doing so, NMD prevents the production of truncated proteins, which could be detrimental to the cell. Truncated proteins result from premature termination codons (PTCs), that is, UAA, UAG or UGA codons located within an mRNA upstream of the normal site of translation termination. NMD also degrades natural substrates, such as certain alternatively spliced mRNAs or mRNAs that have a splicing-generated exon-exon junction located more than 50-55 nucleotides downstream of the translation termination codon (Maquat 2005).

The key players in the NMD pathway were initially identified in genetic screens in Saccharomyces cerevisiae (S. cerevisiae) and Caenorhabditis elegans (C. elegans). Briefly, NMD is initiated during the first round of translation when a translating ribosome encounters a PTC residing more than 50-55 nucleotides (nt) upstream of an exon-exon junction that is bound by an RNA binding complex called the exon junction complex (EJC). The EJC-associated UPF2 and UPF3/UPF3X proteins, together with UPF1, trigger NMD. Degradation can then proceed from either end of the mRNA. From the 5' end, it involves a decapping step followed by 5'-to-3' exonucleolytic degradation, whereas from the 3' end it involves a deadenylation step followed by 3'-to-5' exonucleolytic degradation (Maquat 2005).
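The position rule stated above, a PTC lying more than 50-55 nt upstream of an exon-exon junction, can be written as a small Python check. This is only a sketch under several assumptions:

- The mRNA is given in RNA letters, with a known index for its AUG start codon.
- Each exon-exon junction is given as the index of the first nucleotide of the downstream exon.
- Distance is the number of nucleotides between the stop codon and the junction.
- The lower bound of 50 nt is used as the cutoff.

A stop codon lying more than 50 nt upstream of any junction also lies more than 50 nt upstream of the most downstream junction, so only that junction is checked. The function names are hypothetical.

```python
from typing import Optional, Sequence

STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(mrna: str, start: int) -> Optional[int]:
    """Index of the first in-frame stop codon at or after the AUG at `start`."""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def predicts_nmd(mrna: str, start: int, junctions: Sequence[int],
                 threshold: int = 50) -> bool:
    """Apply the 50-55 nt rule, using the lower bound (50 nt) as the cutoff.

    `junctions` holds the index of the first nucleotide of each downstream
    exon; distance is the number of nucleotides between the stop codon and
    the most downstream junction.
    """
    stop = first_in_frame_stop(mrna, start)
    if stop is None or not junctions:
        return False  # no termination event, or no EJC-marked junction
    distance = max(junctions) - (stop + 3)
    return distance > threshold


if __name__ == "__main__":
    mrna = "AUG" + "GCU" * 10 + "UAA" + "C" * 200  # stop codon at index 33
    print(predicts_nmd(mrna, 0, junctions=[150]))  # True: junction 114 nt downstream
    print(predicts_nmd(mrna, 0, junctions=[60]))   # False: only 24 nt downstream
```

The same stop codon can therefore be "normal" or "premature" depending only on where the downstream exon-exon junction lies, which is the essence of the position rule.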

1.3 MEMBERS AND ROLE OF THE STAR DOMAIN PROTEIN FAMILY

STAR (signal transduction and activator of RNA) proteins are a family of proteins that possess RNA binding activity, through a domain highly homologous to the heterogeneous nuclear ribonucleoprotein K homology domain (KH domain), as well as protein-protein interaction domains (Vernet and Artzt 1997; Lukong and Richard 2003). Members of this family contain specific sequences, conserved throughout evolution from D. melanogaster to humans, called the maxi-KH domain or GSG domain (described in more detail in section 1.4.1).

The STAR protein family members are Sam68 (Wong, Muller et al. 1992; Richard, Yu et al. 1995; Lock, Fumagalli et al. 1996), GRP33 from Artemia salina (A. salina) (Cruz-Alvarez and Pellicer 1987), GLD-1 from C. elegans (Jones and Schedl 1995; Jan, Motzny et al. 1999), QKI (Ebersole, Chen et al. 1996; Larocque, Pilotte et al. 2002; Larocque and Richard 2005), Slm-1 and Slm-2 (Di Fruscio, Chen et al. 1999), HOW (Baehrecke 1997; Fyrberg, Becker et al. 1997; Fyrberg, Becker et al. 1998) and KEP1 (Di Fruscio, Chen et al. 1998) from D. melanogaster, and SF1 (Kramer 1992; Arning, Gruter et al. 1996). More details on each member of this family of RNA binding proteins are provided in this section.

1.3.1 SRC ASSOCIATED IN MITOSIS 68 (SAM68)

Sam68 was identified as a 62-kDa tyrosine-phosphorylated protein in the early 1990s (Ellis, Moran et al. 1990). It was first designated p62, but in 1994 the Courtneidge group noticed that p62 migrated at 68 kDa and was also a target of Src during mitosis; consequently, they named it Sam68 (Fumagalli, Totty et al. 1994). Structurally, in addition to the GSG/STAR domain (described in section 1.4.1), Sam68 contains six proline-rich domains that are putative binding sites for protein-protein interactions with SH3, SH2 and WW domains (Macias, Wiesner et al. 2002). Sam68 also has a C-terminal domain rich in tyrosine residues that are potential phosphorylation sites (Wong, Muller et al. 1992). Indeed, besides Src (Fumagalli, Totty et al. 1994), Sam68 was shown to be a substrate for tyrosine phosphorylation by Fyn and Lck (Feuillet, Semichon et al. 2002), Zap-70 (Lang, Mege et al. 1997) and Sik/Brk (Lukong, Larocque et al. 2005). Phosphorylation was shown to negatively regulate its RNA binding activity (Wang, Richard et al. 1995; Deny, Richard et al. 2000).

Sam68 is a nuclear protein, as it contains a nuclear localization signal (NLS) within its C-terminal region (Ishidate, Yoshihara et al. 1997). Sam68 and certain other STAR family members, such as Slm-1, Slm-2, QKI and GRP33, contain arginine-glycine-rich domains that are substrates for arginine methylation, a post-translational modification performed by protein arginine methyltransferases (McBride and Silver 2001). Arginine methylation was shown to influence protein-protein interactions, transcription and the intracellular localization of proteins (McBride and Silver 2001). In the case of Sam68, for example, arginine methylation not only influences its interactions with SH3 and WW domains but also, through these interactions, changes its intracellular localization (Cote, Boisvert et al. 2003). Under normal conditions, Sam68 is methylated and nuclear; when hypomethylated, it is found in the cytoplasm (Cote, Boisvert et al. 2003). Acetylation, another post-translational modification of Sam68, was shown to be increased under certain conditions, notably in tumour cell lines, and this acetylation state correlates with increased RNA binding activity (Babic, Jakymiw et al. 2004).

The multitude of interacting domains and the large number of post-translational modification states give Sam68 a multifunctional status, which makes it hard to pinpoint the actual function of this protein. Recently, by developing Sam68 knockout mice, Dr. Richard's laboratory identified a physiological role for Sam68 in bone metabolism and bone marrow mesenchymal cell differentiation (Richard, Torabi et al. 2005). Sam68-/- mouse embryonic fibroblasts (MEFs) failed to differentiate into adipocytes and showed increased osteoblast differentiation. Other functions were identified using these knockout mice and are currently under investigation. Other possible biological roles for Sam68 include: (i) regulator of the cell cycle and mitosis; (ii) adaptor protein for signal transduction; (iii) tumour suppressor; (iv) regulator of splice site selection; (v) regulator of retroviral RNA transport; and (vi) regulator of poliovirus replication (Lukong and Richard 2003). Despite all these roles, its actual function remains elusive. Given its many interaction modules and phosphorylation sites, Sam68 most likely acts as an adapter molecule, but identifying its mRNA targets could help define the function of this complex protein.

Sam68 is able to bind homopolymeric RNA, poly(U) and poly(A) (Lukong and Richard 2003). Consistent with this, a SELEX (systematic evolution of ligands by exponential enrichment) study showed that recombinant Sam68 binds A/U-rich sequences (Lin, Taylor et al. 1997). More recently, using UV cross-linking and immunoprecipitation experiments, Dr. Richard's laboratory identified 23 mRNAs bound by endogenous Sam68 in NIH3T3 cells (Tremblay and Richard 2006), all of which contained an A/U-rich motif. Using differential display, Itoh et al. identified 29 putative mRNA targets of Sam68 (Itoh, Haga et al. 2002). Still, the exact RNA binding motif of Sam68 remains obscure.

1.3.2 GRP33

The glycine-rich protein with a molecular weight of 33 kDa (GRP33) was identified from an Artemia salina (A. salina) cDNA library because it is closely related to HD40, the major protein component of the heterogeneous nuclear ribonucleoprotein particles (Cruz-Alvarez and Pellicer 1987). Since its identification two decades ago, only a few studies have addressed the biological function of this protein. GRP33 shares certain characteristics with the other STAR RNA binding proteins: it contains a STAR domain capable of dimerizing with other members of the STAR domain protein family, and it contains an arginine-glycine-rich region that can be methylated by PRMT1 (Chen, Damaj et al. 1997; Chen, Cote et al. 2001; Cote, Boisvert et al. 2003).

1.3.3 GLD-1

The Caenorhabditis elegans (C. elegans) member of the STAR domain protein family is the defective in germ line development (GLD-1) protein. GLD-1 regulates multiple aspects of germ line development in C. elegans, suggesting that it regulates a multitude of mRNA targets (Francis, Barton et al. 1995; Hansen, Schedl et al. 2006). Genetic alterations in gld-1 alleles abolish oogenesis, causing female-specific defects in meiotic progression and oocyte differentiation, whereas male germ line development is normal (Jones and Schedl 1995; Hansen, Schedl et al. 2006). GLD-1 also promotes male sex determination in the hermaphrodite germ line by binding to the 3'UTR of tra-2 mRNA and repressing its translation to allow spermatogenesis (Jan, Motzny et al. 1999). This translational repressor activity was also shown for p53 (Schumacher, Hanazawa et al. 2005), Notch (Marin and Evans 2003) and Gli-1/tra-1 (Lakiza, Frater et al. 2005). The GSG/STAR domain of GLD-1 is essential for its in vivo functions, as mutations in conserved residues alter or eliminate GLD-1 function (Jones and Schedl 1995). In 2001, Lee and Schedl used an immunoprecipitation/subtractive hybridization/cloning strategy to identify 15 mRNAs that are putative targets of GLD-1 (Lee and Schedl 2001). Three years later, Ryder and Williamson identified the consensus sequence required for the recognition of RNA molecules by GLD-1 and named it the STAR binding element (SBE; described in more detail in section 1.5.1) (Ryder, Frater et al. 2004). Using this consensus sequence, they identified, by bioinformatics, other putative germ line mRNA targets of GLD-1.

1.3.4 QUAKING

Long before the discovery of the QKI gene, and prior to the advent of molecular transgenic technologies, neurobiologists had identified naturally occurring animal mutants with nervous system defects. Such mutants include shiverer, jimpy, rumpshaker and quaking mice (Hardy 1998). These dysmyelinating mutants are all characterized, with differing intensity and severity, by tremors or shaking of the limbs or body. Quaking is an autosomal recessive mutation in mice referred to as quaking viable (qkv); qkv mice exhibit acute dysmyelination of the central and peripheral nervous systems (Sidman, Dickie et al. 1964; Hardy 1998). Advances in molecular biology allowed the identification of a deletion on chromosome 17 that includes the 5' regulatory region of the quaking gene (Ebersole, Rho et al. 1992). This deletion affects the regulation of qkI expression, as QKI protein levels are severely reduced, especially those of QKI-6 and QKI-7 after birth (Hardy, Loushin et al. 1996; Lu, Zhang et al. 2003). Because QKI is one of the major proteins studied in this thesis, this section describes its gene, protein structure and known functions in some detail.

1.3.4.1 GENE AND PROTEIN STRUCTURE OF QKI

The quaking gene produces three major alternatively spliced mRNAs of 5, 6 and 7 kb (qkI-5, -6 and -7) that encode proteins (QKI-5, -6 and -7) sharing the same KH RNA binding domain. In fact, the proteins encoded by the quaking gene are identical except for short sequences at their carboxyl termini that are unique to each isoform (Kondo, Kanae et al. 1999). These splice variants give the QKI proteins different subcellular localizations: QKI-5, but not QKI-6 or QKI-7, is mainly nuclear because its C-terminal region contains a non-canonical nuclear localization signal that is absent from QKI-6 and QKI-7 (Wu, Zhou et al. 1999). In addition, the 14 unique C-terminal amino acids of the QKI-7 isoform confer the ability to induce apoptosis on heterologous proteins such as GFP and GLD-1 (Pilotte, Larocque et al. 2001). Like most other members of the STAR/GSG domain family, QKI proteins also contain SH3-binding domains, which are known to participate in protein-protein interactions (Hardy 1998).

1.3.4.2 MOLECULAR AND CELLULAR FUNCTION OF QKI

The QKI proteins have been implicated in many molecular functions, including pre-mRNA splicing, mRNA export, mRNA stability and protein translation. At the cellular level, QKI proteins were shown to play important roles in apoptosis, the cell cycle, and glial cell fate and development.

At the molecular level, QKI-5 was shown to regulate alternative splicing. In transient co-expression assays, QKI-5 displays the properties of a negative regulator of myelin-associated glycoprotein (MAG) exon 12 pre-mRNA (Wu, Reed et al. 2002).

Because alternative splicing gives the different QKI isoforms different subcellular localizations, Dr. Richard's laboratory was able to show that the QKI isoforms regulate the nuclear export of myelin basic protein (MBP) mRNA. This suggests that the dysmyelinating phenotype of quaking viable mice is most likely due to a nuclear export defect that mislocalizes MBP mRNA and leads to the shaking phenotype (Larocque, Pilotte et al. 2002). QKI proteins were also shown to play an important role in mRNA stability. The Feng group showed that cytoplasmic destabilization of MBP mRNAs in the quaking viable brain contributes significantly to the reduced MBP levels during early myelinogenesis, whereas transcription is not affected (Li, Zhang et al. 2000).

They concluded that the loss of cytoplasmic QKIs results in the cytoplasmic destabilization of MBP mRNAs and that the interaction of QKI with MBP mRNA may determine its cytoplasmic fate. Last year, the same group showed, using transgenic mice that express QKI-6 in the oligodendroglial lineage, that QKI-6 alone is sufficient to rescue the qkv dysmyelination phenotype (Zhao, Tian et al. 2006). Our research group also showed that QKI can play a major role in mRNA stability (Larocque, Galarneau et al. 2005). We showed that the QKI-6 and QKI-7 isoforms interact directly with the mRNA encoding the CDK inhibitor p27Kip1. This interaction stabilizes p27Kip1 mRNA, promoting cell cycle arrest and differentiation of rat oligodendrocyte precursor cells in culture (Larocque, Galarneau et al. 2005). More recently, the Feng group strengthened the idea that QKI regulates mRNA stability by showing that QKI interacts with MAP1B mRNA and enhances its expression (Zhao, Ku et al. 2006). Other studies, focusing on QKI-6, showed that QKI can act as a translational regulator. Just as GLD-1 binds to an element in the tra-2 3'UTR called the TGE (tra/gli element) and represses tra-2 translation, QKI-6 was identified by the Goodwin group as a translational repressor of tra-2 mRNA (Saccomanno, Loushin et al. 1999). The same group later showed that QKI-6 can also bind GLI mRNA and repress its translation (Lakiza, Frater et al. 2005).

Although QKI has long been known to play a major role in development, because the QKI knockout is lethal, only recently have some of these roles been defined. The Artzt group was the first to show that QKI is important during development, notably in embryogenesis (Ebersole, Chen et al. 1996). The Hirschi group showed that visceral endoderm function, which is required for vascular development, is regulated in part by QKI (Bohnsack, Lai et al. 2006). Before that, the same group had suggested that QKI plays an essential role not only in vascular development but also in blood vessel maturation (Noveroske, Lai et al. 2002). Indeed, QKI null mice die between embryonic days E9.5 and E10.5 because of improper development of the vascular and nervous systems. Our group showed that, by stabilizing p27Kip1, QKI plays an important role in cell fate and oligodendrocyte development (Larocque, Galarneau et al. 2005; Chen, Tian et al. 2007).

1.3.4.3 STRUCTURAL INSIGHTS INTO QKI

Maguire et al. determined the NMR solution structure of the KH and QUA2 homology regions of the QKI protein from Xenopus laevis (pXqua) in the absence of RNA (Maguire, Guler-Gane et al. 2005). They showed that QKI adopts an extended type I KH domain fold similar to that of splicing factor 1 (SF1, discussed in detail in section 1.3.8). The solution structure reveals that the fold topology of the extended KH region of Xqua is β1-α1-α2-β2-α3-α4-β3-α5-α6, whereas that of SF1 is β1-α1-α2-β2-β3-α3-α4. The Xqua KH domain thus has two additional α-helices (α3 and α4) in a region called the thumb, which lies between strands β2 and β3 in SF1. When this region is deleted in Sam68, no RNA binding activity can be observed. Many of the characteristics of the RNA-binding surface of SF1 are conserved in pXqua (Liu, Luyten et al. 2001); in particular, most of the KH domain residues that contact the RNA are identical or conserved.

1.3.5 SAM68 LIKE MAMMALIAN 1 AND 2 (SLM1 AND SLM2)

The Richard laboratory cloned SLM-1 and SLM-2 based on their high sequence identity with Sam68 (70% in their GSG domains) (Di Fruscio, Chen et al. 1999). Both proteins contain the SH2 and SH3 domain binding sites characteristic of the STAR/GSG protein family. Both are RNA-binding proteins, and their RNA binding can be abrogated by tyrosine phosphorylation by BRK/Sik (Haegebarth, Heap et al. 2004). SLM-1 shares many similarities with Sam68: it interacts with many of the same proteins and is also tyrosine phosphorylated by Src during mitosis. SLM-1, but not SLM-2, can be phosphorylated by p59(Fyn), which regulates its splicing function (Stoss, Novoyatleva et al. 2004). SLM-2, also named T-STAR or ETOILE, was shown to regulate splice site selection (Cohen, Doran et al. 2005).

Like Sam68, its closest relatives SLM-1 and SLM-2 were shown to enhance Rev response element (RRE)-mediated gene expression and viral replication (Reddy, Suhasini et al. 2002).

1.3.6 HELD OUT WINGS (HOW)

HOW is the Drosophila homologue of Quaking (Baehrecke 1997; Zaffran, Astier et al. 1997). The HOW gene encodes two isoforms produced by alternative splicing: HOW(L) and HOW(S). As for QKI, both HOW proteins are identical in their conserved extended KH domain but differ in their C-terminal region. HOW(L) is a nuclear isoform that carries a conserved nuclear retention signal necessary for its inhibitory function. The two isoforms play opposing roles: HOW(L) represses target mRNAs by inducing their degradation, whereas HOW(S) stabilizes them (Nabel-Rosen, Volohonsky et al. 2002). In fact, Nabel-Rosen et al. showed that the balance between the two isoforms regulates tendon cell differentiation through their direct interaction with stripe mRNA (Nabel-Rosen, Dorevitch et al. 1999). Two mRNA targets of HOW are known. The first is Stripe, the Drosophila homologue of Krox20/Egr2, whose 3'UTR is bound by the HOW isoforms (Nabel-Rosen, Volohonsky et al. 2002). The second is String/Cdc25: Nabel-Rosen et al. showed that the repressor isoform HOW(L) binds directly to String/Cdc25 mRNA and promotes its degradation, thereby down-regulating String/Cdc25 mRNA levels and controlling cell-cycle progression (Nabel-Rosen, Toledano-Katchalski et al. 2005). Recently, HOW, like Quaking, was shown to associate with the splicing factor Crooked Neck to control glial cell maturation in Drosophila (Edenfeld, Volohonsky et al. 2006).

1.3.7 KH ENCOMPASSING PROTEIN (KEP) 1 AND SAM

These two proteins were cloned by the Richard laboratory from Drosophila based on their high similarity to the Sam68 KH domain. SAM is cytoplasmic, whereas KEP1 is nuclear and plays a role in apoptosis (Di Fruscio, Chen et al. 1998). Both are RNA-binding proteins and, like most other STAR/GSG family members, they can homodimerize. Di Fruscio et al. also showed that KEP1 exerts its apoptotic function in part by binding to dredd/Caspase-8 mRNA, altering the balance of dredd isoforms in the cell (Di Fruscio, Styhler et al. 2003). KEP1 was also shown to interact specifically with the alternative splicing factor ASF, an interaction that is increased in the presence of activated Src protein kinase, suggesting that KEP1 might play a role similar to that of SLM-2 in splice site selection (Robard, Daviau et al. 2006).

1.3.8 SPLICING FACTOR 1 (SF1)

As previously mentioned, SF1 is an important component of the spliceosome assembly pathway. Notably, one of the initial steps of spliceosome assembly is the cooperative binding of SF1 and U2 auxiliary factor (U2AF) to intron sequences upstream of the 3' splice site (Black 2003). Specifically, SF1 recognizes the intron branch point sequence (BPS) UACUAAC in pre-mRNA transcripts (Berglund, Chua et al. 1997; Liu, Luyten et al. 2001; Bechara, Davidovic et al. 2006). Once bound to the BPS, SF1 facilitates the binding of U2AF to the adjacent polypyrimidine tract (Berglund, Abovich et al. 1998). The structural basis for recognition of the BPS by SF1 has been determined and is likely to apply to most members of the STAR/GSG protein family. Using nuclear magnetic resonance (NMR), the Sattler group showed that the 3' part of the BPS, UAAC, is specifically recognized in a hydrophobic cleft formed by the Gly-Pro-Arg-Gly motif and the variable loop of the KH domain (Liu, Luyten et al. 2001). Apart from the BPS, no other RNA targets have been identified for SF1.

1.4 RNA BINDING DOMAINS

Numerous RNA-binding proteins act as trans-acting factors at each of the levels of gene expression regulation described previously. Many of them contain one or more copies of a domain of 60 to 100 residues, defined by a conserved sequence pattern, known as an RNA-binding domain. These RNA-binding domains are classified into groups according to their structure and their mode of RNA binding. Some of these groups are described in the following sections.

1.4.1 STAR DOMAIN - MAXI-KH DOMAIN

STAR proteins have a single KH domain embedded within a larger domain of about 200 amino acids. This RNA-binding domain is also called the GSG domain, after the three proteins in which it was first identified (GRP33, Sam68 and GLD-1), or the maxi-KH domain. The KH domain is flanked by an 80-amino-acid N-terminal region called NK and a 30-amino-acid C-terminal region called CK. In QKI, GLD-1 and SF1, the NK and CK regions are sometimes called the QUA1 and QUA2 regions, respectively. These regions were shown to be important for the RNA binding specificity of the KH domain (Liu, Luyten et al. 2001).

1.4.2 hnRNP K homology domain - KH domain

The hnRNP K homology (KH) domain binds single-stranded RNA and contains α and β secondary structure elements (Liu, Luyten et al. 2001). The KH domain was first identified in human heterogeneous nuclear ribonucleoprotein (hnRNP) K (Siomi, Matunis et al. 1993). It is an evolutionarily conserved sequence of around 70 amino acids present in a wide variety of otherwise diverse RNA-binding proteins. KH motifs are found in one or multiple copies, and each motif is necessary for in vitro RNA binding activity, suggesting that multiple KH motifs may function cooperatively or, in the case of proteins with a single KH motif such as QKI, through protein dimerization. The solution structure of a KH domain of hnRNP K, determined by nuclear magnetic resonance (NMR), revealed a βααββα fold similar to the KH-QUA2 structure of SF1 (Baber, Libutti et al. 1999).

1.4.3 RNA RECOGNITION MOTIF (RRM)

The RNA recognition motif (RRM) is a domain that is frequently involved in sequence-specific single-stranded RNA binding. It consists of a βαββαβ fold in which the β-strands form a sheet whose surface displays two highly conserved RNP motifs and contacts the RNA (Auweter, Oberstrass et al. 2006). Many eukaryotic proteins known to bind single-stranded RNA contain one or more copies of this motif of about 90 amino acids. RRMs are found in a variety of RNA-binding proteins, including heterogeneous nuclear ribonucleoproteins (hnRNPs), proteins implicated in the regulation of alternative splicing, and protein components of small nuclear ribonucleoproteins (snRNPs) (Auweter, Oberstrass et al. 2006).

1.4.4 DsRNA BINDING DOMAIN

The DsRBD domain is found in a variety of RNA-binding proteins that have different structures and exhibit a diversity of functions (Doyle and Jantsch 2002). In contrast to the other RNA-binding domains described here, the DsRBD, approximately 65 amino acids long, is found in proteins that specifically recognize double-stranded RNA. DsRBD proteins are mainly involved in post-transcriptional gene regulation. One of the best characterized double-stranded RNA binding proteins (DsRBPs) is Dicer, which is involved in RNA interference (Filipowicz 2005). Another is Staufen, which is involved in RNA localization (Ramos, Grunert et al. 2000). Other functions include RNA editing (Wang, Khillan et al. 2000) and translational repression (Zhong, Peters et al. 1999).

1.4.5 DEAD BOX

The DEAD box (Asp-Glu-Ala-Asp) is an RNA-binding motif that defines a specific family of proteins, which includes the DEAD and DEAH box helicases. Helicases unwind nucleic acids (Fuller-Pace 2006). Because DEAD box helicases remodel complex RNA structures, they are involved in many aspects of RNA metabolism, including transcription, pre-mRNA splicing, ribosome biogenesis, nucleocytoplasmic transport and translation (Fuller-Pace 2006).

1.5 RNA BINDING MOTIF

Numerous RNA-binding proteins act as trans-acting factors at the various levels at which gene expression can be regulated, and the principal, best-documented protein domains implicated in this role were described in section 1.4. These domains interact with RNA motifs, or elements, that act as cis-acting regulatory elements within the RNA. The interaction between cis-acting regulatory elements in RNA and trans-acting RNA-binding proteins can be envisaged to regulate the RNA by facilitating or hindering interactions with other trans-acting factors, by altering RNA structure per se, by bringing together distant RNA sequences, or by providing localization or targeting signals for the transport of RNA molecules to specific intracellular locations. In this section, I briefly describe some previously identified RNA-binding motifs.

1.5.1 A2 RECOGNITION ELEMENT (A2RE)

A2RE is one of the best characterized cis-acting trafficking elements. It is present in the 3'UTR of MBP mRNA and consists of the 11-nucleotide sequence GCCAAGGAGCC (Hoek, Kidd et al. 1998). Its trans-acting partner is hnRNP A2 (Hoek, Kidd et al. 1998). As with other cis/trans-acting partners, both the specific nucleotide sequence of the A2RE and the precise amino acid sequence and motif structure of hnRNP A2 are essential for proper trafficking, since a mutation in either the nucleotide sequence or the binding motif abolishes binding and prevents trafficking (Shan, Moran-Jones et al. 2000).

The A2 pathway presumably functions in all cells that express hnRNP A2 and A2RE-containing RNAs, but it has been best characterized in oligodendrocytes (Carson, Cui et al. 2001) and neurons, where the extended, ramified cell morphology in culture facilitates microscopic visualization of RNA trafficking in live cells. Briefly, the A2RE-hnRNP A2 trafficking pathway begins in the nucleus, where binding first occurs; the complex is then exported to the cytoplasm, where it is assembled into RNA trafficking granules and transported to the site where the RNA will be translated (Carson and Barbarese 2005).

1.5.2 AU RICH ELEMENT (ARE)

AU-rich elements (AREs) are another family of RNA motifs involved in post-transcriptional regulatory processes. These elements are composed of a variable number of copies of the "AUUUA" pentamer or the "UUAUUUAUU" nonamer (Zhang, Kruys et al. 2002). With the use of bioinformatics, the list of mRNAs bearing such elements has been greatly extended. Based on the number and distribution of the AUUUA motifs they contain, AREs were classified into three categories. Class I AREs contain one to three AUUUA sequences in the 3'UTR, coupled with a nearby U-rich region. Class II AREs have overlapping copies of UUAUUUAUU in a U-rich environment, and class III AREs do not contain any AUUUA but have U-rich stretches (Zhang, Kruys et al. 2002). The first RNA-binding protein identified for this motif was AUF1 (AU-rich element RNA-binding protein 1). This protein was shown to mediate, through its RRM domains, the destabilization of some ARE-containing mRNAs such as GM-CSF mRNA (Zhang, Wagner et al. 1993), and to promote their degradation by recruiting the exosome (Chen, Gherzi et al. 2001). Other functions, such as mRNA stabilization and translational repression, were identified for ARE-binding proteins such as the Hu family of proteins and TIAR, respectively (Matter 1989; Gueydan, Droogmans et al. 1999). The Hu proteins contain a conserved arrangement of three RRMs; they were shown to bind AREs and to act as ARE mRNA stabilizing factors, probably by competing with other ARE-binding factors (Myer, Fan et al. 1997). Other ARE-binding proteins have been identified, but their functional roles have not been defined (Zhang, Kruys et al. 2002).
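The three-class scheme above can be expressed as a small classifier. The Python sketch below is illustrative only: the U-richness test (any 20-nt window containing at least 70% uridine) and all function names are hypothetical choices, not criteria from Zhang, Kruys et al. (2002), and the proximity between the pentamers and the U-rich region is not checked.

```python
from typing import Optional

PENTAMER = "AUUUA"
NONAMER = "UUAUUUAUU"


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def is_u_rich(seq: str, window: int = 20, min_u: float = 0.7) -> bool:
    """Hypothetical U-richness test: True if any window is at least min_u uridine."""
    for i in range(max(1, len(seq) - window + 1)):
        chunk = seq[i:i + window]
        if chunk and chunk.count("U") / len(chunk) >= min_u:
            return True
    return False


def classify_are(utr: str) -> Optional[str]:
    """Assign a simplified ARE class ("I", "II" or "III") to a 3'UTR, or None."""
    utr = utr.upper().replace("T", "U")  # accept DNA or RNA input
    n_pent = count_overlapping(utr, PENTAMER)
    n_non = count_overlapping(utr, NONAMER)
    if not is_u_rich(utr):
        return None
    if n_non >= 2:
        return "II"   # overlapping nonamer copies in a U-rich context
    if 1 <= n_pent <= 3:
        return "I"    # one to three pentamers plus a U-rich region
    if n_pent == 0:
        return "III"  # U-rich stretch without any AUUUA
    return None


print(classify_are("AUUUA" + "U" * 20))  # → I
```

Class II is tested first because a class II element also contains AUUUA pentamers; sequences that fit none of the simplified rules return None rather than being forced into a class.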

1.5.3 G-QUARTET

The G-quartet was first identified, by in vitro RNA selection, as the RNA motif recognized by the RGG motif of FMRP (Darnell, Jensen et al. 2001; Schaeffer, Bardoni et al. 2001). FMRP is encoded by the FMR1 gene, and mutations within this gene lead to fragile X syndrome, one of the most frequent causes of inherited mental retardation (Schaeffer, Bardoni et al. 2001). FMRP contains three RNA-binding motifs: two KH domains and an RGG motif. One of the KH domains binds another RNA motif, called the kissing complex, which was also identified through in vitro RNA selection, although its occurrence in natural mRNAs remains elusive (Darnell, Fraser et al. 2005). The RGG motif of FMRP showed a preference for G-rich sequences that form G-quartets. These are nucleic acid structures in which four guanine residues are arranged in a planar conformation stabilized by Hoogsteen-type hydrogen bonds (Ramos, Hollingworth et al. 2003). Several such planar structures can stack and be further stabilized by potassium or sodium cations (Ramos, Hollingworth et al. 2003). This structural RNA motif is present in more than 80 potential mRNA partners, identified in neurons, that interact with FMRP in a potassium-dependent manner (Miyashiro, Beckel-Mitchener et al. 2003). Furthermore, FMRP was shown to regulate its own mRNA by binding to a G-quartet present in its 3'UTR (Schaeffer, Bardoni et al. 2001). A list of potential mRNA targets is now available, but these must be validated by other functional approaches, as was done for FMR1 mRNA and MAP1B mRNA (Bardoni and Mandel 2002).
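Candidate G-quartet regions are often flagged computationally before any binding assay. The sketch below uses a widely applied sequence heuristic for G-quadruplex formation (four runs of three or more guanines separated by loops of one to seven nucleotides). This rule and the function name are assumptions introduced for illustration; they were not used in the studies cited above, and a sequence match does not show that the structure forms or that FMRP binds it.

```python
import re

# Hypothetical heuristic, not taken from this thesis: four runs of >= 3 G
# separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGU]{1,7}G{3,}){3}")


def putative_g_quartets(rna: str) -> list:
    """Return (start, end) spans of non-overlapping candidate G-quartet regions."""
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA input
    return [m.span() for m in G4_PATTERN.finditer(rna)]


print(putative_g_quartets("AAGGGAGGGAGGGAGGGAA"))  # → [(2, 17)]
```

Stabilization by potassium, which distinguishes genuine FMRP G-quartet partners in the work cited above, cannot be inferred from sequence alone.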

1.5.4 SPLICING MOTIF

As explained previously, alternative splicing allows individual genes to produce multiple protein isoforms. Splicing, and alternative splicing in particular, is governed by the interplay between cis-acting elements on the pre-mRNA and the trans-acting splicing machinery and its regulators. The cis-acting motifs present in pre-mRNAs include the branch point, the polypyrimidine tract, and the 5' and 3' splice sites. The branch point is a seven-nucleotide sequence near the 3' splice site of the pre-mRNA. It is bound by SF1, which associates with U2AF to facilitate the recruitment of U2 snRNP to the 3' splice site region. The mammalian branch point consensus sequence, YNCURAY (where Y is a pyrimidine, R a purine and N any nucleotide), shows considerable variation (Peled-Zehavi, Berglund et al. 2001). In yeast, on the other hand, the branch point sequence is well defined as UACUAAC (Berglund, Chua et al. 1997). The polypyrimidine tract is located just 3' of the branch point sequence and serves as the binding site for the 65-kDa subunit of U2AF (U2AF65), which, by associating with SF1, confers specificity on the splicing event (Berglund, Abovich et al. 1998). This pyrimidine stretch is normally followed by an AG dinucleotide immediately upstream of the 3' splice site. This AG dinucleotide is bound by the other subunit of U2AF (U2AF35).
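Degenerate consensus sequences such as YNCURAY can be searched by translating IUPAC ambiguity codes into a regular expression. The following Python sketch uses hypothetical helper names and is not part of any cited work; it also shows that the yeast branch point sequence UACUAAC is one instance of the mammalian consensus.

```python
import re

# IUPAC codes needed for the consensus sequences quoted above
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "Y": "[CU]", "R": "[AG]", "N": "[ACGU]"}


def consensus_to_regex(consensus: str):
    """Translate an IUPAC RNA consensus into a compiled regex wrapped in a
    lookahead, so that overlapping sites are all reported."""
    body = "".join(IUPAC[base] for base in consensus)
    return re.compile(f"(?=({body}))")


MAMMALIAN_BPS = consensus_to_regex("YNCURAY")
YEAST_BPS = consensus_to_regex("UACUAAC")


def find_sites(pattern, rna: str) -> list:
    """Return 0-based start positions of every match of pattern in rna."""
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA input
    return [m.start() for m in pattern.finditer(rna)]


print(find_sites(YEAST_BPS, "GGUACUAACGG"))      # → [2]
print(find_sites(MAMMALIAN_BPS, "GGUACUAACGG"))  # → [2]
print(find_sites(MAMMALIAN_BPS, "CGCUGAC"))      # → [0]
```

The last example matches the degenerate mammalian consensus but not the yeast sequence, which illustrates why mammalian branch points are much harder to predict from sequence alone.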

Other splicing elements include exonic splicing enhancers (ESE) and intronic splicing enhancers (ISE), on which protein complexes assemble to promote the use of a weak or regulated splice site. Conversely, exonic splicing silencers (ESS) and intronic splicing silencers (ISS) are elements on which protein complexes assemble to repress the use of a splice site (Matlin, Clark et al. 2005). All of these elements contribute to splicing decisions, which are determined by a cellular code made up of combinations of cis- and trans-acting regulators specific to each cell type, tissue or developmental stage (Matlin, Clark et al. 2005).

1.5.5 TRA-2 - GLI ELEMENT (TGE) AND QUAKING RESPONSE ELEMENT (QRE)

GLD-1 is well known to bind the 3' UTR of tra-2 mRNA and recruit a complex that silences its translation (Goodwin, Okkema et al. 1993). The region of the mRNA bound by GLD-1 was identified by the Goodwin group and termed the TGE, for tra-2/GLI element (Jan, Yoon et al. 1997; Jan, Motzny et al. 1999). It is composed of two 28-nucleotide repeats separated by a conserved 4-nucleotide spacer (Jan, Yoon et al. 1997). The sequence identified was 5'-UAUUUAAUUUCUUAUCUACUCAUAUCUA-3', where the spacer is underlined.

The affinity and sequence specificity of the interaction between GLD-1 and the TGE were defined several years later by Ryder and colleagues using quantitative gel mobility shift and fluorescence polarization assays. They showed that the 4-nucleotide spacer is in fact part of a larger sequence necessary for GLD-1 binding: GLD-1 binds a minimal UACUCA hexanucleotide, and this sequence is required for high-affinity GLD-1 binding (Ryder, Frater et al. 2004). They also showed that this consensus sequence is present in every known GLD-1 target, such as mes-4, pie-1 and tra-1 mRNAs.

Building on their GLD-1 specificity study, Ryder and Williamson also used gel shift and fluorescence polarization assays to show that QKI binds TGE RNA reasonably well. They therefore compared the nucleotide sequence specificity of QKI with that of GLD-1 and found the QKI consensus to be 5'-NA(A>C)U(A>>C)A-3', where N is any of the four nucleotides and the six positions are numbered 1 to 6 from the 5' end (Ryder and Williamson 2004). There are three major differences between the GLD-1 and QKI consensus sequences. First, the preference of GLD-1 for a uridine at position 1 is absent in QKI, for which any nucleotide can be substituted. Second, QKI shows a minor preference for an adenosine over a cytosine at position 3. Finally, position 5 must be an adenosine for high-affinity QKI binding, whereas GLD-1 tolerates either an adenosine or a cytosine (Ryder and Williamson 2004).
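The differences between the GLD-1 and QKI consensus sequences can be made concrete by encoding each as a pattern and testing candidate hexamers. In this Python sketch, which is an illustration rather than the method of Ryder and colleagues, the preference orderings (">" and ">>") are collapsed into sets of allowed bases, so only the hard requirement at position 5 distinguishes the two patterns; capturing the weaker preferences would require a position weight matrix. The function name is hypothetical.

```python
import re

# Permissive patterns built from the consensus sequences quoted in the text.
# Preference orderings cannot be expressed in a regex, so every tolerated
# base is simply allowed (a deliberate simplification).
GLD1_SBE = re.compile(r"[ACGU]A[AC]U[ACU]A")       # (U>G>C/A) A (C>A) U (C/A>U) A
QKI_HIGH_AFFINITY = re.compile(r"[ACGU]A[AC]UAA")  # N A (A>C) U A A: position 5 must be A


def compare_hexamer(hexamer: str) -> dict:
    """Report whether a hexamer fits the GLD-1 SBE and/or the QKI high-affinity site."""
    h = hexamer.upper().replace("T", "U")
    return {
        "GLD-1": GLD1_SBE.fullmatch(h) is not None,
        "QKI": QKI_HIGH_AFFINITY.fullmatch(h) is not None,
    }


print(compare_hexamer("UACUCA"))  # → {'GLD-1': True, 'QKI': False}
print(compare_hexamer("UACUAA"))  # → {'GLD-1': True, 'QKI': True}
```

The minimal GLD-1 site from the TGE, UACUCA, fails the QKI pattern only because of the cytosine at position 5, which is the third difference described above.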

2 TARGET RNA MOTIF AND TARGET MRNAS OF THE QUAKING STAR PROTEIN

2.1 PREFACE

Prior to this project, only a few mRNAs had been identified as being bound by Quaking or by its homologs WHO/HOW and GLD-1 in D. melanogaster and C. elegans, respectively. Although numerous studies have proposed possible functions for Quaking, an important approach to establishing a genuine role for this family of proteins is to identify the mRNAs they bind.

The hypothesis underlying the following study is that Quaking binds a specific subset of mRNAs, and that identifying these RNAs and understanding how Quaking acts on them would provide important insight into the function of the Quaking proteins. This chapter describes the identification and refinement of the RNA motif bound by Quaking and a bioinformatic search for putative mRNA targets of Quaking.

2.2 ABSTRACT

Quaking viable (Qkv) mice have developmental defects that result in their characteristic tremor. The quaking (Qk) locus expresses alternatively spliced RNA-binding proteins belonging to the STAR family. To characterize the RNA binding specificity of the QKI proteins, we selected for RNA species that bound QKI from random pools of RNAs and defined the QKI response element (QRE) as a bipartite consensus sequence, NACUAAY-N1-20-UAAY. A bioinformatic analysis using the QRE identified the three known RNA targets of QKI and 1,430 new putative mRNA targets, of which 23 were validated in vivo. A large proportion of these mRNAs are implicated in development and cell differentiation, as predicted from the phenotype of the Qkv mice. In addition, 24% are implicated in cell growth and/or maintenance, suggesting a role for QKI in cancer.

2.3 INTRODUCTION

Qkv mice carry a spontaneous recessive mutation that affects maturation of the myelinating cells of the central nervous system, the oligodendrocytes, resulting in uncompacted myelin within their central and peripheral nervous systems (Hogan & Greenfield, 1984). These mutant mice show rapid tremors, or 'quaking', 10 days after birth and experience convulsive tonic-clonic seizures as adults. The Qkv mutation consists of a 1-megabase deletion that includes the promoter and enhancer regions of the Qk gene, which encodes a family of alternatively spliced RNA-binding proteins (Ebersole et al., 1996).

Oligodendrocytes of normal mice express three major Qk mRNAs of 5, 6 and 7 kb encoding QKI-5, QKI-6 and QKI-7, respectively. The promoter deletion observed in Qkv mice prevents the expression of the alternatively spliced QKI-6 and QKI-7 isoforms in oligodendrocytes, as determined by immunocytochemical analysis (Hardy et al., 1996). These observations suggest that the balance among the different QKI isoforms may control oligodendrocyte cell fate and myelination (Larocque et al., 2002). Moreover, recent studies have shown that QKI-6 and QKI-7 regulate cell cycle progression and promote oligodendrocyte cell fate and differentiation (Larocque et al., 2005).

The QKI proteins contain a heterogeneous nuclear ribonucleoprotein K homology (KH) domain and belong to the family of KH-type RNA-binding proteins that includes Nova-1, Sam68 and FMRP (Darnell, 2004; Lukong & Richard, 2003). The KH domain of QKI is embedded in a larger domain called the GRP33-Sam68-GLD-1 (GSG) or signal transduction and activation of RNA (STAR) domain (Lukong & Richard, 2003). These KH-type RNA-binding proteins are often referred to as STAR proteins because of their links to signal transduction pathways (Vernet & Artzt, 1997). All QKI isoforms contain identical GSG/STAR domains and should therefore have identical RNA binding specificities. QKI-5 is the major nuclear isoform expressed during embryogenesis; its expression declines after birth (Ebersole et al., 1996). The QKI-6 and QKI-7 isoforms are expressed during late embryogenesis, and their expression peaks during myelination. This temporal expression pattern and the distinct subcellular localization of the QKI isoforms suggest that access to RNA targets is regulated by timing and compartmentalization, as well as by signals acting on the QKI isoforms.

The QKI proteins regulate pre-mRNA splicing (Wu et al., 2002), mRNA export (Larocque et al., 2002), mRNA stability (Larocque et al., 2005; Li et al., 2000) and protein translation (Saccomanno et al., 1999), as well as cellular processes including apoptosis (Pilotte et al., 2001), the cell cycle, glial cell fate (Larocque et al., 2005) and development (Li et al., 2003). However, only a few mRNA targets have been identified. A general STAR-binding element (SBE), (U>G>C/A)A(C>A)U(C/A>U)A, has been identified for Caenorhabditis elegans GLD-1 (Ryder et al., 2004), a key regulator of germline development (Francis et al., 1995). The high degree of similarity between the GSG/STAR domains of GLD-1 and QKI suggests that the SBE may be conserved; indeed, a variation of the SBE, NA(A>C)U(A>>C)A, has been identified for QKI (Ryder et al., 2006).

We wished to define the QKI binding site to help us identify putative mRNA targets of QKI through bioinformatic analysis. To define the binding site, we selected, from a pool of RNA aptamers with random sequences, those that preferentially bound QKI. The resulting bipartite consensus was long enough for a bioinformatic analysis that led to the identification of 1,430 new putative targets.

Of these, 23 mRNAs were validated as novel QKI targets by immunoprecipitation assays. Many of the putative target mRNAs encode proteins implicated in development, cell adhesion, morphogenesis, organogenesis, cell differentiation and cell growth. These data implicate the RNA-binding QKI proteins as regulators of cell fate and proliferation.

2.4 METHODS

2.4.1 SELEX ASSAY.

Oligonucleotides harboring a 52-nt random sequence flanked by two primer binding sites (GGG AGA ATT CCG ACC AGA AG (N52) TAT GTG CG TCT ACA TGG ATC CTCA), with an estimated complexity of 1 x 10^15, were synthesized (Invitrogen). The oligonucleotides were amplified by PCR using the corresponding forward and reverse primers as previously described (Buckanovich & Darnell, 1997). After PCR amplification, the sequences of 24 random clones were determined; each clone was unique, and the overall base composition was similar among the clones (average composition: A, 20%; U, 30%; C, 22%; G, 28%; data not shown). A purified DNA library (1 x 10^13 molecules) was transcribed in vitro using T7 RNA polymerase (Promega) and [α-32P]UTP. RNA was purified from denaturing TBE-acrylamide gels, heated to 65 °C for 5 min, and precleared using recombinant GST bound to glutathione-agarose beads (Sigma) to adsorb non-specifically bound RNAs. Unbound RNAs were incubated in binding buffer (50 mM Tris-HCl (pH 8.0), 50 mM KCl) with the GST-QKI-5 fusion protein for 30 min, then with glutathione-agarose beads for another 30 min. After four washes with binding buffer, the RNAs were eluted by TRIzol extraction (Invitrogen). The purified RNAs were ethanol precipitated and resuspended in water with RNase-free DNase (Promega) for a 15-min reaction. The DNase reaction was quenched for 10 min at 65 °C. Reverse transcriptions were performed using M-MLV reverse transcriptase (Promega) and the following reverse oligonucleotide: 5'-TGA GGA TCC ATG TAG ACG CA-3'. cDNAs were then generated by PCR amplification with the reverse oligonucleotide and the following forward primer containing the sequence of the T7 promoter (underlined): 5'-GCG TAA TAC GAC TCA CTA TAG GGA GAA TTC CGA CCA GAA G-3'. After round 6, the cDNAs were amplified with the reverse primer and the following forward primer: 5'-GGG AGA ATT CCG ACC AGA AG-3'. The DNA fragments were digested with BamHI and EcoRI and subcloned into pBluescript SK+ (Stratagene) for blue/white selection. Sixty white colonies were selected, their plasmids were purified, and the selected SELEX inserts were identified by DNA sequencing (Genome Quebec).
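The library parameters above help explain why SELEX can recover a short motif even though the pool samples only a vanishing fraction of all possible 52-nt sequences. The Python sketch below is an illustration, not an analysis from this thesis: it assumes that positions are independent, uses the average base composition measured for the sequenced clones, and takes ACUAAC as an arbitrary example motif.

```python
# Parameters copied from the Methods text above; the arithmetic is illustrative.
RANDOM_LENGTH = 52   # nucleotides in the randomized region
MOLECULES = 1e13     # purified DNA library used for transcription
COMPOSITION = {"A": 0.20, "U": 0.30, "C": 0.22, "G": 0.28}  # average of sequenced clones


def sequence_space(length: int) -> int:
    """Number of distinct sequences of a given length over four bases."""
    return 4 ** length


def expected_copies(motif: str, length: int = RANDOM_LENGTH,
                    molecules: float = MOLECULES, comp: dict = COMPOSITION) -> float:
    """Expected occurrences of motif across the pool, assuming independent positions."""
    p = 1.0
    for base in motif:
        p *= comp[base]
    positions = length - len(motif) + 1  # start positions within one random region
    return molecules * positions * p


coverage = MOLECULES / sequence_space(RANDOM_LENGTH)
print(f"fraction of 52-mer space sampled: {coverage:.1e}")         # → 4.9e-19
print(f"expected copies of ACUAAC: {expected_copies('ACUAAC'):.1e}")  # → 5.5e+10
```

Although fewer than one in 10^18 possible 52-mers is present, any given hexamer is expected to occur tens of billions of times, so selection acts on short motifs rather than on complete random sequences.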

2.4.2 RNA PREPARATION AND PURIFICATION.

All RNAs, except those used to verify the amplification of QKI-specific sequences (Figure 2.1b), were prepared by run-off in vitro transcription of oligonucleotides harboring a T7 binding site in the presence of [32P]UTP, using T7 MegaShortscript (Ambion) according to the manufacturer's protocols. RNAs used to verify the amplification of QKI-specific sequences (Figure 2.1b, round 0 and round 6) were transcribed from a purified PCR template harboring a T7 binding site, using T7 MegaShortscript (Ambion) according to the manufacturer's protocols. All templates for the T7 RNA polymerase reactions are shown in Table 2.5 in the supplementary data section (section 2.9). 32P-labeled RNAs were purified on a TBE-acrylamide gel before use.

2.4.3 EMSAS.

A constant concentration of 32P-labeled RNA (100 pmol) was incubated alone, with 750 nM GST, or with increasing concentrations of GST-QKI-5 in the following buffer: 20 mM HEPES (pH 7.4), 330 mM KCl, 10 mM MgCl2, 0.1 mM EDTA, 0.1 mg/ml heparin and 0.01% IGEPAL CA630 (Sigma). The 30-µl reactions were incubated at room temperature for 1 h, and 3.3 µl of RNA loading dye (glycerol containing 0.25% (w/v) bromophenol blue and 0.25% (w/v) xylene cyanol) was then added to each. A portion (15 µl) of each sample was separated on native Tris-glycine 8% acrylamide gels. The gels were dried, and the bound and unbound RNAs were quantified using a Storm PhosphorImager (Amersham). The fraction of bound RNA was determined and plotted using Prism 3.0 (GraphPad Software).
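The Methods state only that the fraction of bound RNA was plotted in Prism. As a hedged illustration of how an apparent dissociation constant could be estimated from such data, the sketch below fits a one-site hyperbolic model by a simple grid search; the model, the function names and the titration values are assumptions, not the analysis or data of this thesis. The model also assumes that the RNA concentration is well below Kd, so that free protein approximately equals total protein; otherwise a quadratic binding equation is more appropriate.

```python
import math


def fraction_bound(protein_nm: float, kd_nm: float) -> float:
    """One-site model with protein in excess over RNA: f = [P] / (Kd + [P])."""
    return protein_nm / (kd_nm + protein_nm)


def fit_kd(concs, fractions, lo=1e-2, hi=1e5, steps=20000):
    """Estimate an apparent Kd (nM) by a grid search over log10(Kd)
    that minimizes the sum of squared residuals."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    best_kd, best_sse = lo, math.inf
    for i in range(steps + 1):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        sse = sum((f - fraction_bound(c, kd)) ** 2 for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Hypothetical titration (not data from this thesis), generated with Kd = 120 nM
concs = [10, 30, 100, 300, 750, 1500]
fractions = [fraction_bound(c, 120) for c in concs]
print(round(fit_kd(concs, fractions)))  # → 120
```

A grid search in log space is used here only to keep the example free of external dependencies; a nonlinear least-squares routine would normally be used, and a maximal-binding parameter can be added if the curve does not plateau at 1.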

2.4.4 IMMUNOPRECIPITATIONS OF QKI-5, QKI-6 AND SAM68 WITH MRNAS.

Timed-pregnant C57BL/6 females (15.5 d) and their embryos were sacrificed in accordance with a protocol approved by the Animal Care Committee at McGill University. Embryos were harvested separately and Dounce homogenized in 2 ml of ice-cold lysis buffer (10 mM HEPES (pH 7.4), 200 mM NaCl, 30 mM EDTA, 0.5% Triton X-100) with 2X Complete protease inhibitors (Roche) and 400 units per ml SUPERase-In (Ambion). Debris was pelleted at 3,000g for 10 min. The salt concentration of the supernatant was increased to 400 mM NaCl, and the supernatant was clarified at 70,000g for 30 min. The resulting supernatant was precleared for 1 h with 60 µl of protein A-Sepharose (preblocked with 0.1 µg/ml each of BSA, yeast tRNA and glycogen). An aliquot (20%) of precleared lysate (total RNA sample) was prepared for RNA extraction using TRIzol (Invitrogen). The remaining lysate was immunoprecipitated with either normal rabbit serum (NRS) as a negative control or affinity-purified anti-QKI-5, anti-QKI-6 or anti-Sam68 antibodies, after which protein A-Sepharose was added. The QKI-5 and QKI-6 antibodies were raised in rabbits against the peptides KVRRHDMRVHPYQRIVTADRAATGN and KEYPIEPSGVLGMAFPTKG, respectively, covalently attached to KLH. The Sam68 antibody was generated as previously described (Chen et al., 1999). The NRS, anti-QKI-5, anti-QKI-6 and anti-Sam68 immunoprecipitates were washed extensively and the RNAs isolated using TRIzol. Specific mRNAs were detected by RT-PCR using 35 cycles (forward and reverse primers for all tested mRNAs are listed in Table 2.6 in the supplementary data section, section 2.9). The resulting DNA fragments were separated on 1.7% agarose gels and visualized by ethidium bromide staining.

2.4.5 BIOINFORMATICS.

All coding sequences, 5' UTR sequences and 3' UTR sequences for mouse RefSeq genes from the genome assembly available in NCBI Build 32 (UCSC version mm4) were downloaded from the UCSC Table Browser (http://genome.ucsc.edu/). Each of these three groups of sequences was scanned for the presence of the motifs TAAYN1-20ACTAAY and ACTAAYN1-20TAAY. All sequences in each group containing at least one hit for either motif were identified, and their annotations were extracted from NCBI's Nucleotide database in GBSeqXML format.
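The motif scan described above can be sketched with regular expressions. This is a minimal illustration, not the script used for this thesis: it assumes each sequence is a plain DNA string (A, C, G, T) keyed by an identifier, expands Y to [CT], and wraps each pattern in a zero-width lookahead so that overlapping occurrences are not skipped. The names `scan` and `sequences_with_hits` are hypothetical.

```python
import re

# Y (pyrimidine) expands to [CT]; the spacer is any 1-20 nt.
HALF_SITE = "TAA[CT]"    # TAAY
CORE = "ACTAA[CT]"       # shortened core ACTAAY
SPACER = "[ACGT]{1,20}"

# The two orientations searched in the Methods:
#   TAAY-N(1-20)-ACTAAY  and  ACTAAY-N(1-20)-TAAY.
# The lookahead makes every start position a candidate, so overlapping
# motifs are all reported.
MOTIFS = {
    "half-core": re.compile(f"(?=({HALF_SITE}{SPACER}{CORE}))"),
    "core-half": re.compile(f"(?=({CORE}{SPACER}{HALF_SITE}))"),
}


def scan(seq):
    """Return (orientation, start, matched_text) for each motif hit in seq."""
    seq = seq.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group(1)))
    return hits


def sequences_with_hits(records):
    """IDs of sequences with at least one hit for either motif."""
    return [seq_id for seq_id, seq in records.items() if scan(seq)]


if __name__ == "__main__":
    # The selected RNA from Figure 2.1c written as DNA, plus a hypothetical
    # variant in which the TAAC half-site is replaced by GAGT.
    records = {
        "selected": "GCCGTAACCACGTCTACTAACGCCG",
        "no_half_site": "GCCGGAGTCACGTCTACTAACGCCG",
    }
    print(scan(records["selected"]))
    print(sequences_with_hits(records))
```

Because the spacer is greedy, the reported match may span to the farthest partner site; that does not affect whether a sequence is counted as a hit, which is all the search required.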

2.4.6 β-LACTAMASE ASSAY.

At 24 h after transfection, the cells were washed three times with phosphate-buffered saline (PBS), then harvested and lysed in PBS by three cycles of 5 min freezing (-80°C) and 5 min thawing (37°C). Lysates were clarified by centrifugation (5 min at 16,000g). A 15 µl aliquot was used in a nitrocefin assay, and the results were normalized to protein and cell content as previously described (Galarneau et al., 2002).
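As an illustration of how nitrocefin readings can be turned into the percent-repression values reported in the Results, the sketch below fits an initial rate to absorbance readings, normalizes it to protein, and compares QKI-5-cotransfected cells with the empty-vector control. It assumes readings are taken around 486 nm, where hydrolyzed nitrocefin absorbs, and that activity is normalized to protein only; the published procedure (Galarneau et al., 2002) also corrects for cell content, which is not modeled here. Function names and numbers are hypothetical.

```python
def initial_rate(times_min, absorbances):
    """Least-squares slope of absorbance versus time (A486 units per min)."""
    n = len(times_min)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired readings")
    t_mean = sum(times_min) / n
    a_mean = sum(absorbances) / n
    num = sum((t - t_mean) * (a - a_mean) for t, a in zip(times_min, absorbances))
    den = sum((t - t_mean) ** 2 for t in times_min)
    return num / den


def specific_activity(times_min, absorbances, protein_mg):
    """Rate normalized to the protein present in the assayed aliquot."""
    return initial_rate(times_min, absorbances) / protein_mg


def percent_repression(control_activity, qki_activity):
    """Reduction in activity caused by QKI-5, as a percentage of control."""
    return 100.0 * (1.0 - qki_activity / control_activity)


if __name__ == "__main__":
    t = [0, 1, 2, 3, 4]
    # Hypothetical readings for the QRE reporter with empty vector or QKI-5.
    control = specific_activity(t, [0.00, 0.10, 0.20, 0.30, 0.40], protein_mg=0.05)
    with_qki = specific_activity(t, [0.00, 0.04, 0.08, 0.12, 0.16], protein_mg=0.05)
    print(f"{percent_repression(control, with_qki):.1f}% repression")
```

A linear fit over early time points is used because the rate is only proportional to enzyme amount while substrate is far from depleted.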

2.4.7 ACCESSION CODES.

GenBank accession numbers: pumilio-1, NM_030722; pumilio-2, NM_030723; Trp53, NM_011640. BIND identifiers (http://bind.ca): 266780, 266781, 266782, 266783, 266784, 266785, 266786, 266787, 266788, 266789, 266790, 266791, 266792, 266793, 266794, 266795, 266796, 266797, 266798, 266799, 266800, 266801, 266802, 266803, 266804, 266805, 266806, 266807, 266808, 266809, 266810, 266811, 266812, 266813 and 266814.

2.5 RESULTS

2.5.1 IDENTIFYING THE QRE

To identify the RNA binding motif for the QKI proteins, we enriched for high-affinity RNA ligands using systematic evolution of ligands by exponential enrichment (SELEX). All QKI isoforms have a KH-type RNA-binding domain embedded in a larger GSG/STAR domain, so we arbitrarily chose the nuclear QKI-5 isoform for these studies. Bacterial recombinant QKI-5 was expressed as a glutathione-S-transferase (GST) fusion protein and purified for the SELEX assay. RNAs were transcribed using T7 RNA polymerase from a pool of 52-nucleotide (nt) DNAs with random sequences. The complexity of the pool was estimated at 1.0 × 10^14. We randomly sequenced 20 RNA molecules from the initial library and, as expected, each sequence was unique (data not shown). The RNAs were transcribed in the presence of [32P]UTP so that the amount of QKI-5-bound RNA could be measured after each round. After six rounds of selection, we observed ~9.2% binding, demonstrating that we had indeed enriched for QKI-5-specific sequences (Figure 2.1a). To confirm the SELEX amplification of QKI-5-specific RNA ligands, we performed an electrophoretic mobility shift assay (EMSA) on two randomly selected transcripts, one from round 0 and the other from round 6. The RNAs were labeled with 32P and incubated with buffer, GST alone or GST-QKI-5. GST-QKI-5 bound the selected RNA from round 6 and formed slowly migrating species in native gel electrophoresis (Figure 2.1b). The round 0 RNA and RNA incubated with GST alone did not show this mobility shift (Figure 2.1b), nor did RNA incubated

FIGURE 2.1 IDENTIFICATION OF QKI-SPECIFIC RNA SEQUENCES USING SELEX. (a) Selecting QKI-binding aptamers. The relative binding of 32P-labeled RNAs to QKI-5 as a percentage of control after each round of selection. (b) Verification that QKI-5-binding RNAs were identified. Results of EMSAs of random RNA clones from rounds 0 and 6 with increasing concentrations of GST-QKI-5 (1 nM, 10 nM, 100 nM, 300 nM and 1 µM). GST alone (750 nM) and buffer alone were used as negative controls. Migration patterns of unbound RNAs (free probe) and QKI-5-bound RNAs (QKI-RNA complex) are indicated at left. (c) Comparison of the binding affinities of QKI-5 and Sam68 for a selected RNA sequence (gccgUAACcacgucUACUAACgccg; capital letters denote consensus). Results of EMSAs of this sequence with concentrations of GST-QKI-5 or of a histidine-tagged GSG/STAR domain of mouse Sam68 that increased by a factor of 1.8 from 1 nM to 9.6 µM. Fraction of bound RNA is plotted as a function of protein concentration on right. (d) Comparison of the binding affinities of different GST-QKI isoforms for the RNA sequence in c. Results of EMSAs of this sequence with concentrations of GST-QKI-5, GST-QKI-6, GST-QKI-7 or mutants QKI-5E48G or QKI-5V157E that increased by a factor of 3 from 3 nM to 2.3 µM. (e) Expression of β-lactamase reporter gene when COS cells were cotransfected with either a control myc-pcDNA3.1 vector (black bars) or an expression vector encoding myc-QKI-5 (white bars), along with a β-lactamase expression plasmid harboring six copies of the QRE in the 3' UTR (BL-QRE) or no insert (BL). Each bar represents the average result of three independent nitrocefin assays.

with Sam68, another GSG/STAR protein (Figure 2.1c). The Kd for QKI-5 binding was 99 nM (Figure 2.1c), similar to that of other high-affinity KH domain-RNA interactions (Ryder et al., 2004; Buckanovich & Darnell, 1997). Because QKI-5 was used for the selection, we wished to confirm that the QKI-6 and QKI-7 isoforms also bound the selected RNAs. EMSAs showed that QKI-6 and QKI-7 bound the selected RNA with similar affinity, with Kds of ~107 and ~114 nM, respectively (Figure 2.1d and Figure 2.7 in the supplementary data section (section 2.9)). Some amino acid substitutions within the GSG/STAR domain of QKI cause embryonic lethality (Cox et al., 1999; Justice & Bode, 1988). As expected, QKI isoforms harboring these amino acid substitutions were defective in RNA binding, confirming that the RNAs are bound by the GSG/STAR domain (Figure 2.1d).
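The Kd values above come from plotting the fraction of bound RNA against protein concentration (Figure 2.1c). A minimal sketch of such a fit is shown below. It assumes a single-site isotherm, fraction bound = [P]/(Kd + [P]), with protein in large excess over labeled RNA so that free and total protein can be treated as equal. The grid-search fitter and the synthetic data are illustrative assumptions, not necessarily the procedure used here.

```python
import math


def fraction_bound(protein_nM, kd_nM):
    """Single-site binding isotherm; assumes protein is in excess over RNA."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(concs_nM, fractions, lo_nM=0.1, hi_nM=1e5, steps=20000):
    """Least-squares Kd from a log-spaced grid search (no SciPy needed)."""
    log_lo, log_hi = math.log10(lo_nM), math.log10(hi_nM)
    best_kd, best_sse = None, math.inf
    for i in range(steps + 1):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        sse = sum(
            (f - fraction_bound(p, kd)) ** 2 for p, f in zip(concs_nM, fractions)
        )
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


if __name__ == "__main__":
    # 1.8-fold steps from 1 nM, mirroring the titration in Figure 2.1c.
    concs = [1.0 * 1.8 ** k for k in range(16)]
    # Synthetic, noise-free fractions for a hypothetical Kd of 100 nM.
    observed = [fraction_bound(p, 100.0) for p in concs]
    print(f"fitted Kd = {fit_kd(concs, observed):.1f} nM")
```

A log-spaced grid is used because Kd values in these experiments span orders of magnitude; with real, noisy EMSA data a nonlinear least-squares routine would normally replace the grid.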

The function of the identified QRE was examined in mammalian cells. We have previously shown in COS cells that QKI-5 represses the expression of myelin basic protein (MBP) by binding to the 3' UTR of MBP mRNAs and thus retaining them in the nucleus (Larocque et al., 2002). A reporter plasmid encoding the enzyme β-lactamase was generated with or without six copies of the QRE in its 3' UTR. This reporter was cotransfected into COS cells with an empty vector (Figure 2.1e, black bars) or with a QKI-5 expression vector (Figure 2.1e, white bars) to determine whether QKI-5 could suppress the expression of the reporter gene, as assayed enzymatically in vitro after cell lysis. The expression of QKI-5 repressed >60% of the β-lactamase activity when the 3' UTR contained a QRE,

FIGURE 2.2 RNAs THAT BOUND TO QKI-5 AND THEIR CONSENSUS SEQUENCE. (a) The sequences of 43 unique RNAs bound to QKI-5 after six cycles of SELEX. A core sequence of UACUAAC is shown in bold and half-sites of UAAC or UAAU are underlined. (b) Probability matrix (graphic logo) based on all 43 sequences, depicting the relative frequency of each residue at each position within the selected motif.

compared with 25% inhibition when it did not contain a QRE (Figure 2.1e). These findings suggest that the selected QRE indeed binds to QKI in vivo.
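A short calculation shows why every sequence drawn from the initial SELEX library was expected to be unique. It assumes that all 52 positions are fully randomized with equal base frequencies and that the ~10^14 molecules in the pool are distinct sequences of equal abundance; the birthday-problem approximation is a standard estimate, and the constant and function names are hypothetical.

```python
import math

RANDOM_LENGTH = 52     # randomized positions per template (assumed)
POOL_SIZE = 1.0e14     # estimated pool complexity from the text
N_SEQUENCED = 20       # clones sequenced from the round-0 library

# Number of distinct 52-nt sequences.
sequence_space = 4 ** RANDOM_LENGTH

# Fraction of sequence space that the pool can sample.
coverage = POOL_SIZE / sequence_space


def prob_any_repeat(n_draws, n_distinct):
    """Birthday-problem approximation: P(at least one repeat among n draws)."""
    pairs = n_draws * (n_draws - 1) / 2
    return -math.expm1(-pairs / n_distinct)


if __name__ == "__main__":
    print(f"sequence space: {sequence_space:.2e}")
    print(f"fraction sampled by the pool: {coverage:.1e}")
    p = prob_any_repeat(N_SEQUENCED, POOL_SIZE)
    print(f"P(repeat among {N_SEQUENCED} clones): {p:.1e}")
```

Under these assumptions the pool samples only about 5 × 10^-18 of all possible 52-mers, and the chance of two identical clones among 20 is on the order of 10^-12, so finding 20 unique sequences in round 0 is the expected outcome, whereas repeated clones after round 6 reflect selection.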

After round 6, the selected RNA ligands were converted into cDNAs, which were then subcloned into a plasmid and sequenced. Of 60 clones sequenced, 43 were unique (Figure 2.2a). A sequence alignment identified a core consensus sequence, 5'-NACUAAY-3' (Figure 2.2b), which was duplicated in 14 of the 43 RNAs (Figure 2.2a). An additional sequence motif, UAAY, was identified in 25 of the 43 selected RNAs (Figure 2.2a); we termed this motif a half-site. The QKI-selected RNAs QSEL1 and QSEL37 each contained two cores and a half-site, whereas 5 of the 43 sequences contained the core with no visible UAAY half-site. These data suggest that some variation in the half-site may be tolerated. Collectively, our data show that the selected RNA aptamers contain a bipartite motif; the spacing between the repeats varied from 1 nt (QSEL23) to 18 nt (QSEL41) (Figure 2.2a).
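The consensus in Figure 2.2b summarizes per-position base frequencies across aligned sites. The sketch below derives an IUPAC consensus from such a frequency matrix. It is illustrative only: the four aligned 7-mers are hypothetical stand-ins, not the 43 SELEX sequences, and the 0.75 cutoff for calling a single base or a two-base IUPAC code (for example Y for C or U) is an arbitrary assumption.

```python
from collections import Counter

# IUPAC codes for two-base ambiguities (RNA alphabet).
IUPAC_PAIRS = {
    frozenset("CU"): "Y", frozenset("AG"): "R",
    frozenset("AU"): "W", frozenset("CG"): "S",
    frozenset("GU"): "K", frozenset("AC"): "M",
}


def position_frequencies(aligned):
    """Relative frequency of each base in each column of equal-length sites."""
    length = len(aligned[0])
    if any(len(site) != length for site in aligned):
        raise ValueError("sites must be aligned to the same length")
    total = len(aligned)
    return [
        {base: count / total for base, count in Counter(s[i] for s in aligned).items()}
        for i in range(length)
    ]


def consensus(aligned, threshold=0.75):
    """One base if it reaches the threshold, a two-base IUPAC code if the
    top two bases reach it together, otherwise N."""
    letters = []
    for freqs in position_frequencies(aligned):
        ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
        if ranked[0][1] >= threshold:
            letters.append(ranked[0][0])
        elif len(ranked) > 1 and ranked[0][1] + ranked[1][1] >= threshold:
            letters.append(IUPAC_PAIRS[frozenset((ranked[0][0], ranked[1][0]))])
        else:
            letters.append("N")
    return "".join(letters)


if __name__ == "__main__":
    sites = ["UACUAAC", "CACUAAU", "GACUAAC", "AACUAAU"]  # hypothetical
    print(consensus(sites))
```

For these four sites the first column has no dominant base and the last column is split between C and U, so the function prints NACUAAY, the same form as the core consensus.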

2.5.2 DEFINING THE QRE

A majority of the identified QKI sequences contained a core sequence NACUAAY and a half-site UAAY. Because previous researchers (Ryder et al., 2004) had defined only a core sequence as the STAR binding element (SBE), we wished to test whether the additional half-site is required. To define the minimal QRE and to determine the importance of the half-site, we focused on QSEL2 and QSEL13, which each contain a core and a UAAU half-site. The half-site is

FIGURE 2.3 DEFINING THE QRE. (a-c) Results of EMSAs of selected RNAs with decreasing concentrations of QKI-5 (by a factor of 2 from 2 µM) or with buffer alone. The RNAs used for the EMSAs are variants of SELEX clones 2 (a), 13 (b) and 8 (c); RNA species are shown underneath each EMSA and the core (C) and half-site (H) sequences are indicated. Black bars denote sequences that are unaltered between the wild-type (wt) and mutant (mut) versions. Their Kds (nM), calculated from several EMSA experiments, are also shown. Migration patterns of unbound RNAs (free probe) and QKI-5-bound RNAs (QKI-RNA complex) are indicated at left.

downstream and upstream of the core in QSEL2 and QSEL13, respectively

(Figure 2.2a). EMSAs examining binding of GST-QKI-5 to QSEL2 and QSEL13 gave Kds of ~112 and ~143 nM, respectively (Figure 2.3a, b). Substituting the UAAU half-site with GAGU in QSEL2 or QSEL13 abolished RNA binding by QKI-5 (Figure 2.3a, b); in contrast, when the UAAU half-site was substituted with CAAC in QSEL2 and QSEL13, QKI-5 retained part of its RNA binding affinity, although the Kds increased to 374 and 195 nM, respectively. QSEL34, a sequence with a core but no identifiable half-site, harbors two downstream CAAC sequences (Figure 2.2a), which may partially substitute for a half-site. QSEL8, another RNA aptamer with only a core sequence (Figure 2.2a), was bound with a moderate affinity of 357 nM, suggesting that it contains sequences that partially compensate for the absence of a half-site (Figure 2.3c). We introduced a perfect UAAU half-site upstream of the QSEL8 core sequence; the half-site-containing QSEL8 variant (QSEL8mut) bound QKI-5 with higher affinity than the original QSEL8 sequence (Kd = 157 nM; Figure 2.3c). In summary, the core sequence NACUAAY and a neighboring half-site UAAY together constitute a high-affinity (Kd ~100 nM) binding site for the QKI proteins.

In RNA sequences with both a core and a half-site, there were 1-18 nt between the two motifs (Figure 2.2a). To determine the optimal spacing between the core and the half-site, we generated RNAs with 0, 2, 5, 10, 15, 20, 25 and 30 nt between the two motifs and measured their binding affinities for QKI-5 using EMSA. RNA with 0 nt between the core UACUAAC and the half-site UAAC was bound with moderate affinity (Kd ~303 nM; Figure 2.4a). RNAs with 2-20 nt between the core and the half-site were bound with high affinity (Kd ~160 nM; Figure 2.4a). Extending the spacing to 25 and 30 nt progressively reduced the affinity to a moderate level (Kd ~187-277 nM; Figure 2.4a). Thus, a spacing of 2-20 nt between the core and the half-site is optimal, and a spacing of 25 nt is also tolerated.
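When a spacing series like this is designed, the filler must not create extra cores or half-sites, or the measured Kd would reflect more than one site. The sketch below builds core-spacer-half-site constructs and counts motif matches. The A-free GCU filler and the function names are hypothetical choices, not the sequences used in Figure 2.4a, and flanking sequences are omitted.

```python
import re

CORE = "UACUAAC"       # core used in the spacing series
HALF_SITE = "UAAC"     # half-site used in the spacing series
SPACINGS = [0, 2, 5, 10, 15, 20, 25, 30]

# Hypothetical filler: without any A it cannot contribute a UAA-containing motif.
FILLER = "GCU" * 10

# Lookaheads count overlapping matches of NACUAAY and UAAY.
CORE_RE = re.compile("(?=[ACGU]ACUAA[CU])")
HALF_RE = re.compile("(?=UAA[CU])")


def build_construct(spacing):
    """Core, then `spacing` nt of filler, then the half-site."""
    if not 0 <= spacing <= len(FILLER):
        raise ValueError("spacing outside the filler length")
    return CORE + FILLER[:spacing] + HALF_SITE


def motif_counts(rna):
    """(core matches, half-site matches) in rna. The half-site count
    includes the UAAC embedded in the core itself."""
    cores = sum(1 for _ in CORE_RE.finditer(rna))
    halves = sum(1 for _ in HALF_RE.finditer(rna))
    return cores, halves


if __name__ == "__main__":
    for spacing in SPACINGS:
        rna = build_construct(spacing)
        print(spacing, rna, motif_counts(rna))
```

With a 0-nt spacer the junction UACUAACUAAC itself forms a second NACUAAY match (AACUAAC), so that string carries two overlapping cores; with this filler every other spacing gives one core and two half-site matches.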

MFOLD (Zuker, 2003) identified no RNA secondary structures in the QKI-selected RNAs (data not shown). To examine whether the affinity between QKI and RNA depends on RNA secondary structure, we imposed a simple stem-loop (hairpin) structure on the QRE similar to the one found in the Nova-1 RNA binding site (Jensen et al., 2000). RNA sequences were generated that contained the core and half-site of the QRE within either the loop or the stem of an RNA hairpin (Figure 2.4b). EMSA was then used to test the ability of these RNAs to bind QKI-5. QKI-5 did not bind structured RNA having the QRE core sequence at the stem-loop junction and the half-site in the loop (Figure 2.4b). Binding was also abrogated when the core was in the stem and the half-site in the loop, and vice versa (Figure 2.4c,d). In control experiments, QKI-5 did not bind a simple hairpin RNA without the QRE core and half-site sequences (Figure 2.4e), and QKI-5 bound with high affinity to an unstructured QRE (Figure 2.4f; QRE sequence given in the legend of Figure 2.1c). These data define the QRE as direct repeats with a minimal consensus of NACUAAY-N1-20-UAAY.

FIGURE 2.4 DEFINING OPTIMAL DISTANCE AND STRUCTURE BETWEEN DIRECT REPEATS IN THE QRE. (a) Results of EMSAs performed with decreasing concentrations (by a factor of 2 from 4 µM) of recombinant GST-QKI-5, showing that Kd varied little when the distance between core (UACUAAC) and half-site (UAAC) sequences was varied from 0 to 30 nt. Migration patterns of unbound RNAs (free probe) and QKI-5-bound RNAs (QKI-RNA complex) are indicated at left. (b-e) Results of EMSAs of variant QREs containing simple hairpin structures, performed with increasing concentrations (by a factor of 3 from 2.7 nM) of QKI-5, showing that QKI does not bind the hairpin QREs. Each sequence tested is shown above its EMSA result as an MFOLD-determined secondary structure; the bases of the core motif are in light circles and the bases of the half-site are in black circles. (f) EMSA, as in b-e, of the unstructured QRE described in the legend of Figure 2.1c. Migration patterns of unbound RNAs (free probe) and QKI-5-bound RNAs (QKI-RNA complex) are indicated at right.


2.5.3 MAPPING THE QRES IN KNOWN MRNA TARGETS OF QKI

We searched the known mRNA targets of QKI for the presence of the QRE. These targets include the mRNAs encoding the myelin basic proteins (MBPs) (Larocque et al., 2002; Li et al., 2000), early growth response gene-2 (EGR-2) (Nabel-Rosen et al., 2002) and the cyclin-dependent kinase inhibitor p27Kip1 (Larocque et al., 2005). Sequences bound by QKI within the p27Kip1 mRNA were recently shown to meet the criteria for the bipartite QRE (Larocque et al., 2005), so we focused on the MBP and EGR-2 mRNAs. We have previously used a bead assay, EMSA and UV cross-linking to show that QKI binds nucleotides 626-885 of the MBP mRNA (Larocque et al., 2002). Surprisingly, this region does not harbor the QRE consensus sequence; rather, it contains multiple half-sites clustered together (MBP:QRE-1, Figure 2.5a). Nucleotides 1551-1581 of the MBP mRNA contain the QRE consensus sequence NACUAAC-N13-UAAC (MBP:QRE-2, Figure 2.5a); our previous study showed that QKI binds nucleotides 1441-1770, though to a lesser extent than nucleotides 626-885 (Larocque et al., 2002). To further define the sequences bound by QKI, we performed EMSAs with MBP:QRE-1 and MBP:QRE-2 (Figure 2.5a). As predicted from our SELEX analysis, QKI bound MBP:QRE-2 with high affinity (Kd ~115 nM) and MBP:QRE-1 with moderate affinity (Kd ~383 nM) (Figure 2.5a).
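Mapping a QRE in a known target, as done here for MBP:QRE-2, amounts to locating a core and a half-site and measuring the spacer between them. The sketch below does this for both orders using a non-greedy spacer, so each core is paired with its nearest half-site. The function names are hypothetical; the MBP:QRE-2 sequence (GACACACUAACCUCGGUGGAAAAAUAACCAU) is taken from Figure 2.5a, and the 1-20 nt spacer limit follows the consensus defined above.

```python
import re

CORE = "(?P<core>[ACGU]ACUAA[CU])"     # NACUAAY
HALF = "(?P<half>UAA[CU])"             # UAAY
SPACER = "(?P<spacer>[ACGU]{1,20}?)"   # non-greedy: nearest partner site

# Lookaheads allow overlapping candidates to be reported.
CORE_FIRST = re.compile(f"(?={CORE}{SPACER}{HALF})")
HALF_FIRST = re.compile(f"(?={HALF}{SPACER}{CORE})")


def annotate_qres(rna):
    """List (order, start, first_site, spacer_length, second_site) tuples."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for m in CORE_FIRST.finditer(rna):
        hits.append(("core-half", m.start(), m["core"], len(m["spacer"]), m["half"]))
    for m in HALF_FIRST.finditer(rna):
        hits.append(("half-core", m.start(), m["half"], len(m["spacer"]), m["core"]))
    return hits


def describe(hit):
    """Render a hit in the notation used in the text, e.g. CACUAAC-N13-UAAC."""
    _, _, first, spacer_len, second = hit
    return f"{first}-N{spacer_len}-{second}"


if __name__ == "__main__":
    mbp_qre2 = "GACACACUAACCUCGGUGGAAAAAUAACCAU"
    for hit in annotate_qres(mbp_qre2):
        print(hit[1], describe(hit))
```

For MBP:QRE-2 this reports a single core-first element, CACUAAC-N13-UAAC, matching the NACUAAC-N13-UAAC description; the UAAC embedded in the core is not paired because no second core lies downstream of it.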

FIGURE 2.5 MAPPING QRES WITHIN TWO KNOWN MRNA TARGETS OF QKI, MBP AND EGR-2. (a) Defining QREs in the myelin basic protein (MBP) mRNA. Above, schematic drawing of the previously defined QRE (QRE-1) and the putative QRE predicted using the SELEX consensus sequence (QRE-2). Middle, results of EMSAs of both QREs as well as variant QREs having mutations in the core and half-site (QRE-2m1 and QRE-2m2, respectively), performed with increasing concentrations of GST-QKI-5 (by a factor of 3 from 4.1 nM for EMSAs of QRE-1 and QRE-2; 40 nM, 200 nM and 1 µM for EMSAs of QRE-2m1 and QRE-2m2). GST alone (750 nM) and buffer alone were used as negative controls. Migration patterns of unbound RNAs (free probe) and QKI-5-bound RNAs (QKI-RNA complex) are indicated at left. Below, sequences of tested QREs and variants. Core and half-site sequences are underlined. Kds on right are relative binding constants. (b) Defining the QRE in the EGR2 mRNA. Above, schematic drawing of a putative QRE in the 3' UTR predicted using the SELEX consensus sequence. Middle, results of EMSAs of the putative QRE as well as variant QREs having mutations in the core and half-site (QRE-1m1 and QRE-1m2, respectively), performed with increasing concentrations (40 nM, 200 nM and 1 µM) of GST-QKI-5. Migration patterns of unbound RNAs (free probe) and QKI-5-bound RNAs (QKI-RNA complex) are indicated at left. Below, sequences of tested QREs and variants. Core and half-site sequences are underlined. Kds on right are relative binding constants.


Mutating the core (QRE-2m1) or the half-site (QRE-2m2) in MBP:QRE-2 abolished binding (Figure 2.5a). Thus, our studies define two QREs within the MBP mRNA and highlight the importance of the core and half-site in QRE-2. EGR-2 is a zinc finger transcription factor that plays a key role in Schwann cell differentiation and in myelination in the peripheral nervous system (Nagarajan et al., 2001). The 3' UTR of the EGR-2 mRNA is a target of QKI-5; in Schneider cells, expression of QKI isoforms alters the expression of a reporter gene harboring the EGR-2 3' UTR (Nabel-Rosen et al., 2002). However, the QRE of the EGR-2 mRNA has not previously been mapped. We searched for the QRE consensus sequence within the EGR-2 mRNA and identified a matching region between nucleotides 2769 and 2798 (EGR2:QRE-1) (Figure 2.5b). Using EMSA, we confirmed that QKI bound EGR2:QRE-1 with high affinity and that scrambling the sequence of the core (QRE-1m1) or the half-site (QRE-1m2) abolished QKI binding (Figure 2.5b). These data define the QRE of EGR-2 as a core and a half-site residing between nucleotides 2769 and 2798 of the EGR-2 mRNA.

2.5.4 IDENTIFYING NOVEL PUTATIVE MRNA TARGETS OF QKI

To identify potential new mRNA targets of QKI, we performed a bioinformatic analysis using our defined QRE. Mouse RefSeq genes in the National Center for Biotechnology Information (NCBI) Build 32 database (UCSC version mm4) were stringently scanned using a shortened core sequence (ACUAAY, where Y is a pyrimidine) and a half-site (UAAY) separated by 1-20 nt. The search revealed 955 unique putative mRNA targets (Table 2.1 in the supplementary data section (section 2.9)). As shown above, some QKI-binding RNAs differ from our defined QRE consensus sequence by 1 or 2 nucleotides (Figures 2.2a and 2.3). Therefore, we also performed a more inclusive bioinformatic analysis using a matrix based on the nucleotides identified by SELEX (Table 2.2 in the supplementary data section (section 2.9)); this search yielded 478 additional putative mRNA targets of QKI (Table 2.3 in the supplementary data section (section 2.9)), for a total of 1,433. The total set included the known mRNA targets MBP, EGR2 and p27KIP1 as well as 1,430 new potential QKI targets. mRNAs annotated in the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Dennis et al., 2003) as involved in development, cell adhesion, morphogenesis, organogenesis, transport and cell differentiation formed highly significant groups (all P < 10^-5) and comprised a large proportion of the putative targets (see P-values and percentages in Table 2.4 in the supplementary data section (section 2.9)). Two other major categories were cell growth and/or maintenance, with 24.1% of the mRNA targets, and cell communication, with 26.5%. The remaining categories included broad headings such as cellular process (47.7%), regulation of biological process (17.9%) and cellular physiological process (27.0%). The distribution of putative QKI targets across these categories is consistent with the role of QKI proteins as regulators of cell fate during development.
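The inclusive search used a matrix derived from the SELEX nucleotides (Table 2.2, not reproduced here). The sketch below shows one common way such a matrix can be applied: log-odds scoring of each 7-nt window against a position probability matrix with a uniform background. The probabilities, the 5.0-bit cutoff and the function names are hypothetical, and the matrix covers only the core, whereas the actual search also involved the half-site.

```python
import math

BACKGROUND = 0.25  # assumed uniform base composition


def _column(favored, p=0.85):
    """Hypothetical column: one favored base with probability p."""
    column = {base: (1.0 - p) / 3 for base in "ACGU"}
    column[favored] = p
    return column


# Hypothetical position probability matrix for the 7-nt core NACUAAY.
PPM = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25},  # N: no preference
    _column("A"), _column("C"), _column("U"), _column("A"), _column("A"),
    {"A": 0.05, "C": 0.45, "G": 0.05, "U": 0.45},  # Y: C or U
]

# Log2 odds of each base against the background, per position.
PWM = [{base: math.log2(p / BACKGROUND) for base, p in col.items()} for col in PPM]


def score_window(window):
    """Sum of per-position log-odds scores (bits) for a 7-nt window."""
    return sum(column[base] for column, base in zip(PWM, window))


def scan(rna, threshold=5.0):
    """(start, window, score) for every window scoring at or above threshold."""
    rna = rna.upper().replace("T", "U")
    width = len(PWM)
    hits = []
    for start in range(len(rna) - width + 1):
        window = rna[start:start + width]
        score = score_window(window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    for site in ["UACUAAC", "UACUUAC", "UAGUUAC"]:
        print(site, round(score_window(site), 2))
    print(scan("GGUACUUACGG"))
```

With these made-up numbers a single mismatch at a favored position (UACUUAC) still clears the cutoff while two mismatches (UAGUUAC) do not; the cutoff is the parameter that sets how much deviation from the consensus a matrix search tolerates.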

We next tested whether some of the proposed mRNA targets are actually bound by QKI. Immunoprecipitation from lysates of whole mouse embryos at embryonic day (E) 15.5 was used to determine whether QKI-5 associates with its RNA targets in vivo. At E10.5, neural progenitor cells begin their development into neurons or glial cells, and by E15.5 this process is nearly complete (Rowitch, 2004). QKI-5 is the major isoform expressed during this time; the expression of QKI-6 and QKI-7 begins at E11.5 and peaks during myelination in postnatal weeks 2-4 (Hardy, 1998). The abundance of 33 of the 955 mRNAs identified in the stringent search was examined at E15.5 using RT-PCR on total RNA isolated from these embryos. Only 14 of the 33 mRNAs were abundant at E15.5 (Figure 2.6a, total RNA). We immunized rabbits with a QKI-5-specific peptide and affinity-purified the polyclonal antibodies from sera using the immunizing peptide bound to a solid support. Extracts were prepared from three separate E15.5 embryos, and immunoprecipitations were performed with anti-QKI-5 or with normal rabbit serum (NRS) as a control (Figure 2.8 in the supplementary data section (section 2.9)). The bound mRNAs were isolated and visualized using RT-PCR. All 14 selected mRNA targets associated with QKI-5 to differing degrees (Figure 2.6a). In particular, the Fused toes, Hip2 and HMGb1 mRNAs contain four, three and three QREs, respectively, and the presence of multiple QREs may explain why strong interactions with these mRNAs were visible in at least two of the three immunoprecipitations. Little to no binding was observed in NRS control immunoprecipitations, confirming the specificity of the interaction. An additional immunoprecipitation experiment was performed that included two other antibodies: anti-QKI-6 (Larocque et al., 2005) and anti-Sam68 (Chen et al., 1999). We validated an additional nine mRNA targets in immunoprecipitation studies using anti-QKI-5 and anti-QKI-6 followed by RT-PCR analysis (Figure 2.6b). Edn1, fibrillarin, HoxC5, Ncoa3, TNF21 and YES mRNAs immunoprecipitated with anti-QKI-5 and anti-QKI-6, but not with anti-Sam68 (Figure 2.6b). Pea15, ImpB and Tcp1 mRNAs immunoprecipitated with all three antibodies (Figure 2.6b). Importin-β (ImpB) and T-complex polypeptide 1 (Tcp1) mRNAs were previously shown to be targets of Sam68 (Itoh et al., 2002) and served as positive controls for Sam68. Neither the cyclin-dependent kinase (CDK) inhibitor p21 mRNA nor Gapdh mRNA contains QRE sequences, and, indeed, neither immunoprecipitated with anti-QKI-5, anti-QKI-6 or anti-Sam68 (Figure 2.6b). We next used EMSA to determine whether QKI-5 bound the individual QRE sequences in each of the 14 mRNA targets of Figure 2.6a. As expected, all 14 perfect QREs were bound by increasing concentrations of GST-QKI-5 in vitro (Figure 2.6c). The GST-QKI-5 fusion protein did not bind a control sequence from Gapdh mRNA (Figure 2.6c), and the 14 QREs were not bound by GST alone (data not shown). Thus, the QREs identified by bioinformatic analysis are bona fide QKI binding sequences. In summary, our data confirmed 23 novel mRNA targets for QKI-5 and mapped QREs for 14 of the 23 targets.

FIGURE 2.6 IDENTIFICATION OF NEW MRNA TARGETS FOR QUAKING. (a) Results of protein A-Sepharose pull-down following immunoprecipitation of QKI-5 with putative mRNA targets and with Gapdh mRNA, a negative control that does not contain a QRE. Immunoprecipitations were performed on extracts from three separate 15.5-day-old mouse embryos and with NRS as a control. Pictures show DNA fragments amplified from bound RNAs by RT-PCR, separated on 1.7% agarose gels and visualized by ethidium bromide staining. (b) As in a but using two additional antibodies, against QKI-6 and Sam68, to compare the binding of target mRNAs to QKI-5, QKI-6 and Sam68. (c) Results of EMSAs of 14 mRNAs randomly selected from 955 putative mRNA targets of QKI, performed with increasing concentrations of QKI-5 (10 nM, 100 nM and 1 µM). Control RNA is an unrelated Gapdh sequence.

[Figure 2.6 image. mRNAs shown: BCAR1, Bid, Cadherin11, Fused toes, Hip2, HMGb1, MTM1, RhoA, SF2, TfRc, TGFbR1, TNFRSF21, U2AF and YY1, with Gapdh as a control.]

2.6 DISCUSSION

Here we used SELEX to define the optimal QRE as a bipartite motif with the derived consensus sequence NACUAAY-N1-20-UAAY. A bioinformatic analysis identified 1,433 mouse mRNAs containing at least one QRE. Immunoprecipitation assays confirmed that 23 of these mRNA targets interact with QKI. Our data identify a large population of mRNAs that may be regulated by the QKI isoforms and show that QKI isoforms are group-specific RNA-binding proteins (Keene, 2001).

QKI bound to the QRE with high affinity, with an estimated Kd of ~100 nM. This is similar to the reported RNA binding affinities of Nova-1 and GLD-1; the 2-3-fold differences most likely reflect the different binding assays used (Ryder et al., 2004; Buckanovich et al., 1997). As expected, QKI-5, QKI-6 and QKI-7 had similar binding affinities for the QRE. Some amino acid substitutions within the GSG/STAR domain of QKI cause embryonic lethality (Cox et al., 1999; Justice & Bode, 1988). QKI isoforms harboring these amino acid substitutions were defective in RNA binding as expected (Larocque et al., 2002), confirming that the identified RNA sequences are recognized by the GSG/STAR domain. The structure of a dimerized GSG/STAR domain, free or bound to RNA, remains to be determined; however, the core sequence and half-site of the QRE probably each contact one subunit of a QKI dimer.

Homology models based on the solution structure of SF1/BBP1 (Liu et al., 2001) suggest that the three STAR proteins C. elegans GLD-1, Drosophila melanogaster HOW and mammalian QKI should have similar binding specificities, because they have identical amino acid-RNA contacts within the KH domain (Ryder et al., 2004). Williamson and coworkers defined the SBE as a hexanucleotide sequence (U>G>C/A)A(C>A)U(C/A>U)A for GLD-1 (Ryder et al., 2004), and subsequently defined a slightly different SBE for QKI with the sequence NA(A>C)U(A>C)A (Ryder et al., 2004b). These studies involved the tra-2 and gli repeated elements (TGE) within the tra-2 mRNA (Jan et al., 1999), extensive mutagenesis, and the use of short RNAs to compete with the TGE-GLD-1 interaction (Ryder et al., 2004). A minimal GLD-1 binding site has not previously been defined, and a genome-wide search for the hexanucleotide has not been performed. Moreover, although it has been noted that the SBE requires neighboring sequences, these have not been mapped to a motif (Ryder et al., 2004). We showed that the previously identified hexanucleotide (Ryder et al., 2004) is part of a core sequence NACUAAY in the QRE, and we identified a neighboring half-site that completes a bipartite RNA binding consensus sequence. Notably, the TGE elements of tra-2 contain a close match to our bipartite consensus sequence, UAAUuucuuaucUACUCAU (Ryder et al., 2004), which may explain why we and others have observed QKI binding to TGEs (Saccomanno et al., 1999; Chen et al., 2001). These findings suggest that SBEs within certain mRNA targets of GLD-1 are conserved as QREs in mammals. Indeed, our search for QKI targets identified two known GLD-1 targets, the pumilio-1 and pumilio-2 mRNAs (Lee & Schedl, 2001) (Table 2.1 in the supplementary data section (section 2.9)). The mRNA encoding the p53 tumor suppressor has recently been shown to be a GLD-1 target (Schumacher et al., 2005) and may also be a target of QKI: the 3' UTR of mouse p53 harbors a QRE (Table 2.3 in the supplementary data section (section 2.9); Trp53, ACUuAC-N14-UAAU) that binds GST-QKI-5 in EMSAs (data not shown). Our bioinformatic analysis identified 955 putative mRNA targets using a stringent search and 478 additional targets using a more inclusive search. We predict that all of the QREs within the former targets and >70% of the QREs within the latter targets will bind recombinant QKI in EMSAs. Whether these associations occur in vivo will depend on the secondary structure surrounding the QRE and on whether the mRNA is coexpressed temporally with particular QKI isoforms.
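The statement that the QKI SBE hexanucleotide lies within the QRE core can be checked mechanically by expanding both degenerate patterns. The sketch below collapses the A>C preferences of the SBE to "A or C" (IUPAC M) and ignores their ranking; the helper names are hypothetical.

```python
import itertools
import re

# IUPAC codes needed here (RNA alphabet); M = A or C.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "M": "AC", "Y": "CU", "N": "ACGU"}

# QKI SBE NA(A>C)U(A>C)A with the A>C preferences collapsed to "A or C".
SBE_RE = re.compile("[ACGU]A[AC]U[AC]A")


def expand(pattern):
    """All concrete sequences matching a degenerate IUPAC pattern."""
    choices = (IUPAC[symbol] for symbol in pattern)
    return ["".join(bases) for bases in itertools.product(*choices)]


def core_prefixes(core_pattern="NACUAAY"):
    """Distinct first-six-nucleotide prefixes of every core sequence."""
    return {core[:6] for core in expand(core_pattern)}


if __name__ == "__main__":
    prefixes = core_prefixes()
    sbe_hexamers = expand("NAMUMA")
    print("every core prefix matches the SBE:", all(SBE_RE.fullmatch(p) for p in prefixes))
    print(f"{len(prefixes)} of {len(sbe_hexamers)} SBE hexamers begin a core")
```

The converse does not hold: only 4 of the 16 SBE hexamers (those with C at position 3 and A at position 5) begin a core, so the core is a more restricted pattern than the SBE.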

Ryder and Williamson have identified an SBE in the MBP mRNAs (Ryder et al., 2004b) that overlaps with the QRE we defined as MBP:QRE-2 (Figure 2.5a). The GSG/STAR domain of QKI binds the SBE with high affinity; however, their study did not include mutational analysis to define the required surrounding nucleotides (Ryder et al., 2004b). The SBE sequence, 5'-CAGUGCCCAUUGGUACACACUAACCUCGG-3', contains an overlapping putative core (CACUAAC) and half-site (UAAC), which may explain QKI's high affinity for this mRNA. The MBP:QRE-2 sequence we used (Figure 2.5a) contains part of the overlapping core and half-site identified by Ryder and Williamson (Ryder et al., 2004b) and an additional UAAC half-site at the 3' end. Deleting the partially overlapping core and half-site or the additional half-site completely abolished QKI binding (Figure 2.5a). These findings suggest that MBP:QRE-2 contains multiple overlapping core and half-site motifs that define a high-affinity QRE.
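The overlap between the core and the half-site in the SBE can be checked directly. The sketch below locates every match of the core (NACUAAY) and the half-site (UAAY) in the SBE sequence quoted above and reports pairs that share nucleotides. The helper name `motif_spans` is hypothetical, and coordinates are 0-based and end-exclusive.

```python
import re

# SBE sequence from the MBP mRNA, as quoted in the text (5'->3').
SBE = "CAGUGCCCAUUGGUACACACUAACCUCGG"

def motif_spans(seq, pattern):
    """Return (start, end) spans of every match, including overlapping
    ones, found with a zero-width lookahead."""
    return [(m.start(), m.start() + len(m.group(1)))
            for m in re.finditer("(?=(%s))" % pattern, seq)]

core_spans = motif_spans(SBE, "[ACGU]ACUAA[CU]")  # core, NACUAAY
half_spans = motif_spans(SBE, "UAA[CU]")          # half-site, UAAY

# Keep every core/half-site pair whose spans share at least one nucleotide.
overlaps = [(c, h) for c in core_spans for h in half_spans
            if c[0] < h[1] and h[0] < c[1]]

print(core_spans, half_spans, overlaps)
# [(17, 24)] [(20, 24)] [((17, 24), (20, 24))]
```

Because the SBE contains only one UAA trinucleotide, its only candidate half-site (UAAC) lies inside the core, which is the overlap described in the paragraph above.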

Several KH-type RNA-binding proteins are known to regulate pre-mRNA splicing, including KSRP (Min et al., 1997), Nova-1 (Buckanovich & Darnell, 1997), SLM-2 (Stoss et al., 2001), SF1/BBP1 (Arning et al., 1996; Berglund et al., 1997), Sam68 (Matter et al., 2002) and QKI-5 (Wu et al., 2002). The QRE defined herein resembles the intron branchpoint sequence recognized by SF1/BBP1, which in yeast has the consensus sequence UACUAAC (Berglund et al., 1997). A genomic search for the QRE consensus sequence would therefore be expected to identify many binding sites within introns. Nuclear STAR proteins such as QKI-5 may regulate alternative splicing simply by competing with SF1/BBP1 for the branchpoint (Ryder et al., 2004; Butcher & Wickens, 2004). Indeed, QKI-5 regulates the splicing of myelin-associated glycoprotein through a 170-nt sequence termed the QKI alternatively spliced element (QASE) (Wu et al., 2002). The QASE differs from the QRE defined here; because previous QASE binding assays were performed with nuclear extracts, the QKI binding observed was probably indirect. Nevertheless, QKI-5 probably regulates pre-mRNA splicing during embryogenesis.
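Why intronic hits are expected can be made explicit with a back-of-the-envelope calculation. The sketch below first confirms that the branchpoint consensus UACUAAC is itself an instance of the NACUAAY core, then computes the expected number of cores in random sequence. The function name `expected_core_sites` is hypothetical, and the model (independent, equiprobable bases, one strand) is a simplifying assumption; real intron composition is not uniform.

```python
import re
from fractions import Fraction

CORE = "[ACGU]ACUAA[CU]"  # NACUAAY core, RNA alphabet

def expected_core_sites(length, p_base=Fraction(1, 4)):
    """Expected number of NACUAAY cores on one strand of random sequence.

    Assumes independent, equiprobable bases, so the result is only a
    rough baseline, not a prediction for real introns.
    """
    motif_len = 7
    # N: any base (probability 1); A, C, U, A, A: fixed (p_base each);
    # Y: C or U (2 * p_base).
    p_site = p_base ** 5 * (2 * p_base)
    positions = max(length - motif_len + 1, 0)
    return positions * p_site

# The branchpoint consensus itself is an instance of the core.
branchpoint_matches_core = re.fullmatch(CORE, "UACUAAC") is not None

print(branchpoint_matches_core)            # True
print(expected_core_sites(7))              # 1/2048
print(float(expected_core_sites(10_000)))  # about 4.88
```

Under these assumptions, a core occurs about once every 2 kb, so a 10-kb stretch of random sequence would carry roughly five cores before any half-site requirement is applied.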

The diverse cellular functions of QKI are carried out by alternatively spliced QKI isoforms localized in both the nucleoplasm and the cytoplasm (Hardy et al., 1996; Wu et al., 2001). We have identified 1,430 putative mRNA targets; the next challenge will be to determine which ones are regulated at the levels of pre-mRNA processing, mRNA export, mRNA stability and protein translation.

Database annotation identified development and cell growth as the primary categories of putative RNA targets of QKI. Fourteen percent of the mRNA targets identified were genes annotated for development; this was the most significant category (P < 10⁻¹⁰; Table 2.4 in the supplementary data section (section 2.9)). Other categories of high significance were cell adhesion, morphogenesis, organogenesis and cell differentiation (all P < 10⁻⁵). These results are consistent with the role of QKI proteins in development and differentiation, as evidenced by the phenotype of the Qkv mice, the embryonic lethality of QKI-null mice (Li et al., 2003) and the phenotypes associated with QKI homologs in Xenopus laevis and D. melanogaster (Baehrecke, 1997; Zorn & Krieg, 1997; Zaffran et al., 1997). We also noted RNAs annotated as genes for meiosis, male gamete generation and spermatogenesis; RNAs of these categories are known to be regulated by the C. elegans STAR protein GLD-1 (Jones et al., 1996). The most unexpected category was cell growth and/or maintenance, which contained 24% of the mRNA targets. Many gene products implicated in cancer were identified as putative mRNA targets of QKI, including the oncogene products Ras, Jun and Fos, the tumor suppressor p53, and others (Tables 2.1 and 2.3 in the supplementary data section (section 2.9)). These findings suggest a role for the QKI proteins in regulating the cell cycle, proliferation and cancer. In fact, QKI isoforms are underexpressed in some human gliomas, so QKI may be a tumor suppressor (Li et al., 2002), as GLD-1 is (Francis et al., 1995).
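Category enrichment P values of the kind reported above are typically based on a one-sided hypergeometric (Fisher exact) test, as in annotation tools such as DAVID (Dennis et al., 2003), which is cited in this chapter. The sketch below is a minimal version of that test, not the procedure used to produce Table 2.4; DAVID's EASE score is a modified Fisher exact test, so this function will not reproduce those values exactly. The name `enrichment_p` and the counts in the usage line are hypothetical.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided hypergeometric tail, P(X >= k).

    N: genes in the annotated background
    K: background genes carrying the annotation term
    n: genes in the target list
    k: target-list genes carrying the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts out of range")
    upper = min(n, K)
    # Sum the probabilities of observing i = k .. upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical counts, for illustration only.
print(enrichment_p(k=40, n=200, K=1000, N=15000))
```

Exact integer arithmetic via `math.comb` avoids the underflow that naive floating-point products would hit for very small tail probabilities such as P < 10⁻¹⁰.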

2.7 REFERENCES OF CHAPTER 2

Hogan, E.L. & Greenfield, S. Animal models of genetic disorders of myelin, in Myelin (ed. Morell, P.) 489-534 (Plenum Press, New York, 1984).

Ebersole, T.A., Chen, Q., Justice, M.J. & Artzt, K. The quaking gene product necessary in embryogenesis and myelination combines features of RNA binding and signal transduction proteins. Nat. Genet. 12, 260-265 (1996).

Hardy, R.J. et al. Neural cell type-specific expression of QKI proteins is altered in the quaking viable mutant mice. J. Neurosci. 16, 7941-7949 (1996).

Larocque, D. et al. Nuclear retention of MBP mRNAs in the Quaking viable mice. Neuron 36, 815-829 (2002).

Larocque, D. et al. Protection of the p27KIP1 mRNA by quaking RNA binding proteins promotes oligodendrocyte differentiation. Nat. Neurosci. 8, 27-33 (2005).

Darnell, R.B. Paraneoplastic neurologic disorders: windows into neuronal function and tumor immunity. Arch. Neurol. 61, 30-32 (2004).

Lukong, K.E. & Richard, S. Sam68, the KH domain-containing superSTAR. Biochim. Biophys. Acta 1653, 73-86 (2003).

Vernet, C. & Artzt, K. STAR, a gene family involved in signal transduction and activation of RNA. Trends Genet. 13, 479-484 (1997).

Wu, J.I., Reed, R.B., Grabowski, P.J. & Artzt, K. Function of quaking in myelination: regulation of alternative splicing. Proc. Natl. Acad. Sci. USA 99, 4233-4238 (2002).

Li, Z., Zhang, Y., Li, D. & Feng, Y. Destabilization and mislocalization of the myelin basic protein mRNAs in quaking dysmyelination lacking the Qk1 RNA-binding proteins. J. Neurosci. 20, 4944-4953 (2000).

Saccomanno, L. et al. The STAR protein QKI-6 is a translational repressor. Proc. Natl. Acad. Sci. USA 96, 12605-12610 (1999).

Pilotte, J., Larocque, D. & Richard, S. Nuclear translocation controlled by alternatively spliced isoforms inactivates the QUAKING apoptotic inducer. Genes Dev. 15, 845-858 (2001).

Li, Z. et al. Defective smooth muscle development in qkI-deficient mice. Dev. Growth Differ. 45, 449-462 (2003).

Ryder, S.P., Frater, L.A., Abramovitz, D.L., Goodwin, E.B. & Williamson, J.R. RNA target specificity of the STAR/GSG domain post-transcriptional regulatory protein GLD-1. Nat. Struct. Mol. Biol. 11, 20-28 (2004).

Francis, R., Barton, M.K., Kimble, J. & Schedl, T. Control of oogenesis, germline proliferation and sex determination by the C. elegans gene gld-1. Genetics 139, 579-606 (1995).

Ryder, S.P. & Williamson, J.R. Specificity of the STAR/GSG domain protein Qk1: implications for the regulation of myelination. RNA 10, 1449-1458 (2004).

Buckanovich, R.J. & Darnell, R.B. The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo. Mol. Cell. Biol. 17, 3194-3201 (1997).

Cox, R.D. et al. Contrasting effects of ENU induced embryonic lethal mutations of the quaking gene. Genomics 57, 333-341 (1999).

Justice, M.J. & Bode, V.C. Three ENU-induced alleles of the murine quaking locus are recessive embryonic lethal mutations. Genet. Res. 51, 95-102 (1988).

Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406-3415 (2003).

Jensen, K.B., Musunuru, K., Lewis, H.A., Burley, S.K. & Darnell, R.B. The tetranucleotide UCAY directs the specific recognition of RNA by the Nova K-homology 3 domain. Proc. Natl. Acad. Sci. USA 97, 5740-5745 (2000).

Nabel-Rosen, H., Volohonsky, G., Reuveny, A., Zaidel-Bar, R. & Volk, T. Two isoforms of the Drosophila RNA binding protein, How, act in opposing directions to regulate tendon cell differentiation. Dev. Cell 2, 183-193 (2002).

Nagarajan, R. et al. EGR2 mutations in inherited neuropathies dominant-negatively inhibit myelin gene expression. Neuron 30, 355-368 (2001).

Dennis, G., Jr. et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 4, P3 (2003).

Rowitch, D.H. Glial specification in the vertebrate neural tube. Nat. Rev. Neurosci. 5, 409-419 (2004).

Hardy, R.J. QKI expression is regulated during neuron-glial cell fate decisions. J. Neurosci. Res. 54, 46-57 (1998).

Chen, T., Boisvert, F.M., Bazett-Jones, D.P. & Richard, S. A role for the GSG domain in localizing Sam68 to novel nuclear structures in cancer cell lines. Mol. Biol. Cell 10, 3015-3033 (1999).

Itoh, M., Haga, I., Li, Q.-H. & Fujisawa, J.-I. Identification of cellular mRNA targets for RNA-binding protein Sam68. Nucleic Acids Res. 30, 5452-5464 (2002).

Keene, J.D. Ribonucleoprotein infrastructure regulating the flow of genetic information between the genome and the proteome. Proc. Natl. Acad. Sci. USA 98, 7018-7024 (2001).

Liu, Z. et al. Structural basis for recognition of the intron branch site RNA by splicing factor 1. Science 294, 1098-1102 (2001).

Jan, E., Motzny, C.K., Graves, L.E. & Goodwin, E.B. The STAR protein, GLD-1, is a translational regulator of sexual identity in Caenorhabditis elegans. EMBO J. 18, 258-269 (1999).

Chen, T., Cote, J., Carvajal, H.V. & Richard, S. Identification of Sam68 arginine glycine-rich sequences capable of conferring non-specific RNA binding to the GSG domain. J. Biol. Chem. 276, 30803-30811 (2001).

Lee, M.-H. & Schedl, T. Identification of in vivo mRNA targets of GLD-1, a maxi-KH motif containing protein required for C. elegans germ cell development. Genes Dev. 15, 2408-2420 (2001).

Schumacher, B. et al. Translational repression of C. elegans p53 by GLD-1 regulates DNA damage-induced apoptosis. Cell 120, 357-368 (2005).

Min, H., Turck, C.W., Nikolic, J.M. & Black, D.L. A new regulatory protein, KSRP, mediates exon inclusion through an intronic splicing enhancer. Genes Dev. 11, 1023-1036 (1997).

Stoss, O. et al. The STAR/GSG family protein rSLM-2 regulates the selection of alternative splice sites. J. Biol. Chem. 276, 8665-8673 (2001).

Arning, S., Gruter, P., Bilbe, G. & Kramer, A. Mammalian splicing factor SF1 is encoded by variant cDNAs and binds to RNA. RNA 2, 794-810 (1996).

Berglund, J.A., Chua, K., Abovich, N., Reed, R. & Rosbash, M. The splicing factor BBP interacts specifically with the pre-mRNA branch-point sequence UACUAAC. Cell 89, 781-787 (1997).

Matter, N., Herrlich, P. & Konig, H. Signal-dependent regulation of splicing via phosphorylation of Sam68. Nature 420, 691-695 (2002).

Butcher, S.E. & Wickens, M. STAR-studded circuitry. Nat. Struct. Mol. Biol. 11, 2-3 (2004).

Hardy, R.J., Lazzarini, R.A., Colman, D.R. & Friedrich, V.L., Jr. Cytoplasmic and nuclear localization of myelin basic proteins reveals heterogeneity among oligodendrocytes. J. Neurosci. Res. 46, 246-257 (1996).

Wu, H.Y., Dawson, M.R.L., Reynolds, R. & Hardy, R.J. Expression of QKI proteins and MAP1B identifies actively myelinating oligodendrocytes in adult rat brain. Mol. Cell. Neurosci. 17, 292-302 (2001).

Baehrecke, E.H. who encodes a KH RNA binding protein that functions in muscle development. Development 124, 1323-1332 (1997).

Zorn, A.M. & Krieg, P.A. The KH domain protein encoded by quaking functions as a dimer and is essential for notochord development in Xenopus embryos. Genes Dev. 11, 2176-2190 (1997).

Zaffran, S., Astier, M., Gratecos, D. & Semeriva, M. The held out wings (how) Drosophila gene encodes a putative RNA binding protein involved in the control of muscular and cardiac activity. Development 124, 2087-2098 (1997).

Jones, A.R., Francis, R. & Schedl, T. GLD-1, a cytoplasmic protein essential for oocyte differentiation, shows stage- and sex-specific expression during Caenorhabditis elegans germ line development. Dev. Biol. 180, 165-183 (1996).

Li, Z.Z. et al. Expression of Hqk encoding a KH RNA binding protein is altered in human glioma. Jpn. J. Cancer Res. 93, 167-177 (2002).

Francis, R., Barton, M.K., Kimble, J. & Schedl, T. gld-1, a tumor suppressor gene required for oocyte development in Caenorhabditis elegans. Genetics 139, 579-606 (1995).

Galarneau, A., Primeau, M., Trudeau, L.E. & Michnick, S.W. Beta-lactamase protein fragment complementation assays as in vivo and in vitro sensors of protein-protein interactions. Nat. Biotechnol. 20, 619-622 (2002).

2.8 ACKNOWLEDGMENTS

We thank M. Scott for the bioinformatic analyses and F. Major, P. Wilkinson and J. Cote for helpful discussions. This work was supported by grant MOP57692 from the Canadian Institutes of Health Research (CIHR) and by funds from the Multiple Sclerosis Society of Canada. A.G. is a Research Student of the National Cancer Institute of Canada supported with funds provided by the Terry Fox Run. S.R. is an Investigator of the CIHR.

2.9 SUPPLEMENTARY DATA

Table 2.1 Putative mRNA targets identified using the QRE consensus sequence.

Gene Accession # Seauence Gene Annotation Region1 1-acytglyeerol-3-phosphate O-acyltransferase 3 (Agpat3) NMJD53014 actaatattaac phospholipid metabolism 3'UTR 24-dienoyi CoA reductase 1 mitochondrial (DecM) NM_026172 tactaatatatattttttatttggactataat metabolism 3TJTR 3-hydroxy-3-methylglutaryl-Coenzyme A synthase 1 (Hmgcsl) NMJ45942 actaatgtacagaactaaatttctta ac sex differentiation 3UTR 3-hydroxyisobutyryl-Coenzyme A hydrolase (Hibch) NMJ46108 ggtaactttgactttgtttatatactaat metabolism 3'UTR 3-phosphoglycerate dehydrogenase (Phgdh) NMJ) 16966 actaataccta gtaaagaattcttaa c metabolism 3'UTR 4-aminobutyrate aminotransferase (Abat) NMJ 72961 actaacagtgtaac Unknown function 3UTR 5-azacyticine induced gene 2 (Azi2) NM_013727 actaactgactgtgtaat signal transduction 3'UTR 5-hydroxytryptamine (serotonin) receptor 1F (Htrlf) NMJJ08310 actaactcactgtaat GPCR Protein signaling pathway 3'UTR 6-phospho1ructo-2-kinase/fructose-26-biphosphatase 2 (Pfkfb2) NM_008825 a eta alca acca a ga a gtca ttta a c metabolism 3*UTR a disintegrin and metalloprotease domain 28 (Adam28) NMJ 83366 actaacaaggtgtgcattaat macromolecule metabolism CDS a dtsintegrin and metalloprotease domain 29 (Adam29) NMJ 75939 actaatttaat signal transduction 5TJTR a disintegrin and metalloproteinase domain 12(meltrin alpha) NM_QQ7400 actaacattaat proteolysis and pepttdolysis 3'UTR a disintegrin and metalloproteinase domain 30 (Adam30) NM_027665 actaatgaatgaatgaaaaataat proteolysis and pepttdolysis 3'UTR

a disintegrin and metalloproteinase domain 9 (meltrin gamma) NM_007404 actaattgccaataac proteolysis and peptidolysis 3UTR a dtsintegrin-like and metalloprotease (reprolysin type) NMJ 72845 gataactcca a gccatgcactaac signal transduction CDS A kinase (PRKA) anchor protein (yotiao) 9(Akap9) NMJ94462 gttaacaagacttatcaaccaa ctaac Unknown function CDS Abelson helper integration site (Ahi1) NMJD26203 gataatgaagacactaat cell adhesion heterophil!c CDS abhydrolase domain containing 5 and 6 (Abhd5) NM_026179 actaacatagttaac xenobiotrc metabolism 3TJTR abkinteractor 2 (Abi2) NMJ98127 actaataagctttaac protein binding 3'UTR achaete-scute complex homolog-like 1 (Ascll) NM_008553 tattaactcccaacca ctaac neuron differentiation 3'UTR activating transcription factor 7 (Atf7) NMJ46065 gataattaaaactaat transcription regulation 3'UTR activin receptor IIA (Acvr2) NM_007396 cttaatgtctgtcagaagacactaat protein amino acid phosphorylation 3UTR acyl-Coenzyme A binding domain containing 5 (Acbd5) NM_028793 ggtaacattttaaagactaat Unknown liinction 3'UTR adaptor protein complex AP-2 alpha 2 subunit (Ap2a2) NM_007459 cactaatctgtgctgacgacettcaaactaat protein complex assembly CDS adaptor-related protein complex 1 sigma 2 subunit (Ap1s2) NMJ326887 tttaatgtccctgtgtactaat protein transport 3'UTR adaptor-related protein complex 3 mu 1 subunit (Ap3m1) NMJ) 16929 tttaatgccactaac protein transport 3'UTR adducin 1 (alpha) (Add1) NM_013457 ccctaacctgtttgttccattgaacactaac Unknown function CDS adducin 3 (gamma) (Add3) NM_013758 actaactactggagcctgttaat calmodulin binding 3'UTR adenylate cyclase 2 (Adcy2) NMJ 53534 actaaccatggctgtgtttaac GPCR Protein signaling pathway CDS ADP-ribosylation factor related protein 2 (Arfrp2) NMJ 72595 actaatttttttccacttaat Unknown liinction 3'UTR ADP-ribosylation factor-like 6 interacting protein 6 (Arl6ip6) NM_022989 cttaatagccctactaat Unknown function 3'UTR small GTPase mediated signal ADP-ribosylation factor-like 8 (AriB) NM_029466 actaatctcggtcggacgagcataac transduction 3UTR 
adrenergic receptor beta 2 (Adrb2) NMJ307420 actaaccagactatttaac GPCR Protein signaling pathway 3UTR alanine and arginine rich domain containing protein (Aard) NMJ 75503 actaatttagttcttttgttcagttaat Unknown lunction 3'UTR alanyt-tRNA synthetase (Aars) NMJ46217 actaactgccccacccctactgctttaac tRNAiigase activity 3'UTR aldehyde dehydrogenase family 6 subfamily A1 (Aldh6a1) NMJ 34042 actaacatgtaac metabolism 3'UTR aldehyde dehydrogenase family 7 member A1 (Aldh7a1) NMJ 38600 actaattactttttttattacagctaat metabolism 3UTR aldo-keto reductase family (Akr1c20) NM_054080 actaattacagtaac Unknown function 3'UTR

aldo-keto reductase family 1 member C2f (Akrl c21) NM_029901 actaaccaagtccattggggtgtctaac macromolecule metabolism CDS alkylglycerone phosphate synthase (Agps) NMJ 72666 tttaactattataattggcctcaaactaac Unknown function 3TJTR amine oxidase copper containing 3 (Aoc3) NM_009675 actaatgttattactgta gttaat cell adhesion 3UTR amine oxidase flavin containing 1 (Aoft) NMJ 72262 tattaataacccagtagcactaat Unknown lunction CDS amyotrophic lateral sclerosis 2 (juvenile) homolog (Als2) NM_028717 tgtaatttctgagccttactaat signal transduction 3UTR ANKTM1 (Anktml) NMJ777B1 actaacttatgtaccaatggtctta at Unknown function 3'UTR ankyrin repeat and SOCS box-containing protein 3 (Asb3) NM_023906 gttaataccacgtactaac Unknown function CDS ankyrin repeat domain 1 (cardiac muscle) (Ankrdl) NMJD13468 tataataaatgeaaactaat transcription regulation 3'UTR

anthrax toxin receptor 1 (Antxrl) NM_054041 tttaatgaaactaac GPCR Protein signaling pathway CDS apical protein Xenopus laevis-like (Apxl) NMJ72441 actaatttgtgtatatgatctttaat . Unknown function 3'UTR apurinic/apyrimidinic endonudease 2 (Apex2) NM_029943 actaatttaat DNA repair 3UTR aquaporin 11 (Aqp11) NMJ 75105 cataacaatcaaatgactaat macromolecule metabolism CDS aquaporin 7 (Aqp7) NM_007473 ggtaatatccataactaac transport 3UTR arginine vasopressin-induced 1 (Avpil) NMJD27106 actaacttgacagttaat Unknown function 3'UTR armadillo repeat containing protein (Arcp) NMJ328840 ttctaacatatttga caactaat metal ion transport 3UTR ARP6 actin-related protein 6 homolog (Actr6) NM_025914 actaatattattataac Unknown function CDS arrestin beta 1 (Arrbl) NMJ 78220 actaa cctct c ctgtcccccatcctta a c GPCR Protein signaling pathway 3'UTR arrestin domain containing 3 (Arrdc3) NMJ 78917 actaatactagataat Unknown function 3UTR aryt hydrocarbon receptor nuclear translocates 2 (Amt2) NM_007488 cataatgaaactaat signal transduction 3'UTR

aryt-hydrocart)on receptor (Ahr) NM_013464 actaaccgatttcctatattttttaac celt cycle 3'UTR ashl (absent small or homeotic) -Kke (Ashll) NMJ 38679 actaatttggagaaagaaatgtttaat transcription regulation CDS asparaginase like 1 (AsrgH) NMJ325610 tttaacgtgccactaat metabolism glycoprotein 3TJTR asparagine-Iinked glycosylation 6 homolog NMJ78784 actaatttcttagctccagctaac Unknown function 3'UTR aspartoacyiase (aminoacytase) 2 (Aspa) NM_023113 tgctaatcaaaactaat carboxylic acid metabolism 3'UTR AT motif binding factor 1 (Atbft) NMJ307496 actaactgcaattccaaagcttctaac transcription regulation 3UTR

AT rich interactive domain 2 (Arid-rfx like) (Arid2) NMJ 75251 actaacagctgccttaat Unknown function CDS AT rich interactive domain 4B (Rbp1 like) (Arid4b) NMJ 98122 tgtaatgtgtgcttacagtaactaat Unknown fun ebon 3UTR ATPase inhibitor'(Atpi) NMJ3Q7512 actaacagataat Unknown function 3TJTR ATP-binding cassette sub-family B (MDR/TAP) member 11 (Abcb! 1) NM_021022 actaatcttgagtggctttcagtaat transcription regulation 3UTR 91

ATP-binding cassette sub-family C (CFTR/MRP) member 2 (Abcc2) NMJJ13806 actaaccctatctaac transport CDS attract in (Attn) NM_009730 actaactggctcttctggatttgtaac development CDS attractin like protein (Atml) NM_181415 tgtaaca gtgttaaactaat GPCR Protein signaling pathway 3TJTR autophagy 12-like (Apg12t) NM_026217 gttaattccgtgataagaactaat autophagy 3'UTR AXL receptor tyrosine kinase (Axl) NM_009465 actaatgagacaccaaagttctaac cell cycle regulation 3'UTR B and T lymphocyte associated (Btta) NMJ 77584 actaactagtcataaatatagtaat Unknown function 3TJTR baculoviral IAP repeat-containing 6 (Birc6) NM_007566 actaattatgtaat apoptosis 3'UTR basic helix-loop-helix domain containing class B5 (Bhlhb5) NMJ321560 actaatcctactaac transcription regulation 3UTR B-cell receptor-associated protein 29 (Bcap29) NM_007530 actaactttataaaagaagactgtaat apoptosis 3UTR B-cell translocation gene 2 antiproliferative (Btg2) NM_007570 actaattgtataat Unknown function 3UTR BCL2/adenovirus E1B 19kDa-interacting protein 1 NIP2 (Bnip2) NMJ) 16787 actaattatattgtaat apoptosis 3UTR 6H3 interacting domain death agonist (Bid) NM_007544 actaacaaactgagtttaat apoptosis 3UTR bisphosphate 3'-nucleotidase 1 (Bpntl) NM_011794 actaattagctaac transcription regulation 3TJTR Bloom syndrome homolog (Blm) NM_007550 cataattttagaactaat proteolysis and peptidolysis CDS BMP and acBvin membrane-bound inhibitor homolog (Xenopus laevis) NM_026505 actaacaacacaaaacagttacctaat TGFbeta receptor signaling pathway 3UTR BMP2 inducible kinase (Bmp2k) NM_080708 gttaattaatctaactaac regulation of bone mineralization 3'UTR bolboule4ike(Boll) NM_029267 actaatggacttttgtaat Unknown function 3UTR bone morphogenetic protein receptor type 1A (Bmprla) NM_009758 tattaa ta acacatgcata actaat macromotecule metabolism CDS bone morphogenetic protein receptor type 1B (Bmprtb) NMJ)07560 ttta acagaactgactgttagagaaactaat TGFbeta receptor signaling pathway 3TJTR brain protein 44-like (Brp44l) NMJJ18819 actaatttaat Unknown function 3UTR breast cancer 
anti-estrogen resistance 1 (Bcarl) NMJJ09954 actaatagtctacatttaat cell adhesion 3'UTR bromodomain containing 7 (BrdT) NM_012047 gataacttcaagctaatgtgtacraat proteolysis and peptidolysis CDS. BTB (POZ) domain containing 1 (Btbdl) NMJ46193 tttaactcaaataactaat Unknown function 3'UTR BTB and CNC homology t (BacM) NMJ)07520 ggta a ta atctccta g a gttca a acta a c transcription regulation 3TJTR C1q and tumor necrosis factor related protein 3 (C1qtnf3) NM_030888 actaatgttcacaaatcaactttaat Unknown function 3UTR cadherin 11 (Cdh11) NMJ)09866 actaatacgtgccagatataac cell adhesion 3'UTR cadherin 8(Cdh8) NM_007667 actaatgtaac cell adhesion CDS cadherin EGF LAG seven-pass G-type receptor 1 (Celsrl) NMJ309886 cgtaactgtcagtgacactaac development CDS calcium and integrin binding 1 (calmyrin) (Cib1) NM_011870 actaattggtaat signal transduction 3'UTR catcium channel voltage-dependent beta 1 subunit (Cacnbl) NM_031173 ggtaatgaaatgactaac metabolism fatty acid CDS calcium channel voltage-dependent gamma subunit 5 (Cacng5) NM_080644 actaacctgggctgcatgcgactaac ion transport 3UTR calcium channel voltage-dependent Ltype alpha 1C subunit NMJJ09781 cataattgatgtcattctcagtgagactaat provirus integration CDS calcium channel voltage-dependent R type alpha 1E subunit NM_009782 actaactaactcggagcgtaac ion transport CDS calcium modulating ligand (Caml) NM_007596 actaacgatttctcttgtttaat Unknown function 3'UTR calcium response factor (Carf) NMJ39150 tataacaccaagccattaactaat transcription regulation 5UTR calcium/calmodulin-dependent serine protein kinase (Cask) NM_009806 tgctaatggtcatagcagcacta ac macromolecule metabolism CDS calnexin (Canx) NM_007597 tgtaacatgaagcaactaac cell adhesion heterophils 3'UTR calumenin (Calu) NMJJ07594 tataattctcadtaactaat Unknown function 3'UTR camello-like 1 (Cmll) NMJJ23160 actaactctaat cell adhesion regulation 5-UTR cAMP responsive element binding protein 3-like 3 (Creb3l3) NMJ 45365 actaacacctgaaccctaac Unknown function 3UTR cAMP responsive element modulator (Crem) NMJ) 
13498 atctaactttctaaaaetaac metabolism CDS cancer susceptibility candidate 1 (Casd) NMJ 77222 actaatgttaaacagtaat Unknown function 3'UTR carboxypeptidase E (Cpe) NM_013494 actaattgctttaat proteolysis and peptidolysis 3TJTR carnitine palmitoyltransferase 1a liver (Cptl a) NM_013495 tgtaacacactaat metabolism fatty acid 3'UTR cartilage intermediate layer protein nucleotide NMJ 73385 actaatgataat Unknown function 3'UTR casein kinase It alpha 1 polypeptide (Csnk2a1) NMJJ07788 actaacctaac protein amino acid phosphorylation 3UTR CASP8 and FADD-like apoptosis regulator (Cflar) NM_009805 actaat ctagaccagtttcttctataac apoptosis 31ITR catenin alpha 1 (Catnal) NM_009818 a eta a tgga ctctgctgca ga a cattaat cell adhesion 3UTR catenin alpha 2 (Catna2) NMJ45732 ctta a ctga gga ga aa ctaac cell adhesion 3UTR catenin beta (Catnb) NM_007614 actaattcataat cell differentiation regulation 3'UTR CD164 antigen (Cd164) NM_016898 gttaatgccacctttactaat Unknown function CDS CD22 antigen (Cd22) NM_009845 tgtaatcgtcactaac cell adhesion 3UTR CD24a antigen (Cd24a) NM_009846 actaatttaat immune response 3'UTR CD28 antigen (Cd28) NMJD07642 actaatctgcaatggctattttaat immune response 3'UTR CD2-associated protein (Cd2ap) NM_009847 tttaattttcactaat Unknown function 3TJTR CD86 antigen (Cd86) NMJ319388 actaattcaaetaat immune response CDS CDC23 (cell division cycle 23 yeast homolog) (Cdc23) NMJ 78347 tttaatggaaggaacactaac cell cycle 3'UTR CEA-related cell adhesion molecule 2 (Ceacam2) NM_007543 tgtaacaggcactaat cell adhesion CDS CEA-related cell adhesion molecule 9 (Ceacam9) NM_011927 actaacatttaac pregnancy CDS cell division cycle 27 homolog (Cdc27) NMJ45436 actaatacaccttctgtaat cell cycle cytokinesis CDS cell division cycle 37 homolog (Cdc37l) NM_025950 tataataaaatgtctgactaat cell cycle regulation 3'UTR celt division cycle 42 homolog (Cdc42) NM_009B61 tgtaacagactaat signal transduction Rho protein 3'UTR centaurin beta 1 (Centbl) NMJ53788 actaatgctgaeatcgtaac Unknown function CDS cerebellar 
degeneration-related 2 (Cdr2) NM_007672 gcctaactaaagtactttcacaaacactaac Unknown function 3TJTR cerebral cavernous malformations 1 (Ccm1) NM_030675 tgctaatgaagttaaatggacaactaat Unknown function CDS chemokine (C-C motif) receptor 1 (Ccr1) NM_009912 gactaattttgcctctgttagtcatgataat innate immune response CDS chemokine (C-C) receptor 2 (Ccr2) NM_009915 actaacata gacagctcaggattaac GPCR Protein signaling pathway 3UTR chemokine (C-X-C motif) ligand 9 (Cxc!9) NM_008599 gttaatttgaaattataactaac immune response 3TJTR chloride channel calcium activated 3 (Clca3) NM_017474 actaacggaaagtgtgtaat ion transport CDS chloride channel calcium activated 4 (Clca4) NMJ 39148 actaacctaat ion transport CDS chloride intracellular channel 5 (Clic5) NMJ 72621 actaatatttctctctttagtaat ion transport 3'UTR cholecystoWnin B receptor (Cckbr) NMJJ07627 actaatggcttcctgttaat GPCR Protein signaling pathway 3'UTR choline phosphotransferase 1 (Chptl) NMJ44807 actaattcageataac Unknown function 3'UTR cholinergic receptor nicotinic beta polypeptide 2 (neuronal) NM_009602 tataacaagctgatccgtccagctactaat Unknown Junction CDS chondrortin sulfate proteoglycan 2 (Cspg2) NM_019389 gttaacaacactgaaaacgtacattactaat cell adhesion CDS choroideremia-like (Chml) NM_021350 actaacacctaat Rab GTPase activator activity CDS choroidermia (Chm) NM_018818 actaattgatcttctaat signal transduction CDS chromobox homolog 5 (Cbx5) NMJ307626 gttaatcctctttcaaaactaat chromatin assembly/disassembly 3TJTR 92

circadian locomoter output cycles kaput (Clock) NM_007715 actaatctataaacggttggtagtaat signal transduction 3'UTR claudin 12(Ctdn12) NM_022890 actaata atagtgttttccctttgttaat structural molecule activity 3'UTR claudin 18(Cldn18) NMJ319815 actaatactttaat structural molecule activity 3TJTR claudin 8(Cldn8) NM_018778 tataatgacctcacatatgcactaat Unknown function 3TJTR cleavage stimulation factor 3" pre-RNA subunit 2 (Cstf2) NMJ 33196 cttaataattattaagtttactagcactaat polyadenylation 3'UTR coagulation factor 11 (thrombin) receptor (F2r) NM_01Q169 ttta atgecaca gtgactaac GPCR Protein signaling pathway 3'UTR coagulation factor II (thrombin) receptor-like 1 (F2rl1) NM_007974 actaatttgtccaataat GPCR Protein signaling pathway 3'UTR coagu ration factor VIII (F8) NM_007977 tataatcattcaacaactaat Unknown function CDS coatomer protein complex subunit beta 1 (Copbl) NM_033370 acta atttataaggtctgtcatgctaat protein transport CDS Coenzyme A synthase (Coasy) NM_027896 actaacaaatctaat coenzyme A biosynthesis 3'UTR cofactor required for Spt transcriptional activation subunit 6 NM_144933 agctaacctgaaaatattactaac Unknown function 3TJTR cofilin 2 muscle (Cfl2) NM_0076B8 tttaatttatgaccttatgttgagactaat Unknown function 3TJTR collectin sub-family member 12 (Colec12) NM_130449 gataataatageggtactaat immune response 3'UTR colony stimulating factor 1 receptor (Csftr) NM_007779 actaacagcattaac cell cycle regulation 3UTR complement component 8 alpha polypeptide (C8a) NMJ46148 actaatttttaaaaaaatcataat Unknown function 3'UTR complement component fector h (Cfh) NM_009888 actaatgeatacaattaat complement activation 3TJTR complement component factor h-like 1 (CfhM) NM_015780 actaatgeatataat Unknown function 3UTR conserved helix-loop-helix ubiquitous kinase (Chuk) NM_007700 tttaacaccgcaactaat protein amino acid phosphorylation 3'UTR contactin 4 (Cntn4) NM_173004 actaataaatgtctaac transport 3'UTR COP9 (constitutive photomorphogenic) homolog subunit 3 NMJH1991 actaacccccatcttgaggtaac development embryonic 3'UTR 
coxsackievirus and adenovirus receptor (Cxadr) NM.Q09988 tata actctagtaa agactaat Unknown function 3UTR culltn 1 (Cult) NM_012042 actaatggagaactagctgtaac cell cycle 5'UTR cullin 4B (Cul4b) NM_028288 cata atgctgtagccattggata aactaat cell cycle 3'UTR CWF19-fike 2 cell cycle control (S. pombe) (Cwf19l2) NM_027545 actaactcattacatgtttattttaat Unknown function 3'UTR

CXORFt5(493244lK18) NM_178935 actaacttagtactcataaagtagtcctaat neg reg of transcription, dna-dependent 3TJTR CXXC finger 5 (Cxxc5) NM_133687 actaataagcactactgtaat DNA binding 3'UTR cyclic nucleotide gated channel alpha 1 (Cngal) NM_007723 tttaaacaagcgcagatattaaactaac metal ion transport 5'UTR cyclin A1 (Ccnal) NM„007628 actaacgtcagttgtacataac cell cycle 3'UTR cyclin B1 (Ccnbl) NM_172301 tttaacagtggatccaactaat cell cycle 3UTR cyclin B3(Ccnb3) NM_183015 ttctaatggagaagccactaac cell proliferation CDS cyclin C (Ccnc) NM_016746 a ctaatactggtttttgattagataat cell cycle 3UTR cyclin G1 (Ccngl) NM_009831 actaattgagtcggcccatgataat celt cycle CDS cyclin K (Ccnk) NM_009832 gttaatagcttcagtaaggagtgactaac cell cycle 3TJTR eyelin-dependent kinase inhibitor 1B(P27) (Cdknlb) NM_009875 actaatgctcccacagaattgattttaac cell cycle 3'UTR cylindromatosis (turban tumor syndrome) (Cytd) NM_173369 cttaatttaaataagactaat Unknown function 3'UTR cysteine and histidne rich 1 (Cyhrl) NMJ80962 tataattttactaat Unknown function 31ITR cysteine knot superfamily 1 BMP antagonist 1 (CKtsftbl) NM_011824 actaatgggggaggtaac development embryonic 3UTR cysteinyl leukotriene receptor t (Cysltrl) NM_021476 tataactgtgaactaac GPCR Protein signaling pathway 3UTR cybdine 5-triphosphate synthase 2 (Ctps2) NM_018737 a ctaactgtgcagtccactataat pyrimidine nucleotide biosynthesis 3'UTR cytochrome c somatic (Cycs) NM_007808 acctaatagcttatcttaaaaaggctactaat metabolism CDS cytochrome P450 famfly 1 subfamily a polypeptide 1 (Cyp1a1) NMJ309992 actaatggcaagagcatgacttttaac electron transport CDS cytochrome P450 famBy 2 subfamily c polypeptide 55 (Cyp2c55) NM_028089 actaatatgttaaaagtgtaat electron transport 3TJTR cytochrome P450 famiy 2 subfamily r polypeptide 1 (Cyp2r1) NMJ 77382 gata aca a atgctgtttca a a c ata a eta a c Unknown function CDS cytochrome P450 famiy 26 subfamily a polypeptide 1 (Cyp26a1) NMJJ07811 actaacaaggaggaatttaat electron transport CDS cytoplasmic FMRt interacting protein 1 (Cyfipl) NM_011370 cataataccaactaat 
Unknown function 3UTR cytoplasmic polyadenylation element binding protein 1 (Cpebl) NMJJ07755 tgctaactctgtctttgtttctgcactaat cell proliferation 3'UTR cytotoxic T-lymphocyte-assoriated protein 4 (Ctla4) NM_009843 actaacactgttagtgttttttttttaac immune response 3'UTR DEAD (Asp-Glu-Ala-Asp) box polypeptide 3 Y-linked (Ddx3y) NM_012008 actaatgeagataac DNA binding 3TJTR DEAD (Asp-Gtu-Ala-Asp) box polypeptide 42 (Ddx42) NMJJ28074 actaattcagcacagcagggccataat nucleic acid binding CDS DEAD (Asp-Glu-Ala-Asp) box polypeptide 6 (Ddx6) NMJ81324 actaactctgaaaggagtaac nucleic acid binding CDS DEAH (Asp-Glu-Ala-His) box polypeptide 9 (Dhx9) NM_007842 actaacactggaccagataat helicase activity CDS defensin beta 12 (Defb12) NM_152802 actaattattatgggattaat Unknown function 3UTR degenerative spermatocyte homolog (Degs) NM_007853 actaacatcctgtgctattaat oxidoreductase activity 3UTR desmocollin 1 (Dsc1) NMJ)13504 cttaatgaagcccagttcactaac cell adhesion CDS desmoglein 1 beta (Dsglb) NMJ 81682 ggtaatcgtgacccagtgactaat Unknown function CDS desmoglehi 3(Dsg3) NM_030596 actaatttggctcttagtggataat celt adhesion 3UTR developmental regulated RNA binding protein 1 (Drbpl) NMJ 53405 actaactagtgacataat Unknown function 3'UTR diaphanous homolog 2 (Drap2) NM_017398 actaaca agtttgca agttttaac ceil cycle cytokinesis 3'UTR

DiGeorge syndrome critical region gene 8 (Dgcr8) | NM_033324 | gactaaccattgcccttaaaacatacttaac | Unknown function | 3'UTR
dihydrolipoamide dehydrogenase (Dld) | NM_007861 | actaatagattttacctctaat | electron transport | 3'UTR
dihydropyrimidine dehydrogenase (Dpyd) | NM_170778 | cataataaacctaactaac | electron transport | 3'UTR

Dip3 beta (Dip3b) | NM_145220 | gattaatgctgtctgtaccactaac | cell proliferation | CDS
dipeptidylpeptidase 4 (Dpp4) | NM_010074 | actaatgtttaac | proteolysis and peptidolysis | 5'UTR
disabled homolog 2 (Dab2) | NM_023118 | tttaataaaatgaaagcaagactaac | signal transduction | 3'UTR
disrupted meiotic cDNA 1 homolog (Dmc1h) | NM_010059 | actaatatataat | cell cycle | 3'UTR
DNA methyltransferase 2 (Dnmt2) | NM_010067 | tataacatggagaaactaac | DNA methylation | 3'UTR
DNA primase p58 subunit (Prim2) | NM_008922 | actaacttttaaagttttgtataat | metabolism | 3'UTR
DnaJ (Hsp40) homolog subfamily A member 2 (Dnaja2) | NM_019794 | tataatttaaactaac | protein folding | 3'UTR
dopachrome tautomerase (Dct) | NM_010024 | actaatgaggagctcttcctaac | metabolism | CDS
doublesex and mab-3 related transcription factor like family A1 | NM_175647 | tgtaactgttacagactaac | Unknown function | 3'UTR
Down syndrome critical region gene 1-like 2 (Dscr1l2) | NM_022980 | tgtaacaaacgttgatgttgaaactaat | Unknown function | 3'UTR
dual adaptor for phosphotyrosine and 3-phosphoinositides 1 (Dapp1) | NM_011932 | actaattaaaactctttaac | signal transduction | 3'UTR
dual specificity phosphatase 16 (Dusp16) | NM_181320 | tgtaatctcaacatgaactgcagactaac | signal transduction | 3'UTR
dual specificity phosphatase 18 (Dusp18) | NM_173745 | agctaactgctgttctagactaat | dephosphorylation | 3'UTR
dynactin 4 (Dctn4) | NM_026302 | actaacattgagttgcaaaccttaat | Unknown function | 3'UTR
dystonin (Dst) | NM_010081 | tgtaacccgcgcaactaac | oligosaccharide biosynthesis | CDS
dystonin (Dst) | NM_133833 | tgtaacccgcgcaactaac | cell cycle arrest | CDS
dystrophia myotonica-containing WD repeat motif (Dmwd) | NM_010058 | tattaacaagactaat | cell proliferation | 3'UTR

dystrophin muscular dystrophy (Dmd) | NM_007868 | tttaacaccaacactgtaacatttactaat | Unknown function | 3'UTR
E2F transcription factor 7 (E2f7) | NM_178609 | actaattatgcatttaac | Unknown function | 3'UTR
E74-like factor 5 (Elf5) | NM_010125 | actaataagtctcttaac | transcription regulation | 3'UTR
early growth response 2

growth hormone secretagogue receptor (Ghsr) | NM_177330 | actaattacaagcctttaat | GPCR Protein signaling pathway | 3'UTR
GrpE-like 1 mitochondrial (Grpel1) | NM_024478 | gttaatcagtcaacactaac | Unknown function | 3'UTR

GTPase activating RANGAP domain-like 1 (Garnl1) | NM_019994 | tgtaatttgtgtttcattaaagtttactaat | transcription regulation | CDS
guanine nucleotide binding protein alpha 13 (Gna13) | NM_010303 | actaacttaat | cell differentiation | 3'UTR
guanylate cyclase 1 soluble alpha 3 (Gucy1a3) | NM_021896 | cgtaacaacaatgtacttcctactaat | signal transduction | 3'UTR
guanylate cyclase 2e (Gucy2e) | NM_008192 | actaacgcccgaggaagtaat | protein amino acid phosphorylation | CDS

guanylate cyclase activator 1B (Guca1b) | NM_146079 | tttaactgggagatgtggggatcctactaat | cell-cell signaling | 3'UTR
guanylate nucleotide binding protein 2 (Gbp2) | NM_010260 | tgtaatcacagaacaggcaacttttaactaac | response to external stimulus | 3'UTR
H3 histone family 3B (H3f3b) | NM_008211 | ggtaacacaacactaac | Unknown function | 3'UTR

hairy/enhancer-of-split related with YRPW motif-like (Heyl) | NM_013905 | actaatctttattacatacttgttaat | transcription regulation | 3'UTR
hematopoietic cell transcript 1 (Hemt1) | NM_010416 | tataacttgcaatgtgggaggccccactaat | Unknown function | 3'UTR
hemogen (Hemgn) | NM_053149 | actaacatttttaaggctctaac | Unknown function | 3'UTR
hepatitis A virus cellular receptor 2 (Havcr2) | NM_134250 | actaatctcaaatgttttaaagtaat | protein amino acid phosphorylation | 3'UTR
hephaestin (Heph) | NM_181273 | actaataagtaac | differentiation erythrocyte | 3'UTR
heterogeneous nuclear ribonucleoprotein A1 (Hnrpa1) | NM_010447 | actaattgtataac | RNA processing | 3'UTR
high mobility group AT-hook 2 (Hmga2) | NM_010441 | cttaacttactaat | transcription regulation | 3'UTR
high mobility group AT-hook 2 (Hmga2) | NM_178057 | cttaacttactaat | transcription regulation | 3'UTR
high mobility group box 1 (Hmgb1) | NM_010439 | actaataaactaat | transport | 3'UTR
histidine ammonia lyase (Hal) | NM_010401 | actaattgtactaat | metabolism | 3'UTR
histocompatibility 2 T region locus 10 (H2-T10) | NM_010395 | actaatagagatagggtttaat | immune response | 3'UTR
histocompatibility 2 T region locus 17 (H2-T17) | NM_010396 | actaatagagatagggtttaat | immune response | 3'UTR
histocompatibility 2 T region locus 22 (H2-T22) | NM_010397 | actaatagagatagggtttaat | immune response | 3'UTR
histocompatibility 2 T region locus 9 (H2-T9) | NM_010399 | actaatagagatagggtttaat | immune response | 3'UTR
histone 1 H3g (Hist1h3g) | NM_145073 | actaatattgtaac | Unknown function | 3'UTR

HIV-1 Rev binding protein (Hrb) | NM_010472 | actaattttagtttatctttttgttaat | Unknown function | 3'UTR
holocarboxylase synthetase (biotin-[propionyl-Coenzyme | NM_139145 | tttaatttcctttctgtaagttaaaactaat | protein modification | 3'UTR
homeo box C5 (Hoxc5) | NM_175730 | actaacctttgtaac | development | 3'UTR
homeodomain leucine zipper-encoding gene (Homez) | NM_183174 | gataatgtctgactcgatttgtcactaat | transcription regulation | 3'UTR
homer homolog 1 (Homer1) | NM_152134 | actaacagaattacgagataat | Unknown function | CDS
huntingtin interacting protein 2 (Hip2) | NM_016786 | tttaatggagggtggcttggtaacactaat | ubiquitin cycle | 3'UTR
Hus1 homolog (S. pombe) (Hus1) | NM_008316 | tttaattaaactaat | protein amino acid phosphorylation | 3'UTR
hydrocephalus inducing (Hydin) | NM_172916 | tataaccattcacaatcgcactaat | RNA-directed DNA polymerase activity | CDS
hydroxymethylbilane synthase (Hmbs) | NM_013551 | tgtaaccaataccactaat | metabolism heme | 3'UTR
hydroxysteroid (17-beta) dehydrogenase 4 (Hsd17b4) | NM_008292 | gataacttgcatattttcattttctactaat | metabolism | 3'UTR
hydroxysteroid 11-beta dehydrogenase 1 (Hsd11b1) | NM_008288 | actaatgtaac | development lung | 3'UTR
hypoxia inducible factor 1 alpha subunit (Hif1a) | NM_010431 | gttaactcagtttgaactaac | signal transduction | CDS
influenza virus NS1A binding protein (Ivns1abp) | NM_054102 | actaacaggcttagtgatgtaat | Unknown function | 3'UTR
inhibitor of growth family member 1-like (Ing1l) | NM_023503 | actaatttcattataat | transcription regulation | 3'UTR
inhibitor of kappaB kinase beta (Ikbkb) | NM_010546 | actaactactcttgcatctaac | protein amino acid phosphorylation | 3'UTR
inositol 1,4,5-triphosphate receptor 3 (Itpr3) | NM_080553 | actaatgtgttaat | ion transport | 3'UTR

inositol polyphosphate-5-phosphatase F (Inpp5f) | NM_178641 | actaaccgtgtgtctaat | Unknown function | CDS
insulin-like growth factor 2 binding protein 3 (Igf2bp3) | NM_023670 | actaacatggataac | Unknown function | 3'UTR
insulin-like growth factor binding protein 7 (Igfbp7) | NM_008048 | cttaacctaacccactaac | cell growth regulation | 3'UTR
integral membrane protein 2B (Itm2b) | NM_008410 | actaatctggatttttgtgttaat | Unknown function | 3'UTR
inter-alpha trypsin inhibitor heavy chain 3 (Itih3) | NM_008407 | cataacaatggagaaggactaat | Unknown function | CDS
interferon alpha-inducible protein (G1p2) | NM_015783 | ggtaacaatttcctggtgtctgtgactaac | immune response | CDS
interferon inducible protein 1 (Ifi1) | NM_008326 | actaacatcgaatcacacataat | Unknown function | 5'UTR
interferon-related developmental regulator 2 (Ifrd2) | NM_025903 | ggctaacggtcctagaggactaac | cell proliferation | 3'UTR

interleukin 1 receptor-like 1 (Il1rl1) | NM_010743 | actaatatgaaaacatttttaat | DNA methylation | 3'UTR
interleukin 17 receptor E (Il17re) | NM_145826 | actaatgtaat | Unknown function | 5'UTR
interleukin 18 (Il18) | NM_008360 | tgtaatgttcactctcactaac | immune response | CDS
interleukin 2 receptor alpha chain (Il2ra) | NM_008367 | actaatgtaaataat | signal transduction | 3'UTR

interleukin 23 alpha subunit p19 (Il23a) | NM_031252 | ttctaacagaatctagtcactaagaactaac | Unknown function | 3'UTR
interleukin 7 (Il7) | NM_008371 | tataactttgttaagagagaaaacactaat | immune response | 3'UTR
intracisternal A particles (Iap) | NM_010490 | actaactaggaactgggtttggccttaat | DNA recombination | 3'UTR
isocitrate dehydrogenase 3 (NAD+) alpha (Idh3a) | NM_029573 | gataacattattctaatactaat | metabolism | 3'UTR
jumonji domain containing 2B (Jmjd2b) | NM_172132 | tttaactgcgctgagtccactaac | transcription regulation | CDS
jumonji domain containing 2C (Jmjd2c) | NM_144787 | actaatgtaac | transcription regulation | 3'UTR
junction adhesion molecule 2 (Jam2) | NM_023844 | acctaactgcacactaat | Unknown function | 3'UTR
karyopherin (importin) alpha 3 (Kpna3) | NM_008466 | actaaccgttaagtaac | protein transport | 3'UTR
karyopherin (importin) beta 3 (Kpnb3) | NM_023579 | cttaactaccacctgccagaagactaac | protein transport | 3'UTR
kelch repeat and BTB (POZ) domain containing 2 (Kbtbd2) | NM_145958 | tataattccactgcacttaacactaat | Unknown function | 3'UTR
kelch-like 4 (Klhl4) | NM_172781 | tataatttgctatactaat | Unknown function | 3'UTR
kelch-like 5 (Klhl5) | NM_175174 | actaatacttgattaat | Unknown function | 3'UTR
kelch-like ECH-associated protein 1 (Keap1) | NM_016679 | actaaccggcttaac | transcription regulation | CDS
keratin complex 1 acidic gene 12 (Krt1-12) | NM_010661 | actaactttaat | cytoskeleton organization and biogenesis | 3'UTR
killer cell immunoglobulin-like receptor three domains long | NM_177749 | cttaatgtagtttactaat | Unknown function | 3'UTR
killer cell lectin-like receptor subfamily A member 5 (Klra5) | NM_008463 | actaactgtaat | cell adhesion | CDS
killer immunoglobulin-like receptor-like 2 (Kirl2) | NM_177748 | cttaatgtagtttactaat | Unknown function | 3'UTR
kinase interacting with leukemia-associated gene (stathmin) (Kist) | NM_010633 | tttaatcctactaat | protein amino acid phosphorylation | 3'UTR
kinesin family member 23 (Kif23) | NM_024245 | actaatgaccgaggaccttaac | microtubule-based process | CDS
kinesin-like 7 (Knsl7) | NM_010620 | cttaactatgaatcatggtgcgcactaat | microtubule-based process | 3'UTR
kinetochore associated 2 (Kntc2) | NM_023294 | actaatctttcatagatataac | Unknown function | 3'UTR
kit ligand (Kitl) | NM_013598 | actaactcaattcttatagtaat | development germ-cell | 3'UTR
lactamase beta 2 (Lactb2) | NM_145381 | actaatttaaaatattatgtgattaat | Unknown function | 3'UTR

laminin alpha 2 (Lama2) | NM_008481 | tattaataaaactaat | cell adhesion | 3'UTR
LanC (bacterial lantibiotic synthetase component C)-like 1 | NM_021295 | actaactagtacaaactctaac | GPCR Protein signaling pathway | 3'UTR
latent transforming growth factor beta binding protein 2 (Ltbp2) | NM_013589 | actaatactaac | calcium ion binding | 3'UTR
lecithin-retinol acyltransferase | NM_023624 | actaactttattgtaat | vitamin A metabolism | 3'UTR
leucine rich repeat and fibronectin type III domain containing 5 | NM_178714 | actaatgatccggtataaggtttgtaac | Unknown function | CDS

leucine rich repeat containing 16 (Lrrc16) | NM_026825 | actaataaataat | metabolism glycolysis | 3'UTR
leucine rich repeat containing 21 (Lrrc21) | NM_146245 | gttaacgaggcaagcagagacactaat | Unknown function | 3'UTR
leucine rich repeat protein 3 neuronal (Lrrn3) | NM_010733 | actaatattaat | Unknown function | CDS
leucine rich repeat transmembrane neuronal 4 (Lrrtm4) | NM_178731 | actaacagaaaggcaacttataac | Unknown function | 3'UTR
leucine zipper protein 2 (Luzp2) | NM_178705 | tataattatcactaac | Unknown function | 3'UTR
leucine-rich repeat LGI family member 1 (Lgi1) | NM_020278 | actaaccaaaccgacattcctaac | Unknown function | CDS
leucine-rich repeat-containing 6 (testis) (Lrrc6) | NM_019457 | gataacccagaagtgcctccactaat | Unknown function | CDS
leukemia inhibitory factor receptor (Lifr) | NM_013584 | cgtaatacagagactaat | Unknown function | CDS
ligase IV DNA ATP-dependent (Lig4) | NM_176953 | actaatggatgaattagacgtcctaat | DNA repair | CDS
LIM and senescent cell antigen-like domains 1 (Lims1) | NM_026148 | cttaacatacttgtaactactaat | cell adhesion | 3'UTR
LIM domain only 4 (Lmo4) | NM_010723 | actaatgaagctaat | transcription from Pol II promoter | 3'UTR
lin 7 homolog c (C. elegans) (Lin7c) | NM_011699 | actaatttgtaac | Unknown function | 3'UTR
lipoma HMGIC fusion partner (Lhfp) | NM_178358 | actaatcatgttcacgtgcatctaat | Unknown function | 3'UTR
lipopolysaccharide binding protein (Lbp) | NM_008489 | actaatgtatttgcctcattaac | transport lipid | CDS
low density lipoprotein receptor-related protein 6 (Lrp6) | NM_008514 | actaatcgtattgaagtttctaat | receptor activity | CDS
LSM8 homolog U6 small nuclear RNA associated | NM_133939 | actaatgcaaataat | processing | 3'UTR
lymphocyte antigen 78 (Ly78) | NM_008533 | actaatctgagccttaac | immune response | CDS
lymphoid nuclear protein related to AF4-like (Laf4l) | NM_033565 | gttaatggtgagcaaagtattatatactaac | Unknown function | 3'UTR
lysosomal trafficking regulator (Lyst) | NM_010748 | tttaacacaaactaac | signal transduction | CDS

M phase phosphoprotein 6 (Mphosph6) | NM_026758 | actaattattgcctttgtttaac | cell cycle regulation | 3'UTR
Machado-Joseph disease (spinocerebellar ataxia 3) | NM_029705 | actaataaaatatatttaaataat | transcription regulation | 3'UTR
macrophage scavenger receptor 1 (Msr1) | NM_031195 | actaattcatttaac | transport endocytosis | 3'UTR
male enhanced antigen 1 (Mea1) | NM_010787 | actaacaactctggtcttaac | development | 3'UTR
malic enzyme supernatant (Mod1) | NM_008615 | tcctaatgacaacttgtgactaac | carboxylic acid metabolism | 3'UTR
mannose-6-phosphate receptor cation dependent (M6pr) | NM_010749 | actaatcccaagtcagccctccagtaat | transport | CDS
mannosidase 2 alpha 1 (Man2a1) | NM_008549 | cttaattgaccaactaat | metabolism carbohydrate | CDS
MAP/microtubule affinity-regulating kinase 3 (Mark3) | NM_021516 | actaatctttttagtaaattaac | protein amino acid phosphorylation | CDS
MAS1 oncogene (Mas1) | NM_008552 | actaactgactaac | GPCR Protein signaling pathway | 5'UTR
MAS-related GPR member A2 (Mrgpra2) | NM_153101 | gataatgacaatgagtgtctggcaactaac | neurotransmitter transport | CDS
matrix metalloproteinase 1a (interstitial collagenase) (Mmp1a) | NM_032006 | actaactacaaattaat | collagenase activity | 3'UTR
matrix metalloproteinase 1b (interstitial collagenase) (Mmp1b) | NM_032007 | ttctaatgatgaagaggcactaat | Unknown function | CDS
matrix metalloproteinase 7

melanoma antigen (Mela) | NM_008581 | tttaacctctcctgggaagtgactaat | Unknown function | CDS
melanoma antigen family L 2 (Magel2) | NM_013779 | actaattgtaat | Unknown function | 3'UTR
melatonin receptor 1A (Mtnr1a) | NM_008639 | actaatacccaataat | GPCR Protein signaling pathway | CDS
membrane protein palmitoylated 6 (MAGUK p55 subfamily member 6) | NM_019939 | actaatatataat | signal transduction | 3'UTR
membrane-spanning 4-domains subfamily A member 11 (Ms4a11) | NM_022431 | gttaatcagcaaccatgaaaaacatactaac | Unknown function | 3'UTR
membrane-spanning 4-domains subfamily A member 4B (Ms4a4b) | NM_021718 | actaatgataggaaccaattttaat | Unknown function | 3'UTR

membrane-spanning 4-domains subfamily A member 6D (Ms4a6d) | NM_026835 | gttaatcagcaaccatgaaaaacatactaac | Unknown function | 3'UTR
meprin 1 alpha (Mep1a) | NM_008585 | cttaataaaaccactaac | proteolysis and peptidolysis | 3'UTR
metal response element binding transcription factor 1 (Mtf1) | NM_008636 | actaataatcccaccataac | transcription regulation | CDS
methionine aminopeptidase 2 (Metap2) | NM_019648 | tattaacagctgtaaaggatgccactaat | macromolecule metabolism | CDS
methyl-CpG binding domain protein 2 (Mbd2) | NM_010773 | tttaatagcactaac | transcription regulation | 3'UTR
methyl-CpG binding domain protein 3-like 2 (Mbd3l2) | NM_144934 | actaatattaat | Unknown function | 3'UTR
methylmalonic aciduria (cobalamin deficiency) type A (Mmaa) | NM_133823 | tcctaatcggcgtttactaac | Unknown function | CDS
methyltransferase like 2 (Mettl2) | NM_172567 | actaacaataac | SAM-dependent methyltransferase activity | CDS
microtubule associated serine/threonine kinase-like (Mastl) | NM_025979 | tttaacagtcatattaacgcatctactaat | Unknown function | CDS
microtubule-associated protein RP/EB family member 2 (Mapre2) | NM_153058 | tttaattgatgggatactaac | Unknown function | 3'UTR

Mid-1-related chloride channel 1 (Mclc) | NM_145543 | actaacagggtatttaat | Unknown function | 3'UTR
midline 2 (Mid2) | NM_011845 | tttaataaactaat | Unknown function | 3'UTR
mitochondrial folate transporter/carrier (Mftc) | NM_172402 | actaatactaac | transport | 3'UTR
mitochondrial ribosomal protein L35 (Mrpl35) | NM_025430 | actaattttattgtaac | Unknown function | 3'UTR

mitochondrial ribosomal protein L50 (Mrpl50) | NM_178603 | actaacggagatttgttaaagggtaat | Unknown function | 3'UTR
mitogen activated protein kinase 9 (Mapk9) | NM_016961 | ggtaatggaactaat | receptor activity | CDS
mitogen activated protein kinase kinase 4 (Map2k4) | NM_009157 | cgtaactcactaac | protein amino acid phosphorylation | 3'UTR
mitogen-activated protein kinase kinase kinase 9 (Map3k9) | NM_177395 | actaaccacatgttaat | signal transduction | 3'UTR
mitogen-activated protein kinase kinase kinase kinase 5 (Map4k5) | NM_024275 | cataattttgtcaaaatagcactaac | protein amino acid phosphorylation | CDS
Mki67 (FHA domain) interacting nucleolar phosphoprotein (Mki67ip) | NM_026472 | actaatttgggtttcttgaatataac | RNA binding | 3'UTR
MMS19 (MET18 S. cerevisiae)-like (Mms19l) | NM_028152 | atctaacctgcaagtactaac | Unknown function | 3'UTR
moesin (Msn) | NM_010833 | actaacactgtgctggagccactaac | structural molecule activity | 3'UTR
molybdenum cofactor synthesis 2 (Mocs2) | NM_013826 | actaactcctaac | sulfur metabolism | 3'UTR
Msx-interacting-zinc finger (Miz1) | NM_008602 | actaacaacaacttctgtttaat | transcription DNA-dependent | 3'UTR
multiple inositol polyphosphate histidine phosphatase 1 (Minpp1) | NM_010799 | aattaatgataaactaat | Unknown function | CDS
mutS homolog 4 (E. coli) (Msh4) | NM_031870 | tattaattgacgagcttggcagaggcactaat | DNA metabolism | CDS
myelin and lymphocyte protein T-cell differentiation protein | NM_010762 | ccctaacattacactaac | Unknown function | 3'UTR
myelin basic protein (Mbp) | NM_010777 | actaacctcggtggaaaaataac | structural molecule activity | 3'UTR
myeloid leukemia factor 2 (Mlf2) | NM_145385 | actaatgaffigtgcaacttgtaac | Unknown function | 3'UTR
myeloid/lymphoid or mixed lineage-leukemia translocation to 3 | NM_027326 | actaactttcaataac | anterior/posterior pattern formation | CDS
myomesin 2 (Myom2) | NM_008664 | actaacaactgggtccagtgtaac | development muscle | CDS
myoneurin (Mynn) | NM_030557 | actaacttaac | DNA binding | 3'UTR
myosin IB (Myo1b) | NM_010863 | actaatgagttttaat | cytoskeleton organization and biogenesis | 3'UTR
myotrophin (Mtpn) | NM_008098 | actaacttgagattttaat | transcription regulation | 3'UTR
myotubularin related protein 2 (Mtmr2) | NM_023858 | cataattaaagtatgacactaat | protein amino acid dephosphorylation | 3'UTR
NACHT LRR and PYD containing protein 9a (Nalp9a) | NM_194056 | actaatctctctaat | Unknown function | CDS
naked cuticle 1 homolog (Nkd1) | NM_027280 | tattaataattattgttactccactaat | Unknown function | 3'UTR
N-deacetylase/N-sulfotransferase (heparin glucosaminyl) 4 (Ndst4) | NM_022565 | tataattactcggaagcactaac | metabolism | CDS
nemo like kinase (Nlk) | NM_008702 | tgtaattttactaat | protein amino acid phosphorylation | 3'UTR
nephronophthisis 1 (juvenile) homolog (Nphp1) | NM_016902 | actaataataat | Unknown function | 3'UTR
neural cell adhesion molecule 2 (Ncam2) | NM_010954 | tataactgcacagctactaac | cell adhesion | CDS
neural precursor cell expressed developmentally down-regulated | NM_008682 | tgtaaccactaat | signal transduction | CDS
neurexin I (Nrxn1) | NM_020252 | actaattctgtataac | cell adhesion | CDS
neuroblastoma myc-related oncogene 1 (Nmyc1) | NM_008709 | gttaatctgttatgtactgtactaat | cell cycle regulation | 3'UTR
neurofibromatosis 1 (Nf1) | NM_010897 | actaatattttgtaac | cell differentiation regulation-glia | 3'UTR
neuroligin 1 (Nlgn1) | NM_138666 | actaactgctgtaac | cell adhesion | 3'UTR
neuropeptide Y receptor Y5 (Npy5r) | NM_016708 | actaatatttaac | regulation of synapse | 3'UTR
NIMA (never in mitosis gene a)-related expressed kinase 6 (Nek6) | NM_021606 | actaatcagtttcagaggaccacaccactaac | macromolecule metabolism | 3'UTR
N-myc downstream regulated 4 (Ndr4) | NM_145602 | gttaactgtgcagaaaaactaat | Unknown function | 3'UTR
Norrie disease homolog (Ndph) | NM_010883 | agctaactgctactaaaataactaac | Unknown function | 3'UTR
nuclear DNA binding protein (C1d) | NM_020558 | actaacttttaat | transcription regulation | 3'UTR
nuclear factor of activated T-cells 5 (Nfat5) | NM_133957 | actaaccccttctctctaat | transcription regulation | 3'UTR
nuclear protein 15.6 (Np15) | NM_019435 | actaacagaaatgataac | oxidoreductase activity | 5'UTR
nuclear receptor coactivator 2 (Ncoa2) | NM_008678 | actaatgagcctcagcttgtaat | signal transduction | CDS
nuclear receptor coactivator 3 (Ncoa3) | NM_008679 | gttaatggagtttcttggactaat | protein amino acid phosphorylation | CDS
nuclear receptor coactivator 6 (Ncoa6) | NM_018825 | actaatcctcctaac | development nervous system | CDS
nuclear receptor interacting protein 1 (Nrip1) | NM_173440 | actaatttttgcttaac | transcription regulation | 3'UTR
nuclear receptor subfamily 0 group B member 1 (Nr0b1) | NM_007430 | tttaataaaatttaaggtactaac | transcription regulation | 3'UTR
nuclear receptor subfamily 2 group F member 1 (Nr2f1) | NM_010151 | actaataatttttgatataac | cell cycle regulation | 3'UTR
nuclear receptor subfamily 6 group A member 1 (Nr6a1) | NM_010264 | actaatggagagactgacagtttaac | transcription regulation | CDS
nuclear receptor-binding SET-domain protein 1 (Nsd1) | NM_008739 | actaattcccatgcagaccatttaat | transcription regulation | CDS
nuclear transcription factor-Y beta (Nfyb) | NM_010914 | actaattgaggtgttaat | transcription regulation | 3'UTR
nucleolar protein 3 (apoptosis repressor with CARD domain) (Nol3) | NM_030152 | actaatcctggcctgaacgtgggataac | apoptosis | 3'UTR
nucleolar protein 7 (Nol7) | NM_023554 | actaatatctgtagaagtaaatttttaat | Unknown function | 3'UTR
nucleoporin 160 (Nup160) | NM_021512 | actaactgattctggtgccttaat | transport | CDS
nucleoporin 210 (Nup210) | NM_018815 | cataattcagtcctcaattttgccactaac | ion transport | CDS
nucleoporin 50 (Nup50) | NM_016714 | actaacctaagtgattcttaat | development nervous system | 3'UTR
nucleoporin 88 (Nup88) | NM_172394 | actaatgagaataat | Unknown function | 3'UTR
nudix (nucleoside diphosphate linked moiety X)-type motif 15 | NM_172527 | tataatggccagatatcactaat | Unknown function | 3'UTR
odd Oz/ten-m homolog 3 (Odz3) | NM_011857 | tttaataatggattttactaac | Unknown function | 3'UTR
olfactomedin 3 (Olfm3) | NM_153157 | gataacatttgtgttactaat | Unknown function | 3'UTR
olfactory receptor 1085 (Olfr1085) | NM_146590 | ggtaatttaggcatcatcattattactaat | transport | CDS
olfactory receptor 1121 (Olfr1121) | NM_146348 | actaatactaac | Unknown function | CDS
olfactory receptor 135 (Olfr135) | NM_146332 | actaatgagcttgctctctctgtaat | Unknown function | CDS
olfactory receptor 1351 (Olfr1351) | NM_147040 | ggtaactagaaacagactaac | Unknown function | CDS
olfactory receptor 1384 (Olfr1384) | NM_146472 | tgtaattttggtctgtcctgtagcactaat | protein amino acid phosphorylation | CDS
olfactory receptor 1454 (Olfr1454) | NM_146692 | cataatatcctacacattcatttttactaac | Unknown function | CDS
olfactory receptor 536 (Olfr536) | NM_146520 | tttaatagttgaaggaaggaaactaat | Unknown function | 3'UTR
olfactory receptor 547 (Olfr547) | NM_147079 | actaactttctgcaaaaataat | Unknown function | CDS
olfactory receptor 66 (Olfr66) | NM_013618 | actaacaaattctaac | Unknown function | 5'UTR
olfactory receptor 77 (Olfr77) | NM_146339 | actaatttctatgactttttatttataat | Unknown function | 3'UTR
olfactory receptor 770 (Olfr770) | NM_146863 | ggtaatagtactaac | Unknown function | CDS
olfactory receptor 859 (Olfr859) | NM_146526 | actaacattttgtactaat | Unknown function | CDS
olfactory receptor 866 (Olfr866) | NM_146558 | cttaatcttgcttgtactaac | Unknown function | CDS
olfactory receptor MOR135-4 (MOR135-4) | NM_147022 | tattaatgaactaat | cell surface receptor linked signal transduction | CDS
olfactory receptor MOR136-11 (MOR136-11) | NM_146938 | atctaacaacactaat | cell surface receptor linked signal transduction | CDS
olfactory receptor MOR204-4 (MOR204-4) | NM_146775 | agctaatcatcttgggactaac | perception of smell | CDS
olfactory receptor MOR204-5 (MOR204-5) | NM_146774 | agctaatcatcttgggactaac | perception of smell | CDS
oligodendrocyte myelin glycoprotein (Omg) | NM_019409 | actaatgactmttmtaat | GPCR Protein signaling pathway | 3'UTR
oncoprotein induced transcript 3 (Oit3) | NM_010959 | tgtaactgggccactaat | Unknown function | 3'UTR
oogenesin 1 (Oog1) | NM_178657 | actaattgaaggcttaac | Unknown function | CDS

oogenesin 4 (Oog4) | NM_173773 | tataatgacatgaatataatactaac | Unknown function | CDS
opioid receptor mu (Oprm) | NM_011013 | ggctaatacagtggatcgaactaac | behavior | 3'UTR
ORM1-like 2 (Ormdl2) | NM_024180 | actaatatataat | Unknown function | 3'UTR
ornithine decarboxylase antizyme inhibitor (Oazin) | NM_018745 | actaatgggatctaac | polyamine metabolism | 3'UTR
osteocrin (Ostn) | NM_198112 | actaataaaggatagtataat | differentiation osteoblast | 3'UTR
osteomodulin (Omd) | NM_012050 | actaattacaaatgtaaacatgtaac | cell adhesion | 3'UTR
otogelin (Otog) | NM_013624 | cataatataaggaaaggggatttaactaac | metabolism | 3'UTR
otopetrin 1 (Otop1) | NM_172709 | actaatgtaccacaattaac | perception of gravity | 3'UTR
paired box gene 9 (Pax9) | NM_011041 | tttaattcaccattaggaaactaat | development endoderm | 3'UTR
papilin proteoglycan-like sulfated glycoprotein (Papln) | NM_130887 | actaattgaacgtcctaac | serine protease inhibitor activity | 3'UTR
paraoxonase 1 (Pon1) | NM_011134 | gcctaatggactaac | circulation | CDS
parvin alpha (Parva) | NM_020606 | actaacccagtgaactaat | cell adhesion | 3'UTR

PDZ domain containing 1 (Pdzk1) | NM_021517 | actaatataat | signal transduction | CDS
pecanex homolog (Pcnx) | NM_018814 | tttaatgaaaaaaactgcactaat | Unknown function | 3'UTR
pellino 2 (Peli2) | NM_033602 | actaactgtagattctacctttttgtaat | protein binding | 3'UTR
peptidylprolyl isomerase (cyclophilin) like 5 (Ppil5) | NM_027178 | actaattaatgtaat | Unknown function | 3'UTR
periaxin (Prx) | NM_019412 | actaacccgacactaat | signal transduction | 3'UTR
pericentriolar material 1 (Pcm1) | NM_023662 | tttaacctgcctggatttactaac | immune response | CDS
peroxisome proliferator activated receptor alpha (Ppara) | NM_011144 | actaatctgcactttttaac | transcription regulation | 3'UTR
peroxisome proliferator activated receptor binding protein | NM_013634 | actaataataat | development nervous system | CDS
per-pentamer repeat gene (Ppnr) | NM_012022 | actaatggcagattaat | Unknown function | CDS
PHD finger protein 6 (Phf6) | NM_027642 | actaacagagtaac | Unknown function | 3'UTR
phosducin (Pdc) | NM_024458 | actaattactcaaattgagataat | GPCR Protein signaling pathway | 3'UTR
phosphatidylinositol 3-kinase catalytic beta polypeptide | NM_029094 | actaattttgttaat | signal transduction | 3'UTR
phosphatidylinositol 4-kinase type 2 beta (Pi4k2b) | NM_025951 | actaatcttaat | signal transduction | 3'UTR
phosphatidylinositol glycan class K (Pigk) | NM_178016 | aattaatccagctagccaaactaac | macromolecule metabolism | CDS
phosphatidylinositol glycan class L (Pigl) | NM_199026 | actaatttaccaatagtttgcatgataac | Unknown function | 3'UTR
phosphatidylserine synthase 1 (Ptdss1) | NM_008959 | actaactggctaat | phosphatidylserine biosynthesis | 3'UTR
phosphofructokinase muscle (Pfkm) | NM_021514 | tgtaactacactaat | metabolism glycolysis | 3'UTR
phosphoglucomutase 3 (Pgm3) | NM_028352 | actaataggtaac | metabolism glucose 1-phosphate | 3'UTR
phospholipase A2 group IVA (cytosolic calcium-dependent) | NM_008869 | tgctaataggagaaacactaat | phospholipid metabolism | CDS
phospholipase C beta 3 (Plcb3) | NM_008874 | actaatactttgggttttttttaac | signal transduction | 3'UTR
phospholipid scramblase 4 (Plscr4) | NM_178711 | tttaattatactaat | Unknown function | 3'UTR
phosphoprotein enriched in astrocytes 15 (Pea15) | NM_008556 | actaacctgccctaat | apoptosis | 3'UTR
phosphoribosyl pyrophosphate synthetase 2 (Prps2) | NM_026662 | cgtaatcccagcactttaaaactaat | nucleoside metabolism | 3'UTR
phosphoserine phosphatase (Psph) | NM_133900 | tattaatatgactactaat | Unknown function | 3'UTR
phosphotriesterase related (Pter) | NM_008961 | tgtaatattgatgatcctactaat | catabolism | 3'UTR
phytanoyl-CoA hydroxylase (Phyh) | NM_010726 | actaacctagggtgtaac | metabolism fatty acid | 3'UTR
pleckstrin homology domain containing family C (with FERM domain) | NM_146054 | actaacaagcacgattgttaat | cellular morphogenesis | 3'UTR
plectin 1 (Plec1) | NM_011117 | actaacacattaat | Unknown function | 3'UTR
plexin B1 (Plxnb1) | NM_172775 | actaacaaccctaac | Unknown function | CDS
pogo transposable element with ZNF domain (Pogz) | NM_172683 | actaattgccaacaacaatgctggtaat | Unknown function | CDS
poly A binding protein cytoplasmic 2 (Pabpc2) | NM_011033 | actaacacatcagcacagataac | polyadenylation | CDS
poly A binding protein cytoplasmic 5 (Pabpc5) | NM_053114 | actaacttaat | Unknown function | 3'UTR
polyadenylate binding protein-interacting protein 1 (Paip1) | NM_145457 | actaatgtatgcaactttaat | Unknown function | 3'UTR
polycystic kidney and hepatic disease 1-like 1 (Pkhd1l1) | NM_138674 | actaatgaagtccagcaggtcacagtaac | Unknown function | CDS
polycystic kidney disease 2 (Pkd2) | NM_008861 | tataaccaaatttaatattaaaaaaactaat | cell cycle arrest | 3'UTR
polymerase (DNA directed) eta (RAD 30 related) (Polh) | NM_030715 | actaactaaagactaaacaggagtataac | DNA repair | 3'UTR
postmeiotic segregation increased 1 (Pms1) | NM_153556 | actaatcagagtaat | Unknown function | CDS
potassium channel modulatory factor 1 (Kcmf1) | NM_019715 | gataatcatttccacttaactaat | Unknown function | 3'UTR
potassium channel tetramerisation domain containing 4 (Kctd4) | NM_026214 | actaacaactataat | potassium ion transport | 3'UTR
potassium intermediate/small conductance calcium-activated channel | NM_080465 | tgtaatttcactaac | ion transport | 3'UTR
potassium voltage-gated channel Isk-related subfamily gene 4 | NM_021342 | actaatgtccccatgaggggttaac | ion transport | 3'UTR
potassium voltage-gated channel shaker-related subfamily member 1 | NM_010595 | gttaacaaaatctggactaat | ion transport | 3'UTR
potassium voltage-gated channel shaker-related subfamily member 4 | NM_021275 | actaatggccacacaaataac | ion transport | 3'UTR
potassium voltage-gated channel shaker-related subfamily member 7 | NM_010596 | gttaacacttgataggtactaat | ion transport | 3'UTR
potassium voltage-gated channel subfamily G member 3 (Kcng3) | NM_153512 | cttaacagtgttatttttgagactaac | ion transport | 3'UTR
POU domain class 2 transcription factor 1 (Pou2f1) | NM_011137 | tttaaccagtgctgctgtgactaat | transcription regulation | CDS
POU domain class 2 transcription factor 1 (Pou2f1) | NM_198933 | tttaaccagtgctgctgtgactaat | transcription regulation | CDS
POU domain class 2 transcription factor 1 (Pou2f1) | NM_198934 | tttaaccagtgctgctgtgactaat | transcription regulation | CDS
POU domain class 2 transcription factor 1 (Pou2f1) | NM_198932 | tttaaccagtgctgctgtgactaat | development skeletal | CDS
pre B-cell leukemia transcription factor 1 (Pbx1) | NM_008783 | actaatccagcaatcaaataat | sex differentiation | 3'UTR
premature ovarian failure 1B (Pof1b) | NM_181579 | actaataggagtacagcagtgctaac | Unknown function | 3'UTR
procollagen type XII alpha 1 (Col12a1) | NM_007730 | actaacttagtaat | cell adhesion | CDS
procollagen-lysine 2-oxoglutarate 5-dioxygenase 1 (Plod1) | NM_011122 | actaatctcaggagatggtaac | protein metabolism | 3'UTR
procollagen-proline 2-oxoglutarate 4-dioxygenase | NM_011030 | tataactccgacgtttacagctgactaac | protein metabolism | 3'UTR
programmed cell death 6 (Pdcd6) | NM_011051 | actaattgtgccatgagacctaat | apoptosis | 3'UTR
prolactin-like protein F (Prlpf) | NM_011168 | cataatgccatgagactaac | Unknown function | CDS
propionyl-Coenzyme A carboxylase alpha polypeptide (Pcca) | NM_144844 | actaatagaaaaatttatcgataac | metabolism | CDS
prostaglandin D2 synthase 2 hematopoietic (Ptgds2) | NM_019455 | actaatggtcattataat | prostaglandin metabolism | 3'UTR
prostaglandin-endoperoxide synthase 2 (Ptgs2) | NM_011198 | cattaacctacagtactaat | response to oxidative stress | 3'UTR
proteasome (prosome macropain) 26S subunit ATPase 3 (Psmc3) | NM_008948 | ggtaattgcagccactaac | protein amino acid phosphorylation | CDS
proteasome (prosome macropain) subunit alpha type 4 (Psma4) | NM_011966 | cataacatctgatgctaacgttctgactaac | ubiquitin-dependent protein catabolism | CDS
protein kinase C and casein kinase substrate in neurons 2 | NM_011862 | gttaacagtaagaagactaac | signal transduction | 3'UTR
protein kinase C iota (Prkci) | NM_008857 | gataatttcgattctcagtttactaat | transport | CDS
protein kinase C theta (Prkcq) | NM_008859 | actaatgacatcatccctaat | protein amino acid phosphorylation | 3'UTR
protein kinase cGMP-dependent type II (Pri

protocadherin beta 11 (Pcdhb11) | NM_053136 | actaatctgaataac | cell adhesion | 3'UTR
protocadherin beta 15 (Pcdhb15) | NM_053140 | gttaacatctagaagttaactaat | cell adhesion | 3'UTR
protocadherin beta 20 (Pcdhb20) | NM_053145 | actaatataat | cell adhesion | 3'UTR
protocadherin beta 22 (Pcdhb22) | NM_053147 | gataattacccagaactaat | cell adhesion | CDS
protocadherin beta 9 (Pcdhb9) | NM_053134 | tgtaacattaactaac | cell adhesion | 3'UTR
protocadherin gamma subfamily A 5 (Pcdhga5) | NM_033588 | gataattactatcacctactaac | cell adhesion | CDS
protocadherin gamma subfamily A 9 (Pcdhga9) | NM_033592 | actaacgataat | cell adhesion | CDS
protocadherin gamma subfamily B 1 (Pcdhgb1) | NM_033574 | cttaatgccccagcaagtactaac | cell adhesion | CDS
PRP39 pre-mRNA processing factor 39 homolog (Prpf39) | NM_177806 | gataatacctcactaat | Unknown function | 3'UTR
PTEN induced putative kinase 1 (Pink1) | NM_026860 | tgtaatgactaac | protein amino acid phosphorylation | 3'UTR
pumilio 1 (Pum1) | NM_030722 | actaattattttttttaat | translation regulation | 3'UTR
pumilio 2 (Pum2) | NM_030723 | actaacagttcagcagttaac | translation regulation | CDS
purinergic receptor P2Y G-protein coupled 1 (P2ry1) | NM_008772 | actaacccatcgtgatataac | GPCR Protein signaling pathway | 3'UTR
purinergic receptor P2Y G-protein coupled 12 (P2ry12) | NM_027571 | actaatgattctaac | GPCR Protein signaling pathway | CDS
puromycin-sensitive aminopeptidase (Psa) | NM_008942 | cttaatatccagactaat | ion transport | CDS
quaking (Qk) | NM_021881 | actaatgtttaat | development nervous system myelination | 3'UTR
RAB guanine nucleotide exchange factor (GEF) 1 (Rabgef1) | NM_019983 | tttaacaggacattggcactaac | Unknown function | 3'UTR
RAB12 member RAS oncogene family (Rab12) | NM_024448 | tgtaacccaagtcagctatacactaac | protein transport | 3'UTR

RAB23 member RAS oncogene family (Rab23) NM_008999 actaactacatcggtaac development nervous system 3'UTR RAB37 member of RAS oncogene family (Rab37) NM_021411 cttaatatacataaactaat protein transport 3'UTR RAB3B member RAS oncogene family (Rab3b) NM_023537 ggtaacaaattgctcaactactcggactaat protein transport 3'UTR

RAB4A member RAS oncogene family (Rab4a) NMJJ09003 actaattggttaac protein transport CDS RAB5A member RAS oncogene family (Rab5a) NM_025887 cataattagtcagtgcactaac protein transport 3'UTR Rab6 interacting protein 1 (Rab6ipt) NM_021494 actaatgttttctatattaac protein binding 3UTR RAB7 member RAS oncogene family (Rab7) NMJ309005 gttaatgcttgttacttttaa ctaat protein transport 3UTR RAB9 member RAS oncogene family (Rab9) NM_019773 actaataaaattcagtta ac protein transport 3UTR Rac GTPase-activating protein 1 (Racgapl) NM.012025 actaacccttacctgtaac signal transduction 3'UTR RAD50 homolog (Rad50) NMJ309Q12 actaacttcactgttggggtactttcctaac Unknown function 5TJTR RAD51-like 1 (Rad51l1) NM_009014 actaacaagatttgtaat DNA repair 3UTR radlxin (Rdx) NM_009041 actaattctaat apical protein localization 3UTR ral guanine nucleotide dissociation stimulator-like 1 (Rgl1) NM_016846 atctaaccagtcgggagcta ctaat intracellular signaling cascade CDS RAN binding protein 6 (Ranbp6) NMJ 77721 actaattgtaaatctaat Unknown function CDS RAP1 GTP-GDP dissociation stimulator 1 .(Rap1gds1) NMJ45544 actaatgttgcataat Unknown function 3'UTR RAP2C member of RAS oncogene family (Rap2c) NMJ 72413 actaatagtttaaagcaatatttgttaat signal transduction 3'UTR RAS guanyl releasing protein 1 (Rasgrpl) NMJJ11246 actaacttgtaat signal transduction 3'UTR ras homolog gene family member A (Rhoa) NM_016802 tgtaactactttataactaac development muscle 3UTR RAS p21 protein activator 1 (Rasal) NMJ45452 actaatccatattgtaac cell growth and/or maintenance CDS RAS p21 protein activator 3 (Rasa3) NMJJ09025 actaacgatttgcaatgtattttaat signal transduction 3UTR recombination activating gene 1 (Rag1) NM_009019 tttaatccttacagatgtctgtgcactaat DNA recombination 3TJTR regulator of G-protein signaling 13(Rgs13) NMJ 53171 tataacatttttgetatagta ggcaactaat GPCR Protein signaling pathway 3UTR regulator of G-protein signaling 18 (Rgs18) NMJJ22881 actaatactaat GPCR Protein signaling pathway 3TJTR retinal G protein coupled receptor(Rgr) NM_021340 actaaegctagaa 
cagttgaacaagctaac neurophysiotogical process 3TJTR retinitis pigmentosa 2 homolog (Rp2h) NMJ 33669 actaatgetataat protein amino acid prenylatjon 3'UTR negative regulation of transcription, dna- retinoblastoma 1 (Rb1) NMJJ09029 tactaatttctacacattggactattttaat dependent 3'UTR retinoblastoma binding protein 4 (Rbbp4) NMJD09030 actaactgtgtaagtgcttataat cell cycle 3TJTR retinol binding protein 3 interstitial (Rbp3) NMJM5745 tgtaacataaattaaatccttactaat transport 31ITR retinol dehydrogenase 1 (all trans) (Rdh1) NMJ380436 a ctaa ctga ccattat a at metabolism 3VTR retinol dehydrogenase 11 (Rdh11) NMJJ21557 actaacggctcctgccctttgtaat metabolism 5TJTR retinoschisis 1 homolog(Rslrt) NMJJ11302 actaaccaactaac cell adhesion 3UTR RGM domain family member B (Rgmb) NMJ 78615 actaacgagccctgtttctaac Unknown function 3UTR Rho GTPase activating protein 18 (Arhgapl 8) NMJ 76837 ' actaataatagatttactaat Unknown function 3UTR Rho GTPase activating protein 5 (Arhgap5) NMJ)09706 actaatcttccatttacattaat signal transduction Rho protein CDS rhodopsin (Rho) NMJ45383 actaatataat GPCR Protein signaling pathway 3UTR

Rho-related BTB domain containing 3 (Rhobtb3) NM_028493 tgtaaccatgggattcagaagagcactaac Unknown function 3'UTR ribonucleotide reductase M1 (Rrm1) NM_009103 actaatggcaattctaat DNA replication CDS ribonucleotide reductase M2 B (TP53 inducible) (Rrm2b) NMJ99476 ggtaatgatcatttaactaaattactaat metabolism deoxyribonucleotide 3'UTR

ribosomal protein S3a (Rps3a) NM_016959 actaatgacttgaaggaagtagttaat protein biosynthesis CDS ribosomaf protein S5 (Rps5) NMJ509095 tactaactccatgatgatgcatggtcgtaac biogenesis CDS ribosomal protein S6 kinase polypeptide 1 (RpsoKbl) NMJJ28259 tataataaattttaactaat development germ-cell 3UTR ribosomal protein S6 kinase polypeptide 5 (Rps6ka5) NMJ" 53587 actaaccaagcatttgctgtcaaaataat proteolysis and peptidolysis CDS nngfingerm (RnflU) NM_033604 actaacctttaat pattern specification 3TJTR ring finger protein (C3HC4 type) 19(Rnf19) NM_013923 cttaatttataaccgtatgatactaac Unknown function 3'UTR ring finger protein 38 (Rnf38) NMJ 75201 cttaattttgtgtgtgcactaac Unknown function 3'UTR RIO kinase 2 (Riok2) NMJ325934 a eta atttaaatataaaataac Unknown function 3TJTR RNA (guanine-7-) methyttransferase (Rnmt) NM_026440 tataatatgaaactaat Unknown function CDS RNA binding motif protein 11 (Rbm11) NM_198302 actaatgggaataat Unknown function 3'UTR RNA binding motif protein 18(Rbm18) NM_026434 actaacccgggttttaat nucleic acid binding 3'UTR sarcolemma associated protein (Slmap) NM_032009 acta acacacgtgatgtgcatgatttaat Unknown function 3TJTR sarcospan (Sspn) NM_010656 actaatttccagctaat cell cycle regulation 3'UTR SCAN-KRAB-zinc finger gene 1 (Skz1) NM_023685 actaatattttttttcttgttaat transcription regulation 3TJTR schiafen 8 (SlfnB) NMJ 81545 g att a attgetatgea aatga eta a c Unknown function CDS secreted protein SST3 (SST3) NMJ 72463 actaaegtaac Unknown function 3UTR selenophosphate synthetase 1 (Sephsl) NMJ 75400 cataacaacccttgagaaccactaat protein modification 3'UTR selenoprotein (Sep15) NM_053102 actaacactcgtctttgtgggataat Unknown function 5UTR sema domain immunoglobulin domain (]g) and GPI membrane anchor NM_011352 tttaataatgtaacatattactaat development nervous system 3'UTR septin 2(Sept2) NM_010891 tttaattcagtccaa aactca gta gtactaat celt proliferation 3'UTR serine (or cysteine) proteinase inhibitor clade A member 3N NM_009252 actaactgtgttataac Unknown function 3'UTR serine (or cysteine) 
proteinase inhibitor clade B (ovalbumin) NMJJ25867 actaacccttctaac serine-type endopeptidase inhibitor activity CDS serine (or cysteine) proteinase inhibitor clade B member 5 NM_009257 actaacaga caccccttttcctaat serine protease inhibitor activity 3'UTR serine (or cysteine) proteinase inhibitor clade B member 9b NM_011452 a eta a tggttga at a at serine protease inhibitor activity 3'UTR serine (or cysteine) proteinase inhibitor clade B member 9f NM_011455 aactaacatgattctctaaaactaaacataat Unknown function CDS serine (or cysteine) proteinase inhibitor clade D member 1 NM_008223 gttaatgcaactaat blood coagulation 3UTR serine/arginine repetitive matrix 2 (Srrm2) NM_175229 accta attcaagtcaagatga actaat Unknown function CDS serine/threonine kinase 4 (Stk4) NM_021420 tttaatcttttgtaacaaaaactaat signal transduction 3'UTR serologically defined colon cancer antigen 13(Stard13) NMJ 46258 actaatataat Unknown function 3TJTR serum deprivation response (Sdpr) NMJ 38741 tgtaaccccaaatactgaattgctgaactaac Unknown function 3'UTR SET domain bifurcated 1 (Setdbl) NMJD18877 a ctaaccctttctcaagtcttaat chromatin modification 3UTR seven in absentia 1 (Siahl) NM_009172 actaatatatttaaaaataat development 3'UTR SH3 multiple domains 3 (Sh3md3) NM_199012 actaacagatttaaaataac Unknown function 3'UTR short coiled-coil protein (Scoc) NM_019708 tataataggaaagatccactaat Unknown function 3\JTR Shwachman-Bodfan- Diamond syndrome homolog (Sbds) NM_023248 actaattcttaaaggtttataat Unknown function 3'UTR sialyftransferase 8 (alpha-2 8-sialyftransferase) F (SiatSf) NM_145838 actaatgtcccaccga caccttttaat Unknown function 3'UTR sideroftexin 5 (Sfxn5) NMJ 78639 actaaccattcacattttaac transport 3'UTR SIGLEC-likel(SiglecM) NMJJ31181 tgtaacagccetcactaac Unknown function CDS signal peptide CUB domain EGF-like 1 (Scubet) NM_022723 actaacctgtaac Unknown function CDS signal transducing adaptor molecule (SH3 domain and ITAM motif) 1 NM_011484 a tcta a cctcctcacta a c transport CDS similar to Caenortiabditis elegans protein 
C42C1.9 (Keo4) NM_145502 actaacacactgattctccttaaagtaat Unknown function 3\JTR similar to KRAB zinc finger protein (Mzf22) NMJ 45622 actaat ggaataat Unknown function 3UTR sine oculis-related homeobox 6 homolog (Six6) NM_011384 cgtaattgctttgtgactaat development 3UTR single WAP motif protein 2 (Swam2) NMJ 38684 actaatacaacttaat xenobiotic metabolism 3'UTR single-minded 1 (Siml) NMJM1376 tttaatttgaaaa aaaactaat development nervous system 3UTR single-minded 2 (Sim2) NMJM1377 actaacaagctcgctcataac development nervous system 3'UTR sirtuin 1 ((silent mating type information regulation 2 homolog) 1 NH_019812 ggtaatgtccaaacaggcccctgagactaat development muscle 3'UTR Sjogren syndrome antigen A2 (Ssa2) NM_013835 actaattttcatatttttctaat RNA binding 3'UTR Sjogren syndrome antigen B (Ssb) NM_009278 actaacctgctaat RNA binding CDS slit homolog 2(Slit2) NMJ 78804 actaattcatgcttcataat axon guidance 3UTR small chemokine (C-C motif) ligand 11 (CclH) NM_011330 actaattaaaattaat signal transduction 3UTR small EDRK-rich factor 1 (Serfl) NM_0t1353 actaatcattatatgtgttataac Unknown function 3'UTR SMC (structural maintenace of 1)-like 2 NMJJ80470 tgtaatgggagaaaagacaactaat cell cycle CDS smoothened homolog (Smo) NMJ 76996 actaacctaat development nervous system CDS SNF2 histone linker PHD RING helicase (Shprh) NMJ 72937 tataatcatactaat transcription regulation 3'UTR sodium channel voltage-gated type III alpha polypeptide (Scn3a) NM_018732 actaattgtgcatagcacatctaat ion transport 3'UTR solute carrier family 12 NMJ 83354 actaacctaac ion transport 3UTR solute carrier family 16 (monocarboxylic acid transporters) NMJJ28247 gttaacaaaatgacagtgactaat Unknown function 3'UTR solute carrier family 17 (anion/sugar transporter) member 5 NMJ 72773 tataacattttaactaat transport 3'UTR solute carrier family 18 (vesicular monoamine) memberl (Slc18a1) NMJ53054 gttaacccatttgtaggacctcttactaac Unknown Junction CDS solute carrier family 18 (vesicular monoamine) member 2 (Slc18a2) NMJ 72523 tttaacttgaaactaat 
neurotransmitter transport 3UTR solute carrier family 25 (mitochondrial carnitine/acylcarnitine NMJJ20520 gataatacctaagaacagcecacctaetaac transport 3'UTR solute carrier family 30 (zinc transporter) member 4 (Slc30a4) NM_011774 tgtaatgctggtgtatgtactaat signal transduction CDS solute carrier family 33 (acetyt-CoA transporter) member 1 NM_015728 cttaatatgcaggtactcactaac Unknown function 3UTR solute carrier family 35 (UDP-gala close transporter) member 2 NM_078484 gactaacctctgttaat nucleoside metabolism 3'UTR solute earner family 35 member F1 (Stc35ft) NMJ 78675 cataacaaagcacactaat Unknown function 3UTR solute earner family 39 (zinc transporter) member 10 (Slc39a10) NMJ 72653 tactaactaggttaat metal ion transport 3UTR solute carrier family 5 (iodide transporter) member 8 (Slc5a8) NMJ45423 - tttaatgctgttaagttgaattactaat Unknown function 3'UTR solute carrier family 5 (sodium/glucose cotransporter) member 1 NM_019810 tttaatattaaattaattaattaactaat sodium ion transport 3-UTR solute carrier family 6 (neurotransmitter transporter glycine) NMJ48931 gttaattgta'acctgcactaac signal transduction CDS solute earner family 6 (neurotransmitter transporter) member 15 NMJ 75328 actaacgttgctaat transport CDS solute carrier family 9 (sodium/hydrogen exchanger) isoform 9 NMJ 77909 cataatgggactaat Unknown function CDS solute carrier organic anion transporter family member 1 a1 NM_013797 cattaatgtggatatatgtactaat transport 3VTR solute carrier organic anion transporter family member 2b1 NMJ 75316 tttaaccagtgcctgagcctacactaat transport 3UTR sortilin 1 (Sortl) NMJD19972 gttaattgcccgttggcaactaac transport endocytosis 3'UTR sorting nexin 14 (Snx14) NMJ 72926 tttaataaagactaac protein transport 3'UTR sorting nexin 5 (Snx5) NMJ324225 actaatccttttcttatgcatttaat protein transport 3'UTR sorting nexin 9 (Snx9) NMJD25664 actaacactaac protein transport CDS sparc/osteonectin ewev and kazal-like domains proteoglycan 3 NMJJ23689 actaatgacctgaaccacaataat serine protease inhibitor activity 3'UTR spastic 
paraplegia 4 homolog (Spg4) NM_016962 tataacgagagtactaac Unknown function CDS sperm associated antigen 9 (Spag9) NM_027569 actaataatggaactaat Unknown function 3'UTR spermatid perinuclear RNA binding protein (Strbp) NM_009261 gala a ccttc eta Oca gattc a ga a act aa c spermatogenesis 3'UTR

spermatogenic Zip 1 (Spz1) NM_030237 cataacgaactcagcgaactaat Unknown function CDS sperm-specific sodium proton exchanger (LOC208169) NMJ 98106 ga ta atgt ga atta a a gcatctca ca eta a c Unknown function 3UTR sphingosine-1 -phosphate phosphatase 1 (Sgppf) NM_030750 actaacagaagtaaatggcccataac apoptosis 3'UTR splicing factor 3b subunit 1 (Sf3b1) NM_031179 ttta a tgttttgggtca a eta at processing 3'UTR splicing factor arginine/serine-rich 1 (ASF/SF2) (Sfrs1) NMJ 73374 tttaatagggactaat processing 3UTR splicing factor arginine/serine-rich 10 (transformer 2 homolog) NMJ309186 gataatgglatttcaactaat Unknown function 3'UTR splicing factor arginine/serine-rich 5 (SRp40 HRS) (Sfrs5) NMJJ09159 gttaactcaagattagtttaattaaactaac splicing 3UTR SRB7 (supressor of RNA polymerase B) homolog NM_025315 actaataaggttcatgatataat transcription regulation 3TJTR SRY-box containing gene 2 (Sox2) NM_011443 actaataccatccttataac transcription regulation 3'UTR SRY-box containing gene 21 (Sox21) NMJ77753 actaatgtttgtg a a tg a a gtt g eta a c transcription regulation 3'UTR SRY-box containing gene 30 (Sox30) NMJ 73384 tttaatttacactaat transcription regulation 3'UTR START domain containing 8 (StardS) NMJ 99018 acta acacacttctgttctaac Unknown function 3-UTR staufen (RNA binding protein) homolog 2 (Stau2) NMJ325303 actaatatttagttctaccaataat Unknown function 3-UTR stearoyt-Coenzyme A desaturase 1 (Scd1) NM_0O9127 tttaatattctgttgattaactaac metabolism fatty acid 3UTR sterol O-acyltransferase 1 (Soatl) NM_009230 tattaatctttctctactaat metabolism CDS sterol-C5-desaturase (fungal ERG3 delta-5-desaturase) homolog NMJ72769 actaaccaggaaaccctaac metabolism 3UTR stomatin (Epb7.2) -like 2 (Stoml2) NM_023231 actaatcatgtaat Unknown function 3'UTR submaxillary gland androgen regulated protein 2 (Smr2) NM_021289 tcctaalactcatattccttatalactaat Unknown function 3'UTR succinate dehydrogenase complex subunit D integral membrane NMJ325848 tttaatcaggagatgctctcaatgactaat Unknown function 3UTR sulfatase 1 NMJ 73396 
actaatcctagatttgtattaac transcription regulation 3'UTR thiamin pyrophosphokinase (Tpk1) NM_013861 actaatctttcataat thiamin metabolism 3'UTR thioredoxin reductase 1 (Txnrd1) NM_015762 tactaactggtgtagcattgtctcctttaat metabolism CDS thioredoxin reductase 3 (Txnrd3) NMJ 53162 actaatataac electron transport 3'UTR thioredoxin-like (Txnl) NM_016792 actaataacttgtaat apoptosis 3'UTR thioredoxin-like 2 (Txnl2) NM_02314G actaattcctttaaatccctttcttaac electron transport 3'UTR threonyl-tRNA synthetase (Tars) NM_033074 tataatacactaat protein biosynthesis CDS thyroid stimulating hormone receptor (Tshr) NM_011648 tatgaacaagcctctaatcactgttactaac cyclic-cascade mediated signaling CDS thyrotropin-releasing hormone degrading ectoenzyme (Trhde) NMJ46241 adaatcgatcagttaat proteolysis and peptidolysis CDS tight junction protein 1 (Tjp1) NM_009386 gttaatcataatgtcagtgtaactaat DNA replication CDS toll-like receptor 3 (Tlr3) NMJ 26166 gttaactggatcaaccagacccacactaat neurotransmitter transport CDS topoisomerase 1 binding arginine/serine-rich (Topors) NMJ 34097 tataatggttcctttactaac Unknown function CDS tousled-like kinase 2 (Arabidopsis) (Tlk2) NM_011903 actaatggagctgaaaatgaaacgttaac cell cycle CDS trafficking protein particle complex 2 (Trappc2) NM_025432 tataatatggatttagtattactaat Unknown function 3'UTR trans-acting transcription factor 3 (Sp3) NM_011450 gttaatgaaactaat Unknown function CDS transcription elongation factor A (SII) 1 (Tcea1) NM_011541 adaattttgtaat transcription regulation 3'UTR transcription factor EC (Tcfec) NM_031198 actaacaaatttggtgataat development 3'UTR transcriptional regulator SIN3B (Sin3b) NM_009188 ggtaaccactaac cell cycle cytokinesis 3'UTR transferrin receptor (Tfrc) NM_011638 actaacaactgattttcataat proteolysis and peptidolysis CDS transformation related protein 53 inducible nuclear protein 1 NM_021897 actaacacaagcattaac apoptosis induction 3'UTR

transforming growth factor beta receptor 1 (Tgfbr1) NM_009370 tataatttttcaagatcttaaactaac TGFbeta receptor signaling pathway 3'UTR transgene insert site 737 insertional mutation polycystic kidney NM_009376 tataacccgtcagctctcactaat Unknown function CDS transient receptor potential cation channel subfamily M member 6 NMJ 53417 actaacgaactagggaataaattaat receptor activity 3'UTR transient receptor potential cation channel subfamily M member 7 NM_021450 actaattttaat Unknown function 3'UTR translocated promoter region (Tpr) NMJ 33780 actaacaaccagaatttaat Unknown function CDS transmembrane 4 superfamily member 12 (Tm4sf12) NM_173007 gttaactgactcctctaactaac Unknown function 3'UTR transmembrane 4 superfamily member 4 (Tm4sf4) NMJ45539 actaatgtgtctggactacttgtaac Unknown function 3'UTR transmembrane 6 superfamily member 1 (Tm6sf1) NM_145375 actaaccatactatgcataagaaataat Unknown function 3'UTR transmembrane 7 superfamily member 1 (Tm7sf1) NM_031999 tataacttaaatactaat Unknown function 3'UTR transmembrane channel-like gene family 3 (Tmc3) NM_177695 actaatttaac Unknown function 3'UTR transmembrane phosphatase with tensin homology (Tpte) NMJ 99257 gataacactaatcactaac Unknown function CDS transmembrane phosphatase with tensin
homology (Tpte) NMJ99258 actaatcactaac protein amino aciddephosphorytation CDS transmembrane phosphatase with tensin homology (Tpte) NMJ 81851 gataacactaatcactaac protein amino acid dephosphoryiation CDS transmembrane protein 16D (eight membrane-spanning domains) NMJ 78773 acta atttacttttaaagtgataat Unknown function 3UTR transmembrane protein 2 (Tmem2) NMJJ31997 actaacttatttaat Unknown function 3UTR tripartite motif protein 32 (Trim32) NM_053084 actaataaaagtaat Unknown function 3UTR tripartite motif protein 37 (Trim37) NMJ 97987 actaacagtgtagtgctcaattaat Unknown function 3'UTR tripartite motif-containing 36 (Trim36) NMJ78872 actaatttcagacccaaggttctaat Unknown function CDS Trk-fused gene (Tfg) NMJM9678 actaataataat Unknown function 3'UTR trophoblast specific protein (Tpbpa) NMJJ09411 actaataaaaccttaccattaccttaat Unknown function 3UTR trophoblast specific protein beta (Tpbpb) NM.026429 aactaatgaaaataatgactaac Unknown function CDS tumor necrosis factor (ligand) superfamily member 18 (Tnfefl 8) NMJ 83391 actaacacatactgggggatcatcttaat receptor activity CDS tumor necrosis factor (ligand) superfamily member 8 (ThfsfB) NM_009403 tataaccactaat immune response 3'UTR tumor necrosis factor alpha induced protein 6 (Tnfaip6) NM_009398 cataattgtactacacagaaataactaat cell adhesion 3'UTR tumor necrosis factor alpha-induced protein 1 (endothelial) NM_009395 actaacttctaat potassium ion transport 3'UTR tumor necrosis factor receptor superfamily member 11b NM_008764 actaattttafflcttacattaa c apoptosis 3UTR tumor necrosis factor receptor superfamily member 21 (Tnfrsf21) NMJ 78589 actaatttattaat apoptosis 3UTR type 1 tumor necrosis factor receptor shedding aminopeptidase NM_030711 actaacaaaggacaatattaat proteolysis and peptidolysis 3'UTR tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation NMJJ11739 tttaatagccaatgcaactaat signal transduction CDS U2 small nuclear ribonucleoprotein auxiliary factor (U2AF) 1 NMJ 78794 acta atgagagtggtttttttta ac RNA binding 3TJTR ubiquilin 1 
(Ubqlnl) NM.026B42 actaacttgctttttaaacataac Unknown function 3'UTR ubiquilin 1 (Ubqlnl) transcript variant 2 NMJ 52234 aactaacttgctttttaaacataac Unknown function 3'UTR ubiqujtin carboxyl-terminal esterase L3 (ubiquitin thiolesterase) NM_016723 a eta a ctca a a attttt a at ubiquitin-dependent protein catabolism 3TJTR ubiquitin carboxyl-terminal esterase L4 (UchW) NM_033607 actaactcaaaatttttaat ubiquitin-dependent protein catabolism 3'UTR ubiquitin specific protease 1 (Usp1) NMJ46144 actaatgatactactaat peptidase activity CDS ubiquitin specific protease 25 (Usp25) NM_013918 tttaattgagaaatgtcctctactaat ubiquitin-dependent protein catabolism 3DTR ubiquitin-conjugating enzyme E2 variant 2 (Ube2v2) NM_023585 actaatatttaat protein binding 3TJTR ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog yeast) NM_025356 actaatagctctcctagtaat ubiquitin cycle 3UTR ubiquitin-conjugating enzyme E2L 3 (Ube2l3) NM_009456 actaactttctacagttttcttaat ubiquitin cycle 3UTR ubiquitin-like containing PHD and RING finger domains 2 (Uhrf2) NMJ44873 a eta atg ga a a tgt a a atcata a t Unknown function CDS ubiquitously transcribed tetratricopeptide repeat gene X NM_009483 actaatgagagtaat Unknown function CDS UBX domain containing 4 (Ubxd4) NM_145441 tataatatcaatactaat Unknown function 3TJTR UDP-Gal:betaGlcNAc beta 13-galactosyltransferase polypeptide 2 NM_020025 tataatctgaccattaa aacactaat transcription regulation CDS UDP-glucose pyrophosphorylase 2 (Ugp2) NMJ 39297 tactaattatgggctaaagagtttcttataat metabolism 3'UTR UDP-glucuronosyltransferase 8 (Ugt8) NM.011674 actaactggatgattaagcataagttaat development nervous system myelination 3TJTR unc-5 homolog C (C. elegans) (Unc5c) NMJ309472 tataaccaatactaat development nervous system 3UTR unc-50 homolog (C. elegans) (Unc50) NMJJ26123 actaattaa atgtacatttctaat Unknown function 3'UTR Unc-51 like kinase 2 (C. elegans) (Ulk2) NMJJ13881 actaatttattaaaataac protein serineAhreonine kinase activity 3'UTR unc-93 homolog A (C. 
elegans) (Unc93a) NMJ 99252 actaatflaat Unknown function 3UTR Usher syndrome 1C binding protein 1 (Ushbpl) NMJ 81418 actaatgttccctcacacatgtaat Unknown function 3'UTR Usher syndrome 3A homolog (Ush3a) NMJ 53384 actaatgtagcttcagatttaat Unknown function CDS vacuolar protein sorting 4b(Vps4b) NMJD09190 actaatgtctcattacataat transport endocytosis 3TJTR vascular cell adhesion molecule 1 (Vcaml) NMJM1693 tgtaatgactaac cell adhesion 3UTR vascular endothelial zinc finger 1 (Vezfl) NM_016686 ttctaaccaetaac Unknown function 3UTR v-crk sarcoma virus CT10 oncogene homolog (avian) (Crk) NMJ33656 actaatatttgacatggttaat cell cycle regulation 3UTR vesicular gtutamate transport er-3 (Vglut3) NMJ 82959 tataatgeactttataaaaagactaat Unknown function 3'UTR visinin-like 1 (Vsnh) NM_O12038 cataacttactaat Unknown function 3UTR vitamin D receptor interacting protein (Vdrip) NM_026119 actaatgaaattagcacttaat Unknown function CDS vitamin K epoxide reductase complex subunit 1 (Vkord) NMJ 78600 actaacctaac Unknown function 3'UTR vomeronasal 1 receptor B1 (V1rb1) NMJD53225 cataacccaactaat Unknown function CDS vomeronasal 1 receptor BIO (V1rt>10) NMJ353240 aactaatgctgcttataac neurophysiological process CDS vomeronasal 1 receptor B2 (V1rb2) NM_011911 aactaatgctacttataac neurophysiological process 3'UTR vomeronasal 1 receptor B3 (V1rb3) NMJJ53226 aactaatgctgcttataac neurophysiological process 'CDS vomeronasal 1 receptor B4(V1rb4) NMJJ53227 aactaatgctgcttataac neurophysiological process CDS vomeronasal 1 receptor 87 (V1rb7) NM_05322S aactaatgctgcttataac neurophysiological process CDS vomeronasal 1 receptor B9 (V1rb9) NM_053230 aactaatgctgcttataac neurophysiological process CDS vomeronasal 1 receptorC13(V1rc13) NMJ34168 gactaatttcaacaatcataac response to external stimulus CDS vomeronasal 1 receptor C16 (V1rc16) NMJ 34171 tactaatgtgagtgagactaac response to external stimulus CDS vomeronasal 1 receptor C25 (Vlrc25) NMJ 34180 aactaaccttcattcacataat response to external stimulus CDS vomeronasal 1 receptor 
C28 (V1rc28) NMJ 34183 aactaaccttcattcatataat response to external stimulus CDS vomeronasal 1 receptor C33 (V1rc33) NMJ 34436 tactaatgtgagtgagactaac response to external stimulus CDS vomeronasal 1 receptor family NM_053220 actaattctaac GPCR Protein signaling pathway CDS vomeronasal 1 receptor G7 (V1rg7) NMJ 34208 tataacaaaaatgtaactaac response to pheromone CDS vomeronasal 1 receptor J3 (V1rj3) NMJ45847 tgtaatagcaaaaactaat response to pheromone CDS vomeronasal 1 receptor K1 (V1rk1) NMJ34227 tgtaactgcactaat Unknown function CDS v-ral simian leukemia viral oncogene homolog A (ras related) NMJHWSt actaataaatataat small GTPase mediated signal transduction 3'UTR WD repeat and FYVE domain containing 3 (Wdfy3) NMJ 72882 tgtaacatgagacttcaaactaagccactaac Unknown function 3'UTR WD repeat domain 20 (Wdr20) NM_027614 actaatagacaagtcacgcgtaac Unknown function CDS

WD repeat domain 5B (WdrSb) NM_027113 actaattataat Unknown function CDS wingless related MMTV integration site 2b (Wnt2b) NM_009520 actaatatttgtgtaac development 3UTR wingless-related MMTV integration site 7B (Wnt7b) NM_009528 actaacgactgggtagccagacctaac development 3UTR WW domain containing adaptor with coiled-coil (Wac) NMJ 53085 tgtaacagagcttagacatctgaaactaat Unknown function 3UTR xanthine dehydrogenase (Xdh) NM_011723 actaatctgccctctaac cell differentiation regulation-epithelial CDS X-linkedmyotubular myopathy gene 1 (Mtml) NMJD19926 actaatactaat protein amino acid dephosphorytation 3UTR X-ray repair complementing defective repair in Chinese hamster NMJJ20570 acta act gctattctcaaacctgttaat DNA repair 3UTR Yamaguchi sarcoma viral (v-yes) oncogene homolog (Yes) NM_009535 acta attaatatgttttcagtttaat cell cycle regulation 3'UTR YY1 transcription factor (Yy1) NM_009537 actaacctgaaatctcacatcttaac transcription regulation CDS ZAP3 protein (Zap3) NMJ 78363 tattaatattttttttaaactaat Unknown function 3UTR zinc finger CCCH type domain containing 5 (2c3hdc5) NM_172569 actaactatttttaac electron transport 3UTR zinc finger CCHC domain containing 9 (Zcchc9) NM_145453 a ctaattggta caattgttaat Unknown function CDS zinc finger DHHC domain containing 15 (Zdhhc15) NMJ 75358 actaatgctgcttctttataat Unknown function 3TJTR zinc finger DHHC domain containing 2 (Zdhhc2) NM_178395 actaatacaaattactattaac Unknown function 3UTR zinc finger homeobox 1a (Zfhxla) NM_011546 cgtaatacgacaagtcttggagactaat transcription regulation CDS zinc finger homeobox 1b (Zfhxlb) NMJM5753 actaattcctgtgtttaat development nervous system 3'UTR zinc finger protein 148 (Zfp148) NM_011749 tttaatttttggagactaac transcription regulation 3UTR zinc finger protein 189(Zfp189) NM_145547 gttaacaactaat transcription regulation 3UTR zinc finger protein 2 (Zfp2) NM_009550 actaatgtggtaac transcription regulation 3UTR zinc finger protein 2 (2fp2) NM_178447 tttaaccttttgatactaat transcription regulation 3'UTR zinc finger protein 27 
(Zfp27) NM_011754 actaattacatggtaat DNA binding 3'UTR zinc finger protein 281 (Zfp281) NMJ 77643 actaactctaat Unknown function CDS zinc finger protein 292 (Zfp292) NM_013889 actaatattaat Unknown function CDS zinc finger protein 363 (Zfp363) NM_026557 tgtaataaattactaat electron transport 3'UTR zinc finger protein 364 (Zfp364) NM_026406 actaattcaggggtctgagactctaac Unknown function 3'UTR zinc finger protein 397 (Zfp397) NM_027007 actaaccaataaaaataat DNA binding 3'UTR zinc finger protein 40 (Zfp40) NM_009555 actaatgcaatgaatttgactaat electron transport CDS zinc finger protein 423 (Zfp423) NM_033327 tttaattttttaattaaagactaat Unknown function 3'UTR zinc finger protein 513 (Zfp513) NMJ 75311 tataataaaggaaacactaac Unknown function 3'UTR zinc finger protein 60 (Zfp60) NM_009560 tgctaatcgaaatattactaac metabolism CDS zinc finger protein 9 (Zfp9) NM_011763 atctaactgaagcactttgagaacactaat metabolism CDS zinc finger protein subfamily 1A 5 (Zfpn1a5) NM_175115 ggtaatttatctccactaat Unknown function 3'UTR

*When a QRE site occurred in two or more regions of a sequence, the 3'UTR was favored over the CDS and 5'UTR.
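The sequences listed in Table 2.1 each appear to combine a core site (ACUAAY, written ACTAAY in the DNA alphabet of the table) with a UAAY half site, in either order, separated by a short spacer. The sketch below scans a sequence for such pairs and applies the region rule from the footnote. It is illustrative only: the 1 to 20 nt spacer window is an assumption (it matches the range commonly quoted for the QRE, but the table itself does not state it), the CDS-before-5'UTR tie-break is an assumption (the footnote only says the 3'UTR was favored), and `find_bipartite_sites` and `reported_region` are hypothetical helper names, not the procedure used to build the table.

```python
import re

# Hypothetical motif definitions in the DNA alphabet used in Table 2.1:
# core site ACUAAY -> ACTAA[CT], half site UAAY -> TAA[CT].
CORE_LEN, HALF_LEN = 6, 4
CORE_RE = re.compile(r"(?=ACTAA[CT])")  # zero-width lookahead keeps overlapping hits
HALF_RE = re.compile(r"(?=TAA[CT])")

# Only "3'UTR first" comes from the footnote; CDS before 5'UTR is an assumption.
REGION_PRIORITY = ("3'UTR", "CDS", "5'UTR")


def find_bipartite_sites(seq, min_spacer=1, max_spacer=20):
    """Return (core_start, half_start, spacer) for each core/half-site pair,
    in either order, whose spacer length lies in min_spacer..max_spacer."""
    s = seq.upper().replace("U", "T")
    cores = [m.start() for m in CORE_RE.finditer(s)]
    halves = [m.start() for m in HALF_RE.finditer(s)]
    sites = []
    for c in cores:
        for h in halves:
            if h >= c + CORE_LEN:          # core upstream of the half site
                spacer = h - (c + CORE_LEN)
            elif h + HALF_LEN <= c:        # half site upstream of the core
                spacer = c - (h + HALF_LEN)
            else:                          # overlap, e.g. the TAAT inside ACTAAT
                continue
            if min_spacer <= spacer <= max_spacer:
                sites.append((c, h, spacer))
    return sites


def reported_region(regions):
    """Pick the single region to report when a site occurs in several."""
    for region in REGION_PRIORITY:
        if region in regions:
            return region
    raise ValueError(f"unrecognized regions: {regions}")
```

For example, `find_bipartite_sites("actaatataat")` (the rhodopsin entry) returns `[(0, 7, 1)]`: the core starts at position 0 and the half site at position 7, one nucleotide downstream of the core. `reported_region({"CDS", "3'UTR"})` returns `"3'UTR"`.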

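Table 2.2 below summarizes the SELEX-selected nucleotides as a seven-position frequency matrix (rows A, C, U, G). One common way to apply such a matrix, sketched here as an assumption rather than as the procedure used to build Table 2.3, is to convert each frequency to a log2-odds score against a uniform 0.25 background, add a small pseudocount so that unobserved nucleotides do not produce log(0), and report every window whose summed score passes a threshold. The pseudocount, background, threshold, and the names `score_window` and `scan` are illustrative choices; only the frequencies are taken from the table.

```python
import math

# Nucleotide frequencies transcribed from Table 2.2 (positions 1 to 7).
MATRIX = {
    "A": [0.1168, 0.9390, 0.0,    0.0121, 0.9620, 1.0, 0.0],
    "C": [0.0849, 0.0243, 0.9512, 0.0121, 0.0,    0.0, 0.8888],
    "U": [0.6623, 0.0121, 0.0487, 0.9634, 0.0253, 0.0, 0.1111],
    "G": [0.1558, 0.0243, 0.0,    0.0121, 0.0126, 0.0, 0.0],
}
WIDTH = 7
PSEUDOCOUNT = 0.01   # assumption: keeps log2 finite for zero frequencies
BACKGROUND = 0.25    # assumption: uniform nucleotide composition


def score_window(window):
    """Sum of per-position log2 odds for a 7-nt RNA window."""
    return sum(
        math.log2((MATRIX[nt][i] + PSEUDOCOUNT) / BACKGROUND)
        for i, nt in enumerate(window)
    )


def scan(seq, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    s = seq.upper().replace("T", "U")  # accept the DNA alphabet used in the tables
    hits = []
    for i in range(len(s) - WIDTH + 1):
        window = s[i:i + WIDTH]
        if not all(nt in MATRIX for nt in window):
            continue  # skip windows containing N or other non-ACGU characters
        score = score_window(window)
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits
```

Because each position is scored independently, the highest-scoring heptamer is the per-position consensus UACUAAC (a score of about 13 under these assumptions). With a threshold of 10, `scan("ggtactaacgg", 10)` reports a single hit, starting at position 2.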
Table 2.2 Matrix based on the nucleotides identified by SELEX.

Position    1        2        3        4        5        6        7
A           0.1168   0.9390   0        0.0121   0.9620   1        0
C           0.0849   0.0243   0.9512   0.0121   0        0        0.8888
U           0.6623   0.0121   0.0487   0.9634   0.0253   0        0.1111
G           0.1558   0.0243   0        0.0121   0.0126   0        0
Consensus   U        A        C        U        A        A        C

Table 2.3 mRNA targets identified based on the matrix (Table 2.2) and a fixed core or half-site sequence.

Gene Sequence Region 3'-phosphoadeno5ine S'-phosphosullate synthase 2 (Papss2) NM_011864 taccaatgtcctgttggcttggtatsat 5UTR 3-phosphoinositide dependent pralein kinase-t (Pdpkl) NMJ11062 actaactcttttgaactagacttac CDS 51O-methylenetetrahydroTolate reductase (Mthfr) NMJ10840 tcctaacggccgdggggtaat CDS 5'-3" exoribonuctease 1 (Xrn1) NM_011916 tatgaacaagcctcatgggactaac CDS 5-hydroxytryptamine (serotonin) receptor 1D (Htrld) NMJ08309 gataacaagacatgagaatatacaaac 5UTR 5-hydroxytryptamine (serotonin) receptor 2 A (Htr2a) NMJ72812 cattaacattgcgtggatttttaat 5UTR 5-hydroxytryptamine (serotonin) receptor 2C ^(tf2c) NMJ308312 aattaattgggatgaaacaatadgttaac 5UTR 5"-nucleotidase cytosolic IB (Nt5db) NMJK7588 tactgatgactaat CDS 6-phosphofructo-2-kinase^ructose-26-biphosphatase 2 (Ptkfb2) NMJJ08825 aadtatgtgtctaagaaadaac CDS a disintegrin and metalloprotease domain 24 (testase 1) (Adam24) NM.010086 gacttactatctggactaat CDS a disintegrin and metalloprotease domain 29 (Adam29) NM_175939 actaatttaatactcacatac 5UTR a disintegrin and metalloprotease domain 3 (cyfitestin) (Adam3) NM_009619 tataacctacttac CDS a disintegrin and metalloprotease domain 5 (AdamS) NM_007401 tataaciccaactgcagltgtttatacttac CDS A kinase (PRKA) anchor protein (yotiao) 9 (Akap9) NM_194462 gadtatcaaccaactaac CDS acid phosphatase 2 lysosomal (Acp2) NMJ07387 actaacagactgac 3UTR activating transcription factor 2 (AH2) NMJ09715 gataattcdgttaac 5UTR acyt-Coenzyme A oxidase 1 palmitoyl (Acoxt) NMJ15729 taccaactgtcacactaac 3UTR adaptor protein complex AP-1 gamma 1 subunit(Aplgt) NMJI09677 tcdaacttgattcagttaat CDS adaptor-related protein complex 3 beta 1 subunit (Ap3bl) NM_009680 cataatccatatgctaac CDS ADP-ribosytation factor-like 6 interacting protein 6 (Art6ip6) NM.022989 actaacagatactgat 3UTR aldo-keto reductase family 1 member D1 (Akrldl) NMJ45364 tacaaaccagtcactaac CDS alpha thalassemia/mental retardation syndrome X-linked homolog NMJJ09530 tatgaacccactaac CDS aminoadipate-semialdehyde synthase (Aass) NMJ13930 
gadgatccataac 5'UTR amylase 1 salivary (Amy!) NMJW7446 tacgaattcctaaaaacgtttaat 5'UTR amyloid beta (A4) precursor protein-binding family B member 2 NM_009686 actaacactgcatccttggacgaac 3UTR angiomotin like 2 (Amotl2) NM_019764 cttaactgaggtgatcaccaac 5UTR ankyrin repeat and SOCS box-containing protein 6 (Asb6) NMJ 33346 dtaacaacccaataaacaatgaccaac 5UTR ankyrin repeat domain 17(Ankrd17) transcript variant 1 NMJJ30886 tgdaacagttcdtaat CDS ankyrin repeat domain 17 (Ankrdl 7) transcript variant 2 NMJ98010 tgdaacagttcdtaat CDS apoptotic protease activating factor 1 (Apafl) NMJTO9684 tattaacagtttgtccttaac 3UTR arachidonate 12-lipoxygenase 12R type(Alox12b) NM_009659 cttaatggtattaac CDS artemin (Artn) NMJ09711 tgttaaccdcdggdggdgtaat 5'UTR ATPase aminophospholipid transporter-like class I type 8A NM_015803 gataattttggdacaadtgdaac CDS ATPase Ca++-sequestering (Atp2c1) NMJ75025 gttaalgaactttcctaac CDS ATPase Cu++ transporting alpha polypeptide (Atp7a) NMJTO9726 tgdaaccccdccdgtdtaac CDS ATP-binding cassette sub-family B (MDR/TAP) member 1B (Abcblb) NMJJ11075 tattgadtcatttadaat CDS ATP-binding cassette sub-lamity C (CFTR/MRP) member 10 (AbccIO) NMJ45140 taccaaccccggccdggtadaac CDS ATP-binding cassette transporter sub-family A member 9 (Abca9) NMJ47220 tttaacaaaacaaac 5UTR baculovital IAP repeat-containing 3 (Birc3) NMJ07465 tataacagacttat 5'UTR baculoviral IAP repeat-containing 4 (Birc4) NM 009688 tttaatagattaaaaacatttgctaac CDS Bardet-Biedl syndrome 2 homolog (Bbs2) NM_026116 adaatttacagccgcagtccgaggacagac 5'UTR B-cetl CLUIymphoma 6 member B (Bcffib) NM„007528 tttaaccgacdgdaac CDS B-cell leukemia/lymphoma 2 (Bcl2) NMJ77410 tattaacaaagcttaat 5'UTR B-cell translocation gene 4 (Btg4) NM_0t9493 tgtaatgtgtttttattaat 5TJTR biiiverdin reductase B (flavin reductase (NADPH)) (Btvrb) NMJ44923 gaccaaccadaac CDS bone morphogenetic protein 2 (8mp2) NM_007553 tgctgaccacdgaadccadaac CDS bone morphogenetic protein 6 (Bmp6) NMJJ07556 adaatrtdgtcaagacaaac 3UTR BRAF35/HDAC2 complex (Bhc80) NMJ 
38755 tttaatatgctaat 5UTR brain and reproductive organ-expressed protein (Bre) NMJ44541 tactgatcggadtaagttccagtggtaac 5UTR cadherin 5 (Cdh5) NM_009868 actaatgatadtat 3UTR cadherin 6 (Cdh6) NMJ107666 tcdaadcggaaaaatggctataac CDS calsyntenin 2 (Cfstn2) NMJ122319 tatgaaccatacctggtgactaat CDS carbonic anhydrase 4 (Car4) NMJM7607 lacctadtccgttacaatggdcadaac CDS carboxyi ester lipase (Cel) NMJ09885 tgctaacctlccaggtaac CDS cardiac lineage protein 1 (Clp1) NMJ 38753 cttaaccctttattgac 5UTR cardiomyopathy associated 4 (Cmya4) NMJ 78680 adaacdgtctgggcgc8gtgacaaac CDS Casitas B-lineage lymphoma-like 1 (Colli) NMJ 34048 cactgacaatgagttacaaggcadaat CDS caspase recruitment domain family member 14 (Cardl 4) NMJ 30886 aacaaatgactaat 5TJTR catenin src (Catns) NMJJ07615 gttaattggcaagcatgctattcctaac CDS cathepsin B (Ctsb) NMJM779S adaaccacgdgcaattaaaaacctaccaat 3TJTO cathepsin 0 (Ctsd) NMJW9983 atfaatgdtggtggcactgac CDS caveolin 2 (Cav2) NMJ)16900 adaacattacaaat 3UTR CBFA2T1 identified gene homolog (Cbta2t1h) NM_009822 tccaaactgcaagaagctactaac CDS Cd209e antigen (Cd209e) NMJ 30905 taccaacaggtgttactaat 3UTR cell division cycle 26 (Cdc26) NMJ39291 tltaatctcagcataggaagcatctgattaat 5UTR ceruloplasmin (Cp) NM.007752 tattgaccaagaatttgtadaat CDS c-fos induced growth factor (Figf) NMJJ10216 gacttactcaagtcatttcattggattttaat 5UTR CGG triplet repeat binding protein 1 (Cggbpl) NMJ 78647 cttaatatcaaggaaaaacaaac 5'UTR chaperonin subunit 2 (beta) (Cct2) NMJJ07636 tattaacagacagttaat CDS chemokine (C-C) receptor 3 (Ccr3) NMJ09914 tacaaattatgactaat CDS cholinergic receptor nicotinic alpha polypeptide 6 (Chrna6) NMJJ21369 actaatcggacaaac 3UTR chondroitin sulfate 6alNAcT-2 (Galnact2) NM_030165 ttctaattttgtttttaacttttggtaat 5UTR chromosome condensation 1-like (Chctl) NMJ34083 gacgaacaacttctctglaactaac 3UTR ciliary neurotrophic factor (Cntf) transcript variant 2 NMJ153007 actaacccagagcccctgac 5UTR cleavage and polyadenylation specific factor 2 (Cpsf2) NMJI16856 tttaatgctacttac CDS c-mer proto-oncogene 
tyrosine kinase (Mertk) NMJM8587 tttaattgctccatcaatattcctaac CDS COMM domain containing 1 (Commdl) NMJ44514 gattaatggtctgtaat 5UTR COMM domain containing 7 (Comrnd?) NMJ33850 aactgacttttetctccccagaactaac 3'UTR complement receptor 2 (Cr2) NM_007758 actaataacacatggttaccaggtgtaccaac CDS copinell(Cpne2) NMJ 53507 tatcaadtcaaccccactaac CDS cullin 5 (Cul5) NM_027S07 tacttacactgtttaat CDS cyctin M2 (Cnnm2) NM_033569 tactgacccctataac CDS cystin t (Cys1) NMJ38686 tccatacaatactaac 5UTR cytochrome c oxidase subunit Vila 2 (Cox7a2) NMJJ09945 tattaactgctgccaataaagcaatccttaac 3UTR cytochrome P450 51 (Cyp51) NMJK0010 tttaatcctgaccgctacttac CDS cytochrome P450 lamily 1 subfamily b polypeptide 1 (Cyp1 bl) NMJM9994 tttaactccactttatcaac 5'UTR cylochrome P450 lamily 2 subfamily j polypeptide 9 (Cyp2j9) NMJI28979 tgdaacataccttcagtggttataac CDS cytokine-dependent hematopoietic cell linker (Clnk) NMJ113748 actaacaggcacatac 5UTR cytotoxic and regulatory T cell molecule (Crtam) NM_019465 aartgacaagaggcttcctgactaac 3'UTR cytotoxic granule-associated RNA binding proteirt 1 (Tia1) NM_011585 gactgacagaacaactaat CDS dapper homofog 2 antagonist of beta-catenin (xenopus) (Dact2) NM_172826 tttaatcagcttacaaat 5'UTR DEAD (Asp-Glu-Ala-Asp) box polypeptide t (Ddx1) NMJ34040 gataacacaagacctggtgctaac CDS DEAD/H (Asp-Glu-Ala-AspMs) box polypeptide 26 (0dx26) NM_008715 tttaatcatttgctaat 5UTR DEAH (Asp-Glu-Ala-His) box polypeptide 32 (Dhx32) NMJ33941 tgctaaccatcgctgccatggtaac CDS deleted in azoospermia-like (Dazl) NM_01002t cacttactcttacttagtggaactaat 3UTR deltex 1 homolog (Dtx1) NMJJ08052 tttaattgcttttatttattaat 5UTR dentin matrix protein 1 (Dmp1) NMJ316779 actaatagttgatgcttaccacaacaaac CDS deoxyribonudease II beta (Dnase2b) NM_019957 tacagactaggcgactaac 5UTR desmoglein 1 gamma (Dsglc) NMJ81680 aacgaaccacaggatttgaactaat CDS desmoglein2(D5g2) NMJJ07883 gactgacacacagactaat CDS diacylglycerof kinase epsilon (Dgke) NMJ19505 cttaacttctattaac CDS DNA binding protein with his-thr domain (Dbphtl) NM_0t9416 
tataatatataatatactcctaac 5UTR DnaJ (Hsp40) homolog subfamily C member 7 (Dnajc7) NMJ19795 tcctaacaatgccagctattacggtaat CDS dopamine receptor D1A(Drd1a) NM_010076 adaacactaccaat 3UTR DOTMikehistone H3 methyitransferase (Dotll) NMJ 99322 actaatggggcccactactcgccactgac CDS Down syndrome cell adhesion molecule (Dscam) NM_031174 cttaatccatgataalacttac CDS drebrinl (Dbn1) NM_019813 tgctaaccttcttaat CDS dual specificity phosphatase 1 (Duspl) NM_013642 tgcttacctcatgaggactaac CDS dynamin 1-tike (Dnmll) NMJ52816 gttaacagaagccaactggatattaac CDS E1A binding protein p400 (Ep400) NMJJ29337 tccttaccaccggcagctgcaacaactaac CDS elastin microfibril interf acer 2 (Emilin2) NMJ45158 tgtaaccaagaccccggttcctaac 5'UTR elongation factor RNA polymerase II2 (EII2) NMJ 38953 actaataggtgaatttgaccaac CDS elongation factor RNA polymerase ll-like 3 (EII3) NMJ45973 adaatggtggcttggaccttgtgtaccaac CDS elongation of very long chain fatty acids (FEN1/Elo2 SUR4/Elo3 NM_019423 actaactggagccatgccgacctctacaaac 3VTR endothelial PAS domain protein 1 (Epasl) NM_010137 gacttactcaggtagaactaac CDS enhancer of polycomb homolog 1 (Epd) NMJJ07935 actaatgttatgttgctgccctacaaac 3'UTR enhancer of zeste homolog 2 (Ezh2) NMJ07971 tacttactacgataac CDS eosinophil-associatedribonuclease A family member 1 (Earl) NM.007894 tgtaatgttgaaatgatgcgtattaac CDS eosinophit-associated ribonuclease A f amity member 10 (Ear10) NMJ53112 tgtaatgttgaaatgcagcgtattaac CDS eosinophii-associated ribonudease A family member 2 (Ear2) NMJ107895 tgtaatgttgaaatgatgcgtattaac CDS eosinophii-associated ribonuclease A family member 6 (Ear6) NMJJ53111 tgtaatgttgaaatgcaggctattaac CDS epiplakin 1 (Eppkl) NMJ44848 ctccaaccaaatgggcactaac 5VTR EPM2A (lafohn) interacting protein 1 (Epm2aip1) NM_175266 gadtacagttttaggaagtcacactaat 3UTR estrogen receptor 1 (alpha) (Esr1) NM_007956 tgtaactcgccggctgccacttac 5VTR ets variant gene 3 (Etv3) NMJI12051 tacttacaagtttaac CDS eyes absent 1 homolog (Eyal) NM_010164 gttaacaaccacttctaat 5UTR F-box only protein 7 (Fbxo7) NMJ 
53195 actaacatcccagatgaacaagggactgat CDS Fc receptor IgE high affinity I alpha polypeptide (Fcerla) NM_0101S4 tacaaatgcaagtattattggctacaactaat CDS tetuin beta (Fetub) NMJJ21564 tacttacctgdtataac CDS fibrinogen gamma polypeptide (Fgg) NMJ33862 tacttactcaaaatcatctactactaat CDS fibrinogervtike protein 1 (FgH) NMJ45594 ggtaacaaaaacattaacttgctaac CDS fibroblast growth factor 15 (Fgfl 5) NMJ108003 actaatagggaacttac 3'UTR f ibromochJin (Fmod) NMJK1355 actaacaacggccttgctaccaac CDS FK506 binding protein 12-rapamyrin associated protein 1 (Frapl) NMJJ20009 actaacaagaatgttgaccaat CDS frizzled homolog 1 (Fzd1) NMJ21457 actaacgcggcgccgcctgac 5UTR G protein-coupled receptor 156 (Gpr156) NMJ53394 ttcaaatatgcatgatgaaggagactaat 5UTR G protein-coupled receptor 22 (Gpr22) NM_175191 actaatagagttttccatctgagctttat 5UTR G protein-coupted receptor 23 (Gpr23) NMJ75271 tttgaactaactaat 5TJTR G protein-coupled receptor 85 (Gpr85) NMJ45066 tataatgactttcctcttcctaac CDS GA6A(A) receptor-associated protein like 2 (Gabarapl2) NM_026693 actaactggagttacaaat , 3'UTR galanin receptor 1 (Galrl) NMJJ08082 actaatctctactcgaat 5UTR gamma-aminobutyric acid (GABA-A) transporter 4 (Gabt4) NMJ72890 tattaacaagttaac 3UTR gangliosicle-inducedditlerentiarJon-associated-protein10(Gdap10) NMJH0268 aacttacctctgggataat 5'UTR GATA binding protein 3 (Gata3) NM_008091 cttaactgcaaacaaac 5UTR gene modef 711 (NCBt) (Gm711) NMJ 98628 tcctaactctgaaaggcttgttaat CDS genera) transcription factor It A 2 (Gtf2a2) NMJ 99151 tgctadctctgagagcgtacgtggacttaac CDS gephyrin (Gphn) NM_172952 dtaalttaatattaac CDS glial cell line derived neurotrophic factor (Gdnf) NMJ10275 cacttactttataaggcagtcdtcadaac 3UTR GU-Kruppel family member GU3(Gli3) NMJJ08130 tgdaaccaggtaac CDS glucose phosphate isomerase 1 (Gpi1) NMJJ08155 adaacggadgat CDS glutamate receptor ionotropic kainate 1 (Grikt) NMJ46072 aacgaacdgttaat 5UTR glutamate receptor ionotropic NMDA3B (Grin3b) NMJ 30455 tcdtaccdcgcacagcacagtggtaac 5UTR glutamate receptor metabotropic 1 (Grm1) 
NMJJ16976 agcttatggccactaac 5'UTR glutaminerepeatproteinl (Glrpl) NMJJ08132 actaactgaaactgaaagtttcaccaaac 5'UTR alpha 1 submit (Glral) NMJ20492 tgccaacaacaacaacaccactaac CDS glycogen synthase kinase 3 beta (Gsk3b) NM.019827 cttaatgctgcatttatcattaac 5'UTR glycoprotein m6b (Gpm6b) NM_023t22 adaatgtagacgtaccaac 3UTR golgi autoantjgen golgin subfamily a 1 (Golgal) NMJJ29793 actaataataclgac CDS golgi autoantjgen golgin subfamily a 2 -3-beta (Hsd3b1) NM_008293 ttttaacaatttasc 5UTR hydroxysteroid dehydrogenase-2 de)ta<5>-3-beta (Hsd3b2) NMJ53193 gttaacctcactcccactgtgatctgctlac 5UTR immediate early response 2 (I er2) NM_010499 tcttaacccattctcgacttaac 5UTR immunoglobulin superfamily member 4 (Igdf4d) NMJ 78721 cacttaccgatgtgaagccactaac CDS immunoglobulin superfamily member 4A (Igsf4a) NMJ18770 gataacggtadtac CDS immunoglobulin superfamily member 6 (Igsf6) NM_030691 taccaacaaggacacadgdcactaac 3VTR inhibitor of kappaB kinase epsilon (Ikbke) NM_019777 adaadacctgtggcatadgat CDS integrin alpha X (ttgax) NM_021334 actaaccaaataggtggcctctacaaat CDS interferon alpha responsive gene (Ifrg15) NM_022329 tgtaadggadgac 5UTR interferon induced transmembrane protein 7 (Ifitm7) NM_028968 actaadcdataaac 5UTR interferon regulatory factor 7 (Irf7) NM_016850 taccaacadtgtaac 5UTR tnterferon-induced protein with tetratricopeptide repeats 2 NMJ08332 actaatacaartggcagtgaatcacttac CDS interleukin 10(1110) NMJ10548 tgdaaccgactccttaat CDS interleukin 3 (113) NM.010556 cttaacgatdggagacagtgdaac CDS interleukin enhancer binding factor 3 (Ilf3) NMJM0561 cataatgaagtgccgccacdcctaac CDS internexin neuronal intermediate filament protein alpha (Ina) NMJ46100 caccaacgagtacaagatcatccgcadaac CDS intestinal ceB kinase (Ick) NMJJ19987 cadtacagctgttdgaatfaat 3UTR Iroquois related homeobox 4 (lrx4) NM_018885 actaacgtactgac CDS Janus kinase 2 (Jak2) NM_008413 tatgaactactaac CDS Jun oncogene(Jun) NMJH0591 gttaacagtgggtgccaadcatgdaac CDS karyopherin Omportjn) alpha 2 (Kpna2) NMJM0655 tgdaacttaccagdgcccgacttaac CDS 
karyopherin flmportin) betal (Kpnbt) NMJ108379 adaadgaagttdggctadgat CDS keratin complex 2 basic gene 6g (Krt2-6g) NMJJ19956 actaacagaaaaccaac 3'UTR kinectin 1 (Ktnl) NM_008477 tacttacattcctttggataat CDS kinesin family member 1B (Kiflb) NM_008441 tgdaacagcgtdctgataat CDS kinesin family member 3C (Kif 3c) NMJJ08445 ggtaacdagggattctgcdaat 5UTR kynurenine aminotransferase II (Kat2) NM_011834 adaacatattatadgaattatttacaaat 3'UTR lactotransterrinfUf) NMJJ08522 tgdaaccagaccagatcdgcaaatttaat CDS lady bird-like homeobox 1 homolog (Lbxlh) NMJ10691 gataattgattadcdcgatcaaggdaac 5UTR lamirin receptor 1 (ribosomal protein SA) (Lamrl) NMJJ11029 cgtaacttaaagggaaacttac 5UTR large tumor suppressor 2 (Lats2) NM_015771 tatttatggtaaaaggaaactggactaac 5UTR lectin galactose binding soluble 7 (Lgals7) NMJW8496 tataaccdcatgtatttatgcctaac 5UTR lectin galaotostde-binding soluble 3 binding protein (Lgals3bp) NMJJ11150 actaadccadgac CDS lectin mannose-binding 1 (Lmant) NM_027400 adaacaactacaaac 3VTR leishmanorysin-lke (metallopeptidase MS family) (Lmln) NMJ 72823 tgtaacdgcagaggtttcdaac CDS leprjn receptor (Lepr) NMJ46146 tgtaacagtgctaac CDS leucine rich repeat transmembrane neuronal 2 (Lrrtm2) NMJ 78005 tattaacttataat 5UTR leudne-rich repeat-containing G protein-coupled receptor 8 (Lgr8) NMJJ80468 cataacgaadccaccttcdaac CDS leukocyte receptor duster (LRC) member 8 (Leng8) NMJ 72736 ggtaacgagtcccgggagcgagcggtcdaac 5UTR lipase hepatic (Lipc) NM_008280 ggtaacgtgttttaaggttaataattaat 5TJTR low density lipoprotan-retated protein 1B (deleted in tumors) NMJ)53011 tatttacggdgtttccctatcatgttaac 5TJTR Ly1 antibody reactive done (Lyar) NMJ)25281 ttdaaltatttaac 5UTR lymphocyte antigen 108 (Ly108) NM_030710 gttaatcacgacaaccaaaggtttgdaac 5'UTR lymphocyte antigen 75 (Ly75) NMJ113825 gactgacffigtagaccaggdagccactaat 3'UTR MAO homolog 4 (Madh4) NMJ108640 tadtaccatcataac CDS mannose receptor C type 1 (Mrd) NM_008625 actaactggggtgdgac CDS mannoside acetyfglucosaminyltransterase 2 (Mgat2) NMJ46035 
tacttacgatgattataac CDS mannoside acetylglucosaminyltransf erase 4 isoenzyme A (Mgat4a) NMJ73870 actaattgcttatcaac CDS MWk1-related proteirv2 (Mlr2) NMJ72154 gttaatttatattaat 5UTR melanocortin 2 receptor (Mc2r) NMJ08560 aaccaacatgaagggtgccatgacadaac CDS microfibrillar-associated protein 1 (Mtapl) NMJJ26220 adaatctacttac 5'UTR microtubtie-associated protein 1 B (Mtapl b) NMJ308634 caccaacaaagacaaggccgaactaat CDS microtubule-assodated protein 4 (Mtap4) NMJ308633 tatfgaccaggdgagccdttaac CDS minichromosome maintenance deticient 8 (Mcm8) NM_025676 aactgacaaaagctgatgaaataactaac CDS mitochondrial ribosomalprotei n S17(Mrps17) NMJ125450 tacttaefgaagtadttaat CDS mitogen activated protein kinase kinase 5 (Map2k5) NM_011840 actaattcctcttggccggtcccccaac 5UTR mitogen activated protein kinase kinase 7 (Map2k7) NMJ111944 cacttactatgcactgacataat 5UTR mitogen activated protein kinase kinase kinase 4 (Map3k4) NMJJ11948 tccttacgtcatctggactaat CDS mitogen-activated protein kinase 4 (Mapk4) NMJ 72632 gataacgatacaggdtttatttctaat 5UTR mitogen-activated protein kinase kinase kinase 7 interacting NMJ 38667 adaattccactcaggcdagaattgcdat 5UTR MrgA4 RF-amide G protein-coupled receptor (MrgA4) NMJ 53524 tcctaacttctgttaac CDS mucin 6 gastric (Muc6) NMJ 81729 aadgacgagatacacatcacdcaartaac CDS myocardin (Myocd) NMJ46386 actaaccagaacaaac 3UTR myosin heavy polypeptide 2 skeletal muscle adult (Myh2) NMJ44961 gataacgcctaccagttcalgctaac CDS myosinic (Myolc) NM_008659 actaaccaagacggccctcagtgttgac CDS myosin tXb (Myo9b) NMJ15742 tattaaddgcacccttgtaac 3UTR myosin light polypeptide 1 alkali; atrial embryonic (Mylt) NMJ21285 tgttaacccatcdtttaat 5UTR natriuretic peptide receptor 1 (Npr1) NM_008727 tacttscaaagaacccgataat CDS nescient helix loop helix 1 (Nhlhl) NMJJ10916 actaactttgcagacagat 5TJTO neural precursor cell expressed developmental down-regulted gene NMJ310890 gataattacaccdacagataaatcctaac CDS neuritin 1 (Nm1) NMJ53529 actaactatttaaaggtctgcggtcgcaaat 5UTR neurogenin 2 (Neurog2) NM_009718 
actaacgagtgtgcagagcagactgac 3UTR neurotensin receptor 2 (Ntsr2) NMJ08747 actaacagtctaagcggacctactgac 3UTR nicastrin (Ncstn) NMJJ21607 aacaaaccactaat 3UTR non-metastatic cells 7 protein expressed in (Nme7) NMJ78071 tatcgacgtggtgctgtctcaacactaac 5UTR nuclear factor l/B (Nfib) NMJ308687 cacttacagtcactaat 3UTR nuclear receptor coactivator 1 (Ncoal) NM_010881 tacttaltcaaagttgtccgtgtaat 5UTR nuclear receptor subtamily 2 group C member 1 (Nr2c1) NM_011629 tatagacagcgatggtggtggacadaat 5UTR nucleosome assembly protein Mike 2 (Nap1l2) NM_008671 cataatdgegtadtae CDS nyctalopin (Nyx) NMJ 73415 cttaacctaqjgataaggacttat 5UTR olfactory receptor t076(OHr1076) NMJ46406 gataattctaattctatcaactattaac CDS oltactory receptor 1128 (Ollrl 128) NMJ46349 ggtaatcaatattaac CDS oltactory receptor 1214 (Ollrl 214) NMJ46897 tgctaacagtgggtctatctgcataat CDS olfactory receptor 1215(Oltr1215) NMJ46459 tgtaattttagtgtttgctaac CDS olfactory receptor 1443(Oltr1443) NMJ46698 gataacttcctaat 5UTR olfactory receptor 1450 (Olfr1450) NMJ46371 tataatcttcatgtacttac CDS oltactory receptor 16 (Ollrl 6) NM 008763 gataaccddttgtggttttcctaac CDS olfactory receptor 266 (Ollr266) NMJ46489 tataatgttttgcatcttdacatcctaac CDS olfactory receptor 583 (Oltr583) NMJ46757 gataalgcacadctactattaac CDS olfactory receptor 600 (OffrfSOO) NMJ47046 tacttacagattataat CDS olfactory receptor 64 (Olfr64) NMJ13616 tacttatagagcattaatgtaac 5UTR olfactory receptor 65 (Oltr65) NMJM3617 ttdaacataadddgttataat 5UTR oltactory receptor 725 (Olfr725) NMJ46317 gttaacttgcxattttgtggtcdaac CDS olfactory receptor 726 (OWr726) NMJ46316 gttaacttgccattttgtggtcctaac CDS olfactory receptor 727 (OHr727) NMJ46319 gttaacttgccattttgtggtcctaac CDS olfactory receptor 855 (Olf r855) NMJ46524 tataatatcattatgaalcdaac CDS olfactory receptor GA_x5J8B7W6HFP-2821674-2820736 NMJ46267 gataacaccgttadtac CDS olfactory receptor MOR105-4 (MOR10W) NMJ46719 tccaaadttgtatccaactaac CDS olfactory receptor MOR114-5 (MOR114-5) NMJ46863 tatcaacaagggggtaatagtadaac CDS olfactory receptor 
MOR135-3 (MOR135-3) NMJ47023 tacttatgcccatcaadaat CDS olfactory receptor MOR139-4 (MOR139-4) NMJ47040 actaacggacttac CDS olfactory receptor MOR145-5 (MOR145-5) NMJ46558 tcccaadcdtaatdtgdtgtadaac CDS oltactory receptor MOR147-1 (MOR147-1) NMJ47067 tattgacaacatactaat CDS oltactory receptor MOR170-7(MOR170-7) NMJ46480 actaacagaccaac CDS olfactory receptor MOR 171 -8 (MOR171 -8) NMJ46815 gactgattttatcctcgaaggadaac CDS olfactory receptor MOR177-14 (MOR177-14) NMJ46293 adaatgggaattaccaac CDS olfactory receptor MOR185-1 (MOR185-1) NMJ47078 adaalcadccttagacadgac CDS olfactory receptor MOR185-2 (MOR185-2) NMJ47017 actaaccadcactagatactgat CDS olfactory receptor MOR189-3(MOR189-3) NMJ46405 gataacttttattttatcaattattaac CDS olfactory receptor MOR202-10 (MOR202-10) NMJ46696 tattgacctcttadtadaat CDS olfactory receptor MOR202-37 (MOR202-37) NMJ46291 tattgaedcttadtadaat CDS olfactory receptor MOR202-8 (MOR 202-8) NMJ46698 gaccaacagaatcaadaac CDS olfactoryreceptorMOR203-1 (MOR203-1) NMJ46865 tadgacaatttttgttggttttaac CDS olfactory receptor MOR205-1 (MOR205-1) NMJ46428 actaattttcatcgccatcdacaaat CDS olfactory receptor MOR249-2 (MOR249-2) NMJ46328 tattgacatctgctacadadactaat CDS olfactory receptor MOR250-1 (MOR250-1) NMJ46835 tttaaccatttadtac CDS olfactory receptor MOR262-2 (MOR262-2) NMJ46858 cattaatatatadggactaac CDS oltactory receptor MOR283-6 (MOR283-6) NMJ46598 actaattctgcttac CDS ornithine decarboxylase structural (Ode) NMJM3614 tttaaccgddaac 5UTR P450 (cytochrome) oxidoreductase (Por) NMJJ08898 adaacccgccacgaaccaac CDS paired immunoglobin-like type 2 receptor alpha (Pilra) NMJ53510 cataacagaagtcaagagtgctaac CDS pantothenate kinase 3 (Pank3) NMJ45962 tattaadgtgatagttttataac 3UTO PHD finger protein 10 (Phl10) NMJJ24250 cttaatcgaalacaaat 5TJTR pheromone receptor V3R1 (V3R1) NMJ30742 tdtaacacattgttataac 5UTR pheromone receptor V3R2 (V3R2) NM_030741 tataacatgtcagttdcdgtdacaaat 5UTR pheromone receptor V3R9 (V3R9) NM_03O735 cataatttatctcctaac CDS phosdurin-like (Pdd) NMJK6176 
tattaacaaggcgacttcttgtttaac 3UTR phosphatase orphan 1 (Phosphol) NMJ53104 gataatatadgac 5UTR phosphate regulating gene with homologies to endopeptidases NMJJ11077 actaattgagatcttgaattggatgcagac 5UTR phosphatidylinositol 3-kinase C2 domain containing alpha NMJ111083 gaccaactgtcagtcatatgactaac 3UTR plasma membrane associated protein S3-12 (S3-12) NMJJ20568 actaaccalggggtaggacaagccatcctgac CDS plastin 3 (T-isoform) (Pts3) NMJ45629 actaactaaacctgaaaaccaggatattgac CDS pleckstrin homology domain containing family K member 1 (Plekhkl) NM_133244 cataatacaaaagaaaatcggagagaccaac 5TJTR pleckstrin homology Sec7 and coiled-coil domains binding protein NM_139200 tgctaacgatagagactcttaat CDS pleiotrophin (Ptn) NMJM8973 aacaaacggtcttataac 5UTR plexin domain containing 2 (Plxdc2) NMJM162 actaacctccttctctcgggaacaccaac 5UTR polyadenylate-binding protein-interacting protein 2 (Paip2) NMJJ26420 tattaacggtcattctcatgaagaggataat CDS polymerase (DNA directed) epsilon (Pole) NMJ111132 taccaaccaccagtaccaggaadaac CDS polymerase (DNA directed) epsilon 2 (p59 subunit) (Pole2) NMJJ11133 actaatcatgdgcaccaac CDS potassium inwardly-rectifying channel subfamily J member 2 NMJJ08425 tattaattatatatatatataat 5'UTR potassium inwardly-rectifying channel subfamily J member 8 NM_008428 cttaactcagttctggaggaccaac 5UTR procollagen type IV alpha 1 (Col4a1) NMJM9931 actaaccacagaagaatgactgac 3UTR procollagen type X alpha 1 (CollOal) NM_009925 tgdaaccacggggtaac CDS prolactin tiKe protein 0 (Prlpo) NMJ26206 tccttacaggcaactaac CDS protease inhibitor 15 (Pi15) NMJJ53191 tadgacaatdgtgctttccaggggtaac CDS protein kinase cG MP-dependent type II (Prkg2) NMJM6926 aacttacaaalacactaat 3UTR protein kinase X-linked (Prkx) NMJM6979 actaactgggttatgtaccteaaacaaac 3UTR protein phosphatase 1 regulatory (inhibitor) subunit 15b NMJ 33819 actaatgcaactgac CDS protein phosphatase 1 regulatory (inhibitor) subunit 9A (Ppp1r9a) NMJ81595 tgtaactcttgactgtgaatacatttdaat 5UTR protein phosphatase 4 regulatory subunit 1 (Ppp4rt) NM_146081 
cacagacacatatgcagcatactaat 5UTR protein tyrosine phosphatase non-receptor type 20 (Ptpn20) NM_008978 tattaacatttgtaac 3UTR protein tyrosine phosphatase non-receptor type 21 (Ptpn21) NMJ311877 tgdaacatgtcgcataac CDS protein tyrosine phosphatase receptor type R (Ptprr) NMJ311217 tataaccgatlccttgagtadtac CDS protein-tyrosine suHotransterase 1 (Tpstl) NMJ13837 gataatggactgac 5UTO protocadherin12(Pcdh12) NMJJ17378 ttctaattatgggacagagttgtaac 5UTR protocadherin gamma subfamily A 4 (Pcdhga4) NM_033587 tacaaatcagaccgcaggggactaat CDS protocadherin gamma subfamily A 8 (PcdhgaS) NM_033591 adaacgacaatgccodgttttlgaccaac CDS protocadherin gamma subfamily A 9 (Pcdhga9) NM_033592 actaacgataatgcccdgttttcgaccaac CDS protocadherin gamma sublamily C 3 (Pcdhgc3) NMJJ33581 actaacagctcatataaac CDS protocadherin gamma sublamily C 5 (PcdhgcS) NMJ533583 tgtaattcaagtagatgtaggggatgctaac CDS pyruvate dehydrogenase complex component X (Pdhx) NMJ 75094 gacaaacggadaat 3UTR RADMIike (Rad54l) NMJJ09015 cttaacdcattggtgctaac CDS RAN binding protein 2 (Ranbp2) NMJ11240 cacaaadtttcacggggdccadaac CDS RAN member RAS oncogene family (Ran) NMJ309391 tgtaacctcaagagttadtac CDS Rap guanine nucleotide exchange factor (GEF) 5 (Rapgef 5) NMJ 75930 tccgaaccctgagacadgcaggadaac CDS RAR-related orphan receptor beta (Rorb) NMJ46095 maattgacagaaccaac 51ITR RAS protein-specific guanine nucleotide-feleasing factor 1 NM_011245 ggtaattcagtadtac CDS RAS-like family 2 locus 9 (Rasl2-9) NMJTO9028 cgtaacatcaagagttadtac CDS receptor (calcitonin) activity modifying protein 3 (Ramp3) NMJM9511 artaadgcaccgagatggagaccaac CDS renat tumor antigen (Rage) NMJ11973 actaatatgtgaacttat CDS rettcdocalbin 3 EF-hand calcium binding domain (Rcn3) NMJJ26555 agdaactcaggccgggtaac 5UTR retinoblastoma-Jike 1 (p107) (Rbl1) NM_011249 cadgadgcccagtcadaat CDS retinoid X receptor beta (Rxrb) NMJH1306 adaacatdgccaggcagcigacaaac CDS retinol binding protein 1 cellular (Rbp1) NMJI11254 gcccaacctgagttdgtadaac 5UTR REV3-iike catalytic subunit of DNA 
polymerase zeta RAD54 like NMJ111264 cataattataatntgatattaac CDS RF-amide G protein-coupled receptor (MrgA1) NMJ 53095 cgccaacagcacccacaacaactaat 5UTR Rho GTPase activating protein 1 (Arhgapl) NMJ46124 gadgacagagcttttactaat 3UTR Rho GTPase activating protein 12 (Arhgap12) NMJJ29277 tattaatgcatcgaggcacaaggcataat 5UTR Rho-guanine nucleotide exchange factor (Rgnef) NM_012026 adaactgacgtgdgddtdtac CDS ribonuclease L (2 5'-oligpisoadenylate synthetase dependent) NMJI11882 tartgatgagatgteagaagacagaacataat 5UTR ribosomal protein S4 X-linked (Rps4x) NMJM9094 tgctaacttgggaagaattggtgtaat CDS ring finger protein 20 (Rnf20) NMJ 82999 tgtaacatgcgtaaaaaggatgcagtacttac CDS RNA (guanine-9-) methyl transferase domain containing 3 (Rg9mtd3) NMJ327266 tactladttgaaadcgtaac CDS RNA binding motif protein 7 (Rbm7) NMJ44948 tcctaacagctatgaaaggacagtgggtaac CDS RNA binding motif protein Y chromosome family 1 member A1 NMJ311253 tgtaacatltgtaadttttttaataaccaac 5UTR Rost proto-oncogene (Ros1) NM_011282 gataacagaaacaagffltatadtac CDS runt related transcription factor 1 (Runxl) NMJJ09821 tttaacgcactaagcggccagttgdaac 5UTR SA rat hypertension-associated homolog (Sah) NM_016870 tttaatccattaac 5UTR schlafen5(Slfn5) NMJ 83201 gaccaacagtacgggttgdttcaadaac CDS schwannomin interacting protein 1 (Schipl) NM_013928 tttaatdgcggggagtccdgcdaac 5VITR SEC23A (Sec23a) NM_009147 tcctaaccttactggaggatacatggtaat CDS SEC24 related gene family member A (Sec24a) NMJ 75255 actaaccattttccteatgtagctctaccaac CDS secretogranin HI (Scg3) NMJ09130 tcttaactccccttdcattcataac sum sema domain immunoglobulin domain (Ig) short basic domain NM_009t53 tattaacaagataac 3UTR sema domain seven thrombospondin repeats (type 1 and type 1-like) NM.009154 actaaccacatcaacaaac CDS serine protease inhibitor Kunitz type 1 (Spintl) NM.016907 actaattgagaaaatgcctiac sura sex comb on midleg-fike 2 (Scml2) NMJ 33194 ttcttactacgtgtactaac CDS sex determining region of Chr V (Sry) NMJJ11564 tacctacttactaac CDS SH3 domain protein D19 (Sh3d19) NMJH2059 
adaarcccfccacagacatgccacaadtac 3UTR SH3-domain binding protein 3 (Sh3bp3) NM_009165 aatttataagcaaatatactaac 5UTR siafyltransferase 10(alpha-23-sialyttransferase VI) (SiatlO) NMJ18784 gatttatttadaat 5UTR signaling intermediate in Toll pathway-evolutionarily conserved NMJ12029 gactgacatgggacttactaat 5VITR similar to vomeronasal 2 receptor 10; vomeronasal organ family 2 NMJ 98676 ggtaalgtttmgctadgat 5UTR similar to zinc finger protein 40 (LOC224598) NMJ45484 gacaaatgdltadaac CDS small proline rich-like 3 (Sprr13) NMJ)259S4 actaacadttagacaaac 3'UTR sodium channel voltage-gated type X alpha polypeptide (ScntOa) NM_009134 adaatctttccaaagcatcctatgaac CDS solute carrier family t (glial high affinity glutamate) NMJ48938 adaacaaggcgtgaac 5UTR solute carrier family 1 (glutamate transporter) member 7 (Sid a7) NMJ46255 tcdaatgatgcagdgglaat 5UTR solute carrier family 12 member 2 (Stct 2a2) NMJJ09194 tattaacaggtaac 3UTR solute carrier family 15 (H+/peptide transporter) member 2 NMJ121301 adaatatcaccaac CDS solute carrier family 2 (facilitated glucose transporter) member 5 NMJ119741 tattaactgtaagqjatadadttglataac 3UTR solute carrier family 20 member 2 (Slc20a2) NMJ11394 cgtaaccaaacgaac 5UTR solute carrier family 24 (sodium/potassium/calcium exchanger) NMJ44813 tcctgattactaac 5'UTR solute carrier family 26 member 3 (Slc26a3) NM_021353 tttaacctgctaac 5'UTR solute carrier family 26 member 4 (Slc26a4) NM_011867 actaatcacagagattcctgactgac 3'UTR solute carrier family 26 member 8 (Slc26a8) NMJ46076 tacttacaagcaagataac CDS solute carrier family 30 (zinc transporter) member 8 (Slc30a8) NMJ72816 actaactatgattttgcaccaac CDS solute carrier family 37 (gtycerol-3-phosphate transporter) member NMJ 53062 cttaaccatgcaagccatttatcacadgac 5UTR solute carrier family 40 (iron-regulated transporter) member 1 NMJM6917 actaactatttlggtaaatgagagaactgac 3UTR solute carrier family 5 member 4a (Slc5a4a) NMJ33184 cattaactcaggacttatatataat 5UTR solute carrier family 7 (cationic amino acid transporter) NMJM1405 
actaacctcagactcactatgtat 5UTR solute carrier organic anion transporter family member 1a4 NMJI30687 tgctaacaagctgcagtactttttaat CDS solute carrier organic anion transporter family member 1a5 NM_130861 tgdaacaagdgcagtactttttaat CDS solute carrier organic anion transporter family member 1a6 NM_023718 tgtaactttatgctaac CDS sorting nexin 17 (Snx17) NMJ 53680 ggtaadgtgdaac CDS SoxLZ/Sox6 leucine zipper binding protein in testis (Soli) NM_021790 taltgacacdttaac 5UTR Sp7 transcription (actor (Sp7) NMJ 30458 tacttacccatdgadttgdccccttaac CDS sperm associated antigen 1 (Spagl) NM.012031 tgctaacagaatagcacgaatcttaac CDS sphingomyelin phosphodiesterase 3 neutral (Smpd3) NMJ121491 t&itaaccttgatgdttttaac 3UTR splicing factor arginine/serine-rich 14 (Sfrs14) NMJ 72755 tccaaacdcagggcadaac CDS SRY-box containing gene 6 (Sox6) NMJM1445 acctaactatgataac 5UTR stefinA3(Stla3) NMJJ25288 adaacaaaaccaagadgat CDS stress-induced phosphoprotein 1 (Stipl) NMJ16737 laccaacatgacctacataadaat CDS striamin (Strm) NMJM1501 ttcttatattgtttgcaadaac 5UTR stromal antigen 2 (Stag2) NM.021465 taccaadgatlttaat 5UTR sulfite oxidase (Suox) NMJ73733 tttaatgcagagcctcdcctgaactgdaac CDS suppression of tumorigenidty 7-like (St7l) NMJ 53091 tttaatcdcatgttccaaaatacttac COS SWI/SNF related matrix associated adin dependent regulator NM_009210 tacttacaatattttaac CDS synaptosomal-associated protein 25 (Snap25) NM_011428 tttaacgaagcaccadgac 5UTR synaptotagmin 2 (Syt2) NMJ109307 adaadgdtttcdgggttttggaccaac 3UTR tankyrase TRF1-trteractJng ankyrin-related ADP-ribose polymerase NMJ75091 cataattaacaggaattatgttaac 5UTR taste receptor type 2 member 107 (Tas2r107) NMJ 99154 cataactgaatatattadtac CDS T-box14(Tbx14) NM_011534 actaataatcagcaggctaccaac CDS T-box15(Tbx15) NM_009323 actaataatcagcaggctaccaac CDS telomerase reverse transaiptase (Tert) NMJJ09354 actaacdtaggttcttac CDS tenascin X8 (Tnxb) NM.031176 tcctaacttcacctaxcagccagcataac CDS testis specific protein DdcS (DdcS) NM_021440 ggtaattccacctaac 5UTR testis specific protein 
Wnase 1 (Teskt) NM_011571 aattaactctaatttaat 5UTR THO complex 1 (Thoct) NMJ53552 adaatgacadgaaacaaac CDS toll-like receptor 7 (Tlr7) NMJ33211 actaadtcataaaaattgdgac CDS topoisomerase(DNA) t(Top1) NM_009408 adaatgacgaaaaaaatacgattaccaac CDS transcription elongation factor B (SHI) polypeptide 1 (Tcebl) NMJ26456 tacdacaaggtccgdatadaac CDS transcription factor 4 (Tcf4) NM_0t36S5 cataatcttgtaatctgtggctaac 5TJTR transcription factor AP-2 alpha (Td ap2a) NM_011547 tattaacatcccagatcaaadgtaat CDS transcription factor-like 5 (basic helix-loop-helix) (Tdl5) NMJ 78254 tacttaaxadaat 3'UTR transducin-like enhancer of split 6 homolog of Drosophila E(spl) NM_053254 ccdaacctcagaggtg gadtaat 5UTR transformation related protein 53 (Trp53) NMJH1640 radtacgataaaaadtaat 5UTR transient receptor potential cation channel subfamily V member 6 NMJ322413 actaaccgcaccaac CDS translocase of inner mitochondrial membrane 10 homolog (yeast) NMJ13896 aadgatgcatcagaaaaggtgactaat 5UTR translocase of inner mitochondrial membrane 13 homolog a (yeast) NM.013899 ggtaadttttadattgac 5UTR tripartite motif protein 23 (Trim23) NMJ30731 tttaattaatgatacaaat 5VTR tripartite motif protein 24 (Trim24) NMJ 45076 adaactatccaagaagcatacttac CDS trophinin (Tro) NM_019548 tgccaadttggtggcgcadgadaac CDS tubby like protein 4 (Tulp4) NMJH4O40 tgtaactaaaggagaaaaaaacaaac 5'UTR tubulin delta 1 (Tubdl) NMJJ19756 maacacttdcttgoaac CDS tubulin tyrosine ligase-like 1 (TrJH) NMJ78869 cadgacattgagaagtcggtadaat CDS ubiquidn protein ligase E3A (Ube3a) transcript variant 2 NMJ11668 adaatgaatcgecrttaaaatacttat CDS ubiquirjn specific protease 3 (Usp3) NMJ44917 gacttacccaacatgagcaccaadaat 3UTR ubiquitin specific protease 9 X chromosome (Usp9x) NMJM9481 tcdaadtatcacaacdataat CDS UDP-GlcNAc;betaGat beta-13-N-acetylglucosaminyltransferase 5 NMJJ54052 adaactgaaacgtggtttggatgaat 5UTR UDP-GtcNAcrbetaGaf beta-13-N-acetvlglucosarninyltransferase 7 NMJ45222 actaadgclctatcaac CDS UD P-N-acetyl-alpha-D-galactosamine; polypeptide NMJ 72693 
tadtacgggactgtaccaadcagataac CDS uridine monophosphate synthetase (Umps) NMJ)09471 gadtadattccaoxattaaaadaac 3UTR Usher syndrome 1C homolog (Ushlc) transcript variant a1 NM_023649 actaataaacaagadgac 3UTR Usher syndrome 1C homolog (Ushlc) transcript variant b3 NMJ53677 adaataaacaagadgac 3'UTR vaccinia related kinase 1 (Vrk1) NMJ11705 gataadtgaaagatcdaac CDS vacuolar protein sorting 29 (S. pombe) (Vps29) NMJ19780 cadtacgtdatcaadaat CDS vacuolar protein sorting 35 (Vps35) NM_022997 tacttacaacaatatdtaac CDS vang van gogh-like 1 (VangM) NMJ 77545 tataacccgaacdcdaac CDS vomeronasal 1 receptor C1 (V1rc1) NMJK3231 actaatgtgagtgagaccaac CDS vomeronasal 1 receptor C15 (V1rct5) NMJ34170 cadgadaaaacagtgggattataat 5UTR vomeronasal 1 receptor C3 (V1rc3) NMJJ53233 taccaatgtgagtgagadaac CDS vomeronasal 1 receptor C30 (V1rc30) NMJ34185 actaatgtgagtgagaccaac CDS vomeronasal 1 receptor E1 (V1re1) NMJ34190 gacttattcgatgatgttgggtgtaaactaat CDS vomeronasal 1 receptor E9 (V1re9) NM_145842 cttaataagatacaaat 5TJTR vomeronasal 2 receptor 4 (V2r4) NM_009493 tacaaacaaattgtgaagtcactaac CDS VPS10 domain receptor protein SORCS (Sores) NM_021377 tacttacatgtaac CDS WD repeat domain 26 (Wdr26) NM_145514 tacttacagagcattgtaat CDS Wilms' tumour 1-associating protein (Wtap) NM_175394 gacttacttatgctaagaaccaactaat 3'UTR zinc linger proliferation 1 (Ziprol) NM_011757 tctaaactcagaactaat 51JTR Zinc finger protein 118 (Zfp118) NM_013843 tattaacatgaaattctgtaattataac 3UTR zinc finger protein 119(Zfp119) NM_ 144546 tactgacaaaatccccoa g g a a actaet CDS zinc finger protein 236 (Zlp236) IMM_177832 cacttacaaaacttaat 5UTR zinc finger protein 238 (Zfp238) NM_013915 gttaacagactctctggttgctaat 5UTR Zinc linger protein 275 (Zfp275) NM_031494 tccctacaggactaat 51JTR zinc finger protein 322a (Zlp322a) NM_172586 actaatgtatatgatgctggaaacttttaaac 5UTR zinc finger protein 353 (Zfp353) NM_153096 gttaatgacttctaac 5UTR zinc finger protein 386 (Kruppel-like) (Zfp386) NM_019565 actaatgattgtggattgcctgctactgat CDS zinc finger protein 
523 (Zfp523) NM_172617 gttaaccctatcggctgtcctgctaat 5'UTR zinc finger protein 535 (Zfp535) NM_026107 tccaaacctgacttgactaac CDS zinc finger protein 67 (Zfp67) NM_009565 tttaatgtatattctaccccatttgtcctaac 5UTR zinc finger ZZ domain containing 3 (Zzz3) NM_198416 actaatctcagtccccaggaaacaaac CDS

TABLE 2.4 MRNA CLASSIFICATION OF EACH HIT BASED ON GENE ANNOTATION.

TERM PERCENTAGE P-VALUE DEVELOPMENT 14.0 5.75E-11 HOMOPHILIC CELL ADHESION 2.4 1.08E-09 CELL ADHESION 6.4 1.50E-09 CELL-CELL ADHESION 3.5 1.20E-08 MORPHOGENESIS 9.0 2.32E-07 ORGANOGENESIS 7.9 5.10E-07 CELLULAR PROCESS 47.7 1.88E-06 TRANSPORT 14.9 2.23E-06 CELL DIFFERENTIATION 3.6 3.23E-06 ORGANISMAL MOVEMENT 2.8 3.39E-06 CELL GROWTH AND/OR MAINTENANCE 24.1 1.74E-05 REGULATION OF BIOLOGICAL PROCESS 17.9 2.15E-05 INTRACELLULAR TRANSPORT 5.3 4.97E-05 MALE GAMETE GENERATION 1.6 7.40E-05 SPERMATOGENESIS 1.6 7.40E-05 GAMETOGENESIS 1.9 8.66E-05 1.28E-04 NEUROPHYSIOLOGICAL PROCESS 4.4 1.63E-04 CELL PROLIFERATION 6.6 1.67E-04 MEIOSIS 0.9 1.98E-04 INTRACELLULAR PROTEIN TRANSPORT 4.2 2.6 2.07E-04 VESICLE-MEDIATED TRANSPORT 26.5 2.10E-04 CELL COMMUNICATION 3.1 2.72E-04 NEUROGENESIS 27.0 4.30E-04 CELLULAR PHYSIOLOGICAL PROCESS 1.1 5.21E-04 EQUILIBRIOCEPTION 1.1 5.21E-04 SENSORY PERCEPTION OF LIGHT 5.21E-04 VISUAL PERCEPTION 1.1 5.51E-04 REPRODUCTION 1.9 5.51E-04 SEXUAL REPRODUCTION 1.9 6.08E-04 MUSCULOSKELETAL MOVEMENT 1.1 6.08E-04 REGULATION OF BALANCE 1.1 7.87E-04 PROTEIN TRANSPORT 4.2 8.76E-04 ENDOCYTOSIS 1.5 1.29E-03 TRANSCRIPTION FROM POL II PROMOTER 2.2 1.57E-03 LIPID BIOSYNTHESIS 1.7 1.77E-03 ION TRANSPORT 5.3 2.13E-03 REGULATION OF PHYSIOLOGICAL PROCESS 13.5 2.21E-03 REGULATION OF LYMPHOCYTE PROLIFERATION 0.6 2.61E-03 NUCLEOCYTOPLASMIC TRANSPORT 1.2 2.88E-03 REGULATION OF TRANSCRIPTION FROM POL II PROMOTER 1.7 3.71E-03 EMBRYONIC DEVELOPMENT 1.5 3.97E-03 REGULATION OF METABOLISM 13.1 4.00E-03 DETECTION OF EXTERNAL STIMULUS 3.0 4.16E-03 CELL CYCLE 4.9 4.30E-03 BONE REMODELING 0.8

NEUROMUSCULAR PHYSIOLOGICAL PROCESS 1.7 4.67E-03 TRANSMISSION OF NERVE IMPULSE 1.7 4.67E-03 SYNAPTIC TRANSMISSION 1.6 4.75E-03 REGULATION OF T-CELL PROLIFERATION 0.4 6.94E-03 OSSIFICATION 0.7 7.50E-03 SENSORY PERCEPTION 2.8 8.34E-03 REGULATION OF CELLULAR PROCESS 4.6 8.42E-03 REGULATION OF NUCLEIC ACID METABOLISM 12.1 8.47E-03 MEMBRANE LIPID METABOLISM 1.0 9.43E-03 NUCLEAR DIVISION 1.3 9.60E-03 M PHASE 1.4 1.02E-02 REGULATION OF TRANSCRIPTION 12.0 1.18E-02 METAL ION TRANSPORT 2.7 1.22E-02 CELL MATURATION 0.3 1.23E-02 NERVE ENSHEATHMENT 0.3 1.23E-02 NERVE MATURATION 0.3 1.23E-02 PROTEIN AMINO ACID PHOSPHORYLATION 4.3 1.53E-02 TRANSCRIPTION, DNA-DEPENDENT 11.7 1.60E-02 TRANSFORMING GROWTH FACTOR BETA RECEPTOR SIGNALING PATHWAY 0.7 1.72E-02 REGULATION OF CELL PROLIFERATION 1.3 1.77E-02 MUSCLE CELL DIFFERENTIATION 0.2 1.84E-02 CELL-CELL SIGNALING 1.9 1.85E-02 SKELETAL DEVELOPMENT 1.0 1.98E-02 NEGATIVE REGULATION OF CELL PROLIFERATION 0.6 2.00E-02 REGULATION OF TRANSCRIPTION, DNA-DEPENDENT 11.5 2.04E-02 TRANSCRIPTION 12.2 2.12E-02 REGULATION OF CELL CYCLE 2.6 2.15E-02 DETECTION OF ABIOTIC STIMULUS 0.9 2.17E-02 INORGANIC ANION TRANSPORT 1.3 2.28E-02 TRANSMEMBRANE RECEPTOR PROTEIN SER/THR KINASE SIGNALING PATHWAY 0.8 2.30E-02 CELL DEVELOPMENT 0.4 2.31E-02 REGULATION OF BLOOD PRESSURE 0.4 2.31E-02 INTRACELLULAR SIGNALING CASCADE 5.8 2.52E-02 NUCLEOBASE, NUCLEOSIDE, NUCLEOTIDE AND NUCLEIC ACID METABOLISM 17.2 2.67E-02 REGULATION OF GPCR PROTEIN SIGNALING PATHWAY 0.3 2.79E-02 T-CELL PROLIFERATION 0.4 2.80E-02 LYMPHOCYTE PROLIFERATION 0.6 3.01E-02 LIPID METABOLISM 3.3 3.03E-02 MYELINATION 0.3 3.03E-02 REGULATION OF TRANSPORT 0.7 3.05E-02 ANION TRANSPORT 1.5 3.08E-02 INNATE IMMUNE RESPONSE 1.3 3.19E-02 RESPONSE TO ABIOTIC STIMULUS 3.0 3.27E-02 DETECTION OF LIGHT 0.4 3.34E-02 RESPONSE TO LIGHT 0.4 3.34E-02 FATTY ACID BIOSYNTHESIS 0.6 3.42E-02 RESPONSE TO RADIATION 0.6 3.42E-02 SEX DIFFERENTIATION 0.3 3.49E-02 CARBOXYLIC ACID BIOSYNTHESIS 0.6 3.65E-02 ORGANIC ACID BIOSYNTHESIS 0.6 3.65E-02

OBSOLETE BIOLOGICAL PROCESS 0.4 3.95E-02 PHYSIOLOGICAL PROCESS 59.9 4.08E-02 AROMATIC COMPOUND BIOSYNTHESIS 0.3 4.08E-02 GLYCOLIPID BIOSYNTHESIS 0.3 4.08E-02 NEGATIVE REGULATION OF LYMPHOCYTE PROLIFERATION 0.3 4.08E-02 PTERIDINE AND DERIVATIVE BIOSYNTHESIS 0.3 4.08E-02 NEGATIVE REGULATION OF NUCLEIC ACID METABOLISM 1.1 4.29E-02 TRANSITION METAL ION TRANSPORT 0.6 4.36E-02 CATION TRANSPORT 3.3 4.46E-02 PROTEIN-NUCLEUS IMPORT 0.6 4.99E-02 REGULATION OF TRANSLATION 0.6 4.99E-02 STEROID BIOSYNTHESIS 0.6 4.99E-02 POTASSIUM ION TRANSPORT 1.3 5.15E-02 CARBOHYDRATE TRANSPORT 0.5 5.19E-02 PROTEIN MODIFICATION 6.9 5.27E-02 GLYCOLIPID METABOLISM 0.3 5.28E-02 PTERIDINE AND DERIVATIVE METABOLISM 0.3 5.28E-02 NEURON DIFFERENTIATION 0.4 5.36E-02 MO-MOLYBDOPTERIN COFACTOR BIOSYNTHESIS 0.2 5.51E-02 MO-MOLYBDOPTERIN COFACTOR METABOLISM 0.2 5.51E-02 NEGATIVE REGULATION OF TRANSPORT 0.2 5.51E-02 CHLORIDE TRANSPORT 0.7 5.93E-02 RESPONSE TO ENDOGENOUS STIMULUS 1.5 6.04E-02 SYNAPSE ORGANIZATION AND BIOGENESIS 0.3 6.14E-02 INFLAMMATORY RESPONSE 1.1 6.21E-02 DNA METABOLISM 3.4 6.77E-02 RESPONSE TO DNA DAMAGE STIMULUS 1.5 7.09E-02 PHOTOTRANSDUCTION 0.3 7.20E-02 NEGATIVE REGULATION OF METABOLISM 1.2 7.66E-02 OSTEOBLAST DIFFERENTIATION 0.2 7.83E-02 NUCLEIC ACID TRANSPORT 0.4 7.94E-02 RNA TRANSPORT 0.4 7.94E-02 RNA-NUCLEUS EXPORT 0.4 7.94E-02 GERM CELL DEVELOPMENT 0.3 8.12E-02 ENDOSOME ORGANIZATION AND BIOGENESIS 0.3 8.36E-02 PHOSPHORYLATION 4.4 8.39E-02 PHOSPHATE METABOLISM 5.1 8.62E-02 PHOSPHORUS METABOLISM 5.1 8.62E-02 REGULATION OF ENDOCYTOSIS 0.4 8.93E-02 RNA LOCALIZATION 0.4 8.93E-02 IRON ION TRANSPORT 0.3 9.59E-02 PYRIMIDINE NUCLEOTIDE BIOSYNTHESIS 0.3 9.59E-02 REGULATION OF CELL ADHESION 0.3 9.59E-02 GLUCOSE TRANSPORT 0.3 9.74E-02 HEXOSE TRANSPORT 0.3 9.74E-02 MONOSACCHARIDE TRANSPORT 0.3 9.74E-02 POSITIVE REGULATION OF LYMPHOCYTE PROLIFERATION 0.3 9.74E-02 DI-, TRI-VALENT INORGANIC CATION TRANSPORT 1.0 9.75E-02 CIRCULATION 0.5 9.91E-02

Gene annotations were determined for all mRNA targets selected bioinformatically, using the Database for Annotation, Visualization and Integrated Discovery (DAVID), http://david.niaid.nih.gov/david/upload.asp. Percentage is the percentage of hits that fall under each annotation or classification; one hit can fall into more than one annotation. The P-value is the probability of observing at least as many hits in a given category by chance; the lower the P-value, the more significant the enrichment of hits in that category.
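The P-value column reports an over-representation (enrichment) test. As a minimal sketch, assuming the standard one-sided hypergeometric (Fisher exact) model, the probability of observing at least k annotated hits in a list of n genes drawn from a background of N genes, K of which carry the annotation, can be computed as below. This is not DAVID's own code: DAVID reports a more conservative variant of this test (the EASE score), and the function name and arguments used here are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, list_size, category_size, background_size):
    """One-sided hypergeometric (Fisher exact) P-value for over-representation.

    hits            -- genes in the selected list that carry the annotation (k)
    list_size       -- number of genes in the selected list (n)
    category_size   -- background genes that carry the annotation (K)
    background_size -- total number of background genes (N)
    Returns P(X >= hits) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(background_size, list_size)
    upper = min(list_size, category_size)
    tail = sum(
        comb(category_size, i) * comb(background_size - category_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 5 of 300 selected genes fall in a category that
# contains 40 of 30,000 background genes.
print(enrichment_p_value(5, 300, 40, 30000))
```

Summing the exact upper tail with integer binomial coefficients avoids the underflow that multiplying many small probabilities would cause; for very large backgrounds a statistics library would be faster.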

Table 2.5. DNA templates used for all T7 MEGAshortscript transcriptions.

Name Sequence
Fig. #1c-d
Selected sequence 5'-CGGCGTTAGTAGACGTGGTTACGGCCCTATAGTGAGTCGTATTAAATT-3'
Fig. #3a-c
Qsel#2wt 5'-GAGCGCACGTACAGAGAAGATTAGTCAGCGTTAGTACAAAGGAGCCCACCACCCTATAGTGAGTCGTATTAAATT-3'
Qsel#2mut1 5'-GAGCGCACGTACAGAGAAGGTTGGTCAGCGTTAGTACAAAGGAGCCCACCACCCTATAGTGAGTCGTATTAAATT-3'
Qsel#2mut2 5'-GAGCGCACGTACAGAGAAGGCTCGTCAGCGTTAGTACAAAGGAGCCCACCACCCTATAGTGAGTCGTATTAAATT-3'
Qsel#13wt 5'-CGAGTGCGAGATCGCTTGTCGAGCAAGGGCAGGGTTAGTCATATTAGTTCTGCCTATAGTGAGTCGTATTAAATT-3'
Qsel#13mut1 5'-CGAGTGCGAGATCGCTTGTCGAGCAAGGGCAGGGTTAGTCATGTTGGTTCTGCCTATAGTGAGTCGTATTAAATT-3'
Qsel#13mut2 5'-CGAGTGCGAGATCGCTTGTCGAGCAAGGGCAGGGTTAGTCATACTCGTTCTGCCTATAGTGAGTCGTATTAAATT-3'
Qsel#8wt 5'-ACCCCCATAGAGAGGGCGGTTAGTCAAGTATAGCTCTCGTCCCCTCAGCACCCCTATAGTGAGTCGTATTAAATT-3'
Qsel#8mut 5'-ACCCCCATAGAGAGGGCGGTTAGTCAAGATTAGCTCTCGTCCCCTCAGCACCCCTATAGTGAGTCGTATTAAATT-3'
Fig. #4a
C0H 5'-CAACAATCACTGATGAGTTAGTTAGTATGCCCTATAGTGAGTCGTATTAAATT-3'
C2H 5'-ACAATCACTGATGAGTTACTGTTAGTATGCCCTATAGTGAGTCGTATTAAATT-3'
C5H 5'-CACTGATGAGTTATGACCGTTAGTATGCCCTATAGTGAGTCGTATTAAATT-3'
C10H 5'-CACTGAGTTATGGTGCCGTAGTTAGTATGCCCTATAGTGAGTCGTATTAAATT-3'
C15H 5'-TGAGTTACCGTATGGTGCCGTAGTTAGTATGCCCTATAGTGAGTCGTATTAAATT-3'
C20H 5'-TGAGTTATGGTGCCGTATGGTGCCGTAGTTAGTATGCCCTATAGTGAGTCGTATTAAATT-3'
C25H 5'-TGAGTTATGGTGCCGTAGAGCCTGGTGCCGTAGTTAGTATGCCCTATAGTGAGTCGTATTAAATT-3'
C30H 5'-TGAGTTATGGTGCCGTAGAGCCTGGTGCGCCGTCGTAGTTAGTATGCCCTATAGTGAGTCGTATTAAATT-3'
Fig. #4b
Struc #1 5'-TTGCTACGTTATCTGTTAGTAGCTTCCTCTAGTGAGTCGTATTAAATT-3'
Struc #2 5'-TTGCTACTAACTCGTTATCTGTTAGTAGCTTCCTATAGTGAGTCGTATTAAATT-3'
Struc #3 5'-GTTACGCGTTAGTATCTCGCGTATTCCTATAGTGAGTCGTATTAAATT-3'
Struc Neg 5'-TTGCCGCTGAGACGAGAGCGGCTTCCTATAGTGAGTCGTATTAAATT-3'
Fig. #5a
MBP:QRE-1 (666-690) 5'-ATTAAAAGTTTAAGGCAGTTATATTAAGAAGCTATAGTGAGTCGTATTA-3'
MBP:QRE-2 (1551-1581) 5'-ATGGTTATTTTTCCACCGAGGTTAGTGTGTCCCTATAGTGAGTCGTATTAAATT-3'
MBP:QRE-2 m1 5'-ATGGTTATTTTTCCACCGAGCACCGAGTGTCCCTATAGTGAGTCGTATTAAATT-3'
MBP:QRE-2 m2 5'-ATGTTCCACTTTCCACCGAGGTTAGTGTGTCCCTATAGTGAGTCGTATTAAATT-3'
Fig. #5b
EGR2:QRE-2 (2769-2798) 5'-CCAGTTAGTGGTTTTTTTATGTTAGAATAGCCCTATAGTGAGTCGTATTAAATT-3'
EGR2:QRE-2 m1 5'-CCAATCTGAAGTTTTTTTATGTTAGAATAGCCCTATAGTGAGTCGTATTAAATT-3'
EGR2:QRE-2 m2 5'-CCAGTTAGTGGTTTTTTTATATCTGAATAGCCCTATAGTGAGTCGTATTAAATT-3'
Fig. mb
BCAR (3091-3130) 5'-TTTTATTAAATGTAGACTATTAGTCAAATGATTGGAAAAACCTATAGTGAGTCGTATTAAATT-3'
Bid (1998-2034) 5'-GATTAAACTCAGTTTGTTAGTGGTTCTGGTATCTTGTCCTATAGTGAGTCGTATTAAATT-3'
CDH11 (3441-3480) 5'-GACAGTTATATCTGGCACGTATTAGTTTAGGATGAAAGTACCTATAGTGAGTCGTATTAAATT-3'
Ft (1371-1410) 5'-TTGTACTCATTACAGTCTCTCTGTTCTGTTAGTTTAAGTACCTATAGTGAGTCGTATTAAATT-3'
Hip2 (1131-1170) 5'-ACACTCGAGTGTTATCCTAGTTAGTCAACAGAACCCTGAACCTATAGTGAGTCGTATTAAATT-3'
HMGB1 (2181-2230) 5'-CAGACTGTACCAGGCAAGGTTAGTGGCTATTGAAAATACCCCTATAGTGAGTCGTATTAAATT-3'
MTM1 (3251-3290) 5'-TATGTAAATAACTTCTTTTTCAGTAATTAGTAACTCCCCTGCCTATAGTGAGTCGTATTAAATT-3'
RhoA (1321-1360) 5'-GAAAGTAGGCAGGACATG[...]AGTTATAAAGTAGTTACAGCCCTATAGTGAGTCGTATTAAATT-3'
SF2 (1541-1580) 5'-AATTAGCCATTCTTCCCACATTAGTCCCTATTAAAACAAACCTATAGTGAGTCGTATTAAATT-3'
TfRc (2011-2050) 5'-AGCATTATGAAAATCAGTTGTTAGTCTAGAAGTAGCACGGCCTATAGTGAGTCGTATTAAATT-3'
TGF-BR1 (1951-1990) 5'-AGAGTTTAAAGTGTTAGTTTAAGATCTTGAAAAATTATAGCCTATAGTGAGTCGTATTAAATT-3'
TNFRS21 (2038-2074) 5'-TTTGATCTATTAGTCTCTGTCTGGACAAACCTTGTTCCTTTCCTATAGTGAGTCGTATTAAATT-3'
U2AF (2871-2910) 5'-CTCTTTGTTAAAAAAAACCACTCTCATTAGTATGATAAATCCTATAGTGAGTCGTATTAAATT-3'
YY1 (1271-1310) 5'-GATTTCAGGTTAGTTGACTGAGCAAACTTCTTATTACAACCCTATAGTGAGTCGTATTAAATT-3'
Control 5'-GGGTGTCCGTGCACCTGAACCTCCGGAGCTATAGTGAGTCGTATTA-3'

TABLE 2.6 PRIMERS USED FOR RT-PCR.

Name Forward primer Reverse primer
BCAR 5'-CAAAGGTGGTGGTTCCTACGC-3' 5'-GGGCACGTCATACACATCCAA-3'
Bid 5'-GCAAGCTTACTGGGAGGCAGA-3' 5'-TGGAAGACATCACGGAGCAAA-3'
CDH11 5'-CAAGAGAGGCTGGGTCTGGAA-3' 5'-ACAAAAATGGTTCCCGCTCCT-3'
Ft 5'-CGCGCTTGCTGTTTGATATTC-3' 5'-TGCACCAGATCCACCATCTCT-3'
Hip2 5'-AGCAGGACCTCCAGACACACC-3' 5'-GCTGCCCATTGATCTTTCAGG-3'
HMGB1 5'-CAAAGGCTGACAAGGCTCGTT-3' 5'-GATGCTCGCCTTTGATTTTGG-3'
MTM1 5'-CAGTGGAGCAGCGTTACATGG-3' 5'-TGCAGTGCCAGAAGTGAGTCC-3'
RhoA 5'-TGTCCAAATGTGCCCATCATC-3' 5'-CGTGGCCATCTCAAAAACCTC-3'
SF2 5'-GCACTGGTGTCGTGGAGTTTG-3' 5'-GCGTGGTGATCCTCTGCTTCT-3'
TfRc 5'-TTTCCGCCATCTCAGTCATCA-3' 5'-AAAGCGTCTCTCTGGGCTCCT-3'
TGF-BR1 5'-CCGTTGGGTCTTCTCACTGCT-3' 5'-GCCCAGCTGCTTCAGATCAAT-3'
TNFRs21 5'-ACGTGCCTTCCAGTGTGATGA-3' 5'-GGGACATCGTGGGATTCCATA-3'
U2AF 5'-GCACAGCTCATCTTGCCTTGA-3' 5'-GAACTGCCGTCTGAGACACGA-3'
YY1 5'-GACCCGGGGAATAAGAAGTGG-3' 5'-CAATGCCAGGTATCCCTCCAG-3'
Gapdh 5'-ATGGTGAAGGTCGGTGTCAAC-3' 5'-TTACTCCTTGGAGGCCATG-3'
Edn1 5'-CCCCACTCTTCTGACCCCTTT-3' 5'-TTTTGGTGAGCGCACTGACAT-3'
Fibrillarin 5'-AAGCTGGCAGCAGCTATCCTG-3' 5'-TGCGGTATTTGTGTGGGTGTC-3'
HoxC5 5'-CTTCCAATCCCCTTCCCAAAG-3' 5'-GGGGCAGGAAGGTGAGAAAGT-3'
Ncoa3 5'-CCCTCTGGAGTGTCCTCCTCA-3' 5'-CTTTGGCAAGCACATCACTGG-3'
TNR21 5'-ACGTGCCTTCCAGTGTGATGA-3' 5'-GGGACATCGTGGGATTCCATA-3'
Yes 5'-TGTAAGCCCAAGTGCCAGTCA-3' 5'-TGCTTCCCACCAGTCTCCTTC-3'
Pea15 5'-GATGCCACACAGCCAAGTCAC-3' 5'-TTGGCTCTGCCACTTCCTTTC-3'
ImpB 5'-GAAGAACCAGCCACCTATTGTC-3' 5'-AGAGTCGCGCAGAGTAGAGACT-3'
Tcp1 5'-TGCAAGATCTCTAGTTGTGATT-3' 5'-CGAACTTCAGGCTCTTCACTTT-3'
p21 5'-ATGTCCGATCCTGGTGATGTCC-3' 5'-TCAGGGCTTTCTCTTGCAG-3'

FIGURE 2.7 COMPARISON OF RNA BINDING AFFINITY FOR QKI ISOFORMS. Comparison of the RNA binding affinity of QKI-5, QKI-6 and QKI-7 using a selected sequence (gccgUAACcacgucUACUAACgccg; capital letters denote the consensus). EMSAs were performed with concentrations of GST-QKI-5, GST-QKI-6 and GST-QKI-7 increasing by a factor of 1.8 from 1 nM to 9.6 µM. A plot of the fraction of bound RNA as a function of protein concentration is shown on the right.

[Plot: fraction of bound RNA (%) versus protein concentration (µM, log scale, 0.01 to 10) for QKI-5, QKI-6 and QKI-7.]

Figure 2.7 Comparison of the RNA binding affinity of QKI-5, QKI-6 and QKI-7 using a selected sequence.

FIGURE 2.8 QKI-5, QKI-6 AND SAM68 SPECIFIC IMMUNOPRECIPITATIONS. Representative QKI-5, QKI-6 and Sam68 immunoprecipitations from an E15.5 mouse embryo are shown. Embryo total cell lysates, control normal rabbit serum (NRS) immunoprecipitates and specific protein immunoprecipitates were separated by SDS-PAGE, transferred to nitrocellulose and immunoblotted with anti-QKI-5, anti-QKI-6 and anti-Sam68 specific antibodies. The positions of the immunoglobulin heavy chain (IgG) and of the corresponding proteins are indicated.

[Immunoblots (Blot: QKI-5, QKI-6, Sam68) with molecular mass markers from 32 to 80 kDa; the positions of Sam68, IgG, QKI-5, QKI-6 and QKI-7 are indicated.]

Figure 2.8 QKI-5, QKI-6 and Sam68 specific immunoprecipitations.

3 THE STAR/GSG RNA BINDING PROTEINS GLD-1, QKI, SAM68 AND SLM-2 BIND BIPARTITE RNA MOTIFS

3.1 PREFACE

In chapter 2, we identified the RNA binding consensus of QKI and showed, using bioinformatics, that this QKI response element is found in many putative mRNA targets. These findings confirmed some of the known roles of QKI in development and highlighted other putative roles for this protein. Because other STAR RNA binding proteins such as Sam68 and SLM-2 have poorly defined roles in cell biology, the objective of chapter 3 was to identify the unknown SLM-2 response element and to apply these findings to a bioinformatic search. The hypothesis underlying the following study is that SLM-2 and Sam68 bind a response element similar to the one identified for QKI. The next chapter describes the identification of the RNA motif bound by SLM-2 and its comparison with other STAR RNA binding motifs.

3.2 ABSTRACT

SAM68, SAM68-like mammalian protein 1 (SLM-1) and SAM68-like mammalian protein 2 (SLM-2) are members of the K homology (KH) and STAR (signal transduction activator of RNA metabolism) protein family. The function of these RNA binding proteins has been difficult to elucidate, mainly because of a lack of genetic data providing insights into their physiological RNA targets. In comparison, genetic studies in mice and C. elegans have provided evidence for the physiological mRNA targets of QUAKING and GLD-1, two other members of the STAR protein family. The GLD-1 binding site is defined as a hexanucleotide sequence (NACUCA) that is found in many, but not all, physiological GLD-1 mRNA targets. Previously, using Systematic Evolution of Ligands by Exponential enrichment (SELEX), we defined the QUAKING binding site as a hexanucleotide sequence with an additional half-site (UAAY). This sequence was identified in QKI mRNA targets, including the mRNAs for myelin basic proteins. Herein we report, using SELEX, the identification of the SLM-2 RNA binding site as direct U(U/A)AA repeats. The bipartite nature of the consensus sequence was essential for high-affinity RNA binding by SLM-2. The identification of a bipartite mRNA binding site for QKI and now SLM-2 prompted us to determine whether SAM68 and GLD-1 also bind bipartite direct repeats. Indeed, SAM68 bound the SLM-2 consensus and required both U(U/A)AA motifs. We also confirmed that GLD-1 binds a bipartite RNA sequence in vitro, using a short RNA sequence from its physiological mRNA target tra-2. These data demonstrate that the STAR proteins QKI, GLD-1, SAM68 and SLM-2 recognize RNAs containing direct repeats as bipartite motifs. This information should help identify binding sites within physiological RNA targets.

3.3 INTRODUCTION

The K homology (KH) domain is a prevalent, evolutionarily conserved RNA binding domain that was identified as a repeated sequence in heterogeneous nuclear ribonucleoprotein (hnRNP) K (Gibson et al., 1993; Siomi et al., 1993). The KH domain is a small protein module of 70 to 100 amino acids and is the second most prevalent RNA binding domain after the RRM (RNA recognition motif) (Dreyfuss et al., 2002). The RNA binding property of the KH domain was initially shown for hnRNP K and for FMRP, the product of the gene responsible for human fragile X syndrome (Ashley et al., 1993; Siomi et al., 1993). The KH domain of FMRP and its neighboring RGG sequences were shown to associate with RNAs containing a kissing complex and G-quartets, respectively (Darnell et al., 2001; Darnell et al., 2005). The KH domain is often found in multiple copies within proteins (15 in vigilin), and a subfamily of proteins contains a single, larger KH domain referred to as a maxi-KH domain (Di Fruscio et al., 1998).

The KH domain makes direct protein-RNA contacts through a three-dimensional β1α1α2β2β3 topology, with an additional C-terminal α helix (α3) in maxi-KH domains (Lewis et al., 2000; Liu et al., 2001). A defining feature of KH domains is an invariant GXXG loop between α1 and α2 that contacts the phosphate backbone, positioning the neighboring nucleotides to form Watson-Crick-like hydrogen bonds with conserved amino acids of the KH domain (Lewis et al., 2000; Liu et al., 2001). Structures of KH domains have also been solved in complex with single-stranded DNA, demonstrating that certain KH domains can accommodate either RNA or ssDNA in their binding cleft (Braddock et al., 2002; Backe et al., 2005; Du et al., 2005). Additionally, structure determination has shown that KH domains form oligomers, explaining the presence of multiple KH domains within proteins and the dimerization of single KH domain proteins (Lewis et al., 1999). A subfamily of KH domains contains extended loops between β1/α1 and β2/β3 and an additional C-terminal helix (Di Fruscio et al., 1998). These maxi-KH domain proteins contain conserved sequences immediately N-terminal and C-terminal to the KH domain; the entire region is referred to as the STAR/GSG (signal transduction activator of RNA metabolism/GRP33, SAM68, GLD-1) domain (Vernet & Artzt, 1997; Lukong & Richard, 2003). Although STAR proteins contain a single KH domain, dimerization is required for RNA binding (Chen et al., 1997). The STAR proteins include mammalian Sam68, SLM-1, SLM-2, QKI and SF1, C. elegans GLD-1, Drosophila How, KEP1 and Sam50, and Artemia salina GRP33.

STAR proteins have been shown to function in pre-mRNA splicing (Arning et al., 1996; Matter et al., 2002; Wu et al., 2002; Di Fruscio et al., 2003; Stoss et al., 2004), mRNA export (Reddy et al., 1999; Larocque et al., 2002; Coyle et al., 2003), mRNA stability (Nabel-Rosen et al., 1999; Larocque et al., 2005) and protein translation (Jan et al., 1999; Saccomanno et al., 1999; Lee & Schedl, 2001; Lee & Schedl, 2004; Schumacher et al., 2005). Genetic evidence has implicated the STAR RNA binding proteins in many cellular processes. These include the QKI isoforms in myelination of the central nervous system (Larocque & Richard, 2005), GLD-1 in germline determination (Francis et al., 1995a; Francis et al., 1995b; Crittenden et al., 2002), How in muscle and tendon differentiation (Baehrecke, 1997; Zaffran et al., 1997), KEP1 in cell death (Di Fruscio et al., 2003) and Sam68 in bone marrow mesenchymal cell fate (Richard et al., 2005). Genetic data have also implicated the simple KH domain proteins FMRP in mental retardation (Verkerk et al., 1991; DeBoulle et al., 1993) and Nova in paraneoplastic neurologic disorders (Jensen et al., 2000).

SF1, or branch point binding protein (BBP), was shown to recognize the branch point sequence (UACUAAC) (Liu et al., 2001; Peled-Zehavi et al., 2001), and structure determination revealed direct protein-RNA contacts (Liu et al., 2001). These studies provided necessary information about the contact sites of maxi-KH domains and their similarities to and differences from simple KH domain proteins such as Nova. Based on this information, Ryder and coworkers showed that GLD-1 binds a hexanucleotide sequence (NACUCA) and proposed it as the STAR binding site (Ryder et al., 2004). Previously, using Systematic Evolution of Ligands by Exponential enrichment (SELEX) (Tuerk & Gold, 1990), we defined the QKI RNA binding consensus sequence as a bipartite motif consisting of a core NACUAAY (where Y is a pyrimidine) with a neighboring half-site (UAAY) (Galarneau & Richard, 2005). In the present study, we define for the first time the RNA binding specificity of the mammalian STAR protein SLM-2. Using SELEX, we identified the SLM-2 consensus sequence as a direct U(U/A)AA repeat. The bipartite nature of the consensus RNA sequence was essential for high-affinity RNA binding by SLM-2. The identification of a bipartite mRNA binding site for QKI (Galarneau & Richard, 2005) and now for SLM-2 prompted us to determine whether SAM68 and GLD-1 also bind bipartite direct repeats. Indeed, SAM68 and GLD-1 required bipartite RNAs, demonstrating that the STAR proteins SLM-2, SAM68, QKI and GLD-1 bind direct RNA repeats as bipartite motifs in target RNAs.
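The consensus sequences discussed above are written with IUPAC ambiguity codes (N, any base; Y, C or U). The GLD-1 site NACU(C/A)A can equivalently be written NACUMA (M = A or C), and the SLM-2 repeat unit U(U/A)AA as UWAA (W = A or U). The following sketch, with hypothetical helper names, shows how such a consensus can be turned into a regular expression and scanned along an RNA. Applied to region A of the MBP QRE-2 (UACACACUAAC), it locates both an NACUAAY core and a UAAY half-site.

```python
import re

# IUPAC nucleotide codes (RNA alphabet) needed for the consensus sequences in the text.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "N": "[ACGU]",  # any nucleotide
    "Y": "[CU]",    # pyrimidine
    "R": "[AG]",    # purine
    "W": "[AU]",    # A or U, e.g. U(U/A)AA = UWAA
    "M": "[AC]",    # A or C, e.g. NACU(C/A)A = NACUMA
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus such as 'NACUAAY' into a regular expression."""
    return "".join(IUPAC_RNA[base] for base in consensus.upper())


def find_consensus(seq, consensus):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    # A zero-width lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(rna)]


region_a = "UACACACUAAC"  # MBP QRE-2 region A
print(find_consensus(region_a, "NACUAAY"))  # core: [4]
print(find_consensus(region_a, "UAAY"))     # half-site: [7]
```

The overlapping hits (core at position 4, half-site at position 7) illustrate why region A can be read in more than one way.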

3.4 METHODS

3.4.1 SELEX ASSAY.

Systematic Evolution of Ligands by Exponential enrichment (SELEX) was performed as previously described (Buckanovich & Darnell, 1997). Briefly, oligonucleotides harboring a 52-nucleotide random sequence flanked by two primer binding sites, with an estimated complexity of 1 × 10^15, were synthesized by Invitrogen. The oligonucleotides were amplified by PCR using the corresponding forward and reverse primers as previously described (Buckanovich & Darnell, 1997). After PCR amplification, the sequences of 24 random clones were determined; each clone was unique, and the overall base composition was similar among the clones (average composition: A, 20%; U, 30%; C, 22%; G, 28%; data not shown). A purified DNA library (1 × 10^13 molecules) was transcribed in vitro using T7 RNA polymerase (Promega) and [α-32P]UTP. RNA was purified from denaturing TBE-acrylamide gels, heated to 65 °C for 5 min, and precleared using TALON Metal Affinity Resin (BD Biosciences) to remove RNAs that bind the resin non-specifically. Unbound RNAs were incubated in binding buffer (50 mM Tris-HCl (pH 8.0), 590 mM KCl) with recombinant His-SLM-2 for 30 min, then with TALON Metal Affinity Resin for another 30 min. After four washes with binding buffer, the RNAs were eluted by TRIzol extraction (Invitrogen). The purified RNAs were ethanol precipitated and resuspended in water with RNase-free DNase for a 15 min reaction. The DNase was heat-inactivated for 10 min at 65 °C. Reverse transcription was performed using M-MLV reverse transcriptase (Promega) and a reverse oligonucleotide annealing to the 3' primer binding site. The cDNAs were then amplified by PCR with the reverse oligonucleotide and a forward oligonucleotide, containing the T7 promoter, that anneals to the 5' primer binding site. After round 6, the cDNAs were amplified with the reverse primer and a forward primer containing an EcoRI restriction site. The DNA fragments were digested with EcoRI and BamHI and subcloned into pBluescript SK+ (Stratagene) for blue/white selection. Forty-three random white colonies were selected, their plasmids were purified, and the SELEX sequences were identified by DNA sequencing (Genome Quebec).
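To put the library size in perspective, the following back-of-the-envelope sketch compares the number of molecules used with the size of the 52-nucleotide sequence space and estimates how often any given short contiguous motif is expected to occur. It assumes uniform, independent base composition, which is only an approximation here (the measured composition was A 20%, U 30%, C 22%, G 28%); the function name is hypothetical.

```python
def selex_coverage(random_len, n_molecules, motif_len):
    """Idealized sampling estimates for a SELEX random library.

    Assumes every position is independent and each base has probability 1/4.
    Returns (size of sequence space, fraction of it sampled,
             expected occurrences of one specific contiguous motif).
    """
    sequence_space = 4 ** random_len
    fraction_sampled = n_molecules / sequence_space
    windows = random_len - motif_len + 1  # possible start positions per molecule
    expected_occurrences = n_molecules * windows / 4 ** motif_len
    return sequence_space, fraction_sampled, expected_occurrences


space, fraction, per_motif = selex_coverage(52, 1e13, 8)
print(f"{space:.2e} sequences, {fraction:.1e} sampled, {per_motif:.1e} copies per 8-mer")
```

Under these assumptions, 10^13 molecules sample less than 10^-18 of the 4^52 possible sequences, yet any specific 8-nucleotide sequence is still expected roughly 7 × 10^9 times. This is why short motifs can be enriched from a library that covers only a vanishing fraction of sequence space.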

3.4.2 RNA PREPARATION, PURIFICATION AND ELECTROMOBILITY SHIFT ASSAYS (EMSAs).

RNAs were prepared by run-off in vitro transcription of oligonucleotides harboring a T7 promoter in the presence of [α-32P]UTP, using the T7 MEGAshortscript kit (Ambion) according to the manufacturer's protocol. RNAs were purified on TBE-acrylamide gels before use. For EMSAs, a constant amount of 32P-labeled RNA (100 pmol) was incubated with buffer alone or with increasing or decreasing concentrations of the corresponding proteins in the following buffer: 20 mM HEPES (pH 7.4), 330 mM KCl, 10 mM MgCl2, 0.1 mM EDTA, 0.1 mg/ml heparin and 0.01% IGEPAL CA-630 (Sigma). The 30 µl reactions were incubated at room temperature for 1 h, and 3.3 µl of RNA loading dye (glycerol containing 0.25% (w/v) bromophenol blue and 0.25% (w/v) xylene cyanol) was added to each. A portion (15 µl) of each sample was separated on native Tris-glycine 8% acrylamide gels. The gels were dried, and the bound and unbound RNAs were quantified using a Storm PhosphorImager (Amersham). The fraction of bound RNA was determined and plotted using Prism 3.0 (GraphPad Software).
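The templates listed in Table 2.5 are written 5' to 3' and end with the complement of the T7 promoter (CTATAGTGAGTCGTATTA). Assuming each single-stranded template is annealed to a top-strand T7 promoter oligonucleotide and that transcription starts at the G complementary to the first C of that sequence, the expected run-off transcript is the reverse complement of the template up to and including that C. The sketch below (function name hypothetical) recovers, for the "Selected sequence" template, the RNA shown in Figure 2.7 preceded by two 5' G residues. In practice, T7 RNA polymerase may also add non-templated nucleotides at the 3' end.

```python
# Reverse complement of the T7 promoter top strand 5'-TAATACGACTCACTATAG-3'.
# Assumption: the first C of this sequence templates the +1 G of the transcript.
T7_PROMOTER_RC = "CTATAGTGAGTCGTATTA"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def predicted_transcript(template):
    """Predict the run-off RNA from a template written 5'->3' as in Table 2.5."""
    t = template.upper().replace("5'-", "").replace("-3'", "")
    start = t.find(T7_PROMOTER_RC)
    if start < 0:
        raise ValueError("T7 promoter complement not found in template")
    # Template positions copied into RNA: everything 5' of the promoter
    # complement, plus the C that templates the +1 G.
    transcribed = t[: start + 1]
    # Reverse complement, then write the product as RNA.
    return transcribed.translate(_COMPLEMENT)[::-1].replace("T", "U")


selected = "5'-CGGCGTTAGTAGACGTGGTTACGGCCCTATAGTGAGTCGTATTAAATT-3'"
print(predicted_transcript(selected))  # GGGCCGUAACCACGUCUACUAACGCCG
```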

3.4.3 DNA AND PROTEIN PREPARATION

Recombinant GST-QKI-5 was described previously (Galarneau & Richard, 2005). Maltose binding protein fused to GLD-1 was a generous gift of Min-Ho Lee (University of Albany). His-SLM-2 was prepared by subcloning the coding region from GFP-SLM-2 (Di Fruscio et al., 1999) into pQE-TriSystem (Qiagen) using BamHI and XhoI directional cloning. His-GSG(SAM68) was prepared by subcloning the GSG domain of mouse Sam68 into pET-18c. Protein purification was performed according to the manufacturer's instructions.

3.5 RESULTS

3.5.1 THE IDENTIFICATION OF THE SLM-2 RNA BINDING SITE BY USING SELEX

To identify the binding motif of the SLM-2 RNA binding protein, we performed SELEX to enrich for high-affinity RNA ligands. Bacterial recombinant SLM-2, expressed as a histidine-tagged fusion protein, was generated and purified for the assay. Synthetic RNAs were transcribed with T7 RNA polymerase from DNA pools of 52-nucleotide random sequences with an estimated complexity of 1.0 × 10^14. We randomly sequenced 20 molecules from the initial library and noted, as expected, that each sequence was unique (Galarneau & Richard, 2005). The RNAs were transcribed in the presence of [α-32P]UTP so that the amount of RNA specifically bound by SLM-2 could be measured after each round. After six cycles of selection, approximately 10% of the input RNA was bound (not shown), demonstrating that specific sequences had indeed been enriched. To confirm the SELEX enrichment of SLM-2-specific RNA ligands, we performed electromobility shift assays (EMSAs) with purified pools of RNA transcripts isolated from rounds 2, 4 and 6. No binding was observed with the round 0 pool (not shown). The RNAs were 32P-labeled and incubated with buffer or increasing concentrations of His-SLM-2. The SLM-2/RNA complexes were observed as slowly migrating complexes by native gel electrophoresis (Figure 3.1A). RNA binding was more efficient with the round 6 pool than with the round 2 and 4 pools (compare the free probe remaining in lanes 2 and 6 with lane 10). After round 6, the SLM-2-bound RNAs were converted into cDNAs, subcloned and sequenced. Sequencing of 43 clones revealed 11 unique sequences (Table 3.1), which were designated SLM-2 response element (SRE)-1 to -11. Class I RNAs contained a bipartite motif consisting of direct repeats of the sequence U(U/A)AA, and the spacing between the repeats varied from 3 (SRE-3) to 25 (SRE-7) nucleotides (Table 3.1 and Figure 3.1B). The three RNAs that did not contain the bipartite sequence (SRE-9, -10 and -11) were grouped in Class II; since only ~10% of round 6 RNAs bound SLM-2, Class II RNAs are likely to represent non-binders. EMSAs confirmed that SRE-9, -10 and -11 are not bound by SLM-2 (Figure 3.6 in the supplementary data, section 3.9). No apparent secondary structure was identified in the SREs using the RNA secondary structure prediction program MFOLD (data not shown). Taken together, we identified a bipartite motif consisting of direct repeats of the sequence U(U/A)AA as the SLM-2 RNA binding site.
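The criterion used to sort the selected RNAs into Class I and Class II can be expressed as a simple scan. The sketch below (hypothetical function name) reports every pair of U(U/A)AA tetranucleotides separated by a spacer within a chosen range. The default range of 3 to 25 nucleotides simply mirrors the spacings observed in Table 3.1; it is an assumption, not a measured limit of SLM-2 binding.

```python
import re

# Lookahead so that overlapping tetranucleotides (e.g. in UUAAA) are all reported.
TETRAMER = re.compile(r"(?=U[UA]AA)")


def find_bipartite_repeats(seq, min_spacer=3, max_spacer=25):
    """Return (first_start, second_start, spacer) for each pair of U(U/A)AA motifs.

    Positions are 0-based; spacer is the number of nucleotides between the end
    of the first tetranucleotide and the start of the second.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    starts = [m.start() for m in TETRAMER.finditer(rna)]
    pairs = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            spacer = second - (first + 4)
            if min_spacer <= spacer <= max_spacer:
                pairs.append((first, second, spacer))
    return pairs


print(find_bipartite_repeats("GGUAAAGCCUUAACC"))  # [(2, 9, 3)]
```

A sequence with no qualifying pair (for example, two abutting UUAA motifs) returns an empty list, which would place it in Class II under this criterion.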

3.5.2 A DIRECT REPEAT OF U(U/A)AA DEFINES THE SLM-2 RNA BINDING CONSENSUS SEQUENCE

To define the characteristics of the SLM-2 RNA binding motif, we performed RNA binding assays with SRE-4 and SRE-7. We chose these two RNAs for further analysis because SRE-4 was the most frequently identified RNA and SRE-7 contains a guanine-rich sequence at its 5' end in addition to the UUAA repeats. The 52mer identified for SRE-4 was trimmed to a 38mer conserving 7

TABLE 3.1 SELECTED SLM-2 BOUND RNA LIGANDS

Ligand Sequence n
Class I
SRE-1 UUUGGGGCCUUCUAAAGAAAUUUUCACUAUCCUAHCTAACAGUUCCGCCGCUC 1
SRE-2 CGUAGGCGCAUCGIiaAAAAUUCAAAGCAAAAAUUGUGUnEIMCUGGGGGA- 2
SRE-3 ACGCGCUUjJUA&CGUGCCCUUACAUCCGCUAAAAACUAAACUCUGACCAUUU 2
SRE-4 UUUGGGGGUUCAAUAAAAAUUUUCACUAUGCUAiffiAACAGUUCCGCCGCUCC 28
SRE-5 UUUGGGGGUUCUAUAAAAAUUUUCACUAUCCr"i"""T.CAGUUCCGCCGCUCC 1
SRE-6 ACAGCACGUUUPAACUUUUUGCUAAUUAUUCt UAAAAUUCCUCCUCCUCUU 2
SRE-7 AUCGUGGUGGGCGGHOAAUUUGGAUUUCUUGAGCUUAUGGCUlJPgAAUAUGG 2
SRE-8 GUGCGAUCUGUGUTIEJAAUCAUUGUUCUGUUUCGCUCUAAAUUUUUCGCCGCU 1
Class II
SRE-9 GCGGUUACGGGAUCCAUGUAGACGCACAUAUUAUAUGGGAUUAGGUAGACUG 2
SRE-10 GCUGGGGGUUGAUCCACUAUUUCCACAGCGGCAGCAACAGUUCCGCCACUUC 1
SRE-11 AUCGGGGGGGGCGGliOAAUUUGGACUACCCGAGCAUCAGGUCCUCCGCUGGG 1

The conserved UAAA motifs are shown in bold and the conserved UUAA motifs are underlined on a gray background; n = number of times identified.

FIGURE 3.1 SLM-2 RNA LIGANDS IDENTIFIED. (A) EMSAs of pooled RNAs from rounds 2, 4 and 6 using increasing concentrations of His-SLM-2. The protein/RNA complexes were separated from the free probe by native PAGE. The migration of unbound RNAs (free probe) and protein-bound RNAs (SLM-2/RNA complex) is indicated on the left. (B) The sequences of the 8 unique RNAs bound to SLM-2 after six cycles of SELEX. Both identified motifs are aligned and underlined in black. Shown underneath the sequences is the probability matrix (sequence logo) based on all 8 sequences, depicting the relative frequency of each residue at each position within the selected motif.

[Figure 3.1 panels: (A) EMSAs of the round 2, round 4 and round 6 RNA pools with increasing His-SLM-2; SLM-2/RNA complex and free probe indicated. (B) Alignment of the selected sequences (SRE-1 to SRE-8) with sequence logo.]

Figure 3.1 SLM-2 RNA ligands identified.

nucleotides on the 5' and 3' ends of the U(U/A)AA consensus sequences, and this RNA was designated SRE-4wt. This synthetic RNA bound SLM-2 with high affinity, with a dissociation constant of ~16 nM (Figure 3.2A and Table 3.2), similar to the 52mer SRE-4 sequence (not shown). The substitution of either or both U(U/A)AA motifs with CCCC abolished SLM-2 binding (Figure 3.2A and Table 3.2, SRE-4m1, m2, m3). Similarly, the replacement of UAAA with UACC abolished RNA binding (SRE-4m4). These findings demonstrate that both tetranucleotide motifs (U(U/A)AA) are required for high-affinity SLM-2 RNA binding. We analyzed SRE-7 and identified a G-rich sequence that may form a G-quartet. We first replaced the G-rich nucleotides with a U-rich sequence, and this had little effect on SLM-2 RNA binding activity (compare SRE-7m1 and SRE-7wt; Table 3.2). Interestingly, replacing the G-rich sequences with AU-rich sequences so as to introduce a third U(U/A)AA motif enhanced SLM-2 binding to this RNA (SRE-7m2; Table 3.2). The substitution of the downstream uUUAAu sequence with CGACGC abolished SLM-2 RNA binding, consistent with the U(U/A)AA requirement (SRE-7m3; Table 3.2). Numerous 5' and 3' deletions were performed, and a minimal sequence of 40 nucleotides containing both U(U/A)AA motifs was identified that bound with a Kd of ~22.3 nM (SRE-7d9; Table 3.2). The substitution of the 5' or the 3' U(U/A)AA motif reduced the binding affinity of SLM-2 (Table 3.2; SRE-7d9m2, m3), demonstrating that SLM-2 indeed binds RNA with high affinity via direct repeats of U(U/A)AA.
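The apparent dissociation constants in Table 3.2 were obtained from plots of the fraction of bound RNA made in Prism, and the exact model used is not specified here. As a minimal sketch, assuming a single-site hyperbolic isotherm (fraction bound = [P]/(Kd + [P])) with RNA well below Kd, protein in excess and no cooperativity, Kd can be estimated from phosphorimager-derived fractions by least squares. The helpers below are hypothetical and use a golden-section search over log10(Kd), so the sketch needs no external libraries. A cooperative (Hill) model might describe the dimeric STAR proteins better.

```python
import math


def fraction_from_counts(bound, free):
    """Fraction of RNA bound, from phosphorimager counts of the two bands."""
    return bound / (bound + free)


def fraction_bound(protein_nM, kd_nM):
    """Single-site hyperbolic isotherm (protein in excess, no cooperativity)."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(conc_nM, frac, lo_nM=1e-3, hi_nM=1e5, iterations=200):
    """Least-squares Kd via golden-section search on log10(Kd)."""
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((f - fraction_bound(p, kd)) ** 2 for p, f in zip(conc_nM, frac))

    a, b = math.log10(lo_nM), math.log10(hi_nM)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


# Synthetic check: a 2-fold series from 1 uM, generated with Kd = 16.3 nM.
conc = [1000.0 / 2 ** i for i in range(12)]
frac = [fraction_bound(p, 16.3) for p in conc]
print(round(fit_kd(conc, frac), 1))  # 16.3
```

Searching on a logarithmic scale keeps the fit well behaved across the 2-fold dilution series, whose concentrations span more than three orders of magnitude.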

TABLE 3.2 BINDING AFFINITY OF SLM-2 FOR SELECTED AND MUTATED LIGANDS

Ligand Sequence Binding affinity (nM)
SRE-4wt GGUUCUA1I|AAUUUUCACUAUCCUA^SCAGUUCC 16.3
SRE-4m1 GGUUCUACCCCAAUUUUCACUAUCCUACCCCCAGUUCC ≥1000
SRE-4m2 GGUUCUACCCCAAUUUUCACUAUCCUA^BCAGUUCC >1000
SRE-4m3 GGTJUCUA^SAATJUUUCACUAUCCUACCCCCAGOTJCC >1000
SRE-4m4 GGUOCUA^CCCCCUUUCACUAUCCUAJ^^CAGUUCC ≥1000
SRE-7wt AUCGUGGUGGGCGG^BUUUGGAUUUCUUGAGCUUAUGGCUU^SUAUGGG 30.3
SRE-7m1 AUCUTTGUUCUCCGCl^gUUUGGAUUUCUUGAGCUUAUGGCUU^gUAUGGG 45.2
SRE-7m2 AUCGUAAl^McAA^»TJUUGGAUUUCUUGAGCTJUAUGGCUUJ^gUAUGGG 14.5
SRE-7m3 AUCGUGGUGGGCGG^SUUUGGAUUUCUUGAGCUUAUGGCUCGACGCAUGGG ≥1000
SRE-7d1 AUCGUGGUGGGCGG^SUUUGGAUUU ≥1000
SRE-7d2 GUGGGCGG^SUUUGGAUUUCUUGAG >1000
SRE-7d3 GG§l|jUUUGGAWroCUUGAGCUUAUG >1000
SRE-7d4 UUUGGAUUUCUUGAGCUUAUGGCUUl'"' >1000
SRE-7d5 UlTOCTTOGAGCUUAUGGCUlGf^UAUGG ≥1000
SRE-7d6 GAUUUCUUGAGCUUAUGGCUUr">f,*fu ≥1000
SRE-7d7 IglUUUGGAUUUCUUGAGCUUAUGGCU ≥1000
SRE-7d8 G^SUUUGGAUUUCUUGAGCUUAUGGCUU^ ≥1000
SRE-7d9 GG^SUUUGGAUUUCUUGAGCUUAUGGCUO^J^UAUGG 22.3
SRE-7d9m1 GGgCCCCUGGAUUUCUUGAGCUUAUGGCUU^CCCCAUGG ≥1000
SRE-7d9m2 GGSCCUUUGGAUUUCUUGAGCUUAUGGCUU|^SJAUGG ~400
SRE-7d9m3 Gg^SuCUGGAUOTJCUUGAGCUUAUGGCUUgCCUAUGG ~400

U(U/A)AA motifs are shown on a gray background and mutated residues are underlined.

FIGURE 3.2 DEFINING THE SLM-2 RESPONSE ELEMENT AS A BIPARTITE RNA MOTIF. EMSAs of the selected SRE-4 RNA with decreasing concentrations (2-fold dilutions from 1 µM) of recombinant His-SLM-2 (A) or of the SAM68 GSG domain (B), or with buffer alone. The sequences of the wild-type and mutant (m1 to m4) RNAs used in the reactions are shown in Table 3.2. The migration of unbound RNAs (free probe) and protein-bound RNAs (protein-RNA complex) is indicated on the left.

[Figure 3.2 gel panels: (A) His-SLM-2 EMSAs with SRE-4wt, SRE-4m1, SRE-4m2, SRE-4m3 and SRE-4m4; SLM-2-RNA complex and free probe indicated. (B) GSG-Sam68 EMSAs with SRE-4wt, SRE-4m2, SRE-4m3 and SRE-4m4; Sam68-RNA complex and free probe indicated.]

Figure 3.2 Defining the SLM-2 response element as a bipartite RNA sequence.

3.5.3 SAM68 BINDS THE SLM-2 RESPONSE ELEMENT

SELEX has been performed with recombinant SAM68, and a UAAA consensus was defined as a necessary RNA binding site (Lin et al., 1997). As the SLM-2 and SAM68 STAR/GSG domains share 69% sequence identity (Di Fruscio et al., 1999), we tested whether the SLM-2 consensus (SRE-4) could be bound by Sam68. Using EMSAs with recombinant Sam68 containing only the STAR/GSG domain, we observed that the GSG domain of SAM68 indeed bound the SRE-4wt RNA aptamer, but not the variants containing mutated U(U/A)AA motifs (Figure 3.2B). One variant of SRE-4 (SRE-4m2) retained some binding, likely because of the polyuridine stretch (UUUU) remaining between the two U(U/A)AA motifs (Table 3.2). These findings demonstrate that SAM68 can also bind a bipartite U(U/A)AA consensus.

3.5.4 DEFINING THE BIPARTITE NATURE OF THE QKI RESPONSE ELEMENT WITHIN THE MRNAS OF MYELIN BASIC PROTEIN

The mRNAs encoding the myelin basic proteins (MBP) are known QKI targets (Li et al., 2000; Larocque et al., 2002). The QKI RNA binding site was defined as a core (NACUAAY) with a neighboring half-site (UAAY) (Galarneau & Richard, 2005). The MBP QREs were defined as QRE-1 and QRE-2 (Ryder & Williamson, 2004; Galarneau & Richard, 2005). QRE-2 is of particular interest because it contains two regions, designated A and B (Figure 3.3). Region A (UACACACUAAC) contains an imperfect core overlapping a half-site, and region B contains a downstream perfect half-site (UAAC). Alternatively, region A can be read as an imperfect half-site (UACA) followed by a perfect core (CACUAAC). To define the requirements of QRE-2, we performed EMSAs with various combinations of regions A and B. The QRE-2 sequence containing both regions bound QKI with high affinity (QRE-2:wt, Kd ~121 nM), and substitution of the UAAC half-site in region A or in region B considerably reduced the binding affinity (Figure 3.3, QRE-2:m1 and m2). Substitution of UACA with GAGA in region A did not impair high affinity binding (QRE-2:m3, Kd ~128 nM), demonstrating that region A supplies the perfect core (CACUAAC) and region B supplies the half-site (UAAC) of the bipartite motif. These findings also demonstrated that region A without a functional region B was unable to serve as a high affinity site for QKI (QRE-2:m1). In contrast, Ryder and Williamson reported that region A alone was bound with high affinity by QKI (Ryder & Williamson, 2004). We next centered region A within the probe, which considerably improved QKI binding (Kd ~168 nM; Figure 3.3, QRE-2:m4). Substitution of either UAUA with GAGA or UAAC with GAGC (QRE-2:m5 and m6) substantially reduced QKI binding (Figure 3.3B). These findings define QRE-2 as requiring a bipartite motif located either within region A or across regions A and B.
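To see why a shift in Kd from ~121 nM (QRE-2:wt) to ~833 nM (QRE-2:m1) separates clearly on the titration used in Figure 3.3 (GST-QKI-5 increasing twofold from 2 nM), the sketch below evaluates a simple single-site binding isotherm, fraction bound = [P]/(Kd + [P]). This assumes a single binding site and protein in large excess over labelled probe; it is not necessarily the model used to derive the Kd values in Figure 3.3, and the number of lanes (10) is an assumption.

```python
from typing import Optional


def titration_series(start_nm: float = 2.0, factor: float = 2.0, n_lanes: int = 10) -> list[float]:
    """Protein concentrations (nM) of a serial titration; n_lanes is an assumed value."""
    return [start_nm * factor ** i for i in range(n_lanes)]


def fraction_bound(protein_nm: float, kd_nm: float) -> float:
    """Single-site isotherm with protein in large excess over probe: P / (Kd + P)."""
    return protein_nm / (kd_nm + protein_nm)


def first_lane_at_or_above(series: list[float], kd_nm: float) -> Optional[float]:
    """Lowest concentration at which >= 50% of probe is predicted bound (None if never)."""
    return next((c for c in series if c >= kd_nm), None)


if __name__ == "__main__":
    lanes = titration_series()
    # Kd values (nM) reported for QRE-2 variants in Figure 3.3
    for name, kd in [("QRE-2:wt", 121.0), ("QRE-2:m4", 168.0), ("QRE-2:m1", 833.0)]:
        predicted = [round(fraction_bound(c, kd), 2) for c in lanes]
        print(name, first_lane_at_or_above(lanes, kd), predicted)
```

Under these assumptions, half of the wild-type probe is predicted to be bound by the 128 nM lane, whereas QRE-2:m1 needs the 1024 nM lane, so a roughly sevenfold Kd difference spans almost three twofold lanes.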

3.5.5 GLD-1 BINDS A BIPARTITE RNA MOTIF CONTAINING THE HEXANUCLEOTIDE.

A high affinity RNA binding site has been defined for C. elegans GLD-1 that consists of a hexanucleotide (NACU(C/A)A) (Ryder et al., 2004).

FIGURE 3.3 DEFINING THE HIGH AFFINITY QRE WITHIN MBP MRNA. (A) EMSAs of selected RNAs with increasing concentrations of recombinant GST-QKI-5 (increasing by a factor of 2 from 2 nM) or with buffer alone. The RNAs used for the EMSAs are the MBP QRE-2 and its mutant variants (m1-m6). Migration patterns of unbound RNAs (free probe) and QKI-5-bound RNAs (QKI-RNA complex) are indicated on the left. (B) RNA species tested in (A) are shown. The black bars denote sequences that are unaltered between the wild-type and the mutant versions.

(A) QKI-RNA complex and free probe indicated; lanes QRE-2:wt, QRE-2:m1, QRE-2:m2 and QRE-2:m3 (upper gel) and QRE-2:m4, QRE-2:m5 and QRE-2:m6 (lower gel).

(B)
RNA        Region A       Region B    Kd (nM)
QRE-2:wt   UACACACUAAC    UAAC        121
QRE-2:m1   UACACACUAAC    mutated     833
QRE-2:m2   UACACACUAGG    UAAC        >1000
QRE-2:m3   GAGACACUAAC    UAAC        128
QRE-2:m4   UAUACACUAAC    none        168
QRE-2:m5   UAUACACGAGC    none        >1000
QRE-2:m6   GAGACACUAAC    none        757

To examine whether the GLD-1 hexanucleotide also requires a similar half-site, we performed EMSAs with a segment of the tra2 and gli repeated element (TGE) containing the hexanucleotide UACUCA (within UACUCAU) and its neighboring half-site (UAAU) (Figure 3.4C, TGE-wt). GLD-1 bound this wild-type TGE sequence and its QRE-like variant (TGE-m2) with a Kd of approximately 104 nM, defining a short sequence sufficient for high affinity GLD-1 binding (Figure 3.4B). These data are consistent with previous competition experiments that defined the hexanucleotide as UACU(C/A)A and measured a GLD-1 Kd of ~10 nM (Ryder et al., 2004). Substitution of the half-site (UAAU to GAGU) abolished RNA binding (Figure 3.4B, TGE-m1), consistent with a requirement for a half-site in addition to the hexanucleotide. Similar binding experiments performed with QKI showed that TGE-m2 is essentially a QRE and is bound with high affinity, whereas the wild-type TGE was bound with moderate affinity (approximately 300 nM) (Figure 3.4A). QKI did not bind TGE-m1 (Figure 3.4A), indicating that a small deviation from the core sequence combined with loss of the half-site abolishes binding. In summary, these data identify the GLD-1 RNA binding motif as bipartite, as observed with SLM-2, QKI and Sam68.
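The bipartite logic tested with the TGE variants can be expressed as a simple pattern scan: a protein-specific core (NACUAAY for QKI, NACU(C/A)A for GLD-1) plus a UAAY half-site on either side that does not overlap the core. This is a hedged sketch, not the method used in this chapter: `bipartite_sites` is a hypothetical helper, the 20-nt maximum gap is an illustrative assumption, and the half-site and hexanucleotide segments shown in Figure 3.4C are treated as directly adjacent, which the figure does not establish.

```python
import re

# Core consensus patterns (IUPAC: N = any base, Y = C or U)
CORES = {
    "QKI": "[ACGU]ACUAA[CU]",   # NACUAAY
    "GLD-1": "[ACGU]ACU[CA]A",  # NACU(C/A)A
}
HALF_SITE = "UAA[CU]"  # UAAY


def _hits(pattern: str, seq: str) -> list[tuple[int, int]]:
    """(start, end) of every match, including overlapping ones, via a lookahead group."""
    return [(m.start(1), m.end(1)) for m in re.finditer(f"(?=({pattern}))", seq)]


def bipartite_sites(seq: str, protein: str = "QKI", max_gap: int = 20) -> list[tuple[int, int]]:
    """Return (core_start, half_site_start) pairs; max_gap is an illustrative assumption."""
    seq = seq.upper().replace("T", "U")
    sites = []
    for core_start, core_end in _hits(CORES[protein], seq):
        for half_start, half_end in _hits(HALF_SITE, seq):
            if half_start >= core_end:   # half-site downstream of the core
                gap = half_start - core_end
            else:                        # half-site upstream; a negative gap means overlap
                gap = core_start - half_end
            if 0 <= gap <= max_gap:
                sites.append((core_start, half_start))
    return sites


if __name__ == "__main__":
    # Figure 3.4C segments joined without a spacer (assumption)
    tge = {
        "TGE-wt": "UAAU" + "UACUCAU",
        "TGE-m1": "GAGU" + "UACUCAU",
        "TGE-m2": "UAAU" + "UACUAAC",
    }
    for name, rna in tge.items():
        print(name, "GLD-1:", bipartite_sites(rna, "GLD-1"), "QKI:", bipartite_sites(rna, "QKI"))
```

On these inputs the scan finds a bipartite GLD-1 site in TGE-wt and TGE-m2, none in TGE-m1, and a QKI site only in TGE-m2. A strict consensus match cannot capture the moderate (~300 nM) QKI binding to TGE-wt, which is one reason such scans can serve only as a first pass before binding assays.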

FIGURE 3.4 GLD-1 BINDING TO THE TRA2/GLI ELEMENT ANALYSIS. EMSAs of the tra2/gli element with increasing concentrations of QKI (A) and GLD-1 (B) (increasing by a factor of 2 from 2 nM) or with buffer alone. Migration patterns of unbound RNAs (free probe) and protein-bound RNAs (GLD-1/RNA or QKI/RNA complexes) are indicated on the left. (C) RNA species tested in (A) and (B) are shown. The black bars denote sequences that are unaltered between the wild-type and the mutant versions.

(A) QKI-RNA complex and free probe indicated; lanes TGE-wt, TGE-m1 and TGE-m2.

(B) GLD-1-RNA complex and free probe indicated; lanes TGE-wt, TGE-m1 and TGE-m2.

(C)
TGE-wt   UAAU   UACUCAU
TGE-m1   GAGU   UACUCAU
TGE-m2   UAAU   UACUAAC

3.6 DISCUSSION

In the present study, we used SELEX to identify an SLM-2 consensus sequence consisting of direct U(U/A)AA repeats. The bipartite nature of the consensus sequence was essential for high affinity SLM-2 RNA binding. The identification of a bipartite mRNA binding site for QKI (Galarneau & Richard, 2005) and now for SLM-2 prompted us to determine whether SAM68 and GLD-1 also bind bipartite direct repeats. Indeed, SAM68 bound the SLM-2 consensus and required both U(U/A)AA motifs. GLD-1 also required sequences within the UAAY half-site in addition to its conserved consensus NACU(C/A)A, defining a bipartite GLD-1 motif. Taken together, these data demonstrate that the STAR proteins SLM-2, SAM68, QKI and GLD-1 bind direct RNA repeats.

3.6.1 DEFINING THE SLM-2 RNA BINDING SITE U(U/A)AA REPEATS.

We identified SLM-2 in 1999 by searching databases with SAM68 sequences (Di Fruscio et al., 1999). SLM-2 (called T-STAR) was independently identified as an interacting partner of RBM, an RNA binding protein involved in spermatogenesis (Venables et al., 1999). SLM-2 is known to bind homopolymeric RNA (Di Fruscio et al., 1999), localize to SAM68 nuclear bodies (SNBs) (Chen et al., 1999), regulate alternative splicing (Stoss et al., 2004) and dimerize with SAM68 and SLM-1 (Di Fruscio et al., 1999). SLM-2 is post-translationally modified to contain methylarginines (Cote et al., 2003) and phosphotyrosines; tyrosine phosphorylation impairs its ability to associate with RNA (Haegebarth et al., 2004). The expression of SLM-2 is mainly restricted to testis and brain, but its function in these tissues remains unknown (Elliott, 2004). We previously showed that SLM-2 prefers poly(G)-rich homopolymeric RNA (Di Fruscio et al., 1999); therefore, we searched the SELEX hits for poly(G)-rich sequences that could resemble a G-quartet of the kind bound by FMRP (Darnell et al., 2001). The SLM-2-selected RNA SRE-7 contained a variant of this sequence (GGnGGGnGGnnnnnnnGG), but its deletion did not affect SLM-2 RNA binding. We therefore focused on the U(U/A)AA-rich repeats that resemble the consensus identified by SAM68 SELEX (Lin et al., 1997). Indeed, we mapped the SLM-2 consensus sequence to direct repeats of the U(U/A)AA sequence, defining the SLM-2 RNA binding site as a bipartite motif. This motif occurs too frequently in mRNAs, especially in 3'-UTRs, for a bioinformatic search to identify SLM-2 mRNA targets (not shown). Thus, the specificity of SLM-2 function is most likely conferred by its tissue-specific expression, and post-translational modifications may alter its RNA binding specificity and/or accessibility.
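The statement that U(U/A)AA repeats are too frequent for a bioinformatic search can be checked with a back-of-the-envelope calculation. In a random sequence with independent bases, a given position starts U(U/A)AA with probability p = pU(pU + pA)pA², which is 1/128 for equal base frequencies, and two non-overlapping motifs at fixed positions co-occur with probability p². The sketch below counts expected matches by linearity of expectation; the base compositions, the 1,000-nt length and the 20-nt spacer limit are all illustrative assumptions, and real 3'-UTRs are typically AU-rich, which raises these numbers.

```python
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}


def p_motif(comp: dict[str, float]) -> float:
    """Probability that a given position starts U(U/A)AA under an i.i.d. base model."""
    return comp["U"] * (comp["U"] + comp["A"]) * comp["A"] * comp["A"]


def expected_motifs(length: int, comp: dict[str, float]) -> float:
    """Expected number of U(U/A)AA occurrences (overlaps counted) in `length` nt."""
    return max(0, length - 3) * p_motif(comp)


def expected_direct_repeats(length: int, comp: dict[str, float], max_spacer: int = 20) -> float:
    """Expected number of motif pairs separated by 0..max_spacer nt.

    The two motifs occupy disjoint positions, so under the i.i.d. model each
    placement matches with probability p**2 and the expectation is exact.
    """
    p = p_motif(comp)
    # for spacer s, the first motif can start at positions 0 .. length - 8 - s
    placements = sum(max(0, length - 7 - s) for s in range(max_spacer + 1))
    return placements * p * p


if __name__ == "__main__":
    au_rich = {"A": 0.3, "C": 0.2, "G": 0.2, "U": 0.3}  # hypothetical composition
    for label, comp in [("uniform", UNIFORM), ("AU-rich", au_rich)]:
        print(label, round(expected_motifs(1000, comp), 2),
              round(expected_direct_repeats(1000, comp), 2))
```

For equal base frequencies this gives about 7.8 motifs and about 1.3 direct repeats per kilobase, and the hypothetical AU-rich composition raises both, so a 3'-UTR of a few kilobases would be expected to contain several such repeats by chance alone.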

Sam68 binds cellular RNA as well as DNA (Wong et al., 1992). It has a preference for poly(U) and poly(A) homopolymeric RNA, and this association is abrogated by tyrosine phosphorylation by Src kinases and BRK (Wang et al., 1995; Derry et al., 2000). Differential display and cDNA representational difference analysis identified 29 potential RNA targets, of which 10 bind in a KH-dependent manner (Itoh et al., 2002). The Sam68 binding sequences on hnRNP A2/B1 and β-actin mRNAs were mapped to UAAA and UUUUUU motifs, respectively, and both motifs occur within specific loop structures (Itoh et al., 2002). Sam68 has also been shown to transport unspliced HIV RNAs (Reddy et al., 1999), but whether Sam68 binds the Rev response element (RRE) remains unclear (Lukong & Richard, 2003). Sam68 knockout mice are protected against the development of osteoporosis, suggesting enhanced differentiation of mesenchymal stem cells along the osteogenic rather than the adipocytic pathway (Richard et al., 2005). In addition, the male knockout mice are sterile, indicating a role in spermatogenesis (Richard et al., 2005). The involvement of Sam68 in these physiological processes will help direct the search for its physiological mRNA targets. The work presented here demonstrates that the STAR/GSG domain of Sam68 has RNA binding capabilities similar to those of SLM-2, as suggested by the 69% sequence identity between their STAR/GSG domains (Di Fruscio et al., 1999).

3.6.2 QUAKING: A REGULATOR OF MYELINATION.

The quaking viable (qk^v) mice represent an animal model of dysmyelination (Sidman et al., 1964). The defect is summarized as incomplete maturation of the myelin sheath, which results from improper oligodendrocyte differentiation and the failure to transport intracellular myelin components such as the MBP mRNAs (reviewed in Hogan & Greenfield, 1984; Hardy, 1998; Larocque & Richard, 2005). QKI null animals have been generated, but the embryos die at approximately E9.5-E10.5, providing little information about the role of QKI in myelination (Li et al., 2003). Using a gain-of-function approach with ectopic expression of the QKI isoforms, we showed previously that QKI-6 and QKI-7 promote oligodendrocyte differentiation by up-regulating p27KIP1, confirming a role for the QKI isoforms during myelination (Larocque et al., 2005). The QKI response element was defined as a core NACUAAY (Ryder & Williamson, 2004) with a neighboring UAAY (Galarneau & Richard, 2005). This led to the identification of two binding sites within the MBP mRNAs (Ryder & Williamson, 2004; Galarneau & Richard, 2005). QRE-1 contains 3'-adjacent half-sites that function as a moderate affinity site. In the present study, we demonstrate that region A in QRE-2, shown previously to mediate binding (Ryder & Williamson, 2004), becomes a better site in the presence of the half-site from region B (Figure 3.3). Our findings show that QRE-2 within the 3'UTR of MBP mRNAs is indeed a bipartite consensus sequence with a core NACUAAY and a neighboring UAAY.

The MBP mRNAs are localized to the distal processes of oligodendrocytes in intact tissue (Verity & Campagnoni, 1988). The factors necessary for MBP mRNA localization are oligodendrocyte-specific, as MBP mRNA transfected into non-glial cells did not localize properly to the cell membrane (Boccaccio & Colman, 1995). Microinjection studies in living cells have shown that MBP mRNA forms granules, which appear dispersed in the perikaryon and are transported down the processes (Ainger et al., 1993). MBP is not the only mRNA localized to the distal processes of oligodendrocytes: the mRNAs for myelin-associated oligodendrocytic basic protein (MOBP), alpha-CaMKII, tau, amyloid precursor protein (APP) and others are also transported to the site of myelination (Kindler et al., 2005). Transport and localization elements have been mapped in the 3' UTR of rat and mouse MBP mRNA. A 21-nucleotide sequence named the RNA transport signal (RTS), located at nucleotides 794 to 814 of rat MBP mRNA and 798 to 818 of mouse MBP mRNA, has been identified as a transport element (Ainger et al., 1997). This sequence is homologous to sequences in several other localized mRNAs, suggesting that it is a general transport signal. In rat oligodendrocytes, another localization element, named the RNA localization region (RLR), has been mapped to nucleotides 1130 to 1473, but the region spanning nucleotides 667 to 953, which contains the RTS and QRE-1, is sufficient for localization (Ainger et al., 1997). hnRNP A2 is one of the components that bind the RTS (Hoek et al., 1998), and insertion of the RTS into a GFP reporter mRNA enhanced translation (Kwon et al., 1999). QRE-2 (UACUAAC-13nt-UAAC) constitutes another element that may be necessary for proper export of the MBP mRNA into the cytoplasm and subsequent production of MBP at its site of synthesis. QKI likely works in combination with hnRNP A2 and the other components of the RNP granule to ensure the proper transport, localization and translation of the MBP mRNA.

The C. elegans homolog of QKI is GLD-1, a translational repressor required for germ-line differentiation (Francis et al., 1995a; Crittenden et al., 2002). Many GLD-1 mRNA targets have been identified (Jan et al., 1999; Lee & Schedl, 2001; Lee & Schedl, 2004; Schumacher et al., 2005), and a conserved consensus sequence, NACU(C/A)A, was defined by comparing its binding specificity with that of SF1 (Ryder et al., 2004). Most, but not all, mRNA targets contain this consensus sequence (Ryder et al., 2004). The demonstration that GLD-1, like QKI, requires a neighboring half-site is consistent with the ~50% sequence identity between their STAR/GSG domains.

The RNA binding domain of STAR/GSG proteins consists of a maxi-KH domain flanked by two conserved regions (Figure 3.5A). The NK/QUA1 and CK/QUA2 regions are, respectively, the N- and C-terminal regions flanking the KH domain. In the structure of the SF1 KH domain bound to its RNA ligand U1A2C3U4A5A6C7 (Liu et al., 2001), the CK region makes important contacts with the RNA. All STAR domain-containing proteins share the critical GXXG sequence, located in a loop between the first two alpha helices of the KH domain. This sequence is absolutely conserved among the STAR domain proteins and contacts the RNA, especially bases U4A5A6C7. By examining the residues in the CK region that make important contacts with the RNA bases, we found that two residues (asterisks in Figure 3.5) appear to confer SLM-2/SAM68 specificity versus QKI/GLD-1 specificity. In SLM-2/SAM68 these residues are a threonine or a serine and a conserved glutamic acid, whereas in QKI/GLD-1/SF1 they are a conserved alanine and a conserved arginine. These residues make important contacts with base A2, the specificity for which is lost in the SLM-2/SAM68 consensus binding sequence. In fact, the SLM-2/Sam68 binding sequence resembles the QKI/GLD-1/SF1 core binding sequence in every respect except that it lacks bases U1A2C3.

The structure of the STAR protein SF1 bound to RNA was determined and the amino acids that contact the RNA were identified (Liu et al., 2001). These contact residues explain why SF1, QKI and GLD-1 have nearly identical binding specificities. The Sam68, SLM-1 and SLM-2 subfamily has different amino acids at the RNA contact positions, so it should be possible, by amino acid substitution, to convert a Sam68 domain into a GLD-1-like domain that binds the GLD-1 consensus sequence NACU(C/A)A. Lehmann-Blount and Williamson performed such experiments but were unable, by mutagenesis, to identify an amino acid 'code' that would dictate GLD-1-like versus Sam68-like specificity (Lehmann-Blount & Williamson, 2005). This led them to propose that Sam68, and hence SLM-1 and SLM-2, might not be RNA binding proteins, or might possess an RNA binding specificity fundamentally unlike that of GLD-1 (Lehmann-Blount & Williamson, 2005).

The identification of a high affinity RNA target for SLM-2 with the characteristics of a GLD-1/QKI bipartite motif demonstrates that members of the Sam68, SLM-1 and SLM-2 subfamily are indeed RNA binding proteins. The challenge ahead will be to identify the physiological RNA targets linked to the phenotypes observed in mammals.

FIGURE 3.5 SAM68/SLM-2 TETRANUCLEOTIDE VERSUS QKI/GLD-1 HEXANUCLEOTIDE SEQUENCE REQUIREMENTS. (A) Diagram representing the structural and functional regions of the STAR/GSG domain-containing proteins. (B) The STAR/GSG domains of mouse SLM-2, human SAM68, mouse QKI, C. elegans GLD-1 and human SF-1 were aligned using ClustalW. Secondary structure elements (beta sheets and alpha helices) are shown above the sequences, and the NK/QUA1 region, the KH domain and the CK/QUA2 region are shown beneath them. The critical loop between helices alpha 1 and alpha 2 containing the GXXG sequence is also shown. (a) Based on Liu et al. (2001), the RNA bases UACUAAC contacted by specific SF-1 residues are numbered U1A2C3U4A5A6C7. (b) Arginine 160 makes contact with LUand Ae. (c) Valine 183 makes contact with A6 and C7.

(A) STAR/GSG domain (~200 amino acids): NK/QUA1 region, KH domain with the GXXG loop between alpha 1 and alpha 2, and CK/QUA2 region.

(B) Aligned STAR/GSG domain sequences of SLM-2, Sam68, Quaking, GLD-1 and SF-1, with residues making specific contacts with RNA indicated.

3.7 REFERENCES OF CHAPTER 3

Ainger K, Avossa D, Diana AS, Barry C, Barbarese E, Carson JH. 1997. Transport and localization elements in myelin basic protein mRNA. J Cell Biol 138:1077-1087.

Ainger K, Avossa D, Morgan F, Hill SJ, Barry C, Barbarese E, Carson JH. 1993. Transport and localization of exogenous myelin basic protein mRNA microinjected into oligodendrocytes. J Cell Biol 123:431-441.

Arning S, Gruter P, Bilbe G, Kramer A. 1996. Mammalian splicing factor SF1 is encoded by variant cDNAs and binds to RNA. RNA 2:794-810.

Ashley CT, Wilkinson KD, Reines D, Warren ST. 1993. FMR1 protein: conserved RNP family domains and selective RNA binding. Science 262:563-566.

Backe PH, Messias AC, Ravelli RB, Sattler M, Cusack S. 2005. X-ray crystallographic and NMR studies of the third KH domain of hnRNP K in complex with single-stranded nucleic acids. Structure 13:1055-1067.

Baehrecke EH. 1997. who encodes a KH RNA binding protein that functions in muscle development. Development 124:1323-1332.

Boccaccio GL, Colman DR. 1995. Myelin basic protein mRNA localization and polypeptide targeting. J Neurosci Res 42:277-286.

Braddock DT, Baber JL, Levens D, Clore GM. 2002. Molecular basis of sequence-specific single-stranded DNA recognition by KH domains: solution structure of a complex between hnRNP K KH3 and single-stranded DNA. EMBO J 21:3476-3485.

Buckanovich RJ, Darnell RB. 1997. The Neuronal RNA Binding Protein Nova-1 Recognizes Specific RNA Targets In Vitro and In Vivo. Mol Cell Biol 17:3194-3201.

Chen T, Boisvert FM, Bazett-Jones DP, Richard S. 1999. A role for the GSG domain in localizing Sam68 to novel nuclear structures in cancer cell lines. Mol Biol Cell 10:3015-3033.

Chen T, Damaj BB, Herrera C, Lasko P, Richard S. 1997. Self-association of the single-KH-domain family members Sam68, GRP33, GLD-1, and Qk1: role of the KH domain. Mol Cell Biol 17:5707-5718.

Coyle JH, Guzik BW, Bor YC, Jin L, Eisner-Smerage L, Taylor SJ, Rekosh D, Hammarskjold ML. 2003. Sam68 enhances the cytoplasmic utilization of intron-containing RNA and is functionally regulated by the nuclear kinase Sik/BRK. Mol Cell Biol 23:92-103.

Crittenden SL, Bernstein DS, Bachorik JL, Thompson BE, Gallegos M, Petcherski AG, Moulder G, Barstead R, Wickens M, Kimble J. 2002. A conserved RNA-binding protein controls germline stem cells in Caenorhabditis elegans. Nature 417:660-663.

Cote J, Boisvert FM, Boulanger MC, Bedford MT, Richard S. 2003. Sam68 RNA binding protein is an in vivo substrate for protein arginine N-methyltransferase 1. Mol Biol Cell 14:274-287.

Darnell JC, Fraser CE, Mostovetsky O, Stefani G, Jones TA, Eddy SR, Darnell RB. 2005. Kissing complex RNAs mediate interaction between the Fragile-X mental retardation protein KH2 domain and brain polyribosomes. Genes Dev 19:903-918.

Darnell JC, Jensen KB, Jin P, Brown V, Warren ST, Darnell RB. 2001. Fragile X mental retardation protein targets G quartet mRNAs important for neuronal function. Cell 107:489-499.

DeBoulle K, Verkerk AJMH, Reyniers E, Vits L, Hendrickx J, Roy BV, Bos FVD, DeGraaff E, Oostra BA, Willems PJ. 1993. A point mutation in the FMR-1 gene associated with fragile X mental retardation. Nature Genetics 3:31-35.

Derry JJ, Richard S, Carvajal HV, Ye X, Vasioukhin V, Cochrane AW, Chen T, Tyner AL. 2000. Sik (BRK) phosphorylates Sam68 in the nucleus and negatively regulates its RNA binding activity. Mol Cell Biol 20:6114-6126.

Di Fruscio M, Chen T, Bonyadi S, Lasko P, Richard S. 1998. The identification of two Drosophila KH domain proteins: KEP1 and SAM are members of the Sam68 family of GSG domain proteins. J Biol Chem 273:30122-30130.

Di Fruscio M, Chen T, Richard S. 1999. Two novel Sam68-like mammalian proteins SLM-1 and SLM-2: SLM-1 is a Src substrate during mitosis. Proc Natl Acad Sci USA 96:2710-2715.

Di Fruscio M, Styhler S, Wikholm E, Boulanger MC, Lasko P, Richard S. 2003. Kep1 interacts genetically with dredd/caspase-8, and kep1 mutants alter the balance of dredd isoforms. Proc Natl Acad Sci U S A 100:1814-1819.

Dreyfuss G, Kim VN, Kataoka N. 2002. Messenger-RNA-binding proteins and the messages they carry. Nat Rev Mol Cell Biol 3:195-205.

Du Z, Lee JK, Tjhen R, Li S, Pan H, Stroud RM, James TL. 2005. Crystal structure of the first KH domain of human poly(C)-binding protein-2 in complex with a C-rich strand of human telomeric DNA at 1.7 Å. J Biol Chem 280:38823-38830.

Elliott DJ. 2004. The role of potential splicing factors including RBMY, RBMX, hnRNPG-T and STAR proteins in spermatogenesis. Int J Androl 27:328-334.

Francis R, Barton MK, Kimble J, Schedl T. 1995a. Control of oogenesis, germline proliferation and sex determination by the C. elegans gene gld-1. Genetics 139:579-606.

Francis R, Maine E, Schedl T. 1995b. gld-1, a cell-type-specific tumor suppressor gene in C. elegans. Genetics 139:607-630.

Galarneau A, Richard S. 2005. Target RNA motif and target mRNAs of the Quaking STAR protein. Nat Struct Mol Biol 12:691-698.

Gibson TJ, Thompson JD, Heringa J. 1993. The KH domain occurs in a diverse set of RNA-binding proteins that include the antiterminator NusA and is probably involved in binding to nucleic acid. FEBS Letters 324:361-366.

Haegebarth A, Heap D, Bie W, Derry JJ, Richard S, Tyner AL. 2004. The nuclear tyrosine kinase BRK/Sik phosphorylates and inhibits the RNA-binding activities of the Sam68-like mammalian proteins SLM-1 and SLM-2. J Biol Chem 279:54398-54400.

Hardy RJ. 1998. Molecular defects in the dysmyelinating mutant quaking. J Neurosci Res 51:417-422.

Hoek KS, Kidd GJ, Carson JH, Smith R. 1998. hnRNP A2 selectively binds the cytoplasmic transport sequence of myelin basic protein mRNA. Biochemistry 37:7021-7029.

Hogan EL, Greenfield S. 1984. Animal models of genetic disorders of myelin. Myelin:489-534.

Itoh M, Haga I, Li Q-H, Fujisawa J-I. 2002. Identification of cellular mRNA targets for RNA-binding protein Sam68. Nucl Acids Res 30:5452-5464.

Jan E, Motzny CK, Graves LE, Goodwin EB. 1999. The STAR protein, GLD-1, is a translational regulator of sexual identity in Caenorhabditis elegans. EMBO J 18:258-269.

Jensen KB, Dredge BK, Stefani G, Zhong R, Buckanovich RJ, Okano HJ, Yang YY, Darnell RB. 2000. Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability. Neuron 25:359-371.

Kindler S, Wang H, Richter D, Tiedge H. 2005. RNA transport and local control of translation. Annu Rev Cell Dev Biol 21:223-245.

Kwon S, Barbarese E, Carson JH. 1999. The cis-acting RNA trafficking signal from myelin basic protein mRNA and its cognate trans-acting ligand hnRNP A2 enhance cap-dependent translation. J Cell Biol 147:247-256.

Larocque D, Galarneau A, Liu HN, Scott M, Almazan G, Richard S. 2005. Protection of the p27KIP1 mRNA by quaking RNA binding proteins promotes oligodendrocyte differentiation. Nat Neurosci 8:27-33.

Larocque D, Pilotte J, Chen T, Cloutier F, Massie B, Pedraza L, Couture R, Lasko P, Almazan G, Richard S. 2002. Nuclear retention of MBP mRNAs in the Quaking viable mice. Neuron 36:815-829.

Larocque D, Richard S. 2005. Quaking KH domain proteins as regulators of glial cell fate and differentiation. RNA Biology 2:37-40.

Lee M-H, Schedl T. 2001. Identification of in vivo mRNA targets of GLD-1, a maxi-KH motif containing protein required for C. elegans germ cell development. Genes & Dev 15:2408-2420.

Lee MH, Schedl T. 2004. Translation repression by GLD-1 protects its mRNA targets from nonsense mediated mRNA decay in C. elegans. Genes & Dev 18:1047-1059.

Lehmann-Blount KA, Williamson JR. 2005. Shape-specific nucleotide binding of single-stranded RNA by the GLD-1 STAR domain. J Mol Biol 346:91-104.

Lewis HA, Chen H, Edo C, Buckanovich RJ, Yang YY, Musunuru K, Zhong R, Darnell RB, Burley SK. 1999. Crystal structures of Nova-1 and Nova-2 K-homology RNA-binding domains. Structure 7:191-203.

Lewis HA, Musunuru K, Jensen KB, Edo C, Chen H, Darnell RB, Burley SK. 2000. Sequence-specific RNA binding by a Nova KH domain: implications for paraneoplastic disease and the fragile X syndrome. Cell 100:323-332.

Li Z, Takakura N, Oike Y, Imanaka T, Araki K, Suda T, Kaname T, Kondo T, Abe K, Yamamura K. 2003. Defective smooth muscle development in qkI-deficient mice. Dev Growth Differ 45:449-462.

Li Z, Zhang Y, Li D, Feng Y. 2000. Destabilization and mislocalization of the myelin basic protein mRNAs in quaking dysmyelination lacking the Qk1 RNA-binding proteins. J Neurosci 20:4944-4953.

Lin Q, Taylor SJ, Shalloway D. 1997. Specificity and determinants of Sam68 RNA binding. J Biol Chem 272:27274-27280.

Liu Z, Luyten I, Bottomley MJ, Messias AC, Houngninou-Molango S, Sprangers R, Zanier K, Kramer A, Sattler M. 2001. Structural basis for recognition of the intron branch site RNA by splicing factor 1. Science 294:1098-1102.

Lukong KE, Richard S. 2003. Sam68, the KH domain-containing superSTAR. Biochim Biophys Acta 1653:73-86.

Matter N, Herrlich P, Konig H. 2002. Signal-dependent regulation of splicing via phosphorylation of Sam68. Nature 420:691-695.

Nabel-Rosen H, Dorevitch N, Reuveny A, Volk T. 1999. The balance between two isoforms of the Drosophila RNA-binding protein How controls tendon cell differentiation. Mol Cell 4:573-584.

Peled-Zehavi H, Berglund JA, Rosbash M, Frankel AD. 2001. Recognition of RNA branch point sequences by the KH domain of splicing factor 1 (mammalian branch point binding protein) in a splicing factor complex. Mol Cell Biol 21:5232-5241.

Reddy TR, Xu W, Mau JKL, Goodwin CD, Suhasini M, Tang H, Frimpong K, Rose DW, Wong-Staal F. 1999. Inhibition of HIV replication by dominant negative mutants of Sam68, a functional homolog of HIV-1 Rev. Nature Medicine 5:635-642.

Richard S, Torabi N, Franco GV, Tremblay GA, Chen T, Vogel G, Morel M, Cleroux P, Forget-Richard A, Komarova S, Tremblay ML, Li W, Li A, Gao YJ, Henderson JE. 2005. Ablation of the Sam68 RNA binding protein protects mice from age-related bone loss. PLoS Genet 1:e74.

Ryder SP, Frater LA, Abramovitz DL, Goodwin EB, Williamson JR. 2004. RNA target specificity of the STAR/GSG domain post-transcriptional regulatory protein GLD-1. Nat Struct Mol Biol 11:20-28.

Ryder SP, Williamson JR. 2004. Specificity of the STAR/GSG domain protein Qk1: implications for the regulation of myelination. RNA 10:1449-1458.

Saccomanno L, Loushin C, Jan E, Punkay E, Artzt K, Goodwin EB. 1999. The STAR protein QKI-6 is a translational repressor. Proc Natl Acad Sci U S A 96:12605-12610.

Schumacher B, Hanazawa M, Lee M-H, Nayak S, Volkmann K, Hofmann R, Hengartner M, Schedl T, Gartner A. 2005. Translational Repression of C. elegans p53 by GLD-1 Regulates DNA Damage-Induced Apoptosis. Cell 120:357-368.

Sidman RL, Dickie MM, Appel SH. 1964. Mutant mice (quaking and jimpy) with deficient myelination in the central nervous system. Science 144:309-311.

Siomi H, Matunis MJ, Michael WM, Dreyfuss G. 1993. The pre-mRNA binding K protein contains a novel evolutionary conserved motif. Nucl Acids Res 21:1193-1198.

Stoss O, Novoyatleva T, Gencheva M, Olbrich M, Benderska N, Stamm S. 2004. p59fyn-mediated phosphorylation regulates the activity of the tissue-specific splicing factor rSLM-1. Mol Cell Neurosci 27:8-21.

Tuerk C, Gold L. 1990. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249:505-510.

Venables JP, Vernet C, Chew SL, Elliott DJ, Cowmeadow RB, Wu J, Cooke HJ, Artzt K, Eperon IC. 1999. T-STAR/ETOILE: a novel relative of Sam68 that interacts with an RNA-binding protein implicated in spermatogenesis. Hum Mol Genetics 8:959-969.

Verity AN, Campagnoni AT. 1988. Regional expression of myelin protein genes in the developing mouse brain: In situ hybridization studies. J Neurosci Res 21:238-248.

Verkerk AJMH, Pieretti M, Sutcliffe JS, Fu Y-H, Kuhl DP, Pizzuti A, Reiner O, Richards S, Victoria MF, Zhang F, Eussen BE, van Ommen GJB, Blonden LAJ, Riggins GJ, Chastain JL, Kunst CB, Galjaard H, Caskey CT, Nelson DL, Oostra BA, Warren ST. 1991. Identification of a gene (FMR1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 65:905-914.

Vernet C, Artzt K. 1997. STAR, a gene family involved in signal transduction and activation of RNA. Trends in Genet 13:479-484.

Wang LL, Richard S, Shaw AS. 1995. p62 association with RNA is regulated by tyrosine phosphorylation. J Biol Chem 270:2010-2013.

Wong G, Muller O, Clark R, Conroy L, Moran MF, Polakis P, McCormick F. 1992. Molecular cloning and nucleic acid binding properties of the GAP-associated tyrosine phosphoprotein p62. Cell 69:551-558.

Wu JI, Reed RB, Grabowski PJ, Artzt K. 2002. Function of quaking in myelination: regulation of alternative splicing. Proc Natl Acad Sci USA 99:4233-4238.

Zaffran S, Astier M, Gratecos D, Semeriva M. 1997. The held out wings (how) Drosophila gene encodes a putative RNA binding protein involved in the control of muscular and cardiac activity. Development 124:2087-2098.

3.8 ACKNOWLEDGMENTS

We thank Min-Ho Lee for recombinant GLD-1 protein and stimulating discussions. This work was supported by grant MT-13377 from the Canadian Institutes of Health Research (CIHR). A.G. was a Research Student of the National Cancer Institute of Canada supported with funds provided by the Terry Fox Run. S.R. is an investigator of the CIHR.

3.9 SUPPLEMENTARY DATA

FIGURE 3.6 THE CLASS II SRE-9, -10 AND -11 DO NOT ASSOCIATE WITH SLM-2. EMSAs of the selected SRE-9, -10 and -11 with decreasing concentrations of recombinant His-SLM-2 (by a factor of 2 from 1 µM) or with buffer alone. The RNA sequences used in the reactions are shown in Table 3.2. Migration patterns of unbound RNAs (free probe) and protein-bound RNAs (protein-RNA complex) are indicated on the left.

His-SLM-2 EMSAs: SLM-2/RNA complex and free probe indicated; lanes SRE-9, SRE-10 and SRE-11.

4 General Discussion

4.1 SELEX as a general tool to identify RNA targets

Systematic Evolution of Ligands by EXponential enrichment (SELEX), a method for identifying aptamers, was developed by two independent groups in 1990 (Ellington and Szostak 1990; Tuerk and Gold 1990). At the time, molecular biology techniques were becoming more powerful as enzymes became more stable, easier to handle and less error-prone. With the progress of molecular biology over the past 20 years, SELEX has proven to be an excellent tool for identifying aptamers, short nucleotide sequences selected from a randomized library (Gopinath 2007). Not only is the production of a library easier and more reliable, but the target proteins used to select the aptamers are also easier to produce at higher concentrations. With the advent of bioinformatics, sequences identified by in vitro selection have become even more informative: by combining SELEX and bioinformatics, the identification of the RNA binding sequences of a target protein can lead not only to its physiological RNA targets but also to the putative function of the RNA binding protein. Aptamers have been selected using SELEX technologies for a wide range of RNA binding proteins (Gopinath 2007). In addition to the major improvements in molecular biology, the other main developments have been in separation methods, such as affinity tags, column matrices, cross-linking and antibody-based methods. The main disadvantage of SELEX is that it is repetitive and time-consuming: one round of selection can take up to 3 days and may need to be repeated because of high background levels. These drawbacks preclude SELEX from high-throughput or automated selection. Various groups are attempting to automate the procedure, most notably through the development of workstations, but these have yet to prove useful. I believe efforts must be put into developing reliable libraries, a crucial step in SELEX. SELEX is also a complex method, and researchers must carry out each step of the procedure with great care to avoid introducing errors into the results. Rapid progress in selection methods has allowed researchers to isolate high affinity aptamers against many target molecules, which will lead to new diagnostic and therapeutic applications. Now that aptamers are being developed into drugs and molecular techniques are steadily improving, SELEX is becoming an increasingly valuable laboratory tool.

4.2 GSG/STAR domain RNA consensus: a comparison study

In the present study, I identified the QKI consensus sequence (chapter 2) and the SLM-2 consensus sequence (chapter 3) using SELEX. In summary, both consensus sequences are direct repeats of UA-rich sequences. The bipartite nature of the consensus sequences was essential for high-affinity RNA-protein interactions. Moreover, in chapter 3, Sam68 and GLD-1 were also shown to require bipartite direct repeats. The RNA binding domain of STAR/GSG proteins consists of a maxi-KH domain flanked by the NK/QUA1 and CK/QUA2 conserved sequences (Figure 3.5a). The NK/QUA1 region is responsible for dimerization (Chen and Richard 1998). Based on the SF1/BPS structure, the KH domain recognizes the UAAC portion of the core consensus and the CK/QUA2 [...]

The important specificity differences, which confer the KH domain's selective binding affinity, are located within the CK/QUA2 region.

Using a double mutant of GLD-1, Lehmann-Blount et al. attempted to mimic the Sam68 KH domain by mutating the two key residues (R328 and A321) responsible for conferring hexanucleotide specificity (Lehmann-Blount and Williamson 2005). These two residues are marked with an asterisk in Figure 3.5b. Because the GLD-1/Sam68-like double mutant did not bind any of their mutant RNAs, they concluded that Sam68 is not an RNA binding protein. They also showed that GLD-1 binds the 28mer Tra/GLI element (TGE) with tenfold higher affinity than the TGE 12mer. Moreover, when the hexanucleotide is mutated to a tetranucleotide consensus sequence, GLD-1 no longer binds the RNA.

The weakness of the GLD-1/Sam68-like double mutant experiment is that it was tested with the 12mer, which does not contain the bipartite motif. By converting the hexanucleotide-specific KH domain of GLD-1 into a tetranucleotide-specific, Sam68-like KH domain, they lost the interaction with the RNA because they were using the TGE 12mer. I demonstrated in chapter 3 that SLM-2/Sam68 also absolutely requires a bipartite motif to bind. A similar mutagenesis analysis using a bipartite motif would determine whether the Sam68-like KH domain can bind RNA. Alternatively, the specificity of Sam68 could be mutated so that its KH domain becomes hexanucleotide-specific. I would expect such a mutant to bind the TGE 12mer, since its affinity for the RNA would increase in the presence of the hexanucleotide.
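The argument above turns on whether a probe carries one UA-rich element or two. As a hedged illustration, the Python sketch below scans an RNA for a bipartite motif. The motif parameters (a core ACUAAY followed, after a spacer of 1 to 20 nucleotides, by a UAAY half site) and the helper names are assumptions chosen for illustration rather than the exact consensus derived in chapters 2 and 3, and the test sequences are hypothetical, not the TGE probes used by Lehmann-Blount and Williamson.

```python
import re

# Hypothetical bipartite motif, assumed for illustration only:
# a core ACUAAY (Y = C or U) followed, after a 1-20 nt spacer,
# by a UAAY half site.
CORE = "ACUAA[CU]"
HALF_SITE = "UAA[CU]"


def _to_rna(seq):
    """Normalize DNA or RNA input to uppercase RNA."""
    return seq.upper().replace("T", "U")


def find_bipartite(seq, min_gap=1, max_gap=20):
    """Return (core_start, half_site_start) for every core that has a half
    site min_gap..max_gap nt downstream of its last base."""
    # The lookahead lets overlapping cores be reported; the lazy spacer
    # pairs each core with its nearest qualifying half site.
    pattern = re.compile(
        rf"(?=({CORE}).{{{min_gap},{max_gap}}}?({HALF_SITE}))"
    )
    return [(m.start(1), m.start(2)) for m in pattern.finditer(_to_rna(seq))]


def has_core_only(seq):
    """True if the core is present but no bipartite motif is found."""
    rna = _to_rna(seq)
    return re.search(CORE, rna) is not None and not find_bipartite(rna)
```

Under these assumptions, a probe carrying only the core (analogous to a short probe lacking the second element) is flagged by `has_core_only`, whereas a probe with both elements returns the positions of the core and half site. A real analysis would also need to tolerate mismatches and consider flanking context and structure.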

4.3 STAR domain containing proteins and post-transcriptional regulation

At the molecular level, STAR domain proteins are RNA binding proteins with potentially numerous functions, since they could act at any step of mRNA processing or control. These include, but are not restricted to, translation, mRNA turnover, RNA interference, pre-mRNA splicing, nonsense-mediated decay and RNA trafficking, as described in detail in the introduction of this thesis. Quaking and Sam68/SLM-2 are not the first RNA binding proteins to be identified as multifunctional. Other RNA binding proteins, such as FMRP and hnRNP A1, have been proposed to be multifunctional because they play roles in many steps of mRNA expression control, including mRNA splicing, trafficking, modulation of mRNA turnover and translation.

4.3.1 RNA trafficking

Many RNA binding proteins are implicated in the shuttling and transport of messenger RNA. QKI was shown to play a role in RNA trafficking by regulating nuclear export of the myelin basic protein mRNA during myelination. Other STAR proteins also play roles in mRNA transport. HOW, the Drosophila homolog of QKI, was shown to regulate tendon cell differentiation by binding the mRNA of the transcription factor stripe and regulating its export or repressing its translation. Sam68 was also shown to facilitate the export of unspliced retroviral RNAs containing the HIV Rev response element (RRE) and of RNAs containing a constitutive transport element. Other reports show that Sam68 interacts with Rev and assists in export (Lukong and Richard 2003). It is well established that QKI-5 is nuclear, while QKI-6 and QKI-7 are cytoplasmic. Like QKI-5, Sam68 is exclusively nuclear. During development, QKI-5 is expressed early in embryogenesis, whereas QKI-6 and QKI-7 are expressed in late embryogenesis. This temporal expression and distinct subcellular localization imply that access to RNA targets is regulated by timing and compartmentalization. Moreover, STAR-RNA interactions can be regulated by phosphorylation, which abolishes the RNA binding properties of STAR proteins (Zhang, Lu et al. 2003; Haegebarth, Heap et al. 2004). Timing, compartmentalization, phosphorylation and other factors or post-translational modifications are important levels at which RNA binding proteins regulate mRNA transport. We cannot rule out the possibility that QKI binds other carrier proteins or export systems and bridges the mRNA to the transport machinery.

4.3.2 Translation repression and mRNA stabilization

The regulated translation of messenger RNAs is critical not only for many developmental processes, but also for maintaining cell function. This is especially true for RNA transport in neurons, in which the cargo remains untranslated until it is delivered to its destination; only then is translation activated and the messenger RNA translated into protein. GLD-1 was the first STAR protein identified as a translational repressor (Francis, Barton et al. 1995; Jan, Motzny et al. 1999). GLD-1 was also shown to repress translation of p53 (Schumacher,

Hanazawa et al. 2005), Notch (Marin and Evans 2003), PAL-1 (Mootz, Ho et al.

2004), and various other putative mRNA targets (Lee and Schedl 2001; Ryder,

Frater et al. 2004). This translation repression function is mediated through the

Tra/GLI element (TGE) identified by the Kimble (Goodwin, Okkema et al. 1993) and

Goodwin (Jan, Yoon et al. 1997) groups and further refined by Ryder (Ryder,

Frater et al. 2004) and my work (chapter 3).

QKI-6 is the only QKI isoform shown to regulate translation. Saccomanno et al. showed that, both in vitro and in vivo, QKI-6 represses translation of a reporter construct by binding specifically to the TGE, just as GLD-1 does (Saccomanno, Loushin et al. 1999). They also reported that QKI-5 and QKI-7 bind specifically to the TGE in vitro, but did not pursue their study with these isoforms. Most likely, QKI regulates mRNA translation through its binding to the QRE. If this cis-element is present on an mRNA, QKI could act as a trans-acting factor that prevents components of the translation machinery from binding to the mRNA. Because QKI-6 and QKI-7 are localized in the cytoplasm, they are the isoforms most likely to mediate translational repression. The multiple protein interaction domains of QKI, as well as of Sam68/SLM-2, suggest that these proteins could interact with the translation machinery. In fact, Julie Pilotte, a former member of Dr. Richard's laboratory, showed that QKI physically interacts with PABP (Julie Pilotte and Stephane Richard, unpublished data), which would allow translational repression by a mechanism similar to that of Paip2 (PABP-interacting protein 2). Paip2 competes with eIF4G for binding to PABP (Karim, Svitkin et al. 2006).

GLD-1 was also shown to play an important role in protecting mRNAs with uORFs from nonsense-mediated mRNA decay in C. elegans (Lee and Schedl 2004). This could also be a mechanism of action for QKI or Sam68/SLM-2, since QKI was shown to bind and stabilize the p27KIP1 mRNA (Larocque, Galarneau et al. 2005).

4.3.3 Alternative splicing

As described in section 1.2.2, alternative splicing is a powerful way to regulate gene expression. This post-transcriptional process creates protein diversity, and RNA binding proteins are important regulators of these events.

Many RNA binding proteins are known to regulate pre-mRNA splicing, including Nova-1 (Buckanovich and Darnell 1997), SLM-2 (Stoss, Olbrich et al. 2001; Stoss, Novoyatleva et al. 2004), SF1/BBP (Arning, Gruter et al. 1996; Berglund, Chua et al. 1997), Sam68 (Matter, Herrlich et al. 2002) and QKI-5 (Wu, Reed et al. 2002).

Stoss et al. showed that SLM-2 could change the splicing patterns of CD44, human transformer-2β and Tau minigenes (Stoss, Olbrich et al. 2001). They postulated that SLM-2, and most probably Sam68, link signal transduction to pre-mRNA splicing. Later, Matter et al. confirmed this link between signal transduction and pre-mRNA splicing decisions for Sam68 (Matter, Herrlich et al. 2002). They showed that Sam68 regulates an exonic splice-regulatory element of CD44 and that this regulation is mediated by phosphorylation of Sam68 by ERK. In the same study, they showed that Sam68 binding in vitro is mediated by a cis sequence and that, when this sequence is mutated, Sam68 can no longer bind the RNA. This sequence is very similar to the one identified in this study (AAAAUU). In fact, this sequence is present in most of the clones that we identified (Table 3.2), but our results show that it is not sufficient to confer SLM-2 binding, because a bipartite motif is required (SRE-4m3, Table 3.2).

Alternative splicing is probably the least explored function of QKI and some other STAR proteins. Wu et al. previously demonstrated that, in transient co-expression assays, the QKI-5 isoform negatively regulates alternative splicing of exon 12 of the myelin-associated glycoprotein (MAG) (Wu, Reed et al. 2002). They also found that an intronic sequence element is required for this regulation and showed that QKI-5 binds this sequence. Unfortunately, the intronic sequence identified does not contain the bipartite sequence identified in my work. Nevertheless, the facts that QKI binds specifically to a bipartite sequence and that its core sequence contains the exact branch point sequence (BPS) bound by the splicing factor SF1 strongly suggest that QKI is involved in splicing. Recently, the Darnell and Blencowe groups joined together to generate an RNA map predicting NOVA-dependent splicing regulation (Ule, Stefani et al. 2006). This work, published in Nature, showed that the binding specificity of NOVA can be used as a code to successfully predict the exons it targets. The NOVA protein is very similar to QKI: it contains three KH-type RNA binding domains and binds a specific RNA sequence. In fact, just as in this thesis, the RNA binding motif of NOVA was elucidated using SELEX. The Darnell group found that NOVA specifically associates with repeats of the UCAU RNA sequence (Buckanovich and Darnell 1997).

Using bioinformatics, they identified mRNAs that contain this repeated sequence, as well as other RNA targets. It is possible that STAR proteins also follow such a splicing code.

The bipartite nature of the consensus sequences identified in this thesis could provide the code that allows STAR proteins to mask the intronic BPS and block SF1 access to this region. Because BPS recognition by SF1 is a required early step in pre-mRNA splicing (Black 2003), binding of a closely related STAR domain protein to the BPS would prevent splicing. A study similar to that of the Darnell and Blencowe groups, but with QKI or GLD-1, would probably further define the involvement of STAR proteins in the splicing code.

4.4 STAR domain containing proteins and disease

Because STAR domain-containing proteins play roles in many aspects of RNA regulation, it is quite probable that they are involved in disease. NOVA is an example of an RNA binding protein involved in disease: it is the target antigen in paraneoplastic opsoclonus myoclonus ataxia (POMA). This neurologic disorder develops when tumors, such as breast tumors, express NOVA, which triggers an autoimmune response (Buckanovich, Yang et al. 1996; Buckanovich and Darnell 1997). Another well-known example is the fragile X mental retardation protein (FMRP), implicated in fragile X syndrome (FXS). This disease is the most common cause of inherited mental retardation and can be attributed to mutations in the FMR1 gene, which encodes FMRP and is located on the X chromosome (Bagni and Greenough 2005).

In chapter 2, I showed that many mRNAs encoding products implicated in cancer are putative targets of QKI. This suggests a role for STAR proteins in regulating the cell cycle, proliferation and cancer. Indeed, mutations in GLD-1 cause germ-line tumors, suggesting that it may be a tumor suppressor (Jones and Schedl 1995). Moreover, many reports have shown that Sam68 is involved in tumorigenesis. Kiven E. Lukong from Dr. Richard's laboratory demonstrated that Sam68 is tyrosine phosphorylated by the breast tumor kinase (BRK) (Lukong, Larocque et al. 2005). This tyrosine kinase is overexpressed in the majority of human breast tumors and has also been shown to phosphorylate SLM-1 and SLM-2 (Haegebarth, Heap et al. 2004). Overexpression of Sam68 induces cell cycle arrest, supporting a tumor suppressor activity for Sam68 (Taylor, Resnick et al. 2004). Acetylation of Sam68, a post-translational modification, is elevated in tumor cell lines and correlates with enhanced RNA binding activity (Babic, Jakymiw et al. 2004). Finally, Liu et al. demonstrated that Sam68 levels significantly affect cell proliferation, implicating Sam68 in tumorigenesis (Liu, Li et al. 2000). In addition to the many putative QKI mRNA targets implicated in cell proliferation that I identified in chapter 2, QKI expression is altered in human gliomas (Li, Kondo et al. 2002). Together, these reports suggest that STAR RNA binding proteins are involved in tumorigenesis, most probably acting as tumor suppressors.

The involvement of QKI in myelination defects is well established. The quaking viable mouse is a model of dysmyelination (Sidman, Dickie et al. 1964). It carries a spontaneous mutation that results in a loss of myelination during development. This mutation is a one-megabase deletion that includes the promoter and enhancer regions of the qk gene (Ebersole, Chen et al. 1996). QKI-related dysmyelination may be implicated in several disorders, including multiple sclerosis (Merrill and Scolding 1999) and schizophrenia (Aberg, Saetre et al. 2006; McInnes and Lauriat 2006). The histopathology of multiple sclerosis highlights myelin loss as an important feature (Merrill and Scolding 1999); in fact, oligodendrocytes, the glia responsible for myelin synthesis, are lost in this disease. Because QKI was shown to play a role in oligodendrocyte differentiation (Larocque, Galarneau et al. 2005; Larocque and Richard 2005) and to transport the myelin basic protein (MBP) mRNA out of the nucleus during development (Larocque, Pilotte et al. 2002), QKI is a good candidate for involvement in the etiology of this disease. As in multiple sclerosis, patients with schizophrenia have reduced expression of oligodendrocyte and myelin genes, especially QKI, which suggests that QKI is a good candidate protein in schizophrenia (McInnes and Lauriat 2006).

4.5 Future directions

Scientists are always looking for the reasons why things occur or are the way they are. In molecular biology, one of the main questions researchers investigate is the function or role of a gene, an RNA molecule or a protein. Although the use of SELEX clarified some of the putative roles that STAR proteins play in the molecular biology of the cell, it raised many questions, which is good from a scientific point of view. These include all the new putative mRNA targets identified through bioinformatics: until their interaction with QKI is fully understood, they remain possibilities. One future direction for QKI research could be the confirmation and further characterization of every mRNA target, although this could be redundant and time-consuming.

Efforts should concentrate on mRNAs potentially involved in development and cell differentiation, processes known to involve QKI. From a structural point of view, it is well known that QKI dimerization is required for proper RNA binding. Exactly how this binding occurs is still unknown, and structure-function analyses are needed to determine precisely how the protein contacts the RNA. Changing the RNA specificity of a protein from one subfamily to another and testing it on bipartite motifs would complete the structure-function analysis initiated by Lehmann-Blount (Lehmann-Blount and Williamson 2005).

At the molecular level, the most promising unexplored field of research for QKI and other STAR RNA binding proteins is most likely alternative splicing. As shown in this thesis, STAR proteins bind specifically to derivatives of the branch point sequence and most probably compete with SF1 for binding to the BPS. Beyond competing for the SF1-BPS interaction, QKI could confer intronic sequence selection: because the half site is absolutely required for proper QKI binding, this competition would occur only where the proper half site is present. The bioinformatics search in my work was performed on known coding sequences; no search was done on the complete genome, which could identify intronic motifs involved in splice site selection.
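A genome-wide intronic search of the kind proposed here could begin with a filter like the Python sketch below. It is a sketch under stated assumptions, not the search performed in this work: the bipartite motif (core ACUAAY, a 1 to 20 nt spacer, then a UAAY half site), the default 50 nt window upstream of the 3' splice site in which a core is treated as a candidate branch point, and the function name `branchpoint_candidates` are all hypothetical choices for illustration.

```python
import re
from typing import Dict, List

# Hypothetical bipartite motif, assumed for illustration only:
# core ACUAAY (Y = C or U), a 1-20 nt spacer, then a UAAY half site.
# The lookahead allows overlapping cores to be reported.
MOTIF = re.compile(r"(?=(ACUAA[CU]).{1,20}?(UAA[CU]))")


def branchpoint_candidates(
    introns: Dict[str, str], window: int = 50
) -> Dict[str, List[int]]:
    """Return, for each intron with a hit, the distances (in nt) from each
    motif core to the intron's 3' end, keeping only cores whose distance
    is at most `window` (an assumed branch-point region)."""
    results: Dict[str, List[int]] = {}
    for name, seq in introns.items():
        # Accept DNA or RNA input written 5' to 3'.
        rna = seq.upper().replace("T", "U")
        distances = []
        for match in MOTIF.finditer(rna):
            core_start = match.start(1)
            # Nucleotides from the first base of the core to the 3' end, inclusive.
            distance = len(rna) - core_start
            if distance <= window:
                distances.append(distance)
        if distances:
            results[name] = distances
    return results
```

Given intron sequences written 5' to 3', the function keeps only introns in which a motif core with a downstream half site lies within the window near the 3' splice site. Real intron annotations, both strands, and a shuffled-sequence background would be needed before interpreting any hit.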

4.6 Concluding remark

Using SELEX technology and bioinformatics, my work led to the discovery of a refined RNA binding consensus for QKI and to the identification of the SLM-2 response element. Both Quaking and SLM-2/Sam68 associate with bipartite consensus RNA motifs. Because the quaking response element was better defined than the SLM-2 response element, I was able to identify some of the putative mRNA targets of Quaking and some of its new putative functions. The ultimate goal was to identify a function or a role for the STAR family of proteins. We are now a step closer to the answer.

5 REFERENCES FOR CHAPTER 1 AND CHAPTER 4

Aberg, K., Saetre, P., Jareborg, N. and Jazin, E. (2006). "Human QKI, a potential regulator of mRNA expression of human oligodendrocyte-related genes involved in schizophrenia." PNAS 103(19): 7482-7487.

Arning, S., Gruter, P., Bilbe, G. and Kramer, A. (1996). "Mammalian splicing factor SF1 is encoded by variant cDNAs and binds to RNA." RNA 2(8): 794-810.

Auweter, S. D., Oberstrass, F. C. and Allain, F. H. T. (2006). "Sequence-specific binding of single-stranded RNA: is there a code for recognition?" Nucl. Acids Res. 34(17): 4943-4959.

Baber, J. L., Libutti, D., Levens, D. and Tjandra, N. (1999). "High Precision Solution Structure of the C-terminal KH Domain of Heterogeneous Nuclear Ribonucleoprotein K, a c-myc Transcription Factor." Journal of Molecular Biology 289(4): 949-962.

Babic, I., Jakymiw, A. and Fujita, D. J. (2004). "The RNA binding protein Sam68 is acetylated in tumor cell lines, and its acetylation correlates with enhanced RNA binding activity." Oncogene 23(21): 3781-9.

Baehrecke, E. H. (1997). "who encodes a KH RNA binding protein that functions in muscle development." Development 124(7): 1323-1332.

Bagni, C. and Greenough, W. T. (2005). "From mRNP trafficking to spine dysmorphogenesis: the roots of fragile X syndrome." Nat Rev Neurosci 6(5): 376-87.

Bardoni, B. and Mandel, J.-L. (2002). "Advances in understanding of fragile X pathogenesis and FMRP function, and in identification of X linked mental retardation genes." Current Opinion in Genetics & Development 12(3): 284-293.

Bechara, E., Davidovic, L., Melko, M., Bensaid, M., Tremblay, S., Grosgeorge, J., Khandjian, E. W., Lalli, E. and Bardoni, B. (2006). "Fragile X related protein 1 isoforms differentially modulate the affinity of fragile X mental retardation protein for G-quartet RNA structure." Nucl. Acids Res.: gkl1021.

Bentley, D. L. (2005). "Rules of engagement: co-transcriptional recruitment of pre-mRNA processing factors." Current Opinion in Cell Biology 17(3): 251-256.

Berglund, J. A., Abovich, N. and Rosbash, M. (1998). "A cooperative interaction between U2AF65 and mBBP/SF1 facilitates branchpoint region recognition." Genes Dev. 12(6): 858-867.

Berglund, J. A., Chua, K., Abovich, N., Reed, R. and Rosbash, M. (1997). "The Splicing Factor BBP Interacts Specifically with the Pre-mRNA Branchpoint Sequence UACUAAC." Cell 89(5): 781-787.

Black, D. L. (2003). "Mechanisms of alternative pre-messenger RNA splicing." Annual Review of Biochemistry 72(1): 291-336.

Bohnsack, B. L., Lai, L., Northrop, J. L., Justice, M. J. and Hirschi, K. K. (2006). "Visceral endoderm function is regulated by quaking and required for vascular development." Genesis 44(2): 93-104.

Buckanovich, R. J. and Darnell, R. B. (1997). "The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo." Mol. Cell. Biol. 17(6): 3194-3201.

Buckanovich, R. J., Yang, Y. Y. and Darnell, R. B. (1996). "The onconeural antigen Nova-1 is a neuron-specific RNA-binding protein, the activity of which is inhibited by paraneoplastic antibodies." J. Neurosci. 16(3): 1114-1122.

Carson, J. H. and Barbarese, E. (2005). "Systems analysis of RNA trafficking in neural cells." Biol Cell 97(1): 51-62.

Carson, J. H., Cui, H. and Barbarese, E. (2001). "The balance of power in RNA trafficking." Current Opinion in Neurobiology 11(5): 558-563.

Carthew, R. W. (2002). "RNA interference: the fragile X syndrome connection." Curr Biol 12(24): R852-4.

Caudy, A. A., Myers, M., Hannon, G. J. and Hammond, S. M. (2002). "Fragile X-related protein and VIG associate with the RNA interference machinery." Genes Dev 16(19): 2491-6.

Chen, C.-Y., Gherzi, R., Ong, S.-E., Chan, E. L., Raijmakers, R., Pruijn, G. J. M., Stoecklin, G., Moroni, C., Mann, M. and Karin, M. (2001). "AU Binding Proteins Recruit the Exosome to Degrade ARE-Containing mRNAs." Cell 107(4): 451-464.

Chen, T., Cote, J., Carvajal, H. V. and Richard, S. (2001). "Identification of Sam68 arginine glycine-rich sequences capable of conferring nonspecific RNA binding to the GSG domain." J Biol Chem 276(33): 30803-11.

Chen, T., Damaj, B. B., Herrera, C., Lasko, P. and Richard, S. (1997). "Self-association of the single-KH-domain family members Sam68, GRP33, GLD-1, and Qk1: role of the KH domain." Mol Cell Biol 17(10): 5707-18.

Chen, T. and Richard, S. (1998). "Structure-Function Analysis of Qk1: a Lethal Point Mutation in Mouse quaking Prevents Homodimerization." Mol. Cell. Biol. 18(8): 4863-4871.

Chen, Y., Tian, D., Ku, L., Osterhout, D. J. and Feng, Y. (2007). "The Selective RNA-binding Protein Quaking I (QKI) Is Necessary and Sufficient for Promoting Oligodendroglia Differentiation." J. Biol. Chem. 282(32): 23553-23560.

Cohen, C. D., Doran, P. P., Blattner, S. M., Merkle, M., Wang, G. Q., Schmid, H., Mathieson, P. W., Saleem, M. A., Henger, A., Rastaldi, M. P. and Kretzler, M. (2005). "Sam68-Like Mammalian Protein 2, Identified by Digital Differential Display as Expressed by Podocytes, Is Induced in Proteinuria and Involved in Splice Site Selection of Vascular Endothelial Growth Factor." J Am Soc Nephrol 16(7): 1958-1965.

Coller, J. and Parker, R. (2004). "Eukaryotic mRNA decapping." Annual Review of Biochemistry 73(1): 861-890.

Cooper, G. (2000). The Cell - A Molecular Approach. 2nd ed. Sunderland (MA), Sinauer Associates, Inc.

Cote, J., Boisvert, F.-M., Boulanger, M.-C., Bedford, M. T. and Richard, S. (2003). "Sam68 RNA Binding Protein Is an In Vivo Substrate for Protein Arginine N-Methyltransferase 1." Mol. Biol. Cell 14(1): 274-287.


Cruz-Alvarez, M. and Pellicer, A. (1987). "Cloning of a full-length complementary DNA for an Artemia salina glycine-rich protein. Structural relationship with RNA binding proteins." J. Biol. Chem. 262(28): 13377-13380.

Cullen, B. R. (2006). "Is RNA interference involved in intrinsic antiviral immunity in mammals?" Nat Immunol 7(6): 563-7.

Darnell, J. C., Fraser, C. E., Mostovetsky, O., Stefani, G., Jones, T. A., Eddy, S. R. and Darnell, R. B. (2005). "Kissing complex RNAs mediate interaction between the Fragile-X mental retardation protein KH2 domain and brain polyribosomes." Genes Dev. 19(8): 903-918.

Darnell, J. C., Jensen, K. B., Jin, P., Brown, V., Warren, S. T. and Darnell, R. B. (2001). "Fragile X Mental Retardation Protein Targets G Quartet mRNAs Important for Neuronal Function." Cell 107(4): 489-499.

Derry, J. J., Richard, S., Valderrama Carvajal, H., Ye, X., Vasioukhin, V., Cochrane, A. W., Chen, T. and Tyner, A. L. (2000). "Sik (BRK) Phosphorylates Sam68 in the Nucleus and Negatively Regulates Its RNA Binding Ability." Mol. Cell. Biol. 20(16): 6114-6126.

Di Fruscio, M., Chen, T., Bonyadi, S., Lasko, P. and Richard, S. (1998). "The identification of two Drosophila K homology domain proteins. Kep1 and SAM are members of the Sam68 family of GSG domain proteins." J. Biol. Chem. 273(46): 30122-30130.

Di Fruscio, M., Chen, T. and Richard, S. (1999). "Characterization of Sam68-like mammalian proteins SLM-1 and SLM-2: SLM-1 is a Src substrate during mitosis." PNAS 96(6): 2710-2715.

Di Fruscio, M., Styhler, S., Wikholm, E., Boulanger, M.-C., Lasko, P. and Richard, S. (2003). "kep1 interacts genetically with dredd/Caspase-8, and kep1 mutants alter the balance of dredd isoforms." PNAS 100(4): 1814-1819.

Doyle, M. and Jantsch, M. F. (2002). "New and old roles of the double-stranded RNA-binding domain." Journal of Structural Biology 140(1-3): 147-153.

Ebersole, T., Rho, O. and Artzt, K. (1992). "The Proximal End of Mouse Chromosome 17: New Molecular Markers Identify a Deletion Associated With quaking(viable)." Genetics 131(1): 183-190.

Ebersole, T. A., Chen, Q., Justice, M. J. and Artzt, K. (1996). "The quaking gene product necessary in embryogenesis and myelination combines features of RNA binding and signal transduction proteins." Nat Genet 12(3): 260-265.

Edenfeld, G., Volohonsky, G., Krukkert, K., Naffin, E., Lammel, U., Grimm, A., Engelen, D., Reuveny, A., Volk, T. and Klambt, C. (2006). "The Splicing Factor Crooked Neck Associates with the RNA-Binding Protein HOW to Control Glial Cell Maturation in Drosophila." Neuron 52(6): 969-980.

Ellington, A. D. and Szostak, J. W. (1990). "In vitro selection of RNA molecules that bind specific ligands." Nature 346(6287): 818-822.

Ellis, C., Moran, M., McCormick, F. and Pawson, T. (1990). "Phosphorylation of GAP and GAP-associated proteins by transforming and mitogenic tyrosine kinases." Nature 343(6256): 377-381.

Ephrussi, A., Dickinson, L. K. and Lehmann, R. (1991). "oskar organizes the germ plasm and directs localization of the posterior determinant nanos." Cell 66(1): 37-50.

Feuillet, V., Semichon, M., Restouin, A., Harriague, J., Janzen, J., Magee, A., Collette, Y. and Bismuth, G. (2002). "The distinct capacity of Fyn and Lck to phosphorylate Sam68 in T cells is essentially governed by SH3/SH2-catalytic domain linker interactions." Oncogene 21(47): 7205-13.

Filipowicz, W. (2005). "RNAi: the nuts and bolts of the RISC machine." Cell 122(1): 17-20.

Forrest, K. M. and Gavis, E. R. (2003). "Live Imaging of Endogenous RNA Reveals a Diffusion and Entrapment Mechanism for nanos mRNA Localization in Drosophila." Current Biology 13(14): 1159-1168.

Francis, R., Barton, M. K., Kimble, J. and Schedl, T. (1995). "gld-1, a Tumor Suppressor Gene Required for Oocyte Development in Caenorhabditis elegans." Genetics 139(2): 579-606.

Fuller-Pace, F. V. (2006). "DExD/H box RNA helicases: multifunctional proteins with important roles in transcriptional regulation." Nucl. Acids Res. 34(15): 4206-4215.

Fumagalli, S., Totty, N. F., Hsuan, J. J. and Courtneidge, S. A. (1994). "A target for Src in mitosis." Nature 368(6474): 871-874.

Fyrberg, C., Becker, J., Barthmaier, P., Mahaffey, J. and Fyrberg, E. (1997). "A Drosophila muscle-specific gene related to the mouse quaking locus." Gene 197(1-2): 315-23.

Fyrberg, C., Becker, J., Barthmaier, P., Mahaffey, J. and Fyrberg, E. (1998). "A family of Drosophila genes encoding quaking-related maxi-KH domains." Biochem Genet 36(1-2): 51-64.

Garcia-Blanco, M. A., Baraniak, A. P. and Lasda, E. L. (2004). "Alternative splicing in disease and therapy." Nat Biotech 22(5): 535-546.

Gartel, A. L. and Kandel, E. S. (2006). "RNA interference in cancer." Biomol Eng 23(1): 17-34.

Gebauer, F. and Hentze, M. W. (2004). "Molecular mechanisms of translational control." Nature Reviews Molecular Cell Biology 5(10): 827-835.

Goodwin, E. B., Okkema, P. G., Evans, T. C. and Kimble, J. (1993). "Translational regulation of tra-2 by its 3' untranslated region controls sexual identity in C. elegans." Cell 75(2): 329-339.


Gopinath, S. C. (2007). "Methods developed for SELEX." Anal Bioanal Chem 387(1): 171-82.

Granadino, B., Penalva, L. O. F., Green, M. R., Valcarcel, J. and Sanchez, L. (1997). "Distinct mechanisms of splicing regulation in vivo by the Drosophila protein Sex-lethal." PNAS 94(14): 7343-7348.

Gu, M. and Lima, C. D. (2005). "Processing the message: structural insights into capping and decapping mRNA." Current Opinion in Structural Biology 15(1): 99-106.

Gueydan, C., Droogmans, L., Chalon, P., Huez, G., Caput, D. and Kruys, V. (1999). "Identification of TIAR as a protein binding to the translational regulatory AU-rich element of tumor necrosis factor alpha mRNA." J Biol Chem 274(4): 2322-6.

Haegebarth, A., Heap, D., Bie, W., Derry, J. J., Richard, S. and Tyner, A. L. (2004). "The Nuclear Tyrosine Kinase BRK/Sik Phosphorylates and Inhibits the RNA-binding Activities of the Sam68-like Mammalian Proteins SLM-1 and SLM-2." J. Biol. Chem. 279(52): 54398-54404.

Hansen, D., Schedl, T. and Gerald, P. S. (2006). The Regulatory Network Controlling the Proliferation-Meiotic Entry Decision in the Caenorhabditis elegans Germ Line. Current Topics in Developmental Biology, Academic Press. Volume 76:185-215.

Hardy, R. J. (1998). "Molecular defects in the dysmyelinating mutant quaking." Journal of Neuroscience Research 51(4): 417-422.

Hardy, R. J., Loushin, C. L., Friedrich Jr, V. L., Chen, Q., Ebersole, T. A., Lazzarini, R. A. and Artzt, K. (1996). "Neural Cell Type-Specific Expression of QKI Proteins Is Altered in quaking viable Mutant Mice." J. Neurosci. 16(24): 7941-7949.

Hoek, K. S., Kidd, G. J., Carson, J. H. and Smith, R. (1998). "hnRNP A2 Selectively Binds the Cytoplasmic Transport Sequence of Myelin Basic Protein mRNA." Biochemistry 37(19): 7021-7029.

Hu, A. and Fu, X.-D. (2007). "Splicing oncogenes." Nat Struct Mol Biol 14(3): 174-175.

Ishidate, T., Yoshihara, S., Kawasaki, Y., Roy, B. C., Toyoshima, K. and Akiyama, T. (1997). "Identification of a novel nuclear localization signal in Sam68." FEBS Letters 409(2): 237-241.

Itoh, M., Haga, I., Li, Q.-H. and Fujisawa, J.-i. (2002). "Identification of cellular mRNA targets for RNA-binding protein Sam68." Nucl. Acids Res. 30(24): 5452-5464.

Jan, E., Motzny, C. K., Graves, L. E. and Goodwin, E. B. (1999). "The STAR protein, GLD-1, is a translational regulator of sexual identity in Caenorhabditis elegans." Embo J 18(1): 258-69.

Jan, E., Yoon, J. W., Walterhouse, D., Iannaccone, P. and Goodwin, E. B. (1997). "Conservation of the C. elegans tra-2 3'UTR translational control." Embo J 16(20): 6301-13.

Jones, A. R. and Schedl, T. (1995). "Mutations in gld-1, a female germ cell-specific tumor suppressor gene in Caenorhabditis elegans, affect a conserved domain also found in Src-associated protein Sam68." Genes Dev 9(12): 1491-504.

Kapp, L. D. and Lorsch, J. R. (2004). "The molecular mechanics of eukaryotic translation." Annu Rev Biochem 73: 657-704.

Karim, M. M., Svitkin, Y. V., Kahvejian, A., De Crescenzo, G., Costa-Mattioli, M. and Sonenberg, N. (2006). "A mechanism of translational repression by competition of Paip2 with eIF4G for poly(A) binding protein (PABP) binding." Proc Natl Acad Sci U S A 103(25): 9494-9.

Kim, Y. K., Furic, L., DesGroseillers, L. and Maquat, L. E. (2005). "Mammalian Staufen1 Recruits Upf1 to Specific mRNA 3'UTRs so as to Elicit mRNA Decay." Cell 120(2): 195-208.

Kindler, S., Wang, H., Richter, D. and Tiedge, H. (2005). "RNA transport and local control of translation." Annual Review of Cell and Developmental Biology 21(1): 223-245.

Kondo, T., Furuta, T., Mitsunaga, K., Ebersole, T. A., Shichiri, M., Wu, J., Artzt, K., Yamamura, K. and Abe, K. (1999). "Genomic organization and expression analysis of the mouse qkI locus." Mammalian Genome 10(7): 662-669.

Kramer, A. (1992). "Purification of splicing factor SF1, a heat-stable protein that functions in the assembly of a presplicing complex." Mol. Cell. Biol. 12(10): 4545-4552.

Lakiza, O., Frater, L., Yoo, Y., Villavicencio, E., Walterhouse, D., Goodwin, E. B. and Iannaccone, P. (2005). "STAR proteins quaking-6 and GLD-1 regulate translation of the homologues GLI1 and tra-1 through a conserved RNA 3'UTR-based mechanism." Developmental Biology 287(1): 98-110.

Lang, V., Mege, D., Semichon, M., Gary-Gouy, H. and Bismuth, G. (1997). "A dual participation of ZAP-70 and src protein tyrosine kinases is required for TCR-induced tyrosine phosphorylation of Sam68 in Jurkat T cells." Eur J Immunol 27(12): 3360-7.

Larocque, D., Galarneau, A., Liu, H.-N., Scott, M., Almazan, G. and Richard, S. (2005). "Protection of p27Kip1 mRNA by quaking RNA binding proteins promotes oligodendrocyte differentiation." Nat Neurosci 8(1): 27-33.

Larocque, D., Pilotte, J., Chen, T., Cloutier, F., Massie, B., Pedraza, L., Couture, R., Lasko, P., Almazan, G. and Richard, S. (2002). "Nuclear Retention of MBP mRNAs in the Quaking Viable Mice." Neuron 36(5): 815-829.

Larocque, D. and Richard, S. (2005). "QUAKING KH domain proteins as regulators of glial cell fate and myelination." RNA Biol 2(2): 37-40.

Lee, M.-H. and Schedl, T. (2001). "Identification of in vivo mRNA targets of GLD-1, a maxi-KH motif containing protein required for C. elegans germ cell development." Genes Dev. 15(18): 2408-2420.

Lee, M.-H. and Schedl, T. (2004). "Translation repression by GLD-1 protects its mRNA targets from nonsense-mediated mRNA decay in C. elegans." Genes Dev. 18(9): 1047-1059.

Lehmann-Blount, K. A. and Williamson, J. R. (2005). "Shape-specific Nucleotide Binding of Single-stranded RNA by the GLD-1 STAR Domain." Journal of Molecular Biology 346(1): 91-104.

Li, Z., Zhang, Y., Li, D. and Feng, Y. (2000). "Destabilization and Mislocalization of Myelin Basic Protein mRNAs in quaking Dysmyelination Lacking the QKI RNA-Binding Proteins." J. Neurosci. 20(13): 4944-4953.

Li, Z. Z., Kondo, T., Murata, T., Ebersole, T. A., Nishi, T., Tada, K., Ushio, Y., Yamamura, K. and Abe, K. (2002). "Expression of Hqk encoding a KH RNA binding protein is altered in human glioma." Jpn J Cancer Res 93(2): 167-77.

Lin, Q., Taylor, S. J. and Shalloway, D. (1997). "Specificity and Determinants of Sam68 RNA Binding. Implications for the Biological Function of K Homology Domains." J. Biol. Chem. 272(43): 27274-27280.

Lingel, A., Simon, B., Izaurralde, E. and Sattler, M. (2003). "Structure and nucleic-acid binding of the Drosophila Argonaute 2 PAZ domain." Nature 426(6965): 465-9.

Liu, K., Li, L., Nisson, P. E., Gruber, C., Jessee, J. and Cohen, S. N. (2000). "Neoplastic Transformation and Tumorigenesis Associated with Sam68 Protein Deficiency in Cultured Murine Fibroblasts." J. Biol. Chem. 275(51): 40195-40201.

Liu, Z., Luyten, I., Bottomley, M. J., Messias, A. C., Houngninou-Molango, S., Sprangers, R., Zanier, K., Kramer, A. and Sattler, M. (2001). "Structural Basis for Recognition of the Intron Branch Site RNA by Splicing Factor 1." Science 294(5544): 1098-1102.

Lock, P., Fumagalli, S., Polakis, P., McCormick, F. and Courtneidge, S. A. (1996). "The Human p62 cDNA Encodes Sam68 and Not the RasGAP-Associated p62 Protein." Cell 84(1): 23-24.

Lopez, A. J. (1998). "Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation." Annual Review of Genetics 32(1): 279-305.

Lu, Z., Zhang, Y., Ku, L., Wang, H., Ahmadian, A. and Feng, Y. (2003). "The quaking viable mutation affects qkI mRNA expression specifically in myelin-producing cells of the nervous system." Nucl. Acids Res. 31(15): 4616-4624.

Lukong, K. E., Larocque, D., Tyner, A. L. and Richard, S. (2005). "Tyrosine Phosphorylation of Sam68 by Breast Tumor Kinase Regulates Intranuclear Localization and Cell Cycle Progression." J. Biol. Chem. 280(46): 38639-38647.

Lukong, K. E. and Richard, S. (2003). "Sam68, the KH domain-containing superSTAR." Biochimica et Biophysica Acta (BBA) - Reviews on Cancer 1653(2): 73-86.

Macchi, P., Brownawell, A. M., Grunewald, B., DesGroseillers, L., Macara, I. G. and Kiebler, M. A. (2004). "The brain-specific double-stranded RNA-binding protein Staufen2: nucleolar accumulation and isoform-specific exportin-5-dependent export." J Biol Chem 279(30): 31440-4.

Macias, M. J., Wiesner, S. and Sudol, M. (2002). "WW and SH3 domains, two different scaffolds to recognize proline-rich ligands." FEBS Letters 513(1): 30-37.

Maguire, M. L., Guler-Gane, G., Nietlispach, D., Raine, A. R. C., Zorn, A. M., Standart, N. and Broadhurst, R. W. (2005). "Solution Structure and Backbone Dynamics of the KH-QUA2 Region of the Xenopus STAR/GSG Quaking Protein." Journal of Molecular Biology 348(2): 265-279.

Malter, J. S. (1989). "Identification of an AUUUA-specific messenger RNA binding protein." Science 246(4930): 664-6.

Maquat, L. E. (2005). "Nonsense-mediated mRNA decay in mammals." J Cell Sci 118(9): 1773-1776.

Marin, V. A. and Evans, T. C. (2003). "Translational repression of a C. elegans Notch mRNA by the STAR/KH domain protein GLD-1." Development 130(12): 2623-2632.

Martel, C., Macchi, P., Furic, L., Kiebler, M. A. and Desgroseillers, L. (2006). "Staufen1 is imported into the nucleolus via a bipartite nuclear localization signal and several modulatory determinants." Biochem J 393(Pt 1): 245-54.

Matlin, A. J., Clark, F. and Smith, C. W. J. (2005). "Understanding alternative splicing: towards a cellular code." Nature Reviews Molecular Cell Biology 6(5): 386-398.

Matter, N., Herrlich, P. and Konig, H. (2002). "Signal-dependent regulation of splicing via phosphorylation of Sam68." Nature 420(6916): 691-695.

McBride, A. E. and Silver, P. A. (2001). "State of the Arg: Protein Methylation at Arginine Comes of Age." Cell 106(1): 5-8.

McInnes, L. A. and Lauriat, T. L. (2006). "RNA metabolism and dysmyelination in schizophrenia." Neuroscience & Biobehavioral Reviews 30(4): 551-561.

Meng, Z., King, P. H., Nabors, L. B., Jackson, N. L., Chen, C.-Y., Emanuel, P. D. and Blume, S. W. (2005). "The ELAV RNA-stability factor HuR binds the 5'-untranslated region of the human IGF-IR transcript and differentially represses cap-dependent and IRES-mediated translation." Nucl. Acids Res. 33(9): 2962-2979.

Merrill, J. E. and Scolding, N. J. (1999). "Mechanisms of damage to myelin and oligodendrocytes and their relevance to disease." Neuropathology and Applied Neurobiology 25(6): 435-458.

Miki, T., Takano, K. and Yoneda, Y. (2005). "The role of mammalian Staufen on mRNA traffic: a view from its nucleocytoplasmic shuttling function." Cell Struct Funct 30(2): 51-6.

Mitchell, P. and Tollervey, D. (2000). "mRNA stability in eukaryotes." Current Opinion in Genetics & Development 10(2): 193-198.

Miyashiro, K. Y., Beckel-Mitchener, A., Purk, T. P., Becker, K. G., Barret, T., Liu, L., Carbonetto, S., Weiler, I. J., Greenough, W. T. and Eberwine, J. (2003). "RNA Cargoes Associating with FMRP Reveal Deficits in Cellular Functioning in Fmr1 Null Mice." Neuron 37(3): 417-431.

Moore, M. J. (2005). "From Birth to Death: The Complex Lives of Eukaryotic mRNAs." Science 309(5740): 1514-1518.

Mootz, D., Ho, D. M. and Hunter, C. P. (2004). "The STAR/Maxi-KH domain protein GLD-1 mediates a developmental switch in the translational control of C. elegans PAL-1." Development 131(14): 3263-3272.

Myer, V. E., Fan, X. C. and Steitz, J. A. (1997). "Identification of HuR as a protein implicated in AUUUA-mediated mRNA decay." Embo J 16(8): 2130-9.

Nabel-Rosen, H., Dorevitch, N., Reuveny, A. and Volk, T. (1999). "The Balance between Two Isoforms of the Drosophila RNA-Binding Protein How Controls Tendon Cell Differentiation." Molecular Cell 4(4): 573-584.

Nabel-Rosen, H., Toledano-Katchalski, H., Volohonsky, G. and Volk, T. (2005). "Cell Divisions in the Drosophila Embryonic Mesoderm Are Repressed via Posttranscriptional Regulation of string/cdc25 by HOW." Current Biology 15(4): 295-302.

Nabel-Rosen, H., Volohonsky, G., Reuveny, A., Zaidel-Bar, R. and Volk, T. (2002). "Two Isoforms of the Drosophila RNA Binding Protein, How, Act in Opposing Directions to Regulate Tendon Cell Differentiation." Developmental Cell 2(2): 183-193.

Noveroske, J. K., Lai, L., Gaussin, V., Northrop, J. L., Nakamura, H., Hirschi, K. K. and Justice, M. J. (2002). "Quaking is essential for blood vessel development." Genesis 32(3): 218-30.

Peled-Zehavi, H., Berglund, J. A., Rosbash, M. and Frankel, A. D. (2001). "Recognition of RNA Branch Point Sequences by the KH Domain of Splicing Factor 1 (Mammalian Branch Point Binding Protein) in a Splicing Factor Complex." Mol. Cell. Biol. 21(15): 5232-5241.

Pilotte, J., Larocque, D. and Richard, S. (2001). "Nuclear translocation controlled by alternatively spliced isoforms inactivates the QUAKING apoptotic inducer." Genes Dev. 15(7): 845-858.

Ramos, A., Grunert, S., Adams, J., Micklem, D. R., Proctor, M. R., Freund, S., Bycroft, M., St Johnston, D. and Varani, G. (2000). "RNA recognition by a Staufen double-stranded RNA-binding domain." Embo J 19(5): 997-1009.

Ramos, A., Hollingworth, D. and Pastore, A. (2003). "G-quartet-dependent recognition between the FMRP RGG box and RNA." RNA 9(10): 1198-1207.

Reddy, T. R., Suhasini, M., Xu, W., Yeh, L.-Y., Yang, J.-P., Wu, J., Artzt, K. and Wong-Staal, F. (2002). "A Role for KH Domain Proteins (Sam68-like Mammalian Proteins and Quaking Proteins) in the Post-transcriptional Regulation of HIV Replication." J. Biol. Chem. 277(8): 5778-5784.

Richard, S., Torabi, N., Franco, G. V., Tremblay, G. A., Chen, T., Vogel, G., Morel, M., Cleroux, P., Forget-Richard, A., Komarova, S., Tremblay, M. L., Li, W., Li, A., Gao, Y. J. and Henderson, J. E. (2005). "Ablation of the Sam68 RNA Binding Protein Protects Mice from Age-Related Bone Loss." PLoS Genetics 1(6): e74.

Richard, S., Yu, D., Blumer, K. J., Hausladen, D., Olszowy, M. W., Connelly, P. A. and Shaw, A. S. (1995). "Association of p62, a multifunctional SH2- and SH3-domain-binding protein, with src family tyrosine kinases, Grb2, and phospholipase C gamma-1." Mol. Cell. Biol. 15(1): 186-197.

Robard, C., Daviau, A. and Di Fruscio, M. (2006). "Phosphorylation status of the Kep1 protein alters its affinity for its protein binding partner alternative splicing factor ASF/SF2." Biochem J 400(1): 91-97.

Rodriguez, M. S., Dargemont, C. and Stutz, F. (2004). "Nuclear export of RNA." Biology of the Cell 96(8): 639-655.

Ryder, S. P., Frater, L. A., Abramovitz, D. L., Goodwin, E. B. and Williamson, J. R. (2004). "RNA target specificity of the STAR/GSG domain post-transcriptional regulatory protein GLD-1." Nat Struct Mol Biol 11(1): 20-28.

Ryder, S. P. and Williamson, J. R. (2004). "Specificity of the STAR/GSG domain protein Qk1: Implications for the regulation of myelination." RNA 10(9): 1449-1458.

Saccomanno, L., Loushin, C., Jan, E., Punkay, E., Artzt, K. and Goodwin, E. B. (1999). "The STAR protein QKI-6 is a translational repressor." PNAS 96(22): 12605-12610.

Schaeffer, C., Bardoni, B., Mandel, J. L., Ehresmann, B., Ehresmann, C. and Moine, H. (2001). "The fragile X mental retardation protein binds specifically to its mRNA via a purine quartet motif." Embo J 20(17): 4803-13.

Schaeffer, C., Beaulande, M., Ehresmann, C., Ehresmann, B. and Moine, H. (2003). "The RNA binding protein FMRP: new connections and missing links." Biol Cell 95(3-4): 221-8.

Schumacher, B., Hanazawa, M., Lee, M.-H., Nayak, S., Volkmann, K., Hofmann, R., Hengartner, M., Schedl, T. and Gartner, A. (2005). "Translational Repression of C. elegans p53 by GLD-1 Regulates DNA Damage-Induced Apoptosis." Cell 120(3): 357-368.

Schwarz, D. S. and Zamore, P. D. (2002). "Why do miRNAs live in the miRNP?" Genes Dev. 16(9): 1025-1031.

Shan, J., Moran-Jones, K., Munro, T. P., Kidd, G. J., Winzor, D. J., Hoek, K. S. and Smith, R. (2000). "Binding of an RNA Trafficking Response Element to Heterogeneous Nuclear Ribonucleoproteins A1 and A2." J. Biol. Chem. 275(49): 38286-38295.

Sidman, R. L., Dickie, M. M. and Appel, S. H. (1964). "Mutant Mice (Quaking and Jimpy) with Deficient Myelination in the Central Nervous System." Science 144: 309-11.

Singh, R. and Valcarcel, J. (2005). "Building specificity with nonspecific RNA-binding proteins." Nat Struct Mol Biol 12(8): 645-653.

Siomi, H., Matunis, M. J., Michael, W. M. and Dreyfuss, G. (1993). "The pre-mRNA binding K protein contains a novel evolutionary conserved motif." Nucl. Acids Res. 21(5): 1193-1198.

Sommer, P. and Nehrbass, U. (2005). "Quality control of messenger ribonucleoprotein particles in the nucleus and at the pore." Current Opinion in Cell Biology 17(3): 294-301.

St Johnston, D., Beuchle, D. and Nusslein-Volhard, C. (1991). "staufen, a gene required to localize maternal RNAs in the Drosophila egg." Cell 66(1): 51-63.

Stoss, O., Novoyatleva, T., Gencheva, M., Olbrich, M., Benderska, N. and Stamm, S. (2004). "p59fyn-mediated phosphorylation regulates the activity of the tissue-specific splicing factor rSLM-1." Molecular and Cellular Neuroscience 27(1): 8-21.

Stoss, O., Olbrich, M., Hartmann, A. M., Konig, H., Memmott, J., Andreadis, A. and Stamm, S. (2001). "The STAR/GSG Family Protein rSLM-2 Regulates the Selection of Alternative Splice Sites." J. Biol. Chem. 276(12): 8665-8673.

Suntharalingam, M. and Wente, S. R. (2003). "Peering through the Pore: Nuclear Pore Complex Structure, Assembly, and Function." Developmental Cell 4(6): 775-789.

Taylor, S., Resnick, R. and Shalloway, D. (2004). "Sam68 exerts separable effects on cell cycle progression and apoptosis." BMC Cell Biology 5(1): 5.

Tong, A. W. (2006). "Small RNAs and non-small cell lung cancer." Curr Mol Med 6(3): 339-49.

Tremblay, G. A. and Richard, S. (2006). "mRNAs associated with the Sam68 RNA binding protein." RNA Biol 3(2): 1-4.

Tuerk, C. and Gold, L. (1990). "Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase." Science 249(4968): 505-510.

Ule, J., Stefani, G., Mele, A., Ruggiu, M., Wang, X., Taneri, B., Gaasterland, T., Blencowe, B. J. and Darnell, R. B. (2006). "An RNA map predicting Nova- dependent splicing regulation." Nature 444(7119): 580-586.

Vernet, C. and Artzt, K. (1997). "STAR, a gene family involved in signal transduction and activation of RNA." Trends in Genetics 13(12): 479-484.

Wang, C., Dickinson, L. K. and Lehmann, R. (1994). "Genetics of nanos localization in Drosophila." Dev Dyn 199(2): 103-15.

Wang, L. L., Richard, S. and Shaw, A. S. (1995). "p62 Association with RNA Is Regulated by Tyrosine Phosphorylation." J. Biol. Chem. 270(5): 2010-2013.

Wang, Q., Khillan, J., Gadue, P. and Nishikura, K. (2000). "Requirement of the RNA Editing Deaminase ADAR1 Gene for Embryonic Erythropoiesis." Science 290(5497): 1765-1768.

Wilkie, G. S., Dickson, K. S. and Gray, N. K. (2003). "Regulation of mRNA translation by 5'- and 3'-UTR-binding factors." Trends in Biochemical Sciences 28(4): 182-188.

Wong, G., Muller, O., Clark, R., Conroy, L., Moran, M. F., Polakis, P. and McCormick, F. (1992). "Molecular cloning and nucleic acid binding properties of the GAP-associated tyrosine phosphoprotein p62." Cell 69(3): 551-558.

Wu, J., Zhou, L., Tonissen, K., Tee, R. and Artzt, K. (1999). "The Quaking I-5 Protein (QKI-5) Has a Novel Nuclear Localization Signal and Shuttles between the Nucleus and the Cytoplasm." J. Biol. Chem. 274(41): 29202-29210.

Wu, J. I., Reed, R. B., Grabowski, P. J. and Artzt, K. (2002). "Function of quaking in myelination: Regulation of alternative splicing." PNAS 99(7): 4233-4238.

Yan, K. S., Yan, S., Farooq, A., Han, A., Zeng, L. and Zhou, M. M. (2003). "Structure and conserved RNA binding of the PAZ domain." Nature 426(6965): 468-74.

Zaffran, S., Astier, M., Gratecos, D. and Semeriva, M. (1997). "The held out wings (how) Drosophila gene encodes a putative RNA-binding protein involved in the control of muscular and cardiac activity." Development 124(10): 2087-2098.

Zamore, P. D. and Haley, B. (2005). "Ribo-gnome: the big world of small RNAs." Science 309(5740): 1519-24.

Zhang, T., Kruys, V., Huez, G. and Gueydan, C. (2002). "AU-rich element-mediated translational control: complexity and multiple activities of trans-activating factors." Biochem Soc Trans 30(Pt 6): 952-8.

Zhang, W., Wagner, B. J., Ehrenman, K., Schaefer, A. W., DeMaria, C. T., Crater, D., DeHaven, K., Long, L. and Brewer, G. (1993). "Purification, characterization, and cDNA cloning of an AU-rich element RNA-binding protein, AUF1." Mol Cell Biol 13(12): 7652-65.

Zhang, Y., Lu, Z., Ku, L., Chen, Y., Wang, H. and Feng, Y. (2003). "Tyrosine phosphorylation of QKI mediates developmental signals to regulate mRNA metabolism." Embo J 22(8): 1801-10.

Zhao, L., Ku, L., Chen, Y., Xia, M., LoPresti, P. and Feng, Y. (2006). "QKI Binds MAP1B mRNA and Enhances MAP1B Expression during Oligodendrocyte Development." Mol. Biol. Cell 17(10): 4179-4186.

Zhao, L., Tian, D., Xia, M., Macklin, W. B. and Feng, Y. (2006). "Rescuing qkv Dysmyelination by a Single Isoform of the Selective RNA-Binding Protein QKI." J. Neurosci. 26(44): 11278-11286.

Zheng, Z. M., Tang, S. and Tao, M. (2005). "Development of resistance to RNAi in mammalian cells." Ann N Y Acad Sci 1058: 105-18.

Zhong, J., Peters, A. H. F. M., Lee, K. and Braun, R. E. (1999). "A double-stranded RNA binding protein required for activation of repressed messages in mammalian germ cells." Nat Genet 22(2): 171-174.