Global Analysis of Gene Expression in the

Developing Brain of Gtf2ird1-/- Mice

by

Jennifer Anne O’Leary

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Molecular Genetics
University of Toronto

© Copyright by Jennifer Anne O’Leary (2011)

Global Analysis of Gene Expression in the Developing Brain of

Gtf2ird1-/- Mice

Jennifer Anne O’Leary

Doctor of Philosophy

Department of Molecular Genetics University of Toronto

2011

Abstract

Williams-Beuren Syndrome (WBS) is an autosomal dominant neurodevelopmental disorder caused by hemizygous deletion of a 1.5 Mb region on 7q11.23. Symptoms are numerous and include behavioural and cognitive components. One of the deleted genes, GTF2IRD1, a putative transcription factor, has been implicated in the neurological features of WBS by studying patients with atypical deletions of 7q11.23. Gtf2ird1-targeted mice have features consistent with the WBS phenotype, namely reduced innate fear and increased sociability. To identify neural targets of GTF2IRD1, microarray analyses were performed comparing gene expression in whole brains of Gtf2ird1-/- and wildtype (WT) mice at embryonic day 15.5 and at birth. Overall, the changes in gene expression in the mutant mice were not striking, with most falling in the range of 0.3 to 2 fold. qRT-PCR was used to verify the expression levels of candidate genes, and examination of verified genes revealed that most were located on chromosome 5, within 50 Mb of Gtf2ird1. Expression of these candidate genes in Gtf2ird1-/- mice was found to be the same as in WT 129S1/SvImJ mice, indicating that the differences were the result of flanking chromosomal material from the 129-derived R1 ES cells from which the Gtf2ird1-/- mice were generated, and that the expression differences were unrelated to Gtf2ird1 dosage. Further analysis found that while many genes showed decreased expression using primers targeting the 3’ UTR, expression of upstream exons was not affected. Transcripts using alternative polyadenylation sites were identified using 3’ RACE, and qRT-PCR showed that expression of different 3’ UTR isoforms can occur in a strain specific manner. Expression analysis of previously identified GTF2IRD1 targets also failed to demonstrate an in vivo effect. In summary, I was unable to find any in vivo neuronal targets of this putative transcription factor, despite its robust expression in the developing rodent brain.


Acknowledgements

The work that I completed over the past seven years would not have been possible without the help and support of many people. First, I must thank my supervisor, Dr. Lucy Osborne, for her guidance and support. It has been a pleasure working in her lab, and her ability to put a positive spin on my negative results kept me from getting too depressed as the list of genes that Gtf2ird1 does not regulate continued to grow. I would also like to thank my supervisory committee members, Dr. Sabine Cordes and Dr. Timothy Hughes, for their helpful insights, suggestions and technical assistance.

All members of the Osborne lab, past and present, have made the lab a great environment to work in. I will greatly miss the countless hours of “scientific” discussion, cookie days, and their company during “coffee time”. In particular, I owe a big thank you to Ted Young for teaching me how to be a good scientist. Although his Lil John impressions drove me crazy, he always came through with helpful advice when it was most needed.

Finally, none of this would have been possible without the continued support of my family, especially my parents. I was the first of their four children to enter university, and the last one to leave. Their unconditional love and encouragement have undoubtedly made it possible for me to be where I am today.


Table of Contents

Abstract
Acknowledgements
Table of Contents
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

Chapter I: Introduction
1.1 Williams-Beuren syndrome
    1.1.1 History of Williams-Beuren syndrome
    1.1.2 Williams-Beuren syndrome clinical phenotype
    1.1.3 The Williams-Beuren syndrome cognitive phenotype
    1.1.4 The Williams-Beuren syndrome behavioural phenotype
1.2 The genetic basis of Williams-Beuren syndrome
    1.2.1 Identification of a microdeletion at 7q11.23
    1.2.2 Genomic rearrangements at 7q11.23
    1.2.3 Atypical deletions in the Williams-Beuren syndrome region
1.3 General Transcription Factor 2-I (GTF2-I) gene family
    1.3.1 General Transcription Factor 2-I (TFII-I)
    1.3.2 General Transcription Factor 2-I Repeat Domain containing 1 (TFII-IRD1)
    1.3.3 General Transcription Factor 2-I Repeat Domain containing 2 (TFII-IRD2)
1.4 The Gtf2ird1 mouse model
    1.4.1 Generation of the mouse model
    1.4.2 Behavioural phenotypic analysis
    1.4.3 Biochemical and electrophysiological phenotypic analysis
1.5 Research Aims and Hypothesis

Chapter II: TFII-IRD1 may not function as a transcription factor in the developing mouse brain
2.1 Abstract
2.2 Literature Review
    2.2.1 Evidence supporting the role of TFII-IRD1 as a transcription factor
    2.2.2 Cellular localization of TFII-IRD1
2.3 Material and Methods
    2.3.1 Generation of probes for in situ hybridization
    2.3.2 Whole mount in situ hybridization of Gtf2ird1-/- embryos
    2.3.3 In situ hybridization of P0 mouse brain sections
    2.3.4 Preparation and culture of mouse embryonic fibroblast (MEF) cells
    2.3.5 Dissection of mouse tissues and RNA isolation
    2.3.6 Genotyping of P0 and embryonic mice
    2.3.7 Microarray analysis using the Affymetrix mouse 430 2.0 gene chip
    2.3.8 Microarray analysis using the Illumina mouseWG-6 v2.0 BeadChip
    2.3.9 Expression analysis using quantitative Real-Time PCR
    2.3.10 siRNA knockdown of Gtf2ird1 in neuronal cell lines
    2.3.11 Cellular localization of Gtf2ird1 in Neuro2a cells
    2.3.12 Expression analysis using western blots
2.4 Results
    2.4.1 Gtf2ird1 is expressed in the developing mouse brain
    2.4.2 Expression of candidate target genes Hoxc8 and Gsc are not altered in E11.5 Gtf2ird1-/- mouse embryos
    2.4.3 Expression of TFII-IRD1 candidate target genes identified in vitro are not altered in vivo
    2.4.4 Global expression analysis of P0 mouse whole brain
    2.4.5 Global expression analysis of E15.5 embryo heads
    2.4.6 Validation of candidate gene expression using qRT-PCR
    2.4.7 Knockdown of Gtf2ird1 in neuronal cell lines does not affect expression of candidate genes
    2.4.8 Altered gene expression in Gtf2ird1-/- mice is the result of differences in genetic background
    2.4.9 TFII-IRD1 is found in the cytoplasm of Neuro2a cells
2.5 Discussion
    2.5.1 Targets of TFII-IRD1 identified in vitro
    2.5.2 Global analysis of gene expression in Gtf2ird1-/- mice
    2.5.3 Cellular localization of TFII-IRD1

Chapter III: Exon specific differences in gene expression between different mouse strains
3.1 Abstract
3.2 Literature Review
    3.2.1 Polyadenylation of pre-mRNA
    3.2.2 Transcription termination
    3.2.3 Strain specific gene expression
3.3 Material and Methods
    3.3.1 Expression analysis using quantitative Real-Time PCR
    3.3.2 Generation of probes for Northern blots
    3.3.3 Northern blot analysis
    3.3.4 3’ Rapid Amplification of cDNA ends (RACE)
    3.3.5 Cloning and sequencing of 3’ RACE products
    3.3.6 Expression analysis using western blots
3.4 Results
    3.4.1 Differential gene expression detected in Gtf2ird1-/- mice is exon specific
    3.4.2 Northern blot analysis does not detect novel alternatively spliced transcripts
    3.4.3 Alternative splicing in the 3’UTR identified using 3’ RACE
    3.4.4 Expression levels of different 3’UTR isoforms differ between genotypes
    3.4.5 Expression of Stx3 is variable and does not correlate with genotype
    3.4.6 Differences in gene expression of genes located close to Gtf2ird1 are related to genetic background
    3.4.7 Differentially expressed exons do not affect levels of Stx3
3.5 Discussion
    3.5.1 Alternative splicing in the 3’ UTR
    3.5.2 Use of alternative polyadenylation sites
    3.5.3 qRT-PCR validation of microarrays

Chapter IV: Summary and Future Directions
4.1 Summary
4.2 Further investigation of GTF2IRD1 function
4.3 Further investigation of alternative polyA site selection
4.4 Conclusion

References

LIST OF TABLES

CHAPTER I: Introduction to Williams-Beuren syndrome

Table 1.1 Genes located in the WBS deletion region

CHAPTER II: TFII-IRD1 may not function as a transcription factor in the developing mouse brain

Table 2.1 Sequences of primers used in qRT-PCR
Table 2.2 siRNA sequences used to knockdown Gtf2ird1
Table 2.3 Genes found to have altered expression in the brains of Gtf2ird1-/- P0 mice by microarray
Table 2.4 Genes found to have altered expression in the heads of Gtf2ird1-/- E15.5 mice by microarray
Table 2.5 Comparison of SNPs in the 3’UTR of Zfp68 in Gtf2ird1-/- mice and WT mice relative to 129S1/SvImJ mice

CHAPTER III: Exon specific differences in gene expression between different mouse strains

Table 3.1 Sequences of primers used in qRT-PCR
Table 3.2 Sequences of primers used to generate Northern blot probes
Table 3.3 Sequences of primers used in synthesis of first strand cDNA from 3' RACE

LIST OF FIGURES

CHAPTER I: Introduction to Williams-Beuren syndrome

Figure 1.1 Characteristic facial features of WBS
Figure 1.2 Grammar skills in WBS patients
Figure 1.3 Visual spatial skills in WBS patients
Figure 1.4 Physical map of the WBS region
Figure 1.5 Mechanisms of non-allelic homologous recombination
Figure 1.6 Patients with atypical deletions in the WBS region
Figure 1.7 Structural elements of the TFII-I
Figure 1.8 Synteny between human chromosome 7q11.23 and mouse 5G

CHAPTER II: TFII-IRD1 may not function as a transcription factor in the developing mouse brain

Figure 2.1 Gtf2ird1 expression in E11.5 and P0 mice
Figure 2.2 Expression of Hoxc8 and Gsc in Gtf2ird1-/- mice
Figure 2.3 Expression pattern of Hoxc8 in Gtf2ird1-/- mice
Figure 2.4 Expression of Bmpr1b and Fgf15 in Gtf2ird1-/- mice
Figure 2.5 In vivo expression of TFII-IRD1 target genes identified in MEFs
Figure 2.6 qRT-PCR validation of expression of candidate genes identified in P0 mice
Figure 2.7 qRT-PCR validation of expression of candidate genes identified in E15.5 mice
Figure 2.8 Knockdown of Gtf2ird1 in Neuro2A cells
Figure 2.9 Expression of candidate genes in Gtf2ird1 siRNA treated neuronal cells
Figure 2.10 Expression of candidate genes in the brain of different mouse strains
Figure 2.11 TFII-IRD1 expression in transfected Neuro2A cells
Figure 2.12 Localization of TFII-IRD1 in Neuro2A cells

CHAPTER III: Exon specific differences in gene expression between different mouse strains

Figure 3.1 Exon specific differences in gene expression in P0 Gtf2ird1-/- mice
Figure 3.2 Exon specific differences in gene expression in E15.5 Gtf2ird1-/- mice
Figure 3.3 Northern blot analysis of Stx3, Kin, Mrpl16 and Pex1 expression
Figure 3.4 Mrpl16 and Stx3 transcripts identified using 3' RACE
Figure 3.5 Zfp68 transcripts identified using 3' RACE
Figure 3.6 Coq2 transcripts identified using 3' RACE
Figure 3.7 Ap4m1 and Taf6 transcripts identified using 3' RACE
Figure 3.8 Actl6b transcripts identified using 3' RACE
Figure 3.9 Exon specific changes in Zfp68 expression
Figure 3.10 Exon specific changes in Coq2 expression
Figure 3.11 Exon specific changes in Ap4m1 and Taf6 expression
Figure 3.12 Exon specific changes in Stx3 and Mrpl16 expression
Figure 3.13 Stx3 expression shows natural variation unrelated to genotype
Figure 3.14 Exon specific differences in gene expression between different mouse strains
Figure 3.15 STX3 expression in Gtf2ird1-/- and WT mice

LIST OF ABBREVIATIONS

5-HIAA    5-hydroxyindoleacetic acid
5HT    Serotonin
ADHD    Attention Deficit Hyperactivity Disorder
AdML    Adenovirus Major Late
AUAP    Abridged Universal Amplification Primer
BCR    B Cell Antigen Receptor
BDNF    Brain-Derived Neurotrophic Factor
BEN    Binding Factor for Early Enhancer
Btk    Bruton's Tyrosine Kinase
CFIIm    Cleavage Factor II
CFIm    Cleavage Factor I
ChIP    Chromatin Immunoprecipitation
CMT1A    Charcot-Marie-Tooth Neuropathy Type 1A
CPSF    Cleavage and Polyadenylation Specificity Factor
CREAM    Containing Repetitive Eighty-Six Amino-Acid Motif
CstF    Cleavage Stimulatory Factor
CTD    C-Terminal Domain
DE    Distal Element
DICE    Downstream Immunoglobulin Control Element
D-MEM    Dulbecco's Modified Eagle's Medium
DNS    Down Syndrome
E    Embryonic day (Days Post-Conception)
ECL    Enhanced Chemiluminescence
EE    Early Enhancer
ELN    Elastin
EMSA    Electrophoretic Mobility Shift Analysis
FACS    Fluorescence Activated Cell Sorting
FDR    False Discovery Rate
GSC    Goosecoid
GTF2I    General Transcription Factor 2-I
GTF2IRD1    General Transcription Factor 2-I Repeat Domain Containing 1
GTF2IRD2    General Transcription Factor 2-I Repeat Domain Containing 2
GUR    GTF2IRD1 Upstream Region
HDAC3    Histone Deacetylase 3
HLH    Helix-Loop-Helix
HPLC    High-Performance Liquid Chromatography
IgH    Immunoglobulin Heavy Chain
IgM H-chain    Immunoglobulin M Heavy Chain
Inr    Initiator
LCR    Low Copy Repeat
LIMMA    Linear Models for Microarray Data
LTP    Long Term Potentiation
LZ    Leucine Zipper
MEF    Mouse Embryonic Fibroblast
MEF2C    Myocyte Enhancer Factor 2C
miRNA    MicroRNA
MusTRD1    Muscle TFII-I Repeat Domain-Containing Protein 1
N2A    Neuro2A
NAHR    Non Allelic Homologous Recombination
NCoR    Nuclear Receptor Co-Repressor
PAP    Poly(A) Polymerase
PAPOLG    Poly(A) Polymerase γ
PBS    Phosphate Buffered Saline
PBT    PBS + 0.1% Tween-20
PE    Proximal Element
PFA    Paraformaldehyde
qRT-PCR    Quantitative Real-Time PCR
RACE    Rapid Amplification of cDNA Ends
RMA    Robust Multiarray Analysis
RNAPII    RNA Polymerase II
SAM    Significance Analysis of Microarrays
Sdha    Succinate Dehydrogenase
SELEX    Systematic Evolution of Ligands by Exponential Enrichment
siRNA    Small Interfering RNA
SPIN    SRF-Phox1 Interacting Protein
SRE    Serum Response Element
SRF    Serum Response Factor
SVAS    Supravalvular Aortic Stenosis
TBS    Tris-Buffered Saline
TBS-T    Tris-Buffered Saline Tween-20
TCAG    The Centre for Applied Genomics
TGFβ    Transforming Growth Factor Beta
TnIs    Troponin, Slow Isoform
TRPC3    Transient Receptor Potential Channel 3
UAP    Universal Amplification Primer
USE    Upstream Regulatory Element
USF1    Upstream Stimulatory Factor 1
VEGFR-2    Vascular Endothelial Growth Factor Receptor-2
WBS    Williams-Beuren Syndrome
WT    Wildtype

Chapter I: Introduction

1.1 Williams-Beuren syndrome

1.1.1 History of Williams-Beuren syndrome

The first reports associated with Williams-Beuren syndrome (WBS) occurred in Europe during the early 1950s [1,2]. During this period there were many reported cases of idiopathic infantile hypercalcemia in England, which were found to be caused by excessive vitamin D intake in children who ate government-supplied formulas and cereals containing dietary supplements [3]. A sub-group of children was noted who had a severe form of infantile hypercalcemia that could not be cured by dietary restriction of vitamin D [3]. This group also suffered from generalized retardation, murmurs, characteristic facial features, and renal impairment [1,2,4]. Lightwood and Stapleton postulated that this group of children represented a distinct clinical syndrome [1].

In 1961 Williams et al. reported four patients with a localized narrowing of the ascending aorta, a condition known as supravalvular aortic stenosis (SVAS). These patients also had mental retardation and facial features that were similar to each other [5]. One year later Beuren et al. reported three more patients with a similar phenotype, and noted that they all “have the same kind of friendly nature – they love everyone, are loved by everyone, and are very charming” [6]. The similarities between the children with severe infantile hypercalcemia [4] and those with SVAS reported by Williams et al. and Beuren et al. were first noted by Black and Bonham Carter in 1963 [7]. They reported five additional children with aortic systolic murmurs, which are characteristic of aortic stenosis. A review of their early case histories revealed that the children also had many of the characteristic features associated with hypercalcemia [7]. These attributes, namely infantile hypercalcemia, SVAS, mental retardation, a friendly personality and characteristic facial features, are hallmarks of the disorder known today as Williams-Beuren syndrome.

WBS was originally thought to be caused by problems with vitamin D metabolism in the mother, the fetus or both [8]. The offspring of rabbits that were fed excessive amounts of vitamin D during pregnancy were born with aortic lesions that had a similar histology to the SVAS seen in people [9]. Further studies found that the offspring also had other features of WBS, namely dental anomalies, peculiar facial features, low birth weight and strabismus [8].

Evidence that WBS is a genetic disorder was found approximately 40 years after the initial reports of the syndrome. Two lines of evidence pointed to the genetic basis of WBS. First, cases of parent-to-child transmission of WBS were reported, indicating that it is an autosomal dominant disorder [10]. Second, the region of the genome responsible for WBS was identified following the discovery that disruptions of the elastin gene (ELN) cause SVAS [11]. ELN was found to be deleted in individuals with WBS, but as the deletion of ELN alone was unlikely to cause the full spectrum of phenotypes seen in WBS, it was postulated that other genes must also be included in the deletion [12]. Since this discovery, researchers have focused on determining which genes are deleted in WBS patients and the role that each of these genes plays in the phenotype.

1.1.2 Williams-Beuren syndrome clinical phenotype

WBS is a relatively common disorder, with a prevalence of approximately 1 in 7500 live births [13]. The full phenotype consists of a number of physical abnormalities along with characteristic behavioural and cognitive features. Patients have distinguishing craniofacial features (figure 1.1), including dolichocephaly (a disproportionately long and narrow head), bitemporal depressions and asymmetry [14]. Their cheeks are full with malar flattening, and the nose has a bulbous tip and low nasal root [14,15]. Their eyes often have a stellate pattern, with periorbital fullness and epicanthal folds. They have a small jaw, dental malocclusion, small and widely spaced teeth and lips that are wide and full [15].

In addition to craniofacial abnormalities, neuroanatomical abnormalities are also present in WBS patients. The overall brain and cerebral volumes are decreased, with a relative preservation of the cerebellum and superior temporal gyrus and a disproportionate decrease in the brainstem volume [16]. Reductions in sulcal length and depth have also been reported; the central sulcus is 1-2 cm shorter in WBS patients than in control subjects [17], and the intraparietal/occipitoparietal sulcus is 8.5 mm shallower on average [18].

Individuals with WBS show a pattern of retarded growth beginning in utero [19]. Failure to thrive occurs in 80% of infants as a result of colic, gastroesophageal reflux and constipation [15]. The rate of growth during childhood is 75% of normal, and as adults 70% of people with WBS will remain shorter than the height predicted by their genetic background [15,20]. The musculoskeletal system is also affected, with common problems including scoliosis, lordosis, kyphosis and radioulnar synostosis [15].

Figure 1.1. Characteristic WBS facial features. The same individual with WBS is shown at three different ages. He displays the characteristic facial features including malar flattening, a bulbous nasal tip and dolichocephaly.

Incidences of SVAS and infantile hypercalcemia led to the discovery of WBS; however, neither is an obligate symptom of the disorder [21]. Hypercalcemia has been documented in 15% of individuals with WBS [15], and is diagnosed by measuring the serum level of ionized calcium; an upper limit of 1.35 mmol/L is considered normal, and values in excess of this, with or without elevated total calcium levels, are considered hypercalcemic [22]. Children with mildly increased levels are generally asymptomatic, while the severe cases that occur with WBS can lead to vomiting, poor feeding, irritability and/or seizures.

Cardiovascular problems are common in WBS, with 84% of patients having at least one type of abnormality [23]. Stenosis (an abnormal narrowing in the vasculature) is the most common type of cardiovascular clinical finding seen in WBS, occurring as a result of smooth muscle overgrowth which results in a thickening of the vascular media [24]. SVAS, which occurs when stenosis is located above the aortic valve, is seen in 69% of individuals with WBS, while pulmonary arterial stenosis occurs in 34% of individuals [23]. Other common cardiovascular conditions, and the percentage of patients affected, include: hypertension (17-50%), mitral valve disease (15%), coarctation of aorta (4%), and pulmonary valve disease (5%) [23,24].

Urinary tract problems are common, including structural defects of the kidneys, bladder diverticulae, nephrocalcinosis, frequent urinary tract infections and enuresis during childhood [15]. Individuals with WBS may also suffer from numerous gastrointestinal problems, including feeding problems, reflux, constipation, colon diverticulosis and chronic abdominal pain [15]. The endocrine system in WBS patients is also affected. Hypercalciuria (excessive calcium excretion through urination) is common, either alone or in conjunction with hypercalcemia [24]. Up to 30% of patients are diagnosed with subclinical hypothyroidism [24]. Adults with WBS are at increased risk of developing diabetes mellitus; in one study only 10% of participants had normal results on an oral glucose-tolerance test [25].

1.1.3 The Williams-Beuren syndrome cognitive phenotype

Typical WBS patients have mild mental retardation with an average IQ of 55 [26], although individual IQs may range from 40 to 100 [27]. It is important to note that individuals with WBS differ from other individuals with similar IQs in that their abilities generally show a characteristic pattern of strengths and weaknesses. Children with WBS generally achieve developmental milestones at a later age than typical children; the development of language skills is a clear example of this, with only 14% of 26-month-olds with WBS having a vocabulary size that is above the fifth percentile of the general population [28]. However, as the children grow older, expressive language becomes a relative strength. Their grammar, vocabulary, syntactic processing and semantic fluency skills are much stronger than those seen in individuals with Down syndrome (matched for age and IQ), and in some cases close to those seen in normal controls [26] (figure 1.2). Other relative strengths in individuals with WBS include facial processing [26] (the ability to recognize and remember both familiar and unfamiliar faces) and auditory rote memory [15].

In contrast to these strengths, visual-spatial processing is a relative weakness for WBS patients. Their drawings lack organization and the individual elements are not cohesive; this is true regardless of whether they are copying an image or design placed in front of them or drawing freely [15,26] (figure 1.3A). Block design tests have given interesting insight into how individuals with WBS process visual-spatial information. In these tests participants are asked to replicate a geometric pattern by arranging a set of blocks which have sides coloured red, white or half-and-half. Individuals with WBS focus on the small details of the design and are unable to replicate the global configuration. This is the opposite of what is seen in individuals with Down syndrome (DNS), who focus on the global organization of the blocks but are unable to replicate the specific pattern [26]. These different ways of processing information were more clearly illustrated in a study where groups of children with WBS and DNS, who were matched for age and IQ, were asked to copy a large global figure which was made up of smaller local components (a “D” shape composed of smaller “Y”s). As would be expected from the block design experiments, the children with DNS focused on the global configuration and reproduced the “D” shape, while the children with WBS focused on the local forms and reproduced the “Y”s arranged haphazardly on the page [26] (figure 1.3B).

Figure 1.2. Children with Williams-Beuren syndrome (WBS) and Down syndrome (DNS) who were matched for age and IQ were asked the conditional question “What if you were a bird?”. Children with WBS performed better with respect to grammar and content. (Adapted from Bellugi et al., 2000 [26])

Figure 1.3. Visual spatial skills in children with Williams-Beuren syndrome (WBS) and Down syndrome (DNS). (A) Children were asked to draw a bicycle; the child with WBS had difficulty properly connecting all the elements together. (B) Children were asked to copy the model image of a “D” made up of smaller “Y”s; the children with WBS could not arrange the smaller components into the proper global configuration. (Adapted from Bellugi et al., 2000 [26])


1.1.4 The Williams-Beuren syndrome behavioural phenotype

In addition to a unique cognitive profile, WBS typically includes a unique behavioural profile. Individuals are often described as “over-friendly” or “hyper-social”, and generally suffer from anxiety and simple phobias. The friendly personality is evident even during infancy, when babies will engage with people around them through eye contact, smiling and cooing [29]. They enjoy interacting with others to the point that it may affect their ability to complete another task. While IQ tests were being administered to seven toddlers with WBS, it was noted that five of the seven children were unable to perform the cognitive task in front of them because they were more interested in the examiner's face [29]. People with WBS have no apparent fear of strangers, and frequently approach them to begin conversations. Despite this social disinhibition, children with WBS have difficulty cultivating friendships, especially with their peers. Numerically speaking, they have fewer friends and participate in fewer activities than children with DNS [30].

Individuals with mental retardation are at a greater risk of developing psychiatric disorders and maladaptive behaviours than the general public; however, hyperactivity, difficulty concentrating and attention deficit hyperactivity disorder (ADHD) occur more frequently in WBS than in other mental retardation disorders [31]. In addition, people with WBS often suffer from fear, anxiety and phobias. In one study a group of individuals with WBS ranging in age from 8 to 39 years and their mothers were asked questions about their fears and compared to a control group composed of individuals with mental retardation of various etiologies [31]. When asked open-ended questions, the WBS group reported an average of 3.78 fears per person as opposed to 2.45 in the control group. The most frequent fears mentioned in the WBS group were thunderstorms (47%), loud sounds (22%), death/dead people (22%), high places (22%) and ghosts or spooky things (19%); this differs slightly from the fears named in the control group, which were most commonly ghosts/spooky things (29%), snakes (21%) and high places (17%).

While only 16-18% of individuals with WBS meet the clinical criteria for a diagnosis of generalized anxiety disorder, a majority of individuals experience one or more of the associated symptoms, including “excessive worry about the future”, is a “worrier”, “becomes sick from worry” and “shows an inability to relax” [31]. Specific phobias are also common, with 35% of individuals meeting the clinical criteria for diagnosis. Nearly all individuals meet two of the three criteria necessary for a clinical diagnosis, namely “marked, persistent, anxiety-producing fears” (96%) and “avoid fearful stimuli or endure with distress” (84%); however, only 35% exhibit the final symptom, “impaired adaptive functioning” [31]. In contrast, other studies have reported phobias in 0.6-4.3% of people with mental retardation and 2.3-2.4% of normal individuals [31].

1.2 The genetic basis of Williams-Beuren syndrome

1.2.1 Identification of a microdeletion at 7q11.23

SVAS has long been known to occur either as part of WBS, or as an independently occurring autosomal dominant trait [32]. In 1993, Curran et al. identified a family with SVAS and demonstrated that a translocation on chromosome 7 which disrupted the elastin gene (ELN) segregated with SVAS in this family [11]. ELN had previously been mapped to 7q11.2 [33], and the identification of this mutation provided the first clue as to the region of the genome responsible for WBS. It was postulated that haploinsufficiency for ELN also occurs in WBS, and later that year Ewart et al. used Southern blots to show that people with WBS are indeed hemizygous for ELN [12]. The deletions in WBS patients were shown to extend beyond the ELN locus, encompassing at least 114 kb, indicating that neighbouring genes were likely to play a role in the disorder [12].

In the years following this discovery researchers attempted to identify the size of the deletion that occurs in WBS and the genes involved, and to understand the mechanism of deletion. Polymorphic DNA markers were initially used to show that the deletion extends at least 500 kb [34], and it was shown that repeated sequences flank the deletion [35], which was the first indication of how the deletion might occur. By 1999 it had been established that the typical deletion seen in WBS patients is ~1.5 Mb, occurs at 7q11.23 [36], contains 28 genes (table 1.1), and is flanked by three different low copy repeat (LCR) sequences [37] (figure 1.4).

Figure 1.4. Arrangement of genes and blocks of low copy repeats located at 7q11.23. The centromeric, medial and telomeric LCRs are shown, with arrows underneath indicating the relative orientations of the sequence. The red box highlights the 28 genes which are typically deleted in WBS. (Adapted from Pober, 2010 [24])


Table 1.1 Genes located in the Williams-Beuren syndrome deletion region (Adapted from Tassabehji, 2003 [38])

Gene | Description | Function
NSUN5 (WBSCR20) | NOP2/Sun domain family, member 5 | Protein with a NOL1/NOP2/sun domain. May play a role in the regulation of the cell cycle
TRIM50 | Tripartite motif-containing 50 | Encodes an E3 ubiquitin ligase
FKBP6 | FK506-binding protein 6 | Immunophilin protein. Role in male fertility and homologous chromosome pairing in meiosis
FZD9 | Frizzled Drosophila homolog of 9 | 'Frizzled' proteins act as receptors for Wnt signalling proteins. May be involved in tissue polarity and development
BAZ1B | Bromodomain adjacent to zinc finger domain 1B | Protein with a bromodomain. May be involved in chromatin-dependent regulation of transcription
BCL7B | B-cell CLL/lymphoma 7B | Member of BCL7 protein family. Unknown function
TBL2 | Transducin-β-like 2 | β-transducin protein with four putative WD40 repeats. May play a role in intracellular signalling pathways or cytoskeletal organization
MLXIPL (WBSCR14) | Max-like protein interacting protein-like | bHLH-LZ transcription factor. May play a role in cell proliferation and/or differentiation
VPS37D (WBSCR24) | Vacuolar protein sorting 37 homolog D | Regulator of vesicular trafficking. Possible role in cell growth and differentiation
DNAJC30 (WBSCR18) | DnaJ (Hsp40) homolog, subfamily C, member 30 | Protein has DnaJ domain involved in protein folding
WBSCR22 | WBS critical region 22 | Protein with S-adenosyl-L-methionine binding motif. May be involved in DNA methylation
STX1A | Syntaxin 1A | Syntaxin 1A protein plays a key role in intracellular transport and neurotransmitter release
ABHD11 (WBSCR21) | Abhydrolase domain containing 11 | Protein has an α/β hydrolase fold domain. Unknown function
CLDN3 | Claudin 3 | Protein component of tight junction strands in liver epithelial cells. Role in maintaining cellular polarity
CLDN4 | Claudin 4 | Protein component of tight junction strands in epithelial cells. Role in maintaining cellular polarity
WBSCR27 | WBS critical region 27 | Protein belongs to the ubiE/COQ5 methyltransferase family
WBSCR28 | WBS critical region 28 | Unknown function
ELN | Elastin | Structural protein, component of elastic fibres. Role in arterial morphogenesis
LIMK1 | LIM kinase 1 | Serine/threonine kinase with LIM domains. Role in actin cytoskeletal reorganization essential for directional movement of neurons
EIF4H (WBSCR1) | Eukaryotic initiation factor 4H | Protein contains an RNA recognition motif. Stimulates initiation of protein synthesis at the level of mRNA utilization
LAT2 (WBSCR5) | Linker for activation of T cells | Role in immune cell development
RFC2 | Replication factor C, subunit 2 | Component of replication factor C complex which is an activator of DNA polymerases during replication
CYLN2 | Cytoplasmic linker 2 | Cytoplasmic linker protein. Role in regulating microtubule dynamics
GTF2IRD1 | General transcription factor 2-I repeat domain containing 1 | Member of GTF2I transcription factor family. May play a role in activating/repressing gene transcription
WBSCR23 | WBS critical region 23 | Intronless gene. Unknown function
GTF2I | General transcription factor 2-I | Multifunctional transcription factor. Functions both as a basal factor and as an activator
NCF1 | Neutrophil cytosolic factor 1 | Component of phagocyte NADPH-oxidase system. Role in immunity
GTF2IRD2 | General transcription factor 2-I repeat domain containing 2 | Gene containing both I-repeats and a Charlie-8-like transposase motif. Function unknown

1.2.2 Genomic rearrangements at 7q11.23

The segmental duplications occurring at 7q11.23 contain three different repeated sequences, designated “A”, “B” and “C”. There are three blocks of segmental duplications, the centromeric, telomeric and medial LCRs, that flank the deleted segment of DNA. The centromeric and medial blocks of repeats are in a different order relative to each other, but are in


the same orientation. The telomeric block contains repeated sequences which are in the same

order as the centromeric block, but are in the opposite orientation39 (figure 1.4). The LCRs

contain transcribed genes, pseudogenes and putative telomere associated repeats40.

WBS genomic deletions arise from non-allelic homologous recombination (NAHR)

occurring between the highly similar LCR sequences. The majority (95%) of individuals with

WBS carry a 1.55 Mb deletion resulting from unequal crossing over occurring between the

centromeric and medial “B” repeats39. Five percent of patients carry a larger (1.84 Mb) deletion

that arises when unequal recombination occurs between the “A” repeat blocks in the centromeric

and medial LCRs39 (figure 1.4). It is hypothesized that the deletion breakpoints are most likely to occur in the “B” blocks due to the high sequence similarity between the blocks (99.6% with no large gaps; the “A” blocks are only 98.2% identical with two large gaps), and the shorter physical distance between the two “B” blocks39.

NAHR can occur between homologous chromosomes (interchromosomal), homologous

chromatids (interchromatidal) or within a chromatid (intrachromatidal). Intrachromatid NAHR

will result in a chromosome with a deletion of the WBS region and create an acentric

chromosome fragment that will be lost (figure 1.5). In contrast, both interchromosome and

interchromatid NAHR result in one chromosome with a deletion of the WBS region and another

with the reciprocal duplication (figure 1.5). A small number of patients with the reciprocal

duplication have been identified. The duplication results in a syndrome that is distinct from

WBS, with the key feature being an impairment in expressive language41,42.
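The outcome of unequal crossing over described above can be illustrated with a toy model. This is a minimal sketch only: the segment labels (`LCR_cen`, `WBS_region`, `LCR_med`) and the function name are hypothetical, each chromosome is reduced to an ordered list of segments, and the recombined repeat (in reality a hybrid of the two misaligned LCR blocks) simply keeps the label of the repeat on its centromeric side.

```python
# Toy model of interchromosomal/interchromatidal NAHR (illustrative only).
# A chromosome is an ordered list of labelled segments; labels are hypothetical.

def unequal_crossover(chrom_a, chrom_b, repeat_a, repeat_b):
    """Recombine two chromosomes after repeat_a on chrom_a has misaligned
    with repeat_b on chrom_b. Returns the two reciprocal products."""
    i = chrom_a.index(repeat_a)
    j = chrom_b.index(repeat_b)
    # Product 1: chrom_a up to its repeat, then chrom_b distal to its repeat.
    product_1 = chrom_a[: i + 1] + chrom_b[j + 1:]
    # Product 2: chrom_b up to its repeat, then chrom_a distal to its repeat.
    product_2 = chrom_b[: j + 1] + chrom_a[i + 1:]
    return product_1, product_2

chromosome = ["cen", "LCR_cen", "WBS_region", "LCR_med", "tel"]

# The centromeric LCR on one copy pairs with the medial LCR on the other copy.
deleted, duplicated = unequal_crossover(chromosome, chromosome, "LCR_cen", "LCR_med")

print(deleted)      # the product lacking WBS_region
print(duplicated)   # the reciprocal product carrying WBS_region twice
```

In this simplified picture one product has lost the intervening region and the other carries it twice, which corresponds to the deletion and reciprocal duplication chromosomes described in the text. Intrachromatid NAHR is not modelled here.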

Approximately 5% of the population carries a paracentric inversion containing the region

typically deleted in individuals with WBS37. The inversion is created as a result of NAHR

occurring between centromeric and telomeric LCRs which are inverted relative to each other37.

The expression of genes contained within the inverted region is unaffected, and there are no clinical symptoms associated with the inversion43. However, 25-33% of individuals with WBS received the affected chromosome from a parent who carries an inversion of the region44,45. This indicates that presence of the inversion predisposes an individual to undergo NAHR at this locus.

Figure 1.5 Mechanisms of inter- and intrachromatid non-allelic homologous recombination (NAHR). (A) Interchromosomal and interchromatidal NAHR results in deletions and duplications of the intervening region. (B) Intrachromatidal NAHR results only in deletions of the intervening region. Arrows indicate the direction of the centromeric (cen), medial (mid) and telomeric (tel) low copy repeat blocks (A, B & C) (taken from Schubert, 200937).


1.2.3 Atypical deletions in the Williams-Beuren syndrome region

To date, only the role of ELN in the WBS phenotype has been elucidated; however, the discovery of patients with atypical deletions of genes in the WBS region has begun to provide some clues to the correlation between phenotype and genotype in this disorder. The first cases of atypical 7q11.23 deletions were reported in 1999 by Botta et al., who identified two individuals

with the full spectrum of WBS phenotypes who carried smaller deletions which began at the

common telomeric breakpoint and extended up to, and including, the ELN locus46. A third

patient with a similar deletion who also displayed symptoms of typical WBS was later

reported47 (figure 1.6). These findings indicate that the deletion of the nine genes from ELN to

GTF2I (at the telomeric end of the deletion) is sufficient to cause the phenotypes typically seen

in individuals with WBS; this region is referred to as the “minimal critical region”. It can then be concluded that haploinsufficiency for more than one of these genes is necessary for WBS to occur. ELN is known to cause SVAS, but the specific gene(s) which cause the remainder of the symptoms have yet to be discovered.


Figure 1.6 Published cases of individuals with atypical deletions in the WBS region46-53. The genes located in the region are shown at the top. Boxes labelled C, M & T represent the centromeric, medial and telomeric LCRs respectively. The deleted region in each individual is depicted by a thin line. The last deletion represents what is typically seen in individuals with WBS.

A number of other atypical patients have been reported with deletions which spare one or

more of the genes in the minimal critical region48-53 (Figure 1.6). By comparing the different

phenotypes reported in these patients it has been determined that genes at the telomeric end of the deletion appear to be responsible for the behavioural and cognitive aspects of the WBS phenotype. Patients who retain two copies of GTF2IRD1 and/or GTF2I, members of the General

Transcription Factor 2-I (GTF2-I) gene family, generally do not show the traditional dysmorphic


facial features associated with WBS (or show very mild features), perform better on visual-

spatial processing tasks than individuals with a full deletion of the WBS region, and have more

normal intelligence.

1.3 General Transcription Factor 2-I (GTF2-I) gene family

The GTF2-I gene family consists of three genes: General Transcription Factor 2-I

(GTF2I), General Transcription Factor 2-I Repeat Domain Containing 1 (GTF2IRD1), and

General Transcription Factor 2-I Repeat Domain Containing 2 (GTF2IRD2) (Figure 1.7). These genes code for the proteins TFII-I, TFII-IRD1 and TFII-IRD2 respectively. Members of the

TFII-I protein family are characterized by the unique I-repeat domains they contain. The I-repeats are helix-loop-helix (HLH) like domains which are highly conserved between family members54,55. HLH domains contain two α-helices connected by a loop which can range from 5-25 amino acids56. The HLH-like domains found in the TFII-I gene family differ from traditional

HLH domains in that the loop region is much larger (~40 amino acids)55. Proteins containing

HLH domains are known to form both homo- and heterodimers, and to bind to DNA (when the

HLH domain is preceded by a basic region)56.

Each member of the TFII-I family also contains a leucine zipper (LZ) located at the N-

terminus. The LZ domain is a dimerization motif which can be used to form homo- and

heterodimers55. There is experimental evidence to show that TFII-I and TFII-IRD1 can form homodimers in vitro, mediated by the LZ; however, there is conflicting evidence regarding their ability to form heterodimers57-59. Sequence analysis of the LZ region in all three GTF2-I family

members by Hinsley et al. indicates that while TFII-I and TFII-IRD2 may be able to form

heterodimers, it is highly unlikely that TFII-I and TFII-IRD1 heterodimerize55.


Only one gene containing HLH-like I-repeats can be detected in Danio rerio and

Takifugu rubripes, and the sequence of this gene is highly similar to GTF2IRD160. Based on this information, GTF2IRD1 is believed to be the ancestral gene of the gene family, with GTF2I

arising second through duplication and divergence. Based on sequence analysis, GTF2IRD2

appears to be more closely related to GTF2I than to GTF2IRD1, and is likely to have been the

third GTF2-I family member to be created60.

Figure 1.7 Structural elements of the TFII-I proteins encoded by the GTF2I gene family. The different TFII-I isoforms arise from the inclusion/exclusion of the A and B exons (the green and grey rectangles respectively). The I-repeats are shown as purple boxes (R1-R6).

1.3.1 General Transcription Factor 2-I (TFII-I)

TFII-I (TFII-I/BAP-135/SPIN) was the first member of the TFII-I family to be identified.

TFII-I was first identified in 1991 as a protein that is able to activate basal transcription of the adenovirus major late promoter (AdML) by binding to the initiator (Inr) sequence61. It was also


shown that TFII-I binds to an upstream E-box element which is usually recognized by HLH containing proteins such as upstream stimulatory factor 1 (USF1), and is able to cooperate with

USF to activate transcription of the AdML promoter61. These results indicated that TFII-I could

function as both a basal transcription factor and a transcriptional co-activator. It was later shown

that TFII-I can initiate transcription through interactions with the E-box element, even in

promoters which do not contain an Inr sequence54.

TFII-I was independently identified twice in 1997: as a phosphorylation target of

Bruton’s tyrosine kinase (Btk)62, and as a protein involved in serum-induced expression of the c-

Fos gene63. Following binding of an antigen to the B cell antigen receptor (BCR), a cascade of

tyrosine phosphorylation occurs resulting in the activation of a number of different pathways,

and resulting in proliferation and differentiation of the cell. When these pathways are

compromised it can result in immunodeficiencies such as X-linked agammaglobulinemia62. Btk

is a Src-related tyrosine kinase which is activated within minutes of a B cell encountering an

antigen64. In order to identify proteins that interact with Btk in vivo, Yang and Desiderio immunoprecipitated Btk from a human B lymphoid cell line, and identified an associated protein with a molecular mass of 135 kDa62. They named this protein Btk-associated protein of 135 kDa

(BAP-135), and determined that following activation of Btk through phosphorylation, Btk goes

on to phosphorylate a tyrosine residue of BAP-135. Later that year, it was determined through

sequence analysis that TFII-I and BAP-135 were the same protein54.

In response to extra-cellular signals, the c-FOS gene is activated following binding of serum response factor (SRF) to the serum response element (SRE) in the c-FOS promoter63.

Interactions between the protein Phox1 and SRF facilitate binding of SRF to the SRE. While

attempting to reconstitute the binding that occurs between these proteins in vivo under in vitro


conditions, Grueneberg et al. identified a protein they named SRF-Phox1 interacting protein

(SPIN)63. The addition of SPIN to SRF and Phox1 allowed for the formation of a stable complex

which could bind to the SRE. SPIN was found to bind to multiple sites in the c-FOS promoter,

and through interactions with SRF and Phox1, SPIN is able to induce expression of a c-FOS reporter gene in response to serum63. Cloning and sequencing of SPIN cDNA revealed that

SPIN is identical to TFII-I63. It was later shown that in order for TFII-I to activate c-FOS expression, TFII-I must be phosphorylated at a specific tyrosine residue, which results in the translocation of TFII-I to the nucleus65. Phosphorylation of TFII-I occurs as a result of extracellular signals, and so it has been proposed that TFII-I is able to link signal transduction events to transcription.

There are four known splice forms of TFII-I in humans (α-, β-, γ- and ∆-; figure 1.7), three of which are present in mice (β-, γ- and ∆-)66. Each of the isoforms shows a similar

subcellular distribution when expressed ectopically in COS cells66. The different isoforms are all

capable of both homo- and heteromeric interactions both in vitro and in vivo, and it has been

proposed that different combinations of isoforms may play specific roles in the transcriptional

regulation of target genes66. Each of the isoforms is capable of binding DNA66, and studies on

the ∆-isoform have shown that deletion of either the leucine zipper region, or a basic region

which precedes the second I-repeat, is sufficient to impede binding of the protein to an Inr

sequence, and activation of a reporter gene58.

In addition to AdML and c-FOS, TFII-I has been shown to play a direct role in the activation of vascular endothelial growth factor receptor-2 (VEGFR-2)67 and goosecoid (GSC)68.

TFII-I may also play an indirect role in transcriptional control of other genes through its


interactions with histone deacetylase 3 (HDAC3) and PIASxβ, a member of the E3 ligase family

of proteins which are known to be involved in the SUMOylation of several transcription factors69.

TFII-I is expressed in preimplantation mouse embryos, where it can be detected in both

the cytoplasm and nucleus in embryos at the two-cell stage through to the 128-cell blastocyst

which implants into the uterus at embryonic day 4.570. This indicates that TFII-I is likely to play

a role in early embryonic development.

In the developing mouse brain, Gtf2i mRNA expression is restricted to neuronal cells according to in situ hybridization71. Between embryonic day (E)18 and postnatal day 7 (P7), the

mRNA is ubiquitously expressed throughout the brain. Expression in the cerebellum appears to

be relatively enhanced beginning at P7, and by the time the mouse is six weeks old the

expression pattern of Gtf2i mRNA changes to its adult state. At this time the highest levels of

expression are seen in cerebellar Purkinje cells, the hippocampus and the neurons of the cerebral cortex. Expression can also be detected in the olfactory bulbs and in neurons of other regions of the brain. Immunohistochemistry using an antibody which recognizes all splice forms of TFII-I

revealed a similar expression pattern in the adult brain to that of the mRNA, however protein

expression in the cerebral cortex appeared lower than that of the corresponding RNA71. The high

expression of TFII-I in the cerebellum is interesting as there appear to be anatomical

abnormalities in this area in WBS patients. The relatively high hippocampal expression is also

noteworthy as the hippocampus is important for learning and memory71.

1.3.2 General Transcription Factor 2-I Repeat Domain containing 1 (TFII-IRD1)

TFII-IRD1 (GTF2IRD1/MusTRD1/BEN/CREAM1/GTF3/WBSCR11) was

independently identified in multiple experiments72-76. Troponin I is a component of the troponin protein complex which controls the contraction of muscles in response to the level of


intracellular calcium. There are three troponin I isoforms, each encoded by a separate gene. All

isoforms are expressed in all muscle types early in development, but during late fetal

development the slow isoform for troponin (TnIs) is down-regulated in all types of muscle

fibres, except for those destined to become slow fibers75. An upstream regulatory element (USE)

in the TnIs promoter was shown to be sufficient to confer preferential slow-muscle activity to a heterologous thymidine kinase minimal promoter77. O’Mahoney et al. performed a yeast one-hybrid

screen to identify proteins capable of binding to the TnIs USE, and identified a novel protein which was similar to TFII-I, which they referred to as muscle TFII-I repeat domain-containing protein 1 (MusTRD1)75.

Soon after this, TFII-IRD1 was identified in a second yeast one-hybrid screen as a protein that binds to the early enhancer of the Hoxc8 gene, and was given the name binding factor for early enhancer (BEN)74. It was also found to bind to the retinoblastoma protein (Rb) through its C-terminal region, and was given the name containing repetitive eighty-six amino-acid motif (CREAM1)72.

Like its family member TFII-I, TFII-IRD1 is believed to be a transcription factor. There

is evidence to suggest that it plays a role in the regulation of multiple genes including TnIs78,

GSC79 and VEGFR-267. Interestingly, some experiments indicate that TFII-I and TFII-IRD1 may

counter-regulate some of the same genes with TFII-I activating the expression of the target gene and TFII-IRD1 repressing expression of the same gene67,68.

TFII-IRD1 has been shown to bind to two distinct DNA sequences, which have been

found in the promoter regions of some of the proposed target genes. Vullhorst and Buonanno

identified the consensus sequence GTCGAGATTAGBGA using SELEX on the I-repeats of mouse


TFII-IRD180. They found that all of the mouse I-repeats were capable of binding to DNA with the exception of the first I-repeat (R1), and R4 was found to have the greatest affinity for DNA, binding specifically to the consensus sequence. The core of the consensus sequence, GGATTA, is found in the regions of both the TnIs and Hoxc8 promoters that TFII-IRD1 had previously been

shown to bind to. Lazebnik et al. identified a different TFII-IRD1 consensus sequence using

SELEX, CWGCCAYA81. The methods of Lazebnik et al. differed slightly from those of

Vullhorst and Buonanno in that they used the entire TFII-IRD1 protein as bait while Vullhorst

and Buonanno cloned each of the I-repeats individually to determine their different DNA binding abilities. TFII-IRD1 was shown to repress expression of a reporter gene when three copies of the

CWGCCAYA sequence were cloned upstream81. An in silico analysis found that this consensus sequence is present in the promoters of both the human and mouse BMPR1B and FGF15 genes.

Lazebnik et al. demonstrated that when TFII-IRD1 is knocked down in C2C12 cells using siRNA, expression of both of these genes is dramatically increased, indicating that TFII-IRD1

may play a role in the transcriptional regulation of these genes81.
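The in silico promoter search described above can be sketched as follows. This is an illustrative sketch, not the method used by Lazebnik et al.: the function names and the example sequence are hypothetical, and only the forward strand is scanned. The consensus CWGCCAYA uses standard IUPAC ambiguity codes, in which W stands for A or T and Y stands for C or T.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif(sequence, motif):
    """Return (0-based position, matched site) for every forward-strand match.
    A lookahead is used so that overlapping sites are also reported."""
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Made-up sequence for illustration; it is not a real promoter.
example_sequence = "TTGACAGCCACAGGTCTGCCATATT"
hits = find_motif(example_sequence, "CWGCCAYA")
print(hits)
```

A fuller search would also scan the reverse complement and restrict the scan to annotated promoter regions; both are omitted here for brevity.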

Using a knock-in LacZ mouse model of Gtf2ird1, Palmer et al. were able to establish the expression pattern of Gtf2ird1 throughout mouse development82. At E7.5, TFII-IRD1 was expressed in all germ layers and extra-embryonic tissues, and expression became more refined at the onset of organogenesis. Expression in the forebrain and gut was detected in E9.5 embryos, and as development progressed, expression of TFII-IRD1 was also detected in many tissues including the midbrain, branchial arches and heart. As fetal development progressed expression in the brain was highest in the olfactory bulbs, cerebellum, thalamic and hypothalamic nuclei.

Expression of TFII-IRD1 in the brains of adult mice was relatively low, but was detected in all neuronal types examined. The highest levels of expression were detected in the olfactory bulbs,


Purkinje neurons of the cerebellum and neurons in the piriform cortex. Studies in our lab have

revealed similar results and shown that expression of TFII-IRD1 in the prefrontal cortex of adult mice is restricted to layer V neurons83.

Outside of the nervous system, the highest levels of TFII-IRD1 in adult mice were found

in the testis, endothelial cells, brown adipose tissue, heart and smooth muscle of the gut and

bladder82.

1.3.3 General Transcription Factor 2-I Repeat Domain containing 2 (TFII-IRD2)

The human genome contains three copies of GTF2IRD2; however, two of the copies are

pseudogenes located within the LCRs and are unlikely to produce a functional protein60. There is only one copy of the gene in the mouse genome, which shows a high degree (80%) of homology to the functional human locus84.

The N-terminal half of TFII-IRD2 contains two I-repeats, which appear to be derived

from the first and sixth I-repeats found in TFII-I60. Similar to other members of the TFII-I gene

family, the N-terminus also contains a leucine zipper which is believed to facilitate the formation

of protein dimers. The C-terminal portion of the protein contains a CHARLIE8 transposable

domain which has inserted in frame into the locus, resulting in the production of a fusion

protein84. The CHARLIE8 transposon is an autonomous transposon which is unique to

mammals. Sequence analysis of the CHARLIE8 domain in TFII-IRD2 and the surrounding

genomic region indicates that it may have retained some of the functions associated with

transposition84. These functions could include interacting with specific DNA or protein motifs or

cleavage of DNA strands. It is also possible that presence of the transposase target sites could

predispose the genomic region to instability by allowing other transposases to bind to and cleave the

DNA at these sites84. A similar mechanism has been proposed for Charcot-Marie-Tooth


neuropathy type 1A (CMT1A) and hereditary neuropathy with liability to pressure palsies

(HNPP) which are caused by the deletion or reciprocal duplication of 1.5 Mb on chromosome 17

respectively84. The region is flanked by repeated sequences which can undergo NAHR, and it

has been determined that 76% of the cross over events which lead to these disorders occur at a

recombination hotspot containing a transposable-like element that is flanked by transposase recognition sites85. Other transposases are believed to bind to and cleave the DNA at these sites.

Based on Northern blot analysis, Gtf2ird2 is expressed highly in mouse heart, brain and

liver tissues84. Weaker expression was detected in the , , kidney and skeletal muscle.

Using RT-PCR, no expression could be detected in mouse embryos at days E9.5 and E10.5.

In humans GTF2IRD2 lies within the medial LCR block, and depending on the exact

breakpoint of the deletion, it may or may not be present in WBS patients. The inclusion of

GTF2IRD2 in the deletion does not seem to have any obvious effect on the phenotype of the

patient84.

1.4 The Gtf2ird1 mouse model

The genomic region associated with WBS on human chromosome 7q11.23 is conserved

in mice on chromosome 5G2 (figure 1.8). The regions are highly syntenic, however in mice the sequence of the locus is inverted relative to the human sequence86. In addition, the LCRs which

flank the region in humans are not found in mice. Minor differences found in the mouse genome

include the absence of WBSCR23 and CCL26, and the presence of an additional Cldn gene

(Cldn13) which is not found in the human genome. The mouse and human genomes are generally highly similar, which means that mutant mice will often show many of the clinical

symptoms seen in humans with a particular genetic mutation87. This genetic similarity, along


with the development of tests to examine both clinical and behavioural phenotypes in mice,

makes mice an excellent system to study genotype-phenotype correlations in WBS.

Figure 1.8. The WBS deletion region in humans is syntenic to a region of mouse chromosome 5. The corresponding region in the mouse genome is inverted relative to the human genome, and is lacking two genes found in humans (* - WBSCR23 and CCL26). The mouse genome contains one extra gene (* - Cldn13) which is not found in humans.

1.4.1 Generation of the mouse model

Previous work in our lab has generated a Gtf2ird1-/- mouse model using gene targeting to

better understand the role this gene plays in the WBS phenotype88. Exons 2, 3, 4 and part of 5 of

Gtf2ird1 were replaced with a neomycin-resistant gene cassette in R1 murine embryonic stem cells. Chimeric mice were generated by aggregating targeted cells with morula stage embryos.

Chimeric male mice were mated to wildtype (WT) female mice on a CD1 (outbred) genetic background. The Gtf2ird1+/- offspring were then backcrossed onto a CD1 background.


Heterozygous mice were intercrossed to generate Gtf2ird1-/- mice. Both heterozygous and

homozygous mice were viable and fertile, and mutant offspring were born at the expected Mendelian

ratio.
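For a heterozygous intercross, the expected Mendelian ratio is 1:2:1 (+/+ : +/- : -/-). One common way to check observed litters against this expectation is a chi-square goodness-of-fit test, sketched below. The genotype counts are invented for illustration and are not data from this study, and the text does not state which test, if any, was used.

```python
# Chi-square goodness-of-fit test against a 1:2:1 Mendelian ratio.
# The counts below are hypothetical and are NOT data from this thesis.

EXPECTED_FRACTIONS = {"+/+": 0.25, "+/-": 0.50, "-/-": 0.25}

def chi_square_mendelian(observed):
    """Return the chi-square statistic for observed genotype counts
    from a heterozygous intercross."""
    total = sum(observed.values())
    statistic = 0.0
    for genotype, fraction in EXPECTED_FRACTIONS.items():
        expected = fraction * total
        statistic += (observed[genotype] - expected) ** 2 / expected
    return statistic

hypothetical_counts = {"+/+": 26, "+/-": 49, "-/-": 25}
statistic = chi_square_mendelian(hypothetical_counts)

# Critical value for 2 degrees of freedom at alpha = 0.05.
CRITICAL_VALUE = 5.991
consistent_with_mendelian = statistic < CRITICAL_VALUE
print(round(statistic, 3), consistent_with_mendelian)
```

A statistic below the critical value means the observed counts do not differ significantly from the 1:2:1 expectation at the 5% level.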

Real-time PCR was performed on RNA extracted from neonate and adult brains to

determine expression levels of Gtf2ird1 in the mutant mice using primers located in the deleted

region (exon 2). In heterozygous mice the expression level of Gtf2ird1 was approximately half

of that seen in WT mice, and no Gtf2ird1 expression could be detected in Gtf2ird1-/- mice.

Expression of Gtf2i and Clip2, which flank the Gtf2ird1 locus, was not altered in the mutant mice88.

However, when primers located in exon 9 (which was not included in the deletion) of

Gtf2ird1 were used, a transcript could be detected in Gtf2ird1-/- mice. Sequence analysis of this

transcript showed that exon 1, which contains part of the 5’ UTR, was splicing directly into exon

6. The truncated transcript shows increased expression relative to the WT transcript in both

Gtf2ird1+/- and Gtf2ird1-/- mice. There are two possible translational start sites in this transcript – the first would produce a small, out-of-frame protein while the second could produce a truncated in-frame protein. As the transcript is missing the first four coding exons, the truncated protein would lack both the leucine zipper and the first I-repeat. We have been unable to confirm if the aberrant transcript is producing a truncated Gtf2ird1 protein as there are no specific antibodies available, however the presence of a dose-dependent phenotype (see below) strongly supports a loss-of-function model rather than a dominant negative model.


1.4.2 Behavioural phenotypic analysis

There were no obvious morphological or anatomical abnormalities in the Gtf2ird1+/- or

Gtf2ird1-/- mice. Physically, Gtf2ird1-/- mice (male and female) were significantly smaller (15%) than WT mice. The Gtf2ird1+/- mice were also smaller, although the trend was not significant.

Gtf2ird1-/- and Gtf2ird1+/- mice exhibited some behaviours that are consistent with the

WBS phenotype. When placed in a cage with an unknown mouse they displayed significantly

fewer aggressive interactions than WT mice, and the aggressive interactions that they did engage

in were shorter in duration. The mutant mice spent significantly more time following the

intruder around, and spent more time sniffing the intruder mouse88.

Mutant mice also had decreased levels of anxiety and fear as measured by the elevated

plus maze and cued fear conditioning tests. In the elevated plus maze, mice were placed on an

elevated “plus” shaped platform with two open arms and two enclosed arms. Mice typically

prefer to be in the enclosed arms where they feel safe; however Gtf2ird1-/- mice entered into the

open arms a greater number of times, spent a greater amount of time inside the open arms, and

dipped their heads over the sides of the platform a greater number of times indicating reduced

anxiety88.

In the fear conditioning test, mice were placed into a test chamber and an auditory cue

was paired with an electric foot shock. When the auditory cue was repeated at a later time WT

mice froze in anticipation of the shock, whereas Gtf2ird1-/- mice displayed significantly less

freezing. Together, these results indicate that Gtf2ird1-/- mice have significantly reduced levels

of anxiety and a reduced natural fear response88.


1.4.3 Biochemical and electrophysiological phenotypic analysis

Serotonin (5HT) metabolism is known to be linked to anxiety and aggression. As these traits are altered in Gtf2ird1-/- mice, high-performance liquid chromatography (HPLC) was performed to determine whether 5HT metabolism is altered in the mutant mice. Levels of 5HT and 5-hydroxyindoleacetic acid (5-HIAA; a 5HT metabolite) were measured in different brain regions of Gtf2ird1-/- mice, and significantly increased levels of 5-HIAA were found in the amygdala, frontal cortex and parietal cortex relative to WT mice. No significant differences in

5HT levels were detected88.

In order to investigate the effects of 5HT on the neurons of the prefrontal cortex in

Gtf2ird1-/- mice, electrophysiological analysis was performed on acute brain slices from the prefrontal cortex of Gtf2ird1-/- and WT littermates83. Whole cell recordings on neurons in layer

V of the cortex revealed that application of 5HT results in increased inhibitory outward currents in Gtf2ird1-/- mice, relative to WT littermates. The inhibition was shown to be mediated through

5HT1A receptors. 5HT1A receptors in the prefrontal cortex had previously been shown to regulate anxiety-like behaviours89, and the enhanced post-synaptic inhibition of these receptors seen in Gtf2ird1-/- mice could be related to their atypical behaviours.

1.5 Research Aims and Hypothesis

Due to the presumptive role of TFII-IRD1 in the cognitive and behavioural aspects of the

WBS phenotype, it is of interest to identify neural targets of this putative transcription factor.

The Gtf2ird1-/- mice previously created in our lab provide an excellent system to study the biological role of this gene. The goal of my project was to use these mice to identify downstream targets of TFII-IRD1 in the mouse brain.


Given that TFII-IRD1 has been shown to bind DNA, and regulate the expression of target

genes in vitro, I hypothesized that the behavioural phenotype seen in the Gtf2ird1-/- mice was the

result of altered expression of genes regulated by TFII-IRD1. Using microarray analysis and qRT-PCR, I hoped to identify target genes of TFII-IRD1 and use this information to gain an understanding of the molecular mechanisms which give rise to the behavioural phenotype seen in both the mutant mice and WBS patients.


Chapter II: TFII-IRD1 may not function as a transcription factor in the developing mouse brain.

2.1 Abstract

Members of the TFII-I gene family, including TFII-IRD1, have been shown to regulate

transcription by binding to specific DNA sequences. Numerous in vitro studies examining the

effect of TFII-IRD1 on gene regulation have been done, and a few direct targets have been

proposed, including TnIs, Hoxc8, Gsc, and Vegfr2. However, to date the list of proposed target

genes has not included plausible candidates for the cognitive and behavioural phenotype seen in

either individuals with WBS, or Gtf2ird1-/- mice. In order to identify novel transcriptional

targets of TFII-IRD1, I performed the first in vivo microarray screen, examining expression in brain from Gtf2ird1-/- and WT mice at E15.5 and at birth. Changes in gene expression in the mutant mice were moderate (0.3 to 2.5 fold) and most candidate genes with altered expression

verified using real-time PCR were located on chromosome 5, within 50 Mb of Gtf2ird1. siRNA

knock-down of Gtf2ird1 in two mouse neuronal cell lines failed to identify changes in expression

of any of the genes identified from the microarray and subsequent analysis showed that

differences in expression of genes on chromosome 5 were the result of retention of that

chromosome region from the targeted embryonic stem cell line, and so were dependent upon

strain rather than Gtf2ird1 genotype. In addition, specific analysis of genes previously identified

as direct in vitro targets of GTF2IRD1 failed to show altered expression. In summary, I was

unable to find any in vivo neuronal targets of this putative transcription factor, despite its

widespread and robust expression in the developing rodent brain.

2.2 Literature Review

2.2.1 Evidence supporting the role of TFII-IRD1 as a transcription factor

The first transcriptional target of TFII-IRD1 to be identified was TnIs75. TFII-IRD1 was

identified through a yeast one-hybrid screen as a protein capable of binding to an Inr-like element in the

TnIs promoter75,90. TnIs is initially expressed in all muscle fibers, but during fetal development it becomes up-regulated in future slow-twitch myofibers and down-regulated in future fast-twitch

myofibers91. Slow-twitch myofibers are important for endurance and maintaining posture, while

fast twitch fibers are needed for movement and fast power generation90. A 157bp upstream

enhancer (USE) sequence has been shown to be necessary for slow myofiber-specific expression

of TnIs77, and TFII-IRD1 was shown to bind to an Inr-like element contained in the USE.

Polly et al. showed that TFII-IRD1 is able to repress TnIs transcription78. TFII-IRD1-mediated repression of TnIs may occur through two separate pathways: directly through binding

to the USE, and indirectly through interactions with the nuclear receptor co-repressor (NCoR) protein and/or the transcription factor myocyte enhancer factor (MEF2C)78. MEF2C is an

activator of TnIs expression, and also binds to an element contained within the USE. When a

MEF2C expression construct was transfected into C2C12 (muscle) cells along with a luciferase

reporter construct containing the USE sequence, luciferase expression was increased relative to

controls. Transfection of a TFII-IRD1 expression construct, either alone or in conjunction with

the MEF2C construct, resulted in repression of luciferase expression. Expression of TFII-IRD1

was able to repress expression even when the USE contained point mutations which prevented

TFII-IRD1 binding. TFII-IRD1 was subsequently shown to interact in vitro with both NCoR and

MEF2C; it is possible that TFII-IRD1 could prevent MEF2C from activating TnIs expression by

preventing it from binding to the USE and this may occur in conjunction with NCoR78.

In order to determine the in vivo role TFII-IRD1 plays in regulating TnIs expression, Issa

et al. generated a transgenic mouse which expressed the human GTF2IRD1 gene in all skeletal

muscles beginning early in development92. Phenotypic analysis of adult mice revealed that they

lacked slow-twitch fibers in their hindlimb muscles; the total number of muscle fibers did not

differ from WT, however the muscle was composed almost entirely of fast-twitch fibers. In transgenic embryonic mice, development of slow twitch fibers proceeded normally indicating that the absence of slow fibers in adult mice was the result of postnatal fiber conversion.

Expression of slow-fiber specific genes including TnIs was reduced in the muscle of the

transgenic mice and fast-fiber specific genes were found to be upregulated. These results

indicate that TFII-IRD1 is a repressor of slow fiber-specific genes. Issa et al. hypothesized that

all of the slow-fiber genes which showed decreased expression in the transgenic mice may share

a common regulatory sequence which TFII-IRD1 is able to bind to, and that binding of TFII-

IRD1 represses expression of the genes needed for slow muscle fiber development92.

Hoxc8 was the second gene proposed to be a transcriptional target of TFII-IRD174. Using

a yeast-one hybrid, Bayarsaihan and Ruddle identified TFII-IRD1 as a protein that is capable of binding to the early enhancer (EE) region of the Hoxc8 promoter. The EE sequence is over 200 bp and is located 3 kb upstream of the Hoxc8 transcriptional start site, and has been shown to be necessary for the proper spatial and temporal expression of Hoxc8 in the neural tube and paraxial mesoderm during development93. In transgenic mice where the EE sequence has been deleted,

initial expression of Hoxc8 occurs later than in WT mice and the expression boundaries are

altered. By E11.5 the expression of Hoxc8 is indistinguishable from WT, however the mice

have many phenotypic similarities to Hoxc8-/- mice including an abnormal hindlimb clasping

reflex upon tail suspension and skeletal transformations93. It has been reported that interactions

between TFII-IRD1 and the EE may repress Hoxc8 expression94, however no evidence of this

has been published.

Goosecoid (Gsc) is also proposed to be a direct target of TFII-IRD1, however there are contradicting reports as to how TFII-IRD1 may regulate Gsc68,79. The interaction between TFII-

IRD1 and the Gsc promoter was first identified by Ring et al. who performed a yeast-one hybrid

to identify proteins which interact with the distal element (DE) upstream of the Gsc

transcriptional start site79. Gsc is a transcription factor which plays an important role in the

proper development and patterning of vertebrate embryos95,96. In particular, Gsc-/- mice have

been shown to have craniofacial defects along with fused ribs and abnormalities in the sternum97.

Two regions in the Gsc promoter are necessary for proper expression of the gene: the DE which

is activated by activin and nodal family members, and the proximal element (PE) which is

activated by Wnt signalling95.

After determining that TFII-IRD1 is able to bind to the DE of the Xenopus Gsc promoter in vitro using yeast-one hybrid analysis and electrophoretic mobility shift analysis (EMSA), Ring et al. sought to determine the effect that TFII-IRD1 binding has on Gsc transcription79. They

injected Xenopus embryos with mRNA encoding a VP16-GTF2IRD1 fusion protein along with a reporter gene construct containing the DE sequence. The VP16 domain is a transcriptional activator and ensures that TFII-IRD1 will be constitutively active once translated. The VP16-

TFII-IRD1 fusion protein was able to activate expression of the reporter construct, and was also able to activate expression of the endogenous Gsc gene.

As activin is known to activate Gsc expression through the DE, Ring et al. next determined that TFII-IRD1 was necessary for this activation to occur by co-transfecting

morpholinos which prevent translation of endogenous GTF2IRD1, activin mRNA and a DE

sequence containing reporter construct into Xenopus embryos. Not only did injection of the morpholinos prevent activin from activating the reporter construct, but it also resulted in decreased expression of the endogenous Gsc gene. Based on these observations it was hypothesized that TFII-IRD1 binding to the Gsc DE results in activation of Gsc expression, and this activation was believed to be the result of TFII-IRD1 interacting with other proteins.

Three years later, Ku et al. also found that TFII-IRD1 plays a role in the regulation of

Gsc expression, however they determined that binding of TFII-IRD1 to the DE serves to repress expression of Gsc68. According to Ku et al. Gsc expression in P19 cells can be activated by a

complex of TFII-I and SMAD2 binding to the DE, following stimulation with transforming

growth factor beta (TGFβ). TFII-I belongs to the same gene family as TFII-IRD1 and SMAD2

is a transcription factor which is known to play a role in the regulation of genes that are activated

by TGFβ/activin68. A reduction in Gsc expression could be detected following knockdown of

TFII-I expression in P19 cells using siRNA and in Xenopus embryos using morpholinos.

Together these findings indicate that TFII-I is necessary for the proper activation of Gsc

expression.

When TFII-IRD1 and TFII-I expression constructs were transfected into P19 cells at a

1:1 ratio along with a reporter construct containing the DE, TGFβ-induced expression of the reporter was greatly increased. As the ratio of TFII-IRD1:TFII-I increased, TGFβ was no longer able to activate the reporter construct; in contrast, when the TFII-IRD1:TFII-I ratio was decreased, TFII-IRD1 was no longer able to repress reporter gene expression. ChIP assays were performed on P19 cells and endogenous TFII-IRD1 was found to localize to the Gsc promoter in the absence of TGFβ signalling, and TFII-I was found at the promoter following stimulation of

the cells with TGFβ. This suggests that binding of the TFII-I family members to the DE in the

Gsc promoter may be mutually exclusive; TFII-IRD1 appears to constitutively repress Gsc

expression until the TGFβ signalling cascade is initiated, at which point TFII-I activates Gsc

expression68.

These results appear to contradict the findings of Ring et al.; Ku et al. believe that the

reason for the contradiction lies in the VP16-TFII-IRD1 fusion protein which Ring et al. used.

The VP16 domain is known to be a transcriptional activator and had previously been shown to

cause a transcriptional repressor to serve as a transcriptional activator98. Therefore, the findings

of Ring et al. using the fusion protein can only show that TFII-IRD1 is capable of binding to the

DE, and cannot indicate what result this binding has on Gsc expression. Ring et al. also found

decreased expression of endogenous Gsc when TFII-IRD1 expression was knocked down using

morpholinos. Ku et al. used the same morpholinos to knock down TFII-IRD1 expression and found that in vitro protein synthesis of TFII-I was inhibited; they therefore propose that the decreased expression of Gsc in Xenopus embryos treated with TFII-IRD1 morpholinos may be the result of decreased levels of TFII-I.

Tassabehji et al. provided further evidence that TFII-IRD1 is able to regulate Gsc expression, and their results were in agreement with those of Ring et al.53,79. Tassabehji et al.

knocked down endogenous TFII-IRD1 expression in HEK293 cells using siRNAs and found that

this reduced expression of a reporter construct containing the Gsc promoter sequence52.

Knockdown of endogenous TFII-I did not have any effect on reporter gene expression. Based on

these conflicting findings it seems likely that the effect of TFII-IRD1 and TFII-I binding to the

DE in the Gsc promoter may depend on both cell type and which cellular signalling pathways have been activated.

Further support of the notion that TFII-IRD1 may be able to positively and negatively regulate a specific target gene depending on the cell type comes from the work of Tantin et al., who studied regulation of the murine immunoglobulin heavy chain (IgH) promoter59. The expression of IgH is restricted to B lymphocytes, and an element downstream of the transcription

start site, termed the downstream immunoglobulin control element (DICE), had previously been

shown to confer specific activation of the IgH promoter in B lymphocytes99. In order to identify

proteins which interact with the DICE sequence, WT DICE segments were coupled to latex

microspheres and incubated with B cell nuclear extracts. A 110 kDa protein was isolated which

bound with greater affinity to the WT DICE sequence than to a mutant sequence, and mass

spectroscopy revealed this protein to be TFII-IRD1. The in vitro affinity of TFII-IRD1 for the

DICE sequence was confirmed using EMSA. Interestingly, the EMSA data showed that TFII-I

was also able to form complexes with DICE.

In order to determine the consequence of TFII-IRD1 binding to DICE, a dominant

negative TFII-IRD1 construct was generated99. The mutant protein retained the ability to bind to

DICE, but was unable to interact with other proteins and form higher order complexes. When

the dominant negative protein was over expressed in an M12 cell line (mature B cell

plasmacytoma cells) along with an IgH reporter construct, IgH promoter activity was reduced

indicating that TFII-IRD1 positively regulates IgH expression. However when the same mutant

protein and reporter construct were over expressed in murine HAFTL cells (a pre-B cell line), promoter activity was increased. This result would indicate that TFII-IRD1 negatively regulates

IgH expression. Evidence for negative regulation of IgH promoter activity by TFII-IRD1 was also found in a third murine pre-B cell line (70Z/3) using the dominant negative protein and siRNA knockdown of endogenous TFII-IRD1 expression.

It seems likely that IgH expression in B lymphocytes is regulated

temporally, and through interactions with other proteins present in the cell, TFII-IRD1 is able to ensure proper IgH expression at the correct stage of B-cell development.

Another gene which may be counter regulated by members of the TFII-I gene family is vascular endothelial growth factor receptor 2 (VEGFR2)67. The VEGFR2 promoter does not

contain a TATA box sequence, but does have an Inr element. TFII-I has been shown to activate

VEGFR2 expression through binding to the Inr100. Jackson et al. demonstrated that TFII-I

mediated activation of VEGFR2 expression could occur even in the absence of a functional Inr

sequence, and that TFII-I is able to bind to three different E-box sequences located in the VEGFR2

promoter67. As TFII-I and TFII-IRD1 had previously been shown to counter regulate the same

genes, they then went on to see if TFII-IRD1 also plays a role in VEGFR2 transcriptional

regulation.

While no direct interaction between the VEGFR2 promoter and TFII-IRD1 was detected, when

TFII-IRD1 was transfected into bovine pulmonary artery endothelial (BPAE) cells along with a

VEGFR2 promoter-reporter construct, basal reporter activity was decreased.

The majority of studies on the function of TFII-IRD1 in transcriptional regulation have focused on specific target genes; however, in 2006 Chimge et al. performed an unbiased screen to identify new transcriptional targets of this putative transcription factor101. An immortalized MEF cell line was transfected with an expression construct containing GST-TFII-IRD1. This resulted in a 6.6-fold increase in Gtf2ird1 mRNA as determined by qRT-PCR. Two separate transfection experiments were performed, and RNA was extracted from the cells 24 hours later. Microarray analysis was performed using the Operon microarray chip which contains probes for 16,460

genes. Approximately 2000 genes were found to be altered by more than 1.7 fold relative to

mock transfected controls. It is important to note that a low statistical cut-off was used in

generating this list so that it would include genes which had previously been shown to be

regulated by or interact with the TFII-I gene family.
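
To make the fold-change criterion concrete, the following minimal Python sketch flags a gene as altered when its expression ratio exceeds a cut-off in either direction. It is an illustration only and not the analysis performed by Chimge et al.; the symmetric treatment of up- and down-regulation, the function name and the example intensities are all assumptions.

```python
import math

def is_altered(treated, control, fold_cutoff=1.7):
    """Return True if expression changed by more than fold_cutoff in either direction.

    Assumption: "altered by more than 1.7 fold" is read symmetrically, i.e.
    treated/control > 1.7 or treated/control < 1/1.7.
    """
    if treated <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    # Working on the log2 scale makes up- and down-regulation symmetric.
    return abs(math.log2(treated / control)) > math.log2(fold_cutoff)

# Hypothetical signal intensities (transfected, mock) for three made-up genes.
example = {"geneA": (340.0, 100.0), "geneB": (100.0, 150.0), "geneC": (40.0, 100.0)}
altered = sorted(g for g, (t, c) in example.items() if is_altered(t, c))
print(altered)
```

A fold-change cut-off of this kind says nothing about statistical significance, which is why the low statistical cut-off noted above matters when interpreting the resulting gene list.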

A total of 11 genes were selected for validation by qRT-PCR; G1p2, Ccl7, Ube2I6, Tgfb2 and Shrm were confirmed to be up-regulated in MEFs which over express TFII-IRD1, while

Folr1, Tgfbr2, Csrp2 and Dlk1 were confirmed to be down-regulated101. FoxH1 and Cfl were

down-regulated according to the microarray results, however qRT-PCR analysis revealed that

expression of these genes was increased in three separate transfection experiments.

Chimge et al. then went on to identify further potential targets of TFII-IRD1 using a

bioinformatics approach102. They combined the results of previous SELEX experiments and

known TFII-IRD1 binding sites to derive the consensus sequence “BRGATTRBR”, and used

this sequence to search a database of transcriptional start sites. The consensus sequence was

identified within 1kb of the start site in 1772 mouse/human orthologous pairs. Of these genes,

601 were identified as being regulated by TFII-IRD1 in the microarray performed on MEF cells.

ChIP analysis was used to show that when both TFII-IRD1 and TFII-I are over expressed in MEF cells they can bind to the promoters of a number of these genes, and siRNA knockdown of Gtf2ird1 and Gtf2i in MEFs resulted in alterations in the expression of a number of genes including Cfl1, Opn, Fgf11 and Ccnd3102.
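
The consensus sequence is written in IUPAC nucleotide code, in which B denotes C, G or T and R denotes A or G. As a hedged illustration of how such a motif can be located in a promoter sequence, the sketch below converts the consensus to a regular expression and scans both strands. It is not the pipeline used by Chimge et al.; the function names and the toy sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq, consensus="BRGATTRBR"):
    """Return (start, strand, match) tuples; start is 0-based on the forward strand."""
    # The lookahead wrapper lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    width = len(consensus)
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        hits.append((len(seq) - m.start() - width, "-", m.group(1)))
    return sorted(hits)

# Toy sequence (hypothetical) containing a single forward-strand match.
toy = "AAAATGGATTACAAAAA"
print(find_sites(toy))
```

In practice the scan would be restricted to a window around each annotated transcriptional start site, such as the 1 kb window described above.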

2.2.2 Cellular localization of TFII-IRD1

Bayarsaihan et al. looked at cellular localization of TFII-IRD1 during mouse embryonic development using an anti-TFII-IRD1 antibody103. TFII-IRD1 can be detected in the nucleus

beginning at the two-cell stage (the onset of zygotic gene expression), until E3.5. At E4.5 the

localization of TFII-IRD1 shifts, and it can only be detected in the cytoplasm of trophoblast

cells. TFII-IRD1 expression remains cytoplasmic until E7.5 when nuclear expression can be

detected in the neural ectoderm and embryonic mesoderm. Cellular localization in the later

stages of development was not examined in this study, but multiple studies on the localization

of TFII-IRD1 in cultured cells have been performed.

Endogenous TFII-IRD1 has been shown to localize specifically to the nucleus in HeLa

cells94 and C2C8 myoblast cells57, as has GFP-tagged TFII-IRD1 when over expressed in COS7

cells68,94. It has also been suggested that TFII-IRD1 and TFII-I may affect the cellular localization of the other when co-expressed. Tussie-Luna et al. proposed that TFII-IRD1 can exclude TFII-I from the nucleus, thereby repressing the activation of TFII-I-responsive genes94.

When GFP-tagged TFII-IRD1 and GST-tagged TFII-I were co-expressed in COS7 cells, TFII-

IRD1 was predominantly found in the nucleus, while the majority of TFII-I was found in the cytoplasm. The expression of TFII-I alone resulted in nuclear localization of the protein, and activation of TFII-I target genes94. However, a study published three years later, by many of the

same authors, found that when the same GFP-TFII-IRD1 and GST-TFII-I constructs were co-expressed in COS7 cells TFII-I localized to the nucleus while TFII-IRD1 was found in the cytoplasm68.

The nuclear localization of TFII-IRD1 described in these studies is consistent with the

role of TFII-IRD1 as a transcription factor.

2.3 Materials and Methods

2.3.1 Generation of probes for in situ hybridization

The Hoxc8 probe sequence was amplified from WT mouse genomic DNA using the following primer sequences: FOR-5’ GGAACCGGCCTATTACGACT 3’ and REV-5’ TTAAGTGGCCTTGTCCTTCG 3’. The Gtf2ird1 probe sequence was amplified from WT mouse cDNA (from P0 brains) using the following primer sequences: FOR-5’ AACAGACTGGGGGAGAAGGT 3’ and REV-5’ CCTTGGCGGCAGGAATATAG 3’.
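
As a quick sanity check on primers such as those listed above, length, GC content and an approximate melting temperature can be computed in a few lines. The sketch below uses the Wallace rule (2 °C per A/T and 4 °C per G/C), which is only a rough estimate for short oligonucleotides. It is offered as an illustration and was not part of the original protocol; the function and dictionary names are hypothetical.

```python
def primer_stats(seq):
    """Return length, GC fraction and Wallace-rule Tm (in degrees C) for a DNA primer."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return {
        "length": len(seq),
        "gc_fraction": gc / len(seq),
        # Wallace rule: Tm = 2*(A+T) + 4*(G+C); a coarse approximation only.
        "wallace_tm_c": 2 * at + 4 * gc,
    }

# Primer sequences as given in the text, written 5' to 3'.
primers = {
    "Hoxc8_FOR": "GGAACCGGCCTATTACGACT",
    "Hoxc8_REV": "TTAAGTGGCCTTGTCCTTCG",
    "Gtf2ird1_FOR": "AACAGACTGGGGGAGAAGGT",
    "Gtf2ird1_REV": "CCTTGGCGGCAGGAATATAG",
}

for name, seq in primers.items():
    print(name, primer_stats(seq))
```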

The PCR amplicons were purified using Microclean (Microzone) when a single PCR product was detected using gel electrophoresis, and gel extracted using the QIAquick gel extraction kit (Qiagen) when multiple products were detected. The purified sequences were then cloned into the pCR2.1-TOPO TA cloning vector (Invitrogen) and excised using EcoRI digestion (New England Biolabs). The excised fragment was then ligated into the pBluescriptII KS (Fermentas) vector using the EcoRI site in the multiple cloning region. Restriction enzyme digestion was used to identify clones carrying the insert in the forward and reverse orientation in order to generate sense and antisense probes. The plasmids containing the probe sequences were linearized at the 5’ end of the probe sequence, purified using phenol-chloroform extraction and then precipitated using RNase-free NaOAc and ethanol.

DIG-labelled probes were generated using in vitro transcription under RNase-free conditions. 1 µg of template DNA (linearized vector) was transcribed with T3 (Boehringer

Mannheim #1031171) or T7 (Boehringer Mannheim # 108981767) RNA polymerase using 10X

DIG RNA labeling Mix (Roche #11277073910) following the manufacturer's protocol.

2.3.2 Whole mount in situ hybridization of Gtf2ird1-/- embryos

Male and female mice were housed together overnight, and the female was checked for a vaginal plug in the morning. 12:00 pm on the day the plug was found was considered to be E0.5.

Embryos were collected from the pregnant mother at E11.5. The embryos were dissected from the uterus into ice cold RNase-free phosphate buffered saline (PBS). The back of the head was punctured with a needle, and the embryos were fixed in 4% paraformaldehyde-PBS at 4° C overnight. The embryos were then washed twice with PBS containing 0.1% Tween-20 (PBT) and dehydrated with a series of methanol/PBT washes. Each wash was 15 minutes long, with rocking, and the methanol concentration was increased from 25% to 50%, 75% and then 100%.

The embryos were then stored in 100% methanol at -20° C.

Before hybridization, the embryos were rehydrated by taking them through the methanol/PBT series in reverse and then washing twice with PBT. They were then bleached with 6% hydrogen peroxide in PBT for 1 hour at room temperature, washed 3x in PBT, treated with 10 µg/mL proteinase K in PBT for 15 minutes, washed for 10 minutes in 2 mg/mL glycine in PBT, washed 2x in PBT, refixed for 20 minutes in 0.2% glutaraldehyde/4% paraformaldehyde in PBT and finally washed 2x with PBT.

The embryos were then placed in a 2 mL tube filled with hybridization solution (50% formamide, 5x SSC pH 4.5, 50 µg/mL yeast RNA, 1% SDS, 50 µg/ml Heparin, 0.1% CHAPS,

5 mM EDTA). Once the embryos had sunk to the bottom of the tube the solution was replaced and the embryos were incubated at 70° C for 1 hour. The hybridization solution was then replaced again, the probe was added at a concentration of 1 µg/mL, and the embryos were incubated at 70° C overnight.

Following hybridization the embryos were washed 2x 30 min at 70° C with solution 1 (50% formamide, 5X SSC pH 4.5, 1% SDS, 0.1% CHAPS), 3x 5 min at 70° C with solution 2 (0.5M NaCl, 10 mM Tris-HCl pH 7.5, 0.1% Tween-20, 0.1% CHAPS), 2x 30 min at 37° C with 100 µg/mL RNase A in solution 2, 2x 30 min at 65° C in solution 3 (50% formamide, 2X SSC pH 4.5, 0.1% CHAPS) and then 3x 5 min with TBS-T (TBS with 0.1% Tween-20). The

embryos were then pre-blocked for 60 min with heat inactivated 10% sheep serum in TBS-T

before incubation with preabsorbed anti-Digoxigenin-AP antibody (Roche #11093274910)

overnight at 4° C.

The antibody was preabsorbed using embryo powder generated from E11.5 mouse

embryos. To prepare the powder embryos were homogenized in a minimal volume of PBS, 4

volumes of ice cold acetone were then added and the mixture was incubated on ice for 30 min.

Following centrifugation at 10,000 g for 10 min the supernatant was removed and the pellet was

washed with ice cold acetone and spun down again. The pellet was then dried out on a sheet of

filter paper and ground into a fine powder. For each embryo used in the in situ hybridization 3

mg of embryo powder was added to 0.5 mL of TBS-T and 5 µL of sheep serum. The mixture

was incubated at 70° C for 3 min and then cooled on ice. 1 µL (0.75 U) of anti-Digoxigenin-AP

antibody was added and incubated for 60 min at 4 ° C. Following centrifugation for 10 min the

supernatant was collected and diluted to 2 mL with 1% sheep serum in TBS-T. The embryos

were placed in this pre-absorbed antibody solution for overnight incubation.

The next morning the embryos were washed 3x 5 min and then 5x 60 min with TBS-T

containing 2 mM levamisole, and then 3x 10 min with NTMT (100 mM NaCl, 100 mM Tris-HCl

pH 9.5, 50 mM MgCl2, 0.1% Tween-20, 2 mM levamisole). The embryos were then incubated with 200 µL of NBT/BCIP solution (Roche #1681451) in 10 mL of NTMT. Once the colour

had developed to the desired extent, the embryos were washed once with NTMT and twice with PBT containing 1 mM EDTA.

2.3.3 In situ hybridization of P0 mouse brain sections

The entire head of WT P0 mouse pups was removed and fixed in 4% paraformaldehyde

(PFA) in PBS overnight at 4° C. The following day the heads were washed 2x 30 min in PBS at

4° C, and then incubated at 4° C in 30% sucrose in PBS overnight or until the heads sank.

The heads were then rinsed in O.C.T Compound (Tissue-Tek, #4583), and then immersed and frozen in O.C.T. compound and stored at -80° C. A cryostat was used to cut 10 micron sections which were collected on silane-prep slides (Sigma, #S4651), air dried and then stored at -80° C.

All solutions used while performing in situ hybridization were RNase free. Sections on slides were re-fixed in cold 4% PFA-PBS for 10 min and then washed 3x 10 min in PBS. Slides were then incubated in acetylation mix (0.013% triethanolamine, 0.003% acetic anhydride) and washed 3x 5 min. in PBS. A humidified chamber was created by placing kimwipes soaked in

50% formamide/ 5x SSC into a tupperware container. Disposable pipettes were placed on the bottom of the container and the slides were laid on them. Each slide was pre-hybridized with

200 µl of hybridization buffer (50% formamide, 5X SSC, 0.25 mg/mL yeast tRNA, 0.5 mg/mL salmon sperm DNA, 5X Denhardt’s) in the humidified chamber for 4 hours at room temperature.

The DIG-labelled RNA probes were added to hybridization buffer at a concentration of 200 ng/mL. The probe solution was heated at 80° C for 5 min. then added to the slides and incubated in the humidified chamber at 60° C for 16 hr.

Before incubation with an anti-DIG antibody, the slides were rinsed in 5X SSC at 60° C, washed in 0.2X SSC at 60° C for 1 hr, then for 10 min., rinsed in solution B1 (pH 7.5, 0.1M

maleic acid, 0.15M NaCl, 0.175M NaOH), and then blocked in solution B1 with 1% blocking

reagent (Roche # 11096176001) for 1 hr. The slides were then incubated in a 1:5000 dilution of

anti-Digoxigenin-AP antibody (Roche #11093274910) in solution B1 for 1 hr at room temperature. Following incubation with the antibody, slides were washed 2x 15 min. in solution

B1, rinsed in solution B3 (1M Tris, pH 9.5, 5M NaCl, 1M MgCl2) for 5 min and then incubated in

solution B3 with 2% NBT/BCIP solution (Roche #1681451) until the colour had developed to

the desired extent. The slides were then washed 2x in PBS.

2.3.4 Preparation and culture of mouse embryonic fibroblast (MEF) cells

Male and female mice were housed together overnight, and the female was checked for a

vaginal plug in the morning. 12:00 pm on the day the plug was found was considered to be E0.5.

Embryos were collected from the pregnant mother at E15.5. If the genotype of the embryos was

unknown, yolk-sacs were collected for genotyping. The embryos were dissected from the uterus

into sterile PBS. The head, limbs and internal organs were removed from the embryos, and the

carcasses were washed 3x with sterile Dulbecco's Modified Eagles Medium (D-MEM) (Sigma-

Aldrich D5796). The embryos were then minced into small pieces and placed in a 50 mL tube

with 10 mL Trypsin-EDTA (Gibco #25200056) and 5 mL of sterile 3 mm glass beads (Sigma-

Aldrich Z265926) and incubated at 37° C for 90 min with shaking. 10 mL of Trypsin-EDTA

was added at 30 min. intervals for a total volume of 30 mL. The cell suspension was then

decanted into two 50 mL tubes, each containing 3 mL of fetal bovine serum (FBS; Gibco

#12483-020). The tube containing the glass beads was washed 2x with D-MEM + 10% FBS

and the washings were added to the cell suspension mixture. This mixture was then centrifuged

at 200 g for 5 min. and the pellet was suspended in 50 mL of D-MEM + 10% FBS. The cells

were counted and 5 x 10^6 cells were plated in D-MEM + 10% FBS & 1X penicillin-streptomycin

(Sigma-Aldrich P4333). Cells were cultured at 37° C with 5% CO2. Cells were passaged at least

twice before any experimentation to ensure a homogenous population of MEF cells was present.

2.3.5 Dissection of mouse tissues and RNA isolation

P0 mice were sacrificed using decapitation. The whole brain was removed and

immediately flash frozen in liquid nitrogen. Tails were collected for genotyping when necessary.

For embryonic dissections, the mother was sacrificed using cervical dislocation and the uterus was removed and placed in ice-cold PBS. The embryos were immediately removed from the

yolk-sacs and placed into a separate dish of ice-cold PBS. Yolk-sacs were collected for genotyping. Entire E11.5 embryos, and the heads of E15.5 embryos, were flash frozen in liquid nitrogen. The embryos were then homogenized in TriReagent (Sigma-Aldrich Canada, Oakville,

ON) and stored at -80° C. Total RNA was extracted following the manufacturer’s protocol.

2.3.6 Genotyping of P0 and embryonic mice

Genomic DNA was isolated from yolk-sacs (embryonic) or tail clippings (P0). The tissues were incubated in 400 µL of lysis buffer (0.5% SDS, 0.1M NaCl, 50 mM Tris, pH 8.0,

0.5 µM EDTA, 0.25 µg/µL proteinase K) at 52° C until tissue was no longer visible. To purify the DNA, potassium acetate was added to a final concentration of 1.2M, and a volume of chloroform equal to the total solution volume was added. The solution was incubated at -20° C for 20 min. and then centrifuged for 5 min at 12,000 g at room temperature. The aqueous phase was transferred to a new tube, and the DNA was precipitated with 2 volumes of 100% ethanol.

The DNA was then centrifuged again for 5 min at 12,000 g and the pellet was washed with 70% ethanol before resuspension in 100 µL of nuclease free water.

Samples were genotyped using conventional PCR. Two separate PCR reactions were performed for each sample. Each reaction used the same forward primer (For-5’

CGACCACCATAGGTTGAAGG 3’), located in the first of Gtf2ird1, in a region found in

WT and Gtf2ird1-/- alleles. The reverse primers were designed to distinguish between WT and

Gtf2ird1-/- alleles. One reaction used a sequence (Rev- 5’ GGGGAACTTCCTGACTAGGG 3’)

present in the NEO-cassette which was inserted into Gtf2ird1-/- alleles, and would only amplify from Gtf2ird1-/- alleles. The other reaction used a sequence (Rev- 5’

TGGGGAACTGTTTGAGAAGG 3’), which is located in an area of the first intron of Gtf2ird1

which is deleted from Gtf2ird1-/- alleles, and would only amplify from WT alleles. The genotype

of each mouse (WT, Gtf2ird1+/- or Gtf2ird1-/-) was determined based on the results of the two

PCR reactions.
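
The logic of combining the two allele-specific reactions can be summarised in a short sketch. This is an illustration of the decision rule described above rather than software used in the study; the function name and the handling of a sample with no product in either reaction are assumptions.

```python
def call_genotype(wt_product, neo_product):
    """Infer the genotype from the presence of a band in each PCR reaction.

    wt_product:  True if the reaction with the WT-specific reverse primer amplified.
    neo_product: True if the reaction with the NEO-cassette reverse primer amplified.
    """
    if wt_product and neo_product:
        return "Gtf2ird1+/-"   # both alleles present
    if wt_product:
        return "WT"            # only the WT allele amplified
    if neo_product:
        return "Gtf2ird1-/-"   # only the targeted allele amplified
    # Assumption: no product in either reaction is treated as a failed
    # PCR that should be repeated, not as a genotype.
    return "no call"

for bands in [(True, False), (True, True), (False, True), (False, False)]:
    print(bands, "->", call_genotype(*bands))
```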

2.3.7 Microarray analysis using the Affymetrix mouse 430 2.0 gene chip

The extracted total RNA from the brains of P0 newborn mice was cleaned up using an

RNeasy kit (Qiagen) and run on a 1.2% agarose/formaldehyde denaturing gel to determine the

integrity of each sample. The concentration of each sample was then determined using a

spectrophotometer (Beckman DU 530). RNA from individual samples was pooled together.

Three pools containing RNA from WT mice were created along with three pools containing

RNA from Gtf2ird1-/- mice. Each of the 6 pools contained equal amounts of RNA from 9

different mice, at a final concentration of 1 µg/mL.
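
Pooling equal amounts of RNA requires a different volume from each sample, depending on its measured concentration. The arithmetic can be sketched as follows; the concentrations and the mass contributed per sample are hypothetical values chosen for illustration, and the function names are not taken from the original protocol.

```python
def pooling_volumes(concentrations_ng_per_ul, mass_per_sample_ng):
    """Volume (in µL) of each sample needed to contribute the same RNA mass."""
    if any(c <= 0 for c in concentrations_ng_per_ul):
        raise ValueError("concentrations must be positive")
    return [mass_per_sample_ng / c for c in concentrations_ng_per_ul]

def pooled_concentration(concentrations_ng_per_ul, mass_per_sample_ng):
    """Concentration (ng/µL) of the pool before any further dilution."""
    volumes = pooling_volumes(concentrations_ng_per_ul, mass_per_sample_ng)
    total_mass = mass_per_sample_ng * len(volumes)
    return total_mass / sum(volumes)

# Hypothetical spectrophotometer readings (ng/µL) for three samples.
readings = [500.0, 1000.0, 250.0]
print(pooling_volumes(readings, mass_per_sample_ng=1000.0))
print(pooled_concentration(readings, mass_per_sample_ng=1000.0))
```

The pool can then be diluted or concentrated to the desired final concentration.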

Microarray analysis was performed by The Centre for Applied Genomics (TCAG) at the

Hospital for Sick Children (Toronto, ON). The RNA was first analyzed on a Bioanalyzer to

ensure that it was of good quality, and then each pool was analyzed on the Affymetrix mouse

430 2.0 gene chip (which contains probes for over 39,000 transcripts) following the

manufacturer’s protocol.

The signals from the gene chips were normalized using Robust Multiarray Analysis

(RMA)104. Differences in gene expression were detected using a second software program,

Significance Analysis of Microarrays (SAM)105, which uses q values as a measure of the false

discovery rate.

2.3.8 Microarray analysis using the Illumina mouseWG-6 v2.0 BeadChip

The extracted total RNA from the heads of E15.5 mice was cleaned up using an RNeasy

kit (Qiagen) and run on a 1.2% agarose/formaldehyde denaturing gel to determine the integrity

of each sample. The concentration of each sample was then determined using a nanodrop

spectrophotometer (Beckman DU 530). RNA from 5 WT mice and 5 Gtf2ird1-/- mice was used

for microarray analysis. WT and Gtf2ird1-/- littermates were used, with 3 WT and 3 Gtf2ird1-/- mice collected from one litter, and 2 WT and 2 Gtf2ird1-/- mice collected from a second litter.

RNA samples were not pooled in this experiment.

Microarray analysis was performed by TCAG at the Hospital for Sick Children (Toronto,

ON). The RNA was first analyzed on a Bioanalyzer to ensure that it was of good quality, and then each sample was analyzed on the Illumina Mouse WG-6 v2.0 Expression BeadChip (which contains probes for over 45,200 transcripts) following the manufacturer’s protocol.

Analysis of microarray data was performed by the Statistical Analysis Core Facility at TCAG. Data pre-processing included three steps: background correction was performed in the BeadStudio program (Illumina), the data were then transformed to the log2 scale, and quantile normalization106 was performed. Differentially expressed genes were identified using LIMMA (linear models for microarray data)107. LIMMA fits a linear model for each gene and then uses an empirical Bayes method to moderate the standard errors, shrinking them towards a common value, before computing a moderated t-statistic for each gene. Because the residual standard deviations are moderated across genes, inference for each individual gene is more stable. The moderated standard deviations are a compromise between the individual gene-wise standard deviations and an overall pooled standard deviation.
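The pre-processing and variance moderation described above can be illustrated with a short sketch. It assumes a genes x arrays matrix of background-corrected intensities; the function names and example numbers are hypothetical, and the prior values d0 and s0_sq are passed in directly, whereas LIMMA estimates them from the data by empirical Bayes. This is an outline of the two ideas, not the pipeline that was run at TCAG.

```python
import numpy as np


def quantile_normalize(log_expr):
    """Quantile-normalize a genes x arrays matrix (ties broken by sort order)."""
    x = np.asarray(log_expr, dtype=float)
    # Rank of every value within its own array (column).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Reference distribution: mean of the sorted columns at each rank.
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)
    return mean_quantiles[ranks]


def moderated_variance(s_sq, d, s0_sq, d0):
    """Shrink gene-wise variances s_sq (d residual df) toward s0_sq (d0 prior df)."""
    s_sq = np.asarray(s_sq, dtype=float)
    return (d0 * s0_sq + d * s_sq) / (d0 + d)


# Invented intensities for three genes on two arrays.
raw = np.array([[4.0, 8.0], [16.0, 2.0], [64.0, 32.0]])
normalized = quantile_normalize(np.log2(raw))

# Invented gene-wise variances with 8 residual df, and an assumed prior.
shrunk = moderated_variance([1.0, 4.0], d=8, s0_sq=2.0, d0=4)
```

Each shrunken variance lies between the gene's own variance and the prior value, which is the compromise referred to in the text. The moderated t-statistic then uses a standard error built from the shrunken variance, so genes with unusually small sample variances are less likely to be called on that basis alone.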

2.3.9 Expression analysis using quantitative Real-Time PCR

Following extraction, total RNA samples were treated with DNase (Turbo DNA free,

Ambion) and 5 µg of RNA was converted to cDNA using the Superscript II First-Strand

Synthesis System (Invitrogen Canada Inc., Burlington, ON) and random hexamer primers.

cDNA samples were diluted 1/100 with sterile water and subjected to real-time PCR analysis using the Power SYBR Green PCR Master mix (Applied Biosystems, Foster City, CA) and the ABI Prism 7900HT sequence detection system (Applied Biosystems, Foster City, CA).

Primers used for expression analysis are listed in Table 2.1. Samples were run in triplicate, and each experiment was repeated at least twice with consistent results. Absolute quantification analysis was used; each plate included a no template control (water) and serially diluted concentrations of control genomic DNA (range 0.63 – 10 ng/well) to generate a standard curve for transcript quantification. All test genes were normalized to the housekeeping gene succinate dehydrogenase (Sdha). Samples which included RNA and all of the reagents to produce cDNA, except for reverse transcriptase, were run as a negative control to ensure that there was no genomic contamination of the samples.
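The absolute quantification described above amounts to fitting a standard curve and reading unknowns off it. A minimal sketch, assuming a two-fold dilution series spanning the stated range and invented Ct values; the function names are hypothetical and none of the numbers are data from this study:

```python
import numpy as np


def fit_standard_curve(quantities_ng, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    slope, intercept = np.polyfit(np.log10(quantities_ng), ct_values, 1)
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to get the quantity for a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


# Assumed two-fold genomic DNA dilutions (ng/well) with illustrative Ct values.
standards_ng = [10.0, 5.0, 2.5, 1.25, 0.625]
standard_ct = [25.0, 26.0, 27.0, 28.0, 29.0]
slope, intercept = fit_standard_curve(standards_ng, standard_ct)

# Illustrative Ct values for a test gene and for Sdha in the same sample.
target_qty = quantity_from_ct(27.0, slope, intercept)
sdha_qty = quantity_from_ct(26.0, slope, intercept)
normalized = target_qty / sdha_qty  # expression relative to Sdha
```

In practice a separate curve would be fitted for each primer pair, since amplification efficiency can differ between amplicons.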

Table 2.1 Primers used for quantitative real-time PCR amplification from cDNA

Primer name   Forward primer sequence (5’ – 3’)   Reverse primer sequence (5’ – 3’)

Housekeeping genes:
mHmbsRT   TCCAAGAGGAGCCCAGCTA   ATTAAGCTGCCGTGCAACA
mHprt1RTe3   TGCTCGAGATGTCATGAAGG   AATGTAATCCAGCAGGTCAGC
mSdha   TGATCTTCGCTGGTGTGGATGTCA   CCCACCCATGTTGTAATGCACAGT

Gtf2ird1:
mGtf2ird1e2   ACTGTGACATCCCCACCAAC   GAGTCTAAGGCGGACACCAG
mGtf2ird1e9   CGAGGCTGTGGAAATTGTG   TGTGTCGCTCCTCCAGAATC
mGtf2ird1e21   TGAAGCTCTGGGCATCAAAT   GGGGTAGGCCTTCAATGATTA

Gtf2ird1 candidate target genes identified in vitro:
mBmpr1b_3UTR   GAAGGGTTGGTGTCACTGGT   TGAAAGAGCTGCCTACCACA
mCcnd3_3UTR   GCTCCAACCTTCTCAGTTGC   TAGGGCAGCTCCTCATAAGC
mCfl1_3UTR   GCTATCCCTTCACCCCAGTT   TCAAAAGCAGTTTGGGAAGG
mEpc1_3UTR   GCAGGGAGTATGGAGAGCAC   AGCACGAGAGATTCGAGAGC
mEzh2_e14   ACGGCTCCTCTAACCATGT   CTATCACACAAGGGCACGAA
mFgf11_e4   CTCTCTACCGTCAGCGTCGT   GCTGCCTTGGTCTTCTTGAC
mFgf15_3UTR   CGAGGAAGCCAGAAGGTATG   GGCAAGCTAAGATCCCATGA
mGscRTe1   GCATGTTCAGCATCGACAAC   GTAGAGCCGGGAAGACCAC
mHoxc8_3UTR   AGGGAATGAGGAAGAGGAGAA   AAACTTCAAGGGAGTTGCTG
mLhx1_e2   GGCGAGGAGCTCTACATCAT   TGTTCTCTTTGGCGACACTG
mOpn_e6   TTCCAAAGAGAGCCAGGAGA   TTGTGGCTCTGATGTTCCAG
mTgfb2_e6   GCAGGATAATTGCTGCCTTC   TGTACCCTTTGGGTTCATGG

Gtf2ird1 candidate target genes identified in P0 mice:
4833441D16Rik   CCACCAGTGCAGTGAAAATG   ATGGCTCAGGTCAGAGGAAA
mAI506816   ATAGTGGCCCCATCAAAGTG   AGCCAGTCAAGGATGGTTTG
mAI536236 (Mphosph9)   CCCACGCGTTAGAAAGAGAG   TGACTTACTGGGGTGGGAAG
mAI647811 (BC046410)   TGGGCCTTCCTCATATTCAG   TACCCATGCTGGAGGAAGTC
mAK018172   AGGCAGGAGTGGTGTTCACT   CACCCCCAGTTGTTCTCACT
mAkap9_3UTR   TCAATGGCTCTTTTGTGCTTT   TTCATGTGCTGCTGCTAAGG
mAU019852   AGACCAGGCTGACCACAAAC   GATGAAAGAGCCTGGCGTAG
mAuts2_e24   CAGCACCTCTAGTCGGGAAG   CTTCCTTGCGTTCCTCTTTG
mAV343709 (A930011O12Rik)   GGGTGTGTCCCCAGCTAATA   GGTCAAGTGCCTTCCACATT
mAW556697 (Arrdc3)   ACTGGTCCGAAACAGGATTG   GGAAATACAGGCGACTCCAA
mBB023775   GGGTGATACGGAAGGTTTGA   TCTGAGACACGGTTTTGCTG
mBB040120   TGGTTACCATGGGCATTTG   ATGGAAAGTGGCAGCATAGG
mBB051515   GAGCTGTGCTTTTGTGTGGA   GTGGGATTTCCGTGAGACTG
mBB167280   TTGAGTGAGTGTGTGCGTGA   AGCTCCACAGGACCAACATC
mBB202611 (Dzip1)   TAATCGTATGCAGGGCTGGT   CTAGCGATGCTGCTTGTACG
mBB206454   GGGAAAAGCAAAACAAACCA   CCTGGTGTTCACCTCATCCT
mBB337886   TTTGGTCAGGATGTCTTAGTGC   TGTGAGTTTGTAATGTCCAGCA
mBB373816   AAGCTGGCTTCAAGGAAGAA   TCAGGGGAATCGTTTCAGAC
mBB451211   TTTCCTGGACACTTGCACTG   ATGAGCATGAAGCTCCCATT
mBE956180 (Hpvc-ps)   TGGCTGGTGTTCAGACACTC   ACTGCCTTACACCAGGGATG
mBG070910   TGTTGCTTCTCGTGTTCTGG   GGCAGAGGACATTTGGAAGA
mBG145571   CCGGAACTCAAAAATGTGCT   GGAGGCCTAGGCAACATAGA
mBM199880 (Zfr)   TCTCCAGCCCTCTTGTGACT   ATCCATAGCACTGCCCATGT
mBM200210 (Pex1)   GTTTGCTCCCATCTCTCCAG   GCAAGTGGCACTGATGGTAA
mBM234702   TCTGCTGTGTCTGCTTGTCA   CCCCCATTGTAGCTTCTTGA
mBM239037   CAGCTGAGTGCTTGCTGAGT   GGCAGTTTTCCTCAGTTGCT
mCcm1_3UTR   CTGGCCTTGTGGTAAAGCTG   CAAAATGTGGTGGTTTGTACTCA
mCyp51_3UTR   AAGCCAGTGTGGAGAGAGGA   CAACCCAGTACAGCACGAGA
mDcx_3UTR   TCGCTCAAGTGACCAACAAG   GGCCCAGAAGAGAAGTCACA
mDhcr24_e7   CATCTTCCGCTACCTCTTCG   TACAGCTTGCGTAGCGTCTC
mEif2s1_3UTR   TGGTAGAACTCAATGGGCAAG   TCTACCAGGGGTCAAATTCC
mGprc5b_3UTR   ATCTCACACGGGAAGACACC   GCCCTCAAGAAAGACACAGC
mHoxA5_3UTR   CGGGCAGCTCTCTGTAGTGT   ACGAGAACAGGGCTTCTTCA
mKcnh1_e11   GTGTCCAAGGCAGAGTCCAT   ATTCCGCTGTCACAGGAGTC
mKin_3UTR   TGAAAGGACGCAGAGTTGAA   GTGCCTTGGCTAACACCAAT
mLztf1_3UTR   GAACCTGCCACACATGAACA   CAAGGAAAGCCTAAACATTGG
mMapk8_3UTR   GATGACTACTTGGGCCTTGG   TCACTCAAAAATATGACCACTGAA
mMat2a_e1   AAGCGATCCTCCCTCTGTGT   CGGCGGTGAGAGAGGGCGAC
mMospd3_3UTR   CTCTCAGCTGAACCCACCTC   AGGAGCAAGGTGCAAACATC
mMphosh9_3UTR   TCATGTTTTGCGCAGCTCT   GCCTTTTCCCAGTGCATAAT
mMrpl16_3UTR   GTAGTGAAAGCGCGAGGAAC   AGAACCAGCAAAGACCCTCA
mNdel1_e8   GAATCCAAGTTAGCCGCTTG   TTGCTGTTCATTACCCCACA
mNedd4_e20   CGCAAACATTCTGGAGGATT   GCAACCCCTCCATAGTCAAG
mNpas3_e12   GGGCAATCAGTCCGAGAATA   GTCGTTGCAGTTCATGTCGT
mPeg3_e9   GGATGCACTGATGGGAAACT   CAAATCCTCTGCCCTCAAAG
mPtprf_e30   ATGGGCAGTCAAGGACAATC   GAAGCCTTCACCTGTTTTGG
mRanpb2_3UTR   ACGGCCAGAATACCAACAGT   TCACAGTATCCATGCCATCC
mRgs5_3UTR   CTATGCCCTGATGGAGAAGG   GCAACTTTTGGAAGCCTGAC
mRtn4_3UTR   GCAAAAATCCCTGGATTGAA   CCAAGGGAGTGTCCCCTTTA
mStx3_3UTR   ACAACATGCCCAACTCAACA   TGCGACCTAGAAGAGCCATT
mThrap2_3UTR   AGCAGTACAACGCCCTATCC   CATTGTACAGCTGCGTGAGC
mTrpm3_3UTR   ACAGGGGTCAAAGCATGTTC   ACTTTCTCTGGTGCCTGGTG

Gtf2ird1 candidate target genes identified in embryonic mice:
m2310002F18RIK (Coq2)   GCCCAAGGCTCTAGGTTCTC   TTGCTCATCCAAGCCTAACA
mActb_3UTR   TGGTTACAGGAAGTCCCTCA   AAGCAATGCTGTCACCTTCC
mActl6b_3UTR   ATACCCGTCCACCCCATC   GGGTAATGGGAAAGGGAGAG
mAp4m1_e15   CTCCAGGTTCGATTCCTCAG   TTGCTGTGGCTTAGATGTCG
mAuts2_e20   CTGGCTTACCGAGCTTCAAT   CGGAGGACTACGCCTCTGT
mKatnal1_e11   GGCTTGAGTCCGGAAGAGAT   TCAGAGCCAACTCCAAGTCC
mMospd3_3UTR   CTCTCAGCTGAACCCACCTC   AGGAGCAAGGTGCAAACATC
mRpl21_e5   CTGGCCAAGAGGATCAATGT   CCCTTCTCTTTGGCTTCCTT
mSlc46a3_3UTR   GCAATCCACAGGACAAAACC   GCTGGGCCTGTTCTCTGTAG
mSlc4a4_e23   CGACCTCAGCTTCCTTGATG   TCGTCATTGTCGCTATCCAA
mTaf6_3UTR   TCACATGTGCTGACCTCCTC   GGGGAAAACCTTTCCTCCTT
mZfp68_3UTR   GCTAAGGGGACCCTGTGATT   CAAGGTTTTCCTTCACCGTTT

2.3.10 siRNA knockdown of Gtf2ird1 in neuronal cell lines

siRNA knockdown of Gtf2ird1 was performed in two different neuroblastoma derived

cell lines: Neuro2A (N2A; ATCC #CCL-131) and N1E-115 (ATCC # CRL-2263). Cell lines were maintained in D-MEM (Sigma-Aldrich D5796) with 10% FBS (Gibco A12617DJ) and 1X penicillin-streptomycin (Sigma-Aldrich P4333). For siRNA transfection, cells were cultured in

D-MEM + 10% FBS without antibiotics. Cells were maintained at 37° C with 5% CO2.

siRNAs targeting Gtf2ird1 (Table 2.2), Gapd (ON-TARGETplus GAPD Control Pool (Mouse)) and a non-targeting control (ON-TARGETplus Non-targeting siRNA #1) were ordered from Dharmacon. siRNAs were resuspended in 250 µL to create a stock concentration of 20 µM. Transfections of siRNA into N2A and N1E-115 cells were conducted using Lipofectamine 2000 (Invitrogen) following the manufacturer’s protocol. Briefly, cells were transfected in 6-well plates once they were 50-60% confluent. Lipofectamine 2000 was diluted 1/50 in Opti-MEM Reduced Serum Medium (Gibco), and siRNAs were diluted in Opti-MEM Reduced Serum Medium (final concentration of RNA when added to the cells was 40, 70 or 100 nM). After a five minute incubation the diluted Lipofectamine and RNAs were combined and incubated for 20 min. at room temperature, and then added to the cells. Cells were harvested either 24 or 48 hrs following transfection and total RNA was extracted.
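The concentrations quoted above follow from simple dilution arithmetic (C1 x V1 = C2 x V2). A minimal sketch; the 2 mL final volume per well is an assumption made purely for illustration, since the volume used is not stated, and the function names are hypothetical:

```python
def sirna_amount_nmol(stock_um: float, volume_ul: float) -> float:
    """Amount of siRNA (nmol) contained in volume_ul of a stock at stock_um (µM)."""
    return stock_um * volume_ul / 1000.0  # µM x µL = pmol; divide by 1000 for nmol


def stock_volume_ul(final_nm: float, final_volume_ul: float,
                    stock_um: float = 20.0) -> float:
    """Volume of stock (µL) needed to reach final_nm (nM) in final_volume_ul."""
    stock_nm = stock_um * 1000.0  # convert µM to nM
    return final_nm * final_volume_ul / stock_nm


# 250 µL at 20 µM corresponds to this much siRNA in the tube.
amount_nmol = sirna_amount_nmol(20.0, 250.0)

WELL_VOLUME_UL = 2000.0  # assumed final volume per well of a 6-well plate
volumes = {c: stock_volume_ul(c, WELL_VOLUME_UL) for c in (40, 70, 100)}
```

Under that assumed well volume, the three final concentrations differ only in the volume of 20 µM stock added, so the same stock tube serves the whole titration.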

Table 2.2 Sequences of siRNAs used to knock down Gtf2ird1 expression

Dharmacon ID   siRNA sequence   Gtf2ird1 exon
J-050113-09   GUACUUACGGAGUGCCGAA   9
J-050113-10   GGAGAUGACUGACUCGUUA   9
J-050113-11   GGUUCUGGAGGAGCGACA   14
J-050113-12   CGGAGGAGCUGUUCGUACU   17
J-050113-19   GAAUGUUCGAUGAGCGCAU   10
J-050113-20   UCAAUGAGAAAUACGGUGA   24
J-050113-21   GUACACAAUGAGAGCGUCU   3
J-050113-22   ACACCAGACUCUCGCGGAU   20

siRNA Pools (contain equal concentrations of each RNA): Mouse GTF21RD1 ON-TARGETplus Pool A Pool B Pool C SMARTpool L-050113-01-0005 J-050113-09 J-050113-09 J-050113-11 J-050113-09 J-050113-10 J-050113-11 J-050113-19 J-050113-10 J-050113-11 J-050113-19 J-050113-12 J-050113-22

2.3.11 Cellular localization of Gtf2ird1 in Neuro2a cells

There are multiple isoforms of the Gtf2ird1 transcript, which can be classified as “short” or “long” depending on which of two possible 3’ UTRs they contain. The long forms are predominantly expressed in the brain, while the short forms are predominantly expressed in muscle. A long Gtf2ird1 transcript was amplified from cDNA prepared from the brain of a WT P0 mouse, using primers designed to incorporate restriction enzyme sites and remove the endogenous stop codon (For- 5’ AAGCTTCCATGGCCTTGCTGGGGAAG 3’ and Rev- 5’ GCGGCCGCGGCTCTGAGGTCTAATAATCAA 3’). Multiple bands were present when the PCR product was run on an agarose gel; the brightest band was extracted and cloned into a pCR2.1-TOPO TA vector (Invitrogen). The Gtf2ird1 sequence was then excised and cloned, in frame, into the multiple cloning site of the mammalian expression vector pcDNA 3.1/myc-His A (Invitrogen). This vector produces TFII-IRD1 with both myc- and polyhistidine-tags at the C-terminus.

The Gtf2ird1-pcDNA 3.1 vector was transfected into N2A cells using Lipofectamine

2000 (Invitrogen) following the manufacturer’s protocol. The cells were cultured on coverslips in 6-well plates in D-MEM (Sigma-Aldrich D5796) with 10% FBS (Gibco A12617DJ), and maintained at 37° C with 5% CO2.

Twenty-four hours following transfection the cells were fixed on the coverslips by

treatment with formalin for 10 min. at room temperature. The cells were then washed with PBS

+ 0.1% Triton X-100 for 15 min at room temperature before blocking in 0.5% BSA in PBS for 1

hr at room temperature. Primary antibodies were diluted with 0.5% BSA in PBS, added directly

to the coverslips and incubated for 1 hr at room temperature as follows: mouse monoclonal

antibody Anti-Human TFII-IRD1 (myBioSource #MBS120021), diluted 1:100, or Anti-myc

mouse monoclonal antibody (Invitrogen R95025) diluted 1:1000. The cells were then washed 3x

5 min. with PBS, then incubated with Alexa Fluor 594 goat anti-mouse IgG (H + L) (Invitrogen

#A11005) diluted 1:1000 with 0.5% BSA in PBS for 1 hr at room temperature in the dark and

finally washed 3x 5 min with PBS. The coverslips were dipped into a beaker of ddH2O before

mounting them onto slides using ProLong Gold antifade reagent with DAPI (Invitrogen

#P36931). Pictures were taken at 40X magnification.

2.3.12 Expression analysis using western blots

Protein was extracted from cells using RIPA lysis buffer (10 mM Tris (pH 8.0), 100 mM

NaCl, 1 mM EDTA, 1% NP-40, 0.5% NaDOC, 0.1% SDS) with a protease inhibitor cocktail

(Sigma, P8340). After removing the media, cells were washed with PBS, and then cells were removed from the plate in RIPA buffer, using a cell scraper to detach the cells from the bottom of the dish. Cells in RIPA buffer were passed through a syringe to ensure full lysis of the cells.

The cells were then incubated on ice in RIPA buffer for 20 min. Once lysis was complete the cells were centrifuged at 4° C for 20 min., and the supernatant was transferred to a new tube.

Protein concentration was determined using the Detergent Compatible (DC) Protein

Assay kit (Bio-Rad, 500-0112). 20 µg of protein per sample were used for western blot analysis.

Samples were boiled for 10 min. in SDS loading buffer, before being separated by SDS-PAGE on an 8% polyacrylamide gel. The protein was transferred to a 0.2 µm nitrocellulose membrane

(Pall), and membranes were blocked overnight at 4°C in 5% non-fat dry milk powder in TBS-T

(TBS + 0.05% Tween). Primary antibodies used: Anti-c-myc mouse monoclonal, clone 9E10

(Roche, 11667149001), diluted 1/400, Anti TFII-I/BAP135 (BD Biosciences, 610943), diluted

1/1000. Primary antibodies were diluted in blocking solution and incubated for 90 min. at room temperature with shaking. Membranes were washed 3 x 10 min. in TBS-T, and then incubated for 1 hour at room temperature with ECL Mouse IgG, HRP-Linked Whole Ab (from sheep) (GE

Healthcare, NXA931), diluted 1/10,000 in blocking solution. Following 2 x 10 min. washes in

TBS-T and a final 10 min. wash in TBS, chemiluminescent detection was performed using ECL

(enhanced chemiluminescence) reagents (GE Healthcare) and Hyper Film (GE Healthcare).

2.4 Results

2.4.1 Gtf2ird1 is expressed in the developing mouse brain

RNA in situ hybridization analysis was done in order to determine if Gtf2ird1 is expressed in the developing mouse brain. A DIG-conjugated anti-sense probe specific to the

Gtf2ird1 3’UTR was used. Whole mount in situ hybridization on E11.5 embryos revealed high levels of expression in the developing forebrain and midbrain (Figure 2.1A).

The same anti-sense probe was used to determine the expression pattern of Gtf2ird1 in newborn (P0) mouse brains. Horizontal sections through the entire head were mounted on slides and subjected to RNA in situ hybridization. Expression of Gtf2ird1 could be detected throughout the brain, with the highest levels detected in the hippocampus, cortex, thalamus, striatum, olfactory bulbs and brain stem (Figure 2.1C).

2.4.2 Expression of candidate target genes Hoxc8 and Gsc is not altered in E11.5 Gtf2ird1-/- mouse embryos

There is evidence to suggest that Hoxc8 and Gsc may be transcriptionally regulated by TFII-IRD1; however, to date there is no in vivo evidence to support this claim. Gsc is a homeobox-containing gene that is expressed during two key periods of mouse embryo development: initially for a short period in the developing primitive streak at E6.4–E6.7, while gastrulation is occurring108, and then during organogenesis beginning at E10.5 109. During the second phase of expression, high levels of Gsc are found in regions that will form the head and limbs. Gsc-/- mice display no gross abnormalities consistent with a defect in gastrulation; however, they die shortly after birth110. The cause of death is likely related to craniofacial defects which impair breathing and olfaction.

Atypical deletions in the WBS region identified in patients with WBS-like phenotypes have indicated that TFII-IRD1 may play a role in proper craniofacial development53. In addition, mice from a transgenic line in which a c-myc transgene has disrupted Gtf2ird1 expression show craniofacial abnormalities53,111. Given the presumed roles for TFII-IRD1 and Gsc in craniofacial development, and the ability of TFII-IRD1 to bind to the Gsc promoter region in vitro68,79, I

investigated whether TFII-IRD1 may regulate Gsc expression in vivo.

Figure 2.1. (A & B) Gtf2ird1 embryonic expression. Whole mount RNA in situ on E11.5 mouse embryos using DIG conjugated probes. (A) Anti-sense probe specific to the 3’ UTR of the Gtf2ird1 transcript. High expression levels are seen in the developing cerebrum. Telencephalic vesicle (TV), ventral mesencephalon (VM). (B) Sense control probe. (C & D) Gtf2ird1 expression in the newborn mouse brain. RNA in situ on a horizontal section of the newborn head. (C) Anti-sense probe specific to the 3’ UTR of the Gtf2ird1 transcript. Expression is seen throughout the brain, including the hippocampus (H), cortex (C), thalamus (T), striatum (S) olfactory bulbs (Ob) and brain stem (Bs). (D) Sense control probe.

I performed qRT-PCR to determine the levels of Gsc expression in Gtf2ird1-/- and WT

E11.5 mouse embryos. At this time point both Gsc and Gtf2ird1 are expressed. mRNA was

extracted from whole embryos, and expression values were normalized to the housekeeping gene

Sdha. No differences in the level of Gsc expression could be detected between Gtf2ird1-/- and

WT mice (Figure 2.2).

Figure 2.2. Expression of goosecoid (Gsc) and Hoxc8 determined by qPCR. mRNA was extracted from whole E11.5 embryos. Expression values are shown relative to the housekeeping gene Sdha. There is in vitro evidence that suggests both Gsc and Hoxc8 are directly regulated by TFII-IRD1, however no statistically significant differences in expression were detected between genotypes using Student’s t-test.

I also looked at the expression levels and pattern of Hoxc8 in Gtf2ird1-/- and WT E11.5

mouse embryos. Hoxc8 is expressed in embryos beginning at E7.5, and continuing until at least

E17.5112. At E11.5 there are two different domains of Hoxc8 expression: in the neural tube and

the paraxial mesoderm. Using qRT-PCR no differences in the expression level of Hoxc8 between genotypes were detected (Figure 2.2). Whole mount RNA in situ hybridization was performed on E11.5 embryos. Hoxc8 expression was detected in the neural tube and paraxial mesoderm of both Gtf2ird1-/- and WT embryos, and there were no obvious differences in the expression boundaries (Figure 2.3). Together, these results indicated that TFII-IRD1 does not play a role in the transcriptional regulation of Hoxc8 or Gsc at this time point.

Figure 2.3. The expression pattern of Hoxc8 is not altered in Gtf2ird1-/- mice. RNA in situs on E11.5 Gtf2ird1-/- and wildtype mouse embryos incubated with DIG conjugated probes specific for the Hoxc8 transcript. At E11.5 Hoxc8 is expressed between somites (S) 15 and 23, in the neural tube (N) and the paraxial mesoderm (PM) as indicated by the arrows.

2.4.3 Expression of TFII-IRD1 candidate target genes identified in vitro is not altered in vivo

Chimge et al. performed a microarray on MEF cells which over-expressed TFII-IRD1101.

A number of genes were identified as having altered expression, and in a later publication a

subset were verified to have expression changes in MEF cells that were treated with Gtf2ird1 and

Gtf2i siRNA102. In addition ChIP was used to show that TFII-IRD1 can bind to the promoter

region of some of these genes. In a related study, Lazebnik et al. found that Gtf2ird1 siRNA

treatment of C2C12 cells (a mouse myoblast cell line), resulted in a 600-fold increase in Bmpr1b

expression and a 6900-fold increase in Fgf15 expression81. Using ChIP, TFII-IRD1 was found to

bind to the Fgf15 promoter in C2C12 cells. Fgf15 is an attractive candidate gene for some of the

WBS phenotypes as it is known to be involved in neocortical patterning and development113.

To determine if TFII-IRD1 is involved in the regulation of Fgf15 and Bmpr1b in vivo, I

looked at the expression of these genes in the brains of newborn Gtf2ird1-/- and WT mice using

qRT-PCR. At this time point, both of these genes are expressed in the newborn brain, but I

could not detect any differences in expression levels between genotypes (Figure 2.4). This

indicates that at this time point TFII-IRD1 does not play a role in the regulation of Fgf15 or

Bmpr1b.

Figure 2.4. Expression of Bmpr1b and Fgf15 determined by qPCR. mRNA was extracted from whole brains of P0 mice (n=3/genotype). Expression values are shown relative to the housekeeping gene Sdha. No statistically significant differences in expression were detected between genotypes using Student’s t-test.

qRT-PCR was also used to look at the in vivo expression of a number of TFII-IRD1 candidate genes identified by Chimge et al., which were validated in MEF cells. I cultured

MEFs from Gtf2ird1-/-, Gtf2ird1+/- and WT E15.5 embryos, and measured expression levels of

seven of the genes identified by Chimge et al. None of the genes tested showed significant

differences in expression between genotypes, indicating that TFII-IRD1 is unlikely to regulate expression of these genes under the culture conditions used (Figure 2.5A).

In order to determine if TFII-IRD1 plays a role in the regulation of these genes in the brain, I looked at expression of five genes identified by Chimge et al. in the brains of E18.5 and

adult mice. No significant differences in expression of these genes could be detected between

genotypes (Figure 2.5B).

Figure 2.5 Expression of genes previously shown to be targets of TFII-IRD1 in MEFs. (A) mRNA was extracted from MEFs which were cultured from +/+ (n=3), +/- (n=2), and -/- (n=1) E15.5 embryonic mice. (B) mRNA was extracted from whole brains (embryonic mice: n=3/genotype, adult mice: n=3 (+/+),and n = 2 (-/-)). Expression values are shown relative to the housekeeping gene Sdha. Hmbs and Hprt are housekeeping genes and were used as a control. (For presentation purposes, some values were scaled as indicated). No statistically significant differences in expression were detected between genotypes using Student’s t-test.

2.4.4 Global expression analysis of P0 mouse whole brain

Microarray analysis was performed in order to identify direct and indirect targets of TFII-

IRD1 in an unbiased manner. Total RNA was extracted from the brains of newborn WT and

Gtf2ird1-/- mice. RNA from 9 mice of the same genotype was pooled together to create three pools of RNA from WT mice and three pools of RNA from Gtf2ird1-/- mice. The RNA was pooled in order to reduce variability that was not caused by Gtf2ird1 genotype as the mice were from different litters and were on a CD1 (outbred) background. In addition the brains were harvested at slightly different time points due to the variability in accessing newborn litters. All of these factors could result in differences in gene expression that were not attributable to

Gtf2ird1 copy number, and so it was hoped that the effect of these variables could be reduced by pooling the RNA and ensuring that each pool contained mice from different litters.

cRNA was prepared from each of the six pools and was hybridized to an Affymetrix mouse 430 2.0 gene chip. The signals from the gene chips were normalized using Robust Multiarray Analysis (RMA)104, and differences in gene expression were detected using a second software program, Significance Analysis of Microarrays (SAM)105, which uses q values as a measure of the false discovery rate (FDR) of the identified genes.

Relatively few genes showed altered expression, and the magnitudes of these changes were generally very small (Table 2.3). Using an FDR cutoff of 10%, 8 genes were identified as having changes in expression in the null mice of greater than 2-fold. An additional 79 genes had changes in expression between 1.2- and 2-fold.
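The thresholds used above can be written as a simple filter. In this sketch fold change is treated as a ratio, so values below 1 are decreases and are compared by their reciprocal; this symmetric reading is an assumption, as are the function name and the q-values paired with each example (the first four fold changes are values that appear in Table 2.3, but their pairing with q-values here is illustrative only).

```python
def classify_change(fold_change: float, q_percent: float,
                    fdr_cutoff: float = 10.0) -> str:
    """Bin a gene by the magnitude of its expression change at a given FDR cutoff."""
    if q_percent > fdr_cutoff:
        return "not called"
    # Treat a ratio of 0.5 as a 2-fold decrease, the mirror of a 2-fold increase.
    magnitude = max(fold_change, 1.0 / fold_change)
    if magnitude > 2.0:
        return ">2-fold"
    if magnitude >= 1.2:
        return "1.2- to 2-fold"
    return "<1.2-fold"


# (fold change, q-value in %) pairs; the q-values are illustrative.
examples = [(4.555, 0.0), (0.457, 7.33), (1.268, 3.64), (0.757, 8.34), (1.9, 12.0)]
calls = [classify_change(fc, q) for fc, q in examples]
```

Under this reading a ratio such as 0.457 counts as a change of more than 2-fold, because its reciprocal is about 2.19.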

9 3C 9D 7F3 4E1 5G2 5A1 1H4 5A3 1A3 2H3 5A1 5G1 19B 5A3 2H4 5A1 6C1 4C3 5A1 5A2 5G2 5A3 6C3 14D1 13C3 12C3 14D1 8A1.3 14E2.2 11A3.3 11A3.3 osome 16C3.3 chrom-

number Accession Accession NR_015554 NM_177047 NM_177047 NM_194462 NM_194462 NM_175245 NM_175245 NM_029813 NM_029813 XM_001472446 XM_001472446 XM_001481304 XM_001481304 NM_001042591 NM_001042591 NM_001083918 NM_001083918 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (%) 7.33 7.33 3.64 1.96 8.34 3.28 3.28 2.04 3.64 8.34 7.33 8.34 3.64 7.33 8.34 7.33 q-value 0.757 1.897 1.396 1.391 0.829 4.555 1.268 0.603 1.410 1.525 0.699 0.774 1.247 1.409 1.301 0.457 0.697 0.762 0.751 1.336 0.770 3.378 0.753 0.585 1.303 0.741 0.741 2.837 0.690 0.511 0.773 1.285 0.459 1.305 Fold Change Fold 1457191_at 1439224_at 1459497_at 1452379_at 1443222_at 1440807_at 1460151_at 1459253_at 1459420_at 1439948_at 1455151_at 1434664_at 1430393_at 1439928_at 1442483_at 1432198_at 1449910_at 1458525_at 1442893_at 1437126_at 1440694_at 1439483_at 1430195_at 1456706_at 1435453_at 1424784_at 1438238_at 1446324_at 1446713_at 1445307_at 1444956_at Probe Set ID 1442760_x_at 1437717_x_at 1453589_a_at autism susceptibility candidate 2 candidate susceptibility autism Arrestin domain containing 3 A kinase (PRKA) anchor protein (yotiao) 9 (yotiao) protein anchor (PRKA) kinase A RIKEN cDNA 4833441D16 gene Description Auts2 Akap9 Arrdc3 AI506816 AK018172 AV264602 BB451211 BB373816 AU019852 BB341550 BB337886 BG070910 BB167280 BB206454 BC046401 BB148843 BB534083 BB051515 BB023775 BB040120 BB113018 BB115513 BB474913 BB523556 2510017J16Rik 1700029I01Rik 2610005L07Rik 2010315B03Rik 2410129H14Rik 2210418O10Rik 2810043O03Rik 4833441D16Rik A930011O12Rik Gene Table to found expression altered to microarray P0 2.3. Genes have of mice inbrains the according analysis

7F2 4E2 XF2 9F4 6C1 4C7 2A1 6B3 14B 5B1 5A1 5G2 1H6 19A 5A1 5A3 5G2 14E4 19C3 17A1 12C3 13C3 12C3 5G1.3 osome 11A3.3 16C3.3 chrom- number Accession Accession NR_002847 NM_025943 NM_025943 NM_145569 NM_053272 NM_053272 NM_025280 NM_010453 NM_010453 NM_016700 NM_020010 NM_020010 NM_022420 NM_022420 NM_026114 NM_026114 NM_030675 NM_030675 NM_008283 NM_008283 NM_033322 NM_001033460 NM_001033460 NM_001081462 NM_001081462 NM_001038607 NM_001160016 NM_001160016 NM_001110222 NM_001110222 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (%) 3.28 7.33 7.33 8.34 3.28 2.04 3.64 3.64 7.33 8.34 3.28 7.33 3.28 7.33 q-value 0.522 0.756 1.797 0.758 0.564 1.466 0.775 1.434 0.753 1.306 0.605 0.782 1.292 0.595 0.741 0.707 1.433 0.748 1.924 0.632 0.797 0.700 0.737 1.271 0.725 1.262 1.310 0.512 0.794 Fold Change Fold 1443772_at 1435514_at 1423667_at 1443905_at 1456880_at 1440417_at 1418129_at 1427317_at 1448926_at 1457936_at 1455279_at 1422533_at 1438892_at 1439705_at 1428729_at 1437577_at 1438871_at 1425908_at 1459061_at 1418141_at 1448974_at 1420491_at 1451411_at 1418414_at 1436202_at 1444869_at 1445420_at Probe Set ID 1428974_s_at 1451899_a_at DAZ interacting protein 1 methionine adenosyltransferase II, alpha DNA segment,DNA 19, ERATO Chr Doi 409, expressed 24-dehydrocholesterol reductase antigenic determinant of rec-A protein mitogen activated protein kinase 8 kinase protein activated mitogen homeo box A5 predicted gene 1060 cytochrome P450, 51 P450, cytochrome diabetic embryopathy 1 Description cerebral cavernous malformations 1 malformations cavernous cerebral doublecortin 1 subunit 2, factor initiation translation eukaryotic alpha guanine nucleotide binding protein (G protein), beta 1 G protein-coupled receptor, family C, group 5, member B domain- repeat I II factor transcription general containing 1 motif, sequence central E5 18 papillomavirus Human pseudogene (eag- H subfamily channel, voltage-gated potassium 1 member related), 1 factor-like transcription zipper leucine adenocarcinoma lung 
associated metastasis transcript 1 (non-coding RNA) Kin Dcx Gnb1 Dep1 Lztfl1 Dzip1 Ccm1 Kcnh1 Eif2s1 Hoxa5 Cyp51 Mat2a Mapk8 Malat1 Dhcr24 Gprc5b Gtf2ird1 Hpvc-ps Gm1060 BG075322 BG145571 BM239037 BM211666 BM238940 BM117148 D19Ertd409e Gene

5F 5F 5F 5F 1H3 5G2 14B 8A2 7A1 5G2 7B1 5A2 19A 5B1 8C5 19A 19B 6G2 10B4 12C1 17A1 11B3 15A1 16B3 4D2.1 13D2.1 13D2.2 11A3.3 osome chrom- number Accession Accession XM_884414 XM_884414 NM_009063 NM_009063 NM_011240 NM_011240 NM_013780 NM_013780 NM_145456 NM_145456 NM_023852 NM_023852 NM_029869 NM_029869 NM_022656 NM_022656 NM_011130 NM_011130 NM_172424 NM_172424 NM_008817 NM_008817 NM_030037 NM_030037 NM_023668 NM_023668 NM_011767 NM_008820 NM_008820 NM_027777 NM_027777 NM_011213 NM_172707 NM_172707 NM_026623 NM_026623 NM_025606 NM_025606 NM_024226 NM_024226 NM_001145899 NM_001145899 NM_001040398 NM_001040398 NM_001081203 NM_001081203 NM_001159516 NM_001159516 NM_001025307 NM_001025307 NM_001035239 NM_001035239 NM_001081323 NM_001081323 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (%) 7.33 7.33 7.33 3.28 3.64 7.33 2.04 7.33 7.33 7.33 7.33 7.33 3.28 8.34 7.33 7.33 q-value 1.290 1.462 1.416 1.262 0.442 0.656 1.315 0.729 0.799 0.686 1.247 1.335 0.695 1.272 1.309 0.782 1.465 1.301 1.236 1.301 0.573 1.300 0.646 0.609 0.439 0.347 0.734 1.421 0.703 1.421 Fold Change Fold 1437353_at 1435284_at 1439650_at 1426559_at 1420941_at 1440104_at 1450287_at 1459722_at 1432415_at 1440771_at 1433758_at 1439840_at 1453906_at 1417355_at 1460452_at 1429735_at 1424893_at 1442311_at 1425424_at 1416712_at 1440267_at 1420843_at 1431328_at 1437213_at 1431053_at 1440915_at 1450880_at 1456923_at 1417600_at Probe Set ID 1425530_a_at SET domain containing 1B strawberry notch homolog 1 regulator of G-protein signaling 5 signaling G-protein of regulator RAN binding protein 2 neuronal PAS domain protein 3 Zinc finger, SWIM domain containing 6 member RAS oncogene family oncogene RAS member zinc finger with KRAB and SCAN domains 1 nischarin polymerase (DNA directed), beta directed), (DNA polymerase thyroid receptor hormone associated protein paternally expressed 3 motile sperm domain containing 3 containing domain sperm motile quaking nuclear distribution gene E-like homolog 1 Zinc finger RNA binding protein 
Description hypothetical protein LOC620031 peptidase 4 peroxisomal biogenesis factor 1 factor biogenesis peroxisomal protein tyrosine phosphatase, receptor type, F syntaxin 3 M-phase phosphoprotein 9 nudix (nucleoside diphosphate linked moiety X)-type motif 21 protein phosphatase 1, catalytic subunit, beta isoform reticulon 4 transporter), (H+/peptide 15 family carrier solute member 2 subfamily channel, cation potential receptor transient M, member 3 mitochondrial ribosomal protein L16 protein ribosomal mitochondrial Qk Zfr Stx3 Polb Sbno Pex1 Ptprf Rtn4 Rgs5 Peg3 Pep4 Nisch Ndel1 Npas3 Trpm3 Setd1b Thrap2 Nudt21 Ppp1cb Mrpl16 Ranbp2 RAB3C Zswim6 Slc15a2 Zkscan1 Mospd3 MGC7817 Mphosph9 Gene

2.4.5 Global expression analysis of E15.5 embryo heads

In order to find additional targets of the putative transcription factor TFII-IRD1, a second microarray experiment was performed using the Illumina BeadChip platform. Studies on adult

Gtf2ird1-/- mice have shown that they have structural cerebellar defects, as well as neurotransmitter differences in the cerebellum, cortex, and amygdala. For this reason, I focused on the embryonic time points when these structures/brain regions are developing. RNA was extracted from the heads of E15.5 mouse embryos, and 5 Gtf2ird1-/- mice were compared to 5

WT mice. The mice used in the experiment came from crosses between Gtf2ird1+/- mice, which allowed for the comparison of Gtf2ird1-/- and WT littermates.

Differentially expressed genes were detected using LIMMA107, following normalization of the log2 scale transformed data using the quantile normalization method106. Eighteen genes were shown to have altered expression in Gtf2ird1-/- mice with an adjusted p-value <0.1 (Table

2.4). Similar to the findings of the first microarray experiment, the changes in expression detected were generally small with approximately half of the genes being altered by less than 2-fold. Mospd3 and Auts2 were the only genes to be identified in both microarray experiments.

However, in the microarray performed on newborn mice Auts2 was found to be increased in expression by 1.3 fold, while it was found to be decreased by 1.5 fold in the embryonic mice.

There were two probes corresponding to Mospd3 on the Illumina BeadChip array which was used for the embryonic mice; one of these indicated that Mospd3 expression was slightly decreased while the other indicated that it was slightly increased. The array performed with newborn mice on the Affymetrix platform also indicated that Mospd3 expression was slightly increased in Gtf2ird1-/- mice.
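
The normalization and multiple-testing steps described above can be sketched in a few lines. This is a minimal illustration only, not the LIMMA implementation: it shows quantile normalization of log2 intensities, a Benjamini-Hochberg adjustment (an assumption, as the text does not state which method produced the adjusted p-values), and the conversion of a log2 difference to a fold change. All function names are hypothetical, and LIMMA's moderated t-statistics are not reproduced.

```python
def quantile_normalize(columns):
    """Quantile-normalize arrays (one list of log2 intensities per array).

    Every array is mapped onto the same reference distribution: the mean
    of the k-th smallest values across arrays. Ties are broken by
    position, a simplification relative to production implementations.
    """
    n_probes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n_probes)]
    normalized = []
    for col in columns:
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fold_change(log2_difference):
    """Convert a log2 difference (mutant minus WT) to a linear fold change."""
    return 2.0 ** log2_difference
```

On this scale a 1.3-fold increase corresponds to a log2 difference of about +0.38, and a 1.5-fold decrease (read here as a fold change of 1/1.5) to about -0.58, so the two Auts2 results lie on opposite sides of zero.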

Table 2.4. Genes shown to be altered in the brains of 15.5 d.p.c. Gtf2ird1-/- mouse embryos according to microarray analysis.

Gene: ACTB, ACTL6B, AP4M1, AUTS2, Coq2, GTF2IRD1, LOC333841, MOSPD3, Rpl21, RPO1-3, SLC46A3, SLC4A4, TAF6, ZFP68, MUP2, KATNAL1, LOC382555, LOC382264

Description: adaptor-related protein complex AP-4, mu 1; actin, beta; actin-like 6B; autism susceptibility candidate 2; coenzyme Q2 homolog, prenyltransferase; GTF2I repeat domain containing 1; katanin p60 subunit A-like 1; motile sperm domain containing 3; major urinary protein 2; ribosomal protein L21; RNA polymerase 1-3; solute carrier family 46, member 3; solute carrier family 4 (anion exchanger), member 4; TAF6 RNA polymerase II, TATA box binding protein (TBP)-associated factor; zinc finger protein 68

Probe set ID: ILMN_2677531, ILMN_2449449, ILMN_3153772, ILMN_1237413, ILMN_2455192, ILMN_2985428, ILMN_1221960, ILMN_2475184, ILMN_2542048, ILMN_2707416, ILMN_2656631, ILMN_1241137, ILMN_2599667, ILMN_1253663, ILMN_1377923, ILMN_1250157, ILMN_2706468, ILMN_2971577, ILMN_2730208, ILMN_2594202, ILMN_2634720, ILMN_2627733, ILMN_2696182, ILMN_3096468, ILMN_2993221, ILMN_2534329

Fold Change: 0.80, 0.21, 1.63, 0.64, 0.07, 0.53, 1.59, 0.37, 0.31, 0.42, 2.03, 0.18, 1.36, 2.14, 0.41, 0.66, 5.09, 0.65, 1.64, 2.03, 1.58, 1.42, 0.14, 2.17, 2.05, 1.45

Adjusted p-value: 2.74E-02, 7.61E-12, 9.70E-04, 1.01E-02, 2.18E-15, 9.80E-07, 3.06E-03, 9.80E-07, 8.71E-02, 9.80E-07, 6.19E-04, 2.65E-14, 5.33E-03, 3.04E-03, 7.11E-05, 2.25E-03, 2.57E-02, 4.52E-03, 4.25E-04, 8.96E-05, 2.74E-02, 1.10E-02, 5.74E-17, 1.35E-10, 4.57E-03, 5.09E-02

Accession number: XM_356566.1, XM_358724.1, XM_132218.3, XM_917238.2, XM_289936.2, NM_021392.3, NM_007393.3, NM_177047.3, NM_153572.1, NM_008647.3, NM_009087.1, NM_027872.3, NM_018760.1, NM_031404.4, NM_030037.1, NM_009315.3, NM_013844.2, NM_001081462.1, NM_001081468.1

Chromosome: 5E4, 4B3, 5E1, 5G2, 5G2, 5G2, 5G3, 5G3, 5G3, 5G3, 5G2, 5G2, 5G2, 5G2, 5G2, 11E2, 11B1.3

2.4.6 Validation of candidate gene expression using qRT-PCR

qRT-PCR was performed in order to verify that the alterations in gene expression

detected in the Gtf2ird1-/- mice using microarrays were accurate. When possible, primer pairs in which one of the primers overlapped with the microarray-probe sequence were used.

Expression changes of most of the known protein-coding genes identified as being altered

in the brains of newborn Gtf2ird1-/- mice could not be validated. When gene expression was

found to be significantly different between genotypes, the changes in expression were generally

small (Figure 2.6). The protein coding genes which showed the largest changes in expression

were Kin, Stx3 and Mrpl16. Interestingly, Stx3 and Mrpl16 are located in a tail-to-tail orientation

on mouse chromosome 19, with only ~150 bp separating their 3’ UTRs. This tail-to-tail

orientation is conserved in humans. Stx3 is an attractive candidate gene for the neurological

symptoms of WBS as it is expressed in neuronal growth cones and is necessary for proper neuron

growth114.

Many of the changes in gene expression identified in the microarray on embryonic mice

also could not be validated using qRT-PCR (Figure 2.7). Significant differences in expression

between genotypes were only detected in seven genes, with the largest changes in expression

seen in Actl6b, Taf6 and Zfp68. Actl6b is an attractive candidate gene for some of the

neurological phenotypes of WBS as it is a member of a neuron specific chromatin remodelling

complex which regulates dendritic growth and arborization115.

Figure 2.6. qPCR validation of expression of candidate genes identified in microarray analysis of newborn brains. RNA from 9 mice of the same genotype was pooled together to make cDNA, n=3 separate pools/genotype. Expression values are shown relative to the housekeeping gene Sdha. (For presentation purposes, some values were scaled as indicated). * p < 0.05, ** p < 0.005 using Student’s t-test.

Figure 2.7. qPCR validation of expression of candidate genes identified in microarray analysis of E15.5 embryo heads. Expression values are shown relative to the housekeeping gene Sdha. (For presentation purposes, some values were scaled as indicated). * p < 0.05, ** p < 0.005 using Student’s t-test.
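
For readers unfamiliar with how qPCR values "relative to a housekeeping gene" are typically derived, the sketch below uses the common delta-Ct calculation followed by a pooled-variance Student's t statistic. It assumes 100% amplification efficiency, and the Ct values are invented for illustration; the exact calculation used for Figures 2.6 and 2.7 is not stated in the text, so this should not be read as the method applied there.

```python
import math
from statistics import mean, variance


def relative_expression(ct_target, ct_reference):
    """Expression of a target relative to a reference gene (delta-Ct).

    Assumes both amplicons double every cycle (100% PCR efficiency).
    """
    return 2.0 ** -(ct_target - ct_reference)


def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical (target Ct, Sdha Ct) pairs for three pools per genotype.
wt = [relative_expression(t, r) for t, r in [(24.0, 22.0), (24.1, 22.0), (23.9, 22.0)]]
ko = [relative_expression(t, r) for t, r in [(25.0, 22.0), (25.2, 22.1), (24.9, 22.0)]]
t_stat, df = student_t(wt, ko)
```

Converting the t statistic to a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would normally be used for that step.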

2.4.7 Knockdown of Gtf2ird1 in neuronal cell lines does not affect expression of candidate genes

It was noted that nearly all of the genes identified by microarray as being altered in the

embryonic mice were located on chromosome 5, within 45 Mb of the Gtf2ird1 locus. Actl6b,

Taf6, and Zfp68 are all within 5 Mb of the Gtf2ird1 locus. In order to determine if the alterations

in gene expression that were detected were actually the result of the physical disruption of that

locus caused by gene targeting, and not specifically related to the loss of TFII-IRD1, siRNA

knockdown of Gtf2ird1 was performed in neuronal cell lines. siRNA knockdown decreases the amount of TFII-IRD1 present in the cells without physically disrupting the chromosome; physical disruption can itself alter the expression of nearby genes.

siRNA knockdown of Gtf2ird1 was performed in two different cell lines: Neuro2A and

N1E-115, both of which are derived from mouse neuroblastomas. qRT-PCR analysis was used to determine the level of Gtf2ird1 knockdown. Optimization of Gtf2ird1 knockdown was performed in N2A cells using a pool of 4 different siRNAs (in equal concentrations) designed by

Dharmacon. A non-targeting siRNA was used as a negative control, and a pool of siRNAs which target Gapdh was used as a positive control. Gapdh siRNA treated cells showed a specific 90% reduction in Gapdh expression, while Gtf2ird1 siRNA treated cells only showed a

60% reduction in Gtf2ird1 expression (Figure 2.8A). Gtf2i expression was not affected by the

Gtf2ird1 siRNAs indicating they specifically target Gtf2ird1.

In an attempt to increase the level of Gtf2ird1 knockdown, each of the 4 siRNAs in the

SMARTpool was tested individually, along with 4 additional Gtf2ird1 siRNAs ordered from

Dharmacon. The best-performing individual siRNAs were then mixed in different combinations to create three different pools of Gtf2ird1 siRNAs (named A, B & C). The pools were then

tested on both N2A and N1E-115 cell lines in duplicate. Each of the Gtf2ird1 siRNA pools specifically knocked down Gtf2ird1 expression by approximately 60% in Neuro2A cells and by

80% in N1E-115 cells (Figure 2.8B). Treatment with a non-targeting siRNA, or an siRNA targeting Gapdh expression had no effect on the expression level of Gtf2ird1.
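
The knockdown percentages quoted here follow from a simple ratio of normalized expression values. As a minimal sketch, assuming both values have already been normalized to the housekeeping gene (the function name is hypothetical):

```python
def percent_knockdown(treated, control):
    """Percent reduction in expression relative to a control sample.

    Both inputs are expression values already normalized to a
    housekeeping gene (Sdha in the text).
    """
    if control <= 0:
        raise ValueError("control expression must be positive")
    return (1.0 - treated / control) * 100.0
```

A treated sample at 0.4 of the control level therefore corresponds to a 60% knockdown, and one at 0.2 to an 80% knockdown.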

Figure 2.8. (A) Knockdown of Gtf2ird1 in the neuronal cell line N2A, determined by qPCR. Gapdh siRNAs were used as a positive control, and Gapdh showed a 90% decrease in expression. Expression of Gtf2i was not affected. (B) Knockdown of Gtf2ird1 in the neuronal cell lines N2A and N1E-115. Expression of the housekeeping gene Hmbs was not affected. Expression values are shown relative to the housekeeping gene Sdha. Primers used in PCR are shown on the X-axis.

Expression levels of candidate genes identified in the microarrays and verified using

qRT-PCR were analyzed in Gtf2ird1 siRNA treated cells. As there were no differences in the effects of Gtf2ird1 pools A, B, and C, candidate gene expression was tested in cells treated with each pool and the expression values were averaged together (n=6). Although Gtf2ird1 expression was only knocked down 60-80% in these cell lines, differences in the expression levels of candidate genes could be detected in Gtf2ird1+/- mice in which Gtf2ird1 expression is

decreased by ~50%. Therefore if the candidate genes were being either directly or indirectly

regulated by TFII-IRD1, I would have expected to see a significant change in expression in the

siRNA treated cells. However, no significant differences in the expression of Actl6b, Taf6, Kin or

Zfp68 could be detected in Gtf2ird1 siRNA treated cells when compared with either non-

targeting siRNA treated cells or untreated cells (Figure 2.9). Stx3 and Mrpl16 are not expressed

in these cell lines, and so the effect of Gtf2ird1 knockdown on their expression could not be

examined. These results indicate that TFII-IRD1 is unlikely to play a role in the transcriptional regulation of these candidate genes in the cell types examined.

Figure 2.9. Expression of candidate genes in Gtf2ird1 siRNA treated neuronal cell lines. Expression values are shown relative to the housekeeping gene Sdha. No statistically significant changes in expression were detected between Gtf2ird1 siRNA treated cells and non-targeting siRNA treated or untreated cells using Student’s t-test. * p < 0.05.

2.4.8 Altered gene expression in Gtf2ird1-/- mice is the result of differences in genetic background

The initial targeting of the Gtf2ird1 locus was done in R1 ES cells, which are derived from a 129X1/SvJ & 129S1 cross, and the mice were backcrossed onto CD1. As the region around the targeted locus may retain a 129 genotype, I hypothesized that the gene expression differences in the Gtf2ird1-/- mice may actually be the result of differential expression between

genetically different mouse strains.

Expression of candidate genes in the brains of newborn and adult mice from a

129S1/SvImJ genetic background was analyzed using qRT-PCR. These mice are very similar

genetically to the R1 ES cells in which the Gtf2ird1 locus was targeted. If the expression

differences in the candidate genes were due to differences in genetic background, then the

expression of these genes in Gtf2ird1-/- mice would be similar to the expression in WT

129S1/SvImJ mice. Conversely, the expression of these genes in WT CD1 mice would be

different than the expression in WT 129S1/SvImJ mice.

Expression of candidate genes located near the Gtf2ird1 locus on chromosome 5 was

found to be the same in Gtf2ird1-/- mice as in WT 129S1/SvImJ mice, and significantly different

from CD1 WT mice (Figure 2.10). In addition, analysis of strain-specific SNPs within Zfp68 (a gene that showed altered expression), demonstrated that the Gtf2ird1-/- mice were homozygous

for 12/13 129S1/SvImJ SNPs, while CD1 WT mice were only similar to the 129S1/SvImJ mice at 2/13 SNPs (Table 2.5). While the 129S1/SvImJ mice are genetically similar to the R1 line which was used to derive the Gtf2ird1-/- mice, they are not identical. This could explain why the

SNPs are not a 100% match between Gtf2ird1-/- mice and 129S1/SvImJ mice.
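
The SNP comparison amounts to counting, for each line, the positions at which its genotype matches the 129S1/SvImJ reference. A minimal sketch is given below; the allele values are placeholders chosen only to reproduce the 12/13 and 2/13 counts, and are not the genotypes reported in Table 2.5.

```python
def snp_concordance(sample, reference):
    """Count positions where a sample's genotype matches the reference strain.

    Genotypes are dicts mapping a SNP identifier to an allele string;
    only SNPs typed in both are compared. Returns (matches, compared).
    """
    shared = sorted(set(sample) & set(reference))
    matches = sum(1 for snp in shared if sample[snp] == reference[snp])
    return matches, len(shared)


# Hypothetical alleles at 13 positions; not the genotypes in Table 2.5.
ref_129 = {f"snp{i}": "A" for i in range(1, 14)}
knockout = dict(ref_129, snp7="G")      # differs from 129 at one position
cd1 = {snp: "G" for snp in ref_129}
cd1.update(snp1="A", snp2="A")          # matches 129 at two positions
```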

Figure 2.10. Expression of candidate genes in the brains of P0 mice from different genetic backgrounds determined by qPCR. 129+/+ (n=7), Gtf2ird1-/- (n=5), CD1+/+ (n=6). Expression values are shown relative to the housekeeping gene Sdha. (For presentation purposes, some values were scaled as indicated). * p < 0.05, ** p < 0.005 using Student’s t-test.

Table 2.5. A comparison of SNPs in the 3’ UTR of Zfp68 in Gtf2ird1-/- mice and CD1 WT mice relative to 129S1/SvImJ mice.

In order to determine if these results represented true differences in expression between the 129S1/SvImJ and CD1 mice, or if they were the result of SNPs in the genomic sequence inhibiting probe/primer binding, each PCR amplicon that showed significantly altered expression was sequenced. I did not find any SNPs in the primer sequences themselves, and only two of the amplicons contained a SNP. Thus, the apparent differences in expression between the strains are likely to be true differences, and not the result of different PCR

efficiencies. These results indicate that the differences in expression I had previously detected

for these genes in the brains of Gtf2ird1-/- mice relative to CD1 wildtype mice were not related to the function of TFII-IRD1.

Expression analysis of Pex1 and AI506816, which previously showed significant differences between Gtf2ird1-/- and CD1 WT mice, failed to replicate the differences between

these two genotypes. However the expression of these genes in WT 129S1/SvImJ mice was

significantly different than both Gtf2ird1-/- and CD1 WT mice (Figure 2.10). Both Pex1 and

AI506816 are located on mouse chromosome 5A, while Gtf2ird1 is located at 5G2, a

considerable distance away. It is likely that when the initial expression analysis of these genes

was conducted both the Pex1 and AI506816 alleles were derived from 129S1/SvImJ in the

Gtf2ird1-/- mice, and at some point before samples were collected for the more recent

experiment, recombination resulted in the Gtf2ird1-/- mice having CD1 derived alleles for these

genes.

Expression levels of Stx3 and Mrpl16 were significantly lower in Gtf2ird1-/- mice than in

WT CD1 or WT 129S1/SvImJ mice. Both of these genes are located on mouse chromosome 19.

Thus, the altered expression of these genes is unlikely to be a result of Gtf2ird1-/- mice

harbouring 129S1/SvImJ alleles for these genes.

2.4.9 TFII-IRD1 is found in the cytoplasm of Neuro2a cells

Microarray experiments at time points when Gtf2ird1 is widely expressed in the brain

have been unable to find any clear targets of this putative transcription factor. TFII-I, a family

member of TFII-IRD1, has been shown to have a cytoplasmic role in addition to its role as a

transcription factor116. In order to determine if TFII-IRD1 may also have a cytoplasmic cellular

role, I looked at the localization of TFII-IRD1 protein in Neuro2A cells.

Endogenous expression of TFII-IRD1 in these cells could not be detected using immunocytochemistry or western blots. This could be because the protein is expressed at levels below the threshold of detection or because the antibody binds poorly to the target protein. To date, I have tested a variety of antibodies against TFII-IRD1 and have been unable to detect endogenous expression in tissue samples using western blots, or in cell cultures using western blots or immunohistochemistry. In order to raise cellular TFII-IRD1 levels, Neuro2A cells were transfected with a construct that expresses myc-tagged TFII-IRD1. Western blots performed on transfected cells using an antibody against the myc-tag identified a band of the expected size.

No bands were detected in untransfected cells (Figure 2.11). Immunocytochemistry was performed on transfected Neuro2A cells using either an antibody against the myc-tag, or an antibody against TFII-IRD1 to determine the cellular localization of the protein. Both antibodies produced identical results (Figure 2.12). No signal is visible when either of these antibodies is used on untransfected cells.

TFII-IRD1 can clearly be seen in the nucleus and throughout the cytoplasm of these cells, including the neurite extensions (Figure 2.12). This indicates that TFII-IRD1 may have a biological role other than a transcription factor.

Figure 2.11. (A) Western blot showing that an anti-myc-tag antibody specifically detects myc-TFII-IRD1 in transfected Neuro2A cells. (B) An anti-TFII-I antibody was used as a loading control. Myc-TFII-IRD1(n) is produced from an expression vector containing the TFII-IRD1 isoform that is most highly expressed in neuronal cells. Myc-TFII-IRD1(m) is produced from an expression vector containing the TFII-IRD1 isoform that is most highly expressed in muscle cells.

Figure 2.12. TFII-IRD1 is found in both the cytoplasm and nucleus of Neuro2A cells. Cells were transfected with a myc-tagged TFII-IRD1 expression construct. Immunohistochemistry was performed using either an antibody against the myc epitope (top row) or an antibody against TFII-IRD1 (bottom row).

2.5 Discussion

Transcription factors are proteins which recognize and bind to specific DNA sequences,

and regulate transcription of the corresponding genes either positively or negatively117. There

have been many studies showing that members of the TFII-I gene family, including TFII-IRD1, are able to regulate transcription by binding to specific DNA sequences. Thus, it is surprising that I was unable to confirm any of the previously identified TFII-IRD1 target genes in vivo, or identify any novel targets in our Gtf2ird1-/- mice.

2.5.1 Targets of TFII-IRD1 identified in vitro

TFII-IRD1 has previously been shown to bind to the promoter regions of the Hoxc8, Gsc,

and TnIs genes using yeast one-hybrid studies74,75,79. A comparison of the bait sequences used in

each experiment revealed a common motif, GGATTA, found in the promoter of each gene. This

is the same consensus sequence that Vullhorst and Buonanno identified using SELEX with the I-

repeats of mouse TFII-IRD180. In addition, TFII-IRD1 was shown to regulate the expression of

a reporter gene when regions of the Gsc and TnIs promoters containing the GGATTA motif were

placed upstream of the transcription start site68,78.

I did not look at the expression of TnIs in the Gtf2ird1-/- mice, as TnIs is involved in

muscle fiber type specification, and is not expressed in the tissues/time-points that I was

studying. The expression of TnIs in Gtf2ird1-/- mice has been studied by others, and no

differences in expression were detected (Stephen Palmer, personal communication). I analyzed

the expression of Gsc and Hoxc8 in Gtf2ird1-/- mouse embryos, and no differences from WT were detected. Previous studies have indicated that TFII-IRD1 represses the expression of both of these genes and so I had expected to find increased levels of Hoxc8 and Gsc in the Gtf2ird1-/-

mice. Recent findings by Palmer et al.118 could help to explain why the results of my in vivo analysis do not correlate with previous in vitro studies.

Palmer et al. found that TFII-IRD1 is able to negatively auto-regulate itself by binding to its own promoter118. They generated a Gtf2ird1 knockout mouse using homologous

recombination to delete exon 2 in 129R1 ES cells. Exon 2 contains the transcription start site, and so this was expected to prevent Gtf2ird1 transcription in these mice. However it was found that the mice did produce a transcript with exon 1 splicing directly into exon 3. Similar to our

Gtf2ird1-/- mice, their knockout mouse produced increased levels of the truncated transcript.

There are two AUG codons in exon 3 of Gtf2ird1, the first of which is out of frame and is

followed by a stop codon five codons downstream. Use of the second AUG would produce an

in-frame protein. Using an expression construct which produced the mutant truncated protein

and comparing the RNA and protein levels to those of a wildtype expression construct, it was

determined that the mutant protein is produced at ~3% of the level of the WT protein. This led

Palmer et al. to postulate that TFII-IRD1 may use negative feedback to increase transcription when the protein levels are too low.
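
The reading-frame argument above can be illustrated by scanning a sequence for start codons and locating the first in-frame stop after each one. The sketch below uses a short invented sequence, not the real exon 3 of Gtf2ird1, and the function name is hypothetical; it is constructed so that the first ATG is followed by five codons and then a stop, while the second ATG lies in a different frame and remains open.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_start_codons(seq):
    """List each ATG in a DNA sequence with its first in-frame stop.

    Returns tuples (start_index, codons_before_stop); the second element
    counts the codons between the ATG and the stop, and is None when no
    in-frame stop occurs before the sequence ends.
    """
    seq = seq.upper()
    results = []
    start = seq.find("ATG")
    while start != -1:
        codons_before_stop = None
        for n, pos in enumerate(range(start + 3, len(seq) - 2, 3)):
            if seq[pos:pos + 3] in STOP_CODONS:
                codons_before_stop = n
                break
        results.append((start, codons_before_stop))
        start = seq.find("ATG", start + 1)
    return results


# Toy sequence (invented): ATG + 5 codons + TAA, then a second ATG one base out of frame.
toy = "ATGAAACCCGGGTTTCCCTAAGATGGCAGCAGCA"
orfs = scan_start_codons(toy)  # [(0, 5), (22, None)]
```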

In support of the auto-regulation through negative feedback hypothesis, Palmer et al., and others, have demonstrated that GTF2IRD1 expression in lymphoblast cells from WBS patients is not significantly different from that in unaffected controls118-121. This indicates that in this cell type

there is increased transcription from the single copy of GTF2IRD1 to make up for the loss of one

copy of the gene. However, this effect appears to be cell type specific as the levels of

GTF2IRD1 are decreased by ~50% in fibroblast cells of WBS patients121.

In order to determine if the auto-regulation occurred through direct interactions between

TFII-IRD1 and its promoter, Palmer et al. compared the GTF2IRD1 promoter sequences from a

variety of organisms on the assumption that the sequence to which TFII-IRD1 binds would be conserved. They identified a 104 bp sequence (GTF2IRD1 upstream region; GUR) which is highly conserved between humans and fish and contains three GGATTA sequence motifs, a sequence that TFII-IRD1 had previously been shown to recognize and bind. These six base-pair motifs are 100% conserved between different species, and the conservation of the 104 bp region between species ends soon before/after the proximal and distal GGATTA sequences.

Using EMSA and luciferase reporter assays, Palmer et al. demonstrated that in order for

TFII-IRD1 to bind to the GUR sequence and/or regulate reporter gene expression, there needed to be a minimum of two GGATTA sequence motifs present, and they could not be separated by more than 57 bp. The strongest interactions occurred when there were three GGATTA sequence motifs present, and if there was only one motif, or if the motifs were separated by more than 57 bp, TFII-IRD1 was not able to bind to the DNA.
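
The binding rule described above (at least two GGATTA motifs, no more than 57 bp apart) can be expressed as a simple sequence scan. The sketch below is illustrative only. It makes two assumptions that the text does not settle: that the separation is measured from the end of one motif to the start of the next, and that a motif on the reverse strand (TAATCC) may count, which can be switched off. The function names are hypothetical.

```python
MOTIF = "GGATTA"


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_positions(seq, motif=MOTIF, both_strands=True):
    """Return sorted 0-based start positions of the motif in seq."""
    seq = seq.upper()
    targets = {motif}
    if both_strands:
        targets.add(reverse_complement(motif))
    hits = set()
    for target in targets:
        pos = seq.find(target)
        while pos != -1:
            hits.add(pos)
            pos = seq.find(target, pos + 1)
    return sorted(hits)


def has_bindable_cluster(seq, motif=MOTIF, max_gap=57, both_strands=True):
    """True if at least two motifs lie no more than max_gap bp apart.

    The gap runs from the end of one motif to the start of the next;
    this is an assumption about how the 57 bp separation is defined.
    """
    positions = motif_positions(seq, motif, both_strands)
    return any(b - (a + len(motif)) <= max_gap
               for a, b in zip(positions, positions[1:]))
```

Applied to a promoter that contains a single GGATTA motif, `has_bindable_cluster` returns False, which is the situation argued for Hoxc8, Gsc and TnIs in the following paragraph.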

The yeast one-hybrid experiments which found that TFII-IRD1 can bind to regions of the promoters of the Hoxc8, Gsc and TnIs genes all used bait sequences which included the sequence

GGATTA. The bait sequences were always replicated three times in the construct74,75,79. When

TFII-IRD1 was shown to regulate expression of reporter genes placed downstream of Gsc and

TnIs promoter sequences, the sequences used also contained the GGATTA motif, and the

sequence was replicated six and three times in the constructs respectively78,79. The GGATTA

sequence motif is found only once in the promoter regions of Hoxc8, Gsc and TnIs, and

based on the results of Palmer et al. it is unlikely that TFII-IRD1 is able to bind to, and regulate,

expression of these genes in vivo. It is likely that triplicating the bait sequences in the yeast one-hybrid and reporter gene assays allowed TFII-IRD1 to bind to a region of DNA which it would not normally be able to interact with. Thus, it is not surprising that no differences in Hoxc8 or

Gsc expression could be detected in Gtf2ird1-/- mouse embryos.

2.5.2 Global analysis of gene expression in Gtf2ird1-/- mice

The two different microarray experiments looking at gene expression levels in the brains

of Gtf2ird1-/- and WT mice were unable to identify any genes that are likely to be regulated by

TFII-IRD1. The number of genes identified in each experiment, and the magnitudes of the

changes in expression were both smaller than expected, given the large number of genes (2000+)

identified by microarray as having altered expression in MEFs over-expressing TFII-IRD1101.

Numerous ChIP-seq experiments have been performed recently to identify binding sites

for specific transcription factors. These studies have identified hundreds to thousands of motifs

throughout the genome to which a particular transcription factor binds122-124. However many of

the binding sites are not located in the vicinity of transcription start sites and it is unlikely that all

of the binding events influence gene expression.

An in vivo microarray experiment was recently performed by Enkhmandakh et al. to look

at gene expression in a different Gtf2ird1 knockout mouse model, Gtf2ird1Gt(XE465)Byg/ Gt(XE465)Byg

(Gtf2ird1Gt/Gt), which was generated from a gene trap ES cell line125. They identified 536 genes

with altered expression in E9.5 Gtf2ird1Gt/Gt embryos; however there are several caveats to the

interpretation of these data. Firstly, Gtf2ird1Gt/Gt embryos die between E8.5 and E12.5, with most

showing signs of being actively resorbed by E9.5125. Thus, it is likely that much of the altered

gene expression was due to cellular processes involved in embryonic death and resorption and is

unrelated to the absence of TFII-IRD1. Secondly, the Gtf2ird1Gt/Gt mouse has a far more severe

phenotype than the other four published Gtf2ird1 mouse models53,82,88,118. In each of these other

models, homozygous mice are healthy and fertile, with milder phenotypes such as behavioural

and cognitive deficits or craniofacial abnormalities. The embryonic lethality observed in the

Gtf2ird1Gt/Gt mice likely results from the use of a gene trap ES cell line which contains an

insertion into intron 22 of Gtf2ird1. The resulting transcript would lead to translation of a fusion protein encoding most of TFII-IRD1, but lacking a nuclear localization signal. This fusion protein may still interact with its usual protein partners but be incapable of carrying out its normal function. If this is the case, the downstream effects on global gene expression would be likely to include effects on genes that are not normally either direct or indirect TFII-IRD1

targets.

A number of the genes which I identified as having altered expression in the brains of

Gtf2ird1-/- mice using microarray analysis were good candidate genes for the behavioural

phenotype seen in these mice. Actl6b, which showed decreased expression in Gtf2ird1-/- mice, is a member of a post-mitotic neuron-specific chromatin remodelling complex, and is known to be involved in dendritic growth and development115. Approximately 75% of Actl6b-/- mice die

within two days of birth as a result of defects in neuronal development. Those which survive are

hyperactive indicating defects in neuronal development115.

Microarray results also found Zfp68 to have decreased expression in Gtf2ird1-/- mice.

Little is known about the function of ZFP68, but it does contain the Kruppel-associated box motif, which is known to cause transcriptional silencing. ZFP68 binds to KAP1

and these proteins then form a complex with other proteins resulting in the formation of

heterochromatin126. Kap1-/- mice show increased anxiety-like behaviours127.

However, while expression of these genes was confirmed to be decreased in the brains of

Gtf2ird1-/- mice using qRT-PCR, the expression differences are unlikely to be linked to the

absence of TFII-IRD1 in the mice. It was noted that all of the genes identified in the microarray

using the Illumina platform on E15.5 embryos were located within 50 Mb of the Gtf2ird1 locus

on chromosome 5, as were many of the genes identified in the microarray using the Affymetrix

platform on P0 mouse brains. This raised the possibility that the physical targeting of the

Gtf2ird1 locus had disrupted the expression of nearby genes. In order to determine if the

differences in expression were directly related to the absence of TFII-IRD1, siRNA was used to knock down expression of Gtf2ird1 in two different neuronal cell lines. This allowed the expression of candidate genes on chromosome 5 to be studied without physically disrupting the chromosome. Gtf2ird1 expression was reduced by 60-80% relative to controls. Differences in expression of candidate genes were detected in the brains of Gtf2ird1+/- mice, which have higher levels of Gtf2ird1 expression than the siRNA treated cells, so this level of knockdown would be expected to have an effect on the expression of any true target genes.

Using qRT-PCR, no differences in the expression of the candidate genes could be detected between Gtf2ird1 siRNA treated cells and controls. This could indicate that TFII-IRD1

does not regulate the expression of the candidate genes examined in the cell types that were used,

or that the differences in expression detected in Gtf2ird1-/- mice are not directly related to the

absence of TFII-IRD1 in the mice.

Another explanation for the clustering of the candidate target genes around the Gtf2ird1

locus is that the differences in expression were the result of background strain. The Gtf2ird1-/-

mice were generated in R1 ES cells which are derived from a 129X1/SvJ & 129S1 cross. The

mice were then back-crossed onto a CD1 background. Most of the genome in the mice used for

analysis would have been a CD1 genotype, but the region flanking the targeted locus would

retain a 129 genotype. Actl6b, Zfp68 and all of the other genes identified in the microarray

performed on E15.5 mouse embryos flank the Gtf2ird1 locus, and therefore may have

polymorphisms in the Gtf2ird1-/- mice which differ from the WT CD1 mice that they were

compared to and which may alter gene expression. Thus when comparing the expression of

genes on chromosome 5G between Gtf2ird1-/- and WT mice, I was actually comparing expression between CD1 and 129 mice.

The phenomenon of the genes which flank a targeted locus confounding the results of microarray experiments is a recognized problem128-130. Many polymorphisms exist which result in altered levels of gene expression between different mouse strains131. Thus, a mouse with a

targeted allele may express the genes surrounding the targeted locus at different levels than the

WT mouse to which it is being compared. This flanking gene effect has been shown to persist

after 11 generations of backcrossing130, and to extend up to 40 Mb from the targeted locus130.
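
A rough calculation shows why donor-derived flanking sequence can persist over this many generations. The sketch below is a simplified model, not an analysis from the cited studies: it assumes one informative meiosis per backcross generation with selection for the targeted allele, independence between generations, Haldane's map function, and a uniform recombination rate of 0.5 cM/Mb (an assumed genome-wide average for mouse, not a measured value for chromosome 5). The function names are hypothetical.

```python
import math


def recombination_fraction(distance_cm):
    """Haldane map function: genetic distance (cM) to recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def prob_still_linked(distance_mb, generations, cm_per_mb=0.5):
    """Probability that a gene stays on the donor haplotype with the targeted locus.

    Assumes no recombination between the gene and the targeted locus in
    any of the backcross generations, each treated as independent.
    cm_per_mb is an assumed average rate, not a chromosome 5 measurement.
    """
    r = recombination_fraction(distance_mb * cm_per_mb)
    return (1.0 - r) ** generations
```

Under these assumptions, a gene 40 Mb from the targeted locus has a probability of about 0.14 of remaining donor-derived after 11 generations, and genes within 5 Mb are far more likely to be retained.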

In order to determine if the candidate genes had altered expression because of differences

in genotype, expression of candidate genes was analyzed in Gtf2ird1-/-, WT CD1 and WT 129

mice using qRT-PCR. Expression of genes which flank the Gtf2ird1 locus was found to be

significantly different between WT CD1 and 129 mice, while Gtf2ird1-/- mice were not

significantly different than WT 129 mice. SNP analysis was then used to show that the Zfp68

allele in the Gtf2ird1-/- mice is genetically 129 and contains different polymorphisms than those

found in CD1 mice. These results indicated that the differences in expression of these genes

detected in Gtf2ird1-/- mice were the result of differences in background strain and were not

related to loss of TFII-IRD1.

As some of the genes which had altered expression in Gtf2ird1-/- mice play a role in brain

development and behavioural pathways, it raises the possibility that the behavioural phenotype

seen in the mice is unrelated to the absence of TFII-IRD1 and is actually the result of differences between different mouse strains. This is unlikely, since in contrast to the low anxiety detected in

Gtf2ird1-/- mice, mice of a 129 genetic background have been found to have higher levels of

anxiety compared to other mouse strains132-134. In addition, Gtf2i+/- mice do not show increased

sociability (unpublished results). Gtf2i is also located on chromosome 5G, adjacent to Gtf2ird1

and the Gtf2i+/- mice were also derived from R1 ES cells, and therefore would be expected to

have a 129 genotype for the genes flanking the region.

Not all of the differences in expression in Gtf2ird1-/- mice can be attributed to differences

in genetic background. Microarray analysis on P0 mice found that Kin, Stx3 and Mrpl16 all

showed decreased expression in Gtf2ird1-/- mice. This difference in expression was confirmed

using qRT-PCR; Gtf2ird1-/- mice were shown to have decreased expression of Stx3 and Mrpl16

relative to both WT CD1 and 129 mice. Stx3 was an attractive candidate gene for some of the

Gtf2ird1-/- mouse behavioural phenotypes as it is known to be involved in neuronal growth114 and

synapse function135.

My results indicate that the choice of microarray platform may affect the genes found to

be differentially expressed between two groups. There was very little overlap between the genes

identified as having altered expression using the Illumina array and the Affymetrix array. Part

of the reason for this could be that the arrays were performed at two different time points;

however, many of the genes identified could be validated at both time points using qPCR.

Studies comparing microarray platforms have generally found good correlation between the

Affymetrix and Illumina platforms136-138. One of the biggest differences between the two platforms is in

the probe design; Illumina generally uses one 50-mer probe per transcript while Affymetrix uses

multiple 25-mer probes.

There are a number of possible explanations as to why no targets of TFII-IRD1 were

identified in the mutant mice. It could be that TFII-IRD1 does not regulate gene expression at

the time points examined, or that the regulation is only occurring in a very specific cell

population in the brain, and by examining the entire brain I diluted out the effect. These

scenarios are unlikely as Gtf2ird1 is expressed throughout the brain, at relatively high levels, at

both of the time points examined. It is unlikely that such robust expression would occur if the gene were not fulfilling an important role. It is possible that other GTF2I-family members are compensating for the loss of TFII-IRD1 at the time points studied; however, there is no evidence to date to support this theory.

Another possibility is that the absence of TFII-IRD1 does affect gene expression in

Gtf2ird1-/- mice, but the changes in expression are small, and below the threshold of detection.

This was the case when gene expression in Mecp2 mutant mice (a model of Rett Syndrome) was

examined139. MeCP2 is believed to be a general transcriptional repressor, and Mecp2-null mice show a disease phenotype similar to that seen in people with Rett syndrome. Microarray analysis was performed on brains from these mice at multiple time points, and no significant changes in gene expression could be detected. However, the authors found that they could differentiate mutant mice from WT mice by looking at very subtle changes in gene expression that occurred in a number of genes simultaneously. It is possible that loss of TFII-IRD1 likewise causes only subtle changes in gene expression, which are nevertheless sufficient to cause the behavioural phenotype seen in Gtf2ird1-/-

mice.

Recent studies have indicated that MeCP2 may play a role in chromatin remodelling in

addition to, or instead of, acting as a gene specific transcription factor. Ishibashi et al.

demonstrated that MeCP2 binds chromatin at sites of entry and exit from nucleosomes, similar to

linker histones; however unlike linker histones, MeCP2 binding is dependent on methylation140.

It was later discovered by Skene et al. that mature neurons contain one MeCP2 molecule for

every two nucleosomes, and that MeCP2 binding occurs throughout the genome, suggesting a

global regulatory role for the protein141. In support of this theory, histone H1 (a linker histone) is

known to be expressed at roughly 50% lower levels in neurons relative to other cell types142.

However, in MeCP2 null mice H1 expression in neurons is increased by 2-fold, bringing it in line with the expression levels seen in other cell types141. Skene et al. proposed that H1 and

MeCP2 may compete for chromatin binding sites, and the absence of MeCP2 can be partially

compensated for by increased H1 expression. MeCP2 may therefore produce small, but global, changes in transcription levels by acting as a linker protein and by recruiting HDACs to the

chromatin. Thus, MeCP2 is unlikely to function as a classical transcription factor as was

previously believed.

TFII-IRD1 is unlikely to function in the same manner as MeCP2; however, an alternative explanation for my findings is that TFII-IRD1 does not function as a classical transcription factor in the brain at the time points studied. It is possible that TFII-IRD1 does not directly regulate gene expression in vivo, and instead is involved in protein-protein interactions. In order to determine whether there is an alternative role for TFII-IRD1, the localization of the protein within cells was studied.

2.5.3 Cellular localization of TFII-IRD1

Previous studies on the cellular localization of TFII-IRD1 have reported nuclear localization of the protein, as would be expected of a transcription factor57,68,94. However,

immunohistochemistry performed on transfected N2A cells revealed that TFII-IRD1 localizes to both the nucleus and cytoplasm of these cells. Interestingly, it appeared that in any given cell

TFII-IRD1 staining was seen in only the nucleus or the cytoplasm, never both. However, further studies using confocal microscopy will be needed to confirm this. This is the first time that TFII-IRD1

localization has been studied in a neuronal cell line, and it indicates that the protein may play a role other than that of a transcription factor in these cells.

Recently a cellular role for TFII-I was discovered. Caraveo et al. found that TFII-I is able to negatively regulate agonist-induced calcium entry into cells143. Intracellular calcium

signalling is initiated by receptor tyrosine kinases and G protein-coupled receptors which

activate the γ or β forms of PLC. PLC-γ is then able to bind to a calcium channel, transient receptor potential channel 3 (TRPC3), resulting in the insertion of the channel into the plasma membrane and an influx of calcium144. Caraveo et al. demonstrated that when TFII-I is

phosphorylated, it can bind to PLC-γ. They went on to show that knocking down Gtf2i in PC12

cells (derived from rat adrenal medulla) leads to an increase in calcium influx, and this

phenotype could be rescued by expressing a human GTF2I isoform that is unaffected by the rat-specific

siRNAs. These results suggested that TFII-I negatively regulates intracellular calcium levels.

As binding of PLC-γ to TRPC3 channels results in insertion of the channels into the plasma membrane, Caraveo et al. studied the surface accumulation of TRPC3 after knocking down TFII-I in PC12 cells. They found increased levels of TRPC3 in the plasma

membrane, and there was an increase in total TRPC3 protein levels. This effect was believed to

be post-transcriptional as mRNA levels were not affected.

Proper regulation of intracellular calcium levels via TRPC channels is essential for many neuronal functions including axon guidance145, membrane depolarization146 and innate levels of

fear147. It is possible that TFII-IRD1 may also play a role in regulating intracellular calcium levels through TRPC channels. Decreased levels of TRPC4 protein have been detected in the frontal cortex of adult Gtf2ird1-/- mice (Ted Young, personal communication). Further work will

need to be done to determine if there is a direct interaction between TFII-IRD1 and TRPC4, PLC or other cytosolic proteins, and if intracellular calcium levels are altered in Gtf2ird1-/- mice.

Chapter III: Exon specific differences in gene expression between different mouse strains

3.1 Abstract

There are a number of factors which can confound the analysis of gene expression levels

when comparing mice of different genetic backgrounds, including the use of different polyadenylation and splice sites. I have previously shown that a subset of genes in Gtf2ird1-/-

mice are expressed at different levels than in WT CD1 mice as a result of the retention of DNA

from the parental 129 derived strain. A closer look at expression of these genes in Gtf2ird1-/-

mice revealed that while changes in expression could be detected using primers specific to the

3’UTR of the transcripts, primers targeting upstream coding exons did not necessarily detect a

similar expression pattern. For some genes, including Stx3, Taf6 and Coq2, the genotype

specific expression differences were restricted to the 3’ UTR. Zfp68 and Actl6b also had

decreased expression in the 3’ UTRs; however, expression of coding regions was variable:

increased in some exons and decreased in others. Northern blot analysis performed on a subset

of the genes failed to identify alternative transcripts which could explain these findings; however

transcripts which use alternative splice sites within the 3’ UTR and/or alternative

polyadenylation sites were identified using 3’ RACE. Primers designed to differentiate between

transcripts using alternative polyadenylation sites detected genotype specific differences in

expression of genes located on chromosome 5. Altered expression of these genes was found to

be the result of retention of that chromosome region from the targeted embryonic stem cell line,

and therefore dependent upon background strain rather than Gtf2ird1 genotype.

3.2 Literature Review

3.2.1 Polyadenylation of pre-mRNA

A key step in the generation of eukaryotic mRNAs is the addition of a poly(A) tail to the

3’ end of the transcript. This process involves cleavage of the primary RNA transcript in the 3’

UTR, and the subsequent addition of adenosine residues. Proper processing of the poly(A) tail is

essential for gene expression since the poly(A) tail is necessary for transport of the transcript out

of the nucleus148, stabilization of the transcript149 and initiation of translation150.

The majority of mammalian transcripts contain the canonical polyadenylation signal

sequence A(A/U)UAAA 10 – 35 nucleotides upstream of the cleavage site and a GU-rich motif

14-70 nucleotides downstream of the cleavage site148,151. Cleavage requires binding of the

proteins cleavage and polyadenylation specificity factor (CPSF) and cleavage stimulatory factor

(CstF) to these motifs respectively. Binding of these two proteins allows for the assembly of the

3’ processing complex, which includes up to 85 different proteins including cleavage factors Im

and IIm (CFIm and CFIIm) and poly(A) polymerase148,152. Not all of the proteins associated with

the polyadenylation complex play a role in the formation of the 3’ end of the transcript; some are

involved in transcription, splicing, and termination153. The actual cleavage of the RNA transcript

is performed by CPSF-73, a subunit of CPSF; however, the entire complex of proteins is needed

for cleavage to occur154. Poly(A) polymerase then adds a string of 200-250 adenosine residues to

the 3’ end of the transcript upstream of the cleavage site155.
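
As an illustration only (this was not part of the analyses performed here), the positional rule described above can be written as a short sequence scan. The Python sketch below reports every A(A/U)UAAA hexamer that starts 10 – 35 nucleotides upstream of a supplied cleavage site. The function name, the example sequence and the choice to measure distance from the first nucleotide of the hexamer are assumptions made for the example.

```python
import re

# Canonical hexamers A(A/U)UAAA, written in the DNA alphabet used for cDNA.
# The lookahead lets overlapping candidates be reported.
PAS_PATTERN = re.compile(r"(?=(A[AT]TAAA))")


def find_upstream_signals(seq, cleavage_site, min_dist=10, max_dist=35):
    """Return (start, hexamer, distance) for each canonical hexamer whose
    first nucleotide lies min_dist..max_dist nt upstream of the cleavage site.

    seq           -- transcript or cDNA sequence (RNA 'U' is read as 'T')
    cleavage_site -- 0-based index of the first nucleotide after the cut
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for match in PAS_PATTERN.finditer(seq):
        start = match.start(1)
        distance = cleavage_site - start
        if min_dist <= distance <= max_dist:
            hits.append((start, match.group(1), distance))
    return hits


# Hypothetical 3' UTR fragment, for illustration only.
example = "GCGC" + "AATAAA" + "C" * 14 + "GTGTGTGT"
print(find_upstream_signals(example, cleavage_site=24))
```

A transcript for which such a scan returns no hit would be a candidate for the non-canonical signals discussed in the next paragraph.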

In some cases mRNA cleavage and polyadenylation occurs in the absence of the canonical A(A/U)UAAA signal motif. 15-30% of human transcripts and 25% of mouse transcripts do not contain an AAUAAA or AUUAAA element in the 3’ UTR156,157. The

mechanisms by which this occurs are not yet fully understood, but in at least one case CFIm has

been shown to determine where polyadenylation will occur in the absence of an A(A/U)UAAA

motif151. CFIm is composed of two subunits (a small 35 kDa protein, and a larger protein of

either 59, 68 or 72 kDa), and is believed to play a role in linking 3’ end formation and splicing of

pre-mRNAs151,158,159. The most frequently used poly(A) site in the 3’ UTR of the poly(A) polymerase-γ (PAPOLG) gene does not contain a motif with more than a 4 nucleotide match to the sequence A(A/U)UAAA151. The 3’ end of PAPOLG is highly conserved among vertebrates,

and CFIm was found to bind to a UGUAN element repeated upstream of the primary poly(A)

site. Binding of CFIm to these sites is necessary for both poly(A) site cleavage and proper

addition of adenosine nucleotides. CFIm is able to direct 3’ end polyadenylation of PAPOLG

transcripts through interactions with the hFip1 subunit of CPSF (which usually binds to the

A(A/U)UAAA motif) and PAP151. A similar mechanism is likely to regulate poly(A) site

selection in other transcripts which lack the canonical poly(A) motif.

Approximately 54% of human transcripts and 32% of mouse transcripts use alternative

polyadenylation sites, meaning that one of multiple polyadenylation signals must be selected

for use in a given transcript155. This leads to transcripts with 3’ UTRs that differ in length or,

in combination with alternative splicing, to transcripts with different coding regions.

3’ UTRs contain sequences important for RNA localization, stability, translation and microRNA

binding160. Poly(A) site selection has been shown to be important for the proper expression and

function of many genes in the nervous system, including brain-derived neurotrophic factor

(BDNF). Cleavage of BDNF transcripts occurs at one of two possible poly(A) sites, allowing for

the generation of two possible transcripts (“short” and “long”), both of which encode the same

protein and are produced at high levels161. An et al. demonstrated that the short form localizes

to the somata, while the long form localizes to the dendrites, where it is locally translated161.

Transgenic mice which only express the short form of BDNF have dysmorphic dendritic spines

and impairments in hippocampal LTP, despite the fact that they express the same level of BDNF

protein as WT mice.
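
The consequence of choosing between poly(A) sites can be made concrete with a small calculation. The following Python sketch is a hedged illustration rather than a description of any analysis in this thesis: for each alternative cleavage site it reports the resulting 3’ UTR length and which annotated elements the isoform retains. All coordinates, site names and element names are hypothetical, and intervals are treated as half-open (start inclusive, end exclusive).

```python
def utr_isoforms(utr_start, polya_sites, elements):
    """Describe the 3' UTR produced by each alternative poly(A) site.

    utr_start   -- coordinate of the first 3' UTR nucleotide
    polya_sites -- {site name: cleavage coordinate}
    elements    -- {element name: (start, end)} in the same coordinates
    """
    isoforms = {}
    for name, cleavage in sorted(polya_sites.items(), key=lambda kv: kv[1]):
        # An element is retained only if it lies entirely before the cut.
        retained = sorted(
            label
            for label, (start, end) in elements.items()
            if start >= utr_start and end <= cleavage
        )
        isoforms[name] = {
            "utr_length": cleavage - utr_start,
            "retained": retained,
        }
    return isoforms


# Hypothetical coordinates, for illustration only.
sites = {"proximal": 350, "distal": 2900}
elements = {
    "miRNA_site_1": (120, 142),
    "localization_element": (1500, 1560),
}
for name, info in utr_isoforms(0, sites, elements).items():
    print(name, info)
```

In this toy case only the distal isoform keeps the downstream element, which mirrors the way the long BDNF transcript, but not the short one, carries the sequence needed for dendritic localization.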

The mechanisms by which a cell selects which poly(A) site to use are not yet well

understood162. Potential site selection methods are thought to include the blocking of protein

binding sites in the transcript by RNA binding proteins, post-translational modifications (such as

phosphorylation) of the polyadenylation protein complex, or the affinity of certain subunits of

the polyadenylation complex for sequences flanking the cleavage site162.

CstF-64, a subunit of CstF, was one of the first proteins demonstrated to have a direct

role in poly(A) site selection163. CstF-64 is involved in the processing of immunoglobulin M heavy chain (IgM H-chain) pre-mRNA. B cells produce a membrane-bound form of IgM H-chain (µm), while differentiated plasma cells produce a secreted form (µs)164. The secreted IgM

H-chain uses an upstream µs-specific poly(A) site, while the membrane-bound form is generated using a downstream µm-specific poly(A) site. Increasing the cellular concentration of

CstF-64 is sufficient to cause the switch from the production of µm to µs163. CstF-64 expression

is the limiting factor in the generation of CstF. It is believed that when low levels of CstF-64 are

present the weaker µs poly(A) site is not recognized; however, when levels of CstF-64 are increased, the µs poly(A) site (which is transcribed first) is used. These results indicate that the level of expression of the polyadenylation complex subunits may regulate poly(A) site selection for certain genes; higher levels of expression may be necessary in order to utilize weaker poly(A) signal motifs.

Recently, a novel role for the Nova2 protein in poly(A) site selection was identified165.

Nova2 is a brain specific RNA binding protein which binds to YCAY motifs and regulates the

splicing of pre-mRNAs166. Licatalosi et al. identified Nova2 binding sites using HITS-CLIP, a technique involving the crosslinking of RNA-protein complexes, followed by the use of high throughput sequencing to identify RNA fragments bound to the proteins165. The authors were

surprised to find that Nova2 bound to known 3’UTR regions, near poly(A) sites, as well as to

regions within 10 kb of stop codons which were believed to be unannotated longer 3’ UTRs of

known genes. qRT-PCR was used to look at the expression of 29 candidate Nova targets in

Nova2 knockout mice. Twelve of these genes were found to have significant differences in

poly(A) site selection, though the total levels of expression were the same. Nova2 was able to

block or enhance use of a poly(A) site by binding to flanking YCAY-rich motifs.

In rare cases, 3’ end formation of mRNAs has been found to occur through noncanonical

mechanisms. For example, transcripts of the yeast CTH2 gene are not usually cleaved at the poly(A) signal site; instead the final 1.8 kb of the primary transcript is degraded in a 3’ – 5’ manner by the nuclear exosome/TRAMP complex until it reaches a G/U rich sequence and polyadenylation occurs167. In mice and humans the 3’ UTRs of MALAT1 and MEN β, two non-

coding RNAs, form a secondary structure which is recognized and cleaved by RNase P168,169.

Each of these genes does have a canonical polyadenylation site which is used at low levels in

vivo; however, in the majority of transcripts 3’ end processing is performed by RNase P, an enzyme

previously identified for its role in tRNA processing. These transcripts do have poly(A) tails;

however, they are not generated using poly(A) polymerase. Instead there is a sequence of

adenosine residues genomically encoded immediately upstream of the RNase P cleavage site.

Together these examples illustrate that polyadenylation of transcripts is a complex

process, and there are many exceptions to the “rule” of canonical 3’ end processing.

3.2.2 Transcription termination

During the elongation stage of transcription the C-terminal domain (CTD) of RNA polymerase II (RNAPII) is phosphorylated at specific serine residues to allow for binding of proteins needed at different points in the generation of the mRNA170. Early in transcription

proteins needed for the addition of the 5’ cap bind to the CTD, then as transcription proceeds

differential phosphorylation allows for interactions with proteins needed for polyadenylation,

and then termination171,172. RNAPII continues to transcribe RNA after passing by the

polyadenylation signal(s), and the mechanisms by which transcription terminates have not yet

been fully elucidated173. There is evidence to suggest that the processing of the 3’ end of mRNA

is a key step in the process of transcription termination174.

Two models have been proposed linking transcription termination, and the displacement

of RNAPII from the RNA strand, to the cleavage of the RNA strand in the 3’ UTR for

polyadenylation. The allosteric/anti-termination model proposes that after recognition of a poly(A) site, an anti-termination factor dissociates from RNAPII and/or RNAPII undergoes a conformational change, which gradually causes transcription to cease and RNAPII to “fall off” the transcript175,176. This is in contrast to the torpedo model, first proposed by Connelly and

Manley in 1988177. According to this model, following cleavage of the nascent RNA transcript

at the poly(A) site, a 5’ – 3’ exonuclease degrades the portion of the transcript that is downstream of the cleavage site. Once the exonuclease catches up with RNAPII it causes transcription termination.

Rtt103 was identified in a search for proteins that bind to the CTD of RNAPII in yeast,

and was found to play a role in 3’ end processing178. Rat1, a yeast 5’ – 3’ exoRNase, was found

to associate with Rtt103 and both proteins have been shown to associate strongly with the 3’

ends of genes using ChIP. Kim et al. demonstrated that Rat1 co-transcriptionally degrades the uncapped RNA downstream of the poly(A) site, and that this exonuclease activity was required for proper termination of transcription178. XRN2, the human homolog of Rat1, has been found to

play a similar role in transcription termination giving support to the torpedo model of

termination179.

More recently, it was demonstrated that while XRN2 and Rat1 are necessary for RNAPII release from the nascent transcript, they are not sufficient to cause release of the elongation complex180. The authors found that in addition to its co-transcriptional exonuclease activity,

Rat1 also serves to recruit other 3’ end processing factors needed for termination including Pcf11

and Rna15. Pcf11 has been shown to bind to the CTD of RNAPII, resulting in the dissociation

of RNAPII and the nascent transcript from the DNA template181. It has been suggested that

Pcf11 is an allosteric effector that causes the dissociation of RNAPII by altering its

conformation180,181. These results indicate that neither the torpedo model nor the allosteric model

of termination is sufficient to describe this complicated process. It is likely that processes

described in each model are necessary for RNAPII dissociation and transcription termination to

occur in vivo.

3.2.3 Strain specific gene expression

Gene expression has been shown to vary between different mouse strains. Nadler et al. studied gene expression in 7 different brain regions in 10 different strains of inbred mice, and found that 30% of genes show strain specific differences in expression in at least one brain

region182. This variation in brain expression is not surprising, as different mouse strains are

known to exhibit different behaviours132,133. MicroRNA (miRNA) expression is also likely to

show great variation between mouse strains, and a number of miRNAs have been shown to have

strain specific expression levels in the hippocampus183. This could potentially have downstream

effects on the protein expression levels of multiple genes as many genes are post-

transcriptionally regulated by miRNAs. This type of regulation often occurs in conjunction with

poly(A) site selection as the use of alternative poly(A) sites can result in transcripts which either

include or exclude miRNA binding sites. For example, the β-actin gene encodes two isoforms

which differ in 3’ UTR length. The longer isoform is expressed at lower levels; however, this

isoform is translated with higher efficiency due to miRNA binding160.

While poly(A) site selection in different mouse strains has not been studied in detail, strain specific selection has been shown to occur. C57BL/6 mice express an Adh4 isoform in their stomachs that is not found in other mouse strains184. Analysis of the 3’ UTR revealed that a SNP

in C57BL/6 introduced a new poly(A) signal.

These examples illustrate how gene expression can be differentially regulated in different

mouse strains both at the level of transcription and post-transcriptionally.

3.3 Material and Methods

3.3.1 Expression analysis using quantitative Real-Time PCR

The protocol for quantitative real-time PCR can be found in section 2.3.9. Primer

sequences used to detect exon specific expression differences are listed in Table 3.1.

Table 3.1 Primers used for quantitative real-time PCR amplification from cDNA

Primer name       Forward primer sequence (5’ – 3’)   Reverse primer sequence (5’ – 3’)
Actl6b ex.3       GGAGGGGGAGAAAGAGAAGA                CATTCTTGAGGGGCGACAT
Actl6b ex.7       CTGGCAGGGGACTTCATCT                 TGGCTGCAATCATGTAAGGA
Actl6b ex.10      GCCCACTGTGCATTATGAAA                GGATCAAACAGGCCCTCAG
Actl6b ex.13      GTCTTAAGCTCATCGCCAGCA               CAGTGAGGCCAAGATGGAG
Actl6b ex.14      TTCCAGCAGATGTGGATCTC                CAACTTCAGGGGCACTTCC
Actl6b 3' UTR     ATACCCGTCCACCCCATC                  GGGTAATGGGAAAGGGAGAG
Ap4m1 ex.15       CTCCAGGTTCGATTCCTCAG                TTGCTGTGGCTTAGATGTCG
Ap4m1 5' UTR      GGGTTCAACTTTCACCGTGT                CCCCTTGGAAGACAGAATGA
Coq2 ex.2         GACCCAGGTTGTTTTCCAGA                CACGGTCCCACATGTCATTA
Coq2 ex.6         TCAGCTCGTCTGCTCACTTC                GGGATTGTCCTTGGAAACCT
Coq2 3' UTR-A     GCCCAAGGCTCTAGGTTCTC                TTGCTCATCCAAGCCTAACA
Coq2 3' UTR-B     AATGCTAACACAGGGGCCTA                GGCAGCGTGTACTGGACTTT
Coq2 3' UTR-C     TCTTGAATTACAGCTTTGGCAGT             ACATGGCCGTGTGCTTTATT
Kin ex.1          GCAAGTCGGATTTTCTGAGC                ATCTGGCAGTACCAGCGAAG
Kin 3' UTR        TGAAAGGACGCAGAGTTGAA                GTGCCTTGGCTAACACCAAT
Mrpl16 5' UTR     CTGGGAAAAGCCACTGTTGT                AAAGTGCATCCGCAGGAG
Mrpl16 ex.4       GCTTGACAATCAACCGCTTT                CAACACCTTTGCGAGTGATG
Mrpl16 3' UTR     GTAGTGAAAGCGCGAGGAAC                AGAACCAGCAAAGACCCTCA
Pex1 ex.21        CCTGGGAAAGACCCGTTATT                ATCTCTCTGCTCCTGGGTGA
Pex1 3' UTR-A     GTTTGCTCCCATCTCTCCAG                GCAAGTGGCACTGATGGTAA
Pex1 3' UTR-B     GCATTAGCTTGAGCACAGCA                TCAAGTGCTTGAATGCTTGG
Stx3 ex.10        ATCATGATCTGCTGTATTATCCTTG           AGGCAAATATGCCCCCAAT
Stx3 3' UTR-A     ACAACATGCCCAACTCAACA                TGCGACCTAGAAGAGCCATT
Stx3 3' UTR-B     TGGTCTCAGGATGGAGGTTC                TTTGGGAGCTGGGTCATAGT
Stx3 3' UTR-C     AAAAGTAGGGAGTACCATGATCTGA           CAGGATGGTGGTGAAGAGG
Stx3 3' UTR-D     GAAGGGACATGGTAGTATTCGAG             CGCATTCTTAACCAACCACA
Stx3 3' UTR-E     AAAAATCATGTTCCCAATGGT               GCCACTTTCAGATGTCTGCTT
Stx3 3' UTR-F     TCAATACAAAGCCAGCTTCTACA             GGCACATAGAAAAATATGGCAAC
Stx3 3' UTR-G     GGACCAGTTTCTTGCACATTT               AAGGAAGCCAAGGGGATAAA
Stx3 3' UTR-H     GCTGCCCATCTTCTGTCAGT                GCTTCAGATCTAGGAAGGGTTTC
Taf6 ex.10        GTGGACAATCACTGGGCACT                GATGTTGTTGGTGGTTGTGC
Taf6 3' UTR       TCACATGTGCTGACCTCCTC                GGGGAAAACCTTTCCTCCTT
Zfp68 ex.3        AGGTGGCTGTGCTGTAGACTC               CATTTCTGGCTTCCAGGACA
Zfp68 ex.5        GGCACTGCAAAACCAAACC                 CACACTGGTGTGAGGCTTCT
Zfp68 ex.6        ACACACCCCGAGCAAGTTAG                TGTTGAAGGATTTCCCTGGT
Zfp68 3' UTR-A    GCTAAGGGGACCCTGTGATT                CAAGGTTTTCCTTCACCGTTT
Zfp68 3' UTR-B    CACCGTTTATTCATTTGGTTTAAT            GCTAAGGGGACCCTGTGATT
Zfp68 3' UTR-C    AAAGCAGGAGAGATGGCTCA                TCAGAGGACAAATCCCAAGG
Zfp68 3' UTR-D    GCCACTTCTTTGCTGTTTCC                CCCATGGATAGGTCATGGTC
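
A quick sanity check on primers such as those in Table 3.1 is to compare their length, GC content and approximate melting temperature. The Python sketch below is illustrative and was not part of the primer design described here. It uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough guide for oligonucleotides of this length. The three primer pairs are copied from Table 3.1; the function names are hypothetical.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rule-of-thumb melting temperature: 2 C per A/T plus 4 C per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Three primer pairs copied from Table 3.1 (forward, reverse).
primers = {
    "Stx3 ex.10": ("ATCATGATCTGCTGTATTATCCTTG", "AGGCAAATATGCCCCCAAT"),
    "Taf6 3' UTR": ("TCACATGTGCTGACCTCCTC", "GGGGAAAACCTTTCCTCCTT"),
    "Kin ex.1": ("GCAAGTCGGATTTTCTGAGC", "ATCTGGCAGTACCAGCGAAG"),
}

for name, pair in primers.items():
    for direction, seq in zip(("F", "R"), pair):
        print(f"{name} {direction}: {len(seq)} nt, "
              f"GC {gc_percent(seq):.0f}%, Tm ~{wallace_tm(seq)} C")
```

Large differences in estimated melting temperature within a pair would suggest checking the annealing temperature used for that amplicon.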

3.3.2 Generation of probes for Northern blots

Probe sequences were amplified from cDNA that had previously been prepared from a

WT P0 mouse brain. Conventional PCR was performed using a high fidelity enzyme mix (Fermentas, K0192) and primers specific to the test genes (Table 3.2). The desired PCR amplicons were cloned into a pCR 2.1-TOPO TA cloning vector (Invitrogen). The vectors were digested with EcoRI to release the probe and, following electrophoresis, the probe sequence was purified from an agarose gel using a QIAquick Gel Extraction Kit (Qiagen).

Table 3.2. Primer sequences used in the generation of Northern blot probes

Probe               Forward primer sequence (5' - 3')   Reverse primer sequence (5' - 3')   Length (bp)
Mrpl16 5' end       CTGGGAAAAGCCACTGTTGT                TGATCGATGGCACCTTTACC                504
Mrpl16 3' end       GTGGACGGTGTGAATTTGAA                CAGCTGGCTATCAAACTGTCC               497
Mrpl16 full length  GGAAAAGCCACTGTTGTAGTTG              CAGCTGGCTATCAAACTGTCC               1042
Kin 5' end          GCAAGTCGGATTTTCTGAGC                TCTCCTGCTCTTTCCCTTCC                558
Kin 3' end          CACCGAAAGGCTGGTACATT                GTGCCTTGGCTAACACCAAT                843
Pex1 5' end         TTGGACTCTCAAGCGGAGAT                CTTTGTCAGGCAACAGGACA                819
Pex1 3' end         AGATCAGGTGTCCCGTCTTG                CAGCTCAGCAAATTCCTTCC                745
Pex1 long & short   CCTAAAGACGTCAATGAAGAAAC             GTGATCAAAAGAGCGCCATTC               440
Stx3 e9 - 3' UTR    TTGACCGCATTGAGAACAAC                GCAGGCACTGTGTGCTAAGA                742
Stx3 e10 - 3' UTR   TGCCATCATCTTGGCTTCTA                TGAACCTCCATCCTGAGACC                892

The probes were radioactively labeled before hybridization to the blot. 100 ng of DNA was boiled with 1.25 µL random hexamers, in a total volume of 8.5 µL for two minutes. 1 µL of

100X BSA and 10 µL of 2.5X Random Priming Buffer (0.5 M HEPES, 12.5 mM MgCl2, 0.025

M β-mercaptoethanol, 0.125 M Tris, pH 8.0, 50 µM dATP, 50 µM dGTP, 50 µM dTTP) were added, before incubation at room temperature for 10 min. 1 µL of Klenow fragment (Fermentas,

EP0051), and 5 µL (50 µCi) of deoxycytidine 5’-triphosphate [α-32P] (GE Healthcare,

Quebec) were then added, and the mixture was incubated at 37°C for 2 hours. The probe was then passed through a sephadex column to remove unincorporated nucleotides.

3.3.3 Northern blot analysis

Total RNA was extracted from the brains of two WT and two Gtf2ird1-/- mice, using

TriReagent (Sigma) following the manufacturer’s protocol. 15 µg of each RNA sample was

mixed with 2X RNA loading dye containing formamide (Fermentas) and heated at 70°C for 10

min. After chilling on ice, the RNA samples were then run on a 1% agarose-formaldehyde

(0.7M) gel in 3-(N-morpholino) propanesulfonic acid (MOPS) buffer. An RNA ladder,

RiboRuler high range (Fermentas, SM1821), was also run to determine the sizes of the detected

transcripts.
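
Transcript sizes are conventionally read from a ladder by assuming that migration distance is approximately linear in the logarithm of fragment length. The Python sketch below shows that interpolation as a hedged illustration, not as the procedure used for these blots. The ladder band sizes are assumed values that should be replaced with those on the ladder's product sheet, and every migration distance is a made-up placeholder rather than a measurement.

```python
import math


def fit_ladder(distances_mm, sizes_nt):
    """Least-squares line through (distance, log10 size) for the ladder bands.

    Returns (slope, intercept) of log10(size) as a function of distance.
    """
    n = len(distances_mm)
    logs = [math.log10(size) for size in sizes_nt]
    mean_x = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(distances_mm, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_mm, slope, intercept):
    """Estimated transcript length (nt) for a band at the given distance."""
    return 10 ** (slope * distance_mm + intercept)


# Assumed ladder band sizes (nt) and hypothetical migration distances (mm).
ladder_sizes = [6000, 4000, 3000, 2000, 1500, 1000, 500, 200]
ladder_distances = [12, 18, 23, 30, 35, 42, 54, 70]

slope, intercept = fit_ladder(ladder_distances, ladder_sizes)
band_distance = 27  # hypothetical band on the blot
print(f"Estimated size: {estimate_size(band_distance, slope, intercept):.0f} nt")
```

Because the relationship is only approximately log-linear, estimates near the ends of the ladder range are less reliable than those in the middle.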

The RNA-containing gel was rinsed in RNase-free ddH2O, and then soaked for 20 min. in

5 gel volumes of 0.01 N NaOH/3 M NaCl to partially hydrolyze the RNA. The RNA was then

transferred to a positively charged nylon membrane (Amersham Hybond-N+, RPN303B) using

capillary action. The RNA was fixed to the membrane by irradiating at 254 nm at 120 mJ/cm2

for two minutes using the Spectrolinker XL-1500 UV crosslinker.

The blots were incubated for 2 hours at 68° C in modified Church-Gilbert solution (0.5 M sodium phosphate (pH 7.2), 7% (w/v) SDS, 1 mM EDTA (pH 7.0)) before addition of the probe. 2

µL of labelled probe was used to check the efficiency of the labelling reaction, and the rest was denatured by heating at 100° C for 10 min, and then added directly to the modified Church-

Gilbert solution already on the blot. The blots were then incubated at 68° C overnight.

Following hybridization, the blots were washed 1x10 min. with 1X SSC + 0.1% SDS at room temperature, and then 3x10 min. with 0.5X SSC + 0.1% SDS at 68° C. Membranes were then exposed to X-ray film at -70°C for 24 hours to 1 week.

3.3.4 3’ Rapid Amplification of cDNA ends (RACE)

Total RNA was extracted from the heads of E15.5 mouse embryos (Gtf2ird1-/- and WT, 3

mice per genotype) using TriReagent (Sigma) following the manufacturer’s protocol. RNA

samples were treated with DNase (Turbo DNA free, Ambion) to ensure they were free of

genomic contamination. Synthesis of first strand cDNA was performed using Invitrogen’s 3’

RACE System for Rapid Amplification of cDNA Ends kit, following the manufacturer’s

protocol. Briefly, 4 µg of RNA in 11 µL was mixed with 1 µL of the 10 µM oligo(dT)-

containing Adaptor Primer and incubated at 70°C for 10 min. After cooling, 2 µL of 10X PCR

buffer was added along with 2 µL of 25 mM MgCl2, 1 µL of 10 mM dNTP mix and 2 µL of 0.1

M DTT. The mixture was incubated at 42° C for five min. at which point 1 µL of SuperScript II

RT was added and the mixture was incubated for an additional 50 min. The reaction was

terminated by heating at 70° C for 15 min, and any remaining RNA was degraded using RNase

H.
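
The volumes listed above sum to a 20 µL first strand reaction, and the final concentration of each component follows from a simple dilution calculation. The Python sketch below works through that arithmetic using only the volumes and stock concentrations stated in the text. The function name and data layout are hypothetical, and the RNase H added after the reaction is not included.

```python
def final_concentrations(components):
    """Compute total volume and final concentrations for a reaction mix.

    components -- {name: (volume_uL, stock_concentration, unit)};
                  stock_concentration is None where no concentration applies.
    """
    total = sum(volume for volume, _, _ in components.values())
    finals = {
        name: (stock * volume / total, unit)
        for name, (volume, stock, unit) in components.items()
        if stock is not None
    }
    return total, finals


# Volumes and stock concentrations as given in section 3.3.4.
first_strand = {
    "RNA (4 ug)": (11, None, None),
    "Adaptor Primer": (1, 10, "uM"),
    "PCR buffer": (2, 10, "X"),
    "MgCl2": (2, 25, "mM"),
    "dNTP mix": (1, 10, "mM"),
    "DTT": (2, 100, "mM"),  # 0.1 M stock
    "SuperScript II RT": (1, None, None),
}

total, finals = final_concentrations(first_strand)
print(f"Total volume: {total} uL")
for name, (value, unit) in finals.items():
    print(f"{name}: {value:g} {unit}")
```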

Nested PCR was used to amplify the desired transcripts. The first round of PCR was

generally performed using a forward (gene-specific) primer located 1-2 exons upstream of the exon containing the 3’ UTR. The only exception to this was Zfp68; both rounds of PCR used a forward primer located in the 3’ UTR containing exon. The reverse primer used in the first round of PCR was the Universal Amplification Primer (UAP) which was provided with the

3’RACE kit (Invitrogen). The UAP will bind to the adaptor region which is added to each cDNA by the Adaptor Primer during the first strand synthesis step. The product of the first round of PCR was diluted 1/10,000, and then 1 µL was used in the second PCR round. The forward primer was located either 1 exon upstream of, or at the beginning of, the exon containing

the 3’ UTR. The reverse primer was the Abridged Universal Amplification Primer (AUAP), also provided with the 3’ RACE kit.

Table 3.3. Forward primer sequences used in nested PCRs of first strand cDNA from 3’ RACE

Primer name     Primer sequence (5' - 3')
Actl6b ex.12    ACTCTGCTTCAGGGGTTCAC
Actl6b ex.13    AAGTTCAGCCCCTGGATTG
Ap4m1 ex.13     CAACATTCATCTGCACCTTCC
Ap4m1 ex.14     TCCAGATCAGAAGGCAGAGC
Coq2 ex.5       GTGCGTTACTCGGATGGTCT
Coq2 ex.6       AGGAGAACACAAGGCAGTGG
Kin17 ex.3      TCCGAAATGACTTTCTGGAAC
Kin17 ex.4      GCACTAAAAGGGTCCACAACA
Mrpl16 ex.3     TGTCTCCATCCCTGAAAGGT
Mrpl16 ex.4     GCTTGACAATCAACCGCTTT
Stx3 ex.9       TTGACCGCATTGAGAACAAC
Stx3 ex.10      TGCCATCATCTTGGCTTCTA
Taf6 ex.13      CAAGCTCAGCAGGTCAACAG
Taf6 ex.14      CTCCTCAGCCTTCTCCTCCT
Zfp68 ex.6a     CCTGACTCGACACCAGAGAA
Zfp68 ex.6b     GAAAGCCTCTGAGAGCAAACA

3.3.5 Cloning and sequencing of 3’ RACE products

The products from the second round of nested PCR on first strand cDNAs from 3’ RACE were run on an agarose gel. Each band was cut out of the gel, and the DNA was extracted from the agarose using a QIAquick Gel Extraction Kit (Qiagen). Each PCR product was cloned into the pCR 2.1-TOPO TA cloning vector (Invitrogen). The size of the inserted piece of DNA was confirmed using EcoRI digestion to release the DNA from the vector. Vectors containing DNA of each detected size were sent to the Sanger Sequencing Facility at TCAG for sequencing using

capillary-based fluorescent sequencing on the ABI 3730XL instrument. Sequencing was done

using forward and reverse primers specific to the pCR 2.1-TOPO TA vector. When sequences

derived using these primers were not sufficient to cover the entire piece of DNA, internal primers

within the DNA sequence were used. The sequences were aligned against the mouse genome

using the UCSC genome browser (http://genome.ucsc.edu/).
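
When a 3’ RACE clone is inspected, the end of the transcript can be located by finding the poly(A) tract that precedes the adaptor sequence. The Python sketch below illustrates one simple way to do this and is not the procedure used here, where sequences were aligned in the genome browser. It assumes the read is in the sense orientation; the minimum tail length, the example read and the trailing adaptor sequence are hypothetical.

```python
import re


def split_polya(read, min_tail=12):
    """Split a sense-strand read into (body, tail) at its last long A run.

    The body ends at the putative cleavage site; anything after the tail
    (for example adaptor sequence) is discarded. If no run of at least
    min_tail adenosines is found, the read is returned with an empty tail.
    """
    read = read.upper()
    runs = list(re.finditer(r"A{%d,}" % min_tail, read))
    if not runs:
        return read, ""
    last = runs[-1]
    return read[:last.start()], read[last.start():last.end()]


# Hypothetical read: transcript body, poly(A) tract, placeholder adaptor.
read = "GATTACAGGC" + "A" * 17 + "CCGGTTGG"
body, tail = split_polya(read)
print(f"3' end of transcript body: ...{body[-10:]}")
print(f"poly(A) tract length: {len(tail)}")
```

If the genomic sequence itself has adenosines immediately downstream of the cleavage site, the exact boundary cannot be resolved from the read alone, so positions found this way are best treated as approximate.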

3.3.6 Expression analysis using western blots

Western blots were performed as described in section 2.2.12 with the following

exceptions.

Protein was extracted from newborn mouse brains using RIPA lysis buffer (10 mM Tris (pH 8.0), 100 mM NaCl, 1 mM EDTA, 1% NP-40, 0.5% NaDOC, 0.1% SDS) with a protease inhibitor cocktail (Sigma, P8340). Each brain was homogenized in 2 mL of lysis buffer, and then incubated on ice for 20 min. Once lysis was complete, the lysate was centrifuged at 4 °C for 20 min., and the supernatant was transferred to a new tube.

Primary antibodies used were rabbit anti-Syntaxin 3 (Sigma, S5547), diluted 1/1000, and mouse anti-α-Tubulin (Sigma, T9026), diluted 1/50,000. Primary antibodies were diluted in blocking solution and incubated with the membrane for 1 – 2 hours at room temperature with shaking.

Membranes were washed 3 x 10 min. in TBS-T, and then incubated for 1 hour at room temperature with ECL Mouse IgG, HRP-Linked Whole Ab (from sheep) (GE Healthcare,

NXA931) or ECL Anti-rabbit IgG, HRP-Linked Whole Ab (from donkey) (GE Healthcare,

NA934), diluted 1/10,000 in blocking solution. Following 2 x 10 min. washes in TBS-T and a final 10 min. wash in TBS, chemiluminescent detection was performed using ECL (enhanced chemiluminescence) reagents (GE Healthcare) and Hyper Film (GE Healthcare).


3.4 Results

3.4.1 Differential gene expression detected in Gtf2ird1-/- mice is exon specific

Further expression analysis of candidate genes which were altered in the Illumina

mouseWG-6 v2.0 BeadChip array and the Affymetrix mouse 430 2.0 gene chip array revealed

that the changes in expression detected in the Gtf2ird1-/- mice were exon specific; differences

could only be detected in certain exons. The primers that were initially designed to confirm the

results of the microarray were generally in the 3’ UTR of the gene, as this is usually where the

probe sequences on microarray chips are found. When possible, one of the primer sequences

overlapped with the probe sequence on the microarray chip. Other primer sets, designed to distinguish between known splice forms of the candidate genes, were then used to validate the changes in gene expression. Surprisingly, expression differences between exons were detected which could not be explained by known splice forms.

Using qRT-PCR on P0 mice, Kin, Mrpl16 and Stx3 were all found to have significantly decreased expression in Gtf2ird1-/- mice when primers targeted to the 3’ UTR sequences were used; however, when primers for upstream sequences were used, no changes in expression could be detected (Figure 3.1). There are two known splice forms of Kin, which differ in where the 3’ UTR begins. qRT-PCR was performed using primers in exon 1, and at the terminal end of the 3’ UTR; both of these primer sets amplify from both known splice forms. Exon 1 expression did not differ between WT and Gtf2ird1-/- mice; however, 3’ UTR expression was significantly lower in the Gtf2ird1-/- mice (Figure 3.1). In addition, expression of the region of the 3’ UTR which this primer set amplifies appears to be lower than the expression of exon 1. As the primer pairs used will amplify both known splice forms, it would be expected that the expression level of exon 1 should be equal to the expression level of the 3’ UTR. A similar trend was seen for Mrpl16: a primer pair specific to the 5’ UTR did not detect genotype specific differences in expression, yet expression of the 3’ UTR was decreased by approximately 50% in Gtf2ird1-/- mice (Figure 3.1).
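The normalization and comparison behind values of this kind can be sketched as follows. This is a minimal illustration only: it assumes that relative expression is calculated as 2^-(Ct target - Ct Sdha) with equal amplification efficiencies, which is an assumption and not a description of the analysis software used here. The Ct values and the function names `relative_expression` and `student_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def relative_expression(ct_target, ct_reference):
    """Expression of a target amplicon relative to a reference gene.

    Assumes the 2^-dCt model with ~100% amplification efficiency
    (an illustrative assumption, not the stated method).
    """
    return 2.0 ** -(ct_target - ct_reference)


def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2


# Hypothetical (3' UTR amplicon Ct, Sdha Ct) pairs for three RNA pools per genotype.
wt_ct = [(24.0, 20.0), (24.2, 20.1), (23.9, 20.0)]
mutant_ct = [(25.0, 20.0), (25.1, 20.1), (25.2, 20.0)]

wt = [relative_expression(t, r) for t, r in wt_ct]
mutant = [relative_expression(t, r) for t, r in mutant_ct]
t_stat, df = student_t(wt, mutant)
print(f"WT mean {mean(wt):.4f}, mutant mean {mean(mutant):.4f}, t = {t_stat:.2f}, df = {df}")
```

With three pools per genotype the test has only 4 degrees of freedom, so the t statistic must be compared against the corresponding critical value before a difference is called significant.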

Interestingly, Mrpl16 is found next to Stx3 on mouse chromosome 19. The genes are

oriented in a tail-to-tail fashion, with their 3’ UTRs separated by only 150 bp. The 3’ UTR of

Stx3 also showed significantly decreased expression in Gtf2ird1-/- mice. Exon 10 of Stx3 also showed decreased expression in Gtf2ird1-/- mice, although the difference was not as dramatic

(Figure 3.1). If the 3’ UTR of either Stx3 or Mrpl16 were to extend further than is indicated in the genomic database (UCSC browser) and overlap with the other gene, it would affect the ability of qRT-PCR to detect gene specific changes in expression, as the primers designed to amplify the

3’ UTR of one gene would actually amplify from both of the genes.
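The geometry of this argument can be illustrated with a short interval calculation. The coordinates below are hypothetical and chosen only so that the two annotated 3’ UTRs are separated by 150 bp, as described above; the helper names are also illustrative.

```python
def overlap_length(start_a, end_a, start_b, end_b):
    """Length of the overlap between two half-open genomic intervals [start, end)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


# Hypothetical coordinates: gene A (plus strand) has a 3' UTR ending at 10,000;
# gene B (minus strand, tail-to-tail) has a 3' UTR ending 150 bp downstream.
utr_a = (9_000, 10_000)
utr_b = (10_150, 11_000)


def shared_bases_after_extension(extension):
    """Bases of gene B's annotated 3' UTR covered if gene A's UTR runs `extension` bp further."""
    return overlap_length(utr_a[0], utr_a[1] + extension, *utr_b)


for extension in (0, 150, 400):
    print(extension, shared_bases_after_extension(extension))
```

Any qRT-PCR amplicon placed inside the shared bases would be amplified from transcripts of both genes, so an extension of more than 150 bp is enough to blur gene specific measurements.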

The microarray probe which detected altered Pex1 expression in Gtf2ird1-/- mice is located in the 3’ UTR of a small Pex1 transcript which begins and ends within the longer Pex1 transcript. All coding exons of the short transcript are shared with the longer transcript, but the

5’ and 3’ UTRs are unique (found within introns of the longer transcript). Only this short transcript was affected in the P0 Gtf2ird1-/- mice (Figure 3.1). Expression of the longer

transcript, as detected using primers located in exon 21, was not different between genotypes.

The longer transcript is expressed at 10-fold higher levels than the short transcript, which made it impossible to determine if there were significant expression differences within the coding exons of the short transcript.
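A short calculation shows why the shared coding exons are uninformative. The 10-fold ratio comes from the text; the 50% decrease in the short transcript is a hypothetical value used only for illustration.

```python
def combined_fold_change(long_level, short_level, short_fold_change):
    """Fold change seen by an amplicon shared by both transcripts
    when only the short transcript changes."""
    before = long_level + short_level
    after = long_level + short_level * short_fold_change
    return after / before


# Long transcript at 10x the short transcript; hypothetical 50% drop in the short one.
fold = combined_fold_change(long_level=10.0, short_level=1.0, short_fold_change=0.5)
print(f"shared-exon signal changes {fold:.3f}-fold")
```

Under these assumptions, halving the short transcript lowers the shared-exon signal by less than 5%, which is well within the variation normally seen between qRT-PCR replicates.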


Figure 3.1. Differences in gene expression detected in Gtf2ird1-/- mice using qPCR are exon specific. RNA was extracted from P0 mouse brains. RNA from 9 mice of the same genotype was pooled together to make cDNA, n=3 separate pools/genotype. Expression values are shown relative to the housekeeping gene Sdha. (For presentation purposes, some values were scaled as indicated). * p < 0.05, ** p < 0.005 using Student’s t-test.

Exon specific expression differences were also detected in Zfp68, Taf6, Coq2 and Actl6b, all of which were identified as candidate genes in the Illumina mouseWG-6 v2.0 BeadChip array. qRT-PCR was used to look at the expression levels of specific exons within these transcripts. Alternative splicing is not known to occur within Actl6b transcripts, yet there was great variation in the expression of specific exons in Gtf2ird1-/- mice relative to WT mice.

Expression of the 3’ UTR was decreased by approximately 50%, while expression of exons 13 and 14, which are immediately upstream of the 3’ UTR, was increased by 40-50%. Exons located further upstream, including exons 3, 7 and 10, did not show differential expression between genotypes (Figure 3.2).


A similar trend was seen when looking at the expression of Zfp68; the 3’ UTR was

expressed at lower levels in Gtf2ird1-/- mice, while exons 5 and 6, immediately upstream of the

3’ UTR, were expressed at higher levels in Gtf2ird1-/- mice (Figure 3.2). The PCR amplicons

were sequenced, and there were no SNPs found in the primer sequences which would affect PCR

efficiency. However, it may be possible to partially explain these findings based on the

alternative splicing which is known to occur with this gene. Taf6 and Coq2 both showed altered

expression in the 3’ UTR in Gtf2ird1-/- mice, with no detected expression differences in upstream coding exons (Figure 3.2). As alternative splicing is not known to occur with these genes, these findings are harder to explain.

Figure 3.2. Differences in gene expression detected in Gtf2ird1-/- mice using qPCR are exon specific. RNA was extracted from E15.5 embryo heads (n=5/genotype). Expression values are shown relative to the housekeeping gene Sdha. * p < 0.05, ** p < 0.005 using Student’s t-test.


3.4.2 Northern blot analysis does not detect novel alternatively spliced transcripts

In order to identify novel transcripts which could explain the exon specific differences in

gene expression that were detected, Northern blots were performed to look at Mrpl16, Stx3, Kin

and Pex1 expression. Multiple probes were used for each gene to ensure that the probes were

binding specifically to the target RNA.

Identical transcripts were identified with each of the three probes which were used to

look at Mrpl16 expression. Two sizes of transcripts were detected; a large transcript of ~4500 bp

which is likely to correspond to the Mrpl16 primary transcript, and a smaller transcript of ~1200

bp corresponding to Mrpl16 mRNA (Figure 3.3A). No differences in the size or number of

transcripts could be detected between genotypes. At least four different sized transcripts were

detected using probes for Stx3. Again, there were no obvious differences in the size or number

of transcripts between genotypes (Figure 3.3B). There were two different transcripts detected for

Kin: a smaller transcript, under 1800 bp, which likely corresponds to the mRNA, and a larger

transcript of approximately 4700 bp, which may represent the primary RNA transcript (Figure

3.3C). The final gene to be studied using Northern blot analysis was Pex1, which expresses a

full length transcript as well as a smaller transcript that starts and stops internal to the full length

transcript (Figure 3.3D). A specific probe could not be generated for the smaller transcript, and so probes were used which would detect either both the small and large transcripts or only the

larger transcript. The probe which detected the short and long isoforms identified the same

transcripts as the probe which was specific to the 3’ end of the longer isoform (Figure 3.3D).

This indicated that the shorter transcripts may be expressed below the threshold of detection.

The probe which was specific to the 5’ end of the longer isoform appeared to detect some

smaller transcripts (< 2000 bp) which were not detected by the other probes. These could represent non-specific binding or incomplete Pex1 transcripts.

Taken together, these results indicate that if alternative splicing is the cause of the detected exon specific differences in gene expression, the transcripts produced are either close in size to the known transcripts or produced at levels below the threshold of detection.


Figure 3.3. Northern blot analysis did not reveal any genotype specific splice forms of Mrpl16 (A), Stx3 (B), Kin (C), or Pex1 (D). RNA was extracted from whole brains of P0 mice. Multiple probes were used for each gene. Expected transcript length and exons contained in the cDNA probes are indicated.

3.4.3 Alternative splicing in the 3’UTR identified using 3’ RACE

As Northern blots were not useful in detecting any novel splice forms of candidate genes,

I then went on to look at transcripts produced by these genes using 3’ RACE. This technique is able to detect small differences in transcripts, such as the inclusion or exclusion of small numbers of base pairs. The 3’ RACE products for each gene were cloned and sequenced, and the sequences were aligned against the UCSC genome browser database to identify transcripts which used alternate polyadenylation sites, or have splicing occurring within the 3’ UTR.
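The check for polyadenylation signal motifs near a transcript end point can be sketched as below. This is an illustration rather than the procedure actually used, which relied on alignment in the UCSC browser. It assumes that the motifs of interest are the canonical hexamer AATAAA and the common variant ATTAAA, and that a signal is expected roughly 10 to 35 nt upstream of the end point; the window, the toy sequence and the function names are all hypothetical.

```python
import re

# AATAAA is the canonical hexamer; ATTAAA is a common variant (included as an assumption).
PAS_MOTIFS = ("AATAAA", "ATTAAA")


def find_pas(utr):
    """Return (position, motif) for every PAS hexamer in a 3' UTR, including overlapping hits."""
    utr = utr.upper().replace("U", "T")
    pattern = re.compile("(?=(" + "|".join(PAS_MOTIFS) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(utr)]


def end_point_has_pas(utr, end_point, window=(10, 35)):
    """True if a PAS hexamer starts within `window` nt upstream of a transcript end point."""
    near, far = window
    return any(end_point - far <= pos <= end_point - near for pos, _ in find_pas(utr))


# Toy 3' UTR, not a real sequence: two motifs at positions 20 and 46.
toy_utr = "G" * 20 + "AATAAA" + "C" * 20 + "ATTAAA" + "G" * 40

print(find_pas(toy_utr))
for end in (45, 66, 90):
    print(end, end_point_has_pas(toy_utr, end))
```

End points that have no motif in the window, such as position 90 in the toy sequence, correspond to the transcripts described here as lacking a canonical polyadenylation signal.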


Multiple Stx3 transcripts were identified, including some which appear to be novel

(Figure 3.4). These included transcripts with splicing occurring within the 3’ UTR, and transcripts with shortened 3’ UTRs. Six different 3’ UTR ‘end points’ were detected and 4 of

these corresponded with canonical polyadenylation signal sequence motifs. Mouse mRNAs and

ESTs have previously been reported with similar end points to 3 of the 6 different transcripts that

I detected. There are no mRNAs or ESTs reported which have splicing occurring within the 3’

UTR. No splicing was detected within the Mrpl16 3’ UTR, however three different 3’ UTR end points were detected (Figure 3.4). The Mrpl16 3’ UTR only contains one canonical polyadenylation signal sequence motif which is in the middle of the 3’ UTR and is not used in the generation of the full length transcript. I detected full length transcripts, transcripts which correspond to the polyadenylation signal sequence motif, and shorter transcripts which end half way through the last coding exon. The UCSC database contains ESTs which correspond to each of these transcript end points.


Figure 3.4. 3’ RACE analysis identified novel 3’ UTR splicing and alternative polyadenylation site usage for Mrpl16 and Stx3. RACE was performed on RNA extracted from heads of E15.5 mouse embryos. Nested PCR was performed using gene specific forward primers. PCR products were cloned and sequenced, and the sequences were aligned against the UCSC database. Green arrows indicate location of forward primer used in final round of PCR.

3’ RACE analysis of Zfp68 transcripts detected transcripts with 3 different end points, each of which corresponds to a canonical polyadenylation signal sequence motif, and to mRNAs which have previously been identified (Figure 3.5). The 3’ UTR of Zfp68 is relatively long

(~2600 bp) and contains 8 canonical polyadenylation signal sequence motifs. I did not identify any full length transcripts using this method, however they are present in the brain tissue at this time point as I was able to amplify sequences from the terminal end of the 3’ UTR by PCR


performed on cDNA generated using random hexamer primers. It is likely that full length 3’

UTRs were not detected because during the creation of the first strand cDNAs, extension from the terminal end of the UTR did not reach the location of the forward primer used for PCR.

Figure 3.5. 3’ RACE analysis identified alternative polyadenylation site usage for Zfp68. RACE was performed on RNA extracted from heads of E15.5 mouse embryos. Nested PCR was performed using gene specific forward primers. PCR products were cloned and sequenced, and the sequences were aligned against the UCSC database. Green arrows indicate location of forward primer used in final round of PCR.

There are two canonical polyadenylation signal sequence motifs in the Coq2 3’ UTR, and

transcripts corresponding to the use of each motif were detected (Figure 3.6). In addition, a

transcript which extended into the intronic region following the last coding exon was identified.

This transcript did not include any of the 3’ UTR sequence, or a polyadenylation sequence motif.


The UCSC database does not contain any mRNAs or ESTs corresponding to either of the

shorter Coq2 transcripts. A transcript was also detected which has the same end point as full

length Coq2 transcripts, but had a portion of the 3’ UTR and last coding exon spliced out.

Figure 3.6. 3’ RACE analysis identified novel 3’ UTR splicing and alternative polyadenylation site usage for Coq2. RACE was performed on RNA extracted from heads of E15.5 mouse embryos. Nested PCR was performed using gene specific forward primers. PCR products were cloned and sequenced, and the sequences were aligned against the UCSC database. Green arrows indicate location of forward primer used in final round of PCR.

Ap4m1 and Taf6 are located on chromosome 5, in a tail-to-tail orientation; their 3’ UTRs overlap by ~50 bp. Using a Taf6 gene specific forward primer, only full length Taf6 transcripts were identified (Figure 3.7). Full length transcripts were also identified using a forward primer

specific to Ap4m1, however an extended Ap4m1 transcript was also identified (Figure 3.7). This transcript extends well into the 3rd-last exon of Taf6, and contains a polyadenylation sequence motif at the terminus. Primers used to look at expression of the Taf6 3’ UTR using qRT-PCR would have detected this extended Ap4m1 transcript as well as the Taf6 transcript.

Figure 3.7. 3’ RACE analysis identified novel 3’ UTR splicing and alternative polyadenylation site usage for Ap4m1. RACE was performed on RNA extracted from heads of E15.5 mouse embryos. Nested PCR was performed using gene specific forward primers. PCR products were cloned and sequenced, and the sequences were aligned against the UCSC database. Green arrows indicate location of forward primer used in final round of PCR. Sequencing was done on each cloning vector using forward and reverse primers; the dotted line in the extended Ap4m1 transcript represents the region not covered using the sequencing primers.


Actl6b was the final gene subjected to 3’ RACE analysis. PCR using Actl6b specific forward primers resulted in a single band in WT and Gtf2ird1-/- mice. These bands were extracted from the gel, and cloned. Twenty random clones derived from mice of each genotype were selected for sequencing. No large deviations from the Actl6b full length transcript were detected, however it was noted that the transcripts detected with 3’ RACE had 9 different end points (Figure 3.8).

There was a difference of ~30 bp between the longest and shortest transcripts, with most of the transcripts being within 15 bp of each other. Actl6b does not have a canonical polyadenylation signal sequence motif, and the terminus of the 3’ UTR is highly conserved among mammals, indicating that this region could be of functional importance.

Figure 3.8. 3’ RACE analysis identified novel transcript endpoints for Actl6b. RACE was performed on RNA extracted from heads of E15.5 mouse embryos. Nested PCR was performed using gene specific forward primers. PCR products were cloned and sequenced, and the sequences were aligned against the UCSC database. Regions of the 3’ UTR that are conserved between mammals are indicated at the bottom.


3.4.4 Expression levels of different 3’UTR isoforms differ between genotypes

Primers were designed to perform qRT-PCR analysis which could differentiate between the different candidate gene isoforms that were detected using 3’ RACE. Three different 3’ UTR

lengths were detected for Zfp68: a short 3’ UTR and two intermediate length 3’ UTRs.

Significant decreases in expression were seen in Gtf2ird1-/- mice when primers which detect the short 3’ UTR or the full length 3’ UTR were used (Figure 3.9). The primers used to detect the

full length transcript were located at the terminal end of the transcript, and would not have

detected any of the shorter isoforms. However, the primers which would amplify from

transcripts with the short 3’ UTR would also amplify transcripts with intermediate or full length

3’ UTRs. Thus the decreased expression of the short isoform in Gtf2ird1-/- mice is likely to be

larger than indicated by these results. No genotype specific changes in the expression levels of

the transcripts containing intermediate length 3’ UTRs were detected.
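The reason the short 3’ UTR amplicon underestimates the change in the short isoform can be made concrete with a toy calculation. The isoform abundances below are hypothetical; only the nesting of the amplicons (the short 3’ UTR region is present in every isoform, the terminal region only in the full length isoform) follows the description above.

```python
# Hypothetical isoform abundances in arbitrary units.
wt = {"short": 60.0, "intermediate": 30.0, "full": 10.0}
mutant = {"short": 40.0, "intermediate": 30.0, "full": 5.0}

# Each amplicon is present in every isoform whose 3' UTR reaches that far.
AMPLICON_TEMPLATES = {
    "short_utr": ("short", "intermediate", "full"),
    "full_utr": ("full",),
}


def amplicon_signal(isoforms, amplicon):
    """Total template available to a primer pair, summed over the isoforms it amplifies."""
    return sum(isoforms[name] for name in AMPLICON_TEMPLATES[amplicon])


measured = amplicon_signal(mutant, "short_utr") / amplicon_signal(wt, "short_utr")
actual = mutant["short"] / wt["short"]
print(f"measured ratio {measured:.2f}, ratio for the short isoform alone {actual:.2f}")
```

In this example the unchanged intermediate isoforms dilute the signal, so the measured ratio sits closer to 1 than the ratio for the short isoform alone.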

Figure 3.9. qPCR using primers designed for specific isoforms identified through 3’ RACE. Location of primer amplicons is indicated on the bottom diagram. RNA was extracted from E15.5 embryo heads (n=5/genotype). Expression values are shown relative to the housekeeping gene Sdha. ** p < 0.005 using Student’s t-test.


Similar results were seen when looking at expression of the various Coq2 3’ UTR isoforms. Genotype specific differences in expression were seen only when using primers which would amplify transcripts with short and long 3’ UTRs (Figure 3.10). When primers were used which exclusively detected the long form of the 3’ UTR, or detected a region which was spliced out in some transcripts, no differences in expression could be detected in Gtf2ird1-/- mice.

Figure 3.10. qPCR using primers designed for specific isoforms of Coq2 identified through 3’ RACE. Location of primer amplicons is indicated on the bottom diagram. RNA was extracted from E15.5 embryo heads (n=5/genotype). Expression values are shown relative to the housekeeping gene Sdha. * p < 0.05 using Student’s t-test.

No alternate transcripts were detected for Taf6 using 3’ RACE, however a transcript was detected for Ap4m1 which overlaps with the Taf6 sequence. While the probes on the microarray would have been able to differentiate between the transcripts, the primers used in qRT-PCR would amplify from both the Taf6 transcript and the longer Ap4m1 transcript. Although primers in the Taf6 3’ UTR detected ~40% decrease in expression in Gtf2ird1-/- mice, no differences were detected in exon 10 (which does not overlap with the Ap4m1 transcript) or exon 13 (which does overlap with the Ap4m1 transcript) (Figure 3.11). Primers were also used which are located in introns 12 and 14 of Taf6. No significant differences in expression between genotypes were

detected in intron 14, however intron 12 showed significantly increased expression in the

Gtf2ird1-/- mice (Figure 3.11).

Figure 3.11. qPCR using primers designed for specific isoforms of Ap4m1 identified through 3’ RACE. Location of primer amplicons is indicated on the bottom diagram. RNA was extracted from E15.5 embryo heads (n=5/genotype). Expression values are shown relative to the housekeeping gene Sdha. ** p < 0.005, * p < 0.05 using Student’s t-test.

The largest number of 3’ UTR isoforms detected using 3’ RACE was found in Stx3.

Eight different PCR primer pairs were used to look at expression of the Stx3 3’ UTR in 5 WT and 5 Gtf2ird1-/- mice (Figure 3.12). It was noted that some of the mice belonging to each genotype showed decreased 3’ UTR expression using certain primer pairs, and those same mice showed no (or extremely low) 3’ UTR expression using primers specific to different areas of the

3’ UTR. It is difficult to interpret the expression levels detected by the different primer sets, and it is likely that there are alternate isoforms of the Stx3 3’ UTR which were not detected in this study. Interestingly, there were WT and Gtf2ird1-/- mice which showed decreased expression in


the Mrpl16 3’UTR, and these were the same mice which showed lower levels of Stx3 3’ UTR

expression (Figure 3.12). This could indicate that the mechanism of 3’ UTR sequence selection for each of these genes is co-regulated, or that there may be an undetected isoform of one of these genes which extends into the UTR of the other gene, similar to what is seen with Ap4m1 and Taf6. This would mean that only one of the genes is actually affected, but both genes show altered expression using qRT-PCR as the primers do not distinguish between transcripts of the different genes.

Figure 3.12. qPCR using primers designed for specific Mrpl16 and Stx3 isoforms identified through 3’ RACE. Location of primer amplicons is indicated on the bottom diagram. RNA was extracted from E15.5 embryo heads. Expression values are shown relative to the housekeeping gene Sdha. (For presentation purposes, some values were scaled as indicated)


3.4.5 Expression of Stx3 is variable and does not correlate with genotype

Two different patterns of Stx3 3’ UTR expression were detected, “high” and “low”.

These different patterns were seen with approximately equal frequency in the brains of the 5 WT and 5 Gtf2ird1-/- E15.5 mice used in the previous analysis. In order to determine if this variation

in Stx3 expression is common, or if it occurs more often in one particular genotype, a larger

sample size was used. Brain expression of the Stx3 3’ UTR was examined using primer set

“UTR-A” in a total of 27 WT (15 P0, 12 adult), and 25 Gtf2ird1-/- (13 P0, 12 adult) mice. High

and low levels of expression were detected in WT and Gtf2ird1-/- mice of both age groups

(Figure 3.13). The expression levels of Stx3 detected in the adult mice were not significantly

different between genotypes (average relative to Sdha: WT = 0.24 ± 0.13, Gtf2ird1-/- = 0.23 ±

0.24, p = 0.98). Roughly equal numbers of WT and Gtf2ird1-/- P0 mice had high and low levels

of expression, but the expression values of the Gtf2ird1-/- mice were slightly, and significantly, lower than those of the WT mice (average relative to Sdha: WT = 0.31 ± 0.14, Gtf2ird1-/- = 0.20 ± 0.14, p = 0.047). These results indicate that the significant differences previously detected in Stx3, and possibly Mrpl16, are unrelated to genotype and instead reflect sampling bias within a small group of mice toward different commonly expressed isoforms.
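How easily a small sample can produce an apparent genotype effect can be estimated with a simple probability calculation. The sketch assumes that each mouse independently shows the "high" or "low" pattern with probability 0.5, a simplification based on the roughly equal frequencies reported above; the function name and the threshold of three mice are hypothetical.

```python
from math import comb


def prob_group_imbalance(n_per_group, min_difference, p_high=0.5):
    """Probability that two groups differ by at least `min_difference`
    'high' mice purely by chance."""

    def pmf(k):
        return comb(n_per_group, k) * p_high**k * (1 - p_high) ** (n_per_group - k)

    return sum(
        pmf(x) * pmf(y)
        for x in range(n_per_group + 1)
        for y in range(n_per_group + 1)
        if abs(x - y) >= min_difference
    )


# Two groups of 5 mice, as in the initial E15.5 analysis.
print(prob_group_imbalance(5, 3))
```

Under these assumptions, about one comparison in nine between two groups of five mice would show a difference of three or more "high" mice with no real genotype effect, which supports the use of the larger sample.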


Figure 3.13. Expression of the Stx3 3’ UTR is variable between mice of the same genotype. qPCR was performed on RNA extracted from whole brains of P0 and adult mice. The Stx3 3’UTR-A primer set was used. Expression values are shown relative to the housekeeping gene Sdha.

3.4.6 Differences in gene expression of genes located close to the Gtf2ird1 locus are related to genetic background

It was previously determined that some of the expression differences between WT and

Gtf2ird1-/- mice were the result of differential expression between different mouse strains, and

were not related to the function of TFII-IRD1. In order to determine if the exon specific expression differences within certain genes were also a result of the 129 alleles carried by the

Gtf2ird1-/- mice, qRT-PCR was used to look at the expression of multiple Actl6b, Taf6, Ap4m1

and Zfp68 exons in WT CD1, Gtf2ird1-/- and WT 129S1/SvImJ P0 mice.

As was noted previously 129S1/SvImJ mice showed expression levels that were similar

to those of Gtf2ird1-/- mice and, for many of the exons examined, 129S1/SvImJ and Gtf2ird1-/-

mice displayed significantly different expression levels than WT mice (Figure 3.14).


Figure 3.14. Exon specific differences in gene expression are related to the genetic background of the mouse and not genotype. 129+/+ (n=7), Gtf2ird1-/- (n=5), CD1+/+ (n=6). Expression values are shown relative to the housekeeping gene Sdha. (For presentation purposes, some values were scaled as indicated) ** p < 0.005, *p < 0.05 using Student’s t-test.

3.4.7 Differentially expressed exons do not affect protein levels of Stx3

Western blots were used to look at the levels of Stx3 in the brains of three P0 Gtf2ird1-/- and WT mice. There were no obvious differences in protein expression between genotypes

(Figure 3.15), however there may be subtle differences. Protein was extracted from the whole brain for this experiment, and so the Stx3 expression levels could not be confirmed using qRT-

PCR. As there is variation in Stx3 expression in WT and Gtf2ird1-/- mice, it is possible that the mice selected for this experiment all had equal levels of Stx3 expression. A larger sample size, and confirmation of Stx3 mRNA expression levels will be needed to determine if the different 3’

UTRs detected influence the Stx3 protein levels.


Figure 3.15. Stx3 protein levels do not appear to vary between Gtf2ird1-/- and wildtype mice. Western blot analysis was performed on protein extracted from the whole brain of P0 mice. An anti-Stx3 antibody was used which detects all three Stx3 isoforms. Anti-α-tubulin was used as a loading control.

3.5 Discussion

An in-depth analysis of transcription in Gtf2ird1-/-, WT CD1 and WT 129 mice illustrates a

number of problems which can confound expression analysis when looking at transcription in

mice with genetically different backgrounds. In addition, these results indicate that expression

differences detected when looking specifically at a very small portion of a transcript are not

necessarily indicative of the expression levels of other exons, and care must be taken when

validating microarray experiments.

The primers used in qRT-PCR amplified regions of 80 – 120 bp, located within one exon

of a gene. Initially, primers were used that targeted the 3’ UTR, and significant differences in

expression were detected between WT CD1 and Gtf2ird1-/-/WT 129 mice in a number of genes.

The most straightforward explanation of this finding would be that there were differences in the

overall mRNA levels of the genes in question. However, for a number of genes, including Kin,

Mrpl16, Actl6b, Zfp68 and Taf6 no differences in expression were detected between genotypes


when the primers used in qRT-PCR amplified regions at the 5’ end of the gene, while primers

which amplified from the 3’ ends of the same transcripts found significant genotype specific

differences in expression. These results indicated that mRNAs were being produced which

differed in their 3’ ends and suggest that the differences in expression detected are due to a shift

in the ratios of alternatively spliced and polyadenylated transcripts.

Northern blot analysis was performed in order to identify transcripts of unique sizes which

were believed to be produced in a genotype specific manner. Four different genes were analyzed

using this technique, and the transcripts produced by WT and Gtf2ird1-/- mice did not appear to

be different in size. It was still possible that alternative transcripts were being produced either at

levels below the threshold of detection for this technique, or with small differences in size that

could not be resolved and so 3’ RACE was performed in another attempt to identify alternative

transcripts.

The 3’ ends of mRNAs were amplified using an oligo-dT primer and gene specific forward primers. The PCR products were cloned and sequenced, which allowed differences as small as 1 bp to be detected. When the sequences were aligned against the UCSC genomic database, it was clear that alternative splicing and/or polyadenylation was occurring in the 3’ UTRs of the genes in question.

3.5.1 Alternative splicing in the 3’ UTR

Stx3 and Coq2 transcripts were detected in which a portion of the 3’ UTR had been spliced out. Alternative splicing of primary transcripts results in proteins with different structures and functions, and is believed to be a driving force in the phenotypic complexity that exists in mammals185. Recently deep sequencing of human cDNAs from 15 different human

tissues and cell lines revealed that 92-94% of human transcripts are alternatively spliced185.


Splicing is a highly regulated process controlled by the spliceosome, a complex of both proteins

and snRNAs. The spliceosome recognizes 4 highly conserved signals in the RNA sequence: the

5’ donor and 3’ acceptor splice sites found at the ends of introns, a branch site sequence found

upstream of the 3’ splice site and a polypyrimidine tract found between the branch site and the 3’

splice site186. A study of 43,337 human splice junction pairs found that 98.71% of splice sites

contain the canonical 5’ GT donor and 3’ AG acceptor dinucleotide sequences187.

Three different Stx3 transcripts with alternative splicing in the 3’ UTR were identified,

containing the non-canonical donor-acceptor pairs CC-AG, AT-TC and AT-AA. The donor- acceptor pair used in the alternatively spliced Coq2 transcript was also non-canonical, GA-TG.

In a study of 10,803 human transcripts representing 91,846 donor-acceptor sites, Chong et al. found that 22 donor-acceptor sites (including the canonical GT- AG) represented 99.16% of the data set188. The splice sites used in the alternative Stx3 and Coq2 transcripts were not on this list,

and occurred extremely infrequently (CC-AG = 10 times, AT-TC = 0 times, AT-AA = 3 times,

GA-TG = 3 times), however the authors felt that all isoforms that were identified were real as they were identified in multiple mRNAs.
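Classifying an excised region by its donor-acceptor pair amounts to reading its first and last two bases, as in the sketch below. The sequences are toy examples, not the actual Stx3 or Coq2 sequences, and the function names are illustrative.

```python
CANONICAL = ("GT", "AG")


def donor_acceptor(excised):
    """Return the first and last two bases of an excised region as (donor, acceptor)."""
    excised = excised.upper().replace("U", "T")
    if len(excised) < 4:
        raise ValueError("excised region too short to contain both dinucleotides")
    return excised[:2], excised[-2:]


def is_canonical(excised):
    """True if the excised region uses the canonical GT-AG pair."""
    return donor_acceptor(excised) == CANONICAL


# Toy sequences chosen to reproduce the pair types discussed in the text.
examples = {
    "toy_gt_ag": "GTAAGTCCTTTCTCTAG",
    "toy_cc_ag": "CCAAGTCCTTTCTCTAG",
    "toy_ga_tg": "GAAAGTCCTTTCTCTTG",
}

for name, seq in examples.items():
    donor, acceptor = donor_acceptor(seq)
    print(name, f"{donor}-{acceptor}", "canonical" if is_canonical(seq) else "non-canonical")
```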

A recent study by Houseley and Tollervey proposes another explanation for the unique 3’

UTRs detected in the Stx3 and Coq2 transcripts189. They found that some cDNAs which appear

to have been alternatively spliced using non-canonical splice sites were actually created by

template switching of reverse transcriptase. In most cases where this was found to occur, the

region spliced out was flanked by direct repeats, however in some transcripts no areas of

homology could be identified. If template switching occurred during the reverse transcription of

Stx3 it would explain why no alternative transcripts were detected using northern blot analysis.


The northern blot was run with total RNA extracted from tissue, and the RNA was not subjected to reverse transcription.
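One way to screen an apparent splice event for this artefact is to look for a direct repeat at the boundaries of the missing region. The sketch below is a hypothetical check and was not part of the analysis performed here. Because an aligner may place the boundary anywhere inside a repeat, it extends the match in both directions and reports the total repeat length; the toy template is not a real sequence.

```python
def flanking_direct_repeat(template, del_start, del_end):
    """Length of the direct repeat shared by the two boundaries of a deleted region.

    `template[del_start:del_end]` is the region missing from the cDNA.
    Matches are extended rightwards from both boundaries and leftwards
    from both boundaries, then summed.
    """
    right = 0
    while (
        del_start + right < del_end
        and del_end + right < len(template)
        and template[del_start + right] == template[del_end + right]
    ):
        right += 1

    left = 0
    while (
        del_start - 1 - left >= 0
        and del_end - 1 - left >= del_start
        and template[del_start - 1 - left] == template[del_end - 1 - left]
    ):
        left += 1

    return left + right


# Toy template: a 5 nt repeat (GGTCA) flanks an 8 nt stretch.
toy_template = "ACGTAC" + "GGTCA" + "TTTTTTTT" + "GGTCA" + "CATGCA"

print(flanking_direct_repeat(toy_template, 6, 19))   # boundary placed before the first repeat
print(flanking_direct_repeat(toy_template, 11, 24))  # boundary placed after the first repeat
```

A repeat of several nucleotides at the boundaries, together with non-canonical dinucleotides, would favour template switching over genuine splicing, although the absence of a repeat does not exclude it.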

Ultimately, the novel Coq2 and Stx3 transcripts are unlikely to be related to genotype.

When a larger sample size was used to look at Stx3 (and Mrpl16) expression, it was found that the expression levels of different isoforms were naturally variable and the differences between genotypes were not significant. qRT-PCR using primers which would differentiate between

Coq2 isoforms with and without splicing in the 3’ UTR did not find significant differences in expression between genotypes, indicating that the alternative form is produced at equal levels in all mice.

3.5.2 Use of alternative polyadenylation sites

In addition to alternatively spliced isoforms, 3’ RACE also detected mRNAs which used alternative polyadenylation sites, resulting in shorter Mrpl16, Stx3, Zfp68 and Coq2 transcripts than those in the UCSC database, and a longer Ap4m1 transcript. As previously mentioned, the variation in expression of Mrpl16 and Stx3 appears to occur in all mice, and is unrelated to genotype. qRT-PCR using primers to distinguish between long and short forms of Coq2 found that Gtf2ird1-/- mice expressed higher levels of full length transcripts than WT CD1 mice. The opposite result was found for Zfp68: Gtf2ird1-/- and WT 129 mice expressed lower levels of full length transcripts than WT CD1 mice. However, the fraction of transcripts which included the full length 3’ UTR was very low in all mice. The full length 3’ UTR is approximately 4 kb, but more than half of the transcripts appear to use the first polyadenylation site, resulting in a 3’ UTR of only 130 bp. In Gtf2ird1-/- and WT 129 mice, Zfp68 exons 5 and 6 are expressed at levels 3-fold higher than in WT CD1 mice. There are 3 known isoforms of Zfp68: A, B and C. Isoform B does not include the 3rd exon of isoforms A and C. Expression of exon 3 does not differ between genotypes, and so it is possible that Gtf2ird1-/- and WT 129 mice express higher levels of Zfp68B, which uses a different 3’ UTR. The forward primer used in 3’ RACE was immediately upstream of the annotated 3’ UTR, and so if transcripts terminated before this point they would not have been detected using this technique. Unfortunately, there are no commercial antibodies for Zfp68 available, and so I was unable to determine if the different 3’ UTRs result in different levels or localization of the protein in mice of different genetic backgrounds.

Ap4m1 was also found to use an alternative polyadenylation site, resulting in a transcript 1.4 kb longer than any Ap4m1 mRNA or EST listed in the UCSC database. The commonly used polyadenylation site does not contain a canonical signal sequence; however, the extended transcript does. Ap4m1 is a subunit of adaptor protein complex (AP) 4, which is involved in the trafficking of membrane proteins, and localizes to dendrites and Golgi-like structures in the cell bodies of neurons [190]. It is possible that the localization of Ap4m1 is regulated in a similar manner to BDNF [161], and that the different 3’ UTR lengths detected determine in which area of the neuron Ap4m1 produced by a certain transcript will be found. As the 3’ UTRs of Ap4m1 and Taf6 are anti-sense to each other, it is impossible to accurately determine relative levels of the Ap4m1 isoforms using qRT-PCR on cDNA generated with random hexamer primers.

Polyadenylation site selection in different mouse strains has not been studied in detail. In transcripts containing multiple potential polyadenylation signal sequences, the selection of poly(A) site is known to depend on a number of factors including the tissue [191], stage of development [163,192], and even neuronal activity [191]. As mentioned previously, one example of mouse strain-specific use of a polyadenylation site has been reported. A SNP located in the 3’ UTR of Adh4 in C57BL/6 mice results in the formation of a new poly(A) signal, which causes the expression of an Adh4 isoform in their stomachs that is not found in other mouse strains [184]. To my knowledge, my results are the first evidence that identical polyadenylation sites may be used at different frequencies for certain genes in different mouse strains.

It is easy to postulate a number of mechanisms which could lead to one strain favouring use of one poly(A) site over another. There may be SNPs in the flanking genomic sequences which make the poly(A) signals weaker or stronger. Each isoform identified in this study which contained the canonical sequence A(A/U)UAAA had an identical sequence in each mouse tested. However, as sequences flanking this site are known to affect the affinity of the polyadenylation complex for that region of the mRNA [193,194], SNPs may alter the ratio at which a particular site is used. In the isoforms which did not contain a canonical poly(A) site, SNPs may also affect the affinity of the polyadenylation complex for the mRNA molecule.
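As an illustration of how a 3’ UTR sequence could be screened for the canonical A(A/U)UAAA signal, the following minimal sketch scans a sequence with a regular expression. It is not the procedure used in this study; the function name and the example sequence are hypothetical, and T is accepted in place of U so that cDNA or genomic sequence can be scanned as well.

```python
import re

# Canonical poly(A) signal A(A/U)UAAA. T is accepted in place of U so that
# cDNA or genomic sequence can be scanned as well as RNA. The lookahead
# allows overlapping hexamers to be reported.
POLYA_SIGNAL = re.compile(r"(?=(A[AUT][UT]AAA))")


def find_polya_signals(sequence):
    """Return (0-based position, hexamer) for each canonical signal found."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in POLYA_SIGNAL.finditer(seq)]


# Hypothetical 3' UTR fragment, invented for illustration only.
utr = "GCCAATAAAGTCTGCATTAAATGG"
print(find_polya_signals(utr))
```

Running the same scan on the sequence from each strain and comparing the positions returned would show whether a signal hexamer itself differs. Differences in the flanking sequence would need a separate alignment.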

SNPs between different strains may also result in differential expression levels or amino acid sequence of any of the many proteins which are involved in polyadenylation directly (as members of the polyadenylation complex) or indirectly (as members of the elongation complex, or proteins which cause post-translational modifications to either of these complexes).

Transgenic mice targeted in ES cells of one strain and then backcrossed onto a different strain for 12 generations would be expected to contain an average of 16 cM of DNA from the ES cell strain flanking the targeted locus [195]. This represents about 1% of the mouse genome.

Given the large number of genes involved in 3’ end processing, it is likely that Gtf2ird1-/- mice retained one or more genes which play a role in polyadenylation from the 129-derived ES cells. Cpsf4, which encodes a 30 kDa subunit of CPSF, is located on mouse chromosome 5, only 11 Mb from the Gtf2ird1 locus. CPSF4 has been shown to bind to poly(U) sequences in RNA molecules, and has been proposed to enhance the ability of CPSF to bind at poly(A) sites [196]. This protein, or any other protein involved in polyadenylation encoded by a 129-derived allele in the Gtf2ird1-/- mice, could explain the differences in poly(A) site usage detected.
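The size of the flanking region quoted above can be reproduced approximately with a rule of thumb from the general mouse genetics literature, in which the mean length of donor-strain DNA around a selected locus after N backcross generations is about 200(1 - 2^-N)/N cM. Both this formula and the 1500 cM figure for the total mouse genetic map are assumptions introduced here for illustration; neither is taken from the cited reference.

```python
def flanking_segment_cm(backcross_generations):
    """Approximate mean length (cM) of donor DNA flanking a selected locus.

    Uses the approximation 200 * (1 - 2**-N) / N, an assumption taken from
    the general mouse genetics literature rather than from this thesis.
    """
    n = backcross_generations
    return 200.0 * (1.0 - 2.0 ** -n) / n


MOUSE_GENOME_CM = 1500.0  # assumed round figure for the mouse genetic map

segment = flanking_segment_cm(12)
share = 100.0 * segment / MOUSE_GENOME_CM
print(f"{segment:.1f} cM, about {share:.1f}% of the genome")
```

Under these assumptions the estimate for 12 generations falls between 16 and 17 cM, or roughly 1% of the genome, in line with the figures given above.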

3.5.3 qRT-PCR validation of microarrays

This work has demonstrated that care needs to be taken when validating microarray experiments. Using primers/probes which target the 3’ UTR may give results which are not necessarily indicative of the expression level of the entire transcript. A recent study using the Illumina Mouse WG array to look at expression in the mouse striatum found that 22% of 1100 genes with multiple probes showed discordant expression between the probe sets [197].

Poor correlation has been reported between mRNA expression levels detected by microarray analysis and proteomic analysis [198,199]. There are many factors which are likely to contribute to this discrepancy, one of which is that many microarray chips are designed to look at expression of only a very small area of the transcript. The findings of the experiments described in this report demonstrate that expression levels in different areas of a transcript need to be measured in order to accurately reflect what is occurring in vivo.

Using qRT-PCR, the expression level of different areas of the transcript, including coding exons and regions of the 3’ UTR, can be measured independently. Primers should be designed to distinguish between both alternatively spliced and alternatively polyadenylated mRNA isoforms. However, other factors can confound the analysis of qRT-PCR data. Expression of anti-sense transcripts, such as Taf6 and Ap4m1, cannot be distinguished using cDNA synthesized with random hexamer or oligo(dT) primers. In order to look at the expression of each of these transcripts individually, 1st strand cDNA synthesis would have to be done for each gene separately using a gene-specific primer.
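The value of measuring several regions of one transcript can be illustrated with a minimal sketch that compares a coding-exon amplicon with a distal 3’ UTR amplicon. It uses the widely used 2^-ΔΔCt calculation, which is an assumption here and not necessarily the quantification method used in this work; all Ct values are hypothetical.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target amplicon (test vs control) by 2^-ddCt.

    Assumes roughly 100% amplification efficiency for every primer pair.
    """
    d_ct_test = ct_target_test - ct_ref_test
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_ct_test - d_ct_ctrl)


# Hypothetical Ct values: (target in mutant, reference in mutant,
#                          target in WT, reference in WT)
primer_sets = {
    "coding exon": (24.0, 18.0, 24.0, 18.0),
    "distal 3' UTR": (27.0, 18.0, 26.0, 18.0),
}
for region, cts in primer_sets.items():
    print(f"{region}: {relative_expression(*cts):.2f}-fold relative to WT")
```

In this invented example the coding-exon primers report no change while the distal 3’ UTR primers report a halving, the kind of discordance that points to altered poly(A) site usage and not to altered transcription.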

The method by which transcriptional termination occurs can also confound analysis of gene expression when primers which amplify from the 3’ UTR are used. RNAPII and the elongation complex continue transcribing DNA past the poly(A) site where the transcript is cleaved and the poly(A) tail added. Elongation, splicing and polyadenylation occur concurrently, and so the nascent transcript is likely to be cleaved soon after being transcribed. The RNA downstream of the cleavage site will still be present in the cell, at least temporarily, and it will continue to be extended in the 3’ direction until RNAPII dissociates from the chromatin template. Therefore, in addition to measuring levels of mRNA, qRT-PCR performed on cDNA prepared with random hexamer primers will also detect RNA downstream of the polyadenylation site. In order to avoid this problem, mRNA should be isolated from the total RNA before cDNA preparation, or 1st strand synthesis of cDNA should be performed with oligo(dT) primers.

Chapter IV: Summary and Future Directions

4.1 Summary

Williams-Beuren syndrome is an autosomal dominant developmental disorder caused by the deletion of 26-28 genes from chromosome 7q11.23 [36]. The clinical manifestations of this disorder are numerous and include dysmorphic facial features, SVAS, retarded growth, infantile hypercalcemia and renal defects [15,21]. In addition, WBS patients have a distinct cognitive phenotype characterized by ‘peaks and valleys’ of ability. The average IQ in WBS is 55 [26], with individual IQs ranging from 40 to 100 [27]. Individuals with WBS typically have relatively strong expressive language skills, and show relatively weak performance in tasks that involve visual-spatial processing [26]. In addition, they are overly friendly and show reduced social inhibitions [26].

Of the 28 genes included in the common 1.55 Mb WBS deletion, only ELN has been conclusively linked to a particular aspect of the WBS phenotype. Mutations that disrupt ELN have been found in individuals with SVAS (who do not have WBS), linking ELN to this aspect of the WBS phenotype [33]. A number of individuals have been identified who have smaller deletions in the WBS region on chromosome 7 which encompass only a few genes [48-53]. Phenotypic analysis of these individuals indicates that hemizygosity for members of the GTF2I family, GTF2I and GTF2IRD1, is likely to be responsible for the behavioural and cognitive aspects of the WBS phenotype. Gtf2ird1-/- mice display behaviours similar to those seen in WBS patients, including increased sociability and a decreased natural fear response [88]. The phenotypes seen in individuals with atypical deletions in the WBS region, and in Gtf2ird1-/- mice, indicate that the protein product of GTF2IRD1, TFII-IRD1, plays an important role in the brain.

Since its discovery in 1998 [75], TFII-IRD1 has been widely reported to be a transcription factor. This protein has repeatedly been shown to bind to DNA in a sequence-specific manner, and to regulate gene expression in luciferase assays and in transformed cell lines after TFII-IRD1 knockdown/over-expression. However, the ability of TFII-IRD1 to regulate gene expression in vivo has never been demonstrated.

In order to identify genes which are regulated by TFII-IRD1 in vivo, and may play a role in the behavioural phenotype seen in Gtf2ird1-/- mice, I performed microarray analysis to study gene expression in the brains of Gtf2ird1-/- mice. Analysis was performed at two different developmental time-points: E15.5 and P0. Although TFII-IRD1 is robustly expressed throughout the brain at both of these time-points, I failed to detect any changes in gene expression caused by the absence of this protein. These results indicate that TFII-IRD1 may have a role other than that of a transcription factor in the developing mouse brain.

qRT-PCR validation of the microarray experiments confirmed altered expression in only a small subset of the genes examined. The differences in gene expression that were detected between genotypes using the Affymetrix and Illumina microarray platforms were the result of natural variation in gene expression or differences in genotype resulting from the carry-over of genes flanking the Gtf2ird1 locus from the parental R1 ES cells. This highlights the importance of proper analysis and validation of microarray data, as the presence of flanking genes from parental ES cells or the use of a small number of biological samples may confound interpretation of the results.

The initial qRT-PCR that was performed to validate the microarray results generally used primers located in the 3’ UTRs of the transcripts, as this is where the majority of the microarray probes were located on both platforms. For many of the genes which showed decreased expression using primers in the 3’ UTR of the gene, primers specific to upstream coding exons showed no differences in expression between genotypes. Northern blot analysis was used to determine if alternative splicing was occurring in these genes, but the size or number of transcripts did not appear to vary between genotypes. However, transcripts utilizing alternative polyadenylation signals were identified using 3’ RACE.

qRT-PCR using primers designed to differentiate between the different 3’ UTR isoforms of the same gene found that there were differences in polyA site selection between Gtf2ird1-/- and WT mice. Further examination revealed that these differences were unrelated to TFII-IRD1, and were the result of alleles from the R1 129 background strain flanking the Gtf2ird1 locus in the mutant mice. To my knowledge, this is the first evidence of a strain-specific bias in polyA site selection in mice.

4.2 Further investigation of GTF2IRD1 function

While in vivo targets of this putative transcription factor have yet to be identified, it is clear that proper expression of TFII-IRD1 is needed for normal behaviour in mice. In addition, two more individuals with atypical deletions in the WBS region have recently been identified, further implicating TFII-IRD1 in the cognitive aspects of the WBS phenotype [200,201]. Ferrero et al. have identified a patient with a 1 Mb deletion which does not include the genes GTF2IRD1 or GTF2I [200]. This patient does not have the typical facial features seen in WBS and has a normal IQ. He was not formally tested for sociability or anxiety, but the authors noted that he did not appear to display increased sociability or show signs of anxiety. Dai et al. reported a patient with a deletion that included GTF2IRD1 but not GTF2I [201]. This patient showed facial features characteristic of WBS, and had low scores on tests of visual-spatial cognition. She did not appear to show signs of increased sociability such as maintaining eye contact and attention to strangers.

TFII-I has been shown to have functional roles in both the nucleus and cytoplasm, acting as a transcription factor [61,63,67], and regulating intracellular calcium levels through protein-protein interactions [143]. Given the lack of transcriptional targets in the tissues and time-points examined, it is possible that TFII-IRD1 may also participate in protein-protein interactions that are necessary for proper neuronal function, yet do not alter gene expression.

Preliminary research on TFII-IRD1 localization in Neuro2A cells indicates that TFII-IRD1 is found in the cytoplasm of these cells at relatively high levels, supporting the idea that TFII-IRD1 may have a cytoplasmic function. These studies need to be repeated using confocal microscopy, which will give better resolution and clearly show if TFII-IRD1 localization is cytoplasmic, nuclear, peri-nuclear, or some combination of these.

We are also currently using an unbiased approach to identify proteins with which TFII-IRD1 interacts. Affinity purification of TFII-IRD1 followed by liquid chromatography and tandem mass spectrometry is currently being performed by Andrew Emili’s lab at the University of Toronto. Proteins that interact with human TFII-IRD1 in HEK-293 cells, and with mouse TFII-IRD1 in Neuro2A cells, should be identified using this method. This technique has been used successfully to pull out known and novel protein interactions using members of previously documented protein complexes as bait [202].

If TFII-IRD1 is acting as a transcription factor at the time-points studied, or at other stages of life, it is likely that this activity is restricted to a specific cell population, since I was unable to detect any changes in gene expression using whole brain extracts. Conditional knockout mice would be a valuable tool to further elucidate the temporal and cell-specific roles of TFII-IRD1. In order to accomplish this, a mouse with the Gtf2ird1 locus flanked by loxP sites in the proper orientation would need to be generated. In the presence of Cre-recombinase, the region between the loxP sites will be deleted. By crossing mice with floxed Gtf2ird1 alleles with mice expressing Cre under the control of a promoter of a gene that is expressed in a time- or cell-specific manner, Gtf2ird1 can be deleted in vivo. This method has been successfully used in mice to delete Nmdar1 from the CA1 pyramidal cells of the hippocampus, while allowing the gene to be expressed normally elsewhere [203].

By removing TFII-IRD1 at different time-points or in different cell populations, it would be possible to narrow down the physiological source of the behavioural phenotype. For example, is the phenotype related to improper neuronal development, or is TFII-IRD1 needed throughout life for proper brain function? Does the phenotype result from the absence of TFII-IRD1 in one particular type of neuron? This information will give clues as to the cellular role of TFII-IRD1 and help in the development of future experiments.

It is also possible to use this technology to generate mice which will express eGFP upon the deletion of the Gtf2ird1 allele by Cre [204]. This would then allow the specific cell populations lacking TFII-IRD1 to be sorted using fluorescence-activated cell sorting (FACS). One reason why I failed to detect any transcriptional targets of TFII-IRD1 may be that it is only acting as a transcription factor in a specific cell population, and the alterations in gene expression were diluted out by looking at expression in the whole brain. This method would allow gene expression to be studied in restricted cell populations, which may allow the identification of in vivo transcriptional targets of TFII-IRD1.
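The dilution argument can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every cell contributes equally to baseline expression of a gene and that only a fraction of cells respond to loss of TFII-IRD1; the 5% and 2-fold figures are hypothetical.

```python
def whole_tissue_fold_change(fraction_responding, fold_change_in_responders):
    """Fold change seen in a mixed sample when only some cells respond.

    Assumes every cell contributes equally to baseline expression of the gene.
    """
    f = fraction_responding
    return f * fold_change_in_responders + (1.0 - f)


# Hypothetical scenario: 5% of cells show a 2-fold increase.
print(f"{whole_tissue_fold_change(0.05, 2.0):.2f}-fold in whole-brain RNA")
```

Under these assumptions a 2-fold change confined to 5% of cells appears as only a 1.05-fold change in whole-brain RNA, which would be very difficult to separate from natural variation on a microarray.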

In order to further elucidate the role of TFII-IRD1 in the development of neurons, primary cortical cultures are currently being generated from WT and Gtf2ird1-/- mice. Neuronal precursor cells differentiate into neurons, astrocytes and oligodendrocytes. Studying the ratios at which different cell types develop in WT and Gtf2ird1-/- mice may provide information on the role of TFII-IRD1 in neuronal development. In addition, studying axon and dendrite length, and the number and morphology of dendrites, may also give clues as to the role of TFII-IRD1 in neuronal function.

4.3 Further investigation of alternative polyA site selection

The 3’ UTR has many important roles including RNA localization, stability, translation and microRNA binding [160]. The different usage of polyA sites detected between WT CD1 and WT 129S1/SvImJ mice could have a functional effect by impairing or enhancing any of these roles. Unfortunately, there are no antibodies available for most of the genes for which differences in 3’ UTR expression were detected. There are antibodies available for STX3 and ACTL6B, but these antibodies detect multiple isoforms, including isoforms which are not differentially expressed between strains.

Cellular localization of transcripts with altered 3’ UTR expression can be studied by generating DIG-labeled probes specific to certain isoforms and performing RNA in situ hybridization on primary neuronal cultures. In order to determine the effect of 3’ UTR length on protein levels, tagged proteins with different 3’ UTRs can be expressed in neuronal cultures. By comparing the mRNA expression level to protein levels, the effect of the 3’ UTR on translation can be determined.

4.4 Conclusion

In conclusion, although this work failed to identify the mechanism by which TFII-IRD1 contributes to the behavioural phenotype seen in Gtf2ird1-/- mice or people with WBS, genotype-phenotype correlations indicate that this protein does play an important role in proper brain development and/or function. Further experimentation focusing on potential cytoplasmic roles for TFII-IRD1 and gene expression in restricted neuronal cell populations should aid in elucidating the function of this protein.

References

1. Lightwood, R. & Stapleton, T. Idiopathic hypercalcaemia in infants. Lancet 265, 255-6 (1953).

2. Fanconi, G., Girardet, P., Schlesinger, B., Butler, N. & Black, J. [Chronic hyperglycemia, combined with osteosclerosis, hyperazotemia, nanism and congenital malformations.]. Helv Paediatr Acta 7, 314-49 (1952).

3. Jones, K.L. Williams syndrome: an historical perspective of its evolution, natural history, and etiology. Am J Med Genet Suppl 6, 89-96 (1990).

4. Schlesinger, B.E., Butler, N.R. & Black, J.A. Severe type of infantile hypercalcaemia. Br Med J 1, 127-34 (1956).

5. Williams, J.C., Barratt-Boyes, B.G. & Lowe, J.B. Supravalvular aortic stenosis. Circulation 24, 1311-8 (1961).

6. Beuren, A.J., Apitz, J. & Harmjanz, D. Supravalvular aortic stenosis in association with mental retardation and a certain facial appearance. Circulation 26, 1235-40 (1962).

7. Black, J.A. & Carter, R.E. Association between Aortic Stenosis and Facies of Severe Infantile Hypercalcaemia. Lancet 2, 745-9 (1963).

8. Friedman, W.F. & Mills, L.F. The relationship between vitamin D and the craniofacial and dental anomalies of the supravalvular aortic stenosis syndrome. Pediatrics 43, 12-8 (1969).

9. Friedman, W.F. & Roberts, W.C. Vitamin D and the supravalvar aortic stenosis syndrome. The transplacental effects of vitamin D on the aorta of the rabbit. Circulation 34, 77-86 (1966).

10. Morris, C.A., Thomas, I.T. & Greenberg, F. Williams syndrome: autosomal dominant inheritance. Am J Med Genet 47, 478-81 (1993).

11. Curran, M.E., Atkinson, D.L., Ewart, A.K., Morris, C.A., Leppert, M.F. & Keating, M.T. The elastin gene is disrupted by a translocation associated with supravalvular aortic stenosis. Cell 73, 159-68 (1993).

12. Ewart, A.K., Morris, C.A., Atkinson, D., Jin, W., Sternes, K., Spallone, P., Stock, A.D., Leppert, M. & Keating, M.T. Hemizygosity at the elastin locus in a developmental disorder, Williams syndrome. Nat Genet 5, 11-6 (1993).

13. Stromme, P., Bjornstad, P.G. & Ramstad, K. Prevalence estimation of Williams syndrome. J Child Neurol 17, 269-71 (2002).

14. Morris, C.A., Demsey, S.A., Leonard, C.O., Dilts, C. & Blackburn, B.L. Natural history of Williams syndrome: physical characteristics. J Pediatr 113, 318-26 (1988).

15. Morris, C.A. & Mervis, C.B. Williams syndrome and related disorders. Annu Rev Genomics Hum Genet 1, 461-84 (2000).

16. Reiss, A.L., Eliez, S., Schmitt, J.E., Straus, E., Lai, Z., Jones, W. & Bellugi, U. IV. Neuroanatomy of Williams syndrome: a high-resolution MRI study. J Cogn Neurosci 12 Suppl 1, 65-73 (2000).

17. Galaburda, A.M. & Bellugi, U. V. Multi-level analysis of cortical neuroanatomy in Williams syndrome. J Cogn Neurosci 12 Suppl 1, 74-88 (2000).

18. Kippenhan, J.S., Olsen, R.K., Mervis, C.B., Morris, C.A., Kohn, P., Meyer-Lindenberg, A. & Berman, K.F. Genetic contributions to human gyrification: sulcal morphometry in Williams syndrome. J Neurosci 25, 7840-6 (2005).

19. Pankau, R., Partsch, C.J., Gosch, A., Oppermann, H.C. & Wessel, A. Statural growth in Williams-Beuren syndrome. Eur J Pediatr 151, 751-5 (1992).

20. Jones, K.L. & Smith, D.W. The Williams elfin facies syndrome. A new perspective. J Pediatr 86, 718-23 (1975).

21. Preus, M. The Williams syndrome: objective definition and diagnosis. Clin Genet 25, 422-8 (1984).

22. Rodd, C. & Goodyer, P. Hypercalcemia of the newborn: etiology, evaluation, and management. Pediatr Nephrol 13, 542-7 (1999).

23. Pober, B.R., Johnson, M. & Urban, Z. Mechanisms and treatment of cardiovascular disease in Williams-Beuren syndrome. J Clin Invest 118, 1606-15 (2008).

24. Pober, B.R. Williams-Beuren syndrome. N Engl J Med 362, 239-52 (2010).

25. Cherniske, E.M., Carpenter, T.O., Klaiman, C., Young, E., Bregman, J., Insogna, K., Schultz, R.T. & Pober, B.R. Multisystem study of 20 older adults with Williams syndrome. Am J Med Genet A 131, 255-64 (2004).

26. Bellugi, U., Lichtenberger, L., Jones, W., Lai, Z. & St George, M. I. The neurocognitive profile of Williams Syndrome: a complex pattern of strengths and weaknesses. J Cogn Neurosci 12 Suppl 1, 7-29 (2000).

27. Martens, M.A., Wilson, S.J. & Reutens, D.C. Research Review: Williams syndrome: a critical review of the cognitive, behavioral, and neuroanatomical phenotype. J Child Psychol Psychiatry 49, 576-608 (2008).

28. Mervis, C.B. & Robinson, B.F. Expressive vocabulary ability of toddlers with Williams syndrome or Down syndrome: a comparison. Dev Neuropsychol 17, 111-26 (2000).

29. Jones, W., Bellugi, U., Lai, Z., Chiles, M., Reilly, J., Lincoln, A. & Adolphs, R. II. Hypersociability in Williams Syndrome. J Cogn Neurosci 12 Suppl 1, 30-46 (2000).

30. Frigerio, E., Burt, D.M., Gagliardi, C., Cioffi, G., Martelli, S., Perrett, D.I. & Borgatti, R. Is everybody always my friend? Perception of approachability in Williams syndrome. Neuropsychologia 44, 254-9 (2006).

31. Dykens, E.M. Anxiety, fears, and phobias in persons with Williams syndrome. Dev Neuropsychol 23, 291-316 (2003).

32. Johnson, L.W., Fishman, R.A., Schneider, B., Parker, F.B., Jr., Husson, G. & Webb, W.R. Familial supravalvular aortic stenosis. Report of a large family and review of the literature. Chest 70, 494-500 (1976).

33. Fazio, M.J., Mattei, M.G., Passage, E., Chu, M.L., Black, D., Solomon, E., Davidson, J.M. & Uitto, J. Human elastin gene: new evidence for localization to the long arm of chromosome 7. Am J Hum Genet 48, 696-703 (1991).

34. Osborne, L.R., Martindale, D., Scherer, S.W., Shi, X.M., Huizenga, J., Heng, H.H., Costa, T., Pober, B., Lew, L., Brinkman, J., Rommens, J., Koop, B. & Tsui, L.C. Identification of genes from a 500-kb region at 7q11.23 that is commonly deleted in Williams syndrome patients. Genomics 36, 328-36 (1996).

35. Robinson, W.P., Waslynka, J., Bernasconi, F., Wang, M., Clark, S., Kotzot, D. & Schinzel, A. Delineation of 7q11.2 deletions associated with Williams-Beuren syndrome and mapping of a repetitive sequence to within and to either side of the common deletion. Genomics 34, 17-23 (1996).

36. Hockenhull, E.L., Carette, M.J., Metcalfe, K., Donnai, D., Read, A.P. & Tassabehji, M. A complete physical contig and partial transcript map of the Williams syndrome critical region. Genomics 58, 138-45 (1999).

37. Schubert, C. The genomic basis of the Williams-Beuren syndrome. Cell Mol Life Sci 66, 1178-97 (2009).

38. Tassabehji, M. Williams-Beuren syndrome: a challenge for genotype-phenotype correlations. Hum Mol Genet 12 Spec No 2, R229-37 (2003).

39. Bayes, M., Magano, L.F., Rivera, N., Flores, R. & Perez Jurado, L.A. Mutational mechanisms of Williams-Beuren syndrome deletions. Am J Hum Genet 73, 131-51 (2003).

40. Osborne, L.R., Li, M., Pober, B., Chitayat, D., Bodurtha, J., Mandel, A., Costa, T., Grebe, T., Cox, S., Tsui, L.C. & Scherer, S.W. A 1.5 million-base pair inversion polymorphism in families with Williams-Beuren syndrome. Nat Genet 29, 321-5 (2001).

41. Somerville, M.J., Mervis, C.B., Young, E.J., Seo, E.J., del Campo, M., Bamforth, S., Peregrine, E., Loo, W., Lilley, M., Perez-Jurado, L.A., Morris, C.A., Scherer, S.W. & Osborne, L.R. Severe expressive-language delay related to duplication of the Williams-Beuren locus. N Engl J Med 353, 1694-701 (2005).

42. Orellana, C., Bernabeu, J., Monfort, S., Rosello, M., Oltra, S., Ferrer, I., Quiroga, R., Martinez-Garay, I. & Martinez, F. Duplication of the Williams-Beuren critical region: case report and further delineation of the phenotypic spectrum. J Med Genet 45, 187-9 (2008).

43. Tam, E., Young, E.J., Morris, C.A., Marshall, C.R., Loo, W., Scherer, S.W., Mervis, C.B. & Osborne, L.R. The common inversion of the Williams-Beuren syndrome region at 7q11.23 does not cause clinical symptoms. Am J Med Genet A 146A, 1797-806 (2008).

44. Hobart, H.H., Morris, C.A., Mervis, C.B., Pani, A.M., Kistler, D.J., Rios, C.M., Kimberley, K.W., Gregg, R.G. & Bray-Ward, P. Inversion of the Williams syndrome region is a common polymorphism found more frequently in parents of children with Williams syndrome. Am J Med Genet C Semin Med Genet 154C, 220-8 (2010).

45. Osborne, L. & Pober, B. Genetics of childhood disorders: XXVII. Genes and cognition in Williams syndrome. J Am Acad Child Adolesc Psychiatry 40, 732-5 (2001).

46. Botta, A., Novelli, G., Mari, A., Novelli, A., Sabani, M., Korenberg, J., Osborne, L.R., Digilio, M.C., Giannotti, A. & Dallapiccola, B. Detection of an atypical 7q11.23 deletion in Williams syndrome patients which does not include the STX1A and FZD3 genes. J Med Genet 36, 478-80 (1999).

47. Heller, R., Rauch, A., Luttgen, S., Schroder, B. & Winterpacht, A. Partial deletion of the critical 1.5 Mb interval in Williams-Beuren syndrome. J Med Genet 40, e99 (2003).

48. Hirota, H., Matsuoka, R., Chen, X.N., Salandanan, L.S., Lincoln, A., Rose, F.E., Sunahara, M., Osawa, M., Bellugi, U. & Korenberg, J.R. Williams syndrome deficits in visual spatial processing linked to GTF2IRD1 and GTF2I on chromosome 7q11.23. Genet Med 5, 311-21 (2003).

49. Morris, C.A., Mervis, C.B., Hobart, H.H., Gregg, R.G., Bertrand, J., Ensing, G.J., Sommer, A., Moore, C.A., Hopkin, R.J., Spallone, P.A., Keating, M.T., Osborne, L., Kimberley, K.W. & Stock, A.D. GTF2I hemizygosity implicated in mental retardation in Williams syndrome: genotype-phenotype analysis of five families with deletions in the Williams syndrome region. Am J Med Genet A 123A, 45-59 (2003).

50. Howald, C., Merla, G., Digilio, M.C., Amenta, S., Lyle, R., Deutsch, S., Choudhury, U., Bottani, A., Antonarakis, S.E., Fryssira, H., Dallapiccola, B. & Reymond, A. Two high throughput technologies to detect segmental aneuploidies identify new Williams-Beuren syndrome patients with atypical deletions. J Med Genet 43, 266-73 (2006).

51. Tassabehji, M., Metcalfe, K., Karmiloff-Smith, A., Carette, M.J., Grant, J., Dennis, N., Reardon, W., Splitt, M., Read, A.P. & Donnai, D. Williams syndrome: use of chromosomal microdeletions as a tool to dissect cognitive and physical phenotypes. Am J Hum Genet 64, 118-25 (1999).

52. Gagliardi, C., Bonaglia, M.C., Selicorni, A., Borgatti, R. & Giorda, R. Unusual cognitive and behavioural profile in a Williams syndrome patient with atypical 7q11.23 deletion. J Med Genet 40, 526-30 (2003).

53. Tassabehji, M., Hammond, P., Karmiloff-Smith, A., Thompson, P., Thorgeirsson, S.S., Durkin, M.E., Popescu, N.C., Hutton, T., Metcalfe, K., Rucka, A., Stewart, H., Read, A.P., Maconochie, M. & Donnai, D. GTF2IRD1 in craniofacial development of humans and mice. Science 310, 1184-7 (2005).

54. Roy, A.L., Du, H., Gregor, P.D., Novina, C.D., Martinez, E. & Roeder, R.G. Cloning of an inr- and E-box-binding protein, TFII-I, that interacts physically and functionally with USF1. EMBO J 16, 7091-104 (1997).

55. Hinsley, T.A., Cunliffe, P., Tipney, H.J., Brass, A. & Tassabehji, M. Comparison of TFII-I gene family members deleted in Williams-Beuren syndrome. Protein Sci 13, 2588-99 (2004).

56. Ferre-D'Amare, A.R., Prendergast, G.C., Ziff, E.B. & Burley, S.K. Recognition by Max of its cognate DNA through a dimeric b/HLH/Z domain. Nature 363, 38-45 (1993).

57. Vullhorst, D. & Buonanno, A. Characterization of general transcription factor 3, a transcription factor involved in slow muscle-specific gene expression. J Biol Chem 278, 8370-9 (2003).

58. Cheriyath, V. & Roy, A.L. Structure-function analysis of TFII-I. Roles of the N-terminal end, basic region, and I-repeats. J Biol Chem 276, 8377-83 (2001).

59. Tantin, D., Tussie-Luna, M.I., Roy, A.L. & Sharp, P.A. Regulation of immunoglobulin promoter activity by TFII-I class transcription factors. J Biol Chem 279, 5460-9 (2004).

60. Makeyev, A.V., Erdenechimeg, L., Mungunsukh, O., Roth, J.J., Enkhmandakh, B., Ruddle, F.H. & Bayarsaihan, D. GTF2IRD2 is located in the Williams-Beuren syndrome critical region 7q11.23 and encodes a protein with two TFII-I-like helix-loop-helix repeats. Proc Natl Acad Sci U S A 101, 11052-7 (2004).

61. Roy, A.L., Meisterernst, M., Pognonec, P. & Roeder, R.G. Cooperative interaction of an initiator-binding transcription initiation factor and the helix-loop-helix activator USF. Nature 354, 245-8 (1991).

62. Yang, W. & Desiderio, S. BAP-135, a target for Bruton's tyrosine kinase in response to B cell receptor engagement. Proc Natl Acad Sci U S A 94, 604-9 (1997).

63. Grueneberg, D.A., Henry, R.W., Brauer, A., Novina, C.D., Cheriyath, V., Roy, A.L. & Gilman, M. A multifunctional DNA-binding protein that promotes the formation of serum response factor/homeodomain complexes: identity to TFII-I. Genes Dev 11, 2482-93 (1997).

64. Saouaf, S.J., Mahajan, S., Rowley, R.B., Kut, S.A., Fargnoli, J., Burkhardt, A.L., Tsukada, S., Witte, O.N. & Bolen, J.B. Temporal differences in the activation of three classes of non-transmembrane protein tyrosine kinases following B-cell antigen receptor surface engagement. Proc Natl Acad Sci U S A 91, 9524-8 (1994).

65. Cheriyath, V., Desgranges, Z.P. & Roy, A.L. c-Src-dependent transcriptional activation of TFII-I. J Biol Chem 277, 22798-805 (2002).

66. Cheriyath, V. & Roy, A.L. Alternatively spliced isoforms of TFII-I. Complex formation, nuclear translocation, and differential gene regulation. J Biol Chem 275, 26300-8 (2000).

67. Jackson, T.A., Taylor, H.E., Sharma, D., Desiderio, S. & Danoff, S.K. Vascular endothelial growth factor receptor-2: counter-regulation by the transcription factors, TFII-I and TFII-IRD1. J Biol Chem 280, 29856-63 (2005).

68. Ku, M., Sokol, S.Y., Wu, J., Tussie-Luna, M.I., Roy, A.L. & Hata, A. Positive and negative regulation of the transforming growth factor beta/activin target gene goosecoid by the TFII-I family of transcription factors. Mol Cell Biol 25, 7144-57 (2005).

69. Tussie-Luna, M.I., Bayarsaihan, D., Seto, E., Ruddle, F.H. & Roy, A.L. Physical and functional interactions of histone deacetylase 3 with TFII-I family proteins and PIASxbeta. Proc Natl Acad Sci U S A 99, 12807-12 (2002).

70. Enkhmandakh, B., Bitchevaia, N., Ruddle, F. & Bayarsaihan, D. The early embryonic expression of TFII-I during mouse preimplantation development. Gene Expr Patterns 4, 25-8 (2004).

71. Danoff, S.K., Taylor, H.E., Blackshaw, S. & Desiderio, S. TFII-I, a candidate gene for Williams syndrome cognitive profile: parallels between regional expression in mouse brain and human phenotype. Neuroscience 123, 931-8 (2004).

72. Yan, X., Zhao, X., Qian, M., Guo, N., Gong, X. & Zhu, X. Characterization and gene structure of a novel retinoblastoma-protein-associated protein similar to the transcription regulator TFII-I. Biochem J 345 Pt 3, 749-57 (2000).

73. Franke, Y., Peoples, R.J. & Francke, U. Identification of GTF2IRD1, a putative transcription factor within the Williams-Beuren syndrome deletion at 7q11.23. Cytogenet Cell Genet 86, 296-304 (1999).

74. Bayarsaihan, D. & Ruddle, F.H. Isolation and characterization of BEN, a member of the TFII-I family of DNA-binding proteins containing distinct helix-loop-helix domains. Proc Natl Acad Sci U S A 97, 7342-7 (2000).

75. O'Mahoney, J.V., Guven, K.L., Lin, J., Joya, J.E., Robinson, C.S., Wade, R.P. & Hardeman, E.C. Identification of a novel slow-muscle-fiber enhancer binding protein, MusTRD1. Mol Cell Biol 18, 6641-52 (1998).

76. Osborne, L.R., Campbell, T., Daradich, A., Scherer, S.W. & Tsui, L.C. Identification of a putative transcription factor gene (WBSCR11) that is commonly deleted in Williams-Beuren syndrome. Genomics 57, 279-84 (1999).

77. Corin, S.J., Levitt, L.K., O'Mahoney, J.V., Joya, J.E., Hardeman, E.C. & Wade, R. Delineation of a slow-twitch-myofiber-specific transcriptional element by using in vivo somatic gene transfer. Proc Natl Acad Sci U S A 92, 6185-9 (1995).

78. Polly, P., Haddadi, L.M., Issa, L.L., Subramaniam, N., Palmer, S.J., Tay, E.S. & Hardeman, E.C. hMusTRD1alpha1 represses MEF2 activation of the troponin I slow enhancer. J Biol Chem 278, 36603-10 (2003).

79. Ring, C., Ogata, S., Meek, L., Song, J., Ohta, T., Miyazono, K. & Cho, K.W. The role of a Williams-Beuren syndrome-associated helix-loop-helix domain-containing transcription factor in activin/nodal signaling. Genes Dev 16, 820-35 (2002).

80. Vullhorst, D. & Buonanno, A. Multiple GTF2I-like repeats of general transcription factor 3 exhibit DNA binding properties. Evidence for a common origin as a sequence-specific DNA interaction module. J Biol Chem 280, 31722-31 (2005).

81. Lazebnik, M.B., Tussie-Luna, M.I. & Roy, A.L. Determination and functional analysis of the consensus binding site for TFII-I family member BEN, implicated in Williams-Beuren syndrome. J Biol Chem 283, 11078-82 (2008).

82. Palmer, S.J., Tay, E.S., Santucci, N., Cuc Bach, T.T., Hook, J., Lemckert, F.A., Jamieson, R.V., Gunning, P.W. & Hardeman, E.C. Expression of Gtf2ird1, the Williams syndrome-associated gene, during mouse development. Gene Expr Patterns 7, 396-404 (2007).

83. Proulx, E., Young, E.J., Osborne, L.R. & Lambe, E.K. Enhanced prefrontal serotonin 5-HT(1A) currents in a mouse model of Williams-Beuren syndrome with low innate anxiety. J Neurodev Disord 2, 99-108 (2010).

84. Tipney, H.J., Hinsley, T.A., Brass, A., Metcalfe, K., Donnai, D. & Tassabehji, M. Isolation and characterisation of GTF2IRD2, a novel fusion gene and member of the TFII-I family of transcription factors, deleted in Williams-Beuren syndrome. Eur J Hum Genet 12, 551-60 (2004).

85. Reiter, L.T., Murakami, T., Koeuth, T., Pentao, L., Muzny, D.M., Gibbs, R.A. & Lupski, J.R. A recombination hotspot responsible for two inherited peripheral neuropathies is located near a mariner transposon-like element. Nat Genet 12, 288-97 (1996).

86. Valero, M.C., de Luis, O., Cruces, J. & Perez Jurado, L.A. Fine-scale comparative mapping of the human 7q11.23 region and the orthologous region on mouse chromosome 5G: the low-copy repeats that flank the Williams-Beuren syndrome deletion arose at breakpoint sites of an evolutionary inversion(s). Genomics 69, 1-13 (2000).

87. Osborne, L.R. Animal models of Williams syndrome. Am J Med Genet C Semin Med Genet 154C, 209-19 (2010).

88. Young, E.J., Lipina, T., Tam, E., Mandel, A., Clapcote, S.J., Bechard, A.R., Chambers, J., Mount, H.T., Fletcher, P.J., Roder, J.C. & Osborne, L.R. Reduced fear and aggression and altered serotonin metabolism in Gtf2ird1-targeted mice. Genes Brain Behav 7, 224-34 (2008).

89. Kusserow, H., Davies, B., Hortnagl, H., Voigt, I., Stroh, T., Bert, B., Deng, D.R., Fink, H., Veh, R.W. & Theuring, F. Reduced anxiety-related behaviour in transgenic mice overexpressing serotonin 1A receptors. Brain Res Mol Brain Res 129, 104-16 (2004).

90. Calvo, S., Vullhorst, D., Venepally, P., Cheng, J., Karavanova, I. & Buonanno, A. Molecular dissection of DNA sequences and factors involved in slow muscle-specific transcription. Mol Cell Biol 21, 8490-503 (2001).

91. Zhu, L., Lyons, G.E., Juhasz, O., Joya, J.E., Hardeman, E.C. & Wade, R. Developmental regulation of troponin I isoform genes in striated muscles of transgenic mice. Dev Biol 169, 487-503 (1995).

92. Issa, L.L., Palmer, S.J., Guven, K.L., Santucci, N., Hodgson, V.R., Popovic, K., Joya, J.E. & Hardeman, E.C. MusTRD can regulate postnatal fiber-specific expression. Dev Biol 293, 104-15 (2006).

93. Juan, A.H. & Ruddle, F.H. Enhancer timing of Hox gene expression: deletion of the endogenous Hoxc8 early enhancer. Development 130, 4823-34 (2003).

94. Tussie-Luna, M.I., Bayarsaihan, D., Ruddle, F.H. & Roy, A.L. Repression of TFII-I-dependent transcription by nuclear exclusion. Proc Natl Acad Sci U S A 98, 7789-94 (2001).

95. Watabe, T., Kim, S., Candia, A., Rothbacher, U., Hashimoto, C., Inoue, K. & Cho, K.W. Molecular mechanisms of Spemann's organizer formation: conserved growth factor synergy between Xenopus and mouse. Genes Dev 9, 3038-50 (1995).

96. Laurent, M.N., Blitz, I.L., Hashimoto, C., Rothbacher, U. & Cho, K.W. The Xenopus homeobox gene twin mediates Wnt induction of goosecoid in establishment of Spemann's organizer. Development 124, 4905-16 (1997).

97. Rivera-Perez, J.A., Mallo, M., Gendron-Maguire, M., Gridley, T. & Behringer, R.R. Goosecoid is not an essential component of the mouse gastrula organizer but is required for craniofacial and rib development. Development 121, 3005-12 (1995).

98. Ferreiro, B., Artinger, M., Cho, K. & Niehrs, C. Antimorphic goosecoids. Development 125, 1347-59 (1998).

99. Tantin, D. & Sharp, P.A. Mouse lymphoid cell line selected to have high immunoglobulin promoter activity. Mol Cell Biol 22, 1460-73 (2002).

100. Wu, Y. & Patterson, C. The human KDR/flk-1 gene contains a functional initiator element that is bound and transactivated by TFII-I. J Biol Chem 274, 3207-14 (1999).

101. Chimge, N.O., Mungunsukh, O., Ruddle, F. & Bayarsaihan, D. Expression profiling of BEN regulated genes in mouse embryonic fibroblasts. J Exp Zool B Mol Dev Evol 308, 209-24 (2007).

102. Chimge, N.O., Makeyev, A.V., Ruddle, F.H. & Bayarsaihan, D. Identification of the TFII-I family target genes in the vertebrate genome. Proc Natl Acad Sci U S A 105, 9006-10 (2008).

103. Bayarsaihan, D., Bitchevaia, N., Enkhmandakh, B., Tussie-Luna, M.I., Leckman, J.F., Roy, A. & Ruddle, F. Expression of BEN, a member of TFII-I family of transcription factors, during mouse pre- and postimplantation development. Gene Expr Patterns 3, 579-89 (2003).

104. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. & Speed, T.P. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249-64 (2003).

105. Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98, 5116-21 (2001).

106. Bolstad, B.M., Irizarry, R.A., Astrand, M. & Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185-93 (2003).

107. Smyth, G.K. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3, Article3 (2004).

108. Blum, M., Gaunt, S.J., Cho, K.W., Steinbeisser, H., Blumberg, B., Bittner, D. & De Robertis, E.M. Gastrulation in the mouse: the role of the homeobox gene goosecoid. Cell 69, 1097-106 (1992).

109. Gaunt, S.J., Blum, M. & De Robertis, E.M. Expression of the mouse goosecoid gene during mid-embryogenesis may mark mesenchymal cell lineages in the developing head, limbs and body wall. Development 117, 769-78 (1993).

110. Yamada, G., Mansouri, A., Torres, M., Stuart, E.T., Blum, M., Schultz, M., De Robertis, E.M. & Gruss, P. Targeted mutation of the murine goosecoid gene results in craniofacial defects and neonatal death. Development 121, 2917-22 (1995).

111. Durkin, M.E., Keck-Waggoner, C.L., Popescu, N.C. & Thorgeirsson, S.S. Integration of a c-myc transgene results in disruption of the mouse Gtf2ird1 gene, the homologue of the human GTF2IRD1 gene hemizygously deleted in Williams-Beuren syndrome. Genomics 73, 20-7 (2001).

112. Kwon, Y., Shin, J., Park, H.W. & Kim, M.H. Dynamic expression pattern of Hoxc8 during mouse early embryogenesis. Anat Rec A Discov Mol Cell Evol Biol 283, 187-92 (2005).

113. Borello, U., Cobos, I., Long, J.E., McWhirter, J.R., Murre, C. & Rubenstein, J.L. FGF15 promotes neurogenesis and opposes FGF8 function during neocortical development. Neural Dev 3, 17 (2008).

114. Darios, F. & Davletov, B. Omega-3 and omega-6 fatty acids stimulate cell membrane expansion by acting on syntaxin 3. Nature 440, 813-7 (2006).

115. Wu, J.I., Lessard, J., Olave, I.A., Qiu, Z., Ghosh, A., Graef, I.A. & Crabtree, G.R. Regulation of dendritic development by neuron-specific chromatin remodeling complexes. Neuron 56, 94-108 (2007).

116. Hakre, S., Tussie-Luna, M.I., Ashworth, T., Novina, C.D., Settleman, J., Sharp, P.A. & Roy, A.L. Opposing functions of TFII-I spliced isoforms in growth factor-induced gene expression. Mol Cell 24, 301-8 (2006).

117. Pan, Y., Tsai, C.J., Ma, B. & Nussinov, R. Mechanisms of transcription factor selectivity. Trends Genet 26, 75-83 (2010).

118. Palmer, S.J., Santucci, N., Widagdo, J., Bontempo, S.J., Taylor, K.M., Tay, E.S., Hook, J., Lemckert, F., Gunning, P.W. & Hardeman, E.C. Negative autoregulation of GTF2IRD1 in Williams-Beuren syndrome via a novel DNA binding mechanism. J Biol Chem 285, 4715-24 (2010).

119. Antonell, A., Del Campo, M., Magano, L.F., Kaufmann, L., de la Iglesia, J.M., Gallastegui, F., Flores, R., Schweigmann, U., Fauth, C., Kotzot, D. & Perez-Jurado, L.A. Partial 7q11.23 deletions further implicate GTF2I and GTF2IRD1 as the main genes responsible for the Williams-Beuren syndrome neurocognitive profile. J Med Genet 47, 312-20 (2010).

120. Collette, J.C., Chen, X.N., Mills, D.L., Galaburda, A.M., Reiss, A.L., Bellugi, U. & Korenberg, J.R. William's syndrome: gene expression is related to parental origin and regional coordinate control. J Hum Genet 54, 193-8 (2009).

121. Merla, G., Howald, C., Henrichsen, C.N., Lyle, R., Wyss, C., Zabot, M.T., Antonarakis, S.E. & Reymond, A. Submicroscopic deletion in patients with Williams-Beuren syndrome influences expression levels of the nonhemizygous flanking genes. Am J Hum Genet 79, 332-41 (2006).

122. Corbo, J.C., Lawrence, K.A., Karlstetter, M., Myers, C.A., Abdelaziz, M., Dirkes, W., Weigelt, K., Seifert, M., Benes, V., Fritsche, L.G., Weber, B.H. & Langmann, T. CRX ChIP-seq reveals the cis-regulatory architecture of mouse photoreceptors. Genome Res 20, 1512-25 (2010).

123. Wei, G.H., Badis, G., Berger, M.F., Kivioja, T., Palin, K., Enge, M., Bonke, M., Jolma, A., Varjosalo, M., Gehrke, A.R., Yan, J., Talukder, S., Turunen, M., Taipale, M., Stunnenberg, H.G., Ukkonen, E., Hughes, T.R., Bulyk, M.L. & Taipale, J. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J 29, 2147-60 (2010).

124. Pfenning, A.R., Kim, T.K., Spotts, J.M., Hemberg, M., Su, D. & West, A.E. Genome-wide identification of calcium-response factor (CaRF) binding sites predicts a role in regulation of neuronal signaling pathways. PLoS One 5, e10870 (2010).

125. Enkhmandakh, B., Makeyev, A.V., Erdenechimeg, L., Ruddle, F.H., Chimge, N.O., Tussie-Luna, M.I., Roy, A.L. & Bayarsaihan, D. Essential functions of the Williams-Beuren syndrome-associated TFII-I genes in embryonic development. Proc Natl Acad Sci U S A 106, 181-6 (2009).

126. Matsuda, E., Agata, Y., Sugai, M., Katakai, T., Gonda, H. & Shimizu, A. Targeting of Kruppel-associated box-containing zinc finger proteins to centromeric heterochromatin. Implication for the gene silencing mechanisms. J Biol Chem 276, 14222-9 (2001).

127. Jakobsson, J., Cordero, M.I., Bisaz, R., Groner, A.C., Busskamp, V., Bensadoun, J.C., Cammas, F., Losson, R., Mansuy, I.M., Sandi, C. & Trono, D. KAP1-mediated epigenetic repression in the forebrain modulates behavioral vulnerability to stress. Neuron 60, 818-31 (2008).

128. Valor, L.M. & Grant, S.G. Clustered gene expression changes flank targeted gene loci in knockout mice. PLoS One 2, e1303 (2007).

129. Kedmi, M. & Orr-Urtreger, A. Differential brain transcriptome of beta4 nAChR subunit-deficient mice: is it the effect of the null mutation or the background strain? Physiol Genomics 28, 213-22 (2007).

130. Schalkwyk, L.C., Fernandes, C., Nash, M.W., Kurrikoff, K., Vasar, E. & Koks, S. Interpretation of knockout experiments: the congenic footprint. Genes Brain Behav 6, 299-303 (2007).

131. Noyes, H.A., Agaba, M., Anderson, S., Archibald, A.L., Brass, A., Gibson, J., Hall, L., Hulme, H., Oh, S.J. & Kemp, S. Genotype and expression analysis of two inbred mouse strains and two derived congenic strains suggest that most gene expression is trans regulated and sensitive to genetic background. BMC Genomics 11, 361 (2010).

132. Voikar, V., Koks, S., Vasar, E. & Rauvala, H. Strain and gender differences in the behavior of mouse lines commonly used in transgenic studies. Physiol Behav 72, 271-81 (2001).

133. Ducottet, C. & Belzung, C. Correlations between behaviours in the elevated plus-maze and sensitivity to unpredictable subchronic mild stress: evidence from inbred strains of mice. Behav Brain Res 156, 153-62 (2005).

134. Rodgers, R.J., Boullier, E., Chatzimichalaki, P., Cooper, G.D. & Shorten, A. Contrasting phenotypes of C57BL/6JOlaHsd, 129S2/SvHsd and 129/SvEv mice in two exploration-based tests of anxiety-related behaviour. Physiol Behav 77, 301-10 (2002).

135. Rodger, J., Davis, S., Laroche, S., Mallet, J. & Hicks, A. Induction of long-term potentiation in vivo regulates alternate splicing to alter syntaxin 3 isoform expression in rat dentate gyrus. J Neurochem 71, 666-75 (1998).

136. Shi, L., Reid, L.H., Jones, W.D., Shippy, R., Warrington, J.A., Baker, S.C., Collins, P.J., de Longueville, F., Kawasaki, E.S., Lee, K.Y., Luo, Y., Sun, Y.A., Willey, J.C., Setterquist, R.A., Fischer, G.M., Tong, W., Dragan, Y.P., Dix, D.J., Frueh, F.W., Goodsaid, F.M., Herman, D., Jensen, R.V., Johnson, C.D., Lobenhofer, E.K., Puri, R.K., Scherf, U., Thierry-Mieg, J., Wang, C., Wilson, M., Wolber, P.K., Zhang, L., Amur, S., Bao, W., Barbacioru, C.C., Lucas, A.B., Bertholet, V., Boysen, C., Bromley, B., Brown, D., Brunner, A., Canales, R., Cao, X.M., Cebula, T.A., Chen, J.J., Cheng, J., Chu, T.M., Chudin, E., Corson, J., Corton, J.C., Croner, L.J., Davies, C., Davison, T.S., Delenstarr, G., Deng, X., Dorris, D., Eklund, A.C., Fan, X.H., Fang, H., Fulmer-Smentek, S., Fuscoe, J.C., Gallagher, K., Ge, W., Guo, L., Guo, X., Hager, J., Haje, P.K., Han, J., Han, T., Harbottle, H.C., Harris, S.C., Hatchwell, E., Hauser, C.A., Hester, S., Hong, H., Hurban, P., Jackson, S.A., Ji, H., Knight, C.R., Kuo, W.P., LeClerc, J.E., Levy, S., Li, Q.Z., Liu, C., Liu, Y., Lombardi, M.J., Ma, Y., Magnuson, S.R., Maqsodi, B., McDaniel, T., Mei, N., Myklebost, O., Ning, B., Novoradovskaya, N., Orr, M.S., Osborn, T.W., Papallo, A., Patterson, T.A., Perkins, R.G., Peters, E.H., Peterson, R., Philips, K.L., Pine, P.S., Pusztai, L., Qian, F., Ren, H., Rosen, M., Rosenzweig, B.A., Samaha, R.R., Schena, M., Schroth, G.P., Shchegrova, S., Smith, D.D., Staedtler, F., Su, Z., Sun, H., Szallasi, Z., Tezak, Z., Thierry-Mieg, D., Thompson, K.L., Tikhonova, I., Turpaz, Y., Vallanat, B., Van, C., Walker, S.J., Wang, S.J., Wang, Y., Wolfinger, R., Wong, A., Wu, J., Xiao, C., Xie, Q., Xu, J., Yang, W., Zhong, S., Zong, Y. & Slikker, W., Jr. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24, 1151-61 (2006).

137. Pedotti, P., 't Hoen, P.A., Vreugdenhil, E., Schenk, G.J., Vossen, R.H., Ariyurek, Y., de Hollander, M., Kuiper, R., van Ommen, G.J., den Dunnen, J.T., Boer, J.M. & de Menezes, R.X. Can subtle changes in gene expression be consistently detected with different microarray platforms? BMC Genomics 9, 124 (2008).

138. Barnes, M., Freudenberg, J., Thompson, S., Aronow, B. & Pavlidis, P. Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res 33, 5914-23 (2005).

139. Tudor, M., Akbarian, S., Chen, R.Z. & Jaenisch, R. Transcriptional profiling of a mouse model for Rett syndrome reveals subtle transcriptional changes in the brain. Proc Natl Acad Sci U S A 99, 15536-41 (2002).

140. Ishibashi, T., Thambirajah, A.A. & Ausio, J. MeCP2 preferentially binds to methylated linker DNA in the absence of the terminal tail of histone H3 and independently of histone acetylation. FEBS Lett 582, 1157-62 (2008).

141. Skene, P.J., Illingworth, R.S., Webb, S., Kerr, A.R., James, K.D., Turner, D.J., Andrews, R. & Bird, A.P. Neuronal MeCP2 is expressed at near histone-octamer levels and globally alters the chromatin state. Mol Cell 37, 457-68 (2010).

142. Pearson, E.C., Bates, D.L., Prospero, T.D. & Thomas, J.O. Neuronal nuclei and glial nuclei from mammalian cerebral cortex. Nucleosome repeat lengths, DNA contents and H1 contents. Eur J Biochem 144, 353-60 (1984).

143. Caraveo, G., van Rossum, D.B., Patterson, R.L., Snyder, S.H. & Desiderio, S. Action of TFII-I outside the nucleus as an inhibitor of agonist-induced calcium entry. Science 314, 122-5 (2006).

144. Park, C.Y. & Dolmetsch, R. Cell signaling. The double life of a transcription factor takes it outside the nucleus. Science 314, 64-5 (2006).

145. Li, Y., Jia, Y.C., Cui, K., Li, N., Zheng, Z.Y., Wang, Y.Z. & Yuan, X.B. Essential role of TRPC channels in the guidance of nerve growth cones by brain-derived neurotrophic factor. Nature 434, 894-8 (2005).

146. Tai, C., Hines, D.J., Choi, H.B. & Macvicar, B.A. Plasma membrane insertion of TRPC5 channels contributes to the cholinergic plateau potential in hippocampal CA1 pyramidal neurons. Hippocampus (2010).

147. Riccio, A., Li, Y., Moon, J., Kim, K.S., Smith, K.S., Rudolph, U., Gapon, S., Yao, G.L., Tsvetkov, E., Rodig, S.J., Van't Veer, A., Meloni, E.G., Carlezon, W.A., Jr., Bolshakov, V.Y. & Clapham, D.E. Essential role for TRPC5 in amygdala function and fear-related behavior. Cell 137, 761-72 (2009).

148. Lutz, C.S. Alternative polyadenylation: a twist on mRNA 3' end formation. ACS Chem Biol 3, 609-17 (2008).

149. Wickens, M., Anderson, P. & Jackson, R.J. Life and death in the cytoplasm: messages from the 3' end. Curr Opin Genet Dev 7, 220-32 (1997).

150. Sachs, A.B., Sarnow, P. & Hentze, M.W. Starting at the beginning, middle, and end: translation initiation in eukaryotes. Cell 89, 831-8 (1997).

151. Venkataraman, K., Brown, K.M. & Gilmartin, G.M. Analysis of a noncanonical poly(A) site reveals a tripartite mechanism for vertebrate poly(A) site recognition. Genes Dev 19, 1315-27 (2005).

152. Shi, Y., Di Giammartino, D.C., Taylor, D., Sarkeshik, A., Rice, W.J., Yates, J.R., 3rd, Frank, J. & Manley, J.L. Molecular architecture of the human pre-mRNA 3' processing complex. Mol Cell 33, 365-76 (2009).

153. Wilusz, J.E. & Spector, D.L. An unexpected ending: noncanonical 3' end processing mechanisms. RNA 16, 259-66 (2010).

154. Ryan, K., Calvo, O. & Manley, J.L. Evidence that polyadenylation factor CPSF-73 is the mRNA 3' processing endonuclease. RNA 10, 565-73 (2004).

155. Gunderson, S.I., Vagner, S., Polycarpou-Schwarz, M. & Mattaj, I.W. Involvement of the carboxyl terminus of vertebrate poly(A) polymerase in U1A autoregulation and in the coupling of splicing and polyadenylation. Genes Dev 11, 761-73 (1997).

156. Tian, B., Hu, J., Zhang, H. & Lutz, C.S. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res 33, 201-12 (2005).

157. Zarudnaya, M.I., Kolomiets, I.M., Potyahaylo, A.L. & Hovorun, D.M. Downstream elements of mammalian pre-mRNA polyadenylation signals: primary, secondary and higher-order structures. Nucleic Acids Res 31, 1375-86 (2003).

158. Ruegsegger, U., Beyer, K. & Keller, W. Purification and characterization of human cleavage factor Im involved in the 3' end processing of messenger RNA precursors. J Biol Chem 271, 6107-13 (1996).

159. Awasthi, S. & Alwine, J.C. Association of polyadenylation cleavage factor I with U1 snRNP. RNA 9, 1400-9 (2003).

160. Ghosh, T., Soni, K., Scaria, V., Halimani, M., Bhattacharjee, C. & Pillai, B. MicroRNA-mediated up-regulation of an alternatively polyadenylated variant of the mouse cytoplasmic beta-actin gene. Nucleic Acids Res 36, 6318-32 (2008).

161. An, J.J., Gharami, K., Liao, G.Y., Woo, N.H., Lau, A.G., Vanevski, F., Torre, E.R., Jones, K.R., Feng, Y., Lu, B. & Xu, B. Distinct role of long 3' UTR BDNF mRNA in spine morphology and synaptic plasticity in hippocampal neurons. Cell 134, 175-87 (2008).

162. MacDonald, C.C. & McMahon, K.W. Tissue-specific mechanisms of alternative polyadenylation: testis, brain, and beyond. Wiley Interdiscip Rev RNA 1, 494-501 (2010).

163. Takagaki, Y., Seipelt, R.L., Peterson, M.L. & Manley, J.L. The polyadenylation factor CstF-64 regulates alternative processing of IgM heavy chain pre-mRNA during B cell differentiation. Cell 87, 941-52 (1996).

164. Early, P., Rogers, J., Davis, M., Calame, K., Bond, M., Wall, R. & Hood, L. Two mRNAs can be produced from a single immunoglobulin mu gene by alternative RNA processing pathways. Cell 20, 313-9 (1980).

165. Licatalosi, D.D., Mele, A., Fak, J.J., Ule, J., Kayikci, M., Chi, S.W., Clark, T.A., Schweitzer, A.C., Blume, J.E., Wang, X., Darnell, J.C. & Darnell, R.B. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464-9 (2008).

166. Jelen, N., Ule, J., Zivin, M. & Darnell, R.B. Evolution of Nova-dependent splicing regulation in the brain. PLoS Genet 3, 1838-47 (2007).

167. Ciais, D., Bohnsack, M.T. & Tollervey, D. The mRNA encoding the yeast ARE-binding protein Cth2 is generated by a novel 3' processing pathway. Nucleic Acids Res 36, 3075-84 (2008).

168. Wilusz, J.E., Freier, S.M. & Spector, D.L. 3' end processing of a long nuclear-retained noncoding RNA yields a tRNA-like cytoplasmic RNA. Cell 135, 919-32 (2008).

169. Sunwoo, H., Dinger, M.E., Wilusz, J.E., Amaral, P.P., Mattick, J.S. & Spector, D.L. MEN epsilon/beta nuclear-retained non-coding RNAs are up-regulated upon muscle differentiation and are essential components of paraspeckles. Genome Res 19, 347-59 (2009).

170. Orphanides, G. & Reinberg, D. A unified theory of gene expression. Cell 108, 439-51 (2002).

171. Ahn, S.H., Kim, M. & Buratowski, S. Phosphorylation of serine 2 within the RNA polymerase II C-terminal domain couples transcription and 3' end processing. Mol Cell 13, 67-76 (2004).

172. Komarnitsky, P., Cho, E.J. & Buratowski, S. Different phosphorylated forms of RNA polymerase II and associated mRNA processing factors during transcription. Genes Dev 14, 2452-60 (2000).

173. Jimeno-Gonzalez, S., Haaning, L.L., Malagon, F. & Jensen, T.H. The yeast 5'-3' exonuclease Rat1p functions during transcription elongation by RNA polymerase II. Mol Cell 37, 580-7 (2010).

174. Proudfoot, N.J. How RNA polymerase II terminates transcription in higher eukaryotes. Trends Biochem Sci 14, 105-10 (1989).

175. Greenblatt, J., Nodwell, J.R. & Mason, S.W. Transcriptional antitermination. Nature 364, 401-6 (1993).

176. Calvo, O. & Manley, J.L. Evolutionarily conserved interaction between CstF-64 and PC4 links transcription, polyadenylation, and termination. Mol Cell 7, 1013-23 (2001).

177. Connelly, S. & Manley, J.L. A functional mRNA polyadenylation signal is required for transcription termination by RNA polymerase II. Genes Dev 2, 440-52 (1988).

178. Kim, M., Krogan, N.J., Vasiljeva, L., Rando, O.J., Nedea, E., Greenblatt, J.F. & Buratowski, S. The yeast Rat1 exonuclease promotes transcription termination by RNA polymerase II. Nature 432, 517-22 (2004).

179. West, S., Gromak, N. & Proudfoot, N.J. Human 5' --> 3' exonuclease Xrn2 promotes transcription termination at co-transcriptional cleavage sites. Nature 432, 522-5 (2004).

180. Luo, W., Johnson, A.W. & Bentley, D.L. The role of Rat1 in coupling mRNA 3'-end processing to transcription termination: implications for a unified allosteric-torpedo model. Genes Dev 20, 954-65 (2006).

181. Zhang, Z., Fu, J. & Gilmour, D.S. CTD-dependent dismantling of the RNA polymerase II elongation complex by the pre-mRNA 3'-end processing factor, Pcf11. Genes Dev 19, 1572-80 (2005).

182. Nadler, J.J., Zou, F., Huang, H., Moy, S.S., Lauder, J., Crawley, J.N., Threadgill, D.W., Wright, F.A. & Magnuson, T.R. Large-scale gene expression differences across brain regions and inbred strains correlate with a behavioral phenotype. Genetics 174, 1229-36 (2006).

183. Parsons, M.J., Grimm, C.H., Paya-Cano, J.L., Sugden, K., Nietfeld, W., Lehrach, H. & Schalkwyk, L.C. Using hippocampal microRNA expression differences between mouse inbred strains to characterise miRNA function. Mamm Genome 19, 552-60 (2008).

184. Dolney, D.E., Szalai, G., Duester, G. & Felder, M.R. Molecular analysis of genetic differences among inbred mouse strains controlling tissue expression pattern of alcohol dehydrogenase 4. Gene 267, 145-56 (2001).

185. Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P. & Burge, C.B. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-6 (2008).

186. Keren, H., Lev-Maor, G. & Ast, G. Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet 11, 345-55 (2010).

187. Burset, M., Seledtsov, I.A. & Solovyev, V.V. Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res 28, 4364-75 (2000).

188. Chong, A., Zhang, G. & Bajic, V.B. Information for the Coordinates of Exons (ICE): a human splice sites database. Genomics 84, 762-6 (2004).

189. Houseley, J. & Tollervey, D. Apparent non-canonical trans-splicing is generated by reverse transcriptase in vitro. PLoS One 5, e12271 (2010).

190. Yap, C.C., Murate, M., Kishigami, S., Muto, Y., Kishida, H., Hashikawa, T. & Yano, R. Adaptor protein complex-4 (AP-4) is expressed in the central nervous system neurons and interacts with glutamate receptor delta2. Mol Cell Neurosci 24, 283-95 (2003).

191. Flavell, S.W., Kim, T.K., Gray, J.M., Harmin, D.A., Hemberg, M., Hong, E.J., Markenscoff-Papadimitriou, E., Bear, D.M. & Greenberg, M.E. Genome-wide analysis of MEF2 transcriptional program reveals synaptic target genes and neuronal activity-dependent polyadenylation site selection. Neuron 60, 1022-38 (2008).

192. Ji, Z., Lee, J.Y., Pan, Z., Jiang, B. & Tian, B. Progressive lengthening of 3' untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc Natl Acad Sci U S A 106, 7028-33 (2009).

193. DeZazzo, J.D. & Imperiale, M.J. Sequences upstream of AAUAAA influence poly(A) site selection in a complex transcription unit. Mol Cell Biol 9, 4951-61 (1989).

194. Legendre, M. & Gautheret, D. Sequence determinants in human polyadenylation site selection. BMC Genomics 4, 7 (2003).

195. Gerlai, R. Gene-targeting studies of mammalian behavior: is it the mutation or the background genotype? Trends Neurosci 19, 177-81 (1996).

196. Barabino, S.M., Hubner, W., Jenny, A., Minvielle-Sebastia, L. & Keller, W. The 30-kD subunit of mammalian cleavage and polyadenylation specificity factor and its yeast homolog are RNA-binding zinc finger proteins. Genes Dev 11, 1703-16 (1997).

197. Ciobanu, D.C., Lu, L., Mozhui, K., Wang, X., Jagalur, M., Morris, J.A., Taylor, W.L., Dietz, K., Simon, P. & Williams, R.W. Detection, validation, and downstream analysis of allelic variation in gene expression. Genetics 184, 119-28 (2010).

198. Tian, Q., Stepaniants, S.B., Mao, M., Weng, L., Feetham, M.C., Doyle, M.J., Yi, E.C., Dai, H., Thorsson, V., Eng, J., Goodlett, D., Berger, J.P., Gunter, B., Linseley, P.S., Stoughton, R.B., Aebersold, R., Collins, S.J., Hanlon, W.A. & Hood, L.E. Integrated genomic and proteomic analyses of gene expression in Mammalian cells. Mol Cell Proteomics 3, 960-9 (2004).

199. Chen, G., Gharib, T.G., Huang, C.C., Taylor, J.M., Misek, D.E., Kardia, S.L., Giordano, T.J., Iannettoni, M.D., Orringer, M.B., Hanash, S.M. & Beer, D.G. Discordant protein and mRNA expression in lung adenocarcinomas. Mol Cell Proteomics 1, 304-13 (2002).

200. Ferrero, G.B., Howald, C., Micale, L., Biamino, E., Augello, B., Fusco, C., Turturo, M.G., Forzano, S., Reymond, A. & Merla, G. An atypical 7q11.23 deletion in a normal IQ Williams-Beuren syndrome patient. Eur J Hum Genet 18, 33-8 (2010).

201. Dai, L., Bellugi, U., Chen, X.N., Pulst-Korenberg, A.M., Jarvinen-Pasley, A., Tirosh-Wagner, T., Eis, P.S., Graham, J., Mills, D., Searcy, Y. & Korenberg, J.R. Is it Williams syndrome? GTF2IRD1 implicated in visual-spatial construction and GTF2I in sociability revealed by high resolution arrays. Am J Med Genet A 149A, 302-14 (2009).

202. Mak, A.B., Ni, Z., Hewel, J.A., Chen, G.I., Zhong, G., Karamboulas, K., Blakely, K., Smiley, S., Marcon, E., Roudeva, D., Li, J., Olsen, J.B., Wan, C., Punna, T., Isserlin, R., Chetyrkin, S., Gingras, A.C., Emili, A., Greenblatt, J. & Moffat, J. A lentiviral functional proteomics approach identifies chromatin remodeling complexes important for the induction of pluripotency. Mol Cell Proteomics 9, 811-23 (2010).

203. Tsien, J.Z., Huerta, P.T. & Tonegawa, S. The essential role of hippocampal CA1 NMDA receptor-dependent synaptic plasticity in spatial memory. Cell 87, 1327-38 (1996).

204. Novak, A., Guo, C., Yang, W., Nagy, A. & Lobe, C.G. Z/EG, a double reporter mouse line that expresses enhanced green fluorescent protein upon Cre-mediated excision. Genesis 28, 147-55 (2000).